Refactoring Databases P2: Cheating a Little

In this post I am going to suggest doing a little design up front, and violating the purity of agility in the process. I think that you will see the sense.

To be fair, I do not think that what I am going to say in this post is particularly original… But in my admittedly weak survey of agile methods I did not find these ideas clearly stated… So apologies up front to those who figured this out before me… And to those readers who know this already. In other words, I am assuming that enough of you database geeks are like me, just becoming agile literate, and may find this useful.

In my prior post here I suggested that refactoring, re-working previous stuff due to the lack of design, was the price we pay to avoid over-engineering in an agile project. Now I am going to suggest that some design could eliminate some refactoring to the overall benefit of the project. In particular, I am going to suggest a little database design in advance as a little cheat.

Generally an agile project progresses by picking a set of user stories from a prioritized backlog and tackling development of code for those stories in a series of short, two-week sprints.

Since design happens in real-time during the sprints it can be uncoordinated… And code or schemas designed this way are refactored in subsequent sprints. Depending on how uncoordinated the schemas become the refactoring can require a significant effort. If the coders are not data folks… And worse, they are abstracted away from the schema via an ORM layer, the schema can become very silly.

Here is what we are trying to do at the Social Security Administration.

In the best case we would build a conceptual data model before the first sprint. This conceptual model would only define the 5-10 major entities in the system… Something highly likely to stand up over time. Sprint teams would then have the ability to agile define new objects within this conceptual framework… And require permission only when new concepts are required.
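
To make the permission rule concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the concept names (Person, Claim, Payment), the `ConceptualModel` class, and the `NewConceptRequired` exception are hypothetical and are not taken from our actual model. The idea is simply that a sprint team can register a new object under an existing concept without asking anyone, while an object that needs a new concept gets stopped until the data modelers approve it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class NewConceptRequired(Exception):
    """Raised when an object does not fit any approved concept."""


@dataclass
class ConceptualModel:
    # The handful of major concepts agreed on before the first sprint.
    concepts: Set[str]
    # Objects added by sprint teams, mapped to the concept they belong to.
    objects: Dict[str, str] = field(default_factory=dict)

    def define_object(self, name: str, concept: str) -> None:
        """Register a new object under an existing concept; no sign-off needed."""
        if concept not in self.concepts:
            raise NewConceptRequired(
                f"'{concept}' is not in the conceptual model; ask the data modelers"
            )
        self.objects[name] = concept

    def approve_concept(self, concept: str) -> None:
        """Called by the data modelers once a new concept has been agreed on."""
        self.concepts.add(concept)


# Hypothetical concept names, for illustration only.
model = ConceptualModel(concepts={"Person", "Claim", "Payment"})
model.define_object("ClaimNote", "Claim")  # fits inside the framework

try:
    model.define_object("OfficeLocation", "Facility")  # needs a new concept
except NewConceptRequired as exc:
    print(exc)
```

In practice the check would more likely live in a review step or a modeling tool than in application code, but the gate is the same: free movement inside a concept, a conversation when a new one is needed.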

Then, and this is a better case, we have data modelers working 1-2 sprints ahead of the coders so that there is a fairly detailed model to pin data to. This better case requires the prioritized backlog to be set 1-2 sprints in advance… A reasonable, but not certain, assumption.

Finally, we are hoping to provide developers with more than just a data model… What we really want is to provide an object model with basic CRUD methods in advance. This provides developers with a very strong starting point for their sprints and lets the data/object architecture evolve in a more methodical manner.
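
As a rough illustration of what "an object model with basic CRUD methods" could look like when it is handed to a sprint team, here is a small Python sketch using the standard library's sqlite3 module. The `Person` entity, the `person` table, and the `PersonRepository` class are all hypothetical names chosen for the example; they are not our real schema, and a real project would likely generate this layer or use an ORM.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical entity; the real one would come from the data modelers.
    id: Optional[int]
    name: str


class PersonRepository:
    """Basic CRUD methods a sprint team could start from."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Create the backing table if the modelers have not already done so.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS person ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create(self, person: Person) -> Person:
        cur = self.conn.execute(
            "INSERT INTO person (name) VALUES (?)", (person.name,)
        )
        # Return a copy carrying the key assigned by the database.
        return Person(id=cur.lastrowid, name=person.name)

    def read(self, person_id: int) -> Optional[Person]:
        row = self.conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        return Person(*row) if row else None

    def update(self, person: Person) -> None:
        self.conn.execute(
            "UPDATE person SET name = ? WHERE id = ?", (person.name, person.id)
        )

    def delete(self, person_id: int) -> None:
        self.conn.execute("DELETE FROM person WHERE id = ?", (person_id,))


# Usage against an in-memory database.
repo = PersonRepository(sqlite3.connect(":memory:"))
saved = repo.create(Person(id=None, name="Ada"))
repo.update(Person(id=saved.id, name="Ada L."))
print(repo.read(saved.id))
repo.delete(saved.id)
print(repo.read(saved.id))
```

The point is not the particular pattern. Developers start the sprint with working create, read, update, and delete calls for each modeled entity, so the schema behind those calls can be refactored by the data folks without every team inventing its own tables.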

Let me be clear that this is a slippery slope. I am suggesting that data folks can work 2-4 weeks ahead of the coders and still be very agile. Old school enterprise data modelers will argue… Why not be way ahead and prescribe adherence to an enterprise model? You all will have to manage this slope as you see fit.

We are seeing improvement in the velocity and quality of the code coming from our agile projects… And in agile… code is king. In the new world an enterprise data model evolves based on multiple application models, and enterprise data modelers need to find a way to influence, not dictate, data architecture in an agile manner.

10 thoughts on “Refactoring Databases P2: Cheating a Little”

  1. Rob, I think there are two important methodological components which need improvement in order to be agile, and to trust in the ability to refactor. First – a solid architecture and set of APIs is needed, one that describes how processes interact. While we may think that architecture is the last thing to invest in, making architecture changes is the most expensive thing that can be done, so some effort must be applied to the framework. Second, the infrastructure of coding – testing and deployment, integrated use cases, devops (admittedly a culture, not a thing) – all the errata of “delivering” – needs to be fully automated and itself agile with a low latency. We tend to focus on maximum flexibility, but in fact – infrastructure must exist in order to be agile. Without well thought out development infrastructure and process, agile is just a rationalization to ignore important parts of any development process… Put another way – we moved some process responsibility around so we can make wiser and more timely decisions.

    1. Agile is based on the assumption that architecture built first, on a journey with an unknown endpoint, will usually be wrong. Agile is based on the fact that many many projects with up front architecture go badly.

      The idea that architecture is the answer has not proven out over time and over hundreds of projects.

      If agile is a rationalization then it is a new rationalization, Michael. The old school rationalization is that an extensive design phase leads to better results.

      I’m going to try failing the new way for a while.

      1. For the record, I’m a huge fan of Agile. I’ve also learned some hard lessons, most egregious is cutting corners in the name of agile… The point I make about architecture, is that in order for multiple groups to be successful, they have to have agreements and contracts between them – these are typically architectural constructs. Put another way – if you don’t draw some black boxes, there will be an overwhelming amount of chaos. Need to change something – do it. Get those frameworks and boundaries described in your user stories and epics.

        As a side note, I would assert that any project with an unknown endpoint is destined to fail. Even in Agile you want to have a firm idea of your end state – it’s the journey which goes undefined. One thing I meant to say, but didn’t get to the first time out – scaffolding the data design as you describe is a great methodology; you can apply it in many, many ways – not just to data, but to APIs and complex front ends as well.

      2. Been thinking about this, Michael…

        I think that when you build a product you have a pretty good idea of the end… So architecture/design up front is appropriate.

        When you build a data warehouse you are often building a general capability that is designed to allow every use case you can think of to be served. So even though new use cases and data are added they click on to a sound general design… So architecture/design up front is appropriate.

        When you build a business application that is not general we tend to optimize rather than generalize… And the business requirements to be optimized tend to change… So we can only incrementally optimize and refactor. In this case upfront design is often a waste and incremental design à la agile could be a better approach.

        We are not too far apart, actually… As I am suggesting a little more design work… Just-in-time, not up front, to reduce refactoring.

  2. Clearly what we’re talking about is a balance between planning ahead and executing now. Architecture that did not “prove out over time” is poor architecture, probably over-architecting beyond the known requirements (oops, sorry, “user story”). But what is needed, and what Rob is moving his readers toward, is enough forward planning to avoid really expensive re-work later.

  3. Any project that has had scope change mid-stream started with an unknown endpoint even though the architects thought otherwise.

  4. The JIT approach you suggest is interesting. One note of caution: since agile/refactoring usually takes place on minimal data payloads, there is a scale factor involved in refactoring when it comes to environments of, say, 5 to 6 TB or beyond. I’d be interested to know your take on agile projects not built from scratch with, say, mock data.

  5. “Sprint teams would then have the ability to agile define new objects within this conceptual framework… And require permission only when new concepts are required.”

    This all makes sense. What guidance would you give to differentiate between new objects within a framework vs. new concepts? Would a new independent entity represent a new concept?

    1. I suppose that ‘concepts’ are usually ‘entities’. I was trying to use a term that was neither relational nor OO specific… And I was trying to suggest that multiple tables… Which could be entities… Could fit under a single concept. In other words I am being a little vague on purpose to provide a little wiggle room in how you might use this thinking.

      Sorry for the vagueness… But hope it helps…

