Refactoring Databases P2.75: REST and ORM Thoughts

I’m riffing on database design and modeling in an agile methodology. In the previous post here I suggested that we might cheat a little and design database schemas 2-4 weeks in advance of their use… further, I suggested that to maximize agility we should limit the design to a conceptual schema. In this post I’m going to limit the scope a little more by considering the use of a database in a RESTful application.

I am not going to fully define the REST architecture here… but if you are a systems architect you should know this inside and out… what I will say is that sometimes, in order to build a RESTful application, you will find yourself using the database to store the application state. What I want to say here is that when the application stores state in a database that must persist across boundaries in your business process… but that is not required to persist across business processes… then you do not need to model this. Let the programmers do their worst. In this case database tables play the role of working storage (a very old term that dates me… at least I did not say “Data Division”)… and programmers need to be completely free to add and subtract data elements in their programs as required.
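To make the working-storage idea concrete, here is a minimal sketch (table and column names are invented for illustration, not from any real system) of a scratch table holding state that persists across steps of one business process but not across processes. The point is that programmers can add or drop elements freely, with no modeling review:

```python
import sqlite3

# Hypothetical "working storage": state that must survive boundaries within
# one business process, but not across business processes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claim_session_state (   -- name is illustrative
        session_id TEXT PRIMARY KEY,
        step       TEXT,
        payload    TEXT                  -- free-form scratch data
    )
""")

# A later sprint needs another element? Programmers just add it.
conn.execute(
    "ALTER TABLE claim_session_state ADD COLUMN retry_count INTEGER DEFAULT 0"
)

conn.execute(
    "INSERT INTO claim_session_state (session_id, step, payload) VALUES (?, ?, ?)",
    ("abc-123", "identity-check", '{"step_data": "whatever the code needs"}'),
)

# When the business process completes, the state is simply thrown away.
conn.execute("DELETE FROM claim_session_state WHERE session_id = ?", ("abc-123",))
```

Nothing here deserves a data architect’s attention… it is the modern Data Division.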

I also have a question for the architects out there…

When programmers touch relational data they typically go through one or more abstractions: maybe they call an XML-based RESTful web service… or maybe they write the service themselves and in the service call an ORM-thingie like Hibernate. I’ve seen terrible schema designs that result from programmers building to an object model without looking at the resulting relational model out of the other end. So… when we assign data architects to build a relational schema to underlie an object-oriented programming language… should we architect up the stack and deliver the relational schema, the ORM layer, and/or the RESTful CRUD services for the objects? We are starting down that path… but I thought that I would ask…
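One cheap way to spot trouble early is to look at what comes out the other end. The sketch below (not Hibernate; the domain class and type mapping are invented for illustration) derives the table a naive object-to-table mapping would emit, so a data architect can inspect the relational result of an object model before it ships:

```python
from dataclasses import dataclass, fields

# Illustrative type mapping a naive mapper might use.
TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "REAL"}

@dataclass
class Beneficiary:          # hypothetical domain object
    id: int
    full_name: str
    monthly_benefit: float

def ddl_for(cls) -> str:
    """Show the CREATE TABLE that falls out of the object model."""
    cols = ", ".join(f"{f.name} {TYPE_MAP[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"

ddl = ddl_for(Beneficiary)
print(ddl)
```

Reviewing output like this is how the architect sees the schema the programmers never looked at.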

Refactoring Databases P2.5: Scoping the Cheats

One of the side-effects of the little cheat posted here is that, if we are going to design early, we have to decide what we will design early… and this question has two complications. First, we have to ask ourselves how much detail we can design in before the result becomes un-agile. Next, we have to ask ourselves whether we should design up the stack a little. My opinions will be doggedly blogged in this post.

I will offer two ends of a spectrum to suggest a way to manage the scope of your advance design cheat. Let me remind you that the cheat suggests that you look at the user stories that will be sprinted on next and devise the schema required by those stories… and maybe refactor the existing schema a little (more on this in the next post)… no more than that.

On one side we may develop a complete design with every detail specified: subject areas, tables, columns, data types, and domains. The advantage here is that the code developers have a spec to code to, and this could increase velocity. But the downside is that developers will be working with users to adjust the code in real-time. If the schema does not fit the adjustments then you may be refactoring the new stuff and velocity may decrease.

The other side of the spectrum would have database designers build just a skeleton: a conceptual schema with subject areas, tables, and primary keys for each table. This provides a framework that corrals the developers without fencing them in so tightly that they cannot express agility.
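A skeleton at this end of the spectrum might look like the following sketch (the subject areas, tables, and keys are hypothetical examples, not any real schema). Only tables and primary keys are prescribed; everything else is left to the sprint teams:

```python
import sqlite3

# Skeleton conceptual schema: subject areas, tables, and primary keys only.
# All names are illustrative.
CONCEPTUAL_SKELETON = {
    "Claims":    [("claim", "claim_id"), ("adjudication", "adjudication_id")],
    "Customers": [("beneficiary", "beneficiary_id")],
}

conn = sqlite3.connect(":memory:")
for subject_area, tables in CONCEPTUAL_SKELETON.items():
    for table, pk in tables:
        # Nothing but the primary key is designed in advance.
        conn.execute(f"CREATE TABLE {table} ({pk} INTEGER PRIMARY KEY)")

# Sprint teams express agility inside the corral by fleshing tables out:
conn.execute("ALTER TABLE claim ADD COLUMN filed_date TEXT")
```

The framework tells developers where data belongs without dictating what each table will eventually hold.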

Remember that the object here is to reduce refactoring without reducing agility…

IMO the conceptual model approach is best. Let’s raise free-range software engineers who can eat bugs in the wild rather than be penned in. The conceptual model delimits the range… a detailed schema defines a pen.

There is one more closely related topic for the next post… how do we manage transient objects in a RESTful application?

Refactoring Databases P2: Cheating a Little

In this post I am going to suggest doing a little design a little upfront and violating the purity of agility in the process. I think that you will see the sense.

To be fair, I do not think that what I am going to say in this post is particularly original… But in my admittedly weak survey of agile methods I did not find these ideas clearly stated… So apologies up front to those who figured this out before me… And to those readers who know this already. In other words, I am assuming that enough of you database geeks are like me, just becoming agile literate, and may find this useful.

In my prior post here I suggested that refactoring, re-working previous stuff due to the lack of design, was the price we pay to avoid over-engineering in an agile project. Now I am going to suggest that some design could eliminate some refactoring to the overall benefit of the project. In particular, I am going to suggest a little database design in advance as a little cheat.

Generally, an agile project progresses by picking a set of user stories from a prioritized backlog of stories and tackling development of code for those stories in a series of short, two-week sprints.

Since design happens in real-time during the sprints it can be uncoordinated… And code or schemas designed this way are refactored in subsequent sprints. Depending on how uncoordinated the schemas become, the refactoring can require a significant effort. If the coders are not data folks… And, worse, are abstracted away from the schema via an ORM layer… the schema can become very silly.

Here is what we are trying to do at the Social Security Administration.

In the best case we would build a conceptual data model before the first sprint. This conceptual model would only define the 5-10 major entities in the system… Something highly likely to stand up over time. Sprint teams would then have the ability to agilely define new objects within this conceptual framework… And would need permission only when new concepts are required.

Then, and this is a better case, we have data modelers working 1-2 sprints ahead of the coders so that there is a fairly detailed model to pin data to. This better case requires the prioritized backlog to be set 1-2 sprints in advance… A reasonable, but not certain, assumption.

Finally, we are hoping to provide developers with more than just a data model… What we really want is to provide an object model with basic CRUD methods in advance. This provides developers with a very strong starting point for their sprints and lets the data/object architecture evolve in a more methodical manner.
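As a rough sketch of what “basic CRUD methods in advance” might mean, here is a generic create/read/update/delete layer over a modeled table (class and table names are hypothetical; this is an illustration of the starting point, not our actual delivery):

```python
import sqlite3

class CrudTable:
    """Illustrative pre-built CRUD layer handed to a sprint team."""

    def __init__(self, conn, table, columns):
        self.conn, self.table = conn, table
        cols = ", ".join(["id INTEGER PRIMARY KEY"] + [f"{c} TEXT" for c in columns])
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")

    def create(self, **values):
        keys = ", ".join(values)
        marks = ", ".join("?" * len(values))
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({keys}) VALUES ({marks})",
            tuple(values.values()),
        )
        return cur.lastrowid

    def read(self, row_id):
        cur = self.conn.execute(
            f"SELECT * FROM {self.table} WHERE id = ?", (row_id,))
        return cur.fetchone()

    def update(self, row_id, **values):
        sets = ", ".join(f"{k} = ?" for k in values)
        self.conn.execute(
            f"UPDATE {self.table} SET {sets} WHERE id = ?",
            (*values.values(), row_id),
        )

    def delete(self, row_id):
        self.conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (row_id,))

# A sprint team starts from a working layer rather than a blank page:
conn = sqlite3.connect(":memory:")
claims = CrudTable(conn, "claim", ["status"])
cid = claims.create(status="filed")
claims.update(cid, status="approved")
```

The team then extends the object, rather than inventing its persistence from scratch mid-sprint.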

Let me be clear that this is a slippery slope. I am suggesting that data folks can work 2-4 weeks ahead of the coders and still be very agile. Old school enterprise data modelers will argue… Why not be way ahead and prescribe adherence to an enterprise model? You all will have to manage this slope as you see fit.

We are seeing improvement in the velocity and quality of the code coming from our agile projects… And in agile… code is king. In the new world an enterprise data model evolves based on multiple application models, and enterprise data modelers need to find a way to influence, not dictate, data architecture in an agile manner.

Refactoring Databases P1: Defining Some Terms

We are in the middle of several agile projects at the SSA… so I’ll start the year by sharing some related issues and solutions we are considering…

I am going to try to suggest some ideas about refactoring databases that are a little different from some of the concepts in the book and blogs on the subject here and here. In this first post let me try to define refactoring in a general enough way that the same definition and use of the term works for both code and database design.

To start, refactoring could be a general term for incrementally tweaking any software. We might suggest that we have always refactored software more-or-less. But IMO the term has taken meaning as part of agile software development methods and so I will assume this is the proper use of the term.

Agile is a melting pot of several methodologies that emerged as a reaction to inefficiencies in stepwise waterfall methods. As a result, agile has many features that make it useful… I’m going to focus on just one that I consider most relevant to refactoring: an agile method develops software incrementally with only a short-term end state as a target. Each increment adds new functionality and the system evolves. As a result, it is not possible, or not correctly agile, to establish a detailed design in advance; the system design evolves with the system.

This is a very hard concept to grasp. You cannot design up front for a system that has an undetermined end state. The waterfall concept of design-first must be modified to be agile. I’ll suggest how to do this in a later post… rest assured that some design is required… think about how we might build software with no design other than that inherited from the last set of sprints and with only the current sprint user stories to guide us.

If you have grokked this then you are on your way to understanding agile and refactoring.

Imagine that you have built a function that is seldom called… But the current sprint user story will call the function thousands of times a second. Imagine further that you built the original function simply in a stateful manner… But now, in order to meet the new scalability requirements, you realize that the function will need to be stateless. What you are imagining is the need to refactor the function.
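The stateful-to-stateless refactor described above can be sketched in a few lines (the function is invented for illustration). The internal structure changes completely, but, as the definition below requires, the external behavior does not:

```python
# Before: state (a running total) lives inside the object. At thousands of
# calls per second, every caller contends on this one piece of state.
class StatefulTotaler:
    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total

# After: the caller carries the state; the function itself is pure and
# can be scaled out freely.
def add_stateless(total, amount):
    return total + amount

# Behavior-preserving check: both versions produce the same external results.
s = StatefulTotaler()
stateful_results = [s.add(x) for x in (1, 2, 3)]

t, stateless_results = 0, []
for x in (1, 2, 3):
    t = add_stateless(t, x)
    stateless_results.append(t)
```

Callers see identical results before and after… which is exactly what makes this a refactoring rather than a behavior change.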

Now you might ask: should you have known, in advance, that performance was going to be an issue? Maybe… But maybe not. The point is that when you design just-in-time in an agile manner you cannot, and should not, get too far ahead of yourself and over-engineer. Over-engineering is one of the side-effects of waterfall methods that agile aims to avoid… And refactoring is the result… It is a trade-off, not a perfect solution (again, bear with me and I’ll suggest another trade-off later that you might like).

So refactoring is the process that adjusts design incrementally in an agile project:

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

Its heart is a series of small behavior preserving transformations. Each transformation (called a “refactoring”) does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it’s less likely to go wrong. The system is kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

– Martin Fowler

Note that in the example I suggested where we refactored for performance we followed this definition closely… the refactored function was changed but its behavior was unchanged… no program that called the function was changed as a result.

Let me make two points here and close…

First: refactoring is about making changes to code and to databases that preserve the behavior of the code or the databases. Refactoring is not about new functionality with new behaviors that might be added incrementally as the project agilely progresses.

Next, refactoring is not about just any incremental change… It is about incremental change in an agile project where the end state is uncertain enough to preclude a complete design. If we change a column in a database with some certainty that the column will satisfy a long-term vision then that change is not refactoring. Refactoring is not a process to guide a database migration or database modernization process.

When the end state is well understood it is silly to code stuff that you know will break later… And incrementally changing separate parts of a database that you are pretty certain will not change in the future is not refactoring.

This may seem obvious… but as you will see in the next post… the definitions matter.