Refactoring Databases P2.75: REST and ORM Thoughts

I’m riffing on database design and modeling in an agile methodology. In the previous post here I suggested that we might cheat a little and design database schemas 2-4 weeks in advance of their use… further, I suggested that to maximize agility we should limit the design to a conceptual schema. In this post I’m going to limit the scope a little more by considering the use of a database in a RESTful application.

I am not going to fully define the REST architecture here… if you are a systems architect you should know it inside and out… but I will say that sometimes, in order to build a RESTful application, you will find yourself using the database to store application state. My point is this: when the application stores state in the database that must persist across boundaries within a business process… but is not required to persist across business processes… you do not need to model it. Let the programmers do their worst. In this case database tables play the role of working storage (a very old term that dates me… at least I did not say “Data Division”)… and programmers need to be completely free to add and subtract data elements in their programs as required.
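To make “working storage” concrete, here is a minimal sketch of what such a table might look like, assuming JPA/Hibernate for the mapping; the entity and column names are hypothetical. The point is that the state payload is deliberately free-form, so programmers can add and subtract elements without a modeling exercise.

```java
import javax.persistence.*;

// Hypothetical working-storage table: state that must survive the steps
// within one business process, but not across business processes.
@Entity
@Table(name = "process_state")
public class ProcessState {

    @Id
    @Column(name = "process_instance_id")
    private String processInstanceId;  // correlates the REST calls in one process

    @Lob
    private String state;              // free-form payload (e.g., JSON) that the
                                       // programmers may reshape at will

    @Column(name = "updated_at")
    private java.sql.Timestamp updatedAt;

    // getters/setters omitted for brevity
}
```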

I also have a question for the architects out there…

When programmers touch relational data they typically go through one or more abstractions: maybe they call an XML-based RESTful web service… or maybe they write the service themselves and, in the service, call an ORM-thingie like Hibernate. I’ve seen terrible schema designs result from programmers building to an object model without ever looking at the relational model that comes out the other end. So… when we assign data architects to build a relational schema to underlie an object-oriented programming language… should we architect up the stack and deliver the relational schema, the ORM layer, and/or the RESTful CRUD services for the objects? We are starting down that path… but I thought that I would ask…
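To make the question concrete, here is roughly what delivering all of those layers might look like, in a sketch that assumes JPA/Hibernate for the ORM layer and JAX-RS for the service; the Customer entity and endpoints are hypothetical, and transaction plumbing is omitted.

```java
import javax.persistence.*;
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;

// Customer.java -- the relational schema and the ORM mapping, delivered together.
@Entity
@Table(name = "customer")
public class Customer {
    @Id @GeneratedValue
    private Long id;
    private String name;
    // getters/setters omitted for brevity
}

// CustomerResource.java -- the RESTful CRUD service over that mapping.
@Path("/customers")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class CustomerResource {
    @PersistenceContext
    private EntityManager em;

    @GET
    @Path("{id}")
    public Customer read(@PathParam("id") Long id) {
        return em.find(Customer.class, id);
    }

    @POST
    public Customer create(Customer c) {
        em.persist(c);  // transaction management omitted
        return c;
    }
}
```

If the data architects deliver all three layers, the schema gets reviewed from both ends; if they deliver only the bottom one, the object modelers fill in the rest… and that is where the silly schemas sneak in.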

Refactoring Databases P2.5: Scoping the Cheats

One of the side-effects of the little cheat posted here is that, if we are going to design early, we have to decide what we will design early… and this question has two complications. First, we have to ask ourselves how detailed we can make the design before the result becomes un-agile. Next, we have to ask ourselves whether we should design up the stack a little. My opinions will be doggedly blogged in this post.

I will offer two ends of a spectrum to suggest a way to manage the scope of your advance-design cheat. Let me remind you that the cheat suggests you look at the user stories that will be sprinted on next and devise the schema required by those stories… and maybe refactor the existing schema a little (more on this in the next post)… no more than that.

On one side we may develop a complete design with every detail specified: subject areas, tables, columns, data types, and domains. The advantage here is that the code developers have a spec to code to, and this could increase velocity. But the downside is that developers will be working with users to adjust the code in real time. If the schema does not fit the adjustments then you may find yourself refactoring the new stuff, and velocity may decrease.
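For illustration, here is this fully specified end of the spectrum rendered as a JPA mapping; the Claim table and its columns are hypothetical, but note that every column, type, length, and domain is pinned down before the sprint starts.

```java
import javax.persistence.*;

// Every detail specified up front: table, columns, data types, and domains.
@Entity
@Table(name = "claim")
public class Claim {

    @Id
    @Column(name = "claim_id", nullable = false)
    private Long claimId;

    @Column(name = "claimant_name", length = 100, nullable = false)
    private String claimantName;

    @Column(name = "status", length = 1, nullable = false)
    private String status;  // domain: 'P'ending, 'A'pproved, 'D'enied

    @Column(name = "filed_date", nullable = false)
    private java.sql.Date filedDate;
}
```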

The other side of the spectrum would have database designers build just a skeleton: a conceptual schema with subject areas, tables, and primary keys for each table. This provides a framework that corrals the developers without fencing them in so tightly that they cannot express agility.
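The same hypothetical table at the skeleton end of the spectrum would hand the sprint team little more than the entity and its primary key; the rest is theirs to grow as the stories demand.

```java
import javax.persistence.*;

// Just the entity and its primary key; developers add columns as needed.
@Entity
@Table(name = "claim")
public class Claim {

    @Id
    @Column(name = "claim_id")
    private Long claimId;

    // remaining columns to be defined by the sprint team
}
```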

Remember that the object here is to reduce refactoring without reducing agility…

IMO the conceptual model approach is best. Let’s raise free-range software engineers who can eat bugs in the wild rather than be penned in. The conceptual model delimits the range… a detailed schema defines a pen.

There is one more closely related topic for the next post… how do we manage transient objects in a RESTful application?

Refactoring Databases P2: Cheating a Little

In this post I am going to suggest doing a little design a little up front and violating the purity of agility in the process. I think that you will see the sense in it.

To be fair, I do not think that what I am going to say in this post is particularly original… But in my admittedly weak survey of agile methods I did not find these ideas clearly stated… So apologies up front to those who figured this out before me… And to those readers who know this already. In other words, I am assuming that enough of you database geeks are like me, just becoming agile literate, and may find this useful.

In my prior post here I suggested that refactoring, re-working previous stuff due to the lack of design, was the price we pay to avoid over-engineering in an agile project. Now I am going to suggest that some design could eliminate some refactoring to the overall benefit of the project. In particular, I am going to suggest a little database design in advance as a little cheat.

Generally an agile project progresses by picking a set of user stories from a prioritized backlog and tackling development of the code for those stories in a series of short, two-week sprints.

Since design happens in real time during the sprints it can be uncoordinated… And code or schemas designed this way get refactored in subsequent sprints. Depending on how uncoordinated the schemas become, the refactoring can require significant effort. If the coders are not data folks… And, worse, are abstracted away from the schema via an ORM layer… the schema can become very silly.
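As a hypothetical example of the silliness: a coder in a hurry writes a quick unidirectional mapping and never looks at the DDL that Hibernate generates from it.

```java
import java.util.List;
import javax.persistence.*;

// Customer.java
@Entity
public class Customer {
    @Id @GeneratedValue
    private Long id;

    // A quick, unreviewed mapping. With JPA defaults, a unidirectional
    // @OneToMany like this produces an extra join table with two foreign
    // keys, rather than a simple foreign key column on the orders table;
    // exactly the kind of surprise a data person would catch.
    @OneToMany
    private List<PurchaseOrder> orders;
}

// PurchaseOrder.java ("Order" itself is a reserved word in SQL)
@Entity
public class PurchaseOrder {
    @Id @GeneratedValue
    private Long id;
}
```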

Here is what we are trying to do at the Social Security Administration.

In the base case we would build a conceptual data model before the first sprint. This conceptual model would define only the 5-10 major entities in the system… Something highly likely to stand up over time. Sprint teams would then be free to define new objects within this conceptual framework… And would need permission only when new concepts are required.

Then, and this is a better case, we have data modelers working 1-2 sprints ahead of the coders so that there is a fairly detailed model to pin data to. This better case requires the prioritized backlog to be set 1-2 sprints in advance… A reasonable, but not certain, assumption.

Finally, we are hoping to provide developers with more than just a data model… What we really want is to provide an object model with basic CRUD methods in advance. This gives developers a very strong starting point for their sprints and lets the data/object architecture evolve in a more methodical manner.
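As a sketch of what that starting point could look like, assuming JPA: the modelers hand over the entity mappings plus a generic CRUD object like this hypothetical DAO, and the sprint team codes against it from day one.

```java
import javax.persistence.EntityManager;

// Hypothetical generic CRUD starting point delivered with the object model.
public class CrudDao<T> {

    private final EntityManager em;
    private final Class<T> type;

    public CrudDao(EntityManager em, Class<T> type) {
        this.em = em;
        this.type = type;
    }

    public void create(T entity) { em.persist(entity); }

    public T read(Object id) { return em.find(type, id); }

    public T update(T entity) { return em.merge(entity); }

    public void delete(T entity) { em.remove(entity); }
}
```

A sprint team would then instantiate, say, `new CrudDao<>(em, Claim.class)` and have create/read/update/delete available before writing a line of persistence code.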

Let me be clear that this is a slippery slope. I am suggesting that data folks can work 2-4 weeks ahead of the coders and still be very agile. Old-school enterprise data modelers will argue… Why not be way ahead and prescribe adherence to an enterprise model? You all will have to manage this slope as you see fit.

We are seeing improvement in the velocity and quality of the code coming from our agile projects… And in agile… code is king. In the new world an enterprise data model evolves based on multiple application models, and enterprise data modelers need to find a way to influence, not dictate, data architecture in an agile manner.