Refactoring Databases P1: Defining Some Terms

We are in the middle of several agile projects at the SSA… so I’ll start the year by sharing some related issues and solutions we are considering…

I am going to try to suggest some ideas about refactoring databases that are a little different from some of the concepts in the book and blogs on the subject here and here. In this first post let me try to define refactoring in a general enough way that the same definition and use of the term works for both code and database design.

To start, refactoring could be a general term for incrementally tweaking any software. We might suggest that we have always refactored software, more or less. But IMO the term has taken on a specific meaning as part of agile software development methods, and so I will assume this is the proper use of the term.

Agile is a melting pot for several methodologies that emerged as a reaction to the inefficiencies of stepwise waterfall methods. As a result agile has many features that make it useful… I’m going to focus on just one that I consider most relevant to refactoring: an agile method develops software incrementally with only a short-term end state as a target. Each increment adds new functionality and the system evolves. As a result, it is not possible, or at least not properly agile, to establish a detailed design in advance; the system design evolves with the system.

This is a very hard concept to grasp. You cannot design up front for a system that has an undetermined end state. The waterfall concept of design-first must be modified to be agile. I’ll suggest how to do this in a later post… rest assured that some design is required… but for now think about how we might build software with no design other than that inherited from the last set of sprints and with only the current sprint user stories to guide us.

If you have grokked this then you are on your way to understanding agile and refactoring.

Imagine that you have built a function that is seldom called… But the current sprint user story will call the function thousands of times a second. Imagine further that you built the original function simply in a stateful manner… But now, in order to meet the new scalability requirements, you realize that the function will need to be stateless. What you are imagining is the need to refactor the function.
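To make the imagined scenario concrete, here is a minimal Python sketch. The function names and the name-cleaning task are hypothetical, invented only for illustration. The "before" version keeps its intermediate work in a shared module-level list, which is harmless when it is seldom called but unsafe under heavy concurrent use. The "after" version keeps the same signature and returns the same results, but holds no shared state.

```python
# Hypothetical example: the names and the task are assumptions for illustration.

# Before: intermediate work lives in a shared, module-level buffer.
# Two concurrent callers could overwrite each other's parts.
_scratch = []


def normalize_name_stateful(raw: str) -> str:
    _scratch.clear()
    for part in raw.split():
        _scratch.append(part.capitalize())
    return " ".join(_scratch)


# After: same signature and same result, but everything is local to the call.
def normalize_name_stateless(raw: str) -> str:
    return " ".join(part.capitalize() for part in raw.split())


def same_behavior(samples) -> bool:
    """Check that both versions return identical results for every sample."""
    return all(
        normalize_name_stateful(s) == normalize_name_stateless(s) for s in samples
    )


if __name__ == "__main__":
    samples = ["ada lovelace", "  grace   hopper ", "", "LINUS torvalds"]
    assert same_behavior(samples)
```

The check at the bottom is the important part: callers pass the same input and get the same output, so nothing that calls the function has to change.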

Now you might ask: should you have known, in advance, that performance was going to be an issue? Maybe… But maybe not. The point is that when you design just-in-time in an agile manner you cannot, and should not, get too far ahead of yourself and over-engineer. Over-engineering is one of the side-effects of waterfall methods that agile aims to avoid… And refactoring is the result… It is a trade-off, not a perfect solution (again, bear with me and I’ll suggest another trade-off later that you might like).

So refactoring is the process that adjusts design incrementally in an agile project:

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

Its heart is a series of small behavior preserving transformations. Each transformation (called a “refactoring”) does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it’s less likely to go wrong. The system is kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

– Martin Fowler

Note that in the example I suggested where we refactored for performance we followed this definition closely… the refactored function was changed but its behavior was unchanged… no program that called the function was changed as a result.

Let me make two points here and close…

First: refactoring is about making changes to code and to databases that preserve the behavior of the code or the databases. Refactoring is not about new functionality with new behaviors that might be added incrementally as the project agilely progresses.
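The same idea can be sketched for a database. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the table names (customer, customer_base, customer_phone) and sample rows are hypothetical. The internal structure changes, one table is split into two, but a view keeps the old name so that the query callers already run returns exactly the same rows.

```python
import sqlite3

# Hypothetical schema and data, used only to illustrate the idea.


def customer_rows(conn):
    """The 'external behavior': what an existing caller sees."""
    return conn.execute(
        "SELECT id, name, phone FROM customer ORDER BY id"
    ).fetchall()


def build_original(conn):
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ada", "555-0100"), (2, "Grace", "555-0101")],
    )


def refactor(conn):
    """Split customer into two tables; keep the old name alive as a view."""
    conn.executescript(
        """
        CREATE TABLE customer_base (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer_phone (
            customer_id INTEGER PRIMARY KEY REFERENCES customer_base(id),
            phone TEXT
        );
        INSERT INTO customer_base SELECT id, name FROM customer;
        INSERT INTO customer_phone SELECT id, phone FROM customer;
        DROP TABLE customer;
        CREATE VIEW customer AS
            SELECT b.id, b.name, p.phone
            FROM customer_base b
            LEFT JOIN customer_phone p ON p.customer_id = b.id;
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    build_original(conn)
    before = customer_rows(conn)
    refactor(conn)
    after = customer_rows(conn)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    assert before == after  # same query, same rows, different structure
```

This sketch only preserves behavior for reads. In SQLite a view is read-only, so preserving inserts and updates through the old name would need additional work, for example INSTEAD OF triggers on the view.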

Next, refactoring is not about just any incremental change… It is about incremental change in an agile project where the end state is uncertain enough to preclude a complete design. If we change a column in a database with some certainty that the column will satisfy a long-term vision then that change is not refactoring. Refactoring is not a process to guide a database migration or a database modernization effort.

When the end state is well understood it is silly to code stuff that you know will break later… And incrementally changing separate parts of a database that you are pretty certain will not change in the future is not refactoring.

This may seem obvious… but as you will see in the next post… the definitions matter.