Refactoring Databases P2.75: REST and ORB Thoughts

I’m riffing on database design and modeling in an agile methodology. In the previous post here I suggested that we might cheat a little and design database schemas 2-4 weeks in advance of their use… further, I suggested that to maximize agility we should limit the design to a conceptual schema. In this post I’m going to limit the scope a little more by considering the use of a database in a RESTful application.

I am not going to fully define the REST architecture here… but if you are a systems architect you should know this inside and out… what I will say is that sometimes, in order to build a RESTful application, you will find yourself using the database to store the application state. What I want to say here is that when the application stores state in a database that must persist across boundaries in your business process… but that is not required to persist across business processes… then you do not need to model this. Let the programmers do their worst. In this case database tables play the role of working storage (a very old term that dates me… at least I did not say “Data Division”)… and programmers need to be completely free to add and subtract data elements in their programs as required.
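To make the “working storage” idea concrete, here is a minimal sketch (the table and column names are invented for illustration): state is keyed by a business-process instance, read and written freely by the program, and simply deleted when the process completes. None of it belongs in the conceptual model.

```python
import sqlite3

# An in-memory database stands in for the application's real DBMS.
conn = sqlite3.connect(":memory:")

# A throwaway "working storage" table: programmers may add or drop
# columns here at will; it is never part of the modeled schema.
conn.execute("""
    CREATE TABLE process_state (
        process_instance_id TEXT PRIMARY KEY,
        step                TEXT,
        scratch_json        TEXT
    )
""")

# State persists across steps of one business process...
conn.execute(
    "INSERT INTO process_state VALUES (?, ?, ?)",
    ("order-1234", "awaiting-approval", '{"retries": 2}'),
)
step = conn.execute(
    "SELECT step FROM process_state WHERE process_instance_id = ?",
    ("order-1234",),
).fetchone()[0]
print(step)  # awaiting-approval

# ...and is discarded when the process ends.
conn.execute(
    "DELETE FROM process_state WHERE process_instance_id = ?",
    ("order-1234",),
)
```

Because nothing outside the owning program depends on this table, refactoring it costs nothing, which is exactly why it need not be modeled.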

I also have a question for the architects out there…

When programmers touch relational data they typically go through one or more abstractions: maybe they call an XML-based RESTful web service… or maybe they write the service themselves and in the service call an ORM-thingie like Hibernate. I’ve seen terrible schema designs that result from programmers building to an object model without looking at the relational model that comes out the other end. So… when we assign data architects to build a relational schema to underlie an object-oriented programming language… should we architect up the stack and deliver the relational schema, the ORM layer, and/or the RESTful CRUD services for the objects? We are starting down that path… but I thought that I would ask…

8 thoughts on “Refactoring Databases P2.75: REST and ORB Thoughts”

  1. Rob,

    I’m not a big fan of using an ORM for database access as most ORMs I’ve worked with end up with a tight coupling between the object-model and the physical database model (or relational schema). Tight coupling like this leads to problems when either model needs to change.

    I prefer enabling the database to provide a RESTful API (JSON-based, not XML-based, as most modern apps and developers prefer working with JSON) that can be versioned. I also prefer to have that API mapped to business functions, not object- or table-based CRUD verbs: such verbs lead to application management of physical data layouts, a detail that the application programmer should not have to deal with.

    There are several ways to implement a RESTful JSON API data access layer as a lightweight wrapper around your chosen data source technology. If you’re using Oracle, you can use Oracle REST Data Services (ORDS) in conjunction with PL/SQL stored procedures to provide that interface. The bonus of implementing your data access layer in this fashion is that your application is more loosely coupled to the database, easing future database technology changes, such as a move to MongoDB, which speaks JSON natively.

    Good luck with your refactoring efforts — hopefully the idea of looser coupling between the applications and database will make your efforts easier.
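The business-function-versus-CRUD distinction in the comment above can be sketched in a few lines (a toy illustration; the function names and account data are invented): the CRUD style forces the caller to know and manage the physical layout, while a versioned business verb keeps that layout private.

```python
# A stand-in for the physical tables.
accounts = {"A": 100, "B": 50}

# Table-oriented CRUD: the caller must know the row layout and is
# responsible for keeping the data consistent itself.
def put_account(account_id, row):
    accounts[account_id] = row["balance"]

# Business-function API: one versioned verb ("transfer funds") that
# encapsulates the layout and the consistency rules behind it.
def transfer_funds_v1(from_id, to_id, amount):
    if accounts[from_id] < amount:
        raise ValueError("insufficient funds")
    accounts[from_id] -= amount
    accounts[to_id] += amount
    return {"from": from_id, "to": to_id, "amount": amount}

receipt = transfer_funds_v1("A", "B", 30)
print(accounts)  # {'A': 70, 'B': 80}
```

With the business verb, the underlying storage can be reshaped (split, denormalized, even moved to another DBMS) without breaking any caller; only the `_v1` contract must be honored.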

  2. Rob, thank you for the thoughtful inquiries and responses. Architects would likely choose an OODBMS like Versant or InterSystems Caché etc… I suspect in this case that there’s a “legacy” RDBMS involved, so an encapsulation layer such as the one suggested by Scott Ambler (Agile Data) will likely be proposed/used. Clearly temp data tables are supported by DB2, Oracle, PostgreSQL etc… and some DBMSs can also persist them. If such constructs are not desired for whatever reason, containers (e.g. table spaces) can be created for such “state”-persisting objects (use naming conventions). By definition this becomes a physical/infrastructure concern, so data architects may not care to get involved, although from the perspective of institutional knowledge it should be housed as a data modeling artifact. It should also be noted that, depending on the actual manifestation, DBAs or data center managers should be made aware of the expectations regarding recoverability (e.g. after a data-store crash, should these artifacts be recoverable beyond their metadata?). So as we evolve we need to involve the various disciplines so that they can also perform their respective responsibilities.

  3. Also, XML and JSON are RESTful message formats. As such, PostgreSQL 9.4 (JSONB), DB2 z/OS, DB2 LUW, and Oracle all support hybrid NoSQL/RDBMS operation to get the best of both worlds (versus MongoDB). There’s a slight overhead to such encapsulation in the form of UDFs, stored procedures etc… From the perspective of data types, JSON has a limited set, so I think XML is still viable.

  4. Based on your environment, I would say “Yes” to your specific question, if for no other reason than that it is what the application development teams that support your CICS legacy systems are used to. In other words, there has, for many years, been an abstraction layer between application software and the data handling software. So my opinion is that you should deliver something similar to enable them to work in a like manner, albeit with more modern software engineering tools. This way you are only changing the tooling on the software manufacturing line, not the role at each work station. Application developers can be application developers without worrying about the underlying DB technology. Ambler also has good references on this.

  5. Rob,
    A thought of mine on your 2nd paragraph regarding REST:
    I agree with you that REST web services are stateless by default, and I’m sure you know that SOAP web services can be stateless too. As a matter of fact, most, if not all, of the SOAP web services we have in our organization are stateless. So, if an application that consumes a stateless web service needs to maintain state, something else has to be done. You mentioned the option of using the DB to do just that. That’s fine, but in reality we almost never use the DB as a way to store state. That’s because the applications, which are the consumers of the stateless web services, usually have a good way to maintain state. For example, in the world of JEE, we have the concept of a session that allows the application to keep client state across multiple stateless calls.

    My thought on the last paragraph regarding architecture:
    To me, what you seem to be saying is that we should standardize the way multiple applications access the same data in the DB and reuse the code at the various layers of data access as much as possible. Reusability is always a good thing; I think most architects will agree with you there. Using Java as an example again: by providing the relational schema along with the corresponding Entity classes for a given DB, programmers of applications that are allowed to access data directly, without going through an abstraction layer such as a web service, can just pick up these Entity classes, drop them in their classpath, and be done in terms of data access coding through JPA. For applications that are not allowed to access data directly, providing a REST or SOAP web service interface would help. All they need to do is call the interface and get their data.

    1. You are right, Georgia… I said “REST” when I should have said “stateless”. Thanks for the upgrade and for the description of stateless support outside of REST.
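A minimal sketch of the session idea from the exchange above (not JEE itself; the registry and function names here are invented): each service call is stateless, and the client’s state travels only through the session handle it presents.

```python
import uuid

# Stands in for the container's session registry (e.g. a JEE session).
sessions = {}

def new_session():
    """Hand the client an opaque session handle."""
    sid = str(uuid.uuid4())
    sessions[sid] = {}
    return sid

# Each call is stateless on its own; state is looked up per call
# from the session the client presents.
def add_to_cart(session_id, item):
    cart = sessions[session_id].setdefault("cart", [])
    cart.append(item)
    return cart

sid = new_session()
add_to_cart(sid, "book")
cart = add_to_cart(sid, "pen")
print(cart)  # ['book', 'pen']
```

The database never sees this state; it lives with the application tier, which is the commenter’s point about why a DB-backed state store is rarely needed.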

  6. As a proponent of the general idea of having data precede functional components using Agile in large scale data warehousing, here are a few more thoughts:

    1. Don’t model transient application state – stay focused on the business invariant (as proposed in the article)

    2. Aggregations can be essential, but nothing can tie a large-scale model into a knot more than aggregates (materialized views/queries, join indexes, or manually managed tables). To avoid such entanglement and enable future refactoring, here are three suggestions: (a) don’t include aggregates in the early data sprint; (b) don’t model aggregates, but rather treat them as a physical design concern; (c) implement all application access using the base model, and avoid directly accessing the aggregates themselves or views on top of them. This may be tricky to engineer, but most DBMSs have the tools. It will allow for later mutation/refactoring of aggregates without requiring functional changes to applications (although performance testing will still be needed).

    3. Sharing vs. autonomy: Transaction-oriented system sprints can be scoped through well-defined class boundaries. Analytical applications that see the world in a dimensional way are likely to involve “dimension” and “transaction/event/fact” objects, and their scope can easily creep beyond single-sprint scope. Limiting sprints to one or two functional objectives at a time may keep scope in check, but it may mean touching the same data in subsequent iterations. The challenge is how to repackage scope when planning sprints so that related functional enhancements touching the same data are addressed in one data sprint. This will impose testing, and maybe code changes, on other application areas that are not making changes but are impacted by the shared data. These are likely to be perceived as unexpected annoyances by those application teams, and will certainly raise risk alarms.

    4. Should sprints be scheduled for compelling global performance or cost/efficiency improvements? The question is very real in large shared-nothing environments, where model changes may result in substantial savings, free up resources, or satisfy SLA cycles. Such changes may involve long data conversion activities touching many functional applications, and must be designed to accommodate phased application migrations. One solution is to dedicate a small portion of sprint capacity to such changes, prioritized separately. These changes must be designed to accommodate testing and the gradual migration of the applications sharing the impacted data.

    This is why many substantial enhancements are never tackled, leading to much costlier alternatives; or, after the enhancements are made, older redundant structures remain in production to support one or two low-priority applications. It takes only a little bending of the agile rules to get around the “large shared data” issues.
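Suggestion (c) in point 2 above can be sketched as follows (a toy illustration using SQLite; the table and data are invented): the application states its query against the base table only, leaving the DBMS free to add, change, or drop aggregates behind the scenes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("east", 10), ("east", 5), ("west", 7)])

# The application phrases its question against the BASE table only.
# In a DBMS with automatic query rewrite over materialized views or
# join indexes, the optimizer may answer this from a precomputed
# aggregate, and that aggregate can later be added, reshaped, or
# dropped without changing this SQL.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sale GROUP BY region"
))
print(totals)  # {'east': 15, 'west': 7}
```

Because no application names the aggregate directly, refactoring the aggregate is a purely physical change, which is exactly the decoupling point 2 is after.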

