An effective approach to data-centric architecture

In my opinion, the application-centric paradigm is no longer fully up to the task of implementing the “world of things”: the increasingly pervasive collection of connected, intelligent devices, communicating in near-real-time, that constitutes our developed-world environment and that increasingly characterises the leading edge of developing business systems.

What do I mean by “near-real-time”? I mean that things happen, or information is available, in a timely (and consistently timely) way, as and when we need or expect them to happen.

Not everything needs to happen in real time, but if we’re happy with information that’s an hour out of date, it should always be reliably no more than an hour out of date, no matter what else is going on—and if we decide that we now need it to be at worst 5 minutes, or 5 microseconds, out of date, this shouldn’t introduce a serious discontinuity. All we need to do is implement (or pay for) progressively better service levels.
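To make the point concrete, here is a minimal Python sketch of a staleness bound treated as a service-level parameter. The names (Sample, is_fresh, max_age) are my own illustrative assumptions and do not come from any DDS product; the point is only that tightening the bound from an hour to five minutes changes a parameter, not the design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    """A published value plus the time it was produced (hypothetical type)."""
    value: float
    produced_at: datetime


def is_fresh(sample: Sample, now: datetime, max_age: timedelta) -> bool:
    """Return True if the sample is no older than the agreed bound."""
    return now - sample.produced_at <= max_age


# A sample that is 20 minutes old at the time we look at it.
now = datetime(2024, 1, 1, 12, 0, 0)
sample = Sample(value=21.5, produced_at=now - timedelta(minutes=20))

# Two service levels: same code, different bound.
hourly = timedelta(hours=1)
five_minutes = timedelta(minutes=5)

print(is_fresh(sample, now, hourly))        # True
print(is_fresh(sample, now, five_minutes))  # False
```

Meeting the tighter bound in practice is, of course, the infrastructure's job; the sketch only shows that the requirement itself can be stated as data.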

How do we achieve this? Essentially, by simplifying the infrastructure and getting rid of things that make “real-time”, “near-real-time”, and deterministic (predictable, consistent) service levels impossible. And by introducing efficient, effective and deterministic management of QoS (Quality of Service).

An effective approach is to design a data-centric architecture that can link “systems of systems” into a coherent whole. And, there is an open standard to base this on—OMG (Object Management Group) DDS (Data Distribution Service).

More importantly, perhaps, the actual interoperability and portability of applications between different DDS products from different vendors is publicly tested by the OMG. The results of a 3-way interoperability test are reported here and the results of a 5-way test are due soon. This means that the usual open platform promises (of allowing applications to communicate with each other regardless of author, underlying transport or purpose, within the whole “system of systems”) are, here, likely to be delivered in practice.

The idea of “systems of systems” is important in the DDS context because the architecture is most sensibly based on autonomous, cohesive systems, complete in themselves, loosely coupled by a publish and subscribe framework.

The application-centric model (or, perhaps, the message oriented client-server model specifically) is too clumsy—I have been talking about the “death of applications” in various contexts for ages. We don’t have time to invoke an application via its API, search the network for the appropriate databases, send messages out over the network and hope that, this time, some get through and return reasonably promptly.

We need a data-centric model. Fundamentally, data is what drives everything, in conjunction with metadata (which turns data into information) and business vision (which turns information into knowledge). Anything else (coding, APIs, messages, packets, protocols) is potentially unnecessary waste; we can’t eliminate all of it, but we can minimise it wherever possible and, in particular, let machines (not people) worry about it (it’s the sort of stuff machines are good at).

In a data-centric world, people publish information in compliance with data-dictionaries which include the metadata (data semantics, QoS targets etc.) needed to use it effectively. They subscribe to the information resources they need, with an appropriate QoS. Information about the “state” of things in the system (which is represented by state data) is published and subscribed to by other things in the system.
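As a rough illustration of publishing "in compliance with a data dictionary", the Python sketch below keeps the field types, units and a QoS target alongside each topic and checks a sample against them. Everything here (TopicDefinition, DATA_DICTIONARY, the "vehicle/engine_state" topic and its fields) is a hypothetical example of mine, not taken from the Land Data Model or from any DDS API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class TopicDefinition:
    """One data-dictionary entry: structure plus the metadata needed to use it."""
    name: str
    fields: Dict[str, type]   # field name -> expected Python type
    units: Dict[str, str]     # field name -> unit of measure (semantics)
    max_age_seconds: float    # QoS target: how stale a sample may be


# Hypothetical dictionary with a single topic.
DATA_DICTIONARY = {
    "vehicle/engine_state": TopicDefinition(
        name="vehicle/engine_state",
        fields={"rpm": int, "coolant_temp": float},
        units={"rpm": "1/min", "coolant_temp": "degC"},
        max_age_seconds=0.5,
    )
}


def validate(topic: str, sample: Dict[str, Any]) -> Tuple[bool, str]:
    """Check a sample against the dictionary entry for its topic."""
    definition = DATA_DICTIONARY.get(topic)
    if definition is None:
        return False, f"unknown topic: {topic}"
    if set(sample) != set(definition.fields):
        return False, "field names do not match the dictionary"
    for field_name, expected_type in definition.fields.items():
        if not isinstance(sample[field_name], expected_type):
            return False, f"{field_name} should be {expected_type.__name__}"
    return True, "ok"


print(validate("vehicle/engine_state", {"rpm": 1800, "coolant_temp": 88.5}))
print(validate("vehicle/engine_state", {"rpm": "fast", "coolant_temp": 88.5}))
```

The useful property is that publisher and subscriber both depend on the dictionary entry and not on each other.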

In other words, the “state” of any one sub-system is published onto the databus at all times by modifying the value of a data attribute of that subsystem and, as this data is changing “on-the-fly”, you need a QoS contract between the publisher and subscriber to ensure that the state information subscribed to is appropriately timely. DDS basically sets up the QoS for the subscriber and publisher and then does the work to make sure that whatever subset of the published state data a particular subscriber needs is available and up-to-date, as specified by that subscriber’s QoS contract, at run time.
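The contract idea can be sketched as a toy in-memory databus in Python. This is not the DDS API: QoS, DataBus, advertise, subscribe and publish are names I have made up for illustration, and the only policy modelled is a deadline (the maximum interval between updates), with a subscription accepted only if the publisher offers updates at least as often as the subscriber requests. The sketch checks the contract when the subscription is made; it does not enforce timing at run time, which is the hard part that real middleware takes on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class QoS:
    """Maximum interval, in seconds, between updates of the state."""
    deadline_s: float


def compatible(offered: QoS, requested: QoS) -> bool:
    """The offer satisfies the request if updates come at least as often as required."""
    return offered.deadline_s <= requested.deadline_s


class DataBus:
    """Toy databus: holds the latest state per topic and fans it out to subscribers."""

    def __init__(self) -> None:
        self._offers: Dict[str, QoS] = {}
        self._state: Dict[str, dict] = {}
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def advertise(self, topic: str, offered: QoS) -> None:
        """A publisher declares the QoS it is prepared to offer on a topic."""
        self._offers[topic] = offered

    def subscribe(self, topic: str, requested: QoS,
                  callback: Callable[[dict], None]) -> bool:
        """Register a subscriber only if the offered QoS meets the request."""
        offered = self._offers.get(topic)
        if offered is None or not compatible(offered, requested):
            return False
        self._subscribers.setdefault(topic, []).append(callback)
        return True

    def publish(self, topic: str, state: dict) -> None:
        """Update the topic's state and notify every matched subscriber."""
        if topic not in self._offers:
            raise KeyError(f"topic not advertised: {topic}")
        self._state[topic] = dict(state)
        for callback in self._subscribers.get(topic, []):
            callback(self._state[topic])

    def latest(self, topic: str) -> Optional[dict]:
        """Return the most recently published state, if any."""
        return self._state.get(topic)


bus = DataBus()
bus.advertise("engine/state", QoS(deadline_s=0.1))   # publisher: every 100 ms

received: List[dict] = []
ok = bus.subscribe("engine/state", QoS(deadline_s=0.5), received.append)
rejected = bus.subscribe("engine/state", QoS(deadline_s=0.01), received.append)

bus.publish("engine/state", {"rpm": 1800})
print(ok, rejected, received)  # True False [{'rpm': 1800}]
```

Note that neither side names the other: the subscriber asks for a topic at a service level, and the bus decides whether the match is possible.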

As a proof point, DDS is now mandated by UK defence procurement (see here). GVA – Generic Vehicle Architecture – Def-Stan 23-09 mandates an Interoperable Open Architecture, mandates use of a MOD-defined System Data Dictionary (called the ‘Land Data Model’ and defined in standard OMG UML notation) and mandates DDS as the data-bus for system to system communication (it also mandates power, connector and physical transport standards).

There is a public commitment to sharing the System Data Dictionary with other Nation States too. As I understand it, this standard allows you to think of vehicles as “systems”, where at some level a tank and a jeep are subtypes of the same thing, and a weapons guidance system, say, written to this standard, could be unplugged from a tank and quickly adapted to a jeep—even though the technology behind the interfaces might be very different.

However, I’m not much interested in killing machines (except as sources of good practice; when your customers have guns, “customer orientation” can reach an entirely new level of reality).

What this all means, for me, is that data-centric architectures are something that general business developers should be evaluating now, mainly because they are such a good fit with the emerging “universe of things” and inherently cope with the increasing complexity of heterogeneous, globalised business systems. And a measure of “good governance” comes built in, because a data-centric architecture is inherently manageable.

There’s even a “database” aspect to all this, as RTI has a “persistent data storage capability” (see the RTI Persistence Service here; in conjunction with its Connect service here). Together, these products provide, in effect, a DDS front-end to a highly distributed database system and/or application and might add capabilities around fault resilience, late joining, recovery etc. I wonder how this would work with Google’s BigTable or HBase on Hadoop?

The OMG DDS standard “merely” ensures that the approach isn’t vendor specific (as well as RTI, Atego and Sparx Systems there are other supporters on the OMG website). OMG standards also deliver something else—a standard UML profile can be used to support the modelling of data-centric solutions using DDS.

This means that, for example, Atego’s model-driven systems-engineering UML development tools can easily support DDS. However, Andrew Watson of the OMG points out that, although this is useful, it isn’t quite sufficient: “DDS is a middleware specification with a standard wire protocol and standard API. To say that one is using DDS, one would have to use or implement at least the protocol”.

According to Watson, the support of Atego and Sparx Systems is particularly important to systems-engineering organisations “because of the standardised UML-based Data Dictionary in the GVA which tells companies designing GVA-compliant systems what data structures their boxes can expect to send and receive, and what that data “means” (in some sense of that word)”.

And there are open source software DDS implementations, e.g. OpenSplice. DDS can even run in conjunction with client-server technology, although this can’t provide the full peer-to-peer functionality that is desirable in the data-centric vision (if something claims to support DDS, this is probably worth checking, although lack of full peer-to-peer functionality doesn’t invalidate the claim).

So far, data-centric architectures seem to have been largely developed to cope with the real-time needs of defence and aerospace systems. However, if you think in terms of business outcomes and business environments, general business has similar classes of problem and increasing complexity and scalability needs (in fact, RTI has one of the world’s biggest banks as a customer, using DDS for its trading systems), and legacy approaches are not guaranteed to evolve to cope with them.

David Norfolk is Practice Leader Development and Governance (Development/Governance) at Bloor Research. David first became interested in computers and programming quality in the 1970s, working in the Research School of Chemistry at the Australian National University. Here he discovered that computers could deliver misleading answers, even when programmed by very clever people, and was taught to program in FORTRAN. His ongoing interest in all things related to development has culminated in his joining Bloor in 2007 and taking on the development brief. Development here refers especially to automated systems development. This covers technology including acronym-driven tools such as: Application Lifecycle Management (ALM), Integrated Development Environments (IDE), Model Driven Architecture (MDA), automated data analysis tools and metadata repositories, requirements modelling tools and so on.