Universal Data Vault: A Hyper-Generalized Data Vault — Review of John Giles’ Case Study



, , , , , , , , , ,

I recently came across a brilliant data-modeling case study.  John Giles, who wrote it, is also the author of the insightful and (for us data geeks) entertaining book, ‘The Nimble Elephant’.  In this UDV case study, John combined his strategy of applying, and establishing variations from, universal data-modeling patterns to Dan Linstedt’s Data Vault EDW methodology, with impressive results in a very challenging scenario.  Here are my unofficial summary and impressions.

Case Study Link:   http://www.countrye.com.au/new/wp-content/uploads/2013/01/Universal-Data-Vault-case-study.pdf

John needed to design an enterprise data model that had to align with and support Master Data Management, an Enterprise Service Bus, and a Data Warehouse associated with a substantial enterprise integration effort based on organizational acquisitions.   The fact that I was riveted while reading this is a clear sign that I am, indeed, a data geek.

Here was John’s problem…

A government agency will consolidate 83 previously autonomous regional fire-fighting groups into a single organizational (informational) hierarchy.  Although these organizations performed similar life-saving operations, they each chose to organize that work, and thus their operational information systems, around entities with modeled granularities that not only varied widely, but also show mutually-inconsistent and overlapping mixed (think: ad-hoc, imperfect, odd) granularities.  For example, one group’s table for a firefighting asset may stored a consistent granularity at the level of (aircraft, land-based, team).  A second group’s asset granularity may be highly generalized, simply as (asset) to cover all assets regardless of characteristics.  It get’s worse.  A third group’s granularity might be oddly mixed, as (helicopter, fixed wing aircraft, land-vehicle, firefighter, and a fourth group, mixing granularities in yet another ad-hoc way, uses (aircraft, fire engine, fire-fighting personnel transport, firefighting team).  With group 3 and 4, note that they have a conceptually inconsistent mixing of asset granularities that cut across each other.  These, and many more oddities of mixed granularity were observed within an average of 5 source systems and a combined 20 core entities for each one of the 83 groups to be acquired.  All in all, of the various source systems combined ~8,000 tables, approximately 1,600 mission-critical tables existed.

Project management expectations include the following…

  • Source system data should not be subjectively transformed or re-interpretted in order for it to fit into the target data structure. As such, early interest in Star Schema models were abandoned.
  • Adaptability to Change: In addition to the currently planned ~1,600 tables to integrate now, many more of the total 8,000 would eventually be brought in. This had to be accomplished with minimum refactoring of the early-deployed versions of the DW and MDM solutions. As such, suspicions arose as to whether a classic 3rd normal form entity relational model would be overly rigid.
  • Auditability and Historical Change Tracking: In the new EDW and MDM systems expected to use this model, each record and field in each target table had to be explicitly traceable to a record and field in a table from a source system at a specific group. The auditability requirement also asserted that changes to source system data, whether overwritten in those systems themselves, must have their change history tracked in the target model.

Upon considering a standard Business Data Vault architecture, wherein a Business Key would be identified, and instantiated as a Hub table for each business concept — essentially, each fixed-granularity level for every in-scope business subject area — would have resulted in hundreds of Hubs, in addition to some multiple of that for the number of associated Links and Satellites).  From here, one can only imagine if, in the future, yet more groups were brought in, and those groups used still-different mixtures of granularities for assets or other entities.  This problem, by itself, would likely sabotage the ‘Adaptability to Change’ mandate.

As I mentioned in my opening statement, John Giles specializes in using, and applying variations to, data modeling patterns.  His combination of classic sub-type / super-type model patterns with Business Data Vault methodology is, from my perspective, ingenious.  Here is the link to his ‘Universal Data Vault’ case study…


Reminder and some key takeaways…

  • With my already-stated caveat about John’s specialty in combining and applying variations on data modeling patterns, readers should assume right up front that he has added non-data-vault types of tables to this solution.
  • Hubs, in this case, can be used to manage a business key that has wide variety of granularities, through the use of UDV’s recursive, generalized super-type tables (Hub-Type, Link-Type).
  • John mentions that, with the substantial reduction of Hubs as an intended upside, two downsides are that (a) with the use of mixed granularity hubs, the granularity of each Hub (the range of granularities) is less than obvious; and (b) the UDV renders that already-challenging work of identifying Business Keys even more challenging.
  • Data Vault’s inherent flexibility around the selection of Satellite tables is superbly demonstrated here, with the implicit addition of one more UDM-specific rule for ‘how many Satellites per Hub”. That additional rule might be described as ‘How many levels of granularity are existent for a given multi-granular Hub?

The end result of the case study is a Data Vault schema that contains only a fraction of the number of Hubs / Ensembles than it otherwise would, but still benefits from Business Data Vault’s elegance and avoidance of data integration resulting in data over-transformation, with the resulting loss of auditability and fidelity to source systems.

Parting Questions:

  • Is there any significant downside for the Data Vault methodology to be extended or combined with other patterns?

Enjoy John Giles’ Case Study of Universal Data Vault.


Your comments are always welcome.

Lean Data Warehouse via Data Vault



, , , , , ,

Today, businesses want a data warehouse to be Lean.  In contrast to many traditional data warehouses, a Lean Data Warehouse is simple, fast to deliver, automatable, free of over-complex dependencies,  adaptive to business change, and thus quickly profitable.  Data Vault is an outstanding logical architecture for a just such a Lean Data Warehouse. With a conceptual overview and a few medium-dive examples, this original article points the way forward.


Get every new post delivered to your Inbox.