May 22, 2015March 23, 2016 by Daniel Upton

Universal Data Vault: Hyper-Generalized Data Vault — Review of John Giles’ Case Study

I recently came across an interesting data-modeling case study. John Giles, who wrote it, is also the author of the insightful and — for us data geeks — entertaining book, ‘The Nimble Elephant’. In this UDV case study, John combined his strategy of applying, and establishing variations from, universal data-modeling patterns to Dan Linstedt’s Data Vault EDW methodology, with very practical results in a challenging scenario. Here are my unofficial summary and impressions.

Case Study Link: http://www.countrye.com.au/new/wp-content/uploads/2013/01/Universal-Data-Vault-case-study.pdf

John needed to design an enterprise data model that had to align with and support Master Data Management, an Enterprise Service Bus, and a Data Warehouse associated with a substantial enterprise integration effort based on organizational acquisitions. The fact that I was riveted while reading this is a clear sign that I am, indeed, a data geek.

Here is my impression of his design challenge…

A government agency will consolidate 83 previously autonomous regional fire-fighting groups into a single organizational (informational) hierarchy. Although these organizations performed similar life-saving operations, they each chose to organize that work, and thus their operational information systems, around entities with modeled granularities that not only varied widely, but also show mutually-inconsistent and overlapping mixed (think: ad-hoc, imperfect, odd) granularities. For example, one group’s table for a firefighting asset may stored a consistent granularity at the level of (aircraft, land-based, team). A second group’s asset granularity may be highly generalized, simply as (asset) to cover all assets regardless of characteristics. It get’s worse. A third group’s granularity might be oddly mixed, as (helicopter, fixed wing aircraft, land-vehicle, firefighter, and a fourth group, mixing granularities in yet another ad-hoc way, uses (aircraft, fire engine, fire-fighting personnel transport, firefighting team). With group 3 and 4, note that they have a conceptually inconsistent mixing of asset granularities that cut across each other. These, and many more oddities of mixed granularity were observed within an average of 5 source systems and a combined 20 core entities for each one of the 83 groups to be acquired. All in all, of the various source systems combined ~8,000 tables, approximately 1,600 mission-critical tables existed.

Project management expectations include the following…

Source system data should not be subjectively transformed or re-interpretted in order for it to fit into the target data structure. As such, early interest in Star Schema models were abandoned.
Adaptability to Change: In addition to the currently planned ~1,600 tables to integrate now, many more of the total 8,000 would eventually be brought in. This had to be accomplished with minimum refactoring of the early-deployed versions of the DW and MDM solutions. As such, suspicions arose as to whether a classic 3^rd normal form entity relational model would be overly rigid.
Auditability and Historical Change Tracking: In the new EDW and MDM systems expected to use this model, each record and field in each target table had to be explicitly traceable to a record and field in a table from a source system at a specific group. The auditability requirement also asserted that changes to source system data, whether overwritten in those systems themselves, must have their change history tracked in the target model.

Upon considering a standard Business Data Vault architecture, wherein a Business Key would be identified, and instantiated as a Hub table for each business concept — essentially, each fixed-granularity level for every in-scope business subject area — would have resulted in hundreds of Hubs, in addition to some multiple of that for the number of associated Links and Satellites). From here, one can only imagine if, in the future, yet more groups were brought in, and those groups used still-different mixtures of granularities for assets or other entities. This problem, by itself, would likely sabotage the ‘Adaptability to Change’ mandate.

As I mentioned in my opening statement, John Giles specializes in using, and applying variations to, data modeling patterns. His combination of classic sub-type / super-type model patterns with Business Data Vault methodology is, from my perspective, ingenious. Here is the link to his ‘Universal Data Vault’ case study…

http://www.countrye.com.au/new/wp-content/uploads/2013/01/Universal-Data-Vault-case-study.pdf

Reminder and some key takeaways…

With my already-stated caveat about John’s specialty in combining and applying variations on data modeling patterns, readers should assume right up front that he has added non-data-vault types of tables to this solution.
Hubs, in this case, can be used to manage a business key that has wide variety of granularities, through the use of UDV’s recursive, generalized super-type tables (Hub-Type, Link-Type).
John mentions that, with the substantial reduction of Hubs as an intended upside, two downsides are that (a) with the use of mixed granularity hubs, the granularity of each Hub (the range of granularities) is less than obvious; and (b) the UDV renders that already-challenging work of identifying Business Keys even more challenging.
Data Vault’s inherent flexibility around the selection of Satellite tables is superbly demonstrated here, with the implicit addition of one more UDM-specific rule for ‘how many Satellites per Hub”. That additional rule might be described as ‘How many levels of granularity are existent for a given multi-granular Hub?

The end result of the case study is a Data Vault schema that contains only a fraction of the number of Hubs / Ensembles than it otherwise would, but still benefits from Business Data Vault’s elegance and avoidance of data integration resulting in data over-transformation, with the resulting loss of auditability and fidelity to source systems.

Parting Questions:

Is there any significant downside for the Data Vault methodology to be extended or combined with other patterns?

Enjoy John Giles’ Case Study of Universal Data Vault.

http://www.countrye.com.au/new/wp-content/uploads/2013/01/Universal-Data-Vault-case-study.pdf

Your comments are always welcome.

27 thoughts on “Universal Data Vault: Hyper-Generalized Data Vault — Review of John Giles’ Case Study”

Kent Graziano says:

June 1, 2015 at 9:13 pm

The main downside of generalizing a DV solution is the business keys are not so obvious and hence the model is harder to interpret without the knowledge of the implied super and sub types. Pure DV makes it easy to see the business concepts quite clearly (but yes potentially at the cost of lots of tables).

Since I was a co-author with Len on the first edition of the UDM series (and learned at the feet of my friend & mentor David Hay), I am a fan of the patterns for sure, especially for an enterprise conceptual model, but when it comes to BI, more specificity helps the business users understand the model faster. In these days of demand for self-service BI, I would be very cautious about over generalizing and model that will be used for reporting.

LikeLike

Reply
DecisionLab says:

June 1, 2015 at 11:27 pm

Hi Kent,
I did notice recently that you co-authored that book with Len Silverston. Keep in mind that, in this scenario, a DW (thus BI) was only one of three requirements that the data model had to support, in addition to Master Data Management and an Enterprise Service Bus. But, to keep our discussion focussed, let’s suppose for now that it was only for an EDW for BI.

For an EDW, sub-type / super-type tables are not something one would normally use, especially if the source data were not already in that form. However, in this case, without generalizing tables, a Business Data Vault (eg. Keys aligned into common Hubs, perhaps with N source-specific Satellites per Hub) would likely turn the 1,600 source tables into 3,500 tables. Such a number of tables itself represents a potential barrier to interpret-ability. As I read the case study, I was imagining how much ETL code would be required to load a standard DV, as well as the amount of re-interpretation of entities which had mixed / crossed granularities and semantics among the organizations. Data Vault is supposed to avoid data re-interpretation, and this case study specified that avoiding re-interpretation (naming, relationships) was to be avoided.
Lastly, since Ralph Hughes’ upcoming book will describe hyper-generalization as a DW design approach, I thought that this Universal DV case study was interesting and fairly detailed. For me, the case study was a good brain-stretch.

I will ask John Giles if he’s willing to share any add’l info about the source tables inconsistencies.

As a closing hook, Kent, do you often find customers who want you to implemenent Data Vault ‘just in the places where it’s needed’? Personally, I’ve not yet become comfortable with a partial DV EDW design. As (or if) Data Vault becomes more popular in the U.S., I think that requests, or attempts, such as this will only become more common. What are your thoughts?

LikeLike

Reply
1. Kent Graziano says:
  
  June 2, 2015 at 4:10 am
  
  Well to a large extent, the number of tables has become less of a concern since there are several tools now that help automate the generation of ETL code ( or more properly ELT code). At the 2nd annual World Wide Data Vault Consortium last week quite a few of us presented not only how we are automating data vaults but also virtualizing or information marts on top of the vaults. Check out the Twitter stream for #WWDVC to see some of the discussions.
  
  Dan’s next book has a bunch of example core that should help clarify some of this for folks. It is available for pre order on Amazon now.
  
  In your commentary and the case study you both mention Business Data Vault but no mention of the raw DV. Can you clarify what you mean by BDV? I want to be sure we are all using the same definition. (You might not be using the same one as in the Super Charge book)
  
  And no, no one has ever asked to have just part of their data warehouse be a data vault. 🙂
  
  Definitely an interesting case study. I presented a DV hybrid last week myself. It would be great to have John present his UDV at next years event.
  
  LikeLike
  
  Reply
DecisionLab says:

June 3, 2015 at 7:07 pm

Automation of Data Vault design and/or loading is a big deal, and I’ll look at Dan’s upcoming book. Having said that, the goal was ease of downstream usage, and I suspect that there is a threshold where entity granularity may be so inconsistent, so mixed between data sources, wherein the sub-type / super-type pattern will actually increase the downstream usability of a Data Vault EDW, like it does so well with traditional entity-relational 3NF.
Business Data Vault: Business-key alignment, wherein one Hub is a parent of N Satellite tables, each of which may originate from one or more source tables within one or more data sources. I think that Hultgren defines it like this. Also thought Super Charge and Business SuperModel did, but it’s been awhile.

LikeLike

Reply
1. Kent Graziano says:
  
  June 4, 2015 at 4:11 am
  
  Sounds like you are talking about a raw data vault then (if this is where the data first lands from the source). A Business Data Vault is an extension using standard DV objects but some of the data may be transformed according to soft business rules so that all downstream marts get a consistent answer (as opposed to trying to apply the business rules transformations on the way into each mart.)
  
  LikeLike
  
  Reply
DecisionLab says:

June 7, 2015 at 5:10 am

Well, you are the author, Kent. Would you please remind me where in the books (“Super Charge…”? Or is it “New Business Super Model”), that I will find the explanation for these “…transforms according to soft business rules”? I may be using, or mis-using, Hans Hultgren’s definition.

LikeLike

Reply
1. Kent Graziano says:
  
  June 7, 2015 at 1:54 pm
  
  In the Super Charge book it was just called Data Vault. Dan later wrote a white paper about the BDV. Looks like a DV but has business rules applied to the raw vault data to simplify feeding downstream marts. So you only apply the transformations once for all. With giving that its own name we then had to start referring to the original as a “raw” DV. The definition you gave above is the classic definition if what I referred to as raw DV.
  
  LikeLike
  
  Reply
  1. DecisionLab says:
    
    June 7, 2015 at 4:20 pm
    
    On danlistedt.com, I see (and will now read) the 2010–4-08 entry for ‘Data Vault and Business Data Vault Mixed Together’, but that is not likely what you are referring to. Would you mind sending me a link to it or tell me how I can otherwise access it?
    
    LikeLike
  2. Kent Graziano says:
    
    June 7, 2015 at 4:26 pm
    
    I will see if I can find it. I am sure I have a copy somewhere.
    
    LikeLike
2. Kent Graziano says:
  
  June 9, 2015 at 8:26 pm
  
  Found it in Super Charge – there is a section starting on page 21 that describes the Business Data Vault.
  
  There is also a more detailed white paper as I mentioned. Trying to track that down for you. It may have been part of a package on LearnDataVault.
  
  LikeLike
  
  Reply
Daniel says:

June 10, 2015 at 12:36 am

Thank you, Kent. Just read the Super Charge section you mentioned, plus it’s description of BDV. I’ve also found blogs by Dan, Ron Damhoff, Hans Hultren and Martijn Evers, which I will plow through asap.

…and right you are. I was referring to Raw Vault (but mistakenly calling it Business Vault), with business-key-aligned Hubs. Previously, I had taken the term Raw Vault to indicate a Vault without cross-source-system business-key-alignment in Hubs (wherein each Hub-Satellite set/ensemble can only be sourced from a single source table). Herein, I’ll just refer to that as …a mess.

LikeLike

Reply
1. Kent Graziano says:
  
  June 10, 2015 at 12:50 am
  
  That makes more sense. We now call that a “source system” vault – highly discouraged. Usually happens when the source has all surrogate keys and the architect is in too much of a hurry to figure out the business keys. So, yes, a mess. Basically you end up with a persistent source stage area that just looks like a data vault.
  
  Glad we cleared that up.
  
  LikeLiked by 1 person
  
  Reply
dlinstedt says:

June 10, 2015 at 2:21 pm

Guys, sorry I’m late to the conversation – here’s my two cents for what it’s worth…

Source System Vault – a Data Vault Model 1:1 with A SINGLE Source system. Lacking any and all integration by business key, highly discouraged – as there is no reason to “restructure data” from the source, just to land it in a Vault Model.

Raw Data Vault or “Data Vault” as I have defined it so many times… a Data Vault Model, with Raw Data – purpose built to be integrated by Business Keys. Business Keys are THE tie to business processes, containing same grain, and same semantic definition across the organization / company (horizontal definitions). They tie directly to the Business Glossary. The rest, as they say – is taught in my DV2 boot camp class. It is also offered clearly in my new book (pre-order on Amazon) Building a Scalable Data Warehouse with Data Vault 2.0.

** I take exception to any other definition offered by any other author around the world. Unfortunately, I have to tow the party line here, as I have seen too many “deviations” of my definitions end up “failing” in the hands of the business **

Business Data Vault – again as I have defined it in the Super Charge Your Data Warehouse book, also available on Amazon, and again in Data Architecture: A Primer for the Data Scientist which I co-authored with Bill Inmon. The BDV is more Data Vault structures, however at a higher level of grain (focused on master data, consolidation, coalesced information, merged, aggregated, and altered information). It meets the business needs, and houses Post-Soft-Rule processed data.

Now: Soft & Hard business rules I’ve only ever published in my class: Data Vault 2.0 Boot Camp, and I’ve not released the article / PDF / white paper that Kent made reference to.

Hard Rules: any “rule” that does not change the grain or the content of the data. Examples of this include: Restructuring, Normalization, Denormalization (with appropriate key retention), Assigning Load Dates, Hash Keys, Sequences, Record Sources, and Populating NULL business keys with Service Level Agreement based values. Generally, low propensity to change, and low complexity, can be done in a pass-through process.

Soft Rules: any “rule” that the business owns, but more than that, any rule that changes grain or content OR CONTEXT (interpretation) of the data set. Basically everything else that a hard-rule ISN’T.

Yes, there is a gray line between the two – especially when Super-Typing and type-coding As John Giles has suggested. Yes, there is a slippery slope problem here, that architects need to be careful of not to get on to. Yes, John Giles approach has merritt – but if you read closely, you’ll also soon realize that his approach is followed / coupled with a set of standards and rules that dictate WHEN TO supertype and when NOT TO supertype, it’s why I like the approach.

No, his approach is not standard, and will not work for all clients. Yes, his approach is considered an Advanced Technique, to be applied by someone who really truly understands the nature of super-typing without OVERLOADING fields. There-in lies the rub.

Hope this helps clear the air.
Dan Linstedt
http://KeyLDV.com – Boot Camp Class for DV2 coming on-line soon!

LikeLike

Reply
dlinstedt says:

June 10, 2015 at 2:23 pm

Guys, sorry I’m late to the conversation – here’s my two cents for what it’s worth…

Source System Vault – a Data Vault Model 1:1 with A SINGLE Source system. Lacking any and all integration by business key, highly discouraged – as there is no reason to “restructure data” from the source, just to land it in a Vault Model.

Raw Data Vault or “Data Vault” as I have defined it so many times… a Data Vault Model, with Raw Data – purpose built to be integrated by Business Keys. Business Keys are THE tie to business processes, containing same grain, and same semantic definition across the organization / company (horizontal definitions). They tie directly to the Business Glossary. The rest, as they say – is taught in my DV2 boot camp class. It is also offered clearly in my new book (pre-order on Amazon) Building a Scalable Data Warehouse with Data Vault 2.0.

** I take exception to any other definition offered by any other author around the world. Unfortunately, I have to tow the party line here, as I have seen too many “deviations” of my definitions end up “failing” in the hands of the business **

Business Data Vault – again as I have defined it in the Super Charge Your Data Warehouse book, also available on Amazon, and again in Data Architecture: A Primer for the Data Scientist which I co-authored with Bill Inmon. The BDV is more Data Vault structures, however at a higher level of grain (focused on master data, consolidation, coalesced information, merged, aggregated, and altered information). It meets the business needs, and houses Post-Soft-Rule processed data.

Now: Soft & Hard business rules I’ve only ever published in my class: Data Vault 2.0 Boot Camp, and I’ve not released the article / PDF / white paper that Kent made reference to.

Hard Rules: any “rule” that does not change the grain or the content of the data. Examples of this include: Restructuring, Normalization, Denormalization (with appropriate key retention), Assigning Load Dates, Hash Keys, Sequences, Record Sources, and Populating NULL business keys with Service Level Agreement based values. Generally, low propensity to change, and low complexity, can be done in a pass-through process.

Soft Rules: any “rule” that the business owns, but more than that, any rule that changes grain or content OR CONTEXT (interpretation) of the data set. Basically everything else that a hard-rule ISN’T.

Yes, there is a gray line between the two – especially when Super-Typing and type-coding As John Giles has suggested. Yes, there is a slippery slope problem here, that architects need to be careful of not to get on to. Yes, John Giles approach has merritt – but if you read closely, you’ll also soon realize that his approach is followed / coupled with a set of standards and rules that dictate WHEN TO supertype and when NOT TO supertype, it’s why I like the approach.

No, his approach is not standard, and will not work for all clients. Yes, his approach is considered an Advanced Technique, to be applied by someone who really truly understands the nature of super-typing without OVERLOADING fields. There-in lies the rub.

Hope this helps clear the air.
Dan Linstedt
http://KeyLDV.com – Boot Camp Class for DV2 coming on-line soon!

LikeLike

Reply
DecisionLab says:

June 12, 2015 at 9:13 pm

Thanks for the insights, Dan. Worth more than two cents, and I do look forward to your upcoming DV2.0 book.

Here is some context for my interest in UDV.
Awhile go, I was directed to model a sprawling source system vault — to be loaded from four source systems with related data but lacking any single field business-keys — by a DW manager who, in hindsight, now seems to have badly misread SuperCharge’s guidance on business key alignment and elected to skip the work of identifying business keys (possibly since the business keys would have needed to be multiple columns alongside each other, or concatenating together). Although new at the time to Data Vault modelling, I was already concerned about the weakness of using simple links (vs. key-aligned hubs) as integration-points within, and across source systems, but was unable to sway the decision or reverse the architectural mandate and get time to identify multiple-column business keys. When I then stumbled-across the ‘Q_ _ _ _’ DW automation product that automatically generates a source system vault and called it DV, their product rep called it a ‘Raw Vault’. Although I avoided the product because of its lack of business key alignment, but walked away with wrong language, believing that ‘Raw Vault’ was a source-system vault. Yes, I now see that SuperCharge, as well as the insightful blog-exchanges between Dan, Ron Damhoff, Huns Hultgren and Martijn Evers did clear up the language of Raw Data Vault vs. (soft-rules) Business Data Vault for me, and from what I read, for others. At any rate, now I consider myself cleansed of the semantic slip-up, so let’s go back to the UDV topic.

Since then, I have designed, instantiated, and overseen ETL for a number of other Data Vault implementations, all but one have been fully business-key-aligned Data Vaults, good (Raw) Data Vaults, across many source systems, and those were very successful implementations under challenging conditions (weak business requirements, many disparate source systems, but go fast …you know the drill). The one which was only ‘lightly’ business-key-aligned was based largely on a sub-type / super-type data source, to which a messy structure of, let’s call them ‘Super-Thing-Type-Reason-Group-Stormy-Monday’ tables had been tossed in. As you describe it, these were badly overloaded fields, and a real mess. This mess was the reason, actually, for my interest in The Universal Data Vault case study to begin with. There had to be a better way.

UDV looks like a thoughtful way to adapt Data Vault to relate to sub-type / super-type data sources. Yes, John Giles’ ‘Universal Data Vault’ (herein ‘UDV’) case study itself looks like an advanced Data Vault hybrid. Advanced, for one thing because, unlike my situation described above, there was no explicit indication by Mr. Giles that the source data itself was already in a sub-type / super-type form, but rather that it was selectively transformed into the form based on soft rules about where granularity ‘sweet spots’ could be nailed down into a manageable number of Hubs and where they needed generalization, in order that the data warehouse avoid an excessive proliferation of tables with many mixtures of granularities of, essentially, the same actual real-world entities across the continuum of meaningful granularities. As such, we can assume that soft (interpretative) business rules were applied in ETL for loading into the UDV. Yes, a hybrid for sure.

Two granularity challenges that were dealt with.
(1) “Deployable Resources” (p. 3 bottom): Granularities varies among data sources, roughly as follows: (‘aircraft’ vs. ‘fixed-wing’ and ‘helicopters’ (now, and perhaps next year a new org might add ‘jet fixed-wing’ and ‘propeller fixed wing’), as well as the following (‘fire truck’ vs. water-mega-tanker-truck or compact fire truck or 4WD ‘slip-on tanker trailer’)
(2) “Asset Contains Asset” (p. 11 top): ‘Fire Response Unit’, where granularity varies between data sources as, in one source, a vehicle containing crew members and, in another, as a team with individuals as well as vehicle(s) as team-components.

Other entities, with more fixed granularities, were described as being more directly manageable as Hubs.

Downside: As with all sub-type / super-type forms (and unlike a entity-based tables), the data structure does not itself establish a natural granularity that aligns 1:1 with a single business process. When overloaded, as you say, Dan, it can make the underlying business logic hard to discren from the data structure. Also, non-grouped, un-filtered aggregates are relatively meaningless which, at least for ad-hoc querying common to analytics, places a bigger burden on indexing to maintain acceptable query performance against large data sets.

Upside: Avoidance of Hub proliferation beyond what was deemed reasonable for useability and reporting according to the client organization’s accepted terminology. Of course, that mixed terminology may eventually show up in reports where.

I appreciate the brain-stretching that this case study, as well as this discussion, have so far both given me. Once stretched, the brain does not contract. If I had seen this case study a few years ago, my inclination would have been to adapt the UDV generalization patterns, and, with it, design a more Lean Data Warehouse based on sub-type / super-type data sources.

LikeLiked by 1 person

Reply
John Giles says:

June 13, 2015 at 3:13 am

G’day Guys,
With so many good comments, I am not sure how much more I can add. But a few thoughts follow.
1. I suspect that there is a danger that some who see the phrase “Universal Data Vault” may very understandably think I might be suggesting it is a “universal” solution to all DV implementation i.e. to be applied in every case. Quite the contrary. The prefix of “Universal” was my attempt to give credit to the data model pattern authors that have made such a huge contribution to our industry. At the end of the article I briefly tried to communicate that there are other options for DV implementations, and I suggest they should most definitely be considered, and often ahead of this option. UDV might best be seen as an advanced Data Vault modelling technique that may well be useful in some circumstances, but it does come with some danger of misuse, especially by less experienced modellers.
2. Further, and as noted by Dan’s comments earlier in this blog (thanks Dan), the paper does suggest there are some guidelines to consider as to when super typing / subtyping may deliver value, and that this UDV approach should be used selectively. I sincerely hope that my case study experiences may have provided some helpful techniques, perhaps most commonly applicable by those who see a worrisome growth in Hubs that maybe could be consolidated.
Please keep the dialogue going. I really appreciate others challenging & extending the conversation.
Regards,
John.

LikeLike

Reply
DecisionLab says:

June 13, 2015 at 2:04 pm

Hi John,
Very good point about ‘Universal’. Readers should understood that by ‘Universal’, you mean ‘generalized with sub-type / super-type’ form, and not ‘Universally Applicable’. Of course, the existing term ‘Universal Data Models’ does inherently carry some of that tantalizing connotation.

Some questions: Am I correct in assuming that your source data, from all of those disparate systems, was not primarily already in sub-type / super-type form? Regardless, did you identify any highly reusable design / code patterns with which to inform the ETL into your UDV?

LikeLike

Reply
John Giles says:

June 16, 2015 at 9:38 am

G’day Daniel,
I will try and address your two questions.
Firstly, whether or not the many sources were already reflecting their own super-type/sub-type structures or not. I look at a small subset, and across those, each reflected its own preferred single level of specialisation. To use the Aircraft versus Helicopter or Fixed-wing scenario (which by the way I introduced to assist in explaining UDV, but was not part of that particular case study), one source system might refer specifically to just aircraft, and another system might implement two separate concepts, namely helicopters and fixed-wing planes. The super-type/sub-type hierarchy was introduced to assist consolidation in the central DV.
Secondly, the ETL. As unfortunately sometimes happens, a consultant moves on before seeing the full development. That happened here (budget constraints). And even after I left, the backloading of legacy data from the 83 organisations was delayed – again, competing funding priorities, I suspect! So I can only comment on some arms-length thinking about partial loading.
The bottom line is that I think ETL into a UDV is very similar to that for a regular DV. A given source may load a physical generalised Hub, but it must map to the (logical) sub-type. The Satellites may be split across some Satellites defined against the super-type, and some against the sub-type(s), but these are just Satellite splits. I expect you may end up with the same number of Hub instances as you would in a regular (more specialised) DV, albeit in a smaller collection of physical Hub tables. Likewise the number of Link instances is likely to be the same whether in a regular DV or in a UDV implementation, again with the UDV having less physical Link tables. So I suspect the level of reusability is potentially going to be similar across either approach. But as always, I welcome the views/comments of others.
One other point. You ask about the ETL to get data in. One of the developers observed a delightful ease in capturing data (as compared to other data warehouse approaches), but did note an increased difficulty in extracting data. At that point in time, I do not believe the client had made a conscious separation of raw DV from business DV, and had not employed techniques such as Bridge and Point-In-Time tables. I suspect this would have helped.
Just my “penny’s worth” (and a penny is less valuable than “two cents” – the value you suggested Dan’s contribution far exceeded!)
Regards,
John.

LikeLiked by 1 person

Reply
DecisionLab says:

June 16, 2015 at 5:36 pm

John,
Regarding your use of the terms: instances…, logical…, physical table, are you doing this to distinquish between a logical data model with specialized (standard Data Vault) logical tables, but a physical data model with the added generalized tables? If so, this would seem to be a good way to chunk-out the otherwise seemingly complex source-to-target mapping, first mapping source to data vault (logical only), and then data vault to UDV (logical and physical).

And yes, it is the nature of consulting that we do not always get to be there to see our designs morph into a completed, loaded database. But, as we get paid while helping others while also learning along the way, we are like the son in the ‘Sixpence none the richer’ passage in one of C.S. Lewis’ books: See ‘…Origin of the Bands Name’ at https://en.wikipedia.org/wiki/Sixpence_None_the_Richer

…it was the best I could do on the ‘two cents’ …’penny’s worth’ continuum…

LikeLiked by 1 person

Reply
John Giles says:

June 17, 2015 at 10:09 am

Daniel,
We have probably all observed debates on what constitutes a logical or physical data model (let alone a conceptual one). I can quickly put all that aside in the context of your question. My comments were not intended to relate to data models at all. It was intended to be about the DV (or UDV) implementation itself.
Can I please refer back to some diagrams in the UDV paper? In one that reflects a more traditional DV implementation, I had a physical table I named Hub_Aircraft – a somewhat specialised Hub. In the following UDV diagram, I had a physical table named Hub_Asset, which could be classed as a super-type. I chose not to create physical Hubs for Aircraft, Fire Trucks, Pumps, Generators etc for the more specialised concepts. Instead, I “logically” noted the existence of these sub-types by defining them as rows in the Asset Type table. I hope that helps clarify my use of the terms “logical” and “physical” in my previous comment.
So I am afraid I am not directly responding to your insightful comments. Can I please briefly throw in a comment that might assist? For me, when I have encountered a difficult DV design decision, sometimes I found it worthwhile to stand back and throw together a “normalised” view of the data as I might model it for an operational source system, and then revisit the DV design decisions. How does this sound to you?
Regards, John.

LikeLiked by 1 person

Reply
1. dlinstedt says:
  
  June 18, 2015 at 4:06 am
  
  Hi John,
  
  Sounds ok to me. I just want to be absolutely clear with the reader base – make sure that others understand the following principle:
  
  1) Base standards of DV state: do not super or sub-type – why? because it leads to overloading & complexity, and it leads down the slippery slope of implementing business rules up-stream of the EDW / DV (which can cause issues in scale and velocity of arrival).
  2) Your minimal and applied use of Super-type is a neat case, and should be utilized with caution: a) you have rules & standards – which is a good thing – that can make it work, b) it doesn’t apply to all cases, c) it’s an advanced technique and should not be applied by people that are new to the DV concepts / constructs. d) you’ve had a lot of experience in this area, so you know where the pitfalls are, and can avoid them with your rules & standards.
  
  Really, that’s what my comment is about. By the way, I am thoroughly enjoying the Nimble Elephant book, awesome context. Thank-you John for the signed copy, everyone should read it.
  
  Cheers mate, Dan
  
  LikeLiked by 1 person
  
  Reply
  1. DecisionLab says:
    
    June 22, 2015 at 10:07 pm
    
    Dan,
    Agreed. Readers should keep w/ standard data vault whenever possible.
    Question: Have you published, or do you know of published content, about if, or how data vault can or should deal with data sources that are already in super-type / sub-type form?
    This was actually the justification for my interest in John’s UDV, because I had to model a data vault sourced from an entity-based data source plus a super-type / sub-type source, specifically one to which the super-type / sub-type model, which was part of a purchased application, had been extended a great deal (perhaps over-extended) by my customer-organization.
    
    LikeLike
2. DecisionLab says:
  
  June 22, 2015 at 10:01 pm
  
  John,
  Yes, that does answer my questions.
  Thanks very much for the excellent feedback. I hope that you will subscribe to the blog to see new my upcoming stuff, much of which will be original content.
  Danielhttps://danielcupton.wordpress.com/wp-admin/edit-comments.php#comments-form
  
  LikeLike
  
  Reply
dlinstedt says:

June 23, 2015 at 4:30 pm

RE: super-sub typing content

I only teach this in my DV2 Boot Camp Course. At this time, I have not published any information in public forums with regard to this particular question.

My short answer is this: keep the grain of the Raw DV at the grain of the source system, especially with the splits of the Satellites by source.

We do not condone, nor generally ascribe people to produce super/sub types in the Vault model (except the business Vault). The reason why is simple: soft business rules are a slippery slope, and today’s super-type is not tomorrow’s super-type. As these rules change, if you’ve embedded them upstream, then you must re-architect your entire solution to make the super-types fit. Not something you can do in a large volume solution.

Anyhow, more is taught in the DV2 boot camp class.

Hope this helps,
Dan

LikeLike

Reply
1. DecisionLab says:
  
  June 23, 2015 at 5:18 pm
  
  Makes sense. Thank you, Dan.
  
  LikeLike
  
  Reply
dlinstedt says:

October 5, 2015 at 8:53 am

A little late on the take here, but John Giles agreed to present this concept at WWDVC 2016 (world wide data vault consortium) http://wwdvc.com – It would be great to see all of you there.

LikeLike

Reply
Daniel Upton says:

October 5, 2015 at 9:09 am

Good to hear! That’s another good reason to attend WWDVC 2016.

LikeLike

Reply

DecisionLab.Net

Universal Data Vault: Hyper-Generalized Data Vault — Review of John Giles’ Case Study

27 thoughts on “Universal Data Vault: Hyper-Generalized Data Vault — Review of John Giles’ Case Study”

Leave a comment Cancel reply

Share this:

27 thoughts on “Universal Data Vault: Hyper-Generalized Data Vault — Review of John Giles’ Case Study”

Leave a comment Cancel reply