
The Data Vault vs. Kimball

Here we go again: the discussion about the claimed benefits of the Data Vault. Thomas Christensen has written some great blog posts about his take on the Vault method, and Dan Linstedt has been commenting.

The discussions are a good read and worth tracking:

Apologies in advance for the spelling errors in my comments; they were written while travelling.

As with most complex subjects, it is often hard to get people to state clearly what EXACTLY their claim is and what arguments support it. The Vault seems to be no exception. I hope the discussions lead somewhere this time.

For your reference, here are some of the posts I have previously written that solve the postulated problems with the Kimball model.

Notice that, in the comments on Thomas Christensen's blog, Sanjay makes some very interesting claims about normalization creating more load and query parallelism. I personally look forward to hearing the argument for that.


  1. Dave Ballantyne

    Thanks for collecting these. Would I be correct in assuming that your viewpoint hasn't changed since these discussions took place? I too have been frustrated with the DV and the religious arguments that are thrown up: you have to believe to see the point in believing. This seems to be compounded even further by the "extra secret sauce" (my words) in the DV 2.0 model http://danlinstedt.com/datavaultcat/data-vault-2-0-being-announced/ It may also be interesting to note that the DV model allows for a "point in time" table http://www.tdan.com/view-articles/5067/ that de-normalises away the BETWEEN problem.

    • Thomas Kejser (Author)

      Dave, my view basically has not changed. I think that DV is a dangerous modelling technique because it pretty consistently gets you into trouble with relational databases and their optimisers. It is also much harder to do than it needs to be; there really isn't (much) wrong with the dimensional model. What troubles me most is that it is very hard to pinpoint exactly what problem Dan claims to solve.

      I am aware of the de-normalised "solution" to the BETWEEN problem. But one really has to ask: if that is necessary (and it carries a significant storage overhead), why did you bother doing it like this in the first place?

  2. Pingback: WMP Blog » Data Warehouse Architecture: Inmon CIF, Kimball Dimensional or Linstedt Data Vault?
