Defining the Good Data Model

Designing data models is fun – at least if you are a geek like me. But as much as I like the academic thrill of building something that is complex – I am aware that it is often humans that eventually must see and maintain my beautiful (data) model. How can we design something that humans can understand?

Humans are buggy! In general, they don’t deal well with complexity. You can blame modern education, you can scream and shout, or languish on the fact that the IT industry is riddled with incompetence, you may even throw Kimball or Inmon books at the wall in anger. But the empirical tests all show the same: the wetware is the final test of the model.

I will venture a definition of what constitutes a “good data model” for a DW/BI solution

Good DW model ::=

  1. Data in a good model can be easily consumed
  2. Large data changes in a good model are scalable
  3. A good model provides predictable performance
  4. A good model can adapt to changes in requirements, but not at the expense of 1-3

Ad 1) This is the requirement that makes the model usable and maintainable by humans. For now, note that If you are serving data to an end user, you will most likely require something that is mostly de-normalized, or at least looks like it is. If anyone reading this has successfully implemented a large, 3rd normal form data model that non-computer science users find delightful to query for reporting purposes – please comment. I would be curious to know how you did it.

Ad 2) Our goal is to design data warehouses. This means that we have taken on the task of collecting and aggregating all information in an organisation. We can generally buy more hardware (if we put up a fight with infrastructure) – but we cannot significantly reduce incoming data sizes and still maintain flexibility. Scalability of the model means that we gracefully maintain load speeds as data grows by adding more hardware resources.

Ad 3) If there is one thing end users hate more than slow performance, it is unpredictable performance. The same is true for the poor DBA who has to get up in the middle of the night to fix a batch run that did not complete in time. The good data model seeks to minimise risk by creating predictable response times. Note that this may even happen at the cost of some performance: the predictably “good enough” is not always the fastest (though we take delight in our design when it is). For now, note that predictable performance is not always in reach – but it is a goal we aspire to.

Ad 4) I foresee that this will turn out to be one of the central contention points in the discussions to follow. Changes are a fact of life. If you have not yet experienced the intense pain of changing a beautifully designed data model half way through a project, I recommend you go work a few years for any organisation led by a guy with an MBA degree who believes 10% growth is maintainable forever, and that the organisation must react quarterly to “market changes”. I am sure you will find plenty such employers that will teach you the nature of change for good mammon. For now, I will assume that you believe that “change happens” – though we shall later look more closely at what form such change typically takes.

1-3 must often be balanced with 4. The data model must be flexible in some way; it must remain agile. Yet I will not sacrifice the usability of the model on the altar of agility. And staring into the abyss of slow performance chanting protective spells of “flexibility” is not the game I am in either. I hope to show you that doing a good job at 1-3 will often remove some of the fears bred by requirement 4.


  1. Pingback: A Data Warehouse in 4 steps « Data Warehouse Junkie

  2. Jag   •  


    I attended your BI pre con at sqlbits in Brighton, look forward to your blog posts.


    • Thomas Kejser   •  

      Hi Jag

      Looking forward to your comments

  3. Marco Schreuder   •  

    Hi Thomas,

    This has the makings of a promising series on data warehouse modeling. Look forward for more.

    With respect to 1). Seems to me that that could easily be accomplished by giving your business users only access to the data marts and not to the relational data warehouse. This has many advantages with respect to the flexibility you need for 4)

    As to your earlier question to prettify your code. Not exactly a plug-in but very handy:

    • Thomas Kejser   •  

      Hi Marco

      I was actually hoping for such a comment 🙂 You are correct that this will partially be achieved by building a seperate database for the business users. Part of my recommendations will include some amount of data marting. HOWEVER, you are just moving a problem around by doing this. Yes, you have provided your business users with a pretty star schema (bullet 1) – but you have also moved the problem of consuming data to the person that must now load the mart. With regards to changing requirements (bullet 4) – you are not really adapting, since you now have rewrite all your data mart loaders when you change the main DW – no free lunch. I wont give the full argument away (yet) – but one of my key points will be that you don’t actually solve anything by introducing new models as intermediate areas of storage (you may see this as a direct contradiction of Inmon and a clear contradictio of the data Vault Movement). Stay tuned and looking forward to hearing you comments in this

      • Marco Schreuder   •  

        Yes,you’ll divide the loading of the data warehouse into 1) loading your relational data warehouse and 2) loading the data marts. And maybe that’s more work initially. (Although I think most of step 1 can be automated)
        But the advantage is of course that you will only have to build the loading for part 2, if requirements change or part 1, if source systems change. And yes it’s not that black and white, but in essence you’ll save yourself a lot of work in the long run.

        Look forward to the rest of your arguments and who knows … maybe you can change my point of view.

Leave a Reply

Your email address will not be published. Required fields are marked *