Under subject areas, you can find a sample of the things I like working with. There are some general themes in this blog which I will summarise here.
Big Data vs. Data Warehousing
My basic architectural argument is that you should target data warehouse products at systems that HAVE a data model, and that Big Data is best used where the model is more loosely defined and nearly schema-less. The latter includes many of the data structures traditionally used for auditing and history tracking.
If you think of the data flow at a very high level, from a source to the warehouse, this is my approach:
I have written quite a few blog posts that I hope come together into a coherent set of design guidelines. To make them easier to locate, I have provided the index below.
Keys and how to handle them
To deliver clean and integrated data, proper, high-performance handling of keys is crucial. I have blogged extensively about this here:
- Why Integer Keys are the Right Choice
- Good keys, what are they like?
- An Overview of Source Key Pathologies
- Transforming Source Keys to Real Keys (two parts)
- Physically Placing the Maps in the architecture
- Why Surrogate Keys are not Good Keys
Information about large databases and how they work.
I like to put “common sense” recommendations from the database community to the test. Here are the posts I have written about this so far: