First publicly available courses –in Aarhus, Denmark – 23-24 October
I am pleased to announce that both my Tuning Course and the Data Warehouse Modeling course will be run in Aarhus Denmark. The courses will be held in English, even though they are hosted in the very city I was born.
The course will be hosted by Orangeman, who will be handling the logistics. There are still available spots and if you act fast, you can get the early bird price.
The relevant details:
- 23rd October 2012: Tuning Course
- 24th October 2012: Data Warehouse modeling
See you there for some intense days of tuning and modeling.
Joining FusionIo
I am very happy to announce that I have signed a contract with FusionIo and will be joining them as CTO of EMEA from 1st September 2012.
As many of you know, I have worked together with FusionIo on many occasions and really enjoyed the collaboration. I believe that their products hold the keys to a new era of computing and it is an honour to join their ranks. I will be looking forward to doing a lot of exciting research and customer implementations for them.
This brings me to the work I have been doing since I left Microsoft. Here is how it will transfer:
Consulting Contracts
I have contracts with some customers open. These are all due to terminate before 1st September and I will of course honour my agreements here. Unfortunately, my new job will not allow me to continue the collaboration with these customers on a consulting basis after this. The good news is that my courses will still be available and I will be able to share my knowledge through this channel.
Courses and Conferences
Contributing information to the community is one of my great passions in life. FusionIo has allowed me to continue to pursue this interest. My courses will still be available, although only for a very limited amount of days every month as the course time will be coming out of my vacation days (hint on how to get a discount). I expect demand to be high. There are already three tuning courses set up across Europe which will be held as planned and a lot of people have made it clear they want more. I will be announcing the exact dates for courses planned on this blog soon and let you know how to join the courses that are open to the public. The material is looking amazing and is using the new format that has evolved at SQL BITS and driven the top scores there. I expect this will be my best presentations yet. I am also happy to announce that my data modelling course is well underway and will be available soon.
I will continue to submit abstracts for conferences and stay in close touch with the community, just like I have always done. And this brings me to:
Grade of the Steel
I am very excited that FusionIo has an interest in expanding the testing I have done with my Grade of the Steel Project. I will continue to run benchmarks on the latest and greatest storage and provide non volatile memory specific configuration and tuning guidance. Exactly which format the publications will take is too early to say, I will keep you posted on this blog.
What is the Best Sort Order for a Column Store?
As hinted at in my post about how column stores work, the compression you achieve will depend on how many repetitions (or “runs”) exist in the data for the sort order that is used when the index is built. In this post, I will provide you more background on the effect of sort order on compression and give you some heuristics you can use to save space and increase speed of column stores. This is a theory that I am still only developing, but the early results are promising.
Big/Big Table Joins
With the popularity of my last blog entry on Dangerous Joins, I felt inspired to write a bit more about the join strategies. Thanks for participating and reading, there seems to be a large appetite for Data Modeling out there – this blog now has over 7K unique visits every months.
Today, let us look at strategies for joining two big tables together.
How Vertical Partitioning and Deep Joins Kill Parallelism
Query plans with deep joins trees are often the result of high levels of normalisation in the data model. There are large advantages to normalising data as it minimizes the amount of data that must be written when a change happens. In traditional OLTP systems, this can be a boon.
However, normalisation is not without its costs – especially not in read intensive workloads like data warehouses.
Modeling Dimensions with History Tracked, Generic Attributes
Sometimes, you need to model a database in such a way that you can dynamically extend the model without altering any tables. Perhaps the attributes changes faster than you can add new columns or the data you store has a “ragged” structure that does not lend itself well to being described as a traditional table structure.
The typical approach taken in these cases is to “pivot” the required “flexible columns” of a table into rows instead and dynamically reconstruct the schema at runtime. When you model like this, adding new attributes to the model is simply a question of inserting rows into the database.
As I sure you are aware, there are issues with this approach. Tuning columns into rows can cause interesting issues for database engines. The flexibility requirement and performance loss to be balanced carefully. In this blog, I will walk you through an example of how to handle this generic case in a star schema.
How do Column Stores Work?
In this blog, I will provide you with some basic information about column stores. Nothing I am writing here is vendor specific IP, but merely taken from the papers published throughout history. One of the best papers that serves as an introduction is by Stonebraker:
- Stonebraker et a, Proceedings of 31st VLDB Conference, 2005: C-Store: “C-Store: A Column-oriented DBMS”
- The idea is older than that though, with the first papers published in the 1970’ies.
Shamefully standing on the shoulders of giants, I will walk you through a simple example which illustrate one of the key principles of column stores: Run Length Encoding (RLE).