As my regular readers have noticed, activity on my blog has slowed down lately. My new job at Livedrive is keeping me very busy and excited. It’s the opportunity I have been looking for: right in the middle of the Open Source vs. Microsoft cloud battle (and in a hectic development environment). I am greatly enjoying myself in this space.
However, I have precious little time to blog about SQL Server. And quite frankly, by now I feel there isn’t much more left for me to say on this subject. The time has come for me to move on to other subject areas and master new skills.
How will this affect my public speaking appearances?
In my new job as the CTO of Livedrive, I have the pleasure of working with both Microsoft SQL Server and MySQL. We run a rather nice estate with tens of petabytes online, which keeps us well entertained with scale challenges.
Having spent some time with MySQL lately, and being an old SQL Server user, I thought it might be interesting to share some of my early experiences. The good, the bad and the ugly.
These days, we seem to be high on data and data-related trends. My opinion on Big Data should be well known to my readers: it is something that has to be carefully managed, and for all but a select few companies it is largely a fad.
With data being the new black, similar trends grab the attention of modern managers. One of these is Self-Service. It seems like such a logical consequence of our advanced data visualisation: democratise the data.
It’s worth noting that the notion of humans making better decisions when well served with information is rather old. Thomas Jefferson said: “whenever the people are well-informed, they can be trusted with their own government”. But what exactly does it mean to be well-informed? Another great statesman, Churchill, said: “The best argument against democracy is a five-minute conversation with the average voter”.
In this blog entry, I will argue that it does not follow that humans will make better decisions if we just give them access to more data. In fact, allowing people to self-service their data can be outright harmful.
The problem I would like to talk to you about is one that is well examined and well understood, yet continues to create frustration for customers. Unfortunately, some of that frustration comes from “best practices” that people apply consistently, and which just as consistently get them into trouble.
When you are done reading this post, I hope you will have a better understanding of how to avoid these pitfalls. It is also my hope that some of your expectations about query optimisation will have been adjusted.
Finally, I have found some time with my good colleagues at Fusion-io to work on some of my old SQL Scripts.
Our first script queries the server for wait stats – a very common task for nearly all SQL Server DBAs. This is the first thing I do when I meet a new SQL Server installation, and I have generally found that I apply the same filters over and over again. Because of this, I worked with Sumeet Bansal to standardise our approach.
You can watch Sumeet introduce the script in this YouTube video: http://youtu.be/zb4FsXYvibY. A TV star is born!
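For a flavour of what the script does, here is a minimal sketch of a filtered wait stats query. This is not the script itself, and the exclusion list is my assumption of waits commonly dismissed as benign background noise:

-- Sketch: wait stats with the usual background-noise waits filtered out.
-- The exclusion list is an assumption, not the Fusion-io script's list.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
  AND wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
                        N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT',
                        N'CHECKPOINT_QUEUE', N'LOGMGR_QUEUE',
                        N'BROKER_TASK_STOP', N'SLEEP_TASK')
ORDER BY wait_time_ms DESC;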
We have already used this script at several customers in a before/after Fusion-io situation. As you may know, the tuning game changes a lot when you remove the I/O bottleneck from the server.
Based on our experiences so far, I wanted to share some more exotic waits and latches we have been seeing lately.
Last week, at SQL Saturday Exeter, I did a new Grade of Steel experiment: to quantify just how expensive it is to do PAGE compression.
The presentation is a 45-minute show, but for those of you who were not there, here is a summary of the highlights of the first part.
My tests were all run on the TPC-H LINEITEM table at scale factor 10. That is about 6.5GB of data.
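For reference, building the PAGE compressed version of the table is a one-statement rebuild. A sketch, assuming the uncompressed table already exists:

-- Sketch of the test setup: rebuild LINEITEM with PAGE compression.
ALTER TABLE LINEITEM REBUILD WITH (DATA_COMPRESSION = PAGE);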
Test: Table Scan of Compressed Data
My initial test is a statement of this shape (a sketch; the exact aggregates are an assumption, and any columns will do):
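-- Sketch: scan every row of LINEITEM but return only a single row,
-- so the cost of extracting columns from pages dominates the runtime.
SELECT MAX(L_EXTENDEDPRICE), MAX(L_QUANTITY), MAX(L_DISCOUNT)
FROM LINEITEM;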
Because the statement only returns one row, the result measured does not drown in client transfer time. Instead, the raw cost of extracting the columns from the compressed format can be quantified.
The result was quite shocking on my home 12-core box:
Even when doing I/O, it still takes quite a bit longer to scan the compressed format. And when scanning from DRAM, the cost is a whopping 2x.
A quick xperf run shows where the time goes when scanning from memory:
Indeed, the additional CPU cost explains the effect. The code path is simply longer with compression.
Test: Singleton Row Fetch
By sampling some rows from LINEITEM, it is possible to measure the cost of fetching pages in an index seek. This is the test:
SELECT MAX(L.L_EXTENDEDPRICE) -- hypothetical aggregate; the original SELECT list is not preserved
FROM LI_SAMPLES S
INNER LOOP JOIN LINEITEM L ON S.L_ORDERKEY = L.L_ORDERKEY
OPTION (MAXDOP 1)
This gives us the plan:
It has the desired characteristic: the runtime is dominated by the seek into the LINEITEM table.
The numbers again speak for themselves:
And again, the xperf trace shows that this runtime difference can be fully explained from longer code paths.
Test: Singleton UPDATE
Using the now familiar pattern, we can run a test that updates the rows instead of selecting them. By updating a column that is NOT NULL and an INT, we can make sure the update happens in place. This means we pay the price to decompress, change and recompress the row, which should be more expensive than reading it. A minimal sketch of such an update, reusing the sample join (the choice of L_SUPPKEY is an assumption; any in-row NOT NULL INT column will do):
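-- Sketch: in-place update of a fixed-width NOT NULL INT column.
UPDATE L
SET L.L_SUPPKEY = L.L_SUPPKEY -- hypothetical column; a same-width value keeps the update in place
FROM LI_SAMPLES S
INNER LOOP JOIN LINEITEM L ON S.L_ORDERKEY = L.L_ORDERKEY
OPTION (MAXDOP 1)

And indeed it is: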
Drawing on a few of the tests from my presentation, I have shown you that PAGE compression carries a very heavy CPU cost for each page access. Of course, not every workload is dominated by accessing data in pages; some spend most of their cycles computing on the returned data. But in these days, when I/O bottlenecks can easily be removed, it is worth asking whether the extra CPU cycles spent to save space are worth it.
It turns out that it is also possible to show another expected result: locks are held longer when updating compressed pages (thereby limiting scalability if the workload contains heavily contended pages). But that is the subject of a future blog entry.
I am happy to announce that my tuning course and my newly developed Data Warehousing course are now available.
- Tuning Course in London (13th June 2013)
You can still make it for the early bird rates. This course is probably THE deepest level course about SQL Server out there. You will learn the nitty-gritty details of xperf profiling, spinlock detection and multi-threaded optimisation.
We are going to be digging deep below the surface of SQL Server here, be prepared for an intensive day.
Data Warehousing Course
- Data Warehousing in Sweden (7th February 2013)
This will be a unique course, with both Davide Mauri and myself teaching. It’s a one-day intensive training in my usual “no nonsense” style, where you learn about data modelling from two of the leading experts in the field. The course extends the ideas I have blogged about extensively here.
There are still seats left, but they are going fast.
I look forward to seeing you there!