Archive

Archive for the ‘SQL Server’ Category

Bottleneck Diagnosis on SQL Server – New Scripts

April 11, 2013 11 comments

Finally, I have found some time with my good colleagues at Fusion-io to work on some of my old SQL Scripts.

Our first script queries the servers for wait stats – a very common task for nearly all SQL Server DBAs. This is the first thing I do when I meet a new SQL Server installation and I have generally found that I apply the same filters over and over again. Because of this, I worked with Sumeet Bansal to standardise our approach to this.

You can watch Sumeet introduce the script in this YouTube video: http://youtu.be/zb4FsXYvibY. A TV star is born!

We have already used this script at several customers in a before/after Fusion-io situation. As you may know, the tuning game changes a lot when you remove the I/O bottleneck from the server.

Based on our experiences so far, I wanted to share some more exotic waits and latches we have been seeing lately.

Read more…

Quantifying the Cost of Compression

March 11, 2013 17 comments

Last week, at SQL Saturday Exeter, I did a new Grade of Steel experiment: to quantify just how expensive it is to do PAGE compression.

The presentation is a 45 minute show, but for those of you who were not there, here is a summary of the highlights of the first part.

My tests were all run on the TPC-H LINEITEM table at scale factor 10. That is about 6.5GB of data.

Test: Table Scan of Compressed Data

My initial test is this statement:

SELECT MAX(L_SHIPDATE)
, MAX(L_DISCOUNT)
, MAX(L_EXTENDEDPRICE)
, MAX(L_SUPPKEY)
, MAX(L_QUANTITY)
, MAX(L_RETURNFLAG)
, MAX(L_PARTKEY)
, MAX(L_LINESTATUS)
, MAX(L_TAX)
, MAX(L_COMMITDATE)
, MAX(L_RECEIPTDATA)
FROM LINEITEM

Because the statement only returns one row, the result measured does not drown in client transfer time. Instead, the raw cost of extracting the columns from the compressed format can be quantified.

The result was quite shocking on my home 12 core box:

image

Even when doing I/O, it still takes quite a bit longer to scan the compressed format. And when scanning from DRAM, the cost is a whopping 2x.

A quick xperf run shows where the time goes when scanning from memory

image

Indeed, the additional CPU cost explains the effect. The code path is simply longer with compression.

Test: Singleton Row fetch

By sampling some rows from LINEITEM, it is possible to measure the cost of fetching pages in an index seek. This is the test:

SELECT MAX(L_SHIPDATE)
, MAX(L_DISCOUNT)
, MAX(L_EXTENDEDPRICE)
, MAX(L_SUPPKEY)
, MAX(L_QUANTITY)
, MAX(L_RETURNFLAG)
, MAX(L_PARTKEY)
, MAX(L_LINESTATUS)
, MAX(L_TAX)
, MAX(L_COMMITDATE)
, MAX(L_RECEIPTDATA)
FROM LI_SAMPLES S
INNER LOOP JOIN LINEITEM L ON S.L_ORDERKEY = L.L_ORDERKEY
OPTION (MAXDOP 1) 

This gives us the plan:

image

Which has the desired characteristics of having the runtime dominated by the seek into the LINEITEM table.

The numbers again speak for themselves:

image

And again, the xperf trace shows that this runtime difference can be fully explained from longer code paths.

Test: Singleton UPDATE

Using the now familiar pattern, we can run a test that updates the rows instead of selecting them. By updating a column that is NOT NULL and an INT, we can make sure the update happens in place. This means we pay the price to decompress, change and recompress that row – which should be more expensive than reading. And indeed it is:

image

Summary

Quoting a few of my tests from my  presentation, I have shown you that PAGE compression carries a very heavy CPU cost for each page access. Of course, not every workload is dominated by accessing data in pages – some are more compute heavy on the returned data. However, in these days when I/O bottlenecks can easily be removed, it is worth considering if the extra CPU cycles to save space are worth it.

It turns out that it is possible to also show another expected results: that locks are held longer when updating compressed pages (Thereby limiting scalability if the workload contains heavily contended pages). But that is the subject of a new blog entry.

Configuring Kernel Debugging with WinDbg and a NULL modem

December 5, 2012 2 comments

imageLately, I have been digging deep into Windows to get really low level with the the I/O path SQL Server takes (yep, there is an even deeper layer to understand fully).

Once you start playing around with the Windows Kernel, you will at some point need kernel level debugging set up. Traditionally, this is something I have used a Windows machine for, and even there, it can be painful to get working. As you may know, I have switched to Mac as my client machine and my Windows utilities (including WinDbg) now run in VMWare Fusion – it looked like I was heading into an interop nightmare…

Windows 8 allows kernel debugging directly over the network card, but how does one configure kernel debugging with a Windows 2008R2 target from a Mac with VM Ware?

I found a very cheap solution today. You need:

  • A USB to Serial (RS-232) converter
  • A NULL modem (I would recommend getting a long one, so you don’t have to sit next to the server)
  • A  target server that you want to debug
  • A serial port on the target server
  • WinDbg from the Windows SDK in a virtual machine on the Mac
  • Symbols set up as per my previous post

A USB converter and a NULL modem is a dirt cheap way to get the required hardware for kernel debugging. I sourced my cables from Maplins (Thanks @SQLServerMonkey) in the UK – for a total of around 20 GBP.

Step 1: Prepare the Client/Debugger

A Macbook air, like most other lightweight laptops, does not have a serial port. So, we have to use a USB/Serial converter. It is possible to debug directly over USB, but good luck with that from a Mac, I didn’t have the courage.

Make sure VMWare routes the USB device to the Windows Guest OS and not the Mac. It will look something like this in VMWare Fusion 5:

image

Now, install WinDbg and set up your symbol paths if they are not set up already.

Check your Device Manager in the client to see which COM port the USD device created. As you can see below, my laptop mounted the USB/Serial converter as COM3 

image

 

After installing the device, I had to restart my virtual machine before VMWare would let me mount it – but what can you expect from a 10 GBP component? Your mileage may vary depending on the serial/USB driver you have.

Step 2: Connect Client and Server

Using the NULL modem, connect the USB/Serial converter to the server’s Serial port.

Make sure the server has the serial port enabled in the BIOS (my Dell box had it disabled, had to re-enable).

Step 3: Configure Server/Debugee for kernel debugging

Log into the server, start a command line as administrator

First, copy the default startup options into a new boot option:

  • BCDEDIT /copy {current} /d DebugMode

This will create a new entry in the boot list when the server starts. Make a note of the GUID returned or copy it to the clipboard (using the ever so annoying copy/paste function in the Windows command prompt)

Next, configure the parameters for serial cable debugging:

  • BCDEDIT /set <GUID> debugport <port #>
  • BCDEDIT /set <GUID> debugtype serial
  • BCDEDIT /set <GUID> baudrate 115200

Replace the <GUID> the guid returned previously. Set <port #> to the COM port the NULL modem is connected to in your server (NOT the COM port of the client). For example, if the server has the NULL modem in COM2, set <port #> to 2.

Finally, enable debugging for on the newly created boot option:

  • BCDEDIT /debug <GUID> ON

Validate that your configuration works by running:

  • BCDEDIT /v.

You should get an output somewhat like this (this is also how you find the GUID if you didn’t note it down before):

image

If you want to make sure debugging is always turned on (great for a sandbox machines where you explore stuff in the kernel) you can use BCDEDIT /default <GUID> to make debugging the default startup option.

Step 3: Start Debugging

You are now ready to start debugging the windows kernel on the server. Here is how:

On the client, start WinDbg and choose File—>Kernel Debug (or CTRL+K) and set up the com port you got in step 1:

. image

Press OK, and reboot the server. If you didn’t select the debug configuration as the default boot option, make sure you pick it when the server starts.

If you have done things right, you will get something like this in WinDbg (below, I broke execution with CTRL+C)

image

One thing to note when you are debugging the kernel: Not all your typical WinDbg commands work as they normally do (for some good reasons), but that is outside of scope for this blog entry.

Time to dig in even deeper… my Macbook Air to a Windows box :-) …Happy hacking everyone.

Moving on from Microsoft

May 9, 2012 17 comments

Not too long ago, I handed in my notice to Microsoft, terminating my employment mid-June 2012. I suspect this is slowly leaking out on the Twitter and Facebook these hours – as is tradition in the connected world.

Read more…

Why You Need to Stop Worrying about UPDATE Statements

April 27, 2012 4 comments

There seems to be a myth perpetuated out there in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses.

Let us have a look at the facts for a moment and weigh up if this myth has any merit.

Read more…

Conference Updates

April 25, 2012 Leave a comment

A few weeks ago, I presented at SQL Bits X. The slides from my co-presentation with Conor are available here:

Last week, I spoke at Microsoft Open World 2012 doing quite a lot of presentations. I have created a new and extended “Grade of the Steel” deck that I hope to blog more about my findings (these things tend to start as PowerPoint slides and evolve into blog entries).

If you want a copy of any of my presentations, please just mail me. Speaking of presentations, I have found this really cool tool that will compress slides: NXPowerLite. This neat tool has taken off around 60-80% of my slide sizes.

Lately, I have been practicing zen style presentations: max four sentences on each slide and heavy use of visualisations instead. I highly recommend putting yourself under such constraints. After I removed the text from my slides I feel much better improvising on stage.

When Statistics are not Enough – Search Patterns

February 21, 2012 4 comments

Co-author: Lasse Nedergaard

Yesterday, Lasse ran into an issues with a query pattern in the large database that he is responsible for. Based on our conversation, we wrote up this blog and created a repro.

The troublesome query we were debugging executed like this:

  1. Find a list of keys values to search for
  2. Insert these keys in a temp table – lets call this the SearchFor table
  3. Join the temp table to a large table (lets call it BigTable) and retrieve the full row from the large table

Why not use a correlated sub query in step 2? In this case, the customer in question had multiple code paths (including one accepting XML queries) that all needed to pass thousands of key to a final search procedure. They wanted a generic way to pass these key filters to to the final access of BigTable.

Read more…

Running Many Batch Statements in Parallel

January 8, 2012 6 comments

imageWhen designing highly scalable architectures for modern machines, you will often need to do some form of manual parallelism control. Managing this is not always easy, but in this blog I will give you one piece of my toolbox to help you.

Let us walk through an example together, a tiny case study. This is a problem which many of you will be familiar with.

Read more…

Implementing MurmurHash and CRC for SQLCLR

December 7, 2011 2 comments

As we saw in my previous post, the build in hash functions of SQL Server were either expensive with good distribution, or cheap, but with poor distribution. As a breath of fresh air, let us look at a useful magic quadrant:

 

image

Read more…

Thread Synchronization in SQL Server

November 9, 2011 4 comments

Any code optimized for highly concurrent workloads must worry about thread synchronization. SQL Server is no exception, because in a database system, synchronization is one of the core functionalities you rely on the engine to provide (the noSQL crowd may ponder that a bit). In this blog post, I will describe the synchronization primitives, like locks, latches and spinlocks, used by SQL Server to coordinate access to memory regions between threads.

Read more…

Follow

Get every new post delivered to your Inbox.

Join 615 other followers