Digital Key

Boosting INSERT Speed by Generating Scalable Keys

Throughout history, similar ideas tend to surface at about the same time. Last week, at SQLBits 9, I did some “on stage” tuning of the Paul Randal INSERT challenge.

It turns out that at almost the same time, a lab run was being done that demonstrated, on a real world workload, a technique similar to the one I ended up using. You can find it at this excellent blog: Rick’s SQL Server Blog.

Now, to remind you of the Paul Randal challenge, it consists of doing as many INSERT statements as possible into a table of this format (the test does 160M inserts total)

CREATE TABLE MyBigTable (
c1 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT
NEWID ()
,c2 DATETIME DEFAULT GETDATE ()
,c3 CHAR (111) DEFAULT ‘a’
,c4 INT DEFAULT 1
,c5 INT DEFAULT 2
,c6 BIGINT DEFAULT 42);

Last week, I was able to achieve  750K rows/sec (runtime: 213 seconds) on a SuperMicro, AMD 48 Core machine with 4 Fusion-io cards with this test fully tuned. I used 48 data files for best throughput, the subject of a future blog.

Now, I suspect Paul used NEWID() for key generation, to create as much I/O as possible with page splits. If the goal is purely to optimize throughput, why not ask: Is NEWID really the best way to generate a key?

Traditional SQL Server knowledge dictates that either IDENTITY(1,1) or NEWSEQUENTIALID are better key choices: they lay out the keys sequentially, at the end of the index. But one should not easily trust such recommendations without validating them. I did the above test on the 48 core box Fusion-io lend me, and here are the results:

 

image

This might shock you! The “Traditional, sequential layout” of the key is more than 100 times SLOWER than completely random inserts!

The CPU are NOT at 100% during this (they were when using NEWGUID). The problem shows itself when we dig into sys.dm_os_wait_stats. (by the way, below is my new: “this is how I think about it” diagram, let me know if you like this way of doing things and I will continue to use this format for illustrating troubleshooting mind processes)

image

We are waiting for a PAGELATCH on the last pages of the index. This wait dominates the run. The price of page splits is minor (when you have a fast I/O system) compared to the price of coordinating the last page memory access!

As Rick writes in his blog, there are ways to work around this problem by being smart about your key generation. Basically, you want a key that is NOT sequential. Rick flips the bits around in a Denali SEQUENCE objects – a very neat idea that uses a new SQL Server Denali feature.

For my test at SQLBits (which ran on 2008R2), I used a variant of the same idea: I took the @@SPID of the users session and added an increasing value (increments controlled by the app server) to each insert made. On reflecting, using the SPID is not a good idea, you should probably generate the key something like this:

INSERT Key =

[Unique App Server ID] * 10**9 + [Unique Number Generated at App Server]

I also tried the hash trick I have described here – to compare the techniques.

Below are the results of my test of Paul’s INSERT Challenge with different key strategies (I have taken out the sequential ones, since we have seen how poorly they perform)

image

The last column is a trick I will describe in more detail in a later blog. Suffice to say that I reached 1.1M INSERT/sec (runtime: 150 sec) using a key generation strategy very similar to Rick’s.

Edits:

  • The UNIT in the introduction was originally wrong. it should of course be 750K not 750M as can be seen later in the post.

For Myth busting, see the site map:DW and Big Data

  No Comments

  1. Pingback: Michał Simoniuksztis o programowaniu w .NET » Archiwum bloga » Lekcja 4. Typy danych i funkcje wbudowane języka T-SQL

  2. Pingback: Clustered Indexes vs. Heaps | Thomas Kejser's Database Blog

  3. Pingback: Synchronisation in .NET– Part 1: lock(), Dictionaries and Arrays | Thomas Kejser's Database Blog

  4. Pingback: MySQL – First Impressions | Thomas Kejser's Database Blog

  5. Pingback: Data into results » Fast creation of surrogate keys in Greenplum

  6. datatake   •  

    Thomas,

    I’m intrigued as to what the new sequence feature in SQL 2012 would give you with the cache option used. I believe that Rick mentioned this on his blog.

    Regards,

    Chris

    • Thomas Kejser   •  

      A bit reversed, cached, sequencer would be a very good solution to the problem. Ricks code is great and has been proven to work at very high scale.

      Oracle has a feature called “reverse indexes” which simply store the bits in reverse order and achieve the same effect. The query optimizer will then flip the bits when you search for a value before it digs into the index.

  7. Ed   •  

    Hi Thomas. Interesting article, thanks. Would you know how read performance is correlated with each approach?

    • Thomas Kejser   •  

      Hi Ed

      Good question. It would depend on the type of read you are doing. For unordered scans and singletons, Perf is pretty much the same. Fr range scans, there is a cost which will depend on I/O system etc.

  8. Kunal   •  

    “Write Cliff” on flash cards

    • Thomas Kejser   •  

      That’s a reasonable concern and one I should have addressed in the blog.

      I have not seen the write cliff happen on the Fusion cards, even under very long and sustained load (including synthethic high load harness). It really depends on the driver/controller for NAND – the enterprise grade stuff does not run into the issue.

  9. Kunal   •  

    Very informative blog! It’d interesting to see if one lets it run for few hours at that rate @ M+ row/sec….

    • Thomas Kejser   •  

      I only ran the test for max one hour. Intuitively, I don’t see why running it longer should change anything – what worries you about it Kunal?

  10. Pingback: Happy New Year Update – and OOF « Thomas Kejser's Database Blog

  11. Russell Fields   •  

    Thomas, Could you clarify please. Your first bar chart shows newguid() as the fastest, which you report at 750M rows/sec. However the second bar chart show SPID + Affinity as faster than newguid at 1.1M inserts/sec. (From this I am guessing the first number should be 750K rows/sec, but perhaps I am missing something.)

    • Thomas Kejser   •  

      Sorry Russell, you are correct. the GUID is 750K.. while the SPID+affinity is indeed 1.1 milliion (and faster than the GUID)… I will correct units

      Thanks for finding this

  12. Farhan Soomro   •  

    I take my question back after looking at the table structure :)

  13. Farhan Soomro   •  

    Great Blog Thomas and I like the diagram.
    Did yo try to insert in Heap?

  14. Hi Thomas,

    I really like the new “this is how I think about it” diagram as it gives, at least to me som insights to how you could go about analysing som long running statements.

    Keep up the good work, as i really enjoy reading your blog, and find it very inspiring.

    Best Regards
    Kenneth M. Nielsen

    • Thomas Kejser   •  

      Thanks Kennet, I will try to use this more in future troubleshooting blogs

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">