<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Thomas Kejser&#039;s Database Blog</title>
	<atom:link href="http://blog.kejser.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.kejser.org</link>
	<description>Fighting bad Data Modeling</description>
	<lastBuildDate>Tue, 29 May 2012 10:34:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.kejser.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/ce0fc1bca7cb608b3830d115543e7a5e?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Thomas Kejser&#039;s Database Blog</title>
		<link>http://blog.kejser.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.kejser.org/osd.xml" title="Thomas Kejser&#039;s Database Blog" />
	<atom:link rel='hub' href='http://blog.kejser.org/?pushpress=hub'/>
		<item>
		<title>Implementing Message Queues in Relational Databases</title>
		<link>http://blog.kejser.org/2012/05/25/implementing-message-queues-in-relational-databases/</link>
		<comments>http://blog.kejser.org/2012/05/25/implementing-message-queues-in-relational-databases/#comments</comments>
		<pubDate>Fri, 25 May 2012 16:25:09 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Engines]]></category>
		<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[FusionIO]]></category>
		<category><![CDATA[Latching]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Queues]]></category>
		<category><![CDATA[Relational]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=550</guid>
		<description><![CDATA[At the last SQL Bits X I held the FusionIO fireside chat during the launch party. During this presentation, I demonstrated how it is possible to build a table structure inside a relational engine that will act as a message queue and deliver nearly 100K messages/second. My design was inspired by the LMAX Disruptor Pattern [...]]]></description>
			<content:encoded><![CDATA[<p>At the last <a href="http://www.sqlbits.com/">SQL Bits X</a> I held the <a href="http://www.fusionio.com/">FusionIO</a> fireside chat during the launch party. During this presentation, I demonstrated how it is possible to build a table structure inside a relational engine that will act as a message queue and <strong>deliver nearly 100K messages/second</strong>.</p>
<p><span id="more-550"></span>
<p>My design was inspired by the <a href="http://code.google.com/p/disruptor/">LMAX Disruptor Pattern</a> and used sequencer objects to greatly boost throughput of the queue structure.</p>
<p>In this blog entry, I will show you how I built this queue structure and give you enough details to do so yourself.</p>
<p>But first, let me walk you through the problem.</p>
<p style="padding-left:20px;"><em><font size="1">Note: In the following, I will use the terms push and pop for adding and deleting messages in a queue, respectively. You may have learned the terms enqueue and dequeue instead; they mean the same thing.</font></em></p>
<h3>The Problem Statement</h3>
<p>Every now and again, relational database designers find themselves creating a table that ends up acting as a message queue. Such tables tend to fluctuate a lot in size, from zero rows to several million. Furthermore, the tables typically need to be ordered to ensure fairness between message pushing and popping. The data flow looks something like this:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image2.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb2.png?w=443&h=188" width="443" height="188" /></a></p>
<p>In the following examples, I will assume a message size of 300 bytes – but the argument holds for larger and smaller message sizes too.</p>
<h3>Relational Message Queues – Naïve</h3>
<p>When faced with a table that needs to be ordered, and where rows are frequently added and deleted shortly after, something like this is the typical design:</p>
<blockquote><p><font face="Courier New"><strong>CREATE TABLE</strong> dbo.MyQ         <br />(         <br />&#160;&#160;&#160; [message_id] INT NOT NULL         <br />&#160;&#160;&#160; ,[time] [datetime] NULL         <br />&#160;&#160;&#160; ,[message] [char](300) NULL         <br />)         </p>
<p><strong>CREATE UNIQUE CLUSTERED INDEX</strong> CIX         <br /><strong>ON</strong> dbo.MyQ (message_id)</font></p>
<p><font face="Courier New"><strong>CREATE SEQUENCE</strong> dbo.MessageSequence AS INT</font></p>
</blockquote>
<p>The rows in the table are implicitly ordered by the index. Push and pop can now be implemented like this:</p>
<blockquote><p><font face="Courier New">/* Push Naïve */</font></p>
<p><font face="Courier New"><strong>INSERT</strong> MyQ (message_id, time, message)         <br /><strong>SELECT</strong> NEXT VALUE FOR dbo.MessageSequence, GETDATE(), 'Hello World'</font></p>
<p><font face="Courier New">/* Pop Naïve */</font></p>
<p><font face="Courier New"><strong>DELETE</strong> Q         <br /><strong>FROM</strong> (SELECT TOP 1 * FROM MyQ ORDER BY message_id) AS Q         <br /></font></p>
</blockquote>
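<p>Note that the naïve pop above deletes the oldest row without handing the message back to the caller. One way to return it in the same statement is the OUTPUT clause. The following is a small sketch under that assumption, not part of the implementation I measured:</p>
<blockquote><p><font face="Courier New">/* Pop Naïve, returning the deleted row (sketch) */         <br /><strong>WITH</strong> Q <strong>AS</strong> (SELECT TOP (1) * FROM dbo.MyQ ORDER BY message_id)         <br /><strong>DELETE FROM</strong> Q         <br /><strong>OUTPUT</strong> deleted.message_id, deleted.[time], deleted.[message];</font></p>
</blockquote>
<p>If the statement is not the first in its batch, the preceding statement must be terminated with a semicolon before the <strong>WITH</strong>.</p>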
<p>This approach has a lot of problems that will be very familiar to anyone who has tried to be a DBA for a table like this. Among these problems are:</p>
<ul>
<li>The statistics are NEVER up to date on such a table. This typically requires all plans to be forced/hinted or statistics to be custom hacked. </li>
<li>In databases that auto-update statistics, the automatic updates constantly kick in, creating jagged CPU and throughput patterns. </li>
<li>The B-tree structure implementing the ordering of the rows is constantly being split at the root as it grows and shrinks. This causes mutex convoys (in SQL Server, these convoys are reported as latches on <strong>ACCESS_METHODS_HOBT_VIRTUAL_ROOT</strong>) </li>
<li>New pages are constantly allocated and deallocated in the buffer pool which stresses internal allocation structures in the database </li>
<li>There is contention on the memory structures that hold the pages at the start and end of the B-tree (reported as <strong>PAGELATCH_EX</strong> in SQL Server) </li>
<li>The writes are extremely sensitive to latency of the transaction log drive and this is exacerbated by the convoy effects on the higher levels of the B-tree always being split. </li>
</ul>
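<p>If you want to check whether your own queue table suffers from the latch problems listed above, SQL Server exposes cumulative latch and wait statistics through dynamic management views. The queries below are a small sketch: they need VIEW SERVER STATE permission, and the counters are totals since the instance started or the statistics were last cleared, so compare two snapshots taken while the queue is under load.</p>
<blockquote><p><font face="Courier New">/* Root split convoys on the B-tree */         <br /><strong>SELECT</strong> latch_class, waiting_requests_count, wait_time_ms         <br /><strong>FROM</strong> sys.dm_os_latch_stats         <br /><strong>WHERE</strong> latch_class = 'ACCESS_METHODS_HOBT_VIRTUAL_ROOT'</font></p>
<p><font face="Courier New">/* Contention on the hot pages at the start and end of the B-tree */         <br /><strong>SELECT</strong> wait_type, waiting_tasks_count, wait_time_ms         <br /><strong>FROM</strong> sys.dm_os_wait_stats         <br /><strong>WHERE</strong> wait_type = 'PAGELATCH_EX'</font></p>
</blockquote>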
<p>The contention is perhaps best illustrated with this diagram of the hot pages in the B-tree:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image3.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb3.png?w=540&h=261" width="540" height="261" /></a></p>
<p>I built this implementation on an <a href="http://h18004.www1.hp.com/products/quickspecs/13595_na/13595_na.HTML">HP DL380G7</a> server with a <a href="http://www.fusionio.com/platforms/iodrive-duo/">FusionIO ioDrive Duo</a> (transaction log latency under 50 microseconds) on a SQL Server 2012 RTM installation. I experimented with different thread counts for both push and pop to find the sweet spot. The highest throughput I could push (pun intended) through the message queue with the above implementation was around 6000 messages/sec. Very, very far from impressive and fuel for the NoSQL fires.</p>
<p>The obvious (but wrong) conclusion is that relational databases are not fit for purpose when it comes to building message queues. Of course, there IS a grain of truth to it, but can we do better than a meager 6000 messages/sec?</p>
<h3>Relational Message Queues – LMAX’ed</h3>
<p>It is in fact possible to do a lot better than the naïve message queue implementation. Inspired by the bright people over at <a href="http://www.lmax.com">LMAX</a> and their <a href="http://code.google.com/p/disruptor/">Disruptor</a> pattern, I decided to build a relational equivalent.</p>
<p>The first realisation is that INSERT/DELETE is just not going to work. People who code high scale systems might intuit this: constant memory allocation and deallocation is expensive and INSERT/DELETE is a form of memory alloc/dealloc. The solution is to preallocate the queue at a certain size and UPDATE a reference count on each message slot in a row instead of completely removing/adding a row for every message sent through the queue. </p>
<p>The queue table now looks like this:</p>
<blockquote><p><font face="Courier New"><strong>CREATE TABLE</strong> dbo.MyQ         <br />(         <br />&#160;&#160;&#160; [Slot] BIGINT NOT NULL         <br />&#160;&#160;&#160; , message_id BIGINT NULL         <br />&#160;&#160;&#160; ,[time] [datetime] NOT NULL         <br />&#160;&#160;&#160; ,[message] [char](300) NOT NULL         <br />&#160;&#160;&#160; ,reference_count TINYINT NOT NULL         <br />) </font></p>
<p><font face="Courier New">/* Prefill messages */        <br /><strong>DECLARE</strong> @i BIGINT = 0         <br /><strong>DECLARE</strong> @QueueSize BIGINT = 100 /* example size only, pick your own */         <br /><strong>WHILE</strong> @i &lt; @QueueSize <strong>BEGIN</strong>         <br />&#160; <strong>INSERT </strong>MyQ (Slot, time, message, reference_count)&#160; <br />&#160; <strong>VALUES (</strong>@i, '2050-01-01', 'dummy', 0)         <br />&#160;<strong> SET</strong> @i = @i + 1         <br /><strong>END</strong></font></p>
<p><font face="Courier New">/* Create index and fill it up */        <br /><strong>CREATE UNIQUE CLUSTERED INDEX</strong> CIX ON dbo.MyQ (Slot)         <br /><strong>WITH (FILLFACTOR = 100)</strong></font>       </p>
</blockquote>
<p style="padding-left:20px;"><em><font size="1">Side Note: If you are still wondering why UPDATE is faster than INSERT for this case, see my <a href="http://blog.kejser.org/2012/04/27/why-you-need-to-stop-worrying-about-update-statements/">previous blog entry</a>.</font></em></p>
<p>The above preallocates <strong>@QueueSize</strong> message slots. Pushing a message now consists of these operations:</p>
<ul>
<li>Generate a new <strong>message_id</strong> </li>
<li>Find the next available slot (with <strong>reference_count</strong> = 0) </li>
<li>Update the <strong>message</strong> column </li>
<li>Add one to <strong>reference_count</strong> </li>
</ul>
<p>And pop is:</p>
<ul>
<li>Find the smallest <strong>message_id</strong> that has been “inserted” </li>
<li>Read and return the <strong>message</strong> column </li>
<li>Decrement <strong>reference_count</strong>, marking the slot as available again (variants can be done if you have multiple subscribers) </li>
</ul>
<p>At least, the above is the pseudo code. However, for this to work we have to find a way to quickly locate the next available <strong>slot</strong> in the message queue for push and find the smallest <strong>message_id</strong> that has not been popped yet. </p>
<p>First, how do we find the next available slot for push? This turns out to be surprisingly easy: we use a sequencer object. The sequencer, for the non-database people out there, is a very high scale data structure that generates “the next number”. Think of it like a singleton object. The sequencer can be used for BOTH generating the <strong>message_id</strong> AND finding the next available <strong>slot</strong>. Here is how:</p>
<blockquote><p><font face="Courier New">/* Sequence to keep track of next message number and slot */</font></p>
<p><font face="Courier New"><strong>CREATE SEQUENCE</strong> dbo.PushSequence AS BIGINT         <br /><strong>START WITH</strong> 1 <strong>INCREMENT BY</strong> 1         <br /><strong>CACHE</strong> 100000;</font></p>
<p><font face="Courier New">/* Push LMAX’ed Begin */</font></p>
<p><font face="Courier New"><strong>SET</strong> @PushSeq = <strong>NEXT VALUE FOR</strong> dbo.PushSequence</font></p>
<p><font face="Courier New">/* Find slot */        <br /><strong>SET </strong>@Slot = @PushSeq % @QueueSize </font></p>
<p><font face="Courier New"><strong>UPDATE</strong> dbo.MyQ /* SQL Server users: hint WITH (ROWLOCK) here */         <br /><strong>SET</strong> [time] = GETDATE()         <br />&#160;&#160;&#160; , [message] = 'Hello World'         <br />&#160;&#160;&#160; , [message_id] = @PushSeq         <br />&#160;&#160;&#160; , reference_count = reference_count + 1&#160; <br /><strong>WHERE</strong> Slot = @Slot         <br />&#160; <strong>AND</strong> reference_count = 0 /* Don’t overwrite! */</font></p>
<p><font face="Courier New"><strong>IF </strong><em>&lt;No rows affected&gt;</em> <strong>BEGIN</strong>         <br />&#160; /* The slot was not available –&gt; queue is full */         <br />&#160; <em>&lt;Sleep 100ms&gt;          <br />&#160; &lt;Try UPDATE again&gt;</em>         <br /><strong>END</strong></font></p>
<p><font face="Courier New">/* Push LMAX’ed end */</font></p>
</blockquote>
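<p>The push pseudo code above can be fleshed out into something runnable. What follows is a minimal sketch, not the exact code from my demo: the procedure name <strong>dbo.PushMessage</strong> and its parameters are made up for illustration, and <strong>@@ROWCOUNT</strong> and <strong>WAITFOR DELAY</strong> stand in for the “no rows affected” and “sleep” steps.</p>
<blockquote><p><font face="Courier New">/* Push as a stored procedure (sketch, hypothetical names) */         <br /><strong>CREATE PROCEDURE</strong> dbo.PushMessage         <br />&#160;&#160;&#160; @Message CHAR(300)         <br />&#160;&#160;&#160; , @QueueSize BIGINT         <br /><strong>AS</strong>         <br /><strong>BEGIN</strong>         <br />&#160; <strong>SET</strong> NOCOUNT ON         <br />&#160; <strong>DECLARE</strong> @PushSeq BIGINT, @Slot BIGINT, @Rows INT = 0         <br />&#160; <strong>SET</strong> @PushSeq = <strong>NEXT VALUE FOR</strong> dbo.PushSequence         <br />&#160; <strong>SET</strong> @Slot = @PushSeq % @QueueSize         <br />&#160; <strong>WHILE</strong> @Rows = 0 <strong>BEGIN</strong>         <br />&#160;&#160;&#160; <strong>UPDATE</strong> dbo.MyQ WITH (ROWLOCK)         <br />&#160;&#160;&#160; <strong>SET</strong> [time] = GETDATE()         <br />&#160;&#160;&#160;&#160;&#160;&#160; , [message] = @Message         <br />&#160;&#160;&#160;&#160;&#160;&#160; , [message_id] = @PushSeq         <br />&#160;&#160;&#160;&#160;&#160;&#160; , reference_count = reference_count + 1         <br />&#160;&#160;&#160; <strong>WHERE</strong> Slot = @Slot         <br />&#160;&#160;&#160;&#160;&#160; <strong>AND</strong> reference_count = 0         <br />&#160;&#160;&#160; <strong>SET</strong> @Rows = @@ROWCOUNT /* capture right after the UPDATE */         <br />&#160;&#160;&#160; /* Slot still occupied: the queue is full, back off and retry */         <br />&#160;&#160;&#160; <strong>IF</strong> @Rows = 0 <strong>WAITFOR DELAY</strong> '00:00:00.100'         <br />&#160; <strong>END</strong>         <br /><strong>END</strong></font></p>
</blockquote>
<p>The sequence value is fetched once, outside the loop, so a retry targets the same slot and the message keeps its place in the order.</p>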
<p>What we have now achieved is essentially a ring buffer for insert operations. To illustrate, here is an example queue table of <strong>@QueueSize</strong> = 100.</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image4.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb4.png?w=584&h=209" width="584" height="209" /></a></p>
<p>&#160;</p>
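<p>To see the wrap-around in numbers, here is a small worked example. It assumes the slots are numbered 0 to <strong>@QueueSize</strong> - 1 and uses a handful of sequence values picked purely for illustration:</p>
<blockquote><p><font face="Courier New">/* Which slot does a given sequence value land in? (sketch) */         <br /><strong>DECLARE</strong> @QueueSize BIGINT = 100         <br /><strong>SELECT</strong> seq, seq % @QueueSize <strong>AS</strong> slot         <br /><strong>FROM</strong> (VALUES (1), (99), (100), (101), (250)) <strong>AS</strong> s(seq)         <br />/* 1 lands in slot 1, 99 in 99, 100 wraps to 0, 101 to 1, 250 to 50 */</font></p>
</blockquote>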
<p>We can now quickly locate new slots for push operations. All that remains is to implement pop. We could choose to keep track of the smallest message_id in a variable that pop updates. However, it is always better to remove coordination when possible. Instead of letting pop keep track of the messages, we can create ANOTHER sequencer object that trails behind <strong>PushSequence</strong>. In fact, we don’t even need to keep track of the value of <strong>PushSequence</strong> to properly pop. Here is how to implement pop:</p>
<blockquote><p><font face="Courier New"><strong>CREATE SEQUENCE</strong> dbo.PopSequence AS BIGINT         <br /><strong>START WITH</strong> 1 <strong>INCREMENT BY</strong> 1         <br /><strong>CACHE</strong> 100000;</font></p>
<p><font face="Courier New">/* Pop LMAX’ed Begin */</font></p>
<p><font face="Courier New"><strong>SET</strong> @PopSeq = <strong>NEXT VALUE FOR</strong> dbo.PopSequence</font></p>
<p><font face="Courier New">/* Find slot */        <br /><strong>SET </strong>@Slot = @PopSeq % @QueueSize </font></p>
<p><font face="Courier New"><strong>UPDATE</strong> dbo.MyQ /* SQL Server users: hint WITH (ROWLOCK) here */         <br /><strong>SET</strong> [time] = GETDATE()         <br />&#160;&#160;&#160; , @OutPutMessage = message&#160; <br />&#160;&#160;&#160; , [message_id] = NULL&#160; <br />&#160;&#160;&#160; , reference_count = reference_count - 1&#160; <br /><strong>WHERE</strong> Slot = @Slot         <br />&#160; <strong>AND</strong> message_id = @PopSeq /* Make sure we didn’t try to pop an empty slot */</font></p>
<p><font face="Courier New"><strong>IF </strong><em>&lt;No rows affected&gt;</em> <strong>BEGIN</strong>         <br />&#160; /* No message found to pop or we are ahead of push */         <br />&#160; <em>&lt;Sleep 1sec&gt;          <br />&#160; &lt;Try UPDATE again&gt;</em>         <br /><strong>END</strong></font></p>
<p><font face="Courier New">/* Pop LMAX’ed end */</font></p>
</blockquote>
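<p>As with push, the pop pseudo code above can be turned into a runnable procedure. Again, this is only a sketch under my own assumptions: the name <strong>dbo.PopMessage</strong> and the parameter list are hypothetical, and the loop simply keeps waiting until the matching message has been pushed.</p>
<blockquote><p><font face="Courier New">/* Pop as a stored procedure (sketch, hypothetical names) */         <br /><strong>CREATE PROCEDURE</strong> dbo.PopMessage         <br />&#160;&#160;&#160; @QueueSize BIGINT         <br />&#160;&#160;&#160; , @OutPutMessage CHAR(300) OUTPUT         <br /><strong>AS</strong>         <br /><strong>BEGIN</strong>         <br />&#160; <strong>SET</strong> NOCOUNT ON         <br />&#160; <strong>DECLARE</strong> @PopSeq BIGINT, @Slot BIGINT, @Rows INT = 0         <br />&#160; <strong>SET</strong> @PopSeq = <strong>NEXT VALUE FOR</strong> dbo.PopSequence         <br />&#160; <strong>SET</strong> @Slot = @PopSeq % @QueueSize         <br />&#160; <strong>WHILE</strong> @Rows = 0 <strong>BEGIN</strong>         <br />&#160;&#160;&#160; <strong>UPDATE</strong> dbo.MyQ WITH (ROWLOCK)         <br />&#160;&#160;&#160; <strong>SET</strong> [time] = GETDATE()         <br />&#160;&#160;&#160;&#160;&#160;&#160; , @OutPutMessage = [message]         <br />&#160;&#160;&#160;&#160;&#160;&#160; , [message_id] = NULL         <br />&#160;&#160;&#160;&#160;&#160;&#160; , reference_count = reference_count - 1         <br />&#160;&#160;&#160; <strong>WHERE</strong> Slot = @Slot         <br />&#160;&#160;&#160;&#160;&#160; <strong>AND</strong> message_id = @PopSeq         <br />&#160;&#160;&#160; <strong>SET</strong> @Rows = @@ROWCOUNT /* capture right after the UPDATE */         <br />&#160;&#160;&#160; /* Nothing to pop yet: we are ahead of push, wait and retry */         <br />&#160;&#160;&#160; <strong>IF</strong> @Rows = 0 <strong>WAITFOR DELAY</strong> '00:00:01'         <br />&#160; <strong>END</strong>         <br /><strong>END</strong></font></p>
</blockquote>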
<p>Very similar to push, with a few tricks added. You have to be a bit careful with the boundary conditions. First of all, because there is no coordination between <strong>PopSequence</strong> and <strong>PushSequence</strong>, it CAN happen (for example due to thread scheduling) that the current pop <strong>message_id</strong> gets ahead of push <strong>message_id.</strong> When that happens, we will wait for the pusher to create more rows. We can now complete the above illustration:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image5.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb5.png?w=600&h=215" width="600" height="215" /></a></p>
<p>To avoid pop constantly blocking on an empty queue (if messages pop faster than they are pushed), it is generally a good idea to let push get a “head start” so there is something in the queue to pop. In my example I have chosen to let pop wait for 1 second before trying again when the last attempt found the queue empty.</p>
<p>I took the above implementation and ran it on the same hardware as the naïve approach. The results were VERY different. I can now drive <strong><font color="#ff0000">between 90K and 100K messages/sec</font></strong> through the queue. <strong><font color="#ff0000">A nice little 15x improvement.</font></strong></p>
<h3>Summary</h3>
<p>It can be argued that old school, relational databases and tables are not the best structures to implement durable storage for message queues. The many code paths needed in relational algebra to implement ACID properties, generic concurrency, serialization and block I/O can get in the way of a fast queue implementation, especially the naïve implementation of relational purists.</p>
<p>However, if we combine our knowledge of programming with database design skills, high throughput can be achieved even in the relational model and there are large, often overlooked improvements to be found.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/05/25/implementing-message-queues-in-relational-databases/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb2.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb3.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb4.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb5.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>
	</item>
		<item>
		<title>What Structure does your Data Have?</title>
		<link>http://blog.kejser.org/2012/05/15/what-structure-does-your-data-have/</link>
		<comments>http://blog.kejser.org/2012/05/15/what-structure-does-your-data-have/#comments</comments>
		<pubDate>Tue, 15 May 2012 13:36:57 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[Compression]]></category>
		<category><![CDATA[Histogram]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=538</guid>
		<description><![CDATA[I am currently thinking about measuring compression rates for different column store indexing strategies. In order for me to get some realistic data, I am looking for people who can share a very small anonymised data sample with me. Specifically, I am looking for samples from Kimball style warehouses from different industries (If you are [...]]]></description>
			<content:encoded><![CDATA[<p>I am currently thinking about measuring compression rates for different column store indexing strategies. In order for me to get some realistic data, I am looking for people who can share a very small anonymised data sample with me.</p>
<p><span id="more-538"></span>
<p>Specifically, I am looking for samples from Kimball style warehouses from different industries (if you run a 3NF warehouse, I am not interested). Roughly speaking, I would like the output of something like this select statement:</p>
<p>&#160;</p>
<blockquote><p><strong>SELECT HashFunction(DimensionKey1) AS d1        <br />, HashFunction(DimensionKey2) AS d2         <br />, ….         <br />, HashFunction(DimensionKeyN) AS dN         <br />, HashFunction(Measure1) AS m1         <br />, …         <br />, HashFunction(MeasureM) AS mM         <br /></strong></p>
<p><strong>FROM FactTable</strong></p>
<p>&#160;</p>
</blockquote>
<p>Where <strong>HashFunction</strong> is something that yields either a 4 byte or 8 byte integer. If you are using SQL Server, BINARY_CHECKSUM will do (or you can get fancy and <a href="http://blog.kejser.org/2011/12/07/implementing-murmurhash-and-crc-for-sqlclr/">use my C# implementation of CRC32</a>).</p>
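<p>To make the template concrete, here is what such a query could look like with BINARY_CHECKSUM as the hash function. The table and column names (<strong>dbo.FactSales</strong>, <strong>CustomerKey</strong>, <strong>ProductKey</strong>, <strong>DateKey</strong>, <strong>SalesAmount</strong>) are invented for illustration, so substitute your own:</p>
<blockquote><p><font face="Courier New">/* Anonymised sample of one day of facts (sketch, hypothetical names) */         <br /><strong>SELECT</strong> BINARY_CHECKSUM(CustomerKey) <strong>AS</strong> d1         <br />, BINARY_CHECKSUM(ProductKey) <strong>AS</strong> d2         <br />, BINARY_CHECKSUM(SalesAmount) <strong>AS</strong> m1         <br /><strong>FROM</strong> dbo.FactSales         <br /><strong>WHERE</strong> DateKey = 20120501</font></p>
</blockquote>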
<p>10 MB of data from a fact table would be optimal, preferably sampled from a single time interval in the data. For example: if a full day is around 10 MB in the warehouse, a single day would be the best sample. I don’t need to know any column names, only which columns are dimensions and which ones are measures. If possible, some indication of which industry the data is from would be helpful, but it is not strictly needed either.</p>
<p>I am fully aware that there may be legal reasons you cannot share data, even when anonymised like above. Please only give it to me if you would feel comfortable giving this data away freely on the Internet. I am specifically trying to create guidance on compression techniques depending on the content of data, and I plan to share this on my blog, along with distribution statistics on the anonymised data.</p>
<p>Data can be delivered to me via DropBox or Google Drive in CSV format. If needed, I can also fetch it via HTTP or FTP.</p>
<p>Thanks for anything you can share. I will return the favour here on my blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/05/15/what-structure-does-your-data-have/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>
	</item>
		<item>
		<title>Moving on from Microsoft</title>
		<link>http://blog.kejser.org/2012/05/09/moving-on-from-microsoft/</link>
		<comments>http://blog.kejser.org/2012/05/09/moving-on-from-microsoft/#comments</comments>
		<pubDate>Wed, 09 May 2012 17:27:19 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Musing]]></category>
		<category><![CDATA[Public Speaking]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[SQLCAT]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=536</guid>
		<description><![CDATA[Not too long ago, I handed in my notice to Microsoft, terminating my employment mid-June 2012. I suspect this is slowly leaking out on Twitter and Facebook as I write this – as is tradition in the connected world. During the graduation ceremony at my university it was said that: “Life is a series of temporary [...]]]></description>
			<content:encoded><![CDATA[<p>Not too long ago, I handed in my notice to Microsoft, terminating my employment mid-June 2012. I suspect this is slowly leaking out on Twitter and Facebook as I write this – as is tradition in the connected world.</p>
<p><span id="more-536"></span>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image1.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;float:right;padding-top:0;border-width:0;" title="image" border="0" alt="image" align="right" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb1.png?w=250&h=243" width="250" height="243" /></a></p>
<p>During the graduation ceremony at my university it was said that: “Life is a series of temporary relationships” (though hopefully not with your wife). Throughout my career I have taken this fact of life to heart. And I can, without fear of the future, try out new challenges.</p>
<p>After June 2012, I am going to take it a bit slow, and do some consulting. I will then consider which adventure I want to go on from here. Because I believe you have to live out your dreams of adventure while you are still young, and I have a fair bit of fuel left in the tank before “peak oil”.</p>
<p>I part with Microsoft as a friend and will be working on creating a smooth transition over the next month. My work with SQL Server will continue, just outside the campus in Redmond. After June, I will be available for some teaching and short consulting gigs to share my knowledge. Terms and conditions will apply and follow here.</p>
<p>In the meantime, I have made my CV available: <a href="http://blog.kejser.org/about/">About Me</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/05/09/moving-on-from-microsoft/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb1.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>
	</item>
		<item>
		<title>The Hollywood Business Model is Broken</title>
		<link>http://blog.kejser.org/2012/05/08/the-hollywood-business-model-is-broken/</link>
		<comments>http://blog.kejser.org/2012/05/08/the-hollywood-business-model-is-broken/#comments</comments>
		<pubDate>Tue, 08 May 2012 20:38:45 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Musing]]></category>
		<category><![CDATA[Capitalism]]></category>
		<category><![CDATA[Rant]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=529</guid>
		<description><![CDATA[Last week, I watched “The Avengers”. Very nice movie, recommended. Why then, do I feel the need to rant today and tell the movie industry where to stick their business model? Let me give you some background: I like movie theatres, a lot. One of my favourites, the venue last week, is the Electric Cinema [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I watched “<a href="http://www.imdb.com/title/tt0848228/">The Avengers</a>”. <strong>Very</strong> nice movie, recommended. Why then, do I feel the need to rant today and tell the movie industry where to stick their business model?</p>
<p><span id="more-529"></span>
<p><a href="http://kejserbi.files.wordpress.com/2012/05/image.png"><img style="background-image:none;margin:0 0 0 3px;padding-left:0;padding-right:0;display:inline;float:right;padding-top:0;border-width:0;" title="image" border="0" alt="image" align="right" src="http://kejserbi.files.wordpress.com/2012/05/image_thumb.png?w=208&h=240" width="208" height="240" /></a>Let me give you some background: I like movie theatres, a lot. One of my favourites, the venue last week, is the <a href="http://www.electriccinema.co.uk/">Electric Cinema</a> in Notting Hill. This place represents everything that is great about theatres: It has plush red chairs, footstools, Victorian interior and even a café at the back where you can buy a well-made espresso. The sound is great and the screen has high quality, digital projection. The people who go there are not annoying either. The cost is just high enough to discourage people from bringing their youngest children. In other words: It represents everything good about watching movies for the childless couple – my girlfriend and I. Building something like this at home would be, to put it mildly, cost prohibitive. It’s not just the quality of the room, it’s the atmosphere.</p>
<p>As I emerged into the daylight, a thought occurred to me: “Wouldn’t it be nice to watch Iron Man 1 and 2 now? I liked those movies too”. My girlfriend, like most people with two X-chromosomes, finds Robert Downey Jr. very charming – it would be an easy sell. I know we are a typical couple in this case – because those movies have suddenly become very popular on iTunes.</p>
<p>Why then, is it that I can’t watch Iron Man in the movie theatres? There is really no technical barrier that prevents anyone from uploading digital movies to theatres “on demand”. Yes, the movie is old – but movies don’t go out of fashion. Heck, my father watches the same black and white movies – over and over again.</p>
<p>Imagine a future where I can put down a deposit of money and vote for movies to be shown in my local theatre. I would also provide some dates on which I would like to watch. If enough people “vote” for a movie with their deposits, it is shown in a proper timeslot. Of course, it takes away some of the spontaneous “impulse watching” of movies. But I would not expect that everyone who votes will have time to show up. I could buy tickets off someone else or at a discount price – stock market style. This could even be done as a Facebook app to drive herd behavior: “Thomas wants to watch Iron Man, hey, I will vote too”.</p>
<p>All of this is of course terribly complicated to implement for someone who lacks planning and IT skills. Instead, I am forced to live by the arbitrary (for the consumer), geographically controlled release dates of Hollywood movies. It seems to me that this is yet another variant of the dinosaur mentality we see in the music industry and it is carrying over to other media companies. It’s a desperate attempt to control a market, because you are too daft, too cocaine snorting or groupie distracted to come up with a better business model. Now, wouldn’t it be nice to have a consumer controlled movie theatre? How much would you pay per month to subscribe if you had influence?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/05/08/the-hollywood-business-model-is-broken/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/image_thumb.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>
	</item>
		<item>
		<title>Reading Material: Abstractions, Virtualisation and Cloud</title>
		<link>http://blog.kejser.org/2012/05/01/reading-material-abstractions-virtualisation-and-cloud/</link>
		<comments>http://blog.kejser.org/2012/05/01/reading-material-abstractions-virtualisation-and-cloud/#comments</comments>
		<pubDate>Tue, 01 May 2012 11:45:33 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[BigData]]></category>
		<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[Abstraction]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[I/O]]></category>
		<category><![CDATA[OS]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Reading]]></category>
		<category><![CDATA[Scale]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=519</guid>
		<description><![CDATA[When speaking at conferences, I often get asked questions about virtualization and how fast databases will run on it (and even if they are “supported” on virtualised systems). This is a complex question to answer, because it requires a very deep understanding of CPU caches, memory and I/O systems to fully describe the tradeoffs. Let us [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://kejserbi.files.wordpress.com/2012/05/two-sockets.png"><img style="background-image:none;margin:0 0 2px 7px;padding-left:0;padding-right:0;display:inline;float:right;padding-top:0;border-width:0;" title="Two Sockets" border="0" alt="Two Sockets" align="right" src="http://kejserbi.files.wordpress.com/2012/05/two-sockets_thumb.png?w=240&h=214" width="240" height="214" /></a>When speaking at conferences, I often get asked questions about virtualization and how fast databases will run on it (and even if they are “supported” on virtualised systems). This is a complex question to answer, because it requires a very deep understanding of CPU caches, memory and I/O systems to fully describe the tradeoffs.</p>
<p><span id="more-519"></span>
<p>Let us first look at the political reasons for virtualising: Operations teams, for very good reasons, often try to push developers towards virtualised systems – cloud is just the latest in this ongoing trend. They try to provide an abstraction between application code and the nasty, physical logistics of data centers – making their job easier. The methods the operations teams employ take many forms: VLAN, SAN, private clouds and <a href="http://www.vmware.com/">VMware</a>/<a href="http://www.microsoft.com/hyper-v-server/">Hyper-V</a> to quote a few examples. Virtualising will increase their flexibility – and drive down <strong>their</strong> cost per machine, which looks great in the balance sheet. However, this flexibility comes at a very high cost. It has been said that:</p>
<p>&#160;</p>
<p align="center"><em><font size="4">“</font></em><a href="http://en.wikipedia.org/wiki/Leaky_abstraction"><em><font size="4">All non-trivial abstractions, to some degree, are leaky</font></em></a><em><font size="4">”</font></em></p>
<p align="right"><strong>Joel Spolsky</strong></p>
<p align="left">In the case of virtualisation, the abstraction provided is very non-trivial indeed and the leaking is sometimes equally extreme. Traditionally, the issue with virtualisation has been slowdown of I/O or network – though this has gotten a lot better with hardware support for virtual hosts (though SAN still haunts us). Over-provisioned memory is another good example of virtualisation wreaking havoc with performance. All of these seem to be surmountable though and this is driving cloud forward.</p>
<p align="left">However, lately it is becoming increasingly clear that scheduling, NUMA and L2/L3 cache misses are potentially an even larger problem and one that will surface once you take I/O out of the bottleneck club.</p>
<p align="left">As we grow our data centers to cloud massive scale and pay for compute power by the hour, every machine counts and will figure in the balance sheet. It should also be clear that an important optimisation will be to focus on the performance of individual scale nodes – to make the best use of the expensive power.</p>
<p align="left">This morning, I ran into some fascinating research in this area by Barret Rhoden, Kevin Klues, David Zhu and Eric Brewer, who take this to another level:</p>
<p style="padding-left:20px;">“<a href="http://research.klueska.com/pubs/socc11-akaros.pdf">Improving Per-Node Efficiency in the Datacenter with New OS Abstractions</a>” (pdf)</p>
<p align="left">To whet your appetite, here is a quote from the abstract (my highlight).</p>
<p style="padding-left:20px;" align="left"><em>“We believe datacenters can benefit from more focus on per-node efficiency, performance, and predictability, versus the more common focus so far on scalability to a large number of nodes. Improving per-node efficiency decreases costs and fault recovery because fewer nodes are required for the same amount of work. <font>We believe that the use of complex, general-purpose operating systems is a key contributing factor to these inefficiencies</font>.”</em></p>
<p align="left">A highly recommended read and a good primer on some of the things that concern me a lot these days.</p>
<h3 align="left">Kejser’s Law</h3>
<p align="left">I think it is time for me to state my own law (or trivial insight if you will) of computing. Though I stand here on the shoulders of giants, I will steal a bit of the fame. I think it is appropriate that I state one of the things I aim to show people at conferences:</p>
<p align="left">&#160;</p>
<p align="center"><em><font size="4">“Any shared resource in a non-trivial scale workload, will eventually bottleneck”</font></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/05/01/reading-material-abstractions-virtualisation-and-cloud/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/05/two-sockets_thumb.png" medium="image">
			<media:title type="html">Two Sockets</media:title>
		</media:content>
	</item>
		<item>
		<title>Why You Need to Stop Worrying about UPDATE Statements</title>
		<link>http://blog.kejser.org/2012/04/27/why-you-need-to-stop-worrying-about-update-statements/</link>
		<comments>http://blog.kejser.org/2012/04/27/why-you-need-to-stop-worrying-about-update-statements/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 19:02:32 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[FusionIO]]></category>
		<category><![CDATA[Keys]]></category>
		<category><![CDATA[Patterns]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Relational]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=514</guid>
		<description><![CDATA[There seems to be a myth perpetuated out there in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses. Let us have a look at the facts for a moment and weigh up if this myth has any merit. Transaction Logged Data In traditional, relational, ACID property, rows [...]]]></description>
			<content:encoded><![CDATA[<p>There seems to be a myth perpetuated out there in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses.</p>
<p>Let us have a look at the facts for a moment and weigh up if this myth has any merit.</p>
<p><span id="more-514"></span><br />
<h3>Transaction Logged Data</h3>
<p>In traditional relational databases (ACID compliant, with rows stored in pages), we typically distinguish between two types of DML operations from a transaction logging perspective:</p>
<ul>
<li>Row Logged </li>
<li>Allocation Logged </li>
</ul>
<p>Row logged operations will write a transaction log entry every time a row/tuple is modified. This means that the amount of transaction log traffic generated is proportional to the number of rows touched.</p>
<p>Allocation Logged (called: “Minimal logging” in SQL Server) operations only write the physical allocations to the transaction log, if at all. This means the log traffic (if any) is proportional to the size of the data touched. This typically generates at least an order of magnitude fewer log entries than row logged, and is thus faster… Or is it? Read on…</p>
<p>Typically, ACID databases only allow bulk style loads, index builds and large table/partition truncations and drops to be allocation logged. UPDATE and DELETE statements tend to be fully row logged.</p>
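<p>To get a feel for the difference in log record counts, here is a rough Python sketch. It is a toy model, not how any particular engine accounts for its log: it assumes one log record per modified row for row logging and one log record per allocated page for allocation logging. The 8 KB page size matches SQL Server data pages; the row count and row width are made-up numbers for illustration.</p>

```python
import math

PAGE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB; other engines differ


def row_logged_records(rows: int) -> int:
    """Row logging: assume one log record per modified row (toy model)."""
    return rows


def allocation_logged_records(rows: int, row_bytes: int) -> int:
    """Allocation logging: assume one log record per allocated page (toy model)."""
    return math.ceil(rows * row_bytes / PAGE_BYTES)


# Hypothetical workload: one million rows of 100 bytes each.
rows, row_bytes = 1_000_000, 100
print("row logged records:       ", row_logged_records(rows))
print("allocation logged records:", allocation_logged_records(rows, row_bytes))
```

<p>With roughly 80 such rows fitting on a page, the allocation logged count in this model comes out well over an order of magnitude lower than the row logged count.</p>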
<p><strong>Myth:</strong> This difference in allocation structures makes UPDATE “bad” because the transaction log is the bottleneck.</p>
<p><strong>Reality:</strong> The picture is quite a bit more nuanced than that.</p>
<p>First of all, INSERT in bulk mode is not always allocation logged. Typically a lot of conditions have to be true for allocation logging to work. For those of you interested in SQL Server, I have written about this extensively here: <a href="http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx">The Data Loading Performance Guide</a>.</p>
<p>Second, the transaction log bottleneck “wall” is widely exaggerated. I have personally driven 750MB/sec write log traffic into a single database in SQL Server using a <a href="http://www.fusionio.com">FusionIO</a> card. I have seen colleagues do 120MB/sec with traditional, 15K spindles. True: I have also driven 3GB/sec (around 10TB in an hour) using allocation logged INSERT, which is faster than its row logged sibling. However, you have to ask yourself the question: Do you need to go <strong>that</strong> fast in a single database? In an MPP system you will also have several log files that work together to provide linear scale of log traffic with no relevant roof.</p>
<p>Third, it is perfectly conceivable that you are running a warehouse database that does NOT need to serialize DML operations or may even write entire blocks directly to the database instead of the transaction log. Such systems simply do not have the above bottleneck. For example, Netezza uses a “logless” implementation and so do many NoSQL database systems and Hadoop/Hive. MySQL with the MyISAM engine also allows non-logged operations.</p>
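<p>As a sanity check on the throughput figures above, this small Python sketch converts a sustained rate into the volume written per hour. It assumes decimal units (1 TB = 1000 GB) and a rate that holds steady for the full hour; the function name is just for illustration.</p>

```python
def tb_per_hour(gb_per_sec: float) -> float:
    """Convert a sustained GB/sec rate to TB/hour using decimal units."""
    return gb_per_sec * 3600 / 1000


# Rates quoted in the text, expressed in GB/sec.
for label, rate in [("allocation logged INSERT", 3.0),
                    ("row logged, FusionIO", 0.75),
                    ("row logged, 15K spindles", 0.12)]:
    print(f"{label}: {tb_per_hour(rate):.2f} TB/hour")
```

<p>The 3GB/sec case works out to a little under 11TB per hour, in line with the “around 10TB in an hour” figure, and even the row logged rates add up to terabytes per hour or close to it.</p>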
<p>&#160;</p>
<h3>INSERT/DELETE vs. in-place UPDATE</h3>
<p>In a traditional, relational database an UPDATE can be implemented either as an in-place modification of the existing row or as a transactionally wrapped INSERT of the new data followed by a DELETE of the old data (or the other way around).</p>
<p><strong>Myth:</strong> Because UPDATE is an INSERT and a DELETE this can double the amount of data that needs to be written during an UPDATE operation and will make me hit the transaction log wall even faster.</p>
<p><strong>Reality:</strong> If you are not modifying the keys in an index or making the column size wider, UPDATE statements can be executed as in-place modifications of the row. This allows the database to only write a special old/new value into the transaction log. This optimization can even be applied on a column by column basis, further reducing the transaction log footprint. This leads us to:</p>
<p>&#160;</p>
<h3>UPDATE vs. INSERT speed</h3>
<p>This myth is an amalgam of the above arguments. </p>
<p><strong>Myth:</strong> INSERT of row logged data is faster than UPDATE of row logged data.</p>
<p><strong>Reality:</strong> Let us first settle one thing which I will type on its own line and in red to make it easy to remember:</p>
<p style="padding-left:50px;padding-right:50px;color:red;"><strong>An in-place, row logged, UPDATE operation on a non compressed page is <u>faster</u> than doing a row logged INSERT of the same data.</strong> </p>
<p>Why is this? Because the INSERT operation has to allocate new physical structures in the index, while the UPDATE can simply reuse database pages without having to allocate more space. </p>
<p>And here are the numbers to prove it, from running INSERT vs. UPDATE of a large dataset in SQL Server (smaller is faster):</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/04/insert-vs-update.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="INSERT vs UPDATE" border="0" alt="INSERT vs UPDATE" src="http://kejserbi.files.wordpress.com/2012/04/insert-vs-update_thumb.png?w=545&h=322" width="545" height="322" /></a></p>
<p><strong>True:</strong> if your UPDATE statement has to do the INSERT/DELETE trick, it will likely be slower. But if you are NOT changing the row size and you get the in-place UPDATE, it might just be FASTER to run an UPDATE than an insert.</p>
<p><strong>Also true:</strong> compression can change the game quite significantly depending on the compression algorithm you use for the table structure. This is again implementation dependent.</p>
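<p>The allocation argument can be illustrated with a deliberately simplified Python sketch. The <code>ToyHeap</code> class below is hypothetical and ignores logging, latching, indexes and compression entirely; it only counts how many pages have to be allocated when rows are inserted, compared with overwriting rows of the same width in place.</p>

```python
class ToyHeap:
    """Toy page store with a fixed number of equally sized row slots per page."""

    def __init__(self, rows_per_page: int = 4):
        self.rows_per_page = rows_per_page
        self.pages = []        # each page is a list of rows
        self.allocations = 0   # number of pages allocated so far

    def insert(self, row):
        # A missing or full last page forces a new page allocation.
        if not self.pages or len(self.pages[-1]) == self.rows_per_page:
            self.pages.append([])
            self.allocations += 1
        self.pages[-1].append(row)

    def update_in_place(self, index: int, row):
        # Same-width update: overwrite the existing slot, no allocation.
        page, slot = divmod(index, self.rows_per_page)
        self.pages[page][slot] = row


heap = ToyHeap()
for i in range(10):
    heap.insert(("key", i))
after_insert = heap.allocations

for i in range(10):
    heap.update_in_place(i, ("key", i * 2))
after_update = heap.allocations

print("pages allocated by the inserts:", after_insert)
print("extra pages allocated by the updates:", after_update - after_insert)
```

<p>In this model the inserts pay for every page they fill, while the in-place updates allocate nothing. A real engine has many more moving parts, so treat this as intuition rather than a benchmark.</p>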
<h3>Summary</h3>
<p>The myth about UPDATE statements being bad or slow is too simple a way to look at this crucial DML operation. In fact, the myth is outright false in some cases.</p>
<p>Avoiding UPDATE statements should <strong>not</strong> generally be a major driver for design guidance, or used as the basis for drawing any conclusions about the data modeling techniques you should apply. The picture is quite a bit more nuanced than this.</p>
<p>First of all, the speed of UPDATE statements as compared to bulk inserts will depend on the database engine you run on.</p>
<p>Second, we have seen that even when UPDATE is fully row logged, the transaction log “wall” is very far away on proper hardware and not much of a concern to 99% of all the installations out there. There are of course cases where you will hit the “wall”, but those are largely mitigated in MPP systems or other sharded deployments that have more than one transaction log.</p>
<p>Third, there are cases where UPDATE statements are actually <strong>faster</strong> than (row logged) INSERT statements. These typically occur when you change columns in such a way that they don’t grow larger than they already are, allowing in-place UPDATE operations. An interesting and highly relevant example of such a case is UPDATE statements that target fact table keys – which (if you follow my guidance) are integers and therefore have constant width.</p>
<p>&#160;</p>
<p><strong>Bonus Exercise and chance to win (for SQL Server people):</strong> Here is an interesting experiment. In theory, it might be possible to create a workload where you UPDATE a very wide table and where the equivalent, minimally logged, super optimized “copy to new table” BULK INSERT or SELECT INTO statement is actually <strong>slower</strong> than the UPDATE. Where is the crossover point on table width? I will offer a free, one-hour teaching session (database subject of your choice; you host me in the London City area, I bring the coffee) to the first person who can provide a test script and the data to show this crossover point. Alternatively, the prize can also be claimed if you can conclusively prove that the crossover point does not exist.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/04/27/why-you-need-to-stop-worrying-about-update-statements/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/04/insert-vs-update_thumb.png" medium="image">
			<media:title type="html">INSERT vs UPDATE</media:title>
		</media:content>
	</item>
		<item>
		<title>Conference Updates</title>
		<link>http://blog.kejser.org/2012/04/25/conference-updates/</link>
		<comments>http://blog.kejser.org/2012/04/25/conference-updates/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 20:26:17 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Conference]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=510</guid>
		<description><![CDATA[A few weeks ago, I presented at SQL Bits X. The slides from my co-presentation with Conor are available here: Conor vs. SQL Last week, I spoke at Microsoft Open World 2012 doing quite a lot of presentations. I have created a new and extended “Grade of the Steel” deck that I hope to blog [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, I presented at SQL Bits X. The slides from my co-presentation with Conor are available here: </p>
<ul>
<li><a href="http://blogs.msdn.com/b/conor_cunningham_msft/archive/2012/04/03/conor-vs-sqlbits-x-slide-decks.aspx">Conor vs. SQL</a> </li>
</ul>
<p>Last week, I spoke at <a href="http://mow2012.dk/program#">Microsoft Open World 2012</a> doing quite a lot of presentations. I have created a new and extended “Grade of the Steel” deck, and I hope to blog more about my findings (these things tend to start as PowerPoint slides and evolve into blog entries).</p>
<p>If you want a copy of any of my presentations, please just mail me. Speaking of presentations, I have found this really cool tool that will compress slides: <a href="http://www.neuxpower.com/">NXPowerLite</a>. This neat tool has taken off around 60-80% of my slide sizes.</p>
<p>Lately, I have been practicing zen style presentations: max four sentences on each slide and heavy use of visualisations instead. I highly recommend putting yourself under such constraints. After I removed the text from my slides I feel much better improvising on stage. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/04/25/conference-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>
	</item>
		<item>
		<title>Setting Yourself up for Debugging</title>
		<link>http://blog.kejser.org/2012/03/14/setting-yourself-up-for-debugging/</link>
		<comments>http://blog.kejser.org/2012/03/14/setting-yourself-up-for-debugging/#comments</comments>
		<pubDate>Wed, 14 Mar 2012 10:23:00 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Grade of the Steel]]></category>
		<category><![CDATA[Utilities]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tuning]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=504</guid>
		<description><![CDATA[Lately, I have been doing a fair amount of debugging, and helped other people set up their debuggers. It can be very painful to make this work. To assist up and coming debuggers, I put my notes together in this blog. Here is how to get things going. Downloading and Installing the Tools First, you [...]]]></description>
			<content:encoded><![CDATA[<p>Lately, I have been doing a fair amount of debugging, and helped other people set up their debuggers.</p>
<p>It can be very painful to make this work. To assist up and coming debuggers, I put my notes together in this blog. Here is how to get things going.</p>
<p><span id="more-504"></span><br />
<h3>Downloading and Installing the Tools</h3>
<p>First, you need the full tool package. Fortunately, all you need is now packaged in the Windows SDK to ease the pain:</p>
<ul>
<li>Download and install Windows SDK for .NET 4.0 from here: <a href="http://msdn.microsoft.com/en-us/windows/hardware/gg463009/">http://msdn.microsoft.com/en-us/windows/hardware/gg463009/</a>. </li>
</ul>
<p>Do the full install of all packages. They will take several GB. Please make sure you have a large drive (more than 100GB of space), you will need it.</p>
<h3>Symbol Setup</h3>
<p>This is the tricky part. In order to debug properly, you need symbols. Getting those set up requires a few tricks.</p>
<p>First, create directories to hold symbols and symbol caches. It is generally a good idea to stick to the naming convention that Microsoft uses in their examples:</p>
<ul>
<li>Create <strong>C:\Symbols</strong> </li>
<li>Create <strong>C:\SymCache</strong> </li>
</ul>
<ul>The next step is to configure the symbol environment variables. While you CAN debug without them, setting them up is really worth the effort. First, locate your environment variables in <strong>My Computer –&gt; Properties –&gt; Advanced System Settings –&gt; Advanced</strong> (why is all the good stuff always located in “Advanced”?). Here you will find:</ul>
<ul><a href="http://kejserbi.files.wordpress.com/2012/03/settingupdebug.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="SettingUpDebug" border="0" alt="SettingUpDebug" src="http://kejserbi.files.wordpress.com/2012/03/settingupdebug_thumb.png?w=379&h=463" width="379" height="463" /></a></ul>
<ul>You need to set up two environment variables. I prefer to set them as system variables, since I am the only one using my machine.</ul>
<p>Assuming you use the directories above, set your symbol paths like this:</p>
<ul>
<li><strong>_NT_SYMBOL_PATH</strong>=SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols </li>
<li><strong>_NT_SYMCACHE_PATH</strong>=C:\SymCache </li>
</ul>
<ul>Note that the elements of the symbol path are separated with asterisks (*), not semicolons.</ul>
<h3>Main tools overview</h3>
<p>There are some key command line tools that you will need to get familiar with:</p>
<ul>
<li><strong>SqlDumper.exe</strong> – Used to generate dumps and minidumps. There are other tools that do the same, but I prefer this one. The usage is described here: <a title="SqlDumper information KB" href="http://support.microsoft.com/kb/917825">KB 917825</a> </li>
<li><strong>WinDbg.exe</strong> – A lightweight debugger for analysing dumps. Leaves a lot to be desired in usability, but it is really fast once you memorise the arcane commands required. </li>
<li><strong>SymChk</strong> &#8211; Used to check if symbols can be resolved. Gives verbose information with the /v command-line switch, which is useful to validate symbol paths. See: <a title="SymChk" href="http://msdn.microsoft.com/en-us/library/windows/hardware/ff558845(v=vs.85).aspx">SymChk Command-Line Options</a>. </li>
<li><strong>SymStore.exe</strong> &#8211; Allows you to create your own symbol stores (under C:\Symbols for example). This is useful if you are debugging your own code, or code whose symbols don't originate from Microsoft. See: <a title="SymStore command-line reference" href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms681378(v=vs.85).aspx">SymStore Command-Line Options</a>. </li>
<li><strong>xperf.exe</strong> &#8211; The mother of all code profilers. This tool comes with an enormous number of command-line parameters. It would require an entire blog entry just to list my favourites (Hint: I may do one). Documentation is hard to come by, but here is a good intro: <a href="http://blogs.microsoft.co.il/blogs/sasha/archive/2008/03/15/xperf-windows-performance-toolkit.aspx">All Your Base are Belong to Us – Xperf</a>. </li>
<li><strong>xperfview.exe</strong> &#8211; Used to view the traces generated by <strong>xperf</strong> </li>
</ul>
<h3>WinDbg Cheat Sheet</h3>
<p>There are some commands that I use frequently in WinDbg. I would recommend you minidump a process you know while it is working in a test system (<strong>sqlservr.exe</strong> for example) and try them out. It is really quite amazing what you can see.</p>
<table border="1" cellspacing="0" cellpadding="0" width="598">
<tbody>
<tr>
<td valign="top" width="150"><strong>Command</strong></td>
<td valign="top" width="446"><strong>Description</strong></td>
</tr>
<tr>
<td valign="top" width="150">!uniqstack</td>
<td valign="top" width="446">Shows all unique stacks. This is generally the first command I run to get an overview of the dump</td>
</tr>
<tr>
<td valign="top" width="150">~</td>
<td valign="top" width="446">Show threads in the dump</td>
</tr>
<tr>
<td valign="top" width="150">∼&lt;thread&gt;s</td>
<td valign="top" width="446">Switch to &lt;thread&gt;. Useful for digging into the thread stack and data structures</td>
</tr>
<tr>
<td valign="top" width="150">k</td>
<td valign="top" width="446">Shows the thread stack. You can use <strong>~*kn</strong> to show all thread stacks</td>
</tr>
<tr>
<td valign="top" width="150">d&lt;x&gt;</td>
<td valign="top" width="446">There are different versions of this command (for example <strong>dd</strong>, <strong>dq</strong> and <strong>du</strong>), but all are used to display the data stored at a memory address or address range.</td>
</tr>
<tr>
<td valign="top" width="150">.frame &lt;x&gt;</td>
<td valign="top" width="446">Look at a stack frame</td>
</tr>
<tr>
<td valign="top" width="150">.reload</td>
<td valign="top" width="446">Retries symbol loading. Useful if you locate a missing symbol and want to see if it matches the dump</td>
</tr>
<tr>
<td valign="top" width="150">.cls</td>
<td valign="top" width="446">Clears the screen (which tends to get crowded)</td>
</tr>
<tr>
<td valign="top" width="150">q</td>
<td valign="top" width="446">Quit WinDbg</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/03/14/setting-yourself-up-for-debugging/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/03/settingupdebug_thumb.png" medium="image">
			<media:title type="html">SettingUpDebug</media:title>
		</media:content>
	</item>
		<item>
		<title>When Statistics are not Enough &#8211; Search Patterns</title>
		<link>http://blog.kejser.org/2012/02/21/when-statistics-are-not-enough-search-patterns/</link>
		<comments>http://blog.kejser.org/2012/02/21/when-statistics-are-not-enough-search-patterns/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 19:14:46 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Histogram]]></category>
		<category><![CDATA[I/O]]></category>
		<category><![CDATA[Patterns]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=496</guid>
		<description><![CDATA[Co-author: Lasse Nedergaard Yesterday, Lasse ran into an issue with a query pattern in the large database that he is responsible for. Based on our conversation, we wrote up this blog and created a repro. The troublesome query we were debugging executed like this: Find a list of key values to search for Insert these [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Co-author: </strong>Lasse Nedergaard</p>
<p>Yesterday, Lasse ran into an issue with a query pattern in the large database that he is responsible for. Based on our conversation, we wrote up this blog and created a repro.</p>
<p>The troublesome query we were debugging executed like this:</p>
<ol>
<li>Find a list of key values to search for </li>
<li>Insert these keys in a temp table – let's call this the <strong>SearchFor</strong> table </li>
<li>Join the temp table to a large table (let's call it <strong>BigTable</strong>) and retrieve the full row from the large table</li>
</ol>
<p>Why not use a correlated sub query in step 2? In this case, the customer in question had multiple code paths (including one accepting XML queries) that all needed to pass thousands of keys to a final search procedure. They wanted a generic way to pass these key filters to the final access of <strong>BigTable</strong>. The schema will make it clearer.</p>
<h3>Schema</h3>
<p>We created a repro and it turns out to be a great look into the strange world of database statistics. Here is the schema:</p>
<blockquote>
<p><font face="Courier New"><strong>CREATE TABLE</strong> BigTable        <br />(        <br />&#160; NearKey UNIQUEIDENTIFIER NOT NULL        <br />&#160; , Payload char(200) NOT NULL        <br />&#160; , RowType INT NOT NULL        <br />)</font></p>
<p><font face="Courier New">/* Insert rows to simulate large table with 177 being the skewed value */</font></p>
<p><font face="Courier New"><strong>INSERT INTO</strong> BigTable <strong>WITH (TABLOCK)</strong> (NearKey, Payload, RowType)        <br /><strong>SELECT </strong>NEWID()        <br />&#160; , REPLICATE('X', 200)       <br />&#160; , CASE WHEN n%100 BETWEEN 0 AND 50 THEN 177 ELSE n%100 END        <br /><strong>FROM</strong> dbo.fn_nums(1000000)</font></p>
<p><font face="Courier New">/* Creates indexes and statistics */</font></p>
<p><font face="Courier New"><strong>CREATE INDEX</strong> IX_NearKey <strong>ON</strong> BigTable (NearKey)        <br /><strong>CREATE STATISTICS</strong> STAT_RowType <strong>ON </strong>BigTable(RowType)</font></p>
<p><font face="Courier New"><strong>UPDATE STATISTICS</strong> BigTable WITH FULLSCAN</font></p>
<p><font face="Courier New">/* Insert rows that are NOT in the statistics, with RowType = 199 */</font></p>
<p><font face="Courier New"><strong>INSERT INTO </strong>BigTable (NearKey, Payload, RowType)        <br /><strong>SELECT </strong>NEWID(), 'New Rows Not stat updated', 199 /* New type */        <br /><strong>FROM </strong>dbo.fn_nums(100000)</font></p>
<p><font face="Courier New">/* Create generic search table */</font></p>
<p><font face="Courier New"><strong>CREATE TABLE</strong> #SearchFor (NearKey UNIQUEIDENTIFIER)</font></p>
<p>&#160;</p>
</blockquote>
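<p>The script above calls <strong>dbo.fn_nums</strong>, whose definition is not shown in this post. Below is a minimal sketch of one way it could be written, assuming it returns a single column named <strong>n</strong> (the name the CASE expression relies on) holding the integers 1 to @n. The stacked cross join is a common numbers-table pattern and is our assumption here, not necessarily the function used in the original repro. It has to exist before the INSERT statements are run:</p>

```sql
-- Hypothetical helper: the repro calls dbo.fn_nums but does not show it.
-- Assumption: one column named n, containing the values 1..@n.
-- CREATE FUNCTION must be the only statement in its batch.
CREATE FUNCTION dbo.fn_nums (@n BIGINT)
RETURNS TABLE
AS
RETURN
WITH
  L0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
  L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),  -- 4 rows
  L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),  -- 16 rows
  L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),  -- 256 rows
  L4 AS (SELECT 1 AS c FROM L3 AS a CROSS JOIN L3 AS b),  -- 65,536 rows
  L5 AS (SELECT 1 AS c FROM L4 AS a CROSS JOIN L4 AS b)   -- 4,294,967,296 rows
-- TOP stops row generation as soon as @n rows have been numbered
SELECT TOP (@n) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM L5;
```

<p>Any function that returns consecutive integers in a column called <strong>n</strong> will do; only the distribution of n%100 matters for the skew.</p>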
<p>In production, <strong>BigTable</strong> is a very large table with billions of rows. Each row in the table has a <strong>RowType</strong> that describes metadata about the row. The metadata is used to populate <strong>#SearchFor</strong> before accessing <strong>BigTable</strong>, as this saves significant I/O.</p>
<p>Things to note about <strong>BigTable</strong>:</p>
<ul>
<li>Statistics are fully up to date for every value except <strong>RowType = 199</strong>. As long as we search for the other values, we have given the optimizer ALL the statistical information we can </li>
<li>There is big skew in the <strong>RowType </strong>column. </li>
<li>There are 100K rows that are not reflected in the statistics (you can run <strong>DBCC SHOW_STATISTICS (BigTable, STAT_RowType) WITH HISTOGRAM </strong>to convince yourself of this)</li>
</ul>
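<p>A quick way to check both the skew and the missing histogram step is sketched below. The alias NumRows is ours, and the counts in the comments assume that <strong>dbo.fn_nums</strong> returns consecutive integers, so that every remainder of n%100 occurs 10,000 times in the first insert:</p>

```sql
-- Actual row counts per RowType in the repro table.
-- Under the assumption above this should show:
--   177      : 510,000 rows (51 remainders, 0 to 50, times 10,000)
--   199      : 100,000 rows (second insert, after the stats update)
--   51 to 99 : 10,000 rows each
SELECT RowType, COUNT(*) AS NumRows
FROM BigTable
GROUP BY RowType
ORDER BY NumRows DESC;

-- What the optimizer knows: the histogram was built before
-- the RowType = 199 rows were inserted, so it has no step for 199.
DBCC SHOW_STATISTICS (BigTable, STAT_RowType) WITH HISTOGRAM;
```

<p>Comparing the two result sets makes the gap visible: the table holds 100K rows of a type the histogram has never seen.</p>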
<h3>Query Problems</h3>
<p>When a user queries this system, the code first populates <strong>#SearchFor</strong>. Typical search procedures in the system look like this:</p>
<blockquote><p><font face="Courier New"><strong>CREATE PROCEDURE </strong>GetDataA        <br /></font><font face="Courier New"><strong>AS</strong></font> </p>
<p><font face="Courier New"><strong>SELECT </strong>… <strong>INTO </strong>#SearchFor</font> </p>
<p><font face="Courier New"><strong>EXEC </strong>dbo.ReturnFromBigTable</font> </p>
</blockquote>
<p><strong>ReturnFromBigTable</strong> contains the troublesome statement that is the subject of this blog:</p>
<blockquote><p><font face="Courier New"><strong>SELECT</strong> *         <br /><strong>FROM </strong>BigTable         <br /><strong>JOIN </strong>#SearchFor         <br /><strong>&#160; ON</strong> BigTable.NearKey = #SearchFor.NearKey        <br /><strong>WHERE</strong> RowType = &lt;SomeValue&gt;</font></p>
<p>&#160;</p>
</blockquote>
<p>Looks quite innocent, doesn’t it? </p>
<p>To simulate the customer’s workload, let us say we are searching for 1% of the rows in <strong>BigTable</strong>. We do this by populating <strong>#SearchFor</strong>.</p>
<blockquote><p><font face="Courier New"><strong>INSERT INTO</strong> #SearchFor        <br /><strong>SELECT </strong>TOP 10000 NearKey <strong>FROM </strong>BigTable&#160;&#160;&#160; <br /><strong>TABLESAMPLE </strong>(1 PERCENT)</font>      </p>
</blockquote>
<p>And now, the fun begins! Let's first run our <strong>NearKey</strong> search for a non-skewed <strong>RowType</strong>:</p>
<blockquote><p><font face="Courier New"><strong>SELECT</strong> * <strong>FROM </strong>BigTable         <br /><strong>JOIN </strong>#SearchFor         <br /><strong>&#160; ON</strong> BigTable.NearKey = #SearchFor.NearKey        <br /><strong>WHERE</strong> RowType = 99</font>      </p>
</blockquote>
<p>Plan (only join shown)</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/02/image.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/02/image_thumb.png?w=600&h=162" width="600" height="162" /></a></p>
<p>Looking at Predicate in the properties of the <strong>BigTable</strong> scan, we unsurprisingly see that the predicate is being pushed down: <strong>[dbo].[BigTable].[RowType]=(99)</strong>… Nothing interesting so far. The plan is parallel. Again, not surprising considering that we are scanning 1M rows in <strong>BigTable</strong>.</p>
<p>… but as Lasse and I observed on his workload, something isn’t quite right here. </p>
<p>Let us run the same query, but with the filter <strong>RowType = 99</strong> replaced by <strong>RowType = 177</strong>. The new plan is:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/02/image1.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/02/image_thumb1.png?w=240&h=162" width="240" height="162" /></a></p>
<p>It is at this point your design intuition should wake you up with that good ol’ WTF feeling.&#160; “Wait a minute”, you should say… <u>before</u> (when there were <u>fewer</u> rows coming out of <strong>BigTable</strong>) we were parallel and hashing the output of <strong>BigTable</strong>. Now, when there are more (50x more) rows, we are <u>both</u> reversing the hash order, probing into <strong>BigTable</strong> and hashing <strong>#SearchFor</strong>. Note that all estimates are relatively accurate (within 10%) and the statistics in the space we are searching are fully up to date… If we take a fully naïve approach, trusting relational algebra and query optimizers, there is nothing more we can do here.</p>
<p>But hey, it gets worse: What happens when we go outside the boundaries of the statistics? This could happen, for example, when the stats are not fully updated – a completely normal situation in a database. Searching with <strong>RowType = 199 </strong>(which is not in the histogram) and estimating the plan, we get:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/02/image2.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/02/image_thumb2.png?w=591&h=148" width="591" height="148" /></a></p>
<p>The estimate coming out of <strong>BigTable </strong>is 1 row, the actual is 100K; this is probably going to hurt. </p>
<p>Running the query gives us the deadly:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/02/image3.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/02/image_thumb3.png?w=408&h=189" width="408" height="189" /></a></p>
<p>Congratulations to us, we just committed performance suicide!</p>
<p>Let's just list the execution times of these query variants:</p>
<table border="1" cellspacing="0" cellpadding="2" width="581">
<tbody>
<tr>
<td valign="top" width="128"><strong>Join Strategy</strong></td>
<td valign="top" width="84"><strong>RowType</strong></td>
<td valign="top" width="98"><strong>CPU time</strong></td>
<td valign="top" width="99"><strong>Elapsed</strong></td>
<td valign="top" width="170"><strong>Logical I/O Reads</strong></td>
</tr>
<tr>
<td valign="top" width="130">Hash BigTable         <br />Probe #SearchFor</td>
<td valign="top" width="84">99</td>
<td valign="top" width="98">421ms</td>
<td valign="top" width="99">549ms</td>
<td valign="top" width="170">31429</td>
</tr>
<tr>
<td valign="top" width="131">Hash #SearchFor         <br />Probe BigTable</td>
<td valign="top" width="84">177</td>
<td valign="top" width="98">765ms</td>
<td valign="top" width="99">1293ms</td>
<td valign="top" width="170">31429+62</td>
</tr>
<tr>
<td valign="top" width="131">Merge Join (sort both)</td>
<td valign="top" width="84">199</td>
<td valign="top" width="100">1575ms</td>
<td valign="top" width="102">2261ms</td>
<td valign="top" width="181">31429+62</td>
</tr>
</tbody>
</table>
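<p>The post does not say how these numbers were collected. One way to gather comparable measurements on your own repro, assuming a client that displays the messages output (SQL Server Management Studio, for example), is to switch on the time and I/O statistics around each query variant:</p>

```sql
-- Sketch of a measurement harness; the source does not state that
-- the figures in the table were captured this way.
-- Optional, test systems only: start from an empty buffer pool.
-- CHECKPOINT;
-- DBCC DROPCLEANBUFFERS;

SET STATISTICS TIME ON;  -- reports CPU time and elapsed time per statement
SET STATISTICS IO ON;    -- reports logical and physical reads per table

SELECT *
FROM BigTable
JOIN #SearchFor
  ON BigTable.NearKey = #SearchFor.NearKey
WHERE RowType = 99;      -- repeat with 177 and 199

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

<p>Because <strong>#SearchFor</strong> is filled with TABLESAMPLE, the exact figures will differ from run to run and from machine to machine; it is the relative differences between the plans that matter.</p>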
<p>What is the problem here? From the user’s perspective, the query response time will often vary unpredictably – which is a poor experience. But worse, from a system and DBA perspective, the memory consumption will vary a lot depending on which query plan we end up with. This means it becomes hard to extrapolate the total system throughput and resource requirements based on the queries. If you are of a capitalist mindset, let me translate this: At the end of the day, this costs you money!</p>
<p>Why is this happening? It is happening because there are hidden assumptions here that you have not brought into the light. We are trusting technology (in this case the database statistics and optimizer) with solving something that is a data modeling problem.</p>
<p>This is a very good example of something Thomas has been seeing a lot lately: An unwillingness to be specific about your requirements costs you money – a LOT of money. In other words: There is a price tag associated with ignorance. Let us make it clear what we are NOT saying: There is nothing wrong with ignorance; it is hard to know what you don’t know. Sometimes, you simply don’t know a requirement exists. The trouble arrives when denial and a lack of curiosity and analysis trump the observed data.</p>
<p>Fortunately, Lasse is the curious kind and he asked all the right questions and sought knowledge to remedy the situation.</p>
<p>Let’s get pragmatic, and talk about solutions.</p>
<h3>The solution</h3>
<p>With all the query plans above, which one do we want? In this case, the answer is: “None of them”</p>
<p>Here is an observation about the workload we discovered as we were analysing the solution: It is reasonable to ask the user to accept that response time is proportional to the number of results returned. From a system perspective, we also want the memory consumption and CPU to be proportional to the returned result size. These are questions you have to ask yourself as a system designer, and discussions you can have with users.</p>
<p>This is where computer science comes into the equation. We know:</p>
<ol>
<li>That we can search a B-tree of n rows in lg(n) time (which is as good as constant) </li>
<li>That the returned results are capped by the number of rows (let us call this R) in <strong>#SearchFor</strong> </li>
<li>That we are willing to accept that the runtime and resource use are proportional to the number of rows returned </li>
<li>That we can't live with unpredictable behavior</li>
</ol>
<p>The optimizer KNOWS about the search time and the cap (1+2). But the optimizer does NOT know that we are willing to declare certain properties that narrow the search space (3) and that we want predictability (4). Our “ignorance” has a price, and the optimizer makes a choice that might be optimal, but which unfortunately is also unpredictable.</p>
<p>This is a great case for hinting. And we can do it like this:</p>
<blockquote><p><font face="Courier New"><strong>SELECT </strong>*         <br /><font color="#ff0000"><strong>FROM </strong>#SearchFor             <br /><strong>INNER LOOP JOIN </strong>BigTable             <br />&#160; <strong>ON </strong>BigTable.NearKey = #SearchFor.NearKey</font>          <br /><strong>WHERE </strong>RowType = 177</font></p>
</blockquote>
<p><strong>Note something important here: </strong>The order of the joined tables <u>matters</u> when you use the LOOP hint (which is why we flipped them above). The above hint forces BOTH the join order and the join strategy. </p>
<p>An alternative, which may be more generally applicable, is this rewrite:</p>
<blockquote><p><font face="Courier New"><strong>SELECT </strong>*         <br /><strong>FROM </strong>BigTable <font color="#ff0000"><strong>WITH (FORCESEEK)</strong></font>          <br /><strong>INNER JOIN</strong> #SearchFor          <br />&#160; <strong>ON </strong>BigTable.NearKey = #SearchFor.NearKey         <br /><strong>WHERE </strong>RowType = 177        <br /><font color="#ff0000"><strong>OPTION (LOOP JOIN)</strong></font></font></p>
</blockquote>
<p>The above rewrite does not guarantee join ordering. However, with the FORCESEEK hint, the plan we want is almost guaranteed.</p>
<p>Let us add the hinted plans as new rows in the results table:</p>
<table border="1" cellspacing="0" cellpadding="2" width="581">
<tbody>
<tr>
<td valign="top" width="128"><strong>Join Strategy</strong></td>
<td valign="top" width="84"><strong>RowType</strong></td>
<td valign="top" width="98"><strong>CPU time</strong></td>
<td valign="top" width="99"><strong>Elapsed</strong></td>
<td valign="top" width="170"><strong>Logical I/O Reads</strong></td>
</tr>
<tr>
<td valign="top" width="130">Hash BigTable         <br />Probe #SearchFor</td>
<td valign="top" width="84">99</td>
<td valign="top" width="98">421ms</td>
<td valign="top" width="99">549ms</td>
<td valign="top" width="170">31429</td>
</tr>
<tr>
<td valign="top" width="131">Hash #SearchFor         <br />Probe BigTable</td>
<td valign="top" width="84">177</td>
<td valign="top" width="98">765ms</td>
<td valign="top" width="99">1293ms</td>
<td valign="top" width="170">31429+62</td>
</tr>
<tr>
<td valign="top" width="131">Merge Join (sort both)</td>
<td valign="top" width="84">199</td>
<td valign="top" width="100">1575ms</td>
<td valign="top" width="102">2261ms</td>
<td valign="top" width="181">31429+62</td>
</tr>
<tr>
<td valign="top" width="131">Hinted LOOP</td>
<td valign="top" width="84">99</td>
<td valign="top" width="100">406ms</td>
<td valign="top" width="102">718ms</td>
<td valign="top" width="181">81259</td>
</tr>
<tr>
<td valign="top" width="131">Hinted LOOP</td>
<td valign="top" width="84">177</td>
<td valign="top" width="100">499ms</td>
<td valign="top" width="102">964ms</td>
<td valign="top" width="181">81259</td>
</tr>
<tr>
<td valign="top" width="131">Hinted LOOP</td>
<td valign="top" width="84">199</td>
<td valign="top" width="100">306ms</td>
<td valign="top" width="102">718ms</td>
<td valign="top" width="181">81259</td>
</tr>
</tbody>
</table>
<p>Above, we can see why the optimizer didn’t want to help us! The number of logical reads required for the hinted plans (81259) is much higher than for the optimizer-generated plans (about 31.5K). </p>
<p>The hinted plan shape is:</p>
<p><a href="http://kejserbi.files.wordpress.com/2012/02/image4.png"><img style="background-image:none;padding-left:0;padding-right:0;display:inline;padding-top:0;border-width:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/02/image_thumb4.png?w=600&h=235" width="600" height="235" /></a></p>
<p>Obviously, we could replace the non-clustered index with a clustered one (which would get rid of the RID lookup) to make this query faster. But in this case, this was not a viable solution for other reasons.</p>
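<p>For completeness, that alternative could look roughly like the sketch below. The index name CIX_NearKey is made up for illustration, and since this was not an option in the customer's system, treat it as an experiment for the repro only:</p>

```sql
-- Hypothetical alternative, NOT what was done for the customer.
-- With NearKey as the clustered key, the index seek lands directly
-- on the full row, so no RID lookup is needed.
DROP INDEX IX_NearKey ON BigTable;
CREATE CLUSTERED INDEX CIX_NearKey ON BigTable (NearKey);
```

<p>Rebuilding a table with billions of rows is a heavy operation, and a clustered key affects every other access path to the table, so a change like this has to be weighed against the rest of the workload.</p>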
<h3>Summary</h3>
<p>In this blog, I have shown you an example of database optimizers failing to make the “right choice”. There are several reasons this is happening:</p>
<p><strong>Lack of information:</strong> The database needs declarative information about the user’s intent and what we consider the Right Choice™. In the example, the missing information is: “the speed of the returned result should always be proportional to the result size”. We used a hint to control the optimizer's behavior, using our knowledge of the physical execution of queries. Other people might have taken more extreme steps and argued for noSQL – perhaps too aggressive a step in this case. </p>
<p><strong>Leaky abstractions: </strong>We saw the line between the “logical” and “physical” model break down. We believed that by building the right model, we had supplied enough information to the database. The distinction between physical and logical model has, as Thomas previously argued, always been a pretty useless paradigm. We have to consider the components of the system in a holistic way, but not at the expense of maintaining the overview of individual building blocks.</p>
<p><strong>Optimality vs. Good enough</strong>: The optimizer's default behavior is to look for “optimal plans”. In the plan search space, even a very small space like this example, plans are sometimes found that look optimal from statistics. Yet, these plans may not be optimal or have the properties we seek – or we may fail to recognize the good plans as we search the space. In our case, we just wanted a plan that was “good enough” and had certain predictable attributes. </p>
<p><strong>False hardware assumptions</strong>: The optimizer makes some assumptions about the cost of I/O which unfortunately do not correlate with the reality of a modern machine. It assumes that optimizing for a low number of logical I/Os is a Good Thing™. In this light, the plans generated are the right plans. However, the optimizer does not take into account that the I/O system might be fast, and that there may be enough RAM to make it likely that some of the needed data already resides in the buffer pool. To prove this: Recall that we had fully updated statistics available for queries going after <strong>RowType = 177</strong>. Yet on a petty IBM laptop, the loop-hinted plan, even with an empty buffer pool, is still faster than the plan generated by the optimizer.</p>
<p>The simple query we studied here raises some interesting questions about the design tradeoffs that the data modeler and architect are forced to make. In this case, we were fortunate that Lasse was on the alert for these tradeoffs. With this example, we have seen what the potential consequences of “behavior and model ignorance” are and why you need to be alert to them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/02/21/when-statistics-are-not-enough-search-patterns/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/02/image_thumb.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/02/image_thumb1.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/02/image_thumb2.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/02/image_thumb3.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/02/image_thumb4.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>
	</item>
		<item>
		<title>A DW Venn Diagram</title>
		<link>http://blog.kejser.org/2012/01/28/a-dw-venn-diagram/</link>
		<comments>http://blog.kejser.org/2012/01/28/a-dw-venn-diagram/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 03:43:14 +0000</pubDate>
		<dc:creator>Thomas Kejser</dc:creator>
				<category><![CDATA[Data Warehouse]]></category>
		<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">https://kejserbi.wordpress.com/?p=483</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<p><a href="http://kejserbi.files.wordpress.com/2012/01/image15.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="image" border="0" alt="image" src="http://kejserbi.files.wordpress.com/2012/01/image_thumb15.png?w=507&h=461" width="507" height="461" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kejser.org/2012/01/28/a-dw-venn-diagram/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/93e3150ebaf2321aef3dc4faee057861?s=96&#38;d=retro&#38;r=G" medium="image">
			<media:title type="html">schastar42</media:title>
		</media:content>

		<media:content url="http://kejserbi.files.wordpress.com/2012/01/image_thumb15.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>
	</item>
	</channel>
</rss>
