Most database systems need some form of log table to keep track of events, for example for auditing purposes. To avoid the log growing forever, it is often a good idea to regularly rotate old log entries out of this table. For small log tables, running a DELETE statement works well for this purpose. However, as the log throughput grows, it is often preferable to use partition switching instead. In this blog, I will show you an implementation of a rotating log table.
When building a data warehouse, you often find yourself needing to move data that is the result of a query into a new table. In SQL Server, there are two way to do this efficiently:
- SELECT INTO
- INSERT INTO WITH (TABLOCK)
While both techniques allow to you achieve bulk logged (i.e. fast) inserts, I am going to argue that method 2 is preferable in most situations.
In the quest for 100% uptime, a great many hours must be invested in careful design. Even when you think you have eliminated every Single Point Of Failure (SPOF) – something new always shows up. It get’s worse: All this the effort is multiplied if you want to BOTH achieve high availability AND run a system at high speed.
In this blog, I will share some lessons we learned the hard way while tuning our high speed SQL Server mirror.
I am a ten finger typist doing around 5 strokes/second – though I do make a lot of mistakes. I learned the craft on a good old fashioned typewriter. The world has moved; unfortunately, keyboards have not. There are keys left in a modern keyboard that are simply of no use anymore – grim reminders of the age of typewriters or just leftovers from ancient (before 2000) times in computers.
Two such keys are CAPS LOCK and INS. As fast typist, you simply don’t need CAPS LOCK. And I have no idea what anyone needs INS for anymore. In other words, hitting either of those keys is always an error. Fortunately, there is a solution that does not require you to take a screwdriver to your keyboard: you can turn those keys off.
The attached registry file will do this nicely. Just unzip and apply.
Featured image: The legendary IBM M13 in Stealth Black.
After all these years, Mac, Windows and Linux still cannot agree on what special characters are used to represent a new line in a text file.
Windows generated text files will typically end a line with CR+LF (0x0D0A). Unix/Linux prefers LF alone (0x0A) and old versions of Macs will use CR alone (0x0D). See this Wikipedia entry for details.
When receiving and loading text files into databases, this causes a mess unless you are careful. Unfortunately, this is particularly true for the BULK INSERT command in SQL Server because the documentation is misleading.
Livedrive is currently looking to hire a rock star DBA and developers with a strong understanding of databases. We have 100TB of mySQL data online and a SQL server mirror running at 15k tx/sec non stop (peaking at 200K) with a nice little 3TB OLTP system.
And my goodness – rock star DBAs are hard to find. For those of you looking for one – or thinking you are one – I wanted to write up my advice.
Here is how I hire a DBA:
During my tuning sessions, there is a race condition I run across so often that I think it is worth blogging about it. It concerns the handling of a common requirement: How do I check if the value I want to insert is in the table already? And if it is, don’t insert the value.
Here is how the typical, non DBA person, often does this:
IF (EXISTS (SELECT * FROM Foo WHERE X = @X))
INSERT INTO Foo (X)
The problem with this pattern is that it is wrong. Under transaction isolation levels less than SERIALIZABLE, the above code does not guarantee that you won’t insert a duplicate.
I have just returned from a trip to Milan, my lovely wife traveling with me. And after a full weekend there, I feel like ranting again. For my trusted readers, if you expect something technical from this, please stop reading now.
In our core system, we have a table with two partitions. One partition contains all the work that “has been done” (which has the column WorkItem set to –1) and the other the “work in progress” (with WorkItem to different values, all > -1).
The reason we have created just two partitions for this table (which is a heap) is that the items that are “work in progress” are often scanned, yet the work that has been done (WorkItem = –1) is the vast majority of the table. This “mini partitioning” is a nice design pattern I often apply to skewed distribution like these. It provides a significant performance boost on table scans. But this week I saw an oddity I have not run into before.