
Running SQL Server on a Big Box – Configuration

In this next instalment of the big box blog series, I will provide some guidance on how to configure a big box for best performance.

I have previously blogged about my standard guidance for SQL Server configuration. But for a big box, you need more than this.

Think for Yourself!

From the introductory blog, you may recall that I mentioned big boxes are rare. I estimate that 95% of all boxes out there are not big. If you bought a big box, most of the “best practices” and standard guidance you find do not apply to you. You will have to deviate from the standards, sometimes in ways for which there is no “official” guidance. In other words, you can no longer rely on Microsoft to think for you; you have to think for yourself. The corollary is that you can’t rely on me to think for you either. Take everything I say here with a pinch of salt – it may not apply to your specific case.

This doesn’t mean you should start guessing at random configuration changes. I am afraid you just need to bite the bullet and do the hard work of thinking and testing your theories. A solid grounding in scientific methodology will come in handy.

Generic Configuration

Irrespective of the workload you run, there are some general things you need to configure.

Tempdb: One file per core, no exceptions.

Trace Flag 8048: Documented here. This trace flag makes NUMA-partitioned memory objects CPU-partitioned instead, which relieves CMEMTHREAD contention on machines with many cores.

Trace Flag 1236: Documented here, as above. This flag enables partitioning of the database lock, which takes pressure off the lock manager at high core counts.

Latest Windows version and service pack: Your IT department may not have “validated” this. Refer to the “think for yourself” principle. The Windows kernel is constantly evolving, and networking, NUMA awareness, I/O and memory management are among the areas seeing steady improvement. Because you run a big box, those improvements are there for you.

Latest SQL Server service pack: Same argument as above. You run a big box; learn to live dangerously.

Automate collection of spinlock and latch stats: If you are not already collecting the contents of sys.dm_os_spinlock_stats (could someone please ask Microsoft to document this one?) and sys.dm_os_latch_stats, you should be. On a big box, these DMVs become almost as important as sys.dm_os_wait_stats.
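
As a minimal sketch of such a collection (the history table and the idea of an Agent job are my own invention; adapt as needed), you can snapshot the DMV into a table and diff between samples:

-- Hypothetical history table for spinlock samples.
CREATE TABLE dbo.SpinlockStatsHistory (
    sample_time DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    name        NVARCHAR(256) NOT NULL,
    collisions  BIGINT NOT NULL,
    spins       BIGINT NOT NULL,
    backoffs    BIGINT NOT NULL
);

-- Run this from an Agent job every few minutes; the deltas between
-- samples show which spinlocks heat up as you add load.
INSERT INTO dbo.SpinlockStatsHistory (name, collisions, spins, backoffs)
SELECT name, collisions, spins, backoffs
FROM sys.dm_os_spinlock_stats;

-- The same pattern applies to sys.dm_os_latch_stats.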

Networking

You need the best network drivers you can find. With good drivers, the CPU load of the network can be spread evenly over all cores. However, it is often a good idea to affinitise the network (and I/O) to specific cores and set up SQL Server’s affinity mask to avoid those cores. Getting that configuration optimal is trial and error, but XPERF/WPA helps a lot here because it allows you to pinpoint where the kernel time goes. I have generally had the best experience with Intel NICs and prefer them because I am familiar with the driver stack. I hear that Broadcom NICs are now very good too. You may need to turn off TCP/IP Chimney if you see sporadic disconnect errors from SQL Server (the infamous “Communication link failure”).
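
As an illustration of the affinity part (the core numbers are a pure assumption – say, a 48-core box where cores 0 and 1 take the NIC interrupts), the SQL Server side of it can look like this:

-- Sketch: keep SQL Server off cores 0 and 1 so they are free to handle
-- network and I/O interrupts. Find YOUR hot kernel cores with XPERF/WPA
-- first; these numbers are just an example.
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 2 TO 47;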

You should also make yourself familiar with this paper and the method it describes.

What you experience from here on will largely depend on the type of workload you run.

OLTP

This workload is characterised by a high degree of user concurrency and a lot of short running queries. Typically, you will see tens (or hundreds) of thousands of round trips per second to the server.

The great news is that OLTP workloads are the most likely to benefit from a big box, at least if they are well written. The main issue you will have with an OLTP system on a big box is thread coordination. If your main wait stat is locking (LCK), moving to a bigger box is unlikely to help your workload. In fact, it will most likely slow it down, because locks take longer to acquire on a big box. A great example of this effect can be seen in TPC-C on the WAREHOUSE table, which is highly contended with update locks. Moving a small TPC-C workload from a 2-socket to a 4-socket machine will typically slow it down.

Once you have eliminated LCK waits, the next thing you have to worry about is latches and spinlocks. Of these, the latches tend to be the most common.

PAGELATCH

The PAGELATCH wait tends to appear in three situations:

Too few data files: This typically results in waits for PAGELATCH_UP. They are easy to diagnose, and the solution is to add more data files to the filegroups in the database (see the sketch after this list). However, even when your file count equals the number of cores, you may still see a little of this wait. Here is an OLTP workload at 48 cores with a varying number of data files:

[Figure: PAGELATCH waits on a 48-core OLTP workload as the number of data files varies]

Last page INSERT: Typically shows itself as PAGELATCH_EX. This happens when you use IDENTITY or SEQUENCE to generate keys. I have written elsewhere about this scale problem. The solution is to use GUIDs or reversed sequences for keys instead. Fixing this can be a major change to the database design.

Small tables with contested UPDATE: Typically shows itself as PAGELATCH_EX and PAGELATCH_SH. This happens when a small table with narrow rows has a lot of UPDATE statements running against it. The solution is to “pad” the rows so each row fills a page by itself. Adding a CHAR(4000) NOT NULL column with a default will typically fix this. On SQL Server 2014, a default value is not enough; you have to actually populate the padding column with an UPDATE statement. A sketch of both fixes follows below.
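
Here is a sketch of the data file and padding fixes (the database, paths and the dbo.HotCounter table are invented for the example):

-- More data files to spread PAGELATCH_UP on allocation pages. Without
-- a TO FILEGROUP clause the files land in PRIMARY.
ALTER DATABASE MyOLTP ADD FILE
    (NAME = N'MyOLTP_Data2', FILENAME = N'E:\Data\MyOLTP_Data2.ndf', SIZE = 50GB),
    (NAME = N'MyOLTP_Data3', FILENAME = N'F:\Data\MyOLTP_Data3.ndf', SIZE = 50GB);

-- Pad a small, heavily updated table so rows stop sharing pages.
ALTER TABLE dbo.HotCounter ADD Padding CHAR(4000) NOT NULL DEFAULT ('X');

-- The ALTER above can be metadata-only, so force the rows to be
-- physically rewritten at their new width.
UPDATE dbo.HotCounter SET Padding = 'X';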

WRITELOG

The transaction log is the final wall for DML heavy workloads like OLTP. Your best bet is to place the log on 2-4 dedicated 15K spindles behind a good write caching controller – and if you do this, don’t put anything else (like data files) on the log drive. Alternatively, place the log on a PCIe based SSD with low latency. You should aim for log writes taking less than 100 microseconds (0.1 ms).
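
A quick way to check where you stand is the average log write latency since the last restart (a point-in-time average, not a histogram, so treat it as a rough indicator):

-- Average write latency per transaction log file, in milliseconds.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
WHERE mf.type_desc = N'LOG';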

Unfortunately, even if you throw the best disk system at the log, you can still write transactions faster on a 2-socket than on a 4-socket. This happens because the log has a central spinlock protecting it (it has to), and this spinlock can be acquired and released faster on a small system than on a big one. If you see the LOGCACHE_ACCESS spinlock grow very fast as you add more work to the system, you know you are bumping into this bottleneck – and if this is your bottleneck, you are likely better off on a 2-socket.

Spinlocks

Once you really push an OLTP system, you may see spinlock waits increase. It is difficult to say anything general about what to do about these. You simply have to diagnose each one as you see it and come up with a good guess on how to work around it, or ask PSS for a patch.

Some examples of spinlock bottlenecks I have come across:

MUTEX: I ran into this one while doing dynamic SQL with sp_executesql. The solution was to do more work inside the sp_executesql call than in the loop surrounding it (a sketch follows after this list).

SOS_OBJECT_STORE: I have seen this spin in two cases. One of them I blogged about here. The other was related to spins protecting the security data structures; it was “fixed” by making all users sa – not a nice workaround (there was a SQLBits session about this one). I think this has since been fixed properly.

LOCK_HASH: This spin used to be quite common on OLTP workloads but has now largely been patched. From the name, you can guess that it is related to locking. However, you have to drill deeper to understand which lock is causing it. In my particular case, I saw it when a workload used custom schemas (i.e. not dbo), and the workload was improved by moving all objects to dbo! I believe this has now been patched, as I have not been able to reproduce it on the latest service packs.
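
To illustrate the MUTEX workaround (dbo.Target and the row counts are made up for the example), the idea was to move the loop inside the dynamic batch instead of calling sp_executesql once per iteration:

-- Before: one sp_executesql call per iteration, hammering the spinlock.
DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
    EXEC sp_executesql N'INSERT dbo.Target (Id) VALUES (@id)',
                       N'@id INT', @id = @i;
    SET @i += 1;
END;

-- After: one sp_executesql call doing all the work inside the batch.
EXEC sp_executesql N'
    DECLARE @i INT = 0;
    WHILE @i < @n
    BEGIN
        INSERT dbo.Target (Id) VALUES (@i);
        SET @i += 1;
    END;', N'@n INT', @n = 10000;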

Spinlocks of the same name are used in several areas of the engine, so even though my MUTEX spin had one particular solution, yours may need a different one. You just have to diagnose it using the guidance in the paper above.

The only trace flag I am aware of that you need to set to enable the latest fixes is T9024, which is relevant for AlwaysOn systems. If you know of any more of these, please comment.

Data Warehouse

Also known as DSS, Analytics, BI, OLAP, Big Data and whatever other names expensive consulting agencies come up with to sound like they are inventing things instead of renaming them. This workload is characterised by a low degree of query concurrency, but a lot of long running queries that use parallel execution.

First of all, very few parallel queries scale well beyond DOP 16 in SQL Server; index builds are the notable exception. Joe Chang has done several experiments illustrating this, and so has Chris Adkin – their experience matches mine. Because of this limitation, it is tempting to configure max degree of parallelism to 16. However, it is also worth making yourself familiar with Adam Machanic’s “Magic CROSS APPLY” and his make_parallel procedure.
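
Setting the cap is the easy part; a minimal sketch:

-- Cap instance-wide parallelism at DOP 16.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 16;
RECONFIGURE;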

In fact, if you are planning to run a data warehouse on a big box, you need to read all of Adam’s articles before you do anything else. When you have read them, ask yourself: “Why do I need to do all this to make it work?” Then think carefully about which features you REALLY want in the next version (hint: single table restore and XML indexes are probably no longer that important).

ACCESS_METHODS_DATASET_PARENT

This latch is generally an indication that something is bottlenecked in the query plan. One trick that often works is to hash partition the tables involved. Sometimes, you have to dig into the specific operator in the query plan by looking at the resource_description of the CXPACKET wait.
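
A minimal sketch of that drill-down; for CXPACKET, the resource_description contains a nodeId that points back at a node in the query plan:

-- Which plan node are the parallel threads waiting on right now?
SELECT session_id, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type = N'CXPACKET';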

Network

On a data warehouse, there are two network settings you nearly always want:

Network Packet Size: Set it to 32767. It improves throughput a lot (see the sketch after this list).

Jumbo Frames: If possible on your network, enable them.
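
The packet size is an instance-wide setting; a minimal sketch (jumbo frames, by contrast, are configured on the NICs and switches, not in SQL Server):

-- Bump the network packet size to 32767 bytes.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'network packet size', 32767;
RECONFIGURE;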

You are likely going to need at least one or two 10Gbit NICs on a big box. Alternatively, get several 1Gbit NICs.

SELECT INTO and INSERT…SELECT

Until SQL Server 2014, the insert part of these common data warehouse statements is single threaded. You are likely to need some manhandling of the ETL flow to speed up insert rates on a big box; refer to this whitepaper.

If you are not able to parallelise these types of queries, you are again likely to see a regression as you move to a big box. The reason is that small boxes have much lower latency on the transaction log and can run a single thread faster than a big box (because smaller boxes have higher clock rates and lower memory access latency).
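
As a hypothetical sketch of that manhandling (table names and key ranges are invented), you can slice the statement by key range and run each slice on its own connection:

-- Slice 1 of 4. Run the other ranges on separate sessions so the
-- inserts proceed in parallel even though each statement is single
-- threaded.
INSERT INTO dbo.FactSales (SaleId, Amount)
SELECT SaleId, Amount
FROM dbo.StageSales
WHERE SaleId >= 0 AND SaleId < 25000000;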

I/O and Filegroups

Unless your data warehouse largely fits in DRAM, you are going to need a monster I/O system to feed the database engine. Don’t let anyone tell you that SQL Server or Windows cannot do I/O.

SQL Server is a greedy beast: it can eat tens of GB/sec of bandwidth, and Windows will happily deliver. Most likely, you are going to need some serious SSD (prefer bandwidth over latency for a data warehouse) or a Fast Track design. Test the design carefully before deploying it. When using traditional, spinning media, don’t be afraid to leave a lot of space empty in order to get the speed you need.

If you are tuning for sequential I/O on a data warehouse with spinning disks, it is wise to follow the Fast Track guidance. Even if you don’t own a Fast Track reference configuration, the ideas still apply to you. This means having 1-2 files per 2-4 drives and splitting those drives into a lot of LUNs. This can be a real nightmare to manage (you MUST script it – see the sketch below), but the performance is often worth it.
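
A sketch of what such a script generates (the database name, mount points and sizes are placeholders):

-- One file per LUN, pre-sized with autogrow disabled. Generate the
-- file list with a script rather than typing it by hand.
ALTER DATABASE MyDW ADD FILEGROUP DataFG;
ALTER DATABASE MyDW ADD FILE
    (NAME = N'DataFG_01', FILENAME = N'F:\LUN01\DataFG_01.ndf', SIZE = 100GB, FILEGROWTH = 0),
    (NAME = N'DataFG_02', FILENAME = N'F:\LUN02\DataFG_02.ndf', SIZE = 100GB, FILEGROWTH = 0)
    -- ...and so on for each LUN.
TO FILEGROUP DataFG;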

NAND flash is now getting very cheap, in the 1 USD/GB range. This means you can buy a good “enterprise” flash drive like the Intel 3600 for less than what you pay for SAN storage. The cheaper drives can’t take as many write cycles as the expensive ones, but you would be surprised at just how many write cycles they CAN take – they are likely to outlive the server. Additionally, if you are running a data warehouse, chances are you will be reading a lot more than you are writing. Consider whether it is worth the trouble to tune for sequential I/O on spindles, or whether you just want to bite the bullet and get a pile of cheap SSD in some big SATA disk shelves.

Backup

On a big box data warehouse, you are likely to need a dedicated network just to handle backups. Again, SQL Server is capable of hitting the backup system with tens of GB/sec, and that kind of bandwidth is likely to flood your existing backup infrastructure. Consider using a dedicated backup system for the data warehouse.

If the database is larger than a few TB, you should be thinking about a filegroup based backup strategy. Remember to back up your PRIMARY filegroup every time (and don’t place any tables or indexes in it).
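
A minimal sketch, assuming one filegroup per time period (all names invented):

-- Back up PRIMARY (metadata only if you keep user objects out of it)
-- together with the filegroup that actually changed.
BACKUP DATABASE MyDW
    FILEGROUP = N'PRIMARY',
    FILEGROUP = N'FG_2015_Q2'
TO DISK = N'X:\Backup\MyDW_FG_2015_Q2.bak'
WITH COMPRESSION;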

Summary

In this blog, I have described some of the tricks you need to apply if you want to scale on big boxes. I will update it with more details when I have time to write again, but I wanted you to have early access to this guidance.



Trace Flag 4136

This week, I have been tuning a Dynamics C5 database. What a horrible mess that is. However, I ran into something that might come in handy in a lot of situations.

I have long known about Trace Flag 4136 – but until now, I did not realise that this flag is officially documented in a Microsoft source:

Dynamics AX in the Field

This flag is immensely useful and not just for old versions of Dynamics C5. In this blog, I will provide an example of its use and some details about what the flag does.

Continue reading…


Need a Quick Performance Fix?

I am currently working on getting VC funding for a very exciting database project. Updates to follow on this blog once I can talk more about it.

However, while we are getting the paperwork in order, I have some extra time to spare. Instead of being bored doing things like vacation, I would rather be tuning databases and making some money on the side.

If you have an interesting database problem that you need me to solve, please get in touch at thomas@kejser.org. I offer a competitive price if I can work remotely, without travelling, or within London zones 1-2. Short term tasks only, please.

Update: Has SQL Server already lost Mind Share?

In the comments on my previous post, Erdju suggested that I add StackOverflow to my analysis – arguing that more SQL Server questions might be found there, which could change the result of the analysis. Using the data.stackexchange.com site to analyse the data, as suggested by Nick Craver (instead of my own mocked up version), I can now extract recent data for both sites.

This post contains the updated results as well as a few additions to the previous post.

Continue reading…

Is SQL Server Losing Mindshare?

As many of you have noticed, I have been flirting a lot with open source databases lately, and I am no longer spending much time going to SQL Server conferences.

About two years ago, I decided that it was time to diversify my knowledge and spend more time with other database products. Back then, this was largely based on a hunch that open source was getting to the point where it was worth looking into. I didn’t really have any data to back up my decision to lower my investment in SQL Server – it was mostly intuition, a feeling for the zeitgeist.

Today, I wanted to assess where things are. It is difficult to get information about the popularity of databases that is not heavily biased, so I decided to mash up some data myself.

Continue reading…


Database Page Row Order

Does physically ordering rows inside database pages make sense?

A few weeks ago I was asked a great question by Sam, who follows my blog. With Sam’s permission, I am reprinting the question here.

Sam asks (my highlights):

We are told that we can have only one Clustered Index since we can actually sort the data in only one way. However, a clustered index orders data in the leaf pages in sequential order, but does not order the data within that page itself to be in sequential order. Since the data on the page itself is not sorted sequentially (and thus implying more than one way of possible order), I am confused by the “Since we can sort data only in one way we can’t have more than 1 clustered index” reasoning.

Continue reading…

Prefer Mobile Data over WIFI on Mac OSX

With both 3G and 4G becoming more prevalent, I often find myself getting a better connection over my mobile data subscription than via a public hot spot.

The problem is that an unstable WIFI hotspot can make your Mac appear unresponsive, even if you have a perfect mobile data connection through a USB or Bluetooth connected iPhone.

In this blog, I will describe a simple trick that configures Mac OSX to prefer a USB connected iPhone over a WIFI hotspot when both are available.

Continue reading…


Adding Another Transaction Log File to Gain Performance

Mike West recently started a rather active thread on LinkedIn about databases with more than one transaction log file. Eventually, I realised that the full answer to the question “When does it make sense to add another log file to a database for performance reasons?” is complex enough that it needs a blog post available for future reference.

Continue reading…