TPCHFeatured

TPC-H: Data and Query Generation

The TPC council ships two utilities that can be used to generate the data and queries for the TPC-H schema: DBGEN and QGEN.

The utilities are written in C to make them portable between platforms. Unfortunately, the code is rather old and, dare I say, not particularly pretty.

Continue reading…

Domino pieces about to fall

Adding another transaction log file to Gain Performance

Mike West recently posted a rather active thread on LinkedIn about databases with more than one transaction log file. Eventually, I realised that the full answer to the question “When does it make sense to add another log file to a database for performance reasons?” is complex enough that it needs a blog available for future reference. This is my attempt at creating that blog.

Continue reading…

TPCHFeatured

TPC-H: Schema and Indexes

The TPC-H benchmark is often used as a method for customers to evaluate data warehouse products and make purchasing decisions. Because it is such a crucial benchmark, it is important to understand the challenges it presents for database vendors. Unfortunately, the public information about tuning for TPC-H is rather sparse, and good documentation is generally hard to come by. Vendors do not like to be compared with other vendors – so their secrecy is understandable.

In this blog series, I will try to shed some light on the TPC-H benchmark, point out what I think is wrong with it, and share some of my thoughts about the challenges you face when tuning it.

Continue reading…

Shooting yourself in the foot

Windows’ UTF-8 Support is a travesty

As you may have noticed, I have been creating standardised feeds of ISO data lately. In case you missed it, you can find the data here: http://kejser.org/resources/free-data/.

While I was working with this data, I began to notice a pattern: Windows has very poor support for UTF-8. In this post, I am going to be complaining loudly about this!

Continue reading…

A pen and handwriting

Make OSX and Linux command line behave more like Windows

Lately, I have been doing more and more work in my native OSX image. And I am really loving it.

However, being an old Windows user, I find a lot of habits hard to break, and I constantly find myself wanting the standard Bash command line to behave just a little more like good old DOS. To avoid making the same typing errors over and over again, I have found these modifications useful. I am sharing them here in the hope that they may help other people who are transitioning from Windows.

Continue reading…

SQL Server Logo

On using Query Hints

The revealed wisdom in the SQL Server community has generally been that query hints should only be used as a last resort. For a complete SQL Server novice, I would agree. But to call avoiding query hints a “best practice” is taking it too far.

For anyone with a little experience who knows what they are doing, I find that query hints are not only a good reactive solution – they are also a proactive, design-time tool.
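
To illustrate the kind of hint being discussed – this is my own sketch against hypothetical tables, not an example from the post – a query-level hint can pin the join strategy and the degree of parallelism at design time instead of leaving both to the optimiser:

    -- Hypothetical fact/dimension join where the desired plan shape is known up front:
    -- force hash joins and cap the degree of parallelism for the whole query.
    SELECT   d.CalendarYear,
             SUM(f.SalesAmount) AS TotalSales
    FROM     dbo.FactSales AS f
    JOIN     dbo.DimDate   AS d ON d.DateKey = f.DateKey
    GROUP BY d.CalendarYear
    OPTION (HASH JOIN, MAXDOP 8);

When the data volumes and schema are well understood, hints like these document the intended plan as much as they enforce it.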

Continue reading…

Moebius Strip

Table Pattern: Rotating Log / Ring Buffer

Most database systems need some form of log table to keep track of events, for example for auditing purposes. To avoid the log growing forever, it is often a good idea to regularly rotate old log entries out of this table. For small log tables, running a DELETE statement works well for this purpose. However, as the log throughput grows, it is often preferable to use partition switching instead. In this blog, I will show you an implementation of a rotating log table.
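
As a rough sketch of the idea – my own minimal example, not the implementation from the post, and all object names are hypothetical – the log table can be partitioned on a small “slot” column, and rotation then becomes a metadata-only partition switch followed by a truncate:

    -- Four-slot ring: one partition per slot value (0..3).
    CREATE PARTITION FUNCTION pfLogSlot (tinyint)
        AS RANGE LEFT FOR VALUES (0, 1, 2);

    CREATE PARTITION SCHEME psLogSlot
        AS PARTITION pfLogSlot ALL TO ([PRIMARY]);

    CREATE TABLE dbo.EventLog
    (
        LogSlot   tinyint        NOT NULL,  -- which slot of the ring the row lives in
        EventTime datetime2      NOT NULL,
        Message   nvarchar(4000) NOT NULL
    ) ON psLogSlot (LogSlot);

    -- Staging table with the same structure, on the same filegroup.
    CREATE TABLE dbo.EventLog_Stage
    (
        LogSlot   tinyint        NOT NULL,
        EventTime datetime2      NOT NULL,
        Message   nvarchar(4000) NOT NULL
    ) ON [PRIMARY];

    -- Rotation: switch the oldest slot out (metadata only), then throw it away.
    ALTER TABLE dbo.EventLog SWITCH PARTITION 1 TO dbo.EventLog_Stage;
    TRUNCATE TABLE dbo.EventLog_Stage;

Compared with a DELETE, the switch does not have to touch every row being rotated out, which is what makes it attractive at high log throughput.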

Continue reading…

Shooting yourself in the foot

SELECT INTO – Moving Data From A to B

When building a data warehouse, you often find yourself needing to move data that is the result of a query into a new table. In SQL Server, there are two ways to do this efficiently:

  1. SELECT INTO
  2. INSERT INTO WITH (TABLOCK)

While both techniques allow you to achieve bulk logged (i.e. fast) inserts, I am going to argue that method 2 is preferable in most situations.
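
For reference, the two techniques look like this – my own sketch against a hypothetical dbo.FactSales table, not code from the post:

    -- Method 1: SELECT INTO creates the target table as part of the statement.
    SELECT   CustomerKey,
             SUM(SalesAmount) AS TotalSales
    INTO     dbo.SalesSummary
    FROM     dbo.FactSales
    GROUP BY CustomerKey;

    -- Method 2: create the target first, then insert with a TABLOCK hint,
    -- which allows the insert to be bulk logged under the right conditions.
    CREATE TABLE dbo.SalesSummary2
    (
        CustomerKey int   NOT NULL,
        TotalSales  money NOT NULL
    );

    INSERT INTO dbo.SalesSummary2 WITH (TABLOCK) (CustomerKey, TotalSales)
    SELECT   CustomerKey,
             SUM(SalesAmount) AS TotalSales
    FROM     dbo.FactSales
    GROUP BY CustomerKey;

Method 2 requires the schema to be spelled out up front, which is part of what makes it easier to control the data types, constraints and placement of the resulting table.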

Continue reading…

Domino pieces about to fall

High Availability, at High Speed

In the quest for 100% uptime, a great many hours must be invested in careful design. Even when you think you have eliminated every Single Point Of Failure (SPOF) – something new always shows up. It gets worse: all this effort is multiplied if you want to BOTH achieve high availability AND run a system at high speed.

In this blog, I will share some lessons we learned the hard way while tuning our high speed SQL Server mirror.

Continue reading…

A pen and handwriting

How I Host on WordPress

Last week, I was asked how I run my new WordPress site. I thought it might be worth sharing my lessons learnt from the redesign. So here goes.

Continue reading…