MySQL – First Impressions
In my new job as the CTO of Livedrive, I have the pleasure of working with both Microsoft SQL Server and MySQL. We have a rather nice real estate with tens of petabytes online which keeps us entertained with scale challenges.
Having spent some time with MySQL lately, and being an old SQL Server user, I thought it might be interesting to share some of my early experiences. The good, the bad and the ugly.
The opinions expressed here are my own, not those of my current or previous employers. With that said, lets get on with it.
.NET Support is Excellent
MySQL is beautifully integrated to .NET. Switchers from MSSQL will barely notice the difference in the API. If you code directly against interfaces like IDataReader and IDbConnection (and you should) it is drop and replace. You can connect to both Linux and Windows instances, there isn’t really any difference (unlike for example, DB2 where connectivity is very different across target platforms).
The MySQL connectivity libraries are available as a NuGet package. You can even use the same driver to connect to MariaDb, Percona and Oracle MySQL.
Too much choice?
There are currently at least 4 different versions of MySQL out there: MariaDb, Percona, MySQL (from Oracle) and MemSQL (A hekaton like, in memory optimized version). It takes some time to get used to the ecosystem. The client protocol is identical between the different variants, but picking between them is non trivial. The good news is that both Facebook and Google actively contribute to MySQL, so you will find a lot of cool development mindshare.
But choice isn’t always a good thing. When the solution that must be implemented is clear and well understood, and I would claim that building a database is, pooling resources into solving it The Right Way™ once and for all provides a better experience for the end user.
MySQL Does Not do Query Parallelism
There is no such thing as a parallel query in MySQL! This includes index builds, which are also single threaded. Obviously, this sets some rather serious limitations on what you can reasonably use MySQL for. For large tables, you can work around this limitation by partitioning the tables, but still…
However, you do have the notion of sharding data and using the Shard-Query tool to achieve an effect similar to that of MSSQL, or should I perhaps say PDW.
MySQL does not do Hash Joins
Some might consider this a good thing. If you are an OLTP system, the last thing you want is some developer getting hash join queries on a small dataset, claiming great performance and then horribly breaking stuff when the data grows, the optimiser starts using loops joins and all the missing indexes show their ugly face.
Unfortunately, hash joins are amazing for data warehousing. I have written previously about how they are the bread and butter of good star schema designs. They are sorely missed in MySQL.
MySQL History Lists are EVIL!
(Thanks to Sunny Bains for correcting me on the details of how the purge records are held, I hope this update does it more justice)
MySQL uses MVCC by default, but with a very odd implementation. When a query reads a row that is concurrently updated, the OLD version of the row (the one being read) is kept around in the index. MySQL maintains a list of the old records that are still alive called the “history list”. Eventually, when the old version of the records is no longer visible to queries, they are removed by a background thread – this is called a “purge”. I suppose this implementation is done in an attempt to save on I/O.
Because history lists cannot be “purged” while someone is accessing data in them, what will happen when someone is executing a long running SELECT statement, is that you up making the purge thread fall behind. As the history list grows and long running queries continue to arrive, everything eventually grinds to a halt on the server.
Users of MSSQL or PostGres will find this implementation choice very odd. On older version of MySQL (pre 5.5) the history list purging is single threaded and runs in the main event loop, making things even worse. The history list design is probably the largest design flaw in MySQL and one I hope will be addressed properly.
MySQL is NOT a Data Warehouse Engine
The lack of query parallelism, hash joins and proper handling of long running queries makes MySQL unsuitable as a Data Warehouse engine. MemSQL seems to address some of these issues and there are interesting projects on the that way which may make me change my mind. But for now, I would not use MySQL for anything but OLTP – not even for a mixed workload system.
Users of MSSQL, who often see mixed workloads and handle them nicely, may find themselves frustrated by MySQL when trying to switch. You have been warned.
Auto-increment is Stupid!
As a MSSQL user, you will be deeply familiar with IDENTITY and SEQUENCER to generate keys. MySQL does have the notion of an auto-increment column, which looks somewhat like IDENTITY. However, it has some odd drawbacks:
- The max increment is 64K (which is inconvenient for certain shard strategies)
- You cannot reset the seed value easily, especially not to a value less than the current max value (even though there is plenty of space to generate new values)
- Inserting a value higher than the current value of the auto-increment resets the seed to the new value. In other words: auto-increment is always = SELECT MAX(column)
While the above behaviour protects people against shooting themselves in the foot (having the auto increment duplicate keys for example), it does not provide the freedom to modify key generation in smart ways, for example in replication scenarios.
MySQL supports GUID keys, which has some major advantages that I have previously described.
MySQL has Great Management Tools
There is a large undergrowth of tools for MySQL which fully match the capabilities of Enterprise Manager / Management Studio for MSSQL. There is everything you would want here: Schema exploration, automatic scripting of objects, intellisense, syntax highlighting, version control integration and GUI based management (for those of you who don’t like to script things).
There are a ton of open source tools that help both developers and administrators use the database. My personal favourite is mysql-genocide – which we are using in our QA environment to flush out hard to find bugs.
MySQL Exposes ALL the Knobs
Imagine having access to every undocumented trace flag in SQL Server. That’s what the file my.cnf is all about. In here, you can tweak stuff to your heart’s content. Want to remove ACID properties? There’s a setting for that. Change the way threading works – yep, got that too. Micro manage sort buffer space? – be my guest and shoot yourself in the foot!
Users of SQL Server have a love/hate relationship with trace flags. On the one hand, experts will often wish for certain trace flags that will make designed database that makes assumptions about the world provide reliable response times. On the other hand, the existence of trace flags often lead people to believe that there is some magic “make my crap schema go faster knob”. Instead of fixing the root cause, which is normally bad database design, such people ask Microsoft to help them find and use these magic settings, allowing their denialist mindset to continue… In contrast, the MySQL administrator has a rather powerful armoury that allows him to tweak everything for max performance – or to blow his own brains out.
MySQL has Lovely Documentation and a Strong Ecosystem
I find that the quality of documentation for MySQL is fully on par with SQL Server Book Online. Furthermore, I have observed that when people write about MySQL, they have a respectful and academic (in the sense that they assume intelligence) approach to the reader. Case in point, the O’reilly book: “High Performance MySQL” – which is like the Inside SQL Server books for MySQL. In here you will find a great introduction to MySQL that is both accessible to the novice, yet useful as a reference guide for the expert. Information is presented clearly, concisely and without bloating the book with screenshots or superfluous information. The focus is on the book is making databases go fast, be reliable and provide good response times. The authors never lose sight of that.
While the inevitable Linux religiosity and unsupported elitism DOES taint the MySQL community, the complexity of database engineering tempers this and creates a willingness to learn from mistakes and designs of other database products (including, you will find, MSSQL).
My experience with asking tough questions to the MySQL community, is that this often results in a deeply knowledgeable debate about the trade-offs between different design choices, not just the spewing of “best practices”.
Instrumentation is Lacking – But it is Not as Bad as You May Think
Users of SQL server who have gotten familiar with the DMVs will find that MySQL (which stores this type of information in the INFORMATION_SCHEMA) is lacking for diagnostic capabilities.
The rich querying of operating system resources from inside the database (memory, thread scheduling, CPU usage, wait times) and statistics about the objects in the database are still in their early days in MySQL – on par with what you would expect from SQL Server 7.0/2000. Tracing is reduced to a single “slow query log” which is stored in the file system.
However, when it comes to digesting trace outputs, there are excellent tools out there to analyse traces and group queries up nicely (example: pt-query-digest). Digesting traces beautifully is something that SQL Server has been lacking, the index optimiser just doesn’t cut the mustard. However, one can argue that with sys.dm_exec_query_stats, its less important to have this digest capability, as MSSQL essentially exposes the output of pt-query-digest from INSIDE the database. Replaying traces at scale is also well supported by MySQL (but like SQL Server, it is still a bit hacky).
Online Index Rebuilds and schema modification is basically not available to MySQL users. The usual ways to work around this can of course be applied, but MySQL is lacking in this area. SQL Server is by no means perfect either, but quite a bit further ahead.
(Again a thanks to Sunny for lifting my ignorance on this one and enlightening me on the new features)
In MySQL 5.6, a lot of operations are online as documented here: http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html
We are currently not on that version, but will be moving there soon. Since we have some nice index tweaking to do, I look forward to seeing if MySQL will have similar issue to the SQL Server SCH-M lock issue when committing changes.
Partitioning in MySQL is Cool
MySQL supports some powerful partitioning strategies. As you would expect, you can do sliding window, range style partitions. However, you can also perform hash partitioning (a feature we make good use of at Livedrive) or even COMBINE partitioning schemes into composite partitioning strategies (something that users of Analysis Services should be familiar with).
In SQL Server, the lack of deeply integrated hash partitioning has always been a bugbear of mine. While you can fake it with computed columns, MSSQL lacks the awareness of hashing that the MySQL optimiser supports. As should be obvious from studying MPP designs, hash partitioning is the foundation for highly scalable designs – it will be interesting to see where MySQL takes this feature.
One feature I lack in MySQL partitioning is the ability to have non-aligned indexes. In SQL Server, it is possible to partition an index on a table with a different partition function than the table itself. MySQL forces the same partitioning of all indexes.
Even after some exposure to MySQL I am still split in my opinion about the product. It it obvious that the features of MySQL are targeted towards coders who want to “whip up a quick database” and that the design philosophy is one inspired deeply by OLTP systems. This lack of warehouse database features sets some limits on what it can be used for. On the other hand, I am happy to see that MySQL doesn’t try to be everything to everyone.
As long as the big database vendors continue to charge a premium for their products, there will be functionality sacrifices that developers and DBAs are willing to make to get a free database. Considering the price point (0 USD) – MySQL is a surprisingly powerful product for what it does. It is also encouraging to see that the development of MySQL is still focused on “getting the basics right” – not bloating the database with all sorts of beyond SQL/No-SQL like features.
We live in a world where the Web 2.0 developers have forsworn the beauty of relational algebra (mainly, I suspect, because they don’t actually grok single box parallelism) and where the big vendors have their heads so high up in the cloud that they have forgotten their customer base. It is encouraging to see MySQL keeping the relational candle lit, so that the lost NoSQL souls have somewhere to turn when the ACID burns leave them in the dark.