Ranty thoughts about MySQL, with added Oracle

I regularly use MySQL for many of my personal projects (except for that one that uses SQLite). As such, I have a well-used MySQL server on my main cluster in London, and another well-used MySQL server on my local cluster in my flat, not counting the numerous development MySQL servers sat on pretty much every device I own (including my phone…).

At work, we’re developing and maintaining an application which is backed by an Oracle database, and much of our application logic is contained within this database. As such, I get a lot of exposure to Oracle as well, and have even gone as far as setting up an Oracle XE instance at home, just to play with.

I’m not a cranky person, I don’t often complain about things, but recently as I’ve been getting more and more exposure to Oracle, and building more and more complex applications in my personal time on top of MySQL, I’ll try and do something that I know will work perfectly in Oracle, but for some unknown reason it doesn’t work – normally it is a very old bug or design decision I disagree with in MySQL.

To be fair, I have a fair share of grievances with Oracle too, which I’ll try and mention at the bottom, but mostly it’s MySQL’s issues that drive me up the wall.

Most of these stem from the differences between the two systems, and the intended audience of each. Oracle is much more enterprise-focused, but some of the things I mention below would not only be useful to MySQL users, they would allow developers to better leverage the database’s power, rather than being forced to do lots of data manipulation within the application layer.

It’s worth me pointing out at this point that I’m going to use the pt-visual-explain tool from the Percona Toolkit, a collection of open-source MySQL tools. This tool takes the standard MySQL query plan from EXPLAIN SELECT ...¬†and converts it into a visual tree so you can more easily see the query plan generated.

Subqueries, etc. for complex queries

This is really a whole raft of different gripes I have, but I’ll combine them into one.

Let’s assume that you have a query that’s something like the following:

SELECT *
FROM tbl1
INNER JOIN tbl2
LEFT JOIN tbl3
CROSS JOIN tbl4
WHERE colA = colB or colC = colD

Now, something like that is probably going to make your query planner scream in horror at the execution time, so for now let’s abbreviate that to /** REALLY EXPENSIVE QUERY **/. Let’s now assume that you need to run a comparison between different rows of that resultset, so the SQL you might use would be something like:


SELECT *
FROM (/** REALLY EXPENSIVE QUERY **/) a
INNER JOIN (/** REALLY EXPENSIVE QUERY **/) b ON a.colA = b.colB

Now, this is going to get unwieldy rapidly, but that’s a side-effect of writing SQL this way. It’s worth pointing out that MySQL will likely plan to run this query twice. Let’s look at this query on a real table:

select *
from (select id, username, status from user) a
left join (select id, username, status from user) b on a.id = b.id + 1

+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
|    1 | SIMPLE      | user  | ALL  | NULL          | NULL | NULL    | NULL | 1284 |                                                 |
|    1 | SIMPLE      | user  | ALL  | NULL          | NULL | NULL    | NULL | 1284 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+

JOIN
+- Join buffer
|  +- Filter with WHERE
|     +- Table scan
|        rows           1284
|        +- Table
|           table          user
+- Table scan
   rows           1284
   +- Table
      table          user

MySQL is not alone in this, Oracle’s query planner produces a similar query plan:

As you can see from the simplistic query, identical subqueries are used as a self-join, and each subquery is executed independently before the join. If the subquery was much more expensive, then this expense just doubled. I’m aware that the job of a query planner is to make optimisations as necessary, and for such a simple query it might have decided that re-executing it might be faster, but that explanation won’t fly with this:

EXPLAIN
SELECT a.count, a.username, a.status, b.count, b.username, b.status
FROM (
	SELECT COUNT(*) count, log.user, user.username, user.status
	FROM log
	LEFT JOIN emailtemplate ON concat('Closed ', emailtemplate.id) = log.action
	INNER JOIN user ON user.id = log.user
	WHERE emailtemplate.oncreated = '1' OR log.action = 'Closed custom-y'
	GROUP BY log.user, user.username, user.status
	) a
INNER JOIN (
	SELECT COUNT(*) count, log.user, user.username, user.status
	FROM log
	LEFT JOIN emailtemplate ON concat('Closed ', emailtemplate.id) = log.action
	INNER JOIN user ON user.id = log.user
	WHERE emailtemplate.oncreated = '1' OR log.action = 'Closed custom-y'
	GROUP BY log.user, user.username, user.status
	) b on a.user = b.user + 1
order by a.user

+------+-------------+---------------+--------+-----------------------------+------------------------+---------+--------------------------+---------+-------------------------------------------------+
| id   | select_type | table         | type   | possible_keys               | key                    | key_len | ref                      | rows    | Extra                                           |
+------+-------------+---------------+--------+-----------------------------+------------------------+---------+--------------------------+---------+-------------------------------------------------+
|    1 | PRIMARY     |               | ALL    | NULL                        | NULL                   | NULL    | NULL                     | 5968734 | Using temporary; Using filesort                 |
|    1 | PRIMARY     |               | ALL    | NULL                        | NULL                   | NULL    | NULL                     | 5968734 | Using where; Using join buffer (flat, BNL join) |
|    3 | DERIVED     | log           | index  | log_idx_action,log_idx_user | log_idx_typeuseraction | 908     | NULL                     |  351102 | Using index; Using temporary; Using filesort    |
|    3 | DERIVED     | user          | eq_ref | PRIMARY                     | PRIMARY                | 4       | waca_production.log.user |       1 |                                                 |
|    3 | DERIVED     | emailtemplate | ALL    | NULL                        | NULL                   | NULL    | NULL                     |      17 | Using where; Using join buffer (flat, BNL join) |
|    2 | DERIVED     | log           | index  | log_idx_action,log_idx_user | log_idx_typeuseraction | 908     | NULL                     |  351102 | Using index; Using temporary; Using filesort    |
|    2 | DERIVED     | user          | eq_ref | PRIMARY                     | PRIMARY                | 4       | waca_production.log.user |       1 |                                                 |
|    2 | DERIVED     | emailtemplate | ALL    | NULL                        | NULL                   | NULL    | NULL                     |      17 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+---------------+--------+-----------------------------+------------------------+---------+--------------------------+---------+-------------------------------------------------+


Filesort
+- TEMPORARY
   table          temporary(derived(temporary(log,user,emailtemplate)),derived(temporary(log,user,emailtemplate)))
   +- JOIN
      +- Join buffer
      |  +- Filter with WHERE
      |     +- Table scan
      |        rows           5968734
      |        +- DERIVED
      |           table          derived(temporary(log,user,emailtemplate))
      |           +- Filesort
      |              +- TEMPORARY
      |                 table          temporary(log,user,emailtemplate)
      |                 +- JOIN
      |                    +- Join buffer
      |                    |  +- Filter with WHERE
      |                    |     +- Table scan
      |                    |        rows           17
      |                    |        +- Table
      |                    |           table          emailtemplate
      |                    +- JOIN
      |                       +- Bookmark lookup
      |                       |  +- Table
      |                       |  |  table          user
      |                       |  |  possible_keys  PRIMARY
      |                       |  +- Unique index lookup
      |                       |     key            user->PRIMARY
      |                       |     possible_keys  PRIMARY
      |                       |     key_len        4
      |                       |     ref            waca_production.log.user
      |                       |     rows           1
      |                       +- Index scan
      |                          key            log->log_idx_typeuseraction
      |                          possible_keys  log_idx_action,log_idx_user
      |                          key_len        908
      |                          rows           351102
      +- Table scan
         rows           5968734
         +- DERIVED
            table          derived(temporary(log,user,emailtemplate))
            +- Filesort
               +- TEMPORARY
                  table          temporary(log,user,emailtemplate)
                  +- JOIN
                     +- Join buffer
                     |  +- Filter with WHERE
                     |     +- Table scan
                     |        rows           17
                     |        +- Table
                     |           table          emailtemplate
                     +- JOIN
                        +- Bookmark lookup
                        |  +- Table
                        |  |  table          user
                        |  |  possible_keys  PRIMARY
                        |  +- Unique index lookup
                        |     key            user->PRIMARY
                        |     possible_keys  PRIMARY
                        |     key_len        4
                        |     ref            waca_production.log.user
                        |     rows           1
                        +- Index scan
                           key            log->log_idx_typeuseraction
                           possible_keys  log_idx_action,log_idx_user
                           key_len        908
                           rows           351102

Here we can clearly see one of the main problems of subqueries in general, and why this sort of query really needs to be optimised. Oracle’s plan is only slightly better, joining emailtemplate to log before user instead of joining user to log before joining to emailtemplate. Otherwise, it’s using the same basic plan.

Enter Common Table Expressions!

Common Table Expressions

Common Table Expressions are temporary named resultsets that can be used in further queries. Introduced in SQL:1999, the basic idea is you can refactor work out into a subquery:

WITH my_data AS (SELECT * FROM foo)
SELECT * from my_data;

In the above example, the initial query is SELECT * FROM foo, the result of which is held under the name my_data, which is then used in the second query. This example is needlessly simple, but gives the general syntax. Refactoring an example from above, we get something like this:

WITH expensive AS (/** REALLY EXPENSIVE QUERY **/)
SELECT *
FROM expensive a
INNER JOIN expensive b ON a.colA = b.colB

This is much simpler, and uses the same resultset twice, which is what we want!

Our massive query from above would become this:

WITH cte AS (
SELECT COUNT(*) count, log.user, user.username, user.status
FROM log
LEFT JOIN emailtemplate ON concat('Closed ', emailtemplate.id) = log.action
INNER JOIN user ON user.id = log.user
WHERE emailtemplate.oncreated = '1' OR log.action = 'Closed custom-y'
GROUP BY log.user, user.username, user.status
)
SELECT a.count, a.username, a.status, b.count, b.username, b.status
FROM cte a
INNER JOIN cte b on a.user = b.user + 1
order by a.user

Oracle now gives a plan which looks something like this:

However, in MySQL, common table expressions are not supported:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'cte AS (SELECT COUNT(*) count, log.user, user.username, user.status FROM log LE' at line 1

Oh well, we can attack this in a different direction and be more explicit about the named temporary resultset. Our next approach is possible in MySQL with additional rights on the database – the ability to create temporary tables.

In MySQL, temporary tables are session-based for both structure and data, unlike Oracle where temporary tables have a defined persistent structure, and the data is temporary.

So, we start by running our expensive query into a temporary table:

CREATE TEMPORARY TABLE temptable AS
SELECT COUNT(*) count, log.user, user.username, user.status
FROM log
LEFT JOIN emailtemplate ON concat('Closed ', emailtemplate.id) = log.action
INNER JOIN user ON user.id = log.user
WHERE emailtemplate.oncreated = '1' OR log.action = 'Closed custom-y'
GROUP BY log.user, user.username, user.status;

… and then we call the temp table in our query:

SELECT a.count, a.username, a.status, b.count, b.username, b.status
FROM temptable a
INNER JOIN temptable b on a.user = b.user + 1
order by a.user;

… to find this rather cryptic error message:

ERROR 1137 (HY000): Can't reopen table: 'a'

As we’ve aliased temptable to a, this basically is telling us we can’t reopen the table temptable. This is a known, documented feature of MySQL. The workaround to this is to create another temporary table with the resultset from the first, but we’re doubling the storage usage of this query after this.

The other option we have is to use a materialised view.

Materialised Views

Materialised views can be thought of as a cross between tables and views. They are defined much in the same way as a view, with a subquery to populate them, but they cache the data so the database doesn’t have to recalculate the values every time the view’s data is needed.

This caching is invalidated either manually, on a timer, or on commit, so it is the choice of the DBA at what point the data should be refreshed. For statistical data like these queries, where you want speedy performance of the underlying query but you don’t care for the most up-to-date data, I’d probably use a timer with a delay appropriate for the usage.

As such, we can create the materialised view:

CREATE MATERIALIZED VIEW matview REFRESH ON DEMAND AS
SELECT COUNT(*) count, log.user_, user_.username, user_.status
FROM log
LEFT JOIN emailtemplate ON concat('Closed ', emailtemplate.id) = log.action
INNER JOIN user_ ON user_.id = log.user_
WHERE emailtemplate.oncreated = '1' OR log.action = 'Closed custom-y'
GROUP BY log.user_, user_.username, user_.status;

Refresh it with the latest data:

exec DBMS_MVIEW.REFRESH('matview');

And finally run our adapted query against it:

SELECT a.count, a.username, a.status, b.count, b.username, b.status
FROM matview a
INNER JOIN matview b on a.user_ = b.user_ + 1
order by a.user_;

As you can see, the query plan is now looking really good:


I think this is enough for one post, I’ll probably be back for more later, as I still have a few things on my list to mention, including roles, MyISAM, function-based indices, hash-join and sort-merge, recursive queries, and dynamic SQL. Stay tuned!