I don't write a lot about database management. How come? I'm a software engineer, and like many of my peers, I leverage databases to store data. I should talk more about this! What made me write this today is that I've discovered that many of my peers wouldn't be able to understand the title of this post.

However, I can tell you that once you've finished reading this, you'll thank me!

DDL?

To understand what this post is about, let's start with DDL. DDL is the abbreviation of Data Definition Language. In summary, a DDL is a language that allows defining your data structure. A famous one is the SQL DDL — and that's the one I talk about here.

I'm sure you already used it if you created a relational database with CREATE TABLE foo (id INTEGER). That is a DDL statement.

In SQL, there's a lot of DDL operations you can do, such as creating a table, renaming a table, creating or removing a column, converting a column to a new type, etc.

Those DDL statements are commonly used in two cases:

  1. When creating your database' tables for the first time. You issue a bunch of CREATE TABLE statements, and your database is ready to be used.
  2. When updating your database by adding, removing or modifying tables or columns. This is typically done when upgrading your application to a new version.

The fact that our DDL is transactional or not in option 1. has often little impact in practice. It's can still be useful, where for example you could get an error because the disk is full — having the ability to roll back in this case can be a life saviour.

In our case here, we'll talk about why you need a transactional DDL when upgrading your database.

Transactional You Said?

What transactional means here? It means that we can issue those DDL statements inside a transaction.

Wait, what's a transaction? To make it simple, in a database, a transaction is a group of operations that are treated as a single coherent operation, independently of other transactions. The final operation has to be atomic, consistent, isolated and durable — therefore that ACID property you keep reading about while always wondering what it meant. The operations composing the transaction are either entirely executed, or not at all.

In our case, having the DDL being transactional means one simple thing: the ability to execute several operations (e.g., several ALTER TABLE) in a single operation, that can be either committed or rolled back.

Let's use an example. Here's a table ingredients with a name column created with:

CREATE TABLE ingredients (
  name text NOT NULL
);

In this table, there is a list of ingredients in the form of water 20 mL, flour 300 g, etc.

Now, we're upgrading our application, and we want to handle the quantity of ingredients in their columns to make it easier to query the data. Let's say we're going to handle quantity and quantity units for our ingredients. We need to add two new columns to our table schema, quantity and unit:

ALTER TABLE ingredients ADD COLUMN quantity integer NOT NULL;
ALTER TABLE ingredients ADD COLUMN unit text NOT NULL;

We also need to convert the name by splitting it into <name> <quantity> <unit> and insert this into the new columns. We can do this like that:

UPDATE ingredients SET name=split_part(name, ' ', 1), quantity=split_part(name, ' ', 2)::int, unit=split_part(name, ' ', 3);
In this example, I'm using the split_part operator from PostgreSQL to split the string.

With the UPDATE statement, the name column containing flour 300 grams now contains flour, and the columns quantity and unit respectively stores 300 and grams.

When we run our upgrade procedure consisting of those two ALTER TABLE and one UPDATE, we got our final table like this:

# SELECT * FROM ingredients;
 name  │ quantity │ unit
───────┼──────────┼──────
 flour │      300 │ g
(1 row)

Exactly what we want.

Ok, So What?

In the previous example, everything worked fine. Our 300 grams of flour string is split, converted and stored into the three different columns. However, let's think about what happens if the conversion fails because our ingredient name is foobar:

# ALTER TABLE ingredients ADD COLUMN quantity integer;
ALTER TABLE
# ALTER TABLE ingredients ADD COLUMN unit text;
ALTER TABLE
# UPDATE ingredients SET name=split_part(name, ' ', 1), quantity=split_part(name, ' ', 2)::int, unit=split_part(name, ' ', 3);
ERROR:  invalid input syntax for integer: ""

Right, so in this case our update failed because it's impossible to convert an empty string to an integer.

We're going to fix this piece of data in our database (manually or automatically, whatever) to make it work, changing foobar to something like foobar 1 kg.

Then, when we rerun the upgrade script, this is what happens:

# ALTER TABLE ingredients ADD COLUMN quantity integer NOT NULL;
ERROR:  column "quantity" of relation "ingredients" already exists

The upgrade script failed earlier — not in the UPDATE statement. It has a good reason to fail: the column quantity already exists.

Why is that? Well, when we run the upgrade procedure the first time, we did not run it inside a transaction. Every DDL statement was committed right after its execution. Therefore, the current state of our database is half-migrated: we have the new schema installed, but not the data migrated.

This sucks. This should not happen. Ever.

Why?

Some database systems (e.g., MySQL) do not support DDL running in a transaction, so you have no choice than running the three operations (ALTER, ALTER and then UPDATE) as three distinct operations: if any of those fails, there's no way to recover and get back to the initial state.

If you're using a database that supports running DDL statements inside a transaction (e.g., PostgreSQL), we can run your upgrade script like this:

postgres=# BEGIN;
postgres=# ALTER TABLE ingredients ADD COLUMN quantity integer;
ALTER TABLE
postgres=# ALTER TABLE ingredients ADD COLUMN unit text;
ALTER TABLE
postgres=# UPDATE ingredients SET name=split_part(name, ' ', 1), quantity=split_part(name, ' ', 2)::int, unit=split_part(name, ' ', 3);
ERROR:  invalid input syntax for integer: ""
postgres=# ROLLBACK;
ROLLBACK

Since the transaction failed, we ended up doing a ROLLBACK. When checking the state of the database, we can see the state did not change:

# \d ingredients;
           Table "public.ingredients"
 Column │ Type │ Collation │ Nullable │ Default
────────┼──────┼───────────┼──────────┼─────────
 name   │ text │           │ not null │

Therefore, it's possible to fix our database content and rerun the migration procedure without being in a half-migrated state.

A Database That Lies

When you're giving data to a database, you're trusting it. It'd be awful if it were lying to you, right? Check this out:

mysql> CREATE TABLE ingredients (name text NOT NULL);
Query OK, 0 rows affected (0.03 sec)

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLE ingredients ADD COLUMN quantity integer;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE ingredients ADD COLUMN unit text;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

mysql> DESC ingredients;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| name     | text    | NO   |     | NULL    |       |
| quantity | int(11) | YES  |     | NULL    |       |
| unit     | text    | YES  |     | NULL    |       |
+----------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)

In the output above, you can see that we issued two DDL statements inside a transaction and that we then rolled back that transaction. MySQL did not output any error at any time, making us think that it did not alter our table. However, when checking the schema of the database, we can see that nothing has been rolled back. Not only MySQL does not support transactional DDL, but it also fails you entirely and lie about what it's doing.

How Important is That?

Transactional DDL is a feature that is often ignored by software engineers, whereas it's a key feature for managing your database life cycle.

I'm writing this post today because I've been hit by this multiple times over the last years. OpenStack made the choice years ago to go with MySQL, and in consequences, every database upgrade script that fails in the middle of the procedure leave the database is an inconsistence state. In that case, it means that either:

  1. The operator have to determine where the upgrade script stopped, roll back the upgrade by itself, fix the failure, and rerun the upgrade procedure.
  2. The developer must anticipate every case of potential upgrade failure, write a roll back procedure for each of this case by and test every one of those cases.
  3. Use a database system that handles transactional DDL.

No need to tell you that option 3. is the best, option 2. is barely possible to implement and option 1. is what reality looks like. In Gnocchi, we picked option 3. by recommending operators to use PostgreSQL.

Next time you use a database, think carefully about what your upgrade procedure will look like!