Upgrading Amazon RDS MySQL 5.5 to 5.6 without downtime

It’s not easy, but upgrading to 5.6 is possible with zero downtime

MySQL 5.6 has many improvements over 5.5. At GoSquared, we wanted to upgrade, and we wanted to avoid any downtime. We could also move from a db.m1.medium to a db.m3.medium instance, which provides an extra ECU and is marginally cheaper.

The problem

AWS promised automatic upgrade capabilities back in July 2013, but nothing has surfaced. MySQL 5.5 on RDS doesn't allow access to the binlog either, which makes a zero-downtime, consistent migration very tough indeed. There was no way we'd lock the tables and stream a mysqldump into the new instance: that'd cause too much downtime.

Ideally, both instances would be able to accept reads and writes for ~30 minutes whilst all of our applications were redeployed smoothly with the new MySQL endpoint. We needed to replicate all data changes from the old instance to the new one while both simultaneously served queries. This is pretty much the worst imaginable situation to be in for consistency. Further, there's no access to the binlog (only the general log).

The solution

The solution isn't pretty. It isn't clean or efficient, and you risk some data becoming inconsistent during the migration. It also requires you to write your own basic replication application. But it works (give or take a couple of potentially lost updates), and so long as the potential issues are known, they can be worked around.

Step 0 – Backup and write the importer

Take a database snapshot before starting this work (there's always a risk that something will go wrong and you'll need to revert). Enable the general_log (in the Parameter Groups section) on your old instance and apply the settings immediately.
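
If you use the AWS CLI, both of those steps can be scripted. This is only a rough sketch, assuming a configured CLI; the instance, snapshot and parameter group identifiers (old-mysql55, pre-upgrade-snapshot, old-mysql55-params) are placeholders for your own:

aws rds create-db-snapshot \
  --db-instance-identifier old-mysql55 \
  --db-snapshot-identifier pre-upgrade-snapshot

aws rds modify-db-parameter-group \
  --db-parameter-group-name old-mysql55-params \
  --parameters "ParameterName=general_log,ParameterValue=1,ApplyMethod=immediate"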

The importer isn't terribly nice. Here's an example of the code in Node.js:

var mysql = require('mysql');
var async = require('async');
var moment = require('moment');

var oldHost = {
  host: 'your-old-mysql-host.us-east-1.rds.amazonaws.com',
  user: 'root',
  password: 'password',
  database: 'mysql',
  port: 3306
};

// edit concurrency to change the number of connections
// and speed up the replication
var concurrency = 50;

var newHost = {
  host: 'your-new-mysql-host.us-east-1.rds.amazonaws.com',
  user: 'root',
  password: 'password',
  database: 'yourdatabase',
  port: 3306,
  connectionLimit: concurrency
};

var oldMySQL = mysql.createConnection(oldHost);
var newMySQL = mysql.createPool(newHost);

// this should be the timestamp when the mysqldump was started
var latest = moment.utc('2014-04-03 17:30:22').unix();

// loop forever until the process is exited
async.forever(function(done) {

  // query from the general_log to get any non-SELECT queries (be aware that this is very inefficient)
  oldMySQL.query(
    "SELECT UNIX_TIMESTAMP(event_time) as unix, argument FROM mysql.general_log WHERE command_type = 'Query' AND UNIX_TIMESTAMP(event_time) > ? AND argument != 'flush logs' AND argument NOT LIKE '%SELECT%' AND argument != ''",
    [ latest ],
    function(err, rows) {
    if (err) {
      // if reading the general_log fails, log the error and retry in 5s
      console.log(err);
      return setTimeout(done, 5000);
    }
    if (!rows.length) {
      console.log('NO MORE ROWS', latest);

      // check the log again for any new queries in 5s
      setTimeout(function() {
        done();
      }, 5000);
      return;
    }

    // set latest to be the event_time of the last query selected
    latest = rows[rows.length -1].unix;

    // log to show we're actually doing something
    console.log(rows.length, moment.unix(latest).format());

    async.eachLimit(rows, concurrency, function(row, fin) {

      newMySQL.getConnection(function(err, con) {
        if (err) {
          // couldn't get a pool connection; log and skip this query
          console.log(err);
          return fin();
        }

        // replay the query on our new instance
        con.query(row.argument, function(err, rows) {
          con.release();
          if (err) {
            console.log(row.argument, err);
          }
          fin();
        });
      });
    }, done);
  });
});

Step 1 – Boot up the desired instance(s)

Booting up a read replica of your old instance isn't essential, but it will help prevent the mysqldump from affecting the main instance.

Next, boot up the new MySQL 5.6 instance that you’d like to be the new master.
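
If you're scripting this with the AWS CLI, it looks roughly like the following. The identifiers are placeholders, and the instance class and engine version are only examples (pick whichever 5.6.x RDS offers in your region):

aws rds create-db-instance-read-replica \
  --db-instance-identifier old-mysql55-replica \
  --source-db-instance-identifier old-mysql55

aws rds create-db-instance \
  --db-instance-identifier new-mysql56 \
  --db-instance-class db.m3.medium \
  --engine mysql \
  --engine-version 5.6.17 \
  --allocated-storage 100 \
  --master-username root \
  --master-user-password PASSWORD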

Step 2 – Stream a `mysqldump` of the old instance into the new one

Remember to note down the timestamp when the dump started and update the replication script. Preferably use a read replica for this.

mysqldump \
  -h old-mysql-host-pref-read-replica.us-east-1.rds.amazonaws.com \
  -u root -pPASSWORD \
  --port 3306 --single-transaction --routines --triggers \
  --databases your-databases-here --compress --compact \
  | mysql \
    -h your-new-mysql-host.us-east-1.rds.amazonaws.com \
    -u root -pPASSWORD --port 3306
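
A small tip for capturing that start timestamp: print the current UTC time immediately before kicking off the dump, and paste it into the replicator's `latest` value.

date -u '+%Y-%m-%d %H:%M:%S'
# e.g. 2014-04-03 17:30:22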

Step 3 – Start the replication!

Run the replication app; it should log how many queries it's replaying. As you migrate your applications to use the new instance, the numbers should decrease.
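
For example (the filename is arbitrary, and the output below is only the shape of the logging from the script above, not real numbers):

node replicator.js
# 124 2014-04-03T17:31:07+00:00
# 87 2014-04-03T17:32:40+00:00
# NO MORE ROWS 1396546360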

Step 4 – Migrate your applications to use the new instance

If you're using a custom domain for your MySQL host with a CNAME record pointing at the instance, migrating is easy (so long as the DNS TTL is short enough).
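
Before flipping the record over, it's worth sanity-checking the CNAME and its TTL; for example (the hostname here is a placeholder):

dig +noall +answer your-mysql.example.com CNAME
# your-mysql.example.com.  60  IN  CNAME  your-old-mysql-host.us-east-1.rds.amazonaws.com.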

Otherwise, every application will need to be deployed with the new MySQL host configured. Start with the most UPDATE-heavy applications; SELECT-only applications can be left until last.

Step 5 – Wait until replication stops

After all applications are deployed, there shouldn't be any updates going into the old instance. Once this happens, stop the replicator. Taking an export of the entire general_log from the start timestamp may be a good idea: you can look through or query it to check whether any critical changes (for us: upgrades, downgrades etc.) were replicated properly. SHOW PROCESSLIST may also help to spot any open connections that have been forgotten.
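
Both checks can be run with the mysql client against the old instance. The start timestamp is the same one the replicator uses, and the output filename is arbitrary:

mysql -h your-old-mysql-host.us-east-1.rds.amazonaws.com -u root -pPASSWORD \
  -e "SELECT event_time, argument FROM mysql.general_log WHERE command_type = 'Query' AND event_time >= '2014-04-03 17:30:22'" \
  > general_log_export.tsv

mysql -h your-old-mysql-host.us-east-1.rds.amazonaws.com -u root -pPASSWORD \
  -e "SHOW PROCESSLIST"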

Conclusions

Doing a zero-downtime migration from MySQL 5.5 to 5.6 on Amazon RDS is not easy, nor is it 100% consistent, especially as the general_log only provides the query that was run and not the resulting row, so default values and auto-increments can be lost or changed. It worked well for our use case: we don't run a huge number of UPDATE queries, the replication worked fine, and we were able to check for any inconsistencies manually.

If possible, it’d be best to schedule some downtime (or at least UPDATE downtime) for the migration.

Good luck, and let us know how you got on via Twitter @GoSquared.
