Live upgrade from Tarantool 1.6 to 1.10 | Tarantool

Live upgrade from Tarantool 1.6 to 1.10

This page includes explanations and solutions to some common issues when upgrading a replica set from Tarantool 1.6 to 1.10.

Versions later than 1.6 have incompatible .snap and .xlog file formats: 1.6 files are supported during upgrade, but you won’t be able to return to 1.6 after running under 1.10 or 2.x for a while. A few configuration parameters were also renamed.

To perform a live upgrade from Tarantool 1.6 to a more recent version, such as 2.8.4 or 2.10.1, you need to take an intermediate step and upgrade 1.6 -> 1.10 -> 2.x. This is the only way to perform the upgrade without downtime.

A direct upgrade of a replica set from 1.6 to 2.x is also possible, but only with downtime.

The procedure of live upgrade from 1.6 to 1.10 is similar to the general upgrade procedure, which is as follows:

  1. Pick any read-only instance in the replica set.

  2. Upgrade this replica to the new Tarantool version. See details in Upgrading Tarantool on a standalone instance. This requires stopping the instance for a while, which won’t interrupt the replica set operation. When the upgraded replica is up again, it will synchronize with the other instances in the replica set so that the data are consistent across all the instances.

  3. Make sure the upgraded replica is running and connected to the rest of the replica set just fine. To do this, run box.info.replication in the instance’s console and check the output table for values like upstream, downstream, and lag.

    For each instance id, there are upstream and downstream values. Both of them should have the value follow, except on the instance where you run this code. This means that the replicas are connected and there are no errors in the data flow.

    The value of the lag field can be less than or equal to box.cfg.replication_timeout, but it can also be moderately larger. For example, if box.cfg.replication_timeout is 1 second and the write load on the master is high, it’s generally OK to have a lag of about 10 seconds on the master. It is up to the user to decide which lag values are acceptable.

  4. Upgrade all the read-only instances by repeating steps 1–3 until only the master keeps running the old Tarantool version.

  5. Make one of the updated replicas the new master:

    • If the replica set is using asynchronous replication without RAFT-based leader elections, first run box.cfg{ read_only = true } on the old master and then run box.cfg{ read_only = false } on the replica that will be the new master.
    • If the replica set is using synchronous replication or RAFT-based leader elections, run box.ctl.promote() on the new master and then run box.cfg{ election_mode = 'voter' } on the old master. This will automatically change the read_only statuses on the instances.
    • For a Cartridge replica set, it is possible to select the new master in the web UI.

    There is no need to restart the new master.

    Check that the new master continues following and being followed by all other replicas, similarly to step 3.

  6. Upgrade the former master, which is now a read-only instance.

  7. Run box.schema.upgrade() on the new master. This will update the Tarantool system spaces to match the currently installed version of Tarantool. There is no need to run box.schema.upgrade() on every node: changes are propagated to other nodes via the regular replication mechanism.

  8. Run box.snapshot() on every node in the replica set to make sure that the replicas immediately see the upgraded database state in case of restart.
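
The check described in step 3 (and repeated in step 5) can be sketched as a small Lua helper. This is only an illustration: the function name replication_problems and its parameters self_id and max_lag are hypothetical, and it assumes the table passed in has the layout described in step 3 (an upstream and a downstream entry with a status field for every instance except the local one, plus an optional upstream.lag).

```lua
-- Sketch: list replication problems visible from one instance.
-- `replication` is expected to look like box.info.replication;
-- `self_id` and `max_lag` are hypothetical parameters of this helper.
local function replication_problems(replication, self_id, max_lag)
    local problems = {}
    for id, peer in pairs(replication) do
        if id ~= self_id then  -- the local entry has no upstream/downstream
            local up, down = peer.upstream, peer.downstream
            if up == nil or up.status ~= 'follow' then
                table.insert(problems, string.format(
                    'instance %s: upstream is %s', tostring(id),
                    tostring(up and up.status or 'missing')))
            elseif up.lag ~= nil and up.lag > max_lag then
                table.insert(problems, string.format(
                    'instance %s: lag %s exceeds %s', tostring(id),
                    tostring(up.lag), tostring(max_lag)))
            end
            if down == nil or down.status ~= 'follow' then
                table.insert(problems, string.format(
                    'instance %s: downstream is %s', tostring(id),
                    tostring(down and down.status or 'missing')))
            end
        end
    end
    table.sort(problems)  -- stable, readable order for the report
    return problems
end
```

In the instance's console this could be called as replication_problems(box.info.replication, box.info.id, 10), where 10 is an example lag threshold in seconds; an empty result means no problems were detected by this particular check.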

What’s different when upgrading from Tarantool 1.6:

Step 2: Tarantool 1.10+ fails to recover from 1.6 xlogs unless box.cfg{force_recovery = true} is set. Small differences between the 1.6 and 1.10 xlog formats make 1.6 xlogs appear erroneous to 1.10+ instances. To work around this, start the instance in force_recovery mode: add force_recovery = true to the box.cfg{} call in the file where the instance is initialized, for example init.lua.
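
As a rough illustration, an instance file with the option set might look like the fragment below. The listen port is a placeholder and not taken from this page; keep whatever options the instance already uses.

```lua
-- init.lua (sketch): only force_recovery comes from the text above.
box.cfg{
    listen = 3301,          -- hypothetical value; keep your existing options
    force_recovery = true,  -- lets a 1.10+ instance recover from 1.6 xlogs
}
```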

Step 3: New Tarantool nodes follow 1.6 nodes just fine, but some 1.6 nodes might disconnect from new nodes with an ER_LOADING error. This is not critical: the error goes away when replication on the 1.6 node is restarted:

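-- Run on the affected 1.6 node: remember the peers, drop them, then restore them.
-- Assumption: if box.cfg.replication is nil there, 1.6 likely still uses the
-- older option name replication_source; substitute it in all three lines.
old_repl = box.cfg.replication
box.cfg{replication = ""}
box.cfg{replication = old_repl}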

Step 7: There was a breaking change between 1.6 and 1.10: in 1.6, the field type num was an alias for number, and in 1.10, num is converted to unsigned. This means that after box.schema.upgrade() is performed on the master, some spaces might have unsigned fields that contain non-unsigned values: double, int, and so on. This makes the snapshot inconsistent unless an extra action is performed after box.schema.upgrade(). Run this code in the Tarantool console on the new master:

-- First find all spaces containing unsigned fields with non-unsigned values in them.
-- Say, we have one such space denoted problematic_space and the problem is in field problematic_field_no.
a = box.space.problematic_space:format()
a[problematic_field_no].type = 'number'
box.space.problematic_space:format(a)

Once this is performed on the master, it’s safe to proceed to step 8, making a snapshot on every node.
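
The search mentioned in the first comment of the snippet above is not spelled out on this page. One possible way to do it is sketched below. The helper names is_unsigned and find_non_unsigned are hypothetical, and the code assumes that the format is a list of {name = ..., type = ...} entries and that the tuples can be indexed by field number.

```lua
-- Sketch: true if v can be stored in an 'unsigned' field.
local function is_unsigned(v)
    if type(v) == 'string' then return false end  -- tonumber() would parse it
    local n = tonumber(v)  -- plain numbers pass through unchanged
    return n ~= nil and n >= 0 and n == math.floor(n)
end

-- Sketch: count values that do not fit their 'unsigned' field.
-- Returns a table: field number -> number of offending values.
local function find_non_unsigned(format, tuples)
    local bad = {}
    for _, tuple in ipairs(tuples) do
        for field_no, field in ipairs(format) do
            if field.type == 'unsigned' then
                local v = tuple[field_no]
                if v ~= nil and not is_unsigned(v) then
                    bad[field_no] = (bad[field_no] or 0) + 1
                end
            end
        end
    end
    return bad
end
```

For a small space, this could be called as find_non_unsigned(box.space.problematic_space:format(), box.space.problematic_space:select()). Note that select() loads every tuple into memory, so a large space would need to be scanned in batches instead.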

Step 8: The user might be concerned about the snapshot size in 1.10: it’s drastically smaller than the one created by 1.6 (for example, ~300 MB vs. 6 GB in some corner cases). There is nothing to worry about. Tarantool 1.6 didn’t compress snapshots, while Tarantool 1.10 and above do.
