Supervised failover

Enterprise Edition

Supervised failover is supported by the Enterprise Edition only.

Example on GitHub: supervised_failover

Tarantool provides the ability to control leadership in a replica set using an external failover coordinator. A failover coordinator reads a cluster configuration from a file or an etcd-based configuration storage, polls instances for their statuses, and appoints a leader for each replica set depending on the availability and health of instances.

To increase fault tolerance, you can run two or more failover coordinators. In this case, an etcd cluster provides synchronization between coordinators.

Overview

The main steps of using an external failover coordinator for a newly configured cluster might look as follows:

Configure a cluster to work with an external coordinator. The main step is setting the replication.failover option to supervised for all replica sets that should be managed by the external coordinator. For newly configured clusters, set replication.bootstrap_strategy to native so that the coordinator controls the initial bootstrap as well.
Start a configured cluster. If replication.bootstrap_strategy is set to native or supervised, an unbootstrapped replica set waits for the failover coordinator to appoint a bootstrap leader and no instance bootstraps on its own. After bootstrap, instances are started in read-only mode until the coordinator appoints a leader.

If the default auto bootstrap strategy is used instead, one instance in an unbootstrapped replica set can start in read-write mode before the coordinator starts.
Start a failover coordinator. If the replica set is waiting for a bootstrap leader, start the coordinator before replication.connect_timeout expires, or start the coordinator before the instances. You can start two or more failover coordinators to increase fault tolerance. In this case, one coordinator is active and others are passive.

Once a cluster and failover coordinators are up and running, a failover coordinator appoints one instance to be a master if there is no master instance in a replica set. Then, the following events may occur:

If a master instance fails, a failover coordinator performs an automated failover.
If an active failover coordinator fails, another coordinator becomes active and performs an automated failover.

Note

Note that a failover coordinator doesn’t work with replica sets with two or more read-write instances. In this case, a coordinator logs a warning to stdout and doesn’t perform any appointments.

Appointing a new master instance

After a master instance has been appointed, a failover coordinator monitors the statuses of all instances in a replica set by sending requests each probe_interval seconds. For the master instance, the coordinator maintains a read-write mode deadline, which is renewed periodically each renew_interval seconds. If all attempts to renew the deadline fail during the specified time interval (lease_interval), the master switches to read-only mode. Then, the coordinator appoints a new instance as the master.

Note

Anonymous replicas are not considered as candidates to be a master.

If a remote etcd-based storage is used to maintain the state of failover coordinators, you can also perform a manual failover.

Active and passive coordinators

To increase fault tolerance, you can run two or more failover coordinators. In this case, only one coordinator is active and used to control leadership in a replica set. Other coordinators are passive and don’t perform any read-write appointments.

To maintain the state of coordinators, Tarantool uses a stateboard – a remote etcd-based storage. This storage uses the same connection settings as a centralized etcd-based configuration storage. If a cluster configuration is stored in the <prefix>/config/* keys in etcd, the failover coordinator looks into <prefix>/failover/* for its state. Here are a few examples of keys used for different purposes:

<prefix>/failover/info/by-uuid/<uuid>: contains a state of a failover coordinator identified by the specified uuid.
<prefix>/failover/active/lock: a unique identifier (UUID) of an active failover coordinator.
<prefix>/failover/active/term: a kind of fencing token allowing to have an order in which coordinators become active (took the lock) over time.
<prefix>/failover/command/<id>: a key used to perform a manual failover.

Configuring a cluster

To configure a cluster to work with an external failover coordinator, follow the steps below:

(Optional) If you need to run several failover coordinators to increase fault tolerance, set up an etcd-based configuration storage, as described in Centralized configuration storages.
Set the replication.failover option to supervised and replication.bootstrap_strategy to native:
```
replication:
  failover: supervised
  bootstrap_strategy: native
```
Grant a user used for replication permissions to execute the failover.execute function:
```
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
      privileges:
      - permissions: [ execute ]
        lua_call: [ 'failover.execute' ]
```
Note

When replication.bootstrap_strategy is native or supervised, Tarantool automatically grants the guest user runtime privileges to execute failover.execute for the initial cluster bootstrap. The explicit grant above lets the coordinator use the configured replication user after the cluster is bootstrapped.

In Tarantool 3.0 and 3.1, the configuration is different and the function must be created in the application code. See Tarantool 3.0 and 3.1 configuration for details.

(Optional) Configure options that control how a failover coordinator operates in the failover section:

failover:
  probe_interval: 5
  lease_interval: 15
  renew_interval: 5
  stateboard:
    keepalive_interval: 5
    renew_interval: 1

You can find the full example on GitHub: supervised_failover.

Tarantool 3.0 and 3.1 configuration

Before version 3.2, Tarantool used another mechanism to grant execute access to Lua functions. In Tarantool 3.0 and 3.1, the credentials configuration section should look as follows:

# Tarantool 3.0 and 3.1
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
      privileges:
      - permissions: [ execute ]
        functions: [ 'failover.execute' ]

Additionally, you should create the failover.execute function in the application code. For example, you can create a custom role for this purpose:

-- Tarantool 3.0 and 3.1 --
-- supervised_instance.lua --
return {
    validate = function()
    end,
    apply = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        local opts = { if_not_exists = true }
        box.schema.func.create(func_name, opts)
    end,
    stop = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        if not box.schema.func.exists(func_name) then
            return
        end
        box.schema.func.drop(func_name)
    end,
}

Then, enable this role for all storage instances:

# Tarantool 3.0 and 3.1
roles: [ 'supervised_instance' ]

Starting a failover coordinator

To start a failover coordinator, you need to execute the tarantool command with the failover option. This command accepts the path to a cluster configuration file:

tarantool --failover --config instances.enabled/supervised_failover/config.yaml

If a cluster’s configuration is stored in etcd, the config.yaml file contains connection options for the etcd storage.

You can run two or more failover coordinators to increase fault tolerance. In this case, only one coordinator is active and used to control leadership in a replica set. Learn more from Active and passive coordinators.

Performing manual failover

If an etcd-based storage is used to maintain the state of failover coordinators, you can perform a manual failover. External tools can use the <prefix>/failover/command/<id> key to choose a new master. For example, the tt utility provides the tt cluster failover command for managing a supervised failover.

Version:

Supervised failover

Overview

Appointing a new master instance

Active and passive coordinators

Configuring a cluster

Tarantool 3.0 and 3.1 configuration

Starting a failover coordinator

Performing manual failover