Supervised failover
Enterprise Edition
Supervised failover is supported by the Enterprise Edition only.
Example on GitHub: supervised_failover
Tarantool provides the ability to control leadership in a replica set using an external failover coordinator. A failover coordinator reads a cluster configuration from a file or an etcd-based configuration storage, polls instances for their statuses, and appoints a leader for each replica set depending on the availability and health of instances.
To increase fault tolerance, you can run two or more failover coordinators. In this case, an etcd cluster provides synchronization between coordinators.
The main steps of using an external failover coordinator for a newly configured cluster might look as follows:
- Configure a cluster to work with an external coordinator.
The main step is setting the
replication.failover
option tosupervised
for all replica sets that should be managed by the external coordinator. - Start a configured cluster.
When an external coordinator is still not running, instances in a replica set start in the following modes:
- If a replica set is already bootstrapped, all instances are started in read-only mode.
- If a replica set is not bootstrapped, one instance is started in read-write mode.
- Start a failover coordinator. You can start two or more failover coordinators to increase fault tolerance. In this case, one coordinator is active and others are passive.
Once a cluster and failover coordinators are up and running, a failover coordinator appoints one instance to be a master if there is no master instance in a replica set. Then, the following events may occur:
- If a master instance fails, a failover coordinator performs an automated failover.
- If an active failover coordinator fails, another coordinator becomes active and performs an automated failover.
Note
Note that a failover coordinator doesn’t work with replica sets with two or more read-write instances. In this case, a coordinator logs a warning to stdout and doesn’t perform any appointments.
After a master instance has been appointed, a failover coordinator monitors the statuses of all instances in a replica set by sending requests each probe_interval seconds. For the master instance, the coordinator maintains a read-write mode deadline, which is renewed periodically each renew_interval seconds. If all attempts to renew the deadline fail during the specified time interval (lease_interval), the master switches to read-only mode. Then, the coordinator appoints a new instance as the master.
Note
Anonymous replicas are not considered as candidates to be a master.
If a remote etcd-based storage is used to maintain the state of failover coordinators, you can also perform a manual failover.
To increase fault tolerance, you can run two or more failover coordinators. In this case, only one coordinator is active and used to control leadership in a replica set. Other coordinators are passive and don’t perform any read-write appointments.
To maintain the state of coordinators, Tarantool uses a stateboard – a remote etcd-based storage.
This storage uses the same connection settings as a centralized etcd-based configuration storage.
If a cluster configuration is stored in the <prefix>/config/*
keys in etcd, the failover coordinator looks into <prefix>/failover/*
for its state.
Here are a few examples of keys used for different purposes:
<prefix>/failover/info/by-uuid/<uuid>
: contains a state of a failover coordinator identified by the specifieduuid
.<prefix>/failover/active/lock
: a unique identifier (UUID) of an active failover coordinator.<prefix>/failover/active/term
: a kind of fencing token allowing to have an order in which coordinators become active (took the lock) over time.<prefix>/failover/command/<id>
: a key used to perform a manual failover.
To configure a cluster to work with an external failover coordinator, follow the steps below:
(Optional) If you need to run several failover coordinators to increase fault tolerance, set up an etcd-based configuration storage, as described in Centralized configuration storages.
Set the replication.failover option to
supervised
:replication: failover: supervised
Grant a user used for replication permissions to execute the
failover.execute
function:credentials: users: replicator: password: 'topsecret' roles: [ replication ] privileges: - permissions: [ execute ] functions: [ 'failover.execute' ]
Create the
failover.execute
function in the application code. For example, you can create a custom role for this purpose:-- supervised_instance.lua -- return { validate = function() end, apply = function() if box.info.ro then return end local func_name = 'failover.execute' local opts = { if_not_exists = true } box.schema.func.create(func_name, opts) end, stop = function() if box.info.ro then return end local func_name = 'failover.execute' if not box.schema.func.exists(func_name) then return end box.schema.func.drop(func_name) end, }
Then, you need to enable this role for all storage instances:
roles: [ 'supervised_instance' ]
(Optional) Configure options that control how a failover coordinator operates in the failover section:
failover: probe_interval: 5 lease_interval: 15 renew_interval: 5 stateboard: keepalive_interval: 5 renew_interval: 1
You can find the full example on GitHub: supervised_failover.
To start a failover coordinator, you need to execute the tarantool
command with the failover option.
This command accepts the path to a cluster configuration file:
tarantool --failover --config instances.enabled/supervised_failover/config.yaml
If a cluster’s configuration is stored in etcd, the config.yaml
file contains connection options for the etcd storage.
You can run two or more failover coordinators to increase fault tolerance. In this case, only one coordinator is active and used to control leadership in a replica set. Learn more from Active and passive coordinators.
If an etcd-based storage is used to maintain the state of failover coordinators, you can perform a manual failover.
External tools can use the <prefix>/failover/command/<id>
key to choose a new master.
For example, the tt utility provides the tt cluster failover command for managing a supervised failover.