Master-replica: manual failover
Example on GitHub: manual_leader
This tutorial shows how to configure and work with a replica set with manual failover.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
Inside the instances.enabled directory of the created tt environment, create the manual_leader directory.
Inside instances.enabled/manual_leader, create the instances.yml and config.yaml files:
instances.yml specifies instances to run in the current environment and should look like this:
instance001:
instance002:
The config.yaml file is intended to store a replica set configuration.
This section describes how to configure a replica set in config.yaml.
Define a replica set topology inside the groups section:
- The leader option sets instance001 as the replica set leader.
- The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
In the credentials section, create the replicator user with the replication role:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
advertise:
peer:
login: replicator
Set replication.failover to manual to enable the manual failover mode. The resulting replica set configuration should look as follows:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
iproto:
advertise:
peer:
login: replicator
replication:
failover: manual
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start manual_leader
   • Starting an instance [manual_leader:instance001]...
   • Starting an instance [manual_leader:instance002]...
Check that instances are in the RUNNING status using the tt status command:
$ tt status manual_leader
INSTANCE                    STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001   RUNNING  8841  RW    ready   running  --
manual_leader:instance002   RUNNING  8842  RO    ready   running  --
Connect to instance001 using tt connect:
$ tt connect manual_leader:instance001
   • Connecting to the instance...
   • Connected to manual_leader:instance001
Make sure that the instance is in the running state by executing box.info.status:
manual_leader:instance001> box.info.status
---
- running
...
Check that the instance is writable using box.info.ro:
manual_leader:instance001> box.info.ro
---
- false
...
Execute box.info.replication to check the replica set status. For instance002, upstream.status and downstream.status should be follow.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 7
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.3893879999996
      peer: replicator@127.0.0.1:3302
      lag: 0.00028800964355469
    name: instance002
    downstream:
      status: follow
      idle: 0.37777199999982
      vclock: {1: 7}
      lag: 0
...
To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.
To check that a replica (instance002) gets all updates from the master, follow the steps below:
On instance001, create a space and add data as described in CRUD operation examples (a minimal sketch is also shown after this procedure).
Open the second terminal, connect to instance002 using tt connect, and use the select operation to make sure data is replicated.
Check that box.info.vclock values are the same on both instances:
instance001:
manual_leader:instance001> box.info.vclock
---
- {1: 21}
...
instance002:
manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
Note
Note that a vclock value might include the 0 component that is related to local space operations and might differ for different instances in a replica set.
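The sketch below expands the first step above: it creates a space, defines a primary index, and inserts a couple of rows on the master, then reads them back on the replica. The bands space name, its format, and the sample rows are illustrative assumptions rather than part of this tutorial; any space created through the CRUD operation examples behaves the same way.

-- Run on instance001 (the master): create a space with a primary index
-- and insert sample data.
box.schema.space.create('bands')
box.space.bands:format({
    {name = 'id', type = 'unsigned'},
    {name = 'band_name', type = 'string'},
    {name = 'year', type = 'unsigned'},
})
box.space.bands:create_index('primary', {parts = {'id'}})
box.space.bands:insert({1, 'Roxette', 1986})
box.space.bands:insert({2, 'Scorpions', 1965})

-- Run on instance002 (the replica): the same tuples should be returned,
-- confirming that the data has been replicated.
box.space.bands:select()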
This section describes how to add a new replica to a replica set.
Add instance003 to the instances.yml file:
instance001:
instance002:
instance003:
Add instance003 with the specified iproto.listen option to the config.yaml file:
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
Open the third terminal to work with the new instance. Start instance003 using tt start:
$ tt start manual_leader:instance003
   • Starting an instance [manual_leader:instance003]...
Check the replica set status using tt status:
$ tt status manual_leader
INSTANCE                    STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001   RUNNING  8841  RW    ready   running  --
manual_leader:instance002   RUNNING  8842  RO    ready   running  --
manual_leader:instance003   RUNNING  8856  RO    ready   running  --
After you have added instance003 to the configuration and started it, reload the configuration on all instances.
This is required so that instance001 and instance002 can get data from the new instance if it becomes a master.
Connect to instance003 using tt connect:
$ tt connect manual_leader:instance003
   • Connecting to the instance...
   • Connected to manual_leader:instance003
Reload configurations on all three instances using the reload() function provided by the config module:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
instance003:
manual_leader:instance003> require('config'):reload()
---
...
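If you want to double-check that an instance has picked up the new configuration, you can query the applied values through the same config module. The snippet below is a sketch that assumes the config:get() method; the option path matches the failover mode set in this tutorial's config.yaml:
manual_leader:instance003> require('config'):get('replication.failover')
---
- manual
...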
Execute box.info.replication to check the replica set status. Make sure that upstream.status and downstream.status are follow for instance003.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...
This section shows how to perform manual failover and change a replica set leader.
In the config.yaml file, change the replica set leader from instance001 to null:
replicaset001:
  leader: null
Reload configurations on all three instances using config:reload() and check that instances are in read-only mode. The example below shows how to do this for
instance001:
manual_leader:instance001> require('config'):reload()
---
...
manual_leader:instance001> box.info.ro
---
- true
...
manual_leader:instance001> box.info.ro_reason
---
- config
...
Make sure that box.info.vclock values are the same on all instances:
instance001:
manual_leader:instance001> box.info.vclock
---
- {1: 21}
...
instance002:
manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
instance003:
manual_leader:instance003> box.info.vclock
---
- {1: 21}
...
Change the replica set leader in config.yaml to instance002:
replicaset001:
  leader: instance002
Reload configuration on all instances using config:reload().
Make sure that instance002 is the new master:
manual_leader:instance002> box.info.ro
---
- false
...
Check the replication status using box.info.replication.
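If you prefer a compact summary instead of reading the full box.info.replication output, the following sketch (not part of the original tutorial) prints the upstream and downstream status of every replica set member as seen from the current instance:

-- Summarize replication status from the current instance's point of view.
-- Field names (name, upstream, downstream, status) follow box.info.replication.
for id, replica in pairs(box.info.replication) do
    local upstream = replica.upstream and replica.upstream.status or '-'
    local downstream = replica.downstream and replica.downstream.status or '-'
    print(string.format('id=%d name=%s upstream=%s downstream=%s',
                        id, replica.name or 'unknown', upstream, downstream))
end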
This section describes the process of removing an instance from a replica set.
Before removing an instance, make sure it is in read-only mode. If the instance is a master, perform manual failover.
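For example, after the failover performed above, instance003 is a read-only replica, so box.info.ro on it is expected to return true before you proceed:
manual_leader:instance003> box.info.ro
---
- true
...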
Clear the iproto option for instance003 by setting its value to {}:
instance003:
  iproto: {}
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
Check that the upstream section is missing for instance003 by executing box.info.replication[3]:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: follow
    idle: 0.4588760000006
    vclock: {1: 21}
    lag: 0
  name: instance003
...
Stop instance003 using the tt stop command:
$ tt stop manual_leader:instance003
   • The Instance manual_leader:instance003 (PID = 15551) has been terminated.
Check that downstream.status is stopped for instance003:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: stopped
    message: 'unexpected EOF when reading from socket, called on fd 27, aka 127.0.0.1:3301,
      peer of 127.0.0.1:54185: Broken pipe'
    system_message: Broken pipe
  name: instance003
...
Remove instance003 from the instances.yml file:
instance001:
instance002:
Remove instance003 from config.yaml:
instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
To remove an instance from the replica set permanently, it should be removed from the box.space._cluster system space:
Select all the tuples in the box.space._cluster system space:
manual_leader:instance002> box.space._cluster:select{}
---
- - [1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001']
  - [2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002']
  - [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Delete the tuple corresponding to instance003:
manual_leader:instance002> box.space._cluster:delete(3)
---
- [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Execute box.info.replication to check the health status:
manual_leader:instance002> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    upstream:
      status: follow
      idle: 0.73316000000159
      peer: replicator@127.0.0.1:3301
      lag: 0.00016212463378906
    name: instance001
    downstream:
      status: follow
      idle: 0.7269320000014
      vclock: {2: 1, 1: 21}
      lag: 0.00083398818969727
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 1
    name: instance002
...