Master-replica: manual failover
Example on GitHub: manual_leader
This tutorial shows how to configure and work with a replica set with manual failover.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
Inside the instances.enabled directory of the created tt environment, create the manual_leader directory.
Inside instances.enabled/manual_leader, create the instances.yml and config.yaml files:
instances.yml specifies instances to run in the current environment and should look like this:
instance001:
instance002:
The config.yaml file is intended to store a replica set configuration.
This section describes how to configure a replica set in config.yaml.
Define a replica set topology inside the groups section:
- The leader option sets instance001 as the replica set leader.
- The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
In the credentials section, create the replicator user with the replication role:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
advertise:
peer:
login: replicator
Set replication.failover to manual to enable the manual failover mode. The resulting replica set configuration should look as follows:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
iproto:
advertise:
peer:
login: replicator
replication:
failover: manual
groups:
group001:
replicasets:
replicaset001:
leader: instance001
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start manual_leader
   • Starting an instance [manual_leader:instance001]...
   • Starting an instance [manual_leader:instance002]...
Check that instances are in the RUNNING status using the tt status command:
$ tt status manual_leader
INSTANCE                    STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001   RUNNING  8841  RW    ready   running  --
manual_leader:instance002   RUNNING  8842  RO    ready   running  --
Connect to instance001 using tt connect:
$ tt connect manual_leader:instance001
   • Connecting to the instance...
   • Connected to manual_leader:instance001
Make sure that the instance is in the running state by executing box.info.status:
manual_leader:instance001> box.info.status
---
- running
...
Check that the instance is writable using box.info.ro:
manual_leader:instance001> box.info.ro
---
- false
...
Execute box.info.replication to check the replica set status. For instance002, upstream.status and downstream.status should be follow.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 7
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.3893879999996
      peer: replicator@127.0.0.1:3302
      lag: 0.00028800964355469
    name: instance002
    downstream:
      status: follow
      idle: 0.37777199999982
      vclock: {1: 7}
      lag: 0
...
To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.
To check that a replica (instance002) gets all updates from the master, follow the steps below:
On instance001, create a space and add data as described in CRUD operation examples (a minimal sketch is also shown after this procedure).
Open the second terminal, connect to instance002 using tt connect, and use the select operation to make sure data is replicated.
Check that box.info.vclock values are the same on both instances:
instance001:
manual_leader:instance001> box.info.vclock
---
- {1: 21}
...
instance002:
manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
Note
Note that a vclock value might include the 0 component that is related to local space operations and might differ for different instances in a replica set.
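The sketch below expands the first step above: it creates a space, defines a primary index, and inserts a couple of rows on the master, then reads them back on the replica. The bands space name, its format, and the sample rows are illustrative assumptions rather than part of this tutorial; any space created through the CRUD operation examples behaves the same way.

-- Run on instance001 (the master): create a space with a primary index
-- and insert sample data.
box.schema.space.create('bands')
box.space.bands:format({
    {name = 'id', type = 'unsigned'},
    {name = 'band_name', type = 'string'},
    {name = 'year', type = 'unsigned'},
})
box.space.bands:create_index('primary', {parts = {'id'}})
box.space.bands:insert({1, 'Roxette', 1986})
box.space.bands:insert({2, 'Scorpions', 1965})

-- Run on instance002 (the replica): the same tuples should be returned,
-- confirming that the data has been replicated.
box.space.bands:select()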
This section describes how to add a new replica to a replica set.
Add instance003 to the instances.yml file:
instance001:
instance002:
instance003:
Add instance003 with the specified iproto.listen option to the config.yaml file:
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
Open the third terminal to work with the new instance. Start instance003 using tt start:
$ tt start manual_leader:instance003
   • Starting an instance [manual_leader:instance003]...
Check the replica set status using tt status:
$ tt status manual_leader
INSTANCE                    STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001   RUNNING  8841  RW    ready   running  --
manual_leader:instance002   RUNNING  8842  RO    ready   running  --
manual_leader:instance003   RUNNING  8856  RO    ready   running  --
After you have added instance003 to the configuration and started it, reload the configuration on all instances.
This is required so that instance001 and instance002 can get data from the new instance if it becomes a master.
Connect to instance003 using tt connect:
$ tt connect manual_leader:instance003
   • Connecting to the instance...
   • Connected to manual_leader:instance003
Reload configurations on all three instances using the reload() function provided by the config module:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
instance003:
manual_leader:instance003> require('config'):reload()
---
...
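If you want to double-check that an instance has picked up the new configuration, you can query the applied values through the same config module. The snippet below is a sketch that assumes the config:get() method; the option path matches the failover mode set in this tutorial's config.yaml:
manual_leader:instance003> require('config'):get('replication.failover')
---
- manual
...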
Execute box.info.replication to check the replica set status. Make sure that upstream.status and downstream.status are follow for instance003.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...
This section shows how to perform manual failover and change a replica set leader.
In the config.yaml file, change the replica set leader from instance001 to null:
replicaset001:
  leader: null
Reload configurations on all three instances using config:reload() and check that instances are in read-only mode. The example below shows how to do this for
instance001:
manual_leader:instance001> require('config'):reload()
---
...
manual_leader:instance001> box.info.ro
---
- true
...
manual_leader:instance001> box.info.ro_reason
---
- config
...
Make sure that box.info.vclock values are the same on all instances:
instance001:
manual_leader:instance001> box.info.vclock
---
- {1: 21}
...
instance002:
manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
instance003:
manual_leader:instance003> box.info.vclock
---
- {1: 21}
...
Change the replica set leader in config.yaml to instance002:
replicaset001:
  leader: instance002
Reload configuration on all instances using config:reload().
Make sure that instance002 is the new master:
manual_leader:instance002> box.info.ro
---
- false
...
Check the replication status using box.info.replication.
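If you prefer a compact summary instead of reading the full box.info.replication output, the following sketch (not part of the original tutorial) prints the upstream and downstream status of every replica set member as seen from the current instance:

-- Summarize replication status from the current instance's point of view.
-- Field names (name, upstream, downstream, status) follow box.info.replication.
for id, replica in pairs(box.info.replication) do
    local upstream = replica.upstream and replica.upstream.status or '-'
    local downstream = replica.downstream and replica.downstream.status or '-'
    print(string.format('id=%d name=%s upstream=%s downstream=%s',
                        id, replica.name or 'unknown', upstream, downstream))
end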
This section describes the process of removing an instance from a replica set.
Before removing an instance, make sure it is in read-only mode. If the instance is a master, perform manual failover.
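For example, after the failover performed above, instance003 is a read-only replica, so box.info.ro on it is expected to return true before you proceed:
manual_leader:instance003> box.info.ro
---
- true
...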
Clear the iproto option for instance003 by setting its value to {}:
instance003:
  iproto: {}
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
Check that the upstream section is missing for instance003 by executing box.info.replication[3]:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: follow
    idle: 0.4588760000006
    vclock: {1: 21}
    lag: 0
  name: instance003
...
Stop instance003 using the tt stop command:
$ tt stop manual_leader:instance003
   • The Instance manual_leader:instance003 (PID = 15551) has been terminated.
Check that downstream.status is stopped for instance003:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: stopped
    message: 'unexpected EOF when reading from socket, called on fd 27, aka 127.0.0.1:3301,
      peer of 127.0.0.1:54185: Broken pipe'
    system_message: Broken pipe
  name: instance003
...
Remove instance003 from the instances.yml file:
instance001:
instance002:
Remove instance003 from config.yaml:
instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
To remove an instance from the replica set permanently, it should be removed from the box.space._cluster system space:
Select all the tuples in the box.space._cluster system space:
manual_leader:instance002> box.space._cluster:select{}
---
- - [1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001']
  - [2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002']
  - [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Delete the tuple corresponding to instance003:
manual_leader:instance002> box.space._cluster:delete(3)
---
- [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Execute box.info.replication to check the health status:
manual_leader:instance002> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    upstream:
      status: follow
      idle: 0.73316000000159
      peer: replicator@127.0.0.1:3301
      lag: 0.00016212463378906
    name: instance001
    downstream:
      status: follow
      idle: 0.7269320000014
      vclock: {2: 1, 1: 21}
      lag: 0.00083398818969727
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 1
    name: instance002
...