Master-master

Example on GitHub: master_master

This tutorial shows how to configure and work with a master-master replica set.

Before starting this tutorial:

  1. Install the tt utility.

  2. Create a tt environment in the current directory by executing the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the master_master directory.

  4. Inside instances.enabled/master_master, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment and should look like this:

      instance001:
      instance002:
      
    • The config.yaml file is intended to store a replica set configuration.
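
Taken together, these preparation steps amount to roughly the following shell session (a sketch; installing tt itself depends on your OS and is omitted here):

$ tt init
$ mkdir instances.enabled/master_master
$ cd instances.enabled/master_master
$ touch instances.yml config.yaml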

This section describes how to configure a replica set in config.yaml.

First, set the replication.failover option to off:

replication:
  failover: off

Define a replica set topology inside the groups section:

  • The database.mode option should be set to rw to make instances work in read-write mode.
  • The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

In the credentials section, create the replicator user with the replication role:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

Set iproto.advertise.peer to advertise the current instance to other replica set members:

iproto:
  advertise:
    peer:
      login: replicator

The resulting replica set configuration should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: off

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

  1. After configuring a replica set, execute the tt start command from the tt environment directory:

    $ tt start master_master
       • Starting an instance [master_master:instance001]...
       • Starting an instance [master_master:instance002]...
    
  2. Check that instances are in the RUNNING status using the tt status command:

    $ tt status master_master
    INSTANCE                      STATUS      PID
    master_master:instance001     RUNNING     30818
    master_master:instance002     RUNNING     30819
    

  1. Connect to both instances using tt connect. Below is the example for instance001:

    $ tt connect master_master:instance001
       • Connecting to the instance...
       • Connected to master_master:instance001
    
    master_master:instance001>
    
  2. Check that both instances are writable using box.info.ro:

    • instance001:

      master_master:instance001> box.info.ro
      ---
      - false
      ...
      
    • instance002:

      master_master:instance002> box.info.ro
      ---
      - false
      ...
      
  3. Execute box.info.replication to check the replica set status. For instance002, upstream.status and downstream.status should be follow.

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 7
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 0
        upstream:
          status: follow
          idle: 0.93246499999987
          peer: replicator@127.0.0.1:3302
          lag: 0.00016188621520996
        name: instance002
        downstream:
          status: follow
          idle: 0.8988360000003
          vclock: {1: 7}
          lag: 0
    ...
    

    To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.
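
    To get a quick summary instead of reading the full output, you can also iterate over box.info.replication in the console (a minimal sketch that prints only the fields used in this tutorial):

    for id, r in pairs(box.info.replication) do
        local up = r.upstream and r.upstream.status or 'none'
        local down = r.downstream and r.downstream.status or 'none'
        print(('id=%d name=%s upstream=%s downstream=%s'):format(id, r.name or '?', up, down))
    end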

Note

A vclock value might include a 0 component, which accounts for local space operations and can therefore differ between instances in a replica set.
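
If you are curious where a 0 component comes from, you can reproduce it by writing to a local space (a minimal sketch; the space name is arbitrary):

-- Writes to local spaces are not replicated, so they are accounted
-- in the 0 component of this instance's vclock only.
box.schema.space.create('local_notes', { is_local = true })
box.space.local_notes:create_index('primary')
box.space.local_notes:insert { 1, 'not replicated' }
box.info.vclock  -- now contains a 0 key, for example {0: 1, 1: 12, 2: 2}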

To check that both instances get updates from each other, follow the steps below:

  1. On instance001, create a space, format it, and create a primary index:

    box.schema.space.create('bands')
    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    box.space.bands:create_index('primary', { parts = { 'id' } })
    

    Then, add sample data to this space:

    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    
  2. On instance002, use the select operation to make sure data is replicated:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
    ...
    
  3. Add more data to the created space on instance002:

    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    
  4. Get back to instance001 and use select to make sure new records are replicated:

    master_master:instance001> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
    ...
    
  5. Check that box.info.vclock values are the same on both instances:

    • instance001:

      master_master:instance001> box.info.vclock
      ---
      - {2: 2, 1: 12}
      ...
      
    • instance002:

      master_master:instance002> box.info.vclock
      ---
      - {2: 2, 1: 12}
      ...
      

Note

To learn how to fix and prevent replication conflicts using trigger functions, see Resolving replication conflicts.
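
As a preview of that approach, such a trigger might look roughly like the sketch below. The winner-selection rule (prefer the tuple with the larger year) is purely hypothetical; see the linked page for the complete pattern:

box.space.bands:before_replace(function(old, new)
    -- Only interfere with INSERT/REPLACE rows that arrive via replication.
    if old ~= nil and new ~= nil and box.session.type() == 'applier' then
        -- Hypothetical rule: prefer the tuple with the newer year.
        if new[3] > old[3] then
            return new
        end
        return old
    end
    -- For everything else, proceed with the original request.
    return new
end)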

To insert conflicting records into instance001 and instance002, follow the steps below:

  1. Stop instance001 using the tt stop command:

    $ tt stop master_master:instance001
    
  2. On instance002, insert a new record:

    box.space.bands:insert { 5, 'incorrect data', 0 }
    
  3. Stop instance002 using tt stop:

    $ tt stop master_master:instance002
    
  4. Start instance001 back up:

    $ tt start master_master:instance001
    
  5. Connect to instance001 and insert a record that should conflict with a record already inserted on instance002:

    box.space.bands:insert { 5, 'Pink Floyd', 1965 }
    
  6. Start instance002 back up:

    $ tt start master_master:instance002
    

    Then, check box.info.replication on instance001. upstream.status should be stopped because of the Duplicate key exists error:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 115.99977827072
          status: stopped
          idle: 2.0342070000006
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data",
            0]
        name: instance002
        downstream:
          status: stopped
          message: 'unexpected EOF when reading from socket, called on fd 24, aka 127.0.0.1:3301,
            peer of 127.0.0.1:58478: Broken pipe'
          system_message: Broken pipe
    ...
    

    The diagram below illustrates how the upstream and downstream connections look:

    [Diagram: replication status on a new master]

To resolve the replication conflict, instance002 first needs to get the correct data from instance001. To achieve this, rebootstrap instance002:

  1. Select all the tuples in the box.space._cluster system space to get the UUID of instance002:

    master_master:instance001> box.space._cluster:select()
    ---
    - - [1, 'c3bfd89f-5a1c-4556-aa9f-461377713a2a', 'instance001']
      - [2, 'dccf7485-8bff-47f6-bfc4-b311701e36ef', 'instance002']
    ...
    
  2. In the config.yaml file, change the following instance002 settings:

    • Set database.mode to ro.
    • Set database.instance_uuid to the UUID value obtained in the previous step.

    instance002:
      database:
        mode: ro
        instance_uuid: 'dccf7485-8bff-47f6-bfc4-b311701e36ef'
    
  3. Reload configurations on both instances using the config:reload() function:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  4. Delete write-ahead logs and snapshots stored in the var/lib/instance002 directory.

    Note

    var/lib is the default directory used by tt to store write-ahead logs and snapshots. Learn more from Configuration.
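
    For example, from the tt environment directory (a sketch only; double-check that the paths match your layout before deleting anything):

    $ rm -f var/lib/instance002/*.xlog var/lib/instance002/*.snap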

  5. Restart instance002 using the tt restart command:

    $ tt restart master_master:instance002
    
  6. Connect to instance002 and make sure it received the correct data from instance001:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
      - [5, 'Pink Floyd', 1965]
    ...
    

After reseeding the replica, you need to resolve the replication conflict that keeps replication stopped:

  1. Execute box.info.replication on instance001. upstream.status is still stopped:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 115.99977827072
          status: stopped
          idle: 1013.688243
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data",
            0]
        name: instance002
        downstream:
          status: follow
          idle: 0.69694700000036
          vclock: {2: 2, 1: 13}
          lag: 0
    ...
    

    The diagram below illustrates how the upstream and downstream connections look:

    [Diagram: replication status after reseeding a replica]

  2. In the config.yaml file, clear the iproto option for instance001 by setting its value to {} to disconnect this instance from instance002. Set database.mode to ro:

    instance001:
      database:
        mode: ro
      iproto: {}
    
  3. Reload the configuration on instance001 only:

    master_master:instance001> require('config'):reload()
    ---
    ...
    
  4. Change database.mode values back to rw for both instances and restore iproto.listen for instance001. The database.instance_uuid option can be removed for instance002:

    instance001:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3301'
    instance002:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3302'
    
  5. Reload configurations on both instances one more time:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  6. Check box.info.replication. upstream.status should now be follow:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          status: follow
          idle: 0.86873800000012
          peer: replicator@127.0.0.1:3302
          lag: 0.0001060962677002
        name: instance002
        downstream:
          status: follow
          idle: 0.058662999999797
          vclock: {2: 2, 1: 13}
          lag: 0
    ...
    

The process of adding instances to a replica set and removing them is similar for all failover modes. Learn how to do this from the Master-replica: manual failover tutorial.

Before removing an instance from a replica set with replication.failover set to off, make sure this instance is in read-only mode.
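
For example, to prepare instance002 for removal, you would first switch it to read-only in config.yaml and reload the configuration, as shown earlier in this tutorial:

instance002:
  database:
    mode: ro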
