Resolving replication conflicts
Tarantool guarantees that every update is applied only once on every replica. However, due to the asynchronous nature of replication, the order in which updates are applied is not guaranteed. Below we analyze this problem in more detail, provide examples of replication going out of sync, and suggest solutions.
Case 1: You have two instances of Tarantool. For example, you perform a replace operation with the same primary key on both instances at the same time. This causes a conflict over which tuple to save and which one to discard.
Tarantool trigger functions can help here to implement conflict resolution rules based on some condition. For example, if the tuples carry a timestamp, you can keep the tuple with the bigger one.
First, you need a before_replace() trigger on the space that may have conflicts. In this trigger you can compare the old and the new record and choose which one to use (or skip the update entirely, or merge the two records together).
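For example, a minimal sketch of such a trigger might look like this (it assumes that the second field of every tuple is the timestamp):

local my_trigger = function(old, new)
    if new == nil then
        return -- a delete: nothing to resolve
    end
    if old == nil then
        return new -- an insert with no existing record: keep it
    end
    -- keep the record with the bigger timestamp (assumed to be field 2)
    if new[2] > old[2] then
        return new
    end
    return old
end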
Then you need to set the trigger at the right time, before the space starts to receive any updates. The usual way to set a before_replace trigger is right when the space is created, so you need a trigger that sets another trigger: an on_replace() trigger on the system space _space, which captures the moment when your space is created and sets the before_replace trigger there.
The difference between before_replace and on_replace is that on_replace is called after a row is inserted into the space, and before_replace is called before that.
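For illustration, here is a rough sketch of the two triggers on a hypothetical space my_space (the comments describe the key difference):

box.space.my_space:before_replace(function(old, new)
    -- runs before the row is written: the tuple returned here is the one
    -- that actually gets stored, so a conflict can still be resolved
    return new
end)

box.space.my_space:on_replace(function(old, new)
    -- runs after the row has already been written: it can react to the
    -- change, but it cannot substitute the tuple that was stored
end)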
To set a _space:on_replace() trigger correctly, you also need the right timing. The best moment to set it is when _space has just been created, which is captured by the box.ctl.on_schema_init() trigger. You will also need box.on_commit to get access to the space being created. The resulting snippet would be the following:
local my_space_name = 'my_space'
local my_trigger = function(old, new) ... end -- your function resolving a conflict
box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)
Case 2: In a replica set of two masters, suppose both masters try to insert a tuple with the same unique key:
tarantool> box.space.tester:insert{1, 'data'}
This would cause the error Duplicate key exists in unique index 'primary' in space 'tester', and replication would be stopped.
(This is the behavior when the replication_skip_conflict configuration parameter has its default recommended value, false.)
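For reference, the parameter is set via box.cfg; with it set to true, conflicting rows are skipped instead of stopping replication, though the default false remains the recommended value:

tarantool> box.cfg{replication_skip_conflict = true}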
$ # error messages from master #1
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop
$ # error messages from master #2
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop
If we check the replication status with box.info, we will see that replication from master #1 is stopped (1.upstream.status = stopped). Additionally, no data is replicated from that master (the 1.downstream section is missing from the report), because the downstream has encountered the same error:
# replication statuses (report from master #3)
tarantool> box.info
---
- version: 1.7.4-52-g980d30092
  id: 3
  ro: false
  vclock: {1: 9, 2: 1000000, 3: 3}
  uptime: 557
  lsn: 3
  vinyl: []
  cluster:
    uuid: 34d13b1a-f851-45bb-8f57-57489d3b3c8b
  pid: 30445
  status: running
  signature: 1000012
  replication:
    1:
      id: 1
      uuid: 7ab6dee7-dc0f-4477-af2b-0e63452573cf
      lsn: 9
      upstream:
        peer: replicator@192.168.0.101:3301
        lag: 0.00050592422485352
        status: stopped
        idle: 445.8626639843
        message: Duplicate key exists in unique index 'primary' in space 'tester'
    2:
      id: 2
      uuid: 9afbe2d9-db84-4d05-9a7b-e0cbbf861e28
      lsn: 1000000
      upstream:
        status: follow
        idle: 201.99915885925
        peer: replicator@192.168.0.102:3301
        lag: 0.0015020370483398
      downstream:
        vclock: {1: 8, 2: 1000000, 3: 3}
    3:
      id: 3
      uuid: e826a667-eed7-48d5-a290-64299b159571
      lsn: 3
  uuid: e826a667-eed7-48d5-a290-64299b159571
...
When replication is later manually resumed:
# resuming stopped replication (at all masters)
tarantool> original_value = box.cfg.replication
tarantool> box.cfg{replication={}}
tarantool> box.cfg{replication=original_value}
… the faulty row in the write-ahead log files is skipped.
Example: replication runs out of sync
In a master-master cluster of two instances, suppose we make the following operation:
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
When this operation is applied on both instances in the replica set:
# at master #1
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
# at master #2
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
… we can have the following results, depending on the order of execution:
- each master’s row contains the UUID from master #1,
- each master’s row contains the UUID from master #2,
- master #1 has the UUID of master #2, and vice versa.
Solution #1: commutative changes
The cases described in the previous paragraphs are examples of non-commutative operations, i.e. operations whose result depends on the execution order. For commutative operations, by contrast, the execution order does not matter.
Consider for example the following command:
tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})
This operation is commutative: we get the same result no matter in which order the updates are applied on the masters.
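As a rough illustration, assuming the space tester exists on both masters and no tuple with key 1 exists yet, the two masters converge even when both run the command concurrently:

# at master #1
tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})
# at master #2
tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})
# once both operations have been replicated, each master holds [1, 1]:
# its own upsert inserted {1, 0}, and the replicated one incremented field 2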
Solution #2: trigger usage
The logic and the snippet setting the trigger are the same here as in case 1, but the trigger function itself differs:
local my_space_name = 'test'
local my_trigger = function(old, new, sp, op)
    -- op: 'INSERT', 'DELETE', 'UPDATE', or 'REPLACE'
    if new == nil then
        print("No new during "..op, old)
        return -- deletes are ok
    end
    if old == nil then
        print("Insert new, no old", new)
        return new -- insert without old value: ok
    end
    print(op.." duplicate", old, new)
    if op == 'INSERT' then
        if new[2] > old[2] then
            -- creating a new tuple will change op to 'REPLACE'
            return box.tuple.new(new)
        end
        return old
    end
    if new[2] > old[2] then
        return new
    else
        return old
    end
end

box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)
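As a rough usage illustration, assuming the space test has been created with a numeric second field used as a timestamp, conflicting writes now converge on all masters:

# at master #1
tarantool> box.space.test:replace{1, 100}
# at master #2
tarantool> box.space.test:replace{1, 200}
# after replication, both masters keep {1, 200}: the trigger discards
# the record with the smaller value in field 2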