Configuring synchronous replication | Tarantool
Configuring synchronous replication

Since version 2.5.1, synchronous replication can be enabled per-space by using the is_sync option:

box.schema.create_space('test1', {is_sync = true})

Any transaction that makes a DML request on this space becomes synchronous. Note that DDL on this space (including truncation) is not synchronous.
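As a minimal sketch, the same option can also be turned on for a space that already exists. The snippet assumes a running instance on which box.cfg{} has been called, and that space_object:alter() accepts is_sync in your Tarantool version. The space name test2 is hypothetical.

```lua
-- Hypothetical space, created without synchronous replication.
local s = box.schema.space.create('test2')
s:create_index('pk')

-- Switch the existing space to synchronous mode.
s:alter({is_sync = true})

-- The flag is exposed on the space object.
print(box.space.test2.is_sync)
```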

Global box.cfg options control the behavior of synchronous transactions. The first one sets the quorum:

box.cfg{replication_synchro_quorum = <number of instances>}
box.cfg{replication_synchro_quorum = "N / 2 + 1"}

This option sets how many replicas must confirm the receipt of a synchronous transaction before it is committed. Since version 2.5.3, the parameter supports dynamic evaluation of the quorum number (see the reference for the replication_synchro_quorum parameter for details). Since version 2.10.0, this option does not account for anonymous replicas. As a usage example, consider the following:
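To make the dynamic form concrete, here is a small plain-Lua sketch of how a majority formula such as "N / 2 + 1" plays out for different cluster sizes. It assumes that N is the number of registered replicas and that the result is rounded down to an integer. majority_quorum is a hypothetical helper, not part of the Tarantool API.

```lua
-- Hypothetical helper that mirrors the formula "N / 2 + 1",
-- assuming the result is rounded down to an integer.
local function majority_quorum(n)
    return math.floor(n / 2 + 1)
end

for n = 1, 5 do
    print(('N = %d -> quorum = %d'):format(n, majority_quorum(n)))
end
```

Under these assumptions, a three-instance cluster gets a quorum of 2 and tolerates one instance being down, while a two-instance cluster also gets a quorum of 2 and needs both instances up.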

-- Instance 1
box.cfg{
    listen = 3313,
    replication_synchro_quorum = 2,
}
box.schema.user.grant('guest', 'super')
_ = box.schema.space.create('sync', {is_sync=true})
_ = _:create_index('pk')
-- Instance 2
box.cfg{
    listen = 3314,
    replication = 'localhost:3313'
}
-- Instance 1
box.space.sync:replace{1}

When the first instance calls replace(), the call does not return until the second instance confirms that it has received and successfully applied the transaction. Note that the quorum is set to 2, yet the transaction is committed even though there is only one replica. This is because the master instance itself also participates in the quorum.

Now, if the second instance is down, the first one won’t be able to commit any synchronous change.

-- Instance 2
Ctrl+D
-- Instance 1
tarantool> box.space.sync:replace{2}
---
- error: Quorum collection for a synchronous transaction is timed out
...

The transaction wasn’t committed because it failed to achieve the quorum in the given time. The time limit is set by a second configuration option:

box.cfg{replication_synchro_timeout = <number of seconds, can be float>}

It sets how many seconds to wait for a synchronous transaction to collect its quorum before the transaction is declared failed and rolled back.
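The following sketch shows one way to observe the timeout from the caller's side. It assumes it runs on Instance 1 from the example above while Instance 2 is stopped. The value 2.5 is purely illustrative.

```lua
local fiber = require('fiber')

-- Wait up to 2.5 seconds for the quorum (illustrative value).
box.cfg{replication_synchro_timeout = 2.5}

local started = fiber.clock()
-- pcall() turns the raised error into a return value.
local ok, err = pcall(box.space.sync.replace, box.space.sync, {2})
local elapsed = fiber.clock() - started

if not ok then
    print(('failed after %.1f s: %s'):format(elapsed, tostring(err)))
end
```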

A successful synchronous transaction commit is persisted in the WAL as a special CONFIRM record. The rollbacks are similarly persisted with a ROLLBACK record.

The timeout and quorum options are not used on replicas. This means that if the master dies, pending synchronous transactions stay pending on the replicas until a new master is elected.
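One way to see whether an instance still holds pending synchronous transactions is to look at box.info.synchro. This is a sketch that assumes a Tarantool release newer than 2.5.1 in which box.info.synchro is available; check the reference for your version before relying on these fields.

```lua
-- Run in the console of the instance you want to inspect.
-- Assumption: box.info.synchro exists in this Tarantool version.
local synchro = box.info.synchro
if synchro ~= nil then
    print('pending synchronous transactions:', synchro.queue.len)
    print('quorum currently in effect:', synchro.quorum)
end
```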

If a transaction is rolled back, it does not mean that the ROLLBACK message reached the replicas. If the master node dies suddenly, the transaction may still be committed by the new master. Your application logic should be ready for that.
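One possible way to prepare for this, sketched below, is to keep writes idempotent so that a retry is harmless even if the earlier attempt ends up committed. replace() fits this because it overwrites the tuple with the same primary key. replace_with_retry is a hypothetical helper, and the attempt count and pause are illustrative.

```lua
local fiber = require('fiber')

-- Hypothetical helper: retry an idempotent write a few times.
-- Returns the tuple on success, or nil plus the last error.
local function replace_with_retry(space, tuple, attempts, pause)
    local last_err
    for _ = 1, attempts do
        local ok, res = pcall(space.replace, space, tuple)
        if ok then
            return res
        end
        last_err = res
        fiber.sleep(pause)
    end
    return nil, last_err
end

-- Up to 3 attempts, 1 second apart, on the 'sync' space from the example.
local tuple, err = replace_with_retry(box.space.sync, {2}, 3, 1)
if tuple == nil then
    print('write not confirmed: ' .. tostring(err))
end
```

A failed result here only means the write was not confirmed to this caller. It does not prove that the tuple is absent from the cluster.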

Synchronous transactions work best with a full-mesh topology. In that case, the replicas can talk to each other if the master node dies and still confirm some pending transactions.
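A full-mesh setup can be sketched as follows: every instance lists all instances, itself included, in its replication option. The third instance on port 3315 is hypothetical and extends the two-instance example above. Like that example, the sketch relies on the guest user having been granted access.

```lua
-- Instance 1. Instances 2 and 3 use the same replication list
-- and differ only in their listen port (3314 and 3315).
box.cfg{
    listen = 3313,
    replication = {
        'localhost:3313',
        'localhost:3314',
        'localhost:3315', -- hypothetical third instance
    },
    replication_synchro_quorum = 2,
}
```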
