
Managing leader elections

Starting from version 2.6.1, Tarantool has built-in functionality for managing automated leader election in a replica set. Learn more about the concept of leader election.

Leader election is configured with the following box.cfg options:

box.cfg({
    election_mode = <string>,
    election_timeout = <seconds>,
    replication_timeout = <seconds>,
    replication_synchro_quorum = <count>,
    election_fencing_enabled = <boolean>
})
  • election_mode – specifies the role of a node in the leader election process. For details, refer to the option description in the configuration reference.
  • election_timeout – specifies the timeout between election rounds if the previous round ended in a split vote. For details, refer to the option description in the configuration reference.
  • replication_timeout – the existing replication_timeout configuration option is reused for the leader election process. An active leader sends heartbeats once per <replication_timeout> seconds. The default value is 1. If the leader hasn’t sent any heartbeats for a period of replication_timeout * 4, it is considered dead and a new election starts.
  • replication_synchro_quorum – the existing replication_synchro_quorum option is reused to configure the election quorum. The default value is 1, meaning that each node becomes a leader immediately after voting for itself. It is best to set this option to (<cluster size> / 2) + 1, with the division rounded down. Otherwise, there is no guarantee that there is only one leader at a time.
  • election_fencing_enabled – switches the leader fencing mode on and off. For details, refer to the option description in the configuration reference.
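
As a minimal sketch, the options above might be combined as follows on one instance of a three-node replica set. The cluster size, the 5-second election timeout, and the `candidate` mode are assumptions chosen for illustration, not recommendations from this guide. Connection settings such as `listen` and `replication` are omitted.

```lua
-- Sketch for one instance of an assumed three-node replica set.
box.cfg({
    -- 'candidate' lets the node both vote and be elected.
    election_mode = 'candidate',
    -- Assumed value: wait 5 seconds before a new round after a split vote.
    election_timeout = 5,
    -- Heartbeat interval in seconds; the leader is considered dead
    -- after 1 * 4 = 4 seconds without heartbeats.
    replication_timeout = 1,
    -- Majority of three nodes: floor(3 / 2) + 1 = 2.
    replication_synchro_quorum = 2,
    election_fencing_enabled = true
})
```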

Being a leader is not the only requirement for a node to be writable. A leader node must also have its read_only option set to false (box.cfg{read_only = false}), and its connectivity quorum must be either satisfied (box.cfg{replication_connect_quorum = <count>}) or disabled (box.cfg{replication_connect_quorum = 0}).

Nothing prevents you from setting the read_only option to true, but the leader won’t be writable then. The option doesn’t affect the election process itself, so a read-only instance can still vote and become a leader.
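
The distinction between "elected" and "writable" can be checked in code. The helper below is a sketch, not part of Tarantool: `is_writable_leader` is a hypothetical name, and it assumes that the table passed in has the same shape as box.info, with an `ro` flag reporting the effective read-only status and an `election.state` field.

```lua
-- Returns true only when the node is the elected leader AND is not read-only.
-- `info` is expected to look like box.info (fields `ro` and `election.state`).
local function is_writable_leader(info)
    return info.election.state == 'leader' and info.ro == false
end
```

On a running instance it could be called as `is_writable_leader(box.info)`. Taking the table as an argument keeps the check easy to try out with hand-written tables.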

To monitor the current state of a node regarding the leader election, you can use the box.info.election function. For details, refer to the function description.

Example:

tarantool> box.info.election
---
- state: follower
  vote: 0
  leader: 0
  term: 1
...
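
The output above can also be turned into a one-line summary, for example for logging. This is a sketch with a hypothetical helper name, `describe_election`. It assumes that `leader == 0` means no leader is known in the current term and that `vote == 0` means the node has not voted in the current term.

```lua
-- Summarizes a table shaped like box.info.election.
-- Assumption: 0 in `leader` or `vote` means "no leader known" / "not voted".
local function describe_election(e)
    local leader = e.leader == 0 and 'unknown' or ('instance ' .. e.leader)
    local vote = e.vote == 0 and 'none' or ('instance ' .. e.vote)
    return string.format('term %d: state=%s, leader=%s, vote=%s',
                         e.term, e.state, leader, vote)
end
```

On a running instance, `describe_election(box.info.election)` would produce the summary for the current node.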

The Raft-based election implementation logs all its actions with the RAFT: prefix. These actions include handling new Raft messages, changing the node state, voting, bumping the term, and so on.

Leader election won’t work correctly if the election quorum is less than or equal to <cluster size> / 2, because in that case a split vote can lead to a state where two leaders are elected at once.

For example, assume there are five nodes. When the quorum is set to 2, node1 and node2 can both vote for node1, while node3 and node4 can both vote for node5. In this case, both node1 and node5 win the election. When the quorum is set to the cluster majority, that is (<cluster size> / 2) + 1 or greater, a split vote can no longer produce two leaders.
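
The arithmetic behind this example can be sketched in plain Lua. Both function names are hypothetical, and the bound assumes that every node casts at most one vote per term.

```lua
-- Smallest quorum that is a strict majority of the cluster.
local function majority_quorum(cluster_size)
    return math.floor(cluster_size / 2) + 1
end

-- Upper bound on how many candidates can collect `quorum` votes in one term,
-- assuming each node casts at most one vote per term.
local function max_simultaneous_leaders(cluster_size, quorum)
    return math.floor(cluster_size / quorum)
end
```

For five nodes, a quorum of 2 gives a bound of two simultaneous leaders, while the majority quorum of 3 gives a bound of one.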

Keep this in mind when adding new nodes. If the new node changes the majority value, it’s better to update the quorum on all the existing nodes before adding it.
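
As an illustration, assume a replica set that grows from three nodes to four. The majority changes from 2 to 3, so the quorum would be raised on every existing node first. The cluster sizes here are assumptions chosen for the example.

```lua
-- Run on each of the three existing nodes before the fourth one joins.
-- New majority: floor(4 / 2) + 1 = 3.
box.cfg({replication_synchro_quorum = 3})
```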

Also, automated leader election won’t bring many benefits in terms of data safety when used without synchronous replication. If replication is asynchronous and a new leader gets elected, the old leader is still active and considers itself the leader. In such a case, nothing stops it from accepting requests from clients and making transactions. Non-synchronous transactions will be committed successfully because they aren’t checked against the quorum of replicas. Synchronous transactions will fail because they can’t collect the quorum: most of the replicas will reject the old leader’s transactions since it is not a leader anymore.
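
To have transactions checked against the quorum, the data has to live in a synchronous space. A minimal sketch, assuming a Tarantool version that supports the `is_sync` space option and that the code runs on the current writable leader; the space name `accounts` is hypothetical.

```lua
-- Run on the current writable leader.
-- 'accounts' is a hypothetical space name used only for illustration.
local accounts = box.schema.space.create('accounts', {is_sync = true})
-- A space needs a primary index before it can store tuples.
accounts:create_index('primary')
```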
