Replication requests and responses | Tarantool
Reference Internals Binary protocol Replication requests and responses

Replication requests and responses

This section describes internal requests and responses that happen during replication. Each of them is distinguished by the header, containing a unique IPROTO_REQUEST_TYPE value. These values and the corresponding packet body structures are considered below.

Connectors and clients do not need to send replication packets.

Name Code Description
IPROTO_JOIN 0x41 Request to join a replica set
IPROTO_SUBSCRIBE 0x42 Request to subscribe to a specific node in a replica set
IPROTO_VOTE 0x44 Request for replication
IPROTO_BALLOT 0x29 Response to IPROTO_VOTE. Used during replica set bootstrap
IPROTO_FETCH_SNAPSHOT 0x45 Fetch the master’s snapshot and start anonymous replication. See replication_anon
IPROTO_REGISTER 0x46 Register an anonymous replica so it is not anonymous anymore

The master also sends heartbeat messages to the replicas. The heartbeat message’s IPROTO_REQUEST_TYPE is 0.

Below are details on individual replication requests. For synchronous replication requests, see below.

Once in replication_timeout seconds, a master sends a heartbeat message to a replica, and the replica sends a response. Both messages’ IPROTO_REQUEST_TYPE is IPROTO_OK. IPROTO_TIMESTAMP is a float-64 MP_DOUBLE 8-byte timestamp.

Since version 2.11, both messages have an optional field in the body that contains the IPROTO_VCLOCK_SYNC key. The master’s heartbeat has no body if the IPROTO_VCLOCK_SYNC key is omitted.

The message from master to a replica:

Heartbeat message from masterSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_OKIPROTO_REPLICA_IDMP_UINTIPROTO_TIMESTAMPMP_DOUBLEBody(Optional)IPROTO_VCLOCK_SYNCMP_UINT

The response from the replica:

Response from replicaSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_OKIPROTO_REPLICA_IDMP_UINT(Optional)IPROTO_TIMESTAMPMP_DOUBLEBodyIPROTO_VCLOCKMP_MAP(Optional)IPROTO_VCLOCK_SYNCMP_UINTIPROTO_TERMMP_UINT

The tutorial Understanding the binary protocol shows actual byte codes of the above heartbeat examples.

Code: 0x41.

To join a replica set, an instance must send an initial IPROTO_JOIN request to any node in the replica set:

IPROTO_JOINSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_JOINIPROTO_SYNCMP_UINTBodyIPROTO_INSTANCE_UUIDMP_STR - UUID of this instance

The node that receives the request does the following in response:

  1. It sends its vclock:

    Response to IPROTO_JOINSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_OKIPROTO_SYNCMP_UINTBodyIPROTO_VCLOCKMP_MAP
  2. It sends a number of INSERT requests (with additional LSN and ServerID). In this way, the data is updated on the instance that sent the IPROTO_JOIN request. The instance should not reply to these INSERT requests.

  3. It sends the new vclock’s MP_MAP in a response similar to the one above and closes the socket.

Code: 0x42.

If IPROTO_JOIN was successful, the initiator instance must send an IPROTO_SUBSCRIBE request to all the nodes listed in its box.cfg.replication:

IPROTO_SUBSCRIBESizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_SUBSCRIBEIPROTO_SYNCMP_UINTBodyIPROTO_INSTANCE_UUIDMP_STRIPROTO_CLUSTER_UUIDMP_STRIPROTO_VCLOCKMP_MAPIPROTO_SERVER_VERSIONMP_UINTIPROTO_REPLICA_ANONMP_BOOL(Optional)IPROTO_ID_FILTERMP_ARRAY

After a successful IPROTO_SUBSCRIBE request, the instance must process every request that could come from other masters. Each master’s request includes a vclock pair corresponding to that master – its instance ID and its LSN, independent from other masters.

IPROTO_ID_FILTER (0x51) is an optional key used in the SUBSCRIBE request followed by an array of ids of instances whose rows won’t be relayed to the replica. The field is encoded only when the ID list is not empty.

Code: 0x44.

When connecting for replication, an instance sends an IPROTO_VOTE request. It has no body:

IPROTO_VOTESizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_VOTE

IPROTO_VOTE is critical during replica set bootstrap. The response to this request is IPROTO_BALLOT.

Code: 0x29.

This value of IPROTO_REQUEST_TYPE indicates a message sent in response to IPROTO_VOTE (not to be confused with the key IPROTO_RAFT_VOTE).

IPROTO_BALLOT and IPROTO_VOTE are critical during replica set bootstrap. IPROTO_BALLOT corresponds to a map containing the following fields:

IPROTO_BALLOT bodyIPROTO_BALLOT_IS_RO_CFGMP_BOOLIPROTO_BALLOT_VCLOCKMP_MAPIPROTO_BALLOT_GC_VCLOCKMP_MAPIPROTO_BALLOT_IS_ROMP_BOOLIPROTO_BALLOT_IS_ANONMP_BOOLIPROTO_BALLOT_IS_BOOTEDMP_BOOLIPROTO_BALLOT_CAN_LEADMP_BOOLIPROTO_BALLOT_BOOTSTRAP_LEADER_UUIDMP_STRIPROTO_BALLOT_REGISTERED_REPLICA_UUIDSMP_ARRAY

IPROTO_BALLOT_REGISTERED_REPLICA_UUIDS has the MP_ARRAY type. The array contains MP_STR elements.

Name Code Description
IPROTO_RAFT 0x1e Inform that the node changed its RAFT status
IPROTO_RAFT_PROMOTE 0x1f Wait, then choose new replication leader
IPROTO_RAFT_DEMOTE 0x20 Revoke the leader role from the instance
IPROTO_RAFT_CONFIRM 0x28 Confirm that the RAFT transactions have achieved quorum and can be committed
IPROTO_RAFT_ROLLBACK 0x29 Roll back the RAFT transactions because they haven’t achieved quorum

Code: 0x1e.

A node broadcasts the IPROTO_RAFT request to all the replicas connected to it when the RAFT state of the node changes. It can be any actions changing the state, like starting a new election, bumping the term, voting for another node, becoming the leader, and so on.

If there should be a response, for example, in case of a vote request to other nodes, the response will also be an IPROTO_RAFT message. In this case, the node should be connected as a replica to another node from which the response is expected because the response is sent via the replication channel. In other words, there should be a full-mesh connection between the nodes.

IPROTO_RAFTSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_RAFTIPROTO_REPLICA_IDMP_UINTBodyIPROTO_RAFT_TERMMP_UINTIPROTO_RAFT_VOTEMP_UINTIPROTO_RAFT_STATEMP_UINTIPROTO_RAFT_VCLOCKMP_MAPIPROTO_RAFT_LEADER_IDMP_UINTIPROTO_RAFT_IS_LEADER_SEENMP_BOOL

IPROTO_REPLICA_ID is the ID of the replica from which the request came.

Code: 0x1f.

See box.ctl.promote().

IPROTO_RAFT_PROMOTESizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_RAFT_PROMOTEIPROTO_REPLICA_IDMP_UINTIPROTO_LSNMP_UINTBodyIPROTO_REPLICA_IDMP_UINTIPROTO_LSNMP_UINTIPROTO_TERMMP_UINT

In the header:

  • IPROTO_REPLICA_ID is the replica ID of the node that sent the request.
  • IPROTO_LSN is the actual LSN of the promote operation as recorded in the WAL.

In the body:

  • IPROTO_REPLICA_ID is the replica ID of the previous synchronous queue owner.
  • IPROTO_LSN is the LSN of the last operation on the previous synchronous queue owner.
  • IPROTO_TERM is the term in which the node that sent the request becomes the synchronous queue owner. This term corresponds to the value of box.info.synchro.queue.term on the instance.

Code: 0x20.

See box.ctl.demote().

IPROTO_RAFT_DEMOTESizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_RAFT_DEMOTEIPROTO_REPLICA_IDMP_UINTIPROTO_LSNMP_UINTBodyIPROTO_REPLICA_IDMP_UINTIPROTO_LSNMP_UINTIPROTO_TERMMP_UINT

In the header:

  • IPROTO_REPLICA_ID is the replica ID of the node that sent the request.
  • IPROTO_LSN is the actual LSN of the demote operation as recorded in the WAL.

In the body:

  • IPROTO_REPLICA_ID is the replica ID of the node that sent the request (same as the value in the header).
  • IPROTO_LSN is the LSN of the last synchronous transaction recorded in the node’s WAL.
  • IPROTO_TERM is the term in which the queue becomes empty.

Code: 0x28.

This message is used in replication connections between Tarantool nodes in synchronous replication. It is not supposed to be used by any client applications in their regular connections.

This message confirms that the transactions that originated from the instance with id = IPROTO_REPLICA_ID (body) have achieved quorum and can be committed, up to and including LSN = IPROTO_LSN (body).

The body is a 2-item map:

IPROTO_RAFT_CONFIRMSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_RAFT_CONFIRMIPROTO_REPLICA_IDMP_INTIPROTO_LSNMP_INTIPROTO_SYNCMP_UINTBodyIPROTO_REPLICA_IDMP_INTIPROTO_LSNMP_INT

In the header:

  • IPROTO_REPLICA_ID is the ID of the replica that sends the confirm message.
  • IPROTO_LSN is the LSN of the confirmation action.

In the body:

  • IPROTO_REPLICA_ID is the ID of the instance from which the transactions originated.
  • IPROTO_LSN is the LSN up to which the transactions should be confirmed.

Prior to Tarantool v. 2.10.0, IPROTO_RAFT_CONFIRM was called IPROTO_CONFIRM.

Code: 0x29.

This message is used in replication connections between Tarantool nodes in synchronous replication. It is not supposed to be used by any client applications in their regular connections.

This message says that the transactions that originated from the instance with id = IPROTO_REPLICA_ID (body) couldn’t achieve quorum for some reason and should be rolled back, down to LSN = IPROTO_LSN (body) and including it.

The body is a 2-item map:

IPROTO_RAFT_ROLLBACKSizeMP_UINTHeaderIPROTO_REQUEST_TYPEIPROTO_RAFT_ROLLBACKIPROTO_REPLICA_IDMP_INTIPROTO_LSNMP_INTIPROTO_SYNCMP_UINTBodyIPROTO_REPLICA_IDMP_INTIPROTO_LSNMP_INT

In the header:

  • IPROTO_REPLICA_ID is the ID of the replica that sends the rollback message.
  • IPROTO_LSN is the LSN of the rollback action.

In the body:

  • IPROTO_REPLICA_ID is the ID of the instance from which the transactions originated.
  • IPROTO_LSN is the LSN starting with which all pending synchronous transactions should be rolled back.

Prior to Tarantool v. 2.10.0, IPROTO_RAFT_ROLLBACK was called IPROTO_ROLLBACK.

Found what you were looking for?
Feedback