box.info.replication | Tarantool

box.info.replication

The replication section of box.info() is an array of tables with statistics for every instance in the replica set that the current instance belongs to (see also “Monitoring a replica set”).

In the following, n is the index of one item in that array. For example, replication[1] holds data about server instance number 1, which may or may not be the current instance (the “current instance” is the one responding to box.info).
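As a minimal sketch of how these items can be walked, the helper below builds one summary line per entry of a table shaped like box.info.replication. The function name summarize, the role labels, and the sample data are all hypothetical. The keys are collected with pairs() and then sorted, on the assumption that instance ids are not necessarily contiguous (note that the sample has no entry 3).

```lua
-- Hypothetical helper: one summary line per replica-set member, ordered by id.
-- `replication` is expected to have the shape of box.info.replication.
local function summarize(replication)
    -- Collect the keys first: pairs() visits them in no guaranteed order.
    local ids = {}
    for n in pairs(replication) do
        table.insert(ids, n)
    end
    table.sort(ids)

    local lines = {}
    for _, n in ipairs(ids) do
        local item = replication[n]
        -- An item with neither field is, for example, the current instance.
        local role = 'no upstream or downstream'
        if item.upstream ~= nil and item.downstream ~= nil then
            role = 'upstream and downstream'
        elseif item.upstream ~= nil then
            role = 'upstream only'
        elseif item.downstream ~= nil then
            role = 'downstream only'
        end
        table.insert(lines, string.format('replication[%s] uuid=%s lsn=%s: %s',
            tostring(n), item.uuid, tostring(item.lsn), role))
    end
    return lines
end

-- Made-up sample data: the UUIDs and LSNs are placeholders, not real output.
local sample = {
    [1] = { id = 1, uuid = '00000000-0000-0000-0000-000000000001', lsn = 120 },
    [2] = { id = 2, uuid = '00000000-0000-0000-0000-000000000002', lsn = 40,
            upstream = { status = 'follow' },
            downstream = { status = 'follow' } },
    [4] = { id = 4, uuid = '00000000-0000-0000-0000-000000000004', lsn = 0,
            upstream = { status = 'connecting' } },
}

local summary = summarize(sample)
for _, line in ipairs(summary) do
    print(line)
end
```

In a Tarantool console the same helper could be called as summarize(box.info.replication).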

  • replication[n].id is a short numeric identifier of instance n within the replica set. This value is stored in the box.space._cluster system space.
  • replication[n].uuid is a globally unique identifier of instance n. This value is stored in the box.space._cluster system space.
  • replication[n].lsn is the log sequence number (LSN) for the latest entry in instance n’s write ahead log (WAL).
  • replication[n].upstream appears (is not nil) if the current instance is following or intending to follow instance n. This ordinarily means that replication[n].upstream.status = follow, that replication[n].upstream.peer = the URL of instance n, which is being followed, and that replication[n].upstream.lag and replication[n].upstream.idle reflect the instance’s speed, as described later. Put another way, replication[n].upstream appears when replication[n].upstream.peer is not the current instance, is not read-only, and was specified in box.cfg{replication={...}}, so that it is shown in box.cfg.replication.
  • replication[n].upstream.status is the replication status of the connection with instance n:
    • auth means that authentication is in progress.
    • connecting means that a connection is being established.
    • disconnected means that the current instance is not connected to the replica set (due to network problems, not replication errors).
    • follow means that the current instance’s role is “replica” (read-only, or not read-only but acting as a replica for this remote peer in a master-master configuration), and is receiving or able to receive data from instance n’s (upstream) master.
    • stopped means that replication was stopped due to a replication error (for example duplicate key).
    • sync means that the master and replica are synchronizing to have the same data.
  • replication[n].upstream.idle is the time (in seconds) since the last event was received. This is the primary indicator of replication health. See more in Monitoring a replica set.
  • replication[n].upstream.lag is the time difference between the local time of instance n, recorded when the event was received, and the local time at another master recorded when the event was written to the write ahead log on that master. See more in Monitoring a replica set.

  • replication[n].upstream.message contains an error message in case of a degraded state, otherwise it is nil.

  • replication[n].downstream appears (is not nil) with data about an instance that is following instance n or is intending to follow it, which ordinarily means replication[n].downstream.status = follow.

  • replication[n].downstream.vclock contains the vector clock, which is a table of ‘id, lsn’ pairs, for example vclock: {1: 3054773, 4: 8938827, 3: 285902018}. (Notice that the table may have multiple pairs although vclock is a singular name).

    Even if instance n is removed, its values will still appear here; however, its values will be overridden if an instance joins later with the same UUID. Vector clock pairs will only appear if lsn > 0.

    replication[n].downstream.vclock may be the same as the current instance’s vclock (box.info.vclock) because this is for all known vclock values of the cluster. A master will know what is in a replica’s copy of vclock because, when the master makes a data change, it sends the change information to the replica (including the master’s vector clock), and the replica replies with what is in its entire vector clock table.

    Also the replica sends its entire vector clock table in response to a master’s heartbeat message, see the heartbeat-message examples in section Binary protocol – replication.

  • replication[n].downstream.idle is the time (in seconds) since the last time that instance n sent events through the downstream replication.

  • replication[n].downstream.status is the replication status for downstream replications:

    • stopped means that downstream replication has stopped.
    • follow means that downstream replication is in progress (instance n is ready to accept data from the master or is currently doing so).
  • replication[n].downstream.message and replication[n].downstream.system_message will be nil unless a problem occurs with the connection. For example, if instance n goes down, then one may see status = 'stopped', message = 'unexpected EOF when reading from socket', and system_message = 'Broken pipe'. See also degraded state.
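
The status, idle, and message fields described above can be combined into a simple health check. The following is only a sketch under stated assumptions: the function name find_problems and the max_idle threshold are hypothetical, and the sample table is made up, reusing the error strings quoted in the text. It takes any table shaped like box.info.replication, so it can be tried outside Tarantool.

```lua
-- Hypothetical helper: collect human-readable problems from a table shaped
-- like box.info.replication. `max_idle` is an assumed threshold in seconds.
local function find_problems(replication, max_idle)
    local ids = {}
    for n in pairs(replication) do
        table.insert(ids, n)
    end
    table.sort(ids)  -- report in a stable order

    local problems = {}
    for _, n in ipairs(ids) do
        local up = replication[n].upstream
        local down = replication[n].downstream
        if up ~= nil then
            if up.status == 'stopped' or up.status == 'disconnected' then
                table.insert(problems, string.format('upstream %s: %s (%s)',
                    tostring(n), up.status, tostring(up.message)))
            elseif up.status == 'follow' and (up.idle or 0) > max_idle then
                table.insert(problems, string.format('upstream %s: idle for %.1f s',
                    tostring(n), up.idle))
            end
        end
        if down ~= nil and down.status == 'stopped' then
            table.insert(problems, string.format('downstream %s: stopped (%s; %s)',
                tostring(n), tostring(down.message),
                tostring(down.system_message)))
        end
    end
    return problems
end

-- Made-up sample: instance 2 is followed but quiet and its downstream broke,
-- instance 4 stopped on a replication error.
local sample = {
    [1] = { id = 1, lsn = 120 },
    [2] = { id = 2, lsn = 40,
            upstream = { status = 'follow', idle = 5.2, lag = 0.01 },
            downstream = { status = 'stopped',
                           message = 'unexpected EOF when reading from socket',
                           system_message = 'Broken pipe' } },
    [4] = { id = 4, lsn = 0,
            upstream = { status = 'stopped', message = 'duplicate key' } },
}

local problems = find_problems(sample, 1)
for _, text in ipairs(problems) do
    print(text)
end
```

On a live instance the call might look like find_problems(box.info.replication, 1). A suitable max_idle depends on the configured heartbeat interval, so the value 1 here is only a placeholder.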