Monitoring a replica set
To learn what instances belong to the replica set and obtain statistics for all these instances, issue a box.info.replication request:
tarantool> box.info.replication
---
replication:
  1:
    id: 1
    uuid: b8a7db60-745f-41b3-bf68-5fcce7a1e019
    lsn: 88
  2:
    id: 2
    uuid: cd3c7da2-a638-4c5d-ae63-e7767c3a6896
    lsn: 31
    upstream:
      status: follow
      idle: 43.187747001648
      peer: replicator@192.168.0.102:3301
      lag: 0
    downstream:
      vclock: {1: 31}
  3:
    id: 3
    uuid: e38ef895-5804-43b9-81ac-9f2cd872b9c4
    lsn: 54
    upstream:
      status: follow
      idle: 43.187621831894
      peer: replicator@192.168.0.103:3301
      lag: 2
    downstream:
      vclock: {1: 54}
...
This report is for a master-master replica set of three instances, each having its own instance id, UUID and log sequence number.
The request was issued on master #1, and the reply includes statistics for the other two masters as seen from master #1.
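To make the shape of this report concrete, here is a minimal sketch that walks it and lists each instance with its LSN and upstream status. It assumes the report has already been copied into a plain Python dict; the names `REPORT` and `summarize` are hypothetical, and how the data is fetched from Tarantool is deliberately left out.

```python
# Hypothetical snapshot of the box.info.replication report shown above,
# copied by hand into a plain dict keyed by instance id.
REPORT = {
    1: {"id": 1, "uuid": "b8a7db60-745f-41b3-bf68-5fcce7a1e019", "lsn": 88},
    2: {
        "id": 2,
        "uuid": "cd3c7da2-a638-4c5d-ae63-e7767c3a6896",
        "lsn": 31,
        "upstream": {
            "status": "follow",
            "idle": 43.187747001648,
            "peer": "replicator@192.168.0.102:3301",
            "lag": 0,
        },
        "downstream": {"vclock": {1: 31}},
    },
    3: {
        "id": 3,
        "uuid": "e38ef895-5804-43b9-81ac-9f2cd872b9c4",
        "lsn": 54,
        "upstream": {
            "status": "follow",
            "idle": 43.187621831894,
            "peer": "replicator@192.168.0.103:3301",
            "lag": 2,
        },
        "downstream": {"vclock": {1: 54}},
    },
}


def summarize(report):
    """Return one (id, uuid, lsn, upstream status) tuple per instance."""
    rows = []
    for instance_id in sorted(report):
        entry = report[instance_id]
        # Entries without an upstream section (instance 1 in this report)
        # get None instead of a status string.
        upstream = entry.get("upstream")
        status = upstream["status"] if upstream else None
        rows.append((entry["id"], entry["uuid"], entry["lsn"], status))
    return rows


if __name__ == "__main__":
    for row in summarize(REPORT):
        print(row)
```

In this snapshot, instance 1 has no upstream section, so its status column is None, while instances 2 and 3 both report follow.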
The primary indicators of replication health are:
idle: the time (in seconds) since the instance received the last event from a master. If the master has no updates to send to the replicas, it sends heartbeat messages every replication_timeout seconds. The master is programmed to disconnect if it does not see acknowledgments of the heartbeat messages within replication_timeout * 4 seconds. Therefore, in a healthy replication setup, idle should never exceed replication_timeout: if it does, either the replication is lagging seriously behind, because the master is running ahead of the replica, or the network link between the instances is down.
lag: the time difference between the local time at the instance, recorded when the event was received, and the local time at another master, recorded when the event was written to the write-ahead log on that master. Since the lag calculation uses the operating system clocks from two different machines, do not be surprised if it’s negative: a time drift may lead to the remote master clock being consistently ahead of the local instance’s clock. For multi-master configurations, lag is the maximal lag.
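The two indicators above can be turned into a simple check. The sketch below is an illustration under stated assumptions, not an official monitoring tool: it takes the report as a plain Python dict, and the function name `check_upstreams`, the `max_lag` threshold, and the sample timeout values are all hypothetical. It flags an upstream whose idle exceeds replication_timeout, and one whose lag exceeds the chosen threshold. Negative lag is never flagged, since clock drift alone can produce it.

```python
# Hypothetical, trimmed copy of a box.info.replication report:
# only the fields used by the check are kept.
REPORT = {
    1: {"id": 1},
    2: {"id": 2, "upstream": {"status": "follow", "idle": 43.187747001648, "lag": 0}},
    3: {"id": 3, "upstream": {"status": "follow", "idle": 43.187621831894, "lag": 2}},
}


def check_upstreams(report, replication_timeout, max_lag):
    """Return (instance id, indicator) pairs for upstreams that look unhealthy.

    replication_timeout: the configured replication_timeout, in seconds.
    max_lag: an assumed, user-chosen lag threshold in seconds; the
             documentation above does not prescribe a value.
    """
    problems = []
    for instance_id in sorted(report):
        upstream = report[instance_id].get("upstream")
        if upstream is None:
            continue  # no upstream statistics for this entry
        # Rule from the text: idle should never exceed replication_timeout.
        if upstream["idle"] > replication_timeout:
            problems.append((instance_id, "idle"))
        # Only a large positive lag is flagged; negative values are ignored.
        if upstream["lag"] > max_lag:
            problems.append((instance_id, "lag"))
    return problems


if __name__ == "__main__":
    print(check_upstreams(REPORT, replication_timeout=60.0, max_lag=5.0))
```

Passing the thresholds in as parameters keeps the check independent of any particular configuration; the values in the usage line are placeholders chosen only for the example.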
For better understanding, see the following diagram illustrating the upstream and downstream connections within the replica set of three instances: