Мониторинг набора реплик
To learn what instances belong to the replica set and obtain statistics for all these instances, execute a box.info.replication request. The output below shows the replication status for a replica set containing one master and two replicas:
manual_leader:instance001> box.info.replication --- - 1: id: 1 uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660 lsn: 21 name: instance001 2: id: 2 uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23 lsn: 0 upstream: status: follow idle: 0.052655000000414 peer: replicator@127.0.0.1:3302 lag: 0.00010204315185547 name: instance002 downstream: status: follow idle: 0.09503500000028 vclock: {1: 21} lag: 0.00026917457580566 3: id: 3 uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6 lsn: 0 upstream: status: follow idle: 0.77522099999987 peer: replicator@127.0.0.1:3303 lag: 0.0001838207244873 name: instance003 downstream: status: follow idle: 0.33186100000012 vclock: {1: 21} lag: 0 ...
The following diagram illustrates the upstream
and downstream
connections if box.info.replication
executed at the master instance (instance001
):
If box.info.replication
is executed on instance002
, the upstream
and downstream
connections look as follows:
This means that statistics for replicas are given in regard to the instance on which box.info.replication
is executed.
Основные индикаторы работоспособности репликации:
idle: the time (in seconds) since the instance received the last event from a master.
If the master has no updates to send to the replicas, it sends heartbeat messages every replication_timeout seconds. The master is programmed to disconnect if it does not see acknowledgments of the heartbeat messages within
replication_timeout
* 4 seconds.Таким образом, в работоспособном состоянии значение
idle
никогда не должно превышать значениеreplication_timeout
: в противном случае, либо репликация сильно отстает, поскольку мастер опережает реплику, либо отсутствует сетевое подключение между экземплярами.lag: the time difference between the local time at the instance, recorded when the event was received, and the local time at another master recorded when the event was written to the write-ahead log on that master.
Поскольку при расчете
отставания
используются часы операционной системы с двух разных машин, не удивляйтесь, получив отрицательное число: смещение во времени может привести к постоянному запаздыванию времени на удаленном мастере относительно часов на локальном экземпляре.