Server startup with replication | Tarantool
 

Server startup with replication


In addition to the recovery process described above, the server must take additional steps and precautions if replication is enabled.

Once again the startup procedure is initiated by the box.cfg{} request. One of the box.cfg parameters may be replication, which specifies the replication source(s). We will refer to the replica that is starting up due to box.cfg as the "local" replica, to distinguish it from the other replicas in the replica set, which we will refer to as "distant" replicas.

#1. If there is no snapshot .snap file and the replication parameter is empty and cfg.read_only=false:
then the local replica assumes it is an unreplicated "standalone" instance, or the first replica of a new replica set. It generates new UUIDs for itself and for the replica set. The replica UUID is stored in the _cluster space (unless the replica is anonymous); the replica set UUID is stored in the _schema space. Since a snapshot contains all the data in all the spaces, the local replica's snapshot will contain both the replica UUID and the replica set UUID. Therefore, when the local replica restarts later, it can recover these UUIDs by reading the .snap file.

#1a. If there is no snapshot .snap file and the replication parameter is empty and cfg.read_only=true:
An instance started with box.cfg({... read_only = true}) cannot be the first replica of a new replica set, because the first replica must be a master. Startup therefore fails with the error ER_BOOTSTRAP_READONLY. To avoid this, change the setting for this (local) instance to read_only = false, or ensure that another (distant) instance starts first and has the local instance's UUID in its _cluster space. In the latter case, if ER_BOOTSTRAP_READONLY still occurs, set the local instance's replication_connect_timeout (a box.cfg parameter) to a larger value.
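
The decision in cases #1 and #1a can be sketched as a small Python model. This is an illustration only, not Tarantool code: the function name bootstrap_without_sources, the exception class, and the returned dictionary are hypothetical, and the sketch assumes nothing beyond what the two cases above state.

```python
import uuid


class BootstrapReadOnlyError(Exception):
    """Stands in for ER_BOOTSTRAP_READONLY in this toy model."""


def bootstrap_without_sources(has_snapshot, replication, read_only):
    """Model cases #1 and #1a: no .snap file and an empty replication list.

    Returns freshly generated UUIDs, or raises if the instance is read-only
    and therefore cannot be the first replica of a new replica set.
    """
    if has_snapshot or replication:
        raise ValueError("this sketch only covers cases #1 and #1a")
    if read_only:
        # Case #1a: the first replica must be a master.
        raise BootstrapReadOnlyError("ER_BOOTSTRAP_READONLY")
    # Case #1: new UUIDs for the replica and for the replica set.
    return {
        "replica_uuid": str(uuid.uuid4()),      # would be stored in _cluster
        "replicaset_uuid": str(uuid.uuid4()),   # would be stored in _schema
    }
```

Calling bootstrap_without_sources(False, [], False) yields two distinct UUIDs, while the same call with read_only set to True raises the error that models ER_BOOTSTRAP_READONLY.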

#2. If there is no snapshot .snap file and the replication parameter is not empty and the _cluster space contains no other replica UUIDs:
then the local replica assumes it is not a standalone instance, but is not yet part of a replica set. It must now join the replica set. It sends its replica UUID to the first distant replica that is listed in replication and that will act as a master. This is called the "join request". When a distant replica receives a join request, it sends back:

  1. the UUID of the replica set that the distant replica belongs to
  2. the contents of the distant replica's .snap file.
    When the local replica receives this information, it puts the replica set UUID in its _schema space, puts the distant replica's UUID and connection information in its _cluster space, and makes a snapshot containing all the data sent by the distant replica. Then, if the local replica has data in its WAL .xlog files, it sends that data to the distant replica. The distant replica receives the data and updates its own copy of the data, and then adds the local replica's UUID to its _cluster space.
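
The join exchange can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not the actual protocol: the Replica class, its dictionaries standing in for _schema, _cluster and the snapshot contents, and the join function are all hypothetical names introduced for illustration.

```python
import copy
import uuid


class Replica:
    """Toy replica: dictionaries stand in for _schema, _cluster and user data."""

    def __init__(self, uri, replicaset_uuid=None):
        self.uri = uri
        self.uuid = str(uuid.uuid4())
        self.schema = {"replicaset_uuid": replicaset_uuid}  # models _schema
        self.cluster = {}   # models _cluster: replica UUID -> URI
        self.data = {}      # models the contents of a .snap file


def join(local, distant, local_wal=None):
    """Model the join request described above."""
    # The distant replica replies with its replica set UUID and snapshot data.
    replicaset_uuid = distant.schema["replicaset_uuid"]
    snapshot = copy.deepcopy(distant.data)

    # The local replica records what it received and builds its own snapshot.
    local.schema["replicaset_uuid"] = replicaset_uuid
    local.cluster[distant.uuid] = distant.uri
    local.data = snapshot

    # If the local replica has WAL data, it is sent to the distant replica.
    for key, value in (local_wal or {}).items():
        distant.data[key] = value

    # Finally the distant replica registers the local replica in _cluster.
    distant.cluster[local.uuid] = local.uri
```

After join(local, master) on a master that already holds data, local.data equals the master's data, both replicas share one replica set UUID, and each has the other's UUID in its model of _cluster.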

#3. If there is no snapshot .snap file and the replication parameter is not empty and the _cluster space contains other replica UUIDs:
then the local replica assumes it is not a standalone instance, and is already part of a replica set. It sends its replica UUID and replica set UUID to all the distant replicas that are listed in replication. This is called the "on-connect handshake". When a distant replica receives an on-connect handshake:

  1. the distant replica compares its own copy of the replica set UUID with the UUID sent in the on-connect handshake. If they do not match, the handshake fails and the local replica displays an error.
  2. the distant replica looks for a record of the connecting instance in its _cluster space. If there is no such record, the handshake fails.
    Otherwise the handshake succeeds. The distant replica reads any new information from its own .snap and .xlog files and sends the new requests to the local replica.
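
The two handshake checks can be expressed as a short Python function. This is a minimal sketch, not Tarantool's implementation: the function name, its parameters, and the returned messages are hypothetical, and the distant replica's _cluster space is assumed to be representable as a set of replica UUIDs.

```python
def on_connect_handshake(distant_replicaset_uuid, distant_cluster,
                         local_uuid, local_replicaset_uuid):
    """Return (accepted, reason) for an on-connect handshake.

    distant_cluster is any container of the replica UUIDs that the distant
    replica has in its model of the _cluster space.
    """
    # Check 1: both sides must agree on the replica set UUID.
    if distant_replicaset_uuid != local_replicaset_uuid:
        return False, "replica set UUID mismatch"
    # Check 2: the connecting replica must already be registered.
    if local_uuid not in distant_cluster:
        return False, "replica is not registered in _cluster"
    return True, "ok"
```

Only when both checks pass does the distant replica go on to send new requests to the local replica.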

In the end … the local replica knows what replica set it belongs to, the distant replica knows that the local replica is a member of the replica set, and both replicas have the same database contents.

#4. If there is a snapshot file and replication source is not empty:
first the local replica goes through the recovery process described in the previous section, using its own .snap and .xlog files. Then it sends a "subscribe" request to all the other replicas of the replica set. The subscribe request contains the server vector clock. The vector clock is a collection of "server id, lsn" pairs, one for every replica in the _cluster system space. Each distant replica, upon receiving a subscribe request, reads the requests in its .xlog files and sends a request to the local replica if the request's LSN is greater than the corresponding LSN in the vector clock of the subscribe request. After all the other replicas of the replica set have responded to the local replica's subscribe request, the replica startup is complete.
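
The filtering rule applied to a subscribe request can be illustrated with a small Python function. This is a sketch under assumptions, not the server's code: the function name rows_to_send and the tuple layout (server_id, lsn, request) are hypothetical, and treating a server id that is missing from the vector clock as LSN 0 is an assumption made only for this example.

```python
def rows_to_send(xlog_rows, vclock):
    """Select the .xlog requests that the subscriber has not seen yet.

    xlog_rows: iterable of (server_id, lsn, request) tuples.
    vclock: dict mapping server_id -> last LSN known to the subscriber.
    """
    return [
        row for row in xlog_rows
        # Assumption: an unknown server id counts as LSN 0.
        if row[1] > vclock.get(row[0], 0)
    ]
```

For example, with a vector clock of {1: 10, 2: 5}, a request with LSN 11 from server 1 is sent, while a request with LSN 5 from server 2 is skipped because the subscriber already has it.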

The following temporary limitations apply for versions 1.7 and 2.1:

  • The URIs in the replication parameter should be listed in the same order on all replicas. This is not mandatory, but it helps keep the configuration consistent.
  • The maximum number of entries in the _cluster space is 32. Tuples for out-of-date replicas are not automatically re-used, so if this 32-replica limit is reached, users may have to reorganize the _cluster space manually.