Cluster instance lifecycle
Every instance in the cluster has an internal state machine. It helps manage cluster operation and describe a distributed system simpler.
Instance lifecycle starts with a
During the initialization,
Cartridge instance binds TCP (iproto) and UDP sockets
(SWIM), checks working directory.
Depending on the result, it enters one
of the following states:
If the working directory is clean and neither snapshots nor cluster-wide
configuration files exist, the instance enters the
The instance starts to accept iproto requests (Tarantool binary protocol) and remains in the state until the user decides to join it to a cluster (to create replicaset or join an existing one).
After that, the instance moves to the
If the instance finds all configuration files and snapshots, it enters the
The instance does not load the files and snapshots yet, because it will download and validate the config first.
On success, the state enters the
On failure, it will move to the
Config is found, loaded and validated. The next step is instance
configuring. If there are any snapshots, the instance will change its
RecoveringSnapshot. Otherwise, it will move to
BootstrappingBox state. By default, all instances start in read-only mode
and don’t start listening until bootstrap/recovery finishes.
The following events can cause instance initialization error:
- Error occurred during
cartridge.remote-control’s connection to binary port
config.ymlfrom workdir (
tmp/), while snapshots are present
- Error while loading configuration from disk
- Invalid config - Server is not present in the cluster configuration
Configuring arguments for
box.cfg if snapshots or config files are
box.cfg execution. Setting up users and stopping
remote-control. The instance will try to start listening to full-featured
iproto protocol. In case of failed attempt instance will change its
BootError. On success, the instance enters the
If there is no replicaset in cluster-wide
config, the instance will set the state to
If snapshots are present,
box.cfg will start a recovery process.
After that, the process is similar to
This state can be caused by the following events:
- Failed binding to binary port for iproto usage
- Server is missing in cluster-wide config
- Replicaset is missing in cluster-wide config
- Failed replication configuration
During this state, a configuration of servers and replicasets is being
performed. Eventually, cluster topology, which is described in the config, is
implemented. But in case of an error instance, the state moves to
BootError. Otherwise, it proceeds to configuring roles.
This state follows the successful configuration of replicasets and cluster topology. The next step is a role configuration.
The state of role configuration. Instance enters this state while
initial setup, after failover trigger(
failover.lua) or after
altering cluster-wide config(
Successful role configuration.
Error during role configuration.