Cluster administrator’s guide

This guide explains how to deploy and manage the Tarantool Enterprise cluster. For more information on managing Tarantool instances, see the Tarantool User’s Guide.

Before deploying the cluster, familiarize yourself with the notion of cluster roles and deploy Tarantool instances according to the desired cluster topology.

Deploying the cluster

To deploy the cluster, first, configure all the necessary replica sets according to the desired cluster topology, then bootstrap the cluster.

You can do this via the Web interface which is available at http://<instance_hostname>:<instance_http_port> (e.g., http://localhost:8081).

In the web interface, do the following:

  1. Depending on the authentication state:

    • If enabled (in production), enter your credentials and click Log in:

      ../_images/auth_creds.png

      And proceed to configuring the cluster.

    • If disabled (for easier testing), simply proceed to configuring the cluster.

      Note

      To enable authentication manually, click Log in in the upper-left corner, enter your credentials, click Log in again, and then click the Auth: disabled switch (it turns into Auth: enabled).

  2. Click Create next to the first unconfigured server to create the first replica set solely for the router.

    ../_images/unconfigured-router.png
  3. In the pop-up window, check the vshard-router role.

    ../_images/create-router.png

    Additionally, check every custom cluster role meant to handle compute-intensive workloads.

    Important

    As described in the built-in roles section, it is a good practice to enable workload-specific cluster roles on instances running on physical servers with workload-specific hardware.

    And click Submit.

  4. (Optional) If required by topology, populate the newly created replica set with more routers (and compute nodes in parallel):

    1. Click Join next to other unconfigured instances:

      ../_images/join-router.png
    2. Select the first router (and compute node) and click Submit:

      ../_images/select-router.png

      The current instance will inherit cluster roles from the replica set.

  5. Create another replica set for storage nodes by repeating the first step and checking the vshard-storage role.

    ../_images/create-storage.png

    Additionally, check every custom cluster role meant to handle transaction-intensive workloads.

    And click Submit.

  6. Click Join next to another unconfigured server dedicated to transaction-intensive workloads.

    ../_images/join-replicaset.png
  7. Select the second (vshard-storage) replica set, and click Submit to add the server to it.

    ../_images/join-storage.png
  8. Depending on cluster topology:

    • add more instances to the first or second sets, or
    • create a new one and populate it with instances meant to handle either compute-intensive or transaction-intensive workloads.

    For example:

    ../_images/final-cluster.png
  9. (Optional) By default, all new vshard-storage replica sets get a weight of 1 before the vshard’s bootstrap in the next step.

    Note

    In case you add a new replica set after the bootstrap as described in the topology change section, it will get a weight of 0 by default.

    To make different replica sets store different numbers of buckets, click Edit next to a replica set, change its default weight, and click Submit:

    ../_images/change-weight.png

    For more information on buckets and replica set’s weights, see the vshard module documentation.

  10. Bootstrap vshard by clicking the corresponding button and then OK, or by running cluster.admin.bootstrap_vshard() in the administrative console.

    ../_images/bootstrap-vshard.png

    This command creates virtual buckets and distributes them among storages.
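
    If you prefer the administrative console for this step, a minimal sketch looks as follows (the host and port are placeholders; cluster.admin.bootstrap_vshard() is the call mentioned in the step above):

    $ tarantoolctl connect <instance_hostname>:<port>
    $ tarantool> cluster.admin.bootstrap_vshard()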

From now on, all the necessary cluster configuration can be done via the web interface.

Automatic configuration synchronization

All instances in a Tarantool cluster have the same configuration. To this end, every instance stores a copy of the configuration file, and the cluster keeps these copies in sync.

The distributed configuration file is updated automatically upon every Submit in the Web interface and includes the cluster topology and role descriptions.

The cluster validates any configuration change and rejects inappropriate ones.

Managing the cluster

This chapter explains how to:

  • change the cluster topology,
  • enable automatic failover,
  • switch the replica set’s master manually,
  • deactivate replica sets, and
  • expel instances.

Changing the cluster topology

Upon adding a newly deployed instance to a new or existing replica set:

  1. The cluster validates the configuration update by checking if the new instance is available using the membership module.

    Note

    The membership module works over the UDP protocol and can operate before the box.cfg function is called.

    All the nodes in the cluster must be healthy for the validation to succeed.

  2. The new instance waits until another instance in the cluster receives the configuration update and discovers it, again via the membership module. At this step, the new instance does not have a UUID yet.

  3. Once the instance realizes its presence is known to the cluster, it calls the box.cfg function and starts living its life.

    For more information, see the box.cfg submodule reference.
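
If you want to check by hand which instances the cluster currently sees and whether a new one is among them, you can query the membership module from the administrative console. The sketch below is only illustrative and assumes the membership rock exposes a members() function returning a table of known members keyed by URI:

membership = require('membership')

-- print every node known to this instance together with its status
-- (status names such as 'alive' or 'dead' come from the membership module)
for uri, member in pairs(membership.members()) do
    print(uri, member.status)
end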

An optimal strategy for connecting new nodes to the cluster is to deploy a new zero-weight replica set instance by instance, and then increase the weight. Once the weight is updated and all cluster nodes are notified of the configuration change, buckets start migrating to new nodes.

To populate the cluster with more nodes, do the following:

  1. Deploy new Tarantool instances as described in the deployment section.

    If new nodes do not appear in the Web interface, click Probe server and specify their URIs manually.

    ../_images/probe-server.png
  2. In the Web interface, click Create to add an unconfigured instance to a new replica set:

    ../_images/new-unconfig.png

    Check the necessary roles and click Submit.

    Note

    In case you are adding a new vshard-storage instance, remember that all such instances get a weight of 0 by default after the vshard bootstrap that took place during the initial cluster deployment.

    ../_images/zero.png
  3. Click Join next to another unconfigured server, select the new replica set, and click Submit.

    ../_images/join-new-set.png
  4. If necessary, repeat the above step for more instances to reach the desired redundancy level.

  5. In case you are deploying a new vshard-storage replica set, populate it with data when you are ready.

    ../_images/change-weight.png

    Click Edit next to the replica set in question, increase its weight, and click Submit to start data rebalancing.

Data rebalancing

Rebalancing (resharding) is initiated periodically and upon adding a new replica set with a non-zero weight to the cluster. For more information, see the rebalancing process section of the vshard module documentation.

The most convenient way to trace through the process of rebalancing is to monitor the number of active buckets on storage nodes. Initially, a newly added replica set has 0 active buckets. After a few minutes, the background rebalancing process begins to transfer buckets from other replica sets to the new one. Rebalancing continues until the data is distributed evenly among all replica sets.

To monitor the current number of buckets, connect to any Tarantool instance over the administrative console, and say:

$ tarantool> vshard.storage.info().bucket
---
- receiving: 0
  active: 1000
  total: 1000
  garbage: 0
  sending: 0
...

The number of buckets may be increasing or decreasing depending on whether the rebalancer is migrating buckets to or from the storage node.
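
If you prefer not to re-run the command by hand, a small background fiber can log the active bucket count at a fixed interval. This is a minimal sketch meant to be run on a storage node where vshard.storage is already initialized; the 10-second interval is an arbitrary choice:

fiber = require('fiber')
log = require('log')
vshard = require('vshard')

-- log the number of active buckets on this storage every 10 seconds
-- (the fiber runs until the instance is restarted)
fiber.create(function()
    while true do
        log.info('active buckets: %d', vshard.storage.info().bucket.active)
        fiber.sleep(10)
    end
end)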

For more information on the monitoring parameters, see the monitoring storages section.

Deactivating replica sets

To deactivate an entire replica set (e.g., to perform maintenance on it) means to move all of its buckets to other sets.

To deactivate a set, do the following:

  1. Click Edit next to the set in question.

  2. Set its weight to 0 and click Submit:

    ../_images/zero-weight.png
  3. Wait for the rebalancing process to finish migrating all the set’s buckets away. You can monitor the current bucket number as described in the data rebalancing section.

Expelling instances

Once an instance is expelled, it can never participate in the cluster again as every instance will reject it.

To expel an instance, click Expel next to it, then OK:

../_images/expelling-instance.png

Enabling automatic failover

In a master-replica cluster configuration with automatic failover enabled, if the user-specified master of any replica set fails, the cluster automatically chooses the next replica from the priority list and grants it the active master role. When the failed master comes back online, its role is restored and the active master, again, becomes a replica.

To set the priority in a replica set, click Edit next to the set in question, drag replicas to their place in the priority list, and click Submit:

../_images/failover-priority.png

The failover is disabled by default. To enable it, click Failover: disabled:

../_images/failover.png

And, in the Failover control window, click Enable:

../_images/failover-control.png

The failover status will change to enabled:

../_images/enabled-failover.png

For more information, see the replication section of the Tarantool manual.

Switching the replica set’s master

To manually switch the master in a replica set, click the Edit button next to the replica set in question:

../_images/edit-replica-set.png

Select another master and click Submit:

../_images/switch-master.png

Exploring spaces

The web interface lets you connect (in the browser) to any instance in the cluster and see what spaces it stores (if any) and their contents.

To explore spaces:

  1. Open the Space Explorer tab in the menu on the left:

    ../_images/space_explr_tab.png

    The instances you see were deployed as part of the example cluster application.

  2. Click connect next to an instance that stores data. The basic sanity check (test.py) of the example application puts sample data into one replica set (shard), so its master and replica store the data in their spaces:

    ../_images/spaces_with_data.png

    When connected to an instance, the space explorer shows a table with basic information on its spaces. For more information, see the box.space reference.

    To see hidden spaces, tick the corresponding checkbox:

    ../_images/hidden_spaces.png
  3. Click the space’s name to see its format and contents:

    ../_images/space_contents.png

    To search the data, select an index and, optionally, its iteration type from the drop-down lists, and enter the index value:

    ../_images/space_search.png
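
The same information is available from the administrative console. The sketch below is hypothetical: it assumes a space named test with a primary index on its first field, so adjust the space and index names to whatever your application defines:

-- show the space format (field names and types)
box.space.test:format()

-- select up to 10 tuples through the primary index using the GE iterator
box.space.test.index[0]:select({}, {iterator = 'GE', limit = 10})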

Resolving conflicts

Tarantool has an embedded mechanism for asynchronous replication. As a consequence, records are distributed among the replicas with a delay, so conflicts can arise.

To prevent conflicts, the special trigger space.before_replace is used. It is executed every time before a change is made to the space for which it is configured. The trigger function is implemented in the Lua programming language. It takes the old and new values of the tuple to be modified as its arguments, and its return value is used as the result of the operation: it becomes the new value of the modified tuple.

For insert operations, the old value is absent, so nil is passed as the first argument.

For delete operations, the new value is absent, so nil is passed as the second argument. The trigger function can also return nil, thus turning the operation into a delete.

This example shows how to use the space.before_replace trigger to prevent replication conflicts. Suppose we have a box.space.test table that is modified in multiple replicas at the same time. We store one payload field in this table. To ensure consistency, we also store the last modification time in each tuple of this table and set the space.before_replace trigger, which gives preference to newer tuples. Below is the code in Lua:

fiber = require('fiber')
-- define a function that will modify the tuples of the space
function test_replace(tuple)
        -- add a timestamp to each tuple in the space
        tuple = box.tuple.new(tuple):update{{'!', 2, fiber.time()}}
        box.space.test:replace(tuple)
end
box.cfg{ } -- restore from the local directory
-- set the trigger to avoid conflicts
box.space.test:before_replace(function(old, new)
        if old ~= nil and new ~= nil and new[2] < old[2] then
                return old -- ignore the request
        end
        -- otherwise apply as is
end)
box.cfg{ replication = {...} } -- subscribe
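
With the trigger installed, the three cases described above can be checked from the console. The calls below are a hypothetical usage sketch: they assume that box.space.test stores tuples of the form {id, timestamp, payload} with a primary index on the first field, matching the timestamp kept in field 2 by the code above:

-- insert: old == nil, so the trigger applies the operation as is
box.space.test:replace{1, fiber.time(), 'first version'}

-- replace with an older timestamp: new[2] < old[2], so the trigger
-- returns old and the stale change is ignored
box.space.test:replace{1, fiber.time() - 100, 'stale version'}

-- delete: new == nil, so the trigger applies the operation as is
box.space.test:delete{1}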

Upgrading in production

To upgrade either a single instance or a cluster, you need a new version of the packaged (archived) application.

A single instance upgrade is simple:

  1. Upload the package (archive) to the server.
  2. Stop the current instance.
  3. Deploy the new one as described in deploying packaged applications (or archived ones).

Cluster upgrade

To upgrade a cluster, choose one of the following scenarios:

  • Cluster shutdown. Recommended for backward-incompatible updates, requires downtime.
  • Instance by instance. Recommended for backward-compatible updates, does not require downtime.

To upgrade the cluster, do the following:

  1. Schedule a downtime or plan for the instance-by-instance upgrade.
  2. Upload a new application package (archive) to all servers.

Next, execute the chosen scenario:

  • Cluster shutdown:
    1. Stop all instances on all servers.
    2. Deploy the new package (archive) on every server.
  • Instance by instance. Do the following in every replica set in succession:
    1. Stop a replica on any server.
    2. Deploy the new package (archive) in place of the old replica.
    3. Promote the new replica to a master as described in switching the master section.
    4. Redeploy the old master and the rest of the instances in the replica set.
    5. Be prepared to resolve possible logic conflicts.

Monitoring cluster via CLI

This section describes parameters you can monitor over the administrative console.

Connecting to nodes via CLI

Each Tarantool node (router/storage) provides an administrative console (Command Line Interface) for debugging, monitoring, and troubleshooting. The console acts as a Lua interpreter and displays the result in the human-readable YAML format. To connect to a Tarantool instance via the console, say:

$ tarantoolctl connect <instance_hostname>:<port>

where <instance_hostname>:<port> is the instance’s URI.

Monitoring storages

Use vshard.storage.info() to obtain information on storage nodes.

Output example

$ tarantool> vshard.storage.info()
---
- replicasets:
    <replicaset_2>:
      uuid: <replicaset_2>
      master:
        uri: storage:storage@127.0.0.1:3303
    <replicaset_1>:
      uuid: <replicaset_1>
      master:
        uri: storage:storage@127.0.0.1:3301
  bucket: # buckets status
    receiving: 0 # buckets in the RECEIVING state
    active: 2 # buckets in the ACTIVE state
    garbage: 0 # buckets in the GARBAGE state (are to be deleted)
    total: 2 # total number of buckets
    sending: 0 # buckets in the SENDING state
  status: 1 # the status of the replica set
  replication:
    status: disconnected # the status of the replication
    idle: <idle>
  alerts:
  - ['MASTER_IS_UNREACHABLE', 'Master is unreachable: disconnected']

Status list

  • Code 0 (Green): The replica set works in a regular way.
  • Code 1 (Yellow): There are some issues, but they don’t affect the replica set’s efficiency (worth noticing, but they don’t require immediate intervention).
  • Code 2 (Orange): The replica set is in a degraded state.
  • Code 3 (Red): The replica set is disabled.

Potential issues

  • MISSING_MASTER — No master node in the replica set configuration.

    Critical level: Orange.

    Cluster condition: Service is degraded for data-change requests to the replica set.

    Solution: Set the master node for the replica set in the configuration using API.

  • UNREACHABLE_MASTER — No connection between the master and the replica.

    Critical level:

    • If idle value doesn’t exceed T1 threshold (1 s.) — Yellow,
    • If idle value doesn’t exceed T2 threshold (5 s.) — Orange,
    • If idle value exceeds T3 threshold (10 s.) — Red.

    Cluster condition: For read requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Reconnect to the master: fix the network issues, reset the current master, switch to another master.

  • LOW_REDUNDANCY — Master has access to a single replica only.

    Critical level: Yellow.

    Cluster condition: The data storage redundancy factor is equal to 2. It is lower than the minimal recommended value for production usage.

    Solution: Check cluster configuration:

    • If only one master and one replica are specified in the configuration, it is recommended to add at least one more replica to reach the redundancy factor of 3.
    • If three or more replicas are specified in the configuration, consider checking the replicas’ states and network connection among the replicas.
  • INVALID_REBALANCING — Rebalancing invariant was violated. During migration, a storage node can either send or receive buckets. So it shouldn’t be the case that a replica set sends buckets to one replica set and receives buckets from another replica set at the same time.

    Critical level: Yellow.

    Cluster condition: Rebalancing is on hold.

    Solution: There are two possible reasons for invariant violation:

    • The rebalancer has crashed.
    • Bucket states were changed manually.

    Either way, please contact Tarantool support.

  • HIGH_REPLICATION_LAG — Replica’s lag exceeds T1 threshold (1 sec.).

    Critical level:

    • If the lag doesn’t exceed the T2 threshold (5 sec.) — Yellow;
    • If the lag exceeds the T2 threshold (5 sec.) — Orange.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the troubleshooting guide.

  • OUT_OF_SYNC — The replica has fallen out of sync: the lag exceeds the T3 threshold (10 sec.).

    Critical level: Red.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the troubleshooting guide.

  • UNREACHABLE_REPLICA — One or multiple replicas are unreachable.

    Critical level: Yellow.

    Cluster condition: Data storage redundancy factor for the given replica set is less than the configured factor. If the replica is next in the queue for rebalancing (in accordance with the weight configuration), the requests are forwarded to the replica that is still next in the queue.

    Solution: Check the error message and find out which replica is unreachable. If a replica is disabled, enable it. If this doesn’t help, consider checking the network.

  • UNREACHABLE_REPLICASET — All replicas except for the current one are unreachable.

    Critical level: Red.

    Cluster condition: The replica stores obsolete data.

    Solution: Check if the other replicas are enabled. If all replicas are enabled, consider checking network issues on the master. If the replicas are disabled, check them first: the master might be working properly.

Monitoring routers

Use vshard.router.info() to obtain information on the router.

Output example

$ tarantool> vshard.router.info()
---
- replicasets:
    <replica set UUID>:
      master:
        status: <available / unreachable / missing>
        uri: # URI of master
        uuid: # UUID of instance
      replica:
        status: <available / unreachable / missing>
        uri: # URI of replica used for slave requests
        uuid: # UUID of instance
      uuid: # UUID of replica set
    <replica set UUID>: ...
    ...
  status: # status of router
  bucket:
    known: # number of buckets with the known destination
    unknown: # number of other buckets
  alerts: [<alert code>, <alert description>], ...

Status list

  • Code 0 (Green): The router works in a regular way.
  • Code 1 (Yellow): Some replicas are unreachable (this affects the speed of executing read requests).
  • Code 2 (Orange): Service is degraded for changing data.
  • Code 3 (Red): Service is degraded for reading data.

Potential issues

Note

Depending on the nature of the issue, use either the UUID of a replica, or the UUID of a replica set.

  • MISSING_MASTER — The master in one or multiple replica sets is not specified in the configuration.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Specify the master in the configuration.

  • UNREACHABLE_MASTER — The router lost connection with the master of one or multiple replica sets.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Restore connection with the master. First, check if the master is enabled. If it is, consider checking the network.

  • SUBOPTIMAL_REPLICA — There is a replica for read-only requests, but this replica is not optimal according to the configured weights. This means that the optimal replica is unreachable.

    Critical level: Yellow.

    Cluster condition: Read-only requests are forwarded to a backup replica.

    Solution: Check the status of the optimal replica and its network connection.

  • UNREACHABLE_REPLICASET — A replica set is unreachable for both read-only and data-change requests.

    Critical Level: Red.

    Cluster condition: Partial degrade for read-only and data-change requests.

    Solution: The replica set has an unreachable master and replica. Check the error message to detect this replica set. Then fix the issue in the same way as for UNREACHABLE_REPLICA.

Troubleshooting

Please see the Troubleshooting guide in the Tarantool manual.

Disaster recovery

Please see the section Disaster recovery in the Tarantool manual.

Backups

Please see the section Backups in the Tarantool manual.

Reporting issues

To report a bug or submit an update request for this document, please create an issue in our repository at GitLab.