Updated at 2026-05-09 03:30:10.914749

Overview

Tarantool combines an in-memory DBMS and a Lua server in a single platform providing ACID-compliant storage. It comes in two editions: Community and Enterprise. The use cases for Tarantool vary from ultra-fast cache to product data marts and smart queue services.

Here are some of Tarantool’s key characteristics:

Tarantool allows executing code alongside data, which helps increase the speed of operations. Developers can implement any business logic with Lua, and a single Tarantool instance can also receive SQL requests.

Tarantool has a variety of compatible modules (Lua rocks). You can pick the ones that you need and install them manually.

Tarantool runs on Linux (x86_64, aarch64), macOS (x86_64, aarch64), and FreeBSD (x86_64).

You can use Tarantool with a programming language you’re familiar with. For this purpose, a number of connectors are provided.

Tarantool comes in two editions: the open-source Community Edition (CE) and the commercial Enterprise Edition (EE).

Tarantool Community Edition lets you develop applications and speed up running systems. It features synchronous replication, scales easily, and includes tools for developing efficient applications. The Tarantool community helps with any practical questions regarding the Community Edition.

Tarantool Enterprise Edition provides advanced tools for administration, deployment, and security management, along with premium support services. This edition includes all the Community Edition features and is more predictable in terms of solution cost and maintenance. The Enterprise Edition is shipped as an SDK and includes a number of closed-source modules.

Note

In this documentation, topics related to Enterprise Edition features are marked with an Enterprise Edition admonition.

The Enterprise Edition offers additional capabilities for developing and operating cluster applications, for example:

The Enterprise Edition is distributed as an SDK that includes the following key components:

  • An extended Enterprise version of the tt utility.
  • Tarantool Cluster Manager: a visual web tool for managing Tarantool clusters.

  • Primary storage
    • No secondary storage required
  • Tolerance to high write loads
  • Support of relational approaches
  • Composite secondary indexes
    • Data access, data slices
  • Predictable request latency

  • Write-behind caching
  • Secondary index support
  • Complex invalidation algorithm support

  • Support of various identification techniques
  • Advanced task lifecycle management
    • Task scheduling
    • Archiving of completed tasks

  • Arbitrary data flows from many sources
  • Incoming data processing
  • Storage
  • Background cycle processing
    • Scheduling support

Getting started

This section will get you acquainted with Tarantool.

Installing Tarantool

This section explains how to download and set up Tarantool Enterprise Edition and run a sample application provided with it. To learn how to download and install Tarantool Community Edition, see the Download page.

Note

The tt utility provides the ability to install and work with multiple Tarantool versions.

The recommended system requirements for running Tarantool Enterprise are listed below.

To ensure full fault tolerance of a distributed data storage system, you need at least three physical computers or virtual servers.

For testing/development purposes, the system can be deployed using a smaller number of servers. However, it is not recommended to use such configurations for production.

  1. Tarantool Enterprise supports Red Hat Enterprise Linux and CentOS 7.5 and later.

    Note

    Tarantool Enterprise can run on other systemd-based Linux distributions, but it is not tested on them, so correct operation is not guaranteed.

  2. glibc 2.17-260.el7_6.6 or later is required. Check the current version and update it if necessary:

    $ rpm -q glibc
    glibc-2.17-196.el7_4.2
    $ yum update glibc
    

Here and below, storage servers or Tarantool servers are the computers used to store and process data, and the administration server is the computer the operator uses to install and configure the system.

A Tarantool cluster uses a full mesh topology, so all Tarantool servers must be able to send and receive data over TCP and UDP on all ports used by the cluster instances (see the advertise_uri: <host>:<port> and config: advertise_uri: '<host>:<port>' settings for each instance in /etc/tarantool/conf.d/*.yml). For example:

# /etc/tarantool/conf.d/*.yml

myapp.s2-replica:
  advertise_uri: localhost:3305 # this is a TCP/UDP port
  http_port: 8085

all:
  ...
  hosts:
    storage-1:
      config:
        advertise_uri: 'vm1:3301' # this is a TCP/UDP port
        http_port: 8081

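To collect the ports that must be reachable between servers, you can extract them from the instance files. Below is a minimal sketch, assuming bash, the two advertise_uri forms shown above, and firewalld as the host firewall (the default on RHEL and CentOS 7); `list_ports` and `open_ports` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# list_ports FILE...: print the unique ports used in advertise_uri settings,
# covering both the plain "host:port" and the quoted "'host:port'" forms.
list_ports() {
    grep -hoE "advertise_uri:[[:space:]]*'?[^:' ]+:[0-9]+" "$@" \
        | grep -oE '[0-9]+$' \
        | sort -un
}

# open_ports FILE...: open each port for both TCP and UDP with firewalld.
# Assumes firewalld is the active firewall and the caller is root.
open_ports() {
    local port
    for port in $(list_ports "$@"); do
        firewall-cmd --permanent --add-port="${port}/tcp"
        firewall-cmd --permanent --add-port="${port}/udp"
    done
    firewall-cmd --reload
}
```

Review the output of `list_ports /etc/tarantool/conf.d/*.yml` before running `open_ports /etc/tarantool/conf.d/*.yml` as root on each Tarantool server.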
To set up remote monitoring or connect via the administrative console, the administration server must have access to the following TCP ports on the Tarantool servers:

  • 22, to use the SSH protocol;
  • the ports specified in the instance configuration (such as http_port), to collect HTTP metrics.

In addition, we recommend applying the following sysctl settings on all Tarantool servers:

$ # TCP KeepAlive setting
$ sysctl -w net.ipv4.tcp_keepalive_time=60
$ sysctl -w net.ipv4.tcp_keepalive_intvl=5
$ sysctl -w net.ipv4.tcp_keepalive_probes=5

This optional Linux network stack setting helps resolve network connection issues faster when a server fails physically. For maximum performance, you may also need to tune other network stack parameters that are not related to the Tarantool DBMS. For more information, see the Network Performance Tuning Guide in the RHEL 7 user documentation.
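Settings applied with sysctl -w are lost after a reboot. The sketch below, assuming bash and a procps version that reads /etc/sysctl.d/*.conf, writes the same values to a file and estimates how long the kernel takes to drop an idle connection whose peer stopped answering: the first probe is sent after tcp_keepalive_time seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl seconds apart, that is, about 60 + 5 * 5 = 85 seconds. This applies only to sockets with TCP keepalive enabled. The helper names and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Recommended values from the sysctl commands above.
KEEPALIVE_TIME=60    # idle seconds before the first keepalive probe
KEEPALIVE_INTVL=5    # seconds between unanswered probes
KEEPALIVE_PROBES=5   # unanswered probes before the connection is dropped

# keepalive_detect_seconds: approximate time for the kernel to drop an idle
# connection to a peer that stopped answering (time + intvl * probes).
keepalive_detect_seconds() {
    echo $(( KEEPALIVE_TIME + KEEPALIVE_INTVL * KEEPALIVE_PROBES ))
}

# write_sysctl_conf FILE: write the settings in sysctl.conf format so they
# are loaded at boot (or immediately with `sysctl --system`).
write_sysctl_conf() {
    cat > "$1" <<EOF
net.ipv4.tcp_keepalive_time = ${KEEPALIVE_TIME}
net.ipv4.tcp_keepalive_intvl = ${KEEPALIVE_INTVL}
net.ipv4.tcp_keepalive_probes = ${KEEPALIVE_PROBES}
EOF
}
```

As root: `write_sysctl_conf /etc/sysctl.d/90-tarantool-keepalive.conf && sysctl --system`. The file name is only an example.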

The latest release packages of Tarantool Enterprise are available in the customer zone on the Tarantool website. Contact support@tarantool.io for access.

Each package is a tar + gzip archive that includes the following components and features:

Archive contents:

Upload the tar + gzip archive to the server and unpack it:

$ tar xvf tarantool-enterprise-sdk-<version>.tar.gz

No additional installation is required, since the unpacked binaries are almost ready to use. Go to the directory with the binaries (tarantool-enterprise) and add them to the executable search path by running the script included in the distribution:

$ source ./env.sh

Make sure you can run this script and that the script file is executable. Otherwise, set the permissions with the chmod and chown commands.
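The unpacking steps above can be combined into a small helper. This is a minimal sketch, assuming bash and that the archive unpacks into a tarantool-enterprise directory containing env.sh, as described above; `unpack_sdk` is a hypothetical name.

```shell
#!/usr/bin/env bash
# unpack_sdk ARCHIVE DEST: unpack the Tarantool Enterprise SDK archive into DEST
# and make sure env.sh is readable and executable for the current user.
unpack_sdk() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest" || return 1

    local env_script="$dest/tarantool-enterprise/env.sh"
    if [ ! -f "$env_script" ]; then
        echo "env.sh not found in $dest/tarantool-enterprise" >&2
        return 1
    fi
    chmod u+rx "$env_script"
}
```

For example: `unpack_sdk tarantool-enterprise-sdk-<version>.tar.gz "$HOME/sdk" && cd "$HOME/sdk/tarantool-enterprise" && source ./env.sh`. The destination directory is only an example.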

Creating your first Tarantool database

Example on GitHub: create_db

In this tutorial, you create a Tarantool database, write data to it, and select data from this database.

Before starting this tutorial:

The tt create command can be used to create an application from a predefined or custom template. In this tutorial, the application layout is prepared manually:

  1. Create a tt environment in the current directory using the tt init command.

  2. Inside the instances.enabled directory of the created tt environment, create the create_db directory.

  3. Inside instances.enabled/create_db, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment. In this example, there is one instance:

      instance001:
      
    • config.yaml contains basic instance configuration:

      groups:
        group001:
          replicasets:
            replicaset001:
              instances:
                instance001:
                  iproto:
                    listen:
                    - uri: '127.0.0.1:3301'
      

      The instance in the configuration accepts incoming requests on the 3301 port.
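Steps 2 and 3 can also be scripted. This is a minimal sketch, assuming bash and an environment already created with tt init; `make_create_db_layout` is a hypothetical helper that writes the same two files with the contents shown above.

```shell
#!/usr/bin/env bash
# make_create_db_layout ENV_DIR: write the create_db application files into an
# existing tt environment (created beforehand with `tt init`).
make_create_db_layout() {
    local app_dir="$1/instances.enabled/create_db"
    mkdir -p "$app_dir"

    # One instance to run in this environment.
    cat > "$app_dir/instances.yml" <<'EOF'
instance001:
EOF

    # Minimal configuration: instance001 listens for iproto requests on port 3301.
    cat > "$app_dir/config.yaml" <<'EOF'
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
EOF
}
```

For example, run `tt init` and then `make_create_db_layout .` in the same directory.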

  1. Start the Tarantool instance from the tt environment directory using tt start:

    $ tt start create_db
    
  2. To check the running instance, use the tt status command:

    $ tt status create_db
    INSTANCE               STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    create_db:instance001  RUNNING  8685  RW    ready   running  --
    
  3. Connect to the instance with tt connect:

    $ tt connect create_db:instance001
       • Connecting to the instance...
       • Connected to create_db:instance001
    
    create_db:instance001>
    

    This command opens an interactive Tarantool console with the create_db:instance001> prompt. Now you can enter requests in the command line.
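In scripts, you may want to wait until an instance is running instead of reading the tt status table by eye. This is a minimal sketch, assuming bash and the column layout of the tt status output shown above (INSTANCE first, STATUS second), which may differ between tt versions; `is_running` and `wait_running` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# is_running NAME: read `tt status` output from stdin and succeed if the
# STATUS column of instance NAME is exactly RUNNING (the header line is skipped).
is_running() {
    awk -v name="$1" '
        NR > 1 && $1 == name && $2 == "RUNNING" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# wait_running APP NAME [TRIES]: poll `tt status APP` once per second until
# NAME is running, giving up after TRIES attempts (default 10).
wait_running() {
    local app="$1" name="$2" tries="${3:-10}" i
    for (( i = 0; i < tries; i++ )); do
        tt status "$app" | is_running "$name" && return 0
        sleep 1
    done
    return 1
}
```

For example, `tt start create_db && wait_running create_db create_db:instance001` waits up to about ten seconds for the instance to start.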

  1. Create a space named bands:

    create_db:instance001> box.schema.space.create('bands')
    ---
    - engine: memtx
      before_replace: 'function: 0x010229d788'
      field_count: 0
      is_sync: false
      is_local: false
      on_replace: 'function: 0x010229d750'
      temporary: false
      index: []
      type: normal
      enabled: false
      name: bands
      id: 512
    - created
    ...
    
  2. Format the created space by specifying field names and types:

    create_db:instance001> box.space.bands:format({
                               { name = 'id', type = 'unsigned' },
                               { name = 'band_name', type = 'string' },
                               { name = 'year', type = 'unsigned' }
                           })
    ---
    ...
    

  1. Create the primary index based on the id field:

    create_db:instance001> box.space.bands:create_index('primary', { parts = { 'id' } })
    ---
    - unique: true
      parts:
      - fieldno: 1
        sort_order: asc
        type: unsigned
        exclude_null: false
        is_nullable: false
      hint: true
      id: 0
      type: TREE
      space_id: 512
      name: primary
    ...
    
  2. Create the secondary index based on the band_name field:

    create_db:instance001> box.space.bands:create_index('secondary', { parts = { 'band_name' } })
    ---
    - unique: true
      parts:
      - fieldno: 2
        sort_order: asc
        type: string
        exclude_null: false
        is_nullable: false
      hint: true
      id: 1
      type: TREE
      space_id: 512
      name: secondary
    ...
    

  1. Insert three tuples into the space:

    create_db:instance001> box.space.bands:insert { 1, 'Roxette', 1986 }
    ---
    - [1, 'Roxette', 1986]
    ...
    create_db:instance001> box.space.bands:insert { 2, 'Scorpions', 1965 }
    ---
    - [2, 'Scorpions', 1965]
    ...
    create_db:instance001> box.space.bands:insert { 3, 'Ace of Base', 1987 }
    ---
    - [3, 'Ace of Base', 1987]
    ...
    
  2. Select a tuple using the primary index:

    create_db:instance001> box.space.bands:select { 3 }
    ---
    - - [3, 'Ace of Base', 1987]
    ...
    
  3. Select tuples using the secondary index:

    create_db:instance001> box.space.bands.index.secondary:select{'Scorpions'}
    ---
    - - [2, 'Scorpions', 1965]
    ...
    

Creating a sharded cluster

Example on GitHub: sharded_cluster_crud

In this tutorial, you get a sharded cluster up and running on your local machine and learn how to manage the cluster using the tt utility. This cluster uses the following external modules:

The cluster created in this tutorial includes 5 instances: one router and 4 storages, which constitute two replica sets.

Cluster topology

Before starting this tutorial:

The tt create command can be used to create an application from a predefined or custom template. For example, the built-in vshard_cluster template enables you to create a ready-to-run sharded cluster application.

In this tutorial, the application layout is prepared manually:

  1. Create a tt environment in the current directory by executing the tt init command.

  2. Inside the empty instances.enabled directory of the created tt environment, create the sharded_cluster_crud directory.

  3. Inside instances.enabled/sharded_cluster_crud, create the following files:

    • instances.yml specifies instances to run in the current environment.
    • config.yaml specifies the cluster configuration.
    • storage.lua contains code specific for storages.
    • router.lua contains code specific for a router.
    • sharded_cluster_crud-scm-1.rockspec specifies external dependencies required by the application.

    The next Developing the application section shows how to configure the cluster and write code for routing read and write requests to different storages.

Open the instances.yml file and add the following content:

storage-a-001:
storage-a-002:
storage-b-001:
storage-b-002:
router-a-001:

This file specifies instances to run in the current environment.

This section describes how to configure the cluster in the config.yaml file.

Add the credentials configuration section:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
    storage:
      password: 'secret'
      roles: [ sharding ]

In this section, two users with the specified passwords are created:

  • The replicator user with the replication role.
  • The storage user with the sharding role.

These users are intended to maintain replication and sharding in the cluster.

Important

It is not recommended to store passwords as plain text in a YAML configuration. To learn how to load passwords from safe storage, such as external files or environment variables, see Loading secrets from safe storage.
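As a first step toward keeping passwords out of config.yaml, you can generate random passwords into files that only the current user can read. This is a minimal sketch, assuming bash and /dev/urandom; the file paths and the `make_secret` name are hypothetical, and how the configuration then references these files is described in Loading secrets from safe storage.

```shell
#!/usr/bin/env bash
# make_secret FILE: write a random 32-character alphanumeric password to FILE
# with permissions 600 (owner read/write only).
make_secret() {
    local file="$1"
    mkdir -p "$(dirname "$file")"
    (
        umask 077    # files created in this subshell get mode 600
        # Keep only letters and digits so the password is safe to embed in URIs.
        LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32 > "$file"
    )
    chmod 600 "$file"    # also tighten permissions if FILE already existed
}
```

For example: `make_secret secrets/replicator_password.txt` and `make_secret secrets/storage_password.txt`.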

Add the iproto.advertise section:

iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage

In this section, the following options are configured:

  • iproto.advertise.peer specifies how to advertise the current instance to other cluster members. In particular, this option informs other replica set members that the replicator user should be used to connect to the current instance.
  • iproto.advertise.sharding specifies how to advertise the current instance to a router and rebalancer.

The cluster topology defined in the following section also specifies the iproto.advertise.client option for each instance. This option accepts a URI used to advertise the instance to clients. For example, Tarantool Cluster Manager uses these URIs to connect to cluster instances.

Specify the total number of buckets in a sharded cluster using the sharding.bucket_count option:

sharding:
  bucket_count: 1000

Define the cluster topology inside the groups section. The cluster includes two groups:

  • storages includes two replica sets. Each replica set contains two instances.
  • routers includes one router instance.

Here is a schematic view of the cluster topology:

groups:
  storages:
    replicasets:
      storage-a:
        # ...
      storage-b:
        # ...
  routers:
    replicasets:
      router-a:
        # ...
  1. To configure storages, add the following code inside the groups section:

    storages:
      roles: [ roles.crud-storage ]
      app:
        module: storage
      sharding:
        roles: [ storage ]
      replication:
        failover: manual
      replicasets:
        storage-a:
          leader: storage-a-001
          instances:
            storage-a-001:
              iproto:
                listen:
                - uri: '127.0.0.1:3302'
                advertise:
                  client: '127.0.0.1:3302'
            storage-a-002:
              iproto:
                listen:
                - uri: '127.0.0.1:3303'
                advertise:
                  client: '127.0.0.1:3303'
        storage-b:
          leader: storage-b-001
          instances:
            storage-b-001:
              iproto:
                listen:
                - uri: '127.0.0.1:3304'
                advertise:
                  client: '127.0.0.1:3304'
            storage-b-002:
              iproto:
                listen:
                - uri: '127.0.0.1:3305'
                advertise:
                  client: '127.0.0.1:3305'
    

    The main group-level options here are:

    • roles: This option enables the roles.crud-storage role provided by the CRUD module for all storage instances.
    • app: The app.module option specifies that code specific to storages should be loaded from the storage module. This is explained below in the Adding storage code section.
    • sharding: The sharding.roles option specifies that all instances inside this group act as storages. A rebalancer is selected automatically from two master instances.
    • replication: The replication.failover option specifies that a leader in each replica set should be specified manually.
    • replicasets: This section configures two replica sets that constitute cluster storages.
  2. To configure a router, add the following code inside the groups section:

    routers:
      roles: [ roles.crud-router ]
      app:
        module: router
      sharding:
        roles: [ router ]
      replicasets:
        router-a:
          instances:
            router-a-001:
              iproto:
                listen:
                - uri: '127.0.0.1:3301'
                advertise:
                  client: '127.0.0.1:3301'
    

    The main group-level options here are:

    • roles: This option enables the roles.crud-router role provided by the CRUD module for a router instance.
    • app: The app.module option specifies that code specific to a router should be loaded from the router module. This is explained below in the Adding router code section.
    • sharding: The sharding.roles option specifies that an instance inside this group acts as a router.
    • replicasets: This section configures a replica set with one router instance.

The resulting config.yaml file should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
    storage:
      password: 'secret'
      roles: [ sharding ]

iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage

sharding:
  bucket_count: 1000

groups:
  storages:
    roles: [ roles.crud-storage ]
    app:
      module: storage
    sharding:
      roles: [ storage ]
    replication:
      failover: manual
    replicasets:
      storage-a:
        leader: storage-a-001
        instances:
          storage-a-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
              advertise:
                client: '127.0.0.1:3302'
          storage-a-002:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
              advertise:
                client: '127.0.0.1:3303'
      storage-b:
        leader: storage-b-001
        instances:
          storage-b-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3304'
              advertise:
                client: '127.0.0.1:3304'
          storage-b-002:
            iproto:
              listen:
              - uri: '127.0.0.1:3305'
              advertise:
                client: '127.0.0.1:3305'
  routers:
    roles: [ roles.crud-router ]
    app:
      module: router
    sharding:
      roles: [ router ]
    replicasets:
      router-a:
        instances:
          router-a-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
              advertise:
                client: '127.0.0.1:3301'
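Before starting the cluster, it can help to confirm that no two instances listen on the same port. This is a minimal sketch for a single-host setup like this tutorial's, where all instances share 127.0.0.1; it assumes bash and the `- uri: '<host>:<port>'` form used above, and `listen_ports` and `duplicate_listen_ports` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# listen_ports FILE: print the port of every `uri: '<host>:<port>'` line.
# advertise.client lines are not matched because they do not contain "uri:".
listen_ports() {
    sed -nE "s/.*uri:[[:space:]]*'[^']*:([0-9]+)'.*/\1/p" "$1"
}

# duplicate_listen_ports FILE: print ports that appear more than once;
# empty output means every listen URI uses its own port.
duplicate_listen_ports() {
    listen_ports "$1" | sort | uniq -d
}
```

For example, `duplicate_listen_ports instances.enabled/sharded_cluster_crud/config.yaml`, run from the tt environment directory, should print nothing for the configuration above.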

Open the storage.lua file and define a space and indexes inside box.watch() as follows:

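box.watch('box.status', function()
    -- Schema changes are allowed only on a writable instance (the replica set leader);
    -- read-only replicas receive the space and indexes through replication.
    if box.info.ro then
        return
    end

    -- The bucket_id field stores the vshard bucket that each tuple belongs to.
    box.schema.create_space('bands', {
        format = {
            { name = 'id', type = 'unsigned' },
            { name = 'bucket_id', type = 'unsigned' },
            { name = 'band_name', type = 'string' },
            { name = 'year', type = 'unsigned' }
        },
        if_not_exists = true
    })
    -- Primary index by id and a non-unique index by bucket_id
    -- (vshard finds a bucket's tuples through an index with this name).
    box.space.bands:create_index('id', { parts = { 'id' }, if_not_exists = true })
    box.space.bands:create_index('bucket_id', { parts = { 'bucket_id' }, unique = false, if_not_exists = true })
end)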
  • The box.schema.create_space() function creates a space. Note that the created bands space includes the bucket_id field. This field represents a sharding key used to partition a dataset across different storage instances.
  • space_object:create_index() creates two indexes based on the id and bucket_id fields.

Note

In a sharded space, uniqueness by secondary index is only guaranteed within a single shard, not across the whole cluster.

Open the router.lua file and load the vshard module as follows:

local vshard = require('vshard')

Open the sharded_cluster_crud-scm-1.rockspec file and add the following content:

package = 'sharded_cluster_crud'
version = 'scm-1'
source  = {
    url = '/dev/null',
}

dependencies = {
    'vshard == 0.1.27',
    'crud == 1.5.2'
}
build = {
    type = 'none';
}

The dependencies section includes the specified versions of the vshard and crud modules. To install dependencies, you need to build the application.

In the terminal, open the tt environment directory. Then, execute the tt build command:

$ tt build sharded_cluster_crud
   • Running rocks make
No existing manifest. Attempting to rebuild...
   • Application was successfully built

This installs the vshard and crud modules defined in the *.rockspec file to the .rocks directory.

To start all instances in the cluster, execute the tt start command:

$ tt start sharded_cluster_crud
   • Starting an instance [sharded_cluster_crud:storage-a-001]...
   • Starting an instance [sharded_cluster_crud:storage-a-002]...
   • Starting an instance [sharded_cluster_crud:storage-b-001]...
   • Starting an instance [sharded_cluster_crud:storage-b-002]...
   • Starting an instance [sharded_cluster_crud:router-a-001]...

After starting instances, you need to bootstrap the cluster as follows:

  1. Connect to the router instance using tt connect:

    $ tt connect sharded_cluster_crud:router-a-001
       • Connecting to the instance...
       • Connected to sharded_cluster_crud:router-a-001
    
  2. Call vshard.router.bootstrap() to perform the initial cluster bootstrap and distribute all buckets across the replica sets:

    sharded_cluster_crud:router-a-001> vshard.router.bootstrap()
    ---
    - true
    ...
    

To check the cluster status, execute vshard.router.info() on the router:

sharded_cluster_crud:router-a-001> vshard.router.info()
---
- replicasets:
    storage-b:
      replica:
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3305
        name: storage-b-002
      bucket:
        available_rw: 500
      master:
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3304
        name: storage-b-001
      name: storage-b
    storage-a:
      replica:
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3303
        name: storage-a-002
      bucket:
        available_rw: 500
      master:
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3302
        name: storage-a-001
      name: storage-a
  bucket:
    unreachable: 0
    available_ro: 0
    unknown: 0
    available_rw: 1000
  status: 0
  alerts: []
...

The output includes the following sections:

  • replicasets: contains information about storages and their availability.
  • bucket: displays the total number of read-write and read-only buckets that are currently available for this router.
  • status: a number from 0 to 3 that indicates whether there are any issues with the cluster. 0 means that there are no issues.
  • alerts: might describe the exact issues related to bootstrapping a cluster, for example, connection issues, failover events, or unidentified buckets.

  1. To insert sample data, call crud.insert_many() on the router:

    crud.insert_many('bands', {
        { 1, box.NULL, 'Roxette', 1986 },
        { 2, box.NULL, 'Scorpions', 1965 },
        { 3, box.NULL, 'Ace of Base', 1987 },
        { 4, box.NULL, 'The Beatles', 1960 },
        { 5, box.NULL, 'Pink Floyd', 1965 },
        { 6, box.NULL, 'The Rolling Stones', 1962 },
        { 7, box.NULL, 'The Doors', 1965 },
        { 8, box.NULL, 'Nirvana', 1987 },
        { 9, box.NULL, 'Led Zeppelin', 1968 },
        { 10, box.NULL, 'Queen', 1970 }
    })
    

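    Because bucket_id is passed as box.NULL, crud computes it from the primary key, and the router sends each tuple to the replica set that stores the corresponding bucket. As a result, the data is spread across both replica sets.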

  2. To get a tuple by the specified ID, call the crud.get() function:

    sharded_cluster_crud:router-a-001> crud.get('bands', 4)
    ---
    - rows:
      - [4, 161, 'The Beatles', 1960]
      metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
        {'name': 'band_name', 'type': 'string'}, {'name': 'year', 'type': 'unsigned'}]
    - null
    ...
    
  3. To insert a new tuple, call crud.insert():

    sharded_cluster_crud:router-a-001> crud.insert('bands', {11, box.NULL, 'The Who', 1962})
    ---
    - rows:
      - [11, 652, 'The Who', 1962]
      metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
        {'name': 'band_name', 'type': 'string'}, {'name': 'year', 'type': 'unsigned'}]
    - null
    ...
    

To check how data is distributed across the replica sets, follow the steps below:

  1. Connect to any storage in the storage-a replica set:

    $ tt connect sharded_cluster_crud:storage-a-001
       • Connecting to the instance...
       • Connected to sharded_cluster_crud:storage-a-001
    

    Then, select all tuples in the bands space:

    sharded_cluster_crud:storage-a-001> box.space.bands:select()
    ---
    - - [1, 477, 'Roxette', 1986]
      - [2, 401, 'Scorpions', 1965]
      - [4, 161, 'The Beatles', 1960]
      - [5, 172, 'Pink Floyd', 1965]
      - [6, 64, 'The Rolling Stones', 1962]
      - [8, 185, 'Nirvana', 1987]
    ...
    
  2. Connect to any storage in the storage-b replica set:

    $ tt connect sharded_cluster_crud:storage-b-001
       • Connecting to the instance...
       • Connected to sharded_cluster_crud:storage-b-001
    

    Select all tuples in the bands space to make sure it contains another subset of data:

    sharded_cluster_crud:storage-b-001> box.space.bands:select()
    ---
    - - [3, 804, 'Ace of Base', 1987]
      - [7, 693, 'The Doors', 1965]
      - [9, 644, 'Led Zeppelin', 1968]
      - [10, 569, 'Queen', 1970]
      - [11, 652, 'The Who', 1962]
    ...
    

Getting started with Tarantool Cluster Manager

Enterprise Edition

This tutorial uses Tarantool Enterprise Edition.

Example on GitHub: tcm_get_started

In this tutorial, you get Tarantool Cluster Manager up and running on your local system, deploy a local Tarantool EE cluster, and learn to manage the cluster from the TCM web UI.

To complete this tutorial, you need:

For more detailed information about using TCM, refer to Tarantool Cluster Manager.

  1. Extract the Tarantool EE SDK archive:

    $ tar -xvzf tarantool-enterprise-sdk-gc64-<VERSION>-<HASH>-r<REVISION>.linux.x86_64.tar.gz
    

    This creates the tarantool-enterprise directory beside the archive. The directory contains three executables for key Tarantool EE components:

  2. Add the Tarantool EE components to the executable path by executing the env.sh script included in the distribution:

    $ source tarantool-enterprise/env.sh
    
  3. To check that the Tarantool EE executables tarantool, tt, and tcm are available in the system, print their versions:

    $ tarantool --version
    Tarantool Enterprise 3.0.0-0-gf58f7d82a-r23-gc64
    Target: Linux-x86_64-RelWithDebInfo
    Build options: cmake . -DCMAKE_INSTALL_PREFIX=/home/centos/release/sdk/tarantool/static-build/tarantool-prefix -DENABLE_BACKTRACE=TRUE
    Compiler: GNU-9.3.1
    C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -static-libstdc++ -fno-common -msse2  -fmacro-prefix-map=/home/centos/release/sdk/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -O2 -g -DNDEBUG -ggdb -O2
    CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -static-libstdc++ -fno-common -msse2  -fmacro-prefix-map=/home/centos/release/sdk/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -O2 -g -DNDEBUG -ggdb -O2
    $ tt version
    Tarantool CLI EE 2.1.0, linux/amd64. commit: d80c2e3
    $ tcm version
    1.0.0-0-gd38b12c2
    
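To fail early in setup scripts when env.sh has not been sourced, you can check that the three executables are on PATH. This is a minimal sketch, assuming bash; `require_commands` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# require_commands CMD...: succeed only if every CMD is found on PATH;
# print each missing command to stderr.
require_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing: $cmd (did you source tarantool-enterprise/env.sh?)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `require_commands tarantool tt tcm || exit 1`.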

Tarantool Cluster Manager is ready to run out of the box. To start TCM, run the following command:

$ tcm --storage.etcd.embed.enabled

Important

The TCM bootstrap log in the terminal includes a message with the credentials to use for the first login. Make sure to save them somewhere.

Jan 24 05:51:28.443 WRN Generated super admin credentials login=admin password=qF3A5rjGurjAwmlYccJ7JrL5XqjbIHY6

The --storage.etcd.embed.enabled option makes TCM start its own etcd instance on bootstrap. This etcd instance is used to store the TCM configuration.
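Because the generated password appears only in the bootstrap log, you may prefer to keep a copy of the log and extract the credentials from it. This is a minimal sketch, assuming bash and that the log line keeps the format shown above when output is redirected; `tcm.log` and `tcm_credentials` are hypothetical names.

```shell
#!/usr/bin/env bash
# tcm_credentials: read TCM log output from stdin and print the
# "login=... password=..." part of the generated super admin credentials line.
tcm_credentials() {
    grep 'Generated super admin credentials' | grep -oE 'login=[^ ]+ password=[^ ]+'
}
```

For example, start TCM with `tcm --storage.etcd.embed.enabled 2>&1 | tee tcm.log`, then run `tcm_credentials < tcm.log` in another terminal. The log file then contains the password, so restrict access to it.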

Note

During development, it is also convenient to use the TCM-embedded etcd as a configuration storage for Tarantool EE clusters connected to TCM. Learn more in Centralized configuration storages.

  1. Open a web browser and go to http://127.0.0.1:8080/.
  2. Enter the username and the password you got from the TCM bootstrap log in the previous step.
  3. Click Log in.

After a successful login, you see the TCM web UI:

TCM stateboard with empty cluster

To prepare a Tarantool EE cluster, complete the following steps:

  1. Define the cluster connection settings in TCM.
  2. Configure the cluster in TCM.
  3. Start the cluster instances locally using the tt utility.

A freshly installed TCM has a predefined cluster named Default cluster. It doesn’t have any configuration or topology out of the box. Its initial properties include the etcd and Tarantool connection parameters. Check these properties to find out where TCM sends the cluster configuration that you write.

To view the Default cluster’s properties:

  1. Go to Clusters and click Edit in the Actions menu opposite the cluster name.

    TCM edit cluster
  2. Click Next on the General tab.

    General cluster settings
  3. Find the connection properties of the configuration storage that the cluster uses. By default, it’s an etcd running on port 2379 (default etcd port) on the same host. The key prefix used for the cluster configuration is /default. Click Next.

    Cluster configuration storage settings
  4. Check the Tarantool user that TCM uses to connect to the cluster instances. It’s guest by default.

    Cluster Tarantool connection settings

TCM provides a web-based editor for writing cluster configurations. It is connected to the configuration storage (etcd in this case): all changes you make in the browser are sent to etcd in one click.

To write the cluster configuration and upload it to the etcd storage:

  1. Go to Configuration.

  2. Click + and provide an arbitrary name for the configuration file, for example, all.

  3. Paste the following YAML configuration into the editor:

    credentials:
      users:
        guest:
          roles: [super]
    groups:
      group-001:
        replicasets:
          replicaset-001:
            replication:
              failover: manual
            leader: instance-001
            instances:
              instance-001:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3301'
                  advertise:
                    client: '127.0.0.1:3301'
              instance-002:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3302'
                  advertise:
                    client: '127.0.0.1:3302'
              instance-003:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3303'
                  advertise:
                    client: '127.0.0.1:3303'
    

    This configuration sets up a cluster of three nodes in one replica set: one leader and two followers.

  4. Click Apply to send the configuration to etcd.

    Cluster configuration in TCM

When the cluster configuration is saved, you can see the cluster topology on the Stateboard page:

Offline cluster stateboard

However, the cluster instances are offline because they aren’t deployed yet.

To deploy a local cluster based on the configuration from etcd:

  1. Go to the system terminal you used when setting up Tarantool.

  2. Create a new tt environment in a directory of your choice:

    $ mkdir cluster-env
    $ cd cluster-env/
    $ tt init
    
  3. Inside the instances.enabled directory of the created tt environment, create the cluster directory.

    $ mkdir instances.enabled/cluster
    $ cd instances.enabled/cluster/
    
  4. Inside instances.enabled/cluster, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment. In this example, there are three instances:

      instance-001:
      instance-002:
      instance-003:
      
    • config.yaml instructs tt to load the cluster configuration from etcd. The specified etcd location matches the configuration storage of the Default cluster in TCM:

      config:
        etcd:
          endpoints:
          - http://localhost:2379
          prefix: /default
      
  5. Start the cluster from the tt environment root (the cluster-env directory):

    $ tt start cluster
    

    To check how the cluster started, run tt status. The output should look like this:

    $ tt status cluster
    INSTANCE              STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    cluster:instance-001  RUNNING  8747  RW    ready   running  --
    cluster:instance-002  RUNNING  8748  RO    ready   running  --
    cluster:instance-003  RUNNING  8749  RO    ready   running  --
    

To learn to interact with a cluster in TCM, complete typical database tasks such as:

To check the cluster state in TCM, go to Stateboard. Here you can see an overview of the cluster topology, health, memory consumption, and other information.

Online cluster stateboard

To view detailed information about an instance, click its name in the instances list on the Stateboard page.

Instance details in TCM

To connect to the instance interactively and execute code on it, go to the Terminal tab.

Instance terminal in TCM

Go to the terminal of instance-001 (the leader instance) and run the following code to create a formatted space with a primary index in the cluster:

-- Create the space and describe the format of its tuples
box.schema.space.create('bands')
box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})
-- Create the primary TREE index on the id field
box.space.bands:create_index('primary', { type = "tree", parts = { 'id' } })

Since instance-001 is a read-write instance (its box.info.ro is false), the write requests must be executed on it. Run the following code in the instance-001 terminal to write tuples in the space:

box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
box.space.bands:insert { 3, 'Ace of Base', 1987 }

Check the space’s tuples by running a read request on instance-001:

box.space.bands:select { 3 }

This is how it looks in TCM:

Writing data through TCM

To check that the data is replicated across instances, run the read request on any other instance – instance-002 or instance-003. The result is the same as on instance-001.

Reading data through TCM

Note

If you try to execute a write request on any instance but instance-001, you get an error because these instances are configured to be read-only.

The TCM web UI includes a tool for viewing data stored in the cluster. To view the space tuples in TCM:

  1. Click an instance name on the Stateboard page.

  2. Open the Actions menu in the top-right corner and click Explorer.

    Opening Explorer in TCM

    This opens the page that lists user-created spaces on the instance.

    TCM Explorer: spaces
  3. Click View in the Actions menu of the space you want to see. The page shows all the tuples added previously.

    TCM Explorer: space tuples

Platform

This section contains documentation for the Tarantool platform, which consists of a database and an application server.

Concepts

A storage engine is a set of low-level routines that store and retrieve values. Tarantool offers a choice of two storage engines:

For details, check the Storage engines section.

Tarantool is a NoSQL database. It stores data in spaces, which can be thought of as tables in a relational database, and tuples, which are analogous to rows. There are six basic data operations in Tarantool.

The platform allows describing the data schema but does not require it.

Tarantool supports highly customizable indexes of various types.

To ensure data persistence and recover quickly in case of failure, Tarantool uses mechanisms like the write-ahead log (WAL) and snapshots.

For details, check the Data model page.

Tarantool executes code in fibers that are managed via cooperative multitasking. Learn more about Tarantool’s thread model.

For details, check the page Fibers, yields, and cooperative multitasking.

Tarantool’s ACID-compliant transaction model lets the user choose between two modes of transactions.

The default mode allows for fast monopolistic atomic transactions. It doesn’t support interactive transactions, and in case of an error, all transaction changes are rolled back.

The MVCC mode relies on a multi-version concurrency control engine that allows yielding within a longer transaction. This mode only works with the default in-memory memtx storage engine.

For details, check the Transactions page.

Replication allows keeping the data in copies of the same database for better reliability.

Several Tarantool instances can be organized in a replica set. They communicate and transfer data via the iproto binary protocol. Learn more about Tarantool’s replication architecture.

By default, replication in Tarantool is asynchronous. A transaction committed locally on the master node may not get replicated onto other instances before the client receives a success response. Thus, if the master reports success and then dies, the client might not see the result of the transaction.

With synchronous replication, transactions on the master node are not considered committed or successful until they are replicated onto a specified number of instances (a quorum). This is slower, but more reliable. Synchronous replication in Tarantool is based on an implementation of the Raft algorithm.
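To make the difference concrete, here is a toy model in plain Lua, not Tarantool code: a transaction counts as committed only when at least `quorum` instances, the master included, have it. The function `sync_commit` and the list of boolean acknowledgements are hypothetical; in a real cluster the quorum is configured, for example, with the replication_synchro_quorum option of box.cfg.

```lua
-- Toy model (not Tarantool code): a transaction counts as committed only when
-- at least `quorum` instances, the master included, have acknowledged it.
-- `sync_commit` and the list of booleans are hypothetical.
local function sync_commit(acks, quorum)
  local count = 0
  for _, acked in ipairs(acks) do
    if acked then
      count = count + 1
    end
  end
  return count >= quorum
end

-- Master plus two replicas, quorum of 2.
print(sync_commit({true, true, false}, 2))   -- master + one replica: committed
print(sync_commit({true, false, false}, 2))  -- master only: the client keeps waiting
```

In asynchronous mode, the master would answer the client right after its own write, which corresponds to a quorum of 1.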

For details, check the Replication section.

Tarantool implements database sharding via the vshard module. For details, go to the Sharding page.

Tarantool allows specifying callback functions that run upon certain database events. They can be useful for resolving replication conflicts. For details, go to the Triggers page.

Using Tarantool as an application server, you can write applications in Lua, C, or C++. You can also create reusable modules.

To increase the speed of code execution, Tarantool has a Lua Just-In-Time compiler (LuaJIT) on board. LuaJIT compiles hot paths in the code – paths that are used many times – thus making the application work faster. To enable developers to work with LuaJIT, Tarantool provides tools like the memory profiler and the getmetrics module.

To learn how to use Tarantool as an application server, refer to the guides in the How-to section.

Storage engines

A storage engine is a set of low-level routines that actually store and retrieve tuple values. Tarantool offers a choice of two storage engines:

For details on how the engines work, see the following sections:

Storing data with memtx

The memtx storage engine is used in Tarantool by default. The engine keeps all data in random-access memory (RAM), and therefore has a low read latency.

Tarantool prevents data loss in an emergency, such as a power outage or a Tarantool instance failure, in the following ways:

This section briefly discusses the following topics, with references to other sections that explain them in detail.

There is a fixed number of independent execution threads. The threads don’t share state. Instead they exchange data using low-overhead message queues. While this approach limits the number of cores that the instance uses, it removes competition for the memory bus and ensures peak scalability of memory access and network throughput.

Only one thread, the transaction processor thread (hereafter, the TX thread), can access the database, and there is only one TX thread per Tarantool instance. In this thread, transactions are executed in strictly sequential order. Multi-statement transactions exist to provide isolation: each transaction sees a consistent database state and commits all its changes atomically. At commit time, a yield happens and all transaction changes are written to WAL in a single batch. If an error occurs during transaction execution, the transaction is rolled back completely. Read more in the following sections: Transaction model, Transaction mode: MVCC.

Inside the TX thread, there is a memory area where Tarantool stores data. This area is called Arena.

Data is stored in spaces. Spaces contain database records – tuples. To access and manipulate the data stored in spaces and tuples, Tarantool builds indexes.

Memory allocation for spaces, tuples, and indexes inside the Arena is managed by special allocators. Tuples are stored mainly using the slab allocator. Tarantool has a built-in module called box.slab that provides slab allocation statistics. You can use these statistics to monitor total memory usage and memory fragmentation. For details, see the box.slab module reference.

Also inside the TX thread, there is an event loop. Within the event loop, there are a number of fibers. Fibers are cooperative primitives that allow interaction with spaces, that is, reading and writing the data. Fibers can interact with the event loop and with each other directly or by using special primitives called channels. Due to the usage of fibers and cooperative multitasking, the memtx engine is lock-free in typical situations.
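Tarantool fibers are not plain Lua coroutines, but standard coroutines are enough to illustrate cooperative multitasking. In this sketch, each task runs until it calls `coroutine.yield()`, a round-robin loop stands in for the event loop, and a plain Lua table stands in for a channel; none of this is the Tarantool `fiber` API.

```lua
-- Plain Lua coroutines as an analogy for fibers (not the Tarantool fiber API).
local channel = {}   -- hypothetical stand-in for a channel: a simple FIFO
local log = {}       -- records the order in which steps run

local producer = coroutine.create(function()
  for i = 1, 3 do
    table.insert(channel, i)
    table.insert(log, "put " .. i)
    coroutine.yield()            -- give control back voluntarily
  end
end)

local consumer = coroutine.create(function()
  local received = 0
  while received < 3 do
    if #channel > 0 then
      local v = table.remove(channel, 1)
      table.insert(log, "get " .. v)
      received = received + 1
    end
    coroutine.yield()
  end
end)

-- A tiny round-robin scheduler standing in for the event loop.
local tasks = {producer, consumer}
local alive = true
while alive do
  alive = false
  for _, co in ipairs(tasks) do
    if coroutine.status(co) ~= "dead" then
      assert(coroutine.resume(co))
      alive = true
    end
  end
end

print(table.concat(log, ", "))
```

Because a task is only ever interrupted at an explicit yield, the shared table is never accessed concurrently and needs no lock.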


To interact with external users, there is a separate network thread, also called the iproto thread. The iproto thread receives a request from the network, parses and checks the statement, and transforms it into a special structure: a message containing an executable statement and its options. Then the iproto thread ships this message to the TX thread, where the user's request runs in a separate fiber.

Tarantool ensures data persistence as follows:

Thus, when Tarantool restarts, the data can be fully recovered even after an emergency, such as a power outage or a Tarantool instance crash, in which the database kept in RAM is lost.

What happens on restart:

  1. Tarantool finds and reads the latest snapshot file.
  2. Tarantool also finds and reads all WAL files created after that snapshot.
  3. Once the snapshot and WAL files have been read, the in-memory dataset is fully restored. It matches the state of the Tarantool instance at the moment it stopped working.
  4. While reading the snapshot and WAL files, Tarantool builds primary indexes.
  5. When all the data is back in memory, Tarantool builds secondary indexes.
  6. Tarantool runs the application.
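The replay logic of steps 1 to 3 can be sketched in plain Lua. The snapshot and WAL record layouts below are invented for illustration, and index building (steps 4 and 5) is not modeled; the point is that WAL records already covered by the snapshot, identified by their LSN, are skipped.

```lua
-- Hypothetical model of recovery: a snapshot is a full copy of the data as of
-- some LSN, and WAL records hold later changes. Record formats are made up.
local function recover(snapshot, wal)
  local data = {}
  for k, v in pairs(snapshot.data) do
    data[k] = v                        -- step 1: load the latest snapshot
  end
  for _, rec in ipairs(wal) do         -- step 2: replay WAL records in LSN order
    if rec.lsn > snapshot.lsn then     -- skip changes the snapshot already has
      if rec.op == "REPLACE" then
        data[rec.key] = rec.value
      elseif rec.op == "DELETE" then
        data[rec.key] = nil
      end
    end
  end
  return data                          -- step 3: the dataset is restored
end

local snapshot = { lsn = 2, data = { a = 1, b = 2 } }
local wal = {
  { lsn = 2, op = "REPLACE", key = "b", value = 2 },  -- already in the snapshot
  { lsn = 3, op = "REPLACE", key = "c", value = 3 },
  { lsn = 4, op = "DELETE",  key = "a" },
}
local data = recover(snapshot, wal)
print(data.a, data.b, data.c)          -- a is gone; b and c are present
```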

To access and manipulate the data stored in RAM, Tarantool builds indexes, which are stored inside the Arena memory area.

Tarantool supports several index types: TREE, HASH, BITSET, and RTREE. Each of them is designed for different use cases.

You can run SELECT queries on both primary and secondary index keys. Keys can be composite.

For detailed information about indexes, refer to the Indexes page.

Although this topic is not directly related to the memtx engine, it completes the overall picture of how Tarantool works when an application is distributed.

Replication allows multiple Tarantool instances to work with copies of the same database. These copies are kept in sync because each instance can notify the others about the changes it has made. This is done with WAL-based replication.

To send data to a replica, Tarantool runs another thread called relay. This thread reads WAL files and sends them to the replicas. Each replica runs a fiber called applier. It receives changes from the remote node and applies them to the replica's Arena. All changes are written to WAL files through the replica's WAL thread, just as if they had been made locally.
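A simplified applier might look like the sketch below. It tracks a single applied LSN for brevity, whereas Tarantool tracks replication progress with a vclock (one LSN per instance); the table layout and the `apply` function are hypothetical. Records the replica already has are skipped, so applying the same batch twice does no harm.

```lua
-- Hypothetical applier sketch. A single applied LSN is tracked for brevity;
-- Tarantool itself tracks progress with a vclock (one LSN per instance).
local function make_replica()
  return { data = {}, wal = {}, applied_lsn = 0 }
end

-- Apply records relayed from the master, skipping ones already applied.
local function apply(replica, records)
  for _, rec in ipairs(records) do
    if rec.lsn > replica.applied_lsn then
      replica.data[rec.key] = rec.value   -- change the replica's data
      table.insert(replica.wal, rec)      -- append to the replica's own (modeled) WAL
      replica.applied_lsn = rec.lsn
    end
  end
end

local master_wal = {
  { lsn = 1, key = 1, value = "Roxette" },
  { lsn = 2, key = 2, value = "Scorpions" },
}
local replica = make_replica()
apply(replica, master_wal)
apply(replica, master_wal)  -- applying the same batch again changes nothing
print(replica.applied_lsn, #replica.wal)
```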

In Tarantool, replication is asynchronous by default: a transaction committed locally on the master node is not necessarily sent to any other replicas.

Synchronous replication solves this problem. Each synchronous transaction is committed only after it has been replicated onto a certain number of instances, and only then does the client receive a response that the transaction is complete.

For more details, see the Replication chapter.

Here are the main principles of how the engine works:

Storing data with vinyl

Tarantool is a transactional, persistent DBMS that keeps 100% of its data in RAM. The main advantages of in-memory storage are speed and ease of use: there is no need for optimization, yet performance remains consistently high.

A few years ago, we decided to extend the product by implementing a classical storage technology, as used in conventional DBMSs: only a data cache is kept in RAM, while the bulk of the data resides on disk. We decided that the storage engine could be chosen independently for each table, as in MySQL, but with transaction support implemented from the very start.

The first question to answer was whether to create our own engine or use an existing library. The open-source community offers ready-made libraries to choose from. RocksDB was the most actively developed library and has since become one of the most popular. There are also several less well-known libraries: WiredTiger, ForestDB, NestDB, and LMDB.

Nevertheless, after studying the source code of existing libraries and weighing the pros and cons, we decided to write our own engine. One of the reasons is that all existing third-party libraries assume that data requests can come from many operating system threads, so they contain complex synchronization primitives to manage concurrent data access. If we had embedded one of them into Tarantool, users would have had to bear the overhead of multithreaded applications without getting anything in return. The reason is that Tarantool is built on an actor-based architecture. Processing transactions in a dedicated thread makes it possible to avoid unnecessary locks, interprocess communication, and other resource costs that take up to 80% of CPU time in multithreaded DBMSs.

A Tarantool process consists of a fixed number of threads

If an engine is designed with cooperative multitasking in mind from the start, you can not only speed it up significantly but also implement optimizations that are too complex for multithreaded engines. In short, using a third-party solution would not have led to a better result.

Having rejected the idea of embedding an existing library, we needed to choose an architecture to build on. There are two alternative approaches to storing data on disk: the older model, which uses B-trees and their variants, and the newer one, based on log-structured merge trees, or LSM trees. MySQL, PostgreSQL, and Oracle use B-trees, while Cassandra, MongoDB, and CockroachDB use LSM trees.

B-trees are considered more efficient for reads, and LSM trees for writes. However, with the spread of SSDs, whose read throughput is several times higher than their write throughput, the advantages of LSM trees have become clear in most scenarios.

Before looking at LSM trees in Tarantool, let's see how they work. To do this, we will examine the structure of an ordinary B-tree and the problems associated with it. The "B" in B-tree stands for "Block": it is a balanced tree made of blocks that contain sorted lists of key-value pairs. Filling the tree, balancing it, and splitting and merging blocks are beyond the scope of this article; you can read about them on Wikipedia. In the end, we get a container sorted by key in ascending order, where the smallest element is stored in the leftmost node and the largest in the rightmost node. Let's see how search and insertion work in a B-tree.

Classical B-tree

To find an element or check whether it exists, the search starts, as usual, at the root. If the key is found in the root block, the search ends; otherwise, it moves to the block under the greatest key that is smaller than the target, that is, the rightmost block that still contains elements smaller than the one being searched for (elements on every level are in ascending order). If the element is not found there either, the search goes one level down again. Eventually it reaches one of the leaves and possibly finds the element. The tree blocks are stored on disk and read into RAM one at a time, so a single search reads $\log_B(N)$ blocks, where N is the number of elements in the B-tree. In the simplest case, a write works the same way: the algorithm finds the block that contains the element and updates (inserts) its value.
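The lookup described above can be sketched in plain Lua. This is a minimal in-memory model, not Tarantool code: the node layout (a sorted `keys` array plus an optional `children` array) and the function name `btree_search` are assumptions made for illustration, and each visited node counts as one block read.

```lua
-- Minimal B-tree lookup sketch. children[i] covers keys below keys[i];
-- children[#keys + 1] covers keys above the last key.
-- Returns whether the key was found and how many blocks were read.
local function btree_search(node, key)
  local reads = 0
  while node do
    reads = reads + 1                      -- one block read per level
    local i = 1
    while i <= #node.keys and node.keys[i] < key do
      i = i + 1                            -- skip keys smaller than the target
    end
    if i <= #node.keys and node.keys[i] == key then
      return true, reads
    end
    if not node.children then
      return false, reads                  -- reached a leaf: not found
    end
    node = node.children[i]                -- descend under the greatest smaller key
  end
  return false, reads
end

-- A two-level tree: root {20, 40} with three leaves.
local root = {
  keys = {20, 40},
  children = {
    { keys = {5, 10, 15} },
    { keys = {25, 30, 35} },
    { keys = {45, 50} },
  },
}
print(btree_search(root, 30))  -- found in a leaf after two block reads
print(btree_search(root, 33))  -- not found; still two block reads
```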

To get a feel for this data structure, take a B-tree of 100,000,000 nodes and assume that the block size is 4096 bytes and the element size is 100 bytes. Then each block can hold up to 40 elements, accounting for overhead, and the B-tree will have about 2,570,000 blocks and five levels, the first four of which take about 256 MB in total, while the last one takes up to 10 GB. Obviously, on any modern computer all levels except the last fit into the file system cache, so virtually any read operation requires at most one I/O operation.
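As a sanity check of these figures, the sketch below recomputes the tree shape under the stated assumptions (40 elements per block and a fan-out of 40 at every level). The helper `btree_shape` is hypothetical. It yields five levels and 2,564,104 blocks; the four upper levels take 64,104 blocks, about 262 MB (250 MiB), and the leaves take 10.24 GB, which agrees with the rounded numbers in the text.

```lua
-- Rough B-tree geometry, assuming full blocks of `per_block` elements
-- and the same fan-out on every internal level.
local function btree_shape(n_elements, per_block, block_size)
  local levels = {}
  local blocks = math.ceil(n_elements / per_block)  -- leaf level
  table.insert(levels, 1, blocks)
  while blocks > 1 do
    blocks = math.ceil(blocks / per_block)          -- parents of the level below
    table.insert(levels, 1, blocks)                 -- prepend: the root ends up first
  end
  local total, upper = 0, 0
  for i, b in ipairs(levels) do
    total = total + b
    if i < #levels then
      upper = upper + b                             -- every level except the leaves
    end
  end
  return levels, total, upper * block_size, levels[#levels] * block_size
end

local levels, total, upper_bytes, leaf_bytes = btree_shape(100000000, 40, 4096)
print(#levels, total)                       -- number of levels, total blocks
print(upper_bytes / 2^20, leaf_bytes / 1e9) -- MiB in upper levels, GB in leaves
```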

The picture looks much less rosy from a different point of view. Suppose we need to update one element of the tree. Since B-tree operations read and write whole blocks, we have to read 1 block into memory, change 100 bytes out of 4096, and then write the updated block back to disk. So we had to write 40 times more data than was actually changed!

Considering that the internal block size of SSDs can be 64 KB or more, and that an element update does not always change the whole element, the amount of "parasitic" disk load can be even higher.

In the literature and blogs on disk storage, this phenomenon of "parasitic" reads is called read amplification, and the phenomenon of "parasitic" writes is called write amplification.

The amplification factor, that is, the multiplication factor, is the ratio of the amount of data actually read (or written) to the amount that was really needed (or changed). In our B-tree example, the factor is about 40 for both reads and writes.
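The factor itself is a simple ratio. The snippet below, with the hypothetical helper `amplification`, computes it for a 4096-byte block and for a 64 KB SSD page, assuming in the second case that the device rewrites the whole page for a 100-byte change.

```lua
-- Amplification factor = bytes actually read or written / bytes needed.
local function amplification(io_unit, element_size)
  return io_unit / element_size
end

print(amplification(4096, 100))       -- a whole 4 KB block for 100 bytes
print(amplification(64 * 1024, 100))  -- assuming the SSD rewrites a 64 KB page
```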

The amount of "parasitic" I/O when updating data is one of the main problems that LSM trees solve. Let's see how this works.

The key difference between LSM trees and classical B-trees is that LSM trees store not just data (keys and values) but also operations on data: insertions and deletions.

LSM tree

For example, an element for an insertion operation contains, in addition to the key and value, an extra byte with the operation code, denoted REPLACE. An element for a deletion operation contains the key of the element (there is no need to store the value) and the corresponding operation code, DELETE. Each LSM tree element also contains the log sequence number (LSN) of the operation, that is, a value of a monotonically increasing sequence that uniquely identifies each operation. So the whole tree is ordered first by key in ascending order and then, within one key, by LSN in descending order.
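The ordering rule can be expressed as a comparator. In this sketch, statements are plain Lua tables with `key`, `op`, `value`, and `lsn` fields; that layout and the name `lsm_less` are assumptions for illustration, not vinyl's on-disk format.

```lua
-- Hypothetical representation of LSM-tree statements: key, operation code,
-- and LSN; DELETE statements carry no value.
local statements = {
  { key = 2, op = "REPLACE", value = "b1", lsn = 10 },
  { key = 1, op = "REPLACE", value = "a",  lsn = 11 },
  { key = 2, op = "DELETE",                lsn = 12 },
  { key = 2, op = "REPLACE", value = "b2", lsn = 13 },
}

-- Ascending key, then descending LSN within one key,
-- so the newest statement for each key comes first.
local function lsm_less(a, b)
  if a.key ~= b.key then
    return a.key < b.key
  end
  return a.lsn > b.lsn
end

table.sort(statements, lsm_less)
for _, s in ipairs(statements) do
  print(s.key, s.lsn, s.op)
end
```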

A single level of an LSM tree

Unlike a B-tree, which is stored entirely on disk and can be partially cached in RAM, an LSM tree has an explicit split between memory and disk from the very start. The problem of persisting data that resides in volatile memory is moved outside the storage algorithm: it can be solved in different ways, for example, by logging changes.

The part of the tree that resides in RAM is called L0 (level zero). RAM is limited, so a fixed area is allocated for L0. In the Tarantool configuration, for example, the size of L0 is set with the vinyl_memory parameter. At first, when the LSM tree contains no elements, operations are written to L0. Note that the elements in the tree are ordered by key in ascending order and then by LSN in descending order, so when a new value is inserted for a given key, the previous value is easy to find and remove. L0 can be represented by any container that preserves the order of elements. For example, Tarantool uses a B+* tree to store L0. Search and insertion are standard operations of the data structure used to represent L0, so we won't discuss them in detail.

Sooner or later, the number of elements in the tree exceeds the size of L0. Then L0 is written to a file on disk (called a run) and freed for new elements. This operation is called a dump.
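Here is a minimal model of L0 and dumps, with its assumptions stated up front: L0 capacity is counted in statements rather than bytes (in Tarantool the limit is vinyl_memory), L0 is a Lua table keyed by key instead of a B+* tree, and runs are kept in memory instead of files. The names `write`, `dump`, and `L0_CAPACITY` are hypothetical.

```lua
-- Hypothetical sketch of L0 and dumps; capacity is counted in statements.
local L0_CAPACITY = 3

local tree = { l0 = {}, l0_size = 0, runs = {} }

-- Write the contents of L0 to a new sorted run and empty L0.
local function dump(t)
  local run = {}
  for _, s in pairs(t.l0) do
    table.insert(run, s)
  end
  table.sort(run, function(a, b) return a.key < b.key end)
  table.insert(t.runs, 1, run)     -- keep runs ordered from newest to oldest
  t.l0, t.l0_size = {}, 0
end

local function write(t, key, op, value, lsn)
  if t.l0[key] == nil then
    t.l0_size = t.l0_size + 1
  end
  -- a newer statement for the same key replaces the older one in L0
  t.l0[key] = { key = key, op = op, value = value, lsn = lsn }
  if t.l0_size >= L0_CAPACITY then
    dump(t)                        -- L0 is full: turn it into a run
  end
end

local lsn = 0
for _, k in ipairs({5, 1, 5, 3, 2}) do
  lsn = lsn + 1
  write(tree, k, "REPLACE", "v" .. lsn, lsn)
end
print(#tree.runs, tree.l0_size)    -- one run on "disk", one statement in L0
```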

All the dumps on disk form a sequence ordered by LSN: the LSN ranges of the files do not overlap, and files with newer operations are closer to the beginning of the sequence. Think of these files as a pyramid, with new files at the top and old ones at the bottom. As new run files appear, the pyramid grows taller. Newer files may contain deletions or replacements for existing keys. To remove old data, garbage collection must be performed (this process is sometimes called merge or compaction) by combining several old files into a new one. If the merge encounters two versions of the same key, it is enough to keep only the newer version, and if a key was deleted after being inserted, both operations can be excluded from the result.
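The merge rule can be sketched as follows. For brevity, the hypothetical `compact` function collects the newest statement per key in a hash table instead of performing a streaming k-way merge over sorted files, as a real engine would. The `includes_oldest_run` flag models the condition under which a deletion can be discarded together with the insertion it cancels: the merge must include the lowest level of the tree.

```lua
-- Hypothetical compaction sketch. Each run is an array of statements sorted
-- by key; for each key only the statement with the highest LSN survives.
-- DELETE statements are dropped only when the merge includes the oldest run.
local function compact(runs, includes_oldest_run)
  local newest = {}                  -- key -> statement with the highest LSN
  for _, run in ipairs(runs) do
    for _, s in ipairs(run) do
      local cur = newest[s.key]
      if cur == nil or s.lsn > cur.lsn then
        newest[s.key] = s
      end
    end
  end
  local result = {}
  for _, s in pairs(newest) do
    if not (includes_oldest_run and s.op == "DELETE") then
      table.insert(result, s)
    end
  end
  table.sort(result, function(a, b) return a.key < b.key end)
  return result
end

local newer = {
  { key = 1, op = "DELETE", lsn = 5 },
  { key = 2, op = "REPLACE", value = "y", lsn = 6 },
}
local older = {
  { key = 1, op = "REPLACE", value = "x", lsn = 1 },
  { key = 2, op = "REPLACE", value = "old", lsn = 2 },
  { key = 3, op = "REPLACE", value = "z", lsn = 3 },
}
local intermediate = compact({newer, older}, false)  -- tombstone for key 1 kept
local bottom = compact({newer, older}, true)         -- key 1 disappears entirely
print(#intermediate, #bottom)
```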

The key factor in LSM tree efficiency is when and for which files compaction is performed. Imagine that an LSM tree stores a monotonic sequence of keys such as 1, 2, 3, …, with no deletions. In this case, compaction is useless: all elements are already sorted, the tree contains no garbage, and it is always clear which file holds each key. Conversely, if the LSM tree contains many deletions, compaction frees up disk space. But even if there are no deletions while the key ranges of different files overlap heavily, compaction can speed up search because it reduces the number of files to scan. In this case, it makes sense to compact after every dump. Keep in mind, however, that such compaction rewrites all the data on disk, so if there are few reads, it is better to compact less often.

To suit any of the scenarios above, an LSM tree organizes all files into a pyramid: the newer the data operations, the higher they are in the pyramid. A compaction involves two or more adjacent files in the pyramid; whenever possible, files of roughly the same size are chosen.

  • Multi-level compaction can span any number of levels
  • A level can contain multiple files

All adjacent files of roughly the same size make up a level of the LSM tree on disk. The ratio of file sizes on different levels determines the proportions of the pyramid, which lets you optimize the tree for either write-intensive or read-intensive workloads.

Suppose that L0 is 100 MB, the ratio of file sizes between levels (the vinyl_run_size_ratio parameter) is 5, and each level can contain at most 2 files (the vinyl_run_count_per_level parameter). After the first three dumps, there are 3 files of 100 MB on disk, and these files make up level L1. Since 3 > 2, compaction merges them into a new 300 MB file, and the old ones are deleted. After 2 more dumps, compaction runs again, this time on files of 100, 100, and 300 MB, and the resulting 500 MB file moves to level L2 (recall that the level size ratio is 5), while level L1 becomes empty. After 10 more dumps, there are 3 files of 500 MB on level L2, which are merged into a single 1500 MB file. After another 10 dumps, the following happens: 3 files of 100 MB are merged twice, and files of 100, 100, and 300 MB are merged twice, producing two 500 MB files on level L2. Since level L2 now has three files, compaction merges the two 500 MB files and the 1500 MB file. Because of its size, the resulting 2500 MB file moves to level L3.

This process can continue indefinitely, and if the stream of operations on the LSM tree contains many deletions, the file produced by a compaction can move not only down the pyramid but also up, since it turns out smaller than the source files that were merged. In other words, it is enough to track a file's level logically, based on the file size and the minimum and maximum LSN among all the operations it contains.
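Since level membership depends only on run size, it can be computed directly. The hypothetical helper `level_of` below assumes the boundaries implied by the example: level 1 holds runs smaller than L0 × ratio, level 2 holds runs smaller than L0 × ratio^2, and so on. With L0 = 100 MB and a ratio of 5, it places 100 and 300 MB runs on L1, 500 and 1500 MB runs on L2, and a 2500 MB run on L3, matching the walkthrough.

```lua
-- Hypothetical helper: place a run on a level purely by its size.
-- Integer multiplication keeps the level boundaries exact.
local function level_of(run_size_mb, l0_size_mb, ratio)
  local level = 1
  local upper_bound = l0_size_mb * ratio   -- runs below this belong to L1
  while run_size_mb >= upper_bound do
    level = level + 1
    upper_bound = upper_bound * ratio      -- next level is `ratio` times larger
  end
  return level
end

for _, size in ipairs({100, 300, 500, 1500, 2500}) do
  print(size .. " MB -> L" .. level_of(size, 100, 5))
end
```

A run that shrinks because of deletions simply gets a smaller size and therefore a higher (lower-numbered) level, which is the upward move described above.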

If you need to reduce the number of files to search, you can increase the ratio of file sizes on different levels, which reduces the number of levels. Conversely, if you need to reduce the resource cost of compaction, you can decrease the level size ratio: the pyramid becomes taller, and compaction, although it runs more often, works on smaller files on average and thus does less work in total. In general, write amplification in an LSM tree is described by the formula $\log_{x}\left(\frac{N}{\mathrm{L0}}\right) \times x$, or equivalently $x \times \frac{\ln\left(\frac{N}{\mathrm{L0}}\right)}{\ln x}$, where N is the total size of all tree elements, L0 is the size of level zero, and x is the level size ratio (the level_size_ratio parameter). If $\frac{N}{\mathrm{L0}} = 40$ (the disk-to-memory ratio), the plot looks roughly like this:

Write amplification as a function of the level size ratio x for N/L0 = 40
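To see the shape of this curve numerically, the snippet below evaluates the second form of the formula for a few values of x with N/L0 = 40. The function name `write_amplification` is hypothetical. The expression x / ln x is smallest at x = e ≈ 2.72, and x = 2 and x = 4 give the same value, since ln 4 = 2 ln 2.

```lua
-- Evaluate x * ln(N / L0) / ln(x) for several level size ratios x.
local function write_amplification(x, n_over_l0)
  return x * math.log(n_over_l0) / math.log(x)
end

for _, x in ipairs({2, 3, 4, 5, 10}) do
  print(x, string.format("%.1f", write_amplification(x, 40)))
end
```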


"Parasitic" reads, in turn, are proportional to the number of levels. The cost of a search on each level is at most the cost of a search in a B-tree. Returning to our example of a tree with 100,000,000 elements: with 256 MB of RAM and the default values of the vinyl_run_size_ratio and vinyl_run_count_per_level parameters, we get a write amplification factor of about 13, while the read amplification factor can reach 150. Let's see why this happens.

While a single-key search stops at the first match, finding all values in a range (for example, all users with the last name "Ivanov") requires scanning all levels of the tree.

Range search [24, 30)

The range is assembled in the same way as when merging several files: for the current key, the algorithm takes the operation with the highest LSN across all sources, discards the other operations on that key, moves the search position to the next key, and repeats the procedure.
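The range read described above is essentially a k-way merge. In this sketch, each source (L0 or a run) is a Lua array of statements sorted by key, and the function name `range_select` and the sample data are made up for illustration. For simplicity, it scans all cursors to find the next key instead of using a heap.

```lua
-- Hypothetical range read over several sorted sources. For every key in
-- [lo, hi), the statement with the highest LSN wins; a winning DELETE hides the key.
local function range_select(sources, lo, hi)
  local pos = {}
  for i, src in ipairs(sources) do
    local p = 1
    while src[p] and src[p].key < lo do
      p = p + 1                          -- position each cursor at the first key >= lo
    end
    pos[i] = p
  end
  local result = {}
  while true do
    local min_key                        -- smallest current key across sources
    for i, src in ipairs(sources) do
      local s = src[pos[i]]
      if s and (min_key == nil or s.key < min_key) then
        min_key = s.key
      end
    end
    if min_key == nil or min_key >= hi then
      break
    end
    local best                           -- newest statement for min_key
    for i, src in ipairs(sources) do
      while src[pos[i]] and src[pos[i]].key == min_key do
        local s = src[pos[i]]
        if best == nil or s.lsn > best.lsn then
          best = s
        end
        pos[i] = pos[i] + 1              -- discard other statements for this key
      end
    end
    if best.op ~= "DELETE" then
      table.insert(result, best.key)
    end
  end
  return result
end

local L0   = { { key = 25, op = "DELETE", lsn = 9 }, { key = 28, op = "REPLACE", lsn = 8 } }
local run1 = { { key = 22, op = "REPLACE", lsn = 5 }, { key = 25, op = "REPLACE", lsn = 6 },
               { key = 27, op = "REPLACE", lsn = 4 } }
local run2 = { { key = 24, op = "REPLACE", lsn = 1 }, { key = 29, op = "REPLACE", lsn = 2 },
               { key = 30, op = "REPLACE", lsn = 3 } }
local found = range_select({L0, run1, run2}, 24, 30)
print(table.concat(found, ", "))         -- 25 is hidden by the newer DELETE in L0
```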

Why store deletions at all? And why doesn't this cause the tree to overflow, for example, in the scenario for i=1,10000000 put(i) delete(i) end?

During a search, deletions signal that the value being searched for is absent; during compaction, they clear the tree of "garbage" records with older LSNs.

As long as the data is only in RAM, there is no need to store deletions. There is also no need to keep deletions after a compaction if it includes the lowest level of the tree, which holds the data of the oldest dump. Indeed, if a value is absent on the last level, it is absent from the tree.

  • You can't delete from files that are only updated by appending new records
  • Instead, deletion markers (tombstones) are inserted into level L0

Deletion, step 1: a tombstone is inserted into L0

Deletion, step 2: the tombstone passes through intermediate levels

Deletion, step 3: in a major compaction, the tombstone is removed from the tree

If we know that a deletion immediately follows the insertion of a unique value, which is common when a value changes in a secondary index, the deletion can be filtered out as early as during the compaction of intermediate levels. This optimization is implemented in vinyl.

Besides reducing write amplification, the approach with periodic L0 dumps and compaction of levels L1-Lk has several advantages over the write approach used in B-trees:

One of the key advantages of a B-tree as a search data structure is predictability: any operation reads at most $\log_{B}(N)$ blocks. In a classical LSM tree, both read and write speed can differ between the best and worst cases by a factor of hundreds or thousands. For example, adding just one element to L0 can cause it to overflow, which in turn can cause L1, L2, and so on to overflow. A read may find the element in L0, or it may have to go through all the levels. Reads within a single level also need optimization to reach a speed comparable to a B-tree. Fortunately, many of these drawbacks can be mitigated or eliminated entirely with auxiliary algorithms and data structures. Let's systematize these drawbacks and describe how Tarantool deals with them.

Inserting data into an LSM tree almost always involves only L0. How can you avoid downtime if the RAM area allocated for L0 is full?

Freeing L0 involves two long operations: writing to disk and freeing memory. To avoid downtime while L0 is being written to disk, Tarantool uses a pre-emptive dump. Suppose L0 is 256 MB and the disk write speed is 10 MB/s. Then writing L0 to disk takes about 26 seconds. The insert rate is 10,000 requests per second, and one key is 100 bytes. So about 26 MB of free RAM must be reserved for the duration of the write, which reduces the actual usable size of L0 to 230 MB.
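The arithmetic of this example can be reproduced directly. The snippet assumes 1 MB = 10^6 bytes, which is what makes 10,000 inserts of 100 bytes per second equal 1 MB/s; the variable names are illustrative. It gives a 25.6 s dump, a 25.6 MB reserve, and 230.4 MB of usable L0, which the text rounds to 26 s, 26 MB, and 230 MB.

```lua
-- How much RAM must be reserved while L0 is being dumped,
-- and how much of L0 remains usable (1 MB = 10^6 bytes).
local l0_mb = 256
local disk_mb_per_s = 10
local inserts_per_s = 10000
local bytes_per_insert = 100

local dump_seconds = l0_mb / disk_mb_per_s                       -- time to write L0
local incoming_mb_per_s = inserts_per_s * bytes_per_insert / 1e6 -- new data per second
local reserve_mb = dump_seconds * incoming_mb_per_s              -- arrives during the dump
local usable_mb = l0_mb - reserve_mb

print(string.format("dump %.1f s, reserve %.1f MB, usable %.1f MB",
  dump_seconds, reserve_mb, usable_mb))
```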

Tarantool does all of these calculations automatically, constantly updating the rolling average of the DBMS workload and the histogram of the disk speed. This allows using L0 as efficiently as possible and prevents write requests from timing out. However, in the case of workload surges, some wait time is still possible. That's why we also introduced an insertion timeout (the vinyl_timeout parameter), which is set to 60 seconds by default. The write operation itself is executed in dedicated threads. The number of these threads (4 by default) is controlled by the vinyl_write_threads parameter. Having at least two write threads allows dumps and compactions to run in parallel, which is also necessary for ensuring system predictability.

In Tarantool, compactions always run independently of dumps, in a separate thread of execution. This is possible because of the nature of the LSM tree: once written, files in the tree never change, and compaction only creates a new file.

L0 rotation and freeing the memory that has already been dumped can also cause delays. During a dump, L0 memory is owned by two operating system threads: the transaction processing thread and the write thread. Although no elements are added to L0 during rotation, it can still be used for lookups. To avoid read locks during lookups, the write thread does not free the dumped memory. It leaves this task to the transaction processing thread instead. The freeing itself is instantaneous once the dump is complete, because L0 uses a specialized allocator that can release all of its memory in a single operation.

Figure: The dump is made from a so-called "shadow" L0, without blocking new inserts and reads (labels: preemptive dump, loading).
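To make the rotation concrete, here is a toy single-threaded model of a "shadow" L0. It uses plain dicts as stand-ins for the in-memory tree and the on-disk runs. It shows only the visibility rules: new writes go to a fresh L0, reads still see the frozen one, and the frozen L0 is released in one step. It does not model the write thread or the real allocator, and all names are hypothetical.

```python
class VinylL0:
    """Toy model of L0 rotation with a read-only "shadow" L0 (hypothetical)."""

    def __init__(self):
        self.active = {}    # receives new writes
        self.shadow = None  # frozen L0 being dumped; still readable
        self.runs = []      # immutable on-disk runs, newest first

    def put(self, key, value):
        self.active[key] = value

    def get(self, key):
        # Newest data wins: active L0, then shadow L0, then disk runs.
        for layer in (self.active, self.shadow or {}, *self.runs):
            if key in layer:
                return layer[key]
        return None

    def start_dump(self):
        # Rotation: freeze the current L0 and open a fresh one for writes.
        self.shadow, self.active = self.active, {}

    def finish_dump(self):
        # The dumped data becomes an immutable sorted run. The shadow L0 is
        # released all at once by dropping the reference, which stands in for
        # freeing the whole allocator region in one operation.
        self.runs.insert(0, dict(sorted(self.shadow.items())))
        self.shadow = None
```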

Reads are the hardest thing to optimize in LSM trees. The main complicating factor is the large number of levels. It slows lookups down considerably, and it can also raise memory requirements for almost any optimization attempt. Fortunately, LSM tree files are updated only by appending new records. This makes it possible to solve these problems in ways that are unusual for traditional data structures.

Figure: Read optimizations in vinyl: page index, Bloom filters, tuple range cache, multi-level compaction.

Data compression in B-trees is either extremely hard to implement or more of a marketing feature than a genuinely useful tool. Compression in LSM trees works as follows:

During every dump or compaction, all data in a file is split into pages. The page size in bytes is set by the vinyl_page_size parameter, which can be configured separately for each index. A page does not have to occupy exactly vinyl_page_size bytes. It can be slightly larger or smaller, depending on the data stored in it. Because of this, a page never contains gaps.

Compression uses zstd, a streaming algorithm developed by Facebook. The first key of each page and the page's offset in the file are added to the so-called page index. The page index is a separate file that allows the required page to be found quickly. After a dump or compaction, the page index of the new file is also written to disk.

All .index files are cached in memory, so the required page can be found with a single read from a .run file. Vinyl uses the .run extension for files produced by a dump or compaction. Since data within a page is sorted, the required key can be found with a simple binary search after the page is read and decompressed. Reading and decompression are handled by dedicated threads, whose number is set by the vinyl_read_threads parameter.
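The page layout and page index described above can be sketched as follows. This is an illustration under stated assumptions, not vinyl's file format:

- zlib from the Python standard library stands in for zstd.
- JSON stands in for the real (xlog-compatible) encoding.
- The page index is an in-memory list of (first key, offset, length).
- write_run and lookup are hypothetical names.

```python
import bisect
import json
import zlib


def write_run(rows, page_size):
    """Split sorted (key, value) rows into compressed pages plus a page index.

    A page is closed once its serialized size reaches page_size, so it may be
    slightly larger than page_size, and pages are stored back to back.
    """
    pages, page, size = [], [], 0
    for key, value in rows:
        page.append([key, value])
        size += len(json.dumps([key, value]))
        if size >= page_size:          # close the page once it is full enough
            pages.append(page)
            page, size = [], 0
    if page:                           # the last page may be smaller
        pages.append(page)
    run, index = bytearray(), []
    for page in pages:
        blob = zlib.compress(json.dumps(page).encode())
        index.append((page[0][0], len(run), len(blob)))  # first key, offset, length
        run += blob
    return bytes(run), index


def lookup(run, index, key):
    """Find a key with one page read: search the page index, then the page."""
    first_keys = [first for first, _, _ in index]
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None                    # smaller than the first key in the file
    _, offset, length = index[pos]
    page = json.loads(zlib.decompress(run[offset:offset + length]))
    keys = [k for k, _ in page]
    i = bisect.bisect_left(keys, key)  # binary search inside the sorted page
    return page[i][1] if i < len(keys) and keys[i] == key else None
```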

Tarantool uses a unified file format. For example, a .run file stores data in the same format as an .xlog (log) file. This simplifies backup and recovery, as well as the work of external tools.

The page index reduces the number of pages scanned when searching within a single file, but it does not remove the need to search every level of the tree. In one important special case, the absence of data must be verified, so a scan of all levels is unavoidable: insertion into a unique index. If the data already exists, the insertion must fail. In an LSM tree, the only way to return this error before the transaction completes is to perform a lookup before the insertion. In DBMS terms, reads of this kind form a whole class called "hidden" or "parasitic" reads.
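A minimal sketch of why a unique-index insert needs a hidden read. Each level is modelled as a dict, ordered from L0 (newest) to the last level (oldest), and deletions are stored as tombstones. When the key is absent, lsm_get has to visit every level before the insert may proceed. The names are hypothetical.

```python
TOMBSTONE = object()  # marks a deleted key inside a level


def lsm_get(levels, key):
    """Look the key up from the newest level (L0) to the oldest one."""
    for level in levels:
        if key in level:
            value = level[key]
            return None if value is TOMBSTONE else value
    return None  # absent: every level had to be checked


def unique_insert(levels, key, value):
    """INSERT into a unique index: a hidden read must precede the write."""
    if lsm_get(levels, key) is not None:
        raise KeyError(f"duplicate key {key!r}")
    levels[0][key] = value  # the write itself only touches L0
```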

Another operation that causes hidden reads is updating a value on which a secondary index is built. Secondary keys are ordinary LSM trees that store data in a different order. Usually, to avoid storing all data in every index, the full value for a key is stored only in the primary index. Any index that stores both the key and the value is called covering or clustered. The secondary index stores only the fields it is built on plus the values of the fields that make up the primary index. As a result, whenever a value used by a secondary key changes, the old key must first be deleted from the secondary index, and only then can the new one be inserted. The old value is not known at update time, so internally it has to be read from the primary key.

For example:

update t1 set city='Moscow' where id=1
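The hidden read in such an update can be illustrated with a dict standing in for the primary (covering) index and a set for a non-covering secondary index on city. This is a hypothetical sketch. It shows that without first reading the old row, the stale (old city, id) entry could not be located and would remain in the secondary index.

```python
def update_city(primary, city_index, pk, new_city):
    """UPDATE t1 SET city = new_city WHERE id = pk, keeping a secondary index.

    primary: dict pk -> row (the covering/clustered index)
    city_index: set of (city, pk) pairs (indexed field plus primary key)
    """
    old_row = primary[pk]                      # hidden read of the old value
    city_index.discard((old_row["city"], pk))  # delete the old secondary key
    primary[pk] = {**old_row, "city": new_city}
    city_index.add((new_city, pk))             # insert the new secondary key
```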

To reduce the number of disk reads, especially for nonexistent values, almost all LSM trees use probabilistic data structures, and Tarantool is no exception. The Bloom filter described here is a set of several (usually 3 to 5) bit arrays, one per hash function. On write, several hash functions are computed for each key, and in each array the bit corresponding to the hash value is set. Hashing can produce collisions, so some bits may be set twice. What matters are the bits that remain unset after all keys have been written. On lookup, the same hash functions are computed. If the bit is not set in at least one of the arrays, the value is definitely absent from the file. The false positive probability follows from the independence of the hash functions. Each hash function behaves as an independent random variable, so the probability of a collision in all arrays at once is the product of the per-array probabilities, which makes it very small.
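Below is a minimal sketch of a Bloom filter built from several bit arrays, one per hash function, as described above. Salting blake2b differently for each array is an assumption made to obtain independent-looking hashes. Vinyl's own hash functions and bit layout differ, and each "bit" is stored in a byte here for readability.

```python
import hashlib


class PartitionedBloomFilter:
    """Bloom filter made of several bit arrays, one per hash function."""

    def __init__(self, num_arrays=4, bits_per_array=1024):
        self.m = bits_per_array
        # One byte per bit for readability; a real filter packs bits.
        self.arrays = [bytearray(bits_per_array) for _ in range(num_arrays)]

    def _positions(self, key):
        # A differently salted blake2b per array gives independent-looking hashes.
        for i in range(len(self.arrays)):
            digest = hashlib.blake2b(key.encode(), digest_size=8,
                                     salt=bytes([i]) * 16).digest()
            yield i, int.from_bytes(digest, "little") % self.m

    def add(self, key):
        for i, pos in self._positions(key):
            self.arrays[i][pos] = 1

    def might_contain(self, key):
        # A single unset bit in any array proves the key was never added.
        return all(self.arrays[i][pos] for i, pos in self._positions(key))
```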

The key advantage of Tarantool's Bloom filter implementation is how easy it is to configure. There is only one parameter, which can be set independently for each index: vinyl_bloom_fpr. FPR stands for "false positive ratio", and the default is 0.05, or 5%. Based on this parameter, Tarantool automatically builds Bloom filters of optimal size for lookups by the full key and by key parts. The Bloom filters are stored together with the page index in the .index file and cached in memory.
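A single FPR value is enough to determine the filter size. The standard textbook formulas give m = -n·ln(p) / (ln 2)^2 bits and k = (m/n)·ln 2 hash functions for n keys and a target ratio p. The sketch below applies them. Whether vinyl uses exactly these formulas is an assumption, and the function name is hypothetical.

```python
import math


def bloom_size_for_fpr(num_keys, fpr):
    """Textbook optimal Bloom filter size for a target false positive ratio.

    Returns (total_bits, num_hash_functions). Not necessarily the exact
    computation vinyl performs internally.
    """
    bits = math.ceil(-num_keys * math.log(fpr) / math.log(2) ** 2)
    hashes = max(1, round(bits / num_keys * math.log(2)))
    return bits, hashes


bits, hashes = bloom_size_for_fpr(1_000_000, 0.05)  # 0.05 is the default FPR
print(f"{bits / 8 / 1024:.0f} KiB, {hashes} hash functions")
```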

Many people treat caching as a cure for every performance problem: "When in doubt, add a cache." In vinyl, we see the cache mainly as a way to reduce the overall disk load. As a result, queries that miss the cache get more predictable response times. Vinyl implements a type of cache that is unique among transactional systems, called the "range tuple cache". Unlike the caches in RocksDB or MySQL, for example, it stores ready-made ranges of index values rather than pages. These ranges have already been read from disk and merged across all levels. This lets the cache serve both single-key queries and key-range queries. The cache holds only hot data, whereas a cached page may contain data of which only a part is in demand, so memory is used most efficiently. The cache size is set by the vinyl_cache parameter.
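Here is a simplified model of the range tuple cache idea. It assumes the cache remembers closed key intervals that are known to be complete after a merged read from disk. Because a whole interval is cached, a later range query or point lookup inside it is answered without touching disk, including a lookup for a missing key. The interval bookkeeping, the read_from_disk callback, and all names are hypothetical. Vinyl's actual cache structure and its eviction under vinyl_cache are not modelled.

```python
import bisect


class RangeTupleCache:
    """Hypothetical model: caches whole key intervals after a merged disk read.

    Each cached entry is (lo, hi, rows): every key of the index in the closed
    interval [lo, hi] is present in rows, so keys missing from rows are known
    not to exist.
    """

    def __init__(self, read_from_disk):
        self.read_from_disk = read_from_disk  # callable(lo, hi) -> sorted [(key, value)]
        self.intervals = []
        self.disk_reads = 0

    def select(self, lo, hi):
        for c_lo, c_hi, rows in self.intervals:
            if c_lo <= lo and hi <= c_hi:     # request fully covered by the cache
                keys = [key for key, _ in rows]
                return rows[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]
        self.disk_reads += 1                  # miss: read and merge all levels
        rows = self.read_from_disk(lo, hi)
        self.intervals.append((lo, hi, rows))
        return rows

    def get(self, key):
        rows = self.select(key, key)          # a point lookup is a one-key range
        return rows[0][1] if rows else None
```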

By this point you may be losing focus and in need of a well-deserved dose of dopamine. Now is a good time for a break, because the rest of this section will take serious effort to follow.

In vinyl, the structure of a single LSM tree is just one piece of the puzzle. Vinyl creates and maintains several LSM trees even for a single table (a so-called space): one tree per index. Yet even a single index can consist of dozens of LSM trees. Let's see why.

Consider our standard example: 100,000,000 records of 100 bytes each. After a while, the lowest LSM level may contain a 10 GB file. When the last level is compacted, we create a temporary file that also takes about 10 GB. Data on intermediate levels takes space too, because the tree may store several operations for the same key. In total, storing 10 GB of useful data may require up to 30 GB of free space: 10 GB for the last level, 10 GB for the temporary file, and 10 GB for everything else. And what if there is 1 TB of data instead of 10 GB? Requiring free disk space to always be several times larger than the amount of useful data is not economically viable, and creating a 1 TB file can take tens of hours. After any crash or system restart, the operation would have to start over.

Now consider another problem. Suppose the tree's primary key is a monotonic sequence, for example, a time series. In this case, most inserts land in the rightmost part of the key range. There is no point in running compaction again merely to append a few million more records to the end of an already huge file.

And what if inserts mostly go to one part of the key range while reads come from another? How should the shape of the tree be optimized then? If it is too tall, reads suffer; if it is too short, writes suffer.

Tarantool "factorizes" the problem by creating many LSM trees for each index instead of one. The approximate size of each subtree can be set with the vinyl_range_size configuration parameter. These subtrees are called ranges.

Figure: Factoring large LSM trees with ranges (ranges reflect the static structure of sorted files; slices join a sorted file into a range).

Initially, while an index has few elements, it consists of a single range. As elements are added, its total size may exceed the maximum range size. When that happens, an operation called split divides the tree into two equal parts. The split is made at the median element of the key range stored in the tree. For example, if the tree initially stores the full range -inf … +inf, a split at the median key X produces two subtrees: one stores all keys from -inf to X, the other from X to +inf. Thus, on every insert or read we know exactly which subtree to access. If there were deletions and both neighboring ranges have shrunk, the reverse operation, called coalesce, merges the two neighboring trees into one.
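The split and coalesce rules can be sketched with a hypothetical RangeIndex in which each range is a sorted key list standing in for a separate LSM tree. The key-count thresholds are assumptions that replace the byte-based vinyl_range_size limit. The real operations also avoid rewriting data, which this toy model does not try to show.

```python
import bisect


class RangeIndex:
    """Toy index split into ranges; each range stands in for one LSM tree.

    bounds[i] is the inclusive lower bound of range i, and range i ends where
    range i + 1 begins. max_keys is a hypothetical stand-in for vinyl_range_size.
    """

    def __init__(self, max_keys=4):
        self.bounds = [float("-inf")]
        self.ranges = [[]]
        self.max_keys = max_keys

    def _find(self, key):
        # The responsible range is the last one whose lower bound is <= key.
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        i = self._find(key)
        bisect.insort(self.ranges[i], key)
        keys = self.ranges[i]
        if len(keys) > self.max_keys:
            mid = len(keys) // 2
            # Split at the median key X: [lo, X) stays, [X, hi) becomes a new range.
            self.ranges[i:i + 1] = [keys[:mid], keys[mid:]]
            self.bounds.insert(i + 1, keys[mid])

    def delete(self, key):
        i = self._find(key)
        self.ranges[i].remove(key)
        # Coalesce with a neighbor once the pair has shrunk to half the limit.
        for j in (i, i - 1):
            if 0 <= j < len(self.ranges) - 1 and \
                    len(self.ranges[j]) + len(self.ranges[j + 1]) <= self.max_keys // 2:
                self.ranges[j:j + 2] = [self.ranges[j] + self.ranges[j + 1]]
                del self.bounds[j + 1]
                break
```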

Split and coalesce do not trigger compaction, the creation of new files, or any other heavyweight operation. An LSM tree is just a set of files. In vinyl, we implemented a special metadata log that makes it easy to track which file belongs to which subtree or subtrees. The log has the .vylog extension and is format-compatible with .xlog files. Like an .xlog file, it is rotated automatically at every checkpoint. To avoid re-creating files on split and coalesce, we introduced an intermediate entity called a slice. A slice is a reference to a file together with a key range, and it is stored only in the metadata log. When the number of references to a file drops to zero, the file is deleted. When a split or coalesce is needed, Tarantool creates slices for each new tree, deletes the old slices, and writes these operations to the metadata log. Literally, the metadata log stores records of the form <tree id, slice id> or <slice id, file id, min, max>.
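The slice bookkeeping can be illustrated with a toy append-only log. This is a sketch under assumptions: the record tuples only approximate the <tree id, slice id> and <slice id, file id, min, max> entries mentioned above, and MetadataLog and its methods are hypothetical. It shows the key property. A split creates two slices that reference the same file, and the file becomes garbage only when its last slice is deleted.

```python
from collections import Counter


class MetadataLog:
    """Toy model of slice bookkeeping in an append-only metadata log."""

    def __init__(self):
        self.records = []      # append-only, like a .vylog file
        self.slices = {}       # slice_id -> (range_id, run_file, lo, hi)
        self.refs = Counter()  # run_file -> number of slices referencing it
        self.deleted_files = []
        self._next_id = 0

    def create_slice(self, range_id, run_file, lo, hi):
        self._next_id += 1
        self.slices[self._next_id] = (range_id, run_file, lo, hi)
        self.refs[run_file] += 1
        self.records.append(("insert_slice", range_id, self._next_id, run_file, lo, hi))
        return self._next_id

    def delete_slice(self, slice_id):
        _, run_file, _, _ = self.slices.pop(slice_id)
        self.refs[run_file] -= 1
        self.records.append(("delete_slice", slice_id))
        if self.refs[run_file] == 0:
            self.deleted_files.append(run_file)  # removed later, in the background

    def split(self, slice_id, left_range, right_range, median):
        # No data is copied: two new slices point at the same file. They are
        # created before the old slice is deleted, so the file stays referenced.
        _, run_file, lo, hi = self.slices[slice_id]
        left = self.create_slice(left_range, run_file, lo, median)
        right = self.create_slice(right_range, run_file, median, hi)
        self.delete_slice(slice_id)
        return left, right
```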

Thus, the heavy work of actually splitting a tree into two subtrees is deferred until compaction and performed automatically. Dividing the whole key space into ranges has a huge advantage: the L0 size, dumps, and compaction can be controlled independently for each subtree. As a result, these processes are manageable and predictable. A separate metadata log also simplifies operations such as truncate and drop. In vinyl they complete instantly, because they only touch the metadata log, while garbage collection runs in the background.

The previous sections mentioned only two operations stored by an LSM tree: delete and replace. Let's look at how the other operations are represented. An insert can be represented as a replace, provided you first check that no element with the given key exists. An update requires reading the old value from the tree first, so it is also simpler to write this operation to the tree as a replace, which speeds up future reads of this key. Besides, an update must return the new value, so hidden reads cannot be avoided.

In B-trees, hidden reads cost almost nothing: to update a block, you have to read it from disk anyway. For LSM trees, a special update operation that causes no hidden reads looks very tempting.

Such an operation must contain two things. The first is a default value to insert if there is no data for the key yet. The second is a list of update operations to perform if the value exists.

When the transaction is executed, Tarantool only saves the whole operation in the LSM tree. The operation is actually "executed" later, during compaction.

The upsert operation:

space:upsert(tuple, {{operator, field, value}, ... })
  • Update without a read, or insert
  • Deferred execution
  • Background squashing of upserts prevents operations from piling up
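The deferred semantics of upsert can be modelled in a few lines. The history of a key is replayed oldest first. An upsert inserts its tuple when the key has no value yet and otherwise applies its operations. Only the '+', '-', and '=' operators are modelled, and the history format and function names are hypothetical. The exception in apply_ops merely shows that some errors can only be detected once the old value is known. It does not describe how Tarantool reports such errors.

```python
def apply_ops(tuple_, ops):
    """Apply update operations (op, field_no, value); field numbers are 1-based."""
    result = list(tuple_)
    for op, field, value in ops:
        i = field - 1
        if op == "=":
            result[i] = value
        elif op in ("+", "-"):
            if not isinstance(result[i], (int, float)):
                # Detectable only once the old value is known.
                raise TypeError(f"cannot apply {op!r} to field {field}")
            result[i] += value if op == "+" else -value
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return tuple(result)


def resolve(history):
    """Replay a key's history, oldest first, as compaction or a read would.

    Items are ("replace", tuple) or ("upsert", default_tuple, ops).
    """
    current = None
    for entry in history:
        if entry[0] == "replace":
            current = entry[1]
        elif current is None:
            current = entry[1]  # no data yet: insert the default tuple
        else:
            current = apply_ops(current, entry[2])
    return current
```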

Unfortunately, once execution is deferred until compaction, there is no way left to handle errors. That's why Tarantool validates upsert operations as thoroughly as possible before writing them to the tree. Still, some checks can only be performed when the old data is at hand. Examples are an update that adds a number to a string or one that deletes a nonexistent field.

Many products, including PostgreSQL and MongoDB, offer an operation with similar semantics. Everywhere, though, it is merely syntactic sugar that combines update and insert and does not spare the DBMS from performing hidden reads. Most likely, the reason is that LSM trees are still relatively new as storage data structures.

Upsert is a very important optimization, and implementing it took us long and intense work. Still, we have to admit that its applicability is limited. If a table has secondary keys or triggers, hidden reads cannot be avoided. But if your scenario needs no secondary keys, and the update is certain not to fail after the transaction completes, this operation is for you.

A short story related to this operator: vinyl was just starting to "grow up", and we ran upsert on production servers for the first time. The conditions seemed ideal: a huge set of keys, the current time as the value, and upserts that either insert a key or update its current time. Reads were rare. Load tests showed excellent results.

Nevertheless, after a couple of days in operation, the Tarantool process started consuming 100% CPU, and system performance dropped almost to zero.

We started a detailed investigation. It turned out that the distribution of requests across keys was very different from what we had seen in the test environment. It was... very uneven. Most keys were updated once or twice a day and put no load on the database. But some keys were much hotter, with tens of thousands of updates per day. Tarantool handled this stream of updates perfectly well. Things went downhill only when a key with tens of thousands of upserts was read. To return the latest value, Tarantool had to read and "replay" a history of tens of thousands of upsert commands every time. At the design stage, we had hoped this would be handled automatically by level compaction. In practice, compaction never even happened: there was plenty of L0 memory, so no dumps were created.

We solved the problem by adding a background process that performs read-ahead for keys with more than a few dozen accumulated upserts. It then replaces those upserts with the value it has read.
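The fix for hot keys can be sketched as a per-key history that is collapsed once too many upserts pile up. UPSERT_SQUASH_THRESHOLD = 32 is a hypothetical stand-in for "a few dozen". The squash runs inline here rather than in the background, and the upserts only increment field 2. The point is that read() never has to replay more than the threshold number of operations.

```python
UPSERT_SQUASH_THRESHOLD = 32  # hypothetical stand-in for "a few dozen"


class UpsertKey:
    """History of one key, squashed once too many upserts accumulate."""

    def __init__(self):
        self.base = None     # last materialized tuple
        self.pending = []    # deferred (default_tuple, delta_for_field_2) upserts
        self.squashes = 0

    def upsert(self, default, delta):
        self.pending.append((default, delta))
        if len(self.pending) > UPSERT_SQUASH_THRESHOLD:
            self.squash()    # in Tarantool this happens in the background

    def squash(self):
        # Read the current value once and replace the history with it.
        self.base = self.read()
        self.pending.clear()
        self.squashes += 1

    def read(self):
        # The cost grows with len(self.pending): every deferred upsert is replayed.
        value = self.base
        for default, delta in self.pending:
            value = default if value is None else (value[0], value[1] + delta)
        return value
```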

Upsert is not the only operation for which hidden reads are a pressing problem. Even a replace has to read the old value when secondary keys exist. The old value must be deleted from the secondary indexes explicitly, because inserting the new element may not remove it, leaving garbage in the index.

If secondary indexes are not unique, removing this "garbage" from them can also be deferred to the compaction phase, which is what we do in Tarantool. Because LSM tree files are only ever appended to, we were able to implement full-fledged serializable transactions in vinyl. Read-only requests use older versions of the data and do not block writes. The transaction manager itself is still fairly simple. In traditional classification, it belongs to the MVTO class (multiversion timestamp ordering), and in a conflict the transaction that commits first wins. There are no locks, and hence none of the deadlocks that come with them. Oddly enough, this is more of a drawback than an advantage. Under concurrent execution, more transactions could succeed if some of them were held on a lock at the right moment. Improving the transaction manager is in our immediate plans. In the current version, we focused on making the algorithm 100% correct and predictable. For example, our transaction manager is one of the few in the NoSQL world that supports so-called gap locks.
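The "first committer wins" rule can be illustrated with a toy optimistic transaction manager. This sketch is not vinyl's MVTO implementation. Each transaction reads from a full snapshot taken at begin() and tracks its read and write sets. At commit, it aborts if another transaction has committed a write to any of those keys since it started. Gap locks on ranges are not modelled, and all names are hypothetical.

```python
class ConflictError(Exception):
    pass


class TxManager:
    """Toy optimistic transaction manager: the first transaction to commit wins."""

    def __init__(self):
        self.data = {}
        self.clock = 0
        self.last_write = {}  # key -> commit timestamp of the latest write

    def begin(self):
        # Read-only work never blocks writers: it just uses this snapshot.
        return {"start": self.clock, "snapshot": dict(self.data),
                "reads": set(), "writes": {}}

    def read(self, tx, key):
        tx["reads"].add(key)
        return tx["writes"].get(key, tx["snapshot"].get(key))

    def write(self, tx, key, value):
        tx["writes"][key] = value

    def commit(self, tx):
        # Abort if someone committed a change to our keys after we started.
        for key in tx["reads"] | set(tx["writes"]):
            if self.last_write.get(key, -1) > tx["start"]:
                raise ConflictError(f"conflict on key {key!r}")
        self.clock += 1
        for key, value in tx["writes"].items():
            self.data[key] = value
            self.last_write[key] = self.clock
```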

Differences between the memtx and vinyl engines

The main difference between memtx and vinyl is that memtx is an in-memory engine, while vinyl is an on-disk engine. An in-memory engine is usually faster, with query execution time typically under 1 ms. That's why memtx is the default engine in Tarantool. However, if the database does not fit into the available memory and more memory cannot be added, an on-disk engine, in this case vinyl, is the better choice.

Feature | memtx | vinyl
--- | --- | ---
Supported index types | TREE, HASH, RTREE, or BITSET | TREE
Temporary spaces | Supported | Not supported
random() function | Supported | Not supported
alter() function | Supported | Supported since version 1.10.2 (the primary index cannot be modified)
len() function | Returns the number of tuples in the space | Returns the maximum approximate number of tuples in the space
count() function | Takes a constant amount of time | Takes a variable amount of time depending on the database state
delete() function | Returns the deleted tuple, if any | Always returns nil
Yielding | Does not yield on select requests unless the transaction is committed to the write-ahead log (WAL) | Yields on select requests or their equivalents: get() or pairs()

Configuration

Tarantool lets you configure the full topology of a cluster and set parameters specific to particular instances, such as connection settings, memory used to store data, logging, and snapshot settings. Each instance uses this configuration during startup to organize the cluster.

There are two approaches to configuring Tarantool:

YAML configuration describes the full topology of a Tarantool cluster. A cluster's topology consists of groups, which contain replica sets, which in turn contain instances:

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            # ...
          instance002:
            # ...

You can flexibly configure a cluster's settings at different levels, from global settings applied to all groups to parameters specific to particular instances.

Note

All the available options are documented in the Configuration reference.

This section provides an overview of how to configure Tarantool in a YAML file.

The example below shows a sample configuration of a single Tarantool instance:

# yaml-language-server: $schema=https://download.tarantool.org/tarantool/schema/config.schema.json

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
  • The instances section includes only one instance named instance001. The iproto.listen.uri option sets an address used to listen for incoming requests.
  • The replicasets section contains one replica set named replicaset001.
  • The groups section contains one group named group001.

Note

The first line in this sample links to an annotated Tarantool configuration schema for a YAML language server (e.g. LSP-Yaml). With this link, you can set up your code editor (VS Code, Neovim, Sublime, etc.) to show full-text annotations and completion prompts upon Alt+ESC (Linux) / Option+ESC (macOS) when you work with Tarantool configuration.


This section shows how to control the scope to which a configuration option applies. Most configuration options can be applied to a specific instance, replica set, or group, or to all instances globally.

  • Instance

    To apply certain configuration options to a specific instance, specify such options for this instance only. In the example below, iproto.listen is applied to instance001 only.

    groups:
      group001:
        replicasets:
          replicaset001:
            instances:
              instance001:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3301'
    
  • Replica set

    In this example, iproto.listen is in effect for all instances in replicaset001.

    groups:
      group001:
        replicasets:
          replicaset001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
            instances:
              instance001: { }
    
  • Group

    In this example, iproto.listen is in effect for all instances in group001.

    groups:
      group001:
        iproto:
          listen:
          - uri: '127.0.0.1:3301'
        replicasets:
          replicaset001:
            instances:
              instance001: { }
    
  • Global

    In this example, iproto.listen is applied to all instances of the cluster.

    iproto:
      listen:
      - uri: '127.0.0.1:3301'
    
    groups:
      group001:
        replicasets:
          replicaset001:
            instances:
              instance001: { }
    

Configuration scopes above are listed in the order of their precedence – from highest to lowest. For example, if the same option is defined at the instance and global level, the instance’s value takes precedence over the global one.
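The precedence rule can be expressed as a lookup that walks the scopes from highest to lowest. This sketch assumes the YAML file has already been parsed into nested dicts, for example with a YAML library, and it looks up a single option by its path. effective_option is a hypothetical helper, not part of Tarantool's config module.

```python
def effective_option(config, group, replicaset, instance, path):
    """Return an option's value for one instance, honoring scope precedence.

    config is the parsed YAML as nested dicts; path is e.g. ("iproto", "listen").
    Scopes are checked from highest to lowest precedence: instance,
    replica set, group, global.
    """
    g = config["groups"][group]
    rs = g["replicasets"][replicaset]
    inst = rs["instances"][instance]
    for scope in (inst, rs, g, config):
        node = scope
        for part in path:
            if not isinstance(node, dict) or part not in node:
                break                      # option not set in this scope
            node = node[part]
        else:
            return node                    # found: the highest scope wins
    return None
```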

Note

The Configuration reference contains information about scopes to which each configuration option can be applied.

The example below shows how specific configuration options work in different configuration scopes for a replica set with a manual failover. You can learn more about configuring replication from Replication tutorials.

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: manual

groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
  • credentials (global)

    This section is used to create the replicator user and assign it the specified role. These options are applied globally to all instances.

  • iproto (global, instance)

    The iproto section is specified at both the global and instance levels. The iproto.advertise.peer option specifies the parameters an instance uses to connect to another instance as a replica, for example, a URI, a login and password, or SSL parameters. In the example above, the option includes only a login. The URI is taken from iproto.listen, which is set at the instance level.

  • replication (global)

    The replication.failover global option sets a manual failover for all replica sets.

  • leader (replica set)

    The <replicaset-name>.leader option sets a master instance for replicaset001.

An application role is a Lua module that implements specific functions or logic. You can turn on or off a particular role for certain instances in a configuration without restarting these instances.

Roles can be built-in Tarantool roles, roles provided by third-party Lua modules, or custom roles developed as part of a cluster application. This section describes how to enable and configure roles. To learn how to develop custom roles, see Application roles.

To turn on or off a role for a specific instance or a set of instances, use the roles configuration option. The example below shows how to enable the roles.crud-router role provided by the CRUD module using the roles option:

roles: [ roles.crud-router ]

Similarly, you can enable the roles.crud-storage role to make instances act as CRUD storages:

roles: [ roles.crud-storage ]

Example on GitHub: sharded_cluster_crud

The roles_cfg option allows you to specify the configuration for each role. In this option, the role name is the key and the role configuration is the value.

The example below shows how to enable statistics on called operations by providing the roles.crud-router role’s configuration:

roles:
- roles.crud-router
- roles.metrics-export
roles_cfg:
  roles.crud-router:
    stats: true
    stats_driver: metrics
    stats_quantiles: true

Example on GitHub: sharded_cluster_crud_metrics

Like most configuration options, roles and their configurations can be defined at different levels. Because the roles option is an array and roles_cfg is a map, the configuration is applied with some specifics:

  • For roles, the roles defined for an instance take precedence over roles defined at other levels; the arrays are not merged. In the example below, instance001 has only role3:

    # ...
    replicaset001:
      roles: [ role1, role2 ]
      instances:
        instance001:
          roles: [ role3 ]
    

    Learn more about the order of precedence for different configuration scopes in Configuration scopes.

  • For roles_cfg, the following rules are applied:

    • If a configuration for the same role is provided at different levels, an instance configuration takes precedence over the configuration defined at another level. In the example below, role1.greeting is 'Hi':

      # ...
      replicaset001:
        roles_cfg:
          role1:
            greeting: 'Hello'
        instances:
          instance001:
            roles: [ role1 ]
            roles_cfg:
              role1:
                greeting: 'Hi'
      
    • If the configurations for different roles are provided at different levels, both configurations are applied at the instance level. In the example below, instance001 has role1.greeting set to 'Hi' and role2.farewell set to 'Bye':

      # ...
      replicaset001:
        roles_cfg:
          role1:
            greeting: 'Hi'
        instances:
          instance001:
            roles: [ role1, role2 ]
            roles_cfg:
              role2:
                farewell: 'Bye'
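The rules for roles and roles_cfg can be modelled as follows, with scopes passed from lowest to highest precedence. This is a hypothetical sketch of the documented behavior, not Tarantool code. It assumes that when the same role is configured at several levels, the whole role configuration from the higher-precedence scope is used.

```python
def effective_roles(*scopes):
    """Compute roles and roles_cfg for an instance.

    scopes are dicts ordered from lowest to highest precedence
    (global, group, replica set, instance).
    """
    roles, roles_cfg = [], {}
    for scope in scopes:
        if "roles" in scope:
            roles = list(scope["roles"])  # array: replaced, not appended
        for name, cfg in scope.get("roles_cfg", {}).items():
            roles_cfg[name] = cfg         # map: merged per role name
    return roles, roles_cfg
```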
      

Labels allow adding custom attributes to your cluster configuration. A label is an arbitrary key: value pair with a string key and value.

labels:
  dc: 'east'
  production: 'false'

Labels can be defined in any configuration scope. An instance receives labels from all scopes it belongs to. The labels section in a group or replica set scope applies to all instances of that group or replica set. To override these labels at the instance level or add instance-specific labels, define another labels section in the instance scope.

groups:
  group001:
    replicasets:
      replicaset001:
        labels:
          dc: 'east'
          production: 'false'
        instances:
          instance001:
            labels:
              rack: '10'
              production: 'true'

Example on GitHub: labels

To access instance labels from the application code, call the config:get() function:

myapp:instance001> require('config'):get('labels')
---
- production: 'true'
  rack: '10'
  dc: east
...

Labels can be used to direct function calls to instances that match certain criteria using the connpool module.

In a configuration file, you can use the following predefined variables that are replaced with actual values at runtime:

  • instance_name
  • replicaset_name
  • group_name

To reference these variables in a configuration file, enclose them in double curly braces, with a space between each pair of braces and the variable name. In the example below, {{ instance_name }} is replaced with instance001.

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            snapshot:
              dir: ./var/{{ instance_name }}/snapshots
            wal:
              dir: ./var/{{ instance_name }}/wals

As a result, the paths to snapshots and write-ahead logs differ for different instances.
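The substitution can be sketched with a regular expression. This assumes the documented form, with one space on each side of the variable name. substitute is a hypothetical helper, and raising KeyError for an unknown variable is a choice made for this sketch that may differ from Tarantool's own behavior.

```python
import re

VARIABLE = re.compile(r"\{\{ (\w+) \}\}")


def substitute(value, names):
    """Replace {{ name }} placeholders; unknown names raise KeyError here."""
    return VARIABLE.sub(lambda m: names[m.group(1)], value)


names = {"instance_name": "instance001",
         "replicaset_name": "replicaset001",
         "group_name": "group001"}
print(substitute("./var/{{ instance_name }}/snapshots", names))
```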

A YAML configuration can include parts that apply only to instances that meet certain conditions. This is useful for cluster upgrade scenarios: during an upgrade, instances can be running different Tarantool versions and therefore require different configurations.

Conditional parts are defined in the conditional configuration section in the global scope. It includes one or more if subsections. Each if subsection defines conditions and configuration parts that apply to instances that meet these conditions.

The example below shows a conditional section for cluster upgrade from Tarantool 3.0.0 to Tarantool 3.1.0:

  • The user-defined label upgraded is true on instances that are running Tarantool 3.1.0 or later. On older versions, it is false.
  • Two compat options that were introduced in 3.1.0 are defined for Tarantool 3.1.0 instances. On older versions, they would cause an error.
conditional:
  - if: tarantool_version < 3.1.0
    labels:
      upgraded: 'false'
  - if: tarantool_version >= 3.1.0
    labels:
      upgraded: 'true'
    compat:
      box_error_serialize_verbose: 'new'
      box_error_unpack_type_and_code: 'new'

Example on GitHub: conditional

if sections can use one variable, tarantool_version. It holds a three-number Tarantool version and can be compared with values of the same format using the comparison operators >, <, >=, <=, ==, and !=. You can write complex conditions using the logical operators || (OR) and && (AND). Parentheses () can be used to set operator precedence.

conditional:
  - if: (tarantool_version > 3.2.0 || tarantool_version == 3.1.3) && tarantool_version <= 3.99.0
    # ...

If the same option is set in multiple if sections that are true for an instance, this option receives the value from the section declared last in the configuration.

Example:

conditional:
  - if: tarantool_version >= 3.0.0
    labels:
        version: '3.0' # applies to versions >= 3.0.0 and < 3.1.0
  - if: tarantool_version >= 3.1.0
    labels:
        version: '3.1+' # applies to versions >= 3.1.0
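Evaluating an if expression and the "last section wins" rule can be sketched with a small recursive-descent parser. The grammar is inferred from the description above and is an assumption:

- Every comparison has tarantool_version on the left and a three-number version on the right.
- && binds tighter than ||.
- Versions are compared component by component.

evaluate and apply_conditional are hypothetical names, and only labels are merged.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\|\||&&|>=|<=|==|!=|<|>|\(|\)|tarantool_version|\d+\.\d+\.\d+)")
COMPARE = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
           "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def _version(text):
    return tuple(int(part) for part in text.split("."))


def evaluate(condition, tarantool_version):
    """Evaluate an `if:` expression for one Tarantool version string."""
    tokens, pos, text = [], 0, condition.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    current = _version(tarantool_version)

    def parse_or(i):
        value, i = parse_and(i)
        while i < len(tokens) and tokens[i] == "||":
            rhs, i = parse_and(i + 1)
            value = value or rhs
        return value, i

    def parse_and(i):
        value, i = parse_atom(i)
        while i < len(tokens) and tokens[i] == "&&":
            rhs, i = parse_atom(i + 1)
            value = value and rhs
        return value, i

    def parse_atom(i):
        if tokens[i] == "(":
            value, i = parse_or(i + 1)
            if i >= len(tokens) or tokens[i] != ")":
                raise ValueError("missing ')'")
            return value, i + 1
        name, op, literal = tokens[i:i + 3]
        if name != "tarantool_version" or op not in COMPARE:
            raise ValueError(f"unsupported comparison at token {i}")
        return COMPARE[op](current, _version(literal)), i + 3

    value, end = parse_or(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


def apply_conditional(sections, tarantool_version):
    """Merge labels from every matching section; later sections win."""
    labels = {}
    for section in sections:
        if evaluate(section["if"], tarantool_version):
            labels.update(section.get("labels", {}))
    return labels
```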

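For each configuration parameter, Tarantool provides two sets of predefined environment variables:

  • TT_<CONFIG_PARAMETER>: these variables override the corresponding options specified in a configuration file.
  • TT_<CONFIG_PARAMETER>_DEFAULT: these variables set default values for options that are missing from a configuration file.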

For example, TT_IPROTO_LISTEN and TT_IPROTO_LISTEN_DEFAULT correspond to the iproto.listen option. TT_SNAPSHOT_DIR and TT_SNAPSHOT_DIR_DEFAULT correspond to the snapshot.dir option. To see all the supported environment variables, execute the tarantool command with the --help-env-list option.

$ tarantool --help-env-list
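The examples above suggest a simple naming rule: take the dotted option path, replace dots with underscores, upper-case it, and add the TT_ prefix (plus _DEFAULT for the second set). The plain-Lua sketch below assumes exactly that rule; env_var_names() is a hypothetical helper, and tarantool --help-env-list remains the authoritative list of supported names.

```lua
-- Hypothetical helper (assumed naming rule, not a Tarantool API): derive the
-- environment variable names for a dotted option path.
local function env_var_names(option_path)
    local name = option_path:gsub('%.', '_')  -- keep only the string result of gsub
    name = 'TT_' .. name:upper()
    return name, name .. '_DEFAULT'
end

-- Prints TT_IPROTO_LISTEN and TT_IPROTO_LISTEN_DEFAULT, separated by a tab.
print(env_var_names('iproto.listen'))
-- Prints TT_SNAPSHOT_DIR and TT_SNAPSHOT_DIR_DEFAULT, separated by a tab.
print(env_var_names('snapshot.dir'))
```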

Note

There are also special TT_INSTANCE_NAME and TT_CONFIG environment variables that can be used to start the specified Tarantool instance with configuration from the given file.

Below are a few examples that show how to set environment variables of different types, like string, number, array, or map.

In this example, TT_LOG_LEVEL is used to set a logging level to CRITICAL:

$ export TT_LOG_LEVEL='crit'

In this example, a logging level is set to CRITICAL using a corresponding numeric value:

$ export TT_LOG_LEVEL=3

The examples below show how to set the TT_SHARDING_ROLES variable that accepts an array value. Arrays can be passed in two ways: using the simple format…

$ export TT_SHARDING_ROLES=router,storage

… or JSON format:

$ export TT_SHARDING_ROLES='["router", "storage"]'

The simple format is applicable only to arrays containing scalar values.

To assign map values to environment variables, you can also use simple or JSON formats. In the example below, TT_LOG_MODULES sets different logging levels for different modules using a simple format:

$ export TT_LOG_MODULES=module1=info,module2=error

In the next example, TT_ROLES_CFG is used to specify the value of a custom configuration for a role using a JSON format:

$ export TT_ROLES_CFG='{"greeter":{"greeting":"Hello"}}'

The simple format is applicable only to maps containing scalar values.
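In the simple formats, elements are separated by commas and each map entry is written as key=value. The plain-Lua sketch below shows how such values break down; parse_simple_array() and parse_simple_map() are hypothetical helpers for illustration, not the parser Tarantool actually uses.

```lua
-- Hypothetical sketch of the simple formats (not Tarantool's parser):
-- elements are separated by commas, and each map entry is key=value.
local function parse_simple_array(value)
    local items = {}
    for item in value:gmatch('[^,]+') do
        items[#items + 1] = item
    end
    return items
end

local function parse_simple_map(value)
    local map = {}
    for _, entry in ipairs(parse_simple_array(value)) do
        local key, val = entry:match('^([^=]+)=(.*)$')
        assert(key, 'expected key=value, got: ' .. entry)
        map[key] = val
    end
    return map
end

local roles = parse_simple_array('router,storage')
print(table.concat(roles, ' | '))  -- router | storage

local modules = parse_simple_map('module1=info,module2=error')
print(modules.module1)  -- info
print(modules.module2)  -- error
```

Splitting on every comma also explains why the simple formats are limited to scalar values: a nested value such as {"a":1,"b":2} would be cut apart at its inner comma, so such values need the JSON format.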

In the example below, TT_IPROTO_LISTEN is used to specify the listening host and port:

$ export TT_IPROTO_LISTEN='[{"uri":"127.0.0.1:3311"}]'

You can also pass several listening addresses:

$ export TT_IPROTO_LISTEN='[{"uri":"127.0.0.1:3311"},{"uri":"127.0.0.1:3312"}]'

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

Tarantool enables you to store configuration data in one place using a Tarantool or etcd-based storage. To achieve this, you need to:

  1. Set up a centralized configuration storage.

  2. Publish a cluster’s configuration to the storage.

  3. Configure a connection to the storage by providing a local YAML configuration with an endpoint address and key prefix in the config section:

    config:
      etcd:
        endpoints:
        - http://localhost:2379
        prefix: /myapp
    

Learn more from the following guide: Centralized configuration storages.

Tarantool configuration options are applied from multiple sources with the following precedence, from highest to lowest:

If the same option is defined in two or more locations, the option with the highest precedence is applied.

Centralized configuration storages

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

Examples on GitHub: centralized_config

Tarantool enables you to store a cluster’s configuration in one reliable place using a Tarantool or etcd-based storage:

With a local YAML configuration, you need to make sure that all cluster instances use identical configuration files:


Local configuration file

Using a centralized configuration storage, all instances get the actual configuration from one place:


Centralized configuration storage

This topic describes how to set up a configuration storage, publish a cluster configuration to this storage, and use this configuration for all cluster instances.

To make a replica set act as a configuration storage, use the built-in config.storage role.

To configure a Tarantool-based storage, follow the steps below:

  1. Define a replica set topology and specify the following options at the replica set level:

    • Enable the config.storage role in roles.
    • Optionally, provide the role configuration in roles_cfg. In the example below, the status_check_interval option sets the interval (in seconds) of status checks.
    groups:
      group001:
        replicasets:
          replicaset001:
            roles: [ config.storage ]
            roles_cfg:
              config.storage:
                status_check_interval: 3
            instances:
              instance001:
                iproto:
                  listen:
                  - uri: '127.0.0.1:4401'
              instance002:
                iproto:
                  listen:
                  - uri: '127.0.0.1:4402'
              instance003:
                iproto:
                  listen:
                  - uri: '127.0.0.1:4403'
    
  2. Create a user and grant them the following privileges:

    • The read and write permissions to the config_storage and config_storage_meta spaces used to store configuration data.
    • The execute permission to universe to allow interacting with the storage using the tt utility.
    credentials:
      users:
        sampleuser:
          password: '123456'
          privileges:
          - permissions: [ read, write ]
            spaces: [ config_storage, config_storage_meta ]
          - permissions: [ execute ]
            universe: true
    
  3. Set the replication.failover option to election to enable automated failover:

    replication:
      failover: election
    
  4. Enable the MVCC transaction mode to provide linearizability of read operations:

    database:
      use_mvcc_engine: true
    

The resulting storage configuration might look as follows:

credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ read, write ]
        spaces: [ config_storage, config_storage_meta ]
      - permissions: [ execute ]
        universe: true
    replicator:
      password: 'topsecret'
      roles: [ replication ]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: election

database:
  use_mvcc_engine: true

groups:
  group001:
    replicasets:
      replicaset001:
        roles: [ config.storage ]
        roles_cfg:
          config.storage:
            status_check_interval: 3
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:4401'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:4402'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:4403'

You can find the full example here: tarantool_config_storage.

To start instances of the configured storage, use the tt start command, for example:

$ tt start tarantool_config_storage

Learn more from the Starting and stopping instances section.

To learn how to set up an etcd-based configuration storage, consult the etcd documentation.

The example script below demonstrates how to use the etcdctl utility to create a user that has read and write access to configurations stored under the /myapp/ prefix:

etcdctl user add root:topsecret
etcdctl role add myapp_config_manager
etcdctl role grant-permission myapp_config_manager --prefix=true readwrite /myapp/
etcdctl user add sampleuser:123456
etcdctl user grant-role sampleuser myapp_config_manager
etcdctl auth enable

The credentials of this user should be specified when configuring a connection to the etcd cluster.

The tt utility provides the tt cluster command for managing centralized cluster configurations. The tt cluster publish command can be used to publish a cluster’s configuration to both Tarantool and etcd-based storages.

The example below shows how a tt environment and a layout of the application called myapp might look:

├── tt.yaml
├── source.yaml
└── instances.enabled
    └── myapp
        ├── config.yaml
        └── instances.yml
  • tt.yaml: a tt configuration file.
  • source.yaml contains a cluster’s configuration to be published.
  • config.yaml contains a local configuration used to connect to the centralized storage.
  • instances.yml specifies instances to run in the current environment. The configured instances are used by tt when starting a cluster. tt cluster publish ignores this configuration file.

To publish a cluster’s configuration (source.yaml) to a centralized storage, execute tt cluster publish as follows:

$ tt cluster publish "http://sampleuser:123456@localhost:2379/myapp" source.yaml

Executing this command publishes the cluster configuration under the /myapp/config/all key.

Note

You can see a cluster’s configuration using the tt cluster show command.

The config module provides the API for interacting with a Tarantool-based configuration storage. The example below shows how to read a configuration stored in the source.yaml file using the fio module API and put this configuration under the /myapp/config/all key:

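local fio = require('fio')

-- Read the whole cluster configuration from the source file.
local cluster_config_handle = assert(fio.open('../../source.yaml'))
local cluster_config = cluster_config_handle:read()
cluster_config_handle:close()

-- Store the configuration under the <prefix>/config/all key.
local response = config.storage.put('/myapp/config/all', cluster_config)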

Learn more from the config.storage API section.

Note

The net.box module provides the ability to monitor configuration updates by watching path or prefix changes. Learn more in conn:watch().

To publish a cluster’s configuration to etcd using the etcdctl utility, use the put command:

$ etcdctl put /myapp/config/all < source.yaml

Note

For etcd versions earlier than 3.4, you need to set the ETCDCTL_API environment variable to 3.

To use a configuration from a centralized storage for your cluster, you need to provide connection settings in a local configuration file.

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

Connection options for a Tarantool-based storage should be specified in the config.storage section of the configuration file. In the example below, the following options are specified:

config:
  storage:
    endpoints:
      - uri: '127.0.0.1:4401'
        login: sampleuser
        password: '123456'
      - uri: '127.0.0.1:4402'
        login: sampleuser
        password: '123456'
      - uri: '127.0.0.1:4403'
        login: sampleuser
        password: '123456'
    prefix: /myapp
    timeout: 3
    reconnect_after: 5
  • endpoints specifies the list of configuration storage endpoints.
  • prefix sets a key prefix used to search for a configuration. Tarantool looks up keys under the following path: <prefix>/config/*. Note that <prefix> should start with a slash (/).
  • timeout specifies the interval (in seconds) to perform the status check of a configuration storage.
  • reconnect_after specifies how much time to wait (in seconds) before reconnecting to a configuration storage.
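With prefix: /myapp, the configuration published in this topic lives under the /myapp/config/all key. The plain-Lua sketch below spells out how that key follows from the <prefix>/config/* rule; config_key() is a hypothetical helper used only for illustration.

```lua
-- Hypothetical helper (not a Tarantool API): build the key under which a
-- configuration named `name` is stored for a given prefix (<prefix>/config/*).
local function config_key(prefix, name)
    assert(prefix:sub(1, 1) == '/', 'prefix should start with a slash: ' .. prefix)
    return prefix .. '/config/' .. name
end

print(config_key('/myapp', 'all'))  -- /myapp/config/all
```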

You can find the full example here: config_storage.

Connection options for etcd should be specified in the config.etcd section of the configuration file. In the example below, the following options are specified:

config:
  etcd:
    endpoints:
    - http://localhost:2379
    prefix: /myapp
    username: sampleuser
    password: '123456'
    http:
      request:
        timeout: 3
  • endpoints specifies the list of etcd endpoints.
  • prefix sets a key prefix used to search for a configuration. Tarantool looks up keys under the following path: <prefix>/config/*. Note that <prefix> should start with a slash (/).
  • username and password specify credentials used for authentication.
  • http.request.timeout configures a request timeout for an etcd server.

You can find the full example here: config_etcd.

Note

To run instances in production, it is recommended to use the Ansible Tarantool Enterprise installer (ATE). ATE is a set of Ansible playbooks that are used to deploy and maintain Tarantool Enterprise products. ATE documentation is available to users logged in on the Tarantool website.

The tt utility is the recommended way to start Tarantool instances. You can learn how to do this from the Starting and stopping instances section.

You can also use the tarantool command to start a Tarantool instance. In this case, you can skip creating a local configuration file and provide connection settings using the following environment variables:

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

The example below shows how to provide etcd connection settings and start cluster instances using the tarantool command:

$ export TT_CONFIG_ETCD_ENDPOINTS=http://localhost:2379
$ export TT_CONFIG_ETCD_PREFIX=/myapp

$ tarantool --name instance001
$ tarantool --name instance002
$ tarantool --name instance003

By default, Tarantool watches keys with the specified prefix for changes in a cluster’s configuration and reloads a changed configuration automatically. If necessary, you can set the config.reload option to manual to turn off configuration reloading:

config:
  reload: 'manual'
  etcd:
    # ...

In this case, you can reload a configuration in an admin console or application code using the reload() function provided by the config module:

require('config'):reload()

Configuration in code

Note

Starting with the 3.0 version, the recommended way of configuring Tarantool is using a configuration file. Configuring Tarantool in code is considered a legacy approach.

This topic covers the specifics of configuring Tarantool in code using the box.cfg API. In this case, a configuration is stored in an initialization file, a Lua script with the specified configuration options. You can find all the available options in the Configuration reference.

If the command to start Tarantool includes an instance file, Tarantool begins by invoking the Lua program in that file, which is often named init.lua. The Lua program may get further arguments from the command line or may use operating-system functions, such as getenv(). The Lua program almost always begins by invoking box.cfg() if the database server will be used or if ports need to be opened. For example, suppose init.lua contains the following lines:

#!/usr/bin/env tarantool
box.cfg{
    listen              = os.getenv("LISTEN_URI"),
    memtx_memory        = 33554432,
    pid_file            = "tarantool.pid",
    wal_max_size        = 2500
}
print('Starting ', arg[1])

Suppose also that the environment variable LISTEN_URI contains 3301 and the command line is tarantool init.lua ARG. Then the screen might look like this:

$ export LISTEN_URI=3301
$ tarantool init.lua ARG
... main/101/init.lua C> Tarantool 2.8.3-0-g01023dbc2
... main/101/init.lua C> log level 5
... main/101/init.lua I> mapping 33554432 bytes for memtx tuple arena...
... main/101/init.lua I> recovery start
... main/101/init.lua I> recovering from './00000000000000000000.snap'
... main/101/init.lua I> set 'listen' configuration option to "3301"
... main/102/leave_local_hot_standby I> ready to accept requests
Starting  ARG
... main C> entering the event loop

If you wish to start an interactive session on the same terminal after initialization is complete, you can pass the -i command-line option.

Starting from version 2.8.1, you can specify configuration parameters via special environment variables. The name of a variable should have the following pattern: TT_<NAME>, where <NAME> is the uppercase name of the corresponding box.cfg parameter.

For example:

In case of an array value, separate the array elements with commas, without spaces:

export TT_REPLICATION="localhost:3301,localhost:3302"

If you need to pass additional parameters for a URI, use the ? and & delimiters:

export TT_LISTEN="localhost:3301?param1=value1&param2=value2"

An empty variable (TT_LISTEN=) has the same effect as an unset one, meaning that the corresponding configuration parameter won’t be set when calling box.cfg{}.
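The rules above (TT_ plus the upper-cased box.cfg name, comma-separated arrays, empty meaning unset) can be sketched in plain Lua as follows. cfg_from_env() is a hypothetical helper, not how Tarantool reads these variables internally; getenv is passed in so the sketch can be tried without touching the real environment (use os.getenv in practice), and conversion of numbers and booleans is deliberately not modeled.

```lua
-- Hypothetical sketch (not Tarantool's internal logic): collect box.cfg
-- options from TT_<NAME> variables. Values stay strings; type conversion
-- is not modeled.
local function cfg_from_env(names, getenv, array_options)
    local cfg = {}
    for _, name in ipairs(names) do
        local value = getenv('TT_' .. name:upper())
        -- An empty variable is treated as if it were unset.
        if value ~= nil and value ~= '' then
            if array_options[name] then
                -- Array elements are separated by commas without spaces.
                local items = {}
                for item in value:gmatch('[^,]+') do
                    items[#items + 1] = item
                end
                cfg[name] = items
            else
                cfg[name] = value
            end
        end
    end
    return cfg
end

-- A fake environment stands in for os.getenv in this example.
local fake_env = {
    TT_REPLICATION = 'localhost:3301,localhost:3302',
    TT_LISTEN = '',
}
local cfg = cfg_from_env({ 'listen', 'replication' },
    function(key) return fake_env[key] end, { replication = true })

print(cfg.listen)        -- nil
print(#cfg.replication)  -- 2
```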

Configuration parameters have the form:

box.cfg{[key = value [, key = value ...]]}

Configuration parameters can be set in a Lua initialization file, which is specified on the Tarantool command line.

Most configuration parameters are for allocating resources, opening ports, and specifying database behavior. All parameters are optional. Most of the parameters are dynamic, that is, they can be changed at runtime by calling box.cfg{} a second time. For example, the command below sets the listen port to 3301.

tarantool> box.cfg{ listen = 3301 }
2023-05-10 13:28:54.667 [31326] main/103/interactive I> tx_binary: stopped
2023-05-10 13:28:54.667 [31326] main/103/interactive I> tx_binary: bound to [::]:3301
2023-05-10 13:28:54.667 [31326] main/103/interactive/box.load_cfg I> set 'listen' configuration option to 3301
---
...

To see all the non-null parameters, execute box.cfg (no parentheses).

tarantool> box.cfg
---
- replication_skip_conflict: false
  wal_queue_max_size: 16777216
  feedback_host: https://feedback.tarantool.io
  memtx_dir: .
  memtx_min_tuple_size: 16
  -- other parameters --
...

To see a particular parameter value, call a corresponding box.cfg option. For example, box.cfg.listen shows the specified listen address.

tarantool> box.cfg.listen
---
- 3301
...

Some configuration parameters and some functions depend on a URI (Uniform Resource Identifier). The URI string format is similar to the generic URI syntax. It may contain (in order):

Only a port number is always mandatory. A password is mandatory if a user name is specified, unless the user name is "guest".

Formally, the URI syntax is [host:]port or [username:password@]host:port. If a host is omitted, then "0.0.0.0" or "[::]" is assumed, meaning any IPv4 address or any IPv6 address on the local machine, respectively. If username:password is omitted, then the "guest" user is assumed. Some examples:

URI fragment                   Example
port                           3301
host:port                      127.0.0.1:3301
username:password@host:port    notguest:sesame@mail.ru:3301

In code, the URI value can be passed as a number (if only a port is specified) or a string:

box.cfg { listen = 3301 }

box.cfg { listen = "127.0.0.1:3301" }
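The defaults in the forms above (host omitted means any address, credentials omitted means guest) can be illustrated with a short plain-Lua sketch. parse_uri() is a hypothetical helper: it handles only the [host:]port and [username:password@]host:port forms, picks the IPv4 default "0.0.0.0", and ignores Unix sockets and URI parameters. Real code should use Tarantool's uri module, described below.

```lua
-- Hypothetical sketch (not the uri module): split the [host:]port and
-- [username:password@]host:port forms into components.
local function parse_uri(s)
    s = tostring(s)  -- a URI may be given as a number, e.g. 3301
    local user, password, rest = s:match('^([^:@]+):([^@]*)@(.+)$')
    if not user then
        -- No credentials: the guest user is assumed.
        user, rest = 'guest', s
    end
    local host, port = rest:match('^(.+):(%d+)$')
    if not host then
        -- Only a port: any IPv4 address is assumed in this sketch.
        host, port = '0.0.0.0', rest:match('^(%d+)$')
    end
    assert(port, 'a port number is mandatory: ' .. s)
    return { user = user, password = password, host = host, port = tonumber(port) }
end

local u = parse_uri('notguest:sesame@mail.ru:3301')
print(u.user, u.host, u.port)  -- notguest, mail.ru, and 3301, separated by tabs
```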

In certain circumstances, a Unix domain socket may be used where a URI is expected, for example, unix/:/tmp/unix_domain_socket.sock or simply /tmp/unix_domain_socket.sock.

The uri module provides functions that convert URI strings into their components or turn components into URI strings.

Starting from version 2.10.0, a user can open several listening iproto sockets on a Tarantool instance and, consequently, can specify several URIs in the configuration parameters such as box.cfg.listen and box.cfg.replication.

URI values can be set in a number of ways:

  • As a string with URI values separated by commas.

    box.cfg { listen = "127.0.0.1:3301, /unix.sock, 3302" }
    
  • As a table that contains URIs in the string format.

    box.cfg { listen = {"127.0.0.1:3301", "/unix.sock", "3302"} }
    
  • As an array of tables with the uri field.

    box.cfg { listen = {
            {uri = "127.0.0.1:3301"},
            {uri = "/unix.sock"},
            {uri = 3302}
        }
    }
    
  • In a combined way – an array that contains URIs in both the string and the table formats.

    box.cfg { listen = {
            "127.0.0.1:3301",
            { uri = "/unix.sock" },
            { uri = 3302 }
        }
    }
    

Also, starting from version 2.10.0, it is possible to specify additional parameters for URIs. You can do this in different ways:

  • Using the ? delimiter when URIs are specified in a string format.

    box.cfg { listen = "127.0.0.1:3301?p1=value1&p2=value2, /unix.sock?p3=value3" }
    
  • Using the params table: a URI is passed in a table with additional parameters in the params table. Parameters in the params table overwrite the ones from the URI string (value2 overwrites value1 for p1 in the example below).

    box.cfg { listen = {
            "127.0.0.1:3301?p1=value1",
            params = {p1 = "value2", p2 = "value3"}
        }
    }
    
  • Using the default_params table for specifying default parameter values.

    In the example below, two URIs are passed in a table. The default value for the p3 parameter is defined in the default_params table and used if this parameter is not specified in URIs. Parameters in the default_params table are applicable to all the URIs passed in a table.

    box.cfg { listen = {
            "127.0.0.1:3301?p1=value1",
            { uri = "/unix.sock", params = { p2 = "value2" } },
            default_params = { p3 = "value3" }
        }
    }
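For a single URI, the three sources of parameters described above combine with this precedence: default_params, then parameters from the URI string, then the params table. The plain-Lua sketch below models that precedence; effective_params() is a hypothetical helper, not the merging code Tarantool uses.

```lua
-- Hypothetical sketch of the precedence for one URI:
-- default_params < parameters in the URI string < the params table.
local function effective_params(uri_string, params, default_params)
    local result = {}
    -- Defaults apply only when nothing else sets the parameter.
    for key, value in pairs(default_params or {}) do
        result[key] = value
    end
    -- Parameters after "?" in the URI string, separated by "&".
    local query = uri_string:match('%?(.*)$')
    if query then
        for key, value in query:gmatch('([^&=]+)=([^&]*)') do
            result[key] = value
        end
    end
    -- The params table overwrites everything else.
    for key, value in pairs(params or {}) do
        result[key] = value
    end
    return result
end

local p = effective_params('127.0.0.1:3301?p1=value1', { p1 = 'value2', p2 = 'value3' })
print(p.p1)  -- value2
print(p.p2)  -- value3
```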
    

The recommended way to specify URIs with additional parameters is the following:

box.cfg { listen = {
        {uri = "127.0.0.1:3301", params = {p1 = "value1"}},
        {uri = "/unix.sock", params = {p2 = "value2"}},
        {uri = 3302, params = {p3 = "value3"}}
    }
}

In case of a single URI, the following syntax also works:

box.cfg { listen = {
        uri = "127.0.0.1:3301",
        params = { p1 = "value1", p2 = "value2" }
    }
}

Enterprise Edition

Traffic encryption is supported by the Enterprise Edition only.

Since version 2.10.0, Tarantool Enterprise Edition has built-in support for using SSL to encrypt client-server communication over binary connections, that is, between Tarantool instances in a cluster or when connecting to an instance via connectors using net.box.

Tarantool uses the OpenSSL library that is included in the delivery package. Note that SSL connections use only TLSv1.2.

To configure traffic encryption, you need to set the special URI parameters for a particular connection. The parameters can be set for the following box.cfg options and net.box method:

Below is the list of the parameters. In the next section, you can find details and examples on what should be configured on both the server side and the client side.

  • transport – enables SSL encryption for a connection if set to ssl. The default value is plain, which means the encryption is off. If the parameter is not set, the encryption is off too. Other encryption-related parameters can be used only if transport = 'ssl' is set.

    Example:

    local connection = require('net.box').connect({
        uri = 'admin:topsecret@127.0.0.1:3301',
        params = { transport = 'ssl',
                   ssl_cert_file = 'certs/instance001/server001.crt',
                   ssl_key_file = 'certs/instance001/server001.key',
                   ssl_password = 'qwerty' }
    })
    
  • ssl_key_file – a path to a private SSL key file. Mandatory for a server. For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional. If the private key is encrypted, provide a password for it in the ssl_password or ssl_password_file parameter.

  • ssl_cert_file – a path to an SSL certificate file. Mandatory for a server. For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.

  • ssl_ca_file – a path to a trusted certificate authorities (CA) file. Optional. If not set, the peer won’t be checked for authenticity.

    Both a server and a client can use the ssl_ca_file parameter:

    • If it’s on the server side, the server verifies the client.
    • If it’s on the client side, the client verifies the server.
    • If both sides have the CA files, the server and the client verify each other.
  • ssl_ciphers – a colon-separated (:) list of SSL cipher suites the connection can use. See the Supported ciphers section for details. Optional. Note that the list is not validated: if a cipher suite is unknown, Tarantool just ignores it. If no usable cipher suite remains, the connection is not established, and the log reports that no shared cipher was found.

  • ssl_password – a password for an encrypted private SSL key. Optional. Alternatively, the password can be provided in ssl_password_file.

  • ssl_password_file – a text file with one or more passwords for encrypted private SSL keys (each on a separate line). Optional. Alternatively, the password can be provided in ssl_password.

    Tarantool applies the ssl_password and ssl_password_file parameters in the following order:

    1. If ssl_password is provided, Tarantool tries to decrypt the private key with it.
    2. If ssl_password is incorrect or isn’t provided, Tarantool tries all passwords from ssl_password_file one by one in the order they are written.
    3. If ssl_password and all passwords from ssl_password_file are incorrect, or none of them is provided, Tarantool treats the private key as unencrypted.
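The order above is a simple fallback chain. The plain-Lua sketch below models it; choose_key_password() is hypothetical, and try_decrypt() is an assumed stand-in for the real private-key decryption attempt (it returns true when a password fits). Reading ssl_password_file into a list of lines is also left out.

```lua
-- Hypothetical sketch of the order in which key passwords are tried.
-- `try_decrypt(password)` is an assumed stand-in for the real decryption
-- attempt and returns true when the password fits the private key.
local function choose_key_password(ssl_password, password_file_lines, try_decrypt)
    -- 1. ssl_password is tried first.
    if ssl_password ~= nil and try_decrypt(ssl_password) then
        return ssl_password
    end
    -- 2. Then each line of ssl_password_file, in file order.
    for _, candidate in ipairs(password_file_lines or {}) do
        if try_decrypt(candidate) then
            return candidate
        end
    end
    -- 3. Nothing worked or nothing was given: the key is treated as unencrypted.
    return nil
end

local function try_decrypt(password) return password == 'qwerty' end
print(choose_key_password('wrong', { 'first', 'qwerty' }, try_decrypt))  -- qwerty
print(choose_key_password(nil, nil, try_decrypt))                       -- nil
```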

Configuration example:

box.cfg{ listen = {
    uri = 'localhost:3301',
    params = {
        transport = 'ssl',
        ssl_key_file = '/path_to_key_file',
        ssl_cert_file = '/path_to_cert_file',
        ssl_ciphers = 'HIGH:!aNULL',
        ssl_password = 'topsecret'
    }
}}

Tarantool Enterprise supports the following cipher suites:

  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-RSA-AES256-GCM-SHA384
  • DHE-RSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-CHACHA20-POLY1305
  • ECDHE-RSA-CHACHA20-POLY1305
  • DHE-RSA-CHACHA20-POLY1305
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES128-GCM-SHA256
  • DHE-RSA-AES128-GCM-SHA256
  • ECDHE-ECDSA-AES256-SHA384
  • ECDHE-RSA-AES256-SHA384
  • DHE-RSA-AES256-SHA256
  • ECDHE-ECDSA-AES128-SHA256
  • ECDHE-RSA-AES128-SHA256
  • DHE-RSA-AES128-SHA256
  • ECDHE-ECDSA-AES256-SHA
  • ECDHE-RSA-AES256-SHA
  • DHE-RSA-AES256-SHA
  • ECDHE-ECDSA-AES128-SHA
  • ECDHE-RSA-AES128-SHA
  • DHE-RSA-AES128-SHA
  • AES256-GCM-SHA384
  • AES128-GCM-SHA256
  • AES256-SHA256
  • AES128-SHA256
  • AES256-SHA
  • AES128-SHA
  • GOST2012-GOST8912-GOST8912
  • GOST2001-GOST89-GOST89

The Tarantool Enterprise static build has an embedded engine that supports the GOST cryptographic algorithms. If you use these algorithms for traffic encryption, specify the corresponding cipher suite in the ssl_ciphers parameter, for example:

box.cfg{ listen = {
    uri = 'localhost:3301',
    params = {
        transport = 'ssl',
        ssl_key_file = '/path_to_key_file',
        ssl_cert_file = '/path_to_cert_file',
        ssl_ciphers = 'GOST2012-GOST8912-GOST8912'
    }
}}

For detailed information on SSL ciphers and their syntax, refer to OpenSSL documentation.

The URI parameters for traffic encryption can also be set via environment variables, for example:

export TT_LISTEN="localhost:3301?transport=ssl&ssl_cert_file=/path_to_cert_file&ssl_key_file=/path_to_key_file"

When configuring traffic encryption, you need to specify the necessary parameters on both the server side and the client side. Below you can find a summary of the options and parameters to use, along with configuration examples.

Server side

  • Is configured via the box.cfg.listen option.
  • Mandatory URI parameters: transport, ssl_key_file and ssl_cert_file.
  • Optional URI parameters: ssl_ca_file, ssl_ciphers, ssl_password, and ssl_password_file.

Client side

  • Is configured via the box.cfg.replication option or net_box_object.connect().

Parameters:

  • If the server side has only the transport, ssl_key_file and ssl_cert_file parameters set, on the client side, you need to specify only transport = ssl as the mandatory parameter. All other URI parameters are optional.
  • If the server side also has the ssl_ca_file parameter set, on the client side, you need to specify transport, ssl_key_file and ssl_cert_file as the mandatory parameters. Other parameters – ssl_ca_file, ssl_ciphers, ssl_password, and ssl_password_file – are optional.
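The client-side rules above amount to a small decision table, restated in plain Lua below. client_mandatory_params() is a hypothetical helper; its first branch (a server that does not use transport = 'ssl' requires no encryption parameters from the client) is an assumption based on plain being the default transport.

```lua
-- Hypothetical helper restating the client-side rules above: which URI
-- parameters the client must set, given the server's listen parameters.
local function client_mandatory_params(server_params)
    if server_params.transport ~= 'ssl' then
        return {}  -- assumption: an unencrypted server needs nothing extra
    end
    if server_params.ssl_ca_file ~= nil then
        -- The server verifies clients, so the client needs its own key and certificate.
        return { 'transport', 'ssl_key_file', 'ssl_cert_file' }
    end
    return { 'transport' }
end

local needed = client_mandatory_params({
    transport = 'ssl',
    ssl_key_file = '/path_to_key_file',
    ssl_cert_file = '/path_to_cert_file',
    ssl_ca_file = '/path_to_ca_file',
})
print(table.concat(needed, ', '))  -- transport, ssl_key_file, ssl_cert_file
```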

Suppose there is a master-replica set with two Tarantool instances:

  • 127.0.0.1:3301 – master (server)
  • 127.0.0.1:3302 – replica (client)

The examples below show the configuration related to connection encryption for two cases: when the trusted certificate authorities (CA) file is not set on the server side and when it is. Only mandatory URI parameters are mentioned in these examples.

  1. Without CA
  • 127.0.0.1:3301 – master (server)

    box.cfg{
        listen = {
            uri = '127.0.0.1:3301',
            params = {
                transport = 'ssl',
                ssl_key_file = '/path_to_key_file',
                ssl_cert_file = '/path_to_cert_file'
            }
        }
    }
    
  • 127.0.0.1:3302 – replica (client)

    box.cfg{
        listen = {
            uri = '127.0.0.1:3302',
            params = {transport = 'ssl'}
        },
        replication = {
            uri = 'username:password@127.0.0.1:3301',
            params = {transport = 'ssl'}
        },
        read_only = true
    }
    
  2. With CA
  • 127.0.0.1:3301 – master (server)

    box.cfg{
        listen = {
            uri = '127.0.0.1:3301',
            params = {
                transport = 'ssl',
                ssl_key_file = '/path_to_key_file',
                ssl_cert_file = '/path_to_cert_file',
                ssl_ca_file = '/path_to_ca_file'
            }
        }
    }
    
  • 127.0.0.1:3302 – replica (client)

    box.cfg{
        listen = {
            uri = '127.0.0.1:3302',
            params = {
                transport = 'ssl',
                ssl_key_file = '/path_to_key_file',
                ssl_cert_file = '/path_to_cert_file'
            }
        },
        replication = {
            uri = 'username:password@127.0.0.1:3301',
            params = {
                transport = 'ssl',
                ssl_key_file = '/path_to_key_file',
                ssl_cert_file = '/path_to_cert_file'
            }
        },
        read_only = true
    }
    

Below is the syntax for starting a Tarantool instance configured in a Lua initialization script:

$ tarantool LUA_INITIALIZATION_FILE [OPTION ...]

The tarantool command also provides a set of options that might be helpful for development purposes.

The command below starts a Tarantool instance configured in the init.lua file:

$ tarantool init.lua

Storage

This section contains guides on configuring a storage.

In-memory storage

Example on GitHub: memtx

In Tarantool, all data is stored in random-access memory (RAM) by default. For this purpose, the memtx storage engine is used.

This topic describes how to define basic settings related to in-memory storage in the memtx section of a YAML configuration – for example, memory size and maximum tuple size. For the specific settings related to allocator or sorting threads, check the corresponding memtx options in the Configuration reference.

Note

To estimate the required amount of memory, you can use the sizing calculator.

In Tarantool, data is stored in spaces. Each space consists of tuples – the database records. To specify the amount of memory that Tarantool allocates to store tuples, use the memtx.memory configuration option.

In the example below, the memory size is set to 1 GB (1073741824 bytes):

memtx:
  memory: 1073741824

Tarantool does not exceed this limit when allocating tuples. Additional memory is used for indexes and connection information.

When the memtx.memory limit is reached, INSERT or UPDATE requests fail with ER_MEMORY_ISSUE.

You can configure the minimum and the maximum tuple sizes in bytes.

To define the tuple size, use the memtx.min_tuple_size and memtx.max_tuple_size configuration options.

In the example, the minimum size is set to 8 bytes and the maximum size is set to 5 MB:

memtx:
  memory: 1073741824
  min_tuple_size: 8
  max_tuple_size: 5242880
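The byte values in these examples use binary multiples: "1 GB" here means 1024^3 bytes and "5 MB" means 5 × 1024^2 bytes. The short Lua snippet below spells out that arithmetic.

```lua
-- Binary multiples behind the byte values used in the memtx examples.
local KiB = 1024
local MiB = 1024 * KiB
local GiB = 1024 * MiB

print(1 * GiB)  -- 1073741824, the memtx.memory value
print(5 * MiB)  -- 5242880, the memtx.max_tuple_size value
```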

Persistence

To ensure data persistence, Tarantool provides the abilities to:

This topic describes how to configure:

To learn more about the persistence mechanism in Tarantool, see the Persistence section. The formats of WAL and snapshot files are described in detail in the File formats section.

Example on GitHub: snapshot

This section describes how to define snapshot settings in the snapshot section of a YAML configuration.

Note

To force immediate creation of a snapshot file, use the box.snapshot() function.

In Tarantool, it is possible to automate the snapshot creation. Automatic creation is enabled by default and can be configured in two ways:

  • A new snapshot is taken once in a given period (see snapshot.by.interval).
  • A new snapshot is taken once the size of all WAL files created since the last snapshot exceeds a given limit (see snapshot.by.wal_size).

The snapshot.by.interval option sets up the checkpoint daemon that takes a new snapshot every snapshot.by.interval seconds. If the snapshot.by.interval option is set to zero, the checkpoint daemon is disabled.

The snapshot.by.wal_size option defines the maximum size in bytes for all WAL files created since the last snapshot. Once this size is exceeded, the checkpoint daemon takes a snapshot. Then, the Tarantool garbage collector deletes the old WAL files.

The example shows how to specify the snapshot.by.interval and the snapshot.by.wal_size options:

snapshot:
  by:
    interval: 7200
    wal_size: 1000000000000000000

In the example, a new snapshot is created in two cases:

  • every 2 hours (every 7200 seconds)
  • when the total size of all WAL files created since the last snapshot exceeds 1e18 (1000000000000000000) bytes. This is about one exabyte, so in practice the size-based trigger rarely fires with this value.
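The two triggers can be modeled as a simple check. This is a plain-Lua sketch, not Tarantool's checkpoint daemon; snapshot_due and snapshot_cfg are hypothetical names, and the sketch assumes a non-zero interval.

```lua
-- Plain-Lua model of the two automatic snapshot triggers (not Tarantool code).
-- Assumes cfg.interval > 0; a zero interval disables the checkpoint daemon
-- and is not modelled here.
snapshot_cfg = {
  interval = 7200,                 -- snapshot.by.interval, in seconds
  wal_size = 1000000000000000000,  -- snapshot.by.wal_size, in bytes
}

-- seconds_since_last: time elapsed since the last snapshot
-- wal_bytes: total size of the WAL files written since the last snapshot
function snapshot_due(cfg, seconds_since_last, wal_bytes)
  if seconds_since_last >= cfg.interval then
    return true, 'interval'
  end
  if wal_bytes > cfg.wal_size then
    return true, 'wal_size'
  end
  return false
end

-- One hour after the last snapshot, with 5 GiB of WAL written: not yet due
print(snapshot_due(snapshot_cfg, 3600, 5 * 1024 ^ 3))
-- Two hours after the last snapshot: due because of the interval
print(snapshot_due(snapshot_cfg, 7200, 0))
```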

To configure a directory where the snapshot files are stored, use the snapshot.dir configuration option. The example below shows how to specify a snapshot directory for instance001 explicitly:

instance001:
  snapshot:
    dir: 'var/lib/{{ instance_name }}/snapshots'

By default, WAL files and snapshot files are stored in the same directory var/lib/{{ instance_name }}. However, you can specify different directories for them. For example, you can place snapshots and write-ahead logs on different hard drives for better reliability:

instance001:
  snapshot:
    dir: '/media/drive1/snapshots'
  wal:
    dir: '/media/drive2/wals'

You can set a limit on the number of snapshots stored in the snapshot.dir directory using the snapshot.count option. Once the number of snapshots reaches the given limit, the Tarantool garbage collector deletes the oldest snapshot file and any associated WAL files after the new snapshot is taken.

In the example below, the snapshot is created every two hours (every 7200 seconds) until there are three snapshots in the snapshot.dir directory. After creating a new snapshot (the fourth one), the oldest snapshot and the corresponding WALs are deleted.

snapshot:
  count: 3
  by:
    interval: 7200
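The retention rule can be sketched in plain Lua. take_snapshot is a hypothetical helper that only tracks snapshot timestamps; it ignores WAL files and the cases in which the garbage collector keeps files, so treat it as an illustration of the count limit rather than Tarantool's actual garbage collector.

```lua
-- Simplified model of snapshot.count retention (not Tarantool's GC).
-- After a new snapshot is taken, the oldest snapshots beyond the limit
-- are removed. WAL cleanup is omitted. Assumes count >= 1.
function take_snapshot(snapshots, taken_at, count)
  table.insert(snapshots, taken_at)                    -- newest goes last
  local deleted = {}
  while #snapshots > count do
    table.insert(deleted, table.remove(snapshots, 1))  -- drop the oldest
  end
  return deleted
end

-- Snapshots taken every 7200 seconds with count = 3
snapshots = {}
for i = 0, 3 do
  local deleted = take_snapshot(snapshots, i * 7200, 3)
  print(('t=%d kept=%d deleted=%d'):format(i * 7200, #snapshots, #deleted))
end
```

The fourth snapshot (taken at t=21600) is the first one that causes a deletion, matching the description above.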

Example on GitHub: wal

This section describes how to define WAL settings in the wal section of a YAML configuration.

Writing to the write-ahead log is enabled by default, so if an instance restarts, its data is recovered. WAL writing is configured using the wal.mode configuration option.

There are two modes that enable writing to the WAL:

  • write (default) – enable WAL and write the data without waiting for the data to be flushed to the storage device.
  • fsync – enable WAL and ensure that the record is written to the storage device.

The example below shows how to specify the write WAL mode:

wal:
  mode: 'write'

To turn the WAL writer off, set the wal.mode option to none.

To configure a directory where the WAL files are stored, use the wal.dir configuration option. The example below shows how to specify a directory for instance001 explicitly:

instance001:
  wal:
    dir: 'var/lib/{{ instance_name }}/wals'

In case of replication or hot standby mode, Tarantool scans for changes in the WAL files every wal.dir_rescan_delay seconds. The example below shows how to specify the interval between scans:

wal:
  dir_rescan_delay: 3

A new WAL file is created when the current one reaches wal.max_size bytes. In the example below, this limit is set to 268435456 bytes (256 MiB):

wal:
  max_size: 268435456

In Tarantool, the checkpoint daemon takes new snapshots at the given interval (see snapshot.by.interval). After an instance restart, the Tarantool garbage collector deletes the old WAL files.

To delay the deletion of WAL files, use the wal.cleanup_delay configuration option. The delay prevents situations in which the master deletes WAL files that replicas still need after a restart. As a result, replicas sync with the master faster after its restart and don't need to download all the data again.

In the example, the delay is set to 5 hours (18000 seconds):

wal:
  cleanup_delay: 18000

In Tarantool Enterprise, you can store the old and new tuples for each CRUD operation performed. A detailed description and examples of the WAL extensions are provided in the WAL extensions section.

See also: wal.ext.* configuration options.

The checkpoint daemon (snapshot daemon) is a constantly running fiber. The checkpoint daemon creates a schedule for the periodic snapshot creation based on the configuration options and the speed of file size growth. If enabled, the daemon makes new snapshot (.snap) files according to this schedule.

The work of the checkpoint daemon is based on the snapshot.by.interval and snapshot.by.wal_size configuration options.

If necessary, the checkpoint daemon also activates the Tarantool garbage collector that deletes old snapshots and WAL files.

Note

The memtx engine takes only regular snapshots with the interval set in the checkpoint daemon configuration.

The vinyl engine runs checkpointing in the background at all times.

The Tarantool garbage collector can be activated by the checkpoint daemon. The garbage collector tracks the snapshots that are to be relayed to a replica or needed by other consumers. When the files are no longer needed, the Tarantool garbage collector deletes them.

Note

The garbage collector called by the checkpoint daemon is distinct from the Lua garbage collector which is for Lua objects, and distinct from the Tarantool garbage collector that specializes in handling shard buckets.

This garbage collector is called as follows:

If an old snapshot file is deleted, the Tarantool garbage collector also deletes any write-ahead log (.xlog) files that meet the following conditions:

The Tarantool garbage collector also deletes obsolete vinyl .run files.

The Tarantool garbage collector doesn't delete a file in the following cases:

WAL extensions

Enterprise Edition

WAL extensions are available in the Enterprise Edition only.

WAL extensions allow you to add auxiliary information to each write-ahead log record. For example, you can enable storing the old and new tuples for each CRUD operation performed. This information might be helpful for implementing a CDC (Change Data Capture) utility that transforms a data replication stream.

See also: Configure the write-ahead log.

WAL extensions are disabled by default. To configure them, use the wal.ext.* configuration options. Inside the wal.ext block, you can enable storing old and new tuples by setting the wal.ext.old and wal.ext.new options to true.

Note that records with additional fields are replicated as follows:

The table below demonstrates how write-ahead log records might look for the specific CRUD operations if storing old and new tuples is enabled for the bands space.

insert
  Example: bands:insert{4, 'The Beatles', 1960}
  WAL information:
    new_tuple: [4, "The Beatles", 1960]
    tuple: [4, "The Beatles", 1960]

delete
  Example: bands:delete{4}
  WAL information:
    key: [4]
    old_tuple: [4, "The Beatles", 1960]

update
  Example: bands:update({2}, {{'=', 2, 'Pink Floyd'}})
  WAL information:
    new_tuple: [2, "Pink Floyd", 1965]
    old_tuple: [2, "Scorpions", 1965]
    key: [2]
    tuple: [["=", 2, "Pink Floyd"]]

upsert
  Example: bands:upsert({2, 'Pink Floyd', 1965}, {{'=', 2, 'The Doors'}})
  WAL information:
    new_tuple: [2, "The Doors", 1965]
    old_tuple: [2, "Pink Floyd", 1965]
    operations: [["=", 2, "The Doors"]]
    tuple: [2, "Pink Floyd", 1965]

replace
  Example: bands:replace{1, 'The Beatles', 1960}
  WAL information:
    old_tuple: [1, "Roxette", 1986]
    new_tuple: [1, "The Beatles", 1960]
    tuple: [1, "The Beatles", 1960]

Storing both old and new tuples is especially useful for the update operation: without WAL extensions, an update record contains only the key and the list of update operations, not the resulting tuple.
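Without new_tuple, a consumer of the log would have to keep the previous tuple and replay the update operations on it. A minimal plain-Lua sketch of that replay, assuming only the '=' (assign) operation; apply_update is a hypothetical helper, not a Tarantool API:

```lua
-- Hypothetical helper: replay update operations on a copy of a tuple.
-- Only the '=' (assign field) operation is supported in this sketch.
function apply_update(old_tuple, operations)
  local new_tuple = {}
  for i, v in ipairs(old_tuple) do new_tuple[i] = v end  -- keep old intact
  for _, op in ipairs(operations) do
    local opcode, fieldno, value = op[1], op[2], op[3]
    if opcode ~= '=' then
      error('unsupported operation in this sketch: ' .. tostring(opcode))
    end
    new_tuple[fieldno] = value
  end
  return new_tuple
end

-- Values from the update row of the table above
old_tuple = {2, 'Scorpions', 1965}
new_tuple = apply_update(old_tuple, {{'=', 2, 'Pink Floyd'}})
print(new_tuple[1], new_tuple[2], new_tuple[3])
```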

Note

To see the contents of a write-ahead log, use the command described in Printing the contents of .snap and .xlog files.

Defining and manipulating data

Tarantool stores data in spaces, which can be thought of as tables in a relational database. Every record or row in a space is called a tuple. A tuple may have any number of fields, and the fields may be of different types.

String data in fields are compared based on the specified collation rules. The user can provide hard limits for data values through constraints and link related spaces with foreign keys.

Tarantool supports highly customizable indexes of various types. In particular, indexes can be defined with generators like sequences.

There are six basic data operations in Tarantool: SELECT, INSERT, UPDATE, UPSERT, REPLACE, and DELETE. A number of complexity factors affect the resource usage of each function.

Tarantool allows describing the data schema but does not require it. The user can migrate a schema without migrating the data.

To ensure data persistence and recover quickly in case of failure, Tarantool uses mechanisms like the write-ahead log (WAL) and snapshots.

This section contains guides on performing data operations in Tarantool.

Data storage

Tarantool operates on data in the form of tuples.

tuple

A tuple is a group of data values in Tarantool's memory. Think of it as a "database record" or a "row". The data values in the tuple are called fields.

When Tarantool returns a tuple value in the console, by default, it uses YAML format, for example: [3, 'Ace of Base', 1993].

Internally, Tarantool stores tuples as MsgPack arrays.

field

Fields are distinct data values contained in a tuple. They play the same role as "row columns" or "record fields" in relational databases, with a few improvements:

  • fields can be composite structures, such as arrays or maps,
  • fields don’t need to have names.

A given tuple may have any number of fields, and the fields may be of different types.

The field’s number is the identifier of the field. Numbers are counted from base 1 in Lua and other 1-based languages, or from base 0 in languages like PHP or C/C++. So, 1 or 0 can be used in some contexts to refer to the first field of a tuple.

Tarantool stores tuples in containers called spaces.

space

In Tarantool, a space is a primary container that stores data. It is analogous to tables in relational databases. Spaces contain tuples – the Tarantool name for database records. The number of tuples in a space is unlimited.

At least one space is required to store data with Tarantool. Each space has the following attributes:

  • a unique name specified by the user,
  • a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool,
  • an engine: memtx (default) — in-memory engine, fast but limited in size, or vinyl — on-disk engine for huge data sets.

To be functional, a space also needs to have a primary index. It can also have secondary indexes.

Tarantool is both a database manager and an application server. Therefore a developer often deals with two type sets: the types of the programming language (such as Lua) and the types of the Tarantool storage format (MsgPack).

Scalar / compound | MsgPack type | Lua type | Example value
scalar | nil | cdata | box.NULL
scalar | boolean | boolean | true
scalar | string | string | 'A B C'
scalar | integer | number | 12345
scalar | integer | cdata | 12345
scalar | float64 (double) | number | 1.2345
scalar | float64 (double) | cdata | 1.2345
scalar | binary | cdata | [!!binary 3t7e]
scalar | ext (for Tarantool decimal) | cdata | 1.2
scalar | ext (for Tarantool datetime) | cdata | '2021-08-20T16:21:25.122999906 Europe/Berlin'
scalar | ext (for Tarantool interval) | cdata | +1 months, 1 days
scalar | ext (for Tarantool uuid) | cdata | 12a34b5c-de67-8f90-123g-h4567ab8901
compound | map | table (with string keys) | {'a': 5, 'b': 6}
compound | array | table (with integer keys) | [1, 2, 3, 4, 5]
compound | array | tuple (cdata) | [12345, 'A B C']

Note

MsgPack values have variable lengths. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.

Note

The Lua nil type is encoded as MsgPack nil but decoded as msgpack.NULL.

In Lua, the nil type has only one possible value, also called nil. Tarantool displays it as null when using the default YAML format. Nil may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other comparison operations will not work. Nil may not be used in Lua tables; the workaround is to use box.NULL because nil == box.NULL is true. Example: nil.

A boolean is either true or false.

Example: true.

The Tarantool integer type is for integers between -9223372036854775808 and 18446744073709551615, which is about 18 quintillion. This type corresponds to the number type in Lua and to the integer type in MsgPack.

Example: -2^63.

The Tarantool unsigned type is for integers between 0 and 18446744073709551615. So it is a subset of integer.

Example: 123456.

The double field type exists mainly to be equivalent to Tarantool/SQL’s DOUBLE data type. In msgpuck.h (Tarantool’s interface to MsgPack), the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes. In Lua, fields of the double type can only contain non-integer numeric values and cdata values with double floating-point numbers.

Examples: 1.234, -44, 1.447e+44.

To avoid using the wrong kind of values inadvertently, use ffi.cast() when searching or changing double fields. For example, instead of space_object:insert{value} use ffi = require('ffi') ... space_object:insert({ffi.cast('double',value)}).

Example:

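-- A space with a single field of the double type
s = box.schema.space.create('s', {format = {{'d', 'double'}}})
-- Primary index on the first field (no parts specified)
s:create_index('ii')
-- 1.1 is a non-integer Lua number, so it can be inserted as is
s:insert({1.1})
ffi = require('ffi')
-- Integer values are cast to a C double explicitly
s:insert({ffi.cast('double', 1)})
s:insert({ffi.cast('double', tonumber('123'))})
-- Search keys for double fields follow the same rules
s:select(1.1)
s:select({ffi.cast('double', 1)})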

Arithmetic with cdata double will not work reliably, so for Lua, it is better to use the number type. This warning does not apply for Tarantool/SQL because Tarantool/SQL does implicit casting.

The Tarantool number field may have both integer and floating-point values, although in Lua a number is a double-precision floating-point.

Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

You can also use the ffi module to specify a C type to cast the number to. In this case, the number will be stored as cdata.

The Tarantool decimal type is stored as a MsgPack ext (Extension). Values with the decimal type are not floating-point values although they may contain decimal points. They are exact with up to 38 digits of precision.

Example: a value returned by a function in the decimal module.

Introduced in v. 2.10.0. The Tarantool datetime type facilitates operations with date and time, accounting for leap years or the varying number of days in a month. It is stored as a MsgPack ext (Extension). Operations with this data type use code from c-dt, a third-party library.

For more information, see Module datetime.

Since: v. 2.10.0

The Tarantool interval type represents periods of time. They can be added to or subtracted from datetime values or each other. Operations with this data type use code from c-dt, a third-party library. The type is stored as a MsgPack ext (Extension). For more information, see Module datetime.

A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. For example, numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so '2345' is less than '500'.

Example: 'A, B, C'.
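The difference between numeric and byte-wise ordering can be reproduced in plain Lua, which (in the default C locale) also compares strings byte by byte. This is an illustration only; no Tarantool collation is involved:

```lua
-- Numbers compare by their value on the number line
assert(2345 > 500)
-- Strings compare byte by byte: '2' (byte 50) sorts before '5' (byte 53)
assert('2345' < '500')

values = {'500', '2345', '1000', '99'}
table.sort(values)  -- byte-wise order of ASCII digit strings
print(table.concat(values, ' '))
```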

A bin (binary) value is not directly supported by Lua, but Tarantool provides the varbinary type for it. See the varbinary module reference for details.

Example: "\65 \66 \67".

The Tarantool uuid type is used for Universally Unique Identifiers. Since version 2.4.1 Tarantool stores uuid values as a MsgPack ext (Extension).

Example: 64d22e4d-ac92-4a23-899a-e5934af5479.

An array is represented in Lua with {...} (braces).

Examples: lists of numbers representing points in geometric figures: {10, 11}, {3, 5, 9, 10}.

Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 are stored as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use box.NULL.

Example: a box.space.tester:select() request will return a Lua table.

A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For tuple examples, see box.tuple.

Values in a scalar field can be boolean, integer, unsigned, double, number, decimal, string, uuid, or varbinary; but not array, map, or tuple.

Examples: true, 1, 'xxx'.

Values in a field of the any type can be boolean, integer, unsigned, double, number, decimal, string, uuid, varbinary, array, map, or tuple.

Examples: true, 1, 'xxx', {box.NULL, 0}.

Examples of insert requests with different field types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...

To learn more about what values can be stored in indexed fields, read the Indexes section.

By default, when Tarantool compares strings, it uses the so-called binary collation. It only considers the numeric value of each byte in a string. For example, the encoding of 'A' (what used to be called the "ASCII value") is 65, the encoding of 'B' is 66, and the encoding of 'a' is 97. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a'.

Binary collation is the best choice for fast, deterministic, and simple maintenance and searching with Tarantool indexes.
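The same byte values can be inspected in plain Lua. The sketch below assumes the default C locale, in which Lua's own string comparison is byte-wise, so table.sort produces the order that binary collation would; it does not call Tarantool:

```lua
-- Byte values behind the binary collation order 'A' < 'B' < 'a'
print(string.byte('A'), string.byte('B'), string.byte('a'))

words = {'apple', 'Banana', 'Apple', 'banana'}
table.sort(words)  -- byte-wise comparison, like binary collation
print(table.concat(words, ' '))
```

All upper-case initials sort before all lower-case ones, which is why binary collation does not match dictionary order.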

But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' == 'A' < 'B' respectively.

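The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights: unicode observes level-1, level-2, and level-3 weights (tertiary strength), while unicode_ci observes only level-1 weights (primary strength), so, for example, 'a' == 'A' == 'á' == 'Á'.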

As an example, take some Russian words:

'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'

…and show the difference in ordering and selecting by index:

In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.

For English, Russian, and most other languages and use cases, use the "unicode" and "unicode_ci" collations. If you need the Cyrillic letters 'Е' and 'Ё' to have the same level-1 weights, try the Kyrgyz collation.

The tailored optional collations: for other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. Run box.space._collation:select() to see the complete list.

The tailored collation names have the form unicode_[language code]_[strength], where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for "primary strength" (level-1 weights), s2 for "secondary", s3 for "tertiary". Tarantool uses the same language codes as the ones in the "list of tailorable locales" on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Language Data Repository.

Default values are assigned to tuple fields automatically if these fields are skipped during the tuple insert or update.

You can specify a default value for a field in the space_object:format() call that defines the space format. Default values apply regardless of the field nullability: any tuple in which the field is skipped or set to nil receives the default value.

Default values can be set in two ways: explicitly or using a function.

Explicit default values are defined in the default parameter of the field declaration in a space_object:format() call.

local books = box.schema.space.create('books')
books:format({
    { name = 'id', type = 'number' },
    { name = 'name', type = 'string' },
    { name = 'year', type = 'number', default = 2024 },
})
books:create_index('primary', { parts = { 1 } })

To use a default value for a field, skip it or assign nil:

books:insert { 1, 'Thinking in Java' }
books:insert { 2, 'How to code in Go', nil }

Any Lua object that can be evaluated during the space_object:format() call may be used as a default value, for example:

  • a constant: default = 100
  • an initialized variable: default = default_size
  • an expression: default = 10 + default_size
  • a function return value: default = count_default()

Important

Explicit default values are evaluated only when setting the space format. If you use a variable as a default value, its further assignments do not affect the default value.

To change the default values, call space_object:format() again.
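This evaluate-once behavior is ordinary Lua semantics: the value of default is computed when the format table is built. A plain-Lua sketch in which format_spec and default_size are hypothetical names standing in for the table passed to space_object:format() (no Tarantool call is made):

```lua
-- The format table is built once; `default` holds a value,
-- not a reference to the variable it was read from.
default_size = 100
format_spec = {
  { name = 'id', type = 'unsigned' },
  { name = 'size', type = 'unsigned', default = default_size },
}

default_size = 200  -- a later assignment does not touch format_spec
print(format_spec[2].default)
```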

See also the space_object:format() reference.

A default value can be defined as a return value of a stored Lua function. To be the default, a function must be created with box.schema.func.create() with the function body and return one value of the field’s type. It also must not yield.

box.schema.func.create('current_year', {
    language = 'Lua',
    body = "function() return require('datetime').now().year end"
})

Default functions are set in the default_func parameter of the field declaration in a space_object:format() call. To make a function with no arguments the default for a field, specify its name:

local books = box.schema.space.create('books')
books:format({
    { name = 'id', type = 'unsigned' },
    { name = 'isbn', type = 'string' },
    { name = 'title', type = 'string' },
    { name = 'year', type = 'unsigned', default_func = 'current_year' }
})
books:create_index('primary', { parts = { 1 } })

A default function can also have one argument.

box.schema.func.create('randomize', {
    language = 'Lua',
    body = "function(limit) return math.random(limit.min, limit.max) end"
})

To pass the function argument when setting the default, specify it in the default parameter of the space_object:format() call:

books:format({
    { name = 'id', type = 'unsigned', default_func = 'randomize', default = {min = 0, max = 1000} },
    { name = 'isbn', type = 'string' },
    { name = 'title', type = 'string' },
    { name = 'year', type = 'unsigned', default_func = 'current_year' }
})

Note

A key difference between a default function (default_func = 'count_default') and a function return value used as a field default value (default = count_default()) is the following:

  • A default function is called every time a default value must be produced, that is, whenever a tuple is inserted or updated without specifying the field.
  • A return value used as a field default value is computed once, when the space format is set. All tuples that do not specify the field then receive the result of this exact call.
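The difference can be modeled in plain Lua with a call counter. count_default, insert_without_field, and calls are hypothetical names; insert_without_field only imitates what happens when a tuple omits the field:

```lua
-- Plain-Lua model of the two behaviors (no Tarantool calls).
calls = 0
function count_default()
  calls = calls + 1
  return calls
end

-- default = count_default(): evaluated once, when the format is set
static_default = count_default()

-- Imitates inserting a tuple that omits the field
function insert_without_field(use_default_func)
  if use_default_func then
    return count_default()  -- default_func: called for every such tuple
  end
  return static_default     -- default: the value computed once above
end

a = insert_without_field(false)
b = insert_without_field(false)
c = insert_without_field(true)
d = insert_without_field(true)
print(a, b, c, d)
```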

See also the space_object:format() reference.

For better control over stored data, Tarantool supports constraints – user-defined limitations on the values of certain fields or entire tuples. Together with data types, constraints allow limiting the ranges of available field values both syntactically and semantically.

For example, the field age typically has the number type, so it cannot store strings or boolean values. However, it can still have values that don't make sense, such as negative numbers. This is where constraints come in.

There are two types of constraints in Tarantool:

  • Field constraints check that the value being assigned to a field satisfies a given condition. For example, age must be non-negative.
  • Tuple constraints check complex conditions that can involve all fields of a tuple. For example, a tuple contains a date in three fields: year, month, and day. You can validate day values based on the month value (and even year if you consider leap years).

Field constraints work faster, while tuple constraints allow implementing a wider range of limitations.

Constraints use stored Lua functions or SQL expressions, which must return true when the constraint is satisfied. Other return values (including nil) and exceptions make the check fail and prevent tuple insertion or modification.
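The pass/fail rule can be sketched in plain Lua by compiling the check_age body used later in this section and calling it through pcall. constraint_passes is a hypothetical helper that imitates the rule; it is not how Tarantool invokes constraints internally:

```lua
-- Plain-Lua model of the pass/fail rule for constraint functions:
-- only a `true` return value passes; any other value or an error fails.
compile = loadstring or load  -- Lua 5.1 / LuaJIT vs Lua 5.2+

-- The check_age body, wrapped in `return` so that it yields a function
check_age = compile('return function(f, c) return (f >= 0 and f < 150) end')()

function constraint_passes(fn, value, name)
  local ok, result = pcall(fn, value, name)
  return ok and result == true
end

print(constraint_passes(check_age, 42, 'check_age'))     -- passes
print(constraint_passes(check_age, -1, 'check_age'))     -- returns false
print(constraint_passes(check_age, 'old', 'check_age'))  -- raises an error
```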

To create a constraint function, call box.schema.func.create() with the function definition specified in the body attribute.

Constraint functions take two parameters:

  • The tuple and the constraint name for tuple constraints.

    -- Define a tuple constraint function --
    box.schema.func.create('check_person', {
        language = 'LUA',
        is_deterministic = true,
        body = 'function(t, c) return (t.age >= 0 and #(t.name) > 3) end'
    })
    

    Warning

    Tarantool doesn’t check field names used in tuple constraint functions. If a field referenced in a tuple constraint gets renamed, this constraint will break and prevent further insertions and modifications in the space.

  • The field value and the constraint name for field constraints.

    -- Define a field constraint function --
    box.schema.func.create('check_age', {
        language = 'LUA',
        is_deterministic = true,
        body = 'function(f, c) return (f >= 0 and f < 150) end'
    })
    

To create a constraint in a space, specify the corresponding function’s name in the constraint parameter:

  • Tuple constraints: when creating or altering a space.

    -- Create a space with a tuple constraint --
    customers = box.schema.space.create('customers', {constraint = 'check_person'})
    
  • Field constraints: when setting up the space format.

    -- Specify format with a field constraint --
    box.space.customers:format({
        {name = 'id', type = 'number'},
        {name = 'name', type = 'string'},
        {name = 'age',  type = 'number', constraint = 'check_age'},
    })
    

In both cases, constraint can contain multiple function names passed as a Lua table. Each constraint can have an optional name:

-- Create one more tuple constraint --
box.schema.func.create('another_constraint',
    {language = 'LUA', is_deterministic = true, body = 'function(t, c) return true end'})

-- Set two constraints with optional names --
box.space.customers:alter{
    constraint = { check1 = 'check_person', check2 = 'another_constraint'}
}

Note

When adding a constraint to an existing space with data, Tarantool checks it against the stored data. If there are fields or tuples that don’t satisfy the constraint, it won’t be applied to the space.

Foreign keys provide links between related fields, therefore maintaining the referential integrity of the database.

Fields can contain values that exist only in other fields. For example, a shop order always belongs to a customer. Hence, all values of the customer field of the orders space must also exist in the id field of the customers space. In this case, customers is a parent space for orders (its child space). When two spaces are linked with a foreign key, each time a tuple is inserted or modified in the child space, Tarantool checks that a corresponding value is present in the parent space.


Note

A foreign key can link a field to another field in the same space. In this case, the child field must be nullable. Otherwise, it is impossible to insert the first tuple in such a space because there is no parent tuple to which it can link.

There are two types of foreign keys in Tarantool:

  • Field foreign keys check that the value being assigned to a field is present in a particular field of another space. For example, the customer value in a tuple from the orders space must match an id stored in the customers space.
  • Tuple foreign keys check that multiple fields of a tuple have a match in another space. For example, if the orders space has fields customer_id and customer_name, a tuple foreign key can check that the customers space contains a tuple with both these values in the corresponding fields.

Field foreign keys work faster, while tuple foreign keys allow implementing stricter references.
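The two checks can be modeled in plain Lua, with the parent space represented by a table keyed by id (standing in for the required parent index). All names and data are hypothetical; Tarantool performs these checks itself when you declare foreign keys:

```lua
-- Plain-Lua model of field and tuple foreign key checks (not Tarantool code).
-- Parent data: customers keyed by id, as a parent-space index would allow.
customers = {
  [1] = { id = 1, name = 'Alice' },
  [2] = { id = 2, name = 'Bob' },
}

-- Field foreign key: orders.customer_id must match some customers.id
function field_fk_ok(order)
  return customers[order.customer_id] ~= nil
end

-- Tuple foreign key: (customer_id, customer_name) must match (id, name)
-- of the same parent tuple
function tuple_fk_ok(order)
  local parent = customers[order.customer_id]
  return parent ~= nil and parent.name == order.customer_name
end

order = { id = 10, customer_id = 2, customer_name = 'Alice', price_total = 5 }
print(field_fk_ok(order))  -- customer 2 exists
print(tuple_fk_ok(order))  -- customer 2 is named 'Bob', not 'Alice'
```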

Important

For each foreign key, there must exist a parent space index that includes all its fields.

To create a foreign key in a space, specify the parent space and linked fields in the foreign_key parameter. Parent spaces can be referenced by name or by id. When linking to the same space, the space can be omitted. Fields can be referenced by name or by number:

  • Field foreign keys: when setting up the space format.

    -- Create a space with a field foreign key --
    box.schema.space.create('orders')
    
    box.space.orders:format({
        {name = 'id',   type = 'number'},
        {name = 'customer_id', foreign_key = {space = 'customers', field = 'id'}},
        {name = 'price_total', type = 'number'},
    })
    
  • Tuple foreign keys: when creating or altering a space. Note that for foreign keys with multiple fields there must exist an index that includes all these fields.

    -- Create a space with a tuple foreign key --
    box.schema.space.create("orders", {
        foreign_key = {
            space = 'customers',
            field = {customer_id = 'id', customer_name = 'name'}
        }
    })
    
    box.space.orders:format({
        {name = "id", type = "number"},
        {name = "customer_id" },
        {name = "customer_name"},
        {name = "price_total", type = "number"},
    })
    

Note

Type can be omitted for foreign key fields because it’s defined in the parent space.

Foreign keys can have an optional name.

-- Set a foreign key with an optional name --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = { customer_id = 'id', customer_name = 'name'}
        }
    }
}

A space can have multiple tuple foreign keys. In this case, they all must have names.

-- Set two foreign keys: names are mandatory --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = {customer_id = 'id', customer_name = 'name'}
        },
        item = {
            space = 'items',
            field = {item_id = 'id'}
        }
    }
}

Tarantool performs integrity checks upon data modifications in parent spaces. If you try to remove a tuple referenced by a foreign key or an entire parent space, you will get an error.
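A plain-Lua model of the parent-side check, with hypothetical customers and orders tables and a delete_customer helper; Tarantool raises its own error in this situation, and the message below is made up for illustration:

```lua
-- Plain-Lua model of the parent-side integrity check (not Tarantool code):
-- a parent tuple cannot be deleted while a child tuple still references it.
customers = { [1] = { id = 1 }, [2] = { id = 2 } }
orders = { { id = 10, customer_id = 2 } }

function delete_customer(id)
  for _, order in ipairs(orders) do
    if order.customer_id == id then
      error(('customer %d is referenced by order %d'):format(id, order.id))
    end
  end
  customers[id] = nil
end

delete_customer(1)                -- no order references customer 1
print(pcall(delete_customer, 2))  -- false, plus the error message
```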

Important

Renaming parent spaces or referenced fields may break the corresponding foreign keys and prevent further insertions or modifications in the child spaces.

Indexes

An index is a special data structure that stores a group of key values and pointers. It is used for efficient manipulation of data.

As with spaces, you should specify the index name and let Tarantool come up with a unique numeric identifier ("index id").

An index always has a type. The default index type is TREE. TREE indexes are provided by all Tarantool engines, can index unique and non-unique values, support partial key searches, comparisons, and ordered results. Additionally, the memtx engine supports HASH, RTREE and BITSET indexes.

An index may be multi-part, that is, you can declare that an index key value is composed of two or more fields in the tuple, in any order. For example, for an ordinary TREE index, the maximum number of parts is 255.

Индекс может быть уникальным, то есть можно объявить, что недопустимо дважды задавать одно значение ключа.

Первый индекс, определенный для спейса, называется первичный индекс (primary key). Он должен быть уникальным. Все остальные индексы называются вторичными индексами (secondary), они могут строиться по неуникальным значениям.

Indexes have certain limitations. See details on page Limitations.

To create a generator for indexes, you can use a sequence object. Learn how to do it in the tutorial.

Indexed field types are not to be confused with index types, which are the types of data structure an index can be. See more about index types below.

Indexes restrict values that Tarantool can store with MsgPack. This is why, for example, 'unsigned' and 'integer' are different field types, although in MsgPack they are both stored as integer values. An 'unsigned' index contains only non-negative integer values, while an 'integer' index contains any integer values.

The default field type is 'unsigned' and the default index type is TREE. Although 'nil' is not a legal indexed field type, indexes may contain nil as a non-default option.
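As a rough illustration of the difference between the two integer types, the following plain-Lua sketch checks whether a value would fit an 'unsigned' or an 'integer' field. It assumes ordinary Lua numbers only (Tarantool can also represent 64-bit integers as cdata, which the sketch ignores), and the function names are hypothetical.

```lua
-- Sketch of the value ranges behind the 'integer' and 'unsigned' field types.
-- Assumption: plain Lua numbers only; int64/uint64 cdata values are ignored.
local function is_integer_value(v)
    return type(v) == 'number'
        and v > -math.huge and v < math.huge -- rejects infinities and NaN
        and v == math.floor(v)               -- no fractional part
end

local function is_unsigned_value(v)
    return is_integer_value(v) and v >= 0
end

print(is_integer_value(-5), is_unsigned_value(-5)) -- true, false
```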

To learn more about field types, check the Field type details section.

Field type name string | Field type | Index type
'boolean' | boolean | TREE or HASH
'integer' (may also be called 'int') | integer, which may include unsigned values | TREE or HASH
'unsigned' (may also be called 'uint' or 'num', but 'num' is deprecated) | unsigned | TREE, BITSET, or HASH
'double' | double | TREE or HASH
'number' | number, which may include integer, double, or decimal values | TREE or HASH
'decimal' | decimal | TREE or HASH
'string' (may also be called 'str') | string | TREE, BITSET, or HASH
'varbinary' | varbinary | TREE, HASH, or BITSET (since version 2.7.1)
'uuid' | uuid | TREE or HASH
'datetime' | datetime | TREE
'array' | array | RTREE
'map' | table | Cannot be indexed
'scalar' | may include nil, boolean, integer, unsigned, number, decimal, string, varbinary, or uuid values. When a scalar field contains values of different underlying types, the key order is: nils, then booleans, then numbers, then strings, then varbinaries, then uuids. | TREE or HASH
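The cross-type key order for 'scalar' fields can be sketched in plain Lua as a comparison function. This is a conceptual model, not Tarantool's comparator. Varbinary and uuid values have no native Lua counterpart and are left out, strings are compared with Lua's < operator, and the helper names are hypothetical. Values are wrapped in tables so that nil can take part in table.sort.

```lua
-- Conceptual sketch of the key order for 'scalar' fields with mixed types:
-- nils, then booleans, then numbers, then strings.
-- varbinary and uuid values are left out (no native Lua counterpart).
local TYPE_RANK = { ['nil'] = 1, boolean = 2, number = 3, string = 4 }

local function scalar_less(a, b)
    local ra, rb = TYPE_RANK[type(a)], TYPE_RANK[type(b)]
    if ra ~= rb then
        return ra < rb          -- different types: compare type ranks
    end
    if a == nil then
        return false            -- two nils are equal
    elseif type(a) == 'boolean' then
        return (not a) and b    -- false sorts before true
    end
    return a < b                -- numbers numerically, strings with Lua's <
end

-- Values are wrapped in tables so that nil can be sorted too.
local function sort_scalars(values, n)
    local boxed = {}
    for i = 1, n do
        boxed[i] = { v = values[i] }
    end
    table.sort(boxed, function(x, y) return scalar_less(x.v, y.v) end)
    return boxed
end

for _, item in ipairs(sort_scalars({ 'b', 2, true, nil, 'a', false, 1.5 }, 7)) do
    print(item.v)
end
```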

An index always has a type. Different types are intended for different usage scenarios.

We give an overview of index features in the following table:

Feature | TREE | HASH | RTREE | BITSET
unique | + | + | - | -
non-unique | + | - | + | +
is_nullable | + | - | - | -
can be multi-part | + | + | - | -
multikey | + | - | - | -
partial-key search | + | - | - | -
can be primary key | + | + | - | -
exclude_null (version 2.8+) | + | - | - | -
Pagination (the after option) | + | - | - | -
iterator types | ALL, EQ, REQ, GT, GE, LT, LE | ALL, EQ | ALL, EQ, GT, GE, LT, LE, OVERLAPS, NEIGHBOR | ALL, EQ, BITS_ALL_SET, BITS_ANY_SET, BITS_ALL_NOT_SET

Note

Since version 2.11.0, the GT iterator type is deprecated for HASH indexes.

The default index type is 'TREE'. TREE indexes are provided by the memtx and vinyl engines, can index unique and non-unique values, and support partial key searches, comparisons, and ordered results.

This is a universal index type, and for most cases it is the best choice.

Additionally, the memtx engine supports HASH, RTREE, and BITSET indexes.

HASH indexes require unique fields and lose to TREE indexes in almost all respects, so we do not recommend using them in applications. HASH is present in Tarantool mainly for backward compatibility.

Here are some tips. Do not use a HASH index:

  • just because you want to
  • if you think that HASH is faster without measuring performance
  • if you want to iterate over the data
  • for a primary key
  • as the only index

Use a HASH index:

  • if it is a secondary key
  • if you are 100% sure you won't need to make it non-unique
  • if you have taken measurements on your data and see a noticeable increase in performance
  • if you are saving every byte on tuples (HASH is a little more compact)

RTREE is a multidimensional index supporting up to 20 dimensions. It is used mainly for indexing spatial information, such as geographical objects. The examples below demonstrate spatial searches via an RTREE index.

An RTREE index cannot be primary and cannot be unique. The option list of this index type may contain the dimension and distance options. The parts definition must contain exactly one part, of type array. An RTREE index accepts two distance functions: euclid and manhattan.

Warning

Currently, the isolation level of RTREE indexes in MVCC transaction mode is read-committed (not serializable, as stated). If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level). However, the indexes are subject to different anomalies that can make them unserializable.

Example 1:

my_space = box.schema.create_space("tester")
my_space:format({ { type = 'number', name = 'id' }, { type = 'array', name = 'content' } })
primary_index = my_space:create_index('primary', { type = 'tree', parts = {'id'} })
rtree_index = my_space:create_index('spatial', { type = 'RTREE', unique = false, parts = {'content'} })

The corresponding tuple field must therefore be an array of 2 or 4 numbers. 2 numbers mean a point {x, y}; 4 numbers mean a rectangle {x1, y1, x2, y2}, where (x1, y1) and (x2, y2) are the diagonal corners of the rectangle.

my_space:insert{1, {1, 1}}
my_space:insert{2, {2, 2, 3, 3}}

Selection results depend on the chosen iterator. The default EQ iterator searches for an exact rectangle; a point is treated as a rectangle of zero width and height:

tarantool> rtree_index:select{1, 1}
---
- - [1, [1, 1]]
...

tarantool> rtree_index:select{1, 1, 1, 1}
---
- - [1, [1, 1]]
...

tarantool> rtree_index:select{2, 2}
---
- []
...

tarantool> rtree_index:select{2, 2, 3, 3}
---
- - [2, [2, 2, 3, 3]]
...

The ALL iterator, which is the default when no key is specified, selects all tuples in arbitrary order:

tarantool> rtree_index:select{}
---
- - [1, [1, 1]]
  - [2, [2, 2, 3, 3]]
...

The LE (less or equal) iterator searches for tuples whose rectangles lie within the specified rectangle:

tarantool> rtree_index:select({1, 1, 2, 2}, {iterator='le'})
---
- - [1, [1, 1]]
...

The LT (less than, or strictly less) iterator searches for tuples whose rectangles lie strictly within the specified rectangle:

tarantool> rtree_index:select({0, 0, 3, 3}, {iterator = 'lt'})
---
- - [1, [1, 1]]
...

The GE iterator searches for tuples whose rectangles contain the specified rectangle:

tarantool> rtree_index:select({1, 1}, {iterator = 'ge'})
---
- - [1, [1, 1]]
...

The GT iterator searches for tuples whose rectangles strictly contain the specified rectangle:

tarantool> rtree_index:select({2.1, 2.1, 2.9, 2.9}, {iterator = 'gt'})
---
- []
...

The OVERLAPS iterator searches for tuples whose rectangles overlap the specified rectangle:

tarantool> rtree_index:select({0, 0, 10, 2}, {iterator='overlaps'})
---
- - [1, [1, 1]]
  - [2, [2, 2, 3, 3]]
...
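The containment rules behind EQ, LE, LT, GE, GT, and OVERLAPS can be modeled in plain Lua. This sketch is a conceptual model of the definitions above, not the RTREE implementation, and may differ from a real index in edge cases. It assumes that a key with dim numbers is a point, that a key with 2 * dim numbers lists the lower corner first and then the upper corner, and that all intervals are closed. The helper names are hypothetical.

```lua
-- Conceptual model of the RTREE iterators (not Tarantool's implementation).
-- A key with dim numbers is a point; a key with 2 * dim numbers is
-- {lower corner..., upper corner...}. All intervals are treated as closed.
local function to_rect(coords, dim)
    local lo, hi = {}, {}
    local is_point = (#coords == dim)
    for i = 1, dim do
        lo[i] = coords[i]
        hi[i] = is_point and coords[i] or coords[dim + i]
    end
    return { lo = lo, hi = hi }
end

-- true if rectangle a lies inside rectangle b (strictly if strict is true)
local function within(a, b, dim, strict)
    for i = 1, dim do
        local inside
        if strict then
            inside = b.lo[i] < a.lo[i] and a.hi[i] < b.hi[i]
        else
            inside = b.lo[i] <= a.lo[i] and a.hi[i] <= b.hi[i]
        end
        if not inside then return false end
    end
    return true
end

local function overlaps(a, b, dim)
    for i = 1, dim do
        if a.hi[i] < b.lo[i] or b.hi[i] < a.lo[i] then return false end
    end
    return true
end

local function equal(a, b, dim)
    for i = 1, dim do
        if a.lo[i] ~= b.lo[i] or a.hi[i] ~= b.hi[i] then return false end
    end
    return true
end

-- t is the tuple's rectangle, q is the query rectangle
local MATCH = {
    EQ = equal,
    LE = function(t, q, dim) return within(t, q, dim, false) end,
    LT = function(t, q, dim) return within(t, q, dim, true) end,
    GE = function(t, q, dim) return within(q, t, dim, false) end,
    GT = function(t, q, dim) return within(q, t, dim, true) end,
    OVERLAPS = overlaps,
}

-- tuples is an array of {id, coords}; returns matching ids in input order
local function rtree_select(tuples, key, iterator, dim)
    dim = dim or 2
    local q, ids = to_rect(key, dim), {}
    for _, t in ipairs(tuples) do
        if MATCH[iterator](to_rect(t[2], dim), q, dim) then
            ids[#ids + 1] = t[1]
        end
    end
    return ids
end

local data = { { 1, { 1, 1 } }, { 2, { 2, 2, 3, 3 } } }
print(table.concat(rtree_select(data, { 0, 0, 10, 2 }, 'OVERLAPS'), ', ')) -- 1, 2
```

Note that the two rectangles in the OVERLAPS call only touch along the line y = 2; with closed intervals, touching counts as overlapping.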

The NEIGHBOR iterator returns all tuples, ordered by distance to the specified point:

tarantool> for i=1,10 do
         >    for j=1,10 do
         >        my_space:insert{i*10+j, {i, j, i+1, j+1}}
         >    end
         > end
---
...

tarantool> rtree_index:select({1, 1}, {iterator = 'neighbor', limit = 5})
---
- - [11, [1, 1, 2, 2]]
  - [12, [1, 2, 2, 3]]
  - [21, [2, 1, 3, 2]]
  - [22, [2, 2, 3, 3]]
  - [31, [3, 1, 4, 2]]
...
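The NEIGHBOR ordering can be sketched the same way. This hypothetical helper ranks every tuple by the distance from the query point to the nearest point of the tuple's rectangle, using either the euclid or the manhattan metric. That distance definition is an assumption made for the illustration. Equally distant tuples are ordered by id here, so the real index may return ties in a different order (compare the fifth row of the output above).

```lua
-- Sketch of NEIGHBOR ordering (not Tarantool's implementation).
-- Assumption: a tuple's distance is the distance from the query point to
-- the nearest point of its rectangle, using the euclid or manhattan metric.
local function distance(point, coords, dim, metric)
    local is_point = (#coords == dim)
    local sum = 0
    for i = 1, dim do
        local lo = coords[i]
        local hi = is_point and coords[i] or coords[dim + i]
        local d = 0
        if point[i] < lo then
            d = lo - point[i]
        elseif point[i] > hi then
            d = point[i] - hi
        end
        if metric == 'manhattan' then
            sum = sum + d
        else
            sum = sum + d * d
        end
    end
    if metric == 'manhattan' then return sum end
    return math.sqrt(sum)
end

-- tuples is an array of {id, coords}; returns ids, nearest first
local function neighbors(tuples, point, dim, metric, limit)
    local ranked = {}
    for _, t in ipairs(tuples) do
        ranked[#ranked + 1] = { id = t[1], dist = distance(point, t[2], dim, metric) }
    end
    table.sort(ranked, function(a, b)
        if a.dist ~= b.dist then return a.dist < b.dist end
        return a.id < b.id -- deterministic tie-break for this sketch only
    end)
    local ids = {}
    for i = 1, math.min(limit or #ranked, #ranked) do
        ids[i] = ranked[i].id
    end
    return ids
end

-- Rebuild the 10 x 10 grid of unit squares from the example above.
local grid = {}
for i = 1, 10 do
    for j = 1, 10 do
        grid[#grid + 1] = { i * 10 + j, { i, j, i + 1, j + 1 } }
    end
end
print(table.concat(neighbors(grid, { 1, 1 }, 2, 'euclid', 5), ', '))
```

Switching the metric can change the order: a point at {3, 0} is nearer to the origin than {2, 2} under manhattan distance, but farther under euclid distance.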

Example 2:

3D, 4D, and higher-dimensional RTREE indexes work the same way as 2D ones, except that you must specify more coordinates in requests. Here is a short example of using a 4D tree:

tarantool> my_space = box.schema.create_space("tester")
tarantool> my_space:format{ { type = 'number', name = 'id' }, { type = 'array', name = 'content' } }
tarantool> primary_index = my_space:create_index('primary', { type = 'TREE', parts = {'id'} })
tarantool> rtree_index = my_space:create_index('spatial', { type = 'RTREE', unique = false, dimension = 4, parts = {'content'} })
tarantool> my_space:insert{1, {1, 2, 3, 4}} -- insert 4D point
tarantool> my_space:insert{2, {1, 1, 1, 1, 2, 2, 2, 2}} -- insert 4D box

tarantool> rtree_index:select{1, 2, 3, 4} -- find exact point
---
- - [1, [1, 2, 3, 4]]
...

tarantool> rtree_index:select({0, 0, 0, 0, 3, 3, 3, 3}, {iterator = 'LE'}) -- select from 4D box
---
- - [2, [1, 1, 1, 1, 2, 2, 2, 2]]
...

tarantool> rtree_index:select({0, 0, 0, 0}, {iterator = 'neighbor'}) -- select neighbours
---
- - [2, [1, 1, 1, 1, 2, 2, 2, 2]]
  - [1, [1, 2, 3, 4]]
...

Note

Keep in mind that a select with the NEIGHBOR iterator and no limit extracts the entire space in order of increasing distance. With large amounts of data, this can hurt performance.

Another frequent mistake is to specify the iterator type without quotes, like this: rtree_index:select(rect, {iterator = LE}). This silently performs an EQ select: LE is an undefined variable and evaluates to nil, so the iterator option is unset and the default is used.
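The pitfall is plain Lua behavior and can be reproduced without Tarantool: reading an undefined global yields nil, and a table field set to nil simply does not exist. The checked_opts helper below is a hypothetical guard, not part of Tarantool. Inside Tarantool, passing a string such as 'LE' or a constant such as box.index.LE (the BITSET examples use box.index.EQ the same way) avoids the problem.

```lua
-- LE was never defined, so reading it yields nil and the field is dropped.
local wrong = { iterator = LE }
local right = { iterator = 'LE' }

print(wrong.iterator) -- nil: no iterator option, so the default applies
print(right.iterator) -- LE

-- Hypothetical guard that fails loudly instead of silently using EQ.
local function checked_opts(opts)
    if opts.iterator == nil then
        error('iterator option is nil; is the iterator name missing quotes?', 2)
    end
    return opts
end
```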

A BITSET index stores bit masks. Use it when you need to search by bit masks, for example, when storing a vector of attributes and searching by these attributes.

Warning

Currently, the isolation level of BITSET indexes in MVCC transaction mode is read-committed (not serializable, as stated). If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level). However, the indexes are subject to different anomalies that can make them unserializable.

Example 1:

The following script shows creating and searching with a BITSET index. Notice that BITSET cannot be unique, so first a primary-key index is created, and bit values are entered as hexadecimal literals for easier reading.

tarantool> my_space = box.schema.space.create('space_with_bitset')
tarantool> my_space:create_index('primary_index', {
         >   parts = {1, 'string'},
         >   unique = true,
         >   type = 'TREE'
         > })
tarantool> my_space:create_index('bitset_index', {
         >   parts = {2, 'unsigned'},
         >   unique = false,
         >   type = 'BITSET'
         > })
tarantool> my_space:insert{'Tuple with bit value = 01', 0x01}
tarantool> my_space:insert{'Tuple with bit value = 10', 0x02}
tarantool> my_space:insert{'Tuple with bit value = 11', 0x03}
tarantool> my_space.index.bitset_index:select(0x02, {
         >   iterator = box.index.EQ
         > })
---
- - ['Tuple with bit value = 10', 2]
...
tarantool> my_space.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ANY_SET
         > })
---
- - ['Tuple with bit value = 10', 2]
  - ['Tuple with bit value = 11', 3]
...
tarantool> my_space.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ALL_SET
         > })
---
- - ['Tuple with bit value = 10', 2]
  - ['Tuple with bit value = 11', 3]
...
tarantool> my_space.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ALL_NOT_SET
         > })
---
- - ['Tuple with bit value = 01', 1]
...

Example 2:

tarantool> box.schema.space.create('bitset_example')
tarantool> box.space.bitset_example:create_index('primary')
tarantool> box.space.bitset_example:create_index('bitset',{unique = false, type = 'BITSET', parts = {2,'unsigned'}})
tarantool> box.space.bitset_example:insert{1,1}
tarantool> box.space.bitset_example:insert{2,4}
tarantool> box.space.bitset_example:insert{3,7}
tarantool> box.space.bitset_example:insert{4,3}
tarantool> box.space.bitset_example.index.bitset:select(2, {iterator = 'BITS_ANY_SET'})

The result will be:

---
- - [3, 7]
  - [4, 3]
...

because (7 AND 2) is not equal to 0, and (3 AND 2) is not equal to 0.
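The three bit-mask iterators reduce to simple checks on value AND mask. The plain-Lua sketch below spells them out. Here band() is a hypothetical portable bitwise AND for non-negative integers, so the code does not depend on LuaJIT's bit module or on the & operator of newer Lua versions; bitset_select is likewise only an illustration.

```lua
-- Portable bitwise AND for non-negative integers (no bit library needed).
local function band(a, b)
    local result, place = 0, 1
    while a > 0 and b > 0 do
        if a % 2 == 1 and b % 2 == 1 then
            result = result + place
        end
        a, b, place = math.floor(a / 2), math.floor(b / 2), place * 2
    end
    return result
end

-- Each iterator is a predicate on (value AND mask).
local BITSET_MATCH = {
    BITS_ANY_SET     = function(value, mask) return band(value, mask) ~= 0 end,
    BITS_ALL_SET     = function(value, mask) return band(value, mask) == mask end,
    BITS_ALL_NOT_SET = function(value, mask) return band(value, mask) == 0 end,
}

-- tuples is an array of {key, bits}; returns matching keys in input order
local function bitset_select(tuples, mask, iterator)
    local keys = {}
    for _, t in ipairs(tuples) do
        if BITSET_MATCH[iterator](t[2], mask) then
            keys[#keys + 1] = t[1]
        end
    end
    return keys
end

local example2 = { { 1, 1 }, { 2, 4 }, { 3, 7 }, { 4, 3 } }
print(table.concat(bitset_select(example2, 2, 'BITS_ANY_SET'), ', ')) -- 3, 4
```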

Additionally, there are index iterator operations. They can only be used in Lua and C/C++ code. Index iterators are for traversing indexes one key at a time, taking advantage of features that are specific to an index type. For example, they can be used for evaluating Boolean expressions when traversing BITSET indexes, or for going in descending order when traversing TREE indexes.

Using indexes

You must create an index for a space before trying to insert tuples into the space or select tuples from it.

The simple index-creation operation is:

box.space.space-name:create_index('index-name')

This creates a unique TREE index on the first field of all tuples (often called "Field#1"), which is assumed to be numeric.

A recommended design pattern for a data model is to base primary keys on the first fields of a tuple. This speeds up tuple comparison due to the specifics of data storage and the way comparisons are arranged in Tarantool.

The simple SELECT request is:

box.space.space-name:select(value)

This looks for a single tuple via the first index. Since the first index is always unique, at most one tuple is returned. You can also call select() without arguments to return all tuples. Be careful: calling select() without arguments on a huge space can hang your instance.

An index definition may also include identifiers of tuple fields and their expected types. See allowed indexed field types in section Details about indexed field types:

box.space.space-name:create_index('index-name', {type = 'tree', parts = {{field = 1, type = 'unsigned'}}})

Space definitions and index definitions are stored permanently in Tarantool’s system spaces _space and _index.

Tip

See full information about creating indexes, such as how to create a multikey index, an index using the path option, or how to create a functional index in our reference for space_object:create_index().

Index operations are automatic: if a data manipulation request changes a tuple, then it also changes the index keys defined for the tuple.

  1. Create a sample space named bands:

    bands = box.schema.space.create('bands')
    
  2. Format the created space by specifying field names and types:

    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    
  3. Create the primary index (named primary):

    box.space.bands:create_index('primary', { parts = { 'id' } })
    

    This index is based on the id field of each tuple.

  4. Insert some tuples into the space:

    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    box.space.bands:insert { 5, 'Pink Floyd', 1965 }
    box.space.bands:insert { 6, 'The Rolling Stones', 1962 }
    box.space.bands:insert { 7, 'The Doors', 1965 }
    box.space.bands:insert { 8, 'Nirvana', 1987 }
    box.space.bands:insert { 9, 'Led Zeppelin', 1968 }
    box.space.bands:insert { 10, 'Queen', 1970 }
    
  5. Create secondary indexes:

    -- Create a unique secondary index --
    box.space.bands:create_index('band', { parts = { 'band_name' } })
    
    -- Create a non-unique secondary index --
    box.space.bands:create_index('year', { parts = { { 'year' } }, unique = false })
    
  6. Create a multi-part index with two parts:

    box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })
    

There are the following SELECT variations:

Tip

You can also add, drop, or alter the definitions at runtime, with some restrictions. Read more about index operations in reference for box.index submodule.

Tuple compression

Enterprise Edition

Tuple compression is available in the Enterprise Edition only.

Tuple compression, introduced in Tarantool Enterprise Edition 2.10.0, aims to save memory space. Typically, it decreases the volume of stored data by 15%. However, the exact volume saved depends on the type of data.

The following compression algorithms are supported: zstd, lz4, and zlib.

To learn about the performance costs of each algorithm, check Tuple compression performance.

Tarantool doesn’t compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.

Note

The compress module provides the API for compressing and decompressing data.

First, create a space:

box.schema.space.create('bands')

Then, create an index for this space, for example:

box.space.bands:create_index('primary', {parts = {{1, 'unsigned'}}})

Create a format to declare field names and types. In the example below, the band_name and year fields use the zstd and lz4 compression algorithms, respectively. The first field (id) is indexed, so it cannot be compressed.

box.space.bands:format({
           {name = 'id', type = 'unsigned'},
           {name = 'band_name', type = 'string', compression = 'zstd'},
           {name = 'year', type = 'unsigned', compression = 'lz4'}
       })

Now, the new tuples that you add to the space bands will be compressed. When you read a compressed tuple, you do not need to decompress it back yourself.

To check which fields in a space are compressed, run space_object:format() on the space. If a field is compressed, the format includes the compression algorithm, for example:

tarantool> box.space.bands:format()
---
- [{'name': 'id', 'type': 'unsigned'},
   {'type': 'string', 'compression': 'zstd', 'name': 'band_name'},
   {'type': 'unsigned', 'compression': 'lz4', 'name': 'year'}]
...

You can enable compression for existing fields. All tuples added after that will have this field compressed. However, this doesn't affect the tuples already stored in the space. To compress the existing tuples, make a snapshot and restart Tarantool.

Here’s an example of how to compress existing fields:

  1. Create a space without compression and add several tuples:

    box.schema.space.create('bands')
    
    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    
    box.space.bands:create_index('primary', { parts = { 'id' } })
    
    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    
  2. Suppose that you want fields 2 and 3 to be compressed from now on. To enable compression, change the format as follows:

    local new_format = box.space.bands:format()
    
    new_format[2].compression = 'zstd'
    new_format[3].compression = 'lz4'
    
    box.space.bands:format(new_format)
    

    From now on, all the tuples that you add to the space have fields 2 and 3 compressed.

  3. To finalize the change, create a snapshot by running box.snapshot() and restart Tarantool. As a result, all old tuples will also be compressed in memory during recovery.

Note

space:upgrade() provides the ability to enable compression and update the existing tuples in the background. To achieve this, you need to pass a new space format in the format argument of space:upgrade().

Below are the results of a synthetic test that illustrate how tuple compression affects performance. The test was carried out on a simple Tarantool space containing 100,000 tuples, each having a field with a sample JSON roughly 600 bytes large. The test compared the speed of running select and replace operations on uncompressed and compressed data as well as the overall data size of the space. Performance is measured in requests per second.

Compression type | select, RPS | replace, RPS | Space size, bytes
None | 4,486k | 1,109k | 41,168,548
zstd | 308k | 26k | 21,368,548
lz4 | 1,765k | 672k | 25,268,548
zlib | 325k | 107k | 20,768,548
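To read the table as relative numbers, the following plain-Lua snippet computes, for each algorithm, the space saved and the share of uncompressed throughput that remains. The figures are copied from the table above, and the results describe only this one synthetic test.

```lua
-- Figures copied from the table above: RPS in thousands, size in bytes.
local results = {
    { name = 'none', select_k = 4486, replace_k = 1109, size = 41168548 },
    { name = 'zstd', select_k = 308,  replace_k = 26,   size = 21368548 },
    { name = 'lz4',  select_k = 1765, replace_k = 672,  size = 25268548 },
    { name = 'zlib', select_k = 325,  replace_k = 107,  size = 20768548 },
}

local base = results[1] -- uncompressed baseline
local summary = {}
for _, r in ipairs(results) do
    local s = {
        saved_pct = 100 * (base.size - r.size) / base.size, -- space saved
        select_pct = 100 * r.select_k / base.select_k,      -- select RPS kept
        replace_pct = 100 * r.replace_k / base.replace_k,   -- replace RPS kept
    }
    summary[r.name] = s
    print(string.format('%-4s  saved %4.1f%%  select %5.1f%%  replace %5.1f%%',
        r.name, s.saved_pct, s.select_pct, s.replace_pct))
end
```

With these inputs, zstd saves about 48.1% of the space, lz4 about 38.6%, and zlib about 49.6%, while lz4 keeps roughly 39% of the uncompressed select throughput.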

Data schema description

In Tarantool, the use of a data schema is optional.

When creating a space, you do not have to define a data schema. In this case, the tuples can store arbitrary data. This rule does not apply to indexed fields: such fields must contain data of the type declared for the index.

You can define a data schema when creating a space. Read more in the description of the box.schema.space.create() function. If you have already created a space without specifying a data schema, you can do it later using space_object:format().

After the data schema is defined, all data is validated by type: any insert or update fails with an error if the data types do not match.

We recommend using a data schema because it helps avoid mistakes.

In Tarantool, you can define a data schema in two different ways.

If you define the schema in Lua code, the code file is usually called init.lua and contains a schema description like the following:

box.cfg()

users = box.schema.create_space('users', { if_not_exists = true })
users:format({{ name = 'user_id', type = 'number'}, { name = 'fullname', type = 'string'}})

users:create_index('pk', { parts = { { field = 'user_id', type = 'number'}}, if_not_exists = true })

This is quite simple: when you run tarantool, it executes this code and creates a data schema. To run this file, use:

tarantool init.lua

However, this approach may seem complicated if you do not plan to dive deep into the Lua language and its syntax.

Possible difficulty: the snippet above has a function call with a colon, users:format. The colon passes the users variable as the first argument of the format function, similar to self in object-oriented languages.
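The colon is ordinary Lua syntax, so it can be tried without Tarantool. In this sketch, users is a plain table standing in for a space object, and describe is a hypothetical method.

```lua
-- users is a plain table standing in for a space object (hypothetical).
local users = { name = 'users', calls = 0 }

-- Declaring with a colon adds an implicit first parameter named self.
function users:describe(prefix)
    self.calls = self.calls + 1
    return prefix .. self.name
end

local a = users:describe('space: ')        -- colon call: users is passed as self
local b = users.describe(users, 'space: ') -- the same call with self written out
print(a == b) -- true
```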

So it might be more convenient for you to describe the data schema with YAML.

The DDL module allows you to describe a data schema in the YAML format in a declarative way.

The schema would look something like this:

spaces:
  users:
    engine: memtx
    is_local: false
    temporary: false
    format:
    - {name: user_id, type: uuid, is_nullable: false}
    - {name: fullname, type: string,  is_nullable: false}
    - {name: bucket_id, type: unsigned, is_nullable: false}
    indexes:
    - name: user_id
      unique: true
      parts: [{path: user_id, type: uuid, is_nullable: false}]
      type: HASH
    - name: bucket_id
      unique: false
      parts: [{path: bucket_id, type: unsigned, is_nullable: false}]
      type: TREE
    sharding_key: [user_id]
    sharding_func: test_module.sharding_func

This alternative is simpler to use, and you do not have to dive deep into Lua.

To use the DDL module, put the following Lua code into the file that you use to run Tarantool. This file is usually called init.lua.

local yaml = require('yaml')
local ddl = require('ddl')

box.cfg{}

local fh = assert(io.open('ddl.yml', 'r'))  -- fail early if the file is missing
local schema = yaml.decode(fh:read('*a'))
fh:close()

-- Validate the schema first and apply it only if the check passes
local ok, err = ddl.check_schema(schema)
if not ok then
    error(err)
end
ok, err = ddl.set_schema(schema)
if not ok then
    error(err)
end

Warning

Modifying the data schema with DDL after it has been applied is forbidden. For migrations, see the scenarios described in the Migrations section.

Operations

The basic data operations supported in Tarantool are INSERT, UPDATE, UPSERT, DELETE, and REPLACE (data manipulation) and SELECT (data retrieval).

All of them are implemented as functions in the box.space submodule.

Examples:

Summarizing the examples:

See reference on box.space for more details on using data operations.

Note

Besides Lua, you can use Perl, PHP, Python, or other programming language connectors. The client-server protocol is open and documented. See this annotated BNF.

In the reference for the box.space and box.index submodules, there are notes about which complexity factors might affect the resource usage of each function.

Complexity factor | Effect
Index size | The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although, of course, the effect is not linear. For a HASH index, if there are more keys, then there is more RAM used, but the number of low-level steps tends to remain constant.
Index type | Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one.
Number of indexes accessed | Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes. Note regarding storage engine: Vinyl optimizes away such accesses if secondary index fields are unchanged by the update. So, this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update.
Number of tuples accessed | A few requests, for example, SELECT, can retrieve multiple tuples. This factor is usually less important than the others.
WAL settings | The important setting for the write-ahead log is wal.mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others.

CRUD operation examples

This section shows basic usage scenarios and typical errors for each data operation in Tarantool: INSERT, DELETE, UPDATE, UPSERT, REPLACE, and SELECT. Before trying out the examples, you need to bootstrap a Tarantool instance as shown below.

-- Create a space --
bands = box.schema.space.create('bands')

-- Specify field names and types --
box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})

-- Create a primary index --
box.space.bands:create_index('primary', { parts = { 'id' } })

-- Create a unique secondary index --
box.space.bands:create_index('band', { parts = { 'band_name' } })

-- Create a non-unique secondary index --
box.space.bands:create_index('year', { parts = { { 'year' } }, unique = false })

-- Create a multi-part index --
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })

The space_object.insert method accepts a well-formatted tuple.

-- Insert a tuple with a unique primary key --
tarantool> bands:insert{1, 'Scorpions', 1965}
---
- [1, 'Scorpions', 1965]
...

insert also checks all the keys for duplicates.

-- Try to insert a tuple with a duplicate primary key --
tarantool> bands:insert{1, 'Scorpions', 1965}
---
- error: Duplicate key exists in unique index "primary" in space "bands" with old
    tuple - [1, "Scorpions", 1965] and new tuple - [1, "Scorpions", 1965]
...

-- Try to insert a tuple with a duplicate secondary key --
tarantool> bands:insert{2, 'Scorpions', 1965}
---
- error: Duplicate key exists in unique index "band" in space "bands" with old tuple
    - [1, "Scorpions", 1965] and new tuple - [2, "Scorpions", 1965]
...

-- Insert a second tuple with unique primary and secondary keys --
tarantool> bands:insert{2, 'Pink Floyd', 1965}
---
- [2, 'Pink Floyd', 1965]
...

-- Delete all tuples --
tarantool> bands:truncate()
---
...

space_object.delete allows you to delete a tuple identified by the primary key.

-- Insert test data --
tarantool> bands:insert{1, 'Roxette', 1986}
           bands:insert{2, 'Scorpions', 1965}
           bands:insert{3, 'Ace of Base', 1987}
           bands:insert{4, 'The Beatles', 1960}

-- Delete a tuple with an existing key --
tarantool> bands:delete{4}
---
- [4, 'The Beatles', 1960]
...
tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'Ace of Base', 1987]
...

You can also use index_object.delete to delete a tuple by the specified unique index.

-- Delete a tuple by the primary index --
tarantool> bands.index.primary:delete{3}
---
- [3, 'Ace of Base', 1987]
...
tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
...

-- Delete a tuple by a unique secondary index --
tarantool> bands.index.band:delete{'Scorpions'}
---
- [2, 'Scorpions', 1965]
...
tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
...

-- Try to delete a tuple by a non-unique secondary index --
tarantool> bands.index.year:delete(1986)
---
- error: Get() doesn't support partial keys and non-unique indexes
...
tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
...

-- Try to delete a tuple by a partial key --
tarantool> bands.index.year_band:delete('Roxette')
---
- error: Invalid key part count in an exact match (expected 2, got 1)
...

-- Delete a tuple by a full key --
tarantool> bands.index.year_band:delete{1986, 'Roxette'}
---
- [1, 'Roxette', 1986]
...
tarantool> bands:select()
---
- []
...

-- Delete all tuples --
tarantool> bands:truncate()
---
...

space_object.update allows you to update a tuple identified by the primary key. Similarly to delete, the update method accepts a full key and also an operation to execute.

-- Insert test data --
tarantool> bands:insert{1, 'Roxette', 1986}
           bands:insert{2, 'Scorpions', 1965}
           bands:insert{3, 'Ace of Base', 1987}
           bands:insert{4, 'The Beatles', 1960}

-- Update a tuple with an existing key --
tarantool> bands:update({2}, {{'=', 2, 'Pink Floyd'}})
---
- [2, 'Pink Floyd', 1965]
...

tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'Pink Floyd', 1965]
  - [3, 'Ace of Base', 1987]
  - [4, 'The Beatles', 1960]
...

index_object.update updates a tuple identified by the specified unique index.

-- Update a tuple by the primary index --
tarantool> bands.index.primary:update({2}, {{'=', 2, 'The Rolling Stones'}})
---
- [2, 'The Rolling Stones', 1965]
...

tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'The Rolling Stones', 1965]
  - [3, 'Ace of Base', 1987]
  - [4, 'The Beatles', 1960]
...

-- Update a tuple by a unique secondary index --
tarantool> bands.index.band:update({'The Rolling Stones'}, {{'=', 2, 'The Doors'}})
---
- [2, 'The Doors', 1965]
...

tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'The Doors', 1965]
  - [3, 'Ace of Base', 1987]
  - [4, 'The Beatles', 1960]
...

-- Try to update a tuple by a non-unique secondary index --
tarantool> bands.index.year:update({1965}, {{'=', 2, 'Scorpions'}})
---
- error: Get() doesn't support partial keys and non-unique indexes
...
tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'The Doors', 1965]
  - [3, 'Ace of Base', 1987]
  - [4, 'The Beatles', 1960]
...

-- Delete all tuples --
tarantool> bands:truncate()
---
...

space_object.upsert updates an existing tuple or inserts a new one:

  • If the existing tuple is found by the primary key, Tarantool applies the update operation to this tuple and ignores the new tuple.
  • If no existing tuple is found, Tarantool inserts the new tuple and ignores the update operation.
tarantool> bands:insert{1, 'Scorpions', 1965}
---
- [1, 'Scorpions', 1965]
...
-- As the first argument, upsert accepts a tuple, not a key --
tarantool> bands:upsert({2}, {{'=', 2, 'Pink Floyd'}})
---
- error: Tuple field 2 (band_name) required by space format is missing
...
tarantool> bands:select()
---
- - [1, 'Scorpions', 1965]
...
tarantool> bands:delete(1)
---
- [1, 'Scorpions', 1965]
...

upsert acts as insert when no existing tuple is found by the primary key.

tarantool> bands:upsert({1, 'Scorpions', 1965}, {{'=', 2, 'The Doors'}})
---
...
-- As you can see, {1, 'Scorpions', 1965} is inserted, --
-- and the update operation is not applied. --
tarantool> bands:select()
---
- - [1, 'Scorpions', 1965]
...

-- upsert with the same primary key but different values in other fields --
-- applies the update operation and ignores the new tuple. --
tarantool> bands:upsert({1, 'Scorpions', 1965}, {{'=', 2, 'The Doors'}})
---
...
tarantool> bands:select()
---
- - [1, 'The Doors', 1965]
...
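The two upsert branches can be modeled in plain Lua. In this sketch, a space is a table keyed by the primary key (field 1), only the '=' update operator is supported, and the upsert and apply_ops helpers are hypothetical. The real method supports more operators and also checks the space format and secondary indexes.

```lua
-- Copy a tuple and apply '=' operations: {'=', field_no, new_value}.
local function apply_ops(tuple, ops)
    local copy = {}
    for i, v in ipairs(tuple) do copy[i] = v end
    for _, op in ipairs(ops) do
        assert(op[1] == '=', 'this sketch only models the = operator')
        copy[op[2]] = op[3]
    end
    return copy
end

-- space is a table keyed by the primary key (field 1).
local function upsert(space, new_tuple, ops)
    local key = new_tuple[1]                  -- lookup by primary key only
    local old = space[key]
    if old ~= nil then
        space[key] = apply_ops(old, ops)      -- found: update it, ignore new_tuple
    else
        space[key] = apply_ops(new_tuple, {}) -- not found: insert a copy, ignore ops
    end
end

local bands = {}
upsert(bands, { 1, 'Scorpions', 1965 }, { { '=', 2, 'The Doors' } })
print(bands[1][2]) -- Scorpions
upsert(bands, { 1, 'Scorpions', 1965 }, { { '=', 2, 'The Doors' } })
print(bands[1][2]) -- The Doors
```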

upsert searches for the existing tuple by the primary index, not by a secondary index. This can lead to a duplicate key error if the new tuple violates the uniqueness of a secondary index.

tarantool> bands:upsert({2, 'The Doors', 1965}, {{'=', 2, 'Pink Floyd'}})
---
- error: Duplicate key exists in unique index "band" in space "bands" with old tuple
    - [1, "The Doors", 1965] and new tuple - [2, "The Doors", 1965]
...
tarantool> bands:select()
---
- - [1, 'The Doors', 1965]
...

-- This works if uniqueness is preserved. --
tarantool> bands:upsert({2, 'The Beatles', 1960}, {{'=', 2, 'Pink Floyd'}})
---
...
tarantool> bands:select()
---
- - [1, 'The Doors', 1965]
  - [2, 'The Beatles', 1960]
...

-- Delete all tuples --
tarantool> bands:truncate()
---
...

space_object.replace accepts a well-formatted tuple and searches for the existing tuple by the primary key of the new tuple:

  • If the existing tuple is found, Tarantool deletes it and inserts the new tuple.
  • If no existing tuple is found, Tarantool inserts the new tuple.
tarantool> bands:replace{1, 'Scorpions', 1965}
---
- [1, 'Scorpions', 1965]
...
tarantool> bands:select()
---
- - [1, 'Scorpions', 1965]
...
tarantool> bands:replace{1, 'The Beatles', 1960}
---
- [1, 'The Beatles', 1960]
...
tarantool> bands:select()
---
- - [1, 'The Beatles', 1960]
...
tarantool> bands:truncate()
---
...

Like upsert, replace can fail with a duplicate key error if the new tuple violates the uniqueness of a secondary index.

tarantool> bands:insert{1, 'Scorpions', 1965}
---
- [1, 'Scorpions', 1965]
...
tarantool> bands:insert{2, 'The Beatles', 1960}
---
- [2, 'The Beatles', 1960]
...
tarantool> bands:replace{2, 'Scorpions', 1965}
---
- error: Duplicate key exists in unique index "band" in space "bands" with old tuple
    - [1, "Scorpions", 1965] and new tuple - [2, "Scorpions", 1965]
...
tarantool> bands:truncate()
---
...
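A plain-Lua model helps show why replace, like upsert, can still fail. The primary key only decides which tuple is replaced, while a unique secondary index must stay unique across all tuples. The sketch assumes a hypothetical space with a primary index on field 1 and a unique secondary index on field 2, similar to the band index used in the examples. It illustrates the rule and is not Tarantool's implementation.

```lua
-- Hypothetical model (not the box API): a primary index on field 1
-- and a unique secondary index on field 2, like the band index.
local function new_space()
    return {primary = {}, band = {}}
end

local function replace(space, tuple)
    local pk, name = tuple[1], tuple[2]
    -- The secondary value must not belong to a tuple with another primary key.
    local owner = space.band[name]
    if owner ~= nil and owner[1] ~= pk then
        error('Duplicate key exists in unique index "band"', 0)
    end
    -- Delete the old tuple with the same primary key, if there is one...
    local old = space.primary[pk]
    if old ~= nil then
        space.band[old[2]] = nil
    end
    -- ...then insert the new tuple into both indexes.
    space.primary[pk] = tuple
    space.band[name] = tuple
    return tuple
end

local bands = new_space()
replace(bands, {1, 'Scorpions', 1965})
replace(bands, {2, 'The Beatles', 1960})
local ok, err = pcall(replace, bands, {2, 'Scorpions', 1965})
print(ok, err)  -- false   Duplicate key exists in unique index "band"
```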

The space_object.select request searches for a tuple or a set of tuples in the given space by the primary key. To search by a specific index, use index_object.select. These methods work with any keys: unique and non-unique, full and partial. If a key is partial, select returns all tuples whose key prefix matches the specified key parts.

tarantool> bands:insert{1, 'Roxette', 1986}
           bands:insert{2, 'Scorpions', 1965}
           bands:insert{3, 'The Doors', 1965}
           bands:insert{4, 'The Beatles', 1960}

tarantool> bands:select(1)
---
- - [1, 'Roxette', 1986]
...

tarantool> bands:select()
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'The Doors', 1965]
  - [4, 'The Beatles', 1960]
...

tarantool> bands.index.primary:select(2)
---
- - [2, 'Scorpions', 1965]
...

tarantool> bands.index.band:select('The Doors')
---
- - [3, 'The Doors', 1965]
...

tarantool> bands.index.year:select(1965)
---
- - [2, 'Scorpions', 1965]
  - [3, 'The Doors', 1965]
...
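Partial keys matter for multi-part indexes. The following plain-Lua sketch models prefix matching and is not index_object.select. It assumes a hypothetical composite index on fields 3 and 2 (year, then band name). A key that contains only the year is partial and matches every tuple with that year. Ordering results by the index is not modeled.

```lua
-- Hypothetical model of selecting by a full or partial key.
-- parts: the field numbers that form the index key, in order.
local function key_matches(tuple, parts, key)
    -- A partial key has fewer elements than parts; only that prefix is compared.
    for i = 1, #key do
        if tuple[parts[i]] ~= key[i] then
            return false
        end
    end
    return true
end

local function select_by_key(tuples, parts, key)
    local result = {}
    for _, t in ipairs(tuples) do
        if key_matches(t, parts, key) then
            table.insert(result, t)
        end
    end
    return result
end

local data = {
    {1, 'Roxette', 1986},
    {2, 'Scorpions', 1965},
    {3, 'The Doors', 1965},
    {4, 'The Beatles', 1960},
}
local year_name = {3, 2}  -- hypothetical composite key: year, then band name
print(#select_by_key(data, year_name, {1965}))               -- partial key: 2
print(#select_by_key(data, year_name, {1965, 'The Doors'}))  -- full key: 1
```

An empty key matches every tuple, which corresponds to calling select() without a key.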

This example illustrates how to look at all the spaces and, for each one, display whether it contains any tuples and the first field of its first tuple. The function uses the box.space function pairs(). The iteration through the spaces is coded as a scan of the _space system space, which contains metadata. The third field in _space contains the space name, so the key instruction space_name = v[3] means that space_name is the name field in the _space tuple that we’ve just fetched with pairs(). The function returns a table:

function example()
  local ta = {}
  for _, v in box.space._space:pairs() do
    local space_name = v[3]
    local space = box.space[space_name]
    -- Fetch the first tuple, if any. A space without a primary
    -- index (index[0]) holds no tuples and cannot be iterated.
    local first_tuple = nil
    if space.index[0] ~= nil then
      for _, t in space:pairs() do
        first_tuple = t
        break
      end
    end
    local line = space_name .. ' tuple_count ='
    if first_tuple ~= nil then
      line = line .. '1 or more. first field in first tuple = '
                  .. tostring(first_tuple[1])
    else
      line = line .. '0'
    end
    table.insert(ta, line)
  end
  return ta
end

The output below shows what happens if you invoke this function:

tarantool> example()
---
- - _schema tuple_count =1 or more. first field in first tuple = cluster
  - _space tuple_count =1 or more. first field in first tuple = 272
  - _vspace tuple_count =1 or more. first field in first tuple = 272
  - _index tuple_count =1 or more. first field in first tuple = 272
  - _vindex tuple_count =1 or more. first field in first tuple = 272
  - _func tuple_count =1 or more. first field in first tuple = 1
  - _vfunc tuple_count =1 or more. first field in first tuple = 1
  - _user tuple_count =1 or more. first field in first tuple = 0
  - _vuser tuple_count =1 or more. first field in first tuple = 0
  - _priv tuple_count =1 or more. first field in first tuple = 1
  - _vpriv tuple_count =1 or more. first field in first tuple = 1
  - _cluster tuple_count =1 or more. first field in first tuple = 1
...

This example shows how to display the field names and field types of a system space, using metadata to find metadata.

To begin: how can you select the tuple in _space that describes _space itself?

A simple way is to look at the constants in box.schema, which show that there is an item named SPACE_ID == 280, so these statements retrieve the correct tuple:

box.space._space:select{ 280 }
-- or --
box.space._space:select{ box.schema.SPACE_ID }

Another way is to look at the tuples in box.space._index, which show that there is a secondary index named 'name' for space number 280, so this statement also retrieves the correct tuple:

box.space._space.index.name:select{ '_space' }

However, the information in the retrieved tuple is hard to read:

tarantool> box.space._space.index.name:select{'_space'}
---
- - [280, 1, '_space', 'memtx', 0, {}, [{'name': 'id', 'type': 'num'}, {'name': 'owner',
        'type': 'num'}, {'name': 'name', 'type': 'str'}, {'name': 'engine', 'type': 'str'},
      {'name': 'field_count', 'type': 'num'}, {'name': 'flags', 'type': 'str'}, {
        'name': 'format', 'type': '*'}]]
...

The output is hard to read because, according to the format, field 7 contains the recommended field names and data types. How can you get this data in a readable form? Field 7 is an array of maps, one map per field, so a for loop can organize the data:

tarantool> do
         >   local tuple_of_space = box.space._space.index.name:get{'_space'}
         >   for _, field in ipairs(tuple_of_space[7]) do
         >     print(field.name .. ', ' .. field.type)
         >   end
         > end
id, num
owner, num
name, str
engine, str
field_count, num
flags, str
format, *
---
...

Using sequences

A sequence is a generator of ordered integer values.

As with spaces and indexes, you should specify the sequence name and let Tarantool generate a unique numeric identifier (sequence ID).

You can also specify several options when creating a new sequence. The options determine the values that are generated whenever the sequence is used.

The available options, with their types, default values, and examples:

  • start: integer. The value to generate the first time the sequence is used. Default: 1. Example: start=0
  • min: integer. Values smaller than this cannot be generated. Default: 1. Example: min=-1000
  • max: integer. Values larger than this cannot be generated. Default: 9223372036854775807. Example: max=0
  • cycle: boolean. Whether to start again when values cannot be generated. Default: false. Example: cycle=true
  • cache: integer. The number of values to store in a cache. Default: 0. Example: cache=0
  • step: integer. What to add to the previous generated value when generating a new value. Default: 1. Example: step=-1
  • if_not_exists: boolean. If this is true and a sequence with this name already exists, ignore other options and use the existing values. Default: false. Example: if_not_exists=true
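To show how start, min, max, step, and cycle interact, here is a small plain-Lua model of a sequence. It is a sketch under stated assumptions, not box.schema.sequence: the first call returns start, each later call adds step, and a value outside [min, max] either wraps around (with cycle=true, to min for a positive step and to max for a negative one) or raises an error. The cache and if_not_exists options and integer overflow near the 64-bit limits are not modeled.

```lua
-- Hypothetical plain-Lua model of sequence options (not the box API).
local function new_sequence(opts)
    opts = opts or {}
    return {
        start = opts.start or 1,
        min = opts.min or 1,
        max = opts.max or 9223372036854775807,
        step = opts.step or 1,
        cycle = opts.cycle or false,
        current = nil,  -- no value has been generated yet
    }
end

local function next_value(seq)
    local value
    if seq.current == nil then
        value = seq.start
    else
        value = seq.current + seq.step
    end
    if value > seq.max or value < seq.min then
        if not seq.cycle then
            error('sequence has overflowed', 0)
        end
        -- Start again from the opposite end of the range.
        if seq.step > 0 then
            value = seq.min
        else
            value = seq.max
        end
    end
    seq.current = value
    return value
end

local id_seq = new_sequence({min = 1000, start = 1000})
local first = next_value(id_seq)
local second = next_value(id_seq)
print(first, second)  -- 1000  1001

local small = new_sequence({min = 1, max = 3, cycle = true})
local values = {}
for i = 1, 4 do
    values[i] = next_value(small)
end
print(table.concat(values, ' '))  -- 1 2 3 1
```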

Once a sequence exists, it can be altered, dropped, reset, forced to generate the next value, or associated with an index.

First, create a sequence:

-- Create a sequence --
box.schema.sequence.create('id_seq',{min=1000, start=1000})
--[[
---
- step: 1
  id: 1
  min: 1000
  cache: 0
  uid: 1
  cycle: false
  name: id_seq
  start: 1000
  max: 9223372036854775807
...
--]]

The result shows that the new sequence has all default values, except for the two that were specified, min and start.

Get the next value from the sequence by calling the next() function:

-- Get the next item --
box.sequence.id_seq:next()
--[[
---
- 1000
...
--]]

The result is the same as the start value. The next call increases the value by one (the default sequence step).

Create a space and specify that its primary key should be generated from the sequence:

-- Create a space --
box.schema.space.create('customers')

-- Create an index that uses the sequence --
box.space.customers:create_index('primary',{ sequence = 'id_seq' })
--[[
---
- parts:
  - type: unsigned
    is_nullable: false
    fieldno: 1
  sequence_id: 1
  id: 0
  space_id: 513
  unique: true
  hint: true
  type: TREE
  name: primary
  sequence_fieldno: 1
...
--]]

Insert a tuple without specifying a value for the primary key:

-- Insert a tuple without the primary key value --
box.space.customers:insert{ nil, 'Adams' }
--[[
---
- [1001, 'Adams']
...
--]]

The result is a new tuple where the first field is assigned the next value from the sequence. This arrangement, where the system automatically generates the values for a primary key, is sometimes called "auto-incrementing" or "identity".

For syntax and implementation details, see the reference for box.schema.sequence.

Migrations

Migration refers to any change in a data schema: adding or removing a field, creating or dropping an index, changing a field format, and so on. Space creation is also a migration. Using migrations, you can track the evolution of your data schema since its initial state. In Tarantool, migrations are presented as Lua code that alters the data schema using the built-in Lua API.

There are two types of migrations:

There are two types of schema migration that do not require data migration:

Other types of migrations are more complex and require additional actions to maintain data consistency.

Migrations are possible in two cases:

For the first case, it is enough to write and test the migration code. The most difficult task is to migrate data when there are active clients. You should keep it in mind when you initially design the data schema.

We identify the following problems if there are active clients:

These issues may or may not be relevant depending on your application and its availability requirements.

Tarantool offers the following features that make migrations easier and safer:

The migration code is executed on a running Tarantool instance. Important: none of these methods guarantees transactional application of migrations across the whole cluster.

Method 1: include migrations in the application code

This is quite simple: when you reload the code, the data is migrated at the right moment, and the database schema is updated. However, this method may not work for everyone. You may not be able to restart Tarantool or update the code using the hot-reload mechanism.
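A common way to organize migrations inside application code is an ordered list of steps plus a record of the steps that have already run, so that each code reload applies only new ones. The plain-Lua sketch below shows the pattern. migrations, run_pending, db, and applied are hypothetical stand-ins. In a real instance, the record must survive restarts (for example, by keeping it in a space), and Tarantool's box.once() provides a built-in way to run a function only once per key.

```lua
-- Hypothetical ordered list of migrations kept in the application code.
-- 'db' stands in for the database; each step mutates it.
local migrations = {
    {name = '0001_create_space', fn = function(db) db.spaces.bands = {} end},
    {name = '0002_create_index', fn = function(db) db.indexes.band = true end},
}

-- Run every step that is not recorded in 'applied', in list order.
local function run_pending(db, applied)
    local ran = {}
    for _, m in ipairs(migrations) do
        if not applied[m.name] then
            m.fn(db)
            -- Record the step only after it has succeeded.
            applied[m.name] = true
            table.insert(ran, m.name)
        end
    end
    return ran
end

local db = {spaces = {}, indexes = {}}
local applied = {}
print(#run_pending(db, applied))  -- 2: both steps run on the first load
print(#run_pending(db, applied))  -- 0: a code reload applies nothing new
```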

Method 2: the tt utility

Connect to the necessary instance using tt connect.

$ tt connect admin:password@localhost:3301

You can also connect to the instance and execute the migration script in a single call:

$ tt connect admin:password@localhost:3301 -f 0001-delete-space.lua

Enterprise Edition

Centralized migration management is available in the Enterprise Edition only.

Tarantool EE offers a mechanism for centralized migration management in replication clusters that use etcd as a configuration storage. The mechanism uses the same etcd storage to store migrations and applies them across the entire Tarantool cluster. This ensures migration consistency in the cluster and enables migration history tracking.

The centralized migration management mechanism is implemented in the Enterprise version of the tt utility and in Tarantool Cluster Manager.

To learn how to manage migrations in Tarantool EE clusters from the command line, see Centralized migrations with tt. To learn how to use the mechanism from the TCM web interface, see the Performing migrations TCM documentation page.

Centralized migrations with tt

Example on GitHub: migrations

In this section, you learn to use the centralized migration management mechanism implemented in the Enterprise Edition of the tt utility.

The section includes the following tutorials:

See also:

Basic tt migrations tutorial

Example on GitHub: migrations

In this tutorial, you learn to define the cluster data schema using the centralized migration management mechanism implemented in the Enterprise Edition of the tt utility.

Before starting this tutorial:

The centralized migration mechanism works with Tarantool EE clusters that:

First, start up an etcd instance to use as a configuration storage:

$ etcd

etcd runs on the default port 2379.

Optionally, enable etcd authentication by executing the following script:

#!/usr/bin/env bash

etcdctl user add root:topsecret
etcdctl role add app_config_manager
etcdctl role grant-permission app_config_manager --prefix=true readwrite /myapp/
etcdctl user add app_user:config_pass
etcdctl user grant-role app_user app_config_manager
etcdctl auth enable

It creates an etcd user app_user with read and write permissions to the /myapp prefix, in which the cluster configuration will be stored. The user’s password is config_pass.

Note

If you don’t enable etcd authentication, make tt migrations calls without the configuration storage credentials.

  1. Initialize a tt environment:

    $ tt init
    
  2. In the instances.enabled directory, create the myapp directory.

  3. Go to the instances.enabled/myapp directory and create application files:

  • instances.yml:
router-001-a:
storage-001-a:
storage-001-b:
storage-002-a:
storage-002-b:
  • config.yaml:
config:
  etcd:
    endpoints:
    - http://localhost:2379
    prefix: /myapp/
    username: app_user
    password: config_pass
    http:
      request:
        timeout: 3
  • myapp-scm-1.rockspec:

    package = 'myapp'
    version = 'scm-1'
    
    source  = {
        url = '/dev/null',
    }
    
    dependencies = {
        'crud == 1.5.2',
    }
    
    build = {
        type = 'none';
    }
    
  1. Create the source.yaml with a cluster configuration to publish to etcd:

    Note

    This configuration describes a typical CRUD-enabled sharded cluster with one router and two storage replica sets, each including one master and one read-only replica.

    credentials:
      users:
        client:
          password: 'secret'
          roles: [super]
        replicator:
          password: 'secret'
          roles: [replication]
        storage:
          password: 'secret'
          roles: [sharding]
    
    iproto:
      advertise:
        peer:
          login: replicator
        sharding:
          login: storage
    
    sharding:
      bucket_count: 3000
    
    groups:
      routers:
        sharding:
          roles: [router]
        roles: [roles.crud-router]
        replicasets:
          router-001:
            instances:
              router-001-a:
                iproto:
                  listen:
                  - uri: localhost:3301
                  advertise:
                    client: localhost:3301
      storages:
        sharding:
          roles: [storage]
        roles: [roles.crud-storage]
        replication:
          failover: manual
        replicasets:
          storage-001:
            leader: storage-001-a
            instances:
              storage-001-a:
                iproto:
                  listen:
                    - uri: localhost:3302
                  advertise:
                    client: localhost:3302
              storage-001-b:
                iproto:
                  listen:
                  - uri: localhost:3303
                  advertise:
                    client: localhost:3303
          storage-002:
            leader: storage-002-a
            instances:
              storage-002-a:
                iproto:
                  listen:
                  - uri: localhost:3304
                  advertise:
                    client: localhost:3304
              storage-002-b:
                iproto:
                  listen:
                  - uri: localhost:3305
                  advertise:
                    client: localhost:3305
    
  2. Publish the configuration to etcd:

    $ tt cluster publish "http://app_user:config_pass@localhost:2379/myapp/" source.yaml
    

The full cluster code is available on GitHub here: migrations.

  1. Build the application:

    $ tt build myapp
    
  2. Start the cluster:

    $ tt start myapp
    

    To check that the cluster is up and running, use tt status:

    $ tt status myapp
    
  3. Bootstrap vshard in the cluster:

    $ tt replicaset vshard bootstrap myapp
    

To perform migrations in the cluster, write them in Lua and publish them to the cluster’s etcd configuration storage.

Each migration file must return a Lua table with one object named apply. This object has one field – scenario – that stores the migration function:

local function apply_scenario()
    -- migration code
end

return {
    apply = {
        scenario = apply_scenario,
    },
}

The migration unit is a single file: its scenario is executed as a whole. An error that happens in any step of the scenario causes the entire migration to fail.
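Before publishing, it can be handy to check a migration file's shape and see how a failing step surfaces. The plain-Lua sketch below runs outside Tarantool and is hypothetical. check_migration verifies the apply.scenario structure described above, and run_scenario calls the scenario under pcall, so an error raised at any step reports the whole migration as failed. It does not reproduce how tt actually executes or records migrations.

```lua
-- Hypothetical local helpers; tt does not provide or use them.
-- Check the file structure: a table with apply.scenario as a function.
local function check_migration(m)
    return type(m) == 'table'
        and type(m.apply) == 'table'
        and type(m.apply.scenario) == 'function'
end

-- Run the scenario as a single unit: an error at any step fails it.
local function run_scenario(m)
    if not check_migration(m) then
        return false, 'invalid migration structure'
    end
    local ok, err = pcall(m.apply.scenario)
    if not ok then
        return false, err
    end
    return true
end

-- A hypothetical migration whose second step fails.
local steps = {}
local broken = {
    apply = {
        scenario = function()
            table.insert(steps, 'step 1')
            error('step 2 failed', 0)
        end,
    },
}
print(run_scenario(broken))  -- false   step 2 failed
print(#steps)                -- 1: step 1 ran and was not undone
```

pcall does not undo step 1. This mirrors the warning in Troubleshooting migrations: changes made before a failure are not reverted automatically.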

Migrations are executed in lexicographical order. Thus, it’s convenient to use filenames that start with ordered numbers to define the migration order, for example:

000001_create_space.lua
000002_create_index.lua
000003_alter_space.lua
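Because the order is lexicographical (a string comparison), numeric prefixes need leading zeros. The plain-Lua sketch below uses table.sort as a stand-in for that ordering. Its default string comparison follows byte order in the C locale. The unpadded file names are hypothetical. The sketch assumes that tt compares names the same way, which matches the lexicographical order described above but is not taken from tt's source.

```lua
-- Sort names the way a plain string comparison does (byte order in the C locale).
local function sorted(names)
    local copy = {}
    for i, name in ipairs(names) do
        copy[i] = name
    end
    table.sort(copy)
    return copy
end

local padded = sorted({
    '000003_alter_space.lua',
    '000001_create_space.lua',
    '000002_create_index.lua',
})
print(table.concat(padded, ', '))
-- 000001_create_space.lua, 000002_create_index.lua, 000003_alter_space.lua

-- Hypothetical unpadded names: '10_...' sorts before '9_...' because '1' < '9'.
local unpadded = sorted({'9_add_field.lua', '10_drop_index.lua'})
print(table.concat(unpadded, ', '))
-- 10_drop_index.lua, 9_add_field.lua
```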

The default location where tt searches for migration files is /migrations/scenario. Create this subdirectory inside the tt environment. Then, create two migration files:

To publish migrations to the etcd configuration storage, run tt migrations publish:

$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp"
   • 000001_create_writes_space.lua: successfully published to key "000001_create_writes_space.lua"
   • 000002_create_writers_index.lua: successfully published to key "000002_create_writers_index.lua"

To apply published migrations to the cluster, run tt migrations apply providing a cluster user’s credentials:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret

Important

The cluster user must have enough access privileges to execute the migrations code.

The output should look as follows:

• router-001:
•     000001_create_writes_space.lua: successfully applied
•     000002_create_writers_index.lua: successfully applied
• storage-001:
•     000001_create_writes_space.lua: successfully applied
•     000002_create_writers_index.lua: successfully applied
• storage-002:
•     000001_create_writes_space.lua: successfully applied
•     000002_create_writers_index.lua: successfully applied

The migrations are applied on all replica set leaders. Read-only replicas receive the changes from the corresponding replica set leaders.

Check the migration status with tt migrations status:

$ tt migrations status "http://app_user:config_pass@localhost:2379/myapp" \
                       --tarantool-username=client --tarantool-password=secret
   • migrations centralized storage scenarios:
   •   000001_create_writes_space.lua
   •   000002_create_writers_index.lua
   • migrations apply status on Tarantool cluster:
   •   router-001:
   •     000001_create_writes_space.lua: APPLIED
   •     000002_create_writers_index.lua: APPLIED
   •   storage-001:
   •     000001_create_writes_space.lua: APPLIED
   •     000002_create_writers_index.lua: APPLIED
   •   storage-002:
   •     000001_create_writes_space.lua: APPLIED
   •     000002_create_writers_index.lua: APPLIED

To make sure that the space and indexes are created in the cluster, connect to the router instance and retrieve the space information:

$ tt connect myapp:router-001-a
myapp:router-001-a> require('crud').schema('writers')
---
- indexes:
    0:
      unique: true
      parts:
      - fieldno: 1
        type: number
        exclude_null: false
        is_nullable: false
      id: 0
      type: TREE
      name: primary
    2:
      unique: true
      parts:
      - fieldno: 4
        type: number
        exclude_null: false
        is_nullable: false
      id: 2
      type: TREE
      name: age
  format: [{'name': 'id', 'type': 'number'}, {'type': 'number', 'name': 'bucket_id',
      'is_nullable': true}, {'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'number'}]
...

Learn to write and perform data migration in Data migrations with space.upgrade().

Data migrations with space.upgrade()

Example on GitHub: migrations

In this tutorial, you learn to write migrations that include data migration using the space.upgrade() function.

Before starting this tutorial, complete the Basic tt migrations tutorial. As a result, you have a sharded Tarantool EE cluster that uses an etcd-based configuration storage. The cluster has a space with two indexes.

Complex migrations require data migration along with schema migration. Connect to the router instance and insert some tuples into the space before proceeding to the next steps.

$ tt connect myapp:router-001-a
myapp:router-001-a> require('crud').insert_object_many('writers', {
    {id = 1, name = 'Haruki Murakami', age = 75},
    {id = 2, name = 'Douglas Adams', age = 49},
    {id = 3, name = 'Eiji Mikage', age = 41},
}, {noreturn = true})

The next migration changes the space format incompatibly: instead of one name field, the new format includes two fields, first_name and last_name. To apply this migration, you need to change each tuple’s structure while preserving the stored data. The space:upgrade() function helps with this task.

Create a new file 000003_alter_writers_space.lua in /migrations/scenario. Prepare its initial structure the same way as in previous migrations:

local function apply_scenario()
--  migration code
end
return {
    apply = {
        scenario = apply_scenario,
    },
}

Start the migration function with the new format description:

local function apply_scenario()
    local space = box.space['writers']
    local new_format = {
        {name = 'id', type = 'number'},
        {name = 'bucket_id', type = 'number'},
        {name = 'first_name', type = 'string'},
        {name = 'last_name', type = 'string'},
        {name = 'age', type = 'number'},
    }
    box.space.writers.index.age:drop()

Note

box.space.writers.index.age:drop() drops an existing index. This is done because indexes rely on field numbers and may break during this format change. If you need the age field indexed, recreate the index after applying the new format.

Next, create a stored function that transforms tuples to fit the new format. In this case, the function extracts the first and the last name from the name field and returns a tuple of the new format:

box.schema.func.create('_writers_split_name', {
    language = 'lua',
    is_deterministic = true,
    body = [[
    function(t)
        local name = t[3]

        local split_data = {}
        local split_regex = '([^%s]+)'
        for v in string.gmatch(name, split_regex) do
            table.insert(split_data, v)
        end

        local first_name = split_data[1]
        assert(first_name ~= nil)

        local last_name = split_data[2]
        assert(last_name ~= nil)

        return {t[1], t[2], first_name, last_name, t[4]}
    end
    ]],
})
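Before publishing, the transformation can be tried out in plain Lua. The sketch below copies the splitting logic of the function body above into a local function and runs it on a plain table shaped like an old-format tuple (id, bucket_id, name, age). The bucket_id value is arbitrary. The #t == 5 guard is a hypothetical addition, not part of the documented function. It returns already-converted tuples unchanged, which is one way to meet the idempotency requirement described in Upgrading space schema. A single-word name fails the assertion, and that would fail the whole migration.

```lua
-- Plain-Lua copy of the splitting logic from _writers_split_name.
-- Tuples are plain tables here, not box tuples.
local function split_name(t)
    -- Hypothetical guard (not in the documented function): a tuple that
    -- already has the new five-field shape is returned unchanged.
    if #t == 5 then
        return t
    end
    local parts = {}
    for v in string.gmatch(t[3], '([^%s]+)') do
        table.insert(parts, v)
    end
    local first_name, last_name = parts[1], parts[2]
    assert(first_name ~= nil, 'name has no first part')
    assert(last_name ~= nil, 'name has no second part')
    return {t[1], t[2], first_name, last_name, t[4]}
end

local old_tuple = {2, 401, 'Douglas Adams', 49}
local converted = split_name(old_tuple)
print(table.concat(converted, ' '))            -- 2 401 Douglas Adams 49
print(split_name(converted) == converted)      -- true: already converted
print(pcall(split_name, {5, 1, 'Plato', 80}))  -- false, plus the assertion message
```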

Finally, call space:upgrade() with the new format and the transformation function as its arguments. Here is the complete migration code:

local function apply_scenario()
    local space = box.space['writers']
    local new_format = {
        {name = 'id', type = 'number'},
        {name = 'bucket_id', type = 'number'},
        {name = 'first_name', type = 'string'},
        {name = 'last_name', type = 'string'},
        {name = 'age', type = 'number'},
    }
    box.space.writers.index.age:drop()

    box.schema.func.create('_writers_split_name', {
        language = 'lua',
        is_deterministic = true,
        body = [[
        function(t)
            local name = t[3]

            local split_data = {}
            local split_regex = '([^%s]+)'
            for v in string.gmatch(name, split_regex) do
                table.insert(split_data, v)
            end

            local first_name = split_data[1]
            assert(first_name ~= nil)

            local last_name = split_data[2]
            assert(last_name ~= nil)

            return {t[1], t[2], first_name, last_name, t[4]}
        end
        ]],
    })

    local future = space:upgrade({
        func = '_writers_split_name',
        format = new_format,
    })

    future:wait()
end

return {
    apply = {
        scenario = apply_scenario,
    },
}

Learn more about space.upgrade() in Upgrading space schema.

Publish the new migration to etcd.

$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
                        migrations/scenario/000003_alter_writers_space.lua

Note

You can also publish all migrations from the default location /migrations/scenario. All other migrations stored in this directory are already published, so tt skips them.

$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp"

Apply the published migrations:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret

Connect to the router instance and check that the space and its tuples have the new format:

$ tt connect myapp:router-001-a
myapp:router-001-a> require('crud').get('writers', 2)
---
- rows: [2, 401, 'Douglas', 'Adams', 49]
  metadata: [{'name': 'id', 'type': 'number'}, {'name': 'bucket_id', 'type': 'number'},
    {'name': 'first_name', 'type': 'string'}, {'name': 'last_name', 'type': 'string'},
    {'name': 'age', 'type': 'number'}]
- null
...

Learn to use migrations for data schema definition on new instances added to the cluster in Extending the cluster.

Extending the cluster

Example on GitHub: migrations

In this tutorial, you learn how to consistently define the data schema on newly added cluster instances using the centralized migration management mechanism.

Before starting this tutorial, complete the Basic tt migrations tutorial and Data migrations with space.upgrade(). As a result, you have a sharded Tarantool EE cluster that uses an etcd-based configuration storage. The cluster has a space with two indexes.

Having all migrations in a centralized etcd storage, you can extend the cluster and consistently define the data schema on new instances on the fly.

Add one more storage replica set to the cluster. To do this, edit the cluster files in instances.enabled/myapp:

Publish the new cluster configuration to etcd:

$ tt cluster publish "http://app_user:config_pass@localhost:2379/myapp/" source.yaml

Run tt start to start up the new instances:

$ tt start myapp
   • The instance myapp:router-001-a (PID = 61631) is already running.
   • The instance myapp:storage-001-a (PID = 61632) is already running.
   • The instance myapp:storage-001-b (PID = 61634) is already running.
   • The instance myapp:storage-002-a (PID = 61639) is already running.
   • The instance myapp:storage-002-b (PID = 61640) is already running.
   • Starting an instance [myapp:storage-003-a]...
   • Starting an instance [myapp:storage-003-b]...

Now the cluster contains three storage replica sets.

The new replica set, storage-003, has just started and has no data schema yet. Apply all stored migrations to the cluster to load the same data schema to the new replica set:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret \
                      --replicaset=storage-003

Note

You can also apply migrations without specifying the replica set. All published migrations are already applied on other replica sets, so tt skips the operation on them.

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret

To make sure that the space exists on the new instances, connect to storage-003-a and check box.space.writers:

$ tt connect myapp:storage-003-a
myapp:storage-003-a> box.space.writers ~= nil
---
- true
...

Troubleshooting migrations

The centralized migration mechanism lets you troubleshoot migration issues using dedicated tt migrations options. When troubleshooting migrations, remember that any unfinished or failed migration can leave the data schema in an inconsistent state. Additional steps may be needed to fix this.

Warning

The options used for migration troubleshooting can cause migration inconsistency in the cluster. Use them only for local development and testing purposes.

If an incorrect migration was published to etcd but wasn’t applied yet, fix the migration file and publish it again with the --overwrite option:

$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
                        000001_create_space.lua --overwrite

If the migration that needs a fix isn’t the last one in lexicographical order, also add --ignore-order-violation:

$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
                        000001_create_space.lua --overwrite --ignore-order-violation

If a migration was published by mistake and wasn’t applied yet, you can delete it from etcd using tt migrations remove:

$ tt migrations remove "http://app_user:config_pass@localhost:2379/myapp" \
                    --migration 000003_not_needed.lua

Warning

Any schema change made by an incorrect migration before it failed or was cancelled must be resolved manually on each replica set before you reapply the migration. --force-reapply and other tt migrations options affect only the internal status of the migration and don’t revert the changes it has made in the cluster.

If the migration is already applied, publish the fixed version and apply it with the --force-reapply option:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret \
                      --force-reapply

If execution of the incorrect migration version has failed, you may also need to add the --ignore-preceding-status option. When you reapply a migration, tt checks the statuses of preceding migrations to ensure consistency, and this option skips that check:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret \
                      --migration=000003_alter_space.lua \
                      --force-reapply --ignore-preceding-status

To interrupt migration execution on the cluster, use tt migrations stop:

$ tt migrations stop "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret

You can adjust the maximum migration execution time using the --execution-timeout option of tt migrations apply:

$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
                      --tarantool-username=client --tarantool-password=secret \
                      --execution-timeout=60

Note

If a migration timeout is reached, you may need to call tt migrations stop to cancel requests that were sent when applying migrations.

Upgrading space schema

Enterprise Edition

space:upgrade() is available in the Enterprise Edition only.

In Tarantool, migration refers to any change in a data schema, for example, creating an index, adding a field, or changing a field format. If you need to change a data schema, there are several possible cases:

To solve the task of migrating the data, you can:

The space:upgrade() feature allows users to upgrade the format of a space and the tuples stored in it without blocking the database.

First, specify an upgrade function – a function that will convert the tuples in the space to a new format. The requirements for this function are listed below.

  • The upgrade function takes two arguments. The first argument is the tuple to be upgraded. The second argument is optional and contains additional information stored in a plain Lua object. If it is omitted, the second argument is nil.
  • The function returns a new tuple or a Lua table. For example, it can add a new field to the tuple. The new tuple must conform to the new space format set by the upgrade operation.
  • The function should be registered with box.schema.func.create. It should also be stored, deterministic, and written in Lua.
  • The function should not change the primary key of the tuple.
  • The function should be idempotent: f(f(t)) = f(t). This is necessary because the function is applied to all tuples returned to the user, and some of them may have already been upgraded in the background.
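You can check the idempotency requirement before you register the function. The sketch below is plain Lua and runs without Tarantool. It assumes tuples can be modeled as ordinary Lua arrays, and convert_plain, convert_always, and is_idempotent are hypothetical names. convert_plain mirrors the convert function from the usage example later in this section, but it is not the registered upgrade function itself.

```lua
-- Plain-Lua model of an upgrade function. Tuples are modeled as ordinary
-- arrays (an assumption: real upgrade functions receive box.tuple objects).
local function convert_plain(t)
    if #t == 2 then
        -- Old format {id, data}: insert the id as a string at position 2.
        return {t[1], tostring(t[1]), t[2]}
    end
    -- Already in the new format: return the tuple unchanged.
    return t
end

-- A deliberately broken variant that converts every tuple it sees.
local function convert_always(t)
    return {t[1], tostring(t[1]), t[2]}
end

-- Element-wise comparison of two arrays.
local function same(a, b)
    if #a ~= #b then return false end
    for i = 1, #a do
        if a[i] ~= b[i] then return false end
    end
    return true
end

-- Check that f(f(t)) equals f(t) for every sample tuple.
local function is_idempotent(f, samples)
    for _, t in ipairs(samples) do
        local once = f(t)
        if not same(f(once), once) then
            return false
        end
    end
    return true
end

local samples = {{1, 'data1'}, {2, 'data2'}, {3, '3', 'data3'}}
print(is_idempotent(convert_plain, samples))  -- true
print(is_idempotent(convert_always, samples)) -- false
```

The broken variant fails because it also rewrites a tuple that is already in the new format, replacing its data field with the id string. This is the same situation that occurs when a tuple that was already upgraded in the background is returned to the user and passed through the function again.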

Then define a new space format. This step is optional. However, it could be useful if, for example, you want to add a new column with data. For details, check the Usage Example section.

The next optional step is to choose an upgrade mode. There are three modes: upgrade, dryrun, and dryrun+upgrade. The default value is upgrade. To check an upgrade function without applying any changes, choose the dryrun mode. To run a space upgrade without testing the function, pick the upgrade mode. If you want to apply both the test and the actual upgrade, use the dryrun+upgrade option. For details, see the Upgrade Modes section.

The user defines an upgrade function, and each tuple of the chosen space is passed through it. The function converts the tuple from the old format to the new one. It is applied to all tuples stored in the space in the background. It is also applied to all tuples returned to the user via the box API (for example, select or get). As a result, the space appears to be upgraded instantly.

Keep in mind that space:upgrade() differs from space_object:format() in the following ways:

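+------------------+--------------------------------------+--------------------------------------+
| Difference       | space:upgrade()                      | space:format()                       |
+------------------+--------------------------------------+--------------------------------------+
| Non-blocking     | Yes. It returns tuples in the new    | Yes.                                 |
|                  | format, whether or not they have     |                                      |
|                  | already been converted.              |                                      |
+------------------+--------------------------------------+--------------------------------------+
| Set a format     | Yes. Works for non-indexed field     | No, only expand the format in a      |
| incompatible     | types only.                          | compatible way.                      |
| with the current |                                      |                                      |
| one              |                                      |                                      |
+------------------+--------------------------------------+--------------------------------------+
| Visibility of    | Immediately. All changes are visible | After data validation. Data          |
| changes          | and replicated immediately. New data | validation starts in the background  |
|                  | should conform to the new format     | and does not block the database.     |
|                  | immediately after the call.          | Inserting data incompatible with the |
|                  |                                      | new format is allowed before         |
|                  |                                      | validation is completed; in this     |
|                  |                                      | case, space:format() fails.          |
+------------------+--------------------------------------+--------------------------------------+
| Cancel           | Writes the state to the system       | Leaves no traces.                    |
| (error/restart)  | table. Restart: the operation        |                                      |
|                  | continues. Error: the operation      |                                      |
|                  | should be restarted manually; any    |                                      |
|                  | other attempt to change the table    |                                      |
|                  | fails.                               |                                      |
+------------------+--------------------------------------+--------------------------------------+
| Set the upgrade  | Yes. The upgrade may take a while to | No.                                  |
| function         | traverse the space and transform     |                                      |
|                  | tuples.                              |                                      |
+------------------+--------------------------------------+--------------------------------------+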

Note

At the moment, the feature is not supported for vinyl spaces.

The space:upgrade() method is added to the space object:

space:upgrade({func[, arg, format, mode, is_async]})
Parameters:
  • func (string/integer) – upgrade function name (string) or ID (integer). For details, see the upgrade function requirements section.
  • arg – additional information passed to the upgrade function as the second argument. The option accepts any Lua value that can be encoded in MsgPack, which means that msgpack.encode(arg) must succeed. For example, one can pass a scalar or a Lua table. The default value is nil.
  • format (map) – new space format. The requirements for this are the same as for any other space:format(). If the field is omitted, the space format will remain the same as before the upgrade.
  • mode (string) – upgrade mode. Possible values: upgrade, dryrun, dryrun+upgrade. The default value is upgrade.
  • is_async (boolean) – if true, the method returns immediately without waiting for the upgrade operation to complete. The default value is false: the method blocks until the upgrade operation is finished.
Return:

object describing the status of the operation (also known as future). The methods of the object are described below.

object future_object
future_object:info()

Shows information about the state of the upgrade operation.

Fields of the returned table:
  • dryrun (boolean) – dry run mode flag. Possible values: true for a dry run, nil for an actual upgrade.
  • status (string) – upgrade status. Possible values: inprogress, waitrw, error, replica, done.
  • func (string/integer) – name of the upgrade function. It is the same as passed to the space:upgrade method. The field is nil if the status is done.
  • arg – additional information passed to the upgrade function. It is the same as for the space:upgrade method. The field is nil if it is omitted in the space:upgrade.
  • owner (string) – UUID of the instance running the upgrade (see box.info.uuid). The field is nil if the status is done.
  • error (string) – error message if the status is error, otherwise nil.
  • progress (string) – completion percentage if the status is inprogress/waitrw, otherwise nil.
Return:

a table with information about the state of the upgrade operation

Rtype:

table

The fields can also be accessed directly, without calling the info() method. For example, future.status is the same as future:info().status.

future_object:wait([timeout])

Waits until the upgrade operation is completed or a timeout occurs. An operation is considered completed if its status is done or error.

Parameters:
  • timeout (double) – maximum time to wait. If omitted, the method waits until the operation is completed.
Return:

returns true if the operation has been completed, false on timeout

Rtype:

boolean

future_object:cancel()

Cancels the upgrade operation if it is currently running. Otherwise, an exception is thrown. A canceled upgrade operation completes with an error.

Return:

none

Rtype:

void

Running space:upgrade() with is_async = false, or without the is_async field, is equivalent to:

local future = space:upgrade({func = 'my_func', is_async = true})
future:wait()
return future
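You can use the same building blocks for a non-blocking pattern: start the upgrade with is_async = true, keep the returned future, and poll it with wait(timeout) while reporting progress. The following is a minimal sketch. wait_with_progress, its step argument, and the optional report callback are hypothetical names. The commented usage assumes the convert function and the test space from the usage example later in this section.

```lua
-- Poll a space upgrade future until it completes, reporting progress.
-- `future` is the object returned by space:upgrade({..., is_async = true}).
local function wait_with_progress(future, step, report)
    report = report or print
    -- wait() returns false on timeout and true once the status is
    -- done or error.
    while not future:wait(step) do
        report(('upgrade %s: %s'):format(
            tostring(future.status), tostring(future.progress)))
    end
    -- error is nil unless the upgrade failed.
    return future.status, future.error
end

-- Usage inside Tarantool (new_format is a placeholder for your format):
-- local future = box.space.test:upgrade({
--     func = 'convert', format = new_format, is_async = true})
-- local status, err = wait_with_progress(future, 5)
```

Because the future stays in a local variable for the whole loop, the caller can also call future:cancel() from the loop if some external condition requires it.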

If called without arguments, space:upgrade() returns a future object for the active upgrade operation. If there is none, it returns nil.

There are three upgrade modes: dryrun, dryrun+upgrade, and upgrade. Regardless of the mode selected, the upgrade does not block execution: the background fiber periodically commits the upgraded tuples and yields.

Calling space:upgrade() without arguments always returns the current state of the space upgrade, never the state of a dry run. If only a dry run is running in the background, space:upgrade() returns nil. Unlike an actual space upgrade, the future object returned by a dry run can’t be recovered if it is lost, so a dry run is aborted if its future object is garbage collected.

Warning

In dryrun+upgrade mode: if the future object is garbage collected by Lua before the end of the dry run and the start of the upgrade, then the dry run will be canceled, and no upgrade will be started.
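One way to follow this warning is to keep the future in a local variable for as long as the dry run matters. The sketch below runs a blocking dry run and reports whether the function and format passed. Because is_async is left unset, the call waits for the dry run to finish. check_upgrade is a hypothetical helper. It assumes that a successful dry run ends with the done status.

```lua
-- Run a dry run of a space upgrade and report the result without
-- changing any data. The future is held in a local variable, so it
-- cannot be garbage collected while it is still needed.
local function check_upgrade(space, func_name, new_format)
    local future = space:upgrade({
        func = func_name,
        format = new_format,
        mode = 'dryrun',
        -- is_async is not set, so the call blocks until the dry run ends.
    })
    if future.status == 'done' then
        return true
    end
    return false, future.error
end

-- Usage inside Tarantool (new_format is a placeholder for your format):
-- local ok, err = check_upgrade(box.space.test, 'convert', new_format)
```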

Upgrade modes:

An upgrade operation has one of the following upgrade states:

[Diagram: space upgrade states]

While a space upgrade is in progress, the space can’t be altered or dropped; any attempt to do so throws an exception. An upgrade can be restarted if the currently running upgrade is canceled or completes with an error, that is, when the upgrade operation is in the error state.

If a space upgrade was canceled or failed with an error, the space can’t be altered or dropped. The only option is to restart the upgrade using a different upgrade function or format.

The space upgrade state is persisted. It is stored in the _space system table. If an instance with a space upgrade in progress (inprogress state) is shut down, it restarts the space upgrade after recovery. If a space upgrade fails (switches to the error state), it remains in the error state after recovery.

The changes made to a space by a space upgrade are replicated. Just as on the instance where the upgrade is performed, the upgrade function is applied to all tuples returned to the user on the replicas. However, the upgrade operation is not performed on the replicas in the background. The replicas wait for the upgrade operation to complete on the master. They can’t alter or drop the space. Normally, they can’t cancel or restart the upgrade operation either.

There is an emergency exception for the case when the master is permanently lost: a space upgrade that started on another instance can be restarted if the upgrade owner UUID (see the owner field) has been deleted from the _cluster system table.

Note

Except in dryrun mode, the upgrade can only be performed on the master. If the instance is no longer the master, the upgrade is suspended until the instance becomes the master again. Restarting the upgrade on a new master works only if the old one has been removed from the replica set (the _cluster system space).

Suppose the space test has two columns: id (unsigned) and data (string). The example shows how to upgrade the schema and add another column to the space using space:upgrade(). The new column contains the id values converted to strings. Each step takes a while.

The test space is generated with the following script:

local log = require('log')
box.cfg{
    checkpoint_count = 1,
    memtx_memory = 5 * 1024 * 1024 * 1024,
}
box.schema.space.create('test')
box.space.test:format{
    {name = 'id', type = 'unsigned'},
    {name = 'data', type = 'string'},
}
box.space.test:create_index('pk')
local count = 20 * 1000 * 1000
local progress = 0
box.begin()
for i = 1, count do
    box.space.test:insert{i, 'data' .. i}

    if i % 1000 == 0 then
        box.commit()
        local p = math.floor(i / count * 100)
        if progress ~= p then
            progress = p
            log.info('Generating test data set... %d%% done', p)
        end
        box.begin()
    end
end
box.commit()
box.snapshot()
os.exit(0)

To upgrade the space, connect to the server and then run the commands below:

localhost:3301> box.schema.func.create('convert', {
              >     language = 'lua',
              >     is_deterministic = true,
              >     body = [[function(t)
              >         if #t == 2 then
              >             return t:update({{'!', 2, tostring(t.id)}})
              >         else
              >             return t
              >         end
              >     end]],
              > })
localhost:3301> box.space.test:upgrade({
              >     func = 'convert',
              >     format = {
              >         {name = 'id', type = 'unsigned'},
              >         {name = 'id_string', type = 'string'},
              >         {name = 'data', type = 'string'},
              >     },
              > })

While the upgrade is in progress, you can track the state of the upgrade. To check the status, connect to Tarantool from another console and run the following commands:

localhost:3311> box.space.test:upgrade()
---
- status: inprogress
  progress: 8%
  owner: 579a9e99-427e-4e99-9e2e-216bbd3098a7
  func: convert
...

Even though the upgrade is only 8% complete, selecting the data from the space returns the converted tuples:

localhost:3311> box.space.test:select({}, {iterator = 'req', limit = 5})
---
- - [20000000, '20000000', 'data20000000']
  - [19999999, '19999999', 'data19999999']
  - [19999998, '19999998', 'data19999998']
  - [19999997, '19999997', 'data19999997']
  - [19999996, '19999996', 'data19999996']
...

Note

The tuples contain the new field even though the space upgrade is still running.

Wait for the space upgrade to complete using the command below:

localhost:3311> box.space.test:upgrade():wait()

Read views

Enterprise Edition

Read views are available in the Enterprise Edition only.

A read view is an in-memory snapshot of the entire database that isn’t affected by future data modifications. Read views provide access to database spaces and their indexes and enable you to retrieve data using the same select and pairs operations.

Read views can be used to run complex analytical queries. This reduces the load on the main database and improves RPS for a single Tarantool instance.

To reduce memory consumption and improve performance, Tarantool creates read views using the copy-on-write technique. This means the entire data set does not have to be duplicated: Tarantool duplicates only the blocks modified after a read view is created.

Note

Tarantool Enterprise Edition supports read views starting from v2.11.0, and you can work with them using both the Lua and C APIs.

Read views have the following limitations:

To create a read view, call the box.read_view.open() function. The snippet below shows how to create a read view with the read_view1 name.

tarantool> read_view1 = box.read_view.open({name = 'read_view1'})

After creating a read view, you can see the information about it by calling read_view_object:info().

tarantool> read_view1:info()
---
- timestamp: 66.606817935
  signature: 24
  is_system: false
  status: open
  vclock: {1: 24}
  name: read_view1
  id: 1
...

To list all the created read views, call the box.read_view.list() function.

After creating a read view, you can access database spaces using the read_view_object.space field. This field provides access to a space object that exposes the select, get, and pairs methods with the same behavior as corresponding box.space methods.

The example below shows how to select 4 records from the bands space:

tarantool> read_view1.space.bands:select({}, {limit = 4})
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'Ace of Base', 1987]
  - [4, 'The Beatles', 1960]
...

Similarly, you can retrieve data by a specific index:

tarantool> read_view1.space.bands.index.year:select({}, {limit = 4})
---
- - [4, 'The Beatles', 1960]
  - [2, 'Scorpions', 1965]
  - [1, 'Roxette', 1986]
  - [3, 'Ace of Base', 1987]
...

Pagination is supported in read views in the same way as in select requests to spaces: using the fetch_pos and after arguments. To get the cursor position after executing a request on a read view, set fetch_pos to true:

tarantool> result, position = read_view1.space.bands:select({}, { limit = 3, fetch_pos = true })
---
...

tarantool> result
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'Ace of Base', 1987]
...

tarantool> position
---
- kQM
...

Then, pass this position in the after parameter of a request to get the next data chunk:

tarantool> read_view1.space.bands:select({}, { limit = 3, after = position })
---
- - [4, 'The Beatles', 1960]
  - [5, 'Pink Floyd', 1965]
  - [6, 'The Rolling Stones', 1962]
...
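With the fetch_pos and after pair, you can walk a whole read view in fixed-size chunks and keep each request small. The following is a minimal sketch that assumes a read view opened as shown above. read_all and page_size are illustrative names. The loop stops when a page comes back shorter than page_size, which is one possible termination rule.

```lua
-- Collect every tuple of a read-view space page by page.
-- `space` is a read-view space object, e.g. read_view1.space.bands.
local function read_all(space, page_size)
    local rows = {}
    local position -- nil: start from the beginning
    while true do
        local page, pos = space:select({}, {
            limit = page_size,
            fetch_pos = true,
            after = position,
        })
        for _, tuple in ipairs(page) do
            table.insert(rows, tuple)
        end
        -- A short page means there is nothing left to read.
        if #page < page_size then
            break
        end
        position = pos
    end
    return rows
end

-- Usage inside Tarantool:
-- local all = read_all(read_view1.space.bands, 3)
```

If the number of tuples is an exact multiple of page_size, the loop makes one extra request that returns an empty page and then stops.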

When a read view is no longer needed, close it using the read_view_object:close() method because a read view may consume a substantial amount of memory.

tarantool> read_view1:close()
---
...

Otherwise, a read view is closed implicitly when the read view object is collected by the Lua garbage collector.

After the read view is closed, its status is set to closed. On an attempt to use it, an error is raised.
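Because an open read view can hold a substantial amount of memory, you should close it even when the code that uses it raises an error. The following sketch assumes the read view API shown above. with_read_view is a hypothetical helper. It uses pcall so that close() runs on both the success path and the error path.

```lua
-- Open a read view, pass it to `fn`, and always close it afterwards.
local function with_read_view(opts, fn)
    local rv = box.read_view.open(opts)
    local ok, result = pcall(fn, rv)
    rv:close()
    if not ok then
        -- Re-raise the original error after cleanup.
        error(result, 0)
    end
    return result
end

-- Usage inside Tarantool:
-- local rows = with_read_view({name = 'report'}, function(rv)
--     return rv.space.bands:select({}, {limit = 100})
-- end)
```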

The Tarantool session below demonstrates how to open a read view, get data from it, and close it. To repeat these steps, bootstrap a Tarantool instance as described in Using data operations (you can skip creating secondary indexes).

  1. Insert test data.

    tarantool> bands:insert{1, 'Roxette', 1986}
               bands:insert{2, 'Scorpions', 1965}
               bands:insert{3, 'Ace of Base', 1987}
               bands:insert{4, 'The Beatles', 1960}
    
  2. Create a read view by calling the open function. Then, make sure that the read view status is open.

    tarantool> read_view1 = box.read_view.open({name = 'read_view1'})
    
    tarantool> read_view1.status
    ---
    - open
    ...
    
  3. Change data in a database using the delete and update operations.

    tarantool> bands:delete(4)
    ---
    - [4, 'The Beatles', 1960]
    ...
    tarantool> bands:update({2}, {{'=', 2, 'Pink Floyd'}})
    ---
    - [2, 'Pink Floyd', 1965]
    ...
    
  4. Query the read view to make sure it contains a snapshot of the data taken before the database was updated.

    tarantool> read_view1.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
    ...
    
  5. Close a read view.

    tarantool> read_view1:close()
    ---
    ...
    

SQL guides

This section contains hands-on SQL guides. You might also want to read the in-depth SQL reference.

SQL beginners’ guide

The Beginners’ Guide describes how to get started with SQL in Tarantool and introduces the necessary concepts.

The SQL Beginners’ Guide is about databases in general, and about the relationship between Tarantool’s NoSQL and SQL products. Most of the material in the Beginners’ Guide will already be familiar to people who have used relational databases before.

Before starting this tutorial:

  1. Install the tt CLI utility.

  2. Start a Tarantool instance in the interactive mode by running tt run -i:

    $ tt run -i
    Tarantool 3.0.0-0-g6ba34da7f8
    type 'help' for interactive help
    tarantool>
    
  3. Initialize the instance and switch the input language to SQL:

    tarantool> box.cfg{}
    tarantool> \set language sql
    tarantool> \set delimiter ;
    

Now you have a running Tarantool instance that accepts SQL input.

In football training camp it is traditional for the trainer to begin by showing a football and saying "this is a football". In that spirit, this is a table:

TABLE
          [1]              [2]              [3]
       +-----------------+----------------+----------------+
 Row#1 | Row#1,Column#1  | Row#1,Column#2 | Row#1,Column#3 |
       +-----------------+----------------+----------------+
 Row#2 | Row#2,Column#1  | Row#2,Column#2 | Row#2,Column#3 |
       +-----------------+----------------+----------------+
 Row#3 | Row#3,Column#1  | Row#3,Column#2 | Row#3,Column#3 |
       +-----------------+----------------+----------------+

But the labels are misleading: one usually doesn’t identify rows and columns by their ordinal positions; one prefers to pick out specific items by their contents. In that spirit, this is a table:

MODULES

+-----------------+------+---------------------+
| NAME            | SIZE | PURPOSE             |
+-----------------+------+---------------------+
| box             | 1432 | Database Management |
| clock           |  188 | Seconds             |
| crypto          |    4 | Cryptography        |
+-----------------+------+---------------------+

So one does not navigate by longitude and latitude, talking about "Row#2 Column#2"; one uses the contents of the Name column and the name of the Size column, talking about "the size, where the name is 'clock'". To be more exact, this is what one says:

SELECT size FROM modules WHERE name = 'clock';

If you’re familiar with Tarantool’s architecture – and ideally you read about that before coming to this chapter – then you know that there is a NoSQL way to get the same thing:

box.space.MODULES:select()[2][2]

Well, you can do that. One of the advantages of Tarantool is that if you can get data via an SQL statement, then you can get the same data via a NoSQL request. But the reverse is not true, because not all NoSQL tuple sets are definable as SQL tables. The following restrictions apply to SQL but not to NoSQL:
1. Every column must have a name.
2. Every column should have a scalar type (Tarantool is relaxed about which particular scalar type you can have, but there is no way to index and search arrays, tables within tables, or what MessagePack calls "maps".)

Tarantool/NoSQL’s "format" clause imposes the same restrictions.

So an SQL "table" is a NoSQL "tuple set with format restrictions", an SQL "row" is a NoSQL "tuple", and an SQL "column" is a NoSQL "list of fields within a tuple set".

This is how to create the modules table:

CREATE TABLE modules (name STRING, size INTEGER, purpose STRING, PRIMARY KEY (name));

The words that are IN CAPITAL LETTERS are "keywords" (although it is only a convention in this manual that keywords are in capital letters; in practice many programmers prefer to avoid shouting). Keywords have meaning for the SQL parser, so many of them are reserved: they cannot be used as names unless they are enclosed in quotation marks.

The word "modules" is a "table name", and the words "name", "size", and "purpose" are "column names". All tables and all columns must have names.

The words "STRING" and "INTEGER" are "data types". STRING means "the contents should be characters, the length is indefinite, the equivalent NoSQL type is 'string'". INTEGER means "the contents should be numbers without decimal points, the equivalent NoSQL type is 'integer'". Tarantool supports other data types, but this section’s example table has data types from the two main groups, namely, data types for numbers and data types for strings.

The final clause, PRIMARY KEY (name), means that the name column is the main column used to identify the row.

Frequently it is necessary, at least temporarily, that a column value should be NULL. Typical situations are: the value is unknown, or the value is not applicable. For example, you might create a module as a placeholder without specifying its size or purpose. If such things are possible, the column is "nullable". The example table’s name column cannot contain NULLs, and it could be defined explicitly as "name STRING NOT NULL", but in this case that’s unnecessary: a column defined as PRIMARY KEY is automatically NOT NULL.

Is a NULL in SQL the same thing as a nil in Lua? No, but it is close enough that there will be confusion. When nil means "unknown" or "inapplicable", yes. But when nil means "nonexistent" or "type is nil", no. NULL is a value, and it has a data type because it is inside a column which is defined with that data type.

This is how to create indexes for the modules table:

CREATE INDEX size ON modules (size);
CREATE UNIQUE INDEX purpose ON modules (purpose);

There is no need to create an index on the name column, because Tarantool creates an index automatically when it sees a PRIMARY KEY clause in the CREATE TABLE statement. In fact there is no need to create indexes on the size or purpose columns either – if indexes don’t exist, then it is still possible to use the columns for searches. Typically people create non-primary indexes, also called secondary indexes, when it becomes clear that the table will grow large and searches will be frequent, because searching with an index is generally much faster than searching without an index.

Another use for indexes is to enforce uniqueness. When an index is created with CREATE UNIQUE INDEX for the purpose column, it is not possible to have duplicate values in that column.

Putting data into a table is called "inserting". Changing data is called "updating". Removing data is called "deleting". Together, INSERT, UPDATE, and DELETE are the three main "data-change" statements.

This is how to insert, update, and delete a row in the modules table:

INSERT INTO modules VALUES ('json', 14, 'format functions for JSON');
UPDATE modules SET size = 15 WHERE name = 'json';
DELETE FROM modules WHERE name = 'json';

The corresponding non-SQL Tarantool requests would be:

box.space.MODULES:insert{'json', 14, 'format functions for JSON'}
box.space.MODULES:update('json', {{'=', 2, 15}})
box.space.MODULES:delete{'json'}
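Similarly, the NoSQL counterpart of SELECT size FROM modules WHERE name = 'clock' can use a primary-key lookup instead of the positional select()[2][2] shown earlier. The following sketch assumes the MODULES space created by the CREATE TABLE statement above, with NAME as field 1 and SIZE as field 2. size_of is a hypothetical helper name.

```lua
-- Look up a module's size by primary key rather than by row position.
-- Assumes NAME is field 1 (the primary key) and SIZE is field 2.
local function size_of(space, name)
    local tuple = space:get({name})  -- primary-key lookup
    if tuple == nil then
        return nil                   -- no such module
    end
    return tuple[2]                  -- field 2 is SIZE
end

-- Usage inside Tarantool, once the sample rows are inserted:
-- size_of(box.space.MODULES, 'clock')
```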

This is how one would populate the table with the values that were shown earlier:

INSERT INTO modules VALUES ('box', 1432, 'Database Management');
INSERT INTO modules VALUES ('clock', 188, 'Seconds');
INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');

Some data-change statements are illegal due to something in the table’s definition. This is called "constraining what can be done". Some types of constraints have already been shown …

NOT NULL – if a column is defined with a NOT NULL clause, it is illegal to put NULL into it. A primary-key column is automatically NOT NULL.

UNIQUE – if a column has a UNIQUE index, it is illegal to put a duplicate into it. A primary-key column automatically has a UNIQUE index.

data domain – if a column is defined as having data type INTEGER, it is illegal to put a non-number into it. More generally, if a value doesn’t correspond to the data type of the definition, it is illegal. Some database management systems (DBMSs) are very forgiving and will try to make allowances for bad values rather than reject them; Tarantool is a bit more strict than those DBMSs.

Now, here are other types of constraints …

CHECK – a table description can have a clause "CHECK (conditional expression)". For example, if the CREATE TABLE modules statement looked like this:

CREATE TABLE modules (name STRING,
                      size INTEGER,
                      purpose STRING,
                      PRIMARY KEY (name),
                      CHECK (size > 0));

then this INSERT statement would be illegal:

INSERT INTO modules VALUES ('box', 0, 'The Database Kernel');

because there is a CHECK constraint saying that the second column, the size column, cannot contain a value which is less than or equal to zero. Try this instead:

INSERT INTO modules VALUES ('box', 1, 'The Database Kernel');

FOREIGN KEY – a table description can have a clause "FOREIGN KEY (column-list) REFERENCES table (column-list)". For example, if there is a new table "submodules" which in a way depends on the modules table, it can be defined like this:

CREATE TABLE submodules (name STRING,
                         module_name STRING,
                         size INTEGER,
                         purpose STRING,
                         PRIMARY KEY (name),
                         FOREIGN KEY (module_name) REFERENCES
                         modules (name));

Now try to insert a new row into this submodules table:

INSERT INTO submodules VALUES
  ('space', 'Box', 10000, 'insert etc.');

The insert will fail because the second column (module_name) refers to the name column in the modules table, and the name column in the modules table does not contain 'Box'. However, it does contain 'box'. By default, string comparisons in Tarantool’s SQL use a binary collation, which is case-sensitive. This will work:

INSERT INTO submodules
  VALUES ('space', 'box', 10000, 'insert etc.');

Now try to delete the corresponding row from the modules table:

DELETE FROM modules WHERE name = 'box';

The delete will fail because the second column (module_name) in the submodules table refers to the name column in the modules table, and the name column in the modules table would not contain 'box' if the delete succeeded. So the FOREIGN KEY constraint affects both the table which contains the FOREIGN KEY clause and the table that the FOREIGN KEY clause refers to.

The constraints in a table’s definition – NOT NULL, UNIQUE, data domain, CHECK, and FOREIGN KEY – are guarantors of the database’s integrity. It is important that they are fixed and well-defined parts of the definition, and hard to bypass with SQL. This is often seen as a difference between SQL and NoSQL – SQL emphasizes law and order, NoSQL emphasizes freedom and making your own rules.

Think about the two tables that have been discussed so far:

CREATE TABLE modules (name STRING,
                      size INTEGER,
                      purpose STRING,
                      PRIMARY KEY (name),
                      CHECK (size > 0));

CREATE TABLE submodules (name STRING,
                         module_name STRING,
                         size INTEGER,
                         purpose STRING,
                         PRIMARY KEY (name),
                         FOREIGN KEY (module_name) REFERENCES
                         modules (name));

Because of the FOREIGN KEY clause in the submodules table, there is clearly a many-to-one relationship:

submodules –>> modules

That is, every submodules row must refer to one (and only one) modules row, while every modules row can be referred to in zero or more submodules rows.
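The following plain-Lua sketch expresses the rule that every submodules row must refer to an existing modules row as a check over in-memory data. It is only an illustration and does not show how Tarantool enforces FOREIGN KEY internally. It assumes rows are Lua tables keyed by column name, find_orphans is a hypothetical name, and the 'index' submodule is hypothetical sample data. The comparison is a plain byte-wise string match, so 'Box' does not match 'box', as in the earlier example.

```lua
-- Plain-Lua model of the submodules -> modules many-to-one relationship.
-- Rows are tables keyed by column name (an assumption for illustration).
local function find_orphans(modules, submodules)
    -- Index the referenced key (modules.name) for constant-time lookups.
    local names = {}
    for _, m in ipairs(modules) do
        names[m.name] = true
    end
    -- Collect submodules rows whose module_name has no match.
    local orphans = {}
    for _, s in ipairs(submodules) do
        if not names[s.module_name] then
            table.insert(orphans, s.name)
        end
    end
    return orphans
end

local modules = {
    {name = 'box', size = 1432, purpose = 'Database Management'},
    {name = 'clock', size = 188, purpose = 'Seconds'},
}
local submodules = {
    {name = 'space', module_name = 'box'},  -- valid reference
    {name = 'index', module_name = 'Box'},  -- hypothetical: 'Box' ~= 'box'
}
print(table.concat(find_orphans(modules, submodules), ', ')) -- index
```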

Table relationships are important, but beware: do not trust anyone who tells you that databases made with SQL are relational "because there are relationships between tables". That is wrong, as will be clear in the discussion about what makes a database relational, later.

Important

By default, Tarantool prohibits SELECT queries that scan table rows instead of using indexes to avoid unwanted heavy load. For the purposes of this tutorial, allow SQL scan queries in Tarantool by running the command:

SET SESSION "sql_seq_scan" = true;

Alternatively, you can allow a specific query to perform a table scan by adding the SEQSCAN keyword before the table name. Learn more about using SEQSCAN in SQL scan queries in the SQL FROM clause description.

We gave a simple example of a SELECT statement earlier:

SELECT size FROM modules WHERE name = 'clock';

The clause "WHERE name = 'clock'" is legal in other statements (it appears in the UPDATE and DELETE examples), but here the only examples will be with SELECT.

The first variation is that the WHERE clause does not have to be specified at all; it is optional. So this statement would return all rows:

SELECT size FROM modules;

The second variation is that the comparison operator does not have to be '='; it can be anything that makes sense: '>', '>=', '<', '<=', or LIKE, an operator for strings that may contain the wildcard characters '_' (match any one character) and '%' (match any sequence of zero or more characters). These are legal statements which return all rows:

SELECT size FROM modules WHERE name >= '';
SELECT size FROM modules WHERE name LIKE '%';
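The following sketch illustrates the two wildcards by translating a LIKE pattern into an anchored Lua pattern and matching it. It is a plain-Lua illustration only and does not describe how Tarantool evaluates LIKE. It assumes case-sensitive, byte-wise matching (so '_' matches one byte rather than one UTF-8 character) and ignores collations and the ESCAPE clause. like_to_lua and like are hypothetical names.

```lua
-- Translate an SQL LIKE pattern into an anchored Lua pattern:
-- '_' matches exactly one character, '%' matches zero or more.
local function like_to_lua(pattern)
    local out = {'^'}
    for i = 1, #pattern do
        local c = pattern:sub(i, i)
        if c == '_' then
            table.insert(out, '.')
        elseif c == '%' then
            table.insert(out, '.-')
        elseif c:match('%p') then
            -- Escape other punctuation so Lua treats it literally.
            table.insert(out, '%' .. c)
        else
            table.insert(out, c)
        end
    end
    table.insert(out, '$')
    return table.concat(out)
end

local function like(s, pattern)
    return s:find(like_to_lua(pattern)) ~= nil
end

print(like('clock', 'cl_ck'))  -- true
print(like('clock', 'c%'))     -- true
print(like('box', '%'))        -- true
print(like('crypto', 'c_'))    -- false
```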

The third variation is that IS [NOT] NULL is a special condition. Remembering that the NULL value can mean "it is unknown what the value should be", and supposing that in some row the size is NULL, then the condition "size > 10" is not certainly true and it is not certainly false, so it is evaluated as "unknown". Ordinarily the application of a WHERE clause filters out both false and unknown results. So when searching for NULL, say IS NULL; when searching for anything that is not NULL, say IS NOT NULL. This statement will return all rows because (due to the definition) there are no NULLs in the name column:

SELECT size FROM modules WHERE name IS NOT NULL;
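You can model the "unknown" result explicitly. The following sketch is plain Lua. A hypothetical UNKNOWN sentinel stands in for SQL's unknown truth value, and nil stands in for NULL. It implements the standard three-valued AND, OR, and NOT, plus a WHERE-style filter that keeps a row only when the condition is exactly true. The 'stub' row is hypothetical sample data.

```lua
-- Three-valued logic: true, false, or UNKNOWN (a sentinel value).
local UNKNOWN = setmetatable({}, {__tostring = function() return 'UNKNOWN' end})

local function and3(a, b)
    if a == false or b == false then return false end
    if a == UNKNOWN or b == UNKNOWN then return UNKNOWN end
    return true
end

local function or3(a, b)
    if a == true or b == true then return true end
    if a == UNKNOWN or b == UNKNOWN then return UNKNOWN end
    return false
end

local function not3(a)
    if a == UNKNOWN then return UNKNOWN end
    return not a
end

-- "size > n", where a missing size (nil, standing in for NULL) is unknown.
local function size_gt(row, n)
    if row.size == nil then return UNKNOWN end
    return row.size > n
end

-- WHERE keeps a row only if the condition is true, not false or unknown.
local function where(rows, cond)
    local kept = {}
    for _, row in ipairs(rows) do
        if cond(row) == true then table.insert(kept, row) end
    end
    return kept
end

local rows = {{name = 'box', size = 1432}, {name = 'stub'}}  -- stub: NULL size
print(#where(rows, function(r) return size_gt(r, 10) end))        -- 1
print(#where(rows, function(r) return not3(size_gt(r, 10)) end))  -- 0
print(#where(rows, function(r) return r.size == nil end))         -- 1
print(and3(UNKNOWN, true))  -- UNKNOWN
print(or3(UNKNOWN, true))   -- true
```

The second filter shows why IS NULL is needed: negating an unknown condition is still unknown, so neither "size > 10" nor its negation selects the 'stub' row.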

The fourth variation is that conditions can be combined with AND / OR, and negated with NOT.

So this statement would return all rows (the first condition is false but the second condition is true, and OR means "return true if either condition is true"):

SELECT size
FROM modules
WHERE name = 'wombat' OR size IS NOT NULL;

Selecting with a select list

Yet again, here is a simple example of a SELECT statement:

SELECT size FROM modules WHERE name = 'clock';

The words between SELECT and FROM are the select list. In this case, the select list is just one word: size. It means that the size values should be returned; the technical term for picking particular columns is "projection".

The first variation is that one can specify any column in any order:

SELECT name, purpose, size FROM modules;

The second variation is that one can specify an expression: it does not have to be a column name, and it does not even have to include a column name. The common expression operators for numbers are the arithmetic operators + - / *; the common expression operator for strings is the concatenation operator ||. For example, this statement will return 8, 'XY':

SELECT size * 2, 'X' || 'Y' FROM modules WHERE size = 4;

The third variation is that one can add a clause [AS name] after every expression, so that in the return the column titles will make sense. This is especially important when a title might otherwise be ambiguous or meaningless. For example, this statement will return 8, 'XY' as before,

SELECT size * 2 AS double_size, 'X' || 'Y' AS concatenated_literals  FROM modules
  WHERE size = 4;

but, displayed as a table, the result will look like this:

+----------------+------------------------+
| DOUBLE_SIZE    | CONCATENATED_LITERALS  |
+----------------+------------------------+
|              8 | XY                     |
+----------------+------------------------+

Selecting with a select list with asterisk

Instead of listing columns in a select list, one can just say '*'. For example:

SELECT * FROM modules;

This is the same thing as

SELECT name, size, purpose FROM modules;

Selecting with "*" saves time for the writer, but it is unclear to a reader who has not memorized the column names. It is also unstable, because a table's definition can be changed (with the ALTER statement, which is an advanced topic). Nevertheless, although "*" is best avoided in production code, it is handy for an introduction, so it will appear in some of the following examples.

Remember that there is a modules table and there is a submodules table. Suppose that there is a desire to list the submodules that refer to modules for which the purpose is X. This involves a search of one table using a value from another table, which can be done by enclosing "(SELECT ...)" within the WHERE clause. For example:

SELECT name FROM submodules
WHERE module_name =
    (SELECT name FROM modules WHERE purpose LIKE '%Database%');

Subqueries are also useful in the select list, when one wishes to combine information from more than one table. For example this statement will display submodules rows but will include values that come from the modules table:

SELECT name AS submodules_name,
    (SELECT purpose FROM modules
     WHERE modules.name = submodules.module_name)
     AS modules_purpose,
    purpose AS submodules_purpose
FROM submodules;

Whoa. What are "modules.name" and "submodules.module_name"? Whenever you see "x.y", you are looking at a "qualified column name": the first part is a table identifier, and the second part is a column identifier. It is always legal to use qualified column names, but until now it has not been necessary. Now it is necessary, or at least a good idea, because both tables have columns named "name" and "purpose".

The result will look like this:

+-------------------+------------------------+--------------------+
| SUBMODULES_NAME   | MODULES_PURPOSE        | SUBMODULES_PURPOSE |
+-------------------+------------------------+--------------------+
| space             | Database Management    | insert etc.        |
+-------------------+------------------------+--------------------+
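Both kinds of subquery can be tried in a small sketch. Python's built-in sqlite3 module stands in for Tarantool here, purely so the snippet runs anywhere. The modules and submodules rows are the ones from this guide.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
    CREATE TABLE submodules (name TEXT PRIMARY KEY, module_name TEXT,
                             size INTEGER, purpose TEXT);
    INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');
""")

# Subquery in WHERE: the inner SELECT yields 'box', and the outer SELECT uses it.
in_where = conn.execute("""
    SELECT name FROM submodules
    WHERE module_name =
        (SELECT name FROM modules WHERE purpose LIKE '%Database%')
""").fetchall()

# Correlated subquery in the select list: evaluated once per submodules row.
in_select_list = conn.execute("""
    SELECT name AS submodules_name,
           (SELECT purpose FROM modules
            WHERE modules.name = submodules.module_name) AS modules_purpose,
           purpose AS submodules_purpose
    FROM submodules
""").fetchall()

print(in_where)        # [('space',)]
print(in_select_list)  # [('space', 'Database Management', 'insert etc.')]
```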

Perhaps you have read somewhere that SQL stands for "Structured Query Language". That is not true any more. But it is true that the query syntax allows for a structural component, namely the subquery, and that was the original idea. However, there is a different way to combine tables: with joins instead of subqueries.

Until now only "FROM modules" or "FROM submodules" was used in SELECT statements. What if there were more than one table in the FROM clause? For example

SELECT * FROM modules, submodules;

or

SELECT * FROM modules JOIN submodules;

That is legal. Usually it is not what you want, but it is a learning aid. The result will be:

{ columns from modules table }         { columns from submodules table }
+--------+------+---------------------+-------+-------------+-------+-------------+
| NAME   | SIZE | PURPOSE             | NAME  | MODULE_NAME | SIZE  | PURPOSE     |
+--------+------+---------------------+-------+-------------+-------+-------------+
| box    | 1432 | Database Management | space | box         | 10000 | insert etc. |
| clock  |  188 | Seconds             | space | box         | 10000 | insert etc. |
| crypto |    4 | Cryptography        | space | box         | 10000 | insert etc. |
+--------+------+---------------------+-------+-------------+-------+-------------+

It is not an error. The meaning of this type of join is "combine every row in table-1 with every row in table-2". The statement did not specify what the relationship should be, so the result has everything, even when the submodule has nothing to do with the module.

It is handy to look at the above result, called a "Cartesian join" result, to see what would really be desirable. For this case, the only row that actually makes sense is the one where modules.name = submodules.module_name, and it is better to make that clear in both the select list and the WHERE clause, thus:

SELECT modules.name AS modules_name,
       modules.size AS modules_size,
       modules.purpose AS modules_purpose,
       submodules.name,
       module_name,
       submodules.size,
       submodules.purpose
FROM modules, submodules
WHERE modules.name = submodules.module_name;

The result will be:

+----------+-----------+------------+--------+---------+-------+-------------+
| MODULES_ |  MODULES_ | MODULES_   | NAME   | MODULE_ | SIZE  | PURPOSE     |
| NAME     |  SIZE     | PURPOSE    |        | NAME    |       |             |
+----------+-----------+------------+--------+---------+-------+-------------+
| box      |      1432 | Database   | space  | box     | 10000 | insert etc. |
|          |           | Management |        |         |       |             |
+----------+-----------+------------+--------+---------+-------+-------------+

In other words, you can specify a Cartesian join in the FROM clause, filter out the irrelevant rows in the WHERE clause, and rename columns in the select list. This is fine, and every SQL DBMS supports it. But it is worrisome that the number of rows in a Cartesian join is always the number of rows in the first table multiplied by the number of rows in the second table. Conceptually, this means you are often filtering a large set of rows.
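The row-count arithmetic can be checked with a minimal sketch. Python's built-in sqlite3 module stands in for Tarantool (an assumption made only so the snippet runs anywhere), and the rows are the guide's modules and submodules data.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
    CREATE TABLE submodules (name TEXT PRIMARY KEY, module_name TEXT,
                             size INTEGER, purpose TEXT);
    INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');
""")

def count(sql):
    """Run a COUNT(*) query and return the number."""
    return conn.execute(sql).fetchone()[0]

# Cartesian join: every modules row is paired with every submodules row.
cartesian_rows = count("SELECT COUNT(*) FROM modules, submodules")
product = count("SELECT COUNT(*) FROM modules") * count("SELECT COUNT(*) FROM submodules")

# The WHERE clause then keeps only the pairs that are actually related.
related = conn.execute("""
    SELECT modules.name, submodules.name
    FROM modules, submodules
    WHERE modules.name = submodules.module_name
""").fetchall()

print(cartesian_rows, product)  # 3 3
print(related)                  # [('box', 'space')]
```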

It is good to start by looking at Cartesian joins because they show the concept. Many people, though, prefer to use different syntaxes for joins because they look better or clearer. So now those alternatives will be shown.

The ON clause contains the same comparisons as the WHERE clause in the previous example, but the different syntax makes it clear that "this is for the sake of the join". Readers can see at a glance that it is, in concept at least, an initial step before the result rows are filtered. For example, this

SELECT * FROM modules JOIN submodules
  ON (modules.name = submodules.module_name);

is the same as

SELECT * FROM modules, submodules
  WHERE modules.name = submodules.module_name;

The USING clause takes advantage of column names that the two tables have in common, on the assumption that the intent is to match those columns with "=" comparisons. For example,

SELECT * FROM modules JOIN submodules USING (name);

has the same effect as

SELECT * FROM modules JOIN submodules WHERE modules.name = submodules.name;

If the tables had been designed in advance for USING clauses, that would save time. But that did not happen: modules.name holds module names, while submodules.name holds submodule names. So, although the above example "works", the results will not be sensible.

A natural join would take advantage of names that are held in common between the two tables, and would do the filtering automatically based on that knowledge, and throw away duplicate columns.

If the tables had been designed in advance for natural joins, that would be very handy. But that did not happen. So, although the following example "works", the results won't be sensible.

SELECT * FROM modules NATURAL JOIN submodules;

Result: nothing, because no row in modules has the same name, size, and purpose as a row in submodules. And even if there had been a result, it would only have included four columns: name, module_name, size, purpose.
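Both empty results can be reproduced in a small sketch. Python's built-in sqlite3 module stands in for Tarantool here. SQLite, like standard SQL, shows the shared columns of a NATURAL JOIN only once, so the four-column shape is visible through the standard DB-API `cursor.description` even though no rows come back.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
    CREATE TABLE submodules (name TEXT PRIMARY KEY, module_name TEXT,
                             size INTEGER, purpose TEXT);
    INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');
""")

# USING (name) compares module names with submodule names: never equal.
using_rows = conn.execute(
    "SELECT * FROM modules JOIN submodules USING (name)"
).fetchall()

# NATURAL JOIN compares name, size and purpose: never all equal.
cur = conn.execute("SELECT * FROM modules NATURAL JOIN submodules")
natural_rows = cur.fetchall()
# The column list is known even though the result is empty.
natural_columns = [d[0] for d in cur.description]

print(using_rows, natural_rows)  # [] []
print(sorted(natural_columns))   # ['module_name', 'name', 'purpose', 'size']
```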

Now what if there is a desire to join modules to submodules, but it’s necessary to be sure that all the modules are found? In other words, suppose the requirement is to get modules even if the condition submodules.module_name = modules.name is not true, because the module has no submodules.

When that is the requirement, the type of join is an "outer join" (as opposed to the type that has been used so far, which is an "inner join"). Specifically, the format will be LEFT [OUTER] JOIN because the main table, modules, is on the left. For example:

SELECT *
FROM modules LEFT JOIN submodules
ON modules.name = submodules.module_name;

which returns:

{ columns from modules table }         { columns from submodules table }
+--------+------+---------------------+-------+-------------+-------+-------------+
| NAME   | SIZE | PURPOSE             | NAME  | MODULE_NAME | SIZE  | PURPOSE     |
+--------+------+---------------------+-------+-------------+-------+-------------+
| box    | 1432 | Database Management | space | box         | 10000 | insert etc. |
| clock  |  188 | Seconds             | NULL  | NULL        | NULL  | NULL        |
| crypto |    4 | Cryptography        | NULL  | NULL        | NULL  | NULL        |
+--------+------+---------------------+-------+-------------+-------+-------------+

Thus, for the clock module and the crypto module, which have no submodules, there are NULLs in every column that comes from the submodules table.
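The same LEFT JOIN can be run in a minimal sketch, with Python's built-in sqlite3 module standing in for Tarantool. SQL NULL arrives in Python as None. The ORDER BY is added only so that the row order is deterministic.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
    CREATE TABLE submodules (name TEXT PRIMARY KEY, module_name TEXT,
                             size INTEGER, purpose TEXT);
    INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');
""")

# Every modules row survives; unmatched rows get NULL (None) on the right side.
rows = conn.execute("""
    SELECT modules.name, submodules.name, submodules.module_name
    FROM modules LEFT JOIN submodules
    ON modules.name = submodules.module_name
    ORDER BY modules.name
""").fetchall()

for row in rows:
    print(row)
# ('box', 'space', 'box')
# ('clock', None, None)
# ('crypto', None, None)
```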

A function can take any expression, including an expression that contains another function, and return a scalar value. There are many such functions. Only one is described here: SUBSTR, which returns a substring of a string.

Format: SUBSTR(input-string, start-with [, length])

Description: SUBSTR takes input-string, eliminates any characters before position start-with, eliminates any characters after position (start-with plus length minus 1), and returns the result. Positions are counted from 1.

Example: SUBSTR('abcdef', 2, 3) returns "bcd".
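The position rule can be mirrored in Python and compared against a real SQL engine. In this sketch, `substr` is a hypothetical helper that covers only positive start and length values. Python's built-in sqlite3 module stands in for Tarantool.

```python
import sqlite3

def substr(s, start, length=None):
    """Hypothetical helper: the 1-based SUBSTR rule, for positive start and length only."""
    begin = start - 1                   # SQL positions start at 1
    if length is None:
        return s[begin:]                # no length: keep everything to the end
    return s[begin:begin + length]      # last kept position is start + length - 1

conn = sqlite3.connect(":memory:")      # SQLite stands in for Tarantool
sql_3, sql_2 = conn.execute(
    "SELECT SUBSTR('abcdef', 2, 3), SUBSTR('abcdef', 2)"
).fetchone()

print(substr('abcdef', 2, 3), sql_3)    # bcd bcd
print(substr('abcdef', 2), sql_2)       # bcdef bcdef
```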

Select with aggregation, GROUP BY, and HAVING

Remember that the modules table looks like this:

MODULES

+-----------------+------+---------------------+
| NAME            | SIZE | PURPOSE             |
+-----------------+------+---------------------+
| box             | 1432 | Database Management |
| clock           |  188 | Seconds             |
| crypto          |    4 | Cryptography        |
+-----------------+------+---------------------+

Suppose that there is no need to know all the individual size values; all that matters is their aggregation, that is, attributes of the collection as a whole. SQL allows aggregation functions including AVG (average), SUM, MIN (minimum), MAX (maximum), and COUNT. For example:

SELECT AVG(size), SUM(size), MIN(size), MAX(size), COUNT(size) FROM modules;

The result will look like this:

+-----------+-----------+-----------+-----------+-----------+
| COLUMN_1  | COLUMN_2  | COLUMN_3  | COLUMN_4  | COLUMN_5  |
+-----------+-----------+-----------+-----------+-----------+
|       541 |      1624 |         4 |      1432 |         3 |
+-----------+-----------+-----------+-----------+-----------+

Suppose that the requirement is aggregations, but aggregations of rows that share some characteristic. Suppose further that the rows should be divided into two groups: the ones whose names begin with "b" and the ones whose names begin with "c". This can be done by adding a clause [GROUP BY expression]. For example,

SELECT SUBSTR(name, 1, 1), AVG(size), SUM(size), MIN(size), MAX(size), COUNT(size)
FROM modules
GROUP BY SUBSTR(name, 1, 1);

The result will look like this:

+------------+--------------+-----------+-----------+-----------+-------------+
| COLUMN_1   | COLUMN_2     | COLUMN_3  | COLUMN_4  | COLUMN_5  | COLUMN_6    |
+------------+--------------+-----------+-----------+-----------+-------------+
| b          |         1432 |      1432 |      1432 |      1432 |           1 |
| c          |           96 |       192 |         4 |       188 |           2 |
+------------+--------------+-----------+-----------+-----------+-------------+
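The aggregates and groups above can be reproduced in a minimal sketch. The section heading also mentions HAVING: in standard SQL, HAVING filters groups after aggregation, just as WHERE filters rows before it. Python's built-in sqlite3 module stands in for Tarantool here. One visible difference: SQLite's AVG always returns a floating-point value (about 541.33 for the whole table), while the table above shows 541, so the result type of AVG on integers may differ between the two DBMSs.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
""")

# Whole-table aggregates: a single result row.
totals = conn.execute(
    "SELECT SUM(size), MIN(size), MAX(size), COUNT(size) FROM modules"
).fetchone()
average = conn.execute("SELECT AVG(size) FROM modules").fetchone()[0]

# One result row per first letter of the name.
groups = conn.execute("""
    SELECT SUBSTR(name, 1, 1), AVG(size), SUM(size), MIN(size), MAX(size), COUNT(size)
    FROM modules
    GROUP BY SUBSTR(name, 1, 1)
    ORDER BY 1
""").fetchall()

# HAVING keeps only the groups whose aggregate satisfies the condition.
big_groups = conn.execute("""
    SELECT SUBSTR(name, 1, 1), COUNT(size)
    FROM modules
    GROUP BY SUBSTR(name, 1, 1)
    HAVING COUNT(size) > 1
""").fetchall()

print(totals)      # (1624, 4, 1432, 3)
print(average)     # a float close to 541.33 in SQLite
print(groups)      # [('b', 1432.0, 1432, 1432, 1432, 1), ('c', 96.0, 192, 4, 188, 2)]
print(big_groups)  # [('c', 2)]
```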

It is possible to define a temporary (viewed) table within a statement, usually within a SELECT statement, using a WITH clause. For example:

WITH tmp_table AS (SELECT x1 FROM t1) SELECT * FROM tmp_table;

So far, for every search in the modules table, the rows have come out in alphabetical order by name: "box", then "clock", then "crypto". However, to really be sure about the order, or to ask for a different order, it is necessary to be explicit and add a clause: ORDER BY column-name [ASC|DESC]. (ASC stands for ASCending, DESC stands for DESCending.) For example:

SELECT * FROM modules ORDER BY name DESC;

The result will be the usual rows, in descending alphabetical order: "crypto", then "clock", then "box".

After the ORDER BY clause there can be a clause LIMIT n, where n is the maximum number of rows to retrieve. For example:

SELECT * FROM modules ORDER BY name DESC LIMIT 2;

The result will be the first two rows, "crypto" and "clock".

After the ORDER BY clause and the LIMIT clause there can be a clause OFFSET n, where n is the number of rows to skip before rows are returned. The first offset is 0 (skip nothing). For example:

SELECT * FROM modules ORDER BY name DESC LIMIT 2 OFFSET 2;

The result will be the third row, "box".
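The three ORDER BY / LIMIT / OFFSET results can be checked together in a minimal sketch. Python's built-in sqlite3 module stands in for Tarantool here, so the snippet runs without a Tarantool instance.

```python
import sqlite3

# SQLite stands in for Tarantool; table contents follow this guide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (name TEXT PRIMARY KEY, size INTEGER, purpose TEXT);
    INSERT INTO modules VALUES ('box', 1432, 'Database Management'),
                               ('clock', 188, 'Seconds'),
                               ('crypto', 4, 'Cryptography');
""")

def names(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

descending = names("SELECT name FROM modules ORDER BY name DESC")
first_two = names("SELECT name FROM modules ORDER BY name DESC LIMIT 2")
# OFFSET 2 skips the first two rows of the ordered result.
after_two = names("SELECT name FROM modules ORDER BY name DESC LIMIT 2 OFFSET 2")

print(descending)  # ['crypto', 'clock', 'box']
print(first_two)   # ['crypto', 'clock']
print(after_two)   # ['box']
```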

A view is a canned SELECT. If you have a complex SELECT that you want to run frequently, create a view and then do a simple SELECT on the view. For example:

CREATE VIEW v AS SELECT size, (size *5) AS size_times_5
FROM modules
GROUP BY size, name
ORDER BY size_times_5;
SELECT * FROM v;

Tarantool has a "Write Ahead Log" (WAL). Effects of data-change statements are logged before they are permanently stored on disk. This is why, although entire databases can be held in volatile memory, committed changes are not lost in case of a power failure.

Tarantool supports commits and rollbacks. In effect, asking for a commit means asking for all the recent data-change statements, since a transaction began, to become permanent. In effect, asking for a rollback means asking for all the recent data-change statements, since a transaction began, to be cancelled.

For example, consider these statements:

CREATE TABLE things (remark STRING, PRIMARY KEY (remark));
START TRANSACTION;
INSERT INTO things VALUES ('A');
COMMIT;
START TRANSACTION;
INSERT INTO things VALUES ('B');
ROLLBACK;
SELECT * FROM things;

The result will be: one row, containing "A". The ROLLBACK cancelled the second INSERT statement, but did not cancel the first one, because it had already been committed.

Ordinarily every statement is automatically committed.

After START TRANSACTION, statements are not automatically committed: Tarantool considers that a transaction is now "active" until the transaction ends with a COMMIT statement or a ROLLBACK statement. While a transaction is active, all statements are legal except another START TRANSACTION.
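The commit/rollback sequence above can be replayed in a minimal sketch. Python's built-in sqlite3 module stands in for Tarantool here, and SQLite spells START TRANSACTION as BEGIN. Opening the connection with `isolation_level=None` keeps the module from starting transactions on its own, so the explicit statements are the only transaction boundaries.

```python
import sqlite3

# SQLite stands in for Tarantool; isolation_level=None means autocommit
# unless a transaction is opened explicitly with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE things (remark TEXT, PRIMARY KEY (remark))")

conn.execute("BEGIN")                          # SQLite's START TRANSACTION
conn.execute("INSERT INTO things VALUES ('A')")
conn.execute("COMMIT")                         # 'A' is now permanent

conn.execute("BEGIN")
conn.execute("INSERT INTO things VALUES ('B')")
inside = conn.in_transaction                   # True: a transaction is active
conn.execute("ROLLBACK")                       # cancels only the uncommitted 'B'

rows = conn.execute("SELECT * FROM things").fetchall()
print(inside, rows)  # True [('A',)]
```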

Tarantool’s SQL data is the same as Tarantool’s NoSQL data. When you create a table or an index with SQL, you are creating a space or an index in NoSQL. For example:

CREATE TABLE things (remark STRING, PRIMARY KEY (remark));
INSERT INTO things VALUES ('X');

is somewhat similar to

box.schema.space.create('THINGS',
{
    format = {
              [1] = {["name"] = "REMARK", ["type"] = "string"}
              }
})
box.space.THINGS:create_index('pk_unnamed_THINGS_1',{unique=true,parts={1,'string'}})
box.space.THINGS:insert{'X'}

Therefore you can take advantage of Tarantool’s NoSQL features even though your primary language is SQL. Here are some possibilities.

(1) NoSQL applications written in one of the connector languages may be slightly faster than SQL applications because SQL statements may require more parsing and may be translated to NoSQL requests.

(2) You can write stored procedures in Lua, combining Lua loop-control and Lua library-access statements with SQL statements. These routines are executed on the server, which is also the principal advantage of stored procedures written in pure SQL.

(3) There are some options that are implemented in NoSQL that are not (yet) implemented in SQL. For example, you can use NoSQL to change an index option, and to deny access to users named "guest".

(4) System spaces such as _space and _index can be accessed with SQL SELECT statements. This is not quite the same as an information_schema, but it does mean that you can use SQL to access the database’s metadata catalog.

Fields in NoSQL spaces can be accessed with SQL if and only if they are scalar and are defined in format clauses. Indexes of NoSQL spaces will be used with SQL if and only if they are TREE indexes.

Edgar F. Codd, the person most responsible for researching and explaining relational database concepts, listed the main criteria in what are known as Codd's 12 rules.

Although Tarantool is not advertised as "relational", Tarantool comes with a claim that it complies with these rules, with the following caveats and exceptions:

The rules state that all data must be viewable as relations. A Tarantool SQL table is a relation. However, it is possible to have duplicate values in SQL tables and it is possible to have an implicit ordering. Those characteristics are not allowed for true relations.

The rules state that there must be a dynamic online catalog. Tarantool has one but some metadata is missing from it.

The rules state that the data language must support authorization. Tarantool’s SQL does not. Authorization occurs via NoSQL requests.

The rules require that data must be physically independent (from underlying storage changes) and logically independent (from application program changes). So far there is not enough experience to make this guarantee.

The rules require certain types of updatable views. Tarantool’s views are not updatable.

The rules state that it should be impossible to use a low-level language to bypass integrity as defined in the relational-level language. In Tarantool's case, this is not true: for example, one can execute a request with Tarantool's NoSQL to violate a foreign-key constraint that was defined with Tarantool's SQL.

To learn more about SQL in Tarantool, check the reference.

SQL tutorial

This tutorial is a demonstration of the support for SQL in Tarantool. It includes the functionality that you'd encounter in an "SQL-101" course.

Before starting this tutorial:

  1. Install the tt CLI utility.

  2. Start a Tarantool instance in the interactive mode by running tt run -i:

    $ tt run -i
    Tarantool 3.0.0-0-g6ba34da7f8
    type 'help' for interactive help
    tarantool>
    
  3. Initialize the instance and switch the input language to SQL:

    tarantool> box.cfg{}
    tarantool> \set language sql
    tarantool> \set delimiter ;
    

Now you have a running Tarantool instance that accepts SQL input.

To get started, enter these SQL statements:

CREATE TABLE table1 (column1 INTEGER PRIMARY KEY, column2 VARCHAR(100));
INSERT INTO table1 VALUES (1, 'A');
UPDATE table1 SET column2 = 'B';
SELECT * FROM table1 WHERE column1 = 1;

The result of the SELECT statement looks like this:

sql_tutorial:instance001> SELECT * FROM table1 WHERE column1 = 1;
---
- metadata:
  - name: COLUMN1
    type: integer
  - name: COLUMN2
    type: string
  rows:
  - [1, 'B']
...

The result includes:

  • metadata: the names and data types of each column
  • result rows

For conciseness, metadata is skipped in query results in this tutorial. Only the result rows are shown.

Here is CREATE TABLE with more details:

  • There are multiple columns, with different data types.
  • There is a PRIMARY KEY (unique and not-null) for two of the columns.

Create another table:

CREATE TABLE table2 (column1 INTEGER,
                     column2 VARCHAR(100),
                     column3 SCALAR,
                     column4 DOUBLE,
                     PRIMARY KEY (column1, column2));

The result is: row_count: 1.

Put four rows in the table (table2):

  • The INTEGER and DOUBLE columns get numbers.
  • The VARCHAR column gets strings, and the SCALAR column gets binary values written as hexadecimal literals (for example, X'4142' is the two bytes of the string "AB").

INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5);
INSERT INTO table2 VALUES (1, 'CD', X'2020', 1E4);
INSERT INTO table2 VALUES (2, 'AB', X'2020', 12.34567);
INSERT INTO table2 VALUES (-1000, '', X'', 0.0);

Then try to put another row:

INSERT INTO table2 VALUES (1, 'AB', X'A5', -5.5);

This INSERT fails because of a primary-key violation: the row with the primary key 1, 'AB' already exists.

A sequential scan is a scan through all the table rows instead of using an index. In Tarantool, SELECT SQL queries that perform sequential scans are prohibited by default. For example, this query leads to the error Scanning is not allowed for 'TABLE2':

SELECT * FROM table2;

To execute a scan query, put the SEQSCAN keyword before the table name:

SELECT * FROM SEQSCAN table2;

Try to execute these queries that use indexed column1 in filters:

SELECT * FROM table2 WHERE column1 = 1;
SELECT * FROM table2 WHERE column1 + 1 = 2;

The result is:

  • The first query returns rows:

    - [1, 'AB', 'AB', 5.5]
    - [1, 'CD', '  ', 10000]
    
  • The second query fails with the error Scanning is not allowed for 'TABLE2'. Although column1 is indexed, the expression column1 + 1 is not calculated from the index, which makes this SELECT a scan query.

Note

To enable SQL scan queries without SEQSCAN for the current session, run this command:

SET SESSION "sql_seq_scan" = true;

Learn more about using SEQSCAN in the SQL FROM clause description.

Retrieve the 4 rows in the table, in descending order by column2, then (where the column2 values are the same) in ascending order by column4.

* is short for "all columns".

SELECT * FROM SEQSCAN table2 ORDER BY column2 DESC, column4 ASC;

The result is:

- - [1, 'CD', '  ', 10000]
  - [1, 'AB', 'AB', 5.5]
  - [2, 'AB', '  ', 12.34567]
  - [-1000, '', '', 0]

Retrieve some of what you inserted:

  • The first statement uses the LIKE comparison operator, which asks for "first character must be 'A', the next characters can be anything."
  • The second statement uses logical operators and parentheses: a row qualifies if both AND conditions are true, or if the condition after OR is true. Notice that the columns don't have to be indexed.

SELECT column1, column2, column1 * column4 FROM SEQSCAN table2 WHERE column2
LIKE 'A%';
SELECT column1, column2, column3, column4 FROM SEQSCAN table2
    WHERE (column1 < 2 AND column4 < 10)
    OR column3 = X'2020';

The first result is:

- - [1, 'AB', 5.5]
  - [2, 'AB', 24.69134]

The second result is:

- - [-1000, '', '', 0]
  - [1, 'AB', 'AB', 5.5]
  - [1, 'CD', '  ', 10000]
  - [2, 'AB', '  ', 12.34567]

Retrieve with grouping.

The rows that have the same values for column2 are grouped, and column4 is aggregated (summed, counted, averaged) within each group.

SELECT column2, SUM(column4), COUNT(column4), AVG(column4)
FROM SEQSCAN table2
GROUP BY column2;

The result is:

- - ['', 0, 1, 0]
  - ['AB', 17.84567, 2, 8.922835]
  - ['CD', 10000, 1, 10000]

Insert rows that contain NULL values.

NULL is not the same as Lua nil; it is commonly used in SQL for unknown or not-applicable values.

INSERT INTO table2 VALUES (1, NULL, X'4142', 5.5);
INSERT INTO table2 VALUES (0, '!!@', NULL, NULL);
INSERT INTO table2 VALUES (0, '!!!', X'00', NULL);

The results are:

  • The first INSERT fails because NULL is not permitted for a column that was defined with a PRIMARY KEY clause.
  • The other INSERT statements succeed.

Create a new index on column4.

There already is an index for the primary key. Indexes are useful for making queries faster. In this case, the index also acts as a constraint, because it prevents two rows from having the same values in column4. However, it is not an error that column4 has multiple occurrences of NULLs.

CREATE UNIQUE INDEX i ON table2 (column4);

The result is: row_count: 1.
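The constraint behavior from the last few steps can be checked in one sketch: a duplicate primary key, a NULL in a key column, and the UNIQUE index that tolerates NULLs. Python's built-in sqlite3 module stands in for Tarantool here, with three adaptations:

  • NOT NULL is spelled out, because SQLite (unlike Tarantool) otherwise accepts NULL in a composite PRIMARY KEY.
  • BLOB replaces SCALAR.
  • The row (3, 'EF', NULL, 5.5) is hypothetical and exists only to show a duplicate column4 value being refused.

```python
import sqlite3

# SQLite stands in for Tarantool; see the adaptations noted above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL,
                         column2 VARCHAR(100) NOT NULL,
                         column3 BLOB,
                         column4 DOUBLE,
                         PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5);
    INSERT INTO table2 VALUES (1, 'CD', X'2020', 1E4);
    INSERT INTO table2 VALUES (2, 'AB', X'2020', 12.34567);
    INSERT INTO table2 VALUES (-1000, '', X'', 0.0);
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_key = rejected("INSERT INTO table2 VALUES (1, 'AB', X'A5', -5.5)")
null_in_key = rejected("INSERT INTO table2 VALUES (1, NULL, X'4142', 5.5)")
conn.execute("INSERT INTO table2 VALUES (0, '!!@', NULL, NULL)")
conn.execute("INSERT INTO table2 VALUES (0, '!!!', X'00', NULL)")

# Two NULLs in column4 do not prevent creating the UNIQUE index ...
conn.execute("CREATE UNIQUE INDEX i ON table2 (column4)")
# ... but a repeated non-NULL value is refused (hypothetical row).
duplicate_value = rejected("INSERT INTO table2 VALUES (3, 'EF', NULL, 5.5)")

row_count = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]
print(duplicate_key, null_in_key, duplicate_value, row_count)  # True True True 6
```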

Create a table table3, which contains a subset of the table2 columns and a subset of the table2 rows.

You can do this by combining INSERT with SELECT. Then select everything from the result table.

CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY
(column2));
INSERT INTO table3 SELECT column1, column2 FROM SEQSCAN table2 WHERE column1 <> 2;
SELECT * FROM SEQSCAN table3;

The result is:

- - [-1000, '']
  - [0, '!!!']
  - [0, '!!@']
  - [1, 'AB']
  - [1, 'CD']

A subquery is a query within a query.

Find all the rows in table2 whose (column1, column2) values are not present in table3.

SELECT * FROM SEQSCAN table2 WHERE (column1, column2) NOT IN (SELECT column1,
column2 FROM SEQSCAN table3);

The result is the single row that was excluded when inserting the rows with the INSERT ... SELECT statement:

- - [2, 'AB', '  ', 12.34567]
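INSERT ... SELECT and the row-value NOT IN subquery can be replayed in a minimal sketch. Python's built-in sqlite3 module stands in for Tarantool here. This assumes SQLite 3.15 or later, which supports row values such as (column1, column2). SEQSCAN is Tarantool-specific, so it is dropped, and an ORDER BY is added because SQLite does not promise the primary-key order.

```python
import sqlite3

# SQLite stands in for Tarantool; the six table2 rows are the ones inserted so far.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL, column2 VARCHAR(100) NOT NULL,
                         column3 BLOB, column4 DOUBLE,
                         PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5), (1, 'CD', X'2020', 1E4),
                              (2, 'AB', X'2020', 12.34567), (-1000, '', X'', 0.0),
                              (0, '!!@', NULL, NULL), (0, '!!!', X'00', NULL);
    CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY (column2));
    -- INSERT ... SELECT copies every table2 row except the one with column1 = 2.
    INSERT INTO table3 SELECT column1, column2 FROM table2 WHERE column1 <> 2;
""")

copied = conn.execute("SELECT * FROM table3 ORDER BY column2").fetchall()

# Row-value comparison: which (column1, column2) pairs are absent from table3?
missing = conn.execute("""
    SELECT * FROM table2
    WHERE (column1, column2) NOT IN (SELECT column1, column2 FROM table3)
""").fetchall()

print(copied)   # [(-1000, ''), (0, '!!!'), (0, '!!@'), (1, 'AB'), (1, 'CD')]
print(missing)  # [(2, 'AB', b'  ', 12.34567)]
```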

A join is a combination of two tables. There is more than one way to write joins in Tarantool, for example, "Cartesian joins" or "left outer joins".

This example shows the most typical case, where column values from one table match column values from another table.

SELECT * FROM SEQSCAN table2, table3
    WHERE table2.column1 = table3.column1 AND table2.column2 = table3.column2
    ORDER BY table2.column4;

The result is:

- - [0, '!!!', "\0", null, 0, '!!!']
  - [0, '!!@', null, null, 0, '!!@']
  - [-1000, '', '', 0, -1000, '']
  - [1, 'AB', 'AB', 5.5, 1, 'AB']
  - [1, 'CD', '  ', 10000, 1, 'CD']

Create a table that includes a constraint: there must not be any rows containing 13 in column2. After that, try to insert the following row:

CREATE TABLE table4 (column1 INTEGER PRIMARY KEY, column2 INTEGER, CHECK
(column2 <> 13));
INSERT INTO table4 VALUES (12, 13);

Result: the insert fails, as it should, with the message Check constraint 'ck_unnamed_TABLE4_1' failed for tuple.

Create a table that includes a constraint: there must not be any rows containing values that do not appear in table2.

CREATE TABLE table5 (column1 INTEGER, column2 VARCHAR(100),
    PRIMARY KEY (column1),
    FOREIGN KEY (column1, column2) REFERENCES table2 (column1, column2));
INSERT INTO table5 VALUES (2,'AB');
INSERT INTO table5 VALUES (3,'AB');

Result:

  • The first INSERT statement succeeds because table2 contains a row with [2, 'AB', '  ', 12.34567].
  • The second INSERT statement, correctly, fails with the message Foreign key constraint ''fk_unnamed_TABLE5_1'' failed: foreign tuple was not found.
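Foreign-key checks work in both directions: a child row whose key is missing is rejected, and a parent row that is still referenced cannot be deleted. This sketch shows both, using Python's built-in sqlite3 module as a stand-in for Tarantool. SQLite leaves foreign keys unchecked unless `PRAGMA foreign_keys = ON` is issued, and NOT NULL is spelled out on the key columns to match Tarantool's primary-key rule.

```python
import sqlite3

# SQLite stands in for Tarantool; foreign keys must be switched on explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL, column2 VARCHAR(100) NOT NULL,
                         column4 DOUBLE, PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (1, 'AB', 5.5), (1, 'CD', 1E4), (2, 'AB', 12.34567),
                              (-1000, '', 0.0), (0, '!!@', NULL), (0, '!!!', NULL);
    CREATE TABLE table5 (column1 INTEGER, column2 VARCHAR(100),
        PRIMARY KEY (column1),
        FOREIGN KEY (column1, column2) REFERENCES table2 (column1, column2));
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn.execute("INSERT INTO table5 VALUES (2, 'AB')")                # parent row exists
child_rejected = rejected("INSERT INTO table5 VALUES (3, 'AB')")    # no (3, 'AB') in table2
parent_rejected = rejected("DELETE FROM table2 WHERE column1 = 2")  # still referenced
conn.execute("DELETE FROM table2 WHERE column1 = -1000")             # not referenced
remaining = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]

print(child_rejected, parent_rejected, remaining)  # True True 5
```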

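Due to earlier INSERT statements, these values are in column4 of table2: {0, NULL, NULL, 5.5, 10000, 12.34567}. Add 5 to each of these values except 0. The NULL values stay NULL. The condition NULL <> 0 evaluates to NULL rather than true, so those rows are not updated; in any case, adding 5 to NULL results in NULL, as SQL arithmetic requires. Use SELECT to see what happened to column4.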

UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0;
SELECT column4 FROM SEQSCAN table2 ORDER BY column4;

The result is: {NULL, NULL, 0, 10.5, 17.34567, 10005}.
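The NULL handling can be observed directly in a minimal sketch. The update reports three changed rows, because the two NULL rows never satisfy the WHERE condition. Python's built-in sqlite3 module stands in for Tarantool here; like the result above, SQLite sorts NULLs first in ascending order. Values are rounded before comparison because floating-point sums such as 12.34567 + 5 need not be exact.

```python
import sqlite3

# SQLite stands in for Tarantool; the six rows match table2 at this point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL, column2 VARCHAR(100) NOT NULL,
                         column4 DOUBLE, PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (1, 'AB', 5.5), (1, 'CD', 1E4), (2, 'AB', 12.34567),
                              (-1000, '', 0.0), (0, '!!@', NULL), (0, '!!!', NULL);
""")

# NULL <> 0 is NULL, not true, so the two NULL rows are not touched.
cur = conn.execute("UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0")
updated = cur.rowcount

# Arithmetic with NULL yields NULL (None in Python).
null_plus_five = conn.execute("SELECT NULL + 5").fetchone()[0]

values = [row[0] for row in conn.execute("SELECT column4 FROM table2 ORDER BY column4")]
print(updated, null_plus_five)  # 3 None
print(values[:2], [round(v, 5) for v in values[2:]])  # [None, None] [0.0, 10.5, 17.34567, 10005.0]
```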

Due to earlier INSERT statements, there are 6 rows in table2:

- - [-1000, '', '', 0]
  - [0, '!!!', "\0", null]
  - [0, '!!@', null, null]
  - [1, 'AB', 'AB', 10.5]
  - [1, 'CD', '  ', 10005]
  - [2, 'AB', '  ', 17.34567]

Try to delete the last and first of these rows:

DELETE FROM table2 WHERE column1 = 2;
DELETE FROM table2 WHERE column1 = -1000;
SELECT COUNT(column1) FROM SEQSCAN table2;

The result is:

  • The first DELETE statement causes an error because there’s a foreign-key constraint.
  • The second DELETE statement succeeds.
  • The SELECT statement shows that there are 5 rows remaining.

Create another constraint that there must not be any rows in table1 containing values that do not appear in table5. This was impossible during the table1 creation because at that time table5 did not exist. You can add constraints to existing tables with the ALTER TABLE statement.

ALTER TABLE table1 ADD CONSTRAINT c
    FOREIGN KEY (column1) REFERENCES table5 (column1);
DELETE FROM table1;
ALTER TABLE table1 ADD CONSTRAINT c
    FOREIGN KEY (column1) REFERENCES table5 (column1);

Result: the ALTER TABLE statement fails the first time because there is a row in table1, and ADD CONSTRAINT requires that the table be empty. After the row is deleted, the ALTER TABLE statement completes successfully. Now there is a chain of references, from table1 to table5 and from table5 to table2.

The idea of a trigger is: if a change (INSERT or UPDATE or DELETE) happens, then a further action (perhaps another INSERT or UPDATE or DELETE) will happen.

Set up the following trigger: when an update to table3 is done, do an update to table2. Specify this as FOR EACH ROW, so that the trigger activates 5 times (since there are 5 rows in table3).

SELECT column4 FROM table2 WHERE column1 = 2;
CREATE TRIGGER tr AFTER UPDATE ON table3 FOR EACH ROW
BEGIN UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2; END;
UPDATE table3 SET column2 = column2;
SELECT column4 FROM table2 WHERE column1 = 2;

Result:

  • The first SELECT shows that the original value of column4 in table2 where column1 = 2 was: 17.34567.
  • The second SELECT returns the original value plus 5, because the trigger fired once for each of the 5 updated rows:

    - - [22.34567]
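The row-level trigger can be reproduced in a minimal sketch, with Python's built-in sqlite3 module standing in for Tarantool. Only the table2 row that the trigger modifies is created. The result is rounded because five floating-point additions need not land exactly on 22.34567.

```python
import sqlite3

# SQLite stands in for Tarantool; only the rows the trigger needs are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL, column2 VARCHAR(100) NOT NULL,
                         column4 DOUBLE, PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (2, 'AB', 17.34567);
    CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY (column2));
    INSERT INTO table3 VALUES (-1000, ''), (0, '!!!'), (0, '!!@'), (1, 'AB'), (1, 'CD');
    CREATE TRIGGER tr AFTER UPDATE ON table3 FOR EACH ROW
    BEGIN UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2; END;
""")

before = conn.execute("SELECT column4 FROM table2 WHERE column1 = 2").fetchone()[0]
conn.execute("UPDATE table3 SET column2 = column2")  # updates all 5 rows: 5 trigger firings
after = conn.execute("SELECT column4 FROM table2 WHERE column1 = 2").fetchone()[0]

print(before, round(after, 5))  # 17.34567 22.34567
```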

You can manipulate string data (usually defined with CHAR or VARCHAR data types) in many ways. For example:

  • concatenate strings with the || operator
  • extract substrings with the SUBSTR function

SELECT column2, column2 || column2, SUBSTR(column2, 2, 1) FROM SEQSCAN table2;

The result is:

- - ['!!!', '!!!!!!', '!']
  - ['!!@', '!!@!!@', '!']
  - ['AB', 'ABAB', 'B']
  - ['CD', 'CDCD', 'D']
  - ['AB', 'ABAB', 'B']

You can also manipulate number data (usually defined with INTEGER or DOUBLE data types) in many ways. For example:

  • shift left with the << operator
  • get modulo with the % operator

SELECT column1, column1 << 1, column1 << 2, column1 % 2 FROM SEQSCAN table2;

The result is:

- - [0, 0, 0, 0]
  - [0, 0, 0, 0]
  - [1, 2, 4, 1]
  - [1, 2, 4, 1]
  - [2, 4, 8, 0]
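Both sets of operators can be checked in one query with a minimal sketch. Python's built-in sqlite3 module stands in for Tarantool here, and SEQSCAN is dropped because it is Tarantool-specific. The ORDER BY follows the (column1, column2) primary key, which matches the order shown in the two results above.

```python
import sqlite3

# SQLite stands in for Tarantool; the rows match table2 after the DELETE step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (column1 INTEGER NOT NULL, column2 VARCHAR(100) NOT NULL,
                         PRIMARY KEY (column1, column2));
    INSERT INTO table2 VALUES (0, '!!!'), (0, '!!@'), (1, 'AB'), (1, 'CD'), (2, 'AB');
""")

# String operators (||, SUBSTR) and number operators (<<, %) side by side.
rows = conn.execute("""
    SELECT column2, column2 || column2, SUBSTR(column2, 2, 1),
           column1 << 1, column1 << 2, column1 % 2
    FROM table2
    ORDER BY column1, column2
""").fetchall()

for row in rows:
    print(row)
```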

Tarantool can handle:

  • integers anywhere in the 4-byte integer range
  • approximate-numerics anywhere in the 8-byte IEEE floating point range
  • any Unicode characters, with UTF-8 encoding and a choice of collations

Insert such values in a new table and see what happens when you select them with arithmetic on a number column and ordering by a string column.

CREATE TABLE t6 (column1 INTEGER, column2 VARCHAR(10), column4 DOUBLE,
PRIMARY KEY (column1));
INSERT INTO t6 VALUES (-1234567890, 'АБВГД', 123456.123456);
INSERT INTO t6 VALUES (+1234567890, 'GD', 1e30);
INSERT INTO t6 VALUES (10, 'FADEW?', 0.000001);
INSERT INTO t6 VALUES (5, 'ABCDEFG', NULL);
SELECT column1 + 1, column2, column4 * 2 FROM SEQSCAN t6 ORDER BY column2;

The result is:

- - [6, 'ABCDEFG', null]
  - [11, 'FADEW?', 2e-06]
  - [1234567891, 'GD', 2e+30]
  - [-1234567889, 'АБВГД', 246912.246912]

A view (or viewed table) is virtual, meaning that its rows aren't physically in the database; their values are calculated from other tables.

Create a view v3 based on table t6 and select from it:

CREATE VIEW v3 AS SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0;
SELECT * FROM v3;

The result is:

- - ['АБ', 123456.123456]
  - ['FA', 1e-06]
  - ['GD', 1e+30]

By putting WITH + SELECT in front of a SELECT, you can make a temporary view that lasts for the duration of the statement.

Create such a view and select from it:

WITH cte AS (
             SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
             WHERE column4 >= 0)
SELECT * FROM cte;

The result is the same as the CREATE VIEW result:

- - ['АБ', 123456.123456]
  - ['FA', 1e-06]
  - ['GD', 1e+30]
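The equivalence of the view and the WITH clause can be checked in a minimal sketch, with Python's built-in sqlite3 module standing in for Tarantool. Both queries get ORDER BY 2 (the second output column) so that they can be compared row by row. SQLite's SUBSTR counts characters in text, so the Cyrillic value is cut to two letters, just as in the result above.

```python
import sqlite3

# SQLite stands in for Tarantool; the t6 rows are the ones inserted above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t6 (column1 INTEGER, column2 VARCHAR(10), column4 DOUBLE,
                     PRIMARY KEY (column1));
    INSERT INTO t6 VALUES (-1234567890, 'АБВГД', 123456.123456);
    INSERT INTO t6 VALUES (+1234567890, 'GD', 1e30);
    INSERT INTO t6 VALUES (10, 'FADEW?', 0.000001);
    INSERT INTO t6 VALUES (5, 'ABCDEFG', NULL);
    CREATE VIEW v3 AS SELECT SUBSTR(column2, 1, 2), column4 FROM t6
        WHERE column4 >= 0;
""")

# A stored view and a statement-scoped CTE with the same body.
view_rows = conn.execute("SELECT * FROM v3 ORDER BY 2").fetchall()
cte_rows = conn.execute("""
    WITH cte AS (SELECT SUBSTR(column2, 1, 2), column4 FROM t6
                 WHERE column4 >= 0)
    SELECT * FROM cte ORDER BY 2
""").fetchall()

print(view_rows == cte_rows)  # True
print(view_rows)              # [('FA', 1e-06), ('АБ', 123456.123456), ('GD', 1e+30)]
```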

Tarantool can handle statements like SELECT 55; (select without FROM) like some other popular DBMSs. But it also handles the more standard statement VALUES (expression [, expression ...]);.

SELECT 55 * 55, 'The rain in Spain';
VALUES (55 * 55, 'The rain in Spain');

The result of both these statements is:

- - [3025, 'The rain in Spain']

To find out the internal structure of the Tarantool database with SQL, select from the Tarantool system tables _space, _index, and _trigger:

SELECT * FROM SEQSCAN "_space";
SELECT * FROM SEQSCAN "_index";
SELECT * FROM SEQSCAN "_trigger";

Actually, these statements select from NoSQL "system spaces".

Select from _space by a table name:

SELECT "id", "name", "owner", "engine" FROM "_space" WHERE "name"='TABLE3';

The result is:

- - [517, 'TABLE3', 1, 'memtx']

You can execute SQL statements directly from the Lua code without switching to the SQL input.

Change the settings so that the console accepts statements written in Lua instead of statements written in SQL:

sql_tutorial:instance001> \set language lua

You can invoke SQL statements using the Lua function box.execute(string).

sql_tutorial:instance001> box.execute([[SELECT * FROM SEQSCAN table3;]]);

The result is:

- - [-1000, '']
  - [0, '!!!']
  - [0, '!!@']
  - [1, 'AB']
  - [1, 'CD']
...

To see how the SQL in Tarantool scales, create a bigger table.

The following Lua code generates one million rows with random data and inserts them into a table. Copy this code into the Tarantool console and wait a bit:

box.execute("CREATE TABLE tester (s1 INT PRIMARY KEY, s2 VARCHAR(10))");

function string_function()
    -- Return a string of 10 random uppercase letters (ASCII codes 65..90)
    local random_number
    local random_string
    random_string = ""
    for x = 1, 10, 1 do
        random_number = math.random(65, 90)
        random_string = random_string .. string.char(random_number)
    end
    return random_string
end;

function main_function()
    -- Insert one million rows, one INSERT statement per row
    local string_value, sql_statement
    for i = 1, 1000000, 1 do
        string_value = string_function()
        sql_statement = "INSERT INTO tester VALUES (" .. i .. ",'" .. string_value .. "')"
        box.execute(sql_statement)
    end
end;
-- os.clock() returns the processor time used by the instance, in seconds
start_time = os.clock();
main_function();
end_time = os.clock();
print('insert done in ' .. end_time - start_time .. ' seconds');

As a result, you have a table with a million rows and a message such as insert done in 88.570578 seconds (the exact time depends on your hardware).
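The string_function above builds its result by repeated concatenation, which creates a new intermediate string on every iteration. A minimal alternative sketch collects the characters in a Lua table and joins them once with table.concat; the name random_uppercase_string and its length parameter are hypothetical and not part of the tutorial.

```lua
-- Hypothetical variant of string_function: returns a string of `length`
-- random uppercase ASCII letters (character codes 65..90, 'A'..'Z').
local function random_uppercase_string(length)
    local chars = {}
    for i = 1, length do
        -- math.random(m, n) returns a random integer in [m, n]
        chars[i] = string.char(math.random(65, 90))
    end
    -- Join the characters once instead of concatenating inside the loop
    return table.concat(chars)
end

print(random_uppercase_string(10))
```

For a 10-character string the difference is small; the table.concat pattern matters more when strings are built from many pieces.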

Check how SELECT works on the million-row table:

  • the first query uses an index because s1 is the primary key
  • the second query does not use an index, so it scans the whole table (SEQSCAN)
box.execute([[SELECT * FROM tester WHERE s1 = 73446;]]);
box.execute([[SELECT * FROM SEQSCAN tester WHERE s2 LIKE 'QFML%';]]);

The result is:

  • the first statement completes instantaneously
  • the second statement completes noticeably slower
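Why the two timings differ can be modeled in plain Lua: a lookup by key touches one entry, while a prefix match has to examine every row. This is only an analogy under stated assumptions: a Lua table keyed by s1 stands in for the primary index (Tarantool's default memtx index type is TREE, not a Lua table), the row count is reduced to 1,000, and find_by_key and scan_by_prefix are hypothetical helpers.

```lua
-- Small stand-in for the tester table: rows keyed by s1.
local rows = {}
for i = 1, 1000 do
    rows[i] = { s1 = i, s2 = string.format("ROW%05d", i) }
end

-- Keyed lookup: a single table access, like an index lookup on s1.
local function find_by_key(key)
    return rows[key]
end

-- Full scan: check every row and count how many were examined,
-- like SEQSCAN with a LIKE 'prefix%' condition.
local function scan_by_prefix(prefix)
    local found, checked = {}, 0
    for i = 1, #rows do
        checked = checked + 1
        if rows[i].s2:sub(1, #prefix) == prefix then
            found[#found + 1] = rows[i]
        end
    end
    return found, checked
end

print(find_by_key(734).s2) -- ROW00734
local found, checked = scan_by_prefix("ROW0073")
print(#found, checked)     -- 10  1000
```

The scan examines all 1,000 rows to return 10 of them, which is the kind of work the second SQL statement does on a million rows.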

To clean up all the objects created in this tutorial, switch to the SQL input language again. Then run the DROP statements for all created tables, views, and triggers.

These statements must be entered separately.

sql_tutorial:instance001> \set language sql
sql_tutorial:instance001> DROP TABLE tester;
sql_tutorial:instance001> DROP TABLE table1;
sql_tutorial:instance001> DROP VIEW v3;
sql_tutorial:instance001> DROP TRIGGER tr;
sql_tutorial:instance001> DROP TABLE table5;
sql_tutorial:instance001> DROP TABLE table4;
sql_tutorial:instance001> DROP TABLE table3;
sql_tutorial:instance001> DROP TABLE table2;
sql_tutorial:instance001> DROP TABLE t6;
sql_tutorial:instance001> \set language lua
sql_tutorial:instance001> os.exit();

Improving MySQL with Tarantool

Replicating MySQL is one of Tarantool’s killer functions. It allows you to keep your existing MySQL database while accelerating it and scaling it out horizontally. Even if you aren’t interested in extensive expansion, replacing existing replicas with Tarantool can save you money, because Tarantool is more efficient per core than MySQL. To read a testimonial of a company that implemented Tarantool replication on a large scale, see the following article.

If you run into any trouble with regard to the basics of Tarantool, see the Getting started guide or the Data model description. A helpful log for troubleshooting during this tutorial is replicatord.log in /var/log. You can also have a look at the instance’s log example.log in /var/log/tarantool.

The tutorial is intended for CentOS 7.5 and MySQL 5.7. The tutorial requires that systemd and MySQL are installed.

In this section, you configure MySQL and create a database.

  1. First, install the necessary packages in CentOS:

    $ yum -y install git ncurses-devel cmake gcc-c++ boost boost-devel wget unzip nano bzip2 mysql-devel mysql-lib
    
  2. Clone the Tarantool-MySQL replication package from GitHub:

    $ git clone https://github.com/tarantool/mysql-tarantool-replication.git
    
  3. Build the replicator with cmake:

    $ cd mysql-tarantool-replication
    $ git submodule update --init --recursive
    $ cmake .
    $ make
    
  4. The replicator will run as a systemd daemon called replicatord, so edit its systemd service file (replicatord.service) in the mysql-tarantool-replication repository:

    $ nano replicatord.service
    

    The following line should be changed:

    ExecStart=/usr/local/sbin/replicatord -c /usr/local/etc/replicatord.cfg
    

    To change it, replace the .cfg extension with .yml:

    ExecStart=/usr/local/sbin/replicatord -c /usr/local/etc/replicatord.yml
    
  5. Next, copy the files from the mysql-tarantool-replication repository to the locations where they are expected:

    $ cp replicatord /usr/local/sbin/replicatord
    $ cp replicatord.service /etc/systemd/system
    
  6. Enter the MySQL console and create a sample database (depending on your existing installation, you may need to use a user other than root):

    mysql -u root -p
    CREATE DATABASE menagerie;
    QUIT
    
  7. Get some sample data from MySQL. The data will be downloaded to your home directory. After that, load it into the menagerie database from the terminal.

    cd
    wget http://downloads.mysql.com/docs/menagerie-db.zip
    unzip menagerie-db.zip
    cd menagerie-db
    mysql -u root -p menagerie < cr_pet_tbl.sql
    mysql -u root -p menagerie < load_pet_tbl.sql
    mysql menagerie -u root -p < ins_puff_rec.sql
    mysql menagerie -u root -p < cr_event_tbl.sql
    
  8. Enter the MySQL console and massage the data for use with the Tarantool replicator. In this step, you:

    • add an ID
    • change a field name to avoid conflict
    • cut down the number of fields

    With real data, this is the step that involves the most tweaking.

    mysql -u root -p
    USE menagerie;
    ALTER TABLE pet ADD id INT PRIMARY KEY AUTO_INCREMENT FIRST;
    ALTER TABLE pet CHANGE COLUMN `name` `name2` VARCHAR(255);
    ALTER TABLE pet DROP sex, DROP birth, DROP death;
    QUIT
    
  9. The sample data is set up. Edit the MySQL configuration file to use it with the replicator:

    $ cd
    $ nano /etc/my.cnf
    

    Note that your my.cnf for MySQL could be in a slightly different location. Set:

    [mysqld]
    binlog_format = ROW
    server_id = 1
    log-bin = mysql-bin
    interactive_timeout = 3600
    wait_timeout = 3600
    max_allowed_packet = 32M
    socket = /var/lib/mysql/mysql.sock
    bind-address = 127.0.0.1
    
    [client]
    socket = /var/lib/mysql/mysql.sock
    
  10. After exiting nano, restart mysqld:

    $ systemctl restart mysqld
    

In this section, you install Tarantool and set up spaces for replication.

  1. Go to the Download page and follow the installation instructions.

  2. Install the tt CLI utility.

  3. Create a new tt environment in the current directory using the tt init command.

  4. In the /etc/tarantool/instances.available/mysql directory, create the tt instance configuration files:

    • config.yaml – specifies the following configuration

      app:
        file: 'myapp.lua'
      
      groups:
        group001:
          replicasets:
            replicaset001:
              instances:
                instance001:
                  iproto:
                    listen:
                    - uri: '127.0.0.1:3301'
      
    • instances.yml – specifies instances to run in the current environment

      instance001:
      
    • myapp.lua – contains a Lua script with an application to load

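      -- if_not_exists avoids an error when the script runs again on restart
      box.schema.user.grant('guest', 'read,write,execute', 'universe', nil, { if_not_exists = true })
      
      local function bootstrap()
          -- Space that stores the replicator's binlog position (binlog_pos_space)
          if not box.space.mysqldaemon then
              local s = box.schema.space.create('mysqldaemon')
              s:create_index('primary',
                      { type = 'tree', parts = { 1, 'unsigned' }, if_not_exists = true })
          end
          -- Space that receives the replicated rows (space in mappings)
          if not box.space.mysqldata then
              local t = box.schema.space.create('mysqldata')
              t:create_index('primary',
                      { type = 'tree', parts = { 1, 'unsigned' }, if_not_exists = true })
          end
      end
      bootstrap()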
      

    For details, see the Configuration section.

  5. Inside the instances.enabled directory of the created tt environment, create a symlink (mysql) to the directory from the previous step:

    $ ln -s /etc/tarantool/instances.available/mysql mysql
    
  6. Next, start up the Lua program with tt, the Tarantool command-line utility:

    $ tt start mysql
    
  7. Enter the Tarantool instance:

    $ tt connect mysql:instance001
    
  8. Check that the target spaces were successfully created:

    mysql:instance001> box.space._space:select()
    

    At the bottom, you will see the mysqldaemon and mysqldata spaces. Then exit with CTRL-D.

MySQL and Tarantool are now set up. You can proceed to configure the replicator.

  1. Edit the replicatord.yml file in the main mysql-tarantool-replication directory:

    nano replicatord.yml
    
  2. Change the entire file as follows. Don’t forget to add your MySQL password and set the appropriate user:

    mysql:
        host: 127.0.0.1
        port: 3306
        user: root
        password:
        connect_retry: 15 # seconds
    
    tarantool:
        host: 127.0.0.1:3301
        binlog_pos_space: 512
        binlog_pos_key: 0
        connect_retry: 15 # seconds
        sync_retry: 1000 # milliseconds
    
    mappings:
     - database: menagerie
       table: pet
       columns: [ id, name2, owner, species ]
       space: 513
       key_fields:  [ 0 ]
       # insert_call: function_name
       # update_call: function_name
       # delete_call: function_name
    
  3. Copy replicatord.yml to the location that replicatord.service passes to the replicator with the -c option:

    $ cp replicatord.yml /usr/local/etc/replicatord.yml
    
  4. Next, start up the replicator:

    $ systemctl start replicatord
    
  5. Enter the Tarantool instance:

    $ tt connect mysql:instance001
    
  6. Do a select on the mysqldata space. The replicated content from MySQL looks as follows:

    mysql:instance001> box.space.mysqldata:select()
    ---
    - - [1, 'Fluffy', 'Harold', 'cat']
      - [2, 'Claws', 'Gwen', 'cat']
      - [3, 'Buffy', 'Harold', 'dog']
      - [4, 'Fang', 'Benny', 'dog']
      - [5, 'Bowser', 'Diane', 'dog']
      - [6, 'Chirpy', 'Gwen', 'bird']
      - [7, 'Whistler', 'Gwen', 'bird']
      - [8, 'Slim', 'Benny', 'snake']
      - [9, 'Puffball', 'Diane', 'hamster']
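The columns list in the mappings section of replicatord.yml determines the field order of the tuples that the replicator writes to the target space, which is why each tuple above has the form [id, name2, owner, species]. The pure-Lua sketch below only illustrates that ordering; row_to_tuple is a hypothetical helper, not part of replicatord.

```lua
-- Hypothetical helper: turn a MySQL row (column name -> value) into a
-- tuple whose fields follow the order given in the `columns` mapping.
local function row_to_tuple(row, columns)
    local tuple = {}
    for i, column in ipairs(columns) do
        tuple[i] = row[column]
    end
    return tuple
end

-- Column order from the `columns` setting in replicatord.yml
local columns = { "id", "name2", "owner", "species" }
local row = { species = "cat", owner = "Harold", name2 = "Fluffy", id = 1 }

print(table.concat(row_to_tuple(row, columns), ", ")) -- 1, Fluffy, Harold, cat
```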
    

In this section, you enter a record into MySQL and check that the record is replicated to Tarantool. To do this:

  1. Exit the Tarantool instance with CTRL-D.

  2. Insert a record into MySQL:

    mysql -u root -p
    USE menagerie;
    INSERT INTO pet(name2, owner, species) VALUES ('Spot', 'Brad', 'dog');
    QUIT
    
  3. In the terminal, enter the Tarantool instance:

    $ tt connect mysql:instance001
    
  4. To see the replicated data in Tarantool, run the following command:

    mysql:instance001> box.space.mysqldata:select()
    

Transactions

Transactions allow users to perform multiple operations atomically.

For more information on how transactions work in Tarantool, see the following sections:

Transaction model

The transaction model of Tarantool corresponds to the ACID properties (atomicity, consistency, isolation, durability).

Tarantool has two modes of transaction behavior:

Each transaction in Tarantool is executed in a single fiber on a single thread, sees a consistent database state and commits all changes atomically.

All transaction changes are written to the WAL (Write Ahead Log) in a single batch in a specific order at the time of the commit. If needed, transaction changes can also be rolled back – completely or to a specified savepoint.

Therefore, every transaction in Tarantool has the highest transaction isolation level: serializable.

By default, the isolation level of Tarantool is serializable. The exception is a failure during writing to the WAL, which can occur, for example, when the disk runs out of space. In this case, the isolation level of concurrent read transactions would be read-committed.

The MVCC mode provides several options that enable you to tune the visibility behavior during transaction execution.

The read-committed isolation level makes visible the changes of all transactions that have started committing (box.commit() was called).

  • Write transactions with reads

    Manual usage of read-committed for write transactions with reads is completely safe, as this transaction will eventually result in a commit. If a previous transaction fails, this transaction will inevitably fail as well due to the serializable isolation level.

  • Read transactions

    Manual usage of read-committed for read transactions may be unsafe, as it may lead to phantom reads.

The read-confirmed isolation level makes visible the changes of all transactions that have finished committing (box.commit() has returned). This means that the new data is already on disk or even on other replicas.

  • Read transactions

    The use of read-confirmed is safe for read transactions given that data is on disk (for asynchronous replication) or even in other replicas (for synchronous replication).

  • Write transactions

    To achieve serializable, any write transaction should read all data that has already been committed. Otherwise, it may conflict when it reaches its commit.
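The difference between the two levels can be modeled as a filter over the states of other transactions. The sketch below is a simplification that tracks only two states per transaction, "committing" (box.commit() was called) and "confirmed" (box.commit() has returned); the state names and the visible_changes function are hypothetical and do not correspond to Tarantool internals.

```lua
-- Simplified model: each concurrent transaction is in one of two states.
--   committing: box.commit() was called but has not returned yet
--   confirmed:  box.commit() has returned
local transactions = {
    { id = 1, state = "confirmed" },
    { id = 2, state = "committing" },
    { id = 3, state = "confirmed" },
}

-- Return the ids of transactions whose changes a reader would see
-- under the given isolation level in this model.
local function visible_changes(level, txns)
    local visible = {}
    for _, txn in ipairs(txns) do
        local ok
        if level == "read-committed" then
            -- started committing: both states are visible
            ok = txn.state == "committing" or txn.state == "confirmed"
        elseif level == "read-confirmed" then
            -- finished committing: only confirmed is visible
            ok = txn.state == "confirmed"
        else
            error("level not covered by this model: " .. tostring(level))
        end
        if ok then visible[#visible + 1] = txn.id end
    end
    return visible
end

print(table.concat(visible_changes("read-committed", transactions), " ")) -- 1 2 3
print(table.concat(visible_changes("read-confirmed", transactions), " ")) -- 1 3
```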

Linearizability of read operations implies that if a response for a write request arrived earlier than a read request was made, this read request should return the results of the write request. When called with linearizable, box.begin() yields until the instance receives enough data from remote peers to be sure that the transaction is linearizable.

Linearizable transactions may only perform requests to the following memtx space types:

A linearizable transaction can fail with an error in the following cases:

  • If the node can’t contact enough remote peers to determine which data is committed.
  • If the data isn’t received during the timeout specified in box.begin().

Note

To start a linearizable transaction, the node should be the replication source for at least N - Q + 1 remote replicas. Here N is the count of registered nodes in the cluster and Q is replication_synchro_quorum. So, for example, you can’t perform a linearizable transaction on anonymous replicas because they can’t be the source of replication for other nodes.
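As a worked example of the formula above: with N = 5 registered nodes and replication_synchro_quorum Q = 3, a node must be the replication source for at least 5 - 3 + 1 = 3 remote replicas. The hypothetical helper below just evaluates that expression; the range check on Q is an assumption of this sketch.

```lua
-- Minimum number of remote replicas for which a node must be the
-- replication source to start a linearizable transaction: N - Q + 1.
--   n: number of registered nodes in the cluster
--   q: replication_synchro_quorum
local function min_downstream_replicas(n, q)
    -- sanity check assumed by this sketch: the quorum must be reachable
    assert(q >= 1 and q <= n, "quorum must be between 1 and N")
    return n - q + 1
end

print(min_downstream_replicas(5, 3)) -- 3
print(min_downstream_replicas(3, 2)) -- 2
```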

To minimize the possibility of conflicts, MVCC uses what is called best-effort visibility:

This inevitably leads to the serializable isolation level. Since there is no option for MVCC to analyze the whole transaction to make a decision, it makes the choice on the first operation.

Note

If the serializable isolation level becomes unreachable, the transaction is marked as "conflicted" and rolled back.

Thread model

The thread model assumes that a query received by Tarantool via network is processed with three operating system threads:

  1. The network thread (or threads) on the server side receives the query, parses the statement, checks if it is correct, and then transforms it into a special structure – a message containing an executable statement and its options.

  2. The network thread sends this message to the instance’s transaction processor thread (TX thread) via a lock-free message bus. Lua programs are executed directly in the transaction processor thread, and do not need to be parsed and prepared.

    The TX thread either uses a space index to find and update the tuple, or executes a stored function that performs a data operation.

  3. Executing the operation produces a message to the write-ahead logging (WAL) thread, which is used to commit the transaction, and the fiber executing the transaction is suspended. When the transaction results in a COMMIT or ROLLBACK, the following actions are taken:

    • The WAL thread responds with a message to the TX thread.
    • The fiber executing the transaction is resumed to process the result of the transaction.
    • The result of the fiber execution is passed to the network thread, and the network thread returns the result to the client.

Note

There is only one TX thread in Tarantool. Some users are used to the idea that there can be multiple threads working on the database. For example, thread #1 reads a row #x while thread #2 writes a row #y. With Tarantool this does not happen. Only the TX thread can access the database, and there is only one TX thread for each Tarantool instance.

The TX thread can handle many fibers: sets of computer instructions that can contain "yield" signals. The TX thread executes all computer instructions up to a yield signal, and then switches to execute the instructions of another fiber.

Yields must happen, otherwise the TX thread would be permanently stuck on the same fiber.
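Tarantool fibers are managed by Tarantool's own scheduler, but the idea of cooperative multitasking can be sketched with plain Lua coroutines: each "fiber" runs until it calls coroutine.yield(), and a round-robin loop then resumes the next one. This is only an analogy; make_fiber and run are hypothetical and do not show how Tarantool's fiber module works internally.

```lua
-- Each "fiber" is a Lua coroutine that records its steps and yields
-- control back to the scheduler after each step.
local log = {}

local function make_fiber(name, steps)
    return coroutine.create(function()
        for i = 1, steps do
            log[#log + 1] = name .. i
            coroutine.yield() -- give other fibers a chance to run
        end
    end)
end

-- Round-robin scheduler: resume every live coroutine until all are dead.
local function run(fibers)
    local alive = true
    while alive do
        alive = false
        for _, co in ipairs(fibers) do
            if coroutine.status(co) ~= "dead" then
                local ok, err = coroutine.resume(co)
                if not ok then error(err) end
                alive = true
            end
        end
    end
end

run({ make_fiber("a", 2), make_fiber("b", 3) })
print(table.concat(log, " ")) -- a1 b1 a2 b2 b3
```

If one coroutine looped forever without calling coroutine.yield(), run() would never get back to the others, which mirrors why yields must happen in the TX thread.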

There are also several supplementary threads that serve additional capabilities:

Transaction mode: default

By default, Tarantool does not allow "yielding" inside a memtx transaction and the transaction manager is disabled. This allows fast atomic transactions without conflicts, but brings some limitations:

To learn how to enable yielding inside a memtx transaction, see Transaction mode: MVCC.

To switch back to the default mode, disable the transaction manager:

box.cfg { memtx_use_mvcc_engine = false }

Transaction mode: MVCC

Since version 2.6.1, Tarantool has another transaction behavior mode that allows "yielding" inside a memtx transaction. This is controlled by the transaction manager.

This mode allows concurrent transactions but may cause conflicts. You can use this mode on the memtx storage engine. The vinyl storage engine also supports MVCC mode, but has a different implementation.

Note

Currently, you cannot use several different storage engines within one transaction.

The transaction manager is designed to isolate concurrent transactions and provides a serializable transaction isolation level. It consists of two parts:

The transaction manager also provides a non-classical snapshot isolation level. In classical snapshot isolation, a transaction gets a consistent snapshot of the database as of its start time; here, the snapshot is not necessarily tied to the start time of the transaction. The conflict manager decides if and when each transaction gets which snapshot. This avoids some conflicts compared to the classical snapshot isolation approach.

Warning

Currently, the isolation level of BITSET and RTREE indexes in MVCC transaction mode is read-committed (not serializable, as stated above). If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level). However, these indexes are subject to various anomalies that can make them unserializable.

By default, the transaction manager is disabled. Use the memtx_use_mvcc_engine option to enable it via box.cfg.

box.cfg{memtx_use_mvcc_engine = true}

The transaction manager has the following options for the transaction isolation level:

  • best-effort (default)
  • read-committed
  • read-confirmed
  • linearizable (only for a specific transaction)

Using best-effort as the default option allows MVCC to consider the actions of transactions independently and determine the best isolation level for them. It increases the probability of successful completion of the transaction and helps to avoid possible conflicts.

To set another default isolation level, for example, read-committed, use the following command:

box.cfg { txn_isolation = 'read-committed' }

Note that the linearizable isolation level can’t be set as default and can be used for a specific transaction only. You can set an isolation level for a specific transaction in its box.begin() call:

box.begin({ txn_isolation = 'best-effort' })

In box.begin(), you can also use the default option (txn_isolation = 'default'). It sets the transaction’s isolation level to the one set in box.cfg.

Note

For autocommit transactions (actions with a statement without explicit box.begin/box.commit calls) there is a rule:

  • Read-only transactions (for example, select) are performed with read-confirmed.
  • All other transactions (for example, replace) are performed with read-committed.

You can also set the isolation level in the net.box stream:begin() method and IPROTO_BEGIN binary protocol request.

Choosing the better option depends on whether you have conflicts or not. If you have many conflicts, you should set a different option or use the default transaction mode.

Create a file init.lua, containing the following:

fiber = require 'fiber'

box.cfg{ listen = '127.0.0.1:3301', memtx_use_mvcc_engine = false }
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists = true})

tickets = box.schema.create_space('tickets', { if_not_exists = true })
tickets:format({
    { name = "id", type = "number" },
    { name = "place", type = "number" },
})
tickets:create_index('primary', {
    parts = { 'id' },
    if_not_exists = true
})

Start the instance with tarantool init.lua, and then connect to it from another terminal using the tt connect command:

tt connect 127.0.0.1:3301

Then try to execute the transaction with yield inside:

box.atomic(function() tickets:replace{1, 429} fiber.yield() tickets:replace{2, 429} end)

You will receive an error message:

---
- error: Transaction has been aborted by a fiber yield
...

Also, if you leave a transaction open while returning from a request, you will get an error message:

127.0.0.1:3301> box.begin()
    ⨯ Failed to execute command: Transaction is active at return from function

Change memtx_use_mvcc_engine to true, restart Tarantool, and try again:

127.0.0.1:3301> box.atomic(function() tickets:replace{1, 429} fiber.yield() tickets:replace{2, 429} end)
---
...

Now check if this transaction was successful:

127.0.0.1:3301> box.space.tickets:select({}, {limit = 10})
---
- - [1, 429]
  - [2, 429]
...

Since version 2.10.0, IPROTO implements streams and interactive transactions that can be used when memtx_use_mvcc_engine is enabled on the server.

A stream supports multiplexing several transactions over one connection. Each stream has its own identifier, which is unique within the connection. All requests with the same non-zero stream ID belong to the same stream. All requests in a stream are executed strictly sequentially. This allows the implementation of interactive transactions. If the stream ID of a request is 0, it does not belong to any stream and is processed in the old way.

In net.box, a stream is an object above the connection that has the same methods but allows sequential execution of requests. The ID is automatically generated on the client side. If a user writes their own connector and wants to use streams, they must transmit the stream_id over the IPROTO protocol.
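The routing rule described above can be sketched as grouping requests by stream ID while preserving their order within each stream; requests with stream ID 0 are kept apart because they do not belong to any stream. The request records and the group_by_stream function are hypothetical and only model the rule, not the IPROTO wire format.

```lua
-- Hypothetical requests arriving over one connection, in arrival order.
local requests = {
    { stream_id = 1, op = "begin" },
    { stream_id = 2, op = "begin" },
    { stream_id = 0, op = "ping" },
    { stream_id = 1, op = "replace" },
    { stream_id = 2, op = "select" },
    { stream_id = 1, op = "commit" },
}

-- Group requests by non-zero stream ID, keeping arrival order inside each
-- stream (they are executed strictly sequentially). Stream ID 0 means
-- "no stream", so those requests are collected separately.
local function group_by_stream(reqs)
    local streams, no_stream = {}, {}
    for _, req in ipairs(reqs) do
        if req.stream_id == 0 then
            no_stream[#no_stream + 1] = req.op
        else
            local queue = streams[req.stream_id]
            if queue == nil then
                queue = {}
                streams[req.stream_id] = queue
            end
            queue[#queue + 1] = req.op
        end
    end
    return streams, no_stream
end

local streams, no_stream = group_by_stream(requests)
print(table.concat(streams[1], " ")) -- begin replace commit
print(table.concat(streams[2], " ")) -- begin select
print(table.concat(no_stream, " "))  -- ping
```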

Unlike a thread, which involves multitasking and execution within a program, a stream transfers data via the protocol between a client and a server.

An interactive transaction is one that does not need to be sent in a single request. There are multiple ways to begin, commit, and roll back a transaction, and they can be mixed. You can use stream:begin(), stream:commit(), stream:rollback() or the appropriate stream methods – call, eval, or execute – using the SQL transaction syntax.

Let’s create a Lua client (client.lua) and run it with Tarantool:

local net_box = require 'net.box'
local conn = net_box.connect('127.0.0.1:3301')
local conn_tickets = conn.space.tickets
local yaml = require 'yaml'

local stream = conn:new_stream()
local stream_tickets = stream.space.tickets

-- Begin transaction over an iproto stream:
stream:begin()
print("Replaced in a stream\n".. yaml.encode(  stream_tickets:replace({1, 768}) ))

-- Empty select, the transaction was not committed.
-- You can't see it from the requests that do not belong to the
-- transaction.
print("Selected from outside of transaction\n".. yaml.encode(conn_tickets:select({}, {limit = 10}) ))

-- Select returns the previously inserted tuple
-- because this select belongs to the transaction:
print("Selected from within transaction\n".. yaml.encode(stream_tickets:select({}, {limit = 10}) ))

-- Commit transaction:
stream:commit()

-- Now this select also returns the tuple because the transaction has been committed:
print("Selected again from outside of transaction\n".. yaml.encode(conn_tickets:select({}, {limit = 10}) ))

os.exit()

Then run it with tarantool client.lua and see the following output:

Replaced in a stream
--- [1, 768]
...

Selected from outside of transaction
---
- [1, 429]
- [2, 429]
...

Selected from within transaction
---
- [1, 768]
- [2, 429]
...

Selected again from outside of transaction
---
- [1, 768]
- [2, 429]
...

Replication

Replication allows multiple Tarantool instances to work with copies of the same databases. The databases are kept in sync because each instance can communicate its changes to all the other instances.

This section includes the following topics:

For practical guides to replication, see Replication tutorials. You can learn about bootstrapping a replica set, adding instances to the replica set, or removing them.

Replication architecture

A group of instances that operate on copies of the same databases makes up a replica set. Each instance in a replica set has a role: master or replica.

A replica gets all updates from the master by continuously fetching and applying its write-ahead log (WAL). Each record in the WAL represents a single Tarantool data-change request such as INSERT, UPDATE, or DELETE, and is assigned a monotonically growing log sequence number (LSN). In essence, Tarantool replication is row-based: each data-change request is fully deterministic and operates on a single tuple. However, unlike a classical row-based log, which contains entire copies of the changed rows, Tarantool’s WAL contains copies of the requests. For example, for UPDATE requests, Tarantool only stores the primary key of the row and the update operations to save space.
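The idea can be modeled in plain Lua: each WAL record carries an LSN and a request, an UPDATE record stores only the primary key and the update operations, and a replica applies records in LSN order while skipping those it has already applied. The record layout and the apply_wal function are assumptions made for this illustration, not Tarantool's actual WAL format; the operation format only mimics space:update() operations.

```lua
-- Copy a tuple so that applying later records never modifies the log itself.
local function copy_tuple(tuple)
    local copy = {}
    for i, v in ipairs(tuple) do copy[i] = v end
    return copy
end

-- Apply one update operation {op, field_no, value}, in the style of
-- space:update() operations such as {'=', 2, 'x'} or {'+', 3, 1}.
local function apply_op(tuple, op)
    local kind, field, value = op[1], op[2], op[3]
    if kind == "=" then
        tuple[field] = value
    elseif kind == "+" then
        tuple[field] = tuple[field] + value
    else
        error("operation not supported by this sketch: " .. tostring(kind))
    end
end

-- Apply WAL records in LSN order, skipping the ones already applied.
local function apply_wal(replica, records)
    for _, rec in ipairs(records) do
        if rec.lsn > replica.lsn then
            if rec.type == "INSERT" then
                local tuple = copy_tuple(rec.tuple)
                replica.data[tuple[1]] = tuple
            elseif rec.type == "UPDATE" then
                -- the record holds only the primary key and the operations
                local tuple = replica.data[rec.key]
                for _, op in ipairs(rec.ops) do apply_op(tuple, op) end
            elseif rec.type == "DELETE" then
                replica.data[rec.key] = nil
            end
            replica.lsn = rec.lsn
        end
    end
end

local wal = {
    { lsn = 1, type = "INSERT", tuple = { 1, "Fluffy", 0 } },
    { lsn = 2, type = "UPDATE", key = 1, ops = { { "+", 3, 5 } } },
    { lsn = 3, type = "INSERT", tuple = { 2, "Claws", 0 } },
    { lsn = 4, type = "DELETE", key = 2 },
}

local replica = { data = {}, lsn = 0 }
apply_wal(replica, wal)
apply_wal(replica, wal) -- records with LSN <= 4 are skipped, nothing changes
print(replica.lsn, replica.data[1][3], replica.data[2]) -- 4  5  nil
```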

Note

WAL extensions available in Tarantool Enterprise Edition enable you to add auxiliary information to each write-ahead log record. This information might be helpful for implementing a CDC (Change Data Capture) utility that transforms a data replication stream.

The following are specifics of adding different types of information to the WAL:

  • Invocations of stored programs are not written to the WAL. Instead, records of the actual data-change requests, performed by the Lua code, are written to the WAL. This ensures that the possible non-determinism of Lua does not cause replication to go out of sync.
  • Data definition operations on temporary spaces (created with temporary = true), such as creating/dropping, adding indexes, and truncating, are written to the WAL, since information about temporary spaces is stored in non-temporary system spaces, such as box.space._space.
  • Data change operations on temporary spaces are not written to the WAL and are not replicated.
  • Data change operations on replication-local spaces (created with is_local = true) are written to the WAL but are not replicated.

To learn how to enable replication, check the Bootstrapping a replica set guide.

To create a valid initial state, to which WAL changes can be applied, every instance of a replica set requires a start set of checkpoint files, such as .snap files for memtx and .run files for vinyl. A replica goes through the following stages:

  1. Bootstrap (optional)

    When an entire replica set is bootstrapped for the first time, there is no master that could provide the initial checkpoint. In such a case, replicas connect to each other and elect a master. The master creates the starting set of checkpoint files and distributes them to all the other replicas. This is called an automatic bootstrap of a replica set.

  2. Join

    At this stage, a replica downloads the initial state from the master. The master registers this replica in the box.space._cluster space. If join fails with a non-critical error, for example, ER_READONLY, ER_ACCESS_DENIED, or a network-related issue, the instance tries to find a new master to join.

    Note

    On subsequent connections, a replica downloads all changes that happened after its latest local LSN (there can be many LSNs, since each master has its own LSN).

  3. Follow

    At this stage, a replica fetches and applies updates from the master’s write-ahead log.

You can use the box.info.replication[n].upstream.status property to monitor the status of a replica.

Each replica set is identified by a globally unique identifier, called the replica set UUID. The identifier is created by the master that creates the very first checkpoint, and it is part of the checkpoint file. It is stored in the box.space._schema system space, for example:

tarantool> box.space._schema:select{'cluster'}
---
- - ['cluster', '6308acb9-9788-42fa-8101-2e0cb9d3c9a0']
...

Additionally, each instance in a replica set is assigned its own UUID, when it joins the replica set. It is called an instance UUID and is a globally unique identifier. The instance UUID is checked to ensure that instances do not join a different replica set, e.g. because of a configuration error. A unique instance identifier is also necessary to apply rows originating from different masters only once, that is, to implement multi-master replication. This is why each row in the write-ahead log, in addition to its log sequence number, stores the instance identifier of the instance on which it was created. But using a UUID as such an identifier would take too much space in the write-ahead log, thus a shorter integer number is assigned to the instance when it joins a replica set. This number is then used to refer to the instance in the write-ahead log. It is called instance ID. All identifiers are stored in the system space box.space._cluster, for example:

tarantool> box.space._cluster:select{}
---
- - [1, '88580b5c-4474-43ab-bd2b-2409a9af80d2']
...

Here the instance ID is 1 (unique within the replica set), and the instance UUID is 88580b5c-4474-43ab-bd2b-2409a9af80d2 (globally unique).

Using instance IDs is also handy for tracking the state of the entire replica set. For example, box.info.vclock describes the state of replication in regard to each connected peer.

tarantool> box.info.vclock
---
- {1: 827, 2: 584}
...

Here vclock contains log sequence numbers (827 and 584) for instances with instance IDs 1 and 2.
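Because a vclock maps instance IDs to LSNs, two vclocks can be compared component by component: one is behind another if none of its components is larger. The sketch below shows that comparison and computes how far one peer lags behind another, in LSN units, for each instance ID. It works on plain Lua tables shaped like the box.info.vclock output above; vclock_le and vclock_lag are hypothetical names.

```lua
-- Return true if every component of vclock `a` is <= the one in `b`
-- (a missing component counts as 0).
local function vclock_le(a, b)
    for id, lsn in pairs(a) do
        if lsn > (b[id] or 0) then
            return false
        end
    end
    return true
end

-- For each instance ID, how far `behind` lags `ahead`, in LSN units
-- (only positive differences are reported).
local function vclock_lag(behind, ahead)
    local lag = {}
    for id, lsn in pairs(ahead) do
        local diff = lsn - (behind[id] or 0)
        if diff > 0 then
            lag[id] = diff
        end
    end
    return lag
end

local master = { [1] = 827, [2] = 584 }
local replica = { [1] = 820, [2] = 584 }

print(vclock_le(replica, master))      -- true
print(vclock_le(master, replica))      -- false
print(vclock_lag(replica, master)[1])  -- 7
```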

If required, you can explicitly specify the instance and the replica set UUID values rather than letting Tarantool generate them. To learn more, see the replicaset_uuid configuration parameter description.

The read_only configuration parameter defines the replication role (master or replica). The recommended role for all instances in a replica set except one is read-only (replica).

In a master-replica configuration, every change that happens on the master is visible on the replicas, but not vice versa.

A simple two-instance replica set, with the master on one machine and the replica on a different machine, provides two benefits:

In a master-master (also called multi-master) configuration, every change that happens on any instance is also visible on the other instances.

In this case, failover also benefits, and load balancing improves because any instance can handle both read and write requests. At the same time, for multi-master configurations, it is necessary to understand the replication guarantees provided by the asynchronous protocol that Tarantool implements.

Tarantool multi-master replication guarantees that each change on each master is propagated to all instances and is applied only once. Changes from the same instance are applied in the same order as on the originating instance. However, changes from different instances may be mixed and applied in a different order on different instances. In some cases, this may cause replicas to go out of sync.

For example, assuming the database is only appended to (that is, it contains only insertions), a multi-master configuration works fine. If data is also deleted, but the order of deletions on different replicas does not matter (for example, DELETE is used to prune obsolete data), a master-master configuration is also safe.

UPDATE operations, however, can easily go out of sync. For example, assignment and increment are not commutative and may yield different results if applied in a different order on different instances.
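A tiny simulation makes this concrete: two instances receive the same pair of changes to one field, an assignment and an increment, but in a different order. The operation format only mimics space:update() operations; the apply function and the data are hypothetical stand-ins for real replication.

```lua
-- Apply update operations {op, field_no, value} to a copy of a tuple,
-- in the style of space:update() operations.
local function apply(tuple, ops)
    local result = {}
    for i, v in ipairs(tuple) do result[i] = v end
    for _, op in ipairs(ops) do
        if op[1] == "=" then
            result[op[2]] = op[3]
        elseif op[1] == "+" then
            result[op[2]] = result[op[2]] + op[3]
        else
            error("operation not supported by this sketch: " .. tostring(op[1]))
        end
    end
    return result
end

local start = { 1, 0 }
local assign = { "=", 2, 10 }
local increment = { "+", 2, 1 }

-- Instance A applies the assignment first, instance B the increment first.
local on_a = apply(start, { assign, increment })
local on_b = apply(start, { increment, assign })
print(on_a[2], on_b[2]) -- 11  10

-- Two increments commute, so their order does not matter.
local add_five = { "+", 2, 5 }
local x = apply(start, { increment, add_five })
local y = apply(start, { add_five, increment })
print(x[2], y[2]) -- 6  6
```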

In general, master-master replication in Tarantool is safe to use if all changes to the database are commutative: the end result does not depend on the order in which the changes are applied. You can learn more about conflict-free replicated data types here.
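A grow-only counter (G-Counter) is a classic conflict-free replicated data type. The plain-Lua sketch below is an illustration, not a Tarantool module: each instance increments only its own slot, and merging takes the per-slot maximum, so replicas converge regardless of the order in which they exchange state. The GCounter type and its methods are hypothetical.

```lua
-- Hypothetical grow-only counter: each instance increments only its own slot.
local GCounter = {}
GCounter.__index = GCounter

function GCounter.new()
    return setmetatable({ slots = {} }, GCounter)
end

-- Add delta (default 1) to the slot owned by the given instance.
function GCounter:increment(instance_id, delta)
    self.slots[instance_id] = (self.slots[instance_id] or 0) + (delta or 1)
end

-- Merge another replica's state by taking the per-slot maximum.
-- max is commutative, associative, and idempotent, so merge order does not matter.
function GCounter:merge(other)
    for id, n in pairs(other.slots) do
        if n > (self.slots[id] or 0) then
            self.slots[id] = n
        end
    end
end

-- The counter value is the sum of all slots.
function GCounter:value()
    local total = 0
    for _, n in pairs(self.slots) do
        total = total + n
    end
    return total
end

local a, b = GCounter.new(), GCounter.new()
a:increment(1, 3)   -- instance 1 adds 3
b:increment(2, 2)   -- instance 2 adds 2

-- Exchange states in opposite orders; both copies reach the same value.
local a_copy, b_copy = GCounter.new(), GCounter.new()
a_copy:merge(a); a_copy:merge(b)
b_copy:merge(b); b_copy:merge(a)
print(a_copy:value(), b_copy:value())  -- 5  5
```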

Replication topology is set by the replication configuration parameter. The recommended topology is a full mesh because it makes potential failover easy.

Some database products offer cascading replication topologies: creating a replica on a replica. Tarantool does not recommend such a setup.

../../../_images/no-cascade.svg

The downside of a cascading replica set is that some instances are not connected to other instances and therefore cannot receive changes from them. One essential change that must be propagated to all instances in a replica set is the entry in the box.space._cluster system space containing the replica set UUID. Without knowing the replica set UUID, a master refuses connections from such instances when the replication topology changes. Here is how this can happen:

../../../_images/cascade-problem-1.svg

We have a chain of three instances. Instance #1 contains entries for instances #1 and #2 in its _cluster space. Instances #2 and #3 contain entries for instances #1, #2, and #3 in their _cluster spaces.

../../../_images/cascade-problem-2.svg

Now instance #2 fails. Instance #3 tries to connect to instance #1 as its new master, but the master refuses the connection because it has no entry for instance #3.

However, a ring topology is supported:

../../../_images/cascade-to-ring.svg

So, if you need a cascading topology, you can first create a ring so that all instances know each other's UUIDs, and then break the chain where needed.
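The _cluster problem can be modeled in a few lines of plain Lua. In this hypothetical sketch, a table of numeric IDs stands in for the _cluster space (the real space stores instance UUIDs), and a master accepts a replica only if the replica is registered in it. The chain matches the scenario described above; the ring shows why creating a ring first avoids the problem.

```lua
-- Hypothetical model: `cluster` holds registered instance IDs and stands in
-- for the _cluster system space (which actually stores instance UUIDs).
local function new_instance(id, registered)
    local cluster = {}
    for _, r in ipairs(registered) do
        cluster[r] = true
    end
    return { id = id, cluster = cluster }
end

-- A master accepts a replica only if the replica is registered in its _cluster.
local function accepts(master, replica)
    return master.cluster[replica.id] == true
end

-- Chain 1 -> 2 -> 3: #3 joined through #2, so #1 never registered it.
local i1 = new_instance(1, { 1, 2 })
local i2 = new_instance(2, { 1, 2, 3 })
local i3 = new_instance(3, { 1, 2, 3 })
print(accepts(i2, i3))  -- true: #2 knows #3
print(accepts(i1, i3))  -- false: after #2 fails, #1 refuses #3

-- Ring 1 -> 2 -> 3 -> 1 first: every instance registers all three,
-- so breaking the ring later leaves every _cluster complete.
local r1 = new_instance(1, { 1, 2, 3 })
print(accepts(r1, i3))  -- true
```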

In any case, a full mesh topology is recommended for master-master replication:

../../../_images/mm-3m-mesh.svg

In this case, you can decide where to place the mesh instances: in the same data center or across several data centers. Tarantool automatically ensures that each row is applied only once on each instance. To remove an instance from the mesh after a failure, simply change the replication configuration parameter.

This way, you can keep the whole cluster available both in case of a local failure, such as the failure of one instance in one data center, and in case of the failure of an entire data center.

The maximum number of replicas in a mesh is 32.

During box.cfg(), an instance tries to join all nodes listed in box.cfg.replication. If the instance does not succeed in connecting to the required number of nodes (see bootstrap_strategy), it switches to the orphan status.

Synchronous replication

By default, replication in Tarantool is asynchronous: a local transaction commit on the master does not mean that the transaction has already been applied on the replicas. If the master reports success to the client, then fails, and a replica takes over after failover, the transaction is lost from the client's point of view.

Synchronous replication solves this problem. Each synchronous transaction is committed only after it has been replicated to a certain number of instances, and only then does the client receive a response that the transaction is complete.

To enable synchronous replication, use the space_opts.is_sync option when creating or altering a space.

In Tarantool, synchronous replication can be configured for individual spaces. This lets you avoid a performance penalty if you need synchronous replication only occasionally, to change critical data.

If an asynchronous transaction is made while several synchronous transactions are waiting for replication, the asynchronous transaction is blocked by the synchronous ones. Commits are performed in the order in which box.commit() is called for each transaction. The regular queue of asynchronous transactions works the same way. This gives the commit rule: the commit order matches the order in which box.commit() is called for each transaction, regardless of whether the transactions are synchronous or asynchronous.

If one of the synchronous transactions times out, it is rolled back together with all subsequent transactions in the replication queue. Asynchronous transactions are rolled back in a similar way on a WAL write error. This gives the rollback rule: transactions are rolled back in the reverse order of their box.commit() calls, regardless of whether they are synchronous or asynchronous.

An asynchronous transaction blocked by a synchronous one does not become synchronous itself; it simply waits for the synchronous transaction to be committed. Once that happens, the asynchronous transaction can be committed immediately, without waiting for replication.
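The commit and rollback rules can be sketched as a single ordered queue. This plain-Lua model is hypothetical and greatly simplified: it assumes that the leader's own WAL write counts as the first acknowledgment toward the quorum, and it models a timeout only as an explicit timeout() call.

```lua
-- Hypothetical plain-Lua model of the commit queue (not Tarantool internals).
-- Transactions enter the queue in box.commit() call order.
local Queue = {}
Queue.__index = Queue

function Queue.new(quorum)
    return setmetatable({ quorum = quorum, items = {}, committed = {}, rolled_back = {} }, Queue)
end

-- Commit from the head while the head is ready: an async transaction is ready
-- at once, a sync one only after collecting `quorum` acknowledgments.
function Queue:advance()
    while #self.items > 0 do
        local head = self.items[1]
        if head.is_sync and head.acks < self.quorum then
            return  -- everything behind a pending sync transaction waits
        end
        table.insert(self.committed, head.name)
        table.remove(self.items, 1)
    end
end

-- Assumption: the local WAL write counts as the first acknowledgment.
function Queue:push(name, is_sync)
    table.insert(self.items, { name = name, is_sync = is_sync, acks = 1 })
    self:advance()
end

-- A replica acknowledges the transaction with the given name.
function Queue:ack(name)
    for _, txn in ipairs(self.items) do
        if txn.name == name then
            txn.acks = txn.acks + 1
        end
    end
    self:advance()
end

-- A sync transaction timed out: roll it back together with everything queued
-- after it, newest first (reverse box.commit() order).
function Queue:timeout(name)
    local pos
    for i, txn in ipairs(self.items) do
        if txn.name == name then
            pos = i
            break
        end
    end
    if pos == nil then
        return
    end
    for i = #self.items, pos, -1 do
        table.insert(self.rolled_back, self.items[i].name)
        table.remove(self.items, i)
    end
end

-- Commit rule: a1 waits for s1, then both commit in call order.
local q = Queue.new(2)
q:push('s1', true)
q:push('a1', false)
local committed_before_ack = #q.committed   -- 0: a1 is blocked by s1
q:ack('s1')                                 -- committed: s1, a1

-- Rollback rule: s1 times out, so s2, a1, s1 are rolled back in that order.
local q2 = Queue.new(2)
q2:push('s1', true)
q2:push('a1', false)
q2:push('s2', true)
q2:timeout('s1')
print(table.concat(q.committed, ','), table.concat(q2.rolled_back, ','))
```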

Warning

Be careful when using synchronous and asynchronous transactions together. An asynchronous transaction is considered successfully committed even if there is no connection to the other nodes. Therefore, the old leader node (the owner of the synchronous transaction queue) may have committed asynchronous transactions that are missing on the other nodes of the replica set.

When the former leader becomes available in the replica set again, it starts receiving data from the new leader. At the same time, the other nodes of the replica set start receiving data from the old leader that they do not have yet, including those committed asynchronous transactions. At this point, the system raises the ER_SPLIT_BRAIN error, forcing the user to rebootstrap the former leader.

Before version 2.5.2, there was no way to enable synchronous replication for existing spaces. Starting from version 2.5.2, you can enable it by calling space_object:alter({is_sync = true}).

Synchronous transactions work only in the master-replica topology. A cluster can have multiple replicas, including anonymous ones, but only one node should make synchronous transactions.

Since Tarantool 2.10.0, anonymous replicas do not participate in the quorum.

Since version 2.6.1, Tarantool has built-in functionality for managing automated leader election in a replica set. Learn more in the corresponding chapter.

Automated leader election

Since version 2.6.1, Tarantool has built-in functionality for managing automated leader election in a replica set. This functionality increases the fault tolerance of Tarantool-based systems and reduces dependence on external tools for replica set management.

To learn how to configure and monitor automated leader elections, check Managing leader elections.

The following topics are covered below:

Tarantool uses a modification of Raft, an algorithm for synchronous replication and automated leader election. The full description of the Raft algorithm is available in the corresponding document.

Synchronous replication and leader election are implemented in Tarantool as two independent subsystems. This means you can set up synchronous replication and use an alternative algorithm for leader election. In turn, the built-in leader election mechanism does not require synchronous spaces. Synchronous replication is covered in this section of the documentation. The leader election process is described below.

Note

The system behavior can be specified exactly according to the Raft algorithm. To do this:

Automated leader election in Tarantool guarantees that at any moment a replica set has at most one leader, that is, a node available for writing. All other nodes accept only read requests.

When election is enabled, the life cycle of a replica set is divided into so-called terms. Each term is identified by a monotonically increasing number. After a node is bootstrapped for the first time, its term is 1. When a node sees that it is not a leader and that the replica set has had no leader for some time, it increments its term and starts a new election round.

Leader election is based on voting. The node that starts the election votes for itself and sends vote requests to the other nodes. Each instance votes for the first node it receives such a request from and then, for the rest of the term, waits for a leader to be elected without taking any other action.

The node that collects a quorum of votes defined by the replication.synchro_quorum parameter becomes the leader and notifies the other nodes. A split vote can also happen, when no node receives a quorum of votes. In this case, after a random timeout, each node increases its term and starts a new election round if no vote request with a greater term arrives during this time. Eventually, a leader is elected.
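One election term can be modeled as follows. This plain-Lua sketch is hypothetical and ignores terms, timeouts, and data freshness; it only shows the rule that each node votes for the first request it receives and that a candidate needs replication.synchro_quorum votes to win.

```lua
-- Hypothetical plain-Lua model of one election term (not Tarantool's code).
-- requests[voter] lists candidates in the order their vote requests reached
-- that voter; a candidate's own list starts with itself.
local function run_round(voters, requests, quorum)
    local votes = {}
    for _, voter in ipairs(voters) do
        local choice = requests[voter][1]   -- one vote per term: the first request wins
        if choice ~= nil then
            votes[choice] = (votes[choice] or 0) + 1
        end
    end
    local winners = {}
    for candidate, n in pairs(votes) do
        if n >= quorum then
            table.insert(winners, candidate)
        end
    end
    table.sort(winners)
    return winners
end

local nodes = { 'node1', 'node2', 'node3' }

-- node1 and node3 both start an election; node1's request reaches node2 first.
local elected = run_round(nodes, {
    node1 = { 'node1' },
    node2 = { 'node1', 'node3' },
    node3 = { 'node3' },
}, 2)

-- Split vote: all three start at once and each votes for itself.
local split = run_round(nodes, {
    node1 = { 'node1' },
    node2 = { 'node2' },
    node3 = { 'node3' },
}, 2)

print(elected[1], #split)  -- node1  0
```

In the split case, no node reaches the quorum, so the nodes start a new round after a random timeout.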

If any unfinalized synchronous transactions are left from the previous leader, the new leader finalizes them automatically.

All the non-leader nodes are called followers. The nodes that start a new election round are called candidates. The elected leader sends heartbeats to the non-leader nodes to let them know it is alive.

In case there are no heartbeats for the period of replication.timeout * 4, a non-leader node starts a new election if the following conditions are met:

Note

A cluster member considers the leader node to be alive if it has received heartbeats from the leader at least once during the last replication.timeout * 4 seconds and there are no replication errors (the connection is not broken due to a timeout or an error).

Terms and votes are persisted by each instance to preserve certain Raft guarantees.

When voting, nodes give preference to instances that have the newest data. So, if the former leader sends some data to a quorum of replicas before becoming unavailable, that data is not lost.

When election is enabled, every pair of nodes must be connected directly, that is, the topology must be a full mesh. This is required because election messages for voting and other internal purposes need a direct connection between the nodes.

In the classic Raft algorithm, a leader doesn’t track its connectivity to the rest of the cluster. Once the leader is elected, it considers itself in the leader position until receiving a new term from another cluster node. This can lead to a split situation if the other nodes elect a new leader upon losing the connectivity to the previous one.

The issue is resolved in Tarantool version 2.10.0 by introducing the leader fencing mode. The mode is set by the replication.election_fencing_mode configuration parameter. When fencing is set to soft or strict, the leader resigns its leadership if it has fewer than replication.synchro_quorum alive connections to the cluster nodes. The resigning leader receives the status of a follower in the current election term and becomes read-only. Leader fencing can be turned off by setting the replication.election_fencing_mode configuration parameter to off.

In soft mode, a connection is considered dead if there are no responses for 4 * replication.timeout seconds both on the current leader and the followers.

In strict mode, a connection is considered dead if there are no responses for 2 * replication.timeout seconds on the current leader and for 4 * replication.timeout seconds on the followers. This improves chances that there is only one leader at any time.

Fencing applies to the instances that have replication.election_mode set to "candidate" or "manual".
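The fencing thresholds can be summarized in a small plain-Lua sketch. The helper names are hypothetical, and the quorum check counts only alive connections, following the wording above literally; whether the leader also counts itself toward the quorum is not stated here, so treat that detail as an assumption.

```lua
-- Hypothetical plain-Lua sketch of the fencing rules above.
-- Seconds of silence after which a connection is considered dead.
local function dead_after(mode, is_leader, timeout)
    if mode == 'strict' and is_leader then
        return 2 * timeout
    end
    return 4 * timeout   -- 'soft' mode, or a follower in either mode
end

-- silences: seconds since the last response on each of the leader's connections.
-- Assumption: only connections are counted against the quorum (literal reading).
local function leader_resigns(mode, timeout, quorum, silences)
    if mode == 'off' then
        return false   -- fencing disabled
    end
    local limit = dead_after(mode, true, timeout)
    local alive = 0
    for _, s in ipairs(silences) do
        if s < limit then
            alive = alive + 1
        end
    end
    return alive < quorum
end

-- replication.timeout = 1, synchro_quorum = 2, followers silent for 1.5 s and 3 s.
local silences = { 1.5, 3 }
print(leader_resigns('soft', 1, 2, silences))    -- false: both alive (limit 4 s)
print(leader_resigns('strict', 1, 2, silences))  -- true: one alive (limit 2 s)
```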

There can still be a situation when a replica set has two leaders working independently (so-called split-brain). It can happen, for example, if a user mistakenly lowered the replication.synchro_quorum below N / 2 + 1. In this situation, to preserve the data integrity, if an instance detects the split-brain anomaly in the incoming replication data, it breaks the connection with the instance sending the data and writes the ER_SPLIT_BRAIN error in the log.

Eventually, there will be two sets of nodes with the diverged data, and any node from one set is disconnected from any node from the other set with the ER_SPLIT_BRAIN error.

After noticing the error, a user can choose any representative from each of the sets and inspect the data on them. To reconcile the data, the user should remove it from the nodes of one set and reconnect them to the nodes of the other set that have the correct data.

Any node participating in the election replicates data only from the last elected leader. This prevents a situation where the former leader keeps trying to send changes to the replicas after a new leader has been elected.

Term numbers also act as a kind of filter. For example, if election is enabled on two nodes and node1 has a lower term than node2, node2 does not accept transactions from node1.
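The two filtering rules above can be sketched in plain Lua. This is a hypothetical simplification: real nodes also update their own term when they see a newer one, which the sketch does not model, and the field names are illustrative only.

```lua
-- Hypothetical plain-Lua sketch: a node applies incoming transactions only
-- if the sender's term is not older than its own and the sender is the
-- leader it knows about for that term.
local function accepts_from(receiver, sender)
    if sender.term < receiver.term then
        return false   -- stale term: filtered out
    end
    return sender.id == receiver.leader
end

-- node1 led in term 2; node3 was elected in term 3, and node2 knows it.
local node1 = { id = 1, term = 2 }
local node2 = { id = 2, term = 3, leader = 3 }
local node3 = { id = 3, term = 3, leader = 3 }

print(accepts_from(node2, node1))  -- false: node1's term is lower
print(accepts_from(node2, node3))  -- true: node3 is the current leader
```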

replication:
  election_mode: <string>
  election_fencing_mode: <string>
  election_timeout: <seconds>
  timeout: <seconds>
  synchro_quorum: <count>

It is important to know that being a leader is not the only requirement for a node to be writable. The leader should also satisfy the following requirements:

  • The database.mode option is set to rw.
  • The leader shouldn’t be in the orphan state.

Nothing prevents you from setting the database.mode option to ro, but the leader won’t be writable then. The option doesn’t affect the election process itself, so a read-only instance can still vote and become a leader.
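Taken together, the writability conditions above amount to a simple predicate. In this plain-Lua sketch, the field names are hypothetical stand-ins for box.info.election.state, box.info.status, and the database.mode option; it checks only the conditions listed above.

```lua
-- Hypothetical plain-Lua sketch: `state` mirrors box.info.election.state,
-- `status` mirrors box.info.status, and `mode` mirrors database.mode.
local function is_writable(node)
    return node.state == 'leader'
        and node.mode == 'rw'
        and node.status ~= 'orphan'
end

print(is_writable({ state = 'leader', mode = 'rw', status = 'running' }))    -- true
print(is_writable({ state = 'leader', mode = 'ro', status = 'running' }))    -- false
print(is_writable({ state = 'follower', mode = 'rw', status = 'running' }))  -- false
```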

To monitor the current state of a node regarding the leader election, check box.info.election.

Example:

tarantool> box.info.election
---
- state: follower
  vote: 0
  leader: 0
  term: 1
...

The Raft-based election implementation logs all its actions with the RAFT: prefix. The actions are new Raft message handling, node state changing, voting, and term bumping.

Leader election doesn’t work correctly if the election quorum is set to less than or equal to <cluster size> / 2. In that case, a split vote can lead to a state when two leaders are elected at once.

For example, suppose there are five nodes. When the quorum is set to 2, node1 and node2 can both vote for node1. node3 and node4 can both vote for node5. In this case, node1 and node5 both win the election. When the quorum is set to the cluster majority, that is (<cluster size> / 2) + 1 or greater, the split vote is impossible.

That should be considered when adding new nodes. If the majority value is changing, it’s better to update the quorum on all the existing nodes before adding a new one.
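The quorum arithmetic above can be checked with a short plain-Lua sketch. The helper names are hypothetical; the reasoning relies on the rule described earlier that every node votes at most once per term.

```lua
-- Hypothetical plain-Lua helpers.
-- A strict majority: the smallest quorum that rules out a split vote.
local function majority(cluster_size)
    return math.floor(cluster_size / 2) + 1
end

-- Each node votes at most once per term, so two leaders in one term need
-- two disjoint groups of voters, each of at least `quorum` nodes.
local function two_leaders_possible(cluster_size, quorum)
    return 2 * quorum <= cluster_size
end

print(majority(5), two_leaders_possible(5, 2), two_leaders_possible(5, 3))
-- 3  true  false

-- Growing from 5 to 6 nodes raises the majority to 4; a quorum of 3 left
-- unchanged would allow two leaders in the 6-node cluster.
print(majority(6), two_leaders_possible(6, 3))
-- 4  true
```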

Also, automated leader election doesn’t bring many benefits in terms of data safety when used without synchronous replication. If replication is asynchronous and a new leader gets elected, the old leader is still active and considers itself the leader. In such a case, nothing stops it from accepting requests from clients and making transactions. Asynchronous transactions are committed successfully because they are not checked against the quorum of replicas. Synchronous transactions fail because they cannot collect the quorum: most of the replicas reject the old leader’s transactions since it is no longer the leader.

Supervised failover

Enterprise Edition

Supervised failover is supported by the Enterprise Edition only.

Example on GitHub: supervised_failover

Tarantool provides the ability to control leadership in a replica set using an external failover coordinator. A failover coordinator reads a cluster configuration from a file or an etcd-based configuration storage, polls instances for their statuses, and appoints a leader for each replica set depending on the availability and health of instances.

To increase fault tolerance, you can run two or more failover coordinators. In this case, an etcd cluster provides synchronization between coordinators.


The main steps of using an external failover coordinator for a newly configured cluster might look as follows:

  1. Configure a cluster to work with an external coordinator. The main step is setting the replication.failover option to supervised for all replica sets that should be managed by the external coordinator.
  2. Start a configured cluster. When an external coordinator is still not running, instances in a replica set start in the following modes:
    • If a replica set is already bootstrapped, all instances are started in read-only mode.
    • If a replica set is not bootstrapped, one instance is started in read-write mode.
  3. Start a failover coordinator. You can start two or more failover coordinators to increase fault tolerance. In this case, one coordinator is active and others are passive.

Once a cluster and failover coordinators are up and running, a failover coordinator appoints one instance to be a master if there is no master instance in a replica set. Then, the following events may occur:

Note

Note that a failover coordinator doesn’t work with replica sets with two or more read-write instances. In this case, a coordinator logs a warning to stdout and doesn’t perform any appointments.

After a master instance has been appointed, a failover coordinator monitors the statuses of all instances in a replica set by sending requests each probe_interval seconds. For the master instance, the coordinator maintains a read-write mode deadline, which is renewed periodically each renew_interval seconds. If all attempts to renew the deadline fail during the specified time interval (lease_interval), the master switches to read-only mode. Then, the coordinator appoints a new instance as the master.
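The lease logic can be modeled in plain Lua. This sketch is hypothetical: it assumes lease_interval = 15 and renew_interval = 5 seconds, and that each successful renewal sets the deadline to the renewal time plus lease_interval. The exact deadline arithmetic used by Tarantool is not described here.

```lua
-- Hypothetical plain-Lua model of the master's read-write lease (seconds).
local Lease = {}
Lease.__index = Lease

function Lease.new(lease_interval)
    return setmetatable({ lease_interval = lease_interval, deadline = nil }, Lease)
end

-- A successful renewal moves the deadline to now + lease_interval.
function Lease:renew(now)
    self.deadline = now + self.lease_interval
end

-- The master stays read-write only until the deadline.
function Lease:is_rw(now)
    return self.deadline ~= nil and now < self.deadline
end

local lease = Lease.new(15)     -- lease_interval = 15
lease:renew(0)                  -- renewals every renew_interval = 5 seconds
lease:renew(5)
-- The coordinator becomes unreachable: renewals at t = 10, 15, ... fail.
print(lease:is_rw(19))          -- true: the deadline is 5 + 15 = 20
print(lease:is_rw(20))          -- false: the master switches to read-only
```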

Note

Anonymous replicas are not considered candidates for the master role.

If a remote etcd-based storage is used to maintain the state of failover coordinators, you can also perform a manual failover.

To increase fault tolerance, you can run two or more failover coordinators. In this case, only one coordinator is active and used to control leadership in a replica set. Other coordinators are passive and don’t perform any read-write appointments.

To maintain the state of coordinators, Tarantool uses a stateboard – a remote etcd-based storage. This storage uses the same connection settings as a centralized etcd-based configuration storage. If a cluster configuration is stored in the <prefix>/config/* keys in etcd, the failover coordinator looks into <prefix>/failover/* for its state. Here are a few examples of keys used for different purposes:

  • <prefix>/failover/info/by-uuid/<uuid>: contains a state of a failover coordinator identified by the specified uuid.
  • <prefix>/failover/active/lock: a unique identifier (UUID) of an active failover coordinator.
  • <prefix>/failover/active/term: a kind of fencing token that orders coordinators by the time they became active (took the lock).
  • <prefix>/failover/command/<id>: a key used to perform a manual failover.
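The term key is described only as a kind of fencing token. The plain-Lua sketch below illustrates the general fencing-token idea and is not how Tarantool or the failover coordinator actually checks terms: a hypothetical receiver remembers the highest term it has seen and rejects commands that carry an older one.

```lua
-- Hypothetical plain-Lua illustration of a fencing token: each coordinator
-- that takes the active lock gets a larger term, and a receiver rejects
-- commands carrying a term older than the newest one it has seen.
local Receiver = {}
Receiver.__index = Receiver

function Receiver.new()
    return setmetatable({ max_term = 0, applied = {} }, Receiver)
end

function Receiver:handle(term, command)
    if term < self.max_term then
        return false   -- stale coordinator: ignore the command
    end
    self.max_term = term
    table.insert(self.applied, command)
    return true
end

local r = Receiver.new()
local ok1 = r:handle(1, 'appoint instance001')   -- coordinator A, term 1
local ok2 = r:handle(2, 'appoint instance002')   -- coordinator B took the lock, term 2
local ok3 = r:handle(1, 'appoint instance001')   -- delayed command from A
print(ok1, ok2, ok3)  -- true  true  false
```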

To configure a cluster to work with an external failover coordinator, follow the steps below:

  1. (Optional) If you need to run several failover coordinators to increase fault tolerance, set up an etcd-based configuration storage, as described in Centralized configuration storages.

  2. Set the replication.failover option to supervised:

    replication:
      failover: supervised
    
  3. Grant the user used for replication permission to execute the failover.execute function:

    credentials:
      users:
        replicator:
          password: 'topsecret'
          roles: [ replication ]
          privileges:
          - permissions: [ execute ]
            lua_call: [ 'failover.execute' ]
    

    Note

    In Tarantool 3.0 and 3.1, the configuration is different and the function must be created in the application code. See Tarantool 3.0 and 3.1 configuration for details.

  4. (Optional) Configure options that control how a failover coordinator operates in the failover section:

    failover:
      probe_interval: 5
      lease_interval: 15
      renew_interval: 5
      stateboard:
        keepalive_interval: 5
        renew_interval: 1
    

You can find the full example on GitHub: supervised_failover.

Before version 3.2, Tarantool used another mechanism to grant execute access to Lua functions. In Tarantool 3.0 and 3.1, the credentials configuration section should look as follows:

# Tarantool 3.0 and 3.1
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
      privileges:
      - permissions: [ execute ]
        functions: [ 'failover.execute' ]

Additionally, you should create the failover.execute function in the application code. For example, you can create a custom role for this purpose:

-- Tarantool 3.0 and 3.1 --
-- supervised_instance.lua --
return {
    validate = function()
    end,
    apply = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        local opts = { if_not_exists = true }
        box.schema.func.create(func_name, opts)
    end,
    stop = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        if not box.schema.func.exists(func_name) then
            return
        end
        box.schema.func.drop(func_name)
    end,
}

Then, enable this role for all storage instances:

# Tarantool 3.0 and 3.1
roles: [ 'supervised_instance' ]

To start a failover coordinator, you need to execute the tarantool command with the failover option. This command accepts the path to a cluster configuration file:

tarantool --failover --config instances.enabled/supervised_failover/config.yaml

If a cluster’s configuration is stored in etcd, the config.yaml file contains connection options for the etcd storage.

You can run two or more failover coordinators to increase fault tolerance. In this case, only one coordinator is active and used to control leadership in a replica set. Learn more from Active and passive coordinators.

If an etcd-based storage is used to maintain the state of failover coordinators, you can perform a manual failover. External tools can use the <prefix>/failover/command/<id> key to choose a new master. For example, the tt utility provides the tt cluster failover command for managing a supervised failover.

Replication tutorials

Master-replica: manual failover

Example on GitHub: manual_leader

This tutorial shows how to configure and work with a replica set with manual failover.

Before starting this tutorial:

  1. Install the tt utility.

  2. Create a tt environment in the current directory by executing the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the manual_leader directory.

  4. Inside instances.enabled/manual_leader, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment and should look like this:

      instance001:
      instance002:
      
    • The config.yaml file is intended to store a replica set configuration.

This section describes how to configure a replica set in config.yaml.

First, set the replication.failover option to manual:

replication:
  failover: manual

Define a replica set topology inside the groups section:

  • The leader option sets instance001 as a replica set leader.
  • The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

In the credentials section, create the replicator user with the replication role:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

Set iproto.advertise.peer to advertise the current instance to other replica set members:

iproto:
  advertise:
    peer:
      login: replicator

The resulting replica set configuration should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: manual

groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

  1. After configuring a replica set, execute the tt start command from the tt environment directory:

    $ tt start manual_leader
       • Starting an instance [manual_leader:instance001]...
       • Starting an instance [manual_leader:instance002]...
    
  2. Check that instances are in the RUNNING status using the tt status command:

    $ tt status manual_leader
    INSTANCE                   STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    manual_leader:instance001  RUNNING  8841  RW    ready   running  --
    manual_leader:instance002  RUNNING  8842  RO    ready   running  --
    

  1. Connect to instance001 using tt connect:

    $ tt connect manual_leader:instance001
       • Connecting to the instance...
       • Connected to manual_leader:instance001
    
  2. Make sure that the instance is in the running state by executing box.info.status:

    manual_leader:instance001> box.info.status
    ---
    - running
    ...
    
  3. Check that the instance is writable using box.info.ro:

    manual_leader:instance001> box.info.ro
    ---
    - false
    ...
    
  4. Execute box.info.replication to check a replica set status. For instance002, upstream.status and downstream.status should be follow.

    manual_leader:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 7
        name: instance001
      2:
        id: 2
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 0
        upstream:
          status: follow
          idle: 0.3893879999996
          peer: replicator@127.0.0.1:3302
          lag: 0.00028800964355469
        name: instance002
        downstream:
          status: follow
          idle: 0.37777199999982
          vclock: {1: 7}
          lag: 0
    ...
    

    To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.

To check that a replica (instance002) gets all updates from the master, follow the steps below:

  1. On instance001, create a space and add data as described in CRUD operation examples.

  2. Open the second terminal, connect to instance002 using tt connect, and use the select operation to make sure data is replicated.

  3. Check that box.info.vclock values are the same on both instances:

    • instance001:

      manual_leader:instance001> box.info.vclock
      ---
      - {1: 21}
      ...
      
    • instance002:

      manual_leader:instance002> box.info.vclock
      ---
      - {1: 21}
      ...
      

    Note

    Note that a vclock value might include the 0 component that is related to local space operations and might differ for different instances in a replica set.
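If you compare vclock values programmatically, for example after collecting box.info.vclock from each instance, you can skip component 0. The helper below is a hypothetical plain-Lua sketch that treats a missing component as 0.

```lua
-- Hypothetical helper: compare two vclocks (replica id -> LSN) while ignoring
-- component 0, which tracks local-space operations and may differ between instances.
local function same_replicated_vclock(a, b)
    for id, lsn in pairs(a) do
        if id ~= 0 and (b[id] or 0) ~= lsn then
            return false
        end
    end
    for id, lsn in pairs(b) do
        if id ~= 0 and (a[id] or 0) ~= lsn then
            return false
        end
    end
    return true
end

print(same_replicated_vclock({ [1] = 21 }, { [1] = 21 }))           -- true
print(same_replicated_vclock({ [0] = 3, [1] = 21 }, { [1] = 21 }))  -- true
print(same_replicated_vclock({ [1] = 21 }, { [1] = 20 }))           -- false
```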

This section describes how to add a new replica to a replica set.

  1. Add instance003 to the instances.yml file:

    instance001:
    instance002:
    instance003:
    
  2. Add instance003 with the specified iproto.listen option to the config.yaml file:

    groups:
      group001:
        replicasets:
          replicaset001:
            leader: instance001
            instances:
              instance001:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3301'
              instance002:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3302'
              instance003:
                iproto:
                  listen:
                  - uri: '127.0.0.1:3303'
    

  1. Open the third terminal to work with a new instance. Start instance003 using tt start:

    $ tt start manual_leader:instance003
       • Starting an instance [manual_leader:instance003]...
    
  2. Check a replica set status using tt status:

    $ tt status manual_leader
    INSTANCE                   STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    manual_leader:instance001  RUNNING  8841  RW    ready   running  --
    manual_leader:instance002  RUNNING  8842  RO    ready   running  --
    manual_leader:instance003  RUNNING  8856  RO    ready   running  --
    

After you added instance003 to the configuration and started it, you need to reload configurations on all instances. This is required to allow instance001 and instance002 to get data from the new instance in case it becomes a master.

  1. Connect to instance003 using tt connect:

    $ tt connect manual_leader:instance003
       • Connecting to the instance...
       • Connected to manual_leader:instance003
    
  2. Reload configurations on all three instances using the reload() function provided by the config module:

    • instance001:

      manual_leader:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      manual_leader:instance002> require('config'):reload()
      ---
      ...
      
    • instance003:

      manual_leader:instance003> require('config'):reload()
      ---
      ...
      
  3. Execute box.info.replication to check a replica set status. Make sure that upstream.status and downstream.status are follow for instance003.

    manual_leader:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 21
        name: instance001
      2:
        id: 2
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 0
        upstream:
          status: follow
          idle: 0.052655000000414
          peer: replicator@127.0.0.1:3302
          lag: 0.00010204315185547
        name: instance002
        downstream:
          status: follow
          idle: 0.09503500000028
          vclock: {1: 21}
          lag: 0.00026917457580566
      3:
        id: 3
        uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
        lsn: 0
        upstream:
          status: follow
          idle: 0.77522099999987
          peer: replicator@127.0.0.1:3303
          lag: 0.0001838207244873
        name: instance003
        downstream:
          status: follow
          idle: 0.33186100000012
          vclock: {1: 21}
          lag: 0
    ...
    

This section shows how to perform manual failover and change a replica set leader.

  1. In the config.yaml file, change the replica set leader from instance001 to null:

    replicaset001:
      leader: null
    
  2. Reload configurations on all three instances using config:reload() and check that instances are in read-only mode. The example below shows how to do this for instance001:

    manual_leader:instance001> require('config'):reload()
    ---
    ...
    manual_leader:instance001> box.info.ro
    ---
    - true
    ...
    manual_leader:instance001> box.info.ro_reason
    ---
    - config
    ...
    
  3. Make sure that box.info.vclock values are the same on all instances:

    • instance001:

      manual_leader:instance001> box.info.vclock
      ---
      - {1: 21}
      ...
      
    • instance002:

      manual_leader:instance002> box.info.vclock
      ---
      - {1: 21}
      ...
      
    • instance003:

      manual_leader:instance003> box.info.vclock
      ---
      - {1: 21}
      ...
      

  1. Change a replica set leader in config.yaml to instance002:

    replicaset001:
      leader: instance002
    
  2. Reload configuration on all instances using config:reload().

  3. Make sure that instance002 is the new master:

    manual_leader:instance002> box.info.ro
    ---
    - false
    ...
    
  4. Check replication status using box.info.replication.

This section describes the process of removing an instance from a replica set.

Before removing an instance, make sure it is in read-only mode. If the instance is a master, perform manual failover.

  1. Clear the iproto option for instance003 by setting its value to {}:

    instance003:
      iproto: {}
    
  2. Reload the configuration on instance001 and instance002:

    • instance001:

      manual_leader:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      manual_leader:instance002> require('config'):reload()
      ---
      ...
      
  3. Check that the upstream section is missing for instance003 by executing box.info.replication[3]:

    manual_leader:instance001> box.info.replication[3]
    ---
    - id: 3
      uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
      lsn: 0
      downstream:
        status: follow
        idle: 0.4588760000006
        vclock: {1: 21}
        lag: 0
      name: instance003
    ...
    

  1. Stop instance003 using the tt stop command:

    $ tt stop manual_leader:instance003
       • The Instance manual_leader:instance003 (PID = 15551) has been terminated.
    
  2. Check that downstream.status is stopped for instance003:

    manual_leader:instance001> box.info.replication[3]
    ---
    - id: 3
      uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
      lsn: 0
      downstream:
        status: stopped
        message: 'unexpected EOF when reading from socket, called on fd 27, aka 127.0.0.1:3301,
          peer of 127.0.0.1:54185: Broken pipe'
        system_message: Broken pipe
      name: instance003
    ...
    

  1. Remove instance003 from the instances.yml file:

    instance001:
    instance002:
    
  2. Remove instance003 from config.yaml:

    instances:
      instance001:
        iproto:
          listen:
          - uri: '127.0.0.1:3301'
      instance002:
        iproto:
          listen:
          - uri: '127.0.0.1:3302'
    
  3. Reload the configuration on instance001 and instance002:

    • instance001:

      manual_leader:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      manual_leader:instance002> require('config'):reload()
      ---
      ...
      

To remove an instance from the replica set permanently, delete it from the box.space._cluster system space:

  1. Select all the tuples in the box.space._cluster system space:

    manual_leader:instance002> box.space._cluster:select{}
    ---
    - - [1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001']
      - [2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002']
      - [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
    ...
    
  2. Delete the tuple corresponding to instance003:

    manual_leader:instance002> box.space._cluster:delete(3)
    ---
    - [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
    ...
    
  3. Execute box.info.replication to check the health status:

    manual_leader:instance002> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 21
        upstream:
          status: follow
          idle: 0.73316000000159
          peer: replicator@127.0.0.1:3301
          lag: 0.00016212463378906
        name: instance001
        downstream:
          status: follow
          idle: 0.7269320000014
          vclock: {2: 1, 1: 21}
          lag: 0.00083398818969727
      2:
        id: 2
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 1
        name: instance002
    ...
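Reading the id off the select{} output works for three instances; with more, a lookup by name is less error-prone. The following plain Lua sketch uses a hypothetical helper, cluster_id_by_name, and assumes the tuple layout shown above (id, UUID, name). Inside Tarantool you could pass it the result of box.space._cluster:select{} and then call box.space._cluster:delete() with the returned id.

```lua
-- Hypothetical helper: given the result of box.space._cluster:select{},
-- return the replica id (field 1) and UUID (field 2) of the tuple whose
-- name (field 3) matches, or nil if there is no such instance.
local function cluster_id_by_name(tuples, name)
    for _, t in ipairs(tuples) do
        if t[3] == name then
            return t[1], t[2]
        end
    end
    return nil
end

-- Sample data copied from the select{} output above.
local tuples = {
    {1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001'},
    {2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002'},
    {3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003'},
}

local id = cluster_id_by_name(tuples, 'instance003')
print(id)  -- 3: the key to pass to box.space._cluster:delete()
```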
    

Master-replica: automated failover

Example on GitHub: auto_leader

This tutorial shows how to configure and work with a replica set with automated failover.

Before starting this tutorial:

  1. Install the tt utility.

  2. Create a tt environment in the current directory by executing the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the auto_leader directory.

  4. Inside instances.enabled/auto_leader, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment and should look like this:

      instance001:
      instance002:
      instance003:
      
    • The config.yaml file is intended to store a replica set configuration.

This section describes how to configure a replica set in config.yaml.

First, set the replication.failover option to election:

replication:
  failover: election

Define a replica set topology inside the groups section. The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'

In the credentials section, create the replicator user with the replication role:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

Set iproto.advertise.peer to advertise the current instance to other replica set members:

iproto:
  advertise:
    peer:
      login: replicator

The resulting replica set configuration should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: election

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'

  1. After configuring a replica set, execute the tt start command from the tt environment directory:

    $ tt start auto_leader
       • Starting an instance [auto_leader:instance001]...
       • Starting an instance [auto_leader:instance002]...
       • Starting an instance [auto_leader:instance003]...
    
  2. Check that the instances are in the RUNNING status using the tt status command:

    $ tt status auto_leader
    INSTANCE                 STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    auto_leader:instance001  RUNNING  9170  RO    ready   running  --
    auto_leader:instance002  RUNNING  9171  RO    ready   running  --
    auto_leader:instance003  RUNNING  9172  RW    ready   running  --
    

  1. Connect to instance001 using tt connect:

    $ tt connect auto_leader:instance001
       • Connecting to the instance...
       • Connected to auto_leader:instance001
    
  2. Check the instance state with regard to leader election using box.info.election. The output below shows that instance001 is a follower while instance002 is the replica set leader.

    auto_leader:instance001> box.info.election
    ---
    - leader_idle: 0.77491499999815
      leader_name: instance002
      state: follower
      vote: 0
      term: 2
      leader: 1
    ...
    
  3. Check that instance001 is in read-only mode using box.info.ro:

    auto_leader:instance001> box.info.ro
    ---
    - true
    ...
    
  4. Execute box.info.replication to check the replica set status. Make sure that upstream.status and downstream.status are follow for instance002 and instance003.

    auto_leader:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 9
        upstream:
          status: follow
          idle: 0.8257709999998
          peer: replicator@127.0.0.1:3302
          lag: 0.00012326240539551
        name: instance002
        downstream:
          status: follow
          idle: 0.81174199999805
          vclock: {1: 9}
          lag: 0
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 0
        name: instance001
      3:
        id: 3
        uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
        lsn: 0
        upstream:
          status: follow
          idle: 0.83125499999733
          peer: replicator@127.0.0.1:3303
          lag: 0.00010204315185547
        name: instance003
        downstream:
          status: follow
          idle: 0.83213399999659
          vclock: {1: 9}
          lag: 0
    ...
    

    To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.
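The same check can be scripted. This is a minimal sketch that assumes the table layout shown above; replication_problems is a hypothetical helper, not part of Tarantool. In a tt connect session you would pass it box.info.replication and box.info.id. Here it runs on a trimmed copy of the output so that it also works in plain Lua.

```lua
-- Hypothetical helper: collect problems from a box.info.replication-style
-- table. The local instance's own entry has no upstream or downstream, so
-- it is skipped; every other peer is expected to report status 'follow'
-- in both directions.
local function replication_problems(replication, self_id)
    local problems = {}
    for id, peer in pairs(replication) do
        if id ~= self_id then
            local label = peer.name or tostring(id)
            for _, dir in ipairs({'upstream', 'downstream'}) do
                local link = peer[dir]
                if link == nil then
                    table.insert(problems, label .. ': no ' .. dir)
                elseif link.status ~= 'follow' then
                    table.insert(problems, label .. ': ' .. dir .. ' is ' .. tostring(link.status))
                end
            end
        end
    end
    table.sort(problems)  -- deterministic order for readability
    return problems
end

-- Sample shaped like the instance001 output above (fields trimmed).
local sample = {
    [1] = {name = 'instance002', upstream = {status = 'follow'}, downstream = {status = 'follow'}},
    [2] = {name = 'instance001'},
    [3] = {name = 'instance003', upstream = {status = 'follow'}, downstream = {status = 'follow'}},
}

local problems = replication_problems(sample, 2)  -- 2 is instance001's own id here
print(#problems == 0 and 'all peers follow' or table.concat(problems, '\n'))
```

Returning a list of readable messages rather than a single boolean makes it easy to see which peer and which direction is broken, for example after stopping the master later in this tutorial.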

To check that replicas (instance001 and instance003) get all updates from the master (instance002), follow the steps below:

  1. Connect to instance002 using tt connect:

    $ tt connect auto_leader:instance002
       • Connecting to the instance...
       • Connected to auto_leader:instance002
    
  2. Create a space and add data as described in CRUD operation examples.

  3. Use the select operation on instance001 and instance003 to make sure data is replicated.

  4. Check that component 1 of the box.info.vclock value is the same on all instances:

    • instance001:

      auto_leader:instance001> box.info.vclock
      ---
      - {0: 1, 1: 32}
      ...
      
    • instance002:

      auto_leader:instance002> box.info.vclock
      ---
      - {0: 1, 1: 32}
      ...
      
    • instance003:

      auto_leader:instance003> box.info.vclock
      ---
      - {0: 1, 1: 32}
      ...
      

Note

A vclock value might include component 0, which is related to local space operations and might differ between instances in a replica set.

To test how automated failover works if the current master is stopped, follow the steps below:

  1. Stop the current master instance (instance002) using the tt stop command:

    $ tt stop auto_leader:instance002
       • The Instance auto_leader:instance002 (PID = 24769) has been terminated.
    
  2. On instance001, check box.info.election. In this example, the new replica set leader is instance001.

    auto_leader:instance001> box.info.election
    ---
    - leader_idle: 0
      leader_name: instance001
      state: leader
      vote: 2
      term: 3
      leader: 2
    ...
    
  3. Check the replication status for instance002 using box.info.replication. For this instance:

    • upstream.status is disconnected.
    • downstream.status is stopped.

    auto_leader:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
        lsn: 32
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 0.00032305717468262
          status: disconnected
          idle: 48.352504000002
          message: 'connect, called on fd 20, aka 127.0.0.1:62575: Connection refused'
          system_message: Connection refused
        name: instance002
        downstream:
          status: stopped
          message: 'unexpected EOF when reading from socket, called on fd 32, aka 127.0.0.1:3301,
            peer of 127.0.0.1:62204: Broken pipe'
          system_message: Broken pipe
      2:
        id: 2
        uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
        lsn: 1
        name: instance001
      3:
        id: 3
        uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
        lsn: 0
        upstream:
          status: follow
          idle: 0.18620999999985
          peer: replicator@127.0.0.1:3303
          lag: 0.00012516975402832
        name: instance003
        downstream:
          status: follow
          idle: 0.19718099999955
          vclock: {2: 1, 1: 32}
          lag: 0.00051403045654297
    ...
    

    The diagram below illustrates how the upstream and downstream connections look:

    replication status on a new master
  4. Start instance002 again using tt start:

    $ tt start auto_leader:instance002
       • Starting an instance [auto_leader:instance002]...
    

  1. Make sure that box.info.vclock values (except component 0) are the same on all instances:

    • instance001:

      auto_leader:instance001> box.info.vclock
      ---
      - {0: 2, 1: 32, 2: 1}
      ...
      
    • instance002:

      auto_leader:instance002> box.info.vclock
      ---
      - {0: 2, 1: 32, 2: 1}
      ...
      
    • instance003:

      auto_leader:instance003> box.info.vclock
      ---
      - {0: 3, 1: 32, 2: 1}
      ...
      
  2. On instance002, run box.ctl.promote() to make it the new replica set leader:

    auto_leader:instance002> box.ctl.promote()
    ---
    ...
    
  3. Check box.info.election to make sure instance002 is now the leader:

    auto_leader:instance002> box.info.election
    ---
    - leader_idle: 0
      leader_name: instance002
      state: leader
      vote: 1
      term: 4
      leader: 1
    ...
    

The process of adding instances to a replica set and removing them is similar for all failover modes. Learn how to do this from the Master-replica: manual failover tutorial.

Before removing an instance from a replica set with replication.failover set to election, make sure this instance is in read-only mode. If the instance is a master, choose a new leader manually.

Master-master

Example on GitHub: master_master

This tutorial shows how to configure and work with a master-master replica set.

Before starting this tutorial:

  1. Install the tt utility.

  2. Create a tt environment in the current directory by executing the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the master_master directory.

  4. Inside instances.enabled/master_master, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment and should look like this:

      instance001:
      instance002:
      
    • The config.yaml file is intended to store a replica set configuration.

This section describes how to configure a replica set in config.yaml.

First, set the replication.failover option to off:

replication:
  failover: off

Define a replica set topology inside the groups section:

  • The database.mode option should be set to rw to make instances work in read-write mode.
  • The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

In the credentials section, create the replicator user with the replication role:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

Set iproto.advertise.peer to advertise the current instance to other replica set members:

iproto:
  advertise:
    peer:
      login: replicator

The resulting replica set configuration should look as follows:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: off

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

  1. After configuring a replica set, execute the tt start command from the tt environment directory:

    $ tt start master_master
       • Starting an instance [master_master:instance001]...
       • Starting an instance [master_master:instance002]...
    
  2. Check that the instances are in the RUNNING status using the tt status command:

    $ tt status master_master
    INSTANCE                   STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
    master_master:instance001  RUNNING  9263  RW    ready   running  --
    master_master:instance002  RUNNING  9264  RW    ready   running  --
    

  1. Connect to both instances using tt connect. The example below is for instance001:

    $ tt connect master_master:instance001
       • Connecting to the instance...
       • Connected to master_master:instance001
    
    master_master:instance001>
    
  2. Check that both instances are writable using box.info.ro:

    • instance001:

      master_master:instance001> box.info.ro
      ---
      - false
      ...
      
    • instance002:

      master_master:instance002> box.info.ro
      ---
      - false
      ...
      
  3. Execute box.info.replication to check the replica set status. For instance002, upstream.status and downstream.status should be follow.

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 7
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 0
        upstream:
          status: follow
          idle: 0.93246499999987
          peer: replicator@127.0.0.1:3302
          lag: 0.00016188621520996
        name: instance002
        downstream:
          status: follow
          idle: 0.8988360000003
          vclock: {1: 7}
          lag: 0
    ...
    

    To see the diagrams that illustrate how the upstream and downstream connections look, refer to Monitoring a replica set.

Note

A vclock value might include component 0, which is related to local space operations and might differ between instances in a replica set.

To check that both instances get updates from each other, follow the steps below:

  1. On instance001, create a space, format it, and create a primary index:

    box.schema.space.create('bands')
    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    box.space.bands:create_index('primary', { parts = { 'id' } })
    

    Then, add sample data to this space:

    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    
  2. On instance002, use the select operation to make sure data is replicated:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
    ...
    
  3. Add more data to the created space on instance002:

    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    
  4. Return to instance001 and use select to make sure the new records are replicated:

    master_master:instance001> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
    ...
    
  5. Check that box.info.vclock values are the same on both instances:

    • instance001:

      master_master:instance001> box.info.vclock
      ---
      - {2: 2, 1: 12}
      ...
      
    • instance002:

      master_master:instance002> box.info.vclock
      ---
      - {2: 2, 1: 12}
      ...
      

Note

To learn how to fix and prevent replication conflicts using trigger functions, see Resolving replication conflicts.
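As an illustration of the kind of rule a trigger function can apply, the sketch below picks a deterministic winner between two conflicting versions of a tuple, so that every instance converges on the same one. It assumes a numeric version field at a hypothetical position 4, which the bands space in this tutorial does not have, and the function name pick_winner is hypothetical too. In Tarantool, such a rule would be wrapped in a space:before_replace() trigger; see Resolving replication conflicts for how triggers interact with replicated inserts that hit a duplicate key.

```lua
-- Hypothetical convergent conflict policy for a before_replace trigger.
-- Assumes each tuple carries a numeric version in field VERSION_FIELD
-- (the bands space in this tutorial has no such field).
local VERSION_FIELD = 4

-- Returns the tuple that should end up in the space: the one with the
-- higher version; on a tie, the one whose field 2 sorts first. As long as
-- field 2 differs, both instances pick the same winner regardless of the
-- order in which they see the two versions.
local function pick_winner(old, new)
    if old == nil or new == nil then
        return new  -- plain insert or delete: nothing to reconcile
    end
    local vo, vn = old[VERSION_FIELD], new[VERSION_FIELD]
    if vn ~= vo then
        return (vn > vo) and new or old
    end
    return (tostring(new[2]) <= tostring(old[2])) and new or old
end

local stored   = {5, 'Pink Floyd', 1965, 2}
local incoming = {5, 'incorrect data', 0, 1}
print(pick_winner(stored, incoming)[2])  -- Pink Floyd (higher version wins)
```

The point of the symmetric rule is that each instance reaches the same result whether it sees a given version as the stored tuple or as the incoming one.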

To insert conflicting records into instance001 and instance002, follow the steps below:

  1. Stop instance001 using the tt stop command:

    $ tt stop master_master:instance001
    
  2. On instance002, insert a new record:

    box.space.bands:insert { 5, 'incorrect data', 0 }
    
  3. Stop instance002 using tt stop:

    $ tt stop master_master:instance002
    
  4. Start instance001 again:

    $ tt start master_master:instance001
    
  5. Connect to instance001 and insert a record that should conflict with a record already inserted on instance002:

    box.space.bands:insert { 5, 'Pink Floyd', 1965 }
    
  6. Start instance002 again:

    $ tt start master_master:instance002
    

    Then, check box.info.replication on instance001. upstream.status should be stopped because of the Duplicate key exists error:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 115.99977827072
          status: stopped
          idle: 2.0342070000006
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data",
            0]
        name: instance002
        downstream:
          status: stopped
          message: 'unexpected EOF when reading from socket, called on fd 24, aka 127.0.0.1:3301,
            peer of 127.0.0.1:58478: Broken pipe'
          system_message: Broken pipe
    ...
    

    The diagram below illustrates how the upstream and downstream connections look:

    replication status on a new master

To resolve a replication conflict, instance002 should get the correct data from instance001 first. To achieve this, instance002 should be rebootstrapped:

  1. Select all the tuples in the box.space._cluster system space to get the UUID of instance002:

    master_master:instance001> box.space._cluster:select()
    ---
    - - [1, 'c3bfd89f-5a1c-4556-aa9f-461377713a2a', 'instance001']
      - [2, 'dccf7485-8bff-47f6-bfc4-b311701e36ef', 'instance002']
    ...
    
  2. In the config.yaml file, change the following instance002 settings:

    • Set database.mode to ro.
    • Set database.instance_uuid to the UUID value obtained in the previous step.

    instance002:
      database:
        mode: ro
        instance_uuid: 'dccf7485-8bff-47f6-bfc4-b311701e36ef'
    
  3. Reload the configuration on both instances using the config:reload() function:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  4. Delete write-ahead logs and snapshots stored in the var/lib/instance002 directory.

    Note

    var/lib is the default directory used by tt to store write-ahead logs and snapshots. Learn more from Configuration.

  5. Restart instance002 using the tt restart command:

    $ tt restart master_master:instance002
    
  6. Connect to instance002 and make sure it received the correct data from instance001:

    master_master:instance002> box.space.bands:select()
    ---
    - - [1, 'Roxette', 1986]
      - [2, 'Scorpions', 1965]
      - [3, 'Ace of Base', 1987]
      - [4, 'The Beatles', 1960]
      - [5, 'Pink Floyd', 1965]
    ...
    

After reseeding a replica, you need to resolve the replication conflict that keeps replication stopped:

  1. Execute box.info.replication on instance001. upstream.status is still stopped:

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          peer: replicator@127.0.0.1:3302
          lag: 115.99977827072
          status: stopped
          idle: 1013.688243
          message: Duplicate key exists in unique index "primary" in space "bands" with
            old tuple - [5, "Pink Floyd", 1965] and new tuple - [5, "incorrect data",
            0]
        name: instance002
        downstream:
          status: follow
          idle: 0.69694700000036
          vclock: {2: 2, 1: 13}
          lag: 0
    ...
    

    The diagram below illustrates how the upstream and downstream connections look:

    replication status after reseeding a replica
  2. In the config.yaml file, clear the iproto option for instance001 by setting its value to {} to disconnect this instance from instance002. Set database.mode to ro:

    instance001:
      database:
        mode: ro
      iproto: {}
    
  3. Reload configuration on instance001 only:

    master_master:instance001> require('config'):reload()
    ---
    ...
    
  4. Change database.mode values back to rw for both instances and restore iproto.listen for instance001. The database.instance_uuid option can be removed for instance002:

    instance001:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3301'
    instance002:
      database:
        mode: rw
      iproto:
        listen:
        - uri: '127.0.0.1:3302'
    
  5. Reload configurations on both instances one more time:

    • instance001:

      master_master:instance001> require('config'):reload()
      ---
      ...
      
    • instance002:

      master_master:instance002> require('config'):reload()
      ---
      ...
      
  6. Check box.info.replication. upstream.status should be follow now.

    master_master:instance001> box.info.replication
    ---
    - 1:
        id: 1
        uuid: c3bfd89f-5a1c-4556-aa9f-461377713a2a
        lsn: 13
        name: instance001
      2:
        id: 2
        uuid: dccf7485-8bff-47f6-bfc4-b311701e36ef
        lsn: 2
        upstream:
          status: follow
          idle: 0.86873800000012
          peer: replicator@127.0.0.1:3302
          lag: 0.0001060962677002
        name: instance002
        downstream:
          status: follow
          idle: 0.058662999999797
          vclock: {2: 2, 1: 13}
          lag: 0
    ...
    

The process of adding instances to a replica set and removing them is similar for all failover modes. Learn how to do this from the Master-replica: manual failover tutorial.

Before removing an instance from a replica set with replication.failover set to off, make sure this instance is in read-only mode.

Sharding

As a project grows, database scalability often becomes one of its most serious challenges. If a single server cannot handle the load, you need to apply scaling techniques.

Sharding is a database architecture principle that enables horizontal scaling: the dataset is split into parts, and the parts are distributed across multiple servers.

With the vshard module, the tuples of a dataset are distributed across multiple nodes, each running a Tarantool database server instance. Each instance handles only a subset of the total data, so an increase in load can be offset by adding new servers. The initial dataset is partitioned into multiple parts, and each part is stored on a separate server.

The vshard module is based on the concept of virtual buckets: the set of tuples is distributed across a large number of abstract virtual nodes (virtual buckets, or simply buckets hereafter) rather than across a small number of physical nodes.

The dataset is partitioned using sharding keys (bucket identifiers). Hashing a sharding key into a large number of buckets makes it possible to change the number of servers in the cluster transparently to the user. The rebalancing mechanism redistributes buckets among shards when servers are added or removed.
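To make the idea concrete, here is a toy mapping from a sharding key to a bucket id in plain Lua. The djb2 hash is an arbitrary choice for illustration; vshard provides its own hash functions on the router (for example, vshard.router.bucket_id_mpcrc32()), and the names below are hypothetical. Because the bucket count stays fixed, a key's bucket id does not change when servers are added or removed; only the assignment of buckets to shards does.

```lua
-- Illustrative sketch: map a sharding key to a bucket id in 1..bucket_count.
-- djb2 is used only as an example hash, not the one vshard uses.
local function djb2(s)
    local h = 5381
    for i = 1, #s do
        -- Keep the value within 32 bits; every intermediate value stays
        -- below 2^53, so the arithmetic is exact in a Lua number.
        h = (h * 33 + s:byte(i)) % 4294967296
    end
    return h
end

-- Bucket ids are numbered from 1 to N, so shift the remainder by one.
local function bucket_id(key, bucket_count)
    return djb2(tostring(key)) % bucket_count + 1
end

print(bucket_id('a', 100))  -- 71
```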

Buckets have states, which makes it easy to monitor server states: for example, whether a server instance is active and available for all types of requests, or whether a failure has occurred and the instance accepts only read requests.

The vshard module provides public and internal router and storage APIs for sharding-aware applications.

Start with the quick start guide, or learn more about the sharding architecture in Tarantool.

Check out the sharding administrator's guide, or go to the vshard configuration and API reference.

Architecture

Consider a distributed Tarantool cluster that consists of subclusters called shards, each storing some part of the data. Each shard, in turn, is a replica set, and one of its replicas acts as the master that handles all read and write requests.

When sharded, the whole dataset is split into a predefined number of virtual buckets (hereafter simply buckets). Each bucket is assigned a unique number from 1 to N, where N is the total number of buckets. The number of buckets is deliberately chosen to be several orders of magnitude larger than the potential number of cluster nodes, even taking future scaling into account. For example, if M nodes are expected, the dataset may be split into 100 * M or even 1000 * M buckets. Choose the number of buckets carefully: too many buckets may require extra memory to store routing information, while too few may reduce the granularity of rebalancing.

Each shard stores a unique subset of buckets. A bucket cannot belong to several shards at once, as shown in the diagram below:

Diagram: buckets distributed across shards

This bucket-to-shard mapping is stored in a table in one of Tarantool's system spaces, and each shard holds only the part of the mapping that covers the buckets assigned to that shard.
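The invariant that every bucket belongs to exactly one shard, with each shard holding only its own part of the mapping, can be illustrated with a toy assignment. The even split into contiguous ranges and the shard names are assumptions made for illustration and are not meant to describe how vshard actually distributes buckets.

```lua
-- Illustrative sketch: assign buckets 1..bucket_count to shards in
-- contiguous, near-equal ranges, then derive the per-shard view that each
-- shard would keep locally. Shard names are hypothetical.
local function assign_buckets(bucket_count, shards)
    local owner = {}      -- bucket id -> shard name (the full mapping)
    local per_shard = {}  -- shard name -> list of bucket ids it owns
    local n = #shards
    for _, name in ipairs(shards) do
        per_shard[name] = {}
    end
    for b = 1, bucket_count do
        -- Spread bucket ids evenly over shard indexes 1..n.
        local idx = math.floor((b - 1) * n / bucket_count) + 1
        local name = shards[idx]
        owner[b] = name                  -- exactly one owner per bucket
        table.insert(per_shard[name], b)
    end
    return owner, per_shard
end

local owner, per_shard = assign_buckets(3000, {'storage_a', 'storage_b', 'storage_c'})
-- Each shard owns 1000 buckets: 1-1000, 1001-2000, and 2001-3000.
print(owner[1], owner[1500], owner[3000])
print(#per_shard.storage_a, #per_shard.storage_b, #per_shard.storage_c)
```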

In addition to the table, the bucket id is also stored in a special field of every tuple in every table that participates in sharding.

When a shard receives any request (except SELECT) from an application, it checks the bucket id specified in the request against the table of bucket ids that belong to this node. If the specified bucket id is invalid, the request fails with the "wrong bucket" error. Otherwise, the request is executed, and all data created by the request is assigned the bucket id specified in it. Note that a request should only modify data with the same bucket id as the one in the request.
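The storage-side check can be modeled in a few lines of plain Lua. Only the bookkeeping is modeled: new_storage and handle_write are hypothetical names, ownership is a plain Lua set, and the 'wrong bucket' string stands in for vshard's own error reporting. As described above, an accepted write is stamped with the request's bucket id.

```lua
-- Illustrative model of a storage node that owns a set of buckets.
local function new_storage(owned_buckets)
    local owned = {}
    for _, b in ipairs(owned_buckets) do
        owned[b] = true
    end
    return {owned = owned, rows = {}}
end

-- Accept a write only if this storage owns the request's bucket.
local function handle_write(storage, bucket_id, row)
    if not storage.owned[bucket_id] then
        return nil, 'wrong bucket'
    end
    row.bucket_id = bucket_id  -- stamp the data with the request's bucket id
    table.insert(storage.rows, row)
    return row
end

local storage = new_storage({1, 2, 3})
print(handle_write(storage, 2, {id = 1, band_name = 'Roxette'}) ~= nil)        -- true (accepted)
print(select(2, handle_write(storage, 7, {id = 2, band_name = 'Scorpions'})))  -- wrong bucket
```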

Storing bucket ids both in the data itself and in the table ensures data consistency regardless of the application logic and makes rebalancing transparent to the application. Storing the mapping table in a system space keeps sharding consistent in case of failover, since all replicas in a shard start from the same table state.

When sharded, the dataset is distributed across a large number of abstract nodes called virtual buckets (hereafter simply buckets).

The dataset is partitioned using a sharding key (or bucket id, in Tarantool terminology). A bucket id is a number from 1 to N, where N is the total number of buckets.

Diagram: virtual buckets

Each replica set stores a unique subset of buckets. A bucket cannot belong to several replica sets at once.

The total number of buckets is determined by the administrator who sets up the initial cluster configuration.

Every space you plan to shard must have a numeric field containing bucket ids. You can learn more from Data definition.

A sharded cluster in Tarantool consists of the following components:

Diagram: sharded cluster structure

A storage is a node that stores a subset of the dataset. Multiple storages replicated for redundancy make up a replica set (also called a shard).

Each storage in a replica set has a role: master or replica. The master processes read and write requests. A replica processes read requests but cannot process write requests.

A router is a standalone software component that routes read and write requests from the client application to shards.

All requests from the application come to the sharded cluster through a router. The router keeps the topology of the sharded cluster transparent to the application, so the application is unaware of:

  • the number and location of shards,
  • the data rebalancing process,
  • replica failures and failover.

The router can also calculate a bucket ID on its own, provided that the application clearly defines the rules for calculating a bucket ID based on the request data. To do this, the router needs to be aware of the data schema.

The router does not have a persistent state, does not store the cluster topology, and does not rebalance data. The router is a standalone software component that can run in the storage layer or in the application layer, depending on the application features.

The router maintains a constant pool of connections to all the storages that is created at startup, which helps prevent configuration errors. Once the pool is created, the router caches the current state of the _vbucket table to speed up routing. If a bucket is moved to another storage as a result of rebalancing, or one of the shards fails over to a replica, the router updates the routing table in a way that is transparent to the application.

Sharding is not integrated into any centralized configuration storage system. It is assumed that the application itself handles all the interactions with such a system and passes sharding parameters. That said, the configuration can be changed dynamically, for example, when adding or deleting one or several shards:

  1. To add a new shard to the cluster, a system administrator first changes the configuration of all the routers and then the configuration of all the storages.
  2. The new shard becomes available for rebalancing in the storage layer.
  3. As a result of rebalancing, one of the virtual buckets is moved to the new shard.
  4. When trying to access the virtual bucket, the router receives a special error code that specifies the new bucket location.

CRUD operations can be:

  • either executed in a stored procedure inside a storage,
  • or initialized by the application.

In any case, the application must include the operation's bucket ID in the request. For an INSERT request, the bucket ID is stored in the newly created tuple. In other cases, the storage checks whether the specified bucket ID matches the bucket ID of the tuple being modified.
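The storage-side checks described above can be sketched in plain Lua. This is a simplified model, not vshard's implementation: a Lua table stands in for the space, owned_buckets stands in for the node's bucket table, and the function names storage_insert and storage_update are hypothetical, as are all error strings other than "wrong bucket".

```lua
-- Simplified model of a storage node: the buckets it owns and its tuples.
local storage = {
    owned_buckets = { [1] = true, [2] = true },  -- bucket IDs held by this node
    tuples = {},                                 -- primary key -> tuple
}

-- INSERT: reject foreign buckets, otherwise stamp the tuple with the bucket ID.
local function storage_insert(node, bucket_id, key, value)
    if not node.owned_buckets[bucket_id] then
        return nil, 'wrong bucket'
    end
    node.tuples[key] = { key = key, bucket_id = bucket_id, value = value }
    return node.tuples[key]
end

-- UPDATE: the request's bucket ID must also match the tuple's bucket ID.
local function storage_update(node, bucket_id, key, value)
    if not node.owned_buckets[bucket_id] then
        return nil, 'wrong bucket'
    end
    local tuple = node.tuples[key]
    if tuple == nil then
        return nil, 'not found'
    end
    if tuple.bucket_id ~= bucket_id then
        return nil, 'bucket ID mismatch'
    end
    tuple.value = value
    return tuple
end

storage_insert(storage, 1, 'a', 10)
local _, err = storage_insert(storage, 7, 'b', 20)
print(err)  -- wrong bucket
_, err = storage_update(storage, 2, 'a', 11)
print(err)  -- bucket ID mismatch
```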

Since a storage is not aware of the mapping between a bucket ID and a primary key, all the SELECT requests executed in stored procedures inside a storage are only executed locally. SELECT requests initialized by the application are forwarded to a router. If the application has passed a bucket ID, the router uses it to calculate the shard.

There are several ways of calling stored procedures in cluster replica sets. Stored procedures can be called:

  • either on a specific virtual bucket located in a replica set (in this case, it is necessary to differentiate between read and write procedures, as write procedures are not applicable to buckets being transferred),
  • or without specifying any particular bucket.

All the routing validity checks performed for sharded DML operations also apply to bucket-bound stored procedures.

The rebalancer is a background rebalancing process that ensures an even distribution of buckets across the shards. During rebalancing, buckets are migrated among replica sets.

The rebalancer "wakes up" periodically and redistributes data from the most loaded nodes to less loaded nodes. Rebalancing starts if the disbalance of a replica set exceeds the disbalance threshold specified in the configuration.

The replica set disbalance is calculated as follows:

|etalon_bucket_count - actual_bucket_count| / etalon_bucket_count * 100
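A direct transcription of this formula into Lua, for illustration. The function name and the threshold value in the usage lines are hypothetical; in a real cluster, the threshold comes from the configuration. The multiplication by 100 is done before the division, which is mathematically equivalent and keeps round numbers exact.

```lua
-- Replica set disbalance, in percent, relative to its etalon bucket count.
local function disbalance(etalon_bucket_count, actual_bucket_count)
    return math.abs(etalon_bucket_count - actual_bucket_count) * 100
        / etalon_bucket_count
end

local threshold = 1  -- hypothetical threshold, in percent
local d = disbalance(1000, 1030)
print(string.format('%.1f%%', d))  -- 3.0%
print(d > threshold)               -- true: this replica set triggers rebalancing
```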

The replica set from which a bucket is being migrated is called the source; the replica set to which the bucket is being migrated is called the destination.

A replica set lock makes a replica set invisible to the rebalancer. A locked replica set can neither receive new buckets nor migrate its own buckets.

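While being migrated, a bucket can have different statuses:

  • ACTIVE – the bucket is available for read and write requests.
  • PINNED – the bucket serves requests like an ACTIVE bucket but stays on its replica set until it is unpinned.
  • SENDING – the bucket is being copied to the destination replica set; it still serves read requests.
  • SENT – the bucket has been copied to the destination replica set and no longer accepts requests.
  • RECEIVING – the bucket is being filled with data in the destination replica set; all requests to it are rejected.
  • GARBAGE – the bucket's data is no longer needed on this replica set.

Buckets in the GARBAGE status are deleted by the garbage collector.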

Migration is performed as follows:

  1. In the destination replica set, a new bucket is created in the RECEIVING status. Data copying starts, and the bucket rejects all requests.
  2. The bucket being sent in the source replica set gets the SENDING status and continues to process read requests.
  3. Once the data is copied, the bucket in the source replica set gets the SENT status and stops accepting requests.
  4. The bucket in the destination replica set switches to the ACTIVE status and starts accepting all requests.
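The four steps above can be modeled as status changes on two in-memory tables, one per replica set. This is a simplified sketch for illustration only: real vshard keeps statuses in the _bucket space, and the names migrate_bucket and accepts are hypothetical. The copy callback checks which requests each side would accept while the data is in transit.

```lua
-- In-memory _bucket-like tables: bucket ID -> status.
local source = { [7] = 'ACTIVE' }
local destination = {}

-- Which request types a bucket accepts in each status (per the steps above).
local accepts = {
    ACTIVE    = { read = true,  write = true },
    SENDING   = { read = true,  write = false },
    SENT      = { read = false, write = false },
    RECEIVING = { read = false, write = false },
}

local function migrate_bucket(bucket_id, src, dst, copy_data)
    dst[bucket_id] = 'RECEIVING'  -- step 1: new bucket, all requests rejected
    src[bucket_id] = 'SENDING'    -- step 2: source keeps serving reads
    copy_data(bucket_id)
    src[bucket_id] = 'SENT'       -- step 3: source stops accepting requests
    dst[bucket_id] = 'ACTIVE'     -- step 4: destination accepts all requests
end

migrate_bucket(7, source, destination, function(id)
    -- While copying, the source serves reads only; the destination serves nothing.
    assert(accepts[source[id]].read and not accepts[source[id]].write)
    assert(not accepts[destination[id]].read)
end)
print(source[7], destination[7])  -- SENT  ACTIVE
```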

Note

There is a special error, vshard.error.code.TRANSFER_IS_IN_PROGRESS, that is returned when a request tries to perform an action that is not applicable to a bucket being transferred. In this case, retry the request.

The _bucket system space in each replica set stores the IDs of the buckets that belong to this replica set. The space contains the following fields:

  • bucket – bucket ID
  • status – bucket status
  • destination – UUID of the destination replica set

Example of _bucket.select{}:

---
- - [1, ACTIVE, abfe2ef6-9d11-4756-b668-7f5bc5108e2a]
  - [2, SENT, 19f83dcb-9a01-45bc-a0cf-b0c5060ff82c]
...

Once a bucket is migrated, the UUID of the destination replica set is written to the table. While the bucket is still in the source replica set, the destination replica set UUID is NULL.

The router's routing table maps all bucket IDs to the corresponding replica sets. It ensures the consistency of sharding in case of failover.

The router maintains a constant pool of connections to all the storages that is created at startup, which helps prevent configuration errors. After the connection pool is created, the router caches the current state of the routing table to speed up routing. If a bucket is migrated to another storage after rebalancing, or a failover switches a shard to another replica, the discovery fiber on the router updates the routing table automatically.

Since the bucket ID is explicitly specified both in the data and in the mapping table on the router, the data remains consistent regardless of the application logic. This also makes rebalancing transparent to the application.

Requests to the database can be performed by the application or using stored procedures. Either way, the bucket ID should be explicitly specified in the request.

All requests are first forwarded to a router. The only operation a router supports is call, which is performed via the vshard.router.call() function:

result = vshard.router.call(<bucket_id>, <mode>, <function_name>, {<argument_list>}, {<options>})

Requests are processed as follows:

  1. The router uses the bucket ID to search the routing table for the replica set that holds the corresponding bucket.

    If the router does not know which replica set the bucket ID belongs to (the discovery fiber has not filled the table yet), it sends requests to all the storages to find out where the bucket is located.

  2. Once the bucket is located, the shard checks:

    • whether the bucket is stored in the _bucket system space of the replica set;
    • whether the bucket is in the ACTIVE or PINNED status (for a read request, the bucket can also be in the SENDING status).
  3. If all the checks pass, the request is executed. Otherwise, the request fails with the "wrong bucket" error.
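The request flow above can be sketched as follows. This is an illustrative model only, not vshard's router code: each storage is a plain table of bucket statuses, the routing table is a Lua table used as a cache, discovery is a loop over all storages, and the names route and storage_check are hypothetical.

```lua
-- Each storage maps bucket ID -> status (a stand-in for its _bucket space).
local storages = {
    rs1 = { [1] = 'ACTIVE', [2] = 'SENDING' },
    rs2 = { [3] = 'PINNED' },
}

local routing_table = {}  -- bucket ID -> replica set name (router cache)

-- Step 2: storage-side validation of the bucket status.
local function storage_check(buckets, bucket_id, mode)
    local status = buckets[bucket_id]
    if status == 'ACTIVE' or status == 'PINNED' then
        return true
    end
    return mode == 'read' and status == 'SENDING'
end

-- Steps 1 and 3: look the bucket up, discover it if unknown, then check it.
local function route(bucket_id, mode)
    local rs_name = routing_table[bucket_id]
    if rs_name == nil then
        -- Discovery: ask every storage whether it holds the bucket.
        for name, buckets in pairs(storages) do
            if buckets[bucket_id] ~= nil then
                rs_name = name
                routing_table[bucket_id] = name
                break
            end
        end
    end
    if rs_name == nil or not storage_check(storages[rs_name], bucket_id, mode) then
        return nil, 'wrong bucket'
    end
    return rs_name
end

print(route(3, 'write'))  -- rs2
print(route(2, 'read'))   -- rs1
print(route(2, 'write'))  -- nil  wrong bucket
```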

Vertical scaling
Adding more power to a single server: using a more powerful CPU, adding more RAM, adding storage, and so on.
Horizontal scaling
Adding more servers to the pool of resources, then partitioning and distributing the dataset among the servers.
Sharding
A database architecture that allows partitioning a dataset using a sharding key and distributing the dataset across multiple servers. Sharding is a special case of horizontal scaling.
Node
A virtual or physical server instance.
Cluster
A set of nodes that make up a single group.
Storage
A node that stores a subset of a dataset.
Replica set
A set of nodes storing copies of a dataset. Each storage in a replica set has a role: master or replica.
Master
A storage in a replica set that processes read and write requests.
Replica
A storage in a replica set that processes only read requests.
Read requests
Read-only requests, that is, select requests.
Write requests
Data modification operations, that is, create, update, and delete requests.
Buckets (virtual buckets)
The abstract virtual nodes into which the dataset is partitioned by the sharding key (bucket ID).
Bucket ID
A sharding key that defines which replica set a bucket belongs to. A bucket ID can be calculated from a hash key.
Router
A proxy server responsible for routing requests from an application to the nodes in a cluster.

Sharding with vshard

Sharding in Tarantool is implemented in the vshard module. For a quick start with vshard, refer to Creating a sharded cluster.

Note

Starting with version 3.0, the recommended way of configuring Tarantool is using a configuration file. The sharding section defines configuration parameters related to sharding. To learn how to configure vshard in code, see Configuration reference.

The vshard module is not included in the main Tarantool distribution. To install the module, run the following command:

$ tt rocks install vshard

If you are developing a sharded cluster application, add the vshard module dependency to a *.rockspec file:

dependencies = {
    'vshard == 0.1.27'
}

Note

The minimum required version of vshard is 0.1.25.

Configuring settings related to sharding might include the following steps:

  1. Configure connection settings to allow instances within a sharded cluster to communicate with each other.
  2. Specify which role each replica set plays in a sharded cluster.
  3. Configure how data is partitioned across shards.
  4. Specify settings related to data rebalancing.

This section describes connection options that enable communication between instances within a sharded cluster. For general information about connections, see the Connections topic.

To allow a router and a rebalancer to connect to storages, use a user with the sharding role. The example below shows how to grant the sharding role to the storage user:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]

The sharding role has different privileges depending on a replica set’s sharding role. For replica sets with the storage sharding role, the sharding credential role has the following privileges:

  • All privileges provided by the replication role.
  • Executing vshard.storage.* functions.

If a replica set does not have the storage sharding role, the sharding credential role does not have any privileges.

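Each replica set in a sharded cluster can have one of three roles:

  • router – a replica set acts as a router;
  • storage – a replica set acts as a storage;
  • rebalancer – a replica set acts as a rebalancer.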

You can use the sharding.roles option to assign a specific role to a replica set or group of replica sets. In the example below, all replica sets in the storages group have the storage role while replica sets in the routers group have the router role.

groups:
  storages:
    sharding:
      roles: [storage]
    # ...
  routers:
    sharding:
      roles: [router]
    # ...

Note that the rebalancer role is optional. If it is not specified, a rebalancer is selected automatically from the master instances of replica sets. To specify the rebalancer manually or turn it off, use the sharding.rebalancer_mode option.

This section describes configuration settings related to data partitioning. Learn how to define spaces to be sharded in Data definition.

To define the total number of buckets in a cluster, configure the sharding.bucket_count option at the global level. In the example below, sharding.bucket_count is set to 1000:

sharding:
  bucket_count: 1000

sharding.bucket_count should be several orders of magnitude larger than the potential number of cluster nodes, taking possible future scaling out into account.

If the estimated number of nodes in a cluster is N, the dataset should be divided into 100N or even 1000N buckets, depending on the planned scaling out.

Keep in mind that too many buckets require more memory to store routing information. On the other hand, too few buckets reduce the granularity of rebalancing.

A replica set weight defines the storage capacity of the replica set: the larger the weight, the more buckets the replica set can store. You can configure a replica set weight using the sharding.weight option. This option can be used to store the prevailing amount of data on a replica set with more memory space. You can also assign a zero weight to a replica set to initiate migration of its buckets to the remaining cluster nodes.

In the example below, the storage-a replica set can store twice as much data as storage-b:

# ...
replicasets:
  storage-a:
    sharding:
      weight: 2
    # ...
  storage-b:
    sharding:
      weight: 1
    # ...

There is an etalon number of buckets for a replica set ("etalon" in this case means ideal). If every replica set holds exactly its etalon number of buckets, the buckets are distributed evenly.

The etalon number is calculated automatically considering the number of buckets in the cluster and the weights of the replica sets.

Rebalancing starts if the disbalance of a replica set exceeds the disbalance threshold specified in the configuration (the sharding.rebalancer_disbalance_threshold option).

The replica set disbalance is calculated as follows:

|etalon_bucket_count - actual_bucket_count| / etalon_bucket_count * 100

For example, a cluster is configured as follows:

In this case, the etalon numbers of buckets for the replica sets are:

  • 1st replica set – 1000.
  • 2nd replica set – 500.
  • 3rd replica set – 1500.
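One configuration consistent with these numbers is 3000 buckets in total with replica set weights of 2, 1, and 3 (an assumption; the configuration itself is not shown above). Under that assumption, the etalon numbers follow from splitting the buckets in proportion to weight, as this Lua sketch shows. The function name etalon_counts is hypothetical, and rounding of non-integer shares is ignored for simplicity.

```lua
-- Split bucket_count among replica sets in proportion to their weights.
local function etalon_counts(bucket_count, weights)
    local weight_sum = 0
    for _, w in ipairs(weights) do
        weight_sum = weight_sum + w
    end
    local counts = {}
    for i, w in ipairs(weights) do
        counts[i] = bucket_count * w / weight_sum
    end
    return counts
end

local counts = etalon_counts(3000, { 2, 1, 3 })
print(counts[1], counts[2], counts[3])
```

With equal weights, the same function splits the buckets evenly among the replica sets.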

You can set a replica set weight to zero to initiate migration of its buckets to the remaining cluster nodes. You can also add a new replica set with a non-zero weight to initiate migration of the buckets from the existing replica sets.

When a new shard is added, the configuration should be reloaded on each instance to migrate buckets to the new shard:

  • If a centralized configuration storage is used, Tarantool reloads the changed configuration automatically.
  • If a local configuration file is used, reload the configuration on all the routers first and then on all the storages.

Originally, vshard had quite a simple rebalancer: one process on one node that calculated routes, that is, which nodes should send buckets, how many, and to whom. The nodes applied these routes one by one sequentially.

Unfortunately, such a simple scheme was not fast enough, especially for Vinyl, where the cost of reading from disk is comparable to network costs. In fact, with Vinyl, the rebalancer's route applier was idle most of the time.

Now each node can send multiple buckets in parallel, in a round-robin manner, to multiple destinations or to just one.

To set the degree of parallelism, use the sharding.rebalancer_max_sending option:

sharding:
  rebalancer_max_sending: 5

Note

Setting sharding.rebalancer_max_sending to N does not necessarily give an N-fold speedup. The actual gain depends on the network, the disk, and the number of other fibers in the system.

Suppose you have 10 replica sets and add a new one. Now all 10 replica sets will try to send buckets to the new one.

Assume that each replica set can send up to 5 buckets at once. In that case, the new replica set will experience a rather big load of 50 buckets being downloaded at once. If the node needs to do some other work, such a load may be undesirable. Also, too many buckets sent in parallel can cause timeouts in the rebalancing process itself.

To fix the problem, you can set a lower value for rebalancer_max_sending for old replica sets, or decrease rebalancer_max_receiving for the new one. In the latter case, some workers on old nodes will be throttled, and you will see that in the logs.
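The arithmetic behind this example, as a small Lua sketch. In this simplified model, the load on the new replica set is the number of senders times rebalancer_max_sending, capped by the receiver's rebalancer_max_receiving. The function name is hypothetical, and the value 20 used for rebalancer_max_receiving is an arbitrary illustration, not a default.

```lua
-- Buckets the new replica set would receive at once in this model.
local function concurrent_incoming(senders, max_sending, max_receiving)
    return math.min(senders * max_sending, max_receiving)
end

print(concurrent_incoming(10, 5, math.huge))  -- 50: no receiving limit
print(concurrent_incoming(10, 5, 20))         -- 20: capped on the receiver
print(concurrent_incoming(10, 2, math.huge))  -- 20: lower limit on the senders
```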

rebalancer_max_sending is also important if you have a limit on the maximum number of buckets that can be read-only at once in the cluster. Recall that while a bucket is being sent, it does not accept new write requests.

Suppose you have 100,000 buckets, and each bucket stores ~0.001% of your data. The cluster has 10 replica sets, and you cannot afford to block more than 0.1% of the data for writing. In this case, do not set rebalancer_max_sending higher than 10 on these nodes. This guarantees that the rebalancer will not send more than 100 buckets at once across the whole cluster.
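The same calculation in Lua. The function name is hypothetical, and the write-lock budget is expressed in per mille (1‰ = 0.1%) to keep the arithmetic exact.

```lua
-- Largest per-node rebalancer_max_sending that keeps the number of buckets
-- locked for writes within the cluster-wide budget.
local function max_sending_limit(bucket_count, lock_budget_permille, replicaset_count)
    local max_locked_buckets = bucket_count * lock_budget_permille / 1000
    return math.floor(max_locked_buckets / replicaset_count)
end

-- 100,000 buckets, at most 0.1% locked, 10 replica sets.
print(max_sending_limit(100000, 1, 10))  -- 10
```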

If rebalancer_max_sending is too high and rebalancer_max_receiving is too low, some buckets will attempt to relocate and fail. Such failed attempts consume network resources and time, so configure these two parameters so that they do not conflict with each other.

A replica set lock (sharding.lock) makes a replica set invisible to the rebalancer: a locked replica set can neither receive new buckets nor migrate its own buckets.

A bucket pin (vshard.storage.bucket_pin(bucket_id)) blocks a specific bucket from migrating: a pinned bucket stays on the replica set to which it is pinned until it is unpinned.

Pinning all the buckets of a replica set does not mean locking the replica set. Even after all its buckets are pinned, a non-locked replica set can still receive new buckets.

A replica set lock is helpful, for example, to separate a replica set from production replica sets for testing, or to preserve some application metadata that must not be sharded for a while. A bucket pin is used for similar cases but in a smaller scope.

By both locking a replica set and pinning all buckets, you can isolate an entire replica set.

Locked replica sets and pinned buckets affect the rebalancing algorithm, since the rebalancer must ignore locked replica sets and take pinned buckets into account when trying to reach the best possible balance.

This is a non-trivial task, since a user can pin too many buckets to a replica set, making a perfect balance unreachable. For example, consider the following cluster (assume that all replica set weights are equal to 1).

The initial configuration:

rs1: bucket_count = 150
rs2: bucket_count = 150, pinned_count = 120

Adding a new replica set:

rs1: bucket_count = 150
rs2: bucket_count = 150, pinned_count = 120
rs3: bucket_count = 0

The perfect balance would be 100 - 100 - 100, which is impossible since the rs2 replica set has 120 pinned buckets. The best possible balance here is the following:

rs1: bucket_count = 90
rs2: bucket_count = 120, pinned_count = 120
rs3: bucket_count = 90

The rebalancer moved as many buckets as possible from rs2 to decrease the disbalance. At the same time, it respected equal weights of rs1 and rs3.

The algorithms for implementing locks and pins are completely different, although they look similar in terms of functionality.

Locked replica sets do not participate in rebalancing. This means that even if the actual total number of buckets is not equal to the etalon number, the disbalance cannot be fixed due to the lock. When the rebalancer detects that one of the replica sets is locked, it recalculates the etalon number of buckets of the non-locked replica sets as if the locked replica set and its buckets did not exist at all.

Rebalancing of replica sets with pinned buckets requires a more complex algorithm. Here, pinned_count is the number of pinned buckets in a replica set, and etalon_count is the etalon number of buckets for that replica set:

  1. The rebalancer calculates the etalon number of buckets as if none of the buckets were pinned. Then the rebalancer checks each replica set and compares its etalon number of buckets with the number of pinned buckets in it. If pinned_count < etalon_count, non-locked replica sets with pinned buckets (at this point, all locked replica sets have already been filtered out) can receive new buckets.
  2. If pinned_count > etalon_count, the disbalance cannot be fixed, because the rebalancer cannot move pinned buckets out of this replica set. In this case, the etalon number is set equal to the number of pinned buckets. Replica sets with pinned_count > etalon_count are not processed by the rebalancer any further, and their pinned buckets are subtracted from the total number of buckets. The rebalancer tries to move as many buckets as possible out of such replica sets.
  3. The procedure is restarted from step 1 for replica sets with pinned_count >= etalon_count until pinned_count <= etalon_count holds for all replica sets. The procedure is also restarted when the total number of buckets changes.

The algorithm can be expressed as follows:

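-- Distribute bucket_count among the replica sets not marked rs.ignore, in
-- proportion to their weights; rounding remainders go out in list order.
local function cluster_calculate_perfect_balance(replicasets, bucket_count)
    local weight_sum, assigned = 0, 0
    for _, rs in ipairs(replicasets) do
        if not rs.ignore then weight_sum = weight_sum + rs.weight end
    end
    for _, rs in ipairs(replicasets) do
        if not rs.ignore then
            rs.perfect_bucket_count = math.floor(bucket_count * rs.weight / weight_sum)
            assigned = assigned + rs.perfect_bucket_count
        end
    end
    for _, rs in ipairs(replicasets) do
        if not rs.ignore and assigned < bucket_count then
            rs.perfect_bucket_count = rs.perfect_bucket_count + 1
            assigned = assigned + 1
        end
    end
end

-- cluster: all the non-locked replica sets, each with weight and
-- pinned_bucket_count; bucket_count: the total number of their buckets.
local function calculate_etalon_balance(cluster, bucket_count)
    local can_reach_balance = false
    while not can_reach_balance do
        can_reach_balance = true
        cluster_calculate_perfect_balance(cluster, bucket_count)
        for _, rs in ipairs(cluster) do
            if not rs.ignore and rs.perfect_bucket_count < rs.pinned_bucket_count then
                -- Pinned buckets cannot leave: fix the etalon number at the
                -- pinned count and exclude this replica set from the next pass.
                can_reach_balance = false
                bucket_count = bucket_count - rs.pinned_bucket_count
                rs.perfect_bucket_count = rs.pinned_bucket_count
                rs.ignore = true
            end
        end
    end
end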

The complexity of the algorithm is O(N^2), where N is the number of replica sets. At each step, the algorithm either finishes the calculation or excludes at least one more replica set overloaded with pinned buckets and updates the etalon number of buckets in the other replica sets.

A bucket reference is an in-memory counter that is similar to a bucket pin, with the following differences:

  1. A bucket reference is never persisted. References are intended to forbid bucket transfer while a request is being executed, but all requests are dropped on restart.

  2. There are two types of bucket references: read-only (RO) and read-write (RW).

    If a bucket has RW references, it cannot be moved. However, when the rebalancer needs to send this bucket, it blocks the bucket for new write requests, waits until all current requests are finished, and then sends the bucket.

    If a bucket has RO references, it can be sent but cannot be deleted. Such a bucket can even enter the GARBAGE or SENT status, but its data is kept until the last reader is gone.

    A single bucket can have both RO and RW references.

  3. Bucket references are countable.
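The reference rules above can be modeled with a few counters. This is an illustrative in-memory sketch, not vshard's implementation (in vshard, use vshard.storage.bucket_ref() and bucket_unref(), described below); the table layout and the names new_bucket, ref, unref, can_send, and can_delete are hypothetical.

```lua
-- Hypothetical in-memory model of bucket references.
local function new_bucket()
    return { ref_ro = 0, ref_rw = 0, rw_lock = false }
end

local function ref(bucket, mode)
    if mode == 'write' then
        -- A bucket that is being prepared for sending accepts no new writes.
        if bucket.rw_lock then
            return false
        end
        bucket.ref_rw = bucket.ref_rw + 1
    else
        bucket.ref_ro = bucket.ref_ro + 1
    end
    return true
end

local function unref(bucket, mode)
    if mode == 'write' then
        bucket.ref_rw = bucket.ref_rw - 1
    else
        bucket.ref_ro = bucket.ref_ro - 1
    end
end

-- The bucket may be sent only when no RW references remain.
local function can_send(bucket)
    return bucket.ref_rw == 0
end

-- The bucket data may be deleted only when no RO references remain.
local function can_delete(bucket)
    return bucket.ref_ro == 0
end

local b = new_bucket()
ref(b, 'write')
ref(b, 'read')
b.rw_lock = true        -- the rebalancer wants to send: block new writes
print(ref(b, 'write'))  -- false
print(can_send(b))      -- false: an RW request is still running
unref(b, 'write')
print(can_send(b))      -- true
print(can_delete(b))    -- false: a reader still holds an RO reference
```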

The vshard.storage.bucket_ref/unref() methods are called automatically when vshard.router.call() or vshard.storage.call() is used. For the raw API, such as r = vshard.router.route() followed by r:callro()/r:callrw(), call the bucket_ref() method explicitly inside the function. Also, make sure to call bucket_unref() after bucket_ref(); otherwise, the bucket cannot be moved from the storage until the instance is restarted.

To see the number of bucket references, use vshard.storage.buckets_info([bucket_id]) (the bucket_id parameter is optional).

Example:

vshard.storage.buckets_info(1)
---
- 1:
    status: active
    ref_rw: 1
    ref_ro: 1
    ro_lock: true
    rw_lock: true
    id: 1

Sharded spaces should be defined in a storage application inside box.once() and should have a field with bucket id values. This field should meet the following requirements:

  • The field’s data type can be unsigned, number, or integer.
  • The field must be non-nullable.
  • The field must be indexed by the shard_index. The default name for this index is bucket_id.

In the example below, the bands space has the bucket_id field, which is used to partition a dataset across different storage instances:

box.once('bands', function()
    box.schema.create_space('bands', {
        format = {
            { name = 'id', type = 'unsigned' },
            { name = 'bucket_id', type = 'unsigned' },
            { name = 'band_name', type = 'string' },
            { name = 'year', type = 'unsigned' }
        },
        if_not_exists = true
    })
    box.space.bands:create_index('id', { parts = { 'id' }, if_not_exists = true })
    box.space.bands:create_index('bucket_id', { parts = { 'bucket_id' }, unique = false, if_not_exists = true })
end)

Example on GitHub: sharded_cluster

Note

In a sharded space, uniqueness by secondary indexes is guaranteed only within a single shard, not across the whole cluster.

All DML operations with data should be performed via a router using the vshard.router.call functions, such as vshard.router.callrw() or vshard.router.callro(). For example, a storage application has the insert_band function used to insert new tuples:

function insert_band(id, bucket_id, band_name, year)
    box.space.bands:insert({ id, bucket_id, band_name, year })
end

In a router application, you can define the put function that specifies how a router selects the storage to write data:

function put(id, band_name, year)
    local bucket_id = vshard.router.bucket_id_mpcrc32({ id })
    vshard.router.callrw(bucket_id, 'insert_band', { id, bucket_id, band_name, year })
end

Learn more in Processing requests.

Idempotent requests produce the same result every time they are executed. For example, a data read request or a multiplication by one are both idempotent. In contrast, incrementing by one is a non-idempotent operation: when it is applied again, the field value increases by 2 instead of just 1.
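A minimal Lua illustration of the difference, with a plain table standing in for a tuple (the field name value is hypothetical): applying the idempotent operation twice leaves the result unchanged, while applying the increment twice adds 2.

```lua
-- A plain table stands in for a tuple with a hypothetical 'value' field.
local tuple = { value = 5 }

-- Idempotent: multiplying by one gives the same result however often it runs.
local function multiply_by_one(t) t.value = t.value * 1 end
-- Non-idempotent: each application changes the result again.
local function increment(t) t.value = t.value + 1 end

multiply_by_one(tuple)
multiply_by_one(tuple)  -- a retry
print(tuple.value)      -- 5

increment(tuple)
increment(tuple)        -- a retry
print(tuple.value)      -- 7: the field grew by 2 instead of 1
```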

Note

Any write requests that are intended to be executed repeatedly (for example, retried after an error) should be idempotent. Idempotency ensures that the change is applied only once.

A request may need to be run again if an error occurs on the server or client side. In this case:

  • Read requests can be executed repeatedly. For this purpose, vshard.router.call() (with mode=read) uses the request_timeout parameter (since vshard 0.1.28). It is necessary to pass the request_timeout and timeout parameters together, with the following requirement:

    timeout > request_timeout
    

    For example, if timeout = 10 and request_timeout = 2, within 10 seconds the router is able to make 5 attempts (2 seconds each) to send a request to different replicas until the request finally succeeds.

  • Write requests (vshard.router.callrw()) generally cannot be re-executed without verifying that they have not been applied before. Lack of such a check might lead to duplicate records or unplanned data changes.

    For example, a client has sent a request to the server. The client is waiting for a response within a specified timeout. If the server sends a successful response after this time has elapsed, the client won’t see this response due to a timeout, and will consider the request as failed. When re-executing this request without additional check, the operation may be applied twice.

    A write request can be executed repeatedly without a check in two cases:

    • The request is idempotent.
    • It is known for sure that the previous request raised an error before executing any write operations. For example, the server threw ER_READONLY. In this case, the request could not complete because the server was in read-only mode.
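The retry rules above can be sketched in Lua. This is a simplified model, not vshard's code: the router's behavior is emulated with a fake clock and a caller-supplied send function, and all names (call_read_with_retries, can_retry_write, and the err.code strings) are hypothetical. The read loop reproduces the timeout = 10, request_timeout = 2 example.

```lua
-- Retry a read request until it succeeds or the overall timeout is spent.
-- send(attempt) returns true on success; in this model, each attempt costs
-- request_timeout seconds on a fake clock.
local function call_read_with_retries(send, timeout, request_timeout)
    assert(timeout > request_timeout, 'timeout must exceed request_timeout')
    local elapsed, attempt = 0, 0
    while elapsed + request_timeout <= timeout do
        attempt = attempt + 1
        if send(attempt) then
            return true, attempt
        end
        elapsed = elapsed + request_timeout
    end
    return false, attempt
end

-- A write may be retried without a check only if it is idempotent or is known
-- to have failed before writing anything (for example, ER_READONLY).
local function can_retry_write(err, is_idempotent)
    return is_idempotent or (err ~= nil and err.code == 'ER_READONLY')
end

-- Fails on the first four attempts and succeeds on the fifth.
print(call_read_with_retries(function(n) return n == 5 end, 10, 2))  -- true  5
print(can_retry_write({ code = 'ER_READONLY' }, false))             -- true
print(can_retry_write({ code = 'TIMEOUT' }, false))                 -- false
```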

Deduplication examples

To ensure that the write requests (INSERT, UPDATE, UPSERT, and autoincrement) are idempotent, you should implement a check that the request is applied for the first time.

Note

There is no built-in deduplication check in Tarantool. Currently, deduplication can only be implemented by the user in the application code.

For example, when you add a new tuple to a space, you can use a unique insert ID to check the request. In the example below, within a single transaction:

  1. It is checked whether a tuple with the key ID exists in the bands space.
  2. If there is no tuple with this ID in the space, the tuple is inserted.

box.begin()
if box.space.bands:get{key} == nil then
    box.space.bands:insert{key, value}
end
box.commit()

For update and upsert requests, you can create a deduplication space where the request IDs will be saved. Deduplication space is a user space that contains a list of unique identifiers. Each identifier corresponds to one applied request. This space can have any name, in the example it is called deduplication.

In the example below, within a single transaction:

  1. It is checked whether the deduplication_key request ID exists in the deduplication space.
  2. If there is no such ID, the ID is added to the deduplication space.
  3. If the request hasn’t been applied before, it increments the specified field in the bands space by one.

This approach ensures that each data modification request will be executed only once.

function update_1(deduplication_key, key)
    box.begin()
    if box.space.deduplication:get{deduplication_key} == nil then
        box.space.deduplication:insert{deduplication_key}
        box.space.bands:update(key, {{'+', 'value', 1 }})
    end
    box.commit()
end
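To see why this works, here is the same logic as a plain-Lua model that runs without Tarantool. Tables stand in for the bands and deduplication spaces, the transaction is implicit because the model is single-threaded, and the request IDs and data are hypothetical.

```lua
-- In-memory stand-ins for the spaces.
local bands = { [1] = { value = 0 } }
local deduplication = {}

local function update_1(deduplication_key, key)
    if deduplication[deduplication_key] == nil then
        deduplication[deduplication_key] = true  -- remember the request ID
        bands[key].value = bands[key].value + 1
    end
end

-- The client times out and retries the same request with the same ID.
update_1('req-42', 1)
update_1('req-42', 1)
print(bands[1].value)  -- 1: the retry was deduplicated

update_1('req-43', 1)  -- a new request ID is applied
print(bands[1].value)  -- 2
```

The key design point is that the request ID, not the data, decides whether the change is applied; in Tarantool, both writes must happen in one transaction so that the ID is never recorded without the change, or vice versa.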

If the master of a replica set fails, it is recommended to:

  1. Switch one of the replicas to master mode so that the new master can process all the incoming requests.
  2. Update the configuration of all the cluster members. As a result, all the requests are forwarded to the new master.

In case a whole replica set fails, some part of the dataset becomes inaccessible. Meanwhile, the router tries to reconnect to the master of the failed replica set. This way, once the replica set is up and running again, the cluster is automatically restored.

To perform a scheduled downtime of a replica set master, it is recommended to:

  1. Update the configuration to use another instance as a master.
  2. Reload the configuration on all the instances. All the requests are then forwarded to the new master.
  3. Switch off the old master.

Для проведения запланированной остановки набора реплик рекомендуется:

  1. Migrate all the buckets to the other cluster storages. You can do this by assigning a zero weight to a replica set to initiate migration of its buckets to the remaining cluster nodes.
  2. Обновить конфигурацию всех узлов.
  3. Отключить набор реплик.
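
Before shutting the replica set down, you may want to confirm that all its buckets have actually moved away. A minimal sketch, assuming vshard is used for sharding and that vshard.storage.info() reports bucket counters in its bucket field, including total; run it on the master of the replica set being drained. The function name replicaset_is_drained is hypothetical.

```lua
-- Hypothetical check: returns true when the storage no longer holds any
-- buckets, assuming vshard.storage.info().bucket.total counts every bucket
-- still recorded on this storage, whatever its state.
local function replicaset_is_drained()
    local vshard = require('vshard')
    local bucket = vshard.storage.info().bucket
    -- The second return value lets you inspect per-state counters.
    return bucket.total == 0, bucket
end
```

If the function returns false, wait for the rebalancer to finish moving buckets and check again before disabling the replica set.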

Bucket discovery, bucket recovery, and bucket rebalancing are performed automatically and require no manual intervention.

Technically, several fibers are responsible for different types of actions:

For details, see the Rebalancing process and Migration of buckets sections.

The garbage collector fiber runs in the background on the master storages of each replica set. It deletes the contents of buckets in the GARBAGE state part by part. Once a bucket is empty, its record is deleted from the _bucket system space.

The bucket recovery fiber runs on the master storages. It helps to recover buckets in the SENDING and RECEIVING states after a restart.

Buckets in the SENDING state are recovered as follows:

  1. The system first searches for buckets in the SENDING state.
  2. If such a bucket is found, the system sends a request to the destination replica set.
  3. If the bucket in the destination replica set is in the ACTIVE state, the original bucket is deleted from the source node.

Buckets in the RECEIVING state are deleted without extra checks.

A failover fiber runs on every router. If a master of a replica set becomes unavailable, the failover fiber redirects read requests to the replicas. Write requests are rejected with an error until the master becomes available.

Connections and authentication

This section contains guides on how to configure connections and authentication features.

Connections

To set up a Tarantool cluster, you need to enable communication between its instances, regardless of whether they are running on one host or on different hosts. This requires configuring connection settings that include:

Configuring connection settings is also required to enable communication between a Tarantool cluster and external systems, for example, to administer cluster members using tt, manage clusters using Tarantool Cluster Manager, or use connectors for different languages.

This topic describes how to define connection settings in the iproto section of a YAML configuration.

Note

iproto is a binary protocol used to communicate between cluster instances and with external systems.

To configure URIs used to listen for incoming requests, use the iproto.listen configuration option.

The example below shows how to set a listening IP address for instance001 to 127.0.0.1:3301:

instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'

In the example below, instance001 listens on two URIs that share the same IP address and use different ports:

instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
    - uri: '127.0.0.1:3302'

You can pass only a port value to iproto.listen:

instance001:
  iproto:
    listen:
    - uri: '3301'

In this case, this port is used for all IP addresses the server listens on.

In the Enterprise Edition, you can enable SSL for a connection using the params section of the specified URI:

instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
      params:
        transport: 'ssl'
        ssl_cert_file: 'certs/server.crt'
        ssl_key_file: 'certs/server.key'

Learn more from Securing connections with SSL.

For local development, you can enable communication between cluster members by using Unix domain sockets:

instance001:
  iproto:
    listen:
    - uri: 'unix/:./var/run/{{ instance_name }}/tarantool.iproto'
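
To check that a listening URI is reachable, you can open an iproto connection to it from another Tarantool instance. The sketch below uses the net.box module and assumes that connect() returns a connection object even when the connection fails, in which case ping() returns false. The helper name can_connect is hypothetical, and the URI is assumed to be one of the listen URIs configured above.

```lua
-- Hypothetical helper: returns true if an iproto connection to `uri`
-- can be established and the instance answers a ping.
local function can_connect(uri)
    local net_box = require('net.box')
    local conn = net_box.connect(uri)
    local ok = conn:ping()
    conn:close()
    return ok
end

-- Example: can_connect('127.0.0.1:3301')
```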

Enterprise Edition

SSL is supported by the Enterprise Edition only.

Tarantool supports the use of SSL connections to encrypt client-server communications for increased security. To enable SSL, use the <uri>.params.* options, which can be applied to both listen and advertise URIs.

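The example below demonstrates how to enable traffic encryption by using a self-signed server certificate. The following parameters are specified for each instance:

  • ssl_cert_file: a path to an SSL certificate file.
  • ssl_key_file: a path to a private SSL key file.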

instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'
  instance003:
    iproto:
      listen:
      - uri: '127.0.0.1:3303'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'

You can find the full example here: ssl_without_ca.

The example below demonstrates how to enable traffic encryption by using a server certificate signed by a trusted certificate authority. In this case, all replica set peers verify each other for authenticity.

The following parameters are specified for each instance:

  • ssl_ca_file: a path to a trusted certificate authorities (CA) file.
  • ssl_cert_file: a path to an SSL certificate file.
  • ssl_key_file: a path to a private SSL key file.
  • ssl_password (instance001): a password for an encrypted private SSL key.
  • ssl_password_file (instance002 and instance003): a text file containing passwords for encrypted SSL keys.
  • ssl_ciphers: a colon-separated list of SSL cipher suites the connection can use.

instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance001/server001.crt'
          ssl_key_file: 'certs/instance001/server001.key'
          ssl_password: 'qwerty'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance002/server002.crt'
          ssl_key_file: 'certs/instance002/server002.key'
          ssl_password_file: 'certs/ssl_passwords.txt'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'
  instance003:
    iproto:
      listen:
      - uri: '127.0.0.1:3303'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance003/server003.crt'
          ssl_key_file: 'certs/instance003/server003.key'
          ssl_password_file: 'certs/ssl_passwords.txt'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'

You can find the full example here: ssl_with_ca.

To reload SSL certificate files specified in the configuration, open an admin console and reload the configuration using config:reload():

require('config'):reload()

New certificates are used for new connections. Existing connections keep using the old SSL certificates until a reconnection is required, for example, because of certificate expiry or a network issue.

Credentials

Tarantool enables flexible management of access to various database resources by providing specific privileges to users. You can read more about the main concepts of the Tarantool access control system in the Access control section.

This topic describes how to create users and grant them the specified privileges in the credentials section of a YAML configuration. For example, you can define users with the replication and sharding roles to maintain replication and sharding in a Tarantool cluster.

You can create new users or configure the credentials of existing users in the credentials.users section.

In the example below, a dbadmin user without a password is created:

credentials:
  users:
    dbadmin: {}

To set a password, use the credentials.users.<username>.password option:

credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'

To assign a role to a user, use the credentials.users.<username>.roles option. In this example, the dbadmin user gets privileges granted to the super built-in role:

credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'
      roles: [ super ]

To create a new role, define it in the credentials.roles.* section. In the example below, the writers_space_reader role gets privileges to select data in the writers space:

roles:
  writers_space_reader:
    privileges:
    - permissions: [ read ]
      spaces: [ writers ]

Then, you can assign this role to a user using credentials.users.<username>.roles (sampleuser in the example below):

sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]

You can grant specific privileges directly using credentials.users.<username>.privileges. In this example, sampleuser gets privileges to select and modify data in the books space:

sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]
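
After the configuration is applied, you can check which privileges a user has actually received. The helper below is a hypothetical sketch built on box.schema.user.info(), assuming it returns entries of the form {privileges, object_type, object_name}, where privileges is a comma-separated string such as 'read,write'. It only sees direct grants: a privilege obtained through a role appears as an execute grant on that role, not on the underlying space.

```lua
-- Hypothetical helper: true if `user` has a direct grant of `privilege`
-- on the object; privileges inherited through a role are not expanded.
local function has_privilege(user, privilege, object_type, object_name)
    for _, entry in ipairs(box.schema.user.info(user)) do
        -- Each entry is assumed to be {privileges, object_type, object_name}.
        local privs, otype, oname = entry[1], entry[2], entry[3]
        if otype == object_type and oname == object_name then
            -- privs is a comma-separated list such as 'read,write'.
            for p in privs:gmatch('[^,]+') do
                if p == privilege then
                    return true
                end
            end
        end
    end
    return false
end

-- Usage in the admin console, after the configuration above is applied:
-- has_privilege('sampleuser', 'write', 'space', 'books')
```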

You can find the full example here: credentials.

To revoke a previously granted privilege, remove it from the configuration.

For example, here is how to grant privileges to a space and how to revoke one of the privileges:

# grant privileges:
privileges:
- permissions: [read, write]
  spaces: [books]

# revoke a privilege:
privileges:
- permissions: [read] # !! write permission revoked !!
  spaces: [books]

To revoke the remaining privilege from the space as well, remove it, too, leaving permissions as an empty array:

# empty permissions array:
privileges:
- permissions: [] # !! read permission revoked !!
  spaces: [books]

You can revoke all privileges by making privileges an empty array:

# empty privileges array:
privileges: [] # !! no privileges at all !!

Warning

Do not remove a user or a role from the configuration in order to revoke that user's or role's privileges. If a user or a role is entirely removed from the configuration, it is no longer tracked by the configuration machinery: the user or role is not dropped, and its privileges are not revoked.

Tarantool enables you to load secrets from safe storage such as external files or environment variables. To do this, you need to define corresponding options in the config.context section. In the examples below, context.dbadmin_password and context.sampleuser_password define how to load user passwords from *.txt files or environment variables:

After configuring how to load passwords, you can set password values using credentials.users.<username>.password as follows:

credentials:
  users:
    dbadmin:
      password: '{{ context.dbadmin_password }}'
    sampleuser:
      password: '{{ context.sampleuser_password }}'

You can find the full examples here: credentials_context_file, credentials_context_env.

Authentication

Enterprise Edition

Authentication features are supported by the Enterprise Edition only.

Tarantool Enterprise Edition provides the ability to apply additional restrictions for user authentication. For example, you can specify the minimum time between authentication attempts or turn off access for guest users.

In the configuration below, security.auth_retries is set to 2, which means that Tarantool lets a client try to authenticate with the same username three times. At the fourth attempt, the authentication delay configured with security.auth_delay is enforced. This means that a client should wait 10 seconds after the first failed attempt.

security:
  auth_delay: 10
  auth_retries: 2
  disable_guest: true

The disable_guest option turns off access over remote connections from unauthenticated or guest users.

A password policy allows you to improve database security by enforcing the use of strong passwords, setting up a maximum password age, and so on. When you create a new user with box.schema.user.create or update the password of an existing user with box.schema.user.passwd, the password is checked against the configured password policy settings.

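In the example below, passwords must be at least 16 characters long and contain lowercase letters, uppercase letters, digits, and special characters. A password can be used for at most 365 days, and a new password must differ from the user's 3 previous passwords: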

security:
  password_min_length: 16
  password_enforce_lowercase: true
  password_enforce_uppercase: true
  password_enforce_digits: true
  password_enforce_specialchars: true
  password_lifetime_days: 365
  password_history_length: 3
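
If your application accepts passwords from users, it can reject obviously non-compliant ones before calling box.schema.user.create or box.schema.user.passwd. The function below is a hypothetical client-side pre-check that mirrors the length and character-class options above. It is not Tarantool's own implementation, the server-side check remains authoritative, and treating any ASCII punctuation character as a "special character" is an assumption. password_lifetime_days and password_history_length can only be enforced by the server and are not modeled.

```lua
-- Hypothetical pre-check mirroring the policy options above.
local policy = {
    min_length = 16,        -- password_min_length
    lowercase = true,       -- password_enforce_lowercase
    uppercase = true,       -- password_enforce_uppercase
    digits = true,          -- password_enforce_digits
    specialchars = true,    -- password_enforce_specialchars (assumed: ASCII punctuation)
}

-- Returns true if the password passes, plus a list of detected problems.
local function check_password(password)
    local problems = {}
    if #password < policy.min_length then
        table.insert(problems, 'too short')
    end
    if policy.lowercase and not password:find('%l') then
        table.insert(problems, 'no lowercase letter')
    end
    if policy.uppercase and not password:find('%u') then
        table.insert(problems, 'no uppercase letter')
    end
    if policy.digits and not password:find('%d') then
        table.insert(problems, 'no digit')
    end
    if policy.specialchars and not password:find('%p') then
        table.insert(problems, 'no special character')
    end
    return #problems == 0, problems
end
```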

By default, Tarantool uses the CHAP protocol to authenticate users and applies SHA-1 hashing to passwords. Note that CHAP stores password hashes in the _user space unsalted. If an attacker gains access to the database, they may crack a password, for example, using a rainbow table.

In the Enterprise Edition, you can enable PAP authentication with the SHA256 hashing algorithm. For PAP, a password is salted with a user-unique salt before saving it in the database, which keeps the database protected from cracking using a rainbow table.

To enable PAP, specify the security.auth_type option as follows:

security:
  auth_type: 'pap-sha256'

For new users, the box.schema.user.create method generates authentication data using PAP-SHA256. For existing users, you need to reset a password using box.schema.user.passwd to use the new authentication protocol.

Warning

Given that PAP transmits a password as plain text, Tarantool requires configuring SSL/TLS for a connection.

The example below shows how to specify the authentication protocol using the auth_type parameter when connecting to an instance using net.box:

local connection = require('net.box').connect({
    uri = 'admin:topsecret@127.0.0.1:3301',
    params = { auth_type = 'pap-sha256',
               transport = 'ssl',
               ssl_cert_file = 'certs/server.crt',
               ssl_key_file = 'certs/server.key' }
})

If the authentication protocol isn’t specified explicitly on the client side, the client uses the protocol configured on the server via security.auth_type.

Security

This section contains guides related to security features.

Audit log

Enterprise Edition

The audit module is available in the Enterprise Edition only.

Example on GitHub: audit_log

The audit module allows you to record various events that occur in Tarantool. Each event is an action related to authorization and authentication, data manipulation, administrator activity, or system events.

The module provides detailed reports of these activities and helps you find and fix breaches to protect your business. For example, you can see who created a new user and when.

It is up to each company to decide exactly which activities to audit and what actions to take. System administrators, security engineers, and people in charge of the company may want to audit different events for different reasons, and Tarantool lets each of them select the events they need.

This section describes how to enable and configure audit logging and write logs to a selected destination: a file, a pipe, or a system logger.

Read more: Audit log configuration reference.

To enable audit logging, define the log location using the audit_log.to option in the configuration file. Possible log locations: file, pipe, syslog, and devnull (audit logging disabled).

In the configuration below, the audit_log.to option is set to file. It means that the logs are written to a file. By default, audit logs are saved in the var/log/{{ instance_name }}/audit.log file. To specify the path to an audit log file explicitly, use the audit_log.file option.

audit_log:
  to: file
  file: 'audit_tarantool.log'

If you log to a file, Tarantool reopens the audit log at SIGHUP.

To disable audit logging, set the audit_log.to option to devnull.

Tarantool’s extensive filtering options help you write only the events you need to the audit log. To select the recorded events, use the audit_log.filter option. Its value can be a list of events and event groups. You can customize the filters and use different combinations of them for your purposes. Possible filtering options:

  • Filter by event. You can set a list of events to be recorded. For example, select password_change to monitor the users who have changed their passwords:

    audit_log:
      filter: [ password_change ]
    
  • Filter by group. You can specify a list of event groups to be recorded. For example, select auth and priv to see the events related to authorization and granted privileges:

    audit_log:
      filter: [ auth,priv ]
    
  • Filter by group and event. You can specify a group and a certain event depending on the purpose. In the configuration below, user_create, data_operations, ddl, and custom are selected to see the events related to:

    • user creation
    • space creation, altering, and dropping
    • data modification or selection from spaces
    • custom events (any events added manually using the audit module API)
    audit_log:
      filter: [ user_create,data_operations,ddl,custom ]
    

Use the audit_log.format option to choose the format of audit log events – plain text, CSV, or JSON.

format: json

JSON is used by default. It is the most convenient format for receiving log events, analyzing them, and integrating them with other systems if needed. The plain format can be compressed efficiently. The CSV format lets you view audit log events in tabular form.

The audit_log.spaces option is used to specify a list of space names for which data operation events should be logged.

In the configuration below, only the events from the bands space are logged:

spaces: [ bands ]

If set to true, the audit_log.extract_key option forces the audit subsystem to log the primary key instead of a full tuple in DML operations.

extract_key: true

In this example, the following audit log configuration is used:

audit_log:
  to: file
  file: 'audit_tarantool.log'
  filter: [ user_create,data_operations,ddl,custom ]
  format: json
  spaces: [ bands ]
  extract_key: true

Create the bands space and then check the audit log file:

box.schema.space.create('bands')

The audit log entry for the space_create event might look as follows:

{
  "time": "2024-01-24T11:43:21.566+0300",
  "uuid": "26af0a7d-1052-490a-9946-e19eacc822c9",
  "severity": "INFO",
  "remote": "unix/:(socket)",
  "session_type": "console",
  "module": "tarantool",
  "user": "admin",
  "type": "space_create",
  "tag": "",
  "description": "Create space Bands"
}

Then insert a tuple into the space:

box.space.bands:insert { 1, 'Roxette', 1986 }

If the extract_key option is set to true, the audit system prints the primary key instead of the full tuple:

{
  "time": "2024-01-24T11:45:42.358+0300",
  "uuid": "b437934d-62a7-419a-8d59-e3b33c688d7a",
  "severity": "VERBOSE",
  "remote": "unix/:(socket)",
  "session_type": "console",
  "module": "tarantool",
  "user": "admin",
  "type": "space_insert",
  "tag": "",
  "description": "Insert key [1] into space bands"
}

If the extract_key option is set to false, the audit system prints the full tuple like this:

{
  "time": "2024-01-24T11:45:42.358+0300",
  "uuid": "b437934d-62a7-419a-8d59-e3b33c688d7a",
  "severity": "VERBOSE",
  "remote": "unix/:(socket)",
  "session_type": "console",
  "module": "tarantool",
  "user": "admin",
  "type": "space_insert",
  "tag": "",
  "description": "Insert tuple [1, \"Roxette\", 1986] into space bands"
}

The Tarantool audit log module can record various events that you can monitor to decide whether you need to take action:

  • Administrator activity – events related to actions performed by the administrator. For example, such logs record the creation of a user.
  • Access events – events related to authorization and authentication of users. For example, such logs record failed attempts to access secure data.
  • Data access and modification – events of data manipulation in the storage.
  • System events – events related to modification or configuration of resources. For example, such logs record the replacement of a space.
  • Custom events – any events added manually using the audit module API.

The full list of available audit log events is provided in the table below:

Event Event type Severity level Example
Audit log enabled for events audit_enable VERBOSE  
Custom events custom INFO (default)  
User authorized successfully auth_ok VERBOSE Authenticate user <USER>
User authorization failed auth_fail ALARM Failed to authenticate user <USER>
User logged out or quit the session disconnect VERBOSE Close connection
User created user_create INFO Create user <USER>
User dropped user_drop INFO Drop user <USER>
Role created role_create INFO Create role <ROLE>
Role dropped role_drop INFO Drop role <ROLE>
User disabled user_disable INFO Disable user <USER>
User enabled user_enable INFO Enable user <USER>
User granted rights user_grant_rights INFO Grant <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> to user <USER>
User revoked rights user_revoke_rights INFO Revoke <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> from user <USER>
Role granted rights role_grant_rights INFO Grant <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> to role <ROLE>
Role revoked rights role_revoke_rights INFO Revoke <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> from role <ROLE>
User password changed password_change INFO Change password for user <USER>
Failed attempt to access secure data (for example, personal records, details, geolocation) access_denied ALARM <ACCESS_TYPE> denied to <OBJECT_TYPE> <OBJECT_NAME>
Expressions with arguments evaluated in a string eval INFO Evaluate expression <EXPR>
Function called with arguments call VERBOSE Call function <FUNCTION> with arguments <ARGS>
Iterator key selected from space.index space_select VERBOSE Select <ITER_TYPE> <KEY> from <SPACE>.<INDEX>
Space created space_create INFO Create space <SPACE>
Space altered space_alter INFO Alter space <SPACE>
Space dropped space_drop INFO Drop space <SPACE>
Tuple inserted into space space_insert VERBOSE Insert tuple <TUPLE> into space <SPACE>
Tuple replaced in space space_replace VERBOSE Replace tuple <TUPLE> with <NEW_TUPLE> in space <SPACE>
Tuple deleted from space space_delete VERBOSE Delete tuple <TUPLE> from space <SPACE>

Note

The eval event records data from the console module and from the eval function of the net.box module. For more on how they work, see Module console and Module net.box – eval. To tell these sources apart, check the session_type field: console or binary.

Each audit log event contains a number of fields that can be used to filter and aggregate the resulting logs. An example of a Tarantool audit log entry in JSON:

{
    "time": "2024-01-15T13:39:36.046+0300",
    "uuid": "cb44fb2b-5c1f-4c4b-8f93-1dd02a76cec0",
    "severity": "VERBOSE",
    "remote": "unix/:(socket)",
    "session_type": "console",
    "module": "tarantool",
    "user": "admin",
    "type": "auth_ok",
    "tag": "",
    "description": "Authenticate user Admin"
}

Each event consists of the following fields:

Field Description Example
time Time of the event 2024-01-15T16:33:12.368+0300
uuid Since 3.0.0. A unique identifier of the audit log event cb44fb2b-5c1f-4c4b-8f93-1dd02a76cec0
severity Since 3.0.0. A severity level. Each system audit event has a severity level determined by its importance. Custom events have the INFO severity level by default. VERBOSE
remote Remote host that triggered the event unix/:(socket)
session_type Session type console
module Audit log module. Set to tarantool for system events; can be overwritten for custom events tarantool
user User who triggered the event admin
type Audit event type auth_ok
tag A text field that can be overwritten by the user  
description Human-readable event description Authenticate user Admin

Built-in event groups are used to filter the event types that you want to audit. For example, you can record only authorization events or only events related to spaces.

Tarantool provides the following event groups:

  • all – all events.

    Note

    Events call and eval are included only in the all group.

  • audit – audit_enable event.

  • auth – authorization events: auth_ok, auth_fail.

  • priv – events related to authentication, authorization, users, and roles: user_create, user_drop, role_create, role_drop, user_enable, user_disable, user_grant_rights, user_revoke_rights, role_grant_rights, role_revoke_rights.

  • ddl – events of space creation, altering, and dropping: space_create, space_alter, space_drop.

  • dml – events of data modification in spaces: space_insert, space_replace, space_delete.

  • data_operations – events of data modification or selection from spaces: space_select, space_insert, space_replace, space_delete.

  • compatibility – events available in Tarantool before version 2.10.0: auth_ok, auth_fail, disconnect, user_create, user_drop, role_create, role_drop, user_enable, user_disable, user_grant_rights, user_revoke_rights, role_grant_rights, role_revoke_rights, password_change, access_denied. This group enables compatibility with earlier Tarantool versions.
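
To see which event types a given audit_log.filter value records, the groups above can be modeled as a lookup table. A minimal sketch in plain Lua: the names groups and filter_records are hypothetical, the group contents are copied from the list above, and any filter entry that is not a group name is treated as a single event type.

```lua
-- Event groups as listed above; 'all' is handled separately.
local groups = {
    audit = {'audit_enable'},
    auth = {'auth_ok', 'auth_fail'},
    priv = {'user_create', 'user_drop', 'role_create', 'role_drop',
            'user_enable', 'user_disable', 'user_grant_rights',
            'user_revoke_rights', 'role_grant_rights', 'role_revoke_rights'},
    ddl = {'space_create', 'space_alter', 'space_drop'},
    dml = {'space_insert', 'space_replace', 'space_delete'},
    data_operations = {'space_select', 'space_insert', 'space_replace',
                       'space_delete'},
    compatibility = {'auth_ok', 'auth_fail', 'disconnect', 'user_create',
                     'user_drop', 'role_create', 'role_drop', 'user_enable',
                     'user_disable', 'user_grant_rights', 'user_revoke_rights',
                     'role_grant_rights', 'role_revoke_rights',
                     'password_change', 'access_denied'},
}

-- Returns true if a filter list (as in audit_log.filter) records event_type.
local function filter_records(filter, event_type)
    for _, name in ipairs(filter) do
        -- 'all' selects every event; otherwise match the event name directly.
        if name == 'all' or name == event_type then
            return true
        end
        -- Expand group names into their member events.
        for _, member in ipairs(groups[name] or {}) do
            if member == event_type then
                return true
            end
        end
    end
    return false
end
```

For example, with the filter [ user_create,data_operations,ddl,custom ], space_select and space_drop are recorded, while auth_ok is not.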

Warning

Be careful when recording the all and data_operations event groups. The more events you record, the slower requests are processed. It is recommended that you select only those groups whose events your company needs to monitor and analyze.

Tarantool provides an API for writing custom audit log events. To enable these events, specify the custom value in the audit_log.filter option:

filter: [ user_create,data_operations,ddl,custom ]

To log an event, use the audit.log() function that takes one of the following values:

  • Message string. Printed to the audit log with type message:

    audit.log('Hello, Alice!')
    
  • Format string and arguments. Passed to string.format and then written to the audit log with type message:

    audit.log('Hello, %s!', 'Bob')
    
  • Table with audit log field values. The table must contain at least one field – description.

    audit.log({ type = 'custom_hello', description = 'Hello, World!' })
    audit.log({ type = 'custom_farewell', user = 'eve', module = 'custom', description = 'Farewell, Eve!' })
    

Alternatively, you can use audit.new() to create a new log module. This allows you to avoid passing all custom audit log fields each time audit.log() is called. The audit.new() function takes a table of audit log field values (same as audit.log()). The type of the log module for writing custom events must either be message or have the custom_ prefix.

local my_audit = audit.new({ type = 'custom_hello', module = 'my_module' })
my_audit:log('Hello, Alice!')
my_audit:log({ tag = 'admin', description = 'Hello, Bob!' })

It is possible to overwrite most of the custom audit log fields using audit.new() or audit.log(). The only audit log field that cannot be overwritten is time.

audit.log({ type = 'custom_hello', description = 'Hello!',
            session_type = 'my_session', remote = 'my_remote' })

If omitted, session_type is set to the current session type, and remote is set to the remote peer address.

Note

To avoid confusion with system events, the value of the type field must either be message (default) or begin with the custom_ prefix. Otherwise, you get an error. Custom events are filtered out by default.
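
Because an invalid type value makes audit.log() fail, you may want to validate custom event types in one place. A hypothetical sketch: is_valid_custom_type and log_custom are illustrative names, not part of the audit module, and the module is assumed to be loaded with require('audit').

```lua
-- Hypothetical guard for custom audit events: the type must be 'message'
-- (also used when the type is omitted) or start with 'custom_'.
local function is_valid_custom_type(event_type)
    if event_type == nil or event_type == 'message' then
        return true
    end
    return type(event_type) == 'string'
        and event_type:sub(1, #'custom_') == 'custom_'
end

-- Validate first, then pass the fields to audit.log() unchanged.
local function log_custom(fields)
    if not is_valid_custom_type(fields.type) then
        error('invalid custom audit event type: ' .. tostring(fields.type))
    end
    require('audit').log(fields)
end
```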

By default, custom events have the INFO severity level. To override the level, you can:

  • specify the severity field
  • use a shortcut function

The following shortcuts are available:

Shortcut Equivalent
audit.verbose(...) audit.log({severity = 'VERBOSE', ...})
audit.info(...) audit.log({severity = 'INFO', ...})
audit.warning(...) audit.log({severity = 'WARNING', ...})
audit.alarm(...) audit.log({severity = 'ALARM', ...})

Example

audit.log({ severity = 'VERBOSE', description = 'Hello!' })

If you write to a file, the size of the Tarantool audit log is limited by the disk space. If you write to a system logger, the size of the audit log is limited by the system logger. If you write to a pipe, the size of an audit message is limited by the system buffer when audit_log.nonblock = false; when audit_log.nonblock = true, there is no limit.

Consider setting up an audit log review schedule in your company. It is recommended to review audit logs at least every 3 months.

It is recommended to store audit logs for at least one year.

It is recommended to use SIEM systems to collect and analyze audit logs.

Security audit

This document will help you audit the security of a Tarantool cluster. It explains certain security aspects, their rationale, and the ways to check them. For details on how to configure Tarantool Enterprise Edition and its infrastructure for each aspect, refer to the security hardening guide.

Tarantool uses the iproto binary protocol for replicating data between instances and also in the connector libraries.

Since version 2.10.0, the Enterprise Edition has built-in support for using SSL to encrypt client-server communications over binary connections. For details on enabling SSL encryption, see the Securing connections with SSL section of this document.

If built-in encryption is not enabled, we recommend using a VPN to secure data exchange between data centers.

If the Tarantool cluster does not use iproto for external requests, connections to the iproto ports should be allowed only between Tarantool instances.

For more details on configuring iproto ports, see the advertise_uri section in the Cartridge documentation.

A Tarantool instance can accept HTTP connections from external sources or when the administrator web interface is accessed. All such connections must go through a web server with HTTPS running on the same host (for example, nginx). This requirement applies to both virtual and physical hosts. Routing HTTP traffic through several separate hosts with HTTPS termination is not secure enough.

Tarantool accepts HTTP connections on a specific port. This port must be available only on the same host, so that nginx can connect to it.

Make sure that the configured HTTP port is closed and the HTTPS port (443 by default) is open.

The console module lets you connect to a running instance and run custom Lua code. This is useful for developers and administrators. The following examples show how to open a connection on a TCP port and on a UNIX socket.

local console = require('console')
console.listen(3313)                                   -- TCP port (3313 is an example value)
console.listen('/var/lib/tarantool/socket_name.sock')  -- UNIX socket

Opening the administrative console on a TCP port is always unsafe. Make sure that the code contains no calls of the form console.listen(<port_number>).
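
To look for such calls during a review, a plain text search is often enough. The hypothetical helper below flags console.listen() calls whose argument is a bare port or a host:port pair. It is a rough pattern match, not a Lua parser: calls written as require('console').listen(...), split across lines, or built from variables are not detected.

```lua
-- Hypothetical review helper: returns the arguments of console.listen()
-- calls in `source` that look like a TCP port or host:port.
local function find_tcp_console_listen(source)
    local findings = {}
    for arg in source:gmatch('console%.listen%s*%(([^%)]*)%)') do
        -- Drop surrounding whitespace and quotes from the argument.
        local value = arg:match('^%s*[\'"]?(.-)[\'"]?%s*$')
        -- A bare port (3313) or host:port ('localhost:3313') means TCP.
        if value:match('^%d+$') or value:match(':%d+$') then
            table.insert(findings, value)
        end
    end
    return findings
end
```

Unix socket paths such as '/var/lib/tarantool/socket_name.sock' are not reported, because they match neither form.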

Connecting via a socket requires the write permission on the /var/lib/tarantool directory. Make sure that this permission is granted only to the tarantool user.

Connecting to the instance with tt connect or tarantoolctl connect without user credentials (under the guest user) must be disabled.

There are two ways to check this vulnerability:

For more details, refer to the documentation on access control.

Using the web interface must require logging in with a username and password.

All Tarantool instances must run as the tarantool user.

The tarantool user must not have sudo privileges. It also must not have a password, to prevent logging in via SSH or su.

For reliable backups, a Tarantool instance must keep at least the two latest snapshots. Make sure to check every instance.

The number of kept snapshots is determined by box.cfg.checkpoint_count (snapshot.count in a YAML configuration). Configuration values are primarily set in the configuration files but can be overridden with environment variables and command-line arguments. So, it's best to check both the values in the configuration files and the actual values using the console:

tarantool> box.cfg.checkpoint_count
---
- 2

Tarantool records all incoming data in the write-ahead log (WAL). WAL must be enabled so that this data can be recovered if the instance restarts.

Secure values of the wal.mode configuration option are write and fsync:

wal:
  dir: 'var/lib/{{ instance_name }}/wals'
  mode: 'write'

An exception to this requirement is an instance that processes data that can be freely discarded, for example, when Tarantool is used for caching. In this case, WAL can be disabled to reduce I/O load.

The logging level must be 5 (INFO), 6 (VERBOSE), or 7 (DEBUG). Then, in case of a security breach, the application logs contain enough information to investigate the incident.

tarantool> box.cfg.log_level
---
- 5

For the full list of available levels, see the log_level reference.
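The snapshot, WAL, and logging checks above can also be scripted against the live box.cfg values. This is a minimal sketch that assumes the box.cfg field names checkpoint_count, wal_mode, and log_level; audit_cfg() and its cache_only option are hypothetical helpers, not part of Tarantool. String log levels other than info, verbose, and debug are simply treated as failing.

```lua
-- Hypothetical helper (not part of Tarantool): checks a box.cfg-like
-- table against the snapshot, WAL, and logging items of this checklist.
local LEVEL_BY_NAME = { info = 5, verbose = 6, debug = 7 }

local function audit_cfg(cfg, opts)
    opts = opts or {}
    local problems = {}

    -- At least the two most recent snapshots must be kept
    if (tonumber(cfg.checkpoint_count) or 0) < 2 then
        table.insert(problems, 'checkpoint_count should be at least 2')
    end

    -- WAL must be on, unless the instance only holds data that can be
    -- discarded (for example, a cache) and the caller says so
    local wal_on = cfg.wal_mode == 'write' or cfg.wal_mode == 'fsync'
    if not wal_on and not opts.cache_only then
        table.insert(problems, "wal_mode should be 'write' or 'fsync'")
    end

    -- The log level must be 5 (INFO), 6 (VERBOSE), or 7 (DEBUG)
    local level = cfg.log_level
    if type(level) == 'string' then
        level = LEVEL_BY_NAME[level]
    end
    if type(level) ~= 'number' or level < 5 then
        table.insert(problems, 'log_level should be 5 (INFO) or higher')
    end

    return problems
end
```

On a running instance, the live table can be passed in, as in audit_cfg(box.cfg); an empty result means these three checks pass. This complements, rather than replaces, reviewing the configuration files.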

Tarantool should use journald for logging.

Security hardening guide

This guide explains how to enhance security in your Tarantool Enterprise Edition cluster using built-in features and provides general recommendations on security hardening. If you need to perform a security audit of a Tarantool Enterprise cluster, refer to the security checklist.

Tarantool Enterprise Edition does not provide a dedicated API for security control. All the necessary configurations can be done via an administrative console or initialization code.

Tarantool Enterprise has the following built-in security features:

Tarantool Enterprise supports password-based authentication and allows two types of connections:

For more information on authentication and connection types, see the Security section in Administration.

In addition, Tarantool provides the following functionality:

For administrators, Tarantool Enterprise provides the means to prevent unauthorized access to the database and to certain functions.

Tarantool distinguishes between:

The following system spaces are used to store users and privileges:

For more information, see the Access control section.

Users who create objects (spaces, indexes, users, roles, sequences, and functions) in the database become their owners and automatically acquire privileges for what they create. For more information, see the Owners and privileges section.

Tarantool Enterprise has a built-in audit log that records events such as:

The audit log contains:

You can configure the following audit log options:

For more information on logging, see the following sections:

Permissions on log files can be configured with chmod, as for any other Unix file system object.

This section provides recommendations that can help you improve the security of your cluster.

Since version 2.10.0, Tarantool Enterprise Edition has built-in support for using SSL to encrypt the client-server communications over binary connections, that is, between Tarantool instances in a cluster. For details on enabling SSL encryption, see the Securing connections with SSL section of this guide.

If built-in encryption is not enabled for particular connections, consider the following options:

  • setting up connection tunneling, or
  • encrypting the data stored in the database itself.

For more information on data encryption, see the crypto module reference.

The HTTP server module, available as a rock, does not support HTTPS. To set up a secure connection for a client (e.g., a REST service), consider hiding the Tarantool instance (the router, in the case of a cluster) behind an Nginx server and setting up an SSL certificate for it.

To make sure that no information can be intercepted "from the wild", run Nginx on the same physical server as the instance and set up their communication over a Unix socket. For more information, see the socket module reference.

To protect the cluster from unwanted network activity "from the wild", configure the firewall on each server to allow traffic only on the ports listed in Network requirements.

If you use static IP addresses, whitelist them on every server, since the cluster uses a full mesh topology. It is recommended to blacklist all other addresses on all servers except the router (which runs behind the Nginx server).

Tarantool Enterprise does not provide protection against DoS or DDoS attacks. Use third-party tools for this purpose.

Tarantool Enterprise Edition does not keep checksums or provide the means to control data integrity. However, it ensures data persistence using a write-ahead log, regularly snapshots the entire data set to disk, and checks the data format whenever it reads the data back from the disk. For more information, see the Data persistence section.

Triggers

Triggers, also called callbacks, are functions that the server executes when certain events occur.

To associate an event with a callback function, pass the callback to the corresponding on_event function:

The server then saves the callback function and calls it whenever the corresponding event occurs.

All triggers have the following characteristics:

Example:

Here we write connection, authentication, and disconnection events to the log on the Tarantool server.

log = require('log')

function on_connect_impl()
 log.info("connected "..box.session.peer()..", sid "..box.session.id())
end

function on_disconnect_impl()
 log.info("disconnected, sid "..box.session.id())
end

function on_auth_impl(user)
 log.info("authenticated sid "..box.session.id().." as "..user)
end

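-- Wrap the handlers in pcall() so that an error raised inside a
-- handler (for example, when concatenating a nil value) is caught
-- here instead of propagating out of the trigger
function on_connect() pcall(on_connect_impl) end
function on_disconnect() pcall(on_disconnect_impl) end
function on_auth(user) pcall(on_auth_impl, user) end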

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)
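The example wraps each handler in pcall. The toy model below, in plain Lua, shows why such a wrapper matters when several callbacks share an event: in this model, an unprotected error escapes the caller and the remaining callbacks never run. The registry and the on_event(), fire(), and safe() functions are hypothetical names for illustration; this is not how Tarantool stores triggers internally.

```lua
-- Toy trigger registry in plain Lua (NOT Tarantool's internals)
local triggers = {}

-- Register a callback, in the spirit of box.session.on_connect(f)
local function on_event(event, callback)
    triggers[event] = triggers[event] or {}
    table.insert(triggers[event], callback)
end

-- What the "server" does when the event happens: call every callback.
-- An unprotected error here escapes fire() and skips the rest.
local function fire(event, ...)
    for _, callback in ipairs(triggers[event] or {}) do
        callback(...)
    end
end

-- The same wrapping as on_connect(), on_disconnect(), and on_auth() above
local function safe(f)
    return function(...)
        pcall(f, ...)
    end
end

local seen = {}
on_event('connect', safe(function() error('log backend is down') end))
on_event('connect', safe(function(sid) table.insert(seen, sid) end))
fire('connect', 42)  -- the first callback fails, the second one still runs
```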

Applications

Using Tarantool as an application server, you can write your own applications. Tarantool’s native language for writing applications is Lua, so a typical application would be a file that contains your Lua script. But you can also write applications in C or C++.

Launching an application


Note

If you’re new to Lua, we recommend going over the interactive Tarantool tutorial before proceeding with this chapter. To launch the tutorial, run tutorial() in the Tarantool console:

tarantool> tutorial()
---
- |
 Tutorial -- Screen #1 -- Hello, Moon
 ====================================

 Welcome to the Tarantool tutorial.
 It will introduce you to Tarantool’s Lua application server
 and database server, which is what’s running what you’re seeing.
 This is INTERACTIVE -- you’re expected to enter requests
 based on the suggestions or examples in the screen’s text.
 <...>

Let’s create and launch our first Lua application for Tarantool. Here’s the simplest Lua application, the good old "Hello, world!":

#!/usr/bin/env tarantool
print('Hello, world!')

We save it to a file named myapp.lua in the current directory.

Now let’s discuss how we can launch our application with Tarantool.

If we run Tarantool in a Docker container, the following command will start Tarantool without any application:

$ # create a temporary container and run it in interactive mode
$ docker run --rm -t -i tarantool/tarantool:latest

To run Tarantool with our application, we can say:

$ # create a temporary container and
$ # launch Tarantool with our application
$ docker run --rm -t -i \
             -v `pwd`/myapp.lua:/opt/tarantool/myapp.lua \
             -v /data/dir/on/host:/var/lib/tarantool \
             tarantool/tarantool:latest tarantool /opt/tarantool/myapp.lua

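Here two resources on the host get mounted in the container:

  • our application file myapp.lua from the current directory, mounted as /opt/tarantool/myapp.lua;
  • the host directory /data/dir/on/host, mounted as the data directory /var/lib/tarantool.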

By convention, the directory for Tarantool application code inside a container is /opt/tarantool, and the directory for data is /var/lib/tarantool.

If we run Tarantool from a package or from a source build, there are several ways to launch our application.

The simplest way is to pass the filename to Tarantool at start:

$ tarantool myapp.lua
Hello, world!
$

Tarantool starts, executes our script in script mode, and exits.

Now let’s turn this script into a server application. We use box.cfg from Tarantool’s built-in box module to:

  • start the database;
  • make Tarantool listen for incoming connections on port 3301.

We also add some simple database logic, using space.create() and create_index() to create a space with a primary index. We use the function box.once() to make sure that our logic will be executed only once when the database is initialized for the first time, so we don’t try to create an existing space or index on each invocation of the script:

#!/usr/bin/env tarantool
print('Hello, world!')
-- Configure database
box.cfg {
   listen = 3301
}
box.once("bootstrap", function()
   box.schema.space.create('tweedledum')
   box.space.tweedledum:create_index('primary',
       { type = 'TREE', parts = {1, 'unsigned'}})
end)

Now we launch our application in the same manner as before:

$ tarantool myapp.lua
Hello, world!
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> version 2.1.0-429-g4e5231702
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> log level 5
2017-08-11 16:07:14.251 [41436] main/101/myapp.lua I> mapping 1073741824 bytes for tuple arena...
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovery start
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovering from `./00000000000000000000.snap'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.272 [41436] main/102/hot_standby I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.274 [41436] iproto/102/iproto I> binary: started
2017-08-11 16:07:14.275 [41436] iproto/102/iproto I> binary: bound to [::]:3301
2017-08-11 16:07:14.275 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.278 [41436] main/101/myapp.lua I> ready to accept requests

This time, Tarantool executes our script and keeps working as a server, accepting TCP requests on port 3301. We can see Tarantool in the current session’s process list:

$ ps | grep "tarantool"
  PID TTY           TIME CMD
41608 ttys001       0:00.47 tarantool myapp.lua <running>

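But the Tarantool instance will stop if we close the current terminal window. To detach Tarantool and our application from the terminal window, we can launch it in daemon mode. To do so, we add some parameters to box.cfg{}:

  • background = true, to run the instance in the background;
  • log, a file to write the log to;
  • pid_file, a file to store the process ID in.

The log and pid_file parameters must be set for background mode to work.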

For example:

box.cfg {
   listen = 3301,
   background = true,
   log = '1.log',
   pid_file = '1.pid'
}

We launch our application in the same manner as before:

$ tarantool myapp.lua
Hello, world!
$

Tarantool executes our script, gets detached from the current shell session (you won’t see it with ps | grep "tarantool") and continues working in the background as a daemon attached to the global session (with SID = 0):

$ ps -ef | grep "tarantool"
  PID SID     TIME  CMD
42178   0  0:00.72 tarantool myapp.lua <running>

Now that we have discussed how to create and launch a Lua application for Tarantool, let’s dive deeper into programming practices.

Application roles

An application role is a Lua module that implements specific functions or logic. You can turn a particular role on or off for certain instances in a configuration without restarting these instances. A role is run when a configuration is loaded or reloaded.

Roles can be divided into the following groups:

This section describes how to develop custom roles. To learn how to enable and configure roles, see Enabling and configuring roles.

Note

Don’t confuse application roles with other role types:

A custom role can be configured in the same way as roles provided by Tarantool or third-party Lua modules. You can learn more from Enabling and configuring roles.

This example shows how to enable and configure the greeter role, which is implemented in the next section:

instance001:
  roles: [ greeter ]
  roles_cfg:
    greeter:
      greeting: 'Hi'

The role configuration provided in roles_cfg can be accessed when validating and applying this configuration.

Tarantool includes the experimental.config.utils.schema built-in module that provides tools for managing user-defined configurations of applications (app.cfg) and roles (roles_cfg). The examples below show its basic usage.

Given that a role is a Lua module, a role name is passed to require() to obtain the module. When developing an application, you can place a file with the role code next to the cluster configuration file.

A custom application role is an object that implements custom functions or logic in addition to Tarantool’s built-in roles and roles provided by third-party Lua modules. For example, a logging role can be created to add logging functionality on top of the built-in one.

Creating a custom role includes the following steps:

  1. (Optional) Define the role configuration schema.
  2. Define a function that validates a role configuration.
  3. Define a function that applies a validated configuration.
  4. Define a function that stops a role.
  5. (Optional) Define the roles that this custom role depends on.
  6. (Optional) Define the on_event callback function.

As a result, a role module should return an object with the corresponding functions and fields:

return {
    validate = function(cfg) --[[ ... ]] end,
    apply = function(cfg) --[[ ... ]] end,
    stop = function() --[[ ... ]] end,
    dependencies = { --[[ ... ]] },
    on_event = function(config, key, value)
        local log = require('log')
        -- tostring() is needed: '..' cannot concatenate nil or booleans
        log.info('roles_cfg.my_role.foo: ' .. tostring(config.foo))
        log.info('on_event is triggered by ' .. key)
        log.info('is_ro: ' .. tostring(value.is_ro))
    end,
}

The examples in this article show how to do this.

You can omit the optional steps and get a simple role as in the example below.

return {
    validate = function(cfg) --[[ ... ]] end,
    apply = function(cfg) --[[ ... ]] end,
    stop = function() --[[ ... ]] end,
}

You can modify a role, for example, by adding dependencies or specifying the on_event callback. If you modify a role, you need to restart the Tarantool instance with the role in order to apply the changes.

Note

  • Code snippets shown in this section are included from the following application: application_role_cfg.

The experimental.config.utils.schema built-in module provides the schema_object class. An object of this class defines a custom configuration schema of a role or an application.

This example shows how to define a schema that reflects the role configuration shown above:

local schema = require('experimental.config.utils.schema')

local greeter_schema = schema.new('greeter', schema.record({
    greeting = schema.scalar({
        type = 'string',
        allowed_values = { 'Hi', 'Hello' }
    })
}))

If you don’t use the module, skip this step. In this case, use the cfg argument of the role’s validate() and apply() functions to refer to its configuration values, for example, cfg.greeting.
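Without the schema module, validate() has to check the values itself. A minimal sketch, assuming the same two allowed greetings as the schema above; the error messages, the 'Hi' fallback, and the use of print() instead of the log module (so that the snippet also runs in plain Lua) are illustrative choices, not Tarantool behavior.

```lua
-- greeter role without experimental.config.utils.schema (a sketch)
local ALLOWED_GREETINGS = { Hi = true, Hello = true }

local function validate(cfg)
    -- roles_cfg.greeter may be omitted entirely
    if cfg == nil then
        return
    end
    if type(cfg) ~= 'table' then
        error('roles_cfg.greeter must be a map', 0)
    end
    if cfg.greeting ~= nil and not ALLOWED_GREETINGS[cfg.greeting] then
        error(string.format('roles_cfg.greeter.greeting: unexpected value %s',
                            tostring(cfg.greeting)), 0)
    end
end

local function apply(cfg)
    -- The 'Hi' fallback is an assumption of this sketch
    local greeting = (cfg ~= nil and cfg.greeting) or 'Hi'
    print(greeting .. " from the 'greeter' role!")
end

local function stop() end

-- In greeter.lua, the module would end with:
-- return { validate = validate, apply = apply, stop = stop }
local greeter = { validate = validate, apply = apply, stop = stop }
```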

To validate a role configuration, you need to define the validate([cfg]) function.

In the example below, the validate() function of the role configuration schema is used to validate the greeting value:

local function validate(cfg)
    greeter_schema:validate(cfg)
end

If the configuration is not valid, validate() reports an unrecoverable error by throwing an error object.

To apply the validated configuration, define the apply([cfg]) function. Like validate(), apply() provides access to a role’s configuration through the cfg argument.

In the example below, the apply() function uses the log module to write a value from the role configuration to the log:

local log = require('log').new("greeter")

local function apply(cfg)
    log.info("%s from the 'greeter' role!", greeter_schema:get(cfg, 'greeting'))
end

To stop a role, define the stop() function.

In the example below, the stop() function uses the log module to indicate that a role is stopped:

local function stop()
    log.info("The 'greeter' role is stopped")
end

When you’ve defined all the role functions, return an object with the corresponding functions:

return {
    validate = validate,
    apply = apply,
    stop = stop,
}

To define a role’s dependencies, use the dependencies field. In this example, the byeer role has the greeter role as the dependency:

-- byeer.lua --
local log = require('log').new("byeer")

return {
    dependencies = { 'greeter' },
    validate = function() end,
    apply = function() log.info("Bye from the 'byeer' role!") end,
    stop = function() end,
}

A role cannot be started without its dependencies. This means that all the dependencies of a role should be defined in the roles configuration parameter:

instance001:
  roles: [ greeter, byeer ]

You can find the full example here: application_role_cfg.

Since version 3.3.1, you can define the on_event callback for custom roles. The on_event callback is called every time a box.status system event is broadcast. If multiple custom roles have the on_event callback defined, these callbacks are called one after another in the order defined by role dependencies.

When called, the on_event callback receives three arguments:

  • config, which contains the configuration of the role;

  • key, which reflects the trigger event and is set to:

    • config.apply if the callback was triggered by a configuration update;
    • box.status if it was triggered by the box.status system event.
  • value, which contains the instance status as reported by the box.status system event that triggered the call. If the callback is triggered by a configuration update, value contains the data of the most recent box.status system event.

Note

  • All on_event callbacks with the config.apply key are executed as part of the configuration process. The ready or check_warnings status is reached only after all such callbacks are done.
  • All on_event callbacks are executed inside a pcall. If a callback raises an error, the error is logged at the error level, and execution of the remaining callbacks continues.
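The two rules above (callbacks run one after another in dependency order, and each one is isolated by pcall) can be modeled in a few lines of plain Lua. broadcast() and its log_error argument are hypothetical names; in Tarantool, the error is logged by the server itself, which log_error merely stands in for here.

```lua
-- Toy dispatcher for on_event callbacks (NOT Tarantool's implementation).
-- roles: { name = ..., module = ... } entries, already in dependency order
local function broadcast(roles, roles_cfg, key, value, log_error)
    for _, role in ipairs(roles) do
        local callback = role.module.on_event
        if callback ~= nil then
            -- Each callback runs inside pcall: an error is reported and
            -- the remaining callbacks still run
            local ok, err = pcall(callback, roles_cfg[role.name], key, value)
            if not ok then
                log_error(role.name, err)
            end
        end
    end
end
```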

An example of the on_event callback is provided in the space creation section below.

You can add initialization code to a role by defining and calling a function with an arbitrary name at the top level of a module, for example:

local function init()
    -- ... --
end

init()

For example, you can create spaces, define indexes, or grant privileges to specific users or roles.

See also: Specifics of creating spaces.

To create a space in a role, you need to make sure that the target instance is in read-write mode (its box.info.ro is false). You can check an instance state by subscribing to the box.status event using box.watch():

box.watch('box.status', function(_, status)
    -- status.is_ro is true while the instance is read-only
    if status.is_ro then
        return
    end
    -- creating a space
    -- ...
end)

Note

Given that a role may be enabled when an instance is already in read-write mode, you also need to execute schema initialization code from apply(). To make sure a space is created only once, use the if_not_exists option.

Since version 3.3.1, you can define space creation in a role via the on_event callback function.

See an example of such a definition below:

return {
    validate = function() end,
    apply = function() end,
    stop = function() end,
    on_event = function(config, key, value)
        -- Can only create spaces on RW.
        if value.is_ro then
            return
        end
        -- Assume the role config is a table.
        if type(config) ~= 'table' then
            error('Config must be a table')
        end
        local space_name = config.space_name or 'default'
        box.schema.space.create(space_name, {
            if_not_exists = true,
        })
    end
}

A role’s life cycle includes the stages described below.

  1. Loading roles

    On each run, all roles are loaded in the order they are specified in the configuration. This stage takes effect when a role is enabled or an instance with this role is restarted. At this stage, a role executes the initialization code.

    A role cannot be started if it has dependencies that are not specified in a configuration.

    Note

    Dependencies do not affect the order in which roles are loaded. However, the validate(), apply(), and stop() functions are executed taking dependencies into account. Learn more in Executing functions for dependent roles.

  2. Stopping roles

    This stage takes effect during a configuration reload when a role is removed from the configuration for a given instance. Note that all stop() calls are performed before any validate() or apply() calls. This means that old roles are stopped first, and only then new roles are started.

  3. Validating a role’s configurations

    At this stage, a configuration for each role is validated using the corresponding validate() function in the same order in which they are specified in the configuration.

  4. Applying a role’s configurations

    At this stage, a configuration for each role is applied using the corresponding apply() function in the same order in which they are specified in the configuration.

All role functions report an unrecoverable error by throwing an error object. If an error is thrown in any phase, applying the configuration is stopped. If starting or stopping a role throws an error, no roles are stopped or started afterward. The error is caught and shown in the alerts section of config:info().
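A configuration reload, as described above, can be modeled in plain Lua: every stop() first, then every validate(), then every apply(), with the first error aborting the rest. This is a toy model; reload() and try_reload() are hypothetical names, and in Tarantool the error ends up in the alerts section of config:info() rather than in a return value.

```lua
-- Toy model of one configuration reload (NOT Tarantool's implementation).
-- stopped:   role modules removed from the configuration
-- started:   { name = ..., module = ... } entries in execution order
-- roles_cfg: the roles_cfg section, keyed by role name
local function reload(stopped, started, roles_cfg)
    -- All stop() calls happen before any validate() or apply() call
    for _, module in ipairs(stopped) do
        module.stop()
    end
    -- Every configuration is validated before any of them is applied
    for _, role in ipairs(started) do
        role.module.validate(roles_cfg[role.name])
    end
    for _, role in ipairs(started) do
        role.module.apply(roles_cfg[role.name])
    end
end

-- An error in any phase stops the reload; the caller gets the message
local function try_reload(stopped, started, roles_cfg)
    local ok, err = pcall(reload, stopped, started, roles_cfg)
    if ok then
        return true
    end
    return false, tostring(err)
end
```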

For roles that depend on each other, the validate(), apply(), and stop() functions are executed with the dependencies taken into account. Suppose there are three independent roles and two dependent ones:

role1
role2
role3
    └─── role4
             └─── role5

  • role1, role2, and role5 are independent roles.
  • role3 depends on role4, role4 depends on role5.

The roles are enabled in a configuration as follows:

roles: [ role1, role2, role3, role4, role5 ]

In this case, validate() and apply() for these roles are executed in the following order:

role1 -> role2 -> role5 -> role4 -> role3

Roles removed from a configuration are stopped in the reverse of the order in which they are specified in the configuration, taking the dependencies into account. Suppose all roles except role1 are removed from the configuration above:

roles: [ role1 ]

After reloading a configuration, stop() functions for the removed roles are executed in the following order:

role3 -> role4 -> role5 -> role2
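The ordering rules above can be reproduced with a depth-first traversal: walk the roles in configuration order and place each role's dependencies before the role itself; stop() then runs in the reverse of that order for the removed roles only. This is a toy model, not Tarantool's code; start_order() and stop_order() are hypothetical names, and the cycle check is an extra safeguard of the sketch. The missing-dependency error mirrors the rule that a role cannot be started if its dependencies are not enabled.

```lua
-- Toy model of role ordering (NOT Tarantool's implementation).
-- order: role names as listed in `roles: [...]`
-- deps:  role name -> list of role names it depends on
local function start_order(order, deps)
    local enabled = {}
    for _, name in ipairs(order) do enabled[name] = true end

    local result, state = {}, {}  -- state[name]: 'visiting' or 'done'

    local function visit(name, parent)
        if not enabled[name] then
            error(string.format("role %q depends on %q, which is not enabled",
                                parent, name), 0)
        end
        if state[name] == 'done' then return end
        if state[name] == 'visiting' then
            error(string.format("dependency cycle at role %q", name), 0)
        end
        state[name] = 'visiting'
        for _, dep in ipairs(deps[name] or {}) do
            visit(dep, name)
        end
        state[name] = 'done'
        table.insert(result, name)  -- its dependencies are already placed
    end

    for _, name in ipairs(order) do visit(name) end
    return result
end

-- stop() runs in the reverse of the old start order, for removed roles only
local function stop_order(old_order, deps, new_order)
    local kept = {}
    for _, name in ipairs(new_order) do kept[name] = true end
    local started = start_order(old_order, deps)
    local result = {}
    for i = #started, 1, -1 do
        if not kept[started[i]] then table.insert(result, started[i]) end
    end
    return result
end

local deps = { role3 = { 'role4' }, role4 = { 'role5' } }
local roles = { 'role1', 'role2', 'role3', 'role4', 'role5' }
print(table.concat(start_order(roles, deps), ' -> '))
-- role1 -> role2 -> role5 -> role4 -> role3
print(table.concat(stop_order(roles, deps, { 'role1' }), ' -> '))
-- role3 -> role4 -> role5 -> role2
```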

The example below shows how to enable the custom greeter role for instance001:

instance001:
  roles: [ greeter ]

The implementation of this role looks as follows:

-- greeter.lua --
return {
    validate = function() end,
    apply = function() require('log').info("Hi from the 'greeter' role!") end,
    stop = function() end,
}

Example on GitHub: application_role

The example below shows how to enable the custom greeter role for instance001 and specify the configuration for this role:

instance001:
  roles: [ greeter ]
  roles_cfg:
    greeter:
      greeting: 'Hi'

The implementation of this role looks as follows:

-- greeter.lua --
local log = require('log').new("greeter")
local schema = require('experimental.config.utils.schema')

local greeter_schema = schema.new('greeter', schema.record({
    greeting = schema.scalar({
        type = 'string',
        allowed_values = { 'Hi', 'Hello' }
    })
}))

local function validate(cfg)
    greeter_schema:validate(cfg)
end

local function apply(cfg)
    log.info("%s from the 'greeter' role!", greeter_schema:get(cfg, 'greeting'))
end

local function stop()
    log.info("The 'greeter' role is stopped")
end

return {
    validate = validate,
    apply = apply,
    stop = stop,
}

Example on GitHub: application_role_cfg

The example below shows how to enable and configure the http-api custom role:

instance001:
  roles: [ http-api ]
  roles_cfg:
    http-api:
      host: '127.0.0.1'
      port: 8080

The implementation of this role looks as follows:

-- http-api.lua --
local httpd
local json = require('json')
local schema = require('experimental.config.utils.schema')

local function validate_host(host, w)
    local host_pattern = "^(%d+)%.(%d+)%.(%d+)%.(%d+)$"
    if not host:match(host_pattern) then
        w.error("'host' should be a string containing a valid IP address, got %q", host)
    end
end

local function validate_port(port, w)
    if port < 1 or port > 65535 then
        w.error("'port' should be between 1 and 65535, got %d", port)
    end
end

local listen_address_schema = schema.new('listen_address', schema.record({
    host = schema.scalar({
        type = 'string',
        validate = validate_host,
        default = '127.0.0.1',
    }),
    port = schema.scalar({
        type = 'integer',
        validate = validate_port,
        default = 8080,
    }),
}))

local function validate(cfg)
    listen_address_schema:validate(cfg)
end

local function apply(cfg)
    if httpd then
        httpd:stop()
    end
    local cfg_with_defaults = listen_address_schema:apply_default(cfg)
    local host = listen_address_schema:get(cfg_with_defaults, 'host')
    local port = listen_address_schema:get(cfg_with_defaults, 'port')
    httpd = require('http.server').new(host, port)
    local response_headers = { ['content-type'] = 'application/json' }
    httpd:route({ path = '/band/:id', method = 'GET' }, function(req)
        local id = req:stash('id')
        local band_tuple = box.space.bands:get(tonumber(id))
        if not band_tuple then
            return { status = 404, body = 'Band not found' }
        else
            local band = { id = band_tuple['id'],
                           band_name = band_tuple['band_name'],
                           year = band_tuple['year'] }
            return { status = 200, headers = response_headers, body = json.encode(band) }
        end
    end)
    httpd:route({ path = '/band', method = 'GET' }, function(req)
        -- Fall back to 5 if 'limit' is missing or not a number
        local limit = tonumber(req:query_param('limit')) or 5
        local band_tuples = box.space.bands:select({}, { limit = limit })
        local bands = {}
        for _, tuple in pairs(band_tuples) do
            local band = { id = tuple['id'],
                           band_name = tuple['band_name'],
                           year = tuple['year'] }
            table.insert(bands, band)
        end
        return { status = 200, headers = response_headers, body = json.encode(bands) }
    end)
    httpd:start()
end

local function stop()
    if httpd then
        httpd:stop()
        httpd = nil
    end
end

local function init()
    require('data'):add_sample_data()
end

init()

return {
    validate = validate,
    apply = apply,
    stop = stop,
}

Example on GitHub: application_role_http_api
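The validate_host() pattern above only checks the a.b.c.d shape, so a value such as '999.1.1.1' passes. If that matters, a stricter drop-in replacement could also check the range of each octet. This is a sketch, not part of the original example; the w object is assumed to behave as in the role above (w.error() raises the validation error), and host names and IPv6 addresses are still rejected.

```lua
-- Stricter drop-in for validate_host (a sketch): besides the a.b.c.d
-- shape, every octet must be in the 0..255 range
local function validate_host(host, w)
    local octets = { host:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$") }
    local valid = #octets == 4
    for _, octet in ipairs(octets) do
        if tonumber(octet) > 255 then
            valid = false
        end
    end
    if not valid then
        w.error("'host' should be a string containing a valid IP address, got %q", host)
    end
end
```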

Members:

  • validate([cfg]): validate a role’s configuration.
  • apply([cfg]): apply a role’s configuration.
  • stop(): stop a role.
  • dependencies: define a role’s dependencies.
validate([cfg])

Validate a role’s configuration. This function is called on instance startup or when the configuration is reloaded for the instance with this role. Note that the validate() function is called regardless of whether the role’s configuration or any field in a cluster’s configuration is changed.

validate() should throw an error if the validation fails.

Parameters:
  • cfg – the role configuration to be validated. This parameter provides access to configuration options defined in roles_cfg.<role_name>. To get values of configuration options placed outside roles_cfg.<role_name>, use config:get().

See also: Validating a role configuration

apply([cfg])

Apply a role’s configuration. apply() is called after validate() is executed for all the enabled roles. Like validate(), apply() is called on instance startup or when the configuration is reloaded for the instance with this role.

apply() should throw an error if the specified configuration can’t be applied.

Note

Note that apply() is not invoked if an instance switches to read-write mode when replication.failover is set to election or supervised. You can check an instance state by subscribing to the box.status event using box.watch().

Parameters:
  • cfg – the role configuration to be applied. This parameter provides access to configuration options defined in roles_cfg.<role_name>. To get values of configuration options placed outside roles_cfg.<role_name>, use config:get().

See also: Applying a role configuration

stop()

Stop a role. This function is called on configuration reload if the role is removed from roles for the given instance.

See also: Stopping a role

dependencies

(Optional) Define a role’s dependencies.

Rtype: table

See also: Role dependencies

Fibers, yields, and cooperative multitasking

Creating a fiber is the Tarantool way of making application logic work in the background at all times. A fiber is a set of instructions that are executed with cooperative multitasking: the instructions contain yield signals, upon which control is passed to another fiber.

Fibers are similar to threads of execution in computing. The key difference is that threads use preemptive multitasking, while fibers use cooperative multitasking (see below). This gives fibers the following two advantages over threads:

However, fibers have some limitations compared with threads, the main one being the lack of multi-core support. All fibers in an application belong to a single thread, so they all use the same CPU core as the parent thread. This limitation is usually not serious for Tarantool applications, because a typical bottleneck for Tarantool is the HDD, not the CPU.

A fiber has all the features of a Lua coroutine, and all programming concepts that apply to Lua coroutines apply to fibers as well. However, Tarantool enhances fibers and uses them internally. So, although the use of coroutines is possible and supported, the use of fibers is recommended.


To learn more about fibers, go to the fiber module documentation.

A yield is an action in a cooperative environment that transfers control of the thread from the current fiber to another fiber that is ready to execute.

Any live fiber can be in one of three states: running, suspended, or ready. After a fiber dies, its status is dead. Observed from the outside, a fiber can only be seen as running (the current fiber) or suspended (any other fiber, waiting for an event from the event loop (ev) to continue execution).


After a yield has occurred, the next ready fiber is taken from the queue and executed. When there are no more ready fibers, execution is transferred to the event loop.

After a fiber has yielded and regained control, it immediately issues testcancel (see fiber.testcancel()).

Yields can be explicit or implicit.

Explicit yields are clearly visible from the invoking code. There are only two explicit yields: fiber.yield() and fiber.sleep(t).

  • fiber.yield() yields execution to another ready fiber while putting itself in the ready state, meaning that it will be executed again as soon as possible while being polite to other fibers waiting for execution.
  • fiber.sleep(t) yields execution to another ready fiber and puts itself in the suspended state for time t until time passes and the event loop wakes up this fiber to the ready state.

In general, long-running CPU-intensive tasks should yield periodically to cooperate with other waiting fibers.
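Fibers build on the same idea as Lua coroutines, so the effect of periodic yields can be shown with a plain-Lua analogy. This is not Tarantool's scheduler, which is driven by its event loop: coroutine.yield() stands in for fiber.yield(), and run() is a hypothetical round-robin loop over a ready queue. Two CPU-bound tasks yield every two iterations and therefore interleave.

```lua
-- Toy cooperative scheduler built on plain Lua coroutines.
-- It only illustrates explicit yields; it is NOT how Tarantool's
-- fiber scheduler or event loop is implemented.
local trace = {}

-- A CPU-bound task that yields every `batch` iterations, in the spirit
-- of calling fiber.yield() inside a long loop
local function worker(name, iterations, batch)
    return coroutine.create(function()
        for i = 1, iterations do
            table.insert(trace, name .. i)
            if i % batch == 0 then
                coroutine.yield()  -- give other tasks a chance to run
            end
        end
    end)
end

-- Round-robin over ready tasks until all of them are finished ("dead")
local function run(tasks)
    while #tasks > 0 do
        local task = table.remove(tasks, 1)
        assert(coroutine.resume(task))
        if coroutine.status(task) ~= 'dead' then
            table.insert(tasks, task)  -- back to the end of the ready queue
        end
    end
end

run({ worker('a', 4, 2), worker('b', 4, 2) })
print(table.concat(trace, ' '))  -- a1 a2 b1 b2 a3 a4 b3 b4
```

If batch were larger than iterations, task a would run to completion before b started at all, which is the kind of starvation that periodic yields prevent.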

On the other hand, many operations, such as operations with sockets, the file system, and disk I/O, imply some waiting for the current fiber while others can be executed. When such an operation occurs, the potentially blocking work is passed to the event loop, and the fiber is suspended until the resource is ready for the fiber to continue.

Here is the list of implicitly yielding operations:

  • Connection establishment (socket).
  • Socket read and write (socket).
  • Filesystem operations (from fio).
  • Channel data transfer (fiber.channel).
  • File input/output (from fio).
  • Console operations (since console is a socket).
  • HTTP requests (since HTTP is a socket operation).
  • Database modifications (if they imply a disk write).
  • Database reading for the vinyl engine.
  • Invocation of another process (popen).

Note

Note that all operations of the os module are non-cooperative: they block the entire tx thread.

For memtx, since all data is in memory, there is no yielding for a read request (like :select, :pairs, :get).

For vinyl, since some data may not be in memory, there may be disk I/O for a read (to fetch data from disk) or write (because a stall may occur while waiting for memory to be freed).

For both memtx and vinyl, since data change requests must be recorded in the WAL, there is normally a box.commit().

With the default autocommit mode the following operations are yielding:

To provide atomicity for transactions in transaction mode, some changes are applied to the modification operations for the memtx engine. After executing box.begin or within a box.atomic call, any modification operation will not yield, and yield will occur only on box.commit or upon return from box.atomic. Meanwhile, box.rollback does not yield.

That is why executing separate commands such as select(), insert(), and update() in the console inside a transaction without MVCC causes the transaction to abort: the console implicitly yields after executing each chunk of code.

  • Engine = memtx.
space:get()
space:insert()

The sequence has one yield, at the end of the insert, caused by implicit commit; get() has nothing to write to the WAL and so does not yield.

  • Engine = memtx.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()

The sequence has one yield, at the end of box.commit; none of the inserts yield.

  • Engine = vinyl.
space:get()
space:insert()

The sequence has one to three yields, since get() may yield if the data is not in the cache, insert() may yield if it waits for available memory, and there is an implicit yield at commit.

  • Engine = vinyl.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()

The sequence may yield from 1 to 5 times.

Assume that there are tuples in the memtx space tester where the third field represents a positive dollar amount.

Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.

tarantool> function txn_example(from, to, amount_of_money)
         >   box.atomic(function()
         >     box.space.tester:update(from, {{'-', 3, amount_of_money}})
         >     box.space.tester:update(to,   {{'+', 3, amount_of_money}})
         >   end)
         >   return "ok"
         > end

Result:
---
...
tarantool> txn_example({999}, {1000}, 1.00)
---
- "ok"
...

If wal_mode = none, then there is no implicit yielding at the commit time because there are no writes to the WAL.

If a request is performed via a network connector such as net.box and involves sending requests to the server and receiving responses, it involves network I/O and thus implicit yielding, even if the request sent to the server has no implicit yield itself. Therefore, the following sequence yields three times in a row, once for each request sent over the network while awaiting its result.

conn.space.test:get{1}
conn.space.test:get{2}
conn.space.test:get{3}

Cooperative multitasking means that unless a running fiber deliberately yields control, it is not preempted by some other fiber. But a running fiber will deliberately yield when it encounters a «yield point»: a transaction commit, an operating system call, or an explicit «yield» request. Any system call which can block will be performed asynchronously, and any running fiber which must wait for a system call will be preempted, so that another ready-to-run fiber takes its place and becomes the new running fiber.

This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there is no concurrency around a resource, no race conditions, and no memory consistency issues. The way to achieve this is simple: use no yields, explicit or implicit, in critical sections, and no one can interfere with code execution.
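The rule about critical sections can also be illustrated with a plain-Lua coroutine analogy (again, not Tarantool's fiber module). In the hypothetical sketch below, two cooperative tasks each increment a shared counter five times. When the read-modify-write contains no yield, no update is lost; when a yield sits between the read and the write, each task overwrites the other's update with a stale value.

```lua
-- Plain-Lua analogy: coroutines stand in for fibers, and coroutine.yield()
-- stands in for any explicit or implicit yield.
local counter = 0

-- The critical section (read, then write) contains no yield.
local function safe_increment()
    local v = counter
    counter = v + 1
    coroutine.yield() -- yield only after the update is complete
end

-- A yield between the read and the write breaks the critical section.
local function unsafe_increment()
    local v = counter
    coroutine.yield() -- the other task runs here and reads the same value
    counter = v + 1
end

-- Run two tasks that each call `step` `times` times, alternating between them.
local function run_two(step, times)
    counter = 0
    local function worker()
        for _ = 1, times do step() end
    end
    local a, b = coroutine.create(worker), coroutine.create(worker)
    while coroutine.status(a) ~= 'dead' or coroutine.status(b) ~= 'dead' do
        if coroutine.status(a) ~= 'dead' then assert(coroutine.resume(a)) end
        if coroutine.status(b) ~= 'dead' then assert(coroutine.resume(b)) end
    end
    return counter
end

print(run_two(safe_increment, 5))   --> 10
print(run_two(unsafe_increment, 5)) --> 5 (half of the updates are lost)
```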

For small requests, such as simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

However, a function may perform complex calculations or be written in such a way that yields take a long time to occur. This can lead to unfair scheduling when a single client throttles the rest of the system, or to apparent stalls in processing requests. It is the responsibility of the function author to avoid this situation. As a protective mechanism, a fiber slice can be used.

Lua development examples and recommendations

Below are Lua programs for some common or tricky situations.

You can run any of these programs by copying the code into a .lua file and then entering chmod +x ./program-name.lua and ./program-name.lua on the command line.

The first line is a shebang:

#!/usr/bin/env tarantool

It runs the Tarantool Lua application server, which must be on the execution path.

This section contains the following recipes:

Use freely.

See more recipes on Tarantool GitHub.

The standard example of a simple program.

#!/usr/bin/env tarantool

print('Hello, World!')

Use box.once() to initialize a database (create spaces) if the server is being started for the first time. Then use console.start() to start interactive mode.

#!/usr/bin/env tarantool

-- Configure the database
box.cfg {
    listen = 3313
}

box.once("bootstrap", function()
    box.schema.space.create('tweedledum')
    box.space.tweedledum:create_index('primary',
        { type = 'TREE', parts = {1, 'unsigned'}})
end)

require('console').start()

Use the fio module to open, read, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_RDONLY' })
if not f then
    error("Failed to open file: "..errno.strerror())
end
local data = f:read(4096)
f:close()
print(data)

Use the fio module to open, write to, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_CREAT', 'O_WRONLY', 'O_APPEND'},
    tonumber('0666', 8))
if not f then
    error("Failed to open file: "..errno.strerror())
end
f:write("Hello\n");
f:close()

Use the LuaJIT FFI library to call a built-in C function: printf(). (For help understanding FFI, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    int printf(const char *format, ...);
]]

-- USER may be unset; passing nil for %s would hand printf() a NULL pointer
ffi.C.printf("Hello, %s\n", os.getenv("USER") or "unknown user");

Use the LuaJIT FFI library to call a built-in C function: gettimeofday(). It returns time values with millisecond precision, unlike the time function in the Tarantool clock module.

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    typedef long time_t;
    typedef struct timeval {
        time_t tv_sec;
        time_t tv_usec;
    } timeval;
    int gettimeofday(struct timeval *t, void *tzp);
]]

local timeval_buf = ffi.new("timeval")
-- Return the current time in milliseconds since the Unix epoch
local now = function()
    ffi.C.gettimeofday(timeval_buf, nil)
    return tonumber(timeval_buf.tv_sec * 1000 + (timeval_buf.tv_usec / 1000))
end

print(now())

Use the LuaJIT FFI library to call a C library function. (For help understanding FFI, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
    unsigned long compressBound(unsigned long sourceLen);
    int compress2(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen, int level);
    int uncompress(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen);
]]
local zlib = ffi.load(ffi.os == "Windows" and "zlib1" or "z")

-- Lua wrapper for compress2()
local function compress(txt)
    local n = zlib.compressBound(#txt)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.compress2(buf, buflen, txt, #txt, 9)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Lua wrapper for uncompress()
local function uncompress(comp, n)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.uncompress(buf, buflen, comp, #comp)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Simple test code
local txt = string.rep("abcd", 1000)
print("Uncompressed size: ", #txt)
local c = compress(txt)
print("Compressed size: ", #c)
local txt2 = uncompress(c, #txt)
assert(txt2 == txt)

Use the LuaJIT FFI library to access a C object via a metamethod (a method that is defined in a metatable).

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

local point
local mt = {
  __add = function(a, b) return point(a.x+b.x, a.y+b.y) end,
  __len = function(a) return math.sqrt(a.x*a.x + a.y*a.y) end,
  __index = {
    area = function(a) return a.x*a.x + a.y*a.y end,
  },
}
point = ffi.metatype("point_t", mt)

local a = point(3, 4)
print(a.x, a.y)  --> 3  4
print(#a)        --> 5
print(a:area())  --> 25
local b = a + point(0.5, 8)
print(#b)        --> 12.5

Use the '#' operator to get the number of items in an array-like Lua table. This operation has O(log(N)) complexity.

#!/usr/bin/env tarantool

local array = { 1, 2, 3 }
print(#array)

Missing elements in arrays, which Lua treats as nil, cause the simple '#' operator to deliver misleading results. The print(#t) instruction prints 4, the print(counter) instruction prints 3, and the print(max) instruction prints 10. Other table functions, such as table.sort(), also misbehave when nils are present.

#!/usr/bin/env tarantool

local t = {}
t[1] = 1
t[4] = 4
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)

Use explicit NULL values to avoid the problems caused by Lua's "nil == missing value" behavior. Although json.NULL == nil is true, all the print instructions in this program print the correct value: 10.

#!/usr/bin/env tarantool

local json = require('json')
local t = {}
t[1] = 1; t[2] = json.NULL; t[3]= json.NULL;
t[4] = 4; t[5] = json.NULL; t[6]= json.NULL;
t[6] = 4; t[7] = json.NULL; t[8]= json.NULL;
t[9] = json.NULL
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)

Use this program to get the number of items in an associative-array-like table.

#!/usr/bin/env tarantool

local map = { a = 10, b = 15, c = 20 }
local size = 0
for _ in pairs(map) do size = size + 1; end
print(size)
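The '#' operator does not help here: for a table that has only string keys, '#' returns 0. A small reusable helper can wrap the counting loop; the name map_size below is just for illustration.

```lua
-- Count the entries of any table, including those with non-integer keys.
local function map_size(t)
    local size = 0
    for _ in pairs(t) do size = size + 1 end
    return size
end

local map = { a = 10, b = 15, c = 20 }
print(#map)          --> 0 ('#' ignores string keys)
print(map_size(map)) --> 3
```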

This program uses Lua's ability to swap two variables without needing a third variable.

#!/usr/bin/env tarantool

local x = 1
local y = 2
x, y = y, x
print(x, y)
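The swap works because Lua evaluates all expressions on the right-hand side before performing any assignment. The same idiom swaps table elements in place; the hypothetical reverse() function below uses it to reverse an array without a temporary variable.

```lua
-- Reverse an array-like table in place by swapping elements pairwise.
local function reverse(t)
    local i, j = 1, #t
    while i < j do
        t[i], t[j] = t[j], t[i] -- both right-hand values are read first
        i, j = i + 1, j - 1
    end
    return t
end

local a = reverse({ 1, 2, 3, 4, 5 })
print(table.concat(a, ' ')) --> 5 4 3 2 1
```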

Use this program to create a class, a metatable for the class, and an instance of the class. Another example is at http://lua-users.org/wiki/LuaClassesWithMetatable.

#!/usr/bin/env tarantool

-- define the class methods
local myclass_somemethod = function(self)
    print('test 1', self.data)
end

local myclass_someothermethod = function(self)
    print('test 2', self.data)
end

local myclass_tostring = function(self)
    return 'MyClass <'..self.data..'>'
end

local myclass_mt = {
    __tostring = myclass_tostring;
    __index = {
        somemethod = myclass_somemethod;
        someothermethod = myclass_someothermethod;
    }
}

-- create a new object of the myclass class
local object = setmetatable({ data = 'data'}, myclass_mt)
object:somemethod()  -- the method prints its own output
print(object.data)
print(object)        -- uses __tostring: MyClass <data>
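A common variation of the program above makes the class table itself serve as __index and adds a constructor, so that every instance is created the same way. This is a sketch; the names MyClass, MyClass.new, and describe are illustrative and not part of any Tarantool API.

```lua
-- Illustrative class: the class table doubles as the metatable
-- of its instances, so failed lookups on an instance fall back to it.
local MyClass = {}
MyClass.__index = MyClass

MyClass.__tostring = function(self)
    return 'MyClass <' .. self.data .. '>'
end

-- Constructor: every instance gets its own fields and the shared metatable.
function MyClass.new(data)
    return setmetatable({ data = data }, MyClass)
end

function MyClass:describe()
    return 'test 1 ' .. self.data
end

local obj = MyClass.new('data')
print(obj:describe()) --> test 1 data
print(tostring(obj))  --> MyClass <data>
```

Keeping methods in the class table means a method added later (for example, MyClass.other = function(self) ... end) is immediately visible to all existing instances.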

Activate the Lua garbage collector with the collectgarbage function.

#!/usr/bin/env tarantool

collectgarbage('collect')
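collectgarbage('count') reports the memory currently used by Lua, in kilobytes, so it can show the effect of a full collection. A minimal sketch that allocates short-lived tables, drops the only reference to them, and then collects:

```lua
-- Allocate a batch of short-lived tables.
local garbage = {}
for i = 1, 100000 do garbage[i] = { i } end
local before = collectgarbage('count') -- memory in use, in kilobytes

garbage = nil                          -- drop the only reference
collectgarbage('collect')              -- run a full collection cycle
local after = collectgarbage('count')

print(string.format('before: %.0f KB, after: %.0f KB', before, after))
```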

Start fibers for producers and consumers (the program below starts three producers and five consumers). Use fiber.channel() to exchange data and synchronize. You can tweak the channel size (ch_size in the program code) to control the number of simultaneous tasks waiting for processing.

#!/usr/bin/env tarantool

local fiber = require('fiber')
local function consumer_loop(ch, i)
    -- initialize the consumer synchronously or error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = ch:get()
        if data == nil then
            break
        end
        print('consumed', i, data)
        fiber.sleep(math.random()) -- simulate work
    end
end

local function producer_loop(ch, i)
    -- initialize the producer synchronously or error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = math.random()
        ch:put(data)
        print('produced', i, data)
    end
end

local function start()
    local consumer_n = 5
    local producer_n = 3

    -- create the channel
    local ch_size = math.max(consumer_n, producer_n)
    local ch = fiber.channel(ch_size)

    -- start the consumers
    for i=1, consumer_n,1 do
        fiber.create(consumer_loop, ch, i)
    end

    -- start the producers
    for i=1, producer_n,1 do
        fiber.create(producer_loop, ch, i)
    end
end

start()
print('started')
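Without Tarantool, the same handoff can be sketched with standard Lua coroutines. This is a plain-Lua analogy, not fiber.channel(): the hypothetical producer coroutine yields one item at a time, and the consumer resumes it whenever it needs the next item, so neither side runs ahead of the other.

```lua
-- Plain-Lua analogy of a producer/consumer pair (no fiber.channel here):
-- the producer yields items, the consumer resumes it to get the next one.
local function producer(n)
    return coroutine.create(function()
        for i = 1, n do
            coroutine.yield('item ' .. i) -- hand one item to the consumer
        end
        -- returning ends the coroutine: the consumer receives nil
    end)
end

local consumed = {}

local function consumer(prod)
    while true do
        local ok, data = coroutine.resume(prod)
        assert(ok, data)
        if data == nil then
            break -- the producer has finished
        end
        print('consumed', data)
        table.insert(consumed, data)
    end
end

consumer(producer(3))
```

Unlike a channel, this pairing has no buffer and only one producer and one consumer; fiber.channel(ch_size) is what allows several fibers on each side and a bounded queue between them.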

Use socket.tcp_connect() to connect to a remote host via TCP. You can display the connection details and the result of a GET request.

#!/usr/bin/env tarantool

local s = require('socket').tcp_connect('google.com', 80)
print(s:peer().host)
print(s:peer().family)
print(s:peer().type)
print(s:peer().protocol)
print(s:peer().port)
print(s:write("GET / HTTP/1.0\r\n\r\n"))
print(s:read('\r\n'))
print(s:read('\r\n'))

Use socket.tcp_server() to set up a simple TCP server: create a function that handles requests and echoes them, then pass that function to socket.tcp_server(). This program has been tested with 100,000 clients, each of which got a separate fiber.

#!/usr/bin/env tarantool

local function handler(s, peer)
    s:write("Welcome to test server, " .. peer.host .."\n")
    while true do
        local line = s:read('\n')
        if line == nil then
            break -- error or EOF
        end
        if not s:write("pong: "..line) then
            break -- error or EOF
        end
    end
end

local server, addr = require('socket').tcp_server('localhost', 3311, handler)

Use socket.getaddrinfo() to perform non-blocking DNS resolution, getting both AF_INET6 and AF_INET information for 'google.com'. This technique is not always necessary for TCP connections, because socket.tcp_connect() calls socket.getaddrinfo internally before trying to connect to the first available address.

#!/usr/bin/env tarantool

local s = require('socket').getaddrinfo('google.com', 'http', { type = 'SOCK_STREAM' })
print('host=',s[1].host)
print('family=',s[1].family)
print('type=',s[1].type)
print('protocol=',s[1].protocol)
print('port=',s[1].port)
print('host=',s[2].host)
print('family=',s[2].family)
print('type=',s[2].type)
print('protocol=',s[2].protocol)
print('port=',s[2].port)

Tarantool currently has no udp_server function, so socket_udp_echo.lua is a more complicated program than socket_tcp_echo.lua. It can be implemented with sockets and fibers.

#!/usr/bin/env tarantool

local socket = require('socket')
local errno = require('errno')
local fiber = require('fiber')

local function udp_server_loop(s, handler)
    fiber.name("udp_server")
    while true do
        -- try to read a datagram first
        local msg, peer = s:recvfrom()
        if msg == "" then
            -- the socket was closed via s:close()
            break
        elseif msg ~= nil then
            -- got a new datagram
            handler(s, peer, msg)
        else
            if s:errno() == errno.EAGAIN or s:errno() == errno.EINTR then
                -- the socket is not ready
                s:readable() -- yield; epoll will report when new data arrives
            else
                -- socket error
                local msg = s:error()
                s:close() -- free resources now instead of waiting for GC
                error("Socket error: " .. msg)
            end
        end
    end
end

local function udp_server(host, port, handler)
    local s = socket('AF_INET', 'SOCK_DGRAM', 0)
    if not s then
        return nil -- the caller should check errno.strerror()
    end
    if not s:bind(host, port) then
        local e = s:errno() -- save errno
        s:close()
        errno(e) -- restore errno
        return nil -- the caller should check errno.strerror()
    end

    fiber.create(udp_server_loop, s, handler) -- start a new background fiber
    return s
end

The handler that this server calls for each datagram a client sends, together with the code that starts the server, could look like this:

local function handler(s, peer, msg)
    -- It is not necessary to wait until the socket is ready to send UDP
    -- s:writable()
    s:sendto(peer.host, peer.port, "Pong: " .. msg)
end

local server = udp_server('127.0.0.1', 3548, handler)
if not server then
    error('Failed to bind: ' .. errno.strerror())
end

print('Started')

require('console').start()

Use the HTTP module to get data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local r = http_client.get('https://api.frankfurter.app/latest?to=USD%2CRUB')
if r.status ~= 200 then
    print('Failed to get currency ', r.reason)
    return
end
local data = json.decode(r.body)
print(data.base, 'rate of', data.date, 'is', data.rates.RUB, 'RUB or', data.rates.USD, 'USD')

Use the HTTP module to send data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local data = json.encode({ Key = 'Value'})
local headers = { Token = 'xxxx', ['X-Secret-Value'] = '42' }
local r = http_client.post('http://localhost:8081', data, { headers = headers})
if r.status == 200 then
    print 'Success'
end

Use the third-party http module (which must be installed first) to turn Tarantool into a web server.

#!/usr/bin/env tarantool

local function handler(self)
    return self:render{ json = { ['Your-IP-Is'] = self.peer.host } }
end

local server = require('http.server').new(nil, 8080, {charset = "utf8"}) -- listen on *:8080
server:route({ path = '/' }, handler)
server:start()
-- connect to localhost:8080 and see the JSON

Use the http rock (which must first be installed) to generate HTML pages from templates. The http rock has a fairly simple template engine which allows execution of regular Lua code inside text blocks (like PHP). Therefore there is no need to learn new languages in order to write templates.

#!/usr/bin/env tarantool

local function handler(self)
    local fruits = { 'Apple', 'Orange', 'Grapefruit', 'Banana'}
    return self:render{ fruits = fruits }
end

local server = require('http.server').new(nil, 8080, {charset = "utf8"}) -- nil means '*'
server:route({ path = '/', file = 'index.html.lua' }, handler)
server:start()

The HTML file for this server, including Lua, could look like this (it would produce "1 Apple | 2 Orange | 3 Grapefruit | 4 Banana"). Create a templates directory and put this file in it:

<html>
<body>
    <table border="1">
        % for i,v in pairs(fruits) do
        <tr>
            <td><%= i %></td>
            <td><%= v %></td>
        </tr>
        % end
    </table>
</body>
</html>

In Go, selecting the entire contents of a space is not a trivial one-line task. Below is an example program that selects all tuples from the space "tester". Run the Lua setup code from the comment block on the instance that you are going to connect to through the Go connector.

package main

import (
	"fmt"
	"log"

	"github.com/tarantool/go-tarantool"
)

/*
box.cfg{listen = 3301}
box.schema.user.passwd('pass')

s = box.schema.space.create('tester')
s:format({
    {name = 'id', type = 'unsigned'},
    {name = 'band_name', type = 'string'},
    {name = 'year', type = 'unsigned'}
})
s:create_index('primary', { type = 'hash', parts = {'id'} })
s:create_index('scanner', { type = 'tree', parts = {'id', 'band_name'} })

s:insert{1, 'Roxette', 1986}
s:insert{2, 'Scorpions', 2015}
s:insert{3, 'Ace of Base', 1993}
*/

func main() {
	conn, err := tarantool.Connect("127.0.0.1:3301", tarantool.Opts{
		User: "admin",
		Pass: "pass",
	})

	if err != nil {
		log.Fatalf("Connection refused")
	}
	defer conn.Close()

	spaceName := "tester"
	indexName := "scanner"
	idFn := conn.Schema.Spaces[spaceName].Fields["id"].Id
	bandNameFn := conn.Schema.Spaces[spaceName].Fields["band_name"].Id

	var tuplesPerRequest uint32 = 2
	cursor := []interface{}{}

	for {
		resp, err := conn.Select(spaceName, indexName, 0, tuplesPerRequest, tarantool.IterGt, cursor)
		if err != nil {
			log.Fatalf("Failed to select: %s", err)
		}

		if resp.Code != tarantool.OkCode {
			log.Fatalf("Select failed: %s", resp.Error)
		}

		if len(resp.Data) == 0 {
			break
		}

		fmt.Println("Iteration")

		tuples := resp.Tuples()
		for _, tuple := range tuples {
			fmt.Printf("\t%v\n", tuple)
		}

		lastTuple := tuples[len(tuples)-1]
		cursor = []interface{}{lastTuple[idFn], lastTuple[bandNameFn]}
	}
}

Lua tutorials

If you are just getting started with Lua, we recommend the hands-on tutorial built into Tarantool. To start it, run the tutorial() command in the Tarantool console:

tarantool> tutorial()
---
- |
 Tutorial -- Screen #1 -- Hello, Moon
 ====================================

 Welcome to the Tarantool tutorial.
 It will introduce you to Tarantool’s Lua application server
 and database server, which is what’s running what you’re seeing.
 This is INTERACTIVE -- you’re expected to enter requests
 based on the suggestions or examples in the screen’s text.
 <...>

The task for this tutorial: "Insert 1 million tuples. Each tuple should have a field that maps to the primary-index key, containing a constantly increasing number, and a field containing a random alphabetic string of 10 characters."

The purpose of this exercise is to show what Lua functions look like inside Tarantool. You will work with the Lua math library, the Lua string library, the Tarantool box library, the Tarantool box.tuple library, loops, and concatenation. The instructions should be easy to follow even for someone who has never used Lua or Tarantool before. The only requirements are knowing how other programming languages work and having read the first two chapters of this manual. For better understanding, follow the comments and the links to the Lua manual or to other parts of this Tarantool manual. To make learning easier, read the instructions while typing the statements into the Tarantool client.

We will use the Tarantool sandbox that was created for the exercises in the "Getting started" section. So there is a single space with a numeric primary key, and a Tarantool instance that also serves as the client.

In earlier versions of Tarantool, multi-line functions had to be enclosed in delimiters. They are no longer necessary, so this tutorial does not use them. However, they are still supported. If you want to use delimiters, or if you are using an earlier version of Tarantool, check the syntax description for declaring a delimiter before you begin.

Let's start by creating a function that returns a fixed string, "Hello world".

function string_function()
  return "hello world"
end

The word "function" is a Lua keyword. Let's look at the definition in detail. The function name is string_function. The function has one executable statement, return "hello world". The string "hello world" is enclosed in double quotes here, although Lua does not care: you could use single quotes instead. The word "end" means "this is the end of the Lua function declaration." To confirm that the function works, we can enter

string_function()

Sending function-name() means "call the Lua function." As a result, the string returned by the function appears on the screen.

For more about Lua strings, see chapter 2.4 "Strings" in the Lua manual. For more about functions, see chapter 5 "Functions" in the Lua manual.

The screen now looks like this:

tarantool> function string_function()
         >   return "hello world"
         > end
---
...
tarantool> string_function()
---
- hello world
...
tarantool>

Now that we have string_function, we can call it from another function.

function main_function()
  local string_value
  string_value = string_function()
  return string_value
end

First we declare a variable, string_value. The word "local" means that string_value exists only inside main_function. If we did not use "local", then string_value would be visible everywhere, even to users of other clients connected to this instance! Sometimes that is very useful for communication between clients, but not in our case.

Then we assign a value to string_value, namely the result of string_function(). Now we call main_function() to check that the value was assigned.

For more about Lua variables, see chapter 4.2 "Local Variables and Blocks" in the Lua manual.

The screen now looks like this:

tarantool> function main_function()
         >   local string_value
         >   string_value = string_function()
         >   return string_value
         > end
---
...
tarantool> main_function()
---
- hello world
...
tarantool>
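To see the difference that "local" makes, you can inspect the global table _G after calling two functions: one that declares its variable as local and one that does not. This is a plain-Lua sketch; the function and variable names are made up for illustration.

```lua
-- The variable is local: it disappears when the function returns.
local function with_local()
    local local_value = 'hello world'
    return local_value
end

-- No 'local' keyword: the assignment creates a global variable.
local function with_global()
    global_value = 'hello world'
    return global_value
end

with_local()
with_global()
print(rawget(_G, 'local_value'))  --> nil
print(rawget(_G, 'global_value')) --> hello world
```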

Now that it is clear how to declare a variable, we can change string_function() so that, instead of returning the fixed phrase "Hello world", it returns a randomly chosen letter from 'A' to 'Z'.

function string_function()
  local random_number
  local random_string
  random_number = math.random(65, 90)
  random_string = string.char(random_number)
  return random_string
end

There is no need to erase the old contents of string_function(); they are simply overwritten. The first statement calls a function from the Lua math library that returns a random number; the parameters mean that the number must be an integer from 65 to 90. The second statement calls a function from the Lua string library that converts a number to a character; the parameter is the code point of the character. Fortunately, in ASCII the letter 'A' has the value 65 and 'Z' has the value 90, so the result is always a letter from A to Z.

For more about Lua math-library functions, see Lua users "Math Library Tutorial". For more about Lua string-library functions, see Lua users "String Library Tutorial".

Once again, string_function() is called from main_function(), so we can test it by invoking main_function().

The screen now looks like this:

tarantool> function string_function()
         >   local random_number
         >   local random_string
         >   random_number = math.random(65, 90)
         >   random_string = string.char(random_number)
         >   return random_string
         > end
---
...
tarantool> main_function()
---
- C
...
tarantool>

… Well, actually the output will not always look exactly like this, because math.random() produces random numbers. But for the purposes of this illustration, the random values in the string do not matter.
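You can confirm the ASCII claim, and check that every value math.random(65, 90) can produce maps to an uppercase letter, with string.byte() and string.char() in plain Lua:

```lua
print(string.byte('A'), string.byte('Z')) --> 65  90

-- Every integer from 65 to 90 converts to a single uppercase letter.
local letters = {}
for code = 65, 90 do
    letters[#letters + 1] = string.char(code)
end
print(table.concat(letters)) --> ABCDEFGHIJKLMNOPQRSTUVWXYZ

-- A random letter, produced the same way as in string_function() above.
local random_letter = string.char(math.random(65, 90))
print(random_letter)
```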

Now that we know how to generate strings of one random letter, we can reach our goal: returning a ten-letter string by concatenating ten one-letter strings in a loop.

function string_function()
  local random_number
  local random_string
  random_string = ""
  for x = 1,10,1 do
    random_number = math.random(65, 90)
    random_string = random_string .. string.char(random_number)
  end
  return random_string
end

The words "for x = 1,10,1" mean: start with x equal to 1, keep looping until x exceeds 10 (so the body runs for x = 1 through 10), and increase x by 1 on each iteration. The symbol ".." means "concatenate", that is, append the string on the right of the ".." sign to the string on its left. Since we start by setting random_string to "" (an empty string), at the end random_string contains 10 random letters. Once again, string_function() is called from main_function(), so we can test it by invoking main_function().

For more about Lua loops, see chapter 4.3.4 "Numeric for" in the Lua manual.

The screen now looks like this:

tarantool> function string_function()
         >   local random_number
         >   local random_string
         >   random_string = ""
         >   for x = 1,10,1 do
         >     random_number = math.random(65, 90)
         >     random_string = random_string .. string.char(random_number)
         >   end
         >   return random_string
         > end
---
...
tarantool> main_function()
---
- 'ZUDJBHKEFM'
...
tarantool>
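Each ".." in the loop above creates a new intermediate string. For ten letters that does not matter, but a common Lua idiom for building longer strings is to collect the pieces in a table and join them once with table.concat(). A sketch of the same logic written that way; random_string_of and its length parameter are hypothetical, added for illustration:

```lua
-- Same result as the loop above, but the pieces are joined once at the end.
local function random_string_of(length)
    local letters = {}
    for x = 1, length do
        letters[x] = string.char(math.random(65, 90))
    end
    return table.concat(letters)
end

local s = random_string_of(10)
print(s) -- ten random uppercase letters
```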

Now that we know how to create a string of 10 random letters, we can create a tuple that contains a number and a 10-letter random string, using a function from Tarantool's library of Lua functions.

function main_function()
  local string_value, t
  string_value = string_function()
  t = box.tuple.new({1, string_value})
  return t
end

After this, t is the value of a new tuple with two fields. The first field is numeric: 1. The second field is a random string. Once again, string_function() is called from main_function(), so we can test it by invoking main_function().

For more about Tarantool tuples, see the box.tuple submodule section of the Tarantool manual.

The screen now looks like this:

tarantool> function main_function()
         >   local string_value, t
         >   string_value = string_function()
         >   t = box.tuple.new({1, string_value})
         >   return t
         > end
---
...
tarantool> main_function()
---
- [1, 'PNPZPCOOKA']
...
tarantool>

Now that we know how to create a tuple that contains a number and a ten-letter random string, all that remains is to put this tuple into the tester space. Note that tester is the first space defined in the sandbox, so it works like a database table.

function main_function()
  local string_value, t
  string_value = string_function()
  t = box.tuple.new({1,string_value})
  box.space.tester:replace(t)
end

The new line here is box.space.tester:replace(t). The name contains "tester" because the insertion goes into the tester space. The argument is the tuple value. To be perfectly precise, we could have written box.space.tester:insert(t) instead of box.space.tester:replace(t), but "replace" means "insert even if a tuple with the same primary-key value already exists", which makes it easier to repeat the exercise even if the sandbox is not empty. Once this is done, the tester space contains a tuple with two fields. The first field is 1. The second field is a string of ten random letters. Once again, string_function() is called from main_function(), which we can invoke with main_function(). But main_function() does not show the whole picture, because it does not return t; it only puts t into the database. To confirm that the insertion happened, we use a SELECT request.

main_function()
box.space.tester:select{1}

For more information about the insert and replace calls in Tarantool, see the Submodule box.space, space_object:insert(), and space_object:replace() sections of the Tarantool reference.

Now the screen output looks like this:

tarantool> function main_function()
         >   local string_value, t
         >   string_value = string_function()
         >   t = box.tuple.new({1,string_value})
         >   box.space.tester:replace(t)
         > end
---
...
tarantool> main_function()
---
...
tarantool> box.space.tester:select{1}
---
- - [1, 'EUJYVEECIL']
...
tarantool>
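The only difference between insert and replace is what happens when the primary key is already taken. The toy model below is a plain-Lua sketch that mimics just that rule, so it runs in a standalone Lua interpreter. It is not Tarantool's implementation: ToySpace and its methods are hypothetical names made up for illustration.

```lua
-- Toy model of a space keyed by the first tuple field. Hypothetical:
-- this is not how Tarantool stores data; it only mimics the
-- duplicate-key behavior of insert() versus replace().
local ToySpace = {}
ToySpace.__index = ToySpace

function ToySpace.new()
  return setmetatable({rows = {}}, ToySpace)
end

-- insert: raises an error if the primary key is already taken
function ToySpace:insert(tuple)
  local key = tuple[1]
  if self.rows[key] ~= nil then
    error('duplicate primary key: ' .. tostring(key))
  end
  self.rows[key] = tuple
  return tuple
end

-- replace: stores the tuple whether or not the key already exists
function ToySpace:replace(tuple)
  self.rows[tuple[1]] = tuple
  return tuple
end

function ToySpace:get(key)
  return self.rows[key]
end

-- Running the "exercise" twice on the same sandbox:
local s = ToySpace.new()
s:replace({1, 'EUJYVEECIL'})
s:replace({1, 'PNPZPCOOKA'})             -- fine: overwrites key 1
local ok = pcall(s.insert, s, {1, 'X'})  -- fails: key 1 already present
print(ok, s:get(1)[2])                   -- prints false and PNPZPCOOKA
```

This is why replace() makes the exercise safe to repeat: a second run simply overwrites the tuple from the first run.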

Now that we know how to insert a tuple into the database, it is easy to see how to scale up. Instead of inserting the literal value 1 as the primary key, insert a variable value from 1 to one million in a loop. Since we have already seen how to write a loop, this is straightforward. The only extra touch we add here is a timing function.

function main_function()
  local string_value, t
  for i = 1,1000000,1 do
    string_value = string_function()
    t = box.tuple.new({i,string_value})
    box.space.tester:replace(t)
  end
end
start_time = os.clock()
main_function()
end_time = os.clock()
'insert done in ' .. end_time - start_time .. ' seconds'

The standard Lua function os.clock() returns the approximate CPU time, in seconds, that the program has used so far. We capture start_time just before the insertion loop and end_time just after it, so end_time - start_time is the elapsed CPU time in seconds. To display that value, we enter it as a request without any assignment. Tarantool then sends the value to the client, and the client prints it. (Lua's counterpart to C's printf(), namely print(), would also work.)

For more information about os.clock(), see chapter 22.1 "Date and Time" of the Lua manual. For more information about print(), see chapter 5 "Functions" of the Lua manual.
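The timing pattern itself does not depend on Tarantool. Below is a plain-Lua sketch that times 100,000 calls of a random-string generator with os.clock(). The helper random_string() is a hypothetical variant of string_function(). It collects the letters in a table and joins them once with table.concat(), which avoids building an intermediate string on every iteration.

Keep in mind that os.clock() measures CPU time, not wall-clock time, so time spent waiting (for example, on disk writes) may not show up. Inside Tarantool, the clock module (for example, clock.monotonic()) can be used for wall-clock measurements.

```lua
-- Hypothetical variant of string_function(): builds the ten letters in a
-- table and joins them once with table.concat().
local function random_string(len)
  local chars = {}
  for i = 1, len do
    chars[i] = string.char(math.random(65, 90)) -- 'A'..'Z'
  end
  return table.concat(chars)
end

-- Same start/end pattern as above, around a CPU-bound loop.
local start_time = os.clock()
local last
for _ = 1, 100000 do
  last = random_string(10)
end
local end_time = os.clock()
local elapsed = end_time - start_time
print('generated 100000 strings in ' .. elapsed .. ' seconds')
```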

And now, for the grand finale, let's re-enter the final versions of all the necessary requests: the request that creates string_function(), the request that creates main_function(), and the request that calls main_function().

function string_function()
  local random_number
  local random_string
  random_string = ""
  for x = 1,10,1 do
    random_number = math.random(65, 90)
    random_string = random_string .. string.char(random_number)
  end
  return random_string
end

function main_function()
  local string_value, t
  for i = 1,1000000,1 do
    string_value = string_function()
    t = box.tuple.new({i,string_value})
    box.space.tester:replace(t)
  end
end
start_time = os.clock()
main_function()
end_time = os.clock()
'insert done in ' .. end_time - start_time .. ' seconds'

Now the screen output looks like this:

tarantool> function string_function()
         >   local random_number
         >   local random_string
         >   random_string = ""
         >   for x = 1,10,1 do
         >     random_number = math.random(65, 90)
         >     random_string = random_string .. string.char(random_number)
         >   end
         >   return random_string
         > end
---
...
tarantool> function main_function()
         >   local string_value, t
         >   for i = 1,1000000,1 do
         >     string_value = string_function()
         >     t = box.tuple.new({i,string_value})
         >     box.space.tester:replace(t)
         >   end
         > end
---
...
tarantool> start_time = os.clock()
---
...
tarantool> main_function()
---
...
tarantool> end_time = os.clock()
---
...
tarantool> 'insert done in ' .. end_time - start_time .. ' seconds'
---
- insert done in 37.62 seconds
...
tarantool>

So we have shown two things. First, Lua functions are quite expressive; in fact, Tarantool's Lua stored procedures can do more than the stored procedures in some SQL DBMSs. Second, it is straightforward to combine functions from Lua libraries with functions from Tarantool libraries.

We have also shown that inserting a million tuples took 37 seconds. The host computer was a Linux laptop. By changing wal_mode to 'none' before running the test, one can reduce the elapsed time to 4 seconds.
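For reference, wal_mode is a box.cfg option. A minimal sketch of the startup configuration for such a test might look like the fragment below. As far as we know, wal_mode cannot be changed on a running instance, so it has to go into the initial box.cfg{} call.

With 'none', changes are not written to the write-ahead log at all. They survive a restart only if they made it into a snapshot, so this setting suits throwaway benchmarks rather than production data. The other values are 'write' (the default) and 'fsync'.

```lua
-- Startup configuration for a throwaway benchmark instance.
-- wal_mode must be set in the first box.cfg{} call.
box.cfg{
  wal_mode = 'none', -- skip the write-ahead log entirely
}
```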

The exercise for this practice is as follows: "Assume that every tuple contains a string formatted as JSON. Inside that string there is a numeric JSON field. For each tuple, find the value of the numeric field and add it to a variable named sum. At the end, the function should return sum." The purpose of this exercise is to get practice in reading tuples and processing them at the same time.

 1 json = require('json')
 2 function sum_json_field(field_name)
 3   local v, t, sum, field_value, is_valid_json, lua_table
 4   sum = 0
 5   for v, t in box.space.tester:pairs() do
 6     is_valid_json, lua_table = pcall(json.decode, t[2])
 7     if is_valid_json then
 8       field_value = lua_table[field_name]
 9       if type(field_value) == "number" then sum = sum + field_value end
10     end
11   end
12   return sum
13 end

LINE 3: WHY "LOCAL". This line declares all the variables that will be used in the function. Declaring all variables at the start is not actually necessary; in a long function, it is better to declare variables just before using them. In fact, declaring variables is not required at all, but an undeclared variable is "global". That would be undesirable for any of the variables declared in line 3, because all of them are meant for use only within the function.

LINE 5: WHY "PAIRS()". Our job is to go through all the rows, and there are two ways to do it. One is box.space.tester:pairs(). The other is variable = select(...) followed by for i = 1, #variable do some_function(variable[i]) end. We preferred pairs() for this example.

LINE 5: START THE MAIN LOOP. Everything inside this "for" loop is repeated as long as there is another index key. On each iteration, the fetched tuple can be referenced through the variable t.

LINE 6: WHY "PCALL". If we simply wrote lua_table = json.decode(t[2]), the function would abort with an error if it encountered something wrong with the JSON string, such as a missing colon. Wrapping the call in pcall (protected call) tells Lua that we want to intercept that sort of error. If there is a problem, pcall just sets is_valid_json = false, and we decide later what to do about it.
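The effect of pcall can be seen without Tarantool. In the plain-Lua sketch below, fake_decode() is a hypothetical stand-in for json.decode. It raises an error for any input it does not recognize, just as a real decoder raises an error on malformed JSON. An unprotected call would propagate the error; pcall returns false plus the error message instead.

```lua
-- Hypothetical stand-in for json.decode (no json module in plain Lua):
-- it knows one well-formed string and raises an error for anything else.
local known = {
  ['{"Quantity": 15}'] = {Quantity = 15},
}

local function fake_decode(s)
  local result = known[s]
  if result == nil then
    error('malformed JSON: ' .. s)
  end
  return result
end

-- pcall returns true plus the results, or false plus the error message,
-- instead of letting the error abort the caller.
local ok1, value1 = pcall(fake_decode, '{"Quantity": 15}')
local ok2, value2 = pcall(fake_decode, '{"Quantity" 15}') -- missing colon

print(ok1, value1.Quantity)  -- prints true and 15
print(ok2, value2)           -- prints false and the error message
```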

LINE 6: MEANING. The function json.decode decodes a JSON string, and the parameter t[2] is a reference to a JSON string. There is a bit of hard coding here: we're assuming that the JSON string was inserted into the second field of the tuple. For example, suppose the tuple looks like this:

field[1]: 444
field[2]: '{"Hello": "world", "Quantity": 15}'

This means that the first field of the tuple, the primary-key field, is a number, while the second field of the tuple, the JSON string, is a string. So the entire statement means: "decode t[2] (the second field of the tuple) as a JSON string. If there is an error, set is_valid_json = false. If there is no error, set is_valid_json = true and set lua_table to a Lua table containing the decoded string."

LINE 8. At last we are ready to get the JSON field value from the Lua table that came from the JSON string. The value in field_name, which is the parameter of the whole function, must be the name of a JSON field. For example, the JSON string '{"Hello": "world", "Quantity": 15}' contains two JSON fields: "Hello" and "Quantity". If the whole function is invoked with sum_json_field("Quantity"), then field_value = lua_table[field_name] is effectively the same as field_value = lua_table["Quantity"], or even field_value = lua_table.Quantity. These are just three ways of saying: get the value of the Quantity field in the Lua table and put it in the variable field_value.
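A quick plain-Lua check of the three equivalent forms. The table is written by hand here, standing in for what json.decode would return for the example string:

```lua
-- Written by hand: the table json.decode would produce for
-- '{"Hello": "world", "Quantity": 15}'.
local lua_table = {Hello = 'world', Quantity = 15}
local field_name = 'Quantity'

local a = lua_table[field_name]  -- key taken from a variable
local b = lua_table["Quantity"]  -- key as a string literal
local c = lua_table.Quantity     -- dot syntax: sugar for the line above
print(a, b, c)                   -- prints 15 three times
```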

LINE 9: WHY "IF". Suppose that the JSON string has no syntax errors, but the JSON field is not numeric or is missing altogether. In that case, the function would abort when it tried to add the value to the sum. By first checking that type(field_value) == "number", we avoid the abort. If you know for sure that the database is in perfect shape, you can skip this check.

And the function is complete. Time to test it. We start with an empty database, defined the same way as the sandbox database in the "Getting started" exercises,

-- if tester is left over from a previous exercise, drop it
if box.space.tester ~= nil then box.space.tester:drop() end
box.schema.space.create('tester')
box.space.tester:create_index('primary', {parts = {1, 'unsigned'}})

then add some tuples where the first field is a number and the second field is a string.

box.space.tester:insert{444, '{"Item": "widget", "Quantity": 15}'}
box.space.tester:insert{445, '{"Item": "widget", "Quantity": 7}'}
box.space.tester:insert{446, '{"Item": "golf club", "Quantity": "sunshine"}'}
box.space.tester:insert{447, '{"Item": "waffle iron", "Quantit": 3}'}

Some errors here are deliberate, for practice. In the "golf club" tuple, the Quantity value is a string. In the "waffle iron" tuple, the key is misspelled as "Quantit", so there is no Quantity field at all. Both tuples will be ignored. Therefore the total of the Quantity fields in the JSON strings should be 15 + 7 = 22.

Invoke the function with sum_json_field("Quantity").

tarantool> sum_json_field("Quantity")
---
- 22
...

It works. We leave the following as exercises for future improvement: removing the hard-coded assumptions, adding an overflow check in case some field values are huge, and adding a yield instruction in case the number of tuples is huge.
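The summing logic of lines 7 to 9 can also be rehearsed in plain Lua, without a database. The sketch below assumes that the JSON strings from the four sample tuples have already been decoded into Lua tables. The names sum_field() and decoded_rows are hypothetical. The sketch shows both the type check at work and the error it prevents:

```lua
-- Rows mirror the four sample tuples; the second element is what
-- decoding each JSON string would yield (written by hand here).
local decoded_rows = {
  {444, {Item = 'widget',      Quantity = 15}},
  {445, {Item = 'widget',      Quantity = 7}},
  {446, {Item = 'golf club',   Quantity = 'sunshine'}},
  {447, {Item = 'waffle iron', Quantit = 3}},  -- misspelled key
}

local function sum_field(rows, field_name)
  local sum = 0
  for _, row in ipairs(rows) do
    local field_value = row[2][field_name]
    -- skip strings, nil (missing field) and any other non-number
    if type(field_value) == 'number' then
      sum = sum + field_value
    end
  end
  return sum
end

print(sum_field(decoded_rows, 'Quantity'))  -- prints 22

-- Without the type check, the "golf club" row would raise an error:
local ok = pcall(function() return 0 + 'sunshine' end)
print(ok)                                   -- prints false
```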

Tips on Lua syntax

The Lua syntax for data-manipulation functions can vary. Here are examples of the variations with select() requests. The same rules exist for the other data-manipulation functions.

Every one of the examples does the same thing: select a tuple set from a space named "tester" where the primary-key field value equals 1. For these examples, we assume that the numeric id of "tester" is 512, which happens to be the case in our sandbox example only.

First, there are three object reference variations:

-- #1 module . submodule . name
tarantool> box.space.tester:select{1}
-- #2 replace name with a literal in square brackets
tarantool> box.space['tester']:select{1}
-- #3 use a variable for the entire object reference
tarantool> s = box.space.tester
tarantool> s:select{1}

Examples in this manual usually have the «box.space.tester:» form (#1). However, this is a matter of user preference and all the variations exist in the wild.

Also, descriptions in this manual use the syntax «space_object:» for references to objects which are spaces, and «index_object:» for references to objects which are indexes (for example box.space.tester.index.primary:).

Then, there are seven parameter variations:

-- #1
tarantool> box.space.tester:select{1}
-- #2
tarantool> box.space.tester:select({1})
-- #3
tarantool> box.space.tester:select(1)
-- #4
tarantool> box.space.tester.select(box.space.tester,1)
-- #5
tarantool> box.space.tester:select({1},{iterator='EQ'})
-- #6
tarantool> variable = 1
tarantool> box.space.tester:select{variable}
-- #7
tarantool> variable = {1}
tarantool> box.space.tester:select(variable)

Lua lets you omit the parentheses () when calling a function whose only argument is a Lua table (or a string literal), and we sometimes do so in our examples. This is why select{1} is equivalent to select({1}). Literal values such as 1 (a scalar value) or {1} (a Lua table value) may be replaced by variable names, as in examples #6 and #7.

Although there are special cases where braces can be omitted, it is better to keep them because they signal «Lua table». Examples and descriptions in this manual use the {1} form. However, this too is a matter of user preference, and all the variations exist in the wild.
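Both rules are ordinary Lua syntax, so they can be tried in a standalone interpreter. In the sketch below, first() and the space table are hypothetical. The last two calls mirror parameter variation #4, since the colon in obj:method(...) is shorthand for obj.method(obj, ...):

```lua
-- Hypothetical one-argument function used to compare call syntaxes.
local function first(t)
  return t[1]
end

local r1 = first{1}         -- only argument is a table: no parentheses
local r2 = first({1})       -- the same call with parentheses
local variable = {1}
local r3 = first(variable)  -- a variable argument needs parentheses
print(r1, r2, r3)           -- prints 1 three times

-- The colon form is shorthand for passing the object as the first argument.
local space = {name = 'tester'}
function space.select(self, key)
  return self.name, key[1]
end
print(space:select{1})           -- prints tester and 1
print(space.select(space, {1}))  -- prints tester and 1
```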

Database objects have loose rules for names: the maximum length is 65000 bytes (not characters), and almost any legal Unicode character is allowed, including spaces, ideograms and punctuation.

In those cases, to prevent confusion with Lua operators and separators, object references should have the literal-in-square-brackets form (#2), or the variable form (#3). For example:

tarantool> box.space['1*A']:select{1}
tarantool> s = box.space['1*A !@$%^&*()_+12345678901234567890']
tarantool> s:select{1}

Disallowed:

Not recommended: characters which cannot be displayed.

Names are «case sensitive», so "A" and "a" are not the same.
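box.space can be indexed by name like a Lua table, so the naming rules above behave like ordinary Lua table keys. The plain-Lua sketch below uses a hypothetical spaces table to show both case sensitivity and the need for the bracket form with unusual names:

```lua
-- Hypothetical table standing in for box.space, indexed by name.
local spaces = {
  A = 'upper-case space',
  a = 'lower-case space',
  ['1*A'] = 'space with an unusual name',
}

print(spaces.A == spaces.a)  -- prints false: 'A' and 'a' are different keys
print(spaces['1*A'])         -- the bracket form accepts any string key
-- spaces.1*A would be a syntax error, which is why form #2 or #3 is needed
```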

Enterprise modules

This section covers open and closed source Lua modules for Tarantool Enterprise Edition included in the distribution as an offline rocks repository.

To use a module, install the following:

  1. All the necessary third-party software packages (if any). See the module’s prerequisites for the list.

  2. The module itself on every Tarantool instance:

    $ tt rocks install MODULE_NAME [MODULE_VERSION]
    

See the tt rocks reference to learn more about managing Lua modules.

Creating an application

Below, we walk you through key programming practices that will give you a good start in writing Lua applications for Tarantool. We will implement a real microservice based on Tarantool! It is a backend for a simplified version of Pokémon Go, a location-based augmented reality game launched in mid-2016.

In this game, players use the GPS capability of a mobile device to locate, catch, battle, and train virtual monsters called «pokémon» that appear on the screen as if they were in the same real-world location as the player.

To stay within the walk-through format, let’s narrow the original gameplay as follows. We have a map with pokémon spawn locations. Next, we have multiple players who can send catch-a-pokémon requests to the server (which runs our Tarantool microservice). The server responds whether the pokémon is caught or not, increases the player’s pokémon counter if yes, and triggers the respawn-a-pokémon method that spawns a new pokémon at the same location in a while.

We leave client-side applications outside the scope of this story. However, we promise a mini-demo in the end to simulate real users and give us some fun.


Follow these topics to implement our application:

Modules, rocks and applications

To make our game logic available to other developers and Lua applications, let’s put it into a Lua module.

A module (called «rock» in Lua) is an optional library which enhances Tarantool functionality. So, we can install our logic as a module in Tarantool and use it from any Tarantool application or module. Like applications, modules in Tarantool can be written in Lua (rocks), C or C++.

Modules are good for two things:

Technically, a module is a file with source code that exports its functions in an API. For example, here is a Lua module named mymodule.lua that exports one function named myfun:

local exports = {}
exports.myfun = function(input_string)
   print('Hello', input_string)
end
return exports

To call the function myfun() from another module, from a Lua application, or from Tarantool itself, we need to save this module as a file, then load it with the require() function and call the exported function.

For example, here’s a Lua application that uses myfun() function from mymodule.lua module:

-- load the module
local mymodule = require('mymodule')

-- call myfun() from within the test() function
local test = function()
  mymodule.myfun('world')
end

-- prints "Hello" and "world", separated by a tab
test()

Keep in mind that the require() function takes load paths to Lua modules from the package.path variable. This is a semicolon-separated string, where a question mark is used to interpolate the module name. By default, this variable contains system-wide Lua paths and the working directory. But if we put our modules inside a specific folder (for example, scripts/), we need to add this folder to package.path before any calls to require():

package.path = 'scripts/?.lua;' .. package.path
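The require() machinery can be seen without any files on disk. The plain-Lua sketch below registers a variant of mymodule through package.preload; this variant returns the greeting instead of printing it. require() consults package.preload before searching package.path. The sketch also shows that the module is loaded once and then served from the package.loaded cache:

```lua
-- Register 'mymodule' without a file; require() checks package.preload
-- before searching package.path.
package.preload['mymodule'] = function()
  local exports = {}
  exports.myfun = function(input_string)
    return 'Hello ' .. input_string
  end
  return exports
end

local m1 = require('mymodule')
local m2 = require('mymodule')
print(m1.myfun('world'))  -- prints Hello world
print(m1 == m2)           -- prints true: the second require() hits the cache

-- Prepending a folder to the search path, as shown above:
package.path = 'scripts/?.lua;' .. package.path
```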

For our microservice, a simple and convenient solution would be to put all methods in a Lua module (say pokemon.lua) and to write a Lua application (say game.lua) that initializes the gaming environment and starts the game loop.


Now let’s get down to implementation details. In our game, we need three entities:

We’ll store these entities as tuples in Tarantool spaces. But to deliver our backend application as a microservice, the good practice would be to send/receive our data in the universal JSON format, thus using Tarantool as a document storage.

Avro schemas

To store JSON data as tuples, we will apply a savvy practice which reduces data footprint and ensures all stored documents are valid. We will use Tarantool module avro-schema which checks the schema of a JSON document and converts it to a Tarantool tuple. The tuple will contain only field values, and thus take a lot less space than the original document. In avro-schema terms, converting JSON documents to tuples is «flattening», and restoring the original documents is «unflattening».

First you need to install the module with tt rocks install avro-schema.

Further usage is quite straightforward:

  1. For each entity, we need to define a schema in Apache Avro schema syntax, where we list the entity’s fields with their names and Avro data types.
  2. At initialization, we call avro-schema.create() that creates objects in memory for all schema entities, and compile() that generates flatten/unflatten methods for each entity.
  3. Further on, we just call flatten/unflatten methods for a respective entity on receiving/sending the entity’s data.
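To make "flattening" concrete, here is a hypothetical hand-written flatten/unflatten pair for the player entity. It only illustrates the idea of keeping field values in schema order. avro-schema generates its own methods with compile(), and the exact tuple layout and error handling it produces may differ. The (ok, result) return convention mirrors how the compiled methods are used later in this section.

```lua
-- Hypothetical hand-written flatten for the player entity:
-- validates the document and keeps only the values, in schema order.
local function flatten_player(p)
  if type(p.id) ~= 'number' or type(p.name) ~= 'string'
     or type(p.location) ~= 'table' then
    return false, 'invalid player document'
  end
  -- fields in schema order: id, name, location.x, location.y
  return true, {p.id, p.name, p.location.x, p.location.y}
end

-- Hypothetical unflatten: restores the field names from the positions.
local function unflatten_player(t)
  return true, {id = t[1], name = t[2], location = {x = t[3], y = t[4]}}
end

local ok, tuple = flatten_player({name = 'Ash', id = 1, location = {x = 1, y = 2}})
print(ok, tuple[1], tuple[2], tuple[3], tuple[4])  -- prints true 1 Ash 1 2
local _, doc = unflatten_player(tuple)
print(doc.name, doc.location.y)                    -- prints Ash 2
```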

Here’s what our schema definitions for the player and pokémon entities look like:

local schema = {
    player = {
        type="record",
        name="player_schema",
        fields={
            {name="id", type="long"},
            {name="name", type="string"},
            {
                name="location",
                type= {
                    type="record",
                    name="player_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    },
    pokemon = {
        type="record",
        name="pokemon_schema",
        fields={
            {name="id", type="long"},
            {name="status", type="string"},
            {name="name", type="string"},
            {name="chance", type="double"},
            {
                name="location",
                type= {
                    type="record",
                    name="pokemon_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    }
}

And here’s how we create and compile our entities at initialization:

-- load the avro-schema and log modules with require()
local avro = require('avro_schema')
local log = require('log')

-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
    -- compile models
    local ok_cm, compiled_pokemon = avro.compile(pokemon)
    local ok_cp, compiled_player = avro.compile(player)
    if ok_cm and ok_cp then
        -- start the game
        <...>
    else
        log.error('Schema compilation failed')
    end
else
    log.error('Schema creation failed')
end
return false

As for the map entity, a schema for it would be overkill. We have only one map in the game, it has very few fields, and, most importantly, we use the map only inside our logic and never expose it to external users.


Next, we need methods to implement the game logic. To simulate object-oriented programming in our Lua code, let's store all Lua functions and shared variables in a single local variable named game. This will allow us to refer to functions or variables from within our module as self.func_name or self.var_name. Like this:

local game = {
    -- a local variable
    num_players = 0,

    -- a method that prints a local variable
    hello = function(self)
      print('Hello! Your player number is ' .. self.num_players .. '.')
    end,

    -- a method that calls another method and returns a local variable
    sign_in = function(self)
      self.num_players = self.num_players + 1
      self:hello()
      return self.num_players
    end
}

In OOP terms, we can now regard local variables inside game as object fields, and local functions as object methods.
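Here is a short usage example. hello() is changed to return its greeting instead of printing it; this is a hypothetical tweak so the result can be checked. The colon call game:sign_in() is shorthand for game.sign_in(game), which is what lets methods reach the shared fields through self:

```lua
local game = {
  num_players = 0,

  -- variant of hello() that returns the greeting instead of printing it
  hello = function(self)
    return 'Hello! Your player number is ' .. self.num_players .. '.'
  end,

  sign_in = function(self)
    self.num_players = self.num_players + 1
    return self.num_players, self:hello()
  end,
}

-- The colon passes the table itself as self; these calls are equivalent:
local n1, msg1 = game:sign_in()
local n2, msg2 = game.sign_in(game)
print(msg1)              -- prints Hello! Your player number is 1.
print(msg2)              -- prints Hello! Your player number is 2.
print(game.num_players)  -- prints 2
```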

Note

In this manual, Lua examples use local variables. Use global variables with caution, since the module’s users may be unaware of them.

To enable/disable the use of undeclared global variables in your Lua code, use Tarantool’s strict module.

So, our game module will have the following methods:

Besides, it would be convenient to have methods for working with Tarantool storage. For example:

We’ll need these two methods primarily when initializing our game, but we can also call them later, for example to test our code.

Bootstrapping a database

Let’s discuss game initialization. In start() method, we need to populate Tarantool spaces with pokémon data. Why not keep all game data in memory? Why use a database? The answer is: persistence. Without a database, we risk losing data on power outage, for example. But if we store our data in an in-memory database, Tarantool takes care to persist it on disk whenever it’s changed. This gives us one more benefit: quick startup in case of failure. Tarantool has a smart algorithm that quickly loads all data from disk into memory on startup, so the warm-up takes little time.

We’ll be using functions from Tarantool built-in box module:

Notice the parts = argument in the index specification. The pokémon ID is the first field in a Tarantool tuple since it's the first member of the respective Avro type. Likewise, the pokémon status is the second field, since it's the second member. In the actual JSON document, however, the ID and status fields may appear at any position in the JSON map.

The implementation of start() method looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
        box.schema.create_space('pokemons')
        box.space.pokemons:create_index(
            "primary", {type = 'hash', parts = {1, 'unsigned'}}
        )
        box.space.pokemons:create_index(
            "status", {type = "tree", parts = {2, 'string'}}
        )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            <...>
        else
            log.error('Schema compilation failed')
        end
    else
        log.error('Schema creation failed')
    end
    return false
end

GIS

Now let’s discuss catch(), which is the main method in our gaming logic.

Here we receive the player’s coordinates and the target pokémon’s ID number, and we need to answer whether the player has actually caught the pokémon or not (remember that each pokémon has a chance to escape).

First thing, we validate the received player data against its Avro schema. And we check whether such a pokémon exists in our database and is displayed on the map (the pokémon must have the active status):

catch = function(self, pokemon_id, player)
    -- check player data
    local ok, tuple = self.player_model.flatten(player)
    if not ok then
        return false
    end
    -- get pokemon data
    local p_tuple = box.space.pokemons:get(pokemon_id)
    if p_tuple == nil then
        return false
    end
    local ok, pokemon = self.pokemon_model.unflatten(p_tuple)
    if not ok then
        return false
    end
    if pokemon.status ~= self.state.ACTIVE then
        return false
    end
    -- more catch logic to follow
    <...>
end

Next, we calculate the answer: caught or not.

To work with geographical coordinates, we use Tarantool gis module.

To keep things simple, we don't load any specific map, assuming that we deal with a world map. And we do not validate incoming coordinates, assuming again that all received locations are on planet Earth.

We use two geo-specific variables:

Both these systems are listed in the EPSG Geodetic Parameter Registry, where each system has a unique number. In our code, we assign these listing numbers to respective variables:

wgs84 = 4326,
nationalmap = 2163,

For our game logic, we need one more variable, catch_distance, which defines how close a player must get to a pokémon before trying to catch it. Let’s set the distance to 100 meters.

catch_distance = 100,

Now we’re ready to calculate the answer. We need to project the current location of both player (p_pos) and pokémon (m_pos) on the map, check whether the player is close enough to the pokémon (using catch_distance), and calculate whether the player has caught the pokémon (here we generate some random value and let the pokémon escape if the random value happens to be less than 100 minus pokémon’s chance value):

-- project locations
local m_pos = gis.Point(
    {pokemon.location.x, pokemon.location.y}, self.wgs84
):transform(self.nationalmap)
local p_pos = gis.Point(
    {player.location.x, player.location.y}, self.wgs84
):transform(self.nationalmap)

-- check catch distance condition
if p_pos:distance(m_pos) > self.catch_distance then
    return false
end
-- try to catch pokemon
local caught = math.random(100) >= 100 - pokemon.chance
if caught then
    -- update and notify on success
    box.space.pokemons:update(
        pokemon_id, {{'=', self.STATUS, self.state.CAUGHT}}
    )
    self:notify(player, pokemon)
end
return caught
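How likely is a catch? math.random(100) returns an integer from 1 to 100 with equal probability. So the exact catch probability for a given chance value can be computed by counting outcomes. Here is a plain-Lua sketch, where catch_probability() is a hypothetical helper:

```lua
-- Hypothetical helper: counts how many of the 100 equally likely results
-- of math.random(100) satisfy the catch condition used in catch().
local function catch_probability(chance)
  local hits = 0
  for r = 1, 100 do
    if r >= 100 - chance then
      hits = hits + 1
    end
  end
  return hits / 100
end

print(catch_probability(99.1))  -- always caught (probability 1)
print(catch_probability(50))    -- probability 0.51
print(catch_probability(0))     -- still caught in 1 case out of 100
```

For integer chance values from 0 to 99, this works out to (chance + 1) / 100. Pikachu's 99.1 gives a certain catch once the player is within catch_distance.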

Index iterators

In our gameplay, all caught pokémons are returned to the map. We do this for all pokémons on the map every 60 seconds using the respawn() method. We iterate through the pokémons by status using the Tarantool index iterator function index_object:pairs(), and we reset the statuses of all «caught» pokémons back to «active» using box.space.pokemons:update().

respawn = function(self)
    fiber.name('Respawn fiber')
    for _, tuple in box.space.pokemons.index.status:pairs(
           self.state.CAUGHT) do
        box.space.pokemons:update(
            tuple[self.ID],
            {{'=', self.STATUS, self.state.ACTIVE}}
        )
    end
end

For readability, we introduce named fields:

ID = 1, STATUS = 2,

The complete implementation of start() now looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
        box.schema.create_space('pokemons')
        box.space.pokemons:create_index(
            "primary", {type = 'hash', parts = {1, 'unsigned'}}
        )
        box.space.pokemons:create_index(
            "status", {type = "tree", parts = {2, 'string'}}
        )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            self.pokemon_model = compiled_pokemon
            self.player_model = compiled_player
            -- the colon passes self, which respawn() needs
            self:respawn()
            log.info('Started')
            return true
        else
            log.error('Schema compilation failed')
        end
    else
        log.error('Schema creation failed')
    end
    return false
end

Fibers, yields and cooperative multitasking

But wait! If we launch it as shown above, with self:respawn(), the function will be executed only once, just like all the other methods. But we need to execute respawn() every 60 seconds. Creating a fiber is the Tarantool way of making application logic work in the background at all times.

A fiber is a set of instructions that are executed with cooperative multitasking: the instructions contain yield signals, upon which control is passed to another fiber.
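Tarantool fibers are not plain Lua coroutines, but standard Lua coroutines illustrate the same cooperative idea in a few lines. Each task runs until it explicitly yields, and a scheduler decides who runs next. In the sketch below, worker() and the round-robin loop are hypothetical. In Tarantool, calls such as fiber.sleep() and fiber.yield() hand control to the fiber scheduler in a similar way.

```lua
-- Each worker appends "<name><step>" to a shared trace and then yields.
local trace = {}

local function worker(name, steps)
  return coroutine.create(function()
    for i = 1, steps do
      trace[#trace + 1] = name .. i
      coroutine.yield()  -- voluntarily give the other task a turn
    end
  end)
end

-- A tiny round-robin scheduler: resume every live task until all are dead.
local tasks = {worker('A', 2), worker('B', 3)}
local any_alive = true
while any_alive do
  any_alive = false
  for _, co in ipairs(tasks) do
    if coroutine.status(co) ~= 'dead' then
      assert(coroutine.resume(co))
      any_alive = true
    end
  end
end

print(table.concat(trace, ' '))  -- prints A1 B1 A2 B2 B3
```

Neither task is ever interrupted. The interleaving happens only at the explicit yield points, which is also why a long-running fiber that never yields would block all the others.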

Let’s launch respawn() in a fiber to make it work in the background all the time. To do so, we’ll need to amend respawn():

respawn = function(self)
    -- let's give our fiber a name;
    -- this will produce neat output in fiber.info()
    fiber.name('Respawn fiber')
    while true do
        for _, tuple in box.space.pokemons.index.status:pairs(
                self.state.CAUGHT) do
            box.space.pokemons:update(
                tuple[self.ID],
                {{'=', self.STATUS, self.state.ACTIVE}}
            )
        end
        fiber.sleep(self.respawn_time)
    end
end

and call it as a fiber in start():

start = function(self)
    -- create spaces and indexes
        <...>
    -- create models
        <...>
    -- compile models
        <...>
    -- start the game
       self.pokemon_model = compiled_pokemon
       self.player_model = compiled_player
       fiber.create(self.respawn, self)
       log.info('Started')
    -- errors if schema creation or compilation fails
       <...>
end

Logging

One more helpful function that we used in start() was log.info() from the Tarantool log module. We also need this function in notify() to add a record to the log file on every successful catch:

-- event notification
notify = function(self, player, pokemon)
    log.info("Player '%s' caught '%s'", player.name, pokemon.name)
end

We use default Tarantool log settings, so we’ll see the log output in console when we launch our application in script mode.


Great! We’ve discussed all programming practices used in our Lua module (see pokemon.lua).

Now let’s prepare the test environment. As planned, we write a Lua application (see game.lua) to initialize Tarantool’s database module, initialize our game, call the game loop and simulate a couple of player requests.

To launch our microservice, we put both the pokemon.lua module and the game.lua application in the current directory, install all external modules, and launch the Tarantool instance running our game.lua application (this example is for Ubuntu):

$ ls
game.lua  pokemon.lua
$ sudo apt-get install tarantool-gis
$ sudo apt-get install tarantool-avro-schema
$ tarantool game.lua

Tarantool starts and initializes the database. Then Tarantool executes the demo logic from game.lua: adds a pokémon named Pikachu (its chance to be caught is very high, 99.1), displays the current map (it contains one active pokémon, Pikachu) and processes catch requests from two players. Player1 is located just near the lonely Pikachu pokémon and Player2 is located far away from it. As expected, the catch results in this output are «true» for Player1 and «false» for Player2. Finally, Tarantool displays the current map which is empty, because Pikachu is caught and temporarily inactive:

$ tarantool game.lua
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> version 1.7.3-43-gf5fa1e1
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> log level 5
2017-01-09 20:19:24.605 [6282] main/101/game.lua I> mapping 1073741824 bytes for tuple arena...
2017-01-09 20:19:24.609 [6282] main/101/game.lua I> initializing an empty data directory
2017-01-09 20:19:24.634 [6282] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2017-01-09 20:19:24.635 [6282] snapshot/101/main I> done
2017-01-09 20:19:24.641 [6282] main/101/game.lua I> ready to accept requests
2017-01-09 20:19:24.786 [6282] main/101/game.lua I> Started
---
- {'id': 1, 'status': 'active', 'location': {'y': 2, 'x': 1}, 'name': 'Pikachu', 'chance': 99.1}
...

2017-01-09 20:19:24.789 [6282] main/101/game.lua I> Player 'Player1' caught 'Pikachu'
true
false
--- []
...

2017-01-09 20:19:24.789 [6282] main C> entering the event loop

nginx

In real life, this microservice would work over HTTP. Let's add the nginx web server to our environment and make a similar demo. But how do we make Tarantool methods callable via a REST API? We use nginx with the Tarantool nginx upstream module and create one more Lua script (app.lua) that exports three of our game methods – add_pokemon(), map(), and catch() – as REST endpoints of the nginx upstream module:

local game = require('pokemon')
box.cfg{listen=3301}
game:start()

-- add, map and catch functions exposed to REST API
function add(request, pokemon)
    return {
        result=game:add_pokemon(pokemon)
    }
end

function map(request)
    return {
        map=game:map()
    }
end

function catch(request, pid, player)
    local id = tonumber(pid)
    if id == nil then
        return {result=false}
    end
    return {
        result=game:catch(id, player)
    }
end

An easy way to configure and launch nginx would be to create a Docker container based on a Docker image with nginx and the upstream module already installed (see http/Dockerfile). We take a standard nginx.conf, where we define an upstream with our Tarantool backend running (this is another Docker container, see details below):

upstream tnt {
      server pserver:3301 max_fails=1 fail_timeout=60s;
      keepalive 250000;
}

and add some Tarantool-specific parameters (see descriptions in the upstream module’s README file):

server {
  server_name tnt_test;

  listen 80 default deferred reuseport so_keepalive=on backlog=65535;

  location = / {
      root /usr/local/nginx/html;
  }

  location /api {
    # answers check infinity timeout
    tnt_read_timeout 60m;
    if ( $request_method = GET ) {
       tnt_method "map";
    }
    tnt_http_rest_methods get;
    tnt_http_methods all;
    tnt_multireturn_skip_count 2;
    tnt_pure_result on;
    tnt_pass_http_request on parse_args;
    tnt_pass tnt;
  }
}

Likewise, we put the Tarantool server and all our game logic in a second Docker container based on the official Tarantool 1.9 image (see src/Dockerfile) and set the container's default command to tarantool app.lua. This is the backend.

Non-blocking IO

To test the REST API, we create a new script (client.lua), which is similar to our game.lua application, but makes HTTP POST and GET requests rather than calling Lua functions:

local http = require('curl').http()
local json = require('json')
local URI = os.getenv('SERVER_URI')
local fiber = require('fiber')

local player1 = {
    name="Player1",
    id=1,
    location = {
        x=1.0001,
        y=2.0003
    }
}
local player2 = {
    name="Player2",
    id=2,
    location = {
        x=30.123,
        y=40.456
    }
}

local pokemon = {
    name="Pikachu",
    chance=99.1,
    id=1,
    status="active",
    location = {
        x=1,
        y=2
    }
}

function request(method, body, id)
    local resp = http:request(
        method, URI, body
    )
    if id ~= nil then
        print(string.format('Player %d result: %s',
            id, resp.body))
    else
        print(resp.body)
    end
end

local players = {}
function catch(player)
    fiber.sleep(math.random(5))
    print('Catch pokemon by player ' .. tostring(player.id))
    -- Build the JSON body with json.encode(): a short Lua string
    -- literal cannot span several lines
    request(
        'POST',
        json.encode({method = 'catch', params = {1, player}}),
        player.id
    )
    table.insert(players, player.id)
end

print('Create pokemon')
request('POST', json.encode({method = 'add', params = {pokemon}}))
request('GET', '')

fiber.create(catch, player1)
fiber.create(catch, player2)

-- wait for players
while #players ~= 2 do
    fiber.sleep(0.001)
end

request('GET', '')
os.exit()

When you run this script, you’ll notice that both players have equal chances to make the first attempt at catching the pokémon. In a classical Lua script, a networked call blocks the script until it’s finished, so the first catch attempt can only be done by the player who entered the game first. In Tarantool, both players play concurrently, since all modules are integrated with Tarantool cooperative multitasking and use non-blocking I/O.

Indeed, when Player1 makes its first REST call, the script doesn't block. The fiber running the catch() function on behalf of Player1 issues a non-blocking call to the operating system and yields control to the next fiber, which happens to be the fiber of Player2. Player2's fiber does the same. When the network response is received, Player1's fiber is activated by the Tarantool cooperative scheduler and resumes its work. All Tarantool modules use non-blocking I/O and are integrated with the Tarantool cooperative scheduler. For module developers, Tarantool provides an API to do the same in their own modules.
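The interleaving described above can be illustrated with a toy cooperative scheduler. The following C sketch is not Tarantool's fiber implementation: it assumes Linux/glibc, where the POSIX ucontext functions getcontext(), makecontext(), and swapcontext() are available, and it replaces the network round trip with a single yield. The names player(), fiber_yield(), and run_players() are hypothetical. Each player "sends" a request, yields instead of blocking, and "receives" its answer on its next turn.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

/* Toy model of cooperative multitasking (Linux/glibc ucontext).
 * Not Tarantool's fiber implementation: the "network" is simulated
 * by yielding once between sending and receiving. */

#define NFIBERS 2
#define STACK_SIZE (64 * 1024)
#define TRACE_SIZE 32

static ucontext_t sched_ctx;                 /* the scheduler's context */
static ucontext_t fiber_ctx[NFIBERS];
static char fiber_stack[NFIBERS][STACK_SIZE];
static int fiber_done[NFIBERS];
static int current;                          /* index of the running fiber */
static char trace[TRACE_SIZE];               /* recorded events, e.g. "S1 " */

static void record(char kind, int id)
{
    char ev[4] = { kind, (char)('0' + id), ' ', '\0' };
    assert(strlen(trace) + strlen(ev) < TRACE_SIZE);
    strcat(trace, ev);
}

/* Hand control back to the scheduler; execution resumes here
 * the next time the scheduler switches to this fiber. */
static void fiber_yield(void)
{
    swapcontext(&fiber_ctx[current], &sched_ctx);
}

/* A "player": send a request, yield instead of blocking while the
 * answer is pending, then process the answer. */
static void player(int id)
{
    record('S', id);           /* non-blocking request issued */
    fiber_yield();             /* let the other fiber run meanwhile */
    record('R', id);           /* resumed: the response is handled */
    fiber_done[id - 1] = 1;
}                              /* returning jumps to uc_link (the scheduler) */

/* Round-robin scheduler: resume each unfinished fiber in turn until
 * all are done. Returns the event trace, or NULL on failure. */
const char *run_players(void)
{
    trace[0] = '\0';
    for (int i = 0; i < NFIBERS; ++i) {
        fiber_done[i] = 0;
        if (getcontext(&fiber_ctx[i]) != 0)
            return NULL;
        fiber_ctx[i].uc_stack.ss_sp = fiber_stack[i];
        fiber_ctx[i].uc_stack.ss_size = sizeof fiber_stack[i];
        fiber_ctx[i].uc_link = &sched_ctx;
        makecontext(&fiber_ctx[i], (void (*)(void))player, 1, i + 1);
    }
    int alive;
    do {
        alive = 0;
        for (current = 0; current < NFIBERS; ++current) {
            if (fiber_done[current])
                continue;
            /* Run the fiber until it yields or finishes. */
            swapcontext(&sched_ctx, &fiber_ctx[current]);
            if (!fiber_done[current])
                ++alive;
        }
    } while (alive > 0);
    return trace;
}
```

run_players() returns "S1 S2 R1 R2 ": both players issue their requests before either one processes a response. Because fibers switch only at fiber_yield(), the shared trace buffer needs no locking, which is the usual trade-off of cooperative multitasking.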

For our HTTP test, we create a third container based on the official Tarantool 1.9 image (see client/Dockerfile) and set the container’s default command to tarantool client.lua.


To run this test locally, download our pokemon project from GitHub and run:

$ docker-compose build
$ docker-compose up

Docker Compose builds and runs all three containers: pserver (Tarantool backend), phttp (nginx), and pclient (demo client). You can see log messages from all these containers in the console. pclient reports that it made an HTTP request to create a pokémon, made two catch requests, requested the map (empty, since the pokémon is caught and temporarily inactive), and exited:

pclient_1  | Create pokemon
<...>
pclient_1  | {"result":true}
pclient_1  | {"map":[{"id":1,"status":"active","location":{"y":2,"x":1},"name":"Pikachu","chance":99.100000}]}
pclient_1  | Catch pokemon by player 2
pclient_1  | Catch pokemon by player 1
pclient_1  | Player 1 result: {"result":true}
pclient_1  | Player 2 result: {"result":false}
pclient_1  | {"map":[]}
pokemon_pclient_1 exited with code 0

Congratulations! Here’s the end point of our walk-through. As further reading, see more about installing and contributing a module.

See also reference on Tarantool modules and C API, and don’t miss our Lua cookbook recipes.

C tutorial

Tarantool can call C code with modules, with ffi, or with C stored procedures. This tutorial covers only the third method, C stored procedures. In fact, the routines are always C functions, but the phrase "stored procedure" is commonly used for historical reasons.

This tutorial can be followed by anyone who has a Tarantool development package and a C compiler. It consists of five tasks:

  1. easy.c – prints "hello world";
  2. harder.c – decodes a passed parameter value;
  3. hardest.c – uses the C API to insert into the database;
  4. read.c – uses the C API to select from the database;
  5. write.c – uses the C API to replace in the database.

By the end of the tutorial, you will have seen the results described here and will be able to write your own stored procedures.

Check that the following items exist on your computer:

  • Tarantool 2.1 or later
  • A GCC compiler; any modern version will do
  • module.h and the files it includes
  • msgpuck.h
  • libmsgpuck.a (only for some recent versions of msgpuck)

The module.h file will exist if Tarantool was installed from source. Otherwise, install the Tarantool "developer" package. For example, on Ubuntu, run:

$ sudo apt-get install tarantool-dev

or on Fedora, run:

$ dnf -y install tarantool-devel

The msgpuck.h file will exist if Tarantool was installed from source. Otherwise, the msgpuck package must be installed from https://github.com/tarantool/msgpuck.

For the C compiler to find the module.h and msgpuck.h files, the paths to them should be saved in a variable. For example, if module.h is at /usr/local/include/tarantool/module.h and msgpuck.h is at /usr/local/include/msgpuck/msgpuck.h, run:

$ export CPATH=/usr/local/include/tarantool:/usr/local/include/msgpuck

The libmsgpuck.a static library is needed for msgpuck versions produced after February 2017. Put libmsgpuck.a on the path only if you encounter link problems when using the GCC statements in this tutorial's examples (libmsgpuck.a is produced from both the msgpuck and the Tarantool source downloads, so it should be easy to find). For example, instead of "gcc -shared -o harder.so -fPIC harder.c" for the second example below, you will need to say "gcc -shared -o harder.so -fPIC harder.c libmsgpuck.a".

Requests will be done using Tarantool as a client. Start Tarantool and enter these requests:

box.cfg{listen=3306}
box.schema.space.create('capi_test')
box.space.capi_test:create_index('primary')
net_box = require('net.box')
capi_connection = net_box:new(3306)

In plainer language: create a space named capi_test, and make a connection to self named capi_connection.

Leave the client running. It will be used to enter more requests later.

Start another terminal. Change directory (cd) so that it is the same as the directory that the client is running in.

Create a file. Name it easy.c. Put the following code in it:

#include <stdio.h>
#include "module.h"
int easy(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  printf("hello world\n");
  return 0;
}
int easy2(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  printf("hello world -- easy2\n");
  return 0;
}

Compile the program, which will produce a library file named easy.so:

$ gcc -shared -o easy.so -fPIC easy.c

Now go back to the client and execute these requests:

box.schema.func.create('easy', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'easy')
capi_connection:call('easy')

If these requests look unfamiliar, reread the descriptions of box.schema.func.create(), box.schema.user.grant(), and conn:call().

The function that matters is capi_connection:call('easy').

Its first job is to find the 'easy' function, which should be easy because by default Tarantool looks in the current directory for a file named easy.so.

Its second job is to call the 'easy' function. Since the easy() function in easy.c begins with printf("hello world\n"), the words "hello world" will appear on the screen.

Its third job is to check that the call was successful. Since the easy() function in easy.c ends with return 0, there is no error message to display and the request is over.

The result should look like this:

tarantool> capi_connection:call('easy')
hello world
---
- []
...

Now let's call the other function in easy.c – easy2(). This is almost the same as the easy() function, but there's a detail: when the file name is not the same as the function name, you have to specify file-name.function-name.

box.schema.func.create('easy.easy2', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'easy.easy2')
capi_connection:call('easy.easy2')

… and this time the result will be "hello world -- easy2".

Conclusion: calling a C function is easy.

Go back to the terminal where the easy.c program was created.

Create a file. Name it harder.c. Put the following code in it:

#include <stdio.h>
#include "module.h"
#include "msgpuck.h"
int harder(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  uint32_t arg_count = mp_decode_array(&args);
  printf("arg_count = %u\n", arg_count);
  uint32_t field_count = mp_decode_array(&args);
  printf("field_count = %u\n", field_count);
  uint32_t val;
  uint32_t i;
  for (i = 0; i < field_count; ++i)
  {
    val = mp_decode_uint(&args);
    printf("val=%u.\n", val);
  }
  return 0;
}

Compile the program, which will produce a library file named harder.so:

$ gcc -shared -o harder.so -fPIC harder.c

Now go back to the client and execute these requests:

box.schema.func.create('harder', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'harder')
passable_table = {}
table.insert(passable_table, 1)
table.insert(passable_table, 2)
table.insert(passable_table, 3)
capi_connection:call('harder', {passable_table})

This time the call passes a Lua table (passable_table) to the harder() function. The harder() function will see it in the char *args parameter.

At this point the harder() function will start using functions defined in msgpuck.h. The routines that begin with «mp» are msgpuck functions that handle data formatted according to the MsgPack specification. Passes and returns are always done with this format so one must become acquainted with msgpuck to become proficient with the C API.

However, for now it's enough to know that mp_decode_array() returns the number of elements in an array, and mp_decode_uint() returns an unsigned integer, both from args. There is also a side effect: when the decoding finishes, args has changed and now points to the next element.

Therefore the first displayed line will be "arg_count = 1" because only one item was passed: passable_table.
The second displayed line will be "field_count = 3" because there are three items in the table.
The next three lines will show "1", "2", and "3" because those are the values of the items in the table.

And now the screen looks like this:

tarantool> capi_connection:call('harder', {passable_table})
arg_count = 1
field_count = 3
val=1.
val=2.
val=3.
---
- []
...

Conclusion: decoding parameter values passed to a C function is not easy at first, but there are routines to do the job, they're documented, and there aren't very many of them.
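To see what mp_decode_array() and mp_decode_uint() do with args, here is a standalone C sketch that decodes the same bytes by hand, so it compiles without msgpuck.h. It assumes the arguments arrive as the MsgPack array [[1, 2, 3]] (which matches the arg_count = 1 and field_count = 3 output above) and handles only the few MsgPack formats needed here; the sketch_* names, sum_args(), and harder_args are hypothetical. Like the msgpuck routines, each decoder advances the pointer past the value it reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mp_decode_array(): reads an array header
 * and advances *data past it. Only fixarray and array 16 are handled. */
uint32_t sketch_decode_array(const char **data)
{
    const unsigned char *p = (const unsigned char *)*data;
    uint32_t n;
    if ((p[0] & 0xf0) == 0x90) {          /* fixarray: 1001xxxx */
        n = p[0] & 0x0f;
        *data += 1;
    } else {
        assert(p[0] == 0xdc);             /* array 16: big-endian length */
        n = ((uint32_t)p[1] << 8) | p[2];
        *data += 3;
    }
    return n;
}

/* Hypothetical stand-in for mp_decode_uint(): only positive fixint,
 * uint 8 and uint 16 are handled. */
uint64_t sketch_decode_uint(const char **data)
{
    const unsigned char *p = (const unsigned char *)*data;
    uint64_t v;
    if (p[0] <= 0x7f) {                   /* positive fixint */
        v = p[0];
        *data += 1;
    } else if (p[0] == 0xcc) {            /* uint 8 */
        v = p[1];
        *data += 2;
    } else {
        assert(p[0] == 0xcd);             /* uint 16, big-endian */
        v = ((uint64_t)p[1] << 8) | p[2];
        *data += 3;
    }
    return v;
}

/* Same walk as harder(), but sums the values instead of printing them. */
uint64_t sum_args(const char *args)
{
    uint32_t arg_count = sketch_decode_array(&args);   /* outer array: 1 */
    assert(arg_count == 1);
    uint32_t field_count = sketch_decode_array(&args); /* the table: 3 */
    uint64_t sum = 0;
    for (uint32_t i = 0; i < field_count; ++i)
        sum += sketch_decode_uint(&args);
    return sum;
}

/* MsgPack bytes for [[1, 2, 3]]: 0x91 = array of 1, 0x93 = array of 3. */
const char harder_args[] = { (char)0x91, (char)0x93, 0x01, 0x02, 0x03 };
```

Each call moves the pointer forward, which is the "side effect" mentioned earlier: after the two array headers, the pointer sits on the first value.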

Go back to the terminal where the easy.c and harder.c programs were created.

Create a file. Name it hardest.c. Put the following code in it:

#include <string.h>
#include "module.h"
#include "msgpuck.h"
int hardest(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  uint32_t space_id = box_space_id_by_name("capi_test", strlen("capi_test"));
  char tuple[1024]; /* Must be big enough for mp_encode results */
  char *tuple_pointer = tuple;
  tuple_pointer = mp_encode_array(tuple_pointer, 2);
  tuple_pointer = mp_encode_uint(tuple_pointer, 10000);
  tuple_pointer = mp_encode_str(tuple_pointer, "String 2", 8);
  int n = box_insert(space_id, tuple, tuple_pointer, NULL);
  return n;
}

Compile the program, which will produce a library file named hardest.so:

$ gcc -shared -o hardest.so -fPIC hardest.c

Now go back to the client and execute these requests:

box.schema.func.create('hardest', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'hardest')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test')
capi_connection:call('hardest')

This time the C function does three things:

  1. finds the numeric identifier of the capi_test space by calling box_space_id_by_name();
  2. formats a tuple using more msgpuck.h functions;
  3. inserts the tuple using box_insert().

Warning

char tuple[1024]; is used here just as a quick way of saying "allocate more than enough bytes". For serious programs, the developer must be careful to allow enough space for all the bytes that the mp_encode routines will use up.
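One way to follow this warning is to compute the exact size before encoding. msgpuck provides mp_sizeof_array(), mp_sizeof_uint(), and mp_sizeof_str() for that purpose; the C sketch below re-implements their size rules by hand so that it compiles without msgpuck.h, and encodes the same [10000, 'String 2'] tuple that hardest.c inserts. The sketch_* and encode_tuple() names are hypothetical, and the encoder supports only the formats this tuple needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Size rules of the MsgPack formats (the job of msgpuck's mp_sizeof_*). */
size_t sketch_sizeof_array(uint32_t n)
{
    return n <= 15 ? 1 : n <= 0xffff ? 3 : 5;   /* fixarray / 16 / 32 */
}

size_t sketch_sizeof_uint(uint64_t v)
{
    if (v <= 0x7f) return 1;            /* positive fixint */
    if (v <= 0xff) return 2;            /* uint 8 */
    if (v <= 0xffff) return 3;          /* uint 16 */
    if (v <= 0xffffffffu) return 5;     /* uint 32 */
    return 9;                           /* uint 64 */
}

size_t sketch_sizeof_str(uint32_t len)
{
    if (len <= 31) return 1 + len;      /* fixstr */
    if (len <= 0xff) return 2 + len;    /* str 8 */
    if (len <= 0xffff) return 3 + len;  /* str 16 */
    return 5 + len;                     /* str 32 */
}

/* Encode [num, str] into a heap buffer of exactly the right size.
 * Only num <= 0xffff and strings up to 31 bytes are supported. */
char *encode_tuple(uint64_t num, const char *str, size_t *out_size)
{
    uint32_t len = (uint32_t)strlen(str);
    assert(num <= 0xffff && len <= 31);
    size_t size = sketch_sizeof_array(2) + sketch_sizeof_uint(num)
                + sketch_sizeof_str(len);
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    unsigned char *p = (unsigned char *)buf;
    *p++ = 0x92;                              /* fixarray of 2 elements */
    if (num <= 0x7f) {
        *p++ = (unsigned char)num;            /* positive fixint */
    } else if (num <= 0xff) {
        *p++ = 0xcc;                          /* uint 8 */
        *p++ = (unsigned char)num;
    } else {
        *p++ = 0xcd;                          /* uint 16, big-endian */
        *p++ = (unsigned char)(num >> 8);
        *p++ = (unsigned char)(num & 0xff);
    }
    *p++ = (unsigned char)(0xa0 | len);       /* fixstr header */
    memcpy(p, str, len);
    p += len;
    assert((size_t)(p - (unsigned char *)buf) == size);  /* exact fit */
    *out_size = size;
    return buf;
}
```

With these rules the tuple takes 13 bytes: 1 for the array header, 3 for 10000 (encoded as uint 16), and 9 for the 8-byte string with its one-byte header.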

Now, still on the client, execute this request:

box.space.capi_test:select()

The result should look like this:

tarantool> box.space.capi_test:select()
---
- - [10000, 'String 2']
...

This proves that the hardest() function succeeded, but where did box_space_id_by_name() and box_insert() come from? Answer: the C API.

Go back to the terminal where the easy.c, harder.c, and hardest.c programs were created.

Create a file. Name it read.c. Put the following code in it:

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "module.h"
#include <msgpuck.h>
int read(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  char tuple_buf[1024];      /* where the raw MsgPack tuple will be stored */
  uint32_t space_id = box_space_id_by_name("capi_test", strlen("capi_test"));
  uint32_t index_id = 0;     /* The number of the space's first index */
  uint32_t key = 10000;      /* The key value that box_insert() used */
  mp_encode_array(tuple_buf, 0); /* clear */
  box_tuple_format_t *fmt = box_tuple_format_default();
  box_tuple_t *tuple = NULL;
  char key_buf[16];          /* Pass key_buf = encoded key = 10000 */
  char *key_end = key_buf;
  key_end = mp_encode_array(key_end, 1);
  key_end = mp_encode_uint(key_end, key);
  assert(key_end <= key_buf + sizeof(key_buf));
  /* Get the tuple. There's no box_select() but there's this. */
  int r = box_index_get(space_id, index_id, key_buf, key_end, &tuple);
  assert(r == 0);
  assert(tuple != NULL);
  /* Get each field of the tuple + display what you get. */
  int field_no;             /* The first field number is 0. */
  for (field_no = 0; field_no < 2; ++field_no)
  {
    const char *field = box_tuple_field(tuple, field_no);
    assert(field != NULL);
    assert(mp_typeof(*field) == MP_STR || mp_typeof(*field) == MP_UINT);
    if (mp_typeof(*field) == MP_UINT)
    {
      uint32_t uint_value = mp_decode_uint(&field);
      printf("uint value=%u.\n", uint_value);
    }
    else /* if (mp_typeof(*field) == MP_STR) */
    {
      const char *str_value;
      uint32_t str_value_length;
      str_value = mp_decode_str(&field, &str_value_length);
      printf("string value=%.*s.\n", str_value_length, str_value);
    }
  }
  return 0;
}

Compile the program, which will produce a library file named read.so:

$ gcc -shared -o read.so -fPIC read.c

Now go back to the client and execute these requests:

box.schema.func.create('read', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'read')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test')
capi_connection:call('read')

This time the C function does four things:

  1. once again, finds the numeric identifier of the capi_test space by calling box_space_id_by_name();
  2. formats a search key = 10000 using more msgpuck.h functions;
  3. gets a tuple using box_index_get();
  4. goes through the tuple's fields with box_tuple_field() and then decodes each field depending on its type. In this case, since what we are getting is the tuple that we inserted with hardest.c, we know in advance that the type is either MP_UINT or MP_STR; however, it's very common to have a switch statement here with one case for each possible type.
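The switch-per-type approach mentioned in step 4 can be sketched in standalone C. This is not the msgpuck API: the sketch_type enum, sketch_typeof(), and describe_field() are hypothetical, and only positive fixint, uint 8, uint 16, and fixstr are recognized, which covers the tuple inserted by hardest.c. A real stored procedure would call mp_typeof() and switch over MP_UINT, MP_STR, and the other mp_type values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of MsgPack types, in the spirit of msgpuck's mp_type. */
enum sketch_type { SK_UINT, SK_STR, SK_OTHER };

/* Classify a field by its first byte (only the formats decoded below). */
enum sketch_type sketch_typeof(unsigned char c)
{
    if (c <= 0x7f || c == 0xcc || c == 0xcd)
        return SK_UINT;                 /* fixint, uint 8, uint 16 */
    if ((c & 0xe0) == 0xa0)
        return SK_STR;                  /* fixstr: 101xxxxx */
    return SK_OTHER;
}

/* Decode one field, advance *data past it, and describe it in out
 * the same way read.c prints it. Returns -1 for unsupported types. */
int describe_field(const char **data, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)*data;
    assert(out != NULL && out_size > 0);
    switch (sketch_typeof(p[0])) {
    case SK_UINT: {
        uint32_t v;
        if (p[0] <= 0x7f)      { v = p[0]; *data += 1; }
        else if (p[0] == 0xcc) { v = p[1]; *data += 2; }
        else                   { v = ((uint32_t)p[1] << 8) | p[2]; *data += 3; }
        snprintf(out, out_size, "uint value=%u.", (unsigned)v);
        return 0;
    }
    case SK_STR: {
        uint32_t len = p[0] & 0x1f;
        snprintf(out, out_size, "string value=%.*s.", (int)len,
                 (const char *)p + 1);
        *data += 1 + len;
        return 0;
    }
    default:
        return -1;  /* a full version would add cases for nil, arrays, ... */
    }
}
```

Feeding it the fields of the [10000, 'String 2'] tuple, one after another, yields the same two lines that read() prints.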

The result of capi_connection:call('read') should look like this:

tarantool> capi_connection:call('read')
uint value=10000.
string value=String 2.
---
- []
...

This proves that the read() function succeeded. Once again, the important functions that start with box – box_index_get() and box_tuple_field() – came from the C API.

Go back to the terminal where the easy.c, harder.c, hardest.c, and read.c programs were created.

Create a file. Name it write.c. Put the following code in it:

#include <string.h>
#include "module.h"
#include <msgpuck.h>
int write(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  static const char *space = "capi_test";
  char tuple_buf[1024]; /* Must be big enough for mp_encode results */
  uint32_t space_id = box_space_id_by_name(space, strlen(space));
  if (space_id == BOX_ID_NIL) {
    return box_error_set(__FILE__, __LINE__, ER_PROC_C,
    "Can't find space %s", "capi_test");
  }
  char *tuple_end = tuple_buf;
  tuple_end = mp_encode_array(tuple_end, 2);
  tuple_end = mp_encode_uint(tuple_end, 1);
  tuple_end = mp_encode_uint(tuple_end, 22);
  if (box_txn_begin() != 0)
    return -1;
  if (box_replace(space_id, tuple_buf, tuple_end, NULL) != 0) {
    box_txn_rollback(); /* don't leave the transaction open on error */
    return -1;
  }
  if (box_txn_commit() != 0)
    return -1;
  fiber_sleep(0.001);
  struct tuple *tuple = box_tuple_new(box_tuple_format_default(),
                                      tuple_buf, tuple_end);
  if (tuple == NULL)
    return -1;
  return box_return_tuple(ctx, tuple);
}

Compile the program, which will produce a library file named write.so:

$ gcc -shared -o write.so -fPIC write.c

Now go back to the client and execute these requests:

box.schema.func.create('write', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'write')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test')
capi_connection:call('write')

This time the C function does six things:

  1. once again, finds the numeric identifier of the capi_test space by calling box_space_id_by_name();
  2. makes a new tuple;
  3. starts a transaction;
  4. replaces a tuple in box.space.capi_test;
  5. ends the transaction;
  6. the final line replaces the loop in read.c: instead of getting and printing each field, it uses the box_return_tuple(...) function to return the entire tuple to the caller, which displays it.

The result of capi_connection:call('write') should look like this:

tarantool> capi_connection:call('write')
---
- [[1, 22]]
...

This proves that the write() function succeeded. Once again, the important functions that start with box – box_txn_begin(), box_txn_commit(), and box_return_tuple() – came from the C API.

Conclusion: the long description of the whole C API is there for a good reason. All of its functions can be called from C functions that are called from Lua. So C stored procedures have full access to the database.

To clean up after this tutorial:

  • Get rid of each of the function tuples with box.schema.func.drop().
  • Get rid of the capi_test space with box.space.capi_test:drop().
  • Remove the .c and .so files that were created for this tutorial.

Download the Tarantool source code. Go to the test/box subdirectory. Notice that there is a file named tuple_bench.test.lua and another file named tuple_bench.c. Examine the Lua file for calls to a function in the C file, using the techniques described in this tutorial.

Conclusion: some of the standard tests use C stored procedures, and they must work, because we can't release Tarantool if they don't.

Developing with an IDE

You can use IntelliJ IDEA as an integrated development environment (IDE) to develop and debug Lua applications for Tarantool.

  1. Download and install the IDE from the official web site.

    JetBrains provides specialized editions for particular languages: IntelliJ IDEA (Java), PHPStorm (PHP), PyCharm (Python), RubyMine (Ruby), CLion (C/C++), WebStorm (Web), and others. So, download the version that suits your primary programming language.

    Tarantool integration is supported for all editions.

  2. Configure the IDE:

    1. Start IntelliJ IDEA.

    2. Click the Configure button and select Plugins.

    3. Click Browse repositories.

    4. Install the EmmyLua plugin.

      Note

      Don't confuse it with the Lua plugin, which is less powerful than EmmyLua.

    5. Restart IntelliJ IDEA.

    6. Click Configure, select Project Defaults, and then Run Configurations.

    7. Find Lua Application in the sidebar at the left.

    8. In Program, type the path to an installed tarantool binary.

      By default, this is tarantool or /usr/bin/tarantool on most platforms.

      If you installed tarantool from sources to a custom directory, specify the proper path here.

      Now IntelliJ IDEA is ready to use with Tarantool.

  3. Create a new Lua project.

  4. Add a new Lua file, for example, init.lua.

  5. Write your code and save the file.

  6. To run your application, click Run -> Run in the main menu and select your source file in the list.

    Or click Run -> Debug to start debugging.

    Note

    To use the Lua debugger, upgrade Tarantool to version 1.7.5-29-gbb6170e4b or later.

Tooling

This section describes the tools that enable developers and administrators to work with Tarantool.

tt CLI utility

tt is a utility that provides a unified command-line interface for managing Tarantool-based applications. It covers a wide range of tasks – from installing a specific Tarantool version to managing remote instances and developing applications.

tt is developed in its own GitHub repository. There you can find its source code, changelog, and release information. For a complete list of releases, see the Releases section on GitHub.

There is also the Enterprise version of tt available in a Tarantool Enterprise Edition’s release package. The Enterprise version provides additional features, for example, importing and exporting data.

This section provides instructions on tt installation and configuration, concept explanation, and the tt command reference.

The key concept of tt usage is the environment. A tt environment is a directory that includes a tt configuration, Tarantool installations, application files, and other resources. If you're familiar with Python virtual environments, you can think of tt environments as their analog.

tt environments enable independent management of multiple Tarantool applications, each running on its own Tarantool version and configuration, on a single host in an isolated manner.

To create a tt environment in a directory, run tt init in it.

tt supports Tarantool applications that run on multiple instances. For example, you can write an application that includes different source files for storage and router instances. With tt, you can start and stop them in a single call, or manage each instance independently.

Learn more about working with multi-instance applications in Multi-instance applications.

A multi-purpose tool for working with Tarantool from the command line, tt replaces the deprecated tarantoolctl and Cartridge CLI command-line utilities. Instructions on migrating to tt are provided in Migration from tarantoolctl to tt.

Installation

To install the tt command-line utility, use a package manager – Yum or APT on Linux, or Homebrew on macOS. If you need a specific build, you can build tt from sources.

Note

A Tarantool Enterprise Edition’s release package includes the tt utility extended with additional features like importing and exporting data.

On Linux systems, you can install tt with the yum or apt package manager from the tarantool/modules repository. Learn how to add this repository.

The installation command looks like this:

  • Yum:

    $ sudo yum install tt

  • APT:

    $ sudo apt-get install tt

On macOS, use Homebrew to install tt:

$ brew install tt

To build tt from sources:

  1. Install the third-party software required for building tt.
  2. Clone the tarantool/tt repository:

    git clone https://github.com/tarantool/tt --recursive

  3. Go to the tt directory:

    cd tt

  4. (Optional) Check out a release tag to build a specific version:

    git checkout tags/v1.0.0

  5. Build tt using mage:

    mage build
    

tt will appear in the current directory.

To enable completion for tt commands, run the following command, specifying the shell (bash or zsh):

. <(tt completion bash)

Configuration

The key artifact that defines the tt environment and various aspects of its execution is its configuration file. You can generate it with a tt init call. In the default launch mode, the file is generated in the current directory, making it the environment root.

By default, the configuration file is called tt.yaml and is located in the tt environment root directory. Where tt looks for it depends on the launch mode.

It is also possible to pass the configuration file name and location explicitly in the following ways:

  1. -c/--cfg global option
  2. TT_CLI_CFG environment variable.

The TT_CLI_CFG variable has a lower priority than the --cfg option.
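The documented priority can be expressed as a small C function. This is a sketch, not tt's code: explicit_cfg() is a hypothetical name, treating empty values as unset is an assumption, and returning NULL to mean "fall back to the launch-mode search" is an assumption about how the explicit settings interact with that search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Resolve an explicitly passed tt configuration file.
 * cli_cfg is the value of -c/--cfg, or NULL if the option was not given.
 * --cfg wins over TT_CLI_CFG, as documented. NULL means nothing was
 * passed explicitly (assumed: the launch-mode search is used then). */
const char *explicit_cfg(const char *cli_cfg)
{
    if (cli_cfg != NULL && cli_cfg[0] != '\0')
        return cli_cfg;                      /* -c/--cfg */
    const char *env = getenv("TT_CLI_CFG");
    if (env != NULL && env[0] != '\0')
        return env;                          /* TT_CLI_CFG */
    return NULL;
}
```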

The tt configuration file is a YAML file with the following structure:

env:
  instances_enabled: path/to/available/applications
  bin_dir: path/to/bin_dir
  inc_dir: path/to/inc_dir
  restart_on_failure: bool
  tarantoolctl_layout: bool
modules:
  directory: path/to/modules/dir
app:
  run_dir: path/to/run_dir
  log_dir: path/to/log_dir
  wal_dir: path/to/wal_dir
  vinyl_dir: path/to/vinyl_dir
  memtx_dir: path/to/memtx_dir
repo:
  rocks: path/to/rocks
  distfiles: path/to/install
ee:
  credential_path: path/to/file
templates:
  - path: path/to/app/templates1
  - path: path/to/app/templates2

Note

The tt configuration format and application layout have been changed in version 2.0. Learn how to upgrade from earlier versions in Migrating from tt 1.* to 2.0 or later.

Note

The paths specified in env.* parameters are relative to the current tt environment’s root.

  • instances_enabled – the directory where instances are stored. Default: instances.enabled.

  • bin_dir – the directory where binary files are stored. Default: bin.

  • inc_dir – the base directory for storing header files. They will be placed in the include subdirectory inside the specified directory. Default: include.

    Note

    The header files directory path can also be passed using the TT_CLI_TARANTOOL_PREFIX environment variable. If it is set, tt rocks and tt build commands use the include/tarantool directory inside TT_CLI_TARANTOOL_PREFIX as the header files directory.

  • restart_on_failure – restart the instance on failure: true or false. Default: false.

  • tarantoolctl_layout – use a layout compatible with the deprecated tarantoolctl utility for artifact files: control sockets, .pid files, log files. Default: false.

Note

The paths specified in app.*_dir parameters are relative to the application location inside the instances.enabled directory specified in the env configuration section. For example, the default location of the myapp application’s logs is instances.enabled/myapp/var/log. Inside this location, tt creates separate directories for each application instance that runs in the current environment.

  • run_dir – the directory for instance runtime artifacts, such as console sockets or PID files. Default: var/run.
  • log_dir – the directory where log files are stored. Default: var/log.
  • wal_dir – the directory where write-ahead log (.xlog) files are stored. Default: var/lib.
  • memtx_dir – the directory where memtx stores snapshot (.snap) files. Default: var/lib.
  • vinyl_dir – the directory where vinyl files or subdirectories are stored. Default: var/lib.
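Putting these rules together, the directory for one kind of artifact of one instance is the application directory, then the app.*_dir value, then a per-instance subdirectory. The C sketch below composes such paths using the defaults listed above; instance_artifact_dir() and struct app_dirs are hypothetical, and naming the per-instance subdirectory after the instance is an assumption consistent with the directory tree shown for tt 2.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The app section of tt.yaml. */
struct app_dirs {
    const char *run_dir, *log_dir, *wal_dir, *memtx_dir, *vinyl_dir;
};

/* The documented defaults. */
const struct app_dirs APP_DIR_DEFAULTS = {
    "var/run", "var/log", "var/lib", "var/lib", "var/lib"
};

/* Compose <instances_enabled>/<app>/<app_dir>/<instance>, relative to
 * the tt environment root. Returns the snprintf() result, so a value
 * >= out_size signals truncation. */
int instance_artifact_dir(char *out, size_t out_size,
                          const char *instances_enabled, const char *app,
                          const char *app_dir, const char *instance)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "%s/%s/%s/%s",
                    instances_enabled, app, app_dir, instance);
}
```

For example, the log directory of instance1 of myapp resolves to instances.enabled/myapp/var/log/instance1.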

  • rocks – the directory where rocks files are stored.

    Note

    The rocks directory path can be passed in the TT_CLI_REPO_ROCKS environment variable instead. The variable is also used if the directory specified in repo.rocks does not include a repository manifest.

  • distfiles – the directory where installation files are stored.

  • credential_path – a path to the file with credentials used for downloading Tarantool Enterprise Edition (Tarantool customer zone credentials). The file should contain a username and a password, each on a separate line. Find an example in the tt install command reference.

    Note

    The customer zone credentials can also be passed in the TT_CLI_EE_USERNAME and TT_CLI_EE_PASSWORD environment variables.

  • path – a path to application templates used for creating applications with tt create. May be specified more than once.

The tt launch mode defines the tt working directory and the way tt searches for the configuration file. There are three launch modes:

Default launch mode

Global option: none

Configuration file: Searched from the current directory to the root. Taken from /etc/tarantool if the file is not found.

Working directory: The directory where the configuration file is found.

System launch mode

Global option: --system or -S

Configuration file: Taken from /etc/tarantool.

Working directory: Current directory.

Local launch mode

Global option: --local=DIRECTORY or -L=DIRECTORY

Configuration file: Searched from the specified directory to the root. Taken from /etc/tarantool if the file is not found.

Working directory: The specified directory. If tarantool or tt executable files are found in the working directory, they are used.
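The configuration file lookup used in the default and local launch modes can be sketched as a POSIX shell function. This sketch assumes the file is named tt.yaml and that the fallback is /etc/tarantool/tt.yaml; find_tt_config is a hypothetical name, not part of tt.

```shell
#!/bin/sh
# Hypothetical sketch of the tt.yaml lookup in the default and local launch modes:
# walk from the start directory up to /, then fall back to /etc/tarantool.
find_tt_config() {
    dir=$(CDPATH= cd -- "${1:-.}" && pwd) || return 1   # absolute start directory
    while :; do
        if [ -f "$dir/tt.yaml" ]; then
            printf '%s\n' "$dir/tt.yaml"
            return 0
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
    # Not found on the way to the root: use the system-wide configuration.
    if [ -f /etc/tarantool/tt.yaml ]; then
        printf '%s\n' /etc/tarantool/tt.yaml
        return 0
    fi
    return 1
}
```

In the default mode, the working directory is then the directory that contains the file the lookup returns.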

The tt configuration and application layout were changed in version 2.0. If you are using tt 1.*, complete the following steps to migrate to tt 2.0 or later:

  1. Update the tt configuration file. In tt 2.0, the following changes were made to the configuration file:

    • The root section tt was removed. Its child sections, such as app, repo, and modules, were moved to the top level.
    • Environment configuration parameters were moved from the app section to the new section env. These parameters are instances.enabled, bin_dir, inc_dir, and restart_on_failure.
    • The paths in the app section are now relative to the app directory in instances.enabled instead of the environment root.

    You can use tt init to generate a configuration file with the new structure and default parameter values.

  2. Move application artifacts. With tt 1.*, application artifacts (logs, snapshots, PID files, and other files) were created in the var directory inside the environment root. Starting from tt 2.0, these artifacts are created in the var directory inside the application directory, which is instances.enabled/<app-name>. This is how an application directory looks:

    instances.enabled/app/
    ├── init.lua
    ├── instances.yml
    └── var
        ├── lib
        │   ├── instance1
        │   └── instance2
        ├── log
        │   ├── instance1
        │   └── instance2
        └── run
            ├── instance1
            └── instance2
    

    To continue using existing application artifacts after migration from tt 1.*:

    1. Create the var directory inside the application directory.
    2. Create the lib, log, and run directories inside var.
    3. Move the directories with instance artifacts from the old var directory to the corresponding directories inside each application's new var directory.
  3. Move the files accessed from the application code. The working directory of instance processes was changed from the tt working directory to the application directory inside instances.enabled. If the application accesses files using relative paths, move the files accordingly or adjust the application code.
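Step 2 above can be sketched in POSIX shell for a single application. This is a hypothetical helper, and it assumes that tt 1.* kept instance artifacts in <env>/var/{lib,log,run}/<app>/<instance>; check your actual tt 1.* layout, and stop the instances, before running anything like it.

```shell
#!/bin/sh
# Hypothetical migration sketch for the tt 1.* artifacts of one application.
# ASSUMPTION: tt 1.* stored artifacts in <env>/var/{lib,log,run}/<app>/<instance>.
# Stop the application's instances before moving their files.
migrate_app_artifacts() {
    env_root=$1
    app=$2
    new_var="$env_root/instances.enabled/$app/var"
    for kind in lib log run; do
        # Create var and its lib, log, and run subdirectories.
        mkdir -p "$new_var/$kind"
        old="$env_root/var/$kind/$app"
        [ -d "$old" ] || continue
        # Move each instance directory into the new location.
        for inst_dir in "$old"/*; do
            [ -d "$inst_dir" ] || continue
            name=$(basename "$inst_dir")
            if [ -e "$new_var/$kind/$name" ]; then
                echo "skipping $kind/$name: already exists in the new layout" >&2
                continue
            fi
            mv "$inst_dir" "$new_var/$kind/$name"
        done
    done
}
```

For example, migrate_app_artifacts ~/myenv app moves ~/myenv/var/lib/app/instance1 to ~/myenv/instances.enabled/app/var/lib/instance1 under the assumed old layout.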

Global options

Important

Global options of tt must be passed before its commands and other options. For example:

$ tt --cfg tt-conf.yaml start app

tt has the following global options:

-c=file, --cfg=file

Path to the configuration file.

Alternatively, this path can be passed in the TT_CLI_CFG environment variable.

-h, --help

Display help.

--integrity-check PUBLIC_KEY

Enterprise Edition

This option is supported by the Enterprise Edition only.

Perform an integrity check using the specified public key before executing the operation. Learn more in Integrity check.

-I, --internal

Force the use of an internal module even if there is an external module with the same name.

-L=DIRECTORY, --local=DIRECTORY

Use the tt environment from the specified directory. Learn more about the local launch mode.

-s, --self

Use the current tt version instead of executing the one located in the bin_dir directory.

-S, --system

Use the tt environment installed in the system. Learn more about the system launch mode.

-V, --verbose

Display detailed processing information (verbose mode).

Developing applications

This section describes tt capabilities related to developing cluster applications.

Application environment

This section provides a high-level overview of how to prepare a Tarantool application for deployment and what the application’s environment and layout might look like. This information helps you understand how to administer Tarantool instances using the tt CLI in both development and production environments.

The main steps of creating and preparing the application for deployment are:

  1. Initializing a local environment.
  2. Creating and developing an application.
  3. Packaging the application.

In this section, a sharded_cluster_crud application is used as an example. This cluster includes five instances: one router and four storages, which form two replica sets.

Cluster topology

Before creating an application, you need to set up a local environment for tt:

  1. Create a home directory for the environment.

  2. Run tt init in this directory:

    ~/myapp$ tt init
       • Environment config is written to 'tt.yaml'
    

This command creates a default tt configuration file tt.yaml for a local environment and the directories for applications, control sockets, logs, and other artifacts:

~/myapp$ ls
bin  distfiles  include  instances.enabled  modules  templates  tt.yaml

Find detailed information about the tt configuration parameters and launch modes on the tt configuration page.

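You can create an application in two ways:

  • Manually, by preparing its layout in a directory inside instances.enabled.
  • From a template, by using the tt create command.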

In this example, the application’s layout is prepared manually and looks as follows.

~/myapp$ tree
.
├── bin
├── distfiles
├── include
├── instances.enabled
│   └── sharded_cluster_crud
│       ├── config.yaml
│       ├── instances.yaml
│       ├── router.lua
│       ├── sharded_cluster_crud-scm-1.rockspec
│       └── storage.lua
├── modules
├── templates
└── tt.yaml
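The manual layout shown above can be prepared with a short POSIX shell sketch. It only creates empty files with the names from the example; their contents are up to you. create_app_skeleton is a hypothetical helper, and it assumes the environment was already initialized with tt init.

```shell
#!/bin/sh
# Hypothetical helper: create the empty skeleton of an application inside
# an environment initialized with `tt init`. File contents are not generated.
create_app_skeleton() {
    env_root=$1   # environment root, e.g. ~/myapp
    app=$2        # application name, e.g. sharded_cluster_crud
    app_dir="$env_root/instances.enabled/$app"
    mkdir -p "$app_dir"
    for f in config.yaml instances.yaml router.lua storage.lua "$app-scm-1.rockspec"; do
        # Create missing files only; never overwrite existing ones.
        [ -e "$app_dir/$f" ] || : > "$app_dir/$f"
    done
}
```

For example, running create_app_skeleton ~/myapp sharded_cluster_crud produces the instances.enabled/sharded_cluster_crud part of the tree above.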

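The sharded_cluster_crud directory contains the following files:

  • config.yaml – the cluster configuration.
  • instances.yaml – the instances to run in the current environment.
  • router.lua – code specific to the router.
  • sharded_cluster_crud-scm-1.rockspec – the application’s external dependencies, used when building the application.
  • storage.lua – code specific to the storages.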

You can find the full example here: sharded_cluster_crud.

To package the ready application, use the tt pack command. This command can create an installable DEB/RPM package or generate a .tgz archive.

The structure below reflects the content of the packed .tgz archive for the sharded_cluster_crud application:

~/myapp$ tree -a
.
├── bin
│   ├── tarantool
│   └── tt
├── instances.enabled
│   └── sharded_cluster_crud -> ../sharded_cluster_crud
├── sharded_cluster_crud
│   ├── .rocks
│   │   └── share
│   │       └── ...
│   ├── config.yaml
│   ├── instances.yaml
│   ├── router.lua
│   └── storage.lua
└── tt.yaml

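The application’s layout looks similar to the one defined when developing the application, with some differences:

  • bin contains the tarantool and tt executables packed with the application.
  • The application directory is placed in the environment root, and instances.enabled contains a symbolic link to it.
  • The application directory contains a .rocks directory with the installed dependencies.
  • The .rockspec file and the distfiles, include, modules, and templates directories are not included.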

Note

In DEB/RPM packages generated by tt pack, there are also .service unit files for each packaged application.

When deploying a distributed cluster application from a .tar.gz archive, you can define instances to run on each machine by changing the content of the instances.yaml file.

  • On the developer’s machine, this file might include all the instances defined in the cluster configuration.

    Cluster topology

    instances.yaml:

    storage-a-001:
    storage-a-002:
    storage-b-001:
    storage-b-002:
    router-a-001:
    
  • In the production environment, this file includes instances to run on the specific machine.

    Cluster topology

    instances.yaml (Server-001):

    router-a-001:
    

    instances.yaml (Server-002):

    storage-a-001:
    storage-b-001:
    

    instances.yaml (Server-003):

    storage-a-002:
    storage-b-002:
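Per-machine instances.yaml files like the ones above can be generated with a small POSIX shell sketch. write_instances_yaml is a hypothetical helper; the server names and instance placement simply mirror the example topology and would differ in a real deployment.

```shell
#!/bin/sh
# Hypothetical helper: write instances.yaml for one machine of the example
# topology. Server names and instance placement mirror the example above.
write_instances_yaml() {
    server=$1
    target=$2   # path to instances.yaml inside the application directory
    case "$server" in
        Server-001) set -- router-a-001 ;;
        Server-002) set -- storage-a-001 storage-b-001 ;;
        Server-003) set -- storage-a-002 storage-b-002 ;;
        *) echo "unknown server: $server" >&2; return 1 ;;
    esac
    # Each instance is a top-level key with an empty value.
    : > "$target"
    for inst in "$@"; do
        printf '%s:\n' "$inst" >> "$target"
    done
}
```

For example, write_instances_yaml Server-002 sharded_cluster_crud/instances.yaml, run in the unpacked archive on Server-002, leaves only storage-a-001 and storage-b-001 in the file.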
    

The Starting and stopping instances section describes how to start and stop Tarantool instances.

Tarantool applications installed from DEB and RPM packages built with tt pack can run as systemd services. They run as the tarantool system user, which is created automatically during package installation.

By default, the application artifacts are placed in the following directories:

  • /var/lib/tarantool/sys_env – application data
  • /var/log/tarantool/sys_env – logs
  • /var/run/tarantool/sys_env – runtime artifacts

If you want to change these directories, make sure that the tarantool user has enough permissions on the directories you use.

Starting and stopping instances

Note

To run instances in production, we recommend using the Ansible Tarantool Enterprise installer (ATE). ATE is a set of Ansible playbooks that are used to deploy and maintain Tarantool Enterprise products. ATE documentation is available to users logged in on the Tarantool website.

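This section describes how to manage instances in a Tarantool cluster using the tt utility. A cluster can include multiple instances that run different code. A typical example is a cluster application that includes router and storage instances. In particular, you can perform the following actions:

  • start instances
  • check the status of instances
  • connect to an instance
  • restart and stop instances
  • remove instance artifacts
  • run Lua code before instances start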

To get more context on how the application’s environment might look, refer to Application environment.

Note

In this section, a sharded_cluster_crud application is used to demonstrate how to start, stop, and manage instances in a cluster.

To start Tarantool instances, use the tt start command:

$ tt start sharded_cluster_crud
   • Starting an instance [sharded_cluster_crud:storage-a-001]...
   • Starting an instance [sharded_cluster_crud:storage-a-002]...
   • Starting an instance [sharded_cluster_crud:storage-b-001]...
   • Starting an instance [sharded_cluster_crud:storage-b-002]...
   • Starting an instance [sharded_cluster_crud:router-a-001]...

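After the cluster has started and worked for some time, you can find its artifacts in the directories specified in the tt configuration. These are the default locations in the local launch mode:

  • instances.enabled/sharded_cluster_crud/var/log/<instance-name>/ – instance logs (tt.log).
  • instances.enabled/sharded_cluster_crud/var/lib/<instance-name>/ – snapshots (.snap) and write-ahead logs (.xlog).
  • instances.enabled/sharded_cluster_crud/var/run/<instance-name>/ – runtime artifacts, such as the tt.pid file and console sockets.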

In the system launch mode, artifacts are created in these locations:

Most of the commands described in this section can be called with or without an instance name. Without the instance name, they are executed for all instances defined in instances.yaml.
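To see which instances such a command targets when no instance name is given, you can list the top-level keys of instances.yaml. This is a hypothetical POSIX shell helper, and it assumes the simple file layout used in this guide, where each instance is a key followed by a colon with no nested settings.

```shell
#!/bin/sh
# Hypothetical helper: list the instance names declared in instances.yaml.
# ASSUMPTION: each instance is a top-level key ("name:") with no nested settings.
list_instances() {
    app_dir=$1   # e.g. instances.enabled/sharded_cluster_crud
    sed -n 's/^\([A-Za-z0-9_.-][A-Za-z0-9_.-]*\):[[:space:]]*$/\1/p' "$app_dir/instances.yaml"
}
```

For example, list_instances instances.enabled/sharded_cluster_crud prints one name per line, in file order.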

To check the status of instances, execute tt status:

$ tt status sharded_cluster_crud
 INSTANCE                            STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
 sharded_cluster_crud:router-a-001   RUNNING  8382  RW    ready   running  --
 sharded_cluster_crud:storage-a-001  RUNNING  8386  RW    ready   running  --
 sharded_cluster_crud:storage-a-002  RUNNING  8390  RO    ready   running  --
 sharded_cluster_crud:storage-b-001  RUNNING  8379  RW    ready   running  --
 sharded_cluster_crud:storage-b-002  RUNNING  8380  RO    ready   running  --

To check the status of a specific instance, you need to specify its name:

$ tt status sharded_cluster_crud:storage-a-001
 INSTANCE                            STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
 sharded_cluster_crud:storage-a-001  RUNNING  8386  RW    ready   running  --

To connect to the instance, use the tt connect command:

$ tt connect sharded_cluster_crud:storage-a-001
   • Connecting to the instance...
   • Connected to sharded_cluster_crud:storage-a-001

sharded_cluster_crud:storage-a-001>

In the instance’s console, you can execute commands provided by the box module. For example, box.info can be used to get various information about a running instance:

sharded_cluster_crud:storage-a-001> box.info.ro
---
- false
...

To restart an instance, use tt restart:

$ tt restart sharded_cluster_crud:storage-a-002

After executing tt restart, you need to confirm this operation:

Confirm restart of 'sharded_cluster_crud:storage-a-002' [y/n]: y
   • The Instance sharded_cluster_crud:storage-a-002 (PID = 2026) has been terminated.
   • Starting an instance [sharded_cluster_crud:storage-a-002]...

To stop a specific instance, use tt stop as follows:

$ tt stop sharded_cluster_crud:storage-a-002

You can also stop all the instances at once as follows:

$ tt stop sharded_cluster_crud
   • The Instance sharded_cluster_crud:storage-b-001 (PID = 2020) has been terminated.
   • The Instance sharded_cluster_crud:storage-b-002 (PID = 2021) has been terminated.
   • The Instance sharded_cluster_crud:router-a-001 (PID = 2022) has been terminated.
   • The Instance sharded_cluster_crud:storage-a-001 (PID = 2023) has been terminated.
   • can't "stat" the PID file. Error: "stat /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/run/storage-a-002/tt.pid: no such file or directory"

Note

The error message indicates that storage-a-002 was not running: it had already been stopped, so its PID file did not exist.
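tt status is the supported way to check instances. Purely to illustrate what the error above means, this hypothetical POSIX shell helper looks for the tt.pid file in the default local layout (instances.enabled/<app>/var/run/<instance>/tt.pid) and checks whether the recorded process still exists.

```shell
#!/bin/sh
# Hypothetical helper: report whether an instance is running, based on the
# tt.pid file in its run directory (default local layout).
instance_running() {
    pid_file="$1/instances.enabled/$2/var/run/$3/tt.pid"
    if [ ! -f "$pid_file" ]; then
        echo "not running (no PID file)"
        return 1
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only checks that the process exists
    # and that the current user is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (PID $pid)"
        return 0
    fi
    echo "not running (stale PID file)"
    return 1
}
```

For example, instance_running ~/myapp sharded_cluster_crud storage-a-002 reports "not running (no PID file)" in the situation shown above.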

The tt clean command removes instance artifacts (such as logs or snapshots):

$ tt clean sharded_cluster_crud
   • List of files to delete:

   • /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/log/storage-a-001/tt.log
   • /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/lib/storage-a-001/00000000000000001062.snap
   • /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/lib/storage-a-001/00000000000000001062.xlog
   • ...

Confirm [y/n]:

Enter y and press Enter to confirm the removal of artifacts for each instance.

Note

The -f option of the tt clean command can be used to remove the files without confirmation.

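Tarantool supports loading and running chunks of Lua code before starting instances. To load or run Lua code immediately upon Tarantool startup, specify the TT_PRELOAD environment variable. Its value can be either a path to a Lua script or a Lua module name:

  • To run the Lua script preload_script.lua:

    $ TT_PRELOAD=preload_script.lua tt start sharded_cluster_crud

  • To load the Lua module preload_module:

    $ TT_PRELOAD=preload_module tt start sharded_cluster_crud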

To load several scripts or modules, pass them in a single quoted string, separated by semicolons:

$ TT_PRELOAD="preload_script.lua;preload_module" tt start sharded_cluster_crud

If an error happens during the execution of the preload script or module, Tarantool reports the problem and exits.
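Because a failing preload stops Tarantool, a quick check of the TT_PRELOAD value before starting can save a restart loop. This is a hypothetical POSIX shell helper, and it assumes that entries ending in .lua are script paths while any other entry is a module name left for Tarantool to resolve.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a TT_PRELOAD value.
# ASSUMPTION: entries ending in ".lua" are script paths; anything else is
# treated as a Lua module name and left for Tarantool to resolve.
check_preload() {
    status=0
    old_ifs=$IFS
    IFS=';'        # split the value on semicolons only
    set -f         # disable globbing while splitting
    for entry in $1; do
        case $entry in
            *.lua)
                if [ ! -f "$entry" ]; then
                    printf 'missing preload script: %s\n' "$entry" >&2
                    status=1
                fi
                ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return "$status"
}

# Usage: start the application only if every preload script is present.
# check_preload "$TT_PRELOAD" && tt start sharded_cluster_crud
```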

Commands

Below is a list of tt commands. Run tt help COMMAND to see the detailed help for the given command.

binaries Show a list of installed binaries and their versions
build Build an application locally
cartridge Manage a Cartridge application
cat Print the contents of .snap or .xlog files into stdout
cfg Manage a tt environment configuration
check Check an application file for syntax errors
clean Clean instance files
cluster Manage a cluster’s configuration
completion Generate completion for a specified shell
connect Connect to a Tarantool instance
coredump Manipulate Tarantool core dumps
create Create an application from a template
crud Interact with the CRUD module (Enterprise only)
download Download the Tarantool Enterprise SDK
export Export data to a file (Enterprise only)
help Display help for tt or a specific command
import Import data from a file (Enterprise only)
init Create a new tt environment in the current directory
install Install Tarantool or tt
instances List enabled applications
kill Terminate Tarantool applications or instances
log Print instance logs
logrotate Rotate instance logs
migrations Manage migrations
pack Package an application
play Play the contents of .snap or .xlog files to another Tarantool instance
replicaset Manage replica sets
restart Restart Tarantool applications or instances
rocks Use the LuaRocks package manager
run Run Lua code in a Tarantool instance
search Search available Tarantool and tt versions
start Start Tarantool applications or instances
status Get the current status of applications or instances
stop Stop Tarantool applications or instances
tdg2 Interact with Tarantool Data Grid 2 clusters
uninstall Uninstall Tarantool or tt
version Show the tt version information

Managing binaries in the current environment

$ tt binaries COMMAND [COMMAND_OPTION ...]

tt binaries manages Tarantool and tt binaries installed in the current environment.

COMMAND is one of the following:

$ tt binaries list

tt binaries list shows a list of installed binaries and their versions.

To show the installed Tarantool and tt versions:

$ tt binaries list
List of installed binaries:
   • tarantool:
        3.1.0 [active]
        2.11.2
   • tt:
        2.3.0
        2.2.1 [active]

$ tt binaries switch [PROGRAM_NAME] [VERSION]

tt binaries switch switches binaries used in the current environment. The possible values of PROGRAM_NAME are:

  • tarantool
  • tarantool-ee
  • tt

When called without arguments, the command lets you choose the program and version interactively:

$ tt binaries switch
Use the arrow keys to navigate: ↓ ↑ → ←
? Select program:
  ▸ tarantool
    tarantool-ee
    tt

You can also specify the program name and version in the call.

To view tt versions installed in the current environment and switch between them:

$ tt binaries switch tt
Use the arrow keys to navigate: ↓ ↑ → ←
? Select version:
  ▸ 2.2.1
    2.3.0 [active]

To switch to a specific Tarantool EE version installed in the current environment:

$ tt binaries switch tarantool-ee 3.1.0

Building an application

$ tt build [PATH] [--spec SPEC_FILE_PATH]

tt build builds a Tarantool application locally.

--spec SPEC_FILE_PATH

Path to a .rockspec file to use for the current build.

The PATH argument should contain the path to the application directory (that is, to the build source). The default path is . (current directory).

The application directory must contain a .rockspec file to use for the build. If there is more than one .rockspec file in the application directory, specify the one to use in the --spec argument.

tt build builds an application with the tt rocks make command. It downloads the application dependencies into the .rocks directory, making the application ready to run locally.

In addition to building the application with LuaRocks, tt build can execute pre-build and post-build scripts. These scripts should contain steps to execute right before and after building the application. These files must be named tt.pre-build and tt.post-build respectively and located in the application directory.

Note

For compatibility with Cartridge applications, the pre-build and post-build scripts can also be named cartridge.pre-build and cartridge.post-build.

tt.pre-build is helpful when your application depends on closed-source rocks, or if the build should contain rocks from a project added as a submodule. You can install these dependencies using the pre-build script before building. Example:

#!/bin/sh

# The main purpose of this script is to build non-standard rocks modules.
# The script will run before `tt rocks make` during application build.

tt rocks make --chdir ./third_party/proj

tt.post-build is a script that runs after tt rocks make. The main purpose of this script is to remove build artifacts from the final package. Example:

#!/bin/sh

# The main purpose of this script is to remove build artifacts from the resulting package.
# The script will run after `tt rocks make` during application build.

rm -rf third_party
rm -rf node_modules
rm -rf doc

Managing a Cartridge application

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

$ tt cartridge COMMAND {[OPTION ...]|SUBCOMMAND}

tt cartridge manages a Cartridge application. COMMAND is one of the following:

$ tt cartridge admin ADMIN_FUNC_NAME [ADMIN_OPTION ...]

tt cartridge admin calls admin functions provided by the application.

--name STRING

(Required) An application name.

-l, --list

List the available admin functions.

--instance STRING

A name of the instance to connect to.

--conn STRING

An address to connect to.

--run-dir STRING

A directory where PID and socket files are stored. Defaults to /var/run/tarantool.

Get a list of the available admin functions:

$ tt cartridge admin --name APPNAME --list

   • Available admin functions:

probe  Probe instance

Get help for a specific function:

$ tt cartridge admin --name APPNAME probe --help

   • Admin function "probe" usage:

Probe instance

Args:
  --uri string  Instance URI

Call a function with an argument:

$ tt cartridge admin --name APPNAME probe --uri localhost:3301

   • Probe "localhost:3301": OK

$ tt cartridge bench [BENCH_OPTION ...]

tt cartridge bench runs benchmarks for Tarantool.

--url STRING

A Tarantool instance address (the default is 127.0.0.1:3301).

--user STRING

A username used to connect to the instance (the default is guest).

--password STRING

A password used to connect to the instance.

--connections INT

A number of concurrent connections (the default is 10).

--requests INT

A number of simultaneous requests per connection (the default is 10).

--duration INT

The duration of a benchmark test in seconds (the default is 10).

--keysize INT

The size of a key part of benchmark data in bytes (the default is 10).

--datasize INT

The size of a value part of benchmark data in bytes (the default is 20).

--insert INT

A percentage of inserts (the default is 100).

--select INT

A percentage of selects.

--update INT

A percentage of updates.

--fill INT

A number of records to pre-fill the space (the default is 1000000).

$ tt cartridge failover COMMAND [COMMAND_OPTION ...]

tt cartridge failover manages an application failover. The following commands are available:

$ tt cartridge failover set MODE [FAILOVER_SET_OPTION ...]

Set up failover in the specified mode:

  • stateful
  • eventual
  • disabled

Options:

  • --state-provider STRING: A failover’s state provider. Can be stateboard or etcd2. Used only in the stateful mode.
  • --params STRING: Failover parameters specified in a JSON-formatted string, for example, "{'fencing_timeout': 10, 'fencing_enabled': true}".
  • --provider-params STRING: Failover provider parameters specified in a JSON-formatted string, for example, "{'lock_delay': 14}".

$ tt cartridge failover setup --file STRING

Set up failover with parameters described in a file. The failover configuration file defaults to failover.yml.

The failover.yml file might look as follows:

mode: stateful
state_provider: stateboard
stateboard_params:
    uri: localhost:4401
    password: passwd
failover_timeout: 15

$ tt cartridge failover status

Get the current failover status.

$ tt cartridge failover disable

Disable failover.

--name STRING

An application name. Defaults to the value of the package field in the .rockspec file.

--file STRING

A path to the file containing failover settings. Defaults to failover.yml.

$ tt cartridge repair COMMAND [REPAIR_OPTION ...]

tt cartridge repair repairs a running application. The following commands are available:

$ tt cartridge repair list-topology [REPAIR_OPTION ...]

Get a summary of the current cluster topology.

$ tt cartridge repair remove-instance UUID [REPAIR_OPTION ...]

Remove the instance with the specified UUID from the cluster. If the instance isn’t found, raise an error.

$ tt cartridge repair set-advertise-uri INSTANCE-UUID NEW-URI [REPAIR_OPTION ...]

Change the instance’s advertise URI. Raise an error if the instance isn’t found or is expelled.

$ tt cartridge repair set-leader REPLICASET-UUID INSTANCE-UUID [REPAIR_OPTION ...]

Set the instance as the leader of the replica set. Raise an error in the following cases:

  • There is no replica set or instance with that UUID.
  • The instance doesn’t belong to the replica set.
  • The instance has been disabled or expelled.

The following options work with any repair subcommand:

--name

(Required) An application name.

--data-dir

The directory containing the instances’ working directories. Defaults to /var/lib/tarantool.

The following options work with any repair command, except list-topology:

--run-dir

The directory where PID and socket files are stored. Defaults to /var/run/tarantool.

--dry-run

Launch in dry-run mode: show changes but do not apply them.

--reload

Reload the instance configuration after the patch is applied.

$ tt cartridge replicasets COMMAND [COMMAND_OPTION ...]

tt cartridge replicasets manages an application’s replica sets. The following commands are available:

$ tt cartridge replicasets setup [--file FILEPATH] [--bootstrap-vshard]

Set up replica sets using a file.

Options:

  • --file: A file with a replica set configuration. Defaults to replicasets.yml.
  • --bootstrap-vshard: Bootstrap vshard upon setup.

$ tt cartridge replicasets save [--file FILEPATH]

Save the current replica set configuration to a file.

Options:

  • --file: A file to save the configuration to. Defaults to replicasets.yml.

$ tt cartridge replicasets list [--replicaset STRING]

List the current cluster topology.

Options:

  • --replicaset STRING: A replica set name.

$ tt cartridge replicasets join INSTANCE_NAME ... [--replicaset STRING]

Join the instance to a cluster. If a replica set with the specified alias isn’t found in the cluster, it is created. Otherwise, instances are joined to an existing replica set.

Options:

  • --replicaset STRING: A replica set name.

$ tt cartridge replicasets list-roles

List the available roles.

$ tt cartridge replicasets list-vshard-groups

List the available vshard groups.

$ tt cartridge replicasets add-roles ROLE_NAME ... [--replicaset STRING] [--vshard-group STRING]

Add roles to the replica set.

Options:

  • --replicaset STRING: A replica set name.
  • --vshard-group STRING: A vshard group for vshard-storage replica sets.

$ tt cartridge replicasets remove-roles ROLE_NAME ... [--replicaset STRING]

Remove roles from the replica set.

Options:

  • --replicaset STRING: A replica set name.

$ tt cartridge replicasets set-weight WEIGHT [--replicaset STRING]

Set the replica set weight.

Options:

  • --replicaset STRING: A replica set name.

$ tt cartridge replicasets set-failover-priority INSTANCE_NAME ... [--replicaset STRING]

Configure replica set failover priority.

Options:

  • --replicaset STRING: A replica set name.

$ tt cartridge replicasets bootstrap-vshard

Bootstrap vshard.

$ tt cartridge replicasets expel INSTANCE_NAME ...

Expel one or more instances from the cluster.

Printing the contents of .snap and .xlog files

$ tt cat FILE ... [OPTION ...]

tt cat prints the contents of snapshot (.snap) and WAL (.xlog) files to stdout. A single call of tt cat can print the contents of multiple files.

--format FORMAT

Output format: yaml (default), json, or lua.

--from LSN

Show operations starting from the given LSN.

--to LSN

Show operations up to the given LSN. Default: 18446744073709551615.

--replica ID

Filter the output by replica ID. Can be passed more than once.

When calling tt cat with filters by LSN (--from and --to flags) and replica ID (--replica), remember that LSNs differ across replicas. Thus, if you pass more than one replica ID via --replica together with --from or --to, the result may not reflect the actual sequence of operations.

--space ID

Filter the output by space ID. Can be passed more than once.

--show-system

Show the contents of system spaces.

Environment configuration

$ tt cfg COMMAND [OPTION ...]

tt cfg manages a tt environment configuration.

dump

Print a tt environment configuration.

Options:

  • -r, --raw: Print the raw content of the tt.yaml configuration file.

Print the current tt environment configuration:

$ tt cfg dump

Checking an application file

$ tt check {FILEPATH | APPLICATION[:APP_INSTANCE]}

tt check checks the syntax correctness of Lua files within Tarantool applications or separate Lua scripts. The files must be stored inside the instances.enabled directory specified in the tt configuration file.

To check all Lua files in an application directory at once, specify the directory name:

$ tt check app

To check a single Lua file from an application directory, add the path to this file:

$ tt check app/router
# or
$ tt check app/router.lua

Note

The .lua extension can be omitted.

Cleaning instance files

$ tt clean APPLICATION[:APP_INSTANCE] [OPTION ...]

tt clean cleans stored files of Tarantool instances: logs, snapshots, and other files. To avoid accidental deletion of files, tt clean shows the files it is going to delete and asks for confirmation.

When called without arguments, tt clean cleans the files of all applications in the current environment.

-f, --force

Clean files without confirmation.

Managing cluster configurations

$ tt cluster COMMAND [COMMAND_OPTION ...]

tt cluster manages configurations of Tarantool applications. This command works both with local YAML files in application directories and with centralized configuration storages (etcd or Tarantool-based).

COMMAND is one of the following:

$ tt cluster publish {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [FILE] [OPTION ...]

tt cluster publish publishes a cluster configuration using an arbitrary YAML file as a source.

tt cluster publish can modify local cluster configurations stored in config.yaml files inside application directories.

To write a configuration to a local config.yaml, run tt cluster publish with two arguments:

  • the application name
  • the path to a YAML file from which the configuration should be taken

$ tt cluster publish myapp source.yaml

tt cluster publish can modify centralized cluster configurations in both supported storage types: etcd and Tarantool-based configuration storages.

To publish a configuration from a file to a centralized configuration storage, run tt cluster publish with a URI of this storage’s instance as the target. For example, the command below publishes a configuration from source.yaml to a local etcd instance running on the default port 2379:

$ tt cluster publish "http://localhost:2379/myapp" source.yaml

A URI must include a prefix that is unique for the application. It can also include credentials and other connection parameters. Find the detailed description of the URI format in URI format.
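When scripting tt cluster calls, a small helper can assemble URIs in the form used in the examples of this section: a base address, the application prefix, and an optional name parameter for instance-level operations. config_uri is hypothetical, and it deliberately ignores credentials and other connection parameters described in URI format.

```shell
#!/bin/sh
# Hypothetical helper: build a centralized-storage URI in the form used in the
# examples: <scheme>://<host:port>/<prefix>[?name=<instance>].
# Credentials and other connection parameters are not handled here.
config_uri() {
    base=$1      # e.g. http://localhost:2379
    prefix=$2    # application-unique prefix, e.g. myapp
    name=${3:-}  # optional instance name
    uri="${base%/}/${prefix#/}"
    if [ -n "$name" ]; then
        uri="$uri?name=$name"
    fi
    printf '%s\n' "$uri"
}
```

For example, tt cluster show "$(config_uri http://localhost:2379 myapp instance-002)" is equivalent to passing "http://localhost:2379/myapp?name=instance-002" directly.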

In addition to whole cluster configurations, tt cluster publish can manage configurations of specific instances within applications: rewrite configurations of existing instances and add new instance configurations.

In this case, it operates with YAML fragments that describe a single instance configuration section. For example, the following YAML file can be a source when publishing an instance configuration:

# instance_source.yaml
iproto:
  listen:
  - uri: 127.0.0.1:3311

To send an instance configuration to a local config.yaml, run tt cluster publish with the application:instance pair as the target argument:

$ tt cluster publish myapp:instance-002 instance_source.yaml

To send an instance configuration to a centralized configuration storage, specify the instance name in the name argument of the storage URI:

$ tt cluster publish "http://localhost:2379/myapp?name=instance-002" instance_source.yaml

If the instance already exists, this call overwrites its configuration with the one from the file.

To add a new instance configuration from a YAML fragment, specify the name to assign to the new instance and its location in the cluster topology – replica set and group – in the --replicaset and --group options.

Note

The --group option can be omitted if the configuration contains only one group.

To add a new instance instance-003 to the replicaset-001 replica set:

$ tt cluster publish "http://localhost:2379/myapp?name=instance-003" instance_source.yaml --replicaset replicaset-001

tt cluster publish validates configurations against the Tarantool configuration schema and aborts in case of an error. To skip the validation, add the --force option:

$ tt cluster publish myapp source.yaml --force

Enterprise Edition

The integrity check functionality is supported by the Enterprise Edition only.

When called with the --with-integrity-check option, tt cluster publish generates a checksum of the configurations it publishes. It signs the checksum using the private key passed as the option argument, and writes it into the configuration store.

$ tt cluster publish "http://localhost:2379/myapp" source.yaml --with-integrity-check private.pem

If an application configuration is published this way, it can be checked for integrity using the --integrity-check global option.

$ tt --integrity-check public.pem cluster show myapp
$ tt --integrity-check public.pem start myapp

Learn more about integrity checks upon application startup and in runtime in the tt start reference.

To ensure the configuration integrity when updating it, call tt cluster publish with two options:

  • --integrity-check PUBLIC_KEY global option checks that the configuration wasn’t changed since it was published.
  • --with-integrity-check PRIVATE_KEY generates a new hash and signature for future integrity checks of the updated configuration.

$ tt --integrity-check public.pem cluster publish \
     --with-integrity-check private.pem \
     "http://localhost:2379/myapp" source.yaml

$ tt cluster show {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [OPTION ...]

tt cluster show displays a cluster configuration.

tt cluster show can read local cluster configurations stored in config.yaml files inside application directories.

To print a local configuration from an application’s config.yaml, specify the application name as an argument:

$ tt cluster show myapp

tt cluster show can display centralized cluster configurations from both supported types of configuration storage: etcd and Tarantool-based storage.

To print a cluster configuration from a centralized storage, run tt cluster show with a storage URI including the prefix identifying the application. For example, to print myapp’s configuration from a local etcd storage:

$ tt cluster show "http://localhost:2379/myapp"

In addition to whole cluster configurations, tt cluster show can display configurations of specific instances within applications. In this case, it prints YAML fragments that describe a single instance configuration section.

To print an instance configuration from a local config.yaml, use the application:instance argument:

$ tt cluster show myapp:instance-002

To print an instance configuration from a centralized configuration storage, specify the instance name in the name argument of the URI:

$ tt cluster show "http://localhost:2379/myapp?name=instance-002"

To validate configurations when printing them with tt cluster show, enable the validation by adding the --validate option:

$ tt cluster show "http://localhost:2379/myapp" --validate

$ tt cluster replicaset SUBCOMMAND {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [OPTION ...]

tt cluster replicaset manages instances in a replica set. It supports the following subcommands:

Important

tt cluster replicaset works only with centralized cluster configurations. To manage replica sets in clusters with local YAML configurations, use tt replicaset.

$ tt cluster replicaset promote CONFIG_URI INSTANCE_NAME [OPTION ...]

tt cluster replicaset promote promotes the specified instance, making it a leader of its replica set. This command works on Tarantool clusters with centralized configuration and with failover modes off and manual. It updates the centralized configuration according to the specified arguments and reloads it:

  • off failover mode: the command sets database.mode to rw on the specified instance.

    Important

    If failover is off, the command doesn’t consider the modes of other replica set members, so there can be any number of read-write instances in one replica set.

  • manual failover mode: the command updates the leader option of the replica set configuration. Other instances of this replica set become read-only.

Example:

$ tt cluster replicaset promote "http://localhost:2379/myapp" storage-001-a

$ tt cluster replicaset demote CONFIG_URI INSTANCE_NAME [OPTION ...]

tt cluster replicaset demote demotes an instance in a replica set. This command works on Tarantool clusters with centralized configuration and with failover mode off.

Note

In clusters with manual failover mode, you can demote a read-write instance by promoting a read-only instance from the same replica set with tt cluster replicaset promote.

The command sets the instance’s database.mode to ro and reloads the configuration.

Important

If failover is off, the command doesn’t consider the modes of other replica set members, so there can be any number of read-write instances in one replica set.

$ tt cluster replicaset expel CONFIG_URI INSTANCE_NAME [OPTION ...]

tt cluster replicaset expel expels an instance from the cluster. Example:

$ tt cluster replicaset expel "http://localhost:2379" storage-b-002

$ tt cluster replicaset roles [add|remove] CONFIG_URI ROLE_NAME [OPTION ...]

tt cluster replicaset roles manages application roles in the configuration scope specified in the command options. It has two subcommands:

  • add adds a role
  • remove removes a role

Use the --global, --group, --replicaset, --instance options to specify the configuration scope to add or remove roles. For example, to add a role to all instances in a replica set:

$ tt cluster replicaset roles add "http://localhost:2379" roles.my-role --replicaset storage-a

To remove a role defined in the global configuration scope:

$ tt cluster replicaset roles remove "http://localhost:2379" roles.my-role --global

The changes that tt cluster replicaset makes to the configuration storage occur transactionally. Each call creates a new revision. In case of a revision mismatch, an error is raised.

If the cluster configuration is distributed over multiple keys in the configuration storage (for example, in two paths /myapp/config/k1 and /myapp/config/k2), the affected instance configuration can be present in more than one of them. If it is found under several different keys, the command prompts the user to choose a key for patching. You can skip the selection by adding the -f/--force option:

$ tt cluster replicaset promote "http://localhost:2379/myapp" storage-001-a --force

In this case, the command selects the key for patching automatically. A key's priority is determined by the detail level of the instance or replica set configuration stored under this key. For example, when failover is off, a key with instance.database options takes precedence over a key that contains only the instance field. In case of equal priority, the first key in lexicographical order is patched.

$ tt cluster failover SUBCOMMAND [OPTION ...]

tt cluster failover manages a supervised failover in Tarantool clusters.

Important

tt cluster failover works only with centralized cluster configurations stored in etcd.

$ tt cluster failover switch CONFIG_URI INSTANCE_NAME [OPTION ...]

tt cluster failover switch appoints the specified instance to be a master. This command accepts the following arguments and options:

  • CONFIG_URI: A URI of the cluster configuration storage.
  • INSTANCE_NAME: An instance name.
  • [OPTION ...]: Options to pass to the command.

In the example below, tt cluster failover switch appoints storage-a-002 to be a master:

$ tt cluster failover switch http://localhost:2379/myapp storage-a-002
To check the switching status, run:
tt cluster failover switch-status http://localhost:2379/myapp b1e938dd-2867-46ab-acc4-3232c2ef7ffe

Note that the command output includes an identifier of the task responsible for switching a master. You can use this identifier to see the status of switching a master instance using tt cluster failover switch-status.
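When scripting a master switch, the task identifier can be captured from the command output instead of being copied by hand. The sketch below assumes that the output contains the identifier as a single lowercase UUID, as in the example above; extract_task_id is a hypothetical helper, not part of tt.

```shell
# Hypothetical helper (not part of tt): read `tt cluster failover switch`
# output on stdin and print the first UUID found, assumed to be the task ID.
extract_task_id() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Usage (commented out so that the sketch runs without tt installed):
# task_id=$(tt cluster failover switch http://localhost:2379/myapp storage-a-002 | extract_task_id)
# tt cluster failover switch-status http://localhost:2379/myapp "$task_id"
```

If the output format differs from the example, adjust the pattern accordingly.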

$ tt cluster failover switch-status CONFIG_URI TASK_ID

tt cluster failover switch-status shows the status of switching a master instance. This command accepts the following arguments:

  • CONFIG_URI: A URI of the cluster configuration storage.
  • TASK_ID: An identifier of the task used to switch a master instance. You can find the task identifier in the tt cluster failover switch command output.

Example:

$ tt cluster failover switch-status http://localhost:2379/myapp b1e938dd-2867-46ab-acc4-3232c2ef7ffe

There are three ways to pass the credentials for connecting to the centralized configuration storage. They all apply to both etcd and Tarantool-based storages. The following list shows these ways ordered by precedence, from highest to lowest:

  1. Credentials specified in the storage URI: https://username:password@host:port/prefix:

    $ tt cluster show 'http://myuser:p4$$w0rD@localhost:2379/myapp'
    
  2. tt cluster options -u/--username and -p/--password:

    $ tt cluster show "http://localhost:2379/myapp" -u myuser -p 'p4$$w0rD'
    
  3. Environment variables TT_CLI_ETCD_USERNAME and TT_CLI_ETCD_PASSWORD:

    $ export TT_CLI_ETCD_USERNAME=myuser
    $ export TT_CLI_ETCD_PASSWORD='p4$$w0rD'
    $ tt cluster show "http://localhost:2379/myapp"
    
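The precedence order above can be illustrated with a small shell function that decides which username would apply. This is only a sketch of the documented order, not tt's implementation: resolve_storage_user is a hypothetical name, and the function ignores percent-encoding in the URI.

```shell
# Hypothetical sketch (not tt code): print the username that applies under the
# documented precedence: URI credentials, then -u/--username, then
# the TT_CLI_ETCD_USERNAME environment variable.
# Usage: resolve_storage_user URI [USERNAME_OPTION_VALUE]
resolve_storage_user() {
    local uri="$1" flag_user="${2:-}"
    local rest="${uri#*://}"          # drop the http(s):// scheme
    local authority="${rest%%/*}"     # keep [user[:password]@]host:port
    case $authority in
        *@*)
            local userinfo="${authority%@*}"   # user[:password]
            printf '%s\n' "${userinfo%%:*}"    # user only
            return 0
            ;;
    esac
    if [ -n "$flag_user" ]; then
        printf '%s\n' "$flag_user"
    elif [ -n "${TT_CLI_ETCD_USERNAME:-}" ]; then
        printf '%s\n' "$TT_CLI_ETCD_USERNAME"
    fi
}
```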

If connection encryption is enabled on the configuration storage, pass the required SSL parameters in the URI arguments.

A URI of the cluster configuration storage has the following format:

http(s)://[username:password@]host:port[/prefix][?arguments]
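A URI in this format can be assembled from its parts in a script. The helper below is a hypothetical sketch: it does not percent-encode credentials, so it assumes that usernames and passwords contain no characters such as @, :, / or ?.

```shell
# Hypothetical helper: build http(s)://[username:password@]host:port[/prefix][?arguments]
# Usage: make_config_uri SCHEME HOST PORT [PREFIX] [ARGUMENTS] [USERNAME] [PASSWORD]
make_config_uri() {
    local scheme="$1" host="$2" port="$3"
    local prefix="${4:-}" args="${5:-}" user="${6:-}" password="${7:-}"
    local uri="${scheme}://"
    if [ -n "$user" ]; then
        # Add ":password" only when a password is given.
        uri="${uri}${user}${password:+:$password}@"
    fi
    uri="${uri}${host}:${port}"
    if [ -n "$prefix" ]; then
        uri="${uri}/${prefix#/}"      # avoid a double slash if PREFIX starts with /
    fi
    if [ -n "$args" ]; then
        uri="${uri}?${args}"
    fi
    printf '%s\n' "$uri"
}
```

For example, `tt cluster show "$(make_config_uri http localhost 2379 myapp 'name=instance-002')"` would print the configuration of instance-002 from a local etcd storage.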

-u, --username STRING

A username for connecting to the configuration storage.

See also: Authentication.

-p, --password STRING

A password for connecting to the configuration storage.

See also: Authentication.

--force

Applicable to: publish, replicaset

  • publish: skip validation when publishing. Default: false (validation is enabled).
  • replicaset: skip key selection for patching. Learn more about key selection in the tt cluster replicaset section.

-G, --global

Applicable to: replicaset roles

Apply the operation to the global configuration scope, that is, to all instances.

-g, --group

Applicable to: publish, replicaset roles

A name of the configuration group to which the operation applies.

-i, --instance

Applicable to: replicaset roles

A name of the instance to which the operation applies.

-r, --replicaset

Applicable to: publish, replicaset roles

A name of the replica set to which the operation applies.

-t, --timeout UINT

Applicable to: failover

A timeout (in seconds) for executing a command. Default: 30.

--validate

Applicable to: show

Validate the printed configuration. Default: false (validation is disabled).

-w, --wait

Applicable to: failover

Wait for the command to complete. Default: false (don't wait).

--with-integrity-check STRING

Enterprise Edition

This option is supported by the Enterprise Edition only.

Applicable to: publish, replicaset

Generate hashes and signatures for integrity checks.

See also: Publishing configurations with integrity check

Generating completion for tt

$ tt completion SHELL

tt completion generates tab-based completion for tt commands in the specified shell: bash or zsh.

Generate tt completion for the current bash terminal:

$ . <(tt completion bash)

Note

You can add an execution of the completion script to a user’s .bashrc file to make the completion work for this user in all their terminals.
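A minimal sketch of that setup, assuming bash and the completion command shown above. enable_tt_completion is a hypothetical helper that appends the line only once, so running it repeatedly does not duplicate it. It takes the target file as an argument so that it can be tried on a scratch file first.

```shell
# Hypothetical helper: add the tt completion line to a bash startup file once.
# Usage: enable_tt_completion [RC_FILE]   (defaults to ~/.bashrc)
enable_tt_completion() {
    local rc_file="${1:-"$HOME/.bashrc"}"
    local line='. <(tt completion bash)'
    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -Fxq "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```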

Connecting to a Tarantool instance

$ tt connect {URI|INSTANCE_NAME} [OPTION ...]

tt connect connects to a Tarantool instance by its URI or instance name specified in the current environment.

-u USERNAME, --username USERNAME

A Tarantool user for connecting to the instance.

-p PASSWORD, --password PASSWORD

The user’s password.

-f FILEPATH, --file FILEPATH

Connect and evaluate the script from a file.

Pass - instead of a file path to read the script from stdin.

-i, --interactive

Enter the interactive mode after evaluating the script passed in -f/--file.

-l LANGUAGE, --language LANGUAGE

The input language of the tt interactive console: lua (default) or sql.

-x FORMAT, --outputformat FORMAT

The output format of the tt interactive console: yaml (default), lua, table, ttable.

--sslcertfile FILEPATH

The path to an SSL certificate file for encrypted connections.

--sslkeyfile FILEPATH

The path to a private SSL key file for encrypted connections.

--sslcafile FILEPATH

The path to a trusted certificate authorities (CA) file for encrypted connections.

--sslciphers STRING

The list of SSL cipher suites used for encrypted connections, separated by colons (:).

To connect to an instance, tt typically needs its URI – the host name or IP address and the port.

You can also connect to instances in the same tt environment (that is, those that use the same configuration file and Tarantool installation) by their instance names.

When connecting to an instance by its URI, tt connect establishes a remote connection for which authentication is required. Use one of the following ways to pass the username and the password:

  • The -u (--username) and -p (--password) options:

    $ tt connect 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
    
  • The connection string:

    $ tt connect 'myuser:p4$$w0rD@192.168.10.10:3301'
    
  • Environment variables TT_CLI_USERNAME and TT_CLI_PASSWORD:

    $ export TT_CLI_USERNAME=myuser
    $ export TT_CLI_PASSWORD='p4$$w0rD'
    $ tt connect 192.168.10.10:3301
    

If no credentials are provided for a remote connection, tt connects as the guest user.

Note

Local connections (by instance name instead of the URI) don’t require authentication.
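When a password contains $, as the sample password p4$$w0rD does, quote it with single quotes. Inside double quotes or without quotes, the shell expands $$ to its own process ID before tt receives the value. The snippet below shows the difference using only standard shell features.

```shell
# "$$" is the shell's process ID, so it is expanded inside double quotes.
double_quoted="p4$$w0rD"
# Single quotes disable all expansion, so the value is kept literally.
single_quoted='p4$$w0rD'
printf 'double quotes: %s\nsingle quotes: %s\n' "$double_quoted" "$single_quoted"

# Safe forms of the credential examples (not executed here):
# tt connect 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
# export TT_CLI_PASSWORD='p4$$w0rD'
```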

To connect to instances that use SSL encryption, provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options. If necessary, add other SSL parameters – --sslcafile and --sslciphers.

By default, tt connect opens an interactive tt console. Alternatively, you can open a connection to evaluate a Lua script from a file or stdin. To do this, pass the file path in the -f (--file) option or use -f - to take the script from stdin.

$ tt connect app -f test.lua

Manipulating Tarantool core dumps

$ tt coredump COMMAND [COMMAND_OPTION ...]

tt coredump provides commands for manipulating Tarantool core dumps.

To be able to investigate Tarantool crashes, make sure that core dumps are enabled on the host. See the instructions on enabling core dumps on Unix systems.

COMMAND is one of the following:

Important

tt coredump is not supported on macOS.

$ tt coredump pack COREDUMP_FILE

Pack a Tarantool core dump and supporting data into a tar.gz archive. It includes:

Pack a tar.gz file with a Tarantool core dump and supporting data:

$ tt coredump pack name.core

$ tt coredump unpack ARCHIVE

Unpack a Tarantool core dump archive created with tt coredump pack into a new directory:

$ tt coredump unpack tarantool-core-dump.tar.gz

$ tt coredump inspect [ARCHIVE|DIRECTORY] [-s]

Inspect a Tarantool core dump with the GNU debugger (gdb). The command argument can be either an archive file produced with tt coredump pack or a directory where such an archive has been extracted.

Inspect the core dump archive with gdb:

$ tt coredump inspect tarantool-core-dump.tar.gz

Inspect the unpacked core dump directory with gdb:

$ tt coredump inspect tarantool-core-dump

-s

Applicable to: inspect

Specify the location of Tarantool sources.

Creating an application from a template

$ tt create TEMPLATE_NAME [OPTION ...]

tt create creates a new Tarantool application from a template.

Application templates speed up the development of Tarantool applications by defining their initial structure and content. A template can include application code, configuration, build scripts, and other resources.

tt comes with built-in templates for popular use cases. You can also create custom templates for specific purposes.

There are the following built-in templates:

To create the app1 application in the current tt environment from the built-in vshard_cluster template:

$ tt create vshard_cluster --name app1 --dst /opt/tt/apps/

The command requests cluster topology parameters, such as the number of shards or routers, interactively during the execution.

To create the application in the /opt/tt/apps directory with default cluster topology and force rewrite the application directory if it already exists:

$ tt create vshard_cluster --name app1 -f --non-interactive --dst /opt/tt/apps/

tt searches for custom templates in the directories specified in the templates section of its configuration file.

To create the application app1 from the simple_app custom template in the current directory:

$ tt create simple_app --name app1

Application templates are directories with files.

The main file of a template is its manifest. It defines how the applications are instantiated from this template.

A template manifest is a YAML file named MANIFEST.yaml. It can contain the following sections:

  • description – the template description.
  • vars – template variables.
  • pre-hook and post-hook – paths to executables to run before and after the template instantiation.
  • include – a list of files to keep in the application directory after instantiation. If this section is omitted, the application will contain all template files and directories.

All sections are optional.

Example:

description: Template description
vars:
  - prompt: User name
    name: user_name
    default: admin
    re: ^\w+$

  - prompt: Retry count
    default: "3"
    name: retry_count
    re: ^\d+$
pre-hook: ./hooks/pre-gen.sh
post-hook: ./hooks/post-gen.sh
include:
  - init.lua
  - instances.yml

Files and directories of a template are copied to the application directory according to the include section of the manifest (or its absence).

Note

Don’t include the .rocks directory in application templates. To specify application dependencies, use the .rockspec files.

There is a special file type *.tt.template. The content of such files is adjusted for each application with the help of template variables. During the instantiation, the variables in these files are replaced with provided values and the *.tt.template extension is removed.

Template variables are replaced with their values provided upon the instantiation.

All templates have the name variable. Its value is taken from the --name option.

To add other variables, define them in the vars section of the template manifest. A variable can have the following attributes:

  • prompt: the text displayed when asking for the variable value in the interactive mode. Required.
  • name: the variable name. Required.
  • default: the default value. Optional.
  • re: a regular expression that the value must match. Optional.

Example:

vars:
  - prompt: Cluster cookie
    name: cluster_cookie
    default: cookie
    re: ^\w+$

Variables can be used in all file names and in the content of *.tt.template files.

Note

Variables don’t work in directory names.

To use a variable, prefix its name with a period and enclose it in double curly braces: {{.var_name}} (as in the Go text/template syntax).

Examples:

  • init.lua.tt.template file:

    local app_name = '{{.name}}'
    local login = '{{.user_name}}'
    
  • A file name {{.user_name}}.txt
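To preview how a *.tt.template file might look after instantiation, plain {{.var}} placeholders can be approximated with sed. This is a rough stand-in for illustration only: tt uses Go text/template, which supports much more syntax. render_template is a hypothetical helper, and it assumes that values contain no |, & or \ characters.

```shell
# Rough approximation of {{.var}} substitution, for previewing only.
# Usage: render_template FILE name=value [name=value ...]
render_template() {
    local file="$1" pair name value out
    shift
    out=$(cat "$file")
    for pair in "$@"; do
        name="${pair%%=*}"    # text before the first "="
        value="${pair#*=}"    # text after the first "="
        # In a basic regex, "{" and "}" are literal; "\." matches the period.
        out=$(printf '%s\n' "$out" | sed "s|{{\.${name}}}|${value}|g")
    done
    printf '%s\n' "$out"
}
```

For example, `render_template init.lua.tt.template name=app1 user_name=admin` prints the file with both placeholders replaced.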

Variables receive their values during the template instantiation. By default, tt create asks you to provide the values interactively. You can use the -s (or --non-interactive) option to disable the interactive input. In this case, the values are searched in the following order:

  • In the --var option. Pass a string of the var=value format after the --var option. You can pass multiple variables, each after a separate --var option:

    $ tt create template app --var user_name=admin
    
  • In a file. Specify var=value pairs in a plain text file, each on a new line, and pass it as the value of the --vars-file option:

    $ tt create template app --vars-file variables.txt
    

    variables.txt can look like this:

    user_name=admin
    password=p4$$w0rd
    version=2
    

If a variable isn’t initialized in any of these ways, the default value from the manifest is used.

You can combine different ways of passing variables in a single call of tt create.
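When a value contains characters that the shell would expand, such as $ in the sample password, writing the --vars-file with a quoted here-document keeps the text literal. The sketch below writes the sample variables.txt and checks that every non-empty line has the var=value form. Both helper names are hypothetical, and the variable-name pattern is an assumption, not a rule enforced by tt.

```shell
# Hypothetical helper: write the sample variables file.
# The quoted 'EOF' delimiter prevents expansion of "$$" in the password.
write_vars_file() {
    cat > "$1" <<'EOF'
user_name=admin
password=p4$$w0rd
version=2
EOF
}

# Hypothetical helper: succeed only if every non-empty line looks like name=value.
validate_vars_file() {
    ! grep -vE '^$|^[A-Za-z_][A-Za-z0-9_]*=' "$1" > /dev/null
}
```

The file can then be passed as `tt create template --vars-file variables.txt`, as in the example above.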

By default, the application appears in the directory named after the provided application name (--name value).

To change the application location, use the -d (--dst) option.

-d PATH, --dst PATH

Path to the directory where the application will be created.

-f, --force

Force rewrite the application directory if it already exists.

--name NAME

Application name.

-s, --non-interactive

Non-interactive mode.

--var [VAR=VALUE ...]

Variable definition. Usage: --var var_name=value.

--vars-file FILEPATH

Path to the file with variable definitions.

Interacting with the CRUD module

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt crud COMMAND [COMMAND_OPTION ...]

tt crud enables the interaction with a cluster using the CRUD module. COMMAND is one of the following:

Downloading Tarantool Enterprise SDK

$ tt download VERSION [OPTION ...]

tt download downloads Tarantool Enterprise SDK from the customer zone.

The VERSION is a part of the SDK archive name between tarantool-enterprise-sdk- and the platform identifier. For example, to download tarantool-enterprise-sdk-gc64-3.0.0-0-gf58f7d82a-r23.linux.x86_64.tar.gz, run:

$ tt download gc64-3.0.0-0-gf58f7d82a-r23

tt automatically chooses the bundle for the current platform.
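The VERSION argument can be derived from an archive file name with plain parameter expansion. This is a hypothetical helper that assumes names of the form tarantool-enterprise-sdk-VERSION.OS.ARCH.tar.gz, as in the example above.

```shell
# Hypothetical helper: print the tt download VERSION for an SDK archive name.
sdk_version_from_archive() {
    local name="${1##*/}"                      # drop any directory part
    name="${name#tarantool-enterprise-sdk-}"   # drop the fixed prefix
    name="${name%.tar.gz}"                     # drop the archive extension
    printf '%s\n' "${name%.*.*}"               # drop the .OS.ARCH suffix
}
```

For example, `tt download "$(sdk_version_from_archive tarantool-enterprise-sdk-gc64-3.0.0-0-gf58f7d82a-r23.linux.x86_64.tar.gz)"` is equivalent to the command above.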

To download the Tarantool Enterprise SDK using tt download, you need to provide access credentials for the Tarantool customer zone. Use one of the following ways to pass the username and the password:

--dev

Download a development build.

--directory-prefix STRING

The downloaded SDK location. Default: . (current directory).

Adding external applications to environments

$ tt enable {APPLICATION|SCRIPT}

tt enable adds an external Tarantool application to the current environment by creating a symlink to it in the instances.enabled directory.

To add the application located in /home/tt-user/external_app to the current tt environment:

$ tt enable /home/tt-user/external_app

Once the application is added, you can work with it the same way as with applications created in this environment.

Exporting data

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt [crud|tdg2] export URI SPACE:FILE ... [EXPORT_OPTION ...]

tt [crud|tdg2] export exports a space’s data to a file. Three export commands cover the following cases:

tt [crud|tdg2] export takes the following arguments:

Note

Read access to the space is required to export its data.

tt export exports data in the following formats:

Exporting isn’t supported for the interval field type.

The command below exports data of the customers space to the customers.csv file:

$ tt crud export localhost:3301 customers:customers.csv

If the customers space has five fields (id, bucket_id, firstname, lastname, and age), the file with exported data might look like this:

1,477,Andrew,Fuller,38
2,401,Michael,Suyama,46
3,2804,Robert,King,33
# ...

If a tuple contains a null value, for example, [1, 477, 'Andrew', null, 38], it is exported as an empty value:

1,477,Andrew,,38

To export data with a space’s field names in the first row of the CSV file, use the --header option:

$ tt crud export localhost:3301 customers:customers.csv  \
                 --header

In this case, field values start from the second row, for example:

id,bucket_id,firstname,lastname,age
1,477,Andrew,Fuller,38
2,401,Michael,Suyama,46
3,2804,Robert,King,33
# ...

In the CSV format, tt exports empty values by default for fields containing compound data such as arrays or maps. To export compound values in a specific format, use the --compound-value-format option. For example, the command below exports compound values to CSV serialized in JSON:

$ tt crud export localhost:3301 customers:customers.csv  \
                 --compound-value-format json

Note

In the TDG2 data model, a type represents a Tarantool space, and an object of a type represents a tuple in the type’s underlying space.

The command below exports data of the customers type from a TDG2 cluster to the customers.jsonl file:

$ tt tdg2 export localhost:3301 customers:customers.jsonl

If token authentication is enabled in TDG2, pass the application token in the --token option:

$ tt tdg2 export localhost:3301 customers:customers.jsonl \
                 --token=2fc136cf-8cae-4655-a431-7c318967263d

If the customers type has four fields (id, first_name, second_name, and age), the file with exported data might look like this:

{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}

Fields with null values are omitted:

{"age":13,"first_name":"Zachariah","id":3}

Object fields that contain maps with non-string keys are converted to maps with string keys.

TDG2 limits the number of objects transferred from each storage during a query execution (the hard-limits.returned TDG2 configuration parameter). If the export batch size (the --batch-size option) is greater than this limit, more than hard-limits.returned objects may be requested from one storage, and the export will fail. To make sure that hard-limits.returned is never exceeded during an export operation, set the export batch size to a value less than or equal to this limit.

For example, if hard-limits.returned is set to 1000 in your TDG2 cluster:

# tdg2 config.yaml
# ...
hard-limits.returned: 1000

Set the tt tdg2 export batch size to a value less than or equal to 1000:

$ tt tdg2 export localhost:3301 customers:customers.jsonl --batch-size=1000
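In scripts, the batch size can be capped so that it never exceeds the cluster limit. safe_batch_size is a hypothetical helper; the limit is assumed to be read from the TDG2 configuration by hand and passed as the second argument.

```shell
# Hypothetical helper: print the smaller of the wanted batch size and the
# hard-limits.returned value. Usage: safe_batch_size WANTED LIMIT
safe_batch_size() {
    local wanted="$1" limit="$2"
    if [ "$wanted" -gt "$limit" ]; then
        printf '%s\n' "$limit"
    else
        printf '%s\n' "$wanted"
    fi
}

# Usage (not executed here):
# tt tdg2 export localhost:3301 customers:customers.jsonl \
#     --batch-size="$(safe_batch_size 5000 1000)"
```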

When connecting to a cluster with authentication enabled, specify access credentials in the --username and --password command options:

$ tt crud export localhost:3301 customers:customers.csv \
                 --username myuser --password 'p4$$w0rD'

To connect to instances that use SSL encryption, provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options. If necessary, add other SSL parameters in the --ssl* options.

$ tt crud export localhost:3301 customers:customers.csv \
                 --username myuser --password 'p4$$w0rD' \
                 --auth pap-sha256 --sslcertfile certs/server.crt \
                 --sslkeyfile certs/server.key

For connections that use SSL but don’t require additional parameters, add the --use-ssl option:

$ tt crud export localhost:3301 customers:customers.csv \
                 --username myuser --password 'p4$$w0rD' \
                 --use-ssl

--auth STRING

Applicable to: tt crud export, tt tdg2 export

Authentication type: chap-sha1, pap-sha256, or auto.

--batch-queue-size INT

The maximum number of tuple batches in the queue between the fetch and write threads (the default is 32).

tt exports data using two threads:

  • A fetch thread makes requests and receives data from a Tarantool instance.
  • A write thread encodes received data and writes it to the output.

The fetch thread uses a queue to pass received tuple batches to the write thread. If the queue is full, the fetch thread waits until the write thread takes a batch from it.

--batch-size INT

The number of tuples to transfer per request. The default is:

  • 10000 for tt export and tt crud export.
  • 100 for tt tdg2 export.

Important

When using tt tdg2 export, make sure that the batch size does not exceed the hard-limits.returned TDG2 parameter value set on the cluster.

--compound-value-format STRING

Applicable to: tt export, tt crud export

A format used to export compound values like arrays or maps. By default, tt exports empty values for fields containing such values.

Supported formats: json.

See also: Exporting compound data.

--header

Applicable to: tt export, tt crud export

Add field names in the first row.

See also: Exporting headers.

--password STRING

A password used to connect to the instance.

--readview

Applicable to: tt export, tt crud export

Export data using a read view.

--sslcafile STRING

Applicable to: tt crud export, tt tdg2 export

The path to a trusted certificate authorities (CA) file for encrypted connections.

See also Encrypted connection.

--sslcertfile STRING

Applicable to: tt crud export, tt tdg2 export

The path to an SSL certificate file for encrypted connections.

See also Encrypted connection.

--sslciphersfile STRING

Applicable to: tt crud export, tt tdg2 export

The list of SSL cipher suites used for encrypted connections, separated by colons (:).

See also Encrypted connection.

--sslkeyfile STRING

Applicable to: tt crud export, tt tdg2 export

The path to a private SSL key file for encrypted connections.

See also Encrypted connection.

--sslpassword STRING

Applicable to: tt crud export, tt tdg2 export

The password for the SSL key file for encrypted connections.

See also Encrypted connection.

--sslpasswordfile STRING

Applicable to: tt crud export, tt tdg2 export

A file with a list of passwords for the SSL key file for encrypted connections.

See also Authentication.

--token STRING

Applicable to: tt tdg2 export

An application token for connecting to TDG2.

--use-ssl STRING

Use SSL without providing any additional SSL parameters.

See also Encrypted connection.

--username STRING

A username for connecting to the instance.

Displaying help for tt and its commands

$ tt help [COMMAND]

tt help displays help:

Importing data

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt [crud|tdg2] import URI FILE:SPACE [IMPORT_OPTION ...]
# or
$ tt [crud|tdg2] import URI :SPACE < FILE [IMPORT_OPTION ...]

tt [crud|tdg2] import imports data from a file to a space. Three import commands cover the following cases:

tt [crud|tdg2] import takes the following arguments:

Note

Write access to the space and execute access to the universe are required to import data.

tt import imports data from the following formats:

Importing isn’t supported for the interval field type.

Suppose that you have the customers.csv file with a header containing field names in the first row:

id,firstname,lastname,age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...

If the target customers space has fields with the same names, you can import data using the --header and --match options specified as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match=header

In this case, fields in the input file and the target space are matched automatically. You can also match fields manually if field names in the input file and the target space differ. Note that if you’re importing data into a cluster, you don’t need to specify the bucket_id field. The CRUD module generates bucket_id values automatically.

The --match option enables importing data by matching field names in the input file and the target space manually. Suppose that you have the following customers.csv file with four fields:

customer_id,name,surname,customer_age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...

If the target customers space has the id, firstname, lastname, and age fields, you can configure mapping as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match "id=customer_id;firstname=name;lastname=surname;age=customer_age"

Similarly, you can configure mapping using numeric field positions in the input file:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match "id=1;firstname=2;lastname=3;age=4"
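With many fields, the --match value can be assembled from name pairs in a script. build_match_spec is a hypothetical helper that only joins its arguments with semicolons; it does not check that the field names exist.

```shell
# Hypothetical helper: join "space_field=input_field" pairs with ";".
build_match_spec() {
    local spec='' pair
    for pair in "$@"; do
        spec="${spec:+$spec;}$pair"   # add ";" only between pairs
    done
    printf '%s\n' "$spec"
}
```

For example, `--match "$(build_match_spec id=customer_id firstname=name lastname=surname age=customer_age)"` produces the same mapping as the manual example above.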

The following rules apply if some fields are missing from the input data or the target space:

  • If a space has fields that are not specified in input data, tt [crud] import tries to insert null values.
  • If input data contains fields missing in a target space, these fields are ignored.

When importing data into a CRUD-enabled sharded cluster, tt crud import ignores the bucket_id field values from the input file. This lets CRUD manage data distribution in the cluster by generating new bucket_id values for tuples during import.

If you need to preserve the original bucket_id values, use the --keep-bucket-id option:

$ tt crud import localhost:3301 customers.csv:customers \
                 --keep-bucket-id \
                 --header \
                 --match=header

The --on-exist option enables you to control data import when a duplicate primary key error occurs. In the example below, values already existing in the space are replaced with new ones:

$ tt crud import localhost:3301 customers.csv:customers \
                 --on-exist replace

To skip rows whose data cannot be parsed correctly, use the --on-error option as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --on-error skip

Note

In the TDG2 data model, a type represents a Tarantool space, and an object of a type represents a tuple in the type’s underlying space.

The command below imports objects of the customers type into a TDG2 cluster. The objects are described in the customers.jsonl file.

$ tt tdg2 import localhost:3301 customers.jsonl:customers

If token authentication is enabled in TDG2, pass the application token in the --token option:

$ tt tdg2 import localhost:3301 customers.jsonl:customers \
                 --token=2fc136cf-8cae-4655-a431-7c318967263d

The input file can look like this:

{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}
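If your source data is in CSV, a minimal sketch of converting it to the JSON Lines layout shown above could look like this. It assumes a hypothetical CSV with the columns id,first_name,second_name,age and plain values that need no JSON escaping; it is not a general CSV-to-JSON converter.

```shell
#!/usr/bin/env bash
# Hypothetical CSV source with columns: id,first_name,second_name,age
csv='1,Samantha,Carter,30
2,Fay,Rivers,41'

# Emit one JSON object per line (JSON Lines), with keys in the same order
# as in the example file above. Values are NOT JSON-escaped, so this only
# works for plain names without quotes or backslashes.
printf '%s\n' "$csv" | awk -F',' '{
  printf "{\"age\":%d,\"first_name\":\"%s\",\"id\":%d,\"second_name\":\"%s\"}\n", $4, $2, $1, $3
}' > customers.jsonl

cat customers.jsonl
```

The resulting customers.jsonl can then be passed to tt tdg2 import as in the commands above.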

Note

Since JSON describes objects in maps with string keys, there is no way to import a field value that is a map with a non-string key.

In case of an error during TDG2 import, tt tdg2 import rolls back the changes made within the current batch on the storage where the error has happened (per-storage rollback) and reports an error. On other storages, objects from the same batch can be successfully imported. So, the rollback behavior of tt tdg2 import is the same as that of tt crud import with the --rollback-on-error option.

Since object batches can be imported partially (per-storage rollback) and errors are not matched to specific objects, debugging failed imports can be difficult. To minimize this effect, the default batch size (--batch-size) for tt tdg2 import is 1. This makes debugging straightforward: you always know which object caused the error. On the other hand, this reduces performance compared to importing in larger batches.
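To get a feel for this trade-off, the sketch below counts the requests needed to import n objects, assuming one request per batch (the --batch-size option is described as the number of tuples to transfer per request). The requests_needed helper is hypothetical, and the count is only a rough estimate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough number of requests needed to import n objects
# with a given batch size, assuming one request per batch.
requests_needed() {
  local n=$1 batch=$2
  # Ceiling division: a partially filled last batch still needs a request.
  echo $(( (n + batch - 1) / batch ))
}

requests_needed 10000 1     # prints 10000 (default batch size for tt tdg2 import)
requests_needed 10000 100   # prints 100
```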

If you increase the batch size, tt informs you about the possible issues and asks for an explicit confirmation to proceed. To automatically confirm a batch import operation, add the --force option:

$ tt tdg2 import localhost:3301 customers.jsonl:customers \
                 --batch-size=100 \
                 --force

When connecting to a cluster with authentication enabled, specify access credentials in the --username and --password command options:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header --match=header \
                 --username myuser --password 'p4$$w0rD'

To connect to instances that use SSL encryption, provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options. If necessary, add other SSL parameters in the --ssl* options.

$ tt crud import localhost:3301 customers.csv:customers \
                 --header --match=header \
                 --username myuser --password 'p4$$w0rD' \
                 --auth pap-sha256 --sslcertfile certs/server.crt \
                 --sslkeyfile certs/server.key

For connections that use SSL but don’t require additional parameters, add the --use-ssl option:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header --match=header \
                 --username myuser --password 'p4$$w0rD' \
                 --use-ssl

--auth STRING

Applicable to: tt crud import, tt tdg2 import

Authentication type: chap-sha1, pap-sha256, or auto.

--batch-size INT

Applicable to: tt crud import, tt tdg2 import

The number of tuples to transfer per request. The default is:

--dec-sep STRING

Applicable to: tt import, tt crud import

The string of symbols that defines decimal separators for numeric data. The default is the string ".,", so both a period and a comma are treated as decimal separators.

Note

Symbols specified in this option must not overlap with those specified in --th-sep.

--delimiter STRING

Applicable to: tt import, tt crud import

A symbol that defines a field value delimiter. For CSV, the default delimiter is a comma (,). To use a tab character as a delimiter, set this value as tab:

$ tt crud import localhost:3301 customers.csv:customers \
                 --delimiter tab

Note

A delimiter cannot be \r, \n, or the Unicode replacement character (U+FFFD).

--error STRING

The name of a file containing rows that are not imported (the default is error).

See also: Handling parsing errors.

--force

Applicable to: tt tdg2 import

Automatically confirm importing into TDG2 with --batch-size greater than one.

--format STRING

The format of the input data.

Supported formats: csv.

--header

Applicable to: tt import, tt crud import

Process the first line as a header containing field names. In this case, field values start from the second line.

See also: Matching of input and space fields.

--keep-bucket-id

Applicable to: tt crud import

Preserve original values of the bucket_id field.

See also: Importing bucket_id into sharded clusters.

--log STRING

The name of a log file containing information about import errors (the default is import). If the log file already exists, new data is written to this file.

--match STRING

Applicable to: tt import, tt crud import

Configure matching between field names in the input file and the target space.

See also: Matching of input and space fields.

--null STRING

Applicable to: tt import, tt crud import

A value to be interpreted as null when importing data. By default, an empty value is interpreted as null. For example, a tuple imported from the following row …

1,477,Andrew,,38

… is imported as the following tuple: [1, 477, 'Andrew', null, 38].
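The null rule can be illustrated with a small awk script that turns the row above into tuple notation. This is a sketch under stated assumptions, not how tt parses CSV: the hypothetical null_value variable stands in for --null (empty by default), integers are printed unquoted, and all other values are quoted as strings.

```shell
#!/usr/bin/env bash
# Illustration only: null_value plays the role of --null (empty by default).
null_value=''
row='1,477,Andrew,,38'

tuple=$(printf '%s\n' "$row" | awk -F',' -v null_value="$null_value" '{
  out = "["
  for (i = 1; i <= NF; i++) {
    if ($i == null_value)       v = "null"              # value equal to the null marker
    else if ($i ~ /^-?[0-9]+$/) v = $i                  # integers stay unquoted
    else                        v = "\047" $i "\047"    # other values become quoted strings
    out = out (i > 1 ? ", " : "") v
  }
  print out "]"
}')

echo "$tuple"   # prints [1, 477, 'Andrew', null, 38]
```

Setting null_value to another marker, for example NULL, mirrors what passing --null NULL would mean for such rows.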

--on-error STRING

An action performed if a row to be imported cannot be parsed correctly. Possible values:

  • stop: stop importing data.
  • skip: skip rows whose data cannot be parsed correctly.

Duplicate primary key errors are handled using the --on-exist option.

See also: Handling parsing errors.

--on-exist STRING

An action performed if a duplicate primary key error occurs. Possible values:

  • stop: stop importing data.
  • skip: skip existing values when importing.
  • replace: replace existing values when importing.

Other errors are handled using the --on-error option.

See also: Handling duplicate primary key errors.

--password STRING

A password used to connect to the instance.

--progress STRING

The name of a progress file that stores the following information:

  • The positions of lines that were not imported at the last launch.
  • The last position that was processed at the last launch.

If a file with the specified name exists, it is taken into account when importing data. tt import tries to insert lines that were not imported and then continues importing from the last position.

At each launch, the content of a progress file with the specified name is overwritten. If the file with the specified name does not exist, a progress file is created with the results of this run.

Note

If the option is not set, then this mechanism is not used.

--quote STRING

Applicable to: tt import, tt crud import

A symbol that defines a quote. For CSV, double quotes (") are used by default. Within quoted values, a doubled quote symbol (for example, "") represents a literal quote character.
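The doubled-quote rule can be sketched with sed for the default quote symbol. The hypothetical unquote_csv_field helper assumes a single, fully quoted field and does not handle delimiters or line breaks inside the value; it only shows how "" collapses to ".

```shell
#!/usr/bin/env bash
# Illustration only: unescape one fully quoted CSV field that uses the
# default quote symbol ("). Not how tt actually parses CSV.
unquote_csv_field() {
  # Strip the surrounding quotes, then turn each doubled "" into a single ".
  printf '%s\n' "$1" | sed -e 's/^"//' -e 's/"$//' -e 's/""/"/g'
}

unquote_csv_field '"He said ""hello"""'   # prints: He said "hello"
```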

--rollback-on-error

Applicable to: tt crud import

If any operation fails on a storage, roll back the batch import on that storage.

Note

tt tdg2 import always works as if --rollback-on-error is set.

--sslcafile STRING

Applicable to: tt crud import, tt tdg2 import

The path to a trusted certificate authorities (CA) file for encrypted connections.

See also Encrypted connection.

--sslcertfile STRING

Applicable to: tt crud import, tt tdg2 import

The path to an SSL certificate file for encrypted connections.

See also Encrypted connection.

--sslciphersfile STRING

Applicable to: tt crud import, tt tdg2 import

The list of SSL cipher suites used for encrypted connections, separated by colons (:).

See also Encrypted connection.

--sslkeyfile STRING

Applicable to: tt crud import, tt tdg2 import

The path to a private SSL key file for encrypted connections.

See also Encrypted connection.

--sslpassword STRING

Applicable to: tt crud import, tt tdg2 import

The password for the SSL key file for encrypted connections.

See also Encrypted connection.

--sslpasswordfile STRING

Applicable to: tt crud import, tt tdg2 import

A file with a list of passwords to the SSL key file for encrypted connections.

See also Authentication.

--success STRING

The name of a file with rows that were imported (the default is success). Overwrites the file if it already exists.

--th-sep STRING

Applicable to: tt import, tt crud import

The string of symbols that defines thousand separators for numeric data. The default value includes a space and a backtick `. This means that 1 000 000 and 1`000`000 are both imported as 1000000.

Note

Symbols specified in this option must not overlap with those specified in --dec-sep.
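A minimal sketch of the default separator rules, assuming the default --th-sep set (space and backtick) and --dec-sep set (period and comma). The normalize_number helper is hypothetical and only mimics the effect. It also hints at why the two sets must not overlap: if the comma were in both, a value like 1,000 could mean either one thousand or one.

```shell
#!/usr/bin/env bash
# Illustration only: mimic the default thousand and decimal separators.
normalize_number() {
  # Drop thousand separators (space, backtick), then map a comma
  # decimal separator to a period.
  printf '%s\n' "$1" | tr -d ' `' | tr ',' '.'
}

normalize_number '1 000 000'   # prints 1000000
normalize_number '1`000`000'   # prints 1000000
normalize_number '3,14'        # prints 3.14
```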

--token STRING

Applicable to: tt tdg2 import

An application token for connecting to TDG2.

--use-ssl

Use SSL without providing any additional SSL parameters.

See also Encrypted connection.

--username STRING

A username for connecting to the instance.

Creating a tt environment

$ tt init

tt init creates a tt environment in the current directory. This includes:

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

tt init checks the existence of configuration files for Cartridge (cartridge.yml) or the tarantoolctl utility (.tarantoolctl) in the current directory. If such files are found, tt generates an environment that uses the same directories:

If there are no cartridge.yml or .tarantoolctl files in the current directory, tt init creates a default environment in it. This includes creating the following directories and files:

Create a tt environment in the current directory:

$ tt init

Installing Tarantool software

$ tt install PROGRAM_NAME [VERSION|COMMIT_HASH|PR_ID] [OPTION ...]

tt install installs the latest or an explicitly specified version of Tarantool or tt. The possible values of PROGRAM_NAME are:

Note

For tarantool-ee, account credentials are required. Specify them in a file (see the ee section of the configuration file) or provide them interactively.

Additionally, tt install can build the open-source programs tarantool and tt from a specific commit or a pull request in their GitHub repositories.

To uninstall a Tarantool or tt version, use tt uninstall.

--dynamic

Applicable to: tarantool, tarantool-ee

Use dynamic linking for building Tarantool.

-f, --force

Skip dependency check before installation.

--local-repo

Install a program from the local repository, which is specified in the repo section of the tt configuration file.

--no-clean

Don’t delete temporary files.

--reinstall

Reinstall a previously installed program.

--use-docker

Applicable to: tarantool, tarantool-ee

Build Tarantool in an Ubuntu 18.04 Docker container.

When called without an explicitly specified version, tt install installs the latest available version. If the version is specified in the incomplete format <MAJOR>.<MINOR>, the command installs the latest available patch version in the series. To check versions available for installation, use tt search.

By default, available versions of Tarantool Community Edition and tt are taken from their git repositories. Their installation includes building from sources, which requires some tools and dependencies, such as a C compiler. Make sure they are available in the system.

Tarantool Enterprise Edition is installed from prebuilt packages.

To install Tarantool EE using tt install, you need to provide access credentials for the Tarantool customer zone. Use one of the following ways to pass the username and the password:

  • A text file specified in the ee.credential_path parameter of the tt environment configuration:

    # tt.yaml
    # ...
    ee:
      credential_path: cred.txt
    

    cred.txt should contain a username and a password on separate lines:

    myuser@tarantool.io
    p4$$w0rD
    
  • Environment variables TT_CLI_EE_USERNAME and TT_CLI_EE_PASSWORD:

    $ export TT_CLI_EE_USERNAME=myuser@tarantool.io
    $ export TT_CLI_EE_PASSWORD='p4$$w0rD'
    $ tt install tarantool-ee
    

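A sketch of creating cred.txt from the shell. Single quotes matter here: in an unquoted or double-quoted shell word, $$ expands to the shell's process ID, so p4$$w0rD would not be stored literally. Restricting the file permissions is a suggestion, not a documented tt requirement.

```shell
#!/usr/bin/env bash
# Write the username on the first line and the password on the second.
# Single quotes keep the shell from expanding "$$" (the current PID).
printf '%s\n' 'myuser@tarantool.io' 'p4$$w0rD' > cred.txt

# The file stores a password in plain text, so limit access to its owner.
chmod 600 cred.txt
```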
tt install can be used to build custom Tarantool and tt versions for development purposes from commits and pull requests on their GitHub repositories.

To build Tarantool or tt from a specific commit on their GitHub repository, pass the commit hash (7 or more characters) after the program name. If you want to use a PR as a source, provide a pr/<PR_ID> argument:

$ tt install tarantool 03c184d
$ tt install tt pr/50
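The revision forms above can be summarized by a small hypothetical helper that classifies the argument passed after the program name. It assumes a commit hash is 7 to 40 lowercase hexadecimal characters and a pull request is pr/ followed by digits; tt itself may apply different rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a tt install revision argument.
classify_rev() {
  case "$1" in
    pr/*)
      # pr/<PR_ID>: the rest must be a number.
      if [[ "${1#pr/}" =~ ^[0-9]+$ ]]; then echo "pull request"; else echo "invalid"; fi
      ;;
    *)
      # A commit hash: 7 or more hex characters (a full SHA-1 has 40).
      if [[ "$1" =~ ^[0-9a-f]{7,40}$ ]]; then
        echo "commit"
      else
        echo "version or other"
      fi
      ;;
  esac
}

classify_rev 03c184d   # prints commit
classify_rev pr/50     # prints pull request
```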

If you build Tarantool from sources, you can install local builds to the current tt environment by running tt install with the tarantool-dev program name and the path to the build:

$ tt install tarantool-dev ~/src/tarantool/build

You can also set up a local repository with installation files you need. To use it, specify its location in the repo section of the tt configuration file and run tt install with the --local-repo flag.

Listing enabled applications

$ tt instances

tt instances shows the list of enabled applications and their instances in the current environment.

Note

Enabled applications are applications that are stored inside the instances_enabled directory specified in the tt configuration file. They can be either running or not. To check if an application is running, use tt status.

Terminating Tarantool instances

$ tt kill APPLICATION[:APP_INSTANCE]

tt kill terminates instances with SIGQUIT and SIGKILL signals.

To terminate all instances of the app application:

$ tt kill app

To terminate the storage-001-r instance of the app application without confirmation:

$ tt kill app:storage-001-r --force

To terminate the storage-001-r instance of the app application and generate its core dump:

$ tt kill app:storage-001-r --dump

-d, --dump

Generate core dumps of terminated processes.

-f, --force

Kill instances without confirmation.

Printing Tarantool logs

$ tt log [APPLICATION[:APP_INSTANCE]]

tt log prints the last lines of instance logs.

To print the last 10 log lines of all the app application instances:

$ tt log app

To print the last 50 log lines of the router instance of the app application:

$ tt log -n 50 app:router

To keep printing logs of the app application instances as they grow:

$ tt log -f app

-f, --follow

Keep printing new lines added to the log file.

-n, --lines

The number of last lines to output. Default: 10.

Rotating instance logs

$ tt logrotate APPLICATION[:APP_INSTANCE]

tt logrotate rotates the logs of a Tarantool application or of specific instances, as well as the tt log. For example, you need to call this command to continue logging after a log rotation program renames or moves the instances' logs. Learn more about rotating logs.

Calling tt logrotate on an application has the same effect as executing the built-in log.rotate() function on all its instances.

Rotate logs of the app application’s instances:

$ tt logrotate app

Managing centralized migrations

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt migrations COMMAND [COMMAND_OPTION ...]

tt migrations manages centralized migrations in a Tarantool EE cluster. See Centralized migrations with tt for a detailed guide on using the centralized migrations mechanism.

Important

Only Tarantool EE clusters with etcd centralized configuration storage are supported.

COMMAND is one of the following:

$ tt migrations publish ETCD_URI [MIGRATIONS_DIR | MIGRATION_FILE] [OPTION ...]

tt migrations publish sends the migration files to the cluster’s centralized configuration storage for future execution.

By default, the command sends all files stored in migrations/ inside the current directory.

$ tt migrations publish "https://user:pass@localhost:2379/myapp"

To select another directory with migration files, provide a path to it as the command argument:

$ tt migrations publish "https://user:pass@localhost:2379/myapp" my_migrations

To publish a single migration from a file, use its name or path as the command argument:

$ tt migrations publish "https://user:pass@localhost:2379/myapp" migrations/000001_create_space.lua

Optionally, you can provide a key to use as a migration identifier instead of the filename:

$ tt migrations publish "https://user:pass@localhost:2379/myapp" file.lua  \
                        --key=000001_create_space.lua

When publishing migrations, tt performs checks for:

Warning

Using options that skip checks when publishing migrations may cause migration inconsistency in the cluster.

$ tt migrations apply ETCD_URI [OPTION ...]

tt migrations apply applies published migrations to the cluster. It executes all migrations from the cluster’s centralized configuration storage on all its read-write instances (replica set leaders).

$ tt migrations apply "https://user:pass@localhost:2379/myapp"  \
                    --tarantool-username=admin --tarantool-password=pass

To apply a single published migration, pass its name in the --migration option:

$ tt migrations apply "https://user:pass@localhost:2379/myapp"  \
                    --tarantool-username=admin --tarantool-password=pass  \
                    --migration=000001_create_space.lua

To apply migrations on a single replica set, specify the --replicaset option:

$ tt migrations apply "https://user:pass@localhost:2379/myapp"  \
                    --tarantool-username=admin --tarantool-password=pass  \
                    --replicaset=storage-001

The command also provides options for migration troubleshooting: --ignore-order-violation, --force-reapply, and --ignore-preceding-status. Learn to use them in Troubleshooting migrations.

Warning

The use of migration troubleshooting options may lead to migration inconsistency in the cluster. Use them only for local development and testing purposes.

$ tt migrations status ETCD_URI [OPTION ...]

tt migrations status prints the list of migrations published to the centralized storage and the result of their execution on the cluster instances.

Possible migration statuses are:

To get the list of migrations stored in the given etcd storage and information about their execution on the cluster, run:

$ tt migrations status "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass

If the cluster uses SSL encryption, add SSL options. Learn more in Authentication.

Use the --migration and --replicaset options to get information about specific migrations or replica sets:

$ tt migrations status "https://user:pass@localhost:2379/myapp"  \
                     --tarantool-username=admin --tarantool-password=pass \
                     --replicaset=storage-001 --migration=000001_create_writers_space.lua

The --display-mode option lets you tailor the command output:

To find out the results of a migration execution on a specific replica set in the cluster, run:

$ tt migrations status "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass  \
                       --replicaset=storage-001 --display-mode=cluster

$ tt migrations stop ETCD_URI [OPTION ...]

tt migrations stop stops the execution of migrations in the cluster.

Warning

Calling tt migrations stop may cause migration inconsistency in the cluster.

To stop the execution of a migration currently running in the cluster:

$ tt migrations stop "https://user:pass@localhost:2379/myapp"  \
                     --tarantool-username=admin --tarantool-password=pass

tt migrations stop interrupts a single migration. If you call it to interrupt a process that applies multiple migrations, the ones completed before the call receive the APPLIED status. The migration interrupted by the call remains in the APPLY_STARTED status.

$ tt migrations remove ETCD_URI [OPTION ...]

tt migrations remove removes published migrations from the centralized storage. With additional options, it can also remove the information about the migration execution on the cluster instances.

To remove all migrations from a specified centralized storage:

$ tt migrations remove "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass

To remove a specific migration, pass its name in the --migration option:

$ tt migrations remove "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass  \
                       --migration=000001_create_writers_space.lua

Before removing migrations, the command checks their status on the cluster. To ignore the status and remove migrations anyway, add the --force-remove-on=config-storage option:

$ tt migrations remove "https://user:pass@localhost:2379/myapp"  \
                        --force-remove-on=config-storage

Note

In this case, cluster credentials are not required.

To remove migration execution information from the cluster (clear the migration status), use the --force-remove-on=cluster option:

$ tt migrations remove "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass  \
                       --force-remove-on=cluster

To clear all migration information from the centralized storage and cluster, use the --force-remove-on=all option:

$ tt migrations remove "https://user:pass@localhost:2379/myapp"  \
                       --tarantool-username=admin --tarantool-password=pass  \
                       --force-remove-on=all

Since tt migrations manages migrations via a centralized etcd storage, it needs credentials to access this storage. There are two ways to pass etcd credentials:

Credentials specified in the URI have a higher priority.

For commands that connect to the cluster (that is, all except publish), Tarantool credentials are also required. They are passed in the --tarantool-username and --tarantool-password options.

If the cluster uses SSL traffic encryption, provide the necessary connection parameters in the --tarantool-ssl* options: --tarantool-sslcertfile, --tarantool-sslkeyfile, and others. All options are listed in Options.
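A sketch of composing the etcd URI used in the examples above from shell variables. The variable names are hypothetical; the URI layout (scheme://user:password@host:port/prefix) follows the examples. Since credentials in the URI take priority, leave them out of the URI if you want the --config-storage-username and --config-storage-password options to apply.

```shell
#!/usr/bin/env bash
# Hypothetical variables; the resulting URI matches the examples above.
ETCD_USER='user'
ETCD_PASS='pass'
ETCD_ENDPOINT='localhost:2379'
CLUSTER_PREFIX='myapp'

ETCD_URI="https://${ETCD_USER}:${ETCD_PASS}@${ETCD_ENDPOINT}/${CLUSTER_PREFIX}"
echo "$ETCD_URI"   # prints https://user:pass@localhost:2379/myapp
```

The variable can then be quoted in any of the commands above, for example tt migrations status "$ETCD_URI" --tarantool-username=admin --tarantool-password=pass.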

--acquire-lock-timeout INT

Applicable to: apply

The timeout for acquiring the migrations fiber lock, in seconds. Default: 60. The fiber lock prevents concurrent migration runs.

--config-storage-password STRING

A password for connecting to the centralized migrations storage (etcd).

See also: Authentication.

--config-storage-username STRING

A username for connecting to the centralized migrations storage (etcd).

See also: Authentication.

--display-mode STRING

Applicable to: status

Display only specific information. Possible values:

  • config-storage – information about migrations published to the centralized storage.
  • cluster – information about migrations applied on the cluster.

See also: status.

--execution-timeout INT

Applicable to: apply, remove, status, stop

A timeout for completing the operation on a single Tarantool instance, in seconds. Default values:

  • 3 for remove, status, and stop
  • 3600 for apply

--force-reapply

Applicable to: apply

Apply migrations disregarding their previous status.

Warning

Using this option may lead to migrations inconsistency in the cluster.

--force-remove-on STRING

Applicable to: remove

Remove migrations disregarding their status. Possible values:

  • config-storage: remove migrations from the etcd centralized migrations storage regardless of their apply status on the cluster.
  • cluster: remove migration status information only on the Tarantool cluster.
  • all: execute both config-storage and cluster force removals.

Warning

Using this option may lead to migrations inconsistency in the cluster.

--ignore-order-violation

Applicable to: apply, publish

Skip the migration order check before publishing or applying migrations.

Warning

Using this option may lead to migrations inconsistency in the cluster.

--ignore-preceding-status

Applicable to: apply

Skip the status check of preceding migrations when applying.

Warning

Using this option may lead to migrations inconsistency in the cluster.

--key STRING

Applicable to: publish

Store the scenario under the /<prefix>/migrations/scenario/<key> etcd key instead of the key derived from the file name. Applicable only when publishing a single file.

--migration STRING

Applicable to: apply, remove, status

A migration to apply, remove, or check status.

--overwrite

Applicable to: publish

Overwrite existing migration storage keys.

Warning

Using this option may lead to migrations inconsistency in the cluster.

--replicaset STRING

Applicable to: apply, remove, status, stop

Execute the operation only on the specified replica set.

--skip-syntax-check

Applicable to: publish

Skip the syntax check before publishing.

Warning

Using this option may cause further tt migrations calls to fail.

--tarantool-auth STRING

Applicable to: apply, remove, status, stop

Authentication type used to connect to the cluster instances.

--tarantool-connect-timeout INT

Applicable to: apply, remove, status, stop

Tarantool cluster instances connection timeout, in seconds. Default: 3.

--tarantool-password STRING

Applicable to: apply, remove, status, stop

A password used to connect to the cluster instances.

--tarantool-sslcafile STRING

Applicable to: apply, remove, status, stop

SSL CA file used to connect to the cluster instances.

--tarantool-sslcertfile STRING

Applicable to: apply, remove, status, stop

SSL cert file used to connect to the cluster instances.

--tarantool-sslciphers STRING

Applicable to: apply, remove, status, stop

Colon-separated list of SSL ciphers used to connect to the cluster instances.

--tarantool-sslkeyfile STRING

Applicable to: apply, remove, status, stop

SSL key file used to connect to the cluster instances.

--tarantool-sslpassword STRING

Applicable to: apply, remove, status, stop

SSL key file password used to connect to the cluster instances.

--tarantool-sslpasswordfile STRING

Applicable to: apply, remove, status, stop

A file with a list of passwords to the SSL key file used to connect to the cluster instances.

--tarantool-use-ssl

Applicable to: apply, remove, status, stop

Whether SSL is used to connect to the cluster instances.

--tarantool-username STRING

Applicable to: apply, remove, status, stop

A username for connecting to the Tarantool cluster instances.

Packaging the application

$ tt pack TYPE [OPTION ...]

tt pack packages an application into a distributable bundle of the specified TYPE:

The command below creates a DEB package with all applications from the current tt environment:

$ tt pack deb

This command generates a .deb file whose name depends on the environment directory name and the operating system architecture, for example, test-env_0.1.0.0-1_x86_64.deb. The package contains the following files:

You can also pass various options to the tt pack command to adjust generation properties, for example, to customize a bundle name, choose which artifacts should be included, or specify the required application dependencies.

You can customize your application’s systemd unit file generated by tt pack. To add parameters to the unit file, define them in a YAML file named systemd-unit-params.yml in the application directory.

$ tt pack rpm # unit file with parameters from systemd-unit-params.yml if it exists

You can also pass unit parameters from an arbitrary file by adding the --unit-params-file option to the tt pack call:

$ tt pack rpm --unit-params-file my-params.yml # unit file with parameters from my-params.yml

Important

The systemd-unit-params.yml file has a higher priority than the --unit-params-file option. If this file exists, it overrides parameters from the file passed in the option.

tt pack supports the following systemd unit parameters:

An example of the systemd-unit-params.yml file:

FdLimit: 200
instance-env:
  INSTANCE: "inst:%i"
  TARANTOOL_WORKDIR: "/tmp"

Enterprise Edition

The integrity check functionality is supported by the Enterprise Edition only.

tt pack can generate checksums and signatures to use for integrity checks when running the application. These files are:

To generate checksums and signatures for integrity check, use the --with-integrity-check option. Its argument must be an RSA private key.

Note

You can generate a key pair using OpenSSL 3 as follows:

$ openssl genrsa -traditional -out private.pem 2048
$ openssl rsa -in private.pem -pubout > public.pem

To create a tar.gz archive with integrity check artifacts:

$ tt pack tgz --with-integrity-check private.pem

Learn how to perform integrity checks at the application startup and in runtime in the tt start reference.

--all

Include all artifacts in a bundle. In this case, a bundle might include snapshots, WAL files, and logs.

--app-list APPLICATIONS

Specify the applications included in a bundle.

Example

$ tt pack tgz --app-list app1,app3

--cartridge-compat

Applicable to: tgz

Package a Cartridge CLI-compatible archive.

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

--deps STRINGS

Applicable to: deb, rpm

Specify dependencies included in RPM and DEB packages.

Example

$ tt pack deb --deps 'wget,make>0.1.0,unzip>1,unzip<=7'

--deps-file STRING

Applicable to: deb, rpm

Specify the path to a file containing dependencies included in RPM and DEB packages. For example, the package-deps.txt file below contains several dependencies and their versions:

unzip==6.0
neofetch>=6,<7
gcc>8

If this file is placed in the current directory, a tt pack command might look like this:

$ tt pack deb --deps-file package-deps.txt

--filename

Specify a bundle name.

Example

$ tt pack tgz --filename sample-app.tar.gz

--name PACKAGE_NAME

Specify a package name.

Example

$ tt pack tgz --name sample-app --version 1.0.1

--preinst

Applicable to: deb, rpm

Specify the path to a pre-install script for RPM and DEB packages.

Example

$ tt pack deb --preinst pre.sh

--postinst

Applicable to: deb, rpm

Specify the path to a post-install script for RPM and DEB packages.

Example

$ tt pack deb --postinst post.sh

--tarantool-version

Specify a Tarantool version for packaging in a Docker container. For use with --use-docker only.

--unit-params-file

The path to a file with custom systemd unit parameters.

--use-docker

Build a package in an Ubuntu 18.04 Docker container. To specify a Tarantool version to use in the container, add the --tarantool-version option.

Before executing tt pack with this option, make sure Docker is running.

--version PACKAGE_VERSION

Specify a package version.

Example

$ tt pack tgz --name sample-app --version 1.0.1

--with-binaries

Include Tarantool and tt binaries in a bundle.

--with-integrity-check PRIVATE_KEY

Generate checksums and signatures for integrity checks at the application startup.

See also: Generating files for integrity checks

--with-tarantool-deps

Add Tarantool and tt as package dependencies.

--without-binaries

Don’t include Tarantool and tt binaries in a bundle.

--without-modules

Don’t include external modules in a bundle.

Playing the contents of .snap and .xlog files to a Tarantool instance

$ tt play URI FILE ... [OPTION ...]

tt play plays the contents of snapshot (.snap) and WAL (.xlog) files to another Tarantool instance. A single call of tt play can play multiple files.

-u USERNAME, --username USERNAME

A Tarantool user for connecting to the instance.

-p PASSWORD, --password PASSWORD

The user’s password.

--from LSN

Play operations starting from the given LSN.

--to LSN

Play operations up to the given LSN. Default: 18446744073709551615.

--replica ID

Filter the operations by replica ID. Can be passed more than once.

When calling tt play with filters by LSN (--from and --to) and replica ID (--replica), remember that LSNs differ across replicas. Thus, if you pass more than one replica ID via --replica, the result may not reflect the actual sequence of operations.

--space ID

Filter the operations by space ID. Can be passed more than once.

--show-system

Show the operations on system spaces.

tt play plays operations from .xlog and .snap files to the destination instance one by one. All data changes happen the same way as if they were performed on this instance. This means that:

Use one of the following ways to pass the username and the password when connecting to the instance:

  • The -u (--username) and -p (--password) options:

    $ tt play 192.168.10.10:3301 00000000000000000000.xlog -u myuser -p 'p4$$w0rD'
    
  • The connection string:

    $ tt play 'myuser:p4$$w0rD@192.168.10.10:3301' 00000000000000000000.xlog
    
  • Environment variables TT_CLI_USERNAME and TT_CLI_PASSWORD:

    $ export TT_CLI_USERNAME=myuser
    $ export TT_CLI_PASSWORD='p4$$w0rD'
    $ tt play 192.168.10.10:3301 00000000000000000000.xlog
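When a password contains `$`, quoting matters. In a POSIX shell, an unquoted `$$` expands to the shell's process ID, so `p4$$w0rD` would not reach tt unchanged. The Python sketch below shows two safe ways to build the call from a script:

- an argument list plus the TT_CLI_USERNAME and TT_CLI_PASSWORD environment variables, which bypasses the shell entirely;
- a fully quoted command line built with `shlex.join`.

The helper names are illustrative only.

```python
import os
import shlex
from typing import Dict, List, Tuple


def play_invocation(uri: str, files: List[str], user: str, password: str) -> Tuple[List[str], Dict[str, str]]:
    """Return argv and environment for `tt play` with credentials in env vars."""
    argv = ["tt", "play", uri, *files]
    # Copy the current environment and add the credential variables.
    env = dict(os.environ, TT_CLI_USERNAME=user, TT_CLI_PASSWORD=password)
    return argv, env


def play_shell_command(uri: str, files: List[str], user: str, password: str) -> str:
    """Return a shell command line using -u/-p with every argument quoted."""
    return shlex.join(["tt", "play", uri, *files, "-u", user, "-p", password])


argv, env = play_invocation("192.168.10.10:3301", ["00000000000000000000.xlog"], "myuser", "p4$$w0rD")
# Pass both to subprocess.run(argv, env=env, check=True) to run tt without a shell.
print(play_shell_command("192.168.10.10:3301", ["00000000000000000000.xlog"], "myuser", "p4$$w0rD"))
# tt play 192.168.10.10:3301 00000000000000000000.xlog -u myuser -p 'p4$$w0rD'
```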
    

Managing replica sets

$ tt replicaset COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]

tt replicaset (or tt rs) manages a Tarantool replica set.

COMMAND is one of the following:

$ tt replicaset status {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs status {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]

tt replicaset status (tt rs status) shows the current status of a replica set.

To view the status of all replica sets of an application in the current tt environment, run tt replicaset status with the application name:

$ tt replicaset status myapp

To view the status of a single replica set of an application, run tt replicaset status with a name or a URI of an instance from this replica set:

$ tt replicaset status myapp:storage-001-a

For a replica outside the current tt environment, specify its URI and access credentials:

$ tt replicaset status 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'

Learn about other ways to provide user credentials in Authentication.
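The target argument can be an application, an application instance, or a URI. The sketch below shows one heuristic way a script could tell these forms apart before calling tt. It is hypothetical and does not reproduce tt's own parsing. It treats an argument as a URI if it contains `@` or `/`, or if the part after the last colon is numeric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    kind: str                      # "app", "instance", or "uri"
    app: Optional[str] = None
    instance: Optional[str] = None
    uri: Optional[str] = None


def classify_target(arg: str) -> Target:
    """Heuristically classify an APPLICATION[:APP_INSTANCE] | URI argument."""
    # Credentials and Unix socket paths only appear in URIs.
    if "@" in arg or "/" in arg:
        return Target("uri", uri=arg)
    head, sep, tail = arg.rpartition(":")
    if not sep:
        return Target("app", app=arg)
    # A numeric part after the last colon looks like a port number.
    if tail.isdigit():
        return Target("uri", uri=arg)
    return Target("instance", app=head, instance=tail)


print(classify_target("myapp:storage-001-a"))
```

An instance whose name consists only of digits would be misread as a port by this heuristic.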

$ tt replicaset promote {APPLICATION:APP_INSTANCE | URI} [OPTIONS ...]
# or
$ tt rs promote {APPLICATION:APP_INSTANCE | URI} [OPTIONS ...]

tt replicaset promote (tt rs promote) promotes the specified instance, making it a leader of its replica set. This command works on Tarantool clusters with a local YAML configuration and Cartridge clusters.

Note

To promote an instance in a Tarantool cluster with a centralized configuration, use tt cluster replicaset promote.

tt replicaset promote works on Tarantool clusters with local YAML configurations in the off, manual, and election failover modes.

In the off and manual failover modes, this command updates the cluster configuration file according to the specified arguments and reloads it:

  • off failover mode: the command sets database.mode to rw on the specified instance.

    Important

    If failover is off, the command doesn’t consider the modes of other replica set members, so there can be any number of read-write instances in one replica set.

  • manual failover mode: the command updates the leader option of the replica set configuration. Other instances of this replica set become read-only.

Example:

$ tt replicaset promote my-app:storage-001-a

If some members of the affected replica set are running outside the current tt environment, tt replicaset promote can’t ensure the configuration reload on them and reports an error. You can skip this check by adding the -f/--force option:

$ tt replicaset promote my-app:storage-001-a --force
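For the off and manual modes, the effect on the configuration can be modeled as an edit of the parsed YAML. The Python sketch below makes these assumptions:

- the configuration has already been loaded into nested dictionaries with the usual groups > replicasets > instances layout;
- replication.failover is read only at the global level.

The sketch also leaves out the reload step that tt performs afterward.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def _replicaset_of(cfg: Config, instance: str) -> Config:
    """Return the replica set mapping that lists `instance`."""
    for group in cfg.get("groups", {}).values():
        for rs in group.get("replicasets", {}).values():
            if instance in (rs.get("instances") or {}):
                return rs
    raise KeyError(f"instance {instance!r} not found")


def promote(cfg: Config, instance: str) -> Config:
    """Return a promoted copy of `cfg`; handles only the off and manual modes."""
    new = copy.deepcopy(cfg)
    # Simplification: read failover only from the global replication section.
    failover = new.get("replication", {}).get("failover", "off")
    rs = _replicaset_of(new, instance)
    if failover == "off":
        # Only the target changes; other read-write members stay read-write.
        inst = rs["instances"][instance] or {}
        inst.setdefault("database", {})["mode"] = "rw"
        rs["instances"][instance] = inst
    elif failover == "manual":
        # The leader is read-write; the rest of the replica set becomes read-only.
        rs["leader"] = instance
    else:
        raise ValueError(f"failover mode {failover!r} is not covered by this sketch")
    return new


manual_cfg = {
    "replication": {"failover": "manual"},
    "groups": {"storages": {"replicasets": {"storage-001": {
        "leader": "storage-001-b",
        "instances": {"storage-001-a": {}, "storage-001-b": {}},
    }}}},
}
promoted = promote(manual_cfg, "storage-001-a")
print(promoted["groups"]["storages"]["replicasets"]["storage-001"]["leader"])
```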

In the election failover mode, tt replicaset promote initiates the new leader election by calling box.ctl.promote() on the specified instance. The --timeout option can be used to specify the election completion timeout:

$ tt replicaset promote my-app:storage-001-a --timeout=10

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

tt replicaset promote promotes instances in Cartridge clusters as follows:

  • disabled or eventual failover mode: the command changes the instance failover priority.

    Important

    In these cases, consistency is not guaranteed and replication conflicts may occur.

  • stateful or raft failover mode: the command calls cartridge.failover_promote() and waits until the instance transitions to the read-write mode. If the -f/--force option is specified, the force_inconsistency option of cartridge.failover_promote is set to true.

$ tt replicaset promote my-cartridge-app:storage-001-a --force

Learn more about Cartridge failover modes.

$ tt replicaset demote APPLICATION:APP_INSTANCE [OPTIONS ...]
# or
$ tt rs demote APPLICATION:APP_INSTANCE [OPTIONS ...]

tt replicaset demote (tt rs demote) demotes an instance in a Tarantool cluster with a local YAML configuration.

Note

To demote an instance in a Tarantool cluster with a centralized configuration, use tt cluster replicaset demote.

tt replicaset demote can demote instances in Tarantool clusters with local YAML configurations in the off and election failover modes.

Note

In clusters with manual failover mode, you can demote a read-write instance by promoting a read-only instance from the same replica set with tt replicaset promote.

In the off failover mode, tt replicaset demote sets the instance’s database.mode to ro and reloads the configuration.

Important

If failover is off, the command doesn’t consider the modes of other replica set members, so there can be any number of read-write instances in one replica set.

If some members of the affected replica set are running outside the current tt environment, tt replicaset demote can’t ensure the configuration reload on them and reports an error. You can skip this check by adding the -f/--force option:

$ tt replicaset demote my-app:storage-001-a --force

In the election failover mode, tt replicaset demote initiates a leader election in the replica set. The specified instance’s replication.election_mode is changed to voter for this election, which guarantees that another instance is elected as a new replica set leader.

The --timeout option can be used to specify the election completion timeout:

$ tt replicaset demote my-app:storage-001-a --timeout=10

$ tt replicaset expel APPLICATION:APP_INSTANCE [OPTIONS ...]
# or
$ tt rs expel APPLICATION:APP_INSTANCE [OPTIONS ...]

tt replicaset expel (tt rs expel) expels an instance from the cluster.

$ tt replicaset expel myapp:storage-001-b

The command supports the --config, --cartridge, and --custom options that force the use of a specific orchestrator.

To expel an instance from a Cartridge cluster:

$ tt replicaset expel my-cartridge-app:storage-001-b --cartridge

$ tt replicaset vshard COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vshard COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vs COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]

tt replicaset vshard (tt rs vs) manages vshard in the cluster.

It has the following subcommands:

$ tt replicaset vshard bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vshard bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vs bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]

tt replicaset vshard bootstrap (tt rs vs bootstrap) bootstraps vshard in the cluster.

$ tt replicaset vshard bootstrap myapp

With a URI and credentials:

$ tt replicaset vshard bootstrap 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'

You can specify the application name or the name of any cluster instance. The command automatically finds a vshard router in the cluster and calls vshard.router.bootstrap() on it.
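The router lookup can be pictured with the sharding.roles option of a Tarantool 3 YAML configuration. The sketch below is illustrative only: it inspects an already parsed configuration rather than running instances, and the inheritance order it applies (global, then group, then replica set) is a simplification. The function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

Config = Dict[str, Any]


def _instances_with_roles(cfg: Config) -> Iterator[Tuple[str, List[str]]]:
    """Yield (instance, sharding roles), inheriting roles global > group > replica set."""
    global_roles = cfg.get("sharding", {}).get("roles", [])
    for group in cfg.get("groups", {}).values():
        group_roles = group.get("sharding", {}).get("roles", global_roles)
        for rs in group.get("replicasets", {}).values():
            rs_roles = rs.get("sharding", {}).get("roles", group_roles)
            for name in rs.get("instances") or {}:
                yield name, rs_roles


def find_router(cfg: Config) -> str:
    """Return the first instance whose sharding roles include 'router'."""
    for name, roles in _instances_with_roles(cfg):
        if "router" in roles:
            return name
    raise LookupError("no vshard router in the configuration")


cfg = {
    "groups": {
        "storages": {
            "sharding": {"roles": ["storage"]},
            "replicasets": {"storage-001": {"instances": {"storage-001-a": {}}}},
        },
        "routers": {
            "sharding": {"roles": ["router"]},
            "replicasets": {"router-001": {"instances": {"router-001-a": {}}}},
        },
    }
}
print(find_router(cfg))
```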

The command supports the --config, --cartridge, and --custom options that force the use of a specific orchestrator.

To bootstrap vshard in a Cartridge cluster:

$ tt replicaset vshard bootstrap my-cartridge-app --cartridge

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

$ tt replicaset bootstrap APPLICATION[:APP_INSTANCE] [OPTIONS ...]
# or
$ tt rs bootstrap APPLICATION[:APP_INSTANCE] [OPTIONS ...]

tt replicaset bootstrap (tt rs bootstrap) bootstraps a Cartridge cluster or an instance. The command works within the current tt environment and uses application and instance names.

Note

tt replicaset bootstrap effectively duplicates two other commands:

To bootstrap the cartridge_app application using its default replica sets file replicasets.yml:

$ tt replicaset bootstrap cartridge_app

To use another file with replica set configuration, provide a path to it in the --file option:

$ tt replicaset bootstrap cartridge_app --file replicasets1.yml

To additionally bootstrap vshard after the cluster bootstrap, add --bootstrap-vshard:

$ tt replicaset bootstrap --bootstrap-vshard cartridge_app

When called with the instance name, tt replicaset bootstrap joins the instance to the replica set specified in the --replicaset option:

$ tt replicaset bootstrap --replicaset replicaset cartridge_app:instance1

$ tt replicaset rebootstrap APPLICATION:APP_INSTANCE [-y | --yes]
# or
$ tt rs rebootstrap APPLICATION:APP_INSTANCE [-y | --yes]

tt replicaset rebootstrap (tt rs rebootstrap) rebootstraps an instance: it stops the instance, removes its artifacts, and starts it again.

To rebootstrap the storage-001 instance of the myapp application:

$ tt replicaset rebootstrap myapp:storage-001

To automatically confirm rebootstrap, add the -y/--yes option:

$ tt replicaset rebootstrap myapp:storage-001 -y

$ tt replicaset roles [add|remove] APPLICATION[:APP_INSTANCE] ROLE_NAME [OPTIONS ...]
# or
$ tt rs roles [add|remove] APPLICATION[:APP_INSTANCE] ROLE_NAME [OPTIONS ...]

tt replicaset roles (tt rs roles) manages application roles in the cluster. This command works on Tarantool clusters with a local YAML configuration and Cartridge clusters. It has two subcommands:

Note

To manage roles in a Tarantool cluster with a centralized configuration, use tt cluster replicaset roles.

When called on clusters with local YAML configurations, tt replicaset roles subcommands add or remove the corresponding lines from the configuration file and reload the configuration.

Use the --global, --group, --replicaset, --instance options to specify the configuration scope to add or remove roles. For example, to add a role to all instances in a replica set:

$ tt replicaset roles add my-app roles.my-role --replicaset storage-a

You can also manage roles of a specific instance by specifying its name after the application name:

$ tt replicaset roles add my-app:router-001 roles.my-role

To remove a role defined in the global configuration scope:

$ tt replicaset roles remove my-app roles.my-role --global

If some instances of the affected scope are running outside the current tt environment, tt replicaset roles can’t ensure the configuration reload on them and reports an error. You can skip this check by adding the -f/--force option:

$ tt replicaset roles add my-app roles.my-role --replicaset storage-a --force
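Each scope option points to a different level of the YAML configuration, and the roles list at that level is what gets edited. The Python sketch below models this on a parsed configuration. The helper names are hypothetical, and the first scope given wins. The configuration reload that tt performs afterward is not shown.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Config = Dict[str, Any]


def _replicasets(cfg: Config) -> Iterator[Tuple[str, Config]]:
    for group in cfg.get("groups", {}).values():
        yield from group.get("replicasets", {}).items()


def scope_node(cfg: Config, *, is_global: bool = False, group: Optional[str] = None,
               replicaset: Optional[str] = None, instance: Optional[str] = None) -> Config:
    """Return the mapping whose `roles` list the given scope refers to."""
    if is_global:                      # like --global
        return cfg
    if group is not None:              # like --group
        return cfg["groups"][group]
    if replicaset is not None:         # like --replicaset
        for name, rs in _replicasets(cfg):
            if name == replicaset:
                return rs
        raise KeyError(replicaset)
    if instance is not None:           # like --instance or APPLICATION:APP_INSTANCE
        for _, rs in _replicasets(cfg):
            instances = rs.get("instances") or {}
            if instance in instances:
                if instances[instance] is None:
                    instances[instance] = {}
                return instances[instance]
        raise KeyError(instance)
    raise ValueError("no scope given")


def add_role(node: Config, role: str) -> None:
    roles = node.setdefault("roles", [])
    if role not in roles:
        roles.append(role)


def remove_role(node: Config, role: str) -> None:
    roles = node.get("roles", [])
    if role in roles:
        roles.remove(role)


cfg = {
    "roles": ["roles.my-role"],
    "groups": {
        "storages": {"replicasets": {"storage-a": {"instances": {"storage-a-001": None}}}},
        "routers": {"replicasets": {"router": {"instances": {"router-001": {}}}}},
    },
}
add_role(scope_node(cfg, replicaset="storage-a"), "roles.my-role")  # --replicaset storage-a
add_role(scope_node(cfg, instance="router-001"), "roles.my-role")   # my-app:router-001
remove_role(scope_node(cfg, is_global=True), "roles.my-role")       # --global
print(cfg)
```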

Important

The Tarantool Cartridge framework is deprecated and is not compatible with Tarantool 3.0 and later. This command is added for backward compatibility with earlier versions.

When called on Cartridge clusters, tt replicaset roles subcommands add or remove Cartridge cluster roles.

Cartridge cluster roles are defined per replica set. Thus, you can use the --replicaset and --group options to define a role’s scope. In this case, a group is a vshard group.

To add a role to a Cartridge cluster replica set:

$ tt replicaset roles add my-cartridge-app my-role --replicaset storage-001

To remove a role from a vshard group:

$ tt replicaset roles remove my-cartridge-app my-role --group cold-data

Learn more about Cartridge cluster roles.

You can specify the orchestrator to use for the application when calling tt replicaset commands. The following options are available:

$ tt replicaset status myapp --config
$ tt replicaset promote my-cartridge-app:storage-001-a --cartridge

If the orchestrator that the application actually uses does not match the specified option, an error is raised.

Use one of the following ways to pass the credentials of a Tarantool user when connecting to the instance by its URI:

--bootstrap-vshard

Applicable to: bootstrap

Additionally bootstrap vshard when bootstrapping a Cartridge application.

--cartridge

Force the Cartridge orchestrator for Tarantool 2.x clusters.

--config

Force the YAML configuration orchestrator for Tarantool 3.0 or later clusters.

--custom

Force a custom orchestrator for Tarantool 2.x clusters.

--file STRING

Applicable to: bootstrap

A file with Cartridge replica sets configuration. Default: replicasets.yml in the application directory.

See also: Bootstrapping a Cartridge cluster

-f, --force

Applicable to: promote, demote, roles

Skip operation on instances not running in the same environment.

-G, --global

Applicable to: roles on Tarantool 3.x and later

Apply the operation to the global configuration scope, that is, to all instances.

-g, --group STRING

Applicable to: roles

A name of the configuration group to which the operation applies.

-i, --instance STRING

Applicable to: roles

A name of the instance to which the operation applies. Not applicable to Cartridge clusters. Learn more in Managing roles in Cartridge clusters.

-r, --replicaset STRING

Applicable to: bootstrap, roles

A name of the replica set to which the operation applies.

See also: Bootstrapping an instance

-u, --username STRING

A Tarantool user for connecting to the instance using a URI.

-p, --password STRING

The user’s password.

--sslcertfile STRING

The path to an SSL certificate file for encrypted connections for the URI case.

--sslkeyfile STRING

The path to a private SSL key file for encrypted connections for the URI case.

--sslcafile STRING

The path to a trusted certificate authorities (CA) file for encrypted connections for the URI case.

--sslciphers STRING

The list of SSL cipher suites used for encrypted connections for the URI case, separated by colons (:).

--timeout UINT

Applicable to: promote, demote, expel, roles, vshard, bootstrap

The timeout for completing the operation, in seconds. Default:

  • 3 for promote, demote, expel, roles
  • 10 for vshard and bootstrap

--with-integrity-check STRING

Enterprise Edition

This option is supported by the Enterprise Edition only.

Applicable to: promote, demote, expel, roles

Generate hashes and signatures for integrity checks.

-y, --yes

Applicable to: rebootstrap

Automatically confirm rebootstrap.

Restarting a Tarantool instance

$ tt restart APPLICATION[:APP_INSTANCE] [OPTION ...]

tt restart restarts the specified running Tarantool instance. A tt restart call is equivalent to consecutive calls of tt stop and tt start.

When called without arguments, restarts all running applications in the current environment.

See also: Starting Tarantool applications, Stopping a Tarantool instance, Checking instance status.

-y, --yes

Automatically answer "yes" to the confirmation prompt.

Using the LuaRocks package manager

$ tt rocks [OPTION ...] [VAR=VALUE] COMMAND [ARGUMENT]

tt rocks provides means to manage Lua modules (rocks) via the LuaRocks package manager. tt uses its own LuaRocks installation connected to the Tarantool rocks repository.

Below are lists of supported LuaRocks flags and commands. For detailed information on their usage, refer to the LuaRocks documentation.

--dev

Enable the sub-repositories in rocks servers for rockspecs of in-development versions.

--server=SERVER

Fetch rocks/rockspecs from this server (takes priority over config file).

--only-server=SERVER

Fetch rocks/rockspecs from this server only (overrides any entries in the config file).

--only-sources=URL

Restrict downloads to paths matching the given URL.

--lua-dir=PREFIX

Specify which Lua installation to use.

--lua-version=VERSION

Specify which Lua version to use.

--tree=TREE

Specify which tree to operate on.

--local

Use the tree in the user’s home directory. Call tt rocks help path to learn how to enable it.

--global

Use the system tree when local_by_default is true.

--verbose

Display verbose output for the command executed.

--timeout=SECONDS

Timeout on network operations, in seconds. 0 means no timeout (wait forever). Default: 30.

admin           Use the luarocks-admin tool
build           Build and compile a rock
config          Query information about the LuaRocks configuration
doc             Show documentation for an installed rock
download        Download a specific rock file from a rocks server
help            Help on commands. Type tt rocks help <command> for more
init            Initialize a directory for a Lua project using LuaRocks
install         Install a rock
lint            Check syntax of a rockspec
list            List the currently installed rocks
make            Compile package in the current directory using a rockspec
make_manifest   Compile a manifest file for a repository
new_version     Auto-write a rockspec for a new version of a rock
pack            Create a rock, packing sources or binaries
purge           Remove all installed rocks from a tree
remove          Uninstall a rock
search          Query the LuaRocks servers
show            Show information about an installed rock
test            Run the test suite in the current directory
unpack          Unpack the contents of a rock
which           Tell which file corresponds to a given module name
write_rockspec  Write a template for a rockspec file

Running code in a Tarantool instance

$ tt run [SCRIPT|-e EXPR] [OPTION ...]

tt run executes Lua code in a new Tarantool instance.

-e EXPR, --evaluate EXPR

Execute the specified expression in a Tarantool instance.

-l LIB_NAME, --library LIB_NAME

Require the specified library.

-i, --interactive

Enter the interactive mode after the script execution.

-v, --version

Print the Tarantool version that is used for script execution.

tt run executes arbitrary Lua code in a Tarantool instance. The code can be provided either in a Lua file, or in a string passed after the -e/--evaluate flag. When called without arguments or flags, tt run opens the Tarantool console.

If libraries are required for execution, pass their names after the -l/--library flag.

By default, a Tarantool instance started by tt run shuts down after code execution completes. To leave this instance running and continue working in its console, add the -i/--interactive flag.

Listing available Tarantool versions

$ tt search PROGRAM_NAME [OPTION ...]

tt search lists versions of Tarantool and tt that are available for installation. The possible values of PROGRAM_NAME are:

Note

For tarantool-ee, account credentials are required. Specify them in a file (see the ee section of the configuration file) or provide interactively.

--debug

Applicable to: tarantool-ee

Search for debug builds of Tarantool Enterprise Edition’s SDK.

--local-repo

Search in the local repository, which is specified in the repo section of the tt configuration file.

--version VERSION

Applicable to: tarantool-ee

Tarantool Enterprise version.

Starting Tarantool applications

$ tt start [APPLICATION[:APP_INSTANCE]]

tt start starts Tarantool applications. The application files must be stored inside the instances_enabled directory specified in the tt configuration file. For detailed instructions on preparing and running Tarantool applications, see Application environment and Starting and stopping instances.

See also: Stopping a Tarantool instance, Restarting a Tarantool instance, Checking instance status.

To start all instances of the application stored in the app directory inside instances_enabled in accordance with its instances.yml:

$ tt start app

To start all instances of the app application appending their logs to stdout (in the interactive mode):

$ tt start -i app

To start the router instance of the app application:

$ tt start app:router

When called without arguments, starts all enabled applications in the current environment:

$ tt start

tt start can start entire Tarantool clusters based on their YAML configurations. A cluster application directory inside instances_enabled must contain the following files:

For more information about Tarantool application layout, see Application environment.

Note

tt also supports Tarantool applications with configuration in code, which is considered a legacy approach since Tarantool 3.0. For information about using tt with such applications, refer to the Tarantool 2.11 documentation.

tt start runs Tarantool applications in the background and uses its own watchdog process for status checks (tt status) and application stopping (tt stop).

Important

Do not switch on the background mode using the cluster configuration (process.background: true in the YAML configuration) or code (box.cfg.background = true) in applications that you run with tt. If you start such an application with tt start, tt won’t be able to check the application status or stop it using the corresponding commands.

Enterprise Edition

The integrity check functionality is supported by the Enterprise Edition only.

tt start can perform initial and periodical integrity checks of the environment, application, and centralized configuration.

To enable integrity checks of environment and application files, you need to pack the application using tt pack with the --with-integrity-check option. This option generates and signs checksums of executables and configuration files in the current tt environment. Learn more in Generating files for integrity checks.

To enable integrity check of the configuration at the centralized storage, publish the configuration to this storage using tt cluster publish with the --with-integrity-check option. This option generates and signs configuration checksums and saves them to the storage. Learn more in Publishing configurations with integrity check.

To perform the integrity checks when running the application, start it with the --integrity-check global option. Its argument must be a public key matching the private key that was used for generating checksums.

$ tt --integrity-check public.pem start myapp

After such a call, tt checks the environment, application, and configuration integrity using the checksums and starts the application if the checks pass. Then, integrity checks are performed periodically while the application is running. By default, they are performed once every 24 hours. You can adjust the integrity check period by adding the --integrity-check-period option:

$ tt --integrity-check public.pem start myapp --integrity-check-period 60

Additionally, Tarantool checks the integrity of the modules that the application uses at load time, that is, when require('module') is called.

If an integrity check fails, tt stops the application.
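The principle behind these checks is to record checksums when the application is packed and compare them later. It can be sketched in Python with the standard hashlib module. This is an illustration only: SHA-256, the manifest layout, and the function names are assumptions. The sketch also leaves out the signing of checksums with the private key and their verification with the public key, which a real check depends on, because that part needs a cryptography library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Default period of the periodic checks according to the reference (24 hours).
CHECK_PERIOD_SECONDS = 24 * 60 * 60


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(paths: Iterable[Path]) -> Dict[str, str]:
    """Map each file path to its checksum (done once, at packing time)."""
    return {str(p): sha256_file(p) for p in paths}


def verify_manifest(manifest: Dict[str, str]) -> bool:
    """Return False if any listed file is missing or has a different checksum."""
    return all(Path(name).is_file() and sha256_file(Path(name)) == expected
               for name, expected in manifest.items())


def demo() -> Tuple[bool, bool]:
    """Verify an untouched file, then verify again after modifying it."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "init.lua"
        script.write_text("return {}\n")
        manifest = make_manifest([script])
        before = verify_manifest(manifest)
        script.write_text("return { tampered = true }\n")
        return before, verify_manifest(manifest)


print(demo())  # (True, False)
```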

-i, --interactive

Start the application or instance in the interactive mode. In this mode, instance logs are printed to the standard output in real time.

You can use the SIGINT signal (CTRL+C) to stop tt and its child Tarantool processes in the interactive mode. No watchdog processes are created.

--integrity-check-interval NUMBER

Integrity check interval in seconds. Default: 86400 (24 hours). Set this option to 0 to disable periodic checks.

See also: Integrity check

Checking instance status

$ tt status [APPLICATION[:APP_INSTANCE]] [OPTION ...]

tt status prints the information about Tarantool applications and instances in the current environment. This includes:

When called without arguments, prints the status of all enabled applications in the current environment.

-d, --details

Print detailed alerts.

-p, --pretty

Print the status as a pretty-formatted table.

Stopping a Tarantool instance

$ tt stop [APPLICATION[:APP_INSTANCE]]

tt stop stops the specified running Tarantool applications or instances. Before stopping the instances, the command prompts the user for confirmation.

When called without arguments, tt stop stops all running applications in the current environment.

See also: Starting Tarantool applications, Restarting a Tarantool instance, Checking instance status.

-y, --yes

Stop instances without confirmation.

Interacting with the Tarantool Data Grid 2

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt tdg2 COMMAND [COMMAND_OPTION ...]

tt tdg2 enables the interaction with Tarantool Data Grid 2 clusters. COMMAND is one of the following:

Uninstalling Tarantool software

$ tt uninstall PROGRAM_NAME [VERSION]

tt uninstall uninstalls a previously installed Tarantool version.

Uninstall Tarantool 2.10.4:

$ tt uninstall tarantool 2.10.4

Displaying the tt version

$ tt version

tt version shows the version of the tt utility being used.

Extending the tt functionality

The tt utility implements a modular architecture: its commands are, in fact, separate modules. When you run tt with a command, the corresponding module is executed with the given arguments.

The modular architecture makes it possible to extend the tt functionality with external modules (as opposed to internal modules that implement built-in commands). Put simply, you can write any code you want to execute from tt, pack it into an executable, and run it with a tt command:

tt my-module-name my-args

The name of the command that executes a module is the same as the name of the module’s executable.

Executables that implement external tt modules must have two flags:

External modules must be located in the modules directory specified in the configuration file:

tt:
  modules:
    directory: path/to/modules/dir

To check if a module is available in tt, call tt help. It will show the available external modules in the EXTERNAL COMMANDS section together with their descriptions.
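As an illustration, a Python script can serve as an external module once it is made executable and placed in the modules directory. Its file name, here the hypothetical hello, becomes the command name. The two mandatory flags are not listed above. This sketch assumes they are --description and --help, and that tt passes the arguments that follow the command name. Check the tt documentation for the exact contract.

```python
#!/usr/bin/env python3
"""A hypothetical external tt module saved as an executable file named `hello`."""
import sys
from typing import List

DESCRIPTION = "Print a greeting (sample external module)"
HELP = "Usage: tt hello [NAME]\n\nPrint a greeting for NAME (default: world)."


def run(args: List[str]) -> str:
    """Return the text to print for the given arguments."""
    if "--description" in args:
        return DESCRIPTION   # assumed to be shown in the tt help output
    if "--help" in args:
        return HELP          # assumed to be shown by tt help hello
    name = args[0] if args else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

Suppose the file is saved as path/to/modules/dir/hello and `chmod +x` is applied. Under the assumptions above, `tt hello Alice` would then be expected to print `Hello, Alice!`.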

External modules can overload built-in tt commands. If you want to change the behavior of a built-in command, create an external module with the same name and your own implementation.

When tt sees two modules – an external and an internal one – with the same name, it will use the external module by default.

For example, if you want tt to show the information about your Tarantool application, write the external module version that outputs the information you need. The tt version call will execute this module instead of the built-in one:

tt version # Calls the external module if it's available

You can force the use of the internal module by running tt with the --internal or -I option. The following call will execute the built-in version even if there is an external module with the same name:

tt version -I # Calls the internal module
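The lookup order described above can be summarized in a short Python sketch. It models the documented behavior rather than tt's source: the function names and return strings are invented. It also assumes that a module counts as available when an executable file with the command's name exists in the modules directory.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import List, Set, Tuple

INTERNAL_FLAGS = {"-I", "--internal"}


def resolve(command: str, args: List[str], modules_dir: Path, builtins: Set[str]) -> Tuple[str, List[str]]:
    """Pick the module to run: external by default, internal if -I/--internal is given."""
    force_internal = any(a in INTERNAL_FLAGS for a in args)
    rest = [a for a in args if a not in INTERNAL_FLAGS]
    external = modules_dir / command  # the executable name equals the command name
    if not force_internal and external.is_file() and os.access(external, os.X_OK):
        return f"external {external.name}", rest
    if command in builtins:
        return f"internal {command}", rest
    raise LookupError(f"unknown command: {command}")


def demo() -> Tuple[str, str]:
    """Shadow the built-in `version` command with an external module."""
    with tempfile.TemporaryDirectory() as tmp:
        module = Path(tmp) / "version"
        module.write_text("#!/bin/sh\necho custom version\n")
        module.chmod(module.stat().st_mode | stat.S_IXUSR)
        builtins = {"version", "start", "stop"}
        default, _ = resolve("version", [], Path(tmp), builtins)
        forced, _ = resolve("version", ["-I"], Path(tmp), builtins)
        return default, forced


print(demo())  # ('external version', 'internal version')
```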

tt interactive console

The tt utility features a command-line console that allows executing requests and Lua code interactively on the connected Tarantool instances. It is similar to the Tarantool interactive console with one key difference: the tt console allows connecting to any available instance, both local and remote. Additionally, it offers more flexible output formatting capabilities.

To connect to a Tarantool instance using the tt console, run tt connect.

Specify the instance URI and the user credentials in the corresponding options:

$ tt connect 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
   • Connecting to the instance...
   • Connected to 192.168.10.10:3301

192.168.10.10:3301>

If a user is not specified, the connection is established on behalf of the guest user.

If the instance runs in the same tt environment, you can establish a local connection with it by specifying the <application>:<instance> string instead of the URI:

$ tt connect app:storage001
    • Connecting to the instance...
    • Connected to app:storage001

 app:storage001>

Local connections are established on behalf of the admin user.

To get the list of supported console commands, enter \help or ?. To quit the console, enter \quit or \q.

Similarly to the Tarantool interactive console, the tt console can handle Lua or SQL input. The default is Lua. For Lua input, the tab-based autocompletion works automatically for loaded modules.

To change the input language to SQL, run \set language sql:

app:storage001> \set language sql
app:storage001> select * from bands where id = 1
---
- metadata:
  - name: id
    type: unsigned
  - name: band_name
    type: string
  - name: year
    type: unsigned
  rows:
  - [1, 'Roxette', 1986]
...

To change the input language back to Lua, run \set language lua:

app:storage001> \set language lua
app:storage001> box.space.bands:select { 1 }
---
- - [1, 'Roxette', 1986]
...

Note

You can also specify the input language in the tt connect call using the -l/--language option:

$ tt connect app:storage001 -l sql

By default, the tt console prints the output data in the YAML format, each tuple on a new line:

app:storage001> box.space.bands:select { }
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'Ace of Base', 1987]
...

You can switch to alternative output formats – Lua or ASCII (pseudographics) tables – using the \set output console command:

app:storage001> \set output lua
app:storage001> box.space.bands:select { }
{{1, "Roxette", 1986}, {2, "Scorpions", 1965}, {3, "Ace of Base", 1987}};
app:storage001> \set output table
app:storage001> box.space.bands:select { }
+------+-------------+------+
| id   | band_name   | year |
+------+-------------+------+
| 1    | Roxette     | 1986 |
+------+-------------+------+
| 2    | Scorpions   | 1965 |
+------+-------------+------+
| 3    | Ace of Base | 1987 |
+------+-------------+------+

Note

Field names are printed since Tarantool 3.2. On earlier versions, actual names are replaced by numbered placeholders col1, col2, and so on.

The table output can be printed in the transposed format, where an object’s fields are arranged in columns instead of rows:

app:storage001> \set output ttable
app:storage001> box.space.bands:select { }
+-----------+---------+-----------+-------------+
| id        | 1       | 2         | 3           |
+-----------+---------+-----------+-------------+
| band_name | Roxette | Scorpions | Ace of Base |
+-----------+---------+-----------+-------------+
| year      | 1986    | 1965      | 1987        |
+-----------+---------+-----------+-------------+
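The difference between table and ttable is a transposition: the same grid of field names and values is printed with rows and columns swapped. The Python sketch below illustrates this. It is not tt's renderer. Its column widths come from the longest cell only, so in table mode the id column comes out narrower than in tt's output above. For the sample data, its ttable output matches the block above.

```python
from typing import List, Sequence


def _grid(rows: List[List[str]]) -> str:
    """Render rows as an ASCII grid with a separator line after every row."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    sep = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [sep]
    for row in rows:
        lines.append("| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |")
        lines.append(sep)
    return "\n".join(lines)


def table(names: Sequence[str], tuples: Sequence[Sequence[object]]) -> str:
    """Field names form the header row; each tuple is a row."""
    return _grid([list(names)] + [[str(v) for v in t] for t in tuples])


def ttable(names: Sequence[str], tuples: Sequence[Sequence[object]]) -> str:
    """Transposed: field names form the first column; each tuple is a column."""
    rows = [list(names)] + [[str(v) for v in t] for t in tuples]
    return _grid([list(col) for col in zip(*rows)])


bands = [(1, "Roxette", 1986), (2, "Scorpions", 1965), (3, "Ace of Base", 1987)]
print(ttable(["id", "band_name", "year"], bands))
```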

Note

You can also specify the output format in the tt connect call using the -x/--outputformat option:

$ tt connect app:storage001 -x table

For table and ttable output, more customizations are possible with the following commands:

Show help on the tt console.

Quit the tt console.

Show available keyboard shortcuts.

Set the input language. Possible values:

  • lua (default)
  • sql

An analog of the tt connect option -l/--language.

Set the output format. Possible FORMAT values:

  • yaml (default) – each output item is a YAML object. Example: [1, 'Roxette', 1986]. Shorthand: \xy.
  • lua – each output tuple is a separate Lua table. Example: {{1, "Roxette", 1986}};. Shorthand: \xl.
  • table – the output is a table where tuples are rows. Shorthand: \xt.
  • ttable – the output is a transposed table where tuples are columns. Shorthand: \xT.

Note

The \x command switches the output format cyclically in the order yaml > lua > table > ttable.

The format of table and ttable output can be adjusted using the \set table_format, \set graphics, and \set table_column_width commands.

An analog of the tt connect option -x/--outputformat.

Set the table format if the output format is table or ttable. Possible values:

  • default – a pseudographics (ASCII) table.
  • markdown – a table in the Markdown format.
  • jira – a Jira-compatible table.

Whether to print pseudographics for table cells if the output format is table or ttable. Possible values: true (default) and false.

The shorthands are:

  • \xG for true
  • \xg for false

Set the maximum printed width of table cell content. If the content is longer, it continues on the next line, starting with the + (plus) sign.

Shorthand: \xw

Migration from tarantoolctl to tt

tt is a command-line utility for managing Tarantool applications that replaces tarantoolctl. Starting from version 3.0, tarantoolctl is no longer shipped as part of the Tarantool distribution; tt is the only recommended tool for managing Tarantool applications from the command line.

tarantoolctl remains fully compatible with Tarantool 2.* versions. However, it doesn’t receive major updates anymore.

We recommend that you migrate from tarantoolctl to tt to get full support and timely updates and fixes.

tt supports system-wide environment configuration by default. If you have Tarantool instances managed by tarantoolctl in such an environment, you can switch to tt without additional migration steps or use tt along with tarantoolctl.

Example:

$ sudo tt instances
List of enabled applications:
• example

$ tarantoolctl start example
Starting instance example...
Forwarding to 'systemctl start tarantool@example'

$ tarantoolctl status example
Forwarding to 'systemctl status tarantool@example'
tarantool@example.service - Tarantool Database Server
    Loaded: loaded (/lib/systemd/system/tarantool@.service; enabled; vendor preset: enabled)
    Active: active (running)
    Docs: man:tarantool(1)
    Main PID: 6698 (tarantool)
. . .

$ sudo tt status
• example: RUNNING. PID: 6698.

$ sudo tt connect example
• Connecting to the instance...
• Connected to /var/run/tarantool/example.control

/var/run/tarantool/example.control>

$ sudo tt stop example
• The Instance example (PID = 6698) has been terminated.

$ tarantoolctl status example
Forwarding to 'systemctl status tarantool@example'
tarantool@example.service - Tarantool Database Server
    Loaded: loaded (/lib/systemd/system/tarantool@.service; enabled; vendor preset: enabled)
    Active: inactive (dead)

If you have a local tarantoolctl configuration, create a tt environment based on the existing .tarantoolctl configuration file. To do this, run tt init in the directory where the file is located.

Example:

$ cat .tarantoolctl
default_cfg = {
    pid_file  = "./run/tarantool",
    wal_dir   = "./lib/tarantool",
    memtx_dir = "./lib/tarantool",
    vinyl_dir = "./lib/tarantool",
    log       = "./log/tarantool",
    language  = "Lua",
}
instance_dir = "./instances.enabled"

$ tt init
• Found existing config '.tarantoolctl'
• Environment config is written to 'tt.yaml'

After that, you can start managing Tarantool instances in this environment with tt:

$ tt start app1
• Starting an instance [app1]...

$ tt status app1
• app1: RUNNING. PID: 33837.

$ tt stop app1
• The Instance app1 (PID = 33837) has been terminated.

$ tt check app1
• Result of check: syntax of file '/home/user/instances.enabled/app1.lua' is OK

Most tarantoolctl commands look the same in tt: tarantoolctl start and tt start, tarantoolctl play and tt play, and so on. To migrate such calls, it is usually enough to replace the utility name. There can be slight differences in command flags and format. For details on tt commands, see the tt commands reference.

The following commands are different in tt:

tarantoolctl command    tt command
tarantoolctl enter      tt connect
tarantoolctl eval       tt connect with the -f flag

Note

tt connect also covers tarantoolctl connect with the same syntax.

Example:

# tarantoolctl enter > tt connect
$ tarantoolctl enter app1
connected to unix/:./run/tarantool/app1.control
unix/:./run/tarantool/app1.control>

$ tt connect app1
• Connecting to the instance...
• Connected to /home/user/run/tarantool/app1/app1.control

# tarantoolctl eval > tt connect -f
$ tarantoolctl eval app1 eval.lua
connected to unix/:./run/tarantool/app1.control
---
- 42
...

$ tt connect app1 -f eval.lua
---
- 42
...

# tarantoolctl connect > tt connect
$ tarantoolctl connect localhost:3301
connected to localhost:3301
localhost:3301>

$ tt connect localhost:3301
• Connecting to the instance...
• Connected to localhost:3301
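For scripts that call tarantoolctl, the rules above can be applied mechanically: replace the utility name, turn enter into connect, and turn eval INSTANCE FILE into connect INSTANCE -f FILE. The following Python sketch is a hypothetical helper, not part of tt. It passes flags through unchanged, so calls that use utility-specific flags still need manual review.

```python
import shlex

# Subcommands whose name differs between the two utilities.
RENAMED = {"enter": "connect"}


def to_tt(command: str) -> str:
    """Rewrite a tarantoolctl call as the corresponding tt call (rough sketch)."""
    args = shlex.split(command)
    if len(args) < 2 or args[0] != "tarantoolctl":
        raise ValueError(f"not a tarantoolctl command: {command!r}")
    sub, rest = args[1], args[2:]
    if sub == "eval":
        # tarantoolctl eval INSTANCE FILE  ->  tt connect INSTANCE -f FILE
        if len(rest) < 2:
            raise ValueError("eval needs an instance and a script file")
        new = ["tt", "connect", rest[0], "-f", rest[1], *rest[2:]]
    else:
        # Most commands (start, stop, status, play, check, connect, ...) keep
        # their name; flags are passed through as is and may need adjusting.
        new = ["tt", RENAMED.get(sub, sub), *rest]
    return shlex.join(new)


if __name__ == "__main__":
    for old in ("tarantoolctl enter app1", "tarantoolctl eval app1 eval.lua"):
        print(old, "->", to_tt(old))
```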

Tarantool Cluster Manager

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager (TCM) is a web-based tool for configuring, managing, and monitoring Tarantool EE clusters. It provides a graphical interface for working with clusters and individual instances, from monitoring their state to interactively executing commands in an instance console.

It is a standalone application included in the Tarantool Enterprise Edition distribution package. It is shipped as a ready-to-run executable for Linux platforms.

TCM works only with Tarantool EE clusters that use centralized configuration in etcd or in a Tarantool-based configuration storage. When you create or edit a cluster configuration in TCM, it publishes the saved configuration to the storage. This ensures that the configuration is stored consistently and reliably. A single TCM installation can connect to multiple Tarantool EE clusters and switch between them in a single click.

To ensure enterprise-grade security, TCM uses its own role-based access control system. You can create users and assign them roles that include the required permissions. For example, a user can be an administrator of a specific cluster or have only the right to read data. LDAP authorization is also supported.

Web interface overview

The Tarantool Cluster Manager web interface is available on the hostname and port defined by the http.host and http.port configuration options. If TLS is enabled, it uses the https protocol, otherwise the protocol is http. When started locally with the default configuration, TCM is available at http://127.0.0.1:8080.

To log in to TCM after bootstrap, use the following credentials:

TCM login page

After logging in with the default password:

  1. Adjust the password policy in accordance with the security requirements that apply in your organization.
  2. Change the admin user’s password on the User settings page.

To log out of TCM, click the user’s name in the header and click Log out.

The TCM web interface consists of three parts:

  1. Navigation pane on the left shows the list of pages available to the user. The navigation pane can be collapsed by clicking the cross icon at its top.
  2. Header at the top provides access to notifications and user settings.
  3. Working area displays the contents of the selected page.
TCM UI parts: navigation pane, header, working area

The Onboarding item of the navigation pane starts the interactive onboarding tutorial. Use it to get familiar with the main TCM features directly in the web interface.

This overview describes most TCM pages. The exact set of pages and controls available to a particular user is determined by the user’s permissions.

Some features, such as data schema editing, are available only in the development mode. You can switch to it in the user settings of the Default Admin user. To learn more about the development mode, see Development mode.

For easier navigation, TCM pages are grouped in the navigation pane by their content. There are the following page groups:

Read on to learn what you can do on the pages of these groups.

The Cluster group includes pages used for interaction with a particular cluster. To switch between clusters, click the Cluster group name and select a connected cluster from the drop-down list.

The cluster Stateboard is the main page for monitoring the cluster state and interacting with its instances.

TCM stateboard

On this page, you can:

  • view and edit the cluster topology
  • group and filter instances based on various criteria
  • view memory statistics and Tarantool versions running on instances
  • navigate to instance pages by clicking instance names in the cluster topology list
  • start and stop instances (in the development mode).

Learn more about using the cluster stateboard in Viewing cluster state.

The instance page opens when you click an instance name on the Stateboard.

TCM instance page

It provides a set of tabs for performing actions on the selected Tarantool instance:

  • Details and State tabs: view instance details as a human-readable table or as a console output of box.cfg, box.info, and other built-in functions
  • SQL and Terminal tabs: run SQL and Lua commands on the instance
  • Logs tab: view instance logs
  • Slabs tab: view slab allocator statistics
  • Users tab: manage Tarantool users and roles on the instance
  • Funcs tab: manage and call stored functions
  • Metrics tab: view instance metrics

The instance page has an Actions menu at the top that allows you to:

  • navigate to the instance explorer
  • edit the instance configuration
  • remove the instance

The Slabs tab in the TCM web UI visualizes how the slab allocator distributes memory within each Tarantool instance.

This tab is useful for:

  • identifying memory fragmentation
  • analyzing slab saturation by object size
  • debugging excessive memory use in real time

This visualization is based on the output of:

box.slab.stats()

This function returns a Lua table with per-class (per object size) memory allocation statistics from the slab allocator. For details, see the box.slab.stats() reference.

Each entry in the output contains:

  • item_size: object size class
  • slab_count: number of slab blocks
  • slab_size: memory size of each slab
  • item_count: number of allocated objects
  • mem_used: bytes used
  • mem_free: bytes free

These values are parsed and rendered as visual elements in the UI.

Each block represents a single slab (a fixed-size memory region). The color indicates how full the slab is:

  • Green: the slab is less than 30% full
  • Red: the slab is full (100% usage)
  • Gradient colors between green and red: intermediate fill levels (e.g., 30%, 50%, 75%)

The color transitions smoothly, providing a quick visual way to understand which slabs are:

  • actively used
  • partially utilized
  • potentially underused or contributing to memory fragmentation

In the example screenshot:

  • Slab #17 (168 KB) — 75% full (dark red)
  • Slab #18 (320 KB) — 53% full (brownish-red)
  • Slab #16 (40 KB) — only 1% used (bright green)
  • Slab #2 (56 B) — 60% used (intermediate gradient)

Each slab block’s size in the visualization reflects the total memory allocated for its item_size class – the more memory allocated, the larger the visual representation.

TCM Slabs tab

The overall fill percentage of a size class (all slabs of the same item_size) is calculated as:

fill % = 100 * (item_count * item_size) / (slab_count * slab_size)

However, each slab is visualized individually, so different fill levels across slabs will result in various colors within the same row.
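The fill formula and the color thresholds can be reproduced outside TCM. The following Python sketch takes entries with the same field names as box.slab.stats() (in practice, you would copy them from an instance console) and classifies them using the thresholds listed above. It is an approximation: TCM colors individual slabs, while this sketch computes one value per size class and reduces the intermediate gradient to a single label. The sample numbers are hypothetical.

```python
def fill_percent(entry: dict) -> float:
    """Overall fill level of one size class, in percent.

    `entry` mirrors one element of box.slab.stats(): item_size, item_count,
    slab_count, slab_size (all sizes in bytes).
    """
    capacity = entry["slab_count"] * entry["slab_size"]
    if capacity == 0:
        return 0.0  # no slabs allocated for this class yet
    return 100.0 * entry["item_count"] * entry["item_size"] / capacity


def color(fill: float) -> str:
    """Map a fill level to the color scheme described above."""
    if fill >= 100.0:
        return "red"
    if fill < 30.0:
        return "green"
    return "gradient"  # some shade between green and red


if __name__ == "__main__":
    # Hypothetical sample values, not real output.
    stats = [
        {"item_size": 64, "item_count": 300, "slab_count": 2, "slab_size": 16384},
        {"item_size": 64, "item_count": 10, "slab_count": 2, "slab_size": 16384},
        {"item_size": 128, "item_count": 128, "slab_count": 1, "slab_size": 16384},
    ]
    for e in stats:
        f = fill_percent(e)
        print(f"item_size={e['item_size']}: {f:.1f}% ({color(f)})")
```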

Slab allocation may vary between instances in the same replica set due to differences in configuration, data loading order, and use of local memory. The reasons are:

  1. Slab allocation may differ because each instance can use its own values for slab_alloc_factor and slab_alloc_granularity. These parameters control how memory is divided into size classes and slabs, affecting memory layout and potential fragmentation.
  2. Differences also appear during replica join or restart. A replica allocates memory for tuples in primary index order, while on the master, allocation follows the order of incoming requests. This results in different slab structures and usually lower fragmentation on replicas after a restart.
  3. Local and temporary spaces exist only on specific instances and are not replicated. They consume memory independently and contribute to differences in slab allocation across nodes.

You can fine-tune the allocator behavior with two configuration options:

  • slab_alloc_factor – multiplier for calculating object size classes. Default value: 1.05
  • slab_alloc_granularity – minimum allocation step (in bytes) for the small allocator. Default value: 8

These parameters affect how memory is allocated per object size class and can help:

  • reduce internal fragmentation
  • optimize memory usage
  • improve slab locality and performance
  • better understand memory consumption via the Slabs tab

Use cases and recommendations:

Reduce memory waste (small, uniform tuples)
  • slab_alloc_factor / slab_alloc_granularity: 1.05 / 4
  • Effect on memory: many size classes, minimal internal memory waste
  • Effect on performance: higher overhead for managing slab pools
  • Visualization in the Slabs tab: many rows, partially filled blocks, gradient from green to red

Optimize performance (mixed-size tuples)
  • slab_alloc_factor / slab_alloc_granularity: 1.3 / 16
  • Effect on memory: fewer size classes, slightly more memory waste
  • Effect on performance: lower overhead, faster memory allocation
  • Visualization in the Slabs tab: fewer rows, larger blocks, contrasting colors (partially filled or full)

Control fragmentation and slab count
  • slab_alloc_factor / slab_alloc_granularity: task-dependent; lower values give more size classes, higher values give fewer
  • Effect on memory: balance between internal memory waste and the number of blocks
  • Effect on performance: balance between overhead and allocator speed
  • Visualization in the Slabs tab: balance between the number of rows and block sizes; colors indicate the fill level
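To get a feel for why lower slab_alloc_factor and slab_alloc_granularity values produce more size classes and less per-object waste, here is a deliberately simplified Python model. It assumes that classes grow geometrically by the factor and are rounded up to the granularity. Tarantool's small allocator computes its classes differently, so the numbers will not match box.slab.stats() on a real instance. The min_size and max_size bounds are arbitrary.

```python
import bisect
import math


def size_classes(factor: float, granularity: int, min_size: int = 16, max_size: int = 4096) -> list:
    """Simplified model of allocator size classes (NOT Tarantool's exact algorithm).

    Each class is at least `granularity` bytes larger than the previous one and
    grows roughly geometrically by `factor`; sizes are rounded up to a multiple
    of `granularity`.
    """
    def round_up(n: int) -> int:
        return -(-n // granularity) * granularity  # ceiling to a multiple

    classes = [round_up(min_size)]
    while classes[-1] < max_size:
        prev = classes[-1]
        classes.append(max(prev + granularity, round_up(math.ceil(prev * factor))))
    return classes


def class_for(size: int, classes: list) -> int:
    """Smallest modeled class that fits an object of `size` bytes."""
    i = bisect.bisect_left(classes, size)
    if i == len(classes):
        raise ValueError(f"{size} bytes exceeds the largest modeled class")
    return classes[i]


if __name__ == "__main__":
    for factor, gran in [(1.05, 4), (1.3, 16)]:
        classes = size_classes(factor, gran)
        # Worst case: an object one byte larger than the previous class.
        worst = max(c - p - 1 for p, c in zip(classes, classes[1:]))
        print(f"factor={factor}, granularity={gran}: {len(classes)} classes, "
              f"worst-case waste per object {worst} bytes")
```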

The cluster Configuration page provides an interactive editor for the cluster configuration. It is connected to the centralized configuration storage that the cluster uses. All changes you make and apply on this page are sent to this centralized storage.

TCM cluster configuration page

Learn more in Configuring clusters.

The Security page provides controls for managing the cluster security settings.

TCM cluster security page

Learn more in Security settings.

The Migrations page provides centralized migration management tools for the selected cluster.

TCM cluster migrations page

Learn more in Performing migrations.

Important

The cluster-wide access to stored data on the Tuples page is supported only for sharded clusters that use the CRUD module. Starting with TCM 1.6.0, the Tuples tab is disabled by default. You can enable the tab in the TCM configuration file (tcm.yaml) using the option below:

feature:
  tuples: True

The Tuples page provides access to data stored in the user spaces of the selected cluster.

TCM tuples page

On this page, you can:

  • view the list of user spaces, their size, and engines
  • view and edit tuples stored in user spaces
  • search for tuples by entering a search condition in the Search bar

TCM supports the following comparison operators:

  • == – equal to
  • > – greater than
  • < – less than
  • >= – greater than or equal to
  • <= – less than or equal to

A search condition has the following structure:

index_name comparator value

where:

  • index_name – the name of the index. This is the left-hand side of the expression.
  • comparator – a comparison operator (>, >=, ==, <=, <). It must be surrounded by spaces on both sides.
  • value – a string, numeric, or boolean value. This is the right-hand side of the expression. String values must be enclosed in double quotes ("").

Note

TCM does not support text search without a search condition. For example, to search for customers named Ivan in a space, use the index name and a comparison operator to specify the expression:

  • correct: typing name == "Ivan" in the Search bar
  • incorrect: typing Ivan in the Search bar

Examples

The search expression below returns tuples with IDs greater than 9990:

id > 9990

In TCM, the result might look as follows:

TCM Tuples page

In the example below, the search returns tuples with the name index equal to Ivan:

name == "Ivan"

The example below specifies multiple search conditions separated by semicolons. The search returns all people with an ID greater than 2 who were born in 1980 or earlier.

id > 2; year <= 1980;
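The search syntax above can be summarized as a tiny parser. This Python sketch is hypothetical and is not TCM's parser. It assumes that boolean values are written as true or false and that ; separates conditions, as in the last example. It accepts the three examples above and rejects bare text such as Ivan.

```python
import re

# index_name, then a comparator surrounded by whitespace, then the value.
_CONDITION = re.compile(r"^(\w+)\s+(==|>=|<=|>|<)\s+(.+)$")


def parse_value(raw: str):
    """Convert the right-hand side to a str, bool, int, or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]                  # double-quoted string
    if raw in ("true", "false"):          # assumed boolean spelling
        return raw == "true"
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    raise ValueError(f"strings must be double-quoted: {raw!r}")


def parse_search(text: str) -> list:
    """Split 'cond; cond; ...' into (index_name, comparator, value) triples.

    Limitation of this sketch: a ';' inside a quoted string is not supported.
    """
    conditions = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue                      # allows a trailing ';'
        match = _CONDITION.match(part)
        if match is None:
            raise ValueError(f"expected 'index_name comparator value': {part!r}")
        index_name, comparator, raw = match.groups()
        conditions.append((index_name, comparator, parse_value(raw.strip())))
    if not conditions:
        raise ValueError("a search condition is required")
    return conditions


if __name__ == "__main__":
    print(parse_search("id > 2; year <= 1980;"))
```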

The TCF tab provides an interface for clusters that run within Tarantool Clusters Federation.

TCM TCF page

The TCF tab can be added via the TCM configuration file:

# tcm.yaml
feature:
    tcf: True

You can also enable it using the environment variable or the feature command-line option. For more details, refer to configuration reference.

On this page, you can:

  • view information about TCF clusters
  • toggle the state of clusters
  • promote or demote clusters
  • change key cluster parameters.

To open the settings, click Actions (the three dots next to the cluster status) and select Settings. Available parameters:

  • dml_users: list of DML users

  • cluster1, cluster2: cluster settings

  • replication_user: replication username

  • replication_password: password associated with the replication user

  • failover_timeout: time period (in seconds) to wait before initiating failover to another cluster. Default value: 20

  • initial_status: initial service state

  • max_suspect_counts: maximum suspect counts for failover. Default value: 3

  • health_check_delay: delay (in seconds) between health checks. Default value: 2

  • enable_system_check: enables or disables system-level health checks. Default value: true

  • status_ttl: time-to-live for service status. Default value: 4

    TCM TCF settings page

Learn more in TCF integration.

TCM provides built-in support for monitoring and inspecting Tarantool Queue Enterprise through the web interface.

The TQE tab can be added via the TCM configuration file:

# tcm.yaml
feature:
    tqe: True

You can also enable it using the environment variable or the feature command-line option. For more details, refer to configuration reference.

After enabling the feature, the TQE page appears in the TCM UI and provides access to Metrics and Queues pages.

Metrics can be viewed in two formats:

  • Chart view
  • Table view
TCM TQE Metrics page

The Queues page displays runtime information for each queue, including:

  • Latency – the time delay (ms) between a message being added to the queue and being processed.
  • Poll max batch – the number of messages retrieved in a single request for processing.
  • Deduplication mode – specifies how duplicate messages are handled. Deduplication is always enabled. Available modes: basic (default), extended, keep_latest, keep_first.
TCM TQE Queues page

The Cluster metrics page provides access to the selected cluster’s metrics.

TCM cluster metrics page

Learn more in Viewing cluster metrics.

The instance Explorer provides access to all spaces of a specific instance, including system spaces.

TCM instance explorer

On this page, you can:

  • view and edit instance spaces, their size, and engines
  • view and edit tuples stored in all spaces of the instance

The Clusters group includes pages used for managing TCM’s cluster connections.

The Clusters page lists Tarantool clusters that are connected to TCM.

TCM clusters page

On this page, you can:

  • connect Tarantool clusters to TCM
  • edit cluster connections
  • disconnect clusters

Learn more in Connecting clusters.

The ACL page displays the TCM access control list.

TCM ACL page

On this page, you can add and delete ACL entries. Learn more in Access control list.

The Users group includes pages related to user access to TCM.

The Users page lists TCM users.

TCM users page

On this page, you can:

  • add, edit, and delete users
  • manage user secrets (passwords and API tokens)
  • revoke user sessions

Learn more in Users.

The Roles page lists TCM user roles.

TCM roles page

On this page, you can add, edit, and delete roles. Learn more in Roles.

The Sessions page lists active sessions of TCM users.

TCM sessions page

On this page, you can view and revoke sessions. Learn more in Sessions.

The Tools group includes service pages used for TCM maintenance and monitoring.

The Audit log tab displays the TCM audit log.

TCM audit log

The TT Column Store tab provides an interface for querying data stored in Tarantool Column Store directly from TCM.

TCM TCS page

Interface elements:

  • Endpoint — TCS HTTP endpoint
  • Query — SQL query editor for entering SELECT statements
  • Query button — executes the query
  • Clear button — clears the editor
  • Result table — displays returned rows and columns

The TCS page can be added via the TCM configuration file:

# tcm.yaml
feature:
    column-store: True

When enabled, the TT Column Store section appears in the left navigation panel. You can also enable it using the environment variable or the feature command-line option. For more details, refer to configuration reference.

TCM provides built-in support for interacting with Tarantool Graph DB through the web interface.

The TT Graph DB tab can be added via the TCM configuration file:

# tcm.yaml
feature:
    ttgraph: True

You can also enable it using the environment variable or the feature command-line option. For more details, refer to configuration reference.

After enabling the feature, the TT Graph DB page appears in the TCM UI and provides the following capabilities:

  • Fill in the Endpoint and Query fields to send a query to a TT Graph DB instance. Click Execute to run the query. The result appears in a table below the input form:

    TCM TT Graph DB page
  • Press Query to view the results as a graph, which lets you analyze relationships between entities:

    TCM TT Graph DB page

    Each row in the result table can be expanded to view detailed information about a specific record:

    TCM TT Graph DB page

    The graph can be opened in full size:

    TCM TT Graph DB page

The TCM metrics tab provides access to the TCM metrics.

TCM metrics page

The Settings group includes service pages where you can configure various TCM features.

On the Password policy page, you can configure the requirements for user passwords, such as minimum length, required symbols, expiration, and other settings. Learn more in Password policy.

TCM password policy

On the Audit settings page, you can configure how TCM records events to its audit log: whether audit log is enabled, which events are recorded, and so on. Learn more in Audit log.

TCM audit settings

On the LDAP page, you can manage TCM LDAP configurations.

TCM LDAP configurations

The user settings dialog opens when you click Settings under the user’s name in the header.

TCM user settings

This dialog includes the following tabs:

Connecting clusters

Tarantool Cluster Manager works with clusters that:

A single TCM installation can have multiple connected clusters. A connection to TCM doesn’t affect the cluster’s functioning. You can connect clusters to TCM and disconnect them on the fly.

There are two scenarios of cluster connection to TCM:

In both cases, you need to deploy Tarantool and start the cluster instances using the tt CLI utility or another suitable way.

There are two ways to add a cluster to TCM:

When connecting a cluster to TCM, you need to provide two sets of connection parameters: for the cluster instances and for the centralized configuration storage.

The cluster configuration can be stored in either an etcd cluster or a separate Tarantool-based storage. In both cases, the following connection parameters are required:

  • A key prefix used to identify the cluster in the configuration storage. The prefix must be unique for each cluster in the storage.
  • URIs of all instances of the configuration storage.
  • The credentials for accessing the configuration storage: an etcd user or a Tarantool user.

Additionally, if SSL or TLS encryption is enabled for the configuration storage, provide the corresponding encryption configuration: keys, certificates, and other parameters. For the complete list of parameters, consult the etcd documentation or the Tarantool guide Securing connections with SSL.

For interaction with the cluster instances, TCM needs the following access parameters:

  • A Tarantool user that exists in the cluster and their password. TCM connects to the cluster on behalf of this user.
  • An SSL configuration if the traffic encryption is enabled on the cluster.

Administrators can add new clusters, edit, and remove existing ones from TCM.

Connected clusters are listed on the Clusters page.

If you already have a cluster and want to connect it to TCM, follow these steps:

  1. Go to Clusters and click Add.
  2. Fill in the general cluster information:
    • Specify an arbitrary name.
    • Optionally, provide a description and select a color to mark this cluster in TCM.
    • Optionally, enter the URLs of additional services for the cluster. For example, a Grafana dashboard that monitors the cluster metrics, or a syslog server for viewing the cluster logs. TCM provides quick access to these URLs on the cluster Stateboard page.
  3. Provide the details of the cluster configuration storage:
    • Storage type: etcd or tarantool.
    • The Prefix specified in the cluster configuration.
    • The URIs of the configuration storage instances.
    • The credentials for accessing the configuration storage.
    • The SSL/TLS parameters if the connection encryption is enabled on the storage.
  4. Provide the credentials for accessing the cluster: a Tarantool user’s name, their password, and SSL parameters in case traffic encryption is enabled on the cluster.

If you don’t have a cluster yet, you can add one in TCM and write its configuration from scratch using the built-in configuration editor.

Important

When adding a new cluster, you need to have a storage for its configuration up and running so that TCM can connect to it. Cluster instances can be deployed later.

To add a new cluster:

  1. Go to Clusters and click Add.
  2. Fill in the general cluster information:
    • Specify an arbitrary name.
    • Optionally, provide a description and select a color to mark this cluster in TCM.
    • Optionally, enter the URLs of additional services for the cluster. For example, a Grafana dashboard that monitors the cluster metrics, or a syslog server for viewing the cluster logs. TCM provides quick access to these URLs on the cluster Stateboard page.
  3. Select the type of the cluster configuration storage: etcd or tarantool.
  4. Define a unique Prefix for identifying this cluster in the configuration storage.
  5. Provide the connection details for the cluster configuration storage:
    • The URIs of configuration storage instances.
    • The credentials for accessing the configuration storage.
    • The SSL/TLS parameters if the connection encryption is enabled on the storage.
  6. Provide the cluster credentials: a username, a password, and SSL parameters in case traffic encryption is enabled on the cluster.

Once you add the cluster:

To edit a connected cluster, go to Clusters and click Edit in the Actions menu of the corresponding table row.

To disconnect a cluster from TCM, go to Clusters and click Disconnect in the Actions menu of the corresponding table row.

Note

Disconnecting a cluster does not affect its functioning. The only thing that changes is that it’s no longer shown in TCM. You can connect this cluster again at any time.

Cluster management

The main goal of Tarantool Cluster Manager is to provide visual tools for managing various aspects of Tarantool clusters from the browser. See the pages of this section to learn how to perform various management operations on Tarantool clusters from TCM.

Viewing cluster state

Tarantool Cluster Manager provides a visual interface for checking various aspects of connected clusters, such as:

Cluster state information is available on the Cluster > Stateboard page.

The cluster topology is displayed on the Stateboard page in one of two forms: a list or a graph.

The list view of the cluster topology is used by default. In this view, each row contains the general information about an instance: its current state, memory usage and limit, and other parameters.

In the list view, TCM additionally displays the Tarantool version information and instance states on circle diagrams. You can click the sectors of these diagrams to filter the instances with the selected versions and states.

To switch to the list view, click the list button on the right of the search bar on the Stateboard page.

The graph view of the cluster topology is shown as a tree-like structure where the leaves are the cluster’s instances. The color of each instance shows its state. You can move the graph vertices to arrange them as you like and zoom in and out, which is helpful for larger clusters.

To switch to the graph view, click the graph button on the right of the search bar on the Stateboard page.

By default, the cluster topology is shown hierarchically as it’s defined in the configuration: instances are grouped by their replica set, and replica sets are grouped by their configuration group.

For better navigation across the cluster, you can adjust the instance grouping. For example, you can group instances by their roles or by custom tags defined in the configuration. A typical use case for such tags is adding geographical markers to instances. This way, you can see whether issues happen in a specific data center or on a specific server.

To change the instance grouping, click Group by in the Actions menu on the Stateboard page. Then add or remove grouping criteria.

You can filter the instances shown on the Stateboard page using the search bar at the top. It has predefined filters that select:

  • instances with errors or warnings
  • leader or read-only instances
  • instances with no issues
  • stale instances

To display all instances, delete the filter applied in the search bar.
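Grouping and filtering amount to a key function and a set of predicates applied to instance records. The Python sketch below uses made-up records and field names. Treating every non-read-only instance as a leader is a simplification, and the stale-instance filter is not modeled.

```python
from collections import defaultdict

# Hypothetical instance records; the field names are illustrative only.
INSTANCES = [
    {"name": "router-001", "replicaset": "routers", "tags": {"dc": "dc1"}, "ro": False, "issues": []},
    {"name": "storage-a-001", "replicaset": "storage-a", "tags": {"dc": "dc1"}, "ro": False, "issues": []},
    {"name": "storage-a-002", "replicaset": "storage-a", "tags": {"dc": "dc2"}, "ro": True, "issues": ["warning"]},
]


def group_by(instances, key):
    """Group instance names by the value that `key(instance)` returns."""
    groups = defaultdict(list)
    for inst in instances:
        groups[key(inst)].append(inst["name"])
    return dict(groups)


# Some of the predefined filters, expressed as predicates.
FILTERS = {
    "with issues": lambda i: bool(i["issues"]),
    "no issues": lambda i: not i["issues"],
    "leaders": lambda i: not i["ro"],   # simplification: writable means leader
    "read-only": lambda i: i["ro"],
}


if __name__ == "__main__":
    print(group_by(INSTANCES, lambda i: i["replicaset"]))
    print(group_by(INSTANCES, lambda i: i["tags"].get("dc", "no dc tag")))
    print([i["name"] for i in INSTANCES if FILTERS["with issues"](i)])
```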

The general information about the state of cluster instances is shown in the list view of the cluster topology. Each row contains the information about the instance status, used and available memory, read-only status, and virtual buckets for sharded clusters.

To view the detailed information about an instance or connect to it, click the corresponding row in the instances list or a vertex of the graph. On the instance page, you can find:

The page also provides Lua and SQL terminals to execute built-in functions and requests on the instance. You can choose between two Lua terminals: the tt interactive console with code completion and highlighting or the default Tarantool console.

When you connect a cluster to TCM, you can specify URLs of external services linked to this cluster. For example, this can be a Grafana server that monitors the cluster metrics.

All the URLs added for a cluster are available for quick access in the Actions menu on the Stateboard page.

Configuring clusters

Tarantool Cluster Manager features a built-in text editor for Tarantool EE cluster configurations.

When you connect a cluster to TCM, it gains access to the cluster’s centralized configuration storage: an etcd or a Tarantool cluster. TCM has both read and write access to the cluster configuration. This enables the configuration editor to work in two ways:

To learn how to write Tarantool cluster configurations, see Configuration.

The configuration editor is available on the Cluster > Configuration page.

To start managing a cluster’s configuration, select this cluster in the Cluster drop-down and go to the Configuration page.

A cluster configuration in TCM can consist of one or multiple YAML files. When there are multiple files, they are all considered parts of a single cluster configuration. You can use this for structuring big cluster configurations. All files that form the configuration of a cluster are listed on the left side of the Cluster configuration page.

To add a cluster configuration file, click the plus icon (+) below the page title.

To open a configuration file in the editor, click its name in the file list.

To delete a cluster configuration file, click the Delete button beside the filename.

To download a cluster configuration file, click the Download button beside the filename.

Warning

All configuration changes are discarded when you leave the Cluster configuration page. Save the configuration if you want to continue editing it later or apply it to start using it on the cluster.

TCM can store configuration drafts. If you want to leave an unfinished configuration and return to it later, save it in TCM. Saving applies to whole cluster configurations: it records the edits of all files, file additions, and file deletions.

To save a cluster configuration draft after editing, click Save in the Cluster configuration page.

All unsaved changes are discarded when you leave the Cluster configuration page.

If you have a saved configuration draft, you can reset the changes for each of its files individually. A reset returns the file to the state that is currently used by the cluster (that is, the state saved in the configuration storage). If you reset a newly added file, it is deleted.

To reset a saved configuration file, click the Reset button beside the filename.

When you finish editing a configuration and it’s ready to use, apply the updated configuration to the cluster. To apply a cluster configuration, click Apply on the Cluster configuration page. This sends the new configuration to the cluster configuration storage, and it comes into effect upon the cluster configuration reload.

Managing cluster users and roles

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides a visual interface for managing Tarantool users and roles on connected clusters.

Note

This page describes the management of Tarantool users and roles on instances of connected clusters. To learn how to manage TCM users, see Access control.

The Tarantool access model defines user access to entities inside a single instance. Thus, to create or alter a cluster-wide user or role, you need to do this in every replica set of the cluster. In replication clusters, changes to the access model are possible only on read-write instances (replica set leaders). Changes made on a leader instance are propagated to all instances of its replica set automatically.

Operations on the cluster access model are possible only if the user that TCM uses to connect to the cluster has the privileges to manage users and roles.

You can also manage Tarantool users and roles from TCM using the Lua API as described in the Tarantool Access control documentation. To do this, connect to instance consoles from the Terminal tab of the instance page.
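As an illustration of the Lua API route, the snippet below creates a user and grants it read access to a single space. It is a minimal sketch, not a TCM feature: the helper name `ensure_reader`, the user name, the password, and the `customers` space are hypothetical. `box.schema.user.create()` and `box.schema.user.grant()` are standard Tarantool functions, and `if_not_exists = true` makes re-running the snippet harmless.

```lua
-- Hypothetical helper: creates a user (if it does not exist yet) and grants
-- it read access to one space. Defined as a global function so that it stays
-- callable across separate console inputs.
function ensure_reader(user_name, password, space_name)
    -- Access model changes are possible only on read-write instances.
    if box.info.ro then
        error('instance is read-only: run this on the replica set leader')
    end
    box.schema.user.create(user_name, { password = password, if_not_exists = true })
    box.schema.user.grant(user_name, 'read', 'space', space_name,
        { if_not_exists = true })
end

-- Usage on each replica set leader (hypothetical values):
-- ensure_reader('analyst', 'Str0ng-passw0rd', 'customers')
```

Because the access model is defined per instance, the same call has to be repeated on the leader of every replica set.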

The tools for managing cluster users are located on the Users tab of the instance page.

Important

To keep the access model consistent across the cluster, repeat all user management operations on all read-write instances of the cluster.

To create a user on a cluster:

  1. Go to Stateboard.
  2. Find a replica set leader in the instances list and click it to open the instance page.
  3. Go to the Users tab and click Add user.

To edit or delete a user, click the Edit or Delete button next to the username in the Users table.

To edit a user’s privileges:

  1. Click the lock icon next to the username in the Users table.
  2. In the privileges dialog:
    • Click Add to grant privileges
    • Click Revoke (the trash bin icon) to revoke a privilege

The tools for managing cluster roles are located on the Users tab of the instance page.

Important

To keep the access model consistent across the cluster, repeat all role management operations on all read-write instances of the cluster.

To create a role on a cluster:

  1. Go to Stateboard.
  2. Find a replica set leader in the instances list and click it to open the instance page.
  3. Go to the Users tab and click Add role.

To delete a role, click the Delete button next to the role name in the Roles table.

To edit a role’s privileges:

  1. Click the lock icon next to the role name in the Roles table.
  2. In the privileges dialog:
    • Click Add to grant privileges
    • Click Revoke (the trash bin icon) to revoke a privilege
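The equivalent role operations through the Lua API could look like the sketch below. The helper name `ensure_role` and the role, space, and user names are hypothetical; `box.schema.role.create()`, `box.schema.role.grant()`, and the `box.schema.user.grant(user, role)` form for granting a role to a user are standard Tarantool calls.

```lua
-- Hypothetical helper: creates a role with read and write access to one
-- space and grants this role to an existing user. Defined as a global
-- function so that it stays callable across separate console inputs.
function ensure_role(role_name, space_name, user_name)
    if box.info.ro then
        error('instance is read-only: run this on the replica set leader')
    end
    box.schema.role.create(role_name, { if_not_exists = true })
    -- Privileges go to the role; users receive them through the role.
    box.schema.role.grant(role_name, 'read,write', 'space', space_name,
        { if_not_exists = true })
    -- Grant the role itself to the user.
    box.schema.user.grant(user_name, role_name, nil, nil, { if_not_exists = true })
end

-- Usage on each replica set leader (hypothetical values):
-- ensure_role('customers_editor', 'customers', 'analyst')
```

Granting privileges to a role rather than to each user keeps the per-instance changes smaller, which matters because they have to be repeated in every replica set.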

Security settings

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager includes a web interface for managing security settings of connected clusters. It is available on the Cluster > Security page. On this page, you can manage the following security features in the cluster:

Viewing cluster metrics

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

In Tarantool Cluster Manager, you can view metrics of connected clusters in real time on the Cluster > Cluster metrics page. The list of metrics that Tarantool exposes is provided in the Metrics reference.

Metrics are displayed one by one. To view a metric, select it in the drop-down list at the top of the page. Then, choose a way to visualize it:

Once you select a metric, TCM starts visualizing its current values, updating them once per second. To pause the visualization, click the button to the left of the metrics selector. To stop the visualization, clear the metric selection.

To view metrics of a specific instance, find this instance on the Stateboard, click its name, and go to the Metrics tab of the instance page.

To allow collecting cluster metrics with external systems, such as Prometheus, TCM provides HTTP endpoints at /api/metrics/<clusterId>.

Note

Cluster IDs are shown in the cluster selection dialog that opens when you click Cluster at the top of the left navigation pane.

To access such an endpoint, a request must be authorized with an API token that has the cluster.metrics permission on the target cluster.

Below is an example of a Prometheus scrape configuration that collects metrics of a Tarantool cluster from TCM:

- job_name: "tarantool"
  static_configs:
    - targets: ["127.0.0.1:8080"]
  metrics_path: "/api/metrics/00000000-0000-0000-0000-000000000000"
  bearer_token: QgMPZ22JZ3uw7n0QTbqYGAQDmNDs1JnTkhaC1OlQzWM3utmpV78b23GG97zp8YE3
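For a client other than Prometheus, the same scrape comes down to an HTTP GET request to /api/metrics/<clusterId> with an Authorization: Bearer header. The helper below is a hypothetical sketch that only builds the URL and headers; the base URL, cluster ID, and token are placeholders.

```lua
-- Hypothetical helper: builds the URL and headers for reading a cluster's
-- metrics from TCM with an API token (Bearer scheme).
local function metrics_request(base_url, cluster_id, token)
    -- Strip trailing slashes so the endpoint path is joined cleanly.
    local url = base_url:gsub('/+$', '') .. '/api/metrics/' .. cluster_id
    local headers = { Authorization = 'Bearer ' .. token }
    return url, headers
end

local url, headers = metrics_request('http://127.0.0.1:8080/',
    '00000000-0000-0000-0000-000000000000', '<api-token>')
```

On a Tarantool instance, the request could then be sent with the built-in http.client module, for example `require('http.client').new():get(url, { headers = headers })`.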

Using supervised failover

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

For Tarantool clusters that use supervised failover, Tarantool Cluster Manager offers tools for interaction with external failover coordinators from its web interface.

The tools for using supervised failover are located on the Failovers page available from the Actions menu on the cluster stateboard.

Note

TCM can interact with failover coordinators that are already running. There is no way to start or stop coordinators from TCM.

To view failover coordinators running on the cluster, go to the Failovers tab. On this tab, you can see the information about all Tarantool instances that the cluster uses as failover coordinators. The information includes:

To send a failover command to a coordinator, go to the Commands tab and click Add. Then, provide the command description in the YAML format. It can include the following fields:

Example:

command: switch
new_master: instance-002
timeout: 30

After entering the command, click Save to send the command for execution.

Tarantool assigns an id to the command and waits for the active coordinator to process the command.

All failover commands executed on the cluster are shown on the Commands tab with their ids and statuses. A command can have the following statuses:

To see the command execution details, click this command in the list.

Performing migrations

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides a web interface for managing and performing migrations in connected clusters. To learn more about migrations in Tarantool, see Migrations.

Migrations are named Lua files with code that alters the cluster data schema, for example, creates a space, changes its format, or adds indexes. In TCM, there is a dedicated page where you can organize migrations, edit their code, and apply them to the cluster.

Important

Migrations created in TCM versions 1.5.3 to 1.7.3 are not compatible with the tt CLI utility, so TCM reverted to the old behavior of handling migrations. Such migrations can't be applied in TCM 1.8.0: apply them using the corresponding earlier TCM version before upgrading to TCM 1.8.0.

The tools for managing migrations from TCM are located on the Cluster > Migrations page.

To create a migration:

  1. Click Add (the + icon) on the Migrations page.

  2. Enter the migration name.

    Important

    When naming migrations, remember that they are applied in the lexicographical order. Use ordered numbers as filename prefixes to define the migrations order. For example, 001_create_table, 002_add_column, 003_create_index.

  3. Write the migration code in the editor window. Use the box.schema module reference to learn how to work with Tarantool data schema.
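The effect of lexicographical ordering is easiest to see with unpadded prefixes. The snippet below is an illustration only, not TCM code: it sorts names with Lua's default string comparison, which is assumed here to match the order TCM uses.

```lua
-- Illustration only: sorts migration names in lexicographical order.
local function sorted_names(names)
    local copy = {}
    for i, name in ipairs(names) do
        copy[i] = name
    end
    table.sort(copy) -- default '<' comparison on strings
    return copy
end

-- Without zero padding, '10_...' sorts before '1_...' and '2_...':
local unpadded = sorted_names({ '2_add_column', '10_create_index', '1_create_table' })
-- unpadded = { '10_create_index', '1_create_table', '2_add_column' }

-- Fixed-width prefixes keep the intended order:
local padded = sorted_names({ '002_add_column', '010_create_index', '001_create_table' })
-- padded = { '001_create_table', '002_add_column', '010_create_index' }
```

This is why fixed-width numeric prefixes such as 001_, 002_ are recommended: they make the string order coincide with the numeric order.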

Once you complete writing the migration, save it by clicking Save. This saves the migration that is currently opened in the editor.

After you prepare a set of migrations, apply it to the cluster. To apply all saved migrations to the cluster at once, click Apply.

Important

Applying all saved migrations at once, in lexicographical order, is the only way to apply migrations in TCM. There is no way to select one or several migrations to apply. Migrations that are already applied are skipped. To learn how to check a migration status, see Checking migrations status.

Migrations that were created but not saved yet are not applied when you click Apply.
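The apply rule can be modeled as below. This is a hypothetical sketch of the documented behavior, not TCM's implementation: `saved` is the list of saved migration names and `applied` is a set of the names already applied to the cluster.

```lua
-- Hypothetical model of Apply: every saved migration that is not applied
-- yet runs, in lexicographical order; applied ones are skipped.
local function pending_migrations(saved, applied)
    local pending = {}
    for _, name in ipairs(saved) do
        if not applied[name] then
            table.insert(pending, name)
        end
    end
    table.sort(pending)
    return pending
end

local to_run = pending_migrations(
    { '003_create_index', '001_create_table', '002_add_column' },
    { ['001_create_table'] = true })
-- to_run = { '002_add_column', '003_create_index' }
```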

To check the migration results on the cluster, use the Migrated widget on the cluster stateboard. It reflects the general result of the last applied migration set:

Hovering a cursor over the widget shows the number of instances on which the currently saved migration set is successfully applied.

You can also check the status of each particular migration on the Migrations page. The migrations that are successfully applied are marked with green check marks. Failed migrations are marked with exclamation mark icons (!). Hover the cursor over the icon to see the information about the error. To reapply a failed migration, click Force apply in the pop-up with the error information.

The following migration code creates a formatted space with two indexes in a sharded cluster:

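local function apply_scenario()
    -- if_not_exists lets the scenario run again without failing
    -- if the space or an index already exists.
    local space = box.schema.space.create('customers', { if_not_exists = true })

    space:format {
        { name = 'id',        type = 'number' },
        -- Sharded spaces require a bucket_id field of the unsigned type.
        { name = 'bucket_id', type = 'unsigned' },
        { name = 'name',      type = 'string' },
    }

    space:create_index('primary', { parts = { 'id' }, if_not_exists = true })
    -- A non-unique index named bucket_id is also required for sharding.
    space:create_index('bucket_id', {
        parts = { 'bucket_id' },
        unique = false,
        if_not_exists = true,
    })
end

return {
    apply = {
        scenario = apply_scenario,
    },
}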

TCF integration

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides a web interface for clusters that run within Tarantool Clusters Federation. It is available on the Cluster > TCF page. If a connected cluster is configured to run in a TCF installation, this page shows information about both clusters in this installation: their IDs, names, and statuses. To switch cluster states in TCF, click Toggle on the TCF page.

To learn more about Tarantool Clusters Federation, see its documentation.

Note

For individual clusters, the TCF page is empty.

Accessing cluster data

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides access to data stored in connected clusters through its web interface. You can view, add, edit, and delete tuples from spaces.

Note

A TCM user’s access to specific clusters and spaces is determined by their cluster permissions and access control list.

Data access is implemented in TCM on a per-instance basis: you can access data stored on one cluster instance at a time. For sharded clusters that use the CRUD module, it’s also possible to access data throughout the whole cluster.

There are the following ways to access data stored on a cluster instance from TCM:

Important

Data modification is possible only on instances in the read-write mode (replica set leaders). Changes are then replicated to read-only replicas according to the cluster topology.

The instance explorer provides access to all spaces that exist on an instance, including both system and user spaces.

To open the instance explorer:

  1. Go to Stateboard.
  2. Click the instance row in the instances list or its graph vertex in the graph view.
  3. Click Explorer in the Actions menu of the instance details page.

To view tuples of a space, click its row in the spaces list.

To add a new tuple, click + on the space page and provide tuple field values in the Lua format, for example, [ 1, 1000, true, "test"].

To edit a tuple, click it in the table and then click Edit.

To delete a tuple, select it in the table and click Delete (the trash bin button).

In the development mode, you can also create, edit, truncate, and delete spaces in the instance explorer. To create a space, click Add and follow the wizard steps. To edit, truncate, or remove a space, click the corresponding button in the Actions menu of the space row in the table.

TCM features an SQL terminal that you can use to access stored data. It is located on the SQL tab of the instance details page. In the SQL terminal, you can execute any supported SQL expressions on the selected instance.

For SELECT queries, you can also download the query result set in the CSV format.

To learn more about using SQL in Tarantool, see the SQL tutorial.

TCM provides interactive access to instances’ consoles on the Terminal tab of the instance details page. You can choose between the tt console (TT Connect tab) and Tarantool interactive console (Direct tab).

In these consoles, you can access the stored data using the Tarantool Lua API.
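For example, the console helpers below read and update tuples of the customers space created by the migration example earlier on this page. The helper names are hypothetical; `box.space.<name>:get()` and `:update()` are standard Tarantool calls. The functions are global so that they remain callable across separate console inputs.

```lua
-- Hypothetical helpers for the customers space (fields: id, bucket_id, name).
function find_customer(id)
    -- get() looks up a tuple by the primary key and returns nil if absent.
    return box.space.customers:get(id)
end

function rename_customer(id, new_name)
    -- Data modification works only on read-write instances.
    if box.info.ro then
        return nil, 'instance is read-only: connect to the replica set leader'
    end
    -- Field 3 is 'name'; update() returns the new tuple, or nil if absent.
    return box.space.customers:update(id, { { '=', 3, new_name } })
end

-- Usage in the console (hypothetical values):
-- find_customer(1)
-- rename_customer(1, 'Alice Smith')
```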

For sharded clusters that use the CRUD module, it’s possible to access stored data throughout the cluster on the Cluster > Tuples page. This page displays only user spaces.

To view all tuples of a space in a sharded cluster, click the space row in the list.

To add a new tuple, click + on the space page and provide tuple field values in the Lua format, for example [ 1, 1000, true, "test"]. When you add a tuple in a sharded cluster, it is distributed to a replica set based on the sharding key (the bucket_id field) value.

To edit a tuple, click it in the table and then click Edit.

To delete a tuple, select it in the table and click Delete (the trash bin button).

To create a space in a sharded cluster, create it on all read-write cluster instances on their Instance explorer pages.

Important

Sharded spaces must include the bucket_id field of the unsigned type and a non-unique index by this field with the same name.

To edit, truncate, or delete spaces in a sharded cluster, perform the corresponding action on all read-write cluster instances.
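Because sharded spaces have to be created on every read-write instance by hand, it can help to verify each copy against the requirements from the Important note above. The checker below is a hypothetical sketch; it relies only on `space:format()` and the `space.index.<name>.unique` field of Tarantool space objects.

```lua
-- Hypothetical checker: returns true if the space has an unsigned bucket_id
-- field and a non-unique index named bucket_id, or false and a reason.
local function check_sharded_space(space)
    local has_field = false
    for _, field in ipairs(space:format()) do
        if field.name == 'bucket_id' then
            if field.type ~= 'unsigned' then
                return false, 'bucket_id must be of the unsigned type'
            end
            has_field = true
        end
    end
    if not has_field then
        return false, 'bucket_id field is missing'
    end
    local index = space.index.bucket_id
    if index == nil then
        return false, 'index bucket_id is missing'
    end
    if index.unique then
        return false, 'index bucket_id must be non-unique'
    end
    return true
end

-- Usage on each read-write instance (hypothetical space name):
-- check_sharded_space(box.space.customers)
```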

Access control

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides means for managing user and client application access to its own functions and connected clusters:

Role-based access control

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager features a role-based access control system. It enables flexible management of access to TCM functions, connected clusters, and stored data. The TCM access system uses three main entities: permissions, roles, and users (or user accounts). They work as follows:

Note

TCM users, roles, and permissions are not to be confused with similar subjects of the Tarantool access control system. To access Tarantool instances directly, Tarantool users with corresponding roles are required.

Permissions define access to specific actions that users can do in TCM. For example, there are permissions to view connected clusters or to manage users.

There are two types of permissions in TCM: administrative and cluster permissions.

Permissions are predefined in TCM; there is no way to change, add, or delete them. The complete lists of administrative and cluster permissions in TCM are provided in the Permissions reference.

Roles are groups of administrative permissions that are assigned to users together.

The assigned roles define pages that users see in TCM and actions available on these pages.

Note

Roles don’t include cluster permissions. Access to connected clusters is configured for each user individually.

TCM comes with default roles that cover three common usage scenarios:

  • Super Admin Role is a default role with all available administrative permissions. Additionally, the users with this role automatically gain all cluster permissions to all clusters.
  • Cluster Admin Role is a default role for cluster administration. It includes administrative permissions for cluster management.
  • Default User Role is a default role for working with clusters. It includes basic administrative read permissions that are required to log in to TCM and navigate to a cluster.

Administrators can create new roles, edit, and delete existing ones.

Roles are listed on the Roles page.

To create a new role, click Add, enter the role name, and select the permissions to include in the role.

To edit an existing role, click Edit in the Actions menu of the corresponding table row.

To delete a role, click Delete in the Actions menu of the corresponding table row.

Note

You can delete a role only if there are no users with this role.

TCM users gain access to objects and actions through assigned roles and cluster permissions.

A user can have any number of roles or none of them. Users without roles have access only to clusters that are assigned to them.

TCM uses password authentication for users. For information on password management, see the Passwords section below.

There is one default user Default Admin. It has all the available permissions, both administrative and cluster ones. When new clusters are added in TCM, Default Admin automatically receives all cluster permissions for them as well.

Administrators can create new users, edit, and delete existing ones.

The tools for managing users are located on the Users page.

To create a user:

  1. Click Add.
  2. Fill in the user information: username, full name, and description.
  3. Generate or type in a password.
  4. Select roles to assign to the user.
  5. Add clusters to give the user access to, and select cluster permissions for each of them.

To edit a user, click Edit in the Actions menu of the corresponding table row.

To delete a user, click Delete in the Actions menu of the corresponding table row.

TCM uses the general term secret for user authentication keys. A secret is any pair of a public and a private key that can be used for authentication. A password combined with a username is a secret type used for TCM user authentication. In this case, the public key is a username, and the private key is a password.

Users receive their first passwords during their account creation.

All passwords are governed by the password policy. It can be flexibly configured to follow the security requirements of your organization.

To change your own password, click your name in the top-right corner and go to Settings > Change password.

Administrators can manage a user’s password on this user’s Secrets page. To open it, click Secrets in the Actions menu of the corresponding Users table row.

To change a user’s password, click Edit in the Actions menu of the corresponding Secrets table row and enter the new password in the New secret key field.

Passwords expire automatically after the expiration period defined in the password policy. When a user logs in to TCM with an expired password, the only action available to them is a password change. All other TCM functions and objects are unavailable until the new password is set.

Administrators can also set users’ passwords to expired manually. To set a user’s password to expired, click Expire in the Actions menu of the corresponding Secrets table row.

Important

Password expiration can’t be reverted.

To forbid users’ access to TCM, administrators can temporarily block their passwords. A blocked password can’t be used to log in to TCM until it’s unblocked manually or the blocking period expires.

To block a user’s password, click Block in the Actions menu of the corresponding Secrets table row. Then provide a blocking reason and enter the blocking period.

To unblock a blocked password, click Unblock in the Actions menu of the corresponding Secrets table row.

Password policy helps improve security and comply with security requirements that can apply to your organization.

You can edit the TCM password policy on the Password policy page. There are the following password policy settings:

  • Minimal password length.
  • Do not use last N passwords.
  • Password expiration in days. Users’ passwords expire this number of days after they are set. Users with expired passwords lose access to any objects and functions except password change until they set a new password.
  • Password expiration warning in days. After this number of days, the user sees a warning that their password expires soon.
  • Block after N login attempts. Temporarily block users if they enter their username or password incorrectly this number of times consecutively.
  • User lockout time in seconds. The time interval for which users can’t log in after spending all failed login attempts.
  • Password must include. Characters and symbols that must be present in passwords:
    • Lowercase characters (a-z)
    • Uppercase characters (A-Z)
    • Digits (0-9)
    • Symbols (such as !@#$%^&*()_+№»“:,.;=][{}`?>/.)
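To make the character-class settings concrete, the function below checks a candidate password against a subset of them. It is an illustration only, not TCM's implementation: the `policy` table and its keys are hypothetical, and Lua's `%p` class covers only ASCII punctuation, so non-ASCII symbols such as № are not recognized.

```lua
-- Hypothetical policy table mirroring a subset of the settings above.
local policy = {
    min_length = 12,
    require_lower = true,
    require_upper = true,
    require_digits = true,
    require_symbols = true,
}

-- Returns true or false plus a list of policy violations.
local function check_password(password, p)
    local problems = {}
    -- #password counts bytes, which equals characters only for ASCII input.
    if #password < p.min_length then
        table.insert(problems, 'shorter than ' .. p.min_length .. ' characters')
    end
    if p.require_lower and not password:find('%l') then
        table.insert(problems, 'no lowercase characters')
    end
    if p.require_upper and not password:find('%u') then
        table.insert(problems, 'no uppercase characters')
    end
    if p.require_digits and not password:find('%d') then
        table.insert(problems, 'no digits')
    end
    if p.require_symbols and not password:find('%p') then
        table.insert(problems, 'no symbols')
    end
    return #problems == 0, problems
end
```

Settings that depend on history or time (last N passwords, expiration, lockout) need stored state and are left out of this sketch.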

The following administrative permissions are available in TCM:

Permission                  Description
admin.clusters.read         View connected clusters’ details
admin.clusters.write        Edit cluster details and add new clusters
admin.users.read            View users’ details
admin.users.write           Edit user details and add new users
admin.roles.read            View roles’ details
admin.roles.write           Edit roles and add new roles
admin.addons.read           View add-ons
admin.addons.write          Edit add-on flags
admin.addons.upload         Upload new add-ons
admin.auditlog.read         View audit log configuration and read audit log in TCM
admin.auditlog.write        Edit audit log configuration
admin.sessions.read         View users’ sessions
admin.sessions.write        Revoke users’ sessions
admin.ldap.read             View LDAP configurations
admin.ldap.write            Manage LDAP configurations
admin.passwordpolicy.read   View password policy
admin.passwordpolicy.write  Manage password policy
admin.secrets.read          View information about users’ secrets
admin.secrets.write         Manage users’ secrets: add, edit, expire, block, delete
user.password.change        User’s permission to change their own password
user.api-token.read         User’s permission to read information about their own API tokens
user.api-token.write        User’s permission to modify their own API tokens
admin.metrics               Read TCM metrics
admin.acl.read              View the access control list (ACL)
admin.acl.write             Add and delete ACL entries

The following cluster permissions are available in TCM:

Permission                Description
cluster.config.read       View cluster configuration
cluster.config.write      Manage cluster configuration
cluster.stateboard.read   View cluster stateboard
cluster.func.read         View cluster’s stored functions
cluster.func.write        Edit cluster’s stored functions
cluster.func.call         Execute stored functions on cluster instances
cluster.space.read        Read cluster data schema
cluster.space.write       Modify cluster data schema
cluster.space.data.read   Read stored data from cluster
cluster.space.data.write  Edit stored data on cluster
cluster.failover.read     Read cluster failover information
cluster.failover.write    Write cluster failover commands
cluster.terminal          Connect to cluster instances with tt terminal from TCM
cluster.sql               Execute SQL queries
cluster.metrics           View cluster metrics

LDAP authentication

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

In addition to its internal role-based access control model, Tarantool Cluster Manager can use an external LDAP (Lightweight Directory Access Protocol) directory server for user authentication and authorization.

When LDAP authentication is enabled, TCM uses a connected LDAP directory server to authenticate users who submit the login form. TCM constructs requests to the servers according to the configuration parameters described on this page. Permissions of LDAP users in TCM are defined by LDAP group mapping.

Both LDAP and secure LDAPS (LDAP over TLS) protocols are supported.

LDAP authentication can be enabled using either of two configuration methods:

To allow LDAP user authentication in TCM, enable the ldap authentication method in the security.auth configuration option before startup:

  • In the YAML TCM configuration:

    security:
      auth:
        - ldap
    
  • In the command line:

    $ tcm --security.auth="ldap"
    

Note

If both authentication methods – LDAP and local – are enabled, TCM tries them for each login attempt in the order they are specified in the configuration.

To enable LDAP authentication using the TCM web interface:

  1. Click the user icon in the top-right corner of the screen.
  2. Select Settings from the dropdown menu.
  3. Navigate to the Authentication methods tab.
  4. Check the box next to LDAP.
  5. Save the changes.

To enable LDAP user access to TCM, create an LDAP configuration that connects TCM to the LDAP server that stores the users. An LDAP configuration defines how TCM connects to the server and queries user data. To create an LDAP configuration, go to the LDAP page in the Settings group and click Add.

To edit an LDAP configuration, click Edit in the Actions menu of the corresponding row.

To delete an LDAP configuration, click Delete in the Actions menu of the corresponding row.

Define the general configuration settings:

  • Enabled. Defines if the configuration is used. Turn the toggle off to stop using the configuration.

    Note

    If there are several enabled LDAP configurations, TCM attempts to use them for user authentication in the order they are created.

  • Automatically add non-existent users. By default, TCM automatically saves LDAP user information to its backend store upon their first login. Turn the toggle off if you don’t want to save users from this LDAP server.

Enter the LDAP server connection parameters:

  • Endpoints. URLs of the LDAP server. Example: 127.0.0.1:5056.
  • Request timeout. The timeout for TCM requests to the LDAP server, in seconds.
  • Enabled TLS. If the server uses LDAPS, turn this toggle on and specify TLS connection parameters, such as a certificate and a key file.

To define how TCM queries the LDAP server for user authentication and authorization, fill in the fields of the Queries step:

  • Query user and Query password. Credentials of the LDAP user on behalf of which all LDAP queries are executed: a distinguished name (DN) and a password. Example DN:

    cn=admin,cn=users,dc=tarantool,dc=io
    
  • Base DN. The DN of a directory that serves as a root for making all LDAP requests. Example: dc=tarantool,dc=io.

  • Username regex. A regular expression that defines a username template for this LDAP configuration. When a user enters their username on the login page, TCM matches it against username regular expressions of all enabled LDAP configurations and selects the one to use for this user authentication.

    Example: a regex to match employee email addresses within the specified domain.

    ^([\w\-\.]+)@tarantool\.io$
    
  • (Optional) Template DN. A template for building a DN to send in an authentication bind request. Use the numbers in curly braces as placeholders to replace with username regex parts: {0}, {1}, and so on.

    Example:

    cn={0},cn=users,dc=tarantool,dc=io
    

    When used with the Username regex shown above, it substitutes {0} with the username part of the email address (before @) entered into the login form. For example, the username user1@tarantool.io forms the following DN for the bind request:

    cn=user1,cn=users,dc=tarantool,dc=io
    
  • (Optional) Template query. A template for querying the LDAP server for the user’s DN. It is used if Template DN is not provided.

  • Group query template. A template for querying groups to which a user belongs for authorization purposes. Learn more in LDAP user permissions. Example:

    (&(objectCategory=person)(objectClass=user)(cn={0}))
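The way Username regex and Template DN work together can be sketched as follows. This is an illustration only: TCM uses regular expressions, while the sketch uses a Lua pattern equivalent to the example regex above (with the dot in tarantool.io matched literally), and the function name `bind_dn` is hypothetical. A production implementation would also need to escape DN special characters in the captured values.

```lua
-- Lua pattern equivalent of ^([\w\-\.]+)@tarantool\.io$ (\w = [%w_]).
local username_pattern = '^([%w_%-%.]+)@tarantool%.io$'
local template_dn = 'cn={0},cn=users,dc=tarantool,dc=io'

-- Returns the bind DN for a login name, or nil if the name does not match.
local function bind_dn(login)
    local captures = { login:match(username_pattern) }
    if #captures == 0 then
        return nil
    end
    -- Replace {0}, {1}, ... with the first, second, ... capture.
    return (template_dn:gsub('{(%d+)}', function(n)
        return captures[tonumber(n) + 1]
    end))
end

local dn = bind_dn('user1@tarantool.io')
-- dn = 'cn=user1,cn=users,dc=tarantool,dc=io'
```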
    

Permissions of LDAP users in TCM are defined by the groups to which they belong. You can map TCM administrative and cluster permissions to LDAP groups on the Groups step of the configuration creation.

To assign permissions to an LDAP group, click Add group. In the dialog that opens, enter the group name, for example, CN=Admins,CN=Builtin,DC=tarantool,DC=io. Then, select the administrative permissions to grant to this group in the Permissions list.

To grant cluster permissions, click Add cluster. Select a cluster and the cluster permissions to grant to the group. Save the group.

Each user has permissions of all LDAP groups to which they belong.
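In other words, a user's effective administrative permissions are the union of the permissions mapped to each of their groups. The sketch below illustrates this rule only; it is not TCM's implementation, the second group name and all mappings are hypothetical, and cluster permissions are left out for brevity.

```lua
-- Hypothetical group-to-permission mapping.
local group_permissions = {
    ['CN=Admins,CN=Builtin,DC=tarantool,DC=io'] = { 'admin.users.read', 'admin.users.write' },
    ['CN=Operators,DC=tarantool,DC=io'] = { 'admin.clusters.read', 'admin.users.read' },
}

-- Returns the sorted union of permissions of all given groups.
local function effective_permissions(user_groups)
    local seen, result = {}, {}
    for _, group in ipairs(user_groups) do
        for _, permission in ipairs(group_permissions[group] or {}) do
            if not seen[permission] then
                seen[permission] = true
                table.insert(result, permission)
            end
        end
    end
    table.sort(result)
    return result
end
```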

To stop using an LDAP configuration, open its Edit page and turn off the Enabled toggle.

Access control list

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager access control list (ACL) determines user access to particular data and functions stored in clusters. You can use it to allow or deny access to specific stored objects one by one.

Each ACL entry specifies privileges that a TCM user has on a particular space or a function. There are three access privileges that can be granted in the ACL: read, write, and execute (for stored functions only). The privileges work as follows:

Important

User access to space data and stored functions is primarily defined by the cluster permissions cluster.space.data.* and cluster.func.*. ACL only increases the access control granularity to particular objects. Make sure that users have these permissions before enabling ACL for them.

To granularly manage a user’s access to particular objects in a cluster, enable the use of ACL in the user profile:

  1. Go to Users and click Edit in the Actions menu of the corresponding table row.
  2. In the user’s Clusters list, add a cluster on which you want to use ACL or click the pencil icon if the cluster is already on the list.
  3. Select the Use Access Control List (ACL) checkbox and save changes.
  4. Repeat the two previous steps for each cluster on which you want to use ACL for this user.
  5. Click Update to save the user account.

If the user doesn’t exist yet, you can do the same when creating it.

Important

When ACL use is enabled for a user, this user loses access to all spaces and functions of the selected cluster except the ones explicitly specified in the ACL.
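The interplay between cluster permissions and the ACL for reading space data can be summarized as a small decision function. This is a hypothetical model of the rules described above, not TCM's implementation; the `user` table layout and all names are assumptions made for illustration.

```lua
-- Hypothetical model: user.clusters[cluster] = { permissions = {...}, use_acl = bool },
-- user.acl[cluster][space] = { read = bool, write = bool }.
local function can_read_space(user, cluster, space)
    local access = user.clusters[cluster]
    -- Without the cluster permission, the ACL cannot grant access.
    if access == nil or not access.permissions['cluster.space.data.read'] then
        return false
    end
    -- Without ACL, the cluster permission applies to all spaces.
    if not access.use_acl then
        return true
    end
    -- With ACL, only spaces listed with the read privilege are readable.
    local entries = user.acl[cluster]
    local entry = entries and entries[space]
    return entry ~= nil and entry.read == true
end
```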

The tools for managing ACL are located on the ACL page.

To add an ACL entry:

  1. Click Add.
  2. Select the user to whom you want to grant access.
  3. Select a cluster that stores the target object: a space or a function.
  4. Select the target object type and enter its name.
  5. Select the privileges you want to grant.

To delete an ACL entry, click Delete in the Actions menu of the corresponding table row.

API tokens

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager uses the Bearer HTTP authentication scheme with API tokens to authenticate external applications’ requests to TCM. For example, these can be Prometheus jobs that retrieve metrics of connected Tarantool clusters.

The API tokens functionality is disabled by default. To enable it, set the feature.api-token configuration option to true.

feature:
  api-token: true

Each TCM API token belongs to the user that created it and has the same access permissions. Thus, if a user has the permission to view a cluster’s metrics in TCM, this user’s API tokens can be used to read this cluster’s metrics with Prometheus.

API tokens have expiration dates that are set during the token creation and cannot be changed.

Note

Each user, including Default Admin and other administrators, can create only their own tokens. There is no way to create a token for another user.

To create a TCM API token:

  1. Open the user settings by clicking the user’s name in the top-right corner.
  2. Go to the API tokens tab and click Add.
  3. Specify the token expiration date and an optional description and click Add.

The created token is shown in a dialog.

Important

An API token is shown only once after its creation. There is no way to view it again after you close the dialog. Make sure to copy the token to a safe place.

To delete an API token, click Delete in the Actions menu of the corresponding API tokens table row.

Administrators can also view information about users' API tokens and delete them on the Secrets page. To open a user's secrets, click Secrets in the Actions menu of the corresponding Users table row.

Sessions

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager administrators can view and revoke user sessions in the web interface. All active sessions are listed on the Sessions page. To revoke a session, click Revoke in the Actions menu of the corresponding table row.

To revoke all sessions of a TCM user, go to Users and click Revoke all sessions in the Actions menu of the corresponding table row.

Audit log

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides audit logging for tracking user activity and security-related events, such as authentication attempts, user account management, and cluster data access.

The complete list of TCM audit events is provided in Event types.

Note

TCM audit log records only events that happen in TCM itself. For information about Tarantool audit logging, see Audit module.

Audit logging is disabled in TCM by default. To start recording events, you need to enable and configure it.

The audit log stores event details in the JSON format. Each log entry contains the event type, description, time, impacted objects, and other information that may be used for incident investigation. The complete list of fields is provided in Structure of audit log events.

TCM also provides a built-in interface for reading and searching the audit log. For details, see Viewing audit log.

To enable audit logging in TCM, go to Audit settings and click Enable.

To additionally send audit log events to the standard output, click Send to stdout.

TCM audit events can be logged to a local file or sent to a syslog server. To configure audit logging, go to Audit settings.

To write TCM audit logs to a file:

  1. Go to Audit settings and select the file protocol.
  2. Specify the name of the audit log file. The file appears in the TCM working directory.
  3. Configure log file rotation: the maximum file size and age, and the number of files to store simultaneously.
  4. (Optional) Enable compression of audit log files.

Configuration parameters:

  • Output file name. The name of the audit log file. Default: audit.log
  • Max size (in MB). The maximum size of the log file before it gets rotated, in megabytes. Default: 100.
  • Max backups. The maximum number of stored audit log files. Default: 10.
  • Max age (in days). The maximum age of audit log files in days. Default: 30.
  • Compress. Compress audit log files into gzip archives when rotating.
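
To make the interplay of Max backups and Max age concrete, the Go sketch below decides which rotated files a cleanup pass would remove. It assumes that both limits apply to rotated (not active) files and that a file is removed if it exceeds either limit. The backup type, the toDelete function, and the file names are hypothetical, and TCM's actual rotation logic may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// backup is a hypothetical description of one rotated audit log file.
type backup struct {
	Name    string
	AgeDays int // days since the file was rotated
}

// toDelete lists rotated files that violate the retention limits: files older
// than maxAgeDays, plus the oldest files beyond the first maxBackups.
func toDelete(files []backup, maxBackups, maxAgeDays int) []string {
	sorted := append([]backup(nil), files...)
	// Newest first, so the count limit keeps the most recent files.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].AgeDays < sorted[j].AgeDays })
	var out []string
	for i, f := range sorted {
		if i >= maxBackups || f.AgeDays > maxAgeDays {
			out = append(out, f.Name)
		}
	}
	return out
}

func main() {
	files := []backup{
		{"audit-1.log.gz", 1},
		{"audit-2.log.gz", 40},
		{"audit-3.log.gz", 5},
	}
	// Documented defaults: Max backups 10, Max age 30 days.
	fmt.Println(toDelete(files, 10, 30)) // [audit-2.log.gz]
}
```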

If you use a centralized log management system based on syslog, you can configure TCM to send its audit log to your syslog server:

  1. Go to Audit settings and select the syslog protocol.
  2. Enter the syslog server URI and select the network protocol. Typically, syslogd listens on port 514 and uses the UDP protocol.
  3. Specify the syslog logging parameters: timeout, priority, and facility.

Configuration parameters:

  • Protocol. The network protocol used for connecting to the syslog server. Default: udp.
  • Output. The syslog server URI. Default: 127.0.0.1:514 (localhost).
  • Timeout. The syslog write timeout in the ISO 8601 duration format. Default: PT2S (two seconds).
  • Priority. The syslog severity level. Default: info.
  • Facility. The syslog facility. Default: local0.

When the audit log is enabled, TCM records all audit events listed in Event types. To decrease load and make the audit log comply with specific security requirements, you can record only selected events. For example, these can be events of user account management or events of cluster data access.

To select events to record into the audit log, go to Audit settings and enter their types into the Filters field one by one, pressing Enter after each type.

To remove an event type from the filter list, click the cross icon beside it.
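
Conceptually, the Filters field acts as an allow-list of event types: with no filters, all events are recorded; otherwise, only the listed types are. The Go sketch below illustrates that behavior on in-memory events. The event type and the applyFilters function are hypothetical and only approximate what TCM does internally.

```go
package main

import "fmt"

// event keeps only the audit log fields this sketch needs.
type event struct {
	Type        string
	Description string
}

// applyFilters keeps events whose type appears in filters.
// An empty filter list keeps all events.
func applyFilters(events []event, filters []string) []event {
	if len(filters) == 0 {
		return events
	}
	allowed := make(map[string]bool, len(filters))
	for _, f := range filters {
		allowed[f] = true
	}
	var kept []event
	for _, e := range events {
		if allowed[e.Type] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	events := []event{
		{"user.add", "User added"},
		{"crud.insert", "Data inserted via CRUD operations"},
		{"auth.ok", "Authentication successful"},
	}
	// Record only user account management events.
	for _, e := range applyFilters(events, []string{"user.add", "user.update", "user.delete"}) {
		fmt.Println(e.Type, "-", e.Description)
	}
}
```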

If the audit log is written to a file, you can view it in TCM on the Audit log page. On this page, you can view or search for events.

To view the details of a logged audit event, click the corresponding line in the table.

To search for an event, use the search bar at the top of the page. Note that the search is case-sensitive. For example, to find events with the ALARM severity, enter ALARM, not alarm.

All entries of the TCM audit log include the following mandatory fields:

  • time: Time of the event. Example: 2023-11-23T12:05:27.099+07:00
  • severity: Event severity: VERBOSE, INFO, WARNING, or ALARM. Example: INFO
  • type: Audit event type. Example: user.update
  • description: Human-readable event description. Example: Update user
  • uuid: Event UUID. Example: f8744f51-5760-40c3-ae2d-0b4d6b44836f
  • user: UUID of the user who triggered the event. Example: 942a4f54-cf7f-4f46-80ce-3511dbbb57b7
  • remote: Remote host that triggered the event. Example: 100.96.163.226:48722
  • host: The TCM host on which the event happened. Example: 100.96.163.226:8080
  • userAgent: Information about the client application and platform that was used to trigger the event. Example: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
  • permission: The permission that was used to trigger the event. Example: ["admin.users.write"]
  • result: Event result: ok or nok. Example: ok
  • err: Human-readable error description for events with nok result. Example: failed to login
  • fields: Additional fields for specific event types in the key-value format.

Key examples:

  • clusterId in cluster-related events
  • payload in events that include sending data to the server
  • username in current.* or auth.* events

This is an example of an audit log entry on a successful login attempt:

{
    "time": "2023-11-23T12:01:27.247+07:00",
    "severity": "INFO",
    "description": "Login user",
    "type": "current.login",
    "uuid": "4b9c2dd1-d9a1-4b40-a448-6bef4a0e5c79",
    "user": "",
    "remote": "127.0.0.1:63370",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "host": "127.0.0.1:8080",
    "permissions": [],
    "result": "ok",
    "fields": [
        {
            "Key": "username",
            "Value": "admin"
        },
        {
            "Key": "method",
            "Value": "null"
        },
        {
            "Key": "output",
            "Value": "true"
        }
    ]
}
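
Since each entry is a JSON object, audit logs written to a file can be processed with ordinary JSON tooling. The Go sketch below decodes an entry with encoding/json and looks up an additional field by key. The struct tags follow the keys in the example entry above; the list of mandatory fields spells two of them differently (userAgent, permission), so verify the key names against your own log. The auditEntry type and its helpers are illustrative, not part of TCM.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditEntry maps the keys shown in the example audit log entry.
type auditEntry struct {
	Time        string   `json:"time"`
	Severity    string   `json:"severity"`
	Description string   `json:"description"`
	Type        string   `json:"type"`
	UUID        string   `json:"uuid"`
	User        string   `json:"user"`
	Remote      string   `json:"remote"`
	UserAgent   string   `json:"user-agent"`
	Host        string   `json:"host"`
	Permissions []string `json:"permissions"`
	Result      string   `json:"result"`
	Err         string   `json:"err"`
	Fields      []struct {
		Key   string `json:"Key"`
		Value string `json:"Value"`
	} `json:"fields"`
}

// field returns the value of an additional field by key, if present.
func (e auditEntry) field(key string) (string, bool) {
	for _, f := range e.Fields {
		if f.Key == key {
			return f.Value, true
		}
	}
	return "", false
}

// parseEntry decodes one JSON audit log entry.
func parseEntry(line []byte) (auditEntry, error) {
	var e auditEntry
	err := json.Unmarshal(line, &e)
	return e, err
}

func main() {
	line := []byte(`{"time": "2023-11-23T12:01:27.247+07:00", "severity": "INFO",
		"type": "current.login", "result": "ok",
		"fields": [{"Key": "username", "Value": "admin"}]}`)
	e, err := parseEntry(line)
	if err != nil {
		panic(err)
	}
	user, _ := e.field("username")
	fmt.Println(e.Type, e.Result, user) // current.login ok admin
}
```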

The following table lists all possible values of the type field of TCM audit log events.

Event type                     Description
auth.fail                      Authentication failed
auth.ok                        Authentication successful
access.denied                  An attempt to access an object without the required permission
crud.insert                    Data inserted via CRUD operations
crud.delete                    Data deleted via CRUD operations
user.add                       User added
user.update                    User updated
user.delete                    User deleted
secret.add                     User secret added
secret.update                  User secret updated
secret.block                   User secret blocked
secret.unblock                 User secret unblocked
secret.delete                  User secret deleted
secret.expire                  User secret expired
session.revoke                 Session revoked
session.revokeuser             All user's sessions revoked
explorer.insert                Data inserted in a cluster
explorer.delete                Master switched manually
test.devmode                   Switched to development mode
auditlog.config                Audit log configuration changed
passwordpolicy.save            Password policy changed
passwordpolicy.resetpasswords  All passwords are expired by an administrator
ddl.save                       Cluster data model saved
ddl.apply                      Cluster data model applied
cluster.config.save            Cluster configuration saved
cluster.config.reset           Saved cluster configuration reset
cluster.config.apply           Cluster configuration applied
current.logout                 User logged out their own session
current.revoke                 User revoked their own session
current.revokeall              User revoked all their active sessions
current.changepassword         User changed their password
role.add                       Role added
role.update                    Role updated
role.delete                    Role deleted
cluster.add                    Cluster added
cluster.update                 Cluster updated
cluster.delete                 Cluster removed
ldap.testlogin                 Login test executed for an LDAP configuration
ldap.testconnection            Connection test executed for an LDAP configuration
ldap.add                       LDAP configuration added
ldap.update                    LDAP configuration updated
ldap.delete                    LDAP configuration deleted
addon.enable                   Add-on enabled
addon.disable                  Add-on disabled
addon.delete                   Add-on removed
tcmstate.save                  Low-level information saved in the TCM storage (for debug purposes)
tcmstate.delete                Low-level information deleted from the TCM storage (for debug purposes)

Configuration

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

This topic describes how to configure Tarantool Cluster Manager. For the complete list of TCM configuration parameters, see the TCM configuration reference.

Note

To learn about Tarantool cluster configuration, see Configuration.

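Tarantool Cluster Manager configuration is a set of parameters that define various aspects of TCM functioning. Parameters are grouped by the aspect that they affect. For example, the cluster group defines TCM interaction with connected Tarantool clusters, the http group defines HTTP connections between TCM and clients, and the storage group defines the connection to the backend store.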

Parameter groups can be nested. For example, in the http group there are tls and websession-cookie groups, which define TLS encryption and cookie settings.

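Parameter names are the full paths from the top-level group to the specific parameter, with levels separated by periods. For example, http.host is the host parameter of the http group, and http.websession-cookie.name is the name parameter of the websession-cookie group nested in http.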

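There are three ways to pass TCM configuration parameters:

  • in a YAML configuration file
  • in environment variables
  • in command-line options of the tcm executable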

TCM configuration can be stored in a YAML file. Its structure must reflect the configuration parameters hierarchy.

The example below shows a fragment of a TCM configuration file:

# a fragment of a YAML configuration file
cluster: # top-level group
    on-air-limit: 4096
    connection-rate-limit: 512
    tarantool-timeout: 10s
    tarantool-ping-timeout: 5s
http: # top-level group
    basic-auth: # nested group
        enabled: false
    network: tcp
    host: 127.0.0.1
    port: 8080
    request-size: 1572864
    websocket: # nested group
        read-buffer-size: 16384
        write-buffer-size: 16384
        keepalive-ping-interval: 20s
        handshake-timeout: 10s
        init-timeout: 15s

To start TCM with a YAML configuration, pass the location of the configuration file in the -c command-line option:

$ tcm -c=config.yml

TCM can take values of its configuration parameters from environment variables. The variable names start with TCM_, followed by the full path to the parameter converted to upper case, with all delimiters (periods and hyphens) replaced with underscores (_). Examples:

  • TCM_HTTP_HOST is a variable for the http.host parameter.
  • TCM_HTTP_WEBSESSION_COOKIE_NAME is a variable for the http.websession-cookie.name parameter.

The example below shows how to start TCM with configuration parameters passed in environment variables:

$ export TCM_HTTP_HOST=0.0.0.0
$ export TCM_HTTP_PORT=8888
$ tcm

The TCM executable has a command-line option for each configuration parameter. The option name is the full path to the parameter prefixed with --, with configuration levels separated by periods (.). Examples:

  • --http.host is an option for http.host.
  • --http.websession-cookie.name is an option for http.websession-cookie.name.
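
The naming rules for environment variables and command-line options can be expressed as two small functions. This Go sketch assumes that periods and hyphens are the only delimiters in parameter paths, which holds for all names shown on this page. envName and flagName are hypothetical helpers, not part of the tcm executable.

```go
package main

import (
	"fmt"
	"strings"
)

// toUnderscores replaces the path delimiters (periods and hyphens) with underscores.
var toUnderscores = strings.NewReplacer(".", "_", "-", "_")

// envName returns the environment variable for a TCM parameter path:
// the TCM_ prefix followed by the upper-cased path with delimiters replaced.
func envName(param string) string {
	return "TCM_" + strings.ToUpper(toUnderscores.Replace(param))
}

// flagName returns the command-line option for a TCM parameter path.
func flagName(param string) string {
	return "--" + param
}

func main() {
	for _, p := range []string{"http.host", "http.websession-cookie.name", "cluster.connection-rate-limit"} {
		fmt.Printf("%-32s %-36s %s\n", p, envName(p), flagName(p))
	}
}
```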

The example below shows how to start TCM with configuration parameters passed in command-line options:

$ tcm --storage.etcd.embed.enabled --addon.enabled --http.host=0.0.0.0 --http.port=8888

TCM configuration options are applied from multiple sources with the following precedence, from highest to lowest:

  1. tcm executable arguments.
  2. TCM_* environment variables.
  3. Configuration from a YAML file.

If the same option is defined in two or more locations, the option with the highest precedence is applied. For options that aren’t defined in any location, the default values are used.
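
The precedence rule amounts to a lookup that stops at the first source defining the option. In the Go sketch below, each source is modeled as a map from parameter path to a string value. The resolve function and this map-based model are hypothetical simplifications of how TCM merges its configuration.

```go
package main

import "fmt"

// resolve picks a parameter value by TCM precedence: command-line argument,
// then TCM_* environment variable, then YAML file, then the default value.
// Each map holds the values defined in one source, keyed by parameter path.
func resolve(param string, cli, env, file map[string]string, def string) string {
	for _, src := range []map[string]string{cli, env, file} {
		if v, ok := src[param]; ok {
			return v
		}
	}
	return def
}

func main() {
	cli := map[string]string{"http.port": "8888"}
	env := map[string]string{"http.port": "9000", "http.host": "0.0.0.0"}
	file := map[string]string{"http.host": "127.0.0.1", "http.port": "8080"}

	fmt.Println(resolve("http.port", cli, env, file, "8080"))      // 8888 (command-line argument wins)
	fmt.Println(resolve("http.host", cli, env, file, "127.0.0.1")) // 0.0.0.0 (environment beats YAML)
	fmt.Println(resolve("http.network", cli, env, file, "tcp"))    // tcp (default value)
}
```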

You can combine different ways of configuring TCM to manage multiple TCM installations efficiently:

  • A single YAML file for all installations can contain the common configuration parts. For example, a single configuration storage that is used for all installations, or TLS settings.
  • Environment variables that set specific parameters for each server, such as local directories and paths.
  • Command-line options for parameters that must be unique for different TCM instances running on a single server. For example, http.port.

TCM configuration parameters have the Go language types. Note that this is different from the Tarantool configuration parameters, which have Lua types.

Most options have Go basic types: int and other numeric types, bool, and string.

http:
    basic-auth:
        enabled: false # bool
    network: tcp # string
    host: 127.0.0.1 # string
    port: 8080 # int
    request-size: 1572864 # int64

Parameters that can take multiple values are arrays. In YAML, they are passed as YAML arrays: each item on a new line, starting with a dash.

storage:
    provider: etcd
    etcd:
        endpoints: # array
            - https://192.168.0.1:2379 # item 1
            - https://192.168.0.2:2379 # item 2

Note

In environment variables and command-line options, such arrays are passed as semicolon-separated strings of items.
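
For example, the etcd endpoints above could be passed in TCM_STORAGE_ETCD_ENDPOINTS, a name that follows the environment variable rule described earlier. The Go sketch below shows how such a semicolon-separated value maps back to a list of items. Trimming spaces and skipping empty items are choices of this sketch, not documented TCM behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArray turns a semicolon-separated value into its items.
// Trimming spaces and skipping empty items are choices of this sketch.
func splitArray(s string) []string {
	var items []string
	for _, item := range strings.Split(s, ";") {
		if item = strings.TrimSpace(item); item != "" {
			items = append(items, item)
		}
	}
	return items
}

func main() {
	// Equivalent shell form (quotes are required because ';' ends a shell command):
	//   export TCM_STORAGE_ETCD_ENDPOINTS="https://192.168.0.1:2379;https://192.168.0.2:2379"
	value := "https://192.168.0.1:2379;https://192.168.0.2:2379"
	for i, ep := range splitArray(value) {
		fmt.Printf("item %d: %s\n", i+1, ep)
	}
}
```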

Parameters that set timeouts, TTLs, and other durations have Go's time.Duration type. Their values are passed as duration strings such as 4h30m25s.

cluster:
    tarantool-timeout: 10s # duration
    tarantool-ping-timeout: 5s # duration
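
Go duration strings combine a number with a unit suffix (ns, us, ms, s, m, h), and several such pairs can be concatenated. Assuming TCM parses these values with the standard time.ParseDuration function, as the time.Duration type suggests, the sketch below shows how typical values are interpreted and that a bare non-zero number such as 10 is rejected because it has no unit. The parseDuration wrapper is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// parseDuration wraps time.ParseDuration, the standard parser for
// strings such as "10s" or "4h30m25s".
func parseDuration(s string) (time.Duration, error) {
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"10s", "5s", "4h30m25s", "10"} {
		d, err := parseDuration(s)
		if err != nil {
			// "10" fails: a non-zero value needs a unit suffix.
			fmt.Printf("%-9s error: %v\n", s, err)
			continue
		}
		fmt.Printf("%-9s %v seconds\n", s, d.Seconds())
	}
}
```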

Finally, there are parameters whose values are constants defined in Go packages. For example, the values of http.websession-cookie.same-site are constants of Go's http.SameSite type. To find out the exact values available for such parameters, refer to the documentation of the corresponding Go packages.

http:
    websession-cookie:
        same-site: SameSiteStrictMode
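
Such names refer to exported Go constants. For http.SameSite, the net/http package defines SameSiteDefaultMode, SameSiteLaxMode, SameSiteStrictMode, and SameSiteNoneMode. The Go sketch below maps a configured name to its constant and shows the resulting SameSite attribute in a Set-Cookie value. Whether TCM accepts exactly these four names is an assumption; parseSameSite and cookieFor are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
)

// sameSiteModes maps constant names, as written in the configuration,
// to values of Go's http.SameSite type.
var sameSiteModes = map[string]http.SameSite{
	"SameSiteDefaultMode": http.SameSiteDefaultMode,
	"SameSiteLaxMode":     http.SameSiteLaxMode,
	"SameSiteStrictMode":  http.SameSiteStrictMode,
	"SameSiteNoneMode":    http.SameSiteNoneMode,
}

// parseSameSite resolves a constant name to its http.SameSite value.
func parseSameSite(name string) (http.SameSite, error) {
	if v, ok := sameSiteModes[name]; ok {
		return v, nil
	}
	return 0, fmt.Errorf("unknown SameSite mode %q", name)
}

// cookieFor renders a Set-Cookie header value with the given SameSite mode.
func cookieFor(name, value string, mode http.SameSite) string {
	c := &http.Cookie{Name: name, Value: value, SameSite: mode}
	return c.String()
}

func main() {
	mode, err := parseSameSite("SameSiteStrictMode")
	if err != nil {
		panic(err)
	}
	// "tcm" is the default http.websession-cookie.name; the value is a placeholder.
	fmt.Println(cookieFor("tcm", "session-id", mode)) // tcm=session-id; SameSite=Strict
}
```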

You can create a YAML configuration template for TCM with all parameters and their default values using the generate-config option of the tcm executable.

To write a default TCM configuration to the tcm.example.yml file, run:

$ tcm generate-config > tcm.example.yml

You can use YAML configuration files to create entities in TCM automatically upon the first start. These entities are defined in the initial-settings section of the configuration file.

Important

The initial settings are applied only once upon the first TCM start. Further changes are not applied upon TCM restarts.

To add clusters to TCM upon the first start, specify their settings in the initial-settings.clusters configuration section.

The initial-settings.clusters section is an array whose items describe separate clusters, for example:

initial-settings:
  clusters:
    - name: Cluster 1
      description: First cluster
      # cluster settings
    - name: Cluster 2
      description: Second cluster
      # cluster settings

In this configuration, you can specify all cluster settings that you define when connecting clusters through the TCM web interface. This includes:

  • the cluster name
  • description
  • additional URLs
  • configuration storage connection
  • Tarantool instances connection
  • and other settings.

For the full list of cluster configuration parameters, see the initial-settings.clusters reference. For example, this is how you add a cluster that uses an etcd configuration storage:

initial-settings:
  clusters:
    - name: My cluster
      description: Cluster description
      urls:
      - label: Test
        url: http://example.com
      storage-connection:
        provider: etcd
        etcd-connection:
          endpoints:
            - http://127.0.0.1:2379
          username: ""
          password: ""
          prefix: /cluster1
        tarantool-connection:
          username: guest
          password: ""

By default, TCM contains a cluster named Default cluster with ID 00000000-0000-0000-0000-000000000000. You can use this ID to modify the default cluster settings upon the first TCM start. For example, rename it and add its connection settings:

initial-settings:
  clusters:
    - id: 00000000-0000-0000-0000-000000000000
      name: My cluster
      storage-connection:
        provider: etcd
        etcd-connection:
          endpoints:
            - http://127.0.0.1:2379
          username: etcd-user
          password: secret
          prefix: /cluster1
        tarantool-connection:
          username: guest
          password: ""

Backend store

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager uses an underlying data store (backend store) for its entities: users, roles, cluster connections, settings, and other objects that you manipulate in TCM. The backend store can be either an etcd or a Tarantool cluster.

For better reliability and scalability, the backend store works independently of TCM. For example, it can be the same etcd or Tarantool cluster that you use as a centralized configuration storage. This makes TCM stateless: all objects created or modified in its web UI are saved to the backend store, and nothing is stored inside the TCM instances themselves. Any number of instances can duplicate each other when connected to the same backend store. If you stop all instances, the store still contains their objects. You can continue working with them right after starting a new instance.

In addition to using an external backend store, you can run TCM with an embedded etcd or Tarantool instance to use as the backend store.

On this page, you will learn to connect TCM to backend stores of both types, or start TCM with an embedded backend store.

The TCM backend store requires the same configuration as Tarantool centralized configuration storage. Follow the instructions in Setting up a configuration storage to set up a backend store.

Note

If you already have the centralized configuration store for your Tarantool clusters, you can use it as a TCM backend store as well.

The TCM’s connection to its backend store is configured using the storage.* configuration options. The storage.provider option selects the store type. It can be either etcd or tarantool.

To use an etcd cluster as a TCM backend store, set the storage.provider option to etcd and specify connection parameters in storage.etcd.* options. A minimal etcd configuration includes the storage endpoints:

storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379

If authentication is enabled in etcd, specify storage.etcd.username and storage.etcd.password:

storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
    username: etcduser
    password: secret

The TCM data is stored in etcd under the prefix specified in storage.etcd.prefix. By default, the prefix is /tcm. If you want to change it or store data of different TCM instances separately in one etcd cluster, set the prefix explicitly:

storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
    prefix: /tcm2

Other storage.etcd.* options configure various aspects of the etcd store connection, such as network timeouts and limits or TLS parameters. For the full list of the etcd TCM backend store options, see the TCM configuration reference.

To use a Tarantool cluster as a TCM backend store, set the storage.provider option to tarantool and specify connection parameters in storage.tarantool.* options. A minimal configuration includes one or more addresses of the backend store instances:

storage:
  provider: tarantool
  tarantool:
    addr: http://127.0.0.1:3301

Or:

storage:
  provider: tarantool
  tarantool:
    addrs:
      - http://127.0.0.1:3301
      - http://127.0.0.1:3302
      - http://127.0.0.1:3303

If authentication is enabled in the backend store, specify storage.tarantool.username and storage.tarantool.password:

storage:
  provider: tarantool
  tarantool:
    addr: http://127.0.0.1:3301
    username: tarantooluser
    password: secret

The TCM data is stored in the Tarantool-based backend store under the prefix specified in storage.tarantool.prefix. By default, the prefix is /tcm. If you want to change it or store data of different TCM instances separately in one Tarantool cluster, set the prefix explicitly:

storage:
  provider: tarantool
  tarantool:
    addr: http://127.0.0.1:3301
    username: tarantooluser
    password: secret
    prefix: /tcm2

Other storage.tarantool.* options configure various aspects of TCM connection to the Tarantool-based backend store, such as network timeouts and limits or TLS parameters. For the full list of the Tarantool-based TCM backend store options, see the TCM configuration reference.

For development purposes, you can start TCM with an embedded backend store. This is useful for local runs when you don’t have or don’t need an external backend store.

Important

Do not use the embedded backend stores in production environments.

An embedded TCM backend store is a single instance of etcd or Tarantool that is started automatically on the same host during the TCM startup. It runs in the background until TCM is stopped. The embedded backend store is persistent: if you start TCM again with the same backend store configuration, it restores the TCM data from the previous runs.

Note

To start a clean instance of TCM, remove the working directory of the embedded backend store specified in the storage.etcd.embed.workdir or storage.tarantool.embed.workdir option.

The embedded backend store parameters are configured using the storage.etcd.embed.* options for etcd or storage.tarantool.embed.* options for a Tarantool-based store.

To start TCM with an embedded etcd with default settings, set storage.etcd.embed.enabled to true and leave other storage.* options default:

storage.etcd.embed.enabled: true

You can use the following call to get TCM running with embedded etcd without a configuration file:

$ tcm --storage.etcd.embed.enabled

To start TCM with an embedded Tarantool storage with default settings:

storage:
  provider: tarantool
  tarantool.embed.enabled: true

With command-line arguments:

$ tcm --storage.provider=tarantool --storage.tarantool.embed.enabled

You can tune the embedded backend store, for example, enable and configure TLS on it or change its working directories or startup arguments. To set specific parameters, specify the corresponding storage.etcd.embed.* or storage.tarantool.embed.* options. For the full list of configuration options of embedded backend stores, see the TCM configuration reference.

To simulate a production environment, you can form a distributed multi-instance cluster from the embedded stores of multiple TCM instances. To do this, configure the embedded store of each TCM instance to join the others.

For etcd, provide the embedded store clustering parameters storage.etcd.embed.* and specify the endpoints in storage.etcd.endpoints. The options that configure embedded etcd mostly match the etcd configuration options. For more information about these options, see the etcd documentation.

Below are example configurations of three TCM instances that start with embedded etcd instances and form an etcd cluster from them:

  • First instance:

    http:
      port: 8080
    storage:
      provider: etcd
      etcd:
        endpoints:
          - http://127.0.0.1:2379
          - http://127.0.0.1:22379
          - http://127.0.0.1:32379
        embed:
          enabled: true
          name: infra1
          endpoints:
            - http://127.0.0.1:2379
          advertises:
            - http://127.0.0.1:2379
          initial-cluster-state: new
          initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
          initial-cluster-token: etcd-cluster-1
          peer-endpoints:
            - http://127.0.0.1:12380
          peer-advertises:
            - http://127.0.0.1:12380
          workdir: node1.etcd
    
  • Second instance:

    http:
      port: 8081
    storage:
      provider: etcd
      etcd:
        endpoints:
          - http://127.0.0.1:2379
          - http://127.0.0.1:22379
          - http://127.0.0.1:32379
        embed:
          enabled: true
          name: infra2
          endpoints:
            - http://127.0.0.1:22379
          advertises:
            - http://127.0.0.1:22379
          initial-cluster-state: new
          initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
          initial-cluster-token: etcd-cluster-1
          peer-endpoints:
            - http://127.0.0.1:22380
          peer-advertises:
            - http://127.0.0.1:22380
          workdir: node2.etcd
    
  • Third instance:

    http:
      port: 8082
    storage:
      provider: etcd
      etcd:
        endpoints:
          - http://127.0.0.1:2379
          - http://127.0.0.1:22379
          - http://127.0.0.1:32379
        embed:
          enabled: true
          name: infra3
          endpoints:
            - http://127.0.0.1:32379
          advertises:
            - http://127.0.0.1:32379
          initial-cluster-state: new
          initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
          initial-cluster-token: etcd-cluster-1
          peer-endpoints:
            - http://127.0.0.1:32380
          peer-advertises:
            - http://127.0.0.1:32380
          workdir: node3.etcd
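
The initial-cluster value must be identical on all three instances and must list each member's name together with the peer URL that the member advertises in peer-advertises. A small generator like the Go sketch below helps keep it consistent as members are added; it reproduces the string used in the configurations above. The member type and the initialCluster function are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// member is one embedded etcd node: its name and advertised peer URL.
type member struct {
	Name    string
	PeerURL string
}

// initialCluster builds the value for storage.etcd.embed.initial-cluster:
// comma-separated name=peer-URL pairs, the same on every instance.
func initialCluster(members []member) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		parts = append(parts, m.Name+"="+m.PeerURL)
	}
	return strings.Join(parts, ",")
}

func main() {
	members := []member{
		{"infra1", "http://127.0.0.1:12380"},
		{"infra2", "http://127.0.0.1:22380"},
		{"infra3", "http://127.0.0.1:32380"},
	}
	fmt.Println(initialCluster(members))
}
```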
    

To set up a cluster from embedded Tarantool-based backend stores:

  1. Specify the Tarantool cluster configuration in storage.tarantool.embed.config (as a plain text) or storage.tarantool.embed.config-file (as a YAML file).
  2. Assign an instance name from this configuration to each embedded store using storage.tarantool.embed.args.

Below are example configurations of three TCM instances that start with embedded Tarantool-based backend stores and form a cluster from them:

  • First instance:

    http:
      port: 8080
    storage:
      provider: tarantool
      tarantool:
        addrs:
          - http://127.0.0.1:3301
          - http://127.0.0.1:3302
          - http://127.0.0.1:3303
        embed:
          enabled: true
          executable: /path/to/execfile/tarantool-enterprise/tarantool
          config-filename: config.yml
          workdir: node1.tarantool
          args:
            - --name
            - instance-001
            - --config
            - config.yml
    
  • Second instance:

    http:
      port: 8081
    storage:
      provider: tarantool
      tarantool:
        addrs:
          - http://127.0.0.1:3301
          - http://127.0.0.1:3302
          - http://127.0.0.1:3303
        embed:
          enabled: true
          executable: /path/to/execfile/tarantool-enterprise/tarantool
          config-filename: config.yml
          workdir: node2.tarantool
          args:
            - --name
            - instance-002
            - --config
            - config.yml
    
  • Third instance:

    http:
      port: 8082
    storage:
      provider: tarantool
      tarantool:
        addrs:
          - http://127.0.0.1:3301
          - http://127.0.0.1:3302
          - http://127.0.0.1:3303
        embed:
          enabled: true
          executable: /path/to/execfile/tarantool-enterprise/tarantool
          config-filename: config.yml
          workdir: node3.tarantool
          args:
            - --name
            - instance-003
            - --config
            - config.yml
    

Development mode

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

Tarantool Cluster Manager provides a special mode intended for use during development. This mode extends the web interface with capabilities that help in development or testing environments, such as starting and stopping instances or promoting instances.

You can enable TCM development mode in different ways: in its web interface, in the configuration file, using an environment variable, or using a command-line option.

To enable development mode on the running TCM instance, use its web interface:

  1. Open user settings: click Settings under the user name in the header.
  2. Go to the About tab.
  3. Click the toggle button beside tcm/mode.

To start TCM in development mode, specify the mode: development option in its configuration file:

# tcm_config.yaml
mode: development

To start TCM in development mode from the command line, specify the --mode=development option:

$ tcm --mode=development

To make new TCM instances start in development mode by default, set the TCM_MODE environment variable to development:

$ export TCM_MODE=development
$ tcm

Configuration reference

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

This topic describes configuration parameters of Tarantool Cluster Manager.

There are the following groups of TCM configuration parameters:

The cluster group defines parameters of TCM interaction with connected Tarantool clusters.

cluster.connection-rate-limit

A rate limit for connections to Tarantool instances.


Type: uint
Default: 512
Environment variable: TCM_CLUSTER_CONNECTION_RATE_LIMIT
Command-line option: --cluster.connection-rate-limit
cluster.tarantool-timeout

A timeout for receiving a response from Tarantool instances.


Type: time.Duration
Default: 10s
Environment variable: TCM_CLUSTER_TARANTOOL_TIMEOUT
Command-line option: --cluster.tarantool-timeout
cluster.tarantool-ping-timeout

A timeout for receiving a ping response from Tarantool instances.


Type: time.Duration
Default: 5s
Environment variable: TCM_CLUSTER_TARANTOOL_PING_TIMEOUT
Command-line option: --cluster.tarantool-ping-timeout
cluster.tt-command

The command that runs the tt utility on hosts with cluster instances.


Type: string
Default: tt
Environment variable: TCM_CLUSTER_TT_COMMAND
Command-line option: --cluster.tt-command
cluster.refresh-state-period

The time interval for refreshing the cluster instances state on the Stateboard.


Type: time.Duration
Default: 5s
Environment variable: TCM_CLUSTER_REFRESH_STATE_PERIOD
Command-line option: --cluster.refresh-state-period
cluster.refresh-state-timeout

The time limit for refreshing an instance state. If this limit is reached, an error is shown.


Type: time.Duration
Default: 4s
Environment variable: TCM_CLUSTER_REFRESH_STATE_TIMEOUT
Command-line option: --cluster.refresh-state-timeout
cluster.discovery-period

The time interval for checking the leadership in replica sets.


Type: time.Duration
Default: 4s
Environment variable: TCM_CLUSTER_DISCOVERY_PERIOD
Command-line option: --cluster.discovery-period
cluster.sharding-index

The name of the space field that is used as a sharding key.


Type: string
Default: bucket_id
Environment variable: TCM_CLUSTER_SHARDING_INDEX
Command-line option: --cluster.sharding-index
cluster.skew-time

The maximum time skew between any two cluster instances. If this limit is reached, a warning is shown.


Type: time.Duration
Default: 30s
Environment variable: TCM_CLUSTER_SKEW_TIME
Command-line option: --cluster.skew-time
cluster.fragmentation-threshold

The count of allocated slabs that reflects high memory fragmentation. When this number is reached, a warning is shown.

See also: Storing data with memtx


Type: int
Default: 40
Environment variable: TCM_CLUSTER_FRAGMENTATION_THRESHOLD
Command-line option: --cluster.fragmentation-threshold

The http group defines parameters of HTTP connections between TCM and clients.

http.network

An addressing scheme that TCM uses.

Possible values:

  • tcp: IPv4 address
  • tcp6: IPv6 address
  • unix: Unix domain socket

Type: string
Default: tcp
Environment variable: TCM_HTTP_NETWORK
Command-line option: --http.network
http.host

A host name on which TCM serves.


Type: string
Default: 127.0.0.1
Environment variable: TCM_HTTP_HOST
Command-line option: --http.host
http.port

A port on which TCM serves.


Type: int
Default: 8080
Environment variable: TCM_HTTP_PORT
Command-line option: --http.port
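For illustration, the following Go sketch shows how http.host and http.port might be combined into a listen address for Go's net.Listen, assuming that http.network is tcp or tcp6. net.JoinHostPort adds the square brackets that IPv6 literals require. The unix network is not covered, and listenAddr is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddr joins http.host and http.port into the "host:port" form that
// Go's net.Listen expects; IPv6 literals are wrapped in square brackets.
func listenAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// Defaults: http.network=tcp, http.host=127.0.0.1, http.port=8080.
	fmt.Println(listenAddr("127.0.0.1", 8080)) // 127.0.0.1:8080
	// An IPv6 host, as would be used with http.network set to tcp6.
	fmt.Println(listenAddr("::1", 8080)) // [::1]:8080
}
```

The resulting string would then be passed as the address argument of net.Listen together with the http.network value.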
http.request-size

The maximum size (in bytes) of a client HTTP request to TCM.


Type: int64
Default: 1572864
Environment variable: TCM_HTTP_REQUEST_SIZE
Command-line option: --http.request-size
http.websocket.read-buffer-size

The size (in bytes) of the read buffer for WebSocket connections.


Type: int
Default: 16384
Environment variable: TCM_HTTP_WEBSOCKET_READ_BUFFER_SIZE
Command-line option: --http.websocket.read-buffer-size
http.websocket.write-buffer-size

The size (in bytes) of the write buffer for WebSocket connections.


Type: int
Default: 16384
Environment variable: TCM_HTTP_WEBSOCKET_WRITE_BUFFER_SIZE
Command-line option: --http.websocket.write-buffer-size
http.websocket.keepalive-ping-interval

The time interval for sending WebSocket keepalive pings.


Type: time.Duration
Default: 20s
Environment variable: TCM_HTTP_WEBSOCKET_KEEPALIVE_PING_INTERVAL
Command-line option: --http.websocket.keepalive-ping-interval
http.websocket.handshake-timeout

The time limit for completing a WebSocket opening handshake with a client.


Type: time.Duration
Default: 10s
Environment variable: TCM_HTTP_WEBSOCKET_HANDSHAKE_TIMEOUT
Command-line option: --http.websocket.handshake-timeout
http.websocket.init-timeout

The time limit for establishing a WebSocket connection with a client.


Type: time.Duration
Default: 15s
Environment variable: TCM_HTTP_WEBSOCKET_INIT_TIMEOUT
Command-line option: --http.websocket.init-timeout
http.websession-cookie.name

The name of the cookie that TCM sends to clients.

This value is used as the cookie name in the Set-Cookie HTTP response header.


Type: string
Default: tcm
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_NAME
Command-line option: --http.websession-cookie.name
http.websession-cookie.path

The URL path that must be present in the requested URL in order to send the cookie.

This value is used in the Path attribute of the Set-Cookie HTTP response header.


Type: string
Default: «»
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_PATH
Command-line option: --http.websession-cookie.path
http.websession-cookie.domain

The domain to which the cookie can be sent.

This value is used in the Domain attribute of the Set-Cookie HTTP response header.


Type: string
Default: «»
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_DOMAIN
Command-line option: --http.websession-cookie.domain
http.websession-cookie.ttl

The maximum lifetime of the TCM cookie.

This value is used in the Max-Age attribute of the Set-Cookie HTTP response header.


Type: time.Duration
Default: 2h0m0s
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_TTL
Command-line option: --http.websession-cookie.ttl
http.websession-cookie.secure

Indicates whether the cookie can be sent only over the HTTPS protocol. In this case, it’s never sent over the unencrypted HTTP, therefore preventing man-in-the-middle attacks.

When true, the Secure attribute is added to the Set-Cookie HTTP response header.


Type: bool
Default: false
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_SECURE
Command-line option: --http.websession-cookie.secure
http.websession-cookie.http-only

Indicates that the cookie can’t be accessed from the JavaScript Document.cookie API. This helps mitigate cross-site scripting attacks.

When true, the HttpOnly attribute is added to the Set-Cookie HTTP response header.


Type: bool
Default: true
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_HTTP_ONLY
Command-line option: --http.websession-cookie.http-only
http.websession-cookie.same-site

Indicates whether the TCM cookie can be sent along with cross-site requests. Possible values are Go’s http.SameSite constants:

  • SameSiteDefaultMode
  • SameSiteLaxMode
  • SameSiteStrictMode
  • SameSiteNoneMode

For details on SameSite modes, see the Set-Cookie header documentation in the MDN web docs.

This value is used in the SameSite attribute of the Set-Cookie HTTP response header.


Type: http.SameSite
Default: SameSiteDefaultMode
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_SAME_SITE
Command-line option: --http.websession-cookie.same-site
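The following minimal Go sketch shows how the http.websession-cookie.* defaults map onto the Set-Cookie header. It assumes that TCM builds its session cookie with the Cookie type from Go's net/http package, which the http.SameSite type above suggests but does not confirm. The session value is a placeholder. The TTL is converted to whole seconds for Max-Age, and empty path and domain values leave those attributes out.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// sessionCookie maps the http.websession-cookie.* defaults onto Go's
// http.Cookie. The value argument is a placeholder session ID.
func sessionCookie(value string) *http.Cookie {
	ttl := 2 * time.Hour // http.websession-cookie.ttl (default 2h0m0s)
	return &http.Cookie{
		Name:     "tcm", // http.websession-cookie.name
		Value:    value,
		Path:     "",                 // http.websession-cookie.path: empty, so no Path attribute
		Domain:   "",                 // http.websession-cookie.domain: empty, so no Domain attribute
		MaxAge:   int(ttl.Seconds()), // Max-Age is expressed in seconds
		Secure:   false,              // http.websession-cookie.secure
		HttpOnly: true,               // http.websession-cookie.http-only
		SameSite: http.SameSiteDefaultMode,
	}
}

func main() {
	c := sessionCookie("placeholder-session-id")
	// With Go 1.16 or later this prints:
	// tcm=placeholder-session-id; Max-Age=7200; HttpOnly
	fmt.Println(c.String())
}
```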
http.cors.enabled

Indicates whether Cross-Origin Resource Sharing (CORS) is enabled.


Type: bool
Default: false
Environment variable: TCM_HTTP_CORS_ENABLED
Command-line option: --http.cors.enabled
http.cors.allowed-origins

The origins with which the HTTP response can be shared, separated by semicolons.

The specified values are sent in the Access-Control-Allow-Origin HTTP response headers.


Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_ORIGINS
Command-line option: --http.cors.allowed-origins
http.cors.allowed-methods

HTTP request methods that are allowed when accessing a resource, separated by semicolons.

The specified values are sent in the Access-Control-Allow-Methods HTTP header of a response to a CORS preflight request.


Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_METHODS
Command-line option: --http.cors.allowed-methods
http.cors.allowed-headers

HTTP headers that are allowed during the actual request, separated by semicolons.

The specified values are sent in the Access-Control-Allow-Headers HTTP header of a response to a CORS preflight request.


Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_HEADERS
Command-line option: --http.cors.allowed-headers
http.cors.exposed-headers

Response headers that should be made available to scripts running in the browser, in response to a cross-origin request, separated by semicolons.

The specified values are sent in the Access-Control-Expose-Headers HTTP response headers.


Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_EXPOSED_HEADERS
Command-line option: --http.cors.exposed-headers
http.cors.allow-credentials

Whether to expose the response to the frontend JavaScript code when the request’s credentials mode is include.

When true, the Access-Control-Allow-Credentials HTTP response header is sent.


Type: bool
Default: false
Environment variable: TCM_HTTP_CORS_ALLOW_CREDENTIALS
Command-line option: --http.cors.allow-credentials
http.cors.debug

For debug purposes.


Type: bool
Default: false
http.tls.enabled

Indicates whether TLS is enabled for client connections to TCM.


Type: bool
Default: false
Environment variable: TCM_HTTP_TLS_ENABLED
Command-line option: --http.tls.enabled
http.tls.cert-file

A path to a TLS certificate file. Mandatory when TLS is enabled.


Type: string
Default: «»
Environment variable: TCM_HTTP_TLS_CERT_FILE
Command-line option: --http.tls.cert-file
http.tls.key-file

A path to a TLS private key file. Mandatory when TLS is enabled.


Type: string
Default: «»
Environment variable: TCM_HTTP_TLS_KEY_FILE
Command-line option: --http.tls.key-file
http.tls.server

The TLS server.


Type: string
Default: «»
Environment variable: TCM_HTTP_TLS_SERVER
Command-line option: --http.tls.server
http.tls.min-version

The minimum version of the TLS protocol.


Type: uint16
Default: 0
Environment variable: TCM_HTTP_TLS_MIN_VERSION
Command-line option: --http.tls.min-version
http.tls.max-version

The maximum version of the TLS protocol.


Type: uint16
Default: 0
Environment variable: TCM_HTTP_TLS_MAX_VERSION
Command-line option: --http.tls.max-version
http.tls.curve-preferences

Elliptic curves that are used for TLS connections. Possible values are Go’s tls.CurveID constants:

  • CurveP256
  • CurveP384
  • CurveP521
  • X25519

Type: []tls.CurveID
Default: []
Environment variable: TCM_HTTP_TLS_CURVE_PREFERENCES
Command-line option: --http.tls.curve-preferences
http.tls.cipher-suites

Enabled TLS cipher suites. The supported ciphers are:

  • TLS 1.0 - 1.2 cipher suites:
    • TLS_RSA_WITH_RC4_128_SHA
    • TLS_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_RSA_WITH_AES_128_CBC_SHA
    • TLS_RSA_WITH_AES_256_CBC_SHA
    • TLS_RSA_WITH_AES_128_CBC_SHA256
    • TLS_RSA_WITH_AES_128_GCM_SHA256
    • TLS_RSA_WITH_AES_256_GCM_SHA384
    • TLS_ECDHE_ECDSA_WITH_RC4_128_SHA
    • TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
    • TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
    • TLS_ECDHE_RSA_WITH_RC4_128_SHA
    • TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
    • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
    • TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
    • TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
    • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
    • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
  • TLS 1.3 cipher suites:
    • TLS_AES_128_GCM_SHA256
    • TLS_AES_256_GCM_SHA384
    • TLS_CHACHA20_POLY1305_SHA256
  • Other constants defined in Go’s crypto/tls:
    • TLS_FALLBACK_SCSV (uint16 = 0x5600): not a standard cipher suite but an indicator that the client is doing version fallback
    • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305: legacy name for TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305: legacy name for TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256

For detailed information on ciphers, refer to the Golang tls.TLS_* constants.

The example below shows how to configure cipher suites:

http:
  tls:
    cipher-suites:
      - TLS_AES_256_GCM_SHA384
      - TLS_AES_128_GCM_SHA256
      - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

Type: []uint16
Default: []
Environment variable: TCM_HTTP_TLS_CIPHER_SUITES
Command-line option: --http.tls.cipher-suites
http.read-timeout

A timeout for reading an incoming request.


Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_READ_TIMEOUT
Command-line option: --http.read-timeout
http.read-header-timeout

A timeout for reading headers of an incoming request.


Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_READ_HEADER_TIMEOUT
Command-line option: --http.read-header-timeout
http.write-timeout

A timeout for writing a response.


Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_WRITE_TIMEOUT
Command-line option: --http.write-timeout
http.idle-timeout

The timeout for idle connections.


Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_IDLE_TIMEOUT
Command-line option: --http.idle-timeout
http.disable-general-options-handler

Whether the client requests with the OPTIONS HTTP method are allowed.


Type: bool
Default: false
Environment variable: TCM_HTTP_DISABLE_GENERAL_OPTIONS_HANDLER
Command-line option: --http.disable-general-options-handler
http.max-header-bytes

The maximum size (in bytes) of a header in a client’s request to TCM.


Type: int
Default: 0
Environment variable: TCM_HTTP_MAX_HEADER_BYTES
Command-line option: --http.max-header-bytes
http.api-timeout

The stateboard update timeout.


Type: time.Duration
Default: 8s
Environment variable: TCM_HTTP_API_TIMEOUT
Command-line option: --http.api-timeout
http.api-update-interval

The stateboard update interval.


Type: time.Duration
Default: 5s
Environment variable: TCM_HTTP_API_UPDATE_INTERVAL
Command-line option: --http.api-update-interval
http.frontend-dir

The directory with custom TCM frontend files (for development purposes).


Type: string
Default: «»
Environment variable: TCM_HTTP_FRONTEND_DIR
Command-line option: --http.frontend-dir
http.show-stack-trace

Whether error stack traces are shown in the web UI.


Type: bool
Default: true
Environment variable: TCM_HTTP_SHOW_STACK_TRACE
Command-line option: --http.show-stack-trace
http.trace

Whether all query tracing information is written to the logs.


Type: bool
Default: false
Environment variable: TCM_HTTP_TRACE
Command-line option: --http.trace
http.max-static-size

The maximum size (in bytes) of static content sent to TCM.


Type: int
Default: 104857600
Environment variable: TCM_HTTP_MAX_STATIC_SIZE
Command-line option: --http.max-static-size
http.graphql.complexity

The maximum complexity of GraphQL queries that TCM processes. If this value is exceeded, TCM returns an error.


Type: int
Default: 40
Environment variable: TCM_HTTP_GRAPHQL_COMPLEXITY
Command-line option: --http.graphql.complexity

The log section defines the TCM logging parameters.

log.default.add-source

Whether source code locations are added to TCM log entries.


Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_ADD_SOURCE
Command-line option: --log.default.add-source
log.default.show-stack-trace

Whether stack traces are added to the TCM log.


Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_SHOW_STACK_TRACE
Command-line option: --log.default.show-stack-trace
log.default.level

The default TCM logging level.

Possible values:

  • VERBOSE
  • INFO
  • WARN
  • ALARM

Type: string
Default: INFO
Environment variable: TCM_LOG_DEFAULT_LEVEL
Command-line option: --log.default.level
log.default.format

The format of TCM log entries.

Possible values:

  • struct
  • json

Type: string
Default: struct
Environment variable: TCM_LOG_DEFAULT_FORMAT
Command-line option: --log.default.format
log.default.output

The output used for the TCM log.

Possible values:

  • stdout
  • stderr
  • file
  • syslog

Type: string
Default: stdout
Environment variable: TCM_LOG_DEFAULT_OUTPUT
Command-line option: --log.default.output
log.default.no-colorized

Whether colorization is disabled for the stdout log.


Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_NO_COLORIZED
Command-line option: --log.default.no-colorized
log.default.file.name

The name of the TCM log file.


Type: string
Default: «»
Environment variable: TCM_LOG_DEFAULT_FILE_NAME
Command-line option: --log.default.file.name
log.default.file.maxsize

The maximum size (in bytes) of the TCM log file.


Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXSIZE
Command-line option: --log.default.file.maxsize
log.default.file.maxage

The maximum age of a TCM log file, in days.


Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXAGE
Command-line option: --log.default.file.maxage
log.default.file.maxbackups

The maximum number of rotated TCM log files to keep.


Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXBACKUPS
Command-line option: --log.default.file.maxbackups
log.default.file.compress

Indicates whether TCM compresses log files upon rotation.


Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_FILE_COMPRESS
Command-line option: --log.default.file.compress
log.default.syslog.protocol

The network protocol used for connecting to the syslog server. Typically, it’s tcp, udp, or unix. All possible values are listed in the documentation for Go’s net.Dial.


Type: string
Default: tcp
Environment variable: TCM_LOG_DEFAULT_SYSLOG_PROTOCOL
Command-line option: --log.default.syslog.protocol
log.default.syslog.output

The syslog server URI.


Type: string
Default: 127.0.0.1:5514
Environment variable: TCM_LOG_DEFAULT_SYSLOG_OUTPUT
Command-line option: --log.default.syslog.output
log.default.syslog.priority

The syslog severity level.


Type: string
Default: «»
Environment variable: TCM_LOG_DEFAULT_SYSLOG_PRIORITY
Command-line option: --log.default.syslog.priority
log.default.syslog.facility

The syslog facility.


Type: string
Default: «»
Environment variable: TCM_LOG_DEFAULT_SYSLOG_FACILITY
Command-line option: --log.default.syslog.facility
log.default.syslog.tag

The syslog tag.


Type: string
Default: «»
Environment variable: TCM_LOG_DEFAULT_SYSLOG_TAG
Command-line option: --log.default.syslog.tag
log.default.syslog.timeout

The timeout for connecting to the syslog server.


Type: time.Duration
Default: 10s
Environment variable: TCM_LOG_DEFAULT_SYSLOG_TIMEOUT
Command-line option: --log.default.syslog.timeout
log.outputs

An array of log outputs that TCM uses in addition to the default one that is defined by the log.default.* parameters. Each array item can include the parameters of the log.default group. If a parameter is skipped, its value is taken from log.default.


Type: []LogOuputConfig
Default: []
Environment variable: TCM_LOG_OUTPUTS
Command-line option: --log-outputs
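The fallback rule of log.outputs, where a skipped parameter takes its value from log.default, can be sketched in Go as follows. logOutput is a hypothetical, trimmed-down struct that treats an empty string as a skipped parameter. It is not the LogOuputConfig type itself, and the file path is a placeholder.

```go
package main

import "fmt"

// logOutput is a hypothetical, trimmed-down mirror of one log output entry;
// an empty string stands for "parameter skipped".
type logOutput struct {
	Level    string
	Format   string
	Output   string
	FileName string
}

// withDefaults fills every skipped field of out from the log.default values.
func withDefaults(out, def logOutput) logOutput {
	if out.Level == "" {
		out.Level = def.Level
	}
	if out.Format == "" {
		out.Format = def.Format
	}
	if out.Output == "" {
		out.Output = def.Output
	}
	if out.FileName == "" {
		out.FileName = def.FileName
	}
	return out
}

func main() {
	def := logOutput{Level: "INFO", Format: "struct", Output: "stdout"}
	// An extra output that only overrides the destination and the format.
	extra := logOutput{Output: "file", Format: "json", FileName: "/var/log/tcm/tcm.json"}
	fmt.Printf("%+v\n", withDefaults(extra, def))
	// {Level:INFO Format:json Output:file FileName:/var/log/tcm/tcm.json}
}
```

Boolean parameters such as no-colorized cannot use this approach, because false is indistinguishable from a skipped value. A real implementation would need pointer fields or an explicit presence flag.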

The storage section defines the parameters of the TCM backend store.

storage.provider

The type of the storage used for storing TCM configuration.

Possible values:

  • etcd
  • tarantool

Type: string
Default: etcd
Environment variable: TCM_STORAGE_PROVIDER
Command-line option: --storage.provider
storage.etcd.prefix

A prefix for the TCM configuration parameters in etcd.


Type: string
Default: «/tcm»
Environment variable: TCM_STORAGE_ETCD_PREFIX
Command-line option: --storage.etcd.prefix
storage.etcd.endpoints

An array of node URIs of the etcd cluster where the TCM configuration is stored, separated by semicolons (;).


Type: []string
Default: [«http://127.0.0.1:2379»]
Environment variable: TCM_STORAGE_ETCD_ENDPOINTS
Command-line option: --storage.etcd.endpoints
storage.etcd.dial-timeout

The timeout for establishing a connection to etcd.


Type: time.Duration
Default: 10s
Environment variable: TCM_STORAGE_ETCD_DIAL_TIMEOUT
Command-line option: --storage.etcd.dial-timeout
storage.etcd.auto-sync-interval

The interval for automatically updating the list of etcd endpoints with the current cluster members.


Type: time.Duration
Default: 0 (disabled)
Environment variable: TCM_STORAGE_ETCD_AUTO_SYNC_INTERVAL
Command-line option: --storage.etcd.auto-sync-interval
storage.etcd.dial-keep-alive-time

The time after which the client pings the etcd server to check whether the connection is alive.


Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_DIAL_KEEP_ALIVE_TIME
Command-line option: --storage.etcd.dial-keep-alive-time
storage.etcd.dial-keep-alive-timeout

The time that the client waits for a response to a keep-alive ping before closing the connection.


Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_DIAL_KEEP_ALIVE_TIMEOUT
Command-line option: --storage.etcd.dial-keep-alive-timeout
storage.etcd.bootstrap-timeout

A bootstrap timeout.


Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_BOOTSTRAP_TIMEOUT
Command-line option: --storage.etcd.bootstrap-timeout
storage.etcd.max-call-send-msg-size

The maximum size (in bytes) of a request that TCM sends to etcd.


Type: int
Default: 2097152
Environment variable: TCM_STORAGE_ETCD_MAX_CALL_SEND_MSG_SIZE
Command-line option: --storage.etcd.max-call-send-msg-size
storage.etcd.username

A username for accessing the etcd storage.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_USERNAME
Command-line option: --storage.etcd.username
storage.etcd.password

A password for accessing the etcd storage.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_PASSWORD
Command-line option: --storage.etcd.password
storage.etcd.password-file

A path to the file with a password for accessing the etcd storage.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_PASSWORD_FILE
Command-line option: --storage.etcd.password-file
storage.etcd.tls.enabled

Indicates whether TLS is enabled for etcd connections.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_ENABLED
Command-line option: --storage.etcd.tls.enabled
storage.etcd.tls.auto

Use generated certificates for etcd connections.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_AUTO
Command-line option: --storage.etcd.tls.auto
storage.etcd.tls.cert-file

A path to a TLS certificate file to use for etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_CERT_FILE
Command-line option: --storage.etcd.tls.cert-file
storage.etcd.tls.key-file

A path to a TLS private key file to use for etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_KEY_FILE
Command-line option: --storage.etcd.tls.key-file
storage.etcd.tls.trusted-ca-file

A path to a trusted CA certificate file to use for etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_TRUSTED_CA_FILE
Command-line option: --storage.etcd.tls.trusted-ca-file
storage.etcd.tls.client-cert-auth

Indicates whether client certificate authentication is enabled.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_CLIENT_CERT_AUTH
Command-line option: --storage.etcd.tls.client-cert-auth
storage.etcd.tls.crl-file

A path to the client certificate revocation list file.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_CRL_FILE
Command-line option: --storage.etcd.tls.crl-file
storage.etcd.tls.insecure-skip-verify

Skip checking client certificate in etcd connections.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_INSECURE_SKIP_VERIFY
Command-line option: --storage.etcd.tls.insecure-skip-verify
storage.etcd.tls.skip-client-san-verify

Skip verification of SAN field in client certificate for etcd connections.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_SKIP_CLIENT_SAN_VERIFY
Command-line option: --storage.etcd.tls.skip-client-san-verify
storage.etcd.tls.server-name

Name of the TLS server for etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_SERVER_NAME
Command-line option: --storage.etcd.tls.server-name
storage.etcd.tls.cipher-suites

TLS cipher suites for etcd connections. Possible values are the Golang tls.TLS_* constants.


Type: []uint16
Default: []
Environment variable: TCM_STORAGE_ETCD_TLS_CIPHER_SUITES
Command-line option: --storage.etcd.tls.cipher-suites
storage.etcd.tls.allowed-cn

An allowed common name for authentication in etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_ALLOWED_CN
Command-line option: --storage.etcd.tls.allowed-cn
storage.etcd.tls.allowed-hostname

An allowed TLS certificate name for authentication in etcd connections.


Type: string
Default: «»
Environment variable: TCM_STORAGE_ETCD_TLS_ALLOWED_HOSTNAME
Command-line option: --storage.etcd.tls.allowed-hostname
storage.etcd.tls.empty-cn

Whether the empty common name is allowed in etcd connections.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_EMPTY_CN
Command-line option: --storage.etcd.tls.empty-cn
storage.etcd.permit-without-stream

Whether keepalive pings can be sent to the etcd server without active streams.


Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_PERMIT_WITHOUT_STREAM
Command-line option: --storage.etcd.permit-without-stream

The storage.etcd.embed group defines the configuration of the embedded etcd cluster to use as a TCM backend store. This cluster can be used for development purposes when the production or testing etcd cluster is not available or not needed.

See also Embedded backend store.

storage.tarantool.prefix

A prefix for the TCM configuration parameters in the Tarantool-based configuration storage.


Type: string
Default: «/tcm»
Environment variable: TCM_STORAGE_TARANTOOL_PREFIX
Command-line option: --storage.tarantool.prefix
storage.tarantool.addr

The URI for connecting to the Tarantool-based configuration storage.


Type: string
Default: «unix/:/tmp/tnt_config_instance.sock»
Environment variable: TCM_STORAGE_TARANTOOL_ADDR
Command-line option: --storage.tarantool.addr
storage.tarantool.addrs

An array of the Tarantool-based configuration storage URIs.


Type: []string
Default: [«unix/:/tmp/tnt_config_instance.sock»]
Environment variable: TCM_STORAGE_TARANTOOL_ADDRS
Command-line option: --storage.tarantool.addrs
storage.tarantool.auth

An authentication method for the Tarantool-based configuration storage.

Possible values are the go-tarantool Auth constants:

  • AutoAuth (0)
  • ChapSha1Auth
  • PapSha256Auth

Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_AUTH
Command-line option: --storage.tarantool.auth
storage.tarantool.timeout

A request timeout for the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: time.Duration
Default: 0s
Environment variable: TCM_STORAGE_TARANTOOL_TIMEOUT
Command-line option: --storage.tarantool.timeout
storage.tarantool.reconnect

A timeout between reconnect attempts for the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: time.Duration
Default: 0s
Environment variable: TCM_STORAGE_TARANTOOL_RECONNECT
Command-line option: --storage.tarantool.reconnect
storage.tarantool.max-reconnects

The maximum number of reconnect attempts for the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_MAX_RECONNECTS
Command-line option: --storage.tarantool.max-reconnects
storage.tarantool.username

A username for connecting to the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_USERNAME
Command-line option: --storage.tarantool.username
storage.tarantool.password

A password for connecting to the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_PASSWORD
Command-line option: --storage.tarantool.password
storage.tarantool.password-file

A path to the file with a password for connecting to the Tarantool-based configuration storage.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_PASSWORD_FILE
Command-line option: --storage.tarantool.password-file
storage.tarantool.rate-limit

A rate limit for connecting to the Tarantool-based configuration storage.

See also go-tarantool.Opts.


Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_RATE_LIMIT
Command-line option: --storage.tarantool.rate-limit
storage.tarantool.rate-limit-action

An action to perform when the storage.tarantool.rate-limit is reached.

See also go-tarantool.Opts.


Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_RATE_LIMIT_ACTION
Command-line option: --storage.tarantool.rate-limit-action
storage.tarantool.concurrency

The number of separate mutexes for request queues and buffers inside a connection to the Tarantool TCM configuration storage.

See also go-tarantool.Opts.


Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_CONCURRENCY
Command-line option: --storage.tarantool.concurrency
storage.tarantool.skip-schema

Whether to skip loading the schema from the Tarantool TCM configuration storage.

See also go-tarantool.Opts.


Type: bool
Default: true
Environment variable: TCM_STORAGE_TARANTOOL_SKIP_SCHEMA
Command-line option: --storage.tarantool.skip-schema
storage.tarantool.transport

The connection type for the Tarantool TCM configuration storage.

See also go-tarantool.Opts.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_TRANSPORT
Command-line option: --storage.tarantool.transport
storage.tarantool.ssl.key-file

A path to a TLS private key file to use for connecting to the Tarantool TCM configuration storage.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_KEY_FILE
Command-line option: --storage.tarantool.ssl.key-file
storage.tarantool.ssl.cert-file

A path to an SSL certificate to use for connecting to the Tarantool TCM configuration storage.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CERT_FILE
Command-line option: --storage.tarantool.ssl.cert-file
storage.tarantool.ssl.ca-file

A path to a trusted CA certificate to use for connecting to the Tarantool TCM configuration storage.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CA_FILE
Command-line option: --storage.tarantool.ssl.ca-file
storage.tarantool.ssl.ciphers

A list of SSL cipher suites that can be used for connecting to the Tarantool TCM configuration storage. Possible values are listed in <uri>.params.ssl_ciphers.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CIPHERS
Command-line option: --storage.tarantool.ssl.ciphers
storage.tarantool.ssl.password

A password for an encrypted private SSL key to use for connecting to the Tarantool TCM configuration storage.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_PASSWORD
Command-line option: --storage.tarantool.ssl.password
storage.tarantool.ssl.password-file

A text file with passwords for encrypted private SSL keys to use for connecting to the Tarantool TCM configuration storage.

See also: Securing connections with SSL.


Type: string
Default: «»
Environment variable: TCM_STORAGE_TARANTOOL_SSL_PASSWORD_FILE
Command-line option: --storage.tarantool.ssl.password-file
storage.tarantool.required-protocol-info.auth

An authentication method for the Tarantool TCM configuration storage.

Possible values are the go-tarantool Auth constants:

  • AutoAuth (0)
  • ChapSha1Auth
  • PapSha256Auth

See also go-tarantool.ProtocolInfo.


Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_AUTH
Command-line option: --storage.tarantool.required-protocol-info.auth
storage.tarantool.required-protocol-info.version

A Tarantool protocol version.

See also go-tarantool.ProtocolInfo.


Type: uint64
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_VERSION
Command-line option: --storage.tarantool.required-protocol-info.version
storage.tarantool.required-protocol-info.features

An array of Tarantool protocol features.

See also go-tarantool.ProtocolInfo.


Type: []int
Default: []
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_FEATURES
Command-line option: --storage.tarantool.required-protocol-info.features

The storage.tarantool.embed group parameters define the configuration of the embedded Tarantool cluster to use as a TCM backend store. This cluster can be used for development purposes when the production or testing cluster is not available or not needed.

See also Embedded backend store.

The addon section defines settings related to TCM add-ons.

addon.enabled

Whether to enable the add-on functionality in TCM.


Type: bool
Default: false
Environment variable: TCM_ADDON_ENABLED
Command-line option: --addon.enabled
addon.addons-dir

The directory from which TCM takes add-ons.


Type: string
Default: addons
Environment variable: TCM_ADDON_ADDONS_DIR
Command-line option: --addon.addons-dir
addon.max-upload-size

The maximum size (in bytes) of an add-on to upload to TCM.


Type: int64
Default: 104857600
Environment variable: TCM_ADDON_MAX_UPLOAD_SIZE
Command-line option: --addon.max-upload-size
addon.dev-addons-dir

Additional add-on directories for development purposes, separated by semicolons (;).


Type: []string
Default: []
Environment variable: TCM_ADDON_DEV_ADDONS_DIR
Command-line option: --addon.dev-addons-dir

The limits section defines limits on various TCM objects and relations between them.

limits.users-count

The maximum number of users in TCM.


Type: int
Default: 1000
Environment variable: TCM_LIMITS_USERS_COUNT
Command-line option: --limits.users-count
limits.clusters-count

The maximum number of clusters in TCM.


Type: int
Default: 10
Environment variable: TCM_LIMITS_CLUSTERS_COUNT
Command-line option: --limits.clusters-count
limits.roles-count

The maximum number of roles in TCM.


Type: int
Default: 100
Environment variable: TCM_LIMITS_ROLES_COUNT
Command-line option: --limits.roles-count
limits.webhooks-count

The maximum number of webhooks in TCM.


Type: int
Default: 200
Environment variable: TCM_LIMITS_WEBHOOKS_COUNT
Command-line option: --limits.webhooks-count
limits.user-secrets-count

The maximum number of secrets that a TCM user can have.


Type: int
Default: 10
Environment variable: TCM_LIMITS_USER_SECRETS_COUNT
Command-line option: --limits.user-secrets-count
limits.user-websessions-count

The maximum number of open sessions that a TCM user can have.


Type: int
Default: 10
Environment variable: TCM_LIMITS_USER_WEBSESSIONS_COUNT
Command-line option: --limits.user-websessions-count
limits.linked-cluster-users

The maximum number of clusters to which a single user can have access.


Type: int
Default: 10
Environment variable: TCM_LIMITS_LINKED_CLUSTER_USERS
Command-line option: --limits.linked-cluster-users
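For illustration, a limits section that raises some limits above their defaults could look as follows. The values are arbitrary examples, and the nested-key layout is assumed to follow the other tcm.yaml examples in this documentation.

```yaml
# tcm.yaml (sketch): adjust TCM object limits
limits:
    users-count: 2000            # default: 1000
    clusters-count: 20           # default: 10
    roles-count: 100             # default value
    webhooks-count: 200          # default value
    user-secrets-count: 10       # default value
    user-websessions-count: 5    # default: 10
    linked-cluster-users: 20     # default: 10
```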

The security section defines the security parameters of TCM.

security.auth

Authentication methods available for logging into TCM.

Possible values:

  • local
  • ldap

Type: []string
Default: [local]
Environment variable: TCM_SECURITY_AUTH
Command-line option: --security.auth
security.hash-cost

A hash cost for hashing users' passwords.


Type: int
Default: 12
Environment variable: TCM_SECURITY_HASH_COST
Command-line option: --security.hash-cost
security.encryption-key

An encryption key for passwords used by TCM for accessing Tarantool and etcd clusters.


Type: string
Default: ""
Environment variable: TCM_SECURITY_ENCRYPTION_KEY
Command-line option: --security.encryption-key
security.encryption-key-file

A path to the file with the encryption key for passwords used by TCM for accessing Tarantool and etcd clusters.


Type: string
Default: ""
Environment variable: TCM_SECURITY_ENCRYPTION_KEY_FILE
Command-line option: --security.encryption-key-file
security.bootstrap-password

A password for the first login of the admin user. Only for testing purposes.


Type: string
Default: ""
Environment variable: TCM_SECURITY_BOOTSTRAP_PASSWORD
Command-line option: --security.bootstrap-password
security.bootstrap-api-token

A default API token for the admin user. Only for testing purposes.


Type: string
Default: ""
Environment variable: TCM_SECURITY_BOOTSTRAP_API_TOKEN
Command-line option: --security.bootstrap-api-token
security.integrity-check

Whether to check the digital signature. If true, an error is raised when an incorrect signature is detected.


Type: bool
Default: false
Environment variable: TCM_SECURITY_INTEGRITY_CHECK
Command-line option: --security.integrity-check
security.signature-private-key-file

A path to a file with the private key to sign TCM data.


Type: string
Default: ""
Environment variable: TCM_SECURITY_SIGNATURE_PRIVATE_KEY_FILE
Command-line option: --security.signature-private-key-file
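A sketch of a security section that allows both local and LDAP logins and reads the encryption key from a file. The key file path is hypothetical, and the []string value of auth is assumed to be written as a YAML list. Presumably, only one of encryption-key and encryption-key-file needs to be set; keeping the key in a separate file avoids storing it in the main configuration.

```yaml
# tcm.yaml (sketch): authentication and password protection settings
security:
    auth:
        - local
        - ldap
    hash-cost: 12
    # Hypothetical path to a file that contains the encryption key
    encryption-key-file: /etc/tcm/encryption.key
```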

mode

The TCM mode: production, development, or test.


Type: string
Default: production
Environment variable: TCM_MODE
Command-line option: --mode

The feature section defines feature flags that enable optional TCM functionality and integrations.

feature.ttgraph

Whether Tarantool Graph DB integration is enabled.


Type: bool
Default: false
Environment variable: TCM_FEATURE_TTGRAPH
Command-line option: --feature.ttgraph
feature.column-store

Whether Tarantool Column Store integration is enabled.


Type: bool
Default: false
Environment variable: TCM_FEATURE_COLUMN_STORE
Command-line option: --feature.column-store
feature.tqe

Whether Tarantool Queue Enterprise integration is enabled.


Type: bool
Default: false
Environment variable: TCM_FEATURE_TQE
Command-line option: --feature.tqe
feature.api-token

Whether the use of API tokens is enabled.


Type: bool
Default: false
Environment variable: TCM_FEATURE_API_TOKEN
Command-line option: --feature.api-token
feature.tuples

Whether the Tuples tab is enabled in the TCM web interface.


Type: bool
Default: false
Environment variable: TCM_FEATURE_TUPLES
Command-line option: --feature.tuples
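For example, a TCM instance used for development, with API tokens and the Tuples tab enabled, could use the settings below. This is a sketch: mode is a top-level option, while the flags are nested under feature in the same way as in the tcm.yaml example in the TCM 1.6 release notes.

```yaml
# tcm.yaml (sketch): development mode with selected features enabled
mode: development
feature:
    api-token: true
    tuples: true
    # Integrations stay disabled unless explicitly enabled
    ttgraph: false
    column-store: false
    tqe: false
```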

The initial-settings group defines entities that are created automatically upon the first TCM startup.

See also Initial settings.

Important

The initial-settings.* configuration options can be set in the YAML configuration file only. There are no environment variables or command-line options for them.

initial-settings.clusters

An array of clusters to create in TCM automatically upon the first startup.

See also Initial settings.


Type: []Cluster
Default: []
initial-settings.clusters.<cluster>.id

Cluster ID. Skip this option to generate an ID automatically. Specify the value 00000000-0000-0000-0000-000000000000 to customize the default cluster upon TCM startup.


Type: string
Default: "" (ID is generated automatically)
initial-settings.clusters.<cluster>.name

Cluster name.


Type: string
Default: ""
initial-settings.clusters.<cluster>.description

Cluster description.


Type: string
Default: ""
initial-settings.clusters.<cluster>.color

A color to highlight the cluster in TCM. Possible values:

  • dark
  • gray
  • red
  • pink
  • grape
  • violet
  • indigo
  • blue
  • cyan
  • green
  • lime
  • yellow
  • orange
  • teal
  • empty string (no color)

Type: string
Default: "" (no color)
initial-settings.clusters.<cluster>.urls

URLs of additional services for the cluster. See also Adding a new cluster.


Type: []ClusterUrl
Default: []
initial-settings.clusters.<cluster>.<url>.label

URL label to show in TCM. Typically, this is the linked service name.


Type: string
Default: ""
initial-settings.clusters.<cluster>.<url>.url

The URL address of the linked service.


Type: string
Default: ""
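A sketch of initial-settings that customizes the default cluster upon the first startup by using the reserved ID 00000000-0000-0000-0000-000000000000 described above. The name, description, and linked service URL are hypothetical, and the urls entries are assumed to be a YAML list of objects with label and url keys.

```yaml
# tcm.yaml (sketch): customize the default cluster on first startup
initial-settings:
    clusters:
        # The reserved ID refers to the default cluster
        - id: "00000000-0000-0000-0000-000000000000"
          name: default-cluster
          description: Cluster for local development
          color: blue
          urls:
              # Hypothetical linked service shown in the TCM interface
              - label: Grafana
                url: http://grafana.example.com:3000
```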
initial-settings.clusters.<cluster>.storage-connection.provider

The type of the storage used for storing the cluster configuration.

Possible values:

  • etcd
  • tarantool
  • empty string (undefined)

Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.endpoints

An array of node URIs of the etcd cluster where the Tarantool cluster configuration is stored.


Type: []string
Default: []
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.auto-sync-interval

The interval at which the etcd client automatically updates its list of endpoints with the latest etcd cluster members.


Type: time.Duration
Default: 0 (disabled)
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-timeout

The timeout for establishing a connection to etcd.


Type: time.Duration
Default: 0 (not set)
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-keep-alive-time

The time after which the etcd client pings the server to check whether the connection is alive.


Type: time.Duration
Default: 0 (not set)
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-keep-alive-timeout

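The time that the etcd client waits for a response to a keep-alive ping. If no response is received within this time, the connection is closed.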


Type: time.Duration
Default: 0 (not set)
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.max-call-send-msg-size

The maximum size (in bytes) of a request sent to the cluster's etcd configuration storage.


Type: int
Default: 2097152
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.max-call-recv-msg-size

The maximum size (in bytes) of a response received from the cluster's etcd configuration storage.


Type: int
Default: 0 (unlimited)
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.username

A username for accessing the cluster’s etcd storage.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.password

A password for accessing the cluster’s etcd storage.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.reject-old-cluster

Whether the etcd client should refuse to connect to an outdated etcd cluster.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.permit-without-stream

Whether keepalive pings can be sent to the etcd server when there are no active streams.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.prefix

A prefix for the cluster configuration parameters in etcd.


Type: string
Default: ""
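A sketch of a cluster whose configuration is stored in etcd. The endpoint, prefix, and credentials are hypothetical. Since the timeout and interval options have the Go time.Duration type, the example assumes they accept Go duration strings such as 5s or 1m.

```yaml
# tcm.yaml (sketch): cluster with an etcd configuration storage
initial-settings:
    clusters:
        - name: my-cluster
          storage-connection:
              provider: etcd
              etcd-connection:
                  endpoints:
                      - http://127.0.0.1:2379
                  # Prefix under which the cluster configuration is stored in etcd
                  prefix: /myapp
                  username: etcd-user
                  password: etcd-password
                  dial-timeout: 5s
                  auto-sync-interval: 1m
```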
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.enabled

Indicates whether TLS is enabled for connections to the cluster’s etcd storage.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.cert-file

A path to a TLS certificate file to use for etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.key-file

A path to a TLS private key file to use for etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.trusted-ca-file

A path to a trusted CA certificate file to use for etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.client-cert-auth

Indicates whether client certificate authentication is enabled.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.crl-file

A path to the client certificate revocation list file.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.insecure-skip-verify

Whether to skip checking the client certificate in etcd connections.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.skip-client-san-verify

Whether to skip verification of the SAN field in the client certificate for etcd connections.


Type: bool
Default: false
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.server-name

Name of the TLS server for etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.cipher-suites

TLS cipher suites for etcd connections. Possible values are the Golang tls.TLS_* constants.


Type: []uint16
Default: []
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.allowed-cn

An allowed common name for authentication in etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.allowed-hostname

An allowed TLS certificate name for authentication in etcd connections.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.empty-cn

Whether an empty common name is allowed in etcd connections.


Type: bool
Default: false
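If the etcd storage requires TLS, the tls subsection of etcd-connection can be filled in as shown below. This is a sketch: the certificate paths, endpoint, and server name are hypothetical and must match the certificates issued for your etcd cluster.

```yaml
# tcm.yaml (sketch): TLS for the connection to the etcd storage
initial-settings:
    clusters:
        - name: my-cluster
          storage-connection:
              provider: etcd
              etcd-connection:
                  endpoints:
                      - https://etcd.example.com:2379
                  prefix: /myapp
                  tls:
                      enabled: true
                      cert-file: /etc/tcm/certs/client.crt
                      key-file: /etc/tcm/certs/client.key
                      trusted-ca-file: /etc/tcm/certs/ca.crt
                      server-name: etcd.example.com
```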
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.username

A username for connecting to the cluster’s Tarantool-based configuration storage.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.password

A password for connecting to the cluster’s Tarantool-based configuration storage.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.endpoints

An array of the cluster’s Tarantool-based configuration storage URIs.


Type: []string
Default: []
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.method

An authentication method for the cluster’s Tarantool-based configuration storage.

Possible values are the go-tarantool Auth constants:

  • AutoAuth (0)
  • ChapSha1Auth
  • PapSha256Auth

Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.prefix

A prefix for the cluster configuration parameters in the Tarantool-based configuration storage.


Type: string
Default: ""
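A sketch of a cluster that keeps its configuration in a Tarantool-based configuration storage. The URIs, credentials, and prefix are hypothetical. The method option is left out because this reference does not show how the go-tarantool Auth constants are spelled as YAML strings.

```yaml
# tcm.yaml (sketch): cluster with a Tarantool-based configuration storage
initial-settings:
    clusters:
        - name: my-cluster
          storage-connection:
              provider: tarantool
              tarantool-connection:
                  endpoints:
                      - 127.0.0.1:4401
                      - 127.0.0.1:4402
                  username: sampleuser
                  password: "123456"
                  prefix: /myapp
```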
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.key-file

A path to a TLS private key file to use for connecting to the cluster’s Tarantool-based configuration storage.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.cert-file

A path to an SSL certificate to use for connecting to the cluster’s Tarantool-based configuration storage.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.ca-file

A path to a trusted CA certificate to use for connecting to the cluster’s Tarantool-based configuration storage.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.ciphers

A list of SSL cipher suites that can be used for connecting to the cluster’s Tarantool-based configuration storage. Possible values are listed in <uri>.params.ssl_ciphers.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.enabled

A password for an encrypted private SSL key to use for connecting to the cluster’s Tarantool-based configuration storage.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.password-file

A text file with passwords for encrypted private SSL keys to use for connecting to the cluster’s Tarantool-based configuration storage.

See also: Securing connections with SSL.


Type: string
Default: ""
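A sketch of SSL settings for the connection to a Tarantool-based configuration storage. All file paths are hypothetical. The password-file option is only relevant when the private key is encrypted.

```yaml
# tcm.yaml (sketch): SSL for the Tarantool-based configuration storage
initial-settings:
    clusters:
        - name: my-cluster
          storage-connection:
              provider: tarantool
              tarantool-connection:
                  endpoints:
                      - 127.0.0.1:4401
                  prefix: /myapp
                  ssl:
                      key-file: /etc/tcm/certs/storage.key
                      cert-file: /etc/tcm/certs/storage.crt
                      ca-file: /etc/tcm/certs/ca.crt
                      # Needed only if the private key is encrypted
                      password-file: /etc/tcm/certs/key_passwords.txt
```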
initial-settings.clusters.<cluster>.tarantool-connection.username

A username for connecting to the cluster instances.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.password

A password for connecting to the cluster instances.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.method

An authentication method for connecting to the cluster.

Possible values are the go-tarantool Auth constants:

  • AutoAuth (0)
  • ChapSha1Auth
  • PapSha256Auth

Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.timeout

The cluster request timeout.


Type: time.Duration
Default: 0 (not set)
initial-settings.clusters.<cluster>.tarantool-connection.rate-limit

The cluster rate limit.


Type: uint
Default: 0 (not set)
initial-settings.clusters.<cluster>.tarantool-connection.ssl.key-file

A path to a TLS private key file to use for connecting to the cluster instances.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.ssl.cert-file

A path to an SSL certificate to use for connecting to the cluster instances.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.ssl.ca-file

A path to a trusted CA certificate to use for connecting to the cluster instances.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.ssl.ciphers

A list of SSL cipher suites that can be used for connecting to the cluster instances. Possible values are listed in <uri>.params.ssl_ciphers.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.ssl.enabled

A password for an encrypted private SSL key to use for connecting to the cluster instances.

See also: Securing connections with SSL.


Type: string
Default: ""
initial-settings.clusters.<cluster>.tarantool-connection.ssl.password-file

A text file with passwords for encrypted private SSL keys to use for connecting to the cluster instances.

See also: Securing connections with SSL.


Type: string
Default: ""
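Besides the configuration storage connection, TCM needs credentials for connecting to the cluster instances themselves. The sketch below combines both. The user name, password, and certificate paths are hypothetical, and timeout is assumed to accept a Go duration string because its type is time.Duration.

```yaml
# tcm.yaml (sketch): storage connection plus connection to cluster instances
initial-settings:
    clusters:
        - name: my-cluster
          storage-connection:
              provider: etcd
              etcd-connection:
                  endpoints:
                      - http://127.0.0.1:2379
                  prefix: /myapp
          # Used by TCM to connect to the cluster instances
          tarantool-connection:
              username: client
              password: secret
              timeout: 10s
              ssl:
                  key-file: /etc/tcm/certs/instance.key
                  cert-file: /etc/tcm/certs/instance.crt
                  ca-file: /etc/tcm/certs/ca.crt
```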

Integrity check

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

TCM supports an integrity check mechanism. It verifies the digital signature of centralized configuration files and ensures that TCM applies only configurations signed with a trusted private key.

This mechanism allows TCM to:

Parameter                              Description                                             Type     Default
security.integrity-check               Enables signature verification                          bool     false
security.signature-private-key-file    Path to the private key for signing the configuration   string   ""

You can enable the integrity check in the TCM configuration file:

# tcm.yaml
security:
    integrity-check: true
    signature-private-key-file: /etc/tcm/private_key.pem

Note

The integrity-check-period option works only when tt and Tarantool are used together: tt periodically checks the integrity of the running instance. TCM does not use this option because it only loads configurations and verifies their signatures without interacting with the database directly. Also, TCM cannot stop Tarantool if problems are detected. Only tt can do this by starting Tarantool with the --integrity-check and --integrity-check-period options. To learn more about integrity checks in tt, see the tt documentation.

Tarantool Cluster Manager releases

Enterprise Edition

Tarantool Cluster Manager is a part of the Enterprise Edition.

This section contains the list of Tarantool Cluster Manager releases along with descriptions of their key changes.

For information about Tarantool releases, see Release notes.

Series    First release date    Versions
1.7       February 17, 2026     1.7.0
1.6       February 6, 2026      1.6.0
1.5       August 28, 2025       1.5.3, 1.5.2, 1.5.1, 1.5.0
1.4       June 9, 2025          1.4.0
1.3       March 14, 2025        1.3.1, 1.3.0
1.2       July 30, 2024         1.2.2, 1.2.1, 1.2.0
1.1       May 16, 2024          1.1.0
1.0       December 23, 2023     1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0

Tarantool Cluster Manager 1.7

Release date: February 11, 2026

Latest release in series: 1.7.1

This release introduces control over automatic default cluster creation, improves LDAP authentication handling, and enhances user management capabilities in the UI.

You can now control automatic creation of the default cluster using one of the following options:

This allows administrators to explicitly enable or disable default cluster auto-creation depending on deployment requirements.

Error handling has been improved for LDAP authentication when the Automatically add non-existent users option is disabled.

It is also now possible to create a user via the UI with LDAP authentication enabled, simplifying user management in LDAP-based environments.

Tarantool Cluster Manager 1.6

Release date: February 6, 2026

Latest release in series: 1.6.0

This release introduces support for Tarantool DataBase (TDB) workers in the cluster dashboard with integrated health monitoring, adds TLS configuration guides for secure connections, improves audit log configuration and validation, and introduces a feature flag for managing the Tuples tab. It also includes important fixes for LDAP authentication, TLS configuration parsing, and a memory leak in SSL cluster connections.

TCM adds support for TDB workers in the cluster Stateboard tab with integrated health monitoring and visibility.

TDB workers are supported starting from TDB 3.1.0. Workers are automatically discovered from etcd and continuously monitored via dedicated health check endpoints. Their metrics are proxied through TCM and exposed individually, allowing detailed operational insight.

The interface displays workers directly in the stateboard with clear status indicators and a details panel. Each worker can be in one of four statuses: healthy, degraded, unhealthy, or no connection, helping administrators quickly detect and diagnose issues.

To learn more, see TDB documentation.

The audit log configuration is now safer and more predictable.

Protocol values are validated during startup. If an invalid protocol is specified, the system automatically falls back to default settings and emits a warning. Audit log parameters can be set in advance at the system bootstrap stage by specifying them in the auditlog field of the initial-settings section in the configuration file. These settings will be applied automatically if the audit log has not been configured yet.

A feature flag has been introduced to control the visibility of the Tuples tab in the Explorer interface.

The tab is displayed only when the corresponding feature flag is enabled and the CRUD module is available. The flag can be configured either in the TCM configuration file or via command-line arguments at startup.

To enable the Tuples tab in the TCM configuration file:

# tcm.yaml
feature:
    tuples: true

This release also includes several fixes that improve system stability and security.

LDAP authentication behavior has been adjusted, including logout handling, anonymous binding to Active Directory, and the preservation of authorization method settings after a restart. TLS configuration parsing has been fixed to ensure cipher suites and curve preferences are correctly recognized in both configuration files and command-line arguments. Missing schema attributes for cluster configuration have been added, and configuration validation feedback in the editor has been improved.

Additionally, a memory leak that could occur when SSL-enabled cluster connections became unavailable has been resolved, resulting in more stable cluster operation.

Tarantool Cluster Manager 1.5

Release date: August 28, 2025

Latest release in series: 1.5.3

Tarantool Cluster Manager 1.5 introduces a new UI page for configuring TCF clusters and includes important fixes that enhance reliability, compliance, and user experience.

TCM 1.5.0 adds a dedicated settings page for managing TCF cluster parameters directly through the web interface. You can now retrieve and modify key fields that define cluster behavior and failover logic without editing configuration files manually.

The new page allows configuring the following parameters:

To make tests more efficient and predictable, all occurrences of time.Sleep were replaced with require.Eventually. This change improves test speed and reliability. Additionally, HTTP checks and tuple insertion operations in tests were updated for better performance and accuracy.

Since version 1.5.1, TCM includes a new migrations section with a duration field. This field allows specifying the maximum execution time for long-running migrations, preventing them from being interrupted by the default timeout.

TCM 1.5.3 improves overall cluster stability, fault tolerance, and configuration handling. The cluster now automatically reconnects after transient failures and continuously monitors node health to detect degraded or unavailable instances faster. Configuration changes are applied correctly without disrupting cluster operation. Quorum and health check logic were reworked to better tolerate partial failures. Unavailable nodes are now excluded from quorum calculations, preventing cluster-wide outages when only a minority of nodes becomes unavailable.

The following issues were fixed:

TCM 1.5.3 makes migration handling safer and more predictable. Applied migrations are now automatically locked from editing. Executed migrations are clearly marked as read-only in the interface, and the UI displays an explanatory message to indicate that modifications are not allowed. This prevents accidental changes to already executed migrations and ensures migration history consistency.

This release includes multiple fixes across different modules:

Tarantool Cluster Manager 1.4

Release date: June 9, 2025

Latest release in series: 1.4.0

Tarantool Cluster Manager 1.4.0 improves LDAP support and includes several enhancements and fixes aimed at improving authentication flexibility and system stability.

TCM 1.4.0 significantly enhances the experience of working with LDAP authentication. The web interface now includes a visual confirmation pop-up when a connection to an LDAP server is successfully established. This helps administrators quickly verify the correctness of LDAP settings without checking logs or reloading the page.

The authentication settings now support switching between local and LDAP methods directly in the interface, making it easier to configure hybrid or alternative access scenarios.

The LDAP configuration form has been simplified:

More about LDAP authentication.

In version 1.4.0, TCM improves the behavior of the etcd client. Previously, if one of the etcd nodes became unresponsive while keeping its port open, the client could hang indefinitely. This issue has been fixed to ensure better resilience of etcd-based components.

Additionally, the audit log mechanism now correctly creates log files in the directory of the running application binary. To learn more, see Audit log configuration.

Tarantool Cluster Manager 1.3

Release date: March 14, 2025

Latest release in series: 1.3.1

Tarantool Cluster Manager 1.3.0 enhances the TCF integration page with minor bug fixes and functional improvements. Below is an overview of key updates.

Starting from version 1.3.0, TCM provides additional actions for managing TCF clusters through the web interface. You can now use promote and demote operations directly on the TCF page without switching to external tools. Also, the TCF page is now disabled by default and must be explicitly enabled if needed. In addition, TCM now supports connections to multiple gRPC servers, which improves integration with distributed cluster infrastructures.

TCM 1.3.0 introduces a new approach to pagination in the Explorer. Instead of using a tuple, the interface now relies on pointers for navigating result pages. When sending data to the frontend, binary values (varbinary) are now automatically encoded in base64.

Additionally, TCM fixes an issue where queries using a datetime key could result in type mismatch errors due to incorrect index part handling.

In this version, TCM improves its interaction with etcd-based data sources. Tabs that use etcd for updating can now be refreshed even if some of the etcd endpoints are temporarily unavailable. To improve stability, a check was added to detect and correctly handle empty tuple arrays, preventing unexpected errors when processing empty data.

TCM 1.3.0 includes improvements to how search expressions are parsed in CRUD explorer queries. The CRUD explorer is located on the Tuples page. This release also introduces dedicated tests for the relevant components to ensure consistent behavior in future versions.

TCM 1.3.1 delivers changes that were missing from the previous release. It also fixes several minor issues flagged by the Svacer linter to improve overall code quality and maintainability.

Tarantool Cluster Manager 1.2

Release date: July 30, 2024

Latest release in series: 1.2.2

Tarantool Cluster Manager 1.2 introduces new features that extend its cluster management capabilities. Below is an overview of its key updates.

TCM 1.2 introduces the ability to manage Tarantool users on connected clusters. Previously, you could manage Tarantool users only through the Lua API (box.schema submodule) or cluster configuration. Now you can create, edit, and delete users and roles on each instance of a Tarantool cluster through the TCM web interface.

The tools for managing Tarantool users on a cluster instance are located on the Users tab of the instance page.

Learn more about managing Tarantool users from TCM in Managing cluster users and roles.

Since version 1.2.0, TCM includes a page for editing and executing migrations on connected clusters. The new page Migrations in the Cluster page group provides a text editor where you can write migration scripts in Lua and apply them to the cluster.

Learn more about migrations in Tarantool Migrations.

Since version 1.2.2, TCM provides a web interface for managing cluster security settings on the Security page in the Cluster group.

Learn more about managing cluster security from TCM in Security settings.

Since version 1.2.2, TCM includes a page for managing clusters that run within Tarantool Clusters Federation.

Learn more about working with TCF in TCM in TCF integration.

Tarantool Cluster Manager 1.1

Release date: May 16, 2024

Latest release in series: 1.1.0

Tarantool Cluster Manager 1.1 introduces a number of new features that extend and improve its cluster management capabilities. Below is an overview of its key updates.

An important update of TCM 1.1.0 is a set of features that enable access to clusters' stored data.

The instance space explorer shows all spaces that exist on an instance, including system spaces. On its pages, you can view and edit the stored data. To open the instance explorer, find the instance on the cluster stateboard and click its name to open its details page. Then click Explorer in the Actions menu in the top right corner.

In the development mode, the instance explorer also includes the schema editor. It allows you to add new and edit existing spaces.

For clusters that use the CRUD module, there is also the CRUD explorer that enables access to data in user spaces across the entire cluster. The CRUD explorer is located on the Tuples page.

TCM’s access control list (ACL) enables control over user access to particular spaces and stored functions in the web interface.

For each user that has access to a cluster, you can enable the use of ACL on this cluster. This restricts this user’s access to the cluster’s spaces and functions unless they are explicitly specified in the ACL. The ACL must contain an entry for each such space and function.

Users with ACL off have access to all spaces and functions on clusters according to their cluster permissions.

The tools for managing ACL are located on the new ACL page.

TCM 1.1 supports token authentication of external requests. Users can generate API tokens in their user settings dialog. An API token has the same permissions as its creator.

TCM 1.1 extends the functionality of the cluster stateboard to improve the cluster management experience. Here are the key updates of the stateboard:

The instance management dialog has been extended with new functions:

Starting from version 1.1.0, TCM displays metrics of connected clusters. You can view metrics in TCM one by one, visualizing them as charts or tables. The cluster metrics are shown on the new Cluster metrics page.

For more complex monitoring, you can use dedicated solutions, for example, Prometheus. It can integrate with TCM using the API tokens.

The cluster configuration editor now validates the configuration semantically. Previously, TCM was able to highlight the syntax errors in configurations, for example, incorrect spelling of option names or hierarchy. In TCM 1.1.0, the editor checks and highlights possible semantic issues, such as:

TCM 1.1.0 includes an interactive tutorial that takes new users through its main features and pages. It opens automatically after the first start.

Tarantool Cluster Manager 1.0

Release date: December 26, 2023

Latest release in series: 1.0.4

1.0 is the first public release series of Tarantool Cluster Manager. It was introduced as a part of the Tarantool EE 3.0 release. Below is an overview of key features of TCM 1.0.

TCM works as a standalone application. You can connect any number of Tarantool EE 3.0+ clusters to a single TCM instance and switch between them on the fly.

To connect a cluster to TCM, you need to provide the endpoint URLs and connection parameters of its centralized configuration storage (for example, etcd). To learn more, see Connecting clusters.

The cluster stateboard is the main TCM page that visualizes the information about the selected cluster:

From the stateboard, you can navigate to specific instances to view their details or connect to their interactive consoles.

To learn more, see Viewing cluster state.

TCM includes a visual editor for cluster configuration. It allows editing cluster configurations as a YAML file in the browser. Once you’re done editing the configuration, you can send the changes to the configuration storage in one click or save them locally to continue editing them later.

To learn more, see Configuring clusters.

TCM features its own role-based access control system. It defines users that can log into TCM and their permissions to perform various actions or access clusters in its web interface.

You can use built-in roles or create new ones with permissions you need. Users' access can be limited to specific clusters and operations on them, for example, editing the configuration or calling stored functions. To learn more, see Access control.

TCM also supports LDAP authentication.

TCM has a built-in audit logging mechanism. When enabled, it records information about events that occur in TCM and users' actions to dedicated audit log files. You can define events to write to the audit log and adjust logging parameters, such as filename, log rotation, or compression.

To learn more, see Audit log.

Interactive console

The interactive console is Tarantool’s basic command-line interface for entering requests and seeing results. It is what users see when they start the server without an instance file. The interactive console is often called the Lua console to distinguish it from the administrative console, but in fact it can handle both Lua and SQL input.

The majority of examples in this manual show what users see in the interactive console, for example:

-- Interactive console example with Lua input and YAML output --
tarantool> box.info().replication
---
- 1:
    id: 1
    uuid: a5d22f66-2d28-4a35-b78f-5bf73baf6c8a
    lsn: 0
...

The input language can be either Lua (default) or SQL. To change the input language, run \set language <language>, for example:

-- Set input language to SQL --
tarantool> \set language sql
---
- true
...

The delimiter can be changed to any character with \set delimiter <character>. By default, the delimiter is empty, which means the input does not need to end with a delimiter. For example, a common recommendation for SQL input is to use the semicolon delimiter:

-- Set ';' delimiter --
tarantool> \set delimiter ;
---
...

The output format can be either YAML (default) or Lua. To change the output format, run \set output <format>, for example:

-- Set output format Lua --
tarantool> \set output lua
true

The default YAML output format is the following:

tarantool> {key = 1}
---
- key: 1
...

The alternative Lua format for console output is the following:

tarantool> {key = 1}
{key = 1}

So, when an input is a Lua object description, the output in the Lua format equals it.

For the Lua output format, you can specify an end of statement symbol. It is added to the end of each output statement in the current session and can be used by scripts to parse the output. By default, the end of statement symbol is empty. You can change it to any character or character sequence. To set an end of statement symbol for the current session, run \set output lua,local_eos=<symbol>, for example:

-- Set output format Lua and '#' end of statement symbol --
tarantool> \set output lua,local_eos=#
true#

To switch back to the empty end of statement symbol:

-- Set output format Lua and empty end of statement symbol --
tarantool> \set output lua,local_eos=
true
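Since the end of statement symbol is meant for scripts that parse console output, here is a minimal sketch of such parsing in Lua. The helper name split_by_eos is hypothetical, and the sketch assumes that the script has already captured the Lua-format output as a single string; reading it from the console connection is out of scope here.

```lua
-- Hypothetical helper: split console output captured in the Lua format
-- into separate statements, using the symbol set with
-- \set output lua,local_eos=<symbol>.
local function split_by_eos(output, eos)
  -- With the default empty symbol there is nothing to split on.
  if eos == nil or eos == "" then
    return { output }
  end
  local statements = {}
  local start = 1
  while true do
    -- Plain search (4th argument true), so symbols such as '#' or '.'
    -- are not treated as Lua pattern characters.
    local first, last = string.find(output, eos, start, true)
    if first == nil then
      -- Text after the last symbol is an incomplete statement; ignore it.
      break
    end
    -- Trim surrounding whitespace, such as the newline after each statement.
    local stmt = output:sub(start, first - 1):match("^%s*(.-)%s*$")
    table.insert(statements, stmt)
    start = last + 1
  end
  return statements
end

-- Two results printed with the '#' symbol.
local parts = split_by_eos("true#\n{1, 2}#\n", "#")
-- parts holds two statements: "true" and "{1, 2}"
```

Trailing text without the symbol is deliberately dropped, because the symbol is what tells the script that a statement is complete.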

The YAML output has better readability. The Lua output can be reused in requests. The table below shows output examples in these formats compared with the MsgPack format, which is good for database storage.

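Type: scalar
  Lua input: 1
  Lua output: 1
  YAML output:
    ---
    - 1
    ...
  MsgPack storage: \x01

Type: scalar sequence
  Lua input: 1, 2, 3
  Lua output: 1, 2, 3
  YAML output:
    ---
    - 1
    - 2
    - 3
    ...
  MsgPack storage: \x01 \x02 \x03

Type: 2-element table
  Lua input: {1, 2}
  Lua output: {1, 2}
  YAML output:
    ---
    - - 1
      - 2
    ...
  MsgPack storage: \x92 \x01 \x02

Type: map
  Lua input: {key = 1}
  Lua output: {key = 1}
  YAML output:
    ---
    - key: 1
    ...
  MsgPack storage: \x81 \xa3 \x6b \x65 \x79 \x01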
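To see where the MsgPack storage bytes in the table above come from, here is a minimal standalone Lua sketch of a MsgPack encoder that covers only the small values used in the table: positive fixint numbers, short strings (fixstr), and small tables (fixarray or fixmap). The encode and hex names are hypothetical; in Tarantool itself, use the built-in msgpack module (msgpack.encode()) rather than code like this.

```lua
-- Minimal MsgPack encoder sketch: handles only integers 0..127
-- (positive fixint), strings shorter than 32 bytes (fixstr), and tables
-- with fewer than 16 elements (fixarray for sequences, fixmap otherwise).
local function encode(value)
  local t = type(value)
  if t == "number" then
    assert(value >= 0 and value <= 127 and value % 1 == 0,
           "sketch supports only positive fixint values")
    return string.char(value)
  elseif t == "string" then
    assert(#value < 32, "sketch supports only fixstr values")
    return string.char(0xa0 + #value) .. value
  elseif t == "table" then
    local parts = {}
    if #value > 0 then
      -- Sequence: fixarray header is 0x90 + number of elements.
      assert(#value < 16, "sketch supports only fixarray values")
      for _, v in ipairs(value) do
        table.insert(parts, encode(v))
      end
      return string.char(0x90 + #value) .. table.concat(parts)
    end
    -- Map: fixmap header is 0x80 + number of key/value pairs.
    local count = 0
    for k, v in pairs(value) do
      count = count + 1
      table.insert(parts, encode(k) .. encode(v))
    end
    assert(count < 16, "sketch supports only fixmap values")
    return string.char(0x80 + count) .. table.concat(parts)
  end
  error("unsupported type: " .. t)
end

-- Render bytes in the \xNN notation used in the table.
local function hex(bytes)
  local out = {}
  for i = 1, #bytes do
    out[#out + 1] = string.format("\\x%02x", bytes:byte(i))
  end
  return table.concat(out, " ")
end

print(hex(encode(1)))                          -- \x01
print(hex(encode(1) .. encode(2) .. encode(3))) -- \x01 \x02 \x03
print(hex(encode({1, 2})))                     -- \x92 \x01 \x02
print(hex(encode({key = 1})))                  -- \x81 \xa3 \x6b \x65 \x79 \x01
```

Note that the scalar sequence 1, 2, 3 is three separate MsgPack values, while {1, 2} is a single array with a one-byte header.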

The console parameters of a Tarantool instance can also be changed from another instance using the console built-in module functions.

Since 2.10.0.

Keyboard shortcut Effect
CTRL+C Discard current input with the SIGINT signal in the console mode and jump to a new line with a default prompt.
CTRL+D Quit Tarantool interactive console.

Important

Keep in mind that the CTRL+C shortcut shuts Tarantool down if a command is currently running in the console. The SIGINT signal stops an instance running in daemon mode.

LuaJIT memory profiler

Since version 2.7.1, Tarantool has a built‑in module called misc.memprof that implements a LuaJIT memory profiler (further in this section we call it the profiler for short). The profiler provides a memory allocation report that helps analyze Lua code and find the places that put the most pressure on the Lua garbage collector (GC).

The profiler usage involves two steps:

  1. Collecting a binary profile of allocations, reallocations, and deallocations in memory related to Lua (further, binary memory profile or binary profile for short).
  2. Parsing the collected binary profile to get a human-readable profiling report.

To collect a binary profile for a particular part of the Lua code, you need to place this part between two misc.memprof functions, namely, misc.memprof.start() and misc.memprof.stop(), and then execute the code in Tarantool.

Below is a chunk of Lua code named test.lua to illustrate this.

 1  -- Prevent allocations on traces.
 2  jit.off()
 3  local str, err = misc.memprof.start("memprof_new.bin")
 4  -- Lua doesn't create a new frame to call string.rep, and all allocations
 5  -- are attributed not to the append() function but to the parent scope.
 6  local function append(str, rep)
 7      return string.rep(str, rep)
 8  end
 9
10  local t = {}
11  for i = 1, 1e4 do
12      -- table.insert is the built-in function and all corresponding
13      -- allocations are reported in the scope of the main chunk.
14      table.insert(t,
15          append('q', i)
16      )
17  end
18  local str, err = misc.memprof.stop()

The Lua code for starting the profiler – as in line 3 in the test.lua example above – is:

local str, err = misc.memprof.start(FILENAME)

where FILENAME is the name of the binary file where profiling events are written.

If the operation fails, for example if it is not possible to open a file for writing or if the profiler is already running, misc.memprof.start() returns nil as the first result, an error-message string as the second result, and a system-dependent error code number as the third result.

If the operation succeeds, misc.memprof.start() returns true.

The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:

local str, err = misc.memprof.stop()

If the operation fails, for example if there is an error when the file descriptor is being closed or if there is a failure during reporting, misc.memprof.stop() returns nil as the first result, an error-message string as the second result, and a system-dependent error code number as the third result.

If the operation succeeds, misc.memprof.stop() returns true.
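Because both functions report failures through their return values instead of raising errors, it helps to check them every time. Below is a minimal sketch of a wrapper that does this and also stops the profiler if the profiled code raises an error. The profile() helper is hypothetical; it assumes that the profiler object follows the misc.memprof convention described above, so a stub can stand in for misc.memprof outside Tarantool.

```lua
-- Hypothetical helper: run fn(...) between profiler.start(filename) and
-- profiler.stop(). The profiler object is assumed to follow the
-- misc.memprof convention: true on success; nil, an error message, and
-- an error code on failure.
local function profile(profiler, filename, fn, ...)
  local ok, err = profiler.start(filename)
  if not ok then
    return nil, "failed to start profiler: " .. tostring(err)
  end
  -- pcall() ensures that the profiler is stopped even if fn() raises.
  local run_ok, run_err = pcall(fn, ...)
  local stop_ok, stop_err = profiler.stop()
  if not run_ok then
    return nil, "profiled code failed: " .. tostring(run_err)
  end
  if not stop_ok then
    return nil, "failed to stop profiler: " .. tostring(stop_err)
  end
  return true
end

-- Tarantool-only usage with a payload similar to test.lua.
if misc and misc.memprof then
  jit.off() -- prevent allocations on traces, as in test.lua
  local t = {}
  local ok, err = profile(misc.memprof, "memprof_new.bin", function()
    for i = 1, 1e4 do
      table.insert(t, string.rep("q", i))
    end
  end)
  print(ok, err)
end
```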

To generate the binary memory profile (memprof_new.bin in the test.lua example above), execute the code in Tarantool:

$ tarantool test.lua

Tarantool collects the allocation events in memprof_new.bin, puts the file in its working directory, and closes the session.

The test.lua code example above also illustrates the memory allocation logic in some cases that are important to understand for reading and analyzing a profiling report:

  • Line 2: It is recommended to switch the JIT compilation off by calling jit.off() before starting the profiler. For details, see the note about the INTERNAL label and jit.off() below.
  • Lines 6-8: Tail call optimization doesn’t create a new call frame, so all allocations inside the function called via the CALLT/CALLMT bytecodes are attributed to the function’s caller. See also the comments preceding these lines.
  • Lines 14-16: Usually the information about allocations inside Lua built‑ins is not really useful for developers. That’s why if a Lua built‑in function is called from a Lua function, the profiler attributes all allocations to the Lua function. Otherwise, this event is attributed to a C function. See also the comments preceding these lines.

After getting the memory profile in binary format, the next step is to parse it to get a human-readable profiling report. You can do this via Tarantool by using the following command (mind the hyphen - before the filename):

$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin

where memprof_new.bin is the binary profile generated earlier by tarantool test.lua.

Note

The tarantool -e ... command was slightly different in Tarantool versions prior to 2.8.1.

Tarantool generates a profiling report and displays it on the console before closing the session:

ALLOCATIONS
@test.lua:14: 10000 events  +50240518 bytes -0 bytes
@test.lua:9: 1 events       +32 bytes       -0 bytes
@test.lua:8: 1 events       +20 bytes       -0 bytes
@test.lua:13: 1 events      +24 bytes       -0 bytes

REALLOCATIONS
@test.lua:13: 13 events     +262216 bytes   -131160 bytes
    Overrides:
        @test.lua:13

@test.lua:14: 11 events     +49536 bytes    -24768 bytes
    Overrides:
        @test.lua:14
        INTERNAL

INTERNAL: 3 events          +8448 bytes     -16896 bytes
    Overrides:
        @test.lua:14

DEALLOCATIONS
INTERNAL: 1723 events       +0 bytes        -483515 bytes
@test.lua:14: 1 events      +0 bytes        -32768 bytes

HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees

Note

On macOS, the report for the same chunk of code will be different because Tarantool and LuaJIT are built with the GC64 mode enabled on macOS.

Let’s examine the report structure. A report has four sections: ALLOCATIONS, REALLOCATIONS, DEALLOCATIONS, and HEAP SUMMARY.

Each section contains event records that are sorted from the most frequent to the least frequent.

An event record has the following format:

@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes

where:

  • <filename> – the name of the file containing Lua code.
  • <line_number> – the line number where the event is detected.
  • <number_of_events> – the number of events for this code line.
  • +<allocated> bytes – the amount of memory allocated during all the events on this line.
  • -<freed> bytes – the amount of memory freed during all the events on this line.
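When a report is large, it can be convenient to process event records with a script. Below is a minimal sketch of a parser for one record line, based only on the format above; the parse_event() name is hypothetical, and lines that are not event records (section titles, Overrides: labels) yield nil. Records labeled INTERNAL have no file name, so the sketch stores the label instead.

```lua
-- Hypothetical helper: parse one memprof event record, either
-- "@<filename>:<line>: <N> events +<A> bytes -<F> bytes" or
-- "INTERNAL: <N> events +<A> bytes -<F> bytes".
local function parse_event(line)
  local source, events, allocated, freed =
    line:match("^(%S+):%s+(%d+) events%s+%+(%d+) bytes%s+%-(%d+) bytes")
  if not source then
    return nil -- not an event record
  end
  local record = {
    events = tonumber(events),
    allocated = tonumber(allocated),
    freed = tonumber(freed),
  }
  local file, lineno = source:match("^@(.+):(%d+)$")
  if file then
    record.file, record.line = file, tonumber(lineno)
  else
    record.label = source -- for example, INTERNAL
  end
  return record
end

local r = parse_event("@test.lua:14: 10000 events  +50240518 bytes -0 bytes")
print(r.file, r.line, r.events, r.allocated) -- test.lua 14 10000 50240518
```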

The Overrides label shows what allocation has been overridden.

See the test.lua chunk above with the explanation in the comments for some examples.

The INTERNAL label indicates that this event is caused by internal LuaJIT structures.

Note

Important note regarding the INTERNAL label and the recommendation of switching the JIT compilation off (jit.off()): this version of the profiler doesn’t support verbose reporting for allocations on traces. If memory allocations are made on a trace, the profiler can’t associate the allocations with the part of Lua code that generated the trace. In this case, the profiler labels such allocations as INTERNAL.

So, if the JIT compilation is on, new traces will be generated and there will be a mixture of events labeled INTERNAL in the profiling report: some of them are really caused by internal LuaJIT structures, but some of them are caused by allocations on traces.

If you want a more definite report without JIT compiler allocations, call jit.off() before starting the profiler. To completely exclude trace allocations from the report, also remove the old traces by calling jit.flush() after jit.off().

Nevertheless, switching the JIT compilation off before profiling is not "a must". It is rather a recommendation, and in some cases, for example in a production environment, you may need to keep JIT compilation on to see the full picture of all memory allocations. In this case, most of the INTERNAL events are probably caused by traces.

Investigating Lua code with the help of profiling reports is always code-dependent, and there are no universal recommendations. Nevertheless, the report analysis example later in this section illustrates some typical cases.

Also, the FAQ below covers questions that are likely to arise when using the profiler.

In this section, some profiler-related points are discussed in a Q&A format.

Question (Q): Is the profiler suitable for C allocations or allocations inside C code?

Answer (A): The profiler reports only allocation events caused by the Lua allocator. All Lua-related allocations, like table or string creation, are reported. However, the profiler doesn’t report allocations made by malloc() or other non-Lua allocators. You can use valgrind to debug them.


Q: Why are there so many INTERNAL allocations in my profiling report? What does it mean?

A: INTERNAL means that these allocations/reallocations/deallocations are related to the internal LuaJIT structures or are made on traces. Currently, the profiler doesn’t verbosely report allocations of objects that are made during trace execution. Try adding jit.off() before the profiler start.


Q: Why are there some reallocations/deallocations without an Overrides section?

A: These objects can be created before the profiler starts. Adding collectgarbage() before the profiler’s start enables collecting all previously allocated objects that are dead when the profiler starts.


Q: Why are some objects not collected during profiling? Is it a memory leak?

A: LuaJIT uses an incremental garbage collector (GC). A GC cycle may not be finished at the moment the profiler stops. Add collectgarbage() before stopping the profiler to make sure that all dead objects are collected.


Q: Can I profile not just a current chunk but the entire running application? Can I start the profiler when the application is already running?

A: Yes. Here is an example of code that can be inserted in the Tarantool console for a running instance.

local fiber = require "fiber"
local log = require "log"

fiber.create(function()
  fiber.name("memprof")

  collectgarbage() -- Collect all objects already dead
  log.warn("start of profile")

  local st, err = misc.memprof.start(FILENAME)
  if not st then
    log.error("failed to start profiler: %s", err)
    return -- Nothing to stop if the profiler did not start
  end

  fiber.sleep(TIME)

  collectgarbage() -- Collect objects that died during profiling
  st, err = misc.memprof.stop()

  if not st then
    log.error("profiler on stop error: %s", err)
  end

  log.warn("end of profile")
end)

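where FILENAME is the name of the binary file where profiling events are written, and TIME is the profiling duration in seconds (the argument of fiber.sleep()).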

Also, you can directly call misc.memprof.start() and misc.memprof.stop() from a console.

In the example below, the following Lua code named format_concat.lua is investigated with the help of the memory profiler reports.

 1  -- Prevent allocations on new traces.
 2  jit.off()
 3
 4  local function concat(a)
 5    local nstr = a.."a"
 6    return nstr
 7  end
 8
 9  local function format(a)
10    local nstr = string.format("%sa", a)
11    return nstr
12  end
13
14  collectgarbage()
15
16  local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*)%.lua")..".bin"
17
18  local st, err = misc.memprof.start(binfile)
19  assert(st, err)
20
21  -- Payload.
22  for i = 1, 10000 do
23    local f = format(i)
24    local c = concat(i)
25  end
26  collectgarbage()
27
28  local st, err = misc.memprof.stop()
29  assert(st, err)
30
31  os.exit()

When you run this code in Tarantool and then parse the binary memory profile in /tmp/memprof_format_concat.bin, you will get the following profiling report:

ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes   -0 bytes
INTERNAL: 1 events                  +65536 bytes    -0 bytes

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 19996 events              +0 bytes        -558778 bytes
    Overrides:
        @format_concat.lua:10

@format_concat.lua:10: 2 events     +0 bytes        -98304 bytes
    Overrides:
        @format_concat.lua:10

HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees

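Reasonable questions regarding the report can be:

  • Why are there no allocations related to the concat() function?
  • Why is the number of allocations not a round number?
  • Why are there about 20000 allocations instead of 10000?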

First of all, LuaJIT doesn’t create a new string if the string with the same payload exists (see details on lua-users.org/wiki). This is called string interning. So, when a string is created via the format() function, there is no need to create the same string via the concat() function, and LuaJIT just uses the previous one.

That is also why the number of allocations is not a round number, as could be expected from the loop for i = 1, 10000...: Tarantool creates some strings for internal needs and built‑in modules, so some strings already exist.

But why are there so many allocations? It’s almost twice as big as the expected amount. This is because the string.format() built‑in function creates another string necessary for the %s identifier, so there are two allocations for each iteration: for tostring(i) and for string.format("%sa", string_i_value). You can see the difference in behavior by adding the line local _ = tostring(i) between lines 22 and 23.

To profile only the concat() function, comment out line 23 (which is local f = format(i)) and run the profiler. Now the output looks like this:

ALLOCATIONS
@format_concat.lua:5: 10000 events  +284411 bytes    -0 bytes

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 10000 events              +0 bytes         -218905 bytes
    Overrides:
        @format_concat.lua:5

@format_concat.lua:5: 1 events      +0 bytes         -32768 bytes

HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees

Q: But what will change if JIT compilation is enabled?

A: In the code, comment out line 2 (which is jit.off()) and run the profiler. Now there are only 56 allocations in the report, and all the other allocations are JIT-related (see also the related dev issue):

ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events  +640 bytes  -0 bytes
INTERNAL: 2 events              +382 bytes  -0 bytes

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 58 events             +0 bytes    -1164 bytes
    Overrides:
        @format_concat.lua:5
        INTERNAL


HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees

This happens because a trace has been compiled after 56 iterations (the default value of the hotloop compiler parameter). Then, the JIT-compiler removed the unused variable c from the trace, and, therefore, the dead code of the concat() function is eliminated.

Next, let’s profile only the format() function with JIT enabled. For that, comment out lines 2 and 24 (jit.off() and local c = concat(i)), do not comment out line 23 (local f = format(i)), and run the profiler. Now the output will look like this:

ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes  -0 bytes
INTERNAL: 4 events                  +66928 bytes   -0 bytes
@format_concat.lua:0: 4 events      +640 bytes     -0 bytes

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 19997 events              +0 bytes       -559034 bytes
    Overrides:
        @format_concat.lua:0
        @format_concat.lua:10

@format_concat.lua:10: 2 events     +0 bytes       -98304 bytes
    Overrides:
        @format_concat.lua:10


HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees

Q: Why are there so many allocations in comparison to the concat() function?

A: The answer is simple: the string.format() function with the %s identifier is not yet compiled by the LuaJIT JIT compiler. So, a trace can’t be recorded, and the compiler doesn’t perform the corresponding optimizations.

If you change the format() function in lines 9-12 of format_concat.lua in the following way

local function format(a)
  local nstr = string.format("%sa", tostring(a))
  return nstr
end

the profiling report becomes much more compact:

ALLOCATIONS
@format_concat.lua:10: 109 events   +2112 bytes -0 bytes
@format_concat.lua:0: 4 events      +640 bytes  -0 bytes
INTERNAL: 3 events                  +1206 bytes -0 bytes

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 112 events                +0 bytes    -2460 bytes
    Overrides:
        @format_concat.lua:0
        @format_concat.lua:10
        INTERNAL


HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees

This feature was added in version 2.8.1.

The end of each display is a HEAP SUMMARY section which looks like this:

@<filename>:<line number> holds <number of still reachable bytes> bytes: <number of allocation events> allocs, <number of deallocation events> frees
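For scripted leak hunting, the HEAP SUMMARY lines can be parsed with a few Lua patterns. The sketch below is hypothetical (parse_heap_line() and holders() are not part of Tarantool) and assumes exactly the line format shown above; holders() keeps the sources that still hold memory, sorted by size, which is a convenient starting point when looking for leaks.

```lua
-- Hypothetical helper: parse one HEAP SUMMARY line, for example
-- "@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees".
local function parse_heap_line(line)
  local source, bytes, allocs, frees =
    line:match("^(%S+) holds (%d+) bytes: (%d+) allocs, (%d+) frees")
  if not source then
    return nil -- not a HEAP SUMMARY entry
  end
  allocs, frees = tonumber(allocs), tonumber(frees)
  return {
    source = source,         -- "@<filename>:<line>" or INTERNAL
    bytes = tonumber(bytes), -- still reachable bytes
    allocs = allocs,
    frees = frees,
    -- Difference between allocation and deallocation events.
    unfreed = allocs - frees,
  }
end

-- Hypothetical helper: entries that still hold memory, largest first.
local function holders(lines)
  local result = {}
  for _, line in ipairs(lines) do
    local rec = parse_heap_line(line)
    if rec and rec.bytes > 0 then
      table.insert(result, rec)
    end
  end
  table.sort(result, function(a, b) return a.bytes > b.bytes end)
  return result
end
```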

Sometimes a program causes many deallocations, so the DEALLOCATIONS section becomes large and the display is hard to read. To minimize the output, start the parsing with the extra --leak-only flag, for example:

$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin

When --leak-only is used, only the HEAP SUMMARY section is displayed.

LuaJIT platform profiler

The standard profiling tools available for LuaJIT are not fine-grained enough to understand its performance. For example, perf can show only the host stack, so all Lua calls are displayed as a single pcall(). Conversely, the jit.p module provided with LuaJIT cannot give any information about the host stack.

Since version 2.10.0, Tarantool has a built‑in module called misc.sysprof that implements a LuaJIT sampling profiler (further in this section we call it the profiler for short). The profiler can capture both guest and host stacks simultaneously, along with virtual machine states, so it can show the whole picture.

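Three profiling modes are available:

  • Default: collects only virtual machine state counters.
  • Leaf: also collects the top frames of the host and guest stacks.
  • Callgraph: also collects full call chains of the host and guest stacks.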

The profiler comes with a default parser, which produces output in a flamegraph.pl-suitable format.

The profiler usage involves two steps:

  1. Collecting a binary profile of stacks (further referred to as the binary sampling profile or binary profile for short).
  2. Parsing the collected binary profile to get a human-readable profiling report.

To collect a binary profile for a particular part of the Lua and C code, you need to place this part between two misc.sysprof functions – namely, misc.sysprof.start() and misc.sysprof.stop() – and then execute the code in Tarantool.

Below is a chunk of Lua code named test.lua to illustrate this.

 1  local function payload()
 2    local function fib(n)
 3      if n <= 1 then
 4        return n
 5      end
 6      return fib(n - 1) + fib(n - 2)
 7    end
 8    return fib(32)
 9  end
10
11  payload()
12
13  local res, err = misc.sysprof.start({mode = 'C', interval = 1, path = 'sysprof.bin'})
14  assert(res, err)
15
16  payload()
17
18  res, err = misc.sysprof.stop()
19  assert(res, err)

The Lua code for starting the profiler – as in line 13 in the test.lua example above – is:

local res, err = misc.sysprof.start({mode = 'C', interval = 1, path = 'sysprof.bin'})

where:

  • mode is the profiling mode;
  • interval is the sampling interval in milliseconds;
  • path is the name of the binary file where profiling events are written (sysprof.bin in this example).

If the operation fails, for example if it is not possible to open a file for writing or if the profiler is already running, misc.sysprof.start() returns nil as the first result, an error-message string as the second result, and a system-dependent error code number as the third result.

If the operation succeeds, misc.sysprof.start() returns true.

The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:

local res, err = misc.sysprof.stop()

If the operation fails, for example if there is an error when the file descriptor is being closed or if there is a failure during reporting, misc.sysprof.stop() returns nil as the first result, an error-message string as the second result, and a system-dependent error code number as the third result.

If the operation succeeds, misc.sysprof.stop() returns true.

To generate a file with the platform profile in the binary format (in the test.lua code example above the file name is sysprof.bin), execute the code in Tarantool:

$ tarantool test.lua

Tarantool collects the samples in sysprof.bin, puts the file in its working directory, and closes the session.

After getting the platform profile in the binary format, the next step is to parse it to get a human-readable profiling report. You can do this via Tarantool with the following command (mind the hyphen - before the filename):

$ tarantool -e 'require("sysprof")(arg)' - sysprof.bin > tmp
$ curl -O https://raw.githubusercontent.com/brendangregg/FlameGraph/refs/heads/master/flamegraph.pl
$ perl flamegraph.pl tmp > sysprof.svg

where sysprof.bin is the binary profile generated earlier by tarantool test.lua. The resulting SVG image contains a flamegraph with the collected stacks and can be opened in a modern web browser for analysis.

Note

The tarantool -e ... command was slightly different in Tarantool versions prior to 2.8.1.
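To find the functions where most samples land without opening the SVG, you can also process the parser output directly. The sketch below assumes that each line of the output (the tmp file above) uses the collapsed-stack form frame1;frame2;...;frameN count that flamegraph.pl consumes; the leaf_counts() and top_frames() names are hypothetical. It sums samples per leaf frame, that is, per function at the top of the stack.

```lua
-- Hypothetical helper: samples per leaf frame in collapsed-stack text,
-- where each line is "frame1;frame2;...;frameN count".
local function leaf_counts(text)
  local counts = {}
  for line in text:gmatch("[^\n]+") do
    local stack, n = line:match("^(.-)%s+(%d+)%s*$")
    if stack then
      -- The leaf (top-of-stack) frame is the last ';'-separated element.
      local leaf = stack:match("([^;]+)$")
      counts[leaf] = (counts[leaf] or 0) + tonumber(n)
    end
  end
  return counts
end

-- Hypothetical helper: the `limit` frames with the most samples.
local function top_frames(counts, limit)
  local list = {}
  for frame, samples in pairs(counts) do
    list[#list + 1] = { frame = frame, samples = samples }
  end
  table.sort(list, function(a, b) return a.samples > b.samples end)
  local result = {}
  for i = 1, math.min(limit, #list) do
    result[i] = list[i]
  end
  return result
end

-- Usage: read the parser output produced by the commands above.
local f = io.open("tmp", "r")
if f then
  local text = f:read("*a")
  f:close()
  if text then
    for _, entry in ipairs(top_frames(leaf_counts(text), 10)) do
      print(entry.samples, entry.frame)
    end
  end
end
```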

As for investigating the Lua code with the help of profiling reports, it is always code-dependent and there are no definite recommendations in this regard. Nevertheless, you can see some of the things in the Profiling report analysis example below.

The platform profiler provides the following Lua interface:

  • misc.sysprof.start(opts)
  • misc.sysprof.stop()
  • misc.sysprof.report()

The first two functions return boolean res and err, which is nil on success and contains an error message on failure.

misc.sysprof.report returns a Lua table containing the following counters:

{
  samples = int,
  INTERP = int,
  LFUNC = int,
  FFUNC = int,
  CFUNC = int,
  GC = int,
  EXIT = int,
  RECORD = int,
  OPT = int,
  ASM = int,
  TRACE = int,
}
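These counters are easier to interpret as shares of the total number of samples. Below is a minimal sketch that converts a report table into percentages per VM state. The state_shares() helper is hypothetical; the Tarantool-only part reuses the misc.sysprof.start() options from the test.lua example above and is skipped when misc.sysprof is not available.

```lua
-- VM state counters returned by misc.sysprof.report(), as listed above.
local VM_STATES = { "INTERP", "LFUNC", "FFUNC", "CFUNC", "GC",
                    "EXIT", "RECORD", "OPT", "ASM", "TRACE" }

-- Hypothetical helper: share of samples (in percent) per VM state.
local function state_shares(report)
  local shares = {}
  local total = report.samples or 0
  for _, state in ipairs(VM_STATES) do
    local n = report[state] or 0
    shares[state] = total > 0 and n * 100 / total or 0
  end
  return shares
end

-- Tarantool-only usage.
if misc and misc.sysprof then
  local ok, err = misc.sysprof.start({ mode = "C", interval = 1, path = "sysprof.bin" })
  if not ok then
    print("failed to start sysprof: " .. tostring(err))
  else
    -- Placeholder workload; replace it with the code you want to profile.
    local s = 0
    for i = 1, 1e7 do
      s = s + i % 7
    end
    misc.sysprof.stop()
    local shares = state_shares(misc.sysprof.report())
    for _, state in ipairs(VM_STATES) do
      print(string.format("%-6s %5.1f%%", state, shares[state]))
    end
  end
end
```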

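The opts argument of misc.sysprof.start can contain the following parameters:

  • mode – the profiling mode: 'D' (default), 'L' (leaf), or 'C' (callgraph);
  • interval – the sampling interval in milliseconds;
  • path – the path to the file where profiling events are written.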

The platform profiler provides a low-level C interface:

Note

There is no need to call the configuration functions multiple times if you are starting and stopping the profiler several times in a single program.

Also, it is not necessary to configure sysprof for the Default mode. However, you MUST configure it for other modes.

All of the functions return 0 on success and an error code on failure.

Profiler configuration settings include:

  • typedef size_t (*sp_writer)(const void **data, size_t len, void *ctx) – a writer function for profile events.

    Must be async-safe, see also man 7 signal-safety.

    Should return the amount of written bytes on success, or zero in case of error.

    Setting *data to NULL means end of profiling. For details see lj_wbuf.h.

  • typedef int (*sp_on_stop)(void *ctx, uint8_t *buf) – a callback on profiler stopping. Required for a correct cleanup at VM finalization when the profiler is still running.

    Returns zero on success.

  • typedef void (*sp_backtracer)(void *(*frame_writer)(int frame_no, void *addr)) – a backtracing function for the host stack. Should call frame_writer on each frame in the stack, in the order from the stack top to the stack bottom.

    The frame_writer function is implemented inside sysprof and will be passed to the backtracer function.

    If frame_writer returns NULL, backtracing should be stopped. Otherwise, backtracing should continue while there are frames left.

The options structure for luaM_sysprof_start is as follows:

struct luam_Sysprof_Options {
  /* Profiling mode. */
  uint8_t mode;

  /* Sampling interval in msec. */
  uint64_t interval;

  /* Custom buffer to write data. */
  uint8_t *buf;

  /* The buffer's size. */
  size_t len;

  /* Context for the profile writer and final callback. */
  void *ctx;
};

The platform profiler supports three profiling modes:

  • DEFAULT mode collects only data for luam_sysprof_counters, which is stored in memory and can be collected with luaM_sysprof_report after the profiler stops.
  • LEAF mode = DEFAULT + streams samples with only top frames of the host and guest stacks in the format described in lj_sysprof.h.
  • CALLGRAPH mode = DEFAULT + streams samples with full callchains of the host and guest stacks in the format described in lj_sysprof.h.

The mode is selected with the following constants:

#define LUAM_SYSPROF_DEFAULT 0
#define LUAM_SYSPROF_LEAF 1
#define LUAM_SYSPROF_CALLGRAPH 2

The counters structure for luaM_sysprof_report is as follows:

struct luam_Sysprof_Counters {
  uint64_t vmst_interp;
  uint64_t vmst_lfunc;
  uint64_t vmst_ffunc;
  uint64_t vmst_cfunc;
  uint64_t vmst_gc;
  uint64_t vmst_exit;
  uint64_t vmst_record;
  uint64_t vmst_opt;
  uint64_t vmst_asm;
  uint64_t vmst_trace;

  uint64_t samples;
};

Note

The order of vmst_* counters is important: it should be the same as the order of the vmstates.

  • Providing writers, backtracers and other settings in the Default mode is pointless, since it only collects counters.
  • There is NO default configuration for sysprof, so luaM_Sysprof_Configure must be called before the first run of sysprof. Mind the async safety.

LuaJIT getmetrics

Tarantool can return metrics of the current instance via the Lua API or the C API.

getmetrics()

Get the metrics values into a table.

Parameters: none

Return: table

Example: metrics_table = misc.getmetrics()

The metrics table contains 19 values. All values have type 'number' and are the result of a cast to double, so there may be a very slight precision loss. Values whose names begin with gc_ are associated with the LuaJIT garbage collector; a fuller study of the garbage collector can be found at a Lua-users wiki page and a slide from the creator of Lua. Values whose names begin with jit_ are associated with the "phases" of the just-in-time compilation process; a fuller study of JIT phases can be found in a master’s thesis from cern.ch.

Values described as "monotonic" are cumulative, that is, they are "totals since all operations began", rather than "since the last getmetrics() call". Overflow is possible.

Because many values are monotonic, a typical analysis involves calling getmetrics(), saving the table, calling getmetrics() again, and comparing the new table with the saved one. The difference is a "slope curve". An interesting slope curve is one that shows acceleration, for example, when the difference between the latest value and the previous value keeps increasing. Some of the table members shown here are used in the examples that come later in this section.

Name Content Monotonic?
gc_allocated number of bytes of allocated memory yes
gc_cdatanum number of allocated cdata objects no
gc_freed number of bytes of freed memory yes
gc_steps_atomic number of steps of garbage collector, atomic phases, incremental yes
gc_steps_finalize number of steps of garbage collector, finalize yes
gc_steps_pause number of steps of garbage collector, pauses yes
gc_steps_propagate number of steps of garbage collector, propagate yes
gc_steps_sweep number of steps of garbage collector, sweep phases (see the Sweep phase description) yes
gc_steps_sweepstring number of steps of garbage collector, sweep phases for strings yes
gc_strnum number of allocated string objects no
gc_tabnum number of allocated table objects no
gc_total number of bytes of currently allocated memory (normally equals gc_allocated minus gc_freed) no
gc_udatanum number of allocated udata objects no
jit_mcode_size total size of all allocated machine code areas no
jit_snap_restore overall number of snap restores, based on the number of guard assertions leading to stopping trace executions (see external Snap tutorial) yes
jit_trace_abort overall number of aborted traces yes
jit_trace_num number of JIT traces no
strhash_hit number of strings being interned because, if a string with the same value is found via the hash, a new one is not created / allocated yes
strhash_miss total number of strings allocations during the platform lifetime yes
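The slope analysis described above can be sketched in Lua as follows. The MONOTONIC list is taken from the Monotonic? column of the table above; the slope() and accelerating() names are hypothetical. slope() compares two snapshots, and accelerating() checks whether the growth of one counter keeps increasing across a series of snapshots taken at regular intervals.

```lua
-- Monotonic (cumulative) counters, per the "Monotonic?" column above.
local MONOTONIC = {
  "gc_allocated", "gc_freed", "gc_steps_atomic", "gc_steps_finalize",
  "gc_steps_pause", "gc_steps_propagate", "gc_steps_sweep",
  "gc_steps_sweepstring", "jit_snap_restore", "jit_trace_abort",
  "strhash_hit", "strhash_miss",
}

-- Hypothetical helper: growth of every monotonic counter between two
-- snapshots returned by misc.getmetrics().
local function slope(old, new)
  local diff = {}
  for _, name in ipairs(MONOTONIC) do
    diff[name] = (new[name] or 0) - (old[name] or 0)
  end
  return diff
end

-- Hypothetical helper: true if the growth of counter `name` between
-- consecutive snapshots keeps increasing (the "acceleration" pattern).
local function accelerating(snapshots, name)
  if #snapshots < 3 then
    return false
  end
  local prev = snapshots[2][name] - snapshots[1][name]
  for i = 3, #snapshots do
    local delta = snapshots[i][name] - snapshots[i - 1][name]
    if delta <= prev then
      return false
    end
    prev = delta
  end
  return true
end

-- Tarantool-only usage: measure how much memory a piece of code allocates.
if misc and misc.getmetrics then
  local before = misc.getmetrics()
  local t = {}
  for i = 1, 1000 do
    t[i] = { i }
  end
  local d = slope(before, misc.getmetrics())
  print("gc_allocated grew by " .. d.gc_allocated .. " bytes")
end
```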

Note: Although value names are similar to value names in ujit.getmetrics(), the values are not the same, primarily because many ujit numbers are not monotonic.

Note: Value names and values are the same as in LuaJIT metrics, but misc.getmetrics() is slightly easier to use because there is no need to ‘require’ the misc module.

The Lua getmetrics() function is a wrapper for the C function luaM_metrics().

C programs may include a header named lmisclib.h. The definitions in lmisclib.h include the following lines:

struct luam_Metrics { /* the names described earlier for Lua */ };

LUAMISC_API void luaM_metrics(lua_State *L, struct luam_Metrics *metrics);

The names of struct luam_Metrics members are the same as Lua’s getmetrics table values names. The data types of struct luam_Metrics members are all size_t. The luaM_metrics() function will fill the *metrics structure with the metrics related to the Lua state anchored to the L coroutine.

Example with a C program

Go through the C stored procedures tutorial. Replace the easy.c example with:

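#include <stdio.h>
#include "module.h"
#include <lmisclib.h>

int easy(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
  lua_State *ls = luaT_state();
  struct luam_Metrics m;
  (void)ctx; (void)args; (void)args_end; /* unused in this example */
  /* Fill m with the metrics of the instance's Lua state */
  luaM_metrics(ls, &m);
  /* All luam_Metrics members are size_t, so print them with %zu */
  printf("allocated memory = %zu\n", m.gc_allocated);
  return 0;
}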

Now go back to the client and execute the requests up to and including the line capi_connection:call('easy'). The instance will print something like «allocated memory = 4431950», although the number will vary.

To track new string object allocations:

function f()
  collectgarbage("collect")
  local oldm = misc.getmetrics()
  local table_of_strings = {}
  for i = 3000, 4000 do table.insert(table_of_strings, tostring(i)) end
  for i = 3900, 4100 do table.insert(table_of_strings, tostring(i)) end
  local newm = misc.getmetrics()
  print("gc_strnum diff = " .. newm.gc_strnum - oldm.gc_strnum)
  print("strhash_miss diff = " .. newm.strhash_miss - oldm.strhash_miss)
  print("strhash_hit diff = " .. newm.strhash_hit - oldm.strhash_hit)
end
f()

The result will probably be about «gc_strnum diff = 1100». The two loops produce 1202 strings (1001 + 201), but the 101 values from 3900 to 4000 are duplicates, so only about 1100 new strings are created. For the same reason, «strhash_miss diff» will be about 1100. «strhash_hit diff» will be about 101 plus some overhead. (There is always a slight overhead for strhash_hit, which can be ignored.)

We say «probably» because some of these strings might already have been allocated elsewhere. It is a good sign if strhash_miss grows more slowly than strhash_hit.

The other gc_*num values – gc_cdatanum, gc_tabnum, gc_udatanum – can be accessed in a similar way. Any of the gc_*num values can be useful when looking for memory leaks – the total number of these objects should not grow nonstop. A more general way to look for memory leaks is to watch gc_total. Also jit_mcode_size can be used to watch the amount of allocated memory for machine code traces.
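The advice that these counters "should not grow nonstop" can be turned into a small check over a series of snapshots. The sketch below is a minimal illustration. The names `LEAK_WATCH_KEYS` and `growing_counters` are hypothetical, not part of Tarantool. It assumes each sample is a table shaped like the result of `misc.getmetrics()`, with samples ordered oldest first. It reports the keys that increased between every pair of consecutive samples.

```lua
-- Values from misc.getmetrics() that can go down as well as up and
-- should level off in a healthy application (hypothetical list).
LEAK_WATCH_KEYS = {
  'gc_strnum', 'gc_tabnum', 'gc_udatanum', 'gc_cdatanum',
  'gc_total', 'jit_mcode_size',
}

-- Return the keys whose value strictly increased between every pair
-- of consecutive snapshots in `samples` (oldest first).
function growing_counters(samples, keys)
  keys = keys or LEAK_WATCH_KEYS
  local growing = {}
  if #samples < 2 then
    return growing  -- not enough data to talk about growth
  end
  for _, key in ipairs(keys) do
    local always_up = true
    for i = 2, #samples do
      local prev, cur = samples[i - 1][key], samples[i][key]
      if prev == nil or cur == nil or cur <= prev then
        always_up = false
        break
      end
    end
    if always_up then
      table.insert(growing, key)
    end
  end
  return growing
end
```

In Tarantool you might append `misc.getmetrics()` to a `samples` array at regular intervals and then call `growing_counters(samples)`. A non-empty result over a long window is a hint to investigate, not proof of a leak.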

To track an application’s effect on the garbage collector (less is better):

function f()
  for i = 1, 10 do collectgarbage("collect") end
  local oldm = misc.getmetrics()
  local newm = misc.getmetrics()
  oldm = misc.getmetrics()
  collectgarbage("collect")
  newm = misc.getmetrics()
  print("gc_allocated diff = " .. newm.gc_allocated - oldm.gc_allocated)
  print("gc_freed diff = " .. newm.gc_freed - oldm.gc_freed)
end
f()

The result will be: gc_allocated diff = 800, gc_freed diff = 800. This shows that local ... = getmetrics() itself allocates memory, because it creates a table and assigns it to a variable. It also shows that when a variable (here, oldm) is assigned again, the table it referenced before becomes garbage and is freed. Ordinarily the freeing would not occur immediately, but collectgarbage("collect") forces it to happen so that the effect is visible.

To test whether optimizing for space is possible with tables:

function f()
  collectgarbage("collect")
  local oldm = misc.getmetrics()
  local t = {}
  for i = 1, 513 do
    t[i] = i
  end
  local newm = misc.getmetrics()
  local diff = newm.gc_allocated - oldm.gc_allocated
  print("diff = " .. diff)
end
f()

The result will show that diff equals approximately 18000.

Now see what happens if the table initialization is different:

function f()
  local table_new = require "table.new"
  local oldm = misc.getmetrics()
  local t = table_new(513, 0)
  for i = 1, 513 do
    t[i] = i
  end
  local newm = misc.getmetrics()
  local diff = newm.gc_allocated - oldm.gc_allocated
  print("diff = " .. diff)
end
f()

The result will show that diff equals approximately 6000.
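Both table experiments follow the same pattern: collect garbage, take a snapshot, run the code, take another snapshot. The sketch below factors that pattern into a reusable wrapper. The names `alloc_diff`, `fill_grow`, and `fill_prealloc` are hypothetical. The metrics source and the `table.new` constructor are passed in as arguments so that the wrapper can also be exercised outside Tarantool. Inside Tarantool you would pass `misc.getmetrics` and `require('table.new')`, and `table.new` is a LuaJIT extension.

```lua
-- Run fn() between two metrics snapshots and return the number of
-- bytes allocated meanwhile (gc_allocated is cumulative).
function alloc_diff(fn, getmetrics)
  collectgarbage("collect")
  local oldm = getmetrics()
  fn()
  local newm = getmetrics()
  return newm.gc_allocated - oldm.gc_allocated
end

-- Fill a 513-element array by letting the table grow on demand.
function fill_grow()
  local t = {}
  for i = 1, 513 do t[i] = i end
  return t
end

-- Fill the same array after preallocating its array part.
function fill_prealloc(table_new)
  local t = table_new(513, 0)
  for i = 1, 513 do t[i] = i end
  return t
end
```

In Tarantool you could compare `alloc_diff(fill_grow, misc.getmetrics)` with `alloc_diff(function() fill_prealloc(require('table.new')) end, misc.getmetrics)`. Collecting garbage inside the wrapper gives both variants the same starting conditions.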

The slope curves of gc_steps_* items can be used for tracking pressure on the garbage collector too. During long-running routines, gc_steps_* values will increase, but long intervals between gc_steps_atomic increases are a good sign. Also, since gc_steps_atomic increases only once per garbage-collector cycle, it shows how many garbage-collector cycles have occurred.

Also, increases in the gc_steps_propagate number can be used to estimate indirectly how many objects there are. These values also correlate with the garbage collector’s step multiplier. For example, the number of incremental steps can grow, but according to the step multiplier configuration, one step can process only a small number of objects. So these metrics should be considered when configuring the garbage collector.
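The reasoning above can be captured in a helper that turns two snapshots into a summary. This is a minimal sketch with hypothetical names (`GC_STEP_KEYS`, `gc_steps_delta`). It reports the delta of each gc_steps_* value, treats the gc_steps_atomic delta as the number of completed GC cycles, and adds the average number of propagate steps per cycle as a rough indicator of how much work each cycle did.

```lua
GC_STEP_KEYS = {
  'gc_steps_atomic', 'gc_steps_finalize', 'gc_steps_pause',
  'gc_steps_propagate', 'gc_steps_sweep', 'gc_steps_sweepstring',
}

-- Compare two misc.getmetrics()-like snapshots.
function gc_steps_delta(oldm, newm)
  local delta = {}
  for _, key in ipairs(GC_STEP_KEYS) do
    delta[key] = (newm[key] or 0) - (oldm[key] or 0)
  end
  -- gc_steps_atomic grows once per GC cycle.
  delta.cycles = delta.gc_steps_atomic
  if delta.cycles > 0 then
    delta.propagate_per_cycle = delta.gc_steps_propagate / delta.cycles
  end
  return delta
end
```

In Tarantool you would take `oldm = misc.getmetrics()` before the code under test and `newm = misc.getmetrics()` after it, then inspect `gc_steps_delta(oldm, newm)`.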

The following function takes a casual look at whether an SQL statement puts much pressure on the garbage collector:

function f()
  collectgarbage("collect")
  local oldm = misc.getmetrics()
  collectgarbage("collect")
  box.execute([[DROP TABLE _vindex;]])
  local newm = misc.getmetrics()
  print("gc_steps_atomic = " .. newm.gc_steps_atomic - oldm.gc_steps_atomic)
  print("gc_steps_finalize = " .. newm.gc_steps_finalize - oldm.gc_steps_finalize)
  print("gc_steps_pause = " .. newm.gc_steps_pause - oldm.gc_steps_pause)
  print("gc_steps_propagate = " .. newm.gc_steps_propagate - oldm.gc_steps_propagate)
  print("gc_steps_sweep = " .. newm.gc_steps_sweep - oldm.gc_steps_sweep)
end
f()

The output will show that the gc_steps_* metrics are not significantly different from what they would be without the box.execute() call.

Just-in-time compilers will «trace» code looking for opportunities to compile. jit_trace_abort can show how often there was a failed attempt (less is better), and jit_trace_num can show how many traces were generated since the last flush (usually more is better).

The following function does not contain code that can cause trouble for LuaJIT:

function f()
  jit.flush()
  for i = 1, 10 do collectgarbage("collect") end
  local oldm = misc.getmetrics()
  collectgarbage("collect")
  local sum = 0
  for i = 1, 57 do
    sum = sum + 57
  end
  for i = 1, 10 do collectgarbage("collect") end
  local newm = misc.getmetrics()
  print("trace_num = " .. newm.jit_trace_num - oldm.jit_trace_num)
  print("trace_abort = " .. newm.jit_trace_abort - oldm.jit_trace_abort)
end
f()

The result is: trace_num = 1, trace_abort = 0. Fine.

The following function seemingly does contain code that can cause trouble for LuaJIT:

jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
_G.globalthing = 5
function f()
  jit.flush()
  collectgarbage("collect")
  local oldm = misc.getmetrics()
  collectgarbage("collect")
  local sum = 0
  for i = 1, box.space._vindex:count()+ _G.globalthing do
    box.execute([[SELECT RANDOMBLOB(0);]])
    require('buffer').ibuf()
    _G.globalthing = _G.globalthing - 1
  end
  local newm = misc.getmetrics()
  print("trace_num = " .. newm.jit_trace_num - oldm.jit_trace_num)
  print("trace_abort = " .. newm.jit_trace_abort - oldm.jit_trace_abort)
end
f()

The result is: trace_num is between 2 and 4, trace_abort = 1. This means that up to four traces had to be generated instead of one, and that something made LuaJIT give up on a trace. Tracing further will reveal that the problem is not the suspicious-looking statements within the function but the jit.opt.start call. (A look at a jit.dump file might help in examining the trace compilation process.)
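Both JIT experiments follow the same flush, snapshot, run, snapshot pattern. The sketch below wraps that pattern. The name `trace_stats` is hypothetical. The metrics source and the flush function are parameters, so inside Tarantool you would pass `misc.getmetrics` and `jit.flush`. The `ok` field simply flags whether any trace was aborted while the function ran.

```lua
-- Measure JIT trace activity while running fn().
--   getmetrics: returns a misc.getmetrics()-like table
--   flush:      optional, discards existing traces first (e.g. jit.flush)
function trace_stats(fn, getmetrics, flush)
  if flush then
    flush()
  end
  local oldm = getmetrics()
  fn()
  local newm = getmetrics()
  local stats = {
    traces = newm.jit_trace_num - oldm.jit_trace_num,
    aborts = newm.jit_trace_abort - oldm.jit_trace_abort,
  }
  -- Any abort means LuaJIT gave up on compiling some code path.
  stats.ok = stats.aborts == 0
  return stats
end
```

For example, `trace_stats(my_loop, misc.getmetrics, jit.flush)` returns a table such as `{traces = 1, aborts = 0, ok = true}` for code that compiles cleanly. Here `my_loop` is a placeholder for your own function.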

If the slope curves of the jit_snap_restore metric grow after changes to old code, that can mean LuaJIT is stopping trace execution more frequently, and that can mean performance is degraded.
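One way to compare "slope curves" before and after a code change is to convert timestamped samples of jit_snap_restore into an average rate. The sketch below uses hypothetical names (`counter_rate`, `rate_regressed`). It assumes each sample is a table `{time = seconds, value = counter}`, with samples ordered by time. Inside Tarantool, the time could come from `require('clock').time()` and the value from `misc.getmetrics().jit_snap_restore`.

```lua
-- Average growth per second of a cumulative counter between the first
-- and the last sample; nil if the rate cannot be computed.
function counter_rate(samples)
  local first, last = samples[1], samples[#samples]
  if first == nil or last.time <= first.time then
    return nil
  end
  return (last.value - first.value) / (last.time - first.time)
end

-- True if the rate after a change exceeds the baseline rate by more
-- than `tolerance` (a fraction, e.g. 0.1 for 10%).
function rate_regressed(baseline, after, tolerance)
  local r0, r1 = counter_rate(baseline), counter_rate(after)
  if r0 == nil or r1 == nil then
    return nil
  end
  return r1 > r0 * (1 + (tolerance or 0))
end
```

The tolerance keeps normal run-to-run noise from being reported as a regression. Choose it based on how stable the workload is.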

Start with this code:

function f()
  local function foo(i)
    return i <= 5 and i or tostring(i)
  end
  -- The minstitch option is needed to emulate non-stitching behavior
  jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
  local sum = 0
  local oldm = misc.getmetrics()
  for i = 1, 10 do
    sum = sum + foo(i)
  end
  local newm = misc.getmetrics()
  local diff = newm.jit_snap_restore - oldm.jit_snap_restore
  print("diff = " .. diff)
end
f()

The result will be: diff = 3. There is one side exit when the loop ends and two side exits to the interpreter before LuaJIT decides that the chunk of code is «hot». Here hotloop is set to 2; its default value is 56, according to Running LuaJIT.

Now change only one line within the local function foo, so that the code becomes:

function f()
  local function foo(i)
    -- math.fmod is not yet compiled!
    return i <= 5 and i or math.fmod(i, 11)
  end
  -- The minstitch option is needed to emulate non-stitching behavior
  jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
  local sum = 0
  local oldm = misc.getmetrics()
  for i = 1, 10 do
    sum = sum + foo(i)
  end
  local newm = misc.getmetrics()
  local diff = newm.jit_snap_restore - oldm.jit_snap_restore
  print("diff = " .. diff)
end
f()

The result: diff is larger than 3, because there are more side exits. So this test indicates that the code change affected performance.

Administration

Tarantool is designed so that multiple instances can run on a single machine.

Here we show how to administer Tarantool instances using any of the following utilities:

Note

This chapter includes the following sections:

Managing modules

This section covers the installation and reloading of Tarantool modules. To learn about writing your own module and contributing it, check the Contributing a module section.

Lua and C modules from Tarantool developers and third-party developers are available here:

For details, see the README in the tarantool/rocks repository.

Follow these steps:

  1. Install Tarantool as recommended on the download page.

  2. Install the module you need. Look up the module's name on the Tarantool rocks page and add the «tarantool-» prefix to it to avoid ambiguity:

    $ # for Ubuntu/Debian:
    $ sudo apt-get install tarantool-<module-name>
    
    $ # for RHEL/CentOS/Amazon:
    $ sudo yum install tarantool-<module-name>
    

    For example, to install the vshard module on Ubuntu, run:

    $ sudo apt-get install tarantool-vshard
    

Now you can:

  • load any module with

    tarantool> name = require('module-name')
    

    for example:

    tarantool> vshard = require('vshard')
    
  • find installed modules locally using package.path (Lua) or package.cpath (C):

    tarantool> package.path
    ---
    - ./?.lua;./?/init.lua; /usr/local/share/tarantool/?.lua;/usr/local/share/
    tarantool/?/init.lua;/usr/share/tarantool/?.lua;/usr/share/tarantool/?/ini
    t.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;/
    usr/share/lua/5.1/?.lua;/usr/share/lua/5.1/?/init.lua;
    ...
    
    tarantool> package.cpath
    ---
    - ./?.so;/usr/local/lib/x86_64-linux-gnu/tarantool/?.so;/usr/lib/x86_64-li
    nux-gnu/tarantool/?.so;/usr/local/lib/tarantool/?.so;/usr/local/lib/x86_64
    -linux-gnu/lua/5.1/?.so;/usr/lib/x86_64-linux-gnu/lua/5.1/?.so;/usr/local/
    lib/lua/5.1/?.so;
    ...
    

    Note

    The question marks stand for the module name that was specified earlier in the require('module-name') call.
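To check which file `require()` would pick up from these search paths, you can resolve the templates yourself. The sketch below uses `package.searchpath`, which is available in LuaJIT (and therefore in Tarantool) and in Lua 5.2+. The name `find_module` is hypothetical. It checks only `package.path` and `package.cpath`, whereas Tarantool's `require()` may also look in other places, such as `.rocks` directories, so treat a nil result as "not on these paths" rather than "not installed".

```lua
-- Return the file that the Lua or C search paths resolve `name` to,
-- checking Lua paths first, or nil plus an error message.
function find_module(name, path, cpath)
  local file = package.searchpath(name, path or package.path)
  if file then
    return file
  end
  return package.searchpath(name, cpath or package.cpath)
end
```

For example, `find_module('vshard')` in the Tarantool console shows the file that `require('vshard')` would load from these paths, if any.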

Any Tarantool application or module can be reloaded with zero downtime.

Below is an example that illustrates the most typical case: «update and reload».

Note

In this example, we use recommended administration practices based on instance files and the tt utility.

  1. Update the application files.

    For example, a module in /usr/share/tarantool/app.lua:

    local function start()
      -- initial version
      box.once("myapp:v1.0", function()
        box.schema.space.create("somedata")
        box.space.somedata:create_index("primary")
        ...
      end)
    
      -- migration code from 1.0 to 1.1
      box.once("myapp:v1.1", function()
        box.space.somedata.index.primary:alter(...)
        ...
      end)
    
      -- migration code from 1.1 to 1.2
      box.once("myapp:v1.2", function()
        box.space.somedata.index.primary:alter(...)
        box.space.somedata:insert(...)
        ...
      end)
    end
    
    -- start background fibers if necessary
    
    local function stop()
      -- stop all fibers running in the background and clean up resources
    end
    
    local function api_for_call(xxx)
      -- do something
    end
    
    return {
      start = start,
      stop = stop,
      api_for_call = api_for_call
    }
    
  2. Update the instance file.

    For example, /etc/tarantool/instances.enabled/my_app.lua:

    #!/usr/bin/env tarantool
    --
    -- hot code reload example
    --
    
    box.cfg({listen = 3302})
    
    -- WARNING: unload the application properly!
    local app = package.loaded['app']
    if app ~= nil then
      -- stop the old version of the application
      app.stop()
      -- unload the application
      package.loaded['app'] = nil
      -- unload all its dependencies
      package.loaded['somedep'] = nil
    end
    end
    
    -- load the application
    local log = require('log')
    log.info('require app')
    app = require('app')
    
    -- start the application
    app.start({some app options controlled by sysadmins})
    

    The most important thing is to unload the application and its dependencies properly.

  3. Reload the application file manually.

    For example, using tt:

    $ tt connect my_app -f /etc/tarantool/instances.enabled/my_app.lua
    

After compiling a new version of a C module (a shared library *.so), call box.schema.func.reload('module-name') from a Lua script to reload the module.
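The unloading step of the instance file above can be factored into a helper so that a new dependency is not forgotten. This is a minimal sketch with a hypothetical name, `unload_app`. It calls the old module's `stop()` if there is one. It then drops the module and every submodule named `<name>.*` from `package.loaded`, plus any explicitly listed extra dependencies, so that the next `require()` reads the files again. Dependencies with unrelated names, such as `somedep`, still have to be listed by hand.

```lua
-- Stop and unload module `name`, its submodules ("name.*"), and the
-- modules listed in `extra`. Returns the list of unloaded names.
function unload_app(name, extra)
  local app = package.loaded[name]
  -- Let the old version release fibers and other resources first.
  if type(app) == 'table' and type(app.stop) == 'function' then
    app.stop()
  end
  local prefix = name .. '.'
  local to_drop = {}
  for mod in pairs(package.loaded) do
    if type(mod) == 'string'
        and (mod == name or mod:sub(1, #prefix) == prefix) then
      table.insert(to_drop, mod)
    end
  end
  for _, mod in ipairs(extra or {}) do
    table.insert(to_drop, mod)
  end
  -- Collect first, then clear, to keep the traversal simple.
  for _, mod in ipairs(to_drop) do
    package.loaded[mod] = nil
  end
  return to_drop
end
```

With this helper, the `if app ~= nil then ... end` block in the instance file could be replaced by `unload_app('app', {'somedep'})`.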

Logging

Each Tarantool instance logs important events to its own log file. For instances started with tt, the log location is defined by the log_dir parameter in the tt configuration. By default, it’s /var/log/tarantool in the tt system mode, and the var/log subdirectory of the tt working directory in the local mode. In the specified location, tt creates separate directories for each instance’s logs.

To check how logging works, write something to the log using the log module:

$ tt connect application
   • Connecting to the instance...
   • Connected to application

application> require('log').info("Hello for the manual readers")
---
...

Then check the log contents:

$ tail instances.enabled/application/var/log/instance001/tt.log
2024-04-09 17:34:29.489 [49502] main/106/gc I> wal/engine cleanup is resumed
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'instance_name' configuration option to "instance001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'custom_proc_title' configuration option to "tarantool - instance001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'log_nonblock' configuration option to false
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'replicaset_name' configuration option to "replicaset001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'listen' configuration option to [{"uri":"127.0.0.1:3301"}]
2024-04-09 17:34:29.489 [49502] main/107/checkpoint_daemon I> scheduled next checkpoint for Tue Apr  9 19:08:04 2024
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'metrics' configuration option to {"labels":{"alias":"instance001"},"include":["all"],"exclude":[]}
2024-04-09 17:34:29.489 [49502] main I> entering the event loop
2024-04-09 17:34:38.905 [49502] main/116/console/unix/:/tarantool I> Hello for the manual readers

When logging to a file, the system administrator must ensure that logs are rotated in a timely manner and do not take up all the available disk space. The recommended way to prevent log files from growing indefinitely is to use an external log rotation program, such as logrotate, which is pre-installed on most mainstream Linux distributions.

A Tarantool log rotation configuration for logrotate can look like this:

# /var/log/tarantool/<env>/<app>/<instance>/*.log
/var/log/tarantool/*/*/*/*.log {
    daily
    size 512k
    missingok
    rotate 10
    compress
    delaycompress
    sharedscripts # Run tt logrotate only once after all logs are rotated.
    postrotate
        /usr/bin/tt -S logrotate
    endscript
}

In this configuration, tt logrotate is called after each log rotation to reopen the instance log files after they are moved by the logrotate program.

There is also the built-in function log.rotate(), which you can call on an instance to reopen its log file after rotation.

Tarantool can write its logs to a log file, to syslog, or to a specified program through a pipe. For example, to send logs to syslog, specify the log.to parameter as follows:

log:
  to: syslog
  syslog:
    server: '127.0.0.1:514'

Security

Tarantool allows two types of connections:

If you are connected to the administrative console:

Therefore, administrative console ports should be configured very carefully. If it is a TCP port, it should be open only to a specific IP address. Ideally, configure a Unix domain socket instead of a TCP port, since a socket requires access rights on the server machine. A typical administrative console port setup then looks like this:

console.listen('/var/lib/tarantool/socket_name.sock')

and a typical connection URI looks like this:

/var/lib/tarantool/socket_name.sock

This works if the listener has permission to write to /var/lib/tarantool and the connecting client has permission to read from it. Alternatively, to connect to the admin console of an instance started with tt, use tt connect.

You can use telnet to find out whether a TCP port is an administrative console port. For example:

$ telnet 0 3303
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
Tarantool 2.1.0 (Lua console)
type 'help' for interactive help

In this example, the server response does not contain the word «binary» but does contain «Lua console». This means that we have successfully connected to the administrative console port and can enter administrator requests in this terminal.
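The same telnet check can be scripted. The sketch below uses hypothetical names (`classify_greeting`, `probe_port`). It classifies a greeting by looking for «(Lua console)» or «(Binary)» in it. The optional `probe_port` function runs only inside Tarantool. It assumes that the `socket` module's `tcp_connect(host, port, timeout)` and `read({delimiter = ..., limit = ...}, timeout)` behave as described in the socket module reference, so verify these calls against your Tarantool version before relying on it.

```lua
-- Classify the first bytes a Tarantool port sends after connecting.
function classify_greeting(greeting)
  if greeting:find('(Lua console)', 1, true) then
    return 'console'
  elseif greeting:find('(Binary)', 1, true) then
    return 'binary'
  end
  return 'unknown'
end

-- Tarantool-only: connect, read the first greeting line, classify it.
-- Assumes the socket module API mentioned in the lead-in.
function probe_port(host, port)
  local socket = require('socket')
  local sock = socket.tcp_connect(host, port, 1)
  if sock == nil then
    return nil
  end
  local line = sock:read({delimiter = '\n', limit = 128}, 1)
  sock:close()
  if line == nil then
    return nil
  end
  return classify_greeting(line)
end
```

A result of `'console'` means that the port accepts arbitrary Lua from anyone who can reach it. Such a port should be bound to a Unix socket or a trusted address only.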

If you are connected to the binary port:

For ease of use, the tt connect command automatically detects the connection type during the handshake and uses the EVAL binary protocol command when it needs to execute Lua commands over a binary connection. To execute EVAL, the authenticated user must have the global «EXECUTE» privilege.

Therefore, when it is not possible to connect to the machine over ssh, a system administrator can get remote access to an instance by creating a Tarantool user with global «EXECUTE» privileges and a non-empty password.
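The remote-administration setup described above can be written as a small, re-runnable function. This is a minimal sketch under the following assumptions. The name `create_remote_admin` is hypothetical. The `schema` argument is `box.schema`, passed in only so that the sketch can be tried without a running instance. The function refuses to touch `guest`, in line with the warning about granting extra privileges to that user.

```lua
-- Create (or update) a user for remote administration over the binary
-- port: a non-empty password plus the global EXECUTE privilege.
function create_remote_admin(schema, name, password)
  if name == 'guest' then
    error('do not extend guest; create a dedicated user instead')
  end
  if type(password) ~= 'string' or password == '' then
    error('a non-empty password is required')
  end
  schema.user.create(name, {password = password, if_not_exists = true})
  -- create() with if_not_exists keeps an existing user's password,
  -- so set it explicitly.
  schema.user.passwd(name, password)
  schema.user.grant(name, 'execute', 'universe', nil, {if_not_exists = true})
end
```

On an instance, as admin, you would call `create_remote_admin(box.schema, 'ops', '<password>')`, where `ops` and the password are placeholders.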

Access control

Tarantool enables flexible management of access to various database resources. The main concepts of Tarantool access control system are as follows:

Note

The full list of object types and permissions is available in the All object types and permissions section.

A user identifies a person or program that interacts with a Tarantool instance. There might be different types of users, for example:

  • A database administrator responsible for the overall management and administration of a database. An administrator can create other users and grant them specified privileges.
  • A user with limited access to certain data and stored functions. Such users can get their privileges from the database administrator.
  • Users for communication between Tarantool instances. For example, such users can be created to maintain replication and sharding in a Tarantool cluster.

There are two built-in users in Tarantool:

  • admin is a user with all available administrative privileges. If the connection uses an admin-console port, the current user is admin. For example, admin is used when you connect to an instance locally with tt connect and the instance name:

    $ tt connect app:instance001
    

    To allow remote binary port connections using the admin user, you need to set a password.

  • guest is a user with minimum privileges used by default for remote binary port connections. For example, guest is used when you connect to an instance with tt connect using the IP address and port without specifying a user name:

    $ tt connect 192.168.10.10:3301
    

    Warning

    Given that the guest user allows unauthenticated access to Tarantool instances, it is not recommended to grant additional privileges to this user. For example, granting the execute access to universe allows remote code execution on instances.

Note

Information about users is stored in the _user space.

Any user (except guest) may have a password. If a password is not set, a user cannot connect to Tarantool instances.

Tarantool password hashes are stored in the _user system space. By default, Tarantool uses the CHAP protocol to authenticate users and applies SHA-1 hashing to passwords. So, if the password is '123456', the stored hash is a string like 'a7SDfrdDKRBe5FaN2n3GftLKKtk='. In the Enterprise Edition, you can enable PAP authentication with the SHA256 hashing algorithm.

Tarantool Enterprise Edition allows you to improve database security by enforcing the use of strong passwords, setting up a maximum password age, and so on. Learn more from the Authentication topic.

An object is a securable entity to which access can be granted. Tarantool has a number of objects that enable flexible management of access to data, stored functions, specific actions, and so on.

Below are a few examples of objects:

  • universe represents a database (box.schema) that contains database objects, including spaces, indexes, users, roles, sequences, and functions. Granting privileges to universe gives a user access to any object in a database.
  • space enables granting privileges to user-created or system spaces.
  • function enables granting privileges to functions.

Note

The full list of object types is available in the Object types section.

The privileges granted to a user determine which operations the user can perform, for example:

  • The read and write permissions granted to the space object allow a user to read or modify data in the specified space.
  • The create permission granted to the space object allows a user to create new spaces.
  • The execute permission granted to the function object allows a user to execute the specified function.
  • The session permission granted to the universe object allows a user to connect to an instance over IPROTO.
  • The usage permission granted to the universe object allows a user to use their privileges on database objects (for example, read, write, and alter space).
  • The alter permission granted to a user allows modifying its own settings, for example, a password.
  • The drop permission granted to a user allows dropping users.

Note

The full lists of object types and the permissions supported for them are available in the Permissions and Object types and permissions sections.

Note that some privileges might require read and write access to certain system spaces. For example, the create permission granted to the space object requires read and write permissions to the _space system space. Similarly, granting the ability to create functions requires read and write access to the _func space.

Note

Information about privileges is stored in the _priv space.

A role is a container for privileges that can be granted to users. Roles can also be assigned to other roles, creating a role hierarchy.

There are the following built-in roles in Tarantool:

  • super has all available administrative permissions.

  • public has certain read permissions. This role is automatically granted to new users when they are created.

  • replication can be granted to a user that maintains replication in a cluster.

  • sharding can be granted to a user that maintains sharding in a cluster.

    Note

    The sharding role is created only if an instance is managed using YAML configuration.

Below are a few diagrams that demonstrate how privileges can be granted to a user without and with using roles.

  • In this example, a user gets privileges directly without using roles.

    user1 ── privilege1
        ├─── privilege2
        └─── privilege3
    
  • In this example, a user gets all privileges provided by role1 and specific privileges assigned directly.

    user1 ── role1 ── privilege1
        │        └─── privilege2
        ├─── privilege3
        └─── privilege4
    
  • In this example, role2 is granted to role1. This means that a user with role1 subsequently gets all privileges from both roles role1 and role2.

    user1 ── role1 ── privilege1
        │        ├─── privilege2
        │        └─── role2
        │                 ├─── privilege3
        │                 └─── privilege4
        ├─── privilege5
        └─── privilege6
    

Note

Information about roles is stored in the _user space.

An owner of a database object is the user who created it. The owner of the database and the owner of objects that are created initially (the system spaces and the default users) is the admin user.

Owners automatically have privileges for objects they create. They can share these privileges with other users or roles using box.schema.user.grant() and box.schema.role.grant().

Note

Information about users who gave the specified privileges is stored in the _priv space.

A session is the state of a connection to Tarantool. The session contains:

  • An integer ID identifying the connection.
  • The current user associated with the connection.
  • The text description of the connected peer.
  • A session’s local state, such as Lua variables and functions.
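The first three parts of the session state listed above can be read through `box.session`. This is a minimal sketch with a hypothetical name, `describe_session`. The `session` argument is `box.session`, passed in so that the function can be tested without an instance. `box.session.peer()` returns nil when the session has no remote peer.

```lua
-- Collect identifying information about the current session.
function describe_session(session)
  return {
    id = session.id(),      -- integer connection ID
    user = session.user(),  -- current user name
    peer = session.peer(),  -- "host:port" of the peer, or nil
  }
end
```

Calling `describe_session(box.session)` from an admin console typically shows the admin user and no peer. Over a binary connection, it shows the authenticated user and the client address.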

In Tarantool, a single session can execute multiple concurrent transactions. Each transaction is identified by a unique integer ID, which can be queried at the start of the transaction using box.session.sync().

Note

To track all connects and disconnects, you can use connection and authentication triggers.
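A minimal audit setup based on these triggers might look like the sketch below. The function name `install_session_audit` is hypothetical. Its arguments are `box.session` and `require('log')`, passed in for testability. The sketch assumes that the `on_auth` trigger receives the user name and a success flag, so check the trigger signature in your Tarantool version's reference.

```lua
-- Log every connect, disconnect, and authentication attempt.
function install_session_audit(session, log)
  session.on_connect(function()
    log.info('connect: id=%s peer=%s', session.id(), tostring(session.peer()))
  end)
  session.on_disconnect(function()
    log.info('disconnect: id=%s user=%s', session.id(), session.user())
  end)
  -- Assumed arguments: user name and whether authentication succeeded.
  session.on_auth(function(user_name, success)
    log.info('auth: user=%s success=%s', user_name, tostring(success))
  end)
end
```

Calling `install_session_audit(box.session, require('log'))` once at startup is enough. Calling it again would register a second set of triggers.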

To create a new user, call box.schema.user.create(). In the example below, a user is created without a password:

box.schema.user.create('testuser')

In this example, the password is specified in the options parameter:

box.schema.user.create('testuser', { password = 'foobar' })

To set or change a user’s password, use box.schema.user.passwd(). In the example below, a user password is set for a currently logged-in user:

box.schema.user.passwd('foobar')

To set the password for the specified user, pass a username and password as shown below:

box.schema.user.passwd('testuser', 'foobar')

Note

box.schema.user.password() returns a hash of the specified password.

To grant the specified privileges to a user, use the box.schema.user.grant() function. In the example below, testuser gets read permissions to the writers space and read/write permissions to the books space:

box.schema.user.grant('testuser', 'read', 'space', 'writers')
box.schema.user.grant('testuser', 'read,write', 'space', 'books')

Learn more about granting privileges to different types of objects from Granting privileges.

To check whether the specified user exists, call box.schema.user.exists():

box.schema.user.exists('testuser')
--[[
- true
--]]

To get information about privileges granted to a user, call box.schema.user.info():

box.schema.user.info('testuser')
--[[
- - - execute
    - role
    - public
  - - read
    - space
    - writers
  - - read,write
    - space
    - books
  - - session,usage
    - universe
    -
  - - alter
    - user
    - testuser
--]]

In the example above, testuser has the following privileges:

  • The execute permission to the public role means that this role is assigned to the user.
  • The read permission to the writers space means that the user can read data from this space.
  • The read,write permissions to the books space mean that the user can read and modify data in this space.
  • The session,usage permissions to universe mean the following:
    • session: the user can authenticate over an IPROTO connection.
    • usage: lets the user use their privileges on database objects (for example, read and modify data in a space).
  • The alter permission lets testuser modify its own settings, for example, a password.
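Reading this output by eye does not scale, so a small helper can answer «does this user have permission X on object Y» directly. This is a sketch with a hypothetical name, `has_privilege`. It assumes the entry layout shown above: `{permissions, object_type, object_name}`, where permissions is a comma-separated string such as `'read,write'`. It checks only direct entries and does not expand privileges inherited through roles.

```lua
-- Check box.schema.user.info()/box.schema.role.info() output for a
-- directly granted permission. Pass object_name = nil to match any
-- object of the given type (e.g. universe).
function has_privilege(info, permission, object_type, object_name)
  for _, entry in ipairs(info) do
    local perms, otype, oname = entry[1], entry[2], entry[3]
    if otype == object_type
        and (object_name == nil or oname == object_name) then
      for perm in perms:gmatch('[^,]+') do
        if perm == permission then
          return true
        end
      end
    end
  end
  return false
end
```

For example, `has_privilege(box.schema.user.info('testuser'), 'write', 'space', 'books')` would return true for the output above.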

To revoke the specified privileges, use the box.schema.user.revoke() function. In the example below, write access to the books space is revoked:

box.schema.user.revoke('testuser', 'write', 'space', 'books')

Revoking the session permission to universe can be used to prevent a user from connecting to a Tarantool instance:

box.schema.user.revoke('testuser', 'session', 'universe')

The current user name can be found using box.session.user().

box.session.user()
--[[
- admin
--]]

You can change the current user:

  • For an admin-console connection: using box.session.su():

    box.session.su('testuser')
    box.session.user()
    --[[
    - testuser
    --]]
    
  • For a binary port connection: using the AUTH protocol command, supported by most clients.

  • For a binary-port connection invoking a stored function with the CALL command: if the SETUID property is enabled for the function, Tarantool temporarily replaces the current user with the function’s creator, with all the creator’s privileges, during function execution.

To drop the specified user, call box.schema.user.drop():

box.schema.user.drop('testuser')

To create a new role, call box.schema.role.create(). In the example below, two roles are created:

box.schema.role.create('books_space_manager')
box.schema.role.create('writers_space_reader')

To grant the specified privileges to a role, use the box.schema.role.grant() function. In the example below, the books_space_manager role gets read and write permissions to the books space:

box.schema.role.grant('books_space_manager', 'read,write', 'space', 'books')

The writers_space_reader role gets read permissions to the writers space:

box.schema.role.grant('writers_space_reader', 'read', 'space', 'writers')

Learn more about granting privileges to different types of objects from Granting privileges.

Note

Not all privileges can be granted to roles. Learn more from Permissions.

Roles can be assigned to other roles. In the example below, the newly created all_spaces_manager role gets all privileges granted to books_space_manager and writers_space_reader:

box.schema.role.create('all_spaces_manager')
box.schema.role.grant('all_spaces_manager', 'books_space_manager')
box.schema.role.grant('all_spaces_manager', 'writers_space_reader')

To grant the specified role to a user, use the box.schema.user.grant() function. In the example below, testuser gets privileges granted to the books_space_manager and writers_space_reader roles:

box.schema.user.grant('testuser', 'books_space_manager')
box.schema.user.grant('testuser', 'writers_space_reader')

To check whether the specified role exists, call box.schema.role.exists():

box.schema.role.exists('books_space_manager')
--[[
- true
--]]

To get information about privileges granted to a role, call box.schema.role.info():

box.schema.role.info('books_space_manager')
--[[
- - - read,write
    - space
    - books
--]]

If a role has the execute permission to other roles, this means that these roles are granted to this parent role:

box.schema.role.info('all_spaces_manager')
--[[
- - - execute
    - role
    - books_space_manager
  - - execute
    - role
    - writers_space_reader
--]]
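Because a granted role shows up as an `execute` entry on a `role` object, the full set of privileges a user or role actually gets can be computed by following those entries. The sketch below uses a hypothetical name, `effective_privileges`. It takes an info list in the format shown above and a `role_info` function used to look up nested roles, which in Tarantool would be `box.schema.role.info`. Each role is visited once.

```lua
-- Expand an info list into the privileges it grants, following nested
-- roles. role_info(name) must return that role's info list.
function effective_privileges(info, role_info, seen, result)
  seen = seen or {}
  result = result or {}
  for _, entry in ipairs(info) do
    local perms, otype, oname = entry[1], entry[2], entry[3]
    if otype == 'role' and perms:find('execute', 1, true) then
      if not seen[oname] then
        seen[oname] = true  -- visit each role only once
        effective_privileges(role_info(oname), role_info, seen, result)
      end
    else
      table.insert(result, entry)
    end
  end
  return result
end
```

For example, `effective_privileges(box.schema.role.info('all_spaces_manager'), box.schema.role.info)` would list the books and writers privileges from both nested roles.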

To revoke the specified role from a user, revoke the execute privilege for this role using the box.schema.user.revoke() function. In the example below, the writers_space_reader role is revoked from testuser:

box.schema.user.revoke('testuser', 'execute', 'role', 'writers_space_reader')

To revoke a role's privileges, use box.schema.role.revoke().

To drop the specified role, call box.schema.role.drop():

box.schema.role.drop('writers_space_reader')

To grant the specified privileges to a user or role, use the box.schema.user.grant() and box.schema.role.grant() functions, which have similar signatures and accept the same set of arguments. For example, the box.schema.user.grant() signature looks as follows:

box.schema.user.grant(username, permissions, object-type, object-name[, {options}])

In the example below, testuser gets privileges allowing them to create any object of any type:

box.schema.user.grant('testuser','read,write,create','universe')

In this example, testuser can grant access to objects that testuser created:

box.schema.user.grant('testuser','write','space','_priv')

In the example below, testuser gets privileges allowing them to create spaces:

box.schema.user.grant('testuser','create','space')
box.schema.user.grant('testuser','write', 'space', '_schema')
box.schema.user.grant('testuser','write', 'space', '_space')

As you can see, the ability to create spaces also requires write access to certain system spaces.
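Since one capability often takes several grants, one option is to keep each set of grants as data and apply it in a loop. This is a minimal sketch: SPACE_CREATOR and apply_grants() are hypothetical names, and the grant function is passed in as an argument, so you can pass box.schema.user.grant or box.schema.role.grant (which accepts the same arguments).

```lua
-- Hypothetical bundle: the three grants shown above that let a user create spaces.
-- Each entry lists the arguments that follow the grantee name:
-- permissions, object type and, optionally, object name.
local SPACE_CREATOR = {
    {'create', 'space'},
    {'write', 'space', '_schema'},
    {'write', 'space', '_space'},
}

-- Lua 5.1/LuaJIT provide a global unpack; Lua 5.2+ provides table.unpack.
local unpack = table.unpack or unpack

-- Calls grant_fn(grantee, <entry values>) for every entry of the bundle, e.g.
-- apply_grants(box.schema.user.grant, 'testuser', SPACE_CREATOR)
local function apply_grants(grant_fn, grantee, bundle)
    for _, entry in ipairs(bundle) do
        grant_fn(grantee, unpack(entry))
    end
end
```

Because box.schema.user.revoke() takes the same permissions, object type, and object name arguments, the same bundle could be passed to it to withdraw these privileges.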

To allow testuser to drop a space that has associated objects, add the following privileges:

box.schema.user.grant('testuser','create,drop','space')
box.schema.user.grant('testuser','write','space','_schema')
box.schema.user.grant('testuser','write','space','_space')
box.schema.user.grant('testuser','write','space','_space_sequence')
box.schema.user.grant('testuser','read','space','_trigger')
box.schema.user.grant('testuser','read','space','_fk_constraint')
box.schema.user.grant('testuser','read','space','_ck_constraint')
box.schema.user.grant('testuser','read','space','_func_index')

In the example below, testuser gets privileges allowing them to create indexes in the writers space:

box.schema.user.grant('testuser','create,read','space','writers')
box.schema.user.grant('testuser','read,write','space','_space_sequence')
box.schema.user.grant('testuser','write', 'space', '_index')

To allow testuser to alter indexes in the writers space, grant the privileges below. This example assumes that indexes in the writers space are not created by testuser.

box.schema.user.grant('testuser','alter','space','writers')
box.schema.user.grant('testuser','read','space','_space')
box.schema.user.grant('testuser','read','space','_index')
box.schema.user.grant('testuser','read','space','_space_sequence')
box.schema.user.grant('testuser','write','space','_index')

If testuser created indexes in the writers space, granting the following privileges is enough to alter indexes:

box.schema.user.grant('testuser','read','space','_space_sequence')
box.schema.user.grant('testuser','read,write','space','_index')

In this example, testuser gets privileges allowing them to select data from the writers space:

box.schema.user.grant('testuser','read','space','writers')

In this example, testuser is allowed to read and modify data in the books space:

box.schema.user.grant('testuser','read,write','space','books')

In this example, testuser gets privileges to create sequence generators:

box.schema.user.grant('testuser','create','sequence')
box.schema.user.grant('testuser', 'read,write', 'space', '_sequence')

To let testuser drop a sequence, grant them the following privileges:

box.schema.user.grant('testuser','drop','sequence')
box.schema.user.grant('testuser','write','space','_sequence_data')
box.schema.user.grant('testuser','write','space','_sequence')

In this example, testuser is allowed to use the id_seq:next() function with a sequence named id_seq:

box.schema.user.grant('testuser','read,write','sequence','id_seq')

In the next example, testuser is allowed to use the id_seq:set() or id_seq:reset() functions with a sequence named id_seq:

box.schema.user.grant('testuser','write','sequence','id_seq')

In this example, testuser gets privileges to create functions:

box.schema.user.grant('testuser','create','function')
box.schema.user.grant('testuser','read,write','space','_func')

To let testuser drop a function, grant them the following privileges:

box.schema.user.grant('testuser','drop','function')
box.schema.user.grant('testuser','write','space','_func')

To allow testuser to execute a function named sum, grant the following privilege:

box.schema.user.grant('testuser','execute','function','sum')

Granting the execute privilege on lua_call permits the user to call any global (accessible via the _G Lua table) user-defined Lua function with the IPROTO_CALL request. To allow calling only a specific non-persistent function, specify its name when granting the lua_call privilege.

Note

The function doesn’t need to be defined when the privilege is granted: the user gets access to the function as soon as it is defined.

function my_func_1() end
function my_func_2() end
box.cfg({listen = 3301})
box.schema.user.create('alice', {password = 'secret'})
conn = require('net.box').connect(box.cfg.listen, {user = 'alice', password = 'secret'})
box.schema.user.grant('alice', 'execute', 'lua_call', 'my_func_1')
conn:call('my_func_1') -- ok
conn:call('my_func_2') -- access denied
box.schema.user.grant('alice', 'execute', 'lua_call', 'box.session.su')
conn:call('box.session.su', {'admin'}) -- ok

In this example, testuser gets privileges to create other users:

box.schema.user.grant('testuser','create','user')
box.schema.user.grant('testuser', 'read,write', 'space', '_user')
box.schema.user.grant('testuser', 'write', 'space', '_priv')

To let testuser create new roles, grant the following privileges:

box.schema.user.grant('testuser','create','role')
box.schema.user.grant('testuser', 'read,write', 'space', '_user')
box.schema.user.grant('testuser', 'write', 'space', '_priv')

To let testuser execute Lua code, grant the execute privilege on the lua_eval object:

box.schema.user.grant('testuser','execute','lua_eval')

Similarly, executing arbitrary SQL expressions requires the execute privilege on the sql object:

box.schema.user.grant('testuser','execute','sql')

In the example below, the created Lua function is executed on behalf of its creator, even if called by another user.

First, the two spaces (space1 and space2) are created, and a no-password user (private_user) is granted full access to them. Then read_and_modify is defined and private_user becomes this function’s creator. Finally, another user (public_user) is granted access to execute Lua functions created by private_user.

box.schema.space.create('space1')
box.schema.space.create('space2')
box.space.space1:create_index('pk')
box.space.space2:create_index('pk')

box.schema.user.create('private_user')

box.schema.user.grant('private_user', 'read,write', 'space', 'space1')
box.schema.user.grant('private_user', 'read,write', 'space', 'space2')
box.schema.user.grant('private_user', 'create', 'universe')
box.schema.user.grant('private_user', 'read,write', 'space', '_func')

function read_and_modify(key)
  local space1 = box.space.space1
  local space2 = box.space.space2
  local fiber = require('fiber')
  local t = space1:get{key}
  if t ~= nil then
    space1:put{key, box.session.uid()}
    space2:put{key, fiber.time()}
  end
end

box.session.su('private_user')
box.schema.func.create('read_and_modify', {setuid= true})
box.session.su('admin')
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')

Whenever public_user calls the function, it is executed on behalf of its creator, private_user.

Object type Description
universe A database (box.schema) that contains database objects, including spaces, indexes, users, roles, sequences, and functions. Granting privileges to universe gives a user access to any object in the database.
user A user.
role A role.
space A space.
function A function.
sequence A sequence.
lua_eval Executing arbitrary Lua code.
lua_call Calling any global user-defined Lua function.
sql Executing an arbitrary SQL expression.

Permission Object type Granted to roles Description
read All Yes Allows reading data of the specified object. For example, this permission can be used to allow a user to select data from the specified space.
write All Yes Allows updating data of the specified object. For example, this permission can be used to allow a user to modify data in the specified space.
create All Yes Allows creating objects of the specified type. For example, this permission can be used to allow a user to create new spaces. Note that this permission requires read and write access to certain system spaces.
alter All Yes Allows altering objects of the specified type. Note that this permission requires read and write access to certain system spaces.
drop All Yes Allows dropping objects of the specified type. Note that this permission requires read and write access to certain system spaces.
execute role, universe, function, lua_eval, lua_call, sql Yes For role, allows using the specified role. For the other object types, allows calling functions or executing Lua code or SQL expressions, depending on the object type.
session universe No Allows a user to connect to an instance over IPROTO.
usage universe No Allows a user to use their privileges on database objects (for example, read, write, and alter spaces).

Object type Details
universe
  • read: Allows reading objects of any type, including all spaces and sequences.
  • write: Allows modifying objects of any type, including all spaces and sequences.
  • execute: Allows executing functions, Lua code, or SQL expressions, including IPROTO calls.
  • session: Allows a user to connect to an instance over IPROTO.
  • usage: Allows a user to use their privileges on database objects (for example, read, write, and alter spaces).
  • create: Allows creating users, roles, functions, spaces, and sequences. This permission requires read and write access to certain system spaces.
  • drop: Allows deleting users, roles, functions, spaces, and sequences. This permission requires read and write access to certain system spaces.
  • alter: Allows altering user settings or space objects.
user
  • alter: Allows modifying a user description, for example, changing the password.
  • create: Allows creating new users. This permission requires read and write access to the _user system space.
  • drop: Allows dropping users. This permission requires read and write access to the _user system space.
role
  • execute: Indicates that a role is assigned to the user or another role.
  • create: Allows creating new roles. This permission requires read and write access to the _user system space.
  • drop: Allows dropping roles. This permission requires read and write access to the _user system space.
space
  • read: Allows selecting data from a space.

  • write: Allows modifying data in a space.

  • create: Allows creating new spaces. This permission requires read and write access to the _space system space.

  • drop: Allows dropping spaces. This permission requires read and write access to the _space system space.

  • alter: Allows modifying spaces. This permission requires read and write access to the _space system space.

    If a space is created by a user, they can read and write it without granting explicit permission.

function
  • execute: Allows calling a function.

  • create: Allows creating a function. This permission requires read and write access to the _func system space.

    If a function is created by a user, they can execute it without granting explicit permission.

  • drop: Allows dropping a function. This permission requires read and write access to the _func system space.

sequence
  • read: Allows using sequences in space_obj:create_index().

  • write: Allows all operations for a sequence object.

    seq_obj:drop() requires a write permission to the _priv system space.

  • create: Allows creating sequences. This permission requires read and write access to the _sequence system space.

    If a sequence is created by a user, they can read/write it without explicit permission.

  • drop: Allows dropping sequences. This permission requires read and write access to the _sequence system space.

  • alter: Has no effect. seq_obj:alter() and other methods require the write permission.

lua_eval
  • execute: Allows executing arbitrary Lua code using the IPROTO_EVAL request.
lua_call
  • execute: Allows executing any user-defined function using the IPROTO_CALL request. This permission doesn’t allow a user to call built-in Lua functions (for example, loadstring() or box.session.su()) and functions defined in the _func system space.
sql
  • execute: Allows executing arbitrary SQL expressions using the IPROTO_PREPARE and IPROTO_EXECUTE requests.

Replication administration

Monitoring a replica set

To learn what instances belong to the replica set and obtain statistics for all these instances, execute a box.info.replication request. The output below shows the replication status for a replica set containing one master and two replicas:

manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...

The following diagram illustrates the upstream and downstream connections if box.info.replication is executed on the master instance (instance001):

replication status on master

If box.info.replication is executed on instance002, the upstream and downstream connections look as follows:

replication status on replica

This means that replica statistics are given relative to the instance on which box.info.replication is executed.
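The fields above can also be checked from code, for example in a periodic health check. The sketch below is a hypothetical helper, replication_problems(), not part of Tarantool. It assumes the table layout shown above and flags peers whose upstream or downstream status is not follow, or whose upstream lag exceeds a threshold you choose (in seconds). Entries without upstream or downstream sections, such as the instance itself, are skipped, so a peer that was never connected would not be flagged.

```lua
-- Hypothetical helper: scans a table shaped like box.info.replication
-- and returns {[peer_id] = {issue, ...}} for peers that look unhealthy.
local function replication_problems(replication, max_lag)
    local problems = {}
    for id, peer in pairs(replication) do
        local issues = {}
        local up, down = peer.upstream, peer.downstream
        if up ~= nil then
            if up.status ~= 'follow' then
                issues[#issues + 1] = 'upstream status: ' .. tostring(up.status)
            elseif up.lag ~= nil and up.lag > max_lag then
                issues[#issues + 1] = string.format('upstream lag: %.3f s', up.lag)
            end
        end
        -- Older versions report downstream without a status field; skip those.
        if down ~= nil and down.status ~= nil and down.status ~= 'follow' then
            issues[#issues + 1] = 'downstream status: ' .. tostring(down.status)
        end
        if #issues > 0 then
            problems[id] = issues
        end
    end
    return problems
end
```

On a running instance you would call it as replication_problems(box.info.replication, 1), where the 1-second limit is only an illustrative value.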

Key indicators of replication health:

Recovering from a degraded state

A degraded state is a situation when the master becomes unavailable due to hardware or network failure, or due to a programming bug.


The master’s upstream status is reported as disconnected when executing box.info.replication on a replica:

auto_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 32
    upstream:
      peer: replicator@127.0.0.1:3302
      lag: 0.00032305717468262
      status: disconnected
      idle: 48.352504000002
      message: 'connect, called on fd 20, aka 127.0.0.1:62575: Connection refused'
      system_message: Connection refused
    name: instance002
    downstream:
      status: stopped
      message: 'unexpected EOF when reading from socket, called on fd 32, aka 127.0.0.1:3301,
        peer of 127.0.0.1:62204: Broken pipe'
      system_message: Broken pipe
  2:
    id: 2
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 1
    name: instance001
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.18620999999985
      peer: replicator@127.0.0.1:3303
      lag: 0.00012516975402832
    name: instance003
    downstream:
      status: follow
      idle: 0.19718099999955
      vclock: {2: 1, 1: 32}
      lag: 0.00051403045654297
...

To learn how to perform manual failover in a master-replica set, see the Performing manual failover section.

In a master-replica configuration with automated failover, a new master should be elected automatically.

Reseeding a replica

If any of a replica’s write-ahead log or snapshot files are corrupted or deleted, you can reseed the replica. This procedure works only if the master’s write-ahead logs are present.

  1. Stop the replica using the tt stop command.

  2. Delete write-ahead logs and snapshots stored in the var/lib/<instance_name> directory.

    Note

    var/lib is the default directory used by tt to store write-ahead logs and snapshots. Learn more from Configuration.

  3. Start the replica using the tt start command. The replica should catch up with the master by retrieving all the master’s tuples.

  4. (Optional) If you’re reseeding a replica after a replication conflict, you also need to restart replication.

Resolving replication conflicts

Tarantool guarantees that every update is applied only once on every replica. However, due to the asynchronous nature of replication, the order of updates is not guaranteed. This topic describes how to solve problems in master-master replication.

Case 1: You have two Tarantool instances, and you perform a replace operation with the same primary key on both of them at the same time. This causes a conflict over which tuple to keep and which one to discard.

Tarantool triggers can help implement conflict resolution rules under certain conditions. For example, if your tuples contain a timestamp, you can specify that the tuple with the greater timestamp is kept.

First, you need a before_replace() trigger on the space which may have conflicts. In this trigger, you can compare the old and new replica records and choose which one to use (or skip the update entirely, or merge two records together).

Then you need to set this trigger at the right time, before the space starts receiving any updates. The before_replace trigger is usually set right when the space is created, so you need a trigger on the system space _space that captures the moment your space is created and sets the trigger there. This can be an on_replace() trigger.

The difference between before_replace and on_replace is that on_replace is called after a row is inserted into the space, while before_replace is called before it.

The _space:on_replace() trigger must also be set at the right moment. The best time to set it is right after _space has been created, which is signaled by the box.ctl.on_schema_init() trigger.

You also need to use box.on_commit to access the space being created. The resulting snippet looks as follows:

local my_space_name = 'my_space'
-- Placeholder: replace the body with your conflict resolution logic
local my_trigger = function(old, new) return new end
box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)

Case 2: In a replica set of two masters, both of them try to insert data by the same unique key:

tarantool> box.space.tester:insert{1, 'data'}

This results in a duplicate key error (Duplicate key exists in unique index 'primary' in space 'tester'), and replication stops. Replication stops because the replication_skip_conflict configuration parameter is set to false, which is its default and recommended value.

$ # error messages from master #1
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

$ # error messages from master #2
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

If we check the replication status with box.info, we see that replication on master #1 is stopped (1.upstream.status = stopped). In addition, data from this master is not being replicated (the 1.downstream group is missing from the report) because the same error occurs:

# replication statuses (report from master #3)
tarantool> box.info
---
- version: 1.7.4-52-g980d30092
  id: 3
  ro: false
  vclock: {1: 9, 2: 1000000, 3: 3}
  uptime: 557
  lsn: 3
  vinyl: []
  cluster:
    uuid: 34d13b1a-f851-45bb-8f57-57489d3b3c8b
  pid: 30445
  status: running
  signature: 1000012
  replication:
    1:
      id: 1
      uuid: 7ab6dee7-dc0f-4477-af2b-0e63452573cf
      lsn: 9
      upstream:
        peer: replicator@192.168.0.101:3301
        lag: 0.00050592422485352
        status: stopped
        idle: 445.8626639843
        message: Duplicate key exists in unique index 'primary' in space 'tester'
    2:
      id: 2
      uuid: 9afbe2d9-db84-4d05-9a7b-e0cbbf861e28
      lsn: 1000000
      upstream:
        status: follow
        idle: 201.99915885925
        peer: replicator@192.168.0.102:3301
        lag: 0.0015020370483398
      downstream:
        vclock: {1: 8, 2: 1000000, 3: 3}
    3:
      id: 3
      uuid: e826a667-eed7-48d5-a290-64299b159571
      lsn: 3
  uuid: e826a667-eed7-48d5-a290-64299b159571
...

To learn how to resolve a replication conflict by reseeding a replica, see Resolving replication conflicts.

Suppose we perform the following operation in a cluster of two instances with a master-master configuration:

tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

When this operation is applied on both instances in the replica set:

# on master #1
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
# on master #2
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

… you can get the following results, depending on the order of execution:

  • each master’s row contains the UUID of master #1,
  • each master’s row contains the UUID of master #2,
  • master #1 has the UUID of master #2, and vice versa.

The cases described in the previous paragraphs are examples of non-commutative operations, that is, operations whose result depends on the execution order. In contrast, for commutative operations, the execution order does not matter.

Consider, for example, the following command:

tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})

This operation is commutative: the result is the same regardless of the order in which the update is applied on the other masters.
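To see why the order matters for = but not for + (with integer values), here is a simplified plain-Lua model of the two operators. It is only an illustration, not Tarantool's upsert implementation: apply() and commutes() are hypothetical helpers, and the strings 'uuid-1' and 'uuid-2' used afterwards stand in for the two masters' box.info.uuid values.

```lua
-- Simplified model in plain Lua, not Tarantool's upsert: a row is an array
-- of fields, and an operation is {operator, field_number, argument}.
local function apply(row, op)
    local operator, field, arg = op[1], op[2], op[3]
    local result = {}
    for i = 1, #row do
        result[i] = row[i]  -- copy, so the input row is left untouched
    end
    if operator == '=' then
        result[field] = arg                  -- assignment: last writer wins
    elseif operator == '+' then
        result[field] = result[field] + arg  -- integer addition: order-independent
    else
        error('unsupported operator: ' .. tostring(operator))
    end
    return result
end

-- Applies op_a and op_b in both orders and reports whether the results match.
local function commutes(row, op_a, op_b)
    local ab = apply(apply(row, op_a), op_b)
    local ba = apply(apply(row, op_b), op_a)
    if #ab ~= #ba then
        return false
    end
    for i = 1, #ab do
        if ab[i] ~= ba[i] then
            return false
        end
    end
    return true
end
```

For example, commutes({1, 0}, {'+', 2, 5}, {'+', 2, 3}) returns true, while commutes({1, 0}, {'=', 2, 'uuid-1'}, {'=', 2, 'uuid-2'}) returns false, because the assignment applied last wins.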

The logic and the snippet that sets the trigger are the same here as in case 1, but the trigger function differs. Note that the trigger below assumes that each tuple has a timestamp in the second field.

local my_space_name = 'test'
local my_trigger = function(old, new, sp, op)
    -- op: 'INSERT', 'DELETE', 'UPDATE', or 'REPLACE'
    if new == nil then
        print("No new during "..op, old)
        return -- deletion is allowed
    end
    if old == nil then
        print("Insert new, no old", new)
        return new  -- insertion without an old value is allowed
    end
    print(op.." duplicate", old, new)
    if op == 'INSERT' then
        if new[2] > old[2] then
            -- Creating a new tuple changes the operation to REPLACE
            return box.tuple.new(new)
        end
        return old
    end
    -- Keep the tuple with the greater timestamp
    if new[2] > old[2] then
        return new
    else
        return old
    end
end

box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)

Server introspection

Tarantool enters interactive mode if:

Tarantool displays a prompt (for example, "tarantool>"), and you can send requests. When used this way, Tarantool can act as a client for a remote server; see simple examples in the Getting Started guide.

The interactive mode is used in the tt utility’s connect command.

You can attach to an instance’s admin console and execute some Lua code using tt:

$ # for local instances:
$ tt connect my_app
   • Connecting to the instance...
   • Connected to /var/run/tarantool/my_app.control

/var/run/tarantool/my_app.control> 1 + 1
---
- 2
...
/var/run/tarantool/my_app.control>

$ # for local and remote instances:
$ tt connect username:password@127.0.0.1:3306

You can also use tt to execute Lua code on an instance without attaching to its admin console. For example:

$ # executing commands directly from the command line
$ <command> | tt connect my_app -f -
<...>

$ # - OR -

$ # executing commands from a script file
$ tt connect my_app -f script.lua
<...>

Note

Alternatively, you can use the console module or the net.box module from a Tarantool server. Also, you can write your client programs with any of the connectors. However, most of the examples in this manual illustrate usage with either tt connect or using the Tarantool server as a client.

To check the instance status, run:

$ tt status my_app

$ # - OR -

$ systemctl status tarantool@my_app

To check the boot log, on systems with systemd, run:

$ journalctl -u tarantool@my_app -n 5

For more specific checks, use the reports provided by functions in the following submodules:

Finally, there is the metrics library, which enables collecting metrics (such as memory usage or number of requests) from Tarantool applications and exposing them via various protocols, including Prometheus. Check Monitoring for more details.

Example

Administrators very often need to call box.slab.info(), which shows detailed memory usage statistics for a specific Tarantool instance.

tarantool> box.slab.info()
---
- items_size: 228128
  items_used_ratio: 1.8%
  quota_size: 1073741824
  quota_used_ratio: 0.8%
  arena_used_ratio: 43.2%
  items_used: 4208
  quota_used: 8388608
  arena_size: 2325176
  arena_used: 1003632
...
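The *_ratio fields are returned as strings with a percent sign, as in the output above, so they have to be converted to numbers before you can alert on them. A minimal sketch with hypothetical helper names parse_ratio() and quota_pressure(); the 80% threshold in the usage note is an arbitrary example, not a Tarantool recommendation.

```lua
-- Hypothetical helper: converts a ratio string from box.slab.info(),
-- such as '43.2%', into a number (43.2). Returns nil for other formats.
local function parse_ratio(value)
    if type(value) ~= 'string' then
        return nil
    end
    local digits = value:match('^%s*([%d%.]+)%%%s*$')
    return digits and tonumber(digits)
end

-- Returns true when quota_used_ratio is above threshold_pct percent.
local function quota_pressure(slab_info, threshold_pct)
    local used = parse_ratio(slab_info.quota_used_ratio)
    return used ~= nil and used > threshold_pct
end
```

For example: quota_pressure(box.slab.info(), 80).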

Tarantool takes memory from the operating system, for example, when a user inserts a lot of data. You can check how much memory it uses by running the following command (on Linux):

ps -eo args,%mem | grep "tarantool"

Tarantool almost never releases this memory, even if the user deletes everything that was inserted or reduces fragmentation by calling the Lua garbage collector with the collectgarbage function.

As a rule, this does not affect performance. However, to force Tarantool to release the memory, you can call box.snapshot(), stop the instance, and restart it.

Inspecting binary traffic is a tedious task. We offer a Wireshark plugin to simplify the analysis of Tarantool’s traffic.

To enable the plugin, follow the steps below.

Clone the tarantool-dissector repository:

git clone https://github.com/tarantool/tarantool-dissector.git

Copy or symlink the plugin files into the Wireshark plugin directory:

mkdir -p ~/.local/lib/wireshark/plugins
cd ~/.local/lib/wireshark/plugins
ln -s /path/to/tarantool-dissector/MessagePack.lua ./
ln -s /path/to/tarantool-dissector/tarantool.dissector.lua ./

(For the location of the plugin directory on macOS and Windows, please refer to the Plugin folders chapter in the Wireshark documentation.)

Run the Wireshark GUI and ensure that the plugins are loaded:

Now you can inspect incoming and outgoing Tarantool packets with user-friendly annotations.

Visit the project page for details: https://github.com/tarantool/tarantool-dissector.

Sometimes Tarantool runs slower than usual. There can be several reasons: disk issues, CPU-intensive Lua scripts, or misconfiguration. In such cases, the Tarantool log may lack the necessary details, and the only sign of the problem may be log entries like W> too long DELETE: 8.546 sec. Below are tools and techniques that help profile Tarantool performance. This procedure can help when troubleshooting slowdowns.

Note

Most of these tools, except fiber.info(), are intended for GNU/Linux distributions, not for FreeBSD or macOS.

The simplest way to profile is to use Tarantool’s built-in functionality. fiber.info() returns information about all running fibers, including the corresponding C stack traces. This data shows how many fibers are currently running and which C functions are called most often.

First, enter the interactive administrative console of your Tarantool instance:

$ tt connect NAME|URI

Then load the fiber module:

tarantool> fiber = require('fiber')

Now you can get the required information with fiber.info().

At this point, your console should show the following:

tarantool> fiber = require('fiber')
---
...
tarantool> fiber.info()
---
- 360:
    csw: 2098165
    backtrace:
    - '#0 0x4d1b77 in wal_write(journal*, journal_entry*)+487'
    - '#1 0x4bbf68 in txn_commit(txn*)+152'
    - '#2 0x4bd5d8 in process_rw(request*, space*, tuple**)+136'
    - '#3 0x4bed48 in box_process1+104'
    - '#4 0x4d72f8 in lbox_replace+120'
    - '#5 0x50f317 in lj_BC_FUNCC+52'
    fid: 360
    memory:
      total: 61744
      used: 480
    name: main
  129:
    csw: 113
    backtrace: []
    fid: 129
    memory:
      total: 57648
      used: 0
    name: 'console/unix/:'
...

We recommend giving the fibers you create descriptive names so that they are easy to find in the fiber.info() output. The example below creates a fiber named myworker:

tarantool> fiber = require('fiber')
---
...
tarantool> f = fiber.create(function() while true do fiber.sleep(0.5) end end)
---
...
tarantool> f:name('myworker') -- assign a name to the fiber
---
...
tarantool> fiber.info()
---
- 102:
    csw: 14
    backtrace:
    - '#0 0x501a1a in fiber_yield_timeout+90'
    - '#1 0x4f2008 in lbox_fiber_sleep+72'
    - '#2 0x5112a7 in lj_BC_FUNCC+52'
    fid: 102
    memory:
      total: 57656
      used: 0
    name: myworker  # the newly created background fiber
  101:
    csw: 284
    backtrace: []
    fid: 101
    memory:
      total: 57656
      used: 0
    name: interactive
...

To forcibly terminate a fiber, use fiber.kill(fid):

tarantool> fiber.kill(102)
---
...
tarantool> fiber.info()
---
- 101:
    csw: 324
    backtrace: []
    fid: 101
    memory:
      total: 57656
      used: 0
    name: interactive
...
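When several fibers share a name, you can look up their IDs from the fiber.info() result instead of copying them by hand. A sketch with a hypothetical helper, find_fibers(), which assumes the layout shown above: a table keyed by fiber ID whose entries have a name field.

```lua
-- Hypothetical helper: returns the sorted IDs of all fibers with the given
-- name, from a table shaped like the fiber.info() output above.
local function find_fibers(info, name)
    local fids = {}
    for fid, f in pairs(info) do
        if f.name == name then
            fids[#fids + 1] = fid
        end
    end
    table.sort(fids)
    return fids
end
```

For example, to stop every fiber named myworker: for _, fid in ipairs(find_fibers(fiber.info(), 'myworker')) do fiber.kill(fid) end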

To get a table showing the CPU usage of all running fibers, you can use fiber.top().
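fiber.top() is disabled by default; you turn it on with fiber.top_enable() and off with fiber.top_disable(). The sketch below, busiest_fibers(), is a hypothetical helper that ranks fibers by average CPU usage. It assumes that fiber.top() returns a cpu table whose keys look like '102/myworker' and whose values contain an average field (the fiber's average CPU share, in percent); check this against the output of your Tarantool version.

```lua
-- Hypothetical helper: ranks entries of a map shaped like fiber.top().cpu
-- and returns the n busiest as {key, average} pairs, highest first.
local function busiest_fibers(cpu, n)
    local rows = {}
    for key, stats in pairs(cpu) do
        rows[#rows + 1] = {key, stats.average or 0}
    end
    table.sort(rows, function(a, b) return a[2] > b[2] end)
    local top = {}
    for i = 1, math.min(n, #rows) do
        top[i] = rows[i]
    end
    return top
end
```

For example, call fiber.top_enable(), let the instance run under load for a while, and then call busiest_fibers(fiber.top().cpu, 3).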

If you need to collect fiber.info() output continuously, the script below may help. Every half second, it connects to the Tarantool instance specified by NAME, runs fiber.info(), and appends the output to the fiber-info.txt file:

$ rm -f fiber-info.txt
$ watch -n 0.5 "echo 'require(\"fiber\").info()' | tt connect NAME -f - | tee -a fiber-info.txt"

If you cannot figure out on your own which fiber causes the performance issues, run this script for 10-15 seconds and send the resulting file to the Tarantool team at support@tarantool.org.

pstack <pid>

To use this tool, first install it with the package manager of your Linux distribution. This command prints the execution stack trace of the running process with the given PID. If needed, run the command several times to find the bottleneck that causes the slowdown.

Once it is installed, run the following command:

$ pstack $(pidof tarantool INSTANCENAME.lua)

Then run:

$ echo $(pidof tarantool INSTANCENAME.lua)

to display the PID of the Tarantool instance that uses the INSTANCENAME.lua file.

Your console should show something like this:

Thread 19 (Thread 0x7f09d1bff700 (LWP 24173)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
Thread 18 (Thread 0x7f09d13fe700 (LWP 24174)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
<...>
Thread 2 (Thread 0x7f09c8bfe700 (LWP 24191)):
#0 0x00007f0a1ad5e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000045d901 in wal_writer_pop(wal_writer*) ()
#2 0x000000000045db01 in wal_writer_f(__va_list_tag*) ()
#3 0x0000000000429abc in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) ()
#4 0x00000000004b52a0 in fiber_loop ()
#5 0x00000000006099cf in coro_init ()
Thread 1 (Thread 0x7f0a1c47fd80 (LWP 24172)):
#0 0x00007f0a1a0512c3 in epoll_wait () from /lib64/libc.so.6
#1 0x00000000006051c8 in epoll_poll ()
#2 0x0000000000607533 in ev_run ()
#3 0x0000000000428e13 in main ()
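Both pstack and gdb -p expect exactly one PID, while $(pidof tarantool INSTANCENAME.lua) may expand to several PIDs when several tarantool processes run on the host, or to nothing when none does. A small guard such as the hypothetical require_single_pid function below (a sketch, not part of any Tarantool tooling) fails early in those cases instead of attaching to the wrong process.

```shell
#!/bin/sh
# require_single_pid "PID_LIST"
# Hypothetical guard: prints the PID if the list contains exactly one
# entry; otherwise prints an error to stderr and returns 1.
require_single_pid() {
    # Word-split the list on purpose to count its entries.
    set -- $1
    if [ "$#" -ne 1 ]; then
        echo "expected exactly one PID, got $#: $*" >&2
        return 1
    fi
    echo "$1"
}
```

Usage: pid=$(require_single_pid "$(pidof tarantool INSTANCENAME.lua)") && pstack "$pid".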

gdb -ex "bt" -p <pid>

As with pstack, the GNU debugger (also known as gdb) must first be installed with the package manager that comes with your Linux distribution.

Once installed, run the following command:

$ gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof tarantool INSTANCENAME.lua)

Then run:

$ echo $(pidof tarantool INSTANCENAME.lua)

to print the PID of the Tarantool instance that uses the INSTANCENAME.lua file.

After you run the debugger, the console should display output like the following:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

[CUT]

Thread 1 (Thread 0x7f72289ba940 (LWP 20535)):
#0 _int_malloc (av=av@entry=0x7f7226e0eb20 <main_arena>, bytes=bytes@entry=504) at malloc.c:3697
#1 0x00007f7226acf21a in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3234
#2 0x00000000004631f8 in vy_merge_iterator_reserve (capacity=3, itr=0x7f72264af9e0) at /usr/src/tarantool/src/box/vinyl.c:7629
#3 vy_merge_iterator_add (itr=itr@entry=0x7f72264af9e0, is_mutable=is_mutable@entry=true, belong_range=belong_range@entry=false) at /usr/src/tarantool/src/box/vinyl.c:7660
#4 0x00000000004703df in vy_read_iterator_add_mem (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8387
#5 vy_read_iterator_use_range (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8453
#6 0x000000000047657d in vy_read_iterator_start (itr=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:8501
#7 0x00000000004766b5 in vy_read_iterator_next (itr=itr@entry=0x7f72264af990, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:8592
#8 0x000000000047689d in vy_index_get (tx=tx@entry=0x7f7226468158, index=index@entry=0x2563860, key=<optimized out>, part_count=<optimized out>, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:5705
#9 0x0000000000477601 in vy_replace_impl (request=<optimized out>, request=<optimized out>, stmt=0x7f72265a7150, space=0x2567ea0, tx=0x7f7226468158) at /usr/src/tarantool/src/box/vinyl.c:5920
#10 vy_replace (tx=0x7f7226468158, stmt=stmt@entry=0x7f72265a7150, space=0x2567ea0, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:6608
#11 0x00000000004615a9 in VinylSpace::executeReplace (this=<optimized out>, txn=<optimized out>, space=<optimized out>, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl_space.cc:108
#12 0x00000000004bd723 in process_rw (request=request@entry=0x7f72265a70f8, space=space@entry=0x2567ea0, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:182
#13 0x00000000004bed48 in box_process1 (request=0x7f72265a70f8, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:700
#14 0x00000000004bf389 in box_replace (space_id=space_id@entry=513, tuple=<optimized out>, tuple_end=<optimized out>, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:754
#15 0x00000000004d72f8 in lbox_replace (L=0x413c5780) at /usr/src/tarantool/src/box/lua/index.c:72
#16 0x000000000050f317 in lj_BC_FUNCC ()
#17 0x00000000004d37c7 in execute_lua_call (L=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:282
#18 0x000000000050f317 in lj_BC_FUNCC ()
#19 0x0000000000529c7b in lua_cpcall ()
#20 0x00000000004f6aa3 in luaT_cpcall (L=L@entry=0x413c5780, func=func@entry=0x4d36d0 <execute_lua_call>, ud=ud@entry=0x7f72264afde0) at /usr/src/tarantool/src/lua/utils.c:962
#21 0x00000000004d3fe7 in box_process_lua (handler=0x4d36d0 <execute_lua_call>, out=out@entry=0x7f7213020600, request=request@entry=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:382
#22 box_lua_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/lua/call.c:405
#23 0x00000000004c0f27 in box_process_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/box.cc:1074
#24 0x000000000041326c in tx_process_misc (m=0x7f7213040170) at /usr/src/tarantool/src/box/iproto.cc:942
#25 0x0000000000504554 in cmsg_deliver (msg=0x7f7213040170) at /usr/src/tarantool/src/cbus.c:302
#26 0x0000000000504c2e in fiber_pool_f (ap=<error reading variable: value has been optimized out>) at /usr/src/tarantool/src/fiber_pool.c:64
#27 0x000000000041122c in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=<optimized out>, ap=<optimized out>) at /usr/src/tarantool/src/fiber.h:645
#28 0x00000000005011a0 in fiber_loop (data=<optimized out>) at /usr/src/tarantool/src/fiber.c:641
#29 0x0000000000688fbf in coro_init () at /usr/src/tarantool/third_party/coro/coro.c:110

Run the debugger in a loop to collect enough information to pinpoint the cause of the Tarantool performance drop. You can use the following script:

$ rm -f stack-trace.txt
$ watch -n 0.5 "gdb -ex 'set pagination 0' -ex 'thread apply all bt' --batch -p $(pidof tarantool INSTANCENAME.lua) | tee -a stack-trace.txt"

In structure and functionality, this script is identical to the one used with fiber.info() above.

If you can't find the cause of the performance degradation, run this script for 10-15 seconds and send the resulting stack-trace.txt file to the Tarantool team at support@tarantool.org.
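Once stack-trace.txt (or a file with repeated pstack runs) is collected, a quick way to look for a hot spot is the so-called poor man's profiler: count how often each function shows up as the innermost frame (#0) across all samples. The top_frames function below is a hypothetical awk-based sketch of that idea, assuming the two frame layouts shown in the outputs above (#0 0xADDR in func ... and #0 func (...)). Idle wait points such as epoll_wait or pthread_cond_wait are expected to dominate; a busy function that keeps appearing in many samples is a better candidate for the bottleneck.

```shell
#!/bin/sh
# top_frames FILE...
# Poor man's profiler: counts how often each function appears as the
# innermost frame (#0) in gdb or pstack output and prints the counts,
# most frequent first.
top_frames() {
    awk '
    $1 == "#0" {
        # "#0 0xADDR in func (...)" or "#0 func (...)"
        f = ($2 ~ /^0x/ && $3 == "in") ? $4 : $2
        sub(/\(.*/, "", f)      # drop an argument list glued to the name
        count[f]++
    }
    END { for (f in count) print count[f], f }
    ' "$@" | sort -rn
}
```

Usage: top_frames stack-trace.txt | head -n 20.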

Warning

Use pstack and gdb with caution: each time they attach to a running process, they suspend it for about one second, which may have serious consequences for high-load services.

To use the CPU profiler from the Google Performance Tools suite with Tarantool, first install the dependencies:

  • On Debian/Ubuntu, run:
$ apt-get install libgoogle-perftools4
  • On RHEL/CentOS/Fedora, run:
$ yum install gperftools-libs

Then install the Lua bindings:

$ tt rocks install gperftools

Once the installation is complete, connect to the interactive administrative console of your Tarantool instance:

$ tt connect NAME|URI

To start the profiler, run the following code:

tarantool> cpuprof = require('gperftools.cpu')
tarantool> cpuprof.start('/home/<username>/tarantool-on-production.prof')

The profiler needs at least a couple of minutes to collect performance metrics. After that, you can save the collected data to disk (as many times as you want):

tarantool> cpuprof.flush()

To stop the profiler, run:

tarantool> cpuprof.stop()

Now you can analyze the collected data with the pprof utility, which comes with the gperftools package:

$ pprof --text /usr/bin/tarantool /home/<username>/tarantool-on-production.prof

Note

On Debian/Ubuntu, the pprof utility is called google-pprof.

The console output should look roughly like this:

Total: 598 samples
      83 13.9% 13.9% 83 13.9% epoll_wait
      54 9.0% 22.9% 102 17.1% vy_mem_tree_insert.constprop.35
      32 5.4% 28.3% 34 5.7% __write_nocancel
      28 4.7% 32.9% 42 7.0% vy_mem_iterator_start_from
      26 4.3% 37.3% 26 4.3% _IO_str_seekoff
      21 3.5% 40.8% 21 3.5% tuple_compare_field
      19 3.2% 44.0% 19 3.2% ::TupleCompareWithKey::compare
      19 3.2% 47.2% 38 6.4% tuple_compare_slowpath
      12 2.0% 49.2% 23 3.8% __libc_calloc
       9 1.5% 50.7% 9 1.5% ::TupleCompare::compare@42efc0
       9 1.5% 52.2% 9 1.5% vy_cache_on_write
       9 1.5% 53.7% 57 9.5% vy_merge_iterator_next_key
       8 1.3% 55.0% 8 1.3% __nss_passwd_lookup
       6 1.0% 56.0% 25 4.2% gc_onestep
       6 1.0% 57.0% 6 1.0% lj_tab_next
       5 0.8% 57.9% 5 0.8% lj_alloc_malloc
       5 0.8% 58.7% 131 21.9% vy_prepare

This performance monitoring and analysis tool is installed separately with your package manager. Type perf in the console and follow the prompts to install the required packages.

Note

By default, some perf commands can only be run with root privileges, so either log in as root or prefix each command with sudo.

To start collecting performance data, run:

$ perf record -g -p $(pidof tarantool INSTANCENAME.lua)

This command saves the collected data to the perf.data file in the current working directory. To stop collecting (usually after 10-15 seconds), press Ctrl+C. The console should show:

^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.225 MB perf.data (1573 samples) ]

Then run:

$ perf report -n -g --stdio | tee perf-report.txt

It turns the statistics in perf.data into a performance report saved to the perf-report.txt file.

The resulting report looks like this:

# Samples: 14K of event 'cycles'
# Event count (approx.): 9927346847
#
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ ......... .................. .......................................
#
    35.50% 0.55% 79 tarantool tarantool [.] lj_gc_step
            |
             --34.95%--lj_gc_step
                       |
                       |--29.26%--gc_onestep
                       | |
                       | |--13.85%--gc_sweep
                       | | |
                       | | |--5.59%--lj_alloc_free
                       | | |
                       | | |--1.33%--lj_tab_free
                       | | | |
                       | | | --1.01%--lj_alloc_free
                       | | |
                       | | --1.17%--lj_cdata_free
                       | |
                       | |--5.41%--gc_finalize
                       | | |
                       | | |--1.06%--lj_obj_equal
                       | | |
                       | | --0.95%--lj_tab_set
                       | |
                       | |--4.97%--rehashtab
                       | | |
                       | | --3.65%--lj_tab_resize
                       | | |
                       | | |--0.74%--lj_tab_set
                       | | |
                       | | --0.72%--lj_tab_newkey
                       | |
                       | |--0.91%--propagatemark
                       | |
                       | --0.67%--lj_cdata_free
                       |
                        --5.43%--propagatemark
                                  |
                                   --0.73%--gc_mark

Unlike pstack and gdb, gperftools and perf have low overhead (negligible compared to pstack and gdb): they attach to running processes without long delays and can therefore be used without serious consequences.

The jit.p profiler comes with the Tarantool application server. To load it, run require('jit.p') or require('jit.profile'). There are many parameters for configuring sampling and output; they are described in the LuaJIT profiler documentation, available in the LuaJIT GitHub repository, branch 2.1, file doc/ext_profiler.html.

Example

Create a function that calls a function named f1, which performs 500,000 inserts and deletes in a Tarantool space. Start the profiler, run the function, and stop the profiler. Then print the sampling results.

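-- Drop the space left over from a previous run, if any
if box.space.t then box.space.t:drop() end
box.schema.space.create('t')
box.space.t:create_index('i')
-- f1 performs 500,000 insert/delete pairs
function f1()
  for i = 1, 500000 do
    box.space.t:insert{i}
    box.space.t:delete{i}
  end
  return 1
end
-- f3 is the function to profile; it only calls f1
function f3() f1() end
jit_p = require("jit.profile")
sampletable = {}
-- Mode "f" samples at function granularity; the callback counts
-- samples per one-frame stack dump (the current function name)
jit_p.start("f", function(thread, samples, vmstate)
  local dump = jit_p.dumpstack(thread, "f", 1)
  sampletable[dump] = (sampletable[dump] or 0) + samples
end)
f3()
jit_p.stop()
-- Print the sample count for each function
for d, v in pairs(sampletable) do print(v, d) end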

Typically, the output shows that samples were taken many times inside f1(), as well as inside internal Tarantool functions, whose names may change from version to version.

Daemon supervision

During the event loop in the transaction processor (TX) thread, Tarantool handles the following signals:

Signal      Effect
SIGHUP      May cause log rotation. See the example in the Tarantool logging parameters reference.
SIGUSR1     May cause a database snapshot (checkpoint). See box.snapshot().
SIGTERM     May cause a graceful shutdown (with all data saved first).
SIGINT      May cause a graceful shutdown. Also known as the keyboard interrupt (Ctrl+C).
SIGKILL     Causes an immediate, non-graceful shutdown.

All other signals result in the behavior defined by the operating system. Signals other than SIGKILL may be ignored, especially if Tarantool is executing a long-running procedure that prevents it from returning to the event loop in the TX thread.

On systemd-enabled platforms, systemd automatically restarts all Tarantool instances in case of failure. To demonstrate this, let's crash one of the instances:
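Following the table, a manual stop is usually done in two stages: send SIGTERM so that Tarantool can shut down gracefully, and send SIGKILL only if the process does not exit in time. The graceful_stop function below is a hypothetical sketch of that escalation; the default 30-second timeout is an arbitrary assumption. For instances managed by systemd or tt, prefer systemctl stop or tt stop. Note that kill -0 also succeeds on a child process that has exited but has not been reaped yet (a zombie), so the sketch is meant for daemons that are not children of the calling shell.

```shell
#!/bin/sh
# graceful_stop PID [TIMEOUT_SECONDS]
# Hypothetical helper: sends SIGTERM, waits up to TIMEOUT_SECONDS
# (default 30) for the process to exit, and falls back to SIGKILL.
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed.
graceful_stop() {
    pid=$1
    timeout=${2:-30}
    # Nothing to do if the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```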

$ systemctl status tarantool@my_app|grep PID
Main PID: 5885 (tarantool)
$ tt connect my_app
   • Connecting to the instance...
   • Connected to /var/run/tarantool/my_app.control
/var/run/tarantool/my_app.control> os.exit(-1)
   ⨯ Connection was closed. Probably instance process isn't running anymore

Now let's make sure that systemd has restarted it:

$ systemctl status tarantool@my_app|grep PID
Main PID: 5914 (tarantool)

Additionally, you can find information about the instance restart in the systemd journal:

$ journalctl -u tarantool@my_app -n 8

Tarantool generates a core dump on receiving any of the following signals: SIGSEGV, SIGFPE, SIGABRT, or SIGQUIT. If Tarantool crashes, a core dump is generated automatically.

On systemd-enabled platforms, coredumpctl automatically saves core dumps and stack traces when a Tarantool server crashes. Here is how to enable core dumps on a Unix system:

  1. Make sure the session limits allow core dumps: run ulimit -c unlimited. Also see man 5 core for other reasons why a core dump may not be created.
  2. Create a directory for core dumps and make sure it is actually writable. On Linux, the directory path is set by a kernel parameter configured via /proc/sys/kernel/core_pattern.
  3. Make sure core dumps include stack trace information. If you use a binary Tarantool distribution, this information is included automatically. If you build Tarantool from source and pass the CMake flag -DCMAKE_BUILD_TYPE=Release, you will not get detailed information.
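Steps 1 and 2 can be checked from the shell before you start the instance. The check_core_dump_settings function below is a hypothetical sketch: it prints the core file size limit of the current session (ulimit -c, which bash and dash support) and, on Linux, the contents of /proc/sys/kernel/core_pattern. According to man 5 core, a pattern that starts with | means core dumps are piped to a helper program instead of being written to a directory.

```shell
#!/bin/sh
# check_core_dump_settings
# Prints the core file size limit of the current shell session and the
# kernel core_pattern, with a hint when core dumps look disabled.
check_core_dump_settings() {
    limit=$(ulimit -c)
    echo "core file size limit: $limit"
    if [ "$limit" = "0" ]; then
        echo "hint: run 'ulimit -c unlimited' in the session that starts Tarantool"
    fi
    pattern_file=/proc/sys/kernel/core_pattern
    if [ -r "$pattern_file" ]; then
        pattern=$(cat "$pattern_file")
        echo "core_pattern: $pattern"
        case $pattern in
            '|'*) echo "note: core dumps are piped to a helper program (for example, systemd-coredump)" ;;
        esac
    fi
}
```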

To simulate a crash, you can try to execute an illegal command on a running Tarantool instance:

$ # !!! please never do this on a production system !!!
$ tt connect my_app
   • Connecting to the instance...
   • Connected to /var/run/tarantool/my_app.control
/var/run/tarantool/my_app.control> require('ffi').cast('char *', 0)[0] = 48
   ⨯ Connection was closed. Probably instance process isn't running anymore

Alternatively, if you know the instance PID ($PID in our example), you can make gdb write a core file for the running instance:

$ gdb -batch -ex "generate-core-file" -p $PID

or abort the instance by sending the SIGABRT signal manually:

$ kill -SIGABRT $PID

Note

To find out the instance PID, you can:

  • look it up with box.info.pid,
  • use the ps -A | grep tarantool command, or
  • run systemctl status tarantool@my_app|grep PID.

To view the latest crashes of the Tarantool daemon on systemd-enabled platforms, run:

$ coredumpctl list /usr/bin/tarantool
MTIME                            PID   UID   GID SIG PRESENT EXE
Sat 2016-01-23 15:21:24 MSK   20681  1000  1000   6   /usr/bin/tarantool
Sat 2016-01-23 15:51:56 MSK   21035   995   992   6   /usr/bin/tarantool
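To pick the most recent crash from this list in a script, you can take the PID from its last line. The latest_crash_pid function below is a hypothetical sketch that assumes the column layout shown above (weekday, date, time, and time zone before the PID); newer systemd versions print somewhat different columns, so check the layout on your system first.

```shell
#!/bin/sh
# latest_crash_pid
# Reads `coredumpctl list` output on stdin and prints the PID from the
# last (most recent) entry. Assumes weekday, date, time, and time zone
# come first, so the PID is the 5th whitespace-separated field.
latest_crash_pid() {
    awk 'NR > 1 && NF >= 5 { pid = $5 } END { if (pid != "") print pid }'
}
```

Usage: pid=$(coredumpctl list /usr/bin/tarantool | latest_crash_pid) && coredumpctl info "$pid".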

To save a core dump to a file, run:

$ coredumpctl -o filename.core dump <pid>

Since Tarantool stores tuples in memory, core dump files can be quite large. To diagnose a problem, you usually don't need the whole file: the “stack trace” (also called “backtrace”) is enough.

To save the stack trace to a file, run:

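$ gdb -se "tarantool" -ex "bt full" -ex "thread apply all bt" --batch -c core > /tmp/tarantool_trace.txt

where:

  • tarantool is the path to the Tarantool executable,
  • core is the path to the core dump file, and
  • /tmp/tarantool_trace.txt is a sample path to the file that receives the stack trace.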

Note

Sometimes the stack trace file contains no debug symbols: in such lines, “??” appears instead of a function name. If this happens, see the instructions on these two Tarantool wiki pages: How to debug core dump of stripped tarantool and How to debug core from different OS.

To print the stack trace and other useful information to the console, run:

$ coredumpctl info 21035
          PID: 21035 (tarantool)
          UID: 995 (tarantool)
          GID: 992 (tarantool)
       Signal: 6 (ABRT)
    Timestamp: Sat 2016-01-23 15:51:42 MSK (4h 36min ago)
 Command Line: tarantool my_app.lua <running>
   Executable: /usr/bin/tarantool
Control Group: /system.slice/system-tarantool.slice/tarantool@my_app.service
         Unit: tarantool@my_app.service
        Slice: system-tarantool.slice
      Boot ID: 7c686e2ef4dc4e3ea59122757e3067e2
   Machine ID: a4a878729c654c7093dc6693f6a8e5ee
     Hostname: localhost.localdomain
      Message: Process 21035 (tarantool) of user 995 dumped core.

               Stack trace of thread 21035:
               #0  0x00007f84993aa618 raise (libc.so.6)
               #1  0x00007f84993ac21a abort (libc.so.6)
               #2  0x0000560d0a9e9233 _ZL12sig_fatal_cbi (tarantool)
               #3  0x00007f849a211220 __restore_rt (libpthread.so.0)
               #4  0x0000560d0aaa5d9d lj_cconv_ct_ct (tarantool)
               #5  0x0000560d0aaa687f lj_cconv_ct_tv (tarantool)
               #6  0x0000560d0aaabe33 lj_cf_ffi_meta___newindex (tarantool)
               #7  0x0000560d0aaae2f7 lj_BC_FUNCC (tarantool)
               #8  0x0000560d0aa9aabd lua_pcall (tarantool)
               #9  0x0000560d0aa71400 lbox_call (tarantool)
               #10 0x0000560d0aa6ce36 lua_fiber_run_f (tarantool)
               #11 0x0000560d0a9e8d0c _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_ (tarantool)
               #12 0x0000560d0aa7b255 fiber_loop (tarantool)
               #13 0x0000560d0ab38ed1 coro_init (tarantool)
               ...

To start the gdb debugger, run:

$ coredumpctl gdb <pid>

We highly recommend installing the tarantool-debuginfo package to make debugging with gdb more effective. For example:

$ dnf debuginfo-install tarantool

gdb can tell you which other debuginfo packages you need to install:

$ gdb -p <pid>
...
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.22.90-26.fc24.x86_64 krb5-libs-1.14-12.fc24.x86_64
libgcc-5.3.1-3.fc24.x86_64 libgomp-5.3.1-3.fc24.x86_64
libselinux-2.4-6.fc24.x86_64 libstdc++-5.3.1-3.fc24.x86_64
libyaml-0.1.6-7.fc23.x86_64 ncurses-libs-6.0-1.20150810.fc24.x86_64
openssl-libs-1.0.2e-3.fc24.x86_64

Stack traces contain symbolic names even if you don't have the tarantool-debuginfo package installed.

Disaster recovery

The minimal fault-tolerant Tarantool configuration would be a replica set that includes a master and a replica, or two masters. The basic recommendation is to configure all Tarantool instances in a replica set to create snapshot files on a regular basis.

Here are action plans for typical crash scenarios.

Configuration: master-replica (manual failover).

Problem: The master has crashed.

Actions:

  1. Ensure the master is stopped. For example, log in to the master machine and use tt stop.
  2. Configure a new replica set leader using the <replicaset_name>.leader option.
  3. Reload configuration on all instances using config:reload().
  4. Make sure that a new replica set leader is a master using box.info.ro.
  5. On the new master, remove the crashed instance from the _cluster space.
  6. Set up a replacement for the crashed master on a spare host.

See also: Performing manual failover.

Configuration: master-replica (automated failover).

Problem: The master has crashed.

Actions:

  1. Use box.info.election to make sure a new master is elected automatically.
  2. On the new master, remove the crashed instance from the _cluster space.
  3. Set up a replacement for the crashed master on a spare host.

See also: Testing automated failover.

Configuration: master-replica.

Problem: Some transactions are missing on a replica after the master has crashed.

Actions:

You may lose a few transactions from the master's write-ahead log that had not been transferred to the replica before the crash. If you managed to salvage the master's .xlog file, you may be able to recover them.

  1. Find the instance UUID in the crashed master's xlog file:

    $ head -5 var/lib/instance001/*.xlog | grep Instance
    Instance: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    
  2. Use this UUID on the new master to look up the position:

    app:instance002> box.info.vclock[box.space._cluster.index.uuid:select{'9bb111c2-3ff5-36a7-00f4-2b9a573ea660'}[1][1]]
    ---
    - 999
    ...
    
  3. Play the records from the crashed .xlog to the new master, starting from the new master position:

    $ tt play 127.0.0.1:3302 var/lib/instance001/00000000000000000000.xlog \
              --from 1000 \
              --replica 1 \
              --username admin --password secret
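Steps 1 and 3 can be scripted. The two helpers below are a hypothetical sketch: xlog_instance_uuid reads the Instance: line from the text header of an .xlog file (as the head | grep command in step 1 does), and play_from_lsn turns the position returned in step 2 into the first LSN to replay. The new master already has every record of the crashed instance up to that position, which is why the replay starts at position + 1 (999 + 1 = 1000 in the example above). The --replica value in step 3 is the ID of the crashed instance in the _cluster space, the same ID used as the vclock index in step 2.

```shell
#!/bin/sh
# xlog_instance_uuid FILE
# Prints the instance UUID from the text header of a .xlog file
# (the "Instance: <uuid>" line within the first five lines).
xlog_instance_uuid() {
    head -n 5 "$1" | awk '$1 == "Instance:" { print $2; exit }'
}

# play_from_lsn POSITION
# The new master already has every record up to POSITION (its vclock
# component for the crashed instance), so replay starts at the next LSN.
play_from_lsn() {
    echo $(( $1 + 1 ))
}
```

With the values above, play_from_lsn 999 prints 1000, which is the --from value used in step 3.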
    

Configuration: master-master.

Problem: One master has crashed.

Actions:

  1. Let the load be handled by another master alone.
  2. Remove the crashed master from the replica set.
  3. Set up a replacement for the crashed master on a spare host. Learn more from Adding and removing instances.

Configuration: master-replica or master-master.

Problem: Data was deleted at one master and this data loss was propagated to the other node (master or replica).

Actions:

  1. Put all nodes in read-only mode. Depending on the replication.failover mode, this can be done as follows:

    • manual: change a replica set leader to null.
    • election: set replication.election_mode to voter or off at the replica set level.
    • off: set database.mode to ro.

    Reload configurations on all instances using the reload() function provided by the config module.

  2. Turn off deletion of expired checkpoints with box.backup.start(). This prevents the Tarantool garbage collector from removing files made with older checkpoints until box.backup.stop() is called.

  3. Get the latest valid .snap file and use the tt cat command to calculate at which LSN the data loss occurred.

  4. Start a new instance and use the tt play command to replay the contents of the .snap and .xlog files on it up to the calculated LSN.

  5. Bootstrap a new replica from the recovered master.

Note

The steps above are applicable only to data in the memtx storage engine.

Backups

The Tarantool storage architecture is append-only: files are never overwritten. The Tarantool garbage collector removes old files after a certain checkpoint. In the checkpoint daemon settings, you can postpone or disable garbage collection. Backups can be made at any time with minimal overhead.

Two functions can help with backups in certain situations: box.backup.start() and box.backup.stop().

This is a special case when all data is stored in memory (memtx spaces only).

The latest snapshot file created by Tarantool is a backup of the entire database, and the WAL files created after the latest snapshot are incremental copies. Therefore, the backup procedure comes down to copying the latest snapshot file and the WAL files that follow it.

  1. Use tar to make a (possibly compressed) copy of the latest .snap and .xlog files in the snapshot.dir and wal.dir directories.
  2. If your security policy requires it, encrypt the resulting .tar file.
  3. Copy the .tar file to a safe place.

Later, restoring the database is a matter of taking the .tar file and putting its contents back in the snapshot.dir and wal.dir directories.
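Step 1 requires picking the right files. A minimal sketch of that selection, assuming the default file naming (a 20-digit zero-padded LSN, as in 00000000000000000000.xlog), is the hypothetical memtx_backup_files function below. It prints the latest .snap file and the .xlog files that follow it; to be safe, it also keeps the last .xlog whose name is not greater than the snapshot's, since that file may still contain records written after the snapshot.

```shell
#!/bin/sh
# memtx_backup_files SNAP_DIR WAL_DIR
# Lists the files for a memtx hot backup: the latest .snap file plus the
# .xlog files needed to replay everything written after it. File names
# are zero-padded LSNs of equal length, so lexical order is numeric order.
memtx_backup_files() {
    snap_dir=$1
    wal_dir=$2
    latest_snap=$(ls "$snap_dir"/*.snap 2>/dev/null | sort | tail -n 1)
    if [ -z "$latest_snap" ]; then
        echo "no .snap files in $snap_dir" >&2
        return 1
    fi
    snap_name=$(basename "$latest_snap" .snap)
    echo "$latest_snap"
    ls "$wal_dir"/*.xlog 2>/dev/null | sort | awk -v snap="$snap_name" '
        {
            name = $0
            sub(/.*\//, "", name)      # strip the directory
            sub(/\.xlog$/, "", name)   # strip the extension
            # Concatenating "" forces a string comparison.
            if ((name "") <= (snap "")) prev = $0
            else later[++n] = $0
        }
        END {
            if (prev != "") print prev
            for (i = 1; i <= n; i++) print later[i]
        }'
}
```

The list can be passed to GNU tar, for example: memtx_backup_files /var/lib/tarantool/my_app /var/lib/tarantool/my_app | tar -czf backup.tar.gz -T - (the directory paths here are hypothetical; use your snapshot.dir and wal.dir values).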

Vinyl stores its files in vinyl_dir and creates a separate subdirectory for each space in the database. Dumps and compaction are append-only processes, so they produce new files. The Tarantool garbage collector may remove old files after each checkpoint.

To make a mixed backup:

  1. Run box.backup.start() in the administrative console. This command returns a list of files to back up and suspends garbage collection until box.backup.stop() is called.
  2. Copy the listed files to a safe place. This includes memtx snapshot files, vinyl .run files, and .index files corresponding to the latest checkpoint.
  3. Run box.backup.stop() so that the garbage collector can resume its work.
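To script step 2, the file list printed by the console has to be turned into plain paths. The backup_paths function below is a hypothetical sketch that assumes the console renders the result of box.backup.start() as a nested YAML sequence ("- - path" for the first item, "  - path" for the rest) with unquoted paths; check the actual output of your tt or console version before relying on it.

```shell
#!/bin/sh
# backup_paths
# Reads the console YAML for box.backup.start() on stdin and prints one
# file path per line. Assumes a nested YAML sequence with unquoted paths.
backup_paths() {
    sed -n -e 's/^- - //p' -e 's/^  - //p'
}

# Usage sketch (NAME is the instance, as elsewhere on this page):
#   echo 'box.backup.start()' | tt connect NAME -f - | backup_paths > files.txt
#   ... copy the files listed in files.txt to a safe place ...
#   echo 'box.backup.stop()' | tt connect NAME -f -
```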

Replication is useful both for backups and for load balancing.

Therefore, making a backup comes down to bringing one of the replicas up to date (if necessary) and then making a cold backup of it. Since all the other replicas keep working, from the end user's point of view this is not a cold backup. Such backups can be made regularly with cron or a Tarantool fiber.

While the system is running, you need to keep records of the changes made since the last cold backup.

This requires a special file copy utility (for example, rsync) that can remotely and continuously copy only the changed parts of a WAL file instead of the whole file.

You can also use an ordinary utility that copies whole files, but then you have to create snapshot and WAL files on every change so that only new files need to be copied.

Upgrades

Important

This section contains instructions for upgrading Tarantool clusters to versions up to 2.11.x.

This section describes the general upgrade process for Tarantool. There are two main upgrade scenarios for different use cases:

You can also downgrade to an earlier version using a similar procedure.

For information about backwards compatibility, see the compatibility guarantees description.

Upgrading from or to certain versions can involve specific steps or slightly differ from the general upgrade procedure. Such version-specific cases are described on the dedicated pages inside this section.

This section includes the following topics:

Standalone instance upgrade

This page describes the process of upgrading a standalone Tarantool instance in production. Note that this always implies downtime because the application needs to be stopped and restarted on the target version.

To upgrade without downtime, you need multiple Tarantool servers running in a replication cluster. Find detailed instructions in Replication cluster upgrade.

Before upgrading, make sure your application is compatible with the target Tarantool version:

  1. Set up a development environment with the target Tarantool version installed. See the installation instructions at the Tarantool download page and in the tt install reference.
  2. Deploy the application in this environment and check how it works. In case of any issues, adjust the application code to ensure compatibility with the target version.

When your application is ready to run on the target Tarantool version, you can start upgrading the production environment.

  1. Stop the Tarantool instance.

  2. Make a copy of all data and the package from which the current (old) version was installed. You may need it for rollback purposes. Find the backup instruction in the appropriate hot backup procedure in Backups.

  3. Install the target Tarantool version on the host. You can do this using a package manager or the tt utility. See the installation instructions at Tarantool download page and in the tt install reference. To check that the target Tarantool version is installed, run tarantool -v.

  4. Start your application on the target version.

  5. Run box.schema.upgrade(). This will update the Tarantool system spaces to match the currently installed version of Tarantool.

    Note

    To undo the schema upgrade if the upgrade fails, you can use box.schema.downgrade().

The rollback procedure for a standalone instance is almost the same as the upgrade. The only difference is in the last step: you should call box.schema.downgrade() to return the schema to the original version.

Replication cluster upgrade

Below are the general instructions for upgrading a Tarantool cluster with replication. Upgrading from some versions can involve certain specifics. To find out if it is your case, check the version-specific topics of the Upgrades section.

A replication cluster can be upgraded without downtime due to its redundancy. When you disconnect a single instance for an upgrade, there is always another instance that takes over its functionality: being a master storage for the same data buckets or working as a router. This way, you can upgrade all the instances one by one.

The high-level steps of cluster upgrade are the following:

  1. Ensure the application compatibility with the target Tarantool version.
  2. Check the cluster health.
  3. Install the target Tarantool version on the cluster nodes.
  4. Upgrade router nodes one by one.
  5. Upgrade storage replica sets one by one.

Important

The only way to upgrade Tarantool from version 1.6, 1.7, or 1.9 to 2.x without downtime is to take an intermediate step by upgrading to 1.10 and then to 2.x.

Before upgrading Tarantool from 1.6 to 2.x, please read about the associated caveats.

Note

Some upgrade steps have been moved to a separate section, Procedures and checks, to avoid overloading the general instructions with details. Typically, these are checks you should repeat during the upgrade to make sure it goes well.

If you experience issues during upgrade, you can roll back to the original version. The rollback instructions are provided in the Rollback section.

Before upgrading, make sure your application is compatible with the target Tarantool version:

  1. Set up a development environment with the target Tarantool version installed. See the installation instructions at the Tarantool download page and in the tt install reference.
  2. Deploy the application in this environment and check how it works. In case of any issues, adjust the application code to ensure compatibility with the target version.

When your application is ready to run on the target Tarantool version, you can start upgrading the production environment.

Perform these steps before the upgrade to ensure that your cluster is working correctly:

  1. On each router instance, perform the vshard.router check:

    tarantool> vshard.router.info()
    -- no issues in the output
    -- sum of 'bucket.available_rw' == total number of buckets
    
  2. On each storage instance, perform the replication check:

    tarantool> box.info
    -- box.info.status == 'running'
    -- box.info.ro == false on one instance in each replica set.
    -- box.info.replication[*].upstream.status == 'follow'
    -- box.info.replication[*].downstream.status == 'follow'
    -- box.info.replication[*].upstream.lag <= box.cfg.replication_timeout
    -- can also be moderately larger under a write load
    
  3. On each storage instance, perform the vshard.storage check:

    tarantool> vshard.storage.info()
    -- no issues in the output
    -- replication.status == 'follow'
    
  4. Check all instances’ logs for application errors.

Note

If you’re running Cartridge, you can check the health of the cluster instances on the Cluster tab of its web interface.

In case of any issues, make sure to fix them before starting the upgrade procedure.

Install the target Tarantool version on all hosts of the cluster. You can do this using a package manager or the tt utility. See the installation instructions at the Tarantool download page and in the tt install reference.

Check that the target Tarantool version is installed by running tarantool -v on all hosts.
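If you want to script this check across many hosts, the following minimal Lua sketch compares the first line of tarantool -v output with the expected target version. The helper names parse_version and is_target_version are hypothetical, and the sketch assumes that the output contains the version as major.minor.patch (for example, Tarantool 2.11.1-0-g96877bd).

```lua
-- Hypothetical helper: extract {major, minor, patch} from a version line,
-- e.g. "Tarantool 2.11.1-0-g96877bd" -> {2, 11, 1}. Returns nil if no
-- "major.minor.patch" sequence is found.
local function parse_version(line)
    local major, minor, patch = string.match(line or '', '(%d+)%.(%d+)%.(%d+)')
    if major == nil then
        return nil
    end
    return { tonumber(major), tonumber(minor), tonumber(patch) }
end

-- Returns true when the reported version equals the target given as a
-- "major.minor.patch" string. Build suffixes such as "-0-g96877bd" are ignored.
local function is_target_version(line, target)
    local got = parse_version(line)
    local want = parse_version(target)
    if got == nil or want == nil then
        return false
    end
    for i = 1, 3 do
        if got[i] ~= want[i] then
            return false
        end
    end
    return true
end

-- Example usage on a host (reads the first line of `tarantool -v`):
--   local line = io.popen('tarantool -v'):read('*l')
--   print(is_target_version(line, '2.11.1'))
```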

Upgrade router instances one by one:

  1. Stop one router instance.
  2. Start this instance on the target Tarantool version.
  3. Repeat the previous steps for each router instance.

After completing the router instances upgrade, perform the vshard.router check on each of them.

Before upgrading storage instances:

  • Disable Cartridge failover: run

    tt cartridge failover disable
    

    or use the Cartridge web interface (Cluster tab, Failover: <Mode> button).

  • Disable the rebalancer: run

    tarantool> vshard.storage.rebalancer_disable()
    
  • Make sure that the Cartridge upgrade_schema option is false.

Upgrade storage instances by performing the following steps for each replica set:

Note

To detect possible upgrade issues early, we recommend that you perform a replication check on all instances of the replica set after each step.

  1. Pick a replica (a read-only instance) from the replica set. Stop this replica and start it again on the target Tarantool version. Wait until it reaches the running status (box.info.status == 'running').
  2. Restart all other read-only instances of the replica set on the target version one by one.
  3. Make one of the updated replicas the new master using the applicable instruction from Switching the master.
  4. Restart the last instance of the replica set (the former master, now a replica) on the target version.
  5. Run box.schema.upgrade() on the new master. This will update the Tarantool system spaces to match the currently installed version of Tarantool. The changes will be propagated to other nodes via the replication mechanism later.

Warning

This is the point of no return for upgrading from versions earlier than 2.8.2: once you complete it, the schema is no longer compatible with the initial version.

When upgrading from version 2.8.2 or newer, you can undo the schema upgrade using box.schema.downgrade().

  6. Run box.snapshot() on every node in the replica set to make sure that the replicas immediately see the upgraded database state after a restart.
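Step 1 asks you to wait until the restarted replica reaches the running status. A minimal polling sketch is shown below. wait_for_status is a hypothetical helper; the status getter and the sleep function are passed in, so the logic does not depend on a particular environment.

```lua
-- Hypothetical helper: poll get_status() until it returns `expected` or
-- `timeout` seconds of requested sleep have passed. Returns a success flag
-- and the last observed status. `waited` counts requested sleep time, not
-- wall-clock time.
local function wait_for_status(get_status, expected, timeout, sleep, step)
    step = step or 0.1
    local waited = 0
    while true do
        local status = get_status()
        if status == expected then
            return true, status
        end
        if waited >= timeout then
            return false, status
        end
        sleep(step)
        waited = waited + step
    end
end

-- Example usage on the restarted replica inside Tarantool:
--   local fiber = require('fiber')
--   wait_for_status(function() return box.info.status end,
--                   'running', 60, fiber.sleep)
```

If the helper returns false with the orphan status, the replica does not see the rest of the replica set; investigate this before moving on to step 2.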

Once you complete the steps, re-enable failover and the rebalancer:

  • Enable Cartridge failover: run

    tt cartridge failover set [mode]
    

    or use the Cartridge web interface (Cluster tab, Failover: Disabled button).

  • Enable the rebalancer: run

    tarantool> vshard.storage.rebalancer_enable()
    

Perform these steps after the upgrade to ensure that your cluster is working correctly:

  1. On each router instance, perform the vshard.router check:

    tarantool> vshard.router.info()
    -- no issues in the output
    -- sum of 'bucket.available_rw' == total number of buckets
    
  2. On each storage instance, perform the replication check:

    tarantool> box.info
    -- box.info.status == 'running'
    -- box.info.ro == false on one instance in each replica set.
    -- box.info.replication[*].upstream.status == 'follow'
    -- box.info.replication[*].downstream.status == 'follow'
    -- box.info.replication[*].upstream.lag <= box.cfg.replication_timeout
    -- can also be moderately larger under a write load
    
  3. On each storage instance, perform the vshard.storage check:

    tarantool> vshard.storage.info()
    -- no issues in the output
    -- replication.status == 'follow'
    
  4. Check all instances’ logs for application errors.

Note

If you’re running Cartridge, you can check the health of the cluster instances on the Cluster tab of its web interface.

If you decide to roll back before reaching the point of no return, your data is fully compatible with the version you had before the upgrade. In this case, you can roll back the same way: restart the nodes you’ve already upgraded on the original version.

If you’ve passed the point of no return (that is, executed box.schema.upgrade()) during the upgrade, then a rollback requires downgrading the schema to the original version.

To check if an automatic downgrade is available for your original version, use box.schema.downgrade_versions(). If the version you need is on the list, execute the following steps on each upgraded replica set to roll back:

  1. Run box.schema.downgrade(<version>) on the master, specifying the original version.
  2. Run box.snapshot() on every instance in the replica set to make sure that the replicas immediately see the downgraded database state after restart.
  3. Restart all read-only instances of the replica set on the original version one by one.
  4. Make one of the restarted replicas the new master using the applicable instruction from Switching the master.
  5. Restart the last instance of the replica set (the former master, now a replica) on the original version.
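The availability check described above can be wrapped in a small helper. This sketch assumes that box.schema.downgrade_versions() returns an array of version strings; can_downgrade_to is a hypothetical name.

```lua
-- Hypothetical helper: true if `original` is one of the versions the
-- current schema can be downgraded to.
local function can_downgrade_to(available_versions, original)
    for _, v in ipairs(available_versions or {}) do
        if v == original then
            return true
        end
    end
    return false
end

-- Example usage on the master (the version string is hypothetical):
--   local original = '2.10.0'
--   if can_downgrade_to(box.schema.downgrade_versions(), original) then
--       box.schema.downgrade(original)
--   end
```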

Then re-enable failover and the rebalancer as described in Upgrading storages.

Warning

This section applies to cases when the upgrade procedure has failed and the cluster is no longer functioning properly. Therefore, it implies downtime and a full cluster restart.

In case of an upgrade failure after passing the point of no return, follow these steps to roll back to the original version:

  1. Stop all cluster instances.

  2. Save snapshot and xlog files from all instances whose data was modified after the last backup procedure. These files will help apply these modifications later.

  3. Save the latest backups from all instances.

  4. Restore the original Tarantool version on all hosts of the cluster.

  5. Launch the cluster on the original Tarantool version.

    Note

    At this point, the application becomes fully functional and contains data from the backups. However, the data modifications made after the backups were taken must be restored manually.

  6. Manually apply the latest data modifications from the xlog files you saved in step 2 using the xlog module. On instances where such changes happened, do the following:

    1. Find out the vclock value of the latest operation in the original WAL.
    2. Play the operations from the newer xlog starting from this vclock on the instance.

    Important

    If the upgrade has failed after calling box.schema.upgrade(), don’t apply the modifications of system spaces done by this call. This can make the schema incompatible with the original Tarantool version.
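Step 6 can be partly automated: the sketch below selects the records that are newer than the saved vclock and skips system spaces, as the note above requires. It assumes that each record looks like an entry yielded by xlog.pairs() (a HEADER with replica_id and lsn, and a BODY with space_id) and that user spaces have IDs of 512 and above. records_to_replay is a hypothetical helper; turning the selected records into box operations is left out.

```lua
-- Hypothetical helper: select xlog records that still have to be replayed.
-- Assumed record shape (as yielded by xlog.pairs()):
--   { HEADER = { replica_id = ..., lsn = ..., type = ... },
--     BODY = { space_id = ..., ... } }
-- Spaces with IDs up to 511 are system spaces; user spaces start at 512.
local SYSTEM_SPACE_MAX_ID = 511

local function records_to_replay(records, vclock)
    local result = {}
    for _, record in ipairs(records) do
        local header, body = record.HEADER, record.BODY
        if header ~= nil and header.lsn ~= nil and
                body ~= nil and body.space_id ~= nil then
            -- Operations up to this LSN are already in the original WAL.
            local applied = vclock[header.replica_id] or 0
            -- Skip system spaces, e.g. changes made by box.schema.upgrade().
            if header.lsn > applied and body.space_id > SYSTEM_SPACE_MAX_ID then
                table.insert(result, record)
            end
        end
    end
    return result
end

-- Example of collecting records in Tarantool (the file name is hypothetical):
--   local xlog = require('xlog')
--   local records = {}
--   for _, record in xlog.pairs('00000000000000000042.xlog') do
--       table.insert(records, record)
--   end
--   local pending = records_to_replay(records, original_vclock)
```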

Find more information about the Tarantool recovery in Disaster recovery.

Run box.info:

tarantool> box.info

Check that the following conditions are satisfied:

  • box.info.status is running
  • box.info.replication[*].upstream.status and box.info.replication[*].downstream.status are follow
  • box.info.replication[*].upstream.lag is less than or equal to box.cfg.replication_timeout, but it can also be moderately larger under a write load.
  • box.info.ro is false on at least one instance in each replica set. If all instances have box.info.ro = true, this means there are no writable nodes. On Tarantool v. 2.10.0 or later, you can find out why this happened by running box.info.ro_reason. If box.info.ro_reason or box.info.status has the value orphan, the instance doesn’t see the rest of the replica set.

Then run box.info once more and check that box.info.replication[*].upstream.lag values are updated.
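The conditions above can also be checked programmatically. The following sketch takes a table shaped like the result of box.info() and returns a list of problems; replication_problems and writable_count are hypothetical helpers. Since a lag above replication_timeout may be acceptable under a write load, treat lag entries as warnings to review rather than as failures.

```lua
-- Hypothetical helper: apply the replication check to a box.info()-like
-- table. Returns a list of human-readable problems (empty if healthy).
local function replication_problems(info, replication_timeout)
    local problems = {}
    if info.status ~= 'running' then
        table.insert(problems, 'status is ' .. tostring(info.status))
    end
    for id, peer in pairs(info.replication or {}) do
        -- The entry describing the instance itself has no upstream.
        if peer.upstream ~= nil then
            if peer.upstream.status ~= 'follow' then
                table.insert(problems, ('upstream %s status is %s'):format(
                    id, tostring(peer.upstream.status)))
            end
            if (peer.upstream.lag or 0) > replication_timeout then
                table.insert(problems, ('upstream %s lag is %s'):format(
                    id, tostring(peer.upstream.lag)))
            end
        end
        if peer.downstream ~= nil and peer.downstream.status ~= 'follow' then
            table.insert(problems, ('downstream %s status is %s'):format(
                id, tostring(peer.downstream.status)))
        end
    end
    return problems
end

-- The ro condition spans the whole replica set: pass one box.info()-like
-- table per instance and expect at least one writable instance.
local function writable_count(infos)
    local n = 0
    for _, info in ipairs(infos) do
        if info.ro == false then
            n = n + 1
        end
    end
    return n
end

-- Example usage in Tarantool:
--   replication_problems(box.info(), box.cfg.replication_timeout)
```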

Run vshard.storage.info():

tarantool> vshard.storage.info()

Check that the following conditions are satisfied:

  • there are no issues or alerts
  • replication.status is follow
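The storage conditions can be checked the same way. In this sketch, storage_problems is a hypothetical helper that takes a table shaped like the result of vshard.storage.info(); it assumes that the result has an alerts list and a replication.status field, as the conditions above suggest.

```lua
-- Hypothetical helper: check a vshard.storage.info()-like table.
-- Returns a list of problems (empty if the storage looks healthy).
local function storage_problems(info)
    local problems = {}
    local alerts = info.alerts or {}
    if #alerts > 0 then
        table.insert(problems, ('%d alert(s) reported'):format(#alerts))
    end
    local replication = info.replication or {}
    if replication.status ~= 'follow' then
        table.insert(problems,
            'replication.status is ' .. tostring(replication.status))
    end
    return problems
end

-- Example usage in Tarantool:
--   storage_problems(vshard.storage.info())
```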

Run vshard.router.info():

tarantool> vshard.router.info()

Check that the following conditions are satisfied:

  • there are no issues or alerts
  • all buckets are available (the sum of bucket.available_rw on all replica sets equals the total number of buckets)
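Similarly, the router conditions can be checked with the sketch below. It assumes that vshard.router.info() returns an alerts list and a replicasets map whose entries contain bucket.available_rw, and that the total number of buckets is known, for example, from vshard.router.bucket_count(). router_problems is a hypothetical helper.

```lua
-- Hypothetical helper: check a vshard.router.info()-like table.
-- total_buckets is the configured number of buckets.
local function router_problems(info, total_buckets)
    local problems = {}
    local alerts = info.alerts or {}
    if #alerts > 0 then
        table.insert(problems, ('%d alert(s) reported'):format(#alerts))
    end
    -- Sum bucket.available_rw over all replica sets.
    local available = 0
    for _, rs in pairs(info.replicasets or {}) do
        local bucket = rs.bucket or {}
        available = available + (bucket.available_rw or 0)
    end
    if available ~= total_buckets then
        table.insert(problems,
            ('%d of %d buckets are available for writing'):format(
                available, total_buckets))
    end
    return problems
end

-- Example usage on a router:
--   router_problems(vshard.router.info(), vshard.router.bucket_count())
```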

  • Cartridge. If your cluster runs on Cartridge, you can switch the master in the web interface. To do this, go to the Cluster tab, click Edit replica set, and drag an instance to the top of the Failover priority list to make it the master.

  • Raft. If your cluster uses automated leader election, switch the master by following these steps:

    1. Pick a candidate – a read-only instance to become the new master.
    2. Run box.ctl.promote() on the candidate. The operation will start and wait for the election to happen.
    3. Run box.cfg{ election_mode = "voter" } on the current master.
    4. Check that the candidate became the new master: its box.info.ro must be false.
  • Legacy. If your cluster neither works on Cartridge nor has automated leader election, switch the master by following these steps:

    1. Pick a candidate – a read-only instance to become the new master.

    2. Run box.cfg{ read_only = true } on the current master.

    3. Check that the candidate’s vclock value matches the master’s: the value of box.info.vclock[<master_id>] on the candidate must be equal to box.info.lsn on the master. <master_id> here is the value of box.info.id on the master.

      If the vclock values don’t match, stop the switch procedure and restore the replica set state by calling box.cfg{ read_only = false } on the master. Then pick another candidate and restart the procedure.

    4. Run box.cfg{ read_only = false } on the candidate.
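The vclock comparison in the legacy procedure boils down to a single subtraction. The sketch below computes how many of the master’s operations the candidate has not received yet. vclock_gap and candidate_caught_up are hypothetical helpers, and the three inputs are assumed to be collected manually from the two instances: box.info.vclock on the candidate, and box.info.id and box.info.lsn on the master.

```lua
-- Hypothetical helper: number of the master's operations that are missing
-- on the candidate.
--   candidate_vclock - box.info.vclock taken on the candidate
--   master_id        - box.info.id taken on the master
--   master_lsn       - box.info.lsn taken on the master
local function vclock_gap(candidate_vclock, master_id, master_lsn)
    return master_lsn - (candidate_vclock[master_id] or 0)
end

-- The switch may continue only when the gap is zero.
local function candidate_caught_up(candidate_vclock, master_id, master_lsn)
    return vclock_gap(candidate_vclock, master_id, master_lsn) == 0
end
```

Because the master is already read-only at this point, its LSN no longer grows, so a zero gap means that the candidate has every write made on the master.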

After switching the master, perform the replication check on each instance of the replica set.

Live upgrade from Tarantool 1.6 to 1.10

This page includes explanations and solutions to some common issues when upgrading a replica set from Tarantool 1.6 to 1.10.

Versions later than 1.6 have incompatible .snap and .xlog file formats: 1.6 files are supported during upgrade, but you won’t be able to return to 1.6 after running under 1.10 or 2.x for a while. A few configuration parameters are also renamed.

To perform a live upgrade from Tarantool 1.6 to a more recent version, such as 2.8.4 or 2.10.1, you must take an intermediate step by upgrading 1.6 -> 1.10 -> 2.x. This is the only way to perform the upgrade without downtime.

However, a direct upgrade of a replica set from 1.6 to 2.x is also possible, but only with downtime.

The procedure of live upgrade from 1.6 to 1.10 is similar to the general cluster upgrade procedure, but with slight differences in the Upgrading storages step. Find below the general storage upgrade procedure and the 1.6-specific notes for its steps.

Upgrade storage instances by performing the following steps for each replica set:

Note

To detect possible upgrade issues early, we recommend that you perform a replication check on all instances of the replica set after each step.

  1. Pick a replica (a read-only instance) from the replica set. Stop this replica and start it again on the target Tarantool version. Wait until it reaches the running status (box.info.status == 'running').
  2. Restart all other read-only instances of the replica set on the target version one by one.
  3. Make one of the updated replicas the new master using the applicable instruction from Switching the master.
  4. Restart the last instance of the replica set (the former master, now a replica) on the target version.
  5. Run box.schema.upgrade() on the new master. This will update the Tarantool system spaces to match the currently installed version of Tarantool. The changes will be propagated to other nodes via the replication mechanism later.
  6. Run box.snapshot() on every node in the replica set to make sure that the replicas immediately see the upgraded database state after a restart.

Upgrade from 1.6 directly to 2.x with downtime

Versions later than 1.6 have incompatible .snap and .xlog file formats: 1.6 files are supported during upgrade, but you won’t be able to return to 1.6 after running under 1.10 or 2.x for a while. A few configuration parameters are also renamed.

To perform a live upgrade from Tarantool 1.6 to a more recent version, such as 2.8.4 or 2.10.1, you must take an intermediate step by upgrading 1.6 -> 1.10 -> 2.x. This is the only way to perform the upgrade without downtime.

However, a direct upgrade of a replica set from 1.6 to 2.x is also possible, but only with downtime.

Here is how to upgrade from Tarantool 1.6 directly to 2.x:

  1. Stop all instances in the replica set.
  2. Upgrade Tarantool version to 2.x on every instance.
  3. Upgrade the corresponding instance files and applications, if needed.
  4. Start all the instances with Tarantool 2.x.
  5. Execute box.schema.upgrade() on the master.
  6. Execute box.snapshot() on every node in the replica set.

Fix decimal values in vinyl spaces when upgrading to 2.10.1

This is an upgrade guide for fixing one specific problem which could happen with decimal values in vinyl spaces. It’s only relevant when you’re upgrading from Tarantool version <= 2.10.0 to anything >= 2.10.1.

Before gh-6377 was fixed, decimal and double values in a scalar or number index could end up in the wrong order after an update. If such an index was built for a space that uses the vinyl storage engine, the index is persisted and is not rebuilt even after the upgrade. In this case, you have to rebuild the affected indexes manually.

Here are the rules to determine whether your installation was affected. If all of the statements listed below are true, you have to rebuild indexes for the affected vinyl spaces manually.

If this is the case for you, you can run the following script, which will find all the affected indices:

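local fiber = require('fiber')
local decimal = require('decimal')

-- NaN is the only value that is not equal to itself.
local function isnan(val)
    return type(val) == 'number' and val ~= val
end

-- Only Lua numbers are compared with math.huge; the type check keeps
-- decimal and string values out of the comparison.
local function isinf(val)
    return type(val) == 'number' and (val == math.huge or val == -math.huge)
end

local function vinyl(id)
    return box.space[id].engine == 'vinyl'
end

-- Global on purpose: the rebuild script below reads this table.
require_rebuild = {}
local iters = 0
-- Iterate over the indexes of all user spaces (space ID >= 512).
for _, v in box.space._index:pairs({512, 0}, {iterator='GE'}) do
    local id = v[1]
    iters = iters + 1
    if iters % 1000 == 0 then
        fiber.yield()
    end
    if vinyl(id) then
        -- Collect the 1-based numbers of fields indexed as number or scalar.
        -- Parts can be stored as arrays ({fieldno, type}) or as maps
        -- ({field = fieldno, type = type}); both forms are handled.
        local check_fields = {}
        for _, part in pairs(v[6]) do
            local part_type = part.type or part[2]
            if part_type == 'number' or part_type == 'scalar' then
                table.insert(check_fields, (part.field or part[1]) + 1)
            end
        end
        local have_decimal = {}
        local have_nan = {}
        if #check_fields > 0 then
            for _, tuple in box.space[id]:pairs() do
                for _, i in pairs(check_fields) do
                    iters = iters + 1
                    if iters % 1000 == 0 then
                        fiber.yield()
                    end
                    have_decimal[i] = have_decimal[i] or
                                      decimal.is_decimal(tuple[i])
                    have_nan[i] = have_nan[i] or isnan(tuple[i]) or
                                  isinf(tuple[i])
                    -- A field that mixes decimals with NaN/inf values may
                    -- be ordered incorrectly: mark the index for rebuild.
                    if have_decimal[i] and have_nan[i] then
                        table.insert(require_rebuild, v)
                        goto out
                    end
                end
            end
        end
    end
    ::out::
end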

The indices requiring a rebuild will be stored in the require_rebuild table. If the table is empty, you’re safe and can continue using Tarantool as before.

If the require_rebuild table contains some entries, you can rebuild the affected indices with the following script.

Note

Please run the script below only on the master node and only after all the nodes are upgraded to the new Tarantool version.

local log = require('log')

local function rebuild_index(idx)
    local index_name = idx[3]
    local space_name = box.space[idx[1]].name
    log.info("Rebuilding index %s on space %s", index_name, space_name)
    if (idx[2] == 0) then
        log.error("Cannot rebuild primary index %s on space %s. Please, "..
                "recreate the space manually", index_name, space_name)
        return
    end
    log.info("Deleting index %s on space %s", index_name, space_name)
    local v = box.space._index:delete{idx[1], idx[2]}
    if v == nil then
        log.error("Couldn't find index %s on space %s", index_name, space_name)
        return
    end
    log.info("Done")
    log.info("Creating index %s on space %s", index_name, space_name)
    box.space._index:insert(v)
end

for _, idx in pairs(require_rebuild) do
    rebuild_index(idx)
end

The script might fail on some of the indices with the following error: "Cannot rebuild primary index index_name on space space_name. Please, recreate the space manually". If this happens, automatic index rebuild is impossible, and you have to re-create the space manually to ensure data integrity:

  1. Create a new space with the same format as the existing one.
  2. Define the same indices on the freshly created space.
  3. Iterate over the old space’s primary key and insert all the data into the new space.
  4. Drop the old space.
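Step 3 can be sketched as follows. copy_space is a hypothetical helper that iterates over the old space’s primary key and inserts each tuple into the new space, yielding periodically. It assumes that writes to the old space are stopped while the copy runs; otherwise, the two spaces may diverge.

```lua
-- Hypothetical helper: copy all tuples from old_space to new_space in
-- primary key order. `yield` (e.g. require('fiber').yield) is called every
-- `batch` tuples so that a long copy doesn't block other fibers.
local function copy_space(old_space, new_space, yield, batch)
    batch = batch or 1000
    local copied = 0
    for _, tuple in old_space:pairs() do
        new_space:insert(tuple)
        copied = copied + 1
        if yield ~= nil and copied % batch == 0 then
            yield()
        end
    end
    return copied
end

-- Example usage in Tarantool (space names are hypothetical):
--   local fiber = require('fiber')
--   copy_space(box.space.bands, box.space.bands_new, fiber.yield)
--   box.space.bands:drop()
--   box.space.bands_new:rename('bands')  -- optional: keep the old name
```

Comparing the returned count with the old space’s size before dropping it is a cheap sanity check.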

Fix illegal type names when upgrading to 2.10.4

This is an upgrade guide for fixing one specific problem which could happen with field type names. It’s only relevant when you’re upgrading from a Tarantool version <=2.10.3 to >=2.10.4.

Before gh-5940 was fixed, the empty string, n, nu, s, and st (that is, leading parts of num and str) were accepted as valid field types. Since 2.10.4, Tarantool doesn’t accept these strings and they must be replaced with correct values num and str.

This instruction is also available on GitHub.

A snapshot can be validated against the issue using the following script:

#!/usr/bin/env tarantool

local xlog = require('xlog')
local json = require('json')

if arg[1] == nil then
    print(('Usage: %s xxxxxxxxxxxxxxxxxxxx.snap'):format(arg[0]))
    os.exit(1)
end

local illegal_types = {
    [''] = true,
    ['n'] = true,
    ['nu'] = true,
    ['s'] = true,
    ['st'] = true,
}

local function report_field_def(name, field_def)
    local msg = 'A field def in a _space entry %q contains an illegal type: %s'
    print(msg:format(name, json.encode(field_def)))
end

local has_broken_format = false

for _, record in xlog.pairs(arg[1]) do
    -- Filter inserts.
    if record.HEADER == nil or record.HEADER.type ~= 'INSERT' then
        goto continue
    end
    -- Filter _space records.
    if record.BODY == nil or record.BODY.space_id ~= 280 then
        goto continue
    end

    local tuple = record.BODY.tuple
    local name = tuple[3]
    local format = tuple[7]

    local is_format_broken = false
    for _, field_def in ipairs(format) do
        if illegal_types[field_def.type] ~= nil then
            report_field_def(name, field_def)
            is_format_broken = true
        end

        if illegal_types[field_def[2]] ~= nil then
            report_field_def(name, field_def)
            is_format_broken = true
        end

    end

    if is_format_broken then
        has_broken_format = true
        local msg = 'The following _space entry contains illegal type(s): %s'
        print(msg:format(json.encode(record)))
    end
    ::continue::
end

if has_broken_format then
    print('')
    print(('%s has an illegal type in a space format'):format(arg[1]))
    print('It is recommended to proceed with the upgrade instruction:')
    print('https://github.com/tarantool/tarantool/wiki/Fix-illegal-field-type-in-a-space-format-when-upgrading-to-2.10.4')
else
    print('Everything looks nice!')
end

os.exit(has_broken_format and 2 or 0)

If the snapshot contains the values that aren’t valid in 2.10.4, you’ll get an output like the following:

To fix a database whose space formats contain illegal type names, add the following code to the application’s instance file before the box.cfg()/vshard.cfg()/cartridge.cfg() call.

Note

In Cartridge applications, the instance file is called init.lua.

-- Convert illegal type names in a space format that were
-- allowed before tarantool 2.10.4.

local log = require('log')
local json = require('json')

local transforms = {
    [''] = 'num',
    ['n'] = 'num',
    ['nu'] = 'num',
    ['s'] = 'str',
    ['st'] = 'str',
}

-- The helper for before_replace().
local function transform_field_def(name, field_def, field, new_type)
    local field_def_old_str = json.encode(field_def)
    field_def[field] = new_type
    local field_def_new_str = json.encode(field_def)

    local msg = 'Transform a field def in a _space entry %q: %s -> %s'
    log.info(msg:format(name, field_def_old_str, field_def_new_str))
end

-- _space trigger.
local function before_replace(_, tuple)
    if tuple == nil then return tuple end

    local name = tuple[3]
    local format = tuple[7]

    -- Update format if necessary.
    local is_format_changed = false
    for _, field_def in ipairs(format) do
        local new_type = transforms[field_def.type]
        if new_type ~= nil then
            transform_field_def(name, field_def, 'type', new_type)
            is_format_changed = true
        end

        new_type = transforms[field_def[2]]
        if new_type ~= nil then
            transform_field_def(name, field_def, 2, new_type)
            is_format_changed = true
        end
    end

    -- Nothing changed: skip.
    if not is_format_changed then return tuple end

    -- Rebuild the tuple.
    local new_tuple = tuple:transform(7, 1, format)
    log.info(('Transformed _space entry %s to %s'):format(
        json.encode(tuple), json.encode(new_tuple)))
    return new_tuple
end

-- on_schema_init trigger to set before_replace().
local function on_schema_init()
    box.space._space:before_replace(before_replace)
end

-- Set the trigger on _space.
box.ctl.on_schema_init(on_schema_init)

You can delete these triggers after the box.cfg()/vshard.cfg()/cartridge.cfg() call.
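A minimal sketch of removing the triggers is shown below. It assumes that the code lives in the same instance file as the conversion code above, so that the local functions on_schema_init and before_replace are in scope; remove_type_fix_triggers is a hypothetical name. Calling a trigger setter with nil as the new trigger and the old function as the second argument deletes that trigger.

```lua
-- Hypothetical helper: detach the triggers installed by the conversion
-- code. Pass the same function objects that were installed.
local function remove_type_fix_triggers(on_schema_init_fn, before_replace_fn)
    box.space._space:before_replace(nil, before_replace_fn)
    box.ctl.on_schema_init(nil, on_schema_init_fn)
end

-- Example usage in the instance file:
--   box.cfg{ ... }
--   remove_type_fix_triggers(on_schema_init, before_replace)
```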

An example for a Cartridge application:

The triggers will report the changes they make in the following form:

Recover from WALs with mixed transactions when upgrading to 2.11.0

This is a guide on fixing a specific problem that could happen when upgrading from a Tarantool version between 2.1.2 and 2.2.0 to 2.8.1 or later. The described solution applies to Tarantool 2.11.0 and later.

The problem is described in the issue gh-7932. If two or more transactions happened simultaneously in Tarantool 2.1.2-2.2.0, their operations could be written to the write-ahead log mixed with each other. Starting from version 2.8.1, Tarantool recovers transactions atomically and expects all WAL entries between a transaction’s begin and commit operations to belong to one transaction. If there is an operation belonging to another transaction, Tarantool fails to recover from such a WAL.

Starting from version 2.11.0, Tarantool can recover from WALs with mixed transactions in the force_recovery mode.

If all instances or some of them fail to start after upgrading to 2.11 or a newer version due to a recovery error:

  1. Start these instances with the force_recovery option set to true.
  2. Make new snapshots on the instances so that the old WALs with mixed transactions aren’t used for recovery anymore. To do this, call box.snapshot().
  3. Set force_recovery back to false.

After all the instances start successfully, WALs with mixed transactions may still lead to replication issues: some instances may fail to replicate from others because those instances send incorrect WALs. To fix the replication issues, rebootstrap the instances that fail to replicate.

Reporting bugs

If you have found a bug in Tarantool, you’ll be doing us a favor by reporting it.

Please open an issue in the Tarantool repository on GitHub. We recommend including the following information:

If this is a feature request or if it affects a particular group of users, be sure to mention it.

Usually, a member of the Tarantool team responds within one or two business days to confirm that the issue has been taken into work, ask clarifying questions, or suggest an alternative solution to the described problem.

Flight recorder

Enterprise Edition

The flight recorder is available in the Enterprise Edition only.

Example on GitHub: flightrec

The flight recorder is an event collection tool that gathers various information about a working Tarantool instance, such as:

This information helps you investigate incidents related to Tarantool instance crashes.

The flight recorder is disabled by default and can be enabled and configured for a specific Tarantool instance. To enable the flight recorder, set the flightrec.enabled configuration option to true.

flightrec:
  enabled: true

After flightrec.enabled is set to true, the flight recorder starts collecting data in the flight recording file current.ttfr. This file is stored in the snapshot.dir directory. By default, flight recording files are stored at var/lib/{{ instance_name }}/<file_name>.ttfr.

If the instance crashes and reboots, Tarantool rotates the flight recording: current.ttfr is renamed to <timestamp>.ttfr (for example, 20230411T050721.ttfr), and a new current.ttfr file is created for collecting data. In the case of a clean shutdown (for example, using os.exit()), Tarantool continues writing to the existing current.ttfr file after restart.

Note

Old flight recordings are not removed automatically, so delete them manually.
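Because old recordings must be cleaned up by hand, a small helper can pick the files to delete. In the sketch below, stale_recordings is hypothetical; it assumes that rotated files are named YYYYMMDDTHHMMSS.ttfr, as in the example above, so that sorting the names also sorts them chronologically. It never selects current.ttfr.

```lua
-- Hypothetical helper: from a list of file names, return the rotated
-- recordings to delete, keeping the `keep` most recent ones.
local function stale_recordings(filenames, keep)
    keep = keep or 0
    local rotated = {}
    for _, name in ipairs(filenames) do
        -- Matches names like 20230411T050721.ttfr only.
        if name:match('^%d%d%d%d%d%d%d%dT%d%d%d%d%d%d%.ttfr$') then
            table.insert(rotated, name)
        end
    end
    table.sort(rotated) -- oldest first
    local stale = {}
    for i = 1, #rotated - keep do
        table.insert(stale, rotated[i])
    end
    return stale
end

-- Example usage in Tarantool (the directory path is an assumption):
--   local fio = require('fio')
--   local dir = 'var/lib/instance001'
--   for _, name in ipairs(stale_recordings(fio.listdir(dir), 3)) do
--       fio.unlink(fio.pathjoin(dir, name))
--   end
```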

When the flight recorder is enabled, you can set the options related to logging, metrics, and storing the request and response data.

The flightrec configuration might look as follows:

flightrec:
  enabled: true
  logs_size: 10485800
  logs_log_level: 5
  metrics_period: 240
  metrics_interval: 0.5
  requests_size: 10485780

In the example, the following options are set:

Read more: Flight recorder configuration options.

Monitoring

Monitoring is the process of capturing runtime information about the instances of a Tarantool cluster using metrics. Metrics can indicate various characteristics, such as memory usage, the number of records in spaces, replication status, and so on. Typically, metrics are monitored in real time, allowing for the identification of current issues or the prediction of potential ones.

Getting started with monitoring

Example on GitHub: sharded_cluster_crud_metrics

Tarantool allows you to configure and expose its metrics using a YAML configuration. You can also use the built-in metrics module to create and collect custom metrics.

To configure metrics, use the metrics section in a cluster configuration. The configuration below enables all metrics excluding vinyl-specific ones:

metrics:
  include: [ all ]
  exclude: [ vinyl ]
  labels:
    alias: '{{ instance_name }}'

The metrics.labels option accepts the predefined {{ instance_name }} variable. This adds an instance name as a label to every observation.

Third-party Lua modules, like crud or expirationd, offer their own metrics. You can enable these metrics by configuring the corresponding role. The example below shows how to enable statistics on called operations by providing the roles.crud-router role’s configuration:

roles:
- roles.crud-router
- roles.metrics-export
roles_cfg:
  roles.crud-router:
    stats: true
    stats_driver: metrics
    stats_quantiles: true

expirationd metrics can be enabled as follows:

expirationd:
  cfg:
    metrics: true

To expose metrics in different formats, you can use the third-party metrics-export-role role. In the following example, the metrics of storage-a-001 are provided on two endpoints:

storage-a-001:
  roles_cfg:
    roles.metrics-export:
      http:
      - listen: '127.0.0.1:8082'
        endpoints:
        - path: /metrics/prometheus/
          format: prometheus
        - path: /metrics/json
          format: json


Note

The metrics module provides a set of plugins that can be used to collect and expose metrics in different formats. Learn more in Collecting metrics using plugins.

The metrics module allows you to create and collect custom metrics. The example below shows how to collect the number of data operations performed on the specified space by increasing a counter value inside the on_replace() trigger function:

local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
local trigger = require('trigger')
trigger.set(
        'box.space.bands.on_replace',
        'update_bands_replace_count_metric',
        function(_, _, _, request_type)
            bands_replace_count:inc(1, { request_type = request_type })
        end
)

Learn more in Custom metrics.

When metrics are configured and exposed, you can use the desired third-party tool to collect them. Below is an example of a Prometheus scrape configuration that collects metrics of multiple Tarantool instances:

global:
  scrape_interval:     5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - 127.0.0.1:8081
          - 127.0.0.1:8082
          - 127.0.0.1:8083
          - 127.0.0.1:8084
          - 127.0.0.1:8085
    metrics_path: "/metrics/prometheus"

For more information on collecting and visualizing metrics, refer to Grafana dashboard.

Note

Tarantool Cluster Manager allows you to view metrics of connected clusters in real time. Learn more in Viewing cluster metrics.

Grafana dashboard

After enabling and configuring metrics, you can visualize them using Tarantool Grafana dashboards. These dashboards are available as part of Grafana official and community-built dashboards:

  • Tarantool 3: Prometheus, InfluxDB
  • Tarantool Cartridge and Tarantool 1.10–2.x: Prometheus, InfluxDB
  • Tarantool Data Grid 2: Prometheus, InfluxDB

The Tarantool Grafana dashboard is a ready-to-import template with basic memory, space operations, and HTTP load panels, based on the default functionality of the metrics package.


Since Grafana dashboards are available for both the Prometheus and InfluxDB data sources, you can use one of the following:

For issues related to setting up Prometheus, Telegraf, InfluxDB, or Grafana instances, refer to the corresponding project’s documentation.

To collect metrics for Prometheus, first set up metrics output in the prometheus format. You can use the roles.metrics-export configuration or set up the Prometheus plugin manually. To start collecting metrics, add a job to the Prometheus configuration with each Tarantool instance URI as a target and the metrics path as it was configured on the Tarantool instances:

global:
  scrape_interval:     5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - 127.0.0.1:8081
          - 127.0.0.1:8082
          - 127.0.0.1:8083
          - 127.0.0.1:8084
          - 127.0.0.1:8085
    metrics_path: "/metrics/prometheus"

To collect metrics for InfluxDB, use the Telegraf agent. First, configure Tarantool metrics output in the json format with the roles.metrics-export configuration or the corresponding JSON plugin. To start collecting metrics, add an http input to the Telegraf configuration that includes the metrics URL of each Tarantool instance:

[[inputs.http]]
    urls = [
        "http://example_project:8081/metrics/json",
        "http://example_project:8082/metrics/json",
        "http://example_project:8083/metrics/json",
        "http://example_project:8084/metrics/json",
        "http://example_project:8085/metrics/json"
    ]
    timeout = "30s"
    tag_keys = [
        "metric_name",
        "label_pairs_alias",
        "label_pairs_quantile",
        "label_pairs_path",
        "label_pairs_method",
        "label_pairs_status",
        "label_pairs_operation",
        "label_pairs_level",
        "label_pairs_id",
        "label_pairs_engine",
        "label_pairs_name",
        "label_pairs_index_name",
        "label_pairs_delta",
        "label_pairs_stream",
        "label_pairs_thread",
        "label_pairs_kind"
    ]
    insecure_skip_verify = true
    interval = "10s"
    data_format = "json"
    name_prefix = "tarantool_"
    fieldpass = ["value"]

Be sure to include each label key as label_pairs_<key> to extract it with the plugin. For example, if you use { state = 'ready' } labels somewhere in metric collectors, add the label_pairs_state tag key.
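The label_pairs_<key> names come from Telegraf's json data format, which flattens nested JSON objects by joining keys with an underscore. The Python sketch below mimics that flattening to list the tag keys a set of records needs. It assumes each JSON record has metric_name, value, and a nested label_pairs object; the sample records and the my_queue_size metric are hypothetical, so check the actual output of your instances.

```python
import json


def flatten(obj, prefix=''):
    """Flatten nested JSON objects by joining keys with '_', the way
    Telegraf's json data format names nested fields."""
    flat = {}
    for key, value in obj.items():
        name = f'{prefix}_{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def required_tag_keys(records):
    """Collect metric_name plus every label_pairs_<key> seen in the records."""
    keys = {'metric_name'}
    for record in records:
        keys.update(k for k in flatten(record) if k.startswith('label_pairs_'))
    return sorted(keys)


# Hypothetical sample of the JSON metrics output.
sample = json.loads('''[
  {"metric_name": "tnt_info_uptime", "value": 42,
   "label_pairs": {"alias": "storage-a-001"}},
  {"metric_name": "my_queue_size", "value": 3,
   "label_pairs": {"alias": "storage-a-001", "state": "ready"}}
]''')
print(required_tag_keys(sample))
```

Comparing this list with tag_keys in the Telegraf configuration shows which labels would otherwise be dropped.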

Open the Grafana import menu.

To import a specific dashboard, choose one of the following options:

Set the dashboard name, folder, and UID (if needed).

You can choose the data source and data source variables after import.

Alerting

You can set up alerts on metrics to get notified when something goes wrong. Prometheus alert rules are used as an example here. You can get the full alerts.yml file from the tarantool/grafana-dashboard GitHub repository.

You can use internal Tarantool metrics to monitor detailed RAM consumption, replication state, and database engine status, and to track business logic issues (like HTTP 4xx and 5xx responses or a low request rate) and external module statistics (like CRUD errors). Evaluation timeouts, severity levels, and thresholds (especially those for business logic) are given here only as examples: you may want to increase or decrease them for your application. Also, remember to choose rate() time ranges that suit your Prometheus configuration.

Monitoring the tnt_info_memory_lua metric can help prevent memory overflow and detect bad Lua coding practices.

Note

The Lua memory is limited to 2 GB per instance if Tarantool doesn’t have the GC64 mode enabled for LuaJIT.

- alert: HighLuaMemoryWarning
  expr: tnt_info_memory_lua >= (512 * 1024 * 1024)
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') Lua runtime warning"
    description: "'{{ $labels.alias }}' instance of job '{{ $labels.job }}' uses too much Lua memory
      and may hit threshold soon."

- alert: HighLuaMemoryAlert
  expr: tnt_info_memory_lua >= (1024 * 1024 * 1024)
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') Lua runtime alert"
    description: "'{{ $labels.alias }}' instance of job '{{ $labels.job }}' uses too much Lua memory
      and likely to hit threshold soon."
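To see how the two thresholds above relate to the Lua memory limit from the note, the Python sketch below reproduces the rule expressions and computes each threshold as a share of that limit. It assumes "2 GB" means 2 GiB and ignores the `for: 1m` duration that Prometheus also applies before firing.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Thresholds copied from the HighLuaMemoryWarning and HighLuaMemoryAlert rules.
WARNING_BYTES = 512 * MIB
PAGE_BYTES = 1 * GIB
# Assumption: the non-GC64 Lua memory limit of "2 GB" is 2 GiB.
NON_GC64_LIMIT_BYTES = 2 * GIB


def lua_memory_severity(tnt_info_memory_lua):
    """Return the severity the two alert rules would assign to one sample."""
    if tnt_info_memory_lua >= PAGE_BYTES:
        return 'page'
    if tnt_info_memory_lua >= WARNING_BYTES:
        return 'warning'
    return 'ok'


for threshold in (WARNING_BYTES, PAGE_BYTES):
    share = threshold / NON_GC64_LIMIT_BYTES
    print(f'{threshold} bytes = {share:.0%} of the non-GC64 limit')
```

With GC64 enabled the 2 GB limit does not apply, so the thresholds are then purely a matter of your own memory budget.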

By monitoring slab allocation statistics, you can see how much free RAM remains for storing memtx tuples and indexes on an instance. If Tarantool hits the limit, the instance becomes unavailable for write operations, so this alert can help you see when it’s time to increase the memtx_memory limit or add a new storage to a vshard cluster.

- alert: LowMemtxArenaRemainingWarning
  expr: (tnt_slab_quota_used_ratio >= 80) and (tnt_slab_arena_used_ratio >= 80)
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') low arena memory remaining"
    description: "Low arena memory (tuples and indexes) remaining for '{{ $labels.alias }}' instance of job '{{ $labels.job }}'.
      Consider increasing memtx_memory or number of storages in case of sharded data."

- alert: LowMemtxArenaRemaining
  expr: (tnt_slab_quota_used_ratio >= 90) and (tnt_slab_arena_used_ratio >= 90)
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') low arena memory remaining"
    description: "Low arena memory (tuples and indexes) remaining for '{{ $labels.alias }}' instance of job '{{ $labels.job }}'.
      You are likely to hit limit soon.
      It is strongly recommended to increase memtx_memory or number of storages in case of sharded data."

- alert: LowMemtxItemsRemainingWarning
  expr: (tnt_slab_quota_used_ratio >= 80) and (tnt_slab_items_used_ratio >= 80)
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') low items memory remaining"
    description: "Low items memory (tuples) remaining for '{{ $labels.alias }}' instance of job '{{ $labels.job }}'.
      Consider increasing memtx_memory or number of storages in case of sharded data."

- alert: LowMemtxItemsRemaining
  expr: (tnt_slab_quota_used_ratio >= 90) and (tnt_slab_items_used_ratio >= 90)
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') low items memory remaining"
    description: "Low items memory (tuples) remaining for '{{ $labels.alias }}' instance of job '{{ $labels.job }}'.
      You are likely to hit limit soon.
      It is strongly recommended to increase memtx_memory or number of storages in case of sharded data."

You can monitor vinyl regulator performance to track possible scheduler or disk issues.

- alert: LowVinylRegulatorRateLimit
  expr: tnt_vinyl_regulator_rate_limit < 100000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') have low vinyl regulator rate limit"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' have low vinyl engine regulator rate limit.
      This indicates issues with the disk or the scheduler."

Vinyl transaction errors are likely to cause user request errors.

- alert: HighVinylTxConflictRate
  expr: rate(tnt_vinyl_tx_conflict[5m]) / rate(tnt_vinyl_tx_commit[5m]) > 0.05
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') have high vinyl tx conflict rate"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' have
      high vinyl transactions conflict rate. It indicates that vinyl is not healthy."
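The HighVinylTxConflictRate expression divides two per-second rates taken over the same window, so the time cancels out and the result is simply conflicts per commit. The Python sketch below illustrates this with a simplified rate() over two counter samples; unlike Prometheus, it ignores counter resets and window-edge extrapolation. The sample values are hypothetical.

```python
import math


def per_second_rate(first, last):
    """Simplified rate(): increase per second between two (timestamp, value)
    counter samples. Ignores counter resets and extrapolation."""
    (t1, v1), (t2, v2) = first, last
    return (v2 - v1) / (t2 - t1)


def vinyl_conflict_ratio(conflicts, commits):
    """rate(tnt_vinyl_tx_conflict) / rate(tnt_vinyl_tx_commit) over one window."""
    conflict_rate = per_second_rate(*conflicts)
    commit_rate = per_second_rate(*commits)
    if commit_rate == 0:
        # PromQL division by zero yields +Inf (or NaN for 0/0).
        return math.inf if conflict_rate > 0 else math.nan
    return conflict_rate / commit_rate


# Hypothetical counter samples taken 300 s (5 minutes) apart.
conflicts = ((0, 1_000), (300, 1_600))
commits = ((0, 50_000), (300, 56_000))
ratio = vinyl_conflict_ratio(conflicts, commits)
print(f'{ratio:.2%}', 'alert' if ratio > 0.05 else 'ok')
```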

Failed vinyl scheduler tasks are a good indicator of disk issues and may cause increasing RAM consumption.

- alert: HighVinylSchedulerFailedTasksRate
  expr: rate(tnt_vinyl_scheduler_tasks{status="failed"}[5m]) > 0.1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') have high vinyl scheduler failed tasks rate"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' have
      high vinyl scheduler failed tasks rate."

If tnt_replication_status equals 0, the instance replication status is not "follow": replication is either not ready yet or has stopped for some reason.

- alert: ReplicationNotRunning
  expr: tnt_replication_status == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') {{ $labels.stream }} (id {{ $labels.id }})
      replication is not running"
    description: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') {{ $labels.stream }} (id {{ $labels.id }})
      replication is not running."

Even if the async replication status is "follow", replication can be considered malfunctioning if the lag is too high. High lag may also affect the work of the Tarantool garbage collector; see box.info.gc().

- alert: HighReplicationLag
  expr: tnt_replication_lag > 1
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') have high replication lag (id {{ $labels.id }})"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' have high replication lag
      (id {{ $labels.id }}), check up your network and cluster state."

A high event loop time leads to poor application performance, timeouts, and various warnings. The cause could be a large number of running fibers or fibers that run too long without yielding or sleeping.

- alert: HighEVLoopTime
  expr: tnt_ev_loop_time > 0.1
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') event loop has high cycle duration"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' event loop has high cycle duration.
      Some high loaded fiber has too little yields. It may be the reason of 'Too long WAL write' warnings."

The configuration status metric shows whether the Tarantool 3 configuration has been applied. Additional metrics show the number of warnings and errors raised while applying it.

- alert: ConfigWarningAlerts
  expr: tnt_config_alerts{level="warn"} > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') has configuration 'warn' alerts"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' has configuration 'warn' alerts.
                  Please, check config:info() for detailed info."

- alert: ConfigErrorAlerts
  expr: tnt_config_alerts{level="error"} > 0
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') has configuration 'error' alerts"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' has configuration 'error' alerts.
                  Latest configuration has not been applied.
                  Please, check config:info() for detailed info."

- alert: ConfigStatusNotReady
  expr: tnt_config_status{status="ready"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') configuration is not ready"
    description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' configuration is not ready.
                  Please, check config:info() for detailed info."

The metrics module allows you to monitor tarantool/http handlers; see "Collecting HTTP request latency statistics". The rules below use a summary collector with the default name and 0.99 quantile computation.

Too many responses with error codes are usually a sign of API issues or an application malfunction.

- alert: HighInstanceHTTPClientErrorRate
  expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool", status=~"^4\\d{2}$" }[5m])) > 10
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') high rate of client error responses"
    description: "Too many {{ $labels.method }} requests to {{ $labels.path }} path
      on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' get client error (4xx) responses."

- alert: HighHTTPClientErrorRate
  expr: sum by (job, method, path) (rate(http_server_request_latency_count{ job="tarantool", status=~"^4\\d{2}$" }[5m])) > 20
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Job '{{ $labels.job }}' high rate of client error responses"
    description: "Too many {{ $labels.method }} requests to {{ $labels.path }} path
      on instances of job '{{ $labels.job }}' get client error (4xx) responses."

- alert: HighHTTPServerErrorRate
  expr: sum by (job, instance, method, path, alias) (rate(http_server_request_latency_count{ job="tarantool", status=~"^5\\d{2}$" }[5m])) > 0
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') server error responses"
    description: "Some {{ $labels.method }} requests to {{ $labels.path }} path
      on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' get server error (5xx) responses."

High response latency indicates insufficient performance. It may be a sign of an application malfunction, or a sign that you need to add more routers to your cluster.

- alert: HighHTTPLatency
  expr: http_server_request_latency{ job="tarantool", quantile="0.99" } > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') high HTTP latency"
    description: "Some {{ $labels.method }} requests to {{ $labels.path }} path with {{ $labels.status }} response status
      on '{{ $labels.alias }}' instance of job '{{ $labels.job }}' are processed too long."

Receiving too few requests when you expect traffic may indicate a balancer, external client, or network malfunction.

- alert: LowRouterHTTPRequestRate
  expr: sum by (job, instance, alias) (rate(http_server_request_latency_count{ job="tarantool", alias=~"^.*router.*$" }[5m])) < 10
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Router '{{ $labels.alias }}' ('{{ $labels.job }}') low activity"
    description: "Router '{{ $labels.alias }}' instance of job '{{ $labels.job }}' gets too little requests.
      Please, check up your balancer middleware."

If your application uses the CRUD module, monitoring its statistics can help track internal errors caused by invalid input or internal parameters.

- alert: HighCRUDErrorRate
  expr: rate(tnt_crud_stats_count{ job="tarantool", status="error" }[5m]) > 0.1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') too many CRUD {{ $labels.operation }} errors."
    description: "Too many {{ $labels.operation }} CRUD requests for '{{ $labels.name }}' space on
      '{{ $labels.alias }}' instance of job '{{ $labels.job }}' get module error responses."

These statistics can also be used to monitor request performance. High request latency leads to high latency of client responses and may be caused by network or disk issues. Read requests with poor conditions (with respect to space indexes and the sharding schema) may lead to full scans or map-reduce operations and can also cause high latency.

- alert: HighCRUDLatency
  expr: tnt_crud_stats{ job="tarantool", quantile="0.99" } > 0.1
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') too high CRUD {{ $labels.operation }} latency."
    description: "Some {{ $labels.operation }} {{ $labels.status }} CRUD requests for '{{ $labels.name }}' space on
      '{{ $labels.alias }}' instance of job '{{ $labels.job }}' are processed too long."

You can also directly monitor the map-reduce and scan rates.

- alert: HighCRUDMapReduceRate
  expr: rate(tnt_crud_map_reduces{ job="tarantool" }[5m]) > 0.1
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') too many CRUD {{ $labels.operation }} map reduces."
    description: "There are too many {{ $labels.operation }} CRUD map reduce requests for '{{ $labels.name }}' space on
      '{{ $labels.alias }}' instance of job '{{ $labels.job }}'.
      Check your request conditions or consider changing sharding schema."

If there are no Tarantool metrics, you may miss critical conditions. Prometheus provides the up metric to monitor the health of its targets.

- alert: InstanceDown
  expr: up == 0
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.instance }}' ('{{ $labels.job }}') down"
    description: "'{{ $labels.instance }}' of job '{{ $labels.job }}' has been down for more than a minute."

Don’t forget to monitor your server’s CPU, disk, and RAM from the server side with your favorite tools. For example, under high CPU load a Tarantool instance may stop sending metrics, so such failures can only be tracked from the outside.

Metrics reference

This page provides a detailed description of metrics from the metrics module.

General instance information:

tnt_cfg_current_time Instance system time in the Unix timestamp format
tnt_info_uptime Time in seconds since the instance has started
tnt_read_only Indicates if the instance is in read-only mode (1 if true, 0 if false)

The following metrics provide a picture of memory usage by the Tarantool process.

tnt_info_memory_cache Number of bytes in the cache used to store tuples with the vinyl storage engine.
tnt_info_memory_data Number of bytes used to store user data (tuples) with the memtx engine and with level 0 of the vinyl engine, without regard for memory fragmentation.
tnt_info_memory_index Number of bytes used for indexing user data. Includes memtx and vinyl memory tree extents, the vinyl page index, and the vinyl bloom filters.
tnt_info_memory_lua Number of bytes used for the Lua runtime. Monitoring this metric can prevent memory overflow.
tnt_info_memory_net Number of bytes used for network input/output buffers.
tnt_info_memory_tx Number of bytes in use by active transactions. For the vinyl storage engine, this is the total size of all allocated objects (struct txv, struct vy_tx, struct vy_read_interval) and tuples pinned for those objects.

These metrics provide a memory usage report for the slab allocator. The slab allocator is the main allocator used to store tuples. The following metrics help monitor the total memory usage and memory fragmentation. To learn more about use cases, refer to the box.slab submodule documentation.

Available memory, bytes:

tnt_slab_quota_size Amount of memory available to store tuples and indexes. Equal to memtx_memory.
tnt_slab_arena_size Total memory available to store both tuples and indexes. Includes allocated but currently free slabs.
tnt_slab_items_size Total amount of memory available to store only tuples and not indexes. Includes allocated but currently free slabs.

Memory usage, bytes:

tnt_slab_quota_used The amount of memory that is already reserved by the slab allocator.
tnt_slab_arena_used The effective memory used to store both tuples and indexes. Disregards allocated but currently free slabs.
tnt_slab_items_used The effective memory used to store only tuples and not indexes. Disregards allocated but currently free slabs.

Memory utilization, %:

tnt_slab_quota_used_ratio tnt_slab_quota_used / tnt_slab_quota_size
tnt_slab_arena_used_ratio tnt_slab_arena_used / tnt_slab_arena_size
tnt_slab_items_used_ratio tnt_slab_items_used / tnt_slab_items_size
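The ratio metrics are percentages, which is why the memtx alert rules in the Alerting section compare them with 80 and 90. The Python sketch below computes the three ratios from hypothetical byte readings and evaluates those four alert expressions. It ignores the `for: 1m` duration and the rounding Tarantool applies to the reported ratios.

```python
def used_ratio(used, size):
    """Utilization in percent, as reported by the *_used_ratio metrics."""
    return 100.0 * used / size if size else 0.0


def memtx_alerts(ratios):
    """Evaluate the four memtx alert expressions from the Alerting section."""
    quota, arena, items = ratios['quota'], ratios['arena'], ratios['items']
    return {
        'LowMemtxArenaRemainingWarning': quota >= 80 and arena >= 80,
        'LowMemtxArenaRemaining': quota >= 90 and arena >= 90,
        'LowMemtxItemsRemainingWarning': quota >= 80 and items >= 80,
        'LowMemtxItemsRemaining': quota >= 90 and items >= 90,
    }


MIB = 1024 * 1024
# Hypothetical byte readings of the tnt_slab_*_size and tnt_slab_*_used metrics.
slab = {
    'quota_size': 1024 * MIB,  # equals memtx_memory
    'quota_used': 950 * MIB,
    'arena_size': 950 * MIB,
    'arena_used': 870 * MIB,
    'items_size': 800 * MIB,
    'items_used': 500 * MIB,
}
ratios = {part: used_ratio(slab[part + '_used'], slab[part + '_size'])
          for part in ('quota', 'arena', 'items')}
print({part: round(value, 1) for part, value in ratios.items()})
print(memtx_alerts(ratios))
```

Both conditions in each rule must hold: a high quota ratio alone only means the allocator has reserved most of memtx_memory, not that the reserved slabs are full.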

The following metrics provide specific information about each individual space in a Tarantool instance.

tnt_space_len Number of records in the space. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_space_bsize Total number of bytes in all tuples. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_space_index_bsize Total number of bytes taken by the index. This metric always has 2 labels: {name="test", index_name="pk"}, where name is the name of the space and index_name is the name of the index.
tnt_space_total_bsize Total size of tuples and all indexes in the space. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_vinyl_tuples Total tuple count for vinyl. This metric always has 2 labels: {name="test", engine="vinyl"}, where name is the name of the space and engine is the engine of the space. For vinyl, this metric is disabled by default and can only be enabled by setting a global variable: rawset(_G, 'include_vinyl_count', true).

Network activity stats. These metrics can be used to monitor network load, usage peaks, and traffic drops.

Sent bytes:

tnt_net_sent_total Bytes sent from the instance over the network since the instance’s start time

Received bytes:

tnt_net_received_total Bytes received by the instance since start time

Connections:

tnt_net_connections_total Number of incoming network connections since the instance’s start time
tnt_net_connections_current Number of active network connections

Requests:

tnt_net_requests_total Number of network requests the instance has handled since its start time
tnt_net_requests_current Number of pending network requests

Requests in progress:

tnt_net_requests_in_progress_total Total number of requests processed by the tx thread
tnt_net_requests_in_progress_current Count of requests currently being processed in the tx thread

Requests placed in queues of streams:

tnt_net_requests_in_stream_total Total number of requests that have been placed in stream queues since the instance’s start time
tnt_net_requests_in_stream_current Count of requests currently waiting in queues of streams

Since Tarantool 2.10, each network metric has the thread label, which shows per-thread network statistics.

These metrics provide fiber statistics. If your application creates a lot of fibers, you can use the metrics below to monitor fiber count and memory usage.

tnt_fiber_amount Number of fibers
tnt_fiber_csw Overall number of fiber context switches
tnt_fiber_memalloc Amount of memory reserved for fibers
tnt_fiber_memused Amount of memory used by fibers

You can collect statistics on the iproto requests an instance has processed and aggregate them by request type. This may help you find out which operations your clients perform most often.

tnt_stats_op_total Total number of calls since server start

To distinguish between request types, this metric has the operation label. For example, it can look as follows: {operation="select"}. For the possible request types, check the table below.

auth Authentication requests
call Requests to execute stored procedures
delete Delete calls
error Requests that resulted in an error
eval Calls to evaluate Lua code
execute Execute SQL calls
insert Insert calls
prepare SQL prepare calls
replace Replace calls
select Select calls
update Update calls
upsert Upsert calls
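Because tnt_stats_op_total is a counter, the most frequent request types are the ones whose counters grow fastest. In Prometheus, a query such as `topk(3, sum by (operation) (rate(tnt_stats_op_total[5m])))` gives this ranking. The Python sketch below shows the same arithmetic on two hypothetical snapshots keyed by the operation label; it does not handle counter resets after an instance restart.

```python
def top_operations(before, after, interval_s, limit=3):
    """Rank request types by requests per second between two snapshots.

    before and after map the operation label of tnt_stats_op_total to its
    counter value; interval_s is the time between the snapshots."""
    rates = {op: (after[op] - before.get(op, 0)) / interval_s for op in after}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical snapshots taken 60 seconds apart.
before = {'select': 10_000, 'insert': 2_000, 'call': 500, 'error': 3}
after = {'select': 16_000, 'insert': 2_600, 'call': 1_000, 'error': 9}
for op, rate in top_operations(before, after, interval_s=60):
    print(f'{op}: {rate:.1f} req/s')
```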

These metrics provide the current replication status. Learn more about replication in Tarantool.

tnt_info_lsn LSN of the instance.
tnt_info_vclock LSN number in vclock. This metric always has the label {id="id"}, where id is the instance’s number in the replica set.
tnt_replication_lsn LSN of the tarantool instance. This metric always has labels {id="id", type="type"}, where id is the instance’s number in the replica set, type is master or replica.
tnt_replication_lag Replication lag value in seconds. This metric always has labels {id="id", stream="stream"}, where id is the instance’s number in the replica set, stream is downstream or upstream.
tnt_replication_status This metric equals 1 when the replication status is "follow" and 0 otherwise. This metric always has labels {id="id", stream="stream"}, where id is the instance’s number in the replica set, stream is downstream or upstream.

tnt_runtime_lua Lua garbage collector size in bytes
tnt_runtime_used Number of bytes used for the Lua runtime
tnt_runtime_tuple Number of bytes used for the tuples (except tuples owned by memtx and vinyl)

LuaJIT metrics provide insight into the work of the JIT compiler and the Lua garbage collector. These metrics are available in Tarantool 2.6 and later.

General JIT metrics:

lj_jit_snap_restore_total Overall number of snap restores
lj_jit_trace_num Number of JIT traces
lj_jit_trace_abort_total Overall number of abort traces
lj_jit_mcode_size Total size of allocated machine code areas

JIT strings:

lj_strhash_hit_total Number of strings being interned
lj_strhash_miss_total Total number of string allocations

GC steps:

lj_gc_steps_atomic_total Count of incremental GC steps (atomic state)
lj_gc_steps_sweepstring_total Count of incremental GC steps (sweepstring state)
lj_gc_steps_finalize_total Count of incremental GC steps (finalize state)
lj_gc_steps_sweep_total Count of incremental GC steps (sweep state)
lj_gc_steps_propagate_total Count of incremental GC steps (propagate state)
lj_gc_steps_pause_total Count of incremental GC steps (pause state)

Allocations:

lj_gc_strnum Number of allocated string objects
lj_gc_tabnum Number of allocated table objects
lj_gc_cdatanum Number of allocated cdata objects
lj_gc_udatanum Number of allocated udata objects
lj_gc_freed_total Total amount of freed memory
lj_gc_memory Current allocated Lua memory
lj_gc_allocated_total Total amount of allocated memory

The following metrics provide CPU usage statistics. They are only available on Linux.

tnt_cpu_number Total number of processors configured by the operating system
tnt_cpu_time Host CPU time
tnt_cpu_thread

Tarantool thread CPU time. This metric always has the labels {kind="user", thread_name="tarantool", thread_pid="pid", file_name="init.lua"}, where:

  • kind can be either user or system
  • thread_name is tarantool, wal, iproto, or coio
  • file_name is the entrypoint file name, for example, init.lua.

There are also two cross-platform metrics, which can be obtained with a getrusage() call.

tnt_cpu_user_time Tarantool CPU user time
tnt_cpu_system_time Tarantool CPU system time
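A common way to use these two counters is to turn them into CPU utilization: the CPU time consumed between two samples divided by the wall-clock time between them. In Prometheus, this is roughly `rate(tnt_cpu_user_time[1m]) + rate(tnt_cpu_system_time[1m])`. The Python sketch below shows the arithmetic, assuming both metrics are reported in seconds; the sample values are hypothetical.

```python
def cpu_utilization(before, after):
    """Percentage of one CPU core used between two samples.

    Each sample is (wall_clock_s, user_time_s, system_time_s). The result
    can exceed 100 because Tarantool runs several threads."""
    wall = after[0] - before[0]
    cpu = (after[1] - before[1]) + (after[2] - before[2])
    return 100.0 * cpu / wall


# Hypothetical samples taken 10 seconds apart.
before = (1000.0, 120.0, 30.0)
after = (1010.0, 126.5, 31.5)
print(f'{cpu_utilization(before, after):.1f}% of one core')
```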

Vinyl metrics provide vinyl engine statistics.

The disk metrics are used to monitor overall data size on disk.

tnt_vinyl_disk_data_size Amount of data in bytes stored in the .run files located in vinyl_dir
tnt_vinyl_disk_index_size Amount of data in bytes stored in the .index files located in vinyl_dir

The vinyl regulator decides when to commence disk IO actions. It groups activities in batches so that they are more consistent and efficient.

tnt_vinyl_regulator_dump_bandwidth Estimated average dumping rate, bytes per second. The rate value is initially 10485760 (10 megabytes per second). It is recalculated depending on the actual rate. Only significant dumps that are larger than 1 MB are used for estimating.
tnt_vinyl_regulator_write_rate Actual average rate of performing write operations, bytes per second. The rate is calculated as a 5-second moving average. If the metric value is gradually going down, this can indicate disk issues.
tnt_vinyl_regulator_rate_limit Write rate limit, bytes per second. The regulator imposes the limit on transactions based on the observed dump/compaction performance. If the metric value is down to approximately 10^5, this indicates issues with the disk or the scheduler.
tnt_vinyl_regulator_dump_watermark Maximum amount of memory in bytes used for in-memory storing of a vinyl LSM tree. When this maximum is reached, a dump must occur. For details, see Filling an LSM tree. The value is slightly smaller than the amount of memory allocated for vinyl trees, reflected in the vinyl_memory parameter.
tnt_vinyl_regulator_blocked_writers The number of fibers that are blocked waiting for Vinyl level0 memory quota.

tnt_vinyl_tx_commit Counter of commits (successful transaction ends). Includes implicit commits: for example, any insert operation causes a commit unless it is inside a transaction that starts with box.begin() and ends with box.commit().
tnt_vinyl_tx_rollback Counter of rollbacks (unsuccessful transaction ends). This is not merely a count of explicit box.rollback() requests: it also includes requests that ended with errors.
tnt_vinyl_tx_conflict Counter of conflicts that caused transactions to roll back. A tnt_vinyl_tx_conflict / tnt_vinyl_tx_commit ratio above 5% indicates that vinyl is not healthy. At that moment, you’ll probably see a lot of other problems with vinyl.
tnt_vinyl_tx_read_views Current number of read views – that is, transactions that entered the read-only state to avoid conflict temporarily. Usually the value is 0. If it stays non-zero for a long time, it is indicative of a memory leak.

The following metrics show the state of memory areas used by vinyl for caches and write buffers.

tnt_vinyl_memory_tuple_cache Amount of memory in bytes currently used to store tuples (data)
tnt_vinyl_memory_level0 "Level 0" (L0) memory area, bytes. L0 is the area that vinyl can use for in-memory storage of an LSM tree. By monitoring this metric, you can see when L0 is getting close to its maximum (tnt_vinyl_regulator_dump_watermark), at which time a dump will occur. You can expect L0 = 0 immediately after the dump operation is completed.
tnt_vinyl_memory_page_index Amount of memory in bytes currently used to store indexes. If the metric value is close to vinyl_memory, this indicates that vinyl_page_size was chosen incorrectly.
tnt_vinyl_memory_bloom_filter Amount of memory in bytes used by bloom filters.
tnt_vinyl_memory_tuple Total size of memory in bytes occupied by Vinyl tuples. It includes cached tuples and tuples pinned by the Lua world.

The vinyl scheduler invokes the regulator and updates the related variables. This happens once per second.

tnt_vinyl_scheduler_tasks

Number of scheduler dump/compaction tasks. The metric always has label {status = <status_value>}, where <status_value> can be one of the following:

  • inprogress for currently running tasks
  • completed for successfully completed tasks
  • failed for tasks aborted due to errors.
tnt_vinyl_scheduler_dump_time Total time in seconds spent by all worker threads performing dumps.
tnt_vinyl_scheduler_dump_total Counter of dumps completed.

Event loop tx thread information:

tnt_ev_loop_time Event loop time (ms)
tnt_ev_loop_prolog_time Event loop prolog time (ms)
tnt_ev_loop_epilog_time Event loop epilog time (ms)

These metrics show the current state of synchronous replication.

tnt_synchro_queue_owner Instance ID of the current synchronous replication master.
tnt_synchro_queue_term Current queue term.
tnt_synchro_queue_len Number of transactions that are currently collecting confirmations.
tnt_synchro_queue_busy Whether the queue is processing any system entry (CONFIRM/ROLLBACK/PROMOTE/DEMOTE).

These metrics show the current state of a replica set node with regard to leader election.

tnt_election_state

Election state (mode) of the node. When election is enabled, the node is writable only in the leader state. Possible values:

  • 0 (follower): all the non-leader nodes are called followers
  • 1 (candidate): the nodes that start a new election round are called candidates.
  • 2 (leader): the node that collected a quorum of votes becomes the leader
tnt_election_vote ID of a node the current node votes for. If the value is 0, it means the node hasn’t voted in the current term yet.
tnt_election_leader Leader node ID in the current term. If the value is 0, it means the node doesn’t know which node is the leader in the current term.
tnt_election_term Current election term.
tnt_election_leader_idle Time in seconds since the last interaction with the known leader.
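When you collect tnt_election_state from every node of a replica set, decoding the numeric values makes it easy to spot a replica set without a leader. The Python sketch below maps the values listed above to names and collects the leaders. The instance aliases are hypothetical, and the "exactly one leader" expectation is an assumption that holds only for a healthy replica set with election enabled; zero leaders is normal for a short time during an election round.

```python
# Numeric values of tnt_election_state, as listed above.
ELECTION_STATES = {0: 'follower', 1: 'candidate', 2: 'leader'}


def summarize_election(states):
    """Decode tnt_election_state values and list the nodes that are leaders.

    states maps an instance alias to its tnt_election_state value."""
    named = {alias: ELECTION_STATES.get(value, 'unknown')
             for alias, value in states.items()}
    leaders = sorted(alias for alias, name in named.items() if name == 'leader')
    return named, leaders


# Hypothetical readings from one replica set.
named, leaders = summarize_election(
    {'storage-a-001': 2, 'storage-a-002': 0, 'storage-a-003': 0})
if len(leaders) != 1:
    print('expected exactly one leader, got:', leaders)
print(named)
```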

Memtx MVCC memory statistics. The transaction manager consists of two parts:

tnt_memtx_tnx_statements are the transaction statements.

For example, suppose the user started a transaction and performed space:replace{0, 1} in it. Under the hood, this operation becomes a statement of the current transaction. This metric always has the label {kind="..."}, which has the following possible values:

  • total: the number of bytes that are allocated for the statements of all current transactions.
  • average: average bytes used by transactions for statements (txn.statements.total bytes / number of open transactions).
  • max: the maximum number of bytes used for statements by a single current transaction.
tnt_memtx_tnx_user

The Tarantool C API provides the box_txn_alloc() function, which lets the user allocate memory for the current transaction. This metric always has the label {kind="..."}, which has the following possible values:

  • total: memory allocated by the box_txn_alloc() function on all current transactions.
  • average: transaction average (total allocated bytes / number of all current transactions).
  • max: the maximum number of bytes allocated by box_txn_alloc() function per transaction.
tnt_memtx_tnx_system

Memory allocated for transaction internals, such as logs and savepoints. This metric always has the label {kind="..."}, which has the following possible values:

  • total: memory allocated by internals for all current transactions.
  • average: average memory allocated by internals (total memory / number of all current transactions).
  • max: the maximum number of bytes allocated by internals per transaction.
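The total, average, and max values of the kind label relate to each other in the same way for tnt_memtx_tnx_statements, tnt_memtx_tnx_user, and tnt_memtx_tnx_system. The following Go sketch reproduces that arithmetic from a hypothetical list of per-transaction byte counts. Tarantool computes these values internally; the type and function names are hypothetical, and the use of integer division for the average is an assumption.

```go
package main

import "fmt"

// kindStats holds the three values reported under the kind label.
type kindStats struct {
	Total   uint64
	Average uint64
	Max     uint64
}

// computeKindStats derives total, average, and max from the number of
// bytes allocated by each open transaction (a hypothetical snapshot).
func computeKindStats(perTxn []uint64) kindStats {
	var s kindStats
	for _, b := range perTxn {
		s.Total += b
		if b > s.Max {
			s.Max = b
		}
	}
	if len(perTxn) > 0 {
		// average = total bytes / number of open transactions
		s.Average = s.Total / uint64(len(perTxn))
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", computeKindStats([]uint64{128, 512, 256}))
}
```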

MVCC is responsible for transaction isolation. It detects conflicts and makes sure that tuples that are no longer in the space, but are read (or can be read) by some transaction, are not deleted.

tnt_memtx_mvcc_trackers

Trackers that keep track of transaction reads. This metric always has the label {kind="..."}, which has the following possible values:

  • total: memory (in bytes) allocated for the trackers of all current transactions.
  • average: average for all current transactions (total memory bytes / number of transactions).
  • max: maximum memory (in bytes) allocated for trackers per transaction.
tnt_memtx_mvcc_conflicts

Memory allocated in case of transaction conflicts. This metric always has the label {kind="..."}, which has the following possible values:

  • total: bytes allocated for conflicts in total.
  • average: average for all current transactions (total memory bytes / number of transactions).
  • max: maximum bytes allocated for conflicts per transaction.

Saved tuples are divided into three categories: used, read_view, and tracking.

Each category has two metrics:

  • retained tuples: tuples that are no longer in the index, but MVCC does not allow them to be removed.
  • stories: MVCC is based on the story mechanism, and almost every tuple has a story. This is a separate metric because even tuples that are still in the index can have a story, so stories and retained tuples have to be measured separately.
tnt_memtx_mvcc_tuples_used_stories

Tuples that are used by active read-write transactions. This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of used tuples / number of stories.
  • total: amount of bytes used by the stories of used tuples.
tnt_memtx_mvcc_tuples_used_retained

Tuples that are used by active read-write transactions and are no longer in the index, but MVCC does not allow them to be removed. This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of retained used tuples / number of stories.
  • total: amount of bytes used by retained used tuples.
tnt_memtx_mvcc_tuples_read_view_stories

Tuples that are not used by active read-write transactions, but are used by read-only transactions (i.e. in read view). This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of read_view tuples / number of stories.
  • total: amount of bytes used by the stories of read_view tuples.
tnt_memtx_mvcc_tuples_read_view_retained

Tuples that are not used by active read-write transactions, but are used by read-only transactions (i.e. in read view). These tuples are no longer in the index, but MVCC does not allow them to be removed. This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of retained read_view tuples / number of stories.
  • total: amount of bytes used by retained read_view tuples.
tnt_memtx_mvcc_tuples_tracking_stories

Tuples that are not directly used by any transaction, but are used by MVCC to track reads. This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of tracking tuples / number of tracking stories.
  • total: amount of bytes used by the stories of tracking tuples.
tnt_memtx_mvcc_tuples_tracking_retained

Tuples that are not directly used by any transaction, but are used by MVCC to track reads. These tuples are no longer in the index, but MVCC does not allow them to be removed. This metric always has the label {kind="..."}, which has the following possible values:

  • count: number of retained tracking tuples / number of stories.
  • total: amount of bytes used by retained tracking tuples.

tnt_memtx_tuples_data_total Total amount of memory (in bytes) allocated for data tuples. This includes tnt_memtx_tuples_data_read_view and tnt_memtx_tuples_data_garbage metric values plus tuples that are actually stored in memtx spaces.
tnt_memtx_tuples_data_read_view Memory (in bytes) held for read views.
tnt_memtx_tuples_data_garbage Memory (in bytes) that is unused and scheduled to be freed (freed lazily on memory allocation).
tnt_memtx_index_total Total amount of memory (in bytes) allocated for indexing data. This includes tnt_memtx_index_read_view metric value plus memory used for indexing tuples that are actually stored in memtx spaces.
tnt_memtx_index_read_view Memory (in bytes) held for read views.
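Because tnt_memtx_tuples_data_total and tnt_memtx_index_total include the read view parts (and, for tuples, the garbage part), the memory held by live data can be estimated by subtraction. This Go sketch assumes all gauges come from the same scrape; the function names are hypothetical.

```go
package main

import "fmt"

// remainder returns total minus the sum of held, or 0 if the parts exceed
// the total (gauges are sampled separately, so a snapshot may be
// momentarily inconsistent).
func remainder(total uint64, held ...uint64) uint64 {
	var sum uint64
	for _, h := range held {
		sum += h
	}
	if sum > total {
		return 0
	}
	return total - sum
}

// storedTupleBytes estimates memory of tuples actually stored in memtx
// spaces: data_total - data_read_view - data_garbage.
func storedTupleBytes(dataTotal, dataReadView, dataGarbage uint64) uint64 {
	return remainder(dataTotal, dataReadView, dataGarbage)
}

// liveIndexBytes estimates memory used for indexing stored tuples:
// index_total - index_read_view.
func liveIndexBytes(indexTotal, indexReadView uint64) uint64 {
	return remainder(indexTotal, indexReadView)
}

func main() {
	// Hypothetical sample values in bytes.
	fmt.Println(storedTupleBytes(1000000, 150000, 50000))
	fmt.Println(liveIndexBytes(400000, 100000))
}
```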

Since: 3.0.0.

tnt_config_alerts Number of current alerts raised while applying the instance configuration. The {level="warn"} label covers warnings, and {level="error"} covers errors.
tnt_config_status

The status of applying the current instance configuration. The status label contains a possible status name. The current status has the metric value 1; inactive statuses have the metric value 0.

# HELP tnt_config_status Tarantool 3 configuration status
# TYPE tnt_config_status gauge
tnt_config_status{status="reload_in_progress",alias="router-001-a"} 0
tnt_config_status{status="uninitialized",alias="router-001-a"} 0
tnt_config_status{status="check_warnings",alias="router-001-a"} 0
tnt_config_status{status="ready",alias="router-001-a"} 1
tnt_config_status{status="check_errors",alias="router-001-a"} 0
tnt_config_status{status="startup_in_progress",alias="router-001-a"} 0

For example, this set of metrics means that the current configuration status of router-001-a is ready.
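A client that scrapes these metrics can find the active status by picking the series whose value is 1. The Go sketch below is a minimal parser for the text format shown above. It assumes the labels appear in the order status, alias, as in the sample; a real setup would more likely use a Prometheus client library or a PromQL query. The function name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// statusLine matches series such as
//   tnt_config_status{status="ready",alias="router-001-a"} 1
// It assumes the label order shown in the sample (status, then alias).
var statusLine = regexp.MustCompile(`^tnt_config_status\{status="([^"]+)",alias="([^"]+)"\} (\S+)$`)

// activeStatuses returns, for each alias, the status whose value is 1.
// Comment lines (# HELP, # TYPE) and other metrics are skipped.
func activeStatuses(exposition string) map[string]string {
	active := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		m := statusLine.FindStringSubmatch(strings.TrimSpace(scanner.Text()))
		if m == nil {
			continue
		}
		if v, err := strconv.ParseFloat(m[3], 64); err == nil && v == 1 {
			active[m[2]] = m[1]
		}
	}
	return active
}

func main() {
	sample := `# HELP tnt_config_status Tarantool 3 configuration status
# TYPE tnt_config_status gauge
tnt_config_status{status="reload_in_progress",alias="router-001-a"} 0
tnt_config_status{status="ready",alias="router-001-a"} 1
tnt_config_status{status="check_errors",alias="router-001-a"} 0`
	fmt.Println(activeStatuses(sample)["router-001-a"])
}
```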

Notes for specific operating systems

On macOS, no native system tools for administering Tarantool are supported. The recommended way to administer Tarantool instances is using tt CLI.

The section below describes the dev-db/tarantool package for Gentoo Linux installed from the official layman overlay (named tarantool).

By default, the /etc/tarantool/instances.available directory is used for instances. It can be overridden in /etc/default/tarantool.

Tarantool instances can be managed (started, stopped, restarted, checked for status, and so on) with OpenRC. Here is an example of how to create an instance managed by OpenRC:

$ cd /etc/init.d
$ ln -s tarantool your_service_name
$ ln -s /usr/share/tarantool/your_service_name.lua /etc/tarantool/instances.available/your_service_name.lua

Check that it works:

$ /etc/init.d/your_service_name start
$ tail -f -n 100 /var/log/tarantool/your_service_name.log

Troubleshooting guide

Possible reasons

Solution

You have several options:

Possible reasons

The transaction processor thread consumes over 60% CPU.

Solution

Attach to the Tarantool instance with the tt utility, analyze the query statistics with box.stat(), and find the request type that consumes the most CPU. The following commands can help:

$ # attaching to a Tarantool instance
$ tt connect <instance_name|URI>
-- request RPS for stored procedure calls
tarantool> box.stat().CALL.rps

The critical RPS value is 75,000. For a large Lua application (a modular application containing more than 200 lines of code), it is 10,000 to 20,000.

-- request RPS for requests of the specified type
tarantool> box.stat().<query_type>.rps

The critical RPS value for SELECT/INSERT/UPDATE/DELETE requests is 100,000.

If the load is mostly generated by SELECT requests, add a slave server (replica) and process part of the requests on it.

If the load is mostly generated by INSERT/UPDATE/DELETE requests, sharding the database is recommended.

Possible reasons

Note

All the situations described below can be recognized by Tarantool log entries that start with 'Too long...'.

  1. Fast and slow requests are processed in the same connection, so slow requests clog the readahead buffer.

    Solution

    You have several options:

    • Increase the readahead buffer size (box.cfg{readahead}).

      This parameter can be changed on the fly, so you don’t need to restart Tarantool. Attach to the Tarantool instance with the tt utility and call box.cfg{} with a new readahead value:

      $ # attaching to a Tarantool instance
      $ tt connect <instance_name|URI>
      
      -- set a new readahead value
      tarantool> box.cfg{readahead = 10 * 1024 * 1024}
      

      Example calculation: with 1000 RPS, a request size of 1 KB, and a maximum processing time of 10 seconds per request, the readahead buffer must be at least 10 MB.

    • Process fast and slow requests in separate connections (this is done at the business logic level).

  2. Slow disks.

    Solution

    Check disk utilization (look at the iowait value with the iostat, iotop, or strace utility) and try to place the .xlog files and the database snapshots on different disks (that is, specify different values for the wal_dir and memtx_dir parameters).
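The readahead example calculation above is the product of three values. A minimal Go sketch of the same arithmetic, assuming 1 KB = 1024 bytes; the helper name minReadahead is hypothetical:

```go
package main

import "fmt"

// minReadahead returns the minimum readahead buffer size in bytes:
// requests per second * request size in bytes * maximum processing
// time of one request in seconds.
func minReadahead(rps, requestBytes, maxSeconds uint64) uint64 {
	return rps * requestBytes * maxSeconds
}

func main() {
	need := minReadahead(1000, 1024, 10)   // 1000 RPS, 1 KB requests, 10 s
	configured := uint64(10 * 1024 * 1024) // value from box.cfg{readahead = 10 * 1024 * 1024}
	fmt.Println(need, configured, need <= configured)
}
```

With these inputs the minimum is 10,240,000 bytes, so the 10 * 1024 * 1024 value used in the box.cfg{} call leaves a small margin.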

This refers to the box.info.replication.(upstream.)lag and box.info.replication.(upstream.)idle parameters from the box.info.replication summary table.

Possible reasons

The clocks on the machines are not synchronized, or the NTP server is not working properly.

Solution

Check the NTP server settings.

If you find no problems with the NTP server, do nothing. Replication lag is calculated from the system clocks of two different machines, and if they are out of sync, the clock of the remote master may consistently lag behind the clock of the local Tarantool instance.

This refers to a cluster with one master and several replicas. In this case, the values of common parameters from the box.info.replication summary table, for example box.info.replication.lsn, must come from the master and must be the same on all replicas. If such parameters differ, there is a problem.

Possible reasons

Replication failure.

Solution

Restart replication.

This refers to the case when the box.info.replication(.upstream).status parameter has the value stopped.

Possible reasons

In a replication cluster with two master servers, one of the servers tried to perform an action that the other server had already performed, for example, to insert a tuple with the same unique key again (this is recognized by an error like 'Duplicate key exists in unique index 'primary' in space <space_name>').

Solution

This issue can be fixed in two ways:

Note

If one of the instances must be isolated during troubleshooting, it can be put into isolated mode.

Then, restart replication as described in Restarting replication.

Possible reasons

Inefficient memory usage (memory is occupied by a large number of unused objects).

Solution

Run the Lua garbage collector with collectgarbage('collect') and measure its execution time with clock.bench() or clock.proc(). Note that collectgarbage('count') does not run a collection; it only returns the amount of memory in use (in kilobytes).

Example code for measuring garbage collection time:

$ # attaching to a Tarantool instance
$ tt connect <instance_name|URI>
-- load the clock module for working with time
-- (use globals: console locals are not visible on the next line)
tarantool> clock = require('clock')
-- start the timer
tarantool> b = clock.proc()
-- run a full garbage collection cycle
tarantool> collectgarbage('collect')
-- stop the timer; also return the memory in use (KB) after collection
tarantool> return collectgarbage('count'), clock.proc() - b

If the measured time (the second returned value, computed with clock.proc()) is greater than 0.001, this may indicate inefficient memory usage: no active intervention is required, but code optimization is recommended. If the value exceeds 0.01, the application code must be analyzed in detail and its memory usage optimized.
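If this check is automated, for example by a monitoring job that collects the measured time from the instance, the two thresholds above become a simple classification. A minimal Go sketch; the function name and messages are hypothetical:

```go
package main

import "fmt"

// gcVerdict classifies the measured garbage collection time in seconds
// (the difference between two clock.proc() readings) using the 0.001 and
// 0.01 thresholds described above.
func gcVerdict(seconds float64) string {
	switch {
	case seconds > 0.01:
		return "analyze and optimize memory usage"
	case seconds > 0.001:
		return "possible inefficiency: optimization recommended"
	default:
		return "ok"
	}
}

func main() {
	for _, s := range []float64{0.0005, 0.005, 0.05} {
		fmt.Printf("%.4f s: %s\n", s, gcVerdict(s))
	}
}
```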


Problem: Fiber switch is forbidden in the __gc metamethod

Since this change, fiber switching is forbidden in the __gc metamethod to avoid unexpected Lua out-of-memory errors. However, a function that yields may still be needed to finalize resources, for example, to close a socket.

Below are examples of how to implement such a procedure correctly.

First, there are two simple examples that illustrate the logic of the solution:

Then comes Example 3, which illustrates the use of the sched.lua module, the recommended method.

All explanations are given in the comments of the code listings. -- > denotes console output.

Example 1

Implementing a valid finalizer for a particular FFI type (custom_t).

local ffi = require('ffi')
local fiber = require('fiber')

ffi.cdef('struct custom { int a; };')

local function __custom_gc(self)
  print(("Entered custom GC finalizer for %s... (before yield)"):format(self.a))
  fiber.yield()
  print(("Leaving custom GC finalizer for %s... (after yield)"):format(self.a))
end

local custom_t = ffi.metatype('struct custom', {
  __gc = function(self)
    -- XXX: Do not invoke yielding functions in __gc metamethod.
    -- Create a new fiber to run after the execution leaves
    -- this routine.
    fiber.new(__custom_gc, self)
    print(("Finalization is scheduled for %s..."):format(self.a))
  end
})

-- Create a cdata object of <custom_t> type.
local c = custom_t(42)

-- Remove a single reference to that object to make it subject
-- for GC.
c = nil

-- Run full GC cycle to purge the unreferenced object.
collectgarbage('collect')
-- > Finalization is scheduled for 42...

-- XXX: There is no finalization made until the running fiber
-- yields its execution. Let's do it now.
fiber.yield()
-- > Entered custom GC finalizer for 42... (before yield)
-- > Leaving custom GC finalizer for 42... (after yield)

Example 2

Implementing a valid finalizer for a particular user type (struct custom).

custom.c

#include <lauxlib.h>
#include <lua.h>
#include <module.h>
#include <stdio.h>

struct custom {
  int a;
};

const char *CUSTOM_MTNAME = "CUSTOM_MTNAME";

/*
 * XXX: Do not invoke yielding functions in __gc metamethod.
 * Create a new fiber to be run after the execution leaves
 * this routine. Unfortunately we can't pass the parameters to the
 * routine to be executed by the created fiber via <fiber_new_ex>.
 * So there is a workaround to load the Lua code below to create
 * __gc metamethod passing the object for finalization via Lua
 * stack to the spawned fiber.
 */
const char *gc_wrapper_constructor = " local fiber = require('fiber')         "
             " print('constructor is initialized')    "
             " return function(__custom_gc)           "
             "   print('constructor is called')       "
             "   return function(self)                "
             "     print('__gc is called')            "
             "     fiber.new(__custom_gc, self)       "
             "     print('Finalization is scheduled') "
             "   end                                  "
             " end                                    "
        ;

int custom_gc(lua_State *L) {
  struct custom *self = luaL_checkudata(L, 1, CUSTOM_MTNAME);
  printf("Entered custom_gc for %d... (before yield)\n", self->a);
  fiber_sleep(0);
  printf("Leaving custom_gc for %d... (after yield)\n", self->a);
  return 0;
}

int custom_new(lua_State *L) {
  struct custom *self = lua_newuserdata(L, sizeof(struct custom));
  luaL_getmetatable(L, CUSTOM_MTNAME);
  lua_setmetatable(L, -2);
  self->a = lua_tonumber(L, 1);
  return 1;
}

static const struct luaL_Reg libcustom_methods [] = {
  { "new", custom_new },
  { NULL, NULL }
};

int luaopen_custom(lua_State *L) {
  int rc;

  /* Create metatable for struct custom type */
  luaL_newmetatable(L, CUSTOM_MTNAME);
  /*
   * Run the constructor initializer for GC finalizer:
   * - load fiber module as an upvalue for GC finalizer
   *   constructor
   * - return GC finalizer constructor on the top of the
   *   Lua stack
   */
  rc = luaL_dostring(L, gc_wrapper_constructor);
  /*
   * Check whether constructor is initialized (i.e. neither
   * syntax nor runtime error is raised).
   */
  if (rc != LUA_OK)
    luaL_error(L, "test module loading failed: constructor init");
  /*
   * Create GC object for <custom_gc> function to be called
   * in scope of the GC finalizer and push it on top of the
   * constructor returned before.
   */
  lua_pushcfunction(L, custom_gc);
  /*
   * Run the constructor with <custom_gc> GCfunc object as
   * a single argument. As a result GC finalizer is returned
   * on the top of the Lua stack.
   */
  rc = lua_pcall(L, 1, 1, 0);
  /*
   * Check whether GC finalizer is created (i.e. neither
   * syntax nor runtime error is raised).
   */
  if (rc != LUA_OK)
    luaL_error(L, "test module loading failed: __gc init");
  /*
   * Assign the returned function as a __gc metamethod to
   * custom type metatable.
   */
  lua_setfield(L, -2, "__gc");

  /*
   * Initialize Lua table for custom module and fill it
   * with the custom methods.
   */
  lua_newtable(L);
  luaL_register(L, NULL, libcustom_methods);
  return 1;
}

custom_c.lua

-- Load custom Lua C extension.
local custom = require('custom')
-- > constructor is initialized
-- > constructor is called

-- Create a userdata object of <struct custom> type.
local c = custom.new(9)

-- Remove a single reference to that object to make it subject
-- for GC.
c = nil

-- Run full GC cycle to purge the unreferenced object.
collectgarbage('collect')
-- > __gc is called
-- > Finalization is scheduled

-- XXX: There is no finalization made until the running fiber
-- yields its execution. Let's do it now.
require('fiber').yield()
-- > Entered custom_gc for 9... (before yield)

-- XXX: Finalizer yields the execution, so now we are here.
print('We are here')
-- > We are here

-- XXX: This fiber finishes its execution, so yield to the
-- remaining fiber to finish the postponed finalization.
-- > Leaving custom_gc for 9... (after yield)

Example 3

Note that the finalizer implementations in the examples above put extra pressure on platform performance by creating a new fiber on each __gc call. To prevent such excessive fiber spawning, it is better to start a single "scheduler" fiber and provide an interface for postponing the required asynchronous action.

For this purpose, the sched.lua module is implemented (see the listing below). It is a part of Tarantool and should be loaded with require() in your custom code. The usage example is given in the init.lua file below.

sched.lua

local fiber = require('fiber')

local worker_next_task = nil
local worker_last_task
local worker_fiber
local worker_cv = fiber.cond()

-- XXX: the module is not ready for reloading, so worker_fiber is
-- respawned when sched.lua is purged from package.loaded.

--
-- Worker is a singleton fiber for not urgent delayed execution of
-- functions. Main purpose - schedule execution of a function,
-- which is going to yield, from a context, where a yield is not
-- allowed. Such as an FFI object's GC callback.
--
local function worker_f()
  while true do
    local task
    while true do
      task = worker_next_task
      if task then break end
      -- XXX: Make the fiber wait until the task is added.
      worker_cv:wait()
    end
    worker_next_task = task.next
    task.f(task.arg)
    fiber.yield()
  end
end

local function worker_safe_f()
  pcall(worker_f)
  -- The function <worker_f> never returns. If the execution is
  -- here, this fiber is probably canceled and now is not able to
  -- sleep. Create a new one.
  worker_fiber = fiber.new(worker_safe_f)
end

worker_fiber = fiber.new(worker_safe_f)

local function worker_schedule_task(f, arg)
  local task = { f = f, arg = arg }
  if not worker_next_task then
    worker_next_task = task
  else
    worker_last_task.next = task
  end
  worker_last_task = task
  worker_cv:signal()
end

return {
  postpone = worker_schedule_task
}

init.lua

local ffi = require('ffi')
local fiber = require('fiber')
local sched = require('sched')

local function __custom_gc(self)
  print(("Entered custom GC finalizer for %s... (before yield)"):format(self.a))
  fiber.yield()
  print(("Leaving custom GC finalizer for %s... (after yield)"):format(self.a))
end

ffi.cdef('struct custom { int a; };')
local custom_t = ffi.metatype('struct custom', {
  __gc = function(self)
    -- XXX: Do not invoke yielding functions in __gc metamethod.
    -- Schedule __custom_gc call via sched.postpone to be run
    -- after the execution leaves this routine.
    sched.postpone(__custom_gc, self)
    print(("Finalization is scheduled for %s..."):format(self.a))
  end
})

-- Create several <custom_t> objects to be finalized later.
local t = { }
for i = 1, 10 do t[i] = custom_t(i) end

-- Run full GC cycle to collect the existing garbage. Nothing is
-- going to be printed, since the table <t> is still "alive".
collectgarbage('collect')

-- Remove the reference to the table and, ergo, all references to
-- the objects.
t = nil

-- Run full GC cycle to collect the table and objects inside it.
-- As a result all <custom_t> objects are scheduled for further
-- finalization, but the finalizer itself (i.e. __custom_gc
-- functions) is not called.
collectgarbage('collect')
-- > Finalization is scheduled for 10...
-- > Finalization is scheduled for 9...
-- > ...
-- > Finalization is scheduled for 2...
-- > Finalization is scheduled for 1...

-- XXX: There is no finalization made until the running fiber
-- yields its execution. Let's do it now.
fiber.yield()
-- > Entered custom GC finalizer for 10... (before yield)

-- XXX: Oops, we are here now, since the scheduler fiber yielded
-- the execution to this one. Check this out.
print("We're here now. Let's continue the scheduled finalization.")
-- > We're here now. Let's continue the scheduled finalization.

-- OK, wait a second to allow the scheduler to cleanup the
-- remaining garbage.
fiber.sleep(1)
-- > Leaving custom GC finalizer for 10... (after yield)
-- > Entered custom GC finalizer for 9... (before yield)
-- > Leaving custom GC finalizer for 9... (after yield)
-- > ...
-- > Entered custom GC finalizer for 1... (before yield)
-- > Leaving custom GC finalizer for 1... (after yield)

print("Did we finish? I guess so.")
-- > Did we finish? I guess so.

-- Stop the instance.
os.exit(0)

Connectors

Connectors are APIs that allow using Tarantool with various programming languages.

Connectors can be divided into two groups – those maintained by the Tarantool team and those supported by the community. The Tarantool team maintains the following connectors:

All other connectors are community-supported, which means that support for new Tarantool features may be delayed. Find all the available connectors on the Connectors page.

Tarantool’s binary protocol was designed with a focus on asynchronous I/O and easy integration with proxies. Each client request starts with a variable-length binary header, containing request id, request type, instance id, log sequence number, and so on.

The mandatory length, present in the request header, simplifies client or proxy I/O. A response to a request is sent to the client as soon as it is ready. It always carries in its header the same type and id as in the request. The id makes it possible to match a request to a response, even if the latter arrived out of order.
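The request id (called sync in the protocol) is what lets a client keep many requests in flight on one connection. The Go sketch below shows the matching logic in isolation. The dispatcher type and its methods are hypothetical and do not reproduce any connector's API, and no network I/O is performed.

```go
package main

import (
	"fmt"
	"sync"
)

// response is a decoded reply; Sync echoes the id of the request.
type response struct {
	Sync    uint64
	Payload string
}

// dispatcher matches responses to pending requests by their id.
type dispatcher struct {
	mu      sync.Mutex
	nextID  uint64
	pending map[uint64]chan response
}

func newDispatcher() *dispatcher {
	return &dispatcher{pending: make(map[uint64]chan response)}
}

// register allocates a new request id and a channel for its response.
func (d *dispatcher) register() (uint64, <-chan response) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.nextID++
	ch := make(chan response, 1) // buffered: delivery never blocks
	d.pending[d.nextID] = ch
	return d.nextID, ch
}

// deliver routes a response to the request with the same id and
// reports false if no such request is pending.
func (d *dispatcher) deliver(r response) bool {
	d.mu.Lock()
	ch, ok := d.pending[r.Sync]
	delete(d.pending, r.Sync)
	d.mu.Unlock()
	if ok {
		ch <- r
	}
	return ok
}

func main() {
	d := newDispatcher()
	idA, chA := d.register()
	idB, chB := d.register()
	idC, chC := d.register()
	// Responses arrive in a different order than the requests were sent.
	d.deliver(response{Sync: idC, Payload: "reply to C"})
	d.deliver(response{Sync: idA, Payload: "reply to A"})
	d.deliver(response{Sync: idB, Payload: "reply to B"})
	fmt.Println((<-chA).Payload, "|", (<-chB).Payload, "|", (<-chC).Payload)
}
```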

Unless implementing a client driver, you needn’t concern yourself with the complications of the binary protocol. Language-specific drivers provide a friendly way to store domain language data structures in Tarantool. A complete description of the binary protocol is maintained in annotated Backus-Naur form in the source tree. For detailed examples and diagrams of all binary-protocol requests and responses, see Tarantool’s binary protocol.

The Tarantool API exists so that a client program can send a request packet to a server instance and receive a response. Here is an example of what the client would send for box.space[513]:insert{'A', 'BB'}. The BNF description of the components is on the page about Tarantool’s binary protocol.

Component                           Byte #0  Byte #1  Byte #2  Byte #3
code for insert                     02
rest of header
2-digit number: space id            cd       02       01
code for tuple                      21
1-digit number: field count = 2     92
1-character string: field[1]        a1       41
2-character string: field[2]        a2       42       42

Now, you could send that packet to the Tarantool instance and interpret the response (the page about Tarantool’s binary protocol has a description of the packet format for responses as well as requests). But it would be easier, and less error-prone, if you could invoke a routine that formats the packet according to typed parameters. Something like response = tarantool_routine("insert", 513, "A", "BB");. And that is why APIs exist for drivers for Perl, Python, PHP, and so on.

This chapter has examples that show how to connect to a Tarantool instance via the Perl, PHP, Python, node.js, and C connectors. The examples contain hard code that will work if and only if the following conditions are met:

It is easy to meet all the conditions by starting the instance and executing this script:

box.cfg{listen=3301}
box.schema.space.create('examples',{id=999})
box.space.examples:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
box.schema.user.grant('guest','read,write','space','examples')
box.schema.user.grant('guest','read','space','_space')

For all connectors, calling a function via Tarantool causes a return in the MsgPack format. If the function is called using the connector’s API, some conversions may occur. All scalar values are returned as tuples (with a MsgPack type-identifier followed by a value); all non-scalar values are returned as a group of tuples (with a MsgPack array-identifier followed by the scalar values). If the function is called via the binary protocol command layer ("eval") rather than via the connector’s API, no conversions occur.

In the following example, a Lua function will be created. Since it will be accessed externally by the "guest" user, the execute privilege must be granted. The function returns an empty array, a scalar string, two booleans, and a short integer. The values are the ones described in the table Common Types and MsgPack Encodings.

tarantool> box.cfg{listen=3301}
2016-03-03 18:45:52.802 [27381] main/101/interactive I> ready to accept requests
---
...
tarantool> function f() return {},'a',false,true,127; end
---
...
tarantool> box.schema.func.create('f')
---
...
tarantool> box.schema.user.grant('guest','execute','function','f')
---
...

Here is a C program that calls the function. Although C is used for the example, the result would be exactly the same if the calling program were written in Perl, PHP, Python, Go, or Java.

#include <stdio.h>
#include <stdlib.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>
int main(void) {
  struct tnt_stream *tnt = tnt_net(NULL);              /* SETUP */
  tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
  if (tnt_connect(tnt) < 0) {                          /* CONNECT */
    printf("Connection refused\n");
    exit(-1);
  }
  struct tnt_stream *arg = tnt_object(NULL);           /* MAKE REQUEST */
  tnt_object_add_array(arg, 0);                        /* f() takes no arguments */
  struct tnt_request *req1 = tnt_request_call(NULL);   /* CALL function f() */
  tnt_request_set_funcz(req1, "f");
  tnt_request_set_tuple(req1, arg);                    /* attach the empty argument array */
  tnt_request_compile(tnt, req1);
  tnt_flush(tnt);                                      /* SEND REQUEST */
  struct tnt_reply reply;  tnt_reply_init(&reply);     /* GET REPLY */
  tnt->read_reply(tnt, &reply);
  if (reply.code != 0) {
    printf("Call failed %llu.\n", (unsigned long long)reply.code);
    exit(-1);
  }
  const unsigned char *p = (unsigned char *)reply.data; /* PRINT REPLY */
  while (p < (unsigned char *)reply.data_end) {
    printf("%x ", *p);
    ++p;
  }
  printf("\n");
  tnt_close(tnt);                                      /* TEARDOWN */
  tnt_stream_free(arg);
  tnt_stream_free(tnt);
  return 0;
}

When this program is executed, it will print:

dd 0 0 0 5 90 91 a1 61 91 c2 91 c3 91 7f

The first five bytes, dd 0 0 0 5, are the MsgPack encoding for "32-bit array header with value 5" (see the MsgPack specification). The rest are as described in the table Common Types and MsgPack Encodings.

Go

Examples on GitHub: sample_db, go

go-tarantool is the official Go connector for Tarantool. It is not supplied as part of the Tarantool repository and should be installed separately.

This tutorial shows how to use the go-tarantool 2.x library to create a Go application that connects to a remote Tarantool instance, performs CRUD operations, and executes a stored procedure. You can find the full package documentation here: Client in Go for Tarantool.

Note

This tutorial shows how to make CRUD requests to a single-instance Tarantool database. To make requests to a sharded Tarantool cluster with the CRUD module, use the crud package’s API.

This section describes the configuration of a sample database that allows remote connections:

credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ read, write ]
        spaces: [ bands ]
      - permissions: [ execute ]
        functions: [ get_bands_older_than ]

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'

app:
  file: 'myapp.lua'

The myapp.lua file looks as follows:

-- Create a space --
box.schema.space.create('bands')

-- Specify field names and types --
box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})

-- Create indexes --
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:create_index('band', { parts = { 'band_name' } })
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })

-- Create a stored function --
box.schema.func.create('get_bands_older_than', {
    body = [[
    function(year)
        return box.space.bands.index.year_band:select({ year }, { iterator = 'LT', limit = 10 })
    end
    ]]
})

You can find the full example on GitHub: sample_db.

Before creating and starting a client Go application, you need to run the sample_db application using tt start:

$ tt start sample_db

Now you can create a client Go application that makes requests to this database.

Before you start, make sure you have Go installed on your computer.

  1. Create the hello directory for your application and go to this directory:

    $ mkdir hello
    $ cd hello
    
  2. Initialize a new Go module:

    $ go mod init example/hello
    
  3. Inside the hello directory, create the hello.go file for application code.

In the hello.go file, declare a main package and import the following packages:

package main

import (
	"context"
	"fmt"
	"github.com/tarantool/go-tarantool/v2"
	_ "github.com/tarantool/go-tarantool/v2/datetime"
	_ "github.com/tarantool/go-tarantool/v2/decimal"
	_ "github.com/tarantool/go-tarantool/v2/uuid"
	"time"
)

The packages for external MsgPack types, such as datetime, decimal, or uuid, are required to parse these types in a response.

  1. Declare the main() function:

    func main() {
    
    }
    
  2. Inside the main() function, add the following code:

    // Connect to the database
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()
    dialer := tarantool.NetDialer{
    	Address:  "127.0.0.1:3301",
    	User:     "sampleuser",
    	Password: "123456",
    }
    opts := tarantool.Opts{
    	Timeout: time.Second,
    }
    
    conn, err := tarantool.Connect(ctx, dialer, opts)
    if err != nil {
    	fmt.Println("Connection refused:", err)
    	return
    }
    
    // Interact with the database
    // ...
    

    This code establishes a connection to a running Tarantool instance on behalf of sampleuser. The conn object can be used to make CRUD requests and execute stored procedures.
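Connecting on behalf of sampleuser involves authentication. The following Python sketch (standard library only, not part of go-tarantool) assumes the default chap-sha1 method and computes the scramble the client sends, following the formula in Tarantool's binary protocol documentation. Only the first 20 bytes of the base64-decoded salt from the server greeting are used. The random salt and the helper names are made up for illustration.

```python
import base64
import hashlib
import os


def sha1(*parts: bytes) -> bytes:
    """SHA-1 digest of the concatenation of `parts`."""
    h = hashlib.sha1()
    for part in parts:
        h.update(part)
    return h.digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chap_sha1_scramble(password: str, salt: bytes) -> bytes:
    """Client side: the value sent in the authentication request."""
    step_1 = sha1(password.encode())   # sha1(password)
    step_2 = sha1(step_1)              # sha1(sha1(password)), stored by the server
    step_3 = sha1(salt[:20], step_2)   # only the first 20 salt bytes are used
    return xor(step_1, step_3)


def server_verify(stored_step_2: bytes, salt: bytes, scramble: bytes) -> bool:
    """Server side: recover sha1(password) and compare its hash with the stored one."""
    candidate = xor(scramble, sha1(salt[:20], stored_step_2))
    return sha1(candidate) == stored_step_2


# Hypothetical greeting salt: the greeting carries it base64-encoded.
greeting_salt_b64 = base64.b64encode(os.urandom(32))
salt = base64.b64decode(greeting_salt_b64)
scramble = chap_sha1_scramble("123456", salt)
print(len(scramble), server_verify(sha1(sha1(b"123456")), salt, scramble))  # 20 True
```

The password itself never travels over the network, and the per-connection salt makes a captured scramble useless for another session.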

Add the following code to insert four tuples into the bands space:

// Insert data
tuples := [][]interface{}{
	{1, "Roxette", 1986},
	{2, "Scorpions", 1965},
	{3, "Ace of Base", 1987},
	{4, "The Beatles", 1960},
}
var futures []*tarantool.Future
for _, tuple := range tuples {
	request := tarantool.NewInsertRequest("bands").Tuple(tuple)
	futures = append(futures, conn.Do(request))
}
fmt.Println("Inserted tuples:")
for _, future := range futures {
	result, err := future.Get()
	if err != nil {
		fmt.Println("Got an error:", err)
	} else {
		fmt.Println(result)
	}
}

This code makes insert requests asynchronously:

  • The Future structure is used as a handle for asynchronous requests.
  • The NewInsertRequest() method creates an insert request object that is executed by the connection.

Note

Making requests asynchronously is the recommended way to perform data operations. Further requests in this tutorial are made synchronously.
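The asynchronous pattern above works because every request carries a unique sync number, so responses can be matched to requests even if they arrive in a different order. Here is a minimal Python sketch of that bookkeeping. `Future`, `PipelinedConnection`, and the `FakeServer` that answers in reverse order are hypothetical stand-ins, not the go-tarantool implementation.

```python
import itertools


class Future:
    """Handle for a request whose response may not have arrived yet."""

    def __init__(self, sync):
        self.sync = sync
        self._done = False
        self._result = None

    def set_result(self, value):
        self._result, self._done = value, True

    def get(self):
        if not self._done:
            raise RuntimeError("response not received yet")
        return self._result


class PipelinedConnection:
    """Sends requests without waiting and matches responses by sync number."""

    def __init__(self, server):
        self._server = server
        self._syncs = itertools.count(1)
        self._pending = {}  # sync -> Future
        self._outbox = []   # (sync, request) pairs not yet sent

    def do(self, request):
        future = Future(next(self._syncs))
        self._pending[future.sync] = future
        self._outbox.append((future.sync, request))
        return future

    def flush(self):
        # Responses may come back in any order; the sync number finds the right future.
        for sync, response in self._server.handle(self._outbox):
            self._pending.pop(sync).set_result(response)
        self._outbox.clear()


class FakeServer:
    """Hypothetical server that stores tuples and answers in reverse order."""

    def __init__(self):
        self.space = {}

    def handle(self, batch):
        replies = []
        for sync, (_op, tup) in batch:
            self.space[tup[0]] = tup
            replies.append((sync, [tup]))
        return list(reversed(replies))


conn = PipelinedConnection(FakeServer())
futures = [conn.do(("insert", t)) for t in [(1, "Roxette", 1986), (2, "Scorpions", 1965)]]
conn.flush()
for future in futures:
    print(future.get())
```

Even though the fake server replies in reverse order, each future receives its own response.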

To get a tuple by the specified primary key value, use NewSelectRequest() to create a select request object:

// Select by primary key
data, err := conn.Do(
	tarantool.NewSelectRequest("bands").
		Limit(10).
		Iterator(tarantool.IterEq).
		Key([]interface{}{uint(1)}),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Tuple selected by the primary key value:", data)

You can also get a tuple by the value of the specified index by using Index():

// Select by secondary key
data, err = conn.Do(
	tarantool.NewSelectRequest("bands").
		Index("band").
		Limit(10).
		Iterator(tarantool.IterEq).
		Key([]interface{}{"The Beatles"}),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Tuple selected by the secondary key value:", data)
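Conceptually, a select with IterEq looks up the chosen index and returns the tuples whose indexed field equals the key, up to Limit. The following Python sketch models the `primary` and `band` indexes of the sample space as plain dictionaries. This is only an in-memory illustration: both indexes in the sample schema are unique, which is why a dictionary per index is enough here, and real Tarantool indexes (TREE by default) also support range iterators.

```python
bands = [
    (1, "Roxette", 1986),
    (2, "Scorpions", 1965),
    (3, "Ace of Base", 1987),
    (4, "The Beatles", 1960),
]

# Unique indexes modeled as dicts: key value -> tuple.
primary = {t[0]: t for t in bands}  # index 'primary' on field 'id'
band = {t[1]: t for t in bands}     # index 'band' on field 'band_name'


def select_eq(index, key, limit=10):
    """Return at most `limit` tuples whose indexed field equals `key`."""
    tup = index.get(key)
    return [tup][:limit] if tup is not None else []


print(select_eq(primary, 1))           # [(1, 'Roxette', 1986)]
print(select_eq(band, "The Beatles"))  # [(4, 'The Beatles', 1960)]
print(select_eq(primary, 42))          # []
```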

NewUpdateRequest() can be used to update a tuple identified by the primary key as follows:

// Update
data, err = conn.Do(
	tarantool.NewUpdateRequest("bands").
		Key(tarantool.IntKey{2}).
		Operations(tarantool.NewOperations().Assign(1, "Pink Floyd")),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Updated tuple:", data)
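In this example, field numbers in update operations start from 0, so Assign(1, "Pink Floyd") targets band_name, the second field; the output of the program confirms this. The following Python sketch shows how '=' (assign) operations are applied to a tuple, assuming 0-based numbering and the primary key in field 0. Tarantool rejects updates that modify the primary key, and the sketch mimics that with a check. `apply_assign_ops` is a hypothetical helper.

```python
def apply_assign_ops(tup, ops):
    """Apply ('=', field_no, value) operations to a tuple; field_no is 0-based."""
    result = list(tup)
    for op, field_no, value in ops:
        if op != "=":
            raise ValueError(f"operation {op!r} is not handled in this sketch")
        if field_no == 0:
            raise ValueError("the primary key field cannot be updated")
        result[field_no] = value
    return tuple(result)


updated = apply_assign_ops((2, "Scorpions", 1965), [("=", 1, "Pink Floyd")])
print(updated)  # (2, 'Pink Floyd', 1965)
```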

NewUpsertRequest() can be used to update an existing tuple or insert a new one. In the example below, a new tuple is inserted:

// Upsert
data, err = conn.Do(
	tarantool.NewUpsertRequest("bands").
		Tuple([]interface{}{uint(5), "The Rolling Stones", 1962}).
		Operations(tarantool.NewOperations().Assign(1, "The Doors")),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}

In this example, NewReplaceRequest() is used to delete the existing tuple and insert a new one:

// Replace
data, err = conn.Do(
	tarantool.NewReplaceRequest("bands").
		Tuple([]interface{}{1, "Queen", 1970}),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Replaced tuple:", data)
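The difference between insert and replace comes down to what happens when the primary key already exists. Insert fails with a duplicate-key error, while replace overwrites the stored tuple (or inserts it if the key is absent). Here is a small Python model. `DuplicateKeyError` is a hypothetical stand-in for the server error.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's duplicate-key error."""


def insert(space, tup):
    if tup[0] in space:
        raise DuplicateKeyError(f"duplicate key {tup[0]!r} in unique index 'primary'")
    space[tup[0]] = tup
    return tup


def replace(space, tup):
    space[tup[0]] = tup  # overwrite an existing tuple or insert a new one
    return tup


space = {1: (1, "Roxette", 1986)}
try:
    insert(space, (1, "Queen", 1970))
    insert_failed = False
except DuplicateKeyError as err:
    insert_failed = True
    print("insert failed:", err)
print(replace(space, (1, "Queen", 1970)))  # (1, 'Queen', 1970)
```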

NewDeleteRequest() in the example below is used to delete a tuple whose primary key value is 5:

// Delete
data, err = conn.Do(
	tarantool.NewDeleteRequest("bands").
		Key([]interface{}{uint(5)}),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Deleted tuple:", data)

To execute a stored procedure, use NewCallRequest():

// Call
data, err = conn.Do(
	tarantool.NewCallRequest("get_bands_older_than").Args([]interface{}{1966}),
).Get()
if err != nil {
	fmt.Println("Got an error:", err)
}
fmt.Println("Stored procedure result:", data)
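The stored function selects from the year_band index with the LT iterator and the partial key {year}. LT returns the tuples whose key is strictly less than the search key, in descending index order. This is why Pink Floyd (1965) precedes The Beatles (1960) in the output. The following Python sketch reproduces that behavior on the data present when the call is made. `select_lt` is a hypothetical helper, not a Tarantool API.

```python
def select_lt(tuples, year, limit=10):
    """Mimic year_band:select({year}, {iterator = 'LT', limit = limit})."""
    # The index key is (year, band_name). With the partial key {year}, only the
    # first part is compared, and LT walks the index in descending order.
    index = sorted(tuples, key=lambda t: (t[2], t[1]), reverse=True)
    return [t for t in index if t[2] < year][:limit]


bands = [(1, "Queen", 1970), (2, "Pink Floyd", 1965),
         (3, "Ace of Base", 1987), (4, "The Beatles", 1960)]
result = select_lt(bands, 1966)
print(result)  # [(2, 'Pink Floyd', 1965), (4, 'The Beatles', 1960)]
```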

The CloseGraceful() method can be used to close the connection when it is no longer needed:

// Close connection
conn.CloseGraceful()
fmt.Println("Connection is closed")

Note

You can find the example with all the requests above on GitHub: go.

  1. Execute the following go get commands to update dependencies in the go.mod file:

    $ go get github.com/tarantool/go-tarantool/v2
    $ go get github.com/tarantool/go-tarantool/v2/datetime
    $ go get github.com/tarantool/go-tarantool/v2/decimal
    $ go get github.com/tarantool/go-tarantool/v2/uuid
    
  2. To run the resulting application, execute the go run command in the application directory:

    $ go run .
    Inserted tuples:
    [[1 Roxette 1986]]
    [[2 Scorpions 1965]]
    [[3 Ace of Base 1987]]
    [[4 The Beatles 1960]]
    Tuple selected by the primary key value: [[1 Roxette 1986]]
    Tuple selected by the secondary key value: [[4 The Beatles 1960]]
    Updated tuple: [[2 Pink Floyd 1965]]
    Replaced tuple: [[1 Queen 1970]]
    Deleted tuple: [[5 The Rolling Stones 1962]]
    Stored procedure result: [[[2 Pink Floyd 1965] [4 The Beatles 1960]]]
    Connection is closed
    

Last update: January 2023

There are also the following community-driven Go connectors:

The table below contains a feature comparison for the connectors mentioned above.

| Feature | tarantool/go-tarantool | viciious/go-tarantool | FZambia/tarantool |
|---|---|---|---|
| License | BSD 2-Clause | MIT | BSD 2-Clause |
| Last update | 2023 | 2022 | 2022 |
| Documentation | README with examples and up-to-date GoDoc | README with examples, code comments | README with examples |
| Testing / CI / CD | GitHub Actions | Travis CI | GitHub Actions |
| GitHub Stars | 147 | 45 | 14 |
| Static analysis | golangci-lint, luacheck | golint | golangci-lint |
| Packaging | go get | go get | go get |
| Code coverage | Yes | No | No |
| msgpack driver | vmihailenco/msgpack/v2 or vmihailenco/msgpack/v5 | tinylib/msgp | vmihailenco/msgpack/v5 |
| Async work | Yes | Yes | Yes |
| Schema reload | Yes (manual pull) | Yes (manual pull) | Yes (manual pull) |
| Space / index names | Yes | Yes | Yes |
| Tuples as structures | Yes (structure and marshall functions must be predefined in Go code) | No | Yes (structure and marshall functions must be predefined in Go code) |
| Access tuple fields by names | Only if marshalled to structure | No | Only if marshalled to structure |
| SQL support | Yes | No (#18, closed) | No |
| Interactive transactions | Yes | No | No |
| Varbinary support | Yes (with in-built language tools) | Yes (with in-built language tools) | Yes (decodes to string by default, see #6) |
| UUID support | Yes | No | No |
| Decimal support | Yes | No | No |
| EXT_ERROR support | Yes | No | No |
| Datetime support | Yes | No | No |
| box.session.push() responses | Yes | No (#21) | Yes |
| Session settings | Yes | No | No |
| Graceful shutdown | Yes | No | No |
| IPROTO_ID (feature discovery) | Yes | No | No |
| tarantool/crud support | No | No | No |
| Connection pool | Yes (round-robin failover, no balancing) | No | No |
| Transparent reconnecting | Yes (see comments in #129) | No (handle reconnects explicitly, refer to #11) | Yes (see comments in #7) |
| Transparent request retrying | No | No | No |
| Watchers | Yes | No | No |
| Pagination | Yes | No | No |
| Language features | context | context | context |
| Miscellaneous | Supports tarantool/queue API | Can mimic a Tarantool instance (also as replica). Provides instrumentation for reading snapshot and xlog files via snapio module. Implements unpacking of query structs if you want to implement your own iproto proxy | API is experimental and breaking changes may happen |

Java

The following Java connectors are available:

Note

The connectors below are either deprecated or are planned for deprecation.

C

tarantool-c is the official C connector for Tarantool. You can find the full library documentation here: Documentation for tarantool-c.

Here are two examples of using Tarantool’s high-level C API.

Here is a complete C program that inserts [99999,'B'] into space examples via the high-level C API.

#include <stdio.h>
#include <stdlib.h>

#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

int main(void) {
   struct tnt_stream *tnt = tnt_net(NULL);          /* See note = SETUP */
   tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
   if (tnt_connect(tnt) < 0) {                      /* See note = CONNECT */
       printf("Connection refused\n");
       exit(-1);
   }
   struct tnt_stream *tuple = tnt_object(NULL);     /* See note = MAKE REQUEST */
   tnt_object_format(tuple, "[%d%s]", 99999, "B");
   tnt_insert(tnt, 999, tuple);                     /* See note = SEND REQUEST */
   tnt_flush(tnt);
   struct tnt_reply reply;  tnt_reply_init(&reply); /* See note = GET REPLY */
   tnt->read_reply(tnt, &reply);
   if (reply.code != 0) {
       printf("Insert failed %lu.\n", (unsigned long) reply.code);
   }
   tnt_close(tnt);                                  /* See below = TEARDOWN */
   tnt_stream_free(tuple);
   tnt_stream_free(tnt);
}

Paste the code into a file named example.c and install tarantool-c. One way to install tarantool-c on Ubuntu is:

$ git clone https://github.com/tarantool/tarantool-c.git ~/tarantool-c
$ cd ~/tarantool-c
$ git submodule init
$ git submodule update
$ cmake .
$ make
$ sudo make install

To compile and link the program, run:

$ # sometimes this is necessary:
$ export LD_LIBRARY_PATH=/usr/local/lib
$ gcc -o example example.c -ltarantool

Before running the program, check that a server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run the program, execute ./example. The program connects to the Tarantool instance and sends the request. If Tarantool is not listening on localhost:3301, the program prints “Connection refused”. If the insert fails, the program prints “Insert failed” and an error number (see all error codes in the src/box/errcode.h file of the Tarantool source code).

Here are notes corresponding to comments in the example program.

The setup begins by creating a stream.

struct tnt_stream *tnt = tnt_net(NULL);
tnt_set(tnt, TNT_OPT_URI, "localhost:3301");

In this program, the stream is named tnt. Before connecting the tnt stream, you may need to set some options. The most important option is TNT_OPT_URI. In this program, the URI is localhost:3301, since that is where the Tarantool instance is expected to be listening.

Function description:

struct tnt_stream *tnt_net(struct tnt_stream *s)
int tnt_set(struct tnt_stream *s, int option, variant option-value)

Now that the stream named tnt exists and is associated with a URI, this example program can connect to a server instance.

if (tnt_connect(tnt) < 0)
   { printf("Connection refused\n"); exit(-1); }

Function description:

int tnt_connect(struct tnt_stream *s)

The connection might fail for a variety of reasons, for example, the server is not running or the URI contains an invalid password. If the connection fails, the return value is -1.

Most requests require passing a structured value, such as the contents of a tuple.

struct tnt_stream *tuple = tnt_object(NULL);
tnt_object_format(tuple, "[%d%s]", 99999, "B");

In this program, the request is an INSERT, and the tuple contents are an integer and a string. This is a simple flat set of values with no nested sub-structures or arrays, so it is easy to format using the same kind of arguments you would pass to the C printf() function: %d for the integer, %s for the string, then the integer value, then a pointer to the string value.

Function description:

ssize_t tnt_object_format(struct tnt_stream *s, const char *fmt, ...)

The database-manipulation requests are analogous to the requests in the box library.

tnt_insert(tnt, 999, tuple);
tnt_flush(tnt);

This program sends an INSERT request, so it passes the tnt_stream used for the connection (tnt), the space ID (999), and the tnt_stream set up with tnt_object_format() (tuple). The request is buffered in the stream until tnt_flush() sends it to the server.

Function description:

ssize_t tnt_insert(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_replace(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_select(struct tnt_stream *s, uint32_t space, uint32_t index,
                   uint32_t limit, uint32_t offset, uint8_t iterator,
                   struct tnt_stream *key)
ssize_t tnt_update(struct tnt_stream *s, uint32_t space, uint32_t index,
                   struct tnt_stream *key, struct tnt_stream *ops)
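To make the request flow more concrete, the following Python sketch builds by hand roughly the bytes that an INSERT request like the one above puts on the wire: a MessagePack-encoded length prefix, a header map, and a body map. The numeric keys are IPROTO constants (IPROTO_REQUEST_TYPE = 0x00, IPROTO_SYNC = 0x01, IPROTO_SPACE_ID = 0x10, IPROTO_TUPLE = 0x21, and request type IPROTO_INSERT = 2). The tiny encoder handles only the value kinds used here. The sync value 1 and the 5-byte length prefix are assumptions for illustration; the exact bytes tarantool-c emits may differ.

```python
import struct


def mp_encode(obj) -> bytes:
    """Encode the small subset of MessagePack needed for this request."""
    if isinstance(obj, int) and obj >= 0:
        if obj < 0x80:
            return bytes([obj])                      # positive fixint
        if obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)  # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16
        if obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint 32
        return b"\xcf" + struct.pack(">Q", obj)      # uint 64
    if isinstance(obj, str):
        data = obj.encode()
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data  # fixstr
        raise ValueError("longer strings are not handled in this sketch")
    if isinstance(obj, list) and len(obj) < 16:
        return bytes([0x90 | len(obj)]) + b"".join(mp_encode(x) for x in obj)
    if isinstance(obj, dict) and len(obj) < 16:
        return bytes([0x80 | len(obj)]) + b"".join(
            mp_encode(k) + mp_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported value {obj!r}")


IPROTO_REQUEST_TYPE, IPROTO_SYNC = 0x00, 0x01
IPROTO_SPACE_ID, IPROTO_TUPLE = 0x10, 0x21
IPROTO_INSERT = 2


def insert_packet(space_id, tup, sync=1):
    header = mp_encode({IPROTO_REQUEST_TYPE: IPROTO_INSERT, IPROTO_SYNC: sync})
    body = mp_encode({IPROTO_SPACE_ID: space_id, IPROTO_TUPLE: list(tup)})
    size = len(header) + len(body)
    return b"\xce" + struct.pack(">I", size) + header + body  # uint 32 length prefix


packet = insert_packet(999, [99999, "B"])
print(packet.hex(" "))
# ce 00 00 00 13 82 00 02 01 01 82 10 cd 03 e7 21 92 ce 00 01 86 9f a1 42
```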

For most requests, the client will receive a reply containing some indication whether the result was successful, and a set of tuples.

struct tnt_reply reply;  tnt_reply_init(&reply);
tnt->read_reply(tnt, &reply);
if (reply.code != 0)
   { printf("Insert failed %lu.\n", (unsigned long) reply.code); }

This program checks for success but does not decode the rest of the reply.

Function description:

struct tnt_reply *tnt_reply_init(struct tnt_reply *r)
tnt->read_reply(struct tnt_stream *s, struct tnt_reply *r)
void tnt_reply_free(struct tnt_reply *r)
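At the protocol level, the response header carries a code: 0 means success, and an error response has the 0x8000 bit set, with the Tarantool error number in the low bits. The following Python sketch interprets such a raw code. It works on the protocol value only and does not model the tarantool-c reply structure; the error number 3 below is just an illustrative value, and `interpret_response_code` is a hypothetical helper.

```python
IPROTO_OK = 0x00
IPROTO_TYPE_ERROR = 0x8000


def interpret_response_code(code: int):
    """Return (ok, error_number) for a raw IPROTO response code."""
    if code == IPROTO_OK:
        return True, 0
    if code & IPROTO_TYPE_ERROR:
        return False, code & 0x7FFF
    raise ValueError(f"response code {code:#x} is not handled in this sketch")


print(interpret_response_code(0))       # (True, 0)
print(interpret_response_code(0x8003))  # (False, 3)
```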

When a session ends, the connection that was made with tnt_connect() should be closed, and the objects that were made in the setup should be destroyed.

tnt_close(tnt);
tnt_stream_free(tuple);
tnt_stream_free(tnt);

Function description:

void tnt_close(struct tnt_stream *s)
void tnt_stream_free(struct tnt_stream *s)

Here is a complete C program that selects, using index key [99999], from space examples via the high-level C API. To display the results, the program uses functions in the MsgPuck library which allow decoding of MessagePack arrays.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

#define MP_SOURCE 1
#include <msgpuck.h>

int main(void) {
    struct tnt_stream *tnt = tnt_net(NULL);
    tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
    if (tnt_connect(tnt) < 0) {
        printf("Connection refused\n");
        exit(1);
    }
    struct tnt_stream *tuple = tnt_object(NULL);
    tnt_object_format(tuple, "[%d]", 99999); /* tuple = search key */
    tnt_select(tnt, 999, 0, UINT32_MAX, 0, 0, tuple);
    tnt_flush(tnt);
    struct tnt_reply reply; tnt_reply_init(&reply);
    tnt->read_reply(tnt, &reply);
    if (reply.code != 0) {
        printf("Select failed.\n");
        exit(1);
    }
    enum mp_type field_type;
    field_type = mp_typeof(*reply.data);
    if (field_type != MP_ARRAY) {
        printf("no tuple array\n");
        exit(1);
    }
    uint32_t tuple_count = mp_decode_array(&reply.data);
    printf("tuple count=%u\n", tuple_count);
    unsigned int i, j;
    for (i = 0; i < tuple_count; ++i) {
        field_type = mp_typeof(*reply.data);
        if (field_type != MP_ARRAY) {
            printf("no field array\n");
            exit(1);
        }
        uint32_t field_count = mp_decode_array(&reply.data);
        printf("  field count=%u\n", field_count);
        for (j = 0; j < field_count; ++j) {
            field_type = mp_typeof(*reply.data);
            if (field_type == MP_UINT) {
                uint64_t num_value = mp_decode_uint(&reply.data);
                printf("    value=%llu.\n", (unsigned long long) num_value);
            } else if (field_type == MP_STR) {
                const char *str_value;
                uint32_t str_value_length;
                str_value = mp_decode_str(&reply.data, &str_value_length);
                printf("    value=%.*s.\n", str_value_length, str_value);
            } else {
                printf("wrong field type\n");
                exit(1);
            }
        }
    }
    tnt_close(tnt);
    tnt_stream_free(tuple);
    tnt_stream_free(tnt);
}

As with the first example, paste the code into a file named example2.c.

To compile and link the program, run:

$ gcc -o example2 example2.c -ltarantool

To run the program, execute ./example2.

The two example programs show only a few requests and do not cover everything that good practice requires. See more in the tarantool-c documentation on GitHub.
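The decoding loop in the second program follows the MessagePack layout of the reply data: an array of tuples, where each tuple is an array of fields. To show that format without MsgPuck, here is a minimal Python decoder that covers only the types the C program handles (arrays, unsigned integers, and strings). It is a sketch for illustration, not a replacement for MsgPuck; `mp_decode` is a hypothetical name.

```python
import struct


def mp_decode(buf: bytes, pos: int = 0):
    """Decode one value at buf[pos]; return (value, new_pos). Subset of MessagePack only."""
    b = buf[pos]
    if b <= 0x7F:                                 # positive fixint
        return b, pos + 1
    if 0x90 <= b <= 0x9F or b == 0xDC:            # fixarray / array 16
        if b == 0xDC:
            n = struct.unpack_from(">H", buf, pos + 1)[0]
            pos += 3
        else:
            n = b & 0x0F
            pos += 1
        items = []
        for _ in range(n):
            item, pos = mp_decode(buf, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= b <= 0xBF:                         # fixstr
        n = b & 0x1F
        return buf[pos + 1:pos + 1 + n].decode(), pos + 1 + n
    if b == 0xD9:                                 # str 8
        n = buf[pos + 1]
        return buf[pos + 2:pos + 2 + n].decode(), pos + 2 + n
    fmt = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q"}.get(b)  # uint 8..64
    if fmt:
        return struct.unpack_from(fmt, buf, pos + 1)[0], pos + 1 + struct.calcsize(fmt)
    raise ValueError(f"type byte {b:#x} is not handled in this sketch")


# Reply data for one tuple [99999, "B"]: an array of one array with two fields.
data = bytes.fromhex("9192ce0001869fa142")
tuples, _ = mp_decode(data)
print("tuple count=%d" % len(tuples))
for tup in tuples:
    print("  field count=%d" % len(tup))
    for value in tup:
        print("    value=%s." % value)
```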

Python

Examples on GitHub: sample_db, python

tarantool-python is the official Python connector for Tarantool. It is not supplied as part of the Tarantool repository and must be installed separately.

The tutorial shows how to use the tarantool-python library to create a Python script that connects to a remote Tarantool instance, performs CRUD operations, and executes a stored procedure. You can find the full package documentation here: Python client library for Tarantool.

Note

This tutorial shows how to make CRUD requests to a single-instance Tarantool database. To make requests to a sharded Tarantool cluster with the CRUD module, use the tarantool.crud module’s API.

This section describes the configuration of a sample database that allows remote connections:

credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ read, write ]
        spaces: [ bands ]
      - permissions: [ execute ]
        functions: [ get_bands_older_than ]

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'

app:
  file: 'myapp.lua'

The myapp.lua file looks as follows:

-- Create a space --
box.schema.space.create('bands')

-- Specify field names and types --
box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})

-- Create indexes --
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:create_index('band', { parts = { 'band_name' } })
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })

-- Create a stored function --
box.schema.func.create('get_bands_older_than', {
    body = [[
    function(year)
        return box.space.bands.index.year_band:select({ year }, { iterator = 'LT', limit = 10 })
    end
    ]]
})

You can find the full example on GitHub: sample_db.

Before creating and starting a client Python application, you need to run the sample_db application using tt start:

$ tt start sample_db

Now you can create a client Python application that makes requests to this database.

Before you start, make sure you have Python installed on your computer.

  1. Create the hello directory for your application and go to this directory:

    $ mkdir hello
    $ cd hello
    
  2. Create and activate a Python virtual environment:

    $ python -m venv .venv
    $ source .venv/bin/activate
    
  3. Install the tarantool module:

    $ pip install tarantool
    
  4. Inside the hello directory, create the hello.py file for application code.

In the hello.py file, import the tarantool package:

import tarantool

Add the following code:

# Connect to the database
conn = tarantool.Connection(host='127.0.0.1',
                            port=3301,
                            user='sampleuser',
                            password='123456')

This code establishes a connection to a running Tarantool instance on behalf of sampleuser. The conn object can be used to make CRUD requests and execute stored procedures.

Add the following code to insert four tuples into the bands space:

# Insert data
tuples = [(1, 'Roxette', 1986),
          (2, 'Scorpions', 1965),
          (3, 'Ace of Base', 1987),
          (4, 'The Beatles', 1960)]
print("Inserted tuples:")
for band in tuples:
    response = conn.insert(space_name='bands', values=band)
    print(response[0])

Connection.insert() is used to insert a tuple into the space.

To get a tuple by the specified primary key value, use Connection.select():

# Select by primary key
response = conn.select(space_name='bands', key=1)
print('Tuple selected by the primary key value:', response[0])

You can also get a tuple by the value of the specified index using the index argument:

# Select by secondary key
response = conn.select(space_name='bands', key='The Beatles', index='band')
print('Tuple selected by the secondary key value:', response[0])

Connection.update() can be used to update a tuple identified by the primary key as follows:

# Update
response = conn.update(space_name='bands',
                       key=2,
                       op_list=[('=', 'band_name', 'Pink Floyd')])
print('Updated tuple:', response[0])
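Here the update operation refers to the field by name ('band_name') instead of by number. Conceptually, a field name maps to a field position through the space format. The following Python sketch performs that mapping on the bands format with 0-based positions. `resolve_ops` is a hypothetical helper for illustration, not part of tarantool-python.

```python
BANDS_FORMAT = [
    {"name": "id", "type": "unsigned"},
    {"name": "band_name", "type": "string"},
    {"name": "year", "type": "unsigned"},
]


def resolve_ops(ops, space_format):
    """Replace field names in (op, field, value) triples with 0-based positions."""
    positions = {f["name"]: i for i, f in enumerate(space_format)}
    resolved = []
    for op, field, value in ops:
        if isinstance(field, str):
            if field not in positions:
                raise KeyError(f"unknown field {field!r}")
            field = positions[field]
        resolved.append((op, field, value))
    return resolved


print(resolve_ops([("=", "band_name", "Pink Floyd")], BANDS_FORMAT))
# [('=', 1, 'Pink Floyd')]
```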

Connection.upsert() updates an existing tuple or inserts a new one. In the example below, a new tuple is inserted:

# Upsert
conn.upsert(space_name='bands',
            tuple_value=(5, 'The Rolling Stones', 1962),
            op_list=[('=', 'band_name', 'The Doors')])

In this example, Connection.replace() deletes the existing tuple and inserts a new one:

# Replace
response = conn.replace(space_name='bands', values=(1, 'Queen', 1970))
print('Replaced tuple:', response[0])

Connection.delete() in the example below deletes a tuple whose primary key value is 5:

# Delete
response = conn.delete(space_name='bands', key=5)
print('Deleted tuple:', response[0])

To execute a stored procedure, use Connection.call():

# Call
response = conn.call('get_bands_older_than', 1966)
print('Stored procedure result:', response[0])

The Connection.close() method can be used to close the connection when it is no longer needed:

# Close connection
conn.close()
print('Connection is closed')

Note

You can find the example with all the requests above on GitHub: python.

To run the resulting application, pass the script name to the python command:

$ python hello.py
Inserted tuples:
[1, 'Roxette', 1986]
[2, 'Scorpions', 1965]
[3, 'Ace of Base', 1987]
[4, 'The Beatles', 1960]
Tuple selected by the primary key value: [1, 'Roxette', 1986]
Tuple selected by the secondary key value: [4, 'The Beatles', 1960]
Updated tuple: [2, 'Pink Floyd', 1965]
Replaced tuple: [1, 'Queen', 1970]
Deleted tuple: [5, 'The Rolling Stones', 1962]
Stored procedure result: [[2, 'Pink Floyd', 1965], [4, 'The Beatles', 1960]]
Connection is closed

Last update: September 2023

There are also several community-driven Python connectors:

The table below contains a feature comparison for asynctnt and tarantool-python.

| Parameter | igorcoding/asynctnt | tarantool/tarantool-python |
|---|---|---|
| License | Apache License 2.0 | BSD-2 |
| Is maintained | Yes | Yes |
| Known Issues | None | None |
| Documentation | Yes (github.io) | Yes (readthedocs and tarantool.io) |
| Testing / CI / CD | GitHub Actions | GitHub Actions |
| GitHub Stars | 73 | 92 |
| Static Analysis | Yes (Flake8) | Yes (Flake8, Pylint) |
| Packaging | pip | pip, deb, rpm |
| Code coverage | Yes | Yes |
| Support asynchronous mode | Yes, asyncio | No |
| Batching support | No | Yes (with CRUD API) |
| Schema reload | Yes (automatically, see auto_refetch_schema) | Yes (automatically) |
| Space / index names | Yes | Yes |
| Access tuple fields by names | Yes | No |
| SQL support | Yes | Yes |
| Interactive transactions | Yes | No (issue #163) |
| Varbinary support | Yes (in MP_BIN fields) | Yes |
| Decimal support | Yes | Yes |
| UUID support | Yes | Yes |
| EXT_ERROR support | Yes | Yes |
| Datetime support | Yes | Yes |
| Interval support | No (issue #30) | Yes |
| box.session.push() responses | Yes | Yes |
| Session settings | No | No |
| Graceful shutdown | No | No |
| IPROTO_ID (feature discovery) | Yes | Yes |
| CRUD support | No | Yes |
| Transparent request retrying | No | No |
| Transparent reconnecting | Autoreconnect | Yes (reconnect_max_attempts, reconnect_delay), checking of connection liveness |
| Connection pool | No | Yes (with master discovery) |
| Support of PEP 249 – Python Database API Specification v2.0 | No | Yes |
| Encrypted connection (Enterprise Edition) | No (issue #22) | Yes |

C++

tntcxx is the official C++ connector for Tarantool.

Connecting to Tarantool from C++

To help you get started with the Tarantool C++ connector, we will use the example application from the connector repository. We will go through the application code step by step and explain what each part does.

The following main topics are discussed in this manual:

Before you go through this Getting Started exercise, complete the following prerequisites:

The Tarantool C++ connector is currently supported for Linux only.

The connector itself is a header-only library, so it does not need to be built or installed as such. All you need is to clone the connector source code and embed it in your C++ project.

Also, make sure you have other necessary software and Tarantool installed.

  1. Make sure you have the following third-party software. If any of it is missing, install it:

  2. If you don’t have Tarantool on your OS, install it in one of the following ways:

  3. Clone the Tarantool C++ connector repository.

    git clone git@github.com:tarantool/tntcxx.git
    

Start Tarantool locally or in Docker and create a space with the following schema and index:

box.cfg{listen = 3301}
t = box.schema.space.create('t')
t:format({
         {name = 'id', type = 'unsigned'},
         {name = 'a', type = 'string'},
         {name = 'b', type = 'number'}
         })
t:create_index('primary', {
         type = 'hash',
         parts = {'id'}
         })

Important

Do not close the terminal window where Tarantool is running. You will need it later to connect to Tarantool from your C++ application.

To be able to execute the necessary operations in Tarantool, you need to grant the guest user read and write privileges. The simplest way is to grant the user the super role:

box.schema.user.grant('guest', 'super')

There are three main parts of the C++ connector: the IO-zero-copy buffer, the msgpack encoder/decoder, and the client that handles requests.

To set up a connection to a Tarantool instance from a C++ application, do the following:

Embed the connector in your C++ application by including the main header:

#include "../src/Client/Connector.hpp"

First, we should create a connector client. It can handle many connections to Tarantool instances asynchronously. To instantiate a client, you should specify the buffer and the network provider implementations as template parameters. The connector’s main class has the following signature:

template<class BUFFER, class NetProvider = EpollNetProvider<BUFFER>>
class Connector;

The buffer is parameterized by an allocator, which means that users can choose which allocator provides memory for the buffer’s blocks. Data is organized into a linked list of fixed-size blocks; the block size is specified as the template parameter of the buffer.

You can either implement your own buffer or network provider, or use the default ones as we do in our example. The default connector instantiation looks as follows:
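To picture that layout, here is a small Python model (not the tntcxx implementation) of a buffer made of fixed-size blocks kept in a linked list. Writes fill the last block and allocate a new one when it is full, so data already written is never moved. The class names and the tiny block size are assumptions chosen for the illustration.

```python
class Block:
    def __init__(self, size):
        self.data = bytearray(size)
        self.used = 0
        self.next = None


class BlockBuffer:
    """Append-only buffer built from a linked list of fixed-size blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.head = self.tail = Block(block_size)
        self.blocks = 1

    def write(self, payload: bytes):
        view = memoryview(payload)
        while view:
            if self.tail.used == self.block_size:  # current block is full
                new_block = Block(self.block_size)
                self.tail.next = new_block
                self.tail = new_block
                self.blocks += 1
            room = self.block_size - self.tail.used
            chunk = view[:room]
            self.tail.data[self.tail.used:self.tail.used + len(chunk)] = chunk
            self.tail.used += len(chunk)
            view = view[len(chunk):]

    def getvalue(self) -> bytes:
        out, block = bytearray(), self.head
        while block is not None:
            out += block.data[:block.used]
            block = block.next
        return bytes(out)


buf = BlockBuffer(block_size=4)
buf.write(b"hello, ")
buf.write(b"tarantool")
print(buf.blocks, buf.getvalue())  # 4 b'hello, tarantool'
```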

using Buf_t = tnt::Buffer<16 * 1024>;
#include "../src/Client/LibevNetProvider.hpp"
using Net_t = LibevNetProvider<Buf_t, DefaultStream>;
Connector<Buf_t, Net_t> client;

To use the BUFFER class, the buffer header should also be included:

#include "../src/Buffer/Buffer.hpp"

A client alone is not enough to work with Tarantool instances: we also need to create connection objects. A connection also takes the buffer and the network provider as template parameters. Note that they must be the same as those of the client:

Connection<Buf_t, Net_t> conn(client);

Our Tarantool instance is listening on port 3301 on localhost. Let’s define the corresponding variables, as well as the WAIT_TIMEOUT variable for the connection timeout.

const char *address = "127.0.0.1";
int port = 3301;
int WAIT_TIMEOUT = 1000; //milliseconds

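To connect to the Tarantool instance, invoke the Connector::connect() method of the client object and pass two arguments: the connection object and a structure with connection options, such as the address and the port (the service field). The commented-out fields show where the user, password, and transport can be set.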

int rc = client.connect(conn, {.address = address,
			       .service = std::to_string(port),
			       /*.user = ...,*/
			       /*.passwd = ...,*/
			       /* .transport = STREAM_SSL, */});

The connector implementation is exception-free, so we rely on return codes: on failure, the connect() method returns rc < 0. To get the error message for the last error that occurred during communication with the instance, invoke the Connection::getError() method.

if (rc != 0) {
	//assert(conn.getError().saved_errno != 0);
	std::cerr << conn.getError().msg << std::endl;
	return -1;
}

To reset a connection after errors, that is, to clear the error message and the connection status, use the Connection::reset() method.

In this section, we will show how to:

We will also go through the case of having several connections and executing a number of requests from different connections simultaneously.

In our example C++ application, we execute the following types of requests:

Note

Examples of other request types, namely, insert, delete, upsert, and update, will be added to this manual later.

Each request method returns a request ID that acts as a future. This ID can be used to get the response message when it is ready. Requests are queued in the output buffer of the connection until the Connector::wait() method is called.

At this step, requests are encoded in the MessagePack format and saved in the output connection buffer. They are ready to be sent but the network communication itself will be done later.

Recall that the data-manipulation requests work with the Tarantool space t created earlier, which has the following format:

t:format({
         {name = 'id', type = 'unsigned'},
         {name = 'a', type = 'string'},
         {name = 'b', type = 'number'}
         })

ping

rid_t ping = conn.ping();

replace

Equivalent to the Lua request <space_name>:replace{pk_value, "111", 1.01}.

uint32_t space_id = 512; /* ID of space t: the first user-created space gets ID 512 */
int pk_value = 666;
std::tuple data = std::make_tuple(pk_value /* field 1*/, "111" /* field 2*/, 1.01 /* field 3*/);
rid_t replace = conn.space[space_id].replace(data);

select

Equivalent to the Lua request <space_name>.index[0]:select({pk_value}, {limit = 1}).

uint32_t index_id = 0;
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = IteratorType::EQ;
auto i = conn.space[space_id].index[index_id];
rid_t select = i.select(std::make_tuple(pk_value), limit, offset, iter);

To send requests to the server side, invoke the client.wait() method.

client.wait(conn, ping, WAIT_TIMEOUT);

The wait() method takes the connection to poll, the request ID, and, optionally, a timeout. Once a response to the specified request is ready, wait() returns. It returns a negative code on system-related failures, for example, a broken or timed-out connection. If wait() returns 0, a response has been received and is ready to be parsed.

Now let’s send our requests to the Tarantool instance. The futureIsReady() function checks whether a future is ready and returns true or false.

while (! conn.futureIsReady(ping)) {
	/*
	 * wait() is the main function responsible for sending/receiving
	 * requests and implements event-loop under the hood. It may
	 * fail due to several reasons:
	 *  - connection is timed out;
	 *  - connection is broken (e.g. closed);
	 *  - epoll is failed.
	 */
	if (client.wait(conn, ping, WAIT_TIMEOUT) != 0) {
		std::cerr << conn.getError().msg << std::endl;
		conn.reset();
	}
}

To get the response when it is ready, use the Connection::getResponse() method. It takes the request ID and returns an optional object containing the response. If the response is not ready yet, the method returns std::nullopt. Note that on each future, getResponse() can be called only once: it erases the request ID from the internal map once it is returned to a user.

A response consists of a header and a body (response.header and response.body). Depending on whether the request succeeded on the server side, the body contains either runtime errors, accessible through response.body.error_stack, or data (tuples), accessible through response.body.data. The data is a vector of tuples. However, the tuples are not decoded: they come as pointers to the start and the end of their MessagePack encoding. See the “Decoding and reading the data” section to learn how to decode tuples.

For a single connection, there are two ways to receive responses: wait for one specific future or for all of them at once. We’ll try both options in our example. For the ping request, let’s use the first option.
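The one-shot behavior of getResponse() can be modeled as popping an entry from a map keyed by request ID. This Python sketch uses hypothetical names (it is not the tntcxx API) to show a response carrying either data or an error stack, and why a second lookup for the same ID yields nothing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Body:
    data: Optional[list] = None
    error_stack: list = field(default_factory=list)


@dataclass
class Response:
    header: dict
    body: Body


class ResponseTable:
    """Hypothetical model of a connection's map of ready responses."""

    def __init__(self):
        self._ready = {}  # request id -> Response

    def deliver(self, rid, response):
        self._ready[rid] = response

    def future_is_ready(self, rid):
        return rid in self._ready

    def get_response(self, rid) -> Optional[Response]:
        # pop() removes the entry, so each response can be taken only once.
        return self._ready.pop(rid, None)


table = ResponseTable()
table.deliver(7, Response(header={"code": 0}, body=Body(data=[[666, "111", 1.01]])))
first = table.get_response(7)
second = table.get_response(7)
print(first.body.data)  # [[666, '111', 1.01]]
print(second)           # None
```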

std::optional<Response<Buf_t>> response = conn.getResponse(ping);
/*
 * Since conn.futureIsReady(ping) returned <true>, then response
 * must be ready.
 */
assert(response != std::nullopt);
/*
 * If request is successfully executed on server side, response
 * will contain data (i.e. tuple being replaced in case of :replace()
 * request or tuples satisfying search conditions in case of :select();
 * responses for pings contain nothing - empty map).
 * To tell responses containing data from error responses, one can
 * rely on the response code stored in the header or check
 * Response->body.data and Response->body.error_stack members.
 */
printResponse<Buf_t>(*response);

For the replace and select requests, let’s examine the option of waiting for both futures at once.

/* Let's wait for both futures at once. */
std::vector<rid_t> futures(2);
futures[0] = replace;
futures[1] = select;
/* No specified timeout means that we poll futures until they are ready.*/
client.waitAll(conn, futures);
for (size_t i = 0; i < futures.size(); ++i) {
	assert(conn.futureIsReady(futures[i]));
	response = conn.getResponse(futures[i]);
	assert(response != std::nullopt);
	printResponse<Buf_t>(*response);
}

Now, let’s look at the case when we establish two connections to a Tarantool instance simultaneously.

/* Let's create another connection. */
Connection<Buf_t, Net_t> another(client);
if (client.connect(another, {.address = address,
			     .service = std::to_string(port),
			     /* .transport = STREAM_SSL, */}) != 0) {
	std::cerr << another.getError().msg << std::endl;
	return -1;
}
/* Simultaneously execute two requests from different connections. */
rid_t f1 = conn.ping();
rid_t f2 = another.ping();
/*
 * waitAny() returns the first connection that has received a response.
 * All connections registered via :connect() call are participating.
 */
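std::optional<Connection<Buf_t, Net_t>> conn_opt = client.waitAny(WAIT_TIMEOUT);
/* std::nullopt means no response arrived in time or another error occurred. */
if (!conn_opt.has_value()) {
	std::cerr << "waitAny(): no response received" << std::endl;
	return -1;
}
Connection<Buf_t, Net_t> first = *conn_opt;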
if (first == conn) {
	assert(conn.futureIsReady(f1));
	(void) f1;
} else {
	assert(another.futureIsReady(f2));
	(void) f2;
}

Finally, a user is responsible for closing connections.

client.close(conn);
client.close(another);

Now, we are going to build our example C++ application, launch it to connect to the Tarantool instance and execute all the requests defined.

Make sure you are in the root directory of the cloned C++ connector repository. To build the example application:

cd examples
cmake .
make

Make sure the Tarantool session you started earlier is running. Launch the application:

./Simple

As you can see from the execution log, all the connections to Tarantool defined in our application have been established and all the requests have been executed successfully.

Responses from a Tarantool instance contain raw data, that is, data encoded as MessagePack tuples. To decode the data, you have to write your own decoders (readers) based on the database schema and include them in your application:

#include "Reader.hpp"

To show the logic of decoding a response, we will use the reader from our example.

First, we define the structure that corresponds to our example space format:

/**
 * Corresponds to tuples stored in user's space:
 * box.execute("CREATE TABLE t (id UNSIGNED PRIMARY KEY, a TEXT, d DOUBLE);")
 */
struct UserTuple {
	uint64_t field1;
	std::string field2;
	double field3;

	static constexpr auto mpp = std::make_tuple(
		&UserTuple::field1, &UserTuple::field2, &UserTuple::field3);
};

The prototype of the base reader is given in src/mpp/Dec.hpp:

template <class BUFFER, Type TYPE>
struct SimpleReaderBase : DefaultErrorHandler {
   using BufferIterator_t = typename BUFFER::iterator;
   /* Allowed type of values to be parsed. */
   static constexpr Type VALID_TYPES = TYPE;
   BufferIterator_t* StoreEndIterator() { return nullptr; }
};

Every new reader should inherit from it or directly from the DefaultErrorHandler.

To parse a particular value, we should define the Value() method. The first two arguments of the method are common and, as a rule, unused, while the third one is the parsed value. For POD (Plain Old Data) structures, a byte-to-byte copy is enough. Since there are fields of three different types in our schema, let’s define the corresponding Value() functions:

It’s also important to understand that a tuple itself is wrapped in an array, so, in fact, we should parse the array first. Let’s define another reader for that purpose.

The SetReader() method sets the reader that is invoked while each of the array’s entries is parsed. To make two readers defined above work, we should create a decoder, set its iterator to the position of the encoded tuple, and invoke the Read() method (the code block below is from the example application).
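The sketch below is a standalone illustration that does not use the tntcxx mpp API. It decodes the same three-field tuple by hand, following the MessagePack specification: first an array header, then an unsigned integer, a string, and a double. MpCursor and decodeUserTuple() are hypothetical names, and only the encodings this schema needs are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Same fields as the UserTuple structure above (without the mpp mapping).
struct UserTuple {
    uint64_t field1;
    std::string field2;
    double field3;
};

// Hypothetical minimal MessagePack cursor over the byte range [begin, end).
class MpCursor {
public:
    MpCursor(const uint8_t *begin, const uint8_t *end) : p_(begin), end_(end) {}

    uint32_t readArrayHeader() {
        uint8_t tag = byte();
        if ((tag & 0xf0) == 0x90) return tag & 0x0f;                 // fixarray
        if (tag == 0xdc) return static_cast<uint32_t>(bigEndian(2)); // array 16
        if (tag == 0xdd) return static_cast<uint32_t>(bigEndian(4)); // array 32
        throw std::runtime_error("expected an array");
    }
    uint64_t readUint() {
        uint8_t tag = byte();
        if (tag <= 0x7f) return tag;                                 // positive fixint
        switch (tag) {
        case 0xcc: return bigEndian(1);                              // uint 8
        case 0xcd: return bigEndian(2);                              // uint 16
        case 0xce: return bigEndian(4);                              // uint 32
        case 0xcf: return bigEndian(8);                              // uint 64
        }
        throw std::runtime_error("expected an unsigned integer");
    }
    std::string readStr() {
        uint8_t tag = byte();
        size_t len;
        if ((tag & 0xe0) == 0xa0) len = tag & 0x1f;                  // fixstr
        else if (tag == 0xd9) len = bigEndian(1);                    // str 8
        else if (tag == 0xda) len = bigEndian(2);                    // str 16
        else if (tag == 0xdb) len = bigEndian(4);                    // str 32
        else throw std::runtime_error("expected a string");
        need(len);
        std::string s(reinterpret_cast<const char *>(p_), len);
        p_ += len;
        return s;
    }
    double readDouble() {
        uint8_t tag = byte();
        if (tag == 0xcb) {                                           // float 64
            uint64_t bits = bigEndian(8);
            double d;
            std::memcpy(&d, &bits, sizeof(d));
            return d;
        }
        if (tag == 0xca) {                                           // float 32
            uint32_t bits = static_cast<uint32_t>(bigEndian(4));
            float f;
            std::memcpy(&f, &bits, sizeof(f));
            return f;
        }
        throw std::runtime_error("expected a float or double");
    }

private:
    void need(size_t n) const {
        if (static_cast<size_t>(end_ - p_) < n)
            throw std::runtime_error("truncated MessagePack data");
    }
    uint8_t byte() { need(1); return *p_++; }
    // Reads an n-byte big-endian unsigned integer (MessagePack byte order).
    uint64_t bigEndian(size_t n) {
        need(n);
        uint64_t v = 0;
        for (size_t i = 0; i < n; ++i) v = (v << 8) | *p_++;
        return v;
    }
    const uint8_t *p_;
    const uint8_t *end_;
};

// Decodes one tuple: a 3-element array of [unsigned, string, double].
UserTuple decodeUserTuple(const uint8_t *begin, const uint8_t *end) {
    MpCursor cur(begin, end);
    if (cur.readArrayHeader() != 3)
        throw std::runtime_error("expected a tuple of 3 fields");
    UserTuple t;
    t.field1 = cur.readUint();
    t.field2 = cur.readStr();
    t.field3 = cur.readDouble();
    return t;
}
```

In the connector, the tuple boundaries may be buffer iterators rather than raw byte pointers, as the BufferIterator_t in the reader prototype suggests. This sketch would need adapting before you could use it with response.body.data.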

C++ connector API

The official C++ connector for Tarantool is located in the tarantool/tntcxx repository.

It is not supplied as part of the Tarantool repository and requires a few extra steps before use. The connector itself is a header-only library, so it doesn’t require installation or building. All you need is to clone the connector source code and embed it in your C++ project. See the C++ connector Getting started document for details and examples.

Below is the description of the connector public API.

template<class BUFFER, class NetProvider = EpollNetProvider<BUFFER>>
class Connector

The Connector class is a template class that defines a connector client which can handle many connections to Tarantool instances asynchronously.

To instantiate a client, you should specify the buffer and the network provider implementations as template parameters. You can either implement your own buffer or network provider or use the default ones.

The default connector instantiation looks as follows:

using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;
Connector<Buf_t, Net_t> client;

int connect(Connection<BUFFER, NetProvider> &conn, const std::string_view &addr, unsigned port, size_t timeout = DEFAULT_CONNECT_TIMEOUT)

Connects to a Tarantool instance that is listening on addr:port. On successful connection, the method returns 0. If the host doesn’t reply within the timeout period or another error occurs, it returns -1. Then, Connection.getError() gives the error message.

Parameters:
  • conn – object of the Connection class.
  • addr – address of the host where a Tarantool instance is running.
  • port – port that a Tarantool instance is listening on.
  • timeout – connection timeout, seconds. Optional. Defaults to 2.
Returns:

0 on success, or -1 otherwise.

Rtype:

int

Possible errors:

  • connection timeout
  • refused to connect (due to incorrect address or/and port)
  • system errors: a socket can’t be created; failure of any of the system calls (fcntl, select, send, receive).

Example:

using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;

Connector<Buf_t, Net_t> client;
Connection<Buf_t, Net_t> conn(client);

int rc = client.connect(conn, "127.0.0.1", 3301);
int wait(Connection<BUFFER, NetProvider> &conn, rid_t future, int timeout = 0)

The main method responsible for sending a request and checking the response readiness.

You should prepare a request beforehand by using the necessary method of the Connection class, such as ping() and so on, which encodes the request in the MessagePack format and saves it in the output connection buffer.

wait() sends the request and polls the future until the response is ready. Once the response is ready, wait() returns 0. If the response isn’t ready when the timeout expires, or if another error occurs, it returns -1; Connection.getError() then gives the error message. timeout = 0 means the method polls the future with no time limit until the response is ready.

Parameters:
  • conn – object of the Connection class.
  • future – request ID returned by a request method of the Connection class, such as ping().
  • timeout – waiting timeout, milliseconds. Optional. Defaults to 0.
Returns:

0 on receiving a response, or -1 otherwise.

Rtype:

int

Possible errors:

  • timeout exceeded
  • other possible errors depend on the network provider used. If the EpollNetProvider is used, failures of the poll, read, and write system calls lead to system errors such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE, and ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur in this case).

Example:

client.wait(conn, ping, WAIT_TIMEOUT);
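You can sketch the return-value and timeout contract of wait() with a generic polling loop. This is only an illustration under assumptions. pollUntil() is a hypothetical helper, and the isReady callback stands in for Connection::futureIsReady(). The connector itself waits for network events through its network provider rather than sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper modelling the wait() contract described above:
// returns 0 once isReady() is true, or -1 if timeout_ms elapses first.
// timeout_ms == 0 means "no limit": poll until the condition holds.
int pollUntil(const std::function<bool()> &isReady, int timeout_ms) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(timeout_ms);
    while (!isReady()) {
        if (timeout_ms != 0 && Clock::now() >= deadline)
            return -1;
        // A real connector would wait on socket events here; a short
        // sleep keeps this sketch simple.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return 0;
}
```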
void waitAll(Connection<BUFFER, NetProvider> &conn, rid_t *futures, size_t future_count, int timeout = 0)

Similar to wait(), the method sends the prepared requests and checks whether the responses are ready, but it handles several requests at once, whose IDs are stored in the futures array. Exceeding the timeout leads to an error; Connection.getError() gives the error message. timeout = 0 means the method polls the futures until all the responses are ready.

Parameters:
  • conn – object of the Connection class.
  • futures – array with the request IDs returned by request methods of the Connection class, such as ping().
  • future_count – size of the futures array.
  • timeout – waiting timeout, milliseconds. Optional. Defaults to 0.
Returns:

none

Rtype:

none

Possible errors:

  • timeout exceeded
  • other possible errors depend on the network provider used. If the EpollNetProvider is used, failures of the poll, read, and write system calls lead to system errors such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE, and ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur in this case).

Example:

rid_t futures[2];
futures[0] = replace;
futures[1] = select;

client.waitAll(conn, futures, 2);
Connection<BUFFER, NetProvider> *waitAny(int timeout = 0)

Sends all requests that are prepared at the moment and waits for the first response to be ready. Once a response is ready, waitAny() returns the corresponding connection object. If no response is ready when the timeout expires, or if another error occurs, it returns nullptr; Connection.getError() then gives the error message. timeout = 0 means there is no time limit while waiting for a response.

Parameters: timeout – waiting timeout, milliseconds. Optional. Defaults to 0.
Returns: a pointer to the Connection object on success, or nullptr on error.
Rtype: Connection<BUFFER, NetProvider>*

Possible errors:

  • timeout exceeded
  • other possible errors depend on the network provider used. If the EpollNetProvider is used, failures of the poll, read, and write system calls lead to system errors such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE, and ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur in this case).

Example:

rid_t f1 = conn.ping();
rid_t f2 = another_conn.ping();

Connection<Buf_t, Net_t> *first = client.waitAny(WAIT_TIMEOUT);
if (first == &conn) {
    assert(conn.futureIsReady(f1));
} else {
    assert(another_conn.futureIsReady(f2));
}
void close(Connection<BUFFER, NetProvider> &conn)

Closes the connection established earlier by the connect() method.

Parameters: conn – connection object of the Connection class.
Returns: none
Rtype: none

Possible errors: none.

Example:

client.close(conn);

template<class BUFFER, class NetProvider>
class Connection

The Connection class is a template class that defines a connection object, which is required to interact with a Tarantool instance. Each connection object is bound to a single socket.

Like a connector client, a connection object takes the buffer and the network provider as template parameters, and they must be the same as those of the client. For example:

//Instantiating a connector client
using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;
Connector<Buf_t, Net_t> client;

//Instantiating connection objects
Connection<Buf_t, Net_t> conn01(client);
Connection<Buf_t, Net_t> conn02(client);

The Connection class has two nested classes, Space and Index, that implement data-manipulation methods such as select(), replace(), and so on.

typedef size_t rid_t

The alias of the built-in size_t type. rid_t is used for entities that return or contain a request ID.

template<class T>
rid_t call(const std::string &func, const T &args)

Executes a call of a remote stored procedure, similar to conn:call(). The method returns the request ID that is used to get the response by getResponse().

Parameters:
  • func – name of a remote stored procedure.
  • args – procedure’s arguments.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

The following function is defined on the Tarantool instance you are connected to:

box.execute("DROP TABLE IF EXISTS t;")
box.execute("CREATE TABLE t(id INT PRIMARY KEY, a TEXT, b DOUBLE);")

function remote_replace(arg1, arg2, arg3)
    return box.space.T:replace({arg1, arg2, arg3})
end

The function call can look as follows:

rid_t f1 = conn.call("remote_replace", std::make_tuple(5, "some_string", 5.55));
bool futureIsReady(rid_t future)

Checks whether the response to a request (future) returned by any of the request methods, such as ping(), has been received.

futureIsReady() returns true if the response is ready, or false otherwise.

Parameters: future – a request ID.
Returns: true or false
Rtype: bool

Possible errors: none.

Example:

rid_t ping = conn.ping();
conn.futureIsReady(ping);
std::optional<Response<BUFFER>> getResponse(rid_t future)

The method takes a request ID (future) as an argument and returns an optional object containing a response. If the response is not ready, the method returns std::nullopt. Note that for each future the method can be called only once because it erases the request ID from the internal map as soon as the response is returned to a user.

A response consists of a header (response.header) and a body (response.body). Depending on whether the request succeeded on the server side, the body contains either runtime errors, accessible via response.body.error_stack, or data (tuples), accessible via response.body.data. The data is a vector of tuples. However, the tuples are not decoded: they come as pointers to the start and the end of the MessagePack-encoded data. For details on decoding the received data, refer to "Decoding and reading the data".

Parameters: future – a request ID
Returns: a response object or std::nullopt
Rtype: std::optional<Response<BUFFER>>

Possible errors: none.

Example:

rid_t ping = conn.ping();
std::optional<Response<Buf_t>> response = conn.getResponse(ping);
std::string &getError()

Returns an error message for the last error that occurred during the execution of methods of the Connector and Connection classes.

Returns: an error message
Rtype: std::string&

Possible errors: none.

Example:

int rc = client.connect(conn, address, port);

if (rc != 0) {
    std::cerr << conn.getError() << std::endl;
    return -1;
}
void reset()

Resets a connection after errors, that is, cleans up the error message and the connection status.

Returns: none
Rtype: none

Possible errors: none.

Example:

if (client.wait(conn, ping, WAIT_TIMEOUT) != 0) {
    assert(conn.status.is_failed);
    std::cerr << conn.getError() << std::endl;
    conn.reset();
}
rid_t ping()

Prepares a request to ping a Tarantool instance.

The method encodes the request in the MessagePack format and queues it in the output connection buffer to be sent later by one of Connector’s methods, namely, wait(), waitAll(), or waitAny().

Returns the request ID that is used to get the response by the getResponse() method.

Returns: a request ID
Rtype: rid_t

Possible errors: none.

Example:

rid_t ping = conn.ping();

class Space : Connection

Space is a nested class of the Connection class. It is a public wrapper that provides access to the data-manipulation methods in a way similar to the Tarantool submodule box.space, such as space[space_id].select(), space[space_id].replace(), and so on.

All the Space class methods listed below work in the following way:

  • A method encodes the corresponding request in the MessagePack format and queues it in the output connection buffer to be sent later by one of Connector’s methods, namely, wait(), waitAll(), or waitAny().
  • A method returns the request ID. To get and read the actual data requested, first get the response object by using the getResponse() method and then decode the data.

Public methods:

template<class T>
rid_t select(const T &key, uint32_t index_id = 0, uint32_t limit = UINT32_MAX, uint32_t offset = 0, IteratorType iterator = EQ)

Searches for a tuple or a set of tuples in the given space. The method works similarly to space_object:select() and performs the search against the primary index (index_id = 0) by default. In other words, space[space_id].select() is equivalent to space[space_id].index[0].select().

Parameters:
  • key – value to be matched against the index key.
  • index_id – index ID. Optional. Defaults to 0.
  • limit – maximum number of tuples to select. Optional. Defaults to UINT32_MAX.
  • offset – number of tuples to skip. Optional. Defaults to 0.
  • iterator – the type of iterator. Optional. Defaults to EQ.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equivalent to space_object:select({key_value}, {limit = 1}) in Tarantool */
uint32_t space_id = 512;
uint32_t index_id = 0;
int key_value = 5;
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = EQ;
auto i = conn.space[space_id];
rid_t select = i.select(std::make_tuple(key_value), index_id, limit, offset, iter);
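The following in-memory model of a non-unique TREE index, built on std::multimap, illustrates how key, limit, offset, and iterator interact. It is a hypothetical sketch, not connector code: toySelect() and the Iter enum are made-up names, and only the EQ, GE, and GT iterator types are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical subset of iterator types, for illustration only.
enum class Iter { EQ, GE, GT };

// Toy model of a non-unique TREE index: key -> tuple payload, kept sorted.
using ToyIndex = std::multimap<int, std::string>;

// Sketch of select(key, limit, offset, iterator) semantics: walk the index
// in ascending key order from the iterator's start position, skip `offset`
// matches, and return at most `limit` of the rest.
std::vector<std::string> toySelect(const ToyIndex &index, int key,
                                   uint32_t limit = UINT32_MAX,
                                   uint32_t offset = 0, Iter it = Iter::EQ) {
    auto first = (it == Iter::GT) ? index.upper_bound(key) : index.lower_bound(key);
    auto last = (it == Iter::EQ) ? index.upper_bound(key) : index.end();
    std::vector<std::string> result;
    uint32_t skipped = 0;
    for (auto pos = first; pos != last && result.size() < limit; ++pos) {
        if (skipped < offset) {
            ++skipped;
            continue;
        }
        result.push_back(pos->second);
    }
    return result;
}
```

In this model, an EQ scan stops after the last matching key, while GE and GT continue to the end of the index. The offset is applied before the limit.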
template<class T>
rid_t replace(const T &tuple)

Inserts a tuple into the given space. If a tuple with the same primary key already exists, replace() replaces the existing tuple with the new one. The method works similarly to space_object:replace() / put().

Parameters: tuple – a tuple to insert.
Returns: a request ID
Rtype: rid_t

Possible errors: none.

Example:

/* Equivalent to space_object:replace({key_value, "111", 1.01}) in Tarantool */
uint32_t space_id = 512;
int key_value = 5;
std::tuple data = std::make_tuple(key_value, "111", 1.01);
rid_t replace = conn.space[space_id].replace(data);
template<class T>
rid_t insert(const T &tuple)

Inserts a tuple into the given space. The method works similarly to space_object:insert().

Parameters: tuple – a tuple to insert.
Returns: a request ID
Rtype: rid_t

Possible errors: none.

Example:

/* Equivalent to space_object:insert({key_value, "112", 2.22}) in Tarantool */
uint32_t space_id = 512;
int key_value = 6;
std::tuple data = std::make_tuple(key_value, "112", 2.22);
rid_t insert = conn.space[space_id].insert(data);
template<class K, class T>
rid_t update(const K &key, const T &tuple, uint32_t index_id = 0)

Updates a tuple in the given space. The method works similarly to space_object:update() and searches for the tuple to update against the primary index (index_id = 0) by default. In other words, space[space_id].update() is equivalent to space[space_id].index[0].update().

The tuple parameter specifies an update operation, an identifier of the field to update, and a new field value. The set of available operations and the format for specifying an operation and a field identifier are the same as in Tarantool. Refer to the description of space_object:update() and the example below for details.

Parameters:
  • key – value to be matched against the index key.
  • tuple – parameters for the update operation, namely, operator, field_identifier, value.
  • index_id – index ID. Optional. Defaults to 0.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equals to space_object:update(key, {{'=', 1, 'update' }, {'+', 2, 12}}) in Tarantool*/
uint32_t space_id = 512;
std::tuple key = std::make_tuple(5);
std::tuple op1 = std::make_tuple("=", 1, "update");
std::tuple op2 = std::make_tuple("+", 2, 12);
rid_t f1 = conn.space[space_id].update(key, std::make_tuple(op1, op2));
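The operations in this example are, presumably, serialized as MessagePack arrays of the form [operator, field, value], nested inside one outer array. The sketch below encodes such a list by hand to show the resulting bytes. It is a simplified, hypothetical encoder, not the one in tntcxx. UpdateOp, encodeUint(), encodeStr(), and encodeUpdateOps() are made-up names, and only short strings (fixstr) and non-negative integers are supported. Field numbers are written exactly as given.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

using Bytes = std::vector<uint8_t>;
// Hypothetical: an operand is either a non-negative integer or a string.
using Operand = std::variant<uint64_t, std::string>;

struct UpdateOp {
    std::string op;   // "=", "+", "-", ...
    uint64_t field;   // field number, written as given
    Operand value;
};

void encodeUint(Bytes &out, uint64_t v) {
    if (v <= 0x7f) {                       // positive fixint
        out.push_back(static_cast<uint8_t>(v));
        return;
    }
    out.push_back(0xcf);                   // for brevity: always uint 64
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<uint8_t>(v >> shift));
}

void encodeStr(Bytes &out, const std::string &s) {
    if (s.size() > 31)
        throw std::length_error("sketch supports fixstr only");
    out.push_back(static_cast<uint8_t>(0xa0 | s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Encodes ops as an array of 3-element arrays: [[op, field, value], ...].
Bytes encodeUpdateOps(const std::vector<UpdateOp> &ops) {
    if (ops.size() > 15)
        throw std::length_error("sketch supports fixarray only");
    Bytes out;
    out.push_back(static_cast<uint8_t>(0x90 | ops.size()));
    for (const UpdateOp &u : ops) {
        out.push_back(0x93);               // fixarray of 3 elements
        encodeStr(out, u.op);
        encodeUint(out, u.field);
        if (const auto *n = std::get_if<uint64_t>(&u.value))
            encodeUint(out, *n);
        else
            encodeStr(out, std::get<std::string>(u.value));
    }
    return out;
}
```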
template<class T, class O>
rid_t upsert(const T &tuple, const O &ops, uint32_t index_base = 0)

Updates or inserts a tuple in the given space. The method works similarly to space_object:upsert().

If there is an existing tuple that matches the key fields of tuple, the request has the same effect as update() and the ops parameter is used. If there is no existing tuple that matches the key fields of tuple, the request has the same effect as insert() and the tuple parameter is used.

Parameters:
  • tuple – a tuple to insert.
  • ops – parameters for the update operation, namely, operator, field_identifier, value.
  • index_base – starting number to count fields in a tuple: 0 or 1. Optional. Defaults to 0.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equals to space_object:upsert({333, "upsert-insert", 0.0}, {{'=', 1, 'upsert-update'}}) in Tarantool*/
uint32_t space_id = 512;
std::tuple tuple = std::make_tuple(333, "upsert-insert", 0.0);
std::tuple op1 = std::make_tuple("=", 1, "upsert-update");
rid_t f1 = conn.space[space_id].upsert(tuple, std::make_tuple(op1));
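An in-memory model can illustrate the decision rule of upsert(). This is a hypothetical sketch, not connector or server code. toyUpsert() stores tuples of string fields in a std::map keyed by field 0 and supports only an "=" assignment operation. Fields are counted from 0 in this model.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model: a tuple is a list of string fields; field 0 is the primary key.
using ToyTuple = std::vector<std::string>;
// Hypothetical "=" operation: (0-based field number, new value).
using AssignOp = std::pair<std::size_t, std::string>;

// Upsert rule described above: if a tuple with the same key exists,
// apply `ops` to it (update); otherwise insert `tuple` unchanged (insert).
void toyUpsert(std::map<std::string, ToyTuple> &space, const ToyTuple &tuple,
               const std::vector<AssignOp> &ops) {
    auto it = space.find(tuple.at(0));
    if (it == space.end()) {
        space.emplace(tuple.at(0), tuple);
        return;
    }
    for (const AssignOp &op : ops) {
        // In this model, changing field 0 would desynchronize the map key.
        if (op.first == 0)
            throw std::invalid_argument("the primary key cannot be updated");
        it->second.at(op.first) = op.second;
    }
}
```

The first call with a new key inserts the tuple as given. A second call with the same key finds the existing tuple and applies the operations instead, which covers the two cases of upsert().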
template<class T>
rid_t delete_(const T &key, uint32_t index_id = 0)

Deletes a tuple in the given space. The method works similarly to space_object:delete() and searches for the tuple to delete against the primary index (index_id = 0) by default. In other words, space[space_id].delete_() is equivalent to space[space_id].index[0].delete_().

Parameters:
  • key – value to be matched against the index key.
  • index_id – index ID. Optional. Defaults to 0.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equals to space_object:delete(123) in Tarantool*/
uint32_t space_id = 512;
std::tuple key = std::make_tuple(123);
rid_t f1 = conn.space[space_id].delete_(key);

class Index : Space

Index is a nested class of the Space class. It is a public wrapper that provides access to the data-manipulation methods in a way similar to the Tarantool submodule box.index, such as space[space_id].index[index_id].select() and so on.

All the Index class methods listed below work in the following way:

  • A method encodes the corresponding request in the MessagePack format and queues it in the output connection buffer to be sent later by one of Connector’s methods, namely, wait(), waitAll(), or waitAny().
  • A method returns the request ID that is used to get the response by the getResponse() method. Refer to the getResponse() description to understand the response structure and how to read the requested data.

Public methods:

template<class T>
rid_t select(const T &key, uint32_t limit = UINT32_MAX, uint32_t offset = 0, IteratorType iterator = EQ)

This is an alternative to space.select(). The method searches for a tuple or a set of tuples in the given space against a particular index and works similarly to index_object:select().

Parameters:
  • key – value to be matched against the index key.
  • limit – maximum number of tuples to select. Optional. Defaults to UINT32_MAX.
  • offset – number of tuples to skip. Optional. Defaults to 0.
  • iterator – the type of iterator. Optional. Defaults to EQ.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equivalent to index_object:select({key}, {limit = 1}) in Tarantool */
uint32_t space_id = 512;
uint32_t index_id = 1;
int key = 10;
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = EQ;
auto i = conn.space[space_id].index[index_id];
rid_t select = i.select(std::make_tuple(key), limit, offset, iter);
template<class K, class T>
rid_t update(const K &key, const T &tuple)

This is an alternative to space.update(). The method updates a tuple in the given space but searches for the tuple against a particular index. The method works similarly to index_object:update().

The tuple parameter specifies an update operation, an identifier of the field to update, and a new field value. The set of available operations and the format for specifying an operation and a field identifier are the same as in Tarantool. Refer to the description of index_object:update() and the example below for details.

Parameters:
  • key – value to be matched against the index key.
  • tuple – parameters for the update operation, namely, operator, field_identifier, value.
Returns:

a request ID

Rtype:

rid_t

Possible errors: none.

Example:

/* Equals to index_object:update(key, {{'=', 1, 'update' }, {'+', 2, 12}}) in Tarantool*/
uint32_t space_id = 512;
uint32_t index_id = 1;
std::tuple key = std::make_tuple(10);
std::tuple op1 = std::make_tuple("=", 1, "update");
std::tuple op2 = std::make_tuple("+", 2, 12);
rid_t f1 = conn.space[space_id].index[index_id].update(key, std::make_tuple(op1, op2));
template<class T>
rid_t delete_(const T &key)

This is an alternative to space.delete_(). The method deletes a tuple in the given space but searches for the tuple against a particular index. The method works similarly to index_object:delete().

Parameters: key – value to be matched against the index key.
Returns: a request ID
Rtype: rid_t

Possible errors: none.

Example:

/* Equals to index_object:delete(123) in Tarantool*/
uint32_t space_id = 512;
uint32_t index_id = 1;
std::tuple key = std::make_tuple(123);
rid_t f1 = conn.space[space_id].index[index_id].delete_(key);

Community-supported connectors

This section provides information on several community-supported connectors. Note that they may have limited support for new Tarantool features.

For Erlang, use the Erlang tarantool driver.

For R, use the tarantoolr connector.

C#

The most commonly used C# driver is progaudi.tarantool, previously named tarantool-csharp. It is not supplied as part of the Tarantool repository; it must be installed separately. The driver’s authors recommend cross-platform installation using NuGet.

To be consistent with the other instructions in this chapter, here is a way to install the driver directly on Ubuntu 16.04.

  1. Install .NET Core from Microsoft. Follow the .NET Core installation instructions.


  2. Create a new console project.

    $ cd ~
    $ mkdir progaudi.tarantool.test
    $ cd progaudi.tarantool.test
    $ dotnet new console
    
  3. Add the progaudi.tarantool package reference.

    $ dotnet add package progaudi.tarantool
    
  4. Change the code in Program.cs.

    $ cat <<EOT > Program.cs
    using System;
    using System.Threading.Tasks;
    using ProGaudi.Tarantool.Client;
    
    public class HelloWorld
    {
      static public void Main ()
      {
        Test().GetAwaiter().GetResult();
      }
      static async Task Test()
      {
        var box = await Box.Connect("127.0.0.1:3301");
        var schema = box.GetSchema();
        var space = await schema.GetSpace("examples");
        await space.Insert((99999, "BB"));
      }
    }
    EOT
    
  5. Build and run your application.

    Before trying to run, check that the server is listening at localhost:3301 and that the space examples exists, as described earlier.

    $ dotnet restore
    $ dotnet run
    

    The program will:

    • connect using an application-specific definition of the space,
    • open a socket connection with the Tarantool server at localhost:3301,
    • send an INSERT request, and — if all is well — end without saying anything.

    If Tarantool is not running on localhost with listen port = 3301, or if the user "guest" does not have authorization to connect, or if the INSERT request fails for any reason, the program will print an error message along with other information, such as a stack trace.

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the progaudi.tarantool driver repository.

Node.js

The most commonly used Node.js driver is the Node Tarantool driver. It is not supplied as part of the Tarantool repository; it must be installed separately. The most common way to install it is with npm. For example, on Ubuntu, the installation could look like this after npm has been installed:

$ npm install tarantool-driver --global

Here is a complete Node.js program that inserts [99999,'BB'] into space[999] via the Node.js API. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run, paste the code into a file named example.js and say node example.js. The program will connect using an application-specific definition of the space. The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, end after saying "Insert succeeded". If Tarantool is not running on localhost with listen port = 3301, the program will print "Connect failed". If the "guest" user does not have authorization to connect, the program will print "Auth failed". If the insert request fails for any reason, for example because the tuple already exists, the program will print "Insert failed".

var TarantoolConnection = require('tarantool-driver');
var conn = new TarantoolConnection({port: 3301});
var insertTuple = [99999, "BB"];
conn.connect().then(function() {
    conn.auth("guest", "").then(function() {
        conn.insert(999, insertTuple).then(function() {
            console.log("Insert succeeded");
            process.exit(0);
        }, function(e) { console.log("Insert failed");  process.exit(1); });
    }, function(e) { console.log("Auth failed");    process.exit(1); });
}, function(e) { console.log("Connect failed"); process.exit(1); });

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the Node.js driver repository.

Perl

The most commonly used Perl driver is tarantool-perl. It is not supplied as part of the Tarantool repository; it must be installed separately. The most common way to install it is by cloning from GitHub.

To avoid minor warnings that may appear the first time tarantool-perl is installed, start with installing some other modules that tarantool-perl uses, with CPAN, the Comprehensive Perl Archive Network:

$ sudo cpan install AnyEvent
$ sudo cpan install Devel::GlobalDestruction

Then, to install tarantool-perl itself, say:

$ git clone https://github.com/tarantool/tarantool-perl.git tarantool-perl
$ cd tarantool-perl
$ git submodule init
$ git submodule update --recursive
$ perl Makefile.PL
$ make
$ sudo make install

Here is a complete Perl program that inserts [99999,'BB'] into space[999] via the Perl API. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run, paste the code into a file named example.pl and say perl example.pl. The program will connect using an application-specific definition of the space. The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, end without displaying any messages. If Tarantool is not running on localhost with listen port = 3301, the program will print "Connection refused".

#!/usr/bin/perl
use DR::Tarantool ':constant', 'tarantool';
use DR::Tarantool ':all';
use DR::Tarantool::MsgPack::SyncClient;

my $tnt = DR::Tarantool::MsgPack::SyncClient->connect(
  host    => '127.0.0.1',                      # look for tarantool on localhost
  port    => 3301,                             # on port 3301
  user    => 'guest',                          # username. for 'guest' we do not also say 'password=>...'

  spaces  => {
    999 => {                                   # definition of space[999] ...
      name => 'examples',                      #   space[999] name = 'examples'
      default_type => 'STR',                   #   space[999] field type is 'STR' if undefined
      fields => [ {                            #   definition of space[999].fields ...
          name => 'field1', type => 'NUM' } ], #     space[999].field[1] name='field1',type='NUM'
      indexes => {                             #   definition of space[999] indexes ...
        0 => {
          name => 'primary', fields => [ 'field1' ] } } } } );

$tnt->insert('examples' => [ 99999, 'BB' ]);

The example program uses field type names "STR" and "NUM" instead of "string" and "unsigned", due to a temporary Perl limitation.

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the tarantool-perl repository.

PHP

tarantool-php is the official PHP connector for Tarantool. It is not supplied as part of the Tarantool repository and must be installed separately (see installation instructions in the connector’s README file).

Here is a complete PHP program that inserts [99999,'BB'] into a space named examples via the PHP API.

Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier.

To run, paste the code into a file named example.php and say:

$ php -d extension=~/tarantool-php/modules/tarantool.so example.php

The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, print "Insert succeeded".

If the tuple already exists, the program will print "Duplicate key exists in unique index 'primary' in space 'examples'".

<?php
$tarantool = new Tarantool('localhost', 3301);

try {
    $tarantool->insert('examples', [99999, 'BB']);
    echo "Insert succeeded\n";
} catch (Exception $e) {
    echo $e->getMessage(), "\n";
}

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the tarantool/tarantool-php project on GitHub.

There is also another community-driven tarantool-php GitHub project, which includes an alternative connector written in pure PHP, an object mapper, a queue, and other packages.

Reference

Configuration reference

This topic describes all configuration parameters provided by Tarantool.

Most of the configuration options described in this reference can be applied to a specific instance, replica set, group, or to all instances globally. To do so, you need to define the required option at the specified level.

Using Tarantool as an application server, you can run your own Lua applications. In the app section, you can load the application and provide an application configuration in the app.cfg section.

Note

app can be defined in any scope.

Note

Note that an application specified using app is loaded after application roles specified using the roles option.

app.cfg

A configuration of the application loaded using app.file or app.module.

Example

In the example below, the application is loaded from the myapp.lua file placed next to the YAML configuration file:

app:
  file: 'myapp.lua'
  cfg:
    greeting: 'Hello'

Example on GitHub: application

Tip

The experimental.config.utils.schema built-in module provides an API for managing user-defined configurations of applications (app.cfg) and roles (roles_cfg).


Type: map
Default: nil
Environment variable: TT_APP_CFG
app.file

A path to a Lua file to load an application from.


Type: string
Default: nil
Environment variable: TT_APP_FILE
app.module

A Lua module to load an application from.

Example

The app section can be placed in any configuration scope. As an example use case, you can provide different applications for storages and routers in a sharded cluster:

groups:
  storages:
    app:
      module: storage
      # ...
  routers:
    app:
      module: router
      # ...

Type: string
Default: nil
Environment variable: TT_APP_MODULE

Enterprise Edition

Configuring audit_log parameters is available in the Enterprise Edition only.

The audit_log section defines configuration parameters related to audit logging.

Note

audit_log can be defined in any scope.

audit_log.extract_key

If set to true, the audit subsystem extracts and prints only the primary key instead of full tuples in DML events (space_insert, space_replace, space_delete). Otherwise, full tuples are logged. This option may be useful when tuples are large.


Type: boolean
Default: false
Environment variable: TT_AUDIT_LOG_EXTRACT_KEY
audit_log.file

Specify a file for the audit log destination. To write the audit log to this file, set audit_log.to to file. If you write logs to a file, Tarantool reopens the audit log at SIGHUP.


Type: string
Default: 'var/log/{{ instance_name }}/audit.log'
Environment variable: TT_AUDIT_LOG_FILE
audit_log.filter

Enable logging for a specified subset of audit events. This option accepts the following values:

  • Event names (for example, password_change). For details, see Audit log events.
  • Event groups (for example, audit). For details, see Event groups.

The option accepts either a single value from the Possible values list below or a combination of them.

To enable custom audit log events, specify the custom value in this option.

Example

audit_log:
  filter: [ user_create, data_operations, ddl, custom ]

Type: array
Possible values: 'all', 'audit', 'auth', 'priv', 'ddl', 'dml', 'data_operations', 'compatibility', 'audit_enable', 'auth_ok', 'auth_fail', 'disconnect', 'user_create', 'user_drop', 'role_create', 'role_drop', 'user_disable', 'user_enable', 'user_grant_rights', 'role_grant_rights', 'role_revoke_rights', 'password_change', 'access_denied', 'eval', 'call', 'space_select', 'space_create', 'space_alter', 'space_drop', 'space_insert', 'space_replace', 'space_delete', 'custom'
Default: nil
Environment variable: TT_AUDIT_LOG_FILTER
audit_log.format

Specify a format that is used for the audit log.

Example

If you set the option to plain,

audit_log:
  to: file
  format: plain

the output in the file might look as follows:

2024-01-17T00:12:27.155+0300 4b5a2624-28e5-4b08-83c7-035a0c5a1db9 INFO remote:unix/:(socket) session_type:console module:tarantool user:admin type:space_create tag: description:Create space Bands

Type: string
Possible values: 'json', 'csv', 'plain'
Default: 'json'
Environment variable: TT_AUDIT_LOG_FORMAT
audit_log.nonblock

Specify the logging behavior if the system is not ready to write. If set to true, Tarantool does not block during logging if the system is non-writable and writes a message instead. Using this value may improve logging performance at the cost of losing some log messages.

Note

The option only has an effect if audit_log.to is set to syslog or pipe.
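
Example

The following sketch enables non-blocking writes. It assumes that audit logs are sent to a local syslog daemon through the unix:/dev/log socket, which is only an illustration: the actual socket path depends on your system. Because nonblock only affects the syslog and pipe destinations, audit_log.to is set to syslog here:

audit_log:
  to: syslog
  nonblock: true    # do not block if syslog is not ready; some messages may be lost
  syslog:
    server: 'unix:/dev/log'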


Type: boolean
Default: false
Environment variable: TT_AUDIT_LOG_NONBLOCK
audit_log.pipe

Specify a pipe for the audit log destination. To write the audit log to the pipe, set audit_log.to to pipe. If the log is a program, its PID is stored in the audit.pid field. You need to send it a signal to rotate logs.

Example

This starts the cronolog program when the server starts and sends all audit_log messages to the standard input (stdin) of cronolog.

audit_log:
  to: pipe
  pipe: 'cronolog audit_tarantool.log'

Type: string
Default: box.NULL
Environment variable: TT_AUDIT_LOG_PIPE
audit_log.spaces

The array of space names for which data operation events (space_select, space_insert, space_replace, space_delete) should be logged. The array accepts string values. If set to box.NULL, the data operation events are logged for all spaces.

Example

In this example, only events in the bands and singers spaces are logged:

audit_log:
  spaces: [bands, singers]

Type: array
Default: box.NULL
Environment variable: TT_AUDIT_LOG_SPACES
audit_log.to

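Enable audit logging and define the log location. This option accepts the following values:

  • devnull: disable audit logging.
  • file: write audit logs to a file (see audit_log.file).
  • pipe: start a program and write audit logs to its standard input (see audit_log.pipe).
  • syslog: send audit logs to a system logger (see audit_log.syslog.*).

By default, audit logging is disabled.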

Example

The basic audit log configuration might look as follows:

audit_log:
  to: file
  file: 'audit_tarantool.log'
  filter: [ user_create,data_operations,ddl,custom ]
  format: json
  spaces: [ bands ]
  extract_key: true

Type: string
Possible values: 'devnull', 'file', 'pipe', 'syslog'
Default: 'devnull'
Environment variable: TT_AUDIT_LOG_TO

audit_log.syslog.facility

Specify a system logger keyword that tells syslogd where to send the message. You can enable logging to a system logger using the audit_log.to option.

See also: syslog configuration example


Type: string
Possible values: 'auth', 'authpriv', 'cron', 'daemon', 'ftp', 'kern', 'lpr', 'mail', 'news', 'security', 'syslog', 'user', 'uucp', 'local0', 'local1', 'local2', 'local3', 'local4', 'local5', 'local6', 'local7'
Default: 'local7'
Environment variable: TT_AUDIT_LOG_SYSLOG_FACILITY
audit_log.syslog.identity

Specify an application name to show in logs. You can enable logging to a system logger using the audit_log.to option.

See also: syslog configuration example


Type: string
Default: 'tarantool'
Environment variable: TT_AUDIT_LOG_SYSLOG_IDENTITY
audit_log.syslog.server

Set a location for the syslog server. It can be a Unix socket path starting with 'unix:' or an IPv4 port number. You can enable logging to a system logger using the audit_log.to option.

Example

audit_log:
  to: syslog
  syslog:
    server: 'unix:/dev/log'
    facility: 'user'
    identity: 'tarantool_audit'

These options are interpreted as a message for the syslogd program, which runs in the background of any Unix-like platform.

An example of a Tarantool audit log entry in the syslog:

09:32:52 tarantool_audit: {"time": "2024-02-08T09:32:52.190+0300", "uuid": "94454e46-9a0e-493a-bb9f-d59e44a43581", "severity": "INFO", "remote": "unix/:(socket)", "session_type": "console", "module": "tarantool", "user": "admin", "type": "space_create", "tag": "", "description": "Create space bands"}

Warning

The example above writes audit logs to a destination shared with the system logs. Tarantool allows this, but it is not recommended because it makes working with audit logs more difficult. System and audit logs should be written separately: create separate paths for them and specify these paths in the configuration.


Type: string
Default: box.NULL
Environment variable: TT_AUDIT_LOG_SYSLOG_SERVER

The compat section defines values of the compat module options.

Note

compat can be defined in any scope.
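
Example

Here is a minimal sketch of a compat section that keeps the old behavior of two options and opts into the new behavior of another. The choice of options is only an illustration, and box_error_serialize_verbose requires Tarantool 3.1.0 or newer:

compat:
  binary_data_decoding: 'old'          # decode binary fields as plain Lua strings
  json_escape_forward_slash: 'old'     # keep escaping '/' in json.encode() output
  box_error_serialize_verbose: 'new'   # serialize errors with additional fields

Because compat can be defined in any scope, the same keys can also be placed under a group, replica set, or instance.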

compat.binary_data_decoding

Define how to store binary data fields in Lua after decoding:

  • new: as varbinary objects
  • old: as plain strings

See also: Decoding binary objects


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BINARY_DATA_DECODING
compat.box_cfg_replication_sync_timeout

Set a default replication sync timeout:

  • new: 0
  • old: 300 seconds

Important

This value is set during the initial box.cfg{} call and cannot be changed later.

See also: Default value for replication_sync_timeout


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_CFG_REPLICATION_SYNC_TIMEOUT
compat.box_error_serialize_verbose

Since: 3.1.0

Set the verbosity of error objects serialization:

  • new: serialize the error message together with other potentially useful fields
  • old: serialize only the error message

Type: string
Possible values: 'new', 'old'
Default: 'old'
Environment variable: TT_COMPAT_BOX_ERROR_SERIALIZE_VERBOSE
compat.box_error_unpack_type_and_code

Since: 3.1.0

Whether to show error fields in box.error.unpack():

  • new: do not show base_type and custom_type fields; do not show the code field if it is 0. Note that base_type is still accessible for an error object.
  • old: show all fields

Type: string
Possible values: 'new', 'old'
Default: 'old'
Environment variable: TT_COMPAT_BOX_ERROR_UNPACK_TYPE_AND_CODE
compat.box_info_cluster_meaning

Define the behavior of box.info.cluster:

  • new: show the entire cluster
  • old: show the current replica set

See also: Meaning of box.info.cluster


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_INFO_CLUSTER_MEANING
compat.box_session_push_deprecation

Whether to raise errors on attempts to call the deprecated function box.session.push:

  • new: raise an error
  • old: do not raise an error

See also: box.session.push() deprecation


Type: string
Possible values: 'new', 'old'
Default: 'old'
Environment variable: TT_COMPAT_BOX_SESSION_PUSH_DEPRECATION
compat.box_space_execute_priv

Whether the execute privilege can be granted on spaces:

  • new: an error is raised
  • old: the privilege can be granted with no actual effect

Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_SPACE_EXECUTE_PRIV
compat.box_space_max

Set the maximum space identifier (box.schema.SPACE_MAX):

  • new: 2147483646
  • old: 2147483647

The limit was decremented because the old max value is used as an error indicator in the box C API.


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_SPACE_MAX
compat.box_tuple_extension

Controls IPROTO_FEATURE_CALL_RET_TUPLE_EXTENSION and IPROTO_FEATURE_CALL_ARG_TUPLE_EXTENSION feature bits that define tuple encoding in iproto call and eval requests.

  • new: tuples with formats are encoded as MP_TUPLE
  • old: tuples with formats are encoded as MP_ARRAY

Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_TUPLE_EXTENSION
compat.box_tuple_new_vararg

Controls how box.tuple.new interprets an argument list:

  • new: as a value with a tuple format
  • old: as an array of tuple fields

Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_BOX_TUPLE_NEW_VARARG
compat.c_func_iproto_multireturn

Controls wrapping of multiple results of a stored C function when returning them via iproto:

  • new: return without wrapping (consistently with a local call via box.func)
  • old: wrap results into a MessagePack array

Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_C_FUNC_IPROTO_MULTIRETURN
compat.fiber_channel_close_mode

Define the behavior of fiber channels after closing:

  • new: mark the channel read-only
  • old: destroy the channel object

See also: Fiber channel close mode


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_FIBER_CHANNEL_CLOSE_MODE
compat.fiber_slice_default

Define the maximum fiber execution time without a yield:

  • new: {warn = 0.5, err = 1.0}
  • old: infinity (no warnings or errors raised).

See also: Default value for max fiber slice


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_FIBER_SLICE_DEFAULT
compat.json_escape_forward_slash

Whether to escape the forward slash symbol '/' using a backslash in a json.encode() result:

  • new: do not escape the forward slash
  • old: escape the forward slash

See also: JSON encode escape forward slash


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_JSON_ESCAPE_FORWARD_SLASH
compat.sql_priv

Whether to enable access checks for SQL requests over iproto:

  • new: check the user’s access permissions
  • old: allow any user to execute SQL over iproto

Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_SQL_PRIV
compat.sql_seq_scan_default

Controls the default value of the sql_seq_scan session setting:

  • new: false
  • old: true

See also: Default value for sql_seq_scan session setting


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_SQL_SEQ_SCAN_DEFAULT
compat.yaml_pretty_multiline

Whether to encode all multiline strings in block scalar style, or only those containing the \n\n substring:

  • new: all multiline strings
  • old: only strings containing the \n\n substring

See also: Lua-YAML prettier multiline output


Type: string
Possible values: 'new', 'old'
Default: 'new'
Environment variable: TT_COMPAT_YAML_PRETTY_MULTILINE

The conditional section defines the configuration parts that apply to instances that meet certain conditions.

Note

conditional can be defined in the global scope only.

conditional.if

Specify a conditional section of the configuration. The configuration options defined inside a conditional.if section apply only to instances on which the specified condition is true.

Conditions can include one variable, tarantool_version: the three-number version of Tarantool running on the instance, for example, 3.1.0. It is compared to version literals that consist of three numbers separated by periods (x.y.z).

The following operators are available in conditions:

  • comparison: >, <, >=, <=, ==, !=
  • logical operators || (OR) and && (AND)
  • parentheses ()

Example:

In this example, different configuration parts apply to instances running Tarantool versions earlier than 3.1.0 and versions 3.1.0 or newer:

  • On versions less than 3.1.0, the upgraded label is set to false.
  • On versions 3.1.0 or newer, the upgraded label is set to true. Additionally, new compat options are defined. These options were introduced in version 3.1.0, so on older versions they would cause an error.

conditional:
  - if: tarantool_version < 3.1.0
    labels:
      upgraded: 'false'
  - if: tarantool_version >= 3.1.0
    labels:
      upgraded: 'true'
    compat:
      box_error_serialize_verbose: 'new'
      box_error_unpack_type_and_code: 'new'

See also: Conditional configuration sections

The config section defines various parameters related to centralized configuration.

Note

config can be defined in the global scope only.

config.reload

Specify how the configuration is reloaded. This option accepts the following values:

  • auto: configuration is reloaded automatically when it is changed.
  • manual: configuration should be reloaded manually. In this case, you can reload the configuration in the application code using config:reload().
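
Example

This sketch switches to manual reloading, so configuration changes take effect only after the application code calls config:reload():

config:
  reload: 'manual'

Going by the environment variable listed below, the same value can presumably also be supplied as TT_CONFIG_RELOAD=manual.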

See also: Reloading configuration


Type: string
Possible values: 'auto', 'manual'
Default: 'auto'
Environment variable: TT_CONFIG_RELOAD

This section describes options related to loading configuration settings from external storage such as external files or environment variables.

config.context

Specify how to load settings from external storage. For example, this option can be used to load passwords from safe storage. You can find examples in the Loading secrets from safe storage section.


Type: map
Default: nil
Environment variable: TT_CONFIG_CONTEXT
config.context.<name>

The name of an entity that identifies a configuration value to load.

config.context.<name>.env

The name of an environment variable to load a configuration value from. To load a configuration value from an environment variable, set config.context.<name>.from to env.

Example

In this example, passwords are loaded from the DBADMIN_PASSWORD and SAMPLEUSER_PASSWORD environment variables:

config:
  context:
    dbadmin_password:
      from: env
      env: DBADMIN_PASSWORD
    sampleuser_password:
      from: env
      env: SAMPLEUSER_PASSWORD

See also: Loading secrets from safe storage

config.context.<name>.from

The type of storage to load a configuration value from. There are the following storage types:

  • file: load a configuration value from a file. In this case, you need to specify the path to the file using config.context.<name>.file.
  • env: load a configuration value from an environment variable. In this case, specify the environment variable name using config.context.<name>.env.

config.context.<name>.file

The path to a file to load a configuration value from. To load a configuration value from a file, set config.context.<name>.from to file.

Example

In this example, passwords are loaded from the dbadmin_password.txt and sampleuser_password.txt files:

config:
  context:
    dbadmin_password:
      from: file
      file: secrets/dbadmin_password.txt
      rstrip: true
    sampleuser_password:
      from: file
      file: secrets/sampleuser_password.txt
      rstrip: true

See also: Loading secrets from safe storage

config.context.<name>.rstrip

(Optional) Whether to strip whitespace characters and newlines from the end of the loaded data.

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

This section describes options related to providing connection settings to a centralized etcd-based storage. If replication.failover is set to supervised, Tarantool also uses etcd to maintain the state of failover coordinators.

Note

Note that a centralized cluster configuration cannot contain the config.etcd section.

config.etcd.endpoints

The list of endpoints used to access an etcd server.
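
Example

This sketch connects to a single etcd server. It assumes that etcd listens on http://localhost:2379 and stores the configuration under the /myapp prefix. The address, prefix, and credentials are placeholders:

config:
  etcd:
    endpoints:
    - http://localhost:2379
    prefix: /myapp               # keys are searched under /myapp/config/*
    username: sampleuser
    password: '123456'
    http:
      request:
        timeout: 3               # illustrative value, assumed to be in seconds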

See also: Configuring connection to an etcd storage


Type: array
Default: nil
Environment variable: TT_CONFIG_ETCD_ENDPOINTS
config.etcd.prefix

A key prefix used to search a configuration on an etcd server. Tarantool searches keys by the following path: <prefix>/config/*. Note that <prefix> should start with a slash (/).

See also: Configuring connection to an etcd storage


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_PREFIX
config.etcd.username

A username used for authentication.

See also: Configuring connection to an etcd storage


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_USERNAME
config.etcd.password

A password used for authentication.

See also: Configuring connection to an etcd storage


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_PASSWORD
config.etcd.ssl.ca_file

A path to a trusted certificate authorities (CA) file.


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_CA_FILE
config.etcd.ssl.ca_path

A path to a directory holding certificates to verify the peer with.


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_CA_PATH
config.etcd.ssl.ssl_cert

Since: 3.2.0

A path to an SSL certificate file.


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_SSL_CERT
config.etcd.ssl.ssl_key

A path to a private SSL key file.


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_SSL_KEY
config.etcd.ssl.verify_host

Enable verification of the certificate’s name (CN) against the specified host.


Type: boolean
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_VERIFY_HOST
config.etcd.ssl.verify_peer

Enable verification of the peer’s SSL certificate.
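
Example

This sketch sets up an encrypted connection to etcd that verifies both the peer certificate and the host name. The https endpoint and the certs/ca.crt path are hypothetical placeholders:

config:
  etcd:
    endpoints:
    - https://localhost:2379
    prefix: /myapp
    ssl:
      ca_file: certs/ca.crt    # trusted CA used to verify the etcd server
      verify_peer: true
      verify_host: true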


Type: boolean
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_VERIFY_PEER
config.etcd.http.request.timeout

The maximum time allowed for processing an HTTP request to an etcd server, from sending the request to receiving the response.

See also: Configuring connection to an etcd storage


Type: number
Default: nil
Environment variable: TT_CONFIG_ETCD_HTTP_REQUEST_TIMEOUT
config.etcd.http.request.unix_socket

A Unix domain socket used to connect to an etcd server.


Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_HTTP_REQUEST_UNIX_SOCKET
config.etcd.watchers.reconnect_max_attempts

Since: 3.1.0

The maximum number of attempts to reconnect to an etcd server in case of connection failure.


Type: integer
Default: nil
Environment variable: TT_CONFIG_ETCD_WATCHERS_RECONNECT_MAX_ATTEMPTS
config.etcd.watchers.reconnect_timeout

Since: 3.1.0

The timeout (in seconds) between attempts to reconnect to an etcd server in case of connection failure.
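
Example

This sketch tunes how Tarantool reconnects to etcd after a connection failure. Both values are illustrative, not recommended settings, and the options require Tarantool 3.1.0 or newer:

config:
  etcd:
    watchers:
      reconnect_max_attempts: 10   # at most 10 reconnection attempts
      reconnect_timeout: 2         # wait 2 seconds between attempts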


Type: number
Default: nil
Environment variable: TT_CONFIG_ETCD_WATCHERS_RECONNECT_TIMEOUT

Enterprise Edition

Centralized configuration storages are supported by the Enterprise Edition only.

This section describes options related to providing connection settings to a centralized Tarantool-based storage.

Note

Note that a centralized cluster configuration cannot contain the config.storage section.

config.storage.endpoints

An array of endpoints used to access a configuration storage. Each endpoint can include the following fields:

  • uri: a URI of the configuration storage’s instance.
  • login: a username used to connect to the instance.
  • password: a password used for authentication.
  • params: SSL parameters required for encrypted connections (<uri>.params.*).
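
Example

This sketch connects to a Tarantool-based configuration storage that runs on two instances. The URIs, prefix, and credentials are placeholders. timeout and reconnect_after are set to their default values only to show where they go:

config:
  storage:
    prefix: /myapp
    endpoints:
    - uri: '127.0.0.1:4401'
      login: sampleuser
      password: '123456'
    - uri: '127.0.0.1:4402'
      login: sampleuser
      password: '123456'
    timeout: 3
    reconnect_after: 3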

See also: Configuring connection to a Tarantool storage


Type: array
Default: nil
Environment variable: TT_CONFIG_STORAGE_ENDPOINTS
config.storage.prefix

A key prefix used to search a configuration in a centralized configuration storage. Tarantool searches keys by the following path: <prefix>/config/*. Note that <prefix> should start with a slash (/).

See also: Configuring connection to a Tarantool storage


Type: string
Default: nil
Environment variable: TT_CONFIG_STORAGE_PREFIX
config.storage.reconnect_after

The number of seconds to wait before reconnecting to a configuration storage.


Type: number
Default: 3
Environment variable: TT_CONFIG_STORAGE_RECONNECT_AFTER
config.storage.timeout

The interval (in seconds) to perform the status check of a configuration storage.

See also: Configuring connection to a Tarantool storage


Type: number
Default: 3
Environment variable: TT_CONFIG_STORAGE_TIMEOUT

Configure the administrative console. You can connect to the console using tt connect.

Note

console can be defined in any scope.

console.enabled

Whether to listen on the Unix socket provided in the console.socket option.

If the option is set to false, the administrative console is disabled.


Type: boolean
Default: true
Environment variable: TT_CONSOLE_ENABLED
console.socket

The Unix socket for the administrative console.

Mind the following nuances:

  • Only a Unix domain socket is allowed. A TCP socket can’t be configured this way.
  • console.socket is a file path, without any unix: or unix/: prefixes.
  • If the file path is a relative path, it is interpreted relative to process.work_dir.
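
Example

This sketch places the console socket in the instance's working directory. Because the path is relative, it is resolved against process.work_dir. The file name admin.sock is only an example:

console:
  enabled: true
  socket: 'admin.sock'    # plain file path, no unix: prefix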

Type: string
Default: 'var/run/{{ instance_name }}/tarantool.control'
Environment variable: TT_CONSOLE_SOCKET

The credentials section allows you to create users and grant them the specified privileges. Learn more in Credentials.

Note

credentials can be defined in any scope.

credentials.roles

An array of roles that can be granted to users or other roles.

Example

In the example below, the writers_space_reader role gets privileges to select data in the writers space:

roles:
  writers_space_reader:
    privileges:
    - permissions: [ read ]
      spaces: [ writers ]

See also: Managing users and roles


Type: map
Default: nil
Environment variable: TT_CREDENTIALS_ROLES
credentials.roles.<role_name>.roles

An array of roles granted to this role.
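
Example

This sketch shows a role that inherits another role. The hypothetical books_space_editor role gets all privileges of writers_space_reader, plus read and write access to the books space:

credentials:
  roles:
    writers_space_reader:
      privileges:
      - permissions: [ read ]
        spaces: [ writers ]
    books_space_editor:
      roles: [ writers_space_reader ]
      privileges:
      - permissions: [ read, write ]
        spaces: [ books ]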

credentials.roles.<role_name>.privileges

An array of privileges granted to this role.

See <user_or_role_name>.privileges.*.

credentials.users

An array of users.

Example

In this example, sampleuser gets the following privileges:

  • Privileges granted to the writers_space_reader role.
  • Privileges to select and modify data in the books space.

sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]

See also: Managing users and roles


Type: map
Default: nil
Environment variable: TT_CREDENTIALS_USERS
credentials.users.<username>.password

A user’s password.

Example

In the example below, a password for the dbadmin user is set:

credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'

See also: Loading secrets from safe storage

credentials.users.<username>.roles

An array of roles granted to this user.

credentials.users.<username>.privileges

An array of privileges granted to this user.

See <user_or_role_name>.privileges.*.

<user_or_role_name>.privileges

Privileges that can be granted to a user or role using the following options:

<user_or_role_name>.privileges.permissions

Permissions assigned to this user or a user with this role.

Example

In this example, sampleuser gets privileges to select and modify data in the books space:

sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]

See also: Managing users and roles

<user_or_role_name>.privileges.spaces

Spaces to which this user or a user with this role gets the specified permissions.

Example

In this example, sampleuser gets privileges to select and modify data in the books space:

sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]

See also: Managing users and roles

<user_or_role_name>.privileges.functions

Functions to which this user or a user with this role gets the specified permissions.

<user_or_role_name>.privileges.sequences

Sequences to which this user or a user with this role gets the specified permissions.
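
Example

This sketch grants access to a function and a sequence. The names my_func and my_seq are hypothetical, and the sketch assumes that a function and a sequence with these names exist in the database:

credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ execute ]
        functions: [ my_func ]
      - permissions: [ read, write ]
        sequences: [ my_seq ]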

<user_or_role_name>.privileges.lua_eval

Whether this user or a user with this role can execute arbitrary Lua code.

<user_or_role_name>.privileges.lua_call

A list of global user-defined Lua functions that this user or a user with this role can call. To allow calling a specific function, specify its name as the value. To allow calling all global Lua functions except built-in ones, specify the all value.

This option should be configured together with the execute permission.

Since version 3.3.0, the lua_call option allows granting users privileges to call the specified Lua functions on the instance at runtime (thus it does not require the ability to write to the database).

Example of granting the alice user privileges to call custom functions:

credentials:
  users:
    alice:
      privileges:
        - permissions: [execute]
          lua_call: [my_func, my_func2]

<user_or_role_name>.privileges.sql

Whether this user or a user with this role can execute an arbitrary SQL expression.
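
Example

This sketch allows sampleuser to evaluate arbitrary Lua code and run arbitrary SQL. It assumes that, like lua_call, these flags are used together with the execute permission:

credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ execute ]
        lua_eval: true
        sql: true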

The database section defines database-specific configuration parameters, such as an instance’s read-write mode or transaction isolation level.

Note

database can be defined in any scope.

database.hot_standby

Whether to start the server in the hot standby mode. This mode can be used to provide failover without replication.

Suppose there are two cluster applications. Each cluster has one instance with the same configuration:

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              hot_standby: true
            wal:
              dir: /tmp/wals
            snapshot:
              dir: /tmp/snapshots
            iproto:
              listen:
              - uri: '127.0.0.1:3301'

In particular, both instances use the same directory for storing write-ahead logs and snapshots.

When you start both cluster applications on the same machine, the instance from the first one will be the primary instance and the second will be the standby instance. In the logs of the second cluster instance, you should see a notification:

main/104/interactive I> Entering hot standby mode

This means that the standby instance is ready to take over if the primary instance goes down. The standby instance initializes and tries to lock the directory for storing write-ahead logs, but fails because the primary instance has already locked this directory.

If the primary instance goes down for any reason, the lock is released. In this case, the standby instance succeeds in taking the lock and becomes the primary instance.

database.hot_standby has no effect:

  • If wal.mode is set to none.
  • If wal.dir_rescan_delay is set to a large value on macOS or FreeBSD. On these platforms, the hot standby mode is designed so that the loop repeats every wal.dir_rescan_delay seconds.
  • For spaces created with engine set to vinyl.

Examples on GitHub: hot_standby_1, hot_standby_2


Type: boolean
Default: false
Environment variable: TT_DATABASE_HOT_STANDBY
database.instance_uuid

An instance UUID.

By default, instance UUIDs are generated automatically. database.instance_uuid can be used to specify an instance identifier manually.

UUIDs should follow these rules:

  • The values must be true unique identifiers, not shared by other instances or replica sets within the common infrastructure.
  • The values must be used consistently, not changed after the initial setup. The initial values are stored in snapshot files and are checked whenever the system is restarted.
  • The values must comply with RFC 4122. The nil UUID is not allowed.

See also: database.replicaset_uuid


Type: string
Default: box.NULL
Environment variable: TT_DATABASE_INSTANCE_UUID
database.mode

An instance’s operating mode. This option is in effect if replication.failover is set to off.

The following modes are available:

  • rw: an instance is in read-write mode.
  • ro: an instance is in read-only mode.

If not specified explicitly, the default value depends on the number of instances in a replica set. For a single instance, the rw mode is used, while for multiple instances, the ro mode is used.

Example

You can set the database.mode option to rw on all instances in a replica set to make a master-master configuration. In this case, replication.failover should be set to off.

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: off

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'

# Load sample data
app:
  file: 'myapp.lua'

Type: string
Default: box.NULL (the actual default value depends on the number of instances in a replica set)
Environment variable: TT_DATABASE_MODE
database.replicaset_uuid

A replica set UUID.

By default, replica set UUIDs are generated automatically. database.replicaset_uuid can be used to specify a replica set identifier manually.

See also: database.instance_uuid


Type: string
Default: box.NULL
Environment variable: TT_DATABASE_REPLICASET_UUID
database.txn_isolation

A transaction isolation level.
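
Example

A sketch that sets the default isolation level for transactions on an instance. It assumes that isolation levels other than best-effort matter mainly when the MVCC transaction manager (database.use_mvcc_engine) is enabled, so both options are set together.

database:
  use_mvcc_engine: true
  txn_isolation: 'read-committed'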


Type: string
Default: best-effort
Possible values: best-effort, read-committed, read-confirmed
Environment variable: TT_DATABASE_TXN_ISOLATION
database.txn_timeout

A timeout (in seconds) after which the transaction is rolled back.

See also: box.begin()


Type: number
Default: 3153600000 (~100 years)
Environment variable: TT_DATABASE_TXN_TIMEOUT
database.use_mvcc_engine

Whether the MVCC transaction manager is enabled.


Type: boolean
Default: false
Environment variable: TT_DATABASE_USE_MVCC_ENGINE

The failover section defines parameters related to a supervised failover.

Note

failover can be defined in the global scope only.

failover.log.to

Since: 3.3.0

Enterprise Edition

Configuring failover.log.to and failover.log.file parameters is available in the Enterprise Edition only.

Define a location Tarantool sends failover logs to. This option accepts the following values:

  • stderr: write logs to the standard error stream.
  • file: write logs to a file (see failover.log.file).

Type: string
Default: 'stderr'
Environment variable: TT_FAILOVER_LOG_TO
failover.log.file

Since: 3.3.0

Specify the file that failover logs are written to. To write logs to a file, set failover.log.to to file; otherwise, failover.log.file is ignored.

Example

The example below shows how to write failover logs to a file placed in the specified directory:

failover:
  log:
    to: file
    file: var/log/failover.log

Type: string
Default: nil
Environment variable: TT_FAILOVER_LOG_FILE
failover.call_timeout

Since: 3.1.0

A call timeout (in seconds) for monitoring and failover requests to an instance.

Type: number
Default: 1
Environment variable: TT_FAILOVER_CALL_TIMEOUT
failover.connect_timeout

Since: 3.1.0

A connection timeout (in seconds) for monitoring and failover requests to an instance.

Type: number
Default: 1
Environment variable: TT_FAILOVER_CONNECT_TIMEOUT
failover.lease_interval

Since: 3.1.0

A time interval (in seconds) that specifies how long an instance can remain a leader without receiving renewal requests from a coordinator. When this interval expires, the leader switches to read-only mode. The instance performs this switch itself, so it works even if there is no connectivity between the instance and the coordinator.

Type: number
Default: 30
Environment variable: TT_FAILOVER_LEASE_INTERVAL
failover.probe_interval

Since: 3.1.0

A time interval (in seconds) that specifies how often a monitoring service of the failover coordinator polls an instance for its status.

Type: number
Default: 10
Environment variable: TT_FAILOVER_PROBE_INTERVAL
failover.renew_interval

Since: 3.1.0

A time interval (in seconds) that specifies how often a failover coordinator sends read-write deadline renewals.
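
Example

A sketch of a failover section that tunes the coordinator timings described above. It assumes that replication.failover is set to supervised. The values are illustrative. The only relationship deliberately kept is that renew_interval is much shorter than lease_interval, so a healthy leader gets its read-write deadline renewed several times before the lease expires.

replication:
  failover: supervised

failover:
  call_timeout: 1
  connect_timeout: 1
  probe_interval: 5
  renew_interval: 5
  lease_interval: 20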

Type: number
Default: 10
Environment variable: TT_FAILOVER_RENEW_INTERVAL

failover.stateboard.* options define configuration parameters related to maintaining the state of failover coordinators in a remote etcd-based storage.

See also: Active and passive coordinators

failover.stateboard.keepalive_interval

Since: 3.1.0

A time interval (in seconds) that specifies how long transient state information is stored and how quickly a lock expires.

Note

failover.stateboard.keepalive_interval should be smaller than failover.lease_interval. Otherwise, switching of a coordinator causes a replica set leader to go to read-only mode for some time.

Type: number
Default: 10
Environment variable: TT_FAILOVER_STATEBOARD_KEEPALIVE_INTERVAL
failover.stateboard.renew_interval

Since: 3.1.0

A time interval (in seconds) that specifies how often a failover coordinator writes its state information to etcd. This option also determines the frequency at which an active coordinator reads new commands from etcd.
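
Example

A sketch that keeps failover.stateboard.keepalive_interval smaller than failover.lease_interval, as the note for keepalive_interval recommends. The values are illustrative assumptions.

failover:
  lease_interval: 30
  stateboard:
    keepalive_interval: 15
    renew_interval: 2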

Type: number
Default: 2
Environment variable: TT_FAILOVER_STATEBOARD_RENEW_INTERVAL

The feedback section describes configuration parameters for sending information about a running Tarantool instance to the specified feedback server.

Note

feedback can be defined in any scope.

feedback.crashinfo

Whether to send crash information in the case of an instance failure. This information includes:

  • General information from the uname output.
  • Build information.
  • The crash reason.
  • The stack trace.

To turn off sending crash information, set this option to false.


Type: boolean
Default: true
Environment variable: TT_FEEDBACK_CRASHINFO
feedback.enabled

Whether to send information about a running instance to the feedback server. To turn off sending feedback, set this option to false.


Type: boolean
Default: true
Environment variable: TT_FEEDBACK_ENABLED
feedback.host

The address to which information is sent.


Type: string
Environment variable: TT_FEEDBACK_HOST
feedback.interval

The interval (in seconds) between sending information to the feedback server.


Type: number
Default: 3600
Environment variable: TT_FEEDBACK_INTERVAL
feedback.metrics_collect_interval

The interval (in seconds) for collecting metrics.


Type: number
Default: 60
Environment variable: TT_FEEDBACK_METRICS_COLLECT_INTERVAL
feedback.metrics_limit

The maximum size of memory (in bytes) used to store metrics before sending them to the feedback server. If the size of collected metrics exceeds this value, earlier metrics are dropped.


Type: integer
Default: 1024 * 1024 (1048576)
Environment variable: TT_FEEDBACK_METRICS_LIMIT
feedback.send_metrics

Whether to send metrics to the feedback server. Note that all collected metrics are dropped after sending them to the feedback server.
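
Example

A sketch that keeps general feedback enabled but stops sending crash information and metrics, and sends feedback once every two hours instead of every hour. The values are illustrative.

feedback:
  enabled: true
  crashinfo: false
  send_metrics: false
  interval: 7200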


Type: boolean
Default: true
Environment variable: TT_FEEDBACK_SEND_METRICS

The fiber section describes options related to configuring fibers, yields, and cooperative multitasking.

Note

fiber can be defined in any scope.

fiber.io_collect_interval

The time period (in seconds) a fiber sleeps between iterations of the event loop.

fiber.io_collect_interval can be used to reduce CPU load in deployments where the number of client connections is large, but requests are not so frequent (for example, each connection issues just a handful of requests per second).
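
Example

A sketch for the scenario above: many connections, each sending only a few requests per second. The 0.01-second interval is an illustrative assumption. A longer sleep between event loop iterations may add latency to each request, so measure CPU usage and response times before and after the change.

fiber:
  io_collect_interval: 0.01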


Type: number
Default: box.NULL
Environment variable: TT_FIBER_IO_COLLECT_INTERVAL
fiber.too_long_threshold

If processing a request takes longer than the given period (in seconds), the fiber warns about it in the log.

fiber.too_long_threshold has effect only if log.level is greater than or equal to 4 (warn).


Type: number
Default: 0.5
Environment variable: TT_FIBER_TOO_LONG_THRESHOLD
fiber.worker_pool_threads

The maximum number of threads to use during execution of certain internal processes (for example, socket.getaddrinfo() and coio_call()).


Type: number
Default: 4
Environment variable: TT_FIBER_WORKER_POOL_THREADS

This section describes options related to configuring time periods for fiber slices. See fiber.set_max_slice for details and examples.

fiber.slice.warn

Set a time period (in seconds) that specifies the warning slice.


Type: number
Default: 0.5
Environment variable: TT_FIBER_SLICE_WARN
fiber.slice.err

Set a time period (in seconds) that specifies the error slice.
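
Example

A sketch that makes slice warnings appear earlier than the default and sets the error slice to twice the warning slice. The values are illustrative. See fiber.set_max_slice for how the warning and error slices behave at run time.

fiber:
  slice:
    warn: 0.25
    err: 0.5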


Type: number
Default: 1
Environment variable: TT_FIBER_SLICE_ERR

This section describes options related to configuring the fiber.top() function, normally used for debug purposes. fiber.top() shows all alive fibers and their CPU consumption.

fiber.top.enabled

Enable or disable the fiber.top() function.

Enabling fiber.top() slows down fiber switching by about 15%, so it is disabled by default.
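
Example

Because enabling fiber.top() slows down fiber switching, you might enable it only on the instance you are debugging. This sketch relies on fiber being definable in any scope. The group, replica set, and instance names are assumptions that follow the other examples in this section.

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            fiber:
              top:
                enabled: true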


Type: boolean
Default: false
Environment variable: TT_FIBER_TOP_ENABLED

Enterprise Edition

Configuring flightrec parameters is available in the Enterprise Edition only.

The flightrec section describes options related to the flight recorder configuration.

Note

flightrec can be defined in any scope.

flightrec.enabled

Enable the flight recorder.


Type: boolean
Default: false
Environment variable: TT_FLIGHTREC_ENABLED
flightrec.logs_size

Specify the size (in bytes) of the log storage. You can set this option to 0 to disable the log storage.


Type: integer
Default: 10485760
Environment variable: TT_FLIGHTREC_LOGS_SIZE
flightrec.logs_max_msg_size

Specify the maximum size (in bytes) of the log message. The log message is truncated if its size exceeds this limit.


Type: integer
Default: 4096
Maximum: 16384
Environment variable: TT_FLIGHTREC_LOGS_MAX_MSG_SIZE
flightrec.logs_log_level

Specify the level of detail of the flight recorder log. The default value is 6 (VERBOSE). To learn more about log levels, see the log.level option description. Note that the flightrec.logs_log_level value might differ from log.level.


Type: integer
Default: 6
Environment variable: TT_FLIGHTREC_LOGS_LOG_LEVEL
flightrec.metrics_period

Specify the time period (in seconds) that defines how long metrics are stored from the moment of a dump. This value therefore determines how much historical metrics data is available at the moment of a crash. The frequency of metric dumps is defined by flightrec.metrics_interval.


Type: integer
Default: 180
Environment variable: TT_FLIGHTREC_METRICS_PERIOD
flightrec.metrics_interval

Specify the time interval (in seconds) that defines the frequency of dumping metrics. This value shouldn’t exceed flightrec.metrics_period.
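
Example

A sketch that enables the flight recorder with a smaller log storage and a finer metrics resolution. The values are illustrative. metrics_interval stays below metrics_period, as required above.

flightrec:
  enabled: true
  logs_size: 5242880      # 5 MiB of log storage
  logs_log_level: 5       # INFO
  metrics_period: 120
  metrics_interval: 0.5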


Type: number
Default: 1.0
Minimum: 0.001
Environment variable: TT_FLIGHTREC_METRICS_INTERVAL

Note

Given that the average size of a metrics entry is 2 kB, you can estimate the size of the metrics storage as follows:

(flightrec.metrics_period / flightrec.metrics_interval) * 2 kB
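
For example, with the default values flightrec.metrics_period = 180 and flightrec.metrics_interval = 1.0, the estimate is:

(180 / 1.0) * 2 kB = 180 * 2 kB = 360 kB

With metrics_period = 120 and metrics_interval = 0.5, the estimate becomes (120 / 0.5) * 2 kB = 240 * 2 kB = 480 kB. Both figures rely on the 2 kB average entry size, so actual usage may differ.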
flightrec.requests_size

Specify the size (in bytes) of the storage for request and response data. Set this parameter to 0 to disable storing requests and responses.


Type: integer
Default: 10485760
Environment variable: TT_FLIGHTREC_REQUESTS_SIZE
flightrec.requests_max_req_size

Specify the maximum size (in bytes) of a request entry. A request entry is truncated if this size is exceeded.


Type: integer
Default: 16384
Environment variable: TT_FLIGHTREC_REQUESTS_MAX_REQ_SIZE
flightrec.requests_max_res_size

Specify the maximum size (in bytes) of a response entry. A response entry is truncated if this size is exceeded.


Type: integer
Default: 16384
Environment variable: TT_FLIGHTREC_REQUESTS_MAX_RES_SIZE

The iproto section is used to configure parameters related to communication with and between cluster instances.

Note

iproto can be defined in any scope.

iproto.listen

An array of URIs used to listen for incoming requests. If required, you can enable SSL for specific URIs by providing additional parameters (<uri>.params.*).

Note that a URI value can’t contain parameters, a login, or a password.

Example

In the example below, iproto.listen is set explicitly for each instance in a cluster:

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'

See also: Connections


Type: array
Default: nil
Environment variable: TT_IPROTO_LISTEN
iproto.net_msg_max

To handle messages, Tarantool allocates fibers. To prevent fiber overhead from affecting the whole system, Tarantool limits the number of messages the fibers handle, so some pending requests are blocked.

  • On powerful systems, increase net_msg_max, and the scheduler starts processing pending requests immediately.
  • On weaker systems, decrease net_msg_max, and the overhead may decrease. However, this may take some time because the scheduler must wait until already-running requests finish.

When net_msg_max is reached, Tarantool suspends processing of incoming packets until it has processed earlier messages. This is not a direct restriction on the number of fibers that handle network messages; rather, it is a system-wide restriction of channel bandwidth. This in turn restricts the number of incoming network messages that the transaction processor thread handles and therefore indirectly affects the fibers that handle network messages.

Note

The number of fibers is smaller than the number of messages because messages can be released as soon as they are delivered, while incoming requests might not be processed until some time after delivery.
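
Example

A sketch for a powerful system where requests queue up behind the limit. Doubling the default value (768 * 2 = 1536) is an illustrative starting point, not a tuned recommendation.

iproto:
  net_msg_max: 1536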


Type: integer
Default: 768
Environment variable: TT_IPROTO_NET_MSG_MAX
iproto.readahead

The size of the read-ahead buffer associated with a client connection. The larger the buffer, the more memory an active connection consumes, and the more requests can be read from the operating system buffer in a single system call.

The recommendation is to make sure that the buffer can contain at least a few dozen requests. Therefore, if a typical tuple in a request is large, e.g. a few kilobytes or even megabytes, the read-ahead buffer size should be increased. If batched request processing is not used, it’s prudent to leave this setting at its default.
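
Example

A sizing sketch that assumes a typical request of about 4 KB and a target of roughly 32 requests per buffer: 32 * 4096 bytes = 131072 bytes. Both numbers are assumptions for illustration.

iproto:
  # About 32 requests of ~4 KB each (assumed workload)
  readahead: 131072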


Type: integer
Default: 16320
Environment variable: TT_IPROTO_READAHEAD
iproto.threads

The number of network threads. Some unusual workloads fully load the network thread while the transaction processor thread is not fully loaded, which makes the network thread a bottleneck. In that case, set iproto.threads to 2 or more. The operating system kernel determines which connection goes to which thread.
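
Example

A sketch for the workload described above, where the network thread is the bottleneck. The value 2 is the smallest setting the recommendation mentions.

iproto:
  threads: 2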


Type: integer
Default: 1
Environment variable: TT_IPROTO_THREADS

iproto.advertise.client

A URI used to advertise the current instance to clients.

The iproto.advertise.client option accepts a URI in the following formats:

  • An address: host:port.
  • A Unix domain socket: unix/:.

Note that this option doesn’t allow setting a username and password. If a remote client needs this information, it should be delivered outside of the cluster configuration.
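
Example

A sketch that advertises a concrete address to clients. The address 192.168.10.10:3301 is hypothetical. It must be reachable by clients and, as the following note states, can’t use a wildcard host or port 0.

iproto:
  advertise:
    client: '192.168.10.10:3301'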

Note

The host value cannot be 0.0.0.0/[::] and the port value cannot be 0.


Type: string
Default: box.NULL
Environment variable: TT_IPROTO_ADVERTISE_CLIENT
iproto.advertise.peer

Settings used to advertise the current instance to other cluster members. The format of these settings is described in iproto.advertise.<peer_or_sharding>.*.

Example

In the example below, the following configuration options are specified:

  • In the credentials section, the replicator user with the replication role is created.
  • iproto.advertise.peer specifies that other instances should connect to an address defined in iproto.listen using the replicator user.

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]

iproto:
  advertise:
    peer:
      login: replicator

replication:
  failover: election

groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'

Type: map
Environment variable: see iproto.advertise.<peer_or_sharding>.*
iproto.advertise.sharding

Settings used to advertise the current instance to a router and rebalancer. The format of these settings is described in iproto.advertise.<peer_or_sharding>.*.

Note

If iproto.advertise.sharding is not specified, advertise settings from iproto.advertise.peer are used.

Example

In the example below, the following configuration options are specified:

  • In the credentials section, the replicator and storage users are created.
  • iproto.advertise.peer specifies that other instances should connect to an address defined in iproto.listen with the replicator user.
  • iproto.advertise.sharding specifies that a router should connect to storages using an address defined in iproto.listen with the storage user.

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]

iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage

Type: map
Environment variable: see iproto.advertise.<peer_or_sharding>.*

iproto.advertise.<peer_or_sharding>.uri

(Optional) A URI used to advertise the current instance. By default, the URI defined in iproto.listen is used to advertise the current instance.

Note

The host value cannot be 0.0.0.0/[::] and the port value cannot be 0.
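
Example

A sketch for an instance that listens on all interfaces. Because 0.0.0.0 can’t be advertised, the peer URI is set explicitly. The address 192.168.10.10 is hypothetical. The replicator user is assumed to be defined in the credentials section, as in the examples above. No password is given here, so it is taken from that user’s credentials.

iproto:
  listen:
  - uri: '0.0.0.0:3301'
  advertise:
    peer:
      uri: '192.168.10.10:3301'
      login: replicator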


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_URI, TT_IPROTO_ADVERTISE_SHARDING_URI
iproto.advertise.<peer_or_sharding>.login

(Optional) A username used to connect to the current instance. If a username is not set, the guest user is used.


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_LOGIN, TT_IPROTO_ADVERTISE_SHARDING_LOGIN
iproto.advertise.<peer_or_sharding>.password

(Optional) A password for the specified user. If a login is specified but a password is missing, it is taken from the user’s credentials.


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PASSWORD, TT_IPROTO_ADVERTISE_SHARDING_PASSWORD
iproto.advertise.<peer_or_sharding>.params

(Optional) URI parameters (<uri>.params.*) required for connecting to the current instance.

Enterprise Edition

TLS traffic encryption is supported by the Enterprise Edition only.

URI parameters that can be used in the iproto.listen.<uri>.params and iproto.advertise.<peer_or_sharding>.params options.

Note

Note that <uri>.params.* options don’t have corresponding environment variables for URIs specified in iproto.listen.

<uri>.params.transport

Allows you to enable traffic encryption for client-server communications over binary connections. In a Tarantool cluster, one instance might act as both the server that accepts connections from other instances and the client that connects to other instances.

<uri>.params.transport accepts one of the following values:

  • plain (default): turn off traffic encryption.
  • ssl: encrypt traffic by using the TLS 1.2 protocol (Enterprise Edition only).

Example

The example below demonstrates how to enable traffic encryption by using a self-signed server certificate. The following parameters are specified for each instance:

  • ssl_cert_file: a path to an SSL certificate file.
  • ssl_key_file: a path to a private SSL key file.

replicaset001:
  replication:
    failover: manual
  leader: instance001
  iproto:
    advertise:
      peer:
        login: replicator
  instances:
    instance001:
      iproto:
        listen:
        - uri: '127.0.0.1:3301'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'
    instance002:
      iproto:
        listen:
        - uri: '127.0.0.1:3302'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'
    instance003:
      iproto:
        listen:
        - uri: '127.0.0.1:3303'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'

Example on GitHub: ssl_without_ca


Type: string
Default: 'plain'
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_TRANSPORT, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_TRANSPORT
<uri>.params.ssl_ca_file

(Optional) A path to a trusted certificate authorities (CA) file. If not set, the peer won’t be checked for authenticity.

Both a server and a client can use the ssl_ca_file parameter:

  • If it’s on the server side, the server verifies the client.
  • If it’s on the client side, the client verifies the server.
  • If both sides have the CA files, the server and the client verify each other.
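
Example

A sketch of mutual verification, where both sides have a CA file. The server-side parameters go to iproto.listen. The parameters that other instances use to connect to this one go to iproto.advertise.peer. All file paths under certs/ and the replicator user are assumptions.

iproto:
  listen:
  - uri: '127.0.0.1:3301'
    params:
      transport: 'ssl'
      ssl_ca_file: 'certs/root_ca.crt'
      ssl_cert_file: 'certs/server.crt'
      ssl_key_file: 'certs/server.key'
  advertise:
    peer:
      login: replicator
      params:
        transport: 'ssl'
        ssl_ca_file: 'certs/root_ca.crt'
        ssl_cert_file: 'certs/client.crt'
        ssl_key_file: 'certs/client.key'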

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CA_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CA_FILE
<uri>.params.ssl_cert_file

A path to an SSL certificate file:

  • For a server, it’s mandatory.
  • For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CERT_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CERT_FILE
<uri>.params.ssl_ciphers

(Optional) A colon-separated (:) list of SSL cipher suites the connection can use. Note that the list is not validated: if a cipher suite is unknown, Tarantool ignores it. If none of the listed suites can be used, Tarantool doesn’t establish the connection and writes to the log that no shared cipher was found.

The supported cipher suites are:

  • ECDHE-ECDSA-AES256-GCM-SHA384
  • ECDHE-RSA-AES256-GCM-SHA384
  • DHE-RSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-CHACHA20-POLY1305
  • ECDHE-RSA-CHACHA20-POLY1305
  • DHE-RSA-CHACHA20-POLY1305
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES128-GCM-SHA256
  • DHE-RSA-AES128-GCM-SHA256
  • ECDHE-ECDSA-AES256-SHA384
  • ECDHE-RSA-AES256-SHA384
  • DHE-RSA-AES256-SHA256
  • ECDHE-ECDSA-AES128-SHA256
  • ECDHE-RSA-AES128-SHA256
  • DHE-RSA-AES128-SHA256
  • ECDHE-ECDSA-AES256-SHA
  • ECDHE-RSA-AES256-SHA
  • DHE-RSA-AES256-SHA
  • ECDHE-ECDSA-AES128-SHA
  • ECDHE-RSA-AES128-SHA
  • DHE-RSA-AES128-SHA
  • AES256-GCM-SHA384
  • AES128-GCM-SHA256
  • AES256-SHA256
  • AES128-SHA256
  • AES256-SHA
  • AES128-SHA
  • GOST2012-GOST8912-GOST8912
  • GOST2001-GOST89-GOST89

For detailed information on SSL ciphers and their syntax, refer to the OpenSSL documentation.
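
Example

A sketch that restricts an encrypted listening URI to two of the supported suites listed above. The choice of suites and the certificate paths are illustrative.

iproto:
  listen:
  - uri: '127.0.0.1:3301'
    params:
      transport: 'ssl'
      ssl_cert_file: 'certs/server.crt'
      ssl_key_file: 'certs/server.key'
      ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'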

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CIPHERS, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CIPHERS
<uri>.params.ssl_key_file

A path to a private SSL key file:

  • For a server, it’s mandatory.
  • For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.

If the private key is encrypted, provide a password for it in the ssl_password or ssl_password_file parameter.

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_KEY_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_KEY_FILE
<uri>.params.ssl_password

(Optional) A password for an encrypted private SSL key provided using ssl_key_file. Alternatively, the password can be provided in ssl_password_file.

Tarantool applies the ssl_password and ssl_password_file parameters in the following order:

  1. If ssl_password is provided, Tarantool tries to decrypt the private key with it.
  2. If ssl_password is incorrect or isn’t provided, Tarantool tries all passwords from ssl_password_file one by one in the order they are written.
  3. If ssl_password and all passwords from ssl_password_file are incorrect, or none of them is provided, Tarantool treats the private key as unencrypted.
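
Example

A sketch with an encrypted private key. The file paths and the password are hypothetical. Following the order above, Tarantool first tries ssl_password and then each line of certs/passwords.txt.

iproto:
  listen:
  - uri: '127.0.0.1:3301'
    params:
      transport: 'ssl'
      ssl_cert_file: 'certs/server.crt'
      ssl_key_file: 'certs/server_encrypted.key'
      ssl_password: 'keypassword'
      ssl_password_file: 'certs/passwords.txt'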

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_PASSWORD, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_PASSWORD
<uri>.params.ssl_password_file

(Optional) A text file with one or more passwords for encrypted private SSL keys provided using ssl_key_file (each on a separate line). Alternatively, the password can be provided in ssl_password.

See also: <uri>.params.transport


Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_PASSWORD_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_PASSWORD_FILE

The groups section provides the ability to define the full topology of a Tarantool cluster.

Note

groups can be defined in the global scope only.

groups.<group_name>

A group name.

The following rules are applied to group names:

  • The maximum number of characters is 63.
  • Must start with a letter.
  • Can contain lowercase letters (a-z).
  • Can contain digits (0-9).
  • Can contain the following characters: -, _.
groups.<group_name>.replicasets

Replica sets that belong to this group. See replicasets.

groups.<group_name>.<config_parameter>

Any configuration parameter that can be defined in the group scope. For example, iproto and database configuration parameters defined at the group level are applied to all instances in this group.

Note

replicasets can be defined in the group scope only.

replicasets.<replicaset_name>

A replica set name.

Note that the rules applied to a replica set name are the same as for groups. Learn more in groups.<group_name>.

replicasets.<replicaset_name>.leader

A replica set leader. This option can be used to set a replica set leader when replication.failover is set to manual.

To perform controlled failover, <replicaset_name>.leader can be temporarily removed or set to null.

Example

replication:
  failover: manual

groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
replicasets.<replicaset_name>.bootstrap_leader

A bootstrap leader for a replica set. To specify a bootstrap leader manually, you need to set replication.bootstrap_strategy to config.

Example

groups:
  group001:
    replicasets:
      replicaset001:
        replication:
          bootstrap_strategy: config
        bootstrap_leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
replicasets.<replicaset_name>.instances

Instances that belong to this replica set. See instances.

replicasets.<replicaset_name>.<config_parameter>

Any configuration parameter that can be defined in the replica set scope. For example, iproto and database configuration parameters defined at the replica set level are applied to all instances in this replica set.

Note

instances can be defined in the replica set scope only.

instances.<instance_name>

An instance name.

Note that the rules applied to an instance name are the same as for groups. Learn more in groups.<group_name>.

instances.<instance_name>.<config_parameter>

Any configuration parameter that can be defined in the instance scope. For example, iproto and database configuration parameters defined at the instance level are applied to this instance only.
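
Example

A sketch of the same option set at several levels. It assumes that a value from a more specific scope takes precedence: instance over replica set, replica set over group, group over global. Under that assumption, instance001 logs at the debug level and instance002 at the verbose level. The names and levels are illustrative.

log:
  level: 'info'              # global scope
groups:
  group001:
    log:
      level: 'warn'          # all instances in group001
    replicasets:
      replicaset001:
        log:
          level: 'verbose'   # overrides the group value for replicaset001
        instances:
          instance001:
            log:
              level: 'debug' # overrides the replica set value for instance001 only
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'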

Since version 3.3.0, the instance configuration supports the isolated option.

The option accepts a boolean value and is set to false by default. Setting isolated: true puts the instance into the isolated mode.

The isolated mode allows the user to temporarily isolate an instance and perform maintenance activities on it.

In the isolated mode:

Note

An isolated instance can’t be bootstrapped (a local snapshot is required to start).

Example

The example below shows how to isolate an instance:

groups:
  g:
    replicasets:
      r:
        instances:
          i-001: {}
          i-002: {}
          i-003: {}
          i-004:
            isolated: true

The labels section allows adding custom attributes to the configuration. Attributes must be key: value pairs with string keys and values.

Note

labels can be defined in any scope.

labels.<label_name>

A value of the label with the specified name.

Example

The example below shows how to define labels on the replica set and instance levels:

groups:
  group001:
    replicasets:
      replicaset001:
        labels:
          dc: 'east'
          production: 'false'
        instances:
          instance001:
            labels:
              rack: '10'
              production: 'true'

See also: Adding labels

The log section defines configuration parameters related to logging. To handle logging in your application, use the log module.

Note

log can be defined in any scope.

log.to

Define a location Tarantool sends logs to. This option accepts the following values:

  • stderr: write logs to the standard error stream.
  • file: write logs to a file (see log.file).
  • pipe: start a program and write logs to its standard input (see log.pipe).
  • syslog: write logs to a system logger (see log.syslog.*).
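
Example

A sketch that sends logs to the system logger. The identity and facility values are illustrative assumptions. See log.syslog.* for the full set of options.

log:
  to: syslog
  syslog:
    identity: 'tarantool'
    facility: 'local7'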

Type: string
Default: 'stderr'
Environment variable: TT_LOG_TO
log.file

Specify the file that logs are written to. To write logs to a file, set log.to to file; otherwise, log.file is ignored.

Example

The example below shows how to write logs to a file placed in the specified directory:

log:
  to: file
  file: var/log/{{ instance_name }}/instance.log

Example on GitHub: log_file


Type: string
Default: 'var/log/{{ instance_name }}/tarantool.log'
Environment variable: TT_LOG_FILE
log.format

Specify a format that is used for a log entry. The following formats are supported:

  • plain: a log entry is formatted as plain text. Example:

    2024-04-09 11:00:10.369 [12089] main/104/interactive I> log level 5 (INFO)
    
  • json: a log entry is formatted as JSON and includes additional fields. Example:

    {
      "time": "2024-04-09T11:00:57.174+0300",
      "level": "INFO",
      "message": "log level 5 (INFO)",
      "pid": 12160,
      "cord_name": "main",
      "fiber_id": 104,
      "fiber_name": "interactive",
      "file": "src/main.cc",
      "line": 498
    }
    

Type: string
Default: 'plain'
Environment variable: TT_LOG_FORMAT
log.level

Specify the level of detail of logs. The following levels are available:

  • 0 – fatal
  • 1 – syserror
  • 2 – error
  • 3 – crit
  • 4 – warn
  • 5 – info
  • 6 – verbose
  • 7 – debug

By setting log.level, you can enable logging of all events with severities above or equal to the given level.

Example

The example below shows how to log all events with severities above or equal to the VERBOSE level.

log:
  level: 'verbose'

Example on GitHub: log_level


Type: number, string
Default: 5
Environment variable: TT_LOG_LEVEL
log.modules

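Configure log levels (see log.level) for specific modules.

You can specify a logging level for the following module types:

  • Modules (files) that use the default logger.
  • Modules that use custom loggers created with log.new().
  • C modules.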

Example 1: Set log levels for files that use the default logger

Suppose you have two identical modules located at the following paths: test/module1.lua and test/module2.lua. These modules use the default logger and look as follows:

return {
    say_hello = function()
        local log = require('log')
        log.info('Info message from module1')
    end
}

To configure logging levels, you need to provide module names corresponding to paths to these modules:

log:
  modules:
    test.module1: 'verbose'
    test.module2: 'error'
app:
  file: 'app.lua'

To load these modules in your application (app.lua), you need to add the corresponding require directives:

module1 = require('test.module1')
module2 = require('test.module2')

Given that module1 has the verbose logging level and module2 has the error level, calling module1.say_hello() shows a message but module2.say_hello() is swallowed:

-- Prints 'info' messages --
module1.say_hello()
--[[
[92617] main/103/interactive/test.logging.module1 I> Info message from module1
---
...
--]]

-- Swallows 'info' messages --
module2.say_hello()
--[[
---
...
--]]

Example on GitHub: log_existing_modules

Example 2: Set log levels for modules that use custom loggers

This example shows how to set the verbose level for module1 and the error level for module2:

log:
  modules:
    module1: 'verbose'
    module2: 'error'
app:
  file: 'app.lua'

To create custom loggers in your application (app.lua), call the log.new() function:

-- Creates new loggers --
module1_log = require('log').new('module1')
module2_log = require('log').new('module2')

Given that module1 has the verbose logging level and module2 has the error level, calling module1_log.info() shows a message but module2_log.info() is swallowed:

-- Prints 'info' messages --
module1_log.info('Info message from module1')
--[[
[16300] main/103/interactive/module1 I> Info message from module1
---
...
--]]

-- Swallows 'debug' messages --
module1_log.debug('Debug message from module1')
--[[
---
...
--]]

-- Swallows 'info' messages --
module2_log.info('Info message from module2')
--[[
---
...
--]]

Example on GitHub: log_new_modules

Example 3: Set a log level for C modules

This example shows how to set the info level for the tarantool module:

log:
  modules:
    tarantool: 'info'
app:
  file: 'app.lua'

The specified level affects messages logged from C modules:

ffi = require('ffi')

-- Prints 'info' messages --
ffi.C._say(ffi.C.S_INFO, nil, 0, nil, 'Info message from C module')
--[[
[6024] main/103/interactive I> Info message from C module
---
...
--]]

-- Swallows 'debug' messages --
ffi.C._say(ffi.C.S_DEBUG, nil, 0, nil, 'Debug message from C module')
--[[
---
...
--]]

The example above uses the LuaJIT ffi library to call C functions provided by the say module.

Example on GitHub: log_existing_c_modules


Type: map
Default: box.NULL
Environment variable: TT_LOG_MODULES
log.nonblock

Specify the logging behavior if the system is not ready to write. If set to true, Tarantool does not block during logging when the system is not ready for writing and drops the message instead. Using this value may improve logging performance at the cost of losing some log messages.

Note

The option only has an effect if log.to is set to syslog or pipe.
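For example, the sketch below sends logs to syslog without blocking. The unix:/dev/log socket path is the Linux example given for log.syslog.server and may differ on your system:

log:
  to: syslog
  nonblock: true
  syslog:
    server: 'unix:/dev/log'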


Type: boolean
Default: false
Environment variable: TT_LOG_NONBLOCK
log.pipe

Start a program and write logs to its standard input (stdin). To send logs to a program’s standard input, you need to set log.to to pipe.

Example

In the example below, Tarantool writes logs to the standard input of cronolog:

log:
  to: pipe
  pipe: 'cronolog tarantool.log'

Example on GitHub: log_pipe


Type: string
Default: box.NULL
Environment variable: TT_LOG_PIPE

log.syslog.facility

Specify the syslog facility to be used when syslog is enabled. To write logs to syslog, you need to set log.to to syslog.
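For example, to use the user facility instead of the default local7 (user is an arbitrary choice from the possible values listed below; pick the facility your syslog daemon is configured to route):

log:
  to: syslog
  syslog:
    facility: 'user'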


Type: string
Possible values: 'auth', 'authpriv', 'cron', 'daemon', 'ftp', 'kern', 'lpr', 'mail', 'news', 'security', 'syslog', 'user', 'uucp', 'local0', 'local1', 'local2', 'local3', 'local4', 'local5', 'local6', 'local7'
Default: 'local7'
Environment variable: TT_LOG_SYSLOG_FACILITY
log.syslog.identity

Specify an application name used to identify Tarantool messages in syslog logs. To write logs to syslog, you need to set log.to to syslog.
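For example, to tag Tarantool messages in syslog with a custom application name (myapp is a hypothetical name):

log:
  to: syslog
  syslog:
    identity: 'myapp'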


Type: string
Default: 'tarantool'
Environment variable: TT_LOG_SYSLOG_IDENTITY
log.syslog.server

Set the location of a syslog server. This option accepts one of the following values:

  • An IPv4 address and port. Example: 127.0.0.1:514.
  • A Unix socket path starting with unix:. Examples: unix:/dev/log on Linux or unix:/var/run/syslog on macOS.

To write logs to syslog, you need to set log.to to syslog.

Example

In the example below, Tarantool writes logs to a syslog server that listens for logging messages on the 127.0.0.1:514 address:

log:
  to: syslog
  syslog:
    server: '127.0.0.1:514'

Example on GitHub: log_syslog


Type: string
Default: box.NULL
Environment variable: TT_LOG_SYSLOG_SERVER

The lua section outlines the configuration parameters related to the Lua environment within Tarantool.

Note

lua can be defined in any scope.

lua.memory

Specify the maximum amount of memory available to Lua scripts, in bytes.

When the specified value exceeds the current memory usage, the new limit takes effect immediately without a restart. However, when the specified value is lower than the current memory usage, a restart of the instance is required for the change to take effect.

Example to set the Lua memory limit to 4 GB:

lua:
  memory: 4294967296

Type: integer
Default: 2147483648 (2 GB)
Environment variable: TT_LUA_MEMORY

The memtx section is used to configure parameters related to the memtx engine.

Note

memtx can be defined in any scope.

memtx.allocator

Specify the allocator that manages memory for memtx tuples. Possible values:

  • system – the memory is allocated as needed, checking that the quota is not exceeded. The allocator is based on the malloc function.
  • small – a slab allocator. The allocator repeatedly uses a memory block to allocate objects of the same type. Note that this allocator is prone to unresolvable fragmentation on specific workloads, so you can switch to system in such cases.
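For example, to switch from the default small allocator to system on a workload that suffers from fragmentation:

memtx:
  allocator: 'system'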

Type: string
Default: 'small'
Environment variable: TT_MEMTX_ALLOCATOR
memtx.max_tuple_size

Size of the largest allocation unit for the memtx storage engine in bytes. It can be increased if it is necessary to store large tuples.
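For example, to allow tuples of up to 5 MB (5 × 1048576 = 5242880 bytes; the 5 MB figure is only an illustration):

memtx:
  max_tuple_size: 5242880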


Type: integer
Default: 1048576
Environment variable: TT_MEMTX_MAX_TUPLE_SIZE
memtx.memory

The amount of memory in bytes that Tarantool allocates to store tuples. When the limit is reached, INSERT and UPDATE requests fail with the ER_MEMORY_ISSUE error. The server does not go beyond the memtx.memory limit to allocate tuples, but there is additional memory used to store indexes and connection information.

Example

In the example below, the memory size is set to 1 GB (1073741824 bytes).

memtx:
  memory: 1073741824

Type: integer
Default: 268435456
Environment variable: TT_MEMTX_MEMORY
memtx.min_tuple_size

Size of the smallest allocation unit in bytes. It can be decreased if most of the tuples are very small.


Type: integer
Default: 16
Possible values: between 8 and 1048280 inclusive
Environment variable: TT_MEMTX_MIN_TUPLE_SIZE
memtx.slab_alloc_factor

The multiplier for computing the sizes of memory chunks that tuples are stored in. A lower value may result in less wasted memory depending on the total amount of memory available and the distribution of item sizes.

See also: memtx.slab_alloc_granularity


Type: number
Default: 1.05
Possible values: between 1 and 2 inclusive
Environment variable: TT_MEMTX_SLAB_ALLOC_FACTOR
memtx.slab_alloc_granularity

Specify the granularity in bytes of memory allocation in the small allocator. The memtx.slab_alloc_granularity value should meet the following conditions:

  • The value is a power of two.
  • The value is greater than or equal to 4.

Below are a few recommendations on how to adjust the memtx.slab_alloc_granularity option:

  • If the tuples in a space are small and have about the same size, set the option to 4 bytes to save memory.
  • If the tuples are different-sized, increase the option value to allocate tuples from the same mempool (memory pool).
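For example, following the first recommendation above for a space whose tuples are small and of about the same size, a sketch might look like this:

memtx:
  slab_alloc_granularity: 4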

See also: memtx.slab_alloc_factor


Type: integer
Default: 8
Environment variable: TT_MEMTX_SLAB_ALLOC_GRANULARITY
memtx.sort_threads

The number of threads from the thread pool used to sort keys of secondary indexes on loading a memtx database. The minimum value is 1, the maximum value is 256. The default is to use all available cores.

Note

Since 3.0.0, this option replaces the previous approach of using OpenMP threads to parallelize sorting. For backward compatibility, the OMP_NUM_THREADS environment variable is taken into account to set the number of sorting threads.
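For example, to limit key sorting on startup to four threads (an arbitrary value within the allowed range of 1 to 256):

memtx:
  sort_threads: 4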


Type: integer
Default: box.NULL
Environment variable: TT_MEMTX_SORT_THREADS

The metrics section defines configuration parameters for metrics.

Note

metrics can be defined in any scope.

metrics.exclude

An array containing the metrics to turn off. The array can contain the same values as the exclude configuration parameter passed to metrics.cfg().

Example

metrics:
  include: [ all ]
  exclude: [ vinyl ]
  labels:
    alias: '{{ instance_name }}'

Type: array
Default: []
Environment variable: TT_METRICS_EXCLUDE
metrics.include

An array containing the metrics to turn on. The array can contain the same values as the include configuration parameter passed to metrics.cfg().


Type: array
Default: [ all ]
Environment variable: TT_METRICS_INCLUDE
metrics.labels

Global labels to be added to every observation.
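For example, the sketch below keeps the instance alias and adds one more label to every observation. The dc label name and its dc1 value are hypothetical:

metrics:
  labels:
    alias: '{{ instance_name }}'
    dc: 'dc1'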


Type: map
Default: { alias = names.instance_name }
Environment variable: TT_METRICS_LABELS

The process section defines configuration parameters of the Tarantool process in the system.

Note

process can be defined in any scope.

process.background

Run the server as a daemon process.

If this option is set to true, the log location defined by the log.to option should be set to file, pipe, or syslog, that is, anything other than stderr (the default). A daemon process is detached from the terminal, so it can’t write to the terminal’s stderr.

Important

Do not enable the background mode for applications intended to be run by the tt utility. For more information, see the tt start reference.
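A minimal sketch of a daemon configuration for an instance started without tt. It follows the rule above by writing logs to a file (the path is the log.file default):

process:
  background: true
log:
  to: file
  file: 'var/log/{{ instance_name }}/tarantool.log'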


Type: boolean
Default: false
Environment variable: TT_PROCESS_BACKGROUND
process.coredump

Create coredump files.

Usually, an administrator needs to call ulimit -c unlimited (or set corresponding options in systemd’s unit file) before running a Tarantool process to get core dumps. If process.coredump is enabled, Tarantool sets the corresponding resource limit by itself and the administrator doesn’t need to call ulimit -c unlimited (see man 3 setrlimit).

This option also sets the state of the dumpable attribute, which is enabled by default, but may be dropped in some circumstances (according to man 2 prctl, see PR_SET_DUMPABLE).
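For example, to let Tarantool raise the core file size limit by itself and also keep tuple memory in the resulting dumps (see process.strip_core below; including tuple memory can make core files large):

process:
  coredump: true
  strip_core: false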


Type: boolean
Default: false
Environment variable: TT_PROCESS_COREDUMP
process.title

Add the given string to the server’s process title (it is shown in the COMMAND column for the Linux commands ps -ef and top -c).

For example, if you set the option to myservice - {{ instance_name }}:

process:
  title: myservice - {{ instance_name }}

ps -ef might show the Tarantool server process like this:

$ ps -ef | grep tarantool
503      68100 68098  0 10:33 pts/2    00:00.10 tarantool <running>: myservice instance1

Type: string
Default: 'tarantool - {{ instance_name }}'
Environment variable: TT_PROCESS_TITLE
process.pid_file

Store the process ID in this file.

This option may contain a relative file path. In this case, it is interpreted as relative to process.work_dir.
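For example, with the settings below, the PID file of an instance named instance001 is written to /var/lib/tarantool/run/instance001.pid. The directory and file names are hypothetical:

process:
  work_dir: '/var/lib/tarantool'
  pid_file: 'run/{{ instance_name }}.pid'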


Type: string
Default: 'var/run/{{ instance_name }}/tarantool.pid'
Environment variable: TT_PROCESS_PID_FILE
process.strip_core

Whether coredump files should exclude memory allocated for tuples. This memory can be large if Tarantool runs under heavy load. Setting this option to true means "do not include".


Type: boolean
Default: true
Environment variable: TT_PROCESS_STRIP_CORE
process.username

The name of the system user to switch to after start.
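For example, to switch to a system user named tarantool after start (the user name is an assumption; the user must already exist on the host):

process:
  username: 'tarantool'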


Type: string
Default: box.NULL
Environment variable: TT_PROCESS_USERNAME
process.work_dir

A directory where Tarantool working files will be stored (database files, logs, a PID file, a console Unix socket, and other files if an application generates them in the current directory). The server instance switches to process.work_dir with chdir(2) after start.

If set as a relative file path, it is relative to the current working directory from which Tarantool is started. If not specified, defaults to the current working directory.

Other directory and file parameters, if set as relative paths, are interpreted as relative to process.work_dir, for example, directories for storing snapshots and write-ahead logs.
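For example, with the settings below, an instance named instance001 stores snapshots in /var/lib/tarantool/instance001/snapshots and write-ahead logs in /var/lib/tarantool/instance001/wal, because the relative snapshot.dir and wal.dir paths are resolved against process.work_dir. The paths are hypothetical:

process:
  work_dir: '/var/lib/tarantool/{{ instance_name }}'
snapshot:
  dir: 'snapshots'
wal:
  dir: 'wal'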


Type: string
Default: box.NULL
Environment variable: TT_PROCESS_WORK_DIR

The replication section defines configuration parameters related to replication.

replication.anon

Whether to make the current instance act as an anonymous replica. Anonymous replicas are read-only and can be used, for example, for backups.

To make the specified instance act as an anonymous replica, set replication.anon to true:

instance003:
  replication:
    anon: true

You can find the full example on GitHub: anonymous_replica.

Anonymous replicas are not displayed in the box.info.replication section. You can check their status using box.info.replication_anon().

While anonymous replicas are read-only, you can write data to replication-local and temporary spaces (created with is_local = true and temporary = true, respectively). Given that changes to replication-local spaces are allowed, an anonymous replica might increase the 0 component of the vclock value.

Here are the limitations of having anonymous replicas in a replica set:

Note

Anonymous replicas are not registered in the _cluster space. This means that there is no limitation on the number of anonymous replicas in a replica set.


Type: boolean
Default: false
Environment variable: TT_REPLICATION_ANON
replication.autoexpel

Since: 3.3.0

The replication.autoexpel option is designed for managing dynamic clusters using YAML-based configurations. It enables the automatic expulsion of instances that are removed from the YAML configuration.

Only instances with names that match the specified prefix are considered for expulsion; all others are excluded. Additionally, instances without a persistent name are ignored.

If an instance is in read-write mode and has the latest database schema, it initiates the expulsion of instances that:

  • Match the specified prefix.
  • Are absent from the updated YAML configuration.

The expulsion process follows the standard procedure, involving the removal of the instance from the _cluster system space.

The autoexpel logic is activated during specific events:

  • Startup. When the cluster starts, autoexpel checks and removes instances not matching the updated configuration.
  • Reconfiguration. When the YAML configuration is reloaded, autoexpel compares the current state to the updated configuration and performs necessary expulsions.
  • box.status watcher event. Changes detected by the box.status watcher also trigger the autoexpel mechanism.

autoexpel does not take any actions on newly joined instances unless one of the triggering events occurs. This means that an instance meeting the autoexpel criterion can still join the cluster, but it may be removed later during reconfiguration or on subsequent triggering events.

Note

The replication.autoexpel option governs the expelling process and is configurable at the replicaset, group, and global levels. It is not applicable at the instance level.
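A sketch of enabling autoexpel for a single replica set; the group, replica set, and instance names and the listen URIs are hypothetical. Both instances match the storage-a prefix, so if one of them is later removed from the YAML configuration, it can be expelled on the next reconfiguration:

groups:
  group001:
    replicasets:
      storage-a:
        replication:
          autoexpel:
            enabled: true
            by: prefix
            prefix: '{{ replicaset_name }}'
        instances:
          storage-a-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          storage-a-002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'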

Configuration fields

  • by (string, default: nil): specifies the autoexpel criterion. Currently, only prefix is supported and must be explicitly set.
  • enabled (boolean, default: false): enables or disables the autoexpel logic.
  • prefix (string, default: nil): defines the pattern for instance names that are considered part of the cluster.

The purpose of replication.autoexpel_by is to define the criterion used to determine which instances in a cluster are subject to the autoexpel process.

The by field helps differentiate between:

  • Instances that are part of the cluster and should adhere to the YAML configuration.
  • Instances or tools (e.g., CDC tools) that use the replication channel but are not part of the cluster configuration.

The default value of by is nil, meaning no autoexpel criterion is applied unless explicitly set.

Currently, the only supported value for by is prefix. The prefix value instructs the system to identify instances based on their names, matching them against a prefix pattern defined in the configuration.

If the autoexpel feature is enabled (enabled: true), the by field must be explicitly set to prefix.

The absence of this field or an unsupported value will result in configuration errors.

replication:
  autoexpel:
    enabled: true
    by: prefix
    prefix: '{{ replicaset_name }}'

Type: string
Default: nil
Environment variable: TT_REPLICATION_AUTOEXPEL_BY

The replication.autoexpel_enabled field is a boolean configuration option that determines whether the autoexpel logic is active for the cluster. This feature is designed to automatically manage dynamic cluster configurations by removing instances that are no longer present in the YAML configuration.

Note

By default, the enabled field is set to false, meaning the autoexpel logic is turned off. This ensures that no instances are automatically remove