Updated at 2026-05-11 03:30:04.794845
Overview
Tarantool comes in two editions: the open-source Community Edition (CE)
and the commercial Enterprise Edition (EE).
Primary storage
- No secondary storage required
- Tolerance to high write loads
- Support of relational approaches
- Composite secondary indexes
- Predictable request latency
Cache
- Write-behind caching
- Secondary index support
- Complex invalidation algorithm support
Queue
- Support of various identification techniques
- Advanced task lifecycle management
- Task scheduling
- Archiving of completed tasks
Data-centric applications
- Arbitrary data flows from many sources
- Incoming data processing
- Storage
- Background cycle processing
Getting started
This section will get you acquainted with Tarantool.
Creating a sharded cluster
Example on GitHub: sharded_cluster_crud
In this tutorial, you get a sharded cluster up and running on your local machine and learn how to manage the cluster using the tt utility.
This cluster uses the following external modules:
- vshard enables sharding in the cluster.
- crud allows you to manipulate data in the sharded cluster.
The cluster created in this tutorial includes 5 instances: one router and 4 storages, which constitute two replica sets.
Before starting this tutorial:
Creating a cluster application
The tt create command can be used to create an application from a predefined or custom template.
For example, the built-in vshard_cluster template enables you to create a ready-to-run sharded cluster application.
In this tutorial, the application layout is prepared manually:
Create a tt environment in the current directory by executing the tt init command.
Inside the empty instances.enabled directory of the created tt environment, create the sharded_cluster_crud directory.
Inside instances.enabled/sharded_cluster_crud, create the following files:
instances.yml specifies instances to run in the current environment.
config.yaml specifies the cluster configuration.
storage.lua contains code specific to storages.
router.lua contains code specific to a router.
sharded_cluster_crud-scm-1.rockspec specifies external dependencies required by the application.
The next section, Developing the application, shows how to configure the cluster and write code for routing read and write requests to different storages.
Developing the application
Configuring instances to run
Open the instances.yml file and add the following content:
storage-a-001:
storage-a-002:
storage-b-001:
storage-b-002:
router-a-001:
This file specifies instances to run in the current environment.
This section describes how to configure the cluster in the config.yaml file.
Step 1: Configuring credentials
Add the credentials configuration section:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
    storage:
      password: 'secret'
      roles: [ sharding ]
In this section, two users with the specified passwords are created:
- The replicator user with the replication role.
- The storage user with the sharding role.
These users are intended to maintain replication and sharding in the cluster.
Important
It is not recommended to store passwords as plain text in a YAML configuration.
To learn how to load passwords from safe storage, such as external files or environment variables, see Loading secrets from safe storage.
Step 2: Specifying advertise URIs
Add the iproto.advertise section:
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
In this section, the following options are configured:
iproto.advertise.peer specifies how to advertise the current instance to other cluster members.
In particular, this option informs other replica set members that the replicator user should be used to connect to the current instance.
iproto.advertise.sharding specifies how to advertise the current instance to a router and rebalancer.
The cluster topology defined in the following section also specifies the iproto.advertise.client option for each instance.
This option accepts a URI used to advertise the instance to clients.
For example, Tarantool Cluster Manager uses these URIs to connect to cluster instances.
Step 3: Configuring bucket count
Specify the total number of buckets in a sharded cluster using the sharding.bucket_count option:
sharding:
  bucket_count: 1000
Step 4: Defining the cluster topology
Define the cluster topology inside the groups section.
The cluster includes two groups:
- storages includes two replica sets. Each replica set contains two instances.
- routers includes one router instance.
Here is a schematic view of the cluster topology:
groups:
  storages:
    replicasets:
      storage-a:
        # ...
      storage-b:
        # ...
  routers:
    replicasets:
      router-a:
        # ...
To configure storages, add the following code inside the groups section:
storages:
  roles: [ roles.crud-storage ]
  app:
    module: storage
  sharding:
    roles: [ storage ]
  replication:
    failover: manual
  replicasets:
    storage-a:
      leader: storage-a-001
      instances:
        storage-a-001:
          iproto:
            listen:
            - uri: '127.0.0.1:3302'
            advertise:
              client: '127.0.0.1:3302'
        storage-a-002:
          iproto:
            listen:
            - uri: '127.0.0.1:3303'
            advertise:
              client: '127.0.0.1:3303'
    storage-b:
      leader: storage-b-001
      instances:
        storage-b-001:
          iproto:
            listen:
            - uri: '127.0.0.1:3304'
            advertise:
              client: '127.0.0.1:3304'
        storage-b-002:
          iproto:
            listen:
            - uri: '127.0.0.1:3305'
            advertise:
              client: '127.0.0.1:3305'
The main group-level options here are:
roles: This option enables the roles.crud-storage role provided by the CRUD module for all storage instances.
app: The app.module option specifies that code specific to storages should be loaded from the storage module. This is explained below in the Adding storage code section.
sharding: The sharding.roles option specifies that all instances inside this group act as storages.
A rebalancer is selected automatically from two master instances.
replication: The replication.failover option specifies that a leader in each replica set should be specified manually.
replicasets: This section configures two replica sets that constitute cluster storages.
To configure a router, add the following code inside the groups section:
routers:
  roles: [ roles.crud-router ]
  app:
    module: router
  sharding:
    roles: [ router ]
  replicasets:
    router-a:
      instances:
        router-a-001:
          iproto:
            listen:
            - uri: '127.0.0.1:3301'
            advertise:
              client: '127.0.0.1:3301'
The main group-level options here are:
roles: This option enables the roles.crud-router role provided by the CRUD module for a router instance.
app: The app.module option specifies that code specific to a router should be loaded from the router module. This is explained below in the Adding router code section.
sharding: The sharding.roles option specifies that an instance inside this group acts as a router.
replicasets: This section configures a replica set with one router instance.
The resulting config.yaml file should look as follows:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
    storage:
      password: 'secret'
      roles: [ sharding ]
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
sharding:
  bucket_count: 1000
groups:
  storages:
    roles: [ roles.crud-storage ]
    app:
      module: storage
    sharding:
      roles: [ storage ]
    replication:
      failover: manual
    replicasets:
      storage-a:
        leader: storage-a-001
        instances:
          storage-a-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
              advertise:
                client: '127.0.0.1:3302'
          storage-a-002:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
              advertise:
                client: '127.0.0.1:3303'
      storage-b:
        leader: storage-b-001
        instances:
          storage-b-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3304'
              advertise:
                client: '127.0.0.1:3304'
          storage-b-002:
            iproto:
              listen:
              - uri: '127.0.0.1:3305'
              advertise:
                client: '127.0.0.1:3305'
  routers:
    roles: [ roles.crud-router ]
    app:
      module: router
    sharding:
      roles: [ router ]
    replicasets:
      router-a:
        instances:
          router-a-001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
              advertise:
                client: '127.0.0.1:3301'
Open the storage.lua file and define a space and indexes inside box.watch() as follows:
box.watch('box.status', function()
    -- Create the schema only on a writable instance (the replica set leader);
    -- read-only instances receive it through replication.
    if box.info.ro then
        return
    end
    box.schema.create_space('bands', {
        format = {
            { name = 'id', type = 'unsigned' },
            { name = 'bucket_id', type = 'unsigned' },
            { name = 'band_name', type = 'string' },
            { name = 'year', type = 'unsigned' }
        },
        if_not_exists = true
    })
    -- Primary index on id.
    box.space.bands:create_index('id', { parts = { 'id' }, if_not_exists = true })
    -- Non-unique secondary index used to find all tuples of a given bucket.
    box.space.bands:create_index('bucket_id', { parts = { 'bucket_id' }, unique = false, if_not_exists = true })
end)
- The box.schema.create_space() function creates a space.
Note that the created bands space includes the bucket_id field.
This field represents a sharding key used to partition a dataset across different storage instances.
- space_object:create_index() creates two indexes based on the id and bucket_id fields.
Note
In a sharded space, uniqueness by secondary index is only guaranteed within a single shard, not across the whole cluster.
Open the router.lua file and load the vshard module as follows:
local vshard = require('vshard')
Configuring build settings
Open the sharded_cluster_crud-scm-1.rockspec file and add the following content:
package = 'sharded_cluster_crud'
version = 'scm-1'
source = {
    url = '/dev/null',
}
dependencies = {
    'vshard == 0.1.27',
    'crud == 1.5.2'
}
build = {
    type = 'none';
}
The dependencies section includes the specified versions of the vshard and crud modules.
To install dependencies, you need to build the application.
In the terminal, open the tt environment directory.
Then, execute the tt build command:
$ tt build sharded_cluster_crud
• Running rocks make
No existing manifest. Attempting to rebuild...
• Application was successfully built
This installs the vshard and crud modules defined in the *.rockspec file to the .rocks directory.
To start all instances in the cluster, execute the tt start command:
$ tt start sharded_cluster_crud
• Starting an instance [sharded_cluster_crud:storage-a-001]...
• Starting an instance [sharded_cluster_crud:storage-a-002]...
• Starting an instance [sharded_cluster_crud:storage-b-001]...
• Starting an instance [sharded_cluster_crud:storage-b-002]...
• Starting an instance [sharded_cluster_crud:router-a-001]...
After starting instances, you need to bootstrap the cluster as follows:
Connect to the router instance using tt connect:
$ tt connect sharded_cluster_crud:router-a-001
• Connecting to the instance...
• Connected to sharded_cluster_crud:router-a-001
Call vshard.router.bootstrap() to perform the initial cluster bootstrap and distribute all buckets across the replica sets.
Checking the cluster status
To check the cluster status, execute vshard.router.info() on the router.
The output includes the following sections:
replicasets: contains information about storages and their availability.
bucket: displays the total number of read-write and read-only buckets that are currently available for this router.
status: a number from 0 to 3 that indicates whether there are any issues with the cluster.
0 means that there are no issues.
alerts: might describe the exact issues related to bootstrapping a cluster, for example, connection issues, failover events, or unidentified buckets.
Writing and selecting data
To insert sample data, call crud.insert_many() on the router:
crud.insert_many('bands', {
    { 1, box.NULL, 'Roxette', 1986 },
    { 2, box.NULL, 'Scorpions', 1965 },
    { 3, box.NULL, 'Ace of Base', 1987 },
    { 4, box.NULL, 'The Beatles', 1960 },
    { 5, box.NULL, 'Pink Floyd', 1965 },
    { 6, box.NULL, 'The Rolling Stones', 1962 },
    { 7, box.NULL, 'The Doors', 1965 },
    { 8, box.NULL, 'Nirvana', 1987 },
    { 9, box.NULL, 'Led Zeppelin', 1968 },
    { 10, box.NULL, 'Queen', 1970 }
})
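Because bucket_id is passed as box.NULL, crud calculates it for each tuple from the tuple's sharding key (by default, the primary key), so the inserted tuples are spread across the replica sets.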
To get a tuple by the specified ID, call the crud.get() function.
To insert a new tuple, call crud.insert().
Checking data distribution
To check how data is distributed across the replica sets, follow the steps below:
Connect to any storage in the storage-a replica set:
$ tt connect sharded_cluster_crud:storage-a-001
• Connecting to the instance...
• Connected to sharded_cluster_crud:storage-a-001
Then, select all tuples in the bands space, for example, by calling box.space.bands:select().
Connect to any storage in the storage-b replica set:
$ tt connect sharded_cluster_crud:storage-b-001
• Connecting to the instance...
• Connected to sharded_cluster_crud:storage-b-001
Select all tuples in the bands space (box.space.bands:select()) to make sure it contains another subset of data.
Platform
This section contains documentation for the Tarantool platform consisting of a database and an application server.
Concepts
A storage engine is a set of low-level routines that store and
retrieve values. Tarantool offers a choice of two storage engines:
- memtx is the in-memory storage engine used by default.
- vinyl is the on-disk storage engine.
For details, check the Storage engines section.
Tarantool is a NoSQL database. It stores data in spaces,
which can be thought of as tables in a relational database, and tuples,
which are analogous to rows. There are six basic data operations in Tarantool.
The platform allows describing the data schema but does not require it.
Tarantool supports highly customizable indexes of various types.
To ensure data persistence and recover quickly in case of failure,
Tarantool uses mechanisms like the write-ahead log (WAL) and snapshots.
For details, check the Data model page.
Tarantool’s ACID-compliant transaction model lets the user choose
between two modes of transactions.
The default mode allows for fast monopolistic atomic transactions.
It doesn’t support interactive transactions, and in case of an error, all transaction changes are rolled back.
The MVCC mode relies on a multi-version concurrency control engine
that allows yielding within a longer transaction.
This mode only works with the default in-memory memtx storage engine.
For details, check the Transactions page.
Replication allows keeping the data in copies of the same database for better reliability.
Several Tarantool instances can be organized in a replica set.
They communicate and transfer data via the iproto binary protocol.
Learn more about Tarantool’s replication architecture.
By default, replication in Tarantool is asynchronous.
A transaction committed locally on the master node
may not get replicated onto other instances before the client receives a success response.
Thus, if the master reports success and then dies, the client might not see the result of the transaction.
With synchronous replication, transactions on the master node are not considered committed
or successful before they are replicated onto a number of instances. This is slower, but more reliable.
Synchronous replication in Tarantool is based on an implementation of the Raft algorithm.
For details, check the Replication section.
Tarantool implements database sharding via the vshard module.
For details, go to the Sharding page.
Tarantool allows specifying callback functions that run upon certain database events.
They can be useful for resolving replication conflicts.
For details, go to the Triggers page.
Using Tarantool as an application server, you can write
applications in Lua, C, or C++. You can also create reusable modules.
To increase the speed of code execution, Tarantool has a Lua Just-In-Time compiler (LuaJIT) on board.
LuaJIT compiles hot paths in the code – paths that are used many times –
thus making the application work faster.
To enable developers to work with LuaJIT, Tarantool provides tools like the memory profiler
and the getmetrics module.
To learn how to use Tarantool as an application server, refer to the guides in the How-to section.
Storage engines
A storage engine is a set of low-level routines which actually store and
retrieve tuple values. Tarantool offers a choice of two storage engines:
- memtx is the in-memory storage engine used by default.
- vinyl is the on-disk storage engine.
Details on each engine are available in the dedicated sections:
Storing data with memtx
The memtx storage engine is used in Tarantool by default.
The engine keeps all data in random-access memory (RAM), and therefore has a low read latency.
Tarantool prevents data loss in emergencies, such as an outage or a Tarantool instance failure, in the following ways:
- Tarantool persists all data changes by writing requests to the write-ahead log (WAL)
that is stored on disk. Also, Tarantool periodically takes a snapshot of the entire
database and saves it on disk.
Learn more: Data persistence.
- In a distributed application, synchronous replication can be used to keep the data consistent on a quorum of replicas.
Although replication is not directly a storage engine topic, it is a part of the answer regarding data safety.
Learn more: Replicating data.
In this section, the following topics are discussed in brief, with references to other sections that explain the
subject matter in detail.
There is a fixed number of independent execution threads.
The threads don’t share state. Instead they exchange data using low-overhead message queues.
While this approach limits the number of cores that the instance uses,
it removes competition for the memory bus and ensures peak scalability of memory access and network throughput.
Only one thread, namely, the transaction processor thread (hereafter, the TX thread),
can access the database, and there is only one TX thread for each Tarantool instance.
In this thread, transactions are executed in a strictly consecutive order.
Multi-statement transactions exist to provide isolation:
each transaction sees a consistent database state and commits all its changes atomically.
At commit time, a yield happens and all transaction changes are written to WAL
in a single batch.
In case of errors during transaction execution, a transaction is rolled back completely.
Read more in the following sections: Transaction model, Transaction mode: MVCC.
Within the TX thread, there is a memory area allocated for Tarantool to store data. It’s called Arena.

Data is stored in spaces. Spaces contain database records – tuples.
To access and manipulate the data stored in spaces and tuples, Tarantool builds indexes.
Special allocators manage memory allocations for spaces, tuples, and indexes within the Arena.
The slab allocator is the main allocator used to store tuples.
Tarantool has a built-in module called box.slab which provides the slab allocator statistics
that can be used to monitor the total memory usage and memory fragmentation.
For more details, see the box.slab module reference.

Also inside the TX thread, there is an event loop. Within the event loop, there are a number of fibers.
Fibers are cooperative primitives that allow interaction with spaces, that is, reading and writing the data.
Fibers can interact with the event loop and between each other directly or by using special primitives called channels.
Due to the usage of fibers and cooperative multitasking, the memtx engine is lock-free in typical situations.
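To illustrate why cooperative multitasking removes the need for locks, here is a minimal sketch that uses plain Lua coroutines as a stand-in for Tarantool fibers; it does not use the real fiber module or event loop. Control passes between the "fibers" only at an explicit yield, so a read-modify-write step that does not yield (and, in real Tarantool, does not perform I/O) cannot be interleaved with another fiber's step.

```lua
-- Shared state touched by several "fibers" without any lock.
counter = 0

-- Each "fiber" does read-modify-write steps; control can switch only at
-- coroutine.yield(), so a step is never interleaved with another one.
local function make_fiber(steps)
  return coroutine.create(function()
    for _ = 1, steps do
      local v = counter  -- read
      counter = v + 1    -- modify and write, with no yield in between
      coroutine.yield()  -- cooperative switch point
    end
  end)
end

-- A minimal round-robin "event loop" (hypothetical, for illustration only):
-- resume every live coroutine until all of them have finished.
local function run(fibers)
  local alive = #fibers
  while alive > 0 do
    alive = 0
    for _, f in ipairs(fibers) do
      if coroutine.status(f) ~= "dead" then
        assert(coroutine.resume(f))
        if coroutine.status(f) ~= "dead" then alive = alive + 1 end
      end
    end
  end
end

run({ make_fiber(1000), make_fiber(1000) })
print(counter)
```

No increment is lost even though the two coroutines interleave 2000 times, because none of them can be preempted between the read and the write.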

To interact with external users, there is a separate network thread also called the iproto thread.
The iproto thread receives a request from the network, parses and checks the statement,
and transforms it into a special structure—a message containing an executable statement and its options.
Then the iproto thread ships this message to the TX thread, which runs the user’s request in a separate fiber.

Tarantool ensures data persistence as follows:
- After executing data change requests in memory, Tarantool writes each such request to the write-ahead log (WAL) files (.xlog)
that are stored on disk. Tarantool does this via a separate thread called the WAL thread.

Tarantool periodically takes a snapshot of the entire database and saves it on disk.
Snapshots speed up instance restart: when there are too many WAL files, replaying all of them would make the restart slow.
To save a snapshot, there is a special fiber called the snapshot daemon.
It reads a consistent view of the entire Arena and writes it to disk as a snapshot file (.snap).
Because of cooperative multitasking, Tarantool cannot write to disk directly from a fiber: disk I/O is a blocking operation that would stall all other fibers.
That is why Tarantool interacts with the disk via a separate pool of threads from the fio library.

So, even in emergency situations such as an outage or a Tarantool instance failure,
when the in-memory database is lost, the data can be restored fully during Tarantool restart.
What happens during the restart:
- Tarantool finds the latest snapshot file and reads it.
- Tarantool finds all the WAL files created after that snapshot and reads them as well.
- When the snapshot and WAL files have been read, the in-memory data set is fully recovered
to the state at which the Tarantool instance stopped.
- While reading the snapshot and WAL files, Tarantool builds the primary indexes.
- When all the data is in memory again, Tarantool builds the secondary indexes.
- Tarantool runs the application.
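The first two recovery steps (find the newest snapshot, then the WAL files that follow it) can be sketched in plain Lua. This is a simplified model, not Tarantool's recovery code. It assumes that each file is named after the LSN it starts from, zero-padded to 20 digits (as in 00000000000000000000.snap), and that a new WAL file is started at every snapshot, so the files to replay are those starting at or after the snapshot's LSN.

```lua
-- Parse "<digits>.<ext>" into a numeric LSN and an extension.
local function parse(name)
  local lsn, ext = name:match("^(%d+)%.(%a+)$")
  if not lsn then return nil end
  return tonumber(lsn), ext
end

-- Return the LSN of the newest snapshot and, in replay order, the WAL files
-- that start at or after it (assumption: WAL is rotated at each snapshot).
local function recovery_plan(files)
  local snap_lsn = -1
  for _, name in ipairs(files) do
    local lsn, ext = parse(name)
    if ext == "snap" and lsn > snap_lsn then snap_lsn = lsn end
  end

  local wals = {}
  for _, name in ipairs(files) do
    local lsn, ext = parse(name)
    if ext == "xlog" and lsn >= snap_lsn then
      wals[#wals + 1] = { lsn = lsn, name = name }
    end
  end
  table.sort(wals, function(a, b) return a.lsn < b.lsn end)

  local names = {}
  for i, w in ipairs(wals) do names[i] = w.name end
  return snap_lsn, names
end

local files = {
  "00000000000000000000.snap", "00000000000000000000.xlog",
  "00000000000000000100.xlog", "00000000000000000100.snap",
  "00000000000000000180.xlog",
}
snap_lsn, wal_files = recovery_plan(files)
```

Here the snapshot at LSN 100 is loaded first and then the two WAL files starting at LSN 100 and 180 are replayed; the older snapshot and WAL file are not needed.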
To access and manipulate the data stored in memory, Tarantool builds indexes.
Indexes are also stored in memory within the Arena.
Tarantool supports a number of index types intended for different usage scenarios.
The possible types are TREE, HASH, BITSET, and RTREE.
Select queries are possible against secondary index keys as well as primary keys.
Indexes can have multi-part keys.
For detailed information about indexes, refer to the Indexes page.
Although this topic is not directly related to the memtx engine, it completes the overall picture of how Tarantool works in case of a distributed application.
Replication allows multiple Tarantool instances to work on copies of the same database.
The copies are kept in sync because each instance can communicate its changes to all the other instances.
It is implemented via WAL replication.
To send data to a replica, Tarantool runs another thread called relay.
Its purpose is to read the WAL files and send them to replicas.
On a replica, a fiber called the applier runs. It receives the changes from a remote node and applies them to the replica’s Arena.
All the changes are written to WAL files via the replica’s WAL thread as if they were made locally.
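The relay/applier flow described above can be modeled in a few lines of plain Lua. This is a hypothetical single-process sketch: master, replica, relay, and applier are ordinary tables and functions that mirror the roles in the text, not Tarantool APIs, and a single last_lsn counter stands in for the replica's position in the master's WAL.

```lua
-- Hypothetical model: a WAL is an array of rows {lsn = n, key = k, value = v}.
master = { wal = {}, data = {} }
replica = { wal = {}, data = {}, last_lsn = 0 }

-- Master write: change the in-memory data and append a row to the WAL.
local function master_write(key, value)
  local row = { lsn = #master.wal + 1, key = key, value = value }
  master.data[key] = value
  master.wal[#master.wal + 1] = row
end

-- Relay: read the master's WAL and return the rows the replica lacks.
local function relay(from_lsn)
  local rows = {}
  for _, row in ipairs(master.wal) do
    if row.lsn > from_lsn then rows[#rows + 1] = row end
  end
  return rows
end

-- Applier: apply each row to the replica's data and log it in the
-- replica's own WAL, as if the change had been made locally.
local function applier(rows)
  for _, row in ipairs(rows) do
    replica.data[row.key] = row.value
    replica.wal[#replica.wal + 1] = row
    replica.last_lsn = row.lsn
  end
end

master_write("a", 1)
master_write("b", 2)
applier(relay(replica.last_lsn))
master_write("a", 3)
applier(relay(replica.last_lsn))
```

In this asynchronous model the master's writes return before the applier runs, which is exactly the window in which a master failure can lose committed data.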

By default, replication in Tarantool is asynchronous: if a transaction
is committed locally on a master node, it does not mean it is replicated onto any
replicas.
Synchronous replication exists to solve this problem. Synchronous transactions
are not considered committed, and the client does not receive a response, until they are
replicated onto a required number (quorum) of replicas.
For more information on replication, refer to the corresponding chapter.
The main key points describing how the in-memory storage engine works can be summarized in the following way:
- All data is in RAM.
- Access to data is from one thread.
- Tarantool writes all data change requests in WAL.
- Data snapshots are taken periodically.
- Indexes are built to access the data.
- WAL can be replicated.
Storing data with vinyl
Tarantool is a transactional and persistent DBMS that maintains 100% of its data
in RAM. The greatest advantages of in-memory databases are their speed and ease
of use: they demonstrate consistently high performance, and you never need to
tune them.
A few years ago we decided to extend the product by implementing a classical
storage engine similar to those used by regular DBMSs: it uses RAM for caching,
while the bulk of its data is stored on disk. We decided to make it possible to
set a storage engine independently for each table in the database, which is the
same way that MySQL approaches it, but we also wanted to support transactions
from the very beginning.
The first question we needed to answer was whether to create our own storage
engine or use an existing library. The open-source community offered a few
viable solutions. The RocksDB library was the fastest growing open-source
library and is currently one of the most prominent out there. There were also
several lesser-known libraries to consider, such as WiredTiger, ForestDB,
NestDB, and LMDB.
Nevertheless, after studying the source code of existing libraries and
considering the pros and cons, we opted for our own storage engine. One reason
is that the existing third-party libraries expected requests to come from
multiple operating system threads and thus contained complex synchronization
primitives for controlling parallel data access. If we had decided to embed one
of these in Tarantool, we would have made our users bear the overhead of a
multithreaded application without getting anything in return. The thing is,
Tarantool has an actor-based architecture. The way it processes transactions in
a dedicated thread allows it to do away with the unnecessary locks, interprocess
communication, and other overhead that accounts for up to 80% of processor time
in multithreaded DBMSs.
The Tarantool process consists of a fixed number of “actor” threads
If you design a database engine with cooperative multitasking in mind right from
the start, it not only significantly speeds up the development process, but also
allows the implementation of certain optimization tricks that would be too
complex for multithreaded engines. In short, using a third-party solution
wouldn’t have yielded the best result.
Once the idea of using an existing library was off the table, we needed to pick
an architecture to build upon. There are two competing approaches to on-disk
data storage: the older one relies on B-trees and their variations; the newer
one advocates the use of log-structured merge-trees, or “LSM” trees. MySQL,
PostgreSQL, and Oracle use B-trees, while Cassandra, MongoDB, and CockroachDB
have adopted LSM trees.
B-trees are considered better suited for reads, and LSM trees for writes.
However, as SSDs became more widespread, and given that SSDs have read
throughput several times greater than write throughput, the advantages of
LSM trees in most scenarios became more obvious to us.
Before dissecting LSM trees in Tarantool, let’s take a look at how they work. To
do that, we’ll begin by analyzing a regular B-tree and the issues it faces. A
B-tree is a balanced tree made up of blocks, which contain sorted lists of key-value
pairs. (Topics such as filling and balancing a B-tree or splitting and
merging blocks are outside of the scope of this article and can easily be found
on Wikipedia). As a result, we get a container sorted by key, where the smallest
element is stored in the leftmost node and the largest one in the rightmost
node. Let’s have a look at how insertions and searches in a B-tree happen.
Classical B-tree
If you need to find an element or check its membership, the search starts at the
root, as usual. If the key is found in the root block, the search stops;
otherwise, the search visits the rightmost block holding the largest element
that’s not larger than the key being searched (recall that elements at each
level are sorted). If the first level yields no results, the search proceeds to
the next level. Finally, the search ends up in one of the leaves and probably
locates the needed key. Blocks are stored and read into RAM one by one, meaning
the algorithm reads $\log_B N$ blocks in a single search, where $N$ is the number of
elements in the B-tree and $B$ is the number of elements per block. In the simplest case, writes are done similarly: the
algorithm finds the block that holds the necessary element and updates (inserts)
its value.
To better understand the data structure, let’s consider a practical
example: say we have a B-tree with 100,000,000 elements, a block size of 4096
bytes, and an element size of 100 bytes. Thus each block will hold up to 40
elements (all overhead considered), and the B-tree will consist of around
2,570,000 blocks and 5 levels: the first four will have a size of 256 MB, while
the last one will grow up to 10 GB. Obviously, any modern computer will be able
to store all of the levels except the last one in filesystem cache, so read
requests will require just a single I/O operation.
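The block and level counts above can be reproduced with a short calculation. This is a minimal sketch assuming every block is completely full and holds exactly 40 elements (the figure used in the text); real B-trees keep blocks partly empty, so actual counts are somewhat higher.

```lua
local N = 100000000  -- elements in the tree
local FANOUT = 40    -- elements per block, as assumed in the text
local BLOCK = 4096   -- bytes per block

-- Count blocks per level bottom-up, assuming every block is full.
local function btree_levels(n, fanout)
  local levels = {}
  repeat
    n = math.ceil(n / fanout)
    table.insert(levels, 1, n)  -- prepend, so levels[1] is the root
  until n == 1
  return levels
end

levels = btree_levels(N, FANOUT)
total, upper = 0, 0
for i, blocks in ipairs(levels) do
  total = total + blocks
  if i < #levels then upper = upper + blocks end
end

print(("levels: %d, blocks: %d"):format(#levels, total))
print(("upper levels: %.1f MB, leaf level: %.2f GB"):format(
  upper * BLOCK / 1e6, levels[#levels] * BLOCK / 1e9))
```

Under these assumptions the tree has 5 levels and about 2.56 million blocks; the four upper levels take roughly 260 MB and the leaf level about 10 GB, in line with the estimate above.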
But from the perspective of writes, B-trees don’t look so good anymore. Suppose we need to update a
single element. Since working with B-trees involves reading and writing whole
blocks, we would have to read in one whole block, change our 100 bytes out of
4096, and then write the whole updated block to disk. In other words, we were
forced to write 40 times more data than we actually modified!
If you take into
account the fact that an SSD block has a size of 64 KB or more and not every
modification changes a whole element, the extra disk workload can be greater
still.
Authors of specialized literature and blogs dedicated to on-disk data
storage have coined two terms for these phenomena: extra reads are referred to
as “read amplification” and writes as “write amplification”.
The amplification
factor (multiplication coefficient) is calculated as the ratio of the size of
actual read (or written) data to the size of data needed (or actually changed).
In our B-tree example, the amplification factor would be around 40 for both
reads and writes.
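The amplification factor is a plain ratio, so the example numbers can be checked directly. A minimal sketch using the 4096-byte block and 100-byte element from the example plus the 64 KB SSD block mentioned above; the helper name amplification is ours, not part of any Tarantool API.

```lua
-- Amplification factor = bytes actually read or written / bytes needed or changed.
local function amplification(io_bytes, useful_bytes)
  return io_bytes / useful_bytes
end

local BLOCK = 4096       -- B-tree block size from the example, bytes
local ELEMENT = 100      -- size of the updated element, bytes
local SSD_BLOCK = 65536  -- 64 KB SSD block, bytes

-- Updating one element rewrites the whole B-tree block.
write_amp = amplification(BLOCK, ELEMENT)
-- If the SSD rewrites a whole 64 KB block, the factor grows further.
ssd_write_amp = amplification(SSD_BLOCK, ELEMENT)

print(("B-tree block: %.2f, 64 KB SSD block: %.2f"):format(write_amp, ssd_write_amp))
```

The raw ratio is about 41; the text rounds it to 40 because, with overhead, a block holds 40 elements.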
The huge number of extra I/O operations associated with
updating data is one of the main issues addressed by LSM trees. Let’s see how
they work.
The key difference between LSM trees and regular B-trees is that LSM
trees don’t just store data (keys and values), but also data operations:
insertions and deletions.
LSM tree:
- Stores statements, not values:
- Every statement is marked by LSN
- Append-only files, garbage is collected after a checkpoint
- Transactional log of all filesystem changes: vylog
For example, an element corresponding to an insertion operation has, apart from
a key and a value, an extra byte with an operation code (“REPLACE” in the image
above). An element representing the deletion operation contains a key (since
storing a value is unnecessary) and the corresponding operation code—“DELETE”.
Also, each LSM tree element has a log sequence number (LSN), which is the value
of a monotonically increasing sequence that uniquely identifies each operation.
The whole tree is first ordered by key in ascending order, and then, within a
single key scope, by LSN in descending order.
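The ordering rule (key ascending, then LSN descending) is what lets a lookup stop at the first statement it finds for a key. A minimal in-memory sketch under that rule; the statement layout {key, lsn, op, value} is a hypothetical simplification of vinyl's real format.

```lua
-- Order statements by key ascending, then by LSN descending (newest first).
local function compare(a, b)
  if a.key ~= b.key then return a.key < b.key end
  return a.lsn > b.lsn
end

-- Binary search: index of the first statement whose key is >= key.
local function lower_bound(stmts, key)
  local lo, hi = 1, #stmts + 1
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    if stmts[mid].key < key then lo = mid + 1 else hi = mid end
  end
  return lo
end

-- The first statement for a key is its newest one; a DELETE hides the key.
function lookup(stmts, key)
  local stmt = stmts[lower_bound(stmts, key)]
  if stmt == nil or stmt.key ~= key or stmt.op == "DELETE" then return nil end
  return stmt.value
end

stmts = {
  { key = 1, lsn = 10, op = "REPLACE", value = "a" },
  { key = 2, lsn = 11, op = "REPLACE", value = "b" },
  { key = 1, lsn = 12, op = "REPLACE", value = "a2" },
  { key = 2, lsn = 13, op = "DELETE" },
}
table.sort(stmts, compare)
```

After sorting, lookup(stmts, 1) returns the newer value "a2", while key 2 is reported as absent because its newest statement is a DELETE.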
A single level of an LSM tree
Unlike a B-tree, which is stored completely on disk and can be partly cached in
RAM, when using an LSM tree, memory is explicitly separated from disk right from
the start. The issue of volatile memory and data persistence is beyond the scope
of the storage algorithm and can be solved in various ways—for example, by
logging changes.
The part of an LSM tree that’s stored in RAM is called L0 (level zero). The size
of RAM is limited, so L0 is allocated a fixed amount of memory. For example, in
Tarantool, the L0 size is controlled by the vinyl_memory parameter. Initially,
when an LSM tree is empty, operations are written to L0. Recall that all
elements are ordered by key in ascending order, and then within a single key
scope, by LSN in descending order, so when a new value associated with a given
key gets inserted, it’s easy to locate the older value and delete it. L0 can be
structured as any container capable of storing a sorted sequence of elements.
For example, in Tarantool, L0 is implemented as a B+*-tree. Lookups and
insertions are standard operations for the data structure underlying L0, so I
won’t dwell on those.
Sooner or later the number of elements in an LSM tree exceeds the L0 size and
that’s when L0 gets written to a file on disk (called a “run”) and then cleared
for storing new elements. This operation is called a “dump”.
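A toy model of L0 and the dump operation, as a minimal sketch. It assumes L0 is limited to a fixed number of distinct keys (L0_LIMIT, a hypothetical parameter); real vinyl bounds L0 by memory via vinyl_memory and uses a B+*-tree rather than a Lua table.

```lua
-- Hypothetical limit: L0 holds at most L0_LIMIT distinct keys.
local L0_LIMIT = 3

tree = { l0 = {}, l0_count = 0, runs = {}, next_lsn = 1 }

-- "Dump": write L0, sorted by key, to a new run and clear L0.
local function dump(t)
  local run = {}
  for _, stmt in pairs(t.l0) do run[#run + 1] = stmt end
  table.sort(run, function(a, b) return a.key < b.key end)
  table.insert(t.runs, 1, run)  -- the newest run goes to the head
  t.l0, t.l0_count = {}, 0
end

-- Every operation goes to L0 first; a newer statement for a key that is
-- already in L0 replaces the older one in place.
local function write(t, op, key, value)
  if t.l0[key] == nil then
    if t.l0_count >= L0_LIMIT then dump(t) end
    t.l0_count = t.l0_count + 1
  end
  t.l0[key] = { key = key, lsn = t.next_lsn, op = op, value = value }
  t.next_lsn = t.next_lsn + 1
end

write(tree, "REPLACE", 1, "a")
write(tree, "REPLACE", 2, "b")
write(tree, "REPLACE", 1, "a2")  -- overwrites key 1 inside L0
write(tree, "REPLACE", 3, "c")
write(tree, "REPLACE", 4, "d")   -- L0 is full: dump first, then write
```

Because each dump takes the whole of L0, the LSN ranges of different runs never overlap, which matches the description of the run sequence in the next paragraph.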
Dumps on disk form a sequence ordered by LSN: LSN ranges in different runs don’t
overlap, and the leftmost runs (at the head of the sequence) hold newer
operations. Think of these runs as a pyramid, with the newest ones closer to the
top. As runs keep getting dumped, the pyramid grows higher. Note that newer runs
may contain deletions or replacements for existing keys. To remove older data,
it’s necessary to perform garbage collection (this process is sometimes called
“merge” or “compaction”) by combining several older runs into a new one. If two
versions of the same key are encountered during a compaction, only the newer one
is retained; however, if a key insertion is followed by a deletion, then both
operations can be discarded.
The key choices determining an LSM tree’s efficiency are which runs to compact
and when to compact them. Suppose an LSM tree stores a monotonically increasing
sequence of keys (1, 2, 3, …) with no deletions. In this case, compacting
runs would be useless: all of the elements are sorted, the tree doesn’t have any
garbage, and the location of any key can unequivocally be determined. On the
other hand, if an LSM tree contains many deletions, doing a compaction would
free up some disk space. However, even if there are no deletions, but key ranges
in different runs overlap a lot, compacting such runs could speed up lookups as
there would be fewer runs to scan. In this case, it might make sense to compact
runs after each dump. But keep in mind that a compaction causes all data stored
on disk to be overwritten, so with few reads it’s recommended to perform it less
often.
To ensure it’s optimally configurable for any of the scenarios above, an LSM
tree organizes all runs into a pyramid: the newer the data operations, the
higher up the pyramid they are located. During a compaction, the algorithm picks
two or more neighboring runs of approximately equal size, if possible.
- Multi-level compaction can span any number of levels
- A level can contain multiple runs
All of the neighboring runs of approximately equal size constitute an LSM tree
level on disk. The ratio of run sizes at different levels determines the
pyramid’s proportions, which allows optimizing the tree for write-intensive or
read-intensive scenarios.
Suppose the L0 size is 100 MB, the ratio of run sizes at each level (the
vinyl_run_size_ratio parameter) is 5, and there can be no more than 2 runs per
level (the vinyl_run_count_per_level parameter). After the first 3 dumps, the
disk will contain 3 runs of 100 MB each, which constitute L1 (level one). Since
3 > 2, the runs will be compacted into a single 300 MB run, with the older ones
being deleted. After 2 more dumps, there will be another compaction, this time
of 2 runs of 100 MB each and the 300 MB run, which will produce one 500 MB run.
It will be moved to L2 (recall that the run size ratio is 5), leaving L1 empty.
The next 10 dumps will result in L2 having 3 runs of 500 MB each, which will be
compacted into a single 1500 MB run. Over the course of 10 more dumps, the
following will happen: 3 runs of 100 MB each will be compacted twice, as will
two 100 MB runs and one 300 MB run, which will yield 2 new 500 MB runs in L2.
Since L2 now has 3 runs, they will also be compacted: two 500 MB runs and one
1500 MB run will produce a 2500 MB run that will be moved to L3, given its size.
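The dump-and-compact sequence above can be replayed with a small simulation. This is a minimal sketch in plain Lua, not vinyl’s scheduler: it assumes every dump produces exactly one 100 MB run, that there are no deletions or overwrites (so a compacted run is the sum of its inputs), and that a run belongs to the lowest level k whose bound of 100 MB × 5^k exceeds its size. The constants only play the roles of vinyl_run_size_ratio and vinyl_run_count_per_level; the function names are hypothetical.

```lua
-- Simplified model of level-based compaction (not vinyl's actual scheduler).
local L0_SIZE = 100            -- MB per dump, as in the example
local RUN_SIZE_RATIO = 5       -- plays the role of vinyl_run_size_ratio
local RUN_COUNT_PER_LEVEL = 2  -- plays the role of vinyl_run_count_per_level

local levels = {}              -- levels[k] is the list of run sizes on level k

-- A run belongs to the lowest level k such that size < L0_SIZE * ratio^k.
local function level_of(size)
  local k, bound = 1, L0_SIZE * RUN_SIZE_RATIO
  while size >= bound do
    k = k + 1
    bound = bound * RUN_SIZE_RATIO
  end
  return k
end

local function add_run(size)
  local k = level_of(size)
  levels[k] = levels[k] or {}
  table.insert(levels[k], size)
  if #levels[k] > RUN_COUNT_PER_LEVEL then
    -- Compact the whole level: without deletions the new run is the sum of
    -- its inputs, and it may land on a deeper level and trigger another merge.
    local total = 0
    for _, s in ipairs(levels[k]) do total = total + s end
    levels[k] = {}
    add_run(total)
  end
end

local function dump() add_run(L0_SIZE) end

local function describe()
  local parts = {}
  for k = 1, 3 do
    parts[#parts + 1] = ('L%d={%s}'):format(k, table.concat(levels[k] or {}, ','))
  end
  return table.concat(parts, ' ')
end

for n = 1, 25 do
  dump()
  if n == 3 or n == 5 or n == 15 or n == 25 then
    print(('after %d dumps: %s'):format(n, describe()))
  end
end
-- after 3 dumps: L1={300} L2={} L3={}
-- after 5 dumps: L1={} L2={500} L3={}
-- after 15 dumps: L1={} L2={1500} L3={}
-- after 25 dumps: L1={} L2={} L3={2500}
```

The printed states match the narrative: the 300 MB run stays in L1, the 500 MB and 1500 MB runs live in L2, and the 2500 MB run is the first to reach L3.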
This can go on indefinitely, but if an LSM tree contains lots of deletions, the
resulting compacted run can be moved not only down, but also up the pyramid due
to its size being smaller than the sizes of the original runs that were
compacted. In other words, it’s enough to logically track which level a certain
run belongs to, based on the run size and the smallest and greatest LSN among
all of its operations.
When doing a lookup in an LSM tree, what we need to find is not the element
itself, but the most recent operation associated with it. If it’s a deletion,
then the tree doesn’t contain this element. If it’s an insertion, we need to
grab the topmost value in the pyramid, and the search can be stopped after
finding the first matching key. In the worst-case scenario, that is if the tree
doesn’t hold the needed element, the algorithm will have to sequentially visit
all of the levels, starting from L0.
Unfortunately, this scenario is quite common in real life. For example, when
inserting a value into a tree, it’s necessary to make sure there are no
duplicates among primary/unique keys. So to speed up membership checks, LSM
trees use a probabilistic data structure called a “Bloom filter”, which will be
covered a bit later, in a section on how vinyl works under the hood.
In the case of a single-key search, the algorithm stops after encountering the
first match. However, when searching within a certain key range (for example,
looking for all the users with the last name “Ivanov”), it’s necessary to scan
all tree levels.
Searching within a range of [24,30)
The required range is formed the same way as when compacting several runs: the
algorithm picks the key with the largest LSN out of all the sources, ignoring
the other associated operations, then moves on to the next key and repeats the
procedure.
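The merge described above can be sketched as follows. This is a simplified model in plain Lua, assuming each source (L0 or a run) is an array of entries sorted by key ascending and then by LSN descending, with hypothetical fields key, lsn, op and value. It answers a scan of [24, 30) by repeatedly taking the smallest current key across all sources, keeping only the entry with the largest LSN, and skipping the key if that entry is a deletion.

```lua
-- Each source is sorted by key ascending, then by LSN descending.
local function range_scan(sources, lo, hi)   -- scans [lo, hi)
  local pos = {}
  for i, run in ipairs(sources) do
    local p = 1
    while run[p] and run[p].key < lo do p = p + 1 end
    pos[i] = p
  end
  local result = {}
  while true do
    -- Smallest key that any source is currently positioned at.
    local min_key
    for i, run in ipairs(sources) do
      local e = run[pos[i]]
      if e and e.key < hi and (min_key == nil or e.key < min_key) then
        min_key = e.key
      end
    end
    if min_key == nil then break end
    -- Consume every version of min_key and keep the one with the largest LSN.
    local best
    for i, run in ipairs(sources) do
      while run[pos[i]] and run[pos[i]].key == min_key do
        local e = run[pos[i]]
        if best == nil or e.lsn > best.lsn then best = e end
        pos[i] = pos[i] + 1
      end
    end
    if best.op ~= 'DELETE' then
      result[#result + 1] = { key = min_key, value = best.value }
    end
  end
  return result
end

local l0 = { { key = 25, lsn = 9, op = 'DELETE' },
             { key = 27, lsn = 8, op = 'REPLACE', value = 'c2' } }
local l1 = { { key = 20, lsn = 1, op = 'REPLACE', value = 'a' },
             { key = 25, lsn = 3, op = 'REPLACE', value = 'b' },
             { key = 27, lsn = 4, op = 'REPLACE', value = 'c' },
             { key = 29, lsn = 5, op = 'REPLACE', value = 'd' },
             { key = 31, lsn = 6, op = 'REPLACE', value = 'e' } }
for _, t in ipairs(range_scan({ l0, l1 }, 24, 30)) do
  print(t.key, t.value)   -- 27 c2, then 29 d (key 25 is deleted in L0)
end
```

For brevity, the smallest key is found with a linear pass over the sources; a heap over the source cursors is the usual choice when there are many runs.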
Why would one store deletions? And why doesn’t it lead to a tree overflow in the
case of for i=1,10000000 put(i) delete(i) end?
With regards to lookups, deletions signal the absence of a value being searched;
with compactions, they clear the tree of “garbage” records with older LSNs.
While the data is in RAM only, there’s no need to store deletions. Similarly,
you don’t need to keep them following a compaction if they affect, among other
things, the lowest tree level, which contains the oldest dump. Indeed, if a
value can’t be found at the lowest level, then it doesn’t exist in the tree.
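The rule for when a tombstone may disappear can be made concrete with a compaction sketch. This is a simplified model in plain Lua, not vinyl’s compaction code: it assumes runs are arrays of entries with hypothetical fields key, lsn, op ('REPLACE' or 'DELETE') and value, keeps only the newest version of each key, and drops a tombstone only when the output becomes the lowest level, because no older version can exist below it.

```lua
-- For brevity, entries are collected in a table keyed by key instead of being
-- streamed through a k-way merge as a real implementation would do.
local function compact(sources, output_is_last_level)
  local newest, keys = {}, {}
  for _, run in ipairs(sources) do
    for _, e in ipairs(run) do
      local cur = newest[e.key]
      if cur == nil then
        newest[e.key] = e
        keys[#keys + 1] = e.key
      elseif e.lsn > cur.lsn then
        newest[e.key] = e          -- only the newest version of a key survives
      end
    end
  end
  table.sort(keys)
  local out = {}
  for _, k in ipairs(keys) do
    local e = newest[k]
    -- A tombstone must be kept while older versions may still exist in lower
    -- levels; once the output is the lowest level, it can be dropped.
    if not (e.op == 'DELETE' and output_is_last_level) then
      out[#out + 1] = e
    end
  end
  return out
end

local newer = { { key = 1, lsn = 10, op = 'DELETE' },
                { key = 2, lsn = 11, op = 'REPLACE', value = 'b2' } }
local older = { { key = 1, lsn = 1, op = 'REPLACE', value = 'a' },
                { key = 2, lsn = 2, op = 'REPLACE', value = 'b' },
                { key = 3, lsn = 3, op = 'REPLACE', value = 'c' } }
local middle = compact({ newer, older }, false)  -- keys 1 (tombstone), 2, 3
local bottom = compact({ newer, older }, true)   -- keys 2, 3
```

In the last-level case, the insertion of key 1 and its later deletion both vanish, which is why a loop of put/delete pairs does not make the tree grow without bound.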
- We can’t delete from append-only files
- Tombstones (delete markers) are inserted into L0 instead
Deletion, step 1: a tombstone is inserted into L0
Deletion, step 2: the tombstone passes through intermediate levels
Deletion, step 3: in the case of a major compaction, the tombstone is removed from the tree
If a deletion is known to come right after the insertion of a unique value,
which is often the case when modifying a value in a secondary index, then the
deletion can safely be filtered out while compacting intermediate tree levels.
This optimization is implemented in vinyl.
Advantages of an LSM tree
Apart from decreasing write amplification, the approach that involves
periodically dumping level L0 and compacting levels L1-Lk has a few advantages
over the approach to writes adopted by B-trees:
- Dumps and compactions write relatively large files: typically, the L0 size
  is 50-100 MB, which is thousands of times larger than the size of a B-tree
  block.
- This large size allows efficiently compressing data before writing it.
  Tarantool compresses data automatically, which further decreases write
  amplification.
- There is no fragmentation overhead, since there’s no padding or empty space
  between the elements inside a run.
- All operations create new runs instead of modifying older data in place.
  This allows avoiding those nasty locks that everyone hates so much. Several
  operations can run in parallel without causing any conflicts. This also
  simplifies making backups and moving data to replicas.
- Storing older versions of data allows for the efficient implementation of
  transaction support by using multiversion concurrency control.
Disadvantages of an LSM tree and how to deal with them
One of the key advantages of the B-tree as a search data structure is its
predictability: all operations take no longer than $O(\log_B N)$ to run, where
$N$ is the number of stored elements and $B$ is the number of keys per block.
Conversely, in a classical LSM tree, both read and write speeds can differ by a
factor of hundreds (best case scenario) or even thousands (worst case scenario).
For example, adding just one element to L0 can cause it to overflow, which can
trigger a chain reaction in levels L1, L2, and so on. Lookups may find the
needed element in L0 or may need to scan all of the tree levels. It’s also
necessary to optimize reads within a single level to achieve speeds comparable
to those of a B-tree. Fortunately, most disadvantages can be mitigated or even
eliminated with additional algorithms and data structures. Let’s take a closer
look at these disadvantages and how they’re dealt with in Tarantool.
Unpredictable write speed
In an LSM tree, insertions almost always affect L0 only. How do you avoid idle
time when the memory area allocated for L0 is full?
Clearing L0 involves two lengthy operations: writing to disk and memory
deallocation. To avoid idle time while L0 is being dumped, Tarantool uses
writeaheads, reserving part of L0 in advance for the writes that arrive during
a dump. Suppose the L0 size is 256 MB and the disk write speed is 10 MB/s. Then
it would take about 26 seconds to dump L0. If the insertion speed is 10,000 RPS
and each key is 100 bytes in size, new data arrives at 1 MB/s, so while L0 is
being dumped, it’s necessary to reserve 26 MB of RAM, effectively slicing the
L0 size down to 230 MB.
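The arithmetic of this example can be written out as a tiny function. It is only a worked version of the numbers above, not Tarantool’s internal estimator (which, as described below, also tracks a rolling workload average and a disk speed histogram). It assumes 1 MB = 10^6 bytes and a disk speed in MB/s; the function name l0_reserve is hypothetical.

```lua
-- Worked version of the L0 reservation example (hypothetical helper).
local function l0_reserve(l0_size_mb, disk_mb_per_s, rps, bytes_per_request)
  local dump_time_s = l0_size_mb / disk_mb_per_s               -- time to write L0
  local ingest_mb_per_s = rps * bytes_per_request / 1e6        -- incoming data rate
  local reserve_mb = ingest_mb_per_s * dump_time_s             -- arrives during the dump
  return dump_time_s, reserve_mb, l0_size_mb - reserve_mb
end

local t, reserve, usable = l0_reserve(256, 10, 10000, 100)
print(('dump %.1f s, reserve %.1f MB, usable L0 %.1f MB'):format(t, reserve, usable))
-- prints: dump 25.6 s, reserve 25.6 MB, usable L0 230.4 MB
```

The 26 seconds and 26 MB in the text are these values rounded up.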
Tarantool does all of these calculations automatically, constantly updating the
rolling average of the DBMS workload and the histogram of the disk speed. This
allows using L0 as efficiently as possible and it prevents write requests from
timing out. But in the case of workload surges, some wait time is still
possible. That’s why we also introduced an insertion timeout (the
vinyl_timeout parameter), which is set to 60 seconds by default. The write
operation itself is executed in dedicated threads. The number of these threads
(4 by default) is controlled by the vinyl_write_threads parameter. Having at
least two write threads allows dumps and compactions to run in parallel, which
is also necessary for ensuring system predictability.
In Tarantool, compactions are always performed independently of dumps, in a
separate execution thread. This is made possible by the append-only nature of an
LSM tree: once a run is dumped, it is never changed, and compactions simply create new
runs.
Delays can also be caused by L0 rotation and the deallocation of memory dumped
to disk: during a dump, L0 memory is owned by two operating system threads, a
transaction processing thread and a write thread. Even though no elements are
being added to the rotated L0, it can still be used for lookups. To avoid read
locks when doing lookups, the write thread doesn’t deallocate the dumped memory,
instead delegating this task to the transaction processor thread. Following a
dump, memory deallocation itself happens instantaneously: to achieve this, L0
uses a special allocator that deallocates all of the memory with a single
operation.
- anticipatory dump
- throttling
The dump is performed from the so-called “shadow” L0 without blocking new
insertions and lookups
Optimizing reads is the most difficult optimization task with regard to LSM
trees. The main complexity factor here is the number of levels: it not only
makes lookups much slower, but any optimization that compensates for it also
tends to require significantly more RAM. Fortunately, the append-only nature
of LSM trees allows us
to address these problems in ways that would be nontrivial for traditional data
structures.
- page index
- bloom filters
- tuple range cache
- multi-level compaction
Compression and page index
In B-trees, data compression is either the hardest problem to crack or a great
marketing tool, rather than something really useful. In LSM trees, compression
works as follows:
During a dump or compaction, all of the data within a single run is split into
pages. The page size (in bytes) is controlled by the vinyl_page_size
parameter and can be set separately for each index. A page doesn’t have to be
exactly vinyl_page_size bytes: depending on the data it holds, it can be a
little bit smaller or larger. Because of this, pages never have any empty space
inside.
Data is compressed by Facebook’s streaming algorithm called “zstd”. The first
key of each page, along with the page offset, is added to a “page index”, which
is a separate file that allows the quick retrieval of any page. After a dump or
compaction, the page index of the created run is also written to disk.
All .index files are cached in RAM, which allows finding the necessary page
with a single lookup in a .run file (in vinyl, this is the extension of files
resulting from a dump or compaction). Since data within a page is sorted, after
it’s read and decompressed, the needed key can be found using a regular binary
search. Decompression and reads are handled by separate threads, whose number
is controlled by the vinyl_read_threads parameter.
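The two-step lookup (page index first, then a binary search inside one page) can be sketched in plain Lua. This is an in-memory model under stated assumptions: a run is a hypothetical table with an index array holding the first key of each page and a pages array of already decompressed, sorted {key, value} pairs. File I/O and zstd decompression are left out, and the names find_page, find_in_page and run_lookup are hypothetical.

```lua
-- Binary search in the page index: the last page whose first key is <= key.
local function find_page(index, key)
  local lo, hi, found = 1, #index, nil
  while lo <= hi do
    local mid = math.floor((lo + hi) / 2)
    if index[mid] <= key then
      found = mid
      lo = mid + 1
    else
      hi = mid - 1
    end
  end
  return found   -- nil if key precedes the first key of the run
end

-- Binary search inside a single (already decompressed) page.
local function find_in_page(page, key)
  local lo, hi = 1, #page
  while lo <= hi do
    local mid = math.floor((lo + hi) / 2)
    local k = page[mid][1]
    if k == key then return page[mid][2] end
    if k < key then lo = mid + 1 else hi = mid - 1 end
  end
  return nil
end

local function run_lookup(run, key)
  local p = find_page(run.index, key)
  if p == nil then return nil end
  -- Only this one page would be read from the .run file and decompressed.
  return find_in_page(run.pages[p], key)
end

local run = {
  index = { 1, 40, 90 },
  pages = {
    { { 1, 'a' }, { 20, 'b' } },
    { { 40, 'c' }, { 60, 'd' } },
    { { 90, 'e' } },
  },
}
print(run_lookup(run, 60))  -- d
print(run_lookup(run, 50))  -- nil: page 2 is read, but the key is not there
```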
Tarantool uses a universal file format: for example, the format of a .run file
is no different from that of an .xlog file (log file). This simplifies backup
and recovery as well as the usage of external tools.
Even though using a page index enables scanning fewer pages per run when doing a
lookup, it’s still necessary to traverse all of the tree levels. There’s a
special case, which involves checking if particular data is absent when scanning
all of the tree levels and it’s unavoidable: I’m talking about insertions into a
unique index. If the data being inserted already exists, then inserting the same
data into a unique index should lead to an error. The only way to throw an error
in an LSM tree before a transaction is committed is to do a search before
inserting the data. Such reads form a class of their own in the DBMS world and
are called “hidden” or “parasitic” reads.
Another operation leading to hidden reads is updating a value in a field on
which a secondary index is defined. Secondary keys are regular LSM trees that
store differently ordered data. In most cases, in order not to have to store all
of the data in all of the indexes, a value associated with a given key is kept
in whole only in the primary index (any index that stores both a key and a value
is called “covering” or “clustered”), whereas the secondary index only stores
the fields on which a secondary index is defined, and the values of the fields
that are part of the primary index. Thus, each time a change is made to a value
in a field on which a secondary index is defined, it’s necessary to first remove
the old key from the secondary index—and only then can the new key be inserted.
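At update time, the old value is unknown, and it is this value that needs to be
read from the primary index “under the hood”.
For example, if a secondary index is defined on city, the following statement
has to read the old city of the row with id=1 before it can replace its key:
update t1 set city='Moscow' where id=1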
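The hidden read in this example can be illustrated with a toy model in plain Lua. It assumes a hypothetical table t1(id, city, name) where the primary index maps id to the full tuple and the secondary index on city stores only the indexed field plus the primary key (encoded here as "city|id"); plain Lua tables stand in for the LSM trees, and all names are hypothetical.

```lua
local primary, by_city = {}, {}

local function sk(city, id) return city .. '|' .. tostring(id) end

local function insert(tuple)
  primary[tuple.id] = tuple
  by_city[sk(tuple.city, tuple.id)] = true
end

-- update t1 set city = new_city where id = id
local function update_city(id, new_city)
  local old = primary[id]          -- the hidden read: the old city is not in the request
  if old == nil then return false end
  by_city[sk(old.city, id)] = nil  -- in an LSM tree this would be a tombstone
  local new = {}
  for k, v in pairs(old) do new[k] = v end
  new.city = new_city
  primary[id] = new
  by_city[sk(new_city, id)] = true
  return true
end

insert({ id = 1, city = 'Paris', name = 'Ivanov' })
update_city(1, 'Moscow')
```

Without the read of primary[1], there would be no way to know that the secondary key "Paris|1" has to be removed.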
To minimize the number of disk reads, especially for nonexistent data, nearly
all LSM trees use probabilistic data structures, and Tarantool is no exception.
A classical Bloom filter is made up of several (usually 3-to-5) bit arrays. When
data is written, several hash functions are calculated for each key in order to
get corresponding array positions. The bits at these positions are then set to
1. Due to possible hash collisions, some bits might be set to 1 twice. We’re
most interested in the bits that remain 0 after all keys have been added. When
looking for an element within a run, the same hash functions are applied to
produce bit positions in the arrays. If any of the bits at these positions is 0,
then the element is definitely not in the run. A false positive occurs only if
a key that is absent from the run hits a 1 bit in every array. Since the hash
functions are treated as independent, the probability of this happening in all
of the bit arrays at once is small, and it decreases as the arrays grow.
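The filter described above, with one bit array per hash function, can be sketched in plain Lua. This is a toy model, not Tarantool’s implementation: the hash is a simple polynomial string hash with a per-array seed (an assumption made for readability; real implementations use stronger hashes), bits are stored as Lua booleans, and the function names are hypothetical.

```lua
-- Toy seeded string hash returning a 1-based position in an array of `size` bits.
local function hash(key, seed, size)
  local h = seed
  local s = tostring(key)
  for i = 1, #s do
    h = (h * 131 + s:byte(i) + seed) % 1000003
  end
  return h % size + 1
end

-- One bit array per hash function.
local function bloom_new(num_arrays, bits_per_array)
  local f = { k = num_arrays, m = bits_per_array, arrays = {} }
  for i = 1, num_arrays do f.arrays[i] = {} end
  return f
end

local function bloom_add(f, key)
  for i = 1, f.k do
    f.arrays[i][hash(key, i * 7919, f.m)] = true
  end
end

-- false means "definitely absent"; true means "maybe present".
local function bloom_maybe_contains(f, key)
  for i = 1, f.k do
    if not f.arrays[i][hash(key, i * 7919, f.m)] then
      return false
    end
  end
  return true
end

local f = bloom_new(3, 1024)
for k = 1, 100 do bloom_add(f, k) end
print(bloom_maybe_contains(f, 42))    -- true: added keys are never missed
print(bloom_maybe_contains(f, 'x'))   -- usually false; true would be a false positive
```

A lookup only needs to read the run from disk when the filter answers true.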
The key advantage of Bloom filters in Tarantool is that they’re easily
configurable. The only parameter that can be specified separately for each index
is called vinyl_bloom_fpr (FPR stands for “false positive ratio”) and it has the
default value of 0.05, which translates to a 5% FPR. Based on this parameter,
Tarantool automatically creates Bloom filters of the optimal size for partial-
key and full-key searches. The Bloom filters are stored in the .index file,
along with the page index, and are cached in RAM.
A lot of people think that caching is a silver bullet that can help with any
performance issue. “When in doubt, add more cache”. In vinyl, caching is viewed
rather as a means of reducing the overall workload and consequently, of getting
a more stable response time for those requests that don’t hit the cache. vinyl
boasts a unique type of cache among transactional systems called a “range tuple
cache”. Unlike, say, RocksDB or MySQL, this cache doesn’t store pages, but
rather ranges of index values obtained from disk, after having performed a
compaction spanning all tree levels. This allows the use of caching for both
single-key and key-range searches. Since this method of caching stores only hot
data and not, say, pages (you may need only some data from a page), RAM is used
in the most efficient way possible. The cache size is controlled by the
vinyl_cache parameter.
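The difference between caching pages and caching key ranges can be illustrated with a toy model. This is a hypothetical sketch in plain Lua, not vinyl’s cache: each entry records that a whole key interval [lo, hi] has already been read from disk (after merging all tree levels), together with the tuples found in it. Because the interval is known to be complete, a miss inside it also proves the key is absent, and range scans inside it need no disk access. All names are hypothetical.

```lua
local intervals = {}

local function cache_put(lo, hi, tuples)   -- tuples: key -> value
  intervals[#intervals + 1] = { lo = lo, hi = hi, tuples = tuples }
end

local function covering(lo, hi)
  for _, iv in ipairs(intervals) do
    if lo >= iv.lo and hi <= iv.hi then return iv end
  end
  return nil
end

-- Point lookup: (true, value) on a hit, where value == nil means the key is
-- known to be absent; false on a miss (the LSM tree must be consulted).
local function cache_get(key)
  local iv = covering(key, key)
  if iv == nil then return false end
  return true, iv.tuples[key]
end

-- Range lookup: sorted keys in [lo, hi], or nil on a miss.
local function cache_range(lo, hi)
  local iv = covering(lo, hi)
  if iv == nil then return nil end
  local keys = {}
  for k in pairs(iv.tuples) do
    if k >= lo and k <= hi then keys[#keys + 1] = k end
  end
  table.sort(keys)
  return keys
end

cache_put(10, 20, { [12] = 'a', [15] = 'b', [19] = 'c' })
```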
Garbage collection control
Chances are that by now you’ve started losing focus and need a well-deserved
dopamine reward. Feel free to take a break, since working through the rest of
the article is going to take some serious mental effort.
An LSM tree in vinyl is just a small piece of the puzzle. Even with a single
table (or so-called “space”), vinyl creates and maintains several LSM trees, one
for each index. But even a single index can be comprised of dozens of LSM trees.
Let’s try to understand why this might be necessary.
Recall our example with a tree containing 100,000,000 records, 100 bytes each.
As time passes, the lowest LSM level may end up holding a 10 GB run. During
compaction, a temporary run of approximately the same size will be created. Data
at intermediate levels takes up some space as well, since the tree may store
several operations associated with a single key. In total, storing 10 GB of
actual data may require up to 30 GB of free space: 10 GB for the last tree
level, 10 GB for a temporary run, and 10 GB for the remaining data. But what if
the data size is not 10 GB, but 1 TB? Requiring that the available disk space
always be several times greater than the actual data size is financially
impractical, not to mention that it may take dozens of hours to create a 1 TB
run. And in the case of an emergency shutdown or system restart, the process
would have to be started from scratch.
Here’s another scenario. Suppose the primary key is a monotonically increasing
sequence—for example, a time series. In this case, most insertions will fall
into the right part of the key range, so it wouldn’t make much sense to do a
compaction just to append a few million more records to an already huge run.
But what if writes predominantly occur in a particular region of the key range,
whereas most reads take place in a different region? How do you optimize the
form of the LSM tree in this case? If it’s too high, read performance is
impacted; if it’s too low—write speed is reduced.
Tarantool “factorizes” this problem by creating multiple LSM trees for each
index. The approximate size of each subtree may be controlled by the
vinyl_range_size configuration parameter. We call such
subtrees “ranges”.
Factorizing large LSM trees via ranging
- Ranges reflect a static layout of sorted runs
- Slices connect a sorted run into a range
Initially, when the index has few elements, it consists of a single range. As more
elements are added, its total size may exceed
the maximum range size. In that case a
special operation called “split” divides the tree into two equal parts. The tree
is split at the middle element in the range of keys stored in the tree. For
example, if the tree initially stores the full range of -inf…+inf, then after
splitting it at the middle key X, we get two subtrees: one that stores the range
of -inf…X, and the other storing the range of X…+inf. With this approach, we
always know which subtree to use for writes and which one for reads. If the tree
contained deletions and each of the neighboring ranges grew smaller as a result,
the opposite operation called “coalesce” combines two neighboring trees into
one.
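Routing keys to ranges and splitting an oversized range can be sketched in plain Lua. This is a hypothetical model: ranges are kept sorted by lower bound, a range covers [its lower bound, the next range’s lower bound), and the size limit is counted in keys rather than bytes as a stand-in for vinyl_range_size. Unlike vinyl, which only rewrites metadata on a split, this toy moves the keys themselves.

```lua
local MAX_KEYS = 4   -- stand-in for vinyl_range_size (counted in keys here)
local ranges = { { lower = -math.huge, keys = {} } }

-- Binary search for the last range whose lower bound is <= key.
local function find_range(key)
  local lo, hi = 1, #ranges
  while lo < hi do
    local mid = math.floor((lo + hi + 1) / 2)
    if ranges[mid].lower <= key then lo = mid else hi = mid - 1 end
  end
  return lo
end

-- Split range i at its middle key: [lower, middle) and [middle, next lower).
local function split(i)
  local r = ranges[i]
  table.sort(r.keys)
  local middle = r.keys[math.floor(#r.keys / 2) + 1]
  local left, right = {}, {}
  for _, k in ipairs(r.keys) do
    if k < middle then left[#left + 1] = k else right[#right + 1] = k end
  end
  ranges[i] = { lower = r.lower, keys = left }
  table.insert(ranges, i + 1, { lower = middle, keys = right })
end

local function insert(key)
  local i = find_range(key)
  table.insert(ranges[i].keys, key)
  if #ranges[i].keys > MAX_KEYS then split(i) end
end

for k = 1, 5 do insert(k) end   -- the fifth key triggers a split at key 3
```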
Split and coalesce don’t entail a compaction, the creation of new runs, or other
resource-intensive operations. An LSM tree is just a collection of runs. vinyl
has a special metadata log that helps keep track of which run belongs to which
subtree(s). This has the .vylog extension and its format is compatible with an
.xlog file. Similarly to an .xlog file, the metadata log gets rotated at each
checkpoint. To avoid the creation of extra runs with split and coalesce, we have
also introduced an auxiliary entity called “slice”. It’s a reference to a run
containing a key range and it’s stored only in the metadata log. Once the
reference counter drops to zero, the corresponding file gets removed. When it’s
necessary to perform a split or to coalesce, Tarantool creates slice objects for
each new tree, removes older slices, and writes these operations to the metadata
log, which literally stores records that look like this: <tree id, slice id>
or <slice id, run id, min, max>.
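The slice bookkeeping can be illustrated with a hypothetical model in plain Lua: a split replaces one slice with two slices that reference the same run, so no data is copied, and a run becomes removable only when its reference count drops to zero. This is a sketch of the idea, not vinyl’s metadata log format; all names are hypothetical.

```lua
local runs = {}      -- run_id -> { refs = n, removed = bool }
local slices = {}    -- slice_id -> { run = run_id, min = k, max = k }
local next_slice = 1

local function new_run(id) runs[id] = { refs = 0, removed = false } end

local function add_slice(run_id, min, max)
  local id = next_slice
  next_slice = next_slice + 1
  slices[id] = { run = run_id, min = min, max = max }
  runs[run_id].refs = runs[run_id].refs + 1
  return id
end

local function drop_slice(id)
  local run = runs[slices[id].run]
  slices[id] = nil
  run.refs = run.refs - 1
  if run.refs == 0 then run.removed = true end  -- the .run file could now be deleted
end

-- Split: two new slices over the same run replace the old one.
local function split_slice(id, middle)
  local s = slices[id]
  local left = add_slice(s.run, s.min, middle)
  local right = add_slice(s.run, middle, s.max)
  drop_slice(id)
  return left, right
end

new_run('run-1')
local whole = add_slice('run-1', -math.huge, math.huge)
local left, right = split_slice(whole, 50)
print(runs['run-1'].refs, runs['run-1'].removed)  -- 2  false
drop_slice(left)
drop_slice(right)
print(runs['run-1'].refs, runs['run-1'].removed)  -- 0  true
```

The new slices are added before the old one is dropped, so the run’s reference count never reaches zero in the middle of a split.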
This way all of the heavy lifting associated with splitting a tree into two
subtrees is postponed until a compaction and then is performed automatically. A
huge advantage of dividing all of the keys into ranges is the ability to
independently control the L0 size as well as the dump and compaction processes
for each subtree, which makes these processes manageable and predictable. Having
a separate metadata log also simplifies the implementation of both “truncate”
and “drop”. In vinyl, they’re processed instantly, since they only work with the
metadata log, while garbage collection is done in the background.
Advanced features of vinyl
In the previous sections, we mentioned only two operations stored by an
LSM tree: deletion and replacement. Let’s take a look at how all of the other
operations can be represented. An insertion can be represented via a
replacement—you just need to make sure there are no other elements with the
specified key. To perform an update, it’s necessary to read the older value from
the tree, so it’s easier to represent this operation as a replacement as
well—this speeds up future read requests by the key. Besides, an update must
return the new value, so there’s no avoiding hidden reads.
In B-trees, the cost
of hidden reads is negligible: to update a block, it first needs to be read from
disk anyway. Creating a special update operation for an LSM tree that doesn’t
cause any hidden reads is really tempting.
Such an operation must contain not
only a default value to be inserted if a key has no value yet, but also a list
of update operations to perform if a value does exist.
At transaction execution
time, Tarantool just saves the operation in an LSM tree, then “executes” it
later, during a compaction.
The upsert operation:
space:upsert(tuple, {{operator, field, value}, ... })
- Non-reading update or insert
- Delayed execution
- Background upsert squashing prevents upserts from piling up
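The deferred execution of an upsert can be modeled in plain Lua. This sketch is a simplification under stated assumptions: tuples are Lua arrays, only the '+' and '=' operators are supported (Tarantool supports more), and an upsert is stored as a record that is replayed later on top of whatever base value is found, instead of being applied at write time. The names apply_ops, make_upsert and replay are hypothetical.

```lua
local function apply_ops(tuple, ops)
  local t = {}
  for i, v in ipairs(tuple) do t[i] = v end   -- copy; never mutate the input
  for _, op in ipairs(ops) do
    local operator, field, value = op[1], op[2], op[3]
    if operator == '+' then
      t[field] = t[field] + value
    elseif operator == '=' then
      t[field] = value
    else
      error('unsupported operator ' .. tostring(operator))
    end
  end
  return t
end

-- What the tree stores: nothing is read at write time.
local function make_upsert(default_tuple, ops)
  return { type = 'UPSERT', tuple = default_tuple, ops = ops }
end

-- Replay a chain of upserts (oldest first) on top of the base value,
-- which is nil if the key did not exist.
local function replay(base, chain)
  local value = base
  for _, u in ipairs(chain) do
    if value == nil then value = u.tuple else value = apply_ops(value, u.ops) end
  end
  return value
end

-- Three upserts on a counter, e.g. space:upsert({'k', 1}, {{'+', 2, 1}}):
local chain = {
  make_upsert({ 'k', 1 }, { { '+', 2, 1 } }),
  make_upsert({ 'k', 1 }, { { '+', 2, 1 } }),
  make_upsert({ 'k', 1 }, { { '+', 2, 5 } }),
}
print(replay(nil, chain)[2])          -- key did not exist: 1 + 1 + 5 = 7
print(replay({ 'k', 10 }, chain)[2])  -- key existed with 10: 10 + 1 + 1 + 5 = 17
```

Errors such as adding a number to a string only surface inside apply_ops during replay, which illustrates why error handling is limited for deferred upserts.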
Unfortunately, postponing the operation execution until a
compaction doesn’t leave much leeway in terms of error handling. That’s why
Tarantool tries to validate upserts as fully as possible before writing them to
an LSM tree. However, some checks are only possible with older data on hand, for
example when the update operation is trying to add a number to a string or to
remove a field that doesn’t exist.
A semantically similar operation exists in
many products including PostgreSQL and MongoDB. But anywhere you look, it’s just
syntactic sugar that combines the update and replace operations without avoiding
hidden reads. Most probably, the reason is that LSM trees as data storage structures
are relatively new.
Even though an upsert is a very important optimization and
implementing it cost us a lot of blood, sweat, and tears, we must admit that it
has limited applicability. If a table contains secondary keys or triggers,
hidden reads can’t be avoided. But if you have a scenario where secondary keys
are not required and the update following the transaction completion will
certainly not cause any errors, then the operation is for you.
I’d like to tell
you a short story about an upsert. It takes place back when vinyl was only
beginning to “mature” and we were using an upsert in production for the first
time. We had what seemed like an ideal environment for it: we had tons of keys,
the current time was being used as values; update operations were inserting keys
or modifying the current time; and we had few reads. Load tests yielded great
results.
Nevertheless, after a couple of days, the Tarantool process started
eating up 100% of our CPU, and the system performance dropped close to zero.
We
started digging into the issue and found out that the distribution of requests
across keys was significantly different from what we had seen in the test
environment. It was…well, quite nonuniform. Most keys were updated once or
twice a day, so the database was idle for the most part, but there were much
hotter keys with tens of thousands of updates per day. Tarantool handled those
just fine. But in the case of lookups by key with tens of thousands of upserts,
things quickly went downhill. To return the most recent value, Tarantool had to
read and “replay” the whole history consisting of all of the upserts. When
designing upserts, we had hoped this would happen automatically during a
compaction, but the process never even got to that stage: the L0 size was more
than enough, so there were no dumps.
We solved the problem by adding a
background process that performed readaheads on any keys that had more than a
few dozen upserts piled up, so all those upserts were squashed and substituted
with the read value.
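The effect of that background process can be shown with a hypothetical model in plain Lua: pending upserts are kept per key, and once a key accumulates more than a threshold of them they are replayed once and replaced by a single value, so a read never has to replay more than the threshold. Upserts are simplified to numeric increments, and the threshold of 32 only stands in for “a few dozen”; the real value and mechanism are not given in the text.

```lua
local SQUASH_THRESHOLD = 32   -- assumption: stands in for "a few dozen"
local pending = {}            -- key -> list of increments (simplified upserts)
local base = {}               -- key -> last squashed value

local function upsert(key, delta)
  pending[key] = pending[key] or {}
  table.insert(pending[key], delta)
  if #pending[key] > SQUASH_THRESHOLD then
    local v = base[key] or 0
    for _, d in ipairs(pending[key]) do v = v + d end
    base[key] = v             -- a single value now stands for the whole chain
    pending[key] = {}
  end
end

-- A read replays at most SQUASH_THRESHOLD pending upserts.
local function get(key)
  local v = base[key]
  for _, d in ipairs(pending[key] or {}) do v = (v or 0) + d end
  return v
end

for _ = 1, 100 do upsert('hot', 1) end
print(get('hot'), #pending.hot)   -- 100  1
```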
Update is not the only operation where
optimizing hidden reads is critical. Even the replace operation, given secondary
keys, has to read the older value: it needs to be independently deleted from the
secondary indexes, and inserting a new element might not do this, leaving some
garbage behind.
If secondary indexes are not unique, then collecting “garbage” from them can be
put off until a compaction, which is what we do in Tarantool. The
append-only nature of LSM trees allowed us to implement full-blown serializable
transactions in vinyl. Read-only requests use older versions of data without
blocking any writes. The transaction manager itself is fairly simple for now: in
classical terms, it implements the MVTO (multiversion timestamp ordering) class,
whereby the winning transaction is the one that finished earlier. There are no
locks and associated deadlocks. Strange as it may seem, this is a drawback
rather than an advantage: with parallel execution, you can increase the number
of successful transactions by simply holding some of them on lock when
necessary. We’re planning to improve the transaction manager soon. In the
current release, we focused on making the algorithm behave 100% correctly and
predictably. For example, our transaction manager is one of the few on the NoSQL
market that supports so-called “gap locks”.
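Read-only access to older versions and the rule that the transaction finishing first wins can be illustrated with a toy model. This is a minimal sketch in plain Lua under stated assumptions, not vinyl’s transaction manager: every committed write gets the next LSN, a transaction reads the newest version not newer than the LSN at which it started, and a transaction fails to commit if a key it read was overwritten in the meantime (a first-committer-wins check). Gap locks and range reads are not modeled, and all names are hypothetical.

```lua
local versions = {}   -- key -> array of { lsn = n, value = v }, ascending by lsn
local last_lsn = 0

-- Newest committed value of key that is not newer than view_lsn.
local function read(key, view_lsn)
  local result
  for _, ver in ipairs(versions[key] or {}) do
    if ver.lsn > view_lsn then break end
    result = ver.value
  end
  return result
end

local function begin()
  return { view = last_lsn, reads = {}, writes = {} }
end

local function tx_get(tx, key)
  tx.reads[key] = true
  if tx.writes[key] ~= nil then return tx.writes[key] end
  return read(key, tx.view)
end

local function tx_set(tx, key, value) tx.writes[key] = value end

-- Fail if a key this transaction read has been overwritten by a transaction
-- that committed after our read view was taken.
local function commit(tx)
  for key in pairs(tx.reads) do
    local list = versions[key]
    if list and list[#list].lsn > tx.view then return false end
  end
  for key, value in pairs(tx.writes) do
    last_lsn = last_lsn + 1
    versions[key] = versions[key] or {}
    table.insert(versions[key], { lsn = last_lsn, value = value })
  end
  return true
end

local t0 = begin(); tx_set(t0, 'x', 1); commit(t0)
local a, b = begin(), begin()
tx_set(a, 'x', tx_get(a, 'x') + 1)
tx_set(b, 'x', tx_get(b, 'x') + 10)
local ok_a = commit(a)
local ok_b = commit(b)
print(ok_a, ok_b)                        -- true  false: a finished first and wins
print(read('x', 1), read('x', last_lsn)) -- 1  2: the old version is still readable
```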
Difference between memtx and vinyl storage engines
The primary difference between memtx and vinyl is that memtx is an in-memory
engine while vinyl is an on-disk engine. An in-memory storage engine is
generally faster (each query is usually run under 1 ms), and the memtx engine
is justifiably the default for Tarantool. But an on-disk engine such as vinyl is
preferable when the database is larger than the available memory, and adding more
memory is not a realistic option.
| Option | memtx | vinyl |
|---|---|---|
| Supported index type | TREE, HASH, RTREE or BITSET | TREE |
| Temporary spaces | Supported | Not supported |
| random() function | Supported | Not supported |
| alter() function | Supported | Supported starting from the 1.10.2 release (the primary index cannot be modified) |
| len() function | Returns the number of tuples in the space | Returns the maximum approximate number of tuples in the space |
| count() function | Takes a constant amount of time | Takes a variable amount of time depending on the state of the database |
| delete() function | Returns the deleted tuple, if any | Always returns nil |
| yield | Does not yield on select requests unless the transaction is committed to WAL | Yields on select requests or their equivalents: get() or pairs() |
Configuration
Tarantool provides the ability to configure the full topology of a cluster and set parameters specific to particular instances, such as connection settings, memory used to store data, logging, and snapshot settings.
Each instance uses this configuration during startup to organize the cluster.
There are two approaches to configuring Tarantool:
Since version 3.0: In the YAML format.
YAML configuration allows you to provide the full cluster topology and specify all configuration options.
You can use local configuration in a YAML file for each instance or store configuration data in a reliable centralized storage.
In version 2.11 and earlier: In code using the box.cfg API.
In this case, configuration is provided in a Lua initialization script.
Note
Starting with the 3.0 version, configuring Tarantool in code is considered a legacy approach.
YAML configuration describes the full topology of a Tarantool cluster.
A cluster’s topology includes the following elements, starting from the lower level:
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            # ...
          instance002:
            # ...
instances
An instance represents a single running Tarantool instance.
It stores data or might act as a router for handling CRUD requests in a sharded cluster.
replicasets
A replica set is a set of instances that operate on the same data set.
Replication provides redundancy and increases data availability.
groups
A group provides the ability to organize replica sets.
For example, in a sharded cluster, one group can contain storage instances and another group can contain routers used to handle CRUD requests.
You can flexibly configure a cluster’s settings on different levels: from global settings applied to all groups to parameters specific for concrete instances.
This section provides an overview of how to configure Tarantool in a YAML file.
Basic instance configuration
The example below shows a sample configuration of a single Tarantool instance:
# yaml-language-server: $schema=https://download.tarantool.org/tarantool/schema/config.schema.json
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
                - uri: '127.0.0.1:3301'

- The instances section includes only one instance named instance001.
  The iproto.listen.uri option sets an address used to listen for incoming requests.
- The replicasets section contains one replica set named replicaset001.
- The groups section contains one group named group001.
Note
The initial line in this sample contains a link to an annotated Tarantool configuration
schema for a YAML language server (e.g. for LSP-Yaml).
With this link you can set up your code editor (VS Code, Neovim, Sublime, etc.) to get
full-text annotations and completion prompts upon Alt+ESC (Linux) / Option+ESC (macOS)
when you work with Tarantool configuration.
This section shows how to control the scope to which a configuration option is applied.
Most of the configuration options can be applied to a specific instance, replica set, group, or to all instances globally.
Instance
To apply certain configuration options to a specific instance,
specify such options for this instance only.
In the example below, iproto.listen is applied to instance001 only.
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
                - uri: '127.0.0.1:3301'
Replica set
In this example, iproto.listen is in effect for all instances in replicaset001.
groups:
  group001:
    replicasets:
      replicaset001:
        iproto:
          listen:
            - uri: '127.0.0.1:3301'
        instances:
          instance001: { }
Group
In this example, iproto.listen is in effect for all instances in group001.
groups:
  group001:
    iproto:
      listen:
        - uri: '127.0.0.1:3301'
    replicasets:
      replicaset001:
        instances:
          instance001: { }
Global
In this example, iproto.listen is applied to all instances of the cluster.
iproto:
  listen:
    - uri: '127.0.0.1:3301'
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001: { }
Configuration scopes above are listed in the order of their precedence – from highest to lowest.
For example, if the same option is defined at the instance and global level, the instance’s value takes precedence over the global one.
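The precedence rule can be illustrated with a simplified model in plain Lua. This is not Tarantool’s config module: it represents the YAML configuration as nested Lua tables and, as an assumption, resolves each option path separately by checking the instance, replica set, group and global scopes in that order, which is consistent with the replica set example below where a global iproto.advertise.peer and an instance-level iproto.listen both apply. The names get_path and effective_option are hypothetical.

```lua
local function get_path(t, path)
  for _, name in ipairs(path) do
    if type(t) ~= 'table' then return nil end
    t = t[name]
  end
  return t
end

local function effective_option(cfg, group, replicaset, instance, path)
  local g = cfg.groups[group]
  local rs = g.replicasets[replicaset]
  local scopes = { rs.instances[instance], rs, g, cfg }  -- highest to lowest precedence
  for _, scope in ipairs(scopes) do
    local value = get_path(scope, path)
    if value ~= nil then return value end
  end
  return nil
end

local cfg = {
  iproto = { listen = { { uri = '127.0.0.1:3301' } } },   -- global scope
  groups = {
    group001 = {
      replicasets = {
        replicaset001 = {
          instances = {
            instance001 = {},
            instance002 = { iproto = { listen = { { uri = '127.0.0.1:3302' } } } },
          },
        },
      },
    },
  },
}
print(effective_option(cfg, 'group001', 'replicaset001', 'instance001', { 'iproto', 'listen' })[1].uri)  -- 127.0.0.1:3301
print(effective_option(cfg, 'group001', 'replicaset001', 'instance002', { 'iproto', 'listen' })[1].uri)  -- 127.0.0.1:3302
```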
Note
The Configuration reference contains information about scopes to which each configuration option can be applied.
Configuration scopes: Replica set example
The example below shows how specific configuration options work in different configuration scopes for a replica set with a manual failover.
You can learn more about configuring replication from Replication tutorials.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
iproto:
  advertise:
    peer:
      login: replicator
replication:
  failover: manual
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
                - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
                - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
                - uri: '127.0.0.1:3303'
credentials (global)
This section is used to create the replicator user and assign it the specified role.
These options are applied globally to all instances.
iproto (global, instance)
The iproto section is specified on both global and instance levels.
The iproto.advertise.peer option specifies the parameters used by an instance to connect to another instance as a replica, for example, a URI, a login and password, or SSL parameters.
In the example above, the option includes only the login.
A URI is taken from iproto.listen that is set on the instance level.
replication (global)
The replication.failover global option sets a manual failover for all replica sets.
leader (replica set)
The <replicaset-name>.leader option sets a master instance for replicaset001.
Enabling and configuring roles
An application role is a Lua module that implements specific functions or logic.
You can turn on or off a particular role for certain instances in a configuration without restarting these instances.
A role can be a built-in Tarantool role, a role provided by a third-party Lua module, or a custom role developed as a part of a cluster application.
This section describes how to enable and configure roles.
To learn how to develop custom roles, see Application roles.
To turn on or off a role for a specific instance or a set of instances, use the roles configuration option.
The example below shows how to enable the roles.crud-router role provided by the CRUD module using the roles option:
roles: [ roles.crud-router ]
Similarly, you can enable the roles.crud-storage role to make instances act as CRUD storages:
roles: [ roles.crud-storage ]
Example on GitHub: sharded_cluster_crud
The roles_cfg option allows you to specify the configuration for each role.
In this option, the role name is the key and the role configuration is the value.
The example below shows how to enable statistics on called operations by providing the roles.crud-router role’s configuration:
roles:
  - roles.crud-router
  - roles.metrics-export
roles_cfg:
  roles.crud-router:
    stats: true
    stats_driver: metrics
    stats_quantiles: true
Example on GitHub: sharded_cluster_crud_metrics
Roles and configuration scopes
As with most configuration options, roles and their configurations can be defined at different levels.
Because the roles option has the array type and roles_cfg has the map type, they are applied in slightly different ways:
For roles, the roles defined for an instance take precedence over roles defined at any other level.
In the example below, instance001 has only role3:
# ...
replicaset001:
  roles: [ role1, role2 ]
  instances:
    instance001:
      roles: [ role3 ]
Learn more about the order of precedence for different configuration scopes in Configuration scopes.
For roles_cfg, the following rules are applied:
If a configuration for the same role is provided at different levels, an instance configuration takes precedence over the configuration defined at another level.
In the example below, role1.greeting is 'Hi':
# ...
replicaset001:
  roles_cfg:
    role1:
      greeting: 'Hello'
  instances:
    instance001:
      roles: [ role1 ]
      roles_cfg:
        role1:
          greeting: 'Hi'
If the configurations for different roles are provided at different levels, both configurations are applied at the instance level.
In the example below, instance001 has role1.greeting set to 'Hi' and role2.farewell set to 'Bye':
# ...
replicaset001:
  roles_cfg:
    role1:
      greeting: 'Hi'
  instances:
    instance001:
      roles: [ role1, role2 ]
      roles_cfg:
        role2:
          farewell: 'Bye'
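A few lines of plain Lua can model the rules for roles and roles_cfg. This is a sketch under stated assumptions, not Tarantool's merging code. The level tables and helper names are hypothetical. A role's configuration is replaced as a whole when it is redefined at a more specific level. That is an assumption, because the examples above only show which value wins for the same key.

```lua
-- Hypothetical model of how roles and roles_cfg from different levels
-- combine for one instance; not Tarantool's merging code.
-- Levels go from the lowest precedence to the highest.
local levels = { 'global', 'group', 'replicaset', 'instance' }

-- roles: the array from the most specific level replaces the others.
local function effective_roles(cfg)
  local roles = {}
  for _, level in ipairs(levels) do
    local scope = cfg[level]
    if scope ~= nil and scope.roles ~= nil then
      roles = scope.roles
    end
  end
  return roles
end

-- roles_cfg: merged by role name; for the same role, the more specific
-- level wins (assumption: the role's configuration is replaced as a whole).
local function effective_roles_cfg(cfg)
  local roles_cfg = {}
  for _, level in ipairs(levels) do
    local scope = cfg[level]
    if scope ~= nil and scope.roles_cfg ~= nil then
      for role, role_cfg in pairs(scope.roles_cfg) do
        roles_cfg[role] = role_cfg
      end
    end
  end
  return roles_cfg
end

local cfg = {
  replicaset = {
    roles = { 'role3' },
    roles_cfg = {
      role1 = { greeting = 'Hello' },
      role2 = { farewell = 'Bye' },
    },
  },
  instance = {
    roles = { 'role1', 'role2' },
    roles_cfg = { role1 = { greeting = 'Hi' } },
  },
}

local roles = effective_roles(cfg)          -- { 'role1', 'role2' }
local roles_cfg = effective_roles_cfg(cfg)  -- role1: 'Hi', role2: 'Bye'
```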
Labels allow adding custom attributes to your cluster configuration. A label is
an arbitrary key: value pair with a string key and value.
labels:
  dc: 'east'
  production: 'false'
Labels can be defined in any configuration scope. An instance receives labels from
all scopes it belongs to. The labels section in a group or a replica set scope
applies to all instances of that group or replica set. To override these labels on
the instance level or add instance-specific labels, define another labels section in the instance scope.
groups:
  group001:
    replicasets:
      replicaset001:
        labels:
          dc: 'east'
          production: 'false'
        instances:
          instance001:
            labels:
              rack: '10'
              production: 'true'
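Label inheritance works like a key-by-key merge, going from the least specific scope to the most specific one. The following plain-Lua sketch uses a hypothetical merge_labels() helper to compute the labels that instance001 receives in this example. In a running instance, you would read the effective labels with config:get() instead.

```lua
-- Hypothetical model of label inheritance; not Tarantool's code.
-- Arguments go from the least specific scope to the most specific one,
-- so keys from later scopes override the same keys from earlier ones.
local function merge_labels(...)
  local result = {}
  for i = 1, select('#', ...) do
    local labels = select(i, ...)
    if labels ~= nil then
      for key, value in pairs(labels) do
        result[key] = value
      end
    end
  end
  return result
end

local replicaset_labels = { dc = 'east', production = 'false' }
local instance_labels = { rack = '10', production = 'true' }

local labels = merge_labels(replicaset_labels, instance_labels)
-- labels: dc = 'east', production = 'true', rack = '10'
```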
Example on GitHub: labels
To access instance labels from the application code, call the config:get() function:
local labels = require('config'):get('labels')
Labels can be used to direct function calls to instances that match certain criteria
using the connpool module.
In a configuration file, you can use the following predefined variables that are replaced with actual values at runtime:
instance_name
replicaset_name
group_name
To reference these variables in a configuration file, enclose them in double curly braces, separating the variable name from the braces with whitespace.
In the example below, {{ instance_name }} is replaced with instance001.
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            snapshot:
              dir: ./var/{{ instance_name }}/snapshots
            wal:
              dir: ./var/{{ instance_name }}/wals
As a result, the paths to snapshots and write-ahead logs differ for different instances.
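The substitution can be thought of as a plain string replacement. The sketch below is a hypothetical model, not Tarantool's code. expand() replaces each {{ name }} reference in the documented form (one space inside each pair of braces) with the value from a variables table, and it raises an error for an unknown name. The sketch does not cover how Tarantool itself treats unknown names or other spacing.

```lua
-- Hypothetical model of {{ variable }} substitution; not Tarantool's code.
-- Only the documented form, one space inside each pair of braces, is matched.
local function expand(value, vars)
  return (value:gsub('{{ ([%w_]+) }}', function(name)
    local replacement = vars[name]
    if replacement == nil then
      error('unknown variable: ' .. name)
    end
    return replacement
  end))
end

local vars = {
  instance_name = 'instance001',
  replicaset_name = 'replicaset001',
  group_name = 'group001',
}

local snapshot_dir = expand('./var/{{ instance_name }}/snapshots', vars)
local wal_dir = expand('./var/{{ instance_name }}/wals', vars)
-- snapshot_dir is './var/instance001/snapshots'
```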
Conditional configuration sections
A YAML configuration can include parts that apply only to instances that meet certain conditions.
This is useful for cluster upgrade scenarios: during an upgrade, instances can be running
different Tarantool versions and therefore require different configurations.
Conditional parts are defined in the conditional configuration section in the global scope.
It includes one or more if subsections. Each if subsection defines conditions
and configuration parts that apply to instances that meet these conditions.
The example below shows a conditional section for cluster upgrade from Tarantool 3.0.0
to Tarantool 3.1.0:
- The user-defined label upgraded is true on instances that are running Tarantool 3.1.0 or later. On older versions, it is false.
- Two compat options that were introduced in 3.1.0 are defined for instances running Tarantool 3.1.0 or later. On older versions, they would cause an error.
conditional:
  - if: tarantool_version < 3.1.0
    labels:
      upgraded: 'false'
  - if: tarantool_version >= 3.1.0
    labels:
      upgraded: 'true'
    compat:
      box_error_serialize_verbose: 'new'
      box_error_unpack_type_and_code: 'new'
Example on GitHub: conditional
if sections can use one variable, tarantool_version. It contains
a three-number Tarantool version (for example, 3.1.0) and can be compared with values of the same format
using the comparison operators >, <, >=, <=, ==, and !=.
You can write complex conditions using the logical operators || (OR) and && (AND).
Parentheses () can be used to define operator precedence.
conditional:
  - if: (tarantool_version > 3.2.0 || tarantool_version == 3.1.3) && tarantool_version <= 3.99.0
    # ...
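Versions are compared as three numbers, so 3.10.0 is greater than 3.9.0, even though a plain string comparison would give the opposite result. The plain-Lua sketch below models such comparisons and translates the condition above into a Lua expression by hand. parse_version(), compare(), and matches() are hypothetical helpers, and the sketch includes no parser for the if syntax.

```lua
-- Hypothetical model of tarantool_version comparisons; no parser for
-- the `if` syntax is included.
local function parse_version(s)
  local major, minor, patch = s:match('^(%d+)%.(%d+)%.(%d+)$')
  assert(major ~= nil, 'expected a three-number version: ' .. s)
  return { tonumber(major), tonumber(minor), tonumber(patch) }
end

-- Returns -1, 0, or 1. Components are compared as numbers,
-- so '3.10.0' is greater than '3.9.0'.
local function compare(a, b)
  local va, vb = parse_version(a), parse_version(b)
  for i = 1, 3 do
    if va[i] < vb[i] then return -1 end
    if va[i] > vb[i] then return 1 end
  end
  return 0
end

-- The condition from the example, translated by hand:
-- (tarantool_version > 3.2.0 || tarantool_version == 3.1.3)
--   && tarantool_version <= 3.99.0
local function matches(version)
  return (compare(version, '3.2.0') > 0 or compare(version, '3.1.3') == 0)
    and compare(version, '3.99.0') <= 0
end
```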
If the same option is set in multiple if sections that are true for an instance,
this option receives the value from the section declared last in the configuration.
Example:
conditional:
  - if: tarantool_version >= 3.0.0
    labels:
      version: '3.0' # applies to versions >= 3.0.0 and < 3.1.0
  - if: tarantool_version >= 3.1.0
    labels:
      version: '3.1+' # applies to versions >= 3.1.0
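The last-declared-wins rule can be modeled by applying every matching section in declaration order, so that later sections overwrite earlier values. In this hypothetical plain-Lua sketch, the two conditions from the example are written as Lua functions instead of being parsed from YAML.

```lua
-- Hypothetical model: every matching `conditional` section is applied in
-- declaration order, and later sections overwrite earlier values.
local function parse(s)
  local a, b, c = s:match('^(%d+)%.(%d+)%.(%d+)$')
  return { tonumber(a), tonumber(b), tonumber(c) }
end

-- Returns true if version a >= version b (numeric, component by component).
local function ge(a, b)
  local va, vb = parse(a), parse(b)
  for i = 1, 3 do
    if va[i] ~= vb[i] then return va[i] > vb[i] end
  end
  return true
end

local conditional = {
  { cond = function(v) return ge(v, '3.0.0') end, labels = { version = '3.0' } },
  { cond = function(v) return ge(v, '3.1.0') end, labels = { version = '3.1+' } },
}

local function apply(tarantool_version)
  local labels = {}
  for _, section in ipairs(conditional) do
    if section.cond(tarantool_version) then
      for key, value in pairs(section.labels) do
        labels[key] = value
      end
    end
  end
  return labels
end
```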
For each configuration parameter, Tarantool provides two sets of predefined environment variables:
- TT_<CONFIG_PARAMETER>: these variables override parameters specified in a configuration file, so they have a higher priority than the options specified in the file.
- TT_<CONFIG_PARAMETER>_DEFAULT: these variables specify default values for parameters missing in a configuration file, so they have a lower priority than the options specified in the file.
For example, TT_IPROTO_LISTEN and TT_IPROTO_LISTEN_DEFAULT correspond to the iproto.listen option.
TT_SNAPSHOT_DIR and TT_SNAPSHOT_DIR_DEFAULT correspond to the snapshot.dir option.
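A small plain-Lua model can summarize the naming scheme and the priority order. env_name() and resolve() are hypothetical helpers, and the env table stands in for the process environment. The sketch keeps all values as strings and does not convert them to the parameter's type.

```lua
-- Hypothetical model of TT_<CONFIG_PARAMETER> and
-- TT_<CONFIG_PARAMETER>_DEFAULT; values are kept as strings.

-- 'iproto.listen' -> 'TT_IPROTO_LISTEN'
local function env_name(option)
  return 'TT_' .. (option:upper():gsub('%.', '_'))
end

-- Priority: TT_<OPTION> > configuration file > TT_<OPTION>_DEFAULT.
-- `env` stands in for the process environment.
local function resolve(option, file_cfg, env)
  local name = env_name(option)
  if env[name] ~= nil then
    return env[name], 'environment'
  end
  if file_cfg[option] ~= nil then
    return file_cfg[option], 'file'
  end
  if env[name .. '_DEFAULT'] ~= nil then
    return env[name .. '_DEFAULT'], 'default'
  end
  return nil
end

-- Hypothetical values for illustration.
local env = { TT_SNAPSHOT_DIR_DEFAULT = '/var/lib/tarantool/snapshots' }
local file_cfg = { ['snapshot.dir'] = './snapshots' }

local dir, source = resolve('snapshot.dir', file_cfg, env)
-- dir is './snapshots', source is 'file'
```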
To see all the supported environment variables, execute the tarantool command with the --help-env-list option.
$ tarantool --help-env-list
Note
There are also special TT_INSTANCE_NAME and TT_CONFIG environment variables that can be used to start the specified Tarantool instance with configuration from the given file.
Below are a few examples that show how to set environment variables of different types, like string, number, array, or map.
In this example, TT_LOG_LEVEL is used to set a logging level to CRITICAL:
$ export TT_LOG_LEVEL='crit'
In this example, a logging level is set to CRITICAL using a corresponding numeric value:
$ export TT_LOG_LEVEL=3
The examples below show how to set the TT_SHARDING_ROLES variable that accepts an array value.
Arrays can be passed in two ways: using a simple …
$ export TT_SHARDING_ROLES=router,storage
… or JSON format:
$ export TT_SHARDING_ROLES='["router", "storage"]'
The simple format is applicable only to arrays containing scalar values.
To assign map values to environment variables, you can also use simple or JSON formats.
In the example below, TT_LOG_MODULES sets different logging levels for different modules using a simple format:
$ export TT_LOG_MODULES=module1=info,module2=error
In the next example, TT_ROLES_CFG is used to specify the value of a custom configuration for a role using a JSON format:
$ export TT_ROLES_CFG='{"greeter":{"greeting":"Hello"}}'
The simple format is applicable only to maps containing scalar values.
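In the simple format, arrays are split on commas, and each map entry is split on its first equals sign. The plain-Lua sketch below illustrates this with hypothetical parse_array() and parse_map() helpers. It does not handle the JSON format (which needs a JSON decoder), quoting, or escaping.

```lua
-- Hypothetical parsers for the simple format; JSON values, quoting,
-- and escaping are not handled.

-- 'router,storage' -> { 'router', 'storage' }
local function parse_array(s)
  local items = {}
  for item in (s .. ','):gmatch('([^,]*),') do
    items[#items + 1] = item
  end
  return items
end

-- 'module1=info,module2=error' -> { module1 = 'info', module2 = 'error' }
local function parse_map(s)
  local map = {}
  for _, pair in ipairs(parse_array(s)) do
    -- Split on the first '=' only.
    local key, value = pair:match('^([^=]+)=(.*)$')
    assert(key ~= nil, 'expected key=value, got: ' .. pair)
    map[key] = value
  end
  return map
end

local sharding_roles = parse_array('router,storage')
local log_modules = parse_map('module1=info,module2=error')
```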
In the example below, TT_IPROTO_LISTEN is used to specify a listening host and port values:
$ export TT_IPROTO_LISTEN=['{"uri":"127.0.0.1:3311"}']
You can also pass several listening addresses:
$ export TT_IPROTO_LISTEN=['{"uri":"127.0.0.1:3311"}','{"uri":"127.0.0.1:3312"}']
Centralized configuration
Enterprise Edition
Centralized configuration storages are supported by the Enterprise Edition only.
Tarantool enables you to store configuration data in one place using a Tarantool or etcd-based storage.
To achieve this, you need to:
Set up a centralized configuration storage.
Publish a cluster’s configuration to the storage.
Configure a connection to the storage by providing a local YAML configuration with an endpoint address and key prefix in the config section:
config:
  etcd:
    endpoints:
      - http://localhost:2379
    prefix: /myapp
Learn more from the following guide: Centralized configuration storages.
Tarantool configuration options are applied from multiple sources with the following precedence, from highest to lowest:
If the same option is defined in two or more locations, the option with the highest precedence is applied.
Centralized configuration storages
Enterprise Edition
Centralized configuration storages are supported by the Enterprise Edition only.
Examples on GitHub: centralized_config
Tarantool enables you to store a cluster’s configuration in one reliable place using a Tarantool or etcd-based storage:
- A Tarantool-based configuration storage is a replica set that stores a cluster’s configuration in synchronous spaces.
- etcd is a distributed key-value storage for any type of critical data used by distributed systems.
With a local YAML configuration, you need to make sure that all cluster instances use identical configuration files.
With a centralized configuration storage, all instances get the actual configuration from one place.
This topic describes how to set up a configuration storage, publish a cluster configuration to this storage, and use this configuration for all cluster instances.
Setting up a configuration storage
To learn how to set up an etcd-based configuration storage, consult the etcd documentation.
The example script below demonstrates how to use the etcdctl utility to create a user that has read and write access to configurations stored under the /myapp/ prefix:
etcdctl user add root:topsecret
etcdctl role add myapp_config_manager
etcdctl role grant-permission myapp_config_manager --prefix=true readwrite /myapp/
etcdctl user add sampleuser:123456
etcdctl user grant-role sampleuser myapp_config_manager
etcdctl auth enable
The credentials of this user should be specified when configuring a connection to the etcd cluster.
Publishing a cluster’s configuration
Publishing configuration using the tt utility
The tt utility provides the tt cluster command for managing centralized cluster configurations.
The tt cluster publish command can be used to publish a cluster’s configuration to both Tarantool and etcd-based storages.
The example below shows how a tt environment and a layout of the application called myapp might look:
├── tt.yaml
├── source.yaml
└── instances.enabled
    └── myapp
        ├── config.yaml
        └── instances.yml
tt.yaml: a tt configuration file.
source.yaml contains a cluster’s configuration to be published.
config.yaml contains a local configuration used to connect to the centralized storage.
instances.yml specifies instances to run in the current environment.
The configured instances are used by tt when starting a cluster.
tt cluster publish ignores this configuration file.
To publish a cluster’s configuration (source.yaml) to a centralized storage, execute tt cluster publish as follows:
$ tt cluster publish "http://sampleuser:123456@localhost:2379/myapp" source.yaml
Executing this command publishes a cluster configuration under the /myapp/config/all path.
Note
You can see a cluster’s configuration using the tt cluster show command.
Publishing configuration using the ‘config’ module
The config module provides the API for interacting with a Tarantool-based configuration storage.
The example below shows how to read a configuration stored in the source.yaml file using the fio module API and put this configuration under the /myapp/config/all path:
local fio = require('fio')
local config = require('config')

-- Read the cluster configuration from source.yaml
local cluster_config_handle = assert(fio.open('../../source.yaml'))
local cluster_config = cluster_config_handle:read()
cluster_config_handle:close()

-- Publish the configuration under /myapp/config/all
local response = config.storage.put('/myapp/config/all', cluster_config)
Learn more from the config.storage API section.
Note
The net.box module provides the ability to monitor configuration updates by watching path or prefix changes. Learn more in conn:watch().
Publishing configuration using etcdctl
To publish a cluster’s configuration to etcd using the etcdctl utility, use the put command:
$ etcdctl put /myapp/config/all < source.yaml
Note
For etcd versions earlier than 3.4, you need to set the ETCDCTL_API environment variable to 3.
Configuring connection to a storage
To use a configuration from a centralized storage for your cluster, you need to provide connection settings in a local configuration file.
Enterprise Edition
Centralized configuration storages are supported by the Enterprise Edition only.
Configuring connection to an etcd storage
Connection options for etcd should be specified in the config.etcd section of the configuration file.
In the example below, the following options are specified:
config:
  etcd:
    endpoints:
      - http://localhost:2379
    prefix: /myapp
    username: sampleuser
    password: '123456'
    http:
      request:
        timeout: 3
endpoints specifies the list of etcd endpoints.
prefix sets a key prefix used to search a configuration. Tarantool searches keys by the following path: <prefix>/config/*. Note that <prefix> should start with a slash (/).
username and password specify credentials used for authentication.
http.request.timeout configures a request timeout for an etcd server.
You can find the full example here: config_etcd.
Note
To run instances in production, it is recommended to use Ansible Tarantool Enterprise installer (ATE).
ATE is a set of Ansible playbooks that are used to deploy and maintain Tarantool Enterprise products.
ATE documentation is available to users logged in on the Tarantool website.
The tt utility is the recommended way to start Tarantool instances.
You can learn how to do this from the Starting and stopping instances section.
You can also use the tarantool command to start a Tarantool instance.
In this case, you can skip creating a local configuration and
provide connection settings using the following environment variables:
- Tarantool-based storage:
TT_CONFIG_STORAGE_ENDPOINTS and TT_CONFIG_STORAGE_PREFIX.
- etcd-based storage:
TT_CONFIG_ETCD_ENDPOINTS and TT_CONFIG_ETCD_PREFIX.
The example below shows how to provide etcd connection settings and start cluster instances using the tarantool command:
$ export TT_CONFIG_ETCD_ENDPOINTS=http://localhost:2379
$ export TT_CONFIG_ETCD_PREFIX=/myapp
$ tarantool --name instance001
$ tarantool --name instance002
$ tarantool --name instance003
By default, Tarantool watches keys with the specified prefix for changes in a cluster’s configuration and reloads a changed configuration automatically.
If necessary, you can set the config.reload option to manual to turn off configuration reloading:
config:
  reload: 'manual'
  etcd:
    # ...
In this case, you can reload a configuration in an admin console or application code using the reload() function provided by the config module:
require('config'):reload()
Configuration in code
Note
Starting with the 3.0 version, the recommended way of configuring Tarantool is using a configuration file.
Configuring Tarantool in code is considered a legacy approach.
This topic covers the specifics of configuring Tarantool in code using the box.cfg API.
In this case, a configuration is stored in an initialization file, a Lua script with the specified configuration options.
You can find all the available options in the Configuration reference.
If the command to start Tarantool includes an instance file, then
Tarantool begins by invoking the Lua program in the file, which may have the name init.lua.
The Lua program may get further arguments
from the command line or may use operating-system functions, such as getenv().
The Lua program almost always begins by invoking box.cfg(), if the database
server will be used or if ports need to be opened. For example, suppose
init.lua contains the lines
#!/usr/bin/env tarantool
box.cfg{
    listen = os.getenv("LISTEN_URI"),
    memtx_memory = 33554432,
    pid_file = "tarantool.pid",
    wal_max_size = 2500
}
print('Starting ', arg[1])
and suppose the environment variable LISTEN_URI contains 3301,
and suppose the command line is tarantool init.lua ARG.
Then the screen might look like this:
$ export LISTEN_URI=3301
$ tarantool init.lua ARG
... main/101/init.lua C> Tarantool 2.8.3-0-g01023dbc2
... main/101/init.lua C> log level 5
... main/101/init.lua I> mapping 33554432 bytes for memtx tuple arena...
... main/101/init.lua I> recovery start
... main/101/init.lua I> recovering from './00000000000000000000.snap'
... main/101/init.lua I> set 'listen' configuration option to "3301"
... main/102/leave_local_hot_standby I> ready to accept requests
Starting ARG
... main C> entering the event loop
If you wish to start an interactive session on the same terminal after
initialization is complete, you can pass the -i command-line option.
Starting from version 2.8.1, you can specify configuration parameters via special environment variables.
The name of a variable should have the following pattern: TT_<NAME>,
where <NAME> is the uppercase name of the corresponding box.cfg parameter.
For example:
export TT_LISTEN=3301
In case of an array value, separate the array elements by a comma without space:
export TT_REPLICATION="localhost:3301,localhost:3302"
If you need to pass additional parameters for URI, use the ? and & delimiters:
export TT_LISTEN="localhost:3301?param1=value1&param2=value2"
An empty variable (TT_LISTEN=) has the same effect as an unset one, meaning that the corresponding configuration parameter won’t be set when calling box.cfg{}.
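The following plain-Lua sketch models how TT_<NAME> variables map to box.cfg parameters, including the rule that an empty variable counts as unset. The cfg_from_env() helper is hypothetical, and it keeps all values as strings. The env table stands in for the process environment.

```lua
-- Hypothetical model of TT_<NAME> variables for box.cfg parameters.
-- `env` stands in for the process environment; values stay strings.
local function cfg_from_env(env, names)
  local cfg = {}
  for _, name in ipairs(names) do
    local value = env['TT_' .. name:upper()]
    -- An empty variable has the same effect as an unset one.
    if value ~= nil and value ~= '' then
      cfg[name] = value
    end
  end
  return cfg
end

local env = {
  TT_LISTEN = '3301',
  TT_REPLICATION = 'localhost:3301,localhost:3302',
  TT_MEMTX_MEMORY = '',  -- empty: treated as unset
}
local cfg = cfg_from_env(env, { 'listen', 'replication', 'memtx_memory' })
-- cfg.listen is '3301'; cfg.memtx_memory is nil
```

To run the same helper against the real environment, you could pass a table whose metatable has an __index function that calls os.getenv().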
Configuration parameters have the form:
box.cfg{[key = value [, key = value ...]]}
Configuration parameters can be set in a Lua initialization file,
which is specified on the Tarantool command line.
Most configuration parameters are for allocating resources, opening ports, and
specifying database behavior. All parameters are optional.
Most of the parameters are dynamic, that is, they can be changed at runtime by calling box.cfg{} a second time.
For example, the command below sets the listen port to 3301.
box.cfg{ listen = 3301 }
To see all the non-null parameters, execute box.cfg (no parentheses).
To see a particular parameter value, call a corresponding box.cfg option.
For example, box.cfg.listen shows the specified listen address.
Some configuration parameters and some functions depend on a URI (Uniform Resource Identifier).
The URI string format is similar to the
generic syntax for a URI scheme.
It may contain (in order):
- user name for login
- password
- host name or host IP address
- port number
- query parameters
Only a port number is always mandatory. A password is mandatory if a user
name is specified unless the user name is ‘guest’.
Formally, the URI
syntax is [host:]port or [username:password@]host:port.
If a host is omitted, then “0.0.0.0” or “[::]” is assumed,
meaning respectively any IPv4 address or any IPv6 address
on the local machine.
If username:password is omitted, then the “guest” user is assumed. Some examples:
| URI fragment | Example |
| port | 3301 |
| host:port | 127.0.0.1:3301 |
| username:password@host:port | notguest:sesame@mail.ru:3301 |
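For illustration only, the URI forms in the table can be split with Lua string patterns, as the sketch below shows. The parse_uri() helper and its field names are hypothetical. It handles only the three forms shown above and does not support Unix sockets, IPv6 hosts, or query parameters. In Tarantool code, prefer the built-in uri module.

```lua
-- Hypothetical parser for the three URI forms in the table above.
-- Unix sockets, IPv6 hosts, and query parameters are not handled.
local function parse_uri(uri)
  local s = tostring(uri)
  local result = {}
  local userinfo, hostport = s:match('^([^@]+)@(.+)$')
  if userinfo ~= nil then
    result.user, result.password = userinfo:match('^([^:]+):(.*)$')
    if result.user == nil then
      result.user = userinfo  -- user name without a password
    end
  else
    hostport = s
  end
  local host, port = hostport:match('^(.+):(%d+)$')
  if host ~= nil then
    result.host, result.port = host, port
  else
    assert(hostport:match('^%d+$'), 'expected a port number: ' .. hostport)
    -- Host omitted: any IPv4 address (any IPv6 address would be "[::]").
    result.host, result.port = '0.0.0.0', hostport
  end
  -- No user name: the "guest" user is assumed.
  result.user = result.user or 'guest'
  return result
end

local full = parse_uri('notguest:sesame@mail.ru:3301')
local port_only = parse_uri(3301)
```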
In code, the URI value can be passed as a number (if only a port is specified) or a string:
box.cfg { listen = 3301 }
box.cfg { listen = "127.0.0.1:3301" }
In certain circumstances, a Unix domain socket may be used
where a URI is expected, for example, unix/:/tmp/unix_domain_socket.sock or
simply /tmp/unix_domain_socket.sock.
The uri module provides functions that convert URI strings into their
components or turn components into URI strings.
Starting from version 2.10.0, a user can open several listening iproto sockets on a Tarantool instance
and, consequently, can specify several URIs in the configuration parameters
such as box.cfg.listen and box.cfg.replication.
URI values can be set in a number of ways:
As a string with URI values separated by commas.
box.cfg { listen = "127.0.0.1:3301, /unix.sock, 3302" }
As a table that contains URIs in the string format.
box.cfg { listen = {"127.0.0.1:3301", "/unix.sock", "3302"} }
As an array of tables with the uri field.
box.cfg { listen = {
    { uri = "127.0.0.1:3301" },
    { uri = "/unix.sock" },
    { uri = 3302 }
  }
}
In a combined way – an array that contains URIs in both the string and the table formats.
box.cfg { listen = {
    "127.0.0.1:3301",
    { uri = "/unix.sock" },
    { uri = 3302 }
  }
}
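All four listen forms above describe the same list of URIs. To make that equivalence explicit, the plain-Lua sketch below uses a hypothetical normalize_listen() helper that converts each form into an array of { uri = ... } tables. It ignores URI parameters and does not reflect how Tarantool parses the option internally.

```lua
-- Hypothetical normalizer for the listen forms above; URI parameters
-- are ignored. Not how Tarantool parses the option internally.
local function trim(s)
  return s:match('^%s*(.-)%s*$')
end

local function normalize_listen(listen)
  local result = {}
  if type(listen) == 'number' then
    listen = tostring(listen)
  end
  if type(listen) == 'string' then
    -- One string with comma-separated URIs.
    for item in listen:gmatch('[^,]+') do
      result[#result + 1] = { uri = trim(item) }
    end
    return result
  end
  -- A single { uri = ... } table.
  if listen.uri ~= nil then
    return { { uri = tostring(listen.uri) } }
  end
  -- An array of strings and/or { uri = ... } tables.
  for _, item in ipairs(listen) do
    if type(item) == 'table' then
      result[#result + 1] = { uri = tostring(item.uri) }
    else
      result[#result + 1] = { uri = tostring(item) }
    end
  end
  return result
end
```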
Also, starting from version 2.10.0, it is possible to specify additional parameters for URIs.
You can do this in different ways:
Using the ? delimiter when URIs are specified in a string format.
box.cfg { listen = "127.0.0.1:3301?p1=value1&p2=value2, /unix.sock?p3=value3" }
Using the params table: a URI is passed in a table with additional parameters in the “params” table.
Parameters in the “params” table overwrite the ones from a URI string (“value2” overwrites “value1” for p1 in the example below).
box.cfg { listen = {
    "127.0.0.1:3301?p1=value1",
    params = { p1 = "value2", p2 = "value3" }
  }
}
Using the default_params table for specifying default parameter values.
In the example below, two URIs are passed in a table.
The default value for the p3 parameter is defined in the default_params table
and used if this parameter is not specified in URIs.
Parameters in the default_params table are applicable to all the URIs passed in a table.
box.cfg { listen = {
    "127.0.0.1:3301?p1=value1",
    { uri = "/unix.sock", params = { p2 = "value2" } },
    default_params = { p3 = "value3" }
  }
}
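Together, the rules above define three sources of parameter values for a URI, listed from the lowest to the highest priority: default_params, the query string of the URI, and the params table. The plain-Lua sketch below models that order with a hypothetical effective_params() helper. Like the examples above, it keeps one value per parameter name, and it does not parse the listen option itself.

```lua
-- Hypothetical model of parameter priority for one URI:
-- default_params < query string of the URI < params table.
local function query_params(uri)
  local params = {}
  local query = uri:match('%?(.*)$')
  if query ~= nil then
    for key, value in query:gmatch('([^&=]+)=([^&]*)') do
      params[key] = value
    end
  end
  return params
end

local function effective_params(uri, params, default_params)
  local result = {}
  -- Later sources overwrite earlier ones.
  for _, source in ipairs({ default_params or {}, query_params(uri), params or {} }) do
    for key, value in pairs(source) do
      result[key] = value
    end
  end
  return result
end

local p = effective_params('127.0.0.1:3301?p1=value1', { p1 = 'value2', p2 = 'value3' })
-- p.p1 is 'value2', p.p2 is 'value3'
```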
The recommended way for specifying URI with additional parameters is the following:
box.cfg { listen = {
    { uri = "127.0.0.1:3301", params = { p1 = "value1" } },
    { uri = "/unix.sock", params = { p2 = "value2" } },
    { uri = 3302, params = { p3 = "value3" } }
  }
}
In case of a single URI, the following syntax also works:
box.cfg { listen = {
    uri = "127.0.0.1:3301",
    params = { p1 = "value1", p2 = "value2" }
  }
}
Since version 2.10.0, Tarantool Enterprise Edition has built-in support for using SSL to encrypt client-server communications over binary connections,
that is, between Tarantool instances in a cluster or between an instance and clients that connect to it via net.box or connectors.
Tarantool uses the OpenSSL library that is included in the delivery package.
Note that SSL connections use only TLSv1.2.
To configure traffic encryption, you need to set the special URI parameters for a particular connection.
The parameters can be set for the following box.cfg options and net.box method:
- box.cfg.listen
- box.cfg.replication
- net_box_object.connect()
Below is the list of the parameters.
In the next section, you can find details and examples on what should be configured on both the server side and the client side.
transport – enables SSL encryption for a connection if set to ssl.
The default value is plain, which means the encryption is off. If the parameter is not set, the encryption is off too.
Other encryption-related parameters can be used only if transport = 'ssl' is set.
Example:
local connection = require('net.box').connect({
    uri = 'admin:topsecret@127.0.0.1:3301',
    params = { transport = 'ssl',
               ssl_cert_file = 'certs/instance001/server001.crt',
               ssl_key_file = 'certs/instance001/server001.key',
               ssl_password = 'qwerty' }
})
ssl_key_file – a path to a private SSL key file.
Mandatory for a server.
For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.
If the private key is encrypted, provide a password for it in the ssl_password or ssl_password_file parameter.
ssl_cert_file – a path to an SSL certificate file.
Mandatory for a server.
For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.
ssl_ca_file – a path to a trusted certificate authorities (CA) file. Optional. If not set, the peer won’t be checked for authenticity.
Both a server and a client can use the ssl_ca_file parameter:
- If it’s on the server side, the server verifies the client.
- If it’s on the client side, the client verifies the server.
- If both sides have the CA files, the server and the client verify each other.
ssl_ciphers – a colon-separated (:) list of SSL cipher suites the connection can use. See the Supported ciphers section for details. Optional.
Note that the list is not validated: if a cipher suite is unknown, Tarantool ignores it. If no usable cipher suite remains, Tarantool doesn't establish the connection and writes to the log that no shared cipher was found.
ssl_password – a password for an encrypted private SSL key. Optional. Alternatively, the password
can be provided in ssl_password_file.
ssl_password_file – a text file with one or more passwords for encrypted private SSL keys
(each on a separate line). Optional. Alternatively, the password can be provided in ssl_password.
Tarantool applies the ssl_password and ssl_password_file parameters in the following order:
- If ssl_password is provided, Tarantool tries to decrypt the private key with it.
- If ssl_password is incorrect or isn’t provided, Tarantool tries all passwords from ssl_password_file one by one in the order they are written.
- If ssl_password and all passwords from ssl_password_file are incorrect, or none of them is provided, Tarantool treats the private key as unencrypted.
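The password lookup order can be modeled as a short loop. In this plain-Lua sketch, choose_password() is a hypothetical helper, and try_decrypt stands in for the actual attempt to decrypt the private key. Tarantool performs that step internally.

```lua
-- Hypothetical model of the ssl_password / ssl_password_file order.
-- `try_decrypt(password)` stands in for the real decryption attempt and
-- returns true on success.
local function choose_password(ssl_password, file_passwords, try_decrypt)
  -- 1. Try ssl_password first, if it is provided.
  if ssl_password ~= nil and try_decrypt(ssl_password) then
    return ssl_password
  end
  -- 2. Try passwords from ssl_password_file in the order they are written.
  for _, password in ipairs(file_passwords or {}) do
    if try_decrypt(password) then
      return password
    end
  end
  -- 3. Nothing worked or nothing was provided: the key is treated as
  --    unencrypted (represented here by nil).
  return nil
end

-- Hypothetical decryption check that succeeds for one secret only.
local function decrypts_with(secret)
  return function(password) return password == secret end
end

local chosen = choose_password('wrong', { 'first', 'topsecret' }, decrypts_with('topsecret'))
-- chosen is 'topsecret'
```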
Configuration example:
box.cfg{ listen = {
    uri = 'localhost:3301',
    params = {
        transport = 'ssl',
        ssl_key_file = '/path_to_key_file',
        ssl_cert_file = '/path_to_cert_file',
        ssl_ciphers = 'HIGH:!aNULL',
        ssl_password = 'topsecret'
    }
}}
Tarantool Enterprise supports the following cipher suites:
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
- DHE-RSA-AES256-GCM-SHA384
- ECDHE-ECDSA-CHACHA20-POLY1305
- ECDHE-RSA-CHACHA20-POLY1305
- DHE-RSA-CHACHA20-POLY1305
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- DHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES256-SHA384
- ECDHE-RSA-AES256-SHA384
- DHE-RSA-AES256-SHA256
- ECDHE-ECDSA-AES128-SHA256
- ECDHE-RSA-AES128-SHA256
- DHE-RSA-AES128-SHA256
- ECDHE-ECDSA-AES256-SHA
- ECDHE-RSA-AES256-SHA
- DHE-RSA-AES256-SHA
- ECDHE-ECDSA-AES128-SHA
- ECDHE-RSA-AES128-SHA
- DHE-RSA-AES128-SHA
- AES256-GCM-SHA384
- AES128-GCM-SHA256
- AES256-SHA256
- AES128-SHA256
- AES256-SHA
- AES128-SHA
- GOST2012-GOST8912-GOST8912
- GOST2001-GOST89-GOST89
Tarantool Enterprise static build has the embedded engine to support the GOST cryptographic algorithms.
If you use these algorithms for traffic encryption, specify the corresponding cipher suite in the ssl_ciphers parameter, for example:
box.cfg{ listen = {
    uri = 'localhost:3301',
    params = {
        transport = 'ssl',
        ssl_key_file = '/path_to_key_file',
        ssl_cert_file = '/path_to_cert_file',
        ssl_ciphers = 'GOST2012-GOST8912-GOST8912'
    }
}}
For detailed information on SSL ciphers and their syntax, refer to OpenSSL documentation.
Using environment variables
The URI parameters for traffic encryption can also be set via environment variables, for example:
export TT_LISTEN="localhost:3301?transport=ssl&ssl_cert_file=/path_to_cert_file&ssl_key_file=/path_to_key_file"
Server-client configuration details
When configuring the traffic encryption, you need to specify the necessary parameters on both the server side and the client side.
Below you can find the summary on the options and parameters to be used and examples of configuration.
Server side
- Is configured via the box.cfg.listen option.
- Mandatory URI parameters: transport, ssl_key_file and ssl_cert_file.
- Optional URI parameters: ssl_ca_file, ssl_ciphers, ssl_password, and ssl_password_file.
Client side
- Is configured via the box.cfg.replication option (see details) or net_box_object.connect().
Parameters:
- If the server side has only the transport, ssl_key_file and ssl_cert_file parameters set, on the client side, you need to specify only transport = ssl as the mandatory parameter. All other URI parameters are optional.
- If the server side also has the ssl_ca_file parameter set, on the client side, you need to specify transport, ssl_key_file and ssl_cert_file as the mandatory parameters. Other parameters (ssl_ca_file, ssl_ciphers, ssl_password, and ssl_password_file) are optional.
Suppose there is a master-replica set with two Tarantool instances:
- 127.0.0.1:3301 is the master (server).
- 127.0.0.1:3302 is the replica (client).
Examples below show the configuration related to connection encryption for two cases:
when the trusted certificate authorities (CA) file is not set on the server side and when it is.
Only mandatory URI parameters are mentioned in these examples.
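The following plain-Lua sketch builds, for both cases, the tables that could be passed to box.cfg() on the master and on the replica. It uses only the mandatory parameters listed under Server side and Client side. Key and certificate paths, the replicator:topsecret credentials in the replication URI, the unencrypted listen URI of the replica, and the read_only setting are hypothetical placeholders.

```lua
-- Hypothetical sketch: box.cfg() arguments for the master (server) and
-- the replica (client), with and without a CA file on the server side.
-- Paths and credentials are placeholders.
local function master_cfg(with_ca)
  local params = {
    transport = 'ssl',
    ssl_key_file = '/path_to_key_file',
    ssl_cert_file = '/path_to_cert_file',
  }
  if with_ca then
    -- With a CA file, the master verifies the replica's certificate.
    params.ssl_ca_file = '/path_to_ca_file'
  end
  return { listen = { uri = '127.0.0.1:3301', params = params } }
end

local function replica_cfg(with_ca)
  local params = { transport = 'ssl' }
  if with_ca then
    -- The server has ssl_ca_file set, so the client must present
    -- its own key and certificate.
    params.ssl_key_file = '/path_to_key_file'
    params.ssl_cert_file = '/path_to_cert_file'
  end
  return {
    listen = '127.0.0.1:3302',  -- left unencrypted in this sketch
    replication = {
      { uri = 'replicator:topsecret@127.0.0.1:3301', params = params },
    },
    read_only = true,
  }
end

-- On the master:  box.cfg(master_cfg(false))
-- On the replica: box.cfg(replica_cfg(false))
```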
Storage
This section contains guides on configuring a storage.
In-memory storage
Example on GitHub: memtx
In Tarantool, all data is stored in random-access memory (RAM) by default.
For this purpose, the memtx storage engine is used.
This topic describes how to define basic settings related to in-memory storage in the
memtx section of a YAML configuration
– for example, memory size and maximum tuple size.
For the specific settings related to allocator or sorting threads,
check the corresponding memtx options in the Configuration reference.
In Tarantool, data is stored in spaces.
Each space consists of tuples – the database records.
To specify the amount of memory that Tarantool allocates to store tuples, use the
memtx.memory configuration option.
In the example below, the memory size is set to 1 GB (1073741824 bytes):
memtx:
  memory: 1073741824
The server does not exceed this limit when allocating tuples.
Indexes and connection information use additional memory.
When the memtx.memory limit is reached, INSERT or UPDATE requests fail with
ER_MEMORY_ISSUE.
You can configure the minimum and the maximum tuple sizes in bytes.
- If the tuples are small, you can decrease the minimum size.
- If the tuples are large, you can increase the maximum size.
To define the tuple size, use the memtx.min_tuple_size and
memtx.max_tuple_size configuration options.
In the example, the minimum size is set to 8 bytes and the maximum size is set to 5 MB:
memtx:
  memory: 1073741824
  min_tuple_size: 8
  max_tuple_size: 5242880
Persistence
To ensure data persistence, Tarantool can:
- Record each data change request into a write-ahead log (WAL) file (.xlog files).
When a power outage occurs or the Tarantool instance is killed unexpectedly, the in-memory database is lost.
In such a case, Tarantool restores the data from the WAL files by reading them and redoing the requests.
This is called the “recovery process”.
- Take snapshots that contain an on-disk copy of the entire data set at a given moment
(.snap files).
During the recovery process, Tarantool can load the latest snapshot file and then read the requests from the WAL files produced after this snapshot was made.
After a new snapshot is created, the earlier WAL files can be removed to free up space.
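The recovery order described above can be illustrated with a toy model in plain Lua. This is a sketch under stated assumptions, not how Tarantool implements recovery: the snapshot is a Lua table tagged with the LSN (log sequence number) up to which it is consistent, and the WAL is a list of hypothetical {lsn, op, key, value} records. Recovery copies the snapshot and then redoes only the records with a higher LSN.

```lua
-- Toy model of snapshot + WAL recovery (illustration only, not Tarantool code).

-- A snapshot is a copy of the data plus the LSN it was taken at.
local function take_snapshot(data, lsn)
  local copy = {}
  for k, v in pairs(data) do copy[k] = v end
  return { lsn = lsn, data = copy }
end

-- Redo a single hypothetical WAL record against the in-memory state.
local function apply(state, rec)
  if rec.op == 'replace' then
    state[rec.key] = rec.value
  elseif rec.op == 'delete' then
    state[rec.key] = nil
  end
end

-- Recovery: start from the snapshot, then replay only the WAL records
-- produced after that snapshot was made.
local function recover(snapshot, wal)
  local state = {}
  for k, v in pairs(snapshot.data) do state[k] = v end
  for _, rec in ipairs(wal) do
    if rec.lsn > snapshot.lsn then apply(state, rec) end
  end
  return state
end

-- Three writes, a snapshot after the second one, then a crash.
local wal = {
  { lsn = 1, op = 'replace', key = 1, value = 'Roxette' },
  { lsn = 2, op = 'replace', key = 2, value = 'Scorpions' },
  { lsn = 3, op = 'delete',  key = 1 },
}
local live = {}
apply(live, wal[1]); apply(live, wal[2])
local snap = take_snapshot(live, 2)
local recovered = recover(snap, wal)
```

Calling recover(snap, { wal[3] }) yields the same state, which mirrors removing the WAL files written before the latest snapshot.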
This topic describes how to configure:
- the snapshot creation in the snapshot section of a YAML configuration.
- the recording to the write-ahead log in the wal section of a YAML configuration.
To learn more about the persistence mechanism in Tarantool, see the Persistence section.
The formats of WAL and snapshot files are described in detail in the File formats section.
The checkpoint daemon (snapshot daemon) is a constantly running fiber.
The checkpoint daemon creates a schedule for the periodic snapshot creation based on
the configuration options and the speed of file size growth.
If enabled, the daemon makes new snapshot (.snap) files according to this schedule.
The work of the checkpoint daemon is based on the following configuration options:
- snapshot.by.interval – a new snapshot is taken once in a given period.
- snapshot.by.wal_size – a new snapshot is taken once the size
of all WAL files created since the last snapshot exceeds a given limit.
If necessary, the checkpoint daemon also activates the Tarantool garbage collector
that deletes old snapshots and WAL files.
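As a sketch of these options, the fragment below asks the checkpoint daemon to take a snapshot every hour, or earlier if more than 1 GB of WAL files has been written since the last snapshot. The values are illustrative only. The count option, which limits how many snapshots are kept before the garbage collector removes older ones, is included on the assumption that your Tarantool version supports snapshot.count; check the Configuration reference.

```yaml
snapshot:
  by:
    interval: 3600        # seconds between scheduled snapshots
    wal_size: 1073741824  # bytes of WAL written since the last snapshot
  count: 2                # assumed option: snapshots kept before old ones are removed
```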
Note
The memtx engine takes only regular snapshots with the interval set in
the checkpoint daemon configuration.
The vinyl engine runs checkpointing in the background at all times.
WAL extensions
WAL extensions allow you to add auxiliary information to each write-ahead log record.
For example, you can enable storing the old and new tuples for each CRUD operation performed.
This information might be helpful for implementing a CDC (Change Data Capture) utility
that transforms a data replication stream.
See also: Configure the write-ahead log.
WAL extensions are disabled by default.
To configure them, use the wal.ext.* configuration options.
Inside the wal.ext block, you can enable storing old and new tuples as follows:
To store old and new tuples in a write-ahead log for all spaces, set the
wal.ext.old and wal.ext.new
options to true:
To adjust these options for specific spaces, specify the wal.ext.spaces option:
wal:
  ext:
    old: true
    new: true
    spaces:
      space1:
        old: false
      space2:
        new: false
The configuration for specific spaces takes priority over the wal.ext.new and wal.ext.old
options.
This means that only new tuples are added to the log for space1 and only old tuples for space2.
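For the first case, storing old and new tuples for all spaces without any per-space overrides, a minimal wal section is sketched below; only the two options named above are set.

```yaml
wal:
  ext:
    old: true   # store the old tuple in each WAL record
    new: true   # store the new tuple in each WAL record
```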
Note that records with additional fields are replicated as follows:
- If a replica doesn’t support the extended format configured on a master, auxiliary fields are skipped.
- If a replica and master have different configurations for WAL records, the master’s configuration is ignored.
The table below demonstrates how write-ahead log records might look
for the specific CRUD operations
if storing old and new tuples is enabled for the bands space.
| Operation | Example | WAL information |
|---|---|---|
| insert | bands:insert{4, 'The Beatles', 1960} | new_tuple: [4, 'The Beatles', 1960]; tuple: [4, 'The Beatles', 1960] |
| delete | bands:delete{4} | key: [4]; old_tuple: [4, 'The Beatles', 1960] |
| update | bands:update({2}, {{'=', 2, 'Pink Floyd'}}) | new_tuple: [2, 'Pink Floyd', 1965]; old_tuple: [2, 'Scorpions', 1965]; key: [2]; tuple: [['=', 2, 'Pink Floyd']] |
| upsert | bands:upsert({2, 'Pink Floyd', 1965}, {{'=', 2, 'The Doors'}}) | new_tuple: [2, 'The Doors', 1965]; old_tuple: [2, 'Pink Floyd', 1965]; operations: [['=', 2, 'The Doors']]; tuple: [2, 'Pink Floyd', 1965] |
| replace | bands:replace{1, 'The Beatles', 1960} | old_tuple: [1, 'Roxette', 1986]; new_tuple: [1, 'The Beatles', 1960]; tuple: [1, 'The Beatles', 1960] |
Storing both old and new tuples is especially useful for the update
operation because otherwise its write-ahead log record contains only the key value and the update operations.
Defining and manipulating data
Tarantool stores data in spaces, which can be thought of as tables in a relational database.
Every record or row in a space is called a tuple.
A tuple may have any number of fields, and the fields may be of different types.
String data in fields are compared based on the specified collation rules.
The user can provide hard limits for data values through constraints
and link related spaces with foreign keys.
Tarantool supports highly customizable indexes of various types.
In particular, indexes can be defined with generators like sequences.
There are six basic data operations in Tarantool:
SELECT, INSERT, UPDATE, UPSERT, REPLACE, and DELETE. A number of complexity factors
affect the resource usage of each function.
Tarantool allows describing the data schema but does not require it.
The user can migrate a schema without migrating the data.
To ensure data persistence and recover quickly in case of failure,
Tarantool uses mechanisms like the write-ahead log (WAL) and snapshots.
This section contains guides on performing data operations in Tarantool.
Data storage
Tarantool operates on data in the form of tuples.
- tuple
A tuple is a group of data values in Tarantool’s memory.
Think of it as a “database record” or a “row”.
The data values in the tuple are called fields.
When Tarantool returns a tuple value in the console,
by default, it uses YAML format,
for example: [3, 'Ace of Base', 1993].
Internally, Tarantool stores tuples as
MsgPack arrays.
- field
Fields are distinct data values contained in a tuple.
They play the same role as “row columns” or “record fields” in relational databases,
with a few improvements:
- fields can be composite structures, such as arrays or maps,
- fields don’t need to have names.
A given tuple may have any number of fields, and the fields may be of
different types.
The field’s number is the identifier of the field.
Numbers are counted from base 1 in Lua and other 1-based languages,
or from base 0 in languages like PHP or C/C++.
So, 1 or 0 can be used in some contexts to refer to the first
field of a tuple.
Tarantool stores tuples in containers called spaces.
- space
In Tarantool, a space is a primary container that stores data.
It is analogous to a table in a relational database.
Spaces contain tuples – the Tarantool name for
database records.
The number of tuples in a space is unlimited.
At least one space is required to store data with Tarantool.
Each space has the following attributes:
- a unique name specified by the user,
- a unique numeric identifier which can be specified by
the user, but usually is assigned automatically by Tarantool,
- an engine: memtx (default) — in-memory engine,
fast but limited in size, or vinyl — on-disk engine for huge data sets.
To be functional, a space also needs to have a primary index.
It can also have secondary indexes.
Tarantool is both a database manager and an application server.
Therefore a developer often deals with two type sets:
the types of the programming language (such as Lua) and
the types of the Tarantool storage format (MsgPack).
| Scalar / compound | MsgPack type | Lua type | Example value |
|---|---|---|---|
| scalar | nil | cdata | box.NULL |
| scalar | boolean | boolean | true |
| scalar | string | string | 'A B C' |
| scalar | integer | number | 12345 |
| scalar | integer | cdata | 12345 |
| scalar | float64 (double) | number | 1.2345 |
| scalar | float64 (double) | cdata | 1.2345 |
| scalar | binary | cdata | [!!binary 3t7e] |
| scalar | ext (for Tarantool decimal) | cdata | 1.2 |
| scalar | ext (for Tarantool datetime) | cdata | '2021-08-20T16:21:25.122999906 Europe/Berlin' |
| scalar | ext (for Tarantool interval) | cdata | +1 months, 1 days |
| scalar | ext (for Tarantool uuid) | cdata | 12a34b5c-de67-8f90-123g-h4567ab8901 |
| compound | map | table (with string keys) | {'a': 5, 'b': 6} |
| compound | array | table (with integer keys) | [1, 2, 3, 4, 5] |
| compound | array | tuple (cdata) | [12345, 'A B C'] |
Note
MsgPack values have variable lengths.
So, for example, the smallest number requires only one byte, but the largest number
requires nine bytes.
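The variable length can be made concrete with a small Lua function that returns how many bytes an integer occupies in MsgPack, following the integer format families of the MessagePack specification (fixint, then 8-, 16-, 32-, and 64-bit forms). It assumes the encoder always picks the most compact format, and the function name is hypothetical. A float64 (double) value, by contrast, always takes 9 bytes: a 1-byte type marker plus 8 bytes of payload.

```lua
-- Bytes needed to encode an integer in MessagePack, assuming the encoder
-- always chooses the most compact integer format.
local function msgpack_int_size(n)
  assert(n == math.floor(n), 'integers only')
  if n >= 0 then
    if n <= 0x7f then return 1             -- positive fixint
    elseif n <= 0xff then return 2         -- uint 8: marker + 1 byte
    elseif n <= 0xffff then return 3       -- uint 16: marker + 2 bytes
    elseif n <= 0xffffffff then return 5   -- uint 32: marker + 4 bytes
    else return 9 end                      -- uint 64: marker + 8 bytes
  else
    if n >= -32 then return 1              -- negative fixint
    elseif n >= -128 then return 2         -- int 8
    elseif n >= -32768 then return 3       -- int 16
    elseif n >= -2147483648 then return 5  -- int 32
    else return 9 end                      -- int 64
  end
end
```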
Note
The Lua nil type is encoded as MsgPack nil but
decoded as msgpack.NULL.
In Lua, the nil type has only one possible value, also called nil.
Tarantool displays it as null when using the default
YAML format.
Nil may be compared to values of any types with == (is-equal)
or ~= (is-not-equal), but other comparison operations will not work.
Nil may not be used in Lua tables; the workaround is to use
box.NULL because nil == box.NULL is true.
Example: nil.
A boolean is either true or false.
Example: true.
The Tarantool integer type is for integers between
-9223372036854775808 and 18446744073709551615, which is about 18 quintillion.
This type corresponds to the number type in Lua and to the integer type in MsgPack.
Example: -2^63.
The Tarantool unsigned type is for integers between
0 and 18446744073709551615. So it is a subset of integer.
Example: 123456.
The double field type exists
mainly to be equivalent to Tarantool/SQL’s
DOUBLE data type.
In msgpuck.h (Tarantool’s interface to MsgPack),
the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes.
In Lua, fields of the double type can only contain non-integer numeric values and
cdata values with double floating-point numbers.
Examples: 1.234, -44, 1.447e+44.
To avoid using the wrong kind of values inadvertently, use
ffi.cast() when searching or changing double fields.
For example, instead of
space_object:insert{value}
use
local ffi = require('ffi')
...
space_object:insert({ffi.cast('double', value)})
Example:
Arithmetic with cdata double will not work reliably, so
for Lua, it is better to use the number type.
This warning does not apply to Tarantool/SQL because
Tarantool/SQL does
implicit casting.
The Tarantool number field may have both
integer and floating-point values, although in Lua a number
is a double-precision floating-point.
Tarantool will try to store a Lua number as
floating-point if the value contains a decimal point or is very large
(greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer.
To ensure that even very large numbers are stored as integers, use the
tonumber64 function, or the LL (Long Long) suffix,
or the ULL (Unsigned Long Long) suffix.
Here are examples of numbers using regular notation, exponential notation,
the ULL suffix and the tonumber64 function:
-55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').
You can also use the ffi module to specify a C type to cast the number to.
In this case, the number will be stored as cdata.
The Tarantool decimal type is stored as a MsgPack ext (Extension).
Values with the decimal type are not floating-point values although
they may contain decimal points.
They are exact with up to 38 digits of precision.
Example: a value returned by a function in the decimal module.
Introduced in v. 2.10.0.
The Tarantool datetime type facilitates operations with date and time,
accounting for leap years or the varying number of days in a month.
It is stored as a MsgPack ext (Extension).
Operations with this data type use code from c-dt, a third-party library.
For more information, see Module datetime.
Since: v. 2.10.0
The Tarantool interval type represents periods of time.
They can be added to or subtracted from datetime values or each other.
Operations with this data type use code from c-dt, a third-party library.
The type is stored as a MsgPack ext (Extension).
For more information, see Module datetime.
A string is a variable-length sequence of bytes, usually represented with
alphanumeric characters inside single quotes. In both Lua and MsgPack, strings
are treated as binary data, with no attempts to determine a string’s
character set or to perform any string conversion – unless there is an optional
collation.
So, usually, string sorting and comparison are done byte-by-byte, without any special
collation rules applied.
For example, numbers are ordered by their point on the number line, so 2345 is
greater than 500; meanwhile, strings are ordered by the encoding of the first
byte, then the encoding of the second byte, and so on, so '2345' is less than '500'.
Example: 'A, B, C'.
A bin (binary) value is not directly supported by Lua but there is
a Tarantool type varbinary. See the varbinary module reference
for details.
Example: "\65 \66 \67".
An array is represented in Lua with {...} (braces).
Examples: lists of numbers representing points in geometric figures:
{10, 11}, {3, 5, 9, 10}.
Lua tables with string keys are stored as MsgPack maps;
Lua tables with integer keys starting with 1 are stored as MsgPack arrays.
Nils may not be used in Lua tables; the workaround is to use
box.NULL.
Example: a box.space.tester:select() request will return a Lua table.
A tuple is a light reference to a MsgPack array stored in the database.
It is a special type (cdata) to avoid conversion to a Lua table on retrieval.
A few functions may return tables with multiple tuples. For tuple examples,
see box.tuple.
Values in a scalar field can be boolean, integer, unsigned, double,
number, decimal, string, uuid, or varbinary; but not array, map, or tuple.
Examples: true, 1, 'xxx'.
Values in a field of the any type can be boolean, integer, unsigned, double,
number, decimal, string, uuid, varbinary, array, map, or tuple.
Examples: true, 1, 'xxx', {box.NULL, 0}.
Examples of insert requests with different field types:
To learn more about what values can be stored in indexed fields, read the
Indexes section.
By default, when Tarantool compares strings, it uses the so-called
binary collation.
It only considers the numeric value of each byte in a string.
For example, the encoding of 'A' (what used to be called the “ASCII value”) is 65,
the encoding of 'B' is 66, and the encoding of 'a' is 97.
Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a'.
Binary collation is the best choice for fast, deterministic, simple maintenance and searching
with Tarantool indexes.
But if you want the ordering that you see in phone books and dictionaries,
then you need Tarantool’s optional collations, such as unicode and
unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' == 'A' < 'B'
respectively.
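Binary collation, the default described above, can be sketched in plain Lua as a byte-by-byte comparison. The helper names binary_less and binary_sort are hypothetical. The sketch compares numeric byte values explicitly so that the result does not depend on the platform locale, and a string that is a prefix of another sorts first.

```lua
-- Binary collation: compare strings byte by byte by numeric byte value.
-- If one string is a prefix of the other, the shorter one sorts first.
local function binary_less(a, b)
  local n = math.min(#a, #b)
  for i = 1, n do
    local x, y = a:byte(i), b:byte(i)
    if x ~= y then return x < y end
  end
  return #a < #b
end

-- Return a sorted copy of a list of strings, leaving the input intact.
local function binary_sort(list)
  local copy = {}
  for i, s in ipairs(list) do copy[i] = s end
  table.sort(copy, binary_less)
  return copy
end

local sorted = binary_sort({ 'a', 'B', 'A', '500', '2345' })
```

The result is '2345', '500', 'A', 'B', 'a': digits (bytes 50 and 53) come before upper-case letters (65, 66), which come before lower-case ones (97), and '2345' sorts before '500' even though 2345 > 500 as a number.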
The unicode and unicode_ci optional collations use the ordering according to the
Default Unicode Collation Element Table (DUCET)
and the rules described in
Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA).
The only difference between the two collations concerns
weights:
unicode collation observes L1, L2, and L3 weights (strength = ‘tertiary’);
unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example 'a' == 'A' == 'á' == 'Á'.
As an example, take some Russian words:
'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'
…and show the difference in ordering and selecting by index:
In all, collation involves much more than these simple examples of
upper case / lower case and accented / unaccented equivalence in alphabets.
We also consider variations of the same character, non-alphabetic writing systems,
and special rules that apply for combinations of characters.
For English, Russian, and most other languages and use cases, use the “unicode” and “unicode_ci” collations.
If you need Cyrillic letters ‘Е’ and ‘Ё’ to have the same level-1 weights,
try the Kyrgyz collation.
The tailored optional collations: for other languages, Tarantool supplies tailored collations for every
modern language that has more than a million native speakers, and
for specialized situations such as the difference between dictionary
order and telephone book order.
Run box.space._collation:select() to see the complete list.
The tailored collation names have the form
unicode_[language code]_[strength], where language code is a standard
2-character or 3-character language abbreviation, and strength is s1
for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”.
Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of
Ubuntu and
Fedora.
Charts explaining the precise differences from DUCET order are
in the
Common Language Data Repository.
Default values are assigned to tuple fields automatically if these fields are
skipped during the tuple insert or update.
You can specify a default value for a field in the space_object:format()
call that defines the space format. Default values apply regardless of the field nullability:
any tuple in which the field is skipped or set to nil receives
the default value.
Default values can be set in two ways: explicitly or using a function.
Explicit default values are defined in the default parameter of the field declaration
in a space_object:format() call.
local books = box.schema.space.create('books')
books:format({
    { name = 'id', type = 'number' },
    { name = 'name', type = 'string' },
    { name = 'year', type = 'number', default = 2024 },
})
books:create_index('primary', { parts = { 1 } })
To use a default value for a field, skip it or assign nil:
books:insert { 1, 'Thinking in Java' }
books:insert { 2, 'How to code in Go', nil }
Any Lua object that can be evaluated during the space_object:format() call
may be used as a default value, for example:
- a constant:
default = 100
- an initialized variable:
default = default_size
- an expression:
default = 10 + default_size
- a function return value:
default = count_default()
Important
Explicit default values are evaluated only when setting the space format.
If you use a variable as a default value, its further assignments do not affect the default value.
To change the default values, call space_object:format() again.
See also the space_object:format() reference.
A default value can be defined as a return value of a stored Lua function. To be used as
the default, a function must be created with box.schema.func.create() with the function body specified,
must return one value of the field’s type, and must not yield.
box.schema.func.create('current_year', {
    language = 'Lua',
    body = "function() return require('datetime').now().year end"
})
Default functions are set in the default_func parameter of the field declaration
in a space_object:format() call. To make a function with no arguments the default
for a field, specify its name:
local books = box.schema.space.create('books')
books:format({
    { name = 'id', type = 'unsigned' },
    { name = 'isbn', type = 'string' },
    { name = 'title', type = 'string' },
    { name = 'year', type = 'unsigned', default_func = 'current_year' }
})
books:create_index('primary', { parts = { 1 } })
A default function can also have one argument.
box.schema.func.create('randomize', {
    language = 'Lua',
    body = "function(limit) return math.random(limit.min, limit.max) end"
})
To pass the function argument when setting the default, specify it in the default parameter
of the space_object:format() call:
books:format({
    { name = 'id', type = 'unsigned', default_func = 'randomize', default = {min = 0, max = 1000} },
    { name = 'isbn', type = 'string' },
    { name = 'title', type = 'string' },
    { name = 'year', type = 'unsigned', default_func = 'current_year' }
})
Note
A key difference between a default function (default_func = 'count_default')
and a function return value used as a field default value (default = count_default())
is the following:
- A default function is called every time a default value must be produced,
that is, a tuple is inserted or updated without specifying the field.
- A function return value used as a field default value: the function is called once
when setting the space format. Then, all tuples that do not specify the field receive
the result of this exact call.
See also the space_object:format() reference.
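The difference between the two can be modeled in plain Lua. This is only a sketch of the behavior described in the note, not Tarantool code: count_default is a hypothetical function that returns how many times it has been called, and fill() is a hypothetical helper that plays the role of Tarantool filling in a skipped field.

```lua
-- Hypothetical stand-in for a user function such as count_default():
-- each call returns the next integer.
local calls = 0
local function count_default()
  calls = calls + 1
  return calls
end

-- default = count_default(): the function runs once, when the format is set.
local format_default = count_default()

-- Fill field `field_no` of a tuple-like table if it was skipped (nil).
-- use_func = true models default_func; false models a plain default value.
local function fill(tuple, field_no, use_func)
  if tuple[field_no] == nil then
    if use_func then
      tuple[field_no] = count_default()  -- called for every such tuple
    else
      tuple[field_no] = format_default   -- the same stored result every time
    end
  end
  return tuple
end

local a = fill({ 1 }, 2, false)
local b = fill({ 2 }, 2, false)
local c = fill({ 3 }, 2, true)
local d = fill({ 4 }, 2, true)
local e = fill({ 5, 42 }, 2, true)  -- field given explicitly: no call made
```

Tuples a and b both receive 1, the result of the single call made when the format was set, while c and d receive 2 and 3 because the function runs once per tuple.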
For better control over stored data, Tarantool supports constraints – user-defined
limitations on the values of certain fields or entire tuples. Together with data types,
constraints allow limiting the ranges of available field values both syntactically and semantically.
For example, the field age typically has the number type, so it cannot store
strings or boolean values. However, it can still have values that don’t make sense,
such as negative numbers. This is where constraints come in.
There are two types of constraints in Tarantool:
- Field constraints check that the value being assigned to a field
satisfies a given condition. For example,
age must be non-negative.
- Tuple constraints check complex conditions that can involve all fields of
a tuple. For example, a tuple contains a date in three fields:
year, month, and day. You can validate day values based on
the month value (and even year if you consider leap years).
Field constraints work faster, while tuple constraints allow implementing
a wider range of limitations.
Constraints use stored Lua functions or SQL expressions, which must return true when the constraint
is satisfied. Other return values (including nil) and exceptions make the
check fail and prevent tuple insertion or modification.
To create a constraint function, call box.schema.func.create() with the function definition specified in the body attribute.
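Constraint functions take two parameters: the field value and the constraint name for field constraints,
or the tuple and the constraint name for tuple constraints.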
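To create a constraint in a space, specify the corresponding function’s name
in the constraint parameter: for field constraints, in the field definition when setting the space format;
for tuple constraints, when creating or altering the space.
In both cases, the constraint parameter can contain multiple function names passed as a tuple.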
Each constraint can have an optional name:
-- Create one more tuple constraint --
box.schema.func.create('another_constraint',
    {language = 'LUA', is_deterministic = true, body = 'function(t, c) return true end'})
-- Set two constraints with optional names --
box.space.customers:alter{
    constraint = { check1 = 'check_person', check2 = 'another_constraint'}
}
Note
When adding a constraint to an existing space with data, Tarantool checks it
against the stored data. If there are fields or tuples that don’t satisfy
the constraint, it won’t be applied to the space.
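The age and date checks mentioned at the beginning of this section can be sketched as plain Lua functions with the two-parameter signature that constraint functions use: the field value or the tuple, followed by the constraint name. The function names and the year, month, and day field positions are hypothetical, and Lua tables stand in for tuples so the logic can run outside Tarantool. To use such a check as a constraint, its source would go into the body of box.schema.func.create(), as in the examples above.

```lua
-- Field constraint: receives the field value and the constraint name.
-- Returns true only for non-negative numbers.
local function check_age(value, constraint_name)
  return type(value) == 'number' and value >= 0
end

-- Gregorian leap year rule.
local function is_leap(year)
  return (year % 4 == 0 and year % 100 ~= 0) or year % 400 == 0
end

-- Tuple constraint: receives the whole tuple and the constraint name.
-- Assumes fields 1, 2, and 3 hold the year, month, and day.
local function check_date(t, constraint_name)
  local year, month, day = t[1], t[2], t[3]
  if month < 1 or month > 12 or day < 1 then return false end
  local days = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
  if month == 2 and is_leap(year) then return day <= 29 end
  return day <= days[month]
end
```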
Foreign keys provide links between related fields, thereby maintaining the
referential integrity
of the database.
Fields can contain values that exist only in other fields. For example,
a shop order always belongs to a customer. Hence, all values of the customer
field of the orders space must also exist in the id field of the customers
space. In this case, customers is a parent space for orders (its child space).
When two spaces are linked with a foreign key, each time a tuple is inserted or
modified in the child space, Tarantool checks that a corresponding value is present
in the parent space.

Note
A foreign key can link a field to another field in the same space. In this case,
the child field must be nullable. Otherwise, it is impossible to insert
the first tuple in such a space because there is no parent tuple to which
it can link.
There are two types of foreign keys in Tarantool:
- Field foreign keys check that the value being assigned to a field
is present in a particular field of another space. For example, the customer
value in a tuple from the orders space must match an id stored in the customers space.
- Tuple foreign keys check that multiple fields of a tuple have a match in
another space. For example, if the
orders space has fields customer_id
and customer_name, a tuple foreign key can check that the customers space
contains a tuple with both these values in the corresponding fields.
Field foreign keys work faster, while tuple foreign keys allow implementing
stricter references.
Important
For each foreign key, there must exist a parent space index that includes
all its fields.
To create a foreign key in a space, specify the parent space and linked fields in the foreign_key parameter.
Parent spaces can be referenced by name or by id. When linking to the same space, the space can be omitted.
Fields can be referenced by name or by number:
Field foreign keys: when setting up the space format.
-- Create a space with a field foreign key --
box.schema.space.create('orders')
box.space.orders:format({
    {name = 'id', type = 'number'},
    {name = 'customer_id', foreign_key = {space = 'customers', field = 'id'}},
    {name = 'price_total', type = 'number'},
})
Tuple foreign keys: when creating or altering a space. Note that for foreign
keys with multiple fields there must exist an index that includes all these fields.
-- Create a space with a tuple foreign key --
box.schema.space.create("orders", {
    foreign_key = {
        space = 'customers',
        field = {customer_id = 'id', customer_name = 'name'}
    }
})
box.space.orders:format({
    {name = "id", type = "number"},
    {name = "customer_id" },
    {name = "customer_name"},
    {name = "price_total", type = "number"},
})
Note
Type can be omitted for foreign key fields because it’s
defined in the parent space.
Foreign keys can have an optional name.
-- Set a foreign key with an optional name --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = { customer_id = 'id', customer_name = 'name'}
        }
    }
}
A space can have multiple tuple foreign keys. In this case, they all must have names.
-- Set two foreign keys: names are mandatory --
box.space.orders:alter{
    foreign_key = {
        customer = {
            space = 'customers',
            field = {customer_id = 'id', customer_name = 'name'}
        },
        item = {
            space = 'items',
            field = {item_id = 'id'}
        }
    }
}
Tarantool performs integrity checks upon data modifications in parent spaces.
If you try to remove a tuple referenced by a foreign key or an entire parent space,
you will get an error.
Important
Renaming parent spaces or referenced fields may break the corresponding foreign
keys and prevent further insertions or modifications in the child spaces.
Indexes
An index is a special data structure that stores a group of key values and
pointers. It is used for efficient manipulations with data.
As with spaces, you should specify the index name and let Tarantool
come up with a unique numeric identifier (“index id”).
An index always has a type. The default index type is TREE.
TREE indexes are provided by all Tarantool engines, can index unique and
non-unique values, support partial key searches, comparisons, and ordered results.
Additionally, the memtx engine supports HASH,
RTREE and BITSET indexes.
An index may be multi-part, that is, you can declare that an index key value
is composed of two or more fields in the tuple, in any order.
For example, for an ordinary TREE index, the maximum number of parts is 255.
An index may be unique, that is, you can declare that it would be illegal
to have the same key value twice.
The first index defined on a space is called the primary key index,
and it must be unique. All other indexes are called secondary indexes,
and they may be non-unique.
Indexes have certain limitations. See details on page Limitations.
To create a generator for indexes, you can use a sequence object.
Learn how to do it in the tutorial.
Indexed field types are not to be confused with index types, that is, the types of the data structure that is an index.
See more about index types below.
Indexes restrict values that Tarantool can store with MsgPack.
This is why, for example, 'unsigned' and 'integer' are different field types,
although in MsgPack they are both stored as integer values.
An 'unsigned' index contains only non-negative integer values,
while an 'integer' index contains any integer values.
The default field type is 'unsigned' and the default index type is TREE.
Although 'nil' is not a legal indexed field type, indexes may contain nil
as a non-default option.
To learn more about field types, check the
Field type details section.
| Field type name string | Field type | Index type |
|---|---|---|
| 'boolean' | boolean | TREE or HASH |
| 'integer' (may also be called 'int') | integer, which may include unsigned values | TREE or HASH |
| 'unsigned' (may also be called 'uint' or 'num', but 'num' is deprecated) | unsigned | TREE, BITSET, or HASH |
| 'double' | double | TREE or HASH |
| 'number' | number, which may include integer, double, or decimal values | TREE or HASH |
| 'decimal' | decimal | TREE or HASH |
| 'string' (may also be called 'str') | string | TREE, BITSET, or HASH |
| 'varbinary' | varbinary | TREE, HASH, or BITSET (since version 2.7.1) |
| 'uuid' | uuid | TREE or HASH |
| 'datetime' | datetime | TREE |
| 'array' | array | RTREE |
| 'map' | table | Cannot be indexed |
| 'scalar' | may include nil, boolean, integer, unsigned, number, decimal, string, varbinary, or uuid values. When a scalar field contains values of different underlying types, the key order is: nils, then booleans, then numbers, then strings, then varbinaries, then uuids. | TREE or HASH |
An index always has a type. Different types are intended for different
usage scenarios.
We give an overview of index features in the following table:
| Feature | TREE | HASH | RTREE | BITSET |
|---|---|---|---|---|
| unique | + | + | - | - |
| non-unique | + | - | + | + |
| is_nullable | + | - | - | - |
| can be multi-part | + | + | - | - |
| multikey | + | - | - | - |
| partial-key search | + | - | - | - |
| can be primary key | + | + | - | - |
| exclude_null (version 2.8+) | + | - | - | - |
| Pagination (the after option) | + | - | - | - |
| iterator types | ALL, EQ, REQ, GT, GE, LT, LE | ALL, EQ | ALL, EQ, GT, GE, LT, LE, OVERLAPS, NEIGHBOR | ALL, EQ, BITS_ALL_SET, BITS_ANY_SET, BITS_ALL_NOT_SET |
Note
In 2.11.0, the GT iterator type is deprecated for HASH indexes.
The default index type is 'TREE'.
TREE indexes are provided by the memtx and vinyl engines, can index unique and
non-unique values, and support partial key searches, comparisons, and ordered results.
TREE is a universal index type and the best choice in most cases.
Additionally, the memtx engine supports HASH, RTREE, and BITSET indexes.
HASH indexes require unique fields and lose to TREE in almost all respects,
so we do not recommend using them in applications.
HASH is present in Tarantool mainly for backward compatibility.
Here are some tips. Do not use a HASH index:
- just because you can
- if you assume that HASH is faster without measuring performance
- if you want to iterate over the data
- for a primary key
- as the only index
Use a HASH index:
- if it is a secondary key
- if you are 100% sure you won’t need to make it non-unique
- if you have taken measurements on your data and see a noticeable
increase in performance
- if you need to save every byte on tuples (HASH is a little more compact)
RTREE is a multidimensional index supporting up to 20 dimensions.
It is used especially for indexing spatial information, such as geographical
objects. The example below demonstrates spatial searches
via an RTREE index.
An RTREE index cannot be primary and cannot be unique.
The option list of this type of index may contain the dimension and distance options.
The parts definition must contain exactly one part, of type array.
An RTREE index accepts two types of distance functions: euclid and manhattan.
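The distance option changes which tuples a NEIGHBOR search treats as closest. As a sketch of the two metrics as they are usually defined, not of Tarantool's internal code, the plain Lua functions below compute the Euclidean distance (square root of the sum of squared coordinate differences) and the Manhattan distance (sum of absolute coordinate differences); the function names are hypothetical. The metric is chosen with the distance option when the index is created, for example distance = 'manhattan'.

```lua
-- Euclidean distance between two points given as arrays of coordinates.
local function euclid(p, q)
  local sum = 0
  for i = 1, #p do
    local d = p[i] - q[i]
    sum = sum + d * d
  end
  return math.sqrt(sum)
end

-- Manhattan (taxicab) distance: sum of absolute coordinate differences.
local function manhattan(p, q)
  local sum = 0
  for i = 1, #p do
    sum = sum + math.abs(p[i] - q[i])
  end
  return sum
end
```

For the points {3, 4} and {1, 5}, the Euclidean metric places {3, 4} closer to the origin (5 versus about 5.1), while the Manhattan metric places {1, 5} closer (6 versus 7), so the same NEIGHBOR search can return tuples in a different order.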
Warning
Currently, the isolation level of RTREE indexes
in MVCC transaction mode is read-committed (not serializable, as stated).
If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level).
However, the indexes are subject to different anomalies that can make them unserializable.
Example 1:
my_space = box.schema.create_space("tester")
my_space:format({ { type = 'number', name = 'id' }, { type = 'array', name = 'content' } })
primary_index = my_space:create_index('primary', { type = 'tree', parts = {'id'} })
rtree_index = my_space:create_index('spatial', { type = 'RTREE', unique = false, parts = {'content'} })
The indexed tuple field must therefore be an array of 2 or 4 numbers.
2 numbers mean a point {x, y};
4 numbers mean a rectangle {x1, y1, x2, y2},
where (x1, y1) and (x2, y2) are diagonally opposite corners of the rectangle.
my_space:insert{1, {1, 1}}
my_space:insert{2, {2, 2, 3, 3}}
Selection results depend on the chosen iterator.
The default EQ iterator searches for an exact rectangle;
a point is treated as a rectangle of zero width and height.
Iterator ALL, which is the default when no key is specified,
selects all tuples in arbitrary order.
Iterator LE (less or equal) searches for tuples whose rectangles
lie within the specified rectangle.
Iterator LT (less than, or strictly less) searches for tuples
whose rectangles lie strictly within the specified rectangle.
Iterator GE searches for tuples whose rectangles contain the specified rectangle.
Iterator GT searches for tuples whose rectangles strictly contain the specified rectangle.
Iterator OVERLAPS searches for tuples whose rectangles overlap the specified rectangle.
Iterator NEIGHBOR searches for all tuples and orders them by distance to the specified point.
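The containment rules of these iterators can be modelled in plain Lua. This sketch is not Tarantool's implementation, and the helper names (`bounds`, `within`, `rtree_select`) are hypothetical. Two simplifications apply. First, "strictly" is read as a strict comparison on every edge. Second, ALL and NEIGHBOR are left out. The model takes the number of dimensions as a parameter, so it also covers 3D, 4D, and higher indexes.

```lua
-- Simplified model of the RTREE iterator rules above (not Tarantool's code).
-- A box in `dim` dimensions is 2*dim numbers: one corner, then the opposite
-- corner. A point is `dim` numbers and acts as a zero-size box.

-- Convert a point or a box into per-axis {low, high} bounds
local function bounds(coords, dim)
    local b = {}
    for i = 1, dim do
        local a = coords[i]
        local c = coords[i + dim] or a  -- a point has no opposite corner
        b[i] = { math.min(a, c), math.max(a, c) }
    end
    return b
end

-- Is box `inner` inside box `outer`? With `strict`, no shared edges allowed
local function within(inner, outer, strict)
    for i = 1, #inner do
        local lo, hi = outer[i][1] <= inner[i][1], inner[i][2] <= outer[i][2]
        if strict then
            lo, hi = outer[i][1] < inner[i][1], inner[i][2] < outer[i][2]
        end
        if not (lo and hi) then return false end
    end
    return true
end

-- Do boxes `a` and `b` share at least one point?
local function overlaps(a, b)
    for i = 1, #a do
        if a[i][1] > b[i][2] or b[i][1] > a[i][2] then return false end
    end
    return true
end

-- Does a stored box `t` match the query box `q` for the given iterator?
local function matches(iterator, t, q)
    if iterator == 'EQ' then
        return within(t, q, false) and within(q, t, false)
    elseif iterator == 'LE' then
        return within(t, q, false)
    elseif iterator == 'LT' then
        return within(t, q, true)
    elseif iterator == 'GE' then
        return within(q, t, false)
    elseif iterator == 'GT' then
        return within(q, t, true)
    elseif iterator == 'OVERLAPS' then
        return overlaps(t, q)
    end
    error('iterator not modelled: ' .. tostring(iterator))
end

-- Return the ids of `rows` ({id, coords} pairs) that match the query
local function rtree_select(rows, query, iterator, dim)
    local q = bounds(query, dim)
    local ids = {}
    for _, row in ipairs(rows) do
        if matches(iterator, bounds(row[2], dim), q) then
            table.insert(ids, row[1])
        end
    end
    return ids
end

-- The two tuples from Example 1: a point and a rectangle
local rows = { { 1, { 1, 1 } }, { 2, { 2, 2, 3, 3 } } }
print(table.concat(rtree_select(rows, { 0, 0, 3, 3 }, 'LE', 2), ', '))  -- 1, 2
print(table.concat(rtree_select(rows, { 0, 0, 3, 3 }, 'LT', 2), ', '))  -- 1
```

The rectangle {2, 2, 3, 3} matches LE but not LT for the query {0, 0, 3, 3}, because the two rectangles share the edges at x = 3 and y = 3.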
Example 2:
RTREE indexes with 3, 4, or more dimensions work the same way as 2D ones, except
that you set the dimension option when creating the index and specify more coordinates in requests.
Here is a short example of using a 4D tree:
Note
Keep in mind that a select with the NEIGHBOR iterator and no limit extracts
the entire space in order of increasing distance. If there is a lot of data,
this can seriously affect performance.
Another frequent mistake is to specify the iterator type without quotes,
like this: rtree_index:select(rect, {iterator = LE}).
This silently performs an EQ select: LE is an undefined variable that evaluates
to nil, so the iterator option is unset and the default is used.
Write {iterator = 'LE'} instead.
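The pitfall comes from plain Lua semantics and can be reproduced without Tarantool. The sketch below assumes that no global named `LE` is defined. Tarantool also provides iterator constants such as box.index.LE, which avoid the problem as well.

```lua
-- `LE` is not defined anywhere, so this plain-Lua chunk reads it as nil
local wrong = { iterator = LE }
-- Assigning nil to a field creates no field: the options table is empty
print(wrong.iterator)      -- nil
print(next(wrong) == nil)  -- true

-- Quoting the name passes the string that the iterator option expects
local right = { iterator = 'LE' }
print(right.iterator)      -- LE
```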
A BITSET index treats the indexed value as a bit mask. Use it when you need to search by bit masks,
for example, when you store a vector of attributes and search by these
attributes.
Warning
Currently, the isolation level of BITSET indexes
in MVCC transaction mode is read-committed rather than serializable.
If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level).
However, these indexes are subject to various anomalies that can make transactions unserializable.
Example 1:
The following script shows creating and searching with a BITSET index.
Notice that BITSET cannot be unique, so first a primary-key index is created,
and bit values are entered as hexadecimal literals for easier reading.
Example 2:
The result will be:
because (7 AND 2) is not equal to 0, and (3 AND 2) is not equal to 0.
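The matching rule behind that result can be modelled in plain Lua. This sketch rests on several assumptions:
- It uses the names of Tarantool's BITSET iterator types BITS_ALL_SET, BITS_ANY_SET, and BITS_ALL_NOT_SET, but it models only their match conditions, not the index itself.
- The stored values 7, 3, and 4 are hypothetical.
- The helpers `band` and `bitset_select` are hypothetical. `band` uses only arithmetic so that it runs on any Lua version; LuaJIT, which Tarantool uses, also offers `bit.band`.

```lua
-- Bitwise AND for non-negative integers, written with arithmetic only
local function band(a, b)
    local result, bit = 0, 1
    while a > 0 and b > 0 do
        if a % 2 == 1 and b % 2 == 1 then
            result = result + bit
        end
        a = math.floor(a / 2)
        b = math.floor(b / 2)
        bit = bit * 2
    end
    return result
end

-- Match conditions of the three bit iterators for one stored value
local function bitset_match(iterator, value, mask)
    local common = band(value, mask)
    if iterator == 'BITS_ALL_SET' then
        return common == mask      -- every bit of the mask is set
    elseif iterator == 'BITS_ANY_SET' then
        return common ~= 0         -- at least one bit of the mask is set
    elseif iterator == 'BITS_ALL_NOT_SET' then
        return common == 0         -- no bit of the mask is set
    end
    error('iterator not modelled: ' .. tostring(iterator))
end

-- Return the values that match a mask, keeping their original order
local function bitset_select(values, mask, iterator)
    local result = {}
    for _, v in ipairs(values) do
        if bitset_match(iterator, v, mask) then
            table.insert(result, v)
        end
    end
    return result
end

-- Hypothetical stored values: 7 = binary 111, 3 = binary 011, 4 = binary 100
local values = { 7, 3, 4 }
print(table.concat(bitset_select(values, 2, 'BITS_ANY_SET'), ', '))      -- 7, 3
print(table.concat(bitset_select(values, 2, 'BITS_ALL_NOT_SET'), ', '))  -- 4
```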
Additionally, there are
index iterator operations.
They can be used only from Lua and C/C++ code. Index iterators are for
traversing indexes one key at a time, taking advantage of features that are
specific to an index type.
For example, they can be used for evaluating Boolean expressions when
traversing BITSET indexes, or for going in descending order when traversing TREE
indexes.
Using indexes
It is mandatory to create an index for a space before trying to insert
tuples into the space, or select tuples from the space.
The simple index-creation
operation is:
box.space.space-name:create_index('index-name')
This creates a unique TREE index on the first field
of all tuples (often called “Field#1”), which is assumed to be numeric.
A recommended design pattern for a data model is to base primary keys on the
first fields of a tuple. This speeds up tuple comparison due to the specifics of
data storage and the way comparisons are arranged in Tarantool.
The simple SELECT request is:
box.space.space-name:select(value)
This looks for a single tuple via the first index. Since the first index
is always unique, the maximum number of returned tuples will be 1.
You can call select() without arguments, and it will return all tuples.
Be careful: calling select() without arguments on a huge space can hang your instance.
An index definition may also include identifiers of tuple fields
and their expected types (for the allowed types, see the section
Details about indexed field types), for example:
box.space.space-name:create_index('index-name', {type = 'tree', parts = {{field = 1, type = 'unsigned'}}})
Space definitions and index definitions are stored permanently in Tarantool’s
system spaces _space and _index.
Tip
See full information about creating indexes, such as
how to create a multikey index, an index using the path option, or
how to create a functional index in our reference for
space_object:create_index().
Index operations are automatic: if a data manipulation request changes a tuple,
then it also changes the index keys defined for the tuple.
Create a sample space named bands:
bands = box.schema.space.create('bands')
Format the created space by specifying field names and types:
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
Create the primary index (named primary):
box.space.bands:create_index('primary', { parts = { 'id' } })
This index is based on the id field of each tuple.
Insert some tuples into the space:
box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
box.space.bands:insert { 3, 'Ace of Base', 1987 }
box.space.bands:insert { 4, 'The Beatles', 1960 }
box.space.bands:insert { 5, 'Pink Floyd', 1965 }
box.space.bands:insert { 6, 'The Rolling Stones', 1962 }
box.space.bands:insert { 7, 'The Doors', 1965 }
box.space.bands:insert { 8, 'Nirvana', 1987 }
box.space.bands:insert { 9, 'Led Zeppelin', 1968 }
box.space.bands:insert { 10, 'Queen', 1970 }
Create secondary indexes:
-- Create a unique secondary index --
box.space.bands:create_index('band', { parts = { 'band_name' } })
-- Create a non-unique secondary index --
box.space.bands:create_index('year', { parts = { { 'year' } }, unique = false })
Create a multi-part index with two parts:
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })
There are the following SELECT variations:
The search can use comparisons other than equality:
-- Select at most 3 tuples with a year greater than 1965 --
select_greater = bands.index.year:select({ 1965 }, { iterator = 'GT', limit = 3 })
--[[
---
- - [9, 'Led Zeppelin', 1968]
- [10, 'Queen', 1970]
- [1, 'Roxette', 1986]
...
--]]
The comparison operators are:
LT for “less than”
LE for “less than or equal”
GT for “greater than”
GE for “greater than or equal”
EQ for “equal”
REQ for “reversed equal”
Value comparisons make sense if and only if the index type is TREE.
Other index types support different iterator types, which work
differently. See details in section Iterator types.
Note that the example above selects through the year index explicitly. If you call
select() on the space object itself, for example, box.space.bands:select(...), the primary index is used.
This type of search may return more than one tuple. The tuples will be sorted
in descending order by key if the comparison operator is LT or LE or REQ.
Otherwise they will be sorted in ascending order.
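The ordering rules can be modelled in plain Lua over the same ten rows as the bands space. This is a sketch, not Tarantool's implementation. `tree_select` is a hypothetical helper, and only GT, GE, LT, and LE are modelled. Ties between equal years are assumed to be ordered by id.

```lua
-- Plain-Lua model of TREE iterators over the non-unique `year` index
local rows = {
    { 1, 'Roxette', 1986 }, { 2, 'Scorpions', 1965 }, { 3, 'Ace of Base', 1987 },
    { 4, 'The Beatles', 1960 }, { 5, 'Pink Floyd', 1965 },
    { 6, 'The Rolling Stones', 1962 }, { 7, 'The Doors', 1965 },
    { 8, 'Nirvana', 1987 }, { 9, 'Led Zeppelin', 1968 }, { 10, 'Queen', 1970 },
}

-- Key conditions for the modelled iterators (REQ and EQ are left out)
local conditions = {
    GT = function(key, value) return key > value end,
    GE = function(key, value) return key >= value end,
    LT = function(key, value) return key < value end,
    LE = function(key, value) return key <= value end,
}

local function tree_select(data, value, iterator, limit)
    local matches = conditions[iterator] or error('iterator not modelled: ' .. tostring(iterator))
    -- Index order: by year, with equal years ordered by id (assumed tie-break)
    local sorted = {}
    for i, row in ipairs(data) do sorted[i] = row end
    table.sort(sorted, function(a, b)
        if a[3] ~= b[3] then return a[3] < b[3] end
        return a[1] < b[1]
    end)
    -- LT and LE walk the index backwards, so results come out descending
    local first, last, step = 1, #sorted, 1
    if iterator == 'LT' or iterator == 'LE' then
        first, last, step = #sorted, 1, -1
    end
    local result = {}
    for i = first, last, step do
        if matches(sorted[i][3], value) then
            table.insert(result, sorted[i])
            if limit and #result >= limit then break end
        end
    end
    return result
end

-- Same request as above: at most 3 bands with a year greater than 1965
for _, row in ipairs(tree_select(rows, 1965, 'GT', 3)) do
    print(row[1], row[2], row[3])  -- Led Zeppelin, then Queen, then Roxette
end
```

With LT, the same data comes out newest-first: tree_select(rows, 1965, 'LT') yields The Rolling Stones (1962) before The Beatles (1960).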
The search can use a secondary index.
-- Select a tuple by the specified secondary key value --
select_secondary = bands.index.band:select { 'The Doors' }
--[[
---
- - [7, 'The Doors', 1965]
...
--]]
Partial key search: the search may specify only the leading parts of a
multi-part key (a key prefix). Note that partial key searches are available
only in TREE indexes.
-- Select tuples by the specified partial key value --
select_multipart_partial = bands.index.year_band:select { 1965 }
--[[
---
- - [5, 'Pink Floyd', 1965]
- [2, 'Scorpions', 1965]
- [7, 'The Doors', 1965]
...
--]]
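Prefix matching on the two-part year_band index can be sketched in plain Lua. This is a model, not Tarantool's implementation. The helpers `has_prefix` and `select_by_prefix` are hypothetical. The model assumes entries are ordered by year and then by band name, as the index parts suggest.

```lua
-- Plain-Lua model of a partial-key search on the two-part year_band index
local rows = {
    { 1, 'Roxette', 1986 }, { 2, 'Scorpions', 1965 }, { 3, 'Ace of Base', 1987 },
    { 4, 'The Beatles', 1960 }, { 5, 'Pink Floyd', 1965 },
    { 6, 'The Rolling Stones', 1962 }, { 7, 'The Doors', 1965 },
    { 8, 'Nirvana', 1987 }, { 9, 'Led Zeppelin', 1968 }, { 10, 'Queen', 1970 },
}

-- Does `key` start with all the parts given in `prefix`?
local function has_prefix(key, prefix)
    for i = 1, #prefix do
        if key[i] ~= prefix[i] then return false end
    end
    return true
end

local function select_by_prefix(data, prefix)
    -- Index order: by year, then by band name
    local sorted = {}
    for i, row in ipairs(data) do sorted[i] = row end
    table.sort(sorted, function(a, b)
        if a[3] ~= b[3] then return a[3] < b[3] end
        return a[2] < b[2]
    end)
    local result = {}
    for _, row in ipairs(sorted) do
        if has_prefix({ row[3], row[2] }, prefix) then
            table.insert(result, row)
        end
    end
    return result
end

for _, row in ipairs(select_by_prefix(rows, { 1965 })) do
    print(row[1], row[2])  -- Pink Floyd, Scorpions, The Doors, as in the output above
end
```

Passing both parts, as in select_by_prefix(rows, {1960, 'The Beatles'}), turns the prefix into a full key and returns a single row.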
The search can be for all fields, using a table as the value:
-- Select a tuple by the specified multi-part secondary key value --
select_multipart = bands.index.year_band:select { 1960, 'The Beatles' }
--[[
---
- - [4, 'The Beatles', 1960]
...
--]]
Tip
You can also add, drop, or alter the definitions at runtime, with some
restrictions. Read more about index operations in the reference for the
box.index submodule.
Tuple compression
Tuple compression, introduced in Tarantool Enterprise Edition 2.10.0, aims to save memory space.
Typically, it decreases the volume of stored data by 15%.
However, the exact volume saved depends on the type of data.
The following compression algorithms are supported:
To learn about the performance costs of each algorithm,
check Tuple compression performance.
Tarantool doesn’t compress tuples themselves, just the fields inside these tuples.
You can only compress non-indexed fields.
Compression works best when JSON is stored in the field.
Note
The compress module provides the API for compressing and decompressing data.
Enabling compression for a new space
First, create a space:
box.schema.space.create('bands')
Then, create an index for this space, for example:
box.space.bands:create_index('primary', {parts = {{1, 'unsigned'}}})
Create a format to declare field names and types.
In the example below, the band_name and year fields have the zstd and lz4 compression formats, respectively.
The first field (id) has the index, so it cannot be compressed.
box.space.bands:format({
{name = 'id', type = 'unsigned'},
{name = 'band_name', type = 'string', compression = 'zstd'},
{name = 'year', type = 'unsigned', compression = 'lz4'}
})
Now, new tuples that you add to the bands space will be compressed.
When you read a compressed tuple, you do not need to decompress it yourself: Tarantool does it automatically.
Checking which fields are compressed
To check which fields in a space are compressed, run
space_object:format() on the space.
If a field is compressed, the format includes the compression algorithm, for example:
Enabling compression for existing spaces
You can enable compression for existing fields.
All tuples added after that will have these fields compressed.
However, this does not affect the tuples already stored in the space.
To compress the existing tuples as well, create a snapshot and restart Tarantool.
Here’s an example of how to compress existing fields:
Create a space without compression and add several tuples:
box.schema.space.create('bands')
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
box.space.bands:insert { 3, 'Ace of Base', 1987 }
box.space.bands:insert { 4, 'The Beatles', 1960 }
Suppose that you want fields 2 and 3 to be compressed from now on.
To enable compression, change the format as follows:
local new_format = box.space.bands:format()
new_format[2].compression = 'zstd'
new_format[3].compression = 'lz4'
box.space.bands:format(new_format)
From now on, all the tuples that you add to the space have fields 2 and 3 compressed.
To finalize the change, create a snapshot by running
box.snapshot() and restart Tarantool.
As a result, all old tuples will also be compressed in memory during recovery.
Note
space:upgrade() provides the ability to enable compression
and update the existing tuples in the background.
To achieve this, you need to pass a new space format in the format argument of space:upgrade().
Data schema description
In Tarantool, the use of a data schema is optional.
When creating a space, you do not have to define a data schema. In this case,
the tuples can store arbitrary data. This does not apply to indexed fields:
such fields must contain data of the type that the index expects.
You can define a data schema when creating a space. Read more in the description of the
box.schema.space.create() function.
If you have already created a space without specifying a data schema, you can define it later using
space_object:format().
Once the data schema is defined, all data is validated by type: an insert or update
fails with an error if the data types do not match.
We recommend using a data schema because it helps avoid mistakes.
In Tarantool, you can define a data schema in two different ways.
Data schema description in a code file
The code file is usually called init.lua and contains the following schema description:
box.cfg()
users = box.schema.create_space('users', { if_not_exists = true })
users:format({{ name = 'user_id', type = 'number'}, { name = 'fullname', type = 'string'}})
users:create_index('pk', { parts = { { field = 'user_id', type = 'number'}}, if_not_exists = true })
This is quite simple: when you run tarantool, it executes this code and creates
a data schema. Thanks to if_not_exists, the file can also be run again after a restart. To run this file, use:
$ tarantool init.lua
However, this approach may seem complicated if you do not plan to dive deep into the Lua language and its syntax.
One possible difficulty is the function call with a colon in the snippet above: users:format.
The colon passes the users variable as the first argument
of the format function.
This is similar to self in object-oriented languages.
So it might be more convenient for you to describe the data schema with YAML.
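The colon call syntax is plain Lua. The sketch below uses a hypothetical table and method (`users`, `describe`) as a stand-in for a space object; `describe` is not a Tarantool method.

```lua
-- A table with a method, standing in for a space object such as `users`
local users = { name = 'users' }

-- Defining with a colon adds an implicit first parameter called `self`
function users:describe(suffix)
    return self.name .. suffix
end

-- These two calls are equivalent: the colon passes `users` as `self`
print(users:describe('!'))         -- users!
print(users.describe(users, '!'))  -- users!
```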
Data schema description using the DDL module
The DDL module allows you to describe a data schema
in the YAML format in a declarative way.
The schema would look something like this:
spaces:
  users:
    engine: memtx
    is_local: false
    temporary: false
    format:
      - {name: user_id, type: uuid, is_nullable: false}
      - {name: fullname, type: string, is_nullable: false}
      - {name: bucket_id, type: unsigned, is_nullable: false}
    indexes:
      - name: user_id
        unique: true
        parts: [{path: user_id, type: uuid, is_nullable: false}]
        type: HASH
      - name: bucket_id
        unique: false
        parts: [{path: bucket_id, type: unsigned, is_nullable: false}]
        type: TREE
    sharding_key: [user_id]
    sharding_func: test_module.sharding_func
This alternative is simpler to use, and you do not have to dive deep into Lua.
To use the DDL module, put the following Lua code into the file that you use to run Tarantool.
This file is usually called init.lua.
local yaml = require('yaml')
local ddl = require('ddl')
box.cfg{}
-- Read and parse the YAML schema description --
local fh = assert(io.open('ddl.yml', 'r'))
local schema = yaml.decode(fh:read('*all'))
fh:close()
-- Validate the schema; stop if it is invalid --
local ok, err = ddl.check_schema(schema)
if not ok then
    error(err)
end
-- Apply the schema --
ok, err = ddl.set_schema(schema)
if not ok then
    error(err)
end
Warning
It is forbidden to modify the data schema in DDL after it has been applied.
For migration, there are different scenarios described in the Migrations section.
Operations
The basic data operations supported in Tarantool are:
- five data-manipulation operations (INSERT, UPDATE, UPSERT, DELETE, REPLACE), and
- one data-retrieval operation (SELECT).
All of them are implemented as functions in the box.space submodule.
Examples:
INSERT: Add a new tuple to space ‘tester’.
The first field, field[1], will be 999 (MsgPack type is integer).
The second field, field[2], will be ‘Taranto’ (MsgPack type is string).
UPDATE: Update the tuple, changing field field[2].
The clause “{999}”, which has the value to look up in the index of the tuple’s
primary-key field, is mandatory, because update() requests must always have
a clause that specifies a unique key, which in this case is field[1].
The clause “{{‘=’, 2, ‘Tarantino’}}” specifies that assignment will happen to
field[2] with the new value.
UPSERT: Upsert the tuple, changing field field[2]
again.
The syntax of upsert() is similar to the syntax of update(). However,
the execution logic of these two requests is different.
UPSERT is either UPDATE or INSERT, depending on the database’s state.
Also, UPSERT execution is postponed until after transaction commit, so, unlike
update(), upsert() does not return any data.
REPLACE: Replace the tuple, adding a new field.
This is also possible with the update() request, but the update()
request is usually more complicated.
SELECT: Retrieve the tuple.
The clause “{999}” is still mandatory, although it does not have to mention
the primary key.
DELETE: Delete the tuple.
In this example, we identify the primary-key field.
Summarizing the examples:
- Functions insert and replace accept a tuple (where a primary key comes as part of the tuple).
- Function upsert accepts a tuple (where a primary key comes as part of the tuple), and also the update operations to execute.
- Function delete accepts a full key of any unique index (primary or secondary).
- Function update accepts a full key of any unique index (primary or secondary), and also the operations to execute.
- Function select accepts any key: primary/secondary, unique/non-unique, full/partial.
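The rules in this summary can be illustrated with a toy model in plain Lua: a table keyed by the first field, standing in for a space with only a primary index. This is not Tarantool's implementation. The helpers are hypothetical, and only the '=' and '+' update operations are modelled. Apart from 999, 'Taranto', and 'Tarantino', which come from the text above, the values are illustrative.

```lua
-- Toy model of a space keyed by its first field (the primary key)
local space = {}

local function copy(t)
    local c = {}
    for i, v in ipairs(t) do c[i] = v end
    return c
end

-- Apply update operations such as {'=', 2, 'x'} or {'+', 3, 1} to a copy
local function apply(tuple, ops)
    local t = copy(tuple)
    for _, op in ipairs(ops) do
        local kind, field, value = op[1], op[2], op[3]
        if kind == '=' then
            t[field] = value
        elseif kind == '+' then
            t[field] = t[field] + value
        else
            error('operation not modelled: ' .. tostring(kind))
        end
    end
    return t
end

local function insert(tuple)  -- fails if the primary key is taken
    if space[tuple[1]] ~= nil then error('duplicate primary key') end
    space[tuple[1]] = copy(tuple)
    return space[tuple[1]]
end

local function replace(tuple)  -- overwrites or inserts
    space[tuple[1]] = copy(tuple)
    return space[tuple[1]]
end

local function update(key, ops)  -- needs a full key; returns nil if absent
    if space[key] == nil then return nil end
    space[key] = apply(space[key], ops)
    return space[key]
end

local function upsert(tuple, ops)  -- update if found, else insert; returns nothing
    local old = space[tuple[1]]
    if old ~= nil then
        space[tuple[1]] = apply(old, ops)
    else
        space[tuple[1]] = copy(tuple)
    end
end

local function delete(key)  -- returns the deleted tuple, if any
    local old = space[key]
    space[key] = nil
    return old
end

-- The request sequence from the examples, with illustrative values
insert({ 999, 'Taranto' })
update(999, { { '=', 2, 'Tarantino' } })
upsert({ 999, 'Taranto' }, { { '=', 2, 'Tarantism' } })  -- key exists: only the operation runs
replace({ 999, 'Tarantella', 'Tarantula' })
print(space[999][2], #space[999])  -- Tarantella and 3
delete(999)
```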
See reference on box.space for more
details on using data operations.
The reference for the box.space and
box.index
submodules includes notes about which complexity factors might affect the
resource usage of each function.
| Complexity factor | Effect |
| Index size | The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although the effect is not linear. For a HASH index, if there are more keys, then there is more RAM used, but the number of low-level steps tends to remain constant. |
| Index type | Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one. |
| Number of indexes accessed | Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes. Note regarding storage engine: Vinyl optimizes away such accesses if secondary index fields are unchanged by the update. So, this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update. |
| Number of tuples accessed | A few requests, for example, SELECT, can retrieve multiple tuples. This factor is usually less important than the others. |
| WAL settings | The important setting for the write-ahead log is wal.mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others. |
CRUD operation examples
This section shows basic usage scenarios and typical errors for each
data operation in Tarantool:
INSERT,
DELETE,
UPDATE,
UPSERT,
REPLACE, and
SELECT.
Before trying out the examples, you need to bootstrap a Tarantool instance as shown below.
-- Create a space --
bands = box.schema.space.create('bands')
-- Specify field names and types --
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
-- Create a primary index --
box.space.bands:create_index('primary', { parts = { 'id' } })
-- Create a unique secondary index --
box.space.bands:create_index('band', { parts = { 'band_name' } })
-- Create a non-unique secondary index --
box.space.bands:create_index('year', { parts = { { 'year' } }, unique = false })
-- Create a multi-part index --
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })
The space_object.insert method accepts a well-formatted tuple.
insert also checks all the keys for duplicates.
space_object.update allows you to update a tuple identified by the primary key.
Similarly to delete, the update method accepts a full key and also an operation to execute.
index_object.update updates a tuple identified by the specified unique index.
space_object.upsert updates an existing tuple or inserts a new one:
- If the existing tuple is found by the primary key,
Tarantool applies the update operation to this tuple
and ignores the new tuple.
- If no existing tuple is found,
Tarantool inserts the new tuple and ignores the update operation.
upsert acts as insert when no existing tuple is found by the primary key.
upsert searches for the existing tuple by the primary index,
not by a secondary index. This can lead to a duplicate-key error
if the new tuple violates the uniqueness of a secondary index.
space_object.replace accepts a well-formatted tuple and searches for the existing tuple
by the primary key of the new tuple:
- If the existing tuple is found, Tarantool deletes it and inserts the new tuple.
- If no existing tuple is found, Tarantool inserts the new tuple.
Like upsert, replace can fail with a duplicate-key error if the new tuple violates the uniqueness of a secondary index.
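The following plain-Lua model shows why replace can fail even though it looks only at the primary key. It assumes a primary key in field 1 and a unique secondary key in field 2, like the band index above. `by_id`, `by_name`, and `replace` are hypothetical stand-ins, not Tarantool's implementation. upsert behaves the same way when it ends up inserting a new tuple.

```lua
-- Toy model: primary key in field 1, unique secondary key in field 2
local by_id, by_name = {}, {}

-- Store `tuple`, replacing any tuple with the same primary key, but refuse
-- if another tuple already owns the secondary key
local function replace(tuple)
    local id, name = tuple[1], tuple[2]
    local owner = by_name[name]
    if owner ~= nil and owner ~= id then
        error('duplicate key in unique secondary index: ' .. tostring(name))
    end
    local old = by_id[id]
    if old ~= nil then by_name[old[2]] = nil end  -- drop the old secondary entry
    by_id[id] = tuple
    by_name[name] = id
    return tuple
end

replace({ 1, 'Roxette', 1986 })
replace({ 2, 'Scorpions', 1965 })
replace({ 1, 'Roxette', 1987 })  -- fine: same primary key, same name

-- Primary key 3 is new, but 'Scorpions' already belongs to id 2
local ok, err = pcall(replace, { 3, 'Scorpions', 1970 })
print(ok)  -- false
```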
The space_object.select request searches for a tuple or a set of tuples in the given space
by the primary key.
To search by the specified index, use index_object.select.
These methods work with any keys, including unique and non-unique, full and partial.
If a key is partial, select returns all tuples whose keys start with the specified key parts.
Using box.space functions to read _space tuples
This example illustrates how to look at all the spaces, and for each
display approximately how many tuples it contains and the first field of
its first tuple. The function uses Tarantool's box.space functions len()
and pairs(). The iteration through the spaces is coded as a scan of the
_space system space, which contains metadata. The third field in
_space contains the space name, so the key instruction
space_name = v[3] means space_name is the space_name field in
the tuple of _space that we’ve just fetched with pairs(). The function
returns a table:
function example()
    local ta = {}
    -- Scan _space; field 3 of each tuple is the space name --
    for _, v in box.space._space:pairs() do
        local space_name = v[3]
        local space = box.space[space_name]
        local line
        if space.index[0] == nil then
            -- Without a primary index a space holds no tuples, and len() would fail --
            line = space_name .. ' tuple_count = 0'
        else
            -- Number of tuples in the space --
            line = space_name .. ' tuple_count = ' .. space:len()
            -- Append the first field of the first tuple, if there is one --
            for _, t in space:pairs() do
                line = line .. '. first field in first tuple = ' .. tostring(t[1])
                break
            end
        end
        table.insert(ta, line)
    end
    return ta
end
The output below shows what happens if you invoke this function:
Using box.space functions to organize a _space tuple
This example shows how to display field names and field types of a system space,
using metadata to find metadata.
To begin: how can one select the _space tuple that describes _space?
A simple way is to look at the constants in box.schema,
which shows that there is an item named SPACE_ID == 288,
so these statements retrieve the correct tuple:
box.space._space:select{ 288 }
-- or --
box.space._space:select{ box.schema.SPACE_ID }
Another way is to look at the tuples in box.space._index,
which shows that there is a secondary index named ‘name’ for space
number 288, so this statement also retrieves the correct tuple:
box.space._space.index.name:select{ '_space' }
However, the retrieved tuple is not easy to read:
It looks disorganized because field number 7 holds the space format: the recommended
field names and data types. How can one get those specific sub-fields? Since
field number 7 is an array of maps, a for loop over it can do the
organizing:
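A minimal plain-Lua sketch of such a loop is shown below. It runs against a hand-written, partial stand-in for field 7, so it works without a running instance. In a Tarantool console, you would presumably pass box.space._space:get{288}[7] instead of `sample_format`. `describe_format` and `sample_format` are hypothetical names.

```lua
-- Turn a space format (an array of maps with `name` and `type` keys)
-- into readable "N. name: type" lines
local function describe_format(format)
    local lines = {}
    for i, field in ipairs(format) do
        lines[#lines + 1] = string.format('%d. %s: %s', i, field.name, field.type)
    end
    return lines
end

-- Hand-written, partial stand-in for field 7 of the _space tuple
local sample_format = {
    { name = 'id', type = 'unsigned' },
    { name = 'owner', type = 'unsigned' },
    { name = 'name', type = 'string' },
}

for _, line in ipairs(describe_format(sample_format)) do
    print(line)  -- for example: 1. id: unsigned
end
```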
Using sequences
A sequence is a generator of ordered integer values.
As with spaces and indexes, you should specify the sequence name and let
Tarantool generate a unique numeric identifier (sequence ID).
You can also specify several options when creating a new sequence.
The options determine the values that are generated whenever the sequence is used.
Options for box.schema.sequence.create()
| Option name | Type and meaning | Default | Examples |
| start | Integer. The value to generate the first time a sequence is used | 1 | start=0 |
| min | Integer. Values smaller than this cannot be generated | 1 | min=-1000 |
| max | Integer. Values larger than this cannot be generated | 9223372036854775807 | max=0 |
| cycle | Boolean. Whether to start again when values cannot be generated | false | cycle=true |
| cache | Integer. The number of values to store in a cache | 0 | cache=0 |
| step | Integer. What to add to the previous generated value, when generating a new value | 1 | step=-1 |
| if_not_exists | Boolean. If this is true and a sequence with this name exists already, ignore other options and use the existing values | false | if_not_exists=true |
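The interaction of start, min, max, step, and cycle can be sketched as a small plain-Lua model. This is not Tarantool's implementation, and `new_sequence` is a hypothetical helper. cache and if_not_exists are ignored. The assumption that cycling restarts from min when counting up, and from max when counting down, is this model's reading of "start again".

```lua
-- Simplified model of a sequence built from the options in the table above
local function new_sequence(opts)
    opts = opts or {}
    -- Defaults follow the table above
    local start = opts.start or 1
    local min = opts.min or 1
    local max = opts.max or 9223372036854775807
    local step = opts.step or 1
    local cycle = opts.cycle or false
    if start < min or start > max then
        error('start must be between min and max')
    end
    local current = nil  -- nothing generated yet
    local seq = {}
    function seq:next()
        if current == nil then
            current = start
            return current
        end
        local candidate = current + step
        if candidate > max or candidate < min then
            if not cycle then
                error('sequence has no more values')
            end
            -- This model restarts from min when counting up, from max when counting down
            if step > 0 then candidate = min else candidate = max end
        end
        current = candidate
        return current
    end
    return seq
end

-- Mirrors the id_seq example below: the first value is 1000, the next is 1001
local id_seq = new_sequence({ min = 1000, start = 1000 })
print(id_seq:next(), id_seq:next())  -- 1000 and 1001
```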
Once a sequence exists, it can be altered, dropped, reset, forced to generate
the next value, or associated with an index.
Associating a sequence with an index
First, create a sequence:
-- Create a sequence --
box.schema.sequence.create('id_seq',{min=1000, start=1000})
--[[
---
- step: 1
id: 1
min: 1000
cache: 0
uid: 1
cycle: false
name: id_seq
start: 1000
max: 9223372036854775807
...
--]]
The result shows that the new sequence has all default values,
except for the two that were specified, min and start.
Get the next value from the sequence by calling the next() function:
-- Get the next item --
box.sequence.id_seq:next()
--[[
---
- 1000
...
--]]
The result is the same as the start value. The next call increases the value
by one (the default sequence step).
Create a space and specify that its primary key should be
generated from the sequence:
-- Create a space --
box.schema.space.create('customers')
-- Create an index that uses the sequence --
box.space.customers:create_index('primary',{ sequence = 'id_seq' })
--[[
---
- parts:
- type: unsigned
is_nullable: false
fieldno: 1
sequence_id: 1
id: 0
space_id: 513
unique: true
hint: true
type: TREE
name: primary
sequence_fieldno: 1
...
--]]
Insert a tuple without specifying a value for the primary key:
-- Insert a tuple without the primary key value --
box.space.customers:insert{ nil, 'Adams' }
--[[
---
- [1001, 'Adams']
...
--]]
The result is a new tuple where the first field is assigned the next value from
the sequence. The value is 1001 rather than 1000 because the earlier next() call
has already consumed 1000. This arrangement, where the system automatically generates the
values for a primary key, is sometimes called “auto-incrementing”
or “identity”.
For syntax and implementation details, see the reference for
box.schema.sequence.
Migrations
Migration refers to any change in a data schema: adding or removing a field,
creating or dropping an index, changing a field format, and so on. Space creation
is also a migration. Using migrations, you can track the evolution of your
data schema since its initial state. In Tarantool, migrations are presented as Lua
code that alters the data schema using the built-in Lua API.
There are two types of migrations:
- simple migrations don’t require additional actions on existing data
- complex migrations include both schema and data changes
There are two types of schema migration that do not require data migration:
Creating an index. A new index can be created at any time. To learn more about
index creation, see Indexes and the space_object:create_index() reference.
Adding a field to the end of a space. To add a field, update the space format so
that it includes all its fields and also the new field. For example:
local users = box.space.writers
local fmt = users:format()
table.insert(fmt, { name = 'age', type = 'number', is_nullable = true })
users:format(fmt)
The new field must be nullable (is_nullable = true). Otherwise, an error occurs
if the space contains tuples in the old format.
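The reason for the is_nullable requirement can be shown with a toy check in plain Lua. It mimics the rule described above, not Tarantool's actual format validator, and does not check types. `fits`, `old_tuple`, and the two formats are hypothetical.

```lua
-- Does an existing tuple fit a space format? Missing fields are allowed
-- only if they are nullable (types are not checked in this sketch)
local function fits(tuple, format)
    for i, field in ipairs(format) do
        if tuple[i] == nil and not field.is_nullable then
            return false, 'field ' .. i .. ' (' .. field.name .. ') is missing'
        end
    end
    return true
end

-- A tuple stored before the new field was added
local old_tuple = { 1, 'Mark Twain' }

local with_nullable_age = {
    { name = 'id', type = 'unsigned' },
    { name = 'name', type = 'string' },
    { name = 'age', type = 'number', is_nullable = true },
}
local with_required_age = {
    { name = 'id', type = 'unsigned' },
    { name = 'name', type = 'string' },
    { name = 'age', type = 'number' },
}

print(fits(old_tuple, with_nullable_age))  -- true
print(fits(old_tuple, with_required_age))  -- false, plus a message about field 3
```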
Note
After creating a new field, you probably want to fill it with data.
The tarantool/moonwalker
module is useful for this task.
Other types of migrations are more complex and require additional actions to
maintain data consistency.
Migrations are possible in two cases:
- When Tarantool starts, and no client uses the database yet
- During request processing, when active clients are already using the database
For the first case, it is enough to write and test the migration code.
The most difficult task is to migrate data when there are active clients.
You should keep it in mind when you initially design the data schema.
We identify the following problems if there are active clients:
- Associated data must be changed atomically.
- The system should be able to transfer data using both the new schema and the old one.
- When data is being transferred to a new space, data access should consider
that the data might be in one space or another.
- Write requests must not interfere with the migration.
A common approach is to write according to the new data schema.
These issues may or may not be relevant depending on your application and
its availability requirements.
Tarantool offers the following features that make migrations easier and safer:
- Transaction mechanism. It is useful when writing a migration,
because it allows you to work with the data atomically. But before using
the transaction mechanism, you should explore its limitations.
For details, see the section about transactions.
- space:upgrade() function (EE only). With the help of space:upgrade(),
you can enable compression and migrate data, including tuples that already exist.
For details, check the Upgrading space schema section.
- Centralized migration management mechanism (EE only). Implemented
in the Enterprise version of the tt utility and in Tarantool Cluster Manager,
this mechanism enables migration execution and tracking in the replication
clusters. For details, see Centralized migration management.
The migration code is executed on a running Tarantool instance.
Important: no method guarantees you transactional application of migrations
on the whole cluster.
Method 1: include migrations in the application code
This is quite simple: when you reload the code, the data is migrated at the right moment,
and the database schema is updated.
However, this method may not work for everyone.
You may not be able to restart Tarantool or update the code using the hot-reload mechanism.
Method 2: the tt utility
Connect to the necessary instance using tt connect.
$ tt connect admin:password@localhost:3301
If your migration is written in a Lua file, you can execute it
using dofile(). Call this function and specify the path to the
migration file as the first argument. It looks like this:
Alternatively, copy the migration script code,
paste it into the console, and run it.
You can also connect to the instance and execute the migration script in a single call:
$ tt connect admin:password@localhost:3301 -f 0001-delete-space.lua
Centralized migration management
Enterprise Edition
Centralized migration management is available in the Enterprise Edition only.
Tarantool EE offers a mechanism for centralized migration management in replication
clusters that use etcd as a configuration storage.
The mechanism uses the same etcd storage to store migrations and applies them
across the entire Tarantool cluster. This ensures migration consistency
in the cluster and enables migration history tracking.
The centralized migration management mechanism is implemented in the Enterprise
version of the tt utility and in Tarantool Cluster Manager.
To learn how to manage migrations in Tarantool EE clusters from the command line,
see Centralized migrations with tt. To learn how to use the mechanism from the TCM
web interface, see the Performing migrations TCM documentation page.
Centralized migrations with tt
Example on GitHub: migrations
In this section, you learn to use the centralized migration management mechanism
implemented in the Enterprise Edition of the tt utility.
The section includes the following tutorials:
See also:
Basic tt migrations tutorial
Example on GitHub: migrations
In this tutorial, you learn to define the cluster data schema using the centralized
migration management mechanism implemented in the Enterprise Edition of the tt utility.
Before starting this tutorial:
The centralized migration mechanism works with Tarantool EE clusters that:
- use etcd as a centralized configuration storage
- use the CRUD module or its Enterprise
version for data distribution
First, start up an etcd instance to use as a configuration storage:
etcd runs on the default port 2379.
Optionally, enable etcd authentication by executing the following script:
#!/usr/bin/env bash
etcdctl user add root:topsecret
etcdctl role add app_config_manager
etcdctl role grant-permission app_config_manager --prefix=true readwrite /myapp/
etcdctl user add app_user:config_pass
etcdctl user grant-role app_user app_config_manager
etcdctl auth enable
It creates an etcd user app_user with read and write permissions to the /myapp
prefix, in which the cluster configuration will be stored. The user’s password is config_pass.
Note
If you don’t enable etcd authentication, make tt migrations calls without
the configuration storage credentials.
Initialize a tt environment:
$ tt init
In the instances.enabled directory, create the myapp directory.
Go to the instances.enabled/myapp directory and create the application files:
instances.yml:
router-001-a:
storage-001-a:
storage-001-b:
storage-002-a:
storage-002-b:
config.yaml:
config:
  etcd:
    endpoints:
    - http://localhost:2379
    prefix: /myapp/
    username: app_user
    password: config_pass
    http:
      request:
        timeout: 3
myapp-scm-1.rockspec:
package = 'myapp'
version = 'scm-1'
source = {
url = '/dev/null',
}
dependencies = {
'crud == 1.5.2',
}
build = {
type = 'none';
}
Create the source.yaml file with the cluster configuration to publish to etcd:
Note
This configuration describes a typical CRUD-enabled sharded cluster with
one router and two storage replica sets, each including one master and one read-only replica.
credentials:
  users:
    client:
      password: 'secret'
      roles: [super]
    replicator:
      password: 'secret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
sharding:
  bucket_count: 3000
groups:
  routers:
    sharding:
      roles: [router]
    roles: [roles.crud-router]
    replicasets:
      router-001:
        instances:
          router-001-a:
            iproto:
              listen:
              - uri: localhost:3301
              advertise:
                client: localhost:3301
  storages:
    sharding:
      roles: [storage]
    roles: [roles.crud-storage]
    replication:
      failover: manual
    replicasets:
      storage-001:
        leader: storage-001-a
        instances:
          storage-001-a:
            iproto:
              listen:
              - uri: localhost:3302
              advertise:
                client: localhost:3302
          storage-001-b:
            iproto:
              listen:
              - uri: localhost:3303
              advertise:
                client: localhost:3303
      storage-002:
        leader: storage-002-a
        instances:
          storage-002-a:
            iproto:
              listen:
              - uri: localhost:3304
              advertise:
                client: localhost:3304
          storage-002-b:
            iproto:
              listen:
              - uri: localhost:3305
              advertise:
                client: localhost:3305
Publish the configuration to etcd:
$ tt cluster publish "http://app_user:config_pass@localhost:2379/myapp/" source.yaml
The full cluster code is available on GitHub here: migrations.
Building and starting the cluster
Build the application:
$ tt build myapp
Start the cluster:
$ tt start myapp
To check that the cluster is up and running, use tt status:
$ tt status myapp
Bootstrap vshard in the cluster:
$ tt replicaset vshard bootstrap myapp
To perform migrations in the cluster, write them in Lua and publish them to the cluster’s
etcd configuration storage.
Each migration file must return a Lua table with one object named apply.
This object has one field, scenario, that stores the migration function:
local function apply_scenario()
-- migration code
end
return {
apply = {
scenario = apply_scenario,
},
}
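You may want to check that a file really returns this shape before you publish it. The sketch below makes some assumptions. check_migration is a hypothetical helper, not part of tt or Tarantool. It loads a migration from a source string and looks for an apply.scenario function, but it does not run the scenario. The file's top-level code still runs, for example a require('tt-migrations.helpers') call. For that reason, the check works only where such modules are available.

```lua
-- Hypothetical helper, not part of tt: checks that a migration chunk returns
-- a table of the form { apply = { scenario = <function> } }.
-- Top-level code of the chunk runs; the scenario function itself does not.
local function check_migration(source, chunk_name)
    -- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
    local loader = loadstring or load
    local chunk, err = loader(source, chunk_name)
    if chunk == nil then
        return false, 'syntax error: ' .. tostring(err)
    end
    local ok, migration = pcall(chunk)
    if not ok then
        return false, 'error while loading: ' .. tostring(migration)
    end
    if type(migration) ~= 'table' or type(migration.apply) ~= 'table' then
        return false, 'the file must return a table with an "apply" object'
    end
    if type(migration.apply.scenario) ~= 'function' then
        return false, '"apply.scenario" must be a function'
    end
    return true
end
```

The helper reports a problem with a `false, message` pair instead of raising an error, so it can be called in a loop over all files in migrations/scenario.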
The migration unit is a single file: its scenario is executed as a whole. An error
that happens in any step of the scenario causes the entire migration to fail.
Migrations are executed in lexicographical order. Thus, it’s convenient to start
file names with ordered numbers that define the migration order, for example:
000001_create_space.lua
000002_create_index.lua
000003_alter_space.lua
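The ordering rule can be shown in plain Lua. The sketch below is not tt's implementation. It only sorts file names with Lua's default string comparison to show why the prefixes are zero-padded. Without padding, 10_... sorts before 2_..., because the character 1 precedes 2.

```lua
-- Illustration only: order migration file names lexicographically,
-- the way a plain string comparison does.
local function migration_order(filenames)
    local sorted = {}
    for i, name in ipairs(filenames) do
        sorted[i] = name
    end
    table.sort(sorted) -- default `<` on strings: lexicographical order
    return sorted
end

-- Zero-padded prefixes keep the numeric and lexicographical orders equal.
local padded = migration_order({
    '000003_alter_space.lua',
    '000001_create_space.lua',
    '000002_create_index.lua',
})

-- Without padding, '10_...' sorts before '2_...' because '1' < '2'.
local unpadded = migration_order({'2_create_index.lua', '10_alter_space.lua'})
```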
The default location where tt searches for migration files is /migrations/scenario.
Create this subdirectory inside the tt environment. Then, create two migration files:
000001_create_writers_space.lua: create a space, define its format, and
create a primary index.
local helpers = require('tt-migrations.helpers')
local function apply_scenario()
local space = box.schema.space.create('writers')
space:format({
{name = 'id', type = 'number'},
{name = 'bucket_id', type = 'number'},
{name = 'name', type = 'string'},
{name = 'age', type = 'number'},
})
space:create_index('primary', {parts = {'id'}})
space:create_index('bucket_id', {parts = {'bucket_id'}})
helpers.register_sharding_key('writers', {'id'})
end
return {
apply = {
scenario = apply_scenario,
},
}
Note
Note the usage of the tt-migrations.helpers module.
In this example, its function register_sharding_key is used
to define a sharding key for the space.
000002_create_writers_index.lua: add one more index.
local function apply_scenario()
local space = box.space['writers']
space:create_index('age', {parts = {'age'}})
end
return {
apply = {
scenario = apply_scenario,
},
}
To publish migrations to the etcd configuration storage, run tt migrations publish:
$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp"
• 000001_create_writers_space.lua: successfully published to key "000001_create_writers_space.lua"
• 000002_create_writers_index.lua: successfully published to key "000002_create_writers_index.lua"
To apply published migrations to the cluster, run tt migrations apply providing
a cluster user’s credentials:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret
Important
The cluster user must have enough access privileges to execute the migrations code.
The output should look as follows:
• router-001:
• 000001_create_writers_space.lua: successfully applied
• 000002_create_writers_index.lua: successfully applied
• storage-001:
• 000001_create_writers_space.lua: successfully applied
• 000002_create_writers_index.lua: successfully applied
• storage-002:
• 000001_create_writers_space.lua: successfully applied
• 000002_create_writers_index.lua: successfully applied
The migrations are applied on all replica set leaders. Read-only replicas
receive the changes from the corresponding replica set leaders.
Check the migrations status with tt migrations status:
$ tt migrations status "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret
• migrations centralized storage scenarios:
• 000001_create_writers_space.lua
• 000002_create_writers_index.lua
• migrations apply status on Tarantool cluster:
• router-001:
• 000001_create_writers_space.lua: APPLIED
• 000002_create_writers_index.lua: APPLIED
• storage-001:
• 000001_create_writers_space.lua: APPLIED
• 000002_create_writers_index.lua: APPLIED
• storage-002:
• 000001_create_writers_space.lua: APPLIED
• 000002_create_writers_index.lua: APPLIED
To make sure that the space and indexes are created in the cluster, connect to the router
instance and retrieve the space information:
$ tt connect myapp:router-001-a
Data migrations with space.upgrade()
Example on GitHub: migrations
In this tutorial, you learn to write migrations that include data migration using
the space.upgrade() function.
Before starting this tutorial, complete the Basic tt migrations tutorial.
As a result, you have a sharded Tarantool EE cluster that uses an etcd-based configuration
storage. The cluster has a space with two indexes.
Writing a complex migration
Complex migrations require data migration along with schema migration. Connect to
the router instance and insert some tuples into the space before proceeding to the next steps.
$ tt connect myapp:router-001-a
The next migration changes the space format incompatibly: instead of one name
field, the new format includes two fields, first_name and last_name.
To apply this migration, you need to change each tuple’s structure while preserving the stored
data. The space:upgrade() function helps with this task.
Create a new file 000003_alter_writers_space.lua in /migrations/scenario.
Prepare its initial structure the same way as in previous migrations:
local function apply_scenario()
-- migration code
end
return {
apply = {
scenario = apply_scenario,
},
}
Start the migration function with the new format description:
local function apply_scenario()
local space = box.space['writers']
local new_format = {
{name = 'id', type = 'number'},
{name = 'bucket_id', type = 'number'},
{name = 'first_name', type = 'string'},
{name = 'last_name', type = 'string'},
{name = 'age', type = 'number'},
}
box.space.writers.index.age:drop()
Note
box.space.writers.index.age:drop() drops an existing index. This is done
because indexes rely on field numbers and may break during this format change.
If you need the age field indexed, recreate the index after applying the
new format.
Next, create a stored function that transforms tuples to fit the new format.
In this case, the function extracts the first and the last name from the name field
and returns a tuple of the new format:
box.schema.func.create('_writers_split_name', {
language = 'lua',
is_deterministic = true,
body = [[
function(t)
local name = t[3]
local split_data = {}
local split_regex = '([^%s]+)'
for v in string.gmatch(name, split_regex) do
table.insert(split_data, v)
end
local first_name = split_data[1]
assert(first_name ~= nil)
local last_name = split_data[2]
assert(last_name ~= nil)
return {t[1], t[2], first_name, last_name, t[4]}
end
]],
})
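As written, this function is not idempotent. Applied to a tuple that is already in the new format, t[3] holds only the first name, so split_data[2] is nil and the assert fails. The upgrade function requirements in Upgrading space schema ask for f(f(t)) = f(t). Below is a plain-Lua sketch of a variant that meets this requirement. It assumes that converted tuples can be recognized by their field count: five fields in the new format, four in the old one. Lua tables stand in for tuples.

```lua
-- Sketch of an idempotent variant of the '_writers_split_name' body.
-- Assumption: tuples already in the new format have 5 fields
-- (id, bucket_id, first_name, last_name, age); old ones have 4.
local function writers_split_name(t)
    if #t >= 5 then
        -- Already converted: return unchanged so that f(f(t)) == f(t).
        return t
    end
    local split_data = {}
    for v in string.gmatch(t[3], '([^%s]+)') do
        table.insert(split_data, v)
    end
    local first_name = split_data[1]
    local last_name = split_data[2]
    assert(first_name ~= nil and last_name ~= nil,
           'name must contain a first and a last name')
    return {t[1], t[2], first_name, last_name, t[4]}
end

local old = {1, 477, 'Haruki Murakami', 75}
local once = writers_split_name(old)
local twice = writers_split_name(once)
```

In the migration itself, the same guard (`if #t >= 5 then return t end`) would go at the top of the body string passed to box.schema.func.create. The length operator works on Tarantool tuples as it does on Lua tables.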
Finally, call space:upgrade() with the new format and the transformation function
as its arguments. Here is the complete migration code:
local function apply_scenario()
local space = box.space['writers']
local new_format = {
{name = 'id', type = 'number'},
{name = 'bucket_id', type = 'number'},
{name = 'first_name', type = 'string'},
{name = 'last_name', type = 'string'},
{name = 'age', type = 'number'},
}
box.space.writers.index.age:drop()
box.schema.func.create('_writers_split_name', {
language = 'lua',
is_deterministic = true,
body = [[
function(t)
local name = t[3]
local split_data = {}
local split_regex = '([^%s]+)'
for v in string.gmatch(name, split_regex) do
table.insert(split_data, v)
end
local first_name = split_data[1]
assert(first_name ~= nil)
local last_name = split_data[2]
assert(last_name ~= nil)
return {t[1], t[2], first_name, last_name, t[4]}
end
]],
})
local future = space:upgrade({
func = '_writers_split_name',
format = new_format,
})
future:wait()
end
return {
apply = {
scenario = apply_scenario,
},
}
Learn more about space.upgrade() in Upgrading space schema.
Publish the new migration to etcd:
$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
migrations/scenario/000003_alter_writers_space.lua
Note
You can also publish all migrations from the default location /migrations/scenario.
All other migrations stored in this directory are already published, so tt
skips them.
$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp"
Apply the published migrations:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret
Connect to the router instance and check that the space and its tuples have the new format:
$ tt connect myapp:router-001-a
Learn to use migrations for data schema definition on new instances added to the cluster
in Extending the cluster.
Extending the cluster
Example on GitHub: migrations
In this tutorial, you learn how to consistently define the data schema on newly
added cluster instances using the centralized migration management mechanism.
Having all migrations in a centralized etcd storage, you can extend the cluster
and consistently define the data schema on new instances on the fly.
Add one more storage replica set to the cluster. To do this, edit the cluster files in instances.enabled/myapp:
Publish the new cluster configuration to etcd:
$ tt cluster publish "http://app_user:config_pass@localhost:2379/myapp/" source.yaml
Run tt start to start up the new instances:
$ tt start myapp
• The instance myapp:router-001-a (PID = 61631) is already running.
• The instance myapp:storage-001-a (PID = 61632) is already running.
• The instance myapp:storage-001-b (PID = 61634) is already running.
• The instance myapp:storage-002-a (PID = 61639) is already running.
• The instance myapp:storage-002-b (PID = 61640) is already running.
• Starting an instance [myapp:storage-003-a]...
• Starting an instance [myapp:storage-003-b]...
Now the cluster contains three storage replica sets.
Applying migrations to the new replica set
The new replica set, storage-003, has just started and has no data schema yet.
Apply all stored migrations to the cluster to load the same data schema to the new replica set:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret \
--replicaset=storage-003
Note
You can also apply migrations without specifying the replica set. All published
migrations are already applied on other replica sets, so tt skips the
operation on them.
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret
To make sure that the space exists on the new instances, connect to storage-003-a
and check box.space.writers:
$ tt connect myapp:storage-003-a
Troubleshooting migrations
The centralized migrations mechanism allows troubleshooting migration issues using
dedicated tt migrations options. When troubleshooting migrations, remember that
any unfinished or failed migration can leave the data schema in an inconsistent state.
Additional steps may be needed to fix this.
Warning
The options used for migration troubleshooting can cause migration inconsistency
in the cluster. Use them only for local development and testing purposes.
Incorrect migration published
If an incorrect migration was published to etcd but wasn’t applied yet,
fix the migration file and publish it again with the --overwrite option:
$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
000001_create_space.lua --overwrite
If the migration that needs a fix isn’t the last one in lexicographical order,
also add the --ignore-order-violation option:
$ tt migrations publish "http://app_user:config_pass@localhost:2379/myapp" \
000001_create_space.lua --overwrite --ignore-order-violation
If a migration was published by mistake and wasn’t applied yet, you can delete it
from etcd using tt migrations remove:
$ tt migrations remove "http://app_user:config_pass@localhost:2379/myapp" \
--migration 000003_not_needed.lua
Incorrect migration applied
Warning
Any schema change that an incorrect migration made before it failed or was
canceled must be reverted manually on each replica set before the migration is reapplied.
--force-reapply and other tt migrations options affect only the internal
status of the migration and don’t revert the changes it has made in the cluster.
If the migration is already applied, publish the fixed version and apply it with
the --force-reapply option:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret \
--force-reapply
If execution of the incorrect migration version has failed, you may also need
the --ignore-preceding-status option. When you reapply a migration, tt checks the
statuses of preceding migrations to ensure consistency, and this option skips that check:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret \
--migration=000003_alter_space.lua \
--force-reapply --ignore-preceding-status
Migration execution takes too long
To interrupt migration execution on the cluster, use tt migrations stop:
$ tt migrations stop "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret
You can adjust the maximum migration execution time using the --execution-timeout
option of tt migrations apply:
$ tt migrations apply "http://app_user:config_pass@localhost:2379/myapp" \
--tarantool-username=client --tarantool-password=secret \
--execution-timeout=60
Note
If a migration timeout is reached, you may need to call tt migrations stop
to cancel requests that were sent when applying migrations.
Upgrading space schema
In Tarantool, migration refers to any change in a data schema, for example,
creating an index, adding a field, or changing a field format.
If you need to change a data schema, there are two possible cases:
- Schema migration does not require data migration, for example, when you add a field
with the is_nullable parameter to the end of the space format or create an index.
- Schema migration requires data migration, for example, when you have to iterate
over the entire space to convert columns to a new format or to remove a column completely.
To solve the task of migrating the data, you can:
- Migrate data to a new space manually.
- Use the space:upgrade() feature.
The space:upgrade() feature allows users to upgrade the format of a space and the tuples stored in it without
blocking the database.
How to apply space upgrade
First, specify an upgrade function, that is, a function that converts the tuples in the space to the new format.
The function must meet the following requirements:
- The upgrade function takes two arguments. The first argument is the tuple to be upgraded.
The second one is optional and contains additional information stored in a plain Lua object.
If omitted, the second argument is nil.
- The function returns a new tuple or a Lua table. For example, it can add a new field to the tuple.
The new tuple must conform to the new space format set by the upgrade operation.
- The function should be registered with box.schema.func.create.
It should also be stored, deterministic, and written in Lua.
- The function should not change the primary key of the tuple.
- The function should be idempotent: f(f(t)) = f(t). This is necessary because the function
is applied to all tuples returned to the user, and some of them may have already been upgraded in the background.
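Two of these requirements, idempotence and an unchanged primary key, can be checked on sample data before any upgrade starts. The sketch below is a hypothetical helper, not a Tarantool API. It uses plain Lua tables as tuples and assumes a single-part primary key in field pk_field. It does not validate the result against the new format, which a real dry run also does.

```lua
-- Hypothetical helper: checks two documented requirements of an upgrade
-- function f on sample tuples (plain Lua tables):
--   1. idempotence: f(f(t)) equals f(t), field by field;
--   2. the primary key field (pk_field, assumed single-part) is unchanged.
-- An error raised by f(t) itself is not caught and propagates to the caller.
local function check_upgrade_func(f, samples, pk_field)
    for i, t in ipairs(samples) do
        local once = f(t)
        local ok, twice = pcall(f, once)
        if not ok then
            return false, ('sample %d: f(f(t)) raised an error: %s'):format(i, tostring(twice))
        end
        if #once ~= #twice then
            return false, ('sample %d: not idempotent'):format(i)
        end
        for k = 1, #once do
            if once[k] ~= twice[k] then
                return false, ('sample %d: not idempotent'):format(i)
            end
        end
        if once[pk_field] ~= t[pk_field] then
            return false, ('sample %d: primary key changed'):format(i)
        end
    end
    return true
end
```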
Then define a new space format. This step is optional.
However, it could be useful if, for example, you want to add a new column with data.
For details, check the Usage Example section.
The next optional step is to choose an upgrade mode.
There are three modes: upgrade, dryrun, and dryrun+upgrade.
The default value is upgrade.
To check an upgrade function without applying any changes, choose the dryrun mode.
To run a space upgrade without testing the function, pick the upgrade mode.
If you want to apply both the test and the actual upgrade, use the dryrun+upgrade mode.
For details, see the Upgrade Modes section.
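The sketch below shows a helper that relies on the dryrun+upgrade mode, so a failed dry run prevents the actual upgrade. The helper name upgrade_checked and its return values are hypothetical. space is a space object such as box.space.writers. func_name must refer to a function already registered with box.schema.func.create. The helper keeps a reference to the future object because a dry run is canceled if its future is garbage collected. For the same reason, it returns the future on timeout.

```lua
-- Hypothetical helper: dry-run the upgrade function and, if the dry run
-- succeeds, perform the actual upgrade (mode 'dryrun+upgrade').
-- Returns true on success, or false plus an error message.
local function upgrade_checked(space, func_name, new_format, timeout)
    local future = space:upgrade({
        func = func_name,
        format = new_format,   -- nil keeps the current space format
        mode = 'dryrun+upgrade',
        is_async = true,
    })
    -- wait() returns false on timeout. The operation keeps running, so the
    -- future is handed back to the caller, who must keep a reference to it.
    if not future:wait(timeout) then
        return false, 'timed out', future
    end
    -- The operation is complete: its status is either 'done' or 'error'.
    if future.status ~= 'done' then
        return false, future.error
    end
    return true
end
```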
The user defines an upgrade function.
Each tuple of the chosen space is passed through the function.
The function converts the tuple from the old format to a new one.
The function is applied to all tuples stored in the space in the background.
Besides, the function is applied to all tuples returned to the user via the box API (for example, select, get).
Therefore, it appears that the space upgrades instantly.
Keep in mind that space:upgrade() differs from space_object:format() in the following ways:
| Difference | space:upgrade() | space:format() |
| Non-blocking | Yes. It returns tuples in the new format, whether or not they have already been converted. | Yes. |
| Set a format incompatible with the current one | Yes. Works for non-indexed field types only. | No, it can only expand the format in a compatible way. |
| Visibility of changes | Immediately. All changes are visible and replicated immediately. New data should conform to the new format immediately after the call. | After data validation. Data validation starts in the background and does not block the database. Inserting data incompatible with the new format is allowed before validation is completed; in this case, space.format fails. |
| Cancel (error/restart) | Writes the state to the system table. Restart: the operation continues. Error: the operation should be restarted manually; any other attempt to change the table fails. | Leaves no traces. |
| Set the upgrade function | Yes. The upgrade may take a while to traverse the space and transform tuples. | No. |
Note
At the moment, the feature is not supported for vinyl spaces.
The space:upgrade() method is added to the space object:
space:upgrade({func[, arg, format, mode, is_async]})
| Parameters: |
- func (string/integer) – upgrade function name (string) or ID (integer). For details, see the
upgrade function requirements section.
- arg – additional information passed to the upgrade function in the second argument.
The option accepts any Lua value that can be encoded in MsgPack, which means that
msgpack.encode(arg) should succeed. For example, one can pass a scalar or a Lua table.
The default value is nil.
- format (map) – new space format. The requirements for this are the same as for any other
space:format(). If the field is omitted, the space format remains the same as before the upgrade.
- mode (string) – upgrade mode. Possible values: upgrade, dryrun,
dryrun+upgrade. The default value is upgrade.
- is_async (boolean) – if true, the method returns immediately without waiting for the upgrade
operation to complete. The default value is false: the method blocks
until the upgrade operation is finished.
|
| Return: | object describing the status of the operation (also known as future).
The methods of the object are described below.
|
object future_object
future_object:info()
Shows information about the state of the upgrade operation.
| Return: | a table with the following fields:
- dryrun (boolean) – dry run mode flag. Possible values:
true for a dry run, nil for an actual upgrade.
- status (string) – upgrade status. Possible values:
inprogress, waitrw, error, replica, done.
- func (string/integer) – name of the upgrade function.
It is the same as passed to the space:upgrade method.
The field is nil if the status is done.
- arg – additional information passed to the upgrade function.
It is the same as for the space:upgrade method.
The field is nil if it is omitted in space:upgrade.
- owner (string) – UUID of the instance running the upgrade
(see box.info.uuid). The field is nil if the status is done.
- error (string) – error message if the status is error, otherwise nil.
- progress (string) – completion percentage if the status is inprogress or waitrw,
otherwise nil.
|
| Rtype: | table
|
The fields can also be accessed directly, without calling the info() method.
For example, future.status is the same as future:info().status.
future_object:wait([timeout])
Waits until the upgrade operation is completed or a timeout occurs.
An operation is considered completed if its status is done or error.
| Parameters: |
- timeout (double) – maximum time to wait, in seconds. If the timeout argument is omitted,
the method waits as long as it takes.
|
| Return: | true if the operation has been completed, false on timeout
|
| Rtype: | boolean
|
future_object:cancel()
Cancels the upgrade operation if it is currently running. Otherwise, an exception is thrown.
A canceled upgrade operation completes with an error.
Running space:upgrade() with is_async = false or without the is_async field is equivalent to:
local future = space:upgrade({func = 'my_func', is_async = true})
future:wait()
return future
If called without arguments, space:upgrade() returns a future object for the active upgrade operation.
If there is none, it returns nil.
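For example, a hypothetical helper can turn the result of space:upgrade() without arguments into a one-line summary. It reads the documented fields of the future object through info(). The helper name and the message texts are illustrative only.

```lua
-- Hypothetical helper: one-line summary of the active upgrade of `space`.
-- space:upgrade() with no arguments returns the future of the active
-- upgrade operation, or nil if there is none.
local function upgrade_summary(space)
    local future = space:upgrade()
    if future == nil then
        return 'no active upgrade'
    end
    local info = future:info()
    if info.status == 'inprogress' or info.status == 'waitrw' then
        return ('%s, progress: %s'):format(info.status, tostring(info.progress))
    elseif info.status == 'error' then
        return 'error: ' .. tostring(info.error)
    elseif info.status == 'replica' then
        return 'running or failed on instance ' .. tostring(info.owner)
    end
    return info.status
end
```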
There are three upgrade modes: dryrun, dryrun+upgrade, and upgrade.
Regardless of the mode selected, the upgrade does not block execution.
Once in a while, the background fiber commits the upgraded tuples and yields.
Calling space:upgrade() without arguments always returns the current state of the space upgrade,
never the state of a dry run. If only a dry run is running in the background, space:upgrade() returns nil.
Unlike an actual space upgrade, the future object returned by a dry run can’t be recovered if it is lost.
So a dry run is aborted if its future object is garbage collected.
Warning
In dryrun+upgrade mode: if the future object is garbage collected by Lua
before the end of the dry run and the start of the upgrade,
then the dry run will be canceled, and no upgrade will be started.
Upgrade modes:
upgrade mode: the background fiber iterates over the
space, applies the upgrade function, checks that obtained tuples fit the new space format,
and updates the tuples. This mode prevents the space from being altered.
The mode can only be performed on the master instance.
dryrun mode: the dry-run mode is used to check the upgrade function. The mode does not apply any changes
to the target space. It starts a background fiber. The fiber:
- Iterates over the target space.
- Attempts to apply the upgrade function to each tuple stored in the space.
- Checks if the returned tuple matches the new format.
- Checks if the function is idempotent.
- Checks that the function does not modify the primary key.
For details, see the upgrade function requirements section.
To start a dry run, pass mode='dryrun' to the space:upgrade method.
In this case, the future object has the dryrun field set to true.
The possible statuses are inprogress, error, and done; the replica and waitrw states are never set
for a dry run future object.
The dryrun mode is not persisted. Restarting the instance does not restart a dry run.
A dry run only works on the original instance, never on replicas.
Unlike a real upgrade, a dry run does not prevent the space from being altered.
The space can even be dropped. In this case, the dry run will complete with an error.
dryrun+upgrade mode: it starts a dry run, which, if completed successfully, triggers an actual upgrade.
The future object returned by space:upgrade remains valid throughout the process.
It starts as the future object of the dry run. Then, under the hood, it is converted into an upgrade future object.
Waiting on it would wait for both the dry run and the upgrade to complete.
During the dry run, the future object has the dryrun field set to true.
When the actual upgrade starts, the dryrun field is set to nil.
The mode can only be performed on the master instance.
An upgrade operation has one of the following upgrade states:
inprogress – the upgrade operation is running in the background.
The function is applied to all tuples returned to the user.
waitrw – the instance was switched to the read-only mode
(for example, by using box.cfg.read_only), so the upgrade couldn’t proceed.
The upgrade process will resume as soon as the instance switches back to read-write mode.
Nevertheless, the upgrade function is applied to all tuples returned to the user.
error – the upgrade operation failed with an error. See the error field for the error message
and the log for the tuple that caused the error. No alter operation is allowed, except for another upgrade
intended to fix the problem.
Nevertheless, the upgrade function is applied to all tuples returned to the user. The space is writable.
done – the upgrade operation is successfully completed. The upgrade function is not applied to tuples returned
to the user anymore. The function can be deleted.
replica – the upgrade operation is either running or completed with an error on another instance.
See the owner field for the UUID of the instance running the upgrade.
Nevertheless, the upgrade function is applied to all tuples returned to the user.
While a space upgrade is in progress, the space can’t be altered or dropped.
The attempt to do that will throw an exception.
Restarting an upgrade is allowed if the current upgrade was canceled or completed with an error,
that is, if the upgrade operation is in the error state.
If a space upgrade was canceled or failed with an error, the space can’t be altered or dropped.
The only option is to restart the upgrade using a different upgrade function or format.
Interaction with recovery
The space upgrade state is persisted. It is stored in the _space system table. If an instance with
a space upgrade in progress (inprogress state) is shut down, it restarts the space upgrade after recovery.
If a space upgrade fails (switches to the error state), it remains in the error state after recovery.
Interaction with replication
The changes made to a space by a space upgrade are replicated.
Just as on the instance where the upgrade is performed, the upgrade function is applied to all tuples returned
to the user on the replicas. However, the upgrade operation is not performed on the replicas in the background.
The replicas wait for the upgrade operation to complete on the master.
They can’t alter or drop the space. Normally, they can’t cancel or restart the upgrade operation either.
There is an emergency exception for the case when the master is permanently lost:
a space upgrade that started on another instance can be restarted
if the upgrade owner UUID (see the owner field) has been deleted
from the _cluster system table.
Note
Except in dryrun mode, the upgrade can only be performed on the master.
If the instance is no longer the master, the upgrade is suspended until the instance is master again.
Restarting the upgrade on a new master works only if the old one has been removed from the replica set
(_cluster system space).
Suppose the space test has two columns: id (unsigned) and data (string).
The example shows how to upgrade the schema and add another column to the space using space:upgrade().
The new column contains the id values converted to string. Each step takes a while.
The test space is generated with the following script:
local log = require('log')
box.cfg{
checkpoint_count = 1,
memtx_memory = 5 * 1024 * 1024 * 1024,
}
box.schema.space.create('test')
box.space.test:format{
{name = 'id', type = 'unsigned'},
{name = 'data', type = 'string'},
}
box.space.test:create_index('pk')
local count = 20 * 1000 * 1000
local progress = 0
box.begin()
for i = 1, count do
box.space.test:insert{i, 'data' .. i}
if i % 1000 == 0 then
box.commit()
local p = math.floor(i / count * 100)
if progress ~= p then
progress = p
log.info('Generating test data set... %d%% done', p)
end
box.begin()
end
end
box.commit()
box.snapshot()
os.exit(0)
To upgrade the space, connect to the server and then run the commands below:
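The exact commands are not reproduced in this section. The sketch below shows what they could look like, based on the API described above. The function name add_id_string, the field name id_string, and the use of if_not_exists are assumptions, not taken from the original example. The upgrade function returns tuples that already have three fields unchanged, so it stays idempotent.

```lua
-- Body of the upgrade function (hypothetical name 'add_id_string'):
-- appends the id converted to a string as a third field.
local ADD_ID_STRING_BODY = [[
function(t)
    if #t >= 3 then
        -- Already in the new format: keep the function idempotent.
        return t
    end
    return {t[1], t[2], tostring(t[1])}
end
]]

local NEW_FORMAT = {
    {name = 'id', type = 'unsigned'},
    {name = 'data', type = 'string'},
    {name = 'id_string', type = 'string'},  -- hypothetical field name
}

-- Registers the function and starts a non-blocking upgrade of box.space.test.
local function start_test_upgrade()
    box.schema.func.create('add_id_string', {
        language = 'lua',
        is_deterministic = true,
        body = ADD_ID_STRING_BODY,
        if_not_exists = true,
    })
    return box.space.test:upgrade({
        func = 'add_id_string',
        format = NEW_FORMAT,
        is_async = true,
    })
end
```

Keep the returned future, for example `future = start_test_upgrade()`, so that you can check the state later with future:info().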
While the upgrade is in progress, you can track the state of the upgrade.
To check the status, connect to Tarantool from another console and run the following commands:
Even though the upgrade is only 8% complete, selecting the data from the space returns the converted tuples:
Note
The tuples contain the new field even though the space upgrade is still running.
Wait for the space upgrade to complete using the command below:
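In the simplest case, this is a call to future:wait() on the future object returned by space:upgrade(). To see the progress while waiting, a hypothetical helper can call wait() with a timeout in a loop. The helper name, the step length, and the report callback (print by default) are assumptions for illustration.

```lua
-- Hypothetical helper: waits for an upgrade future in steps of `step` seconds
-- and reports the status and progress after each step that ends without
-- completion. Returns true if the upgrade is done, or false and the error.
local function wait_with_progress(future, step, report)
    report = report or print
    while not future:wait(step) do
        report(('upgrade %s: %s'):format(tostring(future.status), tostring(future.progress)))
    end
    -- wait() returned true: the status is now 'done' or 'error'.
    return future.status == 'done', future.error
end
```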
Read views
A read view is an in-memory snapshot of the entire database that isn’t
affected by future data modifications.
Read views provide access to database spaces and their indexes and enable you to
retrieve data using the same select and pairs operations.
Read views can be used to make complex analytical queries.
This reduces the load on the main database and improves RPS for a single Tarantool instance.
To reduce memory consumption and improve performance,
Tarantool creates read views using the copy-on-write technique.
In this case, duplication of the entire data set is not required:
Tarantool duplicates only blocks modified after a read view is created.
Note
Tarantool Enterprise Edition supports read views starting from v2.11.0 and lets you
work with them using both the Lua and C APIs.
Read views have the following limitations:
To create a read view, call the box.read_view.open() function.
The snippet below shows how to create a read view with the read_view1 name.
After creating a read view, you can see the information about it by calling
read_view_object:info().
To list all the created read views, call the box.read_view.list() function.
After creating a read view, you can access database spaces using the
read_view_object.space field.
This field provides access to a space object that exposes the select, get, and pairs methods
with the same behavior as the corresponding box.space methods.
The example below shows how to select 4 records from the bands space:
Similarly, you can retrieve data by the specific index.
Pagination is supported in read views in the same ways as in select requests
to spaces: using the fetch_pos and after arguments. To get the cursor position
after executing a request on a read view, set fetch_pos to true:
Then, pass this position in the after parameter of a request to get the
next data chunk:
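Together, the two requests form a loop that reads a whole space of a read view page by page. This sketch makes some assumptions. read_all_pages is a hypothetical helper, and rv_space stands for an object such as rv.space.bands. The loop stops on a page shorter than page_size instead of relying on the position returned for an empty page.

```lua
-- Hypothetical helper: collects all tuples of `rv_space` (for example,
-- rv.space.bands) page by page, `page_size` tuples at a time, using the
-- fetch_pos and after options of select.
local function read_all_pages(rv_space, page_size)
    local result = {}
    local pos = nil
    while true do
        local tuples, new_pos = rv_space:select({}, {
            limit = page_size,
            fetch_pos = true,
            after = pos,          -- nil on the first request
        })
        for _, t in ipairs(tuples) do
            table.insert(result, t)
        end
        if #tuples < page_size then
            break                 -- last (possibly empty) page
        end
        pos = new_pos
    end
    return result
end
```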
Because a read view may consume a substantial amount of memory, close it with the
read_view_object:close() method when it is no longer needed.
Otherwise, a read view is closed implicitly when the read view object is collected by the Lua garbage collector.
After the read view is closed, its status is set to closed, and any attempt to use it raises an error.
The steps below demonstrate how to open a read view,
get data from this view, and close it.
To repeat these steps, you need to bootstrap a Tarantool instance
as described in Using data operations
(you can skip creating secondary indexes).
1. Insert test data.
2. Create a read view by calling the open function.
3. Make sure that the read view status is open.
4. Change data in the database using the delete and update operations.
5. Query the read view to make sure it contains a snapshot of the data from before the database was updated.
6. Close the read view.
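The six steps above can be sketched as one Lua script. This is an illustration under assumptions, not the official session: it requires Tarantool Enterprise Edition 2.11 or later, uses a hypothetical two-field bands space instead of the one from Using data operations, and copies the band names out of the read view before closing it.

```lua
-- Assumes Tarantool EE 2.11+; box.read_view is EE-only.
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL

local bands = box.schema.space.create('bands', {
    format = {
        {name = 'id', type = 'unsigned'},
        {name = 'band_name', type = 'string'},
    },
})
bands:create_index('primary', {parts = {'id'}})

-- 1. Insert test data (hypothetical rows).
bands:insert{1, 'Roxette'}
bands:insert{2, 'Scorpions'}
bands:insert{3, 'Ace of Base'}

-- 2-3. Create a read view and check its status.
local read_view1 = box.read_view.open({name = 'read_view1'})
local status_before = read_view1.status

-- 4. Change the live data.
bands:delete{1}
bands:update({2}, {{'=', 2, 'Scorpions (updated)'}})

-- 5. The live space sees the changes; the read view does not.
local live_count = #bands:select({})
local snapshot_names = {}
for _, tuple in ipairs(read_view1.space.bands:select({})) do
    table.insert(snapshot_names, tuple[2])
end

-- 6. Close the read view; any later use should raise an error.
read_view1:close()
local ok = pcall(function()
    return read_view1.space.bands:select({})
end)
print(status_before, read_view1.status, ok)
```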
SQL guides
This section contains hands-on SQL guides.
You might also want to read the in-depth SQL reference.
SQL beginners’ guide
The Beginners’ Guide describes how to get started with SQL in Tarantool and explains the necessary concepts.
The SQL Beginners’ Guide is about databases in general, and about the relationship between
Tarantool’s NoSQL and SQL products.
Most of the material in the Beginners’ Guide will already be familiar to people who have used relational databases before.
Before starting this tutorial:
Install the tt CLI utility.
Start a Tarantool instance in the interactive mode by running tt run -i:
$ tt run -i
Tarantool 3.0.0-0-g6ba34da7f8
type 'help' for interactive help
tarantool>
Initialize the instance and switch the input language to SQL:
Now you have a running Tarantool instance that accepts SQL input.
In football training camp it is traditional for the trainer to begin by showing a football
and saying “this is a football”. In that spirit, this is a table:
TABLE
             [1]              [2]              [3]
      +----------------+----------------+----------------+
Row#1 | Row#1,Column#1 | Row#1,Column#2 | Row#1,Column#3 |
      +----------------+----------------+----------------+
Row#2 | Row#2,Column#1 | Row#2,Column#2 | Row#2,Column#3 |
      +----------------+----------------+----------------+
Row#3 | Row#3,Column#1 | Row#3,Column#2 | Row#3,Column#3 |
      +----------------+----------------+----------------+
But the labels are misleading – one usually doesn’t identify rows and columns by their ordinal positions,
one prefers to pick out specific items by their contents. In that spirit, this is a table:
MODULES
+-----------------+------+---------------------+
| NAME            | SIZE | PURPOSE             |
+-----------------+------+---------------------+
| box             | 1432 | Database Management |
| clock           | 188  | Seconds             |
| crypto          | 4    | Cryptography        |
+-----------------+------+---------------------+
So one does not use longitude/latitude navigation by talking about “Row#2 Column #2”,
one uses the contents of the Name column and the name of the Size column
by talking about “the size, where the name is ‘clock’”.
To be more exact, this is what one says:
SELECT size FROM modules WHERE name = 'clock';
If you’re familiar with Tarantool’s architecture – and ideally you read
about that before coming to this chapter – then you know that there is a NoSQL
way to get the same thing:
box.space.MODULES:select()[2][2]
Well, you can do that. One of the advantages of Tarantool is that if you can get
data via an SQL statement, then you can get the same data via a NoSQL request.
But the reverse is not true, because not all NoSQL tuple sets are definable
as SQL tables. These restrictions apply to SQL but not to NoSQL:
1. Every column must have a name.
2. Every column should have a scalar type (Tarantool is relaxed about
which particular scalar type you can have, but there is no way to index and
search arrays, tables within tables, or what MessagePack calls “maps”.)
Tarantool/NoSQL’s “format” clause causes the same restrictions.
So an SQL “table” is a NoSQL “tuple set with format restrictions”,
an SQL “row” is a NoSQL “tuple”, an SQL “column” is a NoSQL “list of fields within a tuple set”.
This is how to create the modules table:
CREATE TABLE modules (name STRING, size INTEGER, purpose STRING, PRIMARY KEY (name));
The words that are IN CAPITAL LETTERS are “keywords” (writing keywords in capital letters is only a convention in
this manual; in practice, many programmers prefer to avoid shouting).
A keyword has meaning for the SQL parser, so many keywords are reserved: they cannot be used as names
unless they are enclosed in quotation marks.
The word “modules” is a “table name”, and the words “name” and “size” and “purpose” are “column names”.
All tables and all columns must have names.
The words “STRING” and “INTEGER” are “data types”.
STRING means “the contents should be characters, the length is indefinite, the equivalent NoSQL type is ‘string’”.
INTEGER means “the contents should be numbers without decimal points, the equivalent NoSQL type is ‘integer’”.
Tarantool supports other data types, but this section’s example table has data types from the two main groups,
namely, data types for numbers and data types for strings.
The final clause, PRIMARY KEY (name), means that the name column is the main column used to identify the row.
Often a column value needs to be NULL, at least temporarily.
Typical situations are: the value is unknown, or the value is not applicable.
For example, you might add a module as a placeholder without stating its size or purpose.
If such things are possible, the column is “nullable”.
The example table’s name column cannot contain nulls, and it could be defined explicitly as “name STRING NOT NULL”,
but in this case that’s unnecessary – a column defined as PRIMARY KEY is automatically NOT NULL.
Is a NULL in SQL the same thing as a nil in Lua?
No, but it is close enough that there will be confusion.
When nil means “unknown” or “inapplicable”, yes.
But when nil means “nonexistent” or “type is nil”, no.
NULL is a value; it has a data type because it is inside a column that is defined with that data type.
This is how to create indexes for the modules table:
CREATE INDEX size ON modules (size);
CREATE UNIQUE INDEX purpose ON modules (purpose);
There is no need to create an index on the name column,
because Tarantool creates an index automatically when it sees a PRIMARY KEY clause in the CREATE TABLE statement.
In fact there is no need to create indexes on the size or purpose columns
either – if indexes don’t exist, then it is still possible to use the columns for searches.
Typically people create non-primary indexes, also called secondary indexes,
when it becomes clear that the table will grow large and searches will be frequent,
because searching with an index is generally much faster than searching without an index.
Another use for indexes is to enforce uniqueness.
When an index is created with CREATE UNIQUE INDEX for the purpose column,
it is not possible to have duplicate values in that column.
Putting data into a table is called “inserting”.
Changing data is called “updating”.
Removing data is called “deleting”.
Together, INSERT, UPDATE, and DELETE are the three main “data-change” statements.
This is how to insert, update, and delete a row in the modules table:
INSERT INTO modules VALUES ('json', 14, 'format functions for JSON');
UPDATE modules SET size = 15 WHERE name = 'json';
DELETE FROM modules WHERE name = 'json';
The corresponding non-SQL Tarantool requests would be:
box.space.MODULES:insert{'json', 14, 'format functions for JSON'}
box.space.MODULES:update('json', {{'=', 2, 15}})
box.space.MODULES:delete{'json'}
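The SQL and NoSQL requests above operate on the same data, which can be checked from Lua. In this minimal sketch, box.execute() runs one SQL statement and returns a result table, while box.space.MODULES reaches the same rows through the NoSQL API. It assumes a fresh, throwaway instance (wal_mode = 'none' is an assumption so that nothing is persisted).

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[CREATE TABLE modules (name STRING, size INTEGER,
    purpose STRING, PRIMARY KEY (name));]])

-- SQL insert, then read the same row with a NoSQL request.
box.execute([[INSERT INTO modules VALUES ('json', 14, 'format functions for JSON');]])
local after_sql_insert = box.space.MODULES:get{'json'}

-- NoSQL update (field 2 is size), then read the row with an SQL request.
box.space.MODULES:update('json', {{'=', 2, 15}})
local res = box.execute([[SELECT size FROM modules WHERE name = 'json';]])
local size_seen_by_sql = res.rows[1][1]

-- SQL delete; the NoSQL get no longer finds the row.
box.execute([[DELETE FROM modules WHERE name = 'json';]])
local after_sql_delete = box.space.MODULES:get{'json'}

print(after_sql_insert[2], size_seen_by_sql, after_sql_delete)
```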
This is how one would populate the table with the values that were shown earlier:
INSERT INTO modules VALUES ('box', 1432, 'Database Management');
INSERT INTO modules VALUES ('clock', 188, 'Seconds');
INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');
Some data-change statements are illegal due to something in the table’s definition.
This is called “constraining what can be done”. Some types of constraints have already been shown …
NOT NULL – if a column is defined with a NOT NULL clause, it is illegal to put NULL into it.
A primary-key column is automatically NOT NULL.
UNIQUE – if a column has a UNIQUE index, it is illegal to put a duplicate into it.
A primary-key column automatically has a UNIQUE index.
data domain – if a column is defined as having data type INTEGER, it is illegal to put a non-number into it.
More generally, if a value doesn’t correspond to the data type of the definition, it is illegal.
Some database management systems (DBMSs) are very forgiving and will try to
make allowances for bad values rather than reject them; Tarantool is a bit more strict than those DBMSs.
Now, here are other types of constraints …
CHECK – a table description can have a clause “CHECK (conditional expression)”.
For example, if the CREATE TABLE modules statement looked like this:
CREATE TABLE modules (name STRING,
size INTEGER,
purpose STRING,
PRIMARY KEY (name),
CHECK (size > 0));
then this INSERT statement would be illegal:
INSERT INTO modules VALUES ('box', 0, 'The Database Kernel');
because there is a CHECK constraint saying that the second column, the size column,
cannot contain a value which is less than or equal to zero. Try this instead:
INSERT INTO modules VALUES ('box', 1, 'The Database Kernel');
FOREIGN KEY – a table description can have a clause
“FOREIGN KEY (column-list) REFERENCES table (column-list)”.
For example, if there is a new table “submodules” which in a way depends on the modules table,
it can be defined like this:
CREATE TABLE submodules (name STRING,
module_name STRING,
size INTEGER,
purpose STRING,
PRIMARY KEY (name),
FOREIGN KEY (module_name) REFERENCES
modules (name));
Now try to insert a new row into this submodules table:
INSERT INTO submodules VALUES
('space', 'Box', 10000, 'insert etc.');
The insert will fail because the second column (module_name)
refers to the name column in the modules table, and the name
column in the modules table does not contain ‘Box’.
However, it does contain ‘box’.
By default, string comparisons in Tarantool’s SQL use a binary collation, so ‘Box’ and ‘box’ are different values. This will work:
INSERT INTO submodules
VALUES ('space', 'box', 10000, 'insert etc.');
Now try to delete the corresponding row from the modules table:
DELETE FROM modules WHERE name = 'box';
The delete will fail because the second column (module_name) in the submodules
table refers to the name column in the modules table, and the name column
in the modules table would not contain ‘box’ if the delete succeeded.
So the FOREIGN KEY constraint affects both the table which contains
the FOREIGN KEY clause and the table that the FOREIGN KEY clause refers to.
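The constraint behavior described above can be checked from Lua. box.execute() returns nil and an error object when a statement fails, so each failure can be captured instead of stopping the script. A minimal sketch for a fresh, throwaway instance (wal_mode = 'none' is an assumption); the exact error texts are not asserted because they may vary between versions.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[CREATE TABLE modules (name STRING, size INTEGER, purpose STRING,
    PRIMARY KEY (name), CHECK (size > 0));]])
box.execute([[CREATE TABLE submodules (name STRING, module_name STRING,
    size INTEGER, purpose STRING, PRIMARY KEY (name),
    FOREIGN KEY (module_name) REFERENCES modules (name));]])

-- CHECK (size > 0) rejects a size of 0 but accepts 1.
local _, check_err = box.execute(
    [[INSERT INTO modules VALUES ('box', 0, 'The Database Kernel');]])
local check_ok = box.execute(
    [[INSERT INTO modules VALUES ('box', 1, 'The Database Kernel');]])

-- The foreign key rejects 'Box' (binary collation) but accepts 'box'.
local _, fk_insert_err = box.execute(
    [[INSERT INTO submodules VALUES ('space', 'Box', 10000, 'insert etc.');]])
local fk_insert_ok = box.execute(
    [[INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');]])

-- The referenced modules row cannot be deleted while a submodule refers to it.
local _, fk_delete_err = box.execute([[DELETE FROM modules WHERE name = 'box';]])

for _, err in ipairs({check_err, fk_insert_err, fk_delete_err}) do
    print(err)
end
```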
The constraints in a table’s definition – NOT NULL, UNIQUE, data domain, CHECK,
and FOREIGN KEY – are guarantors of the database’s integrity.
It is important that they are fixed and well-defined parts of the definition,
and hard to bypass with SQL.
This is often seen as a difference between SQL and NoSQL – SQL emphasizes law and order,
NoSQL emphasizes freedom and making your own rules.
Think about the two tables that have been discussed so far:
CREATE TABLE modules (name STRING,
size INTEGER,
purpose STRING,
PRIMARY KEY (name),
CHECK (size > 0));
CREATE TABLE submodules (name STRING,
module_name STRING,
size INTEGER,
purpose STRING,
PRIMARY KEY (name),
FOREIGN KEY (module_name) REFERENCES
modules (name));
Because of the FOREIGN KEY clause in the submodules table, there is clearly a many-to-one relationship:
submodules –>> modules
that is, every submodules row must refer to one (and only one) modules row,
while every modules row can be referred to in zero or more submodules rows.
Table relationships are important, but beware:
do not trust anyone who tells you that databases made with SQL are relational
“because there are relationships between tables”.
That is wrong, as will become clear later in the discussion about what makes a database relational.
Important
To avoid unwanted heavy load, Tarantool by default prohibits SELECT queries that scan table rows
instead of using indexes. For the purposes of
this tutorial, allow SQL scan queries in Tarantool by running the command:
SET SESSION "sql_seq_scan" = true;
Alternatively, you can allow a specific query to perform a table scan by adding
the SEQSCAN keyword before the table name. Learn more about using SEQSCAN
in SQL scan queries in the SQL FROM clause description.
We gave a simple example of a SELECT statement earlier:
SELECT size FROM modules WHERE name = 'clock';
The clause “WHERE name = ‘clock’” is legal in other statements – it
is in examples with UPDATE and DELETE – but here the only examples will be with SELECT.
The first variation is that the WHERE clause does not have to be specified at all;
it is optional. So this statement would return all rows:
SELECT size FROM modules;
The second variation is that the comparison operator does not have to be ‘=’,
it can be anything that makes sense: ‘>’ or ‘>=’ or ‘<’ or ‘<=’,
or ‘LIKE’ which is an operator that works with strings that may
contain wildcard characters ‘_’ meaning ‘match any one character’
or ‘%’ meaning ‘match any sequence of zero or more characters’.
These are legal statements which return all rows:
SELECT size FROM modules WHERE name >= '';
SELECT size FROM modules WHERE name LIKE '%';
The third variation is that IS [NOT] NULL is a special condition.
Remembering that the NULL value can mean “it is unknown what the value should be”,
and supposing that in some row the size is NULL,
then the condition “size > 10” is not certainly true and it is not certainly false,
so it is evaluated as “unknown”.
Ordinarily the application of a WHERE clause filters out both false and unknown results.
So when searching for NULL, say IS NULL;
when searching anything that is not NULL, say IS NOT NULL.
This statement will return all rows because (due to the definition) there are no NULLs in the name column:
SELECT size FROM modules WHERE name IS NOT NULL;
The fourth variation is that conditions can be combined with AND / OR, and negated with NOT.
So this statement would return all rows (the first condition is false
but the second condition is true, and OR means “return true if either condition is true”):
SELECT size
FROM modules
WHERE name = 'wombat' OR size IS NOT NULL;
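The NULL rules are easiest to see with a row whose size is unknown. This minimal sketch adds a hypothetical row named placeholder with a NULL size to the three example rows and counts how many rows each WHERE clause returns. It assumes a fresh, throwaway instance (wal_mode = 'none') and enables scan queries for the session, as described in the Important note above.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[SET SESSION "sql_seq_scan" = true;]])
box.execute([[CREATE TABLE modules (name STRING, size INTEGER,
    purpose STRING, PRIMARY KEY (name));]])
box.execute([[INSERT INTO modules VALUES ('box', 1432, 'Database Management');]])
box.execute([[INSERT INTO modules VALUES ('clock', 188, 'Seconds');]])
box.execute([[INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');]])
-- Hypothetical row whose size is unknown.
box.execute([[INSERT INTO modules VALUES ('placeholder', NULL, 'Not decided yet');]])

-- Number of rows returned by a SELECT statement.
local function count_rows(sql)
    return #box.execute(sql).rows
end

local all_by_like = count_rows([[SELECT size FROM modules WHERE name LIKE '%';]])
-- NULL > 10 is unknown, so the placeholder row is filtered out...
local big = count_rows([[SELECT size FROM modules WHERE size > 10;]])
-- ...and NOT unknown is still unknown, so it is filtered out here too.
local not_big = count_rows([[SELECT size FROM modules WHERE NOT (size > 10);]])
local unknown_size = count_rows([[SELECT name FROM modules WHERE size IS NULL;]])
local either = count_rows([[SELECT size FROM modules
    WHERE name = 'wombat' OR size IS NOT NULL;]])

-- Expected counts: 4, 2, 1, 1, 3
print(all_by_like, big, not_big, unknown_size, either)
```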
Selecting with a select list
Yet again, here is a simple example of a SELECT statement:
SELECT size FROM modules WHERE name = 'clock';
The words between SELECT and FROM are the select list.
In this case, the select list is just one word: size.
Formally, it means that the size values should be returned;
the technical name for picking particular columns is “projection”.
The first variation is that one can specify any column in any order:
SELECT name, purpose, size FROM modules;
The second variation is that one can specify an expression:
it does not have to be a column name, and it does not even have to include a column name.
The common expression operators for numbers are the arithmetic operators + - / *;
the common expression operator for strings is the concatenation operator ||.
For example this statement will return 8, ‘XY’:
SELECT size * 2, 'X' || 'Y' FROM modules WHERE size = 4;
The third variation is that one can add a clause [AS name] after every expression,
so that in the return the column titles will make sense.
This is especially important when a title might otherwise be ambiguous or meaningless.
For example, this statement will return 8, ‘XY’ as before:
SELECT size * 2 AS double_size, 'X' || 'Y' AS concatenated_literals FROM modules
WHERE size = 4;
but displayed as a table the result will look like
+----------------+------------------------+
| DOUBLE_SIZE    | CONCATENATED_LITERALS  |
+----------------+------------------------+
| 8              | XY                     |
+----------------+------------------------+
Selecting with a select list with asterisk
Instead of listing columns in a select list, one can just say *. For example
SELECT * FROM modules;
is the same thing as
SELECT name, size, purpose FROM modules;
Selecting with "*" saves time for the writer,
but it is unclear to a reader who has not memorized what the column names are.
Also it is unstable, because there is a way to change a table’s
definition (the ALTER statement, which is an advanced topic).
Nevertheless, although it might be bad to use it for production,
it is handy to use it for introduction, so "*" will appear in some following examples.
Remember that there is a modules table and there is a submodules table.
Suppose that there is a desire to list the submodules that refer to modules for which the purpose is X.
That is, this involves a search of one table using a value in another table.
This can be done by enclosing “(SELECT …)” within the WHERE clause. For example:
SELECT name FROM submodules
WHERE module_name =
(SELECT name FROM modules WHERE purpose LIKE '%Database%');
Subqueries are also useful in the select list, when one wishes to combine
information from more than one table.
For example this statement will display submodules rows but will include values that come from the modules table:
SELECT name AS submodules_name,
(SELECT purpose FROM modules
WHERE modules.name = submodules.module_name)
AS modules_purpose,
purpose AS submodules_purpose
FROM submodules;
Whoa. What are “modules.name” and “submodules.module_name”?
Whenever you see “x.y” you are looking at a “qualified column name”,
and the first part is a table identifier, the second part is a column identifier.
It is always legal to use qualified column names, but until now it has not been necessary.
Now it is necessary, or at least it is a good idea, because both tables have a column named “name”.
The result will look like this:
+-------------------+------------------------+--------------------+
| SUBMODULES_NAME   | MODULES_PURPOSE        | SUBMODULES_PURPOSE |
+-------------------+------------------------+--------------------+
| space             | Database Management    | insert etc.        |
+-------------------+------------------------+--------------------+
Perhaps you have read somewhere that SQL stands for “Structured Query Language”.
That is not true any more.
But it is true that the query syntax allows for a structural component,
namely the subquery, and that was the original idea.
However, there is a different way to combine tables – with joins instead of subqueries.
Select with Cartesian join
Until now only “FROM modules” or “FROM submodules” was used in SELECT statements.
What if there was more than one table in the FROM clause? For example
SELECT * FROM modules, submodules;
or
SELECT * FROM modules JOIN submodules;
That is legal. Usually it is not what you want, but it is a learning aid. The result will be:
    { columns from modules table }          { columns from submodules table }
+--------+------+---------------------+-------+-------------+-------+-------------+
| NAME   | SIZE | PURPOSE             | NAME  | MODULE_NAME | SIZE  | PURPOSE     |
+--------+------+---------------------+-------+-------------+-------+-------------+
| box    | 1432 | Database Management | space | box         | 10000 | insert etc. |
| clock  | 188  | Seconds             | space | box         | 10000 | insert etc. |
| crypto | 4    | Cryptography        | space | box         | 10000 | insert etc. |
+--------+------+---------------------+-------+-------------+-------+-------------+
It is not an error. The meaning of this type of join is “combine every row in table-1 with every row in table-2”.
It did not specify what the relationship should be, so the result has everything,
even when the submodule has nothing to do with the module.
It is handy to look at the above result, called a “Cartesian join” result, to see what would really be desirable.
Probably for this case the row that actually makes sense is the one where the modules.name = submodules.module_name,
and it’s better to make that clear in both the select list and the WHERE clause, thus:
SELECT modules.name AS modules_name,
modules.size AS modules_size,
modules.purpose AS modules_purpose,
submodules.name,
module_name,
submodules.size,
submodules.purpose
FROM modules, submodules
WHERE modules.name = submodules.module_name;
The result will be:
+----------+-----------+------------+--------+---------+-------+-------------+
| MODULES_ | MODULES_  | MODULES_   | NAME   | MODULE_ | SIZE  | PURPOSE     |
| NAME     | SIZE      | PURPOSE    |        | NAME    |       |             |
+----------+-----------+------------+--------+---------+-------+-------------+
| box      | 1432      | Database   | space  | box     | 10000 | insert etc. |
|          |           | Management |        |         |       |             |
+----------+-----------+------------+--------+---------+-------+-------------+
In other words, you can specify a Cartesian join in the FROM clause,
then you can filter out the irrelevant rows in the WHERE clause,
and then you can rename columns in the select list.
This is fine, and every SQL DBMS supports this.
But it is worrisome that the number of rows in a Cartesian join is always
the number of rows in the first table multiplied by the number of rows in the second table,
which means that conceptually you are often filtering a large set of rows.
It is good to start by looking at Cartesian joins because they show the concept.
Many people, though, prefer to use different syntaxes for joins because they
look better or clearer. So now those alternatives will be shown.
Select with join with ON clause
The ON clause contains the same comparisons as the WHERE clause illustrated
in the previous section, but the different syntax makes it clear
that the comparison is “for the sake of the join”.
Readers can see at a glance that it is, in concept at least, an initial step before
the result rows are filtered. For example, this
SELECT * FROM modules JOIN submodules
ON (modules.name = submodules.module_name);
is the same as
SELECT * FROM modules, submodules
WHERE modules.name = submodules.module_name;
Select with join with USING clause
The USING clause would take advantage of names that are held in common between the two tables,
with the assumption that the intent is to match those columns with ‘=’ comparisons. For example,
SELECT * FROM modules JOIN submodules USING (name);
has the same effect as
SELECT * FROM modules JOIN submodules WHERE modules.name = submodules.name;
If the tables had been designed in advance with USING clauses in mind,
that would save time. But that did not happen.
So, although the above example “works”, the results will not be sensible.
A natural join would take advantage of names that are held in common between the two tables,
and would do the filtering automatically based on that knowledge, and throw away duplicate columns.
If the tables had been designed in advance with natural joins in mind, that would be very handy.
But that did not happen. So, although the following example “works”, the results won’t be sensible.
SELECT * FROM modules NATURAL JOIN submodules;
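Result: nothing, because a natural join matches on every column name the two tables share (name, size, and purpose),
and no modules row has the same name, size, and purpose as a submodules row. Even if there had been a result, it would only have included
four columns: name, module_name, size, purpose.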
Now what if there is a desire to join modules to submodules,
but it’s necessary to be sure that all the modules are found?
In other words, suppose the requirement is to get modules even if the condition submodules.module_name = modules.name
is not true, because the module has no submodules.
When that is the requirement, the type of join is an “outer join”
(as opposed to the type that has been used so far which is an “inner join”).
Specifically the format will be LEFT [OUTER] JOIN because the main table, modules, is on the left. For example:
SELECT *
FROM modules LEFT JOIN submodules
ON modules.name = submodules.module_name;
which returns:
    { columns from modules table }          { columns from submodules table }
+--------+------+---------------------+-------+-------------+-------+-------------+
| NAME   | SIZE | PURPOSE             | NAME  | MODULE_NAME | SIZE  | PURPOSE     |
+--------+------+---------------------+-------+-------------+-------+-------------+
| box    | 1432 | Database Management | space | box         | 10000 | insert etc. |
| clock  | 188  | Seconds             | NULL  | NULL        | NULL  | NULL        |
| crypto | 4    | Cryptography        | NULL  | NULL        | NULL  | NULL        |
+--------+------+---------------------+-------+-------------+-------+-------------+
Thus, for the clock and crypto modules, which have no submodules,
there are NULLs in every submodules column.
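The three kinds of join can be compared by counting rows. A minimal sketch, assuming a fresh, throwaway instance (wal_mode = 'none') with the example modules and submodules tables; scan queries are enabled for the session because these joins read whole tables.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[SET SESSION "sql_seq_scan" = true;]])
box.execute([[CREATE TABLE modules (name STRING, size INTEGER,
    purpose STRING, PRIMARY KEY (name));]])
box.execute([[CREATE TABLE submodules (name STRING, module_name STRING,
    size INTEGER, purpose STRING, PRIMARY KEY (name),
    FOREIGN KEY (module_name) REFERENCES modules (name));]])
box.execute([[INSERT INTO modules VALUES ('box', 1432, 'Database Management');]])
box.execute([[INSERT INTO modules VALUES ('clock', 188, 'Seconds');]])
box.execute([[INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');]])
box.execute([[INSERT INTO submodules VALUES ('space', 'box', 10000, 'insert etc.');]])

-- Cartesian join: every modules row combined with every submodules row (3 * 1).
local cartesian = box.execute([[SELECT COUNT(*) FROM modules, submodules;]])

-- Inner join: only the pairs that satisfy the ON condition.
local inner = box.execute([[
    SELECT modules.name, submodules.name
    FROM modules JOIN submodules ON modules.name = submodules.module_name;]])

-- Left outer join: every modules row, with NULLs where no submodule matches.
local outer = box.execute([[
    SELECT modules.name, submodules.name
    FROM modules LEFT JOIN submodules ON modules.name = submodules.module_name
    ORDER BY modules.name;]])

print(cartesian.rows[1][1], #inner.rows, #outer.rows)
```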
A function can take any expression, including an expression that contains another function,
and return a scalar value. There are many such functions; only one of them, SUBSTR,
is described here. It returns a substring of a string.
Format: SUBSTR(input-string, start-with [, length])
Description: SUBSTR takes input-string and returns the characters starting at position start-with
(the first character is at position 1). If length is specified, at most length characters are returned; otherwise, the result extends to the end of the string.
Example: SUBSTR('abcdef', 2, 3) returns ‘bcd’.
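A quick check of SUBSTR from Lua. It assumes a fresh, throwaway instance (box.execute() needs box.cfg to have been called; wal_mode = 'none' is an assumption) and uses SELECT without a FROM clause to evaluate the function on literals.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL

-- Run a single-value SELECT and return that value.
local function scalar(sql)
    return box.execute(sql).rows[1][1]
end

local with_length = scalar([[SELECT SUBSTR('abcdef', 2, 3);]])   -- bcd
local to_the_end = scalar([[SELECT SUBSTR('abcdef', 2);]])       -- bcdef
local first_letter = scalar([[SELECT SUBSTR('clock', 1, 1);]])   -- c

print(with_length, to_the_end, first_letter)
```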
Select with aggregation, GROUP BY, and HAVING
Remember that the modules table looks like this:
MODULES
+-----------------+------+---------------------+
| NAME            | SIZE | PURPOSE             |
+-----------------+------+---------------------+
| box             | 1432 | Database Management |
| clock           | 188  | Seconds             |
| crypto          | 4    | Cryptography        |
+-----------------+------+---------------------+
Suppose that there is no need to know all the individual size values;
all that matters is their aggregation, that is, the attributes of the collection as a whole.
SQL allows aggregation functions including: AVG (average), SUM, MIN (minimum), MAX (maximum), and COUNT.
For example
SELECT AVG(size), SUM(size), MIN(size), MAX(size), COUNT(size) FROM modules;
The result will look like this:
+-----------+-----------+-----------+-----------+-----------+
| COLUMN_1  | COLUMN_2  | COLUMN_3  | COLUMN_4  | COLUMN_5  |
+-----------+-----------+-----------+-----------+-----------+
| 541       | 1624      | 4         | 1432      | 3         |
+-----------+-----------+-----------+-----------+-----------+
Suppose that the requirement is aggregations, but aggregations of rows that have some common characteristic.
Supposing further, the rows should be divided into two groups, the ones whose names
begin with ‘b’ and the ones whose names begin with ‘c’.
This can be done by adding a clause [GROUP BY expression]. For example,
SELECT SUBSTR(name, 1, 1), AVG(size), SUM(size), MIN(size), MAX(size), COUNT(size)
FROM modules
GROUP BY SUBSTR(name, 1, 1);
The result will look like this:
+------------+--------------+-----------+-----------+-----------+-------------+
| COLUMN_1   | COLUMN_2     | COLUMN_3  | COLUMN_4  | COLUMN_5  | COLUMN_6    |
+------------+--------------+-----------+-----------+-----------+-------------+
| b          | 1432         | 1432      | 1432      | 1432      | 1           |
| c          | 96           | 192       | 4         | 188       | 2           |
+------------+--------------+-----------+-----------+-----------+-------------+
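The grouped query can be run from Lua to inspect the aggregates. A minimal sketch on a fresh, throwaway instance (wal_mode = 'none' is an assumption); ORDER BY is added because GROUP BY alone does not promise an output order, and AVG is left out to keep the sketch short.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[SET SESSION "sql_seq_scan" = true;]])
box.execute([[CREATE TABLE modules (name STRING, size INTEGER,
    purpose STRING, PRIMARY KEY (name));]])
box.execute([[INSERT INTO modules VALUES ('box', 1432, 'Database Management');]])
box.execute([[INSERT INTO modules VALUES ('clock', 188, 'Seconds');]])
box.execute([[INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');]])

-- One output row per first letter of the module name.
local res = box.execute([[
    SELECT SUBSTR(name, 1, 1) AS first_letter,
           SUM(size), MIN(size), MAX(size), COUNT(size)
    FROM modules
    GROUP BY SUBSTR(name, 1, 1)
    ORDER BY first_letter;]])

for _, row in ipairs(res.rows) do
    print(row[1], row[2], row[3], row[4], row[5])
end
```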
Select with common table expression
It is possible to define a temporary (viewed) table within a statement,
usually within a SELECT statement, using a WITH clause. For example:
WITH tmp_table AS (SELECT x1 FROM t1) SELECT * FROM tmp_table;
Select with order, limit, and offset clauses
So far, for every search in the modules table, the rows have come out in alphabetical order by name:
‘box’, then ‘clock’, then ‘crypto’.
However, to really be sure about the order, or to ask for a different order,
it is necessary to be explicit and add a clause:
ORDER BY column-name [ASC|DESC].
(ASC stands for ASCending, DESC stands for DESCending.)
For example:
SELECT * FROM modules ORDER BY name DESC;
The result will be the usual rows, in descending alphabetical order: ‘crypto’ then ‘clock’ then ‘box’.
After the ORDER BY clause there can be a clause LIMIT n, where n is the maximum number of rows to retrieve. For example:
SELECT * FROM modules ORDER BY name DESC LIMIT 2;
The result will be the first two rows, ‘crypto’ and ‘clock’.
After the ORDER BY clause and the LIMIT clause there can be a clause OFFSET n,
where n is the number of rows to skip before returning results (OFFSET 0 starts with the first row). For example:
SELECT * FROM modules ORDER BY name DESC LIMIT 2 OFFSET 2;
The result will be the third row, ‘box’.
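A minimal sketch that runs the three ORDER BY queries from Lua and joins the returned names into one string, assuming a fresh, throwaway instance (wal_mode = 'none') with scan queries enabled for the session.

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
box.execute([[SET SESSION "sql_seq_scan" = true;]])
box.execute([[CREATE TABLE modules (name STRING, size INTEGER,
    purpose STRING, PRIMARY KEY (name));]])
box.execute([[INSERT INTO modules VALUES ('box', 1432, 'Database Management');]])
box.execute([[INSERT INTO modules VALUES ('clock', 188, 'Seconds');]])
box.execute([[INSERT INTO modules VALUES ('crypto', 4, 'Cryptography');]])

-- Collect the first column of every returned row into "a, b, c".
local function names(sql)
    local list = {}
    for _, row in ipairs(box.execute(sql).rows) do
        table.insert(list, row[1])
    end
    return table.concat(list, ', ')
end

local all_desc = names([[SELECT name FROM modules ORDER BY name DESC;]])
local first_two = names([[SELECT name FROM modules ORDER BY name DESC LIMIT 2;]])
local third = names([[SELECT name FROM modules ORDER BY name DESC LIMIT 2 OFFSET 2;]])

print(all_desc)  -- crypto, clock, box
print(first_two) -- crypto, clock
print(third)     -- box
```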
A view is a canned SELECT. If you have a complex SELECT that you want to run frequently, create a view and then do a simple SELECT on the view. For example:
CREATE VIEW v AS SELECT size, (size * 5) AS size_times_5
FROM modules
GROUP BY size, name
ORDER BY size_times_5;
SELECT * FROM v;
Tarantool has a “Write Ahead Log” (WAL).
Effects of data-change statements are logged before they are permanently stored on disk.
This is why, although an entire database can be kept in memory,
it is not lost in case of a power failure.
Tarantool supports commits and rollbacks. In effect, asking for a commit means
asking for all the recent data-change statements,
since a transaction began, to become permanent.
In effect, asking for a rollback means asking for all the recent data-change statements,
since a transaction began, to be cancelled.
For example, consider these statements:
CREATE TABLE things (remark STRING, PRIMARY KEY (remark));
START TRANSACTION;
INSERT INTO things VALUES ('A');
COMMIT;
START TRANSACTION;
INSERT INTO things VALUES ('B');
ROLLBACK;
SELECT * FROM things;
The result will be: one row, containing ‘A’. The ROLLBACK cancelled the second INSERT statement,
but did not cancel the first one, because it had already been committed.
Ordinarily every statement is automatically committed.
After START TRANSACTION, statements are not automatically committed – Tarantool considers
that a transaction is now “active”, until the transaction ends with a COMMIT statement or a ROLLBACK statement.
While a transaction is active, all statements are legal except another START TRANSACTION.
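The same transaction example can be run from Lua, since box.execute() accepts START TRANSACTION, COMMIT, and ROLLBACK as ordinary statements. A minimal sketch on a fresh, throwaway instance (wal_mode = 'none' is an assumption, so nothing survives a restart here).

```lua
box.cfg{wal_mode = 'none'} -- throwaway instance: nothing is written to the WAL
-- SELECT * below reads the whole table, so allow scans for this session.
box.execute([[SET SESSION "sql_seq_scan" = true;]])
box.execute([[CREATE TABLE things (remark STRING, PRIMARY KEY (remark));]])

box.execute([[START TRANSACTION;]])
box.execute([[INSERT INTO things VALUES ('A');]])
box.execute([[COMMIT;]])   -- the insert of 'A' is committed

box.execute([[START TRANSACTION;]])
box.execute([[INSERT INTO things VALUES ('B');]])
box.execute([[ROLLBACK;]]) -- the insert of 'B' is cancelled

local res = box.execute([[SELECT * FROM things;]])
for _, row in ipairs(res.rows) do
    print(row[1])
end
```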
Edgar F. Codd, the person most responsible for researching and explaining relational database concepts,
listed the main criteria in what are known as
Codd’s 12 rules.
Although Tarantool is not advertised as “relational”, Tarantool comes with a claim that it complies with these rules,
with the following caveats and exceptions …
The rules state that all data must be viewable as relations.
A Tarantool SQL table is a relation.
However, it is possible to have duplicate values in SQL tables and it is possible
to have an implicit ordering. Those characteristics are not allowed for true relations.
The rules state that there must be a dynamic online catalog. Tarantool has one but some metadata is missing from it.
The rules state that the data language must support authorization.
Tarantool’s SQL does not. Authorization occurs via NoSQL requests.
The rules require that data must be physically independent (from underlying storage changes)
and logically independent (from application program changes).
So far there is not enough experience to make this guarantee.
The rules require certain types of updatable views. Tarantool’s views are not updatable.
The rules state that it should be impossible to use a low-level language to bypass
integrity as defined in the relational-level language.
In Tarantool’s case, this is not true: for example, one can execute a request
with Tarantool’s NoSQL to violate a foreign-key constraint that was defined with Tarantool’s SQL.
To learn more about SQL in Tarantool, check the reference.
SQL tutorial
This tutorial is a demonstration of the support for SQL in Tarantool.
It includes the functionality that you’d encounter in an “SQL-101” course.
Before starting this tutorial:
Install the tt CLI utility.
Start a Tarantool instance in the interactive mode by running tt run -i:
$ tt run -i
Tarantool 3.0.0-0-g6ba34da7f8
type 'help' for interactive help
tarantool>
Initialize the instance and switch the input language to SQL:
Now you have a running Tarantool instance that accepts SQL input.
Create a table and execute SQL statements
CREATE, INSERT, UPDATE, SELECT
To get started, enter these SQL statements:
CREATE TABLE table1 (column1 INTEGER PRIMARY KEY, column2 VARCHAR(100));
INSERT INTO table1 VALUES (1, 'A');
UPDATE table1 SET column2 = 'B';
SELECT * FROM table1 WHERE column1 = 1;
The result of the SELECT statement looks like this:
The result includes:
- metadata: the names and data types of each column
- result rows
For conciseness, metadata is skipped in query results in this tutorial.
Only the result rows are shown.
Here is a CREATE TABLE statement with more details:
- There are multiple columns, with different data types.
- There is a PRIMARY KEY (unique and not-null) for two of the columns.
Create another table:
CREATE TABLE table2 (column1 INTEGER,
column2 VARCHAR(100),
column3 SCALAR,
column4 DOUBLE,
PRIMARY KEY (column1, column2));
The result is: row_count: 1.
Put four rows in the table (table2):
- The INTEGER and DOUBLE columns get numbers.
- The VARCHAR column gets strings.
- The SCALAR column gets binary values written as hexadecimal literals, such as X'4142'.
INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5);
INSERT INTO table2 VALUES (1, 'CD', X'2020', 1E4);
INSERT INTO table2 VALUES (2, 'AB', X'2020', 12.34567);
INSERT INTO table2 VALUES (-1000, '', X'', 0.0);
Then try to put another row:
INSERT INTO table2 VALUES (1, 'AB', X'A5', -5.5);
This INSERT fails because of a primary-key violation: the row with the primary
key 1, 'AB' already exists.
A sequential scan reads through all the table rows instead of using an index.
In Tarantool, SELECT SQL queries that perform sequential scans are prohibited by default.
For example, the query SELECT * FROM table2; fails with the error Scanning is not allowed for 'TABLE2'.
To execute a scan query, put the SEQSCAN keyword before the table name:
SELECT * FROM SEQSCAN table2;
Try to execute these queries that use indexed column1 in filters:
SELECT * FROM table2 WHERE column1 = 1;
SELECT * FROM table2 WHERE column1 + 1 = 2;
The result is:
The first query returns rows:
The second query fails with the error Scanning is not allowed for 'TABLE2'.
Although column1 is indexed, the expression column1 + 1 is not calculated
from the index, which makes this SELECT a scan query.
Note
To enable SQL scan queries without SEQSCAN for the current session,
run this command:
SET SESSION "sql_seq_scan" = true;
Learn more about using SEQSCAN in the SQL FROM clause description.
SELECT with ORDER BY clause
Retrieve the 4 rows in the table, in descending order by column2, then
(where the column2 values are the same) in ascending order by column4.
* is short for “all columns”.
SELECT * FROM SEQSCAN table2 ORDER BY column2 DESC, column4 ASC;
The result is:
SELECT with WHERE clauses
Retrieve some of what you inserted:
- The first statement uses the LIKE comparison operator with the pattern 'A%',
which means “the first character must be ‘A’; the next characters can be anything.”
- The second statement uses logical operators and parentheses: a row qualifies
if both conditions joined by AND are true, or if the condition after OR is true.
Notice that the columns don’t have to be indexed.
SELECT column1, column2, column1 * column4 FROM SEQSCAN table2 WHERE column2
LIKE 'A%';
SELECT column1, column2, column3, column4 FROM SEQSCAN table2
WHERE (column1 < 2 AND column4 < 10)
OR column3 = X'2020';
The first result is:
The second result is:
SELECT with GROUP BY and aggregate functions
Retrieve with grouping.
The rows that have the same values for column2 are grouped and are aggregated
– summed, counted, averaged – for column4.
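Conceptually, grouping builds one bucket per distinct column2 value and keeps running aggregates in it. The plain-Lua sketch below is only a model of that idea (hash aggregation), not how Tarantool's SQL engine is implemented. It uses the four rows inserted so far and, as a simplification, does not handle NULLs, which SQL aggregates would skip.

```lua
-- (column2, column4) pairs of the four table2 rows inserted so far.
local rows = {
  {"AB", 5.5},
  {"CD", 1e4},
  {"AB", 12.34567},
  {"", 0.0},
}

-- Hash aggregation: one accumulator per distinct grouping key.
-- Simplification: NULL values are not handled here.
local function group_by(input)
  local groups, order = {}, {}
  for _, row in ipairs(input) do
    local key, value = row[1], row[2]
    local acc = groups[key]
    if acc == nil then
      acc = {sum = 0, count = 0}
      groups[key] = acc
      order[#order + 1] = key          -- remember the first-seen order of keys
    end
    acc.sum = acc.sum + value
    acc.count = acc.count + 1
  end
  for _, key in ipairs(order) do
    local acc = groups[key]
    acc.avg = acc.sum / acc.count      -- AVG = SUM / COUNT
  end
  return groups, order
end

local groups, order = group_by(rows)
for _, key in ipairs(order) do
  local acc = groups[key]
  print(string.format("%q  SUM=%g  COUNT=%d  AVG=%g", key, acc.sum, acc.count, acc.avg))
end
```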
SELECT column2, SUM(column4), COUNT(column4), AVG(column4)
FROM SEQSCAN table2
GROUP BY column2;
The result is:
Complications and complex SELECTs
Insert rows that contain NULL values.
NULL is not the same as Lua nil; it is commonly used in SQL to mean “unknown”
or “not applicable”.
INSERT INTO table2 VALUES (1, NULL, X'4142', 5.5);
INSERT INTO table2 VALUES (0, '!!@', NULL, NULL);
INSERT INTO table2 VALUES (0, '!!!', X'00', NULL);
The results are:
- The first INSERT fails because NULL is not permitted for a column that was
defined with a PRIMARY KEY clause.
- The other INSERT statements succeed.
Create a new index on column4.
There is already an index for the primary key. Indexes are useful for making queries
faster. In this case, the index also acts as a constraint, because it prevents
two rows from having the same values in column4. However, multiple NULLs in column4
do not violate it.
CREATE UNIQUE INDEX i ON table2 (column4);
The result is: row_count: 1.
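The rule that several NULLs can coexist in a UNIQUE index can be modeled in plain Lua: NULL entries are simply not added to the key map, so they never collide. The NULL sentinel and the build_unique_index name are hypothetical, and a real index is not a Lua table; this is only a sketch of the uniqueness rule described above.

```lua
-- Hypothetical sentinel standing in for SQL NULL.
local NULL = setmetatable({}, {__tostring = function() return "NULL" end})

-- Builds a key -> row-number map, refusing duplicate non-NULL keys.
local function build_unique_index(values)
  local index = {}
  for row, v in ipairs(values) do
    if v ~= NULL then                  -- NULLs are skipped, so they never collide
      if index[v] ~= nil then
        return nil, string.format("duplicate key %s in rows %d and %d",
                                  tostring(v), index[v], row)
      end
      index[v] = row
    end
  end
  return index
end

-- column4 of table2 when the index is created: two NULLs, no other duplicates.
local index = assert(build_unique_index({5.5, 1e4, 12.34567, 0.0, NULL, NULL}))
print(index[12.34567])   -- 3
```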
Create a table table3, which contains a subset of the table2 columns
and a subset of the table2 rows.
You can do this by combining INSERT with SELECT. Then select everything
from the result table.
CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY
(column2));
INSERT INTO table3 SELECT column1, column2 FROM SEQSCAN table2 WHERE column1 <> 2;
SELECT * FROM SEQSCAN table3;
The result is:
A subquery is a query within a query.
Find all the rows in table2 whose (column1, column2) values are not
present in table3.
SELECT * FROM SEQSCAN table2 WHERE (column1, column2) NOT IN (SELECT column1,
column2 FROM SEQSCAN table3);
The result is the single row that was excluded when inserting the rows with
the INSERT ... SELECT statement:
A join combines rows from two tables. Tarantool supports more than one kind of join,
for example, “Cartesian joins” and “left outer joins”.
This example shows the most typical case, where column values from one table match
column values from another table.
SELECT * FROM SEQSCAN table2, table3
WHERE table2.column1 = table3.column1 AND table2.column2 = table3.column2
ORDER BY table2.column4;
The result is:
Constraints and foreign keys
CREATE TABLE with a CHECK clause
Create a table that includes a constraint – there must not be any rows
containing 13 in column2. After that, try to insert the following row:
CREATE TABLE table4 (column1 INTEGER PRIMARY KEY, column2 INTEGER, CHECK
(column2 <> 13));
INSERT INTO table4 VALUES (12, 13);
Result: the insert fails, as it should, with the message
Check constraint 'ck_unnamed_TABLE4_1' failed for tuple.
CREATE TABLE with a FOREIGN KEY clause
Create a table that includes a constraint: there must not be any rows containing
values that do not appear in table2.
CREATE TABLE table5 (column1 INTEGER, column2 VARCHAR(100),
PRIMARY KEY (column1),
FOREIGN KEY (column1, column2) REFERENCES table2 (column1, column2));
INSERT INTO table5 VALUES (2,'AB');
INSERT INTO table5 VALUES (3,'AB');
Result:
- The first INSERT statement succeeds because table2 contains a row with
[2, 'AB', X'2020', 12.34567].
- The second INSERT statement, correctly, fails with the message
Foreign key constraint 'fk_unnamed_TABLE5_1' failed: foreign tuple was not found.
Due to earlier INSERT statements, these values are in column4 of table2:
{0, NULL, NULL, 5.5, 10000, 12.34567}. Add 5 to each of these values except 0.
Adding 5 to NULL results in NULL, as SQL arithmetic requires.
Use SELECT to see what happened to column4.
UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0;
SELECT column4 FROM SEQSCAN table2 ORDER BY column4;
The result is: {NULL, NULL, 0, 10.5, 17.34567, 10005}.
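The NULL behavior of this UPDATE can be checked with a plain-Lua model. Because Lua nil cannot be stored reliably in an array, the sketch uses a hypothetical NULL sentinel table. A comparison with NULL gives an unknown result, so WHERE skips those rows, and arithmetic with NULL gives NULL. The ordering comparator puts NULLs first only to match the result shown above.

```lua
-- Hypothetical sentinel standing in for SQL NULL.
local NULL = setmetatable({}, {__tostring = function() return "NULL" end})

-- SQL arithmetic: any operand that is NULL makes the result NULL.
local function sql_add(a, b)
  if a == NULL or b == NULL then return NULL end
  return a + b
end

-- SQL "<>": true, false, or NULL (unknown) when an operand is NULL.
local function sql_ne(a, b)
  if a == NULL or b == NULL then return NULL end
  return a ~= b
end

-- column4 of the 6 rows of table2 before the UPDATE.
local column4 = {5.5, 1e4, 12.34567, 0.0, NULL, NULL}

-- UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0;
-- WHERE keeps a row only if the condition is exactly true (not false, not NULL).
for i, v in ipairs(column4) do
  if sql_ne(v, 0) == true then
    column4[i] = sql_add(v, 5)
  end
end

-- ORDER BY column4, placing NULLs first as in the result shown above.
table.sort(column4, function(a, b)
  if a == NULL then return b ~= NULL end
  if b == NULL then return false end
  return a < b
end)

local printable = {}
for i, v in ipairs(column4) do printable[i] = tostring(v) end
print(table.concat(printable, ", "))
```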
Due to earlier INSERT statements, there are 6 rows in table2.
Try to delete two of them, the row with column1 = 2 and the row with column1 = -1000:
DELETE FROM table2 WHERE column1 = 2;
DELETE FROM table2 WHERE column1 = -1000;
SELECT COUNT(column1) FROM SEQSCAN table2;
The result is:
- The first DELETE statement causes an error because of the foreign-key
constraint: table5 contains a row that references this row.
- The second DELETE statement succeeds.
- The SELECT statement shows that there are 5 rows remaining.
ALTER TABLE with a FOREIGN KEY clause
Create another constraint that there must not be any rows in table1
containing values that do not appear in table5. This was impossible
when table1 was created because table5 did not exist at that time.
You can add constraints to existing tables with the ALTER TABLE statement.
ALTER TABLE table1 ADD CONSTRAINT c
FOREIGN KEY (column1) REFERENCES table5 (column1);
DELETE FROM table1;
ALTER TABLE table1 ADD CONSTRAINT c
FOREIGN KEY (column1) REFERENCES table5 (column1);
Result: the ALTER TABLE statement fails the first time because there is a row
in table1, and ADD CONSTRAINT requires that the table be empty.
After the row is deleted, the ALTER TABLE statement completes successfully.
Now there is a chain of references, from table1 to table5 and from table5
to table2.
The idea of a trigger is: if a change (INSERT or UPDATE or DELETE) happens,
then a further action – perhaps another INSERT or UPDATE or DELETE
– will happen.
Set up the following trigger: when an update to table3 is done, do an update
to table2. Specify this as FOR EACH ROW, so that the trigger activates 5
times (since there are 5 rows in table3).
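The FOR EACH ROW arithmetic can be checked with a small plain-Lua model: the UPDATE touches every row of table3, and the trigger body runs once per touched row. The variable and function names are hypothetical; the starting value 17.34567 is the column4 value that the first SELECT returns.

```lua
-- column4 of the table2 row where column1 = 2, before the trigger fires.
local table2_column4 = 17.34567

-- The 5 rows of table3 (column1, column2).
local table3 = {{1, "AB"}, {1, "CD"}, {-1000, ""}, {0, "!!@"}, {0, "!!!"}}

-- Trigger body: UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2;
local fired = 0
local function trigger_tr()
  fired = fired + 1
  table2_column4 = table2_column4 + 1
end

-- UPDATE table3 SET column2 = column2; touches every row,
-- so the AFTER UPDATE ... FOR EACH ROW trigger fires once per row.
for _, row in ipairs(table3) do
  row[2] = row[2]
  trigger_tr()
end

print(fired, table2_column4)
```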
SELECT column4 FROM table2 WHERE column1 = 2;
CREATE TRIGGER tr AFTER UPDATE ON table3 FOR EACH ROW
BEGIN UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2; END;
UPDATE table3 SET column2 = column2;
SELECT column4 FROM table2 WHERE column1 = 2;
Result:
- The first
SELECT shows that the original value of
column4 in table2 where column1 = 2 was: 17.34567.
- The second SELECT returns a value that is 5 greater, because the trigger
fired once for each of the 5 updated rows in table3.
You can manipulate string data (usually defined with CHAR or VARCHAR data types)
in many ways. For example:
- concatenate strings with the || operator
- extract substrings with the SUBSTR function
SELECT column2, column2 || column2, SUBSTR(column2, 2, 1) FROM SEQSCAN table2;
The result is:
You can also manipulate number data (usually defined with INTEGER
or DOUBLE data types) in many ways. For example:
- shift left with the << operator
- get modulo with the % operator
SELECT column1, column1 << 1, column1 << 2, column1 % 2 FROM SEQSCAN table2;
The result is:
Tarantool can handle:
- integers anywhere in the 4-byte integer range
- approximate-numerics anywhere in the 8-byte IEEE floating point range
- any Unicode characters, with UTF-8 encoding and a choice of collations
Insert such values in a new table and see what happens when you select them
with arithmetic on a number column and ordering by a string column.
CREATE TABLE t6 (column1 INTEGER, column2 VARCHAR(10), column4 DOUBLE,
PRIMARY KEY (column1));
INSERT INTO t6 VALUES (-1234567890, 'АБВГД', 123456.123456);
INSERT INTO t6 VALUES (+1234567890, 'GD', 1e30);
INSERT INTO t6 VALUES (10, 'FADEW?', 0.000001);
INSERT INTO t6 VALUES (5, 'ABCDEFG', NULL);
SELECT column1 + 1, column2, column4 * 2 FROM SEQSCAN t6 ORDER BY column2;
The result is:
A view (or viewed table) is virtual: its rows aren’t physically stored
in the database; their values are calculated from other tables.
Create a view v3 based on t6 and select from it:
CREATE VIEW v3 AS SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0;
SELECT * FROM v3;
The result is:
By putting WITH + SELECT in front of a SELECT, you can make a
temporary view that lasts for the duration of the statement.
Create such a view and select from it:
WITH cte AS (
SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0)
SELECT * FROM cte;
The result is the same as the CREATE VIEW result:
Tarantool can handle statements like SELECT 55; (select without FROM)
like some other popular DBMSs. But it also handles the more standard statement
VALUES (expression [, expression ...]);.
SELECT 55 * 55, 'The rain in Spain';
VALUES (55 * 55, 'The rain in Spain');
The result of both these statements is:
You can execute SQL statements directly from the Lua code without switching to
the SQL input.
Change the settings so that the console accepts statements written in Lua instead
of statements written in SQL:
\set language lua
You can invoke SQL statements using the Lua function box.execute(string).
The result is:
Create a million-row table
To see how the SQL in Tarantool scales, create a bigger table.
The following Lua code generates one million rows with random data and
inserts them into a table. Copy this code into the Tarantool console and wait
a bit:
box.execute("CREATE TABLE tester (s1 INT PRIMARY KEY, s2 VARCHAR(10))");
-- Returns a random string of 10 uppercase Latin letters (character codes 65-90).
function string_function()
    local random_string = ""
    for _ = 1, 10 do
        random_string = random_string .. string.char(math.random(65, 90))
    end
    return random_string
end;
-- Inserts one million rows, building and executing one INSERT statement per row.
function main_function()
    for i = 1, 1000000 do
        local sql_statement = "INSERT INTO tester VALUES (" .. i .. ",'" .. string_function() .. "')"
        box.execute(sql_statement)
    end
end;
-- os.clock() measures the CPU time used by the process.
start_time = os.clock();
main_function();
end_time = os.clock();
print('insert done in ' .. end_time - start_time .. ' seconds');
The result is: you now have a table with a million rows, with a message saying
insert done in 88.570578 seconds. The exact time depends on the hardware.
Select from a million-row table
Check how SELECT works on the million-row table:
- the first query uses an index because s1 is the primary key
- the second query does not use an index
box.execute([[SELECT * FROM tester WHERE s1 = 73446;]]);
box.execute([[SELECT * FROM SEQSCAN tester WHERE s2 LIKE 'QFML%';]]);
The result is:
- the first statement completes instantaneously
- the second statement completes noticeably more slowly
To clean up all the objects created in this tutorial, switch to the SQL input
language again. Then run the DROP statements for all created tables, views,
and triggers.
These statements must be entered separately.
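Alternatively, while the console is still in Lua mode, the cleanup can be scripted with box.execute(), which returns nil and an error on failure. The sketch below lists the objects created in this tutorial in an order that respects their dependencies: the view before t6, the trigger before table3, and table1 and table5 before the tables they reference. The execute parameter is an addition for illustration so that the function can be tried outside Tarantool with a stub; it defaults to box.execute.

```lua
-- Objects created in this tutorial, in an order that respects dependencies:
-- the view before t6, the trigger before table3, and referencing tables
-- (table1 -> table5 -> table2) before the tables they reference.
local cleanup_statements = {
  "DROP VIEW v3;",
  "DROP TRIGGER tr;",
  "DROP TABLE table1;",
  "DROP TABLE table5;",
  "DROP TABLE table2;",
  "DROP TABLE table3;",
  "DROP TABLE table4;",
  "DROP TABLE t6;",
  "DROP TABLE tester;",
}

-- execute defaults to box.execute; a stub can be passed outside Tarantool.
local function cleanup(execute)
  execute = execute or box.execute
  for _, sql in ipairs(cleanup_statements) do
    local _, err = execute(sql)        -- each statement is executed separately
    if err ~= nil then
      error(string.format("%s failed: %s", sql, tostring(err)))
    end
  end
end
```

Assuming the ; delimiter set earlier is still active, paste the block followed by cleanup(); so that the local definitions and the call run as one console chunk.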
Transactions
Transactions allow users to perform multiple operations atomically.
For more information on how transactions work in Tarantool, see the following sections:
Transaction model
The transaction model of Tarantool complies with the ACID properties
(atomicity, consistency, isolation, durability).
Tarantool has two modes of transaction behavior:
- Default – suitable for fast monopolistic atomic transactions
- MVCC – designed for long-running concurrent transactions
Each transaction in Tarantool is executed in a single fiber on a single thread, sees a consistent database state
and commits all changes atomically.
All transaction changes are written to the WAL (Write Ahead Log)
in a single batch in a specific order at the time of the
commit.
If needed, transaction changes can also be rolled back, either completely or to
a specified savepoint.
Because of this execution model, every transaction in Tarantool has the highest
transaction isolation level, serializable, by default.
The exception is a failure while writing to the WAL, which can occur, for example, when the disk runs out of space.
In this case, the isolation level of a concurrent read transaction would be read-committed.
The MVCC mode provides several options that enable you to tune
the visibility behavior during transaction execution.
The read-committed isolation level makes visible the changes of all transactions that have started committing (box.commit() has been called).
Write transactions with reads
Manual usage of read-committed for write transactions with reads is completely safe, as such a transaction will eventually result in a commit.
If a previous transaction fails, this transaction will inevitably fail as well due to the serializable isolation level.
Read transactions
Manual usage of read-committed for read transactions may be unsafe, as it may lead to phantom reads.
The read-confirmed isolation level makes visible the changes of all transactions that have finished
committing (box.commit() has returned).
This means that the new data is already on disk or even on other replicas.
Read transactions
The use of read-confirmed is safe for read transactions given that data
is on disk (for asynchronous replication) or even in other replicas
(for synchronous replication).
Write transactions
To achieve serializability, any write transaction should read all data that has already been committed.
Otherwise, it may run into a conflict when it reaches its commit.
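The difference between read-committed and read-confirmed comes down to which commit stage a writer must reach before a reader sees its changes. The plain-Lua table below is a simplified model of the two rules described above, not Tarantool code; the state names are hypothetical, and best-effort and linearizable are deliberately left out.

```lua
-- Hypothetical model: a writer transaction is "running", "committing"
-- (box.commit() has been called) or "confirmed" (box.commit() has returned).
local visible_states = {
  ["read-committed"] = {committing = true, confirmed = true},
  ["read-confirmed"] = {confirmed = true},
}

-- Returns true if a reader with the given isolation level sees the writer's changes.
local function is_visible(isolation, writer_state)
  local states = visible_states[isolation]
  if states == nil then
    error("isolation level not covered by this model: " .. tostring(isolation))
  end
  return states[writer_state] == true
end

for _, level in ipairs({"read-committed", "read-confirmed"}) do
  for _, state in ipairs({"running", "committing", "confirmed"}) do
    print(level, state, is_visible(level, state))
  end
end
```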
Linearizability of read operations implies that if the response to a write request arrived before a read request was made, this read request should return the results of the write request.
When called with linearizable, box.begin() yields until the instance receives enough data from remote peers to be sure that the transaction is linearizable.
Linearizable transactions may only perform requests to the following memtx space types:
A linearizable transaction can fail with an error in the following cases:
- If the node can’t contact enough remote peers to determine which data is committed.
- If the data isn’t received within the timeout specified in box.begin().
Note
To start a linearizable transaction, the node should be the replication source for at least N - Q + 1 remote replicas.
Here N is the count of registered nodes in the cluster and Q is replication_synchro_quorum.
So, for example, you can’t perform a linearizable transaction on anonymous replicas because they can’t be the source of replication for other nodes.
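The N - Q + 1 condition from the note is simple arithmetic. The plain-Lua helpers below (hypothetical names) just evaluate it: for example, in a 3-node cluster with a quorum of 2, the node must be the replication source for at least 2 remote replicas, while an anonymous replica, which feeds no other nodes, never qualifies.

```lua
-- n: number of registered nodes in the cluster.
-- q: the replication_synchro_quorum value.
local function required_downstream_replicas(n, q)
  assert(q >= 1 and q <= n, "quorum must be between 1 and N")
  return n - q + 1
end

-- downstream: how many remote replicas replicate from this node.
local function can_start_linearizable(n, q, downstream)
  return downstream >= required_downstream_replicas(n, q)
end

print(required_downstream_replicas(3, 2))  -- 2
print(can_start_linearizable(3, 2, 0))     -- false, e.g. an anonymous replica
```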
To minimize the possibility of conflicts, MVCC uses what is called best-effort visibility:
This inevitably leads to the serializable isolation level.
Since MVCC cannot analyze the whole transaction in advance to make a decision, it makes the choice at
the first operation.
Note
If the serializable isolation level becomes unreachable, the transaction is marked as “conflicted”
and rolled back.
Thread model
The thread model assumes that a query received by Tarantool via network
is processed with three operating system threads:
The network thread (or threads)
on the server side receives the query, parses
the statement, checks if it is correct, and then transforms it into a special
structure – a message containing an executable statement and its options.
The network thread sends this message to the instance’s
transaction processor thread (TX thread) via a lock-free message bus.
Lua programs are executed directly in the transaction processor thread,
and do not need to be parsed and prepared.
The TX thread either uses a space index to find and update the tuple,
or executes a stored function that performs a data operation.
The operation results in a message to the
write-ahead logging (WAL) thread, which is used to commit
the transaction, and the fiber executing the transaction is suspended.
When the transaction results in a COMMIT or ROLLBACK, the following actions are taken:
- The WAL thread responds with a message to the TX thread.
- The fiber executing the transaction is resumed to process the result of the transaction.
- The result of the fiber execution is passed to the network thread,
and the network thread returns the result to the client.
Note
There is only one TX thread in Tarantool.
Some users are used to the idea that there can be multiple threads
working on the database. For example, thread #1 reads a row #x while
thread #2 writes a row #y. With Tarantool this does not happen.
Only the TX thread can access the database,
and there is only one TX thread for each Tarantool instance.
The TX thread can handle many fibers. A fiber is a set of computer
instructions that can contain “yield” signals.
The TX thread executes the instructions of a fiber up to a
yield signal, and then switches to execute the instructions of another fiber.
Yields must happen, otherwise the TX thread would
be permanently stuck on the same fiber.
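The cooperative switching described above can be imitated with plain Lua coroutines, which, like fibers, run until they explicitly yield. This is only a model: it uses the standard coroutine library rather than Tarantool's fiber module, and the round-robin queue is an assumption for illustration, not a description of Tarantool's scheduler.

```lua
-- Model only: Lua coroutines stand in for fibers, and a simple round-robin
-- queue stands in for the TX thread's scheduler.
local function make_fiber(name, steps)
  return coroutine.create(function()
    for i = 1, steps do
      -- ... do some work, then give control back to the scheduler
      coroutine.yield(name .. i)
    end
  end)
end

-- Runs each fiber until its next yield, then moves on to the next one.
local function run(fibers)
  local queue, trace = {}, {}
  for i, co in ipairs(fibers) do
    queue[i] = co
  end
  while #queue > 0 do
    local co = table.remove(queue, 1)
    local ok, step = coroutine.resume(co)
    assert(ok, step)
    if coroutine.status(co) == "suspended" then  -- yielded: schedule it again later
      trace[#trace + 1] = step
      queue[#queue + 1] = co
    end
  end
  return trace
end

local trace = run({make_fiber("a", 2), make_fiber("b", 3)})
print(table.concat(trace, " "))  -- a1 b1 a2 b2 b3
```

A fiber whose loop never yields would keep this scheduler inside that fiber until it finishes, which is the situation the text warns about.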
There are also several supplementary threads that serve additional capabilities:
For replication, Tarantool creates a separate thread for each connected replica.
This thread reads a write-ahead log and sends it to the replica, following its position in the log.
Separate threads are required because each replica can point to a different position in the log and can run at different speeds.
There is a thread pool for ad hoc asynchronous tasks, such as a DNS resolver or fsync.
There is a thread pool that can be used for parallel sorting (hence, to parallelize building indexes).
To configure it, use the memtx.sort_threads configuration option.
The option sets the number of threads used to sort keys of secondary indexes on loading a memtx database.
Note
Since 3.0.0, this option replaces the approach in which OpenMP threads are used to parallelize sorting.
For backward compatibility, the OMP_NUM_THREADS environment variable is taken into account to
set the number of sorting threads.
Transaction mode: default
By default, Tarantool does not allow “yielding” inside a memtx
transaction and the transaction manager is disabled. This allows fast
atomic transactions without conflicts, but brings some limitations:
- You cannot use interactive transactions.
- Any fiber yield leads to the abort of a transaction.
- All changes are made immediately, but in the event of a yield or error,
the transaction is rolled back, including the return of the previous data.
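The all-or-nothing behavior of this mode can be pictured with an undo log: each write is applied at once, and its previous value is saved so that it can be restored if the transaction is aborted. The sketch below is a hypothetical plain-Lua model (run_atomically and the txn helpers are illustrative names, not the box API); it treats a yield as an abort, as the default mode does.

```lua
-- Hypothetical model of the default memtx mode: writes are applied at once,
-- an undo log keeps the previous values, and a yield aborts the transaction.
local function run_atomically(store, body)
  local undo = {}
  local txn = {}

  function txn.replace(key, value)
    undo[#undo + 1] = {key = key, old = store[key]}  -- remember the previous value
    store[key] = value                               -- the change is made immediately
  end

  function txn.yield()
    error("transaction aborted by a yield", 0)       -- this mode does not allow yields
  end

  local ok, err = pcall(body, txn)
  if not ok then
    for i = #undo, 1, -1 do                          -- roll back, newest change first
      store[undo[i].key] = undo[i].old
    end
    return false, err
  end
  return true
end

local tickets = {[1] = 100}
print(run_atomically(tickets, function(txn)
  txn.replace(1, 429)
  txn.yield()
  txn.replace(2, 429)
end))
print(tickets[1], tickets[2])  -- 100  nil: the previous data is back
```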
To learn how to enable yielding inside a memtx transaction, see Transaction mode: MVCC.
To switch back to the default mode, disable the transaction manager:
box.cfg { memtx_use_mvcc_engine = false }
Transaction mode: MVCC
Since version 2.6.1,
Tarantool has another transaction behavior mode that
allows “yielding” inside a memtx transaction.
This is controlled by the transaction manager.
This mode allows concurrent transactions but may cause conflicts.
You can use this mode on the memtx storage engine.
The vinyl storage engine also supports MVCC mode,
but has a different implementation.
Note
Currently, you cannot use several different storage engines within one transaction.
The transaction manager is designed to isolate concurrent transactions
and provides a serializable
transaction isolation level.
It consists of two parts:
MVCC – multi version concurrency control engine, which stores all change actions of all
transactions. It also creates the transaction view of the database state and a read view
(a fixed state of the database that is never changed by other transactions) when necessary.
Conflict manager – a manager that tracks changes to transactions and determines their correctness
in the serialization order. The conflict manager declares transactions to be in conflict
or sends transactions to read views when necessary.
Since version 2.10.1, the conflict manager detects conflicts right after
the first one of several conflicting transactions is committed. After this moment, any CRUD operations
in the conflicted transaction will result in errors until the transaction is
rolled back.
The transaction manager also provides a non-classical snapshot isolation level. Unlike the classical snapshot,
which gives a transaction a consistent snapshot of the database as of the transaction's start time,
this snapshot is not necessarily tied to the start time of the transaction. The conflict manager decides if and when each transaction
gets which snapshot. This avoids some conflicts compared to the classic snapshot isolation approach.
Warning
Currently, the isolation level of BITSET and RTREE indexes
in MVCC transaction mode is read-committed (not serializable, as stated above).
If a transaction uses these indexes, it can read committed or confirmed data (depending on the isolation level).
However, the indexes are subject to different anomalies that can make them unserializable.
Enabling the transaction manager
By default, the transaction manager is disabled. Use the memtx_use_mvcc_engine
option to enable it via box.cfg.
box.cfg{memtx_use_mvcc_engine = true}
Setting the transaction isolation level
The transaction manager has the following options for the transaction isolation level:
best-effort (default)
read-committed
read-confirmed
linearizable (only for a specific transaction)
Using best-effort as the default option allows MVCC to consider the actions of transactions
independently and determine the best isolation level for them.
It increases the probability of successful completion of the transaction and helps to avoid possible conflicts.
To set another default isolation level, for example, read-committed, use the following command:
box.cfg { txn_isolation = 'read-committed' }
Note that the linearizable isolation level can’t be set as default and can be used for a specific transaction only.
You can set an isolation level for a specific transaction in its box.begin() call:
box.begin({ txn_isolation = 'best-effort' })
In box.begin(), you can also use the value default. It sets the transaction’s isolation level
to the one set in box.cfg.
Note
For autocommit transactions (single statements executed without explicit box.begin()/box.commit() calls),
the following rule applies:
- Read-only transactions (for example, select) are performed with read-confirmed.
- All other transactions (for example, replace) are performed with read-committed.
You can also set the isolation level in the net.box stream:begin() method
and IPROTO_BEGIN binary protocol request.
Choosing the better option depends on whether you have conflicts or not.
If you have many conflicts, you should set a different option or use
the default transaction mode.
Examples with MVCC enabled and disabled
Create a file init.lua, containing the following:
fiber = require 'fiber'
box.cfg{ listen = '127.0.0.1:3301', memtx_use_mvcc_engine = false }
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists = true})
tickets = box.schema.create_space('tickets', { if_not_exists = true })
tickets:format({
{ name = "id", type = "number" },
{ name = "place", type = "number" },
})
tickets:create_index('primary', {
parts = { 'id' },
if_not_exists = true
})
Connect to the instance using the tt connect command:
tt connect 127.0.0.1:3301
Then try to execute the transaction with yield inside:
box.atomic(function() tickets:replace{1, 429} fiber.yield() tickets:replace{2, 429} end)
You will receive an error message:
Also, if you leave a transaction open while returning from a request, you will get an error message:
Change memtx_use_mvcc_engine to true, restart Tarantool, and try again:
Now check if this transaction was successful:
Streams and interactive transactions
Since version 2.10.0, IPROTO implements streams and interactive
transactions that can be used when memtx_use_mvcc_engine
is enabled on the server.
A stream supports multiplexing several transactions over one connection.
Each stream has its own identifier, which is unique within the connection.
All requests with the same non-zero stream ID belong to the same stream.
All requests in a stream are executed strictly sequentially.
This allows the implementation of
interactive transactions.
If the stream ID of a request is 0, it does not belong to any stream and is
processed in the old way.
In net.box, a stream is an object above
the connection that has the same methods but allows sequential execution of requests.
The ID is automatically generated on the client side.
If a user writes their own connector and wants to use streams,
they must transmit the stream_id over the IPROTO protocol.
Unlike a thread, which involves multitasking and execution within a program,
a stream transfers data via the protocol between a client and a server.
An interactive transaction is one that does not need to be sent in a single request.
There are multiple ways to begin, commit, and roll back a transaction, and they can be mixed.
You can use stream:begin(), stream:commit(),
stream:rollback() or the appropriate stream methods
– call, eval, or execute – using the SQL transaction syntax.
Let’s create a Lua client (client.lua) and run it with Tarantool:
local net_box = require 'net.box'
local conn = net_box.connect('127.0.0.1:3301')
local conn_tickets = conn.space.tickets
local yaml = require 'yaml'
local stream = conn:new_stream()
local stream_tickets = stream.space.tickets
-- Begin transaction over an iproto stream:
stream:begin()
print("Replaced in a stream\n".. yaml.encode( stream_tickets:replace({1, 768}) ))
-- Empty select, the transaction was not committed.
-- You can't see it from the requests that do not belong to the
-- transaction.
print("Selected from outside of transaction\n".. yaml.encode(conn_tickets:select({}, {limit = 10}) ))
-- Select returns the previously inserted tuple
-- because this select belongs to the transaction:
print("Selected from within transaction\n".. yaml.encode(stream_tickets:select({}, {limit = 10}) ))
-- Commit transaction:
stream:commit()
-- Now this select also returns the tuple because the transaction has been committed:
print("Selected again from outside of transaction\n".. yaml.encode(conn_tickets:select({}, {limit = 10}) ))
os.exit()
Then run it. The output is as follows:
Replaced in a stream
--- [1, 768]
...
Selected from outside of transaction
---
- [1, 429]
- [2, 429]
...
Selected from within transaction
---
- [1, 768]
- [2, 429]
...
Selected again from outside of transaction
---
- [1, 768]
- [2, 429]
...
Replication
Replication allows multiple Tarantool instances to work on copies of the same
databases. The databases are kept in sync because each instance can communicate
its changes to all the other instances.
This section includes the following topics:
For practical guides to replication, see Replication tutorials.
You can learn about bootstrapping a replica set, adding instances to the replica set, or removing them.
Replication architecture
A group of instances that operate on copies of the same databases makes up a replica set.
Each instance in a replica set has a role: master or replica.
A replica gets all updates from the master by continuously fetching and applying
its write-ahead log (WAL). Each record in the WAL represents a single
Tarantool data-change request such as INSERT,
UPDATE, or DELETE, and is assigned
a monotonically growing log sequence number (LSN). In essence, Tarantool
replication is row-based: each data-change request is fully deterministic
and operates on a single tuple. However, unlike a classical row-based log, which
contains entire copies of the changed rows, Tarantool’s WAL contains copies of the requests.
For example, for UPDATE requests, Tarantool only stores the primary key of the row and
the update operations to save space.
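The space saving comes from logging the request instead of the resulting row. In the hypothetical plain-Lua model below, a log record holds only a primary key and a list of update operations written in the {operator, field_number, argument} form used by Tarantool's space_object:update(). Replaying the same record on a master and a replica that start from the same tuple yields the same result, which is what deterministic requests guarantee. The record layout is an illustration, not Tarantool's actual WAL format.

```lua
-- Hypothetical log record: primary key plus update operations only.
local record = {key = 1, ops = {{"+", 3, 5}, {"=", 2, "sold"}}}

-- Applies one logged UPDATE request to a space modeled as {[pk] = tuple}.
local function apply_update(space, rec)
  local tuple = assert(space[rec.key], "no tuple with key " .. tostring(rec.key))
  for _, op in ipairs(rec.ops) do
    local operator, field, arg = op[1], op[2], op[3]
    if operator == "=" then
      tuple[field] = arg
    elseif operator == "+" then
      tuple[field] = tuple[field] + arg
    elseif operator == "-" then
      tuple[field] = tuple[field] - arg
    else
      error("operator not covered by this sketch: " .. tostring(operator))
    end
  end
end

-- Master and replica start from identical copies of the same tuple.
local master = {[1] = {1, "free", 10}}
local replica = {[1] = {1, "free", 10}}

apply_update(master, record)   -- executed on the master
apply_update(replica, record)  -- replayed from the log on the replica

print(table.concat(master[1], ", "))  -- 1, sold, 15
```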
Note
WAL extensions available in Tarantool Enterprise Edition enable you to add auxiliary information to each write-ahead log record.
This information might be helpful for implementing a CDC (Change Data Capture) utility that transforms a data replication stream.
The following are specifics of adding different types of information to the WAL:
- Invocations of stored programs are not written to the WAL.
Instead, records of the actual data-change requests, performed by the Lua code, are written to the WAL.
This ensures that the possible non-determinism of Lua does not cause replication to go out of sync.
- Data definition operations on temporary spaces (created with
temporary = true), such as creating/dropping, adding indexes, and truncating, are written to the WAL, since information about temporary spaces is stored in non-temporary system spaces, such as box.space._space.
- Data change operations on temporary spaces are not written to the WAL and are not replicated.
- Data change operations on replication-local spaces (created with
is_local = true) are written to the WAL but are not replicated.
To learn how to enable replication, check the Bootstrapping a replica set guide.
To create a valid initial state, to which WAL changes can be applied, every instance of a replica set requires a start set of checkpoint files, such as .snap files for memtx and .run files for vinyl.
A replica goes through the following stages:
Bootstrap (optional)
When an entire replica set is bootstrapped for the first time, there is no master that could provide the initial checkpoint.
In such a case, replicas connect to each other and elect a master.
The master creates the starting set of checkpoint files and distributes them to all the other replicas.
This is called an automatic bootstrap of a replica set.
Join
At this stage, a replica downloads the initial state from the master.
The master registers this replica in the box.space._cluster space.
If join fails with a non-critical error, for example, ER_READONLY, ER_ACCESS_DENIED, or a network-related issue, an instance tries to find a new master to join.
Note
On subsequent connections, a replica downloads all changes that happened after its latest local LSN (there can be many LSNs: each master has its own LSN).
Follow
At this stage, a replica fetches and applies updates from the master’s write-ahead log.
You can use the box.info.replication[n].upstream.status property to monitor the status of a replica.
Replica set and instance UUIDs
Each replica set is identified by a globally unique identifier, called the replica set UUID.
The identifier is created by the master that creates the very first checkpoint, and it is part of the checkpoint file. It is stored in the box.space._schema system space, for example:
Additionally, each instance in a replica set is assigned its own UUID, when it
joins the replica set. It is called an instance UUID and is a globally unique
identifier. The instance UUID is checked to ensure that instances do not join a different
replica set, e.g. because of a configuration error. A unique instance identifier
is also necessary to apply rows originating from different masters only once,
that is, to implement multi-master replication. This is why each row in the write-ahead log,
in addition to its log sequence number, stores the instance identifier
of the instance on which it was created. But using a UUID as such an identifier
would take too much space in the write-ahead log, so a shorter integer number
is assigned to the instance when it joins a replica set. This number is then
used to refer to the instance in the write-ahead log. It is called
instance ID. All identifiers are stored in the system space
box.space._cluster, for example:
Here the instance ID is 1 (unique within the replica set), and the instance
UUID is 88580b5c-4474-43ab-bd2b-2409a9af80d2 (globally unique).
Using instance IDs is also handy for tracking the state of the entire
replica set. For example, box.info.vclock
describes the state of replication with regard to each connected peer.
Here vclock contains log sequence numbers (827 and 584) for instances with
instance IDs 1 and 2.
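A minimal plain-Lua sketch of how two vclocks of this shape (tables keyed by instance ID) can be compared. `vclock_behind` is a hypothetical helper, and the second vclock is made-up data for a replica that lags behind; neither comes from Tarantool itself.

```lua
-- Hypothetical helper: for each instance ID in `reference`, report how many
-- rows originating from that instance are missing from `other`.
local function vclock_behind(reference, other)
  local behind = {}
  for id, lsn in pairs(reference) do
    local applied = other[id] or 0
    if lsn > applied then
      behind[id] = lsn - applied
    end
  end
  return behind
end

-- The vclock from the example above, and an illustrative vclock of a replica
-- that has not yet applied the last 7 rows from instance 1.
local master_vclock = { [1] = 827, [2] = 584 }
local replica_vclock = { [1] = 820, [2] = 584 }

for id, rows in pairs(vclock_behind(master_vclock, replica_vclock)) do
  print(('instance %d: %d rows behind'):format(id, rows))
end
```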
If required, you can explicitly specify the instance and the replica set UUID values rather than letting Tarantool generate them.
To learn more, see the replicaset_uuid configuration parameter description.
Replication roles: master and replica
The replication role (master or replica) is set by the
read_only configuration parameter. The recommended
role is “read_only” (replica) for all but one instance in the replica set.
In a master-replica configuration, every change that happens on the master will
be visible on the replicas, but not vice versa.

A simple two-instance replica set with the master on one machine and the replica
on a different machine provides two benefits:
- failover, because if the master goes down, then the replica can take over,
and
- load balancing, because clients can connect to either the master or the
replica for read requests.
In a master-master configuration (also called “multi-master”), every change
that happens on either instance will be visible on the other one.

The failover benefit in this case is still present, and the load-balancing
benefit is enhanced, because any instance can handle both read and write
requests. Meanwhile, for multi-master configurations, it is necessary to
understand the replication guarantees provided by the asynchronous protocol
that Tarantool implements.
Tarantool multi-master replication guarantees that each change on each master is
propagated to all instances and is applied only once. Changes from the same
instance are applied in the same order as on the originating instance. Changes
from different instances, however, can be mixed and applied in a different order on
different instances. This may lead to replication going out of sync in certain
cases.
For example, assuming the database is only appended to (i.e. it contains only
insertions), a multi-master configuration is safe. If there are also
deletions, but it is not mission critical that deletion happens in the same
order on all replicas (e.g. the DELETE is used to prune expired data),
a master-master configuration is also safe.
UPDATE operations, however, can easily go out of sync. For example, assignment
and increment are not commutative and may yield different results if applied
in a different order on different instances.
More generally, it is only safe to use Tarantool master-master replication if
all database changes are commutative: the end result does not depend on the
order in which the changes are applied. A good starting point for learning more
is the topic of conflict-free replicated data types (CRDTs).
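The following plain-Lua sketch illustrates the ordering problem by replaying the same updates in two different orders, as two masters might. The `assign` and `add` operations loosely correspond to the `'='` and `'+'` update operators; `apply` and `replay` are hypothetical helpers.

```lua
-- Apply one update operation to a field value.
local function apply(value, op)
  if op.kind == 'assign' then
    return op.arg
  elseif op.kind == 'add' then
    return value + op.arg
  end
  error('unknown operation: ' .. tostring(op.kind))
end

-- Apply a sequence of operations, in order, starting from `initial`.
local function replay(initial, ops)
  local value = initial
  for _, op in ipairs(ops) do
    value = apply(value, op)
  end
  return value
end

local assign10 = { kind = 'assign', arg = 10 }
local add1 = { kind = 'add', arg = 1 }
local add5 = { kind = 'add', arg = 5 }

-- Assignment and increment do not commute: the instances diverge.
print(replay(0, { assign10, add1 }), replay(0, { add1, assign10 }))
-- Two increments commute: both orders give the same result.
print(replay(0, { add1, add5 }), replay(0, { add5, add1 }))
```

The first pair ends with 11 on one instance and 10 on the other, while both orders of the two increments end with 6.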
Replication topologies: cascade, ring, and full mesh
Replication topology is set by the replication
configuration parameter. The recommended topology is a full mesh because it
makes potential failover easy.
Some database products offer cascading replication topologies: creating a
replica on a replica. Tarantool does not recommend such a setup.

The problem with a cascading replica set is that some instances have no
connection to other instances and may not receive changes from them. One
essential change that must be propagated across all instances in a replica set
is an entry in the box.space._cluster system space with the instance UUID.
Without knowing the instance UUID, a master refuses to accept connections from
such instances when replication topology changes. Here is how this can happen:

We have a chain of three instances. Instance #1 contains entries for instances
#1 and #2 in its _cluster space. Instances #2 and #3 contain entries for
instances #1, #2, and #3 in their _cluster spaces.

Now instance #2 is faulty. Instance #3 tries connecting to instance #1 as its
new master, but the master refuses the connection since it has no entry for
instance #3.
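The cascade failure can be modeled in a few lines of plain Lua. This is a simplified, hypothetical model: each instance's `_cluster` space is reduced to a set of instance numbers, and `accepts` only checks registration, ignoring everything else a real master verifies.

```lua
-- Hypothetical model: each instance's _cluster space, reduced to the set of
-- registered instance numbers, for the chain #1 -> #2 -> #3 described above.
local cluster_entries = {
  [1] = { [1] = true, [2] = true },
  [2] = { [1] = true, [2] = true, [3] = true },
  [3] = { [1] = true, [2] = true, [3] = true },
}

-- A master accepts a replica only if the replica is registered in the
-- master's own _cluster space.
local function accepts(master, replica)
  return cluster_entries[master][replica] == true
end

print(accepts(2, 3)) -- the original upstream of #3
print(accepts(1, 3)) -- the new upstream after #2 fails
```

Here `accepts(2, 3)` is true and `accepts(1, 3)` is false, which is the situation instance #3 runs into after instance #2 fails.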
Ring replication topology is, however, supported:

So, if you need a cascading topology, you may first create a ring to ensure all
instances know each other’s UUID, and then disconnect the chain in the place you
desire.
A stock recommendation for a master-master replication topology, however, is a
full mesh:

You then can decide where to locate instances of the mesh – within the same
data center, or spread across a few data centers. Tarantool will automatically
ensure that each row is applied only once on each instance. To remove a degraded
instance from a mesh, simply change the replication configuration parameter.
This ensures full cluster availability in case of a local failure, e.g. one of
the instances failing in one of the data centers, as well as in case of an
entire data center failure.
The maximum number of replicas in a mesh is 32.
Synchronous replication
By default, replication in Tarantool is asynchronous: if a transaction
is committed locally on a master node, it does not mean it is replicated onto any
replicas. If a master reports success to a client and then fails, after failover
to a replica the transaction disappears from the client’s point of view.
Synchronous replication exists to solve this problem. Synchronous transactions
are not considered committed, and the client gets no response, until they are
replicated onto some number of replicas.
To enable synchronous replication, use the space_opts.is_sync option when creating or altering a space.
Synchronous and asynchronous transactions
A key feature of Tarantool’s synchronous replication is that it is enabled per space.
So, if you need it only for some critical data changes, you don’t pay for
it in performance when working with other data.
When there is more than one synchronous transaction, they all wait to be
replicated. Moreover, if an asynchronous transaction appears, it will
also be blocked by the existing synchronous transactions. This behavior is very
similar to a regular queue of asynchronous transactions because all the transactions
are committed in the same order as they make the box.commit() call.
So, here comes the commit rule:
transactions are committed in the same order as they make
the box.commit() call, regardless of whether they are synchronous or asynchronous.
If one of the waiting synchronous transactions times out and is rolled back, it
first rolls back all the newer pending transactions, just as
asynchronous transactions are rolled back when a WAL write fails.
So, here comes the rollback rule:
transactions are always rolled back in the reverse order of their
box.commit() calls, regardless of whether they are synchronous or asynchronous.
One more important thing is that if an asynchronous transaction is blocked by
a synchronous transaction, it does not become synchronous as well.
This just means it will wait for the synchronous transaction to be committed.
But once it is done, the asynchronous transaction is committed
immediately: it doesn’t wait to be replicated itself.
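The commit and rollback rules can be sketched as a single queue in plain Lua. This is a hypothetical model for illustration, not Tarantool's internal implementation: `new_queue`, `confirm`, and `timeout` are made-up names, and quorum collection is reduced to an explicit `confirm` call.

```lua
-- Hypothetical model of the commit and rollback rules (not Tarantool
-- internals). Transactions enter the queue in box.commit() order.
local Queue = {}
Queue.__index = Queue

local function new_queue()
  return setmetatable({ pending = {}, committed = {}, rolled_back = {} }, Queue)
end

-- Commit transactions from the head of the queue while they are ready.
-- A transaction that still waits for replication blocks everything behind it.
function Queue:advance()
  while #self.pending > 0 and self.pending[1].ready do
    table.insert(self.committed, table.remove(self.pending, 1).name)
  end
end

-- Asynchronous transactions are ready at once; synchronous ones become
-- ready only when a quorum of replicas confirms them.
function Queue:push(name, is_sync)
  table.insert(self.pending, { name = name, ready = not is_sync })
  self:advance()
end

function Queue:confirm(name)
  for _, txn in ipairs(self.pending) do
    if txn.name == name then txn.ready = true end
  end
  self:advance()
end

-- A timed-out synchronous transaction is rolled back together with every
-- newer pending transaction, newest first. Assumes `name` is pending.
function Queue:timeout(name)
  while #self.pending > 0 do
    local txn = table.remove(self.pending)
    table.insert(self.rolled_back, txn.name)
    if txn.name == name then break end
  end
end

local q = new_queue()
q:push('sync1', true)
q:push('async1', false) -- blocked behind sync1
q:confirm('sync1')      -- both commit, in box.commit() order
print(table.concat(q.committed, ', '))
```

In the usage example, `async1` commits right after `sync1` is confirmed, without waiting for its own replication.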
Warning
Be careful when using synchronous and asynchronous transactions together.
Asynchronous transactions are considered committed even if there is no connection to other nodes.
Therefore, an old leader node (synchronous transaction queue owner) might have some
committed asynchronous transactions that no other replica set member has.
When the connection to such an old (previous) leader node is restored, it starts receiving data from the new leader.
At the same time, other replica set members receive the data from the previous leader that they don’t have yet.
The data from the previous leader contains some committed asynchronous transactions.
At this time, the integrity protection will throw
the ER_SPLIT_BRAIN error, which will force the user to rebootstrap the previous leader.
Limitations and known problems
Until version 2.5.2,
there was no way to enable synchronous replication for
existing spaces, but since 2.5.2 it can be enabled by
space_object:alter({is_sync = true}).
Synchronous transactions work only for master-replica topology. You can have multiple
replicas, including anonymous ones, but only one node can make synchronous transactions.
Since Tarantool 2.10.0, anonymous replicas do not participate in the quorum.
Starting from version 2.6.1,
Tarantool has built-in functionality for
managing automated leader election in a replica set. For more information,
refer to the corresponding chapter.
Automated leader election
Starting from version 2.6.1,
Tarantool has built-in functionality for
managing automated leader election in a replica set.
This functionality increases the fault tolerance of systems built
on top of Tarantool and decreases
dependency on external tools for replica set management.
To learn how to configure and monitor automated leader elections,
check Managing leader elections.
Leader election and synchronous replication
Leader election and synchronous replication are implemented in Tarantool as
a modification of the Raft
algorithm.
Raft is an algorithm of synchronous replication and automatic leader election.
Its complete description can be found in the corresponding document.
In Tarantool, synchronous replication and leader election
are supported as two separate subsystems.
So it is possible to get synchronous replication
but use an alternative algorithm for leader election.
Conversely, you can elect a leader
in the cluster without using synchronous spaces at all.
Synchronous replication has a separate documentation section.
Leader election is described below.
Note
The system behavior can be specified exactly according to the Raft algorithm. To do this:
Automated leader election in Tarantool helps guarantee that
there is at most one leader at any given moment of time in a replica set.
A leader is a writable node, and all other nodes are non-writable –
they accept read-only requests exclusively.
When the election is enabled, the life cycle of
a replica set is divided into so-called
terms. Each term is described by a monotonically growing number.
After the first boot, each node has its term equal to 1. When a node sees that
it is not a leader and there is no leader available for some time in the replica
set, it increases the term and starts a new leader election round.
Leader election happens via votes. The node that started the election votes
for itself and sends vote requests to other nodes.
Upon receiving vote requests, a node votes for the first of them, and then cannot
do anything in the same term but wait for a leader to be elected.
The node that collected a quorum of votes defined by the replication.synchro_quorum parameter
becomes the leader
and notifies other nodes about that. Also, a split vote can happen
when no nodes received a quorum of votes. In this case,
after a random timeout,
each node increases its term and starts a new election round if no new vote
request with a greater term arrives during this time.
Eventually, a leader is elected.
If any unfinalized synchronous transactions are left from the previous leader,
the new leader finalizes them automatically.
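The vote counting step can be sketched in plain Lua as follows. The model is hypothetical and simplified: `election_winners` takes one term's votes as a voter-to-candidate table and ignores terms, timeouts, and data freshness.

```lua
-- Hypothetical model of one election round. `votes` maps each voter to the
-- candidate it voted for (every node votes at most once per term).
local function election_winners(votes, quorum)
  local tally = {}
  for _, candidate in pairs(votes) do
    tally[candidate] = (tally[candidate] or 0) + 1
  end
  local winners = {}
  for candidate, count in pairs(tally) do
    if count >= quorum then
      table.insert(winners, candidate)
    end
  end
  table.sort(winners)
  return winners
end

-- Three nodes, quorum 2: node1 and node2 vote for node1, node3 votes for itself.
local won = election_winners({ node1 = 'node1', node2 = 'node1', node3 = 'node3' }, 2)
-- Split vote: every node votes for itself, so nobody reaches the quorum and
-- the nodes start a new round after a random timeout.
local split = election_winners({ node1 = 'node1', node2 = 'node2', node3 = 'node3' }, 2)
print(#won, won[1], #split)
```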
All the non-leader nodes are called followers. The nodes that start a new
election round are called candidates. The elected leader sends heartbeats to
the non-leader nodes to let them know it is alive.
If there are no heartbeats for replication.timeout * 4 seconds,
a non-leader node starts a new election if the following conditions are met:
- The node has a quorum of connections to other cluster members.
- None of these cluster members can see the leader node.
Note
A cluster member considers the leader node to be alive if the member received heartbeats from the leader at least
once during the replication.timeout * 4,
and there are no replication errors (the connection is not broken due to timeout or due to an error).
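The conditions for starting a new election can be expressed as one predicate. This plain-Lua helper is hypothetical; it does not model replication errors, and counting the node itself toward the quorum of connections is an assumption of the sketch.

```lua
-- Hypothetical helper (not a Tarantool API) mirroring the rules above.
-- node.peers lists other cluster members as { connected = bool, sees_leader = bool }.
local function should_start_election(node)
  -- The leader counts as alive if it sent a heartbeat within 4 * timeout.
  if node.now - node.last_leader_heartbeat <= 4 * node.timeout then
    return false
  end
  local connections = 1 -- assumption: the node counts itself toward the quorum
  for _, peer in ipairs(node.peers) do
    if peer.connected then
      connections = connections + 1
      if peer.sees_leader then
        return false -- a connected member still sees the leader
      end
    end
  end
  return connections >= node.quorum
end

local node = {
  now = 10, last_leader_heartbeat = 5, timeout = 1, quorum = 2,
  peers = { { connected = true, sees_leader = false }, { connected = false } },
}
print(should_start_election(node))
```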
Terms and votes are persisted by each instance to preserve certain Raft guarantees.
During the election, the nodes prefer to vote for those that have the
newest data. This way, if an old leader managed to send some data to a quorum
of replicas before failing, that data is not lost.
When election is enabled, every pair of nodes must be connected, that is,
the topology must be a full mesh. This is needed
because election messages for voting and other internal things need a direct
connection between the nodes.
In the classic Raft algorithm, a leader doesn’t track its connectivity to the rest of the cluster.
Once the leader is elected, it considers itself in the leader position until receiving a new term from another cluster node.
This can lead to a split situation if the other nodes elect a new leader upon losing the connectivity to the previous one.
The issue is resolved in Tarantool version 2.10.0 by introducing the leader fencing mode.
The mode can be switched by the replication.election_fencing_mode configuration parameter.
When the fencing is set to soft or strict, the leader resigns its leadership if it has fewer than
replication.synchro_quorum alive connections to the cluster nodes.
The resigning leader receives the status of a follower in the current election term and becomes read-only.
Leader fencing can be turned off by setting the replication.election_fencing_mode configuration parameter to off.
In soft mode, a connection is considered dead if there are no responses for
4 * replication.timeout seconds both on the current leader and the followers.
In strict mode, a connection is considered dead if there are no responses
for 2 * replication.timeout seconds on the current leader and for
4 * replication.timeout seconds on the followers.
This improves chances that there is only one leader at any time.
Fencing applies to the instances that have the replication.election_mode set to “candidate” or “manual”.
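The fencing timeouts and the resignation rule can be summarized in plain Lua. Both helpers are hypothetical; `alive_connections` is compared with the quorum exactly as the text describes, without trying to reproduce how Tarantool counts connections internally.

```lua
-- Hypothetical helpers (not a Tarantool API) mirroring the fencing rules above.
-- Seconds without responses after which a connection is considered dead.
local function dead_after(fencing_mode, role, timeout)
  assert(fencing_mode == 'soft' or fencing_mode == 'strict',
         'dead_after() models only the soft and strict modes')
  if fencing_mode == 'strict' and role == 'leader' then
    return 2 * timeout
  end
  return 4 * timeout
end

-- Whether a leader resigns and becomes a read-only follower.
local function leader_resigns(election_mode, fencing_mode, alive_connections, quorum)
  local fencing_applies = election_mode == 'candidate' or election_mode == 'manual'
  return fencing_applies and fencing_mode ~= 'off' and alive_connections < quorum
end

print(dead_after('soft', 'leader', 1), dead_after('strict', 'leader', 1),
      dead_after('strict', 'follower', 1))
```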
There can still be a situation when a replica set has two leaders working independently (so-called split-brain).
It can happen, for example, if a user mistakenly lowered the replication.synchro_quorum below N / 2 + 1.
In this situation, to preserve the data integrity, if an instance detects the split-brain anomaly in the incoming replication data,
it breaks the connection with the instance sending the data and writes the ER_SPLIT_BRAIN error in the log.
Eventually, there will be two sets of nodes with the diverged data,
and any node from one set is disconnected from any node from the other set with the ER_SPLIT_BRAIN error.
Once noticing the error, a user can choose any representative from each of the sets and inspect the data on them.
To reconcile the data, the user should remove it from the nodes of one set,
and reconnect them to the nodes from the other set that have the correct data.
Also, if election is enabled on the node, it doesn’t replicate from any nodes except
the newest leader. This is done to avoid the issue when a new leader is elected,
but the old leader has somehow survived and tries to send more changes
to the other nodes.
Term numbers also work as a kind of filter.
For example, if election is enabled on two nodes and node1 has the term number less than node2,
then node2 doesn’t accept any transactions from node1.
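A sketch of the two filters above in plain Lua, for a receiving node with election enabled. `accepts_rows_from` and the node tables are hypothetical; they keep only the leader ID and term that the rules refer to.

```lua
-- Hypothetical model (not a Tarantool API) of the two filters described
-- above, for a receiving node with election enabled.
local function accepts_rows_from(receiver, sender)
  -- With election enabled, a node replicates only from the newest leader.
  if sender.id ~= receiver.leader then
    return false
  end
  -- Rows from a node with an older term are rejected.
  return sender.term >= receiver.term
end

local node2 = { term = 5, leader = 3 }
print(accepts_rows_from(node2, { id = 1, term = 4 })) -- a surviving old leader
print(accepts_rows_from(node2, { id = 3, term = 5 })) -- the current leader
```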
Managing leader elections
Leader election is controlled by the following options in the replication section:
replication:
  election_mode: <string>
  election_fencing_mode: <string>
  election_timeout: <seconds>
  timeout: <seconds>
  synchro_quorum: <count>
It is important to know that being a leader is not the only requirement for a node to be writable.
The leader should also satisfy the following requirements:
- The database.mode option is set to
rw.
- The leader shouldn’t be in the orphan state.
Nothing prevents you from setting the database.mode option to ro,
but the leader won’t be writable then. The option doesn’t affect the
election process itself, so a read-only instance can still vote and become
a leader.
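The writability conditions can be combined into one check. `is_writable` is a hypothetical plain-Lua helper; on a running instance, its inputs would roughly correspond to `box.info.election.state`, the configured `database.mode`, and `box.info.status`, which reports `orphan` for an orphan instance.

```lua
-- Hypothetical helper (not a Tarantool API): a node accepts writes only when
-- all three conditions described above hold.
local function is_writable(node)
  return node.election_state == 'leader'
    and node.database_mode == 'rw'
    and node.status ~= 'orphan'
end

-- A leader with database.mode set to ro can win elections but stays read-only.
local read_only_leader = { election_state = 'leader', database_mode = 'ro', status = 'running' }
print(is_writable(read_only_leader))
```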
To monitor the current state of a node regarding the leader election, use box.info.election.
Example:
tarantool> box.info.election
---
- state: follower
  vote: 0
  leader: 0
  term: 1
...
The Raft-based election implementation logs all its actions
with the RAFT: prefix. The actions are new Raft message handling,
node state changing, voting, and term bumping.
Leader election doesn’t work correctly if the election quorum is set to less than or equal
to <cluster size> / 2. In that case, a split vote can lead to
a state when two leaders are elected at once.
For example, suppose there are five nodes. When the quorum is set to 2, node1
and node2 can both vote for node1. node3 and node4 can both vote
for node5. In this case, node1 and node5 both win the election.
When the quorum is set to the cluster majority, that is
(<cluster size> / 2) + 1 or greater, the split vote is impossible.
Keep this in mind when adding new nodes.
If the majority value is changing, it’s better to update the quorum on all the existing nodes
before adding a new one.
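The quorum arithmetic can be checked with a short plain-Lua sketch. `majority` and `split_vote_possible` are hypothetical helpers: two disjoint groups of voters can each collect the quorum exactly when `2 * quorum <= n`, which is the same condition as a quorum of at most `<cluster size> / 2`.

```lua
-- Majority quorum for a cluster of n voting nodes.
local function majority(n)
  return math.floor(n / 2) + 1
end

-- Two disjoint groups of voters can each reach the quorum (two leaders in
-- one term) exactly when two quorums fit into the cluster: 2 * quorum <= n.
local function split_vote_possible(quorum, n)
  return 2 * quorum <= n
end

print(split_vote_possible(2, 5)) -- the five-node example above
print(split_vote_possible(majority(5), 5))
-- Growing from 5 to 6 nodes raises the majority from 3 to 4, so the quorum
-- should be updated on the existing nodes before the sixth node is added.
print(majority(5), majority(6))
```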
Also, the automated leader election doesn’t bring many benefits in terms of data
safety when used without synchronous replication.
If the replication is asynchronous and a new leader gets elected,
the old leader is still active and considers itself the leader.
In such a case, nothing stops
it from accepting requests from clients and making transactions.
Non-synchronous transactions are successfully committed because
they are not checked against the quorum of replicas.
Synchronous transactions fail because they are not able
to collect the quorum – most of the replicas reject
the old leader’s transactions since it is not a leader anymore.
Supervised failover
Example on GitHub: supervised_failover
Tarantool provides the ability to control leadership in a replica set using an external failover coordinator.
A failover coordinator reads a cluster configuration from a file or an etcd-based configuration storage, polls instances for their statuses, and appoints a leader for each replica set depending on the availability and health of instances.
To increase fault tolerance, you can run two or more failover coordinators.
In this case, an etcd cluster provides synchronization between coordinators.
The main steps of using an external failover coordinator for a newly configured cluster might look as follows:
- Configure a cluster to work with an external coordinator.
The main step is setting the
replication.failover option to supervised for all replica sets that should be managed by the external coordinator.
- Start a configured cluster.
While an external coordinator is not yet running, instances in a replica set start in the following modes:
- If a replica set is already bootstrapped, all instances are started in read-only mode.
- If a replica set is not bootstrapped, one instance is started in read-write mode.
- Start a failover coordinator.
You can start two or more failover coordinators to increase fault tolerance.
In this case, one coordinator is active and others are passive.
Once a cluster and failover coordinators are up and running, a failover coordinator appoints one instance to be a master if there is no master instance in a replica set.
Then, the following events may occur:
- If a master instance fails, a failover coordinator performs an automated failover.
- If an active failover coordinator fails, another coordinator becomes active and performs an automated failover.
Note
A failover coordinator doesn’t work with replica sets that have two or more read-write instances.
In this case, a coordinator logs a warning to stdout and doesn’t perform any appointments.
Appointing a new master instance
After a master instance has been appointed, a failover coordinator monitors the statuses of all instances in a replica set by sending requests every probe_interval seconds.
For the master instance, the coordinator maintains a read-write mode deadline, which is renewed every renew_interval seconds.
If all attempts to renew the deadline fail during the specified time interval (lease_interval), the master switches to read-only mode.
Then, the coordinator appoints a new instance as the master.
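The lease mechanism can be modeled as a deadline that is pushed forward on each successful renewal. This plain-Lua sketch is hypothetical and not the coordinator's implementation; in particular, whether the switch happens exactly at the deadline is an assumption. The interval values match the failover configuration example in this section.

```lua
-- Hypothetical model (not the coordinator's implementation): the master stays
-- read-write until lease_interval seconds pass after the last successful renewal.
local function master_mode(now, last_renewal, lease_interval)
  if now - last_renewal < lease_interval then
    return 'rw'
  end
  return 'ro'
end

-- With renew_interval = 5 and lease_interval = 15, a coordinator that stops
-- renewing after t = 0 leaves the master writable until t = 15.
local renew_interval, lease_interval = 5, 15
for t = 0, 15, renew_interval do
  print(t, master_mode(t, 0, lease_interval))
end
```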
If a remote etcd-based storage is used to maintain the state of failover coordinators, you can also perform a manual failover.
Active and passive coordinators
To increase fault tolerance, you can run two or more failover coordinators.
In this case, only one coordinator is active and used to control leadership in a replica set.
Other coordinators are passive and don’t perform any read-write appointments.
To maintain the state of coordinators, Tarantool uses a stateboard – a remote etcd-based storage.
This storage uses the same connection settings as a centralized etcd-based configuration storage.
If a cluster configuration is stored in the <prefix>/config/* keys in etcd, the failover coordinator looks into <prefix>/failover/* for its state.
Here are a few examples of keys used for different purposes:
<prefix>/failover/info/by-uuid/<uuid>: contains a state of a failover coordinator identified by the specified uuid.
<prefix>/failover/active/lock: a unique identifier (UUID) of an active failover coordinator.
<prefix>/failover/active/term: a fencing token that orders coordinators by the time they became active (took the lock).
<prefix>/failover/command/<id>: a key used to perform a manual failover.
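For illustration, the stateboard key layout above can be generated from a prefix. `failover_keys` is a hypothetical helper, and the `/myapp` prefix, the coordinator UUID placeholder, and the command ID are made up.

```lua
-- Hypothetical helper: builds the stateboard keys listed above for a given
-- etcd prefix. The prefix, UUID placeholder, and command ID below are made up.
local function failover_keys(prefix, coordinator_uuid, command_id)
  local base = prefix .. '/failover'
  return {
    config = prefix .. '/config/',
    info = base .. '/info/by-uuid/' .. coordinator_uuid,
    lock = base .. '/active/lock',
    term = base .. '/active/term',
    command = base .. '/command/' .. tostring(command_id),
  }
end

local keys = failover_keys('/myapp', 'example-uuid', 42)
for _, name in ipairs({ 'config', 'info', 'lock', 'term', 'command' }) do
  print(name, keys[name])
end
```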
To configure a cluster to work with an external failover coordinator, follow the steps below:
(Optional) If you need to run several failover coordinators to increase fault tolerance, set up an etcd-based configuration storage, as described in Centralized configuration storages.
Set the replication.failover option to supervised:
replication:
  failover: supervised
Grant the user used for replication permission to execute the failover.execute function:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
      privileges:
      - permissions: [ execute ]
        lua_call: [ 'failover.execute' ]
(Optional) Configure options that control how a failover coordinator operates in the failover section:
failover:
  probe_interval: 5
  lease_interval: 15
  renew_interval: 5
  stateboard:
    keepalive_interval: 5
    renew_interval: 1
You can find the full example on GitHub: supervised_failover.
Starting a failover coordinator
To start a failover coordinator, execute the tarantool command with the --failover option.
This command accepts the path to a cluster configuration file:
tarantool --failover --config instances.enabled/supervised_failover/config.yaml
If a cluster’s configuration is stored in etcd, the config.yaml file contains connection options for the etcd storage.
You can run two or more failover coordinators to increase fault tolerance.
In this case, only one coordinator is active and used to control leadership in a replica set.
Learn more from Active and passive coordinators.
Master-replica: manual failover
Example on GitHub: manual_leader
This tutorial shows how to configure and work with a replica set with manual failover.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
Inside the instances.enabled directory of the created tt environment, create the manual_leader directory.
Inside instances.enabled/manual_leader, create the instances.yml and config.yaml files:
Configuring a replica set
This section describes how to configure a replica set in config.yaml.
Step 1: Configuring a failover mode
First, set the replication.failover option to manual:
replication:
  failover: manual
Step 2: Defining a replica set topology
Define a replica set topology inside the groups section:
- The leader option sets
instance001 as a replica set leader.
- The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
Step 3: Creating a user for replication
In the credentials section, create the replicator user with the replication role:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
Step 4: Specifying advertise URIs
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
  advertise:
    peer:
      login: replicator
The resulting replica set configuration should look as follows:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
iproto:
  advertise:
    peer:
      login: replicator
replication:
  failover: manual
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
Working with a replica set
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start manual_leader
• Starting an instance [manual_leader:instance001]...
• Starting an instance [manual_leader:instance002]...
Check that instances are in the RUNNING status using the tt status command:
$ tt status manual_leader
INSTANCE                   STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001  RUNNING  8841  RW    ready   running  --
manual_leader:instance002  RUNNING  8842  RO    ready   running  --
Checking a replica set status
Connect to instance001 using tt connect:
$ tt connect manual_leader:instance001
• Connecting to the instance...
• Connected to manual_leader:instance001
Make sure that the instance is in the running state by executing box.info.status:
manual_leader:instance001> box.info.status
---
- running
...
Check that the instance is writable using box.info.ro:
manual_leader:instance001> box.info.ro
---
- false
...
Execute box.info.replication to check a replica set status.
For instance002, upstream.status and downstream.status should be follow.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 7
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.3893879999996
      peer: replicator@127.0.0.1:3302
      lag: 0.00028800964355469
    name: instance002
    downstream:
      status: follow
      idle: 0.37777199999982
      vclock: {1: 7}
      lag: 0
...
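To automate the check above, a small helper can scan the replication table for peers that are not in the follow state. `unhealthy_peers` is hypothetical, and the sample table copies only the relevant fields of the output above. In a Tarantool console, passing box.info.replication directly should work too, since it has the same shape, but treat that as an assumption.

```lua
-- Hypothetical helper: lists peers whose upstream or downstream status is
-- not 'follow'. The entry for the local instance has neither section and is
-- skipped.
local function unhealthy_peers(replication)
  local bad = {}
  for _, peer in pairs(replication) do
    if peer.upstream ~= nil or peer.downstream ~= nil then
      local up = peer.upstream and peer.upstream.status
      local down = peer.downstream and peer.downstream.status
      if up ~= 'follow' or down ~= 'follow' then
        table.insert(bad, peer.name)
      end
    end
  end
  table.sort(bad)
  return bad
end

-- Relevant fields copied from the box.info.replication output above.
local replication = {
  [1] = { id = 1, name = 'instance001' },
  [2] = {
    id = 2, name = 'instance002',
    upstream = { status = 'follow' },
    downstream = { status = 'follow' },
  },
}

print(#unhealthy_peers(replication))
```

An entry without an upstream section, such as the one for a disconnected instance, is reported as unhealthy.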
To see the diagrams that illustrate how the upstream and downstream connections look,
refer to Monitoring a replica set.
To check that a replica (instance002) gets all updates from the master, follow the steps below:
On instance001, create a space and add data as described in CRUD operation examples.
Open a second terminal, connect to instance002 using tt connect, and use the select operation to make sure data is replicated.
Check that box.info.vclock values are the same on both instances:
instance001:
manual_leader:instance001> box.info.vclock
---
- {1: 21}
...
instance002:
manual_leader:instance002> box.info.vclock
---
- {1: 21}
...
Note
Note that a vclock value might include the 0 component that is related to local space operations and might differ for different instances in a replica set.
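The same comparison can be scripted. `same_replicated_state` is a hypothetical plain-Lua helper that ignores component 0 as described in the note; the local component value in the sample is made up.

```lua
-- Hypothetical helper: compares two vclocks while ignoring component 0,
-- which tracks local space operations and may differ between instances.
local function same_replicated_state(a, b)
  for id, lsn in pairs(a) do
    if id ~= 0 and (b[id] or 0) ~= lsn then return false end
  end
  for id, lsn in pairs(b) do
    if id ~= 0 and (a[id] or 0) ~= lsn then return false end
  end
  return true
end

local instance001 = { [1] = 21 }
local instance002 = { [0] = 3, [1] = 21 } -- illustrative local component
print(same_replicated_state(instance001, instance002))
```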
This section describes how to add a new replica to a replica set.
Adding an instance to the configuration
Add instance003 to the instances.yml file:
instance001:
instance002:
instance003:
Add instance003 with the specified iproto.listen option to the config.yaml file:
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
Open a third terminal to work with the new instance.
Start instance003 using tt start:
$ tt start manual_leader:instance003
• Starting an instance [manual_leader:instance003]...
Check a replica set status using tt status:
$ tt status manual_leader
INSTANCE                   STATUS   PID   MODE  CONFIG  BOX      UPSTREAM
manual_leader:instance001  RUNNING  8841  RW    ready   running  --
manual_leader:instance002  RUNNING  8842  RO    ready   running  --
manual_leader:instance003  RUNNING  8856  RO    ready   running  --
After you added instance003 to the configuration and started it, you need to reload configurations on all instances.
This is required to allow instance001 and instance002 to get data from the new instance in case it becomes a master.
Connect to instance003 using tt connect:
$ tt connect manual_leader:instance003
• Connecting to the instance...
• Connected to manual_leader:instance003
Reload configurations on all three instances using the reload() function provided by the config module:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
instance003:
manual_leader:instance003> require('config'):reload()
---
...
Execute box.info.replication to check a replica set status.
Make sure that upstream.status and downstream.status are follow for instance003.
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...
This section describes the process of removing an instance from a replica set.
Before removing an instance, make sure it is in read-only mode.
If the instance is a master, perform manual failover.
Disconnecting an instance
Clear the iproto option for instance003 by setting its value to {}:
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
Check that the upstream section is missing for instance003 by executing box.info.replication[3]:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: follow
    idle: 0.4588760000006
    vclock: {1: 21}
    lag: 0
  name: instance003
...
Stop instance003 using the tt stop command:
$ tt stop manual_leader:instance003
• The Instance manual_leader:instance003 (PID = 15551) has been terminated.
Check that downstream.status is stopped for instance003:
manual_leader:instance001> box.info.replication[3]
---
- id: 3
  uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
  lsn: 0
  downstream:
    status: stopped
    message: 'unexpected EOF when reading from socket, called on fd 27, aka 127.0.0.1:3301,
      peer of 127.0.0.1:54185: Broken pipe'
    system_message: Broken pipe
  name: instance003
...
Removing an instance from the configuration
Remove instance003 from the instances.yml file:
instance001:
instance002:
Remove instance003 from config.yaml:
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
Reload configurations on instance001 and instance002:
instance001:
manual_leader:instance001> require('config'):reload()
---
...
instance002:
manual_leader:instance002> require('config'):reload()
---
...
Removing an instance from the ‘_cluster’ space
To remove an instance from the replica set permanently, it should be removed from the box.space._cluster system space:
Select all the tuples in the box.space._cluster system space:
manual_leader:instance002> box.space._cluster:select{}
---
- - [1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001']
- [2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002']
- [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
Delete a tuple corresponding to instance003:
manual_leader:instance002> box.space._cluster:delete(3)
---
- [3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003']
...
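The id passed to box.space._cluster:delete() above is read from the select output. The sketch below does that lookup in code. It assumes rows shaped like the box.space._cluster:select{} output shown above, where each row is {id, uuid, name}. The helper name cluster_member is hypothetical. In a Tarantool console, the table of tuples returned by select{} can be indexed the same way.

```lua
-- Hypothetical helper: return the replica id and UUID of an instance,
-- given rows shaped like box.space._cluster:select{} output, where each
-- row is {id, uuid, name}.
local function cluster_member(rows, name)
    for _, row in ipairs(rows) do
        if row[3] == name then
            return row[1], row[2]
        end
    end
    return nil -- not registered in _cluster
end

-- Rows copied from the select output above.
local rows = {
    { 1, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660', 'instance001' },
    { 2, '4cfa6e3c-625e-b027-00a7-29b2f2182f23', 'instance002' },
    { 3, '9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6', 'instance003' },
}

local id, uuid = cluster_member(rows, 'instance003')
print(id, uuid)
-- In a Tarantool console, the id could then be passed on:
-- box.space._cluster:delete(id)
```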
Execute box.info.replication to check the health status:
manual_leader:instance002> box.info.replication
---
- 1:
id: 1
uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
lsn: 21
upstream:
status: follow
idle: 0.73316000000159
peer: replicator@127.0.0.1:3301
lag: 0.00016212463378906
name: instance001
downstream:
status: follow
idle: 0.7269320000014
vclock: {2: 1, 1: 21}
lag: 0.00083398818969727
2:
id: 2
uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
lsn: 1
name: instance002
...
Master-replica: automated failover
Example on GitHub: auto_leader
This tutorial shows how to configure and work with a replica set with automated failover.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
Inside the instances.enabled directory of the created tt environment, create the auto_leader directory.
Inside instances.enabled/auto_leader, create the instances.yml and config.yaml files:
Configuring a replica set
This section describes how to configure a replica set in config.yaml.
Step 1: Configuring a failover mode
First, set the replication.failover option to election:
replication:
failover: election
Step 2: Defining a replica set topology
Define a replica set topology inside the groups section.
The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
instance003:
iproto:
listen:
- uri: '127.0.0.1:3303'
Step 3: Creating a user for replication
In the credentials section, create the replicator user with the replication role:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
Step 4: Specifying advertise URIs
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
advertise:
peer:
login: replicator
The resulting replica set configuration should look as follows:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
iproto:
advertise:
peer:
login: replicator
replication:
failover: election
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
iproto:
listen:
- uri: '127.0.0.1:3302'
instance003:
iproto:
listen:
- uri: '127.0.0.1:3303'
Working with a replica set
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start auto_leader
• Starting an instance [auto_leader:instance001]...
• Starting an instance [auto_leader:instance002]...
• Starting an instance [auto_leader:instance003]...
Check that instances are in the RUNNING status using the tt status command:
$ tt status auto_leader
INSTANCE STATUS PID MODE CONFIG BOX UPSTREAM
auto_leader:instance001 RUNNING 9170 RO ready running --
auto_leader:instance002 RUNNING 9171 RO ready running --
auto_leader:instance003 RUNNING 9172 RW ready running --
Checking a replica set status
Connect to instance001 using tt connect:
$ tt connect auto_leader:instance001
• Connecting to the instance...
• Connected to auto_leader:instance001
Check the instance state with regard to leader election using box.info.election.
The output below shows that instance001 is a follower while instance002 is the replica set leader.
auto_leader:instance001> box.info.election
---
- leader_idle: 0.77491499999815
leader_name: instance002
state: follower
vote: 0
term: 2
leader: 1
...
Check that instance001 is in read-only mode using box.info.ro:
auto_leader:instance001> box.info.ro
---
- true
...
Execute box.info.replication to check the replica set status.
Make sure that upstream.status and downstream.status are follow for instance002 and instance003.
auto_leader:instance001> box.info.replication
---
- 1:
id: 1
uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
lsn: 9
upstream:
status: follow
idle: 0.8257709999998
peer: replicator@127.0.0.1:3302
lag: 0.00012326240539551
name: instance002
downstream:
status: follow
idle: 0.81174199999805
vclock: {1: 9}
lag: 0
2:
id: 2
uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
lsn: 0
name: instance001
3:
id: 3
uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
lsn: 0
upstream:
status: follow
idle: 0.83125499999733
peer: replicator@127.0.0.1:3303
lag: 0.00010204315185547
name: instance003
downstream:
status: follow
idle: 0.83213399999659
vclock: {1: 9}
lag: 0
...
To see the diagrams that illustrate how the upstream and downstream connections look,
refer to Monitoring a replica set.
To check that replicas (instance001 and instance003) get all updates from the master (instance002), follow the steps below:
Connect to instance002 using tt connect:
$ tt connect auto_leader:instance002
• Connecting to the instance...
• Connected to auto_leader:instance002
Create a space and add data as described in CRUD operation examples.
Use the select operation on instance001 and instance003 to make sure data is replicated.
Check that component 1 of box.info.vclock has the same value on all instances:
instance001:
auto_leader:instance001> box.info.vclock
---
- {0: 1, 1: 32}
...
instance002:
auto_leader:instance002> box.info.vclock
---
- {0: 1, 1: 32}
...
instance003:
auto_leader:instance003> box.info.vclock
---
- {0: 1, 1: 32}
...
Note
A vclock value might include the 0 component, which is related to local space operations and might differ between instances in a replica set.
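Comparing vclocks by eye becomes error-prone as the number of components grows. The sketch below uses plain Lua to compare two tables shaped like box.info.vclock, skipping component 0 as described above. The helper name vclocks_match is hypothetical. A component missing from one table is treated as 0. This is an assumption made for the sketch.

```lua
-- Hypothetical helper: compare two vclocks shaped like box.info.vclock,
-- skipping component 0 (local space operations), which may differ.
-- A component missing from one side is treated as 0.
local function vclocks_match(a, b)
    for id, lsn in pairs(a) do
        if id ~= 0 and (b[id] or 0) ~= lsn then
            return false
        end
    end
    for id, lsn in pairs(b) do
        if id ~= 0 and (a[id] or 0) ~= lsn then
            return false
        end
    end
    return true
end

-- Only component 0 differs here, so the vclocks match.
print(vclocks_match({ [0] = 2, [1] = 32, [2] = 1 },
                    { [0] = 3, [1] = 32, [2] = 1 }))
```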
Testing automated failover
To test how automated failover works if the current master is stopped, follow the steps below:
Stop the current master instance (instance002) using the tt stop command:
$ tt stop auto_leader:instance002
• The Instance auto_leader:instance002 (PID = 24769) has been terminated.
On instance001, check box.info.election.
In this example, instance001 becomes the new replica set leader.
auto_leader:instance001> box.info.election
---
- leader_idle: 0
leader_name: instance001
state: leader
vote: 2
term: 3
leader: 2
...
Check replication status using box.info.replication for instance002:
upstream.status is disconnected.
downstream.status is stopped.
auto_leader:instance001> box.info.replication
---
- 1:
id: 1
uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
lsn: 32
upstream:
peer: replicator@127.0.0.1:3302
lag: 0.00032305717468262
status: disconnected
idle: 48.352504000002
message: 'connect, called on fd 20, aka 127.0.0.1:62575: Connection refused'
system_message: Connection refused
name: instance002
downstream:
status: stopped
message: 'unexpected EOF when reading from socket, called on fd 32, aka 127.0.0.1:3301,
peer of 127.0.0.1:62204: Broken pipe'
system_message: Broken pipe
2:
id: 2
uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
lsn: 1
name: instance001
3:
id: 3
uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
lsn: 0
upstream:
status: follow
idle: 0.18620999999985
peer: replicator@127.0.0.1:3303
lag: 0.00012516975402832
name: instance003
downstream:
status: follow
idle: 0.19718099999955
vclock: {2: 1, 1: 32}
lag: 0.00051403045654297
...
The diagram below illustrates what the upstream and downstream connections look like:
Start instance002 again using tt start:
$ tt start auto_leader:instance002
• Starting an instance [auto_leader:instance002]...
Choosing a leader manually
Make sure that box.info.vclock values (except the 0 components) are the same on all instances:
instance001:
auto_leader:instance001> box.info.vclock
---
- {0: 2, 1: 32, 2: 1}
...
instance002:
auto_leader:instance002> box.info.vclock
---
- {0: 2, 1: 32, 2: 1}
...
instance003:
auto_leader:instance003> box.info.vclock
---
- {0: 3, 1: 32, 2: 1}
...
On instance002, run box.ctl.promote() to choose it as a new replica set leader:
auto_leader:instance002> box.ctl.promote()
---
...
Check box.info.election to make sure instance002 is now the leader:
auto_leader:instance002> box.info.election
---
- leader_idle: 0
leader_name: instance002
state: leader
vote: 1
term: 4
leader: 1
...
Adding and removing instances
The process of adding instances to a replica set and removing them is similar for all failover modes.
Learn how to do this from the Master-replica: manual failover tutorial:
Before removing an instance from a replica set with replication.failover set to election, make sure this instance is in read-only mode.
If the instance is a master, choose a new leader manually.
Master-master
Example on GitHub: master_master
This tutorial shows how to configure and work with a master-master replica set.
Before starting this tutorial:
Install the tt utility.
Create a tt environment in the current directory by executing the tt init command.
Inside the instances.enabled directory of the created tt environment, create the master_master directory.
Inside instances.enabled/master_master, create the instances.yml and config.yaml files:
Configuring a replica set
This section describes how to configure a replica set in config.yaml.
Step 1: Configuring a failover mode
First, set the replication.failover option to off:
replication:
failover: off
Step 2: Defining a replica set topology
Define a replica set topology inside the groups section:
- The database.mode option should be set to rw to make instances work in read-write mode.
- The iproto.listen option specifies an address used to listen for incoming requests and allows replicas to communicate with each other.
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3302'
Step 3: Creating a user for replication
In the credentials section, create the replicator user with the replication role:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
Step 4: Specifying advertise URIs
Set iproto.advertise.peer to advertise the current instance to other replica set members:
iproto:
advertise:
peer:
login: replicator
The resulting replica set configuration should look as follows:
credentials:
users:
replicator:
password: 'topsecret'
roles: [replication]
iproto:
advertise:
peer:
login: replicator
replication:
failover: off
groups:
group001:
replicasets:
replicaset001:
instances:
instance001:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3302'
Working with a replica set
After configuring a replica set, execute the tt start command from the tt environment directory:
$ tt start master_master
• Starting an instance [master_master:instance001]...
• Starting an instance [master_master:instance002]...
Check that instances are in the RUNNING status using the tt status command:
$ tt status master_master
INSTANCE STATUS PID MODE CONFIG BOX UPSTREAM
master_master:instance001 RUNNING 9263 RW ready running --
master_master:instance002 RUNNING 9264 RW ready running --
Checking a replica set status
Connect to both instances using tt connect.
Below is the example for instance001:
$ tt connect master_master:instance001
• Connecting to the instance...
• Connected to master_master:instance001
master_master:instance001>
Check that both instances are writable using box.info.ro:
instance001:
instance002:
Execute box.info.replication to check the replica set status.
For instance002, upstream.status and downstream.status should be follow.
To see the diagrams that illustrate how the upstream and downstream connections look,
refer to Monitoring a replica set.
Note
A vclock value might include the 0 component, which is related to local space operations and might differ between instances in a replica set.
To check that both instances get updates from each other, follow the steps below:
On instance001, create a space, format it, and create a primary index:
box.schema.space.create('bands')
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
box.space.bands:create_index('primary', { parts = { 'id' } })
Then, add sample data to this space:
box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
On instance002, use the select operation to make sure data is replicated:
Add more data to the created space on instance002:
box.space.bands:insert { 3, 'Ace of Base', 1987 }
box.space.bands:insert { 4, 'The Beatles', 1960 }
Get back to instance001 and use select to make sure new records are replicated:
Check that box.info.vclock values are the same on both instances:
instance001:
instance002:
Resolving replication conflicts
Inserting conflicting records
To insert conflicting records into instance001 and instance002, follow the steps below:
Stop instance001 using the tt stop command:
$ tt stop master_master:instance001
On instance002, insert a new record:
box.space.bands:insert { 5, 'incorrect data', 0 }
Stop instance002 using tt stop:
$ tt stop master_master:instance002
Start instance001 again:
$ tt start master_master:instance001
Connect to instance001 and insert a record that should conflict with a record already inserted on instance002:
box.space.bands:insert { 5, 'Pink Floyd', 1965 }
Start instance002 again:
$ tt start master_master:instance002
Then, check box.info.replication on instance001.
upstream.status should be stopped because of the Duplicate key exists error:
The diagram below illustrates what the upstream and downstream connections look like:
To resolve a replication conflict, instance002 should get the correct data from instance001 first.
To achieve this, instance002 should be rebootstrapped:
Select all the tuples in the box.space._cluster system space to get the UUID of instance002:
In the config.yaml file, change the following instance002 settings:
- Set database.mode to ro.
- Set database.instance_uuid to the UUID value obtained in the previous step.
instance002:
database:
mode: ro
instance_uuid: 'dccf7485-8bff-47f6-bfc4-b311701e36ef'
Reload configurations on both instances using the config:reload() function:
instance001:
instance002:
Delete write-ahead logs and snapshots stored in the var/lib/instance002 directory.
Note
var/lib is the default directory used by tt to store write-ahead logs and snapshots.
Learn more from Configuration.
Restart instance002 using the tt restart command:
$ tt restart master_master:instance002
Connect to instance002 and make sure it received the correct data from instance001:
After reseeding a replica, you need to resolve a replication conflict that keeps replication stopped:
Execute box.info.replication on instance001.
upstream.status is still stopped:
The diagram below illustrates what the upstream and downstream connections look like:
In the config.yaml file, clear the iproto option for instance001 by setting its value to {} to disconnect this instance from instance002.
Set database.mode to ro:
instance001:
database:
mode: ro
iproto: {}
Reload configuration on instance001 only:
Change database.mode values back to rw for both instances and restore iproto.listen for instance001.
The database.instance_uuid option can be removed for instance002:
instance001:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3301'
instance002:
database:
mode: rw
iproto:
listen:
- uri: '127.0.0.1:3302'
Reload configurations on both instances one more time:
instance001:
instance002:
Check box.info.replication.
upstream.status should be follow now.
Adding and removing instances
The process of adding instances to a replica set and removing them is similar for all failover modes.
Learn how to do this from the Master-replica: manual failover tutorial:
Before removing an instance from a replica set with replication.failover set to off, make sure this instance is in read-only mode.
Sharding
Scaling databases in a growing project is often considered one of the most
challenging issues. Once a single server cannot withstand the load, scaling
methods should be applied.
Sharding is a database architecture that allows for horizontal scaling,
which implies that a dataset is partitioned and distributed over multiple servers.
With Tarantool’s vshard module,
the tuples of a dataset are distributed across
multiple nodes, with a Tarantool database server instance on each node. Each instance
handles only a subset of the total data, so larger loads can be handled by simply
adding more servers. The initial dataset is partitioned into multiple parts, so each
part is stored on a separate server.
The vshard module is based on the concept of
virtual buckets, where a tuple
set is partitioned into a large number of abstract virtual nodes (virtual buckets,
further just buckets) rather than into a smaller number of physical nodes.
The dataset is partitioned using sharding keys (bucket id numbers).
Hashing a sharding key into a large number of buckets makes it possible to change
the number of servers in the cluster seamlessly. The rebalancing mechanism distributes
buckets evenly among all shards when servers are added or removed.
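The sketch below shows how a sharding key can become a bucket id by hashing a string key into the range 1..N. It is illustrative only. djb2 is used as a stand-in hash, and both the function name bucket_id_for and the bucket count of 3000 are assumptions. vshard provides its own helpers for this, for example vshard.router.bucket_id_strcrc32(), and those should be preferred in a real cluster.

```lua
-- Illustrative only: map a string sharding key to a bucket id in 1..N.
-- djb2 is a stand-in hash; vshard uses its own CRC32-based helpers.
local function bucket_id_for(key, bucket_count)
    local h = 5381
    for i = 1, #key do
        -- Keep h below 2^32 so the arithmetic stays exact in any Lua version.
        h = (h * 33 + key:byte(i)) % 4294967296
    end
    return h % bucket_count + 1
end

local BUCKET_COUNT = 3000 -- assumed total number of buckets
for _, key in ipairs({ 'Roxette', 'Scorpions', 'Ace of Base' }) do
    print(key, bucket_id_for(key, BUCKET_COUNT))
end
```

A key always maps to the same bucket. Adding or removing servers therefore changes only which replica set holds each bucket, not which bucket a key falls into.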
Buckets have states, which makes it easy to monitor the state of servers. For example,
the states can show that a server instance is active and available for all types of
requests, or that a failover has occurred and the instance accepts only read requests.
The vshard module provides router and storage API (public and internal) for sharding-aware applications.
Check out the quick start guide or
learn more about how sharding works in Tarantool:
You can also find out more about sharding administration
or dive into the vshard configuration and API reference.
Architecture
Consider a distributed Tarantool cluster that consists of subclusters called
shards, each storing a part of the data. Each shard, in its turn, constitutes
a replica set consisting of several replicas, one of which serves as a master
node that processes all read and write requests.
The whole dataset is logically partitioned into a predefined number of virtual
buckets (further just buckets), each assigned a unique number
ranging from 1 to N, where N is the total number of buckets.
The number of buckets is specifically chosen
to be several orders of magnitude larger than the potential number of cluster
nodes, even given future cluster scaling. For example, with M projected nodes
the dataset may be split into 100 * M or even 1,000 * M buckets. Care should
be taken when picking the number of buckets: if too large, it may require extra
memory for storing the routing information; if too small, it may decrease
the granularity of rebalancing.
Each shard stores a unique subset of buckets, which means that a bucket cannot
belong to several shards at once, as illustrated below:

This shard-to-bucket mapping is stored in a table in one of Tarantool’s system
spaces, with each shard holding only a specific part of the mapping that covers
those buckets that were assigned to this shard.
Apart from the mapping table, the bucket id is also stored in a special field of
every tuple of every table participating in sharding.
Once a shard receives any request (except for SELECT) from an
application, this shard checks the bucket id specified in the request
against the table of bucket ids that belong to a given node. If the
specified bucket id is invalid, the request gets terminated with the
following error: “wrong bucket”. Otherwise the request is executed, and
all the data created in the process is assigned the bucket id specified
in the request. Note that the request should only modify the data that
has the same bucket id as the request itself.
Storing bucket ids both in the data itself and the mapping table ensures data
consistency regardless of the application logic and makes rebalancing
transparent for the application. Storing the mapping table in a system space
ensures sharding is performed consistently in case of a failover, as all the
replicas in a shard share a common table state.
The sharded dataset is partitioned into a large number of abstract nodes called
virtual buckets (further just buckets).
The dataset is partitioned using the sharding key (or bucket id, in Tarantool
terminology). Bucket id is a number from 1 to N, where N is the total number of
buckets.

Each replica set stores a unique subset of buckets. One bucket cannot belong to
multiple replica sets at a time.
The total number of buckets is determined by the administrator who sets up the initial cluster configuration.
Every space you plan to shard must have a numeric field containing bucket ids.
You can learn more from Data definition.
A sharded cluster in Tarantool consists of:
- One or more replica sets. Each replica set should contain at least two storage instances. For redundancy, it is recommended to have 3 or more storage instances in a replica set.
- One or more router instances. The number of router instances is not limited and should be increased if the existing router instances become CPU or I/O bound.
- Rebalancer.

Storage is a node storing a subset of the dataset. Multiple replicated (for
redundancy) storages comprise a replica set (also called shard).
Each storage in a replica set has a role, master or replica. A master
processes read and write requests. A replica processes read requests but cannot
process write requests.

Router is a standalone software component that routes read and write requests
from the client application to shards.
All requests from the application come to the sharded cluster through a router.
The router keeps the topology of a sharded cluster transparent for the application,
thus keeping the application unaware of:
- the number and location of shards,
- the data rebalancing process,
- the fact and the process of a failover that occurred after a replica’s failure.
A router can also calculate a bucket id on its own provided that the application
clearly defines rules for calculating a bucket id based on the request data.
To do it, a router needs to be aware of the data schema.
The router does not have a persistent state, nor does it store the cluster topology
or balance the data. The router is a standalone software component that can run
in the storage layer or application layer depending on the application features.
A router maintains a constant pool of connections to all the storages that is
created at startup. Creating it this way helps avoid configuration errors. Once
a pool is created, a router caches the current state of the _vbucket table to
speed up the routing. If a bucket is moved to another storage as
a result of data rebalancing, or one of the shards fails over to a replica,
a router updates the routing table in a way that’s transparent for the application.
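The caching behavior described above can be sketched in plain Lua. This is not vshard's implementation. The Router table, its route and update methods, and the storages map that stands in for connections to real storages are all hypothetical. On a cache miss, the sketch asks every storage whether it holds the bucket and caches the answer. update models the router learning a new bucket location after rebalancing or failover.

```lua
-- Hypothetical sketch of a router-side routing cache (not vshard's code).
local Router = {}
Router.__index = Router

-- storages: replica set name -> set of bucket ids it holds, standing in
-- for the storages the router keeps connections to.
function Router.new(storages)
    return setmetatable({ storages = storages, routes = {} }, Router)
end

-- Return the replica set that holds bucket_id, using the cache first.
function Router:route(bucket_id)
    local cached = self.routes[bucket_id]
    if cached ~= nil then
        return cached
    end
    -- Cache miss: ask every storage, then remember the answer.
    for name, buckets in pairs(self.storages) do
        if buckets[bucket_id] then
            self.routes[bucket_id] = name
            return name
        end
    end
    return nil, 'bucket not found'
end

-- Record a new location, for example after rebalancing or failover.
function Router:update(bucket_id, replicaset_name)
    self.routes[bucket_id] = replicaset_name
end

local router = Router.new({
    rs1 = { [1] = true, [2] = true },
    rs2 = { [3] = true },
})
print(router:route(3))
```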
Sharding is not integrated into any centralized configuration storage system.
It is assumed that the application itself handles all the interactions with such
systems and passes sharding parameters. That said, the configuration can be
changed dynamically - for example, when adding or deleting one or several shards:
- To add a new shard to the cluster, a system administrator first changes the
configuration of all the routers and then the configuration of all the storages.
- The new shard becomes available to the storage layer for rebalancing.
- As a result of rebalancing, one of the vbuckets is moved to the new shard.
- When trying to access the vbucket, a router receives a special error code
that specifies the new vbucket location.
CRUD (create, read, update, delete) operations
CRUD operations can be:
- executed in a stored procedure inside a storage, or
- initialized by the application.
In any case, the application must include the operation bucket id in a request.
When executing an INSERT request, the operation bucket id is stored in a newly
created tuple. In other cases, the storage checks whether the specified operation
bucket id matches the bucket id of the tuple being modified.
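The sketch below shows these two rules, using plain Lua tables as a stand-in for a storage space. The field layout (primary key in field 1, bucket id in field 2) and the function names sharded_insert and sharded_update are assumptions made for illustration.

```lua
-- Hypothetical sketch: a Lua table stands in for a storage space, keyed
-- by primary key. Assumed layout: field 1 = primary key, field 2 = bucket id.
local BUCKET_ID_FIELD = 2

-- INSERT: the operation bucket id is stored in the new tuple.
local function sharded_insert(space, tuple, bucket_id)
    tuple[BUCKET_ID_FIELD] = bucket_id
    space[tuple[1]] = tuple
    return tuple
end

-- Other changes: allowed only if the tuple's bucket id matches the request.
local function sharded_update(space, key, bucket_id, changes)
    local tuple = space[key]
    if tuple == nil then
        return nil, 'not found'
    end
    if tuple[BUCKET_ID_FIELD] ~= bucket_id then
        return nil, 'bucket id mismatch'
    end
    for field, value in pairs(changes) do
        tuple[field] = value
    end
    return tuple
end

local bands = {}
sharded_insert(bands, { 1, nil, 'Roxette', 1986 }, 42)
print(sharded_update(bands, 1, 7, { [4] = 1987 }))  -- rejected: wrong bucket id
print(sharded_update(bands, 1, 42, { [4] = 1987 })[4])
```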
Since a storage is not aware of the mapping between a bucket id and a primary
key, all the SELECT requests executed in stored procedures inside a storage are
only executed locally. Those SELECT requests that were initialized by the
application are forwarded to a router. Then, if the application has passed
a bucket id, a router uses it for shard calculation.
Calling stored procedures
There are several ways of calling stored procedures in cluster replica sets.
Stored procedures can be called:
- on a specific vbucket located in a replica set (in this case, it is necessary
to differentiate between read and write procedures, as write procedures are not
applicable to vbuckets that are being migrated), or
- without specifying any particular vbucket.
All the routing validity checks performed for sharded DML operations hold true
for vbucket-bound stored procedures as well.
Rebalancer is a background rebalancing process that ensures an even
distribution of buckets across the shards. During rebalancing, buckets are being
migrated among replica sets.
The rebalancer “wakes up” periodically and redistributes data from the most
loaded nodes to less loaded nodes. Rebalancing starts if the disbalance
of a replica set exceeds the disbalance threshold specified in the configuration.
The disbalance of a replica set is calculated as follows, where etalon_bucket_number
is the ideal number of buckets for the replica set and real_bucket_number is the
number of buckets it actually stores:
|etalon_bucket_number - real_bucket_number| / etalon_bucket_number * 100
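The sketch below is a worked example of the formula for two replica sets. It assumes 3000 buckets and two replica sets with equal weights, so the etalon number is 3000 / 2 = 1500 buckets each. It also assumes a threshold of 1 percent. All three numbers are illustrative and are not taken from this document. The function names are hypothetical.

```lua
-- Disbalance of one replica set, in percent, per the formula above.
local function disbalance(etalon, real)
    return math.abs(etalon - real) / etalon * 100
end

-- True if any replica set exceeds the threshold (in percent).
local function needs_rebalancing(counts, etalon, threshold)
    for _, real in pairs(counts) do
        if disbalance(etalon, real) > threshold then
            return true
        end
    end
    return false
end

-- Assumed numbers: 3000 buckets, two equally weighted replica sets,
-- and a 1 percent threshold.
local etalon = 3000 / 2
local counts = { rs1 = 1560, rs2 = 1440 }
for name, real in pairs(counts) do
    print(name, disbalance(etalon, real))
end
print(needs_rebalancing(counts, etalon, 1))
```

With 1560 buckets, rs1 has a disbalance of |1500 - 1560| / 1500 * 100 = 4 percent. This exceeds the assumed threshold, so rebalancing would start.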
A replica set from which the bucket is being migrated is called a source; a
target replica set to which the bucket is being migrated is called a destination.
A replica set lock makes a replica set invisible to the rebalancer. A locked
replica set can neither receive new buckets nor migrate its own buckets.
While a bucket is being migrated, it can have different states:
- ACTIVE – the bucket is available for read and write requests.
- PINNED – the bucket is locked against migration to another replica set. Otherwise,
pinned buckets are similar to buckets in the ACTIVE state.
- SENDING – the bucket is currently being copied to the destination replica set;
read requests to the source replica set are still processed.
- RECEIVING – the bucket is currently being filled; all requests to it are rejected.
- SENT – the bucket was migrated to the destination replica set. The router
uses the SENT state to calculate the new location of the bucket. A bucket in
the SENT state goes to the GARBAGE state automatically after 0.5 seconds.
- GARBAGE – the bucket was already migrated to the destination replica set during
rebalancing; or the bucket was initially in the RECEIVING state, but some error
occurred during the migration.
Buckets in the GARBAGE state are deleted by the garbage collector.

Migration is performed as follows:
- At the destination replica set, a new bucket is created and assigned the RECEIVING
state, the data copying starts, and the bucket rejects all requests.
- The source bucket in the source replica set is assigned the SENDING state, and
the bucket continues to process read requests.
- Once the data is copied, the bucket on the source replica set is assigned the SENT
state and starts rejecting all requests.
- The bucket on the destination replica set is assigned the ACTIVE state and starts
accepting all requests.
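The migration sequence above can be traced with a small simulation. In this sketch, each replica set is modeled as a Lua table that maps bucket ids to states, and migrate records every state change. The function name is hypothetical and data copying is omitted. Only ACTIVE buckets are allowed to start migrating, because PINNED buckets are locked against it.

```lua
-- Hypothetical simulation of one bucket migration. Each replica set is a
-- table of bucket id -> state; every state change is appended to log.
local function migrate(src, dst, bucket_id, log)
    -- PINNED buckets are locked against migration, so in this sketch only
    -- ACTIVE buckets may start migrating.
    assert(src[bucket_id] == 'ACTIVE', 'bucket is not ACTIVE on the source')
    local function set(label, rs, state)
        rs[bucket_id] = state
        log[#log + 1] = label .. ':' .. state
    end
    set('dst', dst, 'RECEIVING') -- destination rejects all requests
    set('src', src, 'SENDING')   -- source still processes read requests
    -- (data copying runs here and must finish before the next step)
    set('src', src, 'SENT')      -- source rejects all requests
    set('dst', dst, 'ACTIVE')    -- destination accepts all requests
end

local source, destination, log = { [7] = 'ACTIVE' }, {}, {}
migrate(source, destination, 7, log)
print(table.concat(log, ' -> '))
```

The SENT bucket would later turn into GARBAGE and be deleted by the garbage collector. That step is left out of the sketch.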
Note
There is a specific error, vshard.error.code.TRANSFER_IS_IN_PROGRESS, that
is returned when a request tries to perform an action not applicable to a bucket
that is being relocated. In this case, retry the request.
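Retrying on TRANSFER_IS_IN_PROGRESS can be wrapped in a helper. The sketch below makes several assumptions. call_fn stands for any request function that returns a result, or nil plus an error object. The error code is assumed to be stored in err.code. Inside Tarantool, transfer_code would be vshard.error.code.TRANSFER_IS_IN_PROGRESS, and sleep_fn could call fiber.sleep(). The name call_with_retry is hypothetical. Other errors are returned immediately instead of being retried.

```lua
-- Hypothetical retry wrapper: retry only while the bucket is being
-- relocated; return any other error to the caller right away.
local function call_with_retry(call_fn, transfer_code, max_attempts, sleep_fn)
    local err
    for attempt = 1, max_attempts do
        local res
        res, err = call_fn()
        if res ~= nil then
            return res
        end
        if err == nil or err.code ~= transfer_code then
            return nil, err -- nil result without error, or a non-retryable error
        end
        if attempt < max_attempts and sleep_fn ~= nil then
            sleep_fn(attempt) -- back off before the next attempt
        end
    end
    return nil, err -- still in progress after max_attempts
end

-- Simulated request: the bucket is being relocated for the first two calls.
local TRANSFER = 'TRANSFER_IS_IN_PROGRESS' -- stand-in for the real code
local calls = 0
local function request()
    calls = calls + 1
    if calls < 3 then
        return nil, { code = TRANSFER }
    end
    return 'done'
end

local result = call_with_retry(request, TRANSFER, 5)
print(result, calls)
```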
The _bucket system space of each replica set stores the ids of buckets present
in the replica set. The space contains the following fields:
bucket – bucket id
status – state of the bucket
destination – UUID of the destination replica set
An example of _bucket.select{}:
Once the bucket is migrated, the destination replica set identified by UUID is filled in the
table. While the bucket is still located on the source replica set, the value of
the destination replica set UUID is equal to NULL.
A routing table on the router stores the map of all bucket ids to replica sets.
It ensures the consistency of sharding in case of failover.
The router keeps a persistent pool of connections to all the storages; the pool
is created at startup. This helps prevent configuration errors. Once the connection
pool is created, the router caches the current state of the routing table in order
to speed up routing. If a bucket migrated to another storage after rebalancing,
or a failover occurred and caused one of the shards to switch to another replica,
the discovery fiber on the router updates the routing table automatically.
As the bucket id is explicitly indicated both in the data and in the mapping table
on the router, the data is consistent regardless of the application logic. It also
makes rebalancing transparent for the application.
Requests to the database can be performed by the application or using stored
procedures. Either way, the bucket id should be explicitly specified in the request.
All requests are forwarded to the router first. The only operation supported
by the router is call. The operation is performed via the vshard.router.call()
function:
result = vshard.router.call(<bucket_id>, <mode>, <function_name>, {<argument_list>}, {<opts>})
Requests are processed as follows:
The router uses the bucket id to search for a replica set with the
corresponding bucket in the routing table.
If the map of the bucket id to the replica set is not known to the router
(the discovery fiber hasn’t filled the table yet), the router makes requests
to all storages to find out where the bucket is located.
Once the bucket is located, the shard checks:
- whether the bucket is stored in the _bucket system space of the replica set;
- whether the bucket is ACTIVE or PINNED (for a read request, it can also be SENDING).
If all the checks succeed, the request is executed. Otherwise, it is terminated
with the error: “wrong bucket”.
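The two checks above can be written as a short function. In this sketch, the buckets table stands in for the _bucket system space and maps bucket ids to states. mode is 'read' or 'write', as in vshard.router.call(). The function name check_bucket is hypothetical.

```lua
-- Hypothetical sketch of the storage-side checks. buckets stands in for
-- the _bucket system space: bucket id -> state. A bucket id that is not
-- present at all gives a nil state and is rejected.
local function check_bucket(buckets, bucket_id, mode)
    local state = buckets[bucket_id]
    if state == 'ACTIVE' or state == 'PINNED' then
        return true
    end
    if mode == 'read' and state == 'SENDING' then
        return true -- a SENDING bucket still serves read requests
    end
    return nil, 'wrong bucket'
end

local buckets = { [1] = 'ACTIVE', [2] = 'PINNED', [3] = 'SENDING', [4] = 'RECEIVING' }
print(check_bucket(buckets, 3, 'read'))
print(check_bucket(buckets, 3, 'write'))
```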
- Vertical scaling – adding more power to a single server: using a more powerful CPU, adding more RAM, adding more storage space, etc.
- Horizontal scaling – adding more servers to the pool of resources, then partitioning and distributing a dataset across the servers.
- Sharding – a database architecture that allows partitioning a dataset using a sharding key and distributing a dataset across multiple servers. Sharding is a special case of horizontal scaling.
- Node – a virtual or physical server instance.
- Cluster – a set of nodes that make up a single group.
- Storage – a node storing a subset of a dataset.
- Replica set – a set of storage nodes storing copies of a dataset. Each storage in a replica set has a role, master or replica.
- Master – a storage in a replica set processing read and write requests.
- Replica – a storage in a replica set processing only read requests.
- Read requests – read-only requests, that is, select requests.
- Write requests – data-change operations, that is, create, update, and delete requests.
- Buckets (virtual buckets) – the abstract virtual nodes into which the dataset is partitioned by the sharding key (bucket id).
- Bucket id – a sharding key defining which bucket a tuple belongs to; each bucket, in turn, belongs to one replica set. A bucket id may be calculated from a hash key.
- Router – a proxy server responsible for routing requests from an application to nodes in a cluster.
Sharding with vshard
Sharding in Tarantool is implemented in the vshard module.
For a quick start with vshard, refer to Creating a sharded cluster.
Note
Starting with the 3.0 version, the recommended way of configuring Tarantool is using a configuration file.
The sharding section defines configuration parameters related to sharding.
To learn how to configure vshard in code, see Configuration reference.
The vshard module is distributed separately from the main Tarantool package.
To install the module, execute the following command:
$ tt rocks install vshard
If you are developing a sharded cluster application, add the vshard module dependency to a *.rockspec file:
dependencies = {
    'vshard == 0.1.27'
}
Note
The minimum required version of vshard is 0.1.25.
Configuring settings related to sharding might include the following steps:
- Configure connection settings to allow instances within a sharded cluster to communicate with each other.
- Specify which role each replica set plays in a sharded cluster.
- Configure how data is partitioned across shards.
- Specify settings related to data rebalancing.
This section describes connection options that enable communication between instances within a sharded cluster.
For general information about connections, see the Connections topic.
In a sharded cluster configuration, you need to specify how a router and rebalancer connect to storages using the iproto.advertise.sharding option.
In the example below, the storage user is used for this purpose:
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
The storage user should have the sharding role described in the next section.
To allow a router and rebalancer to connect to storages, a user with the sharding role should be used.
The example below shows how to grant the sharding role to the storage user:
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]
The sharding role has different privileges depending on a replica set’s sharding role.
For replica sets with the storage sharding role, the sharding credential role has the following privileges:
- All privileges provided by the
replication role.
- Executing vshard.storage.* functions.
If a replica set does not have the storage sharding role, the sharding credential role does not have any privileges.
Each replica set in a sharded cluster can have one of three roles:
router: a replica set acts as a router.
storage: a replica set acts as a storage.
rebalancer: a replica set acts as a rebalancer.
You can use the sharding.roles option to assign a specific role to a replica set or group of replica sets.
In the example below, all replica sets in the storages group have the storage role while replica sets in the routers group have the router role.
groups:
  storages:
    sharding:
      roles: [storage]
    # ...
  routers:
    sharding:
      roles: [router]
    # ...
Note that the rebalancer role is optional.
If it is not specified, a rebalancer is selected automatically from the master instances of replica sets.
To specify the rebalancer manually or turn it off, use the sharding.rebalancer_mode option.
This section describes configuration settings related to data partitioning.
Learn how to define spaces to be sharded in Data definition.
To define the total number of buckets in a cluster, configure the sharding.bucket_count option at the global level.
In the example below, sharding.bucket_count is set to 1000:
sharding:
  bucket_count: 1000
sharding.bucket_count should be several orders of magnitude larger than the potential number of cluster nodes, taking possible future scaling out into account.
If the estimated number of nodes in a cluster is N, divide the dataset into 100N or even 1000N buckets, depending on the planned scaling out.
Keep in mind that too many buckets require more memory to store routing information.
On the other hand, too few buckets reduce the granularity of rebalancing.
A replica set weight defines the storage capacity of the replica set: the larger the weight, the more buckets the replica set can store.
You can configure a replica set weight using the sharding.weight option.
This option can be used to store more data on a replica set that has more memory.
You can also assign a zero weight to a replica set to initiate migration of its buckets to the remaining cluster nodes.
In the example below, the storage-a replica set can store twice as much data as storage-b:
# ...
replicasets:
  storage-a:
    sharding:
      weight: 2
    # ...
  storage-b:
    sharding:
      weight: 1
    # ...
Each replica set has an etalon number of buckets.
(Etalon in this context means “ideal”.)
If no replica set deviates from its etalon number, the buckets are distributed evenly.
The etalon number is calculated automatically from the number of buckets
in the cluster and the weights of the replica sets.
Rebalancing starts if the disbalance of a replica set
exceeds the threshold specified in the configuration
(the sharding.rebalancer_disbalance_threshold option).
The disbalance of a replica set is calculated as follows:
|etalon_bucket_number - real_bucket_number| / etalon_bucket_number * 100
For example, suppose a cluster has 3000 buckets in total and three replica sets whose weights are in the ratio 2:1:3.
In this case, the etalon numbers of buckets for the replica sets are:
- 1st replica set – 1000.
- 2nd replica set – 500.
- 3rd replica set – 1500.
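The etalon numbers listed above (1000, 500, and 1500) are consistent with 3000 buckets and weights in the ratio 2:1:3. Below is a minimal sketch of that calculation together with the disbalance formula. vshard performs this calculation itself; the function names are hypothetical, and the rounding rule (floor each share, then hand out leftover buckets one by one) is an assumption that may differ from vshard's.

```lua
-- Minimal sketch: etalon bucket numbers from replica set weights.
-- Rounding is simplified (assumption): shares are floored and any
-- leftover buckets go to the replica sets in order.
local function etalon_counts(weights, bucket_count)
    local total_weight = 0
    for _, w in ipairs(weights) do
        total_weight = total_weight + w
    end
    local counts, assigned = {}, 0
    for i, w in ipairs(weights) do
        counts[i] = math.floor(bucket_count * w / total_weight)
        assigned = assigned + counts[i]
    end
    local i = 1
    while assigned < bucket_count do
        counts[i] = counts[i] + 1
        assigned = assigned + 1
        i = i % #weights + 1
    end
    return counts
end

-- |etalon - real| / etalon * 100, computed as |etalon - real| * 100 / etalon
local function disbalance(etalon, real)
    return math.abs(etalon - real) * 100 / etalon
end

local counts = etalon_counts({ 2, 1, 3 }, 3000)
print(counts[1], counts[2], counts[3])  -- 1000   500   1500
print(disbalance(1000, 900))            -- a 10% disbalance
```

Rebalancing would start for a replica set holding 900 buckets instead of 1000 only if 10 exceeds the configured sharding.rebalancer_disbalance_threshold.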
You can set a replica set weight to zero to initiate migration of its buckets to the remaining cluster nodes.
You can also add a new replica set with a non-zero weight to initiate migration of the buckets from the existing replica sets.
When a new shard is added, the configuration should be reloaded on each instance so that buckets migrate to the new shard:
- If a centralized configuration storage is used, Tarantool reloads a changed configuration automatically.
- If a local configuration file is used, you need to reload a configuration on all the routers first and then on all the storages.
Originally, vshard had quite a simple rebalancer:
one process on one node calculated routes, that is, which nodes should send buckets, how
many, and to whom. The nodes applied these routes one by
one sequentially.
Unfortunately, such a simple scheme was not fast enough,
especially for Vinyl, where the cost of reading from disk was comparable
to the network cost. In fact, with Vinyl the applier of rebalancer routes
spent most of its time sleeping.
Now each node can send multiple buckets in parallel in a
round-robin manner to multiple destinations, or to just one.
To set the degree of parallelism, use the sharding.rebalancer_max_sending option:
sharding:
  rebalancer_max_sending: 5
Note
Specifying sharding.rebalancer_max_sending = N probably won’t give an N-times
speedup. The actual gain depends on the network, the disk, and the number of other fibers in the system.
For example, suppose you have 10 replica sets and a new one is added.
Now all 10 replica sets will try to send buckets to the new one.
Assume that each replica set can send up to 5 buckets at once. In that case,
the new replica set will experience a rather big load of 50 buckets
being downloaded at once. If the node needs to do some other
work, such a big load may be undesirable. Also, too many
parallel buckets can cause timeouts in the rebalancing process
itself.
To fix the problem, you can set a lower value for rebalancer_max_sending
for old replica sets, or decrease rebalancer_max_receiving for the new one.
In the latter case, some workers on old nodes will be throttled,
and you will see that in the logs.
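The load in this example can be estimated with a rough model: the number of buckets the new replica set downloads at once is bounded both by what all senders may send together and by its own receiving limit. This is a simplifying assumption for illustration, not how vshard schedules transfers internally, and the function name is hypothetical.

```lua
-- Rough model (assumption): concurrent incoming buckets on the new replica
-- set are limited by the total sending capacity of the old replica sets
-- and by the new replica set's rebalancer_max_receiving value.
local function concurrent_incoming(senders, max_sending, max_receiving)
    return math.min(senders * max_sending, max_receiving)
end

print(concurrent_incoming(10, 5, 100))  -- 50: every sender works at full speed
print(concurrent_incoming(10, 5, 20))   -- 20: some senders are throttled
```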
rebalancer_max_sending is also important if you have restrictions on
the maximum number of buckets that can be read-only at once in the cluster. Remember that
while a bucket is being sent, it does not accept new
write requests.
For example, you have 100000 buckets and each
bucket stores ~0.001% of your data. The cluster has 10
replica sets, and you can never afford more than 0.1% of data being locked for
writes. Then you should not set rebalancer_max_sending higher than 10 on
these nodes. This guarantees that the rebalancer won’t send more
than 100 buckets at once in the whole cluster.
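The arithmetic of this example can be written down as a small helper. It is a sketch that assumes data is spread evenly across buckets and that every replica set may be sending at the same time; the function name is hypothetical.

```lua
-- Largest rebalancer_max_sending value that keeps the share of data
-- locked for writes within a limit (assumes even data distribution and
-- that all replica sets send simultaneously).
local function max_sending_limit(total_buckets, replicaset_count, max_locked_fraction)
    -- How many buckets may be locked for writes at the same time.
    local max_locked_buckets = math.floor(total_buckets * max_locked_fraction)
    -- Split that budget evenly between the sending replica sets.
    return math.floor(max_locked_buckets / replicaset_count)
end

-- 100000 buckets, 10 replica sets, at most 0.1% (0.001) of data locked.
print(max_sending_limit(100000, 10, 0.001))  -- 10
```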
If rebalancer_max_sending is too high and rebalancer_max_receiving is too low,
then some bucket transfers will be attempted and then fail.
This wastes network resources and time. It is important to
configure these parameters so that they do not conflict with each other.
Replica set lock and bucket pin
A replica set lock (sharding.lock) makes a replica set invisible to the rebalancer: a locked
replica set can neither receive new buckets nor migrate its own buckets.
A bucket pin (vshard.storage.bucket_pin(bucket_id)) blocks a specific bucket from migrating: a pinned bucket stays on
the replica set to which it is pinned until it is unpinned.
Pinning all replica set buckets is not equivalent to locking a replica set. Even if
you pin all buckets, a non-locked replica set can still receive new buckets.
A replica set lock is helpful, for example, to separate a replica set from production
replica sets for testing, or to preserve some application metadata that must not
be sharded for a while. A bucket pin is used for similar cases but in a smaller
scope.
By both locking a replica set and pinning all buckets, you can
isolate an entire replica set.
Locked replica sets and pinned buckets affect the rebalancing algorithm as the
rebalancer must ignore locked replica sets and consider pinned buckets when
attempting to reach the best possible balance.
The issue is not trivial because a user can pin so many buckets to a replica set
that a perfect balance becomes unreachable. For example, consider the following
cluster (assume all replica set weights are equal to 1).
The initial configuration:
rs1: bucket_count = 150
rs2: bucket_count = 150, pinned_count = 120
Adding a new replica set:
rs1: bucket_count = 150
rs2: bucket_count = 150, pinned_count = 120
rs3: bucket_count = 0
The perfect balance would be 100 - 100 - 100, which is impossible since the
rs2 replica set has 120 pinned buckets. The best possible balance here is the
following:
rs1: bucket_count = 90
rs2: bucket_count = 120, pinned_count = 120
rs3: bucket_count = 90
The rebalancer moved as many buckets as possible from rs2 to decrease the
disbalance. At the same time, it respected equal weights of rs1 and rs3.
The algorithms for implementing locks and pins are completely different, although
they look similar in terms of functionality.
Replica set lock and rebalancing
Locked replica sets do not participate in rebalancing. This means that
even if the actual number of buckets on a locked replica set differs from its etalon number,
the disbalance cannot be fixed due to the lock. When the rebalancer detects that
one of the replica sets is locked, it recalculates the etalon number of buckets
of the non-locked replica sets as if the locked replica set and its buckets did
not exist at all.
Bucket pin and rebalancing
Rebalancing replica sets with pinned buckets requires a more complex algorithm.
Here, pinned_count is the number of pinned buckets, and etalon_count is
the etalon number of buckets for a replica set:
1. The rebalancer calculates the etalon number of buckets as if no buckets
were pinned. Then the rebalancer checks each replica set and compares the
etalon number of buckets with the number of pinned buckets in a replica set.
If pinned_count < etalon_count, non-locked replica sets (at this point all
locked replica sets are already filtered out) with pinned buckets can receive
new buckets.
2. If pinned_count > etalon_count, the disbalance cannot be fixed, as the
rebalancer cannot move pinned buckets out of this replica set. In such a case,
the etalon number is updated and set equal to the number of pinned buckets.
The replica sets with pinned_count > etalon_count are not processed by
the rebalancer, and the number of pinned buckets is subtracted from the
total number of buckets. The rebalancer tries to move out as many buckets as
possible from such replica sets.
3. This procedure is restarted from step 1 for the remaining replica sets
until pinned_count <= etalon_count holds on all of them. The procedure is also
restarted when the total number of buckets is changed.
Here is the pseudocode for the algorithm:
function cluster_calculate_perfect_balance(replicasets, bucket_count)
    -- rebalance the buckets using weights of the still viable replica sets --
end;
cluster = <all of the non-locked replica sets>;
bucket_count = <the total number of buckets in the cluster>;
can_reach_balance = false
while not can_reach_balance do
    can_reach_balance = true
    cluster_calculate_perfect_balance(cluster, bucket_count);
    foreach replicaset in cluster do
        if replicaset.perfect_bucket_count <
           replicaset.pinned_bucket_count then
            can_reach_balance = false
            bucket_count -= replicaset.pinned_bucket_count;
            replicaset.perfect_bucket_count =
                replicaset.pinned_bucket_count;
            -- the overloaded replica set is ignored on the next iterations --
            <exclude replicaset from cluster>;
        end;
    end;
end;
cluster_calculate_perfect_balance(cluster, bucket_count);
The complexity of the algorithm is O(N^2), where N is the number of replica sets.
On each step, the algorithm either finishes the calculation, or ignores at least
one new replica set overloaded with the pinned buckets, and updates the etalon
number of buckets on other replica sets.
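Below is a minimal runnable sketch of the pseudocode above, written in plain Lua. Each replica set is a table with weight, bucket_count, pinned_count, and an optional locked flag; these field names are assumptions for illustration. As in the complexity note above, a replica set overloaded with pinned buckets is excluded from later iterations. Rounding is ignored, so etalon numbers may be fractional.

```lua
-- Distribute bucket_count among `cluster` proportionally to the weights.
local function calculate_perfect_balance(cluster, bucket_count)
    local total_weight = 0
    for _, rs in ipairs(cluster) do
        total_weight = total_weight + rs.weight
    end
    for _, rs in ipairs(cluster) do
        rs.perfect_bucket_count = bucket_count * rs.weight / total_weight
    end
end

local function calculate_balance_with_pins(replicasets)
    -- Locked replica sets and their buckets are ignored completely.
    local cluster, bucket_count = {}, 0
    for _, rs in ipairs(replicasets) do
        if not rs.locked then
            table.insert(cluster, rs)
            bucket_count = bucket_count + rs.bucket_count
        end
    end
    local can_reach_balance = false
    while not can_reach_balance do
        can_reach_balance = true
        calculate_perfect_balance(cluster, bucket_count)
        local remaining = {}
        for _, rs in ipairs(cluster) do
            if rs.perfect_bucket_count < rs.pinned_count then
                -- Overloaded with pins: keep the pinned buckets here and
                -- exclude this replica set from the next iteration.
                can_reach_balance = false
                bucket_count = bucket_count - rs.pinned_count
                rs.perfect_bucket_count = rs.pinned_count
            else
                table.insert(remaining, rs)
            end
        end
        cluster = remaining
    end
    return replicasets
end

-- The rs1/rs2/rs3 example from the text, all weights equal to 1.
local rs1 = { weight = 1, bucket_count = 150, pinned_count = 0 }
local rs2 = { weight = 1, bucket_count = 150, pinned_count = 120 }
local rs3 = { weight = 1, bucket_count = 0, pinned_count = 0 }
calculate_balance_with_pins({ rs1, rs2, rs3 })
print(rs1.perfect_bucket_count, rs2.perfect_bucket_count, rs3.perfect_bucket_count)
```

For the example above, the first iteration gives 100 buckets to each replica set, which is below the 120 pinned buckets of rs2; the second iteration splits the remaining 180 buckets between rs1 and rs3, giving 90, 120, and 90.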
Bucket ref is an in-memory counter that is similar to the
bucket pin, but has the following differences:
- Bucket ref is not persistent. Refs are intended for forbidding bucket transfer
during request execution, but on restart all requests are dropped.
- There are two types of bucket refs: read-only (RO) and read-write (RW).
If a bucket has RW refs, it cannot be moved. However, when the rebalancer
needs it to be sent, it locks the bucket for new write requests, waits
until all current requests are finished, and then sends the bucket.
If a bucket has RO refs, it can be sent, but cannot be dropped. Such a
bucket can even enter the GARBAGE or SENT state, but its data is kept until
the last reader is gone.
- A single bucket can have both RO and RW refs.
- Bucket refs are counted: a bucket can hold several refs of each type at the same time.
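The RO/RW rules above can be illustrated with a small in-memory model. This is a hypothetical sketch that mirrors the described behavior, not vshard's implementation; the BucketRefs name, its methods, and the 'read'/'write' mode strings are assumptions.

```lua
-- Hypothetical in-memory model of RO/RW bucket refs (not vshard's code).
local BucketRefs = {}
BucketRefs.__index = BucketRefs

function BucketRefs.new()
    return setmetatable({ ro = 0, rw = 0, rw_locked = false }, BucketRefs)
end

-- mode is 'read' (RO ref) or 'write' (RW ref).
function BucketRefs:ref(mode)
    if mode == 'read' then
        self.ro = self.ro + 1
    elseif mode == 'write' then
        if self.rw_locked then
            -- The bucket is being prepared for sending: no new writes.
            return false, 'bucket is locked for writes'
        end
        self.rw = self.rw + 1
    else
        error('unknown mode: ' .. tostring(mode))
    end
    return true
end

function BucketRefs:unref(mode)
    local field = (mode == 'read' and 'ro') or (mode == 'write' and 'rw')
        or error('unknown mode: ' .. tostring(mode))
    assert(self[field] > 0, 'unref without a matching ref')
    self[field] = self[field] - 1
end

-- Before sending, the rebalancer blocks new write requests...
function BucketRefs:lock_for_send()
    self.rw_locked = true
end

-- ...and may send the bucket only once all current RW refs are released.
function BucketRefs:can_send()
    return self.rw == 0
end

-- Data of a sent bucket may be dropped only after the last reader is gone.
function BucketRefs:can_drop()
    return self.ro == 0
end
```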
The vshard.storage.bucket_ref/unref() methods
are called automatically when vshard.router.call()
or vshard.storage.call() is used.
For the raw API, such as r = vshard.router.route() followed by r:callro() or r:callrw(), you should
explicitly call the bucket_ref() method inside the function. Also, make sure
that you call bucket_unref() after bucket_ref(); otherwise, the bucket
cannot be moved from the storage until the instance is restarted.
To see how many refs there are for a bucket, use
vshard.storage.buckets_info([bucket_id])
(the bucket_id parameter is optional).
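With the raw API, it is easy to forget bucket_unref() on an error path. The sketch below shows a hypothetical helper, with_bucket_ref, that takes a ref, runs a function, and always releases the ref. It assumes that vshard.storage.bucket_ref(bucket_id, mode) and vshard.storage.bucket_unref(bucket_id, mode) accept the modes 'read' and 'write', and that bucket_ref() returns a truthy value on success or nil and an error otherwise. The storage module is passed in as a parameter so that the helper can be exercised with a stub outside Tarantool.

```lua
local unpack = table.unpack or unpack

-- Packs all return values, including trailing nils.
local function pack(...)
    return { n = select('#', ...), ... }
end

-- Hypothetical helper: calls fn(...) while holding a bucket ref and releases
-- the ref afterwards even if fn raises an error. On a storage instance,
-- `storage` would be vshard.storage.
local function with_bucket_ref(storage, bucket_id, mode, fn, ...)
    local ok, err = storage.bucket_ref(bucket_id, mode)
    if not ok then
        return nil, err
    end
    local res = pack(pcall(fn, ...))
    storage.bucket_unref(bucket_id, mode)
    if not res[1] then
        -- Re-raise the original error after the ref has been released.
        error(res[2], 0)
    end
    return unpack(res, 2, res.n)
end

-- Possible use in a storage function called through the raw API
-- (get_band is a hypothetical function name):
-- function get_band(bucket_id, id)
--     return with_bucket_ref(vshard.storage, bucket_id, 'read', function()
--         return box.space.bands:get(id)
--     end)
-- end
```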
Defining and manipulating data
Sharded spaces should be defined in a storage application inside box.once() and should have a field with bucket id values.
This field should meet the following requirements:
- The field’s data type can be
unsigned, number, or integer.
- The field must be non-nullable.
- The field must be indexed by the shard_index. The default name for this index is
bucket_id.
In the example below, the bands space has the bucket_id field, which is used to partition a dataset across different storage instances:
box.once('bands', function()
    box.schema.create_space('bands', {
        format = {
            { name = 'id', type = 'unsigned' },
            { name = 'bucket_id', type = 'unsigned' },
            { name = 'band_name', type = 'string' },
            { name = 'year', type = 'unsigned' }
        },
        if_not_exists = true
    })
    box.space.bands:create_index('id', { parts = { 'id' }, if_not_exists = true })
    box.space.bands:create_index('bucket_id', { parts = { 'bucket_id' }, unique = false, if_not_exists = true })
end)
Example on GitHub: sharded_cluster
Note
In a sharded space, uniqueness by secondary index is only guaranteed within a single shard, not across the whole cluster.
All DML operations with data should be performed via a router using the vshard.router.call functions, such as vshard.router.callrw() or vshard.router.callro().
For example, a storage application has the insert_band function used to insert new tuples:
function insert_band(id, bucket_id, band_name, year)
    box.space.bands:insert({ id, bucket_id, band_name, year })
end
In a router application, you can define the put function that specifies how a router selects the storage to write data:
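function put(id, band_name, year)
    -- Calculate the bucket id from the sharding key
    local bucket_id = vshard.router.bucket_id_mpcrc32({ id })
    -- Route the write request to the master of the replica set that stores this bucket
    vshard.router.callrw(bucket_id, 'insert_band', { id, bucket_id, band_name, year })
end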
Learn more at Processing requests.
Deduplication of non-idempotent requests
Idempotent requests produce the same result every time they are executed.
For example, a data read request or a multiplication by one are both idempotent.
By contrast, incrementing by one is an example of a non-idempotent operation.
When such an operation is applied twice, the value of the field increases by 2 instead of just 1.
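A toy illustration of the difference: re-applying an idempotent operation leaves the result unchanged, while re-applying a non-idempotent one does not. The function names are hypothetical, and the double call simply simulates a request that is retried after it was already applied.

```lua
local function multiply_by_one(x) return x * 1 end
local function increment(x) return x + 1 end

-- Simulates a request that is retried once after it was already applied.
local function apply_with_retry(op, value)
    return op(op(value))
end

print(apply_with_retry(multiply_by_one, 5))  -- 5, same as a single call
print(apply_with_retry(increment, 5))        -- 7, instead of 6
```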
Note
Any write requests that are intended to be executed repeatedly (for example, retried after an error) should be idempotent.
The operations’ idempotency ensures that the change is applied only once.
A request may need to be run again if an error occurs on the server or client side.
In this case:
Read requests can be executed repeatedly.
For this purpose, vshard.router.call() (with mode=read) uses the request_timeout parameter
(since vshard 0.1.28).
It is necessary to pass the request_timeout and timeout parameters together, with the following requirement:
timeout > request_timeout
For example, if timeout = 10 and request_timeout = 2,
within 10 seconds the router can make up to 5 attempts (2 seconds each) to send a request to different replicas
until the request succeeds.
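The timeout requirement above can be captured in a small hypothetical helper that validates the pair of values and counts how many attempts fit into timeout when every attempt runs until request_timeout expires. The commented call at the end assumes the vshard.router.call(bucket_id, mode, function_name, args, opts) form; get_band is a hypothetical storage function.

```lua
-- Hypothetical helper: validates the timeout pair and counts attempts.
local function read_call_opts(timeout, request_timeout)
    assert(timeout > request_timeout,
           'timeout must be greater than request_timeout')
    local opts = { timeout = timeout, request_timeout = request_timeout }
    -- Number of attempts if each one runs until request_timeout expires;
    -- attempts that fail faster leave room for more retries.
    local max_attempts = math.floor(timeout / request_timeout)
    return opts, max_attempts
end

local opts, attempts = read_call_opts(10, 2)
print(attempts)  -- 5

-- On a router, the options could then be passed to a read call:
-- vshard.router.call(bucket_id, 'read', 'get_band', { id }, opts)
```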
Write requests (vshard.router.callrw()) generally cannot be re-executed without verifying
that they have not been applied before.
Lack of such a check might lead to duplicate records or unplanned data changes.
For example, a client sends a request to the server and waits for a response within a specified timeout.
If the server sends a successful response after this timeout has elapsed,
the client won’t see this response and will consider the request failed.
If the client re-executes this request without an additional check, the operation may be applied twice.
A write request can be executed repeatedly without a check in two cases:
- The request is idempotent.
- It’s known for sure that the previous request raised an error before executing any write operations.
For example, if the server raised ER_READONLY, the request could not complete because the server was in read-only mode.
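The two cases above can be expressed as a hypothetical retry policy. In this sketch, err is assumed to be an error object with a numeric code field, and before_write_codes is a set of error codes known to be raised before any write is applied. On a Tarantool instance, such a set could be built as { [box.error.READONLY] = true }.

```lua
-- Hypothetical retry policy for write requests.
local function can_retry_write(is_idempotent, err, before_write_codes)
    if is_idempotent then
        -- Re-applying an idempotent request is always safe.
        return true
    end
    -- Otherwise, retry only if the error is known to precede any write.
    return err ~= nil and before_write_codes[err.code] == true
end
```

A timeout, for example, does not belong in this set: the request may or may not have been applied, so only an idempotent request can safely be retried after it.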
Deduplication examples
To ensure that the write requests (INSERT, UPDATE, UPSERT, and autoincrement) are idempotent,
you should implement a check that the request is applied for the first time.
Note
There is no built-in deduplication check in Tarantool.
Currently, deduplication can only be implemented by the user in the application code.
For example, when you add a new tuple to a space, you can use a unique insert ID to check the request.
In the example below, within a single transaction:
- It is checked whether a tuple with the key ID exists in the bands space.
- If there is no tuple with this ID in the space, the tuple is inserted.
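function insert_once(key, value)
    box.begin()
    -- Insert the tuple only if there is no tuple with this key yet
    if box.space.bands:get{key} == nil then
        box.space.bands:insert{key, value}
    end
    box.commit()
end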
For update and upsert requests, you can create a deduplication space where the request IDs will be saved.
A deduplication space is a user space that contains a list of unique identifiers.
Each identifier corresponds to one applied request.
This space can have any name; in the example below, it is called deduplication.
In the example below, within a single transaction:
- It is checked whether the deduplication_key request ID exists in the deduplication space.
- If there is no such ID, the ID is added to the deduplication space.
- If the request hasn’t been applied before, the specified field in the bands space is incremented by one.
This approach ensures that each data modification request will be executed only once.
function update_1(deduplication_key, key)
    box.begin()
    -- Apply the update only if this request ID has not been seen before
    if box.space.deduplication:get{deduplication_key} == nil then
        box.space.deduplication:insert{deduplication_key}
        box.space.bands:update(key, {{'+', 'value', 1 }})
    end
    box.commit()
end
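The effect of update_1 can be reproduced in a pure-Lua model in which plain tables stand in for the bands and deduplication spaces. This is an assumption made for illustration; the atomicity that box.begin() and box.commit() provide in Tarantool is not modeled here.

```lua
-- Plain tables standing in for the bands and deduplication spaces.
local bands = { [1] = { value = 0 } }
local deduplication = {}

local function update_1(deduplication_key, key)
    if deduplication[deduplication_key] == nil then
        -- First time this request ID is seen: record it and apply the change.
        deduplication[deduplication_key] = true
        bands[key].value = bands[key].value + 1
    end
end

update_1('req-1', 1)
update_1('req-1', 1)  -- a retry of the same request is ignored
update_1('req-2', 1)  -- a new request is applied
print(bands[1].value)  -- 2
```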
Sharded cluster maintenance
If a replica set master fails, it is recommended to:
- Switch one of the replicas into the master mode. This allows the new master
to process all the incoming requests.
- Update the configuration of all the cluster members. This forwards all the
requests to the new master.
In case a whole replica set fails, some part of the dataset becomes inaccessible.
Meanwhile, the router tries to reconnect to the master of the failed replica set.
This way, once the replica set is up and running again, the cluster is automatically restored.
Master scheduled downtime
To perform a scheduled downtime of a replica set master, it is recommended to:
- Update the configuration to use another instance as a master.
- Reload the configuration on all the instances. All requests are then forwarded to the new master.
- Shut down the old master.
Replica set scheduled downtime
To perform a scheduled downtime of a replica set, it is recommended to:
- Migrate all the buckets to the other cluster storages.
You can do this by assigning a zero weight to a replica set to initiate migration of its buckets to the remaining cluster nodes.
- Update the configuration of all the nodes.
- Shut down the replica set.
Searches for buckets, buckets recovery, and buckets rebalancing are performed
automatically and do not require manual intervention.
Technically, there are multiple fibers responsible for different types of
operations:
- a discovery fiber on the router searches for buckets in the background
- a failover fiber on the router maintains replica connections
- a garbage collector fiber on each master storage removes the contents
of buckets that were moved
- a bucket recovery fiber on each master storage recovers buckets in the
SENDING and RECEIVING states in case of reboot
- a rebalancer on a single master storage among all replica sets executes the rebalancing process.
See the Rebalancing process and
Migration of buckets sections for details.
A garbage collector fiber runs in the background on the master storages
of each replica set. It starts deleting the contents of the bucket in the GARBAGE
state part by part. Once the bucket is empty, its record is deleted from the
_bucket system space.
A bucket recovery fiber runs on the master storages. It helps to recover
buckets in the SENDING and RECEIVING states in case of reboot.
Buckets in the SENDING state are recovered as follows:
- The system first searches for buckets in the SENDING state.
- If such a bucket is found, the system sends a request to the destination
replica set.
- If the bucket on the destination replica set is ACTIVE, the original bucket
is deleted from the source node.
Buckets in the RECEIVING state are deleted without extra checks.
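The recovery rules above can be summarized as a decision function. This is a hypothetical sketch using lowercase status strings; it covers only the cases described in the text, and any other combination is reported as 'unchanged' because this sketch does not model how vshard handles it.

```lua
-- Hypothetical sketch of the recovery decision after a reboot.
local function recovery_action(local_status, remote_status)
    if local_status == 'receiving' then
        -- RECEIVING buckets are deleted without extra checks.
        return 'delete'
    end
    if local_status == 'sending' and remote_status == 'active' then
        -- The destination already owns the bucket: drop the local copy.
        return 'delete'
    end
    -- Not covered by the description above; not modeled here.
    return 'unchanged'
end
```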
A failover fiber runs on every router. If a master of a replica set
becomes unavailable, the failover fiber redirects read requests to the replicas.
Write requests are rejected with an error until the master becomes available.
Connections and authentication
This section contains guides on how to configure connections and authentication features.
Connections
To set up a Tarantool cluster, you need to enable communication between its instances, regardless of whether they are running on one host or on different hosts.
This requires configuring connection settings that include:
- One or several URIs used to listen for incoming requests.
- A URI used to advertise an instance to other cluster members. This URI lets other cluster members know how to connect to the current Tarantool instance.
- (Optional) SSL settings used to secure connections between instances.
Configuring connection settings is also required to enable communication between a Tarantool cluster and external systems.
For example, this might be administering cluster members using tt, managing clusters using Tarantool Cluster Manager, or using connectors for different languages.
This topic describes how to define connection settings in the iproto section of a YAML configuration.
Note
iproto is a binary protocol used to communicate between cluster instances and with external systems.
To configure URIs used to listen for incoming requests, use the iproto.listen configuration option.
The example below shows how to set a listening IP address for instance001 to 127.0.0.1:3301:
instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
Multiple listen addresses
In this example, instance001 listens on two ports of the same IP address:
instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
    - uri: '127.0.0.1:3302'
You can pass only a port value to iproto.listen:
instance001:
  iproto:
    listen:
    - uri: '3301'
In this case, this port is used for all IP addresses the server listens on.
In the Enterprise Edition, you can enable SSL for a connection using the params section of the specified URI:
instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
      params:
        transport: 'ssl'
        ssl_cert_file: 'certs/server.crt'
        ssl_key_file: 'certs/server.key'
Learn more from Securing connections with SSL.
For local development, you can enable communication between cluster members by using Unix domain sockets:
instance001:
  iproto:
    listen:
    - uri: 'unix/:./var/run/{{ instance_name }}/tarantool.iproto'
An advertise URI (iproto.advertise.*) lets other cluster members or clients know how to connect to the current Tarantool instance:
- iproto.advertise.peer specifies how to advertise the instance to other cluster members.
- iproto.advertise.sharding specifies how to advertise the instance to a router and rebalancer.
- iproto.advertise.client accepts a URI used to advertise the instance to clients.
iproto.advertise.<peer_or_sharding> might include the credentials required to connect to this instance, a URI used to listen for incoming requests, and SSL settings.
If iproto.advertise.<peer_or_sharding>.uri is not specified explicitly, a listen URI of this instance is used.
In this case, you need to at least specify credentials for connecting to this instance.
In the example below, the iproto.advertise.peer option is used to inform other replica set members that the replicator user should be used to connect to the current instance:
iproto:
  advertise:
    peer:
      login: replicator
In a sharded cluster, iproto.advertise.sharding specifies that a router and rebalancer should use the storage user to connect to storages:
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
If required, you can specify an advertise URI explicitly by setting up the iproto.advertise.<peer_or_sharding>.uri option.
In the example below, iproto.listen includes two URIs that can be used to connect to instance001 but only the second one is used to advertise this instance to other replica set peers:
instance001:
  iproto:
    listen:
    - uri: '127.0.0.1:3301'
    - uri: '127.0.0.1:4401'
    advertise:
      peer:
        uri: '127.0.0.1:4401'
The iproto.advertise.<peer_or_sharding>.uri option can also accept an FQDN instead of an IP address:
instance001:
  iproto:
    listen:
    - uri: '192.168.0.101:3301'
    advertise:
      peer:
        uri: 'server001.example.com:3301'
To learn about the specifics of configuring an advertise URI’s SSL settings, see Advertise URI specifics.
Securing connections with SSL
Tarantool supports the use of SSL connections to encrypt client-server communications for increased security.
To enable SSL, use the <uri>.params.* options, which can be applied to both listen and advertise URIs.
The example below demonstrates how to enable traffic encryption by using a self-signed server certificate.
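The following parameters are specified for each instance:
- transport: set to ssl to enable SSL for the connection.
- ssl_cert_file: a path to an SSL certificate file.
- ssl_key_file: a path to a private SSL key file.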
instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'
  instance003:
    iproto:
      listen:
      - uri: '127.0.0.1:3303'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/server.crt'
          ssl_key_file: 'certs/server.key'
You can find the full example here: ssl_without_ca.
The example below demonstrates how to enable traffic encryption by using a server certificate signed by a trusted certificate authority.
In this case, all replica set peers verify each other for authenticity.
The following parameters are specified for each instance:
- ssl_ca_file: a path to a trusted certificate authorities (CA) file.
- ssl_cert_file: a path to an SSL certificate file.
- ssl_key_file: a path to a private SSL key file.
- ssl_password (instance001): a password for an encrypted private SSL key.
- ssl_password_file (instance002 and instance003): a text file containing passwords for encrypted SSL keys.
- ssl_ciphers: a colon-separated list of SSL cipher suites the connection can use.
instances:
  instance001:
    iproto:
      listen:
      - uri: '127.0.0.1:3301'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance001/server001.crt'
          ssl_key_file: 'certs/instance001/server001.key'
          ssl_password: 'qwerty'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'
  instance002:
    iproto:
      listen:
      - uri: '127.0.0.1:3302'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance002/server002.crt'
          ssl_key_file: 'certs/instance002/server002.key'
          ssl_password_file: 'certs/ssl_passwords.txt'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'
  instance003:
    iproto:
      listen:
      - uri: '127.0.0.1:3303'
        params:
          transport: 'ssl'
          ssl_ca_file: 'certs/root_ca.crt'
          ssl_cert_file: 'certs/instance003/server003.crt'
          ssl_key_file: 'certs/instance003/server003.key'
          ssl_password_file: 'certs/ssl_passwords.txt'
          ssl_ciphers: 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256'
You can find the full example here: ssl_with_ca.
SSL parameters for an advertise URI should be set only if this advertise URI is specified explicitly.
Otherwise, SSL parameters of a listen URI are used and no additional configuration is required.
Configuring an advertise URI’s SSL options depends on whether a trusted certificate authorities (CA) file is set or not.
Without the CA file, you only need to set iproto.advertise.<peer_or_sharding>.params.transport to ssl as shown below:
instance001:
  iproto:
    listen:
    - uri: '192.168.0.101:3301'
      params:
        transport: 'ssl'
        ssl_cert_file: 'certs/server.crt'
        ssl_key_file: 'certs/server.key'
    advertise:
      peer:
        uri: 'server.example.com:3301'
        params:
          transport: 'ssl'
If the CA file is specified for a listen URI, you also need to configure ssl_cert_file and ssl_key_file for this advertise URI:
instance001:
  iproto:
    listen:
    - uri: '192.168.0.101:3301'
      params:
        transport: 'ssl'
        ssl_ca_file: 'certs/root_ca.crt'
        ssl_cert_file: 'certs/instance001/server001.crt'
        ssl_key_file: 'certs/instance001/server001.key'
    advertise:
      peer:
        uri: 'server001.example.com:3301'
        params:
          transport: 'ssl'
          ssl_cert_file: 'certs/instance001/server001.crt'
          ssl_key_file: 'certs/instance001/server001.key'
To reload SSL certificate files specified in the configuration, open an admin console and reload the configuration using config:reload():
require('config'):reload()
New certificates are used for new connections.
Existing connections continue using the old SSL certificates until a reconnection is required,
for example, because of certificate expiry or a network issue.
Credentials
Tarantool enables flexible management of access to various database resources by providing specific privileges to users.
You can read more about the main concepts of Tarantool access control system in the Access control section.
This topic describes how to create users and grant them the specified privileges in the credentials section of a YAML configuration.
For example, you can define users with the replication and sharding roles to maintain replication and sharding in a Tarantool cluster.
You can create new users or configure credentials of existing users in the credentials.users section.
In the example below, a dbadmin user without a password is created:
credentials:
  users:
    dbadmin: {}
To set a password, use the credentials.users.<username>.password option:
credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'
Granting privileges to a user
To assign a role to a user, use the credentials.users.<username>.roles option.
In this example, the dbadmin user gets privileges granted to the super built-in role:
credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'
      roles: [ super ]
To create a new role, define it in the credentials.roles.* section.
In the example below, the writers_space_reader role gets privileges to select data in the writers space:
roles:
  writers_space_reader:
    privileges:
    - permissions: [ read ]
      spaces: [ writers ]
Then, you can assign this role to a user using credentials.users.<username>.roles (sampleuser in the example below):
sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
You can grant specific privileges directly using credentials.users.<username>.privileges.
In this example, sampleuser gets privileges to select and modify data in the books space:
sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]
You can find the full example here: credentials.
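To see how role privileges and direct privileges combine for sampleuser, consider the simplified plain-Lua model below. It assumes that effective privileges are the union of the user's role privileges and its direct privileges. The names `credentials`, `collect`, and `effective_privileges` are hypothetical. Tarantool's real access control is richer than this, with more object types and nested roles.

```lua
-- Plain-Lua mirror of the YAML above (hypothetical data structure).
local credentials = {
  roles = {
    writers_space_reader = {
      privileges = { { permissions = { 'read' }, spaces = { 'writers' } } },
    },
  },
  users = {
    sampleuser = {
      roles = { 'writers_space_reader' },
      privileges = { { permissions = { 'read', 'write' }, spaces = { 'books' } } },
    },
  },
}

-- Add every "<space>:<permission>" pair from a privileges list to `set`.
local function collect(privileges, set)
  for _, priv in ipairs(privileges or {}) do
    for _, space in ipairs(priv.spaces or {}) do
      for _, perm in ipairs(priv.permissions or {}) do
        set[space .. ':' .. perm] = true
      end
    end
  end
end

-- Union of role privileges and direct privileges.
-- Built-in roles such as super are not modeled and are skipped.
local function effective_privileges(creds, username)
  local user = creds.users[username]
  local set = {}
  for _, role_name in ipairs(user.roles or {}) do
    local role = creds.roles[role_name]
    if role then collect(role.privileges, set) end
  end
  collect(user.privileges, set)
  return set
end

local privs = effective_privileges(credentials, 'sampleuser')
```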
Revoking privileges from a user
To revoke a previously granted privilege, remove it from the configuration.
For example, here is how to grant privileges to a space and how to revoke one of the privileges:
# grant privileges:
privileges:
- permissions: [read, write]
  spaces: [books]
# revoke a privilege:
privileges:
- permissions: [read] # !! write permission revoked !!
  spaces: [books]
If you want to revoke the remaining privilege from a space, you can remove it, too, thus making permissions an empty array:
# empty permissions array:
privileges:
- permissions: [] # !! read permission revoked !!
  spaces: [books]
You can revoke all privileges by making the privileges an empty array:
# empty privileges array:
privileges: [] # !! no privileges at all !!
Warning
Do not remove a user or a role from the configuration in order to revoke that user's or role's privileges. If a user or a role is entirely
removed from the configuration, it is no longer tracked by the configuration machinery. The user or role is not removed, and its privileges are not revoked.
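The revocation behavior, including the pitfall in the warning, can be illustrated with the plain-Lua model below. It is a sketch and not Tarantool's implementation, and `to_set` and `revoked` are hypothetical names. Permissions present in the old configuration of a tracked user but missing from the new one are treated as revoked. A user that is missing from the new configuration is no longer tracked, so nothing is revoked for it.

```lua
-- Flatten a privileges list into a set of "<space>:<permission>" keys.
local function to_set(privileges)
  local set = {}
  for _, priv in ipairs(privileges or {}) do
    for _, space in ipairs(priv.spaces or {}) do
      for _, perm in ipairs(priv.permissions or {}) do
        set[space .. ':' .. perm] = true
      end
    end
  end
  return set
end

-- Return a sorted list of permissions revoked for `username`.
local function revoked(old_users, new_users, username)
  local new_user = new_users[username]
  if new_user == nil then
    -- Removed from the configuration: no longer tracked, nothing is revoked.
    return {}
  end
  local old_user = old_users[username]
  local old_set = to_set(old_user and old_user.privileges)
  local new_set = to_set(new_user.privileges)
  local result = {}
  for key in pairs(old_set) do
    if not new_set[key] then table.insert(result, key) end
  end
  table.sort(result)
  return result
end

local old = { sampleuser = { privileges = {
  { permissions = { 'read', 'write' }, spaces = { 'books' } } } } }
local new = { sampleuser = { privileges = {
  { permissions = { 'read' }, spaces = { 'books' } } } } }
local r = revoked(old, new, 'sampleuser') -- { 'books:write' }
```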
Loading secrets from safe storage
Tarantool enables you to load secrets from safe storage such as external files or environment variables.
To do this, you need to define corresponding options in the config.context section.
In the examples below, context.dbadmin_password and context.sampleuser_password define how to load user passwords from *.txt files or environment variables:
This example shows how to load passwords from *.txt files:
config:
  context:
    dbadmin_password:
      from: file
      file: secrets/dbadmin_password.txt
      rstrip: true
    sampleuser_password:
      from: file
      file: secrets/sampleuser_password.txt
      rstrip: true
This example shows how to load passwords from environment variables:
config:
  context:
    dbadmin_password:
      from: env
      env: DBADMIN_PASSWORD
    sampleuser_password:
      from: env
      env: SAMPLEUSER_PASSWORD
These environment variables should be set before starting instances.
After configuring how to load passwords, you can set password values using credentials.users.<username>.password as follows:
credentials:
  users:
    dbadmin:
      password: '{{ context.dbadmin_password }}'
    sampleuser:
      password: '{{ context.sampleuser_password }}'
You can find the full examples here: credentials_context_file, credentials_context_env.
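The sketch below shows, in plain Lua, what the from, file, env, and rstrip options express. It also shows how a {{ context.<name> }} placeholder is then substituted. This is not the config module's implementation, and `load_secret` and `render` are hypothetical helpers. The sketch assumes that rstrip removes trailing whitespace, such as a final newline in the secret file.

```lua
-- Load a secret as described by a config.context entry (sketch).
local function load_secret(spec)
  if spec.from == 'file' then
    local f = assert(io.open(spec.file, 'r'))
    local value = f:read('*a')
    f:close()
    if spec.rstrip then
      value = value:gsub('%s+$', '') -- drop trailing whitespace/newline
    end
    return value
  elseif spec.from == 'env' then
    return os.getenv(spec.env) -- nil if the variable is not set
  end
  error('unsupported source: ' .. tostring(spec.from))
end

-- Replace '{{ context.<name> }}' placeholders with loaded values.
local function render(template, context)
  return (template:gsub('{{%s*context%.([%w_]+)%s*}}', function(name)
    return context[name]
  end))
end

-- Usage: write a temporary secret file, load it, and render a value.
local path = os.tmpname()
local f = assert(io.open(path, 'w'))
f:write('T0p_Secret_P@$$w0rd\n')
f:close()
local context = {
  dbadmin_password = load_secret({ from = 'file', file = path, rstrip = true }),
}
local rendered = render("password: '{{ context.dbadmin_password }}'", context)
os.remove(path)
```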
Authentication
Authentication restrictions
Tarantool Enterprise Edition provides the ability to apply additional restrictions for user authentication.
For example, you can specify the minimum time between authentication attempts
or turn off access for guest users.
In the configuration below, security.auth_retries is set to 2,
which means that Tarantool lets a client try to authenticate with the same username three times.
At the fourth attempt, the authentication delay configured with security.auth_delay is enforced.
This means that a client should wait 10 seconds after the first failed attempt.
security:
  auth_delay: 10
  auth_retries: 2
  disable_guest: true
The disable_guest option turns off access over remote connections from unauthenticated or guest users.
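The interaction of auth_retries and auth_delay can be modeled with a small state machine. The plain-Lua sketch below is a simplified model, not Tarantool's code, and `new_auth_guard` is a hypothetical name. It assumes the failure counter resets once auth_delay seconds have passed since the first failed attempt. With auth_retries = 2 and auth_delay = 10, the model allows three immediate attempts and blocks the fourth until 10 seconds after the first failure. Time is passed in explicitly, in seconds.

```lua
-- Simplified model of security.auth_retries / security.auth_delay.
local function new_auth_guard(auth_retries, auth_delay)
  local failures, first_failure = 0, nil
  local guard = {}

  -- Returns true if an attempt made at time `now` may proceed.
  function guard.may_attempt(now)
    if first_failure ~= nil and now - first_failure >= auth_delay then
      failures, first_failure = 0, nil -- assumed reset after the delay
    end
    -- The first auth_retries + 1 attempts are never delayed.
    return failures <= auth_retries
  end

  -- Record the outcome of an attempt made at time `now`.
  function guard.record(now, ok)
    if ok then
      failures, first_failure = 0, nil
    else
      failures = failures + 1
      first_failure = first_failure or now
    end
  end

  return guard
end

local g = new_auth_guard(2, 10)
for t = 0, 2 do -- three failed attempts at t = 0, 1, 2
  assert(g.may_attempt(t))
  g.record(t, false)
end
```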
A password policy allows you to improve database security by enforcing the use
of strong passwords, setting up a maximum password age, and so on.
When you create a new user with
box.schema.user.create
or update the password of an existing user with
box.schema.user.passwd,
the password is checked against the configured password policy settings.
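In the example below, the password policy requires passwords that are at least 16 characters long and contain lowercase letters, uppercase letters, digits, and special characters.
A password expires after 365 days, and a user cannot reuse any of their last 3 passwords:
security:
  password_min_length: 16
  password_enforce_lowercase: true
  password_enforce_uppercase: true
  password_enforce_digits: true
  password_enforce_specialchars: true
  password_lifetime_days: 365
  password_history_length: 3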
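Before calling box.schema.user.passwd, a client might pre-check a candidate password against the character-class and length options above. The plain-Lua sketch below does this with a hypothetical `check_password` helper. It treats any non-alphanumeric character as "special", which is an assumption about the server's definition. It does not model password_lifetime_days or password_history_length, because those depend on server-side state.

```lua
local policy = {
  password_min_length = 16,
  password_enforce_lowercase = true,
  password_enforce_uppercase = true,
  password_enforce_digits = true,
  password_enforce_specialchars = true,
}

-- Returns ok, list_of_problems (hypothetical client-side pre-check).
local function check_password(password, p)
  local problems = {}
  if #password < p.password_min_length then
    table.insert(problems, 'too short')
  end
  if p.password_enforce_lowercase and not password:find('%l') then
    table.insert(problems, 'no lowercase letter')
  end
  if p.password_enforce_uppercase and not password:find('%u') then
    table.insert(problems, 'no uppercase letter')
  end
  if p.password_enforce_digits and not password:find('%d') then
    table.insert(problems, 'no digit')
  end
  -- Assumption: any non-alphanumeric character counts as special.
  if p.password_enforce_specialchars and not password:find('[^%w]') then
    table.insert(problems, 'no special character')
  end
  return #problems == 0, problems
end

print(check_password('T0p_Secret_P@$$w0rd', policy))
```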
By default, Tarantool uses the
CHAP
protocol to authenticate users and applies SHA-1 hashing to
passwords.
Note that CHAP stores password hashes in the _user space unsalted.
If an attacker gains access to the database, they may crack a password, for example, using a rainbow table.
In the Enterprise Edition, you can enable
PAP authentication
with the SHA256 hashing algorithm.
For PAP, a password is salted with a user-unique salt before saving it in the database,
which keeps the database protected from cracking using a rainbow table.
To enable PAP, specify the security.auth_type option as follows:
security:
  auth_type: 'pap-sha256'
For new users, the box.schema.user.create method generates authentication data using PAP-SHA256.
For existing users, you need to reset a password using
box.schema.user.passwd
to use the new authentication protocol.
Warning
Given that PAP transmits a password as plain text,
Tarantool requires configuring SSL/TLS
for a connection.
The example below shows how to specify the authentication protocol using the auth_type parameter when connecting to an instance using net.box:
local connection = require('net.box').connect({
    uri = 'admin:topsecret@127.0.0.1:3301',
    params = { auth_type = 'pap-sha256',
               transport = 'ssl',
               ssl_cert_file = 'certs/server.crt',
               ssl_key_file = 'certs/server.key' }
})
If the authentication protocol isn’t specified explicitly on the client side,
the client uses the protocol configured on the server via security.auth_type.
Security
This section contains guides related to security features.
Audit module
Example on GitHub: audit_log
The audit module allows you to record various events that occur in Tarantool.
Each event is an action related to authorization and authentication, data manipulation,
administrator activity, or system events.
The module provides detailed reports of these activities and helps you find and
fix breaches to protect your business. For example, you can see who created a new user
and when.
It is up to each company to decide exactly what activities to audit and what actions to take.
System administrators, security engineers, and people in charge of the company may want to
audit different events for different reasons. Tarantool lets each of them choose which events to record.
Examples of audit log entries
In this example, the following audit log configuration is used:
audit_log:
  to: file
  file: 'audit_tarantool.log'
  filter: [ user_create,data_operations,ddl,custom ]
  format: json
  spaces: [ bands ]
  extract_key: true
Create a space bands and check the logs in the file after the creation:
box.schema.space.create('bands')
The audit log entry for the space_create event might look as follows:
{
"time": "2024-01-24T11:43:21.566+0300",
"uuid": "26af0a7d-1052-490a-9946-e19eacc822c9",
"severity": "INFO",
"remote": "unix/:(socket)",
"session_type": "console",
"module": "tarantool",
"user": "admin",
"type": "space_create",
"tag": "",
"description": "Create space Bands"
}
Then insert a tuple into the space:
box.space.bands:insert { 1, 'Roxette', 1986 }
If the extract_key option is set to true, the audit system prints the primary key instead of the full tuple:
{
"time": "2024-01-24T11:45:42.358+0300",
"uuid": "b437934d-62a7-419a-8d59-e3b33c688d7a",
"severity": "VERBOSE",
"remote": "unix/:(socket)",
"session_type": "console",
"module": "tarantool",
"user": "admin",
"type": "space_insert",
"tag": "",
"description": "Insert key [1] into space bands"
}
If the extract_key option is set to false, the audit system prints the full tuple like this:
{
"time": "2024-01-24T11:45:42.358+0300",
"uuid": "b437934d-62a7-419a-8d59-e3b33c688d7a",
"severity": "VERBOSE",
"remote": "unix/:(socket)",
"session_type": "console",
"module": "tarantool",
"user": "admin",
"type": "space_insert",
"tag": "",
"description": "Insert tuple [1, \"Roxette\", 1986] into space bands"
}
The Tarantool audit log module can record various events that you can monitor and
decide whether you need to take actions:
- Administrator activity – events related to actions performed by the administrator.
For example, such logs record the creation of a user.
- Access events – events related to authorization and authentication of users.
For example, such logs record failed attempts to access secure data.
- Data access and modification – events of data manipulation in the storage.
- System events – events related to modification or configuration of resources.
For example, such logs record the replacement of a space.
- Custom events – any events added manually using
the audit module API.
The full list of available audit log events is provided in the table below:
| Event | Event type | Severity level | Example |
| Audit log enabled for events | audit_enable | VERBOSE | |
| Custom events | custom | INFO (default) | |
| User authorized successfully | auth_ok | VERBOSE | Authenticate user <USER> |
| User authorization failed | auth_fail | ALARM | Failed to authenticate user <USER> |
| User logged out or quit the session | disconnect | VERBOSE | Close connection |
| User created | user_create | INFO | Create user <USER> |
| User dropped | user_drop | INFO | Drop user <USER> |
| Role created | role_create | INFO | Create role <ROLE> |
| Role dropped | role_drop | INFO | Drop role <ROLE> |
| User disabled | user_disable | INFO | Disable user <USER> |
| User enabled | user_enable | INFO | Enable user <USER> |
| User granted rights | user_grant_rights | INFO | Grant <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> to user <USER> |
| User revoked rights | user_revoke_rights | INFO | Revoke <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> from user <USER> |
| Role granted rights | role_grant_rights | INFO | Grant <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> to role <ROLE> |
| Role revoked rights | role_revoke_rights | INFO | Revoke <PRIVILEGE> rights for <OBJECT_TYPE> <OBJECT_NAME> from role <ROLE> |
| User password changed | password_change | INFO | Change password for user <USER> |
| Failed attempt to access secure data (for example, personal records, details, geolocation) | access_denied | ALARM | <ACCESS_TYPE> denied to <OBJECT_TYPE> <OBJECT_NAME> |
| Expressions with arguments evaluated in a string | eval | INFO | Evaluate expression <EXPR> |
| Function called with arguments | call | VERBOSE | Call function <FUNCTION> with arguments <ARGS> |
| Iterator key selected from space.index | space_select | VERBOSE | Select <ITER_TYPE> <KEY> from <SPACE>.<INDEX> |
| Space created | space_create | INFO | Create space <SPACE> |
| Space altered | space_alter | INFO | Alter space <SPACE> |
| Space dropped | space_drop | INFO | Drop space <SPACE> |
| Tuple inserted into space | space_insert | VERBOSE | Insert tuple <TUPLE> into space <SPACE> |
| Tuple replaced in space | space_replace | VERBOSE | Replace tuple <TUPLE> with <NEW_TUPLE> in space <SPACE> |
| Tuple deleted from space | space_delete | VERBOSE | Delete tuple <TUPLE> from space <SPACE> |
Note
The eval event displays data from the console module
and the eval function of the net.box module.
For more on how they work, see Module console
and Module net.box – eval.
To tell these sources apart, check the session_type field: it is console or binary.
Structure of audit log event
Each audit log event contains a number of fields that can be used to filter and aggregate the resulting logs.
An example of a Tarantool audit log entry in JSON:
{
"time": "2024-01-15T13:39:36.046+0300",
"uuid": "cb44fb2b-5c1f-4c4b-8f93-1dd02a76cec0",
"severity": "VERBOSE",
"remote": "unix/:(socket)",
"session_type": "console",
"module": "tarantool",
"user": "admin",
"type": "auth_ok",
"tag": "",
"description": "Authenticate user Admin"
}
Each event consists of the following fields:
| Field | Description | Example |
| time | Time of the event | 2024-01-15T16:33:12.368+0300 |
| uuid | Since 3.0.0. A unique identifier of audit log event | cb44fb2b-5c1f-4c4b-8f93-1dd02a76cec0 |
| severity | Since 3.0.0. A severity level. Each system audit event has a severity level determined by its importance. Custom events have the INFO severity level by default. | VERBOSE |
| remote | Remote host that triggered the event | unix/:(socket) |
| session_type | Session type | console |
| module | Audit log module. Set to tarantool for system events; can be overwritten for custom events | tarantool |
| user | User who triggered the event | admin |
| type | Audit event type | auth_ok |
| tag | A text field that can be overwritten by the user | |
| description | Human-readable event description | Authenticate user Admin |
Built-in event groups are used to filter the event types that you want to audit.
For example, you can choose to record only authorization events or only events related to a space.
Tarantool provides the following event groups:
all – all events.
Note
Events call and eval are included only in the all group.
audit – audit_enable event.
auth – authorization events: auth_ok, auth_fail.
priv – events related to authentication, authorization, users, and roles:
user_create, user_drop, role_create, role_drop, user_enable, user_disable,
user_grant_rights, user_revoke_rights, role_grant_rights, role_revoke_rights.
ddl – events of space creation, altering, and dropping:
space_create, space_alter, space_drop.
dml – events of data modification in spaces:
space_insert, space_replace, space_delete.
data_operations – events of data modification or selection from spaces:
space_select, space_insert, space_replace, space_delete.
compatibility – events available in Tarantool before version 2.10.0:
auth_ok, auth_fail, disconnect, user_create, user_drop,
role_create, role_drop, user_enable, user_disable,
user_grant_rights, user_revoke_rights, role_grant_rights,
role_revoke_rights, password_change, access_denied.
This group enables the compatibility with earlier Tarantool versions.
Warning
Be careful when recording all and data_operations event groups.
The more events you record, the slower the requests are processed over time.
It is recommended that you select only those groups
whose events your company needs to monitor and analyze.
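The effect of a filter such as [ user_create,data_operations,ddl,custom ] can be previewed with the plain-Lua sketch below. It uses the groups listed above, omitting all and compatibility for brevity. Group names expand to their member event types, and any other name is treated as a single event type. The names `groups` and `expand_filter` are hypothetical. This is not how the audit module is implemented.

```lua
-- Event groups as listed above (all and compatibility omitted).
local groups = {
  audit = { 'audit_enable' },
  auth = { 'auth_ok', 'auth_fail' },
  priv = { 'user_create', 'user_drop', 'role_create', 'role_drop',
           'user_enable', 'user_disable', 'user_grant_rights',
           'user_revoke_rights', 'role_grant_rights', 'role_revoke_rights' },
  ddl = { 'space_create', 'space_alter', 'space_drop' },
  dml = { 'space_insert', 'space_replace', 'space_delete' },
  data_operations = { 'space_select', 'space_insert',
                      'space_replace', 'space_delete' },
}

-- Expand a filter list into the set of enabled event types.
local function expand_filter(filter)
  local enabled = {}
  for _, name in ipairs(filter) do
    local members = groups[name]
    if members then
      for _, event in ipairs(members) do enabled[event] = true end
    else
      enabled[name] = true -- a single event type, or custom
    end
  end
  return enabled
end

local enabled = expand_filter({ 'user_create', 'data_operations', 'ddl', 'custom' })
```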
Tarantool provides an API for writing custom audit log events.
To enable these events, specify the custom value in the audit_log.filter option:
filter: [ user_create,data_operations,ddl,custom ]
To log an event, use the audit.log() function that takes one of the following values:
Message string. Printed to the audit log with type message:
audit.log('Hello, Alice!')
Format string and arguments. Passed to string.format and then output to the audit log with type message:
audit.log('Hello, %s!', 'Bob')
Table with audit log field values. The table must contain at least one field – description.
audit.log({ type = 'custom_hello', description = 'Hello, World!' })
audit.log({ type = 'custom_farewell', user = 'eve', module = 'custom', description = 'Farewell, Eve!' })
Alternatively, you can use audit.new() to create a new log module.
This allows you to avoid passing all custom audit log fields each time audit.log() is called.
The audit.new() function takes a table of audit log field values (same as audit.log()).
The type of the log module for writing custom events must either be message or have the custom_ prefix.
local my_audit = audit.new({ type = 'custom_hello', module = 'my_module' })
my_audit:log('Hello, Alice!')
my_audit:log({ tag = 'admin', description = 'Hello, Bob!' })
Overwrite custom event fields
It is possible to overwrite most of the custom audit log fields using audit.new() or audit.log().
The only audit log field that cannot be overwritten is time.
audit.log({ type = 'custom_hello', description = 'Hello!',
session_type = 'my_session', remote = 'my_remote' })
If omitted, session_type is set to the current session type, and remote is set to the remote peer address.
Note
To avoid confusion with system events, the value of the type field must either be message (default)
or begin with the custom_ prefix. Otherwise, you receive an error.
Custom events are filtered out by default.
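The field rules for custom events can be summarized in a plain-Lua model. In the sketch below, defaults such as those passed to audit.new() are merged with per-call fields. description is required, the type must be message or start with custom_, severity defaults to INFO, and time is always set by the system. The function `make_event` is hypothetical and is not part of the audit module.

```lua
-- Hypothetical model of how a custom audit event is assembled.
local function make_event(defaults, fields, now)
  local event = { type = 'message', severity = 'INFO' }
  for k, v in pairs(defaults or {}) do event[k] = v end
  for k, v in pairs(fields or {}) do event[k] = v end
  if event.description == nil then
    error('description is required')
  end
  if event.type ~= 'message' and event.type:sub(1, 7) ~= 'custom_' then
    error('type must be message or start with custom_')
  end
  event.time = now -- time cannot be overwritten
  return event
end

-- Mirrors: audit.new({ type = 'custom_hello', module = 'my_module' })
--          my_audit:log({ tag = 'admin', description = 'Hello, Bob!' })
local my_defaults = { type = 'custom_hello', module = 'my_module' }
local e = make_event(my_defaults, { tag = 'admin', description = 'Hello, Bob!' },
                     '2024-01-24T11:45:42.358+0300')
```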
By default, custom events have the INFO severity level.
To override the level, you can:
- specify the
severity field
- use a shortcut function
The following shortcuts are available:
| Shortcut | Equivalent |
| audit.verbose(...) | audit.log({severity = 'VERBOSE', ...}) |
| audit.info(...) | audit.log({severity = 'INFO', ...}) |
| audit.warning(...) | audit.log({severity = 'WARNING', ...}) |
| audit.alarm(...) | audit.log({severity = 'ALARM', ...}) |
Example
audit.log({ severity = 'VERBOSE', description = 'Hello!' })
How many events can be recorded?
If you write to a file, the size of the Tarantool audit log is limited by the disk space.
If you write to a system logger, the size of the Tarantool audit log is limited by the system logger.
If you write to a pipe, the size of the Tarantool audit message is limited by the system buffer.
This limit applies if audit_log.nonblock is false; if audit_log.nonblock is true, there is no limit.
How often should audit logs be reviewed?
Consider setting up a schedule in your company. It is recommended to review audit logs at least every 3 months.
How long should audit logs be stored?
It is recommended to store audit logs for at least one year.
What is the best way to process audit logs?
It is recommended to use SIEM systems for this purpose.
Security audit
This document will help you audit the security of a Tarantool cluster.
It explains certain security aspects, their rationale, and the ways to check them.
For details on how to configure Tarantool Enterprise Edition and its infrastructure for each aspect,
refer to the security hardening guide.
Encryption of external iproto traffic
Tarantool uses the
iproto binary protocol
for replicating data between instances and also in the connector libraries.
Since version 2.10.0, the Enterprise Edition has built-in support for using SSL to encrypt the client-server communications over binary connections.
For details on enabling SSL encryption, see the Securing connections with SSL section of this document.
If the built-in encryption is not enabled, we recommend using a VPN to secure data exchange between data centers.
When a Tarantool cluster does not use iproto for external requests,
connections to the iproto ports should be allowed only between Tarantool instances.
For more details on configuring ports for iproto,
see the advertise_uri section in the Cartridge documentation.
HTTPS connection termination
A Tarantool instance can accept HTTP connections from external services
or access the administrative web UI.
All such connections must go through an HTTPS-providing web server,
running on the same host, such as nginx.
This requirement is for both virtual and physical hosts.
Running HTTP traffic through a few separate hosts with HTTPS termination
is not sufficiently secure.
Tarantool accepts HTTP connections on a specific port.
This port must be available only on the same host, so that nginx can connect to it.
Check that the configured HTTP port is closed to external connections
and that the HTTPS port (443 by default) is open.
Restricted access to the administrative console
The console module provides
a way to connect to a running instance and run custom Lua code.
This can be useful for development and administration.
The following code examples open an administrative console on a TCP port and on a UNIX socket.
local console = require('console')
console.listen(<port number>)
console.listen('/var/lib/tarantool/socket_name.sock')
Opening an administrative console through a TCP port is always unsafe.
Check that there are no calls like console.listen(<port_number>)
in the code.
Connecting through a socket requires having the write permission on the
/var/lib/tarantool directory.
Check that write permission to this directory is limited to the tarantool user.
Connecting to the instance with tt connect or tarantoolctl connect without
user credentials (under the guest user) must be disabled.
There are two ways to check this vulnerability:
Check that the source code doesn’t grant access to the guest user.
The corresponding code can look like this:
box.schema.user.grant('guest',
'read,write',
'universe',
nil, { if_not_exists = true }
)
Besides searching for the whole code pattern,
search for any occurrences of 'universe'.
Try connecting with tt connect to each Tarantool node.
For more details, refer to the documentation on
access control.
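The source code check can be partly automated with a text search. The plain-Lua sketch below uses a hypothetical `find_suspicious_grants` helper. It returns the numbers of lines that mention 'universe' or grant something to the guest user. Because it is only a text search, it can produce both false positives and misses, so each hit still needs manual review.

```lua
-- Return the line numbers that look like guest or universe grants.
local function find_suspicious_grants(source)
  local hits = {}
  local n = 0
  for line in (source .. '\n'):gmatch('([^\n]*)\n') do
    n = n + 1
    if line:find('universe', 1, true)
        or line:find("user.grant%(%s*['\"]guest['\"]") then
      table.insert(hits, n)
    end
  end
  return hits
end

-- The grant from the example above, as it might appear in a file.
local sample = table.concat({
  "box.schema.user.grant('guest',",
  "    'read,write',",
  "    'universe',",
  "    nil, { if_not_exists = true }",
  ")",
}, '\n')
local hits = find_suspicious_grants(sample)
```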
Authorization in the web UI
Using the web interface must require logging in with a username and password.
Keeping two or more snapshots
In order to have a reliable backup, a Tarantool instance must keep
two or more latest snapshots.
This should be checked on each Tarantool instance.
The snapshot_count value
determines the number of kept snapshots.
Configuration values are primarily set in the configuration files
but can be overridden with environment variables and command-line arguments.
So, it's best to check both the values in the configuration files and the actual values
in the console, for example, by evaluating box.cfg.checkpoint_count.
Enabled write-ahead logging (WAL)
Tarantool records all incoming data in the write-ahead log (WAL).
The WAL must be enabled to ensure that data will be recovered in case of
a possible instance restart.
Secure values of the wal.mode configuration option are write and fsync:
wal:
  dir: 'var/lib/{{ instance_name }}/wals'
  mode: 'write'
An exception to this requirement is an instance that processes data that can be freely discarded,
for example, when Tarantool is used for caching.
In this case, WAL can be disabled (wal.mode: 'none') to reduce I/O load.
The logging level is INFO or higher
The logging level should be set to 5 (INFO), 6 (VERBOSE), or 7 (DEBUG).
Application logs will then have enough information to research a possible security breach.
For a full list of logging levels, see the
log_level reference.
Tarantool should use journald for logging.
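The snapshot, WAL, and logging checks above can be combined into one helper. The plain-Lua sketch below uses a hypothetical `audit_settings` function that takes a box.cfg-like table. In a Tarantool console, you could pass box.cfg itself, which exposes checkpoint_count, wal_mode, and log_level. The sketch assumes that checkpoint_count = 0 means old snapshots are never deleted, and it maps only a few level names to numbers.

```lua
local level_numbers = { info = 5, verbose = 6, debug = 7 }

-- Return a list of findings for a box.cfg-like table.
-- is_cache: true if the instance holds data that can be freely discarded.
local function audit_settings(cfg, is_cache)
  local findings = {}
  local count = cfg.checkpoint_count
  -- Assumption: 0 means the garbage collector keeps all snapshots.
  if count == nil or (count ~= 0 and count < 2) then
    table.insert(findings, 'keep at least two snapshots')
  end
  if not is_cache and cfg.wal_mode ~= 'write' and cfg.wal_mode ~= 'fsync' then
    table.insert(findings, 'enable WAL (write or fsync)')
  end
  local level = cfg.log_level
  if type(level) == 'string' then level = level_numbers[level] or 0 end
  if (level or 0) < 5 then
    table.insert(findings, 'set the log level to 5 (INFO) or higher')
  end
  return findings
end

-- In a Tarantool console: audit_settings(box.cfg, false)
local good = audit_settings({ checkpoint_count = 2, wal_mode = 'write', log_level = 5 })
local bad = audit_settings({ checkpoint_count = 1, wal_mode = 'none', log_level = 4 })
```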
Security hardening guide
This guide explains how to enhance security in your Tarantool Enterprise Edition’s
cluster using built-in features and provides general recommendations on security
hardening.
If you need to perform a security audit of a Tarantool Enterprise cluster,
refer to the security checklist.
Tarantool Enterprise Edition does not provide a dedicated API for security control. All
the necessary configurations can be done via an administrative console or
initialization code.
Tarantool Enterprise Edition has the following built-in security features:
Tarantool Enterprise Edition supports password-based authentication and allows for two
types of connections:
For more information on authentication and connection types, see the
Security section in Administration.
In addition, Tarantool provides the following functionality:
- Sessions
– states which associate connections with users and make Tarantool API available
to them after authentication.
- Authentication triggers,
which execute actions on authentication events.
- Third-party (external) authentication protocols and services such as LDAP or
Active Directory – supported in the web interface, but unavailable
on the binary-protocol level.
Tarantool Enterprise Edition provides the means for administrators to prevent
unauthorized access to the database and to certain functions.
Tarantool recognizes:
- different users (guests and administrators)
- privileges associated with users
- roles (containers for privileges) granted to users
The following system spaces are used to store users and privileges:
- The
_user space to store usernames and hashed passwords for authentication.
- The
_priv space to store privileges for access control.
For more information, see the
Access control section.
Users who create objects (spaces, indexes, users, roles, sequences, and
functions) in the database become their owners and automatically acquire
privileges for what they create. For more information, see the
Owners and privileges section.
Tarantool Enterprise Edition has a built-in audit log that records events such as:
- authentication successes and failures
- connection closures
- creation, removal, enabling, and disabling of users
- changes of passwords, privileges, and roles
- denials of access to database objects
The audit log contains:
- timestamps
- usernames of users who performed actions
- event types (for example,
user_create, user_enable, disconnect)
- descriptions
You can configure the following audit log options:
- audit_log.to – enable audit logging and define the log location (file, pipe, or syslog).
The option is similar to log.to.
- audit_log.nonblock – specify the logging behavior if the system is not ready to write.
The option is similar to log.nonblock.
For more information on logging, see the following:
Access permissions to audit log files can be set up as to any other Unix file
system object – via chmod.
Recommendations on security hardening
This section lists recommendations that can help you harden the cluster’s security.
Since version 2.10.0, Tarantool Enterprise Edition has built-in support for using SSL to encrypt the client-server communications over binary connections,
that is, between Tarantool instances in a cluster. For details on enabling SSL encryption, see the Securing connections with SSL section of this guide.
If the built-in encryption is not enabled for particular connections, consider the following security recommendations:
- setting up connection tunneling, or
- encrypting the actual data stored in the database.
For more information on data encryption, see the
crypto module reference.
The HTTP server module provided by rocks
does not support the HTTPS protocol. To set up a secure connection for a client
(e.g., REST service), consider hiding the Tarantool instance (router if it is
a cluster of instances) behind an Nginx server and setting up an SSL certificate
for it.
To make sure that no information can be intercepted ‘from the wild’, run nginx
on the same physical server as the instance and set up their communication over
a Unix socket. For more information, see the
socket module reference.
To protect the cluster from any unwanted network activity ‘from the wild’,
configure the firewall on each server to allow traffic on ports listed in
Network requirements.
If you use static IP addresses, whitelist them on each server, because
the cluster has a full mesh network topology. Consider blacklisting all other
addresses on all servers except the router (running behind the Nginx server).
Tarantool Enterprise does not provide defense against DoS or DDoS attacks.
Consider using third-party software instead.
Tarantool Enterprise Edition does not keep checksums or provide the means to control
data integrity. However, it ensures data persistence using a write-ahead log,
regularly snapshots the entire data set to disk, and checks the data format
whenever it reads the data back from the disk. For more information, see the
Data persistence section.
Triggers
Triggers, also known as callbacks, are functions which the server
executes when certain events happen.
To associate an event with a callback,
pass the callback to the corresponding on_event function, for example, box.session.on_connect().
Then the server will store the callback function and call it
when the corresponding event happens.
All triggers have the following characteristics:
- Triggers are defined only by the ‘admin’ user.
- Triggers are stored in the Tarantool instance’s memory, not in the database.
Therefore triggers disappear when the instance is shut down.
To make them permanent, put function definitions and trigger settings
into Tarantool’s initialization script.
- Triggers have low overhead. If a trigger is not defined, then the overhead
is minimal: merely a pointer dereference and check. If a trigger is defined,
then its overhead is equivalent to the overhead of calling a function.
- There can be multiple triggers for one event. In this case, triggers are
executed in the reverse order that they were defined in.
- Triggers must work within the event context, that is, operate on variables passed
as the trigger function arguments. Triggers should not affect the global state
of the program or change things unrelated to the event. If a trigger performs
calls such as os.exit()
or box.rollback(), the result of
its execution is undefined.
- Triggers are replaceable. To redefine a trigger, pass both
the new trigger function and the old trigger function
to one of the
on_event functions.
- The
on_event functions all have parameters which are function
pointers, and they all return function pointers. Remember that a Lua
function definition such as function f() x = x + 1 end is the same
as f = function () x = x + 1 end - in both cases f gets a function pointer.
And trigger = box.session.on_connect(f) is the same as
trigger = box.session.on_connect(function () x = x + 1 end) - in both cases
trigger gets the function pointer which was passed.
- You can call any
on_event function with no arguments to get a list
of its triggers. For example, use box.session.on_connect() to return
a table of all connect-trigger functions.
- Triggers can be useful in solving problems with replication. See details in
Resolving replication conflicts.
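Several of the rules above can be modeled in plain Lua. These are reverse-order execution, replacement with on_event(new, old), and listing with on_event(). The sketch below is a toy model and does not reflect how Tarantool implements triggers. `new_event`, `on_event`, and `fire` are hypothetical names. Deleting with on_event(nil, old) is included as an assumption that matches common Tarantool usage.

```lua
-- Toy model of an event with an on_event-style registration function.
local function new_event()
  local triggers = {}
  local event = {}

  function event.on_event(new, old)
    if new == nil and old == nil then
      return triggers -- list the defined triggers
    end
    if old ~= nil then
      for i, t in ipairs(triggers) do
        if t == old then
          if new == nil then
            table.remove(triggers, i) -- on_event(nil, old) deletes
          else
            triggers[i] = new -- on_event(new, old) replaces in place
          end
          return new
        end
      end
      error('trigger not found')
    end
    table.insert(triggers, new)
    return new
  end

  function event.fire(...)
    for i = #triggers, 1, -1 do -- the most recently defined runs first
      triggers[i](...)
    end
  end

  return event
end

local calls = {}
local ev = new_event()
local a = function() table.insert(calls, 'a') end
local b = function() table.insert(calls, 'b') end
ev.on_event(a)
ev.on_event(b)
ev.fire() -- calls: b, a
```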
Example:
Here we log connect and disconnect events into Tarantool server log.
local log = require('log')

local function on_connect_impl()
    -- box.session.peer() can be nil, so convert it with tostring()
    log.info("connected " .. tostring(box.session.peer()) .. ", sid " .. box.session.id())
end
local function on_disconnect_impl()
    log.info("disconnected, sid " .. box.session.id())
end
local function on_auth_impl(user)
    log.info("authenticated sid " .. box.session.id() .. " as " .. user)
end

-- Wrap each handler in pcall so that a logging error never breaks the session
local function on_connect() pcall(on_connect_impl) end
local function on_disconnect() pcall(on_disconnect_impl) end
local function on_auth(user) pcall(on_auth_impl, user) end

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)
Applications
Using Tarantool as an application server, you can write your own applications.
Tarantool’s native language for writing applications is
Lua, so a typical application would be
a file that contains your Lua script. But you can also write applications
in C or C++.
Launching an application
Note
If you’re new to Lua, we recommend going over the interactive Tarantool
tutorial before proceeding with this chapter. To launch the tutorial, run
tutorial() in the Tarantool console.
Let’s create and launch our first Lua application for Tarantool.
Here's the simplest Lua application, the good old "Hello, world!":
#!/usr/bin/env tarantool
print('Hello, world!')
We save it in a file named myapp.lua in the current directory.
Now let’s discuss how we can launch our application with Tarantool.
If we run Tarantool in a Docker container,
the following command will start Tarantool without any application:
$ # create a temporary container and run it in interactive mode
$ docker run --rm -t -i tarantool/tarantool:latest
To run Tarantool with our application, we can say:
$ # create a temporary container and
$ # launch Tarantool with our application
$ docker run --rm -t -i \
-v `pwd`/myapp.lua:/opt/tarantool/myapp.lua \
-v /data/dir/on/host:/var/lib/tarantool \
tarantool/tarantool:latest tarantool /opt/tarantool/myapp.lua
Here two resources on the host get mounted in the container:
- our application file (myapp.lua) and
- the Tarantool data directory (/data/dir/on/host).
By convention, the directory for Tarantool application code inside a container
is /opt/tarantool, and the directory for data is /var/lib/tarantool.
Launching a binary program
If we run Tarantool from a package or from a source build, we can launch our application:
- in the script mode,
- as a server application, or
- as a daemon service.
The simplest way is to pass the filename to Tarantool at start:
$ tarantool myapp.lua
Hello, world!
$
Tarantool starts, executes our script in the script mode and exits.
Now let’s turn this script into a server application. We use
box.cfg from Tarantool’s built-in box module to:
- launch the database (a database has a persistent on-disk state, which needs
to be restored after we start an application) and
- configure Tarantool as a server that accepts requests over a TCP port.
We also add some simple database logic, using
space.create() and
create_index() to create a space with a primary
index. We use the function box.once() to make sure that our
logic will be executed only once when the database is initialized for the first
time, so we don’t try to create an existing space or index on each invocation
of the script:
#!/usr/bin/env tarantool
print('Hello, world!')
-- Configure database
box.cfg {
listen = 3301
}
box.once("bootstrap", function()
box.schema.space.create('tweedledum')
box.space.tweedledum:create_index('primary',
{ type = 'TREE', parts = {1, 'unsigned'}})
end)
Now we launch our application in the same manner as before:
$ tarantool myapp.lua
Hello, world!
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> version 2.1.0-429-g4e5231702
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> log level 5
2017-08-11 16:07:14.251 [41436] main/101/myapp.lua I> mapping 1073741824 bytes for tuple arena...
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovery start
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovering from `./00000000000000000000.snap'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.272 [41436] main/102/hot_standby I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.274 [41436] iproto/102/iproto I> binary: started
2017-08-11 16:07:14.275 [41436] iproto/102/iproto I> binary: bound to [::]:3301
2017-08-11 16:07:14.275 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.278 [41436] main/101/myapp.lua I> ready to accept requests
This time, Tarantool executes our script and keeps working as a server,
accepting TCP requests on port 3301. We can see Tarantool in the current
session’s process list:
$ ps | grep "tarantool"
PID TTY TIME CMD
41608 ttys001 0:00.47 tarantool myapp.lua <running>
But the Tarantool instance will stop if we close the current terminal window.
To detach Tarantool and our application from the terminal window, we can launch
it in the daemon mode. To do so, we add some parameters to box.cfg{}:
- background = true tells Tarantool to work as a daemon service,
- log = 'file-name' tells the Tarantool daemon where to store its log file (other log settings are available in the Tarantool log module), and
- pid_file = 'file-name' tells the Tarantool daemon where to store its pid file.
For example:
box.cfg {
listen = 3301,
background = true,
log = '1.log',
pid_file = '1.pid'
}
We launch our application in the same manner as before:
$ tarantool myapp.lua
Hello, world!
$
Tarantool executes our script, gets detached from the current shell session
(you won’t see it with ps | grep "tarantool") and continues working in the
background as a daemon attached to the global session (with SID = 0):
$ ps -ef | grep "tarantool"
PID SID TIME CMD
42178 0 0:00.72 tarantool myapp.lua <running>
Now that we have discussed how to create and launch a Lua application for
Tarantool, let’s dive deeper into programming practices.
Application roles
An application role is a Lua module that implements specific functions or logic.
You can turn on or off a particular role for certain instances in a configuration without restarting these instances.
A role is run when a configuration is loaded or reloaded.
Roles can be divided into the following groups:
- Tarantool’s built-in roles.
For example, the
config.storage role can be used to make a Tarantool replica set act as a configuration storage.
- Roles provided by third-party Lua modules.
For example, the CRUD module provides the
roles.crud-storage and roles.crud-router roles that enable CRUD operations in a sharded cluster.
- Custom roles that are developed as a part of a cluster application.
For example, you can create a custom role to define a stored procedure or implement a supplementary service, such as an email notifier or a replicator.
This section describes how to develop custom roles.
To learn how to enable and configure roles, see Enabling and configuring roles.
Note
Don’t confuse application roles with other role types:
- A role is a container for privileges that can be granted to users. Learn more in Roles.
- A role of a replica set with regard to sharding. Learn more in Sharding roles.
Providing a role configuration
A custom role can be configured in the same way as roles provided by Tarantool or third-party Lua modules.
You can learn more from Enabling and configuring roles.
This example shows how to enable and configure the greeter role, which is implemented in the next section:
instance001:
roles: [ greeter ]
roles_cfg:
greeter:
greeting: 'Hi'
The role configuration provided in roles_cfg can be accessed when validating and applying this configuration.
Tarantool includes the experimental.config.utils.schema
built-in module that provides tools for managing user-defined configurations
of applications (app.cfg) and roles (roles_cfg). The examples below show its
basic usage.
Given that a role is a Lua module, a role name is passed to require() to obtain the module.
When developing an application, you can place a file with the role code next to the cluster configuration file.
A custom application role is an object that implements custom functions or logic in addition to Tarantool’s built-in roles and roles provided by third-party Lua modules.
For example, a logging role can be created to add logging functionality on top of the built-in one.
Creating a custom role includes the following steps:
- (Optional) Define the role configuration schema.
- Define a function that validates a role configuration.
- Define a function that applies a validated configuration.
- Define a function that stops a role.
- (Optional) Define the roles on which this custom role depends.
- (Optional) Define the on_event callback function.
As a result, a role module should return an object that has corresponding functions and fields specified:
return {
    validate = function(cfg) --[[ ... ]] end,
    apply = function(cfg) --[[ ... ]] end,
    stop = function() --[[ ... ]] end,
    dependencies = { --[[ ... ]] },
    on_event = function(config, key, value)
        local log = require('log')
        log.info('roles_cfg.my_role.foo: ' .. tostring(config.foo))
        log.info('on_event is triggered by ' .. key)
        -- value.is_ro is a boolean, so convert it before concatenating
        log.info('is_ro: ' .. tostring(value.is_ro))
    end,
}
The examples in this article show how to do this.
You can omit the optional steps and get a simple role as in the example below.
return {
    validate = function(cfg) --[[ ... ]] end,
    apply = function(cfg) --[[ ... ]] end,
    stop = function() --[[ ... ]] end,
}
You can modify a role, for example, by adding dependencies or specifying the on_event callback.
If you modify a role, you need to restart the Tarantool instance with the role in order to apply the changes.
Note
- Code snippets shown in this section are included from the following application: application_role_cfg.
Defining the role configuration schema
The experimental.config.utils.schema built-in module
provides the schema_object class. An object of this class defines
a custom configuration schema of a role or an application.
This example shows how to define a schema that reflects the role configuration shown above:
local schema = require('experimental.config.utils.schema')
local greeter_schema = schema.new('greeter', schema.record({
greeting = schema.scalar({
type = 'string',
allowed_values = { 'Hi', 'Hello' }
})
}))
If you don’t use the module, skip this step. In this case, use the cfg argument
of the role’s validate() and apply() functions to refer to its configuration
values, for example, cfg.greeting.
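Without the schema module, validation can be written by hand against the cfg table. The sketch below is a hypothetical plain-Lua variant of the greeter role that checks cfg.greeting directly. The allowed values mirror the schema above, and greeter_role and ALLOWED_GREETINGS are illustrative names. print() stands in for the log module so the snippet runs outside Tarantool. In a real role file, the table would be returned from the module instead of being stored in a local.

```lua
-- Hypothetical greeter role variant that does not use experimental.config.utils.schema.
local ALLOWED_GREETINGS = { Hi = true, Hello = true }

local function validate(cfg)
    -- cfg is the roles_cfg.greeter table; it is nil if no configuration is given
    if type(cfg) ~= 'table' then
        error("greeter: configuration must be a table", 0)
    end
    if not ALLOWED_GREETINGS[cfg.greeting] then
        error(string.format("greeter: 'greeting' must be 'Hi' or 'Hello', got %s",
                            tostring(cfg.greeting)), 0)
    end
end

local function apply(cfg)
    -- print() stands in for log.info() so the sketch runs in plain Lua
    print(cfg.greeting .. " from the 'greeter' role!")
end

local function stop()
    print("The 'greeter' role is stopped")
end

-- In greeter.lua this table would be the module's return value
local greeter_role = {
    validate = validate,
    apply = apply,
    stop = stop,
}

greeter_role.validate({ greeting = 'Hello' })
greeter_role.apply({ greeting = 'Hello' })
```

Because validate() raises on failure, the configuration process stops before apply() is ever called with an invalid value.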
Validating a role configuration
To validate a role configuration, you need to define the validate([cfg]) function.
In the example below, the validate() function of the role configuration schema
is used to validate the greeting value:
local function validate(cfg)
greeter_schema:validate(cfg)
end
If the configuration is not valid, validate() reports an unrecoverable error by throwing an error object.
Applying a role configuration
To apply the validated configuration, define the apply([cfg]) function.
Like validate(), apply() provides access to a role’s configuration through the cfg argument.
In the example below, the apply() function uses the log module to write a value from the role configuration to the log:
local function apply(cfg)
log.info("%s from the 'greeter' role!", greeter_schema:get(cfg, 'greeting'))
end
Stopping a role
To stop a role, define the stop() function.
In the example below, the stop() function uses the log module to indicate that a role is stopped:
local function stop()
log.info("The 'greeter' role is stopped")
end
When you’ve defined all the role functions, you need to return an object that has corresponding functions specified:
return {
validate = validate,
apply = apply,
stop = stop,
}
Role dependencies
To define a role’s dependencies, use the dependencies field.
In this example, the byeer role has the greeter role as the dependency:
-- byeer.lua --
local log = require('log').new("byeer")
return {
dependencies = { 'greeter' },
validate = function() end,
apply = function() log.info("Bye from the 'byeer' role!") end,
stop = function() end,
}
A role cannot be started without its dependencies.
This means that all the dependencies of a role should be defined in the roles configuration parameter:
instance001:
roles: [ greeter, byeer ]
You can find the full example here: application_role_cfg.
Since version 3.3.1, you can define the on_event callback for custom roles. The on_event callback is called
every time a box.status system event is broadcast.
If multiple custom roles have the on_event callback defined, these callbacks are called one after another in the order
defined by role dependencies.
When called, the on_event callback receives three arguments:
- config, which contains the configuration of the role;
- key, which reflects the trigger event and is set to:
  - config.apply if the callback was triggered by a configuration update;
  - box.status if it was triggered by the box.status system event;
- value, which shows information about the instance status, as in the triggering box.status system event.
  If the callback is triggered by a configuration update, value shows the information of the most recent box.status system event.
Note
- All on_event callbacks with the config.apply key are executed as a part of the configuration process.
Process statuses ready or check_warnings are reached only after all such on_event callbacks are done.
- All on_event callbacks are executed inside a pcall. If a callback raises an error, the error is logged
at the error level, and execution of the remaining callbacks continues.
An example of the on_event callback is provided in Specifics of creating spaces below.
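The ordering and error-handling rules above can be illustrated with a simplified plain-Lua model of how a series of on_event callbacks might be dispatched. Each callback runs inside pcall, and a failing callback is reported without stopping the rest. This is an illustration under assumptions, not Tarantool's internal implementation. dispatch_on_event and the roles list are hypothetical, the list is assumed to be already sorted by dependencies, and print() stands in for error-level logging.

```lua
-- Hypothetical dispatcher: calls each role's on_event in the given order.
local function dispatch_on_event(roles, key, value)
    local called = {}
    for _, role in ipairs(roles) do
        if role.on_event ~= nil then
            local ok, err = pcall(role.on_event, role.config, key, value)
            if not ok then
                -- Tarantool logs such errors at the error level; print() stands in here
                print(string.format("on_event of role %s failed: %s", role.name, tostring(err)))
            end
            table.insert(called, role.name)
        end
    end
    return called
end

local roles = {
    { name = 'greeter', config = { greeting = 'Hi' },
      on_event = function(config, key, value)
          print(config.greeting .. ', triggered by ' .. key)
      end },
    { name = 'broken', config = nil,
      on_event = function(config, key, value)
          error('something went wrong')
      end },
    { name = 'byeer', config = {},
      on_event = function(config, key, value)
          -- tostring() is needed: concatenating a boolean raises an error in Lua
          print('is_ro: ' .. tostring(value.is_ro))
      end },
}

local called = dispatch_on_event(roles, 'box.status', { is_ro = false })
```

Even though the broken role's callback raises, the byeer callback still runs after it.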
Adding initialization code
You can add initialization code to a role by defining and calling a function with an arbitrary name at the top level of a module, for example:
local function init()
-- ... --
end
init()
For example, you can create spaces, define indexes, or grant privileges to specific users or roles.
See also: Specifics of creating spaces.
Specifics of creating spaces
To create a space in a role, you need to make sure that the target instance is in read-write mode (its box.info.ro is false).
You can check an instance state by subscribing to the box.status event using box.watch():
box.watch('box.status', function(key, value)
    -- Spaces can only be created on a read-write instance
    if value.is_ro then
        return
    end
    -- creating a space
    -- ...
end)
Note
Given that a role may be enabled when an instance is already in read-write mode,
you also need to execute schema initialization code from apply().
To make sure a space is created only once, use the if_not_exists option.
Since version 3.3.1, you can define space creation in a role via
the on_event callback function.
See the example of such definition below:
return {
validate = function() end,
apply = function() end,
stop = function() end,
on_event = function(config, key, value)
-- Can only create spaces on RW.
if value.is_ro then
return
end
-- Assume the role config is a table.
if type(config) ~= 'table' then
error('Config must be a table')
end
local space_name = config.space_name or 'default'
box.schema.space.create(space_name, {
if_not_exists = true,
})
end
}
A role’s life cycle includes the stages described below.
Loading roles
On each run, all roles are loaded in the order they are specified in the configuration.
This stage takes effect when a role is enabled or an instance with this role is restarted.
At this stage, a role executes the initialization code.
A role cannot be started if it has dependencies that are not specified in a configuration.
Note
Dependencies do not affect the order in which roles are loaded.
However, the validate(), apply(), and stop() functions are executed taking dependencies into account.
Learn more in Executing functions for dependent roles.
Stopping roles
This stage takes effect during a configuration reload when a role is removed from the configuration for a given instance.
Note that all stop() calls are performed before any validate() or apply() calls.
This means that old roles are stopped first, and only then new roles are started.
Validating a role’s configurations
At this stage, a configuration for each role is validated using the corresponding validate() function in the same order in which they are specified in the configuration.
Applying a role’s configurations
At this stage, a configuration for each role is applied using the corresponding apply() function in the same order in which they are specified in the configuration.
All role functions report an unrecoverable error by throwing an error object.
If an error is thrown in any phase, applying a configuration is stopped.
If starting or stopping a role throws an error, no roles are stopped or started afterward.
An error is caught and shown in config:info() in the alerts section.
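The ordering rules for stopping, validating, and applying roles can be summarized with a small plain-Lua model. This is a simplified sketch, not the config module's code. reload_roles, trace, and alerts are hypothetical names, and dependency ordering is deliberately left out. The model stops removed roles first, then validates and applies the new set in configuration order. It aborts at the first thrown error and records that error, roughly the way config:info() lists it under alerts.

```lua
-- Hypothetical model of a configuration reload for roles.
-- registry: role name -> role table; old_names/new_names: ordered role lists.
local function reload_roles(registry, old_names, new_names, cfgs, trace, alerts)
    local enabled = {}
    for _, name in ipairs(new_names) do enabled[name] = true end

    -- Call one role function, record it in the trace, and report any error
    local function call(name, fname, ...)
        table.insert(trace, fname .. ':' .. name)
        local ok, err = pcall(registry[name][fname], ...)
        if not ok then
            table.insert(alerts, string.format('%s() of role %s failed: %s',
                                               fname, name, tostring(err)))
        end
        return ok
    end

    -- 1. Stop roles removed from the configuration, in reverse order.
    for i = #old_names, 1, -1 do
        local name = old_names[i]
        if not enabled[name] and not call(name, 'stop') then return false end
    end
    -- 2. Validate every enabled role, in configuration order.
    for _, name in ipairs(new_names) do
        if not call(name, 'validate', cfgs[name]) then return false end
    end
    -- 3. Apply every enabled role, in configuration order.
    for _, name in ipairs(new_names) do
        if not call(name, 'apply', cfgs[name]) then return false end
    end
    return true
end

local noop = function() end
local registry = {
    a = { validate = noop, apply = noop, stop = noop },
    b = { validate = noop, apply = noop, stop = noop },
    c = { validate = function(cfg) error('bad config', 0) end, apply = noop, stop = noop },
}
local trace, alerts = {}, {}
-- Role b is removed and role c is added; c's validate() fails,
-- so no apply() is called for any role.
local ok = reload_roles(registry, { 'a', 'b' }, { 'a', 'c' }, {}, trace, alerts)
```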
Executing functions for dependent roles
For roles that depend on each other, their validate(), apply(), and stop() functions are executed taking into account the dependencies.
Suppose there are three independent and two dependent roles:
role1
role2
role3
└─── role4
     └─── role5
role1, role2, and role5 are independent roles.
role3 depends on role4, role4 depends on role5.
The roles are enabled in a configuration as follows:
roles: [ role1, role2, role3, role4, role5 ]
In this case, validate() and apply() for these roles are executed in the following order:
role1 -> role2 -> role5 -> role4 -> role3
Roles removed from a configuration are stopped in the reverse of the order in which they are specified in the configuration, taking into account the dependencies.
Suppose that all roles except role1 are removed from the configuration above:
roles: [ role1 ]
After reloading a configuration, stop() functions for the removed roles are executed in the following order:
role3 -> role4 -> role5 -> role2
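Both orders above match a depth-first, dependencies-first traversal of the configured role list. For stop(), that traversal is reversed. The plain-Lua sketch below reproduces them. dependency_order and stop_order are hypothetical helpers written for illustration, since Tarantool's config module has its own implementation. The sketch also assumes the dependency graph has no cycles.

```lua
-- Hypothetical helper: order role names so that each role comes after its
-- dependencies, otherwise keeping the order in which they are listed.
local function dependency_order(names, deps)
    local in_set = {}
    for _, name in ipairs(names) do in_set[name] = true end

    local order, visited = {}, {}
    local function visit(name)
        if visited[name] or not in_set[name] then return end
        visited[name] = true
        for _, dep in ipairs(deps[name] or {}) do
            visit(dep)  -- place dependencies first
        end
        table.insert(order, name)
    end

    for _, name in ipairs(names) do
        visit(name)
    end
    return order
end

-- Order in which stop() is called for removed roles: the reverse of dependency_order().
local function stop_order(removed, deps)
    local order = dependency_order(removed, deps)
    local reversed = {}
    for i = #order, 1, -1 do
        table.insert(reversed, order[i])
    end
    return reversed
end

local deps = { role3 = { 'role4' }, role4 = { 'role5' } }
local start = dependency_order({ 'role1', 'role2', 'role3', 'role4', 'role5' }, deps)
print(table.concat(start, ' -> '))     -- role1 -> role2 -> role5 -> role4 -> role3
local stopping = stop_order({ 'role2', 'role3', 'role4', 'role5' }, deps)
print(table.concat(stopping, ' -> '))  -- role3 -> role4 -> role5 -> role2
```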
Example: Role without a configuration
The example below shows how to enable the custom greeter role for instance001:
instance001:
roles: [ greeter ]
The implementation of this role looks as follows:
-- greeter.lua --
return {
validate = function() end,
apply = function() require('log').info("Hi from the 'greeter' role!") end,
stop = function() end,
}
Example on GitHub: application_role
Example: Role with a configuration
The example below shows how to enable the custom greeter role for instance001 and specify the configuration for this role:
instance001:
roles: [ greeter ]
roles_cfg:
greeter:
greeting: 'Hi'
The implementation of this role looks as follows:
-- greeter.lua --
local log = require('log').new("greeter")
local schema = require('experimental.config.utils.schema')
local greeter_schema = schema.new('greeter', schema.record({
greeting = schema.scalar({
type = 'string',
allowed_values = { 'Hi', 'Hello' }
})
}))
local function validate(cfg)
greeter_schema:validate(cfg)
end
local function apply(cfg)
log.info("%s from the 'greeter' role!", greeter_schema:get(cfg, 'greeting'))
end
local function stop()
log.info("The 'greeter' role is stopped")
end
return {
validate = validate,
apply = apply,
stop = stop,
}
Example on GitHub: application_role_cfg
The example below shows how to enable and configure the http-api custom role:
instance001:
roles: [ http-api ]
roles_cfg:
http-api:
host: '127.0.0.1'
port: 8080
The implementation of this role looks as follows:
-- http-api.lua --
local httpd
local json = require('json')
local schema = require('experimental.config.utils.schema')
local function validate_host(host, w)
local host_pattern = "^(%d+)%.(%d+)%.(%d+)%.(%d+)$"
if not host:match(host_pattern) then
w.error("'host' should be a string containing a valid IP address, got %q", host)
end
end
local function validate_port(port, w)
    -- Accept the whole 1..65535 range, as the error message states
    if port < 1 or port > 65535 then
        w.error("'port' should be between 1 and 65535, got %d", port)
    end
end
local listen_address_schema = schema.new('listen_address', schema.record({
host = schema.scalar({
type = 'string',
validate = validate_host,
default = '127.0.0.1',
}),
port = schema.scalar({
type = 'integer',
validate = validate_port,
default = 8080,
}),
}))
local function validate(cfg)
listen_address_schema:validate(cfg)
end
local function apply(cfg)
if httpd then
httpd:stop()
end
local cfg_with_defaults = listen_address_schema:apply_default(cfg)
local host = listen_address_schema:get(cfg_with_defaults, 'host')
local port = listen_address_schema:get(cfg_with_defaults, 'port')
httpd = require('http.server').new(host, port)
local response_headers = { ['content-type'] = 'application/json' }
httpd:route({ path = '/band/:id', method = 'GET' }, function(req)
local id = req:stash('id')
local band_tuple = box.space.bands:get(tonumber(id))
if not band_tuple then
return { status = 404, body = 'Band not found' }
else
local band = { id = band_tuple['id'],
band_name = band_tuple['band_name'],
year = band_tuple['year'] }
return { status = 200, headers = response_headers, body = json.encode(band) }
end
end)
httpd:route({ path = '/band', method = 'GET' }, function(req)
local limit = req:query_param('limit')
if not limit then
limit = 5
end
local band_tuples = box.space.bands:select({}, { limit = tonumber(limit) })
local bands = {}
for _, tuple in pairs(band_tuples) do
local band = { id = tuple['id'],
band_name = tuple['band_name'],
year = tuple['year'] }
table.insert(bands, band)
end
return { status = 200, headers = response_headers, body = json.encode(bands) }
end)
httpd:start()
end
local function stop()
httpd:stop()
end
local function init()
require('data'):add_sample_data()
end
init()
return {
validate = validate,
apply = apply,
stop = stop,
}
Example on GitHub: application_role_http_api
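The host_pattern check in validate_host() only tests the shape of the string, so a value such as '999.1.1.1' passes. If stricter checking is wanted, a validator with the same (value, w) signature could also range-check each octet. The sketch below is a hypothetical replacement. To run it outside Tarantool, it uses a stand-in w object whose error() formats the message and raises it. That is an assumption modeled on how w.error is used above, not a copy of the schema module's validation context.

```lua
-- Hypothetical stricter replacement for validate_host() from http-api.lua.
local function validate_host(host, w)
    local octets = { host:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$") }
    if #octets ~= 4 then
        w.error("'host' should be a string containing a valid IP address, got %q", host)
        return
    end
    for _, octet in ipairs(octets) do
        -- Each dotted-quad component must fit in one byte
        if tonumber(octet) > 255 then
            w.error("'host' should be a string containing a valid IP address, got %q", host)
            return
        end
    end
end

-- Stand-in for the validation context passed by the schema module
local w = {
    error = function(fmt, ...)
        error(string.format(fmt, ...), 0)
    end,
}

validate_host('127.0.0.1', w)  -- passes silently
```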
validate([cfg])
Validate a role’s configuration.
This function is called on instance startup or when the configuration is reloaded for the instance with this role.
Note that the validate() function is called regardless of whether the role’s configuration or any field in a cluster’s configuration is changed.
validate() should throw an error if the validation fails.
Parameters:
- cfg – the role’s configuration to be validated.
This parameter provides access to configuration options defined in roles_cfg.<role_name>.
To get values of configuration options placed outside
roles_cfg.<role_name>, use config:get().
See also: Validating a role configuration
apply([cfg])
Apply a role’s configuration.
apply() is called after validate() is executed for all the enabled roles.
Like validate(), apply() is called on instance startup or when the configuration is reloaded for the instance with this role.
apply() should throw an error if the specified configuration can’t be applied.
Note
apply() is not invoked if an instance switches to read-write mode when replication.failover is set to election or supervised.
You can check an instance state by subscribing to the box.status event using box.watch().
Parameters:
- cfg – the role’s configuration to be applied.
This parameter provides access to configuration options defined in roles_cfg.<role_name>.
To get values of configuration options placed outside
roles_cfg.<role_name>, use config:get().
See also: Applying a role configuration
stop()
Stop a role.
This function is called on configuration reload if the role is removed from roles for the given instance.
See also: Stopping a role
dependencies
(Optional) Define a role’s dependencies.
See also: Role dependencies
Fibers, yields, and cooperative multitasking
Creating a fiber is the Tarantool way of making application logic work in the background at all times.
A fiber is a set of instructions that are executed with cooperative multitasking:
the instructions contain yield signals, upon which control is passed to another fiber.
Fibers are similar to threads of execution in computing.
The key difference is that threads use
preemptive multitasking, while fibers use cooperative multitasking (see below).
This gives fibers the following two advantages over threads:
- Better controllability. Threads often depend on the kernel’s thread scheduler
to preempt a busy thread and resume another thread, so preemption may occur
unpredictably. Fibers yield themselves to run another fiber while executing,
so yields are controlled by application logic.
- Higher performance. Threads require more resources to preempt as they need to
address the system kernel. Fibers are lighter and faster as they don’t need to
address the kernel to yield.
Yet fibers have some limitations as compared with threads, the main limitation
being no multi-core mode. All fibers in an application belong to a single thread,
so they all use the same CPU core as the parent thread. Meanwhile, this
limitation is not really serious for Tarantool applications, because a typical
bottleneck for Tarantool is the HDD, not the CPU.
A fiber has all the features of a Lua
coroutine and all programming
concepts that apply for Lua coroutines will apply for fibers as well. However,
Tarantool has made some enhancements for fibers and uses fibers internally.
So, although the use of coroutines is possible and supported, the use of fibers is
recommended.
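Because fibers share the coroutine model, the essence of cooperative multitasking can be shown with plain Lua coroutines. The sketch below is a toy round-robin scheduler, not Tarantool's fiber scheduler. Each task runs until it calls coroutine.yield(), the rough analogue of fiber.yield(), and only then does the next ready task get control. The worker and run names are hypothetical.

```lua
-- Toy round-robin scheduler built on plain Lua coroutines (an analogy for fibers).
local trace = {}

local function worker(name, steps)
    return coroutine.create(function()
        for i = 1, steps do
            table.insert(trace, name .. i)
            coroutine.yield()  -- explicit yield point: hand control back to the scheduler
        end
    end)
end

local function run(tasks)
    local ready = tasks
    while #ready > 0 do
        local still_ready = {}
        for _, co in ipairs(ready) do
            assert(coroutine.resume(co))
            if coroutine.status(co) ~= 'dead' then
                table.insert(still_ready, co)
            end
        end
        ready = still_ready
    end
end

run({ worker('a', 3), worker('b', 2) })
print(table.concat(trace, ' '))  -- a1 b1 a2 b2 a3
```

No task is ever interrupted between two yield points. The interleaving is decided entirely by where the code yields.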
To learn more about fibers, go to the fiber module documentation.
Yield is an action that occurs in a cooperative environment that
transfers control of the thread from the current fiber to another fiber that is ready to execute.
Any live fiber can be in one of three states: running, suspended, and
ready. After a fiber dies, the dead status is returned. By observing
fibers from the outside, you can only see running (for the current fiber)
and suspended for any other fiber waiting for an event from the event loop (ev)
for execution.

After a yield has occurred, the next ready fiber is taken from the queue and executed.
When there are no more ready fibers, execution is transferred to the event loop.
After a fiber has yielded and regained control, it immediately issues testcancel, that is, it checks whether it has been cancelled (see fiber.testcancel()).
Yields can be explicit or implicit.
Explicit yields are clearly visible from the invoking code. There are only two
explicit yields: fiber.yield() and fiber.sleep(t).
- fiber.yield() yields execution to another
ready fiber while putting itself in the ready state, meaning that it will be executed again as soon as possible while being polite to other fibers waiting for execution.
- fiber.sleep(t) yields execution to another
ready fiber and puts itself in the suspended state for time t until time passes and the event loop wakes up this fiber to the ready state.
In general, it is good behavior for long-running CPU-intensive tasks to yield periodically so that
other waiting fibers get a chance to run.
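In a Tarantool fiber, such a loop would typically call fiber.yield() every N iterations. The plain-Lua sketch below models that pattern with coroutines so it runs anywhere. The sum_with_yields name and the chunk size of 1000 are illustrative assumptions. In a real fiber, coroutine.yield() would be replaced by fiber.yield(), and the fiber scheduler would take the place of the driver loop.

```lua
-- A CPU-bound loop that yields every `chunk` iterations so other tasks can run.
local function sum_with_yields(n, chunk)
    local total = 0
    for i = 1, n do
        total = total + i
        if i % chunk == 0 then
            coroutine.yield()  -- in a Tarantool fiber: fiber.yield()
        end
    end
    return total
end

-- Driver standing in for the scheduler: counts how often the task yielded.
local co = coroutine.create(sum_with_yields)
local yields = 0
local ok, result = coroutine.resume(co, 10000, 1000)
while coroutine.status(co) ~= 'dead' do
    yields = yields + 1  -- a real scheduler would run other ready tasks here
    ok, result = coroutine.resume(co)
end
print(result, yields)
```

The chunk size trades latency for other tasks against scheduling overhead. Smaller chunks yield more often.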
On the other hand, there are many operations, such as operations with sockets, file system,
and disk I/O, which imply some waiting for the current fiber while others can be
executed. When such an operation occurs, a possible blocking operation would be passed into the
event loop and the fiber would be suspended until the resource is ready to
continue fiber execution.
Here is the list of implicitly yielding operations:
- Connection establishment (socket).
- Socket read and write (socket).
- Filesystem operations (from fio).
- Channel data transfer (fiber.channel).
- File input/output (from fio).
- Console operations (since console is a socket).
- HTTP requests (since HTTP is a socket operation).
- Database modifications (if they imply a disk write).
- Database reading for the vinyl engine.
- Invocation of another process (popen).
Note
Please note that all operations of the os module are non-cooperative and
exclusively block the whole tx thread.
For memtx, since all data is in memory, there is no yielding for a read request
(like :select, :pairs, :get).
For vinyl, since some data may not be in memory, there may be disk I/O for a
read (to fetch data from disk) or write (because a stall may occur while waiting for memory to be freed).
For both memtx and vinyl, since data change requests
must be recorded in the WAL, there is normally a box.commit().
With the default autocommit mode the following operations are yielding:
To provide atomicity for transactions in transaction mode, some changes are applied to the
modification operations for the memtx engine. After executing
box.begin or within a box.atomic
call, any modification operation will not yield, and yield will occur only on box.commit or upon return
from box.atomic. Meanwhile, box.rollback does not yield.
That is why executing separate commands like select(), insert(), update() in the console inside a
transaction without MVCC causes the transaction to abort. This is due to the implicit yield after each
chunk of code is executed in the console.
For memtx:
space:get()
space:insert()
The sequence has one yield, at the end of the insert, caused by implicit commit;
get() has nothing to write to the WAL and so does not yield.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()
The sequence has one yield, at box.commit(); none of the inserts yield.
For vinyl:
space:get()
space:insert()
The sequence has one to three yields, since get() may yield if the data is not in the cache,
insert() may yield if it waits for available memory, and there is an implicit yield
at commit.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()
The sequence may yield from 1 to 5 times.
Assume that there are tuples in the memtx space tester where the third field
represents a positive dollar amount.
Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end
the transaction, making its effects permanent.
If wal_mode = none, then
there is no implicit yielding at the commit time because there are
no writes to the WAL.
If a request is performed via a network connector such as net.box and implies
sending requests to the server and receiving responses, then it involves network
I/O and thus implicit yielding, even if the request sent to the server
has no implicit yield itself. Therefore, the following sequence yields
three times, once per request, while sending requests over the network and awaiting the results.
conn.space.test:get{1}
conn.space.test:get{2}
conn.space.test:get{3}
Cooperative multitasking means that unless a running fiber deliberately yields
control, it is not preempted by some other fiber. But a running fiber will
deliberately yield when it encounters a “yield point”: a transaction commit,
an operating system call, or an explicit “yield” request.
Any system call which can block will be performed asynchronously, and any running
fiber which must wait for a system call will be preempted, so that another
ready-to-run fiber takes its place and becomes the new running fiber.
This model makes all programmatic locks unnecessary: cooperative multitasking
ensures that there will be no concurrency around a resource, no race conditions,
and no memory consistency issues. The way to achieve this is simple:
use no yields, explicit or implicit, in critical sections, and no one can
interfere with code execution.
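The claim that a critical section without yields needs no lock can be checked with another plain-Lua coroutine analogy. In the sketch below, two tasks each increment a shared counter twice. In the unsafe variant, a yield sits between the read and the write, so each task overwrites the other's update. In the safe variant, the read-modify-write runs with no yield in between. make_task and run_round_robin are hypothetical helpers, and coroutine.yield() stands in for any explicit or implicit fiber yield.

```lua
local counter = 0

-- Each task increments the shared counter `times` times.
local function make_task(times, yield_inside)
    return coroutine.create(function()
        for _ = 1, times do
            local value = counter          -- read
            if yield_inside then
                coroutine.yield()          -- yield inside the critical section (unsafe)
            end
            counter = value + 1            -- write
            coroutine.yield()              -- yield outside the critical section (safe)
        end
    end)
end

-- Resume tasks in turn until all of them have finished.
local function run_round_robin(tasks)
    local active = #tasks
    while active > 0 do
        active = 0
        for _, co in ipairs(tasks) do
            if coroutine.status(co) ~= 'dead' then
                assert(coroutine.resume(co))
                if coroutine.status(co) ~= 'dead' then active = active + 1 end
            end
        end
    end
end

counter = 0
run_round_robin({ make_task(2, false), make_task(2, false) })
local safe_result = counter    -- 4: no update is lost

counter = 0
run_round_robin({ make_task(2, true), make_task(2, true) })
local unsafe_result = counter  -- 2: each task overwrote the other's increments
```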
For small requests, such as simple UPDATE or INSERT or DELETE or
SELECT, fiber scheduling is fair: it takes little time to process the
request, schedule a disk write, and yield to a fiber serving the next client.
However, a function may perform complex calculations or be written in
such a way that yields take a long time to occur. This can lead to
unfair scheduling when a single client throttles the rest of the system, or to
apparent stalls in processing requests. It is the responsibility of the function
author to avoid this situation. As a protective mechanism, a fiber slice can be used.
Lua cookbook recipes
Here are contributions of Lua programs for some frequent or tricky situations.
You can execute any of these programs by copying the code into a .lua file,
and then entering chmod +x ./program-name.lua
and ./program-name.lua on the terminal.
The first line is a “hashbang”:
#!/usr/bin/env tarantool
This runs Tarantool Lua application server, which should be on the execution
path.
This section contains the following recipes:
Use freely.
See more recipes on Tarantool GitHub.
The standard example of a simple program.
#!/usr/bin/env tarantool
print('Hello, World!')
Use box.once() to initialize a database
(creating spaces) if this is the first time the server has been run.
Then use console.start() to start interactive mode.
#!/usr/bin/env tarantool
-- Configure database
box.cfg {
listen = 3313
}
box.once("bootstrap", function()
box.schema.space.create('tweedledum')
box.space.tweedledum:create_index('primary',
{ type = 'TREE', parts = {1, 'unsigned'}})
end)
require('console').start()
Use the fio module to open, read, and close a file.
#!/usr/bin/env tarantool
local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_RDONLY' })
if not f then
error("Failed to open file: "..errno.strerror())
end
local data = f:read(4096)
f:close()
print(data)
Use the fio module to open, write, and close a file.
#!/usr/bin/env tarantool
local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_CREAT', 'O_WRONLY', 'O_APPEND'},
tonumber('0666', 8))
if not f then
error("Failed to open file: "..errno.strerror())
end
f:write("Hello\n");
f:close()
Use the LuaJIT ffi library to call a C built-in function: printf().
(For help understanding ffi, see the FFI tutorial.)
#!/usr/bin/env tarantool
local ffi = require('ffi')
ffi.cdef[[
int printf(const char *format, ...);
]]
ffi.C.printf("Hello, %s\n", os.getenv("USER"));
Use the LuaJIT ffi library to call a C function: gettimeofday().
This delivers time with millisecond precision, unlike the time function in
Tarantool’s clock module.
#!/usr/bin/env tarantool
local ffi = require('ffi')
ffi.cdef[[
    typedef long time_t;
    typedef struct timeval {
        time_t tv_sec;
        time_t tv_usec;
    } timeval;
    int gettimeofday(struct timeval *t, void *tzp);
]]
local timeval_buf = ffi.new("timeval")
-- Return the current time in milliseconds since the Unix epoch
local now = function()
    ffi.C.gettimeofday(timeval_buf, nil)
    return tonumber(timeval_buf.tv_sec * 1000 + (timeval_buf.tv_usec / 1000))
end
print(now())
Use the LuaJIT ffi library to call functions from a C library (here, zlib).
(For help understanding ffi, see the FFI tutorial.)
#!/usr/bin/env tarantool
local ffi = require("ffi")
ffi.cdef[[
    unsigned long compressBound(unsigned long sourceLen);
    int compress2(uint8_t *dest, unsigned long *destLen,
                  const uint8_t *source, unsigned long sourceLen, int level);
    int uncompress(uint8_t *dest, unsigned long *destLen,
                   const uint8_t *source, unsigned long sourceLen);
]]
local zlib = ffi.load(ffi.os == "Windows" and "zlib1" or "z")
-- Lua wrapper for compress2()
local function compress(txt)
    local n = zlib.compressBound(#txt)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.compress2(buf, buflen, txt, #txt, 9)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end
-- Lua wrapper for uncompress(); n is the expected uncompressed size
local function uncompress(comp, n)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.uncompress(buf, buflen, comp, #comp)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end
-- Simple test code.
local txt = string.rep("abcd", 1000)
print("Uncompressed size: ", #txt)
local c = compress(txt)
print("Compressed size: ", #c)
local txt2 = uncompress(c, #txt)
assert(txt2 == txt)
Create Lua tables and print them.
Notice that for the ‘array’ table the iterator function
is ipairs(), while for the ‘map’ table the iterator function
is pairs(). (ipairs() is faster than pairs(), but it visits only
the consecutive integer keys 1, 2, 3 and so on, stopping at the first nil,
so pairs() is needed for map-like or mixed tables.)
The output will look like
“1 Apple | 2 Orange | 3 Grapefruit | 4 Banana | k3 v3 | k1 v1 | k2 v2”,
although the order of the map entries is not guaranteed and may differ.
#!/usr/bin/env tarantool
array = { 'Apple', 'Orange', 'Grapefruit', 'Banana'}
for k, v in ipairs(array) do print(k, v) end
map = { k1 = 'v1', k2 = 'v2', k3 = 'v3' }
for k, v in pairs(map) do print(k, v) end
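pairs() visits map keys in no particular order. When a stable order is needed, a common approach is to collect the keys, sort them, and iterate over the sorted list. Below is a minimal plain-Lua sketch of that approach. The helper name sorted_pairs is hypothetical and is not a Tarantool API. The sketch assumes all keys can be compared with "<", for example because they are all strings.

```lua
-- Hypothetical helper: iterate over a map in ascending key order.
-- Assumes every key can be compared with "<" (e.g. all strings).
local function sorted_pairs(map)
    local keys = {}
    for k in pairs(map) do
        keys[#keys + 1] = k
    end
    table.sort(keys)
    local i = 0
    return function()
        i = i + 1
        local k = keys[i]
        if k ~= nil then
            return k, map[k]
        end
    end
end

local map = { k1 = 'v1', k2 = 'v2', k3 = 'v3' }
local parts = {}
for k, v in sorted_pairs(map) do
    parts[#parts + 1] = k .. ' ' .. v
end
print(table.concat(parts, ' | '))  -- k1 v1 | k2 v2 | k3 v3
```

Sorting costs O(N log N) on each traversal, so this only makes sense where a deterministic order actually matters, such as output or tests.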
Use the ‘#’ operator to get the number of items in an array-like Lua table.
This operation has O(log(N)) complexity.
#!/usr/bin/env tarantool
array = { 1, 2, 3}
print(#array)
count_array_with_nils.lua
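Missing elements in arrays, which Lua treats as “nil”s,
cause the simple “#” operator to deliver unreliable results.
The “print(#t)” result is not well defined: Lua only guarantees that #
returns some “border” (an index n where t[n] is non-nil and t[n+1] is nil),
so it may print “1”, “4”, or “10” depending on how the table is stored internally;
the “print(counter)” instruction will print “3”;
the “print(max)” instruction will print “10”.
Other table functions, such as table.sort(), will
also misbehave when “nils” are present.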
#!/usr/bin/env tarantool
local t = {}
t[1] = 1
t[4] = 4
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)
count_array_with_nulls.lua
Use explicit NULL values to avoid the problems caused by Lua’s
nil == missing value behavior. Although json.NULL == nil evaluates to
true, json.NULL is a real (non-nil) value stored in the table, so the
table has no holes and all the print instructions in this program
print the correct value: 10.
#!/usr/bin/env tarantool
local json = require('json')
local t = {}
t[1] = 1;         t[2] = json.NULL; t[3] = json.NULL
t[4] = 4;         t[5] = json.NULL; t[6] = json.NULL
t[7] = json.NULL; t[8] = json.NULL; t[9] = json.NULL
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)
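json.NULL is specific to Tarantool. In plain Lua, the same technique can be sketched with any unique non-nil sentinel value. In the sketch below, a hypothetical NULL table stands in for json.NULL. Unlike json.NULL, this sentinel does not compare equal to nil, so code that does arithmetic must skip it explicitly.

```lua
-- A unique sentinel meaning "no value here" (a stand-in for json.NULL).
local NULL = setmetatable({}, { __tostring = function() return 'NULL' end })

local t = {}
for i = 1, 10 do
    t[i] = NULL              -- fill every slot so the array has no holes
end
t[1], t[4], t[10] = 1, 4, 10

local counter, sum = 0, 0
for _, v in ipairs(t) do     -- ipairs() now reaches index 10
    counter = counter + 1
    if v ~= NULL then        -- skip placeholders when computing
        sum = sum + v
    end
end
print(#t, counter, sum)      -- 10, 10 and 15 (tab-separated)
```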
Get the number of elements in a map-like table.
#!/usr/bin/env tarantool
local map = { a = 10, b = 15, c = 20 }
local size = 0
for _ in pairs(map) do size = size + 1; end
print(size)
Use a Lua peculiarity to swap two variables without needing a third variable.
#!/usr/bin/env tarantool
local x = 1
local y = 2
x, y = y, x
print(x, y)
Create a class, create a metatable for the class, create an instance of the class.
Another illustration is at http://lua-users.org/wiki/LuaClassesWithMetatable.
#!/usr/bin/env tarantool
-- define class objects
local myclass_somemethod = function(self)
    print('test 1', self.data)
end
local myclass_someothermethod = function(self)
    print('test 2', self.data)
end
local myclass_tostring = function(self)
    return 'MyClass <' .. self.data .. '>'
end
local myclass_mt = {
    __tostring = myclass_tostring;
    __index = {
        somemethod = myclass_somemethod;
        someothermethod = myclass_someothermethod;
    }
}
-- create a new object of myclass
local object = setmetatable({ data = 'data'}, myclass_mt)
object:somemethod()       -- methods are found through __index; they print themselves
object:someothermethod()
print(object.data)
print(object)             -- uses __tostring: MyClass <data>
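The same recipe is often written in a more compact idiom. The class table serves as both the metatable and the __index table, and a constructor function creates instances. Below is a minimal sketch of that common pattern. The names MyClass and MyClass.new are illustrative and are not part of any Tarantool module.

```lua
local MyClass = {}
MyClass.__index = MyClass   -- instances look up missing keys (methods) here

-- Constructor: every instance shares MyClass as its metatable
function MyClass.new(data)
    return setmetatable({ data = data }, MyClass)
end

function MyClass:somemethod()
    return 'test 1 ' .. self.data
end

function MyClass:someothermethod()
    return 'test 2 ' .. self.data
end

-- Metamethods are read from the metatable, i.e. from MyClass itself
function MyClass:__tostring()
    return 'MyClass <' .. self.data .. '>'
end

local object = MyClass.new('data')
print(object:somemethod())       -- test 1 data
print(object:someothermethod())  -- test 2 data
print(object)                    -- MyClass <data>
```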
fiber_producer_and_consumer.lua
Start one fiber for producer and one fiber for consumer.
Use fiber.channel() to exchange data and synchronize.
One can tweak the channel size (ch_size in the program code)
to control the number of simultaneous tasks waiting for processing.
#!/usr/bin/env tarantool
local fiber = require('fiber')
local function consumer_loop(ch, i)
    -- initialize consumer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = ch:get()
        if data == nil then
            break
        end
        print('consumed', i, data)
        fiber.sleep(math.random()) -- simulate some work
    end
end
local function producer_loop(ch, i)
    -- initialize producer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = math.random()
        ch:put(data) -- blocks while the channel is full
        print('produced', i, data)
    end
end
local function start()
    local consumer_n = 5
    local producer_n = 3
    -- Create a channel
    local ch_size = math.max(consumer_n, producer_n)
    local ch = fiber.channel(ch_size)
    -- Start consumers
    for i = 1, consumer_n, 1 do
        fiber.create(consumer_loop, ch, i)
    end
    -- Start producers
    for i = 1, producer_n, 1 do
        fiber.create(producer_loop, ch, i)
    end
end
start()
print('started')
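Fibers and channels are Tarantool features, but the effect of ch_size can be sketched with standard Lua coroutines. The sketch is a plain-Lua analogue, not the fiber API. A producer coroutine yields whenever a bounded buffer already holds capacity items, much as ch:put() blocks when the channel is full. As a result, no more than capacity items ever wait for processing. The function run_pipeline and its parameters are hypothetical, and the sketch assumes capacity is at least 1.

```lua
-- Plain-Lua analogue of a bounded channel (not Tarantool's fiber API).
-- The producer coroutine appends items to a buffer and yields whenever the
-- buffer already holds `capacity` items; the main loop plays the consumer
-- and takes one item per scheduling round. Assumes capacity >= 1.
local function run_pipeline(capacity, total)
    local buffer, consumed = {}, {}
    local max_waiting = 0

    local producer = coroutine.create(function()
        for i = 1, total do
            while #buffer >= capacity do
                coroutine.yield()        -- buffer full: hand control back
            end
            buffer[#buffer + 1] = i
            if #buffer > max_waiting then max_waiting = #buffer end
        end
    end)

    while coroutine.status(producer) ~= "dead" or #buffer > 0 do
        if coroutine.status(producer) ~= "dead" then
            assert(coroutine.resume(producer))
        end
        if #buffer > 0 then
            consumed[#consumed + 1] = table.remove(buffer, 1)  -- FIFO
        end
    end
    return consumed, max_waiting
end

local consumed, max_waiting = run_pipeline(2, 5)
print(#consumed, max_waiting)   -- 5 items consumed, at most 2 waiting
```

Raising capacity lets the producer run further ahead of the consumer. This is the trade-off that ch_size controls in the fiber version.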
Use socket.tcp_connect()
to connect to a remote host via TCP.
Display the connection details and the result of a GET request.
#!/usr/bin/env tarantool
local s = require('socket').tcp_connect('google.com', 80)
print(s:peer().host)
print(s:peer().family)
print(s:peer().type)
print(s:peer().protocol)
print(s:peer().port)
print(s:write("GET / HTTP/1.0\r\n\r\n"))
print(s:read('\r\n'))
print(s:read('\r\n'))
Use socket.tcp_server()
to set up a simple TCP server: create
a function that handles requests and echoes each line back
with a “pong: ” prefix, and pass that function to
socket.tcp_server().
This program has been used to test with 100,000 clients,
with each client getting a separate fiber.
#!/usr/bin/env tarantool
local function handler(s, peer)
    s:write("Welcome to test server, " .. peer.host .. "\n")
    while true do
        local line = s:read('\n')
        -- read() returns nil on error and an empty string at end of input
        if line == nil or line == "" then
            break -- error or eof
        end
        if not s:write("pong: " .. line) then
            break -- error or eof
        end
    end
end
local server, addr = require('socket').tcp_server('localhost', 3311, handler)
Use socket.getaddrinfo() to perform
non-blocking DNS resolution, getting both the AF_INET6 and AF_INET
information for ‘google.com’.
This technique is not always necessary for TCP connections, because
socket.tcp_connect()
calls socket.getaddrinfo() internally
before trying to connect to the first available address.
#!/usr/bin/env tarantool
local s = require('socket').getaddrinfo('google.com', 'http', { type = 'SOCK_STREAM' })
assert(s, 'getaddrinfo() failed')
-- Print every address returned; there may be only one
for _, ai in ipairs(s) do
    print('host=', ai.host)
    print('family=', ai.family)
    print('type=', ai.type)
    print('protocol=', ai.protocol)
    print('port=', ai.port)
end
Tarantool does not currently have a udp_server function,
so the UDP echo server (socket_udp_echo.lua) is more complicated than
the TCP one (socket_tcp_echo.lua):
it has to be implemented with sockets and fibers.
#!/usr/bin/env tarantool
local socket = require('socket')
local errno = require('errno')
local fiber = require('fiber')
local function udp_server_loop(s, handler)
    fiber.name("udp_server")
    while true do
        -- try to read a datagram first
        local msg, peer = s:recvfrom()
        if msg == "" then
            -- socket was closed via s:close()
            break
        elseif msg ~= nil then
            -- got a new datagram
            handler(s, peer, msg)
        else
            if s:errno() == errno.EAGAIN or s:errno() == errno.EINTR then
                -- socket is not ready
                s:readable() -- yield, epoll will wake us when new data arrives
            else
                -- socket error
                local err_msg = s:error()
                s:close() -- save resources and don't wait for GC
                error("Socket error: " .. err_msg)
            end
        end
    end
end
local function udp_server(host, port, handler)
    local s = socket('AF_INET', 'SOCK_DGRAM', 0)
    if not s then
        return nil -- check errno.strerror()
    end
    if not s:bind(host, port) then
        local e = s:errno() -- save errno
        s:close()
        errno(e) -- restore errno
        return nil -- check errno.strerror()
    end
    fiber.create(udp_server_loop, s, handler) -- start a new background fiber
    return s
end
A handler that replies to each client datagram, together with the code
that starts the server, could look something like this:
local function handler(s, peer, msg)
    -- You don't have to wait until socket is ready to send UDP
    -- s:writable()
    s:sendto(peer.host, peer.port, "Pong: " .. msg)
end
local server = udp_server('127.0.0.1', 3548, handler)
if not server then
    error('Failed to bind: ' .. errno.strerror())
end
print('Started')
require('console').start()
Use the http module
to get data via HTTP.
#!/usr/bin/env tarantool
local http_client = require('http.client')
local json = require('json')
local r = http_client.get('https://api.frankfurter.app/latest?to=USD%2CRUB')
if r.status ~= 200 then
    print('Failed to get currency ', r.reason)
    return
end
local data = json.decode(r.body)
print(data.base, 'rate of', data.date, 'is', data.rates.RUB, 'RUB or', data.rates.USD, 'USD')
Use the http module
to send data via HTTP.
#!/usr/bin/env tarantool
local http_client = require('http.client')
local json = require('json')
local data = json.encode({ Key = 'Value'})
local headers = { Token = 'xxxx', ['X-Secret-Value'] = '42' }
local r = http_client.post('http://localhost:8081', data, { headers = headers})
if r.status == 200 then
    print('Success')
end
Use the http rock (which must first be installed)
to turn Tarantool into a web server.
#!/usr/bin/env tarantool
local function handler(self)
    return self:render{ json = { ['Your-IP-Is'] = self.peer.host } }
end
local server = require('http.server').new(nil, 8080, {charset = "utf8"}) -- listen *:8080
server:route({ path = '/' }, handler)
server:start()
-- connect to localhost:8080 and see json
Use the http rock (which must first be installed)
to generate HTML pages from templates.
The http
rock has a fairly simple template engine which allows execution
of regular Lua code inside text blocks (like PHP). Therefore there is no need
to learn new languages in order to write templates.
#!/usr/bin/env tarantool
local function handler(self)
    local fruits = {'Apple', 'Orange', 'Grapefruit', 'Banana'}
    return self:render{ fruits = fruits }
end
local server = require('http.server').new(nil, 8080, {charset = "utf8"}) -- nil means '*'
server:route({ path = '/', file = 'index.html.lua' }, handler)
server:start()
An “HTML” file for this server, including Lua, could look like this
(it would produce “1 Apple | 2 Orange | 3 Grapefruit | 4 Banana”).
Create a templates directory and put this file in it under the name
index.html.lua, the file named in the route above:
<html>
<body>
<table border="1">
% for i,v in pairs(fruits) do
<tr>
<td><%= i %></td>
<td><%= v %></td>
</tr>
% end
</table>
</body>
</html>
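Here is a minimal plain-Lua sketch of the general idea, which shows why no new template language is needed. Each template line that starts with % is copied into generated Lua code as-is. Every other line becomes an output statement in which each <%= expr %> is replaced by tostring(expr). This is only an illustration of the technique, not the http rock's actual implementation. The compile_template function and its var_names parameter are hypothetical, and the sketch does no HTML escaping.

```lua
-- Hypothetical mini template engine (illustration only, no HTML escaping).
-- Lines starting with "%" are Lua code; in other lines, <%= expr %> is
-- replaced by tostring(expr). var_names lists the variables the template uses.
local function compile_template(template, var_names)
    local code = { "local vars, out = ..." }
    for _, name in ipairs(var_names) do
        code[#code + 1] = string.format("local %s = vars[%q]", name, name)
    end
    for line in (template .. "\n"):gmatch("(.-)\n") do
        local lua_line = line:match("^%s*%%(.*)$")
        if lua_line then
            code[#code + 1] = lua_line            -- copy Lua code as-is
        else
            local pos = 1
            while true do
                local s, e, expr = line:find("<%%=(.-)%%>", pos)
                if not s then break end
                code[#code + 1] = string.format("out[#out + 1] = %q", line:sub(pos, s - 1))
                code[#code + 1] = "out[#out + 1] = tostring(" .. expr .. ")"
                pos = e + 1
            end
            code[#code + 1] = string.format("out[#out + 1] = %q", line:sub(pos) .. "\n")
        end
    end
    local chunk, err = (loadstring or load)(table.concat(code, "\n"))
    if not chunk then error(err) end
    return function(vars)
        local out = {}
        chunk(vars, out)
        return table.concat(out)
    end
end

local render = compile_template([[
<table border="1">
% for i, v in ipairs(fruits) do
<tr><td><%= i %></td><td><%= v %></td></tr>
% end
</table>]], { "fruits" })
print(render({ fruits = { 'Apple', 'Orange', 'Grapefruit', 'Banana' } }))
```

A real engine would also escape HTML, report template line numbers in errors, and cache compiled templates.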
In Go, there is no one-liner to select all tuples from a Tarantool space.
Yet you can use a program like this one, which fetches the tuples in batches.
First run the Lua code from the comment block on the instance you want to
connect to, then run the program.
package main
import (
	"fmt"
	"log"

	"github.com/tarantool/go-tarantool"
)
/*
box.cfg{listen = 3301}
box.schema.user.passwd('pass')
s = box.schema.space.create('tester')
s:format({
{name = 'id', type = 'unsigned'},
{name = 'band_name', type = 'string'},
{name = 'year', type = 'unsigned'}
})
s:create_index('primary', { type = 'hash', parts = {'id'} })
s:create_index('scanner', { type = 'tree', parts = {'id', 'band_name'} })
s:insert{1, 'Roxette', 1986}
s:insert{2, 'Scorpions', 2015}
s:insert{3, 'Ace of Base', 1993}
*/
func main() {
	conn, err := tarantool.Connect("127.0.0.1:3301", tarantool.Opts{
		User: "admin",
		Pass: "pass",
	})
	if err != nil {
		log.Fatalf("Connection refused: %s", err)
	}
	defer conn.Close()
	spaceName := "tester"
	indexName := "scanner"
	idFn := conn.Schema.Spaces[spaceName].Fields["id"].Id
	bandNameFn := conn.Schema.Spaces[spaceName].Fields["band_name"].Id
	var tuplesPerRequest uint32 = 2
	cursor := []interface{}{}
	for {
		resp, err := conn.Select(spaceName, indexName, 0, tuplesPerRequest, tarantool.IterGt, cursor)
		if err != nil {
			log.Fatalf("Failed to select: %s", err)
		}
		if resp.Code != tarantool.OkCode {
			log.Fatalf("Select failed: %s", resp.Error)
		}
		if len(resp.Data) == 0 {
			break
		}
		fmt.Println("Iteration")
		tuples := resp.Tuples()
		for _, tuple := range tuples {
			fmt.Printf("\t%v\n", tuple)
		}
		// Resume the next request strictly after the last key seen
		lastTuple := tuples[len(tuples)-1]
		cursor = []interface{}{lastTuple[idFn], lastTuple[bandNameFn]}
	}
}
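The Go program above is an instance of keyset (cursor) pagination. Each request asks for tuples strictly greater than the composite key {id, band_name} of the last tuple already seen. The plain-Lua sketch below simulates that loop over an in-memory array, which is assumed to be sorted by {id, band_name}. select_gt is a hypothetical stand-in for an index select with a GT iterator and a limit; it is not a Tarantool call. As in the Go program, an empty cursor means “start from the beginning”.

```lua
-- Compare two composite keys {id, band_name} element by element.
local function key_less(a, b)
    for i = 1, #a do
        if a[i] ~= b[i] then
            return a[i] < b[i]
        end
    end
    return false
end

-- Hypothetical stand-in for "select up to `limit` rows whose key is greater
-- than `cursor`" on rows already sorted by {id, band_name}.
-- An empty cursor selects from the beginning.
local function select_gt(rows, cursor, limit)
    local page = {}
    for _, row in ipairs(rows) do
        if #page >= limit then break end
        if #cursor == 0 or key_less(cursor, { row[1], row[2] }) then
            page[#page + 1] = row
        end
    end
    return page
end

-- Fetch everything in batches, resuming after the last key seen each time.
local function scan_all(rows, per_request)
    local cursor, pages = {}, {}
    while true do
        local page = select_gt(rows, cursor, per_request)
        if #page == 0 then break end
        pages[#pages + 1] = page
        local last = page[#page]
        cursor = { last[1], last[2] }
    end
    return pages
end

local rows = {
    { 1, 'Roxette', 1986 },
    { 2, 'Scorpions', 2015 },
    { 3, 'Ace of Base', 1993 },
}
local pages = scan_all(rows, 2)
for n, page in ipairs(pages) do
    print('Iteration', n)
    for _, row in ipairs(page) do
        print('', row[1], row[2], row[3])
    end
end
```

Because each request starts from a key rather than an offset, it stays cheap however far into the space the scan has progressed.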
Lua tutorials
If you’re new to Lua, we recommend going over the interactive Tarantool
tutorial. To launch the tutorial, run the tutorial() command in the Tarantool console:
tarantool> tutorial()
Insert one million tuples with a Lua stored procedure
This is an exercise assignment: “Insert one million tuples. Each tuple should
have a constantly-increasing numeric primary-key field and a random alphabetic
10-character string field.”
The purpose of the exercise is to show what Lua functions look like inside
Tarantool. It will be necessary to employ the Lua math library, the Lua string
library, the Tarantool box library, the Tarantool box.tuple library, loops, and
concatenations. It should be easy to follow even for a person who has not used
either Lua or Tarantool before. The only requirement is a knowledge of how other
programming languages work and a memory of the first two chapters of this manual.
But for better understanding, follow the comments and the links, which point to
the Lua manual or to elsewhere in this Tarantool manual. To further enhance
learning, type the statements in with the tarantool client while reading along.
In earlier versions of Tarantool, multi-line functions had to be
enclosed within “delimiters”. They are no longer necessary, and
so they will not be used in this tutorial. However, they are still
supported. Users who wish to use delimiters, or users of
older versions of Tarantool, should check the syntax description for
declaring a delimiter before proceeding.
Create a function that returns a string
We will start by making a function that returns a fixed string, “hello world”.
function string_function()
    return "hello world"
end
The word “function” is a Lua keyword – we’re about to go into Lua. The
function name is string_function. The function has one executable statement,
return "hello world". The string “hello world” is enclosed in double quotes
here, although Lua doesn’t care – one could use single quotes instead. The
word “end” means “this is the end of the Lua function declaration.”
To confirm that the function works, we can say:
string_function()
Sending function-name() means “invoke the Lua function.” The effect is
that the string which the function returns will end up on the screen.
For more about Lua strings see Lua manual chapter 2.4 “Strings” . For more
about functions see Lua manual chapter 5 “Functions”.
Create a function that calls another function and sets a variable
Now that string_function exists, we can invoke it from another
function.
function main_function()
    local string_value
    string_value = string_function()
    return string_value
end
We begin by declaring a variable “string_value”. The word “local”
means that string_value appears only in main_function. If we didn’t use
“local” then string_value would be visible everywhere - even by other
users using other clients connected to this server instance! Sometimes that’s a very
desirable feature for inter-client communication, but not this time.
Then we assign a value to string_value, namely, the result of
string_function(). Soon we will invoke main_function() to check that it
got the value.
For more about Lua variables see Lua manual chapter 4.2 “Local Variables and Blocks” .
Modify the function so it returns a one-letter random string
Now that it’s a bit clearer how to make a variable, we can change
string_function() so that, instead of returning a fixed literal
“Hello world”, it returns a random letter between ‘A’ and ‘Z’.
function string_function()
    local random_number
    local random_string
    random_number = math.random(65, 90)
    random_string = string.char(random_number)
    return random_string
end
It is not necessary to destroy the old string_function() contents; they’re
simply overwritten. The first assignment invokes a random-number function
in Lua’s math library; the parameters mean “the number must be an integer
between 65 and 90, inclusive.” The second assignment invokes an integer-to-character
function in Lua’s string library; the parameter is the code point of the
character. Luckily the ASCII value of ‘A’ is 65 and the ASCII value of ‘Z’
is 90, so the result will always be a letter between A and Z.
For more about Lua math-library functions see Lua users
“Math Library Tutorial”.
For more about Lua string-library functions see Lua users
“String Library Tutorial” .
Once again, main_function() calls string_function(), so invoking
main_function() shows the result. The exact output varies from run to run
because math.random() produces random numbers, but for illustration purposes
it doesn’t matter what the random string values are.
Modify the function so it returns a ten-letter random string
Now that it’s clear how to produce one-letter random strings, we can reach our
goal of producing a ten-letter string by concatenating ten one-letter strings,
in a loop.
function string_function()
    local random_number
    local random_string
    random_string = ""
    for x = 1,10,1 do
        random_number = math.random(65, 90)
        random_string = random_string .. string.char(random_number)
    end
    return random_string
end
The words “for x = 1,10,1” mean “start with x equal to 1, loop until x
exceeds 10, and increment x by 1 on each iteration.” The symbol “..” means
“concatenate”, that is, append the string on the right of the “..” sign to the
string on the left of it. Since we start by setting random_string to “” (an
empty string), the end result is that random_string has 10 random letters.
Once again, invoking main_function() shows the result of string_function().
For more about Lua loops see Lua manual chapter 4.3.4 “Numeric for”.
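For ten characters, the repeated concatenation above is perfectly fine. Each “..”, however, creates a new intermediate string. When building longer strings, a common Lua idiom is to collect the pieces in a table and join them once with table.concat(). A minimal sketch follows; the function name random_letters is hypothetical.

```lua
-- Build a random string of uppercase letters A-Z.
-- Collect characters in a table and join them once with table.concat,
-- instead of creating a new intermediate string on every "..".
local function random_letters(length)
    local chars = {}
    for x = 1, length do
        chars[x] = string.char(math.random(65, 90))  -- 65 = 'A', 90 = 'Z'
    end
    return table.concat(chars)
end

print(random_letters(10))
```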
Make a tuple out of a number and a string
Now that it’s clear how to make a 10-letter random string, it’s possible to
make a tuple that contains a number and a 10-letter random string, by invoking
a function in Tarantool’s library of Lua functions.
function main_function()
    local string_value, t
    string_value = string_function()
    t = box.tuple.new({1, string_value})
    return t
end
Once this is done, t will be the value of a new tuple which has two fields.
The first field is numeric: 1. The second field is a random string. Invoking
main_function() now returns that tuple.
For more about Tarantool tuples see Tarantool manual section Submodule box.tuple.
Modify main_function to insert a tuple into the database
Now that it’s clear how to make a tuple that contains a number and a 10-letter
random string, the only trick remaining is putting that tuple into tester.
Remember that tester is the first space that was defined in the sandbox, so
it’s like a database table.
function main_function()
    local string_value, t
    string_value = string_function()
    t = box.tuple.new({1,string_value})
    box.space.tester:replace(t)
end
The new line here is box.space.tester:replace(t). The name contains
‘tester’ because the insertion is going to be to tester, and the parameter
is the tuple value. To be perfectly correct we could have said
box.space.tester:insert(t) here, rather than box.space.tester:replace(t),
but “replace” means “insert, or overwrite any existing tuple with the same
primary-key value”, and that makes it easier to re-run the exercise even if
the sandbox database isn’t empty. Once this is done, tester will contain a tuple
with two fields. The first field will be 1. The second field will be a random
10-letter string. This time, though, invoking main_function() won’t tell the
whole story, because it does not return t; it only puts t into the database.
To confirm that something got inserted, we’ll use a SELECT request:
main_function()
box.space.tester:select{1}
For more about Tarantool insert and replace calls, see Tarantool manual section
Submodule box.space,
space_object:insert(), and
space_object:replace().
Modify main_function to insert a million tuples into the database
Now that it’s clear how to insert one tuple into the database, it’s no big deal
to figure out how to scale up: instead of inserting with the literal value 1
for the primary key, insert with a variable whose value runs from 1 to 1 million,
in a loop. Since we already saw how to loop, that’s a simple thing. The only extra
wrinkle that we add here is a timing function.
function main_function()
    local string_value, t
    for i = 1,1000000,1 do
        string_value = string_function()
        t = box.tuple.new({i,string_value})
        box.space.tester:replace(t)
    end
end
start_time = os.clock()
main_function()
end_time = os.clock()
'insert done in ' .. end_time - start_time .. ' seconds'
The standard Lua function
os.clock()
returns the amount of CPU time, in seconds, that the program has used since it
started. Therefore, by getting start_time just before the inserting and end_time
just after it, we can calculate (end_time - start_time) = the CPU time spent on
the inserts, in seconds. We display that value by entering an expression without
any assignment, which causes Tarantool to send the value to the client, which
prints it. (Lua’s answer to the C printf() function, which is print(), will also work.)
For more on Lua os.clock() see Lua manual chapter 22.1 “Date and Time”.
For more on Lua print() see Lua manual chapter 5 “Functions”.
Since this is the grand finale, we will redo the final versions of all the
necessary requests: the request that
created string_function(), the request that created main_function(),
and the request that invokes main_function().
function string_function()
    local random_number
    local random_string
    random_string = ""
    for x = 1,10,1 do
        random_number = math.random(65, 90)
        random_string = random_string .. string.char(random_number)
    end
    return random_string
end
function main_function()
    local string_value, t
    for i = 1,1000000,1 do
        string_value = string_function()
        t = box.tuple.new({i,string_value})
        box.space.tester:replace(t)
    end
end
start_time = os.clock()
main_function()
end_time = os.clock()
'insert done in ' .. end_time - start_time .. ' seconds'
What has been shown is that Lua functions are quite expressive (in fact one can
do more with Tarantool’s Lua stored procedures than one can do with stored
procedures in some SQL DBMSs), and that it’s straightforward to combine
Lua-library functions and Tarantool-library functions.
What has also been shown is that inserting a million tuples took 37 seconds. The
host computer was a Linux laptop. By changing wal_mode to ‘none’ before
running the test, one can reduce the elapsed time to 4 seconds.
Sum a JSON field for all tuples
This is an exercise assignment: “Assume that inside every tuple there is a
string formatted as JSON. Inside that string there is a JSON numeric field.
For each tuple, find the numeric field’s value and add it to a ‘sum’ variable.
At end, return the ‘sum’ variable.” The purpose of the exercise is to get
experience in one way to read and process tuples.
json = require('json')                                      -- line 1
function sum_json_field(field_name)                         -- line 2
    local v, t, sum, field_value, is_valid_json, lua_table  -- line 3
    sum = 0                                                 -- line 4
    for v, t in box.space.tester:pairs() do                 -- line 5
        is_valid_json, lua_table = pcall(json.decode, t[2]) -- line 6
        if is_valid_json then                               -- line 7
            field_value = lua_table[field_name]             -- line 8
            if type(field_value) == "number" then sum = sum + field_value end -- line 9
        end                                                 -- line 10
    end                                                     -- line 11
    return sum                                              -- line 12
end                                                         -- line 13
LINE 3: WHY “LOCAL”. This line declares all the variables that will be used in
the function. Actually it’s not necessary to declare all variables at the start,
and in a long function it would be better to declare variables just before using
them. In fact it’s not even necessary to declare variables at all, but an
undeclared variable is “global”. That’s not desirable for any of the variables
that are declared in line 3, because all of them are for use only within the function.
LINE 5: WHY “PAIRS()”. Our job is to go through all the rows and there are two
ways to do it: with box.space.space_object:pairs() or with
variable = select(...) followed by for i = 1, n, 1 do some-function(variable[i]) end.
We preferred pairs() for this example.
LINE 5: START THE MAIN LOOP. Everything inside this “for” loop will be
repeated as long as there is another index key. A tuple is fetched and can be
referenced with variable t.
LINE 6: WHY “PCALL”. If we simply said lua_table = json.decode(t[2]), then
the function would abort with an error if it encountered something wrong with the
JSON string - a missing colon, for example. By putting the function inside “pcall”
(protected call), we’re saying: we want to intercept that sort of error, so if
there’s a problem just set is_valid_json = false and we will know what to do
about it later.
LINE 6: MEANING. The function is json.decode which means decode a JSON
string, and the parameter is t[2] which is a reference to a JSON string. There’s
a bit of hard coding here, we’re assuming that the second field in the tuple is
where the JSON string was inserted. For example, we’re assuming a tuple looks like
field[1]: 444
field[2]: '{"Hello": "world", "Quantity": 15}'
meaning that the tuple’s first field, the primary key field, is a number while
the tuple’s second field, the JSON string, is a string. Thus the entire statement
means “decode t[2] (the tuple’s second field) as a JSON string; if there’s an
error set is_valid_json = false; if there’s no error set is_valid_json = true and
set lua_table = a Lua table which has the decoded string”.
LINE 8. At last we are ready to get the JSON field value from the Lua table that
came from the JSON string. The value in field_name, which is the parameter for the
whole function, must be a name of a JSON field. For example, inside the JSON string
'{"Hello": "world", "Quantity": 15}', there are two JSON fields: “Hello” and
“Quantity”. If the whole function is invoked with sum_json_field("Quantity"),
then field_value = lua_table[field_name] is effectively the same as
field_value = lua_table["Quantity"] or even field_value = lua_table.Quantity.
Those are just three different ways of saying: for the Quantity field in the Lua table,
get the value and put it in variable field_value.
LINE 9: WHY “IF”. Suppose that the JSON string is well formed but the JSON field
is not a number, or is missing. In that case, the function would abort with an
error when it tried to add the value to the sum. By first checking
type(field_value) == "number", we avoid that error. Anyone who knows that
the database is in perfect shape can skip this kind of thing.
And the function is complete. Time to test it. Starting with an empty database,
defined the same way as the sandbox database in our
“Getting started” exercises,
-- if tester is left over from some previous test, destroy it
if box.space.tester ~= nil then box.space.tester:drop() end
box.schema.space.create('tester')
box.space.tester:create_index('primary', {parts = {1, 'unsigned'}})
then add some tuples where the first field is a number and the second
field is a string.
box.space.tester:insert{444, '{"Item": "widget", "Quantity": 15}'}
box.space.tester:insert{445, '{"Item": "widget", "Quantity": 7}'}
box.space.tester:insert{446, '{"Item": "golf club", "Quantity": "sunshine"}'}
box.space.tester:insert{447, '{"Item": "waffle iron", "Quantit": 3}'}
Since this is a test, there are deliberate errors. The “golf club” and the
“waffle iron” do not have numeric Quantity fields, so must be ignored.
Therefore the real sum of the Quantity field in the JSON strings should be:
15 + 7 = 22.
Invoking the function with sum_json_field("Quantity") returns 22.
It works. We’ll just leave, as exercises for future improvement, the possibility
that the “hard coding” assumptions could be removed, that there might have to be
an overflow check if some field values are huge, and that the function should
contain a yield instruction if the count of tuples is huge.
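The same protect-then-check pattern can be sketched outside Tarantool. In this illustrative plain-Lua version, sum_field takes a list of encoded strings and a decode function as parameters, instead of reading box.space.tester and calling json.decode. It assumes that decode raises an error on malformed input, as json.decode does. The toy decoder understands only strings of the form name=value and is purely hypothetical.

```lua
-- Sum a numeric field over encoded records, skipping records that fail to
-- decode or whose field is missing or not a number.
local function sum_field(records, field_name, decode)
    local sum = 0
    for _, record in ipairs(records) do
        local ok, decoded = pcall(decode, record)  -- intercept decode errors
        if ok and type(decoded) == "table" then
            local value = decoded[field_name]
            if type(value) == "number" then
                sum = sum + value
            end
        end
    end
    return sum
end

-- Hypothetical toy decoder: "Quantity=15" -> { Quantity = 15 }.
-- A non-numeric value is kept as a string; anything else raises an error.
local function toy_decode(s)
    local name, value = s:match("^(%a+)=(%w+)$")
    if not name then
        error("malformed record: " .. s)
    end
    return { [name] = tonumber(value) or value }
end

local records = { "Quantity=15", "Quantity=7", "Quantity=sunshine",
                  "Quantit=3", "Quantity:" }
print(sum_field(records, "Quantity", toy_decode))  -- 22
```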
Here is a generic function which takes a field identifier
and a search pattern, and returns all tuples that match.
* The field must be the first field of a TREE index.
* The function will use Lua pattern matching,
which allows “magic characters” similar to those in regular expressions.
* The initial characters in the pattern, as far as the
first magic character, will be used as an index search key.
For each tuple that is found via the index, there will be
a match of the whole pattern.
* To be cooperative,
the function should yield after every
10 tuples, unless there is a reason to delay yielding.
With this function, we can take advantage of Tarantool’s indexes
for speed, and take advantage of Lua’s pattern matching for flexibility.
It does everything that an SQL
LIKE search can do, and far more.
Read the following Lua code to see how it works.
The comments that begin with “SEE NOTE …” refer to long
explanations that follow the code.
function indexed_pattern_search(space_name, field_no, pattern)
    -- SEE NOTE #1 "FIND AN APPROPRIATE INDEX"
    if (box.space[space_name] == nil) then
        print("Error: Failed to find the specified space")
        return nil
    end
    local index_no = -1
    for i = 0, box.schema.INDEX_MAX, 1 do
        if (box.space[space_name].index[i] == nil) then break end
        if (box.space[space_name].index[i].type == "TREE"
                and box.space[space_name].index[i].parts[1].fieldno == field_no
                and (box.space[space_name].index[i].parts[1].type == "scalar"
                or box.space[space_name].index[i].parts[1].type == "string")) then
            index_no = i
            break
        end
    end
    if (index_no == -1) then
        print("Error: Failed to find an appropriate index")
        return nil
    end
    -- SEE NOTE #2 "DERIVE INDEX SEARCH KEY FROM PATTERN"
    local index_search_key = ""
    local index_search_key_length = 0
    local last_character = ""
    local c = ""
    local c2 = ""
    for i = 1, string.len(pattern), 1 do
        c = string.sub(pattern, i, i)
        if (last_character ~= "%") then
            if (c == '^' or c == "$" or c == "(" or c == ")" or c == "."
                    or c == "[" or c == "]" or c == "*" or c == "+"
                    or c == "-" or c == "?") then
                break
            end
            if (c == "%") then
                c2 = string.sub(pattern, i + 1, i + 1)
                if (string.match(c2, "%p") == nil) then break end
                index_search_key = index_search_key .. c2
            else
                index_search_key = index_search_key .. c
            end
        end
        last_character = c
    end
    index_search_key_length = string.len(index_search_key)
    if (index_search_key_length < 3) then
        print("Error: index search key " .. index_search_key .. " is too short")
        return nil
    end
    -- SEE NOTE #3 "OUTER LOOP: INITIATE"
    local result_set = {}
    local number_of_tuples_in_result_set = 0
    local previous_tuple_field = ""
    while true do
        local number_of_tuples_since_last_yield = 0
        local is_time_for_a_yield = false
        -- SEE NOTE #4 "INNER LOOP: ITERATOR"
        for _, tuple in box.space[space_name].index[index_no]:
                pairs(index_search_key, {iterator = box.index.GE}) do
            -- SEE NOTE #5 "INNER LOOP: BREAK IF INDEX KEY IS TOO GREAT"
            if (string.sub(tuple[field_no], 1, index_search_key_length)
                    > index_search_key) then
                break
            end
            -- SEE NOTE #6 "INNER LOOP: BREAK AFTER EVERY 10 TUPLES -- MAYBE"
            number_of_tuples_since_last_yield = number_of_tuples_since_last_yield + 1
            if (number_of_tuples_since_last_yield >= 10
                    and tuple[field_no] ~= previous_tuple_field) then
                index_search_key = tuple[field_no]
                is_time_for_a_yield = true
                break
            end
            previous_tuple_field = tuple[field_no]
            -- SEE NOTE #7 "INNER LOOP: ADD TO RESULT SET IF PATTERN MATCHES"
            if (string.match(tuple[field_no], pattern) ~= nil) then
                number_of_tuples_in_result_set = number_of_tuples_in_result_set + 1
                result_set[number_of_tuples_in_result_set] = tuple
            end
        end
        -- SEE NOTE #8 "OUTER LOOP: BREAK, OR YIELD AND CONTINUE"
        if (is_time_for_a_yield ~= true) then
            break
        end
        require('fiber').yield()
    end
    return result_set
end
NOTE #1 “FIND AN APPROPRIATE INDEX”
The caller has passed space_name (a string) and field_no (a number).
The requirements are:
(a) index type must be “TREE” because for other index types
(HASH, BITSET, RTREE) a search with iterator=GE
will not return strings in order by string value;
(b) field_no must be the first index part;
(c) the field must contain strings, because for other data types
(such as “unsigned”) pattern searches are not possible (the code
also accepts a “scalar” field, which can hold strings).
If no index meets these requirements,
print an error message and return nil.
NOTE #2 “DERIVE INDEX SEARCH KEY FROM PATTERN”
The caller has passed pattern (a string).
The index search key will be
the characters in the pattern as far as the first magic character.
Lua’s magic characters are % ^ $ ( ) . [ ] * + - ?.
For example, if the pattern is “ABC.E”, the period is a magic
character and therefore the index search key will be “ABC”.
But there is a complication: if we see “%” followed by a punctuation
character, that punctuation character is “escaped”, so
we remove the “%” when making the index search key. For example, if the
pattern is “AB%$E”, the dollar sign is escaped and therefore
the index search key will be “AB$E”.
Finally, there is a check that the index search key length
must be at least three. This is an arbitrary number, and in
fact zero would be okay, but short index search keys will cause
long search times.
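The key-derivation rules in NOTE #2 can be tried in plain Lua, outside Tarantool. The following is a minimal sketch; the function name derive_search_key is hypothetical. It walks the pattern by position, so an escape such as “%%” consumes exactly two characters.

```lua
-- Lua magic characters other than "%", which is handled separately
local MAGIC = "^$().[]*+-?"

-- Hypothetical helper: copy pattern characters up to the first
-- unescaped magic character; "%" + punctuation yields the punctuation.
local function derive_search_key(pattern)
  local key = {}
  local i = 1
  while i <= #pattern do
    local c = pattern:sub(i, i)
    if c == "%" then
      local c2 = pattern:sub(i + 1, i + 1)
      -- "%a", "%d", ... are character classes, not escapes: stop here
      if c2 == "" or not c2:match("%p") then break end
      key[#key + 1] = c2
      i = i + 2
    elseif MAGIC:find(c, 1, true) then
      break
    else
      key[#key + 1] = c
      i = i + 1
    end
  end
  return table.concat(key)
end

print(derive_search_key("ABC.E"))   -- ABC
print(derive_search_key("AB%$E"))   -- AB$E
print(derive_search_key("AB%%CD"))  -- AB%CD
```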
NOTE #3 “OUTER LOOP: INITIATE”
The function’s job is to return a result set,
just as box.space...select() would. We will fill
it within an outer loop that contains an inner
loop. The outer loop’s job is to execute the inner
loop, and possibly yield, until the search ends.
The inner loop’s job is to find tuples via the index, and put
them in the result set if they match the pattern.
NOTE #4 “INNER LOOP: ITERATOR”
The for loop here uses pairs(); see the
explanation of what index iterators are.
Within the inner loop,
there will be a local variable named “tuple” which contains
the latest tuple found via the index search key.
NOTE #5 “INNER LOOP: BREAK IF INDEX KEY IS TOO GREAT”
The iterator is GE (Greater or Equal), and we must be
more specific: if the index search key has N characters,
then the leftmost N characters of the result’s index field
must not be greater than the index search key. For example,
if the index search key is ‘ABC’, then ‘ABCDE’ is
a potential match, but ‘ABD’ is a signal that
no more matches are possible.
NOTE #6 “INNER LOOP: BREAK AFTER EVERY 10 TUPLES – MAYBE”
This chunk of code is for cooperative multitasking.
The number 10 is arbitrary, and usually a larger number would be okay.
The simple rule would be “after checking 10 tuples, yield,
and then resume the search (that is, do the inner loop again)
starting after the last value that was found”. However, if
the index is non-unique or if there is more than one field
in the index, then we might have duplicates (for example
{“ABC”, 1}, {“ABC”, 2}, {“ABC”, 3}), and it would be difficult
to decide which “ABC” tuple to resume with. Therefore, if
the result’s index field is the same as the previous
result’s index field, there is no break.
NOTE #7 “INNER LOOP: ADD TO RESULT SET IF PATTERN MATCHES”
Compare the result’s index field to the entire pattern.
For example, suppose that the caller passed pattern “ABC.E”
and there is an indexed field containing “ABCDE”.
The initial index search key is therefore “ABC”, and the iterator
will find a tuple whose indexed field contains “ABCDE”,
because “ABCDE” > “ABC”.
For that tuple, string.match will return a value which is not nil.
Therefore this tuple can be added to the result set.
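The interplay of NOTE #5 (stop early) and NOTE #7 (match the full pattern) can be simulated without Tarantool. In this sketch, which is an illustration rather than the Tarantool implementation, a sorted Lua array stands in for the TREE index and a binary search stands in for pairs(key, {iterator = 'GE'}); the helper names ge_position and prefix_scan are hypothetical. The values resemble those of the EXAMPLE, plus “ABD” and “B” to trigger the early stop.

```lua
-- Position of the first element >= key (where a GE iterator starts)
local function ge_position(sorted, key)
  local lo, hi = 1, #sorted + 1
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    if sorted[mid] < key then lo = mid + 1 else hi = mid end
  end
  return lo
end

local function prefix_scan(sorted, key, pattern)
  local n = #key
  local result = {}
  for i = ge_position(sorted, key), #sorted do
    local value = sorted[i]
    -- NOTE #5: the first n characters exceed the key, so nothing later can match
    if value:sub(1, n) > key then break end
    -- NOTE #7: keep only values that match the whole pattern
    if value:match(pattern) ~= nil then result[#result + 1] = value end
  end
  return result
end

local values = { 'B', 'ABCDF', 'A', 'ABCDE', 'ABD', 'ABCDEF', 'AB', 'ABCD', 'ABC', 'ABCDE' }
table.sort(values)
local found = prefix_scan(values, 'ABC', 'ABC.E.')
print(#found, found[1])  -- prints 1 and ABCDEF
```

The scan starts at “ABC”, visits the values up to “ABCDF”, and stops at “ABD” without touching “B”.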
NOTE #8 “OUTER LOOP: BREAK, OR YIELD AND CONTINUE”
There are three conditions which will cause a break from
the inner loop: (1) the for loop ends naturally because
there are no more index keys which are greater than or
equal to the index search key, (2) the index key is too
great as described in NOTE #5, (3) it is time for a yield
as described in NOTE #6. If condition (1) or condition (2)
is true, then there is nothing more to do, the outer loop
ends too. If and only if condition (3) is true, the
outer loop must yield and then continue. If it does
continue, then the inner loop – the iterator search –
will happen again with a new value for the index search key.
EXAMPLE:
Start Tarantool, copy and paste the code for function indexed_pattern_search(),
and try the following:
if box.space.t ~= nil then box.space.t:drop() end
box.schema.space.create('t')
box.space.t:create_index('primary',{})
box.space.t:create_index('secondary',{unique=false,parts={2,'string',3,'string'}})
box.space.t:insert{1,'A','a'}
box.space.t:insert{2,'AB',''}
box.space.t:insert{3,'ABC','a'}
box.space.t:insert{4,'ABCD',''}
box.space.t:insert{5,'ABCDE','a'}
box.space.t:insert{6,'ABCDE',''}
box.space.t:insert{7,'ABCDEF','a'}
box.space.t:insert{8,'ABCDF',''}
indexed_pattern_search("t", 2, "ABC.E.")
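The result will be a table containing one tuple, [7, 'ABCDEF', 'a']. The search starts at the index search key “ABC”; of the values it visits (“ABC”, “ABCD”, “ABCDE”, “ABCDEF”, “ABCDF”), only “ABCDEF” matches the full pattern “ABC.E.”.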
Tips on Lua syntax
The Lua syntax for data-manipulation functions
can vary. Here are examples of the variations with select() requests.
The same rules exist for the other data-manipulation functions.
Every one of the examples does the same thing:
select a tuple set from a space named ‘tester’ where the primary-key field value
equals 1. For these examples, we assume that the numeric id of ‘tester’
is 512, which happens to be the case in our sandbox example only.
Object reference variations
First, there are three object reference variations:
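The list of forms is missing from this copy of the page. The sketch below reconstructs the three forms from the descriptions that follow: #1 the dotted reference, #2 the name as a literal in square brackets, #3 a variable holding the object. To make it run in plain Lua, box.space.tester is a hypothetical stand-in table whose select() returns the rows whose first field equals the key; in Tarantool you would use the real space (and not redefine box).

```lua
-- Hypothetical plain-Lua stand-in for a Tarantool space named 'tester'
local box = { space = {} }
box.space.tester = {
  rows = { {1, 'a'}, {2, 'b'} },
  -- select{key}: return every row whose first field equals key[1]
  select = function(self, key)
    local result = {}
    for _, row in ipairs(self.rows) do
      if row[1] == key[1] then result[#result + 1] = row end
    end
    return result
  end,
}

-- #1 module . submodule . name
local r1 = box.space.tester:select{1}
-- #2 name replaced by a literal in square brackets
local r2 = box.space['tester']:select{1}
-- #3 a variable for the entire object reference
local s = box.space.tester
local r3 = s:select{1}
```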
Examples in this manual usually have the “box.space.tester:”
form (#1). However, this is a matter of user preference and all the variations
exist in the wild.
Also, descriptions in this manual use the syntax “space_object:”
for references to objects which are spaces, and
“index_object:” for references to objects which are indexes (for example
box.space.tester.index.primary:).
Then, there are seven parameter variations:
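This copy of the page has also lost the seven parameter variations. The sketch below shows one plausible set that agrees with the surrounding text (braces versus parentheses, and variables in #6 and #7); #3 to #5 rely on the facts that select() accepts a scalar key and an options table such as {iterator = 'EQ'}, and that obj:method(x) is shorthand for obj.method(obj, x). The original numbering may differ. Here, box.space.tester is a hypothetical plain-Lua stand-in, not the real space.

```lua
-- Hypothetical stand-in for box.space.tester; only the EQ iterator is modelled
local box = { space = {} }
box.space.tester = {
  rows = { {1, 'a'}, {2, 'b'} },
  select = function(self, key, opts)
    if type(key) ~= 'table' then key = { key } end  -- accept a scalar key
    assert(opts == nil or opts.iterator == nil or opts.iterator == 'EQ')
    local result = {}
    for _, row in ipairs(self.rows) do
      if row[1] == key[1] then result[#result + 1] = row end
    end
    return result
  end,
}

local results = {}
results[1] = box.space.tester:select{1}                        -- #1 braces only
results[2] = box.space.tester:select({1})                      -- #2 parentheses and braces
results[3] = box.space.tester:select(1)                        -- #3 scalar key
results[4] = box.space.tester.select(box.space.tester, 1)      -- #4 explicit self
results[5] = box.space.tester:select({1}, {iterator = 'EQ'})   -- #5 with options
local variable = 1
results[6] = box.space.tester:select{variable}                 -- #6 variable in braces
variable = {1}
results[7] = box.space.tester:select(variable)                 -- #7 variable holds the table
```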
Lua allows you to omit the parentheses () when invoking a function if its only
argument is a Lua table (or a string literal), and we sometimes do so in our examples.
This is why select{1} is equivalent to select({1}).
Literal values such as 1 (a scalar value) or {1} (a Lua table value)
may be replaced by variable names, as in examples #6 and #7.
Although there are special cases where braces can be omitted, they are
preferable because they signal “Lua table”.
Examples and descriptions in this manual have the {1} form. However, this
too is a matter of user preference and all the variations exist in the wild.
Database objects have loose rules for names:
the maximum length is 65000 bytes (not characters),
and almost any legal Unicode character is allowed,
including spaces, ideograms and punctuation.
In those cases, to prevent confusion with Lua operators and
separators, object references should have the literal-in-square-brackets
form (#2), or the variable form (#3). For example:
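The example that belonged here is missing. As an illustration, assume a space named '1*A', a hypothetical name that is legal for a database object but is not a Lua identifier. The sketch uses a plain-Lua stand-in for box.space and shows that the bracket and variable forms work, while the dotted form does not even compile.

```lua
-- Hypothetical stand-in for box.space with a space named '1*A'
-- (in a Tarantool console, box already exists; do not redefine it there)
local box = { space = { ['1*A'] = { name = '1*A' } } }

-- Literal-in-square-brackets form (#2)
local by_literal = box.space['1*A']

-- Variable form (#3)
local space_name = '1*A'
local by_variable = box.space[space_name]

-- Dotted form (#1): Lua rejects it at compile time
local compile = loadstring or load
local chunk, err = compile('return box.space.1*A')
print(by_literal.name, by_variable == by_literal, chunk, err)
```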
Disallowed:
- characters which are unassigned code points,
- line and paragraph separators,
- control characters,
- the replacement character (U+FFFD).
Not recommended: characters which cannot be displayed.
Names are “case sensitive”, so ‘A’ and ‘a’ are not the same.
Enterprise modules
This section covers open and closed source Lua modules for Tarantool Enterprise Edition
included in the distribution as an offline rocks repository.
- avro-schema
is an assembly of Apache Avro
schema tools;
- checks
is a type checker of function arguments. This library declares
a checks() function and a checkers table that allow you to check the
parameters passed to a Lua function in a fast and unobtrusive way.
- http is an
on-board HTTP server, which comes in addition to Tarantool’s out-of-the-box
HTTP client, and must be installed as described in the
installation section.
- icu-date
is a date-and-time formatting library for Tarantool
based on International Components for Unicode;
- kafka
is a full-featured high-performance
Kafka library for Tarantool
based on librdkafka;
- luacheck is a static analyzer and
linter for Lua, preconfigured for Tarantool.
- luarapidxml
is a fast XML parser.
- luatest is
a Tarantool test framework written in Lua.
- membership
builds a mesh from multiple Tarantool instances based on a gossip protocol.
The mesh monitors itself, helps members discover everyone else in the group
and get notified about their status changes with low latency. It is built
upon the ideas from Consul or, more precisely, the SWIM algorithm.
- metrics is a collection
of useful monitoring metrics.
- tracing
is a module for debugging performance issues.
- vshard
is an automatic sharding system that enables horizontal scaling for Tarantool
DBMS instances.
- ldap allows you to authenticate against an LDAP server and perform searches.
- odbc is an ODBC connector for Tarantool based on unixODBC.
- oracle
is an Oracle connector for Lua applications through which they can send and
receive data to and from Oracle databases.
The advantage of the Tarantool-Oracle integration is that anyone can handle all
the tasks with Oracle DBMSs (control, manipulation, storage, access) with the
same high-level language (Lua) and with minimal delay.
- task
is a module for managing background tasks in a Tarantool cluster.
Installing and using modules
To use a module, install the following:
- All the necessary third-party software packages (if any). See the
module’s prerequisites for the list.
- The module itself on every Tarantool instance:
$ tt rocks install MODULE_NAME [MODULE_VERSION]
See the tt rocks reference to learn more about
managing Lua modules.
Creating an application
Below, we walk you through key programming practices that will give you a good
start in writing Lua applications for Tarantool. We will implement a real microservice
based on Tarantool! It is a backend for a simplified version of
Pokémon Go, a location-based
augmented reality game launched in mid-2016.
In this game, players use the GPS capability of a mobile device to locate, catch,
battle, and train virtual monsters called “pokémon” that appear on the screen as
if they were in the same real-world location as the player.
To stay within the walk-through format, let’s narrow the original gameplay as
follows. We have a map with pokémon spawn locations. Next, we have multiple
players who can send catch-a-pokémon requests to the server (which runs our
Tarantool microservice). The server responds whether the
pokémon is caught or not, increases the player’s pokémon counter if yes,
and triggers the respawn-a-pokémon method that spawns a new pokémon at the same
location after a while.
We leave client-side applications outside the scope of this story. However, we
promise a mini-demo in the end to simulate real users and give us some fun.

Follow these topics to implement our application:
Modules, rocks and applications
To make our game logic available to other developers and Lua applications, let’s
put it into a Lua module.
A module (called “rock” in Lua) is an optional library which enhances
Tarantool functionality. So, we can install our logic as a module in Tarantool
and use it from any Tarantool application or module. Like applications, modules
in Tarantool can be written in Lua (rocks), C or C++.
Modules are good for two things:
- easier code management (reuse, packaging, versioning), and
- hot code reload without restarting the Tarantool instance.
Technically, a module is a file with source code that exports its functions in
an API. For example, here is a Lua module named mymodule.lua that exports
one function named myfun:
local exports = {}
exports.myfun = function(input_string)
print('Hello', input_string)
end
return exports
To call myfun() from another module, from a Lua application,
or from Tarantool itself, we need to save this module as a file, load
it with the require() directive, and call the exported function.
For example, here’s a Lua application that uses the myfun() function from
the mymodule.lua module:
-- loading the module
local mymodule = require('mymodule')
-- calling myfun() from within test() function
local test = function()
mymodule.myfun('world')
end
-- prints "Hello" and "world", separated by a tab
test()
A thing to remember here is that the require() directive takes load paths
to Lua modules from the package.path variable. This is a semicolon-separated
string, where a question mark is used to interpolate the module name. By default,
this variable contains system-wide Lua paths and the working directory.
But if we put our modules inside a specific folder (e.g. scripts/), we need
to add this folder to package.path before any calls to require():
package.path = 'scripts/?.lua;' .. package.path
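To see how require() expands the “?” in package.path, here is a minimal plain-Lua sketch. Rather than creating a scripts/ folder, it writes a variant of mymodule.lua (returning the greeting instead of printing it) to a temporary file from os.tmpname(), then prepends that file’s directory as a template. Because the temporary file has no .lua extension, the template is “dir/?” rather than “dir/?.lua”. Temporary file naming is platform-dependent, so treat this as a sketch for Unix-like systems.

```lua
-- Write a variant of mymodule.lua to a temporary file
-- (on Linux the path looks like /tmp/lua_XXXXXX)
local tmp = os.tmpname()
local dir, base = tmp:match('^(.*)/([^/]+)$')
local f = assert(io.open(tmp, 'w'))
f:write([[
local exports = {}
exports.myfun = function(input_string)
  return 'Hello ' .. input_string
end
return exports
]])
f:close()

-- require() replaces '?' with the module name, so require(base)
-- tries dir .. '/' .. base, which is exactly our temporary file
package.path = dir .. '/?;' .. package.path
local mymodule = require(base)
print(mymodule.myfun('world'))  -- Hello world
os.remove(tmp)
```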
For our microservice, a simple and convenient solution would be to put all
methods in a Lua module (say pokemon.lua) and to write a Lua application
(say game.lua) that initializes the gaming environment and starts the game
loop.

Now let’s get down to implementation details. In our game, we need three entities:
- map, which is an array of pokémons with coordinates of respawn locations;
in this version of the game, let a location be a rectangle identified with two
points, upper-left and lower-right;
- player, which has an ID, a name, and coordinates of the player’s location
point;
- pokémon, which has the same fields as the player, plus a status
(active/inactive, that is present on the map or not) and a catch probability
(well, let’s give our pokémons a chance to escape :-) )
We’ll store these entities as tuples in Tarantool spaces. But to deliver our
backend application as a microservice, the good practice would be to send/receive
our data in the universal JSON format, thus using Tarantool as a document storage.
Avro schemas
To store JSON data as tuples, we will apply a savvy practice which reduces data
footprint and ensures all stored documents are valid. We will use Tarantool
module avro-schema which checks
the schema of a JSON document and converts it to a Tarantool tuple. The tuple
will contain only field values, and thus take a lot less space than the original
document. In avro-schema terms, converting JSON documents to tuples is
“flattening”, and restoring the original documents is “unflattening”.
First you need to install
the module with tt rocks install avro-schema.
Further usage is quite straightforward:
- For each entity, we need to define a schema in
Apache Avro schema syntax,
where we list the entity’s fields with their names and
Avro data types.
- At initialization, we call
avro_schema.create(), which creates objects
in memory for all schema entities, and avro_schema.compile(), which generates
flatten/unflatten methods for each entity.
- Further on, we just call flatten/unflatten methods for a respective entity
on receiving/sending the entity’s data.
Here’s what our schema definitions for the player and pokémon entities look like:
local schema = {
player = {
type="record",
name="player_schema",
fields={
{name="id", type="long"},
{name="name", type="string"},
{
name="location",
type= {
type="record",
name="player_location",
fields={
{name="x", type="double"},
{name="y", type="double"}
}
}
}
}
},
pokemon = {
type="record",
name="pokemon_schema",
fields={
{name="id", type="long"},
{name="status", type="string"},
{name="name", type="string"},
{name="chance", type="double"},
{
name="location",
type= {
type="record",
name="pokemon_location",
fields={
{name="x", type="double"},
{name="y", type="double"}
}
}
}
}
}
}
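To make “flattening” concrete, here is a hand-written sketch of what it does to a pokemon document under the schema above. It is not the avro-schema API (the real methods are generated by compile()), and it assumes that fields are laid out in schema order with the nested location record expanded in place. This assumption is consistent with the status being the second tuple field later in this walk-through.

```lua
-- Hypothetical hand-written equivalents of the generated methods
local function flatten_pokemon(doc)
  -- schema order: id, status, name, chance, location.x, location.y
  return { doc.id, doc.status, doc.name, doc.chance,
           doc.location.x, doc.location.y }
end

local function unflatten_pokemon(t)
  return { id = t[1], status = t[2], name = t[3], chance = t[4],
           location = { x = t[5], y = t[6] } }
end

local doc = { id = 1, status = 'active', name = 'Pikachu', chance = 99.1,
              location = { x = 1, y = 2 } }
local tuple = flatten_pokemon(doc)      -- { 1, 'active', 'Pikachu', 99.1, 1, 2 }
local restored = unflatten_pokemon(tuple)
```

The tuple keeps only values, so field names are stored once in the schema rather than in every document.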
And here’s how we create and compile our entities at initialization:
-- load the avro-schema and log modules with require()
local avro = require('avro_schema')
local log = require('log')
-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
-- compile models
local ok_cm, compiled_pokemon = avro.compile(pokemon)
local ok_cp, compiled_player = avro.compile(player)
if ok_cm and ok_cp then
-- start the game
<...>
else
log.error('Schema compilation failed')
end
else
log.error('Schema creation failed')
end
return false
As for the map entity, it would be overkill to introduce a schema for it,
because we have only one map in the game, it has very few fields, and (most
importantly) we use the map only inside our logic, never exposing it
to external users.

Next, we need methods to implement the game logic. To simulate object-oriented
programming in our Lua code, let’s store all Lua functions and shared variables
in a single local variable (let’s name it game). This will allow us to
address functions or variables from within our module as self.func_name or
self.var_name. Like this:
local game = {
-- a local variable
num_players = 0,
-- a method that prints a local variable
hello = function(self)
print('Hello! Your player number is ' .. self.num_players .. '.')
end,
-- a method that calls another method and returns a local variable
sign_in = function(self)
self.num_players = self.num_players + 1
self:hello()
return self.num_players
end
}
In OOP terms, we can now regard local variables inside game as object fields,
and local functions as object methods.
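Here is the game object again with two calls added, as a sketch that runs in plain Lua. It shows that game:sign_in() is shorthand for game.sign_in(game): the colon passes the object as self, which is how the methods reach the shared num_players field.

```lua
local game = {
  -- a local variable
  num_players = 0,
  -- a method that prints a local variable
  hello = function(self)
    print('Hello! Your player number is ' .. self.num_players .. '.')
  end,
  -- a method that calls another method and returns a local variable
  sign_in = function(self)
    self.num_players = self.num_players + 1
    self:hello()
    return self.num_players
  end
}

local first = game:sign_in()       -- prints: Hello! Your player number is 1.
local second = game.sign_in(game)  -- same call with explicit self; player number 2
```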
Note
In this manual, Lua examples use local variables. Use global
variables with caution, since the module’s users may be unaware of them.
To enable/disable the use of undeclared global variables in your Lua code,
use Tarantool’s strict module.
So, our game module will have the following methods:
- catch() to calculate whether the pokémon was caught (besides the
coordinates of both the player and pokémon, this method will apply
a probability factor, so not every pokémon within the player’s reach
will be caught);
- respawn() to add missing pokémons to the map, say, every 60 seconds
(we assume that a frightened pokémon runs away, so we remove a pokémon from
the map on any catch attempt and add it back to the map after a while);
- notify() to log information about caught pokémons (like
“Player 1 caught pokémon A”);
- start() to initialize the game (it will create database spaces, create
and compile avro schemas, and launch respawn()).
It is also convenient to have methods for working with Tarantool
storage. For example:
- add_pokemon() to add a pokémon to the database, and
- map() to populate the map with all pokémons stored in Tarantool.
We’ll need these two methods primarily when initializing our game, but we can
also call them later, for example to test our code.
Bootstrapping a database
Let’s discuss game initialization. In start() method, we need to populate
Tarantool spaces with pokémon data. Why not keep all game data in memory?
Why use a database? The answer is: persistence.
Without a database, we risk losing data on power outage, for example.
But if we store our data in an in-memory database, Tarantool takes care to
persist it on disk whenever it’s changed. This gives us one more benefit:
quick startup in case of failure.
Tarantool has a smart algorithm that quickly
loads all data from disk into memory on startup, so the warm-up takes little time.
We’ll be using functions from the Tarantool built-in box module:
- box.schema.create_space('pokemons') to create a space named pokemons for
storing information about pokémons (we don’t create a similar space for players,
because we intend to only send/receive player information via API calls, so we
needn’t store it);
- box.space.pokemons:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
to create a primary HASH index by pokémon ID;
- box.space.pokemons:create_index('status', {type = 'tree', parts = {2, 'str'}})
to create a secondary TREE index by pokémon status.
Notice the parts = argument in the index specification. The pokémon ID is
the first field in a Tarantool tuple since it’s the first member of the respective
Avro type. Likewise, the pokémon status is the second field, since it’s the second
member. The actual JSON document may have ID or status fields at any position of the JSON map.
The implementation of start() method looks like this:
-- create game object
start = function(self)
-- create spaces and indexes
box.once('init', function()
box.schema.create_space('pokemons')
box.space.pokemons:create_index(
"primary", {type = 'hash', parts = {1, 'unsigned'}}
)
box.space.pokemons:create_index(
"status", {type = "tree", parts = {2, 'str'}}
)
end)
-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
-- compile models
local ok_cm, compiled_pokemon = avro.compile(pokemon)
local ok_cp, compiled_player = avro.compile(player)
if ok_cm and ok_cp then
-- start the game
<...>
else
log.error('Schema compilation failed')
end
else
log.error('Schema creation failed')
end
return false
end
GIS
Now let’s discuss catch(), which is the main method in our gaming logic.
Here we receive the player’s coordinates and the target pokémon’s ID number,
and we need to answer whether the player has actually caught the pokémon or not
(remember that each pokémon has a chance to escape).
First thing, we validate the received player data against its
Avro schema. And we check whether such a pokémon
exists in our database and is displayed on the map (the pokémon must have the
active status):
catch = function(self, pokemon_id, player)
-- check player data
local ok, tuple = self.player_model.flatten(player)
if not ok then
return false
end
-- get pokemon data
local p_tuple = box.space.pokemons:get(pokemon_id)
if p_tuple == nil then
return false
end
local ok, pokemon = self.pokemon_model.unflatten(p_tuple)
if not ok then
return false
end
if pokemon.status ~= self.state.ACTIVE then
return false
end
-- more catch logic to follow
<...>
end
Next, we calculate the answer: caught or not.
To work with geographical coordinates, we use Tarantool
gis module.
To keep things simple, we don’t load any specific map, assuming that we deal with
a world map. And we do not validate incoming coordinates, assuming again that all
received locations are within the planet Earth.
We use two geo-specific variables:
- wgs84, which stands for the latest revision of the World Geodetic System
standard, WGS84.
Basically, it comprises a standard coordinate system for the Earth and
represents the Earth as an ellipsoid.
- nationalmap, which stands for the
US National Atlas Equal Area. This is a projected
coordinate system based on WGS84. It gives us a zero base for location
projection and allows positioning our players and pokémons in meters.
Both these systems are listed in the EPSG Geodetic Parameter Registry, where each
system has a unique number. In our code, we assign these listing numbers to
respective variables:
wgs84 = 4326,
nationalmap = 2163,
For our game logic, we need one more variable, catch_distance, which defines
how close a player must get to a pokémon before trying to catch it. Let’s set
the distance to 100 meters.
Now we’re ready to calculate the answer. We need to project the current location
of both player (p_pos) and pokémon (m_pos) on the map, check whether the
player is close enough to the pokémon (using catch_distance), and calculate
whether the player has caught the pokémon (here we generate some random value and
let the pokémon escape if the random value happens to be less than 100 minus
pokémon’s chance value):
-- project locations
local m_pos = gis.Point(
{pokemon.location.x, pokemon.location.y}, self.wgs84
):transform(self.nationalmap)
local p_pos = gis.Point(
{player.location.x, player.location.y}, self.wgs84
):transform(self.nationalmap)
-- check catch distance condition
if p_pos:distance(m_pos) > self.catch_distance then
return false
end
-- try to catch pokemon
local caught = math.random(100) >= 100 - pokemon.chance
if caught then
-- update and notify on success
box.space.pokemons:update(
pokemon_id, {{'=', self.STATUS, self.state.CAUGHT}}
)
self:notify(player, pokemon)
end
return caught
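Because math.random(100) returns an integer from 1 to 100 with equal probability, the catch rule above can be turned into an exact probability by counting the winning outcomes. The helper catch_probability below is hypothetical: it only analyzes the formula and is not part of pokemon.lua.

```lua
-- Exact probability that math.random(100) >= 100 - chance
local function catch_probability(chance)
  local wins = 0
  for r = 1, 100 do                       -- every possible random value
    if r >= 100 - chance then wins = wins + 1 end
  end
  return wins / 100
end

print(catch_probability(99.1))  -- Pikachu's chance: caught on every attempt
print(catch_probability(50))    -- 51 of 100 outcomes win
print(catch_probability(0))     -- only r = 100 wins
```

With this formula, a pokémon with chance 99.1 (like Pikachu in the demo) is always caught, a chance of 50 gives 51%, and even a chance of 0 leaves a 1% catch rate, because r = 100 always satisfies r >= 100.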
Index iterators
In our gameplay, all caught pokémons are returned to the map. We do this
for all pokémons on the map every 60 seconds using the respawn() method.
We iterate through pokémons by status using the Tarantool index iterator function
index_object:pairs() and reset the statuses of all
“caught” pokémons back to “active” using box.space.pokemons:update().
respawn = function(self)
fiber.name('Respawn fiber')
for _, tuple in box.space.pokemons.index.status:pairs(
self.state.CAUGHT) do
box.space.pokemons:update(
tuple[self.ID],
{{'=', self.STATUS, self.state.ACTIVE}}
)
end
end
For readability, we introduce named fields:
ID = 1,
STATUS = 2,
The complete implementation of start() now looks like this:
-- create game object
start = function(self)
-- create spaces and indexes
box.once('init', function()
box.schema.create_space('pokemons')
box.space.pokemons:create_index(
"primary", {type = 'hash', parts = {1, 'unsigned'}}
)
box.space.pokemons:create_index(
"status", {type = "tree", parts = {2, 'str'}}
)
end)
-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
-- compile models
local ok_cm, compiled_pokemon = avro.compile(pokemon)
local ok_cp, compiled_player = avro.compile(player)
if ok_cm and ok_cp then
-- start the game
self.pokemon_model = compiled_pokemon
self.player_model = compiled_player
self:respawn()
log.info('Started')
return true
else
log.error('Schema compilation failed')
end
else
log.error('Schema creation failed')
end
return false
end
Fibers, yields and cooperative multitasking
But wait! If we launch it as shown above, with self:respawn(), the function
will be executed only once, just like all the other methods. But we need to
execute respawn() every 60 seconds. Creating a fiber
is the Tarantool way of making application logic work in the background at all
times.
A fiber is a set of instructions that are executed with
cooperative multitasking:
the instructions contain yield signals, upon which control is passed to another fiber.
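Cooperative multitasking can be illustrated with plain Lua coroutines. In this sketch, each “fiber” runs until it calls coroutine.yield(), and a small round-robin scheduler then resumes the next one. Tarantool fibers are not plain coroutines and are scheduled by Tarantool itself; the worker and run helpers are hypothetical and only model the yield-and-switch idea.

```lua
local trace = {}

-- A "fiber" that records one step and then yields, `steps` times
local function worker(name, steps)
  return coroutine.create(function()
    for i = 1, steps do
      trace[#trace + 1] = name .. i
      coroutine.yield()  -- give control back to the scheduler
    end
  end)
end

-- Round-robin scheduler: resume each live coroutine in turn
local function run(fibers)
  local active = fibers
  while #active > 0 do
    local still_running = {}
    for _, co in ipairs(active) do
      assert(coroutine.resume(co))
      if coroutine.status(co) ~= 'dead' then
        still_running[#still_running + 1] = co
      end
    end
    active = still_running
  end
end

run({ worker('A', 3), worker('B', 2) })
print(table.concat(trace, ' '))  -- A1 B1 A2 B2 A3
```

Neither worker is preempted: each switch happens only at an explicit yield, which is why long loops in a fiber need yields (or sleeps, as in respawn()).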
Let’s launch respawn() in a fiber to make it work in the background all the time.
To do so, we’ll need to amend respawn():
respawn = function(self)
-- let's give our fiber a name;
-- this will produce neat output in fiber.info()
fiber.name('Respawn fiber')
while true do
for _, tuple in box.space.pokemons.index.status:pairs(
self.state.CAUGHT) do
box.space.pokemons:update(
tuple[self.ID],
{{'=', self.STATUS, self.state.ACTIVE}}
)
end
fiber.sleep(self.respawn_time)
end
end
and call it as a fiber in start():
start = function(self)
-- create spaces and indexes
<...>
-- create models
<...>
-- compile models
<...>
-- start the game
self.pokemon_model = compiled_pokemon
self.player_model = compiled_player
fiber.create(self.respawn, self)
log.info('Started')
-- errors if schema creation or compilation fails
<...>
end
Logging
One more helpful function that we used in start() was log.info() from
the Tarantool log module. We also need this function in
notify() to add a record to the log file on every successful catch:
-- event notification
notify = function(self, player, pokemon)
log.info("Player '%s' caught '%s'", player.name, pokemon.name)
end
We use default Tarantool log settings, so we’ll see the log
output in console when we launch our application in script mode.

Great! We’ve discussed all programming practices used in our Lua module (see
pokemon.lua).
Now let’s prepare the test environment. As planned, we write a Lua application
(see game.lua) to
initialize Tarantool’s database module, initialize our game, call the game loop
and simulate a couple of player requests.
To launch our microservice, we put both the pokemon.lua module and the game.lua
application in the current directory, install all external modules, and launch
the Tarantool instance running our game.lua application (this example is for
Ubuntu):
$ ls
game.lua pokemon.lua
$ sudo apt-get install tarantool-gis
$ sudo apt-get install tarantool-avro-schema
$ tarantool game.lua
Tarantool starts and initializes the database. Then Tarantool executes the demo
logic from game.lua: adds a pokémon named Pikachu (its chance to be caught
is very high, 99.1), displays the current map (it contains one active pokémon,
Pikachu) and processes catch requests from two players. Player1 is located right
next to the lonely Pikachu pokémon and Player2 is located far away from it.
As expected, the catch results in this output are “true” for Player1 and “false”
for Player2. Finally, Tarantool displays the current map which is empty, because
Pikachu is caught and temporarily inactive:
$ tarantool game.lua
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> version 1.7.3-43-gf5fa1e1
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> log level 5
2017-01-09 20:19:24.605 [6282] main/101/game.lua I> mapping 1073741824 bytes for tuple arena...
2017-01-09 20:19:24.609 [6282] main/101/game.lua I> initializing an empty data directory
2017-01-09 20:19:24.634 [6282] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2017-01-09 20:19:24.635 [6282] snapshot/101/main I> done
2017-01-09 20:19:24.641 [6282] main/101/game.lua I> ready to accept requests
2017-01-09 20:19:24.786 [6282] main/101/game.lua I> Started
---
- {'id': 1, 'status': 'active', 'location': {'y': 2, 'x': 1}, 'name': 'Pikachu', 'chance': 99.1}
...
2017-01-09 20:19:24.789 [6282] main/101/game.lua I> Player 'Player1' caught 'Pikachu'
true
false
--- []
...
2017-01-09 20:19:24.789 [6282] main C> entering the event loop
nginx
In real life, this microservice would work over HTTP. Let’s add
the nginx web server to our environment and make a similar
demo. But how do we make Tarantool methods callable via REST API? We use nginx
with Tarantool nginx upstream
module and create one more Lua script
(app.lua) that
exports three of our game methods – add_pokemon(), map() and catch()
– as REST endpoints of the nginx upstream module:
local game = require('pokemon')
box.cfg{listen=3301}
game:start()
-- add, map and catch functions exposed to REST API
function add(request, pokemon)
return {
result=game:add_pokemon(pokemon)
}
end
function map(request)
return {
map=game:map()
}
end
function catch(request, pid, player)
local id = tonumber(pid)
if id == nil then
return {result=false}
end
return {
result=game:catch(id, player)
}
end
An easy way to configure and launch nginx would be to create a Docker container
based on a Docker image
with nginx and the upstream module already installed (see
http/Dockerfile).
We take a standard
nginx.conf,
where we define an upstream with our Tarantool backend running (this is another
Docker container, see details below):
upstream tnt {
server pserver:3301 max_fails=1 fail_timeout=60s;
keepalive 250000;
}
and add some Tarantool-specific parameters (see descriptions in the upstream
module’s README
file):
server {
server_name tnt_test;
listen 80 default deferred reuseport so_keepalive=on backlog=65535;
location = / {
root /usr/local/nginx/html;
}
location /api {
# answers check infinity timeout
tnt_read_timeout 60m;
if ( $request_method = GET ) {
tnt_method "map";
}
tnt_http_rest_methods get;
tnt_http_methods all;
tnt_multireturn_skip_count 2;
tnt_pure_result on;
tnt_pass_http_request on parse_args;
tnt_pass tnt;
}
}
Likewise, we put Tarantool server and all our game logic in a second Docker
container based on the
official Tarantool 1.9 image (see
src/Dockerfile)
and set the container’s default command to tarantool app.lua.
This is the backend.
Non-blocking IO
To test the REST API, we create a new script
(client.lua),
which is similar to our game.lua application, but makes HTTP POST and GET
requests rather than calling Lua functions:
local http = require('curl').http()
local json = require('json')
local URI = os.getenv('SERVER_URI')
local fiber = require('fiber')
local player1 = {
name="Player1",
id=1,
location = {
x=1.0001,
y=2.0003
}
}
local player2 = {
name="Player2",
id=2,
location = {
x=30.123,
y=40.456
}
}
local pokemon = {
name="Pikachu",
chance=99.1,
id=1,
status="active",
location = {
x=1,
y=2
}
}
function request(method, body, id)
local resp = http:request(
method, URI, body
)
if id ~= nil then
print(string.format('Player %d result: %s',
id, resp.body))
else
print(resp.body)
end
end
local players = {}
function catch(player)
fiber.sleep(math.random(5))
print('Catch pokemon by player ' .. tostring(player.id))
request(
'POST', '{"method": "catch", "params": [1, '..json.encode(player)..']}',
tostring(player.id)
)
table.insert(players, player.id)
end
print('Create pokemon')
request('POST', '{"method": "add", "params": ['..json.encode(pokemon)..']}')
request('GET', '')
fiber.create(catch, player1)
fiber.create(catch, player2)
-- wait for players
while #players ~= 2 do
fiber.sleep(0.001)
end
request('GET', '')
os.exit()
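Every POST body in client.lua has the same shape: a JSON object whose "method" field names one of the Lua functions exported by the backend (add, map, catch) and whose "params" array holds that function's arguments. The following is a minimal C sketch of building such a body. build_rpc_body() is a hypothetical helper, not part of the project. It assumes that the method name needs no JSON escaping and that params_json is already a valid JSON array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the JSON body that client.lua sends in
 * its POST requests, e.g. {"method": "add", "params": [...]}.
 * Assumes method needs no JSON escaping and params_json is already a
 * valid JSON array. Returns 0 on success, -1 if buf is too small. */
static int build_rpc_body(char *buf, size_t size,
                          const char *method, const char *params_json)
{
    assert(method != NULL && params_json != NULL);
    int n = snprintf(buf, size, "{\"method\": \"%s\", \"params\": %s}",
                     method, params_json);
    /* snprintf returns the length it wanted to write; detect truncation */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, method "catch" with params [1, {...player...}] yields the same body that the catch() function in client.lua posts.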
When you run this script, you’ll notice that both players have equal chances to
make the first attempt at catching the pokémon. In a classical Lua script,
a networked call blocks the script until it’s finished, so the first catch
attempt can only be done by the player who entered the game first. In Tarantool,
both players play concurrently, since all modules are integrated with Tarantool
cooperative multitasking and use
non-blocking I/O.
Indeed, when Player1 makes its first REST call, the script doesn’t block.
The fiber running catch() function on behalf of Player1 issues a non-blocking
call to the operating system and yields control to the next fiber, which happens
to be the fiber of Player2. Player2’s fiber does the same. When the network
response is received, Player1’s fiber is activated by Tarantool cooperative
scheduler, and resumes its work. All Tarantool modules
use non-blocking I/O and are integrated with Tarantool cooperative scheduler.
For module developers, Tarantool provides an API.
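The interleaving described above can be modeled with a toy round-robin scheduler in plain C. This is not how Tarantool implements fibers. Each hypothetical toy_fiber is a two-step state machine: it "sends" a request, returns to the scheduler instead of waiting (the equivalent of a yield), and handles the response when it is resumed. The sketch assumes that each response is ready by the time its fiber is resumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of two cooperative "fibers"; not Tarantool's implementation. */
enum toy_step { SEND_REQUEST, HANDLE_RESPONSE, DONE };

struct toy_fiber {
    int player_id;
    enum toy_step step;
};

/* Runs one step of a fiber and returns, which plays the role of a yield.
 * Appends what happened to the trace string. */
static void toy_fiber_step(struct toy_fiber *f, char *trace, size_t cap)
{
    char entry[32];
    switch (f->step) {
    case SEND_REQUEST:
        snprintf(entry, sizeof(entry), "P%d:send ", f->player_id);
        f->step = HANDLE_RESPONSE; /* don't wait: yield to the scheduler */
        break;
    case HANDLE_RESPONSE:
        snprintf(entry, sizeof(entry), "P%d:recv ", f->player_id);
        f->step = DONE;
        break;
    default:
        return;
    }
    strncat(trace, entry, cap - strlen(trace) - 1);
}

/* Round-robin scheduler: resumes every unfinished fiber in turn. */
static void toy_run(struct toy_fiber *fibers, int n, char *trace, size_t cap)
{
    assert(cap > strlen(trace));
    int active = n;
    while (active > 0) {
        active = 0;
        for (int i = 0; i < n; ++i) {
            if (fibers[i].step == DONE)
                continue;
            toy_fiber_step(&fibers[i], trace, cap);
            if (fibers[i].step != DONE)
                active++;
        }
    }
}
```

Running toy_run() on two fibers for players 1 and 2 produces the trace "P1:send P2:send P1:recv P2:recv ". Both requests are in flight before either response is handled. A blocking client would produce "P1:send P1:recv P2:send P2:recv " instead.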
For our HTTP test, we create a third container based on the
official Tarantool 1.9 image (see
client/Dockerfile)
and set the container’s default command to tarantool client.lua.

To run this test locally, download our pokemon
project from GitHub and say:
$ docker-compose build
$ docker-compose up
Docker Compose builds and runs all three containers: pserver (Tarantool
backend), phttp (nginx) and pclient (demo client). You can see log
messages from all these containers in the console, pclient saying that it made
an HTTP request to create a pokémon, made two catch requests, requested the map
(empty since the pokémon is caught and temporarily inactive) and exited:
pclient_1 | Create pokemon
<...>
pclient_1 | {"result":true}
pclient_1 | {"map":[{"id":1,"status":"active","location":{"y":2,"x":1},"name":"Pikachu","chance":99.100000}]}
pclient_1 | Catch pokemon by player 2
pclient_1 | Catch pokemon by player 1
pclient_1 | Player 1 result: {"result":true}
pclient_1 | Player 2 result: {"result":false}
pclient_1 | {"map":[]}
pokemon_pclient_1 exited with code 0
Congratulations! Here’s the end point of our walk-through. As further reading,
see more about installing and
contributing a module.
See also reference on Tarantool modules and
C API, and don’t miss our
Lua cookbook recipes.
C tutorial
Tarantool can call C code with modules,
or with ffi,
or with C stored procedures.
This tutorial is only about the third option, C stored procedures.
In fact the routines are always “C functions” but the phrase
“stored procedure” is commonly used for historical reasons.
In this tutorial, which can be followed by anyone with a Tarantool
development package and a C compiler, there are five tasks:
- easy.c – prints “hello world”;
- harder.c – decodes a passed parameter value;
- hardest.c – uses the C API to do a DBMS insert;
- read.c – uses the C API to do a DBMS select;
- write.c – uses the C API to do a DBMS replace.
After following the instructions, and seeing that the results
are what is described here, users should feel confident about
writing their own stored procedures.
Check that these items exist on the computer:
- Tarantool 2.1 or later
- A gcc compiler (any modern version should work)
- module.h and files #included in it
- msgpuck.h
- libmsgpuck.a (only for some recent msgpuck versions)
The module.h file will exist if Tarantool was installed from source.
Otherwise Tarantool’s “developer” package must be installed.
For example on Ubuntu say:
$ sudo apt-get install tarantool-dev
or on Fedora say:
$ dnf -y install tarantool-devel
The msgpuck.h file will exist if Tarantool was installed from source.
Otherwise the “msgpuck” package must be installed from
https://github.com/tarantool/msgpuck.
Both module.h and msgpuck.h must be on the include path for the
C compiler to see them.
For example, if module.h address is /usr/local/include/tarantool/module.h,
and msgpuck.h address is /usr/local/include/msgpuck/msgpuck.h,
and they are not currently on the include path, say:
$ export CPATH=/usr/local/include/tarantool:/usr/local/include/msgpuck
The libmsgpuck.a static library is necessary with msgpuck versions
produced after February 2017. If and only if you encounter linking
problems when using the gcc statements in the examples for this tutorial, you should
put libmsgpuck.a on the path (libmsgpuck.a is produced from both msgpuck
and Tarantool source downloads so it should be easy to find). For
example, instead of “gcc -shared -o harder.so -fPIC harder.c”
for the second example below, you will need to say
“gcc -shared -o harder.so -fPIC harder.c libmsgpuck.a”.
Requests will be done using Tarantool as a
client.
Start Tarantool, and enter these requests.
box.cfg{listen=3306}
box.schema.space.create('capi_test')
box.space.capi_test:create_index('primary')
net_box = require('net.box')
capi_connection = net_box:new(3306)
In plainer language: create a space named capi_test,
and make a connection to self named capi_connection.
Leave the client running. It will be necessary to enter more requests later.
Start another shell. Change directory (cd) so that it is
the same as the directory that the client is running in.
Create a file. Name it easy.c. Put the following code in it:
#include <stdio.h>
#include "module.h"
int easy(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
printf("hello world\n");
return 0;
}
int easy2(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
printf("hello world -- easy2\n");
return 0;
}
Compile the program, producing a library file named easy.so:
$ gcc -shared -o easy.so -fPIC easy.c
Now go back to the client and execute these requests:
box.schema.func.create('easy', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'easy')
capi_connection:call('easy')
If these requests appear unfamiliar,
re-read the descriptions of
box.schema.func.create(),
box.schema.user.grant()
and conn:call().
The function that matters is capi_connection:call('easy').
Its first job is to find the ‘easy’ function, which should
be easy because by default Tarantool looks in the current
directory for a file named easy.so.
Its second job is to call the ‘easy’ function.
Since the easy() function in easy.c begins with printf("hello world\n"),
the words “hello world” will appear on the screen.
Its third job is to check that the call was successful.
Since the easy() function in easy.c ends with return 0,
there is no error message to display and the request is over.
The words “hello world” appear on the console of the Tarantool instance, and the call returns without an error.
Now let’s call the other function in easy.c – easy2().
This is almost the same as the easy() function, but there’s a detail:
when the file name is not the same as the function name,
then we have to specify
file-name.function-name.
box.schema.func.create('easy.easy2', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'easy.easy2')
capi_connection:call('easy.easy2')
This time the result will be “hello world -- easy2”.
Conclusion: calling a C function is easy.
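The file-name.function-name rule can be sketched as a small C helper that derives the library file and the symbol from a function name. split_c_func_name() is hypothetical and is not Tarantool's loader. It assumes that the name is split at its last dot and ignores the module search path. The names in this tutorial contain at most one dot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derives the library file and the C symbol from a
 * box.schema.func name such as "easy" or "easy.easy2".
 * Returns 0 on success, -1 if an output buffer is too small. */
static int split_c_func_name(const char *name, char *file, size_t file_size,
                             char *symbol, size_t symbol_size)
{
    assert(name != NULL);
    const char *dot = strrchr(name, '.');
    int n;
    if (dot == NULL) {
        /* "easy" -> file "easy.so", symbol "easy" */
        n = snprintf(file, file_size, "%s.so", name);
        if (n < 0 || (size_t)n >= file_size)
            return -1;
        n = snprintf(symbol, symbol_size, "%s", name);
    } else {
        /* "easy.easy2" -> file "easy.so", symbol "easy2" */
        n = snprintf(file, file_size, "%.*s.so", (int)(dot - name), name);
        if (n < 0 || (size_t)n >= file_size)
            return -1;
        n = snprintf(symbol, symbol_size, "%s", dot + 1);
    }
    return (n < 0 || (size_t)n >= symbol_size) ? -1 : 0;
}
```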
Go back to the shell where the easy.c program was created.
Create a file. Name it harder.c. Put the following code in it:
#include <stdio.h>
#include "module.h"
#include "msgpuck.h"
int harder(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
/* args points to a MsgPack array that holds all call arguments */
uint32_t arg_count = mp_decode_array(&args);
printf("arg_count = %u\n", arg_count);
/* The first (and only) argument is itself an array */
uint32_t field_count = mp_decode_array(&args);
printf("field_count = %u\n", field_count);
uint32_t val;
uint32_t i;
for (i = 0; i < field_count; ++i)
{
val = mp_decode_uint(&args);
printf("val=%u.\n", val);
}
return 0;
}
Compile the program, producing a library file named harder.so:
$ gcc -shared -o harder.so -fPIC harder.c
Now go back to the client and execute these requests:
box.schema.func.create('harder', {language = 'C'})
box.schema.user.grant('guest', 'execute', 'function', 'harder')
passable_table = {}
table.insert(passable_table, 1)
table.insert(passable_table, 2)
table.insert(passable_table, 3)
capi_connection:call('harder', {passable_table})
This time the call is passing a Lua table (passable_table)
to the harder() function. The harder() function receives it
in the char *args parameter.
At this point the harder() function will start using functions
defined in msgpuck.h.
The routines that begin with “mp” are msgpuck functions that
handle data formatted according to the MsgPack specification.
Passes and returns are always done with this format so
one must become acquainted with msgpuck
to become proficient with the C API.
For now, though, it’s enough to know that mp_decode_array()
returns the number of elements in an array, and mp_decode_uint()
returns an unsigned integer from args. And there’s a side
effect: when the decoding finishes, args has changed
and is now pointing to the next element.
Therefore the first displayed line will be “arg_count = 1”
because there was only one item passed: passable_table.
The second displayed line will be “field_count = 3”
because there are three items in the table.
The next three lines will be “val=1.”, “val=2.” and “val=3.”
because those are the values in the items in the table.
Conclusion: decoding parameter values passed to a
C function is not easy at first, but there are routines
to do the job, and they’re documented, and there aren’t
very many of them.
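To see what mp_decode_array() and mp_decode_uint() do to args, here is a deliberately simplified C decoder for the exact bytes that capi_connection:call('harder', {passable_table}) sends: 0x91 (an array of 1 element), 0x93 (an array of 3 elements), then 0x01, 0x02 and 0x03. The toy_* functions are hypothetical. They handle only the one-byte "fixarray" and "positive fixint" forms from the MsgPack specification and use unsigned char where msgpuck uses char. The real msgpuck routines handle every size.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the argument list [[1, 2, 3]] encoded as MsgPack. */
static const unsigned char harder_args[] = {0x91, 0x93, 0x01, 0x02, 0x03};

/* Decodes a fixarray header (0x90..0x9f): returns the element count and,
 * like mp_decode_array(), advances *data past what it consumed. */
static uint32_t toy_decode_fixarray(const unsigned char **data)
{
    unsigned char b = **data;
    assert((b & 0xf0) == 0x90); /* only the one-byte array form */
    *data += 1;
    return b & 0x0f;
}

/* Decodes a positive fixint (0x00..0x7f): the value is the byte itself.
 * Like mp_decode_uint(), it advances *data. */
static uint32_t toy_decode_fixuint(const unsigned char **data)
{
    unsigned char b = **data;
    assert(b <= 0x7f); /* only the one-byte integer form */
    *data += 1;
    return b;
}
```

Calling these functions in the same order as harder.c returns 1, 3, 1, 2 and 3. Afterwards, the pointer sits exactly at the end of the buffer.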
Go back to the shell where the easy.c
and the harder.c programs were created.
Create a file. Name it hardest.c. Put the following code in it:
#include <string.h>
#include "module.h"
#include "msgpuck.h"
int hardest(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
uint32_t space_id = box_space_id_by_name("capi_test", strlen("capi_test"));
char tuple[1024]; /* Must be big enough for mp_encode results */
char *tuple_pointer = tuple;
tuple_pointer = mp_encode_array(tuple_pointer, 2);
tuple_pointer = mp_encode_uint(tuple_pointer, 10000);
tuple_pointer = mp_encode_str(tuple_pointer, "String 2", 8);
int n = box_insert(space_id, tuple, tuple_pointer, NULL);
return n;
}
Compile the program, producing a library file named hardest.so:
$ gcc -shared -o hardest.so -fPIC hardest.c
Now go back to the client and execute these requests:
box.schema.func.create('hardest', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'hardest')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test')
capi_connection:call('hardest')
This time the C function is doing three things:
- finding the numeric identifier of the
capi_test space
by calling box_space_id_by_name();
- formatting a tuple using more
msgpuck.h functions;
- inserting a tuple using
box_insert().
Warning
char tuple[1024]; is used here as just a quick way
of saying “allocate more than enough bytes”. For serious
programs the developer must be careful to allow enough space for
all the bytes that the mp_encode routines will use up.
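One way to follow that advice is to compute the exact size before encoding. msgpuck provides mp_sizeof_array(), mp_sizeof_uint() and mp_sizeof_str() for this purpose. The self-contained sketch below derives the same byte counts from the MsgPack specification (the toy_sizeof_* names are hypothetical). It shows that the tuple [10000, "String 2"] from hardest.c needs 13 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Header size of an array with n elements. */
static uint32_t toy_sizeof_array(uint32_t n)
{
    if (n <= 15) return 1;          /* fixarray */
    if (n <= UINT16_MAX) return 3;  /* array 16 */
    return 5;                       /* array 32 */
}

/* Encoded size of an unsigned integer. */
static uint32_t toy_sizeof_uint(uint64_t v)
{
    if (v <= 0x7f) return 1;        /* positive fixint */
    if (v <= UINT8_MAX) return 2;   /* uint 8 */
    if (v <= UINT16_MAX) return 3;  /* uint 16 */
    if (v <= UINT32_MAX) return 5;  /* uint 32 */
    return 9;                       /* uint 64 */
}

/* Encoded size of a string of len bytes (header + payload). */
static uint32_t toy_sizeof_str(uint32_t len)
{
    assert(len <= UINT32_MAX - 5);          /* avoid overflow below */
    if (len <= 31) return 1 + len;          /* fixstr */
    if (len <= UINT8_MAX) return 2 + len;   /* str 8 */
    if (len <= UINT16_MAX) return 3 + len;  /* str 16 */
    return 5 + len;                         /* str 32 */
}

/* Size of the tuple that hardest.c encodes: [10000, "String 2"]. */
static uint32_t hardest_tuple_size(void)
{
    return toy_sizeof_array(2) + toy_sizeof_uint(10000) + toy_sizeof_str(8);
}
```

The array header takes 1 byte, 10000 takes 3 bytes (uint 16), and "String 2" takes 9 bytes. The 1024-byte buffer is therefore far more than this tuple needs.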
Now, still on the client, execute this request:
box.space.capi_test:select()
The result should contain the tuple [10000, 'String 2'] that hardest() inserted.
This proves that the hardest() function succeeded, but
where did box_space_id_by_name() and
box_insert() come from?
Answer: the C API.
Go back to the shell where the easy.c
and the harder.c and the hardest.c programs were created.
Create a file. Name it read.c. Put the following code in it:
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "module.h"
#include <msgpuck.h>
int read(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
char tuple_buf[1024]; /* where the raw MsgPack tuple will be stored */
uint32_t space_id = box_space_id_by_name("capi_test", strlen("capi_test"));
uint32_t index_id = 0; /* The number of the space's first index */
uint32_t key = 10000; /* The key value that box_insert() used */
mp_encode_array(tuple_buf, 0); /* clear */
box_tuple_format_t *fmt = box_tuple_format_default();
box_tuple_t *tuple = NULL;
char key_buf[16]; /* Pass key_buf = encoded key = 10000 */
char *key_end = key_buf;
key_end = mp_encode_array(key_end, 1);
key_end = mp_encode_uint(key_end, key);
assert(key_end <= key_buf + sizeof(key_buf));
/* Get the tuple. There's no box_select() but there's this. */
int r = box_index_get(space_id, index_id, key_buf, key_end, &tuple);
assert(r == 0);
assert(tuple != NULL);
/* Get each field of the tuple + display what you get. */
int field_no; /* The first field number is 0. */
for (field_no = 0; field_no < 2; ++field_no)
{
const char *field = box_tuple_field(tuple, field_no);
assert(field != NULL);
assert(mp_typeof(*field) == MP_STR || mp_typeof(*field) == MP_UINT);
if (mp_typeof(*field) == MP_UINT)
{
uint32_t uint_value = mp_decode_uint(&field);
printf("uint value=%u.\n", uint_value);
}
else /* if (mp_typeof(*field) == MP_STR) */
{
const char *str_value;
uint32_t str_value_length;
str_value = mp_decode_str(&field, &str_value_length);
printf("string value=%.*s.\n", str_value_length, str_value);
}
}
return 0;
}
Compile the program, producing a library file named read.so:
$ gcc -shared -o read.so -fPIC read.c
Now go back to the client and execute these requests:
box.schema.func.create('read', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'read')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test', {if_not_exists = true})
capi_connection:call('read')
This time the C function is doing four things:
- once again, finding the numeric identifier of the
capi_test space
by calling box_space_id_by_name();
- formatting a search key = 10000 using more
msgpuck.h functions;
- getting a tuple using
box_index_get();
- going through the tuple’s fields with
box_tuple_field() and then
decoding each field depending on its type. In this case, since
what we are getting is the tuple that we inserted with hardest.c,
we know in advance that the type is either MP_UINT or MP_STR;
however, it’s very common to have a case statement here with one
option for each possible type.
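The search key that read.c builds is the one-element array [10000]. The following self-contained sketch shows the bytes that mp_encode_array() and mp_encode_uint() write for it. The toy_encode_* functions are hypothetical and support only the forms this key needs: fixarray and "uint 16", stored big-endian as the MsgPack specification requires. The result is four bytes, so key_buf[16] is more than enough.

```c
#include <assert.h>
#include <stdint.h>

/* Writes a fixarray header for n <= 15 elements; returns the new end. */
static unsigned char *toy_encode_fixarray(unsigned char *p, uint32_t n)
{
    assert(n <= 15);
    *p++ = (unsigned char)(0x90 | n);
    return p;
}

/* Writes v as "uint 16": 0xcd followed by 2 big-endian bytes. MsgPack
 * uses this form for 256 <= v <= 65535; returns the new end. */
static unsigned char *toy_encode_uint16(unsigned char *p, uint16_t v)
{
    assert(v > UINT8_MAX);
    *p++ = 0xcd;
    *p++ = (unsigned char)(v >> 8);
    *p++ = (unsigned char)(v & 0xff);
    return p;
}
```

Encoding [10000] this way produces 0x91 0xcd 0x27 0x10, because 10000 is 0x2710.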
The call capi_connection:call('read') should print these two lines on the Tarantool console:
uint value=10000.
string value=String 2.
This proves that the read() function succeeded.
Once again the important functions that start with box
– box_index_get() and
box_tuple_field() –
came from the C API.
Go back to the shell where the programs easy.c, harder.c, hardest.c
and read.c were created.
Create a file. Name it write.c. Put the following code in it:
#include <string.h>
#include "module.h"
#include <msgpuck.h>
int write(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
static const char *space = "capi_test";
char tuple_buf[1024]; /* Must be big enough for mp_encode results */
uint32_t space_id = box_space_id_by_name(space, strlen(space));
if (space_id == BOX_ID_NIL) {
return box_error_set(__FILE__, __LINE__, ER_PROC_C,
"Can't find space %s", space);
}
char *tuple_end = tuple_buf;
tuple_end = mp_encode_array(tuple_end, 2);
tuple_end = mp_encode_uint(tuple_end, 1);
tuple_end = mp_encode_uint(tuple_end, 22);
if (box_txn_begin() != 0)
return -1;
if (box_replace(space_id, tuple_buf, tuple_end, NULL) != 0) {
box_txn_rollback(); /* don't leave the transaction open on error */
return -1;
}
if (box_txn_commit() != 0)
return -1;
fiber_sleep(0.001);
box_tuple_t *tuple = box_tuple_new(box_tuple_format_default(),
tuple_buf, tuple_end);
if (tuple == NULL)
return -1;
return box_return_tuple(ctx, tuple);
}
Compile the program, producing a library file named write.so:
$ gcc -shared -o write.so -fPIC write.c
Now go back to the client and execute these requests:
box.schema.func.create('write', {language = "C"})
box.schema.user.grant('guest', 'execute', 'function', 'write')
box.schema.user.grant('guest', 'read,write', 'space', 'capi_test', {if_not_exists = true})
capi_connection:call('write')
This time the C function is doing six things:
- once again, finding the numeric identifier of the
capi_test space
by calling box_space_id_by_name();
- making a new tuple;
- starting a transaction;
- replacing a tuple in
box.space.capi_test;
- ending a transaction;
- the final line is a replacement for the loop in
read.c –
instead of getting each field and printing it, use the
box_return_tuple(...) function to return the entire tuple
to the caller and let the caller display it.
The call capi_connection:call('write') should return the tuple [1, 22].
This proves that the write() function succeeded.
Once again the important functions that start with box
– box_txn_begin(),
box_txn_commit() and
box_return_tuple() –
came from the C API.
Conclusion: the long description of the whole C API is
there for a good reason.
All of the functions in it can be called from C functions
which are called from Lua.
So C “stored procedures” have full access to the database.
An example in the test suite
Download the source code of Tarantool. Look in a subdirectory
test/box. Notice that there is a file named
tuple_bench.test.lua and another file named
tuple_bench.c. Examine the Lua file and observe
that it is calling a function in the C file, using the
same techniques that this tutorial has shown.
Conclusion: parts of the standard test suite
use C stored procedures, and they must work,
because releases don’t happen if Tarantool doesn’t pass the tests.
Developing with an IDE
You can use IntelliJ IDEA as an IDE to develop and debug Lua applications for
Tarantool.
Download and install the IDE from the
official website.
JetBrains provides specialized editions for particular languages:
IntelliJ IDEA (Java), PHPStorm (PHP), PyCharm (Python), RubyMine (Ruby),
CLion (C/C++), WebStorm (Web) and others.
So, download a version that suits your primary programming language.
Tarantool integration is supported for all editions.
Configure the IDE:
Start IntelliJ IDEA.
Click the Configure button and select Plugins.
Click Browse repositories.
Install EmmyLua plugin.
Note
Don’t confuse it with the Lua plugin, which is less powerful
than EmmyLua.
Restart IntelliJ IDEA.
Click Configure, select Project Defaults and then
Run Configurations.
Find Lua Application in the sidebar at the left.
In Program, type a path to an installed tarantool binary.
By default, this is tarantool or /usr/bin/tarantool on most
platforms.
If you installed tarantool from sources to a custom directory,
please specify the proper path here.
Now IntelliJ IDEA is ready to use with Tarantool.
Create a new Lua project.
Add a new Lua file, for example init.lua.
Write your code, save the file.
To run your application, click Run -> Run in the main menu and select
your source file in the list.
Or click Run -> Debug to start debugging.
Note
To use the Lua debugger, upgrade Tarantool to version
1.7.5-29-gbb6170e4b or later.
tt CLI utility
tt is a utility that provides a unified command-line interface for managing
Tarantool-based applications. It covers a wide range of tasks – from installing
a specific Tarantool version to managing remote instances and developing applications.
tt is developed in its own GitHub repository.
Here you can find its source code, changelog, and release information.
For a complete list of releases, see the Releases section on GitHub.
There is also the Enterprise version of tt available in a
Tarantool Enterprise Edition’s release package.
The Enterprise version provides additional features, for example, importing and exporting data.
This section provides instructions on tt installation and configuration,
concept explanation, and the tt command reference.
The key concept of tt usage is the environment. A tt environment
is a directory that includes a tt configuration, Tarantool installations,
application files, and other resources. If you’re familiar with Python virtual
environments,
you can think of tt environments as their analog.
tt environments enable independent management of multiple Tarantool applications,
each running on its own Tarantool version and configuration, on a single host in
an isolated manner.
To create a tt environment in a directory, run tt init in it.
Multi-instance applications
tt supports Tarantool applications that run on multiple instances. For example,
you can write an application that includes different source files for storage and router
instances. With tt, you can start and stop them in a single call, or manage
each instance independently.
Learn more about working with multi-instance applications in Multi-instance applications.
Installation
To install the tt command-line utility, use a package manager – Yum or
APT on Linux, or Homebrew on macOS. If you need a specific build, you can build
tt from sources.
Using Linux package managers
On Linux systems, you can install tt with yum or apt package managers
from the tarantool/modules repository. Learn how to add this repository.
The installation command looks like this:
On Ubuntu:
$ sudo apt-get install tt
On CentOS:
$ sudo yum install tt
On macOS, use Homebrew to install tt:
$ brew install tt
To build tt from sources:
- Install third-party software required for building
tt:
Clone the tarantool/tt repository:
git clone https://github.com/tarantool/tt --recursive
Go to the tt directory:
$ cd tt
(Optional) Check out a release tag to build a specific version:
Build tt using mage:
$ mage build
tt will appear in the current directory.
Enabling shell completion
To enable the completion for tt commands, run the following command specifying
the shell (bash or zsh):
Configuration
The key artifact that defines the tt environment and various aspects of its
execution is its configuration file. You can generate it with a tt init call.
In the default launch mode, the file is generated
in the current directory, making it the environment root.
By default, the configuration file is called tt.yaml and is located in the tt
environment root directory. Where that root is depends on the launch mode.
It is also possible to pass the configuration file name and location explicitly
in the following ways:
- the -c/--cfg global option
- the TT_CLI_CFG environment variable
The TT_CLI_CFG variable has a lower priority than the --cfg option.
The tt configuration file is a YAML file with the following structure:
env:
  instances_enabled: path/to/available/applications
  bin_dir: path/to/bin_dir
  inc_dir: path/to/inc_dir
  restart_on_failure: bool
  tarantoolctl_layout: bool
modules:
  directory: path/to/modules/dir
app:
  run_dir: path/to/run_dir
  log_dir: path/to/log_dir
  wal_dir: path/to/wal_dir
  vinyl_dir: path/to/vinyl_dir
  memtx_dir: path/to/memtx_dir
repo:
  rocks: path/to/rocks
  distfiles: path/to/install
ee:
  credential_path: path/to/file
templates:
  - path: path/to/app/templates1
  - path: path/to/app/templates2
Note
The paths specified in env.* parameters are relative to the current tt
environment’s root.
instances_enabled – the directory where instances
are stored. Default: instances.enabled.
bin_dir – the directory where binary files are stored. Default: bin.
inc_dir – the base directory for storing header files. They will
be placed in the include subdirectory inside the specified directory.
Default: include.
Note
The header files directory path can also be passed using the TT_CLI_TARANTOOL_PREFIX
environment variable. If it is set, tt rocks and tt build commands use the
include/tarantool directory inside TT_CLI_TARANTOOL_PREFIX as the
header files directory.
restart_on_failure – restart the instance on failure: true or false.
Default: false.
tarantoolctl_layout – use a layout compatible with the deprecated tarantoolctl
utility for artifact files: control sockets, .pid files, log files.
Default: false.
Note
The paths specified in app.*_dir parameters are relative to the application
location inside the instances.enabled directory specified in the env
configuration section. For example, the default location of the myapp
application’s logs is instances.enabled/myapp/var/log.
Inside this location, tt creates separate directories for each application
instance that runs in the current environment.
run_dir – the directory for instance runtime artifacts, such as console
sockets or PID files. Default: var/run.
log_dir – the directory where log files are stored. Default: var/log.
wal_dir – the directory where write-ahead log (.xlog) files are stored.
Default: var/lib.
memtx_dir – the directory where memtx stores snapshot (.snap) files.
Default: var/lib.
vinyl_dir – the directory where vinyl files or subdirectories are stored.
Default: var/lib.
path – a path to application templates used for creating applications with
tt create. May be specified more than once.
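Taken together, these defaults place an instance's artifacts under instances.enabled/<app>/<dir>/<instance>, for example, instances.enabled/myapp/var/log/instance1. The following small C sketch composes such a path. artifact_path() is a hypothetical helper. It assumes the default instances_enabled value and a relative app.*_dir value such as var/log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: composes
 *   <instances_enabled>/<app>/<app_dir>/<instance>
 * e.g. instances.enabled/myapp/var/log/instance1.
 * Returns 0 on success, -1 if out is too small. */
static int artifact_path(char *out, size_t size, const char *instances_enabled,
                         const char *app, const char *app_dir,
                         const char *instance)
{
    /* An application is a single directory inside instances_enabled. */
    assert(strchr(app, '/') == NULL);
    int n = snprintf(out, size, "%s/%s/%s/%s",
                     instances_enabled, app, app_dir, instance);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```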
The tt launch mode defines its working directory and the way it searches for the
configuration file. There are three launch modes:
Default launch:
Global option: none
Configuration file: searched from the current directory to the root.
Taken from /etc/tarantool if the file is not found.
Working directory: The directory where the configuration file is found.
System launch:
Global option: --system or -S
Configuration file: Taken from /etc/tarantool.
Working directory: Current directory.
Local launch:
Global option: --local=DIRECTORY or -L=DIRECTORY
Configuration file: Searched from the specified directory to the root.
Taken from /etc/tarantool if the file is not found.
Working directory: The specified directory. If tarantool or tt
executable files are found in the working directory, they will be used.
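The default-mode lookup can be sketched in C together with the explicit overrides described earlier, where --cfg takes priority over TT_CLI_CFG. This is not tt's source code. resolve_tt_config() and the exists callback are hypothetical. The sketch assumes that the file is named tt.yaml in every directory, including /etc/tarantool, and that start_dir is an absolute path without a trailing slash. The local mode would run the same search starting from its DIRECTORY argument. The system mode would skip the search and go straight to /etc/tarantool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef bool (*exists_fn)(const char *path);

/* Returns 0 and fills out with the configuration file path.
 * Order: --cfg value, TT_CLI_CFG value, then a search from start_dir
 * up to "/", then /etc/tarantool/tt.yaml. NULL means "not set". */
static int resolve_tt_config(const char *cfg_flag, const char *env_cfg,
                             const char *start_dir, exists_fn exists,
                             char *out, size_t size)
{
    char dir[1024];
    if (cfg_flag != NULL)
        return snprintf(out, size, "%s", cfg_flag) < (int)size ? 0 : -1;
    if (env_cfg != NULL)
        return snprintf(out, size, "%s", env_cfg) < (int)size ? 0 : -1;
    assert(start_dir[0] == '/');
    if (snprintf(dir, sizeof(dir), "%s", start_dir) >= (int)sizeof(dir))
        return -1;
    for (;;) {
        bool root = strcmp(dir, "/") == 0;
        /* Avoid "//tt.yaml" when dir is the root. */
        if (snprintf(out, size, "%s%stt.yaml", dir, root ? "" : "/") >= (int)size)
            return -1;
        if (exists(out))
            return 0;
        if (root)
            break;
        char *slash = strrchr(dir, '/');
        if (slash == dir)
            dir[1] = '\0';   /* parent of "/home" is "/" */
        else
            *slash = '\0';   /* drop the last path component */
    }
    return snprintf(out, size, "/etc/tarantool/tt.yaml") < (int)size ? 0 : -1;
}

/* Example predicates standing in for a real file-existence check. */
static bool only_home_user_has_config(const char *path)
{
    return strcmp(path, "/home/user/tt.yaml") == 0;
}

static bool no_config_anywhere(const char *path)
{
    (void)path;
    return false;
}
```

Starting from /home/user/proj, the search checks /home/user/proj/tt.yaml, /home/user/tt.yaml, /home/tt.yaml and /tt.yaml, in that order. It falls back to /etc/tarantool only if none of them exists.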
Migrating from tt 1.* to 2.0 or later
The tt configuration and application layout were changed in version 2.0.
If you are using tt 1.*, complete the following steps to migrate to tt 2.0 or later:
Update the tt configuration file.
In tt 2.0, the following changes were made to the configuration file:
- The root section
tt was removed. Its child sections – app, repo,
modules, and others – have been moved to the top level.
- Environment configuration parameters were moved from the
app section
to the new section env. These parameters are instances_enabled,
bin_dir, inc_dir, and restart_on_failure.
- The paths in the
app section are now relative to the app directory in instances.enabled
instead of the environment root.
You can use tt init to generate a configuration file with
the new structure and default parameter values.
Move application artifacts.
With tt 1.*, application artifacts (logs, snapshots, pid, and other files)
were created in the var directory inside the environment root. Starting from
tt 2.0, these artifacts are created in the var directory inside the
application directory, which is instances.enabled/<app-name>. This is
how an application directory looks:
instances.enabled/app/
├── init.lua
├── instances.yml
└── var
├── lib
│ ├── instance1
│ └── instance2
├── log
│ ├── instance1
│ └── instance2
└── run
├── instance1
└── instance2
To continue using existing application artifacts after migration from tt 1.*:
- Create the
var directory inside the application directory.
- Create the
lib, log, and run directories inside var.
- Move directories with instance artifacts from the old
var directory
to the new var directories in applications’ directories.
Move the files accessed from the application code.
The working directory of instance processes was changed from the tt working
directory to the application directory inside instances.enabled. If the
application accesses files using relative paths, move the files accordingly
or adjust the application code.
Global options
Important
Global options of tt must be passed before its commands and other options.
For example:
$ tt --cfg tt-conf.yaml start app
tt has the following global options:
- -c=file, --cfg=file
  Path to the configuration file.
  Alternatively, this path can be passed in the TT_CLI_CFG environment variable.
- -h, --help
  Display help.
- --integrity-check PUBLIC_KEY
  Perform an integrity check using the specified public key before executing the operation.
  Learn more in Integrity check.
- -I, --internal
  Force the use of an internal module even if there is an
  external module with the same name.
- -L=DIRECTORY, --local=DIRECTORY
  Use the tt environment from the specified directory.
  Learn more about the local launch mode.
- -s, --self
  Use the current tt version instead of executing the one located
  in the bin_dir directory.
- -S, --system
  Use the tt environment installed in the system.
  Learn more about the system launch mode.
- -V, --verbose
  Display detailed processing information (verbose mode).
Developing applications
This section describes tt capabilities related to developing cluster applications.
Application environment
This section provides a high-level overview of how to prepare a Tarantool application for deployment
and how the application’s environment and layout might look.
This information is helpful for understanding how to administer Tarantool instances using tt CLI in both development and production environments.
The main steps of creating and preparing the application for deployment are:
- Initializing a local environment.
- Creating and developing an application.
- Packaging the application.
In this section, a sharded_cluster_crud application is used as an example.
This cluster includes 5 instances: one router and 4 storages, which constitute two replica sets.
Initializing a local environment
Before creating an application, you need to set up a local environment for tt:
Create a home directory for the environment.
Run tt init in this directory:
~/myapp$ tt init
• Environment config is written to 'tt.yaml'
This command creates a default tt configuration file tt.yaml for a local
environment and the directories for applications, control sockets, logs, and other
artifacts:
~/myapp$ ls
bin distfiles include instances.enabled modules templates tt.yaml
Find detailed information about the tt configuration parameters and launch modes
on the tt configuration page.
Creating and developing an application
You can create an application in two ways:
- Manually by preparing its layout in a directory inside
instances_enabled.
The directory name is used as the application identifier.
- From a template by using the tt create command.
In this example, the application’s layout is prepared manually and looks as follows.
~/myapp$ tree
.
├── bin
├── distfiles
├── include
├── instances.enabled
│ └── sharded_cluster_crud
│ ├── config.yaml
│ ├── instances.yaml
│ ├── router.lua
│ ├── sharded_cluster_crud-scm-1.rockspec
│ └── storage.lua
├── modules
├── templates
└── tt.yaml
The sharded_cluster_crud directory contains the following files:
config.yaml: contains the configuration of the cluster. This file might include the entire cluster topology or provide connection settings to a centralized configuration storage.
instances.yaml: specifies instances to run in the current environment. For example, on the developer’s machine, this file might include all the instances defined in the cluster configuration. In the production environment, this file includes instances to run on the specific machine.
router.lua: includes code specific for a router.
sharded_cluster_crud-scm-1.rockspec: specifies the required external dependencies (for example, vshard and crud).
storage.lua: includes code specific for storages.
You can find the full example here:
sharded_cluster_crud.
Packaging the application
To package the ready application, use the tt pack command.
This command can create an installable DEB/RPM package or generate a .tgz archive.
The structure below reflects the content of the packed .tgz archive for the sharded_cluster_crud application:
~/myapp$ tree -a
.
├── bin
│ ├── tarantool
│ └── tt
├── instances.enabled
│ └── sharded_cluster_crud -> ../sharded_cluster_crud
├── sharded_cluster_crud
│ ├── .rocks
│ │ └── share
│ │ └── ...
│ ├── config.yaml
│ ├── instances.yaml
│ ├── router.lua
│ └── storage.lua
└── tt.yaml
The application’s layout looks similar to the one used during development, with some differences:
bin: contains the tarantool and tt binaries packed with the application bundle.
instances.enabled: contains a symlink to the packed sharded_cluster_crud application.
sharded_cluster_crud: the packed application. In addition to the files created during development, it includes the .rocks directory containing the application dependencies (for example, vshard and crud).
tt.yaml: a tt configuration file.
Note
DEB/RPM packages generated by tt pack also include .service
unit files for each packaged application.
Deploying the application
When deploying a distributed cluster application from a .tar.gz archive, you can
define instances to run on each machine by changing the content of the instances.yaml file.
On the developer’s machine, this file might include all the instances defined in the cluster configuration.
instances.yaml:
storage-a-001:
storage-a-002:
storage-b-001:
storage-b-002:
router-a-001:
In the production environment, this file includes instances to run on the specific machine.
instances.yaml (Server-001):
router-a-001:
instances.yaml (Server-002):
storage-a-001:
storage-b-001:
instances.yaml (Server-003):
storage-a-002:
storage-b-002:
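If you deploy the same archive to several machines, a small script can produce the per-machine instances.yaml files. The sketch below is only an illustration. The deploy/server-00N directory names are hypothetical, and placing router-a-001 on Server-001 is an assumption based on the five instances listed above.

```shell
#!/bin/sh
# Sketch: generate one instances.yaml per machine.
# The deploy/server-00N paths are hypothetical; placing router-a-001
# on server-001 is an assumption.
set -eu

# write_instances DIR NAME...: write "NAME:" lines into DIR/instances.yaml
write_instances() {
  dir=$1
  shift
  mkdir -p "$dir"
  : > "$dir/instances.yaml"   # create or truncate the file
  for name in "$@"; do
    # Each instance is a top-level YAML key with an empty value.
    printf '%s:\n' "$name" >> "$dir/instances.yaml"
  done
}

write_instances deploy/server-001 router-a-001
write_instances deploy/server-002 storage-a-001 storage-b-001
write_instances deploy/server-003 storage-a-002 storage-b-002
```

Copy each generated file into the application directory on the matching machine, next to config.yaml.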
The Starting and stopping instances section describes how to start and stop Tarantool instances.
Tarantool applications installed from DEB and RPM packages built with tt pack
can run as systemd services. They run on behalf of the tarantool system user,
which is created automatically during the package installation.
By default, the application artifacts are placed in the following directories:
/var/lib/tarantool/sys_env – application data
/var/log/tarantool/sys_env – logs
/var/run/tarantool/sys_env – runtime artifacts
If you want to change these directories, make sure that the tarantool user
has enough permissions on the directories you use.
Starting and stopping instances
Note
To run instances in production, it is recommended to use the Ansible Tarantool Enterprise installer (ATE).
ATE is a set of Ansible playbooks that are used to deploy and maintain Tarantool Enterprise products.
ATE documentation is available to users logged in on the Tarantool website.
This section describes how to manage instances in a Tarantool cluster using the tt utility.
A cluster can include multiple instances that run different code.
A typical example is a cluster application that includes router and storage instances.
In particular, you can perform the following actions:
- start all instances in a cluster or only specific ones
- check the status of instances
- connect to a specific instance
- stop all instances or only specific ones
To get more context on how the application’s environment might look, refer to Application environment.
Note
In this section, a sharded_cluster_crud application is used to demonstrate how to start, stop, and manage instances in a cluster.
Basic instance management
Most of the commands described in this section can be called with or without an instance name.
Without the instance name, they are executed for all instances defined in instances.yaml.
Checking an instance’s status
To check the status of instances, execute tt status:
$ tt status sharded_cluster_crud
INSTANCE STATUS PID MODE CONFIG BOX UPSTREAM
sharded_cluster_crud:router-a-001 RUNNING 8382 RW ready running --
sharded_cluster_crud:storage-a-001 RUNNING 8386 RW ready running --
sharded_cluster_crud:storage-a-002 RUNNING 8390 RO ready running --
sharded_cluster_crud:storage-b-001 RUNNING 8379 RW ready running --
sharded_cluster_crud:storage-b-002 RUNNING 8380 RO ready running --
To check the status of a specific instance, you need to specify its name:
$ tt status sharded_cluster_crud:storage-a-001
INSTANCE STATUS PID MODE CONFIG BOX UPSTREAM
sharded_cluster_crud:storage-a-001 RUNNING 8386 RW ready running --
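When you script around tt status, you may want to list only the instances that are not running. The sketch below assumes the column layout shown above: a header line, the instance name in the first column, and the status in the second. The not_running helper is hypothetical, and the output format may change between tt versions, so treat this as an illustration.

```shell
#!/bin/sh
# Sketch: print instances whose STATUS column is not RUNNING.
# Assumes the tt status table layout shown above; not_running is a
# hypothetical helper name, not part of tt.
not_running() {
  # Skip the header line, then compare the second column.
  awk 'NR > 1 && $2 != "RUNNING" { print $1 }'
}

# Usage (requires tt and the sharded_cluster_crud environment):
#   tt status sharded_cluster_crud | not_running
```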
Connecting to an instance
To connect to the instance, use the tt connect command:
$ tt connect sharded_cluster_crud:storage-a-001
• Connecting to the instance...
• Connected to sharded_cluster_crud:storage-a-001
sharded_cluster_crud:storage-a-001>
In the instance’s console, you can execute commands provided by the box module.
For example, box.info can be used to get various information about a running instance:
sharded_cluster_crud:storage-a-001> box.info
To restart an instance, use tt restart:
$ tt restart sharded_cluster_crud:storage-a-002
After executing tt restart, you need to confirm this operation:
Confirm restart of 'sharded_cluster_crud:storage-a-002' [y/n]: y
• The Instance sharded_cluster_crud:storage-a-002 (PID = 2026) has been terminated.
• Starting an instance [sharded_cluster_crud:storage-a-002]...
To stop a specific instance, use tt stop as follows:
$ tt stop sharded_cluster_crud:storage-a-002
You can also stop all the instances at once as follows:
$ tt stop sharded_cluster_crud
• The Instance sharded_cluster_crud:storage-b-001 (PID = 2020) has been terminated.
• The Instance sharded_cluster_crud:storage-b-002 (PID = 2021) has been terminated.
• The Instance sharded_cluster_crud:router-a-001 (PID = 2022) has been terminated.
• The Instance sharded_cluster_crud:storage-a-001 (PID = 2023) has been terminated.
• can't "stat" the PID file. Error: "stat /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/run/storage-a-002/tt.pid: no such file or directory"
Note
The error message indicates that storage-a-002 is not running because it was already stopped.
Removing instance artifacts
The tt clean command removes instance artifacts (such as logs or snapshots):
$ tt clean sharded_cluster_crud
• List of files to delete:
• /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/log/storage-a-001/tt.log
• /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/lib/storage-a-001/00000000000000001062.snap
• /home/testuser/myapp/instances.enabled/sharded_cluster_crud/var/lib/storage-a-001/00000000000000001062.xlog
• ...
Confirm [y/n]:
Enter y and press Enter to confirm the removal of artifacts for each instance.
Note
The -f option of the tt clean command can be used to remove the files without confirmation.
Preloading Lua scripts and modules
Tarantool supports loading and running chunks of Lua code before starting instances.
To load or run Lua code immediately upon Tarantool startup, specify the TT_PRELOAD
environment variable. Its value can be either a path to a Lua script or a Lua module name:
To run the Lua script preload_script.lua from the sharded_cluster_crud directory, set TT_PRELOAD as follows:
$ TT_PRELOAD=preload_script.lua tt start sharded_cluster_crud
Tarantool runs the preload_script.lua code, waits for it to complete, and
then starts instances.
To load the preload_module from the sharded_cluster_crud directory, set TT_PRELOAD as follows:
$ TT_PRELOAD=preload_module tt start sharded_cluster_crud
Note
TT_PRELOAD values that end with .lua are considered scripts,
so avoid module names with this ending.
To load several scripts or modules, pass them in a single quoted string, separated
by semicolons:
$ TT_PRELOAD="preload_script.lua;preload_module" tt start sharded_cluster_crud
If an error happens during the execution of the preload script or module, Tarantool
reports the problem and exits.
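The rules above can be illustrated with a small POSIX shell function. Entries are separated by semicolons, an entry ending in .lua is a script path, and anything else is a module name. This sketch only mirrors the documented behavior. It is not how tt or Tarantool parse TT_PRELOAD internally, and classify_preload is a hypothetical helper name.

```shell
#!/bin/sh
# Illustration only: applies the documented TT_PRELOAD rules to a value.
# classify_preload is a hypothetical helper, not part of tt.
# The function body runs in a subshell, so IFS and set -f do not leak.
classify_preload() (
  IFS=';'   # split the value on semicolons
  set -f    # disable globbing while splitting
  for entry in $1; do
    case $entry in
      *.lua) printf 'script: %s\n' "$entry" ;;  # treated as a Lua script
      *)     printf 'module: %s\n' "$entry" ;;  # treated as a module name
    esac
  done
)

classify_preload "preload_script.lua;preload_module"
# script: preload_script.lua
# module: preload_module
```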
Commands
Below is a list of tt commands. Run tt COMMAND help to see the detailed
help for the given command.
| Command | Description |
|---|---|
| binaries | Show a list of installed binaries and their versions |
| build | Build an application locally |
| cartridge | Manage a Cartridge application |
| cat | Print the contents of .snap or .xlog files into stdout |
| cfg | Manage a tt environment configuration |
| check | Check an application file for syntax errors |
| clean | Clean instance files |
| cluster | Manage a cluster’s configuration |
| completion | Generate completion for a specified shell |
| connect | Connect to a Tarantool instance |
| coredump | Manipulate Tarantool core dumps |
| create | Create an application from a template |
| crud | Interact with the CRUD module (Enterprise only) |
| download | Download the Tarantool Enterprise SDK |
| export | Export data to a file (Enterprise only) |
| help | Display help for tt or a specific command |
| import | Import data from a file (Enterprise only) |
| init | Create a new tt environment in the current directory |
| install | Install Tarantool or tt |
| instances | List enabled applications |
| kill | Terminate Tarantool applications or instances |
| log | Print instance logs |
| logrotate | Rotate instance logs |
| migrations | Manage migrations |
| pack | Package an application |
| play | Play the contents of .snap or .xlog files to another Tarantool instance |
| replicaset | Manage replica sets |
| restart | Restart Tarantool applications or instances |
| rocks | Use the LuaRocks package manager |
| run | Run Lua code in a Tarantool instance |
| search | Search available Tarantool and tt versions |
| start | Start Tarantool applications or instances |
| status | Get the current status of applications or instances |
| stop | Stop Tarantool applications or instances |
| tdg2 | Interact with Tarantool Data Grid 2 clusters |
| uninstall | Uninstall Tarantool or tt |
| version | Show the tt version information |
Managing binaries in the current environment
$ tt binaries COMMAND [COMMAND_OPTION ...]
tt binaries manages Tarantool and tt binaries installed in the current environment.
COMMAND is one of the following:
tt binaries list shows a list of installed binaries and their versions.
To show a list of installed Tarantool versions:
$ tt binaries list
List of installed binaries:
• tarantool:
3.1.0 [active]
2.11.2
• tt:
2.3.0
2.2.1 [active]
$ tt binaries switch [PROGRAM_NAME] [VERSION]
tt binaries switch switches binaries used in the current environment.
The possible values of PROGRAM_NAME are:
tarantool: Tarantool Community Edition.
tarantool-ee: Tarantool Enterprise Edition.
tt: the tt command-line utility.
When called without arguments, the command lets you choose the program and
version interactively:
$ tt binaries switch
Use the arrow keys to navigate: ↓ ↑ → ←
? Select program:
▸ tarantool
tarantool-ee
tt
You can also specify the program name and version in the call.
To view tt versions installed in the current environment and switch
between them:
$ tt binaries switch tt
Use the arrow keys to navigate: ↓ ↑ → ←
? Select version:
▸ 2.2.1
2.3.0 [active]
To switch to a specific Tarantool EE version installed in the current environment:
$ tt binaries switch tarantool-ee 3.1.0
Building an application
$ tt build [PATH] [--spec SPEC_FILE_PATH]
tt build builds a Tarantool application locally.
Options:
--spec SPEC_FILE_PATH: Path to a .rockspec file to use for the current build.
The PATH argument should contain the path to the application directory
(that is, to the build source). The default path is . (current directory).
The application directory must contain a .rockspec file to use for the build.
If there is more than one .rockspec file in the application directory, specify
the one to use in the --spec argument.
tt build builds an application with the tt rocks make command.
It downloads the application dependencies into the .rocks directory,
making the application ready to run locally.
Pre-build and post-build scripts
In addition to building the application with LuaRocks, tt build
can execute pre-build and post-build scripts. These scripts should
contain steps to execute right before and after building the application.
These files must be named tt.pre-build and tt.post-build respectively
and located in the application directory.
Note
For compatibility with Cartridge applications,
the pre-build and post-build scripts can also have names cartridge.pre-build
and cartridge.post-build.
tt.pre-build is helpful when your application depends on closed-source rocks,
or if the build should contain rocks from a project added as a submodule.
You can install these dependencies using the pre-build script before building.
Example:
#!/bin/sh
# The main purpose of this script is to build non-standard rocks modules.
# The script will run before `tt rocks make` during application build.
tt rocks make --chdir ./third_party/proj
tt.post-build is a script that runs after tt rocks make. The main purpose
of this script is to remove build artifacts from the final package. Example:
#!/bin/sh
# The main purpose of this script is to remove build artifacts from the resulting package.
# The script will run after `tt rocks make` during application build.
rm -rf third_party
rm -rf node_modules
rm -rf doc
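You can also create the hook files from the shell. The sketch below writes both scripts into a hypothetical app1 directory, which you can override with the APP_DIR variable (also an assumption of this sketch). It marks the hooks executable as a precaution, because the text above does not say whether tt requires the executable bit. Adding set -e makes each hook exit with a non-zero status at the first failing command.

```shell
#!/bin/sh
# Sketch: add pre-build and post-build hooks to an application directory.
# APP_DIR defaults to a hypothetical ./app1; adjust it to your application.
set -eu
APP_DIR=${APP_DIR:-./app1}
mkdir -p "$APP_DIR"

# Quoted 'EOF' keeps the hook bodies literal (no expansion here).
cat > "$APP_DIR/tt.pre-build" <<'EOF'
#!/bin/sh
# Runs before `tt rocks make`: build a rock from a submodule.
set -e
tt rocks make --chdir ./third_party/proj
EOF

cat > "$APP_DIR/tt.post-build" <<'EOF'
#!/bin/sh
# Runs after `tt rocks make`: remove build artifacts from the package.
set -e
rm -rf third_party node_modules doc
EOF

# Marking the hooks executable is a precaution in this sketch.
chmod +x "$APP_DIR/tt.pre-build" "$APP_DIR/tt.post-build"
```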
Build the application app1 from its directory:
$ tt build
Build the application app1 from the simple_app directory inside the current directory:
$ tt build simple_app
Build the application app1 from its directory explicitly specifying the rockspec file to use:
$ tt build --spec app1-scm-1.rockspec
Managing a Cartridge application
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later. This command is added for backward compatibility with
earlier versions.
$ tt cartridge COMMAND {[OPTION ...]|SUBCOMMAND}
tt cartridge manages a Cartridge application.
COMMAND is one of the following:
$ tt cartridge admin ADMIN_FUNC_NAME [ADMIN_OPTION ...]
tt cartridge admin calls admin functions provided by the application.
Options:
--name STRING: (Required) An application name.
-l, --list: List the available admin functions.
--instance STRING: A name of the instance to connect to.
--conn STRING: An address to connect to.
--run-dir STRING: A directory where PID and socket files are stored. Defaults to /var/run/tarantool.
Get a list of the available admin functions:
$ tt cartridge admin --name APPNAME --list
• Available admin functions:
probe Probe instance
Get help for a specific function:
$ tt cartridge admin --name APPNAME probe --help
• Admin function "probe" usage:
Probe instance
Args:
--uri string Instance URI
Call a function with an argument:
$ tt cartridge admin --name APPNAME probe --uri localhost:3301
• Probe "localhost:3301": OK
$ tt cartridge bench [BENCH_OPTION ...]
tt cartridge bench runs benchmarks for Tarantool.
Options:
--url STRING: A Tarantool instance address (the default is 127.0.0.1:3301).
--user STRING: A username used to connect to the instance (the default is guest).
--password STRING: A password used to connect to the instance.
--connections INT: A number of concurrent connections (the default is 10).
--requests INT: A number of simultaneous requests per connection (the default is 10).
--duration INT: The duration of a benchmark test in seconds (the default is 10).
--keysize INT: The size of a key part of benchmark data in bytes (the default is 10).
--datasize INT: The size of a value part of benchmark data in bytes (the default is 20).
--insert INT: A percentage of inserts (the default is 100).
--select INT: A percentage of selects.
--update INT: A percentage of updates.
--fill INT: A number of records to pre-fill the space (the default is 1000000).
$ tt cartridge failover COMMAND [COMMAND_OPTION ...]
tt cartridge failover manages an application failover.
The following commands are available:
$ tt cartridge failover set MODE [FAILOVER_SET_OPTION ...]
Set up failover in the specified mode: stateful, eventual, or disabled.
Options:
--state-provider STRING: A failover’s state provider. Can be stateboard or etcd2. Used only in the stateful mode.
--params STRING: Failover parameters specified in a JSON-formatted string, for example, '{"fencing_timeout": 10, "fencing_enabled": true}'.
--provider-params STRING: Failover provider parameters specified in a JSON-formatted string, for example, '{"lock_delay": 14}'.
$ tt cartridge failover setup --file STRING
Set up failover with parameters described in a file.
The failover configuration file defaults to failover.yml.
The failover.yml file might look as follows:
mode: stateful
state_provider: stateboard
stateboard_params:
  uri: localhost:4401
  password: passwd
failover_timeout: 15
$ tt cartridge failover status
Get the current failover status.
$ tt cartridge failover disable
Disable failover.
Options:
--name STRING: An application name. Defaults to the package name specified in the rockspec.
--file STRING: A path to the file containing failover settings. Defaults to failover.yml.
$ tt cartridge repair COMMAND [REPAIR_OPTION ...]
tt cartridge repair repairs a running application.
The following commands are available:
list-topology
remove-instance
set-advertise-uri
set-leader
$ tt cartridge repair list-topology [REPAIR_OPTION ...]
Get a summary of the current cluster topology.
$ tt cartridge repair remove-instance UUID [REPAIR_OPTION ...]
Remove the instance with the specified UUID from the cluster. If the instance isn’t found, raise an error.
$ tt cartridge repair set-advertise-uri INSTANCE-UUID NEW-URI [REPAIR_OPTION ...]
Change the instance’s advertise URI. Raise an error if the instance isn’t found or is expelled.
$ tt cartridge repair set-leader REPLICASET-UUID INSTANCE-UUID [REPAIR_OPTION ...]
Set the instance as the leader of the replica set. Raise an error in the following cases:
- There is no replica set or instance with that UUID.
- The instance doesn’t belong to the replica set.
- The instance has been disabled or expelled.
The following options work with any repair subcommand:
--name: (Required) An application name.
--data-dir: The directory containing the instances’ working directories. Defaults to /var/lib/tarantool.
The following options work with any repair command, except list-topology:
--run-dir: The directory where PID and socket files are stored. Defaults to /var/run/tarantool.
--dry-run: Launch in dry-run mode: show changes but do not apply them.
--reload: Reload the instance configuration after the patch.
$ tt cartridge replicasets COMMAND [COMMAND_OPTION ...]
tt cartridge replicasets manages an application’s replica sets.
The following commands are available:
setup
save
list
join
list-roles
list-vshard-groups
add-roles
remove-roles
set-weight
set-failover-priority
bootstrap-vshard
expel
$ tt cartridge replicasets setup [--file FILEPATH] [--bootstrap-vshard]
Set up replica sets using a file.
Options:
--file: A file with a replica set configuration. Defaults to replicasets.yml.
--bootstrap-vshard: Bootstrap vshard upon setup.
$ tt cartridge replicasets save [--file FILEPATH]
Save the current replica set configuration to a file.
Options:
--file: A file to save the configuration to. Defaults to replicasets.yml.
$ tt cartridge replicasets list [--replicaset STRING]
List the current cluster topology.
Options:
--replicaset STRING: A replica set name.
$ tt cartridge replicasets join INSTANCE_NAME ... [--replicaset STRING]
Join the instance to a cluster.
If a replica set with the specified alias isn’t found in the cluster, it is created.
Otherwise, instances are joined to an existing replica set.
Options:
--replicaset STRING: A replica set name.
$ tt cartridge replicasets list-roles
List the available roles.
$ tt cartridge replicasets list-vshard-groups
List the available vshard groups.
$ tt cartridge replicasets add-roles ROLE_NAME ... [--replicaset STRING] [--vshard-group STRING]
Add roles to the replica set.
Options:
--replicaset STRING: A replica set name.
--vshard-group STRING: A vshard group for vshard-storage replica sets.
$ tt cartridge replicasets remove-roles ROLE_NAME ... [--replicaset STRING]
Remove roles from the replica set.
Options:
--replicaset STRING: A replica set name.
$ tt cartridge replicasets set-weight WEIGHT [--replicaset STRING]
Set the replica set weight.
Options:
--replicaset STRING: A replica set name.
$ tt cartridge replicasets set-failover-priority INSTANCE_NAME ... [--replicaset STRING]
Configure replica set failover priority.
Options:
--replicaset STRING: A replica set name.
$ tt cartridge replicasets bootstrap-vshard
Bootstrap vshard.
$ tt cartridge replicasets expel INSTANCE_NAME ...
Expel one or more instances from the cluster.
Printing the contents of .snap and .xlog files
$ tt cat FILE ... [OPTION ...]
tt cat prints the contents of snapshot (.snap) and
WAL (.xlog) files to stdout. A single call of tt cat can
print the contents of multiple files.
Options:
--format FORMAT: Output format: yaml (default), json, or lua.
--from LSN: Show operations starting from the given LSN.
--to LSN: Show operations up to the given LSN. Default: 18446744073709551615.
--replica ID: Filter the output by replica ID. Can be passed more than once.
When calling tt cat with filters by LSN (--from and --to flags) and
replica ID (--replica), remember that LSNs differ across replicas.
Thus, if you pass more than one replica ID via --replica together with --from or --to,
the result may not reflect the actual sequence of operations.
--space ID: Filter the output by space ID. Can be passed more than once.
--show-system: Show the contents of system spaces.
Output the contents of the 00000000000000000000.xlog WAL file in the YAML format:
$ tt cat 00000000000000000000.xlog
Output operations on spaces with space_id 512 and 513 from the
00000000000000000012.snap snapshot file in the JSON format:
$ tt cat 00000000000000000012.snap --space 512 --space 513 --format json
Output operations on all spaces, including system spaces,
from the 00000000000000000000.xlog WAL file:
$ tt cat 00000000000000000000.xlog --show-system
Output operations with LSNs between 100 and 500 on replica 1
from the 00000000000000000000.xlog WAL file:
$ tt cat 00000000000000000000.xlog --from 100 --to 500 --replica 1
Environment configuration
$ tt cfg COMMAND [OPTION ...]
tt cfg manages a tt environment configuration.
dump: Print a tt environment configuration.
Options:
-r, --raw: Print the raw content of the tt.yaml configuration file.
Print the current tt environment configuration:
$ tt cfg dump
Checking an application file
$ tt check {FILEPATH | APPLICATION[:APP_INSTANCE]}
tt check checks the syntax correctness of Lua files within Tarantool applications
or separate Lua scripts. The files must be stored inside the instances_enabled
directory specified in the tt configuration file.
To check all Lua files in an application directory at once, specify the directory name:
$ tt check app
To check a single Lua file from an application directory, add the path to this file:
$ tt check app/router
# or
$ tt check app/router.lua
Note
The .lua extension can be omitted.
Cleaning instance files
$ tt clean APPLICATION[:APP_INSTANCE] [OPTION ...]
tt clean cleans stored files of Tarantool instances: logs, snapshots, and
other files. To avoid accidental deletion of files, tt clean shows
the files it is going to delete and asks for confirmation.
When called without arguments, cleans files of all applications in the current environment.
Options:
-f, --force: Clean files without confirmation.
Managing cluster configurations
$ tt cluster COMMAND [COMMAND_OPTION ...]
tt cluster manages configurations of Tarantool applications.
This command works both with local YAML files in application directories
and with centralized configuration storages (etcd or Tarantool-based).
COMMAND is one of the following:
$ tt cluster publish {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [FILE] [OPTION ...]
tt cluster publish publishes a cluster configuration using an arbitrary YAML file as a source.
Publishing local configurations
tt cluster publish can modify local cluster configurations stored in
config.yaml files inside application directories.
To write a configuration to a local config.yaml, run tt cluster publish
with two arguments:
- the application name.
- the path to a YAML file from which the configuration should be taken.
$ tt cluster publish myapp source.yaml
Publishing configurations in centralized storages
tt cluster publish can modify centralized cluster configurations
in storages of both supported types: etcd or a Tarantool-based configuration storage.
To publish a configuration from a file to a centralized configuration storage,
run tt cluster publish with a URI of this storage’s
instance as the target. For example, the command below publishes a configuration from source.yaml
to a local etcd instance running on the default port 2379:
$ tt cluster publish "http://localhost:2379/myapp" source.yaml
A URI must include a prefix that is unique for the application. It can also include
credentials and other connection parameters. Find the detailed description of the
URI format in URI format.
Publishing configurations of specific instances
In addition to whole cluster configurations, tt cluster publish can manage
configurations of specific instances within applications: rewrite configurations
of existing instances and add new instance configurations.
In this case, it operates with YAML fragments that describe a single instance configuration section.
For example, the following YAML file can be a source when publishing an instance configuration:
# instance_source.yaml
iproto:
  listen:
  - uri: 127.0.0.1:3311
To send an instance configuration to a local config.yaml, run tt cluster publish
with the application:instance pair as the target argument:
$ tt cluster publish myapp:instance-002 instance_source.yaml
To send an instance configuration to a centralized configuration storage, specify
the instance name in the name argument of the storage URI:
$ tt cluster publish "http://localhost:2379/myapp?name=instance-002" instance_source.yaml
If the instance already exists, this call overwrites its configuration with the one
from the file.
To add a new instance configuration from a YAML fragment, specify the name to assign to
the new instance and its location in the cluster topology – replica set and group –
in the --replicaset and --group options.
Note
The --group option can be omitted if the configuration contains only one group.
To add a new instance instance-003 to the replicaset-001 replica set:
$ tt cluster publish "http://localhost:2379/myapp?name=instance-003" instance_source.yaml --replicaset replicaset-001
tt cluster publish validates configurations against the Tarantool configuration schema
and aborts in case of an error. To skip the validation, add the --force option:
$ tt cluster publish myapp source.yaml --force
Publishing configurations with integrity check
Enterprise Edition
The integrity check functionality is supported by the Enterprise Edition only.
When called with the --with-integrity-check option, tt cluster publish
generates a checksum of the configurations it publishes. It signs the checksum using
the private key passed as the option argument, and writes it into the configuration store.
$ tt cluster publish "http://localhost:2379/myapp" source.yaml --with-integrity-check private.pem
If an application configuration is published this way, it can be checked for integrity
using the --integrity-check global option.
$ tt --integrity-check public.pem cluster show myapp
$ tt --integrity-check public.pem start myapp
Learn more about integrity checks upon application startup and in runtime in the tt start reference.
To ensure the configuration integrity when updating it, call tt cluster publish
with two options:
- --integrity-check PUBLIC_KEY: a global option that checks that the configuration wasn’t changed since it was published.
- --with-integrity-check PRIVATE_KEY: generates a new hash and signature for future integrity checks of the updated configuration.
$ tt --integrity-check public.pem cluster publish \
--with-integrity-check private.pem \
"http://localhost:2379/myapp" source.yaml
$ tt cluster show {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [OPTION ...]
tt cluster show displays a cluster configuration.
Displaying local configurations
tt cluster show can read local cluster configurations stored in config.yaml
files inside application directories.
To print a local configuration from an application’s config.yaml, specify the
application name as an argument:
$ tt cluster show myapp
Displaying configurations from centralized storages
tt cluster show can display centralized cluster configurations
from configuration storages of both supported types: etcd or a Tarantool-based configuration storage.
To print a cluster configuration from a centralized storage, run tt cluster show
with a storage URI including the prefix identifying the application. For example, to print
myapp’s configuration from a local etcd storage:
$ tt cluster show "http://localhost:2379/myapp"
Displaying configurations of specific instances
In addition to whole cluster configurations, tt cluster show can display
configurations of specific instances within applications. In this case, it prints
YAML fragments that describe a single instance configuration section.
To print an instance configuration from a local config.yaml, use the application:instance
argument:
$ tt cluster show myapp:instance-002
To print an instance configuration from a centralized configuration storage, specify
the instance name in the name argument of the URI:
$ tt cluster show "http://localhost:2379/myapp?name=instance-002"
To validate configurations when printing them with tt cluster show, enable the
validation by adding the --validate option:
$ tt cluster show "http://localhost:2379/myapp" --validate
$ tt cluster replicaset SUBCOMMAND {APPLICATION[:APP_INSTANCE] | CONFIG_URI} [OPTION ...]
tt cluster replicaset manages instances in a replica set. It supports the following
subcommands: promote, demote, expel, and roles.
Important
tt cluster replicaset works only with centralized cluster configurations.
To manage replica sets in clusters with local YAML configurations,
use tt replicaset.
$ tt cluster replicaset demote CONFIG_URI INSTANCE_NAME [OPTION ...]
tt cluster replicaset demote demotes an instance in a replica set.
This command works on Tarantool clusters with centralized configuration and
with failover mode off.
Note
In clusters with manual failover mode, you can demote a read-write instance
by promoting a read-only instance from the same replica set with tt cluster replicaset promote.
The command sets the instance’s database.mode
to ro and reloads the configuration.
Important
If failover is off, the command doesn’t consider the modes of other
replica set members, so there can be any number of read-write instances in one replica set.
$ tt cluster replicaset expel CONFIG_URI INSTANCE_NAME [OPTION ...]
tt cluster replicaset expel expels an instance from the cluster. Example:
$ tt cluster replicaset expel "http://localhost:2379" storage-b-002
$ tt cluster replicaset roles [add|remove] CONFIG_URI ROLE_NAME [OPTION ...]
tt cluster replicaset roles manages application roles
in the configuration scope specified in the command options. It has two subcommands:
add adds a role
remove removes a role
Use the --global, --group, --replicaset, --instance options to specify
the configuration scope to add or remove roles. For example, to add a role to
all instances in a replica set:
$ tt cluster replicaset roles add "http://localhost:2379" roles.my-role --replicaset storage-a
To remove a role defined in the global configuration scope:
$ tt cluster replicaset roles remove "http://localhost:2379" roles.my-role --global
The changes that tt cluster replicaset makes to the configuration storage
occur transactionally. Each call creates a new revision. In case of a revision mismatch,
an error is raised.
If the cluster configuration is distributed over multiple keys in the configuration
storage (for example, in two paths /myapp/config/k1 and /myapp/config/k2),
the affected instance configuration can be present in more than one of them.
If it is found under several different keys, the command prompts the user to choose
a key for patching. You can skip the selection by adding the -f/--force option:
$ tt cluster replicaset promote "http://localhost:2379/myapp" storage-001-a --force
In this case, the command selects the key for patching automatically. A key’s priority
is determined by the detail level of the instance or replica set configuration stored
under this key. For example, when failover is off, a key with
instance.database options takes precedence over a key that contains only the instance field.
In case of equal priority, the first key in the lexicographical order is patched.
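The final tie-breaking rule can be illustrated with standard tools. This sketch only demonstrates lexicographical ordering of key paths. The byte-wise C locale is an assumption about how the comparison is done. The sketch does not reproduce how tt computes key priority, and first_key is a hypothetical helper.

```shell
#!/bin/sh
# Illustration of the tie-break rule only, not tt's implementation:
# among keys of equal priority, pick the first in lexicographical order.
first_key() {
  # LC_ALL=C makes sort compare strings byte by byte.
  printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

first_key /myapp/config/k2 /myapp/config/k1   # prints /myapp/config/k1
```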
$ tt cluster failover SUBCOMMAND [OPTION ...]
tt cluster failover manages a supervised failover in Tarantool clusters.
Important
tt cluster failover works only with centralized cluster configurations stored in etcd.
$ tt cluster failover switch CONFIG_URI INSTANCE_NAME [OPTION ...]
tt cluster failover switch appoints the specified instance to be a master.
This command accepts the following arguments and options:
CONFIG_URI: A URI of the cluster configuration storage.
INSTANCE_NAME: An instance name.
[OPTION ...]: Options to pass to the command.
In the example below, tt cluster failover switch appoints storage-a-002 to be a master:
$ tt cluster failover switch http://localhost:2379/myapp storage-a-002
To check the switching status, run:
$ tt cluster failover switch-status http://localhost:2379/myapp b1e938dd-2867-46ab-acc4-3232c2ef7ffe
Note that the command output includes an identifier of the task responsible for switching a master.
You can use this identifier to see the status of switching a master instance using tt cluster failover switch-status.
$ tt cluster failover switch-status CONFIG_URI TASK_ID
tt cluster failover switch-status shows the status of switching a master instance.
This command accepts the following arguments:
CONFIG_URI: A URI of the cluster configuration storage.
TASK_ID: An identifier of the task used to switch a master instance. You can find the task identifier in the tt cluster failover switch command output.
Example:
$ tt cluster failover switch-status http://localhost:2379/myapp b1e938dd-2867-46ab-acc4-3232c2ef7ffe
There are three ways to pass the credentials for connecting to the centralized configuration storage.
They all apply to both etcd and Tarantool-based storages. The following list
shows these ways ordered by precedence, from highest to lowest:
Credentials specified in the storage URI: https://username:password@host:port/prefix:
$ tt cluster show 'http://myuser:p4$$w0rD@localhost:2379/myapp'
tt cluster options -u/--username and -p/--password:
$ tt cluster show "http://localhost:2379/myapp" -u myuser -p 'p4$$w0rD'
Environment variables TT_CLI_ETCD_USERNAME and TT_CLI_ETCD_PASSWORD:
$ export TT_CLI_ETCD_USERNAME=myuser
$ export TT_CLI_ETCD_PASSWORD='p4$$w0rD'
$ tt cluster show "http://localhost:2379/myapp"
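The precedence order can be modeled with a small bash function. This is a hypothetical sketch (the function resolve_username is made up), and it resolves only the username; the password follows the same order. It assumes the password in the URI contains no / character.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the username precedence for configuration storage
# credentials: URI > -u/--username > TT_CLI_ETCD_USERNAME.
# The password follows the same order (URI > -p/--password > TT_CLI_ETCD_PASSWORD).
# Assumption: the URI password contains no "/" character.
resolve_username() {
  local uri="$1" flag_user="$2"
  local rest="${uri#*://}"        # drop the scheme, e.g. "http://"
  local authority="${rest%%/*}"   # keep "user:pass@host:port"
  if [[ "$authority" == *@* ]]; then
    local userinfo="${authority%@*}"  # part before the last "@"
    echo "${userinfo%%:*}"            # username is before the first ":"
  elif [[ -n "$flag_user" ]]; then
    echo "$flag_user"
  else
    echo "${TT_CLI_ETCD_USERNAME:-}"
  fi
}

# The URI wins over the -u value (single quotes keep "$$" literal):
resolve_username 'http://myuser:p4$$w0rD@localhost:2379/myapp' otheruser
# prints myuser
```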
If connection encryption is enabled on the configuration storage, pass the required
SSL parameters in the URI arguments.
-
-u, --username STRING
A username for connecting to the configuration storage.
See also: Authentication.
-
-p, --password STRING
A password for connecting to the configuration storage.
See also: Authentication.
-
--force
Applicable to: publish, replicaset
publish: skip validation when publishing. Default: false (validation is enabled).
replicaset: skip the interactive key selection for patching. Learn more in the tt cluster replicaset details.
-
-G, --global
Applicable to: replicaset roles
Apply the operation to the global configuration scope, that is, to all instances.
-
-g, --group
Applicable to: publish, replicaset roles
A name of the configuration group to which the operation applies.
-
-i, --instance
Applicable to: replicaset roles
A name of the instance to which the operation applies.
-
-r, --replicaset
Applicable to: publish, replicaset roles
A name of the replica set to which the operation applies.
-
-t, --timeout UINT
Applicable to: failover
A timeout (in seconds) for executing a command. Default: 30.
-
--validate
Applicable to: show
Validate the printed configuration. Default: false (validation is disabled).
-
-w, --wait
Applicable to: failover
Wait while the command completes the execution. Default: false (don’t wait).
-
--with-integrity-check STRING
Applicable to: publish, replicaset
Generate hashes and signatures for integrity checks.
See also: Publishing configurations with integrity check
Generating completion for tt
tt completion generates tab-based completion for tt commands
in the specified shell: bash or zsh.
Generate tt completion for the current bash terminal:
$ . <(tt completion bash)
Note
You can add an execution of the completion script to a user’s .bashrc
file to make the completion work for this user in all their terminals.
Creating an application from a template
$ tt create TEMPLATE_NAME [OPTION ...]
tt create creates a new Tarantool application from a template.
Application templates speed up the development of Tarantool applications by
defining their initial structure and content. A template can include application
code, configuration, build scripts, and other resources.
tt comes with built-in templates for popular use cases. You can also create
custom templates for specific purposes.
There are the following built-in templates:
vshard_cluster: a sharded cluster application for Tarantool 3.0 or later.
single_instance: a single-instance application for Tarantool 3.0 or later.
cartridge: a Cartridge cluster application for Tarantool 2.x.
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later.
To create the app1 application in the /opt/tt/apps/ directory from the built-in
vshard_cluster template:
$ tt create vshard_cluster --name app1 --dst /opt/tt/apps/
The command requests cluster topology parameters, such as the number of shards
or routers, interactively during the execution.
To create the application in the /opt/tt/apps directory with default cluster
topology and force rewrite the application directory if it already exists:
$ tt create vshard_cluster --name app1 -f --non-interactive --dst /opt/tt/apps/
Creating custom application templates
tt searches for custom templates in the directories specified in the templates
section of its configuration file.
To create the application app1 from the simple_app custom template in the current directory:
$ tt create simple_app --name app1
Application templates are directories with files.
The main file of a template is its manifest. It defines how the applications
are instantiated from this template.
A template manifest is a YAML file named MANIFEST.yaml. It can contain the following sections:
description – the template description.
vars – template variables.
pre-hook and post-hook – paths to executables to run before and after the template
instantiation.
include – a list of files to keep in the application directory after
instantiation. If this section is omitted, the application will contain all template files
and directories.
All sections are optional.
Example:
description: Template description
vars:
  - prompt: User name
    name: user_name
    default: admin
    re: ^\w+$
  - prompt: Retry count
    default: "3"
    name: retry_count
    re: ^\d+$
pre-hook: ./hooks/pre-gen.sh
post-hook: ./hooks/post-gen.sh
include:
  - init.lua
  - instances.yml
Files and directories of a template are copied to the application directory
according to the include section of the manifest (or its absence).
Note
Don’t include the .rocks directory in application templates.
To specify application dependencies, use the .rockspec files.
There is a special file type *.tt.template. The content of such files is
adjusted for each application with the help of template variables.
During the instantiation, the variables in these files are replaced with provided
values and the *.tt.template extension is removed.
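As a rough model of what instantiation does to a *.tt.template file, the bash sketch below replaces {{.var_name}} placeholders with given values and drops the .tt.template suffix from the file name. It is a simplification under stated assumptions: tt uses Go text templates, which support much more than plain substitution, and the render_template function is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of instantiating one *.tt.template file.
# tt uses Go text/template; this simplified version only substitutes plain
# {{.name}} placeholders and is not how tt is implemented.
# Usage: render_template FILE.tt.template name=value [name=value ...]
render_template() {
  local src="$1"; shift
  local content pair name value placeholder
  content="$(<"$src")"
  for pair in "$@"; do
    name="${pair%%=*}"
    value="${pair#*=}"
    placeholder="{{.${name}}}"
    # Quoted pattern and replacement: both are taken literally.
    content=${content//"$placeholder"/"$value"}
  done
  local dst="${src%.tt.template}"   # config.yaml.tt.template -> config.yaml
  printf '%s\n' "$content" > "$dst"
  echo "$dst"
}
```

For example, render_template config.yaml.tt.template user_name=admin writes config.yaml with every {{.user_name}} replaced by admin and prints the new file name.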
Template variables are replaced with the values provided during instantiation.
All templates have the name variable. Its value is taken from the --name option.
To add other variables, define them in the vars section of the template manifest.
A variable can have the following attributes:
prompt: a line of text that prompts the user to enter the variable value in interactive mode. Required.
name: the variable name. Required.
default: the default value. Optional.
re: a regular expression that the value must match. Optional.
Example:
vars:
  - prompt: Cluster cookie
    name: cluster_cookie
    default: cookie
    re: ^\w+$
Variables can be used in all file names and in the content of *.tt.template files.
Note
Variables don’t work in directory names.
To use a variable, prefix its name with a period and enclose it in double curly braces:
{{.var_name}} (as in the Golang text templates
syntax).
Examples:
Variables receive their values during the template instantiation. By default, tt create
asks you to provide the values interactively. You can use the -s (or --non-interactive)
option to disable the interactive input. In this case, the values are searched in the following order:
In the --var option. Pass a string of the var=value format after the --var
option. You can pass multiple variables, each after a separate --var option:
$ tt create template --name app --var user_name=admin
In a file. Specify var=value pairs in a plain text file, each on a new line, and
pass it as the value of the --vars-file option:
$ tt create template --name app --vars-file variables.txt
variables.txt can look like this:
user_name=admin
password=p4$$w0rd
version=2
If a variable isn’t initialized in any of these ways, the default value
from the manifest is used.
You can combine different ways of passing variables in a single call of tt create.
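The lookup order for a single variable in non-interactive mode can be sketched in bash as follows. The resolve_var function and its argument layout are hypothetical; the sketch only models the precedence of --var, --vars-file, and the manifest default.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the non-interactive lookup order for one variable:
# --var pairs first, then var=value lines of the --vars-file file, then the
# manifest default. Not tt's actual code.
# Usage: resolve_var NAME VARS_FILE DEFAULT [name=value ...]
resolve_var() {
  local name="$1" vars_file="$2" default="$3"; shift 3
  local pair line
  # 1. Values passed with --var (remaining arguments).
  for pair in "$@"; do
    if [[ "${pair%%=*}" == "$name" ]]; then
      echo "${pair#*=}"; return
    fi
  done
  # 2. Values from the variables file, one var=value pair per line.
  if [[ -n "$vars_file" && -f "$vars_file" ]]; then
    while IFS= read -r line || [[ -n "$line" ]]; do
      if [[ "${line%%=*}" == "$name" ]]; then
        echo "${line#*=}"; return
      fi
    done < "$vars_file"
  fi
  # 3. Default value from the manifest.
  echo "$default"
}
```

Because each source is consulted only when the previous one has no value, mixing --var and --vars-file in one call works naturally: --var overrides the file, and the file overrides the default.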
By default, the application appears in the directory named after the provided
application name (--name value).
To change the application location, use the --dst option.
-
-d PATH, --dst PATH
Path to the directory where the application will be created.
-
-f, --force
Force rewrite the application directory if it already exists.
-
--name NAME
Application name.
-
-s, --non-interactive
Non-interactive mode.
-
--var [VAR=VALUE ...]
Variable definition. Usage: --var var_name=value.
-
--vars-file FILEPATH
Path to the file with variable definitions.
Interacting with the CRUD module
$ tt crud COMMAND [COMMAND_OPTION ...]
tt crud enables the interaction with a cluster using the CRUD module.
COMMAND is one of the following:
Adding external applications to environments
$ tt enable {APPLICATION|SCRIPT}
tt enable adds an external Tarantool application to the current environment
by creating a symlink to it in the instances.enabled directory.
To add the application located in /home/tt-user/external_app to the current
tt environment:
$ tt enable /home/tt-user/external_app
Once the application is added, you can work with it the same way as with applications
created in this environment.
Exporting data
$ tt [crud|tdg2] export URI SPACE:FILE ... [EXPORT_OPTION ...]
tt [crud|tdg2] export exports a space’s data to a file. Three export commands
cover the following cases:
tt [crud|tdg2] export takes the following arguments:
URI: The URI of a router instance if crud is used. Otherwise, it should specify the URI of a storage.
FILE: The name of a file for storing exported data.
SPACE: The name of a space from which data is exported.
Note
Read access to the space is required to export its data.
Exporting isn’t supported for the interval field type.
Exporting with default settings
The command below exports data of the customers space to the customers.csv file:
$ tt crud export localhost:3301 customers:customers.csv
If the customers space has five fields (id, bucket_id, firstname, lastname, and age), the file with exported data might look like this:
1,477,Andrew,Fuller,38
2,401,Michael,Suyama,46
3,2804,Robert,King,33
# ...
If a tuple contains a null value, for example, [1, 477, 'Andrew', null, 38], it is exported as an empty value:
1,477,Andrew,,38
In the CSV format, tt exports empty values by default for fields containing compound data such as arrays or maps.
To export compound values in a specific format, use the --compound-value-format option.
For example, the command below exports compound values to CSV serialized in JSON:
$ tt crud export localhost:3301 customers:customers.csv \
--compound-value-format json
When connecting to the cluster with enabled authentication, specify access credentials
in the --username and --password command options:
$ tt crud export localhost:3301 customers:customers.csv \
--username myuser --password 'p4$$w0rD'
To connect to instances that use SSL encryption,
provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options.
If necessary, add other SSL parameters in the --ssl* options.
$ tt crud export localhost:3301 customers:customers.csv \
--username myuser --password 'p4$$w0rD' \
--auth pap-sha256 --sslcertfile certs/server.crt \
--sslkeyfile certs/server.key
For connections that use SSL but don’t require additional parameters, add the --use-ssl
option:
$ tt crud export localhost:3301 customers:customers.csv \
--username myuser --password 'p4$$w0rD' \
--use-ssl
-
--auth STRING
Applicable to: tt crud export, tt tdg2 export
Authentication type: chap-sha1, pap-sha256, or auto.
-
--batch-queue-size INT
The maximum number of tuple batches in the queue between the fetch and write threads (the default is 32).
tt exports data using two threads:
- A fetch thread makes requests and receives data from a Tarantool instance.
- A write thread encodes received data and writes it to the output.
The fetch thread uses a queue to pass received tuple batches to the write thread.
If the queue is full, the fetch thread waits until the write thread takes a batch from the queue.
-
--batch-size INT
The number of tuples to transfer per request. The default is:
10000 for tt export and tt crud export.
100 for tt tdg2 export.
Important
When using tt tdg2 export, make sure that the batch size does not exceed
the hard-limits.returned TDG2 parameter value set on the cluster.
-
--compound-value-format STRING
Applicable to: tt export, tt crud export
A format used to export compound values like arrays or maps.
By default, tt exports empty values for fields containing such values.
Supported formats: json.
See also: Exporting compound data.
-
--header
Applicable to: tt export, tt crud export
Add field names in the first row.
See also: Exporting headers.
-
--password STRING
A password used to connect to the instance.
-
--readview
Applicable to: tt export, tt crud export
Export data using a read view.
-
--sslcafile STRING
Applicable to: tt crud export, tt tdg2 export
The path to a trusted certificate authorities (CA) file for encrypted connections.
See also Encrypted connection.
-
--sslcertfile STRING
Applicable to: tt crud export, tt tdg2 export
The path to an SSL certificate file for encrypted connections.
See also Encrypted connection.
-
--sslciphersfile STRING
Applicable to: tt crud export, tt tdg2 export
The list of SSL cipher suites used for encrypted connections, separated by colons (:).
See also Encrypted connection.
-
--sslkeyfile STRING
Applicable to: tt crud export, tt tdg2 export
The path to a private SSL key file for encrypted connections.
See also Encrypted connection.
-
--sslpassword STRING
Applicable to: tt crud export, tt tdg2 export
The password for the SSL key file for encrypted connections.
See also Encrypted connection.
-
--sslpasswordfile STRING
Applicable to: tt crud export, tt tdg2 export
A file with a list of passwords to the SSL key file for encrypted connections.
See also Authentication.
-
--token STRING
Applicable to: tt tdg2 export
An application token for connecting to TDG2.
-
--use-ssl
Use SSL without providing any additional SSL parameters.
See also Encrypted connection.
-
--username STRING
A username for connecting to the instance.
Displaying help for tt and its commands
tt help displays help:
- for the tt utility when called without a COMMAND.
- for a specified tt command.
Importing data
$ tt [crud|tdg2] import URI FILE:SPACE [IMPORT_OPTION ...]
# or
$ tt [crud|tdg2] import URI :SPACE < FILE [IMPORT_OPTION ...]
tt [crud|tdg2] import imports data from a file to a space. Three import commands
cover the following cases:
tt import imports data into a replica set through its master instance using the box.space API.
tt crud import imports data into a sharded cluster through a router using the CRUD module.
tt tdg2 import imports data into a Tarantool Data Grid 2 cluster
through its router using the repository.put function of the TDG2 Repository API.
tt [crud|tdg2] import takes the following arguments:
URI: The URI of a router instance if crud is used. Otherwise, it should specify the URI of a storage.
FILE: The name of a file containing data to be imported.
SPACE: The name of a space to which data is imported.
Note
Write access to the space and execute access to the universe are required to import data.
Importing isn’t supported for the interval field type.
Importing bucket_id into sharded clusters
When importing data into a CRUD-enabled sharded cluster, tt crud import ignores
the bucket_id field values from the input file. This allows CRUD to automatically
manage data distribution in the cluster by generating new bucket_id values for tuples
during import.
If you need to preserve the original bucket_id values, use the --keep-bucket-id option:
$ tt crud import localhost:3301 customers.csv:customers \
--keep-bucket-id \
--header \
--match=header
Handling duplicate primary key errors
The --on-exist option enables you to control data import when a duplicate primary key error occurs.
In the example below, values already existing in the space are replaced with new ones:
$ tt crud import localhost:3301 customers.csv:customers \
--on-exist replace
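To make the three --on-exist actions concrete, here is a hypothetical bash model in which an associative array plays the role of a space indexed by primary key. The import_row function is invented for illustration and does not reflect how tt communicates with Tarantool.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the three --on-exist actions. A bash associative
# array (bash 4+) stands in for a space indexed by primary key; this is an
# illustration, not how tt talks to Tarantool.
declare -A space=()

# import_row ACTION KEY VALUE
# Returns 1 when the import must stop, 0 otherwise.
import_row() {
  local action="$1" key="$2" value="$3"
  if [[ -n "${space[$key]+set}" ]]; then   # duplicate primary key
    case "$action" in
      stop)    return 1 ;;                  # abort the import
      skip)    return 0 ;;                  # keep the existing tuple
      replace) space[$key]="$value" ;;      # overwrite the existing tuple
    esac
  else
    space[$key]="$value"                    # new key: plain insert
  fi
  return 0
}
```

Rows with new primary keys are inserted regardless of the action; the action only matters when a key already exists.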
To skip rows whose data cannot be parsed correctly, use the --on-error option as follows:
$ tt crud import localhost:3301 customers.csv:customers \
--on-error skip
When connecting to the cluster with enabled authentication, specify access credentials
in the --username and --password command options:
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password 'p4$$w0rD'
To connect to instances that use SSL encryption,
provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options.
If necessary, add other SSL parameters in the --ssl* options.
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password 'p4$$w0rD' \
--auth pap-sha256 --sslcertfile certs/server.crt \
--sslkeyfile certs/server.key
For connections that use SSL but don’t require additional parameters, add the --use-ssl
option:
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password 'p4$$w0rD' \
--use-ssl
-
--auth STRING
Applicable to: tt crud import, tt tdg2 import
Authentication type: chap-sha1, pap-sha256, or auto.
-
--batch-size INT
Applicable to: tt crud import, tt tdg2 import
The number of tuples to transfer per request. The default is:
-
--dec-sep STRING
Applicable to: tt import, tt crud import
The string of symbols that defines decimal separators for numeric data (the default is .,).
Note
Symbols specified in this option cannot intersect with --th-sep.
-
--delimiter STRING
Applicable to: tt import, tt crud import
A symbol that defines a field value delimiter.
For CSV, the default delimiter is a comma (,).
To use a tab character as a delimiter, set this value as tab:
$ tt crud import localhost:3301 customers.csv:customers \
--delimiter tab
Note
A delimiter cannot be \r, \n, or the Unicode replacement character (U+FFFD).
-
--error STRING
The name of a file containing rows that are not imported (the default is error).
See also: Handling parsing errors.
-
--force
Applicable to: tt tdg2 import
Automatically confirm importing into TDG2 with --batch-size greater than one.
-
--format STRING
A format of input data.
Supported formats: csv.
-
--header
Applicable to: tt import, tt crud import
Process the first line as a header containing field names.
In this case, field values start from the second line.
See also: Matching of input and space fields.
-
--keep-bucket-id
Applicable to: tt crud import
Preserve original values of the bucket_id field.
See also: Importing bucket_id into sharded clusters.
-
--log STRING
The name of a log file containing information about import errors (the default is import).
If the log file already exists, new data is written to this file.
-
--match STRING
Applicable to: tt import, tt crud import
Configure matching between field names in the input file and the target space.
See also: Matching of input and space fields.
-
--null STRING
Applicable to: tt import, tt crud import
A value to be interpreted as null when importing data.
By default, an empty value is interpreted as null.
For example, the following row:
1,477,Andrew,,38
is imported as the tuple [1, 477, 'Andrew', null, 38].
-
--on-error STRING
An action performed if a row to be imported cannot be parsed correctly.
Possible values:
stop: stop importing data.
skip: skip rows whose data cannot be parsed correctly.
Duplicate primary key errors are handled using the --on-exist option.
See also: Handling parsing errors.
-
--on-exist STRING
An action performed if a duplicate primary key error occurs.
Possible values:
stop: stop importing data.
skip: skip existing values when importing.
replace: replace existing values when importing.
Other errors are handled using the --on-error option.
See also: Handling duplicate primary key errors.
-
--password STRING
A password used to connect to the instance.
-
--progress STRING
The name of a progress file that stores the following information:
- The positions of lines that were not imported at the last launch.
- The last position that was processed at the last launch.
If a file with the specified name exists, it is taken into account when importing data.
tt import tries to insert lines that were not imported and then continues importing from the last position.
At each launch, the content of a progress file with the specified name is overwritten.
If the file with the specified name does not exist, a progress file is created with the results of this run.
Note
If the option is not set, then this mechanism is not used.
-
--quote STRING
Applicable to: tt import, tt crud import
A symbol that defines a quote.
For CSV, double quotes are used by default (").
Doubling this symbol escapes it within input data. For example, with the default quote, "" inside a quoted field stands for a single ".
-
--rollback-on-error
Applicable to: tt crud import
Specify whether a failed operation on a storage rolls back the batch import
on this storage.
Note
tt tdg2 import always works as if --rollback-on-error is true.
-
--sslcafile STRING
Applicable to: tt crud import, tt tdg2 import
The path to a trusted certificate authorities (CA) file for encrypted connections.
See also Encrypted connection.
-
--sslcertfile STRING
Applicable to: tt crud import, tt tdg2 import
The path to an SSL certificate file for encrypted connections.
See also Encrypted connection.
-
--sslciphersfile STRING
Applicable to: tt crud import, tt tdg2 import
The list of SSL cipher suites used for encrypted connections, separated by colons (:).
See also Encrypted connection.
-
--sslkeyfile STRING
Applicable to: tt crud import, tt tdg2 import
The path to a private SSL key file for encrypted connections.
See also Encrypted connection.
-
--sslpassword STRING
Applicable to: tt crud import, tt tdg2 import
The password for the SSL key file for encrypted connections.
See also Encrypted connection.
-
--sslpasswordfile STRING
Applicable to: tt crud import, tt tdg2 import
A file with a list of passwords to the SSL key file for encrypted connections.
See also Authentication.
-
--success STRING
The name of a file with rows that were imported (the default is success).
Overwrites the file if it already exists.
-
--th-sep STRING
Applicable to: tt import, tt crud import
The string of symbols that define thousand separators for numeric data.
The default value includes a space and a backtick `.
This means that 1 000 000 and 1`000`000 are both imported as 1000000.
Note
Symbols specified in this option cannot intersect with --dec-sep.
-
--token STRING
Applicable to: tt tdg2 import
An application token for connecting to TDG2.
-
--use-ssl
Use SSL without providing any additional SSL parameters.
See also Encrypted connection.
-
--username STRING
A username for connecting to the instance.
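The effect of the default --th-sep and --dec-sep values on a numeric field can be sketched in bash. The functions normalize_number and seps_intersect are hypothetical and only model the rules described in the option list above; they are not tt's parser. Note that with the defaults a comma is a decimal separator, so 3,14 becomes 3.14.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of how the default --th-sep and --dec-sep values
# affect a numeric field: thousand separators are dropped and any decimal
# separator becomes ".". Not tt's actual parser.
DEFAULT_TH_SEP=' `'   # a space and a backtick
DEFAULT_DEC_SEP='.,'

# seps_intersect A B: succeeds if A and B share at least one symbol,
# which is not allowed for --th-sep and --dec-sep.
seps_intersect() {
  local a="$1" b="$2" i
  for (( i = 0; i < ${#a}; i++ )); do
    [[ "$b" == *"${a:i:1}"* ]] && return 0
  done
  return 1
}

# normalize_number VALUE [TH_SEP] [DEC_SEP]
normalize_number() {
  local value="$1" th_sep="${2-$DEFAULT_TH_SEP}" dec_sep="${3-$DEFAULT_DEC_SEP}"
  local out="" ch i
  for (( i = 0; i < ${#value}; i++ )); do
    ch="${value:i:1}"
    if [[ "$th_sep" == *"$ch"* ]]; then
      continue          # thousand separator: drop it
    elif [[ "$dec_sep" == *"$ch"* ]]; then
      out+="."          # decimal separator: normalize to "."
    else
      out+="$ch"
    fi
  done
  echo "$out"
}
```

The intersection check explains why the two options must not share symbols: a shared symbol would be ambiguous between "drop it" and "treat it as the decimal point".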
Creating a tt environment
tt init creates a tt environment in the current directory. This includes:
- Setting up directories for working files: binaries, templates, and so on.
- Creating a corresponding tt.yaml configuration file.
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later. This command is added for backward compatibility with
earlier versions.
tt init checks the existence of configuration files for Cartridge (cartridge.yml)
or the tarantoolctl utility (.tarantoolctl) in the current directory.
If such files are found, tt generates an environment that uses the same
directories:
If there are no cartridge.yml or .tarantoolctl files in the current directory,
tt init creates a default environment in it. This includes creating the
following directories and files:
bin – the directory for storing binary files.
include – the directory for storing header files.
distfiles – the directory for storing installation files.
instances.enabled – the directory for storing running applications or symlinks.
modules – the directory for storing external modules.
tt.yaml – the configuration file.
templates – the directory for storing application templates.
Create a tt environment in the current directory:
$ tt init
Listing enabled applications
tt instances shows the list of enabled applications and their instances
in the current environment.
Note
Enabled applications are applications that are stored inside the instances_enabled
directory specified in the tt configuration file.
They can be either running or not. To check if an application is running,
use tt status.
Rotating instance logs
$ tt logrotate APPLICATION[:APP_INSTANCE]
tt logrotate rotates logs of a Tarantool application or specific instances,
and the tt log. For example, call this command to continue logging
after a log rotation program renames or moves instances’ logs.
Learn more about rotating logs.
Calling tt logrotate on an application has the same effect as executing the
built-in log.rotate() function on all its instances.
Rotate logs of the app application’s instances:
$ tt logrotate app
Managing centralized migrations
$ tt migrations COMMAND [COMMAND_OPTION ...]
tt migrations manages centralized migrations
in a Tarantool EE cluster. See Centralized migrations with tt for a detailed guide
on using the centralized migrations mechanism.
Important
Only Tarantool EE clusters with etcd centralized configuration storage are supported.
COMMAND is one of the following:
$ tt migrations publish ETCD_URI [MIGRATIONS_DIR | MIGRATION_FILE] [OPTION ...]
tt migrations publish sends the migration files to the cluster’s centralized
configuration storage for future execution.
By default, the command sends all files stored in migrations/ inside the current
directory.
$ tt migrations publish "https://user:pass@localhost:2379/myapp"
To select another directory with migration files, provide a path to it as the command
argument:
$ tt migrations publish "https://user:pass@localhost:2379/myapp" my_migrations
To publish a single migration from a file, use its name or path as the command argument:
$ tt migrations publish "https://user:pass@localhost:2379/myapp" migrations/000001_create_space.lua
Optionally, you can provide a key to use as a migration identifier instead of the filename:
$ tt migrations publish "https://user:pass@localhost:2379/myapp" file.lua \
--key=000001_create_space.lua
When publishing migrations, tt performs checks for:
- Syntax errors in migration files. To skip the syntax check, add the
--skip-syntax-check option.
- Existence of migrations with the same names. To overwrite an existing migration
with the same name, add the --overwrite option.
- Migration name order. By default, tt migrations only adds new migrations
to the end of the migrations list ordered lexicographically. For example, if
migrations 001.lua and 003.lua are already published, an attempt to publish
002.lua fails. To force publishing migrations regardless of the order,
add the --ignore-order-violation option.
Warning
Using the options that ignore checks when publishing migrations may cause
migration inconsistency in the cluster.
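The migration name order check can be modeled as a lexicographic comparison against the last published name. The can_publish function below is a hypothetical bash sketch, not tt's code; it uses byte-wise (C locale) ordering as an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the ordering check performed on publish:
# a new migration name must sort after every already published name
# (byte-wise lexicographic order). Not tt's actual code.
# Usage: can_publish NEW_NAME [PUBLISHED_NAME ...]
can_publish() {
  local new="$1"; shift
  local last
  # Last published name in lexicographic order (empty if none).
  last="$(printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1)"
  [[ -z "$last" ]] && return 0
  # An equal name is a duplicate, which is a separate check.
  [[ "$new" != "$last" ]] || return 1
  # The new name must come after the last one.
  [[ "$(printf '%s\n' "$last" "$new" | LC_ALL=C sort | tail -n 1)" == "$new" ]]
}
```

With 001.lua and 003.lua published, can_publish 002.lua 001.lua 003.lua fails, while can_publish 004.lua 001.lua 003.lua succeeds. Zero-padded numeric prefixes keep lexicographic and numeric order aligned.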
$ tt migrations apply ETCD_URI [OPTION ...]
tt migrations apply applies published migrations
to the cluster. It executes all migrations from the cluster’s centralized
configuration storage on all its read-write instances (replica set leaders).
$ tt migrations apply "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass
To apply a single published migration, pass its name in the --migration option:
$ tt migrations apply "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--migration=000001_create_space.lua
To apply migrations on a single replica set, specify the --replicaset option:
$ tt migrations apply "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--replicaset=storage-001
The command also provides options for migration troubleshooting: --ignore-order-violation,
--force-reapply, and --ignore-preceding-status. Learn to use them in
Troubleshooting migrations.
Warning
The use of migration troubleshooting options may lead to migration inconsistency
in the cluster. Use them only for local development and testing purposes.
$ tt migrations status ETCD_URI [OPTION ...]
tt migrations status prints the list of migrations published to the centralized
storage and the result of their execution on the cluster instances.
Possible migration statuses are:
APPLY_STARTED – the migration execution has started but not completed yet, or has been interrupted with tt migrations stop
APPLIED – the migration is successfully applied on the instance
FAILED – there were errors during the migration execution on the instance
To get the list of migrations stored in the given etcd storage and information about
their execution on the cluster, run:
$ tt migrations status "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass
If the cluster uses SSL encryption, add SSL options. Learn more in Authentication.
Use the --migration and --replicaset options to get information about specific
migrations or replica sets:
$ tt migrations status "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--replicaset=storage-001 --migration=000001_create_writers_space.lua
The --display-mode option allows you to tailor the command output:
- with --display-mode config-storage, the command prints only the list of migrations
published to the centralized storage.
- with --display-mode cluster, the command prints only the migration statuses
on the cluster instances.
To find out the results of a migration execution on a specific replica set in the cluster, run:
$ tt migrations status "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--replicaset=storage-001 --display-mode=cluster
$ tt migrations stop ETCD_URI [OPTION ...]
tt migrations stop stops the execution of migrations in the cluster.
Warning
Calling tt migrations stop may cause migration inconsistency in the cluster.
To stop the execution of a migration currently running in the cluster:
$ tt migrations stop "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass
tt migrations stop interrupts a single migration. If you call it to interrupt
the process that applies multiple migrations, the ones completed before the call
receive the APPLIED status. The migration interrupted by the call remains in
APPLY_STARTED.
$ tt migrations remove ETCD_URI [OPTION ...]
tt migrations remove removes published migrations from the centralized storage.
With additional options, it can also remove the information about the migration execution
on the cluster instances.
To remove all migrations from a specified centralized storage:
$ tt migrations remove "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass
To remove a specific migration, pass its name in the --migration option:
$ tt migrations remove "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--migration=000001_create_writers_space.lua
Before removing migrations, the command checks their status
on the cluster. To ignore the status and remove migrations anyway, add the
--force-remove-on=config-storage option:
$ tt migrations remove "https://user:pass@localhost:2379/myapp" \
--force-remove-on=config-storage
Note
In this case, cluster credentials are not required.
To remove migration execution information from the cluster (clear the migration status),
use the --force-remove-on=cluster option:
$ tt migrations remove "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--force-remove-on=cluster
To clear all migration information from the centralized storage and cluster,
use the --force-remove-on=all option:
$ tt migrations remove "https://user:pass@localhost:2379/myapp" \
--tarantool-username=admin --tarantool-password=pass \
--force-remove-on=all
Since tt migrations manages migrations via a centralized etcd storage, it
needs credentials to access this storage. There are two ways to pass etcd credentials:
- the --config-storage-username and --config-storage-password command-line options
- the etcd URI, for example, https://user:pass@localhost:2379/myapp
Credentials specified in the URI have a higher priority.
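Credentials embedded in the etcd URI must be valid URI userinfo, so characters such as @, :, and / in a user name or password have to be percent-encoded (RFC 3986). The POSIX shell sketch below shows one way to do that for ASCII values. The function name urlencode and the variable ETCD_PASSWORD are hypothetical, and the sketch assumes tt decodes percent-encoded credentials the way standard URL parsers do; if in doubt, pass such credentials with --config-storage-username and --config-storage-password instead.

```shell
# Percent-encode an ASCII string for the userinfo part of a URI (RFC 3986).
# The body runs in a subshell ( ... ), so its variables and LC_ALL stay local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything except the first character
    c=${s%"$rest"}     # the first character
    s=$rest
    case $c in
      # RFC 3986 unreserved characters are kept as is.
      [A-Za-z0-9._~-]) out=$out$c ;;
      # Any other character becomes %XX (its character code in hex).
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)

# Hypothetical password containing URI delimiters.
ETCD_PASSWORD='p@ss:w/rd'
ETCD_URI="https://admin:$(urlencode "$ETCD_PASSWORD")@localhost:2379/myapp"
printf '%s\n' "$ETCD_URI"
```

The resulting "$ETCD_URI" can then be passed as the first argument of any tt migrations command.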
For commands that connect to the cluster (that is, all except publish), Tarantool
credentials are also required. They are passed in the --tarantool-username and
--tarantool-password options.
If the cluster uses SSL traffic encryption, provide the necessary connection
parameters in the --tarantool-ssl* options: --tarantool-sslcertfile,
--tarantool-sslkeyfile, and others. All options are listed in Options.
-
--acquire-lock-timeout INT
Applicable to: apply
The timeout for acquiring the migrations fiber lock, in seconds. Default: 60.
The fiber lock prevents concurrent migration runs.
-
--config-storage-password STRING
A password for connecting to the centralized migrations storage (etcd).
See also: Authentication.
-
--config-storage-username STRING
A username for connecting to the centralized migrations storage (etcd).
See also: Authentication.
-
--display-mode STRING
Applicable to: status
Display only specific information. Possible values:
config-storage – information about migrations published to the centralized storage.
cluster – information about migrations applied on the cluster.
See also: status.
-
--execution-timeout INT
Applicable to: apply, remove, status, stop
A timeout for completing the operation on a single Tarantool instance, in seconds.
Default values:
3 for remove, status, and stop
3600 for apply
-
--force-reapply
Applicable to: apply
Apply migrations disregarding their previous status.
Warning
Using this option may lead to migrations inconsistency in the cluster.
-
--force-remove-on STRING
Applicable to: remove
Remove migrations disregarding their status. Possible values:
config-storage: remove migrations from the etcd centralized migrations storage disregarding the cluster apply status.
cluster: remove migration status information only on the Tarantool cluster.
all: perform both the config-storage and cluster force removals.
Warning
Using this option may lead to migrations inconsistency in the cluster.
-
--ignore-order-violation
Applicable to: apply, publish
Skip the migration scenario order check.
Warning
Using this option may lead to migrations inconsistency in the cluster.
-
--ignore-preceding-status
Applicable to: apply
Skip the status check of preceding migrations on apply.
Warning
Using this option may lead to migrations inconsistency in the cluster.
-
--key STRING
Applicable to: publish
Put the scenario to the /<prefix>/migrations/scenario/<key> etcd key instead of the default key.
Only for single-file publish.
-
--migration STRING
Applicable to: apply, remove, status
A migration to apply, remove, or check status.
-
--overwrite
Applicable to: publish
Overwrite existing migration storage keys.
Warning
Using this option may lead to migrations inconsistency in the cluster.
-
--replicaset STRING
Applicable to: apply, remove, status, stop
Execute the operation only on the specified replica set.
-
--skip-syntax-check
Applicable to: publish
Skip syntax check before publish.
Warning
Using this option may cause further tt migrations calls to fail.
-
--tarantool-auth STRING
Applicable to: apply, remove, status, stop
Authentication type used to connect to the cluster instances.
-
--tarantool-connect-timeout INT
Applicable to: apply, remove, status, stop
Tarantool cluster instances connection timeout, in seconds. Default: 3.
-
--tarantool-password STRING
Applicable to: apply, remove, status, stop
A password used to connect to the cluster instances.
-
--tarantool-sslcafile STRING
Applicable to: apply, remove, status, stop
SSL CA file used to connect to the cluster instances.
-
--tarantool-sslcertfile STRING
Applicable to: apply, remove, status, stop
SSL cert file used to connect to the cluster instances.
-
--tarantool-sslciphers STRING
Applicable to: apply, remove, status, stop
Colon-separated list of SSL ciphers used to connect to the cluster instances.
-
--tarantool-sslkeyfile STRING
Applicable to: apply, remove, status, stop
SSL key file used to connect to the cluster instances.
-
--tarantool-sslpassword STRING
Applicable to: apply, remove, status, stop
SSL key file password used to connect to the cluster instances.
-
--tarantool-sslpasswordfile STRING
Applicable to: apply, remove, status, stop
A file with a list of passwords for the SSL key file used to connect to the cluster instances.
-
--tarantool-use-ssl
Applicable to: apply, remove, status, stop
Whether SSL is used to connect to the cluster instances.
-
--tarantool-username STRING
Applicable to: apply, remove, status, stop
A username for connecting to the Tarantool cluster instances.
Packaging the application
$ tt pack TYPE [OPTION ...] ..
tt pack packages an application into a distributable bundle of the specified TYPE:
tgz: create a .tgz archive.
deb: create a DEB package.
rpm: create an RPM package.
The command below creates a DEB package with all applications from the current tt
environment:
$ tt pack deb
This command generates a .deb file whose name depends on the environment directory name and the operating system architecture, for example, test-env_0.1.0.0-1_x86_64.deb.
The package contains the following files:
- The content of the application directories: source files, resources, dependencies.
- tt environment files: tarantool and tt executables, tt.yaml configuration file,
external modules, headers.
- .service unit files that allow running applications as systemd services
(a separate file for each application).
You can also pass various options to the tt pack command to adjust generation properties, for example, customize a bundle name, choose which artifacts should be included, specify the required application dependencies.
You can customize your application’s systemd unit file generated by tt pack.
To add parameters to the unit file, define them in a YAML file named systemd-unit-params.yml
in the application directory.
$ tt pack rpm # unit file with parameters from systemd-unit-params.yml if it exists
You can also pass unit parameters from an arbitrary file by adding the --unit-params-file
option to the tt pack call:
$ tt pack rpm --unit-params-file my-params.yml # unit file with parameters from my-params.yml
Important
The systemd-unit-params.yml file has a higher priority than the --unit-params-file option.
If this file exists, it overrides parameters from the file passed in the option.
tt pack supports the following systemd unit parameters:
FdLimit – the number of open file descriptors (LimitNOFILE in the unit file).
instance-env – a list of environment variables in the <VAR_NAME>: <VALUE> format.
Each list item adds an Environment=<VAR_NAME>=<VALUE> line to the unit file.
An example of the systemd-unit-params.yml file:
FdLimit: 200
instance-env:
    INSTANCE: "inst:%i"
    TARANTOOL_WORKDIR: "/tmp"
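One way to create systemd-unit-params.yml is a shell here-document. In this sketch, the application directory instances.enabled/myapp is a hypothetical name (it assumes the environment's instances_enabled directory is instances.enabled). The quoted 'EOF' delimiter stops the shell from expanding anything in the body, so the systemd specifier %i reaches the file unchanged.

```shell
# Hypothetical application directory inside the tt environment.
app_dir=instances.enabled/myapp
mkdir -p "$app_dir"

# Quoted delimiter: no parameter expansion or command substitution in the body.
cat > "$app_dir/systemd-unit-params.yml" <<'EOF'
FdLimit: 200
instance-env:
    INSTANCE: "inst:%i"
    TARANTOOL_WORKDIR: "/tmp"
EOF

cat "$app_dir/systemd-unit-params.yml"
```

A subsequent tt pack rpm or tt pack deb call in the environment root then uses these parameters for the myapp unit file.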
Generating files for integrity checks
Enterprise Edition
The integrity check functionality is supported by the Enterprise Edition only.
tt pack can generate checksums and signatures to use for integrity checks
when running the application. These files are:
- hashes.json and hashes.json.sig in each application directory.
hashes.json contains SHA256 checksums of executable files that the application uses
and its configuration file. hashes.json.sig contains a digital signature
for hashes.json.
- env_hashes.json and env_hashes.json.sig in the environment root. These are
similar files for the tt environment. They contain checksums for
Tarantool and tt executables, and for the tt.yaml configuration file.
To generate checksums and signatures for integrity check, use the --with-integrity-check
option. Its argument must be an RSA private key.
Note
You can generate a key pair using OpenSSL 3 as follows:
$ openssl genrsa -traditional -out private.pem 2048
$ openssl rsa -in private.pem -pubout > public.pem
To create a tar.gz archive with integrity check artifacts:
$ tt pack tgz --with-integrity-check private.pem
Learn how to perform integrity checks at the application startup and in runtime
in the tt start reference.
-
--all
Include all artifacts in a bundle.
In this case, a bundle might include snapshots, WAL files, and logs.
-
--app-list APPLICATIONS
Specify the applications included in a bundle.
Example
$ tt pack tgz --app-list app1,app3
-
--cartridge-compat
Applicable to: tgz
Package a Cartridge CLI-compatible archive.
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later. This command is added for backward compatibility with
earlier versions.
-
--deps STRINGS
Applicable to: deb, rpm
Specify dependencies included in RPM and DEB packages.
Example
$ tt pack deb --deps 'wget,make>0.1.0,unzip>1,unzip<=7'
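The single quotes around the --deps value are required: unquoted, the shell would treat > and < as redirections instead of passing them to tt. The sketch below keeps the same list in a hypothetical deps variable and prints one constraint per line, which is a quick way to review the list before packing; the tt call itself is left as a comment.

```shell
# Single quotes keep '>' and '<' away from the shell's redirection parsing.
deps='wget,make>0.1.0,unzip>1,unzip<=7'

# Review the comma-separated constraints, one per line.
printf '%s\n' "$deps" | tr ',' '\n'

# With tt installed, pass the variable double-quoted:
# tt pack deb --deps "$deps"
```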
-
--deps-file STRING
Applicable to: deb, rpm
Specify the path to a file containing dependencies included in RPM and DEB packages.
For example, the package-deps.txt file below contains several dependencies and their versions:
unzip==6.0
neofetch>=6,<7
gcc>8
If this file is placed in the current directory, a tt pack command might look like this:
$ tt pack deb --deps-file package-deps.txt
-
--filename
Specify a bundle name.
Example
$ tt pack tgz --filename sample-app.tar.gz
-
--name PACKAGE_NAME
Specify a package name.
Example
$ tt pack tgz --name sample-app --version 1.0.1
-
--preinst
Applicable to: deb, rpm
Specify the path to a pre-install script for RPM and DEB packages.
Example
$ tt pack deb --preinst pre.sh
-
--postinst
Applicable to: deb, rpm
Specify the path to a post-install script for RPM and DEB packages.
Example
$ tt pack deb --postinst post.sh
-
--tarantool-version
Specify a Tarantool version for packaging in a Docker container.
For use with --use-docker only.
-
--unit-params-file
The path to a file with custom systemd unit parameters.
-
--use-docker
Build a package in an Ubuntu 18.04 Docker container. To specify a Tarantool
version to use in the container, add the --tarantool-version option.
Before executing tt pack with this option, make sure Docker is running.
-
--version PACKAGE_VERSION
Specify a package version.
Example
$ tt pack tgz --name sample-app --version 1.0.1
-
--with-binaries
Include Tarantool and tt binaries in a bundle.
-
--with-integrity-check PRIVATE_KEY
Generate checksums and signatures for integrity checks at the application startup.
See also: Generating files for integrity checks
-
--with-tarantool-deps
Add Tarantool and tt as package dependencies.
-
--without-binaries
Don’t include Tarantool and tt binaries in a bundle.
-
--without-modules
Don’t include external modules in a bundle.
Playing the contents of .snap and .xlog files to a Tarantool instance
$ tt play URI FILE ... [OPTION ...]
tt play plays the contents of snapshot (.snap) and
WAL (.xlog) files to another Tarantool instance.
A single call of tt play can play multiple files.
-
-u USERNAME, --username USERNAME
A Tarantool user for connecting to the instance.
-
-p PASSWORD, --password PASSWORD
The user’s password.
-
--from LSN
Play operations starting from the given LSN.
-
--to LSN
Play operations up to the given LSN. Default: 18446744073709551615.
-
--replica ID
Filter the operations by replica ID. Can be passed more than once.
When calling tt play with filters by LSN (--from and --to flags) and
replica ID (--replica), remember that LSNs differ across replicas.
Thus, if you pass more than one replica ID via --replica together with --from or --to,
the result may not reflect the actual sequence of operations.
-
--space ID
Filter the output by space ID. Can be passed more than once.
-
--show-system
Show the operations on system spaces.
tt play plays operations from .xlog and .snap files to the destination
instance one by one. All data changes happen the same way as if they were performed
on this instance. This means that:
- All affected spaces must exist on the destination instance. They must have the same structure
and space_id as on the instance that created the snapshot or WAL file.
To play a snapshot or a WAL to a clean instance, include the operations on system spaces
by adding the --show-system flag. With this flag, tt plays the operations that
create and configure user-defined spaces.
- The operations’ LSNs change unless you play all operations that took place since the instance startup.
- Replica IDs change in accordance with the destination instance configuration.
Use one of the following ways to pass the username and the password when connecting
to the instance:
- the -u/--username and -p/--password options
- the URI, in the user:password@host:port format
Play the contents of 00000000000000000000.xlog to the instance on
192.168.10.10:3301:
$ tt play 192.168.10.10:3301 00000000000000000000.xlog
Play operations on spaces with space_id 512 and 513 from the
00000000000000000012.snap snapshot file:
$ tt play 192.168.10.10:3301 00000000000000000012.snap --space 512 --space 513
Play the contents of 00000000000000000000.xlog including operations on system spaces:
$ tt play 192.168.10.10:3301 00000000000000000000.xlog --show-system
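Because a single tt play call accepts several files, a shell glob is a convenient way to pass a whole directory of WAL files. The sketch below uses a hypothetical wal-demo directory with empty placeholder files to show why a glob such as *.xlog lists them in ascending order: the shell sorts glob matches, and WAL file names are fixed-width, zero-padded numbers. It assumes tt play processes files in the order they are given; the tt call itself is left as a comment.

```shell
# Hypothetical scratch directory with empty files named like WAL files.
mkdir -p wal-demo
cd wal-demo || exit 1
touch 00000000000000000012.xlog 00000000000000000000.xlog 00000000000000000007.xlog

# Glob matches are sorted; zero padding makes that the numeric order too.
set -- *.xlog
printf '%s\n' "$@"

# With tt installed, play all of them in one call:
# tt play 192.168.10.10:3301 "$@"
```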
Managing replica sets
$ tt replicaset COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
tt replicaset (or tt rs) manages a Tarantool replica set.
COMMAND is one of the following:
$ tt replicaset status {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs status {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
tt replicaset status (tt rs status) shows the current status of a replica set.
Displaying status of all replica sets
To view the status of all replica sets of an application in the current tt
environment, run tt replicaset status with the application name:
$ tt replicaset status myapp
Displaying status of a single replica set
To view the status of a single replica set of an application, run tt replicaset status
with a name or a URI of an instance from this replica set:
$ tt replicaset status myapp:storage-001-a
For a replica outside the current tt environment, specify its URI and access credentials:
$ tt replicaset status 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
Learn about other ways to provide user credentials in Authentication.
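Passwords that contain $ need single quotes on the command line: in an unquoted or double-quoted word, the shell expands $$ to its own process ID before tt ever sees the argument. The sketch below uses the example password p4$$w0rD to show the difference; it runs in any POSIX shell and does not call tt.

```shell
# Unquoted: $$ is replaced with the shell's process ID.
unquoted=$(printf '%s' p4$$w0rD)
# Single-quoted: the text is passed literally.
quoted=$(printf '%s' 'p4$$w0rD')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```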
$ tt replicaset demote APPLICATION:APP_INSTANCE [OPTIONS ...]
# or
$ tt rs demote APPLICATION:APP_INSTANCE [OPTIONS ...]
tt replicaset demote (tt rs demote) demotes an instance in a Tarantool
cluster with a local YAML configuration.
Demoting in clusters with local YAML configurations
tt replicaset demote can demote instances in Tarantool clusters with local
YAML configurations with failover modes
off and election.
Note
In clusters with manual failover mode, you can demote a read-write instance
by promoting a read-only instance from the same replica set with tt replicaset promote.
In the off failover mode, tt replicaset demote sets the instance’s database.mode
to ro and reloads the configuration.
Important
If failover is off, the command doesn’t consider the modes of other
replica set members, so there can be any number of read-write instances in one replica set.
If some members of the affected replica set are running outside the current tt
environment, tt replicaset demote can’t ensure the configuration reload on
them and reports an error. You can skip this check by adding the -f/--force option:
$ tt replicaset demote my-app:storage-001-a --force
In the election failover mode, tt replicaset demote initiates a leader
election in the replica set. The specified instance’s replication.election_mode
is changed to voter for this election, which guarantees that another instance
is elected as a new replica set leader.
The --timeout option can be used to specify the election completion timeout:
$ tt replicaset demote my-app:storage-001-a --timeout=10
$ tt replicaset expel APPLICATION:APP_INSTANCE [OPTIONS ...]
# or
$ tt rs expel APPLICATION:APP_INSTANCE [OPTIONS ...]
tt replicaset expel (tt rs expel) expels an instance from the cluster.
$ tt replicaset expel myapp:storage-001-b
The command supports the --config, --cartridge, and --custom options
that force the use of a specific orchestrator.
To expel an instance from a Cartridge cluster:
$ tt replicaset expel my-cartridge-app:storage-001-b --cartridge
$ tt replicaset vshard COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vshard COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vs COMMAND {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
tt replicaset vshard (tt rs vs) manages vshard in the cluster.
It has the following subcommands:
$ tt replicaset vshard bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vshard bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
# or
$ tt rs vs bootstrap {APPLICATION[:APP_INSTANCE] | URI} [OPTIONS ...]
tt replicaset vshard bootstrap (tt rs vs bootstrap) bootstraps vshard
in the cluster.
$ tt replicaset vshard bootstrap myapp
With a URI and credentials:
$ tt replicaset vshard bootstrap 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
You can specify the application name or the name of any cluster instance. The command
automatically finds a vshard router in the cluster and calls vshard.router.bootstrap() on it.
The command supports the --config, --cartridge, and --custom options
that force the use of a specific orchestrator.
To bootstrap vshard in a Cartridge cluster:
$ tt replicaset vshard bootstrap my-cartridge-app --cartridge
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later. This command is added for backward compatibility with
earlier versions.
$ tt replicaset bootstrap APPLICATION[:APP_INSTANCE] [OPTIONS ...]
# or
$ tt rs bootstrap APPLICATION[:APP_INSTANCE] [OPTIONS ...]
tt replicaset bootstrap (tt rs bootstrap) bootstraps a Cartridge cluster or
an instance. The command works within the current tt environment and uses
application and instance names.
Note
tt replicaset bootstrap effectively duplicates two other commands:
Bootstrapping a Cartridge cluster
To bootstrap the cartridge_app application using its default replica sets file
replicasets.yml:
$ tt replicaset bootstrap cartridge_app
To use another file with replica set configuration, provide a path to it in the --file option:
$ tt replicaset bootstrap cartridge_app --file replicasets1.yml
To additionally bootstrap vshard after the cluster bootstrap, add --bootstrap-vshard:
$ tt replicaset bootstrap --bootstrap-vshard cartridge_app
Bootstrapping an instance
When called with the instance name, tt replicaset bootstrap joins the
instance to the replica set specified in the --replicaset option:
$ tt replicaset bootstrap --replicaset replicaset cartridge_app:instance1
$ tt replicaset rebootstrap APPLICATION:APP_INSTANCE [-y | --yes]
# or
$ tt rs rebootstrap APPLICATION:APP_INSTANCE [-y | --yes]
tt replicaset rebootstrap (tt rs rebootstrap) rebootstraps an instance:
stops it, removes instance artifacts, starts it again.
To rebootstrap the storage-001 instance of the myapp application:
$ tt replicaset rebootstrap myapp:storage-001
To automatically confirm rebootstrap, add the -y/--yes option:
$ tt replicaset rebootstrap myapp:storage-001 -y
$ tt replicaset roles [add|remove] APPLICATION[:APP_INSTANCE] ROLE_NAME [OPTIONS ...]
# or
$ tt rs roles [add|remove] APPLICATION[:APP_INSTANCE] ROLE_NAME [OPTIONS ...]
tt replicaset roles (tt rs roles) manages application roles
in the cluster.
This command works on Tarantool clusters with a local YAML
configuration and Cartridge clusters. It has two subcommands:
- add: adds a role
- remove: removes a role
Managing roles in clusters with local YAML configurations
When called on clusters with local YAML configurations, tt replicaset roles
subcommands add or remove the corresponding lines from the configuration file
and reload the configuration.
Use the --global, --group, --replicaset, --instance options to specify
the configuration scope to add or remove roles. For example, to add a role to
all instances in a replica set:
$ tt replicaset roles add my-app roles.my-role --replicaset storage-a
You can also manage roles of a specific instance by specifying its name after the application name:
$ tt replicaset roles add my-app:router-001 roles.my-role
To remove a role defined in the global configuration scope:
$ tt replicaset roles remove my-app roles.my-role --global
If some instances of the affected scope are running outside the current tt
environment, tt replicaset roles can’t ensure the configuration reload on
them and reports an error. You can skip this check by adding the -f/--force option:
$ tt replicaset roles add my-app roles.my-role --replicaset storage-a --force
Managing roles in Cartridge clusters
Important
The Tarantool Cartridge framework is deprecated and is not compatible with
Tarantool 3.0 and later. This command is added for backward compatibility with
earlier versions.
When called on Cartridge clusters, tt replicaset roles subcommands add or remove
Cartridge cluster roles.
Cartridge cluster roles are defined per replica set. Thus, you can use the
--replicaset and --group options to define a role’s scope. In this case,
a group is a vshard group.
To add a role to a Cartridge cluster replica set:
$ tt replicaset roles add my-cartridge-app my-role --replicaset storage-001
To remove a role from a vshard group:
$ tt replicaset roles remove my-cartridge-app my-role --group cold-data
Learn more about Cartridge cluster roles.
Selecting the application orchestrator manually
You can specify the orchestrator to use for the application when calling tt replicaset
commands. The following options are available:
--config for applications that use YAML cluster configuration (Tarantool 3.x or later).
--cartridge for Cartridge applications (Tarantool 2.x).
--custom for any other orchestrators used on Tarantool 2.x clusters.
$ tt replicaset status myapp --config
$ tt replicaset promote my-cartridge-app:storage-001-a --cartridge
If an actual orchestrator that the application uses does not match the specified
option, an error is raised.
Use one of the following ways to pass the credentials of a Tarantool user when
connecting to the instance by its URI:
- the -u/--username and -p/--password options
- the URI, in the user:password@host:port format
-
--bootstrap-vshard
Applicable to: bootstrap
Additionally bootstrap vshard when bootstrapping a Cartridge application.
-
--cartridge
Force the Cartridge orchestrator for Tarantool 2.x clusters.
-
--config
Force the YAML configuration orchestrator for Tarantool 3.0 or later clusters.
-
--custom
Force a custom orchestrator for Tarantool 2.x clusters.
-
--file STRING
Applicable to: bootstrap
A file with Cartridge replica sets configuration. Default: replicasets.yml
in the application directory.
See also: Bootstrapping a Cartridge cluster
-
-f, --force
Applicable to: promote, demote, roles
Skip operation on instances not running in the same environment.
-
-G, --global
Applicable to: roles on Tarantool 3.x and later
Apply the operation to the global configuration scope, that is, to all instances.
-
-g, --group STRING
Applicable to: roles
A name of the configuration group to which the operation applies.
-
-i, --instance STRING
Applicable to: roles
A name of the instance to which the operation applies. Not applicable to Cartridge clusters.
Learn more in Managing roles in Cartridge clusters.
-
-r, --replicaset STRING
Applicable to: bootstrap, roles
A name of the replica set to which the operation applies.
See also: Bootstrapping an instance
-
-u, --username STRING
A Tarantool user for connecting to the instance using a URI.
-
-p, --password STRING
The user’s password.
-
--sslcertfile STRING
The path to an SSL certificate file for encrypted connections for the URI case.
-
--sslkeyfile STRING
The path to a private SSL key file for encrypted connections for the URI case.
-
--sslcafile STRING
The path to a trusted certificate authorities (CA) file for encrypted connections for the URI case.
-
--sslciphers STRING
The list of SSL cipher suites used for encrypted connections for the URI case, separated by colons (:).
-
--timeout UINT
Applicable to: promote, demote, expel, vshard, bootstrap
The timeout for completing the operation, in seconds. Default:
3 for promote, demote, expel, roles
10 for vshard and bootstrap
-
--with-integrity-check STRING
Applicable to: promote, demote, expel, roles
Generate hashes and signatures for integrity checks.
-
-y, --yes
Applicable to: rebootstrap
Automatically confirm rebootstrap.
Using the LuaRocks package manager
$ tt rocks [OPTION ...] [VAR=VALUE] COMMAND [ARGUMENT]
tt rocks provides means to manage Lua modules (rocks) via the
LuaRocks package manager. tt uses its own
LuaRocks installation connected to the Tarantool rocks repository.
Below are lists of supported LuaRocks flags and commands. For detailed information on
their usage, refer to LuaRocks documentation.
-
--dev
Enable the sub-repositories in rocks servers for rockspecs of in-development versions.
-
--server=SERVER
Fetch rocks/rockspecs from this server (takes priority over config file).
-
--only-server=SERVER
Fetch rocks/rockspecs from this server only (overrides any entries in the config file).
-
--only-sources=URL
Restrict downloads to paths matching the given URL.
-
--lua-dir=PREFIX
Specify which Lua installation to use.
-
--lua-version=VERSION
Specify which Lua version to use.
-
--tree=TREE
Specify which tree to operate on.
-
--local
Use the tree in the user’s home directory.
Call tt rocks help path to learn how to enable it.
-
--global
Use the system tree when local_by_default is true.
-
--verbose
Display verbose output for the command executed.
-
--timeout=SECONDS
Timeout on network operations, in seconds.
0 means no timeout (wait forever). Default: 30.
admin – Use the luarocks-admin tool
build – Build and compile a rock
config – Query information about the LuaRocks configuration
doc – Show documentation for an installed rock
download – Download a specific rock file from a rocks server
help – Help on commands. Type tt rocks help <command> for more
init – Initialize a directory for a Lua project using LuaRocks
install – Install a rock
lint – Check syntax of a rockspec
list – List the currently installed rocks
make – Compile package in the current directory using a rockspec
make_manifest – Compile a manifest file for a repository
new_version – Auto-write a rockspec for a new version of a rock
pack – Create a rock, packing sources or binaries
purge – Remove all installed rocks from a tree
remove – Uninstall a rock
search – Query the LuaRocks servers
show – Show information about an installed rock
test – Run the test suite in the current directory
unpack – Unpack the contents of a rock
which – Tell which file corresponds to a given module name
write_rockspec – Write a template for a rockspec file
Install the rock queue from the Tarantool rocks repository:
$ tt rocks install queue
Search for the rock queue in both the Tarantool rocks repository and
the default LuaRocks repository:
$ tt rocks search queue --server='https://luarocks.org'
List the documentation files for the installed rock queue:
$ tt rocks doc queue --list
Without the --list flag, this command displays documentation in the user’s default browser.
Create a *.rock file from the installed rock queue:
$ tt rocks pack queue
Unpack a *.rock file:
$ tt rocks unpack queue-scm-1.all.rock
Remove the installed rock queue:
$ tt rocks remove queue
Starting Tarantool applications
$ tt start [APPLICATION[:APP_INSTANCE]]
tt start starts Tarantool applications. The application files must be stored
inside the instances_enabled directory specified in the tt configuration file.
For detailed instructions on preparing and running Tarantool applications, see
Application environment and Starting and stopping instances.
See also: Stopping a Tarantool instance, Restarting a Tarantool instance, Checking instance status.
To start all instances of the application stored in the app directory inside
instances_enabled in accordance with its instances.yml:
$ tt start app
To start all instances of the app application appending their logs to stdout
(in the interactive mode):
$ tt start app -i
To start the router instance of the app application:
$ tt start app:router
When called without arguments, starts all enabled applications in the current environment:
$ tt start
tt start can start entire Tarantool clusters based on their YAML configurations.
A cluster application directory inside instances_enabled must contain the following files:
- config.yaml – a YAML configuration that defines
the cluster topology and settings.
It can either contain an explicit configuration in the YAML format or point
to a centralized configuration storage (for Enterprise Edition).
- instances.yml – a file that defines the list of cluster instances to run
in the current environment.
- (Optionally) *.lua files with code to load and run in the cluster.
For more information about Tarantool application layout, see Application environment.
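As an illustration of this layout, the sketch below creates a minimal single-instance application with shell here-documents. The application name myapp, the names group001, replicaset001, and instance001, and the listen address 127.0.0.1:3301 are hypothetical, and the sketch assumes the environment's instances_enabled directory is instances.enabled. A real cluster would declare more instances and settings in config.yaml.

```shell
# Hypothetical application directory inside the tt environment.
app_dir=instances.enabled/myapp
mkdir -p "$app_dir"

# instances.yml: the instances to run in the current environment.
cat > "$app_dir/instances.yml" <<'EOF'
instance001:
EOF

# config.yaml: a minimal cluster configuration with one instance.
cat > "$app_dir/config.yaml" <<'EOF'
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
                - uri: '127.0.0.1:3301'
EOF

ls "$app_dir"
```

With this layout in place, tt start myapp would start instance001 in the background.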
Note
tt also supports Tarantool applications with configuration in code,
which is considered a legacy approach since Tarantool 3.0. For information
about using tt with such applications, refer to the Tarantool 2.11 documentation.
Running in the background
tt start runs Tarantool applications in the background and uses its own watchdog
process for status checks (tt status) and application stopping (tt stop).
Important
Do not switch on the background mode using the cluster configuration
(process.background: true in the YAML configuration) or code (box.cfg.background = true)
in applications that you run with tt.
If you start such an application with tt start, tt won’t be able to check
the application status or stop it using the corresponding commands.
Enterprise Edition
The integrity check functionality is supported by the Enterprise Edition only.
tt start can perform initial and periodic integrity checks of the environment,
application, and centralized configuration.
To enable integrity checks of environment and application files, you need to pack
the application using tt pack with the --with-integrity-check option.
This option generates and signs checksums of executables and configuration files in the current tt
environment. Learn more in Generating files for integrity checks.
To enable integrity check of the configuration at the centralized storage,
publish the configuration to this storage using tt cluster publish with the --with-integrity-check option.
This option generates and signs configuration checksums and saves them to the storage.
Learn more in Publishing configurations with integrity check.
To perform the integrity checks when running the application, start it with the
--integrity-check global option.
Its argument must be a public key matching the private key that was used for
generating checksums.
$ tt --integrity-check public.pem start myapp
After such a call, tt checks the environment, application, and configuration integrity
using the checksums and starts the application if the checks succeed. Then, integrity
checks are performed periodically while the application is running. By default,
they are performed once every 24 hours. You can adjust the integrity check period
by adding the --integrity-check-period option:
$ tt --integrity-check public.pem start myapp --integrity-check-period 60
Additionally, Tarantool checks the integrity of the modules that the application uses
at load time, that is, when require('module') is called.
If an integrity check fails, tt stops the application.
-
-i, --interactive
Start the application or instance in the interactive mode.
In this mode, instance logs are printed to the standard output in real time.
You can use the SIGINT signal (CTRL+C) to stop tt and its child
Tarantool processes in the interactive mode. No watchdog processes are created.
-
--integrity-check-interval NUMBER
Integrity check interval in seconds. Default: 86400 (24 hours).
Set this option to 0 to disable periodic checks.
See also: Integrity check
Checking instance status
$ tt status [APPLICATION[:APP_INSTANCE]] [OPTION ...]
tt status prints information about Tarantool applications and instances
in the current environment. This includes:
INSTANCE – application and instance names
STATUS – instance status: running, not running, or terminated with an error
PID – process IDs
MODE – instance modes: read-write or read-only
CONFIG – the instances’ states in regard to configuration for Tarantool 3.0 or later (see config.info())
BOX – the instances’ box.info() statuses
UPSTREAM – the instances’ box.info.replication[*].upstream statuses
When called without arguments, prints the status of all enabled applications in the current environment.
Print the status of all instances of the app application:
$ tt status app
Print the status of the replica instance of the app application:
$ tt status app:replica
Pretty-print the status of the replica instance of the app application:
$ tt status app:replica --pretty
-
-d, --details
Print detailed alerts.
-
-p, --pretty
Print the status as a pretty-formatted table.
Displaying the tt version
tt version shows the version of the tt utility being used.
Extending the tt functionality
The tt utility implements a modular architecture: its commands
are, in fact, separate modules. When you run tt with a command, the
corresponding module is executed with the given arguments.
The modular architecture makes it possible to extend the tt functionality with
external modules (as opposed to internal modules that implement built-in
commands). In other words, you can write any code you want to execute
from tt, pack it into an executable, and run it with a tt command:
tt my-module-name my-args
The name of the command that executes a module is the same as the name of the module’s executable.
Module description and help
Executables that implement external tt modules must support two flags:
--description – print a short description of the module. The description is shown alongside
the command in the tt help.
--help – display help. The help message is shown when tt help <module_name> is called.
External modules must be located in the modules directory specified in the
configuration file:
tt:
  modules:
    directory: path/to/modules/dir
To check if a module is available in tt, call tt help.
It will show the available external modules in the EXTERNAL COMMANDS section together
with their descriptions.
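The two required flags can be illustrated with a minimal Lua sketch of an external module. Everything specific in it is hypothetical: the module name hello, its description and help texts, and the assumption that tt passes --description or --help as the first argument. The file is assumed to be placed in the modules directory as an executable named hello with #!/usr/bin/env tarantool as its first line.

```lua
-- hello: a hypothetical external tt module (name and texts are illustrative).
-- Save it as path/to/modules/dir/hello, add "#!/usr/bin/env tarantool" as the
-- first line, and make the file executable (chmod +x).

local DESCRIPTION = "Print a greeting (example external module)"

local HELP = [[
Usage: tt hello [NAME]

Print a greeting for NAME (default: "world").]]

-- Return the text to print for the given argument list.
local function run(args)
    local first = args[1]
    if first == "--description" then
        -- Expected to appear next to the command in the tt help output.
        return DESCRIPTION
    elseif first == "--help" then
        -- Expected to appear when `tt help hello` is called.
        return HELP
    end
    return string.format("Hello, %s!", first or "world")
end

-- `arg` holds the command-line arguments (arg[1], arg[2], ...) both in the
-- standalone Lua interpreter and in Tarantool.
print(run(arg or {}))
```

If tt behaves as assumed, tt help should list hello under EXTERNAL COMMANDS with its description, and tt hello Alice should print Hello, Alice!. Keeping the logic in run() instead of printing in each branch makes the flag handling easy to check in isolation.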
Overloading built-in commands
External modules can overload built-in tt commands.
If you want to change the behavior of a built-in command, create an external
module with the same name and your own implementation.
When tt sees two modules – an external and an internal one – with the same
name, it will use the external module by default.
For example, if you want tt to show the information about your Tarantool
application, write the external module version that outputs the information
you need. The tt version call will execute this module instead of the built-in one:
tt version # Calls the external module if it's available
You can force the use of the internal module by running tt with the --internal or -I
option. The following call will execute the built-in version
even if there is an external module with the same name:
tt version -I # Calls the internal module
tt interactive console
The tt utility features a command-line console that allows executing requests
and Lua code interactively on the connected Tarantool instances.
It is similar to the Tarantool interactive console with
one key difference: the tt console allows connecting to any available instance,
both local and remote. Additionally, it offers more flexible output formatting capabilities.
To connect to a Tarantool instance using the tt console, run tt connect.
Specify the instance URI and the user credentials in the corresponding options:
$ tt connect 192.168.10.10:3301 -u myuser -p 'p4$$w0rD'
• Connecting to the instance...
• Connected to 192.168.10.10:3301
192.168.10.10:3301>
If a user is not specified, the connection is established on behalf of the guest user.
If the instance runs in the same tt environment, you can establish a local
connection with it by specifying the <application>:<instance> string instead of the URI:
$ tt connect app:storage001
• Connecting to the instance...
• Connected to app:storage001
app:storage001>
Local connections are established on behalf of the admin user.
To get the list of supported console commands, enter \help or ?.
To quit the console, enter \quit or \q.
By default, the tt console prints the output data in the YAML format, each
tuple on a new line:
app:storage001> box.space.bands:select { }
---
- - [1, 'Roxette', 1986]
  - [2, 'Scorpions', 1965]
  - [3, 'Ace of Base', 1987]
...
You can switch to alternative output formats – Lua or ASCII (pseudographics) tables –
using the \set output console command:
app:storage001> \set output lua
app:storage001> box.space.bands:select { }
{{1, "Roxette", 1986}, {2, "Scorpions", 1965}, {3, "Ace of Base", 1987}};
app:storage001> \set output table
app:storage001> box.space.bands:select { }
+------+-------------+------+
| id   | band_name   | year |
+------+-------------+------+
| 1    | Roxette     | 1986 |
+------+-------------+------+
| 2    | Scorpions   | 1965 |
+------+-------------+------+
| 3    | Ace of Base | 1987 |
+------+-------------+------+
Note
Field names are printed since Tarantool 3.2. On earlier versions,
actual names are replaced by numbered placeholders col1, col2, and so on.
The table output can be printed in the transposed format, where an object’s fields
are arranged in rows instead of columns:
app:storage001> \set output ttable
app:storage001> box.space.bands:select { }
+-----------+---------+-----------+-------------+
| id        | 1       | 2         | 3           |
+-----------+---------+-----------+-------------+
| band_name | Roxette | Scorpions | Ace of Base |
+-----------+---------+-----------+-------------+
| year      | 1986    | 1965      | 1987        |
+-----------+---------+-----------+-------------+
Note
You can also specify the output format in the tt connect call using the
-x/--outputformat option:
$ tt connect app:storage001 -x table
For table and ttable output, more customizations are possible with the
following commands:
\set table_format – table format: default (pseudographics, or ASCII table), Markdown,
or Jira-compatible format:
app:storage001> \set table_format jira
app:storage001> box.space.bands:select {}
| id | 1 | 2 | 3 |
| band_name | Roxette | Scorpions | Ace of Base |
| year | 1986 | 1965 | 1987 |
\set graphics – enable or disable graphics for table cells in the default format:
app:storage001> \set table_format default
app:storage001> \set graphics false
app:storage001> box.space.bands:select {}
id          1         2           3
band_name   Roxette   Scorpions   Ace of Base
year        1986      1965        1987
\set table_column_width – maximum column width.
app:storage001> \set table_column_width 6
app:storage001> box.space.bands:select {}
id     1      2      3
band_n Roxett Scorpi Ace of
+ame   +e     +ons   + Base
year   1986   1965   1987
\help, ?
Show help on the tt console.
Show available keyboard shortcuts.
Set the input language.
Possible values: lua (default) and sql.
An analog of the tt connect option -l/--language.
\set graphics {true|false}, \x{g|G}
Whether to print pseudographics for table cells if the output format is table or ttable.
Possible values: true (default) and false.
The shorthands are:
\xG for true
\xg for false
\set table_column_width WIDTH, \xw WIDTH
Set the maximum printed width of a table cell content. If the length exceeds this value,
it continues on the next line starting from the + (plus) sign.
Shorthand: \xw
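The wrapping behavior of \set table_column_width can be approximated with a short Lua function. This is a sketch, not tt's implementation: it assumes that every printed piece, including the leading + of a continuation line, fits into WIDTH characters. That assumption reproduces the width-6 output shown earlier, but tt may split longer values differently.

```lua
-- Approximate how a cell value is split when \set table_column_width WIDTH
-- is active. Assumption (not taken from tt's source): every printed piece,
-- including the leading "+", fits into WIDTH characters.
local function wrap_cell(value, width)
    assert(width >= 2, "width must leave room for the '+' marker")
    local text = tostring(value)
    local pieces = { text:sub(1, width) }
    local pos = width + 1
    while pos <= #text do
        -- Continuation lines start with "+" and carry up to WIDTH - 1 characters.
        pieces[#pieces + 1] = "+" .. text:sub(pos, pos + width - 2)
        pos = pos + width - 1
    end
    return pieces
end

for _, v in ipairs({ "band_name", "Roxette", "Scorpions", "Ace of Base", 1986 }) do
    print(table.concat(wrap_cell(v, 6), " / "))
end
```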
Web interface overview
The Tarantool Cluster Manager web interface is available at the hostname and port defined by the
http.host and http.port configuration options.
If TLS is enabled, it uses the https protocol, otherwise the protocol is http.
When started locally with the default configuration, TCM is available at http://127.0.0.1:8080.
To log into TCM after bootstrap, use the following credentials:
After logging in with the default password:
- Adjust the password policy
in accordance with the security requirements that apply in your organization.
- Change the
admin user’s password on the User settings page.
To log out of TCM, click the user’s name in the header and click Log out.
The TCM web interface consists of three parts:
- Navigation pane on the left shows the list of pages available to the user.
The navigation pane can be collapsed by clicking the cross icon at its top.
- Header at the top provides access to notifications and user settings.
- Working area displays the contents of the selected page.
The Onboarding item of the navigation pane starts the interactive onboarding
tutorial. Use it to get familiar with the main TCM features directly in the web interface.
This overview describes most TCM pages. The exact set of pages and controls available
to a particular user is determined by the user’s permissions.
Some features, such as data schema editing, are available only in the development mode.
You can switch to it in the user settings of the Default Admin user.
To learn more about the development mode, see Development mode.
For easier navigation, TCM pages are grouped in the navigation pane by their content.
There are the following page groups:
- Cluster: interaction with the selected cluster.
- Clusters: interaction with all connected clusters in general.
- Users: access management.
- Tools: TCM administration.
- Settings: runtime management of TCM settings.
Read on to learn what you can do on the pages of these groups.
The Cluster group includes pages used for interaction with a particular cluster.
To switch between clusters, click the Cluster group name and select a connected
cluster from the drop-down list.
The cluster Stateboard is the main page for monitoring the cluster state
and interacting with its instances.
On this page, you can:
- view and edit the cluster topology
- group and filter instances based on various criteria
- view memory statistics and Tarantool versions running on instances
- navigate to instance pages
by clicking instance names in the cluster topology list
- start and stop instances (in the development mode).
Learn more about using the cluster stateboard in Viewing cluster state.
The instance page opens when you click an instance name on the Stateboard.
It provides a set of tabs for performing actions on the selected Tarantool instance:
- Details and State tabs: view instance details as a human-readable table
or as a console output of
box.cfg, box.info, and other built-in functions
- SQL and Terminal tabs: run SQL and Lua commands on the instance
- Logs tab: view instance logs
- Slabs tab: view slab allocator statistics
- Users tab: manage Tarantool users and roles on the instance
- Funcs tab: manage and call stored functions
- Metrics tab: view instance metrics
The instance page has an Actions menu at the top that allows you to:
- navigate to the instance explorer
- edit the instance configuration
- remove the instance
The Slabs tab in the TCM web UI visualizes how the slab allocator distributes memory within each Tarantool instance.
This tab is useful for:
- identifying memory fragmentation
- analyzing slab saturation by object size
- debugging excessive memory use in real time
This visualization is based on the output of box.slab.stats().
This function returns a Lua table with per-class (per object size) memory allocation statistics from the slab allocator.
More about box.slab.stats().
Each entry in the output contains:
item_size: object size class
slab_count: number of slab blocks
slab_size: memory size of each slab
item_count: number of allocated objects
mem_used: bytes used
mem_free: bytes free
These values are parsed and rendered as visual elements in the UI.
Each block represents a single slab (a fixed-size memory region). The color indicates how full the slab is:
- Green — the slab is less than 30% full
- Red — slab is full (100% usage)
- Gradient colors between green and red — indicate intermediate fill levels (e.g., 30%, 50%, 75%)
The color transitions smoothly, providing a quick visual way to understand which slabs are:
- actively used
- partially utilized
- potentially underused or contributing to memory fragmentation
In the example screenshot:
- Slab #17 (168 KB) — 75% full (dark red)
- Slab #18 (320 KB) — 53% full (brownish-red)
- Slab #16 (40 KB) — only 1% used (bright green)
- Slab #2 (56 B) — 60% used (intermediate gradient)
Each slab block’s size in the visualization reflects the total memory allocated for its item_size class –
the more memory allocated, the larger the visual representation.
Calculating fill percentage
The overall fill percentage of a size class is calculated as:
fill % = (item_count * item_size) / (slab_count * slab_size) * 100
However, each slab is visualized individually, so different fill levels across slabs will result in various colors within the same row.
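The fill formula can be applied directly to the entries that box.slab.stats() returns. The Lua sketch below computes the fill percentage of a size class and maps it to the color bands described above. Treating everything between 30% and 100% as a single "gradient" band is a simplification of the smooth color transition that TCM draws, and the sample numbers are made up.

```lua
-- Fill percentage of one size class, computed from the fields that
-- box.slab.stats() returns for each entry.
local function fill_percent(entry)
    local capacity = entry.slab_count * entry.slab_size
    if capacity == 0 then
        return 0 -- no slabs allocated for this size class yet
    end
    return entry.item_count * entry.item_size / capacity * 100
end

-- Coarse color band: green below 30%, red when full, gradient in between.
local function color_band(percent)
    if percent < 30 then
        return "green"
    elseif percent >= 100 then
        return "red"
    end
    return "gradient"
end

-- Made-up sample entry; inside Tarantool, loop over box.slab.stats() instead.
local sample = { item_size = 64, item_count = 6144, slab_size = 1048576, slab_count = 1 }
local pct = fill_percent(sample)
print(string.format("item_size=%d fill=%.1f%% color=%s",
    sample.item_size, pct, color_band(pct)))
```

Inside Tarantool, the same function could be applied to every size class, for example, for _, s in ipairs(box.slab.stats()) do print(s.item_size, fill_percent(s)) end.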
You can fine-tune the allocator behavior with two configuration options: slab_alloc_factor and slab_alloc_granularity.
These parameters affect how memory is allocated per object size class and can help:
- reduce internal fragmentation
- optimize memory usage
- improve slab locality and performance
- better understand memory consumption via the Slabs tab
Use cases and recommendations:
| Scenario / goal | slab_alloc_factor / slab_alloc_granularity | Effect on memory | Effect on performance | Visualization in the Slabs tab |
|---|---|---|---|---|
| Reduce memory waste (small, uniform tuples) | 1.05 / 4 | Many size classes: minimal internal memory waste | Higher overhead for managing slab pools | Many rows, partially filled blocks, gradient from green to red |
| Optimize performance (mixed-size tuples) | 1.3 / 16 | Fewer size classes: slightly more memory waste | Lower overhead: faster memory allocation | Fewer rows, larger blocks, contrasting colors for partially filled and full slabs |
| Control fragmentation and slab count | Task-dependent: lower values give more classes, higher values give fewer classes | Balance between internal memory waste and the number of blocks | Balance between overhead and allocator speed | Balance between the number of rows and block sizes; colors indicate fill level |
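To see why a smaller slab_alloc_factor produces more size classes and less internal waste, the following Lua sketch uses a deliberately simplified model: each class is the previous one multiplied by the factor and rounded up to a multiple of the granularity. This is not how Tarantool's small allocator computes its classes in detail, and the 16-byte to 1 KB range is arbitrary, so treat the numbers as an illustration of the trend in the table only.

```lua
-- Simplified model of allocator size classes (NOT Tarantool's exact algorithm).
local function size_classes(factor, granularity, min_size, max_size)
    local function round_up(n)
        return math.ceil(n / granularity) * granularity
    end
    local classes = {}
    local size = round_up(min_size)
    while size <= max_size do
        classes[#classes + 1] = size
        -- Grow geometrically, but by at least one granularity step.
        size = math.max(round_up(size * factor), size + granularity)
    end
    return classes
end

-- Worst-case internal waste: an object one byte larger than the previous
-- class has to be stored in the next class.
local function max_waste_ratio(classes)
    local worst = 0
    for i = 2, #classes do
        local object = classes[i - 1] + 1
        worst = math.max(worst, (classes[i] - object) / classes[i])
    end
    return worst
end

for _, p in ipairs({ { 1.05, 4 }, { 1.3, 16 } }) do
    local classes = size_classes(p[1], p[2], 16, 1024)
    print(string.format("factor=%.2f granularity=%d: %d classes, worst-case waste %.0f%%",
        p[1], p[2], #classes, max_waste_ratio(classes) * 100))
end
```

In this model, the 1.05 / 4 setting yields many more classes with a smaller worst-case waste than 1.3 / 16, which matches the direction of the first two table rows.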
The cluster Configuration page provides an interactive editor for the cluster
configuration. It is connected to the centralized configuration
storage that the cluster uses. All changes you make and apply on this page are
sent to this centralized storage.
Learn more in Configuring clusters.
The Security page provides controls for managing the cluster security settings.
Learn more in Security settings.
The Migrations page provides centralized migration management tools for the selected cluster.
Learn more in Performing migrations.
Important
The cluster-wide access to stored data on the Tuples page is supported only
for sharded clusters that use the CRUD module.
Starting with TCM 1.6.0, the Tuples tab is disabled by default.
You can enable the tab in the TCM configuration file (tcm.yaml) using the option below:
The Tuples page provides access to data stored in the user spaces of the selected
cluster.
On this page, you can:
- view the list of user spaces, their size, and engines
- view and edit tuples stored in user spaces
- search for tuples by entering search condition in the Search bar
TCM supports the following comparison operators:
== – equal to
> – greater than
< – less than
>= – greater than or equal to
<= – less than or equal to
Search condition has the following structure:
index_name comparator value
where:
index_name – the name of the index. This is the left-hand side of the expression.
comparator – a comparison operator (>, >=, ==, <=, <). It must be surrounded by spaces on both sides.
value – a string, numeric, or boolean value. This is the right-hand side of the expression.
String values must be enclosed in double quotes ("").
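The condition structure above is regular enough to express as a small Lua parser. This is a hypothetical illustration of the listed rules, not TCM's own parser: it handles a single condition, assumes index names consist of letters, digits, and underscores, and assumes booleans are written as the literals true and false.

```lua
-- Hypothetical parser for one TCM-style search condition:
--   index_name comparator value
-- It mirrors the documented rules and is not TCM's own implementation.
local COMPARATORS = { ["=="] = true, [">"] = true, ["<"] = true,
                      [">="] = true, ["<="] = true }

local function parse_condition(text)
    -- %s+ around the operator enforces the "spaces on both sides" rule.
    local index_name, op, raw =
        text:match("^%s*([%w_]+)%s+([<>=]+)%s+(.-)%s*$")
    if not index_name then
        return nil, "expected: index_name comparator value"
    end
    if not COMPARATORS[op] then
        return nil, "unsupported comparator: " .. op
    end
    if raw == "" then
        return nil, "missing value"
    end
    local value
    if raw:match('^".*"$') then
        value = raw:sub(2, -2)       -- string value in double quotes
    elseif raw == "true" or raw == "false" then
        value = (raw == "true")      -- boolean value (assumed literal form)
    else
        value = tonumber(raw)        -- numeric value
        if value == nil then
            return nil, "string values must be enclosed in double quotes"
        end
    end
    return { index = index_name, op = op, value = value }
end
```

With this sketch, parse_condition('name == "Ivan"') yields the index name, the operator ==, and the value Ivan, while parse_condition('Ivan') returns nil and an error message, matching the correct and incorrect inputs in the note that follows.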
Note
TCM does not support text search without a search condition. For example, to search for customers named Ivan in a
space, use the index name and a comparison operator to specify the expression:
- correct: typing
name == "Ivan" in the Search bar
- incorrect: typing
Ivan in the Search bar
Examples
The search expression below returns tuples with IDs greater than 9990:
In TCM, the result might look as follows:
In the example below, the search returns tuples where the name index value equals Ivan:
name == "Ivan"
The example below specifies a multiple search condition.
The search returns all people with an ID greater than 2 who were born in 1980 or earlier.
The TCF tab provides an interface for clusters that run within Tarantool Clusters Federation.
The TCF tab can be enabled in the TCM configuration file:
# tcm.yaml
feature:
  tcf: True
You can also enable it using the environment variable or the feature command-line option.
For more details, refer to configuration reference.
On this page, you can:
- view information about TCF clusters
- toggle the state of clusters
- promote or demote clusters
- change key cluster parameters.
To open the settings, click Actions (the three dots next to the cluster status) and select Settings. Available parameters:
dml_users: list of DML users
cluster1, cluster2: cluster settings
replication_user: replication username
replication_password: password associated with the replication user
failover_timeout: time period (in seconds) to wait before initiating failover to another cluster. Default value: 20
initial_status: initial service state
max_suspect_counts: maximum suspect counts for failover. Default value: 3
health_check_delay: delay (in seconds) between health checks. Default value: 2
enable_system_check: enables or disables system-level health checks. Default value: true
status_ttl: time-to-live for service status. Default value: 4
Learn more in TCF integration.
TCM provides built-in support for monitoring and inspecting Tarantool
Queue Enterprise through the web interface.
The TQE tab can be enabled in the TCM configuration file:
# tcm.yaml
feature:
  tqe: True
You can also enable it using the environment variable or the feature command-line option.
For more details, refer to configuration reference.
After enabling the feature, the TQE page appears in the TCM UI and provides access to Metrics and Queues pages.
Metrics can be viewed in two formats:
The Queues page displays runtime information for each queue, including:
- Latency – the time delay (ms) between a message being added to the queue and being processed.
- Poll max batch – the number of messages retrieved in a single request for processing.
- Deduplication mode – specifies how duplicate messages are handled. Deduplication is always enabled. Available modes:
basic (default), extended, keep_latest, keep_first.
The instance Explorer provides access to all spaces of a specific instance,
including system spaces.
On this page, you can:
- view and edit instance spaces, their size, and engines
- view and edit tuples stored in all spaces of the instance
The Clusters group includes pages used for managing TCM’s cluster connections.
The Clusters page lists Tarantool clusters that are connected to TCM.
On this page, you can:
- connect Tarantool clusters to TCM
- edit cluster connections
- disconnect clusters
Learn more in Connecting clusters.
The ACL page displays the TCM access control list.
On this page, you can add and delete ACL entries. Learn more in Access control list.
The Users group includes pages related to user access to TCM.
The Users page lists TCM users.
On this page, you can:
- add, edit, and delete users
- manage user secrets (passwords and
API tokens)
- revoke user sessions
Learn more in Users.
The Roles page lists TCM user roles.
On this page, you can add, edit, and delete roles. Learn more in Roles.
The Sessions page lists active sessions of TCM users.
On this page, you can view and revoke sessions. Learn more in Sessions.
The Settings group includes service pages where you can configure various TCM features.
On the Password policy page, you can configure the requirements for user passwords,
such as minimum length, required symbols, expiration, and other settings.
Learn more in Password policy.
On the Audit settings page, you can configure how TCM records events to its
audit log: whether audit log is enabled, which events are recorded, and so on.
Learn more in Audit log.
On the LDAP page, you can manage TCM LDAP configurations.
The user settings dialog opens when you click Settings under the user’s name
in the header.
This dialog includes the following tabs:
- General tab: switch the color theme
- Change password tab: change your password
- API tokens tab: generate and delete API tokens
- Sessions tab: view and revoke your user sessions
- About tab: view TCM information and switch between development and production modes
Connecting clusters
Tarantool Cluster Manager works with clusters that:
A single TCM installation can have multiple connected clusters. A connection to
TCM doesn’t affect the cluster’s functioning. You can connect clusters to TCM
and disconnect them on the fly.
There are two scenarios of cluster connection to TCM:
In both cases, you need to deploy Tarantool and start the cluster instances using
the tt CLI utility or another suitable way.
There are two ways to add a cluster to TCM:
- Use the TCM web interface as described on this page.
- Specify the
initial-settings.clusters section of the TCM configuration.
To learn more, see Initial settings.
When connecting a cluster to TCM, you need to provide two sets of connection parameters:
for the cluster instances and for the centralized configuration storage.
Configuration storage connection
The cluster configuration can be stored in either an etcd
cluster or a separate Tarantool-based storage. In both cases, the following connection
parameters are required:
- A key prefix used to identify the cluster in the configuration storage.
A prefix must be unique for each cluster in storage.
- URIs of all instances of the configuration storage.
- The credentials for accessing the configuration storage: an etcd user
or a Tarantool user.
Additionally, if SSL or TLS encryption is enabled for the configuration storage,
provide the corresponding encryption configuration: keys, certificates, and other
parameters. For the complete list of parameters, consult the etcd documentation
or the Securing connections with SSL section of the Tarantool documentation.
For interaction with the cluster instances, TCM needs the following access parameters:
- A Tarantool user that exists in the cluster and their password.
TCM connects to the cluster on behalf of this user.
- An SSL configuration if the traffic encryption
is enabled on the cluster.
Managing connected clusters
Administrators can add new clusters to TCM and edit or remove existing ones.
Connected clusters are listed on the Clusters page.
If you don’t have a cluster yet, you can add one in TCM and write its configuration
from scratch using the built-in configuration editor.
Important
When adding a new cluster, you need to have a storage for its configuration up
and running so that TCM can connect to it. Cluster instances can be deployed later.
To add a new cluster:
- Go to Clusters and click Add.
- Fill in the general cluster information:
- Specify an arbitrary name.
- Optionally, provide a description and select a color to mark this cluster in TCM.
- Optionally, enter the URLs of additional services for the cluster. For example,
a Grafana dashboard that monitors the cluster metrics, or a syslog server
for viewing the cluster logs. TCM provides quick access to these URLs on
the cluster Stateboard page.
- Select the type of the cluster configuration storage: etcd or tarantool.
- Define a unique Prefix for identifying this cluster in the configuration storage.
- Provide the connection details for the cluster configuration storage:
- The URIs of configuration storage instances.
- The credentials for accessing the configuration storage.
- The SSL/TLS parameters if the connection encryption is enabled on the storage.
- Provide the cluster credentials: a username, a password, and SSL parameters in
case traffic encryption is enabled on
the cluster.
Once you add the cluster:
Editing a connected cluster
To edit a connected cluster, go to Clusters and click Edit in the Actions
menu of the corresponding table row.
To disconnect a cluster from TCM, go to Clusters and click Disconnect
in the Actions menu of the corresponding table row.
Note
Disconnecting a cluster does not affect its functioning. The only
thing that changes is that it’s no longer shown in TCM.
You can connect this cluster again at any time.
Cluster management
The main goal of Tarantool Cluster Manager is to provide visual tools for managing
various aspects of Tarantool clusters from the browser. See the pages of this section
to learn how to perform various management operations on Tarantool clusters from TCM.
Viewing cluster state
Tarantool Cluster Manager provides a visual interface for checking various aspects of connected clusters,
such as:
- topology
- instance state
- memory usage
- data distribution
- Tarantool versions
Cluster state information is available on the Cluster > Stateboard page.
The cluster topology is displayed on the Stateboard page in one of two forms:
a list or a graph.
The list view of the cluster topology is used by default. In this view, each row contains
the general information about an instance: its current state, memory usage and limit,
and other parameters.
In the list view, TCM additionally displays the Tarantool version information
and instance states on circle diagrams. You can click the sectors of these diagrams
to filter the instances with the selected versions and states.
To switch to the list view, click the list button on the right of the search bar on the Stateboard page.
The graph view of the cluster topology is shown in a tree-like structure where
leaves are the cluster’s instances. Each instance’s state is shown by its color.
You can move the graph vertices to arrange them as you like, and zoom in and out,
which is helpful for larger clusters.
To switch to the graph view, click the graph button on the right of the search bar on the Stateboard page.
By default, the cluster topology is shown hierarchically as it’s defined in the configuration:
instances are grouped by their replica set, and replica sets are grouped by
their configuration group.
For better navigation across the cluster, you can adjust the instance grouping.
For example, you can group instances by their roles or custom tags defined in the configuration.
A typical case for such tags is adding geographical markers to instances. This way,
you can see whether issues occur in a specific data center or on a specific server.
To change the instance grouping, click Group by in the Actions menu on the Stateboard page.
Then add or remove grouping criteria.
You can filter the instances shown on the Stateboard page using the search bar
at the top. It has predefined filters that select:
- instances with errors or warnings
- leader or read-only instances
- instances with no issues
- stale instances
To display all instances, delete the filter applied in the search bar.
The general information about the state of cluster instances is shown in the
list view of the cluster topology. Each row contains the information about the instance
status, used and available memory, read-only status, and virtual buckets for sharded
clusters.
To view the detailed information about an instance or connect to it, click the corresponding
row in the instances list or a vertex of the graph. On the instance page, you can
find:
- the instance configuration overview
- current state (with warning and error messages if any)
- the detailed Tarantool information returned by the instance introspection functions
from box.info, box.stat,
and other built-in modules
- memory usage by the slab allocator
- instance users and roles
- stored functions
- instance metrics
The page also provides Lua and SQL terminals to execute built-in functions
and requests on the instance. You can choose between two Lua terminals: the
tt interactive console with code completion and highlighting or
the default Tarantool console.
When you connect a cluster to TCM, you can specify
URLs of external services linked to this cluster. For example, this can be a Grafana
server that monitors the cluster metrics.
All the URLs added for a cluster are available for quick access in the Actions
menu on the Stateboard page.
Configuring clusters
Tarantool Cluster Manager features a built-in text editor for Tarantool EE cluster configurations.
When you connect a cluster to TCM, it gains access
to the cluster’s centralized configuration storage: an etcd or a Tarantool cluster.
TCM has both read and write access to the cluster configuration. This enables
the configuration editor to work in two ways:
- If a configuration already exists, the editor shows its current state.
- When you change the configuration in the editor and apply changes, they
are sent to the configuration storage.
To learn how to write Tarantool cluster configurations, see Configuration.
Managing a cluster’s configuration
The configuration editor is available on the Cluster > Configuration page.
To start managing a cluster’s configuration, select this cluster in the Cluster
drop-down and go to the Configuration page.
A cluster configuration in TCM can consist of one or multiple YAML files.
When there are multiple files, they are all considered parts of a single cluster
configuration. You can use this for structuring big cluster configurations.
All files that form the configuration of a cluster are listed on the left side
of the Cluster configuration page.
To add a cluster configuration file, click the plus icon (+) below the page title.
To open a configuration file in the editor, click its name in the file list.
To delete a cluster configuration file, click the Delete button beside the filename.
To download a cluster configuration file, click the Download button beside the filename.
Warning
All configuration changes are discarded when you leave the Cluster configuration page.
Save the configuration if you want to continue
editing it later or apply it
to start using it on the cluster.
Saving a configuration draft
TCM can store configuration drafts. If you want to leave an unfinished configuration
and return to it later, save it in TCM. Saving applies to whole cluster configurations:
it records the edits of all files, file additions, and file deletions.
To save a cluster configuration draft after editing, click Save on the Cluster configuration page.
All unsaved changes are discarded when you leave the Cluster configuration page.
If you have a saved configuration draft, you can reset the changes for each of its
files individually. A reset returns the file to the state that is currently used
by a cluster (that is, saved in the configuration storage). If you reset a newly
added file, it is deleted.
To reset a saved configuration file, click the Reset button beside the filename.
When you finish editing a configuration and it’s ready to use, apply the updated
configuration to the cluster. To apply a cluster configuration, click Apply
on the Cluster configuration page. This sends the new configuration to the cluster
configuration storage, and it comes into effect upon the cluster configuration reload.
Managing cluster users and roles
Tarantool Cluster Manager provides a visual interface for managing Tarantool users and roles
on connected clusters.
The Tarantool access model defines user access to entities
inside a single instance. Thus, to create or alter a cluster-wide user or role, you need to
do this on all cluster instances. In replication clusters, changes in the access model
are possible only on read-write instances (replica set leaders). Changes made on
a leader instance are propagated to all instances of its replica set automatically.
Operations on the cluster access model are possible only if the user
that TCM uses to connect to the cluster has the privileges to manage users and roles.
You can also manage Tarantool users and roles from TCM using the Lua API
as described in Access control. To do this, connect to instance consoles
from the Terminal tab of the instance page.
The tools for managing cluster users are located on the Users tab
of the instance page.
Important
To ensure access model consistency across the cluster, repeat all user
management operations on all read-write instances of the cluster.
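To pick the instances on which to repeat an access model change, you need every read-write instance (in a typical leader-based setup, one per replica set). The sketch below assumes hypothetical instance records (name, replica set, read-only flag) similar to what the Stateboard shows; it only illustrates the selection rule and does not talk to TCM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record mirroring the Stateboard columns.
    name: str
    replicaset: str
    read_only: bool


def targets_for_access_change(instances):
    """Return read-write instances (replica set leaders), sorted for a stable order.

    Changes made on a leader are propagated to the rest of its replica set,
    so repeating the operation on every leader covers the whole cluster.
    """
    return sorted(
        (i for i in instances if not i.read_only),
        key=lambda i: (i.replicaset, i.name),
    )
```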
To create a user on a cluster:
- Go to Stateboard.
- Find a replica set leader in the instances list and click it to open the instance page.
- Go to the Users tab and click Add user.
To edit or delete a user, click the Edit or Delete button next to the username
in the Users table.
To edit a user’s privileges:
- Click the lock icon next to the username in the Users table.
- In the privileges dialog:
  - Click Add to grant privileges.
  - Click Revoke (the trash bin icon) to revoke a privilege.
The tools for managing cluster roles are located on the Users tab
of the instance page.
Important
To ensure access model consistency across the cluster, repeat all role
management operations on all read-write instances of the cluster.
To create a role on a cluster:
- Go to Stateboard.
- Find a replica set leader in the instances list and click it to open the instance page.
- Go to the Users tab and click Add role.
To delete a role, click the Delete button next to the role name in the Roles table.
To edit a role’s privileges:
- Click the lock icon next to the role name in the Roles table.
- In the privileges dialog:
  - Click Add to grant privileges.
  - Click Revoke (the trash bin icon) to revoke a privilege.
Security settings
Tarantool Cluster Manager includes a web interface for managing security settings of connected
clusters. It is available on the Cluster > Security page. On this page,
you can manage the following security features in the cluster:
- Authentication settings: protocol (CHAP or PAP), number of retries, and
the delay after a failed authentication attempt (security.auth_*
configuration options). To learn more about Tarantool authentication settings, see Authentication.
- Password policy: minimum password length, required characters, expiration
period, and other settings (security.password_*
configuration options). To learn more about Tarantool password policy, see Password policy.
- Guest access: whether unauthenticated or guest
users can connect to the cluster (security.disable_guest
configuration option).
- Secure erasing: whether to delete data files securely so that they cannot be restored
(security.secure_erasing configuration option).
- Audit log: configure audit logging in the cluster
(audit_log.* configuration options).
To learn how to manage audit logging in the cluster, see Audit module.
Viewing cluster metrics
In Tarantool Cluster Manager, you can view metrics of connected clusters in real time on the
Cluster > Cluster metrics page. The list of metrics that Tarantool exposes
is provided in the Metrics reference.
Metrics are displayed one by one. To view a metric, select it in the drop-down list
at the top of the page. Then, choose a way to visualize it:
- Chart: a time series chart with the metric values displayed as lines.
- Table: a table where the metric values are displayed as numbers in table cells.
Once you select a metric, TCM starts visualizing its current values, updating them
once per second. To pause the visualization, click the button to the left of
the metrics selector. To stop the visualization, clear the metric selection.
To view metrics of a specific instance, find this instance on the Stateboard,
click its name, and go to the Metrics tab of the instance page.
Monitoring metrics with Prometheus
To let external systems, such as Prometheus, collect cluster metrics,
TCM provides HTTP endpoints at /api/metrics/<clusterId>.
Note
Cluster IDs are shown in the cluster selection dialog that opens when you click
Cluster at the top of the left navigation pane.
To access such an endpoint, a request must be authorized with an API token
that has the cluster.metrics permission on the target cluster.
Below is an example of a Prometheus scrape configuration that collects metrics of
a Tarantool cluster from TCM:
- job_name: "tarantool"
  static_configs:
    - targets: ["127.0.0.1:8080"]
  metrics_path: "/api/metrics/00000000-0000-0000-0000-000000000000"
  bearer_token: QgMPZ22JZ3uw7n0QTbqYGAQDmNDs1JnTkhaC1OlQzWM3utmpV78b23GG97zp8YE3
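The same request can be built by hand, for example to check a token before configuring Prometheus. This is a minimal Python sketch: the TCM address, cluster ID, and token are placeholders, and metrics_request is a hypothetical helper. It sets the Authorization: Bearer <token> header that the Bearer authentication scheme requires.

```python
from urllib.request import Request


def metrics_request(base_url: str, cluster_id: str, token: str) -> Request:
    """Build a GET request for /api/metrics/<clusterId> with Bearer auth.

    base_url, cluster_id, and token are placeholders to replace with your
    TCM address, the cluster ID, and an API token with cluster.metrics.
    """
    url = f"{base_url.rstrip('/')}/api/metrics/{cluster_id}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

To send the request, pass it to urllib.request.urlopen(); this requires a running TCM with the API tokens feature enabled.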
Using supervised failover
For Tarantool clusters that use supervised failover,
Tarantool Cluster Manager offers tools for interaction with external failover coordinators from its web interface.
The tools for using supervised failover are located on the Failovers page
available from the Actions menu on the cluster stateboard.
Note
TCM can interact with failover coordinators that are already running.
There is no way to start or stop coordinators from TCM.
Viewing failover coordinators
To view failover coordinators running on the cluster, go to the Failovers tab.
On this tab, you can see the information about all Tarantool instances that the cluster
uses as failover coordinators. The information includes:
- Current coordinator status – Active or Not active
- PID – process ID
- Hostname – the host on which the coordinator is running
- UUID – the coordinator ID
- Term – a value that defines the order in which coordinators become active
  (take the lock) over time
Executing failover commands
To send a failover command to a coordinator, go to the Commands tab and click Add.
Then, provide the command description in the YAML format. It can include the following
fields:
- command – the command name. Possible value: switch (switch the master
  in a replica set).
- new_master – the name of the instance to make the new master.
- timeout – the command execution timeout.
Example:
command: switch
new_master: instance-002
timeout: 30
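If you send switch commands often, you may want to generate the YAML text instead of typing it. The helper below is a hypothetical sketch: it emits only the three fields documented above, and its input checks (non-empty instance name, positive timeout) are our own assumptions rather than TCM's validation rules. It does not quote values, so instance names containing YAML special characters are out of scope.

```python
def switch_command(new_master: str, timeout: int) -> str:
    """Render a failover switch command as YAML text (hypothetical helper)."""
    # Assumed sanity checks; TCM may validate differently.
    if not new_master:
        raise ValueError("new_master must be a non-empty instance name")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    lines = [
        "command: switch",
        f"new_master: {new_master}",
        f"timeout: {timeout}",
    ]
    return "\n".join(lines) + "\n"
```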
After entering the command, click Save to send the command for execution.
Tarantool assigns an id to the command and waits for the active coordinator to process the command.
All failover commands executed on the cluster are shown on the Commands tab with
their ids and statuses. A command can have the following statuses:
- taken – a failover coordinator has started the command execution.
- success – the command has completed successfully.
- failed – an error occurred during the command execution.
  A short error description is shown in the Reason field.
To see the command execution details, click this command in the list.
TCF integration
Tarantool Cluster Manager provides a web interface for clusters that run within Tarantool Clusters Federation.
It is available on the Cluster > TCF page. If a connected cluster is
configured to run in a TCF installation, this page shows information about both
clusters in this installation: their IDs, names, and statuses. To switch cluster
states in TCF, click Toggle on the TCF page.
To learn more about Tarantool Clusters Federation, see its documentation.
Note
For individual clusters, the TCF page is empty.
Accessing cluster data
Tarantool Cluster Manager provides access to data stored in connected clusters through its
web interface. You can view, add, edit, and delete tuples from spaces.
Data access is implemented in TCM on a per-instance basis: you can access
data stored on one cluster instance at a time. For sharded clusters that use the
CRUD module,
it’s also possible to access data throughout the whole cluster.
There are the following ways to access data stored on a cluster instance from TCM:
- Instance explorer displays the instance’s spaces as tables in the web interface
- SQL terminal allows executing SQL statements on the instance
- Tarantool and tt consoles allow accessing the data using the Lua API
Important
Data modification is possible only on instances in the read-write mode (replica set leaders).
Changes are applied to read-only replicas in accordance with the cluster topology.
The instance explorer provides access to all spaces that exist on the instance
through the web interface. This includes both system and user spaces.
To open the instance explorer:
- Go to Stateboard.
- Click the instance row in the instances list or its graph vertex in the graph view.
- Click Explorer in the Actions menu of the instance details page.
To view tuples of a space, click its row in the spaces list.
To add a new tuple, click + on the space page and provide tuple field values
in the Lua format, for example, [ 1, 1000, true, "test"].
To edit a tuple, click it in the table and then click Edit.
To delete a tuple, select it in the table and click Delete (the trash bin button).
In the development mode, you can also create, edit, truncate, and delete spaces
in the instance explorer. To create a space, click Add and follow the wizard steps.
To edit, truncate, or remove a space, click the corresponding button in the Actions
menu of the space row in the table.
TCM features an SQL terminal that you can use to access stored data. It is located
on the SQL tab of the instance details page. In the SQL terminal, you can execute
any supported SQL statements on the selected instance.
For SELECT queries, you can also download the query result set in the CSV format.
To learn more about using SQL in Tarantool, see the SQL tutorial.
For sharded clusters that use the CRUD module,
it’s possible to access stored data throughout the cluster on the Cluster > Tuples page.
This page displays only user spaces.
To view all tuples of a space in a sharded cluster, click the space row in the list.
To add a new tuple, click + on the space page and provide tuple field values
in the Lua format, for example, [ 1, 1000, true, "test"]. When you add a tuple
in a sharded cluster, it is stored on the replica set that owns the bucket
specified in its bucket_id field.
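In vshard, bucket_id is normally computed from a sharding key with a hash function, and each bucket is stored on exactly one replica set. The sketch below illustrates that two-step mapping under explicit assumptions: zlib.crc32 stands in for vshard's own hash functions (such as vshard.router.bucket_id_strcrc32), so the numbers will not match a real cluster, and the even split of buckets between replica sets is hypothetical, since the vshard rebalancer moves buckets over time.

```python
import zlib

BUCKET_COUNT = 3000  # vshard's default bucket_count


def bucket_id(key: str, bucket_count: int = BUCKET_COUNT) -> int:
    """Map a sharding key to a bucket ID in 1..bucket_count.

    Stand-in hash: zlib.crc32, NOT the hash vshard actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % bucket_count + 1


def replicaset_for(bucket: int, replicasets: list, bucket_count: int = BUCKET_COUNT) -> str:
    """Hypothetical static split: contiguous, equal-sized bucket ranges per replica set."""
    per_rs = -(-bucket_count // len(replicasets))  # ceiling division
    return replicasets[(bucket - 1) // per_rs]
```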
To edit a tuple, click it in the table and then click Edit.
To delete a tuple, select it in the table and click Delete (the trash bin button).
Creating spaces in sharded clusters
To create a space in a sharded cluster, create it on all read-write cluster instances
on their Instance explorer pages.
Important
Sharded spaces must include the bucket_id field of the unsigned type
and a non-unique index by this field with the same name.
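Before creating a space on each read-write instance, you can check its definition against this requirement. The sketch below uses a hypothetical dictionary layout that loosely mirrors a Tarantool space format (field names and types) and its indexes; it is not a structure that TCM or Tarantool accepts directly. Indexes are treated as unique unless stated otherwise, matching Tarantool's default.

```python
def sharding_errors(space: dict) -> list:
    """Return problems with a (hypothetical) space definition; [] means OK."""
    errors = []
    # Check the bucket_id field and its type.
    fields = {f["name"]: f["type"] for f in space.get("format", [])}
    if "bucket_id" not in fields:
        errors.append("missing field 'bucket_id'")
    elif fields["bucket_id"] != "unsigned":
        errors.append("field 'bucket_id' must be of type 'unsigned'")

    # Check the index named bucket_id: non-unique and built on bucket_id.
    index = next((i for i in space.get("indexes", []) if i["name"] == "bucket_id"), None)
    if index is None:
        errors.append("missing index 'bucket_id'")
    else:
        if index.get("unique", True):  # unique unless explicitly disabled
            errors.append("index 'bucket_id' must be non-unique")
        if index.get("parts") != ["bucket_id"]:
            errors.append("index 'bucket_id' must be built on the 'bucket_id' field")
    return errors
```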
To edit, truncate, or delete spaces in a sharded cluster, perform the corresponding
action on all read-write cluster instances.
Access control
Tarantool Cluster Manager provides means for managing the access of users and client applications
to its own functions and connected clusters:
Role-based access control
Tarantool Cluster Manager features a role-based access control system. It enables flexible
management of access to TCM functions, connected clusters, and stored data.
The TCM access system uses three main entities: permissions, roles,
and users (or user accounts). They work as follows:
- Permissions correspond to specific functions or objects in
TCM (administrative permissions) or operations on clusters (cluster permissions).
- Roles are predefined sets of administrative permissions to
assign to users.
- Users have roles that define their access rights to TCM functions and objects, and
cluster permissions that are assigned for each cluster individually.
Note
TCM users, roles, and permissions are not to be confused with similar subjects
of the Tarantool access control system. To access Tarantool
instances directly, Tarantool users with corresponding roles are required.
Permissions define access to specific actions that users can do in TCM. For example,
there are permissions to view connected clusters or to manage users.
There are two types of permissions in TCM: administrative and cluster permissions.
Administrative permissions provide access to TCM functions. They define which
pages and controls are available to users in the web UI. Typically, read permissions
define pages shown in the left menu. Write permissions define the availability
of controls for managing objects on the pages.
For example, users with read permission to clusters can view the Clusters page
but they don’t see Add, Edit, or Remove buttons unless they have the
write permission.
Administrative permissions are assigned to users through roles.
Cluster permissions enable actions with connected Tarantool clusters.
These permissions are granted to users on a per-cluster level: each user has a separate
set of permissions for each cluster.
Cluster permissions define which pages of the Cluster menu section users
see and what actions they can take on these pages.
For example, users with the read configuration permission to a cluster configuration
see the Configuration page when this cluster is selected.
Cluster permissions are assigned to users individually when creating or editing them.
For fine-grained control over user access to particular spaces and functions stored
in clusters, use the access control list.
Permissions are predefined in TCM; there is no way to change, add, or delete them.
The complete lists of administrative and cluster permissions in TCM are provided
in the Permissions reference.
Roles are groups of administrative permissions
that are assigned to users together.
The assigned roles define pages that users see in TCM and actions available
on these pages.
Note
Roles don’t include cluster permissions. Access to connected clusters
is configured for each user individually.
TCM comes with default roles that cover three common usage scenarios:
- Super Admin Role is a default role with all available
administrative permissions. Additionally, users with this role automatically gain
all cluster permissions to all clusters.
- Cluster Admin Role is a default role for cluster administration. It includes
administrative permissions for cluster management.
- Default User Role is a default role for working with clusters. It includes
basic administrative read permissions that are required to log in to TCM
and navigate to a cluster.
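The way roles and cluster permissions combine can be expressed as a small model. This sketch is illustrative only: the role contents and user records are hypothetical, while the permission names come from the Permissions reference. Administrative permissions are the union over the user's roles; cluster permissions come from per-cluster assignments, except for Super Admin Role, which grants all cluster permissions to all clusters.

```python
from dataclasses import dataclass, field

# Cluster permission names from the Permissions reference.
ALL_CLUSTER_PERMISSIONS = frozenset({
    "cluster.config.read", "cluster.config.write", "cluster.stateboard.read",
    "cluster.func.read", "cluster.func.write", "cluster.func.call",
    "cluster.space.read", "cluster.space.write",
    "cluster.space.data.read", "cluster.space.data.write",
    "cluster.failover.read", "cluster.failover.write",
    "cluster.terminal", "cluster.sql", "cluster.metrics",
})


@dataclass
class Role:
    name: str
    admin_permissions: frozenset
    grants_all_clusters: bool = False  # True only for Super Admin Role


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)
    # cluster ID -> cluster permissions assigned to this user individually
    cluster_permissions: dict = field(default_factory=dict)


def admin_permissions(user: User) -> set:
    """Union of administrative permissions over all of the user's roles."""
    result = set()
    for role in user.roles:
        result |= role.admin_permissions
    return result


def cluster_permissions(user: User, cluster_id: str) -> set:
    """Cluster permissions for one cluster; roles add none except Super Admin."""
    if any(role.grants_all_clusters for role in user.roles):
        return set(ALL_CLUSTER_PERMISSIONS)
    return set(user.cluster_permissions.get(cluster_id, ()))
```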
Administrators can create new roles, edit, and delete existing ones.
Roles are listed on the Roles page.
To create a new role, click Add, enter the role name, and select the permissions
to include in the role.
To edit an existing role, click Edit in the Actions menu of the corresponding
table row.
To delete a role, click Delete in the Actions menu of the corresponding
table row.
Note
You can delete a role only if there are no users with this role.
TCM users gain access to objects and actions through assigned roles
and cluster permissions.
A user can have any number of roles or none of them. Users without roles
have access only to clusters that are assigned to them.
TCM uses password authentication for users. For information on password management,
see the Passwords section below.
There is one default user Default Admin. It has all the available permissions,
both administrative and cluster ones. When new clusters are added in TCM,
Default Admin automatically receives all cluster permissions for them as well.
Administrators can create new users, edit, and delete existing ones.
The tools for managing users are located on the Users page.
To create a user:
- Click Add.
- Fill in the user information: username, full name, and description.
- Generate or type in a password.
- Select roles to assign to the user.
- Add clusters to give the user access to, and select cluster permissions for
each of them.
To edit a user, click Edit in the Actions menu of the corresponding table row.
To delete a user, click Delete in the Actions menu of the corresponding table row.
TCM uses the general term secret for user authentication keys. A secret is any
pair of a public and a private key that can be used for authentication. A password
combined with a username is a secret type used for TCM user authentication.
In this case, the public key is a username, and the private key is a password.
Users receive their first passwords during their account creation.
All passwords are governed by the password policy.
It can be flexibly configured to follow the security requirements of your organization.
To change your own password, click your name in the top-right corner and go to
Settings > Change password.
Changing users’ passwords
Administrators can manage a user’s password on this user’s Secrets page.
To open it, click Secrets in the Actions menu of the corresponding Users table row.
To change a user’s password, click Edit in the Actions menu of the corresponding
Secrets table row and enter the new password in the New secret key field.
Passwords expire automatically after the expiration period defined in the password policy.
When a user logs in to TCM with an expired password, the only action available to
them is a password change. All other TCM functions and objects are unavailable until
the new password is set.
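The expiration rule, together with the Password expiration warning setting of the password policy, can be sketched as a function of password age. The parameter names and the returned state and action names are hypothetical.

```python
from datetime import datetime, timedelta


def password_state(set_at: datetime, now: datetime, expiration_days: int, warning_days: int) -> str:
    """Classify a password as "ok", "warn", or "expired" by its age."""
    age = now - set_at
    if age >= timedelta(days=expiration_days):
        return "expired"
    if age >= timedelta(days=warning_days):
        return "warn"  # the user sees a warning that the password expires soon
    return "ok"


def allowed_actions(state: str) -> set:
    """With an expired password, a password change is the only available action."""
    if state == "expired":
        return {"change_password"}
    return {"change_password", "use_tcm"}
```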
Administrators can also set users’ passwords to expired manually.
To set a user’s password to expired, click Expire in the Actions
menu of the corresponding Secrets table row.
Important
Password expiration can’t be reverted.
To forbid users’ access to TCM, administrators can temporarily block their
passwords. A blocked password can’t be used to log in to TCM until it’s
unblocked manually or the blocking period expires.
To block a user’s password, click Block in the Actions menu of the corresponding
Secrets table row. Then provide a blocking reason and enter the blocking period.
To unblock a blocked password, click Unblock in the Actions menu of the corresponding
Secrets table row.
Password policy helps improve security and comply with security requirements that
can apply to your organization.
You can edit the TCM password policy on the Password policy page.
There are the following password policy settings:
- Minimal password length.
- Do not use last N passwords.
- Password expiration in days. Users’ passwords expire
after this number of days since they were set. Users with expired passwords
lose access to any objects and functions except password change until they set
a new password.
- Password expiration warning in days. After this number of days, the user
sees a warning that their password expires soon.
- Block after N login attempts. Temporarily block users if they enter their
username or password incorrectly this number of times consecutively.
- User lockout time in seconds. The time interval during which users can’t log
in after they use up all allowed login attempts.
- Password must include. Characters and symbols that must be present in passwords:
- Lowercase characters (a-z)
- Uppercase characters (A-Z)
- Digits (0-9)
- Symbols (such as !@#$%^&*()_+№”’:,.;=][{}`?>/.)
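The checks above can be sketched as a single validation function. This is not TCM's implementation: the PasswordPolicy field names are hypothetical, the sketch covers only the settings that apply when a password is set (length, history, and required characters), and counting any non-alphanumeric, non-space character as a symbol is an assumption.

```python
import string
from dataclasses import dataclass


@dataclass
class PasswordPolicy:
    # Hypothetical field names mirroring the settings listed above.
    min_length: int = 8
    history: int = 3  # "Do not use last N passwords"
    require_lower: bool = True
    require_upper: bool = True
    require_digit: bool = True
    require_symbol: bool = False


def policy_violations(password: str, previous: list, policy: PasswordPolicy) -> list:
    """Return human-readable violations; an empty list means the password passes.

    `previous` holds earlier passwords, oldest first. A real system would keep
    only hashes of them; plaintext keeps this sketch short.
    """
    problems = []
    if len(password) < policy.min_length:
        problems.append(f"shorter than {policy.min_length} characters")
    # Guard history > 0: previous[-0:] would otherwise mean the whole list.
    if policy.history > 0 and password in previous[-policy.history:]:
        problems.append(f"matches one of the last {policy.history} passwords")
    if policy.require_lower and not any(c in string.ascii_lowercase for c in password):
        problems.append("no lowercase letter")
    if policy.require_upper and not any(c in string.ascii_uppercase for c in password):
        problems.append("no uppercase letter")
    if policy.require_digit and not any(c in string.digits for c in password):
        problems.append("no digit")
    # Assumption: any non-alphanumeric, non-space character counts as a symbol.
    if policy.require_symbol and not any(not c.isalnum() and not c.isspace() for c in password):
        problems.append("no symbol")
    return problems
```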
Administrative permissions
The following administrative permissions are available in TCM:
| Permission | Description |
| --- | --- |
| admin.clusters.read | View connected clusters’ details |
| admin.clusters.write | Edit cluster details and add new clusters |
| admin.users.read | View users’ details |
| admin.users.write | Edit user details and add new users |
| admin.roles.read | View roles’ details |
| admin.roles.write | Edit roles and add new roles |
| admin.addons.read | View add-ons |
| admin.addons.write | Edit add-on flags |
| admin.addons.upload | Upload new add-ons |
| admin.auditlog.read | View audit log configuration and read audit log in TCM |
| admin.auditlog.write | Edit audit log configuration |
| admin.sessions.read | View users’ sessions |
| admin.sessions.write | Revoke users’ sessions |
| admin.ldap.read | View LDAP configurations |
| admin.ldap.write | Manage LDAP configurations |
| admin.passwordpolicy.read | View password policy |
| admin.passwordpolicy.write | Manage password policy |
| admin.secrets.read | View information about users’ secrets |
| admin.secrets.write | Manage users’ secrets: add, edit, expire, block, delete |
| user.password.change | User’s permission to change their own password |
| user.api-token.read | User’s permission to read their own API tokens information |
| user.api-token.write | User’s permission to modify their own API tokens |
| admin.metrics | Read TCM metrics |
| admin.acl.read | View the access control list (ACL) |
| admin.acl.write | Add and delete ACL entries |
The following cluster permissions are available in TCM:
| Permission | Description |
| --- | --- |
| cluster.config.read | View cluster configuration |
| cluster.config.write | Manage cluster configuration |
| cluster.stateboard.read | View cluster stateboard |
| cluster.func.read | View cluster’s stored functions |
| cluster.func.write | Edit cluster’s stored functions |
| cluster.func.call | Execute stored functions on cluster instances |
| cluster.space.read | Read cluster data schema |
| cluster.space.write | Modify cluster data schema |
| cluster.space.data.read | Read stored data from cluster |
| cluster.space.data.write | Edit stored data on cluster |
| cluster.failover.read | Read cluster failover information |
| cluster.failover.write | Write cluster failover commands |
| cluster.terminal | Connect to cluster instances with tt terminal from TCM |
| cluster.sql | Execute SQL queries |
| cluster.metrics | View cluster metrics |
LDAP authentication
In addition to its internal role-based access control model,
Tarantool Cluster Manager can use an external LDAP (Lightweight Directory Access Protocol)
directory server for user authentication and authorization.
When LDAP authentication is enabled, TCM uses a connected LDAP directory server
to authenticate users who submit the login form. TCM constructs requests to
the server according to the configuration parameters described on this page. Permissions
of LDAP users in TCM are defined by LDAP group mapping.
Both LDAP and secure LDAPS (LDAP over TLS) protocols are supported.
Enabling LDAP authentication
LDAP authentication can be enabled using either of two configuration methods:
- Enabling via CLI – set the
security.auth option to include ldap in the TCM YAML config or as a CLI flag.
- Enabling via web interface – starting from version 1.4.0, you can enable LDAP authentication interactively in the TCM UI.
To allow LDAP user authentication in TCM, enable the ldap authentication method
in the security.auth configuration option before startup.
Note
If both authentication methods – LDAP and local – are enabled, TCM tries them
for each login attempt in the order they are specified in the configuration.
To enable LDAP authentication using the TCM web interface:
- Click the user icon in the top-right corner of the screen.
- Select Settings from the dropdown menu.
- Navigate to the Authentication methods tab.
- Check the box next to LDAP.
- Save the changes.
To enable LDAP user access to TCM, create an LDAP configuration that connects
TCM to the LDAP server that stores the users. An LDAP configuration
defines how TCM connects to the server and queries user data. To create an LDAP
configuration, go to the LDAP page in the Settings group and click Add.
To edit an LDAP configuration, click Edit in the Actions menu of the corresponding row.
To delete an LDAP configuration, click Delete in the Actions menu of the corresponding row.
Define the general configuration settings:
Enabled. Defines if the configuration is used. Turn the toggle off to
stop using the configuration.
Note
If there are several enabled LDAP configurations, TCM attempts to use them
for user authentication in the order they are created.
Automatically add non-existent users. By default, TCM automatically saves
LDAP user information to its backend store
upon their first login. Turn the toggle off if you don’t want to save users from this LDAP server.
Enter the LDAP server connection parameters:
- Endpoints. URLs of the LDAP server. Example:
127.0.0.1:5056.
- Request timeout. The timeout for TCM requests to the LDAP server, in seconds.
- Enabled TLS. If the server uses LDAPS, turn this toggle on and specify
TLS connection parameters, such as a certificate and a key file.
To define how TCM queries the LDAP server for user authentication and authorization,
fill in the fields of the Queries step:
Query user and Query password. Credentials of the LDAP user on whose behalf
all LDAP queries are executed: a distinguished name (DN) and a password.
Example DN:
cn=admin,cn=users,dc=tarantool,dc=io
Base DN. The DN of a directory that serves as a root for making all LDAP requests.
Example: dc=tarantool,dc=io.
Username regex. A regular expression that defines a username template for
this LDAP configuration. When a user enters their username on the login page,
TCM matches it against username regular expressions of all enabled LDAP
configurations and selects the one to use for this user authentication.
Example: a regex to match employee email addresses within the specified domain.
^([\w\-\.]+)@tarantool.io$
(Optional) Template DN. A template for building a DN to send in an authentication bind request.
Use the numbers in curly braces as placeholders to replace with username regex parts:
{0}, {1}, and so on.
Example:
cn={0},cn=users,dc=tarantool,dc=io
When used with the Username regex shown above, it substitutes {0} with
the username part of the email address (before @) entered into the login form.
For example, the username user1@tarantool.io forms the following DN for bind request:
cn=user1,cn=users,dc=tarantool,dc=io
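The selection and substitution steps can be sketched in Python. Assumptions: configurations are tried in creation order and the first enabled one whose Username regex matches is used; {0}, {1}, and so on are replaced with the regex capture groups in order, as the example above suggests. LdapConfig and bind_dn are hypothetical names, and the sketch only builds the DN; it does not contact an LDAP server.

```python
import re
from dataclasses import dataclass


@dataclass
class LdapConfig:
    # Hypothetical subset of an LDAP configuration.
    name: str
    enabled: bool
    username_regex: str
    template_dn: str


def bind_dn(username: str, configs: list):
    """Return (config name, bind DN) for the first matching config, or None."""
    for cfg in configs:  # assumed to be in creation order
        if not cfg.enabled:
            continue
        match = re.match(cfg.username_regex, username)
        if match is None:
            continue
        dn = cfg.template_dn
        # Replace {0}, {1}, ... with capture groups; unmatched groups become "".
        for i, part in enumerate(match.groups()):
            dn = dn.replace("{%d}" % i, part or "")
        return cfg.name, dn
    return None
```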
(Optional) Template query. A template for querying the LDAP server for the user’s DN.
It is used if Template DN is not provided.
Group query template. A template for querying groups to which a user belongs
for authorization purposes. Learn more in LDAP user permissions.
Example:
(&(objectCategory=person)(objectClass=user)(cn={0}))
Permissions of LDAP users in TCM are defined by the groups to which they belong.
You can map TCM administrative and cluster permissions
to LDAP groups on the Groups step of the configuration creation.
To assign permissions to an LDAP group, click Add group. In the dialog that opens,
enter the group name, for example, CN=Admins,CN=Builtin,DC=tarantool,DC=io.
Then, select the administrative permissions to grant to this group in the Permissions list.
To grant cluster permissions, click Add cluster. Select a cluster and the cluster
permissions to grant to the group. Save the group.
Each user has permissions of all LDAP groups to which they belong.
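Because a user gets the permissions of every group they belong to, the effective permissions are a union over groups. In the sketch below, GroupMapping and ldap_user_permissions are hypothetical names for what you configure on the Groups step; group names are compared as exact strings, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class GroupMapping:
    # Hypothetical model of one group configured on the Groups step.
    admin_permissions: set = field(default_factory=set)
    # cluster ID -> cluster permissions granted to the group
    cluster_permissions: dict = field(default_factory=dict)


def ldap_user_permissions(user_groups, mapping):
    """Return (admin permissions, {cluster ID: permissions}) as a union over groups."""
    admin = set()
    clusters = {}
    for group in user_groups:
        grant = mapping.get(group)
        if grant is None:
            continue  # groups without a mapping grant nothing
        admin |= grant.admin_permissions
        for cluster_id, perms in grant.cluster_permissions.items():
            # setdefault creates a fresh set, so the mapping is never mutated.
            clusters.setdefault(cluster_id, set()).update(perms)
    return admin, clusters
```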
Disabling LDAP configurations
To stop using an LDAP configuration, open its Edit page and turn off the Enabled toggle.
Access control list
Tarantool Cluster Manager access control list (ACL) determines user access to particular data
and functions stored in clusters. You can use it to allow or deny access to specific
stored objects one by one.
Each ACL entry specifies privileges that a TCM user has on a particular
space or function. There are three access privileges that can be granted in the ACL:
read, write, and execute (for stored functions only). The privileges work as follows:
- Spaces:
  - Read: the user sees the space and its tuples on the Tuples and Explorer pages.
  - Write: the user can add new tuples and edit existing tuples of the space.
- Functions:
  - Read: the user sees the function on the Functions tab of the instance details page.
  - Write: the user can edit or delete the function.
  - Execute: the user can call the function.
Important
User access to space data and stored functions is primarily defined by the
cluster permissions cluster.space.data.* and cluster.func.*.
ACL only increases the access control granularity to particular objects.
Make sure that users have these permissions before enabling ACL for them.
To granularly manage a user’s access to particular objects in a cluster, enable
the use of ACL in the user profile:
- Go to Users and click Edit in the Actions menu of the corresponding table row.
- In the user’s Clusters list, add a cluster on which you want to use ACL
or click the pencil icon if the cluster is already on the list.
- Select the Use Access Control List (ACL) checkbox and save changes.
- Repeat the two previous steps for each cluster on which you want to use ACL for this user.
- Click Update to save the user account.
If the user doesn’t exist yet, you can do the same when creating it.
Important
When ACL use is enabled for a user, this user loses access to all spaces and
functions of the selected cluster except the ones explicitly specified in the ACL.
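The combined effect of cluster permissions and the ACL can be sketched as a single check. The mapping from ACL privileges to cluster permissions (for example, execute to cluster.func.call) is our reading of the Permissions reference, and the data layout is hypothetical.

```python
# (object type, ACL privilege) -> cluster permission the user must also hold.
# Assumed mapping, based on the cluster permissions table.
REQUIRED_PERMISSION = {
    ("space", "read"): "cluster.space.data.read",
    ("space", "write"): "cluster.space.data.write",
    ("function", "read"): "cluster.func.read",
    ("function", "write"): "cluster.func.write",
    ("function", "execute"): "cluster.func.call",
}


def is_allowed(cluster_perms, acl_enabled, acl_entries, obj_type, name, privilege):
    """Decide access to one object on one cluster.

    acl_entries: set of (obj_type, name, privilege) granted to this user
    on this cluster.
    """
    needed = REQUIRED_PERMISSION.get((obj_type, privilege))
    if needed is None or needed not in cluster_perms:
        return False  # the cluster permission is always required
    if not acl_enabled:
        return True  # without ACL, the cluster permission alone decides
    # With ACL enabled, only explicitly listed objects are accessible.
    return (obj_type, name, privilege) in acl_entries
```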
The tools for managing ACL are located on the ACL page.
To add an ACL entry:
- Click Add.
- Select the user to whom you want to grant access.
- Select a cluster that stores the target object: a space or a function.
- Select the target object type and enter its name.
- Select the privileges you want to grant.
To delete an ACL entry, click Delete in the Actions menu of the corresponding table row.
API tokens
Tarantool Cluster Manager uses the Bearer HTTP authentication scheme with API tokens to authenticate
external applications’ requests to TCM. For example, these can be Prometheus
jobs that retrieve metrics of connected Tarantool clusters.
The API tokens functionality is disabled by default. To enable it, set the
feature.api-token configuration option to true.
Each TCM API token belongs to the user that created it and has the same access permissions.
Thus, if a user has permission to view a cluster’s metrics in TCM, this user’s
API tokens can be used to read this cluster’s metrics with Prometheus.
API tokens have expiration dates that are set during the token creation and cannot
be changed.
Note
Each user, including Default Admin and other administrators, can create only
their own tokens. There is no way to create a token for another user.
To create a TCM API token:
- Open the user settings by clicking the user’s name in the top-right corner.
- Go to the API tokens tab and click Add.
- Specify the token expiration date and an optional description and click Add.
The created token is shown in a dialog.
Important
An API token is shown only once after its creation. There is no way to view
it again after you close the dialog. Make sure to copy the token to a safe place.
To delete an API token, click Delete in the Actions menu of the corresponding
API tokens table row.
Administrators can also view information about users’ API tokens and delete them
on the Secrets page. To open a user’s secrets, click Secrets in the Actions
menu of the corresponding Users table row.
Sessions
Tarantool Cluster Manager administrators can view and revoke user sessions in the web interface.
All active sessions are listed on the Sessions page. To revoke a session, click
Revoke in the Actions menu of the corresponding table row.
To revoke all sessions of a TCM user, go to Users and click Revoke all sessions
in the Actions menu of the corresponding table row.
Audit log
Tarantool Cluster Manager provides the audit logging functionality for tracking user activity
and security-related events, such as:
- Successful and failed login attempts.
- Access to clusters, their configurations, data models, and stored data.
- Changes in the access control system: users, roles, passwords, LDAP configurations.
The complete list of TCM audit events is provided in Event types.
Note
TCM audit log records only events that happen in TCM itself.
For information about Tarantool audit logging, see Audit module.
Audit logging is disabled in TCM by default. To start recording events, you need
to enable and configure it.
The audit log stores event details in the JSON format. Each log entry contains the
event type, description, time, impacted objects, and other information that
may be used for incident investigation. The complete list of fields is provided in
Structure of audit log events.
TCM also provides a built-in interface for reading and searching the audit log.
For details, see Viewing audit log.
To enable audit logging in TCM, go to Audit settings and click Enable.
To additionally send audit log events to the standard output, click Send to stdout.
TCM audit events can be logged to a local file or sent to a
syslog server.
To configure audit logging, go to Audit settings.
To write TCM audit logs to a file:
- Go to Audit settings and select the file protocol.
- Specify the name of the audit log file. The file appears in the TCM working directory.
- Configure log file rotation: the maximum file size and age, and the number
of files to keep.
- (Optional) Enable compression of audit log files.
Configuration parameters:
- Output file name. The name of the audit log file. Default: audit.log
- Max size (in MB). The maximum size of the log file before it gets rotated, in megabytes. Default: 100.
- Max backups. The maximum number of stored audit log files. Default: 10.
- Max age (in days). The maximum age of audit log files in days. Default: 30.
- Compress. Compress audit log files into gzip archives when rotating.
If you use a centralized log management system based on syslog,
you can configure TCM to send its audit log to your syslog server:
- Go to Audit settings and select the syslog protocol.
- Enter the syslog server URI and select the network protocol. Typically,
syslogd listens on port 514 and uses the UDP protocol.
- Specify the syslog logging parameters: timeout, priority, and facility.
Configuration parameters:
- Protocol. The network protocol used for connecting to the syslog server. Default: udp.
- Output. The syslog server URI. Default: 127.0.0.1:514 (localhost).
- Timeout. The syslog write timeout in the ISO 8601 duration format. Default: PT2S (two seconds).
- Priority. The syslog severity level. Default: info.
- Facility. The syslog facility. Default: local0.
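The Priority and Facility settings combine into the PRI number that every syslog message starts with: facility code * 8 + severity code (RFC 5424, section 6.2.1). The Go sketch below computes it for the defaults info and local0. The lookup tables use the standard syslog keywords; that TCM accepts exactly these spellings for other levels and facilities is an assumption, and the function name is hypothetical.

```go
package main

import "fmt"

// Standard syslog severity codes (RFC 5424, section 6.2.1).
var severities = map[string]int{
	"emerg": 0, "alert": 1, "crit": 2, "err": 3,
	"warning": 4, "notice": 5, "info": 6, "debug": 7,
}

// Facility codes of the local0..local7 facilities (RFC 5424).
var facilities = map[string]int{
	"local0": 16, "local1": 17, "local2": 18, "local3": 19,
	"local4": 20, "local5": 21, "local6": 22, "local7": 23,
}

// pri returns the PRI value, facility*8 + severity, that prefixes
// each syslog message in angle brackets.
func pri(facility, severity string) (int, error) {
	f, ok := facilities[facility]
	if !ok {
		return 0, fmt.Errorf("unknown facility %q", facility)
	}
	s, ok := severities[severity]
	if !ok {
		return 0, fmt.Errorf("unknown severity %q", severity)
	}
	return f*8 + s, nil
}

func main() {
	p, err := pri("local0", "info") // TCM default facility and priority
	if err != nil {
		panic(err)
	}
	fmt.Printf("<%d>\n", p)
}
```

With the defaults, audit messages reach the syslog server with the <134> prefix (16 * 8 + 6).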
Selecting audit events to record
When the audit log is enabled, TCM records all audit events listed in Event types.
To decrease load and make the audit log comply with specific security
requirements, you can record only selected events. For example, these can be events
of user account management or events of cluster data access.
To select events to record into the audit log, go to Audit settings and
enter their types into the Filters field
one by one, pressing Enter after each type.
To remove an event type from the filter list, click the cross icon beside it.
If the audit log is written to a file, you can view it in TCM on the Audit log page.
On this page, you can view or search for events.
To view the details of a logged audit event, click the corresponding line in the
table.
To search for an event, use the search bar at the top of the page. Note that the
search is case-sensitive. For example, to find events with the ALARM severity,
enter ALARM, not alarm.
Structure of audit log events
All entries of the TCM audit log include the mandatory fields listed in the table below.
| Field | Description | Example |
| time | Time of the event | 2023-11-23T12:05:27.099+07:00 |
| severity | Event severity: VERBOSE, INFO, WARNING, or ALARM | INFO |
| type | Audit event type | user.update |
| description | Human-readable event description | Update user |
| uuid | Event UUID | f8744f51-5760-40c3-ae2d-0b4d6b44836f |
| user | UUID of the user who triggered the event | 942a4f54-cf7f-4f46-80ce-3511dbbb57b7 |
| remote | Remote host that triggered the event | 100.96.163.226:48722 |
| host | The TCM host on which the event happened | 100.96.163.226:8080 |
| userAgent | Information about the client application and platform that was used to trigger the event | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 |
| permission | The permission that was used to trigger the event | ["admin.users.write"] |
| result | Event result: ok or nok | ok |
| err | Human-readable error description for events with nok result | failed to login |
| fields | Additional fields for specific event types in the key-value format | Key examples: clusterId in cluster-related events; payload in events that include sending data to the server; username in current.* or auth.* events |
This is an example of an audit log entry on a successful login attempt:
{
"time": "2023-11-23T12:01:27.247+07:00",
"severity": "INFO",
"description": "Login user",
"type": "current.login",
"uuid": "4b9c2dd1-d9a1-4b40-a448-6bef4a0e5c79",
"user": "",
"remote": "127.0.0.1:63370",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
"host": "127.0.0.1:8080",
"permissions": [],
"result": "ok",
"fields": [
{
"Key": "username",
"Value": "admin"
},
{
"Key": "method",
"Value": "null"
},
{
"Key": "output",
"Value": "true"
}
]
}
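If you post-process the audit log file with your own tools, an entry like the one above can be decoded with Go’s encoding/json package. This is a minimal sketch with hypothetical type and function names. The JSON keys follow the sample entry, which uses user-agent and permissions, whereas the field table lists userAgent and permission; check your own log and adjust the struct tags if needed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuditField is one key-value pair from the "fields" array.
type AuditField struct {
	Key   string `json:"Key"`
	Value string `json:"Value"`
}

// AuditEntry holds the fields of a TCM audit log entry.
// The JSON keys follow the sample entry above.
type AuditEntry struct {
	Time        string       `json:"time"`
	Severity    string       `json:"severity"`
	Description string       `json:"description"`
	Type        string       `json:"type"`
	UUID        string       `json:"uuid"`
	User        string       `json:"user"`
	Remote      string       `json:"remote"`
	UserAgent   string       `json:"user-agent"`
	Host        string       `json:"host"`
	Permissions []string     `json:"permissions"`
	Result      string       `json:"result"`
	Err         string       `json:"err"`
	Fields      []AuditField `json:"fields"`
}

// Field looks up a key in the entry's "fields" array.
func (e AuditEntry) Field(key string) (string, bool) {
	for _, f := range e.Fields {
		if f.Key == key {
			return f.Value, true
		}
	}
	return "", false
}

// parseEntry decodes one JSON-encoded audit log entry.
func parseEntry(data []byte) (AuditEntry, error) {
	var e AuditEntry
	err := json.Unmarshal(data, &e)
	return e, err
}

// sample is a shortened copy of the login entry shown above.
const sample = `{"time": "2023-11-23T12:01:27.247+07:00", "severity": "INFO",
 "type": "current.login", "result": "ok", "permissions": [],
 "fields": [{"Key": "username", "Value": "admin"}, {"Key": "output", "Value": "true"}]}`

func main() {
	e, err := parseEntry([]byte(sample))
	if err != nil {
		panic(err)
	}
	user, _ := e.Field("username")
	fmt.Printf("%s %s user=%s result=%s\n", e.Time, e.Type, user, e.Result)
}
```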
The following table lists all possible values of the type field of TCM
audit log events.
| Event type | Description |
| auth.fail | Authentication failed |
| auth.ok | Authentication successful |
| access.denied | An attempt to access an object without the required permission |
| crud.insert | Data inserted via CRUD operations |
| crud.delete | Data deleted via CRUD operations |
| user.add | User added |
| user.update | User updated |
| user.delete | User deleted |
| secret.add | User secret added |
| secret.update | User secret updated |
| secret.block | User secret blocked |
| secret.unblock | User secret unblocked |
| secret.delete | User secret deleted |
| secret.expire | User secret expired |
| session.revoke | Session revoked |
| session.revokeuser | All user’s sessions revoked |
| explorer.insert | Data inserted in a cluster |
| explorer.delete | Master switched manually |
| test.devmode | Switched to development mode |
| auditlog.config | Audit log configuration changed |
| passwordpolicy.save | Password policy changed |
| passwordpolicy.resetpasswords | All passwords are expired by an administrator |
| ddl.save | Cluster data model saved |
| ddl.apply | Cluster data model applied |
| cluster.config.save | Cluster configuration saved |
| cluster.config.reset | Saved cluster configuration reset |
| cluster.config.apply | Cluster configuration applied |
| current.logout | User logged out their own session |
| current.revoke | User revoked their own session |
| current.revokeall | User revoked all their active sessions |
| current.changepassword | User changed their password |
| role.add | Role added |
| role.update | Role updated |
| role.delete | Role deleted |
| cluster.add | Cluster added |
| cluster.update | Cluster updated |
| cluster.delete | Cluster removed |
| ldap.testlogin | Login test executed for an LDAP configuration |
| ldap.testconnection | Connection test executed for an LDAP configuration |
| ldap.add | LDAP configuration added |
| ldap.update | LDAP configuration updated |
| ldap.delete | LDAP configuration deleted |
| addon.enable | Add-on enabled |
| addon.disable | Add-on disabled |
| addon.delete | Add-on removed |
| tcmstate.save | Low-level information saved in the TCM storage (for debug purposes) |
| tcmstate.delete | Low-level information deleted from the TCM storage (for debug purposes) |
Configuration
This topic describes how to configure Tarantool Cluster Manager. For the complete
list of TCM configuration parameters, see the TCM configuration reference.
Note
To learn about Tarantool cluster configuration, see Configuration.
Tarantool Cluster Manager configuration is a set of parameters that define various aspects
of TCM functioning. Parameters are grouped by the particular aspect that they
affect. There are the following groups:
- HTTP
- logging
- configuration storage
- security
- add-ons
- limits
- TCM running mode
Parameter groups can be nested. For example, in the http group there are
tls and websession-cookie groups, which define TLS encryption and
cookie settings.
Parameter names are the full paths from the top-level group to the specific parameter.
For example:
http.host is the host parameter that is defined directly in the http group.
http.tls.enabled is the enabled parameter that is defined in the tls
nested group within http.
Ways to pass configuration parameters
There are three ways to pass TCM configuration parameters:
- a YAML file
- environment variables
- command-line options of the TCM executable
TCM configuration can be stored in a YAML file. Its structure must reflect the
configuration parameters hierarchy.
The example below shows a fragment of a TCM configuration file:
# a fragment of a YAML configuration file
cluster: # top-level group
  on-air-limit: 4096
  connection-rate-limit: 512
  tarantool-timeout: 10s
  tarantool-ping-timeout: 5s
http: # top-level group
  basic-auth: # nested group
    enabled: false
  network: tcp
  host: 127.0.0.1
  port: 8080
  request-size: 1572864
  websocket: # nested group
    read-buffer-size: 16384
    write-buffer-size: 16384
    keepalive-ping-interval: 20s
    handshake-timeout: 10s
    init-timeout: 15s
To start TCM with a YAML configuration, pass the location of the configuration
file in the -c command-line option:
$ tcm -c tcm_config.yaml
TCM can take values of its configuration parameters from environment variables.
Each variable name starts with TCM_, followed by the full path to the parameter
in upper case, with all delimiters (periods and hyphens) replaced with underscores (_).
Examples:
TCM_HTTP_HOST is a variable for the http.host parameter.
TCM_HTTP_WEBSESSION_COOKIE_NAME is a variable for the http.websession-cookie.name parameter.
The example below shows how to start TCM with configuration parameters passed in
environment variables:
$ export TCM_HTTP_HOST=0.0.0.0
$ export TCM_HTTP_PORT=8888
$ tcm
The TCM executable has a -- command-line option for each configuration parameter.
Their names reflect the full path to the parameter, with configuration levels separated by
periods (.). Examples:
--http.host is an option for http.host.
--http.websession-cookie.name is an option for http.websession-cookie.name.
The example below shows how to start TCM with configuration parameters passed in
command-line options:
$ tcm --storage.etcd.embed.enabled --addon.enabled --http.host=0.0.0.0 --http.port=8888
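Both naming rules are mechanical, so the environment variable and the command-line option for any parameter can be derived from its path. A minimal Go sketch of the rules as described above; the function names are hypothetical, and treating periods and hyphens as the delimiters is inferred from the examples.

```go
package main

import (
	"fmt"
	"strings"
)

// delimiters lists the characters replaced with underscores in variable
// names: periods between groups and hyphens inside parameter names.
var delimiters = strings.NewReplacer(".", "_", "-", "_")

// envVarName returns the TCM_* environment variable for a parameter path.
func envVarName(path string) string {
	return "TCM_" + strings.ToUpper(delimiters.Replace(path))
}

// flagName returns the command-line option for a parameter path.
func flagName(path string) string {
	return "--" + path
}

func main() {
	for _, p := range []string{"http.host", "http.websession-cookie.name", "storage.etcd.embed.enabled"} {
		fmt.Printf("%-28s %-34s %s\n", p, envVarName(p), flagName(p))
	}
}
```

For cluster.tarantool-ping-timeout, these functions return TCM_CLUSTER_TARANTOOL_PING_TIMEOUT and --cluster.tarantool-ping-timeout, matching the names listed in the configuration reference.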
TCM configuration options are applied from multiple sources with the following precedence,
from highest to lowest:
- tcm executable arguments.
- TCM_* environment variables.
- Configuration from a YAML file.
If the same option is defined in two or more locations, the option with the highest
precedence is applied. For options that aren’t defined in any location, the default
values are used.
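The precedence rule amounts to a first-match lookup over the sources, ordered from highest to lowest precedence, with the built-in defaults as the final fallback. The Go sketch below models this with hypothetical names; it illustrates the rule and is not TCM’s actual configuration loader.

```go
package main

import "fmt"

// layer is one configuration source: parameter path -> raw value.
type layer map[string]string

// resolve returns a parameter value from the first layer that defines it.
// Layers are passed from highest to lowest precedence, defaults last.
func resolve(path string, layers ...layer) (string, bool) {
	for _, l := range layers {
		if v, ok := l[path]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	flags := layer{"http.port": "8888"}                       // tcm --http.port=8888
	env := layer{"http.host": "0.0.0.0", "http.port": "9090"} // TCM_HTTP_HOST, TCM_HTTP_PORT
	file := layer{"http.host": "127.0.0.1", "cluster.tarantool-timeout": "20s"}
	defaults := layer{"http.host": "127.0.0.1", "http.port": "8080",
		"cluster.tarantool-timeout": "10s", "http.request-size": "1572864"}

	for _, p := range []string{"http.port", "http.host", "cluster.tarantool-timeout", "http.request-size"} {
		v, _ := resolve(p, flags, env, file, defaults)
		fmt.Printf("%-26s = %s\n", p, v)
	}
}
```

Here http.port resolves to 8888 from the command line, http.host to 0.0.0.0 from the environment, cluster.tarantool-timeout to 20s from the YAML file, and http.request-size falls back to its default.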
You can combine different ways of TCM configuration for efficient management of
multiple TCM installations:
- A single YAML file for all installations can contain the common configuration parts.
For example, a single configuration storage that is used for all installations, or
TLS settings.
- Environment variables that set specific parameters for each server, such as
local directories and paths.
- Command-line options for parameters that must be unique for different TCM instances
running on a single server. For example,
http.port.
Configuration parameter types
TCM configuration parameters have the Go language
types. Note that this is different from the Tarantool configuration parameters,
which have Lua types.
Most options have Go’s basic types: int and other numeric types, bool, string.
http:
  basic-auth:
    enabled: false # bool
  network: tcp # string
  host: 127.0.0.1 # string
  port: 8080 # int
  request-size: 1572864 # int64
Parameters that can take multiple values are arrays. In YAML, they are passed as
YAML arrays: each item on a new line, starting with a dash.
storage:
  provider: etcd
  etcd:
    endpoints: # array
      - https://192.168.0.1:2379 # item 1
      - https://192.168.0.2:2379 # item 2
Note
In environment variables and command line options, such arrays are passed as
semicolon-separated strings of items.
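For example, the two-item storage.etcd.endpoints array above becomes a single semicolon-separated value. The Go sketch below shows the splitting; the function name is hypothetical, and dropping empty items and surrounding spaces is an assumption about how TCM treats such input.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArray turns the semicolon-separated value of an array parameter,
// as passed in an environment variable or command-line option, into items.
// Surrounding spaces and empty items are dropped (an assumption).
func splitArray(value string) []string {
	var items []string
	for _, item := range strings.Split(value, ";") {
		if item = strings.TrimSpace(item); item != "" {
			items = append(items, item)
		}
	}
	return items
}

func main() {
	// The storage.etcd.endpoints array from the YAML example above.
	value := "https://192.168.0.1:2379;https://192.168.0.2:2379"
	for i, endpoint := range splitArray(value) {
		fmt.Printf("item %d: %s\n", i+1, endpoint)
	}
}
```

When you set such a variable in a shell, quote the value, because an unquoted ; ends the command:
$ export TCM_STORAGE_ETCD_ENDPOINTS="https://192.168.0.1:2379;https://192.168.0.2:2379"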
Parameters that set timeouts, TTLs, and other duration values have Go’s time.Duration
type. Their values are passed as time-formatted strings such as 4h30m25s.
cluster:
  tarantool-timeout: 10s # duration
  tarantool-ping-timeout: 5s # duration
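Duration strings follow the syntax of Go’s time.ParseDuration: a sequence of decimal numbers, each with a unit suffix ns, us (or µs), ms, s, m, or h. There is no unit for days, so seven days must be written as 168h. The sketch below (the function names are hypothetical) parses a few values and prints them in Go’s canonical form.

```go
package main

import (
	"fmt"
	"time"
)

// canonical parses a duration string and returns Go's canonical form.
func canonical(s string) (string, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return "", err
	}
	return d.String(), nil
}

// seconds returns the number of whole seconds in a duration string.
func seconds(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return int64(d / time.Second), nil
}

func main() {
	for _, s := range []string{"10s", "4h30m25s", "120m", "1.5h", "7d"} {
		c, err := canonical(s)
		if err != nil {
			fmt.Printf("%-9s invalid: %v\n", s, err)
			continue
		}
		n, _ := seconds(s)
		fmt.Printf("%-9s = %-9s (%d seconds)\n", s, c, n)
	}
}
```

This also explains reference defaults such as 2h0m0s for http.websession-cookie.ttl: it is how Go prints a two-hour time.Duration.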
Finally, there are parameters whose values are constants defined in Go packages.
For example, http.websession-cookie.same-site
values are constants of Go’s http.SameSite
type. To find out the exact values available for such parameters, refer to the Go
packages documentation.
http:
  websession-cookie:
    same-site: SameSiteStrictMode
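As the example suggests, the value is written as the Go constant name. A minimal Go sketch of how such names map to the net/http constants, assuming TCM accepts exactly the four names listed for http.websession-cookie.same-site in the configuration reference (the map and function names are hypothetical):

```go
package main

import (
	"fmt"
	"net/http"
)

// sameSiteModes maps the accepted configuration values to the
// corresponding net/http constants.
var sameSiteModes = map[string]http.SameSite{
	"SameSiteDefaultMode": http.SameSiteDefaultMode,
	"SameSiteLaxMode":     http.SameSiteLaxMode,
	"SameSiteStrictMode":  http.SameSiteStrictMode,
	"SameSiteNoneMode":    http.SameSiteNoneMode,
}

// parseSameSite converts a configuration value into an http.SameSite.
// Only the exact constant names are accepted.
func parseSameSite(value string) (http.SameSite, error) {
	mode, ok := sameSiteModes[value]
	if !ok {
		return 0, fmt.Errorf("unknown same-site value %q", value)
	}
	return mode, nil
}

func main() {
	for _, v := range []string{"SameSiteStrictMode", "Strict"} {
		if mode, err := parseSameSite(v); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Printf("%s -> http.SameSite(%d)\n", v, mode)
		}
	}
}
```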
Creating a configuration template
You can create a YAML configuration template for TCM with all parameters and
their default values using the generate-config command of the tcm executable.
To write a default TCM configuration to the tcm.example.yml file, run:
$ tcm generate-config > tcm.example.yml
You can use YAML configuration files to create entities in TCM automatically
upon the first start. These entities are defined in the initial-settings
section of the configuration file.
Important
The initial settings are applied only once upon the first TCM start.
Further changes are not applied upon TCM restarts.
To add clusters to TCM upon the first start, specify their settings in the
initial-settings.clusters
configuration section.
The initial-settings.clusters section is an array whose items describe separate clusters,
for example:
initial-settings:
  clusters:
    - name: Cluster 1
      description: First cluster
      # cluster settings
    - name: Cluster 2
      description: Second cluster
      # cluster settings
In this configuration, you can specify all cluster settings that you define
when connecting clusters through the TCM web interface.
This includes:
- the cluster name
- description
- additional URLs
- configuration storage connection
- Tarantool instances connection
- and other settings.
For the full list of cluster configuration parameters, see the initial-settings.clusters
reference. For example, this is how you add a cluster that uses an etcd configuration
storage:
initial-settings:
  clusters:
    - name: My cluster
      description: Cluster description
      urls:
        - label: Test
          url: http://example.com
      storage-connection:
        provider: etcd
        etcd-connection:
          endpoints:
            - http://127.0.0.1:2379
          username: ""
          password: ""
          prefix: /cluster1
      tarantool-connection:
        username: guest
        password: ""
By default, TCM contains a cluster named Default cluster with ID
00000000-0000-0000-0000-000000000000. You can use this ID to modify
the default cluster settings upon the first TCM start. For example, rename it
and add its connection settings:
initial-settings:
  clusters:
    - id: 00000000-0000-0000-0000-000000000000
      name: My cluster
      storage-connection:
        provider: etcd
        etcd-connection:
          endpoints:
            - http://127.0.0.1:2379
          username: etcd-user
          password: secret
          prefix: /cluster1
      tarantool-connection:
        username: guest
        password: ""
Backend store
Tarantool Cluster Manager uses an underlying data store (backend store) for its entities:
users, roles, cluster connections, settings, and other objects that you manipulate in TCM.
The backend store can be either an etcd or a Tarantool cluster.
For better reliability and scalability, the backend store works independently of TCM.
For example, it can be the same etcd or Tarantool cluster that you use as a centralized configuration storage.
This makes TCM stateless: all objects created or modified in its web UI are saved
to the backend store, and nothing is stored inside the TCM instances themselves.
Any number of instances can duplicate each other when connected to the same backend store.
If you stop all instances, the store still contains their objects. You can continue
working with them right after starting a new instance.
In addition to using an external backend store, you can run TCM with an embedded
etcd or Tarantool instance to use as the backend store.
On this page, you will learn to connect TCM to backend stores of both types,
or start TCM with an embedded backend store.
Configuring backend store connection
TCM’s connection to its backend store is configured using the storage.*
configuration options. The storage.provider
option selects the store type. It can be either etcd or tarantool.
To use an etcd cluster as a TCM backend store, set the storage.provider option
to etcd and specify connection parameters in storage.etcd.* options.
A minimal etcd configuration includes the storage endpoints:
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
If authentication is enabled in etcd, specify storage.etcd.username and storage.etcd.password:
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
    username: etcduser
    password: secret
The TCM data is stored in etcd under the prefix specified in storage.etcd.prefix.
By default, the prefix is /tcm. If you want to change it or store data of
different TCM instances separately in one etcd cluster, set the prefix explicitly:
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
    prefix: /tcm2
Other storage.etcd.* options configure various aspects of the etcd store connection,
such as network timeouts and limits or TLS parameters.
For the full list of the etcd TCM backend store options, see the
TCM configuration reference.
For development purposes, you can start TCM with an embedded backend store.
This is useful for local runs when you don’t have or don’t need an external backend store.
Important
Do not use the embedded backend stores in production environments.
An embedded TCM backend store is a single instance of etcd or Tarantool that
is started automatically on the same host during the TCM startup. It runs
in the background until TCM is stopped. The embedded backend store is persistent:
if you start TCM again with the same backend store configuration, it restores
the TCM data from the previous runs.
Note
To start a clean instance of TCM, remove the working directory of the
embedded backend store specified in the storage.etcd.embed.workdir or
storage.tarantool.embed.workdir option.
The embedded backend store parameters are configured using the storage.etcd.embed.* options
for etcd or storage.tarantool.embed.* options for a Tarantool-based store.
To start TCM with an embedded etcd with default settings, set storage.etcd.embed.enabled to true
and leave other storage.* options default:
storage.etcd.embed.enabled: true
You can use the following call to get TCM running with embedded etcd without
a configuration file:
$ tcm --storage.etcd.embed.enabled
To start TCM with an embedded Tarantool storage with default settings:
- set storage.provider to tarantool
- set storage.tarantool.embed.enabled to true
storage:
  provider: tarantool
  tarantool.embed.enabled: true
With command-line arguments:
$ tcm --storage.provider=tarantool --storage.tarantool.embed.enabled
You can tune the embedded backend store, for example, enable and configure TLS on it
or change its working directories or startup arguments. To set specific parameters,
specify the corresponding storage.etcd.embed.* or storage.tarantool.embed.*
options. For the full list of configuration options of embedded backend stores, see the
TCM configuration reference.
Setting up a cluster of embedded backend stores
To simulate the production environment, you can form a distributed multi-instance cluster
from embedded stores of multiple TCM instances. To do this, configure the embedded
stores of all TCM instances to join each other.
For etcd, provide the embedded store clustering parameters storage.etcd.embed.*
and specify the endpoints in storage.etcd.endpoints. The options that configure
embedded etcd mostly match the etcd configuration options. For more information
about these options, see the etcd documentation.
Below are example configurations of three TCM instances that start with embedded etcd instances
and form an etcd cluster from them:
First instance:
http:
  port: 8080
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
      - http://127.0.0.1:22379
      - http://127.0.0.1:32379
    embed:
      enabled: true
      name: infra1
      endpoints:
        - http://127.0.0.1:2379
      advertises:
        - http://127.0.0.1:2379
      initial-cluster-state: new
      initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
      initial-cluster-token: etcd-cluster-1
      peer-endpoints:
        - http://127.0.0.1:12380
      peer-advertises:
        - http://127.0.0.1:12380
      workdir: node1.etcd
Second instance:
http:
  port: 8081
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
      - http://127.0.0.1:22379
      - http://127.0.0.1:32379
    embed:
      enabled: true
      name: infra2
      endpoints:
        - http://127.0.0.1:22379
      advertises:
        - http://127.0.0.1:22379
      initial-cluster-state: new
      initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
      initial-cluster-token: etcd-cluster-1
      peer-endpoints:
        - http://127.0.0.1:22380
      peer-advertises:
        - http://127.0.0.1:22380
      workdir: node2.etcd
Third instance:
http:
  port: 8082
storage:
  provider: etcd
  etcd:
    endpoints:
      - http://127.0.0.1:2379
      - http://127.0.0.1:22379
      - http://127.0.0.1:32379
    embed:
      enabled: true
      name: infra3
      endpoints:
        - http://127.0.0.1:32379
      advertises:
        - http://127.0.0.1:32379
      initial-cluster-state: new
      initial-cluster: "infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380"
      initial-cluster-token: etcd-cluster-1
      peer-endpoints:
        - http://127.0.0.1:32380
      peer-advertises:
        - http://127.0.0.1:32380
      workdir: node3.etcd
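The initial-cluster value is identical in all three configurations: it lists every member as name=peer-URL pairs separated by commas, and each URL matches that member’s peer-advertises entry. When you add or rename members, generating the string helps keep the files consistent. A Go sketch with a hypothetical helper type and function:

```go
package main

import (
	"fmt"
	"strings"
)

// member is one embedded etcd node (hypothetical helper type).
type member struct {
	Name    string // storage.etcd.embed.name
	PeerURL string // the member's storage.etcd.embed.peer-advertises URL
}

// initialCluster builds the storage.etcd.embed.initial-cluster value:
// name=peer-URL pairs for all members, separated by commas.
func initialCluster(members []member) string {
	pairs := make([]string, len(members))
	for i, m := range members {
		pairs[i] = m.Name + "=" + m.PeerURL
	}
	return strings.Join(pairs, ",")
}

func main() {
	members := []member{
		{"infra1", "http://127.0.0.1:12380"},
		{"infra2", "http://127.0.0.1:22380"},
		{"infra3", "http://127.0.0.1:32380"},
	}
	// Use the same value in every instance's configuration.
	fmt.Printf("initial-cluster: %q\n", initialCluster(members))
}
```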
To set up a cluster from embedded Tarantool-based backend stores:
- Specify the Tarantool cluster configuration in
storage.tarantool.embed.config
(as plain text) or storage.tarantool.embed.config-file (as a YAML file).
- Assign each embedded store an instance name from this configuration using
storage.tarantool.embed.args.
Below are example configurations of three TCM instances that start with embedded
Tarantool-based backend stores and form a cluster from them:
First instance:
http:
  port: 8080
storage:
  provider: tarantool
  tarantool:
    addrs:
      - http://127.0.0.1:3301
      - http://127.0.0.1:3302
      - http://127.0.0.1:3303
    embed:
      enabled: true
      executable: /path/to/execfile/tarantool-enterprise/tarantool
      config-filename: config.yml
      workdir: node1.tarantool
      args:
        - --name
        - instance-001
        - --config
        - config.yml
Second instance:
http:
  port: 8081
storage:
  provider: tarantool
  tarantool:
    addrs:
      - http://127.0.0.1:3301
      - http://127.0.0.1:3302
      - http://127.0.0.1:3303
    embed:
      enabled: true
      executable: /path/to/execfile/tarantool-enterprise/tarantool
      config-filename: config.yml
      workdir: node2.tarantool
      args:
        - --name
        - instance-002
        - --config
        - config.yml
Third instance:
http:
  port: 8082
storage:
  provider: tarantool
  tarantool:
    addrs:
      - http://127.0.0.1:3301
      - http://127.0.0.1:3302
      - http://127.0.0.1:3303
    embed:
      enabled: true
      executable: /path/to/execfile/tarantool-enterprise/tarantool
      config-filename: config.yml
      workdir: node3.tarantool
      args:
        - --name
        - instance-003
        - --config
        - config.yml
Development mode
Tarantool Cluster Manager provides a special mode intended for use during development.
This mode extends the web interface with capabilities that can help in development
or testing environments, such as starting and stopping instances or instance promotion.
Enabling development mode
You can enable TCM development mode in different ways: in its web interface,
in the configuration file, using an environment variable, or using a command-line option.
To enable development mode on the running TCM instance, use its web interface:
- Open user settings: click Settings under the user name in the header.
- Go to the About tab.
- Click the toggle button beside tcm/mode.
To start TCM in the development mode, specify the mode: development option
in its configuration file:
# tcm_config.yaml
mode: development
To start TCM in the development mode from the command line, specify the --mode=development option:
$ tcm --mode=development
To make new TCM instances start in the development mode by default, set the
TCM_MODE environment variable to development:
$ export TCM_MODE=development
$ tcm
Configuration reference
This topic describes configuration parameters of Tarantool Cluster Manager.
There are the following groups of TCM configuration parameters:
The cluster group defines parameters of TCM interaction with connected
Tarantool clusters.
- cluster.connection-rate-limit
A rate limit for connections to Tarantool instances.
Type: uint
Default: 512
Environment variable: TCM_CLUSTER_CONNECTION_RATE_LIMIT
Command-line option: --cluster.connection-rate-limit
- cluster.tarantool-timeout
A timeout for receiving a response from Tarantool instances.
Type: time.Duration
Default: 10s
Environment variable: TCM_CLUSTER_TARANTOOL_TIMEOUT
Command-line option: --cluster.tarantool-timeout
- cluster.tarantool-ping-timeout
A timeout for receiving a ping response from Tarantool instances.
Type: time.Duration
Default: 5s
Environment variable: TCM_CLUSTER_TARANTOOL_PING_TIMEOUT
Command-line option: --cluster.tarantool-ping-timeout
- cluster.tt-command
The command that runs the tt utility on hosts with cluster instances.
Type: string
Default: tt
Environment variable: TCM_CLUSTER_TT_COMMAND
Command-line option: --cluster.tt-command
- cluster.refresh-state-period
The time interval for refreshing the cluster instances state on the Stateboard.
Type: time.Duration
Default: 5s
Environment variable: TCM_CLUSTER_REFRESH_STATE_PERIOD
Command-line option: --cluster.refresh-state-period
- cluster.refresh-state-timeout
The time limit for refreshing an instance state.
If this limit is reached, an error is shown.
Type: time.Duration
Default: 4s
Environment variable: TCM_CLUSTER_REFRESH_STATE_TIMEOUT
Command-line option: --cluster.refresh-state-timeout
- cluster.discovery-period
The time interval for checking the leadership in replica sets.
Type: time.Duration
Default: 4s
Environment variable: TCM_CLUSTER_DISCOVERY_PERIOD
Command-line option: --cluster.discovery-period
- cluster.sharding-index
The name of the space field that is used as a sharding key.
Type: string
Default: bucket_id
Environment variable: TCM_CLUSTER_SHARDING_INDEX
Command-line option: --cluster.sharding-index
- cluster.skew-time
The maximum time skew between any two cluster instances.
If this limit is reached, a warning is shown.
Type: time.Duration
Default: 30s
Environment variable: TCM_CLUSTER_SKEW_TIME
Command-line option: --cluster.skew-time
- cluster.fragmentation-threshold
The count of allocated slabs that reflects high memory fragmentation.
When this number is reached, a warning is shown.
See also: Storing data with memtx
Type: int
Default: 40
Environment variable: TCM_CLUSTER_FRAGMENTATION_THRESHOLD
Command-line option: --cluster.fragmentation-threshold
The http group defines parameters of HTTP connections between TCM and clients.
- http.network
An addressing scheme that TCM uses.
Possible values:
tcp: IPv4 address
tcp6: IPv6 address
unix: Unix domain socket
Type: string
Default: tcp
Environment variable: TCM_HTTP_NETWORK
Command-line option: --http.network
- http.host
A host name on which TCM serves.
Type: string
Default: 127.0.0.1
Environment variable: TCM_HTTP_HOST
Command-line option: --http.host
- http.port
A port on which TCM serves.
Type: int
Default: 8080
Environment variable: TCM_HTTP_PORT
Command-line option: --http.port
- http.request-size
The maximum size (in bytes) of a client HTTP request to TCM.
Type: int64
Default: 1572864
Environment variable: TCM_HTTP_REQUEST_SIZE
Command-line option: --http.request-size
- http.websocket.read-buffer-size
The size (in bytes) of the read buffer for WebSocket
connections.
Type: int
Default: 16384
Environment variable: TCM_HTTP_WEBSOCKET_READ_BUFFER_SIZE
Command-line option: --http.websocket.read-buffer-size
- http.websocket.write-buffer-size
The size (in bytes) of the write buffer for WebSocket
connections.
Type: int
Default: 16384
Environment variable: TCM_HTTP_WEBSOCKET_WRITE_BUFFER_SIZE
Command-line option: --http.websocket.write-buffer-size
- http.websocket.keepalive-ping-interval
The time interval for sending WebSocket
keepalive pings.
Type: time.Duration
Default: 20s
Environment variable: TCM_HTTP_WEBSOCKET_KEEPALIVE_PING_INTERVAL
Command-line option: --http.websocket.keepalive-ping-interval
- http.websocket.handshake-timeout
The time limit for completing a WebSocket
opening handshake with a client.
Type: time.Duration
Default: 10s
Environment variable: TCM_HTTP_WEBSOCKET_HANDSHAKE_TIMEOUT
Command-line option: --http.websocket.handshake-timeout
- http.websocket.init-timeout
The time limit for establishing a WebSocket
connection with a client.
Type: time.Duration
Default: 15s
Environment variable: TCM_HTTP_WEBSOCKET_INIT_TIMEOUT
Command-line option: --http.websocket.init-timeout
- http.websession-cookie.name
The name of the cookie that TCM sends to clients.
This value is used as the cookie name in the Set-Cookie
HTTP response header.
Type: string
Default: tcm
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_NAME
Command-line option: --http.websession-cookie.name
- http.websession-cookie.path
The URL path that must be present in the requested URL in order to send the cookie.
This value is used in the Path attribute of the Set-Cookie
HTTP response header.
Type: string
Default: ""
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_PATH
Command-line option: --http.websession-cookie.path
- http.websession-cookie.domain
The domain to which the cookie can be sent.
This value is used in the Domain attribute of the Set-Cookie
HTTP response header.
Type: string
Default: ""
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_DOMAIN
Command-line option: --http.websession-cookie.domain
- http.websession-cookie.ttl
The maximum lifetime of the TCM cookie.
This value is used in the Max-Age attribute of the Set-Cookie
HTTP response header.
Type: time.Duration
Default: 2h0m0s
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_TTL
Command-line option: --http.websession-cookie.ttl
- http.websession-cookie.secure
Indicates whether the cookie can be sent only over the HTTPS protocol.
In this case, it is never sent over unencrypted HTTP, which helps prevent
man-in-the-middle attacks.
When true, the Secure attribute is added to the Set-Cookie
HTTP response header.
Type: bool
Default: false
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_SECURE
Command-line option: --http.websession-cookie.secure
- http.websession-cookie.http-only
Indicates that the cookie can’t be accessed from the JavaScript
Document.cookie API.
This helps mitigate cross-site scripting attacks.
When true, the HttpOnly attribute is added to the Set-Cookie
HTTP response header.
Type: bool
Default: true
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_HTTP_ONLY
Command-line option: --http.websession-cookie.http-only
- http.websession-cookie.same-site
Indicates whether the TCM cookie can be sent along with cross-site
requests. Possible values are Go’s http.SameSite constants:
SameSiteDefaultMode
SameSiteLaxMode
SameSiteStrictMode
SameSiteNoneMode
For details on SameSite modes, see the Set-Cookie header documentation
in the MDN web docs.
This value is used in the SameSite attribute of the Set-Cookie
HTTP response header.
Type: http.SameSite
Default: SameSiteDefaultMode
Environment variable: TCM_HTTP_WEBSESSION_COOKIE_SAME_SITE
Command-line option: --http.websession-cookie.same-site
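Taken together, the http.websession-cookie.* parameters describe the Set-Cookie header that TCM sends. The Go sketch below assembles such a header with the Cookie type from net/http. It assumes TCM builds its cookie in a similar way, which the reference implies (it uses Go’s http.SameSite constants) but does not show. The struct and function names are hypothetical, and Secure is enabled here on the assumption that TLS is turned on.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// cookieSettings mirrors the http.websession-cookie.* parameters
// (hypothetical type; the field names are illustrative).
type cookieSettings struct {
	Name     string        // http.websession-cookie.name
	Path     string        // http.websession-cookie.path
	Domain   string        // http.websession-cookie.domain
	TTL      time.Duration // http.websession-cookie.ttl
	Secure   bool          // http.websession-cookie.secure
	HTTPOnly bool          // http.websession-cookie.http-only
	SameSite http.SameSite // http.websession-cookie.same-site
}

// sessionCookie renders the Set-Cookie header value for a session token.
func sessionCookie(s cookieSettings, token string) string {
	c := &http.Cookie{
		Name:     s.Name,
		Value:    token,
		Path:     s.Path,
		Domain:   s.Domain,
		MaxAge:   int(s.TTL / time.Second), // ttl becomes Max-Age in seconds
		Secure:   s.Secure,
		HttpOnly: s.HTTPOnly,
		SameSite: s.SameSite,
	}
	return c.String()
}

func main() {
	s := cookieSettings{
		Name:     "tcm", // default
		Path:     "/",
		TTL:      2 * time.Hour, // default 2h0m0s
		Secure:   true,          // assumes TLS is enabled
		HTTPOnly: true,          // default
		SameSite: http.SameSiteStrictMode,
	}
	fmt.Println("Set-Cookie:", sessionCookie(s, "abc123"))
}
```

With these settings, the header value is tcm=abc123; Path=/; Max-Age=7200; HttpOnly; Secure; SameSite=Strict.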
- http.cors.enabled
Indicates whether to use the Cross-Origin Resource Sharing
(CORS).
Type: bool
Default: false
Environment variable: TCM_HTTP_CORS_ENABLED
Command-line option: --http.cors.enabled
- http.cors.allowed-origins
The origins
with which the HTTP response can be shared, separated by semicolons.
The specified values are sent in the Access-Control-Allow-Origin
HTTP response headers.
Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_ORIGINS
Command-line option: --http.cors.allowed-origins
- http.cors.allowed-methods
HTTP request methods that are allowed when accessing a resource,
separated by semicolons.
The specified values are sent in the Access-Control-Allow-Methods
HTTP header of a response to a CORS preflight request.
Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_METHODS
Command-line option: --http.cors.allowed-methods
- http.cors.allowed-headers
HTTP headers that are allowed during the actual request, separated by semicolons.
The specified values are sent in the Access-Control-Allow-Headers
HTTP header of a response to a CORS preflight request.
Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_ALLOWED_HEADERS
Command-line option: --http.cors.allowed-headers
- http.cors.exposed-headers
Response headers that should be made available to scripts running in the browser,
in response to a cross-origin request, separated by semicolons.
The specified values are sent in the Access-Control-Expose-Headers
HTTP response headers.
Type: []string
Default: []
Environment variable: TCM_HTTP_CORS_EXPOSED_HEADERS
Command-line option: --http.cors.exposed-headers
-
http.cors.allow-credentials
Whether to expose the response to the frontend JavaScript code when the request’s
credentials mode is include.
When true, the Access-Control-Allow-Credentials
HTTP response header is sent.
Type: bool
Default: false
Environment variable: TCM_HTTP_CORS_ALLOW_CREDENTIALS
Command-line option: --http.cors.allow-credentials
-
http.cors.debug
For debug purposes.
Type: bool
Default: false
-
http.tls.enabled
Indicates whether TLS is enabled for client connections to TCM.
Type: bool
Default: false
Environment variable: TCM_HTTP_TLS_ENABLED
Command-line option: --http.tls.enabled
-
http.tls.cert-file
A path to a TLS certificate file. Mandatory when TLS is enabled.
Type: string
Default: “”
Environment variable: TCM_HTTP_TLS_CERT_FILE
Command-line option: --http.tls.cert-file
-
http.tls.key-file
A path to a TLS private key file. Mandatory when TLS is enabled.
Type: string
Default: “”
Environment variable: TCM_HTTP_TLS_KEY_FILE
Command-line option: --http.tls.key-file
-
http.tls.server
The TLS server.
Type: string
Default: “”
Environment variable: TCM_HTTP_TLS_SERVER
Command-line option: --http.tls.server
-
http.tls.min-version
The minimum version of the TLS protocol.
Type: uint16
Default: 0
Environment variable: TCM_HTTP_TLS_MIN_VERSION
Command-line option: --http.tls.min-version
-
http.tls.max-version
The maximum version of the TLS protocol.
Type: uint16
Default: 0
Environment variable: TCM_HTTP_TLS_MAX_VERSION
Command-line option: --http.tls.max-version
-
http.tls.curve-preferences
Elliptic curves that are used for TLS connections.
Possible values are Go’s tls.CurveID constants:
CurveP256
CurveP384
CurveP521
X25519
Type: []tls.CurveID
Default: []
Environment variable: TCM_HTTP_TLS_CURVE_PREFERENCES
Command-line option: --http.tls.curve-preferences
-
http.tls.cipher-suites
Enabled TLS cipher suites. The supported ciphers are:
- TLS 1.0 - 1.2 cipher suites:
  - TLS_RSA_WITH_RC4_128_SHA
  - TLS_RSA_WITH_3DES_EDE_CBC_SHA
  - TLS_RSA_WITH_AES_128_CBC_SHA
  - TLS_RSA_WITH_AES_256_CBC_SHA
  - TLS_RSA_WITH_AES_128_CBC_SHA256
  - TLS_RSA_WITH_AES_128_GCM_SHA256
  - TLS_RSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_ECDSA_WITH_RC4_128_SHA
  - TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
  - TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
  - TLS_ECDHE_RSA_WITH_RC4_128_SHA
  - TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
  - TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
  - TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
  - TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
  - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS 1.3 cipher suites:
  - TLS_AES_128_GCM_SHA256
  - TLS_AES_256_GCM_SHA384
  - TLS_CHACHA20_POLY1305_SHA256
- TLS_FALLBACK_SCSV (uint16 value 0x5600): not a standard cipher suite, but an indicator that the client is doing version fallback
- Legacy names, retained for backward compatibility:
  - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 (same as TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256)
  - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 (same as TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256)
For detailed information on ciphers, refer to the Golang tls.TLS_* constants.
The example below shows how to configure cipher suites:
http:
  tls:
    cipher-suites:
      - TLS_AES_256_GCM_SHA384
      - TLS_AES_128_GCM_SHA256
      - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
Type: []uint16
Default: []
Environment variable: TCM_HTTP_TLS_CIPHER_SUITES
Command-line option: --http.tls.cipher-suites
-
http.read-timeout
A timeout for reading an incoming request.
Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_READ_TIMEOUT
Command-line option: --http.read-timeout
-
http.read-header-timeout
A timeout for reading headers of an incoming request.
Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_READ_HEADER_TIMEOUT
Command-line option: --http.read-header-timeout
-
http.write-timeout
A timeout for writing a response.
Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_WRITE_TIMEOUT
Command-line option: --http.write-timeout
-
http.idle-timeout
The timeout for idle connections.
Type: time.Duration
Default: 30s
Environment variable: TCM_HTTP_IDLE_TIMEOUT
Command-line option: --http.idle-timeout
-
http.disable-general-options-handler
Whether to disable the general handler for client requests with the OPTIONS HTTP method.
Type: bool
Default: false
Environment variable: TCM_HTTP_DISABLE_GENERAL_OPTIONS_HANDLER
Command-line option: --http.disable-general-options-handler
-
http.max-header-bytes
The maximum size (in bytes) of a header in a client’s request to TCM.
Type: int
Default: 0
Environment variable: TCM_HTTP_MAX_HEADER_BYTES
Command-line option: --http.max-header-bytes
-
http.api-timeout
The stateboard update timeout.
Type: time.Duration
Default: 8s
Environment variable: TCM_HTTP_API_TIMEOUT
Command-line option: --http.api-timeout
-
http.api-update-interval
The stateboard update interval.
Type: time.Duration
Default: 5s
Environment variable: TCM_HTTP_API_UPDATE_INTERVAL
Command-line option: --http.api-update-interval
-
http.frontend-dir
The directory with custom TCM frontend files (for development purposes).
Type: string
Default: “”
Environment variable: TCM_HTTP_FRONTEND_DIR
Command-line option: --http.frontend-dir
-
http.show-stack-trace
Whether error stack traces are shown in the web UI.
Type: bool
Default: true
Environment variable: TCM_HTTP_SHOW_STACK_TRACE
Command-line option: --http.show-stack-trace
-
http.trace
Whether all query tracing information is written in logs.
Type: bool
Default: false
Environment variable: TCM_HTTP_TRACE
Command-line option: --http.trace
-
http.max-static-size
The maximum size (in bytes) of static content sent to TCM.
Type: int
Default: 104857600
Environment variable: TCM_HTTP_MAX_STATIC_SIZE
Command-line option: --http.max-static-size
-
http.graphql.complexity
The maximum complexity of GraphQL queries that TCM processes. If this value
is exceeded, TCM returns an error.
Type: int
Default: 40
Environment variable: TCM_HTTP_GRAPHQL_COMPLEXITY
Command-line option: --http.graphql.complexity
The log section defines the TCM logging parameters.
-
log.default.add-source
Whether sources are added to the TCM log.
Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_ADD_SOURCE
Command-line option: --log.default.add-source
-
log.default.show-stack-trace
Whether stack traces are added to the TCM log.
Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_SHOW_STACK_TRACE
Command-line option: --log.default.show-stack-trace
-
log.default.level
The default TCM logging level.
Possible values:
Type: string
Default: INFO
Environment variable: TCM_LOG_DEFAULT_LEVEL
Command-line option: --log.default.level
-
log.default.format
TCM log entries format.
Possible values:
Type: string
Default: struct
Environment variable: TCM_LOG_DEFAULT_FORMAT
Command-line option: --log.default.format
-
log.default.output
The output used for the TCM log.
Possible values:
stdout
stderr
file
syslog
Type: string
Default: stdout
Environment variable: TCM_LOG_DEFAULT_OUTPUT
Command-line option: --log.default.output
-
log.default.no-colorized
Whether colorized output is disabled for the stdout log.
Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_NO_COLORIZED
Command-line option: --log.default.no-colorized
-
log.default.file.name
The name of the TCM log file.
Type: string
Default: “”
Environment variable: TCM_LOG_DEFAULT_FILE_NAME
Command-line option: --log.default.file.name
-
log.default.file.maxsize
The maximum size (in bytes) of the TCM log file.
Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXSIZE
Command-line option: --log.default.file.maxsize
-
log.default.file.maxage
The maximum age of a TCM log file, in days.
Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXAGE
Command-line option: --log.default.file.maxage
-
log.default.file.maxbackups
The maximum number of old TCM log files to retain.
Type: int
Default: 0
Environment variable: TCM_LOG_DEFAULT_FILE_MAXBACKUPS
Command-line option: --log.default.file.maxbackups
-
log.default.file.compress
Whether TCM compresses log files upon rotation.
Type: bool
Default: false
Environment variable: TCM_LOG_DEFAULT_FILE_COMPRESS
Command-line option: --log.default.file.compress
-
log.default.syslog.protocol
The network protocol used for connecting to the syslog server. Typically,
it’s tcp, udp, or unix. All possible values are listed in Go’s
net.Dial documentation.
Type: string
Default: tcp
Environment variable: TCM_LOG_DEFAULT_SYSLOG_PROTOCOL
Command-line option: --log.default.syslog.protocol
-
log.default.syslog.output
The syslog server URI.
Type: string
Default: 127.0.0.1:5514
Environment variable: TCM_LOG_DEFAULT_SYSLOG_OUTPUT
Command-line option: --log.default.syslog.output
-
log.default.syslog.priority
The syslog severity level.
Type: string
Default: “”
Environment variable: TCM_LOG_DEFAULT_SYSLOG_PRIORITY
Command-line option: --log.default.syslog.priority
-
log.default.syslog.facility
The syslog facility.
Type: string
Default: “”
Environment variable: TCM_LOG_DEFAULT_SYSLOG_FACILITY
Command-line option: --log.default.syslog.facility
-
log.default.syslog.tag
The syslog tag.
Type: string
Default: “”
Environment variable: TCM_LOG_DEFAULT_SYSLOG_TAG
Command-line option: --log.default.syslog.tag
-
log.default.syslog.timeout
The timeout for connecting to the syslog server.
Type: time.Duration
Default: 10s
Environment variable: TCM_LOG_DEFAULT_SYSLOG_TIMEOUT
Command-line option: --log.default.syslog.timeout
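A syslog priority combines the facility and the severity level into a single number (facility * 8 + severity). The following Go sketch computes this number with the standard log/syslog package. The name tables and the priority and connect helpers are hypothetical, and the values that TCM actually accepts for these options may differ.

```go
package main

import (
	"fmt"
	"log/syslog"
	"strings"
)

// Hypothetical name tables covering a few common severities and facilities.
var severities = map[string]syslog.Priority{
	"err":     syslog.LOG_ERR,
	"warning": syslog.LOG_WARNING,
	"info":    syslog.LOG_INFO,
	"debug":   syslog.LOG_DEBUG,
}

var facilities = map[string]syslog.Priority{
	"user":   syslog.LOG_USER,
	"daemon": syslog.LOG_DAEMON,
	"local0": syslog.LOG_LOCAL0,
}

// priority combines a facility and a severity into one syslog priority value.
// log/syslog facilities are already multiplied by 8, so OR-ing them works.
func priority(facility, severity string) (syslog.Priority, error) {
	f, ok := facilities[strings.ToLower(facility)]
	if !ok {
		return 0, fmt.Errorf("unknown facility %q", facility)
	}
	s, ok := severities[strings.ToLower(severity)]
	if !ok {
		return 0, fmt.Errorf("unknown severity %q", severity)
	}
	return f | s, nil
}

// connect opens a syslog connection using the protocol, output, and tag settings.
// main does not call it, so the sketch runs without a syslog server.
func connect(protocol, output, tag string, p syslog.Priority) (*syslog.Writer, error) {
	return syslog.Dial(protocol, output, p, tag)
}

func main() {
	p, err := priority("local0", "warning")
	if err != nil {
		panic(err)
	}
	fmt.Println(int(p)) // 132 = 16*8 + 4
}
```

Go's syslog.Dial has no timeout parameter, so an implementation that honors log.default.syslog.timeout would need its own dialing logic.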
-
log.outputs
An array of log outputs that TCM uses in addition to the default one
that is defined by the log.default.* parameters. Each array item can include
the parameters of the log.default group. If a parameter is skipped, its
value is taken from log.default.
Type: []LogOuputConfig
Default: []
Environment variable: TCM_LOG_OUTPUTS
Command-line option: --log-outputs
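Under the fallback rule for log.outputs, a skipped parameter is taken from log.default. The following Go sketch models this rule with pointer fields, where nil means "not set". The logOutput type and the withDefaults helper are hypothetical, and they cover only three of the parameters.

```go
package main

import "fmt"

// logOutput is a hypothetical, trimmed-down model of one log output entry.
// Pointer fields let nil mean "parameter skipped".
type logOutput struct {
	Level  *string
	Format *string
	Output *string
}

// withDefaults fills each skipped field of o from def (log.default).
func withDefaults(o, def logOutput) logOutput {
	if o.Level == nil {
		o.Level = def.Level
	}
	if o.Format == nil {
		o.Format = def.Format
	}
	if o.Output == nil {
		o.Output = def.Output
	}
	return o
}

// str returns a pointer to a copy of s.
func str(s string) *string { return &s }

func main() {
	def := logOutput{Level: str("INFO"), Format: str("struct"), Output: str("stdout")}
	extra := logOutput{Output: str("syslog")} // level and format skipped
	merged := withDefaults(extra, def)
	fmt.Println(*merged.Level, *merged.Format, *merged.Output) // INFO struct syslog
}
```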
The storage section defines the parameters of the TCM backend store.
etcd backend store parameters:
Tarantool backend store parameters:
-
storage.provider
The type of the storage used for storing TCM configuration.
Possible values:
etcd
tarantool
Type: string
Default: etcd
Environment variable: TCM_STORAGE_PROVIDER
Command-line option: --storage.provider
-
storage.etcd.prefix
A prefix for the TCM configuration parameters in etcd.
Type: string
Default: “/tcm”
Environment variable: TCM_STORAGE_ETCD_PREFIX
Command-line option: --storage.etcd.prefix
-
storage.etcd.endpoints
An array of node URIs of the etcd cluster where the TCM configuration is stored,
separated by semicolons (;).
Type: []string
Environment variable: TCM_STORAGE_ETCD_ENDPOINTS
Command-line option: --storage.etcd.endpoints
-
storage.etcd.dial-timeout
An etcd dial timeout.
Type: time.Duration
Default: 10s
Environment variable: TCM_STORAGE_ETCD_DIAL_TIMEOUT
Command-line option: --storage.etcd.dial-timeout
-
storage.etcd.auto-sync-interval
An automated sync interval.
Type: time.Duration
Default: 0 (disabled)
Environment variable: TCM_STORAGE_ETCD_AUTO_SYNC_INTERVAL
Command-line option: --storage.etcd.auto-sync-interval
-
storage.etcd.dial-keep-alive-time
A dial keep-alive time.
Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_DIAL_KEEP_ALIVE_TIME
Command-line option: --storage.etcd.dial-keep-alive-time
-
storage.etcd.dial-keep-alive-timeout
A dial keep-alive timeout.
Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_DIAL_KEEP_ALIVE_TIMEOUT
Command-line option: --storage.etcd.dial-keep-alive-timeout
-
storage.etcd.bootstrap-timeout
A bootstrap timeout.
Type: time.Duration
Default: 30s
Environment variable: TCM_STORAGE_ETCD_BOOTSTRAP_TIMEOUT
Command-line option: --storage.etcd.bootstrap-timeout
-
storage.etcd.max-call-send-msg-size
The maximum size (in bytes) of a transaction between TCM and etcd.
Type: int
Default: 2097152
Environment variable: TCM_STORAGE_ETCD_MAX_CALL_SEND_MSG_SIZE
Command-line option: --storage.etcd.max-call-send-msg-size
-
storage.etcd.username
A username for accessing the etcd storage.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_USERNAME
Command-line option: --storage.etcd.username
-
storage.etcd.password
A password for accessing the etcd storage.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_PASSWORD
Command-line option: --storage.etcd.password
-
storage.etcd.password-file
A path to the file with a password for accessing the etcd storage.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_PASSWORD_FILE
Command-line option: --storage.etcd.password-file
-
storage.etcd.tls.enabled
Indicates whether TLS is enabled for etcd connections.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_ENABLED
Command-line option: --storage.etcd.tls.enabled
-
storage.etcd.tls.auto
Use generated certificates for etcd connections.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_AUTO
Command-line option: --storage.etcd.tls.auto
-
storage.etcd.tls.cert-file
A path to a TLS certificate file to use for etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_CERT_FILE
Command-line option: --storage.etcd.tls.cert-file
-
storage.etcd.tls.key-file
A path to a TLS private key file to use for etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_KEY_FILE
Command-line option: --storage.etcd.tls.key-file
-
storage.etcd.tls.trusted-ca-file
A path to a trusted CA certificate file to use for etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_TRUSTED_CA_FILE
Command-line option: --storage.etcd.tls.trusted-ca-file
-
storage.etcd.tls.client-cert-auth
Indicates whether client cert authentication is enabled.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_CLIENT_CERT_AUTH
Command-line option: --storage.etcd.tls.client-cert-auth
-
storage.etcd.tls.crl-file
A path to the client certificate revocation list file.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_CRL_FILE
Command-line option: --storage.etcd.tls.crl-file
-
storage.etcd.tls.insecure-skip-verify
Skip checking client certificate in etcd connections.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_INSECURE_SKIP_VERIFY
Command-line option: --storage.etcd.tls.insecure-skip-verify
-
storage.etcd.tls.skip-client-san-verify
Skip verification of SAN field in client certificate for etcd connections.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_SKIP_CLIENT_SAN_VERIFY
Command-line option: --storage.etcd.tls.skip-client-san-verify
-
storage.etcd.tls.server-name
Name of the TLS server for etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_SERVER_NAME
Command-line option: --storage.etcd.tls.server-name
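The storage.etcd.tls.cert-file, key-file, trusted-ca-file, and server-name options correspond to standard parts of a Go TLS client configuration. The following sketch builds a crypto/tls config from them. The etcdTLS struct and the clientConfig helper are hypothetical, and TCM's etcd client may assemble its TLS settings differently.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// etcdTLS mirrors part of storage.etcd.tls.* (hypothetical field names).
type etcdTLS struct {
	CertFile      string
	KeyFile       string
	TrustedCAFile string
	ServerName    string
}

// clientConfig builds a crypto/tls client config; empty paths are skipped.
func clientConfig(s etcdTLS) (*tls.Config, error) {
	cfg := &tls.Config{ServerName: s.ServerName}
	if s.CertFile != "" || s.KeyFile != "" {
		// Client certificate and private key presented to etcd.
		pair, err := tls.LoadX509KeyPair(s.CertFile, s.KeyFile)
		if err != nil {
			return nil, fmt.Errorf("load key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	if s.TrustedCAFile != "" {
		// CA certificates used to verify the etcd server.
		data, err := os.ReadFile(s.TrustedCAFile)
		if err != nil {
			return nil, fmt.Errorf("read CA file: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(data) {
			return nil, errors.New("no PEM certificates in CA file")
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	cfg, err := clientConfig(etcdTLS{ServerName: "etcd.example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ServerName, len(cfg.Certificates), cfg.RootCAs == nil) // etcd.example.com 0 true
}
```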
-
storage.etcd.tls.cipher-suites
TLS cipher suites for etcd connections. Possible values are the Golang tls.TLS_* constants.
Type: []uint16
Default: []
Environment variable: TCM_STORAGE_ETCD_TLS_CIPHER_SUITES
Command-line option: --storage.etcd.tls.cipher-suites
-
storage.etcd.tls.allowed-cn
An allowed common name for authentication in etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_ALLOWED_CN
Command-line option: --storage.etcd.tls.allowed-cn
-
storage.etcd.tls.allowed-hostname
An allowed TLS certificate name for authentication in etcd connections.
Type: string
Default: “”
Environment variable: TCM_STORAGE_ETCD_TLS_ALLOWED_HOSTNAME
Command-line option: --storage.etcd.tls.allowed-hostname
-
storage.etcd.tls.empty-cn
Whether the empty common name is allowed in etcd connections.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_TLS_EMPTY_CN
Command-line option: --storage.etcd.tls.empty-cn
-
storage.etcd.permit-without-stream
Whether keepalive pings can be sent to the etcd server without active streams.
Type: bool
Default: false
Environment variable: TCM_STORAGE_ETCD_PERMIT_WITHOUT_STREAM
Command-line option: --storage.etcd.permit-without-stream
The storage.etcd.embed group defines the configuration of the embedded etcd
cluster to use as a TCM backend store.
This cluster can be used for development purposes when the production or testing
etcd cluster is not available or not needed.
See also Embedded backend store.
-
storage.tarantool.prefix
A prefix for the TCM configuration parameters in the Tarantool-based configuration storage.
Type: string
Default: “/tcm”
Environment variable: TCM_STORAGE_TARANTOOL_PREFIX
Command-line option: --storage.tarantool.prefix
-
storage.tarantool.addr
The URI for connecting to the Tarantool-based configuration storage.
Type: string
Default: “unix/:/tmp/tnt_config_instance.sock”
Environment variable: TCM_STORAGE_TARANTOOL_ADDR
Command-line option: --storage.tarantool.addr
-
storage.tarantool.addrs
An array of the Tarantool-based configuration storage URIs.
Type: []string
Default: [“unix/:/tmp/tnt_config_instance.sock”]
Environment variable: TCM_STORAGE_TARANTOOL_ADDRS
Command-line option: --storage.tarantool.addrs
-
storage.tarantool.auth
An authentication method for the Tarantool-based configuration storage.
Possible values are Go’s go-tarantool/Auth constants:
AutoAuth (0)
ChapSha1Auth
PapSha256Auth
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_AUTH
Command-line option: --storage.tarantool.auth
-
storage.tarantool.timeout
A request timeout for the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: time.Duration
Default: 0s
Environment variable: TCM_STORAGE_TARANTOOL_TIMEOUT
Command-line option: --storage.tarantool.timeout
-
storage.tarantool.reconnect
A timeout between reconnect attempts for the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: time.Duration
Default: 0s
Environment variable: TCM_STORAGE_TARANTOOL_RECONNECT
Command-line option: --storage.tarantool.reconnect
-
storage.tarantool.max-reconnects
The maximum number of reconnect attempts for the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_MAX_RECONNECTS
Command-line option: --storage.tarantool.max-reconnects
-
storage.tarantool.username
A username for connecting to the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_USERNAME
Command-line option: --storage.tarantool.username
-
storage.tarantool.password
A password for connecting to the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_PASSWORD
Command-line option: --storage.tarantool.password
-
storage.tarantool.password-file
A path to the file with a password for connecting to the Tarantool-based configuration storage.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_PASSWORD_FILE
Command-line option: --storage.tarantool.password-file
-
storage.tarantool.rate-limit
A rate limit for connecting to the Tarantool-based configuration storage.
See also go-tarantool.Opts.
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_RATE_LIMIT
Command-line option: --storage.tarantool.rate-limit
-
storage.tarantool.rate-limit-action
An action to perform when the rate limit set by storage.tarantool.rate-limit is reached.
See also go-tarantool.Opts.
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_RATE_LIMIT_ACTION
Command-line option: --storage.tarantool.rate-limit-action
-
storage.tarantool.concurrency
The number of separate mutexes for request queues and buffers inside a connection
to the Tarantool TCM configuration storage.
See also go-tarantool.Opts.
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_CONCURRENCY
Command-line option: --storage.tarantool.concurrency
-
storage.tarantool.skip-schema
Whether to skip loading the schema from the Tarantool TCM configuration storage.
See also go-tarantool.Opts.
Type: bool
Default: true
Environment variable: TCM_STORAGE_TARANTOOL_SKIP_SCHEMA
Command-line option: --storage.tarantool.skip-schema
-
storage.tarantool.transport
The connection type for the Tarantool TCM configuration storage.
See also go-tarantool.Opts.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_TRANSPORT
Command-line option: --storage.tarantool.transport
-
storage.tarantool.ssl.key-file
A path to a TLS private key file to use for connecting to the Tarantool TCM
configuration storage.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_KEY_FILE
Command-line option: --storage.tarantool.ssl.key-file
-
storage.tarantool.ssl.cert-file
A path to an SSL certificate to use for connecting to the Tarantool TCM
configuration storage.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CERT_FILE
Command-line option: --storage.tarantool.ssl.cert-file
-
storage.tarantool.ssl.ca-file
A path to a trusted CA certificate to use for connecting to the Tarantool TCM
configuration storage.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CA_FILE
Command-line option: --storage.tarantool.ssl.ca-file
-
storage.tarantool.ssl.ciphers
A list of SSL cipher suites that can be used for connecting to the Tarantool TCM
configuration storage. Possible values are listed in <uri>.params.ssl_ciphers.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_CIPHERS
Command-line option: --storage.tarantool.ssl.ciphers
-
storage.tarantool.ssl.password
A password for an encrypted private SSL key to use for connecting to the Tarantool TCM
configuration storage.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_PASSWORD
Command-line option: --storage.tarantool.ssl.password
-
storage.tarantool.ssl.password-file
A text file with passwords for encrypted private SSL keys to use
for connecting to the Tarantool TCM configuration storage.
See also: Securing connections with SSL.
Type: string
Default: “”
Environment variable: TCM_STORAGE_TARANTOOL_SSL_PASSWORD_FILE
Command-line option: --storage.tarantool.ssl.password-file
-
storage.tarantool.required-protocol-info.auth
An authentication method for the Tarantool TCM configuration storage.
Possible values are Go’s go-tarantool/Auth constants:
AutoAuth (0)
ChapSha1Auth
PapSha256Auth
See also go-tarantool.ProtocolInfo.
Type: int
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_AUTH
Command-line option: --storage.tarantool.required-protocol-info.auth
-
storage.tarantool.required-protocol-info.version
A Tarantool protocol version.
See also go-tarantool.ProtocolInfo.
Type: uint64
Default: 0
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_VERSION
Command-line option: --storage.tarantool.required-protocol-info.version
-
storage.tarantool.required-protocol-info.features
An array of Tarantool protocol features.
See also go-tarantool.ProtocolInfo.
Type: []int
Default: []
Environment variable: TCM_STORAGE_TARANTOOL_SSL_REQUIRED_PROTOCOL_INFO_FEATURES
Command-line option: --storage.tarantool.required-protocol-info.features
The addon section defines settings related to TCM add-ons.
-
addon.enabled
Whether to enable the add-on functionality in TCM.
Type: bool
Default: false
Environment variable: TCM_ADDON_ENABLED
Command-line option: --addon.enabled
-
addon.addons-dir
The directory from which TCM takes add-ons.
Type: string
Default: addons
Environment variable: TCM_ADDON_ADDONS_DIR
Command-line option: --addon.addons-dir
-
addon.max-upload-size
The maximum size (in bytes) of an add-on to upload to TCM.
Type: int64
Default: 104857600
Environment variable: TCM_ADDON_MAX_UPLOAD_SIZE
Command-line option: --addon.max-upload-size
-
addon.dev-addons-dir
Additional add-on directories for development purposes, separated by semicolons (;).
Type: []string
Default: []
Environment variable: TCM_ADDON_DEV_ADDONS_DIR
Command-line option: --addon.dev-addons-dir
The limits section defines limits on various TCM objects and relations
between them.
-
limits.users-count
The maximum number of users in TCM.
Type: int
Default: 1000
Environment variable: TCM_LIMITS_USERS_COUNT
Command-line option: --limits.users-count
-
limits.clusters-count
The maximum number of clusters in TCM.
Type: int
Default: 10
Environment variable: TCM_LIMITS_CLUSTERS_COUNT
Command-line option: --limits.clusters-count
-
limits.roles-count
The maximum number of roles in TCM.
Type: int
Default: 100
Environment variable: TCM_LIMITS_ROLES_COUNT
Command-line option: --limits.roles-count
-
limits.webhooks-count
The maximum number of webhooks in TCM.
Type: int
Default: 200
Environment variable: TCM_LIMITS_WEBHOOKS_COUNT
Command-line option: --limits.webhooks-count
-
limits.user-secrets-count
The maximum number of secrets that a TCM user can have.
Type: int
Default: 10
Environment variable: TCM_LIMITS_USER_SECRETS_COUNT
Command-line option: --limits.user-secrets-count
-
limits.user-websessions-count
The maximum number of open sessions that a TCM user can have.
Type: int
Default: 10
Environment variable: TCM_LIMITS_USER_WEBSESSIONS_COUNT
Command-line option: --limits.user-websessions-count
-
limits.linked-cluster-users
The maximum number of clusters to which a single user can have access.
Type: int
Default: 10
Environment variable: TCM_LIMITS_LINKED_CLUSTER_USERS
Command-line option: --limits.linked-cluster-users
The security section defines the security parameters of TCM.
-
security.auth
Ways to log into TCM.
Possible values:
Type: []string
Default: [local]
Environment variable: TCM_SECURITY_AUTH
Command-line option: --security.auth
-
security.hash-cost
A hash cost for hashing users’ passwords.
Type: int
Default: 12
Environment variable: TCM_SECURITY_HASH_COST
Command-line option: --security.hash-cost
-
security.encryption-key
An encryption key for passwords used by TCM for accessing Tarantool
and etcd clusters.
Type: string
Default: “”
Environment variable: TCM_SECURITY_ENCRYPTION_KEY
Command-line option: --security.encryption-key
-
security.encryption-key-file
A path to the file with the encryption key for passwords used by TCM for accessing Tarantool
and etcd clusters.
Type: string
Default: “”
Environment variable: TCM_SECURITY_ENCRYPTION_KEY_FILE
Command-line option: --security.encryption-key-file
-
security.bootstrap-password
A password for the first login of the admin user. Only for testing purposes.
Type: string
Default: “”
Environment variable: TCM_SECURITY_BOOTSTRAP_PASSWORD
Command-line option: --security.bootstrap-password
-
security.bootstrap-api-token
A default API token for the admin user. Only for testing purposes.
Type: string
Default: “”
Environment variable: TCM_SECURITY_BOOTSTRAP_API_TOKEN
Command-line option: --security.bootstrap-api-token
-
security.integrity-check
Whether to check the digital signature. If true, an error is raised
when an incorrect signature is detected.
Type: bool
Default: false
Environment variable: TCM_SECURITY_INTEGRITY_CHECK
Command-line option: --security.integrity-check
-
security.signature-private-key-file
A path to a file with the private key to sign TCM data.
Type: string
Default: “”
Environment variable: TCM_SECURITY_SIGNATURE_PRIVATE_KEY_FILE
Command-line option: --security.signature-private-key-file
-
mode
The TCM mode: production, development, or test.
Type: string
Default: production
Environment variable: TCM_MODE
Command-line option: --mode
The feature section defines which optional TCM features and integrations are enabled.
-
feature.ttgraph
Whether Tarantool Graph DB integration is enabled.
Type: bool
Default: false
Environment variable: TCM_FEATURE_TTGRAPH
Command-line option: --feature.ttgraph
-
feature.column-store
Whether Tarantool Column Store integration is enabled.
Type: bool
Default: false
Environment variable: TCM_FEATURE_COLUMN_STORE
Command-line option: --feature.column-store
-
feature.tqe
Whether Tarantool Queue Enterprise integration is enabled.
Type: bool
Default: false
Environment variable: TCM_FEATURE_TQE
Command-line option: --feature.tqe
-
feature.api-token
Whether the use of API tokens is enabled.
Type: bool
Default: false
Environment variable: TCM_FEATURE_API_TOKEN
Command-line option: --feature.api-token
-
feature.tuples
Whether the use of Tuples is enabled.
Type: bool
Default: false
Environment variable: TCM_FEATURE_TUPLES
Command-line option: --feature.tuples
The initial-settings group defines entities that are created automatically
upon the first TCM startup.
See also Initial settings.
Important
The initial-settings.* configuration options can be set in the YAML
configuration file only. There are no environment variables or
command-line options for them.
-
initial-settings.clusters
An array of clusters to create in TCM automatically upon the first startup.
See also Initial settings.
Type: []Cluster
Default: []
-
initial-settings.clusters.<cluster>.id
Cluster ID. Skip this option to generate an ID automatically.
Specify the value 00000000-0000-0000-0000-000000000000
to customize the default cluster upon TCM startup.
Type: string
Default: “” (ID is generated automatically)
-
initial-settings.clusters.<cluster>.name
Cluster name.
-
initial-settings.clusters.<cluster>.description
Cluster description.
-
initial-settings.clusters.<cluster>.color
A color to highlight the cluster in TCM.
Possible values:
dark
gray
red
pink
grape
violet
indigo
blue
cyan
green
lime
yellow
orange
teal
empty string (no color)
Type: string
Default: “” (no color)
-
initial-settings.clusters.<cluster>.urls
URLs of additional services for the cluster. See also Adding a new cluster.
Type: []ClusterUrl
Default: []
-
initial-settings.clusters.<cluster>.<url>.label
URL label to show in TCM. Typically, this is the linked service name.
-
initial-settings.clusters.<cluster>.<url>.url
The URL address of the linked service.
-
initial-settings.clusters.<cluster>.storage-connection.provider
The type of the storage used for storing the cluster configuration.
Possible values:
etcd
tarantool
empty string (undefined)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.endpoints
An array of node URIs of the etcd cluster where the Tarantool cluster configuration is stored.
Type: []string
Default: []
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.auto-sync-interval
An automated sync interval.
Type: time.Duration
Default: 0 (disabled)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-timeout
An etcd dial timeout.
Type: time.Duration
Default: 0 (not set)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-keep-alive-time
A dial keep-alive time.
Type: time.Duration
Default: 0 (not set)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.dial-keep-alive-timeout
A dial keep-alive timeout.
Type: time.Duration
Default: 0 (not set)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.max-call-send-msg-size
The maximum size (in bytes) of a request from the cluster to its etcd
configuration storage.
Type: int
Default: 2097152
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.max-call-recv-msg-size
The maximum size (in bytes) of a response to the cluster from its etcd
configuration storage.
Type: int
Default: 0 (unlimited)
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.username
A username for accessing the cluster’s etcd storage.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.password
A password for accessing the cluster’s etcd storage.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.reject-old-cluster
Whether etcd should refuse to create a client against an outdated cluster.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.permit-without-stream
Whether keepalive pings can be sent to the etcd server without active streams.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.prefix
A prefix for the cluster configuration parameters in etcd.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.enabled
Indicates whether TLS is enabled for connections to the cluster’s etcd storage.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.cert-file
A path to a TLS certificate file to use for etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.key-file
A path to a TLS private key file to use for etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.trusted-ca-file
A path to a trusted CA certificate file to use for etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.client-cert-auth
Indicates whether client cert authentication is enabled.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.crl-file
A path to the client certificate revocation list file.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.insecure-skip-verify
Whether to skip checking the client certificate in etcd connections.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.skip-client-san-verify
Whether to skip verification of the SAN field in the client certificate for etcd connections.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.server-name
Name of the TLS server for etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.cipher-suites
TLS cipher suites for etcd connections. Possible values are the Golang tls.TLS_* constants.
Type: []uint16
Default: []
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.allowed-cn
An allowed common name for authentication in etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.allowed-hostname
An allowed TLS certificate name for authentication in etcd connections.
-
initial-settings.clusters.<cluster>.storage-connection.etcd-connection.tls.empty-cn
Whether the empty common name is allowed in etcd connections.
Type: bool
Default: false
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.username
A username for connecting to the cluster’s Tarantool-based configuration storage.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.password
A password for connecting to the cluster’s Tarantool-based configuration storage.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.endpoints
An array of the cluster’s Tarantool-based configuration storage URIs.
Type: []string
Default: []
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.method
An authentication method for the cluster’s Tarantool-based configuration storage.
Possible values are the Auth constants of the go-tarantool connector:
- AutoAuth (0)
- ChapSha1Auth
- PapSha256Auth
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.prefix
A prefix for the cluster configuration parameters in the Tarantool-based configuration storage.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.key-file
A path to a TLS private key file to use for connecting to the cluster’s Tarantool-based
configuration storage.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.cert-file
A path to an SSL certificate to use for connecting to the cluster’s Tarantool-based
configuration storage.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.ca-file
A path to a trusted CA certificate to use for connecting to the cluster’s Tarantool-based
configuration storage.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.ciphers
A list of SSL cipher suites that can be used for connecting to the cluster’s Tarantool-based
configuration storage. Possible values are listed in <uri>.params.ssl_ciphers.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.enabled
A password for an encrypted private SSL key to use for connecting to the cluster’s Tarantool-based
configuration storage.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.storage-connection.tarantool-connection.ssl.password-file
A text file with passwords for encrypted private SSL keys to use
for connecting to the cluster’s Tarantool-based configuration storage.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.username
A username for connecting to the cluster instances.
-
initial-settings.clusters.<cluster>.tarantool-connection.password
A password for connecting to the cluster instances.
-
initial-settings.clusters.<cluster>.tarantool-connection.method
An authentication method for connecting to the cluster.
Possible values are the Auth constants of the go-tarantool connector:
- AutoAuth (0)
- ChapSha1Auth
- PapSha256Auth
-
initial-settings.clusters.<cluster>.tarantool-connection.timeout
The cluster request timeout.
Type: time.Duration
Default: 0 (not set)
-
initial-settings.clusters.<cluster>.tarantool-connection.rate-limit
The cluster rate limit.
Type: uint
Default: 0 (not set)
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.key-file
A path to a TLS private key file to use for connecting to the cluster instances.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.cert-file
A path to an SSL certificate to use for connecting to the cluster instances.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.ca-file
A path to a trusted CA certificate to use for connecting to the cluster instances.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.ciphers
A list of SSL cipher suites that can be used for connecting to the cluster instances.
Possible values are listed in <uri>.params.ssl_ciphers.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.enabled
A password for an encrypted private SSL key to use for connecting to the cluster instances.
See also: Securing connections with SSL.
-
initial-settings.clusters.<cluster>.tarantool-connection.ssl.password-file
A text file with passwords for encrypted private SSL keys to use
for connecting to the cluster instances.
See also: Securing connections with SSL.
Integrity check
TCM supports the integrity check mechanism, which verifies the digital signature of centralized configuration files.
It ensures that TCM only applies configurations that are signed with a trusted private key.
This mechanism allows TCM to:
- Update the configuration with integrity check support
- Detect unauthorized changes in centralized configuration
Interactive console
The interactive console is Tarantool’s basic command-line interface for entering requests
and seeing results.
It is what users see when they start the server
without an instance file.
The interactive console is often called the Lua console to distinguish it from the administrative console,
but in fact it can handle both Lua and SQL input.
Most examples in this manual show what users see in the interactive console,
which includes:
- the tarantool> prompt
- an instruction (a Lua request or an SQL statement)
- a response (a display in either YAML or Lua format)
Interactive console input and output
The input language can be either Lua (default) or SQL. To change the input
language, run \set language <language>, for example:
\set language sql
The delimiter can be changed to any character with \set delimiter <character>.
By default, the delimiter is empty, which means the input does not need to end
with a delimiter.
For example, a common recommendation for SQL input is to use the semicolon delimiter:
\set delimiter ;
The output format can be either YAML (default) or Lua.
To change the output format, run \set output <format>, for example:
\set output lua
The default YAML output format is the following:
- The output starts with a document-start line "---".
- Each item begins on a separate line starting with "- ".
- Each sub-item in a nested structure is indented.
- The output ends with a document-end line "...".
The alternative Lua format for console output is the following:
- There are no lines for document-start or document-end.
- Items are separated by commas.
- Each sub-item in a nested structure is placed inside {} braces.
So, when an input is a Lua object description, the output in the Lua format equals it.
For the Lua output format, you can specify an end of statement symbol.
It is added to the end of each output statement in the current session and
can be used by scripts to parse the output. By default, the end of statement
symbol is empty. You can change it to any character or character sequence.
To set an end of statement symbol for the current session, run \set output lua,local_eos=<symbol>,
for example:
\set output lua,local_eos=;
To switch back to the empty end of statement symbol:
The YAML output has better readability.
The Lua output can be reused in requests.
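The point that Lua output can be reused in requests can be illustrated in plain Lua. The sketch below is an illustration under stated assumptions, not Tarantool's console serializer: the hypothetical to_lua() renders a value in the Lua output style described above (items separated by commas, nested tables in {} braces, map entries as key = value), and the hypothetical from_lua() evaluates such text back into a value. Only numbers, booleans, strings, and acyclic tables are handled.

```lua
-- Hypothetical helpers (not Tarantool API): to_lua() renders a value as
-- Lua source text, from_lua() evaluates such text back into a Lua value.
local KEYWORDS = {}
for word in ("and break do else elseif end false for function goto if in "
    .. "local nil not or repeat return then true until while"):gmatch("%S+") do
    KEYWORDS[word] = true
end

local function to_lua(value)
    local kind = type(value)
    if kind == "number" or kind == "boolean" then
        return tostring(value) -- non-integer numbers may be rounded
    elseif kind == "string" then
        return string.format("%q", value)
    elseif kind ~= "table" then
        error("unsupported type: " .. kind)
    end
    local parts, n = {}, #value
    -- Array part: items in order, separated by commas.
    for i = 1, n do
        parts[#parts + 1] = to_lua(value[i])
    end
    -- Map part: the remaining keys, sorted so that the output is stable.
    local keys = {}
    for k in pairs(value) do
        local in_array = type(k) == "number" and k >= 1 and k <= n and k % 1 == 0
        if not in_array then keys[#keys + 1] = k end
    end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    for _, k in ipairs(keys) do
        local key
        if type(k) == "string" and k:match("^[%a_][%w_]*$") and not KEYWORDS[k] then
            key = k                       -- {key = 1}
        else
            key = "[" .. to_lua(k) .. "]" -- {["end"] = 1}, {[10] = 1}
        end
        parts[#parts + 1] = key .. " = " .. to_lua(value[k])
    end
    return "{" .. table.concat(parts, ", ") .. "}"
end

local function from_lua(text)
    -- loadstring() exists in LuaJIT/Lua 5.1; load() accepts strings in 5.2+.
    local chunk = assert((loadstring or load)("return " .. text))
    return chunk()
end
```

For example, to_lua({1, 2, {key = 'v'}}) returns {1, 2, {key = "v"}}, which is valid Lua input again. The map keys are sorted here only to make the sketch deterministic; the real console may order keys differently.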
The table below shows output examples in these formats compared with the MsgPack
format, which is good for database storage.
| Type | Lua input | Lua output | YAML output | MsgPack storage |
| scalar | 1 | 1 | | \x01 |
| scalar sequence | 1, 2, 3 | 1, 2, 3 | | \x01 \x02 \x03 |
| 2-element table | {1, 2} | {1, 2} | | \x92 \x01 \x02 |
| map | {key = 1} | {key = 1} | | \x81 \xa3 \x6b \x65 \x79 \x01 |
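The MsgPack bytes in the table can be reproduced by hand. The sketch below is a minimal, hypothetical encoder (mp_encode(), encode(), and to_hex() are not Tarantool functions) that covers only the small values in the table: positive fixint (0 to 127), fixstr (up to 31 bytes), fixarray, and fixmap (up to 15 elements), as defined in the MessagePack specification. In Tarantool itself, use the built-in msgpack module, for example require('msgpack').encode(value).

```lua
-- Hypothetical helpers: a minimal MessagePack encoder for small values only.
-- Supported: positive fixint (0..127), fixstr (< 32 bytes),
-- fixarray and fixmap (< 16 elements). Anything else raises an error.
local function mp_encode(value, out)
    out = out or {}
    local kind = type(value)
    if kind == "number" and value >= 0 and value <= 127 and value % 1 == 0 then
        out[#out + 1] = string.char(value)                  -- 0xxxxxxx
    elseif kind == "string" and #value < 32 then
        out[#out + 1] = string.char(0xa0 + #value) .. value -- 101xxxxx + bytes
    elseif kind == "table" then
        local n, count = #value, 0
        for _ in pairs(value) do count = count + 1 end
        assert(count < 16, "only fixarray/fixmap sizes are supported")
        if count == n then
            out[#out + 1] = string.char(0x90 + n)           -- 1001xxxx: array
            for i = 1, n do mp_encode(value[i], out) end
        else
            -- 1000xxxx: map; pairs() order is unspecified for several keys
            out[#out + 1] = string.char(0x80 + count)
            for k, v in pairs(value) do
                mp_encode(k, out)
                mp_encode(v, out)
            end
        end
    else
        error("unsupported value: " .. tostring(value))
    end
    return out
end

local function encode(value)
    return table.concat(mp_encode(value))
end

-- Formats bytes in \xNN notation, separated by spaces.
local function to_hex(bytes)
    local parts = {}
    for i = 1, #bytes do
        parts[i] = string.format("\\x%02x", bytes:byte(i))
    end
    return table.concat(parts, " ")
end
```

For instance, to_hex(encode({key = 1})) gives \x81 \xa3 \x6b \x65 \x79 \x01: a one-element map header, a 3-byte string header followed by "key", and the fixint 1.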
The console parameters of a Tarantool instance can also be changed from another
instance using the console built-in module functions.
Since 2.10.0.
| Keyboard shortcut | Effect |
| CTRL+C | Discard current input with the SIGINT signal in the console mode and jump to a new line with a default prompt. |
| CTRL+D | Quit Tarantool interactive console. |
Important
Keep in mind that the CTRL+C shortcut shuts Tarantool down if a command is currently running
in the console.
The SIGINT signal stops an instance running in daemon mode.
LuaJIT memory profiler
Since version 2.7.1, Tarantool
has a built‑in module called misc.memprof that implements a LuaJIT memory
profiler (hereafter in this section, simply the profiler).
The profiler provides a memory allocation report that helps analyze Lua code
and find the places that put the most pressure on the Lua garbage collector (GC).
Working with the profiler
Using the profiler involves two steps:
- Collecting a binary profile of allocations,
reallocations, and deallocations in memory related to Lua
(hereafter, the binary memory profile, or binary profile for short).
- Parsing the collected binary profile to get
a human-readable profiling report.
Collecting a binary profile
To collect a binary profile for a particular part of the Lua code,
you need to place this part between two misc.memprof functions,
namely, misc.memprof.start() and misc.memprof.stop(), and then execute
the code in Tarantool.
Below is a chunk of Lua code named test.lua to illustrate this.
1  -- Prevent allocations on traces.
2  jit.off()
3  local str, err = misc.memprof.start("memprof_new.bin")
4  -- Lua doesn't create a new frame to call string.rep, and all allocations
5  -- are attributed not to the append() function but to the parent scope.
6  local function append(str, rep)
7      return string.rep(str, rep)
8  end
9
10 local t = {}
11 for i = 1, 1e4 do
12     -- table.insert is the built-in function and all corresponding
13     -- allocations are reported in the scope of the main chunk.
14     table.insert(t,
15         append('q', i)
16     )
17 end
18 local str, err = misc.memprof.stop()
The Lua code for starting the profiler – as in line 3 in the test.lua example above – is:
local str, err = misc.memprof.start(FILENAME)
where FILENAME is the name of the binary file where profiling events are written.
If the operation fails,
for example if it is not possible to open a file for writing or if the profiler is already running,
misc.memprof.start() returns nil as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.start() returns true.
The Lua code for stopping the profiler – as in line 18 in the test.lua example above – is:
local str, err = misc.memprof.stop()
If the operation fails,
for example if there is an error when the file descriptor is being closed
or if there is a failure during reporting,
misc.memprof.stop() returns nil as the first result,
an error-message string as the second result,
and a system-dependent error code number as the third result.
If the operation succeeds, misc.memprof.stop() returns true.
To generate the file with the memory profile in binary format
(in the test.lua code example above
the file name is memprof_new.bin), execute the code in Tarantool:
$ tarantool test.lua
Tarantool collects the allocation events in memprof_new.bin, puts
the file in its working directory, and closes
the session.
The test.lua code example above also illustrates the memory
allocation logic in some cases that are important to understand for
reading and analyzing
a profiling report:
- Line 2: It is recommended to switch the JIT compilation off by calling
jit.off()
before starting the profiler. Refer to the note about jit.off() below
for more details.
- Lines 6-8: Tail call optimization doesn’t create a new call frame, so all
allocations inside the function called via the
CALLT/CALLMT bytecodes
are attributed to the function’s caller. See also the comments preceding these lines.
- Lines 14-16: Usually the information about allocations inside Lua built‑ins
is not really
useful for developers. That’s why if a Lua built‑in function is called from
a Lua function, the profiler attributes all allocations to the Lua function.
Otherwise, this event is attributed to a C function.
See also the comments preceding these lines.
Parsing a binary profile and generating a profiling report
After getting the memory profile in binary format, the next step is
to parse it to get a human-readable profiling report. You can do this
via Tarantool by using the following command
(mind the hyphen - before the filename):
$ tarantool -e 'require("memprof")(arg)' - memprof_new.bin
where memprof_new.bin is the binary profile
generated earlier by tarantool test.lua.
Note
The tarantool -e ... command was slightly different in Tarantool versions
prior to 2.8.1.
Tarantool generates a profiling report and displays it on the console before closing
the session:
ALLOCATIONS
@test.lua:14: 10000 events +50240518 bytes -0 bytes
@test.lua:9: 1 events +32 bytes -0 bytes
@test.lua:8: 1 events +20 bytes -0 bytes
@test.lua:13: 1 events +24 bytes -0 bytes
REALLOCATIONS
@test.lua:13: 13 events +262216 bytes -131160 bytes
Overrides:
@test.lua:13
@test.lua:14: 11 events +49536 bytes -24768 bytes
Overrides:
@test.lua:14
INTERNAL
INTERNAL: 3 events +8448 bytes -16896 bytes
Overrides:
@test.lua:14
DEALLOCATIONS
INTERNAL: 1723 events +0 bytes -483515 bytes
@test.lua:14: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@test.lua:14 holds 50248326 bytes: 10010 allocs, 10 frees
@test.lua:13 holds 131080 bytes: 14 allocs, 13 frees
INTERNAL holds 8448 bytes: 3 allocs, 3 frees
@test.lua:9 holds 32 bytes: 1 allocs, 0 frees
@test.lua:8 holds 20 bytes: 1 allocs, 0 frees
Note
On macOS, a report will be different for the same chunk of code because
Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
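Let’s examine the report structure. A report has four sections:
- ALLOCATIONS
- REALLOCATIONS
- DEALLOCATIONS
- HEAP SUMMARY (described later in The heap summary and the --leak-only option)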
Each section contains event records that are sorted from the most frequent
to the least frequent.
An event record has the following format:
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
where:
- <filename>: the name of the file containing Lua code.
- <line_number>: the line number where the event is detected.
- <number_of_events>: the number of events for this code line.
- +<allocated> bytes: the amount of memory allocated during all the events on this line.
- -<freed> bytes: the amount of memory freed during all the events on this line.
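If you want to post-process reports in scripts, an event record can be parsed with a Lua pattern that follows the format above. The sketch below is a hypothetical helper, parse_event_record(), not part of Tarantool; it assumes records appear exactly as in the reports shown in this section and returns nil for other lines (section headers, Overrides labels, and so on).

```lua
-- Hypothetical helper: parses one event record of a memprof report, e.g.
-- "@test.lua:14: 10000 events +50240518 bytes -0 bytes".
local function parse_event_record(line)
    local location, events, allocated, freed = line:match(
        "^%s*(.-):%s+(%d+) events %+(%d+) bytes %-(%d+) bytes%s*$")
    if not location then
        return nil -- section headers, "Overrides:" lines, and so on
    end
    local record = {
        location = location,
        events = tonumber(events),
        allocated = tonumber(allocated),
        freed = tonumber(freed),
    }
    -- "@<filename>:<line_number>" points to Lua code;
    -- INTERNAL records have no file name or line number.
    local filename, lineno = location:match("^@(.+):(%d+)$")
    if filename then
        record.filename = filename
        record.line = tonumber(lineno)
    end
    return record
end
```

With the records parsed, the net effect of a line is simply record.allocated - record.freed.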
The Overrides label shows what allocation has been overridden.
See the test.lua chunk above
with the explanation in the comments for some examples.
The INTERNAL label indicates that this event is caused by internal LuaJIT
structures.
Note
Important note regarding the INTERNAL label and the recommendation
of switching the JIT compilation off (jit.off()): this version of the
profiler doesn’t support verbose reporting for allocations on
traces.
If memory allocations are made on a trace,
the profiler can’t associate the allocations with the part of Lua code
that generated the trace. In this case, the profiler labels such allocations
as INTERNAL.
So, if the JIT compilation is on,
new traces will be generated and there will be a mixture of events labeled
INTERNAL in the profiling report: some of them are really caused by
internal LuaJIT structures, but some of them are caused by allocations on
traces.
If you want a more definite report without JIT compiler allocations,
call jit.off() before starting the profiling.
If you want to completely exclude trace allocations from the report,
also remove the old traces by calling jit.flush() after
jit.off().
Nevertheless, switching the JIT compilation off before profiling is not
mandatory. It is a recommendation, and in some cases,
for example in a production environment, you may need to keep JIT compilation
on to see the full picture of all the memory allocations.
In this case, most of the INTERNAL events
are likely caused by traces.
Investigating Lua code with the help of profiling reports is always
code-dependent, and there can’t be 100% definite
recommendations in this regard. Nevertheless, you can see some of the things
in the Profiling a report analysis example later.
Below is an FAQ section with questions that are likely to arise while using
the profiler, discussed in a Q&A format.
Question (Q): Is the profiler suitable for C allocations or allocations
inside C code?
Answer (A): The profiler reports only allocation events caused by the Lua
allocator. All Lua-related allocations, such as table or string creation,
are reported. But the profiler doesn’t report allocations made by malloc()
or other non-Lua allocators. You can use valgrind to debug them.
Q: Why are there so many INTERNAL allocations in my profiling report?
What does it mean?
A: INTERNAL means that these allocations/reallocations/deallocations are
related to the internal LuaJIT structures or are made on traces.
Currently, the profiler doesn’t verbosely report allocations of objects
that are made during trace execution. Try adding jit.off()
before the profiler start.
Q: Why are there some reallocations/deallocations without an Overrides
section?
A: These objects may have been created before the profiler started. Adding
collectgarbage() before the profiler’s start enables collecting all
previously allocated objects that are dead when the profiler starts.
Q: Why are some objects not collected during profiling? Is it
a memory leak?
A: LuaJIT uses an incremental garbage collector (GC). A GC cycle may not be
finished at the moment the profiler stops. Add collectgarbage() before
stopping the profiler to make sure all dead objects are collected.
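The recommendations above (jit.off() and jit.flush() before profiling, collectgarbage() before starting and before stopping the profiler, and checking the return values of misc.memprof.start() and misc.memprof.stop()) can be combined in one wrapper. The sketch below is a hypothetical helper, profile_chunk(), not part of Tarantool. It uses LuaJIT's jit.status(), jit.off(), jit.flush(), and jit.on(), and accepts an optional memprof argument only so that it can be exercised with a stub outside Tarantool.

```lua
-- Hypothetical helper, not part of Tarantool: profiles fn() with
-- misc.memprof and applies the recommendations from this section.
local function profile_chunk(fn, filename, memprof)
    -- The memprof argument allows passing a stub; by default, misc.memprof.
    memprof = memprof or (misc and misc.memprof)
    assert(memprof, "misc.memprof is not available")

    -- LuaJIT only: remember the JIT state, then disable JIT and drop old
    -- traces so that allocations on traces don't show up as INTERNAL.
    local jit_was_on = jit ~= nil and jit.status()
    if jit ~= nil then
        jit.off()
        jit.flush()
    end

    collectgarbage() -- collect objects that are already dead

    local ok, err = memprof.start(filename)
    if not ok then
        if jit_was_on then jit.on() end
        return nil, err
    end

    -- Run the payload in protected mode so the profiler is always stopped.
    local run_ok, run_err = pcall(fn)

    collectgarbage() -- finish the GC cycle before stopping the profiler
    local stop_ok, stop_err = memprof.stop()

    if jit_was_on then jit.on() end

    if not run_ok then
        error(run_err, 0)
    end
    if not stop_ok then
        return nil, stop_err
    end
    return true
end
```

In Tarantool, a call such as profile_chunk(function() ... end, "memprof_new.bin") would write the binary profile, which can then be parsed with tarantool -e 'require("memprof")(arg)' - memprof_new.bin as described above. Restoring the previous JIT state is a design choice of this sketch, so that profiling does not permanently change how the rest of the code runs.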
Q: Can I profile not just a current chunk but the entire running application?
Can I start the profiler when the application is already running?
A: Yes. Here is an example of code that can be inserted in the Tarantool
console for a running instance.
local fiber = require("fiber")
local log = require("log")

fiber.create(function()
    fiber.name("memprof")

    collectgarbage() -- Collect all objects already dead
    log.warn("start of profile")

    local st, err = misc.memprof.start(FILENAME)
    if not st then
        log.error("failed to start profiler: %s", err)
        return -- The profiler is not running, so there is nothing to stop
    end

    fiber.sleep(TIME)

    collectgarbage() -- Collect objects that died during profiling
    st, err = misc.memprof.stop()

    if not st then
        log.error("profiler on stop error: %s", err)
    end

    log.warn("end of profile")
end)
where:
- FILENAME: the name of the binary file where profiling events are written
- TIME: the duration of profiling, in seconds.
Also, you can directly call misc.memprof.start() and misc.memprof.stop()
from a console.
Profiling a report analysis example
In the example below, the following Lua code named format_concat.lua is
investigated with the help of the memory profiler reports.
1  -- Prevent allocations on new traces.
2  jit.off()
3
4  local function concat(a)
5      local nstr = a.."a"
6      return nstr
7  end
8
9  local function format(a)
10     local nstr = string.format("%sa", a)
11     return nstr
12 end
13
14 collectgarbage()
15
16 local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
17
18 local st, err = misc.memprof.start(binfile)
19 assert(st, err)
20
21 -- Payload.
22 for i = 1, 10000 do
23     local f = format(i)
24     local c = concat(i)
25 end
26 collectgarbage()
27
28 local st, err = misc.memprof.stop()
29 assert(st, err)
30
31 os.exit()
When you run this code in Tarantool and
then parse the binary memory profile
in /tmp/memprof_format_concat.bin,
you will get the following profiling report:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 1 events +65536 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19996 events +0 bytes -558778 bytes
Overrides:
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 65536 bytes: 1 allocs, 0 frees
Reasonable questions regarding the report can be:
- Why are there no allocations related to the
concat() function?
- Why is the number of allocations not a round number?
- Why are there about 20K allocations instead of 10K?
First of all, LuaJIT doesn’t create a new string if the string with the same
payload exists (see details on lua-users.org/wiki).
This is called string interning.
So, when a string is
created via the format() function, there is no need to create the same
string via the concat() function, and LuaJIT just uses the previous one.
That is also the reason why the number of allocations is not a round number
as could be expected from the cycle operator for i = 1, 10000...:
Tarantool creates some
strings for internal needs and built‑in modules, so some strings already exist.
But why are there so many allocations? It’s almost twice as big as the expected
amount. This is because the string.format() built‑in function creates
another string necessary for the %s identifier, so there are two allocations
for each iteration: for tostring(i) and for string.format("%sa", string_i_value).
You can see the difference in behavior by adding the line
local _ = tostring(i) between lines 22 and 23.
To profile only the concat() function, comment out line 23 (which is
local f = format(i)) and run the profiler. Now the output looks like this:
ALLOCATIONS
@format_concat.lua:5: 10000 events +284411 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 10000 events +0 bytes -218905 bytes
Overrides:
@format_concat.lua:5
@format_concat.lua:5: 1 events +0 bytes -32768 bytes
HEAP SUMMARY:
@format_concat.lua:5 holds 65536 bytes: 10000 allocs, 9999 frees
Q: But what will change if JIT compilation is enabled?
A: In the code, comment out line 2 (which is
jit.off()) and run the profiler.
Now there are only 56 allocations in the report, and all the other
allocations are JIT-related (see also the related
dev issue):
ALLOCATIONS
@format_concat.lua:5: 56 events +1112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 2 events +382 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 58 events +0 bytes -1164 bytes
Overrides:
@format_concat.lua:5
INTERNAL
HEAP SUMMARY:
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
INTERNAL holds 360 bytes: 2 allocs, 1 frees
This happens because a trace has been compiled after 56 iterations (the default
value of the hotloop compiler parameter). Then, the
JIT-compiler removed the unused variable c from the trace, and, therefore,
the dead code of the concat() function is eliminated.
Next, let’s profile only the format() function with JIT enabled.
For that, comment out lines 2 and 24 (jit.off() and
local c = concat(i)), do not comment out line 23
(local f = format(i)), and run the profiler.
Now the output will look like this:
ALLOCATIONS
@format_concat.lua:10: 19996 events +624284 bytes -0 bytes
INTERNAL: 4 events +66928 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 19997 events +0 bytes -559034 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
@format_concat.lua:10: 2 events +0 bytes -98304 bytes
Overrides:
@format_concat.lua:10
HEAP SUMMARY:
INTERNAL holds 66928 bytes: 4 allocs, 0 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
Q: Why are there so many allocations in comparison to the concat() function?
A: The answer is simple: the string.format() function with the %s
identifier is not yet compiled via LuaJIT. So, a trace can’t be recorded and
the compiler doesn’t perform the corresponding optimizations.
If we change the format() function in lines 9-12 of the
Profiling a report analysis example
in the following way
local function format(a)
    local nstr = string.format("%sa", tostring(a))
    return nstr
end
the profiling report becomes much cleaner:
ALLOCATIONS
@format_concat.lua:10: 109 events +2112 bytes -0 bytes
@format_concat.lua:0: 4 events +640 bytes -0 bytes
INTERNAL: 3 events +1206 bytes -0 bytes
REALLOCATIONS
DEALLOCATIONS
INTERNAL: 112 events +0 bytes -2460 bytes
Overrides:
@format_concat.lua:0
@format_concat.lua:10
INTERNAL
HEAP SUMMARY:
INTERNAL holds 1144 bytes: 3 allocs, 1 frees
@format_concat.lua:0 holds 384 bytes: 4 allocs, 1 frees
The heap summary and the --leak-only option
This feature was added in version 2.8.1.
The end of each display is a HEAP SUMMARY section which looks like this:
@<filename>:<line number> holds <number of still reachable bytes> bytes:
<number of allocation events> allocs, <number of deallocation events> frees
Sometimes a program causes many deallocations, so the DEALLOCATIONS section
becomes large and the display is hard to read.
To minimize output, start the parsing with an extra flag, --leak-only,
for example:
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
When --leak-only is used, only the HEAP SUMMARY section is displayed.
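Because the --leak-only output consists of HEAP SUMMARY lines only, it is easy to process in a script. The sketch below is hypothetical (parse_heap_line() and top_holders() are not Tarantool functions) and assumes the line format shown above; it lists the locations that still hold memory, largest first.

```lua
-- Hypothetical helpers for reading the HEAP SUMMARY section of a memprof
-- report (for example, the output of the --leak-only option).
-- Each line looks like:
--   @<filename>:<line number> holds <bytes> bytes: <allocs> allocs, <frees> frees
local function parse_heap_line(line)
    local location, holds, allocs, frees = line:match(
        "^%s*(.-) holds (%d+) bytes: (%d+) allocs, (%d+) frees%s*$")
    if not location then
        return nil -- not a heap summary line
    end
    return {
        location = location,
        holds = tonumber(holds),
        allocs = tonumber(allocs),
        frees = tonumber(frees),
    }
end

-- Returns all heap summary rows that still hold memory, largest first.
local function top_holders(report_text)
    local rows = {}
    for line in report_text:gmatch("[^\n]+") do
        local row = parse_heap_line(line)
        if row and row.holds > 0 then
            rows[#rows + 1] = row
        end
    end
    table.sort(rows, function(a, b) return a.holds > b.holds end)
    return rows
end
```

For example, if the parser output is redirected to a file (the name leaks.txt is only an illustration), the rows can be read with local f = assert(io.open("leaks.txt")); local rows = top_holders(f:read("*a")); f:close().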
LuaJIT getmetrics
Tarantool can return metrics of a current instance via the Lua API or the C API.
-
getmetrics()
Get the metrics values into a table.
Parameters: none
Example: metrics_table = misc.getmetrics()
The metrics table contains 19 values.
All values have type = ‘number’ and are the result of a cast to double, so there may be a very slight precision loss.
Values whose names begin with gc_ are associated with the
LuaJIT garbage collector;
a fuller study of the garbage collector can be found at
a Lua-users wiki page
and
a slide from the creator of Lua.
Values whose names begin with jit_ are associated with the
“phases”
of the just-in-time compilation process; a fuller study of JIT phases can be found at
a master's thesis from cern.ch.
Values described as “monotonic” are cumulative, that is, they are “totals since
all operations began”, rather than “since the last getmetrics() call”.
Overflow is possible.
Because many values are monotonic,
a typical analysis involves calling getmetrics(), saving the table,
calling getmetrics() again and comparing the table to what was saved.
The difference is a “slope curve”.
An interesting slope curve is one that shows acceleration,
for example, when the difference between the latest value and the previous
value keeps increasing.
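The save-and-compare analysis described above can be sketched with two small hypothetical helpers (not part of Tarantool): metrics_delta() computes the difference between two snapshots, and accelerating() checks whether the differences between consecutive snapshots of one metric keep increasing. In Tarantool, the snapshots would be tables returned by misc.getmetrics(), taken for example in a loop with fiber.sleep() between calls; the sampling interval is up to you.

```lua
-- Hypothetical helpers for comparing misc.getmetrics() snapshots.

-- Returns a table with new[name] - old[name] for every numeric metric.
local function metrics_delta(old, new)
    local delta = {}
    for name, value in pairs(new) do
        if type(value) == "number" and type(old[name]) == "number" then
            delta[name] = value - old[name]
        end
    end
    return delta
end

-- Returns true if, for the given metric, each difference between consecutive
-- snapshots is larger than the previous one, that is, the slope curve
-- shows acceleration. At least three snapshots are needed.
local function accelerating(snapshots, name)
    if #snapshots < 3 then
        return false
    end
    local prev_diff
    for i = 2, #snapshots do
        local diff = snapshots[i][name] - snapshots[i - 1][name]
        if prev_diff ~= nil and diff <= prev_diff then
            return false
        end
        prev_diff = diff
    end
    return true
end
```

Since monotonic values are cumulative, only the differences are meaningful; comparing raw values of, say, gc_allocated between two instances says little.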
Some of the table members shown here are used in the examples that come later in this section.
| Name | Content | Monotonic? |
| gc_allocated | number of bytes of allocated memory | yes |
| gc_cdatanum | number of allocated cdata objects | no |
| gc_freed | number of bytes of freed memory | yes |
| gc_steps_atomic | number of steps of the garbage collector, atomic phases, incremental | yes |
| gc_steps_finalize | number of steps of the garbage collector, finalize | yes |
| gc_steps_pause | number of steps of the garbage collector, pauses | yes |
| gc_steps_propagate | number of steps of the garbage collector, propagate | yes |
| gc_steps_sweep | number of steps of the garbage collector, sweep phases (see the Sweep phase description) | yes |
| gc_steps_sweepstring | number of steps of the garbage collector, sweep phases for strings | yes |
| gc_strnum | number of allocated string objects | no |
| gc_tabnum | number of allocated table objects | no |
| gc_total | number of bytes of currently allocated memory (normally equals gc_allocated minus gc_freed) | no |
| gc_udatanum | number of allocated udata objects | no |
| jit_mcode_size | total size of all allocated machine code areas | no |
| jit_snap_restore | overall number of snap restores, based on the number of guard assertions leading to stopping trace executions (see the external Snap tutorial) | yes |
| jit_trace_abort | overall number of aborted traces | yes |
| jit_trace_num | number of JIT traces | no |
| strhash_hit | number of strings being interned: if a string with the same value is found via the hash, a new one is not created or allocated | yes |
| strhash_miss | total number of string allocations during the platform lifetime | yes |
Note: Although value names are similar to value names in
ujit.getmetrics(),
the values are not the same, primarily because many ujit numbers are not monotonic.
Note: The value names are similar to the value names in LuaJIT metrics,
and the values are exactly the same, but misc.getmetrics() is slightly easier to use
because there is no need to ‘require’ the misc module.
The Lua getmetrics() function is a wrapper for the C function luaM_metrics().
C programs may include a header named lmisclib.h.
The definitions in lmisclib.h include the following lines:
struct luam_Metrics { /* the names described earlier for Lua */ };
LUAMISC_API void luaM_metrics(lua_State *L, struct luam_Metrics *metrics);
The names of struct luam_Metrics members are the same as Lua’s
getmetrics table values names.
The data types of struct luam_Metrics members are all size_t.
The luaM_metrics() function will fill the *metrics structure
with the metrics related to the Lua state anchored to the L coroutine.
Example with a C program
Go through the C stored procedures tutorial.
Replace the easy.c example with
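#include <stdio.h>
#include "module.h"
#include <lmisclib.h>

int easy(box_function_ctx_t *ctx, const char *args, const char *args_end)
{
    /* Get the Lua state of the current Tarantool instance. */
    lua_State *ls = luaT_state();
    struct luam_Metrics m;
    /* Fill m with the metrics related to this Lua state. */
    luaM_metrics(ls, &m);
    /* The members are size_t, so print them with %zu. */
    printf("allocated memory = %zu\n", m.gc_allocated);
    return 0;
}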
Now when you go back to the client and execute the requests up to and including the line
capi_connection:call('easy')
you will see that the display is something like
“allocated memory = 4431950”
although the number will vary.
Example with gc_strnum, strhash_miss, and strhash_hit
To track new string object allocations:
function f()
collectgarbage("collect")
local oldm = misc.getmetrics()
local table_of_strings = {}
for i = 3000, 4000 do table.insert(table_of_strings, tostring(i)) end
for i = 3900, 4100 do table.insert(table_of_strings, tostring(i)) end
local newm = misc.getmetrics()
print("gc_strnum diff = " .. newm.gc_strnum - oldm.gc_strnum)
print("strhash_miss diff = " .. newm.strhash_miss - oldm.strhash_miss)
print("strhash_hit diff = " .. newm.strhash_hit - oldm.strhash_hit)
end
f()
The result will probably be:
“gc_strnum diff = 1100” (approximately), because 1202 strings were added but 101 of them were duplicates,
“strhash_miss diff = 1100” for the same reason,
“strhash_hit diff = 101” plus some overhead, for the same reason.
(There is always a slight overhead amount for strhash_hit, which can be ignored.)
We say “probably” because there is a chance that the strings were already
allocated somewhere.
It is a good thing if the slope curve of
strhash_miss is less than the slope curve of strhash_hit.
The other gc_*num values – gc_cdatanum, gc_tabnum, gc_udatanum – can be accessed
in a similar way.
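The examples in this section all follow the same pattern: take a snapshot, run some code, take another snapshot, and subtract. A minimal sketch of a reusable helper for that pattern follows. The names metrics_diff and measure are hypothetical (they are not part of Tarantool), and by default the sketch assumes it runs in Tarantool, where misc is a global; a different snapshot function can be passed in instead.

```lua
-- Hypothetical helper (not a Tarantool API): returns new[k] - old[k]
-- for every numeric field present in both snapshots.
local function metrics_diff(old, new)
    local diff = {}
    for k, v in pairs(new) do
        if type(v) == 'number' and type(old[k]) == 'number' then
            diff[k] = v - old[k]
        end
    end
    return diff
end

-- Hypothetical helper: runs fn between two snapshots and returns the diff.
-- By default, snapshots come from misc.getmetrics() (a global in Tarantool);
-- another snapshot function can be passed, for example, for testing.
local function measure(fn, getmetrics)
    getmetrics = getmetrics or misc.getmetrics
    collectgarbage("collect")
    local old = getmetrics()
    fn()
    local new = getmetrics()
    return metrics_diff(old, new)
end
```

In Tarantool, a call such as measure(function() ... end).gc_tabnum then reports how the number of table objects changed while the function ran.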
Any of the gc_*num values can be useful when looking for memory leaks: the total
number of these objects should not grow indefinitely.
A more general way to look for memory leaks is to watch gc_total.
Also, jit_mcode_size can be used to watch the amount of memory allocated for machine code traces.
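Watching gc_total over time can be automated. A minimal sketch, assuming it runs inside Tarantool (it uses the fiber and log modules and the misc global): grows_nonstop and start_gc_total_watcher are hypothetical names, and the sampling interval and window size are arbitrary choices. Forcing a full collection before each sample makes gc_total reflect live objects rather than uncollected garbage, but it has a cost of its own.

```lua
-- Hypothetical helper: returns true if the last `window` samples
-- are strictly increasing.
local function grows_nonstop(samples, window)
    if #samples < window then return false end
    for i = #samples - window + 2, #samples do
        if samples[i] <= samples[i - 1] then return false end
    end
    return true
end

-- Hypothetical background watcher (Tarantool only): samples gc_total every
-- `interval` seconds and logs a warning when it has grown `window` times in a row.
local function start_gc_total_watcher(interval, window)
    local fiber = require('fiber')
    local log = require('log')
    return fiber.create(function()
        local samples = {}
        while true do
            collectgarbage("collect")
            table.insert(samples, misc.getmetrics().gc_total)
            if #samples > window then table.remove(samples, 1) end
            if grows_nonstop(samples, window) then
                log.warn('gc_total grew for %d samples in a row: %d bytes',
                         window, samples[#samples])
            end
            fiber.sleep(interval)
        end
    end)
end
```

Steady growth is only a hint: a workload that is still warming up also makes gc_total grow, so a warning from such a watcher is a reason to investigate, not proof of a leak.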
Example with gc_allocated and gc_freed
To track an application’s effect on the garbage collector (less is better):
function f()
for i = 1, 10 do collectgarbage("collect") end
local oldm = misc.getmetrics()
local newm = misc.getmetrics()
oldm = misc.getmetrics()
collectgarbage("collect")
newm = misc.getmetrics()
print("gc_allocated diff = " .. newm.gc_allocated - oldm.gc_allocated)
print("gc_freed diff = " .. newm.gc_freed - oldm.gc_freed)
end
f()
The result will be: gc_allocated diff = 800, gc_freed diff = 800.
This shows that local ... = getmetrics() itself causes memory allocation
(because it creates a table and assigns it to a variable),
and that reassigning a variable (in this case, oldm) turns the table it
previously referenced into garbage, which is then freed.
Ordinarily the freeing would not occur immediately, but
collectgarbage("collect") forces it to happen so we can see the effect.
Example with gc_allocated and a space optimization
To test whether optimizing for space is possible with tables:
function f()
collectgarbage("collect")
local oldm = misc.getmetrics()
local t = {}
for i = 1, 513 do
t[i] = i
end
local newm = misc.getmetrics()
local diff = newm.gc_allocated - oldm.gc_allocated
print("diff = " .. diff)
end
f()
The result will show that diff equals approximately 18000.
Now see what happens if the table initialization is different:
function f()
local table_new = require "table.new"
local oldm = misc.getmetrics()
local t = table_new(513, 0)
for i = 1, 513 do
t[i] = i
end
local newm = misc.getmetrics()
local diff = newm.gc_allocated - oldm.gc_allocated
print("diff = " .. diff)
end
f()
The result will show that diff equals approximately 6000.
gc_steps_atomic and gc_steps_propagate
The slope curves of gc_steps_* items can be used for tracking pressure on
the garbage collector too.
During long-running routines, gc_steps_* values will increase,
but long times between gc_steps_atomic increases are a good sign.
Since gc_steps_atomic increases only once per garbage-collector cycle,
it shows how many garbage-collector cycles have occurred.
Also, increases in the gc_steps_propagate number can be used to
estimate indirectly how many objects there are. These values also correlate with the
garbage collector’s step multiplier.
For example, the number of incremental steps can grow, but according to the
step multiplier configuration, one step can process only a small number of objects.
So these metrics should be considered when configuring the garbage collector.
The following function takes a quick look at whether an SQL statement puts much pressure on the garbage collector:
function f()
collectgarbage("collect")
local oldm = misc.getmetrics()
collectgarbage("collect")
box.execute([[DROP TABLE _vindex;]])
local newm = misc.getmetrics()
print("gc_steps_atomic = " .. newm.gc_steps_atomic - oldm.gc_steps_atomic)
print("gc_steps_finalize = " .. newm.gc_steps_finalize - oldm.gc_steps_finalize)
print("gc_steps_pause = " .. newm.gc_steps_pause - oldm.gc_steps_pause)
print("gc_steps_propagate = " .. newm.gc_steps_propagate - oldm.gc_steps_propagate)
print("gc_steps_sweep = " .. newm.gc_steps_sweep - oldm.gc_steps_sweep)
end
f()
And the display will show that the gc_steps_* metrics are not significantly
different from what they would be if the box.execute() was absent.
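For long-running routines, the same two counters can be turned into rates. A minimal sketch, assuming two misc.getmetrics() snapshots taken a known number of seconds apart; gc_pressure is a hypothetical helper, not a Tarantool API, and "propagate steps per cycle" is only the rough, indirect hint described above, not an exact object count.

```lua
-- Hypothetical helper (not a Tarantool API): summarizes garbage-collector
-- pressure between two misc.getmetrics() snapshots taken `seconds` apart.
local function gc_pressure(old, new, seconds)
    assert(seconds > 0, 'seconds must be positive')
    -- gc_steps_atomic grows once per GC cycle, so its delta counts cycles
    local cycles = new.gc_steps_atomic - old.gc_steps_atomic
    local propagate = new.gc_steps_propagate - old.gc_steps_propagate
    return {
        cycles_per_second = cycles / seconds,
        -- propagate steps per completed cycle: a rough hint at how much
        -- marking work each cycle did (nil if no cycle completed)
        propagate_per_cycle = cycles > 0 and propagate / cycles or nil,
    }
end
```

In Tarantool, one could take old = misc.getmetrics(), wait with require('fiber').sleep(10) while the workload runs, take new = misc.getmetrics(), and then call gc_pressure(old, new, 10).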
Example with jit_trace_num and jit_trace_abort
Just-in-time compilers will “trace” code looking for opportunities to
compile. jit_trace_abort can show how often there was a failed attempt
(less is better), and jit_trace_num can show how many traces were
generated since the last flush (usually more is better).
The following function does not contain code that can cause trouble for LuaJIT:
function f()
jit.flush()
for i = 1, 10 do collectgarbage("collect") end
local oldm = misc.getmetrics()
collectgarbage("collect")
local sum = 0
for i = 1, 57 do
sum = sum + 57
end
for i = 1, 10 do collectgarbage("collect") end
local newm = misc.getmetrics()
print("trace_num = " .. newm.jit_trace_num - oldm.jit_trace_num)
print("trace_abort = " .. newm.jit_trace_abort - oldm.jit_trace_abort)
end
f()
The result is: trace_num = 1, trace_abort = 0. Fine.
The following function seemingly does contain code that can cause trouble for LuaJIT:
jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
_G.globalthing = 5
function f()
jit.flush()
collectgarbage("collect")
local oldm = misc.getmetrics()
collectgarbage("collect")
local sum = 0
for i = 1, box.space._vindex:count()+ _G.globalthing do
box.execute([[SELECT RANDOMBLOB(0);]])
require('buffer').ibuf()
_G.globalthing = _G.globalthing - 1
end
local newm = misc.getmetrics()
print("trace_num = " .. newm.jit_trace_num - oldm.jit_trace_num)
print("trace_abort = " .. newm.jit_trace_abort - oldm.jit_trace_abort)
end
f()
The result is: trace_num = between 2 and 4, trace_abort = 1.
This means that up to four traces needed to be generated instead of one,
and that LuaJIT gave up on (aborted) at least one trace.
Tracing more will reveal that the problem is
not the suspicious-looking statements within the function; it
is the jit.opt.start call.
(A look at a jit.dump file might help in examining the trace compilation process.)
Administration
Tarantool is designed to have multiple running instances on the same host.
Here we show how to administer Tarantool instances using any of the following
utilities:
- systemd native utilities, or
- tt, a command-line utility for managing Tarantool-based applications.
Note
- Unlike the rest of this manual, here we use system-wide paths.
- Console examples here are for Fedora.
Managing modules
This section covers the installation and reloading of Tarantool modules.
To learn about writing your own module and contributing it,
check the Contributing a module section.
Modules in Lua and C that come from Tarantool developers and community
contributors are available in the following locations:
- Tarantool modules repository (see below)
- Tarantool deb/rpm repositories (see below)
Installing a module from deb/rpm
Follow these steps:
Install Tarantool as recommended on the
download page.
Install the module you need. Look up the module’s name on
Tarantool rocks page and put the prefix
“tarantool-” before the module name to avoid ambiguity:
$ # for Ubuntu/Debian:
$ sudo apt-get install tarantool-<module-name>
$ # for RHEL/CentOS/Amazon:
$ sudo yum install tarantool-<module-name>
For example, to install the module
vshard on Ubuntu, say:
$ sudo apt-get install tarantool-vshard
Once these steps are complete, you can load the module with require(), for example: vshard = require('vshard').
You can reload any Tarantool application or module with zero downtime.
Reloading a module in Lua
Here’s an example that illustrates the most typical case – “update and reload”.
Update the application file.
For example, a module in /usr/share/tarantool/app.lua:
local function start()
-- initial version
box.once("myapp:v1.0", function()
box.schema.space.create("somedata")
box.space.somedata:create_index("primary")
...
end)
-- migration code from 1.0 to 1.1
box.once("myapp:v1.1", function()
box.space.somedata.index.primary:alter(...)
...
end)
-- migration code from 1.1 to 1.2
box.once("myapp:v1.2", function()
box.space.somedata.index.primary:alter(...)
box.space.somedata:insert(...)
...
end)
-- start some background fibers if you need
end
local function stop()
-- stop all background fibers and clean up resources
end
local function api_for_call(xxx)
-- do some business
end
return {
start = start,
stop = stop,
api_for_call = api_for_call
}
Update the instance file.
For example, /etc/tarantool/instances.enabled/my_app.lua:
#!/usr/bin/env tarantool
--
-- hot code reload example
--
box.cfg({listen = 3302})
local log = require('log')
-- ATTENTION: unload it all properly!
local app = package.loaded['app']
if app ~= nil then
-- stop the old application version
app.stop()
-- unload the application
package.loaded['app'] = nil
-- unload all dependencies
package.loaded['somedep'] = nil
end
-- load the application
log.info('require app')
app = require('app')
-- start the application
app.start({ --[[ some app options controlled by sysadmins ]] })
The important thing here is to properly unload the application and its
dependencies.
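Listing every dependency by hand, as with package.loaded['somedep'] = nil above, is easy to get wrong as an application grows. A minimal sketch of a helper that clears a module and all of its submodules from package.loaded follows. unload_module is a hypothetical name, and the sketch assumes the application keeps its dependencies under one name prefix (for example, app and app.db); it does not stop background fibers, so app.stop() is still needed.

```lua
-- Hypothetical helper: removes a module and all of its submodules
-- (for example, 'app' and 'app.db') from package.loaded, so that the
-- next require() loads them from disk again.
local function unload_module(name)
    local prefix = name .. '.'
    for mod in pairs(package.loaded) do
        if type(mod) == 'string'
                and (mod == name or mod:sub(1, #prefix) == prefix) then
            -- Clearing existing fields during a pairs() traversal is allowed in Lua
            package.loaded[mod] = nil
        end
    end
end
```

In the instance file, the two package.loaded assignments could then be replaced with a single unload_module('app') call after app.stop().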
Manually reload the application file.
For example, using tt:
$ tt connect my_app -f /etc/tarantool/instances.enabled/my_app.lua
Logs
Each Tarantool instance logs important events to its own log file.
For instances started with tt, the log location is defined by
the log_dir parameter in the tt configuration.
By default, it’s /var/log/tarantool in the tt system mode,
and the var/log subdirectory of the tt working directory in the local mode.
In the specified location, tt creates separate directories for each instance’s logs.
To check how logging works, write something to the log using the log module:
$ tt connect application
• Connecting to the instance...
• Connected to application
application> require('log').info("Hello for the manual readers")
---
...
Then check the logs:
$ tail instances.enabled/application/var/log/instance001/tt.log
2024-04-09 17:34:29.489 [49502] main/106/gc I> wal/engine cleanup is resumed
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'instance_name' configuration option to "instance001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'custom_proc_title' configuration option to "tarantool - instance001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'log_nonblock' configuration option to false
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'replicaset_name' configuration option to "replicaset001"
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'listen' configuration option to [{"uri":"127.0.0.1:3301"}]
2024-04-09 17:34:29.489 [49502] main/107/checkpoint_daemon I> scheduled next checkpoint for Tue Apr 9 19:08:04 2024
2024-04-09 17:34:29.489 [49502] main/104/interactive/box.load_cfg I> set 'metrics' configuration option to {"labels":{"alias":"instance001"},"include":["all"],"exclude":[]}
2024-04-09 17:34:29.489 [49502] main I> entering the event loop
2024-04-09 17:34:38.905 [49502] main/116/console/unix/:/tarantool I> Hello for the manual readers
When logging to a file, the system administrator must ensure
logs are rotated in a timely manner and do not take up all the available disk space.
The recommended way to prevent log files from growing indefinitely is to use an external
log rotation program, for example, logrotate, which is pre-installed on most
mainstream Linux distributions.
A Tarantool log rotation configuration for logrotate can look like this:
# /var/log/tarantool/<env>/<app>/<instance>/*.log
/var/log/tarantool/*/*/*/*.log {
    daily
    size 512k
    missingok
    rotate 10
    compress
    delaycompress
    # Run tt logrotate only once after all logs are rotated.
    sharedscripts
    postrotate
        /usr/bin/tt -S logrotate
    endscript
}
In this configuration, tt logrotate is called after each log
rotation to reopen the instance log files after they are moved by the logrotate
program.
There is also the built-in function log.rotate(), which you
can call on an instance to reopen its log file after rotation.
Tarantool can write its logs to a log file, to syslog, or to a specified program through a pipe.
For example, to send logs to syslog, specify the log.to parameter as follows:
log:
  to: syslog
  syslog:
    server: '127.0.0.1:514'
Security
Tarantool allows for two types of connections:
- With console.listen() function from
console module,
you can set up a port which can be used to open an administrative console to
the server. This is for administrators to connect to a running instance and
make requests. tt invokes console.listen() to create a
control socket for each started instance.
- With box.cfg{listen=…} parameter from
box
module, you can set up a binary port for connections which read and write to
the database or invoke stored procedures.
When you connect to an admin console:
- The client-server protocol is plain text.
- No password is necessary.
- The user is automatically ‘admin’.
- Each command is fed directly to the built-in Lua interpreter.
Therefore you must set up ports for the admin console very cautiously. If it is
a TCP port, it should only be opened for a specific IP. Ideally, it should not
be a TCP port at all but a Unix domain socket, so that access to the
server machine is required. Thus a typical port setup for admin console is:
console.listen('/var/lib/tarantool/socket_name.sock')
and a typical connection URI is:
/var/lib/tarantool/socket_name.sock
if the listener has the privilege to write on /var/lib/tarantool and the
connector has the privilege to read on /var/lib/tarantool. Alternatively,
to connect to an admin console of an instance started with tt, use
tt connect.
To find out whether a TCP port is a port for admin console, use telnet.
For example:
$ telnet 0 3303
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
Tarantool 2.1.0 (Lua console)
type 'help' for interactive help
In this example, the response does not include the word “binary” and does
include the words “Lua console”. Therefore it is clear that this is a successful
connection to a port for admin console, and you can now enter admin requests on
this terminal.
When you connect to a binary port:
- The client-server protocol is binary.
- The user is automatically ‘guest’.
- To change the user, it’s necessary to authenticate.
For ease of use, the tt connect command automatically detects the type
of connection during the handshake and uses the EVAL
binary protocol command when it’s necessary to execute Lua commands over a binary
connection. To execute EVAL, the authenticated user must have the global “EXECUTE”
privilege.
Therefore, when SSH access to the machine is not available, creating a
Tarantool user with the global “EXECUTE” privilege and a non-empty password can
give a system administrator remote access to an instance.
Access control
Tarantool enables flexible management of access to various database resources.
The main concepts of Tarantool access control system are as follows:
- A user is a person or program that interacts with a Tarantool instance.
- An object is an entity to which access can be granted, for example, a space, an index, or a function.
- A privilege allows a user to perform certain operations on specific objects, for example, creating spaces, reading or updating data.
- A role is a named collection of privileges that can be granted to a user.
A user identifies a person or program that interacts with a Tarantool instance.
There might be different types of users, for example:
- A database administrator responsible for the overall management and administration of a database.
An administrator can create other users and grant them specified privileges.
- A user with limited access to certain data and stored functions.
Such users can get their privileges from the database administrator.
- Users used in communications between Tarantool instances. For example, such users can be created to maintain replication and sharding in a Tarantool cluster.
There are two built-in users in Tarantool:
admin is a user with all available administrative privileges.
If the connection uses an admin-console port, the current user is admin.
For example, admin is used when connecting to an instance locally with tt connect using the instance name:
$ tt connect app:instance001
To allow remote binary port connections using the admin user, you need to set a password.
guest is a user with minimum privileges used by default for remote binary port connections.
For example, guest is used when connecting to an instance with tt connect using the IP address and port without specifying a user name:
$ tt connect 192.168.10.10:3301
Warning
Given that the guest user allows unauthenticated access to Tarantool instances, granting additional privileges to this user is not recommended.
For example, granting execute access to universe allows remote code execution on instances.
Note
Information about users is stored in the _user space.
Any user (except guest) may have a password.
If a password is not set, a user cannot connect to Tarantool instances.
Tarantool password hashes are stored in the _user system space.
By default, Tarantool uses the CHAP protocol to authenticate users and applies SHA-1 hashing to
passwords.
So, if the password is ‘123456’, the stored hash is a string like ‘a7SDfrdDKRBe5FaN2n3GftLKKtk=’.
In the Enterprise Edition, you can enable PAP authentication with the SHA256 hashing algorithm.
Tarantool Enterprise Edition allows you to improve database security by enforcing the use of strong passwords, setting up a maximum password age, and so on.
Learn more from the Authentication topic.
An object is a securable entity to which access can be granted.
Tarantool has a number of objects that enable flexible management of access to data, stored functions, specific actions, and so on.
Below are a few examples of objects:
universe represents a database (box.schema) that contains database objects, including spaces, indexes, users, roles, sequences, and functions.
Granting privileges to universe gives a user access to any object in a database.
space enables granting privileges to user-created or system spaces.
function enables granting privileges to functions.
Note
The full list of object types is available in the Object types section.
The privileges granted to a user determine which operations the user can perform, for example:
- The read and write permissions granted to the space object allow a user to read or modify data in the specified space.
- The create permission granted to the space object allows a user to create new spaces.
- The execute permission granted to the function object allows a user to execute the specified function.
- The session permission granted to the universe object allows a user to connect to an instance over IPROTO.
- The usage permission granted to the universe object allows a user to use their privileges on database objects (for example, read, write, and alter space).
- The alter permission granted to a user allows modifying its own settings, for example, a password.
- The drop permission granted to a user allows dropping users.
Note that some privileges might require read and write access to certain system spaces.
For example, the create permission granted to the space object requires read and write permissions to the _space system space.
Similarly, granting the ability to create functions requires read and write access to the _func space.
Note
Information about privileges is stored in the _priv space.
A role is a container for privileges that can be granted to users.
Roles can also be assigned to other roles, creating a role hierarchy.
There are the following built-in roles in Tarantool:
super has all available administrative permissions.
public has certain read permissions. This role is automatically granted to new users when they are created.
replication can be granted to a user used to maintain replication in a cluster.
sharding can be granted to a user used to maintain sharding in a cluster.
Note
The sharding role is created only if an instance is managed using YAML configuration.
Below are a few diagrams that demonstrate how privileges can be granted to a user with and without roles.
In this example, a user gets privileges directly without using roles.
user1 ── privilege1
      ├─── privilege2
      └─── privilege3
In this example, a user gets all privileges provided by role1 and specific privileges assigned directly.
user1 ── role1 ── privilege1
      │        └─── privilege2
      ├─── privilege3
      └─── privilege4
In this example, role2 is granted to role1.
This means that a user with role1 subsequently gets all privileges from both roles role1 and role2.
user1 ── role1 ── privilege1
      │        ├─── privilege2
      │        └─── role2
      │             ├─── privilege3
      │             └─── privilege4
      ├─── privilege5
      └─── privilege6
Note
Information about roles is stored in the _user space.
An owner of a database object is the user who created it.
The owner of the database and the owner of objects that are created initially (the system spaces and the default users) is the admin user.
Owners automatically have privileges for objects they create.
They can share these privileges with other users or roles using box.schema.user.grant() and box.schema.role.grant().
Note
Information about users who gave the specified privileges is stored in the _priv space.
A session is the state of a connection to Tarantool.
The session contains:
- An integer ID identifying the connection.
- The current user associated with the connection.
- The text description of the connected peer.
- A session’s local state, such as Lua variables and functions.
In Tarantool, a single session can execute multiple concurrent transactions.
Each transaction is identified by a unique integer ID, which can be queried
at the start of the transaction using box.session.sync().
To create a new user, call box.schema.user.create().
In the example below, a user is created without a password:
box.schema.user.create('testuser')
In this example, the password is specified in the options parameter:
box.schema.user.create('testuser', { password = 'foobar' })
To set or change a user’s password, use box.schema.user.passwd().
In the example below, the password is set for the currently logged-in user:
box.schema.user.passwd('foobar')
To set the password for the specified user, pass a username and password as shown below:
box.schema.user.passwd('testuser', 'foobar')
Granting privileges to a user
To grant the specified privileges to a user, use the box.schema.user.grant() function.
In the example below, testuser gets read permissions to the writers space and read/write permissions to the books space:
box.schema.user.grant('testuser', 'read', 'space', 'writers')
box.schema.user.grant('testuser', 'read,write', 'space', 'books')
Learn more about granting privileges to different types of objects from Granting privileges.
Revoking a user’s privileges
To revoke the specified privileges, use the box.schema.user.revoke() function.
In the example below, write access to the books space is revoked:
box.schema.user.revoke('testuser', 'write', 'space', 'books')
Revoking the session permission to universe can be used to prevent a user from connecting to a Tarantool instance:
box.schema.user.revoke('testuser', 'session', 'universe')
Changing the current user
The current user name can be found using box.session.user().
box.session.user()
--[[
- admin
--]]
The current user can be changed:
For an admin-console connection: using box.session.su():
box.session.su('testuser')
box.session.user()
--[[
- testuser
--]]
For a binary port connection: using the
AUTH protocol command, supported by most clients.
For a binary-port connection invoking a stored function with the CALL command:
if the SETUID
property is enabled for the function,
Tarantool temporarily replaces the current user with the
function’s creator, with all the creator’s privileges, during function execution.
To create a new role, call box.schema.role.create().
In the example below, two roles are created:
box.schema.role.create('books_space_manager')
box.schema.role.create('writers_space_reader')
Granting privileges to a role
To grant the specified privileges to a role, use the box.schema.role.grant() function.
In the example below, the books_space_manager role gets read and write permissions to the books space:
box.schema.role.grant('books_space_manager', 'read,write', 'space', 'books')
The writers_space_reader role gets read permissions to the writers space:
box.schema.role.grant('writers_space_reader', 'read', 'space', 'writers')
Learn more about granting privileges to different types of objects from Granting privileges.
Note
Not all privileges can be granted to roles.
Learn more from Permissions.
Granting a role to a role
Roles can be assigned to other roles.
In the example below, the newly created all_spaces_manager role gets all privileges granted to books_space_manager and writers_space_reader:
box.schema.role.create('all_spaces_manager')
box.schema.role.grant('all_spaces_manager', 'books_space_manager')
box.schema.role.grant('all_spaces_manager', 'writers_space_reader')
Granting a role to a user
To grant the specified role to a user, use the box.schema.user.grant() function.
In the example below, testuser gets privileges granted to the books_space_manager and writers_space_reader roles:
box.schema.user.grant('testuser', 'books_space_manager')
box.schema.user.grant('testuser', 'writers_space_reader')
Revoking a role from a user
To revoke the specified role from a user, revoke the execute privilege for this role using the box.schema.user.revoke() function.
In the example below, the writers_space_reader role is revoked from testuser:
box.schema.user.revoke('testuser', 'execute', 'role', 'writers_space_reader')
To revoke a role’s privileges, use box.schema.role.revoke().
To grant the specified privileges to a user or role, use the box.schema.user.grant() and box.schema.role.grant() functions,
which have similar signatures and accept the same set of arguments.
For example, the box.schema.user.grant() signature looks as follows:
box.schema.user.grant(username, permissions, object-type, object-name[, {options}])
username: the name of the user that gets the specified privileges.
permissions: a string value that represents permissions granted to the user. If there are several permissions, they should be separated by commas without a space.
object-type: a type of object to which permissions are granted.
object-name: the name of the object to which permissions are granted.
An empty string ("") or nil provided instead of object-name grants the specified permissions to all objects of the specified type.
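The signature above also takes an optional options table. A minimal sketch, assuming box.cfg{} has been called and the code runs as admin: it creates testuser if it does not exist yet and grants read access to all spaces by passing nil as object-name. The if_not_exists = true option is used so that re-running the script does not fail on a grant that already exists.

```lua
-- Create the user only if it is missing (no password, as in the earlier example).
box.schema.user.create('testuser', {if_not_exists = true})

-- nil as object-name: grant 'read' on all objects of type 'space'.
box.schema.user.grant('testuser', 'read', 'space', nil, {if_not_exists = true})

-- Re-running the same grant is a no-op instead of an error.
box.schema.user.grant('testuser', 'read', 'space', nil, {if_not_exists = true})
```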
Note
object-name is ignored for the following combinations of permissions and object types:
- Any permission granted to universe.
- The create and drop permissions for the following object types: user, role, space, function, sequence.
- The execute permission for the following object types: lua_eval, lua_call, sql.
In the example below, testuser gets privileges allowing them to create any object of any type:
box.schema.user.grant('testuser','read,write,create','universe')
In this example, testuser can grant access to objects that testuser created:
box.schema.user.grant('testuser','write','space','_priv')
Creating and altering spaces
In the example below, testuser gets privileges allowing them to create spaces:
box.schema.user.grant('testuser','create','space')
box.schema.user.grant('testuser','write', 'space', '_schema')
box.schema.user.grant('testuser','write', 'space', '_space')
As you can see, the ability to create spaces also requires write access to certain system spaces.
To allow testuser to drop a space that has associated objects, add the following privileges:
box.schema.user.grant('testuser','create,drop','space')
box.schema.user.grant('testuser','write','space','_schema')
box.schema.user.grant('testuser','write','space','_space')
box.schema.user.grant('testuser','write','space','_space_sequence')
box.schema.user.grant('testuser','read','space','_trigger')
box.schema.user.grant('testuser','read','space','_fk_constraint')
box.schema.user.grant('testuser','read','space','_ck_constraint')
box.schema.user.grant('testuser','read','space','_func_index')
Creating and altering indexes
In the example below, testuser gets privileges allowing them to create indexes in the ‘writers’ space:
box.schema.user.grant('testuser','create,read','space','writers')
box.schema.user.grant('testuser','read,write','space','_space_sequence')
box.schema.user.grant('testuser','write', 'space', '_index')
To allow testuser to alter indexes in the writers space, grant the privileges below.
This example assumes that indexes in the writers space are not created by testuser.
box.schema.user.grant('testuser','alter','space','writers')
box.schema.user.grant('testuser','read','space','_space')
box.schema.user.grant('testuser','read','space','_index')
box.schema.user.grant('testuser','read','space','_space_sequence')
box.schema.user.grant('testuser','write','space','_index')
If testuser created indexes in the writers space, granting the following privileges is enough to alter indexes:
box.schema.user.grant('testuser','read','space','_space_sequence')
box.schema.user.grant('testuser','read,write','space','_index')
In this example, testuser gets privileges allowing them to select data from the ‘writers’ space:
box.schema.user.grant('testuser','read','space','writers')
In this example, testuser is allowed to read and modify data in the ‘books’ space:
box.schema.user.grant('testuser','read,write','space','books')
Creating and dropping sequences
In this example, testuser gets privileges to create sequence generators:
box.schema.user.grant('testuser','create','sequence')
box.schema.user.grant('testuser', 'read,write', 'space', '_sequence')
To let testuser drop a sequence, grant them the following privileges:
box.schema.user.grant('testuser','drop','sequence')
box.schema.user.grant('testuser','write','space','_sequence_data')
box.schema.user.grant('testuser','write','space','_sequence')
In this example, testuser is allowed to use the id_seq:next() function with a sequence named ‘id_seq’:
box.schema.user.grant('testuser','read,write','sequence','id_seq')
In the next example, testuser is allowed to use the id_seq:set() or id_seq:reset() functions with a sequence named ‘id_seq’:
box.schema.user.grant('testuser','write','sequence','id_seq')
Creating and dropping functions
In this example, testuser gets privileges to create functions:
box.schema.user.grant('testuser','create','function')
box.schema.user.grant('testuser','read,write','space','_func')
To let testuser drop a function, grant them the following privileges:
box.schema.user.grant('testuser','drop','function')
box.schema.user.grant('testuser','write','space','_func')
To give the ability to execute a function named ‘sum’, grant the following privileges:
box.schema.user.grant('testuser','execute','function','sum')
Granting the ‘execute’ privilege on lua_call permits the user to call any global (accessible via the _G Lua table)
user-defined Lua function with the IPROTO_CALL request. To grant permission to call only a specific non-persistent function,
specify its name when granting the lua_call privilege.
Note
The function doesn’t need to be defined at the time privileges are granted: the user gets access to the function once it is defined.
function my_func_1() end
function my_func_2() end
box.cfg({listen = 3301})
box.schema.user.create('alice', {password = 'secret'})
conn = require('net.box').connect(box.cfg.listen, {user = 'alice', password = 'secret'})
box.schema.user.grant('alice', 'execute', 'lua_call', 'my_func_1')
conn:call('my_func_1') -- ok
conn:call('my_func_2') -- access denied: lua_call was granted for my_func_1 only
box.schema.user.grant('alice', 'execute', 'lua_call', 'box.session.su')
conn:call('box.session.su', {'admin'}) -- ok: a built-in function can be called once it is granted by name
In this example, testuser gets privileges to create other users:
box.schema.user.grant('testuser','create','user')
box.schema.user.grant('testuser', 'read,write', 'space', '_user')
box.schema.user.grant('testuser', 'write', 'space', '_priv')
To let testuser create new roles, grant the following privileges:
box.schema.user.grant('testuser','create','role')
box.schema.user.grant('testuser', 'read,write', 'space', '_user')
box.schema.user.grant('testuser', 'write', 'space', '_priv')
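Each create privilege in the examples above is paired with access to one or more system spaces. As a sketch, a small hypothetical helper (create_grants is not part of Tarantool) can collect the argument lists for box.schema.user.grant() from these examples, so that they are always applied together. Only the object types shown above are covered; any other type raises an error instead of guessing.

```lua
-- Grants taken from the examples above: the create privilege itself plus
-- the system space access it requires.
local CREATE_GRANTS = {
    sequence = {
        {'create', 'sequence'},
        {'read,write', 'space', '_sequence'},
    },
    ['function'] = {  -- 'function' is a Lua keyword, hence the brackets
        {'create', 'function'},
        {'read,write', 'space', '_func'},
    },
    user = {
        {'create', 'user'},
        {'read,write', 'space', '_user'},
        {'write', 'space', '_priv'},
    },
    role = {
        {'create', 'role'},
        {'read,write', 'space', '_user'},
        {'write', 'space', '_priv'},
    },
}

local unpack = table.unpack or unpack  -- Lua 5.2+ vs LuaJIT/Lua 5.1

-- Hypothetical helper: returns a list of argument lists for
-- box.schema.user.grant(), each starting with the user name.
local function create_grants(username, object_type)
    local grants = CREATE_GRANTS[object_type]
    if grants == nil then
        error('no example grants for object type: ' .. tostring(object_type))
    end
    local result = {}
    for _, g in ipairs(grants) do
        result[#result + 1] = {username, unpack(g)}
    end
    return result
end

-- Usage sketch in a Tarantool console:
-- for _, args in ipairs(create_grants('testuser', 'role')) do
--     box.schema.user.grant(unpack(args))
-- end
```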
To let testuser execute Lua code, grant the execute privilege to the lua_eval object:
box.schema.user.grant('testuser','execute','lua_eval')
Similarly, executing an arbitrary SQL expression requires the execute privilege to the sql object:
box.schema.user.grant('testuser','execute','sql')
In the example below, the created Lua function is executed on behalf of its
creator, even if called by another user.
First, the two spaces (space1 and space2) are created, and a no-password user (private_user)
is granted full access to them. Then read_and_modify is defined and private_user becomes this function’s creator.
Finally, another user (public_user) is granted access to execute Lua functions created by private_user.
box.schema.space.create('space1')
box.schema.space.create('space2')
box.space.space1:create_index('pk')
box.space.space2:create_index('pk')
box.schema.user.create('private_user')
box.schema.user.grant('private_user', 'read,write', 'space', 'space1')
box.schema.user.grant('private_user', 'read,write', 'space', 'space2')
box.schema.user.grant('private_user', 'create', 'universe')
box.schema.user.grant('private_user', 'read,write', 'space', '_func')
function read_and_modify(key)
    local space1 = box.space.space1
    local space2 = box.space.space2
    local fiber = require('fiber')
    local t = space1:get{key}
    if t ~= nil then
        space1:put{key, box.session.uid()}
        space2:put{key, fiber.time()}
    end
end
box.session.su('private_user')
box.schema.func.create('read_and_modify', {setuid = true})
box.session.su('admin')
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')
Whenever public_user calls the function, it is executed on behalf of its creator, private_user.
All object types and permissions
| Object type | Description |
| universe | A database (box.schema) that contains database objects, including spaces, indexes, users, roles, sequences, and functions. Granting privileges to universe gives a user access to any object in the database. |
| user | A user. |
| role | A role. |
| space | A space. |
| function | A function. |
| sequence | A sequence. |
| lua_eval | Executing arbitrary Lua code. |
| lua_call | Calling any global user-defined Lua function. |
| sql | Executing an arbitrary SQL expression. |
| Permission | Object type | Granted to roles | Description |
| read | All | Yes | Allows reading data of the specified object. For example, this permission can be used to allow a user to select data from the specified space. |
| write | All | Yes | Allows updating data of the specified object. For example, this permission can be used to allow a user to modify data in the specified space. |
| create | All | Yes | Allows creating objects of the specified type. For example, this permission can be used to allow a user to create new spaces. Note that this permission requires read and write access to certain system spaces. |
| alter | All | Yes | Allows altering objects of the specified type. Note that this permission requires read and write access to certain system spaces. |
| drop | All | Yes | Allows dropping objects of the specified type. Note that this permission requires read and write access to certain system spaces. |
| execute | role, universe, function, lua_eval, lua_call, sql | Yes | For role, allows using the specified role. For other object types, allows execution: calling a function, evaluating Lua code, or running SQL, depending on the object type. |
| session | universe | No | Allows a user to connect to an instance over IPROTO. |
| usage | universe | No | Allows a user to use their privileges on database objects (for example, read, write, and alter spaces). |
Object types and permissions
| Object type | Details |
universe |
read: Allows reading any object types, including all spaces or sequence objects.
write: Allows modifying any object types, including all spaces or sequence objects.
execute: Allows executing functions, Lua code, or SQL expressions, including IPROTO calls.
session: Allows a user to connect to an instance over IPROTO.
usage: Allows a user to use their privileges on database objects (for example, read, write, and alter spaces).
create: Allows creating users, roles, functions, spaces, and sequences.
This permission requires read and write access to certain system spaces.
drop: Allows deleting users, roles, functions, spaces, and sequences.
This permission requires read and write access to certain system spaces.
alter: Allows altering user settings or space objects.
|
user |
alter: Allows modifying a user description, for example, changing the password.
create: Allows creating new users.
This permission requires read and write access to the _user system space.
drop: Allows dropping users.
This permission requires read and write access to the _user system space.
|
role |
execute: Indicates that a role is assigned to the user or another role.
create: Allows creating new roles.
This permission requires read and write access to the _user system space.
drop: Allows dropping roles.
This permission requires read and write access to the _user system space.
|
space |
read: Allows selecting data from a space.
write: Allows modifying data in a space.
create: Allows creating new spaces.
This permission requires read and write access to the _space system space.
drop: Allows dropping spaces.
This permission requires read and write access to the _space system space.
alter: Allows modifying spaces.
This permission requires read and write access to the _space system space.
If a space is created by a user, they can read and write it without being granted explicit permission.
|
function |
execute: Allows calling a function.
create: Allows creating a function.
This permission requires read and write access to the _func system space.
If a function is created by a user, they can execute it without being granted explicit permission.
drop: Allows dropping a function.
This permission requires read and write access to the _func system space.
|
sequence |
read: Allows using sequences in space_obj:create_index().
write: Allows all operations for a sequence object.
seq_obj:drop() requires the write permission on the _priv system space.
create: Allows creating sequences.
This permission requires read and write access to the _sequence system space.
If a sequence is created by a user, they can read/write it without explicit permission.
drop: Allows dropping sequences.
This permission requires read and write access to the _sequence system space.
alter: Has no effect.
seq_obj:alter() and other methods require the write permission.
|
lua_eval |
execute: Allows executing arbitrary Lua code using the IPROTO_EVAL request.
|
lua_call |
execute: Allows executing any user-defined function using the IPROTO_CALL request.
This permission doesn’t allow a user to call built-in Lua functions (for example, loadstring() or box.session.su()) and functions defined in the _func system space.
|
sql |
execute: Allows executing arbitrary SQL expressions using the IPROTO_PREPARE and IPROTO_EXECUTE requests.
|
Replication administration
Monitoring a replica set
To learn what instances belong to the replica set and obtain statistics for all
these instances, execute a box.info.replication request.
The output below shows the replication status for a replica set containing one master and two replicas:
manual_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 21
    name: instance001
  2:
    id: 2
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 0
    upstream:
      status: follow
      idle: 0.052655000000414
      peer: replicator@127.0.0.1:3302
      lag: 0.00010204315185547
    name: instance002
    downstream:
      status: follow
      idle: 0.09503500000028
      vclock: {1: 21}
      lag: 0.00026917457580566
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.77522099999987
      peer: replicator@127.0.0.1:3303
      lag: 0.0001838207244873
    name: instance003
    downstream:
      status: follow
      idle: 0.33186100000012
      vclock: {1: 21}
      lag: 0
...
The following diagram illustrates the upstream and downstream connections when box.info.replication is executed on the master instance (instance001):
If box.info.replication is executed on instance002, the upstream and downstream connections look as follows:
This means that replica statistics are reported relative to the instance on which box.info.replication is executed.
The primary indicators of replication health are:
idle: the time (in seconds) since
the instance received the last event from a master.
If the master has no updates to send to the replicas, it sends heartbeat messages
every replication_timeout seconds. The master
is programmed to disconnect if it does not see acknowledgments of the heartbeat messages
within replication_timeout * 4 seconds.
Therefore, in a healthy replication setup, idle should never exceed
replication_timeout: if it does, either the replication is lagging
seriously behind, because the master is running ahead of the replica, or the
network link between the instances is down.
lag: the time difference between
the local time at the instance, recorded when the event was received, and the
local time at another master recorded when the event was written to the
write-ahead log on that master.
Since the lag calculation uses the operating system clocks from two different
machines, do not be surprised if it’s negative: a time drift may lead to the
remote master clock being consistently behind the local instance’s clock.
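The idle rule above can be turned into a quick check. The sketch below is a hypothetical helper (replication_problems is not a Tarantool function) that walks a table shaped like the box.info.replication output and flags upstreams whose idle exceeds replication_timeout. It also flags any upstream status other than follow, which is an assumption on our part: short-lived states may show up while a connection is being established. lag is deliberately not checked, since clock drift can make it misleading.

```lua
-- Hypothetical helper: returns a sorted list of human-readable problems
-- found in a box.info.replication-like table.
local function replication_problems(replication, replication_timeout)
    local problems = {}
    for id, peer in pairs(replication) do
        local up = peer.upstream
        -- The instance itself (and peers it does not replicate from) has no upstream.
        if up ~= nil then
            if up.status ~= 'follow' then
                problems[#problems + 1] = string.format(
                    'replica %s: upstream status is %s',
                    tostring(id), tostring(up.status))
            elseif up.idle ~= nil and up.idle > replication_timeout then
                problems[#problems + 1] = string.format(
                    'replica %s: upstream idle %.3f s exceeds replication_timeout %.3f s',
                    tostring(id), up.idle, replication_timeout)
            end
        end
    end
    table.sort(problems)  -- deterministic order for display
    return problems
end
```

In a Tarantool console, it could be called as replication_problems(box.info.replication, box.cfg.replication_timeout).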
Recovering from a degraded state
A “degraded state” is a situation in which the master becomes unavailable due to
a hardware or network failure, or due to a programming bug.

In a master-replica set with manual failover, if a master disappears, error messages appear on the
replicas stating that the connection is lost:
2023-12-04 13:19:04.724 [16755] main/110/applier/replicator@127.0.0.1:3301 I> can't read row
2023-12-04 13:19:04.724 [16755] main/110/applier/replicator@127.0.0.1:3301 coio.c:349 E> SocketError: unexpected EOF when reading from socket, called on fd 19, aka 127.0.0.1:55932, peer of 127.0.0.1:3301: Broken pipe
2023-12-04 13:19:04.724 [16755] main/110/applier/replicator@127.0.0.1:3301 I> will retry every 1.00 second
2023-12-04 13:19:04.724 [16755] relay/127.0.0.1:55940/101/main coio.c:349 E> SocketError: unexpected EOF when reading from socket, called on fd 23, aka 127.0.0.1:3302, peer of 127.0.0.1:55940: Broken pipe
2023-12-04 13:19:04.724 [16755] relay/127.0.0.1:55940/101/main I> exiting the relay loop
In a master-replica set with automated failover, the log also includes Raft messages showing the election of a new master:
2023-12-04 13:16:56.340 [16615] main/111/applier/replicator@127.0.0.1:3302 I> can't read row
2023-12-04 13:16:56.340 [16615] main/111/applier/replicator@127.0.0.1:3302 coio.c:349 E> SocketError: unexpected EOF when reading from socket, called on fd 24, aka 127.0.0.1:55687, peer of 127.0.0.1:3302: Broken pipe
2023-12-04 13:16:56.340 [16615] main/111/applier/replicator@127.0.0.1:3302 I> will retry every 1.00 second
2023-12-04 13:16:56.340 [16615] relay/127.0.0.1:55695/101/main coio.c:349 E> SocketError: unexpected EOF when reading from socket, called on fd 25, aka 127.0.0.1:3301, peer of 127.0.0.1:55695: Broken pipe
2023-12-04 13:16:56.340 [16615] relay/127.0.0.1:55695/101/main I> exiting the relay loop
2023-12-04 13:16:59.690 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: message {term: 3, vote: 2, state: candidate, vclock: {1: 9}} from 2
2023-12-04 13:16:59.690 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: received a newer term from 2
2023-12-04 13:16:59.690 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: bump term to 3, follow
2023-12-04 13:16:59.690 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: vote for 2, follow
2023-12-04 13:16:59.691 [16615] main/119/raft_worker I> RAFT: persisted state {term: 3}
2023-12-04 13:16:59.691 [16615] main/119/raft_worker I> RAFT: persisted state {term: 3, vote: 2}
2023-12-04 13:16:59.691 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: message {term: 3, vote: 2, leader: 2, state: leader} from 2
2023-12-04 13:16:59.691 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: vote request is skipped - this is a notification about a vote for a third node, not a request
2023-12-04 13:16:59.691 [16615] main/112/applier/replicator@127.0.0.1:3303 I> RAFT: leader is 2, follow
The master’s upstream status is reported as disconnected when executing box.info.replication on a replica:
auto_leader:instance001> box.info.replication
---
- 1:
    id: 1
    uuid: 4cfa6e3c-625e-b027-00a7-29b2f2182f23
    lsn: 32
    upstream:
      peer: replicator@127.0.0.1:3302
      lag: 0.00032305717468262
      status: disconnected
      idle: 48.352504000002
      message: 'connect, called on fd 20, aka 127.0.0.1:62575: Connection refused'
      system_message: Connection refused
    name: instance002
    downstream:
      status: stopped
      message: 'unexpected EOF when reading from socket, called on fd 32, aka 127.0.0.1:3301,
        peer of 127.0.0.1:62204: Broken pipe'
      system_message: Broken pipe
  2:
    id: 2
    uuid: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
    lsn: 1
    name: instance001
  3:
    id: 3
    uuid: 9a3a1b9b-8a18-baf6-00b3-a6e5e11fd8b6
    lsn: 0
    upstream:
      status: follow
      idle: 0.18620999999985
      peer: replicator@127.0.0.1:3303
      lag: 0.00012516975402832
    name: instance003
    downstream:
      status: follow
      idle: 0.19718099999955
      vclock: {2: 1, 1: 32}
      lag: 0.00051403045654297
...
To learn how to perform manual failover in a master-replica set, see the Performing manual failover section.
In a master-replica configuration with automated failover, a new master should be elected automatically.
Reseeding a replica
If any of a replica’s write-ahead log or snapshot files are corrupted or deleted, you can reseed the replica.
This procedure works only if the master’s write-ahead logs are present.
1. Stop the replica using the tt stop command.
2. Delete write-ahead logs and snapshots stored in the var/lib/<instance_name> directory.
Note
var/lib is the default directory used by tt to store write-ahead logs and snapshots.
Learn more from Configuration.
3. Start the replica using the tt start command.
The replica should catch up with the master by retrieving all the master’s tuples.
4. (Optional) If you’re reseeding a replica after a replication conflict, you also need to restart replication.
Resolving replication conflicts
Tarantool guarantees that every update is applied only once on every replica.
However, due to the asynchronous nature of replication, the order of updates is not guaranteed.
This topic describes how to solve problems in master-master replication.
Replacing the same primary key
Case 1: You have two instances of Tarantool and try to perform a
replace operation with the same primary key on both instances at the same time.
This causes a conflict over which tuple to save and which one to discard.
Tarantool trigger functions can help here by implementing
conflict resolution rules. For example, if each tuple carries a
timestamp, you can keep the tuple with the larger timestamp.
First, you need a before_replace() trigger on
the space which may have conflicts. In this trigger, you can compare the old and new
replica records and choose which one to use (or skip the update entirely,
or merge two records together).
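A before_replace function implementing the timestamp rule might look like the sketch below. The field layout is an assumption: field 2 is taken to hold the write timestamp, and field 3 (a hypothetical field) the ID of the instance that made the write. The tie-breaker matters: if, on equal timestamps, both masters kept the incoming tuple (or both kept their own), the replicas would diverge. Inserts and deletes are let through unchanged, in the same way as the case 3 trigger further down.

```lua
-- Hypothetical field layout (an assumption, not enforced by Tarantool).
local TIMESTAMP_FIELD = 2  -- write timestamp
local ORIGIN_FIELD = 3     -- ID of the instance that made the write

-- Returns true if tuple a should win over tuple b.
local function is_newer(a, b)
    if a[TIMESTAMP_FIELD] ~= b[TIMESTAMP_FIELD] then
        return a[TIMESTAMP_FIELD] > b[TIMESTAMP_FIELD]
    end
    -- Same timestamp: compare origins so every replica picks the same winner.
    return a[ORIGIN_FIELD] > b[ORIGIN_FIELD]
end

-- Candidate before_replace trigger: (old, new) -> tuple to keep.
local function keep_newer(old, new)
    if new == nil then
        return  -- delete: no return value, the operation proceeds
    end
    if old == nil then
        return new  -- insert without an old tuple: nothing to resolve
    end
    if is_newer(new, old) then
        return new
    end
    return old
end
```

It could be plugged in as my_trigger in the snippet below. Because it only indexes fields, it also works on plain Lua tables, which makes it easy to test outside Tarantool.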
Then you need to set the trigger at the right time, before the space starts
to receive any updates. The before_replace trigger is usually set
right when the space is created, so you need another trigger, on the system
space _space, to capture the moment your space is created
and set the before_replace trigger there. This can be an on_replace()
trigger.
The difference between before_replace and on_replace is that on_replace
is called after a row is inserted into the space, and before_replace
is called before that.
Setting the _space:on_replace() trigger also requires the right timing. The best
moment is right after _space is created, which is when
the box.ctl.on_schema_init() trigger fires.
You also need box.on_commit to get access to the space being
created. The resulting snippet is the following:
local my_space_name = 'my_space'
local my_trigger = function(old, new)
    -- your function resolving a conflict
end
box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)
Preventing duplicate insert
Case 2: In a replica set of two masters, both of them try to insert data by the same unique key:
This would cause an error saying Duplicate key exists in unique index
'primary' in space 'tester' and the replication would be stopped.
(This is the behavior when the
replication_skip_conflict
configuration parameter has its default recommended value, false.)
$ # error messages from master #1
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop
$ # error messages from master #2
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop
If we check replication statuses with box.info, we will see that replication
at master #1 is stopped (1.upstream.status = stopped). Additionally, no data
is replicated from that master (section 1.downstream is missing in the
report), because the downstream has encountered the same error:
To learn how to resolve a replication conflict by reseeding a replica, see Resolving replication conflicts.
Replication runs out of sync
In a master-master cluster of two instances, suppose we make the following
operation:
When this operation is applied on both instances in the replica set:
… we can have the following results, depending on the order of execution:
- each master’s row contains the UUID from master #1,
- each master’s row contains the UUID from master #2,
- master #1 has the UUID of master #2, and vice versa.
The cases described above are examples of
non-commutative operations, that is, operations whose result depends on the
execution order. In contrast, for commutative operations, the
execution order does not matter.
Consider for example the following command:
This operation is commutative: we get the same result no matter in which order
the update is applied on the other masters.
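The difference can be reproduced without Tarantool. In this plain-Lua model (not Tarantool API), a row is an ordinary table and each operation is a hypothetical function that mutates it. Applying the same two operations in both orders shows whether they commute: assigning a value is order-dependent, while adding to a counter is not.

```lua
-- Applies operations in the given order to a copy of the row.
local function apply_in_order(row, ops)
    local copy = {}
    for k, v in pairs(row) do copy[k] = v end
    for _, op in ipairs(ops) do op(copy) end
    return copy
end

-- Non-commutative: each master assigns its own (hypothetical) UUID to field 2.
local set_by_master1 = function(r) r[2] = 'uuid-1' end
local set_by_master2 = function(r) r[2] = 'uuid-2' end

-- Commutative: each master adds to a counter in field 3.
local add_by_master1 = function(r) r[3] = r[3] + 1 end
local add_by_master2 = function(r) r[3] = r[3] + 5 end

local row = {1, 'none', 0}

local x = apply_in_order(row, {set_by_master1, set_by_master2})
local y = apply_in_order(row, {set_by_master2, set_by_master1})
print(x[2], y[2])  -- uuid-2  uuid-1: the result depends on the order

local p = apply_in_order(row, {add_by_master1, add_by_master2})
local q = apply_in_order(row, {add_by_master2, add_by_master1})
print(p[3], q[3])  -- 6  6: the same result in either order
```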
The logic and the snippet that sets the trigger are the same here as in case 1,
but the trigger function differs.
Note that the trigger below assumes that each tuple has a timestamp in its second field.
local my_space_name = 'test'
local my_trigger = function(old, new, sp, op)
    -- op: 'INSERT', 'DELETE', 'UPDATE', or 'REPLACE'
    if new == nil then
        print("No new during "..op, old)
        return -- deletes are ok
    end
    if old == nil then
        print("Insert new, no old", new)
        return new -- insert without old value: ok
    end
    print(op.." duplicate", old, new)
    if op == 'INSERT' then
        if new[2] > old[2] then
            -- Creating a new tuple will change op to 'REPLACE'
            return box.tuple.new(new)
        end
        return old
    end
    if new[2] > old[2] then
        return new
    else
        return old
    end
end
box.ctl.on_schema_init(function()
    box.space._space:on_replace(function(old_space, new_space)
        if not old_space and new_space and new_space.name == my_space_name then
            box.on_commit(function()
                box.space[my_space_name]:before_replace(my_trigger)
            end)
        end
    end)
end)
Server introspection
Executing code on an instance
You can attach to an instance’s admin console and
execute some Lua code using tt:
$ # for local instances:
$ tt connect my_app
• Connecting to the instance...
• Connected to /var/run/tarantool/my_app.control
/var/run/tarantool/my_app.control> 1 + 1
---
- 2
...
/var/run/tarantool/my_app.control>
$ # for local and remote instances:
$ tt connect username:password@127.0.0.1:3306
You can also use tt to execute Lua code on an instance without
attaching to its admin console. For example:
$ # executing commands directly from the command line
$ <command> | tt connect my_app -f -
<...>
$ # - OR -
$ # executing commands from a script file
$ tt connect my_app -f script.lua
<...>
To check the instance status, run:
$ tt status my_app
$ # - OR -
$ systemctl status tarantool@my_app
To check the boot log, on systems with systemd, run:
$ journalctl -u tarantool@my_app -n 5
For more specific checks, use the reports provided by functions in the following submodules:
- Submodule box.cfg (check and specify all
configuration parameters for the Tarantool server)
- Submodule box.slab (monitor the total use
and fragmentation of memory allocated for storing data in Tarantool)
- Submodule box.info (introspect Tarantool
server variables, primarily those related to replication)
- Submodule box.stat (introspect Tarantool
request and network statistics)
Finally, there is the metrics
library, which enables collecting metrics (such as memory usage or the number
of requests) from Tarantool applications and exposing them via various
protocols, including Prometheus. Check Monitoring for more details.
Example
A very popular administrator request is
box.slab.info(),
which displays detailed memory usage statistics for a Tarantool instance.
Tarantool takes memory from the operating system,
for example when a user does many insertions.
You can see how much it has taken by saying (on Linux):
ps -eo args,%mem | grep "tarantool"
Tarantool almost never releases this memory, even if the user
deletes everything that was inserted, or reduces
fragmentation by calling the Lua garbage collector via the
collectgarbage function.
Ordinarily this does not affect performance.
But, to force Tarantool to release memory, you can
call box.snapshot(), stop the server instance,
and restart it.
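When reading box.slab.info() output with this in mind, it can help to compare how much of the memory quota has been claimed (quota_used against quota_size) with how much of the arena actually holds tuples and indexes (arena_used against arena_size). The hypothetical helper below computes both percentages from those byte-valued fields instead of parsing the *_ratio strings. A high quota usage combined with a much lower arena usage may indicate memory that was claimed earlier and is now held in free slabs.

```lua
-- Hypothetical helper: summarizes a table shaped like box.slab.info().
local function slab_summary(info)
    local function pct(used, total)
        if total == 0 then return 0 end  -- avoid division by zero
        return used / total * 100
    end
    return {
        quota_used_pct = pct(info.quota_used, info.quota_size),
        arena_used_pct = pct(info.arena_used, info.arena_size),
    }
end

-- Usage sketch in a Tarantool console:
-- local s = slab_summary(box.slab.info())
-- print(string.format('quota: %.1f%%, arena: %.1f%%', s.quota_used_pct, s.arena_used_pct))
```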
Inspecting binary traffic is a boring task. We offer a
Wireshark plugin to
simplify the analysis of Tarantool’s traffic.
To enable the plugin, follow the steps below.
Clone the tarantool-dissector repository:
git clone https://github.com/tarantool/tarantool-dissector.git
Copy or symlink the plugin files into the Wireshark plugin directory:
mkdir -p ~/.local/lib/wireshark/plugins
cd ~/.local/lib/wireshark/plugins
ln -s /path/to/tarantool-dissector/MessagePack.lua ./
ln -s /path/to/tarantool-dissector/tarantool.dissector.lua ./
(For the location of the plugin directory on macOS and Windows, please refer to
the Plugin folders
chapter in the Wireshark documentation.)
Run the Wireshark GUI and ensure that the plugins are loaded:
- Open Help > About Wireshark > Plugins.
- Find MessagePack.lua and tarantool.dissector.lua in the list.
Now you can inspect incoming and outgoing Tarantool packets with user-friendly
annotations.
Visit the project page for details:
https://github.com/tarantool/tarantool-dissector.
Profiling performance issues
Tarantool can at times work slower than usual. There can be multiple reasons,
such as disk issues, CPU-intensive Lua scripts, or misconfiguration.
Tarantool’s log may lack details in such cases, so the only indication that
something is wrong may be log entries like this: W> too long DELETE: 8.546 sec.
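When such warnings are scattered across a long log, a small script can summarize them. The sketch below assumes the line format shown above (too long <OPERATION>: <seconds> sec) and records the slowest duration per operation; lines in any other format are ignored. parse_too_long and slowest_by_op are hypothetical names.

```lua
-- Extracts the operation name and duration from a "too long" warning line.
local function parse_too_long(line)
    local op, secs = line:match('too long (%u+): ([%d%.]+) sec')
    if op == nil then
        return nil
    end
    return op, tonumber(secs)
end

-- Returns a table mapping each operation to its slowest recorded duration.
local function slowest_by_op(lines)
    local worst = {}
    for _, line in ipairs(lines) do
        local op, secs = parse_too_long(line)
        if op ~= nil and (worst[op] == nil or secs > worst[op]) then
            worst[op] = secs
        end
    end
    return worst
end

-- Usage sketch with a hypothetical log file name:
-- local lines = {}
-- for l in io.lines('instance001.log') do lines[#lines + 1] = l end
-- print(slowest_by_op(lines).DELETE)
```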
Here are tools and techniques that can help you collect Tarantool’s performance
profile, which is helpful in troubleshooting slowdowns.
Note
All of these tools except fiber.info() are intended for
generic GNU/Linux distributions, not for FreeBSD or macOS.
The simplest profiling method is to take advantage of Tarantool’s built-in
functionality. fiber.info() returns information about all
running fibers with their corresponding C stack traces. You can use this data
to see how many fibers are running and which C functions are executed more often
than others.
First, enter your instance’s interactive administrator console:
Once there, load the fiber module:
After that you can get the required information with fiber.info().
At this point, your console output should look something like this:
We highly recommend assigning meaningful names to fibers you create so that you
can find them in the fiber.info() list. In the example below, we create a
fiber named myworker:
You can kill any fiber with fiber.kill(fid):
To get a table of all alive fibers and their CPU consumption, use fiber.top() (enable it first with fiber.top_enable()).
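Named fibers are also easy to find programmatically. The hypothetical helper below takes the table returned by fiber.info(), which is keyed by fiber ID, and returns the sorted IDs of all fibers with a given name, for example to pass them to fiber.kill().

```lua
-- Hypothetical helper: returns the IDs of all fibers named `name`
-- in a fiber.info()-shaped table (keys are fiber IDs).
local function find_fibers(info, name)
    local ids = {}
    for fid, f in pairs(info) do
        if f.name == name then
            ids[#ids + 1] = fid
        end
    end
    table.sort(ids)
    return ids
end

-- Usage sketch in the admin console:
-- local fiber = require('fiber')
-- for _, fid in ipairs(find_fibers(fiber.info(), 'myworker')) do
--     fiber.kill(fid)
-- end
```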
If you want to dynamically obtain information with fiber.info(), the shell
script below may come in handy. It connects to a Tarantool instance specified by
NAME every 0.5 seconds, grabs the fiber.info() output and writes it to
the fiber-info.txt file:
$ rm -f fiber-info.txt
$ watch -n 0.5 "echo 'require(\"fiber\").info()' | tt connect NAME -f - | tee -a fiber-info.txt"
If you can’t understand which fiber causes performance issues, collect the
metrics of the fiber.info() output for 10-15 seconds using the script above
and contact the Tarantool team at support@tarantool.org.
pstack <pid>
To use this tool, first install it with a package manager that comes with your
Linux distribution. This command prints an execution stack trace of a running
process specified by the PID. You might want to run this command several times
in a row to pinpoint the bottleneck that causes the slowdown.
Once installed, say:
$ pstack $(pidof tarantool INSTANCENAME.lua)
Next, say:
$ echo $(pidof tarantool INSTANCENAME.lua)
to show the PID of the Tarantool instance that runs the INSTANCENAME.lua file.
You should get similar output:
Thread 19 (Thread 0x7f09d1bff700 (LWP 24173)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
Thread 18 (Thread 0x7f09d13fe700 (LWP 24174)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
<...>
Thread 2 (Thread 0x7f09c8bfe700 (LWP 24191)):
#0 0x00007f0a1ad5e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000045d901 in wal_writer_pop(wal_writer*) ()
#2 0x000000000045db01 in wal_writer_f(__va_list_tag*) ()
#3 0x0000000000429abc in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) ()
#4 0x00000000004b52a0 in fiber_loop ()
#5 0x00000000006099cf in coro_init ()
Thread 1 (Thread 0x7f0a1c47fd80 (LWP 24172)):
#0 0x00007f0a1a0512c3 in epoll_wait () from /lib64/libc.so.6
#1 0x00000000006051c8 in epoll_poll ()
#2 0x0000000000607533 in ev_run ()
#3 0x0000000000428e13 in main ()
gdb -ex "bt" -p <pid>
As with pstack, the GNU debugger (also known as gdb) needs to be installed
before you can start using it. Your Linux package manager can help you with that.
Once the debugger is installed, say:
$ gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof tarantool INSTANCENAME.lua)
Next, say:
$ echo $(pidof tarantool INSTANCENAME.lua)
to show the PID of the Tarantool instance that runs the INSTANCENAME.lua file.
After using the debugger, your console output should look like this:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[CUT]
Thread 1 (Thread 0x7f72289ba940 (LWP 20535)):
#0 _int_malloc (av=av@entry=0x7f7226e0eb20 <main_arena>, bytes=bytes@entry=504) at malloc.c:3697
#1 0x00007f7226acf21a in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3234
#2 0x00000000004631f8 in vy_merge_iterator_reserve (capacity=3, itr=0x7f72264af9e0) at /usr/src/tarantool/src/box/vinyl.c:7629
#3 vy_merge_iterator_add (itr=itr@entry=0x7f72264af9e0, is_mutable=is_mutable@entry=true, belong_range=belong_range@entry=false) at /usr/src/tarantool/src/box/vinyl.c:7660
#4 0x00000000004703df in vy_read_iterator_add_mem (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8387
#5 vy_read_iterator_use_range (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8453
#6 0x000000000047657d in vy_read_iterator_start (itr=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:8501
#7 0x00000000004766b5 in vy_read_iterator_next (itr=itr@entry=0x7f72264af990, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:8592
#8 0x000000000047689d in vy_index_get (tx=tx@entry=0x7f7226468158, index=index@entry=0x2563860, key=<optimized out>, part_count=<optimized out>, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:5705
#9 0x0000000000477601 in vy_replace_impl (request=<optimized out>, request=<optimized out>, stmt=0x7f72265a7150, space=0x2567ea0, tx=0x7f7226468158) at /usr/src/tarantool/src/box/vinyl.c:5920
#10 vy_replace (tx=0x7f7226468158, stmt=stmt@entry=0x7f72265a7150, space=0x2567ea0, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:6608
#11 0x00000000004615a9 in VinylSpace::executeReplace (this=<optimized out>, txn=<optimized out>, space=<optimized out>, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl_space.cc:108
#12 0x00000000004bd723 in process_rw (request=request@entry=0x7f72265a70f8, space=space@entry=0x2567ea0, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:182
#13 0x00000000004bed48 in box_process1 (request=0x7f72265a70f8, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:700
#14 0x00000000004bf389 in box_replace (space_id=space_id@entry=513, tuple=<optimized out>, tuple_end=<optimized out>, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:754
#15 0x00000000004d72f8 in lbox_replace (L=0x413c5780) at /usr/src/tarantool/src/box/lua/index.c:72
#16 0x000000000050f317 in lj_BC_FUNCC ()
#17 0x00000000004d37c7 in execute_lua_call (L=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:282
#18 0x000000000050f317 in lj_BC_FUNCC ()
#19 0x0000000000529c7b in lua_cpcall ()
#20 0x00000000004f6aa3 in luaT_cpcall (L=L@entry=0x413c5780, func=func@entry=0x4d36d0 <execute_lua_call>, ud=ud@entry=0x7f72264afde0) at /usr/src/tarantool/src/lua/utils.c:962
#21 0x00000000004d3fe7 in box_process_lua (handler=0x4d36d0 <execute_lua_call>, out=out@entry=0x7f7213020600, request=request@entry=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:382
#22 box_lua_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/lua/call.c:405
#23 0x00000000004c0f27 in box_process_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/box.cc:1074
#24 0x000000000041326c in tx_process_misc (m=0x7f7213040170) at /usr/src/tarantool/src/box/iproto.cc:942
#25 0x0000000000504554 in cmsg_deliver (msg=0x7f7213040170) at /usr/src/tarantool/src/cbus.c:302
#26 0x0000000000504c2e in fiber_pool_f (ap=<error reading variable: value has been optimized out>) at /usr/src/tarantool/src/fiber_pool.c:64
#27 0x000000000041122c in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=<optimized out>, ap=<optimized out>) at /usr/src/tarantool/src/fiber.h:645
#28 0x00000000005011a0 in fiber_loop (data=<optimized out>) at /usr/src/tarantool/src/fiber.c:641
#29 0x0000000000688fbf in coro_init () at /usr/src/tarantool/third_party/coro/coro.c:110
Run the debugger in a loop a few times to collect enough samples to draw
conclusions about why Tarantool is performing suboptimally.
Use the following script:
$ rm -f stack-trace.txt
$ watch -n 0.5 "gdb -ex 'set pagination 0' -ex 'thread apply all bt' --batch -p $(pidof tarantool INSTANCENAME.lua) | tee -a stack-trace.txt"
Structurally and functionally, this script is very similar to the one used with
fiber.info() above.
If you have any difficulties troubleshooting, let the script run for 10-15 seconds
and then send the resulting stack-trace.txt file to the Tarantool team at
support@tarantool.org.
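Reading stack-trace.txt by eye is tedious. The sketch below is a minimal illustration, not a Tarantool tool: two hypothetical plain-Lua helpers, count_frames and top_frames, count how often each function name appears in the collected gdb frames and list the most frequent ones. Functions that show up in many samples are likely where the process spends its time. The frame patterns are assumptions based on the backtrace format shown above.

```lua
-- Count how often each function name appears in gdb backtrace text,
-- such as the contents of stack-trace.txt.
-- Assumed frame formats: "#12 0x... in name (...)" and, for inlined
-- frames without an address, "#10 name (...)".
-- Frames whose name is unknown ("??") are not counted.
local function count_frames(text)
    local counts = {}
    for line in text:gmatch("[^\n]+") do
        local name = line:match("^#%d+%s+0x%x+%s+in%s+([%w_:~]+)")
            or line:match("^#%d+%s+([%a_][%w_:~]*)%s")
        if name then
            counts[name] = (counts[name] or 0) + 1
        end
    end
    return counts
end

-- Return up to n {name = ..., count = ...} entries, most frequent first.
local function top_frames(counts, n)
    local list = {}
    for name, count in pairs(counts) do
        list[#list + 1] = {name = name, count = count}
    end
    table.sort(list, function(a, b)
        if a.count ~= b.count then
            return a.count > b.count
        end
        return a.name < b.name
    end)
    local result = {}
    for i = 1, math.min(n, #list) do
        result[i] = list[i]
    end
    return result
end
```

For example, read the file with local f = assert(io.open('stack-trace.txt')), then print the twenty most frequent frames with for _, e in ipairs(top_frames(count_frames(f:read('*a')), 20)) do print(e.count, e.name) end.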
Warning
Use the poor man’s profilers with caution: each time they attach to a running
process, they stop its execution for about a second, which may leave
a serious footprint in high-load services.
perf, a tool for performance monitoring and analysis, is installed separately via
your package manager. Try running the perf command in the terminal and
follow the prompts to install the necessary package(s).
Note
By default, some perf commands are restricted to root, so, to be on
the safe side, either run all commands as root or prepend them with
sudo.
To start gathering performance statistics, say:
$ perf record -g -p $(pidof tarantool INSTANCENAME.lua)
This command saves the gathered data to a file named perf.data inside the
current working directory. To stop this process (usually, after 10-15 seconds),
press ctrl+C. In your console, you’ll see:
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.225 MB perf.data (1573 samples) ]
Now run the following command:
$ perf report -n -g --stdio | tee perf-report.txt
It formats the statistical data in the perf.data file into a performance
report and writes it to the perf-report.txt file.
The resulting output should look similar to this:
# Samples: 14K of event 'cycles'
# Event count (approx.): 9927346847
#
# Children      Self       Samples  Command    Shared Object       Symbol
# ........  ........  ............  .........  ..................  .......................................
#
    35.50%     0.55%            79  tarantool  tarantool           [.] lj_gc_step
            |
             --34.95%--lj_gc_step
                       |
                       |--29.26%--gc_onestep
                       |          |
                       |          |--13.85%--gc_sweep
                       |          |          |
                       |          |          |--5.59%--lj_alloc_free
                       |          |          |
                       |          |          |--1.33%--lj_tab_free
                       |          |          |         |
                       |          |          |          --1.01%--lj_alloc_free
                       |          |          |
                       |          |           --1.17%--lj_cdata_free
                       |          |
                       |          |--5.41%--gc_finalize
                       |          |          |
                       |          |          |--1.06%--lj_obj_equal
                       |          |          |
                       |          |           --0.95%--lj_tab_set
                       |          |
                       |          |--4.97%--rehashtab
                       |          |          |
                       |          |           --3.65%--lj_tab_resize
                       |          |                    |
                       |          |                    |--0.74%--lj_tab_set
                       |          |                    |
                       |          |                     --0.72%--lj_tab_newkey
                       |          |
                       |          |--0.91%--propagatemark
                       |          |
                       |           --0.67%--lj_cdata_free
                       |
                        --5.43%--propagatemark
                                 |
                                  --0.73%--gc_mark
Unlike the poor man’s profilers, gperftools and perf have low overhead
(almost negligible as compared with pstack and gdb): they don’t result
in long delays when attaching to a process and therefore can be used without
serious consequences.
The jit.p profiler comes with the Tarantool application server. To load it,
say require('jit.p') or require('jit.profile').
There are many options for sampling and display; they are described in
the LuaJIT Profiler documentation, available in the doc/ext_profiler.html file
on the 2.1 branch of the LuaJIT git repository.
Example
Make a function that calls a function named f1 that
does 500,000 inserts and deletes in a Tarantool space.
Start the profiler, execute the function, stop the
profiler, and show what the profiler sampled.
-- Start from a clean space; skip the drop if the space does not exist yet.
if box.space.t ~= nil then
    box.space.t:drop()
end
box.schema.space.create('t')
box.space.t:create_index('i')
function f1()
    for i = 1, 500000 do
        box.space.t:insert{i}
        box.space.t:delete{i}
    end
    return 1
end
function f3() f1() end
jit_p = require("jit.profile")
sampletable = {}
-- Sample in function-level ("f") mode and count samples per stack dump.
jit_p.start("f", function(thread, samples, vmstate)
    local dump = jit_p.dumpstack(thread, "f", 1)
    sampletable[dump] = (sampletable[dump] or 0) + samples
end)
f3()
jit_p.stop()
for d, v in pairs(sampletable) do print(v, d) end
Typically the result will show that the sampling happened
within f1() many times, but also within internal Tarantool
functions, whose names may change with each new version.
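The final loop of the example prints samples in arbitrary pairs() order. A small plain-Lua sketch (summarize_samples and print_summary are hypothetical helpers, not part of jit.profile) can order the collected sampletable by sample count and show each stack's share of the total, which makes the hot spots easier to spot:

```lua
-- Turn a {stack_dump = sample_count} table into a list sorted by count
-- (most frequent first), adding each entry's percentage of all samples.
local function summarize_samples(sampletable)
    local rows, total = {}, 0
    for stack, count in pairs(sampletable) do
        rows[#rows + 1] = {stack = stack, count = count}
        total = total + count
    end
    table.sort(rows, function(a, b)
        if a.count ~= b.count then
            return a.count > b.count
        end
        return a.stack < b.stack
    end)
    for _, row in ipairs(rows) do
        row.percent = total > 0 and row.count * 100 / total or 0
    end
    return rows, total
end

-- Print one line per stack: percentage, sample count, stack dump.
local function print_summary(sampletable)
    local rows = summarize_samples(sampletable)
    for _, row in ipairs(rows) do
        print(string.format("%6.2f%% %8s  %s",
            row.percent, tostring(row.count), row.stack))
    end
end
```

After jit_p.stop(), calling print_summary(sampletable) replaces the unsorted loop.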
Daemon supervision
Tarantool processes these signals during the event loop in the transaction
processor thread:
| Signal | Effect |
| SIGHUP | May cause log file rotation. See the example in the reference on Tarantool logging parameters. |
| SIGUSR1 | May cause a database checkpoint. See Function box.snapshot. |
| SIGTERM | May cause graceful shutdown (information will be saved first). |
| SIGINT (also known as keyboard interrupt) | May cause graceful shutdown. |
| SIGKILL | Causes an immediate shutdown. |
Other signals will result in behavior defined by the operating system. Signals
other than SIGKILL may be ignored, especially if Tarantool is executing a
long-running procedure which prevents return to the event loop in the
transaction processor thread.
Automatic instance restart
On systemd-enabled platforms, systemd automatically restarts all
Tarantool instances in case of failure. To demonstrate it, let’s try to destroy
an instance:
$ systemctl status tarantool@my_app|grep PID
Main PID: 5885 (tarantool)
$ tt connect my_app
• Connecting to the instance...
• Connected to /var/run/tarantool/my_app.control
/var/run/tarantool/my_app.control> os.exit(-1)
⨯ Connection was closed. Probably instance process isn't running anymore
Now let’s make sure that systemd has restarted the instance:
$ systemctl status tarantool@my_app|grep PID
Main PID: 5914 (tarantool)
Additionally, you can find information about the instance restart in the systemd journal:
$ journalctl -u tarantool@my_app -n 8
Tarantool makes a core dump if it receives any of the following signals: SIGSEGV,
SIGFPE, SIGABRT or SIGQUIT. This is automatic if Tarantool crashes.
On systemd-enabled platforms, coredumpctl automatically saves core dumps
and stack traces in case of a crash. Here is a general guide to
enabling core dumps on a Unix system:
- Ensure session limits are configured to enable core dumps, i.e. say
ulimit -c unlimited. Check “man 5 core” for other reasons why a core
dump may not be produced.
- Set a directory for writing core dumps to, and make sure that the directory
is writable. On Linux, the directory path is set in a kernel parameter
configurable via
/proc/sys/kernel/core_pattern.
- Make sure that core dumps include stack trace information. If you use a
binary Tarantool distribution, this is automatic. If you build Tarantool
from source, you will not get detailed information if you pass
-DCMAKE_BUILD_TYPE=Release to CMake.
To simulate a crash, you can execute an illegal command against a Tarantool
instance:
$ # !!! please never do this on a production system !!!
$ tt connect my_app
• Connecting to the instance...
• Connected to /var/run/tarantool/my_app.control
/var/run/tarantool/my_app.control> require('ffi').cast('char *', 0)[0] = 48
⨯ Connection was closed. Probably instance process isn't running anymore
Alternatively, if you know the process ID of the instance (here we refer to it
as $PID), you can get a core dump of the instance by running the gdb debugger
(gdb’s generate-core-file writes a core file and leaves the process running):
$ gdb -batch -ex "generate-core-file" -p $PID
or manually sending a SIGABRT signal:
Note
To find out the process id of the instance ($PID), you can:
- look it up in the instance’s box.info.pid,
- find it with
ps -A | grep tarantool, or
- say
systemctl status tarantool@my_app|grep PID.
On a systemd-enabled system, to see the latest crashes of the Tarantool
daemon, say:
$ coredumpctl list /usr/bin/tarantool
MTIME PID UID GID SIG PRESENT EXE
Sat 2016-01-23 15:21:24 MSK 20681 1000 1000 6 /usr/bin/tarantool
Sat 2016-01-23 15:51:56 MSK 21035 995 992 6 /usr/bin/tarantool
To save a core dump into a file, say:
$ coredumpctl -o filename.core info <pid>
Since Tarantool stores tuples in memory, core files may be large.
For investigation, you normally don’t need the whole file, but only a
“stack trace” or “backtrace”.
To save a stack trace into a file, say:
$ gdb -se "tarantool" -ex "bt full" -ex "thread apply all bt" --batch -c core> /tmp/tarantool_trace.txt
where:
- “tarantool” is the path to the Tarantool executable,
- “core” is the path to the core file, and
- “/tmp/tarantool_trace.txt” is a sample path to a file for saving the stack trace.
To see the stack trace and other useful information in console, say:
$ coredumpctl info 21035
PID: 21035 (tarantool)
UID: 995 (tarantool)
GID: 992 (tarantool)
Signal: 6 (ABRT)
Timestamp: Sat 2016-01-23 15:51:42 MSK (4h 36min ago)
Command Line: tarantool my_app.lua <running>
Executable: /usr/bin/tarantool
Control Group: /system.slice/system-tarantool.slice/tarantool@my_app.service
Unit: tarantool@my_app.service
Slice: system-tarantool.slice
Boot ID: 7c686e2ef4dc4e3ea59122757e3067e2
Machine ID: a4a878729c654c7093dc6693f6a8e5ee
Hostname: localhost.localdomain
Message: Process 21035 (tarantool) of user 995 dumped core.
Stack trace of thread 21035:
#0 0x00007f84993aa618 raise (libc.so.6)
#1 0x00007f84993ac21a abort (libc.so.6)
#2 0x0000560d0a9e9233 _ZL12sig_fatal_cbi (tarantool)
#3 0x00007f849a211220 __restore_rt (libpthread.so.0)
#4 0x0000560d0aaa5d9d lj_cconv_ct_ct (tarantool)
#5 0x0000560d0aaa687f lj_cconv_ct_tv (tarantool)
#6 0x0000560d0aaabe33 lj_cf_ffi_meta___newindex (tarantool)
#7 0x0000560d0aaae2f7 lj_BC_FUNCC (tarantool)
#8 0x0000560d0aa9aabd lua_pcall (tarantool)
#9 0x0000560d0aa71400 lbox_call (tarantool)
#10 0x0000560d0aa6ce36 lua_fiber_run_f (tarantool)
#11 0x0000560d0a9e8d0c _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_ (tarantool)
#12 0x0000560d0aa7b255 fiber_loop (tarantool)
#13 0x0000560d0ab38ed1 coro_init (tarantool)
...
To start gdb debugger on the core dump, say:
It is highly recommended to install the tarantool-debuginfo package to improve
the gdb experience, for example:
$ dnf debuginfo-install tarantool
gdb also provides information about the debuginfo packages you need to
install:
$ gdb -p <pid>
...
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.22.90-26.fc24.x86_64 krb5-libs-1.14-12.fc24.x86_64
libgcc-5.3.1-3.fc24.x86_64 libgomp-5.3.1-3.fc24.x86_64
libselinux-2.4-6.fc24.x86_64 libstdc++-5.3.1-3.fc24.x86_64
libyaml-0.1.6-7.fc23.x86_64 ncurses-libs-6.0-1.20150810.fc24.x86_64
openssl-libs-1.0.2e-3.fc24.x86_64
Symbolic names are present in stack traces even if you don’t have
tarantool-debuginfo package installed.
Disaster recovery
The minimal fault-tolerant Tarantool configuration would be a replica set
that includes a master and a replica, or two masters.
The basic recommendation is to configure all Tarantool instances in a replica set to create snapshot files on a regular basis.
Here are action plans for typical crash scenarios.
Configuration: master-replica.
Problem: Some transactions are missing on a replica after the master has crashed.
Actions:
You lose a few transactions in the master
write-ahead log file, which may not have been
transferred to the replica before the crash. If you were able to salvage the master
.xlog file, you may be able to recover these.
Find out the instance UUID from the crashed master xlog:
$ head -5 var/lib/instance001/*.xlog | grep Instance
Instance: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660
On the new master, use the UUID to find the position:
Play the records from the crashed .xlog to the new master, starting from the
new master position:
$ tt play 127.0.0.1:3302 var/lib/instance001/00000000000000000000.xlog \
--from 1000 \
--replica 1 \
--username admin --password secret
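One way to look up the position on the new master is sketched below. It assumes that the _cluster system space stores {id, uuid, ...} tuples and that box.info.vclock is keyed by instance ID; position_for_uuid is a hypothetical helper, not a Tarantool function. It returns the crashed master's instance ID and the last LSN of that instance that the new master has already applied.

```lua
-- Find the instance ID registered for `uuid` and the last LSN of that
-- instance applied on this node (its component of the vclock).
-- cluster_tuples: rows of the _cluster system space, {id, uuid, ...}.
-- vclock: a table keyed by instance ID, like box.info.vclock.
local function position_for_uuid(cluster_tuples, vclock, uuid)
    for _, t in ipairs(cluster_tuples) do
        if t[2] == uuid then
            return t[1], vclock[t[1]] or 0
        end
    end
    return nil, "no instance with UUID " .. uuid
end
```

On the new master, position_for_uuid(box.space._cluster:select{}, box.info.vclock, '9bb111c2-3ff5-36a7-00f4-2b9a573ea660') would give the values to base --replica and --from on; records with a higher LSN from that instance are the ones the new master is missing. Check the tt play reference for whether --from is inclusive.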
Configuration: master-master.
Problem: one master has crashed.
Actions:
- Let the load be handled by another master alone.
- Remove the crashed master from the replica set.
- Set up a replacement for the crashed master on a spare host.
Learn more from Adding and removing instances.
Master-replica/master-master: data loss
Configuration: master-replica or master-master.
Problem: Data was deleted at one master and this data loss was propagated to the other node (master or replica).
Actions:
Put all nodes in read-only mode.
Depending on the replication.failover mode, this can be done as follows:
manual: change a replica set leader to null.
election: set replication.election_mode to voter or off at the replica set level.
off: set database.mode to ro.
Reload configurations on all instances using the reload() function provided by the config module.
Turn off deletion of expired checkpoints with box.backup.start().
This prevents the Tarantool garbage collector from removing files
made with older checkpoints until box.backup.stop() is called.
Get the latest valid .snap file and
use tt cat command to calculate at which LSN the data loss occurred.
Start a new instance and use tt play command to
play to it the contents of .snap and .xlog files up to the calculated LSN.
Bootstrap a new replica from the recovered master.
Note
The steps above are applicable only to data in the memtx storage engine.
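The LSN calculation step is described above in terms of tt cat. As an alternative sketch, assuming the loss was caused by DELETE requests on a known space and that WAL records are shaped like the xlog module's output (a HEADER with lsn and type, a BODY with space_id), a hypothetical helper can return the LSN of the earliest such delete. Replaying up to the LSN just before it would restore the state preceding the loss; you still need to confirm that this delete is actually the erroneous one, and LSNs are only comparable within one instance's WAL.

```lua
-- Return the lowest LSN of a DELETE on space_id in a list of WAL records
-- from a single instance, or nil if there is none. Records are assumed
-- to look like {HEADER = {lsn = ..., type = 'DELETE'}, BODY = {space_id = ...}}.
local function first_delete_lsn(records, space_id)
    local first
    for _, rec in ipairs(records) do
        local h, b = rec.HEADER or {}, rec.BODY or {}
        if h.type == 'DELETE' and b.space_id == space_id and h.lsn ~= nil then
            -- Take the minimum, so the records need not be sorted.
            if first == nil or h.lsn < first then
                first = h.lsn
            end
        end
    end
    return first
end
```

The records could be collected from a file with the xlog module, for example local records = {} for _, rec in require('xlog').pairs(path) do records[#records + 1] = rec end, where path is a hypothetical .xlog file name.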
Backups
Tarantool has an append-only storage architecture: it appends data to files but
it never overwrites earlier data. The
Tarantool garbage collector
removes old files after a
checkpoint. You can prevent or delay the garbage collector’s action
by configuring the
checkpoint daemon. Backups can be taken at
any time, with minimal overhead on database performance.
Two functions are helpful for backups in certain situations:
- box.backup.start() informs
the server that activities related to the removal of outdated backups must
be suspended and returns a table with the names of snapshot and vinyl files
that should be copied.
- box.backup.stop() later informs
the server that normal operations may resume.
This is a special case when there are only in-memory tables.
The last snapshot file is a backup of the entire
database; and the WAL files
that are made after the last snapshot are incremental backups. Therefore taking
a backup is a matter of copying the snapshot and WAL files.
- Use
tar to make a (possibly compressed) copy of the latest .snap and .xlog
files in the snapshot.dir and
wal.dir directories.
- If there is a security policy, encrypt the .tar file.
- Copy the .tar file to a safe place.
Later, restoring the database is a matter of taking the .tar file and putting
its contents back in the snapshot.dir and wal.dir directories.
Vinyl stores its files in vinyl_dir, and creates a
folder for each database space. Dump and compaction processes are append-only and
create new files. The Tarantool garbage collector may remove old files after each
checkpoint.
To take a mixed backup:
- Issue box.backup.start() on the
administrative console. This will return a list of
files to back up and suspend garbage collection for them till the next
box.backup.stop().
- Copy the files from the list to a safe location. This will include memtx
snapshot files, vinyl run and index files, at a state consistent with the
last checkpoint.
- Issue box.backup.stop() so the garbage
collector can continue as usual.
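The start-copy-stop sequence above can be sketched as follows. This is a minimal illustration, not a Tarantool utility: take_backup is a hypothetical helper that receives the backup API and a per-file copy function as parameters, so on a real instance you would pass box.backup and a function of your own. The pcall makes sure box.backup.stop() is called even if a copy fails, so garbage collection is never left suspended.

```lua
-- Sketch of a mixed backup: ask the server which files to copy, copy them,
-- and always resume garbage collection afterwards.
-- backup: an object with start() and stop(), such as box.backup.
-- copy_file: function(path) -> true or nil, err; copies one file to the
--   backup location (hypothetical, supplied by the caller).
local function take_backup(backup, copy_file)
    local files = backup.start()
    local ok, err = pcall(function()
        for _, path in ipairs(files) do
            assert(copy_file(path))
        end
    end)
    backup.stop()  -- runs whether or not copying succeeded
    if not ok then
        error(err, 0)
    end
    return files
end
```

On an instance, copy_file could wrap fio.copyfile from Tarantool's fio module. Keep the original directory structure in the destination: vinyl files in different space directories can have the same file names.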
Continuous remote backup (memtx)
The replication feature is useful for backup as
well as for load balancing.
Therefore taking a backup is a matter of ensuring that any given replica is
up to date, and doing a cold backup on it. Since all the other replicas continue
to operate, this is not a cold backup from the end user’s point of view. This
could be done on a regular basis, with a cron job or with a Tarantool fiber.
Continuous backup (memtx)
The logged changes done since the last cold backup must be secured, while the
system is running.
For this purpose, you need a file copy utility that will do the copying
remotely and continuously, copying only the parts of a write-ahead log file
that are changing.
One such utility is rsync.
Alternatively, you need an ordinary file copy utility, but there should be
frequent production of new snapshot files or new WAL files as changes occur,
so that only the new files need to be copied.
Upgrades
Important
This section contains instructions for upgrading Tarantool clusters to versions up to 2.11.x.
This section describes the general upgrade process for Tarantool. There are two
main upgrade scenarios for different use cases: upgrading a standalone instance
and upgrading a replication cluster.
You can also downgrade to an earlier version using a similar procedure.
For information about backwards compatibility,
see the compatibility guarantees description.
Upgrading from or to certain versions can involve specific steps or slightly differ
from the general upgrade procedure. Such version-specific cases are described on
the dedicated pages inside this section.
This section includes the following topics:
Standalone instance upgrade
This page describes the process of upgrading a standalone Tarantool instance in production.
Note that this always implies downtime because the application needs to be
stopped and restarted on the target version.
To upgrade without downtime, you need multiple Tarantool servers running in a
replication cluster. Find detailed instructions in Replication cluster upgrade.
Checking your application
Before upgrading, make sure your application is compatible with the target
Tarantool version:
- Set up a development environment with the target Tarantool version installed.
See the installation instructions at the Tarantool download page
and in the tt install reference.
- Deploy the application in this environment and check how it works. In case of
any issues, adjust the application code to ensure compatibility with the target version.
When your application is ready to run on the target Tarantool version, you can
start upgrading the production environment.
Upgrading a standalone instance
Stop the Tarantool instance.
Make a copy of all data and the package from which the current (old)
version was installed. You may need them for rollback purposes. Find the
backup instructions in the appropriate hot backup procedure in
Backups.
Install the target Tarantool version on the host. You can do this
using a package manager or the tt utility.
See the installation instructions at the Tarantool download page
and in the tt install reference.
To check that the target Tarantool version is installed, run tarantool -v.
Start your application on the target version.
Run box.schema.upgrade().
This will update the Tarantool system spaces to match the currently installed version of Tarantool.
The rollback procedure for a standalone instance is almost the same as the upgrade.
The only difference is in the last step: you should call box.schema.downgrade()
to return the schema to the original version.
Replication cluster upgrade
Below are the general instructions for upgrading a Tarantool cluster with replication.
Upgrading from some versions can involve certain specifics. To find out if it is your case, check the version-specific topics of the Upgrades
section.
A replication cluster can be upgraded without downtime due to its redundancy.
When you disconnect a single instance for an upgrade, there is always another
instance that takes over its functionality: being a master storage for the same
data buckets or working as a router. This way, you can upgrade all the instances one by one.
The high-level steps of cluster upgrade are the following:
- Ensure the application compatibility with
the target Tarantool version.
- Check the cluster health.
- Install the target Tarantool version on the
cluster nodes.
- Upgrade router nodes one by one.
- Upgrade storage replica sets one by one.
Important
The only way to upgrade Tarantool from version 1.6, 1.7, or 1.9 to 2.x without downtime is
to take an intermediate step by upgrading to 1.10 and then to 2.x.
Before upgrading Tarantool from 1.6 to 2.x, please read about the associated
caveats.
Note
Some upgrade steps are moved to the separate section Procedures and checks
to avoid overloading the general instruction with details. Typically, these are
checks you should repeat during the upgrade to ensure it goes well.
If you experience issues during upgrade, you can roll back to the original version.
The rollback instructions are provided in the Rollback
section.
Checking your application
Before upgrading, make sure your application is compatible with the target
Tarantool version:
- Set up a development environment with the target Tarantool version installed.
See the installation instructions at the Tarantool download page
and in the tt install reference.
- Deploy the application in this environment and check how it works. In case of
any issues, adjust the application code to ensure compatibility with the target version.
When your application is ready to run on the target Tarantool version, you can
start upgrading the production environment.
Perform these steps before the upgrade to ensure that your cluster is working correctly:
On each router instance, perform the vshard.router check.
On each storage instance, perform the replication check.
On each storage instance, perform the vshard.storage check.
Check all instances’ logs for application errors.
Note
If you’re running Cartridge, you can check the health of the cluster instances
on the Cluster tab of its web interface.
In case of any issues, make sure to fix them before starting the upgrade procedure.
Installing the target version
Install the target Tarantool version on all hosts of the cluster. You can do this
using a package manager or the tt utility.
See the installation instructions at the Tarantool download page
and in the tt install reference.
Check that the target Tarantool version is installed by running tarantool -v
on all hosts.
Perform these steps after the upgrade to ensure that your cluster is working correctly:
On each router instance, perform the vshard.router check.
On each storage instance, perform the replication check.
On each storage instance, perform the vshard.storage check.
Check all instances’ logs for application errors.
Note
If you’re running Cartridge, you can check the health of the cluster instances
on the Cluster tab of its web interface.
Rollback before the point of no return
If you decide to roll back before reaching the point of no return,
your data is fully compatible with the version you had before the upgrade.
In this case, you can roll back the same way: restart the nodes you’ve already
upgraded on the original version.
Rollback after the point of no return
If you’ve passed the point of no return (that is,
executed box.schema.upgrade()) during the upgrade, then a rollback requires
downgrading the schema to the original version.
To check if an automatic downgrade is available for your original version, use
box.schema.downgrade_versions(). If the version you need is on the list,
execute the following steps on each upgraded replica set to roll back:
- Run
box.schema.downgrade(<version>) on master specifying the original version.
- Run
box.snapshot() on every instance in the replica set to make sure that the
replicas immediately see the downgraded database state after restart.
- Restart all read-only instances of the replica set on the initial
version one by one.
- Make one of the updated replicas the new master using the applicable instruction
from Switching the master.
- Restart the last instance of the replica set (the former master, now
a replica) on the original version.
Then re-enable failover or the rebalancer as described in Upgrading storages.
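The first two steps (checking box.schema.downgrade_versions() and calling box.schema.downgrade()) can be combined in a small guard. This is a sketch: downgrade_to is a hypothetical helper that takes the schema API as a parameter, and it assumes downgrade_versions() returns a list of version strings.

```lua
-- Downgrade the schema only if an automatic downgrade to `version`
-- is offered.
-- schema: an object with downgrade_versions() and downgrade(version),
--   such as box.schema.
local function downgrade_to(schema, version)
    for _, v in ipairs(schema.downgrade_versions()) do
        if v == version then
            schema.downgrade(version)
            return true
        end
    end
    return false, 'automatic downgrade to ' .. version .. ' is not available'
end
```

On the master, downgrade_to(box.schema, '2.10.0') would downgrade to 2.10.0 (a placeholder for your original version) or return false with a message; after a successful downgrade, continue with box.snapshot() on every instance.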
Recovering from a failed upgrade
Warning
This section applies to cases when the upgrade procedure has failed and the
cluster is not functioning properly anymore. Thus, it implies downtime and
a full cluster restart.
In case of an upgrade failure after passing the point of no return,
follow these steps to roll back to the original version:
Stop all cluster instances.
Save snapshot and xlog files from all instances whose data was modified
after the last backup procedure. These files will help apply these modifications
later.
Save the latest backups from all instances.
Restore the original Tarantool version on all hosts of the cluster.
Launch the cluster on the original Tarantool version.
Note
At this point, the application becomes fully functional and contains data
from the backups. However, the data modifications made after the backups
were taken must be restored manually.
Manually apply the latest data modifications from xlog files you saved on step 2
using the xlog module. On instances where such changes happened,
do the following:
- Find out the vclock value of the latest operation in the original WAL.
- Play the operations from the newer xlog starting from this vclock on the
instance.
Important
If the upgrade has failed after calling box.schema.upgrade(),
don’t apply the modifications of system spaces done by this call.
This can make the schema incompatible with the original Tarantool version.
Find more information about the Tarantool recovery in Disaster recovery.
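The selection of records to re-apply can be sketched in plain Lua. The helper records_to_replay is hypothetical; it assumes each record is shaped like the xlog module's output, with HEADER.lsn and HEADER.replica_id and a BODY.space_id. In line with the warning above, it skips system spaces (IDs below 512), so changes made by box.schema.upgrade() are not re-applied.

```lua
-- Select the WAL records that still have to be re-applied.
-- records: a list of records like
--   {HEADER = {lsn = ..., replica_id = ...}, BODY = {space_id = ..., ...}}.
-- replica_id: the instance whose changes are replayed.
-- from_lsn: the last LSN of that instance already in the restored data.
local function records_to_replay(records, replica_id, from_lsn)
    local result = {}
    for _, rec in ipairs(records) do
        local h, b = rec.HEADER or {}, rec.BODY or {}
        if h.replica_id == replica_id and h.lsn ~= nil and h.lsn > from_lsn
                and b.space_id ~= nil and b.space_id >= 512 then
            result[#result + 1] = rec
        end
    end
    return result
end
```

from_lsn would be the component of the restored instance's vclock for replica_id, and the records could be collected with require('xlog').pairs(path) for a saved .xlog file.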
Run box.info:
Check that the following conditions are satisfied:
box.info.status is running
box.info.replication[*].upstream.status and box.info.replication[*].downstream.status
are follow
box.info.replication[*].upstream.lag is less than or equal to box.cfg.replication_timeout,
but it can also be moderately larger under a write load.
box.info.ro is false at least on one instance in each replica set.
If all instances have box.info.ro = true, this means there are no writable nodes.
On Tarantool v. 2.10.0 or later, you can find out
why this happened by running box.info.ro_reason.
If box.info.ro_reason or box.info.status has the value orphan,
the instance doesn’t see the rest of the replica set.
Then run box.info once more and check that box.info.replication[*].upstream.lag
values are updated.
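The per-instance conditions above can be checked with a small function. This is a sketch: check_box_info is a hypothetical helper that takes the result of box.info() and the replication timeout as arguments, so it can be run on each instance as check_box_info(box.info(), box.cfg.replication_timeout). The box.info.ro condition spans several instances, so it is not checked here.

```lua
-- Check the box.info conditions for one instance.
-- info: the result of box.info().
-- replication_timeout: box.cfg.replication_timeout on that instance.
-- Returns a list of human-readable problems (empty if all checks pass).
local function check_box_info(info, replication_timeout)
    local problems = {}
    local function report(fmt, ...)
        problems[#problems + 1] = string.format(fmt, ...)
    end
    if info.status ~= 'running' then
        report('status is %s', tostring(info.status))
    end
    for id, r in pairs(info.replication or {}) do
        -- Entries without an upstream or downstream (such as the
        -- instance's own entry) are skipped for that direction.
        local up, down = r.upstream, r.downstream
        if up ~= nil then
            if up.status ~= 'follow' then
                report('replica %s: upstream status is %s',
                    tostring(id), tostring(up.status))
            elseif up.lag ~= nil and up.lag > replication_timeout then
                -- A moderately larger lag can be normal under write load.
                report('replica %s: upstream lag %.3f exceeds %.3f',
                    tostring(id), up.lag, replication_timeout)
            end
        end
        if down ~= nil and down.status ~= 'follow' then
            report('replica %s: downstream status is %s',
                tostring(id), tostring(down.status))
        end
    end
    return problems
end
```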
Run vshard.storage.info():
Check that the following conditions are satisfied:
- there are no issues or alerts
- replication.status is follow
Run vshard.router.info():
Check that the following conditions are satisfied:
- there are no issues or alerts
- all buckets are available (the sum of
bucket.available_rw on all replica
sets equals the total number of buckets)
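The two vshard checks can be sketched as plain Lua helpers that take the info tables as arguments. Both names are hypothetical. vshard_alerts assumes each alert is a {code, message} array. all_buckets_writable assumes the router info lists replica sets under replicasets, each with bucket.available_rw. On an instance they could be called as vshard_alerts(vshard.storage.info()), vshard_alerts(vshard.router.info()), and all_buckets_writable(vshard.router.info(), vshard.router.bucket_count()).

```lua
-- Collect alert codes from a vshard.storage.info() or vshard.router.info()
-- result; alerts are assumed to be {code, message} arrays.
local function vshard_alerts(info)
    local codes = {}
    for _, alert in ipairs(info.alerts or {}) do
        if type(alert) == 'table' then
            codes[#codes + 1] = tostring(alert[1])
        else
            codes[#codes + 1] = tostring(alert)
        end
    end
    return codes
end

-- Check that the replica sets in a vshard.router.info() result together
-- have all buckets available for writes. Returns the verdict and the sum.
local function all_buckets_writable(router_info, total_buckets)
    local sum = 0
    for _, rs in pairs(router_info.replicasets or {}) do
        sum = sum + ((rs.bucket and rs.bucket.available_rw) or 0)
    end
    return sum == total_buckets, sum
end
```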
Cartridge. If your cluster runs on Cartridge, you can switch the master in the web interface.
To do this, go to the Cluster tab, click Edit replica set, and drag an
instance to the top of Failover priority list to make it the master.
Raft. If your cluster uses automated leader election,
switch the master by following these steps:
- Pick a candidate – a read-only instance to become the new master.
- Run
box.ctl.promote() on the candidate. The operation will start and
wait for the election to happen.
- Run
box.cfg{ election_mode = "voter" } on the current master.
- Check that the candidate became the new master: its
box.info.ro
must be false.
Legacy. If your cluster neither works on Cartridge nor has automated leader election,
switch the master by following these steps:
Pick a candidate – a read-only instance to become the new master.
Run box.cfg{ read_only = true } on the current master.
Check that the candidate’s vclock value matches the master’s:
The value of box.info.vclock[<master_id>] on the candidate must be equal
to box.info.lsn on the master. <master_id> here is the value of
box.info.id on the master.
If the vclock values don’t match, stop the switch procedure and restore
the replica set state by calling box.cfg{ read_only = false } on the master.
Then pick another candidate and restart the procedure.
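The vclock comparison in the legacy procedure reads values from two instances. A minimal sketch, with a hypothetical helper, takes box.info.id and box.info.lsn read on the master and box.info.vclock read on the candidate, and reports whether the candidate has applied everything the master wrote:

```lua
-- Check that the candidate has caught up with the current master.
-- candidate_vclock: box.info.vclock on the candidate.
-- master_id, master_lsn: box.info.id and box.info.lsn on the master.
-- Returns the verdict and the LSN the candidate has applied.
local function vclock_caught_up(candidate_vclock, master_id, master_lsn)
    local applied = candidate_vclock[master_id] or 0
    return applied == master_lsn, applied
end
```

For example, after reading box.info.id and box.info.lsn on the (now read-only) master, run vclock_caught_up(box.info.vclock, <master_id>, <master_lsn>) on the candidate; if it returns false, follow the restore step above.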
After switching the master, perform the replication check
on each instance of the replica set.
Upgrade from 1.6 directly to 2.x with downtime
Versions later than 1.6 have incompatible .snap and
.xlog file formats: 1.6 files are
supported during upgrade, but you won’t be able to return to 1.6 after running
under 1.10 or 2.x for a while. A few configuration parameters are also renamed.
To perform a live upgrade from Tarantool 1.6 to a more recent version,
such as 2.8.4 or 2.10.1,
it is necessary to take an intermediate step by upgrading 1.6 -> 1.10 -> 2.x.
This is the only way to perform the upgrade without downtime.
However, a direct upgrade of a replica set from 1.6 to 2.x is also possible, but only
with downtime.
Here is how to upgrade from Tarantool 1.6 directly to 2.x:
- Stop all instances in the replica set.
- Upgrade Tarantool version to 2.x on every instance.
- Upgrade the corresponding instance files and applications, if needed.
- Start all the instances with Tarantool 2.x.
- Execute
box.schema.upgrade() on the master.
- Execute
box.snapshot() on every node in the replica set.
Fix decimal values in vinyl spaces when upgrading to 2.10.1
This is an upgrade guide for fixing one specific problem which could happen with decimal values in vinyl spaces.
It’s only relevant when you’re upgrading from Tarantool version <= 2.10.0 to anything >= 2.10.1.
Before gh-6377 was fixed, decimal and double values in a scalar or number index
could end up in the wrong order after the update.
If such an index has been built for a space that uses the vinyl storage engine,
the index is persisted and is not rebuilt even after the upgrade.
If this is the case, the user has to rebuild the affected indexes manually.
Here are the rules to determine whether your installation was affected.
If all of the statements listed below are true, you have to rebuild indexes for the affected vinyl spaces manually.
- You were running Tarantool version 2.10.0 or below.
- You have spaces with the
vinyl storage engine.
- The
vinyl spaces have number or scalar indexes.
- The tuples in these spaces may contain both
decimal and double Inf or NaN values.
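The first condition refers to the version you ran before the upgrade, so it has to be checked on the old version (for example, from box.info.version before upgrading). A small plain-Lua sketch with hypothetical helpers compares such version strings, assuming they start with major.minor.patch, as in 2.10.0 or a longer build string like 2.10.0-0-g1234abc:

```lua
-- Parse the leading "major.minor.patch" of a version string into numbers.
-- Returns nil if the string does not start with that pattern.
local function parse_version(s)
    local major, minor, patch = s:match("^(%d+)%.(%d+)%.(%d+)")
    if major == nil then
        return nil
    end
    return {tonumber(major), tonumber(minor), tonumber(patch)}
end

-- True if version string a is less than or equal to version string b.
local function version_le(a, b)
    local va = assert(parse_version(a), "cannot parse version " .. a)
    local vb = assert(parse_version(b), "cannot parse version " .. b)
    for i = 1, 3 do
        if va[i] ~= vb[i] then
            return va[i] < vb[i]
        end
    end
    return true
end
```

Here version_le(old_version, '2.10.0') tells whether the first condition holds, where old_version is a hypothetical variable holding the pre-upgrade version string.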
If this is the case for you, you can run the following script, which will find all the affected indices:
local fiber = require('fiber')
local decimal = require('decimal')

local function isnan(val)
    return type(val) == 'number' and val ~= val
end

local function isinf(val)
    return val == math.huge or val == -math.huge
end

local function vinyl(id)
    return box.space[id].engine == 'vinyl'
end

require_rebuild = {}
local iters = 0
-- Walk over the indexes of all user spaces (space IDs start at 512).
for _, v in box.space._index:pairs({512, 0}, {iterator='GE'}) do
    local id = v[1]
    iters = iters + 1
    if iters % 1000 == 0 then
        fiber.yield()
    end
    if vinyl(id) then
        -- Collect 1-based numbers of the key parts typed number or scalar.
        local check_fields = {}
        for _, fmt in pairs(v[6]) do
            if fmt[2] == 'number' or fmt[2] == 'scalar' then
                table.insert(check_fields, fmt[1] + 1)
            end
        end
        local have_decimal = {}
        local have_nan = {}
        if #check_fields > 0 then
            for _, tuple in box.space[id]:pairs() do
                for _, i in pairs(check_fields) do
                    iters = iters + 1
                    if iters % 1000 == 0 then
                        fiber.yield()
                    end
                    have_decimal[i] = have_decimal[i] or
                        decimal.is_decimal(tuple[i])
                    have_nan[i] = have_nan[i] or isnan(tuple[i]) or
                        isinf(tuple[i])
                    -- The index is affected if one field holds both kinds of values.
                    if have_decimal[i] and have_nan[i] then
                        table.insert(require_rebuild, v)
                        goto out
                    end
                end
            end
        end
    end
    ::out::
end
The indices requiring a rebuild will be stored in the require_rebuild table.
If the table is empty, you’re safe and can continue using Tarantool as before.
If the require_rebuild table contains some entries,
you can rebuild the affected indices with the following script.
Note
Please run the script below only on the master node
and only after all the nodes are upgraded to the new Tarantool version.
local log = require('log')

local function rebuild_index(idx)
    local index_name = idx[3]
    local space_name = box.space[idx[1]].name
    log.info("Rebuilding index %s on space %s", index_name, space_name)
    -- The primary index (ID 0) cannot be rebuilt this way.
    if (idx[2] == 0) then
        log.error("Cannot rebuild primary index %s on space %s. Please, "..
            "recreate the space manually", index_name, space_name)
        return
    end
    log.info("Deleting index %s on space %s", index_name, space_name)
    -- Deleting the _index tuple drops the index; v keeps its definition.
    local v = box.space._index:delete{idx[1], idx[2]}
    if v == nil then
        log.error("Couldn't find index %s on space %s", index_name, space_name)
        return
    end
    log.info("Done")
    log.info("Creating index %s on space %s", index_name, space_name)
    -- Inserting the same definition back builds the index from scratch.
    box.space._index:insert(v)
end

for _, idx in pairs(require_rebuild) do
    rebuild_index(idx)
end
For some of the indices, the script may log the following error instead of rebuilding them:
“Cannot rebuild primary index index_name on space space_name. Please, recreate the space manually”.
If this happens, automatic index rebuild is impossible,
and you have to re-create the space manually to ensure data integrity:
- Create a new space with the same format as the existing one.
- Define the same indices on the freshly created space.
- Iterate over the old space’s primary key and insert all the data into the new space.
- Drop the old space.
Fix illegal type names when upgrading to 2.10.4
This is an upgrade guide for fixing one specific problem that could occur with field type names.
It’s only relevant when you’re upgrading from a Tarantool version <=2.10.3 to >=2.10.4.
Before gh-5940 was fixed, the empty string, n, nu, s,
and st (that is, leading parts of num and str) were accepted as valid field types.
Since 2.10.4, Tarantool doesn’t accept these strings, and they must be replaced with
the correct values num and str.
This instruction is also available on GitHub.
Check if your snapshots contain illegal type names
A snapshot can be validated against the issue using the following script:
#!/usr/bin/env tarantool

local xlog = require('xlog')
local json = require('json')

if arg[1] == nil then
    print(('Usage: %s xxxxxxxxxxxxxxxxxxxx.snap'):format(arg[0]))
    os.exit(1)
end

local illegal_types = {
    [''] = true,
    ['n'] = true,
    ['nu'] = true,
    ['s'] = true,
    ['st'] = true,
}

local function report_field_def(name, field_def)
    local msg = 'A field def in a _space entry %q contains an illegal type: %s'
    print(msg:format(name, json.encode(field_def)))
end

local has_broken_format = false

for _, record in xlog.pairs(arg[1]) do
    -- Filter inserts.
    if record.HEADER == nil or record.HEADER.type ~= 'INSERT' then
        goto continue
    end
    -- Filter _space records (space ID 280).
    if record.BODY == nil or record.BODY.space_id ~= 280 then
        goto continue
    end

    local tuple = record.BODY.tuple
    local name = tuple[3]
    local format = tuple[7]

    -- A field def may be a map ({name = ..., type = ...})
    -- or an array ({name, type}), so check both forms.
    local is_format_broken = false
    for _, field_def in ipairs(format) do
        if illegal_types[field_def.type] ~= nil then
            report_field_def(name, field_def)
            is_format_broken = true
        end
        if illegal_types[field_def[2]] ~= nil then
            report_field_def(name, field_def)
            is_format_broken = true
        end
    end

    if is_format_broken then
        has_broken_format = true
        local msg = 'The following _space entry contains illegal type(s): %s'
        print(msg:format(json.encode(record)))
    end

    ::continue::
end

if has_broken_format then
    print('')
    print(('%s has an illegal type in a space format'):format(arg[1]))
    print('It is recommended to proceed with the upgrade instruction:')
    print('https://github.com/tarantool/tarantool/wiki/Fix-illegal-field-type-in-a-space-format-when-upgrading-to-2.10.4')
else
    print('Everything looks nice!')
end

os.exit(has_broken_format and 2 or 0)
If the snapshot contains values that aren’t valid in 2.10.4, you’ll get
output like the following:
To fix the illegal type names, add the following code to the application's instance file
before the box.cfg()/vshard.cfg()/cartridge.cfg() call.
Note
In Cartridge applications, the instance file is called init.lua.
-- Convert illegal type names in a space format that were
-- allowed before tarantool 2.10.4.
local log = require('log')
local json = require('json')

local transforms = {
    [''] = 'num',
    ['n'] = 'num',
    ['nu'] = 'num',
    ['s'] = 'str',
    ['st'] = 'str',
}

-- The helper for before_replace().
local function transform_field_def(name, field_def, field, new_type)
    local field_def_old_str = json.encode(field_def)
    field_def[field] = new_type
    local field_def_new_str = json.encode(field_def)
    local msg = 'Transform a field def in a _space entry %q: %s -> %s'
    log.info(msg:format(name, field_def_old_str, field_def_new_str))
end

-- _space trigger.
local function before_replace(_, tuple)
    if tuple == nil then return tuple end
    local name = tuple[3]
    local format = tuple[7]

    -- Update format if necessary. A field def may be a map
    -- ({name = ..., type = ...}) or an array ({name, type}).
    local is_format_changed = false
    for _, field_def in ipairs(format) do
        local new_type = transforms[field_def.type]
        if new_type ~= nil then
            transform_field_def(name, field_def, 'type', new_type)
            is_format_changed = true
        end
        new_type = transforms[field_def[2]]
        if new_type ~= nil then
            transform_field_def(name, field_def, 2, new_type)
            is_format_changed = true
        end
    end

    -- Nothing changed: skip.
    if not is_format_changed then return tuple end

    -- Rebuild the tuple with the fixed format in field 7.
    local new_tuple = tuple:transform(7, 1, format)
    log.info(('Transformed _space entry %s to %s'):format(
        json.encode(tuple), json.encode(new_tuple)))
    return new_tuple
end

-- on_schema_init trigger to set before_replace().
local function on_schema_init()
    box.space._space:before_replace(before_replace)
end

-- Set the trigger on _space.
box.ctl.on_schema_init(on_schema_init)
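The format rewrite performed by before_replace() can be tried outside Tarantool. The following pure-Lua sketch applies the same transforms mapping to a space format given as a plain Lua table. fix_format is a hypothetical name; like the trigger, it checks both the map form ({name = ..., type = ...}) and the array form ({name, type}) of a field definition and modifies the format in place.

```lua
-- Same mapping as the transforms table in the trigger above.
local transforms = {
    [''] = 'num', ['n'] = 'num', ['nu'] = 'num',
    ['s'] = 'str', ['st'] = 'str',
}

-- Hypothetical helper: rewrite illegal type names in a space format
-- in place. Returns true if anything was changed.
local function fix_format(format)
    local changed = false
    for _, field_def in ipairs(format) do
        -- Map form: {name = 'a', type = 'n'}.
        if transforms[field_def.type] ~= nil then
            field_def.type = transforms[field_def.type]
            changed = true
        end
        -- Array form: {'a', 'n'}.
        if transforms[field_def[2]] ~= nil then
            field_def[2] = transforms[field_def[2]]
            changed = true
        end
    end
    return changed
end

local format = {
    {name = 'id', type = 'unsigned'},
    {name = 'price', type = 'nu'},
    {'title', 'st'},
}
print(fix_format(format), format[2].type, format[3][2])
```

Keeping the mapping in a pure function like this makes it easy to check the mapping against sample formats before installing the trigger.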
You can delete these triggers after the box.cfg()/vshard.cfg()/cartridge.cfg()
call.
An example for a Cartridge application:
The triggers will report the changes they make in the following form:
Recover from WALs with mixed transactions when upgrading to 2.11.0
This is a guide on fixing a specific problem that could happen when upgrading
from a Tarantool version between 2.1.2 and 2.2.0 to 2.8.1 or later. The described
solution is applicable since version 2.11.0.
The problem is described in the issue gh-7932. If two or more
transactions happened simultaneously in Tarantool 2.1.2-2.2.0, their operations
could be written to the write-ahead log mixed with each other. Starting from version
2.8.1, Tarantool recovers transactions atomically and expects all WAL entries
between a transaction’s begin and commit operations to belong to one transaction.
If there is an operation belonging to another transaction, Tarantool fails to recover
from such a WAL.
Starting from version 2.11.0, Tarantool can recover from
WALs with mixed transactions in the force_recovery mode.
If all or some of the instances fail to start after upgrading to 2.11 or a newer
version due to a recovery error:
- Start these instances with the force_recovery option set to true.
- Make new snapshots on the instances by calling box.snapshot(), so that the old
WALs with mixed transactions are no longer used for recovery.
- Set force_recovery back to false.
After all the instances start successfully, WALs with mixed transactions
may still lead to replication issues. Some instances may fail to replicate from other
instances because those instances send incorrect WALs. To fix the replication issues,
rebootstrap the instances that fail to replicate.
Bug reports
If you find a bug in Tarantool, you’re doing us a favor by taking the time to
tell us about it.
Please create an issue in the Tarantool repository on GitHub. We encourage you to
include the following information:
- Steps needed to reproduce the bug, and an explanation of why this differs from
the expected behavior according to our manual. Please provide specific unique
information. For example, instead of “I can’t get certain information”, say
“box.space.x:delete() didn’t report what was deleted”.
- Your operating system name and version, the Tarantool name and version, and
any unusual details about your machine and its configuration.
- Related files, such as a stack trace or a Tarantool log file.
If this is a feature request or if it affects a special category of users, be
sure to mention that.
Usually within one or two workdays a Tarantool team member will write an
acknowledgment, or some questions, or suggestions for a workaround.
Flight recorder
Example on GitHub: flightrec
The flight recorder is an event collection tool that gathers various information about a working Tarantool instance, such as:
- logs
- metrics
- requests and responses
This information helps you investigate incidents related
to Tarantool instance crashes.
Enable the flight recorder
The flight recorder is disabled by default and can be enabled and configured for
a specific Tarantool instance.
To enable the flight recorder, set the flightrec.enabled
configuration option to true.
After flightrec.enabled is set to true, the flight recorder starts collecting data in the flight recording file current.ttfr.
This file is stored in the snapshot.dir directory.
By default, this directory is var/lib/{{ instance_name }}, so recordings are stored as var/lib/{{ instance_name }}/<file_name>.ttfr.
If the instance crashes and reboots, Tarantool rotates the flight recording:
current.ttfr is renamed to <timestamp>.ttfr (for example, 20230411T050721.ttfr)
and a new current.ttfr file is created for collecting data.
After a graceful shutdown (for example, using os.exit()),
Tarantool continues writing to the existing current.ttfr file after a restart.
Note
Old flight recordings are not removed automatically, so delete them manually.
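Because rotated recordings have to be deleted by hand, a small script can pick which ones to remove. The pure-Lua sketch below is hypothetical (recordings_to_remove is not a Tarantool function): given the file names in the snapshot directory, it selects rotated files named like 20230411T050721.ttfr, sorts them (the fixed-width timestamp sorts lexicographically in time order), and returns all but the newest keep files. current.ttfr never matches the pattern, so it is never selected.

```lua
-- Hypothetical helper: choose rotated flight recordings to delete,
-- keeping the `keep` most recent ones. current.ttfr is never returned.
local function recordings_to_remove(names, keep)
    local rotated = {}
    for _, name in ipairs(names) do
        -- Rotated files are named like 20230411T050721.ttfr.
        if name:match('^%d%d%d%d%d%d%d%dT%d%d%d%d%d%d%.ttfr$') then
            table.insert(rotated, name)
        end
    end
    -- The fixed-width timestamp sorts lexicographically in time order.
    table.sort(rotated)
    local result = {}
    for i = 1, #rotated - keep do
        table.insert(result, rotated[i])
    end
    return result
end

-- Inside Tarantool, the names could come from the fio module, e.g.:
-- local fio = require('fio')
-- for _, name in ipairs(recordings_to_remove(fio.listdir(dir), 3)) do
--     fio.unlink(fio.pathjoin(dir, name))
-- end
local sample = {'current.ttfr', '20230411T050721.ttfr',
                '20230301T000000.ttfr', '00000000000000000001.snap'}
for _, name in ipairs(recordings_to_remove(sample, 1)) do
    print('remove: ' .. name)
end
```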
Monitoring
Monitoring is the process of capturing runtime information about the instances of a Tarantool cluster using metrics.
Metrics can indicate various characteristics, such as memory usage, the number of records in spaces, replication status, and so on.
Typically, metrics are monitored in real time, allowing for the identification of current issues or the prediction of potential ones.
Getting started with monitoring
Example on GitHub: sharded_cluster_crud_metrics
Tarantool allows you to configure and expose its metrics using a YAML configuration.
You can also use the built-in metrics module to create and collect custom metrics.
To configure metrics, use the metrics section in a cluster configuration.
The configuration below enables all metrics excluding vinyl-specific ones:
metrics:
  include: [ all ]
  exclude: [ vinyl ]
  labels:
    alias: '{{ instance_name }}'
The metrics.labels option accepts the predefined {{ instance_name }} variable.
This adds an instance name as a label to every observation.
Third-party Lua modules, like crud or expirationd, offer their own metrics.
You can enable these metrics by configuring the corresponding role.
The example below shows how to enable statistics on called operations by providing the roles.crud-router role’s configuration:
roles:
  - roles.crud-router
  - roles.metrics-export
roles_cfg:
  roles.crud-router:
    stats: true
    stats_driver: metrics
    stats_quantiles: true
expirationd metrics can be enabled as follows:
expirationd:
  cfg:
    metrics: true
To expose metrics in different formats, you can use the third-party metrics-export-role.
In the following example, the metrics of storage-a-001 are provided on two endpoints:
- /metrics/prometheus: exposes metrics in the Prometheus format.
- /metrics/json: exposes metrics in the JSON format.
storage-a-001:
  roles_cfg:
    roles.metrics-export:
      http:
      - listen: '127.0.0.1:8082'
        endpoints:
        - path: /metrics/prometheus/
          format: prometheus
        - path: /metrics/json
          format: json
Note
The metrics module provides a set of plugins that can be used to collect and expose metrics in different formats. Learn more in Collecting metrics using plugins.
The metrics module allows you to create and collect custom metrics.
The example below shows how to collect the number of data operations performed on the specified space by increasing a counter value inside the on_replace() trigger function:
local metrics = require('metrics')
local trigger = require('trigger')

local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')

-- Increase the counter on every data change in the bands space,
-- labeling the observation with the request type.
trigger.set(
    'box.space.bands.on_replace',
    'update_bands_replace_count_metric',
    function(_, _, _, request_type)
        bands_replace_count:inc(1, { request_type = request_type })
    end
)
Learn more in Custom metrics.
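The request_type label in the example above means the counter reports a separate value for each request type. The toy pure-Lua sketch below only illustrates that idea; it is NOT the metrics module's implementation, and new_counter and get are hypothetical names. It keys each value by the sorted label pairs, so every distinct label set becomes its own series.

```lua
-- Illustration only: a toy counter that keeps one value per distinct
-- label set. This is not how the metrics module is implemented.
local function new_counter()
    local series = {}
    local counter = {}

    function counter:inc(value, labels)
        -- Build a stable key from the label pairs sorted by name.
        local keys = {}
        for k in pairs(labels or {}) do table.insert(keys, k) end
        table.sort(keys)
        local parts = {}
        for _, k in ipairs(keys) do
            table.insert(parts, k .. '=' .. tostring(labels[k]))
        end
        local key = table.concat(parts, ',')
        series[key] = (series[key] or 0) + (value or 1)
    end

    function counter:get(key)
        return series[key] or 0
    end

    return counter
end

local c = new_counter()
c:inc(1, {request_type = 'INSERT'})
c:inc(1, {request_type = 'INSERT'})
c:inc(1, {request_type = 'DELETE'})
print(c:get('request_type=INSERT'), c:get('request_type=DELETE'))
```

Because each label combination is a separate series, labels with many distinct values (for example, user IDs) can multiply the number of series an exporter has to serve.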
When metrics are configured and exposed, you can use the desired third-party tool to collect them.
Below is an example of a Prometheus scrape configuration that collects metrics of multiple Tarantool instances:
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - 127.0.0.1:8081
          - 127.0.0.1:8082
          - 127.0.0.1:8083
          - 127.0.0.1:8084
          - 127.0.0.1:8085
    metrics_path: "/metrics/prometheus"
For more information on collecting and visualizing metrics, refer to Grafana dashboard.
Note
Tarantool Cluster Manager allows you to view metrics of connected clusters in real time.
Learn more in Viewing cluster metrics.
Grafana dashboard
After enabling and configuring metrics, you can visualise them using Tarantool Grafana dashboards.
These dashboards are available as part of
Grafana official & community-built dashboards:
The Tarantool Grafana dashboard is a ready-to-import template with basic memory,
space operations, and HTTP load panels, based on the default functionality of the
metrics package.
Prepare a monitoring stack
Since Grafana dashboards are available for both Prometheus and InfluxDB data sources,
you can use one of the following monitoring stacks:
- Telegraf as a server agent for collecting metrics, InfluxDB as a time series
database for storing metrics, and Grafana as a visualization platform.
- Prometheus as both a server agent for collecting metrics and a time series
database for storing metrics, and Grafana as a visualization platform.
For issues related to setting up Prometheus, Telegraf, InfluxDB, or Grafana instances, refer to the corresponding project’s documentation.
Collect metrics with server agents
To collect metrics for Prometheus, first set up metrics output in the prometheus format.
You can use the roles.metrics-export configuration or set up the Prometheus plugin manually.
To start collecting metrics, add a job
to the Prometheus configuration with each Tarantool instance URI as a target and
the metrics path as configured on the Tarantool instances:
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - 127.0.0.1:8081
          - 127.0.0.1:8082
          - 127.0.0.1:8083
          - 127.0.0.1:8084
          - 127.0.0.1:8085
    metrics_path: "/metrics/prometheus"
To collect metrics for InfluxDB, use the Telegraf agent.
First, configure the Tarantool metrics output in the json format
using the roles.metrics-export configuration or the corresponding JSON plugin.
To start collecting metrics, add an http input
to the Telegraf configuration that includes the metrics URL of each Tarantool instance:
[[inputs.http]]
urls = [
"http://example_project:8081/metrics/json",
"http://example_project:8082/metrics/json",
"http://example_project:8083/metrics/json",
"http://example_project:8084/metrics/json",
"http://example_project:8085/metrics/json"
]
timeout = "30s"
tag_keys = [
"metric_name",
"label_pairs_alias",
"label_pairs_quantile",
"label_pairs_path",
"label_pairs_method",
"label_pairs_status",
"label_pairs_operation",
"label_pairs_level",
"label_pairs_id",
"label_pairs_engine",
"label_pairs_name",
"label_pairs_index_name",
"label_pairs_delta",
"label_pairs_stream",
"label_pairs_thread",
"label_pairs_kind"
]
insecure_skip_verify = true
interval = "10s"
data_format = "json"
name_prefix = "tarantool_"
fieldpass = ["value"]
Be sure to include each label key as label_pairs_<key> to extract it
with the plugin.
For example, if you use { state = 'ready' } labels somewhere in metric collectors, add the label_pairs_state tag key.
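If your collectors use many label keys, the tag_keys list can be generated instead of typed by hand. The pure-Lua sketch below is a hypothetical helper (telegraf_tag_keys is not part of any library) that applies the label_pairs_<key> rule to a list of label keys, drops duplicates, and prints a TOML array that can be pasted into the [[inputs.http]] section. It always starts the list with metric_name, as in the configuration above.

```lua
-- Hypothetical helper: build the Telegraf tag_keys list for the label
-- keys used by your metric collectors (label_pairs_<key> rule).
local function telegraf_tag_keys(label_keys)
    local tags = {'metric_name'}
    local seen = {}
    for _, key in ipairs(label_keys) do
        if not seen[key] then
            seen[key] = true
            table.insert(tags, 'label_pairs_' .. key)
        end
    end
    return tags
end

-- Print the keys as a TOML array to paste into [[inputs.http]].
local tags = telegraf_tag_keys({'alias', 'state', 'alias'})
local quoted = {}
for i, tag in ipairs(tags) do
    quoted[i] = string.format('%q', tag)
end
print('tag_keys = [' .. table.concat(quoted, ', ') .. ']')
```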
Open the Grafana import menu.
To import a specific dashboard, choose one of the following options:
Set the dashboard name, folder, and uid (if needed).
You can choose the data source and data source variables after import.
- If the graphs show no data, make sure that you picked the data source and job/measurement correctly.
- If the graphs show no data, make sure that you have the
info group of Tarantool metrics (in particular, tnt_info_uptime).
- If some Prometheus graphs show no data because of the
parse error: missing unit character in duration error,
ensure that you use Grafana 7.2 or newer.
- If some Prometheus graphs display
parse error: bad duration syntax "1m0" or a similar error,
update your Prometheus version. See
grafana/grafana#44542 for more details.
Alerting
You can set up alerts on metrics to get a notification when something goes
wrong. We will use Prometheus alert rules
as an example here. The full alerts.yml file is available in the
tarantool/grafana-dashboard GitHub repo.
If there are no Tarantool metrics, you may miss critical conditions. Prometheus
provides the up metric to monitor the health of its targets:
- alert: InstanceDown
  expr: up == 0
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance '{{ $labels.instance }}' ('{{ $labels.job }}') down"
    description: "'{{ $labels.instance }}' of job '{{ $labels.job }}' has been down for more than a minute."
Also monitor your server’s CPU, disk, and RAM from the server side with
your favorite tools. For example, under high CPU load a Tarantool
instance may stop sending metrics, so such breakdowns can only be tracked
from the outside.
Metrics reference
This page provides a detailed description of metrics from the metrics module.
General instance information:
tnt_cfg_current_time |
Instance system time in the Unix timestamp format |
tnt_info_uptime |
Time in seconds since the instance has started |
tnt_read_only |
Indicates if the instance is in read-only mode (1 if true, 0 if false) |
The following metrics provide a picture of memory usage by the Tarantool process.
tnt_info_memory_cache |
Number of bytes in the cache used to store
tuples with the vinyl storage engine. |
tnt_info_memory_data |
Number of bytes used to store user data (tuples)
with the memtx engine and with level 0 of the vinyl engine,
without regard for memory fragmentation. |
tnt_info_memory_index |
Number of bytes used for indexing user data.
Includes memtx and vinyl memory tree extents,
the vinyl page index, and the vinyl bloom filters. |
tnt_info_memory_lua |
Number of bytes used for the Lua runtime.
Monitoring this metric can prevent memory overflow. |
tnt_info_memory_net |
Number of bytes used for network input/output buffers. |
tnt_info_memory_tx |
Number of bytes in use by active transactions.
For the vinyl storage engine,
this is the total size of all allocated objects
(struct txv, struct vy_tx, struct vy_read_interval)
and tuples pinned for those objects. |
Provides a memory usage report for the slab allocator.
The slab allocator is the main allocator used to store tuples.
The following metrics help monitor the total memory usage and memory fragmentation.
To learn more about use cases, refer to the
box.slab submodule documentation.
Available memory, bytes:
tnt_slab_quota_size |
Amount of memory available to store tuples and indexes.
Equals memtx_memory. |
tnt_slab_arena_size |
Total memory available to store both tuples and indexes.
Includes allocated but currently free slabs. |
tnt_slab_items_size |
Total amount of memory available to store only tuples and not indexes.
Includes allocated but currently free slabs. |
Memory usage, bytes:
tnt_slab_quota_used |
The amount of memory that is already reserved by the slab allocator. |
tnt_slab_arena_used |
The effective memory used to store both tuples and indexes.
Disregards allocated but currently free slabs. |
tnt_slab_items_used |
The effective memory used to store only tuples and not indexes.
Disregards allocated but currently free slabs. |
Memory utilization, %:
tnt_slab_quota_used_ratio |
tnt_slab_quota_used / tnt_slab_quota_size |
tnt_slab_arena_used_ratio |
tnt_slab_arena_used / tnt_slab_arena_size |
tnt_slab_items_used_ratio |
tnt_slab_items_used / tnt_slab_items_size |
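The three ratios above are plain divisions of the corresponding metrics. The pure-Lua sketch below computes them in percent from a table with the same field names as box.slab.info() (quota_size, quota_used, arena_size, arena_used, items_size, items_used). slab_ratios is a hypothetical helper; inside Tarantool you could pass box.slab.info() to it directly.

```lua
-- Hypothetical helper: compute slab utilization ratios (percent) from
-- a table shaped like box.slab.info().
local function slab_ratios(info)
    local function pct(used, size)
        if size == 0 then return 0 end
        return used * 100 / size
    end
    return {
        quota_used_ratio = pct(info.quota_used, info.quota_size),
        arena_used_ratio = pct(info.arena_used, info.arena_size),
        items_used_ratio = pct(info.items_used, info.items_size),
    }
end

-- Sample numbers; in Tarantool: slab_ratios(box.slab.info())
local r = slab_ratios({
    quota_size = 1000, quota_used = 250,
    arena_size = 200, arena_used = 50,
    items_size = 100, items_used = 10,
})
print(r.quota_used_ratio, r.arena_used_ratio, r.items_used_ratio)
```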
The following metrics provide specific information
about each individual space in a Tarantool instance.
tnt_space_len |
Number of records in the space.
This metric always has 2 labels: {name="test", engine="memtx"},
where name is the name of the space and
engine is the engine of the space. |
tnt_space_bsize |
Total number of bytes in all tuples.
This metric always has 2 labels: {name="test", engine="memtx"},
where name is the name of the space
and engine is the engine of the space. |
tnt_space_index_bsize |
Total number of bytes taken by the index.
This metric always has 2 labels: {name="test", index_name="pk"},
where name is the name of the space and
index_name is the name of the index. |
tnt_space_total_bsize |
Total size of tuples and all indexes in the space.
This metric always has 2 labels: {name="test", engine="memtx"},
where name is the name of the space and
engine is the engine of the space. |
tnt_vinyl_tuples |
Total tuple count for vinyl.
This metric always has 2 labels: {name="test", engine="vinyl"},
where name is the name of the space and
engine is the engine of the space. For vinyl, this metric is disabled
by default and can only be enabled by setting a global variable:
rawset(_G, 'include_vinyl_count', true). |
Network activity stats.
These metrics can be used to monitor network load, usage peaks, and traffic drops.
Sent bytes:
tnt_net_sent_total |
Bytes sent from the instance over the network since the instance’s start time |
Received bytes:
tnt_net_received_total |
Bytes received by the instance since start time |
Connections:
tnt_net_connections_total |
Number of incoming network connections since the instance’s start time |
tnt_net_connections_current |
Number of active network connections |
Requests:
tnt_net_requests_total |
Number of network requests the instance has handled since its start time |
tnt_net_requests_current |
Number of pending network requests |
Requests in progress:
tnt_net_requests_in_progress_total |
Total count of requests processed by the tx thread |
tnt_net_requests_in_progress_current |
Count of requests currently being processed in the tx thread |
Requests placed in queues of streams:
tnt_net_requests_in_stream_total |
Total count of requests that have ever been placed in
stream queues |
tnt_net_requests_in_stream_current |
Count of requests currently waiting in queues of streams |
Since Tarantool 2.10, each network metric has the thread label, which shows per-thread network statistics.
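Most of the *_total network metrics above are cumulative counters since the instance's start time, so load and traffic drops are usually read as a rate between two samples. The pure-Lua sketch below is a hypothetical helper (counter_rate is not a Tarantool function) that computes a per-second rate from two {time, value} samples. If the value went down, it assumes the instance restarted and the counter began again from zero, and it uses the new value as the increase.

```lua
-- Hypothetical helper: per-second rate between two samples of a
-- cumulative counter such as tnt_net_sent_total.
-- Samples are {time = seconds, value = counter value}.
local function counter_rate(prev, cur)
    local dt = cur.time - prev.time
    if dt <= 0 then
        return nil -- samples are not ordered in time
    end
    local delta = cur.value - prev.value
    if delta < 0 then
        -- The counter was reset (for example, by a restart).
        delta = cur.value
    end
    return delta / dt
end

print(counter_rate({time = 0, value = 1000}, {time = 5, value = 6000}))
```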
Provides the statistics for fibers.
If your application creates a lot of fibers,
you can use the metrics below to monitor fiber count and memory usage.
tnt_fiber_amount |
Number of fibers |
tnt_fiber_csw |
Overall number of fiber context switches |
tnt_fiber_memalloc |
Amount of memory reserved for fibers |
tnt_fiber_memused |
Amount of memory used by fibers |
You can collect iproto requests an instance has processed
and aggregate them by request type.
This may help you find out what operations your clients perform most often.
tnt_stats_op_total |
Total number of calls since server start |
To distinguish between request types, this metric has the operation label.
For example, it can look as follows: {operation="select"}.
For the possible request types, check the table below.
auth |
Authentication requests |
call |
Requests to execute stored procedures |
delete |
Delete calls |
error |
Requests that resulted in an error |
eval |
Calls to evaluate Lua code |
execute |
Execute SQL calls |
insert |
Insert calls |
prepare |
SQL prepare calls |
replace |
Replace calls |
select |
Select calls |
update |
Update calls |
upsert |
Upsert calls |
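To find out which operations clients perform most often, it is enough to sort the request types by their totals. The pure-Lua sketch below is hypothetical (top_operations is not part of Tarantool) and assumes input shaped like box.stat() output, that is, uppercase request names mapped to tables with a total field. It lowercases the names to match the operation label values listed above and breaks ties alphabetically.

```lua
-- Hypothetical helper: sort request types by total count, given a
-- table shaped like box.stat() output ({SELECT = {total = n}, ...}).
local function top_operations(stat, limit)
    local ops = {}
    for name, s in pairs(stat) do
        if type(s) == 'table' and s.total ~= nil then
            table.insert(ops, {operation = name:lower(), total = s.total})
        end
    end
    table.sort(ops, function(a, b)
        if a.total ~= b.total then return a.total > b.total end
        return a.operation < b.operation
    end)
    local result = {}
    for i = 1, math.min(limit, #ops) do
        result[i] = ops[i]
    end
    return result
end

-- Sample numbers; in Tarantool: top_operations(box.stat(), 3)
local sample = {SELECT = {total = 50}, INSERT = {total = 20},
                CALL = {total = 50}, AUTH = {total = 1}}
for _, op in ipairs(top_operations(sample, 3)) do
    print(op.operation, op.total)
end
```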
Provides the current replication status.
Learn more about replication in Tarantool.
tnt_info_lsn |
LSN of the instance. |
tnt_info_vclock |
LSN number in vclock.
This metric always has the label {id="id"},
where id is the instance’s number in the replica set. |
tnt_replication_lsn |
LSN of the tarantool instance.
This metric always has labels {id="id", type="type"}, where
id is the instance’s number in the replica set,
type is master or replica. |
tnt_replication_lag |
Replication lag value in seconds.
This metric always has labels {id="id", stream="stream"},
where id is the instance’s number in the replica set,
stream is downstream or upstream. |
tnt_replication_status |
This metric equals 1 when the replication status is “follow” and 0 otherwise.
This metric always has labels {id="id", stream="stream"},
where id is the instance’s number in the replica set,
stream is downstream or upstream. |
tnt_runtime_lua |
Lua garbage collector size in bytes |
tnt_runtime_used |
Number of bytes used for the Lua runtime |
tnt_runtime_tuple |
Number of bytes used for the tuples (except tuples owned by memtx and vinyl) |
LuaJIT metrics provide an insight into the work of the Lua garbage collector.
These metrics are available in Tarantool 2.6 and later.
General JIT metrics:
lj_jit_snap_restore_total |
Overall number of snap restores |
lj_jit_trace_num |
Number of JIT traces |
lj_jit_trace_abort_total |
Overall number of abort traces |
lj_jit_mcode_size |
Total size of allocated machine code areas |
JIT strings:
lj_strhash_hit_total |
Number of strings being interned |
lj_strhash_miss_total |
Total number of string allocations |
GC steps:
lj_gc_steps_atomic_total |
Count of incremental GC steps (atomic state) |
lj_gc_steps_sweepstring_total |
Count of incremental GC steps (sweepstring state) |
lj_gc_steps_finalize_total |
Count of incremental GC steps (finalize state) |
lj_gc_steps_sweep_total |
Count of incremental GC steps (sweep state) |
lj_gc_steps_propagate_total |
Count of incremental GC steps (propagate state) |
lj_gc_steps_pause_total |
Count of incremental GC steps (pause state) |
Allocations:
lj_gc_strnum |
Number of allocated string objects |
lj_gc_tabnum |
Number of allocated table objects |
lj_gc_cdatanum |
Number of allocated cdata objects |
lj_gc_udatanum |
Number of allocated udata objects |
lj_gc_freed_total |
Total amount of freed memory |
lj_gc_memory |
Current allocated Lua memory |
lj_gc_allocated_total |
Total amount of allocated memory |
The following metrics provide CPU usage statistics.
They are only available on Linux.
tnt_cpu_number |
Total number of processors configured by the operating system |
tnt_cpu_time |
Host CPU time |
tnt_cpu_thread |
Tarantool thread CPU time.
This metric always has the labels
{kind="user", thread_name="tarantool", thread_pid="pid", file_name="init.lua"},
where:
- kind can be either user or system
- thread_name is tarantool, wal, iproto, or coio
- file_name is the entrypoint file name, for example, init.lua.
|
There are also two cross-platform metrics, which can be obtained with a getrusage() call.
tnt_cpu_user_time |
Tarantool CPU user time |
tnt_cpu_system_time |
Tarantool CPU system time |
Vinyl metrics provide vinyl engine statistics.
The disk metrics are used to monitor overall data size on disk.
tnt_vinyl_disk_data_size |
Amount of data in bytes stored in the .run files
located in vinyl_dir |
tnt_vinyl_disk_index_size |
Amount of data in bytes stored in the .index files
located in vinyl_dir |
The vinyl regulator decides when to commence disk IO actions.
It groups activities in batches so that they are more consistent and
efficient.
tnt_vinyl_regulator_dump_bandwidth |
Estimated average dumping rate, bytes per second.
The rate value is initially 10485760 (10 megabytes per second).
It is recalculated depending on the actual rate.
Only significant dumps that are larger than 1 MB are used for estimating. |
tnt_vinyl_regulator_write_rate |
Actual average rate of performing write operations, bytes per second.
The rate is calculated as a 5-second moving average.
If the metric value is gradually going down,
this can indicate disk issues. |
tnt_vinyl_regulator_rate_limit |
Write rate limit, bytes per second.
The regulator imposes the limit on transactions
based on the observed dump/compaction performance.
If the metric value is down to approximately 10^5,
this indicates issues with the disk
or the scheduler. |
tnt_vinyl_regulator_dump_watermark |
Maximum amount of memory in bytes used
for in-memory storing of a vinyl LSM tree.
When this maximum is reached, a dump must occur.
For details, see Filling an LSM tree.
The value is slightly smaller
than the amount of memory allocated for vinyl trees,
reflected in the vinyl_memory parameter. |
tnt_vinyl_regulator_blocked_writers |
The number of fibers that are blocked waiting
for Vinyl level0 memory quota. |
tnt_vinyl_tx_commit |
Counter of commits (successful transaction ends).
Includes implicit commits: for example, any insert operation causes a
commit unless it is within a
box.begin()–box.commit()
block. |
tnt_vinyl_tx_rollback |
Counter of rollbacks (unsuccessful transaction ends).
This is not merely a count of explicit
box.rollback()
requests – it includes requests that ended with errors. |
tnt_vinyl_tx_conflict |
Counter of conflicts that caused transactions to roll back.
The ratio tnt_vinyl_tx_conflict / tnt_vinyl_tx_commit
above 5% indicates that vinyl is not healthy.
At that moment, you’ll probably see a lot of other problems with vinyl. |
tnt_vinyl_tx_read_views |
Current number of read views – that is, transactions
that entered the read-only state to avoid conflict temporarily.
Usually the value is 0.
If it stays non-zero for a long time, it is indicative of a memory leak. |
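The 5% rule of thumb for tnt_vinyl_tx_conflict can be turned into a simple check. The pure-Lua sketch below is a hypothetical helper (vinyl_conflict_check is not part of Tarantool) that returns the conflict-to-commit ratio and whether it exceeds the threshold, or nil when there are no commits yet. The commented Tarantool lines assume the counters are available as box.stat.vinyl().tx.conflict and box.stat.vinyl().tx.commit.

```lua
-- Hypothetical health check: vinyl is likely unhealthy when
-- conflicts exceed 5% of commits (the threshold can be overridden).
local function vinyl_conflict_check(conflicts, commits, threshold)
    threshold = threshold or 0.05
    if commits == 0 then
        return nil -- no commits yet, the ratio is undefined
    end
    local ratio = conflicts / commits
    return ratio, ratio > threshold
end

-- Inside Tarantool (assumed field names):
-- local tx = box.stat.vinyl().tx
-- print(vinyl_conflict_check(tx.conflict, tx.commit))
print(vinyl_conflict_check(80, 1000))
```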
The following metrics show the state of memory areas used by vinyl for caches and write buffers.
tnt_vinyl_memory_tuple_cache |
Amount of memory in bytes currently used to store tuples (data) |
tnt_vinyl_memory_level0 |
“Level 0” (L0) memory area, bytes.
L0 is the area that vinyl can use for in-memory storage of an LSM tree.
By monitoring this metric, you can see when L0 is getting close to its
maximum (tnt_vinyl_regulator_dump_watermark),
at which time a dump will occur.
You can expect L0 = 0 immediately after the dump operation is completed. |
tnt_vinyl_memory_page_index |
Amount of memory in bytes currently used to store indexes.
If the metric value is close to vinyl_memory,
this indicates that vinyl_page_size
was chosen incorrectly. |
tnt_vinyl_memory_bloom_filter |
Amount of memory in bytes used by
bloom filters. |
tnt_vinyl_memory_tuple |
Total size of memory in bytes occupied by Vinyl tuples.
It includes cached tuples and tuples pinned by the Lua world. |
The vinyl scheduler invokes the regulator and
updates the related variables. This happens once per second.
tnt_vinyl_scheduler_tasks |
Number of scheduler dump/compaction tasks.
The metric always has the label {status = <status_value>},
where <status_value> can be one of the following:
- inprogress for currently running tasks
- completed for successfully completed tasks
- failed for tasks aborted due to errors.
|
tnt_vinyl_scheduler_dump_time |
Total time in seconds spent by all worker threads performing dumps. |
tnt_vinyl_scheduler_dump_total |
Counter of dumps completed. |
Event loop tx thread information:
tnt_ev_loop_time |
Event loop time (ms) |
tnt_ev_loop_prolog_time |
Event loop prolog time (ms) |
tnt_ev_loop_epilog_time |
Event loop epilog time (ms) |
Shows the current state of synchronous replication.
tnt_synchro_queue_owner |
Instance ID of the current synchronous replication master. |
tnt_synchro_queue_term |
Current queue term. |
tnt_synchro_queue_len |
Number of transactions that are currently collecting confirmations. |
tnt_synchro_queue_busy |
Whether the queue is processing any system entry (CONFIRM/ROLLBACK/PROMOTE/DEMOTE). |
Shows the current state of a replica set node with regard to leader election.
tnt_election_state |
Election state (mode) of the node.
When election is enabled, the node is writable only in the leader state.
Possible values:
- 0 (follower): all the non-leader nodes are called followers
- 1 (candidate): the nodes that start a new election round are called candidates.
- 2 (leader): the node that collected a quorum of votes becomes the leader
|
tnt_election_vote |
ID of a node the current node votes for.
If the value is 0, it means the node hasn’t voted in the current term yet. |
tnt_election_leader |
Leader node ID in the current term.
If the value is 0, it means the node doesn’t know which node is the leader in the current term. |
tnt_election_term |
Current election term. |
tnt_election_leader_idle |
Time in seconds since the last interaction with the known leader. |
Memtx mvcc memory statistics.
Transaction manager consists of two parts:
- the transactions themselves (TXN section)
- MVCC
tnt_memtx_tnx_statements |
Transaction statements.
For example, the user starts a transaction and performs the action space:replace{0, 1} in it.
Under the hood, this operation turns into a statement for the current transaction.
This metric always has the label {kind="..."},
which has the following possible values:
total: the number of bytes that are allocated for the statements of all current transactions.
average: average bytes used by transactions for statements
(txn.statements.total bytes / number of open transactions).
max: the maximum number of bytes used for statements by a single current transaction.
|
tnt_memtx_tnx_user |
The Tarantool C API provides the box_txn_alloc() function.
By using this function, the user can allocate memory for the current transaction.
This metric always has the label {kind="..."},
which has the following possible values:
total: memory allocated by the box_txn_alloc() function on all current transactions.
average: transaction average (total allocated bytes / number of all current transactions).
max: the maximum number of bytes allocated by box_txn_alloc() function per transaction.
|
tnt_memtx_tnx_system |
Memory used by transaction internals, such as logs and savepoints.
This metric always has the label {kind="..."},
which has the following possible values:
total: memory allocated by internals on all current transactions.
average: average allocated memory by internals (total memory / number of all current transactions).
max: the maximum number of bytes allocated by internals per transaction.
|
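The total, average, and max values of the kind label used by the tnt_memtx_tnx_* metrics relate to each other as described in the rows above. The Go sketch below makes that arithmetic concrete, assuming you know the per-transaction byte counts (Tarantool computes these internally); the type and function names are hypothetical, and the integer division is a simplification.

```go
package main

import "fmt"

// kindStats holds the three values reported under the kind label.
type kindStats struct {
	Total   uint64
	Average uint64
	Max     uint64
}

// aggregate derives total, average, and max from per-transaction byte
// counts: average = total / number of open transactions.
func aggregate(perTxn []uint64) kindStats {
	var s kindStats
	for _, b := range perTxn {
		s.Total += b
		if b > s.Max {
			s.Max = b
		}
	}
	if len(perTxn) > 0 {
		s.Average = s.Total / uint64(len(perTxn))
	}
	return s
}

func main() {
	// Hypothetical byte counts for three open transactions.
	fmt.Printf("%+v\n", aggregate([]uint64{120, 300, 60}))
}
```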
MVCC is responsible for the isolation of transactions.
It detects conflicts and makes sure that tuples that are no longer in the space but are read
(or can be read) by some transaction are not deleted.
tnt_memtx_mvcc_trackers |
Trackers that keep track of transaction reads.
This metric always has the label {kind="..."},
which has the following possible values:
total: total memory (in bytes) allocated for the trackers of all current transactions.
average: average for all current transactions (total memory bytes / number of transactions).
max: maximum trackers allocated per transaction (in bytes).
|
tnt_memtx_mvcc_conflicts |
Memory allocated in case of transaction conflicts.
This metric always has the label {kind="..."},
which has the following possible values:
total: bytes allocated for conflicts in total.
average: average for all current transactions (total memory bytes / number of transactions).
max: maximum bytes allocated for conflicts per transaction.
|
Saved tuples are divided into three categories: used, read_view, and tracking.
Each category has two metrics:
- retained tuples: tuples that are no longer in the index, but MVCC does not allow them to be removed.
- stories: MVCC is based on the story mechanism, and almost every tuple has a story.
This is a separate metric because even the tuples that are in the index can have a story,
so stories and retained tuples need to be measured separately.
tnt_memtx_mvcc_tuples_used_stories |
Tuples that are used by active read-write transactions.
This metric always has the label {kind="..."},
which has the following possible values:
count: number of used tuples / number of stories.
total: amount of bytes used by stories of used tuples.
|
tnt_memtx_mvcc_tuples_used_retained |
Tuples that are used by active read-write transactions
and are no longer in the index, but MVCC does not allow them to be removed.
This metric always has the label {kind="..."},
which has the following possible values:
count: number of retained used tuples / number of stories.
total: amount of bytes used by retained used tuples.
|
tnt_memtx_mvcc_tuples_read_view_stories |
Tuples that are not used by active read-write transactions,
but are used by read-only transactions (i.e. in read view).
This metric always has the label {kind="..."},
which has the following possible values:
count: number of read_view tuples / number of stories.
total: amount of bytes used by stories of read_view tuples.
|
tnt_memtx_mvcc_tuples_read_view_retained |
Tuples that are not used by active read-write transactions,
but are used by read-only transactions (i.e. in read view).
These tuples are no longer in the index, but MVCC does not allow them to be removed.
This metric always has the label {kind="..."},
which has the following possible values:
count: number of retained read_view tuples / number of stories.
total: amount of bytes used by retained read_view tuples.
|
tnt_memtx_mvcc_tuples_tracking_stories |
Tuples that are not directly used by any transactions, but are used by MVCC to track reads.
This metric always has the label {kind="..."},
which has the following possible values:
count: number of tracking tuples / number of tracking stories.
total: amount of bytes used by stories of tracking tuples.
|
tnt_memtx_mvcc_tuples_tracking_retained |
Tuples that are not directly used by any transactions, but are used by MVCC to track reads.
These tuples are no longer in the index, but MVCC does not allow them to be removed.
This metric always has the label {kind="..."},
which has the following possible values:
count: number of retained tracking tuples / number of stories.
total: amount of bytes used by retained tracking tuples.
|
tnt_memtx_tuples_data_total |
Total amount of memory (in bytes) allocated for data tuples.
This includes tnt_memtx_tuples_data_read_view and
tnt_memtx_tuples_data_garbage metric values plus tuples that
are actually stored in memtx spaces. |
tnt_memtx_tuples_data_read_view |
Memory (in bytes) held for read views. |
tnt_memtx_tuples_data_garbage |
Memory (in bytes) that is unused and scheduled to be freed
(freed lazily on memory allocation). |
tnt_memtx_index_total |
Total amount of memory (in bytes) allocated for indexing data.
This includes tnt_memtx_index_read_view metric value
plus memory used for indexing tuples
that are actually stored in memtx spaces. |
tnt_memtx_index_read_view |
Memory (in bytes) held for read views. |
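Because tnt_memtx_tuples_data_total includes the read view and garbage values, and tnt_memtx_index_total includes the index read view value, the memory used by data actually stored in memtx spaces can be derived by subtraction. A Go sketch with hypothetical sample values (the helper name is illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// liveBytes subtracts the memory held for other purposes (read views,
// garbage) from a *_total metric, leaving the memory used by tuples or
// index data that are actually stored in memtx spaces.
func liveBytes(total uint64, held ...uint64) (uint64, error) {
	var sum uint64
	for _, h := range held {
		sum += h
	}
	// Metrics scraped at slightly different moments may be inconsistent.
	if sum > total {
		return 0, errors.New("inconsistent sample: held memory exceeds total")
	}
	return total - sum, nil
}

func main() {
	// Hypothetical values in bytes for tnt_memtx_tuples_data_total,
	// tnt_memtx_tuples_data_read_view, and tnt_memtx_tuples_data_garbage.
	data, err := liveBytes(1000000, 150000, 50000)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("tuple data stored in spaces:", data)

	// Hypothetical tnt_memtx_index_total and tnt_memtx_index_read_view.
	index, err := liveBytes(400000, 100000)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("index data for stored tuples:", index)
}
```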
Notes for operating systems
On macOS, no native system tools for administering Tarantool are supported.
The recommended way to administer Tarantool instances is using tt CLI.
The section below is about a dev-db/tarantool package installed on Gentoo Linux from the
official layman overlay (named tarantool).
The default instance directory is /etc/tarantool/instances.available; it can be
redefined in /etc/default/tarantool.
Tarantool instances can be managed (start/stop/reload/status/…) using OpenRC.
Consider the following example of how to create an OpenRC-managed instance:
$ cd /etc/init.d
$ ln -s tarantool your_service_name
$ ln -s /usr/share/tarantool/your_service_name.lua /etc/tarantool/instances.available/your_service_name.lua
Checking that it works:
$ /etc/init.d/your_service_name start
$ tail -f -n 100 /var/log/tarantool/your_service_name.log
Troubleshooting guide
Problem: INSERT/UPDATE requests result in ER_MEMORY_ISSUE error
Possible reasons
Lack of RAM (the arena_used_ratio and quota_used_ratio parameters in the
box.slab.info() report are getting close to 100%).
To check these parameters, say:
$ # attaching to a Tarantool instance
$ tt connect <instance_name|URI>
tarantool> box.slab.info()
Solution
Try either of the following measures:
In Tarantool’s instance file, increase the
value of box.cfg{memtx_memory}
(if memory resources are available).
In versions of Tarantool before 1.10, the server needs to be restarted
to change this parameter. The Tarantool
server will be unavailable while restarting from .xlog files, unless
you restart it using hot standby mode.
In the latter case, nearly 100% server availability is guaranteed.
Clean up the database.
Check the indicators of memory fragmentation:
In case of heavy memory fragmentation (quota_used_ratio is getting close
to 100%, items_used_ratio is about 50%), we recommend restarting Tarantool
in the hot standby mode.
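The two memory checks in this problem can be combined into a small Go sketch that classifies one box.slab.info() sample. It assumes the ratios are copied as the percentage strings the report shows (for example "98.5%"); the 95% and 60% thresholds are hypothetical stand-ins for "close to 100%" and "about 50%", and the function names are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical thresholds: the guide only says "close to 100%" and "about 50%".
const (
	nearFull      = 95.0
	fragmentedMax = 60.0
)

// parseRatio turns a percentage string such as "98.5%" into 98.5.
func parseRatio(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// diagnose classifies one box.slab.info() sample.
func diagnose(quotaUsed, arenaUsed, itemsUsed string) (string, error) {
	q, err := parseRatio(quotaUsed)
	if err != nil {
		return "", err
	}
	a, err := parseRatio(arenaUsed)
	if err != nil {
		return "", err
	}
	i, err := parseRatio(itemsUsed)
	if err != nil {
		return "", err
	}
	switch {
	case q >= nearFull && i <= fragmentedMax:
		// Quota almost exhausted while items occupy only about half the slabs.
		return "fragmentation", nil
	case q >= nearFull && a >= nearFull:
		// Both quota and arena almost exhausted.
		return "low memory", nil
	default:
		return "ok", nil
	}
}

func main() {
	status, err := diagnose("99.1%", "97.5%", "48.0%")
	if err != nil {
		fmt.Println("cannot parse sample:", err)
		return
	}
	fmt.Println(status)
}
```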
Problem: Query processing times out
Possible reasons
Note
All reasons that we discuss here can be identified by messages
in Tarantool’s log file, all starting with the words 'Too long...'.
Both fast and slow queries are processed within a single connection, so the
readahead buffer is cluttered with slow queries.
Solution
Try either of the following measures:
Increase the readahead buffer size
(box.cfg{readahead} parameter).
This parameter can be changed on the fly, so you don’t need to restart
Tarantool. Attach to the Tarantool instance with
tt utility and call box.cfg{} with a
new readahead value:
$ # attaching to a Tarantool instance
$ tt connect <instance_name|URI>
tarantool> box.cfg{readahead = 10 * 1024 * 1024}
Example: Given 1000 RPS, 1 Kbyte of query size, and 10 seconds of
maximal query processing time, the minimal readahead buffer size must be
10 Mbytes (1000 × 1 Kbyte × 10 s).
At the business logic level, process fast and slow queries over
different connections.
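The readahead example is a rule of thumb: requests per second × request size × maximal processing time. A tiny Go sketch of that calculation (the function name is hypothetical; 1 Kbyte is taken as 1024 bytes, which gives roughly 10 Mbytes):

```go
package main

import "fmt"

// minReadahead returns a lower bound for box.cfg.readahead in bytes:
// requests per second * request size * maximal processing time.
func minReadahead(rps, requestBytes, maxSeconds uint64) uint64 {
	return rps * requestBytes * maxSeconds
}

func main() {
	// 1000 RPS, 1 Kbyte requests, 10 s maximal processing time.
	n := minReadahead(1000, 1024, 10)
	fmt.Printf("%d bytes (about %.1f MiB)\n", n, float64(n)/(1024*1024))
}
```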
Slow disks.
Solution
Check disk performance (use the iostat, iotop, or strace utility to
check the iowait parameter) and try to put .xlog files and snapshot files on
different physical disks (that is, use different locations for wal_dir and memtx_dir).
Problem: Replication “lag” and “idle” contain negative values
This is about box.info.replication.(upstream.)lag and
box.info.replication.(upstream.)idle values in the
box.info.replication section.
Possible reasons
Operating system clock on the hosts is not synchronized, or the NTP server is
faulty.
Solution
Check NTP server settings.
If you find no problems with the NTP server, no further action is needed.
Lag calculation uses the operating system clocks of two different machines.
If the clocks get out of sync, the timestamps recorded by the remote master and by the
local instance are no longer comparable, and the computed values can become negative.
Problem: Replication statistics differ on replicas within a replica set
This is about a replica set that consists of one master and several replicas.
In a replica set of this type, values in
box.info.replication section, like
box.info.replication.lsn, come from the master and must be the same on all
replicas within the replica set. The problem is that they differ.
Possible reasons
Replication is broken.
Solution
Restart replication.
Problem: Master-master replication is stopped
This is about box.info.replication(.upstream).status = stopped.
Possible reasons
In a master-master replica set of two Tarantool instances, one of the masters
has tried to perform an action already performed by the other server,
for example, re-insert a tuple with the same unique key. This would cause an
error message like
'Duplicate key exists in unique index 'primary' in space <space_name>'.
Solution
This issue can be fixed in two ways:
Note
If one of the instances must be isolated during troubleshooting, it can be put to the isolated mode.
Then, restart replication as described in Restarting replication.
Connectors
Connectors are APIs that allow using Tarantool with various programming languages.
Connectors can be divided into two groups – those maintained by the Tarantool team
and those supported by the community.
The Tarantool team maintains the following connectors:
All other connectors are community-supported, which means that support for new Tarantool features may be delayed.
Find all the available connectors on the Connectors page.
Tarantool’s binary protocol was designed with a focus on asynchronous I/O and
easy integration with proxies. Each client request starts with a variable-length
binary header, containing request id, request type, instance id, log sequence
number, and so on.
The mandatory length, present in the request header, simplifies client or proxy I/O.
A response to a request is sent to the client as soon as it is ready. It always
carries in its header the same type and id as in the request. The id makes it
possible to match a request to a response, even if the latter arrived out of
order.
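Matching replies to requests by id can be sketched in Go as a table of pending requests keyed by id. This is an illustrative client-side pattern, not code taken from any Tarantool connector; the response type and its fields are hypothetical simplifications of a real reply.

```go
package main

import (
	"fmt"
	"sync"
)

// response is a simplified reply: the id copied from the request and a payload.
type response struct {
	ID      uint64
	Payload string
}

// pending tracks in-flight requests so that replies can be matched by id,
// even when they arrive out of order.
type pending struct {
	mu     sync.Mutex
	nextID uint64
	waits  map[uint64]chan response
}

func newPending() *pending {
	return &pending{waits: make(map[uint64]chan response)}
}

// register allocates a fresh request id and a channel for its reply.
func (p *pending) register() (uint64, <-chan response) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	ch := make(chan response, 1) // buffered so deliver never blocks
	p.waits[p.nextID] = ch
	return p.nextID, ch
}

// deliver routes a reply to the waiter with the same id.
// It returns false if no request with that id is pending.
func (p *pending) deliver(r response) bool {
	p.mu.Lock()
	ch, ok := p.waits[r.ID]
	delete(p.waits, r.ID)
	p.mu.Unlock()
	if ok {
		ch <- r
	}
	return ok
}

func main() {
	p := newPending()
	id1, ch1 := p.register()
	id2, ch2 := p.register()
	// Replies arrive in the reverse order of the requests.
	p.deliver(response{ID: id2, Payload: "second"})
	p.deliver(response{ID: id1, Payload: "first"})
	fmt.Println(id1, (<-ch1).Payload)
	fmt.Println(id2, (<-ch2).Payload)
}
```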
Unless implementing a client driver, you needn’t concern yourself with the
complications of the binary protocol. Language-specific drivers provide a
friendly way to store domain language data structures in Tarantool. A complete
description of the binary protocol is maintained in annotated Backus-Naur form
in the source tree. For detailed examples and diagrams of all binary-protocol
requests and responses, see
Tarantool’s binary protocol.
The Tarantool API exists so that a client program can send a request packet to
a server instance, and receive a response. Here is an example of what the client
would send for box.space[513]:insert{'A', 'BB'}. The BNF description of
the components is on the page about
Tarantool’s binary protocol.
| Component | Byte #0 | Byte #1 | Byte #2 | Byte #3 |
| code for insert | 02 | | | |
| rest of header | … | … | … | … |
| 16-bit unsigned number: space id (513) | cd | 02 | 01 | |
| code for tuple | 21 | | | |
| array header: field count = 2 | 92 | | | |
| 1-character string: field[1] | a1 | 41 | | |
| 2-character string: field[2] | a2 | 42 | 42 | |
Now, you could send that packet to the Tarantool instance, and interpret the
response (the page about
Tarantool’s binary protocol has a
description of the packet format for responses as well as requests). But it
would be easier, and less error-prone, if you could invoke a routine that
formats the packet according to typed parameters. Something like
response = tarantool_routine("insert", 513, "A", "BB");
That is why APIs exist for drivers for Perl, Python, PHP, and so on.
Setting up the server for connector examples
This chapter has examples that show how to connect to a Tarantool instance via
the Perl, PHP, Python, node.js, and C connectors. The examples contain hard-coded values,
so they work only if the following conditions are met:
- the Tarantool instance (tarantool) is running on localhost (127.0.0.1) and is listening on
port 3301 (box.cfg.listen = '3301'),
- space examples has id = 999 (box.space.examples.id = 999) and has
a primary-key index for a numeric field (box.space[999].index[0].parts[1].type = "unsigned"),
- user ‘guest’ has privileges for reading and writing.
It is easy to meet all the conditions by starting the instance and executing this
script:
box.cfg{listen=3301}
box.schema.space.create('examples',{id=999})
box.space.examples:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
box.schema.user.grant('guest','read,write','space','examples')
box.schema.user.grant('guest','read','space','_space')
Interpreting function return values
For all connectors, calling a function via Tarantool causes a return in the
MsgPack format. If the function is called using the connector’s API, some
conversions may occur. All scalar values are returned as tuples (with a MsgPack
type-identifier followed by a value); all non-scalar values are returned as a
group of tuples (with a MsgPack array-identifier followed by the scalar values).
If the function is called via the binary protocol command layer – “eval” –
rather than via the connector’s API, no conversions occur.
In the following example, a Lua function will be created. Since it will be
accessed externally by a ‘guest’ user, a
grant of an execute privilege will
be necessary. The function returns an empty array, a scalar string, two booleans,
and a short integer. The values are the ones described in the table
Common Types and MsgPack Encodings.
Here is a C program which calls the function. Although C is being used for the
example, the result would be precisely the same if the calling program was
written in Perl, PHP, Python, Go, or Java.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

int main(void) {
    struct tnt_stream *tnt = tnt_net(NULL);               /* SETUP */
    tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
    if (tnt_connect(tnt) < 0) {                           /* CONNECT */
        printf("Connection refused\n");
        exit(-1);
    }
    struct tnt_stream *arg = tnt_object(NULL);            /* MAKE REQUEST */
    tnt_object_add_array(arg, 0);
    struct tnt_request *req1 = tnt_request_call(NULL);    /* CALL function f() */
    tnt_request_set_funcz(req1, "f");
    tnt_request_compile(tnt, req1);
    tnt_flush(tnt);                                       /* SEND REQUEST */
    struct tnt_reply reply;                               /* GET REPLY */
    tnt_reply_init(&reply);
    tnt->read_reply(tnt, &reply);
    if (reply.code != 0) {
        printf("Call failed %" PRIu64 ".\n", reply.code);
        exit(-1);
    }
    /* PRINT REPLY: dump the raw MsgPack bytes in hex */
    const unsigned char *p = (const unsigned char *)reply.data;
    while (p < (const unsigned char *)reply.data_end) {
        printf("%x ", *p);
        ++p;
    }
    printf("\n");
    tnt_reply_free(&reply);
    tnt_close(tnt);                                       /* TEARDOWN */
    tnt_stream_free(arg);
    tnt_stream_free(tnt);
    return 0;
}
When this program is executed, it will print:
dd 0 0 0 5 90 91 a1 61 91 c2 91 c3 91 7f
The first five bytes – dd 0 0 0 5 – are the MsgPack encoding for
“32-bit array header with value 5” (see
MsgPack specification).
The rest are as described in the
table Common Types and MsgPack Encodings.
Go
Examples on GitHub: sample_db, go
go-tarantool is the official Go connector for Tarantool.
It is not supplied as part of the Tarantool repository and should be installed separately.
This tutorial shows how to use the go-tarantool 2.x library to create a Go application that connects to a remote Tarantool instance, performs CRUD operations, and executes a stored procedure.
You can find the full package documentation here: Client in Go for Tarantool.
Note
This tutorial shows how to make CRUD requests to a single-instance Tarantool database.
To make requests to a sharded Tarantool cluster with the CRUD module, use the crud package’s API.
Sample database configuration
This section describes the configuration of a sample database that allows remote connections:
credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ read, write ]
        spaces: [ bands ]
      - permissions: [ execute ]
        functions: [ get_bands_older_than ]
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
app:
  file: 'myapp.lua'
- The configuration contains one instance that listens for incoming requests on the
127.0.0.1:3301 address.
- sampleuser has privileges to select and modify data in the bands space and execute the get_bands_older_than stored function. This user can be used to connect to the instance remotely.
- myapp.lua defines the data model and a stored function.
The myapp.lua file looks as follows:
-- Create a space --
box.schema.space.create('bands')
-- Specify field names and types --
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
-- Create indexes --
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:create_index('band', { parts = { 'band_name' } })
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })
-- Create a stored function --
box.schema.func.create('get_bands_older_than', {
body = [[
function(year)
return box.space.bands.index.year_band:select({ year }, { iterator = 'LT', limit = 10 })
end
]]
})
You can find the full example on GitHub: sample_db.
Starting a sample database application
Before creating and starting a client Go application, you need to run the sample_db application using tt start:
$ tt start sample_db
Now you can create a client Go application that makes requests to this database.
Developing a client application
Before you start, make sure you have Go installed on your computer.
Create the hello directory for your application and go to this directory:
$ mkdir hello
$ cd hello
Initialize a new Go module:
$ go mod init example/hello
Inside the hello directory, create the hello.go file for application code.
Connecting to the database
Declare the main() function:
Inside the main() function, add the following code:
// Connect to the database
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
dialer := tarantool.NetDialer{
Address: "127.0.0.1:3301",
User: "sampleuser",
Password: "123456",
}
opts := tarantool.Opts{
Timeout: time.Second,
}
conn, err := tarantool.Connect(ctx, dialer, opts)
if err != nil {
fmt.Println("Connection refused:", err)
return
}
// Interact with the database
// ...
This code establishes a connection to a running Tarantool instance on behalf of sampleuser.
The conn object can be used to make CRUD requests and execute stored procedures.
Add the following code to insert four tuples into the bands space:
// Insert data
tuples := [][]interface{}{
{1, "Roxette", 1986},
{2, "Scorpions", 1965},
{3, "Ace of Base", 1987},
{4, "The Beatles", 1960},
}
var futures []*tarantool.Future
for _, tuple := range tuples {
request := tarantool.NewInsertRequest("bands").Tuple(tuple)
futures = append(futures, conn.Do(request))
}
fmt.Println("Inserted tuples:")
for _, future := range futures {
result, err := future.Get()
if err != nil {
fmt.Println("Got an error:", err)
} else {
fmt.Println(result)
}
}
This code makes insert requests asynchronously:
- The Future structure is used as a handle for asynchronous requests.
- The NewInsertRequest() method creates an insert request object that is executed by the connection.
Note
Making requests asynchronously is the recommended way to perform data operations.
Further requests in this tutorial are made synchronously.
To get a tuple by the specified primary key value, use NewSelectRequest() to create a select request object:
// Select by primary key
data, err := conn.Do(
tarantool.NewSelectRequest("bands").
Limit(10).
Iterator(tarantool.IterEq).
Key([]interface{}{uint(1)}),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Tuple selected by the primary key value:", data)
You can also get a tuple by the value of the specified index by using Index():
// Select by secondary key
data, err = conn.Do(
tarantool.NewSelectRequest("bands").
Index("band").
Limit(10).
Iterator(tarantool.IterEq).
Key([]interface{}{"The Beatles"}),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Tuple selected by the secondary key value:", data)
NewUpdateRequest() can be used to update a tuple identified by the primary key as follows:
// Update
data, err = conn.Do(
tarantool.NewUpdateRequest("bands").
Key(tarantool.IntKey{2}).
Operations(tarantool.NewOperations().Assign(1, "Pink Floyd")),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Updated tuple:", data)
NewUpsertRequest() can be used to update an existing tuple or insert a new one.
In the example below, a new tuple is inserted:
// Upsert
data, err = conn.Do(
tarantool.NewUpsertRequest("bands").
Tuple([]interface{}{uint(5), "The Rolling Stones", 1962}).
Operations(tarantool.NewOperations().Assign(1, "The Doors")),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
In this example, NewReplaceRequest() is used to delete the existing tuple and insert a new one:
// Replace
data, err = conn.Do(
tarantool.NewReplaceRequest("bands").
Tuple([]interface{}{1, "Queen", 1970}),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Replaced tuple:", data)
NewDeleteRequest() in the example below is used to delete a tuple whose primary key value is 5:
// Delete
data, err = conn.Do(
tarantool.NewDeleteRequest("bands").
Key([]interface{}{uint(5)}),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Deleted tuple:", data)
Executing stored procedures
To execute a stored procedure, use NewCallRequest():
// Call
data, err = conn.Do(
tarantool.NewCallRequest("get_bands_older_than").Args([]interface{}{1966}),
).Get()
if err != nil {
fmt.Println("Got an error:", err)
}
fmt.Println("Stored procedure result:", data)
The CloseGraceful() method can be used to close the connection when it is no longer needed:
// Close connection
conn.CloseGraceful()
fmt.Println("Connection is closed")
Note
You can find the example with all the requests above on GitHub: go.
Starting a client application
Execute the following go get commands to update dependencies in the go.mod file:
$ go get github.com/tarantool/go-tarantool/v2
$ go get github.com/tarantool/go-tarantool/v2/decimal
$ go get github.com/tarantool/go-tarantool/v2/uuid
To run the resulting application, execute the go run command in the application directory:
$ go run .
Inserted tuples:
[[1 Roxette 1986]]
[[2 Scorpions 1965]]
[[3 Ace of Base 1987]]
[[4 The Beatles 1960]]
Tuple selected by the primary key value: [[1 Roxette 1986]]
Tuple selected by the secondary key value: [[4 The Beatles 1960]]
Updated tuple: [[2 Pink Floyd 1965]]
Replaced tuple: [[1 Queen 1970]]
Deleted tuple: [[5 The Rolling Stones 1962]]
Stored procedure result: [[[2 Pink Floyd 1965] [4 The Beatles 1960]]]
Connection is closed
Java
The following Java connectors are available:
Note
The connectors below are either deprecated or are planned for deprecation.
- cartridge-java
supports both single Tarantool nodes and clusters,
as well as applications built using the
Cartridge framework and its modules.
The Tarantool team actively updates this module with the newest Tarantool features.
- tarantool-java
works with early Tarantool versions (1.6 and later)
and offers JDBC interface support for single Tarantool nodes.
This module isn’t currently maintained and
does not support the newest 2.x Tarantool features or Tarantool clusters.
C
tarantool-c is the official C connector for Tarantool.
You can find the full library documentation here: Documentation for tarantool-c.
Here follow two examples of using Tarantool’s high-level C API.
Here is a complete C program that inserts [99999,'B'] into
space examples via the high-level C API.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

int main(void) {
    struct tnt_stream *tnt = tnt_net(NULL);               /* See note = SETUP */
    tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
    if (tnt_connect(tnt) < 0) {                           /* See note = CONNECT */
        printf("Connection refused\n");
        exit(-1);
    }
    struct tnt_stream *tuple = tnt_object(NULL);          /* See note = MAKE REQUEST */
    tnt_object_format(tuple, "[%d%s]", 99999, "B");
    tnt_insert(tnt, 999, tuple);                          /* See note = SEND REQUEST */
    tnt_flush(tnt);
    struct tnt_reply reply;                               /* See note = GET REPLY */
    tnt_reply_init(&reply);
    tnt->read_reply(tnt, &reply);
    if (reply.code != 0) {
        printf("Insert failed %" PRIu64 ".\n", reply.code);
    }
    tnt_reply_free(&reply);
    tnt_close(tnt);                                       /* See note = TEARDOWN */
    tnt_stream_free(tuple);
    tnt_stream_free(tnt);
    return 0;
}
Paste the code into a file named example.c and install tarantool-c.
One way to install tarantool-c (using Ubuntu) is:
$ git clone https://github.com/tarantool/tarantool-c.git ~/tarantool-c
$ cd ~/tarantool-c
$ git submodule init
$ git submodule update
$ cmake .
$ make
$ make install
To compile and link the program, run:
$ # sometimes this is necessary:
$ export LD_LIBRARY_PATH=/usr/local/lib
$ gcc -o example example.c -ltarantool
Before trying to run,
check that a server instance is listening at localhost:3301 and that the space
examples exists, as
described earlier.
To run the program, say ./example. The program will connect
to the Tarantool instance, and will send the request.
If Tarantool is not running on localhost and listening on port 3301, the program
will print “Connection refused”.
If the insert fails, the program will print “Insert failed” and an error number
(see all error codes in the source file
/src/box/errcode.h).
Here are notes corresponding to comments in the example program.
The setup begins by creating a stream.
struct tnt_stream *tnt = tnt_net(NULL);
tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
In this program, the stream will be named tnt.
Before connecting on the tnt stream, some options may have to be set.
The most important option is TNT_OPT_URI.
In this program, the URI is localhost:3301, since that is where the
Tarantool instance is supposed to be listening.
Function description:
struct tnt_stream *tnt_net(struct tnt_stream *s)
int tnt_set(struct tnt_stream *s, int option, variant option-value)
Now that the stream named tnt exists and is associated with a
URI, this example program can connect to a server instance.
if (tnt_connect(tnt) < 0)
{ printf("Connection refused\n"); exit(-1); }
Function description:
int tnt_connect(struct tnt_stream *s)
The connection might fail for a variety of reasons, such as:
the server is not running, or the URI contains an invalid password.
If the connection fails, the return value will be -1.
Most requests require passing a structured value, such as
the contents of a tuple.
struct tnt_stream *tuple = tnt_object(NULL);
tnt_object_format(tuple, "[%d%s]", 99999, "B");
In this program, the request will
be an INSERT, and the tuple contents will be an integer
and a string. This is a simple serial set of values, that
is, there are no sub-structures or arrays. Therefore it
is easy in this case to format what will be passed using
the same sort of arguments that one would use with a C
printf() function: %d for the integer, %s for the string,
then the integer value, then a pointer to the string value.
Function description:
ssize_t tnt_object_format(struct tnt_stream *s, const char *fmt, ...)
The database-manipulation requests are analogous to the
requests in the box library.
tnt_insert(tnt, 999, tuple);
tnt_flush(tnt);
In this program, the choice is to do an INSERT request, so
the program passes the tnt_stream that was used for connection
(tnt) and the tnt_stream that was set up with
tnt_object_format() (tuple).
Function description:
ssize_t tnt_insert(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_replace(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_select(struct tnt_stream *s, uint32_t space, uint32_t index,
uint32_t limit, uint32_t offset, uint8_t iterator,
struct tnt_stream *key)
ssize_t tnt_update(struct tnt_stream *s, uint32_t space, uint32_t index,
struct tnt_stream *key, struct tnt_stream *ops)
For most requests, the client will receive a reply containing some
indication whether the result was successful, and a set of tuples.
struct tnt_reply reply; tnt_reply_init(&reply);
tnt->read_reply(tnt, &reply);
if (reply.code != 0)
{ printf("Insert failed %" PRIu64 ".\n", reply.code); }
This program checks for success but does not decode the rest of the reply.
Function description:
struct tnt_reply *tnt_reply_init(struct tnt_reply *r)
tnt->read_reply(struct tnt_stream *s, struct tnt_reply *r)
void tnt_reply_free(struct tnt_reply *r)
When a session ends, the connection that was made with
tnt_connect() should be closed, and the objects that were
made in the setup should be destroyed.
tnt_close(tnt);
tnt_stream_free(tuple);
tnt_stream_free(tnt);
Function description:
void tnt_close(struct tnt_stream *s)
void tnt_stream_free(struct tnt_stream *s)
Here is a complete C program that selects, using index key [99999], from
space examples via the high-level C API.
To display the results, the program uses functions in the
MsgPuck library which allow decoding of
MessagePack arrays.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>
#define MP_SOURCE 1
#include <msgpuck.h>

int main(void) {
    struct tnt_stream *tnt = tnt_net(NULL);
    tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
    if (tnt_connect(tnt) < 0) {
        printf("Connection refused\n");
        exit(1);
    }
    struct tnt_stream *tuple = tnt_object(NULL);
    tnt_object_format(tuple, "[%d]", 99999); /* tuple = search key */
    tnt_select(tnt, 999, 0, UINT32_MAX, 0, 0, tuple);
    tnt_flush(tnt);
    struct tnt_reply reply;
    tnt_reply_init(&reply);
    tnt->read_reply(tnt, &reply);
    if (reply.code != 0) {
        printf("Select failed.\n");
        exit(1);
    }
    /* Decode with a separate cursor so that reply.data stays intact. */
    const char *data = reply.data;
    if (mp_typeof(*data) != MP_ARRAY) {
        printf("no tuple array\n");
        exit(1);
    }
    uint32_t tuple_count = mp_decode_array(&data);
    printf("tuple count=%" PRIu32 "\n", tuple_count);
    for (uint32_t i = 0; i < tuple_count; ++i) {
        if (mp_typeof(*data) != MP_ARRAY) {
            printf("no field array\n");
            exit(1);
        }
        uint32_t field_count = mp_decode_array(&data);
        printf(" field count=%" PRIu32 "\n", field_count);
        for (uint32_t j = 0; j < field_count; ++j) {
            enum mp_type field_type = mp_typeof(*data);
            if (field_type == MP_UINT) {
                uint64_t num_value = mp_decode_uint(&data);
                printf(" value=%" PRIu64 ".\n", num_value);
            } else if (field_type == MP_STR) {
                uint32_t str_value_length;
                const char *str_value = mp_decode_str(&data, &str_value_length);
                printf(" value=%.*s.\n", (int)str_value_length, str_value);
            } else {
                printf("wrong field type\n");
                exit(1);
            }
        }
    }
    tnt_reply_free(&reply);
    tnt_close(tnt);
    tnt_stream_free(tuple);
    tnt_stream_free(tnt);
    return 0;
}
Similarly to the first example, paste the code into a file named
example2.c.
To compile and link the program, say:
$ gcc -o example2 example2.c -ltarantool
To run the program, say ./example2.
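The MsgPuck calls in the program above walk the reply data, which is a MessagePack array of tuples, each tuple being an array of fields. As an illustration of what mp_typeof(), mp_decode_array(), mp_decode_uint(), and mp_decode_str() do, here is a minimal pure-Python sketch of the same loop. It is not part of tarantool-c or MsgPuck: it hand-decodes only the types the C program handles (fixarray and array 16, positive fixint and uint 8/16/32/64, fixstr, str 8, and str 16), and decode_tuples is a hypothetical name.

```python
import struct


def _read(buf, pos, n):
    """Return n bytes starting at pos and the position after them."""
    if pos + n > len(buf):
        raise ValueError("truncated MessagePack data")
    return buf[pos:pos + n], pos + n


def _decode_array_header(buf, pos):
    """Like mp_decode_array(): return the element count and the new position."""
    tag = buf[pos]
    if 0x90 <= tag <= 0x9F:            # fixarray: count in the low 4 bits
        return tag & 0x0F, pos + 1
    if tag == 0xDC:                    # array 16: big-endian 16-bit count
        raw, pos = _read(buf, pos + 1, 2)
        return struct.unpack(">H", raw)[0], pos
    raise ValueError("no array at offset %d" % pos)


def _decode_field(buf, pos):
    """Decode one uint or str field, like the MP_UINT / MP_STR branches."""
    tag = buf[pos]
    if tag <= 0x7F:                    # positive fixint: the value is the tag
        return tag, pos + 1
    uint_formats = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q"}
    if tag in uint_formats:
        fmt = uint_formats[tag]
        raw, pos = _read(buf, pos + 1, struct.calcsize(fmt))
        return struct.unpack(fmt, raw)[0], pos
    if 0xA0 <= tag <= 0xBF:            # fixstr: length in the low 5 bits
        length, pos = tag & 0x1F, pos + 1
    elif tag == 0xD9:                  # str 8: one length byte
        length, pos = buf[pos + 1], pos + 2
    elif tag == 0xDA:                  # str 16: two length bytes
        raw, pos = _read(buf, pos + 1, 2)
        length = struct.unpack(">H", raw)[0]
    else:
        raise ValueError("wrong field type 0x%02x" % tag)
    raw, pos = _read(buf, pos, length)
    return raw.decode("utf-8"), pos


def decode_tuples(buf):
    """Decode an array of tuples whose fields are unsigned integers or strings."""
    tuple_count, pos = _decode_array_header(buf, 0)
    tuples = []
    for _ in range(tuple_count):
        field_count, pos = _decode_array_header(buf, pos)
        fields = []
        for _ in range(field_count):
            value, pos = _decode_field(buf, pos)
            fields.append(value)
        tuples.append(fields)
    return tuples


# One tuple [99999, 'BB']: fixarray(1), fixarray(2), uint 32 99999, fixstr "BB".
reply_body = bytes([0x91, 0x92, 0xCE]) + struct.pack(">I", 99999) + b"\xa2BB"
print(decode_tuples(reply_body))  # prints [[99999, 'BB']]
```

Like the C program, the sketch stops with an error on any field type other than an unsigned integer or a string.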
The two example programs only show a few requests and do not show all that’s
necessary for good practice. See more in the
tarantool-c documentation at GitHub.
Python
Examples on GitHub: sample_db, python
tarantool-python
is the official Python connector for Tarantool. It is not supplied as part
of the Tarantool repository and must be installed separately.
The tutorial shows how to use the tarantool-python library to create a Python script that connects to a remote Tarantool instance, performs CRUD operations, and executes a stored procedure.
You can find the full package documentation here: Python client library for Tarantool.
Note
This tutorial shows how to make CRUD requests to a single-instance Tarantool database.
To make requests to a sharded Tarantool cluster with the CRUD module, use the tarantool.crud module’s API.
Sample database configuration
This section describes the configuration of a sample database that allows remote connections:
credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      - permissions: [ read, write ]
        spaces: [ bands ]
      - permissions: [ execute ]
        functions: [ get_bands_older_than ]
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
app:
  file: 'myapp.lua'
- The configuration contains one instance that listens for incoming requests on the 127.0.0.1:3301 address.
- sampleuser has privileges to select and modify data in the bands space and execute the get_bands_older_than stored function. This user can be used to connect to the instance remotely.
- myapp.lua defines the data model and a stored function.
The myapp.lua file looks as follows:
-- Create a space --
box.schema.space.create('bands')
-- Specify field names and types --
box.space.bands:format({
{ name = 'id', type = 'unsigned' },
{ name = 'band_name', type = 'string' },
{ name = 'year', type = 'unsigned' }
})
-- Create indexes --
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:create_index('band', { parts = { 'band_name' } })
box.space.bands:create_index('year_band', { parts = { { 'year' }, { 'band_name' } } })
-- Create a stored function --
box.schema.func.create('get_bands_older_than', {
body = [[
function(year)
return box.space.bands.index.year_band:select({ year }, { iterator = 'LT', limit = 10 })
end
]]
})
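The stored function relies on two properties of the year_band index, which is a TREE index by default. First, a partial key such as { 1966 } is compared only against the first key part (year). Second, the LT iterator returns matching tuples in descending key order, and limit caps how many are returned. The pure-Python sketch below models these semantics on the data that the bands space holds when the Python client later calls get_bands_older_than(1966); select_lt is a hypothetical helper for illustration, not how Tarantool implements iterators.

```python
def select_lt(tuples, key_prefix, key_fields, limit=10):
    """Model of index:select(key_prefix, {iterator = 'LT', limit = limit})
    on a TREE index whose parts are the tuple positions in key_fields."""
    def index_key(t):
        return tuple(t[i] for i in key_fields)

    n = len(key_prefix)
    # Only the first n key parts take part in the comparison (partial key).
    matching = [t for t in tuples if index_key(t)[:n] < tuple(key_prefix)]
    # LT walks the index backwards: the largest keys come first.
    matching.sort(key=index_key, reverse=True)
    return matching[:limit]


# Contents of the bands space when get_bands_older_than is called
# in the Python client below: (id, band_name, year).
bands = [
    (1, 'Queen', 1970),
    (2, 'Pink Floyd', 1965),
    (3, 'Ace of Base', 1987),
    (4, 'The Beatles', 1960),
]
YEAR, BAND_NAME = 2, 1  # positions of the year_band key parts
print(select_lt(bands, (1966,), key_fields=(YEAR, BAND_NAME)))
```

It prints [(2, 'Pink Floyd', 1965), (4, 'The Beatles', 1960)], the same tuples in the same order as the stored procedure result shown at the end of the Python tutorial.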
You can find the full example on GitHub: sample_db.
Starting a sample database application
Before creating and starting a client Python application, you need to run the sample_db application using tt start:
$ tt start sample_db
Now you can create a client Python application that makes requests to this database.
Developing a client application
Before you start, make sure you have Python installed on your computer.
Create the hello directory for your application and go to this directory:
$ mkdir hello
$ cd hello
Create and activate a Python virtual environment:
$ python -m venv .venv
$ source .venv/bin/activate
Install the tarantool module:
$ pip install tarantool
Inside the hello directory, create the hello.py file for application code.
Connecting to the database
Add the following code:
import tarantool
# Connect to the database
conn = tarantool.Connection(host='127.0.0.1',
                            port=3301,
                            user='sampleuser',
                            password='123456')
This code establishes a connection to a running Tarantool instance on behalf of sampleuser.
The conn object can be used to make CRUD requests and execute stored procedures.
Add the following code to insert four tuples into the bands space:
# Insert data
tuples = [(1, 'Roxette', 1986),
(2, 'Scorpions', 1965),
(3, 'Ace of Base', 1987),
(4, 'The Beatles', 1960)]
print("Inserted tuples:")
for tuple in tuples:
    response = conn.insert(space_name='bands', values=tuple)
    print(response[0])
Connection.insert() is used to insert a tuple into the space.
To get a tuple by the specified primary key value, use Connection.select():
# Select by primary key
response = conn.select(space_name='bands', key=1)
print('Tuple selected by the primary key value:', response[0])
You can also get a tuple by the value of the specified index using the index argument:
# Select by secondary key
response = conn.select(space_name='bands', key='The Beatles', index='band')
print('Tuple selected by the secondary key value:', response[0])
Connection.update() can be used to update a tuple identified by the primary key as follows:
# Update
response = conn.update(space_name='bands',
key=2,
op_list=[('=', 'band_name', 'Pink Floyd')])
print('Updated tuple:', response[0])
Connection.upsert() updates an existing tuple or inserts a new one.
In the example below, no tuple with the primary key value 5 exists yet, so the new tuple is inserted and op_list is not applied:
# Upsert
conn.upsert(space_name='bands',
tuple_value=(5, 'The Rolling Stones', 1962),
op_list=[('=', 'band_name', 'The Doors')])
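The difference between insert(), replace(), and upsert() is easiest to see on a toy model. The sketch below keeps a space as a Python dict keyed by the primary key (field 0) and supports only the '=' update operation addressed by field name. It mirrors the semantics described in this tutorial and is not the connector's or Tarantool's implementation; ToySpace is a hypothetical name.

```python
class ToySpace:
    """Toy model of a space whose primary key is the first field."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self.rows = {}

    def insert(self, values):
        # insert() fails if a tuple with the same primary key exists.
        if values[0] in self.rows:
            raise KeyError("Duplicate key exists: %r" % (values[0],))
        self.rows[values[0]] = list(values)
        return self.rows[values[0]]

    def replace(self, values):
        # replace() overwrites the old tuple, if any.
        self.rows[values[0]] = list(values)
        return self.rows[values[0]]

    def upsert(self, tuple_value, op_list):
        key = tuple_value[0]
        if key not in self.rows:
            # No tuple with this key: insert tuple_value, ignore op_list.
            self.rows[key] = list(tuple_value)
            return
        # A tuple exists: apply op_list to it and ignore tuple_value.
        row = self.rows[key]
        for op, field, value in op_list:
            if op != '=':
                raise NotImplementedError("only '=' is modelled")
            row[self.field_names.index(field)] = value


bands = ToySpace(['id', 'band_name', 'year'])
bands.upsert((5, 'The Rolling Stones', 1962), [('=', 'band_name', 'The Doors')])
print(bands.rows[5])  # first call: the tuple is inserted as is
bands.upsert((5, 'The Rolling Stones', 1962), [('=', 'band_name', 'The Doors')])
print(bands.rows[5])  # second call: the existing tuple is updated
```

Running it prints [5, 'The Rolling Stones', 1962] and then [5, 'The Doors', 1962], which is why the tutorial's single upsert call stores The Rolling Stones rather than The Doors.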
In this example, Connection.replace() deletes the existing tuple and inserts a new one:
# Replace
response = conn.replace(space_name='bands', values=(1, 'Queen', 1970))
print('Replaced tuple:', response[0])
Connection.delete() in the example below deletes a tuple whose primary key value is 5:
# Delete
response = conn.delete(space_name='bands', key=5)
print('Deleted tuple:', response[0])
Executing stored procedures
To execute a stored procedure, use Connection.call():
# Call
response = conn.call('get_bands_older_than', 1966)
print('Stored procedure result:', response[0])
The Connection.close() method can be used to close the connection when it is no longer needed:
# Close connection
conn.close()
print('Connection is closed')
Note
You can find the example with all the requests above on GitHub: python.
Starting a client application
To run the resulting application, pass the script name to the python command:
$ python hello.py
Inserted tuples:
[1, 'Roxette', 1986]
[2, 'Scorpions', 1965]
[3, 'Ace of Base', 1987]
[4, 'The Beatles', 1960]
Tuple selected by the primary key value: [1, 'Roxette', 1986]
Tuple selected by the secondary key value: [4, 'The Beatles', 1960]
Updated tuple: [2, 'Pink Floyd', 1965]
Replaced tuple: [1, 'Queen', 1970]
Deleted tuple: [5, 'The Rolling Stones', 1962]
Stored procedure result: [[2, 'Pink Floyd', 1965], [4, 'The Beatles', 1960]]
Connection is closed
C++
tntcxx is the official C++ connector for Tarantool.
C++ connector API
The official C++ connector for Tarantool is located in the
tarantool/tntcxx repository.
It is not supplied as part of the Tarantool repository and must be added
to your project separately.
The connector itself is a header-only library and, as such, doesn’t require
installation or building. All you need is to clone the connector
source code and embed it in your C++ project. See the C++ connector Getting started
document for details and examples.
Below is the description of the connector public API.
-
template<class BUFFER, class NetProvider = EpollNetProvider<BUFFER>>
class Connector
The Connector class is a template class that defines a connector client
which can handle many connections to Tarantool instances asynchronously.
To instantiate a client, you should specify the buffer and the network provider
implementations as template parameters. You can either implement your own buffer
or network provider or use the default ones.
The default connector instantiation looks as follows:
using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;
Connector<Buf_t, Net_t> client;
-
int connect(Connection<BUFFER, NetProvider> &conn, const std::string_view &addr, unsigned port, size_t timeout = DEFAULT_CONNECT_TIMEOUT)
Connects to a Tarantool instance that is listening on addr:port.
On successful connection, the method returns 0. If the host
doesn’t reply within the timeout period or another error occurs,
it returns -1. Then, Connection.getError()
gives the error message.
| Parameters: |
- conn – object of the Connection class.
- addr – address of the host where a Tarantool instance is running.
- port – port that a Tarantool instance is listening on.
- timeout – connection timeout, seconds. Optional. Defaults to 2.
| Returns: | 0 on success, or -1 otherwise |
| Rtype: | int |
Possible errors:
- connection timeout
- refused to connect (due to an incorrect address and/or port)
- system errors: a socket can’t be created; failure of any of the system
calls (fcntl, select, send, receive).
Example:
using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;
Connector<Buf_t, Net_t> client;
Connection<Buf_t, Net_t> conn(client);
int rc = client.connect(conn, "127.0.0.1", 3301);
-
int wait(Connection<BUFFER, NetProvider> &conn, rid_t future, int timeout = 0)
The main method responsible for sending a request and checking whether
the response is ready.
You should prepare a request beforehand by using the necessary
method of the Connection class, such as ping(). Such a method encodes
the request in the MessagePack format and saves it in
the output connection buffer.
wait() sends the request and polls the future until the response
is ready. Once the response is ready, wait() returns 0.
If the response isn’t ready when the timeout expires, or another error occurs,
it returns -1. Then, Connection.getError()
gives the error message.
timeout = 0 means the method polls the future until the response
is ready, with no time limit.
| Parameters: |
- conn – object of the Connection class.
- future – request ID returned by a request method of the Connection class, such as ping().
- timeout – waiting timeout, milliseconds. Optional. Defaults to 0.
| Returns: | 0 on receiving a response, or -1 otherwise |
| Rtype: | int |
Possible errors:
- timeout exceeded
- other possible errors depend on the network provider used.
If EpollNetProvider is used, a failure of the poll,
read, or write system call leads to a system error
such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE,
or ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur
in this case).
Example:
client.wait(conn, ping, WAIT_TIMEOUT);
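The timeout rules of wait() (milliseconds, with 0 meaning no limit) can be illustrated with a small polling loop. The Python sketch below is a model only: is_ready stands in for checking the future, and wait_for is a hypothetical helper that returns 0 or -1 like wait(). The real connector sends the buffered requests and polls sockets through its network provider instead of sleeping.

```python
import time


def wait_for(is_ready, timeout_ms=0, poll_interval_s=0.001):
    """Return 0 once is_ready() is true, or -1 if timeout_ms elapses first.

    timeout_ms == 0 means "no time limit", matching the documented default.
    """
    deadline = None if timeout_ms == 0 else time.monotonic() + timeout_ms / 1000.0
    while not is_ready():
        if deadline is not None and time.monotonic() >= deadline:
            return -1
        time.sleep(poll_interval_s)
    return 0


# A fake "future" whose response becomes ready on the third check.
checks = {"count": 0}


def fake_future_ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_for(fake_future_ready))             # 0: waits as long as needed
print(wait_for(lambda: False, timeout_ms=20))  # -1: never ready within 20 ms
```

Because the default is 0, a caller that forgets to pass a timeout may block indefinitely if the server never answers; passing an explicit value such as WAIT_TIMEOUT in the example above avoids that.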
-
void waitAll(Connection<BUFFER, NetProvider> &conn, rid_t *futures, size_t future_count, int timeout = 0)
Similar to wait(), the method sends the prepared requests and checks
whether the responses are ready, but it handles several requests whose
IDs are stored in the futures array.
Exceeding the timeout leads to an error; Connection.getError()
gives the error message.
timeout = 0 means the method polls the futures
until all the responses are ready.
| Parameters: |
- conn – object of the Connection class.
- futures – array with the request IDs returned by request methods of the Connection class, such as ping().
- future_count – size of the futures array.
- timeout – waiting timeout, milliseconds. Optional. Defaults to 0.
| Returns: | none |
| Rtype: | none |
Possible errors:
- timeout exceeded
- other possible errors depend on the network provider used.
If EpollNetProvider is used, a failure of the poll,
read, or write system call leads to a system error
such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE,
or ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur
in this case).
Example:
rid_t futures[2];
futures[0] = replace;
futures[1] = select;
client.waitAll(conn, futures, 2);
-
Connection<BUFFER, NetProvider> *waitAny(int timeout = 0)
Sends all requests prepared so far and waits for the first response
to become ready. Once a response is ready, waitAny()
returns the corresponding connection object.
If no response is ready when the timeout expires, or another error occurs,
it returns nullptr. Then, Connection.getError()
gives the error message.
timeout = 0 means there is no time limit while waiting for a response.
| Parameters: | timeout – waiting timeout, milliseconds. Optional. Defaults to 0. |
| Returns: | object of the Connection class
on success, or nullptr on error. |
| Rtype: | Connection<BUFFER, NetProvider>* |
Possible errors:
- timeout exceeded
- other possible errors depend on the network provider used.
If EpollNetProvider is used, a failure of the poll,
read, or write system call leads to a system error
such as EBADF, ENOTSOCK, EFAULT, EINVAL, EPIPE,
or ENOTCONN (EWOULDBLOCK and EAGAIN don’t occur
in this case).
Example:
rid_t f1 = conn.ping();
rid_t f2 = another_conn.ping();
Connection<Buf_t, Net_t> *first = client.waitAny(WAIT_TIMEOUT);
if (first == &conn) {
assert(conn.futureIsReady(f1));
} else {
assert(another_conn.futureIsReady(f2));
}
-
void close(Connection<BUFFER, NetProvider> &conn)
Closes the connection established earlier by
the connect() method.
| Parameters: | conn – connection object of the Connection
class. |
| Returns: | none |
| Rtype: | none |
Possible errors: none.
Example:
client.close(conn);
-
template<class BUFFER, class NetProvider>
class Connection
The Connection class is a template class that defines connection objects,
which are required to interact with a Tarantool instance. Each connection object
is bound to a single socket.
Similar to a connector client, a connection
object also takes the buffer and the network provider as template
parameters, and they must be the same as those of the client. For example:
//Instantiating a connector client
using Buf_t = tnt::Buffer<16 * 1024>;
using Net_t = EpollNetProvider<Buf_t >;
Connector<Buf_t, Net_t> client;
//Instantiating connection objects
Connection<Buf_t, Net_t> conn01(client);
Connection<Buf_t, Net_t> conn02(client);
The Connection class has two nested classes, Space and Index,
that implement the data-manipulation methods such as select(),
replace(), and so on.
-
typedef size_t rid_t
The alias of the built-in size_t type. rid_t is used for entities
that return or contain a request ID.
-
template<class T>
rid_t call(const std::string &func, const T &args)
Calls a remote stored procedure, similar to conn:call().
The method returns the request ID that is used to get the response by
getResponse().
| Parameters: |
- func – name of the remote stored procedure.
- args – arguments of the procedure.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
The following function is defined on the Tarantool instance you are
connected to:
box.execute("DROP TABLE IF EXISTS t;")
box.execute("CREATE TABLE t(id INT PRIMARY KEY, a TEXT, b DOUBLE);")
function remote_replace(arg1, arg2, arg3)
return box.space.T:replace({arg1, arg2, arg3})
end
The function call can look as follows:
rid_t f1 = conn.call("remote_replace", std::make_tuple(5, "some_string", 5.55));
-
bool futureIsReady(rid_t future)
Checks whether the response to a request is ready. The request ID (future)
is the one returned by any of the request methods, such as ping().
futureIsReady() returns true if the response is ready,
or false otherwise.
| Parameters: | future – a request ID. |
| Returns: | true or false |
| Rtype: | bool |
Possible errors: none.
Example:
rid_t ping = conn.ping();
conn.futureIsReady(ping);
-
std::optional<Response<BUFFER>> getResponse(rid_t future)
The method takes a request ID (future) as an argument and returns
an optional object containing a response. If the response is not ready,
the method returns std::nullopt.
Note that for each future the method can be called only once because it
erases the request ID from the internal map as soon as the response is
returned to a user.
A response consists of a header (response.header) and a body
(response.body). Depending on whether the request succeeded on
the server side, the body contains either runtime errors, accessible by
response.body.error_stack, or data (tuples), accessible by
response.body.data. The data is a vector of tuples. The tuples are not
decoded; each comes as pointers to the start and the end of its
MessagePack encoding. For details on decoding the received data, refer to
“Decoding and reading the data”.
| Parameters: | future – a request ID |
| Returns: | a response object or std::nullopt |
| Rtype: | std::optional<Response<BUFFER>> |
Possible errors: none.
Example:
rid_t ping = conn.ping();
std::optional<Response<Buf_t>> response = conn.getResponse(ping);
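Because getResponse() removes the request ID from the connection’s internal map, a second call for the same future returns std::nullopt. The following Python sketch models that bookkeeping with a dict, using None in place of std::nullopt; FutureMap and its methods are hypothetical names used for illustration and are not part of tntcxx.

```python
import itertools


class FutureMap:
    """Toy model of how a connection tracks responses by request ID."""

    def __init__(self):
        self._next_id = itertools.count()
        self._responses = {}  # request ID -> received response

    def new_request(self):
        # Like the rid_t returned by a request method such as ping().
        return next(self._next_id)

    def on_response(self, rid, response):
        self._responses[rid] = response

    def future_is_ready(self, rid):
        return rid in self._responses

    def get_response(self, rid):
        # pop() erases the entry, so each response can be taken only once.
        return self._responses.pop(rid, None)


futures = FutureMap()
ping = futures.new_request()
print(futures.get_response(ping))     # None: no response received yet
futures.on_response(ping, {"code": 0})
print(futures.future_is_ready(ping))  # True
print(futures.get_response(ping))     # {'code': 0}
print(futures.get_response(ping))     # None: the response was already taken
```

In practice this means the response object should be stored by the caller the first time it is retrieved.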
-
std::string &getError()
Returns the error message for the last error that occurred during the execution of
methods of the Connector and
Connection classes.
| Returns: | an error message |
| Rtype: | std::string& |
Possible errors: none.
Example:
int rc = client.connect(conn, address, port);
if (rc != 0) {
std::cerr << conn.getError() << std::endl;
return -1;
}
-
void reset()
Resets a connection after errors, that is, cleans up the error message
and the connection status.
Possible errors: none.
Example:
if (client.wait(conn, ping, WAIT_TIMEOUT) != 0) {
assert(conn.status.is_failed);
std::cerr << conn.getError() << std::endl;
conn.reset();
}
-
rid_t ping()
Prepares a request to ping a Tarantool instance.
The method encodes the request in the MessagePack
format and queues it in the output connection buffer to be sent later
by one of Connector’s methods, namely,
wait(), waitAll(),
or waitAny().
Returns the request ID that is used to get the response by
the getResponse() method.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
rid_t ping = conn.ping();
Nested classes and their methods
-
class Space : Connection
Space is a nested class of the Connection
class. It is a public wrapper that provides access to the data-manipulation methods in a way
similar to the Tarantool submodule box.space,
for example, space[space_id].select(), space[space_id].replace(), and so on.
All the Space class methods listed below work in the following way:
- A method encodes the corresponding request in the MessagePack
format and queues it in the output connection buffer to be sent later
by one of Connector’s methods, namely,
wait(), waitAll(),
or waitAny().
- A method returns the request ID. To get and read the actual data
requested, first get the response object by using the
getResponse() method
and then decode the data.
Public methods:
-
template<class T>
rid_t select(const T &key, uint32_t index_id = 0, uint32_t limit = UINT32_MAX, uint32_t offset = 0, IteratorType iterator = EQ)
Searches for a tuple or a set of tuples in the given space. The method works
similarly to space_object:select() and performs the
search against the primary index (index_id = 0) by default. In other
words, space[space_id].select() is equivalent to
space[space_id].index[0].select().
| Parameters: |
- key – value to be matched against the index key.
- index_id – index ID. Optional. Defaults to 0.
- limit – maximum number of tuples to select. Optional. Defaults to UINT32_MAX.
- offset – number of tuples to skip. Optional. Defaults to 0.
- iterator – the type of iterator. Optional. Defaults to EQ.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:select({key_value}, {limit = 1}) in Tarantool */
uint32_t space_id = 512;
int key_value = 5;
uint32_t index_id = 0; /* primary index */
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = EQ;
auto i = conn.space[space_id];
rid_t select = i.select(std::make_tuple(key_value), index_id, limit, offset, iter);
-
template<class T>
rid_t replace(const T &tuple)
Inserts a tuple into the given space. If a tuple with the same primary key
already exists, replace() replaces the existing tuple with the new
one. The method works similarly to space_object:replace() / put().
| Parameters: | tuple – a tuple to insert. |
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:replace({key_value, "111", 1.01}) in Tarantool */
uint32_t space_id = 512;
int key_value = 5;
std::tuple data = std::make_tuple(key_value, "111", 1.01);
rid_t replace = conn.space[space_id].replace(data);
-
template<class T>
rid_t insert(const T &tuple)
Inserts a tuple into the given space.
The method works similarly to space_object:insert().
| Parameters: | tuple – a tuple to insert. |
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:insert({key_value, "112", 2.22}) in Tarantool */
uint32_t space_id = 512;
int key_value = 6;
std::tuple data = std::make_tuple(key_value, "112", 2.22);
rid_t insert = conn.space[space_id].insert(data);
-
template<class K, class T>
rid_t update(const K &key, const T &tuple, uint32_t index_id = 0)
Updates a tuple in the given space.
The method works similarly to space_object:update()
and searches for the tuple to update against the primary index (index_id = 0)
by default. In other words, space[space_id].update() is equivalent to
space[space_id].index[0].update().
The tuple parameter specifies an update operation, an identifier of the
field to update, and a new field value. The set of available operations and
the format of specifying an operation and a field identifier are the same
as in Tarantool. Refer to the description of space_object:update()
and the example below for details.
| Parameters: |
- key – value to be matched against the index key.
- tuple – parameters for the update operation, namely, operator, field_identifier, value.
- index_id – index ID. Optional. Defaults to 0.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:update(key, {{'=', 1, 'update' }, {'+', 2, 12}}) in Tarantool*/
uint32_t space_id = 512;
std::tuple key = std::make_tuple(5);
std::tuple op1 = std::make_tuple("=", 1, "update");
std::tuple op2 = std::make_tuple("+", 2, 12);
rid_t f1 = conn.space[space_id].update(key, std::make_tuple(op1, op2));
-
template<class T, class O>
rid_t upsert(const T &tuple, const O &ops, uint32_t index_base = 0)
Updates or inserts a tuple in the given space.
The method works similarly to space_object:upsert().
If there is an existing tuple that matches the key fields of tuple,
the request has the same effect as
update() and the ops parameter
is used.
If there is no existing tuple that matches the key fields of tuple,
the request has the same effect as
insert() and the tuple parameter
is used.
| Parameters: |
- tuple – a tuple to insert.
- ops – parameters for the update operation, namely, operator, field_identifier, value.
- index_base – starting number to count fields in a tuple: 0 or 1. Optional. Defaults to 0.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:upsert({333, "upsert-insert", 0.0}, {{'=', 1, 'upsert-update'}}) in Tarantool*/
uint32_t space_id = 512;
std::tuple tuple = std::make_tuple(333, "upsert-insert", 0.0);
std::tuple op1 = std::make_tuple("=", 1, "upsert-update");
rid_t f1 = conn.space[space_id].upsert(tuple, std::make_tuple(op1));
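Update operations are triples of operator, field identifier, and value, and for upsert() the index_base parameter decides whether field numbers count from 0 or 1. The Python sketch below applies '=' (assign) and '+' (add) operations to a tuple under either numbering scheme to show how the same fields are addressed by different numbers. apply_ops is a hypothetical helper that models only these two operators; it is not the connector’s code and does not reproduce Tarantool’s full update semantics.

```python
def apply_ops(tuple_value, ops, index_base=0):
    """Apply ('=', field_no, value) and ('+', field_no, delta) operations.

    Field numbers count from index_base (0 or 1), as described for upsert().
    Returns a new list; the input tuple is left unchanged.
    """
    if index_base not in (0, 1):
        raise ValueError("index_base must be 0 or 1")
    result = list(tuple_value)
    for op, field_no, value in ops:
        pos = field_no - index_base
        if not 0 <= pos < len(result):
            raise IndexError("field %d does not exist" % field_no)
        if op == '=':
            result[pos] = value
        elif op == '+':
            result[pos] += value
        else:
            raise NotImplementedError("operator %r is not modelled" % op)
    return result


row = [5, "some_string", 10]
# The same two fields are targeted with 0-based and 1-based numbering.
print(apply_ops(row, [("=", 1, "update"), ("+", 2, 12)], index_base=0))
print(apply_ops(row, [("=", 2, "update"), ("+", 3, 12)], index_base=1))
```

Both calls print [5, 'update', 22]; mixing up the base would silently target the neighboring field instead, so it is worth checking which numbering a given request uses.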
-
template<class T>
rid_t delete_(const T &key, uint32_t index_id = 0)
Deletes a tuple in the given space.
The method works similarly to space_object:delete()
and searches for the tuple to delete against the primary index (index_id = 0)
by default. In other words, space[space_id].delete_() is equivalent to
space[space_id].index[0].delete_().
| Parameters: |
- key – value to be matched against the index key.
- index_id – index ID. Optional. Defaults to 0.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to space_object:delete(123) in Tarantool*/
uint32_t space_id = 512;
std::tuple key = std::make_tuple(123);
rid_t f1 = conn.space[space_id].delete_(key);
-
class Index : Space
Index is a nested class of the Space
class. It is a public wrapper that provides access to the data-manipulation methods in a way
similar to the Tarantool submodule box.index,
for example, space[space_id].index[index_id].select() and so on.
All the Index class methods listed below work in the following way:
- A method encodes the corresponding request in the MessagePack
format and queues it in the output connection buffer to be sent later
by one of Connector’s methods, namely,
wait(), waitAll(),
or waitAny().
- A method returns the request ID that is used to get the response by
the getResponse() method.
Refer to the getResponse()
description to understand the response structure and how to read
the requested data.
Public methods:
-
template<class T>
rid_t select(const T &key, uint32_t limit = UINT32_MAX, uint32_t offset = 0, IteratorType iterator = EQ)
This is an alternative to space.select().
The method searches for a tuple or a set of tuples in the given space against
a particular index and works similarly to
index_object:select().
| Parameters: |
- key – value to be matched against the index key.
- limit – maximum number of tuples to select. Optional. Defaults to UINT32_MAX.
- offset – number of tuples to skip. Optional. Defaults to 0.
- iterator – the type of iterator. Optional. Defaults to EQ.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to index_object:select({key}, {limit = 1}) in Tarantool*/
uint32_t space_id = 512;
uint32_t index_id = 1;
int key = 10;
uint32_t limit = 1;
uint32_t offset = 0;
IteratorType iter = EQ;
auto i = conn.space[space_id].index[index_id];
rid_t select = i.select(std::make_tuple(key), limit, offset, iter);
-
template<class K, class T>
rid_t update(const K &key, const T &tuple)
This is an alternative to space.update().
The method updates a tuple in the given space but searches for the tuple
against a particular index.
The method works similarly to index_object:update().
The tuple parameter specifies an update operation, an identifier of the
field to update, and a new field value. The set of available operations and
the format of specifying an operation and a field identifier are the same
as in Tarantool. Refer to the description of index_object:update()
and the example below for details.
| Parameters: |
- key – value to be matched against the index key.
- tuple – parameters for the update operation, namely, operator, field_identifier, value.
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to index_object:update(key, {{'=', 1, 'update' }, {'+', 2, 12}}) in Tarantool*/
uint32_t space_id = 512;
uint32_t index_id = 1;
std::tuple key = std::make_tuple(10);
std::tuple op1 = std::make_tuple("=", 1, "update");
std::tuple op2 = std::make_tuple("+", 2, 12);
rid_t f1 = conn.space[space_id].index[index_id].update(key, std::make_tuple(op1, op2));
-
template<class T>
rid_t delete_(const T &key)
This is an alternative to space.delete_().
The method deletes a tuple in the given space but searches for the tuple
against a particular index.
The method works similarly to index_object:delete().
| Parameters: | key – value to be matched against the index key. |
| Returns: | a request ID |
| Rtype: | rid_t |
Possible errors: none.
Example:
/* Equals to index_object:delete(123) in Tarantool*/
uint32_t space_id = 512;
uint32_t index_id = 1;
std::tuple key = std::make_tuple(123);
rid_t f1 = conn.space[space_id].index[index_id].delete_(key);
C#
The most commonly used C# driver is
progaudi.tarantool,
previously named tarantool-csharp. It is not supplied as part of the
Tarantool repository; it must be installed separately. The makers recommend
cross-platform installation using Nuget.
To be consistent with the other instructions in this chapter, here is a way to
install the driver directly on Ubuntu 16.04.
- Install .net core from Microsoft. Follow
.net core installation instructions.
Note
- Mono will not work, nor will .NET from xbuild. Only .NET Core is supported on
Linux and macOS.
- Read the Microsoft End User License Agreement first, because it is not an
ordinary open-source agreement and there will be a message during
installation saying “This software may collect information about you and
your use of the software, and send that to Microsoft.”
You can, however, set environment variables
to opt out of telemetry.
Create a new console project.
$ cd ~
$ mkdir progaudi.tarantool.test
$ cd progaudi.tarantool.test
$ dotnet new console
Add progaudi.tarantool reference.
$ dotnet add package progaudi.tarantool
Change code in Program.cs.
$ cat <<EOT > Program.cs
using System;
using System.Threading.Tasks;
using ProGaudi.Tarantool.Client;
public class HelloWorld
{
static public void Main ()
{
Test().GetAwaiter().GetResult();
}
static async Task Test()
{
var box = await Box.Connect("127.0.0.1:3301");
var schema = box.GetSchema();
var space = await schema.GetSpace("examples");
await space.Insert((99999, "BB"));
}
}
EOT
Build and run your application.
Before trying to run, check that the server is listening at localhost:3301
and that the space examples exists, as
described earlier.
$ dotnet restore
$ dotnet run
The program will:
- connect using an application-specific definition of the space,
- open a socket connection with the Tarantool server at
localhost:3301,
- send an INSERT request, and — if all is well — end without saying anything.
If Tarantool is not running on localhost with listen port = 3301, or if user
‘guest’ does not have authorization to connect, or if the INSERT request
fails for any reason, the program will print an error message along with
other details such as a stack trace.
The example program only shows one request and does not show all that’s
necessary for good practice. For that, please see the
progaudi.tarantool driver repository.
Node.js
The most commonly used node.js driver is the Node Tarantool driver. It is not supplied as part
of the Tarantool repository; it must be installed separately. The most common
way to install it is with npm. For
example, on Ubuntu, the installation could look like this after npm has been
installed:
$ npm install tarantool-driver --global
Here is a complete node.js program that inserts [99999,'BB'] into
space[999] via the node.js API. Before trying to run, check that the server instance
is listening at localhost:3301 and that the space examples exists, as
described earlier. To run, paste the code into
a file named example.js and say node example.js. The program will
connect using an application-specific definition of the space. The program will
open a socket connection with the Tarantool instance at localhost:3301, then
send an INSERT request, then — if all is well — end after saying “Insert
succeeded”. If Tarantool is not running on localhost with listen port =
3301, the program will print “Connect failed”. If the ‘guest’ user does not have
authorization to connect, the program will print “Auth failed”. If the insert
request fails for any reason, for example because the tuple already exists,
the program will print “Insert failed”.
var TarantoolConnection = require('tarantool-driver');
var conn = new TarantoolConnection({port: 3301});
var insertTuple = [99999, "BB"];
conn.connect().then(function() {
conn.auth("guest", "").then(function() {
conn.insert(999, insertTuple).then(function() {
console.log("Insert succeeded");
process.exit(0);
}, function(e) { console.log("Insert failed"); process.exit(1); });
}, function(e) { console.log("Auth failed"); process.exit(1); });
}, function(e) { console.log("Connect failed"); process.exit(1); });
The example program only shows one request and does not show all that’s
necessary for good practice. For that, please see The node.js driver
repository.
Perl
The most commonly used Perl driver is
tarantool-perl. It is not
supplied as part of the Tarantool repository; it must be installed separately.
The most common way to install it is by cloning from GitHub.
To avoid minor warnings that may appear the first time tarantool-perl is
installed, start with installing some other modules that tarantool-perl uses,
with CPAN, the Comprehensive Perl Archive Network:
$ sudo cpan install AnyEvent
$ sudo cpan install Devel::GlobalDestruction
Then, to install tarantool-perl itself, say:
$ git clone https://github.com/tarantool/tarantool-perl.git tarantool-perl
$ cd tarantool-perl
$ git submodule init
$ git submodule update --recursive
$ perl Makefile.PL
$ make
$ sudo make install
Here is a complete Perl program that inserts [99999,'BB'] into space[999]
via the Perl API. Before trying to run, check that the server instance is listening at
localhost:3301 and that the space examples exists, as
described earlier.
To run, paste the code into a file named example.pl and say
perl example.pl. The program will connect using an application-specific
definition of the space. The program will open a socket connection with the
Tarantool instance at localhost:3301, then send an INSERT request, then, if
all is well, end without displaying any messages. If Tarantool is not running
on localhost with listen port = 3301, the program will print “Connection
refused”.
#!/usr/bin/perl
use DR::Tarantool ':constant', 'tarantool';
use DR::Tarantool ':all';
use DR::Tarantool::MsgPack::SyncClient;
my $tnt = DR::Tarantool::MsgPack::SyncClient->connect(
    host => '127.0.0.1',   # look for tarantool on localhost
    port => 3301,          # on port 3301
    user => 'guest',       # username; for 'guest' we do not also say 'password=>...'
    spaces => {
        999 => {                          # definition of space[999] ...
            name => 'examples',           # space[999] name = 'examples'
            default_type => 'STR',        # space[999] field type is 'STR' if undefined
            fields => [ {                 # definition of space[999].fields ...
                name => 'field1', type => 'NUM' } ],  # space[999].field[1] name='field1',type='NUM'
            indexes => {                  # definition of space[999] indexes ...
                0 => {
                    name => 'primary', fields => [ 'field1' ] } } } } );
$tnt->insert('examples' => [ 99999, 'BB' ]);
The example program uses field type names ‘STR’ and ‘NUM’
instead of ‘string’ and ‘unsigned’, due to a temporary Perl limitation.
The example program only shows one request and does not show all that’s
necessary for good practice. For that, please see the
tarantool-perl repository.
PHP
tarantool-php is the official
PHP connector for Tarantool.
It is not supplied as part of the Tarantool repository and must be installed
separately (see installation instructions
in the connector’s README file).
Here is a complete PHP program that inserts [99999,'BB'] into a space named
examples via the PHP API.
Before trying to run, check that the server instance is
listening at localhost:3301 and that the space
examples exists, as described earlier.
To run, paste the code into a file named example.php and say:
$ php -d extension=~/tarantool-php/modules/tarantool.so example.php
The program will open a socket connection with the Tarantool instance at
localhost:3301, then send an INSERT request,
then – if all is well – print “Insert succeeded”.
If the tuple already exists, the program will print
“Duplicate key exists in unique index ‘primary’ in space ‘examples’”.
<?php
$tarantool = new Tarantool('localhost', 3301);
try {
    $tarantool->insert('examples', [99999, 'BB']);
    echo "Insert succeeded\n";
} catch (Exception $e) {
    echo $e->getMessage(), "\n";
}
The example program only shows one request and does not show everything that is
necessary for good practice. For that, please see the
tarantool/tarantool-php
project on GitHub.
There is also another community-driven
tarantool-php GitHub project, which includes an
alternative connector written in
pure PHP, an object mapper,
a queue, and other packages.
Configuration reference
This topic describes all configuration parameters provided by Tarantool.
Most of the configuration options described in this reference can be applied to a specific instance, replica set, group, or to all instances globally.
To do so, you need to define the required option at the specified level.
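The following fragment is a minimal sketch of options defined at different levels. The names group001, replicaset001, and instance001 are hypothetical. It assumes the usual rule that a value defined at a more specific level (instance) takes precedence over the same option defined at a more general level (global). It is not a complete, runnable cluster configuration.

```yaml
# Global scope: applies to all instances
database:
  txn_timeout: 60
groups:
  group001:
    # Group scope: applies to all instances of group001
    compat:
      json_escape_forward_slash: 'old'
    replicasets:
      replicaset001:
        # Replica set scope: applies to all instances of replicaset001
        console:
          enabled: true
        instances:
          instance001:
            # Instance scope: overrides the global txn_timeout for instance001 only
            database:
              txn_timeout: 10
```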
Using Tarantool as an application server, you can run your own Lua applications.
In the app section, you can load the application and provide an application configuration in the app.cfg section.
Note
app can be defined in any scope.
Note
An application specified using app is loaded after the application roles specified using the roles option.
-
app.cfg
A configuration of the application loaded using app.file or app.module.
Example
In the example below, the application is loaded from the myapp.lua file placed next to the YAML configuration file:
app:
  file: 'myapp.lua'
  cfg:
    greeting: 'Hello'
Example on GitHub: application
Tip
The experimental.config.utils.schema
built-in module provides an API for managing user-defined configurations
of applications (app.cfg) and roles (roles_cfg).
Type: map
Default: nil
Environment variable: TT_APP_CFG
-
app.file
A path to a Lua file to load an application from.
Type: string
Default: nil
Environment variable: TT_APP_FILE
-
app.module
A Lua module to load an application from.
Example
The app section can be placed in any configuration scope.
As an example use case, you can provide different applications for storages and routers in a sharded cluster:
groups:
  storages:
    app:
      module: storage
    # ...
  routers:
    app:
      module: router
    # ...
Type: string
Default: nil
Environment variable: TT_APP_MODULE
Enterprise Edition
Configuring audit_log parameters is available in the Enterprise Edition only.
The audit_log section defines configuration parameters related to audit logging.
Note
audit_log can be defined in any scope.
-
audit_log.extract_key
If set to true, the audit subsystem extracts and prints only the primary key instead of full
tuples in DML events (space_insert, space_replace, space_delete).
Otherwise, full tuples are logged.
This option may be useful when tuples are large.
Type: boolean
Default: false
Environment variable: TT_AUDIT_LOG_EXTRACT_KEY
-
audit_log.file
Specify a file for the audit log destination.
To write the audit log to a file, set audit_log.to to file.
If you write logs to a file, Tarantool reopens the audit log at SIGHUP.
Type: string
Default: ‘var/log/{{ instance_name }}/audit.log’
Environment variable: TT_AUDIT_LOG_FILE
-
audit_log.filter
Enable logging for a specified subset of audit events.
This option accepts the following values:
- Event names (for example,
password_change). For details, see Audit log events.
- Event groups (for example,
audit). For details, see Event groups.
The option contains either one value from the Possible values list below or a combination of them.
To enable custom audit log events, specify the custom value in this option.
Example
filter: [ user_create,data_operations,ddl,custom ]
Type: array
Possible values: ‘all’, ‘audit’, ‘auth’, ‘priv’, ‘ddl’, ‘dml’, ‘data_operations’, ‘compatibility’,
‘audit_enable’, ‘auth_ok’, ‘auth_fail’, ‘disconnect’, ‘user_create’, ‘user_drop’, ‘role_create’, ‘role_drop’,
‘user_disable’, ‘user_enable’, ‘user_grant_rights’, ‘role_grant_rights’, ‘role_revoke_rights’, ‘password_change’,
‘access_denied’, ‘eval’, ‘call’, ‘space_select’, ‘space_create’, ‘space_alter’, ‘space_drop’, ‘space_insert’,
‘space_replace’, ‘space_delete’, ‘custom’
Default: nil
Environment variable: TT_AUDIT_LOG_FILTER
-
audit_log.format
Specify a format that is used for the audit log.
Example
If you set the option to plain,
audit_log:
  to: file
  format: plain
the output in the file might look as follows:
2024-01-17T00:12:27.155+0300
4b5a2624-28e5-4b08-83c7-035a0c5a1db9
INFO remote:unix/:(socket)
session_type:console
module:tarantool
user:admin
type:space_create
tag:
description:Create space Bands
Type: string
Possible values: ‘json’, ‘csv’, ‘plain’
Default: ‘json’
Environment variable: TT_AUDIT_LOG_FORMAT
-
audit_log.nonblock
Specify the logging behavior if the system is not ready to write.
If set to true, Tarantool does not block during logging when the system is not ready for writing, and drops the message instead.
Using this value may improve logging performance at the cost of losing some log messages.
Note
The option only has an effect if audit_log.to is set to syslog
or pipe.
Type: boolean
Default: false
Environment variable: TT_AUDIT_LOG_NONBLOCK
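The note above means nonblock only matters for syslog or pipe destinations. The fragment below is a sketch that combines it with a pipe destination. It reuses the cronolog command from the audit_log.pipe example. Whether losing some audit messages is acceptable depends on your audit requirements.

```yaml
audit_log:
  to: pipe
  pipe: 'cronolog audit_tarantool.log'
  # Do not block logging when the pipe is not ready; some messages may be lost
  nonblock: true
```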
-
audit_log.pipe
Specify a pipe for the audit log destination.
To write the audit log to a pipe, set audit_log.to to pipe.
If the log destination is a program, its PID is stored in the audit.pid field.
You need to send it a signal to rotate logs.
Example
This starts the cronolog program when the server starts
and sends all audit_log messages to the standard input (stdin) of cronolog.
audit_log:
  to: pipe
  pipe: 'cronolog audit_tarantool.log'
Type: string
Default: box.NULL
Environment variable: TT_AUDIT_LOG_PIPE
-
audit_log.spaces
The array of space names for which data operation events (space_select, space_insert, space_replace,
space_delete) should be logged. The array accepts string values.
If set to box.NULL, the data operation events are logged for all spaces.
Example
In this example, only the events of the bands and singers spaces are logged:
audit_log:
  spaces: [bands, singers]
Type: array
Default: box.NULL
Environment variable: TT_AUDIT_LOG_SPACES
-
audit_log.to
Enable audit logging and define the log location.
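This option accepts the following values:
- devnull: disable audit logging.
- file: write audit logs to a file (see audit_log.file).
- pipe: start a program and write audit logs to it (see audit_log.pipe).
- syslog: write audit logs to a system logger (see audit_log.syslog.*).
By default, audit logging is disabled.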
Example
The basic audit log configuration might look as follows:
audit_log:
  to: file
  file: 'audit_tarantool.log'
  filter: [ user_create,data_operations,ddl,custom ]
  format: json
  spaces: [ bands ]
  extract_key: true
Type: string
Possible values: ‘devnull’, ‘file’, ‘pipe’, ‘syslog’
Default: ‘devnull’
Environment variable: TT_AUDIT_LOG_TO
-
audit_log.syslog.facility
Specify a system logger keyword that tells syslogd where to send the message.
You can enable logging to a system logger using the audit_log.to option.
See also: syslog configuration example
Type: string
Possible values: ‘auth’, ‘authpriv’, ‘cron’, ‘daemon’, ‘ftp’, ‘kern’, ‘lpr’, ‘mail’, ‘news’, ‘security’, ‘syslog’, ‘user’, ‘uucp’, ‘local0’, ‘local1’, ‘local2’, ‘local3’, ‘local4’, ‘local5’, ‘local6’, ‘local7’
Default: ‘local7’
Environment variable: TT_AUDIT_LOG_SYSLOG_FACILITY
-
audit_log.syslog.identity
Specify an application name to show in logs.
You can enable logging to a system logger using the audit_log.to option.
See also: syslog configuration example
Type: string
Default: ‘tarantool’
Environment variable: TT_AUDIT_LOG_SYSLOG_IDENTITY
-
audit_log.syslog.server
Set a location for the syslog server.
It can be a Unix socket path starting with ‘unix:’ or an IPv4 port number.
You can enable logging to a system logger using the audit_log.to option.
Example
audit_log:
  to: syslog
  syslog:
    server: 'unix:/dev/log'
    facility: 'user'
    identity: 'tarantool_audit'
These options are interpreted as a message for the syslogd program,
which runs in the background of any Unix-like platform.
An example of a Tarantool audit log entry in the syslog:
09:32:52 tarantool_audit: {"time": "2024-02-08T09:32:52.190+0300", "uuid": "94454e46-9a0e-493a-bb9f-d59e44a43581", "severity": "INFO", "remote": "unix/:(socket)", "session_type": "console", "module": "tarantool", "user": "admin", "type": "space_create", "tag": "", "description": "Create space bands"}
Warning
The example above writes audit logs to a directory shared with the system logs.
Tarantool allows this, but it is not recommended because it makes working with audit logs more difficult.
Write system logs and audit logs separately: create separate paths for them and specify these paths in the configuration.
Type: string
Default: box.NULL
Environment variable: TT_AUDIT_LOG_SYSLOG_SERVER
The compat section defines values of the compat module options.
Note
compat can be defined in any scope.
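The fragment below is a minimal sketch of a compat section. It keeps the old behavior for one option and opts in to the new behavior for another. The choice of options is illustrative only. Each value can also be provided through the option's environment variable, for example, TT_COMPAT_JSON_ESCAPE_FORWARD_SLASH.

```yaml
compat:
  # Keep escaping the forward slash in json.encode() results
  json_escape_forward_slash: 'old'
  # Serialize errors together with other potentially useful fields (since 3.1.0)
  box_error_serialize_verbose: 'new'
```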
-
compat.binary_data_decoding
Define how to store binary data fields in Lua after decoding:
new: as varbinary objects
old: as plain strings
See also: Decoding binary objects
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BINARY_DATA_DECODING
-
compat.box_cfg_replication_sync_timeout
Set a default replication sync timeout:
new: 0
old: 300
Important
This value is set during the initial box.cfg{} call and cannot be changed later.
See also: Default value for replication_sync_timeout
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_CFG_REPLICATION_SYNC_TIMEOUT
-
compat.box_error_serialize_verbose
Since: 3.1.0
Set the verbosity of error objects serialization:
new: serialize the error message together with other potentially useful fields
old: serialize only the error message
Type: string
Possible values: ‘new’, ‘old’
Default: ‘old’
Environment variable: TT_COMPAT_BOX_ERROR_SERIALIZE_VERBOSE
-
compat.box_error_unpack_type_and_code
Since: 3.1.0
Whether to show error fields in box.error.unpack():
new: do not show base_type and custom_type fields; do not show
the code field if it is 0. Note that base_type is still accessible for an error object.
old: show all fields
Type: string
Possible values: ‘new’, ‘old’
Default: ‘old’
Environment variable: TT_COMPAT_BOX_ERROR_UNPACK_TYPE_AND_CODE
-
compat.box_info_cluster_meaning
Define the behavior of box.info.cluster:
new: show the entire cluster
old: show the current replica set
See also: Meaning of box.info.cluster
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_INFO_CLUSTER_MEANING
-
compat.box_session_push_deprecation
Whether to raise errors on attempts to call the deprecated function box.session.push:
new: raise an error
old: do not raise an error
See also: box.session.push() deprecation
Type: string
Possible values: ‘new’, ‘old’
Default: ‘old’
Environment variable: TT_COMPAT_BOX_SESSION_PUSH_DEPRECATION
-
compat.box_space_execute_priv
Whether the execute privilege can be granted on spaces:
new: an error is raised
old: the privilege can be granted with no actual effect
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_SPACE_EXECUTE_PRIV
-
compat.box_space_max
Set the maximum space identifier (box.schema.SPACE_MAX):
new: 2147483646
old: 2147483647
The limit was decremented because the old max value is used as an error indicator in the box C API.
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_SPACE_MAX
-
compat.box_tuple_extension
Controls IPROTO_FEATURE_CALL_RET_TUPLE_EXTENSION and
IPROTO_FEATURE_CALL_ARG_TUPLE_EXTENSION feature bits that
define tuple encoding in iproto call and eval requests.
new: tuples with formats are encoded as MP_TUPLE
old: tuples with formats are encoded as MP_ARRAY
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_TUPLE_EXTENSION
-
compat.box_tuple_new_vararg
Controls how box.tuple.new interprets an argument list:
new: as a value with a tuple format
old: as an array of tuple fields
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_BOX_TUPLE_NEW_VARARG
-
compat.c_func_iproto_multireturn
Controls wrapping of multiple results of a stored C function when returning them via iproto:
new: return without wrapping (consistently with a local call via box.func)
old: wrap results into a MessagePack array
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_C_FUNC_IPROTO_MULTIRETURN
-
compat.fiber_channel_close_mode
Define the behavior of fiber channels after closing:
new: mark the channel read-only
old: destroy the channel object
See also: Fiber channel close mode
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_FIBER_CHANNEL_CLOSE_MODE
-
compat.fiber_slice_default
Define the maximum fiber execution time without a yield:
new: {warn = 0.5, err = 1.0}
old: infinity (no warnings or errors raised).
See also: Default value for max fiber slice
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_FIBER_SLICE_DEFAULT
-
compat.json_escape_forward_slash
Whether to escape the forward slash symbol ‘/’ using a backslash in a json.encode() result:
new: do not escape the forward slash
old: escape the forward slash
See also: JSON encode escape forward slash
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_JSON_ESCAPE_FORWARD_SLASH
-
compat.sql_priv
Whether to enable access checks for SQL requests over iproto:
new: check the user’s access permissions
old: allow any user to execute SQL over iproto
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_SQL_PRIV
-
compat.sql_seq_scan_default
Controls the default value of the sql_seq_scan session setting:
new: false
old: true
See also: Default value for sql_seq_scan session setting
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_SQL_SEQ_SCAN_DEFAULT
-
compat.yaml_pretty_multiline
Whether to encode in block scalar style all multiline strings or ones containing the \n\n substring:
new: all multiline strings
old: only strings containing the \n\n substring
See also: Lua-YAML prettier multiline output
Type: string
Possible values: ‘new’, ‘old’
Default: ‘new’
Environment variable: TT_COMPAT_YAML_PRETTY_MULTILINE
The conditional section defines the configuration parts that apply to instances
that meet certain conditions.
Note
conditional can be defined in the global scope only.
-
conditional.if
Specify a conditional section of the configuration. The configuration options
defined inside a conditional.if section apply only to instances on which
the specified condition is true.
Conditions can include one variable – tarantool_version: a three-number
Tarantool version running on the instance, for example, 3.1.0. It compares to
version literal values that include three numbers separated by periods (x.y.z).
The following operators are available in conditions:
- comparison:
>, <, >=, <=, ==, !=
- logical operators:
|| (OR) and && (AND)
- parentheses:
()
Example:
In this example, different configuration parts apply to instances running
Tarantool versions above and below 3.1.0:
- On versions less than 3.1.0, the
upgraded label is set to false.
- On versions 3.1.0 or newer, the
upgraded label is set to true.
Additionally, new compat options are defined. These options were introduced
in version 3.1.0, so on older versions they would cause an error.
conditional:
- if: tarantool_version < 3.1.0
  labels:
    upgraded: 'false'
- if: tarantool_version >= 3.1.0
  labels:
    upgraded: 'true'
  compat:
    box_error_serialize_verbose: 'new'
    box_error_unpack_type_and_code: 'new'
See also: Conditional configuration sections
The config section defines various parameters related to centralized configuration.
Note
config can be defined in the global scope only.
-
config.reload
Specify how the configuration is reloaded.
This option accepts the following values:
auto: configuration is reloaded automatically when it is changed.
manual: configuration should be reloaded manually. In this case, you can reload the configuration in the application code using config:reload().
See also: Reloading configuration
Type: string
Possible values: ‘auto’, ‘manual’
Default: ‘auto’
Environment variable: TT_CONFIG_RELOAD
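For example, a configuration that disables automatic reloading might look as follows. This is a sketch that assumes the application calls config:reload() itself when it is ready to apply changes.

```yaml
config:
  # Configuration changes are applied only when config:reload() is called
  reload: 'manual'
```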
This section describes options related to loading configuration settings from external storage such as external files or environment variables.
-
config.context
Specify how to load settings from external storage.
For example, this option can be used to load passwords from safe storage.
You can find examples in the Loading secrets from safe storage section.
Type: map
Default: nil
Environment variable: TT_CONFIG_CONTEXT
-
config.context.<name>
The name of an entity that identifies a configuration value to load.
-
config.context.<name>.env
The name of an environment variable to load a configuration value from.
To load a configuration value from an environment variable, set config.context.<name>.from to env.
Example
In this example, passwords are loaded from the DBADMIN_PASSWORD and SAMPLEUSER_PASSWORD environment variables:
config:
  context:
    dbadmin_password:
      from: env
      env: DBADMIN_PASSWORD
    sampleuser_password:
      from: env
      env: SAMPLEUSER_PASSWORD
See also: Loading secrets from safe storage
-
config.context.<name>.from
The type of storage to load a configuration value from.
There are the following storage types:
file: load a configuration value from a file.
In this case, you need to specify the path to the file using config.context.<name>.file.
env: load a configuration value from an environment variable.
In this case, specify the environment variable name using config.context.<name>.env.
-
config.context.<name>.file
The path to a file to load a configuration value from.
To load a configuration value from a file, set config.context.<name>.from to file.
Example
In this example, passwords are loaded from the dbadmin_password.txt and sampleuser_password.txt files:
config:
  context:
    dbadmin_password:
      from: file
      file: secrets/dbadmin_password.txt
      rstrip: true
    sampleuser_password:
      from: file
      file: secrets/sampleuser_password.txt
      rstrip: true
See also: Loading secrets from safe storage
-
config.context.<name>.rstrip
(Optional) Whether to strip whitespace characters and newlines from the end of data.
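The examples above only load values into the context. The fragment below is a minimal sketch of how a loaded value might then be used elsewhere in the configuration. It assumes the {{ context.<name> }} template syntax described in Loading secrets from safe storage and the built-in super role.

```yaml
config:
  context:
    dbadmin_password:
      from: file
      file: secrets/dbadmin_password.txt
      rstrip: true
credentials:
  users:
    dbadmin:
      # Substituted with the value loaded from secrets/dbadmin_password.txt
      password: '{{ context.dbadmin_password }}'
      roles: [ super ]
```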
Enterprise Edition
Centralized configuration storages are supported by the Enterprise Edition only.
This section describes options related to providing connection settings to a centralized etcd-based storage.
If replication.failover is set to supervised, Tarantool also uses etcd to maintain the state of failover coordinators.
-
config.etcd.endpoints
The list of endpoints used to access an etcd server.
See also: Configuring connection to an etcd storage
Type: array
Default: nil
Environment variable: TT_CONFIG_ETCD_ENDPOINTS
-
config.etcd.prefix
A key prefix used to search a configuration on an etcd server.
Tarantool searches keys by the following path: <prefix>/config/*.
Note that <prefix> should start with a slash (/).
See also: Configuring connection to an etcd storage
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_PREFIX
-
config.etcd.username
A username used for authentication.
See also: Configuring connection to an etcd storage
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_USERNAME
-
config.etcd.password
A password used for authentication.
See also: Configuring connection to an etcd storage
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_PASSWORD
-
config.etcd.ssl.ca_file
A path to a trusted certificate authorities (CA) file.
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_CA_FILE
-
config.etcd.ssl.ca_path
A path to a directory holding certificates to verify the peer with.
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_CA_PATH
-
config.etcd.ssl.ssl_cert
Since: 3.2.0
A path to an SSL certificate file.
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_SSL_CERT
-
config.etcd.ssl.ssl_key
A path to a private SSL key file.
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_SSL_KEY
-
config.etcd.ssl.verify_host
Enable verification of the certificate’s name (CN) against the specified host.
Type: boolean
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_VERIFY_HOST
-
config.etcd.ssl.verify_peer
Enable verification of the peer’s SSL certificate.
Type: boolean
Default: nil
Environment variable: TT_CONFIG_ETCD_SSL_VERIFY_PEER
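The fragment below is a sketch of an etcd connection over TLS that uses the ssl.* options above. The https endpoint, the prefix, and all certificate and key paths are hypothetical placeholders. ssl_cert requires Tarantool 3.2.0 or later.

```yaml
config:
  etcd:
    endpoints:
    - https://localhost:2379
    prefix: /myapp
    ssl:
      ca_file: certs/ca.crt
      ssl_cert: certs/client.crt
      ssl_key: certs/client.key
      # Verify the server certificate and the name it was issued for
      verify_peer: true
      verify_host: true
```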
-
config.etcd.http.request.timeout
A time period required to process an HTTP request to an etcd server: from sending a request to receiving a response.
See also: Configuring connection to an etcd storage
Type: number
Default: nil
Environment variable: TT_CONFIG_ETCD_HTTP_REQUEST_TIMEOUT
-
config.etcd.http.request.unix_socket
A Unix domain socket used to connect to an etcd server.
Type: string
Default: nil
Environment variable: TT_CONFIG_ETCD_HTTP_REQUEST_UNIX_SOCKET
-
config.etcd.watchers.reconnect_max_attempts
Since: 3.1.0
The maximum number of attempts to reconnect to an etcd server in case of connection failure.
Type: integer
Default: nil
Environment variable: TT_CONFIG_ETCD_WATCHERS_RECONNECT_MAX_ATTEMPTS
-
config.etcd.watchers.reconnect_timeout
Since: 3.1.0
The timeout (in seconds) between attempts to reconnect to an etcd server in case of connection failure.
Type: number
Default: nil
Environment variable: TT_CONFIG_ETCD_WATCHERS_RECONNECT_TIMEOUT
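The fragment below is a sketch of an etcd connection section that combines the options above. The endpoint, prefix, credentials, and numeric values are hypothetical. With the prefix /myapp, Tarantool would search keys under /myapp/config/*.

```yaml
config:
  etcd:
    endpoints:
    - http://localhost:2379
    # Keys are searched under /myapp/config/*
    prefix: /myapp
    username: sampleuser
    password: '123456'
    http:
      request:
        # Seconds from sending a request to receiving a response
        timeout: 3
    watchers:
      reconnect_max_attempts: 10
      # Seconds between reconnection attempts
      reconnect_timeout: 1
```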
Enterprise Edition
Centralized configuration storages are supported by the Enterprise Edition only.
This section describes options related to providing connection settings to a centralized Tarantool-based storage.
-
config.storage.endpoints
An array of endpoints used to access a configuration storage.
Each endpoint can include the following fields:
uri: a URI of the configuration storage’s instance.
login: a username used to connect to the instance.
password: a password used for authentication.
params: SSL parameters required for encrypted connections (<uri>.params.*).
See also: Configuring connection to a Tarantool storage
Type: array
Default: nil
Environment variable: TT_CONFIG_STORAGE_ENDPOINTS
-
config.storage.prefix
A key prefix used to search a configuration in a centralized configuration storage.
Tarantool searches keys by the following path: <prefix>/config/*.
Note that <prefix> should start with a slash (/).
See also: Configuring connection to a Tarantool storage
Type: string
Default: nil
Environment variable: TT_CONFIG_STORAGE_PREFIX
-
config.storage.reconnect_after
A number of seconds to wait before reconnecting to a configuration storage.
Type: number
Default: 3
Environment variable: TT_CONFIG_STORAGE_RECONNECT_AFTER
-
config.storage.timeout
The interval (in seconds) to perform the status check of a configuration storage.
See also: Configuring connection to a Tarantool storage
Type: number
Default: 3
Environment variable: TT_CONFIG_STORAGE_TIMEOUT
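The fragment below is a sketch of a connection to a Tarantool-based configuration storage with two endpoints. The URIs, prefix, credentials, and numeric values are hypothetical placeholders.

```yaml
config:
  storage:
    # Keys are searched under /myapp/config/*
    prefix: /myapp
    endpoints:
    - uri: '127.0.0.1:4401'
      login: sampleuser
      password: '123456'
    - uri: '127.0.0.1:4402'
      login: sampleuser
      password: '123456'
    # Wait 5 seconds before reconnecting to the storage
    reconnect_after: 5
    # Check the storage status every 3 seconds
    timeout: 3
```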
Configure the administrative console. You can connect to the console using tt connect.
Note
console can be defined in any scope.
-
console.enabled
Whether to listen on the Unix socket provided in the
console.socket option.
If the option is set to false, the administrative console is disabled.
Type: boolean
Default: true
Environment variable: TT_CONSOLE_ENABLED
-
console.socket
The Unix socket for the administrative console.
Mind the following nuances:
- Only a Unix domain socket is allowed. A TCP socket can’t be configured this way.
- console.socket is a file path, without any unix: or unix/: prefixes.
- If the file path is a relative path, it is interpreted relative to
process.work_dir.
Type: string
Default: ‘var/run/{{ instance_name }}/tarantool.control’
Environment variable: TT_CONSOLE_SOCKET
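The fragment below is a minimal sketch of a console section. The socket file name admin.control is a hypothetical choice. Because the path is relative, it is resolved against process.work_dir.

```yaml
console:
  enabled: true
  # A plain file path: no unix: or unix/: prefix
  socket: 'var/run/{{ instance_name }}/admin.control'
```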
The credentials section allows you to create users and grant them the specified privileges.
Learn more in Credentials.
Note
credentials can be defined in any scope.
-
credentials.roles
An array of roles that can be granted to users or other roles.
Example
In the example below, the writers_space_reader role gets privileges to select data in the writers space:
roles:
  writers_space_reader:
    privileges:
    - permissions: [ read ]
      spaces: [ writers ]
See also: Managing users and roles
Type: map
Default: nil
Environment variable: TT_CREDENTIALS_ROLES
-
credentials.roles.<role_name>.roles
An array of roles granted to this role.
-
credentials.roles.<role_name>.privileges
An array of privileges granted to this role.
See <user_or_role_name>.privileges.*.
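The fragment below is a sketch of a role that inherits another role through credentials.roles.<role_name>.roles and adds its own privileges. The books_space_editor role name is hypothetical. The writers_space_reader role and the writers and books spaces come from the examples in this section.

```yaml
credentials:
  roles:
    writers_space_reader:
      privileges:
      - permissions: [ read ]
        spaces: [ writers ]
    books_space_editor:
      # Inherit all privileges of writers_space_reader
      roles: [ writers_space_reader ]
      privileges:
      - permissions: [ read, write ]
        spaces: [ books ]
```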
-
credentials.users
An array of users.
Example
In this example, sampleuser gets the following privileges:
- Privileges granted to the
writers_space_reader role.
- Privileges to select and modify data in the
books space.
sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]
See also: Managing users and roles
Type: map
Default: nil
Environment variable: TT_CREDENTIALS_USERS
-
credentials.users.<username>.password
A user’s password.
Example
In the example below, a password for the dbadmin user is set:
credentials:
  users:
    dbadmin:
      password: 'T0p_Secret_P@$$w0rd'
See also: Loading secrets from safe storage
-
credentials.users.<username>.roles
An array of roles granted to this user.
-
credentials.users.<username>.privileges
An array of privileges granted to this user.
See <user_or_role_name>.privileges.*.
<user_or_role_name>.privileges.*
-
<user_or_role_name>.privileges
Privileges that can be granted to a user or role using the following options:
-
<user_or_role_name>.privileges.permissions
Permissions assigned to this user or a user with this role.
Example
In this example, sampleuser gets privileges to select and modify data in the books space:
sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]
See also: Managing users and roles
-
<user_or_role_name>.privileges.spaces
Spaces to which this user or a user with this role gets the specified permissions.
Example
In this example, sampleuser gets privileges to select and modify data in the books space:
sampleuser:
  password: '123456'
  roles: [ writers_space_reader ]
  privileges:
  - permissions: [ read, write ]
    spaces: [ books ]
See also: Managing users and roles
-
<user_or_role_name>.privileges.functions
Functions to which this user or a user with this role gets the specified permissions.
-
<user_or_role_name>.privileges.sequences
Sequences to which this user or a user with this role gets the specified permissions.
-
<user_or_role_name>.privileges.lua_eval
Whether this user or a user with this role can execute arbitrary Lua code.
-
<user_or_role_name>.privileges.lua_call
A list of global user-defined Lua functions that this user or a user with this role can call.
To allow calling a specific function, specify its name as the value.
To allow calling all global Lua functions except built-in ones, specify the all value.
This option should be configured together with the execute
permission.
Since version 3.3.0, the lua_call option allows granting users privileges to call the specified Lua functions on
the instance at runtime (so it does not require the ability to write to the database).
For example, to allow the ‘alice’ user to call the custom functions my_func and my_func2:
credentials:
  users:
    alice:
      privileges:
      - permissions: [execute]
        lua_call: [my_func, my_func2]
-
<user_or_role_name>.privileges.sql
Whether this user or a user with this role can execute an arbitrary SQL expression.
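The fragment below is a sketch that combines the remaining privilege options for one user. The get_writer function and writers_id_seq sequence names are hypothetical and are assumed to exist already. Pairing lua_eval and sql with the execute permission follows the pattern shown for lua_call and is an assumption here.

```yaml
credentials:
  users:
    sampleuser:
      password: '123456'
      privileges:
      # Call a stored function (hypothetical name)
      - permissions: [ execute ]
        functions: [ get_writer ]
      # Use a sequence (hypothetical name)
      - permissions: [ read, write ]
        sequences: [ writers_id_seq ]
      # Execute arbitrary Lua code and SQL expressions
      - permissions: [ execute ]
        lua_eval: true
        sql: true
```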
The database section defines database-specific configuration parameters, such as an instance’s read-write mode or transaction isolation level.
Note
database can be defined in any scope.
-
database.hot_standby
Whether to start the server in the hot standby mode.
This mode can be used to provide failover without replication.
Suppose there are two cluster applications.
Each cluster has one instance with the same configuration:
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              hot_standby: true
            wal:
              dir: /tmp/wals
            snapshot:
              dir: /tmp/snapshots
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
In particular, both instances use the same directory for storing write-ahead logs and snapshots.
When you start both cluster applications on the same machine, the instance from the first one will be the primary instance and the second will be the standby instance.
In the logs of the second cluster instance, you should see a notification:
main/104/interactive I> Entering hot standby mode
This means that the standby instance is ready to take over if the primary instance goes down.
The standby instance initializes and tries to take a lock on the directory for storing write-ahead logs
but fails because the primary instance has already locked this directory.
If the primary instance goes down for any reason, the lock is released.
In this case, the standby instance succeeds in taking the lock and becomes the primary instance.
database.hot_standby has no effect:
- If wal.mode is set to
none.
- If wal.dir_rescan_delay is set to a large value on macOS or FreeBSD. On these platforms, the hot standby mode is designed so that the loop repeats every
wal.dir_rescan_delay seconds.
- For spaces created with engine set to
vinyl.
Examples on GitHub: hot_standby_1, hot_standby_2
Type: boolean
Default: false
Environment variable: TT_DATABASE_HOT_STANDBY
-
database.instance_uuid
An instance UUID.
By default, instance UUIDs are generated automatically.
database.instance_uuid can be used to specify an instance identifier manually.
UUIDs should follow these rules:
- The values must be true unique identifiers, not shared by other instances
or replica sets within the common infrastructure.
- The values must be used consistently, not changed after the initial setup.
The initial values are stored in snapshot files
and are checked whenever the system is restarted.
- The values must comply with RFC 4122.
The nil UUID is not allowed.
See also: database.replicaset_uuid
Type: string
Environment variable: TT_DATABASE_INSTANCE_UUID
-
database.mode
An instance’s operating mode.
This option is in effect if replication.failover is set to off.
The following modes are available:
rw: an instance is in read-write mode.
ro: an instance is in read-only mode.
If not specified explicitly, the default value depends on the number of instances in a replica set. For a single instance, the rw mode is used, while for multiple instances, the ro mode is used.
Example
You can set the database.mode option to rw on all instances in a replica set to make a master-master configuration.
In this case, replication.failover should be set to off.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
iproto:
  advertise:
    peer:
      login: replicator
replication:
  failover: off
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            database:
              mode: rw
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
# Load sample data
app:
  file: 'myapp.lua'
Type: string
Default: box.NULL (the actual default value depends on the number of instances in a replica set)
Environment variable: TT_DATABASE_MODE
-
database.replicaset_uuid
A replica set UUID.
By default, replica set UUIDs are generated automatically.
database.replicaset_uuid can be used to specify a replica set identifier manually.
See also: database.instance_uuid
Type: string
Environment variable: TT_DATABASE_REPLICASET_UUID
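The fragment below is a minimal sketch of assigning both identifiers manually. It sets the replica set UUID at the replica set level and the instance UUID at the instance level. Both UUID values are hypothetical, randomly chosen version 4 UUIDs that follow RFC 4122. Following the rules above, real values must be unique across your infrastructure and must never change after the initial setup.

```yaml
groups:
  group001:
    replicasets:
      replicaset001:
        database:
          # Hypothetical replica set UUID (RFC 4122, version 4)
          replicaset_uuid: 'a9f3c1d2-5b6e-4f70-8a91-b2c3d4e5f607'
        instances:
          instance001:
            database:
              # Hypothetical instance UUID; must not be shared with any other instance
              instance_uuid: 'f4a9c5e0-1a2b-4c3d-8e9f-0a1b2c3d4e5f'
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
```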
-
database.txn_isolation
A transaction isolation level.
Type: string
Default: best-effort
Possible values: best-effort, read-committed, read-confirmed
Environment variable: TT_DATABASE_TXN_ISOLATION
-
database.txn_timeout
A timeout (in seconds) after which the transaction is rolled back.
See also: box.begin()
Type: number
Default: 3153600000 (~100 years)
Environment variable: TT_DATABASE_TXN_TIMEOUT
-
database.use_mvcc_engine
Whether the MVCC transaction manager is enabled.
Type: boolean
Default: false
Environment variable: TT_DATABASE_USE_MVCC_ENGINE
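The fragment below sketches the three transaction-related database options together. Enabling the MVCC transaction manager alongside a non-default isolation level is a choice made for this example. The 60-second timeout is an arbitrary illustrative value.

```yaml
database:
  # Enable the MVCC transaction manager (default: false)
  use_mvcc_engine: true
  # One of: best-effort (default), read-committed, read-confirmed
  txn_isolation: read-committed
  # Roll back transactions that run longer than 60 seconds
  txn_timeout: 60
```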
The failover section defines parameters related to a supervised failover.
Note
failover can be defined in the global scope only.
-
failover.log.to
Since: 3.3.0
Enterprise Edition
Configuring failover.log.to and failover.log.file parameters is available in the Enterprise Edition only.
Define a location Tarantool sends failover logs to.
This option accepts the following values:
stderr: write logs to the standard error stream.
file: write logs to a file (see failover.log.file).
Type: string
Default: ‘stderr’
Environment variable: TT_FAILOVER_LOG_TO
-
failover.log.file
Since: 3.3.0
Specify the file that failover logs are written to.
To write logs to a file, set failover.log.to to file.
Otherwise, failover.log.file is ignored.
Example
The example below shows how to write failover logs to a file placed in the specified directory:
failover:
  log:
    to: file
    file: var/log/failover.log
Type: string
Default: nil
Environment variable: TT_FAILOVER_LOG_FILE
-
failover.call_timeout
Since: 3.1.0
A call timeout (in seconds) for monitoring and failover requests to an instance.
Type: number
Default: 1
Environment variable: TT_FAILOVER_CALL_TIMEOUT
-
failover.connect_timeout
Since: 3.1.0
A connection timeout (in seconds) for monitoring and failover requests to an instance.
Type: number
Default: 1
Environment variable: TT_FAILOVER_CONNECT_TIMEOUT
-
failover.lease_interval
Since: 3.1.0
A time interval (in seconds) that specifies how long an instance remains a leader without receiving renewal requests from a coordinator.
When this interval expires, the leader switches to read-only mode.
This action is performed by the instance itself and works even if there is no connectivity between the instance and the coordinator.
Type: number
Default: 30
Environment variable: TT_FAILOVER_LEASE_INTERVAL
-
failover.probe_interval
Since: 3.1.0
A time interval (in seconds) that specifies how often a monitoring service of the failover coordinator polls an instance for its status.
Type: number
Default: 10
Environment variable: TT_FAILOVER_PROBE_INTERVAL
-
failover.renew_interval
Since: 3.1.0
A time interval (in seconds) that specifies how often a failover coordinator sends read-write deadline renewals.
Type: number
Default: 10
Environment variable: TT_FAILOVER_RENEW_INTERVAL
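The fragment below is a sketch of tuning the coordinator timings. It assumes supervised failover is already in use. The values are illustrative, not recommendations. They keep renew_interval well below lease_interval, so a healthy leader's read-write deadline is renewed several times before it could expire.

```yaml
failover:
  # Poll each instance for its status every 5 seconds
  probe_interval: 5
  # Send read-write deadline renewals every 5 seconds
  renew_interval: 5
  # A leader switches itself to read-only after 20 seconds without renewals
  lease_interval: 20
  # Timeouts for monitoring and failover requests to an instance
  connect_timeout: 2
  call_timeout: 2
```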
failover.stateboard.* options define configuration parameters related to maintaining the state of failover coordinators in a remote etcd-based storage.
See also: Active and passive coordinators
-
failover.stateboard.keepalive_interval
Since: 3.1.0
A time interval (in seconds) that specifies how long transient state information is stored and how quickly a lock expires.
Note
failover.stateboard.keepalive_interval should be smaller than failover.lease_interval.
Otherwise, switching of a coordinator causes a replica set leader to go to read-only mode for some time.
Type: number
Default: 10
Environment variable: TT_FAILOVER_STATEBOARD_KEEPALIVE_INTERVAL
-
failover.stateboard.renew_interval
Since: 3.1.0
A time interval (in seconds) that specifies how often a failover coordinator writes its state information to etcd.
This option also determines the frequency at which an active coordinator reads new commands from etcd.
Type: number
Default: 2
Environment variable: TT_FAILOVER_STATEBOARD_RENEW_INTERVAL
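The fragment below writes out the default stateboard values explicitly, next to failover.lease_interval, so the relationship described in the note above is visible. The etcd connection settings are not shown here.

```yaml
failover:
  lease_interval: 30
  stateboard:
    # Kept smaller than failover.lease_interval, so a coordinator switch
    # does not push the replica set leader into read-only mode
    keepalive_interval: 10
    # Write coordinator state to etcd (and read new commands) every 2 seconds
    renew_interval: 2
```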
The feedback section describes configuration parameters for sending information about a running Tarantool instance to the specified feedback server.
Note
feedback can be defined in any scope.
-
feedback.crashinfo
Whether to send crash information in the case of an instance failure.
This information includes:
- General information from the uname output.
- Build information.
- The crash reason.
- The stack trace.
To turn off sending crash information, set this option to false.
Type: boolean
Default: true
Environment variable: TT_FEEDBACK_CRASHINFO
-
feedback.enabled
Whether to send information about a running instance to the feedback server.
To turn off sending feedback, set this option to false.
Type: boolean
Default: true
Environment variable: TT_FEEDBACK_ENABLED
-
feedback.host
The address to which information is sent.
Type: string
Environment variable: TT_FEEDBACK_HOST
-
feedback.interval
The interval (in seconds) of sending information.
Type: number
Default: 3600
Environment variable: TT_FEEDBACK_INTERVAL
-
feedback.metrics_collect_interval
The interval (in seconds) for collecting metrics.
Type: number
Default: 60
Environment variable: TT_FEEDBACK_METRICS_COLLECT_INTERVAL
-
feedback.metrics_limit
The maximum size of memory (in bytes) used to store metrics before sending them to the feedback server.
If the size of collected metrics exceeds this value, earlier metrics are dropped.
Type: integer
Default: 1024 * 1024 (1048576)
Environment variable: TT_FEEDBACK_METRICS_LIMIT
-
feedback.send_metrics
Whether to send metrics to the feedback server.
Note that all collected metrics are dropped after sending them to the feedback server.
Type: boolean
Default: true
Environment variable: TT_FEEDBACK_SEND_METRICS
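The fragment below sketches a reduced feedback setup. It keeps general feedback enabled but sends it once a day instead of hourly, and it turns off crash reports and metrics. The chosen interval is illustrative.

```yaml
feedback:
  enabled: true
  # Send feedback once a day (86400 seconds) instead of every 3600 seconds
  interval: 86400
  # Do not send crash information or metrics
  crashinfo: false
  send_metrics: false
```

To turn feedback off entirely, setting enabled: false is enough.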
The fiber section describes options related to configuring fibers, yields, and cooperative multitasking.
Note
fiber can be defined in any scope.
-
fiber.io_collect_interval
The time period (in seconds) a fiber sleeps between
iterations of the event loop.
fiber.io_collect_interval can be used to reduce CPU load in deployments
where the number of client connections is large, but requests are not so frequent
(for example, each connection issues just a handful of requests per second).
Type: number
Default: box.NULL
Environment variable: TT_FIBER_IO_COLLECT_INTERVAL
-
fiber.too_long_threshold
If processing a request takes longer than the given period (in seconds),
the fiber warns about it in the log.
fiber.too_long_threshold takes effect only if log.level is greater than or equal to 4 (warn).
Type: number
Default: 0.5
Environment variable: TT_FIBER_TOO_LONG_THRESHOLD
-
fiber.worker_pool_threads
The maximum number of threads to use during execution of certain internal processes (for example, socket.getaddrinfo() and coio_call()).
Type: number
Default: 4
Environment variable: TT_FIBER_WORKER_POOL_THREADS
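The fragment below is a sketch for a deployment with many mostly idle client connections. All values are illustrative assumptions. log.level is set explicitly because too_long_threshold warnings are logged only at level 4 (warn) or higher.

```yaml
fiber:
  # Sleep 1 ms between event loop iterations to reduce CPU load
  io_collect_interval: 0.001
  # Warn about requests that take longer than 1 second
  too_long_threshold: 1
  # Allow up to 8 threads for internal processes such as getaddrinfo()
  worker_pool_threads: 8
log:
  # 5 (info) is the default and is >= 4, so the warnings above are logged
  level: 5
```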
This section describes options related to configuring time periods for
fiber slices.
See fiber.set_max_slice for details and examples.
-
fiber.slice.warn
Set a time period (in seconds) that specifies the warning slice.
Type: number
Default: 0.5
Environment variable: TT_FIBER_SLICE_WARN
-
fiber.slice.err
Set a time period (in seconds) that specifies the error slice.
Type: number
Default: 1
Environment variable: TT_FIBER_SLICE_ERR
This section describes options related to configuring the
fiber.top() function, normally used for debug purposes.
fiber.top() shows all alive fibers and their CPU consumption.
-
fiber.top.enabled
Enable or disable the fiber.top() function.
Enabling fiber.top() slows down fiber switching by about 15%,
so it is disabled by default.
Type: boolean
Default: false
Environment variable: TT_FIBER_TOP_ENABLED
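The fragment below sketches the slice and fiber.top() options together. The slice values are illustrative. Like the defaults, they keep the warning slice shorter than the error slice. fiber.top() is enabled only for a debugging session because of its overhead.

```yaml
fiber:
  slice:
    # Warning slice, in seconds
    warn: 0.5
    # Error slice, in seconds
    err: 2
  top:
    # Debugging only: slows down fiber switching by about 15%
    enabled: true
```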
Enterprise Edition
Configuring flightrec parameters is available in the Enterprise Edition only.
The flightrec section describes options related to the flight recorder configuration.
Note
flightrec can be defined in any scope.
-
flightrec.enabled
Enable the flight recorder.
Type: boolean
Default: false
Environment variable: TT_FLIGHTREC_ENABLED
-
flightrec.logs_size
Specify the size (in bytes) of the log storage.
You can set this option to 0 to disable the log storage.
Type: integer
Default: 10485760
Environment variable: TT_FLIGHTREC_LOGS_SIZE
-
flightrec.logs_max_msg_size
Specify the maximum size (in bytes) of the log message.
The log message is truncated if its size exceeds this limit.
Type: integer
Default: 4096
Maximum: 16384
Environment variable: TT_FLIGHTREC_LOGS_MAX_MSG_SIZE
-
flightrec.logs_log_level
Specify the level of detail of the flight recorder logs.
The default value is 6 (VERBOSE).
You can learn more about log levels from the log.level option description.
Note that the flightrec.logs_log_level value might differ from log.level.
Type: integer
Default: 6
Environment variable: TT_FLIGHTREC_LOGS_LOG_LEVEL
-
flightrec.metrics_period
Specify the time period (in seconds) that defines how long metrics are stored from the moment of dump.
In other words, this value defines how much historical metrics data is available up to the moment of a crash.
The frequency of metric dumps is defined by flightrec.metrics_interval.
Type: integer
Default: 180
Environment variable: TT_FLIGHTREC_METRICS_PERIOD
-
flightrec.metrics_interval
Specify the time interval (in seconds) that defines the frequency of dumping metrics.
This value shouldn’t exceed flightrec.metrics_period.
Type: number
Default: 1.0
Minimum: 0.001
Environment variable: TT_FLIGHTREC_METRICS_INTERVAL
Note
Given that the average size of a metrics entry is 2 kB,
you can estimate the size of the metrics storage as follows:
(flightrec.metrics_period / flightrec.metrics_interval) * 2 kB
For example, with the default values, this is (180 / 1.0) * 2 kB = 360 kB.
-
flightrec.requests_size
Specify the size (in bytes) of storage for the request and response data.
You can set this parameter to 0 to disable storing requests and responses.
Type: integer
Default: 10485760
Environment variable: TT_FLIGHTREC_REQUESTS_SIZE
-
flightrec.requests_max_req_size
Specify the maximum size (in bytes) of a request entry.
A request entry is truncated if this size is exceeded.
Type: integer
Default: 16384
Environment variable: TT_FLIGHTREC_REQUESTS_MAX_REQ_SIZE
-
flightrec.requests_max_res_size
Specify the maximum size (in bytes) of a response entry.
A response entry is truncated if this size is exceeded.
Type: integer
Default: 16384
Environment variable: TT_FLIGHTREC_REQUESTS_MAX_RES_SIZE
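The fragment below sketches an Enterprise Edition flight recorder setup. The values are illustrative. The comment applies the storage estimate from the note above to the chosen metrics settings.

```yaml
flightrec:
  enabled: true
  # Keep 6 minutes of metrics history, dumped every 2 seconds:
  # (360 / 2) * 2 kB = 360 kB of metrics storage (estimate)
  metrics_period: 360
  metrics_interval: 2
  # 5 MB (5 * 1048576 bytes) of log storage, messages up to level 5 (info)
  logs_size: 5242880
  logs_log_level: 5
  # Do not store requests and responses
  requests_size: 0
```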
The iproto section is used to configure parameters related to communication with and between cluster instances.
Note
iproto can be defined in any scope.
-
iproto.listen
An array of URIs used to listen for incoming requests.
If required, you can enable SSL for specific URIs by providing additional parameters (<uri>.params.*).
Note that a URI value can’t contain parameters, a login, or a password.
Example
In the example below, iproto.listen is set explicitly for each instance in a cluster:
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
See also: Connections
Type: array
Default: nil
Environment variable: TT_IPROTO_LISTEN
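Because iproto.listen is an array, a single instance can listen on more than one URI. The fragment below is a sketch in which an instance accepts connections both on a TCP address and on a Unix domain socket. The socket path is a hypothetical example.

```yaml
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              # A TCP address
              - uri: '127.0.0.1:3301'
              # A Unix domain socket (hypothetical path)
              - uri: 'unix/:/tmp/instance001.sock'
```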
-
iproto.net_msg_max
To handle messages, Tarantool allocates fibers.
To prevent fiber overhead from affecting the whole system,
Tarantool restricts how many messages the fibers handle,
so some pending requests are blocked.
- On powerful systems, increase iproto.net_msg_max, and the scheduler starts processing pending requests immediately.
- On weaker systems, decrease iproto.net_msg_max, and the overhead may decrease. However, this may take some time because the scheduler must wait until already-running requests finish.
When iproto.net_msg_max is reached, Tarantool suspends processing of incoming packets until it has processed earlier messages.
This is not a direct restriction of the number of fibers that handle network messages; rather, it is a system-wide restriction of channel bandwidth.
This in turn restricts the number of incoming network messages that the transaction processor thread handles, and therefore indirectly affects the fibers that handle network messages.
Note
The number of fibers is smaller than the number of messages because
messages can be released as soon as they are delivered, while
incoming requests might not be processed until some time after delivery.
Type: integer
Default: 768
Environment variable: TT_IPROTO_NET_MSG_MAX
-
iproto.readahead
The size of the read-ahead buffer associated with a client connection.
The larger the buffer, the more memory an active connection consumes, and the
more requests can be read from the operating system buffer in a single
system call.
Make sure that the buffer can hold at least a few dozen requests.
So if a typical tuple in a request is large, for example, a few kilobytes or even megabytes, increase the read-ahead buffer size.
If batched request processing is not used, it’s prudent to leave this setting at its default.
Type: integer
Default: 16320
Environment variable: TT_IPROTO_READAHEAD
-
iproto.threads
The number of network threads.
There can be unusual workloads where the network thread is 100% loaded and the transaction processor thread is not, so the network thread is a bottleneck.
In that case, set iproto.threads to 2 or more.
The operating system kernel determines which connection goes to which thread.
Type: integer
Default: 1
Environment variable: TT_IPROTO_THREADS
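The fragment below is a sketch of the three tuning options above for a powerful machine that receives large requests. The 64 KB typical request size and the other values are illustrative assumptions. The readahead value follows the "few dozen requests" guideline.

```yaml
iproto:
  # More in-flight messages on a powerful machine (default: 768)
  net_msg_max: 1536
  # Assumed typical request of 64 KB, room for 32 of them:
  # 32 * 64 KB = 2 MB = 2097152 bytes (default: 16320)
  readahead: 2097152
  # A second network thread, if the network thread is the bottleneck
  threads: 2
```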
-
iproto.advertise.client
A URI used to advertise the current instance to clients.
The iproto.advertise.client option accepts a URI in the following formats:
- An address: host:port.
- A Unix domain socket: unix/:.
Note that this option doesn’t allow setting a username and password.
If a remote client needs this information, it should be delivered outside of the cluster configuration.
Note
The host value cannot be 0.0.0.0/[::] and the port value cannot be 0.
Type: string
Environment variable: TT_IPROTO_ADVERTISE_CLIENT
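A common reason to set this option is an instance that listens on all interfaces. 0.0.0.0 cannot be advertised, so a concrete address must be given instead. The fragment below is a single-instance sketch; 192.0.2.10 is a hypothetical documentation address. In a replica set with several instances, peers would also need a concrete address, provided through iproto.advertise.peer.

```yaml
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              # Listen on all interfaces...
              listen:
              - uri: '0.0.0.0:3301'
              advertise:
                # ...but tell clients a concrete, reachable address
                client: '192.0.2.10:3301'
```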
-
iproto.advertise.peer
Settings used to advertise the current instance to other cluster members.
The format of these settings is described in iproto.advertise.<peer_or_sharding>.*.
Example
In the example below, the following configuration options are specified:
- In the credentials section, the replicator user with the replication role is created.
- iproto.advertise.peer specifies that other instances should connect to an address defined in iproto.listen using the replicator user.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
iproto:
  advertise:
    peer:
      login: replicator
replication:
  failover: election
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
-
iproto.advertise.sharding
Settings used to advertise the current instance to a router and rebalancer.
The format of these settings is described in iproto.advertise.<peer_or_sharding>.*.
Note
If iproto.advertise.sharding is not specified, advertise settings from iproto.advertise.peer are used.
Example
In the example below, the following configuration options are specified:
- In the credentials section, the replicator and storage users are created.
- iproto.advertise.peer specifies that other instances should connect to an address defined in iproto.listen with the replicator user.
- iproto.advertise.sharding specifies that a router should connect to storages using an address defined in iproto.listen with the storage user.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]
iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage
iproto.advertise.<peer_or_sharding>.*
-
iproto.advertise.<peer_or_sharding>.uri
(Optional) A URI used to advertise the current instance.
By default, the URI defined in iproto.listen is used to advertise the current instance.
Note
The host value cannot be 0.0.0.0/[::] and the port value cannot be 0.
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_URI, TT_IPROTO_ADVERTISE_SHARDING_URI
-
iproto.advertise.<peer_or_sharding>.login
(Optional) A username used to connect to the current instance.
If a username is not set, the guest user is used.
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_LOGIN, TT_IPROTO_ADVERTISE_SHARDING_LOGIN
-
iproto.advertise.<peer_or_sharding>.password
(Optional) A password for the specified user.
If a login is specified but a password is missing, it is taken from the user’s credentials.
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PASSWORD, TT_IPROTO_ADVERTISE_SHARDING_PASSWORD
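The fragment below sketches the uri and login settings together. The instances listen on all interfaces, and each one advertises a concrete address to its peers. The addresses are hypothetical. The login is set once at the replica set level. No password is given, so, as described above, it is taken from the replicator user's credentials.

```yaml
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
groups:
  group001:
    replicasets:
      replicaset001:
        iproto:
          advertise:
            peer:
              # Password is taken from the credentials section
              login: replicator
        instances:
          instance001:
            iproto:
              listen:
              - uri: '0.0.0.0:3301'
              advertise:
                peer:
                  uri: '192.0.2.10:3301'
          instance002:
            iproto:
              listen:
              - uri: '0.0.0.0:3302'
              advertise:
                peer:
                  uri: '192.0.2.11:3302'
```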
-
iproto.advertise.<peer_or_sharding>.params
(Optional) URI parameters (<uri>.params.*) required for connecting to the current instance.
The following URI parameters can be used in the iproto.listen.<uri>.params and iproto.advertise.<peer_or_sharding>.params options.
Note
<uri>.params.* options don’t have corresponding environment variables for URIs specified in iproto.listen.
-
<uri>.params.transport
Allows you to enable traffic encryption for client-server communications over binary connections.
In a Tarantool cluster, one instance might act both as a server that accepts connections from other instances and as a client that connects to other instances.
<uri>.params.transport accepts one of the following values:
plain (default): turn off traffic encryption.
ssl: encrypt traffic by using the TLS 1.2 protocol (Enterprise Edition only).
Example
The example below demonstrates how to enable traffic encryption by using a self-signed server certificate.
The following parameters are specified for each instance:
ssl_cert_file: a path to an SSL certificate file.
ssl_key_file: a path to a private SSL key file.
replicaset001:
  replication:
    failover: manual
  leader: instance001
  iproto:
    advertise:
      peer:
        login: replicator
  instances:
    instance001:
      iproto:
        listen:
        - uri: '127.0.0.1:3301'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'
    instance002:
      iproto:
        listen:
        - uri: '127.0.0.1:3302'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'
    instance003:
      iproto:
        listen:
        - uri: '127.0.0.1:3303'
          params:
            transport: 'ssl'
            ssl_cert_file: 'certs/server.crt'
            ssl_key_file: 'certs/server.key'
Example on GitHub: ssl_without_ca
Type: string
Default: ‘plain’
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_TRANSPORT, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_TRANSPORT
-
<uri>.params.ssl_ca_file
(Optional) A path to a trusted certificate authorities (CA) file.
If not set, the peer won’t be checked for authenticity.
Both a server and a client can use the ssl_ca_file parameter:
- If it’s on the server side, the server verifies the client.
- If it’s on the client side, the client verifies the server.
- If both sides have the CA files, the server and the client verify each other.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CA_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CA_FILE
-
<uri>.params.ssl_cert_file
A path to an SSL certificate file:
- For a server, it’s mandatory.
- For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CERT_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CERT_FILE
-
<uri>.params.ssl_ciphers
(Optional) A colon-separated (:) list of SSL cipher suites the connection can use.
Note that the list is not validated: if a cipher suite is unknown, Tarantool ignores it, doesn’t establish the connection, and writes to the log that no shared cipher was found.
The supported cipher suites are:
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
- DHE-RSA-AES256-GCM-SHA384
- ECDHE-ECDSA-CHACHA20-POLY1305
- ECDHE-RSA-CHACHA20-POLY1305
- DHE-RSA-CHACHA20-POLY1305
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- DHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES256-SHA384
- ECDHE-RSA-AES256-SHA384
- DHE-RSA-AES256-SHA256
- ECDHE-ECDSA-AES128-SHA256
- ECDHE-RSA-AES128-SHA256
- DHE-RSA-AES128-SHA256
- ECDHE-ECDSA-AES256-SHA
- ECDHE-RSA-AES256-SHA
- DHE-RSA-AES256-SHA
- ECDHE-ECDSA-AES128-SHA
- ECDHE-RSA-AES128-SHA
- DHE-RSA-AES128-SHA
- AES256-GCM-SHA384
- AES128-GCM-SHA256
- AES256-SHA256
- AES128-SHA256
- AES256-SHA
- AES128-SHA
- GOST2012-GOST8912-GOST8912
- GOST2001-GOST89-GOST89
For detailed information on SSL ciphers and their syntax, refer to OpenSSL documentation.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_CIPHERS, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_CIPHERS
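The fragment below sketches how to restrict a listening URI to two cipher suites from the supported list, joined with a colon. The certificate and key paths reuse the earlier example. The ssl transport requires the Enterprise Edition.

```yaml
iproto:
  listen:
  - uri: '127.0.0.1:3301'
    params:
      transport: 'ssl'
      ssl_cert_file: 'certs/server.crt'
      ssl_key_file: 'certs/server.key'
      # Only two AES-256-GCM suites from the supported list, colon-separated
      ssl_ciphers: 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384'
```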
-
<uri>.params.ssl_key_file
A path to a private SSL key file:
- For a server, it’s mandatory.
- For a client, it’s mandatory if the ssl_ca_file parameter is set for a server; otherwise, optional.
If the private key is encrypted, provide a password for it in the ssl_password or ssl_password_file parameter.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_KEY_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_KEY_FILE
-
<uri>.params.ssl_password
(Optional) A password for an encrypted private SSL key provided using ssl_key_file.
Alternatively, the password can be provided in ssl_password_file.
Tarantool applies the ssl_password and ssl_password_file parameters in the following order:
- If ssl_password is provided, Tarantool tries to decrypt the private key with it.
- If ssl_password is incorrect or isn’t provided, Tarantool tries all passwords from ssl_password_file one by one in the order they are written.
- If ssl_password and all passwords from ssl_password_file are incorrect, or none of them is provided, Tarantool treats the private key as unencrypted.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_PASSWORD, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_PASSWORD
-
<uri>.params.ssl_password_file
(Optional) A text file with one or more passwords for encrypted private SSL keys provided using ssl_key_file (each on a separate line).
Alternatively, the password can be provided in ssl_password.
See also: <uri>.params.transport
Type: string
Default: nil
Environment variable: TT_IPROTO_ADVERTISE_PEER_PARAMS_SSL_PASSWORD_FILE, TT_IPROTO_ADVERTISE_SHARDING_PARAMS_SSL_PASSWORD_FILE
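The fragment below sketches an encrypted private key with both password sources set. It follows the order described for ssl_password. The password value and the passwords.txt path are hypothetical.

```yaml
iproto:
  listen:
  - uri: '127.0.0.1:3301'
    params:
      transport: 'ssl'
      ssl_cert_file: 'certs/server.crt'
      # An encrypted private key
      ssl_key_file: 'certs/server.key'
      # Tried first
      ssl_password: 'qwerty'
      # Tried next, one password per line, in the order they are written
      ssl_password_file: 'certs/passwords.txt'
```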
The groups section provides the ability to define the full topology of a Tarantool cluster.
Note
groups can be defined in the global scope only.
-
groups.<group_name>
A group name.
The following rules are applied to group names:
- The maximum number of symbols is 63.
- Should start with a letter.
- Can contain lowercase letters (a-z).
- Can contain digits (0-9).
- Can contain the following characters: -, _.
-
groups.<group_name>.replicasets
Replica sets that belong to this group. See replicasets.
-
groups.<group_name>.<config_parameter>
Any configuration parameter that can be defined in the group scope.
For example, iproto and database configuration parameters defined at the group level are applied to all instances in this group.
Note
replicasets can be defined in the group scope only.
-
replicasets.<replicaset_name>
A replica set name.
Note that the rules applied to a replica set name are the same as for groups.
Learn more in groups.<group_name>.
-
replicasets.<replicaset_name>.leader
A replica set leader.
This option can be used to set a replica set leader when manual replication.failover is used.
To perform controlled failover, <replicaset_name>.leader can be temporarily removed or set to null.
Example
replication:
  failover: manual
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
-
replicasets.<replicaset_name>.bootstrap_leader
A bootstrap leader for a replica set.
To specify a bootstrap leader manually, you need to set replication.bootstrap_strategy to config.
Example
groups:
  group001:
    replicasets:
      replicaset001:
        replication:
          bootstrap_strategy: config
        bootstrap_leader: instance001
        instances:
          instance001:
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
              - uri: '127.0.0.1:3303'
-
replicasets.<replicaset_name>.instances
Instances that belong to this replica set. See instances.
-
replicasets.<replicaset_name>.<config_parameter>
Any configuration parameter that can be defined in the replica set scope.
For example, iproto and database configuration parameters defined at the replica set level are applied to all instances in this replica set.
Note
instances can be defined in the replica set scope only.
-
instances.<instance_name>
An instance name.
Note that the rules applied to an instance name are the same as for groups.
Learn more in groups.<group_name>.
-
instances.<instance_name>.<config_parameter>
Any configuration parameter that can be defined in the instance scope.
For example, iproto and database configuration parameters defined at the instance level are applied to this instance only.
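The fragment below sketches how the same option can be set in several scopes. It assumes the usual precedence, in which a more specific scope overrides a broader one. Here instance001 logs at the debug level, instance002 inherits warn from its group, and any instance outside group001 would use the global info level.

```yaml
log:
  # Global scope
  level: 'info'
groups:
  group001:
    log:
      # Group scope: overrides the global value for this group
      level: 'warn'
    replicasets:
      replicaset001:
        instances:
          instance001:
            log:
              # Instance scope: applies to instance001 only
              level: 'debug'
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
              - uri: '127.0.0.1:3302'
```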
Since version 3.3.0, the instance configuration supports the isolated option.
The option takes a boolean value and is set to false by default.
Setting isolated: true puts the instance into the isolated mode.
The isolated mode allows the user to temporarily isolate an instance and perform maintenance activities on it.
In the isolated mode:
- The instance is moved to the read-only state.
- iproto stops listening for new connections.
- iproto drops all the current connections.
- The instance is disconnected from all the replication upstreams.
- Other replica set members exclude the isolated instance from their replication upstreams.
Note
An isolated instance can’t be bootstrapped (a local snapshot is required to start).
Example
The example below shows how to isolate an instance:
groups:
  g:
    replicasets:
      r:
        instances:
          i-001: {}
          i-002: {}
          i-003: {}
          i-004:
            isolated: true
The labels section allows adding custom attributes to the configuration.
Attributes must be key: value pairs with string keys and values.
Note
labels can be defined in any scope.
-
labels.<label_name>
A value of the label with the specified name.
Example
The example below shows how to define labels on the replica set and instance levels:
groups:
  group001:
    replicasets:
      replicaset001:
        labels:
          dc: 'east'
          production: 'false'
        instances:
          instance001:
            labels:
              rack: '10'
              production: 'true'
See also: Adding labels
The log section defines configuration parameters related to logging.
To handle logging in your application, use the log module.
Note
log can be defined in any scope.
-
log.to
Define a location Tarantool sends logs to.
This option accepts the following values:
stderr: write logs to the standard error stream.
file: write logs to a file (see log.file).
pipe: start a program and write logs to its standard input (see log.pipe).
syslog: write logs to a system logger (see log.syslog.*).
Type: string
Default: ‘stderr’
Environment variable: TT_LOG_TO
-
log.file
Specify a file for logs destination.
To write logs to a file, you need to set log.to to file.
Otherwise, log.file is ignored.
Example
The example below shows how to write logs to a file placed in the specified directory:
log:
  to: file
  file: var/log/{{ instance_name }}/instance.log
Example on GitHub: log_file
Type: string
Default: ‘var/log/{{ instance_name }}/tarantool.log’
Environment variable: TT_LOG_FILE
-
log.format
Specify a format that is used for a log entry.
The following formats are supported:
- plain: a log entry is formatted as plain text.
- json: a log entry is formatted as JSON.
Type: string
Default: ‘plain’
Environment variable: TT_LOG_FORMAT
-
log.level
Specify the level of detail logs have.
There are the following levels:
- 0 – fatal
- 1 – syserror
- 2 – error
- 3 – crit
- 4 – warn
- 5 – info
- 6 – verbose
- 7 – debug
By setting log.level, you can enable logging of all events with severities above or equal to the given level.
Example
The example below shows how to log all events with severities above or equal to the VERBOSE level.
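A minimal fragment matching this description is shown below. It specifies the level by name, which works because log.level accepts either a number or a string.

```yaml
log:
  # Log all events with severity 'verbose' (6) or higher
  level: 'verbose'
```

Setting level: 6 should have the same effect, since 6 corresponds to verbose in the list above.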
Example on GitHub: log_level
Type: number, string
Default: 5
Environment variable: TT_LOG_LEVEL
-
log.modules
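Configure log levels (see log.level) for individual modules.
You can specify a logging level for the following module types:
- Modules (files) that use the default logger (see Example 1).
- Modules that use custom loggers created using the log.new() function (see Example 2).
- The tarantool module, which configures the logging level for messages logged from non-Lua code, including C modules (see Example 3).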
Example 1: Set log levels for files that use the default logger
Suppose you have two identical modules placed by the following paths: test/module1.lua and test/module2.lua.
These modules use the default logger and look as follows:
return {
    say_hello = function()
        local log = require('log')
        log.info('Info message from module1')
    end
}
To configure logging levels, you need to provide module names corresponding to paths to these modules:
log:
  modules:
    test.module1: 'verbose'
    test.module2: 'error'
app:
  file: 'app.lua'
To load these modules in your application (app.lua), you need to add the corresponding require directives:
module1 = require('test.module1')
module2 = require('test.module2')
Given that module1 has the verbose logging level and module2 has the error level, calling module1.say_hello() shows a message but module2.say_hello() is swallowed:
-- Prints 'info' messages --
module1.say_hello()
--[[
[92617] main/103/interactive/test.module1 I> Info message from module1
---
...
--]]
-- Swallows 'info' messages --
module2.say_hello()
--[[
---
...
--]]
Example on GitHub: log_existing_modules
Example 2: Set log levels for modules that use custom loggers
This example shows how to set the verbose level for module1 and the error level for module2:
log:
  modules:
    module1: 'verbose'
    module2: 'error'
app:
  file: 'app.lua'
To create custom loggers in your application (app.lua), call the log.new() function:
-- Creates new loggers --
module1_log = require('log').new('module1')
module2_log = require('log').new('module2')
Given that module1 has the verbose logging level and module2 has the error level, calling module1_log.info() shows a message but module2_log.info() is swallowed:
-- Prints 'info' messages --
module1_log.info('Info message from module1')
--[[
[16300] main/103/interactive/module1 I> Info message from module1
---
...
--]]
-- Swallows 'debug' messages --
module1_log.debug('Debug message from module1')
--[[
---
...
--]]
-- Swallows 'info' messages --
module2_log.info('Info message from module2')
--[[
---
...
--]]
Example on GitHub: log_new_modules
Example 3: Set a log level for C modules
This example shows how to set the info level for the tarantool module:
log:
  modules:
    tarantool: 'info'
app:
  file: 'app.lua'
The specified level affects messages logged from C modules:
ffi = require('ffi')
-- Prints 'info' messages --
ffi.C._say(ffi.C.S_INFO, nil, 0, nil, 'Info message from C module')
--[[
[6024] main/103/interactive I> Info message from C module
---
...
--]]
-- Swallows 'debug' messages --
ffi.C._say(ffi.C.S_DEBUG, nil, 0, nil, 'Debug message from C module')
--[[
---
...
--]]
The example above uses the LuaJIT ffi library to call C functions provided by the say module.
Example on GitHub: log_existing_c_modules
Type: map
Environment variable: TT_LOG_MODULES
-
log.nonblock
Specify the logging behavior if the system is not ready to write.
If set to true, Tarantool does not block during logging when the system is not ready for writing, and drops the message instead.
Using this value may improve logging performance at the cost of losing some log messages.
Note
The option only has an effect if log.to is set to syslog or pipe.
Type: boolean
Default: false
Environment variable: TT_LOG_NONBLOCK
-
log.pipe
Start a program and write logs to its standard input (stdin).
To send logs to a program’s standard input, you need to set log.to to pipe.
Example
In the example below, Tarantool writes logs to the standard input of cronolog:
log:
  to: pipe
  pipe: 'cronolog tarantool.log'
Example on GitHub: log_pipe
Type: string
Environment variable: TT_LOG_PIPE
-
log.syslog.facility
Specify the syslog facility to be used when syslog is enabled.
To write logs to syslog, you need to set log.to to syslog.
Type: string
Possible values: ‘auth’, ‘authpriv’, ‘cron’, ‘daemon’, ‘ftp’, ‘kern’, ‘lpr’, ‘mail’, ‘news’, ‘security’, ‘syslog’, ‘user’, ‘uucp’, ‘local0’, ‘local1’, ‘local2’, ‘local3’, ‘local4’, ‘local5’, ‘local6’, ‘local7’
Default: ‘local7’
Environment variable: TT_LOG_SYSLOG_FACILITY
-
log.syslog.identity
Specify an application name used to identify Tarantool messages in syslog logs.
To write logs to syslog, you need to set log.to to syslog.
Type: string
Default: ‘tarantool’
Environment variable: TT_LOG_SYSLOG_IDENTITY
-
log.syslog.server
Set the location of a syslog server.
This option accepts one of the following values:
- An IPv4 address. Example: 127.0.0.1:514.
- A Unix socket path starting with unix:. Examples: unix:/dev/log on Linux or unix:/var/run/syslog on macOS.
To write logs to syslog, you need to set log.to to syslog.
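The two accepted forms of log.syslog.server are easy to distinguish by their prefix. The plain Lua sketch below shows this with a hypothetical helper, parse_syslog_server, which is not part of Tarantool. It assumes that the IPv4 form always includes a port, as in the example above.

```lua
-- Hypothetical helper (not part of Tarantool) that distinguishes the two
-- forms accepted by log.syslog.server: 'IPv4:port' and 'unix:<path>'.
local function parse_syslog_server(server)
    if server:sub(1, 5) == 'unix:' then
        return 'unix', server:sub(6)
    end
    -- Assumes the IPv4 form always carries a port, e.g. '127.0.0.1:514'.
    local host, port = server:match('^(%d+%.%d+%.%d+%.%d+):(%d+)$')
    if host then
        return 'inet', host, tonumber(port)
    end
    return nil, 'unsupported syslog server: ' .. server
end

local kind, host, port = parse_syslog_server('127.0.0.1:514')
print(kind, host, port)
print(parse_syslog_server('unix:/dev/log'))
```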
Example
In the example below, Tarantool writes logs to a syslog server that listens for logging messages on the 127.0.0.1:514 address:
log:
  to: syslog
  syslog:
    server: '127.0.0.1:514'
Example on GitHub: log_syslog
Type: string
Default: box.NULL
Environment variable: TT_LOG_SYSLOG_SERVER
The lua section outlines the configuration parameters related to the Lua environment within Tarantool.
Note
lua can be defined in any scope.
-
lua.memory
Specifies the maximum memory amount available to Lua scripts, measured in bytes.
When the specified value exceeds the current memory usage, the new limit takes effect immediately without a restart.
However, when the specified value is lower than the current memory usage, a restart of the instance is required for the change to take effect.
Example to set the Lua memory limit to 4 GB (4294967296 bytes):
lua:
  memory: 4294967296
Type: integer
Default: 2147483648 (2GB)
Environment variable: TT_LUA_MEMORY
The memtx section is used to configure parameters related to the memtx engine.
Note
memtx can be defined in any scope.
-
memtx.allocator
Specify the allocator that manages memory for memtx tuples.
Possible values:
system – the memory is allocated as needed, checking that the quota is not exceeded.
The allocator is based on the malloc function.
small – a slab allocator.
The allocator repeatedly uses a memory block to allocate objects of the same type.
Note that this allocator is prone to unresolvable fragmentation on specific workloads,
so you can switch to system in such cases.
Type: string
Default: ‘small’
Environment variable: TT_MEMTX_ALLOCATOR
-
memtx.max_tuple_size
Size of the largest allocation unit for the memtx storage engine in bytes.
It can be increased if it is necessary to store large tuples.
Type: integer
Default: 1048576
Environment variable: TT_MEMTX_MAX_TUPLE_SIZE
-
memtx.memory
The amount of memory in bytes that Tarantool allocates to store tuples.
When the limit is reached, INSERT and
UPDATE requests fail with the ER_MEMORY_ISSUE error.
The server does not go beyond the memtx.memory limit to allocate tuples, but there is additional memory
used to store indexes and connection information.
Example
In the example below, the memory size is set to 1 GB (1073741824 bytes).
memtx:
  memory: 1073741824
Type: integer
Default: 268435456
Environment variable: TT_MEMTX_MEMORY
-
memtx.min_tuple_size
Size of the smallest allocation unit in bytes.
It can be decreased if most of the tuples are very small.
Type: integer
Default: 16
Possible values: between 8 and 1048280 inclusive
Environment variable: TT_MEMTX_MIN_TUPLE_SIZE
-
memtx.slab_alloc_factor
The multiplier for computing the sizes of memory
chunks that tuples are stored in.
A lower value may result in less wasted
memory depending on the total amount of memory available and the
distribution of item sizes.
See also: memtx.slab_alloc_granularity
Type: number
Default: 1.05
Possible values: between 1 and 2 inclusive
Environment variable: TT_MEMTX_SLAB_ALLOC_FACTOR
-
memtx.slab_alloc_granularity
Specify the granularity in bytes of memory allocation in the small allocator.
The memtx.slab_alloc_granularity value should meet the following conditions:
- The value is a power of two.
- The value is greater than or equal to 4.
Below are a few recommendations on how to adjust the memtx.slab_alloc_granularity option:
- If the tuples in a space are small and have about the same size, set the option to 4 bytes to save memory.
- If the tuples are different-sized, increase the option value to allocate tuples from the same mempool (memory pool).
See also: memtx.slab_alloc_factor
Type: integer
Default: 8
Environment variable: TT_MEMTX_SLAB_ALLOC_GRANULARITY
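The plain Lua sketch below shows how memtx.slab_alloc_factor and memtx.slab_alloc_granularity interact. It is a simplified model and an assumption, not the small allocator's actual algorithm: chunk sizes start at memtx.min_tuple_size, each next size is the previous one multiplied by the factor and rounded up to a multiple of the granularity, and the granularity must be a power of two that is at least 4. The helper names are hypothetical.

```lua
-- Simplified model of allocation size classes (NOT the exact algorithm
-- of the small allocator).

-- Granularity must be a power of two and at least 4.
local function is_valid_granularity(g)
    if g < 4 or g % 1 ~= 0 then
        return false
    end
    -- A power of two reduces to 1 by repeated halving.
    while g % 2 == 0 do
        g = g / 2
    end
    return g == 1
end

-- Rounds n up to the nearest multiple of g.
local function align_up(n, g)
    return math.ceil(n / g) * g
end

-- Returns chunk sizes from min_size up to max_size.
local function size_classes(min_size, max_size, factor, granularity)
    assert(is_valid_granularity(granularity), 'invalid granularity')
    assert(factor > 1 and factor <= 2, 'factor must be in (1, 2]')
    local classes = {}
    local size = align_up(min_size, granularity)
    while size <= max_size do
        table.insert(classes, size)
        -- Grow by the factor, but always by at least one granule.
        local next_size = align_up(size * factor, granularity)
        if next_size <= size then
            next_size = size + granularity
        end
        size = next_size
    end
    return classes
end

print(table.concat(size_classes(16, 128, 1.05, 8), ' '))
-- 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128
```

With the default factor of 1.05 and granularity of 8, small sizes in this model grow by exactly one granule per step. For small tuples, the granularity rather than the factor determines the spacing between chunk sizes.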
-
memtx.sort_threads
The number of threads from the thread pool used to sort keys of secondary indexes on loading a memtx database.
The minimum value is 1, the maximum value is 256.
The default is to use all available cores.
Note
Since 3.0.0, this option replaces the approach in which OpenMP threads were used to parallelize sorting.
For backward compatibility, the OMP_NUM_THREADS environment variable is taken into account to
set the number of sorting threads.
Type: integer
Default: box.NULL
Environment variable: TT_MEMTX_SORT_THREADS
The metrics section defines configuration parameters for metrics.
Note
metrics can be defined in any scope.
-
metrics.exclude
An array containing the metrics to turn off.
The array can contain the same values as the exclude configuration parameter passed to metrics.cfg().
Example
metrics:
  include: [ all ]
  exclude: [ vinyl ]
  labels:
    alias: '{{ instance_name }}'
Type: array
Default: []
Environment variable: TT_METRICS_EXCLUDE
-
metrics.include
An array containing the metrics to turn on.
The array can contain the same values as the include configuration parameter passed to metrics.cfg().
Type: array
Default: [ all ]
Environment variable: TT_METRICS_INCLUDE
-
metrics.labels
Global labels to be added to every observation.
Type: map
Default: { alias = names.instance_name }
Environment variable: TT_METRICS_LABELS
The process section defines configuration parameters of the Tarantool process in the system.
Note
process can be defined in any scope.
-
process.background
Run the server as a daemon process.
If this option is set to true, Tarantool log location defined by the
log.to option should be set to
file, pipe, or syslog – anything other than stderr,
the default, because a daemon process is detached from a terminal
and it can’t write to the terminal’s stderr.
Important
Do not enable the background mode for applications intended to be run by the
tt utility. For more information, see the tt start reference.
Type: boolean
Default: false
Environment variable: TT_PROCESS_BACKGROUND
-
process.coredump
Create coredump files.
Usually, an administrator needs to call ulimit -c unlimited
(or set corresponding options in systemd’s unit file)
before running a Tarantool process to get core dumps.
If process.coredump is enabled, Tarantool sets the corresponding
resource limit by itself
and the administrator doesn’t need to call ulimit -c unlimited
(see man 3 setrlimit).
This option also sets the state of the dumpable attribute,
which is enabled by default,
but may be dropped in some circumstances (according to
man 2 prctl, see PR_SET_DUMPABLE).
Type: boolean
Default: false
Environment variable: TT_PROCESS_COREDUMP
-
process.title
Add the given string to the server’s process title
(it is shown in the COMMAND column for the Linux commands
ps -ef and top -c).
For example, if you set the option to myservice - {{ instance_name }}:
process:
  title: myservice - {{ instance_name }}
ps -ef might show the Tarantool server process like this:
$ ps -ef | grep tarantool
503 68100 68098 0 10:33 pts/2 00:00.10 tarantool <running>: myservice instance1
Type: string
Default: ‘tarantool - {{ instance_name }}’
Environment variable: TT_PROCESS_TITLE
-
process.pid_file
Store the process id in this file.
This option may contain a relative file path.
In this case, it is interpreted as relative to
process.work_dir.
Type: string
Default: ‘var/run/{{ instance_name }}/tarantool.pid’
Environment variable: TT_PROCESS_PID_FILE
-
process.strip_core
Whether coredump files should not include memory allocated for tuples –
this memory can be large if Tarantool runs under heavy load.
Setting to true means “do not include”.
Type: boolean
Default: true
Environment variable: TT_PROCESS_STRIP_CORE
-
process.username
The name of the system user to switch to after start.
Type: string
Default: box.NULL
Environment variable: TT_PROCESS_USERNAME
-
process.work_dir
A directory where Tarantool working files will be stored
(database files, logs, a PID file, a console Unix socket, and other files
if an application generates them in the current directory).
The server instance switches to process.work_dir with
chdir(2) after start.
If set as a relative file path, it is relative to the current
working directory, from where Tarantool is started.
If not specified, defaults to the current working directory.
Other directory and file parameters, if set as relative paths,
are interpreted as relative to process.work_dir, for example, directories for storing
snapshots and write-ahead logs.
Type: string
Default: box.NULL
Environment variable: TT_PROCESS_WORK_DIR
The replication section defines configuration parameters related to replication.
-
replication.anon
Whether to make the current instance act as an anonymous replica.
Anonymous replicas are read-only and can be used, for example, for backups.
To make the specified instance act as an anonymous replica, set replication.anon to true:
instance003:
  replication:
    anon: true
You can find the full example on GitHub: anonymous_replica.
Anonymous replicas are not displayed in the box.info.replication section.
You can check their status using box.info.replication_anon().
While anonymous replicas are read-only, you can write data to replication-local and temporary spaces (created with is_local = true and temporary = true, respectively).
Given that changes to replication-local spaces are allowed, an anonymous replica might increase the 0 component of the vclock value.
Here are the limitations of having anonymous replicas in a replica set:
Note
Anonymous replicas are not registered in the _cluster space.
This means that there is no limitation on the number of anonymous replicas in a replica set.
Type: boolean
Default: false
Environment variable: TT_REPLICATION_ANON
-
replication.autoexpel
Since: 3.3.0
The replication.autoexpel option is designed for managing dynamic clusters using YAML-based configurations.
It enables the automatic expulsion of instances that are removed from the YAML configuration.
Only instances with names that match the specified prefix are considered for expulsion; all others are excluded.
Additionally, instances without a persistent name are ignored.
If an instance is in read-write mode and has the latest database schema, it initiates the expulsion of instances that:
- Match the specified prefix
- Are absent from the updated YAML configuration
The expulsion process follows the standard procedure, involving the removal of the instance from the _cluster system space.
The autoexpel logic is activated during specific events:
- Startup. When the cluster starts, autoexpel checks and removes instances not matching the updated configuration.
- Reconfiguration. When the YAML configuration is reloaded, autoexpel compares the current state to the updated configuration and performs necessary expulsions.
- box.status watcher event. Changes detected by the box.status watcher also trigger the autoexpel mechanism.
autoexpel does not take any actions on newly joined instances unless one of the triggering events occurs.
This means that an instance meeting the autoexpel criterion can still join the cluster, but it may be removed
later during reconfiguration or on subsequent triggering events.
Note
The replication.autoexpel option governs the expelling process and is configurable at the replicaset, group, and
global levels. It is not applicable at the instance level.
Configuration fields
by (string, default: nil): specifies the autoexpel criterion. Currently, only prefix is supported and must be explicitly set.
enabled (boolean, default: false): enables or disables the autoexpel logic.
prefix (string, default: nil): defines the pattern for instance names that are considered part of the cluster.
replication.autoexpel_by.*
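The purpose of replication.autoexpel_by is to define the criterion used to determine which instances in a cluster are
subject to the autoexpel process.
The by field helps differentiate between cluster instances defined in the YAML configuration and external services (for example, CDC tools).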
The default value of by is nil, meaning no autoexpel criterion is applied unless explicitly set.
Currently, the only supported value for by is prefix. The prefix value instructs the system to identify instances
based on their names, matching them against a prefix pattern defined in the configuration.
If the autoexpel feature is enabled (enabled: true), the by field must be explicitly set to prefix.
The absence of this field or an unsupported value will result in configuration errors.
replication:
  autoexpel:
    enabled: true
    by: prefix
    prefix: '{{ replicaset_name }}'
Type: string
Default: nil
Environment variable: TT_REPLICATION_AUTOEXPEL_BY
replication.autoexpel_enabled.*
The replication.autoexpel_enabled field is a boolean configuration option that determines whether the autoexpel logic is active for the cluster.
This feature is designed to automatically manage dynamic cluster configurations by removing instances that are no longer present in the YAML configuration.
Note
By default, the enabled field is set to false, meaning the autoexpel logic is turned off. This ensures that no instances are automatically removed unless explicitly configured.
Enabling autoexpel logic
To enable autoexpel, you should set enabled to true in the replication.autoexpel section of your YAML configuration:
replication:
  autoexpel:
    enabled: true
    by: prefix
    prefix: '{{ replicaset_name }}'
To disable autoexpel, set enabled to false.
Dependencies
If enabled is set to true, the following fields are required:
by: specifies the criterion for autoexpel (e.g., prefix).
prefix: defines the pattern used to match instance names for expulsion.
Failure to configure these fields when enabled is true will result in a configuration error.
Type: boolean
Default: false
Environment variable: TT_REPLICATION_AUTOEXPEL_ENABLED
replication.autoexpel_prefix.*
The prefix field filters instances for expulsion by differentiating cluster instances (from the YAML configuration) from external services (e.g., CDC tools). Only instances matching the prefix are considered.
A consistent naming pattern ensures the _cluster system space automatically aligns with the YAML configuration.
If the prefix field is not set (nil), the autoexpel logic cannot identify instances for expulsion, and the feature will not function.
This field is mandatory when replication.autoexpel_enabled is set to true.
How it works:
- The prefix filters instance names (e.g., {{ replicaset_name }} for replicaset-specific names or i- for names starting with i-).
- Instances matching the prefix and removed from the YAML configuration are expelled.
- Unnamed instances or those not matching the prefix are ignored.
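The three rules above can be read as a filter over the instances registered in the _cluster space. The plain Lua sketch below is a hypothetical model of that filter, not Tarantool's code. The record layout ({ name = ... }), the instances_to_expel and has_prefix helpers, and the resolved prefix 'r-001' (what '{{ replicaset_name }}' yields for replica set r-001) are assumptions for illustration.

```lua
-- Hypothetical model of autoexpel with by: prefix (not Tarantool's code).

-- Plain prefix comparison; avoids Lua pattern magic characters such as '-'.
local function has_prefix(name, prefix)
    return name:sub(1, #prefix) == prefix
end

-- registered: records from _cluster, e.g. { name = 'r-001-i-003' };
--             name is absent for instances without a persistent name.
-- configured: set of instance names present in the YAML configuration.
-- prefix:     resolved prefix, e.g. 'r-001' for '{{ replicaset_name }}'.
local function instances_to_expel(registered, configured, prefix)
    local result = {}
    for _, instance in ipairs(registered) do
        local name = instance.name
        if name ~= nil                    -- unnamed instances are ignored
            and has_prefix(name, prefix)  -- only matching names count
            and not configured[name] then -- still in the YAML file: keep
            table.insert(result, name)
        end
    end
    return result
end

local registered = {
    { name = 'r-001-i-001' },
    { name = 'r-001-i-002' },
    { name = 'r-001-i-003' },
    { name = 'cdc-reader' },  -- external service: prefix does not match
    {},                       -- instance without a persistent name
}
local configured = { ['r-001-i-001'] = true, ['r-001-i-002'] = true }

print(table.concat(instances_to_expel(registered, configured, 'r-001'), ', '))
-- r-001-i-003
```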
Dynamic prefix based on replicaset name:
replication:
  autoexpel:
    enabled: true
    by: prefix
    prefix: '{{ replicaset_name }}'
In this setup:
- Instances are grouped by replicaset names (e.g., r-001-i-001 for replicaset r-001).
- The prefix ensures that only instances with names matching the replicaset name are automatically expelled when removed from the configuration.
Static prefix for matching patterns:
replication:
  autoexpel:
    enabled: true
    by: prefix
    prefix: 'i-'
In this setup:
- All instances with names starting with i- (e.g., i-001, i-002) are considered for expulsion.
- This is useful when instances follow a uniform naming convention.
Type: string
Default: nil
Environment variable: TT_REPLICATION_AUTOEXPEL_PREFIX
- Create a config.yaml file with the following content:
credentials:
  users:
    guest:
      roles: [super]
replication:
  failover: manual
  autoexpel:
    enabled: true
    by: prefix
    prefix: '{{ replicaset_name }}'
iproto:
  listen:
    - uri: 'unix/:./var/run/{{ instance_name }}.iproto'
groups:
  g-001:
    replicasets:
      r-001:
        leader: r-001-i-001
        instances:
          r-001-i-001: {}
          r-001-i-002: {}
          r-001-i-003: {}
- This configuration:
  - Sets up authentication with a guest user assigned the super role.
  - Enables the autoexpel option to automatically expel instances not present in the YAML file.
  - Uses the {{ replicaset_name }} prefix pattern to match instance names.
  - Lists three instances: r-001-i-001, r-001-i-002, and r-001-i-003.
- Open three terminal windows and start one instance in each using the following commands:
tarantool --name r-001-i-001 --config config.yaml -i
tarantool --name r-001-i-002 --config config.yaml -i
tarantool --name r-001-i-003 --config config.yaml -i
- Edit config.yaml and remove the following entry for r-001-i-003:
r-001-i-003: {}
The updated config.yaml should look like this:
groups:
  g-001:
    replicasets:
      r-001:
        leader: r-001-i-001
        instances:
          r-001-i-001: {}
          r-001-i-002: {}
Save the file.
- For the leader instance (r-001-i-001), check the _cluster space:
box.space._cluster:fselect()
Hint
The _cluster system space in Tarantool stores metadata about all instances currently recognized as part of the cluster.
It shows which instances are registered and active.
You should see r-001-i-003 still listed in the _cluster system space.
- Reload the configuration:
config = require('config')
config:reload()
- Verify the changes:
box.space._cluster:fselect()
After the reload, r-001-i-003 should no longer appear in the _cluster system space.
-
replication.bootstrap_strategy
Specifies a strategy used to bootstrap a replica set.
The following strategies are available:
auto: a node doesn’t boot if half or more of the other nodes in a replica set are not connected.
For example, if a replica set contains 2 or 3 nodes, a node requires 2 connected instances.
In the case of 4 or 5 nodes, at least 3 connected instances are required.
Moreover, a bootstrap leader fails to boot unless every connected node has chosen it as a bootstrap leader.
config: use the specified node to bootstrap a replica set.
To specify the bootstrap leader, use the <replicaset_name>.bootstrap_leader option.
supervised: a bootstrap leader isn’t chosen automatically but should be appointed using box.ctl.make_bootstrap_leader() on the desired node.
legacy (deprecated since 2.11.0): a node requires the replication_connect_quorum number of other nodes to be connected.
This option is added to keep the compatibility with the current versions of Cartridge and might be removed in the future.
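The examples given for the auto strategy match a simple majority rule, floor(N / 2) + 1, where N is the replica set size. The plain Lua sketch below only reproduces that arithmetic. The formula is inferred from the examples rather than taken from Tarantool's source, and it assumes the count includes the booting node itself. The helper name required_connected is hypothetical.

```lua
-- Connected instances a node needs under bootstrap_strategy: auto,
-- inferred from the examples (2 or 3 nodes -> 2, 4 or 5 nodes -> 3).
local function required_connected(replicaset_size)
    return math.floor(replicaset_size / 2) + 1
end

for n = 2, 5 do
    print(('%d nodes -> %d connected instances required'):format(
        n, required_connected(n)))
end
```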
Type: string
Default: auto
Environment variable: TT_REPLICATION_BOOTSTRAP_STRATEGY
-
replication.connect_timeout
A timeout (in seconds) a replica waits when trying to connect to a master in a cluster.
See orphan status for details.
This parameter is different from
replication.timeout,
which a master uses to disconnect a replica when the master
receives no acknowledgments of heartbeat messages.
Type: number
Default: 30
Environment variable: TT_REPLICATION_CONNECT_TIMEOUT
-
replication.election_mode
A role of a replica set node in the leader election process.
The possible values are:
off: a node doesn’t participate in the election activities.
voter: a node can participate in the election process but can’t be a leader.
candidate: a node should be able to become a leader.
manual: allows you to control which instance is the leader explicitly instead of relying on automated leader election.
In manual mode, the instance acts like a voter by default: it is read-only and may vote for other candidate instances.
Once box.ctl.promote() is called, the instance becomes a candidate and starts a new election round.
If the instance wins the elections, it becomes a leader but won't participate in any new elections.
Note
You can set replication.election_mode to a value other than off only if replication.failover is set to election.
Type: string
Environment variable: TT_REPLICATION_ELECTION_MODE
-
replication.election_timeout
Specifies the timeout (in seconds) between election rounds in the
leader election process if the previous round
ended up with a split vote.
The default value is quite large, and in most cases it can be lowered to
300-400 ms.
To avoid repeated split votes, the timeout is randomized on each node
during every new election, from 100% to 110% of the original timeout value.
For example, if the timeout is 300 ms and 3 nodes start
the election simultaneously in the same term,
they can set their election timeouts to 300, 310, and 320 respectively,
or to 305, 302, and 324, and so on. This way, the votes are unlikely to be split again,
because the election on different nodes won't be restarted simultaneously.
Type: number
Default: 5
Environment variable: TT_REPLICATION_ELECTION_TIMEOUT
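The randomization described above picks a per-node timeout between 100% and 110% of the configured value. The plain Lua sketch below illustrates this. The uniform distribution and the helper name randomized_election_timeout are assumptions for illustration, not Tarantool's implementation.

```lua
-- Sketch: each node picks an election timeout in [100%, 110%) of the
-- configured value. A uniform distribution is assumed.
local function randomized_election_timeout(base_timeout)
    return base_timeout * (1 + 0.1 * math.random())
end

math.randomseed(os.time())
for node = 1, 3 do
    print(('node %d: %.3f s'):format(node, randomized_election_timeout(0.3)))
end
```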
-
replication.election_fencing_mode
Specifies the leader fencing mode that
affects the leader election process. When the parameter is set to soft
or strict, the leader resigns its leadership if it has fewer than
replication.synchro_quorum alive connections to the cluster nodes.
The resigning leader receives the status of a follower in the current election term and becomes
read-only.
- In soft mode, a connection is considered dead if there are no responses for
4 * replication.timeout seconds both on the current leader and the followers.
- In strict mode, a connection is considered dead if there are no responses
for 2 * replication.timeout seconds on the current leader and
4 * replication.timeout seconds on the followers. This improves the chances that there is only one leader at any time.
Fencing applies to the instances that have
replication.election_mode set to candidate or manual.
To turn off leader fencing, set election_fencing_mode to off.
Type: string
Default: soft
Possible values: off, soft, strict
Environment variable: TT_REPLICATION_ELECTION_FENCING_MODE
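The fencing modes differ only in how long a connection may stay silent before it counts as dead. The plain Lua sketch below restates those rules as a function. The function name and the role strings 'leader' and 'follower' are hypothetical. The example calls use the default replication.timeout of 1 second.

```lua
-- Seconds without responses after which a connection counts as dead,
-- per election_fencing_mode and role. Returns nil when fencing is off.
local function dead_connection_after(mode, role, replication_timeout)
    if mode == 'off' then
        return nil
    elseif mode == 'soft' then
        -- Same threshold on the leader and on the followers.
        return 4 * replication_timeout
    elseif mode == 'strict' then
        -- The leader gives up earlier than the followers do.
        if role == 'leader' then
            return 2 * replication_timeout
        end
        return 4 * replication_timeout
    end
    error('unknown election_fencing_mode: ' .. tostring(mode))
end

print(dead_connection_after('soft', 'leader', 1))     -- 4
print(dead_connection_after('strict', 'leader', 1))   -- 2
print(dead_connection_after('strict', 'follower', 1)) -- 4
```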
-
replication.failover
A failover mode used to take over a master role when the current master instance fails.
The following modes are available:
See also: Replication tutorials
Note
replication.failover can be defined in the global, group, and replica set scope.
Example
In the example below, the following configuration options are specified:
- In the credentials section, the replicator user with the replication role is created.
- iproto.advertise.peer specifies that other instances should connect to an address defined in iproto.listen using the replicator user.
- replication.failover specifies that a master instance should be set manually.
- <replicaset_name>.leader sets instance001 as a replica set leader.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
iproto:
  advertise:
    peer:
      login: replicator
replication:
  failover: manual
groups:
  group001:
    replicasets:
      replicaset001:
        leader: instance001
        instances:
          instance001:
            iproto:
              listen:
                - uri: '127.0.0.1:3301'
          instance002:
            iproto:
              listen:
                - uri: '127.0.0.1:3302'
          instance003:
            iproto:
              listen:
                - uri: '127.0.0.1:3303'
Type: string
Default: off
Environment variable: TT_REPLICATION_FAILOVER
-
replication.peers
URIs of instances that constitute a replica set.
These URIs are used by an instance to connect to another instance as a replica.
Alternatively, you can use iproto.advertise.peer to specify a URI used to advertise the current instance to other cluster members.
Example
In the example below, the following configuration options are specified:
- In the credentials section, the replicator user with the replication role is created.
- replication.peers specifies URIs of replica set instances.
credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [replication]
replication:
  peers:
    - replicator:topsecret@127.0.0.1:3301
    - replicator:topsecret@127.0.0.1:3302
    - replicator:topsecret@127.0.0.1:3303
Type: array
Environment variable: TT_REPLICATION_PEERS
-
replication.skip_conflict
By default, if a replica adds a unique key that another replica has
added, replication stops
with the ER_TUPLE_FOUND error.
If replication.skip_conflict is set to true, such errors are ignored.
Note
Instead of saving the broken transaction to the write-ahead log, it is written as NOP (No operation).
Type: boolean
Default: false
Environment variable: TT_REPLICATION_SKIP_CONFLICT
-
replication.sync_lag
The maximum delay (in seconds) between the time when data is written to the master and the time when it is written to a replica.
If replication.sync_lag is set to nil or 365 * 100 * 86400 (TIMEOUT_INFINITY),
a replica is always considered to be “synced”.
Note
This parameter is ignored during bootstrap.
See orphan status for details.
Type: number
Default: 10
Environment variable: TT_REPLICATION_SYNC_LAG
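The sync_lag rule can be sketched as a simple comparison. TIMEOUT_INFINITY works out to 365 * 100 * 86400 = 3153600000 seconds, about 100 years. The plain Lua sketch below is a hypothetical model, and is_synced is not a Tarantool function. Treating a lag exactly equal to sync_lag as synced is an assumption.

```lua
-- Hypothetical model of the replication.sync_lag check.
local TIMEOUT_INFINITY = 365 * 100 * 86400  -- 3153600000 s, about 100 years

-- lag: seconds between a write on the master and its arrival on the replica.
local function is_synced(lag, sync_lag)
    -- nil or TIMEOUT_INFINITY: the replica is always considered synced.
    if sync_lag == nil or sync_lag >= TIMEOUT_INFINITY then
        return true
    end
    return lag <= sync_lag
end

print(is_synced(5, 10))   -- true
print(is_synced(11, 10))  -- false
```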
-
replication.sync_timeout
The timeout (in seconds) that a node waits when trying to sync with
other nodes in a replica set after connecting or during a configuration update.
This could fail indefinitely if replication.sync_lag is smaller than network latency, or if the replica cannot keep pace with master updates.
If replication.sync_timeout expires, the replica enters orphan status.
Type: number
Default: 0
Environment variable: TT_REPLICATION_SYNC_TIMEOUT
-
replication.synchro_queue_max_size
Since: 3.3.0
The maximum size of the synchronous transaction queue on a master node, in bytes. The size limit isn’t strict, i.e. if there’s at least one free byte, the whole write request fits and no blocking is involved.
This parameter ensures that the queue does not grow indefinitely, potentially impacting performance and resource usage, and applies only to the master node.
The 0 value disables the limit.
If the synchronous queue reaches the configured size limit, new transactions attempting to enter the queue are discarded.
In such cases, the system returns an error to the user:
The synchronous transaction queue is full.
This size limitation does not apply during the recovery process. Transactions processed during recovery are unaffected by the queue size limit.
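The admission rule described above can be written as a small predicate. The plain Lua sketch below is a hypothetical model, and can_enter_synchro_queue is not a Tarantool function. It ignores the size of the incoming request because the limit is not strict, treats 0 as "no limit", and skips the check during recovery.

```lua
-- Hypothetical model of the synchronous queue admission rule.
local function can_enter_synchro_queue(queue_size, max_size, is_recovery)
    -- Recovery is not limited; 0 disables the limit.
    if is_recovery or max_size == 0 then
        return true
    end
    -- Not strict: any request fits while at least one byte is free.
    return queue_size < max_size
end

local MAX = 16 * 1024 * 1024  -- default: 16777216 bytes (16 MB)
print(can_enter_synchro_queue(MAX - 1, MAX))  -- true: one byte is still free
print(can_enter_synchro_queue(MAX, MAX))      -- false: the queue is full
```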
Use the following command to view the current size of the synchronous queue:
box.info.synchro.queue.size
Set the synchronous queue size limit in the configuration file:
replication:
  synchro_queue_max_size: 33554432 # Limit set to 32 MB
Type: integer
Default: 16777216 (16 MB)
Environment variable: TT_REPLICATION_SYNCHRO_QUEUE_MAX_SIZE
-
replication.synchro_quorum
The number of replicas that should confirm the receipt of a synchronous transaction before it can finish its commit.
This option supports dynamic evaluation of the quorum number.
For example, the default value is N / 2 + 1 where N is the current number of replicas registered in a cluster.
Once any replicas are added or removed, the expression is re-evaluated automatically.
Note that the default value (at least 50% of the cluster size + 1) guarantees data reliability.
Using a value less than the canonical one might lead to unexpected results,
including a split-brain.
replication.synchro_quorum is not used on replicas. If the master fails, the pending synchronous
transactions will be kept waiting on the replicas until a new master is elected.
Note
replication.synchro_quorum does not account for anonymous replicas.
Type: string, number
Default: N / 2 + 1
Environment variable: TT_REPLICATION_SYNCHRO_QUORUM
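The default expression N / 2 + 1 grows as replicas are registered. The plain Lua sketch below evaluates it for a few cluster sizes. Truncating a fractional result (for example, 3 / 2 + 1 = 2.5 becomes 2) is an assumption, and default_synchro_quorum is a hypothetical helper.

```lua
-- Evaluates the default replication.synchro_quorum expression 'N / 2 + 1'.
-- Truncating the fractional part is an assumption made for this sketch.
local function default_synchro_quorum(n)
    return math.floor(n / 2 + 1)
end

for n = 1, 5 do
    print(('N = %d -> quorum = %d'):format(n, default_synchro_quorum(n)))
end
```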
-
replication.synchro_timeout
For synchronous replication only.
Specify how many seconds to wait for a synchronous transaction quorum
replication until it is declared failed and is rolled back.
It is not used on replicas, so if the master fails, the pending synchronous
transactions will be kept waiting on the replicas until a new master is
elected.
Type: number
Default: 5
Environment variable: TT_REPLICATION_SYNCHRO_TIMEOUT
-
replication.threads
The number of threads spawned to decode the incoming replication data.
In most cases, one thread is enough for all incoming data.
Possible values range from 1 to 1000.
If there are multiple replication threads, connections to serve are distributed evenly between the threads.
Type: integer
Default: 1
Environment variable: TT_REPLICATION_THREADS
-
replication.timeout
A time interval (in seconds) used by a master to send heartbeat requests to a replica when there are no updates to send to this replica.
For each request, a replica should return a heartbeat acknowledgment.
If a master or replica gets no heartbeat message for 4 * replication.timeout seconds, a connection is dropped and a replica tries to reconnect to the master.
See also: Monitoring a replica set
Type: number
Default: 1
Environment variable: TT_REPLICATION_TIMEOUT
This section describes configuration parameters related to application roles.
Note
Configuration parameters related to roles can be defined in any scope.
-
roles
Specify the roles of an instance.
To specify a role’s configuration, use the roles_cfg option.
See also: Enabling and configuring roles
Type: array
Default: nil
Environment variable: TT_ROLES
-
roles_cfg
Specify a role’s configuration.
This option accepts a role name as the key and a role’s configuration as the value.
To specify the roles of an instance, use the roles option.
See also: Enabling and configuring roles
Tip
The experimental.config.utils.schema
built-in module provides an API for managing user-defined configurations
of applications (app.cfg) and roles (roles_cfg).
Type: map
Default: nil
Environment variable: TT_ROLES_CFG
Enterprise Edition
Configuring security parameters is available in the Enterprise Edition only.
The security section defines configuration parameters related to various security settings.
Note
security can be defined in any scope.
-
security.auth_delay
Specify a period of time (in seconds) that a specific user should wait for the next attempt after failed authentication.
The security.auth_retries option lets a client try to authenticate the specified number of times before security.auth_delay is enforced.
In the configuration below, Tarantool lets a client try to authenticate with the same username three times.
At the fourth attempt, the authentication delay configured with security.auth_delay is enforced.
This means that a client should wait 10 seconds after the first failed attempt.
security:
  auth_delay: 10
  auth_retries: 2
Type: number
Default: 0
Environment variable: TT_SECURITY_AUTH_DELAY
-
security.auth_retries
Specify the maximum number of authentication retries allowed before security.auth_delay is enforced.
The default value is 0, which means security.auth_delay is enforced after the first failed authentication attempt.
The retry counter is reset after security.auth_delay seconds since the first failed attempt.
For example, if a client tries to authenticate fewer than security.auth_retries times within security.auth_delay seconds, no authentication delay is enforced.
The retry counter is also reset after any successful authentication attempt.
Type: integer
Default: 0
Environment variable: TT_SECURITY_AUTH_RETRIES
-
security.auth_type
Specify a protocol used to authenticate users.
The possible values are:
chap-sha1: use the CHAP protocol with SHA-1 hashing applied to passwords.
pap-sha256: use PAP authentication with the SHA256 hashing algorithm.
Note that CHAP stores password hashes in the _user space unsalted.
If an attacker gains access to the database, they may crack a password, for example, using a rainbow table.
For PAP, a password is salted with a user-unique salt before saving it in the database,
which keeps the database protected from cracking using a rainbow table.
To enable PAP, specify the security.auth_type option as follows:
security:
  auth_type: 'pap-sha256'
Type: string
Default: ‘chap-sha1’
Environment variable: TT_SECURITY_AUTH_TYPE
-
security.disable_guest
If true, turn off access over remote connections from unauthenticated or guest users.
This option affects connections between cluster members and net.box connections.
Type: boolean
Default: false
Environment variable: TT_SECURITY_DISABLE_GUEST
-
security.password_enforce_digits
If true, a password should contain digits (0-9).
Type: boolean
Default: false
Environment variable: TT_SECURITY_PASSWORD_ENFORCE_DIGITS
-
security.password_enforce_lowercase
If true, a password should contain lowercase letters (a-z).
Type: boolean
Default: false
Environment variable: TT_SECURITY_PASSWORD_ENFORCE_LOWERCASE
-
security.password_enforce_specialchars
If true, a password should contain at least one special character (such as &|?!@$).
Type: boolean
Default: false
Environment variable: TT_SECURITY_PASSWORD_ENFORCE_SPECIALCHARS
-
security.password_enforce_uppercase
If true, a password should contain uppercase letters (A-Z).
Type: boolean
Default: false
Environment variable: TT_SECURITY_PASSWORD_ENFORCE_UPPERCASE
-
security.password_history_length
Specify the number of unique new user passwords before an old password can be reused.
Note
Tarantool uses the auth_history field in the
box.space._user
system space to store user passwords.
Type: integer
Default: 0
Environment variable: TT_SECURITY_PASSWORD_HISTORY_LENGTH
-
security.password_lifetime_days
Specify the maximum period of time (in days) a user can use the same password.
When this period ends, a user gets the “Password expired” error on a login attempt.
To restore access for such users, use box.schema.user.passwd.
Note
The default 0 value means that a password never expires.
Type: integer
Default: 0
Environment variable: TT_SECURITY_PASSWORD_LIFETIME_DAYS
-
security.password_min_length
Specify the minimum number of characters for a password.
Type: integer
Default: 0
Environment variable: TT_SECURITY_PASSWORD_MIN_LENGTH
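The password policy options above can be combined in one security section (Enterprise Edition only). In the sketch below, which uses illustrative values rather than recommendations, passwords must be at least 12 characters long and contain uppercase and lowercase letters, digits, and special characters. A user must set three unique new passwords before reusing an old one, and a password expires after 90 days:

```yaml
security:
  password_min_length: 12
  password_enforce_uppercase: true
  password_enforce_lowercase: true
  password_enforce_digits: true
  password_enforce_specialchars: true
  password_history_length: 3
  password_lifetime_days: 90
```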
-
security.secure_erasing
If true, forces Tarantool to overwrite a data file a few times before deletion to render recovery of a deleted file impossible.
The option applies to both .xlog and .snap files as well as Vinyl data files.
Type: boolean
Default: false
Environment variable: TT_SECURITY_SECURE_ERASING
The sharding section defines configuration parameters related to sharding.
Note
Sharding support requires installing the vshard module.
The minimum required version of vshard is 0.1.25.
-
sharding.bucket_count
The total number of buckets in a cluster.
Learn more in Bucket count.
Example
sharding:
  bucket_count: 1000
Type: integer
Default: 3000
Environment variable: TT_SHARDING_BUCKET_COUNT
-
sharding.discovery_mode
A mode of the background discovery fiber used by the router to find buckets.
Learn more in vshard.router.discovery_set().
Type: string
Default: ‘on’
Possible values: ‘on’, ‘off’, ‘once’
Environment variable: TT_SHARDING_DISCOVERY_MODE
-
sharding.failover_ping_timeout
The timeout (in seconds) after which a node is considered unavailable if there are no responses during this period.
The failover fiber is used to detect if a node is down.
Type: number
Default: 5
Environment variable: TT_SHARDING_FAILOVER_PING_TIMEOUT
-
sharding.lock
Whether a replica set is locked.
A locked replica set can neither receive new buckets nor migrate its own buckets.
Type: boolean
Default: nil
Environment variable: TT_SHARDING_LOCK
-
sharding.rebalancer_disbalance_threshold
The maximum bucket disbalance threshold (in percent).
The disbalance is calculated for each replica set using the following formula:
|etalon_bucket_count - real_bucket_count| / etalon_bucket_count * 100
Type: number
Default: 1
Environment variable: TT_SHARDING_REBALANCER_DISBALANCE_THRESHOLD
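Here is a worked example of the formula. Assume bucket_count is 3000 and there are three replica sets with equal weights, so each replica set's etalon_bucket_count is 3000 / 3 = 1000. If one replica set actually holds 1030 buckets, its disbalance is:

```latex
\text{disbalance} = \frac{\lvert 1000 - 1030 \rvert}{1000} \times 100 = 3
```

A disbalance of 3 percent is above the default threshold of 1 percent.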
-
sharding.rebalancer_max_receiving
The maximum number of buckets that can be received in parallel by a single replica set.
This number must be limited because the rebalancer sends a large number of buckets from the existing replica sets to the newly added one.
This produces a heavy load on the new replica set.
Example
Suppose rebalancer_max_receiving is 100 and bucket_count is 1000.
There are three replica sets holding 333, 333, and 334 buckets, respectively.
When a new replica set is added, each replica set’s etalon_bucket_count becomes
equal to 250. Rather than receiving all 250 buckets at once, the new replica set
receives 100, 100, and 50 buckets sequentially.
Type: integer
Default: 100
Environment variable: TT_SHARDING_REBALANCER_MAX_RECEIVING
-
sharding.rebalancer_max_sending
The degree of parallelism for parallel rebalancing.
Type: integer
Default: 1
Maximum: 15
Environment variable: TT_SHARDING_REBALANCER_MAX_SENDING
-
sharding.rebalancer_mode
Since: 3.1.0
Configure how a rebalancer is selected:
auto (default): if there are no replica sets with the rebalancer sharding role (sharding.roles), a replica set with the rebalancer is selected automatically among all replica sets.
manual: one of the replica sets should have the rebalancer sharding role. The rebalancer is in this replica set.
off: rebalancing is turned off regardless of whether a replica set with the rebalancer sharding role exists or not.
Type: string
Default: ‘auto’
Environment variable: TT_SHARDING_REBALANCER_MODE
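A sketch of selecting the rebalancer manually. It assumes rebalancer_mode is set at the global level and sharding.roles at the replica set level, as in the sharding.roles example. The group and replica set names are hypothetical, and instances and other required options are omitted:

```yaml
sharding:
  rebalancer_mode: manual

groups:
  storages:
    replicasets:
      storage-a:
        sharding:
          # This replica set hosts the rebalancer
          roles: [storage, rebalancer]
      storage-b:
        sharding:
          roles: [storage]
```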
-
sharding.roles
Roles of a replica set with regard to sharding.
A replica set can have the following roles:
router: a replica set acts as a router.
storage: a replica set acts as a storage.
rebalancer: a replica set acts as a rebalancer.
The rebalancer role is optional.
If it is not specified, a rebalancer is selected automatically from the master instances of replica sets.
There can be at most one replica set with the rebalancer role.
This replica set should also have the storage role.
Example
replicasets:
  storage-a:
    sharding:
      roles: [storage, rebalancer]
See also: Sharding roles
Type: array
Default: nil
Environment variable: TT_SHARDING_ROLES
-
sharding.sched_move_quota
A scheduler’s bucket move quota used by the rebalancer.
sched_move_quota defines how many bucket moves can be done in a row if there are pending storage refs.
Then, bucket moves are blocked and a router continues making map-reduce requests.
See also: sharding.sched_ref_quota
Type: number
Default: 1
Environment variable: TT_SHARDING_SCHED_MOVE_QUOTA
-
sharding.sched_ref_quota
A scheduler’s storage ref quota used by a router’s map-reduce API.
For example, the vshard.router.map_callrw() function implements consistent map-reduce over the entire cluster.
sched_ref_quota defines how many storage refs (and therefore how many map-reduce requests) can be executed on the storage in a row if there are pending bucket moves.
Then, storage refs are blocked and the rebalancer continues bucket moves.
See also: sharding.sched_move_quota
Type: number
Default: 300
Environment variable: TT_SHARDING_SCHED_REF_QUOTA
-
sharding.shard_index
The name or ID of a TREE index over the bucket ID.
Spaces without this index do not participate in a sharded Tarantool
cluster and can be used as regular spaces if needed. The bucket ID must be
the first part of the index; other parts are optional.
See also: Data definition
Type: string
Default: ‘bucket_id’
Environment variable: TT_SHARDING_SHARD_INDEX
-
sharding.sync_timeout
The timeout to wait for synchronization of the old master with replicas before demotion.
Used when switching a master or when manually calling the sync() function.
Type: number
Default: 1
Environment variable: TT_SHARDING_SYNC_TIMEOUT
-
sharding.weight
Since: 3.1.0
The relative amount of data that a replica set can store.
Learn more at Replica set weights.
Type: number
Default: 1
Environment variable: TT_SHARDING_WEIGHT
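A sketch of two storage replica sets with different weights. The replica set names are hypothetical, and instances and other required options are omitted. With these weights, storage-b is expected to hold about twice as many buckets as storage-a. For example, with 3000 buckets, that is roughly 1000 and 2000:

```yaml
groups:
  storages:
    replicasets:
      storage-a:
        sharding:
          roles: [storage]
          weight: 1
      storage-b:
        sharding:
          roles: [storage]
          # Expected to store about twice as much data as storage-a
          weight: 2
```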
-
sharding.zone
A zone that can be set for routers and replicas.
This allows sending read-only requests not only to a master instance but to any available replica that is the nearest to the router.
Note
sharding.zone can be specified at any level.
Type: integer
Default: nil
Environment variable: TT_SHARDING_ZONE
The snapshot section defines configuration parameters related to the snapshot files.
To learn more about the snapshots’ configuration, check the Persistence page.
Note
snapshot can be defined in any scope.
-
snapshot.dir
A directory where memtx stores snapshot (.snap) files.
A relative path in this option is interpreted as relative to process.work_dir.
By default, snapshots and WAL files are stored in the same directory.
However, you can set different values for the snapshot.dir and wal.dir options
to store them on different physical disks for performance reasons.
Type: string
Default: ‘var/lib/{{ instance_name }}’
Environment variable: TT_SNAPSHOT_DIR
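A sketch of storing snapshots and WAL files on different disks. The /mnt/snapshots and /mnt/wal mount points are hypothetical. The {{ instance_name }} template variable is the same one used in the default value:

```yaml
snapshot:
  dir: '/mnt/snapshots/{{ instance_name }}'
wal:
  dir: '/mnt/wal/{{ instance_name }}'
```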
-
snapshot.snap_io_rate_limit
Reduce the throttling effect of box.snapshot() on
INSERT/UPDATE/DELETE performance by setting a limit on how many
megabytes per second it can write to disk. The same can be
achieved by splitting wal.dir and
snapshot.dir
locations and moving snapshots to a separate disk.
The limit also affects what
box.stat.vinyl().regulator
may show for the write rate of dumps to .run and .index files.
Type: number
Default: box.NULL
Environment variable: TT_SNAPSHOT_SNAP_IO_RATE_LIMIT
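A sketch that limits snapshot writes to about 100 megabytes per second. The value is illustrative and should be tuned to the actual disk:

```yaml
snapshot:
  # Megabytes per second that snapshot creation may write to disk
  snap_io_rate_limit: 100
```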
-
snapshot.count
The maximum number of snapshots that are stored in the
snapshot.dir directory.
If the number of snapshots after creating a new one exceeds this value,
the Tarantool garbage collector deletes old snapshots.
If snapshot.count is set to zero, the garbage collector
does not delete old snapshots.
Example
In the example, the checkpoint daemon creates a snapshot every two hours until
it has created three snapshots. After creating a new snapshot (the fourth one), the oldest snapshot
and any associated write-ahead-log files are deleted.
snapshot:
  by:
    interval: 7200
  count: 3
Note
Snapshots will not be deleted if replication is ongoing and the file has not been relayed to a replica.
Therefore, snapshot.count has no effect unless all replicas are alive.
Type: integer
Default: 2
Environment variable: TT_SNAPSHOT_COUNT
-
snapshot.by.interval
The interval in seconds between actions by the checkpoint daemon.
If the option is set to a value greater than zero, and there is
activity that causes change to a database, then the checkpoint daemon calls
box.snapshot() every snapshot.by.interval
seconds, creating a new snapshot file each time.
If the option is set to zero, the checkpoint daemon is disabled.
Example
In the example, the checkpoint daemon creates a new database snapshot every two hours, if there is activity.
snapshot:
  by:
    interval: 7200
Type: number
Default: 3600
Environment variable: TT_SNAPSHOT_BY_INTERVAL
-
snapshot.by.wal_size
The threshold for the total size in bytes for all WAL files created since the last snapshot taken.
Once the configured threshold is exceeded, the WAL thread notifies the
checkpoint daemon that it must make a new snapshot and delete old WAL files.
Type: integer
Default: 10^18
Environment variable: TT_SNAPSHOT_BY_WAL_SIZE
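A sketch that combines both checkpoint triggers, assuming they are evaluated independently. A snapshot is taken every hour if there is activity, or earlier if the WAL files written since the last snapshot exceed about 1 GiB (1073741824 bytes). Two snapshots are kept. The values are illustrative:

```yaml
snapshot:
  by:
    interval: 3600
    wal_size: 1073741824
  count: 2
```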
The sql section defines configuration parameters related to SQL.
Note
sql can be defined in any scope.
-
sql.cache_size
The maximum cache size (in bytes) for all SQL prepared statements.
To see the actual cache size, use box.info.sql().cache.size.
Type: integer
Default: 5242880
Environment variable: TT_SQL_CACHE_SIZE
The vinyl section defines configuration parameters related to the
vinyl storage engine.
Note
vinyl can be defined in any scope.
-
vinyl.bloom_fpr
A bloom filter’s false positive rate, that is, the acceptable probability that the
bloom filter
gives a wrong result.
The vinyl.bloom_fpr setting is a default value for the
bloom_fpr
option passed to space_object:create_index().
Type: number
Default: 0.05
Environment variable: TT_VINYL_BLOOM_FPR
-
vinyl.cache
The cache size for the vinyl storage engine. The cache can
be resized dynamically.
Type: integer
Default: 128 * 1024 * 1024
Environment variable: TT_VINYL_CACHE
-
vinyl.defer_deletes
Enable the deferred DELETE optimization in vinyl. It has been disabled by default
since Tarantool 2.10 to avoid possible performance degradation
of secondary index reads.
Type: boolean
Default: false
Environment variable: TT_VINYL_DEFER_DELETES
-
vinyl.dir
A directory where vinyl files or subdirectories will be stored.
This option may contain a relative file path.
In this case, it is interpreted as relative to
process.work_dir.
Type: string
Default: ‘var/lib/{{ instance_name }}’
Environment variable: TT_VINYL_DIR
-
vinyl.max_tuple_size
The size of the largest allocation unit for the vinyl storage engine.
It can be increased if it is necessary to store large tuples.
Type: integer
Default: 1024 * 1024
Environment variable: TT_VINYL_MAX_TUPLE_SIZE
-
vinyl.memory
The maximum number of in-memory bytes that vinyl uses.
Type: integer
Default: 128 * 1024 * 1024
Environment variable: TT_VINYL_MEMORY
-
vinyl.page_size
The page size. A page is a read/write unit for vinyl disk operations.
The vinyl.page_size setting is a default value
for the page_size
option passed to space_object:create_index().
Type: integer
Default: 8 * 1024
Environment variable: TT_VINYL_PAGE_SIZE
-
vinyl.range_size
The default maximum range size for a vinyl index, in bytes.
The maximum range size affects the decision of whether to
split a range.
If vinyl.range_size is set to a value other than null or 0, then
it is used as the default value for the
range_size
option passed to space_object:create_index().
If vinyl.range_size is not specified (or is explicitly set to null or 0),
and range_size is not specified when the index is created,
then Tarantool sets a value later depending on performance considerations.
To see the actual value, use
index_object:stat().range_size.
Type: integer
Default: box.NULL (means that an effective default is determined in runtime)
Environment variable: TT_VINYL_RANGE_SIZE
-
vinyl.read_threads
The maximum number of read threads that vinyl can use for
concurrent operations, such as I/O and compression.
Type: integer
Default: 1
Environment variable: TT_VINYL_READ_THREADS
-
vinyl.run_count_per_level
The maximum number of runs per level in the vinyl LSM tree.
If this number is exceeded, a new level is created.
The vinyl.run_count_per_level setting is a default value for the
run_count_per_level
option passed to space_object:create_index().
Type: integer
Default: 2
Environment variable: TT_VINYL_RUN_COUNT_PER_LEVEL
-
vinyl.run_size_ratio
The ratio between the sizes of different levels in the LSM tree.
The vinyl.run_size_ratio setting is a default value for the
run_size_ratio
option passed to space_object:create_index().
Type: number
Default: 3.5
Environment variable: TT_VINYL_RUN_SIZE_RATIO
-
vinyl.timeout
The vinyl storage engine has a scheduler that performs compaction.
When vinyl is low on available memory, the compaction scheduler
may be unable to keep up with incoming update requests.
In that situation, queries may time out after vinyl.timeout seconds.
This should rarely occur, since normally vinyl
throttles inserts when it is running low on compaction bandwidth.
Compaction can also be initiated manually with
index_object:compact().
Type: integer
Default: 60
Environment variable: TT_VINYL_TIMEOUT
-
vinyl.write_threads
The maximum number of write threads that vinyl can use for some
concurrent operations, such as I/O and compression.
Type: integer
Default: 4
Environment variable: TT_VINYL_WRITE_THREADS
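A sketch of a vinyl section that raises the memory and cache limits above their 128 MB defaults and adds a read thread. All values are illustrative assumptions, not tuning advice:

```yaml
vinyl:
  memory: 1073741824   # up to 1 GiB of in-memory data for vinyl
  cache: 268435456     # 256 MiB cache
  read_threads: 2
  write_threads: 4     # same as the default
```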
The wal section defines configuration parameters related to write-ahead log.
To learn more about the WAL configuration, check the Persistence page.
Note
wal can be defined in any scope.
-
wal.cleanup_delay
The delay in seconds used to prevent the Tarantool garbage collector
from immediately removing write-ahead log files after a node restart.
This delay eliminates possible erroneous situations when the master deletes WALs
needed by replicas after restart.
As a consequence, replicas sync with the master faster after its restart and
don’t need to download all the data again.
Once all the nodes in the replica set are up and running, a scheduled garbage collection is started again
even if wal.cleanup_delay has not expired.
See also: wal.retention_period
Type: number
Default: 14400
Environment variable: TT_WAL_CLEANUP_DELAY
-
wal.dir
A directory where write-ahead log (.xlog) files are stored.
A relative path in this option is interpreted as relative to process.work_dir.
By default, WAL files and snapshots are stored in the same directory.
However, you can set different values for the wal.dir and snapshot.dir options
to store them on different physical disks for performance reasons.
Type: string
Default: ‘var/lib/{{ instance_name }}’
Environment variable: TT_WAL_DIR
-
wal.dir_rescan_delay
The time interval in seconds between periodic scans of the write-ahead-log
file directory, when checking for changes to write-ahead-log
files for the sake of replication or hot standby.
Type: number
Default: 2
Environment variable: TT_WAL_DIR_RESCAN_DELAY
-
wal.max_size
The maximum number of bytes in a single write-ahead log file.
When a request would cause an .xlog file to become larger than
wal.max_size, Tarantool creates a new WAL file.
Type: integer
Default: 268435456
Environment variable: TT_WAL_MAX_SIZE
-
wal.mode
Specify fiber-WAL-disk synchronization mode as:
none: write-ahead log is not maintained.
A node with wal.mode set to none can’t be a replication master.
write: fibers wait for their data to be written to
the write-ahead log (no fsync(2)).
fsync: fibers wait for their data, fsync(2)
follows each write(2).
Type: string
Default: ‘write’
Environment variable: TT_WAL_MODE
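A sketch that switches the WAL to fsync mode, so that each write(2) to the write-ahead log is followed by fsync(2). Compared with the default write mode, this generally trades some write throughput for stronger durability if the operating system crashes or power is lost:

```yaml
wal:
  mode: fsync
```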
-
wal.queue_max_size
The size of the queue in bytes used by a replica to submit
new transactions to a write-ahead log (WAL).
This option helps limit the rate at which a replica submits transactions to the WAL.
Limiting the queue size might be useful when a replica is trying to sync with a master and
reads new transactions faster than writing them to the WAL.
Note
You might consider increasing the wal.queue_max_size value in case of
large tuples (approximately one megabyte or larger).
Type: integer
Default: 16777216
Environment variable: TT_WAL_QUEUE_MAX_SIZE
-
wal.retention_period
Since: 3.1.0 (Enterprise Edition only)
The delay in seconds used to prevent the Tarantool garbage collector from removing a write-ahead log file after it has been closed.
If a node is restarted, wal.retention_period counts down from the last modification time of the write-ahead log file.
The garbage collector doesn’t track write-ahead logs that are to be relayed to anonymous replicas, such as:
- Anonymous replicas added as a part of a cluster configuration (see replication.anon).
- CDC (Change Data Capture) that retrieves data using anonymous replication.
In case of a replica or CDC downtime, the required write-ahead logs can be removed.
As a result, such a replica needs to be rebootstrapped.
You can use wal.retention_period to prevent such issues.
Note that the wal.cleanup_delay option also sets the delay used to prevent the Tarantool garbage collector from removing write-ahead logs.
The difference is that the garbage collector doesn’t take into account wal.cleanup_delay if all the nodes in the replica set are up and running, which may lead to the removal of the required write-ahead logs.
Type: number
Default: 0
Environment variable: TT_WAL_RETENTION_PERIOD
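A sketch (Enterprise Edition, 3.1.0 or later) that keeps every closed WAL file for at least one day (86400 seconds), alongside the default wal.cleanup_delay. The intent is that an anonymous replica or CDC consumer that is down for less than a day can still catch up without being rebootstrapped. The one-day value is an illustrative assumption:

```yaml
wal:
  cleanup_delay: 14400
  retention_period: 86400
```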
Enterprise Edition
Configuring wal.ext.* parameters is available in the Enterprise Edition only.
This section describes options related to WAL extensions.
-
wal.ext.new
Enable storing a new tuple for each CRUD operation performed.
The option is in effect for all spaces.
To adjust the option for specific spaces, use the wal.ext.spaces
option.
Type: boolean
Default: false
Environment variable: TT_WAL_EXT_NEW
-
wal.ext.old
Enable storing an old tuple for each CRUD operation performed.
The option is in effect for all spaces.
To adjust the option for specific spaces, use the wal.ext.spaces
option.
Type: boolean
Default: false
Environment variable: TT_WAL_EXT_OLD
-
wal.ext.spaces
Enable or disable storing an old and new tuple in the WAL record
for a given space explicitly.
The configuration for specific spaces has priority over the configuration in the
wal.ext.new and wal.ext.old
options.
The option is a key-value pair:
- The key is a space name (string).
- The value is a table that includes two optional boolean options:
old and new.
The format and the default value of these options are described in wal.ext.old and wal.ext.new.
Example
In the example, only new tuples are added to the log for the bands space.
wal:
  ext:
    new: true
    old: true
    spaces:
      bands:
        old: false
Type: map
Default: nil
Environment variable: TT_WAL_EXT_SPACES
Configuration reference (box.cfg)
Note
Starting with the 3.0 version, the recommended way of configuring Tarantool is using a configuration file.
Configuring Tarantool in code is considered a legacy approach.
This topic describes all configuration parameters
that can be specified in code using the box.cfg API.
-
background
Since version 1.6.2.
Run the server as a background task. The log
and pid_file parameters must be non-null for
this to work.
Important
Do not enable the background mode for applications intended to be run by the
tt utility. For more information, see the tt start reference.
Type: boolean
Default: false
Environment variable: TT_BACKGROUND
Dynamic: no
-
coredump
Create coredump files.
Usually, an administrator needs to call ulimit -c unlimited
(or set corresponding options in systemd’s unit file)
before running a Tarantool process to get core dumps.
If coredump is enabled, Tarantool sets the corresponding
resource limit by itself
and the administrator doesn’t need to call ulimit -c unlimited
(see man 3 setrlimit).
This option also sets the state of the dumpable attribute,
which is enabled by default,
but may be dropped in some circumstances (according to
man 2 prctl, see PR_SET_DUMPABLE).
Type: boolean
Default: false
Environment variable: TT_COREDUMP
Dynamic: no
-
custom_proc_title
Since version 1.6.7.
Add the given string to the server’s process title
(what’s shown in the COMMAND column for
ps -ef and top -c commands).
For example, ordinarily ps -ef shows the Tarantool server process
thus:
$ ps -ef | grep tarantool
1000 14939 14188 1 10:53 pts/2 00:00:13 tarantool <running>
But if the configuration parameters include custom_proc_title='sessions'
then the output looks like:
$ ps -ef | grep tarantool
1000 14939 14188 1 10:53 pts/2 00:00:16 tarantool <running>: sessions
Type: string
Default: null
Environment variable: TT_CUSTOM_PROC_TITLE
Dynamic: yes
-
listen
Since version 1.6.4.
The read/write data port number or URI (Uniform
Resource Identifier) string. It has no default value, so it must be specified
if connections occur from remote clients that don’t use the
“admin port”. Connections made with
listen = URI are called “binary port” or “binary protocol”
connections.
A typical value is 3301.
box.cfg { listen = 3301 }
box.cfg { listen = "127.0.0.1:3301" }
Note
A replica also binds to this port, and accepts connections, but these
connections can only serve reads until the replica becomes a master.
Starting from version 2.10.0, you can specify several URIs,
and the port number is always stored as an integer value.
Type: integer or string
Default: null
Environment variable: TT_LISTEN
Dynamic: yes
-
memtx_dir
Since version 1.7.4.
A directory where memtx stores snapshot (.snap) files.
A relative path in this option is interpreted as relative to work_dir.
By default, snapshots and WAL files are stored in the same directory.
However, you can set different values for the memtx_dir and wal_dir options
to store them on different physical disks for performance reasons.
Type: string
Default: “.”
Environment variable: TT_MEMTX_DIR
Dynamic: no
-
pid_file
Since version 1.4.9.
Store the process id in this file. Can be relative to work_dir. A typical value is “tarantool.pid”.
Type: string
Default: null
Environment variable: TT_PID_FILE
Dynamic: no
-
read_only
Since version 1.7.1.
Say box.cfg{read_only=true...} to put the server instance in read-only
mode. After this, any requests that try to change persistent data will fail with error
ER_READONLY. Read-only mode should be used for master-replica
replication. Read-only mode does not affect data-change
requests for spaces defined as
temporary.
Although read-only mode prevents the server from writing to the WAL,
it does not prevent writing diagnostics with the log module.
Type: boolean
Default: false
Environment variable: TT_READ_ONLY
Dynamic: yes
Setting read_only to true affects spaces differently depending on the
options that were used during
box.schema.space.create,
as summarized by this chart:
| Option | Can be created? | Can be written to? | Is replicated? | Is persistent? |
| (default) | no | no | yes | yes |
| temporary | no | yes | no | no |
| is_local | no | yes | no | yes |
-
sql_cache_size
Since version 2.3.1.
The maximum number of bytes in the cache for
SQL prepared statements.
(The number of bytes that are actually used can be seen with
box.info.sql().cache.size.)
Type: number
Default: 5242880
Environment variable: TT_SQL_CACHE_SIZE
Dynamic: yes
-
vinyl_dir
Since version 1.7.1.
A directory where vinyl files or subdirectories will be stored. Can be
relative to work_dir. If not specified, defaults
to work_dir.
Type: string
Default: “.”
Environment variable: TT_VINYL_DIR
Dynamic: no
-
vinyl_timeout
Since version 1.7.5.
The vinyl storage engine has a scheduler which does compaction.
When vinyl is low on available memory, the compaction scheduler
may be unable to keep up with incoming update requests.
In that situation, queries may time out after vinyl_timeout seconds.
This should rarely occur, since normally vinyl
would throttle inserts when it is running low on compaction bandwidth.
Compaction can also be ordered manually with
index_object:compact().
Type: float
Default: 60
Environment variable: TT_VINYL_TIMEOUT
Dynamic: yes
-
username
Since version 1.4.9.
UNIX user name to switch to after start.
Type: string
Default: null
Environment variable: TT_USERNAME
Dynamic: no
-
wal_dir
Since version 1.6.2.
A directory where write-ahead log (.xlog) files are stored.
A relative path in this option is interpreted as relative to work_dir.
By default, WAL files and snapshots are stored in the same directory.
However, you can set different values for the wal_dir and memtx_dir options
to store them on different physical disks for performance reasons.
Type: string
Default: “.”
Environment variable: TT_WAL_DIR
Dynamic: no
-
work_dir
Since version 1.4.9.
A directory where database working files will be stored. The server instance
switches to work_dir with chdir(2) after start. Can be
relative to the current directory. If not specified, defaults to
the current directory. Other directory parameters may be relative to
work_dir, for example:
box.cfg{
    work_dir = '/home/user/A',
    wal_dir = 'B',
    memtx_dir = 'C'
}
will put xlog files in /home/user/A/B, snapshot files in /home/user/A/C,
and all other files or subdirectories in /home/user/A.
Type: string
Default: null
Environment variable: TT_WORK_DIR
Dynamic: no
-
worker_pool_threads
Since version 1.7.5.
The maximum number of threads to use during execution
of certain internal processes (currently
socket.getaddrinfo() and
coio_call()).
Type: integer
Default: 4
Environment variable: TT_WORKER_POOL_THREADS
Dynamic: yes
-
strip_core
Since version 2.2.2.
Whether coredump files should include memory allocated for tuples.
(This can be large if Tarantool runs under heavy load.)
Setting to true means “do not include”.
In an older version of Tarantool the default value of this parameter was false