Metrics reference
This page provides a detailed description of metrics from the metrics
module.
General instance information:
tnt_cfg_current_time: instance system time in the Unix timestamp format.
tnt_info_uptime: time in seconds since the instance started.
These metrics provide a picture of memory usage by the Tarantool process.
tnt_info_memory_cache: number of bytes in the cache for the tuples stored for the vinyl storage engine.
tnt_info_memory_data: number of bytes used for storing user data (the tuples) with the memtx engine and with level 0 of the vinyl engine, without taking memory fragmentation into account.
tnt_info_memory_index: number of bytes used for indexing user data, including memtx and vinyl memory tree extents, the vinyl page index, and the vinyl bloom filters.
tnt_info_memory_lua: number of bytes used for the Lua runtime. Lua memory is bounded by 2 GB per instance. Monitoring this metric can help prevent memory overflow.
tnt_info_memory_net: number of bytes used for network input/output buffers.
tnt_info_memory_tx: number of bytes in use by active transactions. For the vinyl storage engine, this is the total size of all allocated objects (struct txv, struct vy_tx, struct vy_read_interval) and tuples pinned for those objects.
Provides a memory usage report for the slab allocator. The slab allocator is the main allocator used to store tuples. These metrics can be used to monitor total memory usage and memory fragmentation. To learn more about use cases, refer to the documentation for the box.slab submodule.
Available memory, bytes:
tnt_slab_quota_size: amount of memory available to store tuples and indexes; equals memtx_memory.
tnt_slab_arena_size: total memory used for tuples and indexes together (including allocated, but currently free slabs).
tnt_slab_items_size: total amount of memory (including allocated, but currently free slabs) used only for tuples, no indexes.
Memory usage, bytes:
tnt_slab_quota_used: amount of memory that is already reserved by the slab allocator.
tnt_slab_arena_used: memory actually used for storing tuples and indexes together (omitting allocated, but currently free slabs).
tnt_slab_items_used: memory actually used only for tuples, no indexes (omitting allocated, but currently free slabs).
Memory utilization, %:
tnt_slab_quota_used_ratio: tnt_slab_quota_used / tnt_slab_quota_size.
tnt_slab_arena_used_ratio: tnt_slab_arena_used / tnt_slab_arena_size.
tnt_slab_items_used_ratio: tnt_slab_items_used / tnt_slab_items_size.
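As a rough illustration of how the three utilization metrics relate to the raw byte gauges, the sketch below recomputes them in Python. The helper name `slab_ratios` and the sample values are hypothetical. The sketch assumes the ratios are percentages (per the "Memory utilization, %" heading) and takes tnt_slab_arena_size as the denominator of the arena ratio.

```python
def slab_ratios(m):
    """Recompute slab utilization percentages from raw byte gauges.

    `m` is a plain dict keyed by metric name; how it gets filled
    (a scrape, a test fixture, ...) is left to the caller.
    """
    def pct(used, size):
        # Avoid division by zero if a size gauge has not been populated yet.
        return 100.0 * used / size if size else 0.0

    return {
        "tnt_slab_quota_used_ratio": pct(m["tnt_slab_quota_used"], m["tnt_slab_quota_size"]),
        "tnt_slab_arena_used_ratio": pct(m["tnt_slab_arena_used"], m["tnt_slab_arena_size"]),
        "tnt_slab_items_used_ratio": pct(m["tnt_slab_items_used"], m["tnt_slab_items_size"]),
    }


# Made-up sample values, in bytes.
sample = {
    "tnt_slab_quota_size": 1000, "tnt_slab_quota_used": 500,
    "tnt_slab_arena_size": 400, "tnt_slab_arena_used": 100,
    "tnt_slab_items_size": 200, "tnt_slab_items_used": 150,
}
ratios = slab_ratios(sample)
```

Comparing the three values side by side can help distinguish a quota that is nearly exhausted from slabs that are reserved but mostly empty.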
These metrics provide specific information about each individual space in a Tarantool instance:
tnt_space_len: number of records in the space. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_space_bsize: total number of bytes in all tuples. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_space_index_bsize: total number of bytes taken by the index. This metric always has 2 labels: {name="test", index_name="pk"}, where name is the name of the space and index_name is the name of the index.
tnt_space_total_bsize: total size of tuples and all indexes in the space. This metric always has 2 labels: {name="test", engine="memtx"}, where name is the name of the space and engine is the engine of the space.
tnt_space_count: total tuple count for vinyl. This metric always has the labels {name="test", engine="vinyl"}, where name is the name of the space and engine is the engine of the space.
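To make the label notation above concrete, here is a minimal parsing sketch. It assumes the metrics are exported in the Prometheus text format, where a sample looks like `tnt_space_len{name="test",engine="memtx"} 42`; the function name `parse_sample` is hypothetical, and the regular expressions deliberately ignore escaped quotes inside label values.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


name, labels, value = parse_sample('tnt_space_len{name="test",engine="memtx"} 42')
```

With the labels in a dict, per-space values can be grouped by `name` or filtered by `engine`.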
Network activity stats. This can be used to monitor network load, usage peaks and traffic drops.
Sent bytes:
tnt_net_sent_total: bytes sent from this instance over the network since instance start time.
Received bytes:
tnt_net_received_total: bytes this instance has received since instance start time.
Connections:
tnt_net_connections_total: number of incoming network connections since instance start time.
tnt_net_connections_current: number of active network connections.
Requests:
tnt_net_requests_total: number of network requests this instance has handled since instance start time.
tnt_net_requests_current: number of pending network requests.
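The `_total` metrics above are cumulative since instance start, so load peaks and traffic drops only become visible as a rate between two readings. The sketch below shows one way to derive such a rate; `counter_rate` is a hypothetical helper, and treating a decrease as an instance restart is an assumption of this example.

```python
def counter_rate(previous, current, seconds):
    """Per-second rate between two readings of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Assumed: the counter went down because the instance restarted,
        # so everything counted since the restart is the increase.
        delta = current
    return delta / seconds


# Two readings of tnt_net_sent_total taken 60 seconds apart (made-up values).
sent_rate = counter_rate(1000, 4000, 60)
```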
Provides fiber statistics. If your app creates a lot of fibers, these metrics can be used to monitor the fiber count and memory usage:
tnt_fiber_count: number of fibers.
tnt_fiber_csw: overall number of fiber context switches.
tnt_fiber_memalloc: amount of memory reserved for fibers.
tnt_fiber_memused: amount of memory used by fibers.
Number of iproto requests this instance has processed, aggregated by request type. It can be used to find out which types of operations clients perform most often.
tnt_stats_op_total: total number of calls since server start.
This metric has an operation label that distinguishes request types, for example: {operation="select"}
The request type can be one of:
delete: delete calls
error: requests that resulted in an error
update: update calls
call: requests to execute stored procedures
auth: authentication requests
eval: calls to evaluate Lua code
replace: replace calls
execute: SQL execute calls
select: select calls
upsert: upsert calls
prepare: SQL prepare calls
insert: insert calls
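As an illustration of finding the most frequent request types, the sketch below ranks the per-operation values of tnt_stats_op_total and reports each one's share of the total. The function name `busiest_operations` and the counts are hypothetical; the input is assumed to be a dict mapping the operation label to its counter value.

```python
def busiest_operations(counts, top=3):
    """Return (operation, count, share of all requests) for the busiest types."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, n, n / total if total else 0.0) for op, n in ranked[:top]]


# Made-up counter values keyed by the operation label.
counts = {"select": 700, "insert": 200, "replace": 100}
ranking = busiest_operations(counts, top=2)
```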
Provides information about the current replication status. To learn more about the replication mechanism in Tarantool, see the Tarantool replication documentation.
tnt_info_lsn: LSN of the instance.
tnt_info_vclock: LSN number in vclock. This metric always has the label {id="id"}, where id is the instance's number in the replica set.
tnt_replication_replica_<id>_lsn / tnt_replication_master_<id>_lsn: LSN of the master/replica, where id is the instance's number in the replica set.
tnt_replication_<id>_lag: replication lag value in seconds, where id is the instance's number in the replica set.
tnt_runtime_lua: Lua garbage collector size in bytes.
tnt_runtime_used: number of bytes used for the Lua runtime.
cartridge_issues: number of issues across cluster instances. This metric always has the label {level="critical"}, where level is the level of the issue:
critical: associated with critical cluster problems, for example when the memory used ratio is more than 90%.
warning: associated with other cluster problems, for example replication issues in the cluster.
LuaJIT metrics help you understand the stage of the Lua garbage collector. They are available in Tarantool 2.6 and later.
General JIT metrics:
lj_jit_snap_restore: overall number of snap restores.
lj_jit_trace_num: number of JIT traces.
lj_jit_trace_abort: overall number of aborted traces.
lj_jit_mcode_size: total size of all allocated machine code areas.
JIT strings:
lj_strhash_hit: number of strings being interned.
lj_strhash_miss: total number of string allocations.
GC steps:
lj_gc_steps_atomic: count of incremental GC steps (atomic state).
lj_gc_steps_sweepstring: count of incremental GC steps (sweepstring state).
lj_gc_steps_finalize: count of incremental GC steps (finalize state).
lj_gc_steps_sweep: count of incremental GC steps (sweep state).
lj_gc_steps_propagate: count of incremental GC steps (propagate state).
lj_gc_steps_pause: count of incremental GC steps (pause state).
Allocations:
lj_gc_strnum: number of allocated string objects.
lj_gc_tabnum: number of allocated table objects.
lj_gc_cdatanum: number of allocated cdata objects.
lj_gc_udatanum: number of allocated udata objects.
lj_gc_freed: total amount of freed memory.
lj_gc_total: current allocated Lua memory.
lj_gc_allocated: total amount of allocated memory.
These metrics provide the CPU usage statistics. They are only available on Linux.
tnt_cpu_count: total number of processors configured by the operating system.
tnt_cpu_total: host CPU time.
tnt_cpu_thread: Tarantool thread CPU time. This metric always has the labels {kind="user", thread_name="tarantool", thread_pid="pid", file_name="init.lua"}, where:
kind can be either user or system.
thread_name is tarantool, wal, iproto, or coio.
file_name is the entrypoint file name, for example, init.lua.
There are also the following cross-platform metrics, obtained using the getrusage() call:
tnt_cpu_user_time: Tarantool CPU user time.
tnt_cpu_system_time: Tarantool CPU system time.
Vinyl metrics provide the vinyl engine statistics.
Disk
The disk metrics are used to monitor the overall data size on disk.
tnt_vinyl_disk_data_size: amount of data stored in the .run files located in the vinyl_dir directory, bytes.
tnt_vinyl_disk_index_size: amount of data stored in the .index files located in the vinyl_dir directory, bytes.
Regulator
The vinyl regulator decides when to perform disk I/O actions. It groups activities in batches so that they are more consistent and efficient.
tnt_vinyl_regulator_dump_bandwidth: estimated average rate of taking dumps, bytes per second. Initially, the rate value is 10485760 (10 megabytes per second); it is then recalculated depending on the actual rate. Only significant dumps that are larger than one megabyte are used for the estimate.
tnt_vinyl_regulator_write_rate: actual average rate of performing write operations, bytes per second. The rate is calculated as a 5-second moving average. If the metric value is gradually going down, this can indicate disk issues.
tnt_vinyl_regulator_rate_limit: write rate limit, bytes per second. The regulator imposes the limit on transactions based on the observed dump/compaction performance. If the metric value drops to approximately 10^5, this indicates issues with the disk or the scheduler.
tnt_vinyl_regulator_dump_watermark: maximum amount of memory used for in-memory storage of a vinyl LSM tree, bytes. When this maximum is reached, a dump must occur. For details, see Filling an LSM tree. The value is slightly smaller than the amount of memory allocated for vinyl trees, which is the vinyl_memory parameter.
Transactional activity
tnt_vinyl_tx_commit: counter of commits (successful transaction ends). It includes implicit commits: for example, any insert operation causes a commit unless it is within a box.begin()–box.commit() block.
tnt_vinyl_tx_rollback: counter of rollbacks (unsuccessful transaction ends). This is not merely a count of explicit box.rollback() requests; it also includes requests that ended with errors.
tnt_vinyl_tx_conflict: counter of conflicts that caused transactions to roll back. A tnt_vinyl_tx_conflict / tnt_vinyl_tx_commit ratio above 5% indicates that vinyl is not healthy. At that point you will probably see a lot of other problems with vinyl.
tnt_vinyl_tx_read_views: counter of read views, that is, transactions that entered a read-only state to temporarily avoid conflicts. Usually the value is 0. If it stays non-zero for a long time, it indicates a memory leak.
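The 5% rule of thumb for conflicts can be written down as a small check. This is only a sketch: the function names are hypothetical, and returning `None` when no commits have been recorded is a choice made for this example rather than behaviour of the metrics module.

```python
def vinyl_conflict_ratio(conflicts, commits):
    """tnt_vinyl_tx_conflict / tnt_vinyl_tx_commit, or None without commits."""
    if commits == 0:
        return None
    return conflicts / commits


def vinyl_tx_unhealthy(conflicts, commits, threshold=0.05):
    """True when the conflict ratio is strictly above the threshold (5%)."""
    ratio = vinyl_conflict_ratio(conflicts, commits)
    return ratio is not None and ratio > threshold


# Made-up counter readings.
alarm = vinyl_tx_unhealthy(conflicts=6, commits=100)
```

Because both metrics are counters since instance start, applying the same check to the increase over a recent window may reflect current health better than lifetime totals.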
Memory
These metrics show the state of the memory areas used by vinyl for caches and write buffers.
tnt_vinyl_memory_tuple_cache: amount of memory used for storing tuples (data), bytes.
tnt_vinyl_memory_level0: the "level 0" (L0) memory area, bytes. L0 is the area that vinyl can use for in-memory storage of an LSM tree. By monitoring the metric, you can see when L0 is getting close to its maximum (tnt_vinyl_regulator_dump_watermark), at which a dump will be taken. You can expect L0 = 0 immediately after the dump operation is completed.
tnt_vinyl_memory_page_index: amount of memory used for storing indexes, bytes. If the metric value is close to vinyl_memory, this indicates that vinyl_page_size was chosen incorrectly.
tnt_vinyl_memory_bloom_filter: amount of memory used by bloom filters, bytes.
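To watch L0 approaching its maximum, tnt_vinyl_memory_level0 can be compared with tnt_vinyl_regulator_dump_watermark. The sketch below is illustrative only: the function names and the 90% margin are assumptions of this example, not values taken from Tarantool.

```python
def level0_fill_fraction(level0_bytes, dump_watermark_bytes):
    """Fraction of the dump watermark currently occupied by L0."""
    if dump_watermark_bytes <= 0:
        raise ValueError("dump watermark must be positive")
    return level0_bytes / dump_watermark_bytes


def dump_is_near(level0_bytes, dump_watermark_bytes, margin=0.9):
    """True when L0 has filled at least `margin` of the watermark."""
    return level0_fill_fraction(level0_bytes, dump_watermark_bytes) >= margin


# Made-up readings, in bytes.
fill = level0_fill_fraction(96, 128)
```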
Scheduler
The vinyl scheduler invokes the regulator and updates the related variables. This happens once per second.
tnt_vinyl_scheduler_tasks: number of scheduler dump/compaction tasks. The metric always has the label {status = <status_value>}, where <status_value> can be:
inprogress for currently running tasks
completed for successfully completed tasks
failed for tasks aborted due to errors.
tnt_vinyl_scheduler_dump_time: total time spent by all worker threads performing dumps, seconds.
tnt_vinyl_scheduler_dump_count: counter of completed dumps.
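Dividing the two dump metrics gives a rough average of worker time per completed dump. The helper below is a hypothetical sketch; since tnt_vinyl_scheduler_dump_time sums time across all worker threads, the result is worker time per dump and not necessarily wall-clock duration.

```python
def average_dump_seconds(dump_time, dump_count):
    """tnt_vinyl_scheduler_dump_time / tnt_vinyl_scheduler_dump_count.

    Returns None when no dump has completed yet.
    """
    if dump_count == 0:
        return None
    return dump_time / dump_count


# Made-up readings: 12 seconds of worker time over 4 completed dumps.
avg = average_dump_seconds(12.0, 4)
```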