
Module metrics

Since: 2.11.1

The metrics module provides the ability to collect and expose Tarantool metrics.

Note

If you use a Tarantool version below 2.11.1, install the latest version of the external metrics module first. Tarantool 2.11.1 and later can also use the external metrics module. In this case, the external module takes priority over the built-in one.

Tarantool provides the following metric collectors: counter, gauge, histogram, and summary.

A collector is a representation of one or more observations that change over time.

A counter is a cumulative metric that denotes a single monotonically increasing counter. Its value might only increase or be reset to zero on restart. For example, you can use the counter to represent the number of requests served, tasks completed, or errors.

The design is based on the Prometheus counter.

A gauge is a metric that denotes a single numerical value that can arbitrarily increase and decrease.

The gauge type is typically used for measured values like temperature or current memory usage. It can also be used for counts that go up and down, such as the number of concurrent requests.

The design is based on the Prometheus gauge.
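As an illustration of the concurrent-requests case, the sketch below wraps a handler so that a gauge goes up when a request starts and down when it finishes. The gauge name `concurrent_requests` and the `handle_request` helper are hypothetical; only `metrics.gauge()`, `gauge_obj:inc()`, and `gauge_obj:dec()` come from the module.

```lua
local metrics = require('metrics')

-- Hypothetical gauge: the number of requests currently being processed
local concurrent_requests = metrics.gauge('concurrent_requests',
    'The number of requests being processed')

-- Hypothetical helper: runs `handler` and keeps the gauge in sync,
-- decrementing it even if the handler raises an error
local function handle_request(handler, ...)
    concurrent_requests:inc(1)
    local ok, result = pcall(handler, ...)
    concurrent_requests:dec(1)
    if not ok then
        error(result, 0)  -- re-raise the original error unchanged
    end
    return result
end
```

Pairing `inc()` and `dec()` around `pcall()` keeps the gauge from drifting upward when a handler fails.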

A histogram metric is used to collect and analyze statistical data about the distribution of values within the application. It counts observations in configurable buckets. Unlike metrics that track only the average value or the number of events, a histogram shows how values are distributed, which can reveal patterns that an average hides, such as a small share of unusually slow requests.

The design is based on the Prometheus histogram.

A summary metric is used to collect statistical data about the distribution of values within the application.

Each summary provides several measurements:

  • total count of measurements
  • sum of measured values
  • values at specific quantiles

Like a histogram, a summary operates with value ranges. However, unlike a histogram, it defines them with quantiles (numbers between 0 and 1), so you do not need to set fixed boundaries. For a summary, the ranges depend on the measured values and the number of measurements.

The design is based on the Prometheus summary.

A label is a piece of metainfo that you associate with a metric in the key-value format. For details, see labels in Prometheus and tags in Graphite.

Labels are used to differentiate between the characteristics of a thing being measured. For example, in a metric associated with the total number of HTTP requests, you can represent methods and statuses as label pairs:

http_requests_total_counter:inc(1, { method = 'POST', status = '200' })

The example above lets you extract the following time series:

  1. The total number of requests over time with method = "POST" (and any status).
  2. The total number of requests over time with status = "200" (and any method).
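To see how label pairs split one metric into several time series, the sketch below increments a counter with several label combinations and reads its observations back with `counter_obj:collect()`. The counter name `http_requests_total` and its help text are assumptions made for this illustration.

```lua
local metrics = require('metrics')

local http_requests_total_counter = metrics.counter('http_requests_total',
    'The total number of HTTP requests')

http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
http_requests_total_counter:inc(1, { method = 'GET', status = '200' })
http_requests_total_counter:inc(1, { method = 'POST', status = '500' })
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })

-- Each distinct combination of label values is stored as a separate observation
for _, obs in pairs(http_requests_total_counter:collect()) do
    print(obs.label_pairs.method, obs.label_pairs.status, obs.value)
end
```

Filtering these observations by one label and summing over the other gives the time series described above.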

To configure metrics, use metrics.cfg(). This function can turn the specified metrics on or off and configure labels applied to all collectors. You can also use the following shortcut functions to set up metrics or labels: metrics.enable_default_metrics() and metrics.set_global_labels().
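A minimal sketch of the shortcut functions `metrics.enable_default_metrics()` and `metrics.set_global_labels()`, assuming you want every default metric except the `vinyl` and `luajit` groups plus a global `alias` label. The label value `router-1` is a placeholder. Each call is commented with the `metrics.cfg()` form it corresponds to.

```lua
local metrics = require('metrics')

-- Enable all default metrics except two groups.
-- Same as metrics.cfg{ include = 'all', exclude = { 'vinyl', 'luajit' } }
metrics.enable_default_metrics('all', { 'vinyl', 'luajit' })

-- Attach a global label to every observation.
-- Same as metrics.cfg{ labels = { alias = 'router-1' } }
metrics.set_global_labels({ alias = 'router-1' })
```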

Note

Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.

To create a custom metric, follow the steps below:

  1. Create a metric

    To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively. In the example below, a new counter is created:

    local metrics = require('metrics')
    local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
    

    This counter is intended to collect the number of data operations performed on the specified space.

    In the next example, a gauge is created:

    local metrics = require('metrics')
    local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
    
  2. Observe a value

    You can observe a value in two ways:

    • At the appropriate place, for example, in an API request handler or trigger. In the example below, the counter value is increased each time a data operation is performed on the bands space. To increase a counter value, counter_obj:inc() is called.

      local metrics = require('metrics')
      local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
      local trigger = require('trigger')
      trigger.set(
              'box.space.bands.on_replace',
              'update_bands_replace_count_metric',
              function(_, _, _, request_type)
                  bands_replace_count:inc(1, { request_type = request_type })
              end
      )
      
    • At the time of requesting the data collected by metrics. In this case, you need to collect the required metric inside metrics.register_callback(). The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:

      local metrics = require('metrics')
      local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
      metrics.register_callback(function()
          bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
      end)
      

      To set a gauge value, gauge_obj:set() is called.

You can find the full example on GitHub: metrics_collect_custom.

The module allows you to add your own metrics, but there are some subtleties when working with specific tools.

When adding your custom metric, make sure that the number of label value combinations is kept to a minimum. Otherwise, a combinatorial explosion may happen in the time-series database that stores the metric values. Examples of such data labels are labels in Prometheus and tags in InfluxDB.

For example, if your company uses InfluxDB for metric collection, you can potentially disrupt the entire monitoring setup, both for your application and for all other systems within the company. As a result, monitoring data is likely to be lost.

Example:

local metrics = require('metrics')
local some_metric = metrics.counter('some', 'Some metric')

-- RECOMMENDED: the number of instance aliases is small and bounded
local function on_value_update(instance_alias)
   some_metric:inc(1, { alias = instance_alias })
end

-- NOT RECOMMENDED: every new customer ID creates a new time series
local function on_value_update(customer_id)
   some_metric:inc(1, { customer_id = customer_id })
end

In the example, there are two versions of the on_value_update function. The first version labels the data with the alias of the cluster instance. Since a cluster has a relatively small number of instances, using their aliases as label values is feasible. The second version uses a customer identifier. The number of customers can grow without limit, and each new identifier creates a new time series, so avoid labels like this.

The same principle applies to URLs. Using the entire URL with parameters is not recommended. Use a URL template or the name of the command instead.
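One way to follow this advice is to normalize a URL before using it as a label value. The helper below is a hypothetical sketch, not part of the metrics module: it drops the query string and fragment and replaces purely numeric path segments with `:id`. A real application would usually take the route template directly from its router instead.

```lua
-- Hypothetical helper: turns a concrete URL into a low-cardinality path template
local function normalize_path(url)
    -- Keep only the path: drop everything from the first '?' or '#'
    local path = url:match('^[^?#]*')
    local segments = {}
    for segment in path:gmatch('[^/]+') do
        -- Replace identifiers such as '42' with a fixed placeholder
        table.insert(segments, segment:match('^%d+$') and ':id' or segment)
    end
    return '/' .. table.concat(segments, '/')
end
```

The result can then be passed as a label value, for example `some_metric:inc(1, { path = normalize_path(url) })`, so that `/customers/42?tab=orders` and `/customers/7` both map to `/customers/:id`.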

In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal set of values that can uniquely identify the data without introducing unnecessary complexity or potential conflicts with existing metrics and systems.

The metrics module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module. The latency collector observes both latency information and the number of invocations. The metrics collected by HTTP middleware are separated by a set of labels:

  • a route (path)
  • a method (method)
  • an HTTP status code (status)

For each route that you want to track, you must specify the middleware explicitly. The example below shows how to collect statistics for requests made to the /metrics/hello endpoint.

local httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
metrics.http_middleware.configure_default_collector('summary')
httpd:route({
    method = 'GET',
    path = '/metrics/hello'
}, metrics.http_middleware.v1(
        function()
            return { status = 200,
                     headers = { ['content-type'] = 'text/plain' },
                     body = 'Hello from http_middleware!' }
        end))

httpd:start()

Note

The middleware does not cover 404 errors: requests to paths without a matching route never reach a wrapped handler.

The metrics module provides a set of plugins, such as Prometheus, Graphite, and JSON, that let you collect metrics through a unified interface.

For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the metrics.plugins.prometheus.collect_http() function:

local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()

To expose the collected metrics, you can use the http module:

local httpd = require('http.server').new('127.0.0.1', 8080)
httpd:route({
    method = 'GET',
    path = '/metrics/prometheus'
}, function()
    local prometheus_plugin = require('metrics.plugins.prometheus')
    local prometheus_metrics = prometheus_plugin.collect_http()
    return prometheus_metrics
end)
httpd:start()

Example on GitHub: metrics_plugins

Use the following API to create custom plugins: metrics.invoke_callbacks(), metrics.collectors(), and collector_object:collect().

To create a plugin, include the following in your main export function:

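local metrics = require('metrics')

-- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
metrics.invoke_callbacks()

-- Loop over collectors
for _, c in pairs(metrics.collectors()) do
    -- Process collector `c` here (for example, read c.name and c.help)

    -- Loop over instant observations in the collector
    for _, obs in pairs(c:collect()) do
        -- Export observation `obs`
        -- (for example, use obs.metric_name, obs.label_pairs, and obs.value)
    end
end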
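Putting these pieces together, a hypothetical plugin could render every observation as one line of text. The function names `format_labels` and `export_text` and the output format are assumptions for illustration. The sketch uses only `metrics.invoke_callbacks()`, `metrics.collectors()`, and `collector_object:collect()`.

```lua
local metrics = require('metrics')

-- Hypothetical helper: renders label pairs as k1="v1",k2="v2" in a stable key order
local function format_labels(label_pairs)
    local keys = {}
    for k in pairs(label_pairs or {}) do
        table.insert(keys, k)
    end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    local parts = {}
    for _, k in ipairs(keys) do
        table.insert(parts, string.format('%s="%s"', tostring(k), tostring(label_pairs[k])))
    end
    return table.concat(parts, ',')
end

-- Hypothetical export function: one line per observation,
-- for example: bands_replace_count{request_type="INSERT"} 3
local function export_text()
    -- Refresh callback-driven metrics first
    metrics.invoke_callbacks()
    local lines = {}
    for _, c in pairs(metrics.collectors()) do
        for _, obs in pairs(c:collect()) do
            table.insert(lines, string.format('%s{%s} %s',
                obs.metric_name, format_labels(obs.label_pairs), tostring(obs.value)))
        end
    end
    return table.concat(lines, '\n')
end

print(export_text())
```

Sorting the label keys makes the output deterministic, which simplifies comparing exports over time.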

See the source code of built-in plugins in the metrics GitHub repository.

metrics API

  • metrics.cfg() – Entrypoint to set up the module
  • metrics.collect() – Collect observations from each collector
  • metrics.collectors() – List all collectors in the registry
  • metrics.counter() – Register a new counter
  • metrics.enable_default_metrics() – Same as metrics.cfg{ include = include, exclude = exclude }
  • metrics.gauge() – Register a new gauge
  • metrics.histogram() – Register a new histogram
  • metrics.invoke_callbacks() – Invoke all registered callbacks
  • metrics.register_callback() – Register a function named callback
  • metrics.set_global_labels() – Same as metrics.cfg{ labels = label_pairs }
  • metrics.summary() – Register a new summary
  • metrics.unregister_callback() – Unregister a function named callback

metrics.http_middleware API

  • metrics.http_middleware.build_default_collector() – Register and return a collector for the middleware
  • metrics.http_middleware.configure_default_collector() – Register a collector for the middleware and set it as default
  • metrics.http_middleware.get_default_collector() – Get the default collector
  • metrics.http_middleware.set_default_collector() – Set the default collector
  • metrics.http_middleware.v1() – Latency measuring wrap-up

Related objects

  • collector_object – A collector object
  • counter_obj – A counter object
  • gauge_obj – A gauge object
  • histogram_obj – A histogram object
  • registry – A metrics registry
  • summary_obj – A summary object

metrics.cfg([config])

Entrypoint to set up the module.

Parameters:
  • config (table) –

    module configuration options:

    • cfg.include (string/table, default all): all to enable all supported default metrics, none to disable all default metrics, table with names of the default metrics to enable a specific set of metrics.
    • cfg.exclude (table, default {}): a table containing the names of the default metrics that you want to disable. Has higher priority than cfg.include.
    • cfg.labels (table, default {}): a table containing label names as string keys, label values as values. See also: Labels.

You can work with metrics.cfg as a table to read values, but you must call metrics.cfg{} as a function to update them.
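For example, the sketch below updates the configuration by calling `metrics.cfg{}` and then reads a value back by indexing `metrics.cfg`. The label value `storage-1` is a placeholder.

```lua
local metrics = require('metrics')

-- Update: call metrics.cfg as a function with a table of options
metrics.cfg{
    include = { 'network', 'memory' },
    exclude = {},
    labels = { alias = 'storage-1' },
}

-- Read: index metrics.cfg like a table
print(metrics.cfg.labels.alias)
```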

Supported default metric names (for cfg.include and cfg.exclude tables):

  • all (metasection including all metrics)
  • network
  • operations
  • system
  • replicas
  • info
  • slab
  • runtime
  • memory
  • spaces
  • fibers
  • cpu
  • vinyl
  • memtx
  • luajit
  • clock
  • event_loop
  • config

See metrics reference for details. All metric collectors from the collection have metainfo.default = true.

cfg.labels are the global labels to be added to every observation.

Global labels are applied only to metric collection. They have no effect on how observations are stored.

Global labels can be changed on the fly.

label_pairs from observation objects have priority over global labels. If you pass label_pairs to an observation method with the same key as some global label, the method argument value will be used.

Note that both label names and values in label_pairs are treated as strings.
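The sketch below illustrates the priority rule, assuming that `counter_obj:collect()` returns observations with global labels already merged in, as they appear in exported metrics. The counter name `jobs_total` and all label values are placeholders.

```lua
local metrics = require('metrics')

-- Global labels: added to every observation at collection time
metrics.set_global_labels({ alias = 'router-1', zone = 'eu' })

local jobs_total = metrics.counter('jobs_total', 'The number of processed jobs')

-- The observation sets its own `zone`, which overrides the global one
jobs_total:inc(1, { zone = 'us' })

for _, obs in pairs(jobs_total:collect()) do
    print(obs.label_pairs.alias, obs.label_pairs.zone)  -- global alias, observation zone
end
```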

metrics.collect([opts])

Collect observations from each collector.

Parameters:
  • opts (table) –

    table of collect options:

    • invoke_callbacks – if true, invoke_callbacks() is triggered before the actual collection.
    • default_only – if true, observations contain only default metrics (metainfo.default = true).
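A short sketch of both options: the first call collects only the default metrics, and the second collects everything. Both run the registered callbacks first, so callback-driven gauges hold fresh values. The `count` helper is hypothetical and only counts the returned observations.

```lua
local metrics = require('metrics')

-- Only observations of built-in default metrics (metainfo.default = true)
local defaults = metrics.collect{ invoke_callbacks = true, default_only = true }

-- All observations, including custom metrics
local all = metrics.collect{ invoke_callbacks = true }

-- Hypothetical helper: counts observations in the returned table
local function count(observations)
    local n = 0
    for _ in pairs(observations) do
        n = n + 1
    end
    return n
end

print(count(defaults), count(all))
```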
metrics.collectors()

List all collectors in the registry. Designed to be used in exporters.

Return: A list of created collectors (see collector_object).

See also: Creating custom plugins

metrics.counter(name[, help, metainfo])

Register a new counter.

Parameters:
  • name (string) – collector name. Must be unique.
  • help (string) – collector description.
  • metainfo (table) – collector metainfo.
Return:

A counter object (see counter_obj).

Rtype:

counter_obj

See also: Creating custom metrics

metrics.enable_default_metrics([include, exclude])

Same as metrics.cfg{include=include, exclude=exclude}, but include={} is treated as include='all' for backward compatibility.

metrics.gauge(name[, help, metainfo])

Register a new gauge.

Parameters:
  • name (string) – collector name. Must be unique.
  • help (string) – collector description.
  • metainfo (table) – collector metainfo.
Return:

A gauge object (see gauge_obj).

Rtype:

gauge_obj

See also: Creating custom metrics

metrics.histogram(name[, help, buckets, metainfo])

Register a new histogram.

Parameters:
  • name (string) – collector name. Must be unique.
  • help (string) – collector description.
  • buckets (table) – histogram buckets (an array of sorted positive numbers). The infinity bucket (INF) is appended automatically. Default: {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF}.
  • metainfo (table) – collector metainfo.
Return:

A histogram object (see histogram_obj).

Rtype:

histogram_obj

See also: Creating custom metrics

Note

A histogram is basically a set of collectors:

  • name .. "_sum" – a counter holding the sum of added observations.
  • name .. "_count" – a counter holding the number of added observations.
  • name .. "_bucket" – a counter holding, under the label le (less or equal), the number of observations that are less than or equal to each bucket bound. To access the bucket with the upper bound x (where x is a number), specify the value x for the label le.
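As an illustration, the sketch below creates a histogram with custom buckets, records two observations, and reads the bucket with the upper bound 0.5 through the `le` label. The metric name, buckets, and `route` label are placeholders. `tonumber()` is used because the `le` value may be stored as a number or a string.

```lua
local metrics = require('metrics')

-- Buckets must be sorted; the INF bucket is appended automatically
local latency = metrics.histogram('bands_request_latency', 'Request latency in seconds',
    { 0.1, 0.5, 1.0 })

latency:observe(0.3, { route = '/bands' })
latency:observe(0.7, { route = '/bands' })

-- The histogram returns observations of its sub-collectors:
-- bands_request_latency_sum, bands_request_latency_count, bands_request_latency_bucket
for _, obs in pairs(latency:collect()) do
    if obs.metric_name == 'bands_request_latency_bucket'
            and tonumber(obs.label_pairs.le) == 0.5 then
        print(obs.value)  -- the number of observations <= 0.5
    end
end
```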
metrics.invoke_callbacks()

Invoke all registered callbacks. Has to be called before each collect(). You can also use collect{invoke_callbacks = true} instead. If you’re using one of the default exporters, invoke_callbacks() will be called by the exporter.

See also: Creating custom plugins

metrics.register_callback(callback)

Register a function named callback, which will be called right before metric collection on plugin export.

Parameters:
  • callback (function) – a function that takes no parameters.

This method is most often used for gauge metrics updates.

Example:

local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)

See also: Custom metrics

metrics.set_global_labels(label_pairs)

Same as metrics.cfg{ labels = label_pairs }. Learn more in metrics.cfg().

metrics.summary(name[, help, objectives, params, metainfo])

Register a new summary. Quantile computation is based on the “Effective computation of biased quantiles over data streams” algorithm.

Parameters:
  • name (string) – collector name. Must be unique.
  • help (string) – collector description.
  • objectives (table) – a list of “targeted” φ-quantiles in the {quantile = error, ... } form. Example: {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01}. The targeted φ-quantile is specified in the form of a φ-quantile and the tolerated error. For example, {[0.5] = 0.1} means that the median (= 50th percentile) is to be returned with a 10-percent error. Note that percentiles and quantiles are the same concept, except that percentiles are expressed as percentages. The φ-quantile must be in the interval [0, 1]. A lower tolerated error for a φ-quantile results in higher memory and CPU usage during summary calculation.
  • params (table) – table of the summary parameters used to configure the sliding time window. This window consists of several buckets to store observations. New observations are added to each bucket. After a time period, the head bucket (from which observations are collected) is reset, and the next bucket becomes the new head. This way, each bucket stores observations for max_age_time * age_buckets_count seconds before it is reset. max_age_time sets the duration of each bucket’s lifetime – that is, how many seconds the observations are kept before they are discarded. age_buckets_count sets the number of buckets in the sliding time window. This variable determines the number of buckets used to exclude observations older than max_age_time from the summary. The value is a trade-off between resources (memory and CPU for maintaining the buckets) and how smoothly the time window moves. Default value: {max_age_time = math.huge, age_buckets_count = 1}.
  • metainfo (table) – collector metainfo.
Return:

A summary object (see summary_obj).

Rtype:

summary_obj

See also: Creating custom metrics

Note

A summary represents a set of collectors:

  • name .. "_sum" – a counter holding the sum of added observations.
  • name .. "_count" – a counter holding the number of added observations.
  • name – holds the values of all observed quantiles under the label quantile. To access the value of quantile x (where x is a number), specify the value x for the label quantile.
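A sketch of a summary that tracks three quantiles over a sliding window. The metric name, label, and window settings (60 seconds, 5 age buckets) are placeholders chosen for illustration. Both `max_age_time` and `age_buckets_count` are passed together.

```lua
local metrics = require('metrics')

-- Median, 90th, and 99th percentiles, each with a 1% tolerated error
local response_time = metrics.summary('bands_response_time', 'Response time in seconds',
    { [0.5] = 0.01, [0.9] = 0.01, [0.99] = 0.01 },
    { max_age_time = 60, age_buckets_count = 5 })

for i = 1, 100 do
    response_time:observe(i / 100, { route = '/bands' })
end

-- Quantile values are reported under the summary's own name with the `quantile` label
for _, obs in pairs(response_time:collect()) do
    if obs.metric_name == 'bands_response_time' then
        print(obs.label_pairs.quantile, obs.value)
    end
end
```

Tightening the tolerated errors gives more precise quantiles at the cost of more memory and CPU, as described for the objectives parameter above.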
metrics.unregister_callback(callback)

Unregister a function named callback that is called right before metric collection on plugin export.

Parameters:
  • callback (function) – a function that takes no parameters.

Example:

local metrics = require('metrics')

local cpu_callback = function()
    local cpu_metrics = require('metrics.psutils.cpu')
    cpu_metrics.update()
end

metrics.register_callback(cpu_callback)

-- after a while, we don't need that callback function anymore

metrics.unregister_callback(cpu_callback)

metrics.http_middleware.build_default_collector(type_name, name[, help])

Register and return a collector for the middleware.

Parameters:
  • type_name (string) – collector type: histogram or summary. The default is histogram.
  • name (string) – collector name. The default is http_server_request_latency.
  • help (string) – collector description. The default is HTTP Server Request Latency.
Return:

A collector object

Possible errors:

  • A collector with the same type and name already exists in the registry.
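A sketch of using this function instead of the default collector: a histogram is registered once at startup under a hypothetical name and passed explicitly to `metrics.http_middleware.v1()`. The route, port, and response are placeholders, and the http module must be installed. Because registering the same type and name twice raises an error, the collector is built outside the route handler.

```lua
local metrics = require('metrics')
local http_server = require('http.server')

-- Register the collector once; a second call with the same type and name fails
local bands_latency = metrics.http_middleware.build_default_collector(
    'histogram', 'bands_http_request_latency', 'Latency of /bands requests')

local httpd = http_server.new('127.0.0.1', 8080)
httpd:route({ method = 'GET', path = '/bands' }, metrics.http_middleware.v1(
    function()
        return { status = 200,
                 headers = { ['content-type'] = 'text/plain' },
                 body = 'bands' }
    end,
    bands_latency))  -- observations go to this collector, not the default one
httpd:start()
```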
metrics.http_middleware.configure_default_collector(type_name, name, help)

Register a collector for the middleware and set it as default.

Parameters:
  • type_name (string) – collector type: histogram or summary. The default is histogram.
  • name (string) – collector name. The default is http_server_request_latency.
  • help (string) – collector description. The default is HTTP Server Request Latency.

Possible errors:

  • A collector with the same type and name already exists in the registry.
metrics.http_middleware.get_default_collector()

Return the default collector. If the default collector hasn’t been set yet, register it (with default http_middleware.build_default_collector() parameters) and set it as default.

Return: A collector object
metrics.http_middleware.set_default_collector(collector)

Set the default collector.

Parameters:
  • collector – middleware collector object
metrics.http_middleware.v1(handler, collector)

Latency measuring wrap-up for the HTTP ver. 1.x.x handler. Returns a wrapped handler.

Learn more in Collecting HTTP metrics.

Parameters:

Usage:

httpd:route(route, http_middleware.v1(request_handler, collector))

See also: Collecting HTTP metrics
