Module metrics | Tarantool

Module metrics

Since: 2.11.1

The `metrics` module provides the ability to collect and expose Tarantool metrics.

Note

If you use a Tarantool version below 2.11.1, you must first install the latest version of `metrics`. For Tarantool 2.11.1 and above, you can also use the external `metrics` module. In this case, the external `metrics` module takes priority over the built-in one.

Overview

Collectors

A collector is a representation of one or more observations that change over time.

Tarantool provides the following metric collectors:

• counter
• gauge
• histogram
• summary

counter

A counter is a cumulative metric that denotes a single monotonically increasing counter. Its value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

The design is based on the Prometheus counter.

gauge

A gauge is a metric that denotes a single numerical value that can arbitrarily increase and decrease.

The gauge type is typically used for measured values like temperature or current memory usage. It could also be used for values that can increase or decrease, such as the number of concurrent requests.

The design is based on the Prometheus gauge.

histogram

A histogram metric is used to collect and analyze statistical data about the distribution of values within the application. Unlike metrics that track the average value or quantity of events, a histogram provides detailed visibility into the distribution of values and can uncover hidden dependencies.

The design is based on the Prometheus histogram.

summary

A summary metric is used to collect statistical data about the distribution of values within the application.

Each summary provides several measurements:

• total count of measurements
• sum of measured values
• values at specific quantiles

Similar to histograms, the summary also operates with value ranges. However, unlike histograms, it uses quantiles (defined by a number between 0 and 1) for this purpose. In this case, it is not required to define fixed boundaries. For summary type, the ranges depend on the measured values and the number of measurements.

The design is based on the Prometheus summary.
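
To make the notion of a quantile concrete, the following plain-Lua sketch computes an exact nearest-rank quantile over a small sample that is stored in full. It is only an illustration: the helper `nearest_rank_quantile` and the sample values are hypothetical and are not part of the `metrics` module, which estimates quantiles over a stream of observations rather than sorting all of them.

```lua
-- Exact nearest-rank quantile over a fully stored sample.
-- Hypothetical helper for illustration; not part of the metrics module.
local function nearest_rank_quantile(values, q)
    assert(#values > 0, 'sample must not be empty')
    assert(q >= 0 and q <= 1, 'quantile must be in [0, 1]')
    -- Sort a copy so that the caller's table is left untouched.
    local sorted = {}
    for i, v in ipairs(values) do
        sorted[i] = v
    end
    table.sort(sorted)
    -- Nearest-rank definition: take the value at position ceil(q * n).
    local rank = math.max(1, math.ceil(q * #sorted))
    return sorted[rank]
end

-- Made-up latency measurements in seconds.
local latencies = { 0.12, 0.05, 0.30, 0.07, 0.09, 0.45, 0.11, 0.08, 0.06, 0.10 }
print(nearest_rank_quantile(latencies, 0.5)) -- the median
print(nearest_rank_quantile(latencies, 0.9)) -- the 90th percentile
```

Note that no bucket boundaries are chosen in advance: the returned values depend only on the measurements themselves and on how many of them there are.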

Labels

A label is a piece of metainfo that you associate with a metric in the key-value format. For details, see labels in Prometheus and tags in Graphite.

Labels are used to differentiate between the characteristics of a thing being measured. For example, in a metric associated with the total number of HTTP requests, you can represent methods and statuses as label pairs:

```lua
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
```

The example above allows extracting the following time series:

1. The total number of requests over time with `method = "POST"` (and any status).
2. The total number of requests over time with `status = 500` (and any method).
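
The plain-Lua sketch below illustrates how label pairs split one metric into several time series, and how a query can then aggregate over one label while ignoring the others. The `series` table and the `inc` and `sum_where` helpers are hypothetical stand-ins used only for illustration; they do not reflect how the `metrics` module stores observations.

```lua
-- One counter value per distinct (method, status) pair.
local series = {}

local function inc(value, labels)
    local key = labels.method .. ' ' .. labels.status
    local entry = series[key]
    if entry == nil then
        entry = { labels = labels, value = 0 }
        series[key] = entry
    end
    entry.value = entry.value + value
end

-- Sum all series whose label `name` equals `expected`,
-- regardless of the values of the other labels.
local function sum_where(name, expected)
    local total = 0
    for _, entry in pairs(series) do
        if entry.labels[name] == expected then
            total = total + entry.value
        end
    end
    return total
end

inc(1, { method = 'POST', status = '200' })
inc(1, { method = 'POST', status = '500' })
inc(1, { method = 'GET', status = '500' })
inc(1, { method = 'POST', status = '200' })

print(sum_where('method', 'POST')) -- requests with method = "POST", any status
print(sum_where('status', '500'))  -- requests with status = 500, any method
```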

Configuring metrics

To configure metrics, use metrics.cfg(). This function can be used to turn the specified metrics on or off, or to configure labels applied to all collectors. You can also use the following shortcut functions to set up metrics or labels:

• metrics.enable_default_metrics()
• metrics.set_global_labels()

Note

Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.
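
A minimal sketch of a configuration call, assuming it runs inside a Tarantool instance where the `metrics` module is available. The label name `alias` and its value `storage-1` are illustrative assumptions; the metric names are taken from the list of supported default metrics in the API reference.

```lua
local metrics = require('metrics')

-- Enable two groups of default metrics and attach a global label
-- to every observation. The label value is a made-up example.
metrics.cfg{
    include = { 'network', 'operations' },
    labels = { alias = 'storage-1' },
}

-- The same global labels can also be set with the shortcut function.
metrics.set_global_labels({ alias = 'storage-1' })

-- metrics.cfg can be read as a table.
print(metrics.cfg.labels.alias)
```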

Custom metrics

Creating custom metrics

To create a custom metric, follow the steps below:

1. Create a metric

To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively. In the example below, a new counter is created:

```lua
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
```

This counter is intended to collect the number of data operations performed on the specified space.

In the next example, a gauge is created:

```lua
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
```
2. Observe a value

You can observe a value in two ways:

• At the appropriate place, for example, in an API request handler or trigger. In the example below, the counter value is increased each time a data operation is performed on the `bands` space. To increase a counter value, counter_obj:inc() is called.

```lua
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
local trigger = require('trigger')
trigger.set(
    'box.space.bands.on_replace',
    'update_bands_replace_count_metric',
    function(_, _, _, request_type)
        bands_replace_count:inc(1, { request_type = request_type })
    end
)
```
• At the time of requesting the data collected by metrics. In this case, you need to collect the required metric inside metrics.register_callback(). The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:

```lua
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
```

To set a gauge value, gauge_obj:set() is called.

You can find the full example on GitHub: metrics_collect_custom.

Possible limitations

The module lets you add your own metrics, but there are some subtleties when working with specific tools.

When adding a custom metric, it’s important to keep the number of label value combinations to a minimum. Otherwise, a combinatorial explosion may occur in the time series database that stores the metric values.

For example, if your company uses InfluxDB for metric collection, you can potentially disrupt the entire monitoring setup, both for your application and for all other systems within the company. As a result, monitoring data is likely to be lost.

Example:

```lua
local metrics = require('metrics')
local some_metric = metrics.counter('some', 'Some metric')

-- THIS IS POSSIBLE
local function on_value_update(instance_alias)
    some_metric:inc(1, { alias = instance_alias })
end

-- THIS IS NOT ALLOWED
local function on_value_update(customer_id)
    some_metric:inc(1, { customer_id = customer_id })
end
```

In the example, there are two versions of the function `on_value_update`. The first version labels the data with the cluster instance’s alias. Since there’s a relatively small number of nodes, using their aliases as label values is feasible. The second version uses the identifier of a record. If there are many records, each identifier produces its own time series, so avoid labels of this kind.

The same principle applies to URLs. Using the entire URL with parameters is not recommended. Use a URL template or the name of the command instead.
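
One way to derive such a template is sketched below in plain Lua. The helper `to_route_label` and the `:id` placeholder are hypothetical and are not part of the `metrics` module. The sketch assumes that purely numeric path segments are record identifiers, which may not hold for every application.

```lua
-- Turn a concrete request URL into a low-cardinality route label.
-- Hypothetical helper for illustration; not part of the metrics module.
local function to_route_label(url)
    -- Drop the query string, if any.
    local path = url:gsub('%?.*$', '')
    local segments = {}
    for segment in path:gmatch('[^/]+') do
        -- Replace purely numeric segments with a placeholder.
        if segment:match('^%d+$') then
            segment = ':id'
        end
        table.insert(segments, segment)
    end
    return '/' .. table.concat(segments, '/')
end

-- Both requests map to the same label value.
print(to_route_label('/users/42/orders?page=2'))
print(to_route_label('/users/7/orders'))
```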

In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal set of values that can uniquely identify the data without introducing unnecessary complexity or potential conflicts with existing metrics and systems.
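
To get a feel for the combinatorial growth described in this section, the plain-Lua sketch below computes an upper bound on the number of time series a single metric can produce. The helper `max_series_count` and all the numbers are made up for illustration.

```lua
-- Upper bound on the number of time series for one metric:
-- the product of the number of distinct values of each label.
local function max_series_count(distinct_values_per_label)
    local total = 1
    for _, count in pairs(distinct_values_per_label) do
        total = total * count
    end
    return total
end

-- A handful of instances, methods, and statuses stays manageable.
local small = max_series_count({ alias = 6, method = 4, status = 5 })

-- A per-customer label multiplies that by the number of customers.
local large = max_series_count({ alias = 6, method = 4, status = 5, customer_id = 100000 })

print(small, large)
```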

Collecting HTTP metrics

The `metrics` module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module. The latency collector observes both latency information and the number of invocations. The metrics collected by HTTP middleware are separated by a set of labels:

• a route (`path`)
• a method (`method`)
• an HTTP status code (`status`)

For each route that you want to track, you must specify the middleware explicitly. The example below shows how to collect statistics for requests made to the `/metrics/hello` endpoint.

```lua
httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
metrics.http_middleware.configure_default_collector('summary')
httpd:route({
    method = 'GET',
    path = '/metrics/hello'
}, metrics.http_middleware.v1(
    function()
        return {
            status = 200,
            headers = { ['content-type'] = 'text/plain' },
            body = 'Hello from http_middleware!'
        }
    end))

httpd:start()
```

Note

The middleware does not cover 404 errors.

Collecting metrics using plugins

The `metrics` module provides a set of plugins that let you collect metrics through a unified interface.

For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the `metrics.plugins.prometheus.collect_http()` function:

```lua
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
```

To expose the collected metrics, you can use the http module:

```lua
httpd = require('http.server').new('127.0.0.1', 8080)
httpd:route({
    method = 'GET',
    path = '/metrics/prometheus'
}, function()
    local prometheus_plugin = require('metrics.plugins.prometheus')
    local prometheus_metrics = prometheus_plugin.collect_http()
    return prometheus_metrics
end)
httpd:start()
```

Example on GitHub: metrics_plugins

Creating custom plugins

Use the following API to create custom plugins:

• metrics.invoke_callbacks()
• metrics.collectors()
• collector_object

To create a plugin, you need to include the following in your main export function:

```lua
-- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
metrics.invoke_callbacks()

-- Loop over collectors
for _, c in pairs(metrics.collectors()) do
    ...

    -- Loop over instant observations in the collector
    for _, obs in pairs(c:collect()) do
        -- Export observation `obs`
        ...
    end
end
```

See the source code of built-in plugins in the metrics GitHub repository.
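
As a rough sketch of how the loop above could grow into a complete export function, the example below renders every observation as a text line. The function name `export_lines` and the output format are hypothetical. The observation fields `metric_name`, `label_pairs`, and `value` are assumptions about the shape of the objects returned by `c:collect()`, so check them against the built-in plugins before relying on them. The module is passed in as an argument so that the function can be exercised with a stub.

```lua
-- Render all current observations as `name{k="v",...} value` lines.
-- Hypothetical export function; `metrics` is the module (or a stub).
local function export_lines(metrics)
    -- Let registered callbacks refresh their metrics first.
    metrics.invoke_callbacks()

    local lines = {}
    for _, c in pairs(metrics.collectors()) do
        for _, obs in pairs(c:collect()) do
            -- Sort label names so that the output is stable.
            local names = {}
            for name in pairs(obs.label_pairs or {}) do
                table.insert(names, name)
            end
            table.sort(names)

            local labels = {}
            for _, name in ipairs(names) do
                local value = tostring(obs.label_pairs[name])
                table.insert(labels, ('%s=%q'):format(name, value))
            end

            table.insert(lines, ('%s{%s} %s'):format(
                obs.metric_name, table.concat(labels, ','), tostring(obs.value)))
        end
    end
    return lines
end
```

Inside Tarantool, the function would be called as `export_lines(require('metrics'))`.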

API Reference

metrics API:

• metrics.cfg(): Entrypoint to setup the module
• metrics.collect(): Collect observations from each collector
• metrics.collectors(): List all collectors in the registry
• metrics.counter(): Register a new counter
• metrics.enable_default_metrics(): Same as `metrics.cfg{ include = include, exclude = exclude }`
• metrics.gauge(): Register a new gauge
• metrics.histogram(): Register a new histogram
• metrics.invoke_callbacks(): Invoke all registered callbacks
• metrics.register_callback(): Register a function named `callback`
• metrics.set_global_labels(): Same as `metrics.cfg{ labels = label_pairs }`
• metrics.summary(): Register a new summary
• metrics.unregister_callback(): Unregister a function named `callback`

metrics.http_middleware API:

• metrics.http_middleware.build_default_collector(): Register and return a collector for the middleware
• metrics.http_middleware.configure_default_collector(): Register a collector for the middleware and set it as default
• metrics.http_middleware.get_default_collector(): Get the default collector
• metrics.http_middleware.set_default_collector(): Set the default collector
• metrics.http_middleware.v1(): Latency measuring wrap-up

Related objects:

• collector_object: A collector object
• counter_obj: A counter object
• gauge_obj: A gauge object
• histogram_obj: A histogram object
• registry: A metrics registry
• summary_obj: A summary object

metrics API

`metrics.cfg([config])`

Entry point for setting up the module.

Parameters: config (`table`) – module configuration options:

• `cfg.include` (string/table, default `all`): `all` to enable all supported default metrics, `none` to disable all default metrics, or a table with the names of the default metrics to enable.
• `cfg.exclude` (table, default `{}`): a table containing the names of the default metrics that you want to disable. Has higher priority than `cfg.include`.
• `cfg.labels` (table, default `{}`): a table containing label names as string keys and label values as values. See also: Labels.

You can work with `metrics.cfg` as a table to read values, but you must call `metrics.cfg{}` as a function to update them.

Supported default metric names (for `cfg.include` and `cfg.exclude` tables):

• `all` (metasection including all metrics)
• `network`
• `operations`
• `system`
• `replicas`
• `info`
• `slab`
• `runtime`
• `memory`
• `spaces`
• `fibers`
• `cpu`
• `vinyl`
• `memtx`
• `luajit`
• `clock`
• `event_loop`
• `config`

See metrics reference for details. All metric collectors from the collection have `metainfo.default = true`.

`cfg.labels` are the global labels to be added to every observation.

Global labels are applied only to metric collection. They have no effect on how observations are stored.

Global labels can be changed on the fly.

`label_pairs` from observation objects have priority over global labels. If you pass `label_pairs` to an observation method with the same key as some global label, the method argument value will be used.

Note that both label names and values in `label_pairs` are treated as strings.
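
The priority rule can be pictured with the plain-Lua sketch below. The helper `effective_labels` and the label values are hypothetical; the sketch only mimics the documented behavior and is not the module’s own code.

```lua
-- Combine global labels with per-observation label pairs.
-- Hypothetical helper that mimics the documented priority rule.
local function effective_labels(global_labels, label_pairs)
    local result = {}
    for name, value in pairs(global_labels) do
        result[tostring(name)] = tostring(value)
    end
    -- Per-observation pairs are applied last, so they win on conflicts.
    for name, value in pairs(label_pairs) do
        result[tostring(name)] = tostring(value)
    end
    return result
end

local labels = effective_labels(
    { alias = 'storage-1', zone = 'eu' },
    { alias = 'override', status = 200 }
)
-- `alias` comes from the observation, `zone` from the global labels,
-- and the numeric status is converted to a string.
print(labels.alias, labels.zone, labels.status)
```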

`metrics.collect([opts])`

Collect observations from each collector.

Parameters: opts (`table`) – table of collect options:

• `invoke_callbacks` – if `true`, invoke_callbacks() is triggered before the actual collection.
• `default_only` – if `true`, observations contain only default metrics (`metainfo.default = true`).

`metrics.collectors()`

List all collectors in the registry. Designed to be used in exporters.

Return: A list of created collectors (see collector_object).

`metrics.counter(name[, help, metainfo])`

Register a new counter.

Parameters:

• name (`string`) – collector name. Must be unique.
• help (`string`) – collector description.
• metainfo (`table`) – collector metainfo.

Return: A counter object (see counter_obj).

Return type: counter_obj

`metrics.enable_default_metrics([include, exclude])`

Same as `metrics.cfg{include=include, exclude=exclude}`, but `include={}` is treated as `include='all'` for backward compatibility.

`metrics.gauge(name[, help, metainfo])`

Register a new gauge.

Parameters:

• name (`string`) – collector name. Must be unique.
• help (`string`) – collector description.
• metainfo (`table`) – collector metainfo.

Return: A gauge object (see gauge_obj).

Return type: gauge_obj

`metrics.histogram(name[, help, buckets, metainfo])`

Register a new histogram.

Parameters:

• name (`string`) – collector name. Must be unique.
• help (`string`) – collector description.
• buckets (`table`) – histogram buckets (an array of sorted positive numbers). The infinity bucket (`INF`) is appended automatically. Default: `{.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF}`.
• metainfo (`table`) – collector metainfo.

Return: A histogram object (see histogram_obj).

Return type: histogram_obj

Note

A histogram is basically a set of collectors:

• `name .. "_sum"` – a counter holding the sum of added observations.
• `name .. "_count"` – a counter holding the number of added observations.
• `name .. "_bucket"` – a counter holding all bucket sizes under the label `le` (less or equal). To access a specific bucket – `x` (where `x` is a number), specify the value `x` for the label `le`.
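
The relationship between the three collectors can be pictured with the plain-Lua model below. It is a simplified sketch with hypothetical names (`new_histogram`, `observe`) and made-up bucket bounds; it follows the Prometheus convention of cumulative buckets and is not the module’s own implementation.

```lua
-- Plain-Lua model of a histogram with cumulative `le` buckets.
local function new_histogram(bounds)
    local h = { bounds = {}, bucket = {}, sum = 0, count = 0 }
    for i, bound in ipairs(bounds) do
        h.bounds[i] = bound
    end
    -- The infinity bucket is always present.
    table.insert(h.bounds, math.huge)
    for _, bound in ipairs(h.bounds) do
        h.bucket[bound] = 0
    end
    return h
end

local function observe(h, value)
    h.sum = h.sum + value     -- corresponds to name .. "_sum"
    h.count = h.count + 1     -- corresponds to name .. "_count"
    -- Every bucket whose upper bound is >= value is incremented, so
    -- bucket[le] counts all observations less than or equal to `le`.
    for _, bound in ipairs(h.bounds) do
        if value <= bound then
            h.bucket[bound] = h.bucket[bound] + 1
        end
    end
end

local h = new_histogram({ 0.1, 0.5, 1 })
for _, v in ipairs({ 0.05, 0.3, 0.3, 0.7, 2 }) do
    observe(h, v)
end
print(h.count, h.bucket[0.1], h.bucket[0.5], h.bucket[1], h.bucket[math.huge])
```

Because the buckets are cumulative, the infinity bucket always equals the total number of observations.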

`metrics.invoke_callbacks()`

Invoke all registered callbacks. Must be called before each collect(). You can also use `collect{invoke_callbacks = true}` instead. If you’re using one of the default exporters, `invoke_callbacks()` will be called by the exporter.

`metrics.register_callback(callback)`

Register a function named `callback`, which will be called right before metric collection on plugin export.

Parameters: callback (`function`) – a function that takes no parameters.

This method is most often used for gauge metrics updates.

Example:

```lua
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
```

`metrics.set_global_labels(label_pairs)`

Same as `metrics.cfg{ labels = label_pairs }`. Learn more in metrics.cfg().

`metrics.summary(name[, help, objectives, params, metainfo])`

Register a new summary. Quantile computation is based on the "Effective computation of biased quantiles over data streams" algorithm.

Parameters:

• name (`string`) – collector name. Must be unique.
• help (`string`) – collector description.
• objectives (`table`) – a list of "targeted" φ-quantiles in the `{quantile = error, ... }` form. Example: `{[0.5]=0.01, [0.9]=0.01, [0.99]=0.01}`. The targeted φ-quantile is specified in the form of a φ-quantile and the tolerated error. For example, `{[0.5] = 0.1}` means that the median (= 50th percentile) is to be returned with a 10-percent error. Note that percentiles and quantiles are the same concept, except that percentiles are expressed as percentages. The φ-quantile must be in the interval `[0, 1]`. A lower tolerated error for a φ-quantile results in higher memory and CPU usage during summary calculation.
• params (`table`) – table of the summary parameters used to configure the sliding time window. This window consists of several buckets to store observations. New observations are added to each bucket. After a time period, the head bucket (from which observations are collected) is reset, and the next bucket becomes the new head. This way, each bucket stores observations for `max_age_time * age_buckets_count` seconds before it is reset. `max_age_time` sets the duration of each bucket’s lifetime, that is, how many seconds the observations are kept before they are discarded. `age_buckets_count` sets the number of buckets in the sliding time window. This variable determines the number of buckets used to exclude observations older than `max_age_time` from the summary. The value is a trade-off between resources (memory and CPU for maintaining the bucket) and how smoothly the time window moves. Default value: `{max_age_time = math.huge, age_buckets_count = 1}`.
• metainfo (`table`) – collector metainfo.

Return: A summary object (see summary_obj).

Return type: summary_obj

Note

A summary represents a set of collectors:

• `name .. "_sum"` – a counter holding the sum of added observations.
• `name .. "_count"` – a counter holding the number of added observations.
• `name` – holds all observed quantiles under the label `quantile` (less or equal). To access the quantile `x` (where `x` is a number), specify the value `x` for the label `quantile`.
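
A minimal usage sketch, assuming it runs inside a Tarantool instance where the `metrics` module is available. The metric name, the objectives, and the window settings are illustrative. The `observe` method is assumed to be provided by the summary object (see summary_obj), which is not described in this part of the reference.

```lua
local metrics = require('metrics')

-- Track the median and the 99th percentile of request latency over a
-- sliding window. The name and the window settings are made up.
local latency = metrics.summary(
    'request_latency_seconds',
    'Request latency in seconds',
    { [0.5] = 0.01, [0.99] = 0.001 },
    { max_age_time = 60, age_buckets_count = 5 }
)

-- Record a few measurements together with a label.
-- `observe` is assumed to be the summary object's recording method.
for _, seconds in ipairs({ 0.12, 0.05, 0.30 }) do
    latency:observe(seconds, { method = 'GET' })
end
```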

`metrics.unregister_callback(callback)`

Unregister a function named `callback` that is called right before metric collection on plugin export.

Parameters: callback (`function`) – a function that takes no parameters.

Example:

```lua
local metrics = require('metrics')

local cpu_callback = function()
    local cpu_metrics = require('metrics.psutils.cpu')
    cpu_metrics.update()
end

metrics.register_callback(cpu_callback)

-- after a while, we don't need that callback function anymore

metrics.unregister_callback(cpu_callback)
```

metrics.http_middleware API

`metrics.http_middleware.build_default_collector(type_name, name[, help])`

Register and return a collector for the middleware.

Parameters:

• type_name (`string`) – collector type: `histogram` or `summary`. The default is `histogram`.
• name (`string`) – collector name. The default is `http_server_request_latency`.
• help (`string`) – collector description. The default is `HTTP Server Request Latency`.

Return: A collector object

Possible errors:

• A collector with the same type and name already exists in the registry.

`metrics.http_middleware.configure_default_collector(type_name, name, help)`

Register a collector for the middleware and set it as default.

Parameters:

• type_name (`string`) – collector type: `histogram` or `summary`. The default is `histogram`.
• name (`string`) – collector name. The default is `http_server_request_latency`.
• help (`string`) – collector description. The default is `HTTP Server Request Latency`.

Possible errors:

• A collector with the same type and name already exists in the registry.

`metrics.http_middleware.get_default_collector()`

Return the default collector. If the default collector hasn’t been set yet, register it (with default http_middleware.build_default_collector() parameters) and set it as default.

Return: A collector object

`metrics.http_middleware.set_default_collector(collector)`

Set the default collector.

Parameters: collector – middleware collector object.

`metrics.http_middleware.v1(handler, collector)`

Latency-measuring wrapper for an HTTP ver. `1.x.x` handler. Returns a wrapped handler.

Parameters:

• handler (`function`) – handler function.
• collector – middleware collector object. If not set, the default collector is used (as in http_middleware.get_default_collector()).

```lua
httpd:route(route, http_middleware.v1(request_handler, collector))