Version:

Reference / Rocks reference / Module vshard / Configuration reference
Reference / Rocks reference / Module vshard / Configuration reference

Configuration reference

Configuration reference

Basic parameters

sharding

A field defining the logical topology of the sharded Tarantool cluster.

Type: table
Default: false
Dynamic: yes
weights

A field defining the configuration of relative weights for each zone pair in a replica set. See the Replica weights section.

Type: table
Default: false
Dynamic: yes
shard_index

An index over the bucket id.

Type: non-empty string or non-negative integer
Default: coincides with the bucket id number
Dynamic: no
bucket_count

The total number of buckets in a cluster.

This number should be several orders of magnitude larger than the potential number of cluster nodes, considering potential scaling out in the foreseeable future.

Example:

If the estimated number of nodes is M, then the data set should be divided into 100M or even 1000M buckets, depending on the planned scaling out. This number is certainly greater than the potential number of cluster nodes in the system being designed.

Keep in mind that too many buckets can cause a need to allocate more memory to store routing information. On the other hand, an insufficient number of buckets can lead to decreased granularity when rebalancing.

Type: number
Default: 3000
Dynamic: no
collect_bucket_garbage_interval

The interval between garbage collector actions, in seconds.

Type: number
Default: 0.5
Dynamic: yes
collect_lua_garbage

If set to true, the Lua collectgarbage() function is called periodically.

Type: boolean
Default: no
Dynamic: yes
sync_timeout

Timeout to wait for synchronization of the old master with replicas before demotion. Used when switching a master or when manually calling the sync() function.

Type: number
Default: 1
Dynamic: yes
rebalancer_disbalance_threshold

A maximum bucket disbalance threshold, in percent. The threshold is calculated for each replica set using the following formula:

|etalon_bucket_count - real_bucket_count| / etalon_bucket_count * 100
Type: number
Default: 1
Dynamic: yes
rebalancer_max_receiving

The maximum number of buckets that can be received in parallel by a single replica set. This number must be limited, because when a new replica set is added to a cluster, the rebalancer sends a very large amount of buckets from the existing replica sets to the new replica set. This produces a heavy load on the new replica set.

Example:

Suppose rebalancer_max_receiving is equal to 100, bucket_count is equal to 1000. There are 3 replica sets with 333, 333 and 334 buckets on each respectively. When a new replica set is added, each replica set’s etalon_bucket_count becomes equal to 250. Rather than receiving all 250 buckets at once, the new replica set receives 100, 100 and 50 buckets sequentially.

Type: number
Default: 100
Dynamic: yes
rebalancer_max_sending

The degree of parallelism for parallel rebalancing.

Works for storages only, ignored for routers.

The maximum value is 15.

Type: number
Default: 1
Dynamic: yes

Replica set functions

uuid

A unique identifier of a replica set.

Type:
Default:
Dynamic:
weight

A weight of a replica set. See the Replica set weights section for details.

Type:
Default: 1
Dynamic:

API reference

Router public API

vshard.router.bootstrap()

Perform the initial cluster bootstrap and distribute all buckets across the replica sets.

Parameters:
  • timeout – a number of seconds before ending a bootstrap attempt as unsuccessful. Recreate the cluster in case of bootstrap timeout.
vshard.router.cfg(cfg)

Configure the database and start sharding for the specified router instance. See the sample configuration above.

Parameters:
  • cfg – a configuration table
vshard.router.new(name, cfg)

Create a new router instance. vshard supports multiple routers in a single Tarantool instance. Each router can be connected to any vshard cluster, and multiple routers can be connected to the same cluster.

A router created via vshard.router.new() works in the same way as a static router, but the method name is preceded by a colon (vshard.router:method_name(...)), while for a static router the method name is preceded by a period (vshard.router.method_name(...)).

A static router can be obtained via the vshard.router.static() method and then used like a router created via the vshard.router.new() method.

Note

box.cfg is shared among all the routers of a single instance.

Parameters:
  • name – a router instance name. This name is used as a prefix in logs of the router and must be unique within the instance
  • cfg – a configuration table. The sample configuration is described above.
Return:

a router instance, if created successfully; otherwise, nil and an error object

vshard.router.call(bucket_id, mode, function_name, {argument_list}, {options})

Call the function identified by function-name on the shard storing the bucket identified by bucket_id. See the Processing requests section for details on function operation.

Parameters:
  • bucket_id – a bucket identifier
  • mode – either a string = ‘read’|’write’, or a map with mode=’read’|’write’ and/or prefer_replica=true|false and/or balance=true|false.
  • function_name – a function to execute
  • argument_list – an array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. If the router cannot identify a shard with the specified bucket_id, the operation will be repeated until the timeout is reached.

The mode parameter has two possible forms: a string or a map. Examples of the string form are: 'read', 'write'. Examples of the map form are: {mode='read'}, {mode='write'}, {mode='read', prefer_replica=true}, {mode='read', balance=true}, {mode='read', prefer_replica=true, balance=true}.

If 'write' is specified then the target is the master.

If prefer_replica=true is specified then the preferred target is one of the replicas, but the target is the master if there is no conveniently available replica.

It may be good to specify prefer_replica=true for functions which are expensive in terms of resource use, to avoid slowing down the master.

If balance=true then there is load balancing – reads are distributed over all the nodes in the replica set in round-robin fashion, with a preference for replicas if prefer_replica=true is also set.

Return:

The original return value of the executed function, or nil and error object. The error object has a type attribute equal to ShardingError or one of the regular Tarantool errors (ClientError, OutOfMemory, SocketError, etc.).

ShardingError is returned on errors specific for sharding: the master is missing, wrong bucket id, etc. It has an attribute code containing one of the values from the vshard.error.code.* LUA table, an optional attribute containing a message with the human-readable error description, and other attributes specific for the error code.

Examples:

To call customer_add function from vshard/example, say:

vshard.router.call(100, 'write', 'customer_add', {{customer_id = 2, bucket_id = 100, name = 'name2', accounts = {}}}, {timeout = 100})
-- or, the same thing but with a map for the second argument
vshard.router.call(100, {mode='write'}, 'customer_add', {{customer_id = 2, bucket_id = 100, name = 'name2', accounts = {}}}, {timeout = 100})
vshard.router.callro(bucket_id, function_name, {argument_list}, {options})

Call the function identified by function-name on the shard storing the bucket identified by bucket_id, in read-only mode (similar to calling vshard.router.call with mode=’read’). See the Processing requests section for details on function operation.

Parameters:
  • bucket_id – a bucket identifier
  • function_name – a function to execute
  • argument_list – an array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
Return:

The original return value of the executed function, or nil and error object. The error object has a type attribute equal to ShardingError or one of the regular Tarantool errors (ClientError, OutOfMemory, SocketError, etc.).

ShardingError is returned on errors specific for sharding: the replica set is not available, the master is missing, wrong bucket id, etc. It has an attribute code containing one of the values from the vshard.error.code.* LUA table, an optional attribute containing a message with the human-readable error description, and other attributes specific for this error code.

vshard.router.callrw(bucket_id, function_name, {argument_list}, {options})

Call the function identified by function-name on the shard storing the bucket identified by bucket_id, in read-write mode (similar to calling vshard.router.call with mode=’write’). See the Processing requests section for details on function operation.

Parameters:
  • bucket_id – a bucket identifier
  • function_name – a function to execute
  • argument_list – an array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
Return:

The original return value of the executed function, or nil and error object. The error object has a type attribute equal to ShardingError or one of the regular Tarantool errors (ClientError, OutOfMemory, SocketError, etc.).

ShardingError is returned on errors specific for sharding: the replica set is not available, the master is missing, wrong bucket id, etc. It has an attribute code containing one of the values from the vshard.error.code.* LUA table, an optional attribute containing a message with the human-readable error description, and other attributes specific for this error code.

vshard.router.callre(bucket_id, function_name, {argument_list}, {options})

Call the function identified by function-name on the shard storing the bucket identified by bucket_id, in read-only mode (similar to calling vshard.router.call with mode='read'), with preference for a replica rather than a master (similar to calling vshard.router.call with prefer_replica = true). See the Processing requests section for details on function operation.

Parameters:
  • bucket_id – a bucket identifier
  • function_name – a function to execute
  • argument_list – an array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
Return:

The original return value of the executed function, or nil and error object. The error object has a type attribute equal to ShardingError or one of the regular Tarantool errors (ClientError, OutOfMemory, SocketError, etc.).

ShardingError is returned on errors specific for sharding: the replica set is not available, the master is missing, wrong bucket id, etc. It has an attribute code containing one of the values from the vshard.error.code.* LUA table, an optional attribute containing a message with the human-readable error description, and other attributes specific for this error code.

vshard.router.callbro(bucket_id, function_name, {argument_list}, {options})

This has the same effect as vshard.router.call() with mode parameter = {mode=’read’, balance=true}.

vshard.router.callbre(bucket_id, function_name, {argument_list}, {options})

This has the same effect as vshard.router.call() with mode parameter = {mode='read', balance=true, prefer_replica=true}.

vshard.router.route(bucket_id)

Return the replica set object for the bucket with the specified bucket id value.

Parameters:
  • bucket_id – a bucket identifier
Return:

a replica set object

Example:

replicaset = vshard.router.route(123)
vshard.router.routeall()

Return all available replica set objects.

Return:a map of the following type: {UUID = replicaset}
Rtype:a replica set object

Example:

replicaset = vshard.router.routeall()
vshard.router.bucket_id(key)

Calculate the bucket id using a simple built-in hash function.

Parameters:
  • key – a hash key. This can be any Lua object (number, table, string).
Return:

a bucket identifier

Rtype:

number

Example:

bucket_id = vshard.router.bucket_id(18374927634039)
vshard.router.bucket_count()

Return the total number of buckets specified in vshard.router.cfg().

Return:the total number of buckets
Rtype:number
vshard.router.sync(timeout)

Wait until the dataset is synchronized on replicas.

Parameters:
  • timeout – a timeout, in seconds
Return:

true if the dataset was synchronized successfully; or nil and err explaining why the dataset cannot be synchronized.

vshard.router.discovery_wakeup()

Force wakeup of the bucket discovery fiber.

vshard.router.info()

Return information about each instance.

Return:

Replica set parameters:

  • replica set uuid
  • master instance parameters
  • replica instance parameters

Instance parameters:

  • uri — URI of the instance
  • uuid — UUID of the instance
  • status – status of the instance (available, unreachable, missing)
  • network_timeout – a timeout for the request. The value is updated automatically on each 10th successful request and each 2nd failed request.

Bucket parameters:

  • available_ro – the number of buckets known to the router and available for read requests
  • available_rw – the number of buckets known to the router and available for read and write requests
  • unavailable – the number of buckets known to the router but unavailable for any requests
  • unreachable – the number of buckets whose replica sets are not known to the router

Example:

tarantool> vshard.router.info()
---
- replicasets:
    ac522f65-aa94-4134-9f64-51ee384f1a54:
      replica: &0
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3303
        uuid: 1e02ae8a-afc0-4e91-ba34-843a356b8ed7
      uuid: ac522f65-aa94-4134-9f64-51ee384f1a54
      master: *0
    cbf06940-0790-498b-948d-042b62cf3d29:
      replica: &1
        network_timeout: 0.5
        status: available
        uri: storage@127.0.0.1:3301
        uuid: 8a274925-a26d-47fc-9e1b-af88ce939412
      uuid: cbf06940-0790-498b-948d-042b62cf3d29
      master: *1
  bucket:
    unreachable: 0
    available_ro: 0
    unknown: 0
    available_rw: 3000
  status: 0
  alerts: []
...
vshard.router.buckets_info()

Return information about each bucket. Since a bucket map can be huge, only the required range of buckets can be specified.

Parameters:
  • offset – the offset in a bucket map of the first bucket to show
  • limit – the maximum number of buckets to show
Return:

a map of the following type: {bucket_id = 'unknown'/replicaset_uuid}

replicaset.call(replicaset, function_name, {argument_list}, {options})

Call a function on a nearest available master (distances are defined using replica.zone and cfg.weights matrix) with specified arguments.

Note

The replicaset.call method is similar to replicaset.callrw.

Parameters:
  • replicaset – UUID of a replica set
  • function_name – function to execute
  • argument_list – array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
replicaset.callrw(replicaset, function_name, {argument_list}, {options})

Call a function on a nearest available master (distances are defined using replica.zone and cfg.weights matrix) with a specified arguments.

Note

The replicaset.callrw method is similar to replicaset.call.

Parameters:
  • replicaset – UUID of a replica set
  • function_name – function to execute
  • argument_list – array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
replicaset.callro(function_name, {argument_list}, {options})

Call a function on the nearest available replica (distances are defined using replica.zone and cfg.weights matrix) with specified arguments. It is recommended to call only read-only functions using replicaset.callro(), as the function can be executed not only on a master, but also on replicas.

Parameters:
  • replicaset – UUID of a replica set
  • function_name – function to execute
  • argument_list – array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.
replicaset.callre(function_name, {argument_list}, {options})

Call a function on the nearest available replica (distances are defined using replica.zone and cfg.weights matrix) with specified arguments, with preference for a replica rather than a master (similar to calling vshard.router.call with prefer_replica = true). It is recommended to call only read-only functions using replicaset.callre(), as the function can be executed not only on a master, but also on replicas.

Parameters:
  • replicaset – UUID of a replica set
  • function_name – function to execute
  • argument_list – array of the function’s arguments
  • options
    • timeout – a request timeout, in seconds. In case the router cannot identify a shard with the bucket id, the operation will be repeated until the timeout is reached.

Router internal API

vshard.router.bucket_discovery(bucket_id)

Search for the bucket in the whole cluster. If the bucket is not found, it is likely that it does not exist. The bucket might also be moved during rebalancing and currently is in the RECEIVING state.

Parameters:
  • bucket_id – a bucket identifier

Storage public API

vshard.storage.cfg(cfg, name)

Configure the database and start sharding for the specified storage instance.

Parameters:
  • cfg – a storage configuration
  • instance_uuid – UUID of the instance
vshard.storage.info()

Return information about the storage instance in the following format:

tarantool> vshard.storage.info()
---
- buckets:
    2995:
      status: active
      id: 2995
    2997:
      status: active
      id: 2997
    2999:
      status: active
      id: 2999
  replicasets:
    2dd0a343-624e-4d3a-861d-f45efc571cd3:
      uuid: 2dd0a343-624e-4d3a-861d-f45efc571cd3
      master:
        state: active
        uri: storage:storage@127.0.0.1:3301
        uuid: 2ec29309-17b6-43df-ab07-b528e1243a79
    c7ad642f-2cd8-4a8c-bb4e-4999ac70bba1:
      uuid: c7ad642f-2cd8-4a8c-bb4e-4999ac70bba1
      master:
        state: active
        uri: storage:storage@127.0.0.1:3303
        uuid: 810d85ef-4ce4-4066-9896-3c352fec9e64
...
vshard.storage.call(bucket_id, mode, function_name, {argument_list})

Call the specified function on the current storage instance.

Parameters:
  • bucket_id – a bucket identifier
  • mode – a type of the function: ‘read’ or ‘write’
  • function_name – function to execute
  • argument_list – array of the function’s arguments
Return:

The original return value of the executed function, or nil and error object.

vshard.storage.sync(timeout)

Wait until the dataset is synchronized on replicas.

Parameters:
  • timeout – a timeout, in seconds
Return:

true if the dataset was synchronized successfully; or nil and err explaining why the dataset cannot be synchronized.

vshard.storage.bucket_pin(bucket_id)

Pin a bucket to a replica set. A pinned bucket cannot be moved even if it breaks the cluster balance.

Parameters:
  • bucket_id – a bucket identifier
Return:

true if the bucket is pinned successfully; or nil and err explaining why the bucket cannot be pinned

vshard.storage.bucket_unpin(bucket_id)

Return a pinned bucket back into the active state.

Parameters:
  • bucket_id – a bucket identifier
Return:

true if the bucket is unpinned successfully; or nil and err explaining why the bucket cannot be unpinned

vshard.storage.bucket_ref(bucket_id, mode)

Create an RO or RW ref.

Parameters:
  • bucket_id – a bucket identifier
  • mode – ‘read’ or ‘write’
Return:

true if the bucket ref is created successfully; or nil and err explaining why the ref cannot be created

vshard.storage.bucket_refro()

An alias for vshard.storage.bucket_ref in the RO mode.

vshard.storage.bucket_refrw()

An alias for vshard.storage.bucket_ref in the RW mode.

vshard.storage.bucket_unref(bucket_id, mode)

Remove a RO/RW ref.

Parameters:
  • bucket_id – a bucket identifier
  • mode – ‘read’ or ‘write’
Return:

true if the bucket ref is removed successfully; or nil and err explaining why the ref cannot be removed

vshard.storage.bucket_unrefro()

An alias for vshard.storage.bucket_unref in the RO mode.

vshard.storage.bucket_unrefrw()

An alias for vshard.storage.bucket_unref in the RW mode.

vshard.storage.find_garbage_bucket(bucket_index, control)

Find a bucket which has data in a space but is not stored in a _bucket space; or is in a GARBAGE state.

Parameters:
  • bucket_index – index of a space with the part of a bucket id
  • control – a garbage collector controller. If there is an increased buckets generation, then the search should be interrupted.
Return:

an identifier of the bucket in the garbage state, if found; otherwise, nil

vshard.storage.buckets_info()

Return information about each bucket located in storage. For example:

vshard.storage.buckets_info(1)
---
- 1:
    status: active
    ref_rw: 1
    ref_ro: 1
    ro_lock: true
    rw_lock: true
    id: 1
vshard.storage.buckets_count()

Return the number of buckets located in storage.

vshard.storage.recovery_wakeup()

Immediately wake up a recovery fiber, if it exists.

vshard.storage.rebalancing_is_in_progress()

Return a flag indicating whether rebalancing is in progress. The result is true if the node is currently applying routes received from a rebalancer node in the special fiber.

vshard.storage.is_locked()

Return a flag indicating whether storage is invisible to the rebalancer.

vshard.storage.rebalancer_disable()

Disable rebalancing. A disabled rebalancer sleeps until it is enabled again with vshard.storage.rebalancer_enable().

vshard.storage.rebalancer_enable()

Enable rebalancing.

vshard.storage.sharded_spaces()

Show the spaces that are visible to rebalancer and garbage collector fibers.

Storage internal API

vshard.storage.bucket_recv(bucket_id, from, data)

Receive a bucket identified by bucket id from a remote replica set.

Parameters:
  • bucket_id – a bucket identifier
  • from – UUID of source replica set
  • data – data logically stored in a bucket identified by bucket_id, in the same format as the return value from bucket_collect() <storage_api-bucket_collect>
vshard.storage.bucket_stat(bucket_id)

Return information about the bucket id:

tarantool> vshard.storage.bucket_stat(1)
---
- 0
- status: active
  id: 1
...
Parameters:
  • bucket_id – a bucket identifier
vshard.storage.bucket_delete_garbage(bucket_id)

Force garbage collection for the bucket identified by bucket_id in case the bucket was transferred to a different replica set.

Parameters:
  • bucket_id – a bucket identifier
vshard.storage.bucket_collect(bucket_id)

Collect all the data that is logically stored in the bucket identified by bucket_id:

tarantool> vshard.storage.bucket_collect(1)
---
- 0
- - - 514
    - - [10, 1, 1, 100, 'Account 10']
      - [11, 1, 1, 100, 'Account 11']
      - [12, 1, 1, 100, 'Account 12']
      - [50, 5, 1, 100, 'Account 50']
      - [51, 5, 1, 100, 'Account 51']
      - [52, 5, 1, 100, 'Account 52']
  - - 513
    - - [1, 1, 'Customer 1']
      - [5, 1, 'Customer 5']
...
Parameters:
  • bucket_id – a bucket identifier
vshard.storage.bucket_force_create(first_bucket_id, count)

Force creation of the buckets (single or multiple) on the current replica set. Use only for manual emergency recovery or for initial bootstrap.

Parameters:
  • first_bucket_id – an identifier of the first bucket in a range
  • count – the number of buckets to insert (default = 1)
vshard.storage.bucket_force_drop(bucket_id)

Drop a bucket manually for tests or emergency cases.

Parameters:
  • bucket_id – a bucket identifier
vshard.storage.bucket_send(bucket_id, to)

Send a specified bucket from the current replica set to a remote replica set.

Parameters:
  • bucket_id – bucket identifier
  • to – UUID of a remote replica set
vshard.storage.rebalancer_request_state()

Check all buckets of the host storage that have the SENT or ACTIVE state, return the number of active buckets.

Return:the number of buckets in the active state, if found; otherwise, nil
vshard.storage.buckets_discovery()

Collect an array of active bucket identifiers for discovery.