Data storage | Tarantool
Tarantool
Check out the new release policy
Concepts Data model Data storage

Data storage

Tarantool operates data in the form of tuples.

tuple

A tuple is a group of data values in Tarantool’s memory. Think of it as a “database record” or a “row”. The data values in the tuple are called fields.

When Tarantool returns a tuple value in the console, by default, it uses YAML format, for example: [3, 'Ace of Base', 1993].

Internally, Tarantool stores tuples as MsgPack arrays.

field

Fields are distinct data values, contained in a tuple. They play the same role as “row columns” or “record fields” in relational databases, with a few improvements:

  • fields can be composite structures, such as arrays or maps,
  • fields don’t need to have names.

A given tuple may have any number of fields, and the fields may be of different types.

The field’s number is the identifier of the field. Numbers are counted from base 1 in Lua and other 1-based languages, or from base 0 in languages like PHP or C/C++. So, 1 or 0 can be used in some contexts to refer to the first field of a tuple.

Tarantool stores tuples in containers called spaces. In the example above, there’s a space called tester.

space

In Tarantool, a space is a primary container that stores data. It is analogous to tables in relational databases. Spaces contain tuples – the Tarantool name for database records. The number of tuples in a space is unlimited.

At least one space is required to store data with Tarantool. Each space has the following attributes:

  • a unique name specified by the user,
  • a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool,
  • an engine: memtx (default) — in-memory engine, fast but limited in size, or vinyl — on-disk engine for huge data sets.

To be functional, a space also needs to have a primary index. It can also have secondary indexes.

Tarantool is both a database manager and an application server. Therefore a developer often deals with two type sets: the types of the programming language (such as Lua) and the types of the Tarantool storage format (MsgPack).

Scalar / compound MsgPack type Lua type Example value
scalar nil cdata box.NULL
scalar boolean boolean true
scalar string string 'A B C'
scalar integer number 12345
scalar integer cdata 12345
scalar float64 (double) number 1.2345
scalar float64 (double) cdata 1.2345
scalar binary cdata [!!binary 3t7e]
scalar ext (for Tarantool decimal) cdata 1.2
scalar ext (for Tarantool datetime) cdata '2021-08-20T16:21:25.122999906 Europe/Berlin'
scalar ext (for Tarantool interval) cdata +1 months, 1 days
scalar ext (for Tarantool uuid) cdata 12a34b5c-de67-8f90-123g-h4567ab8901
compound map table (with string keys) {'a': 5, 'b': 6}
compound array table (with integer keys) [1, 2, 3, 4, 5]
compound array tuple (cdata) [12345, 'A B C']

Note

MsgPack values have variable lengths. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.

Note

The Lua nil type is encoded as MsgPack nil but decoded as msgpack.NULL.

nil. In Lua, the nil type has only one possible value, also called nil. Tarantool displays it as null when using the default YAML format. Nil may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other comparison operations will not work. Nil may not be used in Lua tables; the workaround is to use box.NULL because nil == box.NULL is true. Example: nil.

boolean. A boolean is either true or false. Example: true.

integer. The Tarantool integer type is for integers between -9223372036854775808 and 18446744073709551615, which is about 18 quintillion. This type corresponds to the number type in Lua and to the integer type in MsgPack. Example: -2^63.

unsigned. The Tarantool unsigned type is for integers between 0 and 18446744073709551615. So it is a subset of integer. Example: 123456.

double. The double field type exists mainly to be equivalent to Tarantool/SQL’s DOUBLE data type. In msgpuck.h (Tarantool’s interface to MsgPack), the storage type is MP_DOUBLE and the size of the encoded value is always 9 bytes. In Lua, fields of the double type can only contain non-integer numeric values and cdata values with double floating-point numbers. Examples: 1.234, -44, 1.447e+44.

To avoid using the wrong kind of values inadvertently, use ffi.cast() when searching or changing double fields. For example, instead of space_object:insert{value} use ffi = require('ffi') ... space_object:insert({ffi.cast('double',value)}). Example:

s = box.schema.space.create('s', {format = {{'d', 'double'}}})
s:create_index('ii')
s:insert({1.1})
ffi = require('ffi')
s:insert({ffi.cast('double', 1)})
s:insert({ffi.cast('double', tonumber('123'))})
s:select(1.1)
s:select({ffi.cast('double', 1)})

Arithmetic with cdata double will not work reliably, so for Lua, it is better to use the number type. This warning does not apply for Tarantool/SQL because Tarantool/SQL does implicit casting.

number. The Tarantool number field may have both integer and floating-point values, although in Lua a number is a double-precision floating-point.

Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

You can also use the ffi module to specify a C type to cast the number to. In this case, the number will be stored as cdata.

decimal. The Tarantool decimal type is stored as a MsgPack ext (Extension). Values with the decimal type are not floating-point values although they may contain decimal points. They are exact with up to 38 digits of precision. Example: a value returned by a function in the decimal module.

datetime. Introduced in v. 2.10.0. The Tarantool datetime type facilitates operations with date and time, accounting for leap years or the varying number of days in a month. It is stored as a MsgPack ext (Extension). Operations with this data type use code from c-dt, a third-party library.

For more information, see Module datetime.

interval. Introduced in v. 2.10.0. The Tarantool interval type represents periods of time. They can be added to or subtracted from datetime values or each other. Operations with this data type use code from c-dt, a third-party library. The type is stored as a MsgPack ext (Extension). For more information, see Module datetime.

string. A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. For example, numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so '2345' is less than '500'. Example: 'A, B, C'.

bin. A bin (binary) value is not directly supported by Lua but there is a Tarantool type varbinary which is encoded as MsgPack binary. For an (advanced) example showing how to insert varbinary into a database, see the Cookbook Recipe for ffi_varbinary_insert. Example: "\65 \66 \67".

uuid. The Tarantool uuid type is used for Universally Unique Identifiers. Since version 2.4.1 Tarantool stores uuid values as a MsgPack ext (Extension).

Example: 64d22e4d-ac92-4a23-899a-e5934af5479.

array. An array is represented in Lua with {...} (braces). Examples: lists of numbers representing points in geometric figures: {10, 11}, {3, 5, 9, 10}.

table. Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 are stored as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use box.NULL. Example: a box.space.tester:select() request will return a Lua table.

tuple. A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For tuple examples, see box.tuple.

scalar. Values in a scalar field can be boolean, integer, unsigned, double, number, decimal, string, uuid, or varbinary; but not array, map, or tuple. Examples: true, 1, 'xxx'.

any. Values in a field of this type can be boolean, integer, unsigned, double, number, decimal, string, uuid, varbinary, array, map, or tuple. Examples: true, 1, 'xxx', {box.NULL, 0}.

Examples of insert requests with different field types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...

To learn more about what values can be stored in indexed fields, read the Indexes section.

By default, when Tarantool compares strings, it uses the so-called binary collation. It only considers the numeric value of each byte in a string. For example, the encoding of 'A' (what used to be called the “ASCII value”) is 65, the encoding of 'B' is 66, and the encoding of 'a' is 98. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a'.

Binary collation is the best choice for fast deterministic simple maintenance and searching with Tarantool indexes.

But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' == 'A' < 'B' respectively.

The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights:

  • unicode collation observes L1, L2, and L3 weights (strength = ‘tertiary’);
  • unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example 'a' == 'A' == 'á' == 'Á'.

As an example, take some Russian words:

'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'

…and show the difference in ordering and selecting by index:

  • with unicode collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ель']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - []
    ...
    
  • with unicode_ci collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type ='str', collation='unicode_ci'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - - ['ёлка']
    ...
    

In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.

For English, Russian, and most other languages and use cases, use the “unicode” and “unicode_ci” collations. If you need Cyrillic letters ‘Е’ and ‘Ё’ to have the same level-1 weights, try the Kyrgyz collation.

The tailored optional collations: for other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. Run box.space._collation:select() to see the complete list.

The tailored collation names have the form unicode_[language code]_[strength], where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”. Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Language Data Repository.

For better control over stored data, Tarantool supports constraints – user-defined limitations on the values of certain fields or entire tuples. Together with data types, constraints allow limiting the ranges of available field values both syntactically and semantically.

For example, the field age typically has the number type, so it cannot store strings or boolean values. However, it can still have values that don’t make sense, such as negative numbers. This is where constraints come to help.

There are two types of constraints in Tarantool:

  • Field constraints check that the value being assigned to a field satisfies a given condition. For example, age must be non-negative.
  • Tuple constraints check complex conditions that can involve all fields of a tuple. For example, a tuple contains a date in three fields: year, month, and day. You can validate day values based on the month value (and even year if you consider leap years).

Field constraints work faster, while tuple constraints allow implementing a wider range of limitations.

Constraints use stored Lua functions, which must return true when the constraint is satisfied. Other return values (including nil) and exceptions make the check fail and prevent tuple insertion or modification.

To create a constraint function, use func.create with function body.

Constraint functions take two parameters:

  • The field value and the constraint name for field constraints.

    tarantool> box.schema.func.create('check_age',
             > {language = 'LUA', is_deterministic = true, body = 'function(f, c) return (f >= 0 and f < 150) end'})
    ---
    ...
    
  • The tuple and the constraint name for tuple constraints.

    tarantool> box.schema.func.create('check_person',
             > {language = 'LUA', is_deterministic = true, body = 'function(t, c) return (t.age >= 0 and #(t.name) > 3) end'})
    ---
    ...
    

Warning

Tarantool doesn’t check field names used in tuple constraint functions. If a field referenced in a tuple constraint gets renamed, this constraint will break and prevent further insertions and modifications in the space.

To create a constraint in a space, specify the corresponding function’s name in the constraint parameter:

  • Field constraints: when setting up the space format:

    tarantool> box.space.person:format({
             > {name = 'id',   type = 'number'},
             > {name = 'name', type = 'string'},
             > {name = 'age',  type = 'number', constraint = 'check_age'},
             > })
    
  • Tuple constraints: when creating or altering a space:

    tarantool> box.schema.space.create('person', { engine = 'memtx', constraint = 'check_tuple'})
    

In both cases, constraint can contain multiple function names passed as a tuple. Each constraint can have an optional name:

constraint = {'age_constraint' = 'check_age', 'name_constraint' = 'check_name'}

Note

When adding a constraint to an existing space with data, Tarantool checks it against the stored data. If there are fields or tuples that don’t satisfy the constraint, it won’t be applied to the space.

Foreign keys provide links between related spaces, therefore maintaining the referential integrity of the database.

Some fields can only contain values present in other spaces. For example, shop orders always belong to existing customers. Hence, all values of the customer field of the orders space must exist in the customers space. In this case, customers is a parent space for orders (its child space). When two spaces are linked with a foreign key, each time a tuple is inserted or modified in the child space, Tarantool checks that a corresponding value is present in the parent space.

../../../_images/foreign_key.svg

There are two types of foreign keys in Tarantool:

  • Field foreign keys check that the value being assigned to a field is present in a particular field of another space. For example, the customer value in a tuple from the orders space must match an id stored in the customers space.
  • Tuple foreign keys check that multiple fields of a tuple have a match in another space. For example, if the orders space has fields customer_id and customer_name, a tuple foreign key can check that the customers space contains a tuple with both these values in the corresponding fields.

Field foreign keys work faster while tuple foreign keys allow implementing more strict references.

Important

For each foreign key, there must exist an index that includes all its fields.

To create a foreign key in a space, specify the parent space and linked fields in the foreign_key parameter. Fields can be referenced by name or by number:

  • Field foreign keys: when setting up the space format.

    tarantool> box.space.orders:format({
             > {name = 'id',   type = 'number'},
             > {name = 'customer_id', foreign_key = {space = 'customers', field = 'id'}}, -- or field = 1
             > {name = 'price_total',  type = 'number'},
             > })
    
  • Tuple foreign keys: when creating or altering a space. Note that for foreign keys with multiple fields there must exist an index that includes all these fields.

tarantool> box.schema.space.create("orders", {foreign_key={space='customers', field={customer_id='id', customer_name='name'}}})
---
...
tarantool> box.space.orders:format({
         > {name = "id", type = "number"},
         > {name = "customer_id" },
         > {name = "customer_name"},
         > {name = "price_total",    type = "number"},
         > })

Note

Type can be omitted for foreign key fields because it’s defined in the parent space.

Foreign keys can have an optional name.

foreign_key = {customer = {space = '...', field = {...}}}

A space can have multiple tuple foreign keys. In this case, they all must have names.

foreign_key = {customer = {space = '...', field = {...} }, item = { space = '...', field = {...}}}

Tarantool performs integrity checks upon data modifications in parent spaces. If you try to remove a tuple referenced by a foreign key or an entire parent space, you will get an error.

Important

Renaming parent spaces or referenced fields may break the corresponding foreign keys and prevent further insertions or modifications in the child spaces.

Found what you were looking for?
Feedback