Top.Mail.Ru
Binary protocol | Tarantool
 
Reference / Internals / Binary protocol
Reference / Internals / Binary protocol

Binary protocol

Binary protocol

The binary protocol is called a “request/response” protocol because it is for sending requests to a Tarantool server and receiving responses. There is complete access to Tarantool functionality, including:

  • request multiplexing, for example ability to issue multiple requests asynchronously via the same connection
  • response format that supports zero-copy writes

The protocol can be called “binary” because the most-frequently-used database accesses are done with binary codes instead of Lua request text. Tarantool experts use it to write their own connectors, to understand network messages, to support new features that their favorite connector doesn’t support yet, or to avoid repetitive parsing by the server.

Index

Section Description
Symbols and terms Notation of binary protocol
Illustration Illustration of use
Header and body Header of a request
Requests:
IPROTO_SELECT
IPROTO_INSERT
IPROTO_REPLACE
IPROTO_UPDATE
IPROTO_DELETE
IPROTO_CALL_16
IPROTO_AUTH
IPROTO_EVAL
IPROTO_UPSERT
IPROTO_CALL
IPROTO_EXECUTE
IPROTO_NOP
IPROTO_PREPARE
IPROTO_PING
IPROTO_JOIN
IPROTO_SUBSCRIBE
IPROTO_VOTE_DEPRECATED
IPROTO_VOTE
IPROTO_FETCH_SNAPSHOT
IPROTO_REGISTER
Body of a request
Responses if no error and no SQL Responses for no SQL
Responses for errors Responses for errors
Responses for SQL Responses for SQL
Authentication Authentication after connection
Replication Replication request
XLOG/SNAP Format of .xlog and .snap files

Binary protocol – symbols and terms

For diagrams in this section, the box borders have special meanings:

0    X
+----+
|    | - X + 1 bytes
+----+
 TYPE - type of MessagePack value (if it is a MessagePack object)

+====+
|    | - Variable size MessagePack object
+====+
 TYPE - type of MessagePack value

+~~~~+
|    | - Variable size MessagePack Array/Map
+~~~~+
 TYPE - type of MessagePack value

And words that start with MP_ mean: a MessagePack type or a range of MessagePack types, including the signal and possibly including a value, with slight modification:

  • MP_NIL nil
  • MP_UINT unsigned integer
  • MP_INT either integer or unsigned integer
  • MP_STR string
  • MP_BIN binary string
  • MP_ARRAY array
  • MP_MAP map
  • MP_BOOL boolean
  • MP_FLOAT float
  • MP_DOUBLE double
  • MP_EXT extension (including the DECIMAL type)
  • MP_OBJECT any MessagePack object

Short descriptions are in MessagePack’s “spec” page.

And words that start with IPROTO_ mean: a Tarantool constant which is either defined or mentioned in the iproto_constants.h file. These constants are used as keys within MP_MAP maps.

Binary protocol – illustration

To follow the examples in this section, get a single Linux computer and start three command-line shells (“terminals”).

– On terminal #1, Start monitoring port 3302 with tcpdump:
sudo tcpdump -i lo 'port 3302' -X

On terminal #2, start a server with:
box.cfg{listen=3302}
box.schema.user.grant('guest','read,write,execute,create,drop','universe')

On terminal #3, start another server, which will act as a client, with:
box.cfg{}
net_box = require('net.box')
conn = net_box.connect('localhost:3302')
conn.space._space:select(280)

Now look at what tcpdump shows for the job connecting to 3302. – the “request”. After the words “length 32” is a packet that ends with with these 32 bytes: (we have added indented comments):

ce 00 00 00 1b   MP_UINT = decimal 27 = number of bytes after this
82               MP_MAP, size 2 (we'll call this "Main-Map")
01                 IPROTO_SYNC (Main-Map Item#1)
04                 MP_INT = 4 = number that gets incremented with each request
00                 IPROTO_REQUEST_TYPE (Main-Map Item#2)
01                 IPROTO_SELECT
86                 MP_MAP, size 6 (we'll call this "Select-Map")
10                   IPROTO_SPACE_ID (Select-Map Item#1)
cd 01 18             MP_UINT = decimal 280 = id of _space
11                   IPROTO_INDEX_ID (Select-Map Item#2)
00                   MP_INT = 0 = id of index within _space
14                   IPROTO_ITERATOR (Select-Map Item#3)
00                   MP_INT = 0 = Tarantool iterator_type.h constant ITER_EQ
13                   IPROTO_OFFSET (Select-Map Item#4)
00                   MP_INT = 0 = amount to offset
12                   IPROTO_LIMIT (Select-Map Item#5)
ce ff ff ff ff       MP_UINT = 4294967295 = biggest possible limit
20                   IPROTO_KEY (Select-Map Item#6)
91                   MP_ARRAY, size 1 (we'll call this "Key-Array")
cd 01 18               MP_UINT = 280 (Select-Map Item#6, Key-Array Item#1)
                       -- 280 is the key value that we are searching for

Now read the source code file net_box.c and skip to the line netbox_encode_select(lua_State *L). From the comments and from simple function calls like mpstream_encode_uint(&stream, IPROTO_SPACE_ID); you will be able to see how net_box put together the packet contents that you have just observed with tcpdump.

There are libraries for reading and writing MessagePack objects. C programmers sometimes include msgpuck.h.

Now you know how Tarantool itself makes requests with the binary protocol. When in doubt about a detail, consult net_box.c – it has routines for each request. Some connectors have similar code.

Binary protocol – header and body

Except during connection (which involves a greeting from the server and optional authentication that we will discuss later in this section), the protocol is pure request/response (the client requests and the server responds). It is legal to put more than one request in a packet.

Almost all requests and responses contain both a header and a body.

Normal Request/Response header and body:

0        5
+--------+ +============+ +===================================+
| HEADER | |            | |                                   |
| + BODY | |   HEADER   | |               BODY                |
|  SIZE  | |            | |                                   |
+--------+ +============+ +===================================+
  MP_INT       MP_MAP                     MP_MAP

HEADER + BODY SIZE is the size of the header plus the size of the body. It may be useful to compare it with the number of bytes remaining in the packet.

HEADER may contain, in any order:

HEADER:

+====================================+=====================+===============================+
|                                    |                     |                               |
|   0x00: IPROTO_REQUEST_TYPE        |   0x01: IPROTO_SYNC |   0x05: IPROTO_SCHEMA_VERSION |
|         or Response-Code-Indicator | MP_INT: MP_INT      | MP_INT: MP_INT                |
| MP_INT: MP_INT                     |                     |                               |
|                                    |                     |                               |
+====================================+=====================+===============================+
                          MP_MAP

IPROTO_SYNC = 0x01. An unsigned integer that should be incremented so that it is unique in every request. This integer is also returned from box.session.sync(). The IPROTO_SYNC value of a response should be the same as the IPROTO_SYNC value of a request.

IPROTO_SCHEMA_VERSION = 0x05. An unsigned number, sometimes called SCHEMA_ID, that goes up when there is a major change. In a request header IPROTO_SCHEMA_VERSION is optional, so the version will not be checked if it is absent. In a response header IPROTO_SCHEMA_VERSION is always present, and it is up to the client to check if it has changed.

IPROTO_REQUEST_TYPE or Response-Code-Indicator = 0x00. An unsigned number that indicates what will be in the BODY. In requests IPROTO_REQUEST_TYPE will be followed by IPROTO_SELECT etc. In responses Response-Code-Indicator will be followed by IPROTO_OK etc.

The BODY has the details of the request or response. In a request, it can also be absent or be an empty map. Both these states will be interpreted equally. Responses will contain the BODY anyway even if it is a IPROTO_PING request.

Have a look at file xrow.c function xrow_header_encode, to see how Tarantool encodes the header. Have a look at file net_box.c, function netbox_decode_data, to see how Tarantool decodes the header. For example, in a successful response to box.space:select(), the Response-Code-Indicator value will be 0 = IPROTO_OK and the array will have all the tuples of the result.

Binary protocol – requests

After the HEADER, for a request, there will be a body that begins with these request-type IPROTO codes.

IPROTO_SELECT = 0x01.

See space_object:select(). The body is a 6-item map:

+=========================+=========================+=========================+
|                         |                         |                         |
|   0x10: IPROTO_SPACE_ID |   0x11: IPROTO_INDEX_ID |   0x12: IPROTO_LIMIT    |
| MP_INT: MP_INT          | MP_INT: MP_INT          | MP_INT: MP_INT          |
|                         |                         |                         |
+=========================+=========================+=========================+
|                         |                         |                         |
|   0x13: IPROTO_OFFSET   |   0x14: IPROTO_ITERATOR |   0x20: IPROTO_KEY      |
| MP_INT: MP_INT          | MP_INT: MP_INT          | MP_INT: MP_ARRAY        |
|                         |                         |                         |
+=========================+=========================+=========================+
             MP_MAP

IPROTO_SPACE_ID (0x10) + MP_INT, IPROTO_INDEX_ID (0x11) + MP_INT, IPROTO_ITERATOR (0x14) + MP_INT, IPROTO_OFFSET (0x13) + MP_INT, IPROTO_LIMIT (0x12) + MP_INT, IPROTO_KEY (0x20) + MP_ARRAY (array of key values). See the illustration of IPROTO_SELECT in the earlier section, Binary protocol – illustration.

IPROTO_INSERT == 0x02.

See space_object:insert(). The body is a 2-item map:

+=========================+======================+
|                         |                      |
|   0x10: IPROTO_SPACE_ID |   0x21: IPROTO_TUPLE |
| MP_INT: MP_INT          | MP_INT: MP_ARRAY     |
|                         |                      |
+=========================+======================+
                 MP_MAP

IPROTO_SPACE_ID (0x10) + MP_INT, IPROTO_TUPLE + MP_ARRAY (array of field values).

IPROTO_REPLACE = 0x03, See space_object:replace(). The body is a 2-item map, the same as for IPROTO_INSERT:

+=========================+======================+
|                         |                      |
|   0x10: IPROTO_SPACE_ID |   0x21: IPROTO_TUPLE |
| MP_INT: MP_INT          | MP_INT: MP_ARRAY     |
|                         |                      |
+=========================+======================+
               MP_MAP

IPROTO_SPACE_ID (0x10) + MP_INT, IPROTO_TUPLE (0x21) + MP_ARRAY (array of field values).

IPROTO_UPDATE = 0x04.

See space_object:update(). The body is usually a 4-item map,

+=========================+===============================+
|                         |                               |
|   0x10: IPROTO_SPACE_ID |   0x11: IPROTO_INDEX_ID       |
| MP_INT: MP_INT          | MP_INT: MP_INT                |
|                         |                               |
+=========================+===============================+
|                         |                 +~~~~~~~~~~~+ |
|                         |                 | usually   | |
|                         |                 | OPERATOR, | |
|                         | (IPROTO_TUPLE)  | FIELD_NO, | |
|   0x20: IPROTO_KEY      |    0x21:        | VALUE     | |
| MP_INT: MP_ARRAY        |  MP_INT:        +~~~~~~~~~~~+ |
|                         |                   MP_ARRAY    |
+=========================+===============================+
                MP_MAP

IPROTO_SPACE_ID (0x10) + MP_INT, IPROTO_INDEX_ID (0x11) + MP_INT with index number starting with 0, IPROTO_KEY (0x20) + MP_ARRAY (array of index keys), IPROTO_TUPLE (0x21) + MP_ARRAY (array of update operations).
If the operation specifies no values, it is a 2-item array: OPERATOR MP_STR = "#", FIELD_NO MP_INT = field number starting with 1.
If the operation specifies one value, it is a 3-item array:

0           2
+-------------+==========+===========+
|             |          |           |
| OPERATOR    | FIELD_NO | VALUE     |
| MP_STR      | MP_INT   | MP_OBJECT |
|             |          |           |
+-------------+==========+===========+
          MP_ARRAY

OPERATOR MP_STR = "+" or "-" or "&" or "^" or "|" or "!" or "="), FIELD_NO MP_INT = field number starting with 1, VALUE MP_OBJECT, that is, any type, MP_INT, MP_STR, etc..
Otherwise the operation is a 5-item array:

0           2
+-----------+==========+==========+========+==========+
|           |          |          |        |          |
| ':'       | FIELD_NO | POSITION | OFFSET | VALUE    |
| MP_STR    | MP_INT   | MP_INT   | MP_INT | MP_STR   |
|           |          |          |        |          |
+-----------+==========+==========+========+==========+
                      MP_ARRAY

OPERATOR MP_STR = ":", FIELD_NO MP_INT = field number starting with 1, POSITION MP_INT, OFFSET MP_INT, VALUE MP_STR.

For example, suppose a user changes field #2 in tuple #2 in space #256 to ‘BBBB’. The body will look like this: (notice that in this case there is an extra map item IPROTO_INDEX_BASE, to emphasize that field numbers start with 1, which is optional and can be omitted):

04               IPROTO_UPDATE
85               IPROTO_MAP, size 5
10                 IPROTO_SPACE_ID, Map Item#1
cd 02 00           MP_UINT 256
11                 IPROTO_INDEX_ID, Map Item#2
00                 MP_INT 0 = primary-key index number
15                 IPROTO_INDEX_BASE, Map Item#3
01                 MP_INT = 1 i.e. field numbers start at 1
21                 IPROTO_TUPLE, Map Item#4
91                 MP_ARRAY, size 1, for array of operations
93                   MP_ARRAY, size 3
a1 3d                   MP_STR = OPERATOR = '='
02                      MP_INT = FIELD_NO = 2
a5 42 42 42 42 42       MP_STR = VALUE = 'BBBB'
20                 IPROTO_KEY, Map Item#5
91                 MP_ARRAY, size 1, for array of key values
02                   MP_UINT = primary-key value = 2

IPROTO_DELETE = 0x05.

See space_object:delete(). The body is a 3-item map:

+=========================+=========================+====================+
|                         |                         |                    |
|   0x10: IPROTO_SPACE_ID |   0x11: IPROTO_INDEX_ID |   0x20: IPROTO_KEY |
| MP_INT: MP_INT          | MP_INT: MP_INT          | MP_INT: MP_ARRAY   |
|                         |                         |                    |
+=========================+=========================+====================+
                          MP_MAP

IPROTO_SPACE_ID (0x10) + MP_INT, IPROTO_INDEX_ID (0x11) + MP_INT, IPROTO_KEY (0x20) + MP_ARRAY (array of key values).

IPROTO_CALL_16 = 0x06.

See conn:call(). The suffix _16 is a hint that this is for the call() until Tarantool 1.6. It is deprecated. Use IPROTO_CALL instead. The body is a 2-item map:

+==============================+=======================+
|                              |                       |
|   0x22: IPROTO_FUNCTION_NAME |   0x21: IPROTO_TUPLE  |
| MP_INT: MP_STRING            | MP_INT: MP_ARRAY      |
|                              |                       |
+==============================+=======================+
                    MP_MAP

IPROTO_FUNCTION_NAME (0x22) + function name (MP_STRING), IPROTO_TUPLE (0x22) + array of arguments (MP_ARRAY). The return value is an array of tuples.

IPROTO_AUTH = 0x07.

See authentication. See the later section Binary protocol – authentication.

IPROTO_EVAL = 0x08.

See conn:eval(). Since the argument is a Lua expression, this is Tarantool’s way to handle non-binary with the binary protocol. Any request that does not have its own code, for example box.space.space-name:drop(), will be handled either with IPROTO_CALL or IPROTO_EVAL. Some client-like utilities, such as tarantoolctl, make extensive use of eval. The body is a 2-item map:

+=======================+======================+
|                       |                      |
|   0x27: IPROTO_EXPR   |   0x21: IPROTO_TUPLE |
| MP_INT: MP_STRING     | MP_INT: MP_ARRAY     |
|                       |                      |
+=======================+======================+
                MP_MAP

IPROTO_EXPR (0x27) + expression (MP_STRING), IPROTO_TUPLE (0x21) + array of arguments to match placeholders.

IPROTO_UPSERT = 0x09.

See space_object:upsert(). The body is the same as the body of IPROTO_UPDATE.

IPROTO_CALL = 0x0a.

See conn:call(). The body is a 2-item map:

+==============================+======================+
|                              |                      |
|   0x22: IPROTO_FUNCTION_NAME |   0x21: IPROTO_TUPLE |
| MP_INT: MP_STRING            | MP_INT: MP_ARRAY     |
|                              |                      |
+==============================+======================+
                        MP_MAP

IPROTO_FUNCTION_NAME (0x22) + function name (MP_STRING), IPROTO_TUPLE (0x22) + array of arguments (MP_ARRAY). The response will be a list of values, similar to the IPROTO_EVAL response.

IPROTO_EXECUTE = 0x0b.

See box.execute(), this is only for SQL. The body is a 3-item map:

+=========================+=========================+========================+
|                         |                         |                        |
|   0x43: IPROTO_STMT_ID  |   0x11: IPROTO_SQL_BIND |   0x20: IPROTO_OPTIONS |
| MP_INT: MP_INT          | MP_INT: MP_INT          | MP_INT: MP_ARRAY       |
|   or                    |                         |                        |
|   0x40: IPROTO_SQL_TEXT |                         |                        |
| MP_INT: MP_STR          |                         |                        |
|                         |                         |                        |
+=========================+=========================+========================+
                          MP_MAP

IPROTO_STMT_ID (0x43) + statement-id (MP_INT) if executing a prepared statement or IPROTO_SQL_TEXT (0x40) + statement-text (MP_STR) if executing an SQL string, IPROTO_SQL_BIND (0x41) + array of parameter values to match ? placeholders or :name placeholders, IPROTO_OPTIONS (0x2b) + array of options (usually empty).

For example, suppose we prepare a statement with two ? placeholders, and execute with two parameters, thus:
n = conn:prepare([[VALUES (?, ?);]])
conn:execute(n.stmt_id, {1,'a'})
Then the body will look like this:

0b               IPROTO_EXECUTE
83               MP_MAP, size 3
43                 IPROTO_STMT_ID Map Item#1
ce d7 aa 74 1b     MP_UINT value of n.stmt_id
41                 IPROTO_SQL_BIND Map Item#2
92                 MP_ARRAY, size 2
01                   MP_INT = 1 = value for first parameter
a1 61                MP_STR = 'a' = value for second parameter
2b                 IPROTO_OPTIONS Map Item#3
90                 MP_ARRAY, size 0 (there are no options)

To call a prepared statement with named parameters from a connector pass the parameters within an array of maps. A client should wrap each element into a map, where the key holds a name of the parameter (with a colon) and the value holds an actual value. So, to bind foo and bar to 42 and 43, a client should send IPROTO_SQL_TEXT: <...>, IPROTO_SQL_BIND: [{"foo": 42}, {"bar": 43}].

If a statement has both named and non-named parameters, wrap only named ones into a map. The rest of parameters are positional and substituted in order.

IPROTO_NOP = 0x0c.

There is no Lua request exactly equivalent to IPROTO_NOP. It causes the LSN to be incremented. It could be sometimes used for updates where the old and new values are the same, but the LSN must be increased because a data-change must be recorded. The body is: nothing.

IPROTO_PREPARE = 0x0d.

See box.prepare, this is only for SQL. The body is a 1-item map:

+=========================+
|                         |
|   0x10: IPROTO_STMT_ID  |
| MP_INT: MP_INT          |
|   or                    |
|   0x10: IPROTO_SQL_TEXT |
| MP_INT: MP_STR          |
|                         |
+=========================+
     MP_MAP

IPROTO_STMT_ID (0x43) + statement-id (MP_INT) if executing a prepared statement or IPROTO_SQL_TEXT (0x40) + statement-text (string) if executing an SQL string. Thus the IPROTO_PREPARE map item is the same as the first item of the IPROTO_EXECUTE map.

IPROTO_PING = 0x40.

See conn:ping(). The BODY will be an empty map because IPROTO_PING in the HEADER contains all the information that the server instance needs.

IPROTO_JOIN = 0x41, for replication
IPROTO_SUBSCRIBE = 0x42, for replication SUBSCRIBE
IPROTO_VOTE_DEPRECATED = 0x43, for old style vote, superseded by IPROTO_VOTE
IPROTO_VOTE = 0x44, for master election
IPROTO_FETCH_SNAPSHOT = 0x45, for starting anonymous replication
IPROTO_REGISTER =0x46, for leaving anonymous replication.

Tarantool constants 0x41 to 0x46 (decimal 65 to 70) are for replication. Connectors and clients do not need to send replication packets. See Binary protocol – replication.

The next two IProto messages are used in replication connections between Tarantool nodes in synchronous replication. The messages are not supposed to be used by any client applications in their regular connections.

IPROTO_CONFIRM = 0x28

This message confirms that the transactions originated from the instance with id = IPROTO_REPLICA_ID have achieved quorum and can be committed, up to LSN = IPROTO_LSN and including it.

The body is a 2-item map:

+===========================+====================+
|                           |                    |
|   0x02: IPROTO_REPLICA_ID |   0x03: IPROTO_LSN |
| MP_UINT: MP_UINT          | MP_UINT: MP_UINT   |
|                           |                    |
+===========================+====================+
                    MP_MAP

IPROTO_ROLLBACK = 0x29

This message says that the transactions originated from the instance with id = IPROTO_REPLICA_ID couldn’t achieve quorum for some reason and should be rolled back, down to LSN = IPROTO_LSN and including it.

The body is a 2-item map:

+===========================+====================+
|                           |                    |
|   0x02: IPROTO_REPLICA_ID |   0x03: IPROTO_LSN |
| MP_UINT: MP_UINT          | MP_UINT: MP_UINT   |
|                           |                    |
+===========================+====================+
                    MP_MAP

Binary protocol – responses if no error and no SQL

After the HEADER, for a response, there will be a body. It will contain IPROTO_OK (0x00) (there was no error), or an error code other than IPROTO_OK (there was an error). Responses to SQL statements are slightly different and will be described in the later section, Binary protocol – responses for SQL.

For IPROTO_OK, the header Response-Code-Indicator will be 0 and the body will be:

++=====================+
||                     |
||   0x30: IPROTO_DATA |
|| MP_INT: MP_OBJECT   |
||                     |
++=====================+
    MP_MAP

For IPROTO_PING the body will be an empty map. For most data-access requests (IPROTO_SELECT IPROTO_INSERT IPROTO_DELETE etc.) it will be an array of tuples that contain an array of fields. For IPROTO_EVAL and IPROTO_CALL it will usually be an array but, since Lua requests can result in a wide variety of structures, bodies can have a wide variety of structures.

For example, after box.space.space-name:insert{6} a successful response will look like this:

ce 00 00 00 20                MP_UINT = HEADER + BODY SIZE
83                            MP_MAP, size 3
00                              Response-Code-Indicator
ce 00 00 00 00                  MP_UINT = IPROTO_OK
01                              IPROTO_SYNC
cf 00 00 00 00 00 00 00 53      MP_UINT = sync value
05                              IPROTO_SCHEMA_VERSION
ce 00 00 00 68                  MP_UINT = schema version
81                            MP_MAP, size 1
30                              IPROTO_DATA
dd 00 00 00 01                  MP_ARRAY, size 1 (row count)
91                              MP_ARRAY, size 1 (field count)
06                              MP_INT = 6 = the value that was inserted

IPROTO_DATA is what we get with net_box and Module buffer so if we were using net_box we could decode with msgpack.decode_unchecked(), or we could convert to a string with ffi.string(pointer,length). The pickle.unpack() function might also be helpful.

Binary protocol – responses for errors

For a response other than IPROTO_OK, the header Response-Code-Indicator will be 0x8XXX and the body will be:

++=========================+
||                         |
||   0x31: IPROTO_ERROR_24 |
|| MP_INT: MP_STRING       |
||                         |
++=========================+
      MP_MAP

where 0x8XXX is the indicator for an error and XXX is a value in src/box/errcode.h. src/box/errcode.h also has some convenience macros which define hexadecimal constants for return codes.

For example, in version 2.4.0 or earlier, if we try to create a duplicate space with
conn:eval([[box.schema.space.create('_space');]])
the server response will look like this:

ce 00 00 00 3b                  MP_UINT = HEADER + BODY SIZE
83                              MP_MAP, size 3 (i.e. 3 items in header)
  00                              Response-Code-Indicator
  ce 00 00 80 0a                  MP_UINT = hexadecimal 800a
  01                              IPROTO_SYNC
  cf 00 00 00 00 00 00 00 26      MP_UINT = sync value
  05                              IPROTO_SCHEMA_VERSION
  ce 00 00 00 78                  MP_UINT = schema version value
  81                              MP_MAP, size 1
    31                              IPROTO_ERROR_24
    db 00 00 00 1d 53 70 61 63 etc. MP_STR = "Space '_space' already exists"

Looking in errcode.h we find that error code 0x0a (decimal 10) is ER_SPACE_EXISTS, and the string associated with ER_SPACE_EXISTS is “Space ‘%s’ already exists”.

Beginning in version 2.4.1, responses for errors have extra information following what was described above. This extra information is given via MP_ERROR extension type. See details in MessagePack extensions section.

Binary protocol – responses for SQL

After the HEADER, for a response to an SQL statement, there will be a body that is slightly different from the body for Binary protocol – responses if no error and no SQL.

If the SQL request is not SELECT or VALUES or PRAGMA, then the response body contains only IPROTO_SQL_INFO (0x42). Usually IPROTO_SQL_INFO is a map with only one item – SQL_INFO_ROW_COUNT (0x00) – which is the number of changed rows.

+=========================================================+
|                                                         |
|   0x42: IPROTO_SQL_INFO                                 |
| MP_MAP: usually 1 item  +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ |
|                         |                             | |
|                         |    0x00: SQL_INFO_ROW_COUNT | |
|                         | MP_UINT: changed row count  | |
|                         |                             | |
|                         +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ |
|                                                         |
+=========================================================+

For example, if the request is INSERT INTO table-name VALUES (1), (2), (3), then the response body contains an IPROTO_SQL_INFO map with SQL_INFO_ROW_COUNT = 3. SQL_INFO_ROW_COUNT can be 0 for statements that do not change rows, but can be 1 for statements that create new objects.

The IPROTO_SQL_INFO map may contain a second item – SQL_INFO_AUTO_INCREMENT_IDS (0x01) – which is the new primary-key value (or values) for an INSERT in a table defined with PRIMARY KEY AUTOINCREMENT. In this case the MP_MAP will have two keys, and one of the two keys will be 0x01: SQL_INFO_AUTO_INCREMENT_IDS, which is an array of unsigned integers.

For example, if we use the same net.box connection that we used for Binary protocol – illustration and we say
conn:execute([[CREATE TABLE t1 (dd INT PRIMARY KEY AUTOINCREMENT, дд STRING COLLATE "unicode");]])
conn:execute([[INSERT INTO t1 VALUES (NULL, 'a'), (NULL, 'b');]])
and we watch what tcpdump displays, we will see two noticeable things: (1) the CREATE statement caused a schema change so the response has a new IPROTO_SCHEMA_VERSION value and the body includes the new contents of some system tables (caused by requests from net.box which users will not see); (2) the final bytes of the response to the INSERT will be:

81   MP_MAP, size 1
42     IPROTO_SQL_INFO
82     MP_MAP, size 2
00       Tarantool constant (not in iproto_constants.h) = SQL_INFO_ROW_COUNT
02       1 = row count
01       Tarantool constant (not in iproto_constants.h) = SQL_INFO_AUTOINCREMENT_ID
92       MP_ARRAY, size 2
01         first autoincrement number
02         second autoincrement number

If the SQL statement is SELECT or VALUES or PRAGMA, the response contains:

  • IPROTO_METADATA + array of column maps, with each column map containing at least IPROTO_FIELD_NAME (0x00) + MP_STR, and IPROTO_FIELD_TYPE (0x01) + MP_STR. Additionally, if sql_full_metadata in the _session_settings system space is TRUE, then the array will have these additional column maps which correspond to components described in the box.execute() section: IPROTO_FIELD_COLL (0x02) + MP_STR, IPROTO_FIELD_IS_NULLABLE (0x03) + MP_BOOL, IPROTO_FIELD_IS_AUTOINCREMENT (0x04) + MP_BOOL, IPROTO_FIELD_SPAN (0x05) + MP_STR or MP_NIL.
  • IPROTO_DATA + array of tuples = the result set “rows”
EXECUTE SELECT RESPONSE BODY:
                              MAP
+=============================================+===========================+
|                                             |                           |
|     0x32: IPROTO_METADATA                   |                           |
| MP_ARRAY: array of maps:                    |                           |
|           +~~~~~~~~~~~~~~~~~~~~~~-------~~+ |                           |
|           | +~~~~~~~~~~~~~-------~~~~~~~+ | |     0x30: IPROTO_DATA     |
|           | |   0x00: IPROTO_FIELD_NAME | | | MP_ARRAY: array of tuples |
|           | | MP_STR: field name        | | |                           |
|           | |   0x01: IPROTO_FIELD_TYPE | | |                           |
|           | | MP_STR: field type        | | |                           |
|           | | + more if full metadata   | | |                           |
|           | +~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | |                           |
|           |        MP_MAP                 | |                           |
|           +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ |                           |
|                   MP_ARRAY                  |                           |
|                                             |                           |
+=============================================+===========================+

For example, if we use the same net_box connection that we used for Binary protocol – illustration and we ask for full metadata by saying
conn.space._session_settings:update('sql_full_metadata', {{'=', 'value', true}})
and we select the two rows from the table that we just created
conn:execute([[SELECT dd, дд AS д FROM t1;]])
then tcpdump will show this response, after the header:

82                       MP_MAP, size 2 (i.e. metadata and rows)
32                         IPROTO_METADATA
92                         MP_ARRAY, size 2 (i.e. 2 columns)
85                           MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44                    IPROTO_FIELD_NAME + 'DD'
01 a7 69 6e 74 65 67 65 72     IPROTO_FIELD_TYPE + 'integer'
03 c2                          IPROTO_FIELD_IS_NULLABLE + false
04 c3                          IPROTO_FIELD_IS_AUTOINCREMENT + true
05 c0                          PROTO_FIELD_SPAN + nil
85                           MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94                    IPROTO_FIELD_NAME + 'Д' upper case
01 a6 73 74 72 69 6e 67        IPROTO_FIELD_TYPE + 'string'
02 a7 75 6e 69 63 6f 64 65     IPROTO_FIELD_COLL + 'unicode'
03 c3                          IPROTO_FIELD_IS_NULLABLE + true
05 a4 d0 b4 d0 b4              IPROTO_FIELD_SPAN + 'дд' lower case
30                         IPROTO_DATA
92                         MP_ARRAY, size 2
92                           MP_ARRAY, size 2
01                             MP_INT = 1 i.e. contents of row#1 column#1
a1 61                          MP_STR = 'a' i.e. contents of row#1 column#2
92                           MP_ARRAY, size 2
02                             MP_INT = 2 i.e. contents of row#2 column#1
a1 62                          MP_STR = 'b' i.e. contents of row#2 column#2

If instead we said
conn:prepare([[SELECT dd, дд AS д FROM t1;]])
then tcpdump would should show almost the same response, but there would be no IPROTO_DATA and there would be two additional items:
34 00 = IPROTO_BIND_COUNT + MP_UINT = 0 (there are no parameters to bind),
33 90 = IPROTO_BIND_METADATA + MP_ARRAY, size 0 (there are no parameters to bind).

84                       MP_MAP, size 4
43                         IPROTO_STMT_ID
ce c2 3c 2c 1e             MP_UINT = statement id
34                         IPROTO_BIND_COUNT
00                         MP_INT = 0 = number of parameters to bind
33                         IPROTO_BIND_METADATA
90                         MP_ARRAY, size 0 = there are no parameters to bind
32                         IPROTO_METADATA
92                         MP_ARRAY, size 2 (i.e. 2 columns)
85                           MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44                    IPROTO_FIELD_NAME + 'DD'
01 a7 69 6e 74 65 67 65 72     IPROTO_FIELD_TYPE + 'integer'
03 c2                          IPROTO_FIELD_IS_NULLABLE + false
04 c3                          IPROTO_FIELD_IS_AUTOINCREMENT + true
05 c0                          PROTO_FIELD_SPAN + nil
85                           MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94                    IPROTO_FIELD_NAME + 'Д' upper case
01 a6 73 74 72 69 6e 67        IPROTO_FIELD_TYPE + 'string'
02 a7 75 6e 69 63 6f 64 65     IPROTO_FIELD_COLL + 'unicode'
03 c3                          IPROTO_FIELD_IS_NULLABLE + true
05 a4 d0 b4 d0 b4              IPROTO_FIELD_SPAN + 'дд' lower case

Now read the source code file net_box.c where the function “decode_metadata_optional” is an example of how Tarantool itself decodes extra items.

Binary protocol – authentication

When a client connects to the server instance, the instance responds with a 128-byte text greeting message, like this:

Greeting packet sent by server after connect:

0                                     63
+--------------------------------------+
|                                      |
| Tarantool Greeting (server version)  |
|               64 bytes               |
+---------------------+----------------+
|                     |                |
| BASE64 encoded SALT |      NULL      |
|      44 bytes       |                |
+---------------------+----------------+
64                  107              127

The greeting contains two 64-byte lines of ASCII text. Each line ends with a newline character (\n). The first line contains the instance version and protocol type. The second line contains up to 44 bytes of base64-encoded random string, to use in the authentication packet, and ends with up to 23 spaces.

Part of the greeting is a base-64-encoded session salt - a random string which can be used for authentication. The maximum length of an encoded salt (44 bytes) is more than the amount necessary to create the authentication message. An excess is reserved for future authentication schemas.

Authentication is optional – if it is skipped, then the session user is 'guest' (the 'guest' user does not need a password).

If authentication is not skipped, then at any time an authentication packet can be prepared using the greeting, the user’s name and password, and sha-1 functions, as follows.

PREPARE SCRAMBLE:

    size_of_encoded_salt_in_greeting = 44;
    size_of_salt_after_base64_decode = 32;
    /* sha1() will only use the first 20 bytes */
    size_of_any_sha1_digest = 20;
    size_of_scramble = 20;

prepare 'chap-sha1' scramble:

    salt = base64_decode(encoded_salt);
    step_1 = sha1(password);
    step_2 = sha1(step_1);
    step_3 = sha1(first_20_bytes_of_salt, step_2);
    scramble = xor(step_1, step_3);
    return scramble;

AUTHORIZATION BODY: CODE = IPROTO_AUTH (0x07)

+==========================+=====================================+
|                          |        +-------------+------------+ |
|  (KEY)                   | (TUPLE)| size == 9   | size == 20 | |
|   0x23: IPROTO_USER_NAME |   0x21:| "chap-sha1" |  SCRAMBLE  | |
| MP_INT: MP_STRING        | MP_INT:|  MP_STRING  |  MP_STRING | |
|                          |        +-------------+------------+ |
|                          |                   MP_ARRAY          |
+==========================+=====================================+
                        MP_MAP

<key> holds the user name. <tuple> must be an array of 2 fields: authentication mechanism (“chap-sha1” is the only supported mechanism right now) and scramble, encrypted according to the specified mechanism.

The server instance responds to an authentication packet with a standard response with 0 tuples.

To see how Tarantool handles this, look at net_box.c function netbox_encode_auth.

Binary protocol – replication

-- replication keys
<server_id>    ::= 0x02
<lsn>          ::= 0x03
<timestamp>    ::= 0x04
<server_uuid>  ::= 0x24
<cluster_uuid> ::= 0x25
<vclock>       ::= 0x26
-- replication codes
<join>         ::= 0x41
<subscribe>    ::= 0x42
JOIN:

In the beginning you must send an initial IPROTO_JOIN request (0x41)
               HEADER                      BODY
+================+=======================++========================+
|                |                       ||   IPROTO_INSTANCE_UUID |
|   0x00: 0x41   |   0x01: IPROTO_SYNC   ||   0x24: UUID           |
| MP_INT: MP_INT | MP_INT: MP_INT        || MP_INT: MP_STRING      |
|                |                       ||                        |
+================+=======================++========================+
               MP_MAP                     MP_MAP

Then the instance which you want to connect to will send its last SNAP file,
by simply creating a number of INSERTs (with additional LSN and ServerID)
(do not reply to this). Then that instance will send a vclock's MP_MAP and
close a socket.

+================+=======================++============================+
|                |                       ||        +~~~~~~~~~~~~~~~~~+ |
|                |                       ||        |                 | |
|   0x00: 0x00   |   0x01: IPROTO_SYNC   ||   0x26:| SRV_ID: SRV_LSN | |
| MP_INT: MP_INT | MP_INT: MP_INT        || MP_INT:| MP_INT: MP_INT  | |
|                |                       ||        +~~~~~~~~~~~~~~~~~+ |
|                |                       ||               MP_MAP       |
+================+=======================++============================+
               MP_MAP                      MP_MAP

SUBSCRIBE:

Then you must send an IPROTO_SUBSCRIBE request (0x42)

                              HEADER
+=========================+========================+
|                         |                        |
|     0x00: 0x42          |    0x01: IPROTO_SYNC   |
|   MP_INT: MP_INT        |  MP_INT: MP_INT        |
|                         |                        |
+=========================+========================+
|    IPROTO_INSTANCE_UUID |    IPROTO_CLUSTER_UUID |
|   0x24: UUID            |   0x25: UUID           |
| MP_INT: MP_STRING       | MP_INT: MP_STRING      |
|                         |                        |
+=========================+========================+
                 MP_MAP

      BODY
+=======================+
|                       |
|   0x26: IPROTO_VCLOCK |
| MP_INT: MP_INT        |
|                       |
+=======================+
      MP_MAP

Then you must process every request that could come through other masters.
Every request between masters will have Additional LSN and SERVER_ID.

Frequently a master sends a heartbeat message to a replica. For example, if there is a replica with id = 2, and a timestamp with a moment in 2020, a master might send this:

83                      MP_MAP, size 3
00                        Main-Map Item #1 IPROTO_REQUEST_TYPE
00                          MP_UINT = 0
02                        Main-Map Item #2 IPROTO_REPLICA_ID
02                          MP_UINT = 2 = id
04                        Main-Map Item #3 IPROTO_TIMESTAMP
cb                          MP_DOUBLE (MessagePack "Float 64")
41 d7 ba 06 7b 3a 03 21     8-byte timestamp

and the replica might send back this:

81                       MP_MAP, size 1
00                         Main-Map Item #1 Response-code-indicator
00                         MP_UINT = 0 = IPROTO_OK
81                         Main-Map Item #2, MP_MAP, size 1
26                           Sub-Map Item #1 IPROTO_VCLOCK
81                           Sub-Map Item #2, MP_MAP, size 1
01                             MP_UINT = 1 = id (part 1 of vclock)
06                             MP_UINT = 6 = lsn (part 2 of vclock)

XLOG / SNAP

.xlog and .snap files have nearly the same format. The header looks like:

<type>\n                  SNAP\n or XLOG\n
<version>\n               currently 0.13\n
Server: <server_uuid>\n   where UUID is a 36-byte string
VClock: <vclock_map>\n    e.g. {1: 0}\n
\n

After the file header come the data tuples. Tuples begin with a row marker 0xd5ba0bab and the last tuple may be followed by an EOF marker 0xd510aded. Thus, between the file header and the EOF marker, there may be data tuples that have this form:

0            3 4                                         17
+-------------+========+============+===========+=========+
|             |        |            |           |         |
| 0xd5ba0bab  | LENGTH | CRC32 PREV | CRC32 CUR | PADDING |
|             |        |            |           |         |
+-------------+========+============+===========+=========+
  MP_FIXEXT2    MP_INT     MP_INT       MP_INT      ---

+============+ +===================================+
|            | |                                   |
|   HEADER   | |                BODY               |
|            | |                                   |
+============+ +===================================+
    MP_MAP                     MP_MAP

See the example in the File formats section.