
Module msgpack

Definitions:

  • MsgPack is short for MessagePack.
  • A “raw MsgPack string” is a byte array formatted according to the MsgPack specification including type bytes and sizes. The type bytes and sizes can be made displayable with string.hex(), or the raw MsgPack strings can be converted to Lua objects by using the msgpack module methods.

The msgpack module decodes raw MsgPack strings by converting them to Lua objects, and encodes Lua objects by converting them to raw MsgPack strings. Tarantool makes heavy internal use of MsgPack because tuples in Tarantool are stored as MsgPack arrays.

Starting from version 2.10.0, the msgpack module can also create a special userdata Lua object, the MsgPack object. A MsgPack object stores arbitrary MsgPack data and can be created from any Lua object (including another MsgPack object) or from a raw MsgPack string. It has its own set of methods and iterators.

Below is a list of all msgpack functions and members.

Name Use
msgpack.encode(lua_value) Convert a Lua object to a raw MsgPack string
msgpack.encode(lua_value,ibuf) Convert a Lua object to a raw MsgPack string in an ibuf
msgpack.decode(msgpack_string) Convert a raw MsgPack string to a Lua object
msgpack.decode(C_style_string_pointer, size) Convert a raw MsgPack string in an ibuf to a Lua object
msgpack.decode_unchecked(msgpack_string) Convert a raw MsgPack string to a Lua object
msgpack.decode_unchecked(C_style_string_pointer) Convert a raw MsgPack string to a Lua object
msgpack.decode_array_header(byte-array, size) Skip an array header in a raw MsgPack string
msgpack.decode_map_header(byte-array, size) Skip a map header in a raw MsgPack string
__serialize parameter Output structure specification
msgpack.cfg() Change configuration
msgpack.NULL Analog of Lua’s nil
msgpack.object(lua_value) Create a MsgPack object from a Lua object
msgpack.object_from_raw(msgpack_string) Create a MsgPack object from a raw MsgPack string
msgpack.object_from_raw(C_style_string_pointer, size) Create a MsgPack object from a raw MsgPack string
msgpack.is_object(some_argument) Check if an argument is a MsgPack object
msgpack_object:decode() Decode MsgPack data in a MsgPack object and return a Lua object
msgpack_object:iterator() Get an iterator over the MsgPack data
iterator_object:decode_array_header() Decode a MsgPack array header under the iterator cursor, return the number of elements in the array, and advance the cursor
iterator_object:decode_map_header() Decode a MsgPack map header under the iterator cursor, return the number of key value pairs in the map, and advance the cursor
iterator_object:decode() Decode a MsgPack value under the iterator cursor, return the corresponding Lua object, and advance the cursor
iterator_object:take() Return a MsgPack value under the iterator cursor as a MsgPack object without decoding and advance the cursor
iterator_object:skip() Advance the iterator cursor by skipping one MsgPack value under the cursor
msgpack.encode(lua_value)

Convert a Lua object to a raw MsgPack string.

Parameters:
  • lua_value – either a scalar value or a Lua table value.
Return:

the original contents formatted as a raw MsgPack string;

Rtype:

raw MsgPack string

msgpack.encode(lua_value, ibuf)

Convert a Lua object to a raw MsgPack string in an ibuf, which is a buffer such as buffer.ibuf() creates. As with encode(lua_value), the result is a raw MsgPack string, but it goes to the ibuf output instead of being returned.

Parameters:
  • lua_value (lua-object) – either a scalar value or a Lua table value.
  • ibuf (buffer) – (output parameter) where the result raw MsgPack string goes
Return:

number of bytes in the output

Rtype:

number

Example using buffer.ibuf() and ffi.string() and string.hex(): The result will be ‘91a161’ because 91 is the MessagePack encoding of “fixarray size 1”, a1 is the MessagePack encoding of “fixstr size 1”, and 61 is the UTF-8 encoding of ‘a’:

ibuf = require('buffer').ibuf()
msgpack_string_size = require('msgpack').encode({'a'}, ibuf)
msgpack_string = require('ffi').string(ibuf.rpos, msgpack_string_size)
string.hex(msgpack_string)
msgpack.decode(msgpack_string[, start_position])

Convert a raw MsgPack string to a Lua object.

Parameters:
  • msgpack_string (string) – a raw MsgPack string.
  • start_position (integer) – where to start, minimum = 1, maximum = string length, default = 1.
Return:
  • (if msgpack_string is a valid raw MsgPack string) the decoded contents of msgpack_string as a Lua object: usually a Lua table, otherwise a scalar value such as a string or a number;
  • “next_start_position”. If decode stops after parsing as far as byte N in msgpack_string, then “next_start_position” will equal N + 1, and decode(msgpack_string, next_start_position) will continue parsing from where the previous decode stopped, plus 1. Normally decode parses all of msgpack_string, so “next_start_position” will equal string.len(msgpack_string) + 1.
Rtype:

Lua object and number

Example: The result will be [‘a’] and 4:

msgpack_string = require('msgpack').encode({'a'})
require('msgpack').decode(msgpack_string, 1)
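
The "next_start_position" return value makes it possible to walk through several MsgPack values stored back to back in one string. The following is a minimal sketch, assuming a hand-built string (the names stream, values, and pos are illustrative only) made by concatenating three encoded values:

```lua
local msgpack = require('msgpack')

-- Three independent MsgPack values, concatenated into one raw string.
local stream = msgpack.encode({'a'}) .. msgpack.encode(42) .. msgpack.encode('xyz')

local values = {}
local pos = 1
while pos <= #stream do
    -- decode() returns the value and the position right after it.
    local value, next_pos = msgpack.decode(stream, pos)
    table.insert(values, value)
    pos = next_pos
end
-- values now holds {'a'}, 42 and 'xyz'; pos equals #stream + 1.
```

The loop stops once pos passes the end of the string, which matches the statement above that a complete parse leaves "next_start_position" at string.len(msgpack_string) + 1.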
msgpack.decode(C_style_string_pointer, size)

Convert a raw MsgPack string, whose address is supplied as a C-style string pointer such as the rpos pointer which is inside an ibuf such as buffer.ibuf() creates, to a Lua object. A C-style string pointer may be described as cdata<char *> or cdata<const char *>.

Parameters:
  • C_style_string_pointer (buffer) – a pointer to a raw MsgPack string.
  • size (integer) – number of bytes in the raw MsgPack string
Return:
  • (if C_style_string_pointer points to a valid raw MsgPack string) the decoded contents of the raw MsgPack string as a Lua object: usually a Lua table, otherwise a scalar value such as a string or a number;
  • returned_pointer = a C-style pointer to the byte after what was passed, so that C_style_string_pointer + size = returned_pointer
Rtype:

Lua object and a C-style pointer to the byte after what was passed

Example using buffer.ibuf and pointer arithmetic: The result will be [‘a’] and 3 and true:

ibuf = require('buffer').ibuf()
msgpack_string_size = require('msgpack').encode({'a'}, ibuf)
a, b = require('msgpack').decode(ibuf.rpos, msgpack_string_size)
a, b - ibuf.rpos, msgpack_string_size == b - ibuf.rpos
msgpack.decode_unchecked(msgpack_string[, start_position])

Input and output are the same as for decode(string).

msgpack.decode_unchecked(C_style_string_pointer)

Input and output are the same as for decode(C_style_string_pointer), except that size is not needed. Some checking is skipped, and decode_unchecked(C_style_string_pointer) can operate with string pointers to buffers which decode(C_style_string_pointer) cannot handle. For an example see the buffer module.
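
As a rough illustration of the pointer form, the sketch below encodes a table into an ibuf and decodes it back without passing a size. It assumes the data behind the pointer is known to be valid MsgPack (here it was just produced by msgpack.encode); the variable names are illustrative only.

```lua
local msgpack = require('msgpack')
local buffer = require('buffer')

local ibuf = buffer.ibuf()
msgpack.encode({'a', 'b'}, ibuf)

-- No size argument: the caller vouches that ibuf.rpos points at valid MsgPack.
local value, next_ptr = msgpack.decode_unchecked(ibuf.rpos)
-- value is {'a', 'b'}; next_ptr points just past the decoded bytes.
```

Because some checking is skipped, this form is best reserved for buffers whose contents are trusted.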

msgpack.decode_array_header(byte-array, size)

Call the mp_decode_array function in the MsgPuck library and return the array size and a pointer to the first array component. A subsequent call to msgpack.decode can decode the component instead of the whole array.

Parameters:
  • byte-array – a pointer to a raw MsgPack string.
  • size – a number greater than or equal to the string’s length
Return:
  • the size of the array;
  • a pointer to after the array header.
-- Example of decode_array_header
-- Suppose we have the raw data '\x93\x01\x02\x03'.
-- \x93 is MsgPack encoding for a header of a three-item array.
-- We want to skip it and decode the next three items.
msgpack=require('msgpack'); ffi=require('ffi')
x,y=msgpack.decode_array_header(ffi.cast('char*','\x93\x01\x02\x03'),4)
a=msgpack.decode(y,1);b=msgpack.decode(y+1,1);c=msgpack.decode(y+2,1);
a,b,c
-- The result will be: 1,2,3.
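
The same idea can be written as a loop that does not hard-code the size of each element. This is a sketch under the assumption that msgpack.decode(pointer, size) decodes one value when size covers the remaining bytes and returns a pointer to the byte after that value; the names raw, base, and items are illustrative.

```lua
local msgpack = require('msgpack')
local ffi = require('ffi')

local raw = '\x93\x01\x02\x03'            -- three-item array: 1, 2, 3
local base = ffi.cast('char*', raw)       -- raw stays referenced, so the pointer stays valid

local count, ptr = msgpack.decode_array_header(base, #raw)

local items = {}
for i = 1, count do
    -- Number of bytes left between the cursor and the end of the data.
    local remaining = #raw - (ptr - base)
    items[i], ptr = msgpack.decode(ptr, remaining)
end
-- items holds the array components in order.
```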
msgpack.decode_map_header(byte-array, size)

Call the mp_decode_map function in the MsgPuck library and return the map size and a pointer to the first map component. A subsequent call to msgpack.decode can decode the component instead of the whole map.

Parameters:
  • byte-array – a pointer to a raw MsgPack string.
  • size – a number greater than or equal to the raw MsgPack string’s length
Return:
  • the size of the map;
  • a pointer to after the map header.
-- Example of decode_map_header
-- Suppose we have the raw data '\x81\xa2\x41\x41\xc3'.
-- \x81 is MsgPack encoding for a header of a one-item map.
-- We want to skip it and decode the next map item.
msgpack=require('msgpack'); ffi=require('ffi')
x,y=msgpack.decode_map_header(ffi.cast('char*','\x81\xa2\x41\x41\xc3'),5)
a=msgpack.decode(y,3);b=msgpack.decode(y+3,1)
x,a,b
-- The result will be: 1,"AA", true.

__serialize parameter

The MsgPack output structure can be specified with the __serialize parameter:

  • ‘seq’, ‘sequence’, ‘array’ - table encoded as an array
  • ‘map’, ‘mapping’ - table encoded as a map
  • function - the meta-method called to unpack serializable representation of table, cdata or userdata objects

Serializing ‘A’ and ‘B’ with different __serialize values brings different results. To show this, here is a routine which encodes {'A','B'} both as an array and as a map, then displays each result in hexadecimal.

function hexdump(bytes)
    local result = ''
    for i = 1, #bytes do
        -- %02x pads to two hex digits, so byte 1 prints as "01"
        result = result .. string.format("%02x", string.byte(bytes, i)) .. ' '
    end
    return result
end

msgpack = require('msgpack')
m1 = msgpack.encode(setmetatable({'A', 'B'}, {
                             __serialize = "seq"
                          }))
m2 = msgpack.encode(setmetatable({'A', 'B'}, {
                             __serialize = "map"
                          }))
print('array encoding: ', hexdump(m1))
print('map encoding: ', hexdump(m2))

Result:

array encoding: 92 a1 41 a1 42
map encoding:   82 01 a1 41 02 a1 42

The MsgPack Specification page explains that the first encoding means:

fixarray(2), fixstr(1), "A", fixstr(1), "B"

and the second encoding means:

fixmap(2), key(1), fixstr(1), "A", key(2), fixstr(1), "B"
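
To make the reading of these type bytes concrete, here is a small sketch in plain Lua that names the "fix" family a leading byte belongs to and extracts the value or size packed into it. The byte ranges are taken from the MsgPack specification; the helper name describe_header is hypothetical and covers only the four fix formats used above.

```lua
-- Classify a leading MsgPack byte from one of the "fix" families.
-- Returns the format name and the embedded value or size.
local function describe_header(byte)
    if byte <= 0x7f then
        return 'positive fixint', byte
    elseif byte <= 0x8f then
        return 'fixmap', byte - 0x80
    elseif byte <= 0x9f then
        return 'fixarray', byte - 0x90
    elseif byte <= 0xbf then
        return 'fixstr', byte - 0xa0
    end
    return 'other', nil
end

-- The array encoding from the result above: 92 a1 41 a1 42
local array_encoding = string.char(0x92, 0xa1, 0x41, 0xa1, 0x42)
local kind, size = describe_header(array_encoding:byte(1))
-- kind is 'fixarray' and size is 2
```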

Here are examples for all the common types, with the Lua-table representation on the left, with the MsgPack format name and encoding on the right.

Common Types and MsgPack Encodings

{} ‘fixmap’ if metatable is ‘map’ = 80 otherwise ‘fixarray’ = 90
‘a’ ‘fixstr’ = a1 61
false ‘false’ = c2
true ‘true’ = c3
127 ‘positive fixint’ = 7f
65535 ‘uint 16’ = cd ff ff
4294967295 ‘uint 32’ = ce ff ff ff ff
nil ‘nil’ = c0
msgpack.NULL same as nil
[0] = 5 ‘fixmap(1)’ + ‘positive fixint’ (for the key) + ‘positive fixint’ (for the value) = 81 00 05
[0] = nil ‘fixmap(0)’ = 80 – nil is not stored when it is a missing map value
1.5 ‘float 64’ = cb 3f f8 00 00 00 00 00 00
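
A few rows of this table can be checked with a short script. The sketch below assumes a Tarantool session; the helper hex and the samples list are illustrative, and the expected strings are copied from the table above.

```lua
local msgpack = require('msgpack')

-- Render a raw MsgPack string as lowercase hex without separators.
local function hex(bytes)
    return (bytes:gsub('.', function(c)
        return string.format('%02x', c:byte())
    end))
end

local samples = {
    {value = 'a',          expected = 'a161'},
    {value = false,        expected = 'c2'},
    {value = true,         expected = 'c3'},
    {value = 127,          expected = '7f'},
    {value = 65535,        expected = 'cdffff'},
    {value = msgpack.NULL, expected = 'c0'},
    {value = 1.5,          expected = 'cb3ff8000000000000'},
}

-- Collect any rows whose actual encoding differs from the table.
local mismatches = {}
for _, sample in ipairs(samples) do
    local actual = hex(msgpack.encode(sample.value))
    if actual ~= sample.expected then
        table.insert(mismatches, sample.expected .. ' ~= ' .. actual)
    end
end
```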
msgpack.cfg(table)

Some MsgPack configuration settings can be changed.

The values are all either integers or boolean true/false.

Option Default Use
cfg.encode_max_depth 128 Max recursion depth for encoding
cfg.encode_deep_as_nil false A flag saying whether to crop tables with nesting level deeper than cfg.encode_max_depth. Not-encoded fields are replaced with one null. If not set, too high nesting is considered an error.
cfg.encode_invalid_numbers true A flag saying whether to enable encoding of NaN and Inf numbers
cfg.encode_load_metatables true A flag saying whether the serializer will follow __serialize metatable field
cfg.encode_use_tostring false A flag saying whether to use tostring() for unknown types
cfg.encode_invalid_as_nil false A flag saying whether to use NULL for non-recognized types
cfg.encode_sparse_convert true A flag saying whether to handle excessively sparse arrays as maps. See detailed description below
cfg.encode_sparse_ratio 2 1/encode_sparse_ratio is the permissible percentage of missing values in a sparse array
cfg.encode_sparse_safe 10 A limit ensuring that small Lua arrays are always encoded as sparse arrays (instead of generating an error or encoding as a map)
cfg.decode_invalid_numbers true A flag saying whether to enable decoding of NaN and Inf numbers
cfg.decode_save_metatables true A flag saying whether to set metatables for all arrays and maps

Sparse arrays features

During encoding, the MsgPack encoder tries to classify tables into one of four kinds:

  • map - at least one table index is not an unsigned integer
  • regular array - all array indexes are available
  • sparse array - at least one array index is missing
  • excessively sparse array - the number of values missing exceeds the configured ratio

An array is excessively sparse when all the following conditions are met:

  • encode_sparse_ratio > 0
  • max(table) > encode_sparse_safe
  • max(table) > count(table) * encode_sparse_ratio

The MsgPack encoder never considers an array to be excessively sparse when encode_sparse_ratio = 0. The encode_sparse_safe limit ensures that small Lua arrays are always encoded as sparse arrays. If encode_sparse_convert is true (the default), excessively sparse arrays are handled as maps; if it is false, attempting to encode an excessively sparse array generates an error.
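
The classification rules can be restated as a small plain-Lua model. This is only a sketch of the conditions listed above, not the encoder's actual code: the function name classify and the fields ratio and safe (standing in for encode_sparse_ratio and encode_sparse_safe) are hypothetical, and treating keys below 1 as map keys is an assumption based on the "[0] = 5" row of the encoding table.

```lua
-- Simplified model of the table classification described above.
local function classify(tbl, cfg)
    local max, count = 0, 0
    for key in pairs(tbl) do
        -- Any key that is not a positive whole number makes the table a map.
        if type(key) ~= 'number' or key < 1 or key % 1 ~= 0 then
            return 'map'
        end
        count = count + 1
        if key > max then max = key end
    end
    if max == count then
        return 'regular array'
    end
    -- All three conditions must hold for an array to be excessively sparse.
    if cfg.ratio > 0 and max > cfg.safe and max > count * cfg.ratio then
        return 'excessively sparse array'
    end
    return 'sparse array'
end

local defaults = {ratio = 2, safe = 10}
local small_gap = classify({[1] = 'a', [3] = 'c'}, defaults)    -- max 3 is within the safe limit
local large_gap = classify({[1] = 'a', [100] = 'z'}, defaults)  -- max 100 > 10 and 100 > 2 * 2
```

With the default values, small_gap is 'sparse array' because the largest index does not exceed encode_sparse_safe, while large_gap is 'excessively sparse array'.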

msgpack.cfg() example 1:

If msgpack.cfg.encode_invalid_numbers = true (the default), then NaN and Inf are legal values. If that is not desirable, then ensure that msgpack.encode() will not accept them, by saying msgpack.cfg{encode_invalid_numbers = false}, thus:

tarantool> msgpack = require('msgpack'); msgpack.cfg{encode_invalid_numbers = true}
---
...
tarantool> msgpack.decode(msgpack.encode{1, 0 / 0, 1 / 0, false})
---
- [1, -nan, inf, false]
- 22
...
tarantool> msgpack.cfg{encode_invalid_numbers = false}
---
...
tarantool> msgpack.decode(msgpack.encode{1, 0 / 0, 1 / 0, false})
---
- error: ... number must not be NaN or Inf'
...
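
In a script, as opposed to the interactive console, the same check might be wrapped in pcall so that the error can be handled, with the previous setting restored afterwards. This sketch assumes that the current value can be read from msgpack.cfg.encode_invalid_numbers, as the text above does; the variable names are illustrative.

```lua
local msgpack = require('msgpack')

-- Remember the current setting so it can be restored afterwards.
local saved = msgpack.cfg.encode_invalid_numbers

msgpack.cfg{encode_invalid_numbers = false}
-- With the flag off, encoding NaN raises an error; pcall captures it.
local ok, err = pcall(msgpack.encode, {0 / 0})

msgpack.cfg{encode_invalid_numbers = saved}
-- ok is false and err describes the rejected number.
```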

msgpack.cfg() example 2:

To avoid generating errors on attempts to encode unknown data types such as userdata or cdata, you can use this code:

tarantool> httpc = require('http.client').new()
---
...

tarantool> msgpack.encode(httpc.curl)
---
- error: unsupported Lua type 'userdata'
...

tarantool> msgpack.encode(httpc.curl, {encode_use_tostring=true})
---
- '"userdata: 0x010a4ef2a0"'
...

Note

To achieve the same effect for only one call to msgpack.encode() (i.e. without changing the configuration permanently), you can use msgpack.encode({1, x, y, 2}, {encode_invalid_numbers = true}).

Similar configuration settings exist for JSON and YAML.

msgpack.NULL

A value comparable to Lua “nil” which may be useful as a placeholder in a tuple.

Example

tarantool> msgpack = require('msgpack')
---
...
tarantool> y = msgpack.encode({'a',1,'b',2})
---
...
tarantool> z = msgpack.decode(y)
---
...
tarantool> z[1], z[2], z[3], z[4]
---
- a
- 1
- b
- 2
...
tarantool> box.space.tester:insert{20, msgpack.NULL, 20}
---
- [20, null, 20]
...
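
The difference between msgpack.NULL and a plain nil shows up most clearly inside arrays. The sketch below relies on LuaJIT cdata semantics, where a NULL pointer compares equal to nil but is still a truthy value; the variable names are illustrative.

```lua
local msgpack = require('msgpack')

-- msgpack.NULL keeps its slot in the table, so the array has no hole.
local t = {1, msgpack.NULL, 3}
local raw = msgpack.encode(t)        -- fixarray(3), 1, nil, 3

local equals_nil = (msgpack.NULL == nil)   -- true: NULL cdata compares equal to nil
local truthy = not not msgpack.NULL        -- true: cdata is never false in a condition

local decoded = msgpack.decode(raw)
local middle_is_null = (decoded[2] == nil) -- true: the placeholder survives a round trip
```

Because msgpack.NULL is truthy, a test such as "if value then" does not detect it; compare with nil explicitly instead.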
msgpack.object(lua_value)

Since version 2.10.0.

Encode an arbitrary Lua object into the MsgPack format.

Parameters:
  • lua_value (lua-object) – a Lua object of any type.
Return:

encoded MsgPack data encapsulated in a MsgPack object.

Rtype:

userdata

Example:

local msgpack = require('msgpack')
-- Create a MsgPack object from a Lua object of any type
local mp = msgpack.object(123)
local mp = msgpack.object("foobar")
local mp = msgpack.object({1, 2, 3})
local mp = msgpack.object({foo = 1, bar = 2})
local mp = msgpack.object(box.tuple.new(1, 2, 3))
msgpack.object_from_raw(msgpack_string)

Since version 2.10.0.

Create a MsgPack object from a raw MsgPack string.

Parameters:
  • msgpack_string (string) – a raw MsgPack string.
Return:

a MsgPack object

Rtype:

userdata

Example:

local msgpack = require('msgpack')
local data = msgpack.encode({1, 2, 3})
local mp = msgpack.object_from_raw(data)
msgpack.object_from_raw(C_style_string_pointer, size)

Since version 2.10.0.

Create a MsgPack object from a raw MsgPack string. The address of the MsgPack string is supplied as a C-style string pointer such as the rpos pointer inside an ibuf that the buffer.ibuf() creates. A C-style string pointer may be described as cdata<char *> or cdata<const char *>.

Parameters:
  • C_style_string_pointer (buffer) – a pointer to a raw MsgPack string.
  • size (integer) – number of bytes in the raw MsgPack string.
Return:

a MsgPack object

Rtype:

userdata

Example:

local msgpack = require('msgpack')
local buffer = require('buffer')
local buf = buffer.ibuf()
msgpack.encode({1, 2, 3}, buf)
-- rpos is the read position; buf:size() is the number of bytes available from it
local mp = msgpack.object_from_raw(buf.rpos, buf:size())
msgpack.is_object(some_argument)

Since version 2.10.0.

Check if the given argument is a MsgPack object.

Parameters:
  • some_argument – any argument.
Return:

true if the argument is a MsgPack object; otherwise, false

Rtype:

boolean

Example:

local msgpack = require('msgpack')
local mp = msgpack.object(123)
msgpack.is_object(mp) -- returns true
msgpack.is_object({}) -- returns false
object msgpack_object

A MsgPack object can be passed to the MsgPack encoder with the same effect as passing the original Lua object:

local msgpack = require('msgpack')
local mp = msgpack.object(123)
msgpack.object({mp, mp}):decode()         -- returns {123, 123}
msgpack.decode(msgpack.encode({mp, mp}))  -- returns {123, 123}

In particular, this means that if a MsgPack object stores an array, it can be inserted into a database space:

box.space.my_space:insert(msgpack.object({1, 2, 3}))

The MsgPack object has the following methods:

msgpack_object:decode()

Since version 2.10.0.

Decode MsgPack data in the MsgPack object.

Return: a Lua object
Rtype: Lua object
msgpack_object:iterator()

Since version 2.10.0.

Create an iterator over the MsgPack data.

A MsgPack iterator object has its own set of methods.

Return: an iterator object over the MsgPack data
Rtype: userdata
object iterator_object

The MsgPack iterator object has the following methods:

iterator_object:decode_array_header()

Since version 2.10.0.

Decode a MsgPack array header under the iterator cursor and advance the cursor. After calling this function, the iterator points to the first element of the array or to the value following the array if the array is empty.

Return: number of elements in the array
Rtype: number

Possible errors: raises an error if the type of the value under the iterator cursor is not MP_ARRAY.

iterator_object:decode_map_header()

Since version 2.10.0.

Decode a MsgPack map header under the iterator cursor and advance the cursor. After calling this function, the iterator points to the first key stored in the map or to the value following the map if the map is empty.

Return: number of key-value pairs in the map
Rtype: number

Possible errors: raises an error if the type of the value under the iterator cursor is not MP_MAP.

iterator_object:decode()

Since version 2.10.0.

Decode a MsgPack value under the iterator cursor and advance the cursor.

Return: a Lua object corresponding to the MsgPack value
Rtype: Lua object

Possible errors: raises a Lua error if there’s no data to decode.

iterator_object:take()

Since version 2.10.0.

Return a MsgPack value under the iterator cursor as a MsgPack object without decoding and advance the cursor. The method doesn’t copy MsgPack data. Instead, it takes a reference to the original object.

Possible errors: raises a Lua error if there’s no data to decode.

iterator_object:skip()

Since version 2.10.0.

Advance the iterator cursor by skipping one MsgPack value under the cursor. Returns nothing.

Possible errors: raises a Lua error if there’s no data to skip.

Example:

local msgpack = require('msgpack')
local mp = msgpack.object({foo = 123, bar = {1, 2, 3}})
local it = mp:iterator()
it:decode_map_header()  -- returns 2
it:decode()             -- returns 'foo'
it:decode()             -- returns 123
it:skip()               -- returns none, skips 'bar'
local mp2 = it:take()
mp2:decode()            -- returns {1, 2, 3}
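
For arrays, the element count returned by decode_array_header() can drive a simple loop. The following sketch sums the elements of a MsgPack array one value at a time, without first decoding the whole array into a Lua table; the variable names are illustrative.

```lua
local msgpack = require('msgpack')

local mp = msgpack.object({10, 20, 30})
local it = mp:iterator()

-- The header tells us how many elements follow the cursor.
local count = it:decode_array_header()

local sum = 0
for _ = 1, count do
    -- Each decode() returns one element and advances the cursor.
    sum = sum + it:decode()
end
-- sum is 60 after visiting all three elements.
```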