Tuple compression

Enterprise Edition

Tuple compression is available in the Enterprise Edition only.

Tuple compression, introduced in Tarantool Enterprise Edition 2.10.0, aims to save memory space. Typically, it decreases the volume of stored data by 15%. However, the exact volume saved depends on the type of data.

The following compression algorithms are supported:

lz4
zstd
zlib (since 2.11.0)

To learn about the performance costs of each algorithm, check Tuple compression performance.

Tarantool doesn’t compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.

Note

The compress module provides the API for compressing and decompressing data.

Enabling compression for a new space

First, create a space:

box.schema.space.create('bands')

Then, create an index for this space, for example:

box.space.bands:create_index('primary', {parts = {{1, 'unsigned'}}})

Create a format to declare field names and types. In the example below, the band_name and year fields have the zstd and lz4 compression formats, respectively. The first field (id) has the index, so it cannot be compressed.

box.space.bands:format({
           {name = 'id', type = 'unsigned'},
           {name = 'band_name', type = 'string', compression = 'zstd'},
           {name = 'year', type = 'unsigned', compression = 'lz4'}
       })

Now, the new tuples that you add to the space bands will be compressed. When you read a compressed tuple, you do not need to decompress it back yourself.

Checking which fields are compressed

To check which fields in a space are compressed, run space_object:format() on the space. If a field is compressed, the format includes the compression algorithm, for example:

tarantool> box.space.bands:format()
    ---
    - [{'name': 'id', 'type': 'unsigned'},
       {'type': 'string', 'compression': 'zstd', 'name': 'band_name'},
       {'type': 'unsigned', 'compression': 'lz4', 'name': 'year'}]
    ...

Enabling compression for existing spaces

You can enable compression for existing fields. All the tuples added after that will have this field compressed. However, this doesn’t affect the tuples already stored in the space. You need to make the snapshot and restart Tarantool to compress the existing tuples.

Here’s an example of how to compress existing fields:

Create a space without compression and add several tuples:

box.schema.space.create('bands')

box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})

box.space.bands:create_index('primary', { parts = { 'id' } })

box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
box.space.bands:insert { 3, 'Ace of Base', 1987 }
box.space.bands:insert { 4, 'The Beatles', 1960 }

Suppose that you want fields 2 and 3 to be compressed from now on. To enable compression, change the format as follows:
```
local new_format = box.space.bands:format()

new_format[2].compression = 'zstd'
new_format[3].compression = 'lz4'

box.space.bands:format(new_format)
```
From now on, all the tuples that you add to the space have fields 2 and 3 compressed.
To finalize the change, create a snapshot by running box.snapshot() and restart Tarantool. As a result, all old tuples will also be compressed in memory during recovery.

Note

space:upgrade() provides the ability to enable compression and update the existing tuples in the background. To achieve this, you need to pass a new space format in the format argument of space:upgrade().