Tuple compression | Enterprise
Tuple compression

Tuple compression, introduced in Tarantool 2.10.0, aims to save memory space. Typically, it decreases the volume of stored data by 15%. However, the exact volume saved depends on the type of data.

Two compression algorithms are currently supported: lz4 and zstd. To learn about the performance costs of each algorithm, check the appendix.

You do not compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.

Tuple compression is possible for memtx spaces only. Vinyl spaces do not support compression.

How to create compressed fields

First, create a memtx space:

box.schema.space.create('TEST')

Then create an index for this space, for example:

box.space.TEST:create_index('tree', {
    type = 'TREE',
    parts = {
        {1, 'unsigned'},
        {3, 'unsigned'},
        {5, 'unsigned'}
    }
})

Create a format to declare field names and types. A space needs only one indexed field; this example indexes several fields to demonstrate a more complicated case:

box.space.TEST:format({
    {name = 'A', type = 'unsigned'},
    {name = 'B', type = 'string', compression = 'zstd'},
    {name = 'C', type = 'unsigned'},
    {name = 'D', type = 'unsigned', compression = 'lz4'},
    {name = 'E', type = 'unsigned'}
})

In this example, fields 1, 3, and 5 are indexed, so they cannot be compressed. Fields 2 and 4 can be compressed: they are declared with compression = 'zstd' and compression = 'lz4', respectively. You can apply different compression algorithms to different fields in a single space.

From now on, fields B and D are compressed in every new tuple that you add to the space ‘TEST’.

Decompression is transparent: when you read a tuple with compressed fields, you get the original values and do not need to decompress anything yourself.

If a field value is too small, it is not compressed. This is not an error, so you will see no error message; the field simply keeps the size it had before compression.

How to check whether a field is compressed

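To determine which fields in your space are compressed, run space_object:format() on the space. If a field is compressed, its entry in the format includes the compression type. For example, in a space whose format was set as follows, the fields client_details and notes are compressed: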

box.space.ledger:format({
    {name = 'id', type = 'unsigned'}, -- this field is uncompressed
    {name = 'client_details', type = 'array', compression = 'zstd'},
    {name = 'notes', type = 'string', compression = 'lz4'},
})
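To scan a long format quickly, a small helper can pick out the compressed fields. The following is a minimal sketch, not part of Tarantool: compressed_fields is a hypothetical name, and the sample table simply repeats the ledger format above so that the snippet also runs in plain Lua.

```lua
-- Hypothetical helper (not a Tarantool API): collect the fields of a space
-- format that declare a compression type.
local function compressed_fields(format)
    local result = {}
    for _, field in ipairs(format) do
        if field.compression ~= nil then
            table.insert(result, {
                name = field.name,
                compression = field.compression,
            })
        end
    end
    return result
end

-- Sample format, copied from the ledger example above.
local ledger_format = {
    {name = 'id', type = 'unsigned'},
    {name = 'client_details', type = 'array', compression = 'zstd'},
    {name = 'notes', type = 'string', compression = 'lz4'},
}

-- Print one line per compressed field: its name and algorithm.
for _, field in ipairs(compressed_fields(ledger_format)) do
    print(field.name, field.compression)
end
```

On a running instance, you would presumably pass the value returned by box.space.ledger:format() instead of the sample table.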

What tuples can be compressed

In Tarantool 2.10.0, you can enable compression for an existing field. All the tuples added after that will have this field compressed. However, this doesn’t affect the tuples already stored in the space – they remain uncompressed until the snapshot and restart.

How to enable compression for already created tuples

With space:upgrade(), you can enable compression and migrate the data, including tuples that already exist. Specify the fields to be compressed in the format passed to space:upgrade(). Everything works transparently: data is compressed on write and decompressed on read. This way, you can also compress data in existing storages.

Here’s an example of how to compress an existing field:

  1. Create a space without compression (named ledger in this example) and add several tuples.

  2. Suppose that you want fields 2 and 3 to be compressed from now on. To enable compression, change the format:

    local format = box.space.ledger:format()
    format[2].compression = 'zstd'
    format[3].compression = 'zstd'
    box.space.ledger:format(format)
    
  3. To finalize the change, create a snapshot by running box.snapshot() and restart Tarantool.

  4. From now on, all the tuples that you add to the space have fields 2 and 3 compressed. After the snapshot and restart, the old tuples are also compressed in memory, during recovery.
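Step 2 can be wrapped in a reusable function when several fields or spaces need the same change. The sketch below is an illustration only: with_compression is a hypothetical helper, and the field names in old_format are assumed (borrowed from the earlier ledger example). It builds a modified copy of the format instead of editing the table in place; the Tarantool calls that apply it are shown as comments because they need a running instance.

```lua
-- Hypothetical helper (not a Tarantool API): return a copy of `format` with
-- `algorithm` set as the compression type of the listed field numbers.
local function with_compression(format, field_numbers, algorithm)
    local updated = {}
    for i, field in ipairs(format) do
        -- Copy each field definition so the original format stays untouched.
        local copy = {}
        for key, value in pairs(field) do
            copy[key] = value
        end
        updated[i] = copy
    end
    for _, fieldno in ipairs(field_numbers) do
        assert(updated[fieldno] ~= nil, 'no such field: ' .. tostring(fieldno))
        updated[fieldno].compression = algorithm
    end
    return updated
end

-- Assumed uncompressed format of the ledger space.
local old_format = {
    {name = 'id', type = 'unsigned'},
    {name = 'client_details', type = 'array'},
    {name = 'notes', type = 'string'},
}
local new_format = with_compression(old_format, {2, 3}, 'zstd')

-- Under Tarantool, apply the new format and then continue with step 3:
-- box.space.ledger:format(new_format)
-- box.snapshot()
```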

Errors

“Indexed field does not support compression”

You can only compress non-indexed fields. If you try to compress an indexed field, you will get an error message: “Indexed field does not support compression”.

“Vinyl does not support compression”

Tuple compression is possible for memtx spaces. If you create a vinyl space with compression, you will get an error message: “Vinyl does not support compression”.

“Failed to create space ‘T’: field 1 has unknown compression type”

If you set a compression type other than zstd or lz4, you will get an error message: “Failed to create space ‘T’: field 1 has unknown compression type”. Here, ‘T’ and field 1 stand for the example space and the field with the unsupported type.
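The three conditions above can be checked before a format is applied. The following is a rough sketch under stated assumptions: check_compression is a hypothetical helper, its messages are worded by this sketch and are not Tarantool's own error texts, and it accepts only the two algorithms named in this document.

```lua
-- Algorithms named in this document; anything else is rejected here.
local SUPPORTED = {lz4 = true, zstd = true}

-- Hypothetical pre-check (not a Tarantool API). Returns true, or nil plus a
-- message describing the first problem found.
--   format  : array of field definitions, as passed to space_object:format()
--   indexed : array of indexed field numbers
--   engine  : 'memtx' or 'vinyl'
local function check_compression(format, indexed, engine)
    local is_indexed = {}
    for _, fieldno in ipairs(indexed) do
        is_indexed[fieldno] = true
    end
    for fieldno, field in ipairs(format) do
        local algorithm = field.compression
        if algorithm ~= nil then
            if engine == 'vinyl' then
                return nil, 'vinyl space cannot have compressed fields'
            elseif is_indexed[fieldno] then
                return nil, string.format('field %d is indexed', fieldno)
            elseif not SUPPORTED[algorithm] then
                return nil, string.format(
                    'field %d: unknown compression type', fieldno)
            end
        end
    end
    return true
end

-- Format and index parts from the 'TEST' example above.
local test_format = {
    {name = 'A', type = 'unsigned'},
    {name = 'B', type = 'string', compression = 'zstd'},
    {name = 'C', type = 'unsigned'},
    {name = 'D', type = 'unsigned', compression = 'lz4'},
    {name = 'E', type = 'unsigned'},
}
print(check_compression(test_format, {1, 3, 5}, 'memtx'))
print(check_compression(test_format, {1, 2}, 'memtx'))
```

The first call passes because only fields 2 and 4 are compressed; the second fails because field 2 is listed as indexed.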
