Tuple compression | Enterprise

Tuple compression

Tuple compression reduces memory usage. It typically decreases the volume of stored data by about 15%, although the exact savings depend on the type of data.

Two compression algorithms are currently supported: lz4 and zstd. To learn about the performance costs of each algorithm, check the appendix.

You do not compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.

Tuple compression is possible for memtx spaces only. Vinyl spaces do not support compression.

How to create compressed fields

First, create a memtx space:

box.schema.space.create('TEST')

Then create an index for this space, for example:

box.space.TEST:create_index('tree', {
    type = 'TREE',
    parts = {
        {1, 'unsigned'},
        {3, 'unsigned'},
        {5, 'unsigned'}
    }
})

Create a format to declare field names and types. A space can have just one indexed field; this example indexes several fields to demonstrate a more complicated case:

box.space.TEST:format({
    {name = 'A', type = 'unsigned'},
    {name = 'B', type = 'string', compression = 'zstd'},
    {name = 'C', type = 'unsigned'},
    {name = 'D', type = 'unsigned', compression = 'lz4'},
    {name = 'E', type = 'unsigned'}
})

In this example, fields 1, 3, and 5 are indexed, so they cannot be compressed. Fields 2 and 4 can be compressed: they are declared with compression = 'zstd' and compression = 'lz4', respectively. You can apply different compression algorithms to different fields in a single space.

From now on, new tuples that you add to the 'TEST' space are compressed.

When you read a compressed tuple, you do not need to decompress it back yourself.

If a field is too small, it is not compressed. This is not an error, so you will see no error message. The field is simply stored at its original size.
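The following sketch illustrates the write and read path described above. It assumes a Tarantool Enterprise instance where box.cfg{} has already been called and where the 'TEST' space, index, and format from the previous steps exist. The sample payload is made up for illustration.

```lua
-- Assumption: box.cfg{} was called and the 'TEST' space from the steps above exists.
local space = box.space.TEST

-- Made-up sample data: a long, repetitive JSON-like string for field B (zstd).
local payload = string.rep('{"status":"ok","retries":0}', 100)

-- Field order follows the format: A = 1, B = payload, C = 2, D = 3, E = 4.
space:insert({1, payload, 2, 3, 4})

-- The 'tree' index covers fields 1, 3, and 5, so the lookup key is {1, 2, 4}.
local tuple = space:get({1, 2, 4})

-- Per the text above, no manual decompression step is needed on read:
-- the field is expected to come back with its original value.
assert(tuple[2] == payload)
```

Field D holds a small number here, so according to the note on small fields it would probably be stored as is.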

How to check whether a field is compressed

To determine which fields in your space are compressed, run space_object:format() on the space. If a field is compressed, its entry in the format includes the compression type. For example, a space whose format was set as follows has two compressed fields:

box.space.ledger:format({
    {name = 'id', type = 'unsigned'}, -- this field is uncompressed
    {name = 'client_details', type = 'array', compression = 'zstd'},
    {name = 'notes', type = 'string', compression = 'lz4'},
})
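To list compressed fields programmatically, you can filter the format table. The sketch below uses a hypothetical helper, compressed_fields, which is not part of Tarantool. It runs on a plain Lua table shaped like the ledger format above; on a live instance you would pass the result of box.space.ledger:format() instead.

```lua
-- Hypothetical helper (not a Tarantool API): collect the fields of a format
-- table that declare a compression type.
local function compressed_fields(format)
    local result = {}
    for position, field in ipairs(format) do
        -- Treating 'none' as uncompressed is a defensive assumption,
        -- not something the text above states.
        if field.compression ~= nil and field.compression ~= 'none' then
            table.insert(result, {
                position = position,
                name = field.name,
                compression = field.compression,
            })
        end
    end
    return result
end

-- Sample table mirroring the ledger format shown above.
-- On a Tarantool instance: compressed_fields(box.space.ledger:format())
local sample_format = {
    {name = 'id', type = 'unsigned'},
    {name = 'client_details', type = 'array', compression = 'zstd'},
    {name = 'notes', type = 'string', compression = 'lz4'},
}

-- Print the position, name, and algorithm of each compressed field.
for _, field in ipairs(compressed_fields(sample_format)) do
    print(field.position, field.name, field.compression)
end
```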

What tuples can be compressed

For now, only new tuples can be compressed. For example, suppose you create a space without compression and add several tuples to it. If you then change the format so that field 2 is compressed, all tuples created from that point on are compressed, while the tuples that already existed remain uncompressed.
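The scenario above can be sketched as follows. This is an illustration only: it assumes a Tarantool Enterprise instance with box.cfg{} already called, and the space name 'compression_demo', the index name, and the field names are placeholders.

```lua
-- Assumption: box.cfg{} was called on a Tarantool Enterprise instance.
-- 'compression_demo', 'pk', 'id', and 'body' are placeholder names.
local s = box.schema.space.create('compression_demo')
s:create_index('pk', {parts = {{1, 'unsigned'}}})

-- Initial format: no compression on any field.
s:format({
    {name = 'id', type = 'unsigned'},
    {name = 'body', type = 'string'},
})

-- This tuple is written before compression is enabled,
-- so it stays uncompressed.
s:insert({1, string.rep('old data ', 200)})

-- Change the format: field 2 is now declared with compression.
s:format({
    {name = 'id', type = 'unsigned'},
    {name = 'body', type = 'string', compression = 'zstd'},
})

-- This tuple is written after the change, so its second field
-- is eligible for compression.
s:insert({2, string.rep('new data ', 200)})
```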

How to enable compression for already created tuples

Non-blocking DDL will make it possible to enable compression and migrate a space, including its already created tuples. This chapter will be updated as soon as non-blocking DDL becomes available.

Errors

“Indexed field does not support compression”

You can only compress non-indexed fields. If you try to compress an indexed field, you will get an error message: “Indexed field does not support compression”.

“Vinyl does not support compression”

Tuple compression is possible for memtx spaces only. If you create a vinyl space with compression, you will get an error message: “Vinyl does not support compression”.

“Failed to create space ‘T’: field 1 has unknown compression type”

If you set a compression algorithm other than zstd or lz4, you will get an error message: “Failed to create space ‘T’: field 1 has unknown compression type”. Here, ‘T’ and field 1 stand for the example space and field.
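If you want to handle these errors in a script instead of letting them abort it, one option is to wrap the calls in pcall. The sketch below assumes a Tarantool Enterprise instance with box.cfg{} already called. The space names and the algorithm name 'unknown_algo' are placeholders. Each call should fail with the corresponding message listed above, though the exact wording may vary between versions.

```lua
-- Assumption: box.cfg{} was called on a Tarantool Enterprise instance.
-- All space names below are placeholders chosen for this sketch.

-- Case 1: compression on an indexed field.
local s = box.schema.space.create('demo_indexed')
s:create_index('pk', {parts = {{1, 'unsigned'}}})
local ok, err = pcall(s.format, s, {
    {name = 'id', type = 'unsigned', compression = 'lz4'},
})
print(ok, err)

-- Case 2: compression in a vinyl space.
ok, err = pcall(box.schema.space.create, 'demo_vinyl', {
    engine = 'vinyl',
    format = {
        {name = 'payload', type = 'string', compression = 'zstd'},
    },
})
print(ok, err)

-- Case 3: an algorithm name that is neither zstd nor lz4.
ok, err = pcall(box.schema.space.create, 'demo_unknown', {
    format = {
        {name = 'payload', type = 'string', compression = 'unknown_algo'},
    },
})
print(ok, err)
```

In each case, ok is expected to be false and err to carry the error, so the script can log it and continue.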