Tuple compression | Enterprise
Tuple compression

Tuple compression

Tuple compression, introduced in Tarantool 2.10.0, aims to save memory space. Typically, it decreases the volume of stored data by 15%. However, the exact volume saved depends on the type of data.

The following compression algorithms are currently supported:

To learn about the performance costs of each algorithm, check Tuple compression performance.

Tarantool doesn’t compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.

Note

The compress module provides the API for compressing and decompressing data.

Enabling compression for a new space

First, create a space:

box.schema.space.create('bands')

Then, create an index for this space, for example:

box.space.bands:create_index('primary', {parts = {{1, 'unsigned'}}})

Create a format to declare field names and types. In the example below, the band_name and year fields have the zstd and lz4 compression formats, respectively. The first field (id) has the index, so it cannot be compressed.

box.space.bands:format({
           {name = 'id', type = 'unsigned'},
           {name = 'band_name', type = 'string', compression = 'zstd'},
           {name = 'year', type = 'unsigned', compression = 'lz4'}
       })

Now, the new tuples that you add to the space bands will be compressed. When you read a compressed tuple, you do not need to decompress it back yourself.

Checking which fields are compressed

To check which fields in a space are compressed, run space_object:format() on the space. If a field is compressed, the format includes the compression algorithm, for example:

tarantool> box.space.bands:format()
    ---
    - [{'name': 'id', 'type': 'unsigned'},
       {'type': 'string', 'compression': 'zstd', 'name': 'band_name'},
       {'type': 'unsigned', 'compression': 'lz4', 'name': 'year'}]
    ...

Enabling compression for existing spaces

You can enable compression for existing fields. All the tuples added after that will have this field compressed. However, this doesn’t affect the tuples already stored in the space. You need to make the snapshot and restart Tarantool to compress the existing tuples.

Here’s an example of how to compress existing fields:

  1. Create a space without compression and add several tuples:

    box.schema.space.create('bands')
    
    box.space.bands:format({
        { name = 'id', type = 'unsigned' },
        { name = 'band_name', type = 'string' },
        { name = 'year', type = 'unsigned' }
    })
    
    box.space.bands:create_index('primary', { parts = { 'id' } })
    
    box.space.bands:insert { 1, 'Roxette', 1986 }
    box.space.bands:insert { 2, 'Scorpions', 1965 }
    box.space.bands:insert { 3, 'Ace of Base', 1987 }
    box.space.bands:insert { 4, 'The Beatles', 1960 }
    
  2. Suppose that you want fields 2 and 3 to be compressed from now on. To enable compression, change the format as follows:

    local new_format = box.space.bands:format()
    
    new_format[2].compression = 'zstd'
    new_format[3].compression = 'lz4'
    
    box.space.bands:format(new_format)
    

    From now on, all the tuples that you add to the space have fields 2 and 3 compressed.

  3. To finalize the change, create a snapshot by running box.snapshot() and restart Tarantool. As a result, all old tuples will also be compressed in memory during recovery.

Note

space:upgrade() provides the ability to enable compression and update the existing tuples in the background. To achieve this, you need to pass a new space format in the format argument of space:upgrade().

Tuple compression performance

Below are the results of a synthetic test that illustrate how tuple compression affects performance. The test was carried out on a simple Tarantool space containing 100,000 tuples, each having a field with a sample JSON roughly 600 bytes large. The test compared the speed of running select and replace operations on uncompressed and compressed data as well as the overall data size of the space. Performance is measured in requests per second.

Compression type

select, RPS

replace, RPS

Space size, bytes

None

4,486k

1,109k

41,168,548

zstd

308k

26k

21,368,548

lz4

1,765k

672k

25,268,548

zlib

325k

107k

20,768,548

Found what you were looking for?
Feedback