Importing data

Enterprise Edition

This command is supported by the Enterprise Edition only.

$ tt [crud|tdg2] import URI FILE:SPACE [IMPORT_OPTION ...]
# or
$ tt [crud|tdg2] import URI :SPACE < FILE [IMPORT_OPTION ...]

tt [crud|tdg] import imports data from a file to a space. Three import commands cover the following cases:

tt import imports data into a replica set through its master instance using the box.space API.
tt crud import imports data into a sharded cluster through a router using the CRUD module.
tt tdg2 import imports data into a Tarantool Data Grid 2 cluster through its router using the repository.put function of the TDG2 Repository API.

tt [crud|tdg2] import takes the following arguments:

URI: The URI of a router instance if crud is used. Otherwise, it should specify the URI of a storage.
FILE: The name of a file containing data to be imported.
SPACE: The name of a space to which data is imported.

Примечание

Write access to the space and execute access to universe are required to import data.

Input file format

tt import imports data from the following formats:

tt import and tt crud import: CSV
tt tdg2 import: JSON lines

Limitations

Importing isn’t supported for the interval field type.

Matching of input and space fields

Automatic matching

Suppose that you have the customers.csv file with a header containing field names in the first row:

id,firstname,lastname,age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...

If the target customers space has fields with the same names, you can import data using the --header and --match options specified as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match=header

In this case, fields in the input file and the target space are matched automatically. You can also match fields manually if field names in the input file and the target space differ. Note that if you’re importing data into a cluster, you don’t need to specify the bucket_id field. The CRUD module generates bucket_id values automatically.

Manual matching

The --match option enables importing data by matching field names in the input file and the target space manually. Suppose that you have the following customers.csv file with four fields:

customer_id,name,surname,customer_age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...

If the target customers space has the id, firstname, lastname, and age fields, you can configure mapping as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match "id=customer_id;firstname=name;lastname=surname;age=customer_age"

Similarly, you can configure mapping using numeric field positions in the input file:

$ tt crud import localhost:3301 customers.csv:customers \
                 --header \
                 --match "id=1;firstname=2;lastname=3;age=4"

Below are the rules if some fields are missing in input data or space:

If a space has fields that are not specified in input data, tt [crud] import tries to insert null values.
If input data contains fields missing in a target space, these fields are ignored.

Importing bucket_id into sharded clusters

When importing data into a CRUD-enabled sharded cluster, tt crud import ignores the bucket_id field values from the input file. This allows CRUD to automatically manage data distribution in the cluster by generating new bucket_id for tuples during import.

If you need to preserve the original bucket_id values, use the --keep-bucket-id option:

$ tt crud import localhost:3301 customers.csv:customers \
                 --keep-bucket-id \
                 --header \
                 --match=header

Handling duplicate primary key errors

The --on-exist option enables you to control data import when a duplicate primary key error occurs. In the example below, values already existing in the space are replaced with new ones:

$ tt crud import localhost:3301 customers.csv:customers \
                 --on-exist replace

Handling parsing errors

To skip rows whose data cannot be parsed correctly, use the --on-error option as follows:

$ tt crud import localhost:3301 customers.csv:customers \
                 --on-error skip

Importing into Tarantool Data Grid 2

Примечание

In the TDG2 data model, a type represents a Tarantool space, and an object of a type represents a tuple in the type’s underlying space.

The command below imports objects of the customers type into a TDG2 cluster. The objects are described in the customers.jsonl file.

$ tt tdg2 import localhost:3301 customers.jsonl:customers

The input file can look like this:

{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}

Примечание

Since JSON describes objects in maps with string keys, there is no way to import a field value that is a map with a non-string key.

In case of an error during TDG2 import, tt tdg2 import rolls back the changes made within the current batch on the storage where the error has happened (per-storage rollback) and reports an error. On other storages, objects from the same batch can be successfully imported. So, the rollback process of tt tdg2 import is the same as the one of tt crud import with the --rollback-on-error option.

Since object batches can be imported partially (per-storage rollback), the absence of error matching complicates the debugging in case of errors. To minimize this effect, the default batch size (--batch-size) for tt tdg2 import is 1. This makes the debugging straightforward: you always know which object caused the error. On the other hand, this decreases the performance in comparison to import in larger batches.

If you increase the batch size, tt informs you about the possible issues and asks for an explicit confirmation to proceed. To automatically confirm a batch import operation, add the --force option:

$ tt tdg2 import localhost:3301 customers.jsonl:customers \
                 --batch-size=100 \
                 --force

Options

--batch-size INT¶

Applicable to: tt crud import, tt tdg2 import

The number of tuples to transfer per request. The default is:

100 for tt crud import.

1 for tt tdg2 import. See Importing into Tarantool Data Grid 2 for details.

--dec-sep STRING¶

Applicable to: tt import, tt crud import

The string of symbols that defines decimal separators for numeric data (the default is .,).

Примечание

Symbols specified in this option cannot intersect with --th-sep.

--delimiter STRING¶

Applicable to: tt import, tt crud import

A symbol that defines a field value delimiter. For CSV, the default delimiter is a comma (,). To use a tab character as a delimiter, set this value as tab:

$ tt crud import localhost:3301 customers.csv:customers \
                 --delimiter tab

Примечание

A delimiter cannot be \r, \n, or the Unicode replacement character (U+FFFD).

--error STRING¶

The name of a file containing rows that are not imported (the default is error).

Версия:

Importing data

Input file format

Limitations

Matching of input and space fields

Automatic matching

Manual matching

Importing bucket_id into sharded clusters

Handling duplicate primary key errors

Handling parsing errors

Importing into Tarantool Data Grid 2

Options