Importing data
Enterprise Edition
This command is supported by the Enterprise Edition only.
$ tt [crud|tdg2] import URI FILE:SPACE [IMPORT_OPTION ...]
# or
$ tt [crud|tdg2] import URI :SPACE < FILE [IMPORT_OPTION ...]
tt [crud|tdg2] import imports data from a file to a space. Three import commands
cover the following cases:
- tt import imports data into a replica set through its master instance using the box.space API.
- tt crud import imports data into a sharded cluster through a router using the CRUD module.
- tt tdg2 import imports data into a Tarantool Data Grid 2 cluster through its router using the repository.put function of the TDG2 Repository API.
tt [crud|tdg2] import takes the following arguments:
- URI: The URI of a router instance if crud or tdg2 is used. Otherwise, it should specify the URI of a storage.
- FILE: The name of a file containing data to be imported.
- SPACE: The name of a space to which data is imported.
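For example, assuming a single instance listening on localhost:3301 and the customers.csv file and customers space used throughout this page, a minimal tt import call might look as follows (the --header and --match options are described below):
$ tt import localhost:3301 customers.csv:customers \
--header \
--match=header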
Note
Write access to the space and execute access to universe are required to import data.
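For example, assuming the importing user is named myuser and the target space is customers (both names are placeholders), an administrator could grant the required privileges from the Tarantool console roughly as follows:
tarantool> box.schema.user.grant('myuser', 'write', 'space', 'customers')
tarantool> box.schema.user.grant('myuser', 'execute', 'universe')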
tt [crud|tdg2] import accepts input data in the following formats:
- tt import and tt crud import: CSV
- tt tdg2 import: JSON lines
Importing isn’t supported for the interval field type.
Suppose that you have the customers.csv file with a header containing field names in the first row:
id,firstname,lastname,age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...
If the target customers space has fields with the same names, you can import data using the --header and --match options specified as follows:
$ tt crud import localhost:3301 customers.csv:customers \
--header \
--match=header
In this case, fields in the input file and the target space are matched automatically.
You can also match fields manually if field names in the input file and the target space differ.
Note that if you’re importing data into a cluster, you don’t need to specify the bucket_id field.
The CRUD module generates bucket_id values automatically.
The --match option enables you to match field names in the input file and the target space manually.
Suppose that you have the following customers.csv file with four fields:
customer_id,name,surname,customer_age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...
If the target customers space has the id, firstname, lastname, and age fields,
you can configure mapping as follows:
$ tt crud import localhost:3301 customers.csv:customers \
--header \
--match "id=customer_id;firstname=name;lastname=surname;age=customer_age"
Similarly, you can configure mapping using numeric field positions in the input file:
$ tt crud import localhost:3301 customers.csv:customers \
--header \
--match "id=1;firstname=2;lastname=3;age=4"
The following rules apply if some fields are missing in the input data or the target space:
- If a space has fields that are not specified in the input data, tt [crud] import tries to insert null values, as illustrated below.
- If the input data contains fields that are missing in the target space, these fields are ignored.
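For illustration, suppose the target customers space also had an extra comment field that is absent from the input file (a hypothetical field, not used elsewhere on this page). A row like …
1,Andrew,Fuller,38
… would then be imported as the tuple [1, 'Andrew', 'Fuller', 38, null].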
When importing data into a CRUD-enabled sharded cluster, tt crud import ignores
the bucket_id field values from the input file. This allows CRUD to automatically
manage data distribution in the cluster by generating new bucket_id for tuples
during import.
If you need to preserve the original bucket_id values, use the --keep-bucket-id option:
$ tt crud import localhost:3301 customers.csv:customers \
--keep-bucket-id \
--header \
--match=header
The --on-exist option enables you to control data import when a duplicate primary key error occurs.
In the example below, values already existing in the space are replaced with new ones:
$ tt crud import localhost:3301 customers.csv:customers \
--on-exist replace
To skip rows whose data cannot be parsed correctly, use the --on-error option as follows:
$ tt crud import localhost:3301 customers.csv:customers \
--on-error skip
Note
In the TDG2 data model, a type represents a Tarantool space, and an object of a type represents a tuple in the type’s underlying space.
The command below imports objects of the customers type into a TDG2 cluster.
The objects are described in the customers.jsonl file.
$ tt tdg2 import localhost:3301 customers.jsonl:customers
If token authentication is enabled in TDG2, pass the application token in the --token option:
$ tt tdg2 import localhost:3301 customers.jsonl:customers \
--token=2fc136cf-8cae-4655-a431-7c318967263d
The input file can look like this:
{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}
Note
Since JSON describes objects in maps with string keys, there is no way to import a field value that is a map with a non-string key.
In case of an error during TDG2 import, tt tdg2 import rolls back the changes made
within the current batch on the storage where the error has happened (per-storage rollback)
and reports an error. On other storages, objects from the same batch can be successfully
imported. So, the rollback process of tt tdg2 import
is the same as the one of tt crud import with the --rollback-on-error option.
Since object batches can be imported partially (per-storage rollback), it can be hard to match errors
to the objects that caused them. To minimize this effect, the default batch size (--batch-size)
for tt tdg2 import is 1. This makes debugging straightforward: you always know which object caused the error.
On the other hand, it decreases performance compared to importing in larger batches.
If you increase the batch size, tt informs you about the possible issues and
asks for an explicit confirmation to proceed.
To automatically confirm a batch import operation, add the --force option:
$ tt tdg2 import localhost:3301 customers.jsonl:customers \
--batch-size=100 \
--force
When connecting to a cluster with authentication enabled, specify access credentials
in the --username and --password options:
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password p4$$w0rD
To connect to instances that use SSL encryption,
provide the SSL certificate and SSL key files in the --sslcertfile and --sslkeyfile options.
If necessary, add other SSL parameters in the --ssl* options.
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password p4$$w0rD \
--auth pap-sha256 --sslcertfile certs/server.crt \
--sslkeyfile certs/server.key
For connections that use SSL but don’t require additional parameters, add the --use-ssl
option:
$ tt crud import localhost:3301 customers.csv:customers \
--header --match=header \
--username myuser --password p4$$w0rD \
--use-ssl
--auth STRING
Applicable to: tt crud import, tt tdg2 import
Authentication type: chap-sha1, pap-sha256, or auto.
--batch-size INT
Applicable to: tt crud import, tt tdg2 import
The number of tuples to transfer per request. The default is:
- 100 for tt crud import.
- 1 for tt tdg2 import. See Importing into Tarantool Data Grid 2 for details.
--dec-sep STRING
Applicable to: tt import, tt crud import
The string of symbols that defines decimal separators for numeric data (the default is .,).
Note
Symbols specified in this option cannot intersect with --th-sep.
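By analogy with the --th-sep example below, the default value means that both 1.5 and 1,5 should be imported as 1.5. To accept only a comma as the decimal separator, a call could look like this:
$ tt crud import localhost:3301 customers.csv:customers \
--dec-sep=","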
--delimiter STRING
Applicable to: tt import, tt crud import
A symbol that defines a field value delimiter. For CSV, the default delimiter is a comma (,). To use a tab character as a delimiter, set this value to tab:
$ tt crud import localhost:3301 customers.csv:customers \
--delimiter tab
Note
A delimiter cannot be \r, \n, or the Unicode replacement character (U+FFFD).
--error STRING
The name of a file containing rows that are not imported (the default is error).
See also: Handling parsing errors.
--force
Applicable to: tt tdg2 import
Automatically confirm importing into TDG2 with --batch-size greater than one.
--format STRING
A format of input data. Supported formats: csv.
--header
Applicable to: tt import, tt crud import
Process the first line as a header containing field names. In this case, field values start from the second line.
See also: Matching of input and space fields.
--keep-bucket-id
Applicable to: tt crud import
Preserve original values of the bucket_id field.
See also: Importing bucket_id into sharded clusters.
--log STRING
The name of a log file containing information about import errors (the default is import). If the log file already exists, new data is written to this file.
--match STRING
Applicable to: tt import, tt crud import
Configure matching between field names in the input file and the target space.
See also: Matching of input and space fields.
--null STRING
Applicable to: tt import, tt crud import
A value to be interpreted as null when importing data. By default, an empty value is interpreted as null. For example, a tuple imported from the following row …
1,477,Andrew,,38
… should look as follows: [1, 477, 'Andrew', null, 38].
--on-error STRING
An action performed if a row to be imported cannot be parsed correctly. Possible values:
- stop: stop importing data.
- skip: skip rows whose data cannot be parsed correctly.
Duplicate primary key errors are handled using the --on-exist option.
See also: Handling parsing errors.
--on-exist STRING
An action performed if a duplicate primary key error occurs. Possible values:
- stop: stop importing data.
- skip: skip existing values when importing.
- replace: replace existing values when importing.
Other errors are handled using the --on-error option.
See also: Handling duplicate primary key errors.
--password STRING
A password used to connect to the instance.
--progress STRING
The name of a progress file that stores the following information:
- The positions of lines that were not imported at the last launch.
- The last position that was processed at the last launch.
If a file with the specified name exists, it is taken into account when importing data: tt import tries to insert the lines that were not imported and then continues importing from the last position. At each launch, the content of the progress file with the specified name is overwritten. If a file with the specified name does not exist, a progress file is created with the results of this run.
Note
If the option is not set, this mechanism is not used.
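For example, to track progress in a file named import.progress (the file name here is just an illustration) so that a subsequent run can retry failed lines and continue from the last processed position:
$ tt import localhost:3301 customers.csv:customers \
--header --match=header \
--progress=import.progress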
--quote STRING
Applicable to: tt import, tt crud import
A symbol that defines a quote. For CSV, double quotes are used by default ("). Doubling the quote symbol within input data escapes it, producing a literal quote character.
--rollback-on-error
Applicable to: tt crud import
Specify whether a failed operation on a storage leads to rolling back the batch import on this storage.
Note
tt tdg2 import always works as if --rollback-on-error is true.
--sslcafile STRING
Applicable to: tt crud import, tt tdg2 import
The path to a trusted certificate authorities (CA) file for encrypted connections.
See also: Encrypted connection.
--sslcertfile STRING
Applicable to: tt crud import, tt tdg2 import
The path to an SSL certificate file for encrypted connections.
See also: Encrypted connection.
--sslciphersfile STRING
Applicable to: tt crud import, tt tdg2 import
The list of SSL cipher suites used for encrypted connections, separated by colons (:).
See also: Encrypted connection.
--sslkeyfile STRING
Applicable to: tt crud import, tt tdg2 import
The path to a private SSL key file for encrypted connections.
See also: Encrypted connection.
--sslpassword STRING
Applicable to: tt crud import, tt tdg2 import
The password for the SSL key file for encrypted connections.
See also: Encrypted connection.
--sslpasswordfile STRING
Applicable to: tt crud import, tt tdg2 import
A file with a list of passwords to the SSL key file for encrypted connections.
See also: Authentication.
--success STRING
The name of a file with rows that were imported (the default is success). Overwrites the file if it already exists.
--th-sep STRING
Applicable to: tt import, tt crud import
The string of symbols that defines thousands separators for numeric data. The default value includes a space and a backtick (`). This means that 1 000 000 and 1`000`000 are both imported as 1000000.
Note
Symbols specified in this option cannot intersect with --dec-sep.
--token STRING
Applicable to: tt tdg2 import
An application token for connecting to TDG2.
--use-ssl
Use SSL without providing any additional SSL parameters.
See also: Encrypted connection.
--username STRING
A username for connecting to the instance.