Importing data
Enterprise Edition
This command is supported by the Enterprise Edition only.
$ tt [crud] import URI FILE SPACE [IMPORT_OPTION ...]
# or
$ tt [crud] import URI - SPACE < FILE [IMPORT_OPTION ...]
tt [crud] import
imports data from a file to a space.
The crud
command is optional and can be used to import data to a cluster by using the CRUD module. Without crud
, data is imported using the box.space API.
This command takes the following arguments:
URI
: The URI of a router instance ifcrud
is used. Otherwise, it should specify the URI of a storage.FILE
: The name of a file containing data to be imported.SPACE
: The name of a space to which data is imported.
Note
Write access to the space and execute
access to universe
are required to import data.
Importing isn’t supported for the interval field type.
Suppose that you have the customers.csv
file with a header containing field names in the first row:
id,firstname,lastname,age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...
If the target customers
space has fields with the same names, you can import data using the --header
and --match
options specified as follows:
$ tt crud import localhost:3301 customers.csv customers \
--header \
--match=header
In this case, fields in the input file and the target space are matched automatically.
You can also match fields manually if field names in the input file and the target space differ.
Note that if you’re importing data into a cluster, you don’t need to specify the bucket_id
field.
The CRUD module generates bucket_id
values automatically.
The --match
option enables importing data by matching field names in the input file and the target space manually.
Suppose that you have the following customers.csv
file with four fields:
customer_id,name,surname,customer_age
1,Andrew,Fuller,38
2,Michael,Suyama,46
3,Robert,King,33
# ...
If the target customers
space has the id
, firstname
, lastname
, and age
fields,
you can configure mapping as follows:
$ tt crud import localhost:3301 customers.csv customers \
--header \
--match "id=customer_id;firstname=name;lastname=surname;age=customer_age"
Similarly, you can configure mapping using numeric field positions in the input file:
$ tt crud import localhost:3301 customers.csv customers \
--header \
--match "id=1;firstname=2;lastname=3;age=4"
Below are the rules if some fields are missing in input data or space:
- If a space has fields that are not specified in input data,
tt [crud] import
tries to insertnull
values. - If input data contains fields missing in a target space, these fields are ignored.
The --on-exist
option enables you to control data import when a duplicate primary key error occurs.
In the example below, values already existing in the space are replaced with new ones:
$ tt crud import localhost:3301 customers.csv customers \
--on-exist replace
To skip rows whose data cannot be parsed correctly, use the --on-error
option as follows:
$ tt crud import localhost:3301 customers.csv customers \
--on-error skip
-
--dec-sep
STRING
¶ The string of symbols that defines decimal separators for numeric data (the default is
.,
).Note
Symbols specified in this option cannot intersect with
--th-sep
.
-
--delimiter
STRING
¶ A symbol that defines a field value delimiter. For CSV, the default delimiter is a comma (
,
). To use a tab character as a delimiter, set this value astab
:$ tt crud import localhost:3301 customers.csv customers \ --delimiter tab
Note
A delimiter cannot be
\r
,\n
, or the Unicode replacement character (U+FFFD
).
-
--error
STRING
¶ The name of a file containing rows that are not imported (the default is
error
).See also: Handling parsing errors.
-
--format
STRING
¶ A format of input data.
Supported formats:
csv
.
-
--header
¶
Process the first line as a header containing field names. In this case, field values start from the second line.
See also: Matching of input and space fields.
-
--log
STRING
¶ The name of a log file containing information about import errors (the default is
import
). If the log file already exists, new data is written to this file.
-
--match
STRING
¶ Configure matching between field names in the input file and the target space.
See also: Matching of input and space fields.
-
--null
STRING
¶ A value to be interpreted as
null
when importing data. By default, an empty value is interpreted asnull
. For example, a tuple imported from the following row …1,477,Andrew,,38
… should look as follows:
[1, 477, 'Andrew', null, 38]
.
-
--on-error
STRING
¶ An action performed if a row to be imported cannot be parsed correctly. Possible values:
stop
: stop importing data.skip
: skip rows whose data cannot be parsed correctly.
Duplicate primary key errors are handled using the
--on-exist
option.See also: Handling parsing errors.
-
--on-exist
STRING
¶ An action performed if a duplicate primary key error occurs. Possible values:
stop
: stop importing data.skip
: skip existing values when importing.replace
: replace existing values when importing.
Other errors are handled using the
--on-error
option.See also: Handling duplicate primary key errors.
-
--password
STRING
¶ A password used to connect to the instance.
-
--progress
STRING
¶ The name of a progress file that stores the following information:
- The positions of lines that were not imported at the last launch.
- The last position that was processed at the last launch.
If a file with the specified name exists, it is taken into account when importing data.
tt import
tries to insert lines that were not imported and then continues importing from the last position.At each launch, the content of a progress file with the specified name is overwritten. If the file with the specified name does not exist, a progress file is created with the results of this run.
Note
If the option is not set, then this mechanism is not used.
-
--quote
STRING
¶ A symbol that defines a quote. For CSV, double quotes are used by default (
"
). The double symbol of this option acts as the escaping symbol within input data.
-
-success
STRING
¶ The name of a file with rows that were imported (the default is
success
). Overwrites the file if it already exists.
-
--th-sep
STRING
¶ The string of symbols that define thousand separators for numeric data. The default value includes a space and a backtick
`
. This means that1 000 000
and1`000`000
are both imported as1000000
.Note
Symbols specified in this option cannot intersect with
--dec-sep
.
-
--username
STRING
¶ A username for connecting to the instance.
-
--rollback-on-error
¶
Applicable only when
crud
is used.Specify whether any operation failed on a router leads to rollback on a storage where the operation is failed.