SQL tutorial
This tutorial is a demonstration of the support for SQL in Tarantool. It includes the functionality that you’d encounter in an “SQL-101” course.
In this tutorial, the requests are done in the Tarantool interactive console.
Start Tarantool and initialize it with default settings
by calling box.cfg{}
:
tarantool> box.cfg{}
Now you can start working with the database.
A feature of the client console program is that you can switch languages and specify the end-of-statement delimiter.
Run the following commands to set the console input language to SQL and use semicolon as a delimiter:
tarantool> \set language sql
tarantool> \set delimiter ;
To get started, enter simple SQL statements:
CREATE TABLE table1 (column1 INTEGER PRIMARY KEY, column2 VARCHAR(100));
INSERT INTO table1 VALUES (1, 'A');
UPDATE table1 SET column2 = 'B';
SELECT * FROM table1 WHERE column1 = 1;
The result of the SELECT
statement looks like this:
tarantool> SELECT * FROM table1 WHERE column1 = 1;
---
- metadata:
- name: COLUMN1
type: integer
- name: COLUMN2
type: string
rows:
- [1, 'B']
...
The result includes:
- Metadata: the names and data types of each column
- Result rows
For conciseness, metadata is skipped in query results in this tutorial. Only the result rows are shown.
Here is CREATE TABLE
with more details:
- There are multiple columns, with different data types.
- There is a
PRIMARY KEY
(unique and not-null) for two of the columns.
CREATE TABLE table2 (column1 INTEGER,
column2 VARCHAR(100),
column3 SCALAR,
column4 DOUBLE,
PRIMARY KEY (column1, column2));
The result is: row_count: 1
.
Put four rows in the table:
- The INTEGER and DOUBLE columns get numbers
- The VARCHAR and SCALAR columns get strings (the SCALAR strings are expressed as hexadecimals)
INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5);
INSERT INTO table2 VALUES (1, 'CD', X'2020', 1E4);
INSERT INTO table2 VALUES (2, 'AB', X'2020', 12.34567);
INSERT INTO table2 VALUES (-1000, '', X'', 0.0);
Then try to put another row:
INSERT INTO table2 VALUES (1, 'AB', X'A5', -5.5);
This INSERT
fails because of a primary-key violation: the row with the primary
key 1, 'AB'
already exists.
In Tarantool, SELECT
SQL queries that perform sequential scans (that is, go
through all the table rows instead of using indexes) are prohibited by default.
For example, this query leads to the error Scanning is not allowed for 'table2'
:
SELECT * FROM table2;
To execute a scan query, put the SEQSCAN
keyword before the table name:
SELECT * FROM SEQSCAN table2;
Try to execute these queries that use indexed column1
in filters:
SELECT * FROM table2 WHERE column1 = 1;
SELECT * FROM table2 WHERE column1 + 1 = 2;
The result is:
The first query returns rows:
- [1, 'AB', 'AB', 10.5] - [1, 'CD', ' ', 10005]
The second query fails with the error
Scanning is not allowed for 'TABLE2'
. Althoughcolumn1
is indexed, the expressioncolumn1 + 1
is not calculated from the index, which makes thisSELECT
a scan query.
Note
You can allow SQL scan queries without SEQSCAN
for the current session
by running the command:
SET SESSION "sql_seq_scan" = true;
Learn more about using SEQSCAN
in the SQL FROM clause description.
Retrieve the 4 rows in the table, in descending order by column2
, then
(where the column2
values are the same) in ascending order by column4
.
*
is short for “all columns”.
SELECT * FROM SEQSCAN table2 ORDER BY column2 DESC, column4 ASC;
Important
Tarantool has its own
The result is:
- - [1, 'CD', ' ', 10000]
- [1, 'AB', 'AB', 5.5]
- [2, 'AB', ' ', 12.34567]
- [-1000, '', '', 0]
Retrieve some of what you inserted:
- The first statement uses
the
LIKE
comparison operator which is asking for “first character must be ‘A’, the next characters can be anything.” - The second statement uses logical operators and parentheses, so the ANDed expressions must be true, or the ORed expression must be true. Notice the columns don’t have to be indexed.
SELECT column1, column2, column1 * column4 FROM SEQSCAN table2 WHERE column2
LIKE 'A%';
SELECT column1, column2, column3, column4 FROM SEQSCAN table2
WHERE (column1 < 2 AND column4 < 10)
OR column3 = X'2020';
The results are:
- - [1, 'AB', 5.5]
- [2, 'AB', 24.69134]
and
- - [-1000, '', '', 0]
- [1, 'AB', 'AB', 5.5]
- [1, 'CD', ' ', 10000]
- [2, 'AB', ' ', 12.34567]
Retrieve with grouping.
The rows which have the same values for column2
are grouped and are aggregated
– summed, counted, averaged – for column4
.
SELECT column2, SUM(column4), COUNT(column4), AVG(column4)
FROM SEQSCAN table2
GROUP BY column2;
The result is:
- - ['', 0, 1, 0]
- ['AB', 17.84567, 2, 8.922835]
- ['CD', 10000, 1, 10000]
Insert rows that contain NULL
values.
NULL
is not the same as Lua nil
; it commonly is used in SQL for unknown
or not-applicable.
INSERT INTO table2 VALUES (1, NULL, X'4142', 5.5);
INSERT INTO table2 VALUES (0, '!!@', NULL, NULL);
INSERT INTO table2 VALUES (0, '!!!', X'00', NULL);
The results are:
- The first
INSERT
fails becauseNULL
is not permitted for a column that was defined with aPRIMARY KEY
clause. - The other
INSERT
statements succeed.
Create a new index on column4
.
There already is an index for the primary key. Indexes are useful for making queries
faster. In this case, the index also acts as a constraint, because it prevents
two rows from having the same values in column4
. However, it is not an error
that column4
has multiple occurrences of NULLs.
CREATE UNIQUE INDEX i ON table2 (column4);
The result is: rowcount: 1
.
Create a table table3
, which contains a subset of the table2
columns
and a subset of the table2
rows.
You can do this by combining INSERT
with SELECT
. Then select everything
from the result table.
CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY
(column2));
INSERT INTO table3 SELECT column1, column2 FROM SEQSCAN table2 WHERE column1 <> 2;
SELECT * FROM SEQSCAN table3;
The result is:
- - [-1000, '']
- [0, '!!!']
- [0, '!!@']
- [1, 'AB']
- [1, 'CD']
A subquery is a query within a query.
Find all the rows in table2
whose (column1, column2)
values are not
present in table3
.
SELECT * FROM SEQSCAN table2 WHERE (column1, column2) NOT IN (SELECT column1,
column2 FROM SEQSCAN table3);
The result is the single row that was excluded when inserting the rows with
the INSERT ... SELECT
statement:
- - [2, 'AB', ' ', 12.34567]
A join is a combination of two tables. There is more than one way to do them in Tarantool: “Cartesian joins”, “left outer joins”, and so on.
This example shows the most typical case, where column values from one table match column values from another table.
SELECT * FROM SEQSCAN table2, table3
WHERE table2.column1 = table3.column1 AND table2.column2 = table3.column2
ORDER BY table2.column4;
The result is:
- - [0, '!!!', "\0", null, 0, '!!!']
- [0, '!!@', null, null, 0, '!!@']
- [-1000, '', '', 0, -1000, '']
- [1, 'AB', 'AB', 5.5, 1, 'AB']
- [1, 'CD', ' ', 10000, 1, 'CD']
Create a table which includes a constraint that there must not be any rows
containing 13
in column2
. Then try to insert such a row.
CREATE TABLE table4 (column1 INTEGER PRIMARY KEY, column2 INTEGER, CHECK
(column2 <> 13));
INSERT INTO table4 VALUES (12, 13);
Result: the insert fails, as it should, with the message
Check constraint 'ck_unnamed_TABLE4_1' failed for tuple
.
Create a table which includes a constraint that there must not be any rows containing
values that do not appear in table2
.
CREATE TABLE table5 (column1 INTEGER, column2 VARCHAR(100),
PRIMARY KEY (column1),
FOREIGN KEY (column1, column2) REFERENCES table2 (column1, column2));
INSERT INTO table5 VALUES (2,'AB');
INSERT INTO table5 VALUES (3,'AB');
Result:
- The first
INSERT
statement succeeds becausetable3
contains a row with[2, 'AB', ' ', 12.34567]
. - The second
INSERT
statement, correctly, fails with the messageForeign key constraint ''fk_unnamed_TABLE5_1'' failed: foreign tuple was not found
.
Due to earlier INSERT
statements, these values are in column4
of table2
:
{0, NULL, NULL, 5.5, 10000, 12.34567}
. Add 5
to each of these values except 0
.
Adding 5
to NULL
results in NULL, as SQL arithmetic requires.
Use SELECT
to see what happened to column4
.
UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0;
SELECT column4 FROM SEQSCAN table2 ORDER BY column4;
The result is: {NULL, NULL, 0, 10.5, 17.34567, 10005}
.
Due to earlier INSERT
statements, there are 6 rows in table2
:
- - [-1000, '', '', 0]
- [0, '!!!', "\0", null]
- [0, '!!@', null, null]
- [1, 'AB', 'AB', 10.5]
- [1, 'CD', ' ', 10005]
- [2, 'AB', ' ', 17.34567]
Try to delete the last and first of these rows:
DELETE FROM table2 WHERE column1 = 2;
DELETE FROM table2 WHERE column1 = -1000;
SELECT COUNT(column1) FROM SEQSCAN table2;
The result is:
- The first
DELETE
statement causes an error because there’s a foreign-key constraint. - The second
DELETE
statement succeeds. - The
SELECT
statement shows that there are 5 rows remaining.
Create another constraint that there must not be any rows in table1
containing values that do not appear in table5
. This was impossible
during the table1
creation because at that time table5
did not exist.
You can add constraints to existing tables with the ALTER TABLE
statement.
ALTER TABLE table1 ADD CONSTRAINT c
FOREIGN KEY (column1) REFERENCES table5 (column1);
DELETE FROM table1;
ALTER TABLE table1 ADD CONSTRAINT c
FOREIGN KEY (column1) REFERENCES table5 (column1);
Result: the ALTER TABLE
statement fails the first time because there is a row
in table1
, and ADD CONSTRAINT
requires that the table be empty.
After the row is deleted, the ALTER TABLE
statement completes successfully.
Now there is a chain of references, from table1
to table5
and from table5
to table2
.
The idea of a trigger is: if a change (INSERT
or UPDATE
or DELETE
) happens,
then a further action – perhaps another INSERT
or UPDATE
or DELETE
– will happen.
Set up the following trigger: when a update to table3
is done, do an update
to table2
. Specify this as FOR EACH ROW
, so that the trigger activates 5
times (since there are 5 rows in table3
).
SELECT column4 FROM table2 WHERE column1 = 2;
CREATE TRIGGER tr AFTER UPDATE ON table3 FOR EACH ROW
BEGIN UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2; END;
UPDATE table3 SET column2 = column2;
SELECT column4 FROM table2 WHERE column1 = 2;
Result:
- The first
SELECT
shows that the original value ofcolumn4
intable2
wherecolumn1 = 2
was: 17.34567. - The second
SELECT
returns:
- - [22.34567]
You can manipulate string data (usually defined with CHAR
or VARCHAR
data types)
in many ways. For example:
- concatenate strings with the
||
operator - extract substrings with the
SUBSTR
function
SELECT column2, column2 || column2, SUBSTR(column2, 2, 1) FROM SEQSCAN table2;
The result is:
- - ['!!!', '!!!!!!', '!']
- ['!!@', '!!@!!@', '!']
- ['AB', 'ABAB', 'B']
- ['CD', 'CDCD', 'D']
- ['AB', 'ABAB', 'B']
You can also manipulate number data (usually defined with INTEGER
or DOUBLE
data types) in many ways. For example:
- shift left with the
<<
operator - get modulo with the
%
operator
SELECT column1, column1 << 1, column1 << 2, column1 % 2 FROM SEQSCAN table2;
The result is:
- - [0, 0, 0, 0]
- [0, 0, 0, 0]
- [1, 2, 4, 1]
- [1, 2, 4, 1]
- [2, 4, 8, 0]
Tarantool can handle:
- integers anywhere in the 4-byte integer range
- approximate-numerics anywhere in the 8-byte IEEE floating point range
- any Unicode characters, with UTF-8 encoding and a choice of collations
Insert such values in a new table and see what happens when you select them with arithmetic on a number column and ordering by a string column.
CREATE TABLE t6 (column1 INTEGER, column2 VARCHAR(10), column4 DOUBLE,
PRIMARY KEY (column1));
INSERT INTO t6 VALUES (-1234567890, 'АБВГД', 123456.123456);
INSERT INTO t6 VALUES (+1234567890, 'GD', 1e30);
INSERT INTO t6 VALUES (10, 'FADEW?', 0.000001);
INSERT INTO t6 VALUES (5, 'ABCDEFG', NULL);
SELECT column1 + 1, column2, column4 * 2 FROM SEQSCAN t6 ORDER BY column2;
The result is:
- - [6, 'ABCDEFG', null]
- [11, 'FADEW?', 2e-06]
- [1234567891, 'GD', 2e+30]
- [-1234567889, 'АБВГД', 246912.246912]
A view (or viewed table), is virtual, meaning that its rows aren’t physically in the database, their values are calculated from other tables.
Create a view v3
based on table3
and select from it:
CREATE VIEW v3 AS SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0;
SELECT * FROM v3;
The result is:
- - ['АБ', 123456.123456]
- ['FA', 1e-06]
- ['GD', 1e+30]
By putting WITH
+ SELECT
in front of a SELECT
, you can make a
temporary view that lasts for the duration of the statement.
Create such a view and select from it:
WITH cte AS (
SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0)
SELECT * FROM cte;
The result is the same as the CREATE VIEW
result:
- - ['АБ', 123456.123456]
- ['FA', 1e-06]
- ['GD', 1e+30]
Tarantool can handle statements like SELECT 55;
(select without FROM
)
like some other popular DBMSs. But it also handles the more standard statement
VALUES (expression [, expression ...]);
.
SELECT 55 * 55, 'The rain in Spain';
VALUES (55 * 55, 'The rain in Spain');
The result of both these statements is:
- - [3025, 'The rain in Spain']
To find out the internal structure of the Tarantool database with SQL,
select from the Tarantool system tables _space
, _index
, and _trigger
:
SELECT * FROM SEQSCAN "_space";
SELECT * FROM SEQSCAN "_index";
SELECT * FROM SEQSCAN "_trigger";
Actually, these statements select from NoSQL “system spaces”.
Select from _space
by a table name:
SELECT "id", "name", "owner", "engine" FROM "_space" WHERE "name"='TABLE3';
The result is:
- - [517, 'TABLE3', 1, 'memtx']
You can execute SQL statements directly from the Lua code without switching to the SQL input.
Change the settings so that the console accepts statements written in Lua instead of statements written in SQL:
tarantool> \set language lua
You can invoke SQL statements using the Lua function box.execute(string)
.
tarantool> box.execute([[SELECT * FROM SEQSCAN table3;]]);
The result is:
- - [-1000, '']
- [0, '!!!']
- [0, '!!@']
- [1, 'AB']
- [1, 'CD']
...
To see how the SQL in Tarantool scales, create a bigger table.
The following Lua code generates one million rows with random data and inserts them into a table. Copy this code into the Tarantool console and wait a bit:
box.execute("CREATE TABLE tester (s1 INT PRIMARY KEY, s2 VARCHAR(10))");
function string_function()
local random_number
local random_string
random_string = ""
for x = 1,10,1 do
random_number = math.random(65, 90)
random_string = random_string .. string.char(random_number)
end
return random_string
end;
function main_function()
local string_value, t, sql_statement
for i = 1,1000000,1 do
string_value = string_function()
sql_statement = "INSERT INTO tester VALUES (" .. i .. ",'" .. string_value .. "')"
box.execute(sql_statement)
end
end;
start_time = os.clock();
main_function();
end_time = os.clock();
print('insert done in ' .. end_time - start_time .. ' seconds');
The result is: you now have a table with a million rows, with a message saying
“insert done in 88.570578 seconds
”.
Check how SELECT
works on the million-row table:
- the first query goes by an index because
s1
is the primary key - the second query does not go by an index
box.execute([[SELECT * FROM tester WHERE s1 = 73446;]]);
box.execute([[SELECT * FROM SEQSCAN tester WHERE s2 LIKE 'QFML%';]]);
The result is:
- the first statement completes instantaneously
- the second statement completed noticeably slower
To cleanup all the objects created in this tutorial, switch to the SQL input
language again. Then run the DROP
statements for all created tables, views,
and triggers.
These statements must be entered separately.
tarantool> \set language sql
tarantool> DROP TABLE tester;
tarantool> DROP TABLE table1;
tarantool> DROP VIEW v3;
tarantool> DROP TRIGGER tr;
tarantool> DROP TABLE table5;
tarantool> DROP TABLE table4;
tarantool> DROP TABLE table3;
tarantool> DROP TABLE table2;
tarantool> DROP TABLE t6;
tarantool> \set language lua
tarantool> os.exit();