SQL tutorial | Tarantool
How-to guides SQL guides SQL tutorial

SQL tutorial

This tutorial is a demonstration of the support for SQL in Tarantool. It includes the functionality that you’d encounter in an “SQL-101” course.

Before starting this tutorial:

  1. Install the tt CLI utility.

  2. Create a tt environment in the current directory using the tt init command.

  3. Inside the instances.enabled directory of the created tt environment, create the sql_tutorial directory.

  4. Inside instances.enabled/sql_tutorial, create the instances.yml and config.yaml files:

    • instances.yml specifies instances to run in the current environment:

      instance001:
      
    • config.yaml contains basic configuration:

      groups:
        group001:
          replicasets:
            replicaset001:
              instances:
                instance001:
                  iproto:
                    listen:
                    - uri: '127.0.0.1:3301'
      

Read more: Starting instances using the tt utility.

After configuration, start a Tarantool instance from the tt environment directory using the following command:

$ tt start sql_tutorial

After that, connect to the instance:

$ tt connect sql_tutorial:instance001

This command opens an interactive Tarantool console. Now you can start working with the database.

A feature of the client console program is that you can switch languages and specify the end-of-statement delimiter.

Run the following commands to set the console input language to SQL and use semicolon as a delimiter:

sql_tutorial:instance001> \set language sql
sql_tutorial:instance001> \set delimiter ;

To get started, enter these SQL statements:

CREATE TABLE table1 (column1 INTEGER PRIMARY KEY, column2 VARCHAR(100));
INSERT INTO table1 VALUES (1, 'A');
UPDATE table1 SET column2 = 'B';
SELECT * FROM table1 WHERE column1 = 1;

The result of the SELECT statement looks like this:

sql_tutorial:instance001> SELECT * FROM table1 WHERE column1 = 1;
---
- metadata:
  - name: COLUMN1
    type: integer
  - name: COLUMN2
    type: string
  rows:
  - [1, 'B']
...

The result includes:

  • metadata: the names and data types of each column
  • result rows

For conciseness, metadata is skipped in query results in this tutorial. Only the result rows are shown.

Here is CREATE TABLE with more details:

  • There are multiple columns, with different data types.
  • There is a PRIMARY KEY (unique and not-null) for two of the columns.

Create another table:

CREATE TABLE table2 (column1 INTEGER,
                     column2 VARCHAR(100),
                     column3 SCALAR,
                     column4 DOUBLE,
                     PRIMARY KEY (column1, column2));

The result is: row_count: 1.

Put four rows in the table (table2):

  • The INTEGER and DOUBLE columns get numbers
  • The VARCHAR and SCALAR columns get strings (the SCALAR strings are expressed as hexadecimals)
INSERT INTO table2 VALUES (1, 'AB', X'4142', 5.5);
INSERT INTO table2 VALUES (1, 'CD', X'2020', 1E4);
INSERT INTO table2 VALUES (2, 'AB', X'2020', 12.34567);
INSERT INTO table2 VALUES (-1000, '', X'', 0.0);

Then try to put another row:

INSERT INTO table2 VALUES (1, 'AB', X'A5', -5.5);

This INSERT fails because of a primary-key violation: the row with the primary key 1, 'AB' already exists.

Sequential scan is the scan through all the table rows instead of using indexes. In Tarantool, SELECT SQL queries that perform sequential scans are prohibited by default. For example, this query leads to the error Scanning is not allowed for 'table2':

SELECT * FROM table2;

To execute a scan query, put the SEQSCAN keyword before the table name:

SELECT * FROM SEQSCAN table2;

Try to execute these queries that use indexed column1 in filters:

SELECT * FROM table2 WHERE column1 = 1;
SELECT * FROM table2 WHERE column1 + 1 = 2;

The result is:

  • The first query returns rows:

    - [1, 'AB', 'AB', 10.5]
    - [1, 'CD', '  ', 10005]
    
  • The second query fails with the error Scanning is not allowed for 'TABLE2'. Although column1 is indexed, the expression column1 + 1 is not calculated from the index, which makes this SELECT a scan query.

Note

To enable SQL scan queries without SEQSCAN for the current session, run this command:

SET SESSION "sql_seq_scan" = true;

Learn more about using SEQSCAN in the SQL FROM clause description.

Retrieve the 4 rows in the table, in descending order by column2, then (where the column2 values are the same) in ascending order by column4.

* is short for “all columns”.

SELECT * FROM SEQSCAN table2 ORDER BY column2 DESC, column4 ASC;

The result is:

- - [1, 'CD', '  ', 10000]
  - [1, 'AB', 'AB', 5.5]
  - [2, 'AB', '  ', 12.34567]
  - [-1000, '', '', 0]

Retrieve some of what you inserted:

  • The first statement uses the LIKE comparison operator which is asking for “first character must be ‘A’, the next characters can be anything.”
  • The second statement uses logical operators and parentheses, so the AND expressions must be true, or the OR expression must be true. Notice the columns don’t have to be indexed.
SELECT column1, column2, column1 * column4 FROM SEQSCAN table2 WHERE column2
LIKE 'A%';
SELECT column1, column2, column3, column4 FROM SEQSCAN table2
    WHERE (column1 < 2 AND column4 < 10)
    OR column3 = X'2020';

The first result is:

- - [1, 'AB', 5.5]
  - [2, 'AB', 24.69134]

The second result is:

- - [-1000, '', '', 0]
  - [1, 'AB', 'AB', 5.5]
  - [1, 'CD', '  ', 10000]
  - [2, 'AB', '  ', 12.34567]

Retrieve with grouping.

The rows that have the same values for column2 are grouped and are aggregated – summed, counted, averaged – for column4.

SELECT column2, SUM(column4), COUNT(column4), AVG(column4)
FROM SEQSCAN table2
GROUP BY column2;

The result is:

- - ['', 0, 1, 0]
  - ['AB', 17.84567, 2, 8.922835]
  - ['CD', 10000, 1, 10000]

Insert rows that contain NULL values.

NULL is not the same as Lua nil; it commonly is used in SQL for unknown or not-applicable.

INSERT INTO table2 VALUES (1, NULL, X'4142', 5.5);
INSERT INTO table2 VALUES (0, '!!@', NULL, NULL);
INSERT INTO table2 VALUES (0, '!!!', X'00', NULL);

The results are:

  • The first INSERT fails because NULL is not permitted for a column that was defined with a PRIMARY KEY clause.
  • The other INSERT statements succeed.

Create a new index on column4.

There already is an index for the primary key. Indexes are useful for making queries faster. In this case, the index also acts as a constraint, because it prevents two rows from having the same values in column4. However, it is not an error that column4 has multiple occurrences of NULLs.

CREATE UNIQUE INDEX i ON table2 (column4);

The result is: rowcount: 1.

Create a table table3, which contains a subset of the table2 columns and a subset of the table2 rows.

You can do this by combining INSERT with SELECT. Then select everything from the result table.

CREATE TABLE table3 (column1 INTEGER, column2 VARCHAR(100), PRIMARY KEY
(column2));
INSERT INTO table3 SELECT column1, column2 FROM SEQSCAN table2 WHERE column1 <> 2;
SELECT * FROM SEQSCAN table3;

The result is:

- - [-1000, '']
  - [0, '!!!']
  - [0, '!!@']
  - [1, 'AB']
  - [1, 'CD']

A subquery is a query within a query.

Find all the rows in table2 whose (column1, column2) values are not present in table3.

SELECT * FROM SEQSCAN table2 WHERE (column1, column2) NOT IN (SELECT column1,
column2 FROM SEQSCAN table3);

The result is the single row that was excluded when inserting the rows with the INSERT ... SELECT statement:

- - [2, 'AB', '  ', 12.34567]

A join is a combination of two tables. There is more than one way to do them in Tarantool, for example, “Cartesian joins” or “left outer joins”.

This example shows the most typical case, where column values from one table match column values from another table.

SELECT * FROM SEQSCAN table2, table3
    WHERE table2.column1 = table3.column1 AND table2.column2 = table3.column2
    ORDER BY table2.column4;

The result is:

- - [0, '!!!', "\0", null, 0, '!!!']
  - [0, '!!@', null, null, 0, '!!@']
  - [-1000, '', '', 0, -1000, '']
  - [1, 'AB', 'AB', 5.5, 1, 'AB']
  - [1, 'CD', ' ', 10000, 1, 'CD']

Create a table that includes a constraint – there must not be any rows containing 13 in column2. After that, try to insert the following row:

CREATE TABLE table4 (column1 INTEGER PRIMARY KEY, column2 INTEGER, CHECK
(column2 <> 13));
INSERT INTO table4 VALUES (12, 13);

Result: the insert fails, as it should, with the message Check constraint 'ck_unnamed_TABLE4_1' failed for tuple.

Create a table that includes a constraint: there must not be any rows containing values that do not appear in table2.

CREATE TABLE table5 (column1 INTEGER, column2 VARCHAR(100),
    PRIMARY KEY (column1),
    FOREIGN KEY (column1, column2) REFERENCES table2 (column1, column2));
INSERT INTO table5 VALUES (2,'AB');
INSERT INTO table5 VALUES (3,'AB');

Result:

  • The first INSERT statement succeeds because table3 contains a row with [2, 'AB', ' ', 12.34567].
  • The second INSERT statement, correctly, fails with the message Foreign key constraint ''fk_unnamed_TABLE5_1'' failed: foreign tuple was not found.

Due to earlier INSERT statements, these values are in column4 of table2: {0, NULL, NULL, 5.5, 10000, 12.34567}. Add 5 to each of these values except 0. Adding 5 to NULL results in NULL, as SQL arithmetic requires. Use SELECT to see what happened to column4.

UPDATE table2 SET column4 = column4 + 5 WHERE column4 <> 0;
SELECT column4 FROM SEQSCAN table2 ORDER BY column4;

The result is: {NULL, NULL, 0, 10.5, 17.34567, 10005}.

Due to earlier INSERT statements, there are 6 rows in table2:

- - [-1000, '', '', 0]
  - [0, '!!!', "\0", null]
  - [0, '!!@', null, null]
  - [1, 'AB', 'AB', 10.5]
  - [1, 'CD', '  ', 10005]
  - [2, 'AB', '  ', 17.34567]

Try to delete the last and first of these rows:

DELETE FROM table2 WHERE column1 = 2;
DELETE FROM table2 WHERE column1 = -1000;
SELECT COUNT(column1) FROM SEQSCAN table2;

The result is:

  • The first DELETE statement causes an error because there’s a foreign-key constraint.
  • The second DELETE statement succeeds.
  • The SELECT statement shows that there are 5 rows remaining.

Create another constraint that there must not be any rows in table1 containing values that do not appear in table5. This was impossible during the table1 creation because at that time table5 did not exist. You can add constraints to existing tables with the ALTER TABLE statement.

ALTER TABLE table1 ADD CONSTRAINT c
    FOREIGN KEY (column1) REFERENCES table5 (column1);
DELETE FROM table1;
ALTER TABLE table1 ADD CONSTRAINT c
    FOREIGN KEY (column1) REFERENCES table5 (column1);

Result: the ALTER TABLE statement fails the first time because there is a row in table1, and ADD CONSTRAINT requires that the table be empty. After the row is deleted, the ALTER TABLE statement completes successfully. Now there is a chain of references, from table1 to table5 and from table5 to table2.

The idea of a trigger is: if a change (INSERT or UPDATE or DELETE) happens, then a further action – perhaps another INSERT or UPDATE or DELETE – will happen.

Set up the following trigger: when a update to table3 is done, do an update to table2. Specify this as FOR EACH ROW, so that the trigger activates 5 times (since there are 5 rows in table3).

SELECT column4 FROM table2 WHERE column1 = 2;
CREATE TRIGGER tr AFTER UPDATE ON table3 FOR EACH ROW
BEGIN UPDATE table2 SET column4 = column4 + 1 WHERE column1 = 2; END;
UPDATE table3 SET column2 = column2;
SELECT column4 FROM table2 WHERE column1 = 2;

Result:

  • The first SELECT shows that the original value of column4 in table2 where column1 = 2 was: 17.34567.
  • The second SELECT returns:
- - [22.34567]

You can manipulate string data (usually defined with CHAR or VARCHAR data types) in many ways. For example:

  • concatenate strings with the || operator
  • extract substrings with the SUBSTR function
SELECT column2, column2 || column2, SUBSTR(column2, 2, 1) FROM SEQSCAN table2;

The result is:

- - ['!!!', '!!!!!!', '!']
  - ['!!@', '!!@!!@', '!']
  - ['AB', 'ABAB', 'B']
  - ['CD', 'CDCD', 'D']
  - ['AB', 'ABAB', 'B']

You can also manipulate number data (usually defined with INTEGER or DOUBLE data types) in many ways. For example:

  • shift left with the << operator
  • get modulo with the % operator
SELECT column1, column1 << 1, column1 << 2, column1 % 2 FROM SEQSCAN table2;

The result is:

- - [0, 0, 0, 0]
  - [0, 0, 0, 0]
  - [1, 2, 4, 1]
  - [1, 2, 4, 1]
  - [2, 4, 8, 0]

Tarantool can handle:

  • integers anywhere in the 4-byte integer range
  • approximate-numerics anywhere in the 8-byte IEEE floating point range
  • any Unicode characters, with UTF-8 encoding and a choice of collations

Insert such values in a new table and see what happens when you select them with arithmetic on a number column and ordering by a string column.

CREATE TABLE t6 (column1 INTEGER, column2 VARCHAR(10), column4 DOUBLE,
PRIMARY KEY (column1));
INSERT INTO t6 VALUES (-1234567890, 'АБВГД', 123456.123456);
INSERT INTO t6 VALUES (+1234567890, 'GD', 1e30);
INSERT INTO t6 VALUES (10, 'FADEW?', 0.000001);
INSERT INTO t6 VALUES (5, 'ABCDEFG', NULL);
SELECT column1 + 1, column2, column4 * 2 FROM SEQSCAN t6 ORDER BY column2;

The result is:

- - [6, 'ABCDEFG', null]
  - [11, 'FADEW?', 2e-06]
  - [1234567891, 'GD', 2e+30]
  - [-1234567889, 'АБВГД', 246912.246912]

A view (or viewed table), is virtual, meaning that its rows aren’t physically in the database, their values are calculated from other tables.

Create a view v3 based on table3 and select from it:

CREATE VIEW v3 AS SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
WHERE column4 >= 0;
SELECT * FROM v3;

The result is:

- - ['АБ', 123456.123456]
  - ['FA', 1e-06]
  - ['GD', 1e+30]

By putting WITH + SELECT in front of a SELECT, you can make a temporary view that lasts for the duration of the statement.

Create such a view and select from it:

WITH cte AS (
             SELECT SUBSTR(column2,1,2), column4 FROM SEQSCAN t6
             WHERE column4 >= 0)
SELECT * FROM cte;

The result is the same as the CREATE VIEW result:

- - ['АБ', 123456.123456]
  - ['FA', 1e-06]
  - ['GD', 1e+30]

Tarantool can handle statements like SELECT 55; (select without FROM) like some other popular DBMSs. But it also handles the more standard statement VALUES (expression [, expression ...]);.

SELECT 55 * 55, 'The rain in Spain';
VALUES (55 * 55, 'The rain in Spain');

The result of both these statements is:

- - [3025, 'The rain in Spain']

To find out the internal structure of the Tarantool database with SQL, select from the Tarantool system tables _space, _index, and _trigger:

SELECT * FROM SEQSCAN "_space";
SELECT * FROM SEQSCAN "_index";
SELECT * FROM SEQSCAN "_trigger";

Actually, these statements select from NoSQL “system spaces”.

Select from _space by a table name:

SELECT "id", "name", "owner", "engine" FROM "_space" WHERE "name"='TABLE3';

The result is:

- - [517, 'TABLE3', 1, 'memtx']

You can execute SQL statements directly from the Lua code without switching to the SQL input.

Change the settings so that the console accepts statements written in Lua instead of statements written in SQL:

sql_tutorial:instance001> \set language lua

You can invoke SQL statements using the Lua function box.execute(string).

sql_tutorial:instance001> box.execute([[SELECT * FROM SEQSCAN table3;]]);

The result is:

- - [-1000, '']
  - [0, '!!!']
  - [0, '!!@']
  - [1, 'AB']
  - [1, 'CD']
...

To see how the SQL in Tarantool scales, create a bigger table.

The following Lua code generates one million rows with random data and inserts them into a table. Copy this code into the Tarantool console and wait a bit:

box.execute("CREATE TABLE tester (s1 INT PRIMARY KEY, s2 VARCHAR(10))");

function string_function()
    local random_number
    local random_string
    random_string = ""
    for x = 1, 10, 1 do
        random_number = math.random(65, 90)
        random_string = random_string .. string.char(random_number)
    end
    return random_string
end;

function main_function()
    local string_value, t, sql_statement
    for i = 1, 1000000, 1 do
        string_value = string_function()
        sql_statement = "INSERT INTO tester VALUES (" .. i .. ",'" .. string_value .. "')"
        box.execute(sql_statement)
    end
end;
start_time = os.clock();
main_function();
end_time = os.clock();
print('insert done in ' .. end_time - start_time .. ' seconds');

The result is: you now have a table with a million rows, with a message saying insert done in 88.570578 seconds.

Check how SELECT works on the million-row table:

  • the first query goes by an index because s1 is the primary key
  • the second query does not go by an index
box.execute([[SELECT * FROM tester WHERE s1 = 73446;]]);
box.execute([[SELECT * FROM SEQSCAN tester WHERE s2 LIKE 'QFML%';]]);

The result is:

  • the first statement completes instantaneously
  • the second statement completed noticeably slower

To cleanup all the objects created in this tutorial, switch to the SQL input language again. Then run the DROP statements for all created tables, views, and triggers.

These statements must be entered separately.

sql_tutorial:instance001> \set language sql
sql_tutorial:instance001> DROP TABLE tester;
sql_tutorial:instance001> DROP TABLE table1;
sql_tutorial:instance001> DROP VIEW v3;
sql_tutorial:instance001> DROP TRIGGER tr;
sql_tutorial:instance001> DROP TABLE table5;
sql_tutorial:instance001> DROP TABLE table4;
sql_tutorial:instance001> DROP TABLE table3;
sql_tutorial:instance001> DROP TABLE table2;
sql_tutorial:instance001> DROP TABLE t6;
sql_tutorial:instance001> \set language lua
sql_tutorial:instance001> os.exit();
Found what you were looking for?
Feedback