Fibers, yields, and cooperative multitasking
Creating a fiber is the Tarantool way of making application logic work in the background at all times. A fiber is a set of instructions that are executed with cooperative multitasking: the instructions contain yield signals, upon which control is passed to another fiber.
Fibers are similar to threads of execution in computing. The key difference is that threads use preemptive multitasking, while fibers use cooperative multitasking (see below). This gives fibers the following two advantages over threads:
- Better controllability. Threads often depend on the kernel’s thread scheduler to preempt a busy thread and resume another thread, so preemption may occur unpredictably. Fibers yield themselves to run another fiber while executing, so yields are controlled by application logic.
- Higher performance. Preempting a thread is more expensive because it involves the operating system kernel. Fibers are lighter and faster because yielding does not involve the kernel.
Yet fibers have some limitations compared with threads, the main one being the lack of a multi-core mode. All fibers in an application belong to a single thread, so they all use the same CPU core as the parent thread. However, this limitation is not serious for Tarantool applications, because a typical bottleneck for Tarantool is the HDD, not the CPU.
A fiber has all the features of a Lua coroutine, and all programming concepts that apply to Lua coroutines apply to fibers as well. However, Tarantool has made some enhancements for fibers and uses fibers internally. So, although the use of coroutines is possible and supported, the use of fibers is recommended.
Any live fiber can be in one of three states: running, suspended, or ready. After a fiber dies, the dead status is returned.
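Since fibers build on Lua coroutines, the standard coroutine library gives a feel for these states. The sketch below uses plain Lua only, not the Tarantool fiber module: a coroutine is suspended before it starts and after each coroutine.yield(), running while its body executes, and dead once its function returns. This is an analogy rather than the fiber API; for example, plain coroutines have no ready state.

```lua
-- Plain Lua coroutine states (an analogy for fiber states, not the fiber API).
local log = {}

local co = coroutine.create(function(a, b)
  -- while its body executes, the coroutine reports itself as running
  table.insert(log, "inside: " .. coroutine.status(coroutine.running()))
  local c = coroutine.yield(a + b)     -- explicit yield: hand a value back to the caller
  table.insert(log, "resumed with " .. tostring(c))
  return c * 2
end)

table.insert(log, "before: " .. coroutine.status(co))     -- not started yet
local ok1, sum = coroutine.resume(co, 2, 3)                -- runs until the yield
table.insert(log, "after yield: " .. coroutine.status(co))
local ok2, doubled = coroutine.resume(co, 10)              -- continues after the yield
table.insert(log, "after return: " .. coroutine.status(co))

for _, line in ipairs(log) do print(line) end
```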
To learn more about fibers, go to the fiber module documentation.
Yield is an action that occurs in a cooperative environment that transfers control of the thread from the current fiber to another fiber that is ready to execute.
By observing fibers from the outside, you can only see running (for the current fiber) and suspended (for any other fiber, which is waiting for an event from the event loop (ev) before it can execute).
After a yield has occurred, the next ready fiber is taken from the queue and executed. When there are no more ready fibers, execution is transferred to the event loop.
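The ready-queue behavior described above can be modeled with plain Lua coroutines and a first-in, first-out queue. This is a minimal sketch for illustration only, not Tarantool's scheduler: spawn and run are hypothetical helper names, and coroutine.yield() stands in for fiber.yield().

```lua
-- Toy cooperative scheduler: a FIFO queue of ready coroutines.
-- Hypothetical helpers spawn/run; this only models the idea.
local ready = {}   -- queue of ready tasks
local trace = {}   -- which task step ran, in order

local function spawn(name, steps)
  local co = coroutine.create(function()
    for i = 1, steps do
      table.insert(trace, name .. i)
      coroutine.yield()                  -- explicit yield: go to the back of the queue
    end
  end)
  table.insert(ready, co)
end

local function run()
  while #ready > 0 do
    local co = table.remove(ready, 1)    -- take the next ready task
    assert(coroutine.resume(co))
    if coroutine.status(co) ~= "dead" then
      table.insert(ready, co)            -- it yielded, so it is ready again
    end
  end
  -- queue empty: a real system would now hand control to the event loop
end

spawn("A", 3)
spawn("B", 2)
run()
print(table.concat(trace, " "))  -- A1 B1 A2 B2 A3
```

Each task runs only until its next yield, so the two tasks interleave; once B finishes, A simply keeps being picked as the only ready task.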
After a fiber has yielded and regained control, it immediately calls testcancel, which raises an error if the fiber has been cancelled in the meantime.
Yields can be explicit or implicit.
Explicit yields are clearly visible from the invoking code. There are only two explicit yields: fiber.yield() and fiber.sleep(t).
- fiber.yield() yields execution to another ready fiber and puts the current fiber in the ready state, meaning that it will be executed again as soon as possible, after giving other fibers waiting for execution a chance to run.
- fiber.sleep(t) yields execution to another ready fiber and puts the current fiber in the suspended state for t seconds; when that time has passed, the event loop moves the fiber back to the ready state.
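To contrast the two explicit yields, the sketch below extends a toy coroutine scheduler with a virtual clock. A task that yields "yield" goes straight back to the ready queue, as with fiber.yield(); a task that yields "sleep" and a duration is parked until the clock reaches its wake-up time, as with fiber.sleep(t). The clock advances only when no task is ready, a simplifying assumption standing in for the event loop; spawn, run, and now are hypothetical names.

```lua
-- Toy model of fiber.yield() versus fiber.sleep(t) with plain coroutines.
-- Hypothetical names: spawn, run, now. Not Tarantool's scheduler.
local now = 0                               -- virtual clock, in seconds
local ready, sleeping, trace = {}, {}, {}

local function spawn(body)
  table.insert(ready, { co = coroutine.create(body) })
end

local function run()
  while #ready > 0 or #sleeping > 0 do
    if #ready == 0 then
      -- nothing is ready: the "event loop" advances the clock to the
      -- earliest wake-up time and makes that task ready again
      table.sort(sleeping, function(a, b) return a.wake < b.wake end)
      local t = table.remove(sleeping, 1)
      now = t.wake
      table.insert(ready, t)
    else
      local t = table.remove(ready, 1)
      local ok, request, seconds = coroutine.resume(t.co)
      assert(ok, request)
      if coroutine.status(t.co) ~= "dead" then
        if request == "sleep" then
          t.wake = now + seconds            -- suspended until the clock gets here
          table.insert(sleeping, t)
        else
          table.insert(ready, t)            -- plain yield: back of the ready queue
        end
      end
    end
  end
end

spawn(function()
  for i = 1, 3 do
    table.insert(trace, "polite" .. i .. "@" .. now)
    coroutine.yield("yield")                -- like fiber.yield()
  end
end)

spawn(function()
  table.insert(trace, "sleeper start@" .. now)
  coroutine.yield("sleep", 5)               -- like fiber.sleep(5)
  table.insert(trace, "sleeper woke@" .. now)
end)

run()
print(table.concat(trace, ", "))
```

The polite task completes all three steps at virtual time 0 while the sleeper is suspended; the sleeper resumes only when nothing else is ready and the clock has reached 5.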
In general, it is good practice for long-running CPU-intensive tasks to yield periodically so that other waiting fibers get a chance to run.
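One common way to keep a CPU-heavy loop cooperative is to yield every fixed number of iterations. The sketch below models this with plain coroutines (coroutine.yield() in place of fiber.yield()) and measures how far a 1000-step loop has progressed before a short competing task gets to run. The helper names make_worker and simulate, and the step counts, are illustrative assumptions.

```lua
-- Hypothetical helper: sum 1..n, yielding every `every` iterations (nil = never).
local function make_worker(n, every, progress)
  return coroutine.create(function()
    local sum = 0
    for i = 1, n do
      sum = sum + i
      progress.done = i
      if every and i % every == 0 then
        coroutine.yield()                 -- in Tarantool: fiber.yield()
      end
    end
    return sum
  end)
end

-- Run the worker and a short "client" task round-robin; return how many
-- worker iterations had completed when the client first got the CPU.
local function simulate(every)
  local progress = { done = 0 }
  local served_at
  local client = coroutine.create(function() served_at = progress.done end)
  local queue = { make_worker(1000, every, progress), client }
  while #queue > 0 do
    local co = table.remove(queue, 1)
    assert(coroutine.resume(co))
    if coroutine.status(co) ~= "dead" then table.insert(queue, co) end
  end
  return served_at
end

print(simulate(nil))  -- 1000: the client waits for the whole loop
print(simulate(100))  -- 100: the client runs after the first chunk
```

A smaller interval makes other fibers wait less, at the cost of more yields in the long-running task.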
On the other hand, many operations, such as socket, file system, and disk I/O operations, involve waiting for the current fiber, during which other fibers can be executed. When such an operation occurs, the potentially blocking work is passed to the event loop, and the fiber is suspended until the resource is ready for the fiber to continue.
Here is the list of implicitly yielding operations:
- Connection establishment (socket).
- Socket read and write (socket).
- Filesystem operations (from fio).
- Channel data transfer (fiber.channel).
- File input/output (from fio).
- Console operations (since console is a socket).
- HTTP requests (since HTTP is a socket operation).
- Database modifications (if they imply a disk write).
- Database reading for the vinyl engine.
- Invocation of another process (popen).
Note
All operations of the os module are non-cooperative: they block the whole tx thread.
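Channel data transfer, one of the implicitly yielding operations listed above, yields because a fiber that cannot put into a full channel or get from an empty one has to wait. The toy channel below models that waiting with plain coroutines: put() yields while the buffer is full and get() yields while it is empty. It is a hypothetical sketch (new_channel is an illustrative name), not the fiber.channel API or implementation.

```lua
-- Toy bounded channel on plain coroutines (hypothetical; not fiber.channel).
local function new_channel(capacity)
  local ch = { buf = {}, capacity = capacity }
  function ch.put(v)
    while #ch.buf >= ch.capacity do coroutine.yield() end  -- full: wait
    table.insert(ch.buf, v)
  end
  function ch.get()
    while #ch.buf == 0 do coroutine.yield() end            -- empty: wait
    return table.remove(ch.buf, 1)
  end
  return ch
end

local ch = new_channel(1)
local trace = {}

local producer = coroutine.create(function()
  for i = 1, 3 do
    ch.put(i)
    table.insert(trace, "put" .. i)
  end
end)

local consumer = coroutine.create(function()
  for _ = 1, 3 do
    table.insert(trace, "got" .. ch.get())
  end
end)

-- round-robin until both tasks finish
local queue = { producer, consumer }
while #queue > 0 do
  local co = table.remove(queue, 1)
  assert(coroutine.resume(co))
  if coroutine.status(co) ~= "dead" then table.insert(queue, co) end
end
print(table.concat(trace, " "))
```

With a capacity of 1, each put after the first has to wait until the consumer has taken the previous value, so the two tasks alternate.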
For memtx, since all data is in memory, there is no yielding for a read request (like :select, :pairs, :get).
For vinyl, since some data may not be in memory, there may be disk I/O for a read (to fetch data from disk) or write (because a stall may occur while waiting for memory to be freed).
For both memtx and vinyl, data change requests must be recorded in the WAL, so they normally involve a commit (box.commit()), which yields while waiting for the WAL write.
With the default autocommit mode, the following operations yield:
- space:alter.
- space:drop.
- space:create_index.
- space:truncate.
- space:insert.
- space:replace.
- space:update.
- space:upsert.
- space:delete.
- index:update.
- index:delete.
- index:alter.
- index:drop.
- index:rename.
- box.commit (if there were some modifications within the transaction).
To keep transactions atomic in transaction mode, the memtx engine changes how modification operations behave. After box.begin() is executed, or inside a box.atomic() call, modification operations do not yield; the yield occurs only at box.commit() or upon return from box.atomic(). box.rollback() does not yield.
This is why executing separate commands such as select(), insert(), and update() in the console inside a transaction without MVCC causes the transaction to abort: the console yields implicitly after each chunk of code it executes, and without MVCC a memtx transaction is aborted if it yields before the commit.
- Engine = memtx.
space:get()
space:insert()
The sequence has one yield, at the end of the insert, caused by the implicit commit; get() has nothing to write to the WAL and so does not yield.
- Engine = memtx.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()
The sequence has one yield, at box.commit(); none of the inserts yield.
- Engine = vinyl.
space:get()
space:insert()
The sequence has one to three yields, since get() may yield if the data is not in the cache, insert() may yield if it waits for available memory, and there is an implicit yield at commit.
- Engine = vinyl.
box.begin()
space1:get()
space1:insert()
space2:get()
space2:insert()
box.commit()
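The sequence may yield from 1 to 5 times: each of the two get() calls and two insert() calls may yield for the reasons given above, and box.commit() yields.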
Assume that there are tuples in the memtx space tester where the third field represents a positive dollar amount.
Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.
tarantool> function txn_example(from, to, amount_of_money)
> box.atomic(function()
> box.space.tester:update(from, {{'-', 3, amount_of_money}})
> box.space.tester:update(to, {{'+', 3, amount_of_money}})
> end)
> return "ok"
> end
Result:
---
...
tarantool> txn_example({999}, {1000}, 1.00)
---
- "ok"
...
If wal_mode = none, there is no implicit yield at commit time, because nothing is written to the WAL.
If a request is performed via a network connector such as net.box, it involves sending a request to the server and receiving a response. That is network I/O, and thus an implicit yield, even if the request itself would not yield on the server. Therefore, the following sequence yields three times in a row, once for each request sent over the network while its result is awaited.
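-- conn is assumed to be an open net.box connection, for example
-- conn = require('net.box').connect('localhost:3301') (hypothetical URI)
conn.space.test:get{1} -- yield 1: send the request, wait for the response
conn.space.test:get{2} -- yield 2
conn.space.test:get{3} -- yield 3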
Cooperative multitasking means that unless a running fiber deliberately yields control, it is not preempted by another fiber. But a running fiber will deliberately yield when it encounters a «yield point»: a transaction commit, an operating system call, or an explicit «yield» request. Any system call that can block is performed asynchronously, and any running fiber that must wait for a system call is suspended, so that another ready-to-run fiber takes its place and becomes the new running fiber.
This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there will be no concurrency around a resource, no race conditions, and no memory consistency issues. The way to achieve this is simple: use no yields, explicit or implicit, in critical sections, and no other fiber can interfere with code execution.
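The lock-free guarantee depends on the critical section containing no yields. The plain-Lua sketch below models two fibers as coroutines that each increment a shared counter n times with a read-modify-write. When the only yield comes after the write, every increment survives; when a yield (standing in for fiber.sleep() or any implicitly yielding call) falls between the read and the write, both tasks read the same value and half of the updates are lost. run_increments is a hypothetical helper.

```lua
-- Two coroutines each add 1 to a shared counter n times (hypothetical helper).
local function run_increments(yield_inside, n)
  local counter = 0
  local function task()
    for _ = 1, n do
      local v = counter              -- read
      if yield_inside then
        coroutine.yield()            -- yield inside the critical section
      end
      counter = v + 1                -- write (may overwrite the other task's update)
      coroutine.yield()              -- yield outside the critical section is harmless
    end
  end
  local queue = { coroutine.create(task), coroutine.create(task) }
  while #queue > 0 do
    local co = table.remove(queue, 1)
    assert(coroutine.resume(co))
    if coroutine.status(co) ~= "dead" then table.insert(queue, co) end
  end
  return counter
end

print(run_increments(false, 5))  -- 10: no updates lost
print(run_increments(true, 5))   -- 5: half of the updates are lost
```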
For small requests, such as simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes little time to process the request, schedule a disk write, and yield to a fiber serving the next client.
However, a function may perform complex calculations or be written in such a way that yields take a long time to occur. This can lead to unfair scheduling when a single client throttles the rest of the system, or to apparent stalls in processing requests. It is the responsibility of the function author to avoid this situation. As a protective mechanism, a fiber slice can be used.