
Module cartridge.issues

Monitor issues across cluster instances.

Cartridge detects the following problems:

Replication:

  • critical: «Replication from … to … isn’t running» - when box.info.replication[n].upstream == nil;
  • critical: «Replication from … to … state "stopped"/"orphan"/etc. (…)»;
  • warning: «Replication from … to …: high lag» - when upstream.lag > box.cfg.replication_sync_lag;
  • warning: «Replication from … to …: long idle» - when upstream.idle > 2 * box.cfg.replication_timeout;
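
The four replication rules above can be sketched as a small classifier. This is a minimal illustration in Python, not Cartridge's actual code: the function name `check_upstream`, the dict layout of `upstream`, the set of statuses treated as healthy, and the choice to stop at the first matching rule are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Assumption: statuses regarded as healthy. The list above only names
# "stopped" and "orphan" as examples of unhealthy states.
HEALTHY_STATUSES = {"follow", "sync"}


def check_upstream(upstream: Optional[dict], sync_lag: float,
                   timeout: float) -> Optional[Tuple[str, str]]:
    """Return (level, reason) for one replication link, or None if healthy.

    `upstream` is a hypothetical dict mirroring an upstream entry of
    box.info.replication; `sync_lag` and `timeout` stand in for
    box.cfg.replication_sync_lag and box.cfg.replication_timeout.
    """
    if upstream is None:
        return ("critical", "isn't running")
    status = upstream.get("status")
    if status not in HEALTHY_STATUSES:
        return ("critical", "state %r" % status)
    if upstream.get("lag", 0) > sync_lag:
        return ("warning", "high lag")
    if upstream.get("idle", 0) > 2 * timeout:
        return ("warning", "long idle")
    return None
```

Note that the idle check uses twice the replication timeout, so with a timeout of 1 second an idle time of exactly 2 seconds does not yet raise a warning.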

Failover:

  • warning: «Can’t obtain failover coordinator (…)»;
  • warning: «There is no active failover coordinator»;
  • warning: «Failover is stuck on …: Error fetching appointments (…)»;
  • warning: «Failover is stuck on …: Failover fiber is dead» - this is likely a bug;

Switchover:

  • warning: «Consistency on … isn’t reached yet»;

Clock:

  • warning: «Clock difference between … and … exceed threshold» - when the difference exceeds limits.clock_delta_threshold_warning;

Memory:

  • critical: «Running out of memory on …» - when all 3 metrics items_used_ratio, arena_used_ratio, quota_used_ratio from box.slab.info() exceed limits.fragmentation_threshold_critical;
  • warning: «Memory is highly fragmented on …» - when items_used_ratio > limits.fragmentation_threshold_warning and both arena_used_ratio and quota_used_ratio exceed limits.fragmentation_threshold_critical;
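
The two memory rules can be read as one decision: the arena and quota ratios must both exceed the critical limit, and the items ratio then decides the level. The sketch below illustrates that reading in Python. The function name `memory_issue` is hypothetical, the ratios are assumed to be fractions between 0 and 1, and the default thresholds are the documented defaults of 0.6 and 0.85.

```python
from typing import Optional, Tuple


def memory_issue(items_used_ratio: float, arena_used_ratio: float,
                 quota_used_ratio: float, warning: float = 0.6,
                 critical: float = 0.85) -> Optional[Tuple[str, str]]:
    """Classify memory pressure from three slab ratios given as fractions."""
    # Both rules require arena and quota usage above the critical limit.
    if arena_used_ratio <= critical or quota_used_ratio <= critical:
        return None
    if items_used_ratio > critical:
        return ("critical", "Running out of memory")
    if items_used_ratio > warning:
        return ("warning", "Memory is highly fragmented")
    return None
```

Under these assumptions, an instance with all three ratios at 0.9 is critical, while the same instance with items_used_ratio at 0.7 only gets the fragmentation warning.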

Configuration:

  • warning: «Configuration checksum mismatch on …»;
  • warning: «Configuration is prepared and locked on …»;
  • warning: «Advertise URI (…) differs from clusterwide config (…)»;
  • warning: «Configuring roles is stuck on … and hangs for … so far»;

Vshard:

  • various vshard alerts (see vshard docs for details);
  • warning: «Group "…" wasn’t bootstrapped: …»;
  • warning: «Vshard storages in replicaset … marked as "all writable"».

Alien members:

  • warning: «Instance … with alien uuid is in the membership» - when two separate clusters share the same cluster cookie;

Expelled instances:

  • warning: «Replicaset … has expelled instance … in box.space._cluster» - when an instance was expelled from the replicaset but still remains in box.space._cluster;

Deprecated space format:

  • warning: «Instance … has spaces with deprecated format: space1, …»

Raft issues:

  • warning: «Raft leader idle is 10.000 on … . Is raft leader alive and connection is healthy?»

Unhealthy replicasets:

  • critical: «All instances are unhealthy in replicaset …».

Custom issues (defined by user):

GraphQL request:

You can get information about cluster issues using the following GraphQL request:

{
    cluster {
        issues {
            level
            message
            replicaset_uuid
            instance_uuid
            topic
        }
    }
}
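
As an illustration, the request above could be sent from a script and the result grouped by level. This is a sketch under assumptions: the GraphQL endpoint is assumed to be served at `/admin/api` on the instance's HTTP port, the response is assumed to follow the usual GraphQL `data` envelope, and the helper names are hypothetical.

```python
import json
import urllib.request

ISSUES_QUERY = """
{
    cluster {
        issues {
            level
            message
            replicaset_uuid
            instance_uuid
            topic
        }
    }
}
"""


def build_issues_request(base_url: str) -> urllib.request.Request:
    """Build a POST request carrying the issues query."""
    # Assumption: the GraphQL endpoint lives at <base_url>/admin/api.
    body = json.dumps({"query": ISSUES_QUERY}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/admin/api",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def group_by_level(response: dict) -> dict:
    """Group issue messages from a decoded GraphQL response by level."""
    grouped = {}
    for issue in response["data"]["cluster"]["issues"]:
        grouped.setdefault(issue["level"], []).append(issue["message"])
    return grouped


def fetch_issues(base_url: str) -> dict:
    """Send the query to a running instance and group the issues."""
    with urllib.request.urlopen(build_issues_request(base_url), timeout=5) as resp:
        return group_by_level(json.load(resp))
```

Calling `fetch_issues` with the base URL of an instance's HTTP server would return a dict keyed by `warning` and `critical`, provided the assumptions above hold.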

Thresholds for issuing warnings. All settings are local, not clusterwide. They can be changed with the corresponding environment variables (TARANTOOL_*) or command-line arguments. See the cartridge.argparse module for details.

Fields:

  • fragmentation_threshold_critical: (number) default: 0.85.
  • fragmentation_threshold_full: (number) default: 1.0.
  • fragmentation_threshold_warning: (number) default: 0.6.
  • clock_delta_threshold_warning: (number) default: 5.
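
To show how local overrides might combine with the defaults above, here is a small Python sketch. The environment variable names are an assumption: they are formed by prefixing the upper-cased field name with `TARANTOOL_`, following the TARANTOOL_* pattern, and should be checked against cartridge.argparse. The function name `limits_from_env` is hypothetical.

```python
from typing import Mapping

# Defaults as listed in the Fields section.
DEFAULT_LIMITS = {
    "fragmentation_threshold_critical": 0.85,
    "fragmentation_threshold_full": 1.0,
    "fragmentation_threshold_warning": 0.6,
    "clock_delta_threshold_warning": 5.0,
}


def limits_from_env(env: Mapping[str, str]) -> dict:
    """Return the limits table with overrides taken from `env`."""
    limits = dict(DEFAULT_LIMITS)
    for name in DEFAULT_LIMITS:
        # Assumed naming scheme: TARANTOOL_<FIELD_NAME_IN_UPPER_CASE>.
        raw = env.get("TARANTOOL_" + name.upper())
        if raw is not None:
            limits[name] = float(raw)
    return limits
```

Passing `os.environ` as `env` would apply whatever overrides are set for the current process. Because the settings are local, each instance would compute its own table.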

Validate limits configuration.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Update limits configuration.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description
