Module cartridge.issues

Monitor issues across cluster instances.

Cartridge detects the following problems:

Replication:

  • «Replication from … to … isn’t running» - when box.info.replication[n].upstream == nil;
  • «Replication from … to … is stopped/orphan/etc. (…)»;
  • «Replication from … to …: high lag» - when upstream.lag > box.cfg.replication_sync_lag;
  • «Replication from … to …: long idle» - when upstream.idle > box.cfg.replication_timeout;
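
The replication conditions above can be sketched in plain Lua. This is an illustration, not Cartridge's actual implementation: check_replication and its arguments are hypothetical names, the replication table only mimics the shape of box.info.replication, and treating 'follow' and 'sync' as the healthy upstream statuses is an assumption.

```lua
-- Sketch only: check_replication is a hypothetical helper.
-- `replication` mimics box.info.replication, `cfg` mimics box.cfg.
local function check_replication(replication, self_id, cfg)
    local issues = {} -- peer id -> short description of the problem
    for id, peer in pairs(replication) do
        if id ~= self_id then -- an instance does not replicate from itself
            local upstream = peer.upstream
            if upstream == nil then
                issues[id] = "isn't running"
            elseif upstream.status ~= 'follow' and upstream.status ~= 'sync' then
                -- Assumption: any other status (stopped, orphan, ...) is a problem
                issues[id] = upstream.status
            elseif upstream.lag > cfg.replication_sync_lag then
                issues[id] = 'high lag'
            elseif upstream.idle > cfg.replication_timeout then
                issues[id] = 'long idle'
            end
        end
    end
    return issues
end
```

On a live instance the inputs would presumably be box.info.replication, box.info.id and box.cfg; the sketch keeps them as parameters so that it can be run outside Tarantool.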

Failover:

  • «Can’t obtain failover coordinator (…)»;
  • «There is no active failover coordinator»;
  • «Failover is stuck on …: Error fetching appointments (…)»;
  • «Failover is stuck on …: Failover fiber is dead» - this is likely a bug;

Clock:

  • «Clock difference between … and … exceed threshold» - when the difference exceeds limits.clock_delta_threshold_warning;
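
A minimal sketch of the clock check, assuming each instance has a known clock offset and the issue compares the two most distant ones. The function check_clock and the shape of its deltas argument (URI mapped to offset) are hypothetical; how Cartridge actually measures the offsets is not covered here.

```lua
-- Sketch only: check_clock and the `deltas` table (uri -> clock offset)
-- are hypothetical; they illustrate the threshold comparison, nothing more.
local function check_clock(deltas, threshold)
    local min_uri, max_uri
    for uri, delta in pairs(deltas) do
        if min_uri == nil or delta < deltas[min_uri] then min_uri = uri end
        if max_uri == nil or delta > deltas[max_uri] then max_uri = uri end
    end
    if min_uri == nil then
        return nil -- no instances, nothing to compare
    end
    local diff = deltas[max_uri] - deltas[min_uri]
    if diff > threshold then
        -- The two instances whose clocks are furthest apart, and the gap
        return min_uri, max_uri, diff
    end
    return nil
end
```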

Memory:

  • «Running out of memory on …» - when all three metrics (items_used_ratio, arena_used_ratio, quota_used_ratio) from box.slab.info() exceed limits.fragmentation_threshold_critical;
  • «Memory is highly fragmented on …» - when items_used_ratio > limits.fragmentation_threshold_warning and both arena_used_ratio and quota_used_ratio exceed limits.fragmentation_threshold_critical.
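
The two memory conditions differ only in how full items_used_ratio is, which the following sketch makes explicit. The function classify_memory is a hypothetical name, and it assumes the three ratios are already plain fractions between 0 and 1; converting from whatever box.slab.info() reports is left out.

```lua
-- Sketch only: classify_memory is a hypothetical helper.
-- `ratios` holds items_used_ratio, arena_used_ratio, quota_used_ratio
-- as fractions in [0, 1] (an assumption); `limits` mirrors the limits table.
local function classify_memory(ratios, limits)
    local critical = limits.fragmentation_threshold_critical
    local warning = limits.fragmentation_threshold_warning
    -- Both issues require arena and quota usage above the critical limit
    if ratios.arena_used_ratio > critical
    and ratios.quota_used_ratio > critical then
        if ratios.items_used_ratio > critical then
            return 'running out of memory'
        elseif ratios.items_used_ratio > warning then
            return 'highly fragmented'
        end
    end
    return nil -- no memory issue
end
```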

Tables

limits

Thresholds for issuing warnings. All settings are local to the instance, not clusterwide. They can be changed with the corresponding environment variables (TARANTOOL_*) or command-line arguments. See the cartridge.argparse module for details.

Fields:

  • fragmentation_threshold_critical: (number) default: 0.9.
  • fragmentation_threshold_warning: (number) default: 0.6.
  • clock_delta_threshold_warning: (number) default: 5.
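
As an illustration of how the defaults and the TARANTOOL_* overrides could combine, here is a small sketch. It is not the cartridge.argparse implementation: limits_from_env is a hypothetical helper, and the exact variable names (the prefix followed by the upper-cased field name) are an assumption derived from the TARANTOOL_* pattern.

```lua
-- Defaults as listed in the limits table above.
local defaults = {
    fragmentation_threshold_critical = 0.9,
    fragmentation_threshold_warning = 0.6,
    clock_delta_threshold_warning = 5,
}

-- Sketch only: limits_from_env is hypothetical. `getenv` is injectable
-- so the function can be exercised without touching the real environment.
local function limits_from_env(getenv)
    getenv = getenv or os.getenv
    local limits = {}
    for name, default in pairs(defaults) do
        -- Assumed naming, e.g. TARANTOOL_CLOCK_DELTA_THRESHOLD_WARNING
        local raw = getenv('TARANTOOL_' .. name:upper())
        -- Fall back to the default when unset or not a number
        limits[name] = (raw and tonumber(raw)) or default
    end
    return limits
end
```

Under that naming assumption, starting an instance with TARANTOOL_FRAGMENTATION_THRESHOLD_WARNING=0.75 in its environment would raise only the warning threshold and leave the other two limits at their defaults.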