
Configuration: Check Scheduling

This page describes how check intervals and parallelism work, and how outage detection, retention jobs, and DOCKER_REPOSITORY discovery behave.

Resource Type Defaults

Resource Type   Default Interval   Default Parallelism
HTTP            1 minute           5 threads
DOCKER          5 minutes          2 threads
TCP             1 minute           5 threads

Settings are managed in Admin -> Resource Types.

Resource Discovery Defaults

Discovery Service Type   Default Sync Interval   Default Parallelism
DOCKER_REPOSITORY        60 minutes              1 thread

Settings are managed in Admin -> Resource Discovery.

How Scheduler Timing Works

  • The scheduler loop runs every 30 seconds.
  • A resource type run is dispatched when now - lastRun >= interval.
  • Parallelism controls concurrent checks for that type.
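The timing rule above can be sketched as follows. This is an illustrative model, not Kairos source code: the names is_due, simulate, and TICK are hypothetical, and treating a type that has never run as due is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

TICK = timedelta(seconds=30)  # scheduler loop period stated above


def is_due(now: datetime, last_run: Optional[datetime], interval: timedelta) -> bool:
    """Return True when now - lastRun >= interval.

    Assumption: a type that has never run (last_run is None) counts as due.
    """
    if last_run is None:
        return True
    return now - last_run >= interval


def simulate(interval: timedelta, ticks: int) -> list[int]:
    """Return the tick indexes at which a run would be dispatched."""
    start = datetime(2024, 1, 1)
    last_run: Optional[datetime] = None
    dispatched = []
    for i in range(ticks):
        now = start + i * TICK  # the loop only looks at the clock once per tick
        if is_due(now, last_run, interval):
            dispatched.append(i)
            last_run = now
    return dispatched
```

Under this model, simulate(timedelta(minutes=1), 7) dispatches at ticks 0, 2, 4, and 6. An interval of 45 seconds produces the same ticks, because the condition is only evaluated once every 30 seconds.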

Startup Behavior

Kairos always performs an immediate check pass at startup, independent of configured intervals.

Outage Detection and Recovery

Outages are evaluated per resource type and configured in Admin -> Resource Types.

Setting              Default   Meaning
Outage threshold     3         Open an outage after this many consecutive NOT_AVAILABLE check results
Recovery threshold   2         Close an active outage after this many consecutive AVAILABLE check results

Outage lifecycle:

  • At most one active outage is kept per resource.
  • Outage start time is based on the first failing check in the triggering failure streak.
  • Outage end time is set to the check time that satisfies the recovery threshold.
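A minimal sketch of the lifecycle rules above, using the default thresholds. The Outage and OutageTracker classes are hypothetical and are not taken from the Kairos code base. Check results are simplified to a boolean (AVAILABLE or NOT_AVAILABLE), which is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

OUTAGE_THRESHOLD = 3    # default outage threshold
RECOVERY_THRESHOLD = 2  # default recovery threshold


@dataclass
class Outage:
    start: datetime
    end: Optional[datetime] = None


@dataclass
class OutageTracker:
    """Hypothetical per-resource tracker (illustration only)."""
    fail_streak: int = 0
    ok_streak: int = 0
    streak_start: Optional[datetime] = None  # first failing check of the current streak
    active: Optional[Outage] = None          # at most one active outage
    closed: list = field(default_factory=list)

    def record(self, checked_at: datetime, available: bool) -> None:
        if available:
            self.fail_streak = 0
            self.streak_start = None
            self.ok_streak += 1
            if self.active is not None and self.ok_streak >= RECOVERY_THRESHOLD:
                # End time is the check that satisfies the recovery threshold.
                self.active.end = checked_at
                self.closed.append(self.active)
                self.active = None
        else:
            self.ok_streak = 0
            if self.fail_streak == 0:
                self.streak_start = checked_at
            self.fail_streak += 1
            if self.active is None and self.fail_streak >= OUTAGE_THRESHOLD:
                # Start time is the first failure of the triggering streak.
                self.active = Outage(start=self.streak_start)
```

In this sketch, a single AVAILABLE result during an active outage does not close it. A following NOT_AVAILABLE result resets the recovery streak and the same outage stays open.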

Operational notes:

  • Public outage overview page: /outages
  • Resource detail pages include an active outage banner and an outage history table
  • Dashboard rows/cards show active outage indicators with live elapsed time

Retention Jobs

Kairos runs retention in two dedicated background jobs, one for check history and one for outages. Both jobs are configured in Admin -> General Settings and run independently of check execution.

Check History Retention

  • Controls: checkHistoryRetentionEnabled, checkHistoryRetentionIntervalMinutes, checkHistoryRetentionDays
  • Default interval: 60 minutes
  • Default retention: 31 days
  • Deletion reference: check_result.checked_at

Outage Retention

  • Controls: outageRetentionEnabled, outageRetentionIntervalHours, outageRetentionDays
  • Default interval: 12 hours
  • Default retention: 31 days
  • Deletion reference: outage.endDate
  • Only closed outages are removed; active outages are never deleted by retention.

Retention Flow

flowchart TD
    A[Scheduler tick] --> B{Retention job enabled?}
    B -- No --> Z[Skip run]
    B -- Yes --> C{Interval elapsed?}
    C -- No --> Z
    C -- Yes --> D[Compute cutoff now - retentionDays]
    D --> E[Delete matching historical records]
    E --> F[Log cleanup result]
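The flow above could be sketched as follows for the outage retention job. The function names and the (start, end) tuple representation are hypothetical. Whether Kairos treats a record exactly at the cutoff as deletable is not stated here, so the strict "older than cutoff" comparison is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run(enabled: bool, now: datetime,
               last_run: Optional[datetime], interval: timedelta) -> bool:
    """Mirror the two decision nodes: job enabled, then interval elapsed."""
    if not enabled:
        return False
    return last_run is None or now - last_run >= interval


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Compute cutoff = now - retentionDays."""
    return now - timedelta(days=retention_days)


def purge_closed_outages(outages, now: datetime, retention_days: int):
    """Return (kept, deleted_count).

    outages is a list of (start, end) tuples; end is None for an active outage.
    Active outages are always kept, and closed ones are judged by end date.
    """
    cutoff = retention_cutoff(now, retention_days)
    kept = [o for o in outages if o[1] is None or o[1] >= cutoff]
    return kept, len(outages) - len(kept)
```

With the default of 31 days and a run at 2024-03-01 00:00, the cutoff in this sketch is 2024-01-30 00:00. An outage that started before the cutoff but ended after it is kept.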

Safety Notes

  • Set retention values according to compliance and incident review requirements.
  • Outage retention measures age from the outage end date, so an incident is kept in full until the retention period has passed since it closed.
  • If you need long-term analytics, export data before lowering retention days.

DOCKER_REPOSITORY Discovery Behavior

DOCKER_REPOSITORY sync runs do not create direct check entries.

Instead, each run synchronizes discovered images into generated DOCKER resources in an auto-created group and removes resources that no longer exist in the upstream registry.
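One way to picture the synchronization step is as a set difference between the images found upstream and the DOCKER resources already generated. This is only a sketch: plan_sync and the use of plain image names as identifiers are assumptions, not the Kairos implementation.

```python
def plan_sync(discovered: set[str], existing: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_create, to_remove) for one discovery run.

    discovered: image names currently present in the upstream registry
    existing:   image names that already have a generated DOCKER resource
    """
    to_create = discovered - existing  # new upstream images get a resource
    to_remove = existing - discovered  # resources whose image is gone upstream
    return to_create, to_remove
```

For example, with discovered = {"app:1", "app:2"} and existing = {"app:1", "old:1"}, the plan creates "app:2" and removes "old:1".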