Configuration: Check Scheduling

This page describes how check intervals and parallelism work.

Resource Type Defaults

Resource Type	Default Interval	Default Parallelism
HTTP	1 minute	5 threads
DOCKER	5 minutes	2 threads
TCP	1 minute	5 threads

Settings are managed in Admin -> Resource Types.

Resource Discovery Defaults

Discovery Service Type	Default Sync Interval	Default Parallelism
DOCKER_REPOSITORY	60 minutes	1 thread

Settings are managed in Admin -> Resource Discovery.

How Scheduler Timing Works

The scheduler loop runs every 30 seconds.
A resource type run is dispatched when now - lastRun >= interval.
Parallelism controls concurrent checks for that type.
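
The timing rule above can be sketched as follows. This is a minimal illustration, not Kairos code: the names `ResourceTypeSchedule`, `is_due`, and `run_type` are hypothetical, the thread pool stands in for whatever concurrency mechanism Kairos actually uses, and treating a type that has never run as due is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

TICK = timedelta(seconds=30)  # scheduler loop period from the text above


@dataclass
class ResourceTypeSchedule:
    """Hypothetical per-type scheduling state."""
    name: str
    interval: timedelta
    parallelism: int
    last_run: Optional[datetime] = None  # None means the type has never run


def is_due(schedule: ResourceTypeSchedule, now: datetime) -> bool:
    """Dispatch rule: now - lastRun >= interval."""
    if schedule.last_run is None:
        return True  # assumption: a type that has never run is due
    return now - schedule.last_run >= schedule.interval


def run_type(schedule: ResourceTypeSchedule, resources: List[str],
             check: Callable[[str], str], now: datetime) -> List[str]:
    """Run one pass with at most `parallelism` checks in flight at once."""
    with ThreadPoolExecutor(max_workers=schedule.parallelism) as pool:
        results = list(pool.map(check, resources))  # keeps input order
    schedule.last_run = now
    return results
```

Under this sketch, an HTTP type with a 1 minute interval that ran at 12:00:00 is not due on the 12:00:30 tick and is due again on the 12:01:00 tick. Because the rule is only evaluated on ticks, a run may start up to roughly 30 seconds after its interval has elapsed.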

Startup Behavior

Kairos always performs an immediate check pass at startup, independent of configured intervals.

Outage Detection and Recovery

Outages are evaluated per resource type and configured in Admin -> Resource Types.

Setting	Default	Meaning
Outage threshold	`3`	Open an outage after this many consecutive `NOT_AVAILABLE` check results
Recovery threshold	`2`	Close an active outage after this many consecutive `AVAILABLE` check results

Outage lifecycle:

At most one active outage is kept per resource.
Outage start time is based on the first failing check in the triggering failure streak.
Outage end time is set to the check time that satisfies the recovery threshold.
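
The lifecycle rules above can be expressed as a small state machine. The sketch below is illustrative only: `OutageTracker` and `Outage` are hypothetical names, the defaults mirror the table above, and the handling of statuses other than `AVAILABLE` and `NOT_AVAILABLE` (resetting both streaks) is an assumption that the text does not confirm.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Outage:
    start: datetime
    end: Optional[datetime] = None  # None while the outage is active


@dataclass
class OutageTracker:
    """Hypothetical per-resource tracker for consecutive check results."""
    outage_threshold: int = 3
    recovery_threshold: int = 2
    active: Optional[Outage] = None
    history: List[Outage] = field(default_factory=list)
    _fail_streak: int = 0
    _ok_streak: int = 0
    _streak_start: Optional[datetime] = None

    def record(self, status: str, checked_at: datetime) -> None:
        if status == "NOT_AVAILABLE":
            if self._fail_streak == 0:
                self._streak_start = checked_at  # first failure of the streak
            self._fail_streak += 1
            self._ok_streak = 0
            # At most one active outage: open only when none is active.
            if self.active is None and self._fail_streak >= self.outage_threshold:
                self.active = Outage(start=self._streak_start)
        elif status == "AVAILABLE":
            self._ok_streak += 1
            self._fail_streak = 0
            if self.active is not None and self._ok_streak >= self.recovery_threshold:
                self.active.end = checked_at  # check that met the threshold
                self.history.append(self.active)
                self.active = None
        else:
            # Assumption: any other status breaks both streaks.
            self._fail_streak = 0
            self._ok_streak = 0
```

With the defaults, three consecutive failures open an outage whose start time is the first of those failures. A single `AVAILABLE` result followed by another failure does not close it, because the recovery streak restarts.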

Operational notes:

Public outage overview page: /outages
Resource detail pages include an active outage banner and an outage history table.
Dashboard rows and cards show active outage indicators with live elapsed time.

Retention Jobs

Kairos runs retention in two dedicated background jobs, one for check history and one for outages. Both jobs are configured in Admin -> General Settings and run independently of check execution.

Check History Retention

Controls: checkHistoryRetentionEnabled, checkHistoryRetentionIntervalMinutes, checkHistoryRetentionDays
Default interval: 60 minutes
Default retention: 31 days
Deletion reference: check_result.checked_at

Outage Retention

Controls: outageRetentionEnabled, outageRetentionIntervalHours, outageRetentionDays
Default interval: 12 hours
Default retention: 31 days
Deletion reference: outage.endDate
Only closed outages are removed; active outages are never deleted by retention.

Retention Flow

flowchart TD
    A[Scheduler tick] --> B{Retention job enabled?}
    B -- No --> Z[Skip run]
    B -- Yes --> C{Interval elapsed?}
    C -- No --> Z
    C -- Yes --> D[Compute cutoff now - retentionDays]
    D --> E[Delete matching historical records]
    E --> F[Log cleanup result]
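
As a rough illustration of this flow applied to outages, the sketch below walks through the same decisions: enabled check, interval check, cutoff computation, and deletion. `RetentionJob` and `run_outage_retention` are hypothetical names, outages are modelled as plain `(start, end)` tuples, and deleting only records strictly older than the cutoff is an assumption about the boundary case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (start, end); end is None while the outage is still active
OutageRow = Tuple[datetime, Optional[datetime]]


@dataclass
class RetentionJob:
    """Hypothetical in-memory view of the retention settings."""
    enabled: bool
    interval: timedelta
    retention_days: int
    last_run: Optional[datetime] = None


def run_outage_retention(job: RetentionJob, outages: List[OutageRow],
                         now: datetime) -> Tuple[List[OutageRow], int]:
    """Return (kept rows, number deleted) for one scheduler tick."""
    if not job.enabled:
        return outages, 0  # retention job disabled: skip run
    if job.last_run is not None and now - job.last_run < job.interval:
        return outages, 0  # interval not yet elapsed: skip run
    cutoff = now - timedelta(days=job.retention_days)
    # Active outages (end is None) are never deleted; closed outages are
    # kept while their end date is not older than the cutoff.
    kept = [row for row in outages if row[1] is None or row[1] >= cutoff]
    job.last_run = now
    return kept, len(outages) - len(kept)
```

Check history retention would follow the same shape, comparing `check_result.checked_at` against the cutoff instead of the outage end date.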

Safety Notes

Set retention values according to compliance and incident review requirements.
Outage retention is measured from the outage end date, so the full duration of an incident is preserved until the outage is deleted.
If you need long-term analytics, export data before lowering retention days.

DOCKER_REPOSITORY Discovery Behavior

DOCKER_REPOSITORY sync runs do not create direct check entries.

Instead, each run synchronizes discovered images into generated DOCKER resources in an auto-created group and removes resources that no longer exist in the upstream registry.
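
The synchronization step amounts to reconciling two sets of image references. The following sketch shows one way to compute the changes, assuming resources are keyed by an image reference string. The `reconcile` function and the sample image names are hypothetical and not part of Kairos.

```python
from typing import Iterable, List, Set, Tuple


def reconcile(existing: Set[str],
              discovered: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (images to create as DOCKER resources, resources to remove)."""
    upstream = set(discovered)
    to_create = sorted(upstream - existing)  # upstream, not yet a resource
    to_remove = sorted(existing - upstream)  # resource gone from upstream
    return to_create, to_remove


# Hypothetical example data
existing = {"app:1.0", "app:1.1"}
discovered = ["app:1.1", "app:2.0"]
to_create, to_remove = reconcile(existing, discovered)
```

In this example, `app:2.0` would be created as a new DOCKER resource and `app:1.0` would be removed, while `app:1.1` is left untouched.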