
Monitoring & alerts

External uptime probes, app.unhealthy / app.recovered, top error paths

Percher probes every paid app from outside the cluster on a schedule and alerts you when those probes start failing. This is distinct from crash diagnostics: the watchdog watches the container process, while uptime monitoring checks whether the app is actually reachable from the internet (DNS, Caddy, TLS, 5xx). Both can fire for the same incident.

Probe cadence per plan

  • Free — disabled (no external probes)
  • Starter — every 5 minutes
  • Maker — every minute
  • Pro — every 30 seconds
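To make the cadence concrete, here is a minimal sketch that turns the plan list above into probe counts per day. The lowercase plan keys and the `probes_per_day` helper are illustrative names, not part of any Percher API.

```python
# Probe intervals in seconds, taken from the plan list above.
# None means external probes are disabled for that plan.
# The lowercase keys are illustrative, not Percher identifiers.
PROBE_INTERVAL_SECONDS = {
    "free": None,
    "starter": 300,
    "maker": 60,
    "pro": 30,
}

SECONDS_PER_DAY = 24 * 60 * 60


def probes_per_day(plan: str) -> int:
    """Return how many scheduled probes a plan gets in 24 hours."""
    interval = PROBE_INTERVAL_SECONDS[plan]
    if interval is None:
        return 0
    return SECONDS_PER_DAY // interval
```

Under these intervals a Starter app is probed 288 times a day, a Maker app 1,440 times, and a Pro app 2,880 times.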

The probe requests the app's primary URL (the verified custom domain if there is one, otherwise the *.percher.run subdomain) with an 8-second timeout. It sends HEAD first and falls back to GET for hosts that answer HEAD with 405. Any 2xx or 3xx response counts as up; 4xx responses, 5xx responses, and network errors count as down.
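The classification rules can be sketched as follows. This is not Percher's implementation: the `fetch` callable is a hypothetical transport that you would supply yourself (for example, a thin wrapper around an HTTP client), and it is assumed to raise `OSError` on DNS, TLS, connection, or timeout failures.

```python
from typing import Callable, Optional

# Hypothetical transport signature: (method, url, timeout_seconds) -> HTTP status.
# It is expected to raise OSError on DNS, TLS, connection, or timeout failures.
Fetch = Callable[[str, str, float], int]

PROBE_TIMEOUT_SECONDS = 8.0


def is_up(status: Optional[int]) -> bool:
    """2xx and 3xx count as up; 4xx, 5xx, and network errors (None) count as down."""
    return status is not None and 200 <= status < 400


def probe(url: str, fetch: Fetch) -> bool:
    """Try HEAD first and retry with GET when the host answers 405."""
    try:
        status = fetch("HEAD", url, PROBE_TIMEOUT_SECONDS)
        if status == 405:
            status = fetch("GET", url, PROBE_TIMEOUT_SECONDS)
    except OSError:
        status = None
    return is_up(status)
```

Injecting the transport keeps the classification logic testable without network access.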

Outage detection

A single bad probe is not an outage. Three consecutive failures flip the app to down and fire app.unhealthy. The next successful probe flips it back to up and fires app.recovered, but only on channels where the matching app.unhealthy was actually delivered (events are paired per channel). A 15-minute cooldown gates re-firing while the app is flapping. The Health tab shows live status, uptime % over 24h/7d/30d, a latency sparkline, and a 7-day outage log.
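The detection rules can be read as a small state machine. The sketch below models a single notification channel and makes one assumption the text does not spell out: the cooldown suppresses a new app.unhealthy when fewer than 15 minutes have passed since the previous one, and the paired app.recovered is then suppressed too. `OutageDetector` and its fields are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FAILURE_THRESHOLD = 3        # consecutive failed probes before "down"
COOLDOWN_SECONDS = 15 * 60   # assumed: minimum gap between app.unhealthy events


@dataclass
class OutageDetector:
    down: bool = False
    consecutive_failures: int = 0
    last_unhealthy_at: Optional[float] = None
    unhealthy_delivered: bool = False
    events: List[str] = field(default_factory=list)

    def record(self, ok: bool, now: float) -> None:
        """Feed one probe result; `now` is a timestamp in seconds."""
        if not ok:
            self.consecutive_failures += 1
            if not self.down and self.consecutive_failures >= FAILURE_THRESHOLD:
                self.down = True
                in_cooldown = (
                    self.last_unhealthy_at is not None
                    and now - self.last_unhealthy_at < COOLDOWN_SECONDS
                )
                # During the cooldown the state still flips, but no event fires.
                self.unhealthy_delivered = not in_cooldown
                if self.unhealthy_delivered:
                    self.last_unhealthy_at = now
                    self.events.append("app.unhealthy")
            return

        self.consecutive_failures = 0
        if self.down:
            self.down = False
            # Recovery is paired: it fires only if its unhealthy was delivered.
            if self.unhealthy_delivered:
                self.events.append("app.recovered")
            self.unhealthy_delivered = False
```

Feeding the detector three failures and then a success yields one app.unhealthy followed by one app.recovered. A second outage inside the cooldown window changes the state but emits neither event.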

Webhook payload — app.unhealthy

{
  "type": "app.unhealthy",
  "id": "wh_abc123",
  "timestamp": 1735689600000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "consecutiveFailures": 3,
    "statusCode": 502,
    "error": "bad gateway",
    "latencyMs": 8000
  }
}

Webhook payload — app.recovered

{
  "type": "app.recovered",
  "id": "wh_def456",
  "timestamp": 1735689900000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "statusCode": 200,
    "latencyMs": 47
  }
}
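A receiver might turn either payload into a one-line summary for a chat or log channel. The sketch below relies only on the fields shown in the two samples above. It assumes `timestamp` is Unix epoch milliseconds, which is consistent with the sample values, and `describe_event` is an illustrative name.

```python
import json
from datetime import datetime, timezone


def describe_event(raw_body: str) -> str:
    """Turn an app.unhealthy or app.recovered payload into a one-line summary."""
    event = json.loads(raw_body)
    data = event["data"]
    # Assumption: `timestamp` is Unix epoch milliseconds (UTC).
    when = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S UTC")

    if event["type"] == "app.unhealthy":
        # Prefer the error text; fall back to the status code if it is absent.
        reason = data.get("error") or f"HTTP {data.get('statusCode')}"
        return (
            f"[{stamp}] {data['appName']} is DOWN at {data['url']} "
            f"after {data['consecutiveFailures']} failed probes ({reason})"
        )
    if event["type"] == "app.recovered":
        return (
            f"[{stamp}] {data['appName']} is UP at {data['url']} "
            f"(HTTP {data['statusCode']}, {data['latencyMs']} ms)"
        )
    raise ValueError(f"unexpected event type: {event['type']}")
```

Raising on unknown types keeps the handler from silently mislabeling events it was not written for.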

Per-route 5xx tracking

The Analytics tab's Top error paths panel lists the routes that returned 4xx or 5xx responses, sorted by 5xx count, with errorRate% computed from each route's request total. Counts are aggregated daily from Caddy access logs and re-summed across the selected window, so a route that fails consistently every day for a month rises to the top of the 30-day view even if it is never the worst on any single day. The panel updates every 15 minutes.
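The re-summing step is what lets a steadily failing route outrank a route with one bad day. A minimal sketch, assuming each daily rollup row carries a route, its request total, and its 4xx and 5xx counts. The row shape and field names are hypothetical, and errorRate is assumed to count both 4xx and 5xx responses against the route's request total.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One daily rollup row: (day, route, total_requests, count_4xx, count_5xx).
DailyRow = Tuple[str, str, int, int, int]


def top_error_paths(rows: Iterable[DailyRow], limit: int = 10) -> List[Dict]:
    """Re-sum daily rollups per route, then rank routes by 5xx count."""
    totals = defaultdict(lambda: [0, 0, 0])  # route -> [requests, 4xx, 5xx]
    for _day, route, requests, c4, c5 in rows:
        bucket = totals[route]
        bucket[0] += requests
        bucket[1] += c4
        bucket[2] += c5

    ranked = []
    for route, (requests, c4, c5) in totals.items():
        if c4 + c5 == 0:
            continue  # the panel only lists routes that returned errors
        # Assumption: errorRate counts both 4xx and 5xx over the route's requests.
        rate = 100.0 * (c4 + c5) / requests if requests else 0.0
        ranked.append({"route": route, "requests": requests, "count4xx": c4,
                       "count5xx": c5, "errorRate": round(rate, 2)})
    ranked.sort(key=lambda r: r["count5xx"], reverse=True)
    return ranked[:limit]
```

With three days of data in which one route returns five 5xx responses every day while two other routes spike to 12 and 9 on separate days, the steady route leads the window with 15.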
