External uptime probes, app.unhealthy / app.recovered, top error paths
Percher pings every paid app from outside the cluster on a schedule and tells you when external probes start failing. This is distinct from crash diagnostics: the watchdog watches the container process, while uptime monitoring watches whether the app is actually reachable from the internet (DNS, Caddy, TLS, 5xx). Both can fire for the same incident.
The probe fetches the app's primary URL (the verified custom domain if there is one, otherwise the *.percher.run subdomain) with an 8-second timeout. It sends a HEAD request and falls back to GET for hosts that answer HEAD with 405. Any 2xx or 3xx response counts as up; 4xx and 5xx responses and network errors count as down.
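The probe rules above can be sketched in a few lines of Python. This is a minimal illustration, not Percher's actual implementation: the function names (`is_up`, `fetch_status`, `probe`) are hypothetical, and it assumes redirects are not followed, so that a 3xx response is classified as a 3xx rather than by its final destination.

```python
import http.client
import urllib.error
import urllib.request

PROBE_TIMEOUT_SECONDS = 8  # timeout stated in the text


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 3xx is seen as a 3xx (assumption)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for the 3xx


_opener = urllib.request.build_opener(_NoRedirect)


def is_up(status_code):
    """2xx and 3xx count as up; 4xx, 5xx, and network errors (None) count as down."""
    return status_code is not None and 200 <= status_code < 400


def fetch_status(url, method):
    """Return the HTTP status code, or None on a network error or timeout."""
    request = urllib.request.Request(url, method=method)
    try:
        with _opener.open(request, timeout=PROBE_TIMEOUT_SECONDS) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 3xx, 4xx, and 5xx responses arrive here
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return None  # DNS failure, TLS error, refused connection, timeout


def probe(url, fetch=fetch_status):
    """HEAD first; retry with GET if the host answers HEAD with 405."""
    status = fetch(url, "HEAD")
    if status == 405:
        status = fetch(url, "GET")
    return status, is_up(status)
```

Calling `probe("https://my-app.percher.run")` returns a `(status, up)` pair. The `fetch` parameter exists only so the fallback logic can be exercised without network access.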
A single bad probe is not an outage. Three consecutive failures flip the app to down and fire app.unhealthy. A subsequent successful probe flips it back to up and fires app.recovered, but only on channels where the matching app.unhealthy was actually delivered (events are paired per channel). A 15-minute cooldown gates re-firing while the app is flapping. The Health tab shows live status, 24h/7d/30d uptime %, the latency sparkline, and a 7-day outage log.
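The transition rules above amount to a small state machine. The sketch below is one possible reading, not Percher's code: `AppHealth`, `record_probe`, and the `deliver` callback are hypothetical names, it tracks a single notification channel for brevity, and it assumes the cooldown suppresses a new app.unhealthy within 15 minutes of the previous one.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3        # consecutive failures before "down" (from the text)
COOLDOWN_SECONDS = 15 * 60   # cooldown length from the text


@dataclass
class AppHealth:
    status: str = "up"
    consecutive_failures: int = 0
    last_unhealthy_at: Optional[float] = None
    unhealthy_delivered: bool = False  # single channel, for simplicity


def record_probe(state, ok, now, deliver):
    """Apply one probe result. `deliver(event)` returns True if delivery succeeded."""
    if not ok:
        state.consecutive_failures += 1
        if state.status == "up" and state.consecutive_failures >= FAILURE_THRESHOLD:
            state.status = "down"
            # Assumed cooldown semantics: no new app.unhealthy within 15 minutes
            # of the previous one.
            in_cooldown = (
                state.last_unhealthy_at is not None
                and now - state.last_unhealthy_at < COOLDOWN_SECONDS
            )
            if not in_cooldown:
                state.unhealthy_delivered = deliver("app.unhealthy")
                state.last_unhealthy_at = now
        return

    state.consecutive_failures = 0
    if state.status == "down":
        state.status = "up"
        # Pairing: only send app.recovered if the matching app.unhealthy went out.
        if state.unhealthy_delivered:
            deliver("app.recovered")
            state.unhealthy_delivered = False
```

Under these assumptions, an app that flaps inside the cooldown window still changes status in `state`, but it produces no second pair of events until the window has passed.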
{
  "type": "app.unhealthy",
  "id": "wh_abc123",
  "timestamp": 1735689600000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "consecutiveFailures": 3,
    "statusCode": 502,
    "error": "bad gateway",
    "latencyMs": 8000
  }
}

{
  "type": "app.recovered",
  "id": "wh_def456",
  "timestamp": 1735689900000,
  "data": {
    "appId": "app_xyz",
    "appName": "my-app",
    "url": "https://my-app.percher.run",
    "statusCode": 200,
    "latencyMs": 47
  }
}
The Analytics tab's Top error paths panel shows which routes returned 4xx/5xx, sorted by 5xx count, with errorRate% computed from the per-route request total. The data is aggregated daily from Caddy access logs and re-summed across the selected window, so a route that fails consistently every day for a month rises to the top of the 30-day view even if it is never the worst on any single day. The panel updates every 15 minutes.
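The re-summing step for Top error paths can be illustrated with a short sketch. The row keys (`path`, `requests`, `count4xx`, `count5xx`) and the function name are hypothetical, and the error rate is assumed here to be (4xx + 5xx) divided by the route's total requests; the actual formula may differ.

```python
from collections import defaultdict


def top_error_paths(daily_rows):
    """Re-sum per-day, per-route counts over a window and rank by 5xx count.

    Each row is a dict with hypothetical keys: path, requests, count4xx, count5xx.
    """
    totals = defaultdict(lambda: {"requests": 0, "count4xx": 0, "count5xx": 0})
    for row in daily_rows:
        bucket = totals[row["path"]]
        for key in bucket:
            bucket[key] += row[key]

    ranked = []
    for path, t in totals.items():
        errors = t["count4xx"] + t["count5xx"]
        if errors == 0:
            continue  # only routes that returned 4xx/5xx appear in the panel
        # Assumed definition: all error responses over the route's request total.
        rate = 100.0 * errors / t["requests"] if t["requests"] else 0.0
        ranked.append({"path": path, **t, "errorRate": round(rate, 2)})

    ranked.sort(key=lambda r: r["count5xx"], reverse=True)
    return ranked
```

Because the counts are summed before ranking, a route with a steady trickle of 5xx responses every day can outrank routes that each had one bad day within the window.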