The agent recovery contract

Percher errors include a typed nextAction so assistants can recover from deploy failures without parsing prose or guessing the next tool call.

Most deploy errors are written for humans:

Build failed: missing API key. Set OPENAI_API_KEY and redeploy.

A human understands that. An assistant can usually understand it too, until the wording changes, two keys are missing, the key is named oddly, or the suggested command does not match the tool it can call.

Percher does not ask the assistant to infer the next step from prose. Errors carry a recovery object.

The shape

Every Percher API error can include:

```ts
recovery: {
  nextAction:
    | "none"
    | "open_login"
    | "wait_auth"
    | "wait_deploy"
    | "run_doctor"
    | "set_env_vars"
    | "fix_problems"
    | "retry"
    | "fix_config"
    | "ask_user"
    | "inspect_build_log";

  args?: Record<string, unknown>;
  url?: string;
  prompt?: string;
}
```

The assistant handles this with a small switch. No regex. No guessing. No "read the paragraph and decide what the platform meant."
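Concretely, the switch looks something like this. The `Recovery` type restates the shape above; the handler functions are placeholders for whatever tools a given assistant actually exposes, not Percher's real tool surface:

```ts
type NextAction =
  | "none" | "open_login" | "wait_auth" | "wait_deploy" | "run_doctor"
  | "set_env_vars" | "fix_problems" | "retry" | "fix_config"
  | "ask_user" | "inspect_build_log";

interface Recovery {
  nextAction: NextAction;
  args?: Record<string, unknown>;
  url?: string;
  prompt?: string;
}

async function handleRecovery(r: Recovery): Promise<void> {
  switch (r.nextAction) {
    case "retry":        return retryLastCall();       // platform says the same call is safe
    case "set_env_vars": return setEnvVars(r.args);    // exact key names arrive in args
    case "wait_deploy":  return waitForDeploy(r.args); // deployId arrives in args
    case "open_login":   return openUrl(r.url!);       // url carries the login link
    case "ask_user":     return showUser(r.prompt!);   // prompt is ready-to-show text
    case "none":         return;                       // nothing to recover
    default:             return showUser(`Next step: ${r.nextAction}`);
  }
}

// Placeholder tool bindings; a real assistant wires these to its own tools.
async function retryLastCall(): Promise<void> { /* re-issue the previous call */ }
async function setEnvVars(args?: Record<string, unknown>): Promise<void> { /* env-var tool */ }
async function waitForDeploy(args?: Record<string, unknown>): Promise<void> { /* polling tool */ }
async function openUrl(url: string): Promise<void> { /* open in browser */ }
async function showUser(text: string): Promise<void> { console.log(text); }
```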

Why nextAction is an enum

Early versions used a free-form suggestion. That recreated the same problem in a nicer field. The assistant still had to read text and decide what to do.

An enum forces the platform to commit. If the platform says set_env_vars, the assistant sets env vars. If it says ask_user, the assistant asks the user. If it says retry, the same call is safe to retry.

The enum is intentionally small. Actions describe what the caller should do, not every reason the platform might have.

Args carry data

For missing env vars, Percher returns:

```json
{
  "code": "REQUIRED_ENV_MISSING",
  "message": "Required env vars unset: OPENAI_API_KEY, ANTHROPIC_API_KEY",
  "recovery": {
    "nextAction": "set_env_vars",
    "args": { "keys": ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"] }
  }
}
```

The keys are data, not words in a sentence. That matters for long names, public build vars, and multiple missing keys. The assistant asks for the values and calls the env-var tool with the exact names Percher returned.
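As a sketch, the set_env_vars branch might look like this, where `askUserForSecret` and `envVarTool` stand in for the assistant's real prompting and env-var tools:

```ts
// Sketch of the set_env_vars branch. The keys are taken straight from
// args; nothing here parses the human-readable message string.
async function onSetEnvVars(args: { keys: string[] }): Promise<void> {
  const values: Record<string, string> = {};
  for (const key of args.keys) {
    values[key] = await askUserForSecret(key); // collect a value per exact key name
  }
  await envVarTool(values); // set them under the names Percher returned
}

// Placeholders for the assistant's actual tools.
declare function askUserForSecret(key: string): Promise<string>;
declare function envVarTool(values: Record<string, string>): Promise<void>;
```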

Prompts are for the user

Sometimes the platform cannot continue. A quota is exhausted. A name is taken. Auth needs approval. In those cases nextAction is ask_user, and prompt is the text to show.

Example:

```json
{
  "code": "DAILY_QUOTA_EXCEEDED",
  "message": "Daily live-deploy limit reached (50/50). Resets at 2026-05-16T00:00:00Z.",
  "recovery": {
    "nextAction": "ask_user",
    "prompt": "You've hit your daily limit of 50 live deploys. The counter resets at 00:00 UTC. Try --preview to deploy without consuming the live quota, or upgrade your plan for a higher cap."
  }
}
```

The assistant should not retry. It should show the prompt and wait.
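In handler form the rule is short, and the important part is what is missing: there is no retry path. `showUser` is again a placeholder:

```ts
// Sketch of the ask_user branch: surface the prompt verbatim, then stop.
async function onAskUser(recovery: { prompt?: string }): Promise<void> {
  await showUser(recovery.prompt ?? "The platform needs your input to continue.");
  // Deliberately no retry: ask_user means the platform cannot proceed
  // without the user, so repeating the call would just fail again.
}

declare function showUser(text: string): Promise<void>;
```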

Waiting is a tool call too

Queued deploys return the deploy id in args:

```json
{
  "status": "queued",
  "deployId": "d_8475289abc",
  "recovery": {
    "nextAction": "wait_deploy",
    "args": { "deployId": "d_8475289abc", "mode": "live" }
  }
}
```

The assistant calls percher_wait_for_deploy with those args. It does not need to scrape the id from a status message or remember whether this was a live or preview deploy.
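Sketched out, the branch is a pass-through. The tool name `percher_wait_for_deploy` comes from the contract; the exact signature here is an assumption:

```ts
// Sketch of the wait_deploy branch: forward args untouched.
async function onWaitDeploy(args: { deployId: string; mode: string }): Promise<void> {
  await percher_wait_for_deploy(args); // id and mode come from args, not from prose
}

// Assumed signature for the waiting tool named above.
declare function percher_wait_for_deploy(args: { deployId: string; mode: string }): Promise<void>;
```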

Why normal error codes are not enough

HTTP status codes and platform-specific error codes tell you what happened. They usually do not tell a tool what to do next.

That difference shows up in recovery. With prose, each failure is a reading task. With a recovery contract, each failure is a dispatch table entry. The platform owns the semantics, and the assistant just follows the contract.

This is easier to test too. If DEPLOY_RATE_LIMITED should be retryable, the test pins nextAction: "retry". If DAILY_QUOTA_EXCEEDED should not be retryable, the test pins ask_user. The behavior lives in code, not prompt luck.
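A sketch of what those pins look like, using a vitest-style harness; the trigger fixtures are hypothetical:

```ts
import { describe, expect, it } from "vitest";

// Hypothetical fixtures that provoke each failure and return the error body.
declare function triggerRateLimit(): Promise<PercherError>;
declare function triggerQuotaExceeded(): Promise<PercherError>;

interface PercherError {
  code: string;
  recovery: { nextAction: string };
}

describe("recovery contract", () => {
  it("rate-limited deploys are retryable", async () => {
    const err = await triggerRateLimit();
    expect(err.code).toBe("DEPLOY_RATE_LIMITED");
    expect(err.recovery.nextAction).toBe("retry");
  });

  it("quota exhaustion goes to the user", async () => {
    const err = await triggerQuotaExceeded();
    expect(err.code).toBe("DAILY_QUOTA_EXCEEDED");
    expect(err.recovery.nextAction).toBe("ask_user");
  });
});
```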

A few things that turned out to matter

The biggest one is that the platform writes the action, not the assistant. Percher already knows which failures are transient, which ones need user input, and which ones need a source edit. Asking the model to rediscover that from a sentence was wasted work.

Keeping the enum small mattered more than we expected. Every new action is integration cost for every caller, so most "new cases" end up being ask_user with a sharper prompt. We add to the enum reluctantly.

And the boring stuff — tool args go in args, URLs go in url, user-facing text goes in prompt — is what keeps the contract honest. The moment those blur together, the assistant is back to parsing English.

None of this is a theory of agents. It is plumbing. But it is the kind of plumbing that decides whether a failed deploy turns into a fix or a confused chat transcript.