Managed AI inference

Keyless LLM calls from your app — no API key, billed to your plan

Apps with a PocketBase sidecar can call LLMs through Percher's managed inference proxy — there's no API key to mint, store, or leak. The capsule template ships it pre-wired as src/lib/ai.ts:

import { complete, streamComplete } from "./lib/ai";

const answer = await complete("Summarize this note: " + note);
// or, for a typewriter UI:
await streamComplete(prompt, (token) => output.append(token));

How it works

  • The browser calls /capsule/<app>/ai with the signed-in user's PocketBase session token. Percher validates the session against your app's PocketBase — only signed-in users of your app can spend.
  • The request is routed through OpenRouter with zero data retention enforced: prompts and completions aren't stored or used for training.
  • Spend is metered against the app owner's plan with a per-day cap that resets at UTC midnight: Free $0.05, Starter $0.50, Maker $2, Pro $10. When the cap is hit, calls return HTTP 429 with the reset time.
  • Rate limits: 30 requests/min per app, 20 requests/min per user. Output is capped at 2,000 tokens per request.
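
The daily caps and the UTC-midnight reset can be mirrored on the client, for example to show users when AI features will be available again. The sketch below is illustrative only: the names Plan, DAILY_CAP_USD, nextUtcMidnight, and msUntilReset are hypothetical and not part of the capsule template. The 429 response already carries the reset time, but its format isn't described here, so this computes the reset locally instead of parsing the response.

```typescript
// Plan tiers and per-day caps in USD, copied from the list above.
// The lowercase keys are an assumption made for this sketch.
type Plan = "free" | "starter" | "maker" | "pro";

const DAILY_CAP_USD: Record<Plan, number> = {
  free: 0.05,
  starter: 0.5,
  maker: 2,
  pro: 10,
};

// Next UTC midnight strictly after `now`, i.e. when the daily cap resets.
// Date.UTC normalizes day overflow, so "day + 1" rolls over months and years.
function nextUtcMidnight(now: Date): Date {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1)
  );
}

// Hypothetical helper: milliseconds until the cap resets.
function msUntilReset(now: Date): number {
  return nextUtcMidnight(now).getTime() - now.getTime();
}
```

Note that a 429 could also come from the per-minute rate limits, in which case waiting until midnight would be far too conservative; prefer the reset time in the response when it can be read.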

Models

Three models are on the allowlist: openai/gpt-4o-mini (default), openai/gpt-4o, and anthropic/claude-3.5-haiku. Pass model in the request body to pick one; anything off the allowlist is rejected rather than billed blind.
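
Because off-list models are rejected by the proxy, it can be convenient to catch a typo before the request leaves the app. The following is a minimal client-side sketch, assuming the three model ids above are the complete allowlist; ALLOWED_MODELS, isAllowedModel, and resolveModel are hypothetical names, not exports of src/lib/ai.ts.

```typescript
// Model ids copied from the allowlist above.
const ALLOWED_MODELS = [
  "openai/gpt-4o-mini",
  "openai/gpt-4o",
  "anthropic/claude-3.5-haiku",
] as const;

type AllowedModel = (typeof ALLOWED_MODELS)[number];

const DEFAULT_MODEL: AllowedModel = "openai/gpt-4o-mini";

// Type guard: narrows an arbitrary string to an allowlisted model id.
function isAllowedModel(id: string): id is AllowedModel {
  return ALLOWED_MODELS.some((m) => m === id);
}

// Hypothetical helper: returns the default when no model is requested,
// and throws locally for ids the proxy would reject anyway.
function resolveModel(requested?: string): AllowedModel {
  if (requested === undefined) return DEFAULT_MODEL;
  if (!isAllowedModel(requested)) {
    throw new Error(`Model not on the allowlist: ${requested}`);
  }
  return requested;
}
```

The server remains the source of truth; a local check like this only saves a round trip and needs updating if the allowlist changes.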

Without the capsule template

Any app with [data] mode = "pocketbase" and signed-in users can call the endpoint directly: POST https://api.percher.run/capsule/<app>/ai with the PocketBase auth token in the Authorization header and an OpenAI-style messages array.
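
A direct call might be assembled roughly as below. This is a sketch under stated assumptions, not the template's implementation: buildAiRequest, ChatMessage, and AiRequest are hypothetical names; the token is sent as-is because the text doesn't say whether a "Bearer " prefix is expected; and the response is left unparsed because its shape isn't described here.

```typescript
// OpenAI-style chat message.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AiRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch options for the endpoint.
function buildAiRequest(
  app: string,
  token: string,
  messages: ChatMessage[],
  model?: string
): AiRequest {
  // `model` is optional; omitting it leaves the choice to the proxy's default.
  const body: { messages: ChatMessage[]; model?: string } = { messages };
  if (model !== undefined) body.model = model;

  return {
    url: `https://api.percher.run/capsule/${encodeURIComponent(app)}/ai`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumption: the raw PocketBase auth token, with no scheme prefix.
        Authorization: token,
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (in a browser or any runtime with fetch):
//   const { url, init } = buildAiRequest("my-app", pbToken, [
//     { role: "user", content: "Summarize this note: ..." },
//   ]);
//   const res = await fetch(url, init);
//   if (res.status === 429) { /* cap or rate limit hit; back off */ }
```

Keeping request construction separate from fetch makes the helper easy to test without network access. Here "my-app" and pbToken are placeholders for your app name and the signed-in user's PocketBase token.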