Rate limits
A per-key throttle protects the platform from runaway agents and noisy neighbors. Headers on every response tell you exactly how much budget you have left.
The numbers
- 50 requests per second per agent key, counted across every endpoint under /api/agent/v1/*. Fixed 1-second window: the counter resets every second, not on a sliding clock.
- Per-key cap. The bucket is keyed by agent key id, not user. Two keys on the same user have independent budgets, which makes it safe to dedicate a key to a noisy backfill.
- The per-key policy daily call cap is independent: it is enforced per key on top of the per-second throttle. See Per-key policy constraints.
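To make the fixed-window, per-key behavior concrete, here is a minimal in-memory sketch. It is an illustration only, not the platform's implementation: the class `FixedWindowCounter` and its `hit` method are hypothetical names, and it assumes windows are aligned to whole seconds, which the text above does not confirm. Time is passed in explicitly so the reset is easy to follow.

```typescript
// Illustrative fixed-window counter; hypothetical names, not the platform's code.
class FixedWindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  // One independent bucket per agent key id.
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number = 50, windowMs: number = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records one request for `keyId` at time `nowMs` and reports whether it fits the budget.
  hit(keyId: string, nowMs: number): { allowed: boolean; remaining: number } {
    // Assumption: windows start on multiples of windowMs rather than at the first request.
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let entry = this.windows.get(keyId);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window began: the counter starts again from zero.
      entry = { windowStart, count: 0 };
      this.windows.set(keyId, entry);
    }
    if (entry.count >= this.limit) {
      return { allowed: false, remaining: 0 };
    }
    entry.count += 1;
    return { allowed: true, remaining: this.limit - entry.count };
  }
}
```

Because each key id has its own entry, exhausting one key's budget leaves every other key unaffected, which is the property that makes a dedicated backfill key safe.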
Headers on every response
Every response — success or error — carries three headers so you don’t have to model the budget on your side.
HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1
- X-RateLimit-Limit is the total budget per second.
- X-RateLimit-Remaining is the number of calls left in the current window.
- X-RateLimit-Reset is the number of seconds until the bucket resets. Always 1 today, but treat it as authoritative for future changes.
whoami echoes the same numbers in its response body under rateLimit so an agent can probe its budget without consuming a non-trivial call.
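As a rough sketch of reading these three headers on the client, the helper below parses them into numbers. The names `HeaderSource`, `RateLimitInfo`, and `readRateLimit` are hypothetical; `HeaderSource` is just the small part of the Fetch API `Headers` interface the helper needs, so a `response.headers` object from `fetch` should fit.

```typescript
// Minimal shape satisfied by the Fetch API `Headers` object.
interface HeaderSource {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number;        // X-RateLimit-Limit
  remaining: number;    // X-RateLimit-Remaining
  resetSeconds: number; // X-RateLimit-Reset
}

// Returns null when any of the three headers is missing or not an integer.
function readRateLimit(headers: HeaderSource): RateLimitInfo | null {
  const parse = (name: string): number => {
    const raw = headers.get(name);
    return raw === null ? Number.NaN : Number.parseInt(raw, 10);
  };
  const limit = parse("X-RateLimit-Limit");
  const remaining = parse("X-RateLimit-Remaining");
  const resetSeconds = parse("X-RateLimit-Reset");
  if ([limit, remaining, resetSeconds].some((n) => Number.isNaN(n))) {
    return null;
  }
  return { limit, remaining, resetSeconds };
}
```

A caller could check `remaining === 0` before queuing the next request and wait `resetSeconds` instead of sending a call that is likely to be rejected.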
When you hit the limit
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1
Retry-After: 1

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests on this agent key. Retry after the window resets.",
  "limit": 50,
  "resetSeconds": 1
}
We surface Retry-After in seconds. Honor it; immediate retries will just 429 again and burn your budget for the next window.
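One way to honor Retry-After is to turn the header value into a wait time before retrying. The helper below is a minimal sketch: `retryAfterMs` is a hypothetical name, and it assumes the value is a number of seconds, as described above, falling back to a caller-supplied default when the header is missing or unparseable.

```typescript
// Converts a Retry-After header value (in seconds) into milliseconds to wait.
// Falls back to `fallbackMs` when the header is absent, empty, negative, or not numeric.
function retryAfterMs(headerValue: string | null, fallbackMs: number = 1000): number {
  if (headerValue === null) return fallbackMs;
  const trimmed = headerValue.trim();
  if (trimmed === "") return fallbackMs;
  const seconds = Number(trimmed);
  if (!Number.isFinite(seconds) || seconds < 0) return fallbackMs;
  return Math.ceil(seconds * 1000);
}
```

On a 429, a client could sleep for `retryAfterMs(response.headers.get("Retry-After"))` milliseconds before the next attempt rather than retrying immediately.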
Backoff pattern
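// Retries fn on 429 and 5xx responses with exponential backoff plus jitter.
// Assumes fn rejects with an error that carries the HTTP status in `status`.
async function withBackoff<T>(fn: () => Promise<T>): Promise<T> {
  let delay = 250; // base delay in milliseconds
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await fn();
    } catch (e) {
      const err = e as { status?: number };
      if (err.status === 429 || (err.status ?? 0) >= 500) {
        // Random jitter of up to one extra `delay` keeps clients from retrying in lockstep.
        const jitter = Math.random() * delay;
        await new Promise((r) => setTimeout(r, delay + jitter));
        delay = Math.min(delay * 2, 10_000); // double the base delay, capped at 10 s
        continue;
      }
      throw e; // any other error is not retried
    }
  }
  throw new Error('giving up after 5 attempts');
}
Patterns that hurt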
- Polling /audit in a hot loop. One call every 5–10 seconds is plenty. For real-time, subscribe to webhooks instead.
- Calling /mandate per trade. Read it once at session start (or pull the ac://mandate/current MCP resource); the response is current to the most recent IPS update.
- Validating one trade at a time. If you have N candidates, batch them through POST /policy/validate-batch: up to 25 per call, one round-trip, one rate-limit hit.
- Probing every possible amount. If you’re walking down sizes to find what the policy allows, jump in larger steps. The walk in our drawdown-response recipe does ~5 probes total, not 50.
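For the batching advice above, the grouping step could look like the sketch below. `chunk` is a hypothetical helper, and the request body for POST /policy/validate-batch is not shown in this section, so the example stops at splitting candidates into groups of at most 25.

```typescript
// Splits `items` into consecutive groups of at most `size` elements (25 by default,
// matching the per-call maximum mentioned above).
function chunk<T>(items: readonly T[], size: number = 25): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}
```

With 60 candidates this yields three groups, so validation would cost three rate-limit hits instead of 60.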
If the Redis backend is unavailable
The throttle uses Redis to count. If Redis is down, the guard fails open — your requests still succeed, but the X-RateLimit-Remaining header will report 50 on every call. We prefer a brief loss of enforcement to dropping legitimate traffic.
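The fail-open behavior can be pictured with a small sketch. This is not the platform's code: `guard`, `Decision`, and `incrementAndGet` are hypothetical names, and the counter call is kept synchronous for brevity even though a real Redis client would be asynchronous.

```typescript
// Illustrative fail-open guard; hypothetical names, not the platform's implementation.
type Decision = { allowed: boolean; remaining: number };

function guard(
  keyId: string,
  incrementAndGet: (keyId: string) => number, // stand-in for the store call; may throw
  limit: number = 50,
): Decision {
  try {
    // Number of requests seen in the current window, including this one.
    const count = incrementAndGet(keyId);
    return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
  } catch {
    // Store unreachable: let the request through and report the full budget.
    return { allowed: true, remaining: limit };
  }
}
```

The catch branch is why a client would see a remaining budget of 50 on every call during an outage: with no counter available, the guard has nothing to subtract.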