
Errors and rate limits

Rate limits

The API applies per-key rate limits to protect stability. Limits vary by endpoint (e.g. inference vs batch). When you exceed the limit:
  • Status code: 429 Too Many Requests
  • Response: JSON with an error message; some responses include retryAfter (seconds).
Recommended behavior:
  1. Respect Retry-After (or the retryAfter field) if present; otherwise use exponential backoff, as in the sketch after this list.
  2. Cache responses where it makes sense to reduce calls.
  3. For bulk work, use the batch inference endpoint where applicable.
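
A minimal retry sketch in Python using the requests library. The endpoint URL is a placeholder (this section does not specify the path), and checking a Retry-After header is a common convention assumed here alongside the documented retryAfter field:

import random
import time

import requests

API_URL = "https://api.example.com/v1/inference"   # placeholder URL for illustration
HEADERS = {"Authorization": "Bearer sk_..."}        # your API key

def post_with_backoff(payload, max_retries=5):
    """POST with exponential backoff, honoring the server-suggested wait on 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
        if resp.status_code != 429:
            return resp
        # Prefer the documented retryAfter field; fall back to a Retry-After
        # header if one is sent (assumption), then to exponential backoff.
        try:
            retry_after = resp.json().get("retryAfter")
        except ValueError:
            retry_after = None
        retry_after = retry_after or resp.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay + random.uniform(0, 0.5)
        time.sleep(wait)
        delay *= 2  # double the fallback wait for the next attempt
    return resp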

Common inference errors

Status | Meaning | What to do
400 | Bad request (e.g. missing model or messages) | Fix the request body and retry.
401 | Invalid or missing API key | Check Authorization: Bearer sk_... and key validity in the dashboard.
402 | Insufficient balance | Add funds in the dashboard and retry.
404 | Model or resource not found | Use a valid model ID from the models list.
429 | Rate limited | Back off and retry after the indicated time.
500 / 502 | Server or upstream error | Retry with backoff; contact support if it persists.
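
One way to act on these codes in Python; the exception choices are illustrative, not part of the API, and resp is assumed to be a requests.Response:

def raise_for_inference_error(resp):
    """Translate the status codes above into the recommended actions (sketch)."""
    if resp.ok:
        return resp.json()
    if resp.status_code == 400:
        raise ValueError("Bad request: fix the model/messages fields and retry.")
    if resp.status_code == 401:
        raise PermissionError("Invalid or missing API key: check the Authorization header.")
    if resp.status_code == 402:
        raise RuntimeError("Insufficient balance: add funds in the dashboard.")
    if resp.status_code == 404:
        raise LookupError("Model or resource not found: use a valid model ID.")
    if resp.status_code == 429:
        raise RuntimeError("Rate limited: back off and retry after the indicated time.")
    resp.raise_for_status()  # 500 / 502 and anything else: retry with backoff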

Error response format

Errors return JSON with at least:
{
  "error": "Short error code or description",
  "message": "Human-readable details (optional)"
}
For 429, you may see:
{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please slow down.",
  "retryAfter": 60
}
Use the error and message fields to show clear feedback (e.g. “Add funds” for 402, “Sign in or check API key” for 401).
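
One possible mapping from the error body to user-facing feedback, following the examples above; the wording is illustrative:

def user_feedback(status_code, body):
    """Build a user-facing message from the error JSON (illustrative)."""
    error = body.get("error", "unknown_error")
    message = body.get("message", "")
    if status_code == 401:
        return "Sign in or check your API key."
    if status_code == 402:
        return "Add funds to continue."
    if status_code == 429:
        retry_after = body.get("retryAfter")
        if retry_after:
            return f"Too many requests. Try again in {retry_after} seconds."
        return message or "Too many requests. Please slow down."
    return message or error

# Example: user_feedback(429, {"error": "rate_limit_exceeded", "retryAfter": 60})
# -> "Too many requests. Try again in 60 seconds."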

Usage tracking

Every successful inference request is recorded for usage and billing. Failed requests (4xx/5xx) are not charged. See Billing & usage and Errors & status codes.