
Errors and rate limits

Rate limits

The API applies per-key rate limits to protect stability. Limits vary by endpoint (e.g. inference vs batch). When you exceed the limit:
  • Status code: 429 Too Many Requests
  • Response: JSON with an error message; some responses include retryAfter (seconds).
Recommended behavior:
  1. Respect Retry-After (or the retryAfter field) if present; otherwise use exponential backoff, as in the sketch after this list.
  2. Cache responses where it makes sense to reduce calls.
  3. For bulk work, use the batch inference endpoint where applicable.
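
A minimal retry sketch in Python using the requests library. The endpoint URL is a placeholder (this section does not specify the path), and checking a Retry-After header is a common convention assumed here alongside the documented retryAfter field:

import random
import time

import requests

API_URL = "https://api.example.com/v1/inference"   # placeholder URL for illustration
HEADERS = {"Authorization": "Bearer sk_..."}        # your API key

def post_with_backoff(payload, max_retries=5):
    """POST with exponential backoff, honoring the server-suggested wait on 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
        if resp.status_code != 429:
            return resp
        # Prefer the documented retryAfter field; fall back to a Retry-After
        # header if one is sent (assumption), then to exponential backoff.
        try:
            retry_after = resp.json().get("retryAfter")
        except ValueError:
            retry_after = None
        retry_after = retry_after or resp.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay + random.uniform(0, 0.5)
        time.sleep(wait)
        delay *= 2  # double the fallback wait for the next attempt
    return resp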

Common inference errors

Status | Meaning | What to do
400 | Bad request (e.g. missing model or messages) | Fix the request body and retry.
401 | Invalid or missing API key | Check Authorization: Bearer sk_... and key validity in the dashboard.
402 | Insufficient balance | Add funds in the dashboard and retry.
404 | Model or resource not found | Use a valid model ID from the models list.
429 | Rate limited | Back off and retry after the indicated time.
500 / 502 | Server or upstream error | Retry with backoff; contact support if it persists.
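
One way to act on these codes in Python; the exception choices are illustrative, not part of the API, and resp is assumed to be a requests.Response:

def raise_for_inference_error(resp):
    """Translate the status codes above into the recommended actions (sketch)."""
    if resp.ok:
        return resp.json()
    if resp.status_code == 400:
        raise ValueError("Bad request: fix the model/messages fields and retry.")
    if resp.status_code == 401:
        raise PermissionError("Invalid or missing API key: check the Authorization header.")
    if resp.status_code == 402:
        raise RuntimeError("Insufficient balance: add funds in the dashboard.")
    if resp.status_code == 404:
        raise LookupError("Model or resource not found: use a valid model ID.")
    if resp.status_code == 429:
        raise RuntimeError("Rate limited: back off and retry after the indicated time.")
    resp.raise_for_status()  # 500 / 502 and anything else: retry with backoff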

Error response format

Errors return JSON with at least:
{
  "error": "Short error code or description",
  "message": "Human-readable details (optional)"
}
For 429, you may see:
{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please slow down.",
  "retryAfter": 60
}
Use the error and message fields to show clear feedback (e.g. “Add funds” for 402, “Sign in or check API key” for 401).
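
One possible mapping from the error body to user-facing feedback, following the examples above; the wording is illustrative:

def user_feedback(status_code, body):
    """Build a user-facing message from the error JSON (illustrative)."""
    error = body.get("error", "unknown_error")
    message = body.get("message", "")
    if status_code == 401:
        return "Sign in or check your API key."
    if status_code == 402:
        return "Add funds to continue."
    if status_code == 429:
        retry_after = body.get("retryAfter")
        if retry_after:
            return f"Too many requests. Try again in {retry_after} seconds."
        return message or "Too many requests. Please slow down."
    return message or error

# Example: user_feedback(429, {"error": "rate_limit_exceeded", "retryAfter": 60})
# -> "Too many requests. Try again in 60 seconds."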

Usage tracking

Every successful inference request is recorded for usage and billing. Failed requests (4xx/5xx) are not charged. See Billing & usage and Errors & status codes.