Errors and rate limits
Rate limits
The API applies per-key rate limits to protect stability. Limits vary by endpoint (e.g. inference vs batch). When you exceed the limit:
- Status code: 429 Too Many Requests
- Response: JSON with an `error` message; some responses include `retryAfter` (seconds).

To handle and reduce rate limiting:
- Respect `Retry-After` if present; otherwise use exponential backoff.
- Cache responses where it makes sense to reduce calls.
- For bulk work, use the batch inference endpoint where applicable.
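The retry guidance above can be sketched roughly as follows. This is a minimal illustration, not an official client: `retry_delay` and `call_with_retries` are hypothetical helpers, and `send` stands in for whatever function performs the HTTP request and returns a `(status, headers, body)` tuple. The sketch assumes the hint arrives either as a `Retry-After` header or as a `retryAfter` field in the JSON body, in seconds, and falls back to exponential backoff with jitter otherwise.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, parsed JSON body) -- an assumed shape
Response = Tuple[int, Mapping[str, str], dict]


def retry_delay(attempt: int, headers: Mapping[str, str], body: dict,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0)."""
    # Prefer the server's hint: the header first, then the body field.
    # Note: a plain dict lookup is case-sensitive; real HTTP clients
    # usually expose case-insensitive headers.
    hint = headers.get("Retry-After", body.get("retryAfter"))
    try:
        return max(0.0, float(hint))
    except (TypeError, ValueError):
        pass  # hint missing, or not a plain number of seconds
    # No usable hint: exponential backoff with full jitter, capped.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it is not rate limited or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(retry_delay(attempt, headers, body))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. The jitter spreads retries out so that many clients sharing a key do not all retry at the same instant.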
Common inference errors
| Status | Meaning | What to do |
|---|---|---|
| 400 | Bad request (e.g. missing model or messages) | Fix the request body and retry. |
| 401 | Invalid or missing API key | Check Authorization: Bearer sk_... and key validity in the dashboard. |
| 402 | Insufficient balance | Add funds in the dashboard and retry. |
| 404 | Model or resource not found | Use a valid model ID from the models list. |
| 429 | Rate limited | Back off and retry after the indicated time. |
| 500 / 502 | Server or upstream error | Retry with backoff; contact support if it persists. |
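One way to act on the table in client code is to sort status codes into those worth retrying automatically and those that need a change on the caller's side first. The sketch below is illustrative only; `classify` and the two sets are hypothetical names, and the grouping simply mirrors the rows above.

```python
# Statuses the table says to retry with backoff.
RETRYABLE = frozenset({429, 500, 502})
# Statuses that need a fix first (request body, API key, balance, model ID).
NEEDS_CALLER_FIX = frozenset({400, 401, 402, 404})


def classify(status: int) -> str:
    """Return 'ok', 'retry', 'fix', or 'unknown' for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in NEEDS_CALLER_FIX:
        return "fix"
    return "unknown"  # not covered by the table; handle conservatively
```

Note that 402 lands in the "fix" group: the request can be retried, but only after funds have been added, so an automatic retry loop would not help.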
Error response format
Errors return JSON with at least `error` and `message` fields. For 429, you may also see `retryAfter` (seconds).
Use the `error` and `message` fields to show clear feedback (e.g. “Add funds” for 402, “Sign in or check API key” for 401).
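As a rough sketch of turning an error response into user-facing feedback: `user_feedback` and the `FEEDBACK` mapping are hypothetical, and the code assumes the body is a JSON object with `error` and `message` fields plus an optional numeric `retryAfter`. It falls back to a generic message when the body is missing or not valid JSON.

```python
import json

# Fixed, friendly prompts for statuses the user can act on directly.
FEEDBACK = {
    401: "Sign in or check API key",
    402: "Add funds",
}


def user_feedback(status: int, raw_body: str) -> str:
    """Build a short message for the user from an error response."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    if status in FEEDBACK:
        return FEEDBACK[status]

    # Prefer the human-readable message, then the error field.
    text = payload.get("message") or payload.get("error")
    message = str(text) if text else f"Request failed with status {status}"

    retry_after = payload.get("retryAfter")
    if status == 429 and isinstance(retry_after, (int, float)):
        message += f" (retry in {retry_after} s)"
    return message
```

Keeping the fixed prompts in one mapping makes it easy to adjust the wording without touching the parsing logic.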