
Rate Limits

Rate limits are applied per client (by IP and, where relevant, by path) to protect the API and ensure fair usage. When a limit is exceeded, the API returns 429 Too Many Requests.

Limits by Endpoint

Endpoint / area        | Window     | Max requests | Notes
-----------------------|------------|--------------|------------------------------------------
Inference              | 1 minute   | 100          | /models/{id}/inference, etc.
Chat completions       | 5 minutes  | 20           | OpenAI-compatible /chat/completions
Batch inference        | 1 minute   | 20           | Each batch can contain up to 100 items
Training start         | 1 minute   | 10           | Starting new fine-tune jobs
Auth (login, signup)   | 15 minutes | 10           | Authentication attempts
General API            | 1 minute   | 60           | Other API routes
Upload                 | 1 minute   | 10           | File uploads
Explorer               | 1 minute   | 30           | Explorer endpoints
Contact form           | 1 hour     | 5            | Contact / support

Response When Rate Limited

When you exceed a limit, the API responds with:
  • Status: 429 Too Many Requests
  • Body: JSON with error, optional message, and retryAfter (seconds)
  • Headers:
    • Retry-After – Seconds to wait before retrying
    • X-RateLimit-Limit – Max requests in the window
    • X-RateLimit-Remaining – 0 when limited
    • X-RateLimit-Reset – Unix timestamp when the window resets
Example:
{
  "error": "Too many requests. Please slow down.",
  "message": "Too many requests. Currently, there are many requests being processed. Please try again later.",
  "retryAfter": 60
}
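
A minimal sketch of handling this response in TypeScript, assuming Node 18+ (global fetch) and a placeholder base URL; the endpoint path, header names, and body shape follow the table and example above, everything else is illustrative:

async function callInference(modelId: string, payload: unknown) {
  // Base URL is a placeholder; substitute your actual API host.
  const res = await fetch(`https://api.example.com/models/${modelId}/inference`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });

  if (res.status === 429) {
    // Body shape follows the example above: error, optional message, retryAfter (seconds).
    const body = await res.json();
    const retryAfter = Number(res.headers.get("Retry-After") ?? body.retryAfter ?? 60);
    const resetAt = res.headers.get("X-RateLimit-Reset"); // Unix timestamp of window reset
    throw new Error(`Rate limited (${body.error}); retry in ${retryAfter}s, window resets at ${resetAt}`);
  }

  return res.json();
}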

Best Practices

  1. Honor Retry-After – Wait at least that many seconds before retrying.
  2. Use exponential backoff – After repeated 429s, increase the delay between retries (see the sketch after this list).
  3. Batch when possible – Use the batch inference endpoint instead of many single requests.
  4. Cache – Cache responses where appropriate to reduce request volume.
  5. Monitor – Track 429 responses in your application to tune concurrency and batching.
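
A rough sketch of a retry helper that honors Retry-After and falls back to exponential backoff; the attempt cap and starting delay are illustrative assumptions, not API requirements:

async function withRetry(
  doRequest: () => Promise<Response>,
  maxAttempts = 5,        // illustrative cap, not an API requirement
): Promise<Response> {
  let backoffMs = 1_000;  // illustrative starting delay

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await doRequest();
    if (res.status !== 429) return res;

    // Honor Retry-After when present; otherwise fall back to the current backoff.
    const retryAfter = Number(res.headers.get("Retry-After"));
    const waitMs = retryAfter > 0 ? retryAfter * 1000 : backoffMs;

    await new Promise((resolve) => setTimeout(resolve, waitMs));
    backoffMs *= 2; // double the delay for the next attempt
  }

  throw new Error("Rate limited: retries exhausted");
}

Wrapping any request, e.g. withRetry(() => fetch(url, options)), applies the same behavior to every endpoint.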

Next Steps