Inference API Guide
LLMTune provides an OpenAI-compatible chat completion API for deployed models. This guide shows how to call it with cURL, JavaScript, and Python.
Base URL
The public LLMTune API is available at `https://api.llmtune.io/v1`.
Note: The in-app routes under `https://llmtune.io/api/...` are used by the LLMTune web application. For external integrations, use the `https://api.llmtune.io/v1` base URL.
Endpoint
Replace `{modelId}` in the endpoint path with your deployed model ID (e.g., `meta-llama/Llama-3.3-70B-Instruct` or your fine-tuned model ID).
Authentication
Include a Bearer token, created under API Keys in the LLMTune dashboard, in the `Authorization` header of every request.
Request Body
Available Parameters
| Field | Required | Description | Default |
|---|---|---|---|
| `prompt` | Yes | Input prompt string | - |
| `temperature` | No | Sampling temperature (0–2); higher values are more creative | `0.7` |
| `maxTokens` | No | Maximum number of output tokens | `1024` |
| `topP` | No | Nucleus sampling parameter | `1.0` |
| `topK` | No | Top-K sampling limit | `50` |
| `metadata` | No | Arbitrary JSON metadata for observability | `null` |
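As a sketch, a request body using the parameters above might look like this (all field values are illustrative):

```json
{
  "prompt": "Summarize the benefits of fine-tuning in two sentences.",
  "temperature": 0.7,
  "maxTokens": 256,
  "topP": 1.0,
  "topK": 50,
  "metadata": { "requestId": "demo-123" }
}
```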
Response Format
Error responses include `error` and `message` fields:
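For example, a hypothetical error payload (the exact shape may differ for your deployment) could look like:

```json
{
  "error": "model_not_found",
  "message": "No deployed model matches the given ID."
}
```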
cURL Example
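A minimal cURL sketch, assuming the inference endpoint lives at `/models/{modelId}/infer` under the base URL (the exact path is an assumption; replace `YOUR_API_KEY` with a key from the dashboard):

```shell
curl https://api.llmtune.io/v1/models/meta-llama/Llama-3.3-70B-Instruct/infer \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Write a haiku about fine-tuning.",
        "temperature": 0.7,
        "maxTokens": 128
      }'
```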
JavaScript (Fetch)
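A sketch using the global `fetch` available in browsers and Node 18+. The `/models/{modelId}/infer` path and the `buildRequest`/`infer` helper names are assumptions for illustration:

```javascript
// Build and send an inference request with fetch.
// The /models/{modelId}/infer path is an assumption; adjust to your deployment.
const BASE_URL = "https://api.llmtune.io/v1";

function buildRequest(apiKey, modelId, prompt, options = {}) {
  // Field names follow the parameter table above (camelCase keys).
  return {
    url: `${BASE_URL}/models/${modelId}/infer`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt, temperature: 0.7, maxTokens: 1024, ...options }),
    },
  };
}

async function infer(apiKey, modelId, prompt, options) {
  const { url, init } = buildRequest(apiKey, modelId, prompt, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    // Surface 401/402/404/429/500 responses as errors (see Error Handling below).
    throw new Error(`LLMTune API error ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

Spreading `options` last lets callers override the `temperature` and `maxTokens` defaults per call.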
Python (requests)
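The same call with the `requests` library. As above, the endpoint path and the `build_payload`/`infer` helper names are illustrative assumptions:

```python
# Sketch of calling the inference API with requests.
# The /models/{model_id}/infer path is an assumption; adjust to your deployment.
import requests

BASE_URL = "https://api.llmtune.io/v1"


def build_payload(prompt, temperature=0.7, max_tokens=1024, **extra):
    # Field names follow the parameter table above (note the camelCase keys).
    return {"prompt": prompt, "temperature": temperature, "maxTokens": max_tokens, **extra}


def infer(api_key, model_id, prompt, **params):
    resp = requests.post(
        f"{BASE_URL}/models/{model_id}/infer",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(prompt, **params),
        timeout=60,
    )
    resp.raise_for_status()  # raises on 401/402/404/429/500 (see Error Handling)
    return resp.json()
```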
Playground Inference
For quick smoke tests, use the playground endpoint.
Batch Inference
Submit up to 100 inference jobs per call.
Error Handling
Common error codes:
- `401 Unauthorized` – Invalid or missing API key
- `402 Payment Required` – Insufficient credits
- `404 Not Found` – Model or job ID not found
- `429 Rate Limited` – Rate limit exceeded (check the `Retry-After` header)
- `500 Server Error` – Unexpected issue (retry with exponential backoff)
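The retry guidance above can be sketched as a small helper that honors `Retry-After` on 429 and applies exponential backoff on 5xx responses (the delay constants and helper names are illustrative):

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    # Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at `cap`.
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)


def call_with_retries(send_request, max_attempts=5):
    """send_request() returns a requests-style response with .status_code/.headers."""
    for attempt in range(max_attempts):
        resp = send_request()
        if resp.status_code == 429:
            # Prefer the server's Retry-After header when present.
            delay = float(resp.headers.get("Retry-After", backoff_delay(attempt)))
        elif resp.status_code >= 500:
            delay = backoff_delay(attempt)
        else:
            # Success, or a non-retryable client error (401/402/404).
            return resp
        time.sleep(delay)
    return resp
```

Non-retryable errors (401, 402, 404) are returned immediately so the caller can fix the key, credits, or model ID rather than retrying.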
Rate Limits
Rate limits vary by plan:
- Sandbox – Lower limits for experimentation
- Growth / Production – Higher limits for production traffic
- Enterprise – Custom limits and SLAs
Next Steps
- Review the API Overview for complete endpoint documentation
- Check API Endpoints for all available endpoints
- Visit the in-app API Docs for live examples
- Set up Webhooks for automation