# Inference API Guide
LLMTune provides an OpenAI-compatible chat completion API for deployed models. This section shows how to call it with cURL, JavaScript, and Python.

## Endpoint

Replace `{deployment_id}` with your deployed model ID.
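The route below is a sketch assuming the OpenAI-style path convention; `api.llmtune.example` is a placeholder for your actual base URL:

```
POST https://api.llmtune.example/v1/deployments/{deployment_id}/chat/completions
```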
## Authentication

Include a Bearer token created under API Keys.

## Request Body
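A minimal sketch of a request body, assuming the OpenAI chat completion message format; only `messages` is required (see the table below):

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this ticket in one sentence."}
  ]
}
```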
## Available Parameters
| Field | Required | Description |
|---|---|---|
| `messages` | Yes | Array of chat messages |
| `temperature` | No | Randomness control (default 0.7) |
| `max_tokens` | No | Maximum output tokens (default 512) |
| `stream` | No | Enable SSE streaming |
| `metadata` | No | Arbitrary JSON metadata for observability |
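For illustration, a body exercising the optional fields might look like this; the `metadata` keys here are hypothetical and entirely up to you:

```json
{
  "messages": [{"role": "user", "content": "Hello!"}],
  "temperature": 0.2,
  "max_tokens": 256,
  "metadata": {"request_source": "docs-example", "trace_id": "abc123"}
}
```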
## Responses

Successful requests return an OpenAI-style chat completion object. Error responses are JSON objects with `error` and `message` fields.
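An illustrative error body; the specific `error` code and `message` text are assumed values:

```json
{
  "error": "invalid_request",
  "message": "messages must be a non-empty array"
}
```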
## cURL Example
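A sketch of a non-streaming request, assuming the placeholder base URL from the Endpoint section and an API key exported as `LLMTUNE_API_KEY`:

```bash
curl "https://api.llmtune.example/v1/deployments/{deployment_id}/chat/completions" \
  -H "Authorization: Bearer $LLMTUNE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```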
## JavaScript (Fetch)
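The same call with `fetch`, assuming Node 18+ (or any environment with a global `fetch`) and an OpenAI-style response shape:

```javascript
const deploymentId = "dep_123"; // hypothetical deployment ID
const apiKey = process.env.LLMTUNE_API_KEY;

async function chat() {
  const res = await fetch(
    `https://api.llmtune.example/v1/deployments/${deploymentId}/chat/completions`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: "Hello!" }],
        max_tokens: 512,
      }),
    }
  );
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  const data = await res.json();
  // Assumes an OpenAI-compatible completion object.
  console.log(data.choices[0].message.content);
}

chat();
```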
## Python (requests)
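And with `requests`, under the same assumptions about the base URL and response shape:

```python
import os

import requests

# Placeholder base URL and deployment ID.
url = "https://api.llmtune.example/v1/deployments/dep_123/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['LLMTUNE_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
}

resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
# Assumes an OpenAI-compatible completion object.
print(resp.json()["choices"][0]["message"]["content"])
```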
## Streaming

To enable streaming responses, set `"stream": true` in the request and consume the SSE stream on the client side. Each event carries a partial token output until `data: [DONE]` is emitted, as in the sketch below.
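A sketch of consuming the stream with Python `requests`, assuming each SSE event is a `data: ` line carrying an OpenAI-style delta chunk:

```python
import json
import os

import requests

# Placeholder base URL and deployment ID.
url = "https://api.llmtune.example/v1/deployments/dep_123/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['LLMTUNE_API_KEY']}"}
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(url, headers=headers, json=payload, stream=True, timeout=30) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Skip keep-alives and anything that is not an SSE data line.
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumes OpenAI-style streaming chunks with a delta field.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```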