# Chat completions
The chat completions endpoint accepts a list of messages and returns the model’s reply. It is compatible with the OpenAI chat API shape.

## Endpoint
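Assuming the standard OpenAI-compatible route (the exact path was not shown here, so confirm it against your deployment):

```
POST /v1/chat/completions
```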
## Request body
| Field | Required | Type | Description |
|---|---|---|---|
| `model` | Yes | string | Model ID (e.g. from the models list). |
| `messages` | Yes | array | Array of `{ role, content }` objects. |
| `temperature` | No | number | Sampling temperature. Default 0.7. |
| `max_tokens` | No | number | Maximum number of output tokens. |
| `stream` | No | boolean | Set `true` for a streaming response. |
## Message roles

- `system`: system instruction (optional).
- `user`: user message.
- `assistant`: assistant message (for multi-turn context).
## Example request
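A minimal request body using the fields from the table above; `your-model-id` is a placeholder, not a real model name:

```json
{
  "model": "your-model-id",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
```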
### Example: cURL
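A sketch of the same request over HTTP, assuming bearer-token auth; `api.example.com` and the route are placeholders to replace with your actual base URL:

```bash
curl https://api.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "your-model-id",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```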
## Example response (non-streaming)
Responses follow an OpenAI-style structure; for example, the assistant text is at `choices[0].message.content`. Token usage is reported in `usage`; this is what is metered and deducted from your balance.
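A representative response shape, assuming full OpenAI compatibility; the field values below are illustrative only:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}
```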
## Standard completion (single prompt)
For a single prompt without chat history, use the model inference endpoint. Its response contains `text`, `tokens`, `model`, and optionally `latency`.
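A sketch of such a call and its response; the `/v1/inference` path is an assumption (check the actual route for this endpoint), and all values are illustrative:

```bash
curl https://api.example.com/v1/inference \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{ "model": "your-model-id", "prompt": "Write one sentence about the sea." }'
```

```json
{
  "text": "The sea keeps its own slow time.",
  "tokens": 18,
  "model": "your-model-id",
  "latency": 0.42
}
```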