Chat completions

The chat completions endpoint accepts a list of messages and returns the model’s reply. The request and response shapes are compatible with the OpenAI Chat Completions API.

Endpoint

POST https://api.llmtune.io/v1/chat/completions

Request body

Field        Required  Type     Description
model        Yes       string   Model ID (e.g. from the models list).
messages     Yes       array    Array of { role, content } objects.
temperature  No        number   Sampling temperature. Default 0.7.
max_tokens   No        number   Maximum number of output tokens.
stream       No        boolean  Set to true to stream the response.

Message roles

  • system — System instruction (optional).
  • user — User message.
  • assistant — A previous assistant reply, sent back to carry multi-turn context (see the example below).
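
For multi-turn conversations, include earlier assistant replies in the messages array so the model sees the full history. A minimal illustrative sketch in Python (the conversation content is made up):

messages = [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" },
    # The previous model reply is sent back as an assistant message.
    { "role": "assistant", "content": "The capital of France is Paris." },
    { "role": "user", "content": "And roughly how many people live there?" }
]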

Example request

{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Summarize this in one sentence." }
  ],
  "temperature": 0.7,
  "max_tokens": 400
}

Example: cURL

curl https://api.llmtune.io/v1/chat/completions \
  -H "Authorization: Bearer sk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.5,
    "max_tokens": 200
  }'
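
Example: Python

A minimal sketch using the requests library; it mirrors the cURL call above and is not an official SDK. Replace the key placeholder with your own API key.

import requests

API_KEY = "sk_live_YOUR_KEY"  # placeholder

resp = requests.post(
    "https://api.llmtune.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "temperature": 0.5,
        "max_tokens": 200,
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

# OpenAI-style response: assistant text and token usage.
print(data["choices"][0]["message"]["content"])
print(data["usage"]["total_tokens"])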

Example response (non-streaming)

Responses follow an OpenAI-style structure, for example:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
Read the assistant text from choices[0].message.content. Token usage is reported in usage; these tokens are metered and deducted from your balance.
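
Example: streaming

When stream is true, the reply arrives incrementally instead of as a single JSON object. The sketch below assumes OpenAI-style server-sent events (data: lines carrying chunks with a choices[0].delta.content field and a final data: [DONE] line); verify the exact chunk format against your own responses.

import json
import requests

resp = requests.post(
    "https://api.llmtune.io/v1/chat/completions",
    headers={"Authorization": "Bearer sk_live_YOUR_KEY"},
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,  # read the body as it arrives instead of buffering it
    timeout=60,
)
resp.raise_for_status()

for raw in resp.iter_lines():
    if not raw:
        continue
    line = raw.decode("utf-8")
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        print(delta, end="", flush=True)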

Standard completion (single prompt)

For a single prompt without chat history, use the model inference endpoint:
POST https://api.llmtune.io/v1/models/{modelId}/inference
Body:
{
  "prompt": "Your prompt here.",
  "temperature": 0.6,
  "maxTokens": 256
}
Response typically includes text, tokens, model, and optionally latency.
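
A minimal Python sketch for this endpoint, again using the requests library; the model ID is illustrative and the response handling assumes the fields described above:

import requests

model_id = "meta-llama/Llama-3.3-70B-Instruct"  # illustrative model ID

resp = requests.post(
    f"https://api.llmtune.io/v1/models/{model_id}/inference",
    headers={"Authorization": "Bearer sk_live_YOUR_KEY"},
    json={
        "prompt": "Write a haiku about the sea.",
        "temperature": 0.6,
        "maxTokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
result = resp.json()
print(result["text"])  # tokens, model, and latency may also be present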