Inference overview
The platform exposes OpenAI-compatible endpoints for text generation and chat. You send a prompt or a list of messages, choose a model, and receive generated text plus token usage.
Endpoints
| Purpose | Method | Path |
|---|---|---|
| Standard completion | POST | /v1/models/{modelId}/inference |
| Chat completions | POST | /v1/chat/completions |
| Batch inference | POST | /v1/batch/inference |
All endpoint paths are relative to the base URL https://api.llmtune.io.
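For example, a standard completion request targets the model-specific path under the base URL. A minimal sketch in Python using requests, assuming a bearer-token Authorization header (see Authentication below) and a placeholder model ID:

```python
import requests

BASE_URL = "https://api.llmtune.io"
MODEL_ID = "my-model"  # placeholder model ID

# Standard completion: POST /v1/models/{modelId}/inference
resp = requests.post(
    f"{BASE_URL}/v1/models/{MODEL_ID}/inference",
    headers={"Authorization": "Bearer <API_KEY>"},  # assumed auth scheme; see Authentication
    json={"prompt": "Write a haiku about the sea.", "max_tokens": 64},
)
resp.raise_for_status()
print(resp.json())
```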
Authentication
Every request must include valid API credentials. Requests with missing or invalid credentials return 401 Unauthorized.
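A hedged sketch of an authenticated call and the 401 check, assuming a bearer-token Authorization header (the exact credential format may differ; check your API key settings):

```python
import requests

headers = {
    "Authorization": "Bearer <API_KEY>",  # assumed bearer-token scheme
    "Content-Type": "application/json",
}
resp = requests.post(
    "https://api.llmtune.io/v1/chat/completions",
    headers=headers,
    json={"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]},
)
if resp.status_code == 401:
    # Missing or invalid credentials
    raise RuntimeError("401 Unauthorized: check that the API key is present and valid")
```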
Common parameters
| Parameter | Type | Description |
|---|---|---|
| prompt | string | Input text (completion endpoint). |
| messages | array | Chat messages (chat completions). |
| temperature | number | Sampling randomness (e.g. 0.7). |
| max_tokens / maxTokens | number | Maximum output tokens. |
| top_p / topP | number | Nucleus sampling (e.g. 1.0). |
| stream | boolean | Enable streaming (when supported). |
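Putting the common parameters together, a sketch of a chat completions request body (parameter names follow the table above; whether the snake_case or camelCase variant applies depends on the endpoint):

```python
payload = {
    "model": "my-model",  # placeholder model ID
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    ],
    "temperature": 0.7,  # sampling randomness
    "max_tokens": 128,   # maximum output tokens (maxTokens on some endpoints)
    "top_p": 1.0,        # nucleus sampling (topP on some endpoints)
    "stream": False,     # set True to enable streaming where supported
}
```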
Response shape
A typical success response includes:
- text or choices — Generated content.
- tokens or usage — Token counts (input/output).
- model — Model ID used.
- latency — Response time in milliseconds (when provided).
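A sketch of reading these fields from a parsed response, allowing for either naming variant (the function and exact field layout here are illustrative, not the definitive schema):

```python
def read_response(data: dict) -> None:
    """Print the main fields of an inference response (field names are illustrative)."""
    # Generated content: a flat "text" field or an OpenAI-style "choices" list
    text = data.get("text") or data["choices"][0]["message"]["content"]
    # Token counts: "tokens" or OpenAI-style "usage"
    usage = data.get("usage") or data.get("tokens")
    print("model:", data.get("model"))
    print("latency (ms):", data.get("latency"))  # only when provided
    print("output:", text)
    print("usage:", usage)
```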
Usage and billing
Each request consumes tokens from your account balance. Input and output tokens are metered and charged; if your balance is insufficient, the API returns 402 Payment Required. See Billing & usage.
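A small sketch of surfacing the balance check around a request (the status code is as described above; the error handling itself is illustrative):

```python
import requests

resp = requests.post(
    "https://api.llmtune.io/v1/chat/completions",
    headers={"Authorization": "Bearer <API_KEY>"},  # assumed bearer-token scheme
    json={"model": "my-model", "messages": [{"role": "user", "content": "Hi"}]},
)
if resp.status_code == 402:
    # Payment Required: balance cannot cover the metered input/output tokens
    raise RuntimeError("Insufficient balance; see Billing & usage")
resp.raise_for_status()
print("usage:", resp.json().get("usage"))
```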
Next
- Chat completions — Request/response format and examples.
- Streaming — Streaming responses.
- Errors and limits — Rate limits and error handling.