Inference overview

The platform exposes OpenAI-compatible endpoints for text generation and chat. You send a prompt or messages, choose a model, and receive generated text plus token usage.

Endpoints

Purpose              Method  Path
Standard completion  POST    /v1/models/{modelId}/inference
Chat completions     POST    /v1/chat/completions
Batch inference      POST    /v1/batch/inference

Base URL: https://api.llmtune.io
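A minimal request sketch in Python (using the requests library) against the standard completion endpoint, where the model is addressed in the path. The model ID "example-model" and the prompt are placeholders, not real identifiers:

    import requests

    API_KEY = "sk_live_YOUR_API_KEY"
    BASE_URL = "https://api.llmtune.io"
    MODEL_ID = "example-model"  # placeholder; substitute a real model ID

    # Standard completion: the model is part of the path; the prompt and
    # sampling options go in the JSON body (see Common parameters below).
    resp = requests.post(
        f"{BASE_URL}/v1/models/{MODEL_ID}/inference",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": "Write a haiku about inference.", "max_tokens": 64},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())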

Authentication

Every request must include the header:
Authorization: Bearer sk_live_YOUR_API_KEY
Requests without a valid key return 401 Unauthorized.
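In Python, a requests.Session can attach the header once so every subsequent call is authenticated; a short sketch:

    import requests

    session = requests.Session()
    # Attach the bearer token once; every request made through this
    # session now carries the Authorization header automatically.
    session.headers["Authorization"] = "Bearer sk_live_YOUR_API_KEY"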

Common parameters

Parameter               Type     Description
prompt                  string   Input text (completion endpoint).
messages                array    Chat messages (chat completions).
temperature             number   Sampling randomness (e.g. 0.7).
max_tokens / maxTokens  number   Maximum output tokens.
top_p / topP            number   Nucleus sampling (e.g. 1.0).
stream                  boolean  Enable streaming (when supported).
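Put together, a chat completions request body might look like the sketch below. The message shape (role/content objects) follows the OpenAI convention implied by "OpenAI-compatible", and the model ID is a placeholder:

    payload = {
        "model": "example-model",  # placeholder model ID
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain nucleus sampling in one sentence."},
        ],
        "temperature": 0.7,   # sampling randomness
        "max_tokens": 128,    # cap on output tokens
        "top_p": 1.0,         # nucleus sampling
        "stream": False,      # set True where streaming is supported
    }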

Response shape

Typical success response includes:
  • text or choices — Generated content.
  • tokens or usage — Token counts (input/output).
  • model — Model ID used.
  • latency — Response time in milliseconds (when provided).
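Because field names vary by endpoint (text vs. choices, tokens vs. usage), a tolerant reader helps. A sketch, assuming chat choices follow the OpenAI message/content shape:

    # "data" is the parsed JSON body of a successful response.
    def read_response(data: dict) -> tuple[str, dict]:
        # Completion responses carry "text"; chat responses carry "choices".
        if "text" in data:
            content = data["text"]
        else:
            content = data["choices"][0]["message"]["content"]
        # Token counts may be reported under "usage" or "tokens".
        usage = data.get("usage") or data.get("tokens") or {}
        return content, usage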

Usage and billing

Each request consumes tokens from your account balance. Input and output tokens are both metered and billed; a request made with an insufficient balance returns 402 Payment Required. See Billing & usage.
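A sketch of handling the two account-related statuses around a request (401 for a missing or invalid key, 402 for an exhausted balance):

    import requests

    session = requests.Session()
    session.headers["Authorization"] = "Bearer sk_live_YOUR_API_KEY"

    resp = session.post(
        "https://api.llmtune.io/v1/chat/completions",
        json={"model": "example-model",  # placeholder model ID
              "messages": [{"role": "user", "content": "Hello"}]},
        timeout=30,
    )
    if resp.status_code == 401:
        raise RuntimeError("401 Unauthorized: check your API key")
    if resp.status_code == 402:
        raise RuntimeError("402 Payment Required: top up your balance")
    resp.raise_for_status()  # surface any other error status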
