
LLMTune API Overview

LLMTune exposes a public REST API for inference, fine-tuning, and automation. The API is designed to be OpenAI-compatible where it matters (chat-style inference) while adding endpoints specific to LLMTune’s training and deployment workflow.

Base URL

Public API requests should be sent to:
https://api.llmtune.io/v1
The in-app Next.js routes under https://llmtune.io/api/... are used by the LLMTune web application itself. For external integrations, use the https://api.llmtune.io/v1 base URL described here.

Authentication

All requests must include a workspace API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Create and manage API keys from the API Keys section in the LLMTune dashboard.
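As a minimal sketch, the header on a raw fetch call looks like this (JOB_ID is a placeholder; every endpoint below expects the same header):

// Minimal sketch: any endpoint takes the same Bearer header.
// JOB_ID is a placeholder for a real fine-tune job ID.
const res = await fetch("https://api.llmtune.io/v1/fine-tune/JOB_ID", {
  headers: { Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}` },
});

console.log(res.status);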

Response Format

  • All responses are JSON.
  • Successful responses return domain-specific fields (for example: text, tokens, latency for inference).
  • Error responses use a standard shape:
{
  "error": {
    "message": "Description of what went wrong",
    "code": "ERROR_CODE",
    "details": { "optional": "context" }
  }
}
Common HTTP status codes:
  • 200 – Success
  • 400 – Invalid request
  • 401 – Invalid or missing API key
  • 402 – Insufficient credits / payment required
  • 404 – Resource not found (model, job, etc.)
  • 429 – Rate limit exceeded
  • 500 – Unexpected server error
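A thin TypeScript wrapper can surface this error shape as a thrown exception. The sketch below is illustrative only; the llmtune helper and its defaults are not part of any official SDK:

// Sketch: wrap fetch so the standard error shape becomes a thrown Error.
async function llmtune(path: string, method = "GET", body?: unknown) {
  const res = await fetch(`https://api.llmtune.io/v1${path}`, {
    method,
    headers: {
      Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) {
    // Error bodies follow the { error: { message, code, details } } shape above.
    const { error } = await res.json();
    throw new Error(`LLMTune ${res.status} ${error.code}: ${error.message}`);
  }
  return res.json();
}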

Idempotency

For operations that may be retried (for example fine-tune job creation or large batch submissions), you can provide an Idempotency-Key header. If LLMTune receives multiple requests with the same key, only the first is processed and subsequent ones return the original result.
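For example, a retry-safe fine-tune submission might look like this sketch (the request body fields are illustrative placeholders; see Fine-Tuning Endpoints for the real schema):

import { randomUUID } from "node:crypto";

// Generate the key once and reuse it for every retry of the same logical request.
const idempotencyKey = randomUUID();

const res = await fetch("https://api.llmtune.io/v1/fine-tune", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}`,
    "Content-Type": "application/json",
    "Idempotency-Key": idempotencyKey,
  },
  // Illustrative body; field names are placeholders, not the documented schema.
  body: JSON.stringify({ baseModel: "...", dataset: "..." }),
});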

High-Level Endpoint Groups

Inference

Run completions against hosted or deployed models.
  • POST /v1/models/{modelId}/inference – Single inference request (OpenAI-style payload).
  • POST /v1/playground/inference – Playground-style inference for quick smoke tests.
  • POST /v1/batch/inference – Submit a batch of inference jobs with optional webhook callbacks.
See Inference Endpoints for full request/response schemas.
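As a sketch, a single inference call could look like this (MODEL_ID is a placeholder, and the destructured response fields follow the text/tokens/latency example from Response Format above):

const modelId = "MODEL_ID"; // substitute a hosted or deployed model ID

const res = await fetch(`https://api.llmtune.io/v1/models/${modelId}/inference`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}`,
    "Content-Type": "application/json",
  },
  // OpenAI-style payload, per the endpoint description above.
  body: JSON.stringify({
    messages: [{ role: "user", content: "Summarize LLMTune." }],
  }),
});

const { text, tokens, latency } = await res.json();
console.log(text, tokens, latency);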

Fine-Tuning

Launch and monitor fine-tuning jobs.
  • POST /v1/fine-tune – Submit a fine-tune job with base model, dataset location, and hyperparameters.
  • GET /v1/fine-tune/{jobId} – Retrieve job status, metrics, and errors.
See Fine-Tuning Endpoints for body examples.
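A submit-then-poll sketch (the body fields and the jobId response field are assumptions; consult Fine-Tuning Endpoints for the actual schema):

// Submit a job. Body fields here are illustrative placeholders.
const submitted = await fetch("https://api.llmtune.io/v1/fine-tune", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    baseModel: "meta-llama/Llama-3.3-70B-Instruct",
    dataset: "s3://my-bucket/train.jsonl",
    hyperparameters: { epochs: 3 },
  }),
});
const { jobId } = await submitted.json(); // field name is an assumption

// Poll for status, metrics, and errors.
const job = await fetch(`https://api.llmtune.io/v1/fine-tune/${jobId}`, {
  headers: { Authorization: `Bearer ${process.env.LLMTUNE_API_KEY}` },
}).then((r) => r.json());
console.log(job);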

Webhooks

Subscribe to lifecycle events such as training and deployment changes.
  • Events include: training.started, training.completed, training.failed, model.deployed.
  • Configure webhooks from the dashboard and point them at your backend.
See Webhooks for payload examples.
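A minimal receiver sketch in Node (the payload.event field name is an assumption; see Webhooks for the real payload shape):

import { createServer } from "node:http";

// Point your dashboard webhook at this server's URL.
createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const payload = JSON.parse(body);
    // `payload.event` is assumed; check the Webhooks docs for the actual field.
    switch (payload.event) {
      case "training.completed":
        // e.g. kick off an evaluation run
        break;
      case "training.failed":
        // e.g. alert your team
        break;
    }
    res.writeHead(200).end();
  });
}).listen(3000);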

SDK Compatibility

The inference API is compatible with OpenAI-style clients. For example, using the OpenAI SDK you can point the base URL to LLMTune and pass your LLMTune key:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmtune.io/v1",
  apiKey: process.env.LLMTUNE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    { role: "user", content: "Summarize LLMTune." }
  ],
});

Rate Limits

Rate limits depend on your plan. Generally:
  • Sandbox – Lower request and concurrency limits, great for experimentation.
  • Growth / Production – Higher limits suitable for production traffic.
  • Enterprise – Custom limits, SLAs, and private fleets.
If you exceed your limits, requests return 429 responses; inspect headers such as Retry-After to determine how long to wait before retrying.
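A backoff sketch that honors Retry-After (the one-second fallback when the header is absent is an assumption):

// Retries on 429, waiting Retry-After seconds between attempts.
// Pass a thunk so each attempt builds a fresh request.
async function withBackoff(send: () => Promise<Response>, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt === maxAttempts) return res;
    const seconds = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  }
}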

Next Steps