LLMTune API Overview
LLMTune exposes a public REST API for inference, fine-tuning, and automation. The API is designed to be OpenAI-compatible where it matters (chat-style inference) while adding endpoints specific to LLMTune’s training and deployment workflow.

Base URL
Public API requests should be sent to:

https://api.llmtune.io/v1

The in-app Next.js routes under https://llmtune.io/api/... are used by the LLMTune web application itself. For external integrations, use the https://api.llmtune.io/v1 base URL described here.
Authentication
All requests must include a workspace API key in the Authorization header:
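A minimal sketch in Python, assuming the common Bearer token scheme (the exact scheme isn't spelled out in this overview):

```python
# Assumed convention: "Bearer <key>"; confirm against your workspace settings.
headers = {"Authorization": "Bearer YOUR_LLMTUNE_API_KEY"}
```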
Response Format
- All responses are JSON.
- Successful responses return domain-specific fields (for example: text, tokens, latency for inference).
- Error responses use a standard shape:
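As a sketch of what that shape might look like; the field names below are illustrative assumptions, not confirmed by this overview:

```python
# Hypothetical error body; "error", "code", and "message" are assumed names.
error_body = {
    "error": {
        "code": "invalid_request",
        "message": "Field 'messages' is required.",
    }
}
```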
Status Codes

- 200 – Success
- 400 – Invalid request
- 401 – Invalid or missing API key
- 402 – Insufficient credits / payment required
- 404 – Resource not found (model, job, etc.)
- 429 – Rate limit exceeded
- 500 – Unexpected server error
Idempotency
For operations that may be retried (for example fine-tune job creation or large batch submissions), you can provide an Idempotency-Key header. If LLMTune receives multiple requests with the same key, only the first is processed and subsequent ones return the original result.
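For example, a retried job submission with a stable key; the request payload here is hypothetical (see Fine-Tuning below):

```python
import uuid

import requests

# Reusing the same Idempotency-Key across retries makes the POST safe to
# repeat: LLMTune processes the first request and replays its result.
headers = {
    "Authorization": "Bearer YOUR_LLMTUNE_API_KEY",
    "Idempotency-Key": str(uuid.uuid4()),
}
requests.post(
    "https://api.llmtune.io/v1/fine-tune",
    headers=headers,
    json={"base_model": "llama-3-8b", "dataset_url": "s3://my-bucket/train.jsonl"},  # assumed fields
)
```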
High-Level Endpoint Groups
Inference
Run completions against hosted or deployed models.

- POST /v1/models/{modelId}/inference – Single inference request (OpenAI-style payload).
- POST /v1/playground/inference – Playground-style inference for quick smoke tests.
- POST /v1/batch/inference – Submit a batch of inference jobs with optional webhook callbacks.
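A single-request sketch; "my-model" stands in for a real modelId, and the response fields shown are the ones named under Response Format above:

```python
import requests

BASE = "https://api.llmtune.io/v1"
HEADERS = {"Authorization": "Bearer YOUR_LLMTUNE_API_KEY"}

# OpenAI-style chat payload, per the endpoint description above.
resp = requests.post(
    f"{BASE}/models/my-model/inference",
    headers=HEADERS,
    json={"messages": [{"role": "user", "content": "Summarize LLMTune in one line."}]},
)
body = resp.json()
print(body.get("text"), body.get("tokens"), body.get("latency"))
```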
Fine-Tuning
Launch and monitor fine-tuning jobs.

- POST /v1/fine-tune – Submit a fine-tune job with base model, dataset location, and hyperparameters.
- GET /v1/fine-tune/{jobId} – Retrieve job status, metrics, and errors.
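A submit-and-poll sketch; the payload field names (base_model, dataset_url, hyperparameters) and the id/status response fields are assumptions, so check the API Endpoints reference for the real schema:

```python
import time

import requests

BASE = "https://api.llmtune.io/v1"
HEADERS = {"Authorization": "Bearer YOUR_LLMTUNE_API_KEY"}

# Submit the job. All payload field names here are illustrative assumptions.
job = requests.post(
    f"{BASE}/fine-tune",
    headers=HEADERS,
    json={
        "base_model": "llama-3-8b",  # hypothetical base model ID
        "dataset_url": "s3://my-bucket/train.jsonl",
        "hyperparameters": {"epochs": 3, "learning_rate": 2e-5},
    },
).json()

# Poll until the job reaches a terminal state. The "id" and "status"
# fields (and their values) are assumed for illustration.
while True:
    status = requests.get(f"{BASE}/fine-tune/{job['id']}", headers=HEADERS).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(30)
```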
Webhooks
Subscribe to lifecycle events such as training and deployment changes.

- Events include: training.started, training.completed, training.failed, model.deployed.
- Configure webhooks from the dashboard and point them at your backend (a minimal receiver sketch follows).
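One way to receive these events, sketched with Flask; the payload fields ("event", "jobId") are assumptions about the delivery format:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/llmtune/webhook", methods=["POST"])
def llmtune_webhook():
    # "event" and "jobId" are assumed payload fields, shown for illustration.
    payload = request.get_json(force=True)
    if payload.get("event") == "training.completed":
        print("Fine-tune finished:", payload.get("jobId"))
    elif payload.get("event") == "training.failed":
        print("Fine-tune failed:", payload.get("jobId"))
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```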
SDK Compatibility
The inference API is compatible with OpenAI-style clients. For example, using the OpenAI SDK you can point the base URL to LLMTune and pass your LLMTune key:
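A sketch with the Python OpenAI SDK; the model name is hypothetical, and the exact chat route the compatibility layer maps to isn't specified in this overview:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmtune.io/v1",
    api_key="YOUR_LLMTUNE_API_KEY",
)

response = client.chat.completions.create(
    model="my-fine-tuned-model",  # hypothetical model ID
    messages=[{"role": "user", "content": "Hello from LLMTune!"}],
)
print(response.choices[0].message.content)
```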
Rate Limits

Rate limits depend on your plan. Generally:

- Sandbox – Lower request and concurrency limits, great for experimentation.
- Growth / Production – Higher limits suitable for production traffic.
- Enterprise – Custom limits, SLAs, and private fleets.
Clients should handle 429 responses and may inspect headers such as Retry-After to back off.
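For example, a simple retry helper that honors Retry-After and falls back to exponential backoff (the endpoint and payload are whatever you would normally send):

```python
import time

import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """POST with retries on 429, honoring Retry-After when present."""
    resp = requests.post(url, headers=headers, json=payload)
    for attempt in range(max_retries):
        if resp.status_code != 429:
            break
        # Prefer the server's hint; otherwise back off exponentially.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
        resp = requests.post(url, headers=headers, json=payload)
    return resp
```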
Next Steps
- Review the API Endpoints reference for full details.
- Go through the Inference API Guide for end-to-end examples.
- Use the in-app API Docs page for live, copy-pasteable examples.