Deployment Guide
Deploying a fine-tuned model makes it accessible via REST endpoints and the LLMTune playground.
- After a training run completes, click Promote to Endpoint.
- Choose environment (staging or production).
- Provide a deployment name and optional description.
- Configure autoscaling (min/max replicas) and timeout settings.
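The form fields above can be sketched as a small pre-flight check. The field names here (`min_replicas`, `max_replicas`, `timeout_s`) are illustrative assumptions, not LLMTune's actual schema:

```python
def validate_deployment_config(
    name: str, min_replicas: int, max_replicas: int, timeout_s: int
) -> dict:
    """Illustrative validation of the Promote-to-Endpoint form (hypothetical field names)."""
    if not name:
        raise ValueError("deployment name is required")
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    if timeout_s <= 0:
        raise ValueError("timeout must be positive")
    return {
        "name": name,
        "autoscaling": {"min": min_replicas, "max": max_replicas},
        "timeout_s": timeout_s,
    }
```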
Endpoint Configuration
Each endpoint exposes:
- Base URL:
https://llmtune.io/api/models/{deployment_id}/inference
- Supported modes: chat completions (OpenAI-compatible), streaming, batch.
- Rate limits: displayed in the deployment panel.
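A minimal client call against the base URL above might look like the following. The payload follows the OpenAI chat-completions shape the endpoint advertises, but the exact auth header (`Authorization: Bearer …`) and body fields are assumptions, a sketch rather than LLMTune's documented contract:

```python
import json
import urllib.request

BASE = "https://llmtune.io/api/models/{deployment_id}/inference"

def inference_url(deployment_id: str) -> str:
    # Fill the {deployment_id} placeholder shown in the deployment panel.
    return BASE.format(deployment_id=deployment_id)

def build_chat_request(
    deployment_id: str, api_key: str, messages: list[dict], stream: bool = False
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({"messages": messages, "stream": stream}).encode()
    return urllib.request.Request(
        inference_url(deployment_id),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(req)` or any HTTP client; setting `stream=True` would select the streaming mode listed above.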
Managing Versions
- Deploy multiple versions simultaneously (v1, v2, etc.).
- Mark one as “default” for production traffic.
- Retire older versions when no longer needed.
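The version lifecycle above can be modeled as a small registry. The class and method names are illustrative only, not an LLMTune API:

```python
class VersionRegistry:
    """Toy model of per-deployment versions: deploy, mark default, retire."""

    def __init__(self) -> None:
        self.versions: dict[str, str] = {}  # version -> "active" | "retired"
        self.default: str | None = None

    def deploy(self, version: str) -> None:
        # Multiple versions may be active simultaneously (v1, v2, ...).
        self.versions[version] = "active"

    def set_default(self, version: str) -> None:
        # Only an active version can receive production traffic.
        if self.versions.get(version) != "active":
            raise ValueError(f"{version} is not an active version")
        self.default = version

    def retire(self, version: str) -> None:
        # The default must be moved elsewhere before retiring a version.
        if version == self.default:
            raise ValueError("move the default pointer before retiring")
        self.versions[version] = "retired"
```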
Rollback
To roll back:
- Pause the current deployment.
- Promote the desired previous version.
- Update the default routing pointer.
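The three steps above can be strung together as one operation; the function and the state dictionaries here are hypothetical, a sketch of the sequence rather than a real API:

```python
def rollback(
    deployments: dict[str, str], routing: dict[str, str],
    current: str, previous: str,
) -> None:
    """Illustrative rollback. `deployments` maps version -> status;
    `routing` holds the "default" pointer for production traffic."""
    # 1. Pause the current deployment.
    deployments[current] = "paused"
    # 2. Promote the desired previous version.
    deployments[previous] = "active"
    # 3. Update the default routing pointer.
    routing["default"] = previous
```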
All deployments log request metrics to the usage dashboard and support audit trails.