Deployment Guide
Deploying a fine-tuned model makes it accessible via REST endpoints and the LLMTune playground. Use LLMTune Deploy to manage versions, control traffic, and monitor production deployments.
Overview
LLMTune Deploy provides:
- Controlled promotion – Promote successful runs to staging or production with explicit approvals
- Release orchestration – Manage rollout plans, route traffic by version, and trigger automated tests
- Runtime observability – Monitor latency, spend, and error rates with alerting hooks
- Instant rollback – Revert to previous versions with one click
Deploying a Model
- After a training run completes in FineTune Studio, navigate to LLMTune Deploy (or click Promote to Endpoint from the training job).
- Choose an environment:
  - Staging – For testing before production
  - Production – For live traffic
- Provide a deployment name and an optional description.
- Configure deployment settings:
  - Version tagging
  - Traffic routing (if deploying multiple versions)
  - Autoscaling parameters (min/max replicas)
  - Timeout settings
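The deployment settings have to be internally consistent before a deployment can start. As a minimal sketch, assuming you keep deployment settings in your own code (for example, to review them before filling in the Deploy form), the hypothetical DeploymentSettings class below checks the obvious constraints: a known environment, a minimum replica count no larger than the maximum, a positive timeout, and a traffic split that adds up to 100%. The class and field names are illustrative and are not part of an LLMTune SDK.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentSettings:
    """Hypothetical local model of the Deploy settings form (not an LLMTune API)."""

    name: str
    environment: str  # "staging" or "production"
    version_tag: str
    min_replicas: int = 1
    max_replicas: int = 1
    timeout_seconds: float = 30.0
    # Maps version tag -> percentage of traffic; only needed for multi-version deployments.
    traffic_split: dict[str, int] = field(default_factory=dict)
    description: str = ""

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the settings look consistent."""
        problems = []
        if self.environment not in ("staging", "production"):
            problems.append(f"unknown environment: {self.environment!r}")
        if not self.name:
            problems.append("deployment name is required")
        if self.min_replicas < 0:
            problems.append("min_replicas must be >= 0")
        if self.max_replicas < self.min_replicas:
            problems.append("max_replicas must be >= min_replicas")
        if self.timeout_seconds <= 0:
            problems.append("timeout_seconds must be positive")
        if self.traffic_split and sum(self.traffic_split.values()) != 100:
            problems.append("traffic_split percentages must sum to 100")
        return problems
```

For example, DeploymentSettings(name="support-bot", environment="staging", version_tag="v1", min_replicas=1, max_replicas=3).validate() returns an empty list.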
Endpoint Configuration
Each deployed model exposes:
- Base URL: https://api.llmtune.io/v1/models/{deployment_id}/inference
- Supported modes:
  - Chat completions (OpenAI-compatible)
  - Streaming responses
  - Batch inference
- Rate limits: Vary by plan and are displayed in the deployment panel
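A minimal sketch of calling the endpoint with Python's standard library. It assumes that the OpenAI-compatible chat mode accepts a body of the form {"messages": [...], "stream": ...} and authenticates with a Bearer API key; both are assumptions, so check the request format and auth scheme shown in the deployment panel. build_chat_request only constructs the request. iter_stream_text assumes streaming uses OpenAI-style server-sent events (data: lines carrying text in choices[0].delta.content, terminated by data: [DONE]).

```python
import json
import urllib.request

BASE_URL = "https://api.llmtune.io/v1/models/{deployment_id}/inference"


def build_chat_request(deployment_id: str, api_key: str, messages: list[dict],
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request to a deployment's inference endpoint.

    ASSUMPTION: OpenAI-style body and Bearer-token auth; confirm both for your deployment.
    """
    body = json.dumps({"messages": messages, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL.format(deployment_id=deployment_id),
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style server-sent event lines (str or bytes).

    ASSUMPTION: each event is 'data: {json}' and the stream ends with 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]
```

To send the request, pass it to urllib.request.urlopen; with stream=True, the response object can be iterated line by line and fed to iter_stream_text.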
Managing Versions
LLMTune Deploy supports full version control:
- Deploy multiple versions simultaneously (v1, v2, etc.)
- Mark one as default for production traffic
- Track changes with notes, approvers, and automated rollback states
- Retire older versions when no longer needed
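The version rules above (several live versions, exactly one default, and no retiring a version that is still the default) can be modeled as a small in-memory registry. This is a hypothetical sketch for reasoning about version state, for example in your own release tooling; it is not how LLMTune stores versions, and the VersionRegistry name and its methods are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionRegistry:
    """Hypothetical in-memory model of a deployment's versions (not an LLMTune API)."""

    versions: dict[str, str] = field(default_factory=dict)  # tag -> release note
    retired: set[str] = field(default_factory=set)
    default: Optional[str] = None

    def deploy(self, tag: str, note: str = "") -> None:
        if tag in self.versions:
            raise ValueError(f"version {tag} already exists")
        self.versions[tag] = note
        if self.default is None:
            self.default = tag  # the first version becomes the default

    def set_default(self, tag: str) -> None:
        if tag not in self.active():
            raise ValueError(f"version {tag} is not active")
        self.default = tag

    def retire(self, tag: str) -> None:
        if tag not in self.versions:
            raise KeyError(tag)
        if tag == self.default:
            raise ValueError("cannot retire the default version; set another default first")
        self.retired.add(tag)

    def active(self) -> list[str]:
        """Versions that are deployed and not retired, in deployment order."""
        return [t for t in self.versions if t not in self.retired]
```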
Traffic Management
Deploy supports advanced traffic routing:
- Canary deployments – Gradually shift traffic to new versions
- Shadow deployments – Test new versions without affecting production
- Blue/Green deployments – Instant switch between versions
- Traffic splitting – Route a percentage of traffic to different versions
All traffic management happens without touching infrastructure – configure it directly in the Deploy interface.
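Deploy configures routing for you, but it helps to know what a percentage split means for an individual request. A common approach, shown here as a sketch and not as LLMTune's documented routing algorithm, hashes a stable key (such as a user or session ID) into 100 buckets so each caller consistently lands on the same version; a canary rollout is then just a sequence of splits. pick_version and canary_schedule are hypothetical helpers, and the default step percentages are arbitrary examples.

```python
import hashlib


def pick_version(routing_key: str, split: dict[str, int]) -> str:
    """Choose a version from a percentage split such as {"v1": 90, "v2": 10}.

    Hashing a stable key into one of 100 buckets keeps each caller pinned to
    the same version for as long as the split is unchanged.
    """
    if sum(split.values()) != 100:
        raise ValueError("split percentages must sum to 100")
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    cumulative = 0
    for version, percent in split.items():
        cumulative += percent
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable when percentages sum to 100")


def canary_schedule(stable: str, canary: str, steps=(5, 25, 50, 100)) -> list[dict[str, int]]:
    """Return successive splits that shift traffic from the stable version to the canary."""
    return [{stable: 100 - p, canary: p} for p in steps]
```

Hashing rather than random selection keeps a user's conversation on one version during a canary, which makes comparisons between versions less noisy.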
Rollback
To roll back to a previous version:
- Open the deployment in LLMTune Deploy.
- Select the version you want to roll back to.
- Click Rollback or Promote to Production.
- The deployment switches instantly – no downtime.
LLMTune tracks all rollback events in the change log for audit purposes.
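A sketch of the rollback semantics described above, assuming a simple model in which the live version is the last entry of a promotion history and every rollback appends an audit record. The ChangeLogEntry fields and the roll_back helper are hypothetical; LLMTune's actual change log format is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeLogEntry:
    """Hypothetical audit record; field names are illustrative only."""

    timestamp: datetime
    action: str
    from_version: str
    to_version: str
    actor: str


def roll_back(history: list[str], target: str, actor: str,
              log: list[ChangeLogEntry]) -> list[str]:
    """Make `target` live again and append an audit entry to `log`.

    `history` lists versions in promotion order; its last element is live.
    Returns a new history instead of mutating the one passed in.
    """
    if not history:
        raise ValueError("nothing is deployed")
    current = history[-1]
    if target == current:
        raise ValueError(f"{target} is already live")
    if target not in history:
        raise ValueError(f"{target} was never promoted")
    log.append(ChangeLogEntry(datetime.now(timezone.utc), "rollback", current, target, actor))
    return history + [target]
```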
Runtime Observability
Monitor your deployments in real time:
- Latency metrics – Track response times and P95/P99 percentiles
- Spend tracking – Monitor costs per version and time period
- Error rates – Track failures and anomalies
- Usage intelligence – Tie metrics to each release for PM and ops alignment
Set up alerting hooks to notify your team when thresholds are breached.
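To choose sensible alert thresholds, it can help to compute the same kinds of metrics from your own request logs. This sketch uses the nearest-rank definition of a percentile; LLMTune's dashboards may use a different method (for example, interpolation), so small differences are expected. The metric names and the threshold dictionary are illustrative, not an LLMTune alerting schema.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of latency samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def breached_thresholds(latencies_ms: list[float], errors: int, requests: int,
                        thresholds: dict[str, float]) -> list[str]:
    """Return the names of metrics whose observed value exceeds its threshold."""
    observed = {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": errors / requests if requests else 0.0,
    }
    return sorted(name for name, limit in thresholds.items() if observed[name] > limit)
```

For latencies of 1 to 100 ms, P95 is 95 ms and P99 is 99 ms, so a 90 ms P95 threshold is breached while a 120 ms P99 threshold is not.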
Ops Automation
LLMTune Deploy integrates with your ops workflows:
- Smoke tests – Automatically run tests after deployment
- Observability dashboards – Connect to your existing monitoring tools
- Incident workflows – Trigger alerts and notifications
- Webhooks – Receive deployment lifecycle events
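A minimal webhook receiver sketch. It assumes, hypothetically, that events arrive as JSON bodies signed with HMAC-SHA256 over the raw body using a shared secret, and that each event carries type and deployment_id fields; the event type names are invented for illustration. Replace these with the signature header, algorithm, and event schema LLMTune documents for your account.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw request body.

    ASSUMPTION: the real signing scheme may differ; adapt to the documented one.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def handle_event(secret: bytes, body: bytes, signature_hex: str) -> str:
    """Verify, parse, and dispatch a deployment lifecycle event."""
    if not verify_signature(secret, body, signature_hex):
        raise PermissionError("invalid webhook signature")
    event = json.loads(body)
    # Hypothetical event types and actions, for illustration only.
    handlers = {
        "deployment.succeeded": lambda e: f"run smoke tests against {e['deployment_id']}",
        "deployment.rolled_back": lambda e: f"open incident for {e['deployment_id']}",
    }
    handler = handlers.get(event.get("type"))
    return handler(event) if handler else "ignored"
```

hmac.compare_digest is used instead of == so that the comparison time does not reveal how much of a forged signature matched.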
Best Practices
- Start with staging – Always test in staging before promoting to production
- Use version control – Tag and document each deployment version
- Monitor closely – Watch metrics for the first few minutes after deployment
- Plan rollbacks – Know which version to roll back to before deploying
- Use traffic management – Gradually roll out changes with canary deployments
Troubleshooting
- Deployment fails: Check that the training job completed successfully and the model is accessible
- High latency: Review model size and consider using a smaller model or optimizing inference
- Errors in production: Use the rollback feature immediately, then investigate in staging
- Traffic routing issues: Verify traffic split configuration and check version status
Next Steps