Deployment Guide

Deploying a fine-tuned model makes it accessible via REST endpoints and the LLMTune playground.

Promote a Run

  1. After a training run completes, click Promote to Endpoint.
  2. Choose environment (staging or production).
  3. Provide a deployment name and optional description.
  4. Configure autoscaling (min/max replicas) and timeout settings.
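The promotion steps above can be sketched as a single deployment configuration. The field names below are illustrative placeholders, not the platform's actual schema; consult the deployment panel for the real keys.

```python
import json

# Hypothetical deployment configuration mirroring the promotion steps.
# All field names here are assumptions for illustration only.
deployment_config = {
    "name": "support-bot",             # deployment name (step 3)
    "description": "Fine-tuned support assistant",
    "environment": "staging",          # "staging" or "production" (step 2)
    "autoscaling": {
        "min_replicas": 1,             # floor during quiet periods
        "max_replicas": 4,             # cap for traffic spikes (step 4)
    },
    "timeout_seconds": 60,             # per-request timeout (step 4)
}

payload = json.dumps(deployment_config, indent=2)
print(payload)
```

Keeping min_replicas at 1 or higher avoids cold starts at the cost of an always-on replica; set it to 0 only if your workload tolerates startup latency.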

Endpoint Configuration

Each endpoint exposes:
  • Base URL: https://llmtune.io/api/models/{deployment_id}/inference
  • Supported modes: chat completions (OpenAI-compatible), streaming, batch.
  • Rate limits: displayed in the deployment panel.
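A minimal sketch of calling the inference endpoint, assuming bearer-token authentication and an OpenAI-style request body (both assumptions; check the deployment panel for the exact auth scheme). The deployment ID and API key are placeholders.

```python
import json
import urllib.request

DEPLOYMENT_ID = "dep_abc123"   # placeholder deployment ID
API_KEY = "YOUR_API_KEY"       # placeholder credential

url = f"https://llmtune.io/api/models/{DEPLOYMENT_ID}/inference"
body = {
    # OpenAI-compatible chat-completions payload (assumed shape)
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
    "stream": False,  # set True to use the streaming mode
}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # send only with real credentials
```

Because the endpoint is rate-limited, production clients should back off and retry on 429 responses rather than failing immediately.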

Managing Versions

  • Deploy multiple versions simultaneously (v1, v2, etc.).
  • Mark one as “default” for production traffic.
  • Retire older versions when no longer needed.

Rollback

To roll back:
  1. Pause the current deployment.
  2. Promote the desired previous version.
  3. Update the default routing pointer.
All deployments log request metrics for the usage dashboard and support audit trails.
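The rollback sequence can be modeled as a small state machine. This is purely an illustrative in-memory sketch; the real operations happen through the LLMTune UI or API, and the class below is not part of any SDK.

```python
class VersionRouter:
    """Illustrative model of default-version routing across deployments."""

    def __init__(self, versions, default):
        self.status = {v: "active" for v in versions}
        self.default = default

    def pause(self, version):
        self.status[version] = "paused"        # step 1: pause current

    def promote(self, version):
        self.status[version] = "active"        # step 2: promote previous

    def set_default(self, version):
        # Step 3: only an active version may receive default traffic.
        assert self.status[version] == "active"
        self.default = version

router = VersionRouter(versions=["v1", "v2"], default="v2")
router.pause("v2")        # pause the current deployment
router.promote("v1")      # promote the desired previous version
router.set_default("v1")  # update the default routing pointer
print(router.default)     # → v1
```

Note that the ordering matters: promoting the previous version before moving the default pointer ensures traffic is never routed to a paused deployment.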