Fine-tuning overview

Fine-tuning lets you train a model on your data using platform-supported base models. You submit a job (dataset + config), the platform runs training, and you can deploy the resulting model for inference.

Supported models

Only models marked as fine-tunable in the platform catalog can be used as base models. The catalog describes models generically (e.g. by size and capability), and the dashboard and the models API list which of them currently support fine-tuning. Use one of those model IDs when starting a job; jobs that reference any other model are rejected.
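
As a sketch, you can query the models API and keep only the fine-tunable entries. The endpoint path (/api/models), the fine_tunable response field, and the Authorization header below are assumptions for illustration; confirm the actual route and response shape in the API reference.

```python
import os
import requests

# Hypothetical base URL and route -- confirm against the API reference.
BASE_URL = "https://api.example.com"
API_KEY = os.environ["PLATFORM_API_KEY"]

resp = requests.get(
    f"{BASE_URL}/api/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()

# Keep only models the catalog marks as fine-tunable (field name is assumed).
fine_tunable = [m["id"] for m in resp.json().get("models", []) if m.get("fine_tunable")]
print(fine_tunable)
```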

How it works

  1. Dataset — You provide training data in a supported format (e.g. JSONL). See Dataset format and the sketch after this list.
  2. Submit job — You call the training start endpoint with base model, dataset reference, and hyperparameters (e.g. epochs, batch size, learning rate).
  3. Execution — The platform runs training on its infrastructure. You do not manage GPUs or nodes.
  4. Monitor — You poll the job status endpoint or use webhooks to get progress and metrics (e.g. loss, steps).
  5. Deploy — When the job completes, you can deploy the trained model for inference via the usual inference endpoints.
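
Step 1 mentions JSONL; a minimal sketch of writing a chat-style dataset follows. The messages-based record shape is illustrative only; see Dataset format for the exact schema the platform accepts.

```python
import json

# Illustrative records only -- the exact schema is defined in Dataset format.
examples = [
    {"messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]},
    {"messages": [
        {"role": "user", "content": "Translate 'hello' to Spanish."},
        {"role": "assistant", "content": "Hola."},
    ]},
]

# JSONL: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```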

Training workflow

  • Prepare data: Format your dataset (e.g. JSONL with messages or prompt/completion pairs).
  • Upload or reference: Upload via the dashboard or provide a URL/path the platform can access.
  • Configure job: Choose base model, training method (e.g. SFT), epochs, batch size, learning rate.
  • Start job: POST to the training start endpoint; you receive a job ID.
  • Monitor: GET the job status; optionally register a webhook for training.completed / training.failed (a minimal receiver is sketched after this list).
  • Deploy: Use the trained model ID (or deployment flow) for inference.
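
If you register a webhook for the Monitor step, a minimal receiver might look like the sketch below. Only the event names training.completed and training.failed come from the workflow above; the payload fields (event, job_id, error) and the port are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the webhook payload (field names are assumptions).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")

        event = payload.get("event")
        job_id = payload.get("job_id")
        if event == "training.completed":
            print(f"Job {job_id} finished; the trained model can be deployed.")
        elif event == "training.failed":
            print(f"Job {job_id} failed: {payload.get('error')}")

        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Listen locally; expose this at the webhook URL you register with the job.
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```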

API endpoints (conceptual)

  • Start training: POST /api/fine-tune/training/start or equivalent (see API reference). Body: base model, dataset, hyperparameters, optional webhook URL.
  • Get job status: GET /api/training/{jobId}. Response: status, progress, metrics, error if failed.
  • Cancel: POST /api/training/{jobId}/cancel (if supported).
Exact paths may vary; refer to the API reference and the dashboard for the current routes.
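
Putting the conceptual endpoints together, submitting a job and polling its status might look like the sketch below. The base URL, authentication header, and request/response field names (model, dataset, hyperparameters, job_id, status) are assumptions; only the paths mirror the conceptual routes above, which may differ from the live ones.

```python
import os
import time
import requests

BASE_URL = "https://api.example.com"  # assumed; see the API reference
HEADERS = {"Authorization": f"Bearer {os.environ['PLATFORM_API_KEY']}"}

# Start a fine-tuning job (body field names are assumptions).
start = requests.post(
    f"{BASE_URL}/api/fine-tune/training/start",
    headers=HEADERS,
    json={
        "model": "base-model-id",   # a fine-tunable model ID from the catalog
        "dataset": "train.jsonl",   # uploaded file or an accessible URL/path
        "hyperparameters": {"epochs": 3, "batch_size": 8, "learning_rate": 1e-5},
        "webhook_url": "https://example.com/webhooks/training",  # optional
    },
)
start.raise_for_status()
job_id = start.json()["job_id"]

# Poll the job status endpoint until a terminal state (state names are assumed).
while True:
    status = requests.get(f"{BASE_URL}/api/training/{job_id}", headers=HEADERS).json()
    print(status.get("status"), status.get("progress"), status.get("metrics"))
    if status.get("status") in ("completed", "failed", "cancelled"):
        break
    time.sleep(30)
```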

Limitations

  • Only platform-supported base models can be fine-tuned.
  • Dataset must conform to the required format and size limits.
  • Training runs on platform infrastructure; you cannot bring your own cluster.
  • Cost is based on usage (e.g. tokens processed or job type); your account balance must be sufficient to cover the job.
See Dataset format, Workflow, and Limitations for details.