Fine-Tuning Guide

This step-by-step tutorial walks through configuring and launching a training job in FineTune Studio.

Prerequisites

  • LLMTune account and workspace
  • At least one dataset (uploaded via Dataset Hub or using playground datasets)
  • Understanding of dataset structure (see Datasets)

Step 1: Access FineTune Studio

  1. Navigate to FineTune Studio from the main navigation or visit /finetune-studio.
  2. You’ll see the product overview with information about training methods, modalities, and compute options.

Step 2: Choose Base Model

  1. In FineTune Studio, review the recommended models.
  2. Each model card shows:
    • Provider (IO Intelligence, DeepSeek, Mistral, etc.)
    • Context length
    • Latency expectations
    • Recommended use cases
  3. Select the foundation model that fits your task. Record the model ID for reference.

Step 3: Select Training Method

FineTune Studio supports 13 training methods:

Core Methods

  • SFT (Supervised Fine-Tuning) – Classic instruction tuning with labeled demonstrations
  • DPO (Direct Preference Optimization) – Align to preferences without reinforcement
  • PPO (Proximal Policy Optimization) – Reinforcement learning with reward-driven updates
  • RLAIF (RL with AI Feedback) – Use AI to provide feedback for RL training
  • CTO (Controlled Tuning Optimization) – Fine-grained control over model behavior

Specialized Methods

  • Code Generation – Train models to generate code from natural language
  • Multimodal – Train vision-language models with text and images
  • Text-to-Embeddings – Train embedding models for semantic search and RAG
  • Audio Understanding – Train models to understand and reason about audio
  • Audio-to-Text (ASR) – Train Automatic Speech Recognition models
  • Text-to-Audio (TTS) – Generate speech from text (uses Coqui TTS service)
  • Video Understanding – Train video-language models (coming soon)
  • Reward Modeling – Train reward functions for RLHF pipelines

Choose the method that matches your use case. The interface provides descriptions and “best for” guidance for each.
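As an illustration of what a preference-tuning (DPO) training example typically looks like, here is a minimal sketch of one JSONL record. The field names (`prompt`, `chosen`, `rejected`) follow a common convention and are assumptions here; check the Datasets docs for the exact schema FineTune Studio expects.

```python
import json

# Hypothetical DPO preference record: one prompt, a preferred response,
# and a dispreferred one. Field names are illustrative, not canonical.
record = {
    "prompt": "Summarize the following paragraph in one sentence.",
    "chosen": "A concise, faithful one-sentence summary.",
    "rejected": "A rambling answer that ignores the instruction.",
}

# JSONL means one JSON object per line in the dataset file.
line = json.dumps(record)
print(line)
```

SFT records look similar but carry labeled demonstrations instead of chosen/rejected pairs (see the dataset formats in Step 4).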

Step 4: Select Dataset

  1. Choose from:
    • Uploaded datasets from Dataset Hub
    • Playground datasets (pre-configured demo datasets for each training method)
  2. Playground datasets are pre-selected automatically to match your chosen training method.
  3. For custom datasets, ensure they match the expected format for your chosen method:
    • SFT/DPO/PPO: JSONL with messages or conversations arrays
    • Code Generation: Code examples with prompts
    • Multimodal: Text + image pairs
    • Audio methods: Audio file paths with transcripts or labels
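For the SFT/DPO/PPO formats above, a quick local sanity check can catch malformed rows before upload. This sketch assumes OpenAI-style chat turns inside a `messages` (or `conversations`) array; the exact keys FineTune Studio accepts may differ, so treat the validator as a starting point.

```python
import json

# Example SFT-style rows: each row holds a "messages" array of chat turns.
rows = [
    {"messages": [
        {"role": "user", "content": "Translate 'bonjour' to English."},
        {"role": "assistant", "content": "Hello."},
    ]},
    {"messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4."},
    ]},
]

def validate(row: dict) -> bool:
    """A row needs a non-empty messages/conversations list,
    and every turn needs both a role and content."""
    msgs = row.get("messages") or row.get("conversations")
    if not isinstance(msgs, list) or not msgs:
        return False
    return all({"role", "content"} <= set(m) for m in msgs)

# Serialize to JSONL and re-check every line round-trips and validates.
jsonl = "\n".join(json.dumps(r) for r in rows)
assert all(validate(json.loads(line)) for line in jsonl.splitlines())
```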

Step 5: Configure Training Parameters

Key settings:
  • Training method – SFT, DPO, PPO, RLAIF, CTO, Code, Multimodal, etc.
  • Learning rate – Default varies by method; typically 0.0001 for SFT
  • Batch size – Adaptive based on model size and dataset
  • Epochs – Typically 2–5 for most methods; validate with evaluation
  • Evaluation cadence – Frequency of validation runs during training
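The settings above can be sketched as a configuration object. The field names below are illustrative, not FineTune Studio's actual API; the model ID and dataset size are placeholders. The snippet also shows how epochs and batch size translate into optimizer steps.

```python
# Hypothetical training configuration mirroring the settings listed above.
config = {
    "method": "SFT",
    "base_model": "mistralai/Mistral-7B",  # example model ID from Step 2
    "learning_rate": 1e-4,                 # typical SFT default
    "epochs": 3,                           # within the suggested 2-5 range
    "eval_every_steps": 200,               # evaluation cadence
}

# Batch size is adaptive, but you can estimate optimizer steps per epoch
# once you know your dataset size and the effective batch size:
dataset_rows, batch_size = 10_000, 16
steps_per_epoch = -(-dataset_rows // batch_size)  # ceiling division
total_steps = steps_per_epoch * config["epochs"]
print(steps_per_epoch, total_steps)  # 625 1875
```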

Step 6: Choose Compute Model

FineTune Studio offers flexible compute options:

Traditional Computing

  • Single Instance – One GPU instance, perfect for smaller models and testing
  • GPU Cluster – Multiple GPUs working together, faster training for large models
  • Best for: Stable workloads, predictable costs, single location

Federated Computing

  • Single Instance (Federated) – Privacy-preserving distributed compute
  • GPU Cluster (Federated) – Unlimited scale across global nodes
  • Best for: Privacy-sensitive data, global scale, lower costs

You can switch between compute options anytime without changing your workflow.

Step 7: Launch and Monitor

  1. Review the cost estimate displayed before launching.
  2. Click Launch Training.
  3. Monitor progress in real time:
    • Loss curves – Track training and validation loss
    • Tokens/sec – Monitor training throughput
    • Epoch completion – See current epoch and total epochs
    • Queue position – If multiple jobs are queued, see your position
    • GPU allocation events – Track resource allocation
  4. You can pause or cancel if necessary (depending on job state).
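When watching the loss curves, a sudden jump well above the recent trend is the "spiking loss" condition discussed under Troubleshooting. This is a minimal, self-contained sketch of such a check using a rolling mean; the window size and spike factor are illustrative, not values FineTune Studio uses.

```python
from collections import deque

def spike_detector(window: int = 5, factor: float = 2.0):
    """Flag a loss value that exceeds `factor` times the rolling mean
    of the last `window` observations."""
    recent: deque = deque(maxlen=window)

    def check(loss: float) -> bool:
        is_spike = bool(recent) and loss > factor * (sum(recent) / len(recent))
        recent.append(loss)
        return is_spike

    return check

check = spike_detector()
losses = [2.1, 1.9, 1.8, 1.7, 5.6, 1.6]   # a smooth decline, then a spike
flags = [check(loss) for loss in losses]
print(flags)  # only the 5.6 reading is flagged
```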

Step 8: Review Results

When the run completes:
  1. Inspect evaluation metrics and sample outputs.
  2. Use the Evaluate feature (LLMTune Evaluate) to test your model:
    • Single prompt evaluation – Quick individual tests
    • Compare with base model – Side-by-side comparison
    • Batch evaluation – Test multiple prompts at once
    • Results dashboard – Comprehensive metrics and trends
  3. Compare with previous runs using the comparison view.
  4. Archive the run or mark as ready for deployment.
For detailed evaluation instructions, see the Evaluate Guide.
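The "Compare with base model" and "Batch evaluation" ideas above can be sketched with a simple scoring loop. Here the model outputs are stubbed lists (in practice LLMTune Evaluate generates them), and exact match is just one possible metric, chosen for simplicity.

```python
def exact_match_rate(outputs: list, references: list) -> float:
    """Fraction of outputs that exactly match their reference
    (ignoring surrounding whitespace)."""
    assert len(outputs) == len(references)
    hits = sum(o.strip() == r.strip() for o, r in zip(outputs, references))
    return hits / len(references)

references   = ["Hello.", "4.", "Paris."]
base_outputs = ["Hi there!", "4.", "It is Paris."]  # stubbed base model
tuned_outputs = ["Hello.", "4.", "Paris."]          # stubbed fine-tuned model

print(exact_match_rate(base_outputs, references))
print(exact_match_rate(tuned_outputs, references))
```

A side-by-side score like this makes the base-vs-fine-tuned comparison concrete before digging into the full results dashboard.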

Supported Modalities

FineTune Studio supports training models for all major modalities:
  • Text-to-Text – LLaMA, Mistral, Qwen, DeepSeek
  • Image-to-Text – Qwen-VL, LLaVA, Kimi-K2, Pixtral
  • Audio Understanding – Qwen2-Audio, MiniCPM-o
  • Audio-to-Text (ASR) – Whisper, NeMo, SpeechT5
  • Video-to-Text – InternVL, Kosmos-2, Qwen2-VL-Video
  • Code – DeepSeek-Coder, StarCoder2, CodeLLaMA
  • Multimodal – Qwen-VL, LLaVA, Kimi-K2
  • Text-to-Audio (TTS) – XTTS, Bark, Rime
  • Text-to-Embeddings – BGE, E5, text-embedding-3-large

Troubleshooting

  • Job stuck in pending: Check resource availability, or contact support if the wait exceeds expected queue times. The training queue processes jobs sequentially to conserve GPU resources.
  • Spiking loss: Inspect dataset balance; consider re-weighting or cleaning data. Review dataset quality in Dataset Hub.
  • Unexpected outputs: Revisit dataset quality and ensure conversation turns are labeled correctly for your training method.
  • Compute selection issues: You can switch between Traditional and Federated compute anytime. If one option isn’t available, try the other.

Next Steps