
Pricing & Billing

This guide explains LLMTune’s pricing model, billing process, and how to optimize your spending.


Pricing Overview

LLMTune offers simple, transparent pricing:
  • Pay-as-you-go: Only pay for what you use
  • No hidden fees: Clear pricing for all services
  • Volume discounts: Automatic discounts for high usage
  • Predictable costs: Set budget limits and alerts

Pricing Components

  1. Inference: Charged per input/output token
  2. Fine-tuning: Charged per GPU hour
  3. Storage: Free for datasets and model artifacts
  4. API requests: Included in token pricing
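
To show how these components combine into a bill, here is a minimal sketch of a monthly estimate in Python. The function name and signature are illustrative assumptions rather than part of an LLMTune SDK; the rates in the usage line are the Large-tier inference and Traditional single-instance figures listed later in this guide.

```python
# Minimal sketch (not an official SDK): only inference tokens and
# fine-tuning GPU hours are billable; storage and API requests add nothing.

def estimate_monthly_bill(
    tokens_used: int,             # input + output tokens for the month
    price_per_1m_tokens: float,   # model-tier rate in USD
    gpu_hours: float = 0.0,       # fine-tuning GPU hours consumed
    price_per_gpu_hour: float = 0.0,
    method_multiplier: float = 1.0,
) -> float:
    inference = tokens_used / 1_000_000 * price_per_1m_tokens
    fine_tuning = gpu_hours * price_per_gpu_hour * method_multiplier
    storage = 0.0        # free for datasets and model artifacts
    api_requests = 0.0   # included in token pricing
    return inference + fine_tuning + storage + api_requests

# 5M tokens on a 70B-class model plus 2 LoRA GPU hours on a single instance
print(f"${estimate_monthly_bill(5_000_000, 10.00, 2, 2.00, 0.5):.2f}")  # -> $52.00
```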

Cost Comparison

| Service | LLMTune | OpenAI | HuggingFace |
|---|---|---|---|
| GPT-4 class inference | /1M tokens | /1M tokens | /1M tokens |
| GPT-3.5 class inference | .50/1M tokens | /1M tokens | .80/1M tokens |
| Fine-tuning | /GPU hour | Custom | /GPU hour |

LLMTune offers 30-50% cost savings compared to major providers.

Inference Pricing

Inference is charged per token processed (both input and output).

Token Pricing

| Model Tier | Price per 1M Tokens |
|---|---|
| Small (7B parameters) | .50 |
| Medium (13B-34B parameters) | .00 |
| Large (70B+ parameters) | $10.00 |

Billing Model

  • Input tokens: Charged at full rate
  • Output tokens: Charged at full rate
  • Minimum charge: 1 token per request
  • Rounding: Tokens are counted exactly (no rounding up)

Example Calculations

Example 1: Simple request
  • Input: 100 tokens
  • Output: 200 tokens
  • Total: 300 tokens
  • Cost (70B model): 300 / 1,000,000 * $10.00 = $0.003
Example 2: Batch processing
  • 10 requests, 1,000 tokens each
  • Total: 10,000 tokens
  • Cost (70B model): 10,000 / 1,000,000 * $10.00 = $0.10
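
The same arithmetic can be written as a short helper. This is a hedged sketch, not part of LLMTune's API: the function name is an assumption, the $10.00/1M rate is the Large-tier price used in the examples above, and Example 2's 500/500 input/output split is likewise assumed, since only each request's 1,000-token total is given.

```python
# Sketch of per-request inference billing: input and output tokens are
# charged at the same rate, counted exactly, with a minimum of 1 token.

def inference_cost(input_tokens: int, output_tokens: int,
                   price_per_1m_tokens: float) -> float:
    billable = max(input_tokens + output_tokens, 1)  # minimum charge: 1 token
    return billable / 1_000_000 * price_per_1m_tokens

LARGE_TIER = 10.00  # USD per 1M tokens, 70B+ models

# Example 1: 100 input + 200 output tokens
print(f"${inference_cost(100, 200, LARGE_TIER):.3f}")  # -> $0.003

# Example 2: 10 requests of 1,000 tokens each
batch = sum(inference_cost(500, 500, LARGE_TIER) for _ in range(10))
print(f"${batch:.2f}")                                  # -> $0.10
```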

Specific Model Pricing

| Model | Price per 1M Tokens | Notes |
|---|---|---|
| Llama 3.3 70B | .00 | Premium performance |
| Mistral 7B | .50 | Fast, cost-effective |
| Qwen2.5 72B | .00 | Excellent value |
| DeepSeek R1 | .00 | Strong reasoning |

Volume Discounts

Automatic discounts apply at these monthly thresholds:

| Monthly Usage | Discount |
|---|---|
| 1M+ tokens | 5% |
| 10M+ tokens | 10% |
| 100M+ tokens | 20% |
| 1B+ tokens | 30% |

Discounts are applied automatically at the end of each billing cycle.
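
A sketch of how these tiers could be applied at the end of a cycle is shown below. The thresholds and rates come from the table above; the function names, and the assumption that only the highest tier reached applies rather than tiers stacking, are illustrative.

```python
# Illustrative end-of-cycle discount lookup based on the tiers above,
# assuming the highest threshold reached determines the discount.

DISCOUNT_TIERS = [           # (minimum monthly tokens, discount rate)
    (1_000_000_000, 0.30),
    (100_000_000, 0.20),
    (10_000_000, 0.10),
    (1_000_000, 0.05),
]

def volume_discount(monthly_tokens: int) -> float:
    for threshold, rate in DISCOUNT_TIERS:
        if monthly_tokens >= threshold:
            return rate
    return 0.0

def discounted_cost(base_cost: float, monthly_tokens: int) -> float:
    return base_cost * (1 - volume_discount(monthly_tokens))

print(volume_discount(25_000_000))          # -> 0.1 (10% tier)
print(discounted_cost(250.00, 25_000_000))  # -> 225.0
```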

Fine-Tuning Pricing

Fine-tuning is charged per GPU hour, with multipliers based on the training method.

GPU Hour Pricing

| Compute Type | Price per GPU Hour |
|---|---|
| Traditional - Single Instance | $2.00 |
| Traditional - GPU Cluster | $2.50 |
| Federated - Single Instance | .50 |
| Federated - GPU Cluster | .00 |

Training Method Multipliers

Different training methods have different compute requirements:

| Method | GPU Hour Multiplier | Notes |
|---|---|---|
| SFT | 1× | Baseline |
| DPO | 1.5× | Requires reward model |
| PPO | 2× | Most compute-intensive |
| RLAIF | 1.8× | AI feedback loop |
| CTO | 1.2× | Controlled tuning |
| LoRA | 0.5× | Parameter-efficient |
| QLoRA | 0.3× | Most efficient |

Cost Estimation

Before launching training, LLMTune provides:
  1. GPU hour estimate: Based on model size and dataset
  2. Cost estimate: Based on compute type and training method
  3. Time estimate: Based on current queue and compute availability
You can adjust parameters to see cost impacts before launching.

Example Calculations

Example 1: SFT with LoRA
  • Model: Llama 3.3 70B
  • Dataset: 100K examples
  • Method: SFT with LoRA (0.5× multiplier)
  • Compute: Traditional Single Instance ($2.00/hour)
  • Estimated GPU hours: 2 hours
  • Cost: 2 * $2.00 * 0.5 = $2.00
Example 2: PPO full fine-tune
  • Model: Mistral 7B
  • Dataset: 50K examples
  • Method: PPO (2× multiplier)
  • Compute: Traditional GPU Cluster ($2.50/hour)
  • Estimated GPU hours: 1 hour
  • Cost: 1 * $2.50 * 2 = $5.00
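
Both examples follow the same formula: GPU hours × hourly rate × method multiplier. Below is a minimal sketch of that calculation; the dictionary keys and function name are illustrative, with the multipliers taken from the table above and the Traditional compute rates from these examples.

```python
# Sketch of the fine-tuning cost formula: GPU hours x hourly rate x multiplier.
# Key names are illustrative, not an official LLMTune API.

GPU_HOUR_RATES = {              # USD per GPU hour (Traditional compute)
    "traditional_single": 2.00,
    "traditional_cluster": 2.50,
}

METHOD_MULTIPLIERS = {
    "sft": 1.0, "dpo": 1.5, "ppo": 2.0, "rlaif": 1.8,
    "cto": 1.2, "lora": 0.5, "qlora": 0.3,
}

def fine_tuning_cost(gpu_hours: float, compute: str, method: str) -> float:
    return gpu_hours * GPU_HOUR_RATES[compute] * METHOD_MULTIPLIERS[method]

print(fine_tuning_cost(2, "traditional_single", "lora"))   # Example 1 -> 2.0
print(fine_tuning_cost(1, "traditional_cluster", "ppo"))   # Example 2 -> 5.0
```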

Training Queue

Jobs are processed sequentially to conserve GPU resources:
  • Queue position shown before launch
  • Estimated wait time provided
  • No charge for time spent in queue
  • Billing starts when GPU allocation begins
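
As a rough illustration of that billing boundary (the timestamp fields and function name are assumptions for illustration, not a documented LLMTune API), queue time drops out of the charge:

```python
# Illustrative only: billing starts at GPU allocation, not at submission,
# so time spent waiting in the queue is free.
from datetime import datetime

def queue_and_billed_hours(submitted_at: datetime, gpu_allocated_at: datetime,
                           finished_at: datetime) -> tuple[float, float]:
    queued = (gpu_allocated_at - submitted_at).total_seconds() / 3600  # free
    billed = (finished_at - gpu_allocated_at).total_seconds() / 3600   # charged
    return queued, billed

queued, billed = queue_and_billed_hours(
    datetime(2025, 1, 1, 9, 0),    # job submitted
    datetime(2025, 1, 1, 10, 30),  # GPU allocated (billing starts here)
    datetime(2025, 1, 1, 12, 30),  # training finished
)
print(queued, billed)  # -> 1.5 2.0: the 1.5 h queue wait is not charged
```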