# Pricing & Billing
This guide explains LLMTune’s pricing model, billing process, and how to optimize your spending.
## Table of Contents

- Pricing Overview
- Cost Comparison
- Inference Pricing
- Fine-Tuning Pricing
- Training Queue
## Pricing Overview
LLMTune offers simple, transparent pricing:
- Pay-as-you-go: Only pay for what you use
- No hidden fees: Clear pricing for all services
- Volume discounts: Automatic discounts for high usage
- Predictable costs: Set budget limits and alerts
### Pricing Components
- Inference: Charged per input/output token
- Fine-tuning: Charged per GPU hour
- Storage: Free for datasets and model artifacts
- API requests: Included in token pricing
## Cost Comparison

| Service | LLMTune | OpenAI | HuggingFace |
|---|---|---|---|
| GPT-4 class inference | /1M tokens | /1M tokens | /1M tokens |
| GPT-3.5 class inference | .50/1M tokens | /1M tokens | .80/1M tokens |
| Fine-tuning | /GPU hour | Custom | /GPU hour |
LLMTune offers 30-50% cost savings compared to major providers.
## Inference Pricing
Inference is charged per token processed (both input and output).
### Token Pricing

| Model Tier | Price per 1M Tokens |
|---|---|
| Small (7B parameters) | .50 |
| Medium (13B-34B parameters) | .00 |
| Large (70B+ parameters) | $10.00 |
### Billing Model
- Input tokens: Charged at full rate
- Output tokens: Charged at full rate
- Minimum charge: 1 token per request
- Rounding: Tokens are counted exactly (no rounding up)
### Example Calculations
Example 1: Simple request
- Input: 100 tokens
- Output: 200 tokens
- Total: 300 tokens
- Cost (70B model): 300 / 1,000,000 * $10.00 = $0.003
Example 2: Batch processing
- 10 requests, 1,000 tokens each
- Total: 10,000 tokens
- Cost (70B model): 10,000 / 1,000,000 * $10.00 = $0.10
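The billing model above can be sketched in a few lines. This is an illustrative helper (the name `inference_cost` and the default rate are not from an official SDK); the $10.00/1M-token large-tier rate is the one implied by the worked examples, and the minimum charge of 1 token is applied per request:

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   price_per_million: float = 10.00) -> float:
    """Cost in dollars for one request: input and output tokens are both
    charged at the full per-million-token rate, counted exactly."""
    total_tokens = max(input_tokens + output_tokens, 1)  # minimum charge: 1 token
    return total_tokens * price_per_million / 1_000_000

# Example 1: 100 input + 200 output tokens on a 70B-class model
print(inference_cost(100, 200))   # 0.003
# Example 2: 10 requests of 1,000 tokens each, batched
print(inference_cost(0, 10_000))  # 0.1
```

Because input and output tokens are priced identically here, only the total matters; a tier with asymmetric input/output pricing would need two rate parameters.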
### Specific Model Pricing

| Model | Price per 1M Tokens | Notes |
|---|---|---|
| Llama 3.3 70B | $10.00 | Premium performance |
| Mistral 7B | .50 | Fast, cost-effective |
| Qwen2.5 72B | .00 | Excellent value |
| DeepSeek R1 | .00 | Strong reasoning |
### Volume Discounts
Automatic discounts apply at these monthly thresholds:
| Monthly Usage | Discount |
|---|---|
| 1M+ tokens | 5% |
| 10M+ tokens | 10% |
| 100M+ tokens | 20% |
| 1B+ tokens | 30% |
Discounts are applied automatically at the end of each billing cycle.
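The tier lookup above can be sketched as follows. One assumption to flag: the guide does not say whether the discount is flat (one rate applied to the whole month's usage) or marginal per tier, so this sketch assumes a flat rate, and the name `volume_discount` is illustrative:

```python
# Monthly volume tiers from the table above, checked highest first.
DISCOUNT_TIERS = [
    (1_000_000_000, 0.30),  # 1B+ tokens
    (100_000_000, 0.20),    # 100M+ tokens
    (10_000_000, 0.10),     # 10M+ tokens
    (1_000_000, 0.05),      # 1M+ tokens
]

def volume_discount(monthly_tokens: int) -> float:
    """Return the flat discount rate for a month's total token usage."""
    for threshold, rate in DISCOUNT_TIERS:
        if monthly_tokens >= threshold:
            return rate
    return 0.0

print(volume_discount(500_000))     # 0.0  (below the first threshold)
print(volume_discount(15_000_000))  # 0.1
```

Since discounts are applied at the end of the billing cycle, this lookup would run once against the month's total, not per request.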
## Fine-Tuning Pricing
Fine-tuning is charged per GPU hour, with multipliers based on the training method.
### GPU Hour Pricing

| Compute Type | Price per GPU Hour |
|---|---|
| Traditional - Single Instance | $2.00 |
| Traditional - GPU Cluster | $2.50 |
| Federated - Single Instance | .50 |
| Federated - GPU Cluster | .00 |
### Training Method Multipliers
Different training methods have different compute requirements:
| Method | GPU Hour Multiplier | Notes |
|---|---|---|
| SFT | 1× | Baseline |
| DPO | 1.5× | Requires reward model |
| PPO | 2× | Most compute-intensive |
| RLAIF | 1.8× | AI feedback loop |
| CTO | 1.2× | Controlled tuning |
| LoRA | 0.5× | Parameter-efficient |
| QLoRA | 0.3× | Most efficient |
### Cost Estimation
Before launching training, LLMTune provides:
- GPU hour estimate: Based on model size and dataset
- Cost estimate: Based on compute type and training method
- Time estimate: Based on current queue and compute availability
You can adjust parameters to see cost impacts before launching.
### Example Calculations
Example 1: SFT with LoRA
- Model: Llama 3.3 70B
- Dataset: 100K examples
- Method: SFT with LoRA (0.5× multiplier)
- Compute: Traditional Single Instance ($2.00/hour)
- Estimated GPU hours: 2 hours
- Cost: 2 * $2.00 * 0.5 = $2.00
Example 2: PPO full fine-tune
- Model: Mistral 7B
- Dataset: 50K examples
- Method: PPO (2× multiplier)
- Compute: Traditional GPU Cluster ($2.50/hour)
- Estimated GPU hours: 1 hour
- Cost: 1 * $2.50 * 2 = $5.00
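Both examples follow the same formula: estimated GPU hours × hourly rate × method multiplier. A minimal sketch of that calculation, using the multiplier table above; the function name `finetune_cost` is illustrative, and the hourly rates passed in are the ones from the two worked examples:

```python
# GPU-hour multipliers from the training-method table above.
METHOD_MULTIPLIER = {
    "SFT": 1.0, "DPO": 1.5, "PPO": 2.0,
    "RLAIF": 1.8, "CTO": 1.2, "LoRA": 0.5, "QLoRA": 0.3,
}

def finetune_cost(gpu_hours: float, hourly_rate: float, method: str) -> float:
    """Estimated training cost in dollars: hours x rate x method multiplier."""
    return gpu_hours * hourly_rate * METHOD_MULTIPLIER[method]

# Example 1: SFT with LoRA, single instance at $2.00/hour, 2 GPU hours
print(finetune_cost(2, 2.00, "LoRA"))  # 2.0
# Example 2: PPO, GPU cluster at $2.50/hour, 1 GPU hour
print(finetune_cost(1, 2.50, "PPO"))   # 5.0
```

Note that the multiplier scales cost without changing the billed GPU hours, which is why a cheap hourly rate can still produce a large bill for compute-intensive methods like PPO.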
## Training Queue
Jobs are processed sequentially to conserve GPU resources:
- Queue position shown before launch
- Estimated wait time provided
- No charge for time spent in queue
- Billing starts when GPU allocation begins