Documentation Index
Fetch the complete documentation index at: https://docs.llmtune.io/llms.txt
Use this file to discover all available pages before exploring further.
Usage Metering
Usage metering shows how your models are performing and what they cost. LLMTune provides comprehensive usage tracking across all products.
Metrics Captured
Inference Metrics
- Requests – Count of inference requests, grouped by endpoint and time range
- Tokens – Input and output tokens per request
- Latency – P50, P95, P99 response times for each endpoint
- Errors – Rate of retryable or fatal errors
- Success rate – Percentage of successful requests
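The latency percentiles above can be derived from raw per-request response times. A minimal nearest-rank sketch (the sample latencies are made up for illustration, not real LLMTune data):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of numbers (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds for one endpoint.
latencies_ms = [120, 95, 310, 150, 101, 98, 870, 140, 125, 99]

p50 = percentile(latencies_ms, 50)  # median
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With a small sample like this, P95 and P99 land on the same slowest request; the tail percentiles only become meaningful at production request volumes.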
Training Metrics
- Training jobs – Number of jobs launched
- GPU hours – Compute time used for training
- Training spend – Cost per training job
- Queue time – Time spent waiting in queue
Deployment Metrics
- Deployments – Number of active deployments
- Version changes – Deployment version updates
- Traffic distribution – Traffic split across versions
Overall Metrics
- Spend – Aggregated cost for training runs and inference in your billing currency
- Usage trends – Daily, weekly, monthly usage patterns
- Cost breakdown – Spending by product, model, or endpoint
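A cost breakdown like the one above is an aggregation over raw usage records. A sketch of the idea, assuming a hypothetical record shape and per-million-token prices (neither is LLMTune's documented schema):

```python
from collections import defaultdict

# Hypothetical usage records; field names are illustrative.
records = [
    {"model": "ft-chat-small", "input_tokens": 1200, "output_tokens": 400},
    {"model": "ft-chat-small", "input_tokens": 800,  "output_tokens": 300},
    {"model": "ft-chat-large", "input_tokens": 500,  "output_tokens": 900},
]

# Hypothetical (input, output) prices in USD per million tokens.
prices = {"ft-chat-small": (0.50, 1.50), "ft-chat-large": (3.00, 9.00)}

spend = defaultdict(float)
for r in records:
    p_in, p_out = prices[r["model"]]
    spend[r["model"]] += (r["input_tokens"] * p_in
                          + r["output_tokens"] * p_out) / 1_000_000
```

The same grouping works per endpoint or per product; only the key you aggregate on changes.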
Usage Dashboards
Navigate to Usage in the dashboard for comprehensive usage analytics.
Overview Dashboard
The overview provides a consolidated view with:
- Summary cards – Total requests, tokens, spend, and active deployments
- Time series charts – Usage trends over time
- Top models – Most used models and endpoints
- Cost breakdown – Spending by category
Filtering Options
Filter usage data by:
- Model or endpoint – View usage for specific models
- Environment – Staging vs production
- Time ranges – Last hour, day, week, month, or custom range
- Team member – Usage by user (if team features are enabled)
- API key – Usage per API key for debugging
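These filters compose: each one narrows the same set of usage records. A client-side sketch of the same logic, assuming hypothetical field names ("environment", "timestamp") that are not LLMTune's documented schema:

```python
from datetime import datetime

# Hypothetical usage records.
records = [
    {"timestamp": datetime(2024, 6, 1, 12), "environment": "production", "requests": 40},
    {"timestamp": datetime(2024, 6, 1, 13), "environment": "staging",    "requests": 5},
    {"timestamp": datetime(2024, 5, 1, 9),  "environment": "production", "requests": 70},
]

def filter_usage(rows, environment=None, since=None):
    """Apply optional environment and time-range filters, narrowing in turn."""
    out = rows
    if environment is not None:
        out = [r for r in out if r["environment"] == environment]
    if since is not None:
        out = [r for r in out if r["timestamp"] >= since]
    return out

# Production traffic in a custom time range.
recent_prod = filter_usage(records, environment="production",
                           since=datetime(2024, 5, 15))
```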
Alerts and Webhooks
Configure usage thresholds to trigger alerts:
Email Notifications
- Spend alerts – Notify when spending exceeds thresholds
- Usage spikes – Alert on unusual usage patterns
- Error rate alerts – Notify when error rates exceed limits
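A spend alert is, at its core, a threshold comparison with a notification side effect. A minimal sketch (the `notify` callback stands in for the email delivery, which this example does not implement):

```python
def check_spend_alert(current_spend, threshold, notify):
    """Call notify(message) when spend meets or exceeds the threshold."""
    if current_spend >= threshold:
        notify(f"Spend ${current_spend:.2f} exceeded threshold ${threshold:.2f}")
        return True
    return False

# Collect notifications in a list instead of sending email.
alerts = []
fired = check_spend_alert(125.40, 100.00, alerts.append)
```

Error-rate and usage-spike alerts follow the same shape, with a rate or anomaly score in place of the spend figure.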
Webhooks
Subscribe to training and deployment events via webhooks (see Webhooks). Usage-threshold webhook events may be added in a future release.
Billing Integration
LLMTune integrates with Stripe for billing:
- Automatic top-ups – Configure automatic balance top-ups
- Usage-based billing – Pay per token or request
- Training billing – Pay per GPU hour for training jobs
- Billing exports – Download usage and billing reports
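A bill under this model combines per-token inference charges with per-GPU-hour training charges. A back-of-the-envelope sketch; the prices here are placeholders, not LLMTune's published rates:

```python
def estimate_bill(total_tokens, gpu_hours,
                  price_per_million_tokens=2.00,   # hypothetical rate, USD
                  price_per_gpu_hour=3.50):        # hypothetical rate, USD
    """Estimate a bill from token and GPU-hour usage."""
    inference = total_tokens / 1_000_000 * price_per_million_tokens
    training = gpu_hours * price_per_gpu_hour
    return {"inference": inference, "training": training,
            "total": inference + training}

# 4.5M tokens of inference plus a 12 GPU-hour fine-tuning job.
bill = estimate_bill(total_tokens=4_500_000, gpu_hours=12)
```

Comparing an estimate like this against the billing exports is a quick sanity check on unexpected charges.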
Rate Limits
Rate limits are applied per workspace and plan:
- Sandbox – Lower limits for experimentation
- Growth / Production – Higher limits for production traffic
- Enterprise – Custom limits and SLAs
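When a request exceeds your plan's rate limit, the standard client-side response is to retry with exponential backoff. A sketch of the pattern; the `send` callable stands in for a real LLMTune API call, and HTTP 429 as the rate-limit status is an assumption:

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry send() on rate limiting, doubling the delay each attempt."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:          # 429 = rate limited (assumed)
            return body
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("rate limited: retries exhausted")

# Simulated endpoint: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
delays = []  # record the sleeps instead of actually waiting
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

In production you would also add jitter to the delay and honor any Retry-After header the API returns.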
Best Practices
- Monitor regularly – Check usage dashboards weekly
- Set up alerts – Configure thresholds to avoid surprises
- Review costs – Understand what drives spending
- Optimize usage – Use caching, batching, and efficient models
- Track trends – Watch for usage spikes or anomalies
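Spike tracking can be as simple as comparing each day against a trailing average. A sketch that flags any day exceeding 3x the previous week's mean (the daily counts and the 3x factor are illustrative choices, not an LLMTune default):

```python
def find_spikes(daily_counts, window=7, factor=3.0):
    """Return indices of days exceeding factor x the trailing-window average."""
    spikes = []
    for i in range(window, len(daily_counts)):
        avg = sum(daily_counts[i - window:i]) / window
        if avg > 0 and daily_counts[i] > factor * avg:
            spikes.append(i)
    return spikes

# Hypothetical daily request counts: steady traffic, then a one-day spike.
counts = [100, 110, 95, 105, 100, 98, 102, 900, 101]
spike_days = find_spikes(counts)
```

Pairing a check like this with the alerting described above turns a weekly dashboard review into an automated safeguard.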
Troubleshooting
- Unexpected charges – Review usage breakdown by model and endpoint
- High latency – Check latency metrics to identify slow endpoints
- Rate limit issues – Monitor rate limit usage and upgrade if needed
- Missing data – Ensure API keys are properly configured and requests are authenticated
Next Steps
- Learn about Workspaces to understand workspace-level usage
- Read the Deployment Guide to understand deployment metrics
- Check the API documentation for programmatic usage access
- Set up Webhooks for usage automation