Usage Metering
Usage metering helps you understand how your models are performing and what they cost. LLMTune provides comprehensive usage tracking across all products.

Metrics Captured
Inference Metrics
- Requests – Count of inference requests, grouped by endpoint and time range
- Tokens – Input and output tokens per request
- Latency – P50, P95, P99 response times for each endpoint
- Errors – Rate of retryable or fatal errors
- Success rate – Percentage of successful requests
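The latency percentiles above can be reproduced from raw per-request data. A minimal sketch using the nearest-rank method, with illustrative latency values (not real LLMTune data):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(0, rank - 1)]

# Illustrative per-request latencies in milliseconds.
latencies_ms = [120, 135, 150, 180, 210, 240, 300, 450, 900, 1200]
for pct in (50, 95, 99):
    print(f"P{pct}: {percentile(latencies_ms, pct)} ms")
```

The long tail (P99 far above P50) is typical for LLM endpoints, which is why the dashboard reports all three percentiles rather than an average.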
Training Metrics
- Training jobs – Number of jobs launched
- GPU hours – Compute time used for training
- Training spend – Cost per training job
- Queue time – Time spent waiting in queue
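Training spend follows directly from GPU hours. A sketch of the arithmetic, using a made-up hourly rate (not an LLMTune price; see your plan for actual rates):

```python
# Hypothetical rate for illustration only.
GPU_HOURLY_RATE_USD = 2.50

def training_cost(num_gpus: int, hours: float) -> float:
    """Total GPU hours consumed times the hourly rate."""
    return num_gpus * hours * GPU_HOURLY_RATE_USD

# An 8-GPU job running for 3.5 wall-clock hours consumes 28 GPU hours.
print(f"${training_cost(8, 3.5):.2f}")  # $70.00
```

Note that queue time is reported separately and is not billed as GPU hours.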
Deployment Metrics
- Deployments – Number of active deployments
- Version changes – Deployment version updates
- Traffic distribution – Traffic split across versions
Overall Metrics
- Spend – Aggregated cost for training runs and inference in your billing currency
- Usage trends – Daily, weekly, monthly usage patterns
- Cost breakdown – Spending by product, model, or endpoint
Usage Dashboards
Navigate to Usage in the dashboard for comprehensive usage analytics.

Overview Dashboard
The overview provides a consolidated view with:
- Summary cards – Total requests, tokens, spend, and active deployments
- Time series charts – Usage trends over time
- Top models – Most used models and endpoints
- Cost breakdown – Spending by category
Filtering Options
Filter usage data by:
- Model or endpoint – View usage for specific models
- Environment – Staging vs production
- Time ranges – Last hour, day, week, month, or custom range
- Team member – Usage by user (if team features are enabled)
- API key – Usage per API key for debugging
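These filters can also be combined when querying usage programmatically. The sketch below only shows how such a query URL might be assembled; the `/v1/usage` path and all parameter names are assumptions, not a documented LLMTune API, so check the API reference for the real schema:

```python
from urllib.parse import urlencode

def usage_query_url(base: str, **filters: str) -> str:
    """Append non-empty filters as query parameters (names are hypothetical)."""
    params = {k: v for k, v in filters.items() if v}
    return f"{base}/v1/usage?{urlencode(params)}"

url = usage_query_url(
    "https://api.llmtune.example",  # placeholder host
    model="my-finetune",            # hypothetical model name
    environment="production",
    range="7d",
)
print(url)
```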
Alerts and Webhooks
Configure usage thresholds to trigger alerts:

Email Notifications
- Spend alerts – Notify when spending exceeds thresholds
- Usage spikes – Alert on unusual usage patterns
- Error rate alerts – Notify when error rates exceed limits
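The same threshold logic the dashboard applies can be mirrored client-side, for example in a periodic job that polls your spend. A minimal sketch; the `notify` stub stands in for whatever email or chat hook you use:

```python
def notify(message: str) -> None:
    # Stand-in for an email/Slack notification hook.
    print(message)

def check_spend(current_spend: float, threshold: float) -> bool:
    """Return True (and notify) when spend crosses the threshold."""
    if current_spend >= threshold:
        notify(f"Spend ${current_spend:.2f} exceeded threshold ${threshold:.2f}")
        return True
    return False

check_spend(105.0, 100.0)  # fires a notification
check_spend(50.0, 100.0)   # below threshold, silent
```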
Webhooks
Subscribe to training and deployment events via webhooks (see Webhooks). Usage-threshold webhook events may be added in a future release.

Billing Integration
LLMTune integrates with Stripe for billing:
- Automatic top-ups – Configure automatic balance top-ups
- Usage-based billing – Pay per token or request
- Training billing – Pay per GPU hour for training jobs
- Billing exports – Download usage and billing reports
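Downloaded billing exports are straightforward to aggregate yourself. The record shape below is an assumption about what an export row might contain, not a documented format:

```python
from collections import defaultdict

# Illustrative export rows; real exports may use different field names.
records = [
    {"model": "my-finetune", "cost_usd": 1.20},
    {"model": "base-small", "cost_usd": 0.35},
    {"model": "my-finetune", "cost_usd": 0.80},
]

# Sum spend per model, highest first.
by_model = defaultdict(float)
for rec in records:
    by_model[rec["model"]] += rec["cost_usd"]

for model, cost in sorted(by_model.items(), key=lambda kv: -kv[1]):
    print(f"{model}: ${cost:.2f}")
```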
Rate Limits
Rate limits are applied per workspace and plan:
- Sandbox – Lower limits for experimentation
- Growth / Production – Higher limits for production traffic
- Enterprise – Custom limits and SLAs
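When a request is rejected for exceeding a rate limit, the standard remedy is exponential backoff. A sketch of the pattern; `RateLimitError` is a stand-in for whatever exception your client raises on HTTP 429:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the client's rate-limit exception (HTTP 429)."""

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry `call` on RateLimitError, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example: a call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```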
Best Practices
- Monitor regularly – Check usage dashboards weekly
- Set up alerts – Configure thresholds to avoid surprises
- Review costs – Understand what drives spending
- Optimize usage – Use caching, batching, and efficient models
- Track trends – Watch for usage spikes or anomalies
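Of the optimizations above, caching is the simplest to apply client-side: identical prompts need not be re-billed. A minimal sketch; `generate` is a hypothetical stand-in for your inference call, keyed on the (model, prompt) pair:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_generate(model: str, prompt: str) -> str:
    # Only reaches the (hypothetical) inference call on a cache miss.
    return generate(model, prompt)

calls = []
def generate(model: str, prompt: str) -> str:
    calls.append(prompt)  # track billable calls for the demo
    return f"response to {prompt!r}"

cached_generate("my-finetune", "hello")
cached_generate("my-finetune", "hello")  # served from cache, no new call
print(len(calls))  # 1
```

Note this only helps for exact repeats; vary-by-user prompts will not hit the cache.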
Troubleshooting
- Unexpected charges – Review usage breakdown by model and endpoint
- High latency – Check latency metrics to identify slow endpoints
- Rate limit issues – Monitor rate limit usage and upgrade if needed
- Missing data – Ensure API keys are properly configured and requests are authenticated
Next Steps
- Learn about Workspaces to understand workspace-level usage
- Read the Deployment Guide to understand deployment metrics
- Check the API documentation for programmatic usage access
- Set up Webhooks for usage automation