Usage Metering
Usage metering helps you understand how your models are performing and what they cost. LLMTune provides comprehensive usage tracking across all products.
Metrics Captured
Inference Metrics
- Requests – Count of inference requests, grouped by endpoint and time range
- Tokens – Input and output tokens per request
- Latency – P50, P95, P99 response times for each endpoint
- Errors – Rate of retryable or fatal errors
- Success rate – Percentage of successful requests
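The latency percentiles above (P50, P95, P99) are computed server-side, but a small sketch makes the metric concrete: the P95 is the latency below which 95% of requests fall, computed here with the nearest-rank method over a sample of per-request latencies.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

# Illustrative per-request latencies in milliseconds; note how a
# single slow outlier (900 ms) dominates the tail percentiles.
latencies_ms = [120, 95, 130, 250, 110, 105, 900, 115, 125, 100]
p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # tail latency
```

This is why dashboards show P95/P99 alongside P50: the median can look healthy while tail latency degrades.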
Training Metrics
- Training jobs – Number of jobs launched
- GPU hours – Compute time used for training
- Training spend – Cost per training job
- Queue time – Time spent waiting in queue
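Training spend is driven by GPU hours, so a rough estimate is simply hours times an hourly rate. A minimal sketch, with a placeholder rate (the $2.50/hour figure is illustrative, not LLMTune's actual pricing — check the billing page for real rates):

```python
GPU_HOURLY_RATE_USD = 2.50  # hypothetical rate, not actual pricing

def estimate_training_spend(gpu_hours: float) -> float:
    """Rough cost estimate for a training job: hours x hourly rate."""
    return round(gpu_hours * GPU_HOURLY_RATE_USD, 2)

cost = estimate_training_spend(12.5)  # a 12.5 GPU-hour job
```

Queue time is not billed in this sketch; only active GPU hours contribute.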
Deployment Metrics
- Deployments – Number of active deployments
- Version changes – Deployment version updates
- Traffic distribution – Traffic split across versions
Overall Metrics
- Spend – Aggregated cost for training runs and inference in your billing currency
- Usage trends – Daily, weekly, monthly usage patterns
- Cost breakdown – Spending by product, model, or endpoint
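The cost breakdown is an aggregation of per-request spend by a grouping key (product, model, or endpoint). A sketch of the "spending by model" view, using made-up records:

```python
from collections import defaultdict

# Illustrative per-request usage records (model names are examples).
records = [
    {"model": "llama-3-8b", "spend": 0.12},
    {"model": "llama-3-70b", "spend": 0.85},
    {"model": "llama-3-8b", "spend": 0.03},
]

# Group spend by model -- the same fold works for endpoint or product.
breakdown = defaultdict(float)
for rec in records:
    breakdown[rec["model"]] += rec["spend"]
```

Swapping the grouping key (`"model"` for an endpoint or product field) yields the other breakdowns the dashboard shows.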
Usage Dashboards
Navigate to Usage in the dashboard for comprehensive usage analytics.
Overview Dashboard
The overview provides a consolidated view with:
- Summary cards – Total requests, tokens, spend, and active deployments
- Time series charts – Usage trends over time
- Top models – Most used models and endpoints
- Cost breakdown – Spending by category
Filtering Options
Filter usage data by:
- Model or endpoint – View usage for specific models
- Environment – Staging vs production
- Time ranges – Last hour, day, week, month, or custom range
- Team member – Usage by user (if team features are enabled)
- API key – Usage per API key for debugging
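If you pull usage data programmatically, these filters typically map to query parameters. The endpoint URL and parameter names below (`model`, `environment`, `start`, `end`) are assumptions for illustration — consult the API reference for the real ones:

```python
from urllib.parse import urlencode

# Hypothetical filter parameters for a usage-export endpoint.
filters = {
    "model": "llama-3-8b",
    "environment": "production",
    "start": "2024-06-01",
    "end": "2024-06-30",
}
query = urlencode(filters)
url = f"https://api.llmtune.example/v1/usage?{query}"  # placeholder host
```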
Alerts and Webhooks
Configure usage thresholds to trigger alerts:
Email Notifications
- Spend alerts – Notify when spending exceeds thresholds
- Usage spikes – Alert on unusual usage patterns
- Error rate alerts – Notify when error rates exceed limits
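A usage-spike alert boils down to comparing current usage against a recent baseline. A minimal sketch of that check — the 3x factor is an illustrative choice, not LLMTune's actual heuristic, and the real evaluation happens server-side:

```python
from statistics import mean

def is_spike(history, today, factor=3.0):
    """True if today's usage exceeds `factor` times the trailing mean."""
    return today > factor * mean(history)

daily_requests = [1200, 1100, 1300, 1250]  # trailing daily counts
spike = is_spike(daily_requests, today=5000)
```

Spend and error-rate alerts follow the same shape: a metric, a threshold, and a notification when the threshold is crossed.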
Webhooks
Subscribe to usage events via webhooks:
- usage.threshold_reached – Fired when usage crosses configured thresholds
- Custom webhook URLs for integration with Slack, PagerDuty, or internal tooling
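Webhook receivers should verify deliveries before acting on them. The HMAC-SHA256 signature scheme and secret format below are assumptions based on a common webhook pattern, not LLMTune's documented scheme — check the webhooks reference before relying on it:

```python
import hashlib
import hmac
import json

WEBHOOK_SECRET = b"whsec_example"  # hypothetical signing secret

def verify_and_parse(body: bytes, signature_hex: str):
    """Return the parsed event if the signature matches, else None."""
    expected = hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_hex):
        return None
    return json.loads(body)

# Simulate an incoming usage.threshold_reached delivery:
payload = json.dumps({"type": "usage.threshold_reached",
                      "data": {"threshold_usd": 100}}).encode()
sig = hmac.new(WEBHOOK_SECRET, payload, hashlib.sha256).hexdigest()
event = verify_and_parse(payload, sig)
```

The same handler can route events to Slack, PagerDuty, or internal tooling once the signature checks out.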
Billing Integration
LLMTune integrates with Stripe for billing:
- Automatic top-ups – Configure automatic balance top-ups
- Usage-based billing – Pay per token or request
- Training billing – Pay per GPU hour for training jobs
- Billing exports – Download usage and billing reports
Rate Limits
Rate limits are applied per workspace and plan:
- Sandbox – Lower limits for experimentation
- Growth / Production – Higher limits for production traffic
- Enterprise – Custom limits and SLAs
Monitor rate limit usage in the Usage dashboard to avoid hitting limits.
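When you do hit a limit, the standard remedy is retrying with exponential backoff. A sketch where `flaky` stands in for any client call that raises on HTTP 429; the backoff schedule is illustrative:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the error a client raises on HTTP 429."""

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Retry `call` with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Demo: a call that is rate limited twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
```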
Best Practices
- Monitor regularly – Check usage dashboards weekly
- Set up alerts – Configure thresholds to avoid surprises
- Review costs – Understand what drives spending
- Optimize usage – Use caching, batching, and efficient models
- Track trends – Watch for usage spikes or anomalies
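Of the optimizations above, caching is the quickest win for repeated identical prompts: serve the stored response instead of paying for a second inference call. A minimal sketch using `functools.lru_cache` (only helps for exact-match repeats):

```python
from functools import lru_cache

calls = {"n": 0}  # counts actual (billable) inference calls

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    calls["n"] += 1  # a real implementation would call the API here
    return f"response to: {prompt}"

cached_generate("summarize this doc")
cached_generate("summarize this doc")  # served from cache, no new call
```

For near-duplicate prompts, batching requests or routing to a smaller model usually saves more than caching.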
Troubleshooting
- Unexpected charges – Review usage breakdown by model and endpoint
- High latency – Check latency metrics to identify slow endpoints
- Rate limit issues – Monitor rate limit usage and upgrade if needed
- Missing data – Ensure API keys are properly configured and requests are authenticated
Next Steps