Usage Metering

Usage metering shows how your models are performing and what they cost. LLMTune provides comprehensive usage tracking across all products.

Metrics Captured

Inference Metrics

  • Requests – Count of inference requests, grouped by endpoint and time range
  • Tokens – Input and output tokens per request
  • Latency – P50, P95, P99 response times for each endpoint
  • Errors – Rate of retryable or fatal errors
  • Success rate – Percentage of successful requests
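The latency percentiles above can be reproduced from raw per-request latencies. A minimal sketch using the nearest-rank method (function and variable names are illustrative, not part of any LLMTune SDK):

```python
# Compute the P50/P95/P99 latencies the dashboard reports,
# using the nearest-rank percentile definition.
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile of a list of latencies (milliseconds)."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [120, 95, 110, 480, 105, 130, 98, 102, 115, 900]
p50 = percentile(latencies, 50)  # median request
p95 = percentile(latencies, 95)  # tail latency
p99 = percentile(latencies, 99)
```

Note that tail percentiles (P95/P99) are dominated by the slowest requests, which is why they are tracked separately from the median.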

Training Metrics

  • Training jobs – Number of jobs launched
  • GPU hours – Compute time used for training
  • Training spend – Cost per training job
  • Queue time – Time spent waiting in queue

Deployment Metrics

  • Deployments – Number of active deployments
  • Version changes – Deployment version updates
  • Traffic distribution – Traffic split across versions

Overall Metrics

  • Spend – Aggregated cost for training runs and inference in your billing currency
  • Usage trends – Daily, weekly, monthly usage patterns
  • Cost breakdown – Spending by product, model, or endpoint
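The per-model cost breakdown can be derived from raw usage records. A sketch of that aggregation, assuming a hypothetical record shape and token rates (neither is LLMTune's actual schema or pricing):

```python
# Aggregate spend per model from usage records.
# Record fields and per-token rates are illustrative assumptions.
from collections import defaultdict

records = [
    {"model": "ft-small", "input_tokens": 1000, "output_tokens": 500},
    {"model": "ft-small", "input_tokens": 2000, "output_tokens": 800},
    {"model": "ft-large", "input_tokens": 500, "output_tokens": 400},
]
rates = {  # USD per 1M tokens (hypothetical)
    "ft-small": {"input": 0.50, "output": 1.50},
    "ft-large": {"input": 3.00, "output": 9.00},
}

breakdown = defaultdict(float)
for r in records:
    rate = rates[r["model"]]
    breakdown[r["model"]] += (
        r["input_tokens"] / 1e6 * rate["input"]
        + r["output_tokens"] / 1e6 * rate["output"]
    )
```

The same grouping works per endpoint or per product by changing the aggregation key.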

Usage Dashboards

Navigate to Usage in the dashboard for comprehensive usage analytics.

Overview Dashboard

The overview provides a consolidated view with:
  • Summary cards – Total requests, tokens, spend, and active deployments
  • Time series charts – Usage trends over time
  • Top models – Most used models and endpoints
  • Cost breakdown – Spending by category

Filtering Options

Filter usage data by:
  • Model or endpoint – View usage for specific models
  • Environment – Staging vs production
  • Time ranges – Last hour, day, week, month, or custom range
  • Team member – Usage by user (if team features are enabled)
  • API key – Usage per API key for debugging

Alerts and Webhooks

Configure usage thresholds to trigger alerts:

Email Notifications

  • Spend alerts – Notify when spending exceeds thresholds
  • Usage spikes – Alert on unusual usage patterns
  • Error rate alerts – Notify when error rates exceed limits

Webhooks

Subscribe to usage events via webhooks:
  • usage.threshold_reached – Fired when usage crosses configured thresholds
  • Custom webhook URLs for integration with Slack, PagerDuty, or internal tooling
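A receiver for these events might look like the following sketch. Only the event name `usage.threshold_reached` comes from the docs above; the payload shape is an assumption, and a production receiver should also verify the webhook signature:

```python
# Minimal webhook event handler sketch.
# Payload structure is assumed, not LLMTune's documented schema.
import json

def handle_usage_event(raw_body: bytes) -> str:
    event = json.loads(raw_body)
    if event.get("type") == "usage.threshold_reached":
        # Forward to Slack, PagerDuty, or internal tooling here.
        return f"threshold reached: {event['data']['metric']}"
    return "ignored"

result = handle_usage_event(
    json.dumps({"type": "usage.threshold_reached",
                "data": {"metric": "monthly_spend"}}).encode()
)
```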

Billing Integration

LLMTune integrates with Stripe for billing:
  • Automatic top-ups – Configure automatic balance top-ups
  • Usage-based billing – Pay per token or request
  • Training billing – Pay per GPU hour for training jobs
  • Billing exports – Download usage and billing reports
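Combining usage-based inference billing with per-GPU-hour training billing is straightforward arithmetic. A back-of-the-envelope estimate, with all rates hypothetical (check your plan's pricing page for real numbers):

```python
# Rough invoice estimate: token-based inference + GPU-hour training.
# All rates below are assumptions for illustration only.
INPUT_RATE = 0.50      # USD per 1M input tokens (assumed)
OUTPUT_RATE = 1.50     # USD per 1M output tokens (assumed)
GPU_HOUR_RATE = 2.80   # USD per GPU hour (assumed)

def estimate_invoice(input_tokens, output_tokens, gpu_hours):
    inference = (input_tokens / 1e6 * INPUT_RATE
                 + output_tokens / 1e6 * OUTPUT_RATE)
    training = gpu_hours * GPU_HOUR_RATE
    return round(inference + training, 2)

total = estimate_invoice(input_tokens=4_000_000,
                         output_tokens=1_000_000,
                         gpu_hours=10)
```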

Rate Limits

Rate limits are applied per workspace and plan:
  • Sandbox – Lower limits for experimentation
  • Growth / Production – Higher limits for production traffic
  • Enterprise – Custom limits and SLAs
Monitor rate limit consumption in the Usage dashboard so you can upgrade before traffic is throttled.
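When a request does hit a rate limit, clients should retry with exponential backoff rather than fail outright. A sketch, where `send` stands in for a real HTTP call to an LLMTune endpoint and 429 is the conventional rate-limit status:

```python
# Client-side exponential backoff on HTTP 429 (rate limited).
# `send` is a placeholder for an actual HTTP request function.
import time

def with_backoff(send, max_retries=5, base_delay=0.5):
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return body
        time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("rate limit: retries exhausted")

# Simulated transport: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
result = with_backoff(lambda: next(responses), base_delay=0.0)
```

If the API returns a `Retry-After` header, honoring it is preferable to a fixed backoff schedule.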

Best Practices

  1. Monitor regularly – Check usage dashboards weekly
  2. Set up alerts – Configure thresholds to avoid surprises
  3. Review costs – Understand what drives spending
  4. Optimize usage – Use caching, batching, and efficient models
  5. Track trends – Watch for usage spikes or anomalies
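The "track trends" advice can be automated with a simple spike check: flag any day whose usage exceeds a multiple of its trailing average. Window size and factor below are illustrative defaults, not recommended values:

```python
# Flag usage spikes: a day counts as a spike when its request count
# exceeds `factor` times the average of the previous `window` days.
def find_spikes(daily_requests, window=7, factor=3.0):
    spikes = []
    for i in range(window, len(daily_requests)):
        trailing_avg = sum(daily_requests[i - window:i]) / window
        if daily_requests[i] > factor * trailing_avg:
            spikes.append(i)
    return spikes

usage = [100, 110, 95, 105, 98, 102, 100, 420, 101]
spike_days = find_spikes(usage)  # index 7 (the 420-request day)
```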

Troubleshooting

  • Unexpected charges – Review usage breakdown by model and endpoint
  • High latency – Check latency metrics to identify slow endpoints
  • Rate limit issues – Monitor rate limit usage and upgrade if needed
  • Missing data – Ensure API keys are properly configured and requests are authenticated

Next Steps