Usage Metering

Usage metering shows how your models are performing and what they cost. LLMTune provides comprehensive usage tracking across all products.

Metrics Captured

Inference Metrics

  • Requests – Count of inference requests, grouped by endpoint and time range
  • Tokens – Input and output tokens per request
  • Latency – P50, P95, P99 response times for each endpoint
  • Errors – Rate of retryable or fatal errors
  • Success rate – Percentage of successful requests
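The latency percentiles above can be reproduced from raw per-request latencies. A minimal sketch using the nearest-rank method (function and variable names are illustrative, not part of any LLMTune SDK):

```python
# Compute the P50/P95/P99 latencies the dashboard reports,
# using the nearest-rank percentile definition.
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile of a list of latencies (milliseconds)."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies = [120, 95, 110, 480, 105, 130, 98, 102, 115, 900]
p50 = percentile(latencies, 50)  # median request
p95 = percentile(latencies, 95)  # tail latency
p99 = percentile(latencies, 99)
```

Note that tail percentiles (P95/P99) are dominated by the slowest requests, which is why they are tracked separately from the median.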

Training Metrics

  • Training jobs – Number of jobs launched
  • GPU hours – Compute time used for training
  • Training spend – Cost per training job
  • Queue time – Time spent waiting in queue

Deployment Metrics

  • Deployments – Number of active deployments
  • Version changes – Deployment version updates
  • Traffic distribution – Traffic split across versions

Overall Metrics

  • Spend – Aggregated cost for training runs and inference in your billing currency
  • Usage trends – Daily, weekly, monthly usage patterns
  • Cost breakdown – Spending by product, model, or endpoint
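The per-model cost breakdown can be derived from raw usage records. A sketch of that aggregation, assuming a hypothetical record shape and token rates (neither is LLMTune's actual schema or pricing):

```python
# Aggregate spend per model from usage records.
# Record fields and per-token rates are illustrative assumptions.
from collections import defaultdict

records = [
    {"model": "ft-small", "input_tokens": 1000, "output_tokens": 500},
    {"model": "ft-small", "input_tokens": 2000, "output_tokens": 800},
    {"model": "ft-large", "input_tokens": 500, "output_tokens": 400},
]
rates = {  # USD per 1M tokens (hypothetical)
    "ft-small": {"input": 0.50, "output": 1.50},
    "ft-large": {"input": 3.00, "output": 9.00},
}

breakdown = defaultdict(float)
for r in records:
    rate = rates[r["model"]]
    breakdown[r["model"]] += (
        r["input_tokens"] / 1e6 * rate["input"]
        + r["output_tokens"] / 1e6 * rate["output"]
    )
```

The same grouping works per endpoint or per product by changing the aggregation key.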

Usage Dashboards

Navigate to Usage in the dashboard for comprehensive usage analytics.

Overview Dashboard

The overview provides a consolidated view with:
  • Summary cards – Total requests, tokens, spend, and active deployments
  • Time series charts – Usage trends over time
  • Top models – Most used models and endpoints
  • Cost breakdown – Spending by category

Filtering Options

Filter usage data by:
  • Model or endpoint – View usage for specific models
  • Environment – Staging vs production
  • Time ranges – Last hour, day, week, month, or custom range
  • Team member – Usage by user (if team features are enabled)
  • API key – Usage per API key for debugging

Alerts and Webhooks

Configure usage thresholds to trigger alerts:

Email Notifications

  • Spend alerts – Notify when spending exceeds thresholds
  • Usage spikes – Alert on unusual usage patterns
  • Error rate alerts – Notify when error rates exceed limits

Webhooks

Subscribe to usage events via webhooks:
  • usage.threshold_reached – Fired when usage crosses configured thresholds
  • Custom webhook URLs for integration with Slack, PagerDuty, or internal tooling
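A receiver for these events might look like the following sketch. Only the event name `usage.threshold_reached` comes from the docs above; the payload shape is an assumption, and a production receiver should also verify the webhook signature:

```python
# Minimal webhook event handler sketch.
# Payload structure is assumed, not LLMTune's documented schema.
import json

def handle_usage_event(raw_body: bytes) -> str:
    event = json.loads(raw_body)
    if event.get("type") == "usage.threshold_reached":
        # Forward to Slack, PagerDuty, or internal tooling here.
        return f"threshold reached: {event['data']['metric']}"
    return "ignored"

result = handle_usage_event(
    json.dumps({"type": "usage.threshold_reached",
                "data": {"metric": "monthly_spend"}}).encode()
)
```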

Billing Integration

LLMTune integrates with Stripe for billing:
  • Automatic top-ups – Configure automatic balance top-ups
  • Usage-based billing – Pay per token or request
  • Training billing – Pay per GPU hour for training jobs
  • Billing exports – Download usage and billing reports
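Combining usage-based inference billing with per-GPU-hour training billing is straightforward arithmetic. A back-of-the-envelope estimate, with all rates hypothetical (check your plan's pricing page for real numbers):

```python
# Rough invoice estimate: token-based inference + GPU-hour training.
# All rates below are assumptions for illustration only.
INPUT_RATE = 0.50      # USD per 1M input tokens (assumed)
OUTPUT_RATE = 1.50     # USD per 1M output tokens (assumed)
GPU_HOUR_RATE = 2.80   # USD per GPU hour (assumed)

def estimate_invoice(input_tokens, output_tokens, gpu_hours):
    inference = (input_tokens / 1e6 * INPUT_RATE
                 + output_tokens / 1e6 * OUTPUT_RATE)
    training = gpu_hours * GPU_HOUR_RATE
    return round(inference + training, 2)

total = estimate_invoice(input_tokens=4_000_000,
                         output_tokens=1_000_000,
                         gpu_hours=10)
```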

Rate Limits

Rate limits are applied per workspace and plan:
  • Sandbox – Lower limits for experimentation
  • Growth / Production – Higher limits for production traffic
  • Enterprise – Custom limits and SLAs
Monitor rate limit consumption in the Usage dashboard so you can upgrade before traffic is throttled.
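When a request does hit a rate limit, clients should retry with exponential backoff rather than fail outright. A sketch, where `send` stands in for a real HTTP call to an LLMTune endpoint and 429 is the conventional rate-limit status:

```python
# Client-side exponential backoff on HTTP 429 (rate limited).
# `send` is a placeholder for an actual HTTP request function.
import time

def with_backoff(send, max_retries=5, base_delay=0.5):
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return body
        time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("rate limit: retries exhausted")

# Simulated transport: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
result = with_backoff(lambda: next(responses), base_delay=0.0)
```

If the API returns a `Retry-After` header, honoring it is preferable to a fixed backoff schedule.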

Best Practices

  1. Monitor regularly – Check usage dashboards weekly
  2. Set up alerts – Configure thresholds to avoid surprises
  3. Review costs – Understand what drives spending
  4. Optimize usage – Use caching, batching, and efficient models
  5. Track trends – Watch for usage spikes or anomalies
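The "track trends" advice can be automated with a simple spike check: flag any day whose usage exceeds a multiple of its trailing average. Window size and factor below are illustrative defaults, not recommended values:

```python
# Flag usage spikes: a day counts as a spike when its request count
# exceeds `factor` times the average of the previous `window` days.
def find_spikes(daily_requests, window=7, factor=3.0):
    spikes = []
    for i in range(window, len(daily_requests)):
        trailing_avg = sum(daily_requests[i - window:i]) / window
        if daily_requests[i] > factor * trailing_avg:
            spikes.append(i)
    return spikes

usage = [100, 110, 95, 105, 98, 102, 100, 420, 101]
spike_days = find_spikes(usage)  # index 7 (the 420-request day)
```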

Troubleshooting

  • Unexpected charges – Review usage breakdown by model and endpoint
  • High latency – Check latency metrics to identify slow endpoints
  • Rate limit issues – Monitor rate limit usage and upgrade if needed
  • Missing data – Ensure API keys are properly configured and requests are authenticated

Next Steps