Usage Metering
Usage metering shows how your models are performing and what they cost.
Metrics Captured
- Requests: Count of inference requests, grouped by endpoint and time range.
- Tokens: Input and output tokens per request.
- Spend: Aggregated cost for training runs and inference in your billing currency.
- Latency: p50 and p95 response times for each endpoint.
- Errors: Rate of retryable or fatal errors.
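These per-request metrics roll up into the aggregates shown in the dashboard. As a minimal sketch, assuming a record shape of your own choosing (the `RequestRecord` fields and `summarize` function below are illustrative, not part of the product):

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class RequestRecord:
    # Hypothetical per-request record; field names are assumptions.
    endpoint: str
    input_tokens: int
    output_tokens: int
    cost: float          # in billing currency
    latency_ms: float
    error: bool

def summarize(records):
    """Aggregate raw request records into the metric categories above."""
    latencies = sorted(r.latency_ms for r in records)
    # quantiles(n=100) returns 99 cut points; index 49 ~ p50, index 94 ~ p95
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {
        "requests": len(records),
        "tokens_in": sum(r.input_tokens for r in records),
        "tokens_out": sum(r.output_tokens for r in records),
        "spend": round(sum(r.cost for r in records), 6),
        "latency_p50_ms": cuts[49],
        "latency_p95_ms": cuts[94],
        "error_rate": sum(r.error for r in records) / len(records),
    }
```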
Dashboards
Navigate to Usage → Overview for a consolidated view. Filter by:
- Model or endpoint
- Environment (staging, production)
- Time ranges (last hour, day, week, custom)
- Team member or API key
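If you export filtered usage data programmatically, the dashboard filters typically map to query parameters. The endpoint path and parameter names in this sketch are assumptions for illustration, not a documented API:

```python
from urllib.parse import urlencode

def usage_query_url(base, *, model=None, environment=None,
                    start=None, end=None, api_key_id=None):
    """Build a hypothetical usage-export URL from the dashboard filters.
    Only filters that are set become query parameters."""
    params = {k: v for k, v in {
        "model": model,
        "environment": environment,
        "start": start,
        "end": end,
        "api_key_id": api_key_id,
    }.items() if v is not None}
    return f"{base}/usage/overview?{urlencode(params)}"
```

Omitted filters are simply left out of the query string, matching the "show everything" default of the Overview page.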
Alerts and Webhooks
Configure usage thresholds to trigger alerts:
- Email notifications to workspace admins.
- Webhooks (e.g., send to Slack or internal tooling) for automated responses.
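A webhook receiver should verify that an alert really came from your workspace before acting on it. This sketch assumes an HMAC-SHA256 signature sent alongside the request body; the signing scheme and payload fields are illustrative assumptions, not the product's documented format:

```python
import hmac
import hashlib
import json

def verify_and_parse(secret: bytes, body: bytes, signature_hex: str):
    """Verify a hex-encoded HMAC-SHA256 signature over the raw body,
    then parse the alert payload. Raises ValueError on mismatch."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("invalid webhook signature")
    return json.loads(body)
```

A verified event can then be forwarded to Slack or internal tooling; rejecting unsigned or tampered payloads keeps automated responses from being triggered by spoofed requests.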