Usage Metering

Usage metering tracks how much your models are used, how they perform, and what they cost.

Metrics Captured

  • Requests: Count of inference requests, grouped by endpoint and time range.
  • Tokens: Input and output tokens per request.
  • Spend: Aggregated cost for training runs and inference in your billing currency.
  • Latency: P50, P95 response times for each endpoint.
  • Errors: Rate of retryable or fatal errors.
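
These metrics can also be pulled programmatically for your own reporting. The sketch below is a minimal, hypothetical example of fetching per-endpoint usage over HTTP with the requests library; the base URL, the /usage/metrics path, the query parameters, and the response fields are all assumptions, so substitute the actual usage API documented for your workspace.

```python
# Minimal sketch of pulling per-endpoint usage metrics over HTTP.
# The URL, parameters, and response shape are hypothetical placeholders.
import os
import requests

API_BASE = "https://api.example.com/v1"  # hypothetical base URL
headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

resp = requests.get(
    f"{API_BASE}/usage/metrics",          # hypothetical endpoint
    headers=headers,
    params={
        "endpoint": "chat-completions",   # group by endpoint
        "start": "2024-06-01T00:00:00Z",  # time range start
        "end": "2024-06-02T00:00:00Z",    # time range end
    },
    timeout=30,
)
resp.raise_for_status()

for row in resp.json().get("data", []):
    # Each row is assumed to carry request counts, token counts,
    # spend, and latency percentiles for one endpoint.
    print(row["endpoint"], row["requests"], row["input_tokens"],
          row["output_tokens"], row["spend"], row["latency_p95"])
```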

Dashboards

Navigate to Usage → Overview for a consolidated view. Filter by:
  • Model or endpoint
  • Environment (staging, production)
  • Time range (last hour, day, week, custom)
  • Team member or API key
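
If you export usage records for offline analysis, you can slice them along the same dimensions the dashboard filters expose. The sketch below is illustrative only; it assumes records shaped like the hypothetical response in the previous example.

```python
# Minimal sketch of aggregating exported usage records client-side
# by one of the dashboard's filter dimensions (here: environment).
from collections import defaultdict

def summarize(rows, by="environment"):
    """Aggregate request and token counts by a filter dimension."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0})
    for row in rows:
        key = row.get(by, "unknown")
        totals[key]["requests"] += row["requests"]
        totals[key]["tokens"] += row["input_tokens"] + row["output_tokens"]
    return dict(totals)

# Example records; field names mirror the hypothetical API response above.
rows = [
    {"environment": "production", "requests": 120, "input_tokens": 9000, "output_tokens": 4200},
    {"environment": "staging", "requests": 15, "input_tokens": 1100, "output_tokens": 480},
]
print(summarize(rows, by="environment"))
```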

Alerts and Webhooks

Configure usage thresholds to trigger alerts:
  • Email notifications to workspace admins.
  • Webhooks (e.g., send to Slack or internal tooling) for automated responses.
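
A webhook receiver can turn these alerts into automated responses. The sketch below uses Flask and assumes a hypothetical JSON payload with metric, threshold, value, and endpoint fields; adapt the route and field names to the payload your alerts actually deliver.

```python
# Minimal sketch of a webhook receiver for usage alerts.
# The /usage-alerts route and payload fields are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/usage-alerts", methods=["POST"])
def usage_alert():
    event = request.get_json(force=True)
    # Example automated response: flag spend alerts that crossed their threshold.
    if event.get("metric") == "spend" and event.get("value", 0) >= event.get("threshold", 0):
        print(f"Spend threshold crossed on {event.get('endpoint')}: {event.get('value')}")
        # e.g., forward to Slack or internal tooling here
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=8080)
```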