Billing & usage

Usage is metered in tokens, converted to a cost, and deducted from your account balance. This section describes how that works.

How usage is calculated

  • Inference — Each request consumes input tokens (your prompt or messages) and output tokens (the model’s reply). Some models or endpoints may also report cache read/write or reasoning tokens; all are accounted for in the usage and cost.
  • Fine-tuning — Training jobs are charged based on platform pricing (e.g. per job or per token). The dashboard and the training start/status APIs show cost estimates and actual cost when available.
  • Agent — Agent chat requests are metered like inference (input + output tokens).
Rates are per model or per category; the platform applies its pricing and optional markup. The exact rate per token is not always exposed in the API; the dashboard and usage endpoints show aggregate cost and token counts.
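
Because exact rates are not always exposed, it can help to see how a per-request cost would follow from token counts. The sketch below is illustrative only: the rates in `ASSUMED_RATES`, the per-million-token unit, and the `markup` parameter are all assumptions, not published pricing.

```python
from decimal import Decimal

# Hypothetical per-million-token rates in USD. Real rates come from the
# platform's pricing and may not be exposed by the API.
ASSUMED_RATES = {
    "input": Decimal("0.50"),
    "output": Decimal("1.50"),
}


def estimate_cost(input_tokens: int, output_tokens: int,
                  markup: Decimal = Decimal("0")) -> Decimal:
    """Estimate the cost of one request from its token counts.

    `markup` is a fraction (Decimal("0.1") means 10%) and is an assumed
    way of modelling the platform's optional markup.
    """
    base = (
        Decimal(input_tokens) * ASSUMED_RATES["input"]
        + Decimal(output_tokens) * ASSUMED_RATES["output"]
    ) / Decimal(1_000_000)
    return base * (Decimal(1) + markup)
```

Treat the result as an estimate; the cost reported by the dashboard or usage endpoints is the authoritative figure.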

Token accounting

| Type | Description |
| --- | --- |
| Input | Tokens in the prompt or message list. |
| Output | Tokens in the generated completion. |
| Cache read/write | When supported and reported (may have separate pricing). |
| Reasoning | When supported (e.g. chain-of-thought); may be reported separately. |

Usage records typically include:
  • tokens or total_tokens
  • input_tokens / output_tokens (when broken out)
  • cost — Amount deducted in your account currency (e.g. USD)
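
Since a record may carry either `tokens` or `total_tokens`, and may or may not break out input and output, a client can normalize records before using them. The following is a minimal sketch that assumes records arrive as dictionaries with the field names listed above; the `Usage` class and `parse_usage` function are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Mapping, Optional


@dataclass
class Usage:
    total_tokens: int
    input_tokens: Optional[int]   # None when the record has no breakdown
    output_tokens: Optional[int]
    cost: Decimal


def parse_usage(record: Mapping[str, Any]) -> Usage:
    """Normalize one usage record (assumed to be a dict-like object)."""
    input_tokens = record.get("input_tokens")
    output_tokens = record.get("output_tokens")
    # Accept either spelling of the total.
    total = record.get("total_tokens", record.get("tokens"))
    if total is None:
        if input_tokens is None or output_tokens is None:
            raise ValueError("usage record has no token counts")
        # Assumption: without a reported total, input + output is used.
        total = input_tokens + output_tokens
    # Go through str() so float costs do not pick up binary rounding noise.
    cost = Decimal(str(record.get("cost", "0")))
    return Usage(int(total), input_tokens, output_tokens, cost)
```

For example, `parse_usage({"tokens": 30, "cost": 0.0012})` yields a `Usage` with `total_tokens` of 30 and no input/output breakdown.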

Where you see usage

  • Dashboard — Usage or Billing section: token counts, cost over time, breakdown by model or endpoint.
  • API — Usage summary or history endpoints (if available) return the same data for integration or reporting.
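
For reporting, the per-model breakdown shown in the dashboard could be reproduced from usage history records along these lines. This is a sketch under assumptions: each record is taken to be a dictionary with `model`, `total_tokens`, and `cost` keys, and the `model` key in particular is hypothetical, so adjust the names to whatever your usage endpoint actually returns.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict, Iterable, Mapping


def cost_by_model(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Sum token counts and cost per model across usage records."""
    totals: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"tokens": 0, "cost": Decimal("0")}
    )
    for record in records:
        bucket = totals[record["model"]]
        bucket["tokens"] += record["total_tokens"]
        bucket["cost"] += Decimal(str(record["cost"]))
    return dict(totals)
```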

Balance

  • Top up your balance in the dashboard (e.g. add funds or attach a payment method).
  • Each inference request and training job deducts from the balance according to the usage described above.
  • If the balance is insufficient, the API returns 402 Payment Required. Add funds and retry.
No usage is deducted for failed requests (e.g. 4xx or 5xx); only successful calls are charged.
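
A client can treat 402 as a distinct, recoverable condition rather than a generic failure. The sketch below assumes you already have the HTTP status code and response body from whatever client you use; `InsufficientBalanceError`, `is_charged`, and `check_response` are hypothetical names, and treating 2xx as "successful" is an assumption based on the rule above.

```python
class InsufficientBalanceError(Exception):
    """Raised when the API reports 402 Payment Required."""


def is_charged(status: int) -> bool:
    """Assumption: only 2xx responses count as successful, charged calls."""
    return 200 <= status < 300


def check_response(status: int, body: str) -> str:
    """Return the body of a successful call, or raise a descriptive error."""
    if status == 402:
        # Retrying immediately will not help; funds must be added first.
        raise InsufficientBalanceError(
            "balance is insufficient; add funds in the dashboard, then retry"
        )
    if not is_charged(status):
        raise RuntimeError(f"request failed with HTTP {status}; no usage deducted")
    return body
```

Separating the 402 case lets calling code pause and alert someone to add funds, instead of retrying in a loop.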
