Billing & usage
Usage is metered in tokens, and the resulting cost is deducted from your account balance. This section describes how that works.
How usage is calculated
- Inference — Each request consumes input tokens (your prompt or messages) and output tokens (the model’s reply). Some models or endpoints may also report cache read/write or reasoning tokens; all are accounted for in the usage and cost.
- Fine-tuning — Training jobs are charged based on platform pricing (e.g. per job or per token). The dashboard and the training start/status APIs show cost estimates and actual cost when available.
- Agent — Agent chat requests are metered like inference (input + output tokens).
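The token-based metering above can be illustrated with a small cost estimate. This is a minimal sketch, not the platform's billing logic: the `Pricing` class, the `estimate_cost` function, and the per-million-token rates are hypothetical placeholders, and real prices, cache tokens, and reasoning tokens may be charged differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real platform rates)."""

    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Assumed example rates: 1.00 USD per million input tokens, 2.00 USD per million output tokens.
example_pricing = Pricing(input_per_million=1.0, output_per_million=2.0)
print(estimate_cost(1_000, 500, example_pricing))
```

Treat the result as an estimate only; the cost reported by the dashboard or API is the amount actually deducted.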
Token accounting
| Type | Description |
|---|---|
| Input | Tokens in the prompt or message list. |
| Output | Tokens in the generated completion. |
| Cache read/write | Tokens read from or written to a cache, when the model or endpoint supports and reports them; may have separate pricing. |
| Reasoning | Tokens used for model reasoning (e.g. chain-of-thought), when supported; may be reported separately. |
Usage data may report the following fields:
- tokens or total_tokens
- input_tokens / output_tokens (when broken out)
- cost — Amount deducted in your account currency (e.g. USD)
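Because the total may arrive as either tokens or total_tokens, and the input/output breakdown is optional, client code benefits from normalizing these fields before using them. The sketch below assumes the usage data is available as a plain dictionary keyed by the field names listed above; the `summarize_usage` helper and the output shape are hypothetical.

```python
from typing import Any, Dict, Optional


def summarize_usage(usage: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Normalize a usage record into a fixed set of keys.

    Missing fields become None, so callers can tell "not reported" from zero.
    """
    input_tokens = usage.get("input_tokens")
    output_tokens = usage.get("output_tokens")

    # The total may be reported under either name; prefer total_tokens if present.
    total_tokens = usage.get("total_tokens", usage.get("tokens"))

    # Fall back to the sum only when both parts are broken out.
    if total_tokens is None and input_tokens is not None and output_tokens is not None:
        total_tokens = input_tokens + output_tokens

    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "cost": usage.get("cost"),
    }


print(summarize_usage({"input_tokens": 20, "output_tokens": 10, "cost": 0.01}))
```

Keeping unreported values as None avoids silently treating a missing breakdown as zero tokens.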
Where you see usage
- Dashboard — Usage or Billing section: token counts, cost over time, breakdown by model or endpoint.
- API — Usage summary or history endpoints (if available) return the same data for integration or reporting.
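If you export usage history for your own reporting, a per-model breakdown similar to the dashboard's can be reproduced locally. This is a sketch under assumptions: the record shape (a `model` key and a `cost` key per record) and the `cost_by_model` helper are hypothetical, since the exact response format of the usage endpoints is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cost_by_model(records: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Sum the cost of usage records, grouped by model name."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        # "model" and "cost" are assumed field names for this illustration.
        totals[str(record["model"])] += float(record["cost"])
    return dict(totals)


# Made-up sample records, for illustration only.
sample_records = [
    {"model": "model-a", "cost": 0.5},
    {"model": "model-b", "cost": 0.25},
    {"model": "model-a", "cost": 0.25},
]
print(cost_by_model(sample_records))
```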
Balance
- Top up your balance in the dashboard (e.g. add funds or attach a payment method).
- Each inference request and training job deducts from the balance according to the usage described above.
- If the balance is insufficient, the API returns 402 Payment Required. Add funds and retry.
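A client can surface the 402 case explicitly instead of treating it like a transient error. The sketch below is transport-agnostic: `send` stands for any function that performs the request and returns a status code and body, and both `InsufficientBalanceError` and `send_checked` are hypothetical names, not part of any SDK.

```python
from typing import Callable, Tuple


class InsufficientBalanceError(Exception):
    """Raised when the API responds with 402 Payment Required."""


def send_checked(send: Callable[[], Tuple[int, str]]) -> str:
    """Run one request and map a 402 response to a dedicated exception.

    `send` is a caller-supplied function returning (status_code, body).
    """
    status, body = send()
    if status == 402:
        # Retrying immediately cannot succeed; funds must be added first.
        raise InsufficientBalanceError("Insufficient balance: add funds, then retry.")
    if status >= 400:
        raise RuntimeError(f"Request failed with HTTP {status}")
    return body


# Illustration with a stand-in for a real HTTP call.
try:
    send_checked(lambda: (402, "Payment Required"))
except InsufficientBalanceError as err:
    print(f"Top up needed: {err}")
```

Raising a distinct exception keeps 402 out of automatic retry loops, since the request only succeeds after the balance has been topped up.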
Next
- Balance and deductions — How deductions work and how to handle 402.