Free Tool

LLM Token Cost Calculator

Estimate API model cost, RAG overhead, embedding cost, reranking cost, agent call multiplication, prompt caching, batch discounts, and self-hosted GPU economics.

Runs in your browser

No prompts, documents, or usage inputs are sent anywhere by this calculator.

Cost mode

Model pricing

Custom pricing

Input $1/1M · Cached $0.25/1M · Output $3/1M

Enter your own vendor, region, reserved-capacity, or internal chargeback pricing.

Llama/Qwen/Mistral 7B–8B self-hosted estimate

Input $0.05/1M · Cached $0.05/1M · Output $0.05/1M

Illustrative compute estimate only. Use self-hosted mode for infrastructure-based costing.

12B self-hosted estimate

Input $0.1/1M · Cached $0.1/1M · Output $0.1/1M

Good starting point for local RAG sizing. Validate with your own hardware utilization.

32B–70B self-hosted estimate

Input $0.4/1M · Cached $0.4/1M · Output $0.4/1M

Large models need multi-GPU or high-memory GPU servers.

Frontier managed model estimate

Input $2.5/1M · Cached $1.25/1M · Output $10/1M

Use for estimates on Azure OpenAI, Amazon Bedrock, Google Vertex AI, or similar managed enterprise AI services.

Small managed model estimate

Input $0.3/1M · Cached $0.1/1M · Output $1.2/1M

Useful for high-volume summarization, classification, and lightweight RAG.

Public API comparison

Input $2.5/1M · Cached $1.25/1M · Output $10/1M

For comparison only. Avoid sending confidential or regulated data to a public API unless your organization has approved it.
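The self-hosted presets above quote flat per-token rates. A rough way to derive such a rate from your own infrastructure is sketched here. The function name, the hourly GPU cost, the throughput, and the utilization figure are all hypothetical placeholders, not values used by this calculator.

```python
def self_hosted_price_per_million(gpu_cost_per_hour: float,
                                  tokens_per_second: float,
                                  utilization: float) -> float:
    """Approximate $ per 1M tokens for a self-hosted server.

    Assumption: cost is spread evenly over the tokens actually served,
    i.e. peak throughput scaled by average utilization (0 to 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical server: $2.00/hour, 1,000 tokens/s peak, 50% utilized.
price = self_hosted_price_per_million(2.00, 1_000, 0.5)
print(f"${price:.2f} per 1M tokens")
```

Low utilization dominates this estimate, which is why the presets should be validated against measured hardware usage.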

Workload volume

Requests / day

Input tokens / call

Output tokens / call

Model calls / request

Use 3–8 for agent workflows.
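As a minimal sketch of how the workload fields combine, assuming every model call in a request uses the same input and output token counts: the function names and the example volumes below are illustrative, and the prices are the Custom pricing defaults ($1 input, $3 output per 1M tokens).

```python
def daily_tokens(requests_per_day: int,
                 input_tokens_per_call: int,
                 output_tokens_per_call: int,
                 calls_per_request: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) consumed per day."""
    calls = requests_per_day * calls_per_request
    return calls * input_tokens_per_call, calls * output_tokens_per_call


def daily_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Price the daily volume at per-1M-token rates (no caching or discounts)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical agent workload: 1,000 requests/day, 5 model calls each.
tokens_in, tokens_out = daily_tokens(1_000, 1_500, 500, 5)
print(tokens_in, tokens_out)                       # 7500000 2500000
print(daily_cost(tokens_in, tokens_out, 1.0, 3.0))  # 15.0
```

Raising model calls per request from 1 to 5 multiplies both token totals, and therefore the cost, by 5.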

RAG overhead

Tokens / retrieved chunk

Chunks / query

RAG adds 4,000 input tokens per model call (tokens per retrieved chunk × chunks per query). Effective input per call: 5,500 tokens.
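The RAG figures above follow from simple arithmetic, sketched here. The split into 800 tokens per chunk and 5 chunks per query is an assumed example that happens to produce 4,000. The 1,500 base input tokens is inferred from 5,500 minus 4,000.

```python
def rag_effective_input(base_input_tokens: int,
                        tokens_per_chunk: int,
                        chunks_per_query: int) -> tuple[int, int]:
    """Return (rag_overhead_tokens, effective_input_tokens) per model call."""
    overhead = tokens_per_chunk * chunks_per_query
    return overhead, base_input_tokens + overhead


# Assumed inputs: 800 tokens/chunk, 5 chunks/query, 1,500 base input tokens.
overhead, effective = rag_effective_input(1_500, 800, 5)
print(overhead, effective)  # 4000 5500
```

Because the overhead is added to every model call, it is also multiplied by the number of model calls per request.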

Caching and batch discount

Prompt cache hit rate (%)

Batch discount (%)
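One plausible way to apply these two settings is sketched below. It assumes the cache hit rate is the share of input tokens billed at the cached rate and that the batch discount applies to the whole call cost. Real providers differ on both points, so treat the function as an illustration and not as this calculator's exact formula. The prices are the Frontier managed model preset; the 40% hit rate and 50% discount are example inputs.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price: float, cached_price: float, output_price: float,
                  cache_hit_pct: float = 0.0,
                  batch_discount_pct: float = 0.0) -> float:
    """Cost in dollars for one model call; prices are per 1M tokens."""
    cached_tokens = input_tokens * cache_hit_pct / 100.0
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000
    # Assumption: the batch discount reduces the total, input and output alike.
    return cost * (1.0 - batch_discount_pct / 100.0)


base = cost_per_call(5_500, 500, 2.5, 1.25, 10.0)
tuned = cost_per_call(5_500, 500, 2.5, 1.25, 10.0,
                      cache_hit_pct=40, batch_discount_pct=50)
print(f"{base:.5f} {tuned:.5f}")
```

In this example caching only lowers the input side, so output-heavy workloads benefit less from a high hit rate than from a batch discount.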

Embedding cost

New docs / month

Avg doc tokens

Query embedding tokens

Embedding $ / 1M
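The embedding fields can be combined as in the sketch below. It assumes each new document is embedded once, each request embeds one query, and a month has 30 days. The function name and all example values, including the $0.02 per 1M rate, are hypothetical.

```python
def monthly_embedding_cost(new_docs_per_month: int,
                           avg_doc_tokens: int,
                           requests_per_day: int,
                           query_embedding_tokens: int,
                           price_per_million: float,
                           days_per_month: int = 30) -> float:
    """Monthly embedding spend: document indexing plus query embeddings."""
    doc_tokens = new_docs_per_month * avg_doc_tokens
    query_tokens = requests_per_day * days_per_month * query_embedding_tokens
    return (doc_tokens + query_tokens) * price_per_million / 1_000_000


# Hypothetical month: 10,000 new docs of 2,000 tokens, 1,000 requests/day,
# 50 tokens per query embedding, $0.02 per 1M embedding tokens.
cost = monthly_embedding_cost(10_000, 2_000, 1_000, 50, 0.02)
print(f"${cost:.2f} per month")
```

Re-embedding an existing corpus after a model or chunking change is a one-time cost that this monthly figure does not include.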

Reranker cost

Pricing numbers here are illustrative estimates. Vendor prices, regions, reserved capacity, batch rates, cached token rates, and self-hosted infrastructure costs change frequently. Always verify current provider pricing before budgeting.