What Are Tokens?
When you send text to an AI model like ChatGPT, Claude, or Gemini, the model does not read words the way you do. It breaks your text into smaller pieces called tokens - subword units that sit somewhere between individual characters and full words.
For example, the word "tokenization" is often split into two tokens: "token" and "ization". Common words like "the" or "hello" are usually one token. Rare or long words get split into multiple pieces. Punctuation marks usually get their own tokens, long numbers are split into chunks of digits, and a leading space is typically folded into the token of the word that follows it.
As a rough guide: 1 token is about 4 characters or 0.75 words in English. A 1,000-word essay is roughly 1,300-1,500 tokens depending on vocabulary.
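That rough guide is easy to turn into a quick estimate. The sketch below uses the ~4-characters-per-token heuristic; `estimate_tokens` is a hypothetical helper, not a real tokenizer, so treat its numbers as ballpark figures only.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text.

    This is a heuristic, not a real tokenizer -- use your model's
    official tokenizer for exact counts.
    """
    return max(1, round(len(text) / 4))

essay = "word " * 1000            # crude stand-in for a 1,000-word essay
print(estimate_tokens(essay))     # 1250 -- in the 1,300-1,500 ballpark
```

Real counts drift from the estimate depending on vocabulary, language, and formatting, which is why exact numbers require the model's own tokenizer.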
Why tokens instead of words? Because tokens handle everything - any language, typos, code, emojis, invented words - without needing a fixed dictionary. A word-based system would fail on "asdfghjkl" or a line of Python. A token-based system just splits it into smaller pieces and moves on.
How Tokenization Works
Most modern LLMs use an algorithm called Byte Pair Encoding (BPE). The idea is simple:
- Start with a base vocabulary where every individual byte (or character) is its own token
- Scan the training data and find the most frequently occurring pair of adjacent tokens
- Merge that pair into a single new token
- Repeat thousands of times until you reach the target vocabulary size
The result is a vocabulary where common words and subwords ("the", "ing", "tion") are single tokens, while rare strings get split into smaller pieces. The merge rules are learned once, from the tokenizer's training data, and the same rules are then applied deterministically to your input at inference time.
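The merge loop above can be sketched in a few lines. This is a toy trainer on character-level tokens, not a production BPE implementation (real tokenizers work on bytes, pre-split text into words, and use far larger corpora); `bpe_train` is a hypothetical name for illustration.

```python
from collections import Counter

def bpe_train(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE: repeatedly merge the most frequent adjacent token pair."""
    tokens = list(text)          # start with one token per character
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break                # nothing left worth merging
        merges.append((a, b))
        # apply the new merge rule across the whole token sequence
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges

print(bpe_train("low lower lowest", 2))  # [('l', 'o'), ('lo', 'w')]
```

After two merges the frequent stem "low" has already condensed into a single token, which is exactly how common words end up as one token in a real vocabulary.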
Different models use different tokenizers. GPT-5.4 uses an encoding called o200k_base. Earlier GPT models used cl100k_base. Claude, Gemini, and DeepSeek use proprietary tokenizers that are not publicly available. This means the same text can produce different token counts depending on which model you are targeting - a sentence that is 25 tokens on GPT might be 27 on Claude or 23 on Gemini.
Why Token Count Matters
There are two practical reasons to care about token count: cost and limits.
Cost. Every LLM API charges per token. Pricing is split into input tokens (what you send) and output tokens (what the model generates). Output tokens are typically more expensive because generating text is more compute-intensive than reading it. If you are building an application that makes hundreds or thousands of API calls, the difference between a 500-token prompt and a 2,000-token prompt adds up fast.
Limits. Every model has a context window - the maximum number of tokens it can process in a single request. Your input and the model's output must fit inside this window together. If you exceed it, the API returns an error. In multi-turn conversations, tokens accumulate quickly: every request sends the system prompt, the full conversation history, and the current message. A 10-turn conversation can easily reach tens of thousands of tokens per request.
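The accumulation in multi-turn conversations is worth seeing concretely. The sketch below models how many tokens each request sends, assuming the full history is resent every time; `request_tokens` and its numbers are illustrative, not tied to any particular API.

```python
def request_tokens(system_tokens: int, turns: list[tuple[int, int]]) -> list[int]:
    """Tokens sent on each request of a conversation.

    Each request resends the system prompt, every prior user/assistant
    exchange, and the current user message. `turns` is a list of
    (user_tokens, assistant_tokens) pairs.
    """
    sizes = []
    history = 0
    for user, assistant in turns:
        sizes.append(system_tokens + history + user)
        history += user + assistant   # this turn joins the history next time
    return sizes

# 10 turns of ~200-token questions and ~400-token answers, 500-token system prompt
sizes = request_tokens(500, [(200, 400)] * 10)
print(sizes[0], sizes[-1])  # 700 on turn 1, 6100 on turn 10
```

Note that the per-request size grows linearly, so the total input billed over the conversation grows quadratically, which is why long chats get expensive faster than people expect.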
Context Windows Explained
The context window is the model's working memory. Everything inside it is visible to the model. Everything outside it does not exist.
Current context window sizes vary widely:
- GPT-5.4 - 1,000,000 tokens
- Gemini 3.1 Pro - 1,000,000 tokens
- Claude Opus 4.6 - 200,000 tokens
- Claude Sonnet 4.6 - 200,000 tokens
- DeepSeek V3.2 - 130,000 tokens
A larger context window does not mean you should fill it. Larger inputs cost more money and can degrade response quality as the model has more text to attend to. Most practical use cases stay well under the limit.
There is also a separate output token limit. Most models cap their response at 4,000-16,000 tokens regardless of how large the context window is. If you need a longer response, you may need to ask the model to continue in a follow-up request.
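A simple pre-flight check captures the rule that input and output share the window. The helper below is a sketch (the names and numbers are illustrative, not from any SDK):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Input and the reserved output budget must fit in the window together."""
    return input_tokens + max_output_tokens <= context_window

# A 195,000-token input with a 16,000-token output budget overflows a
# 200,000-token window, even though the input alone would fit.
print(fits_context(195_000, 16_000, 200_000))  # False
print(fits_context(100_000, 16_000, 200_000))  # True
```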
Common strategies for staying within limits:
- Summarize older conversation turns instead of sending the full history
- Keep system prompts concise - they are sent on every request
- Use a model with a larger context window if your use case demands it
- Split long documents into chunks and process them separately
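The first strategy, summarizing older turns, can be sketched as a small history-trimming step. Everything here is hypothetical scaffolding: `trim_history` is not a real library function, and `summarize` stands in for whatever you use to compress old messages (often a cheap LLM call).

```python
def trim_history(messages: list[dict], keep_recent: int, summarize) -> list[dict]:
    """Replace all but the last `keep_recent` messages with one summary
    message. `summarize` turns the older messages into a short recap."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    recap = {"role": "system",
             "content": "Summary of earlier turns: " + summarize(older)}
    return [recap] + recent

history = [{"role": "user", "content": f"message {i}"} for i in range(20)]
trimmed = trim_history(history, keep_recent=4,
                       summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(len(trimmed))  # 5: one summary message plus the 4 most recent turns
```

The trade-off is fidelity: the model loses verbatim access to old turns, so keep enough recent messages intact for the task at hand.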
Comparing API Costs Across Models
Pricing varies dramatically. Here is what the major models charge per 1,000 tokens (as of early 2026):
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| GPT-5.4 | $0.0025 | $0.015 |
| Claude Opus 4.6 | $0.015 | $0.075 |
| Claude Sonnet 4.6 | $0.003 | $0.015 |
| Gemini 3.1 Pro | $0.002 | $0.012 |
| DeepSeek V3.2 | $0.00028 | $0.00042 |
The cost difference between the most expensive (Claude Opus) and cheapest (DeepSeek) model is roughly 50x for input and 180x for output. That gap matters at scale.
For a concrete example: a 1,000-token prompt with a 500-token response costs about $0.053 on Claude Opus 4.6, but only $0.00049 on DeepSeek V3.2. One request is negligible either way. But at 10,000 requests per day, that is $530 vs $4.90 - per day.
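The arithmetic behind that example is just the per-1K rates applied to each side of the request. The sketch below uses the prices from the table above; `PRICES` and `request_cost` are illustrative names, and real pricing should always be checked against the provider's current rate card.

```python
PRICES = {  # $ per 1K tokens (input, output), from the table above
    "gpt-5.4":         (0.0025,  0.015),
    "claude-opus-4.6": (0.015,   0.075),
    "deepseek-v3.2":   (0.00028, 0.00042),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request: each side billed at its own rate."""
    inp, out = PRICES[model]
    return input_tokens / 1000 * inp + output_tokens / 1000 * out

print(request_cost("claude-opus-4.6", 1000, 500))  # ~0.0525 (about $0.053)
print(request_cost("deepseek-v3.2", 1000, 500))    # ~0.00049
```

Multiply by request volume to get a daily figure: 10,000 requests at those sizes is roughly $525 on Opus versus $4.90 on DeepSeek.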
The right model depends on what you are building:
- Prototyping and experimentation - use a cheaper model (Sonnet, Gemini, DeepSeek) to iterate fast without worrying about cost
- Production with quality requirements - use a flagship model (Opus, GPT-5.4) where accuracy matters and the volume justifies the cost
- High-volume, cost-sensitive workloads - use the cheapest model that meets your quality threshold, and invest in prompt optimization to keep token counts low
Practical Tips for Reducing Token Usage
Every token you save is money saved on every single API call. Here are the highest-leverage optimizations:
- Write concise system prompts. The system prompt is sent on every request. A 500-token system prompt costs you 500 tokens per call whether the user asks a simple question or a complex one. Cut ruthlessly.
- Trim conversation history. Instead of sending the entire chat, summarize older turns into a few sentences. The model does not need verbatim transcripts of messages from 20 turns ago.
- Request structured output. Ask for JSON, a table, or a numbered list instead of prose. Structured responses tend to be shorter and more predictable, using fewer output tokens.
- Pick the smallest model that works. If Sonnet produces acceptable results for your use case, there is no reason to pay for Opus. Test across tiers before committing.
- Write better prompts. A well-structured prompt that covers all 7 dimensions - clarity, specificity, context, role, format, examples, and tone - tends to get the right answer on the first try. Vague prompts often require follow-up corrections, which means more tokens spent reaching the same result.
Try the AI Token Calculator
Knowing your token count before hitting the API removes the guesswork from cost planning. The AI Token Calculator lets you paste any text and instantly see:
- Token count for your selected model
- Context window usage as a percentage
- Estimated input and output cost
- A side-by-side cost comparison across all supported models
All processing happens in your browser - no text is sent to any server. Paste a prompt, a system message, or an entire document to see exactly what it will cost before you send it.
Ready to try it out?
Open the Token Calculator →