Token (LLM)

A token is the unit of text a language model reads and generates, typically a word, sub-word, or character fragment produced by the model’s tokenizer.

Most language models do not operate on raw characters or whole words; they operate on tokens, the pieces a tokenizer splits text into. A common rule of thumb for English is that one token is roughly four characters, or about three-quarters of a word, though this varies by tokenizer and language.
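As a rough illustration, the rule of thumb can be turned into a quick estimator. This is only a sketch: the function names are hypothetical, the constants come straight from the rule of thumb, and an exact count requires running the model's own tokenizer.

```python
import math

# Rule-of-thumb constant for English text (an approximation, not a tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate token count as ~1 token per 4 characters, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count as ~4 tokens per 3 words (1 token = 3/4 word)."""
    return math.ceil(word_count * 4 / 3)


if __name__ == "__main__":
    sample = "Tokens are the unit of billing and the unit of context limits."
    print(estimate_tokens(sample))
    print(estimate_tokens_from_words(750))
```

Estimates like this are useful for budgeting, not for enforcing hard limits; text in other languages, code, or unusual formatting can tokenize very differently.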

Tokens are the unit of billing and the unit of context limits. API pricing is typically quoted per million tokens, often at different rates for input and output, and a model’s context window is measured in tokens. Both the input you send and the output you receive count toward the bill and the limit, which is why long contexts are expensive and slow.
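The arithmetic behind per-token billing and context limits can be sketched as follows. The prices and the context window size in the example are hypothetical placeholders, not any provider's actual figures, and the sketch assumes that input and output tokens share one context window.

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one call when prices are quoted in USD per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumes input and output tokens both count against the same window."""
    return input_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    # Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
    print(call_cost_usd(100_000, 1_000, 3.0, 15.0))
    # Hypothetical 128,000-token context window.
    print(fits_context(100_000, 1_000, 128_000))
```

In this hypothetical call the input is 100 times larger than the output, so it accounts for most of the cost even at a lower per-token rate.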

Because cost and latency scale with token count, reducing input tokens is the most direct lever for cheaper, faster LLM calls. This is exactly what context compression does: it lowers the token count of the input while preserving the answer-bearing content.

Compresr is priced and measured in tokens (for example, $0.10 per 1M tokens), and its job is to cut the input tokens your model has to read without losing the answer.
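To see when compression pays for itself, the saving on model input can be compared with the compression fee. The sketch below makes two assumptions that the text above does not confirm: that the fee is charged on the tokens sent in for compression, and that the downstream model's input price (here a hypothetical $3 per 1M tokens) is known. The function name is illustrative only.

```python
def net_saving_usd(
    original_tokens: int,
    compressed_tokens: int,
    model_input_price_per_m: float,
    compressor_price_per_m: float = 0.10,  # example price quoted above
) -> float:
    """Model input cost avoided, minus the compression fee.

    Assumption: the fee applies to the original (pre-compression) token count.
    """
    avoided = (original_tokens - compressed_tokens) * model_input_price_per_m
    fee = original_tokens * compressor_price_per_m
    return (avoided - fee) / 1_000_000


if __name__ == "__main__":
    # 1M input tokens compressed to 250k, with a hypothetical $3 per 1M model price.
    print(net_saving_usd(1_000_000, 250_000, 3.0))
```

Under these assumptions, compression is worthwhile whenever the fraction of tokens removed exceeds the ratio of the compressor's price to the model's input price.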