Introduction
Compresr shrinks the context you send to your LLM without losing the answer.
Compresr is a context-compression API for LLM developers. Send the long text you'd pass to your model with the query you want answered; Compresr keeps the spans that carry the answer, drops the rest, and returns a shorter context you forward to your LLM. The result: fewer input tokens, lower cost, a longer effective context window, and faster inference. It sits in front of whatever LLM stack you already use and replaces nothing.
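The flow above is compress first, then forward the shorter context to your model. It can be sketched as a small wrapper. This is a minimal sketch, not the SDK's API. compress_fn and call_llm are hypothetical stand-ins: swap in the real Compresr client call and your own LLM client. The prompt template is also an assumption, since Compresr does not prescribe one.

```python
from typing import Callable

# Hypothetical signatures (not the Compresr SDK):
#   compress_fn(context, query) -> compressed context
#   call_llm(prompt) -> model answer
CompressFn = Callable[[str, str], str]
LLMFn = Callable[[str], str]


def answer_with_compression(context: str, query: str,
                            compress_fn: CompressFn, call_llm: LLMFn) -> str:
    """Compress the context for this query, then forward it to the LLM."""
    compressed = compress_fn(context, query)
    # Assumed prompt layout; adapt it to your own stack.
    prompt = f"Context:\n{compressed}\n\nQuestion: {query}"
    return call_llm(prompt)


if __name__ == "__main__":
    # Toy stand-in compressor: keep only lines that share a word with the query.
    def toy_compress(context: str, query: str) -> str:
        words = set(query.lower().split())
        return "\n".join(line for line in context.splitlines()
                         if words & set(line.lower().split()))

    def echo_llm(prompt: str) -> str:
        return prompt  # a real client would return the model's answer

    doc = "the invoice total is 420 eur\nweather was sunny\ninvoice due in march"
    print(answer_with_compression(doc, "invoice total", toy_compress, echo_llm))
```

Because the compressor and the LLM are passed in as functions, the wrapper stays neutral about which LLM stack sits behind it. That matches the point that Compresr sits in front of your stack rather than replacing it.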
Use cases
Anywhere you send long text with a question about it:
- RAG: compress retrieved chunks before they reach the model, so you pay only for the tokens that carry the information.
- Conversations: compress a growing chat history so long sessions stay inside the context window.
- Tool outputs (agents): compress noisy tool results (web search hits, API responses, file dumps) before they re-enter the prompt. Pass the tool call's intent as the query so Compresr keeps only what the agent asked for.
- Document understanding: ask a question against long, dense documents (legal contracts, medical records, financial filings) and keep only the spans that answer it.
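For the tool-output case, you can derive the query from the tool call itself. The sketch below assumes an OpenAI-style tool call, meaning a function name plus JSON-encoded arguments. It builds a request that pairs the tool result (context) with the call's intent (query). The helper names, the intent heuristic, and the dict payload shape are illustrative assumptions, not the documented wire format.

```python
import json
from typing import Any


def tool_intent(name: str, arguments: str) -> str:
    """Turn a tool call into a short query (hypothetical heuristic)."""
    args: dict[str, Any] = json.loads(arguments or "{}")
    # Prefer an explicit query-like argument if the tool has one.
    for key in ("query", "q", "question", "search"):
        value = args.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    # Otherwise fall back to "name(arg=value, ...)" with keys sorted.
    details = ", ".join(f"{k}={v}" for k, v in sorted(args.items()))
    return f"{name}({details})" if details else name


def build_tool_compression_request(name: str, arguments: str,
                                   tool_output: str) -> dict[str, str]:
    """Pair the raw tool result (context) with the tool call's intent (query)."""
    return {"query": tool_intent(name, arguments), "context": tool_output}


if __name__ == "__main__":
    req = build_tool_compression_request(
        "web_search", '{"q": "latest LTS release of Node.js"}', "<10 KB of search hits>")
    print(req["query"])
```

A search-style tool usually carries its own query argument, which is the most faithful statement of what the agent asked for. Tools without one fall back to the call signature.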
The models: latte_v1 and latte_v2
Compresr exposes two query-specific compression models on the public API:
- latte_v1: the stable, battle-tested model.
- latte_v2 (beta): the newer model, up to 5x faster than latte_v1 at the same or better compression quality.
Which one to use
Reach for latte_v2 by default. It is a drop-in replacement for latte_v1: every parameter latte_v1 accepts also works with latte_v2, which adds a dynamic mode that picks the compression ratio per input. latte_v1 stays available for the rare case where latte_v2 falls short.
Parameters
Every call takes a query and the context to compress. Everything else is optional:
- query (string, required): the question you want the compressed context to answer.
- target_compression_ratio (number, optional): for 0 < r ≤ 1, removes that fraction of the input; for r > 1, targets an r× reduction. Ignored when dynamic is on.
- coarse (boolean, optional, default false)
- heuristic_chunking (boolean, optional, default false)
- disable_placeholders (boolean, optional, default false)
- dynamic (boolean, optional, default false; latte_v2 only): let Compresr choose the compression ratio per input instead of using a fixed target_compression_ratio. A good default for mixed-difficulty inputs; use a fixed ratio when you need a predictable token budget.
- dynamic_min_ratio (number, optional, default 1.5; latte_v2 only): lower bound on the ratio dynamic can pick.
- dynamic_max_ratio (number, optional, default 10.0; latte_v2 only): upper bound on the ratio dynamic can pick.

The three dynamic* parameters are latte_v2 only. Full semantics, defaults, and the support matrix live in the Models reference.
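You can check the ratio rule and the latte_v2-only restriction client-side before a request goes out. This is a minimal sketch under several assumptions. The function names are hypothetical, the payload is a plain dict rather than the SDK's request type, and the "model" field name is assumed. The ratio arithmetic follows the rule as written above. For example, 20,000 input tokens at r = 4 aims for about 5,000 tokens, and r = 0.6 keeps about 40% of the input.

```python
from typing import Any

DYNAMIC_KEYS = ("dynamic", "dynamic_min_ratio", "dynamic_max_ratio")


def kept_fraction(r: float) -> float:
    """Fraction of the input a fixed target_compression_ratio aims to keep."""
    if r <= 0:
        raise ValueError("target_compression_ratio must be > 0")
    # 0 < r <= 1: r is the fraction removed; r > 1: an r-times reduction.
    return 1.0 - r if r <= 1 else 1.0 / r


def target_tokens(input_tokens: int, r: float) -> int:
    """Approximate output token budget implied by a fixed ratio."""
    return round(input_tokens * kept_fraction(r))


def build_request(model: str, query: str, context: str, **options: Any) -> dict[str, Any]:
    """Assemble and sanity-check a compression request (hypothetical payload shape)."""
    if not query:
        raise ValueError("query is required")
    used_dynamic = [k for k in DYNAMIC_KEYS if k in options]
    if used_dynamic and model != "latte_v2":
        raise ValueError(f"{used_dynamic} are latte_v2 only")
    if options.get("dynamic"):
        lo = options.get("dynamic_min_ratio", 1.5)   # documented default
        hi = options.get("dynamic_max_ratio", 10.0)  # documented default
        if lo > hi:
            raise ValueError("dynamic_min_ratio must not exceed dynamic_max_ratio")
        # Ignored server-side when dynamic is on; drop it to avoid confusion.
        options.pop("target_compression_ratio", None)
    return {"model": model, "query": query, "context": context, **options}


if __name__ == "__main__":
    print(target_tokens(20_000, 4))
    print(build_request("latte_v2", "What is the refund window?", "<long policy text>",
                        dynamic=True))
```

Dropping target_compression_ratio when dynamic is on mirrors the documented "ignored" behavior. Rejecting dynamic* keys for latte_v1 surfaces the mismatch locally instead of relying on the server's handling.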
Start with
Pick your language and send the first request. The shape of the call is identical across all three; only the syntax differs.
Pick a language
- Python: pip install compresr, then call client.compress(...).
- TypeScript: npm install @compresr/sdk, then call client.compress({...}).
- cURL: one POST request, no install required.
Framework integrations
First-party integrations ship in both SDKs, so you can drop them into existing pipelines without rewriting the surrounding code.
- LangChain: three middlewares (tool output, history summarization, prompt budget), a RAG document compressor, and a single-tool wrapper, for create_agent and ContextualCompressionRetriever.
- LangGraph: adds make_compresr_node for custom state graphs, a lossy CompresrCheckpointSerializer plus CompresrStore for at-rest compression, and compresr_handoff_tool for supervisor → sub-agent transfers.
- LlamaIndex: CompresrNodePostprocessor for query engines, wrap_tool_with_compresr for FunctionTools, and CompresrMemoryBlock for the Memory API.
- LiteLLM: a Python-only pre_call guardrail that auto-compresses tool/function messages before they go upstream, working against every LiteLLM provider.
- LLM provider recipes: the manual pattern, called directly against OpenAI, Anthropic, Gemini, or local Ollama.
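The idea behind the LiteLLM guardrail and the manual provider recipes has two steps. First, find the tool or function messages in an OpenAI-style message list. Then replace their content with a compressed version before the request goes upstream. This sketch is not the guardrail's actual implementation. compress_fn is a hypothetical stand-in for the SDK call. Using the most recent preceding user message as the query is also an assumption; a tool call's own intent is another option.

```python
from typing import Callable

Message = dict  # OpenAI-style chat message: {"role": ..., "content": ...}


def compress_tool_messages(messages: list[Message],
                           compress_fn: Callable[[str, str], str]) -> list[Message]:
    """Return a new list with tool/function message contents compressed.

    Input messages are not mutated; compressed messages are shallow copies.
    """
    out: list[Message] = []
    query = ""
    for msg in messages:
        if msg.get("role") == "user" and isinstance(msg.get("content"), str):
            query = msg["content"]  # most recent user turn so far acts as the query
        if msg.get("role") in ("tool", "function") and isinstance(msg.get("content"), str):
            msg = {**msg, "content": compress_fn(msg["content"], query)}
        out.append(msg)
    return out
```

Run this on the message list just before calling OpenAI, Anthropic (after converting to its message format), Gemini, or Ollama. Messages other than tool results pass through unchanged.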
Related reading
- Quick start: the same 30-second example in Python, TypeScript, and cURL.
- Authentication: how cmp_ keys are issued, rotated, and revoked.
- API reference: every endpoint, parameter, and response field.