Quick start
Send your first compression request in 30 seconds.
By the end of this page you will have made your first compressed request and seen the token savings.
Install the SDK
Install the client for your language. The cURL path needs nothing - it ships with your OS.
Get an API key
Create a key in the dashboard, copy its cmp_... value, and export it as COMPRESR_API_KEY. See Authentication for the full key model and security guidance.
Send a compression request
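Before sending anything, the client needs the exported key. The following is a minimal sketch of reading it from the environment and failing early if it is missing. The helper name load_api_key is hypothetical, and the cmp_ prefix check is an assumption based on the cmp_... value shown in the dashboard; the SDK may handle this for you.

```python
import os


def load_api_key(env=None):
    """Return the API key from COMPRESR_API_KEY, or raise if it is unusable.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("COMPRESR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("COMPRESR_API_KEY is not set; export it first.")
    # Assumption: every key starts with "cmp_", as the dashboard value suggests.
    if not key.startswith("cmp_"):
        raise RuntimeError("COMPRESR_API_KEY does not look like a cmp_... key.")
    return key
```

Failing at startup keeps a missing or mistyped key from surfacing later as an opaque authentication error.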
Pass the long context you would otherwise send to your LLM, plus the query you want it to answer.
Inspect the result
The response contains the compressed text plus token-savings stats. This is the actual response from the live API for the call above (numbers will vary slightly run-to-run as duration_ms depends on load):
What the fields mean
- compressed_context: the shortened text. Forward this to your LLM exactly as you would the original input. [N tokens dropped] markers show where spans were cut; pass disable_placeholders=true if you want a clean concatenation without them.
- actual_compression_ratio: fraction of input tokens removed (here 0.6375 = ~64% removed). It is not an Nx factor.
- target_compression_ratio: the value you asked for, echoed back. 0–1 = removal strength; >1 = Nx factor (max 200).
- tokens_saved: original_tokens − compressed_tokens.
- duration_ms: server-side compression time. Network round-trip is on top of this.
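The arithmetic behind these fields can be sketched in a few lines of Python. This is an illustration, not SDK code: the helper names are hypothetical, the token counts in the usage note are made up to reproduce the 0.6375 ratio, and the marker pattern assumes N is always a plain decimal integer.

```python
import re

# Assumption: placeholders look exactly like "[12 tokens dropped]".
DROPPED_MARKER = re.compile(r"\[(\d+) tokens dropped\]")


def savings(original_tokens, compressed_tokens):
    """Return (tokens_saved, fraction of input tokens removed)."""
    tokens_saved = original_tokens - compressed_tokens
    ratio = tokens_saved / original_tokens if original_tokens else 0.0
    return tokens_saved, ratio


def describe_target(target):
    """Explain how a target_compression_ratio value is interpreted."""
    if not 0 <= target <= 200:
        raise ValueError("target_compression_ratio must be between 0 and 200")
    if target <= 1:
        # 0-1: removal strength, i.e. the fraction of tokens to drop.
        return f"remove about {target:.0%} of input tokens"
    # >1: Nx factor, i.e. shrink the input roughly N times.
    return f"shrink the input by about {target:g}x"


def count_marked_drops(compressed_context):
    """Sum the N values from every [N tokens dropped] marker in the text."""
    return sum(int(n) for n in DROPPED_MARKER.findall(compressed_context))
```

For example, savings(80, 29) gives 51 tokens saved and a ratio of 0.6375, which is a fraction removed and not a 0.6375x factor. describe_target(0.5) and describe_target(4) show the two ways the same parameter is read.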
From here, see the models reference for everything you can tune, or wire Compresr into your stack with one of the integrations below.
- Python SDK - full method reference, async variants, streaming, batching.
- TypeScript SDK - same surface, camelCase params.
- cURL / HTTP - raw REST reference.
- Models - tune target_compression_ratio and other latte-only options.
- Agent client - drop-in for anthropic.Anthropic() / openai.OpenAI() with automatic tool-output compression.
- Web search - add Tavily or Brave to your agent loop in one line.
- LangChain integration: first-party middleware for tool outputs, history, and outbound prompts, plus a BaseDocumentCompressor for RAG.
- LangGraph integration: state-graph node, lossy checkpoint serializer, store wrapper, and multi-agent handoff tool.
- LlamaIndex integration: query-engine postprocessor, tool wrapper, and Memory API block.
- LiteLLM integration: drop the compresr guardrail into the proxy and compress tool messages across every provider transparently.