Overview
Compresr reduces LLM token costs by up to 200x through intelligent context compression.
Compression Types
Three compression modes to fit different use cases:
1. Agnostic Compression
General-purpose, token-level compression that selects the most important tokens with full control over the compression rate. No query needed. Perfect for system prompts, documentation, and static contexts.
Model: espresso_v1
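To illustrate the idea behind query-agnostic, token-level compression, here is a minimal toy sketch: score each token with a crude importance heuristic and keep the top fraction in original order. This is not the espresso_v1 model or the Compresr SDK API; the function name and the frequency-based scoring are illustrative assumptions only.

```python
from collections import Counter

def compress_agnostic(text: str, rate: float = 0.5) -> str:
    """Toy query-agnostic compression: keep roughly `rate` of the
    tokens, favoring rare (more informative) ones. Illustrative only."""
    tokens = text.split()
    freq = Counter(t.lower() for t in tokens)
    # Rarer tokens score higher; ties are broken by original position.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: (freq[tokens[i].lower()], i))
    keep = set(ranked[: max(1, int(len(tokens) * rate))])
    # Re-emit the surviving tokens in their original order.
    return " ".join(tokens[i] for i in sorted(keep))
```

The `rate` parameter plays the role of the compression rate: lower values drop more tokens. A real token-level compressor would use learned importance scores rather than word frequency.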
2. Query-Specific Compression
Token-level compression that preserves tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.
Model: latte_v1
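A minimal sketch of the query-aware variant, again purely illustrative rather than the latte_v1 model: tokens that overlap the query are always kept, and the remaining budget is filled with surrounding context up to the target rate.

```python
def compress_with_query(text: str, query: str, rate: float = 0.5) -> str:
    """Toy query-specific compression: never drop tokens that match the
    query; compress everything else down to the budget. Illustrative only."""
    tokens = text.split()
    query_terms = {t.lower() for t in query.split()}
    relevant = [i for i, t in enumerate(tokens) if t.lower() in query_terms]
    budget = max(len(relevant), int(len(tokens) * rate))
    keep = set(relevant)
    # Pad with leading context tokens until the budget is reached.
    for i in range(len(tokens)):
        if len(keep) >= budget:
            break
        keep.add(i)
    return " ".join(tokens[i] for i in sorted(keep))
```

In a RAG pipeline this shape is the point: answer-relevant tokens survive compression, and the filler around them is what gets squeezed.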
3. Chunk-Level Filtering
Coarse-grained chunk selection — keeps or drops entire chunks based on query relevance without modifying their content. Best for refining retrieval results before stuffing them into a prompt.
Model: coldbrew_v1
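The chunk-level mode can be sketched the same way; this toy version (not coldbrew_v1) keeps or drops each retrieved chunk whole based on simple lexical overlap with the query, leaving kept chunks byte-for-byte unmodified:

```python
def filter_chunks(chunks: list[str], query: str, min_overlap: int = 1) -> list[str]:
    """Toy chunk-level filter: keep a chunk only if it shares at least
    `min_overlap` terms with the query. Chunk content is never edited."""
    query_terms = {t.lower() for t in query.split()}

    def overlap(chunk: str) -> int:
        return len(query_terms & {t.lower() for t in chunk.split()})

    return [c for c in chunks if overlap(c) >= min_overlap]
```

This mirrors the intended use: run it on retrieval results after your vector search and before prompt assembly, so only relevant chunks reach the model.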
Two Integration Paths
Choose the path that fits your workflow:
| | SDK | Context Gateway |
|---|---|---|
| Description | Use Compresr directly in your code. Full programmatic control over compression. | Transparent proxy for AI agents. Zero code changes required. |
| Who Uses It? | Developers building LLM-powered applications, RAG pipelines, and AI products | AI agent users, platform teams, and anyone running long coding sessions |
| Key Features | | |
| Use Cases | | |
| Get Started | SDK docs → | Gateway docs → |
Quick Start
- Get your API key from the Dashboard
- Install the SDK: `pip install compresr`
- Start compressing — see the Quick Start guide