Overview

Compresr reduces LLM token costs by up to 200x through intelligent context compression.

Compression Types

Three compression modes to fit different use cases:

1. Agnostic Compression

General-purpose, token-level compression that selects the most important tokens with full control over the compression rate. No query needed. Perfect for system prompts, documentation, and static contexts.

Model: espresso_v1
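Conceptually, agnostic compression scores every token for importance and keeps only the fraction allowed by the compression rate. Here is a toy sketch of that idea in plain Python; the frequency/length heuristic is illustrative only, not how the espresso_v1 model actually scores tokens:

```python
# Toy sketch of agnostic token-level compression: score every token for
# importance, keep the top fraction given by `rate`, preserve word order.
# The heuristic (rare and long tokens matter more) is illustrative only.
from collections import Counter

def compress(text: str, rate: float) -> str:
    tokens = text.split()
    counts = Counter(t.lower() for t in tokens)
    scored = [(len(t) / counts[t.lower()], i) for i, t in enumerate(tokens)]
    keep = max(1, int(len(tokens) * rate))
    top = sorted(sorted(scored, reverse=True)[:keep], key=lambda s: s[1])
    return " ".join(tokens[i] for _, i in top)

text = "the quick brown fox jumps over the lazy dog the end"
print(compress(text, 0.5))  # half the tokens survive; filler words go first
```

Note that no query is involved: the keep/drop decision depends only on the text itself and the requested rate.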

2. Query-Specific Compression

Token-level compression that preserves tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.

Model: latte_v1
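To see how a query changes the keep/drop decision, here is a toy sketch where tokens matching the query (plus a small window of neighbors) survive. The keyword-overlap matching is a stand-in; the latte_v1 model judges relevance far more robustly:

```python
# Toy sketch of query-specific token-level compression: tokens matching the
# query (plus a small window of neighbors) survive; the rest are dropped.
def compress_for_query(text: str, query: str, window: int = 1) -> str:
    tokens = text.split()
    qterms = {t.lower() for t in query.split()}
    keep: set[int] = set()
    for i, t in enumerate(tokens):
        if t.lower().strip(".,!?") in qterms:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return " ".join(tokens[i] for i in sorted(keep))

ctx = "Paris is the capital of France and Berlin is the capital of Germany"
print(compress_for_query(ctx, "capital of France"))
```

The same context compressed against a different query would keep a different set of tokens, which is exactly what makes this mode a fit for RAG and Q&A.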

3. Chunk-Level Filtering

Coarse-grained chunk selection — keeps or drops entire chunks based on query relevance without modifying their content. Best for refining retrieval results before stuffing them into a prompt.

Model: coldbrew_v1
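The key contrast with the token-level modes is that chunks pass through verbatim or not at all. A toy sketch of the idea, using naive word overlap as the relevance score (coldbrew_v1 uses a learned relevance model, not keyword matching):

```python
# Toy sketch of chunk-level filtering: score whole chunks by word overlap
# with the query, keep the top-k relevant ones, and never edit chunk text.
def filter_chunks(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    qterms = set(query.lower().split())
    def words(c: str) -> set[str]:
        return {w.strip(".,!?") for w in c.lower().split()}
    scored = [(len(qterms & words(c)), i) for i, c in enumerate(chunks)]
    kept = sorted(sorted(scored, reverse=True)[:top_k], key=lambda s: s[1])
    return [chunks[i] for score, i in kept if score > 0]

chunks = [
    "Photosynthesis converts light into chemical energy.",
    "The Treaty of Westphalia ended the Thirty Years War.",
    "Chlorophyll absorbs light for photosynthesis.",
]
print(filter_chunks(chunks, "how does photosynthesis use light"))
```

Because chunk text is untouched, this mode composes cleanly with a retriever: run it on the retrieval results, then stuff only the survivors into the prompt.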

Two Integration Paths

Choose the path that fits your workflow:

SDK

Description: Use Compresr directly in your code. Full programmatic control over compression.

Who uses it: Developers building LLM-powered applications, RAG pipelines, and AI products.

Key features:
  • Direct library integration in your codebase
  • Two client types: CompressionClient + FilterClient
  • Streaming and async support

Use cases:
  • Compress system prompts before sending to LLMs
  • Reduce RAG context size to cut token costs
  • Filter retrieval chunks by query relevance
  • Pre-process long documents for summarization
  • Optimize prompt caching

Get started: SDK docs →

Context Gateway

Description: Transparent proxy for AI agents. Zero code changes required.

Who uses it: AI agent users, platform teams, and anyone running long coding sessions.

Key features:
  • Sits between your agent and the LLM API
  • Auto-compresses history when context hits threshold
  • Compresses tool outputs on the fly
  • Tool discovery: pre-selects relevant tools based on query
  • Full observability for black-box agents

Use cases:
  • Extend Claude Code sessions without context limits
  • Reduce token costs for Cursor AI sessions
  • Keep OpenClaw agents running on longer tasks
  • Monitor compression savings across sessions

Get started: Gateway docs →
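The gateway's threshold behavior can be pictured as a token-budget check on every request. The sketch below is a toy: it evicts the oldest messages, whereas the real gateway compresses them (using the models above) and does so transparently as a proxy, with no changes to the agent. All names here are illustrative:

```python
# Toy sketch of threshold-triggered history management in a context gateway:
# before each LLM call, if the conversation exceeds a token budget, the
# oldest messages are evicted. (A real gateway would compress them instead.)
def count_tokens(msg: dict) -> int:
    return len(msg["content"].split())  # crude whitespace proxy for tokens

def squeeze(history: list[dict], budget: int, keep_recent: int = 2) -> list[dict]:
    history = list(history)
    while sum(count_tokens(m) for m in history) > budget and len(history) > keep_recent:
        history.pop(0)  # oldest message goes first
    return history

history = [
    {"role": "user", "content": "alpha beta gamma delta"},
    {"role": "assistant", "content": "epsilon zeta"},
    {"role": "user", "content": "eta theta iota"},
]
print(squeeze(history, budget=6))
```

Because the check runs on every request, long-running agent sessions stay under the model's context limit without the agent ever knowing compression happened.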

Quick Start

  1. Get your API key from the Dashboard
  2. Install the SDK: pip install compresr
  3. Start compressing — see the Quick Start guide