# SDK & API Reference

Official SDK and REST API for Compresr.
View on GitHub

## Installation

```bash
pip install compresr
```

## Two Client Types

Choose the right client for your use case:

### CompressionClient

Token-level compression: selects the most important tokens, with full control over the compression rate.

**Models:** `espresso_v1` (query-agnostic), `latte_v1` (query-specific)

### FilterClient

Chunk-level filtering: keeps or drops entire chunks based on query relevance, without modifying chunk content.

**Model:** `coldbrew_v1`
## 1. Agnostic Compression

```python
from compresr import CompressionClient

client = CompressionClient(api_key="cmp_your_api_key")

result = client.compress(
    context="Your very long context that needs compression...",
    compression_model_name="espresso_v1"
)

print(f"Original: {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Saved: {result.data.tokens_saved} tokens")
```
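The token counts returned by `compress` make it easy to report savings. A small helper for that, illustrative only and not part of the SDK:

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed (0.0 = nothing removed, 1.0 = everything)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens

# e.g. with original_tokens == 2000 and compressed_tokens == 500:
print(f"Compression ratio: {compression_ratio(2000, 500):.0%}")  # → 75%
```

In real code you would pass `result.data.original_tokens` and `result.data.compressed_tokens` instead of the literals.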
## 2. Question-Specific Compression

```python
from compresr import CompressionClient

client = CompressionClient(api_key="cmp_your_api_key")

result = client.compress(
    context="Python was created in 1991. JavaScript in 1995. Java in 1995.",
    query="Who created Python?",
    compression_model_name="latte_v1"
)

print(f"Compressed: {result.data.compressed_context}")
print(f"Saved: {result.data.tokens_saved} tokens")
```
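Like any networked API, compression calls can fail transiently. A minimal retry wrapper with exponential backoff, illustrative only (the SDK's own exception types are not documented here, so this catches broadly):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage (sketch):
# result = with_retries(lambda: client.compress(
#     context=docs, query=question, compression_model_name="latte_v1"))
```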
## 3. Chunk-Level Filtering

```python
from compresr import FilterClient

client = FilterClient(api_key="cmp_your_api_key")

result = client.filter(
    chunks=["Chunk about Python...", "Chunk about Java...", "Chunk about ML..."],
    query="What is Python?",
    compression_model_name="coldbrew_v1"
)

print(f"Kept chunks: {result.data.compressed_context}")  # List[str]
print(f"Saved: {result.data.tokens_saved} tokens")
```
## Streaming

Both clients support streaming for real-time output:

```python
from compresr import CompressionClient, FilterClient

# CompressionClient streaming
client = CompressionClient(api_key="cmp_your_api_key")
for chunk in client.compress_stream(
    context="Your long context...",
    compression_model_name="espresso_v1"
):
    print(chunk.content, end="", flush=True)

# Query-specific streaming
for chunk in client.compress_stream(
    context="Your context...",
    query="What is important?",
    compression_model_name="latte_v1"
):
    print(chunk.content, end="", flush=True)

# FilterClient streaming
filter_client = FilterClient(api_key="cmp_your_api_key")
for chunk in filter_client.filter_stream(
    chunks=["Chunk 1...", "Chunk 2...", "Chunk 3..."],
    query="What is important?"
):
    print(chunk.content, end="", flush=True)
```
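If you want the streamed output as a single string rather than printed incrementally, the chunks can simply be accumulated. A sketch, assuming only that stream chunks expose `.content` as in the examples above (the `Chunk` dataclass here is a stand-in for the SDK's chunk objects):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str

def collect_stream(stream) -> str:
    """Accumulate streamed chunk.content pieces into the full text."""
    return "".join(chunk.content for chunk in stream)

# Works with any iterable of chunk-like objects, e.g. compress_stream(...)
demo = [Chunk("Hello, "), Chunk("world")]
print(collect_stream(demo))  # → Hello, world
```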
## Async / Await

Both clients support async usage for non-blocking operations:

```python
import asyncio
from compresr import CompressionClient, FilterClient

async def main():
    # Async compression
    client = CompressionClient(api_key="cmp_your_api_key")
    result = await client.acompress(
        context="Your long context...",
        compression_model_name="espresso_v1"
    )
    print(f"Compressed: {result.data.compressed_tokens} tokens")

    # Async filtering
    filter_client = FilterClient(api_key="cmp_your_api_key")
    result = await filter_client.afilter(
        chunks=["Chunk 1...", "Chunk 2..."],
        query="What is relevant?"
    )
    print(f"Kept: {len(result.data.compressed_context)} chunks")

asyncio.run(main())
```
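When several independent contexts need compressing, the `acompress` coroutine shown above can be fanned out with `asyncio.gather`. A sketch; the `compress_all` helper is illustrative, not part of the SDK:

```python
import asyncio

async def compress_all(client, contexts, model="espresso_v1"):
    """Issue one acompress call per context and await them all concurrently."""
    tasks = [
        client.acompress(context=c, compression_model_name=model)
        for c in contexts
    ]
    return await asyncio.gather(*tasks)

# Usage (sketch):
# results = asyncio.run(compress_all(client, ["doc one...", "doc two..."]))
```

Results come back in the same order as the input contexts, which keeps downstream bookkeeping simple.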
## Workflow Integration

Integrate Compresr into your existing LLM workflows. It works with any LLM provider.

### Agnostic (System Prompts)

```python
from compresr import CompressionClient

compresr = CompressionClient(api_key="cmp_xxx")

# Compress your system prompt or context
compressed = compresr.compress(
    context="Your long system prompt...",
    compression_model_name="espresso_v1"
)

# Use with any LLM provider
messages = [
    {"role": "system", "content": compressed.data.compressed_context},
    {"role": "user", "content": "Your question..."}
]
# Pass to OpenAI, Anthropic, or any other LLM
```
### Question-Specific (RAG/QA)

```python
from compresr import CompressionClient

compresr = CompressionClient(api_key="cmp_xxx")

user_question = "What is machine learning?"

# Compress retrieved documents based on the query
compressed = compresr.compress(
    context="Retrieved documents from your vector DB...",
    query=user_question,
    compression_model_name="latte_v1"
)

# Use with any LLM provider
messages = [
    {"role": "system", "content": compressed.data.compressed_context},
    {"role": "user", "content": user_question}
]
```
### Chunk Filtering (RAG Pre-filter)

```python
from compresr import FilterClient

filter_client = FilterClient(api_key="cmp_xxx")

user_question = "What is machine learning?"

# Filter retrieved chunks: keep only relevant ones
filtered = filter_client.filter(
    chunks=["Chunk about ML...", "Chunk about cooking...", "Chunk about AI..."],
    query=user_question
)

# filtered.data.compressed_context is a List[str] of kept chunks
context = "\n".join(filtered.data.compressed_context)

messages = [
    {"role": "system", "content": context},
    {"role": "user", "content": user_question}
]
```
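The filtering and compression patterns above can also be chained: pre-filter retrieved chunks, then query-compress whatever survives. A sketch; `filter_then_compress` is illustrative, not part of the SDK:

```python
def filter_then_compress(filter_client, compression_client, chunks, query):
    """Stage 1: drop irrelevant chunks. Stage 2: token-compress the survivors."""
    kept = filter_client.filter(chunks=chunks, query=query).data.compressed_context
    return compression_client.compress(
        context="\n".join(kept),
        query=query,
        compression_model_name="latte_v1",
    )

# Usage (sketch):
# result = filter_then_compress(filter_client, compresr, retrieved_chunks, user_question)
# messages = [{"role": "system", "content": result.data.compressed_context}, ...]
```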
## Support

- Documentation: compresr.ai/docs/overview
- Email: [email protected]
- GitHub Discussions

## License

Proprietary License - see LICENSE for details.