Compression Service
The Compression Service gives you direct access to compressed context, which you can reuse across multiple queries. Context compression is available in three modes: simple (single), batch, and async streaming. Choose the mode that fits your workflow, and configure the compression ratio to balance token savings against quality.
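To make the trade-off between token savings and quality easier to reason about, here is a minimal sketch that turns an original and a compressed token count into the number of tokens saved and the fraction saved. The helper `token_savings` and the sample counts are hypothetical and not part of the `compresr` package; the sketch also makes no assumption about how the service itself defines `target_compression_ratio`.

```python
def token_savings(original_tokens: int, compressed_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, fraction of the original that was saved)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - compressed_tokens
    return saved, saved / original_tokens


# Hypothetical token counts for illustration, not real service output.
saved, fraction = token_savings(original_tokens=200, compressed_tokens=60)
print(f"Saved {saved} tokens ({fraction:.0%} of the original)")
```

In practice you would pass in the token counts reported for a compression result and compare the savings across a few ratio settings before settling on one.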
Tutorial: Compression Modes
Simple Compression (Sync)
from compresr import CompressionClient
# Initialize the client
client = CompressionClient(api_key="YOUR_COMPRESR_API_KEY")
# Compress context
result = client.generate(
    context=(
        "NASA's Artemis program aims to return humans to the Moon after 50 years. "
        "Artemis I completed an uncrewed test flight in 2022. Artemis II will carry "
        "four astronauts on a lunar flyby, while Artemis III will attempt the first "
        "crewed landing since Apollo 17, using SpaceX's Starship. International "
        "partners include ESA, Japan, and Canada."
    ),
    target_compression_ratio=0.3,
    compression_model_name="cmprsr_v1"
)
print(f"Original: {result.data.original_tokens} tokens")
print(f"Compressed: {result.data.compressed_tokens} tokens")
print(f"Compression ratio: {result.data.actual_compression_ratio:.1%}")
print(f"Compressed context: {result.data.compressed_context}")
Batch Compression
from compresr import CompressionClient
client = CompressionClient(api_key="YOUR_COMPRESR_API_KEY")
# Batch compress multiple contexts
results = client.generate_batch(
    contexts=[
        "NASA's Artemis program returns humans to the Moon after 50 years. "
        "Artemis III will attempt the first crewed landing since 1972, using "
        "SpaceX's Starship as the landing system.",
        "React transformed web development with its component-based architecture "
        "and virtual DOM. Released by Facebook in 2013, it enables building UIs "
        "from reusable components.",
        "The Mediterranean diet emphasizes whole grains, vegetables, olive oil, "
        "and fish. Research links it to reduced risks of heart disease and diabetes."
    ],
    target_compression_ratio=0.3,
    compression_model_name="cmprsr_v1"
)
# Print the token counts for each compressed context
for i, result in enumerate(results.data.results):
    print(f"Context {i+1}: {result.original_tokens} → {result.compressed_tokens} tokens")
Async / Streaming Compression
import asyncio
from compresr import CompressionClient
async def main():
    client = CompressionClient(api_key="YOUR_COMPRESR_API_KEY")
    # Print each compressed chunk as soon as it arrives
    async for chunk in client.generate_stream(
        context="Very long context that you want to compress...",
        target_compression_ratio=0.5,
        compression_model_name="cmprsr_v1"
    ):
        print(chunk.content, end="")
asyncio.run(main())
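If you want the full compressed context as a single string, for example to reuse it across several queries, one option is to collect the streamed chunks instead of printing them. The sketch below assumes only what the streaming example shows: an async iterator whose items expose a `content` attribute. The helper `collect_stream` and the stand-in generator `fake_stream` are hypothetical, so the code runs without an API key.

```python
import asyncio
from types import SimpleNamespace


async def collect_stream(stream) -> str:
    """Join the `content` of every chunk yielded by an async iterator."""
    parts = []
    async for chunk in stream:
        parts.append(chunk.content)
    return "".join(parts)


async def fake_stream():
    # Stand-in for client.generate_stream(...): yields objects with `.content`.
    for piece in ("Artemis III ", "targets a ", "crewed landing."):
        yield SimpleNamespace(content=piece)


compressed = asyncio.run(collect_stream(fake_stream()))
print(compressed)
```

With a real client, you would pass the result of `client.generate_stream(...)` to `collect_stream` in place of `fake_stream()`.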