Available Models
Choose the right model based on your use case and performance requirements.
Quick Reference
| Model | Client | Query | Compression Ratio | Use Case |
|---|---|---|---|---|
| espresso_v1 | CompressionClient | Optional | Yes (default: 0.5) | System prompts, docs |
| latte_v1 | CompressionClient | Required | Yes (default: 0.5) | Query-aware compression |
| coldbrew_v1 | FilterClient | Required | N/A (keeps/drops chunks) | Retrieval filtering |
Compression Models
Token-level compression via CompressionClient. These models select the most important tokens at a fine-grained level and give you full control over the compression rate.
| Model | Parameters | Description |
|---|---|---|
| espresso_v1 | context, target_compression_ratio | Query-agnostic compression (default). General-purpose token-level compression for system prompts, documents, and static contexts. |
| latte_v1 | context, query, target_compression_ratio | Query-specific compression. Preserves tokens relevant to the given query; ideal for RAG and Q&A pipelines. |
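The parameter requirements in the table can be checked before a request is made. The sketch below is illustrative only: `build_compression_params` and the `QUERY_REQUIRED` mapping are hypothetical helpers, not part of CompressionClient, and the resulting dictionary only mirrors the parameter names listed above.

```python
from typing import Optional

# Query requirements per compression model, as listed in the tables above.
# The mapping and the helper below are hypothetical, not part of any client.
QUERY_REQUIRED = {"espresso_v1": False, "latte_v1": True}


def build_compression_params(
    model: str,
    context: str,
    query: Optional[str] = None,
    target_compression_ratio: float = 0.5,
) -> dict:
    """Assemble and validate the parameters for a compression model."""
    if model not in QUERY_REQUIRED:
        raise ValueError(f"unknown compression model: {model}")
    if QUERY_REQUIRED[model] and not query:
        raise ValueError(f"{model} requires a query")
    params = {
        "model": model,
        "context": context,
        "target_compression_ratio": target_compression_ratio,
    }
    # The query is optional for espresso_v1, so include it only when given.
    if query:
        params["query"] = query
    return params
```

Validating up front means a missing query for latte_v1 fails locally rather than after a round trip.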
Compression Ratio Guide
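Set via the target_compression_ratio parameter. Values between 0 and 1 give the fraction of tokens to remove, so a higher value means more aggressive compression; values above 1 are read as a direct compression factor.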
- 0.2 = Light (keeps 80%, ~1.25x compression)
- 0.5 = Balanced (keeps 50%, ~2x compression)
- 0.9 = Aggressive (keeps 10%, ~10x compression)
- >1 = Direct factor (e.g., 2 = 2x, 100 = 100x)
- Achievable range: 2x to 100x depending on content redundancy
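The arithmetic behind these values can be sketched as a small function. This is an illustration of the guide, not the service's implementation; `interpret_ratio` is a hypothetical name, and rejecting a value of exactly 1 (or anything at or below 0) is an assumption, since the guide does not cover those cases.

```python
from typing import Tuple


def interpret_ratio(target_compression_ratio: float) -> Tuple[float, float]:
    """Return (fraction_kept, compression_factor) for a ratio value."""
    r = target_compression_ratio
    if 0 < r < 1:
        # Fractional form: r is the share of tokens removed.
        kept = 1.0 - r
        return kept, 1.0 / kept
    if r > 1:
        # Direct form: r is the compression factor itself.
        return 1.0 / r, float(r)
    # Assumption: values <= 0 or exactly 1 are not described in the guide.
    raise ValueError("ratio must be between 0 and 1, or greater than 1")


for value in (0.2, 0.5, 0.9, 2, 100):
    kept, factor = interpret_ratio(value)
    print(f"{value}: keeps {kept:.0%}, about {factor:.2f}x")
```

Both forms describe the same quantity: a fractional value of 0.5 and a direct factor of 2 each keep half of the tokens.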
Filter Models
Coarse-grained chunk selection via FilterClient. Keeps or drops entire chunks based on query relevance without modifying their content.
| Model | Parameters | Description |
|---|---|---|
| coldbrew_v1 | chunks, query | Chunk-level filtering. Keeps or drops entire chunks by query relevance; best for refining retrieval results before inserting them into a prompt. |
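To make the keep-or-drop behavior concrete, here is a toy sketch of chunk-level filtering. It does not reproduce coldbrew_v1: the `word_overlap` score is a crude stand-in for the model's relevance judgment, and `filter_chunks` and its `threshold` parameter are hypothetical. The only property it is meant to show is that surviving chunks are returned unmodified.

```python
from typing import Callable, List


def word_overlap(chunk: str, query: str) -> float:
    """Stand-in relevance score: share of query words found in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words)


def filter_chunks(
    chunks: List[str],
    query: str,
    score: Callable[[str, str], float] = word_overlap,
    threshold: float = 0.5,  # hypothetical cutoff, not a documented parameter
) -> List[str]:
    """Keep whole chunks scoring at or above threshold; never edit their text."""
    return [chunk for chunk in chunks if score(chunk, query) >= threshold]


chunks = [
    "the cache is invalidated on every write",
    "bananas are yellow when ripe",
]
kept = filter_chunks(chunks, "cache write")
print(kept)
```

Because chunks pass through untouched, the output can be handed to a compression model afterwards if further reduction is needed.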
Model Selection Guide
espresso_v1
CompressionClient · no query needed
- System prompts and instructions
- Static documentation
- General context without specific queries
- Long-form content compression
latte_v1
CompressionClient · query required
- RAG (Retrieval-Augmented Generation)
- Q&A systems with user queries
- Preserving answer-relevant tokens
- Query-aware context compression
coldbrew_v1
FilterClient · query required
- Filtering retriever results
- Dropping irrelevant chunks
- Pre-filtering before compression
- Keeping chunks unchanged
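The guide above reduces to two questions: do you need whole chunks kept intact, and do you have a query? A minimal sketch of that decision follows; `choose_model` is a hypothetical helper that only returns the client and model names from this page.

```python
from typing import Tuple


def choose_model(has_query: bool, chunk_level: bool = False) -> Tuple[str, str]:
    """Return (client_name, model_name) following the selection guide."""
    if chunk_level:
        # Chunk filtering keeps or drops whole chunks and needs a query.
        if not has_query:
            raise ValueError("coldbrew_v1 requires a query")
        return ("FilterClient", "coldbrew_v1")
    if has_query:
        # Token-level compression that preserves query-relevant tokens.
        return ("CompressionClient", "latte_v1")
    # No query available: general-purpose token-level compression.
    return ("CompressionClient", "espresso_v1")


print(choose_model(has_query=False))
print(choose_model(has_query=True))
print(choose_model(has_query=True, chunk_level=True))
```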