Available Models

Choose the right model based on your use case and performance requirements.

Quick Reference

| Model | Client | Query | Compression Ratio | Use Case |
|-------|--------|-------|-------------------|----------|
| espresso_v1 | CompressionClient | Optional | Yes (default: 0.5) | System prompts, docs |
| latte_v1 | CompressionClient | Required | Yes (default: 0.5) | Query-aware compression |
| coldbrew_v1 | FilterClient | Required | N/A (keeps/drops chunks) | Retrieval filtering |

Compression Models

Token-level compression via CompressionClient. Fine-grained selection of the most important tokens with full control over compression rate.

| Model | Parameters | Description |
|-------|------------|-------------|
| espresso_v1 | context, target_compression_ratio | Agnostic compression (default). General-purpose token-level compression for system prompts, documents, and static contexts. |
| latte_v1 | context, query, target_compression_ratio | Query-specific compression. Preserves tokens relevant to the given query; ideal for RAG and Q&A pipelines. |
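
The parameter requirements above can be checked before a request is made. The following is a minimal sketch, not the actual client API: `build_compression_args` and `REQUIRED_PARAMS` are hypothetical names introduced here for illustration, and passing an optional query through for espresso_v1 is an assumption based on the Quick Reference. Only the model names, parameter names, and the 0.5 default come from this page.

```python
from typing import Optional

# Required parameters per compression model, as listed in the table above.
REQUIRED_PARAMS = {
    "espresso_v1": ("context",),
    "latte_v1": ("context", "query"),
}


def build_compression_args(
    model: str,
    context: str,
    query: Optional[str] = None,
    target_compression_ratio: float = 0.5,  # default taken from the Quick Reference
) -> dict:
    """Hypothetical helper: validate arguments and return them as a dict.

    This does not call any service; it only mirrors the documented
    requirement that latte_v1 needs a query and espresso_v1 does not.
    """
    if model not in REQUIRED_PARAMS:
        raise ValueError(f"unknown compression model: {model!r}")
    if "query" in REQUIRED_PARAMS[model] and not query:
        raise ValueError(f"{model} requires a query")

    args = {
        "model": model,
        "context": context,
        "target_compression_ratio": target_compression_ratio,
    }
    if query:
        # Assumption: an optional query is simply passed along when present.
        args["query"] = query
    return args
```

Calling `build_compression_args("latte_v1", "some context")` without a query raises `ValueError`, which catches the most likely misuse before any request is sent.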

Compression Ratio Guide

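Set via the target_compression_ratio parameter. A higher value means more aggressive compression. Values below 1 give the fraction of tokens to remove; values above 1 give the compression factor directly.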

  • 0.2 = Light (keeps 80%, ~1.25x compression)
  • 0.5 = Balanced (keeps 50%, ~2x compression)
  • 0.9 = Aggressive (keeps 10%, ~10x compression)
  • >1 = Direct factor (e.g., 2 = 2x, 100 = 100x)
  • Achievable range: 2x to 100x depending on content redundancy
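
The arithmetic behind the list above can be sketched as follows. `interpret_ratio` is a hypothetical helper, not part of any client, and it assumes the mapping implied by the examples: a value between 0 and 1 is the fraction of tokens removed, and a value above 1 is the compression factor itself. The page does not say how a value of exactly 1 or a non-positive value is handled, so the sketch rejects them.

```python
from typing import Tuple


def interpret_ratio(target_compression_ratio: float) -> Tuple[float, float]:
    """Return (fraction_of_tokens_kept, compression_factor).

    Assumed mapping, inferred from the examples in the guide:
      0 < r < 1  -> keep (1 - r) of the tokens, factor = 1 / (1 - r)
      r > 1      -> factor = r, keep 1 / r of the tokens
    """
    r = target_compression_ratio
    if 0 < r < 1:
        kept = 1.0 - r
        return kept, 1.0 / kept
    if r > 1:
        return 1.0 / r, float(r)
    # Behaviour for r <= 0 and r == 1 is not documented; reject in this sketch.
    raise ValueError(f"unsupported target_compression_ratio: {r}")


for r in (0.2, 0.5, 0.9, 2, 100):
    kept, factor = interpret_ratio(r)
    print(f"{r}: keeps {kept:.0%} of tokens, ~{factor:.2f}x compression")
```

These figures are targets; per the guide, the compression actually achieved depends on how redundant the content is.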

Filter Models

Coarse-grained chunk selection via FilterClient. Keeps or drops entire chunks based on query relevance without modifying their content.

| Model | Parameters | Description |
|-------|------------|-------------|
| coldbrew_v1 | chunks, query | Chunk-level filtering. Keeps or drops entire chunks by query relevance; best for refining retrieval results before stuffing them into a prompt. |
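
To illustrate what keep-or-drop filtering means, here is a toy local sketch. It is not how coldbrew_v1 scores relevance: the word-overlap score, the `threshold` parameter, and the function names are all assumptions made for illustration. The only property it shares with the description above is that each chunk is either returned unchanged or dropped entirely.

```python
import re
from typing import List


def toy_relevance(chunk: str, query: str) -> float:
    """Fraction of query words that appear in the chunk (toy stand-in score)."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not query_words:
        return 0.0
    chunk_words = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(query_words & chunk_words) / len(query_words)


def filter_chunks(chunks: List[str], query: str, threshold: float = 0.5) -> List[str]:
    """Keep whole chunks whose toy score meets the threshold; never edit them."""
    return [chunk for chunk in chunks if toy_relevance(chunk, query) >= threshold]


chunks = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "To request a refund, open a ticket with your order number.",
]
kept = filter_chunks(chunks, "refund request")
print(kept)
```

Because surviving chunks are returned as-is, the output can be passed on to a compression model afterwards, which matches the "pre-filtering before compression" use case.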

Model Selection Guide

espresso_v1

CompressionClient · no query needed

  • System prompts and instructions
  • Static documentation
  • General context without specific queries
  • Long-form content compression

latte_v1

CompressionClient · query required

  • RAG (Retrieval-Augmented Generation)
  • Q&A systems with user queries
  • Preserving answer-relevant tokens
  • Query-aware context compression

coldbrew_v1

FilterClient · query required

  • Filtering retriever results
  • Dropping irrelevant chunks
  • Pre-filtering before compression
  • Keeping chunks unchanged
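
The selection guide can be condensed into a small decision function. This is a sketch: `choose_model` is a hypothetical helper, and the two inputs (whether a query is available, whether whole chunks should be kept or dropped) are a simplification of the use cases listed above.

```python
from typing import Tuple


def choose_model(has_query: bool, chunk_level: bool = False) -> Tuple[str, str]:
    """Return (model, client) following the selection guide.

    chunk_level=True means whole chunks should be kept or dropped
    rather than compressed token by token.
    """
    if chunk_level:
        if not has_query:
            raise ValueError("coldbrew_v1 requires a query")
        return "coldbrew_v1", "FilterClient"
    if has_query:
        return "latte_v1", "CompressionClient"
    return "espresso_v1", "CompressionClient"


print(choose_model(has_query=False))                   # static docs, system prompts
print(choose_model(has_query=True))                    # RAG / Q&A compression
print(choose_model(has_query=True, chunk_level=True))  # retrieval filtering
```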