Quick start
Send your first compression request in 30 seconds.
By the end of this page you will have made your first compressed request and seen the token savings.
What just happened?
latte_v1 scored every paragraph in your context against the query, kept the ones that answer it, and dropped the rest. The paragraph with the mirror-diameter sentence stayed; the L2 orbit, sunshield, launch, and operator paragraphs did not. You sent ~64% fewer input tokens to your downstream model without losing the answer.
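To make the score-keep-drop idea concrete, here is a minimal toy sketch. It is not how latte_v1 works internally: the keyword-overlap scorer, the threshold, the whitespace-style tokenizer, the sample text, and every function name below are assumptions made for illustration only.

```python
import re

# Hypothetical stopword list so that words like "the" do not count as matches.
STOPWORDS = {"a", "an", "is", "of", "the", "what"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real model tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(paragraph: str, query: str) -> float:
    """Fraction of non-stopword query terms that appear in the paragraph."""
    terms = set(tokenize(query)) - STOPWORDS
    if not terms:
        return 0.0
    return len(terms & set(tokenize(paragraph))) / len(terms)


def toy_compress(context: str, query: str, threshold: float = 0.5) -> str:
    """Keep paragraphs whose score meets the threshold; drop the rest."""
    paragraphs = [p.strip() for p in context.split("\n\n") if p.strip()]
    return "\n\n".join(p for p in paragraphs if score(p, query) >= threshold)


def savings(original: str, compressed: str) -> float:
    """Share of tokens removed, measured with the crude tokenizer above."""
    before = len(tokenize(original))
    after = len(tokenize(compressed))
    return 0.0 if before == 0 else 1 - after / before


# Illustrative sample text, not the example used on this page.
context = (
    "The primary mirror has a diameter of 6.5 meters.\n\n"
    "The telescope orbits near the Sun-Earth L2 point.\n\n"
    "A five-layer sunshield keeps the instruments cold."
)
query = "What is the diameter of the primary mirror?"

compressed = toy_compress(context, query)
print(compressed)
print(f"token savings: {savings(context, compressed):.0%}")
```

Only the paragraph that shares terms with the query survives, and the savings figure is the fraction of tokens that were never sent downstream. A production compressor would presumably use a learned relevance model rather than keyword overlap.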
Want a cleaner output without the [N tokens dropped] placeholders? Pass disable_placeholders=True. For finer-grained, sentence-level cuts inside a paragraph, pass coarse=False. See the models reference for the full parameter list.
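The effect of these two options can be pictured with a small toy filter. Only the parameter names disable_placeholders and coarse and the [N tokens dropped] placeholder text come from this page; the matching rule, the sentence splitter, where placeholders are placed, and the function names are hypothetical and may differ from what the service actually does.

```python
import re

# Hypothetical stopword list so that common words do not count as matches.
STOPWORDS = {"a", "an", "is", "of", "the", "what"}


def words(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevant(unit: str, query: str) -> bool:
    """True if the unit shares any non-stopword term with the query."""
    terms = set(words(query)) - STOPWORDS
    return bool(terms & set(words(unit)))


def split_units(context: str, coarse: bool) -> list[str]:
    """Whole paragraphs when coarse, otherwise individual sentences."""
    paragraphs = [p.strip() for p in context.split("\n\n") if p.strip()]
    if coarse:
        return paragraphs
    sentences = []
    for p in paragraphs:
        # Split after sentence-ending punctuation that is followed by whitespace.
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", p) if s)
    return sentences


def toy_compress(context, query, disable_placeholders=False, coarse=True):
    """Drop irrelevant units, optionally marking each gap with a placeholder."""
    out, dropped = [], 0
    for unit in split_units(context, coarse):
        if relevant(unit, query):
            if dropped and not disable_placeholders:
                out.append(f"[{dropped} tokens dropped]")
            dropped = 0
            out.append(unit)
        else:
            dropped += len(words(unit))
    if dropped and not disable_placeholders:
        out.append(f"[{dropped} tokens dropped]")
    return "\n".join(out)


# Illustrative sample text, not the example used on this page.
context = (
    "The telescope orbits near L2. Its primary mirror is 6.5 meters wide.\n\n"
    "A sunshield keeps the instruments cold."
)
query = "What is the diameter of the primary mirror?"

print(toy_compress(context, query))                             # paragraph-level, with placeholders
print(toy_compress(context, query, disable_placeholders=True))  # paragraph-level, clean output
print(toy_compress(context, query, coarse=False))               # sentence-level cuts
```

In this sketch the paragraph-level pass keeps the whole first paragraph, including the unrelated orbit sentence, while the sentence-level pass trims that sentence as well.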
Next steps
- Python SDK - full method reference, async variants, streaming, batching.
- TypeScript SDK - same surface, camelCase params.
- cURL / HTTP - raw REST reference.
- Models - tune target_compression_ratio and other latte-only options.
- Agent client - drop-in for anthropic.Anthropic() / openai.OpenAI() with automatic tool-output compression.
- Web search - add Tavily or Brave to your agent loop in one line.
- LangChain integration: first-party middleware for tool outputs, history, and outbound prompts, plus a BaseDocumentCompressor for RAG.
- LangGraph integration: state-graph node, lossy checkpoint serializer, store wrapper, and multi-agent handoff tool.
- LlamaIndex integration: query-engine postprocessor, tool wrapper, and Memory API block.
- LiteLLM integration: drop the compresr guardrail into the proxy and compress tool messages across every provider transparently.