Compresr docs

LangChain

First-party Compresr middleware, document compressor, and tool wrappers for LangChain 1.0+ agents and retrievers.

The Compresr SDK ships first-party LangChain integrations under compresr.integrations.langchain (Python) and @compresr/sdk/integrations/langchain (TypeScript). They cover the four places LangChain users typically burn tokens: tool outputs in an agent loop, long chat history, the final outbound prompt, and retrieved documents in a ContextualCompressionRetriever. Every entry point calls Compresr under the hood — same auth, same latte_v1 model, same parameter semantics as a direct SDK call.

Python is class-based, TypeScript is factory-based

In Python the three middlewares are classes (CompresrToolMiddleware(...)); in TypeScript they are factory functions (compresrToolMiddleware({...})). The document compressor CompresrExtractor is a class in both. Snake_case kwargs in Python; camelCase options in TypeScript.

1. Install

LangChain 1.0+ is required for the create_agent / createAgent middleware mechanism used by CompresrToolMiddleware, CompresrSummarizationMiddleware, and CompresrPromptMiddleware. The document compressor (CompresrExtractor) works on any version that ships BaseDocumentCompressor.

2. Compress tool outputs in an agent

CompresrToolMiddleware plugs into create_agent. Each time a tool returns, the middleware compresses the result against the user's question before it lands in agent state — so the LLM's next reasoning step sees a shorter ToolMessage.

Options

All keyword-only (Python) / options-object fields (TypeScript). Defaults shown:

| Python | TypeScript | Default | Notes |
| --- | --- | --- | --- |
| api_key | apiKey | | Required unless client is passed. |
| client | client | | Pre-built CompressionClient; bypasses api_key/base_url. |
| base_url | baseUrl | https://api.compresr.ai | Override for self-hosted. |
| compression_model | compressionModel | "latte_v1" | Only latte_v1 is public. |
| target_compression_ratio | targetCompressionRatio | 0.5 | 0 < r ≤ 1 removes that fraction; r > 1 is Nx target. |
| min_tokens | minTokens | 200 | Skip tool outputs shorter than this. |
| coarse | coarse | server default | Paragraph-level vs token-level. |
| allow_tools | allowTools | None | Whitelist of tool names. Pass allow_tools OR ignore_tools, not both. |
| ignore_tools | ignoreTools | None | Blacklist. |
| query | query | | Static query overriding everything else. |
| query_extractor | queryExtractor | | (tool_call, messages) -> str; fully custom resolution. |
| query_arg | queryArg | | Pull the query directly from a named tool arg. |
| on_error | onError | "passthrough" | "raise" to fail fast in tests. |
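
To make the option semantics concrete, here is a minimal pure-Python sketch of how the notes above can be read. It does not call the SDK. The helper names (target_token_count, should_compress, resolve_query) are hypothetical, and the fallback order after a static query (query_arg, then the user's question) is an assumption rather than documented behavior.

```python
from typing import Optional, Sequence


def target_token_count(n_tokens: int, ratio: float = 0.5) -> int:
    """Approximate output size implied by target_compression_ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be > 0")
    if ratio <= 1:
        # Fractional form: remove `ratio` of the tokens.
        return round(n_tokens * (1 - ratio))
    # Multiplier form: aim for an output `ratio` times smaller.
    return round(n_tokens / ratio)


def should_compress(
    tool_name: str,
    n_tokens: int,
    *,
    min_tokens: int = 200,
    allow_tools: Optional[Sequence[str]] = None,
    ignore_tools: Optional[Sequence[str]] = None,
) -> bool:
    """Decide whether a tool output is eligible for compression."""
    if allow_tools is not None and ignore_tools is not None:
        raise ValueError("pass allow_tools OR ignore_tools, not both")
    if n_tokens < min_tokens:
        return False  # short outputs are skipped
    if allow_tools is not None:
        return tool_name in allow_tools
    if ignore_tools is not None:
        return tool_name not in ignore_tools
    return True


def resolve_query(
    tool_args: dict,
    user_question: str,
    *,
    query: Optional[str] = None,
    query_arg: Optional[str] = None,
) -> str:
    """Pick the query to compress against (fallback order is assumed)."""
    if query is not None:
        return query  # a static query overrides everything else
    if query_arg is not None and query_arg in tool_args:
        return str(tool_args[query_arg])
    return user_question


print(target_token_count(1000, 0.5))  # 500: half of the tokens removed
print(target_token_count(1000, 4))    # 250: 4x smaller
```

Under this reading, the default ratio of 0.5 and a ratio of 2 both aim for half the original size; the two forms are just different ways to state the same target.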

3. Replace old history with a compressed summary

CompresrSummarizationMiddleware is a KV-cache-friendly alternative to LangChain's SummarizationMiddleware. When the conversation crosses a token threshold, it compresses everything older than the last N messages into a single [Earlier conversation summary] ... HumanMessage and leaves recent messages untouched. No LLM round-trip — token-level compression.

The middleware detects its own prior summary via the [Earlier conversation summary] prefix and refuses to re-summarize an already-summarized prefix, so it's safe to run on every turn.

Aliases trigger / keep are accepted in place of max_tokens_before_summary / messages_to_keep to match LangChain's own SummarizationMiddleware naming. Additional knobs: token_counter (defaults to the SDK's estimate_tokens — a char/4 heuristic that upgrades to tiktoken cl100k_base when available), query, query_extractor.
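
The threshold, keep-window, and summary-prefix behavior described in this section can be sketched in plain Python. This is an illustration under stated assumptions, not the middleware's implementation: messages are modeled as (role, text) tuples, fake_compress is a hypothetical placeholder for the Compresr call, the token counter is only the char/4 heuristic, and keeping a prior summary verbatim while compressing only newer messages is one possible reading of "refuses to re-summarize".

```python
SUMMARY_PREFIX = "[Earlier conversation summary]"


def estimate_tokens(text: str) -> int:
    """Char/4 heuristic, standing in for the SDK's token counter."""
    return max(1, len(text) // 4)


def fake_compress(text: str) -> str:
    """Hypothetical placeholder for the Compresr call: keeps the first half."""
    return text[: len(text) // 2]


def summarize_history(messages, max_tokens_before_summary=1000, messages_to_keep=4):
    """Return a new message list with older history folded into one summary."""
    total = sum(estimate_tokens(text) for _, text in messages)
    if total <= max_tokens_before_summary or len(messages) <= messages_to_keep:
        return list(messages)  # under the threshold: nothing to do

    split = len(messages) - messages_to_keep
    older, recent = messages[:split], messages[split:]

    # Keep any prior summary verbatim; only compress what came after it.
    prior = [text for _, text in older if text.startswith(SUMMARY_PREFIX)]
    fresh = [text for _, text in older if not text.startswith(SUMMARY_PREFIX)]
    if not fresh:
        return list(messages)  # the old portion is already a summary

    body = fake_compress("\n".join(fresh))
    if prior:
        summary = prior[0] + "\n" + body
    else:
        summary = SUMMARY_PREFIX + " " + body
    return [("human", summary)] + list(recent)


history = [("human", "a" * 400), ("ai", "b" * 400), ("human", "c" * 40), ("ai", "d" * 40)]
once = summarize_history(history, max_tokens_before_summary=100, messages_to_keep=2)
twice = summarize_history(once, max_tokens_before_summary=100, messages_to_keep=2)
print(len(once), once == twice)  # 3 True
```

The second call is a no-op because the only message older than the keep window already carries the summary prefix, which is what makes running on every turn safe in this sketch.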

4. Cap the outbound prompt with CompresrPromptMiddleware

The last-mile budget cap. CompresrPromptMiddleware runs in wrap_model_call and walks the messages largest-first, compressing just enough to fit under max_tokens. It mutates the model request, not agent state — the next turn still sees the original messages.
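
A rough sketch of the largest-first walk, assuming messages are plain strings and using a hypothetical shrink_to function that truncates in place of a real Compresr call. The function names and the char/4 token estimate are illustrative only. The point is the shape of the loop and the fact that it works on a copy.

```python
def estimate_tokens(text: str) -> int:
    """Char/4 heuristic used only for this illustration."""
    return max(1, len(text) // 4)


def shrink_to(text: str, target_tokens: int) -> str:
    """Hypothetical stand-in for compression: truncates to the target size."""
    return text[: max(1, target_tokens) * 4]


def cap_prompt(messages, max_tokens):
    """Return a copy of `messages` shrunk to fit within max_tokens."""
    request = list(messages)  # copy, so the caller's list (agent state) is untouched
    sizes = [estimate_tokens(m) for m in request]
    excess = sum(sizes) - max_tokens

    # Visit the largest messages first and stop as soon as the budget is met.
    for i in sorted(range(len(request)), key=lambda j: sizes[j], reverse=True):
        if excess <= 0:
            break
        target = max(1, sizes[i] - excess)  # compress just enough
        request[i] = shrink_to(request[i], target)
        new_size = estimate_tokens(request[i])
        excess -= sizes[i] - new_size
        sizes[i] = new_size
    return request


state = ["x" * 4000, "y" * 400, "z" * 40]  # about 1000, 100, and 10 tokens
capped = cap_prompt(state, max_tokens=500)
print(sum(estimate_tokens(m) for m in capped))  # 500
print(len(state[0]))                            # 4000: original is unchanged
```

Only the largest message is touched here, because shrinking it alone is enough to meet the budget.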

Compose all three middlewares in the same create_agent/createAgent call; they cover orthogonal token sources.

5. Compress retrieved documents

CompresrExtractor is a BaseDocumentCompressor — drop-in replacement for LLMChainExtractor inside a ContextualCompressionRetriever. Batches all eligible documents into a single Compresr call (up to 100 per batch).

The extractor sets metadata["compresr"] = True on every document it touched and leaves documents below min_tokens unchanged (or filters them out if drop_below_min=True).
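
The batching and metadata behavior can be illustrated with a small pure-Python sketch. Documents are modeled as plain dicts rather than LangChain Document objects, fake_compress_batch is a hypothetical stand-in for one batched Compresr call, and the min_tokens default of 200 is borrowed from the tool middleware table because this section does not state the extractor's own default.

```python
def estimate_tokens(text: str) -> int:
    """Char/4 heuristic used only for this illustration."""
    return max(1, len(text) // 4)


def fake_compress_batch(texts, query):
    """Hypothetical stand-in for one batched Compresr call."""
    return [t[: len(t) // 2] for t in texts]


def compress_documents(docs, query, min_tokens=200, drop_below_min=False, batch_size=100):
    """docs: list of {"page_content": str, "metadata": dict}. Returns a new list."""
    eligible = [
        i for i, d in enumerate(docs)
        if estimate_tokens(d["page_content"]) >= min_tokens
    ]

    compressed = {}
    # Send eligible documents in chunks of at most `batch_size`.
    for start in range(0, len(eligible), batch_size):
        chunk = eligible[start:start + batch_size]
        outputs = fake_compress_batch([docs[i]["page_content"] for i in chunk], query)
        compressed.update(zip(chunk, outputs))

    result = []
    for i, d in enumerate(docs):
        if i in compressed:
            # Mark every document that was actually compressed.
            meta = {**d["metadata"], "compresr": True}
            result.append({"page_content": compressed[i], "metadata": meta})
        elif not drop_below_min:
            result.append(d)  # short documents pass through unchanged
    return result


docs = [
    {"page_content": "a" * 2000, "metadata": {"source": "x"}},
    {"page_content": "short", "metadata": {}},
]
print(len(compress_documents(docs, "q")))                       # 2
print(len(compress_documents(docs, "q", drop_below_min=True)))  # 1
```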

6. Wrap a single tool

If you only need compression on one tool — without an agent middleware — wrap it directly. wrap_tool_with_compression / wrapToolWithCompression returns a new StructuredTool preserving name, description, args_schema, return_direct, and error handlers.

Python raises TypeError if the input isn't a StructuredTool — wrap a raw function with @tool first.
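
As a plain-function analogy for what such a wrapper does, the sketch below wraps a callable so that its output is compressed, preserves the function's name and docstring, and applies the on_error choice from the options table. It does not use StructuredTool or the SDK; with_compression and fake_compress are hypothetical names for illustration only.

```python
import functools


def fake_compress(text, query):
    """Hypothetical stand-in for a Compresr call: keeps the first half."""
    return text[: len(text) // 2]


def with_compression(func, *, compress=fake_compress, query="", on_error="passthrough"):
    """Wrap a plain function so that its string output is compressed."""

    @functools.wraps(func)  # preserve __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        raw = func(*args, **kwargs)
        try:
            return compress(raw, query)
        except Exception:
            if on_error == "raise":
                raise  # fail fast, e.g. in tests
            return raw  # passthrough: hand back the uncompressed output

    return wrapper


def search(q):
    """Search the docs."""
    return "r" * 100


wrapped = with_compression(search, query="pricing")
print(wrapped.__name__, len(wrapped("pricing")))  # search 50
```

With the default on_error="passthrough", a failure in the compressor leaves the tool usable and simply returns the original output.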

If you own the tool's source, there is also a decorator form, compress_tool_output / compressToolOutput, that stacks on top of @tool.

When this helps

  • Agent loops calling verbose tools (search, scrape, RAG fetch): CompresrToolMiddleware compresses each tool result once, against the user's question, before the LLM ever sees it.
  • Long-running chat agents: CompresrSummarizationMiddleware keeps the prompt bounded without forcing a slow LLM summary call.
  • Bounded model spend per call: CompresrPromptMiddleware enforces an absolute outbound budget regardless of which middleware fired earlier.
  • High-recall retrieval: top_k=20+ plus CompresrExtractor keeps the retrieved payload tight without dropping documents entirely.

Related pages

  • LangGraph: the same middlewares applied inside StateGraph, plus node-level helpers, a lossy checkpoint serializer, a lossy store wrapper, and multi-agent handoff.
  • Models: latte_v1 parameter semantics (target_compression_ratio, coarse, and friends).
  • RAG guide: the underlying retrieve → compress → answer pipeline.