LangChain

First-party Compresr middleware, document compressor, and tool wrappers for LangChain 1.0+ agents and retrievers.

The Compresr SDK ships first-party LangChain integrations under compresr.integrations.langchain (Python) and @compresr/sdk/integrations/langchain (TypeScript). They cover the four places LangChain users typically burn tokens: tool outputs in an agent loop, long chat history, the final outbound prompt, and retrieved documents in a ContextualCompressionRetriever. Every entry point calls Compresr under the hood: same auth, same latte_v1/latte_v2 models, same parameter semantics as a direct SDK call.

Python is class-based, TypeScript is factory-based

In Python the three middlewares are classes (CompresrToolMiddleware(...)); in TypeScript they are factory functions (compresrToolMiddleware({...})). The document compressor CompresrExtractor is a class in both. Snake_case kwargs in Python; camelCase options in TypeScript.

1. Install


LangChain 1.0+ is required for the create_agent / createAgent middleware mechanism used by CompresrToolMiddleware, CompresrSummarizationMiddleware, and CompresrPromptMiddleware. The document compressor (CompresrExtractor) works on any version that ships BaseDocumentCompressor.

Extras are no-ops since 2.6.0

LangChain deps ship in the base install. pip install compresr[langchain] still resolves for back-compat but adds nothing.

2. Compress tool outputs in an agent

CompresrToolMiddleware plugs into create_agent. Each time a tool returns, the middleware compresses the result against the user's question before it lands in agent state, so the LLM's next reasoning step sees a shorter ToolMessage.

Middleware default is latte_v1

All three middlewares default to compression_model="latte_v1" (query-aware, requires a query). Pass compression_model="latte_v2" to opt into the newer backbone. api_key= can be omitted when COMPRESR_API_KEY is set in the environment or you have run compresr-sdk login (INI file at ~/.compresr/credentials).
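
A minimal sketch of wiring the tool middleware into an agent, using only the option names and defaults listed in this guide. It assumes `create_agent` is imported from `langchain.agents`, that credentials come from `COMPRESR_API_KEY` or `compresr-sdk login`, and that `model` and `tools` are whatever you already pass to `create_agent`. The helper name `build_agent` is made up for this example.

```python
# Keyword arguments for CompresrToolMiddleware, taken from the options table
# in this guide. api_key is omitted on the assumption that COMPRESR_API_KEY is
# set or `compresr-sdk login` has been run.
TOOL_MIDDLEWARE_KWARGS = {
    "compression_model": "latte_v1",   # documented default
    "target_compression_ratio": 0.5,   # documented default
    "min_tokens": 200,                 # skip short tool outputs
    "on_error": "passthrough",         # documented default
}


def build_agent(model, tools):
    """Return an agent whose tool outputs pass through Compresr.

    Imports are deferred so this module loads even when langchain or
    compresr is not installed; calling the function requires both.
    """
    from langchain.agents import create_agent
    from compresr.integrations.langchain import CompresrToolMiddleware

    return create_agent(
        model=model,
        tools=tools,
        middleware=[CompresrToolMiddleware(**TOOL_MIDDLEWARE_KWARGS)],
    )
```

Swapping `"latte_v1"` for `"latte_v2"` in the dictionary is the only change needed to try the newer backbone.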


CompresrToolMiddleware options

All options are keyword-only in Python and fields of the options object in TypeScript. Defaults are shown where one exists. allow_tools, ignore_tools, and query_arg are specific to CompresrToolMiddleware; the summarization and prompt middlewares accept a different set (see below).

Python	TypeScript	Default	Notes
`api_key`	`apiKey`	n/a	Required unless `client` is passed or credentials are available from `COMPRESR_API_KEY` or `compresr-sdk login`.
`client`	`client`	n/a	Pre-built `CompressionClient`: bypasses `api_key`/`base_url`.
`base_url`	`baseUrl`	`https://api.compresr.ai`	Override for self-hosted.
`compression_model`	`compressionModel`	`"latte_v1"`	The Compresr backbone routing the compression call. Defaults to `latte_v1`; pass `"latte_v2"` to opt into the newer backbone.
`target_compression_ratio`	`targetCompressionRatio`	`0.5`	`0 < r ≤ 1` removes that fraction of the tokens; `r > 1` is treated as an N× compression target.
`min_tokens`	`minTokens`	`200`	Skip tool outputs shorter than this.
`coarse`	`coarse`	server default	Paragraph-level vs token-level.
`allow_tools`	`allowTools`	`None`	Whitelist of tool names. Pass `allow_tools` OR `ignore_tools`, not both.
`ignore_tools`	`ignoreTools`	`None`	Blacklist of tool names whose outputs are left uncompressed.
`query`	`query`	n/a	Static query overriding everything else.
`query_extractor`	`queryExtractor`	n/a	`(tool_call, messages) -> str`, fully custom resolution.
`query_arg`	`queryArg`	n/a	Pull the query directly from a named tool arg.
`on_error`	`onError`	`"passthrough"`	`"raise"` to fail fast in tests. The type is exported as `ErrorPolicy` for typed configs.
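
The gating options in the table can be pictured with a short pure-Python sketch. It is an illustration of the documented semantics, not the SDK's code. The char/4 token estimate stands in for the SDK counter, raising `ValueError` when both lists are passed is an assumption, and reading an N× target as "keep roughly 1/N of the tokens" is also an assumption.

```python
from typing import Collection, Optional


def estimate_tokens(text: str) -> int:
    """Rough char/4 token estimate (stand-in for the SDK's counter)."""
    return len(text) // 4


def should_compress(
    tool_name: str,
    output: str,
    *,
    min_tokens: int = 200,
    allow_tools: Optional[Collection[str]] = None,
    ignore_tools: Optional[Collection[str]] = None,
) -> bool:
    """Decide whether a tool output is a candidate for compression."""
    if allow_tools is not None and ignore_tools is not None:
        # Assumption: the real middleware may report this differently.
        raise ValueError("pass allow_tools OR ignore_tools, not both")
    if allow_tools is not None and tool_name not in allow_tools:
        return False
    if ignore_tools is not None and tool_name in ignore_tools:
        return False
    # Outputs shorter than min_tokens are skipped.
    return estimate_tokens(output) >= min_tokens


def target_tokens(n_tokens: int, ratio: float) -> int:
    """Tokens expected to remain for a given target_compression_ratio.

    0 < ratio <= 1 removes that fraction; ratio > 1 is read here as an
    N-times target, i.e. keep about n_tokens / ratio (assumed reading).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio <= 1:
        return round(n_tokens * (1 - ratio))
    return round(n_tokens / ratio)
```

Under these assumptions a 1,000-token output with the default ratio of 0.5 comes back at about 500 tokens, and the same output with a ratio of 4 comes back at about 250.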

3. Replace old history with a compressed summary

CompresrSummarizationMiddleware is a KV-cache-friendly alternative to LangChain's SummarizationMiddleware. When the conversation crosses a token threshold, it compresses everything older than the last N messages into a single HumanMessage prefixed with [Earlier conversation summary] and leaves recent messages untouched. The summary comes from token-level compression, so there is no LLM round-trip.


The middleware detects its own prior summary via the [Earlier conversation summary] prefix and refuses to re-summarize an already-summarized prefix, so it's safe to run on every turn.

Aliases trigger / keep are accepted in place of max_tokens_before_summary / messages_to_keep to match LangChain's own SummarizationMiddleware naming.
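
The trigger-and-keep behaviour can be sketched in plain Python. This is a model of the description above, not the middleware's implementation. Messages are `(role, content)` tuples standing in for LangChain messages, `compress` is any text-to-text callable standing in for the Compresr call, and carrying the prior summary forward verbatim is one assumed reading of "refuses to re-summarize".

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); stand-in for LangChain messages
SUMMARY_PREFIX = "[Earlier conversation summary]"


def count_tokens(messages: List[Message]) -> int:
    """char/4 estimate over all message contents."""
    return sum(len(content) // 4 for _, content in messages)


def summarize_history(
    messages: List[Message],
    compress: Callable[[str], str],
    *,
    max_tokens_before_summary: int = 4000,
    messages_to_keep: int = 20,
) -> List[Message]:
    """Fold older messages into one summary message once over the threshold."""
    if count_tokens(messages) <= max_tokens_before_summary:
        return list(messages)

    split = max(len(messages) - messages_to_keep, 0)
    older, recent = list(messages[:split]), list(messages[split:])

    # A prior summary is recognised by its prefix and kept as-is
    # (assumed reading: only not-yet-summarized messages are compressed).
    prior = ""
    if older and older[0][1].startswith(SUMMARY_PREFIX):
        prior = older[0][1][len(SUMMARY_PREFIX):].strip()
        older = older[1:]
    if not older:
        return list(messages)  # nothing new to fold in

    fresh = compress("\n".join(f"{role}: {content}" for role, content in older))
    body = f"{prior} {fresh}".strip()
    return [("human", f"{SUMMARY_PREFIX} {body}")] + recent
```

Running the function again on its own output returns the list unchanged when the only older message is the existing summary, which is the property that makes per-turn execution safe.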

CompresrSummarizationMiddleware options

Python	TypeScript	Default	Notes
`max_tokens_before_summary` (alias `trigger`)	`maxTokensBeforeSummary` (alias `trigger`)	`4000`	Token threshold that triggers a summary.
`messages_to_keep` (alias `keep`)	`messagesToKeep` (alias `keep`)	`20`	Recent messages preserved verbatim.
`target_compression_ratio`	`targetCompressionRatio`	`0.5`	Applied to the older-messages block.
`token_counter`	`tokenCounter`	SDK `estimate_tokens`	char/4 heuristic, upgrades to tiktoken `cl100k_base` when available.
`query`	`query`	n/a	Static query.
`query_extractor`	`queryExtractor`	n/a	`(messages) -> str`, custom resolution.
`client`	`client`	n/a	Pre-built `CompressionClient`.
`api_key`	`apiKey`	n/a	Required unless `client` is passed or credentials are available from `COMPRESR_API_KEY` or `compresr-sdk login`.
`base_url`	`baseUrl`	`https://api.compresr.ai`	Self-hosted override.
`compression_model`	`compressionModel`	`"latte_v1"`	Pass `"latte_v2"` to opt into the newer backbone.
`coarse`	`coarse`	server default	Paragraph vs token level.
`on_error`	`onError`	`"passthrough"`	Exported as `ErrorPolicy`.
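
The default token counter described in the table (char/4, upgrading to tiktoken `cl100k_base` when available) might look roughly like the sketch below. The function names are made up for this example, rounding down is an assumption, and the exact signature expected by `token_counter` should be checked against the SDK.

```python
from typing import Callable


def heuristic_tokens(text: str) -> int:
    """char/4 estimate; rounding down is an assumption of this sketch."""
    return len(text) // 4


def make_token_counter() -> Callable[[str], int]:
    """Return a tiktoken-backed counter when possible, else the heuristic."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding("cl100k_base")
    except Exception:
        # tiktoken missing, or its encoding files could not be loaded.
        return heuristic_tokens
    return lambda text: len(encoding.encode(text))
```

A callable of this shape could be supplied as `token_counter=` if you need counts that match a specific model's tokenizer, assuming the option accepts a text-to-int function.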

4. Cap the outbound prompt with CompresrPromptMiddleware

The last-mile budget cap. CompresrPromptMiddleware runs in wrap_model_call and walks the messages largest-first, compressing just enough to fit under max_tokens. It mutates the model request, not agent state; the next turn still sees the original messages.
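
The largest-first walk can be illustrated with a small pure-Python sketch. It models the description above and is not the middleware's code. Messages are plain strings, `compress` is a stand-in for the Compresr call, and token counts use the char/4 estimate.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Rough char/4 token estimate (stand-in for the SDK's counter)."""
    return len(text) // 4


def fit_prompt(
    messages: List[str],
    compress: Callable[[str], str],
    *,
    max_tokens: int,
    min_tokens: int = 200,
) -> List[str]:
    """Compress the largest messages first until the total fits max_tokens."""
    out = list(messages)  # work on a copy; the caller's list is untouched
    total = sum(estimate_tokens(m) for m in out)
    # Indices ordered from largest message to smallest.
    order = sorted(range(len(out)), key=lambda i: estimate_tokens(out[i]), reverse=True)
    for i in order:
        if total <= max_tokens:
            break  # already under budget: compress no more than needed
        before = estimate_tokens(out[i])
        if before < min_tokens:
            break  # every remaining message is smaller still
        out[i] = compress(out[i])
        total += estimate_tokens(out[i]) - before
    return out
```

Because the function returns a new list, the original messages stay intact, mirroring how the middleware changes the model request but not agent state. In this sketch the budget can still be exceeded when every remaining message is below `min_tokens`.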


CompresrPromptMiddleware options

Python	TypeScript	Default	Notes
`max_tokens`	`maxTokens`	n/a	Required. Hard ceiling for the outbound prompt.
`min_tokens`	`minTokens`	`200`	Don't touch messages smaller than this.
`target_compression_ratio`	`targetCompressionRatio`	`0.5`	Per-message removal strength.
`token_counter`	`tokenCounter`	SDK `estimate_tokens`	Same char/4 heuristic with tiktoken upgrade.
`coarse`	`coarse`	server default	Paragraph vs token level.
`query`	`query`	n/a	Static query.
`query_extractor`	`queryExtractor`	n/a	`(messages) -> str`, custom resolution.
`client`	`client`	n/a	Pre-built `CompressionClient`.
`api_key`	`apiKey`	n/a	Required unless `client` is passed or credentials are available from `COMPRESR_API_KEY` or `compresr-sdk login`.
`base_url`	`baseUrl`	`https://api.compresr.ai`	Self-hosted override.
`compression_model`	`compressionModel`	`"latte_v1"`	Pass `"latte_v2"` to opt into the newer backbone.
`on_error`	`onError`	`"passthrough"`	Exported as `ErrorPolicy`.

Compose all three middlewares in the same create_agent/createAgent call; they cover orthogonal token sources.
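
A hedged sketch of that composition, using only option names from the tables in this guide. The 8000-token budget, the helper name `build_compressed_agent`, and the order of the middleware list are illustrative assumptions. Credentials are assumed to come from `COMPRESR_API_KEY` or `compresr-sdk login`.

```python
# Option values: the first two dictionaries use documented defaults, the
# prompt budget is a hypothetical figure chosen for this example.
TOOL_KWARGS = {"min_tokens": 200}
SUMMARY_KWARGS = {"max_tokens_before_summary": 4000, "messages_to_keep": 20}
PROMPT_KWARGS = {"max_tokens": 8000}


def build_compressed_agent(model, tools):
    """Create an agent with all three Compresr middlewares attached.

    Imports are deferred so this module loads without langchain or
    compresr installed; calling the function requires both.
    """
    from langchain.agents import create_agent
    from compresr.integrations.langchain import (
        CompresrPromptMiddleware,
        CompresrSummarizationMiddleware,
        CompresrToolMiddleware,
    )

    return create_agent(
        model=model,
        tools=tools,
        middleware=[
            CompresrToolMiddleware(**TOOL_KWARGS),              # tool outputs
            CompresrSummarizationMiddleware(**SUMMARY_KWARGS),  # old history
            CompresrPromptMiddleware(**PROMPT_KWARGS),          # final prompt cap
        ],
    )
```

Keeping the prompt budget above the summarization threshold means the summary step normally does the work and the prompt cap acts only as a backstop.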


5. Compress retrieved documents

CompresrExtractor is a BaseDocumentCompressor: a drop-in replacement for LLMChainExtractor inside a ContextualCompressionRetriever. It batches all eligible documents into a single Compresr call (up to 100 per batch).


The extractor sets metadata["compresr"] = True on every document it touches and leaves documents below min_tokens unchanged (or filters them out if drop_below_min=True).
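
The batching and metadata rules can be modelled in a few lines of plain Python. This is an illustration, not CompresrExtractor itself. `Doc` stands in for LangChain's `Document`, `compress_batch` stands in for one Compresr call over a list of texts, and eligibility uses a char/4 token estimate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


MAX_BATCH = 100  # documented per-batch ceiling


def compress_documents(
    docs: List[Doc],
    compress_batch: Callable[[List[str]], List[str]],
    *,
    min_tokens: int,
    drop_below_min: bool = False,
) -> List[Doc]:
    """Compress eligible documents in batches and tag the ones touched."""
    eligible = [d for d in docs if len(d.page_content) // 4 >= min_tokens]
    compressed: Dict[int, Doc] = {}
    for start in range(0, len(eligible), MAX_BATCH):
        batch = eligible[start:start + MAX_BATCH]
        texts = compress_batch([d.page_content for d in batch])
        for doc, text in zip(batch, texts):
            compressed[id(doc)] = Doc(text, {**doc.metadata, "compresr": True})

    result: List[Doc] = []
    for doc in docs:
        if id(doc) in compressed:
            result.append(compressed[id(doc)])
        elif not drop_below_min:
            result.append(doc)  # short document passes through unchanged
    return result
```

Downstream code can check `doc.metadata.get("compresr")` to tell compressed documents from ones that passed through unchanged.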

6. Wrap a single tool

If you only need compression on one tool, without an agent middleware, wrap it directly. wrap_tool_with_compression / wrapToolWithCompression returns a new StructuredTool preserving name, description, args_schema, return_direct, and error handlers.


Python raises TypeError if the input isn't a StructuredTool; wrap a raw function with @tool first. TypeScript is more permissive: wrapToolWithCompression accepts anything exposing name plus func, _call, or invoke.
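
The wrapping contract (a new tool, same identity, compressed output, `TypeError` for the wrong input) can be shown with a self-contained model. Everything here is a stand-in. `SimpleTool` is a hypothetical substitute for `StructuredTool`, `wrap_with_compression` is not the SDK function, and the `min_tokens` gate is assumed to behave as it does in the middleware.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class SimpleTool:
    """Hypothetical stand-in for LangChain's StructuredTool."""
    name: str
    description: str
    func: Callable[..., str]


def wrap_with_compression(
    tool: SimpleTool,
    compress: Callable[[str], str],
    *,
    min_tokens: int,
) -> SimpleTool:
    """Return a copy of `tool` whose output is compressed when long enough."""
    if not isinstance(tool, SimpleTool):
        raise TypeError("expected a SimpleTool; wrap raw functions first")

    def compressed_func(*args, **kwargs) -> str:
        result = tool.func(*args, **kwargs)
        if len(result) // 4 < min_tokens:  # char/4 estimate
            return result                   # short output passes through
        return compress(result)

    # replace() copies name and description and swaps only the callable.
    return replace(tool, func=compressed_func)
```

The original tool is left untouched, so the wrapped and unwrapped versions can be registered side by side while comparing answer quality.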

If you own the tool's source, there is also a decorator form, compress_tool_output / compressToolOutput, that stacks on top of @tool.


When this helps

Agent loops calling verbose tools: search, scrape, RAG fetch. CompresrToolMiddleware compresses each tool result once, against the user's question, before the LLM ever sees it.
Long-running chat agents: CompresrSummarizationMiddleware keeps the prompt bounded without forcing a slow LLM summary call.
Bounded model spend per call: CompresrPromptMiddleware enforces an absolute outbound budget regardless of which middleware fired earlier.
High-recall retrieval: top_k=20+ plus CompresrExtractor keeps the retrieved payload tight without dropping documents entirely.

LangGraph: same middlewares applied inside StateGraph, plus node-level helpers, lossy checkpoint serializer, lossy store wrapper, and multi-agent handoff.
Models: latte_v2 parameter semantics (target_compression_ratio, coarse, and friends).
RAG guide: the underlying retrieve → compress → answer pipeline.