TypeScript SDK
Compress prompts and retrieved context with the official TypeScript client.
@compresr/sdk is the official TypeScript client. It is fully typed, isomorphic (Node 18+, edge runtimes, modern browsers via a server proxy), and async by default. Every method returns a Promise or an AsyncGenerator.
Naming convention
Inputs use camelCase (compressionModelName, targetCompressionRatio, heuristicChunking, disablePlaceholders); the SDK serializes them to snake_case on the wire. Response fields remain snake_case — they match the API exactly and are stable across SDKs.
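The mapping from input names to wire names can be illustrated with a small converter. This is a minimal sketch of the convention only, not the SDK's internal serializer; toSnakeCase and serializeInputs are hypothetical helper names, and dropping undefined values is an assumption.

```typescript
// Hypothetical helpers, not part of @compresr/sdk. They only illustrate the
// camelCase -> snake_case convention described above.
function toSnakeCase(key: string): string {
  // Prefix every uppercase letter with "_" and lowercase it.
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

function serializeInputs(inputs: Record<string, unknown>): Record<string, unknown> {
  const wire: Record<string, unknown> = {};
  for (const key of Object.keys(inputs)) {
    const value = inputs[key];
    // Assumption: unset optional inputs are omitted from the request body.
    if (value !== undefined) wire[toSnakeCase(key)] = value;
  }
  return wire;
}
```

With this sketch, compressionModelName becomes compression_model_name and targetCompressionRatio becomes target_compression_ratio, matching the field names used in the parameter tables.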
1. Install
@compresr/sdk works with npm, pnpm, yarn, and bun. It ships ESM and CJS builds plus .d.ts types, so no @types/... companion is needed.
The agent client lives in @compresr/sdk/agents
As of @compresr/sdk 1.5.x the agent client layer (client.messages.create, client.chat.completions.create, client.run, WebSearchTool) ships in the same package under the /agents subpath. LangChain.js + provider chat-model + Tavily/Brave are declared as optional peer dependencies — install only the ones for the provider you actually use (e.g. npm install langchain @langchain/core @langchain/anthropic @langchain/tavily).
2. Initialize the client
Construct CompressionClient once at module scope and reuse it, because the client keeps an internal connection pool. Pass apiKey; optionally pass timeout (in ms, default 300_000, i.e. five minutes) and baseUrl (default https://api.compresr.ai). The client uses the native fetch API (Node 18+, modern browsers, edge runtimes); there is no option to swap in a custom fetch.
The non-null assertion (!) on process.env.COMPRESR_API_KEY is fine in code you control; in serverless deployments whose env wiring you do not own, narrow with a guard. See Authentication.
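One way to write the guard mentioned above is a helper that narrows string | undefined to string and fails early with a readable message. requireEnv is a hypothetical name, and the commented-out constructor call assumes an options-object shape inferred from the option names on this page.

```typescript
// Hypothetical helper: returns the variable's value or throws a descriptive error.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value; // narrowed from string | undefined to string
}

// Usage, commented out so the sketch runs without the package installed.
// The constructor shape is an assumption based on the options named above.
//
// import { CompressionClient } from '@compresr/sdk';
// export const client = new CompressionClient({
//   apiKey: requireEnv('COMPRESR_API_KEY', process.env),
//   timeout: 60_000, // ms; overrides the 300_000 default
// });
```

Failing at module load keeps a missing key from surfacing later as an AuthenticationError on the first request.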
3. compress
Single-request compression. Returns a Promise<CompressionResponse> — await it. Pass context, query, and compressionModelName: 'latte_v1'; the model keeps the spans that matter for the query. For many chunks against one query, see compressBatch; for token-by-token output, compressStream.
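A minimal sketch of a single compress call follows. To stay runnable without the package, it types the client structurally; the Compressor interface and the buildPrompt helper are hypothetical, and the request and response shapes are assumptions drawn from the fields listed on this page (the real types may carry more).

```typescript
// Shapes assumed from this page; the real SDK types may differ or carry more fields.
interface CompressRequest {
  context: string;
  query: string;
  compressionModelName: 'latte_v1';
  targetCompressionRatio?: number;
}

interface CompressionResponse {
  data: { compressed_context: string; original_tokens: number; compressed_tokens: number };
}

// Structural stand-in for the client so the sketch type-checks on its own.
interface Compressor {
  compress(req: CompressRequest): Promise<CompressionResponse>;
}

// Compress the retrieved context against the query, then build the LLM prompt.
async function buildPrompt(client: Compressor, context: string, query: string): Promise<string> {
  const result = await client.compress({ context, query, compressionModelName: 'latte_v1' });
  return `Context:\n${result.data.compressed_context}\n\nQuestion: ${query}`;
}
```

Any object with a matching compress method satisfies Compressor, which also makes the helper easy to unit test with a stub.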
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| context | string | Required | | |
| query | string | Required | | latte_v1. |
| compression_model_name | "latte_v1" | Required | | latte_v1 is the only public model. |
| target_compression_ratio | number | Optional | | |
| coarse | boolean | Optional | undefined | true locks paragraph-level; false opts into token-level precision. |
| heuristic_chunking | boolean | Optional | undefined | |
| disable_placeholders | boolean | Optional | undefined | [...] placeholders inserted where content was dropped. |
Response
Response field names are snake_case across every SDK. The SDK does not rename them, so they match what comes back over the wire.
data (object)
| Field | Type | Description |
|---|---|---|
| compressed_context | string | The compressed text, ready to drop into your prompt. |
| original_tokens | number | Token count of the input context (tiktoken cl100k). |
| compressed_tokens | number | Token count of the compressed output. |
| tokens_saved | number | original_tokens − compressed_tokens. |
| actual_compression_ratio | number | Fraction of input tokens actually removed (0–1). e.g. 0.5 = ~50% removed. |
| duration_ms | number | Server-side wall-clock time for the compression pass. |
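The token fields are related by simple arithmetic, which is handy for sanity checks and dashboards. The sketch below recomputes the removed fraction from the two token counts; removedFraction is a hypothetical helper, and it assumes actual_compression_ratio equals tokens_saved divided by original_tokens, which is how the descriptions above read.

```typescript
// Hypothetical helper: recompute the "fraction removed" from the token counts.
// Assumes actual_compression_ratio = tokens_saved / original_tokens.
function removedFraction(originalTokens: number, compressedTokens: number): number {
  if (originalTokens <= 0) return 0; // avoid dividing by zero on empty input
  return (originalTokens - compressedTokens) / originalTokens;
}

// Worked example: 1200 input tokens compressed to 480.
// tokens_saved = 1200 - 480 = 720, removed fraction = 720 / 1200 = 0.6
const exampleFraction = removedFraction(1200, 480);
```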
4. Stream
client.compressStream(...) returns an AsyncGenerator<StreamChunk> — iterate with for await. Each chunk is { content: string; done: boolean; error?: string }; the final chunk has done: true and empty content, or error set if the server aborts mid-stream. Use it anywhere time-to-first-token matters (chat UIs, agent loops); for one-shot calls stick with compress.
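Consuming the stream might look like the sketch below. It accepts any AsyncIterable of chunks with the shape described above, so it runs without the package; collectStream and the onDelta callback are hypothetical names. Checking error before done is an assumption, since the page does not say whether an error chunk also sets done.

```typescript
// Chunk shape as described above.
interface StreamChunk {
  content: string;
  done: boolean;
  error?: string;
}

// Accumulate streamed content, forwarding each piece to an optional callback
// (for example, a UI renderer) as soon as it arrives.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    // Assumption: an error chunk ends the stream, so surface it as an exception.
    if (chunk.error !== undefined) {
      throw new Error(`compression stream aborted: ${chunk.error}`);
    }
    if (chunk.done) break; // final chunk carries empty content
    text += chunk.content;
    if (onDelta) onDelta(chunk.content);
  }
  return text;
}

// Usage with the real client would pass the generator directly, e.g.
// collectStream(client.compressStream({ ... }), (t) => render(t));
```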
The generator is single-use — to consume the stream multiple ways (render to UI and persist to a log), tee it: wrap it in a function that pushes each chunk to multiple sinks before yielding.
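The tee described above can be sketched as a wrapping async generator. teeStream and Sink are hypothetical names; the sinks here are synchronous callbacks, which is an assumption made to keep the example short.

```typescript
interface StreamChunk {
  content: string;
  done: boolean;
  error?: string;
}

type Sink = (chunk: StreamChunk) => void;

// Push every chunk to each sink, then yield it to the downstream consumer.
// The source is still iterated exactly once.
async function* teeStream(
  source: AsyncIterable<StreamChunk>,
  sinks: Sink[],
): AsyncGenerator<StreamChunk> {
  for await (const chunk of source) {
    for (const sink of sinks) sink(chunk);
    yield chunk;
  }
}
```

A typical use would pass one sink that appends to a log buffer while the for await loop over the returned generator renders to the UI.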
5. Batch
client.compressBatch(...) compresses many contexts in one request. Pass contexts: string[] plus either a single queries: string (applied to every context) or a queries: string[] matching contexts in length. Cheaper than firing N concurrent compress() calls — ideal for RAG re-ranking or bulk document processing.
The TypeScript queries param is typed string | string[] — the compiler does not enforce array length matching contexts.length; that check is a runtime ValidationError from the API. Per-item results carry the same fields as a single compress() call except target_compression_ratio (request-level only). The envelope also exposes aggregates: result.data.count, total_original_tokens, total_compressed_tokens, total_tokens_saved, average_compression_ratio.
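Because the compiler cannot check the array lengths, a small client-side guard can catch a mismatch before the request is sent. This is a sketch under that assumption; checkBatchInputs is a hypothetical helper and it throws a plain RangeError rather than the SDK's ValidationError.

```typescript
// Hypothetical pre-flight check: mirrors the length rule the API enforces at runtime.
function checkBatchInputs(contexts: string[], queries: string | string[]): void {
  // A single string is applied to every context, so there is nothing to match.
  if (Array.isArray(queries) && queries.length !== contexts.length) {
    throw new RangeError(
      `queries has ${queries.length} entries but contexts has ${contexts.length}`,
    );
  }
}

// Usage before a batch call (client call commented out; its exact shape is assumed):
// checkBatchInputs(contexts, queries);
// const result = await client.compressBatch({ contexts, queries, compressionModelName: 'latte_v1' });
```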
6. Agent client
Construct CompressionClient with llm and you get an agent surface — three call-shapes (Anthropic-style messages.create, OpenAI-style chat.completions.create, native run) that auto-compress every tool output above minTokens before the LLM sees it. Behind all three sits LangChain.js 1.0's createAgent + the SDK's compresrToolMiddleware. Use it as a drop-in for new Anthropic() / new OpenAI(); for raw { context, query } calls stick with compress.
These surfaces are SDK-shaped and have no direct cURL equivalent. The underlying compression is still the same /api/compress/question-specific/ endpoint — it's what the middleware fires whenever a tool returns.
Construct with llm
Provider lives on the client; model lives at the call site. Swap providers by changing one string — same tools, same code:
The llm string accepts 'anthropic' (provider only; every call must pass model), 'anthropic:claude-haiku-4-5' (default model, overridable at the call site), or 'anthropic/claude-haiku-4-5' (Vercel AI SDK convention; both separators are accepted). If neither the llm string nor the call site provides a model, the SDK throws CompresrError("model is required …").
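The resolution rules for the llm string can be sketched as a parser plus a model resolver. This illustrates the documented behavior and is not the SDK's own code; parseLlm, resolveModel, and LlmSpec are hypothetical names, and a plain Error stands in for CompresrError.

```typescript
interface LlmSpec {
  provider: string;
  defaultModel?: string;
}

// Split on the first ":" or "/" so both separator styles are accepted.
function parseLlm(llm: string): LlmSpec {
  const sep = llm.search(/[:/]/);
  if (sep === -1) {
    if (llm === '') throw new Error('llm string must not be empty');
    return { provider: llm }; // provider only
  }
  const provider = llm.slice(0, sep);
  const defaultModel = llm.slice(sep + 1);
  if (provider === '' || defaultModel === '') {
    throw new Error(`invalid llm string: ${llm}`);
  }
  return { provider, defaultModel };
}

// The call-site model wins; otherwise fall back to the default from the llm string.
function resolveModel(spec: LlmSpec, callSiteModel?: string): string {
  const model = callSiteModel !== undefined ? callSiteModel : spec.defaultModel;
  if (model === undefined) throw new Error('model is required'); // stand-in for CompresrError
  return model;
}
```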
Three call shapes
messages.create duck-types Anthropic.Message, chat.completions.create duck-types OpenAI.Chat.Completions.ChatCompletion, and run returns a native NormalizedResult (.text, .toolUses, .citations, .stopReason, .usage).
TypeScript facades are async by default — no acreate / arun variants. The Python SDK exposes acreate and arun.
Web search — WebSearchTool
Backed by Tavily (default) or Brave. The returned object is a real LangChain.js BaseTool; its output flows through compresrToolMiddleware automatically.
Why not Anthropic / OpenAI / Gemini server search?
Provider-native server search tools (web_search_20250305, web_search_preview, google_search) execute server-side and return opaque/encrypted content that Compresr cannot read or compress. Use Tavily or Brave so the result is plaintext. See the Web search guide.
Bring your own tool
Any function wrapped with LangChain.js's tool({...}) works. The string return value is compressed before the LLM sees it.
Streaming isn't on the agent layer yet — client.messages.stream(...) / client.chat.completions.stream(...) throw CompresrError('streaming not yet implemented'). The compression-API stream (compressStream) is unaffected.
Per-call LLM knobs
Forwarded to the underlying chat model: temperature, topP, topK, maxTokens, maxOutputTokens, stop, stopSequences, presencePenalty, frequencyPenalty, seed, logprobs, topLogprobs. Anything else is silently dropped.
Gemini aliasing
When provider == 'google_genai' the SDK renames maxTokens → maxOutputTokens automatically. Pass maxTokens from any provider — the SDK will do the right thing.
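The forwarding rule and the Gemini alias can be sketched together as a filter over the per-call options. This is an illustration of the documented behavior, not the SDK's implementation; forwardKnobs is a hypothetical name, and letting an explicit maxOutputTokens win over maxTokens is an assumption.

```typescript
// The knobs listed above as forwarded to the underlying chat model.
const FORWARDED_KNOBS = [
  'temperature', 'topP', 'topK', 'maxTokens', 'maxOutputTokens', 'stop',
  'stopSequences', 'presencePenalty', 'frequencyPenalty', 'seed',
  'logprobs', 'topLogprobs',
] as const;

function forwardKnobs(
  provider: string,
  options: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  // Keep only the recognized knobs; anything else is silently dropped.
  for (const key of FORWARDED_KNOBS) {
    if (options[key] !== undefined) out[key] = options[key];
  }
  // Gemini alias: maxTokens is renamed to maxOutputTokens.
  if (provider === 'google_genai' && out['maxTokens'] !== undefined) {
    // Assumption: an explicitly passed maxOutputTokens takes precedence.
    if (out['maxOutputTokens'] === undefined) out['maxOutputTokens'] = out['maxTokens'];
    delete out['maxTokens'];
  }
  return out;
}
```

Since unknown keys are dropped without an error, a typo such as maxToken would have no effect; checking option names at the call site is worth the extra care.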
Compression knobs — compression: { ... }
Set at client construction. Applies to every tool-output compression the middleware fires.
| Key | Default | Effect |
|---|---|---|
| targetCompressionRatio | 0.5 | 0–1 removal strength; >1 = Nx factor (same as compress arg). |
| minTokens | 200 | Tool outputs shorter than this skip compression. |
| coarse | server default (true) | Paragraph-level vs token-level. |
| compressionModelName | 'latte_v1' | Backend validates; latte_v1 is the only public model. |
| allowTools | undefined | Whitelist of tool names to compress. |
| ignoreTools | undefined | Blacklist of tool names to leave untouched. |
| onError | 'passthrough' | 'raise' to fail loudly on backend errors instead of returning the original tool output. |
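How minTokens, allowTools, and ignoreTools might combine into a single decision is sketched below. This is a guess at the gating logic, not the middleware's code: shouldCompress and CompressionGate are hypothetical names, and two details are assumptions, namely that an output of exactly minTokens is compressed and that ignoreTools takes precedence when a tool appears in both lists.

```typescript
interface CompressionGate {
  minTokens?: number;
  allowTools?: string[];
  ignoreTools?: string[];
}

// Decide whether a tool output should be sent for compression.
function shouldCompress(
  toolName: string,
  outputTokens: number,
  cfg: CompressionGate = {},
): boolean {
  const minTokens = cfg.minTokens !== undefined ? cfg.minTokens : 200; // documented default
  if (outputTokens < minTokens) return false; // short outputs skip compression
  // Assumption: the blacklist wins if a tool is listed in both.
  if (cfg.ignoreTools !== undefined && cfg.ignoreTools.indexOf(toolName) !== -1) return false;
  // With a whitelist set, only the listed tools are compressed.
  if (cfg.allowTools !== undefined && cfg.allowTools.indexOf(toolName) === -1) return false;
  return true;
}
```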
7. Async
TypeScript is async by default
There are no _async variants in the TypeScript SDK. compress and compressBatch return Promise; compressStream returns AsyncGenerator. The examples in Sections 3, 4, and 5 are the async pattern. Python's _async twins exist because its sync surface is the default — that's the only meaningful difference between the SDKs.
8. Errors & types
Every error thrown by the SDK is an instance of CompresrError. Specific subclasses let you branch on what went wrong with instanceof:
- AuthenticationError: 401. Missing, malformed, or revoked key. Rotate it.
- RateLimitError: 429. Carries a numeric retryAfter field (seconds). Back off and retry.
- ValidationError: 400/422. Request body failed validation. Fix the payload.
- CompresrError: base class. Network errors, 5xx, or any unexpected response.
Narrow with instanceof, not strings
Do not branch on error.message or status-code strings. The SDK is free to refine wording across patch versions. error instanceof RateLimitError is the stable contract.
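Branching with instanceof pairs naturally with a retry wrapper that backs off only on rate limits. The sketch below defines local stand-in classes that mirror the documented hierarchy so it runs on its own; in real code these would be imported from @compresr/sdk. withRateLimitRetry, its maxAttempts default, and the injectable sleep parameter are hypothetical.

```typescript
// Local stand-ins mirroring the documented hierarchy (assumed constructor shapes).
class CompresrError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class AuthenticationError extends CompresrError {}
class ValidationError extends CompresrError {}
class RateLimitError extends CompresrError {
  retryAfter: number; // seconds
  constructor(message: string, retryAfter: number) {
    super(message);
    this.retryAfter = retryAfter;
  }
}

// Retry only on RateLimitError, waiting retryAfter seconds between attempts.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Auth and validation errors need a config or payload fix, so rethrow them.
      if (!(err instanceof RateLimitError) || attempt >= maxAttempts) throw err;
      await sleep(err.retryAfter * 1000);
    }
  }
}
```

Passing sleep as a parameter keeps the wrapper testable without real delays.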