Compresr docs

SDKs

TypeScript SDK

Compress prompts and retrieved context with the official TypeScript client.

@compresr/sdk is the official TypeScript client. It is fully typed, isomorphic (Node 18+, edge runtimes, modern browsers via a server proxy), and async by default. Every method returns a Promise or an AsyncGenerator.

Naming convention

Inputs use camelCase (compressionModelName, targetCompressionRatio, heuristicChunking, disablePlaceholders); the SDK serializes them to snake_case on the wire. Response fields remain snake_case — they match the API exactly and are stable across SDKs.
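
The camelCase-to-snake_case mapping described above can be pictured as a small key converter. This is a sketch of the convention only, not the SDK's serializer: `toSnakeCase` and `serializeRequest` are hypothetical names, and dropping unset options is an assumption.

```typescript
// Illustrative only: hypothetical helpers, not part of @compresr/sdk.

// Convert one camelCase identifier to snake_case.
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Re-key a flat request object into the wire naming described above.
function serializeRequest(input: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    // Assumption: options left undefined are omitted from the payload.
    if (value !== undefined) out[toSnakeCase(key)] = value;
  }
  return out;
}

const wire = serializeRequest({
  compressionModelName: "latte_v1",
  targetCompressionRatio: 0.5,
  heuristicChunking: true,
  disablePlaceholders: undefined,
});
console.log(wire);
```

Response objects need no reverse mapping, since they are already snake_case.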

1. Install

@compresr/sdk works with npm, pnpm, yarn, and bun. It ships ESM and CJS builds plus .d.ts types, so no @types/... companion is needed.

npm install @compresr/sdk

The agent client lives in @compresr/sdk/agents

As of @compresr/sdk 1.5.x the agent client layer (client.messages.create, client.chat.completions.create, client.run, WebSearchTool) ships in the same package under the /agents subpath. LangChain.js + provider chat-model + Tavily/Brave are declared as optional peer dependencies — install only the ones for the provider you actually use (e.g. npm install langchain @langchain/core @langchain/anthropic @langchain/tavily).

2. Initialize the client

Construct CompressionClient once at module scope and reuse it, because the client keeps an internal connection pool. Pass apiKey; optionally pass timeout (milliseconds, default 300_000, i.e. five minutes) and baseUrl (default https://api.compresr.ai). The client uses the native fetch API (Node 18+, modern browsers, edge runtimes) and offers no option to swap in a custom fetch.


The non-null assertion (!) on process.env.COMPRESR_API_KEY is fine in code you control; in serverless deployments whose env wiring you do not own, narrow with a guard. See Authentication.
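
A guard of the kind mentioned above might look like the following sketch. `requireEnv` is a hypothetical helper, and the options-object constructor call in the closing comment is an assumption based on the option names listed in this section.

```typescript
// Hypothetical helper: narrows `string | undefined` to `string`, or fails fast.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage sketch (assumed constructor shape, kept as a comment so the block
// runs without the SDK installed):
//   const client = new CompressionClient({
//     apiKey: requireEnv(process.env, "COMPRESR_API_KEY"),
//   });
const demoKey = requireEnv({ COMPRESR_API_KEY: "example-key" }, "COMPRESR_API_KEY");
console.log(demoKey.length > 0);
```

Failing at startup with a named variable tends to be easier to diagnose than a 401 on the first request.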

3. compress

Single-request compression. Returns a Promise<CompressionResponse>; await it. Pass context, query, and compressionModelName: 'latte_v1'; the model keeps the spans that matter for the query. For many chunks against one query, see compressBatch; for token-by-token output, see compressStream.
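
A call of this shape might look like the sketch below. To stay self-contained it programs against a local structural interface (`CompressCapable`, a hypothetical name) instead of importing the SDK, and it assumes that compress takes a single options object. The field names come from this page.

```typescript
// Local stand-ins for the shapes described on this page (not imported from the SDK).
interface CompressRequest {
  context: string;
  query: string;
  compressionModelName: "latte_v1";
  targetCompressionRatio?: number;
}
interface CompressResult {
  data: { compressed_context: string; original_tokens: number; compressed_tokens: number };
}
// Assumption: compress() accepts one options object and resolves to the response envelope.
interface CompressCapable {
  compress(req: CompressRequest): Promise<CompressResult>;
}

// Compress retrieved context against the user's question, then build a prompt from it.
async function buildPrompt(client: CompressCapable, context: string, query: string): Promise<string> {
  const res = await client.compress({ context, query, compressionModelName: "latte_v1" });
  return `Context:\n${res.data.compressed_context}\n\nQuestion: ${query}`;
}
```

With the real SDK, a CompressionClient instance would be passed as client, provided its compress method matches the shape assumed here.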


Parameters

The names below are the snake_case wire names; in TypeScript, pass the camelCase equivalents (for example compressionModelName, targetCompressionRatio).

  • context (string, required): The long text to compress: RAG chunks, document body, chat history.
  • query (string, required): The question the compressed context must still answer. Required for latte_v1.
  • compression_model_name ("latte_v1", required): Public model identifier. latte_v1 is the only public model.
  • target_compression_ratio (number, optional): Removal strength when 0 < r ≤ 1, or Nx target when r > 1. See the Models reference.
  • coarse (boolean, optional; default: undefined): Latte-only. Omitted = backend default (paragraph-level); true locks paragraph-level; false opts into token-level precision.
  • heuristic_chunking (boolean, optional; default: undefined): Latte-only. Heuristic splitter (paragraphs, code blocks) instead of fixed-size chunks.
  • disable_placeholders (boolean, optional; default: undefined): Latte-only. Skip the [...] placeholders inserted where content was dropped.
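
One way to read the two regimes of target_compression_ratio is sketched below: a value in (0, 1] as the fraction of tokens to remove, and a value above 1 as an Nx size reduction. The function is hypothetical and the arithmetic is an interpretation of the description above, so treat the results as rough targets and defer to the Models reference.

```typescript
// Hypothetical helper: estimate the output size implied by a target ratio.
function estimateCompressedTokens(originalTokens: number, ratio: number): number {
  if (!(ratio > 0)) throw new RangeError("ratio must be greater than 0");
  if (ratio <= 1) {
    // Removal strength: 0.5 means "remove about half of the tokens".
    return Math.round(originalTokens * (1 - ratio));
  }
  // Nx target: 4 means "aim for about one quarter of the original size".
  return Math.round(originalTokens / ratio);
}

console.log(estimateCompressedTokens(1000, 0.5)); // 500
console.log(estimateCompressedTokens(1000, 4)); // 250
```

The achieved figure is reported back as actual_compression_ratio and may differ from the target.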

Response

Response field names are snake_case across every SDK — the SDK does not rename them. Matches what comes back over the wire.

CompressionResponse
  • data (object)
    • compressed_context (string): The compressed text, ready to drop into your prompt.
    • original_tokens (number): Token count of the input context (tiktoken cl100k).
    • compressed_tokens (number): Token count of the compressed output.
    • tokens_saved (number): original_tokens − compressed_tokens.
    • actual_compression_ratio (number): Fraction of input tokens actually removed (0–1). For example, 0.5 means about 50% removed.
    • duration_ms (number): Server-side wall-clock time for the compression pass.
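
The field relationships above can be written down as types plus a consistency check. The interfaces mirror the fields listed in this section but are declared locally; `checkResponse` is a hypothetical helper, and the sample values are made up for illustration.

```typescript
// Mirrors the response fields listed above; declared locally, not imported.
interface CompressionData {
  compressed_context: string;
  original_tokens: number;
  compressed_tokens: number;
  tokens_saved: number;
  actual_compression_ratio: number; // fraction of input tokens removed, 0 to 1
  duration_ms: number;
}
interface CompressionEnvelope {
  data: CompressionData;
}

// Hypothetical check: do the derived fields agree with the raw token counts?
function checkResponse(res: CompressionEnvelope, tolerance = 0.01): boolean {
  const d = res.data;
  const savedOk = d.tokens_saved === d.original_tokens - d.compressed_tokens;
  const ratioOk =
    d.original_tokens > 0 &&
    Math.abs(d.actual_compression_ratio - d.tokens_saved / d.original_tokens) <= tolerance;
  return savedOk && ratioOk;
}

// Made-up sample values, for illustration only.
const sample: CompressionEnvelope = {
  data: {
    compressed_context: "Paris is the capital of France.",
    original_tokens: 1200,
    compressed_tokens: 600,
    tokens_saved: 600,
    actual_compression_ratio: 0.5,
    duration_ms: 84,
  },
};
console.log(checkResponse(sample));
```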

4. Stream

client.compressStream(...) returns an AsyncGenerator<StreamChunk> — iterate with for await. Each chunk is { content: string; done: boolean; error?: string }; the final chunk has done: true and empty content, or error set if the server aborts mid-stream. Use it anywhere time-to-first-token matters (chat UIs, agent loops); for one-shot calls stick with compress.
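
Consuming chunks of the shape described above could look like this sketch. The generator here is a stand-in so the block runs without the SDK; `collectStream` and `fakeStream` are hypothetical names.

```typescript
// Chunk shape as described above.
interface StreamChunk {
  content: string;
  done: boolean;
  error?: string;
}

// Accumulate streamed content; surface a mid-stream server abort as an exception.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    if (chunk.error) throw new Error(`stream aborted: ${chunk.error}`);
    text += chunk.content;
    if (chunk.done) break;
  }
  return text;
}

// Stand-in generator that imitates a short, successful stream.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { content: "Paris is ", done: false };
  yield { content: "the capital.", done: false };
  yield { content: "", done: true };
}

collectStream(fakeStream()).then((text) => console.log(text));
```

In a chat UI you would render each chunk as it arrives instead of waiting for the full string.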


The generator is single-use — to consume the stream multiple ways (render to UI and persist to a log), tee it: wrap it in a function that pushes each chunk to multiple sinks before yielding.
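
The tee pattern described above can be sketched as a generic wrapper around any async iterable. `teeStream` is a hypothetical helper, not an SDK export, and the source generator is a stand-in.

```typescript
// Hypothetical tee wrapper: forwards each item to every sink, then yields it onward.
async function* teeStream<T>(
  source: AsyncIterable<T>,
  ...sinks: Array<(item: T) => void | Promise<void>>
): AsyncGenerator<T> {
  for await (const item of source) {
    for (const sink of sinks) await sink(item); // sinks run before the consumer sees the item
    yield item;
  }
}

// Stand-in source so the sketch runs without the SDK.
async function* demoChunks(): AsyncGenerator<{ content: string; done: boolean }> {
  yield { content: "Hello, ", done: false };
  yield { content: "world", done: false };
  yield { content: "", done: true };
}

async function demoTee(): Promise<{ rendered: string; logged: number }> {
  const log: string[] = [];
  let rendered = "";
  for await (const chunk of teeStream(demoChunks(), (c) => { log.push(c.content); })) {
    rendered += chunk.content; // stands in for the UI render path
  }
  return { rendered, logged: log.length };
}

demoTee().then((result) => console.log(result));
```

Because each sink is awaited, a slow sink delays the consumer; a fire-and-forget sink is the alternative when persistence must not hold up rendering.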

5. Batch

client.compressBatch(...) compresses many contexts in one request. Pass contexts: string[] plus either a single queries: string (applied to every context) or a queries: string[] matching contexts in length. Cheaper than firing N concurrent compress() calls — ideal for RAG re-ranking or bulk document processing.


The TypeScript queries param is typed string | string[] — the compiler does not enforce array length matching contexts.length; that check is a runtime ValidationError from the API. Per-item results carry the same fields as a single compress() call except target_compression_ratio (request-level only). The envelope also exposes aggregates: result.data.count, total_original_tokens, total_compressed_tokens, total_tokens_saved, average_compression_ratio.
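
Since the compiler cannot check the length match, a local pre-flight check is one option. The sketch below uses a hypothetical `normalizeQueries` helper that expands a single query to one per context and rejects mismatched arrays before any request is sent.

```typescript
// Hypothetical pre-flight check: return one query per context, failing locally
// instead of waiting for the API's ValidationError.
function normalizeQueries(contexts: string[], queries: string | string[]): string[] {
  if (typeof queries === "string") {
    const single = queries;
    return contexts.map(() => single); // one query applied to every context
  }
  if (queries.length !== contexts.length) {
    throw new RangeError(
      `queries has ${queries.length} entries but contexts has ${contexts.length}`,
    );
  }
  return queries;
}

console.log(normalizeQueries(["chunk A", "chunk B"], "What changed in v2?"));
```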

6. Agent client

Construct CompressionClient with llm and you get an agent surface — three call-shapes (Anthropic-style messages.create, OpenAI-style chat.completions.create, native run) that auto-compress every tool output above minTokens before the LLM sees it. Behind all three sits LangChain.js 1.0's createAgent + the SDK's compresrToolMiddleware. Use it as a drop-in for new Anthropic() / new OpenAI(); for raw { context, query } calls stick with compress.

These surfaces are SDK-shaped and have no direct cURL equivalent. The underlying compression is still the same /api/compress/question-specific/ endpoint — it's what the middleware fires whenever a tool returns.

Construct with llm

Provider lives on the client; model lives at the call site. Swap providers by changing one string while the tools and the surrounding code stay the same.

The llm string accepts 'anthropic' (provider only, so every call must pass model), 'anthropic:claude-haiku-4-5' (default model, overridable at the call site), or 'anthropic/claude-haiku-4-5' (Vercel AI SDK convention; both separators are accepted). If neither the llm string nor the call site provides a model, the SDK throws CompresrError("model is required …").
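
The resolution rule above can be sketched as a parser plus a fallback. Both functions are hypothetical illustrations of the documented behavior, not the SDK's own code, and the sketch throws a plain Error where the SDK throws CompresrError.

```typescript
// Hypothetical parsed form of the llm string.
interface LlmSpec {
  provider: string;
  defaultModel?: string;
}

// Split "provider", "provider:model", or "provider/model".
function parseLlm(llm: string): LlmSpec {
  const sep = llm.search(/[:/]/); // index of the first ":" or "/", or -1
  if (sep === -1) return { provider: llm };
  return { provider: llm.slice(0, sep), defaultModel: llm.slice(sep + 1) };
}

// The call-site model wins; otherwise fall back to the default from the llm string.
function resolveModel(spec: LlmSpec, callSiteModel?: string): string {
  const model = callSiteModel ?? spec.defaultModel;
  if (!model) throw new Error("model is required"); // the SDK throws CompresrError here
  return model;
}

console.log(resolveModel(parseLlm("anthropic:claude-haiku-4-5")));
```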

Three call shapes

messages.create duck-types Anthropic.Message, chat.completions.create duck-types OpenAI.Chat.Completions.ChatCompletion, and run returns a native NormalizedResult (.text, .toolUses, .citations, .stopReason, .usage).


TypeScript facades are async by default — no acreate / arun variants. The Python SDK exposes acreate and arun.

Web search — WebSearchTool

Backed by Tavily (default) or Brave. The returned object is a real LangChain.js BaseTool; its output flows through compresrToolMiddleware automatically.


Why not Anthropic / OpenAI / Gemini server search?

Provider-native server search tools (web_search_20250305, web_search_preview, google_search) execute server-side and return opaque/encrypted content that Compresr cannot read or compress. Use Tavily or Brave so the result is plaintext. See the Web search guide.

Bring your own tool

Any function wrapped with LangChain.js's tool({...}) works. The string return value is compressed before the LLM sees it.


Streaming isn't on the agent layer yet — client.messages.stream(...) / client.chat.completions.stream(...) throw CompresrError('streaming not yet implemented'). The compression-API stream (compressStream) is unaffected.

Per-call LLM knobs

Forwarded to the underlying chat model: temperature, topP, topK, maxTokens, maxOutputTokens, stop, stopSequences, presencePenalty, frequencyPenalty, seed, logprobs, topLogprobs. Anything else is silently dropped.


Gemini aliasing

When the provider is 'google_genai', the SDK renames maxTokens to maxOutputTokens automatically. Pass maxTokens with any provider and the SDK maps it to the field that provider expects.
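
The allowlist from the previous section and the Gemini rename can be pictured together as one small filter. `prepareKnobs` is a hypothetical function; in particular, letting an explicit maxOutputTokens take precedence over the alias is an assumption this page does not confirm.

```typescript
// Knob names listed under "Per-call LLM knobs"; everything else is dropped.
const FORWARDED_KNOBS = [
  "temperature", "topP", "topK", "maxTokens", "maxOutputTokens", "stop",
  "stopSequences", "presencePenalty", "frequencyPenalty", "seed",
  "logprobs", "topLogprobs",
] as const;

// Hypothetical sketch of the filtering and the Gemini rename; not the SDK's code.
function prepareKnobs(provider: string, options: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of FORWARDED_KNOBS) {
    if (options[key] !== undefined) out[key] = options[key];
  }
  if (provider === "google_genai" && out["maxTokens"] !== undefined) {
    // Assumption: an explicit maxOutputTokens takes precedence over the alias.
    if (out["maxOutputTokens"] === undefined) out["maxOutputTokens"] = out["maxTokens"];
    delete out["maxTokens"];
  }
  return out;
}

console.log(prepareKnobs("google_genai", { maxTokens: 256, temperature: 0.2, verbose: true }));
```

Because unknown keys are dropped silently, a typo such as maxToken never reaches the model; checking the option object yourself is the only way to catch it.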

Compression knobs — compression: { ... }

Set at client construction. Applies to every tool-output compression the middleware fires.

| Key | Default | Effect |
| --- | --- | --- |
| targetCompressionRatio | 0.5 | 0–1 removal strength; >1 = Nx factor (same as compress arg). |
| minTokens | 200 | Tool outputs shorter than this skip compression. |
| coarse | server default (true) | Paragraph-level vs token-level. |
| compressionModelName | 'latte_v1' | Backend validates; latte_v1 is the only public model. |
| allowTools | undefined | Whitelist of tool names to compress. |
| ignoreTools | undefined | Blacklist of tool names to leave untouched. |
| onError | 'passthrough' | 'raise' to fail loudly on backend errors instead of returning the original tool output. |
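
How minTokens, allowTools, and ignoreTools might combine is sketched below as a single decision function. `shouldCompress` is hypothetical, and the precedence (ignoreTools wins when a tool appears in both lists) is an assumption, since this page does not state it.

```typescript
// Subset of the compression options listed above.
interface CompressionGate {
  minTokens?: number; // listed default is 200
  allowTools?: string[];
  ignoreTools?: string[];
}

// Hypothetical decision function. Assumption: ignoreTools wins over allowTools
// when a tool appears in both lists.
function shouldCompress(toolName: string, tokenCount: number, opts: CompressionGate = {}): boolean {
  const minTokens = opts.minTokens ?? 200;
  if (tokenCount < minTokens) return false; // short outputs skip compression
  if (opts.ignoreTools?.includes(toolName)) return false;
  if (opts.allowTools && !opts.allowTools.includes(toolName)) return false;
  return true;
}

console.log(shouldCompress("web_search", 1500)); // true
console.log(shouldCompress("calculator", 12)); // false
```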

7. Async

TypeScript is async by default

There are no _async variants in the TypeScript SDK. compress and compressBatch return Promise; compressStream returns AsyncGenerator. The examples in Sections 3, 4, and 5 are the async pattern. Python's _async twins exist because its sync surface is the default — that's the only meaningful difference between the SDKs.


8. Errors & types

Every error thrown by the SDK is an instance of CompresrError. Specific subclasses let you branch on what went wrong with instanceof:

  • AuthenticationError (401): Missing, malformed, or revoked key. Rotate it.
  • RateLimitError (429): Carries a numeric retryAfter field (seconds). Back off and retry.
  • ValidationError (400 / 422): Request body failed validation. Fix the payload.
  • CompresrError: Base class. Network errors, 5xx, or any unexpected response.

Narrow with instanceof, not strings

Do not branch on error.message or status-code strings. The SDK is free to refine wording across patch versions. error instanceof RateLimitError is the stable contract.
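
The instanceof narrowing described above might be organized as in this sketch. The error classes are local stand-ins that mirror the documented hierarchy so the block runs without the SDK; in real code they would be imported from the SDK. `planRecovery` and `RecoveryPlan` are hypothetical names.

```typescript
// Stand-in classes mirroring the hierarchy described above.
class CompresrError extends Error {}
class AuthenticationError extends CompresrError {}
class ValidationError extends CompresrError {}
class RateLimitError extends CompresrError {
  retryAfter: number; // seconds, per the description above
  constructor(message: string, retryAfter: number) {
    super(message);
    this.retryAfter = retryAfter;
  }
}

type RecoveryPlan = { kind: "retry"; delayMs: number } | { kind: "fail"; reason: string };

// Decide what to do with a thrown value; check the specific subclasses first.
function planRecovery(err: unknown): RecoveryPlan {
  if (err instanceof RateLimitError) {
    return { kind: "retry", delayMs: err.retryAfter * 1000 };
  }
  if (err instanceof AuthenticationError) return { kind: "fail", reason: "rotate the API key" };
  if (err instanceof ValidationError) return { kind: "fail", reason: "fix the request payload" };
  if (err instanceof CompresrError) return { kind: "fail", reason: "network or server error" };
  throw err; // not an SDK error: let it propagate
}

console.log(planRecovery(new RateLimitError("slow down", 2)));
```

Checking the base class last matters: every subclass instance is also a CompresrError, so an earlier base-class check would swallow the specific cases.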