TypeScript SDK

Compress prompts and retrieved context with the official TypeScript client.

@compresr/sdk is the official TypeScript client. It is fully typed, isomorphic (Node 20+, edge runtimes, modern browsers via a server proxy), and async by default. Every method returns a Promise or an AsyncGenerator.

Naming convention

Inputs use camelCase (compressionModelName, targetCompressionRatio, heuristicChunking, disablePlaceholders); the SDK serializes them to snake_case on the wire. Response fields remain snake_case; they match the API exactly and are stable across SDKs.
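The key mapping described above can be pictured with a small sketch. `camelToSnake` and `toWireKeys` are hypothetical helpers written for illustration, not the SDK's actual serializer; the sketch assumes a flat options object and assumes unset options are left off the wire.

```typescript
// Hypothetical helpers (not part of @compresr/sdk): illustrate the
// camelCase -> snake_case mapping the SDK is described as applying to inputs.
function camelToSnake(key: string): string {
  // Put "_" in front of every uppercase letter and lowercase it.
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

function toWireKeys(options: Record<string, unknown>): Record<string, unknown> {
  const wire: Record<string, unknown> = {};
  for (const key of Object.keys(options)) {
    const value = options[key];
    // Assumption: options left undefined are omitted from the request body.
    if (value !== undefined) wire[camelToSnake(key)] = value;
  }
  return wire;
}

const wireBody = toWireKeys({
  compressionModelName: "latte_v2",
  targetCompressionRatio: 0.5,
  heuristicChunking: true,
  disablePlaceholders: undefined,
});
console.log(wireBody);
```

Response objects would not pass through a helper like this, since response fields stay snake_case.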

1. Install

@compresr/sdk works with npm, pnpm, yarn, and bun. It ships ESM and CJS builds plus .d.ts types, so no @types/... companion package is needed.

The agent client lives in @compresr/sdk/agents

As of @compresr/sdk 1.5, the agent client layer (client.messages.create, client.chat.completions.create, client.run, WebSearchTool) ships in the same package under the /agents subpath. LangChain.js, the provider chat-model packages, and the Tavily/Brave integrations are declared as optional peer dependencies; install only the ones for the provider you actually use (e.g. npm install langchain @langchain/core @langchain/anthropic @langchain/tavily).

2. Initialize the client

Construct CompressionClient once at module scope and reuse it; the client keeps an internal connection pool. Pass apiKey (or rely on the COMPRESR_API_KEY fallback), and optionally timeout (milliseconds; default 300_000, i.e. five minutes) and baseUrl (default https://api.compresr.ai). The client uses the native fetch API (Node 20+, modern browsers, edge runtimes) and offers no option to swap in a custom fetch.

The non-null assertion (!) on process.env.COMPRESR_API_KEY is fine in code you control; in serverless deployments whose env wiring you do not own, narrow with a guard. See Authentication.
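One way to write the guard mentioned above is sketched here. `requireEnv` is a hypothetical helper, not an SDK export; it takes the environment as a parameter so the sketch runs outside Node as well.

```typescript
// Hypothetical guard: fail fast with a clear message instead of relying on `!`.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In Node you would pass process.env; a literal object keeps the sketch portable.
const apiKeyFromEnv = requireEnv("COMPRESR_API_KEY", {
  COMPRESR_API_KEY: "example-key",
});
```

The returned value is typed `string`, so it can be handed to the client constructor without a non-null assertion.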

Constructor options

apiKey: string | undefined (optional)

Default: undefined

If omitted, resolved from process.env.COMPRESR_API_KEY. On Node you can also import @compresr/sdk/auth to load credentials from ~/.compresr/credentials.json.

baseUrl: string | undefined (optional)

Default: "https://api.compresr.ai"

API endpoint override; use for regional or self-hosted endpoints.

timeout: number (optional)

Default: 300_000

Request timeout in milliseconds (per HTTP call). Default is five minutes.

retry: RetryConfig | undefined (optional)

Default: undefined (built-in policy)

Override the retry policy. Default retries 429 and 503 with exponential backoff (respects Retry-After). Import RetryConfig from @compresr/sdk.

llm: string | undefined (optional)

Default: undefined

Provider for the agent surface (e.g. "anthropic", "openai:gpt-4o-mini"). Required for client.messages / client.chat / client.run / client.research (they throw CompresrError("missing_llm") otherwise). See Section 6.

llmApiKey: string | undefined (optional)

Default: undefined

API key for the LLM provider. Falls back to ANTHROPIC_API_KEY / OPENAI_API_KEY / GOOGLE_API_KEY depending on llm.

compression: CompressionPolicyOptions | undefined (optional)

Default: undefined

Middleware compression policy applied to every tool output. See Compression knobs.

enablePromptCache: boolean (optional)

Default: true

Enable provider-side prompt caching (Anthropic cache_control, OpenAI prompt_cache_key). No-op for Gemini.

promptCacheTtl: "5m" | "1h" (optional)

Default: "5m"

Anthropic cache TTL. Longer TTL costs more per cache write but survives longer between calls. On OpenAI, "1h" maps to prompt_cache_retention: "24h".

promptCacheMinMessages: number (optional)

Default: 2

Skip caching for very short conversations (avoids paying the cache-write premium on trivial prompts).

openaiPromptCacheKey: string | undefined (optional)

Default: undefined

Explicit OpenAI prompt_cache_key; when omitted the SDK does not set prompt_cache_key on the OpenAI call and provider defaults apply.

Environment variables

Variable	Purpose
`COMPRESR_API_KEY`	Fallback for `apiKey`. Also read by the cURL examples.
`COMPRESR_BASE_URL`	Read by the cURL examples in this doc; the TS SDK itself does not read it — pass `baseUrl` explicitly to override the default.
`ANTHROPIC_API_KEY` / `OPENAI_API_KEY` / `GOOGLE_API_KEY`	Fallback for `llmApiKey` when the matching provider is used.

3. compress

Single-request compression. Returns a Promise<CompressionResponse>; await it. Pass context, query, and compressionModelName: 'latte_v2'; the model keeps the spans that matter for the query. For many chunks against one query, see compressBatch; for token-by-token output, compressStream.

Parameters

latte_v2 accepts every parameter latte_v1 accepts, plus three latte_v2-only knobs for dynamic compression-ratio selection. See the Models reference for the canonical decision guide and the at-a-glance support matrix.

Shared parameters (both models)

context: string (required)

The long text to compress: RAG chunks, document body, chat history.

query: string (optional)

The question the compressed context must still answer. Required for latte_v1; optional for latte_v2. Backend validates.

compressionModelName: "latte_v1" | "latte_v2" (optional)

Default: "latte_v1"

Routes the call. SDK default is latte_v1 for stability; pass "latte_v2" to opt into the newer backbone. See the Models reference.

targetCompressionRatio: number (optional)

Removal strength when 0 < r ≤ 1, or Nx target when r > 1 (e.g. 60 = 60×). Server hard-caps at 200. Ignored on latte_v2 when dynamic is true. See Models › target_compression_ratio.
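The two readings of targetCompressionRatio can be made concrete with a rough sketch. `readTargetRatio` is a hypothetical helper; it assumes that a removal strength r drops roughly a fraction r of the tokens and that values above the documented cap behave like the cap itself. Neither assumption is confirmed by the API reference, so treat the numbers as estimates.

```typescript
interface RatioReading {
  mode: "removal" | "factor";
  approxKeepFraction: number; // estimated share of tokens that survive
}

const SERVER_MAX_FACTOR = 200; // documented server-side hard cap

// Hypothetical helper: interprets r the way the parameter description reads.
function readTargetRatio(r: number): RatioReading {
  if (!(r > 0)) throw new RangeError("targetCompressionRatio must be > 0");
  if (r <= 1) {
    // Assumption: removal strength r removes roughly a fraction r of tokens.
    return { mode: "removal", approxKeepFraction: 1 - r };
  }
  // Assumption: values above the cap are treated like the cap.
  const factor = Math.min(r, SERVER_MAX_FACTOR);
  return { mode: "factor", approxKeepFraction: 1 / factor };
}

console.log(readTargetRatio(0.5)); // removal mode, keeps about half
console.log(readTargetRatio(60)); // factor mode, keeps about 1/60
```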

coarse: boolean (optional)

Default: undefined

Omitted = backend default (paragraph-level); true locks paragraph-level; false opts into token-level precision.

heuristicChunking: boolean (optional)

Default: undefined

Heuristic splitter (paragraphs, code blocks) instead of fixed-size chunks.

disablePlaceholders: boolean (optional)

Default: undefined

Skip the [...] placeholders inserted where content was dropped.

`latte_v2`-only parameters

dynamic: boolean (optional)

Default: undefined

Omitted = server default; true picks the compression ratio per-input automatically inside [dynamicMinRatio, dynamicMaxRatio] and overrides targetCompressionRatio; false explicitly forces the fixed-ratio path. Rejected on latte_v1 with a ValidationError.

dynamicMinRatio: number (optional)

Default: undefined (server default 1.5)

Floor on the chosen Nx ratio when dynamic is true. Must be ≥ 1.0. Only consulted when dynamic is true.

dynamicMaxRatio: number (optional)

Default: undefined (server default 10.0)

Ceiling on the chosen Nx ratio when dynamic is true. Must be ≥ 1.0. Only consulted when dynamic is true.
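The constraints on the latte_v2-only knobs can be checked before a request is sent. The sketch below is a hypothetical pre-flight check, not SDK code; it assumes that any explicit `dynamic` value (true or false) is rejected on latte_v1, which the description above implies but does not spell out.

```typescript
interface DynamicOptions {
  compressionModelName?: "latte_v1" | "latte_v2";
  dynamic?: boolean;
  dynamicMinRatio?: number;
  dynamicMaxRatio?: number;
}

// Returns a list of problems; an empty list means the options are consistent
// with the documented rules.
function checkDynamicOptions(opts: DynamicOptions): string[] {
  const problems: string[] = [];
  const model = opts.compressionModelName ?? "latte_v1"; // documented SDK default
  // Assumption: any explicit `dynamic` value is rejected on latte_v1.
  if (model === "latte_v1" && opts.dynamic !== undefined) {
    problems.push("dynamic is only accepted on latte_v2");
  }
  for (const name of ["dynamicMinRatio", "dynamicMaxRatio"] as const) {
    const value = opts[name];
    if (value !== undefined && !(value >= 1.0)) {
      problems.push(`${name} must be >= 1.0`);
    }
  }
  return problems;
}

console.log(
  checkDynamicOptions({
    compressionModelName: "latte_v2",
    dynamic: true,
    dynamicMinRatio: 1.5,
    dynamicMaxRatio: 10,
  }),
);
```

The server remains the authority on validation; a check like this only surfaces obvious mistakes earlier.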

Response

Response field names are snake_case across every SDK; the SDK does not rename them, so they match what comes back over the wire.

CompressionResponse

data: object
- compressed_context: string
  The compressed text, ready to drop into your prompt.
- original_context: string | null
  Optional echo of the input context; populated only when the server returns it.
- original_tokens: number
  Token count of the input context (tiktoken cl100k).
- compressed_tokens: number
  Token count of the compressed output.
- tokens_saved: number
  original_tokens − compressed_tokens.
- actual_compression_ratio: number
  Fraction of input tokens actually removed (0–1). e.g. 0.5 = ~50% removed.
- duration_ms: number
  Server-side wall-clock time for the compression pass.
  Server-side wall-clock time for the compression pass.

4. Stream

client.compressStream(...) returns an AsyncGenerator<StreamChunk>; iterate with for await. Each chunk is { content: string; done: boolean; error?: string }. SSE framing is used for HTTP-level backpressure and forward-compat; the current backend implementation emits a single final frame carrying the full compressed output with done: true, so treat this like a POST that ends with a done: true sentinel — no time-to-first-token benefit yet. error? is reserved on the type but currently unused: server-side aborts propagate as thrown ConnectionError / CompresrError rather than as chunks with .error set. For one-shot calls stick with compress.
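The consumption pattern looks roughly like the sketch below. `fakeCompressStream` is a stand-in written for this example so the code runs without the SDK; it mimics the described behavior of a single final frame with done: true. `collect` is a hypothetical helper.

```typescript
interface StreamChunk {
  content: string;
  done: boolean;
  error?: string;
}

// Stand-in for client.compressStream(...): emits one final frame.
async function* fakeCompressStream(text: string): AsyncGenerator<StreamChunk> {
  yield { content: text, done: true };
}

// Accumulates chunk contents until the done sentinel arrives.
async function collect(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    out += chunk.content;
    if (chunk.done) break; // stop at the sentinel
  }
  return out;
}

collect(fakeCompressStream("compressed output")).then((text) => console.log(text));
```

Because the loop concatenates every chunk, the same code keeps working if the backend later emits incremental frames.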

The generator is single-use. To consume the stream multiple ways (render to UI and persist to a log), tee it: wrap it in a function that pushes each chunk to multiple sinks before yielding.
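A minimal version of such a tee wrapper is sketched here. `tee` and `Sink` are hypothetical names, and `demoSource` stands in for the SDK stream so the example is self-contained.

```typescript
type Sink<T> = (item: T) => void;

// Pushes each item to every sink, then yields it to the downstream consumer.
async function* tee<T>(
  source: AsyncIterable<T>,
  ...sinks: Sink<T>[]
): AsyncGenerator<T> {
  for await (const item of source) {
    for (const sink of sinks) sink(item);
    yield item;
  }
}

// Stand-in source; with the SDK this would be client.compressStream(...).
async function* demoSource(): AsyncGenerator<string> {
  yield "part-1";
  yield "part-2";
}

async function demo(): Promise<{ ui: string[]; log: string[]; seen: string[] }> {
  const ui: string[] = [];
  const log: string[] = [];
  const seen: string[] = [];
  for await (const chunk of tee(demoSource(), (c) => ui.push(c), (c) => log.push(c))) {
    seen.push(chunk);
  }
  return { ui, log, seen };
}

demo().then((result) => console.log(result));
```

The sinks here are synchronous; a sink that does asynchronous work would need its own buffering or an awaited variant.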

5. Batch

client.compressBatch(...) compresses many contexts in one request. Pass contexts: string[] plus either a single queries: string (applied to every context) or a queries: string[] matching contexts in length. Cheaper than firing N concurrent compress() calls, and ideal for RAG re-ranking or bulk document processing.

The TypeScript queries param is typed string | string[]. The compiler cannot enforce that the array length matches contexts.length, so that check is a runtime ValidationError from the SDK (raised before the request is sent). Batches are capped at 100 inputs per request. Per-item results carry the same fields as a single compress() call except target_compression_ratio (request-level only). The envelope also exposes aggregates: result.data.count, total_original_tokens, total_compressed_tokens, total_tokens_saved, average_compression_ratio.

Alternate form: `inputs: BatchInput[]`

The wire format is a list of { context, query } pairs. Pass inputs instead of contexts / queries when it matches your data shape more naturally (queues, streaming pipelines, per-item queries). Exactly one of inputs OR contexts is required; passing both, or neither, throws ValidationError('Pass `inputs` OR `contexts`, not both.').
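How the two call forms relate to the pair list can be sketched as a normalization step. `toBatchInputs` is a hypothetical helper, not the SDK's implementation; it throws plain Error objects rather than the SDK's ValidationError, and it assumes queries must accompany contexts.

```typescript
interface BatchInput {
  context: string;
  query: string;
}

interface BatchArgs {
  contexts?: string[];
  queries?: string | string[];
  inputs?: BatchInput[];
}

const MAX_BATCH_SIZE = 100; // documented per-request cap

function toBatchInputs(args: BatchArgs): BatchInput[] {
  const { contexts, queries, inputs } = args;
  // Exactly one of `inputs` or `contexts` must be present.
  if ((inputs === undefined) === (contexts === undefined)) {
    throw new Error("Pass `inputs` OR `contexts`, not both.");
  }
  const pairs: BatchInput[] = [];
  if (inputs !== undefined) {
    pairs.push(...inputs);
  } else if (contexts !== undefined) {
    // Assumption: this sketch requires `queries` alongside `contexts`.
    if (queries === undefined) {
      throw new Error("`queries` is required with `contexts`.");
    }
    if (typeof queries !== "string" && queries.length !== contexts.length) {
      throw new Error("`queries` must match `contexts` in length.");
    }
    for (let i = 0; i < contexts.length; i++) {
      // A single string is applied to every context.
      const query = typeof queries === "string" ? queries : queries[i];
      pairs.push({ context: contexts[i], query });
    }
  }
  if (pairs.length > MAX_BATCH_SIZE) {
    throw new Error(`Batches are capped at ${MAX_BATCH_SIZE} inputs.`);
  }
  return pairs;
}

console.log(toBatchInputs({ contexts: ["doc A", "doc B"], queries: "What changed?" }));
```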

6. Agent client

Construct CompressionClient with llm and you get an agent surface: three call-shapes (Anthropic-style messages.create, OpenAI-style chat.completions.create, native run) that auto-compress every tool output above minTokens before the LLM sees it. Behind all three sits LangChain.js 1.0's createAgent + the SDK's compresrToolMiddleware. Use it as a drop-in for new Anthropic() / new OpenAI(); for raw { context, query } calls stick with compress.

These surfaces are SDK-shaped and have no direct cURL equivalent. The underlying compression is still the same /api/compress/question-specific/ endpoint; it's what the middleware fires whenever a tool returns.

Construct with `llm`

The provider lives on the client; the model lives at the call site. Swap providers by changing one string; the tools and the rest of the code stay the same.

The llm string accepts 'anthropic' (provider only, every call must pass model), 'anthropic:claude-haiku-4-5' (default model, overridable at call site), or 'anthropic/claude-haiku-4-5' (Vercel AI SDK convention; both separators accepted). If neither the llm string nor the call site provides a model, the SDK throws CompresrError("model is required …").
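The accepted spellings can be illustrated with a small parser. `parseLlmSpec` and `resolveModel` are hypothetical helpers written for this example, not SDK exports; the sketch assumes the first ':' or '/' separates provider from model and throws plain Error objects.

```typescript
interface LlmSpec {
  provider: string;
  model?: string;
}

function parseLlmSpec(spec: string): LlmSpec {
  // Assumption: the first ":" or "/" splits provider from model.
  const idx = spec.search(/[:/]/);
  if (idx === -1) return { provider: spec };
  const provider = spec.slice(0, idx);
  const model = spec.slice(idx + 1);
  if (provider === "" || model === "") {
    throw new Error(`Invalid llm spec: ${spec}`);
  }
  return { provider, model };
}

// The call-site model wins; the spec's model is the default.
function resolveModel(spec: LlmSpec, callSiteModel?: string): string {
  const model = callSiteModel ?? spec.model;
  if (model === undefined) throw new Error("model is required");
  return model;
}

console.log(parseLlmSpec("anthropic"));
console.log(parseLlmSpec("anthropic:claude-haiku-4-5"));
console.log(parseLlmSpec("anthropic/claude-haiku-4-5"));
```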

Three call shapes

messages.create duck-types Anthropic.Message, chat.completions.create duck-types OpenAI.Chat.Completions.ChatCompletion, and run returns a native NormalizedResult (.text, .toolUses, .citations, .stopReason, .usage).

TypeScript facades are async by default; there are no acreate / arun variants. The Python SDK exposes acreate and arun.

Web search: `WebSearchTool`

Three providers ship in the box: Tavily, Brave, and AgentCore (Amazon Bedrock via MCP). All three factories are async and return a LangChain.js StructuredToolInterface; their output flows through compresrToolMiddleware automatically.

Why not Anthropic / OpenAI / Gemini server search?

Provider-native server search tools (web_search_20250305, web_search_preview, google_search) execute server-side and return opaque/encrypted content that Compresr cannot read or compress. Use Tavily, Brave, or AgentCore so the result is plaintext. See the Web search guide.

Provider reference

Tavily (WebSearchTool.tavily) — install @langchain/tavily. apiKey is passed to the underlying LangChain tool; there is no SDK-side env fallback in the factory itself (the peer package may resolve TAVILY_API_KEY on its own). Supports allowedDomains / blockedDomains natively. Throws CompresrError('missing_peer_dependency') if @langchain/tavily isn't installed.

Brave (WebSearchTool.brave) — install @langchain/community. Reads apiKey, then BRAVE_SEARCH_API_KEY, then BRAVE_API_KEY. Throws CompresrError('missing_api_key') if none of the three are set. allowedDomains / blockedDomains are not supported by Brave (Goggles-based filtering is out of scope). Throws CompresrError('missing_peer_dependency') if @langchain/community isn't installed.

AgentCore (WebSearchTool.agentcore) — install @modelcontextprotocol/sdk. Talks to an Amazon Bedrock AgentCore gateway over MCP streamable-HTTP, authenticated via a Cognito OAuth 2.0 client-credentials handshake. Bearer tokens are cached; a 401 triggers one automatic re-mint. maxResults is clamped to 1..25. gatewayUrl and cognitoTokenUrl must be https:// — the factory throws CompresrError('invalid_config') on plaintext URLs to avoid leaking credentials. allowedDomains / blockedDomains are accepted for signature parity but silently ignored — use Tavily for domain filtering.

AgentCore config resolves per field with precedence explicit option → AgentCore-namespaced env → short env. If any field is unresolved, throws CompresrError('missing_config') listing every missing field:

Option	Env var (primary)	Env var (fallback)
`gatewayUrl`	`AGENTCORE_GATEWAY_MCP_URL`	`GATEWAY_MCP_URL`
`cognitoTokenUrl`	`AGENTCORE_COGNITO_TOKEN_URL`	`COGNITO_TOKEN_URL`
`clientId`	`AGENTCORE_COGNITO_CLIENT_ID`	`COGNITO_CLIENT_ID`
`clientSecret`	`AGENTCORE_COGNITO_CLIENT_SECRET`	`COGNITO_CLIENT_SECRET`
`scope`	`AGENTCORE_COGNITO_SCOPE`	`COGNITO_SCOPE`
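The per-field precedence in the table can be sketched as follows. `resolveAgentCoreConfig` is a hypothetical helper, not the factory's actual code; it takes the environment as a parameter and throws a plain Error whose message lists every unresolved field.

```typescript
type Env = Record<string, string | undefined>;

interface AgentCoreOptions {
  gatewayUrl?: string;
  cognitoTokenUrl?: string;
  clientId?: string;
  clientSecret?: string;
  scope?: string;
}

// [primary env var, fallback env var] per option, as in the table above.
const FIELD_ENV: Record<keyof AgentCoreOptions, [string, string]> = {
  gatewayUrl: ["AGENTCORE_GATEWAY_MCP_URL", "GATEWAY_MCP_URL"],
  cognitoTokenUrl: ["AGENTCORE_COGNITO_TOKEN_URL", "COGNITO_TOKEN_URL"],
  clientId: ["AGENTCORE_COGNITO_CLIENT_ID", "COGNITO_CLIENT_ID"],
  clientSecret: ["AGENTCORE_COGNITO_CLIENT_SECRET", "COGNITO_CLIENT_SECRET"],
  scope: ["AGENTCORE_COGNITO_SCOPE", "COGNITO_SCOPE"],
};

function resolveAgentCoreConfig(
  options: AgentCoreOptions,
  env: Env,
): Required<AgentCoreOptions> {
  const resolved: Partial<Record<keyof AgentCoreOptions, string>> = {};
  const missing: string[] = [];
  for (const field of Object.keys(FIELD_ENV) as (keyof AgentCoreOptions)[]) {
    const [primary, fallback] = FIELD_ENV[field];
    // Precedence: explicit option, then namespaced env, then short env.
    const value = options[field] ?? env[primary] ?? env[fallback];
    if (value === undefined || value === "") missing.push(field);
    else resolved[field] = value;
  }
  if (missing.length > 0) {
    throw new Error(`missing_config: ${missing.join(", ")}`);
  }
  return resolved as Required<AgentCoreOptions>;
}
```

Collecting all missing fields before throwing mirrors the documented behavior of reporting every gap at once rather than one per run.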

`createWebSearchTool` — discriminated-union factory

For code that picks a provider dynamically, createWebSearchTool(provider, options) is a typed overload set. Each overload returns Promise<WebSearchToolInstance>; an unknown provider string throws CompresrError('invalid_provider').

Bring your own tool

Any function wrapped with LangChain.js's tool({...}) works. The string return value is compressed before the LLM sees it.

Streaming isn't on the agent layer yet: client.messages.stream(...) / client.chat.completions.stream(...) throw a CompresrError with code === 'not_implemented'. The compression-API stream (compressStream) is unaffected.

Research: `client.research`

When constructed with llm, the client also exposes client.research, a multi-step search-and-summarize loop that runs a web-search tool for you, compresses each snippet before it enters the LLM's context, and returns a structured result with citations. client.research.run(question) runs the full loop (up to maxSteps); client.research.search(question) is the same loop capped at 2 steps for quick lookups.

Accessing client.research when llm was not passed throws CompresrError('missing_llm') — the facade needs a chat model to reason about search results.

search: "tavily" | "brave" | StructuredToolInterface (optional)

Default: "tavily"

Provider string (uses env-var fallbacks) or a preconstructed WebSearchTool.

maxSteps: number (optional)

Default: 10

Upper bound on search / synthesize iterations. `.search()` overrides this to 2.

model: string | undefined (optional)

Default: undefined (uses client.llm)

Override the client-level model for this call.

compressSnippets: boolean (optional)

Default: true

Route each search snippet through the compression API before it enters the LLM context.

compressionModel: string (optional)

Default: "latte_v1"

Which model runs the snippet compression.

minCompressTokens: number (optional)

Default: 100

Skip compression for snippets shorter than this many tokens.

maxContextTokens: number (optional)

Default: 120_000

Hard ceiling on total tokens across all compressed snippets before synthesis.

systemPrompt: string | undefined (optional)

Default: undefined

Override the built-in system prompt (see DEFAULT_RESEARCH_SYSTEM_PROMPT).

ResearchResult fields:
- answer: string
- explanation: string
- confidence: number | null
- text: string
- citations: Citation[]
- trajectory: Step[]
- usage: ResearchUsage (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, calls, search_calls; all numbers, snake_case)
- raw: unknown (the raw provider response)

Each research Citation is { url: string; title?: string; snippet?: string }. It is distinct from the Citation on NormalizedResult (.citedText, .providerMetadata), which the run / messages / chat.completions surfaces return.

Per-call LLM knobs

Forwarded to the underlying chat model: temperature, topP, topK, maxTokens, maxOutputTokens, stop, stopSequences, presencePenalty, frequencyPenalty, seed, logprobs, topLogprobs. Anything else is silently dropped.

Gemini aliasing

When provider == 'google_genai' the SDK renames maxTokens → maxOutputTokens automatically. Pass maxTokens from any provider; the SDK will do the right thing.
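The forwarding and aliasing rules can be pictured with a short sketch. `forwardKnobs` is a hypothetical helper, not the SDK's implementation; the sketch assumes that an explicit maxOutputTokens takes priority over an aliased maxTokens when both are passed.

```typescript
// The documented allow-list of per-call knobs.
const FORWARDED_KNOBS = [
  "temperature", "topP", "topK", "maxTokens", "maxOutputTokens",
  "stop", "stopSequences", "presencePenalty", "frequencyPenalty",
  "seed", "logprobs", "topLogprobs",
] as const;

function forwardKnobs(
  provider: string,
  callOptions: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  // Anything outside the allow-list is silently dropped.
  for (const key of FORWARDED_KNOBS) {
    if (callOptions[key] !== undefined) out[key] = callOptions[key];
  }
  if (provider === "google_genai" && out["maxTokens"] !== undefined) {
    // Assumption: an explicit maxOutputTokens wins over the alias.
    if (out["maxOutputTokens"] === undefined) {
      out["maxOutputTokens"] = out["maxTokens"];
    }
    delete out["maxTokens"];
  }
  return out;
}

console.log(forwardKnobs("google_genai", { temperature: 0.2, maxTokens: 256, verbose: true }));
console.log(forwardKnobs("anthropic", { temperature: 0.2, maxTokens: 256, verbose: true }));
```

In both calls the unknown `verbose` key disappears; only the Gemini call renames maxTokens.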

Compression knobs: `compression: { ... }`

Set at client construction. Applies to every tool-output compression the middleware fires. The model-routing keys mirror compress() — compressionModelName picks the backbone, and the compression-shaping keys forward through to the same /compress/question-specific/ endpoint.

Shared keys (accepted regardless of compressionModelName):

Key	Default	Effect
`compressionModelName`	`'latte_v1'`	Backend validates; `'latte_v1'` and `'latte_v2'` are both public. See Models.
`targetCompressionRatio`	`0.5`	0–1 removal strength; `>1` = Nx factor (same as `compress` arg). Ignored on `latte_v2` when `dynamic: true`.
`minTokens`	`200`	Tool outputs shorter than this skip compression. Middleware-side gate; not forwarded to the API.
`coarse`	server default (`true`)	Paragraph-level vs token-level.
`allowTools`	`undefined`	Whitelist of tool names to compress.
`ignoreTools`	`undefined`	Blacklist of tool names to leave untouched.
`onError`	`'passthrough'`	`'raise'` to fail loudly on backend errors instead of returning the original tool output.

The middleware policy doesn't expose the dynamic* latte_v2-only knobs. If you need adaptive ratio selection on tool outputs, call client.compress(...) directly with dynamic: true instead of routing through the middleware.
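The middleware-side gate formed by minTokens, allowTools, and ignoreTools can be sketched as a single predicate. `shouldCompress` is a hypothetical function, not the middleware's actual code; it assumes ignoreTools takes priority when a tool appears in both lists, which the table does not specify.

```typescript
interface GatePolicy {
  minTokens?: number;
  allowTools?: string[];
  ignoreTools?: string[];
}

function shouldCompress(
  toolName: string,
  tokenCount: number,
  policy: GatePolicy = {},
): boolean {
  const minTokens = policy.minTokens ?? 200; // documented default
  // Outputs shorter than minTokens skip compression.
  if (tokenCount < minTokens) return false;
  // Assumption: the blacklist wins if a tool is on both lists.
  if (policy.ignoreTools?.includes(toolName)) return false;
  // With a whitelist set, only listed tools are compressed.
  if (policy.allowTools !== undefined && !policy.allowTools.includes(toolName)) {
    return false;
  }
  return true;
}

console.log(shouldCompress("web_search", 1500)); // long output, no lists set
console.log(shouldCompress("calculator", 1500, { allowTools: ["web_search"] }));
```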

7. Async

TypeScript is async by default

There are no _async variants in the TypeScript SDK. compress and compressBatch return Promise; compressStream returns AsyncGenerator. The examples in Sections 3, 4, and 5 are the async pattern. Python's _async twins exist because its sync surface is the default; that's the only meaningful difference between the SDKs.

8. Errors & types

Every Compresr error inherits from CompresrError. Catch the base for a single handler; catch subclasses when recovery differs. Every subclass carries a stable code string and, where relevant, structured attributes you can branch on (e.g. err.retryAfter, err.creditsRemaining, err.availableModels) instead of parsing prose.

Only the base classes are importable

Only these classes are exported from @compresr/sdk root and safe for instanceof checks: CompresrError, AuthenticationError, ValidationError, RateLimitError, ScopeError, NotFoundError, ServerError, ConnectionError. Every other row below ships as a CompresrError instance with the matching code — branch on err.code === 'insufficient_credits' (etc.) rather than err instanceof InsufficientCreditsError. The table enumerates the possible err.code values on CompresrError; the Importable column marks which rows also have a dedicated exported class.

Exception	HTTP	`code`	Importable	Structured fields
`AuthenticationError`	401	`authentication_error`	yes	—
`ScopeError`	403	`scope_error`	yes	`requiredScope`
`NotFoundError`	404	`not_found`	yes	—
`RateLimitError`	429	`rate_limit_exceeded`	yes	`retryAfter?: number`
`ValidationError`	400 / 422	`validation_error`	yes	—
`InsufficientCreditsError`	402	`insufficient_credits`	no (`CompresrError`)	`creditsRequired`, `creditsRemaining`
`BudgetLimitError`	402	`budget_limit_reached`	no (`CompresrError`)	`currentBudget`, `budgetUsed`
`ApiKeyBudgetError`	402	`api_key_budget_exceeded`	no (`CompresrError`)	`apiKeyBudget`, `apiKeyUsed`
`DailyLimitError`	429	`daily_limit_exceeded`	no (`CompresrError`)	`dailyLimit`, `requestsUsed`
`ModelNotFoundError`	404	`model_not_found`	no (`CompresrError`)	`modelName`, `availableModels`
`ContextWindowExceededError`	413	`context_window_exceeded`	no (`CompresrError`)	`maxTokens`, `actualTokens`
`ContentPolicyError`	400	`content_policy_violation`	no (`CompresrError`)	`provider`
`TargetAuthenticationError`	401	`target_authentication_error`	no (`CompresrError`)	`provider`
`ServiceUnavailableError`	503	`service_unavailable`	no (`CompresrError`)	`retryAfter?: number`
`ServerError`	5xx	`server_error`	yes	—
`TimeoutError`	—	`timeout`	no (`CompresrError`)	—
`ConnectionError`	—	`connection_error`	yes	—
`CompresrError`	—	(varies, e.g. `missing_llm`, `missing_model`, `missing_api_key`, `missing_config`, `invalid_config`, `invalid_llm_spec`, `invalid_provider`, `invalid_base_url`, `insecure_base_url`, `invalid_endpoint`, `invalid_response`, `response_too_large`, `not_implemented`, `missing_peer_dependency`, `agent_execution_failed`, `agentcore_no_tool`, `agentcore_bad_response`, `agentcore_auth_error`, `agentcore_tool_error`, `compresr_error`)	yes	base class; catch it as a fallback

RateLimitError.retryAfter can be undefined

When the server omits the Retry-After header, err.retryAfter reads as undefined. Guard with (err.retryAfter ?? 1) * 1000 when passing to setTimeout — a bare err.retryAfter * 1000 becomes NaN and fires immediately.

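Network failures throw ConnectionError, which is exported, so instanceof works and err.code === 'connection_error' also matches. Timeouts carry the code 'timeout'; per the table above they ship as CompresrError instances rather than as an importable TimeoutError class, so branch on err.code === 'timeout'.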
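A handler that combines instanceof checks, err.code checks, and the retryAfter guard might look like the sketch below. The two classes are local stand-ins so the example runs without the SDK installed; with the SDK you would import CompresrError and RateLimitError from @compresr/sdk instead. `planRecovery` and the chosen retry delays are illustrative, not SDK behavior.

```typescript
// Local stand-ins mirroring the documented shape (code, retryAfter).
class CompresrError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
    this.name = "CompresrError";
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class RateLimitError extends CompresrError {
  readonly retryAfter?: number;
  constructor(message: string, retryAfter?: number) {
    super(message, "rate_limit_exceeded");
    this.retryAfter = retryAfter;
    this.name = "RateLimitError";
  }
}

type Recovery =
  | { action: "retry"; delayMs: number }
  | { action: "fail"; reason: string };

function planRecovery(err: unknown): Recovery {
  if (err instanceof RateLimitError) {
    // Guard against a missing Retry-After header.
    return { action: "retry", delayMs: (err.retryAfter ?? 1) * 1000 };
  }
  if (err instanceof CompresrError) {
    switch (err.code) {
      case "timeout":
      case "connection_error":
        return { action: "retry", delayMs: 1000 }; // illustrative delay
      case "insufficient_credits":
        return { action: "fail", reason: "top up credits" };
      default:
        return { action: "fail", reason: err.code };
    }
  }
  throw err; // not a Compresr error: let it propagate
}

console.log(planRecovery(new RateLimitError("slow down")));
console.log(planRecovery(new CompresrError("no credits", "insufficient_credits")));
```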

Narrow with instanceof, not strings

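Do not branch on error.message or status-code strings; the SDK is free to refine wording across patch versions. For the exported classes, a check such as error instanceof RateLimitError is the stable contract; for every other row in the table, branch on the stable err.code value.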
