# https://compresr.ai/docs/sdks/python

> Human-readable page: https://compresr.ai/docs/sdks/python

The `compresr` package is the official Python client. It wraps the REST API with typed methods, handles auth, and ships async variants alongside the sync calls (streaming is currently sync-only). Requires Python 3.9+.

## 1. Install

Install from PyPI: `pip`, `poetry`, and `uv` all work.

pip:

```bash
pip install compresr
```

poetry:

```bash
poetry add compresr
```

uv:

```bash
uv add compresr
```

> **The agent client ships in the base install**
> As of `compresr 2.6.0`, the [agent client](#6-agent-client) layer (`client.messages.create`, `client.chat.completions.create`, `client.run`, `WebSearchTool`) is part of the base install, so `pip install compresr` is enough. LangChain, the provider chat-model integrations, and the Tavily/Brave dependencies are pulled in automatically. The old `compresr[agents]` / `compresr[agents-all]` extras still work as no-op aliases.

## 2. Initialize the client

Construct `CompressionClient` once at module scope and reuse it: the client keeps an internal `httpx` connection pool. Read the key from an environment variable; never hardcode it.

The constructor takes `api_key` (required), plus optional `base_url` (defaults to `https://api.compresr.ai`) and `timeout` (seconds; default uses the SDK's built-in timeout). Override `base_url` only for regional or self-hosted endpoints.

```python
import os
from compresr import CompressionClient

# Minimal (recommended)
client = CompressionClient(api_key=os.environ["COMPRESR_API_KEY"])

# Explicit overrides
client = CompressionClient(
    api_key=os.environ["COMPRESR_API_KEY"],
    base_url="https://api.compresr.ai",  # optional
    timeout=300,                          # optional, seconds
)
```

**TypeScript**

```typescript
import { CompressionClient } from '@compresr/sdk';

const client = new CompressionClient({
  apiKey: process.env.COMPRESR_API_KEY!,
});
```

**cURL**

```bash
# No client object - set the key and base URL once.
export COMPRESR_API_KEY="cmp_your_api_key"
export COMPRESR_BASE_URL="https://api.compresr.ai"
```

See [Authentication](/docs/authentication) for key rotation, budgets, and rules.

## 3. compress

Synchronous single-request compression. Pass `context`, `query`, and `compression_model_name="latte_v1"`; the model keeps the spans that matter for the query. For many chunks against one query, see [`compress_batch`](#5-batch); for token-by-token output, [`compress_stream`](#4-stream).

```python
result = client.compress(
    context=(
        "The James Webb Space Telescope (JWST) is a space telescope designed "
        "primarily to conduct infrared astronomy. Its 6.5-metre primary mirror "
        "is composed of 18 gold-coated hexagonal beryllium segments. JWST orbits "
        "the Sun near the Sun-Earth L2 Lagrange point, about 1.5 million "
        "kilometres from Earth, where its sunshield keeps the instruments "
        "below 50 K. Launched on 25 December 2021, it is operated jointly by "
        "NASA, ESA, and the Canadian Space Agency."
    ),
    query="What is the diameter of JWST's primary mirror?",
    compression_model_name="latte_v1",
    target_compression_ratio=0.5,
)

print(result.data.compressed_context)
print(
    f"{result.data.original_tokens} → {result.data.compressed_tokens} tokens "
    f"({result.data.actual_compression_ratio:.0%} removed)"  # ratio is the fraction removed (0–1)
)
```

**TypeScript**

```typescript
const result = await client.compress({
  context:
    "The James Webb Space Telescope (JWST) is a space telescope designed " +
    "primarily to conduct infrared astronomy. Its 6.5-metre primary mirror " +
    "is composed of 18 gold-coated hexagonal beryllium segments. JWST orbits " +
    "the Sun near the Sun-Earth L2 Lagrange point, about 1.5 million " +
    "kilometres from Earth, where its sunshield keeps the instruments " +
    "below 50 K. Launched on 25 December 2021, it is operated jointly by " +
    "NASA, ESA, and the Canadian Space Agency.",
  query: "What is the diameter of JWST's primary mirror?",
  compressionModelName: 'latte_v1',
  targetCompressionRatio: 0.5,
});

console.log(result.data.compressed_context);
```

**cURL**

```bash
curl -X POST https://api.compresr.ai/api/compress/question-specific/ \
  -H "X-API-Key: $COMPRESR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "context": "The James Webb Space Telescope (JWST) is a space telescope...",
    "query": "What is the diameter of JWST'"'"'s primary mirror?",
    "compression_model_name": "latte_v1",
    "target_compression_ratio": 0.5
  }'
```

### Parameters

`latte_v2` accepts every parameter `latte_v1` accepts, **plus** three `latte_v2`-only knobs for dynamic compression-ratio selection. See the [Models reference](/docs/api-reference/models) for the canonical decision guide and the at-a-glance support matrix.

#### Shared parameters (both models)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `context` | string | yes | The long text to compress: RAG chunks, document body, chat history. |
| `query` | string | yes | The question the compressed context must still answer. Required for both models. |
| `compression_model_name` | "latte_v1" \| "latte_v2" | yes | Routes the call. `latte_v2` is the recommended default. See the [Models](/docs/api-reference/models) reference. |
| `target_compression_ratio` | number | no | Removal strength when `0 < r ≤ 1`, or Nx target when `r > 1`. See [Models › target_compression_ratio](/docs/api-reference/models#target_compression_ratio). Ignored on `latte_v2` when `dynamic=True`. |
| `coarse` | boolean \| None | no | `None` = backend default (paragraph-level); `True` locks paragraph-level; `False` opts into token-level precision. |
| `heuristic_chunking` | boolean \| None | no | Heuristic splitter (paragraphs, code blocks) instead of fixed-size chunks. |
| `disable_placeholders` | boolean \| None | no | Skip the `[...]` placeholders inserted where content was dropped. |

#### `latte_v2`-only parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `dynamic` | boolean | no | Pick the compression ratio per-input via Kneedle elbow selection inside `[dynamic_min_ratio, dynamic_max_ratio]`; overrides `target_compression_ratio` when `True`. Rejected on `latte_v1` with `ValidationError`. |
| `dynamic_min_ratio` | float \| None | no | Floor on the chosen Nx ratio when `dynamic=True`. Must be `≥ 1.0`. Only consulted when `dynamic=True`. |
| `dynamic_max_ratio` | float \| None | no | Ceiling on the chosen Nx ratio when `dynamic=True`. Must be `≥ 1.0`. Only consulted when `dynamic=True`. |
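
The model-specific rules above can be checked locally before a request goes out. The sketch below assembles keyword arguments for `client.compress()`; `build_compress_kwargs` is a hypothetical helper, not part of `compresr`, and it raises plain `ValueError` where the server would answer with `ValidationError`. Dropping `target_compression_ratio` when `dynamic=True` follows the "ignored when dynamic" rule; the exact server-side validation may be stricter.

```python
from typing import Optional


def build_compress_kwargs(
    context: str,
    query: str,
    compression_model_name: str,
    target_compression_ratio: Optional[float] = None,
    dynamic: bool = False,
    dynamic_min_ratio: Optional[float] = None,
    dynamic_max_ratio: Optional[float] = None,
) -> dict:
    """Hypothetical helper: assemble client.compress() kwargs, enforcing the documented rules."""
    if compression_model_name not in ("latte_v1", "latte_v2"):
        raise ValueError(f"unknown model: {compression_model_name!r}")
    kwargs = {
        "context": context,
        "query": query,
        "compression_model_name": compression_model_name,
    }
    if dynamic:
        if compression_model_name != "latte_v2":
            raise ValueError("dynamic=True is latte_v2-only")
        kwargs["dynamic"] = True
        for name, value in (
            ("dynamic_min_ratio", dynamic_min_ratio),
            ("dynamic_max_ratio", dynamic_max_ratio),
        ):
            if value is not None:
                if value < 1.0:
                    raise ValueError(f"{name} must be >= 1.0")
                kwargs[name] = value
        # target_compression_ratio is ignored when dynamic=True, so it is not sent.
    elif target_compression_ratio is not None:
        if target_compression_ratio <= 0:
            raise ValueError("target_compression_ratio must be > 0")
        kwargs["target_compression_ratio"] = target_compression_ratio
    # dynamic_min/max_ratio are only consulted when dynamic=True, so they are dropped otherwise.
    return kwargs
```

Usage: `client.compress(**build_compress_kwargs(doc, "Who signed?", "latte_v2", dynamic=True, dynamic_min_ratio=2.0))`.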

### Response

`compress()` returns a typed object; access fields as attributes (`result.data.compressed_context`). Response field names stay snake_case across every SDK.

| Field | Type | Description |
| --- | --- | --- |
| `data` | object | Response payload containing the fields below. |
| `data.compressed_context` | string | The compressed text, ready to drop into your prompt. |
| `data.original_tokens` | integer | Token count of the input context (tiktoken cl100k). |
| `data.compressed_tokens` | integer | Token count of the compressed output. |
| `data.tokens_saved` | integer | original_tokens − compressed_tokens. |
| `data.actual_compression_ratio` | number | Fraction of input tokens actually removed (0–1). e.g. 0.5 = ~50% removed. |
| `data.duration_ms` | integer | Server-side wall-clock time for the compression pass. |
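
The token fields are related by simple arithmetic: `tokens_saved = original_tokens − compressed_tokens`, and `actual_compression_ratio` is the fraction removed. The "Nx" view used by `target_compression_ratio > 1` is `original / compressed`. The sketch below, with the hypothetical helper `compression_stats`, derives all three from the two counts, assuming the server computes the ratio as `tokens_saved / original_tokens` (its rounding may differ).

```python
from typing import NamedTuple


class CompressionStats(NamedTuple):
    tokens_saved: int
    fraction_removed: float  # same scale as data.actual_compression_ratio
    factor: float            # "Nx" view: original / compressed


def compression_stats(original_tokens: int, compressed_tokens: int) -> CompressionStats:
    """Hypothetical helper: derive the ratio fields from the two token counts."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - compressed_tokens
    fraction_removed = saved / original_tokens
    factor = original_tokens / compressed_tokens if compressed_tokens else float("inf")
    return CompressionStats(saved, fraction_removed, factor)


stats = compression_stats(1000, 250)
print(stats)  # CompressionStats(tokens_saved=750, fraction_removed=0.75, factor=4.0)
```

A 0.75 fraction removed and a 4x factor describe the same result; only the convention differs.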

## 4. Stream

`client.compress_stream(...)` returns an iterator of chunk objects with `content` and `done` attributes, yielded as the model produces them; the final chunk has `done=True` and empty `content`. Use it wherever time-to-first-token matters (UIs, agent loops); for one-shot calls stick with [`compress`](#3-compress).

```python
for chunk in client.compress_stream(
    context=long_document,
    query="What was the project's Q3 churn rate?",
    compression_model_name="latte_v1",
):
    print(chunk.content, end="", flush=True)
    if chunk.done:
        break
```

**TypeScript**

```typescript
for await (const chunk of client.compressStream({
  context: longDocument,
  query: "What was the project's Q3 churn rate?",
  compressionModelName: 'latte_v1',
})) {
  process.stdout.write(chunk.content);
  if (chunk.done) break;
}
```

**cURL**

```bash
curl -N -X POST https://api.compresr.ai/api/compress/question-specific/stream \
  -H "X-API-Key: $COMPRESR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "context": "<your long document>",
    "query": "What was the project'"'"'s Q3 churn rate?",
    "compression_model_name": "latte_v1"
  }'
```

The iterator is a normal generator: wrap it in `itertools.islice`, push chunks through a queue, or consume from a worker thread. Same `context` / `query` / `compression_model_name` rules as `compress()`.
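
For the worker-thread pattern, a minimal sketch: drain the stream on a background thread and hand text pieces to the consumer through a `queue.Queue`. `stream_in_background` and `iter_queue` are hypothetical helpers, not part of `compresr`; they assume only that each chunk exposes `.content` and `.done` as in the loop above, and they forward any exception raised while iterating (for example a `CompresrError`) to the consumer.

```python
import queue
import threading
from typing import Iterable, Iterator

_SENTINEL = object()  # marks the end of the stream on the queue


def stream_in_background(chunks: Iterable) -> queue.Queue:
    """Hypothetical helper: consume a compress_stream()-style iterator on a worker thread."""
    q: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            for chunk in chunks:
                if chunk.content:
                    q.put(chunk.content)
                if chunk.done:
                    break
        except Exception as exc:  # hand errors to the consumer instead of losing them
            q.put(exc)
        finally:
            q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    return q


def iter_queue(q: queue.Queue) -> Iterator[str]:
    """Yield text pieces until the sentinel arrives; re-raise forwarded errors."""
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Usage: `q = stream_in_background(client.compress_stream(context=doc, query=q_text, compression_model_name="latte_v1"))`, then `for piece in iter_queue(q): print(piece, end="")`. Whether the HTTP request starts at call time or on first iteration depends on the SDK; either way, iteration happens on the worker.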

## 5. Batch

`client.compress_batch(...)` compresses many contexts in one request. Pass `contexts: list[str]` plus either a single `queries: str` (applied to every context) or a `queries: list[str]` matching `contexts` in length. Cheaper than firing N concurrent `compress()` calls, and ideal for RAG re-ranking or bulk document processing.

```python
# One query against many candidate contexts (typical RAG re-ranking shape).
result = client.compress_batch(
    contexts=[chunk_1, chunk_2, chunk_3, chunk_4],
    queries="What did the customer cite as the reason for churn?",
    compression_model_name="latte_v1",
)

# Or per-context queries: one query per context, same length.
result = client.compress_batch(
    contexts=[doc_a, doc_b, doc_c],
    queries=[
        "Who signed the contract?",
        "When was the renewal date?",
        "What was the agreed unit price?",
    ],
    compression_model_name="latte_v1",
)

for item in result.data.results:
    print(item.compressed_context)
```

**TypeScript**

```typescript
// One query against many candidate contexts.
const result = await client.compressBatch({
  contexts: [chunk1, chunk2, chunk3, chunk4],
  queries: 'What did the customer cite as the reason for churn?',
  compressionModelName: 'latte_v1',
});

// Or per-context queries: same length as contexts.
const perItem = await client.compressBatch({
  contexts: [docA, docB, docC],
  queries: [
    'Who signed the contract?',
    'When was the renewal date?',
    'What was the agreed unit price?',
  ],
  compressionModelName: 'latte_v1',
});

for (const item of result.data.results) {
  console.log(item.compressed_context);
}
```

**cURL**

```bash
curl -X POST https://api.compresr.ai/api/compress/question-specific/batch \
  -H "X-API-Key: $COMPRESR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contexts": ["<chunk 1>", "<chunk 2>", "<chunk 3>", "<chunk 4>"],
    "queries": "What did the customer cite as the reason for churn?",
    "compression_model_name": "latte_v1"
  }'
```

`queries` is either a string (applied to every context) or a list matching `contexts` in length; mixing the two raises `ValidationError`. Per-item results in `result.data.results` carry the same fields as a single `compress()` call **except** `target_compression_ratio` (request-level only). `result.data` also exposes batch-wide aggregates: `count`, `total_original_tokens`, `total_compressed_tokens`, `total_tokens_saved`, and `average_compression_ratio`.
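
Checking the `queries` shape before sending a large batch makes a mismatch fail fast, before any network call. The sketch below uses a hypothetical helper, `normalize_queries` (not part of `compresr`), that mirrors the documented rule. Its output is a valid `queries=` list for `compress_batch()`.

```python
from typing import List, Sequence, Union


def normalize_queries(contexts: Sequence[str], queries: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: expand `queries` into exactly one query per context.

    A single string applies to every context; a list must match `contexts` in length.
    """
    if isinstance(queries, str):
        return [queries] * len(contexts)
    queries = list(queries)
    if len(queries) != len(contexts):
        raise ValueError(
            f"got {len(queries)} queries for {len(contexts)} contexts; "
            "pass one string or a list of the same length"
        )
    if not all(isinstance(q, str) for q in queries):
        raise TypeError("every query must be a string")
    return queries
```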

## 6. Agent client

Construct `CompressionClient` with `llm=` and you get an **agent surface**: three call shapes (Anthropic-style `messages.create`, OpenAI-style `chat.completions.create`, and native `run`) that auto-compress every tool output above `min_tokens` before the LLM sees it. All three are built on LangChain 1.0's `create_agent` and the SDK's `CompresrToolMiddleware`. Use it as a drop-in for `anthropic.Anthropic()` / `openai.OpenAI()`; for raw `(context, query)` calls stick with [`compress`](#3-compress).

> These surfaces are SDK-shaped and have no direct cURL equivalent. The underlying compression still goes through the same `/api/compress/question-specific/` endpoint, which the middleware calls whenever a tool output clears the `min_tokens` gate.

### Construct with `llm=`

Provider lives on the client; **model lives at the call site**. Swap providers by changing one string; the tools and calling code stay the same:

```python
import os
from compresr import CompressionClient, WebSearchTool

client = CompressionClient(
    api_key=os.environ["COMPRESR_API_KEY"],
    llm="anthropic",                                # or "openai", "google_genai"
    llm_api_key=os.environ["ANTHROPIC_API_KEY"],
    compression={"target_compression_ratio": 0.5, "min_tokens": 300},
)
```

**TypeScript**

```typescript
import { CompressionClient, WebSearchTool } from '@compresr/sdk';

const client = new CompressionClient({
  apiKey: process.env.COMPRESR_API_KEY!,
  llm: 'anthropic',                                 // or 'openai', 'google_genai'
  llmApiKey: process.env.ANTHROPIC_API_KEY!,
  compression: { targetCompressionRatio: 0.5, minTokens: 300 },
});
```

The `llm` string accepts `"anthropic"` (provider only, every call must pass `model="..."`), `"anthropic:claude-haiku-4-5"` (default model, overridable at call site), or `"anthropic/claude-haiku-4-5"` (Vercel AI SDK convention; both separators accepted). If neither provides a model, the SDK raises `CompresrError("model is required …")`.
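
The three `llm` forms and the model-resolution rule can be summarized in a short sketch. `parse_llm_spec` and `resolve_model` are hypothetical helpers that restate the rules above. They are not the SDK's parser. Splitting on the first `:` or `/` is an assumption, and the real SDK raises `CompresrError` where this sketch raises `ValueError`.

```python
from typing import NamedTuple, Optional


class LLMSpec(NamedTuple):
    provider: str
    model: Optional[str]


def parse_llm_spec(spec: str) -> LLMSpec:
    """Hypothetical: split "provider", "provider:model" or "provider/model"."""
    if not spec:
        raise ValueError("llm spec is empty")
    for i, ch in enumerate(spec):
        if ch in ":/":  # both separators accepted; split on the first one
            provider, model = spec[:i], spec[i + 1:]
            if not provider or not model:
                raise ValueError(f"malformed llm spec: {spec!r}")
            return LLMSpec(provider, model)
    return LLMSpec(spec, None)  # provider only: every call must pass model=


def resolve_model(spec: str, call_site_model: Optional[str] = None) -> str:
    """The call-site model wins; otherwise fall back to the spec's default model."""
    model = call_site_model or parse_llm_spec(spec).model
    if model is None:
        raise ValueError("model is required: pass model=... or use 'provider:model'")
    return model
```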

### Three call shapes

`messages.create` duck-types `anthropic.types.Message`, `chat.completions.create` duck-types `openai.types.chat.ChatCompletion`, and `run` returns a native `NormalizedResult` (`.text`, `.tool_uses`, `.citations`, `.stop_reason`, `.usage`).

```python
tavily = WebSearchTool.tavily(api_key=os.environ["TAVILY_API_KEY"], max_results=3)
messages = [{"role": "user", "content": "What's the latest AI news?"}]

# Anthropic shape
msg = client.messages.create(model="claude-haiku-4-5", max_tokens=512, messages=messages, tools=[tavily])
# OpenAI shape
completion = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=[tavily])
# Native
result = client.run(prompt="What's the latest AI news?", model="claude-haiku-4-5", tools=[tavily], max_tokens=512)
```

**TypeScript**

```typescript
const search = await WebSearchTool.tavily({ apiKey: process.env.TAVILY_API_KEY!, maxResults: 3 });
const messages = [{ role: 'user' as const, content: "What's the latest AI news?" }];

// Anthropic shape
const msg = await client.messages.create({ model: 'claude-haiku-4-5', maxTokens: 512, messages, tools: [search] });
// OpenAI shape
const completion = await client.chat.completions.create({ model: 'gpt-4o-mini', messages, tools: [search] });
// Native
const result = await client.run({ prompt: "What's the latest AI news?", model: 'claude-haiku-4-5', tools: [search], maxTokens: 512 });
```

Python also exposes async variants: `acreate`, `arun`. TypeScript is async by default.

### Web search: `WebSearchTool`

Backed by Tavily (default) or Brave. The returned object is a real LangChain `BaseTool`; its output flows through `CompresrToolMiddleware` automatically.

```python
from compresr import WebSearchTool

tavily = WebSearchTool.tavily(
    api_key=os.environ["TAVILY_API_KEY"],
    max_results=5,
    allowed_domains=["nytimes.com"],   # optional
    blocked_domains=["example.com"],   # optional
)

brave = WebSearchTool.brave(api_key=os.environ["BRAVE_API_KEY"], max_results=5)
```

**TypeScript**

```typescript
import { WebSearchTool } from '@compresr/sdk';

const tavily = await WebSearchTool.tavily({
  apiKey: process.env.TAVILY_API_KEY!,
  maxResults: 5,
  allowedDomains: ['nytimes.com'],
});

const brave = await WebSearchTool.brave({
  apiKey: process.env.BRAVE_API_KEY!,
  maxResults: 5,
});
```

> **Why not Anthropic / OpenAI / Gemini server search?**
> Provider-native server search tools (`web_search_20250305`, `web_search_preview`, `google_search`) execute server-side and return opaque/encrypted content that Compresr cannot read or compress. Use Tavily or Brave so the result is plaintext. See the [Web search guide](/docs/guides/web-search).

### Bring your own tool

Any LangChain `@tool`-decorated function works. Its string return value goes through the same compression middleware (including the `min_tokens` gate) before the LLM sees it.

```python
from langchain_core.tools import tool

# Placeholder knowledge base; replace with your own lookup.
INTERNAL_KB = {"refunds": "<your refund policy text>"}

@tool
def kb_lookup(topic: str) -> str:
    """Look up the internal policy on the given topic."""
    return INTERNAL_KB.get(topic, "Not found.")

client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Refund policy?"}],
    tools=[kb_lookup],
)
```

**TypeScript**

```typescript
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

// Placeholder knowledge base; replace with your own lookup.
const INTERNAL_KB: Record<string, string> = { refunds: '<your refund policy text>' };

const kbLookup = tool(async ({ topic }) => INTERNAL_KB[topic] ?? 'Not found.', {
  name: 'kb_lookup',
  description: 'Look up the internal policy on the given topic.',
  schema: z.object({ topic: z.string() }),
});

await client.messages.create({
  model: 'claude-haiku-4-5',
  maxTokens: 256,
  messages: [{ role: 'user', content: 'Refund policy?' }],
  tools: [kbLookup],
});
```

Streaming isn't on the agent layer yet: `client.messages.stream(...)` / `client.chat.completions.stream(...)` throw `CompresrError('streaming not yet implemented')`. The compression-API stream ([`compress_stream`](#4-stream)) is unaffected.

### Per-call LLM knobs

Forwarded to the underlying chat model: `temperature`, `top_p`, `top_k`, `max_tokens`, `max_output_tokens`, `stop`, `stop_sequences`, `presence_penalty`, `frequency_penalty`, `seed`, `logprobs`, `top_logprobs`. Any other keyword argument is silently dropped.

```python
# `messages` and `tavily` as defined in "Three call shapes" above.
client.messages.create(model="claude-haiku-4-5", max_tokens=512,
    temperature=0.2, top_p=0.9, messages=messages, tools=[tavily])
```

**TypeScript**

```typescript
await client.messages.create({ model: 'claude-haiku-4-5', maxTokens: 512,
  temperature: 0.2, topP: 0.9, messages, tools: [search] });
```

> **Gemini aliasing**
> When `provider == "google_genai"` the SDK renames `max_tokens` → `max_output_tokens` automatically. Pass `max_tokens` from any provider; the SDK will do the right thing.
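
To check which knobs survive, the allowlist and the Gemini alias can be restated as a small filter. `forwarded_llm_kwargs` is a hypothetical helper, not the SDK's code. The allowlist is copied from the list above. Letting an explicit `max_output_tokens` win over an aliased `max_tokens` is an assumption about precedence.

```python
from typing import Any, Dict

# Allowlist copied from the "Per-call LLM knobs" list above.
FORWARDED_KNOBS = frozenset({
    "temperature", "top_p", "top_k", "max_tokens", "max_output_tokens",
    "stop", "stop_sequences", "presence_penalty", "frequency_penalty",
    "seed", "logprobs", "top_logprobs",
})


def forwarded_llm_kwargs(provider: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical: keep allowlisted knobs only, then apply the Gemini max_tokens alias."""
    kept = {k: v for k, v in kwargs.items() if k in FORWARDED_KNOBS}  # others are dropped
    if provider == "google_genai" and "max_tokens" in kept:
        value = kept.pop("max_tokens")
        # Assumption: an explicit max_output_tokens takes precedence over the alias.
        kept.setdefault("max_output_tokens", value)
    return kept
```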

### Compression knobs: `compression={...}`

Set at client construction; applies to every tool-output compression the middleware fires. The model-routing keys mirror [`compress()`](#3-compress): `compression_model_name` picks the backbone, and the compression-shaping keys are forwarded to the same `/api/compress/question-specific/` endpoint.

**Shared keys (accepted regardless of `compression_model_name`):**

| Key | Default | Effect |
|---|---|---|
| `compression_model_name` | `"latte_v1"` | Backend validates; `"latte_v1"` and `"latte_v2"` are both public. See [Models](/docs/api-reference/models). |
| `target_compression_ratio` | `0.5` | 0–1 removal strength; `>1` = Nx factor (same as [`compress`](#3-compress) arg). Ignored on `latte_v2` when `dynamic=True`. |
| `min_tokens` | `200` | Tool outputs shorter than this skip compression. Middleware-side gate; not forwarded to the API. |
| `coarse` | server default (`True`) | Paragraph-level vs token-level. |
| `heuristic_chunking` | server default (`False`) | Structure-aware chunker before scoring. |
| `disable_placeholders` | server default (`False`) | Drop the `[...]` markers between kept spans. |
| `allow_tools` | `None` | Whitelist of tool names to compress. |
| `ignore_tools` | `None` | Blacklist of tool names to leave untouched. |
| `on_error` | `"passthrough"` | `"raise"` to fail loudly on backend errors instead of returning the original tool output. |

**`latte_v2`-only keys** (set `compression_model_name="latte_v2"` first; the backend rejects these with `422` on `latte_v1`):

| Key | Default | Effect |
|---|---|---|
| `dynamic` | `False` | Pick the compression ratio per-tool-output via Kneedle elbow selection. Overrides `target_compression_ratio` when `True`. |
| `dynamic_min_ratio` | `1.5` | Floor on the chosen Nx ratio when `dynamic=True`. Must be `≥ 1.0`. |
| `dynamic_max_ratio` | `10.0` | Ceiling on the chosen Nx ratio when `dynamic=True`. Must be `≥ 1.0`. |
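
The middleware-side keys (`min_tokens`, `allow_tools`, `ignore_tools`, `on_error`) combine into a per-tool-output decision. The sketch below restates that decision with two hypothetical helpers, `should_compress` and `compress_tool_output`; it does not reproduce `CompresrToolMiddleware`'s source. How output tokens are counted, the order of the checks, and the outcome for a tool listed in both `allow_tools` and `ignore_tools` are assumptions.

```python
from typing import Callable, Collection, Optional


def should_compress(
    tool_name: str,
    output_tokens: int,
    min_tokens: int = 200,
    allow_tools: Optional[Collection[str]] = None,
    ignore_tools: Optional[Collection[str]] = None,
) -> bool:
    """Hypothetical: decide whether one tool output is sent for compression."""
    if output_tokens < min_tokens:
        return False  # short outputs skip compression
    if allow_tools is not None and tool_name not in allow_tools:
        return False  # an allowlist is set and this tool is not on it
    if ignore_tools is not None and tool_name in ignore_tools:
        return False  # explicitly left untouched
    return True


def compress_tool_output(
    output: str,
    compress: Callable[[str], str],
    on_error: str = "passthrough",
) -> str:
    """Hypothetical: run `compress`, applying the on_error policy on failure."""
    try:
        return compress(output)
    except Exception:
        if on_error == "raise":
            raise
        return output  # "passthrough": the LLM sees the original tool output
```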

## 7. Async

`compress_async` and `compress_batch_async` are the async twins of `compress` and `compress_batch`: same params, return awaitables. Streaming is sync-only (no `compress_stream_async`). Call `await client.aclose()` when done to release the `httpx` pool, or use the client as an async context manager (`async with CompressionClient(...) as client:`). Use these inside event loops (FastAPI handlers, Discord bots, agent runtimes); for scripts the sync methods are simpler.

```python
import asyncio
import os
from compresr import CompressionClient

async def main():
    client = CompressionClient(api_key=os.environ["COMPRESR_API_KEY"])
    try:
        result = await client.compress_async(
            context=long_document,
            query="What was the project's Q3 churn rate?",
            compression_model_name="latte_v1",
        )
        print(result.data.compressed_context)
    finally:
        await client.aclose()

asyncio.run(main())
```

**TypeScript**

```typescript
// The TypeScript SDK is already async. compress, compressStream,
// and compressBatch return Promise or AsyncGenerator - no _async variants.
const result = await client.compress({
  context: longDocument,
  query: "What was the project's Q3 churn rate?",
  compressionModelName: 'latte_v1',
});
console.log(result.data.compressed_context);
```

**cURL**

```bash
# HTTP is already request/response - there is no async pattern specific to cURL.
# For token-by-token output, use the /stream endpoint (Section 4).
# For concurrency, fire requests in parallel with shell jobs:
curl ... &
curl ... &
wait
```

## 8. Errors & types

Every Compresr error inherits from `CompresrError`. Catch the base for a single handler; catch subclasses when recovery differs.

- **`AuthenticationError`**: `401`. Missing, malformed, or revoked key. Rotate it.
- **`RateLimitError`**: `429`. Carries a `retry_after` attribute (seconds). Back off and retry.
- **`ValidationError`**: `400` / `422`. Request body failed validation (e.g. `target_compression_ratio` out of range, missing `query`). Fix the payload.
- **`CompresrError`**: base class. Network errors, `5xx`, or anything unexpected.

```python
import time
import logging
from compresr import (
    CompresrError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
)

logger = logging.getLogger(__name__)

try:
    result = client.compress(
        context=long_document,
        query="What was the project's Q3 churn rate?",
        compression_model_name="latte_v1",
    )
except RateLimitError as e:
    time.sleep(e.retry_after)
    # ...retry
except AuthenticationError:
    raise RuntimeError("COMPRESR_API_KEY is invalid or revoked")
except ValidationError as e:
    raise ValueError(f"Bad request: {e}") from e
except CompresrError as e:
    logger.exception("Compression failed: %s", e)
    raise
```

**TypeScript**

```typescript
import {
  CompresrError,
  AuthenticationError,
  RateLimitError,
  ValidationError,
} from '@compresr/sdk';

try {
  const result = await client.compress({
    context: longDocument,
    query: "What was the project's Q3 churn rate?",
    compressionModelName: 'latte_v1',
  });
} catch (error: unknown) {
  if (error instanceof RateLimitError) {
    await new Promise((r) => setTimeout(r, error.retryAfter * 1000));
    // ...retry
  } else if (error instanceof AuthenticationError) {
    throw new Error('COMPRESR_API_KEY is invalid or revoked');
  } else if (error instanceof ValidationError) {
    throw new Error(`Bad request: ${error.message}`);
  } else {
    // CompresrError (network, 5xx) and anything unexpected: surface it.
    throw error;
  }
}
```

**cURL**

```bash
# Branch on HTTP status. The response body always has a stable error envelope:
# { "success": false, "error": "...", "code": "..." }
status=$(curl -s -o body.json -D headers.txt -w "%{http_code}" \
  -X POST https://api.compresr.ai/api/compress/question-specific/ \
  -H "X-API-Key: $COMPRESR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"context":"...","query":"...","compression_model_name":"latte_v1"}')

if [ "$status" = "429" ]; then
  retry_after=$(grep -i '^Retry-After:' headers.txt | awk '{print $2}' | tr -d '\r')
  sleep "${retry_after:-1}"
  # ...retry
elif [ "$status" -ge 400 ]; then
  echo "Request failed ($status):" >&2
  cat body.json >&2
  exit 1
fi
```

> **Always handle 429**
> The default tier has tight per-minute limits. A retry loop with exponential backoff (respecting `retry_after`) is the single most important piece of error handling for production.
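
A retry loop along those lines, as a minimal sketch: `call_with_retries` is a hypothetical helper, not part of `compresr`. It retries on the exception types you pass in (for example `(RateLimitError,)`) with exponential backoff and "full jitter". It waits at least as long as a numeric `retry_after` attribute when the exception carries one. The jitter strategy, attempt count, and delay caps are illustrative choices.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical: call fn(), retrying on `retry_on` with backoff that respects retry_after."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            backoff = min(max_delay, base_delay * 2 ** attempt)
            delay = random.uniform(0, backoff)  # full jitter
            retry_after: Optional[float] = getattr(exc, "retry_after", None)
            if retry_after is not None:
                delay = max(delay, float(retry_after))  # never retry earlier than the server asks
            sleep(delay)
    raise RuntimeError("unreachable")
```

Usage: `result = call_with_retries(lambda: client.compress(context=doc, query=q_text, compression_model_name="latte_v1"), retry_on=(RateLimitError,))`.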