SDKs

cURL / HTTP

Call the Compresr REST API directly with cURL or any HTTP client.

The Compresr REST API is a small, ordinary HTTPS surface: JSON in, JSON out, X-API-Key for auth. Any HTTP client in any language can call it. cURL is the lingua franca for examples on this page.

Prefer a typed client? See the Python SDK or the TypeScript SDK. Every section below mirrors those pages one-to-one.

Field names on the wire

Every field - request and response - is snake_case. Both SDKs serialize to and from this shape. The cURL examples on this page show the canonical wire format.

1. Install

Nothing to install - cURL ships with macOS, Linux, and modern Windows. Put your key in an environment variable so it never lands in shell history or a copy-pasted snippet.

bash

For permanence, add the export line to ~/.zshrc / ~/.bashrc, or set a user-level Windows env var with setx COMPRESR_API_KEY "cmp_..." (new shells only).

2. Initialize the client

cURL has no client object. The two things you reuse on every request are the base URL and the X-API-Key header.

Base URL: https://api.compresr.ai
Auth header: X-API-Key: cmp_...
Content type: application/json on every POST

python

Requests missing the X-API-Key header return 422 with FastAPI's default validation shape ({"detail":[{"loc":[...], "msg":..., "type":...}]}). A present-but-invalid or revoked key returns 401 with the standard error envelope. See Authentication for the full security guidance; the rules are language-independent.

3. POST /compress/question-specific

What it does. The single-request endpoint. Send a JSON body with context, query, and compression_model_name: "latte_v2". The response carries the compressed context and token-accounting metadata.

When to use it. One chunk of text, one question. For many chunks against the same query, use /batch. For an SSE-framed response instead of a plain JSON body, use /stream — note that today it emits a single final frame, not per-token output.

python

For large payloads, prefer --data @body.json (read JSON from a file) over inlining; cURL's single-line string handling does not love embedded newlines or apostrophes in the body.

Request body

latte_v2 accepts every parameter latte_v1 accepts, plus three latte_v2-only knobs for dynamic compression-ratio selection. See the Models reference for the canonical decision guide.

Shared parameters (both models)

contextstringOptional

The long text you want compressed. JSON-escape any embedded quotes or newlines. If null or empty, the response returns an empty compressed_context with zero tokens — no error, no billing.

querystringRequired

The question or intent the compressed context must still be able to answer. Required for both models.

compression_model_name"latte_v1" | "latte_v2"Required

Routes the call. latte_v2 is the recommended default. See the Models reference.

target_compression_rationumberOptional

Removal strength when 0 < r ≤ 1, Nx target when r > 1. Server caps at 200. Omit for the model default. See Models › target_compression_ratio. Ignored on latte_v2 when dynamic=true.

coarsebooleanOptional

Default: omitted

Omit for the backend default (paragraph-level scoring). Send false to opt into token-level precision (slower, higher fidelity). Sending JSON null returns 422.

heuristic_chunkingbooleanOptional

Default: omitted

Heuristic splitter (paragraphs, code blocks) instead of the default fixed-size chunker. Omit for the backend default. Sending JSON null returns 422.

disable_placeholdersbooleanOptional

Default: omitted

Skip the [...] placeholders the model normally inserts where content was dropped. Omit for the backend default. Sending JSON null returns 422.

`latte_v2`-only parameters

dynamicbooleanOptional

Default: false

Pick the compression ratio per-input automatically inside [dynamic_min_ratio, dynamic_max_ratio]; overrides target_compression_ratio when true. Rejected on latte_v1 with 422.

dynamic_min_rationumberOptional

Default: omitted (server default 1.5)

Floor on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0. Only consulted when dynamic=true.

dynamic_max_rationumberOptional

Default: omitted (server default 10.0)

Ceiling on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0. Only consulted when dynamic=true.

Response

A 200 OK returns a JSON object with this shape. Field names are stable across all SDKs.

200 OK

dataobject
- compressed_contextstring
  The compressed text, ready to drop into your prompt.
- original_tokensinteger
  Token count of the input context (tiktoken cl100k).
- compressed_tokensinteger
  Token count of the compressed output.
- tokens_savedinteger
  original_tokens − compressed_tokens.
- actual_compression_rationumber
  Fraction of input tokens actually removed (0–1). e.g. 0.5 = ~50% removed.
- duration_msinteger
  Server-side wall-clock time for the compression pass.

4. POST /compress/question-specific/stream

What it does. Returns Server-Sent Events (SSE). Today the endpoint waits for the compression pass to finish and emits exactly one data: {...}\n\n frame with { content, done: true } carrying the full compressed context. The SSE framing exists for HTTP-level backpressure and forward compatibility with per-token streaming — not for incremental output today.

When to use it. When you want an SSE-shaped response (single frame, done: true) instead of a plain JSON body — for example, when your client is already wired for SSE. Pass curl -N to disable output buffering. Parse each data: frame and stop when you see done: true.

python

In your own HTTP client, split the response body on \n\n, drop the data: prefix from each frame, and JSON.parse what is left. Stop when you see done: true. There is no [DONE] sentinel — the done: true field on the JSON payload is the terminator.

5. POST /compress/question-specific/batch

What it does. Accepts inputs: an array of { context, query } objects (max 100 per request). Each row carries its own query, so you can mix queries within one call. The response carries one result per input, in input order, plus aggregate token counts.

The Python and TypeScript SDKs expose a convenience form (contexts=[...], queries="..." or queries=[...]) that expands client-side into the inputs wire shape. When calling the HTTP API directly, always send inputs.

When to use it. RAG re-ranking against one user question, bulk document processing, or anywhere you would otherwise fire N parallel single-shot requests. Batching is significantly cheaper than N concurrent calls.

python

Each element of inputs must be an object with a context (optional; null/empty rows return an empty compressed row and are not billed) and a required query. Malformed items return 422 Unprocessable Entity.

Max 100 inputs per request. Larger workloads: split into multiple batch calls.
The batch schema accepts compression_model_name, target_compression_ratio, coarse, dynamic, dynamic_min_ratio, dynamic_max_ratio, and source at the top level, applied to all rows. heuristic_chunking and disable_placeholders are not accepted on /batch — they are single-request-only knobs.

Batch response

200 OK

dataobject
- resultsarray<object>
  One entry per input, in input order. Each entry has the same shape as the single-compression response data.
- total_original_tokensinteger
  Sum of original_tokens across all rows.
- total_compressed_tokensinteger
  Sum of compressed_tokens across all rows.
- total_tokens_savedinteger
  total_original_tokens − total_compressed_tokens.
- average_compression_rationumber
  Mean actual_compression_ratio across all rows.
- countinteger
  Number of rows in results.

6. Async

HTTP is request/response

There is no async pattern specific to cURL - the protocol is already request/response. For an SSE-framed response, use /stream (single final frame today, not per-token). For concurrent requests, fire them in parallel from your shell (& + wait), or use one of the language SDKs:

Python ships _async twins of every method (details).
TypeScript is async by default (details).

python

7. Errors & status codes

Every error response has the same JSON envelope. The HTTP status code is the primary signal: branch on the status, then inspect the body for a machine-readable code and a human-readable error string.

json

detail, retry_after, and field are optional and only appear when relevant. retry_after is echoed in both the response body and the Retry-After header. field is set on validation errors that isolate to a single request field.

FastAPI's default validation shape is different: a missing X-API-Key header (or any request that fails schema validation before hitting an app handler) returns 422 with {"detail":[{"loc":[...], "msg":..., "type":...}]} instead of the envelope above.

Status	Code	When it fires	How to recover
`400`	`bad_request`	Malformed JSON in the request body.	Fix the payload. Retrying as-is will not help.
`401`	`authentication_error`	Present-but-invalid, malformed, or revoked `X-API-Key`.	Rotate the key in the dashboard.
`402`	`insufficient_credits`	Prepaid wallet cannot cover the request.	Top up the wallet.
`402`	`budget_limit_reached`	Per-key monthly budget exhausted.	Raise the budget or rotate to a key with headroom.
`402`	`payment_failed`	Downstream payment provider rejected the charge.	Update payment method, then retry.
`403`	`scope_error`	Key scope does not permit this endpoint (e.g. demo key on a paid route).	Use a key with the required scope.
`404`	`not_found`	Resource does not exist.	Check the path / identifier.
`422`	`validation_error`	Body parsed but failed validation (e.g. `target_compression_ratio` out of range, `query` missing).	Fix the payload.
`422`	(FastAPI default)	Header/query/path parameter missing or wrong type (e.g. missing `X-API-Key`).	Body is `{"detail":[...]}` — inspect `loc` to find the missing field.
`429`	`rate_limit_exceeded`	Rate-limit hit. `Retry-After` in header and body.	Sleep for `Retry-After` seconds, then retry. Use exponential backoff if it recurs.
`503`	`service_unavailable`	Backend temporarily unhealthy.	Retry with backoff.
`503`	`connection_error`	Upstream connection failed.	Retry with backoff.
`503`	`circuit_open`	Circuit breaker tripped. `Retry-After` is jittered.	Honor `Retry-After`.
`504`	`timeout`	Server-side deadline exceeded.	Retry; consider a smaller payload.
`5xx`	`server_error`	Uncaught server error.	Retry with backoff. If it persists, check the status page or contact support.

python

Always honor Retry-After

On a 429, the Retry-After header is the canonical signal for how long to wait. Hammering the endpoint without it is the fastest way to get rate-limited harder, and on a leaked key, to drain budget.

Shared parameters (both models)

latte_v2-only parameters

`latte_v2`-only parameters