Compresr docs

cURL / HTTP

Call the Compresr REST API directly with cURL or any HTTP client.

The Compresr REST API is a small, ordinary HTTPS surface: JSON in, JSON out, X-API-Key for auth. Any HTTP client in any language can call it. cURL is the lingua franca for examples on this page.

Prefer a typed client? See the Python SDK or the TypeScript SDK. Every section below mirrors those pages one-to-one.

Field names on the wire

Every field - request and response - is snake_case. Both SDKs serialize to and from this shape. The cURL examples on this page show the canonical wire format.

1. Install

Nothing to install - cURL ships with macOS, Linux, and modern Windows. Put your key in an environment variable so it never lands in shell history or a copy-pasted snippet.

bash
export COMPRESR_API_KEY="cmp_..."

To make it permanent, add the export line to ~/.zshrc or ~/.bashrc. On Windows, set a user-level variable with setx COMPRESR_API_KEY "cmp_..."; it takes effect in new shells only.

2. Initialize the client

cURL has no client object. The two things you reuse on every request are the base URL and the X-API-Key header.

  • Base URL: https://api.compresr.ai
  • Auth header: X-API-Key: cmp_...
  • Content type: application/json on every POST
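
Outside the shell, the same three constants can be wrapped once and reused. The following is a minimal sketch using only Python's standard library, not an official client. The helper name `build_request` is hypothetical, and the sketch assumes that the paths in the section headings are appended directly to the base URL and that the key lives in the COMPRESR_API_KEY environment variable.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.compresr.ai"  # base URL from the list above


def build_request(path, body, api_key=None):
    """Build (but do not send) a JSON POST for the given API path.

    `build_request` is an illustrative helper, not part of any Compresr SDK.
    """
    # Fall back to the environment variable when no key is passed explicitly.
    key = api_key or os.environ["COMPRESR_API_KEY"]
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; building it separately keeps the auth and content-type handling in one place.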

Requests without X-API-Key return 401 Unauthorized. See Authentication for the full security guidance; the rules are language-independent.

3. POST /compress/question-specific

What it does. The single-request endpoint. Send a JSON body with context, query, and compression_model_name: "latte_v1". The response carries the compressed context and token-accounting metadata.

When to use it. One chunk of text, one question. For many chunks against the same query, use /batch. For token-by-token output, use /stream.


For large payloads, prefer --data @body.json (read the JSON from a file) over inlining the body. An inline body has to survive shell quoting, which handles embedded newlines and apostrophes poorly.

Request body

  • context (string, required)
    The long text you want compressed. JSON-escape any embedded quotes or newlines.
  • query (string, required)
    The question or intent the compressed context must still be able to answer. Required for latte_v1.
  • compression_model_name ("latte_v1", required)
    Public model identifier. latte_v1 is the only model exposed by the public API.
  • target_compression_ratio (number, optional)
    Interpreted by range: a value r with 0 < r ≤ 1 is a removal strength, and r > 1 is an N× compression target. Omit for the model default. See the Models reference for the canonical table.
  • coarse (boolean, optional; default: omitted)
    Latte-only. Omit (or send null) for the backend default (paragraph-level scoring). Send false to opt into token-level precision (slower, higher fidelity).
  • heuristic_chunking (boolean, optional; default: omitted)
    Latte-only. Uses a heuristic splitter (paragraphs, code blocks) instead of the default fixed-size chunker. Omit to use the backend default.
  • disable_placeholders (boolean, optional; default: omitted)
    Latte-only. Skips the [...] placeholders the model normally inserts where content was dropped. Omit to use the backend default.
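
Because the optional fields are meant to be omitted rather than sent with a placeholder value, it can help to assemble the body programmatically. A minimal sketch, assuming only the field names listed above; the function name `question_specific_body` is hypothetical.

```python
import json


def question_specific_body(context, query, target_compression_ratio=None,
                           coarse=None, heuristic_chunking=None,
                           disable_placeholders=None):
    """Assemble the request body, leaving out any optional field left as None."""
    body = {
        "context": context,
        "query": query,
        "compression_model_name": "latte_v1",
    }
    optional = {
        "target_compression_ratio": target_compression_ratio,
        "coarse": coarse,
        "heuristic_chunking": heuristic_chunking,
        "disable_placeholders": disable_placeholders,
    }
    # Omitted keys fall back to the backend default, per the field list above.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# json.dumps takes care of escaping embedded quotes and newlines.
payload = json.dumps(question_specific_body('He said "ship it"\n', "What was decided?", coarse=False))
```

Writing the resulting string to body.json gives a file suitable for --data @body.json.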

Response

A 200 OK returns a JSON object with this shape. Field names are stable across all SDKs.

200 OK
  • data (object)
    • compressed_context (string)

      The compressed text, ready to drop into your prompt.

    • original_tokens (integer)

      Token count of the input context (tiktoken cl100k).

    • compressed_tokens (integer)

      Token count of the compressed output.

    • tokens_saved (integer)

      original_tokens − compressed_tokens.

    • actual_compression_ratio (number)

      Fraction of input tokens actually removed (0–1). For example, 0.5 means roughly 50% removed.

    • duration_ms (integer)

      Server-side wall-clock time for the compression pass.
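
Reading the response is plain JSON access. The sketch below pulls out the compressed text and recomputes the removed fraction from the two token counts as a sanity check. The function name `summarize` is hypothetical, and the sample values are made up for illustration rather than taken from real API output.

```python
import json


def summarize(response_text):
    """Pull the compressed text and token accounting out of a 200 OK body."""
    data = json.loads(response_text)["data"]
    return {
        "compressed_context": data["compressed_context"],
        "tokens_saved": data["tokens_saved"],
        # Fraction removed, recomputed from the two counts.
        "removed_fraction": 1 - data["compressed_tokens"] / data["original_tokens"],
    }


# Made-up sample body with the documented shape (not real API output).
sample = json.dumps({"data": {
    "compressed_context": "short text",
    "original_tokens": 1000,
    "compressed_tokens": 400,
    "tokens_saved": 600,
    "actual_compression_ratio": 0.6,
    "duration_ms": 120,
}})
```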

4. POST /compress/question-specific/stream

What it does. Returns Server-Sent Events (SSE). Each event is a data: {...}\n\n frame with a JSON payload of { content, done }. The final event has done: true and an empty content.

When to use it. Anywhere time-to-first-token matters. Always pass curl -N to disable output buffering so chunks appear as they arrive, not after the connection closes.


In your own HTTP client, split the response body on \n\n, drop the data: prefix from each frame, and JSON.parse what is left. Stop when you see done: true.
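
The parsing loop just described can be sketched as follows. It is a minimal illustration that assumes the frames look exactly as described in this section (a data: prefix, a JSON payload with content and done, and a blank-line terminator); the function name `iter_sse_content` is hypothetical. It buffers incoming text so that a frame split across two network reads is still parsed whole.

```python
import json


def iter_sse_content(chunks):
    """Yield `content` strings from an iterable of decoded text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A frame is complete only once its terminating blank line has arrived.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            if not frame.startswith("data:"):
                continue  # ignore anything that is not a data frame
            event = json.loads(frame[len("data:"):])
            if event["done"]:
                return  # final event: stop reading
            yield event["content"]
```

Feed it text already decoded from the response body, for example chunks read from the object returned by urllib.request.urlopen and decoded as UTF-8.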

5. POST /compress/question-specific/batch

What it does. Accepts contexts (array of strings) and queries (either a single string applied to every context, or an array the same length as contexts). The response carries one result per input context, in the same order.

When to use it. RAG re-ranking against one user question, bulk document processing, or anywhere you would otherwise fire N parallel single-shot requests. Batching is significantly cheaper than N concurrent calls.


queries is either a string or an array of strings. Mixing the two - for example a string when you meant a single-element array - returns 422 Unprocessable Entity from the server, not a silent fallback.
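
A client-side check can catch a length mismatch before the server returns a 422. This is a minimal sketch; the function name `batch_body` is hypothetical, and including compression_model_name in the batch body is an assumption carried over from the single-request endpoint.

```python
def batch_body(contexts, queries):
    """Build a /batch request body, checking the shape of `queries` first."""
    single = isinstance(queries, str)  # one query applied to every context
    if not single and len(queries) != len(contexts):
        raise ValueError(
            f"queries has {len(queries)} items but contexts has {len(contexts)}"
        )
    return {
        "contexts": list(contexts),
        "queries": queries if single else list(queries),
        # Assumption: required here as on the single-request endpoint.
        "compression_model_name": "latte_v1",
    }
```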

6. Async

HTTP is request/response

There is no async pattern specific to cURL - the protocol is already request/response. For token-by-token output, use /stream. For concurrent requests, fire them in parallel from your shell (& + wait), or use one of the language SDKs:

  • Python ships _async twins of every method (details).
  • TypeScript is async by default (details).
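
Without either SDK, the shell pattern of backgrounding requests and waiting has a standard-library analogue in Python. The sketch below is illustrative only: `compress_many` is a hypothetical name, and `send` stands for any function of yours that performs one blocking request and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor


def compress_many(send, bodies, max_workers=4):
    """Run send(body) for each body concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs,
        # regardless of which request finishes first.
        return list(pool.map(send, bodies))
```

Keep max_workers modest so that parallel calls stay under your rate limit.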

7. Errors & status codes

Every error response has the same JSON envelope. The HTTP status code is the primary signal: branch on the status, then inspect the body for a machine-readable code and a human-readable error string.
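
Branching on the status first and then reading the body might look like the sketch below. It assumes the envelope exposes top-level keys named code and error, which is inferred from the sentence above rather than confirmed; adjust the key names to the actual envelope. The function name `describe_error` is hypothetical.

```python
import json


def describe_error(status, body_text):
    """Return (status, code, message) from an error response."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = {}  # e.g. a proxy returned a non-JSON error page
    if not isinstance(payload, dict):
        payload = {}
    # Assumption: `code` and `error` are top-level keys of the envelope.
    return status, payload.get("code"), payload.get("error")
```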

| Status | When it fires | How to recover |
|---|---|---|
| 400 | Malformed JSON in the request body. | Fix the payload. Retrying as-is will not help. |
| 401 | Missing, malformed, or revoked X-API-Key. | Check that the header is present and well-formed. If the key was revoked, rotate it in the dashboard. |
| 402 | Per-key budget exhausted. | Raise the budget or rotate to a key with headroom. |
| 422 | Body parsed but failed validation (e.g. target_compression_ratio out of range, query missing for latte_v1). | Fix the payload. |
| 429 | Rate-limit hit. Response includes a Retry-After header in seconds. | Sleep for Retry-After seconds, then retry. Use exponential backoff if it happens repeatedly. |
| 5xx | Transient server error. | Retry with backoff. If it persists, check the status page or contact support. |

Always honor Retry-After

On a 429, the Retry-After header is the canonical signal for how long to wait. Retrying before it elapses is the fastest way to get rate-limited harder and, on a leaked key, to drain budget.
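
One way to turn these rules into code is a small function that decides how long to wait. This is a sketch under stated assumptions, not the SDKs' actual retry policy: the name `retry_delay`, the one-second base, and the 60-second cap are all illustrative choices.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if a retry will not help.

    `headers` is any mapping with .get(); a plain dict is case-sensitive,
    so pass the header under the name "Retry-After".
    """
    backoff = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... capped
    if status == 429:
        try:
            hinted = float(headers.get("Retry-After", 0))
        except ValueError:
            hinted = 0.0  # unparseable value: rely on backoff alone
        # Never wait less than the server asked; back off further on repeats.
        return max(hinted, backoff)
    if 500 <= status <= 599:
        return backoff
    return None  # 400, 401, 402, 422: fix the request or the key instead
```

A caller would sleep for the returned number of seconds, increment attempt, and give up after a fixed number of tries.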