API reference

POST /compress/question-specific

Compress context while preserving tokens relevant to a query. The primary Compresr endpoint.

POST /api/compress/question-specific/ (auth: API key)

Supply a long context and a query; the model keeps the parts of the context that matter for that query and drops the rest.
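
A minimal sketch of a call using only the Python standard library. The host in `BASE_URL` is a placeholder, not the real Compresr host, and the helper names `build_request` and `compress` are illustrative rather than part of any official client. The `X-API-Key` header name is the one mentioned under the 401 status code.

```python
import json
import urllib.request

# Placeholder host (assumption): substitute the real Compresr base URL.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/compress/question-specific/"


def build_request(api_key, context, query, model="latte_v2", **options):
    """Build (but do not send) a POST request for the endpoint."""
    payload = {
        "context": context,
        "query": query,
        "compression_model_name": model,
    }
    # Optional parameters, e.g. coarse=False or target_compression_ratio=0.5.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def compress(api_key, context, query, **options):
    """Send the request and return the decoded JSON response body."""
    req = build_request(api_key, context, query, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic happens.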

Request body

latte_v2 accepts every parameter latte_v1 accepts, plus three latte_v2-only knobs for dynamic compression-ratio selection. The shared parameters apply identically on both models; the latte_v2-only knobs are rejected with 422 if sent to latte_v1. See Models for the canonical reference and decision guide.

Shared parameters (both models)

context (string, required)

The text to compress. Pass null or an empty string to get an empty result back with no billing.

query (string, required)

The question or topic to preserve relevance for. Cannot be empty.

compression_model_name ("latte_v1" | "latte_v2", required)

Routes the call. latte_v2 is the recommended default; pick latte_v1 when you need the older backbone explicitly. See Models.

target_compression_ratio (number, optional)

Default: model default

Compression strength. A value with 0 < r ≤ 1 is read as a removal fraction; a value with r > 1 is read as an Nx target. Values above 200 are rejected with 422. See Models › target_compression_ratio. Ignored on latte_v2 when dynamic=true.
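
The two readings of the ratio can be put on a common scale with a small helper. This is a sketch under an assumed interpretation: a removal fraction r keeps about 1 − r of the tokens, and an Nx target keeps about 1/r of them. The function name `expected_kept_fraction` is hypothetical, and the 200 upper bound is taken from the 422 example in the status codes.

```python
def expected_kept_fraction(r):
    """Approximate share of tokens kept for a given target_compression_ratio.

    Assumed reading: 0 < r <= 1 removes a fraction r of the tokens;
    r > 1 targets an output roughly r times smaller than the input.
    """
    if not 0 < r <= 200:
        raise ValueError("target_compression_ratio must satisfy 0 < r <= 200")
    if r <= 1:
        return 1.0 - r
    return 1.0 / r


print(expected_kept_fraction(0.25))  # 0.75
print(expected_kept_fraction(4))     # 0.25
```

The achieved ratio reported in the response may differ from the target, so treat the result as an estimate for budgeting only.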

coarse (boolean, optional)

Default: true

Paragraph-level compression. Faster and cheaper, less precise than token-level.

heuristic_chunking (boolean, optional)

Default: false

Pre-chunk the context with structure-aware heuristics (paragraphs, code blocks, markdown sections) before scoring.

disable_placeholders (boolean, optional)

Default: false

Drop the [...] markers the model normally inserts between kept spans.
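
When placeholders are left enabled, the kept spans can be recovered client-side by splitting on the marker. A small sketch, assuming the marker appears in the output as the literal string `[...]`; the helper name `kept_spans` is hypothetical.

```python
def kept_spans(compressed_context, marker="[...]"):
    """Split compressed text into the spans that survived compression.

    Assumes the placeholder is the literal string "[...]".
    """
    parts = compressed_context.split(marker)
    # Trim surrounding whitespace and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]


print(kept_spans("Alpha. [...] Beta. [...] Gamma."))
# ['Alpha.', 'Beta.', 'Gamma.']
```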

latte_v2-only parameters

dynamic (boolean, optional)

Default: false

Pick the compression ratio per-input automatically inside [dynamic_min_ratio, dynamic_max_ratio]. Overrides target_compression_ratio when true. Rejected on latte_v1 with 422.

dynamic_min_ratio (number, optional)

Default: 1.5

Floor on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0. Only consulted when dynamic=true.

dynamic_max_ratio (number, optional)

Default: 10.0

Ceiling on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0 and ≥ dynamic_min_ratio. Only consulted when dynamic=true.
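
The constraints on the dynamic knobs can be checked before sending, which avoids a round trip that ends in 422. The following is a client-side sketch only: the function name `check_dynamic_options` is hypothetical, and it checks the bounds whenever they are present, whereas the reference does not say whether the server validates them when dynamic=false.

```python
V2_ONLY = ("dynamic", "dynamic_min_ratio", "dynamic_max_ratio")


def check_dynamic_options(payload):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    sent = [key for key in V2_ONLY if key in payload]
    if payload.get("compression_model_name") == "latte_v1" and sent:
        problems.append("latte_v1 does not accept: " + ", ".join(sent))
        return problems

    # Defaults as documented: 1.5 and 10.0.
    lo = payload.get("dynamic_min_ratio", 1.5)
    hi = payload.get("dynamic_max_ratio", 10.0)
    if lo < 1.0:
        problems.append("dynamic_min_ratio must be >= 1.0")
    if hi < 1.0:
        problems.append("dynamic_max_ratio must be >= 1.0")
    if hi < lo:
        problems.append("dynamic_max_ratio must be >= dynamic_min_ratio")
    return problems
```

For example, a latte_v2 payload with `dynamic_min_ratio=5` and `dynamic_max_ratio=2` is flagged because the ceiling is below the floor.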

Response

success (boolean)
true on success.
data (object)
- original_context (string)
  The input context, echoed back.
- compressed_context (string)
  The compressed output you forward to your LLM.
- original_tokens (integer)
  Token count of the input context.
- compressed_tokens (integer)
  Token count after compression.
- tokens_saved (integer)
  original_tokens − compressed_tokens.
- target_compression_ratio (number | null)
  The ratio you requested, if any.
- actual_compression_ratio (number)
  The ratio actually achieved (0 to 1).
- duration_ms (integer)
  Server-side processing time in milliseconds.
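
One way to consume the response is to load `data` into a typed record and confirm the documented identity for `tokens_saved`. This is a sketch, not an official client: the `CompressionResult` class and `parse_response` function are hypothetical names, and the sample body uses made-up values purely for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompressionResult:
    original_context: str
    compressed_context: str
    original_tokens: int
    compressed_tokens: int
    tokens_saved: int
    target_compression_ratio: Optional[float]
    actual_compression_ratio: float
    duration_ms: int


def parse_response(body):
    """Turn a decoded JSON response into a CompressionResult."""
    if not body.get("success"):
        raise ValueError("response did not report success")
    known = {f.name for f in fields(CompressionResult)}
    # Ignore any fields this sketch does not know about.
    data = {k: v for k, v in body["data"].items() if k in known}
    result = CompressionResult(**data)
    if result.tokens_saved != result.original_tokens - result.compressed_tokens:
        raise ValueError("tokens_saved does not match the token counts")
    return result


# Illustrative body with invented values, not real API output.
sample = {
    "success": True,
    "data": {
        "original_context": "A long document ...",
        "compressed_context": "A document ...",
        "original_tokens": 1000,
        "compressed_tokens": 250,
        "tokens_saved": 750,
        "target_compression_ratio": None,
        "actual_compression_ratio": 0.75,
        "duration_ms": 120,
    },
}
result = parse_response(sample)
print(result.tokens_saved)  # 750
```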

Status codes

200 OK
Compression succeeded.
400 Bad Request
Malformed JSON body.
401 Unauthorized
Missing or invalid X-API-Key.
422 Unprocessable Entity
A field failed validation (e.g. empty query, target_compression_ratio > 200).
429 Too Many Requests
Rate limit hit for your tier.
500 Internal Server Error
Upstream compression service error.
503 Service Unavailable
Upstream compression service error.
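
A client typically retries only the statuses that can clear up on their own. The sketch below treats 429, 500 and 503 as transient, which is a judgment call rather than documented behaviour, and returns 400, 401 and 422 immediately since those call for a corrected request. `send` is a hypothetical zero-argument callable that returns a `(status, body)` pair.

```python
import time

# Assumption: these statuses are transient and worth retrying.
RETRYABLE = {429, 500, 503}


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Expects max_attempts >= 1. Waits base_delay * 2**attempt between tries.
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE or last_attempt:
            return status, body
        sleep(base_delay * 2 ** attempt)  # exponential backoff
```

Passing `sleep` as a parameter lets tests substitute a recorder for `time.sleep` so they run instantly.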
