API reference

POST /compress/question-specific/batch

Compress up to 100 context+query pairs in a single request with aggregated metrics.

POST /api/compress/question-specific/batch (API key)

Each input has its own context and query, so you can run a batch where every row is a different document scored against a different question. Useful for RAG pipelines where you compress one chunk per retrieved document.

Request body

All compression parameters are request-level: they apply to every row in inputs. latte_v2 accepts every parameter latte_v1 accepts, plus the three latte_v2-only dynamic* parameters, which are rejected with 422 on latte_v1. See Models for the canonical reference.

Shared parameters (both models)

inputs (Array<{context, query}>, required)

Between 1 and 100 items. Each item has a context (string, nullable) and a non-empty query.

compression_model_name ("latte_v1" | "latte_v2", required)

Routes the call. Applies to every row. See Models.

target_compression_ratio (number, optional)

Default: model default

Compression strength. See Models › target_compression_ratio. Ignored on latte_v2 when dynamic=true.

coarse (boolean, optional)

Default: true

Paragraph-level compression. Applies to every row.

`latte_v2`-only parameters

dynamic (boolean, optional)

Default: false

Picks the compression ratio per input; overrides target_compression_ratio when true. Applies to every row. Rejected on latte_v1 with 422.

dynamic_min_ratio (number, optional)

Default: 1.5

Floor on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0.

dynamic_max_ratio (number, optional)

Default: 10.0

Ceiling on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0.

Items with context: null or context: "" return an empty result and are not billed.
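The rules above can be mirrored client-side before a request is sent. The following is a minimal sketch using only the Python standard library, not the official SDK. `BASE_URL` is a hypothetical placeholder (this page gives only the path), the `Content-Type` header is an assumption, and `build_batch_payload` and `send_batch` are illustrative names.

```python
import json
import urllib.request

# Hypothetical base URL: this page documents only the path, not the host.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/compress/question-specific/batch"
MAX_ROWS = 100  # documented upper bound on inputs


def build_batch_payload(pairs, model, target_compression_ratio=None, coarse=True,
                        dynamic=False, dynamic_min_ratio=None, dynamic_max_ratio=None):
    """Return the request body as a dict; raise ValueError on documented 422 cases."""
    if not 1 <= len(pairs) <= MAX_ROWS:
        raise ValueError("inputs must contain between 1 and 100 items")
    if model not in ("latte_v1", "latte_v2"):
        raise ValueError("compression_model_name must be latte_v1 or latte_v2")

    inputs = []
    for i, (context, query) in enumerate(pairs):
        if not query:
            raise ValueError(f"row {i}: query must be non-empty")
        inputs.append({"context": context, "query": query})  # context may be None

    payload = {"inputs": inputs, "compression_model_name": model, "coarse": coarse}
    if target_compression_ratio is not None:
        payload["target_compression_ratio"] = target_compression_ratio

    # Collect only the dynamic* parameters the caller actually set.
    dynamic_params = {}
    if dynamic:
        dynamic_params["dynamic"] = True
    for name, value in (("dynamic_min_ratio", dynamic_min_ratio),
                        ("dynamic_max_ratio", dynamic_max_ratio)):
        if value is not None:
            if value < 1.0:
                raise ValueError(f"{name} must be >= 1.0")
            dynamic_params[name] = value
    if dynamic_params and model != "latte_v2":
        raise ValueError("dynamic* parameters are only accepted on latte_v2")
    payload.update(dynamic_params)
    return payload


def send_batch(payload, api_key, timeout=30.0):
    """POST the payload with the X-API-Key header and return the decoded JSON."""
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Validating locally only saves a round trip; the server remains the authority on what it accepts, including the allowed range for target_compression_ratio, which this sketch does not check.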

The Python and TypeScript SDKs flatten this into parallel contexts and queries arrays. See Python SDK § Batch and TypeScript SDK § Batch. The REST shape shown above is what goes on the wire when you hit the endpoint directly with cURL or fetch.
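To illustrate how the two shapes relate, here is a small sketch that converts parallel lists into the wire-level inputs array. It is not the SDKs' code; `to_wire_inputs` is a hypothetical helper, and the equal-length check is an assumption about what "parallel" implies.

```python
def to_wire_inputs(contexts, queries):
    """Pair contexts[i] with queries[i] to form the REST `inputs` array."""
    # Assumption: parallel arrays must have the same length.
    if len(contexts) != len(queries):
        raise ValueError("contexts and queries must have the same length")
    return [{"context": c, "query": q} for c, q in zip(contexts, queries)]
```

For example, `to_wire_inputs(["doc A", None], ["q1", "q2"])` yields two items, the second with a null context.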

Response

success (boolean)
true when the batch was accepted and processed.
data (object)
- results (array)
  Per-row results, in input order. Each entry has the same shape as the single-compression response data.
  - original_context (string)
  - compressed_context (string)
  - original_tokens (integer)
  - compressed_tokens (integer)
  - tokens_saved (integer)
  - target_compression_ratio (number | null)
  - actual_compression_ratio (number)
  - duration_ms (integer)
- total_original_tokens (integer)
  Sum across all rows.
- total_compressed_tokens (integer)
  Sum across all rows.
- total_tokens_saved (integer)
  Sum across all rows.
- average_compression_ratio (number)
  Mean actual_compression_ratio across all rows.
- count (integer)
  Number of rows processed.
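The aggregate fields follow directly from the per-row results, which the sketch below recomputes from data.results. `summarize` is a hypothetical helper, and it assumes every row carries numeric token counts and an actual_compression_ratio; how rows with a null or empty context report those fields is not specified here.

```python
from statistics import fmean


def summarize(results):
    """Recompute the batch-level aggregates from a list of per-row result dicts."""
    ratios = [row["actual_compression_ratio"] for row in results]
    return {
        "total_original_tokens": sum(row["original_tokens"] for row in results),
        "total_compressed_tokens": sum(row["compressed_tokens"] for row in results),
        "total_tokens_saved": sum(row["tokens_saved"] for row in results),
        # Mean of the per-row ratios, not the ratio of the token totals.
        "average_compression_ratio": fmean(ratios) if ratios else 0.0,
        "count": len(results),
    }
```

Because the average is a mean over rows, it can differ from total_original_tokens divided by total_compressed_tokens when rows vary in size.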

Status codes

200 OK
Batch processed. Inspect each row in data.results.
401 Unauthorized
Missing or invalid X-API-Key.
422 Unprocessable Entity
Validation failed (empty inputs, more than 100 rows, empty query on a row, ratio out of range).
429 Too Many Requests
Rate limit hit. A batch counts as a single request but at its full token volume.
500 Internal Server Error
Upstream error.
503 Service Unavailable
Upstream error.
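One way a client might act on these codes is sketched below. The retry policy (retrying 429, 500, and 503 with capped exponential backoff) is an assumption rather than documented behavior, and `classify_status` and `backoff_delays` are hypothetical names.

```python
RETRYABLE = {429, 500, 503}       # assumption: transient, worth retrying
CALLER_ERRORS = {401, 422}        # fix the API key or the request body instead


def classify_status(status):
    """Map an HTTP status code to a coarse client action."""
    if status == 200:
        return "ok"
    if status in CALLER_ERRORS:
        return "fix_request"
    if status in RETRYABLE:
        return "retry"
    return "unexpected"


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Return capped exponential delays in seconds, one per retry attempt."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A 401 or 422 will fail the same way on every retry, so only the transient codes are worth waiting on.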
