Compresr docs

POST /compress/question-specific/batch

Compress up to 100 context+query pairs in a single request with aggregated metrics.

POST /api/compress/question-specific/batch (API key required)

Each input has its own context and query, so every row in a batch can be a different document scored against a different question. This is useful for RAG pipelines that compress one chunk per retrieved document.

Request body

inputs: Array<{context, query}>, required
Between 1 and 100 items. Each item has a context (string, nullable) and a non-empty query.

compression_model_name: "latte_v1", required
Compression model name. Applies to every row.

target_compression_ratio: number, optional
Default: see Models
Compression strength. See /docs/api-reference/models.

coarse: boolean, optional
Default: true
Paragraph-level compression. Applies to every row.

Items with context: null or context: "" return an empty result and are not billed.
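
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch that assumes the rules are exactly as listed on this page; `validate_inputs` and `billable_rows` are hypothetical helper names and not part of any Compresr SDK.

```python
from typing import Optional, TypedDict


class BatchInput(TypedDict):
    context: Optional[str]  # nullable per the request body description
    query: str              # must be non-empty


MAX_BATCH_SIZE = 100  # upper bound stated in this reference


def validate_inputs(inputs: list[BatchInput]) -> None:
    """Raise ValueError for batches this page says are rejected with 422."""
    if not 1 <= len(inputs) <= MAX_BATCH_SIZE:
        raise ValueError(
            f"inputs must contain 1 to {MAX_BATCH_SIZE} items, got {len(inputs)}"
        )
    for i, item in enumerate(inputs):
        if not item["query"]:
            raise ValueError(f"inputs[{i}].query must be non-empty")


def billable_rows(inputs: list[BatchInput]) -> list[int]:
    """Indices of rows with a non-empty context; null or "" rows are not billed."""
    return [i for i, item in enumerate(inputs) if item["context"]]
```

Whether the server treats a whitespace-only query as empty is not stated here, so the sketch only rejects the empty string.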

The Python and TypeScript SDKs flatten this into parallel contexts and queries arrays; see Python SDK § Batch and TypeScript SDK § Batch. The REST shape shown above is what goes on the wire when you call the endpoint directly with cURL or fetch.
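
To illustrate the difference between the two shapes, the sketch below zips parallel contexts and queries lists into the REST `inputs` array and prepares a POST request using only the standard library. `BASE_URL` is a placeholder, since this page gives the path but not the host, and `build_batch_request` is a hypothetical helper rather than an SDK function.

```python
import json
import urllib.request

# Placeholder host (assumption): this page documents only the path.
BASE_URL = "https://api.example.com"
BATCH_PATH = "/api/compress/question-specific/batch"


def build_batch_request(contexts, queries, api_key, model="latte_v1"):
    """Zip parallel arrays into the REST `inputs` shape and prepare a POST."""
    if len(contexts) != len(queries):
        raise ValueError("contexts and queries must have the same length")
    body = {
        "inputs": [{"context": c, "query": q} for c, q in zip(contexts, queries)],
        "compression_model_name": model,
    }
    return urllib.request.Request(
        BASE_URL + BATCH_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Optional fields such as `target_compression_ratio` and `coarse` are left out so the server defaults apply.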

Response

  • success: boolean

    true when the batch was accepted and processed.

  • data: object
    • results: array

      Per-row results, in input order. Each entry has the same shape as the single-compression response data.

      • original_context: string
      • compressed_context: string
      • original_tokens: integer
      • compressed_tokens: integer
      • tokens_saved: integer
      • target_compression_ratio: number | null
      • actual_compression_ratio: number
      • duration_ms: integer
    • total_original_tokens: integer

      Sum across all rows.

    • total_compressed_tokens: integer

      Sum across all rows.

    • total_tokens_saved: integer

      Sum across all rows.

    • average_compression_ratio: number

      Mean actual_compression_ratio across all rows.

    • count: integer

      Number of rows processed.
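
The aggregate fields can be recomputed from the per-row results, which is handy for sanity checks or for summarizing a subset of rows. This is a minimal sketch based only on the field descriptions above; `summarize` is a hypothetical helper, and returning 0.0 for the mean of an empty list is an assumption made here for convenience.

```python
def summarize(results):
    """Recompute the batch aggregates from a list of per-row result dicts."""
    count = len(results)
    ratio_sum = sum(r["actual_compression_ratio"] for r in results)
    return {
        "total_original_tokens": sum(r["original_tokens"] for r in results),
        "total_compressed_tokens": sum(r["compressed_tokens"] for r in results),
        "total_tokens_saved": sum(r["tokens_saved"] for r in results),
        # Mean of the per-row ratios, not a ratio of the token totals.
        "average_compression_ratio": ratio_sum / count if count else 0.0,
        "count": count,
    }
```

Because the average is a mean of per-row ratios, it generally differs from a ratio computed from the token totals when rows vary in size.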

Status codes

  • 200
    Batch processed. Inspect each row in data.results.
  • 401
    Missing or invalid X-API-Key.
  • 422
    Validation failed (empty inputs, more than 100 rows, empty query on a row, ratio out of range).
  • 429
    Rate limit exceeded. A batch counts as one request, but its full token volume counts toward the limit.
  • 500
    Upstream error.
  • 503
    Upstream error.
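
One way a client might act on these status codes is sketched below. Treating 429, 500, and 503 as retryable is an assumption, since this page does not say which errors are safe to retry; `classify_status` and `backoff_delay` are hypothetical helpers.

```python
import random

# Assumption: rate limits and upstream errors are transient and worth retrying.
RETRYABLE = {429, 500, 503}


def classify_status(status: int) -> str:
    """Map a documented status code to a coarse client-side action."""
    if status == 200:
        return "ok"            # inspect each row in data.results
    if status == 401:
        return "fix_api_key"   # missing or invalid X-API-Key
    if status == 422:
        return "fix_request"   # validation failed; retrying unchanged will not help
    if status in RETRYABLE:
        return "retry"
    return "unexpected"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; returns seconds to wait."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A retry loop would call `backoff_delay(attempt)` between attempts whenever `classify_status` returns "retry", and stop after a fixed number of tries.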