API reference
POST /compress/question-specific/batch
Compress up to 100 context+query pairs in a single request with aggregated metrics.
Endpoint: /api/compress/question-specific/batch
Authentication: API key
Each input has its own context and query, so you can run a batch where every row is a different document scored against a different question. Useful for RAG pipelines where you compress one chunk per retrieved document.
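When a retrieval step returns more than 100 chunks, the pairs have to be split across several requests. The sketch below shows one way to do that; `chunk_pairs` and the sample data are hypothetical names for illustration, and only the 100-row limit comes from this reference.

```python
from typing import Iterator

MAX_ROWS_PER_BATCH = 100  # per-request limit stated in this reference


def chunk_pairs(pairs: list[dict], size: int = MAX_ROWS_PER_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of `pairs`, each at most `size` rows long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


# Hypothetical retrieval output: one chunk per document, each with its own question.
pairs = [{"context": f"chunk {i}", "query": f"question {i}"} for i in range(250)]
batches = list(chunk_pairs(pairs))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each element of `batches` can then be sent as the `inputs` array of one request. Row order is preserved within and across batches, which keeps results easy to match back to the retrieved documents.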
Request body
inputs (Array<{context, query}>, required)
compression_model_name ("latte_v1", required)
target_compression_ratio (number, optional): default depends on the model, see Models.
coarse (boolean, optional): defaults to true.

Items with context: null or context: "" return an empty result and are not billed.
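As a rough illustration of the request body, the sketch below assembles a POST request with the standard library. The host in `BASE_URL`, the helper name `build_batch_request`, and the ratio value 0.5 are assumptions for illustration; the path, the `X-API-Key` header, and the field names are taken from this page.

```python
import json
import urllib.request

# Hypothetical host: substitute the real base URL of the service.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/compress/question-specific/batch"


def build_batch_request(api_key, inputs, target_compression_ratio=None, coarse=None):
    """Build (but do not send) a batch compression request."""
    body = {"inputs": inputs, "compression_model_name": "latte_v1"}
    # Optional fields are omitted unless set, so server-side defaults apply.
    if target_compression_ratio is not None:
        body["target_compression_ratio"] = target_compression_ratio
    if coarse is not None:
        body["coarse"] = coarse
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_batch_request(
    "YOUR_API_KEY",
    [{"context": "Paris is the capital of France.", "query": "What is the capital?"}],
    target_compression_ratio=0.5,  # illustrative value; valid range is not given here
)
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the JSON response.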
The Python and TypeScript SDKs flatten this into parallel contexts and queries arrays — see Python SDK § Batch and TypeScript SDK § Batch. The REST shape shown above is what goes on the wire when you hit the endpoint directly with cURL or fetch.
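To make the relationship between the two shapes concrete, here is a minimal sketch that turns parallel `contexts` and `queries` lists into the wire-level `inputs` array. `to_wire_inputs` is a hypothetical helper written for this page, not the SDKs' actual implementation.

```python
def to_wire_inputs(contexts, queries):
    """Pair up parallel lists into the REST `inputs` shape."""
    if len(contexts) != len(queries):
        raise ValueError("contexts and queries must have the same length")
    return [{"context": c, "query": q} for c, q in zip(contexts, queries)]


inputs = to_wire_inputs(
    ["Doc A text", "Doc B text"],
    ["Question about A?", "Question about B?"],
)
print(inputs[1])  # {'context': 'Doc B text', 'query': 'Question about B?'}
```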
Response
success (boolean): true when the batch was accepted and processed.
data (object):
  results (array): Per-row results, in input order. Each entry has the same shape as the single-compression response data:
    original_context (string)
    compressed_context (string)
    original_tokens (integer)
    compressed_tokens (integer)
    tokens_saved (integer)
    target_compression_ratio (number | null)
    actual_compression_ratio (number)
    duration_ms (integer)
  total_original_tokens (integer): Sum across all rows.
  total_compressed_tokens (integer): Sum across all rows.
  total_tokens_saved (integer): Sum across all rows.
  average_compression_ratio (number): Mean actual_compression_ratio across all rows.
  count (integer): Number of rows processed.
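The aggregate fields can be recomputed from the per-row results, which is handy as a sanity check on a parsed response. The sketch below assumes a hypothetical helper `summarize` and made-up row values; the ratio values are placeholders rather than numbers derived from the token counts, since this page does not define the ratio formula. Returning 0.0 for an empty list is also an assumption.

```python
def summarize(results):
    """Recompute the batch-level aggregates from per-row results."""
    count = len(results)
    ratios = [row["actual_compression_ratio"] for row in results]
    return {
        "total_original_tokens": sum(row["original_tokens"] for row in results),
        "total_compressed_tokens": sum(row["compressed_tokens"] for row in results),
        "total_tokens_saved": sum(row["tokens_saved"] for row in results),
        # Mean of the per-row ratios; 0.0 for an empty list is an assumption.
        "average_compression_ratio": sum(ratios) / count if count else 0.0,
        "count": count,
    }


# Made-up rows for illustration only.
rows = [
    {"original_tokens": 400, "compressed_tokens": 200, "tokens_saved": 200,
     "actual_compression_ratio": 0.5},
    {"original_tokens": 800, "compressed_tokens": 600, "tokens_saved": 200,
     "actual_compression_ratio": 0.25},
]
summary = summarize(rows)
print(summary["total_tokens_saved"], summary["average_compression_ratio"])  # 400 0.375
```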
Status codes
200 OK: Batch processed. Inspect each row in data.results.
401 Unauthorized: Missing or invalid X-API-Key.
422 Unprocessable Entity: Validation failed (empty inputs, more than 100 rows, empty query on a row, ratio out of range).
429 Too Many Requests: Rate limit hit. A batch counts as one request but consumes its full token volume.
500 Internal Server Error: Upstream error.
503 Service Unavailable: Upstream error.
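One possible way for a client to act on these codes is sketched below. The grouping into retryable and non-retryable codes and the exponential backoff schedule are assumptions, since this page does not prescribe a retry policy; `classify_status` and `backoff_delays` are hypothetical helper names.

```python
# Assumption: rate limits and upstream errors are worth retrying;
# 401 and 422 need a corrected request and should not be retried as-is.
RETRYABLE = {429, 500, 503}


def classify_status(status: int) -> str:
    """Map an HTTP status code from the batch endpoint to a client action."""
    if status == 200:
        return "ok"
    if status == 401:
        return "fix_credentials"
    if status == 422:
        return "fix_request"
    if status in RETRYABLE:
        return "retry"
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(classify_status(429))  # retry
print(backoff_delays(4))     # [1.0, 2.0, 4.0, 8.0]
```

A caller would sleep for each delay in turn between retry attempts and give up once the list is exhausted.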