API reference
POST /compress/question-specific/stream
Stream compressed tokens over Server-Sent Events as they're produced.
Path: /api/compress/question-specific/stream
Authentication: API key
Same input as POST /compress/question-specific/, but the response is a Server-Sent Events stream instead of a single JSON envelope. Start forwarding compressed tokens to your LLM before the full output is ready.
Request body
Identical to the non-streaming endpoint:
context (string, required)
query (string, required)
compression_model_name ("latte_v1", required)
target_compression_ratio (number, optional). Default: see Models.
coarse (boolean, optional). Default: true.
heuristic_chunking (boolean, optional). Default: false.
disable_placeholders (boolean, optional). Default: false.
Response
The response uses Content-Type: text/event-stream. Each event carries a JSON object:
content (string): The next chunk of compressed text. Concatenate chunks in order.
done (boolean): true on the final chunk. The stream closes after it.
error (string | null): Set when the stream aborts mid-flight. content is empty in that case.
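The field semantics above can be sketched as a small consumer. This is a minimal sketch, not the service's official client: it assumes each event arrives as a single SSE `data:` line holding one JSON object, which the reference does not state explicitly. The function names `iter_events` and `collect_compressed` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON object carried by each SSE `data:` line.

    Assumption: one JSON object per `data:` line (no multi-line events).
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line or line.startswith(":"):
            continue
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def collect_compressed(lines: Iterable[str]) -> str:
    """Concatenate `content` chunks in order until `done` is true."""
    parts = []
    for event in iter_events(lines):
        if event.get("error"):
            # The stream aborted mid-flight; `content` is empty in this case.
            raise RuntimeError(f"stream aborted: {event['error']}")
        parts.append(event.get("content", ""))
        if event.get("done"):
            break
    return "".join(parts)
```

In practice you would forward each chunk to the LLM as it arrives instead of joining at the end; the join here only shows that chunk order is what reconstructs the compressed text.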
Status codes
200 OK: Stream opened. Body is text/event-stream.
401 Unauthorized: Missing or invalid X-API-Key.
422 Unprocessable Entity: Field validation failure.
429 Too Many Requests: Rate limit hit.
500 Internal Server Error: Upstream error. Stream may include an error chunk before closing.
503 Service Unavailable: Upstream error. Stream may include an error chunk before closing.
The streaming endpoint returns events, not the standard { success, data, error } envelope. Once the stream is open, handle each chunk's error field rather than relying on the HTTP status.
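The two failure phases can be illustrated with a standard-library sketch: a non-2xx status is handled before any event is read, and the per-chunk error field is handled afterwards. Several details here are assumptions, not confirmed by this page: the host in `BASE_URL` is a placeholder, the request body is assumed to be JSON, and events are assumed to arrive as single `data:` lines. `build_request` and `stream_chunks` are hypothetical helper names; the `opener` parameter exists only so the function can be exercised without a network call.

```python
import json
import urllib.error
import urllib.request

# Placeholder host (assumption); substitute the real API host.
BASE_URL = "https://api.example.com"


def build_request(api_key: str, context: str, query: str) -> urllib.request.Request:
    """Build the POST request with the required fields and the X-API-Key header."""
    body = json.dumps({
        "context": context,
        "query": query,
        "compression_model_name": "latte_v1",
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/compress/question-specific/stream",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",  # assumed JSON body
            "Accept": "text/event-stream",
        },
    )


def stream_chunks(request, opener=urllib.request.urlopen):
    """Yield content chunks; raise on an HTTP error or an in-stream error."""
    try:
        response = opener(request)
    except urllib.error.HTTPError as exc:
        # Phase 1: the request failed with a non-2xx status (e.g. 401, 422, 429).
        raise RuntimeError(f"request rejected with HTTP {exc.code}") from exc
    with response:
        for raw in response:
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # skip blank separators and SSE comments
            event = json.loads(line[len("data:"):])
            # Phase 2: the stream is open, so failures arrive as an error chunk.
            if event.get("error"):
                raise RuntimeError(f"stream aborted: {event['error']}")
            yield event.get("content", "")
            if event.get("done"):
                return
```

A caller would iterate `stream_chunks(build_request(key, context, query))` and forward each chunk as it arrives, treating either `RuntimeError` as a signal to retry or fall back to the uncompressed context.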