Compresr docs

API reference

POST /compress/question-specific/stream

Stream compressed tokens over Server-Sent Events as they're produced.

POST /api/compress/question-specific/stream (auth: API key)

This endpoint takes the same input as POST /compress/question-specific/, but the response is a Server-Sent Events stream instead of a single JSON envelope, so you can start forwarding compressed tokens to your LLM before the full output is ready.

Request body

Identical to the non-streaming endpoint:

  • context (string, required)
    The text to compress.
  • query (string, required)
    The question or topic to preserve relevance for.
  • compression_model_name ("latte_v1", required)
    Compression model name.
  • target_compression_ratio (number, optional; default: see Models)
    Compression strength. See /docs/api-reference/models.
  • coarse (boolean, optional; default: true)
    Paragraph-level compression (faster, less precise).
  • heuristic_chunking (boolean, optional; default: false)
    Pre-chunk with structure-aware heuristics.
  • disable_placeholders (boolean, optional; default: false)
    Strip placeholder tokens from the output.
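
As an illustration of the request body above, here is a minimal sketch that builds the POST request without sending it. The field names, the path, and the X-API-Key header come from this page; BASE_URL is a placeholder host, build_stream_request is a hypothetical helper name, and the JSON Content-Type and the Accept header are assumptions rather than documented requirements.

```python
import json
import urllib.request

# Placeholder host (assumption): substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"


def build_stream_request(api_key, context, query, **options):
    """Build, but do not send, a POST request for the streaming endpoint."""
    payload = {
        "context": context,
        "query": query,
        "compression_model_name": "latte_v1",
    }
    # Optional fields (target_compression_ratio, coarse, heuristic_chunking,
    # disable_placeholders) are included only when passed, so the server-side
    # defaults apply otherwise.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + "/api/compress/question-specific/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed body encoding
            "Accept": "text/event-stream",       # assumed, conventional for SSE
            "X-API-Key": api_key,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the response object can then be iterated line by line (as bytes) to read the stream.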

Response

The response uses Content-Type: text/event-stream. Each event carries a JSON object:

  • content (string)

    The next chunk of compressed text. Concatenate chunks in order.

  • done (boolean)

    true on the final chunk. The stream closes after it.

  • error (string | null)

    Set when the stream aborts mid-flight. content is empty in that case.
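
The event fields above can be consumed with a small parser. The sketch below assumes the standard Server-Sent Events framing, where each event is one or more `data:` lines followed by a blank line, and assumes the JSON object is the event's data payload; this page does not show the raw wire format. iter_events and join_content are hypothetical names, and error handling is left out to keep the example short.

```python
import json


def iter_events(lines):
    """Yield one decoded JSON object per SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line terminates the event; multi-line data is joined by newlines.
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other lines (comments starting with ":", event/id fields) are ignored.


def join_content(lines):
    """Concatenate content chunks in order, stopping at the chunk marked done."""
    parts = []
    for event in iter_events(lines):
        parts.append(event.get("content") or "")
        if event.get("done"):
            break
    return "".join(parts)
```

Because iter_events is a generator, a caller can forward each content chunk downstream as soon as it arrives instead of waiting for join_content to return.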

Status codes

  • 200
    Stream opened. Body is text/event-stream.
  • 401
    Missing or invalid X-API-Key.
  • 422
    Field validation failure.
  • 429
    Rate limit hit.
  • 500
    Upstream error. Stream may include an error chunk before closing.
  • 503
    Upstream error. Stream may include an error chunk before closing.

The streaming endpoint returns events, not the standard { success, data, error } envelope. The HTTP status only tells you whether the stream opened; once it is open, check each chunk's error field to detect failures.
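
One way to apply this two-stage handling is sketched below: check the HTTP status once, then inspect the error field of every decoded event. The names CompressionStreamError, check_status, and consume_stream are hypothetical, the events are assumed to be already decoded into dicts with the content, done, and error fields, and treating a stream that closes without a done chunk as a failure is a design choice of this sketch, not documented behavior.

```python
class CompressionStreamError(Exception):
    """Raised when the stream fails; keeps the text received so far."""

    def __init__(self, message, partial):
        super().__init__(message)
        self.partial = partial


def check_status(status):
    """Before reading events: any status other than 200 means no stream opened."""
    if status != 200:
        raise RuntimeError(f"stream did not open: HTTP {status}")


def consume_stream(events):
    """Concatenate content from decoded events, failing on an error chunk."""
    parts = []
    for event in events:
        if event.get("error"):
            # Mid-stream failure: content is empty here, so keep what arrived earlier.
            raise CompressionStreamError(event["error"], "".join(parts))
        parts.append(event.get("content") or "")
        if event.get("done"):
            return "".join(parts)
    # Design choice (assumption): no done chunk means the output is incomplete.
    raise CompressionStreamError("stream ended without a done chunk", "".join(parts))
```

Keeping the partial text on the exception lets the caller decide whether to discard it or retry with the same input.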
