API reference

POST /compress/question-specific/stream

Stream compressed tokens over Server-Sent Events as they're produced.

POST /api/compress/question-specific/stream (API key required)

Takes the same input as POST /compress/question-specific/, but the response is a Server-Sent Events stream instead of a single JSON envelope. This lets you start forwarding compressed tokens to your LLM before the full output is ready.
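A minimal sketch of preparing such a request with Python's standard library. The base URL is a placeholder and the helper name build_stream_request is hypothetical. The path and the X-API-Key header come from this page, while the Accept header is an assumption based on common SSE practice.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with your actual API base URL.
BASE_URL = "https://api.example.com"


def build_stream_request(api_key, body, base_url=BASE_URL):
    """Build (but do not send) a POST request for the streaming endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/compress/question-specific/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # assumption: conventional for SSE
            "X-API-Key": api_key,
        },
        method="POST",
    )


req = build_stream_request(
    "my-key",
    {
        "context": "Long document text...",
        "query": "What is the refund policy?",
        "compression_model_name": "latte_v1",
    },
)
```

Passing req to urllib.request.urlopen would open the stream; the response object can then be read line by line as events arrive.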

Request body

Identical to the non-streaming endpoint. latte_v2 accepts every parameter latte_v1 accepts, plus the three latte_v2-only dynamic* knobs (rejected on latte_v1 with 422).

Shared parameters (both models)

context (string, required)

The text to compress.

query (string, required)

The question or topic to preserve relevance for.

compression_model_name ("latte_v1" | "latte_v2", required)

Routes the call. See Models.

target_compression_ratio (number, optional)

Default: model default

Compression strength. See Models › target_compression_ratio. Ignored on latte_v2 when dynamic=true.

coarse (boolean, optional)

Default: true

Paragraph-level compression (faster, less precise).

heuristic_chunking (boolean, optional)

Default: false

Pre-chunk with structure-aware heuristics.

disable_placeholders (boolean, optional)

Default: false

Drop the [...] markers between kept spans.
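As a rough illustration of how the shared parameters fit together, the helper below assembles a JSON body using the defaults documented above. The name shared_body is hypothetical and not part of any SDK. It omits target_compression_ratio unless one is given, on the assumption that leaving the field out is what selects the model default.

```python
def shared_body(context, query, model, target_compression_ratio=None,
                coarse=True, heuristic_chunking=False,
                disable_placeholders=False):
    """Assemble a request body from the parameters both models accept."""
    body = {
        "context": context,
        "query": query,
        "compression_model_name": model,
        "coarse": coarse,
        "heuristic_chunking": heuristic_chunking,
        "disable_placeholders": disable_placeholders,
    }
    # Assumption: omitting the field lets the server apply the model default.
    if target_compression_ratio is not None:
        body["target_compression_ratio"] = target_compression_ratio
    return body


default_body = shared_body("Some long text", "What changed?", "latte_v1")
tuned_body = shared_body("Some long text", "What changed?", "latte_v2",
                         target_compression_ratio=4.0, coarse=False)
```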

`latte_v2`-only parameters

dynamic (boolean, optional)

Default: false

Picks the compression ratio per-input; overrides target_compression_ratio when true. Rejected on latte_v1 with 422.

dynamic_min_ratio (number, optional)

Default: 1.5

Floor on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0.

dynamic_max_ratio (number, optional)

Default: 10.0

Ceiling on the chosen Nx ratio when dynamic=true. Must be ≥ 1.0.
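Because a violation of these rules costs a round trip and a 422, it can be convenient to check a body locally first. The sketch below mirrors only the rules stated on this page; the function name is hypothetical, and the server may well enforce additional checks that are not documented here.

```python
DYNAMIC_KEYS = ("dynamic", "dynamic_min_ratio", "dynamic_max_ratio")


def check_dynamic_params(body):
    """Return a list of problems the documented rules would reject."""
    problems = []
    used = [key for key in DYNAMIC_KEYS if key in body]
    # The dynamic* knobs are latte_v2-only.
    if body.get("compression_model_name") == "latte_v1" and used:
        problems.append("latte_v1 does not accept: " + ", ".join(used))
    # Both ratio bounds must be at least 1.0.
    for key in ("dynamic_min_ratio", "dynamic_max_ratio"):
        if key in body and body[key] < 1.0:
            problems.append(f"{key} must be >= 1.0")
    return problems


ok = check_dynamic_params({
    "compression_model_name": "latte_v2",
    "dynamic": True,
    "dynamic_min_ratio": 2.0,
    "dynamic_max_ratio": 8.0,
})
bad_model = check_dynamic_params({
    "compression_model_name": "latte_v1",
    "dynamic": True,
})
bad_ratio = check_dynamic_params({
    "compression_model_name": "latte_v2",
    "dynamic_max_ratio": 0.5,
})
```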

Response

The response uses Content-Type: text/event-stream. Each event carries a JSON object:

content (string)

The next chunk of compressed text. Concatenate chunks in order.

done (boolean)

true on the final chunk. The stream closes after it.

error (string | null)

Set when the stream aborts mid-flight. content is empty in that case.
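A sketch of reading these fields from the stream, assuming each event arrives as a single `data: {...}` line in the usual SSE framing (this page does not spell out the wire format). The generator iter_content is a hypothetical helper that works on any iterable of decoded text lines, so it can be fed from whichever HTTP client you use.

```python
import json


def iter_content(lines):
    """Yield each chunk's content string, in order, until done is true."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("error"):
            raise RuntimeError(f"stream aborted: {event['error']}")
        yield event.get("content", "")
        if event.get("done"):
            return


# Hand-written sample lines for illustration, not captured server output.
sample = [
    'data: {"content": "Paris is ", "done": false, "error": null}\n',
    "\n",
    'data: {"content": "the capital.", "done": true, "error": null}\n',
    "\n",
]
compressed = "".join(iter_content(sample))
```

Each yielded string can be forwarded downstream as soon as it arrives; joining them in order reconstructs the full compressed text.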

Status codes

200 OK: Stream opened. Body is text/event-stream.
401 Unauthorized: Missing or invalid X-API-Key.
422 Unprocessable Entity: Field validation failure.
429 Too Many Requests: Rate limit hit.
500 Internal Server Error: Upstream error. Stream may include an error chunk before closing.
503 Service Unavailable: Upstream error. Stream may include an error chunk before closing.

The streaming endpoint returns events, not the standard { success, data, error } envelope. Once the stream is open, check each chunk's error field rather than relying on the HTTP status.
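One possible way to act on this: treat a non-200 status as a failure before reading anything, and from then on look only at the chunks. The sketch below takes already-parsed event objects and returns the text received so far together with any error, leaving the caller to decide whether partial output is usable. The function name is hypothetical, and treating a stream that ends without done=true as truncated is an assumption, not documented behavior.

```python
def drain(events):
    """Consume parsed event dicts; return (text_so_far, error_or_None)."""
    parts = []
    for event in events:
        error = event.get("error")
        if error:
            # Mid-flight abort: keep what arrived before the error chunk.
            return "".join(parts), error
        parts.append(event.get("content") or "")
        if event.get("done"):
            return "".join(parts), None
    # Assumption: no done=true chunk means the stream was cut short.
    return "".join(parts), "stream closed before done=true"


text, err = drain([
    {"content": "kept span ", "done": False, "error": None},
    {"content": "", "done": False, "error": "upstream timeout"},
])
```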
