
Coarse mode


latte_v1 scores at paragraph granularity by default — that's coarse mode. This guide covers the precision tradeoff and when to set coarse=false to opt into the slower, more precise token-level pass.

How coarse vs fine compares

By default, latte_v1 scores context at paragraph granularity. The model decides whether to keep or drop whole paragraphs based on how relevant each one is to your query. That's coarse mode, and it's what you get unless you opt out.

The opt-in is coarse=false, which switches to token-level scoring. A fine pass can keep one relevant sentence from an otherwise irrelevant paragraph, at the cost of significantly more work per request. Coarse mode can't slice into a paragraph — it either keeps it or drops it.

query and target_compression_ratio work identically in both modes. Only the unit of decision changes.
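To make "only the unit of decision changes" concrete, here is a toy model of the two modes. It is not how latte_v1 scores relevance. Relevance here is plain query-term overlap. `keep_ratio` is a hypothetical word budget standing in for target_compression_ratio, because this guide doesn't define that parameter's exact semantics. `toy_compress` is likewise an illustrative name and not part of the SDK. The query and budget are shared by both modes, and each unit is kept or dropped whole.

```python
import re


def _terms(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for a learned relevance model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _score(unit: str, query: str) -> float:
    # Fraction of query terms that appear in the unit.
    q = _terms(query)
    return len(q & _terms(unit)) / len(q) if q else 0.0


def toy_compress(context: str, query: str, keep_ratio: float, coarse: bool = True) -> str:
    """Keep the most query-relevant units within a word budget.

    coarse=True  -> unit of decision is a paragraph (blank-line separated)
    coarse=False -> unit of decision is a sentence
    """
    if coarse:
        units = [p.strip() for p in context.split("\n\n") if p.strip()]
    else:
        units = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]

    budget = keep_ratio * len(context.split())  # same budget in both modes
    # Most relevant first; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(units)), key=lambda i: _score(units[i], query), reverse=True)

    kept: set[int] = set()
    used = 0
    for i in ranked:
        size = len(units[i].split())
        # The top-ranked unit is always kept, even if it alone overshoots the budget.
        if kept and used + size > budget:
            continue  # doesn't fit; units are kept or dropped whole, never trimmed
        kept.add(i)
        used += size

    # Re-emit kept units in their original order.
    sep = "\n\n" if coarse else " "
    return sep.join(units[i] for i in sorted(kept))


doc = (
    "The office moved in March. Parking is on level two. "
    "The cache uses LRU eviction. Lunch is at noon.\n\n"
    "Quarterly revenue grew. Hiring continues in Q3."
)
# Coarse: the whole first paragraph (19 words), because it contains the match.
print(toy_compress(doc, "cache eviction", keep_ratio=0.25, coarse=True))
# Fine: only "The cache uses LRU eviction."
print(toy_compress(doc, "cache eviction", keep_ratio=0.25, coarse=False))
```

With the same query and budget, the coarse pass returns every sentence of the matching paragraph, while the fine pass isolates the one relevant sentence.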

When the default works

Very long contexts. Anything north of about 50K tokens, where token-level scoring starts to dominate wall-clock time.
Structured documents where block boundaries already carry meaning. Markdown, knowledge-base articles, transcripts, code files. Keeping a whole markdown section is usually what you wanted anyway.
Pre-filtering before a fine pass. Coarse-cut a 100K-token corpus down to 20K, then run a fine pass over the cut for precision.
Real-time pipelines. Agent loops and chat backends where compression sits on the latency critical path and a few percent precision is an acceptable trade.

When to set `coarse=false`

Short or medium inputs. The coarse savings don't matter at this scale, and the precision is worth taking.
You need to keep specific sentences from a paragraph that is mostly irrelevant. The whole point of token-level scoring.
High-stakes RAG. When a single relevant span shouldn't be dropped along with its enclosing paragraph.
Output feeds a downstream model that's sensitive to noise. Tighter, sentence-level output reduces what the next model has to filter through.
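The two lists above reduce to a small decision rule. The sketch below assumes a hypothetical `Workload` description and `plan_passes` helper, neither of which is part of the compresr SDK. The 50K cutoff is this guide's rough rule of thumb, not a hard limit.

```python
from dataclasses import dataclass

# Rough cutoff from this guide ("anything north of about 50K tokens"), not a hard limit.
LONG_CONTEXT_TOKENS = 50_000


@dataclass
class Workload:
    num_tokens: int
    latency_critical: bool = False          # agent loops, chat backends
    structured: bool = False                # markdown, KB articles, transcripts, code
    needs_sentence_precision: bool = False  # high-stakes RAG, noise-sensitive consumer


def plan_passes(w: Workload) -> list[str]:
    """Return passes in order: "coarse" is the default, "fine" means coarse=false."""
    long_input = w.num_tokens > LONG_CONTEXT_TOKENS
    if w.latency_critical:
        return ["coarse"]  # accept a few percent precision loss for latency
    if w.needs_sentence_precision:
        # Long inputs: coarse pre-filter, then a fine pass over the cut.
        return ["coarse", "fine"] if long_input else ["fine"]
    if long_input or w.structured:
        return ["coarse"]
    return ["fine"]  # short/medium input: coarse savings don't matter here


print(plan_passes(Workload(num_tokens=100_000, needs_sentence_precision=True)))
```

A result of `["coarse", "fine"]` corresponds to the pre-filtering pattern. A default call first cuts the corpus down, and a second call with coarse=false then runs over the cut.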

Example

The default call doesn't need to mention coarse at all — paragraph-level scoring is what you get for free:


To opt into token-level precision, set coarse=False (Python), coarse: false (TypeScript), or "coarse": false in the JSON body:

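Python's `False` and JSON's `false` are the same value. The standard library translates between them when you build a raw request body yourself. Here is a minimal sketch using `json.dumps`. It shows only `query` and `coarse`. The rest of the body, such as the context to compress, is omitted because this guide doesn't spell out those fields. The query string is a placeholder.

```python
import json

# Only the fields named in this guide are shown; the rest of the body is omitted.
body = {
    "query": "What is the refund window?",  # placeholder query
    "coarse": False,  # Python False, serialized as JSON false
}

payload = json.dumps(body)
print(payload)  # {"query": "What is the refund window?", "coarse": false}
```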

The tradeoff at a glance

Dimension	Coarse mode (default)	Fine mode (`coarse=false`)
Unit of decision	Paragraph	Token / sentence
Latency on long inputs	Baseline	Higher
Cost	Lower	Higher
Precision	Paragraph-level	Sentence-level
Steering signal (`query`)	Honored	Honored
Response shape	Same	Same

The precision cost of the default is concrete. If only one sentence in a long paragraph is relevant, coarse mode keeps the whole paragraph rather than just that sentence. The compressed output is therefore larger than what a fine pass would produce, but coarse mode produces it much faster. Set coarse=false when that tradeoff isn't worth it.

Coarse mode applies to latte_v1 only. In the SDK, coarse is a typed keyword argument from compresr 2.5.0 onward. The flag has the same name everywhere: coarse in Python and TypeScript, and "coarse": false in a cURL JSON body.
