Query-specific compression

Query-specific compression is context compression that conditions on the question being asked, keeping the spans relevant to that query and dropping the rest.

Naive compression is usually query-blind: it shortens text the same way regardless of what will be asked of it. Query-specific compression instead takes the query as an input and decides what to keep based on relevance to that query. As a result, the same document compressed for two different questions can yield two different shortened contexts.
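As a minimal sketch of the idea, the snippet below compresses one document for two different questions and gets two different contexts. It assumes a toy relevance score (word overlap between each sentence and the query); real query-specific compressors use learned models, and the names `compress_for_query`, `tokens`, and `STOPWORDS` are hypothetical, chosen only for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "to", "and",
             "what", "when", "who", "did", "for", "on", "by"}

def tokens(text):
    """Lowercase word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def compress_for_query(document, query, keep=1):
    """Keep the `keep` sentences sharing the most words with the query, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    q = tokens(query)
    # Rank sentence indices by overlap with the query (toy relevance score).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(tokens(sentences[i]) & q),
                    reverse=True)
    kept = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in kept)

document = (
    "The bridge opened in 1932 after six years of construction. "
    "Its main span measures 503 metres. "
    "Maintenance crews repaint the steel arch continuously."
)

ctx_a = compress_for_query(document, "When did the bridge open?")
ctx_b = compress_for_query(document, "How long is the main span?")

print(ctx_a)  # The bridge opened in 1932 after six years of construction.
print(ctx_b)  # Its main span measures 503 metres.
```

The query is an argument to the compressor, not something applied afterwards, so the retained text changes whenever the question does.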

This matters because answer quality hinges on whether the evidence survives compression. Conditioning on the query protects the answer-bearing tokens for the current task, which is why query-specific methods tend to preserve accuracy better than generic pruning at the same compression ratio.
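The difference can be illustrated with a toy comparison at a fixed budget. The sketch below assumes a deliberately simple setup: generic pruning is modelled as keeping the first sentences, and query-specific selection as ranking sentences by word overlap with the query. Both functions (`query_blind`, `query_specific`) and the sample report are hypothetical, and a single example does not stand in for a benchmark.

```python
import re

def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(text):
    """Lowercase word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def query_blind(sentences, budget):
    """Generic pruning: keep the first `budget` sentences, whatever the question is."""
    return sentences[:budget]

def query_specific(sentences, query, budget):
    """Keep the `budget` sentences with the highest word overlap with the query."""
    q = words(query)
    order = sorted(range(len(sentences)),
                   key=lambda i: len(words(sentences[i]) & q),
                   reverse=True)
    return [sentences[i] for i in sorted(order[:budget])]

report = (
    "The outage began at 09:14 on Tuesday. "
    "Engineers were paged within minutes. "
    "Traffic was rerouted to the backup region. "
    "The root cause was an expired TLS certificate."
)
query = "What was the root cause of the outage?"
answer = "expired TLS certificate"  # the evidence that must survive

sentences = split_sentences(report)
budget = 2  # same ratio for both methods: 2 of 4 sentences

blind = " ".join(query_blind(sentences, budget))
specific = " ".join(query_specific(sentences, query, budget))

blind_has_answer = answer in blind
specific_has_answer = answer in specific

print(blind_has_answer)     # False
print(specific_has_answer)  # True
```

Both methods send the same number of sentences; only the query-conditioned one keeps the sentence that contains the answer.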

In published benchmarks of long-document QA, query-specific compression at light ratios has matched or beaten full-context accuracy while sending far fewer tokens. The accuracy advantage is concentrated at light compression rather than at very high ratios.

Compresr’s public model is query-specific: the query is a required input, and it guides which tokens are retained so the shortened context still answers the question.

Related terms