---
title: LLM context compression benchmarks
url: https://compresr.ai/benchmarks
description: Public, dated FinanceBench + QMSum results for Compresr context compression, including where high ratios trade accuracy.
benchmark_date: 2026-04
---

# On public long-document QA, Compresr light compression (~2x) matches or beats full-context accuracy; past ~2x, higher ratios trade accuracy for cost and latency.

> Compresr light compression (~2x) matches or beats full-context accuracy on FinanceBench (73%→77%) and QMSum (55.9%→59.6%). Beyond ~2x, accuracy slips below baseline, so higher ratios are a deliberate cost-and-latency trade-off.

> Human-readable page: https://compresr.ai/benchmarks

## Results
### FinanceBench: SEC-filing QA
n = 128, single-shot long-document QA, gpt-5.2 judge, 2026-04

| Config | Accuracy | Compression ratio | Evidence retrieval |
| --- | --- | --- | --- |
| Full context (baseline) | 73% | 1.0x | 91% |
| Light compression (best) | 77% | 1.9x | 91% |
| Medium compression (below baseline, cost/latency) | 70% | 4.6x | n/a |
| High compression (below baseline, cost/latency) | 65% | 8.9x | n/a |

### QMSum: meeting-transcript QA
n = 272, single-shot long-document QA, gpt-5.4-mini answerer, gpt-5.4 judge, 2026-04

| Config | Accuracy | Compression ratio |
| --- | --- | --- |
| Full context (baseline) | 55.9% | 1.0x |
| Light compression, query-specific (best) | 59.6% | 1.87x |
| High compression (below baseline, cost/latency) | 42.6% | 8.76x |

#### vs. other methods at roughly matched ratio (~2x; LongLLMLingua at ~3x)

| Method | QMSum accuracy | Ratio |
| --- | --- | --- |
| Compresr (latte_v1) (us) | 59.6% | 1.87x |
| scaledown | 57.4% | ~2x |
| LongLLMLingua | 53.7% | ~3x |
| LLMLingua-2 | 50.7% | ~2x |
| Token Company (ttc) | 48.2% | ~2x |

## Methodology
- Single-shot long-document QA: each filing or transcript is either compressed first or sent in full, then answered and judged. There is no retriever in the loop, so these are NOT RAG comparisons.
- Public model: `latte_v1` (query-specific; `query` is required).
- Statistical rigor: single-run deltas under 2pp sit within noise (5-run std ~0.9–1.5pp), so light-compression results are framed as "matches or beats". High-ratio degradation is published deliberately.
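
A minimal sketch of the protocol described in this list, assuming hypothetical `compress`, `answer`, and `judge` callables. None of these names come from the Compresr API; they stand in for whatever compressor, answerer model, and judge model are used. The sketch runs either arm over the same examples and labels the resulting single-run delta against the 2pp noise threshold stated above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

NOISE_THRESHOLD_PP = 2.0  # from the text: single-run deltas under 2pp sit within noise


@dataclass
class Example:
    document: str   # full filing or transcript
    question: str
    reference: str  # gold answer shown to the judge


def accuracy_pct(
    examples: Sequence[Example],
    answer: Callable[[str, str], str],       # hypothetical: (context, question) -> answer
    judge: Callable[[str, str, str], bool],  # hypothetical: (question, reference, answer) -> correct?
    compress: Optional[Callable[[str, str], str]] = None,  # hypothetical: (document, query) -> shorter context
) -> float:
    """Single-shot QA accuracy in percent; no retriever is involved."""
    correct = 0
    for ex in examples:
        # Compressed arm: the compressor receives the question as its query.
        # Baseline arm (compress=None): the full document is sent unchanged.
        context = compress(ex.document, ex.question) if compress else ex.document
        prediction = answer(context, ex.question)
        correct += judge(ex.question, ex.reference, prediction)
    return 100.0 * correct / len(examples)


def describe_delta(baseline_pct: float, compressed_pct: float) -> str:
    """Label a single-run delta using the 2pp noise threshold."""
    delta = compressed_pct - baseline_pct
    if abs(delta) < NOISE_THRESHOLD_PP:
        return "within noise"
    return "above baseline" if delta > 0 else "below baseline"
```

Calling `accuracy_pct` once with `compress=None` and once with a compressor gives the two numbers to compare. The labels returned by `describe_delta` are illustrative wording, not terms used in the benchmark itself.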

## Accuracy vs. ratio: two separate claims
- The accuracy win lives at LIGHT (~2x) compression: FinanceBench 73%→77%, QMSum 55.9%→59.6% (single-shot long-document QA, not RAG, 2026-04).
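- High ratios (~10x / ~90% reduction) are a COST + LATENCY claim only. Past ~2x, accuracy falls with each further doubling of the ratio (between the light and high rows above, roughly 5pp per doubling on FinanceBench and 7 to 8pp on QMSum) and can drop below the full-context baseline (e.g. FinanceBench at ~8.9x scores 65% vs. 73%).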
- These are never combined into a single "90% reduction at full accuracy" claim.
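
The arithmetic behind these two claims can be checked against the tables. The sketch below assumes that compression ratio means original tokens divided by compressed tokens (the conventional definition, not spelled out on this page) and that accuracy loss is measured per doubling of the ratio, i.e. against log2 of the ratio. The function names are illustrative.

```python
import math


def token_reduction_pct(ratio: float) -> float:
    """Percent of tokens removed, assuming ratio = original / compressed."""
    return 100.0 * (1.0 - 1.0 / ratio)


def drop_per_doubling_pp(ratio_a: float, acc_a: float, ratio_b: float, acc_b: float) -> float:
    """Accuracy points lost per doubling of the ratio between two table rows."""
    doublings = math.log2(ratio_b / ratio_a)
    return (acc_a - acc_b) / doublings


# Rows taken from the tables on this page (accuracy in percent).
print(round(token_reduction_pct(1.9), 1))    # light FinanceBench config
print(round(token_reduction_pct(10.0), 1))   # the ~10x regime
print(round(drop_per_doubling_pp(1.9, 77.0, 8.9, 65.0), 1))    # FinanceBench, light -> high
print(round(drop_per_doubling_pp(1.87, 59.6, 8.76, 42.6), 1))  # QMSum, light -> high
```

Under that definition, 1.9x removes a little under half of the tokens and 10x removes 90%. Between the light and high rows, the slope comes out near 5pp per doubling on FinanceBench and between 7 and 8pp on QMSum.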

## Related machine surfaces
- [/compare/prompt-compression-tools.md: vs other tools](/compare/prompt-compression-tools.md)
- [/glossary/compression-ratio.md](/glossary/compression-ratio.md)
- [/machine: entity overview](/machine)

## Provenance
Compresr Inc. is a Y Combinator W26 company built by four EPFL-trained founders in San Francisco, California and Europe (Switzerland).
Contact: [compresr.ai/contact](https://compresr.ai/contact).