Stop paying for context your model doesn’t need.

Your prompts carry far more tokens than the model needs to answer. Compresr drops the rest: up to ~90% fewer tokens, so you cut cost and latency. At light (~2×) compression it matches or beats full-context accuracy on public benchmarks.

See it on your own file in 60 seconds
Claude Code prompt
Use compresr to show me live cost savings on my own file.

1. pip install compresr
2. Introspect the SDK to discover the API, don't guess.
3. Ask me for COMPRESR_API_KEY (open https://compresr.ai/signup if I don't have one: $10 free, no card).
4. Ask me for (a) a path to a long document (PDF/.md/.txt) and (b) a question about it.
5. Compress the document with the question, then print a receipt: tokens in/out, ratio, GPT-5.2 cost full vs compressed at $1.75/M input tokens, % saved.
6. Ask GPT-5.2 the question against the compressed context and print the answer.

Don't skip the cost receipt; that's the point.
Works in Claude Code, Cursor, or any agent harness.
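The receipt in step 5 is plain arithmetic once the two token counts are known. Here is a minimal sketch in Python, under stated assumptions: `make_receipt` and `format_receipt` are hypothetical helper names, the token counts are passed in by hand (they would come from whatever the SDK or your tokenizer reports), and nothing here calls the compresr SDK, whose API is not shown on this page. Only input-token cost at the $1.75/M rate from the prompt is modeled, so the totals will not necessarily match per-query figures that include other costs.

```python
from dataclasses import dataclass

# USD per million input tokens, the GPT-5.2 rate quoted in the prompt above.
PRICE_PER_M_INPUT = 1.75


@dataclass
class Receipt:
    tokens_in: int
    tokens_out: int
    ratio: float            # tokens_in / tokens_out
    cost_full: float        # USD, input tokens only
    cost_compressed: float  # USD, input tokens only
    pct_saved: float        # percent of input cost saved


def make_receipt(tokens_in: int, tokens_out: int,
                 price_per_m: float = PRICE_PER_M_INPUT) -> Receipt:
    """Build a cost receipt from token counts (hypothetical helper)."""
    if tokens_in <= 0 or tokens_out <= 0:
        raise ValueError("token counts must be positive")
    cost_full = tokens_in * price_per_m / 1_000_000
    cost_compressed = tokens_out * price_per_m / 1_000_000
    return Receipt(
        tokens_in=tokens_in,
        tokens_out=tokens_out,
        ratio=tokens_in / tokens_out,
        cost_full=cost_full,
        cost_compressed=cost_compressed,
        pct_saved=(1 - tokens_out / tokens_in) * 100,
    )


def format_receipt(r: Receipt) -> str:
    """Render the receipt as a few aligned lines of text."""
    return "\n".join([
        f"tokens in/out : {r.tokens_in:,} / {r.tokens_out:,}",
        f"ratio         : {r.ratio:.1f}x",
        f"cost full     : ${r.cost_full:.4f}",
        f"cost compr.   : ${r.cost_compressed:.4f}",
        f"saved         : {r.pct_saved:.1f}%",
    ])


# Token counts taken from the Boeing 10-K example on this page.
receipt = make_receipt(112_552, 498)
print(format_receipt(receipt))
```

Keeping the arithmetic separate from the compression call makes the receipt easy to check by hand, whichever client produced the token counts.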

How it works

We keep the signal and drop the noise.

Your raw text
Boeing reported total revenue of $77.8B 2023 10-K commercial airplanes … +112,540 more
112,552 tokens (Boeing 10-K), $0.263/query
Compresr

Keep the tokens that matter to your query.

Compression: 226×
Compressed
revenue $77.8B 2023
498 tokens, $0.037/query, 86% cheaper
Tokens: 112,552 → 498 (226× fewer)
Cost: $0.263 → $0.037 (86% cheaper)
Latency: 18s → 13.7s (24% faster)

What most teams are losing

Stop overpaying.

If you’re paying full price for your tokens, you’re leaving real money on the table.

~90% bill cut
10× avg. compression
+3.7pp accuracy uplift
Today: what most teams are doing

Trimming / Truncation

  • Cuts off the tail: the answer was often in what you dropped.
  • Accuracy collapses on long docs.

Summarization

  • Lossy rewrite: nuance and exact wording are gone.
  • Costs extra LLM calls and latency for a worse context.

Question-agnostic compression

  • Compresses blindly: keeps irrelevant tokens, drops important ones.
  • Rarely gets past 5× without tanking accuracy.
With Compresr: one API call. Any scale.

Question-aware compression.

Feed us the query and the context. We return only the tokens that actually move the answer. You pay less, the LLM responds faster, and answers get sharper.

Per query: $0.263 → $0.037 (GPT-5.2 + latte_v1)

Tokens in: 112,552 → 498 (226× fewer tokens)

  • Question-aware: we compress for the task.
  • At light ~2× compression, accuracy matches or beats full context.
  • SDK or on-prem. Your call.
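To make the contrast with truncation concrete, here is a toy sketch in Python. It is not Compresr's algorithm: it scores each sentence by simple word overlap with the question and keeps the best-scoring sentences that fit a word budget. The function names, the three-sentence document, and the budget are all hypothetical, chosen only to illustrate the idea.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def truncate(sentences, budget):
    """Keep leading sentences until the word budget would be exceeded."""
    kept, used = [], 0
    for s in sentences:
        cost = len(s.split())
        if used + cost > budget:
            break
        kept.append(s)
        used += cost
    return kept


def question_aware(sentences, question, budget):
    """Keep the sentences sharing the most words with the question (toy scoring)."""
    q = set(tokenize(question))
    # Rank sentence indices by overlap with the question, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -len(q & set(tokenize(sentences[i]))))
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i].split())
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    # Restore original document order for the kept sentences.
    return [sentences[i] for i in sorted(chosen)]


doc = [
    "This filing describes the business.",
    "Risk factors are listed in a later section.",
    "Total revenue was $77.8B in 2023.",
]
question = "What was total revenue in 2023?"

print(truncate(doc, budget=6))                  # keeps only the first sentence
print(question_aware(doc, question, budget=6))  # keeps only the revenue sentence
```

With the same budget, truncation keeps the opening sentence and loses the answer, while the question-aware pass keeps the one sentence that contains it.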

Independent benchmark

FinanceBench.

             Baseline (GPT-5.2)    latte_v1 API + GPT-5.2
Compression  None                  ~2×
Context      ~106K tokens          ~56K tokens
Accuracy     73%                   77%
Savings      None                  ~47% cheaper

FinanceBench, 128 questions over SEC filings. At light ~2× compression, accuracy holds; push to ~10× when cost matters more than peak accuracy.
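The savings figure follows from the two context sizes if cost is assumed to scale linearly with input tokens. A quick check in Python, using the rounded ~106K and ~56K values reported for the benchmark (`input_savings` is a hypothetical helper name):

```python
def input_savings(baseline_tokens, compressed_tokens):
    """Return (compression ratio, fraction of input tokens saved)."""
    ratio = baseline_tokens / compressed_tokens
    saved = 1 - compressed_tokens / baseline_tokens
    return ratio, saved


# Rounded context sizes from the FinanceBench comparison.
ratio, saved = input_savings(106_000, 56_000)
print(f"{ratio:.2f}x compression, {saved:.0%} fewer input tokens")
# prints "1.89x compression, 47% fewer input tokens"
```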

Two ways to deploy

Pick the one that fits your stack.

Hosted SDK

Drop-in SDK. One API key.

Install, grab a key, compress any prompt or document before it hits your LLM. Pay per million tokens, no surprise bills.

  • $10 in free credits on sign-up, no credit card required
  • TypeScript & Python clients
  • Question-aware compression
  • Transparent per-million-token pricing
Get your free credits

Sign up, get $10 of compression free, no card needed.

On-Prem Deployment

Runs inside your VPC.

Your data never leaves your network. We deploy Compresr to your infrastructure, tune it for your workload, and support you directly.

  • Private deployment in your cloud or data center
  • Custom throughput & latency SLAs
  • Tailored to your business needs
  • Dedicated support
Contact us for on-prem

Enterprise, finance, healthcare, regulated workloads.