On-prem deployment
Self-host the Compresr compression server inside your VPC or on bare metal.
Compresr ships a hardened Docker image that mirrors the cloud /compress surface, so
you can run the same compression pipeline inside your own infrastructure. On-prem
deployments are designed for teams that cannot send raw context to a third-party API:
you run the image inside your own VPC or bare-metal cluster, and no document
text ever leaves your network boundary. The same SDKs and request shapes work against the
self-hosted endpoint; the only thing that changes is the base URL.
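As a minimal sketch of what "only the base URL changes" can look like in application code, the helper below picks the endpoint from an environment variable. The variable name `COMPRESR_BASE_URL` and the `https://` scheme on the hosted default are assumptions made for illustration, not documented settings. The host name and the `/compress` path come from this page.

```python
import os

# Hosted default: the host name is from this page, the https scheme is assumed.
HOSTED_BASE_URL = "https://api.compresr.ai"


def resolve_base_url(env=None):
    """Return the base URL to use, without a trailing slash.

    COMPRESR_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    base = env.get("COMPRESR_BASE_URL", "").strip() or HOSTED_BASE_URL
    return base.rstrip("/")


def compress_url(env=None):
    """Build the full URL of the /compress endpoint."""
    return resolve_base_url(env) + "/compress"
```

With the variable unset, calls go to the hosted API. Setting it to the address of your own deployment redirects them without touching call sites.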
This page is a placeholder. Detailed runbooks (image registry, sizing guidance, GPU vs CPU profiles, SSO wiring, audit-log sinks) will land here as customers come online.
When to choose on-prem
- Data sovereignty. Regulated workloads (healthcare, finance, defense, EU residency requirements) where context cannot transit a vendor's network.
- Sensitive context. Internal documents, customer PII, or trade-secret material that must stay inside your security perimeter.
- Latency-sensitive in-VPC traffic. Co-locating compression next to your inference cluster removes the round-trip to api.compresr.ai and lets you keep request budgets measured in single-digit milliseconds.
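To check whether co-location actually fits your latency budget, one option is to time the same call against both endpoints. The sketch below is generic: `call` is any zero-argument function you supply (for example, one that sends a request to your self-hosted endpoint), and the helper name and default sample count are illustrative.

```python
import statistics
import time


def measure_latency_ms(call, samples=20):
    """Time `call` repeatedly and return (median_ms, max_ms)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        call()
        # perf_counter returns seconds; convert the elapsed time to milliseconds.
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)
```

Comparing the median against the maximum gives a rough sense of tail latency, which usually matters more than the average for tight request budgets.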
What you get
- The same latte_v1 query-specific compression model that powers the cloud API. No accuracy gap between hosted and self-hosted: it's the same weights.
- A bare /compress HTTP surface that accepts the same request parameters as the hosted endpoint: context, query, target_compression_ratio, coarse, compression_model_name.
- Drop-in compatibility with the public Python and TypeScript SDKs. Point CompressionClient at your self-hosted URL and existing call sites keep working unchanged.
- Optional audit logging that writes every request to a sink you control (object storage, SIEM, or stdout for your own log pipeline).
- Optional SSO / IdP integration for operator access to the admin surface.
- Air-gapped operation supported — the image does not phone home and does not require outbound network access at request time.
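The request parameters listed above can be assembled for a self-hosted endpoint with nothing but the Python standard library. This is a sketch under assumptions: that the endpoint accepts a JSON body via POST, and that the example values (a float ratio, a boolean `coarse`) have the right types. This page names the parameters but does not specify the wire format, authentication, or response shape. The host name is a placeholder.

```python
import json
import urllib.request


def build_compress_request(base_url, context, query, *,
                           target_compression_ratio, coarse,
                           compression_model_name="latte_v1"):
    """Build (but do not send) a request for the /compress endpoint.

    Assumes a JSON body sent via POST; the parameter types are guesses.
    """
    payload = {
        "context": context,
        "query": query,
        "target_compression_ratio": target_compression_ratio,
        "coarse": coarse,
        "compression_model_name": compression_model_name,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/compress",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and illustrative values; substitute your own deployment.
request = build_compress_request(
    "http://compresr.internal:8080",
    context="Long document text...",
    query="What is the refund policy?",
    target_compression_ratio=0.5,
    coarse=False,
)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect or log before any traffic leaves the process.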
Deployment shapes we support
- Single-node Docker. One host, one container, fronted by your existing reverse proxy. Good for evaluation and smaller workloads.
- Kubernetes. Helm chart with horizontal pod autoscaling, readiness probes, and a values file you check into your own infra repo.
- VPC-managed compute. Run the image on your existing GPU or CPU node pool inside AWS, GCP, or Azure — the image is registry-agnostic.
Sizing, GPU vs CPU tradeoffs, and a concrete reference architecture for each shape will be documented here as the on-prem rollout matures.
How to get access
On-prem deployment is part of the enterprise tier. Contact sales to discuss image access, licensing, and a deployment plan for your environment. We typically run a short scoping call to confirm compliance requirements, expected throughput, and your preferred deployment shape before handing over the image and a configuration walkthrough.
Placeholder
This page is a stub. The full on-prem documentation — image pull instructions, configuration reference, hardware sizing, observability, and upgrade flow — is being written. If you need it now, reach out and we'll walk you through it directly.