Compresr docs

On-prem deployment

Self-host the Compresr compression server inside your VPC or on bare metal.

Compresr ships a hardened Docker image that mirrors the cloud /compress surface, so you can run the same compression pipeline inside your own infrastructure. On-prem deployments are designed for teams that cannot send raw context to a third-party API: you run the image inside your own VPC or bare-metal cluster, and no document text leaves your network boundary. The same SDKs and request shapes work against the self-hosted endpoint; only the base URL changes.

This page is a placeholder. Detailed runbooks (image registry, sizing guidance, GPU vs CPU profiles, SSO wiring, audit-log sinks) will land here as customers come online.

When to choose on-prem

  • Data sovereignty. Regulated workloads (healthcare, finance, defense, EU residency requirements) where context cannot transit a vendor's network.
  • Sensitive context. Internal documents, customer PII, or trade-secret material that must stay inside your security perimeter.
  • Latency-sensitive in-VPC traffic. Co-locating compression next to your inference cluster removes the round-trip to api.compresr.ai and lets you keep request budgets measured in single-digit milliseconds.

What you get

  • The same latte_v1 query-specific compression model that powers the cloud API. No accuracy gap between hosted and self-hosted — it's the same weights.
  • A bare /compress HTTP surface that accepts the same request parameters as the hosted endpoint — context, query, target_compression_ratio, coarse, compression_model_name.
  • Drop-in compatibility with the public Python and TypeScript SDKs. Point CompressionClient at your self-hosted URL and existing call sites keep working unchanged.
  • Optional audit logging that writes every request to a sink you control (object storage, SIEM, or stdout for your own log pipeline).
  • Optional SSO / IdP integration for operator access to the admin surface.
  • Air-gapped operation. The image does not phone home and does not require outbound network access at request time.
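
As a rough illustration of the "same request shape, different base URL" idea, here is a minimal Python sketch that builds a /compress request using only the standard library. The parameter names come from the list above. Everything else is an assumption made for illustration: the JSON body, the POST method, the value types and defaults, the /compress path sitting at the root of api.compresr.ai, the absence of an auth header, and the hypothetical self-hosted address compresr.internal:8080. The sketch does not use the official SDK; check the SDK reference for the real CompressionClient interface.

```python
import json
import urllib.request

# Assumed base URLs. The cloud host appears in this page; the path layout and
# the self-hosted address are hypothetical placeholders.
CLOUD_BASE_URL = "https://api.compresr.ai"
SELF_HOSTED_BASE_URL = "http://compresr.internal:8080"


def build_compress_request(base_url, context, query,
                           target_compression_ratio=0.5, coarse=False,
                           compression_model_name="latte_v1"):
    """Build (but do not send) a POST request for the /compress surface.

    The JSON encoding, value types, and default values are assumptions.
    """
    payload = {
        "context": context,
        "query": query,
        "target_compression_ratio": target_compression_ratio,
        "coarse": coarse,
        "compression_model_name": compression_model_name,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/compress",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The same call works for both targets; only the base URL differs.
cloud_req = build_compress_request(CLOUD_BASE_URL, "long document text", "what changed?")
onprem_req = build_compress_request(SELF_HOSTED_BASE_URL, "long document text", "what changed?")
```

Sending either request would be a matter of passing it to urllib.request.urlopen. The two requests carry identical bodies, which is the property that lets existing call sites move to a self-hosted endpoint without changes.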

Deployment shapes we support

  • Single-node Docker. One host, one container, fronted by your existing reverse proxy. Good for evaluation and smaller workloads.
  • Kubernetes. Helm chart with horizontal pod autoscaling, readiness probes, and a values file you check into your own infra repo.
  • VPC-managed compute. Run the image on your existing GPU or CPU node pool inside AWS, GCP, or Azure — the image is registry-agnostic.

Sizing, GPU vs CPU tradeoffs, and a concrete reference architecture for each shape will be documented here as the on-prem rollout matures.

How to get access

On-prem deployment is part of the enterprise tier. Contact sales to discuss image access, licensing, and a deployment plan for your environment. We typically run a short scoping call to confirm compliance requirements, expected throughput, and your preferred deployment shape before handing over the image and a configuration walkthrough.

Placeholder

This page is a stub. The full on-prem documentation — image pull instructions, configuration reference, hardware sizing, observability, and upgrade flow — is being written. If you need it now, reach out and we'll walk you through it directly.