Context Gateway
A transparent proxy that sits between your AI agent and the LLM API, automatically compressing conversation history in the background so you never wait.
What is Context Gateway?
Context Gateway intercepts LLM API calls from your AI agent and manages context automatically:
- Monitors conversation length — tracks token usage across the conversation
- Pre-computes summaries — when context hits 75% of the limit (configurable), it starts compressing history in the background
- Instant compaction — when the limit is reached, the compressed version is ready instantly with no wait
- Compresses tool outputs — large tool outputs are compressed on the fly
- Logs everything — all compression events are logged to logs/history_compaction.jsonl
Installation
Quick Install
curl -fsSL https://compresr.ai/api/install | sh
Manual Install Options
- Download the binary directly from GitHub Releases
- Build from source (requires Go 1.21+):
go build -o context-gateway ./cmd/gateway
Quick Start
1. Launch the interactive wizard
context-gateway
2. Follow the wizard
The TUI wizard will guide you through:
- Select your agent (Claude Code, Codex, OpenHands, OpenClaw, or Custom)
- Enter your LLM provider API key (Anthropic, OpenAI, etc.)
- Enter your Compresr API key
- Configure compression settings (threshold, model, etc.)
3. Use your agent as usual
The gateway runs as a local proxy on http://localhost:8080. Your agent's API calls are routed through it automatically. No code changes needed.
Supported Agents
Works with any LLM provider, including OpenAI, Anthropic, Ollama (local models), and Amazon Bedrock. You can start or stop the gateway at any time; agents detect it automatically.
Claude Code
Codex
OpenHands
OpenClaw
Custom
Coding Agents
For CLI-based coding agents like Claude Code, Codex, and OpenHands. Same user experience — you select the agent in the interactive wizard, then use it as usual. The gateway runs in one terminal and your agent runs in another.
# Terminal 1: start the gateway (interactive mode)
context-gateway
# -> Select your agent (Claude Code, Codex, OpenHands, ...)
# -> Enter your API keys
# -> Gateway starts on http://localhost:8080
# Terminal 2: use your agent as usual
claude # Claude Code
codex # Codex
openhands # OpenHands
The gateway intercepts API calls and compresses context in the background. Your agent never knows the difference. You can stop and restart the gateway at any time — the agent will detect it automatically.
Non-Interactive Agents
For agents like OpenClaw that run as persistent services or deployments.
OpenClaw handles routing via its plugin system. Install the Context Gateway plugin and OpenClaw will automatically route through the gateway. If you have a running agent or deploy a new one, it auto-detects the gateway and routes through it — no restarts needed.
# Install the OpenClaw plugin
openclaw plugin install context-gateway
# Start the gateway
context-gateway
# Any running or new OpenClaw agent auto-detects the gateway
# You can stop/restart the gateway anytime — agents detect it automatically
For other non-interactive agents or custom deployments, point your agent's LLM API base URL to the gateway proxy endpoint (http://localhost:8080).
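For custom agents, routing is usually just a base-URL override. A minimal sketch, assuming your agent is built on an SDK that honors the standard base-URL environment variables (variable names vary by SDK, and `my-agent` is a placeholder):

```shell
# For agents built on Anthropic's SDKs
export ANTHROPIC_BASE_URL=http://localhost:8080

# For agents built on OpenAI's SDKs
export OPENAI_BASE_URL=http://localhost:8080

# Run the agent as usual; its API calls now pass through the gateway
./my-agent
```

If your SDK has no such variable, most clients accept a base URL constructor argument instead.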
Configuration
Configuration is saved to ~/.config/context-gateway/.env after running the interactive wizard. You can edit or update your config at any time using the CLI:
Re-configure via CLI
context-gateway -c
Or edit it manually at ~/.config/context-gateway/.env:
Environment Variables
# Required
COMPRESR_API_KEY=cmp_your_api_key # Your Compresr API key
LLM_API_KEY=sk-xxx # Your LLM provider API key
# Agent Configuration
AGENT_TYPE=claude_code # claude_code | openclaw | opencode | custom
PROXY_PORT=8080 # Local proxy port (default: 8080)
# Compression Settings
CONTEXT_THRESHOLD=0.75 # Trigger compression at 75% of context limit
COMPRESSION_MODEL=espresso_v1 # Model used for history compression
TARGET_COMPRESSION_RATIO=0.5 # How aggressively to compress (0.2-0.9)
# Optional
SLACK_WEBHOOK_URL=https://hooks.slack.com/... # Slack notifications
LOG_LEVEL=info # debug | info | warn | error
What You'll Notice
No waiting
When the conversation hits the limit, the compressed summary is already prepared.
Transparent
Your agent keeps working normally — it never knows the difference.
Automatic
No code changes needed. Just route API calls through the gateway.
Observable
Check logs to see every compression event with full metrics.
How It Works
- Intercepts requests: All LLM API calls from your agent go through the gateway proxy
- Tracks context usage: The gateway monitors token count across the conversation
- Background compression: When usage hits the threshold (default 75%), compressed summaries are pre-computed
- Instant swap: When the context limit is reached, the compressed history replaces the original — no wait time
- Compresses tool outputs: Large tool outputs (file reads, search results, etc.) are compressed on the fly
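As a concrete example of the threshold step, assume a 200,000-token context window (an illustrative figure, not a documented limit) with the default CONTEXT_THRESHOLD=0.75:

```shell
# Background compression starts once the conversation crosses
# context_limit * CONTEXT_THRESHOLD tokens
CONTEXT_LIMIT=200000     # assumed window size, for illustration only
CONTEXT_THRESHOLD=0.75   # default from the configuration above
trigger=$(awk -v l="$CONTEXT_LIMIT" -v t="$CONTEXT_THRESHOLD" 'BEGIN { printf "%d", l * t }')
echo "$trigger"   # 150000
```

From that point on, summaries are pre-computed in the background, so the swap at the hard limit is instant.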
Logs & Monitoring
The gateway creates detailed logs for every compression event:
- logs/history_compaction.jsonl — when and how conversations are compressed
- logs/tool_output_compression.jsonl — tool output compression metrics and results
- logs/telemetry.jsonl — request/response timing and performance data
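Since the logs are line-delimited JSON, they pair well with standard tooling; a small sketch using jq (field names as in the example entry below):

```shell
# Watch compression ratios as compaction events happen
tail -f logs/history_compaction.jsonl | jq '.compression_ratio'

# Total tokens saved so far across all compaction events
jq -s 'map(.original_tokens - .compressed_tokens) | add' \
  logs/history_compaction.jsonl
```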
Example log entry
{
"timestamp": "2026-03-06T14:30:00Z",
"event": "history_compaction",
"agent": "claude_code",
"original_tokens": 180000,
"compressed_tokens": 54000,
"compression_ratio": 0.7,
"model": "espresso_v1",
"latency_ms": 1200
}
Remote Deployment
Deploy the gateway as a service for team-wide usage:
Deploy as a service
# Using Docker
docker run -d \
-p 8080:8080 \
-e COMPRESR_API_KEY=cmp_your_api_key \
-e LLM_API_KEY=sk-xxx \
-e AGENT_TYPE=claude_code \
compresr/context-gateway:latest
# Or using the binary directly
COMPRESR_API_KEY=cmp_xxx LLM_API_KEY=sk-xxx context-gateway --port 8080
Environment Variables for Deployment
All configuration options from the ~/.config/context-gateway/.env file can be passed as environment variables. This makes it easy to deploy via Docker, Kubernetes, or any container orchestration platform.
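For instance, a wizard-generated local config can be reused as-is for a container, a sketch using Docker's --env-file flag (image name as in the example above):

```shell
# Pass the existing config file straight into the container environment
docker run -d \
  -p 8080:8080 \
  --env-file ~/.config/context-gateway/.env \
  compresr/context-gateway:latest
```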
Benefits
- Zero latency: Compression happens in the background, not on the critical path
- Transparent: Works with existing tools and workflows without code changes
- Cost savings: Reduce token usage by 30-70%
- Extended conversations: Never hit context limits unexpectedly
- Better quality: Agent retains more important context after compression
Contributing
We welcome contributions! Please join our Discord to contribute.