Context Gateway
A transparent proxy that sits between your AI agent and the LLM API, automatically compressing conversation history in the background so you never wait.
What is Context Gateway?
Context Gateway intercepts LLM API calls from your AI agent and manages context automatically:
- Monitors conversation length — tracks token usage across the conversation
- Pre-computes summaries — when context hits 75% of the limit (configurable), it starts compressing history in the background
- Instant compaction — when the limit is reached, the compressed version is ready instantly with no wait
- Compresses tool outputs — large tool outputs are compressed on the fly
- Logs everything — all compression events are logged to logs/history_compaction.jsonl
Installation
Quick Install
curl -fsSL https://compresr.ai/api/install | sh
Manual Install Options
- Download the binary directly from GitHub Releases
- Build from source (requires Go 1.21+):
go build -o context-gateway ./cmd/gateway
Quick Start
1. Launch the interactive wizard
context-gateway
2. Follow the wizard
The TUI wizard will guide you through:
- Select your agent (Claude Code, Codex, OpenHands, OpenClaw, or Custom)
- Enter your LLM provider API key (Anthropic, OpenAI, etc.)
- Enter your Compresr API key
- Configure compression settings (threshold, model, etc.)
3. Use your agent as usual
The gateway runs as a local proxy on http://localhost:8080. Your agent's API calls are routed through it automatically. No code changes needed.
Supported Agents
Works with any LLM provider, including OpenAI, Anthropic, Ollama (local models), and Amazon Bedrock. You can start or stop the gateway at any time; agents detect it automatically.
Claude Code
Codex
OpenHands
OpenClaw
Custom
Coding Agents
For CLI-based coding agents like Claude Code, Codex, and OpenHands. Same user experience — you select the agent in the interactive wizard, then use it as usual. The gateway runs in one terminal and your agent runs in another.
# Terminal 1: start the gateway (interactive mode)
context-gateway
# -> Select your agent (Claude Code, Codex, OpenHands, ...)
# -> Enter your API keys
# -> Gateway starts on http://localhost:8080
# Terminal 2: use your agent as usual
claude # Claude Code
codex # Codex
openhands # OpenHands
The gateway intercepts API calls and compresses context in the background. Your agent never knows the difference. You can stop and restart the gateway at any time — the agent will detect it automatically.
Non-Interactive Agents
For agents like OpenClaw that run as persistent services or deployments.
OpenClaw handles routing via its plugin system. Install the Context Gateway plugin and OpenClaw will automatically route through the gateway. If you have a running agent or deploy a new one, it auto-detects the gateway and routes through it — no restarts needed.
# Install the OpenClaw plugin
openclaw plugin install context-gateway
# Start the gateway
context-gateway
# Any running or new OpenClaw agent auto-detects the gateway
# You can stop/restart the gateway anytime — agents detect it automatically
For other non-interactive agents or custom deployments, point your agent's LLM API base URL to the gateway proxy endpoint (http://localhost:8080).
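For custom agents, routing is usually just a base-URL override. A minimal sketch, assuming your agent is built on an SDK that honors the standard base-URL environment variables (variable names vary by SDK, and `my-agent` is a placeholder):

```shell
# For agents built on Anthropic's SDKs
export ANTHROPIC_BASE_URL=http://localhost:8080

# For agents built on OpenAI's SDKs
export OPENAI_BASE_URL=http://localhost:8080

# Run the agent as usual; its API calls now pass through the gateway
./my-agent
```

If your SDK has no such variable, most clients accept a base URL constructor argument instead.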
Configuration
Configuration is saved to ~/.config/context-gateway/.env after running the interactive wizard. You can edit or update your config at any time using the CLI:
Re-configure via CLI
context-gateway -c
Or edit it manually at ~/.config/context-gateway/.env:
Environment Variables
# Required
COMPRESR_API_KEY=cmp_your_api_key # Your Compresr API key
LLM_API_KEY=sk-xxx # Your LLM provider API key
# Agent Configuration
AGENT_TYPE=claude_code # claude_code | openclaw | opencode | custom
PROXY_PORT=8080 # Local proxy port (default: 8080)
# Compression Settings
CONTEXT_THRESHOLD=0.75 # Trigger compression at 75% of context limit
COMPRESSION_MODEL=espresso_v1 # Model used for history compression
TARGET_COMPRESSION_RATIO=0.5 # How aggressively to compress (0.2-0.9)
# Optional
SLACK_WEBHOOK_URL=https://hooks.slack.com/... # Slack notifications
LOG_LEVEL=info # debug | info | warn | error
What You'll Notice
No waiting
When the conversation hits the limit, the compressed summary is already prepared.
Transparent
Your agent keeps working normally — it never knows the difference.
Automatic
No code changes needed. Just route API calls through the gateway.
Observable
Check logs to see every compression event with full metrics.
How It Works
- Intercepts requests: All LLM API calls from your agent go through the gateway proxy
- Tracks context usage: The gateway monitors token count across the conversation
- Background compression: When usage hits the threshold (default 75%), compressed summaries are pre-computed
- Instant swap: When the context limit is reached, the compressed history replaces the original — no wait time
- Compresses tool outputs: Large tool outputs (file reads, search results, etc.) are compressed on the fly
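As a concrete example of the threshold step, assume a 200,000-token context window (an illustrative figure, not a documented limit) with the default CONTEXT_THRESHOLD=0.75:

```shell
# Background compression starts once the conversation crosses
# context_limit * CONTEXT_THRESHOLD tokens
CONTEXT_LIMIT=200000     # assumed window size, for illustration only
CONTEXT_THRESHOLD=0.75   # default from the configuration above
trigger=$(awk -v l="$CONTEXT_LIMIT" -v t="$CONTEXT_THRESHOLD" 'BEGIN { printf "%d", l * t }')
echo "$trigger"   # 150000
```

From that point on, summaries are pre-computed in the background, so the swap at the hard limit is instant.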
Logs & Monitoring
The gateway creates detailed logs for every compression event:
- logs/history_compaction.jsonl — when and how conversations are compressed
- logs/tool_output_compression.jsonl — tool output compression metrics and results
- logs/telemetry.jsonl — request/response timing and performance data
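Since the logs are line-delimited JSON, they pair well with standard tooling; a small sketch using jq (field names as in the example entry below):

```shell
# Watch compression ratios as compaction events happen
tail -f logs/history_compaction.jsonl | jq '.compression_ratio'

# Total tokens saved so far across all compaction events
jq -s 'map(.original_tokens - .compressed_tokens) | add' \
  logs/history_compaction.jsonl
```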
Example log entry
{
"timestamp": "2026-03-06T14:30:00Z",
"event": "history_compaction",
"agent": "claude_code",
"original_tokens": 180000,
"compressed_tokens": 54000,
"compression_ratio": 0.7,
"model": "espresso_v1",
"latency_ms": 1200
}
Remote Deployment
Deploy the gateway as a service for team-wide usage:
Deploy as a service
# Using Docker
docker run -d \
-p 8080:8080 \
-e COMPRESR_API_KEY=cmp_your_api_key \
-e LLM_API_KEY=sk-xxx \
-e AGENT_TYPE=claude_code \
compresr/context-gateway:latest
# Or using the binary directly
COMPRESR_API_KEY=cmp_xxx LLM_API_KEY=sk-xxx context-gateway --port 8080
Environment Variables for Deployment
All configuration options from the ~/.config/context-gateway/.env file can be passed as environment variables. This makes it easy to deploy via Docker, Kubernetes, or any container orchestration platform.
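For instance, a wizard-generated local config can be reused as-is for a container, a sketch using Docker's --env-file flag (image name as in the example above):

```shell
# Pass the existing config file straight into the container environment
docker run -d \
  -p 8080:8080 \
  --env-file ~/.config/context-gateway/.env \
  compresr/context-gateway:latest
```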
Benefits
- Zero latency: Compression happens in the background, not on the critical path
- Transparent: Works with existing tools and workflows without code changes
- Cost savings: Reduce token usage by 30-70%
- Extended conversations: Never hit context limits unexpectedly
- Better quality: Agent retains more important context after compression
Contributing
We welcome contributions! Please join our Discord to contribute.