Hanzo Gateway
Unified API gateway and LLM proxy — routes to 100+ AI providers
Hanzo Gateway is the unified, high-performance API gateway for all Hanzo services. It serves as the single entry point for LLM inference, commerce, authentication, analytics, and every other Hanzo backend -- with rate limiting, circuit breakers, header forwarding, and telemetry driven by declarative JSON configuration. Two independent gateway instances run on separate Kubernetes clusters, each with its own routing table and rate-limit policy.
GitHub: github.com/hanzoai/gateway
Docker: ghcr.io/hanzoai/gateway
License: Apache-2.0
Endpoints
| Cluster | Domain | Purpose |
|---|---|---|
| hanzo-k8s | api.hanzo.ai | All Hanzo services (LLM, Commerce, Auth, Analytics, Agents, Bot, etc.) |
| lux-k8s | api.lux.network | Lux blockchain RPC (C-Chain, X-Chain, P-Chain, indexers) |
Both endpoints terminate TLS via Cloudflare (Hanzo) or DigitalOcean Load Balancer (Lux) and proxy to the gateway on port 8080 inside the cluster.
Features
- LLM Proxy: OpenAI-compatible `/v1/*` endpoints routed to 100+ models via Cloud Backend and DO AI inference
- Unified Routing: Single gateway fans out to 15+ backend services across 133+ endpoints (Cloud, Commerce, IAM, Analytics, Agents, Bot, Operative, KMS, Web3, Infra, Console, Team, Billing)
- Rate Limiting: Per-IP and global rate limits on every endpoint, enforced at the gateway layer before requests reach backends
- Circuit Breakers: Automatic backend failure isolation prevents cascading failures across services
- Passthrough Encoding: All traffic uses `no-op` encoding -- the gateway never transforms request or response bodies
- Header Forwarding: All client headers (including `Authorization`) are forwarded to backends
- Health Checks: `/__health` liveness probe, `GET /health` public endpoint
- Streaming: Passthrough encoding preserves SSE streams for chat completions
- Telemetry: Structured logging with `[GATEWAY]` prefix, Prometheus-compatible metrics
- Zero-Downtime Deploys: ConfigMap-based config with rolling restart
- Multi-Cluster: Separate configs, images, and deployments for Hanzo and Lux clusters
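The circuit-breaker behavior can be sketched in a few lines. This is an illustrative model of the pattern only -- the class name, threshold, and cooldown below are assumptions, not the gateway's actual Go implementation:

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; probes again after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a request may be sent to the backend."""
        if self.opened_at is None:
            return True
        # Half-open: let a probe request through once the cooldown elapses.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        """Record the outcome of a backend call."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

While the breaker is open, the gateway can fail fast with a 502/503 instead of queueing requests against a dead backend, which is what prevents the cascade.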
Architecture
```
                 Internet
                    |
        +-----------+-----------+
        |                       |
   Cloudflare           DO Load Balancer
  api.hanzo.ai          api.lux.network
        |                       |
        v                       v
+------------------+  +------------------+
|    hanzo-k8s     |  |     lux-k8s      |
| Gateway (2 pods) |  | Gateway (2 pods) |
|    port 8080     |  |    port 8080     |
+--------+---------+  +--------+---------+
         |                     |
   +-----+-----+         +-----+-----+
   |           |         |           |
 /v1/*   /commerce/*  /ext/bc/C/rpc  /ext/bc/X
   |           |         |           |
   v           v         v           v
cloud-api   commerce  luxd:9630   luxd:9630
 :8000       :8001    (mainnet)   (mainnet)
```

Hanzo Cluster Route Map
| Path Prefix | Backend Service | Port | Description |
|---|---|---|---|
| `/v1/chat/completions` | cloud-api | 8000 | LLM chat completions |
| `/v1/completions` | cloud-api | 8000 | LLM text completions |
| `/v1/embeddings` | DO AI (direct) | 443 | Text embeddings |
| `/v1/models` | DO AI (direct) | 443 | Model listing |
| `/v1/images/generations` | DO AI (direct) | 443 | Image generation |
| `/v1/audio/transcriptions` | DO AI (direct) | 443 | Speech-to-text |
| `/v1/audio/speech` | DO AI (direct) | 443 | Text-to-speech |
| `/v1/async-invoke` | DO AI (direct) | 443 | Async inference jobs |
| `/cloud/*` | cloud-api | 8000 | Cloud management API |
| `/ai/*` | cloud-api | 8000 | AI management API |
| `/commerce/*` | commerce | 8001 | Commerce / payments |
| `/billing/*` | commerce | 8001 | Billing API (v1) |
| `/auth/*` | iam | 8000 | IAM / authentication |
| `/analytics/*` | analytics | 80 | Usage analytics |
| `/agents/*` | agents | 8080 | Agent orchestration |
| `/bot/*` | bot-gateway | 80 | Bot management |
| `/operative/*` | operative | 80 | Computer-use agents |
| `/kms/*` | kms | 80 | Key management |
| `/web3/*` | bootnode-api | 80 | Blockchain API |
| `/infra/*` | visor | 19000 | Infrastructure management |
| `/console/*` | console | 80 | Console backend |
| `/team/*` | team services | various | Team / collaboration |
| `/health` | self | 8080 | Gateway health check |
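Conceptually, the gateway resolves each request by matching the path against this table. A minimal longest-prefix matcher over an illustrative subset of the routes -- the matching semantics here are an assumption; the authoritative routing lives in the gateway's JSON config:

```python
# Illustrative subset of the route map above (prefix -> (service, port)).
ROUTES = {
    "/v1/": ("cloud-api", 8000),
    "/v1/embeddings": ("do-ai", 443),
    "/commerce/": ("commerce", 8001),
    "/auth/": ("iam", 8000),
    "/health": ("self", 8080),
}


def resolve(path):
    """Return the backend for the longest matching route prefix, or None."""
    best = None
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, backend)
    return best[1] if best else None
```

Longest-prefix wins, which is why `/v1/embeddings` can peel off to DO AI while the rest of `/v1/*` falls through to cloud-api.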
Lux Cluster Route Map
| Path | Backend | Description |
|---|---|---|
| `/ext/bc/C/rpc` | luxd:9630 | C-Chain EVM JSON-RPC |
| `/ext/bc/C/ws` | luxd:9630 | C-Chain WebSocket |
| `/ext/bc/X` | luxd:9630 | X-Chain (Exchange) |
| `/ext/bc/P` | luxd:9630 | P-Chain (Platform) |
| `/ext/info` | luxd:9630 | Node info |
| `/ext/health` | luxd:9630 | Node health |
| `/ext/keystore` | luxd:9630 | Keystore API |
| `/ext/index/C/block` | luxd:9630 | C-Chain block index |
| `/ext/index/C/tx` | luxd:9630 | C-Chain tx index |
| `/ext/index/X/tx` | luxd:9630 | X-Chain tx index |
| `/ext/index/P/block` | luxd:9630 | P-Chain block index |
| `/ext/index/P/tx` | luxd:9630 | P-Chain tx index |
| `/ext/metrics` | luxd:9630 | Prometheus metrics |
| `/ext/admin` | luxd:9630 | Admin API |
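The C-Chain endpoint speaks standard EVM JSON-RPC, so any Ethereum tooling can point at it. A sketch of building a request body in Python -- the envelope is stock JSON-RPC 2.0 and `eth_blockNumber` is a standard EVM method:

```python
import json

# Standard JSON-RPC 2.0 envelope for an EVM node.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "eth_blockNumber",
    "params": [],
}
body = json.dumps(payload)
# POST `body` to https://api.lux.network/ext/bc/C/rpc with
# Content-Type: application/json; the "result" field is a hex block number.
```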
API Examples
Chat Completions
```shell
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-haiku-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "max_tokens": 256
  }'
```

Response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "anthropic-claude-haiku-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 85,
    "total_tokens": 97
  }
}
```

Streaming
```shell
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-5-nano",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

Embeddings
```shell
curl https://api.hanzo.ai/v1/embeddings \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Image Generation
```shell
curl https://api.hanzo.ai/v1/images/generations \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset",
    "n": 1,
    "size": "1024x1024"
  }'
```

List Models
```shell
curl https://api.hanzo.ai/v1/models \
  -H "Authorization: Bearer $HANZO_API_KEY"
```

SDK Usage
Python
```python
from hanzoai import Hanzo

client = Hanzo(api_key="your-key")

response = client.chat.completions.create(
    model="zen4-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

TypeScript
```typescript
import Hanzo from '@hanzo/ai'

const client = new Hanzo({ apiKey: 'your-key' })

const response = await client.chat.completions.create({
  model: 'zen4-pro',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)
```

OpenAI SDK (Drop-in)
Any OpenAI-compatible client works by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-hanzo-key",
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="openai-gpt-5-nano",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Provider Routing
Chat and text completions are routed through the Cloud Backend (cloud-api), which selects the appropriate upstream provider based on the model ID prefix:
| Prefix | Provider | Upstream URL |
|---|---|---|
| `zen-` / `zen4-` | Zen models (Hanzo Engine) | engine.hanzo.svc |
| `openai-` | OpenAI (via DO AI) | inference.do-ai.run/v1 |
| `anthropic-` | Anthropic (via DO AI) | inference.do-ai.run/v1 |
| `meta-` | Meta models (via DO AI) | inference.do-ai.run/v1 |
| `mistral-` | Mistral models (via DO AI) | inference.do-ai.run/v1 |
| `google-` | Google models (via DO AI) | inference.do-ai.run/v1 |
Embeddings, images, audio, and async-invoke endpoints bypass Cloud Backend and route directly to DO AI (inference.do-ai.run).
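The prefix dispatch can be sketched as a lookup keyed on the first segment of the model ID. This mirrors the table above, but the selection code itself is an assumption for illustration, not the cloud-api source:

```python
# Upstream URLs are taken from the routing table above.
UPSTREAMS = {
    "zen": "engine.hanzo.svc",
    "openai": "inference.do-ai.run/v1",
    "anthropic": "inference.do-ai.run/v1",
    "meta": "inference.do-ai.run/v1",
    "mistral": "inference.do-ai.run/v1",
    "google": "inference.do-ai.run/v1",
}


def upstream_for(model_id):
    """Pick an upstream from the {provider}-{model-name} prefix."""
    provider = model_id.split("-", 1)[0]
    if provider.startswith("zen"):  # covers both zen- and zen4-
        provider = "zen"
    return UPSTREAMS.get(provider)
```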
Model ID Format
Model IDs follow the pattern {provider}-{model-name}. Examples:
- `zen4-pro`
- `zen4-mini`
- `openai-gpt-5-nano`
- `anthropic-claude-haiku-4.5`
- `meta-llama-3.3-70b-instruct`
- `mistral-small-3.1-24b-instruct`
- `google-gemini-2.0-flash-001`

Rate Limiting
Rate limits are enforced per-endpoint at the gateway layer using the `qos/ratelimit/router` configuration.
Hanzo Cluster (api.hanzo.ai)
| Scope | Limit | Window |
|---|---|---|
| Global (all clients) | 5,000 req | per second |
| Per IP | 100 req | per minute |
Lux Cluster (api.lux.network)
| Scope | Limit | Window |
|---|---|---|
| Global (all clients) | 1,000 req | per second |
| Per IP | 100 req | per second |
Rate-limited requests receive HTTP 429 Too Many Requests.
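Clients should treat 429 as retryable. A sketch of a retry helper with exponential backoff -- whether the gateway sets a `Retry-After` header is an assumption here, and the fallback delays are arbitrary:

```python
import time


def call_with_backoff(send, max_retries=3, base_delay=1.0):
    """Retry a request on HTTP 429, honoring Retry-After when present
    and falling back to exponential backoff otherwise.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        delay = float(headers.get("Retry-After", base_delay * 2 ** attempt))
        time.sleep(delay)
    return status, body
```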
Infrastructure
Build
The gateway is a Go binary with declarative JSON-driven routing:
`github.com/hanzoai/gateway/v2` (Go 1.25)

Docker Images
| Image | Config | Cluster |
|---|---|---|
| `ghcr.io/hanzoai/gateway:hanzo-latest` | `configs/hanzo/gateway.json` | hanzo-k8s |
| `ghcr.io/hanzoai/gateway:lux-latest` | `configs/lux/gateway.json` | lux-k8s |
Kubernetes Deployment
Both clusters run 2 replicas behind a ClusterIP service:
```
Deployment (2 replicas)
  image: ghcr.io/hanzoai/gateway:<cluster>-latest
  port: 8080
  resources:
    requests: 100m CPU, 128Mi RAM
    limits: 1 CPU, 512Mi RAM
  probes:
    liveness: GET /__health (every 15s)
    readiness: GET /__health (every 5s)
  config: ConfigMap → /etc/gateway/gateway.json
```

The Hanzo cluster is exposed via an nginx Ingress with TLS (cert-manager + Let's Encrypt) at api.hanzo.ai. The Lux cluster uses a DigitalOcean LoadBalancer at 134.199.141.71.
DNS
| Domain | Type | Target |
|---|---|---|
| `api.hanzo.ai` | A (CF proxied) | 24.199.76.156 (hanzo-k8s LB) |
| `llm.hanzo.ai` | CNAME | api.hanzo.ai |
| `api.lux.network` | A | 134.199.141.71 (lux-k8s LB) |
Operations
```shell
# Validate configs
make validate

# Deploy to hanzo-k8s
make deploy-hanzo

# Deploy to lux-k8s
make deploy-lux

# Deploy both
make deploy

# Check status
make status

# Tail logs
make logs-hanzo
make logs-lux
```

Config Structure
```
configs/
  hanzo/gateway.json   # Hanzo API Gateway (15+ services)
  lux/gateway.json     # Lux blockchain RPC (14 endpoints)
k8s/
  hanzo/               # deployment, service, ingress
  lux/                 # deployment, service
Dockerfile             # Multi-stage build (Go build + Alpine runtime)
Makefile               # Build, validate, deploy commands
```

To add or modify a route:
- Edit `configs/<cluster>/gateway.json`
- Validate: `make validate`
- Deploy: `make deploy-hanzo` or `make deploy-lux`
The Makefile creates a ConfigMap from the JSON config and triggers a rolling restart.
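For orientation, a route entry in the config might take roughly this shape. This is a hypothetical sketch: the field names below are illustrative assumptions, not the actual schema used in `configs/<cluster>/gateway.json`:

```json
{
  "routes": [
    {
      "prefix": "/commerce/",
      "backend": "http://commerce:8001",
      "rate_limit": { "per_ip": "100/minute" }
    }
  ]
}
```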
Health Checks
```shell
# Gateway internal health
curl https://api.hanzo.ai/__health

# Public health endpoint
curl https://api.hanzo.ai/health

# Lux gateway health
curl https://api.lux.network/ext/health
```

Infrastructure Stack
Hanzo Gateway is one of four products in the Hanzo AI infrastructure stack:
| Product | Role | Repository |
|---|---|---|
| Hanzo Ingress | L7 reverse proxy, TLS termination, load balancing | hanzoai/ingress |
| Hanzo Gateway | API gateway, rate limiting, endpoint routing | hanzoai/gateway |
| Hanzo Engine | GPU inference engine, model serving | hanzoai/engine |
| Hanzo Edge | On-device inference runtime (mobile, web, embedded) | hanzoai/edge |
```
Internet -> Ingress (TLS/L7) -> Gateway (API routing) -> Engine (inference) / Cloud API / Services
Edge (on-device, client-side)
```

Related Services
- Cloud Backend -- LLM provider routing and management
- Hanzo Engine -- high-performance Rust inference engine with GPU acceleration
- Hanzo Edge -- on-device inference runtime for mobile, web, and embedded
- Hanzo IAM -- identity and access management (hanzo.id)