Hanzo Gateway
Unified API gateway and LLM proxy — routes to 100+ AI providers
Hanzo Gateway is the unified, high-performance API gateway for all Hanzo services. It serves as the single entry point for LLM inference, commerce, authentication, analytics, and every other Hanzo backend -- with rate limiting, circuit breakers, header forwarding, and telemetry driven by declarative JSON configuration. Two independent gateway instances run on separate Kubernetes clusters, each with its own routing table and rate-limit policy.
GitHub: github.com/hanzoai/gateway
Docker: ghcr.io/hanzoai/gateway
License: Apache-2.0
Endpoints
| Cluster | Domain | Purpose |
|---|---|---|
| hanzo-k8s | api.hanzo.ai | All Hanzo services (LLM, Commerce, Auth, Analytics, Agents, Bot, etc.) |
| lux-k8s | api.lux.network | Lux blockchain RPC (C-Chain, X-Chain, P-Chain, indexers) |
Both endpoints terminate TLS via Cloudflare (Hanzo) or DigitalOcean Load Balancer (Lux) and proxy to the gateway on port 8080 inside the cluster.
Features
- LLM Proxy: OpenAI-compatible `/v1/*` endpoints routed to 100+ models via Cloud Backend and DO AI inference
- Unified Routing: Single gateway fans out to 15+ backend services across 133+ endpoints (Cloud, Commerce, IAM, Analytics, Agents, Bot, Operative, KMS, Web3, Infra, Console, Team, Billing)
- Rate Limiting: Per-IP and global rate limits on every endpoint, enforced at the gateway layer before requests reach backends
- Circuit Breakers: Automatic backend failure isolation prevents cascading failures across services
- Passthrough Encoding: All traffic uses `no-op` encoding -- the gateway never transforms request or response bodies
- Header Forwarding: All client headers (including `Authorization`) are forwarded to backends
- Health Checks: `/__health` liveness probe, `GET /health` public endpoint
- Streaming: Passthrough encoding preserves SSE streams for chat completions
- Telemetry: Structured logging with `[GATEWAY]` prefix, Prometheus-compatible metrics
- Zero-Downtime Deploys: ConfigMap-based config with rolling restart
- Multi-Cluster: Separate configs, images, and deployments for Hanzo and Lux clusters
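The circuit-breaker behavior can be sketched in a few lines. This is an illustrative model of the pattern only -- the class name, threshold, and cooldown below are assumptions, not the gateway's actual Go implementation:

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; probes again after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a request may be sent to the backend."""
        if self.opened_at is None:
            return True
        # Half-open: let a probe request through once the cooldown elapses.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        """Record the outcome of a backend call."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

While the breaker is open, the gateway can fail fast with a 502/503 instead of queueing requests against a dead backend, which is what prevents the cascade.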
Architecture
```
                 Internet
                    |
        +-----------+-----------+
        |                       |
   Cloudflare           DO Load Balancer
  api.hanzo.ai          api.lux.network
        |                       |
        v                       v
+------------------+  +------------------+
|    hanzo-k8s     |  |     lux-k8s      |
| Gateway (2 pods) |  | Gateway (2 pods) |
|    port 8080     |  |    port 8080     |
+--------+---------+  +--------+---------+
         |                     |
   +-----+-----+         +-----+-----+
   |           |         |           |
 /v1/*   /commerce/*  /ext/bc/C/rpc  /ext/bc/X
   |           |         |           |
   v           v         v           v
cloud-api   commerce  luxd:9630   luxd:9630
 :8000       :8001    (mainnet)   (mainnet)
```

Hanzo Cluster Route Map
| Path Prefix | Backend Service | Port | Description |
|---|---|---|---|
| `/v1/chat/completions` | cloud-api | 8000 | LLM chat completions |
| `/v1/completions` | cloud-api | 8000 | LLM text completions |
| `/v1/embeddings` | DO AI (direct) | 443 | Text embeddings |
| `/v1/models` | DO AI (direct) | 443 | Model listing |
| `/v1/images/generations` | DO AI (direct) | 443 | Image generation |
| `/v1/audio/transcriptions` | DO AI (direct) | 443 | Speech-to-text |
| `/v1/audio/speech` | DO AI (direct) | 443 | Text-to-speech |
| `/v1/async-invoke` | DO AI (direct) | 443 | Async inference jobs |
| `/cloud/*` | cloud-api | 8000 | Cloud management API |
| `/ai/*` | cloud-api | 8000 | AI management API |
| `/commerce/*` | commerce | 8001 | Commerce / payments |
| `/billing/*` | commerce | 8001 | Billing API (v1) |
| `/auth/*` | iam | 8000 | IAM / authentication |
| `/analytics/*` | analytics | 80 | Usage analytics |
| `/agents/*` | agents | 8080 | Agent orchestration |
| `/bot/*` | bot-gateway | 80 | Bot management |
| `/operative/*` | operative | 80 | Computer-use agents |
| `/kms/*` | kms | 80 | Key management |
| `/web3/*` | bootnode-api | 80 | Blockchain API |
| `/infra/*` | visor | 19000 | Infrastructure management |
| `/console/*` | console | 80 | Console backend |
| `/team/*` | team services | various | Team / collaboration |
| `/health` | self | 8080 | Gateway health check |
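Conceptually, the gateway resolves each request by matching the path against this table. A minimal longest-prefix matcher over an illustrative subset of the routes -- the matching semantics here are an assumption; the authoritative routing lives in the gateway's JSON config:

```python
# Illustrative subset of the route map above (prefix -> (service, port)).
ROUTES = {
    "/v1/": ("cloud-api", 8000),
    "/v1/embeddings": ("do-ai", 443),
    "/commerce/": ("commerce", 8001),
    "/auth/": ("iam", 8000),
    "/health": ("self", 8080),
}


def resolve(path):
    """Return the backend for the longest matching route prefix, or None."""
    best = None
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, backend)
    return best[1] if best else None
```

Longest-prefix wins, which is why `/v1/embeddings` can peel off to DO AI while the rest of `/v1/*` falls through to cloud-api.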
Lux Cluster Route Map
| Path | Backend | Description |
|---|---|---|
| `/ext/bc/C/rpc` | luxd:9630 | C-Chain EVM JSON-RPC |
| `/ext/bc/C/ws` | luxd:9630 | C-Chain WebSocket |
| `/ext/bc/X` | luxd:9630 | X-Chain (Exchange) |
| `/ext/bc/P` | luxd:9630 | P-Chain (Platform) |
| `/ext/info` | luxd:9630 | Node info |
| `/ext/health` | luxd:9630 | Node health |
| `/ext/keystore` | luxd:9630 | Keystore API |
| `/ext/index/C/block` | luxd:9630 | C-Chain block index |
| `/ext/index/C/tx` | luxd:9630 | C-Chain tx index |
| `/ext/index/X/tx` | luxd:9630 | X-Chain tx index |
| `/ext/index/P/block` | luxd:9630 | P-Chain block index |
| `/ext/index/P/tx` | luxd:9630 | P-Chain tx index |
| `/ext/metrics` | luxd:9630 | Prometheus metrics |
| `/ext/admin` | luxd:9630 | Admin API |
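The C-Chain endpoint speaks standard EVM JSON-RPC, so any Ethereum tooling can point at it. A sketch of building a request body in Python -- the envelope is stock JSON-RPC 2.0 and `eth_blockNumber` is a standard EVM method:

```python
import json

# Standard JSON-RPC 2.0 envelope for an EVM node.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "eth_blockNumber",
    "params": [],
}
body = json.dumps(payload)
# POST `body` to https://api.lux.network/ext/bc/C/rpc with
# Content-Type: application/json; the "result" field is a hex block number.
```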
API Examples
Chat Completions
```shell
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-haiku-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "max_tokens": 256
  }'
```

Response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "anthropic-claude-haiku-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 85,
    "total_tokens": 97
  }
}
```

Streaming
```shell
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-5-nano",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

Embeddings
```shell
curl https://api.hanzo.ai/v1/embeddings \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Image Generation
```shell
curl https://api.hanzo.ai/v1/images/generations \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset",
    "n": 1,
    "size": "1024x1024"
  }'
```

List Models
```shell
curl https://api.hanzo.ai/v1/models \
  -H "Authorization: Bearer $HANZO_API_KEY"
```

SDK Usage
Python
```python
from hanzoai import Hanzo

client = Hanzo(api_key="your-key")

response = client.chat.completions.create(
    model="zen4-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

TypeScript
```typescript
import Hanzo from '@hanzo/ai'

const client = new Hanzo({ apiKey: 'your-key' })

const response = await client.chat.completions.create({
  model: 'zen4-pro',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)
```

OpenAI SDK (Drop-in)
Any OpenAI-compatible client works by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-hanzo-key",
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="openai-gpt-5-nano",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Provider Routing
Chat and text completions are routed through the Cloud Backend (cloud-api), which selects the appropriate upstream provider based on the model ID prefix:
| Prefix | Provider | Upstream URL |
|---|---|---|
| `zen-` / `zen4-` | Zen models (Hanzo Engine) | engine.hanzo.svc |
| `openai-` | OpenAI (via DO AI) | inference.do-ai.run/v1 |
| `anthropic-` | Anthropic (via DO AI) | inference.do-ai.run/v1 |
| `meta-` | Meta models (via DO AI) | inference.do-ai.run/v1 |
| `mistral-` | Mistral models (via DO AI) | inference.do-ai.run/v1 |
| `google-` | Google models (via DO AI) | inference.do-ai.run/v1 |
Embeddings, images, audio, and async-invoke endpoints bypass Cloud Backend and route directly to DO AI (inference.do-ai.run).
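The prefix dispatch can be sketched as a lookup keyed on the first segment of the model ID. This mirrors the table above, but the selection code itself is an assumption for illustration, not the cloud-api source:

```python
# Upstream URLs are taken from the routing table above.
UPSTREAMS = {
    "zen": "engine.hanzo.svc",
    "openai": "inference.do-ai.run/v1",
    "anthropic": "inference.do-ai.run/v1",
    "meta": "inference.do-ai.run/v1",
    "mistral": "inference.do-ai.run/v1",
    "google": "inference.do-ai.run/v1",
}


def upstream_for(model_id):
    """Pick an upstream from the {provider}-{model-name} prefix."""
    provider = model_id.split("-", 1)[0]
    if provider.startswith("zen"):  # covers both zen- and zen4-
        provider = "zen"
    return UPSTREAMS.get(provider)
```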
Model ID Format
Model IDs follow the pattern {provider}-{model-name}. Examples:
- `zen4-pro`
- `zen4-mini`
- `openai-gpt-5-nano`
- `anthropic-claude-haiku-4.5`
- `meta-llama-3.3-70b-instruct`
- `mistral-small-3.1-24b-instruct`
- `google-gemini-2.0-flash-001`

Rate Limiting
Rate limits are enforced per-endpoint at the gateway layer using the `qos/ratelimit/router` configuration.
Hanzo Cluster (api.hanzo.ai)
| Scope | Limit | Window |
|---|---|---|
| Global (all clients) | 5,000 req | per second |
| Per IP | 100 req | per minute |
Lux Cluster (api.lux.network)
| Scope | Limit | Window |
|---|---|---|
| Global (all clients) | 1,000 req | per second |
| Per IP | 100 req | per second |
Rate-limited requests receive HTTP 429 Too Many Requests.
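Clients should treat 429 as retryable. A sketch of a retry helper with exponential backoff -- whether the gateway sets a `Retry-After` header is an assumption here, and the fallback delays are arbitrary:

```python
import time


def call_with_backoff(send, max_retries=3, base_delay=1.0):
    """Retry a request on HTTP 429, honoring Retry-After when present
    and falling back to exponential backoff otherwise.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        delay = float(headers.get("Retry-After", base_delay * 2 ** attempt))
        time.sleep(delay)
    return status, body
```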
Infrastructure
Build
The gateway is a Go binary with declarative JSON-driven routing:
`github.com/hanzoai/gateway/v2` (Go 1.25)

Docker Images
| Image | Config | Cluster |
|---|---|---|
| `ghcr.io/hanzoai/gateway:hanzo-latest` | `configs/hanzo/gateway.json` | hanzo-k8s |
| `ghcr.io/hanzoai/gateway:lux-latest` | `configs/lux/gateway.json` | lux-k8s |
Kubernetes Deployment
Both clusters run 2 replicas behind a ClusterIP service:
```
Deployment (2 replicas)
  image: ghcr.io/hanzoai/gateway:<cluster>-latest
  port: 8080
  resources:
    requests: 100m CPU, 128Mi RAM
    limits: 1 CPU, 512Mi RAM
  probes:
    liveness: GET /__health (every 15s)
    readiness: GET /__health (every 5s)
  config: ConfigMap → /etc/gateway/gateway.json
```

The Hanzo cluster is exposed via an nginx Ingress with TLS (cert-manager + Let's Encrypt) at api.hanzo.ai. The Lux cluster uses a DigitalOcean LoadBalancer at 134.199.141.71.
DNS
| Domain | Type | Target |
|---|---|---|
| `api.hanzo.ai` | A (CF proxied) | 24.199.76.156 (hanzo-k8s LB) |
| `llm.hanzo.ai` | CNAME | api.hanzo.ai |
| `api.lux.network` | A | 134.199.141.71 (lux-k8s LB) |
Operations
```shell
# Validate configs
make validate

# Deploy to hanzo-k8s
make deploy-hanzo

# Deploy to lux-k8s
make deploy-lux

# Deploy both
make deploy

# Check status
make status

# Tail logs
make logs-hanzo
make logs-lux
```

Config Structure
```
configs/
  hanzo/gateway.json   # Hanzo API Gateway (15+ services)
  lux/gateway.json     # Lux blockchain RPC (14 endpoints)
k8s/
  hanzo/               # deployment, service, ingress
  lux/                 # deployment, service
Dockerfile             # Multi-stage build (Go build + Alpine runtime)
Makefile               # Build, validate, deploy commands
```

To add or modify a route:
- Edit `configs/<cluster>/gateway.json`
- Validate: `make validate`
- Deploy: `make deploy-hanzo` or `make deploy-lux`
The Makefile creates a ConfigMap from the JSON config and triggers a rolling restart.
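For orientation, a route entry in the config might take roughly this shape. This is a hypothetical sketch: the field names below are illustrative assumptions, not the actual schema used in `configs/<cluster>/gateway.json`:

```json
{
  "routes": [
    {
      "prefix": "/commerce/",
      "backend": "http://commerce:8001",
      "rate_limit": { "per_ip": "100/minute" }
    }
  ]
}
```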
Health Checks
```shell
# Gateway internal health
curl https://api.hanzo.ai/__health

# Public health endpoint
curl https://api.hanzo.ai/health

# Lux gateway health
curl https://api.lux.network/ext/health
```

Infrastructure Stack
Hanzo Gateway is one of four products in the Hanzo AI infrastructure stack:
| Product | Role | Repository |
|---|---|---|
| Hanzo Ingress | L7 reverse proxy, TLS termination, load balancing | hanzoai/ingress |
| Hanzo Gateway | API gateway, rate limiting, endpoint routing | hanzoai/gateway |
| Hanzo Engine | GPU inference engine, model serving | hanzoai/engine |
| Hanzo Edge | On-device inference runtime (mobile, web, embedded) | hanzoai/edge |
```
Internet -> Ingress (TLS/L7) -> Gateway (API routing) -> Engine (inference) / Cloud API / Services
Edge (on-device, client-side)
```

Related Services
- Cloud Backend -- LLM provider routing and management
- Hanzo Engine -- high-performance Rust inference engine with GPU acceleration
- Hanzo Edge -- on-device inference runtime for mobile, web, and embedded
- Hanzo IAM -- identity and access management (hanzo.id)