Hanzo Gateway

Unified API gateway and LLM proxy that routes to 100+ AI models

Hanzo Gateway is the unified, high-performance API gateway for all Hanzo services. It is the single entry point for LLM inference, commerce, authentication, analytics, and every other Hanzo backend. Rate limiting, circuit breakers, header forwarding, and telemetry are all driven by declarative JSON configuration. Two independent gateway deployments run on separate Kubernetes clusters (hanzo-k8s and lux-k8s), each with its own routing table and rate-limit policy.

GitHub: github.com/hanzoai/gateway
Docker: ghcr.io/hanzoai/gateway
License: Apache-2.0

Endpoints

| Cluster | Domain | Purpose |
| --- | --- | --- |
| hanzo-k8s | api.hanzo.ai | All Hanzo services (LLM, Commerce, Auth, Analytics, Agents, Bot, etc.) |
| lux-k8s | api.lux.network | Lux blockchain RPC (C-Chain, X-Chain, P-Chain, indexers) |

TLS is terminated in front of the gateway: by Cloudflare for api.hanzo.ai and by the DigitalOcean Load Balancer for api.lux.network. In both cases traffic is then proxied to the gateway on port 8080 inside the cluster.

Features

  • LLM Proxy: OpenAI-compatible /v1/* endpoints routed to 100+ models via Cloud Backend and DO AI inference
  • Unified Routing: Single gateway fans out to 15+ backend services across 133+ endpoints (Cloud, Commerce, IAM, Analytics, Agents, Bot, Operative, KMS, Web3, Infra, Console, Team, Billing)
  • Rate Limiting: Per-IP and global rate limits on every endpoint, enforced at the gateway layer before requests reach backends
  • Circuit Breakers: Automatic backend failure isolation prevents cascading failures across services
  • Passthrough Encoding: All traffic uses no-op encoding -- the gateway never transforms request or response bodies
  • Header Forwarding: All client headers (including Authorization) are forwarded to backends
  • Health Checks: /__health liveness probe, GET /health public endpoint
  • Streaming: Passthrough encoding preserves SSE streams for chat completions
  • Telemetry: Structured logging with [GATEWAY] prefix, Prometheus-compatible metrics
  • Zero-Downtime Deploys: ConfigMap-based config with rolling restart
  • Multi-Cluster: Separate configs, images, and deployments for Hanzo and Lux clusters
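
To make the circuit-breaker bullet concrete, here is a minimal Python sketch of the general pattern: after a run of consecutive failures the breaker "opens" and rejects calls to that backend until a cooldown has passed. The class name, threshold, and cooldown are hypothetical; the gateway's actual breaker settings and implementation are not described on this page.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # hypothetical cooldown, in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to the backend."""
        if self.opened_at is None:
            return True
        # Half-open: allow a trial request once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        """Count a failure; open (or re-open) the circuit at the threshold."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A proxy using this pattern would call `allow()` before forwarding a request and fail fast when it returns False, so one unhealthy backend does not tie up capacity needed by the others.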

Architecture

                    Internet
                       |
          +-----------+-----------+
          |                       |
   Cloudflare              DO Load Balancer
   api.hanzo.ai            api.lux.network
          |                       |
          v                       v
 +------------------+   +------------------+
 | hanzo-k8s        |   | lux-k8s          |
 | Gateway (2 pods) |   | Gateway (2 pods) |
 | port 8080        |   | port 8080        |
 +--------+---------+   +--------+---------+
          |                       |
    +-----+-----+          +-----+-----+
    |           |          |           |
 /v1/*      /commerce/*  /ext/bc/C/rpc  /ext/bc/X
    |           |          |              |
    v           v          v              v
 cloud-api  commerce    luxd:9630      luxd:9630
  :8000      :8001     (mainnet)      (mainnet)

Hanzo Cluster Route Map

| Path Prefix | Backend Service | Port | Description |
| --- | --- | --- | --- |
| /v1/chat/completions | cloud-api | 8000 | LLM chat completions |
| /v1/completions | cloud-api | 8000 | LLM text completions |
| /v1/embeddings | DO AI (direct) | 443 | Text embeddings |
| /v1/models | DO AI (direct) | 443 | Model listing |
| /v1/images/generations | DO AI (direct) | 443 | Image generation |
| /v1/audio/transcriptions | DO AI (direct) | 443 | Speech-to-text |
| /v1/audio/speech | DO AI (direct) | 443 | Text-to-speech |
| /v1/async-invoke | DO AI (direct) | 443 | Async inference jobs |
| /cloud/* | cloud-api | 8000 | Cloud management API |
| /ai/* | cloud-api | 8000 | AI management API |
| /commerce/* | commerce | 8001 | Commerce / payments |
| /billing/* | commerce | 8001 | Billing API (v1) |
| /auth/* | iam | 8000 | IAM / authentication |
| /analytics/* | analytics | 80 | Usage analytics |
| /agents/* | agents | 8080 | Agent orchestration |
| /bot/* | bot-gateway | 80 | Bot management |
| /operative/* | operative | 80 | Computer-use agents |
| /kms/* | kms | 80 | Key management |
| /web3/* | bootnode-api | 80 | Blockchain API |
| /infra/* | visor | 19000 | Infrastructure management |
| /console/* | console | 80 | Console backend |
| /team/* | team services | various | Team / collaboration |
| /health | self | 8080 | Gateway health check |
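
The route map can be read as a lookup from request path to backend. The Python sketch below shows one way such a lookup could work, using a subset of the rows above. It assumes that `/*` entries match any path under the prefix, that other entries match exactly, and that the longest match wins; the gateway's real matching rules are defined by its JSON config and may differ. The request path `/commerce/orders` is a made-up example.

```python
# Subset of the route map. Keys ending in "/" stand for "<prefix>/*" rows;
# all other keys are exact paths.
ROUTES = {
    "/v1/chat/completions": ("cloud-api", 8000),
    "/v1/embeddings": ("DO AI (direct)", 443),
    "/cloud/": ("cloud-api", 8000),
    "/commerce/": ("commerce", 8001),
    "/billing/": ("commerce", 8001),
    "/auth/": ("iam", 8000),
    "/infra/": ("visor", 19000),
    "/health": ("self", 8080),
}


def resolve(path):
    """Return (service, port) for the longest matching route, or None."""
    matches = [
        route for route in ROUTES
        if (path.startswith(route) if route.endswith("/") else path == route)
    ]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]


print(resolve("/v1/chat/completions"))  # exact route
print(resolve("/commerce/orders"))      # hypothetical path under /commerce/*
print(resolve("/unknown"))              # no route configured
```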

Lux Cluster Route Map

| Path | Backend | Description |
| --- | --- | --- |
| /ext/bc/C/rpc | luxd:9630 | C-Chain EVM JSON-RPC |
| /ext/bc/C/ws | luxd:9630 | C-Chain WebSocket |
| /ext/bc/X | luxd:9630 | X-Chain (Exchange) |
| /ext/bc/P | luxd:9630 | P-Chain (Platform) |
| /ext/info | luxd:9630 | Node info |
| /ext/health | luxd:9630 | Node health |
| /ext/keystore | luxd:9630 | Keystore API |
| /ext/index/C/block | luxd:9630 | C-Chain block index |
| /ext/index/C/tx | luxd:9630 | C-Chain tx index |
| /ext/index/X/tx | luxd:9630 | X-Chain tx index |
| /ext/index/P/block | luxd:9630 | P-Chain block index |
| /ext/index/P/tx | luxd:9630 | P-Chain tx index |
| /ext/metrics | luxd:9630 | Prometheus metrics |
| /ext/admin | luxd:9630 | Admin API |
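
Because /ext/bc/C/rpc is described as EVM JSON-RPC, a client call can be sketched with the Python standard library alone. The example below builds a JSON-RPC 2.0 request for the standard Ethereum method `eth_blockNumber` and decodes the hex result. The helper names are hypothetical, and whether this route needs an API key is not stated here, so no Authorization header is set.

```python
import json
import urllib.request

LUX_C_CHAIN_RPC = "https://api.lux.network/ext/bc/C/rpc"


def build_rpc_request(method, params=None, request_id=1, url=LUX_C_CHAIN_RPC):
    """Build (but do not send) a JSON-RPC 2.0 POST request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_block_number(response_body):
    """Convert an eth_blockNumber result (a hex string) to an int."""
    return int(json.loads(response_body)["result"], 16)


# To send the request:
#   with urllib.request.urlopen(build_rpc_request("eth_blockNumber")) as resp:
#       print(parse_block_number(resp.read()))
```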

API Examples

Chat Completions

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-haiku-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "max_tokens": 256
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "anthropic-claude-haiku-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 85,
    "total_tokens": 97
  }
}

Streaming

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-5-nano",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
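
With "stream": true the response arrives as server-sent events. The Python sketch below assembles the streamed text, assuming the gateway passes through OpenAI-style chunks unchanged: `data: {...}` lines that carry `choices[].delta.content`, ending with `data: [DONE]`. The function name is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

The function accepts any iterable of decoded lines, so it can be fed from a streaming HTTP response or from a captured transcript.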

Embeddings

curl https://api.hanzo.ai/v1/embeddings \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Image Generation

curl https://api.hanzo.ai/v1/images/generations \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A futuristic city skyline at sunset",
    "n": 1,
    "size": "1024x1024"
  }'

List Models

curl https://api.hanzo.ai/v1/models \
  -H "Authorization: Bearer $HANZO_API_KEY"

SDK Usage

Python

from hanzoai import Hanzo

client = Hanzo(api_key="your-key")

response = client.chat.completions.create(
    model="zen4-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

TypeScript

import Hanzo from '@hanzo/ai'

const client = new Hanzo({ apiKey: 'your-key' })

const response = await client.chat.completions.create({
  model: 'zen4-pro',
  messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)

OpenAI SDK (Drop-in)

Any OpenAI-compatible client works by changing the base URL:

from openai import OpenAI

client = OpenAI(
    api_key="your-hanzo-key",
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="openai-gpt-5-nano",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Provider Routing

Chat and text completions are routed through the Cloud Backend (cloud-api), which selects the appropriate upstream provider based on the model ID prefix:

| Prefix | Provider | Upstream URL |
| --- | --- | --- |
| zen- / zen4- | Zen models (Hanzo Engine) | engine.hanzo.svc |
| openai- | OpenAI (via DO AI) | inference.do-ai.run/v1 |
| anthropic- | Anthropic (via DO AI) | inference.do-ai.run/v1 |
| meta- | Meta models (via DO AI) | inference.do-ai.run/v1 |
| mistral- | Mistral models (via DO AI) | inference.do-ai.run/v1 |
| google- | Google models (via DO AI) | inference.do-ai.run/v1 |

Embeddings, images, audio, and async-invoke endpoints bypass Cloud Backend and route directly to DO AI (inference.do-ai.run).
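
The prefix table amounts to a first-match lookup on the model ID. The following Python sketch mirrors that table; it illustrates the selection rule and is not cloud-api's actual code. The function name and the choice to return None for unknown prefixes are assumptions.

```python
DO_AI = "inference.do-ai.run/v1"

# (prefix, provider, upstream) rows taken from the provider routing table.
PROVIDER_PREFIXES = [
    ("zen4-", "Zen models (Hanzo Engine)", "engine.hanzo.svc"),
    ("zen-", "Zen models (Hanzo Engine)", "engine.hanzo.svc"),
    ("openai-", "OpenAI (via DO AI)", DO_AI),
    ("anthropic-", "Anthropic (via DO AI)", DO_AI),
    ("meta-", "Meta models (via DO AI)", DO_AI),
    ("mistral-", "Mistral models (via DO AI)", DO_AI),
    ("google-", "Google models (via DO AI)", DO_AI),
]


def route_model(model_id):
    """Return (provider, upstream) for a model ID, or None if no prefix matches."""
    for prefix, provider, upstream in PROVIDER_PREFIXES:
        if model_id.startswith(prefix):
            return provider, upstream
    return None


print(route_model("zen4-pro"))
print(route_model("anthropic-claude-haiku-4.5"))
```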

Model ID Format

Model IDs follow the pattern {provider}-{model-name}. Examples:

zen4-pro
zen4-mini
openai-gpt-5-nano
anthropic-claude-haiku-4.5
meta-llama-3.3-70b-instruct
mistral-small-3.1-24b-instruct
google-gemini-2.0-flash-001

Rate Limiting

Rate limits are enforced per-endpoint at the gateway layer using the qos/ratelimit/router configuration.

Hanzo Cluster (api.hanzo.ai)

| Scope | Limit | Window |
| --- | --- | --- |
| Global (all clients) | 5,000 req/s | per second |
| Per IP | 100 req | per minute |

Lux Cluster (api.lux.network)

| Scope | Limit | Window |
| --- | --- | --- |
| Global (all clients) | 1,000 req/s | per second |
| Per IP | 100 req/s | per second |

Rate-limited requests receive HTTP 429 Too Many Requests.
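
To illustrate how a per-IP limit leads to a 429, here is a minimal fixed-window counter in Python, configured with the Hanzo cluster's per-IP figures (100 requests per minute). The algorithm is an assumption made for illustration: this page does not say whether the gateway uses fixed windows, token buckets, or something else. The class and function names are hypothetical, and the IP addresses come from documentation-reserved ranges.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # key -> (window index, request count)

    def allow(self, key):
        current = int(self.clock() // self.window)
        index, count = self.counters.get(key, (current, 0))
        if index != current:
            index, count = current, 0  # a new window started: reset the count
        if count >= self.limit:
            self.counters[key] = (index, count)
            return False
        self.counters[key] = (index, count + 1)
        return True


def status_for(limiter, client_ip):
    """Return the HTTP status the client would see: 200 if allowed, 429 if limited."""
    return 200 if limiter.allow(client_ip) else 429


per_ip = FixedWindowLimiter(limit=100, window_seconds=60)
print(status_for(per_ip, "203.0.113.7"))
```

Clients that receive a 429 should back off and retry after the window has passed rather than retrying immediately.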

Infrastructure

Build

The gateway is a Go binary with declarative JSON-driven routing:

github.com/hanzoai/gateway/v2  (Go 1.25)

Docker Images

| Image | Config | Cluster |
| --- | --- | --- |
| ghcr.io/hanzoai/gateway:hanzo-latest | configs/hanzo/gateway.json | hanzo-k8s |
| ghcr.io/hanzoai/gateway:lux-latest | configs/lux/gateway.json | lux-k8s |

Kubernetes Deployment

Both clusters run 2 replicas behind a ClusterIP service:

Deployment (2 replicas)
  image: ghcr.io/hanzoai/gateway:<cluster>-latest
  port: 8080
  resources:
    requests: 100m CPU, 128Mi RAM
    limits: 1 CPU, 512Mi RAM
  probes:
    liveness:  GET /__health (every 15s)
    readiness: GET /__health (every 5s)
  config: ConfigMap → /etc/gateway/gateway.json

The Hanzo cluster is exposed via an nginx Ingress with TLS (cert-manager + Let's Encrypt) at api.hanzo.ai. The Lux cluster uses a DigitalOcean LoadBalancer at 134.199.141.71.

DNS

| Domain | Type | Target |
| --- | --- | --- |
| api.hanzo.ai | A (CF proxied) | 24.199.76.156 (hanzo-k8s LB) |
| llm.hanzo.ai | CNAME | api.hanzo.ai |
| api.lux.network | A | 134.199.141.71 (lux-k8s LB) |

Operations

# Validate configs
make validate

# Deploy to hanzo-k8s
make deploy-hanzo

# Deploy to lux-k8s
make deploy-lux

# Deploy both
make deploy

# Check status
make status

# Tail logs
make logs-hanzo
make logs-lux

Config Structure

configs/
  hanzo/gateway.json    # Hanzo API Gateway (15+ services)
  lux/gateway.json      # Lux blockchain RPC (14 endpoints)
k8s/
  hanzo/                # deployment, service, ingress
  lux/                  # deployment, service
Dockerfile              # Multi-stage build (Go build + Alpine runtime)
Makefile                # Build, validate, deploy commands

To add or modify a route:

  1. Edit configs/<cluster>/gateway.json
  2. Validate: make validate
  3. Deploy: make deploy-hanzo or make deploy-lux

The Makefile creates a ConfigMap from the JSON config and triggers a rolling restart.

Health Checks

# Gateway internal health
curl https://api.hanzo.ai/__health

# Public health endpoint
curl https://api.hanzo.ai/health

# Lux gateway health
curl https://api.lux.network/ext/health
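
For scripted monitoring, the same three checks can be run from Python. This sketch assumes a healthy endpoint answers with HTTP 200, which is not stated explicitly above. The function names are hypothetical, and the fetch function is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

HEALTH_URLS = [
    "https://api.hanzo.ai/__health",
    "https://api.hanzo.ai/health",
    "https://api.lux.network/ext/health",
]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None      # DNS failure, refused connection, timeout, etc.


def check_all(urls=HEALTH_URLS, fetch=http_status):
    """Map each URL to True when it answers 200, else False."""
    return {url: fetch(url) == 200 for url in urls}


# To run against the live endpoints:
#   for url, healthy in check_all().items():
#       print(url, "ok" if healthy else "FAILED")
```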

Infrastructure Stack

Hanzo Gateway is one of four products in the Hanzo AI infrastructure stack:

| Product | Role | Repository |
| --- | --- | --- |
| Hanzo Ingress | L7 reverse proxy, TLS termination, load balancing | hanzoai/ingress |
| Hanzo Gateway | API gateway, rate limiting, endpoint routing | hanzoai/gateway |
| Hanzo Engine | GPU inference engine, model serving | hanzoai/engine |
| Hanzo Edge | On-device inference runtime (mobile, web, embedded) | hanzoai/edge |

Internet -> Ingress (TLS/L7) -> Gateway (API routing) -> Engine (inference) / Cloud API / Services
                                                          Edge (on-device, client-side)

