Docker Swarm
How to self-host Hanzo KMS with Docker Swarm (HA).
Self-Hosting Hanzo KMS with Docker Swarm
This guide provides step-by-step instructions for self-hosting Hanzo KMS using Docker Swarm. It is particularly helpful for teams that want to self-host Hanzo KMS on premises while maintaining high availability (HA) for the core Hanzo KMS components. The guide demonstrates a setup with three nodes, ensuring that the cluster can tolerate the failure of one node while remaining fully operational.
Docker Swarm
Docker Swarm is a native clustering and orchestration solution for Docker containers. It simplifies the deployment and management of containerized applications across multiple nodes, making it a great choice for self-hosting Hanzo KMS.
Unlike Kubernetes, which requires a deep understanding of its ecosystem, Docker Swarm builds on concepts you already know from Docker and Docker Compose. For this reason, we suggest teams use Docker Swarm to deploy Hanzo KMS in a highly available and fault-tolerant manner.
Prerequisites
- Understanding of Docker Swarm
- Bare-metal or virtual machines (VMs) with Docker installed on each.
- Docker Swarm initialized on the VMs.
Core Components for High Availability
The provided Docker stack includes the following core components to achieve high availability:
- Redis: Redis is used for caching and is set up with Redis Sentinel for HA. The stack includes three Redis replicas and three Redis Sentinel instances for monitoring and failover.
- PostgreSQL: PostgreSQL is deployed with Patroni (via the Spilo image) and etcd, providing automatic leader election and failover for the database layer.
- Hanzo KMS: Hanzo KMS is stateless, allowing for easy scaling and replication across multiple nodes.
- HAProxy: HAProxy is used as a load balancer to distribute traffic to the PostgreSQL and Redis instances. It is configured to perform health checks and route requests to the appropriate backend services.
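To make the health-check behavior concrete, here is a rough sketch of the HAProxy pattern for routing writes to the current PostgreSQL primary. This is illustrative only, not the `haproxy.cfg` shipped with this guide: the bind port and server names mirror this guide's conventions, and port 8008 is Patroni's default REST API port, which returns HTTP 200 on /primary only from the current leader.

```
# Illustrative sketch -- see the haproxy.cfg provided with this guide for the real rules.
listen postgres_master
    bind *:5433
    option httpchk GET /primary          # Patroni REST API: 200 only on the leader
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server spolo1 spolo1:5432 check port 8008
    server spolo2 spolo2:5432 check port 8008
    server spolo3 spolo3:5432 check port 8008
```

Redis routing follows the same idea: health checks identify the current master among the three replicas, and traffic on port 6379 is sent only to it.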
Node Failure Tolerance
To ensure Hanzo KMS is highly available and fault tolerant, it's important to choose an appropriate number of nodes for the cluster. The following table shows the relationship between the total number of nodes and the maximum number of nodes that can be down while the cluster continues to function:
| Total Nodes | Max Nodes Down | Min Nodes Required |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 0 | 2 |
| 3 | 1 | 2 |
| 4 | 1 | 3 |
| 5 | 2 | 3 |
| 6 | 2 | 4 |
| 7 | 3 | 4 |
The minimum number of nodes required for the cluster to keep functioning is `floor(n/2) + 1`, where n is the total number of nodes. For example, with n = 3, floor(3/2) + 1 = 2 nodes must stay up, which matches the table above.
This guide will demonstrate a setup with three nodes, which allows for one node to be down while the cluster remains operational. This fault tolerance applies to the following components:
- Redis Sentinel: With three Sentinel instances, one instance can be down, and the remaining two can still form a quorum to make decisions.
- Redis: With three Redis instances (one master and two replicas), one instance can be down, and the remaining two can continue to provide caching services.
- PostgreSQL: With three PostgreSQL instances managed by Patroni and etcd, one instance can be down, and the remaining two can maintain data consistency and availability.
- Manager Nodes: In a Docker Swarm cluster with three manager nodes, one manager node can be down, and the remaining two can continue to manage the cluster. For the sake of simplicity, the example in this guide only contains one manager node.
It's important to note that while the cluster can tolerate the failure of one node in a three-node setup, it's recommended to have a minimum of three nodes to ensure high availability. With two nodes, the failure of a single node can result in a loss of quorum and potential downtime.
Docker Deployment Stack Overview
The Docker stack file used in this guide defines the services and their configurations for deploying Hanzo KMS in a highly available manner. The main components of this stack are as follows.
- HAProxy: The HAProxy service is configured to expose ports for accessing PostgreSQL (5433 for the master, 5434 for replicas), the Redis master (6379), and the Hanzo KMS backend (8080). It uses a configuration file (`haproxy.cfg`) to define the load balancing and health check rules.
- Hanzo KMS: The Hanzo KMS backend service is deployed with the latest PostgreSQL-compatible image. It is connected to the `kms` network and uses secrets for environment variables.
- etcd: Three etcd instances (etcd1, etcd2, etcd3) are deployed, one on each node, to provide distributed key-value storage for leader election and configuration management.
- Spilo: Three Spilo instances (spolo1, spolo2, spolo3) are deployed, one on each node, to run PostgreSQL with Patroni for high availability. They are connected to the `kms` network and use persistent volumes for data storage.
- Redis: Three Redis instances (redis_replica0, redis_replica1, redis_replica2) are deployed, one on each node, with redis_replica0 acting as the master. They are connected to the `kms` network.
- Redis Sentinel: Three Redis Sentinel instances (redis_sentinel1, redis_sentinel2, redis_sentinel3) are deployed, one on each node, to monitor and manage the Redis instances. They are connected to the `kms` network.
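As an illustration of how node pinning works in the stack file, a stateful service definition typically looks like the excerpt below. This is a trimmed sketch, not the full kms-stack.yaml; the volume path is an assumption based on Spilo's defaults.

```yaml
# Illustrative excerpt -- the full kms-stack.yaml is the source of truth.
services:
  spolo1:
    image: ghcr.io/zalando/spilo-16:3.2-p2
    networks:
      - kms
    volumes:
      - spolo1_data:/home/postgres/pgdata   # persistent data; path assumes Spilo defaults
    deploy:
      placement:
        constraints:
          - node.labels.name == node1       # pin this PostgreSQL instance to node1
```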
Deployment instructions
Run the following on each node to install the Docker engine:

```
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh
```

Next, initialize Docker Swarm on the VM that will serve as the manager node:

```
docker swarm init --advertise-addr <MANAGER_NODE_IP>
```

Replace <MANAGER_NODE_IP> with the IP address of the manager node's VM. Remember to copy the join token returned by this init command.
For the sake of simplicity, we only use one manager node in this example deployment. However, in production settings, we recommend having at least three manager nodes.
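If you later decide to move to the recommended three managers, you can promote existing worker nodes from the manager node; this is a standard Docker Swarm operation, shown here for reference:

```
# Promote two workers so the swarm has three managers and can tolerate one manager failure.
docker node promote <NODE2_ID> <NODE3_ID>
```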
Run the following on each of the remaining nodes to join them to the swarm:

```
docker swarm join --token <JOIN_TOKEN> <MANAGER_NODE_IP>:2377
```

Replace <JOIN_TOKEN> with the token provided by the manager node during initialization.
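If you no longer have the token, you can reprint the full join command at any time from the manager node:

```
docker swarm join-token worker    # join command for worker nodes
docker swarm join-token manager   # join command for additional manager nodes
```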
Labels on nodes will help us control where stateful components such as PostgreSQL and Redis are deployed. To label nodes, follow the steps below.
```
docker node update --label-add name=node1 <NODE1_ID>
docker node update --label-add name=node2 <NODE2_ID>
docker node update --label-add name=node3 <NODE3_ID>
```

Replace <NODE1_ID>, <NODE2_ID>, and <NODE3_ID> with the respective node IDs.
To view the list of nodes and their IDs, run `docker node ls` on the manager node.
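To verify that a label was applied correctly, you can inspect the node from the manager:

```
docker node inspect --format '{{ .Spec.Labels }}' <NODE1_ID>
# Expected output: map[name:node1]
```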
Copy the Docker stack YAML file, the HAProxy configuration file, and the example .env file to the manager node. Ensure that all three files are placed in the same directory.
- Docker stack file (rename to kms-stack.yaml)
- HAProxy configuration file (rename to haproxy.cfg)
- Example .env file (rename to .env)
Deploy the stack from the directory containing the three files:

```
docker stack deploy -c kms-stack.yaml kms
```

Once deployed, confirm that all services are up by running `docker service ls` on the manager node. The output should look similar to the following:

```
$ docker service ls
ID             NAME                  MODE         REPLICAS   IMAGE                             PORTS
4kzq3ub8qgn9   kms_etcd1             replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
tqx9t82bn8d9   kms_etcd2             replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
t8vbkrasy8fz   kms_etcd3             replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
77iei42fcf6q   kms_haproxy           global       4/4        haproxy:latest                    *:5002-5003->5433-5434/tcp, *:6379->6379/tcp, *:7001->7000/tcp, *:8080->8080/tcp
jaewzqy8md56   kms_kms               replicated   5/5        kms/kms:v0.60.1-postgres
58w4zablfbtb   kms_redis_replica0    replicated   1/1        bitnami/redis:6.2.10
w4yag2whq0un   kms_redis_replica1    replicated   1/1        bitnami/redis:6.2.10
w03mriy0jave   kms_redis_replica2    replicated   1/1        bitnami/redis:6.2.10
ppo6rk47hc9t   kms_redis_sentinel1   replicated   1/1        bitnami/redis-sentinel:6.2.10
ub29vd0lnq7f   kms_redis_sentinel2   replicated   1/1        bitnami/redis-sentinel:6.2.10
szg3yky7yji2   kms_redis_sentinel3   replicated   1/1        bitnami/redis-sentinel:6.2.10
eqtocpf5tiy0   kms_spolo1            replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
3lznscvk7k5t   kms_spolo2            replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
v04ml7rz2j5q   kms_spolo3            replicated   1/1        ghcr.io/zalando/spilo-16:3.2-p2
```
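If any service reports fewer replicas than expected, the following standard commands are useful for troubleshooting (service names match the output above):

```
# Show where tasks were scheduled and the error for any that failed.
docker service ps --no-trunc kms_spolo1

# Tail the logs of the Hanzo KMS backend service.
docker service logs --tail 100 kms_kms
```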
To view the health of the services in your Hanzo KMS cluster, visit <NODE-IP>:7001 on any node in your Docker swarm.
This port exposes the HAProxy stats page.
Run the following command on the manager to view the IPs of the nodes in your Docker swarm:

```
$ docker node ls
ID                            HOSTNAME    STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
0jnegl4gpo235l66nglcwc07t     localhost   Ready    Active                          26.0.2
no1a7zwj88057k73m196ulkq6 *   localhost   Ready    Active         Leader           26.0.2
wcb2x27w3tq7ht4v1h7ke49qk     localhost   Ready    Active                          26.0.2
zov5q7uop7wpxc2ndz712v9oa     localhost   Ready    Active                          26.0.2
```

The stats page may take 1-2 minutes to become accessible.
Once all expected services are up and running, visit <NODE-IP>:8080 of any node in the swarm. This will take you to the Hanzo KMS configuration page.
FAQ
To further scale and make the system more resilient, you can add more nodes to the Docker Swarm and update the stack configuration accordingly:

- Add new VMs and join them to the Docker Swarm as worker nodes.
- Update the Docker stack YAML file to include the new nodes in the `deploy` section of the relevant services, specifying the appropriate `node.labels.name` constraints.
- Update the HAProxy configuration file (`haproxy.cfg`) to include the new nodes in the backend sections for PostgreSQL and Redis.
- Redeploy the updated stack using the `docker stack deploy` command (a concrete sketch follows below).
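As a concrete sketch, adding a fourth node for an additional PostgreSQL instance might look like the following. The label node4 and the corresponding spolo4 service are hypothetical additions that mirror the existing naming scheme:

```
# On the new VM: join the swarm as a worker (see the deployment instructions above).
docker swarm join --token <JOIN_TOKEN> <MANAGER_NODE_IP>:2377

# On the manager: label the new node so constrained services can target it.
docker node update --label-add name=node4 <NODE4_ID>

# After adding a spolo4 service (constrained to node.labels.name == node4) to
# kms-stack.yaml and a matching server entry to haproxy.cfg, redeploy the stack.
docker stack deploy -c kms-stack.yaml kms
```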
Note that the database containers (PostgreSQL) are stateful and cannot be simply replicated. Instead, one database instance is deployed per node to ensure data consistency and avoid conflicts.
Native tooling for scheduled backups of PostgreSQL and Redis is currently in development. In the meantime, we recommend using one of the many open-source tools available for this purpose. For PostgreSQL, Spilo provides built-in support for scheduled data dumps. You can also explore third-party tools for managing database backups, such as docker-db-backup.
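Until native tooling ships, a minimal ad-hoc logical backup can be taken directly from a node hosting a Spilo container. This is a hedged sketch: the container name filter and the in-container postgres user are assumptions about this stack, and authentication inside the container may need adjusting.

```
# Find the Spilo container running on this node and take a full logical dump.
CONTAINER_ID=$(docker ps -q -f name=kms_spolo)
docker exec -t "$CONTAINER_ID" su postgres -c "pg_dumpall --clean" > kms_backup_$(date +%F).sql
```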