Monitoring and Telemetry Setup
Learn how to set up monitoring and telemetry for your self-hosted Hanzo KMS instance using Grafana, Prometheus, and OpenTelemetry.
Hanzo KMS provides comprehensive monitoring and telemetry capabilities to help you monitor the health, performance, and usage of your self-hosted instance. This guide covers setting up monitoring using Grafana with two different telemetry collection approaches.
Overview
Hanzo KMS exports metrics in OpenTelemetry (OTEL) format, which provides maximum flexibility for your monitoring infrastructure. While this guide focuses on Grafana, the OTEL format means you can easily integrate with:
- Cloud-native monitoring: AWS CloudWatch, Google Cloud Monitoring, Azure Monitor
- Observability platforms: Datadog, New Relic, Splunk, Dynatrace
- Custom backends: Any system that supports OTEL ingestion
- Traditional monitoring: Prometheus, Grafana (as covered in this guide)
Hanzo KMS supports two telemetry collection methods:
- Pull-based (Prometheus): Exposes metrics on a dedicated endpoint for Prometheus to scrape
- Push-based (OTLP): Sends metrics to an OpenTelemetry Collector via OTLP protocol
Both approaches provide the same metrics data in OTEL format, so you can choose the one that best fits your infrastructure and monitoring strategy.
Prerequisites
- Self-hosted Hanzo KMS instance running
- Access to deploy monitoring services (Prometheus, Grafana, etc.)
- Basic understanding of Prometheus and Grafana
Setup
Environment Variables
Configure the following environment variables in your Hanzo KMS backend:
```bash
# Enable telemetry collection
OTEL_TELEMETRY_COLLECTION_ENABLED=true

# Choose export type: "prometheus" or "otlp"
OTEL_EXPORT_TYPE=prometheus
```

Pull-based collection (Prometheus)

This approach exposes metrics on port 9464 at the `/metrics` endpoint, allowing Prometheus to scrape the data. The metrics are exposed in Prometheus format but originate from OpenTelemetry instrumentation.
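The exposition format served at `/metrics` is plain text, so it is easy to inspect or post-process. As a rough sketch (the payload below is a hand-written sample reusing one of the metric names from this guide, not real output, and the label parsing is deliberately naive), it can be parsed with nothing but the standard library:

```python
# Minimal parser for the Prometheus text exposition format.
# The sample payload is illustrative; real output from :9464/metrics
# contains many more series plus HELP/TYPE comments.
sample = """\
# HELP kms_http_server_request_count Total number of API requests
# TYPE kms_http_server_request_count counter
kms_http_server_request_count{http_request_method="GET",http_response_status_code="200"} 42
kms_http_server_request_count{http_request_method="POST",http_response_status_code="201"} 7
"""

def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]}, skipping comments.

    Naive: assumes label values contain no commas or escaped quotes.
    """
    series = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name_labels, value = line.rsplit(" ", 1)
        if "{" in name_labels:
            name, raw = name_labels.split("{", 1)
            labels = dict(pair.split("=", 1) for pair in raw.rstrip("}").split(","))
            labels = {k: v.strip('"') for k, v in labels.items()}
        else:
            name, labels = name_labels, {}
        series.setdefault(name, []).append((labels, float(value)))
    return series

parsed = parse_metrics(sample)
total = sum(v for _, v in parsed["kms_http_server_request_count"])
print(total)  # 49.0
```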
Configuration
```bash
OTEL_TELEMETRY_COLLECTION_ENABLED=true
OTEL_EXPORT_TYPE=prometheus
```

Expose the metrics port in your Hanzo KMS backend:
- Docker: Expose port 9464
- Kubernetes: Create a service exposing port 9464
- Other: Ensure port 9464 is accessible to your monitoring stack
Create `prometheus.yml`:

```yaml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: "kms"
    scrape_interval: 30s
    static_configs:
      - targets: ["kms-backend:9464"] # Adjust hostname/port based on your deployment
    metrics_path: "/metrics"
```

Replace `kms-backend:9464` with the actual hostname and port where your Hanzo KMS backend is running. This could be:
- Docker Compose: `kms-backend:9464` (service name)
- Kubernetes: `kms-backend.default.svc.cluster.local:9464` (service name)
- Bare Metal: `192.168.1.100:9464` (actual IP address)
- Cloud: `your-kms.example.com:9464` (domain name)
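Whichever target you use, Prometheus must be able to open a TCP connection to it. A small reachability check, sketched with the standard library (the demo below probes a loopback listener it opens itself; against a real deployment you would pass your actual target, e.g. the hypothetical `kms-backend:9464`):

```python
import socket

def target_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: listen on an ephemeral local port, then probe it.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(target_reachable("127.0.0.1", port))  # True
server.close()
```

Note this only proves the port is open; it does not verify that `/metrics` actually serves data.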
Deployment Options
Once you've configured Hanzo KMS to expose metrics, you'll need to deploy Prometheus to scrape and store them. Below are examples for different deployment environments. Choose the option that matches your infrastructure.
Docker Compose (`docker-compose.yml`):

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
```

Kubernetes:

```yaml
# prometheus-deployment.yaml
# Create the ConfigMap referenced below first, e.g.:
#   kubectl create configmap prometheus-config --from-file=prometheus.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
---
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: ClusterIP
```

Helm:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
  --set server.config.global.scrape_interval=30s \
  --set server.config.scrape_configs[0].job_name=kms \
  --set server.config.scrape_configs[0].static_configs[0].targets[0]=kms-backend:9464
```

Push-based collection (OTLP)

This approach sends metrics directly to an OpenTelemetry Collector via the OTLP protocol. This gives you the most flexibility, as you can configure the collector to export to multiple backends simultaneously.
Configuration
```bash
OTEL_TELEMETRY_COLLECTION_ENABLED=true
OTEL_EXPORT_TYPE=otlp
OTEL_EXPORT_OTLP_ENDPOINT=http://otel-collector:4318/v1/metrics
OTEL_COLLECTOR_BASIC_AUTH_USERNAME=kms
OTEL_COLLECTOR_BASIC_AUTH_PASSWORD=kms
# Push interval, in milliseconds (30000 = 30 seconds)
OTEL_OTLP_PUSH_INTERVAL=30000
```

Create `otel-collector-config.yaml`:
```yaml
extensions:
  health_check:
  pprof:
  zpages:
  basicauth/server:
    htpasswd:
      inline: |
        your_username:your_password

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        auth:
          authenticator: basicauth/server
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets: [kms-backend:9464]
          metric_relabel_configs:
            - action: labeldrop
              regex: "service_instance_id|service_name"

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    auth:
      authenticator: basicauth/server
    resource_to_telemetry_conversion:
      enabled: true

service:
  extensions: [basicauth/server, health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Replace `your_username:your_password` with your chosen credentials. These must match the values you set in Hanzo KMS's `OTEL_COLLECTOR_BASIC_AUTH_USERNAME` and `OTEL_COLLECTOR_BASIC_AUTH_PASSWORD` environment variables.
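Because this is standard HTTP Basic authentication, you can compute the exact `Authorization` header value the client will send and confirm it corresponds to the credentials the collector expects. A minimal sketch, using the example `kms`/`kms` credentials from the configuration above:

```python
import base64

# HTTP Basic auth: the header value is just base64("username:password").
# "kms"/"kms" are the example values from this guide; substitute your own.
username, password = "kms", "kms"
token = base64.b64encode(f"{username}:{password}".encode()).decode()
header = f"Basic {token}"
print(header)  # Basic a21zOmttcw==
```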
Create Prometheus configuration for the collector:

```yaml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: "otel-collector"
    scrape_interval: 30s
    static_configs:
      - targets: ["otel-collector:8889"] # Adjust hostname/port based on your deployment
    metrics_path: "/metrics"
```

Replace `otel-collector:8889` with the actual hostname and port where your OpenTelemetry Collector is running. This could be:
- Docker Compose: `otel-collector:8889` (service name)
- Kubernetes: `otel-collector.default.svc.cluster.local:8889` (service name)
- Bare Metal: `192.168.1.100:8889` (actual IP address)
- Cloud: `your-collector.example.com:8889` (domain name)
Deployment Options
Once you've configured Hanzo KMS to push metrics, you'll need to deploy the OpenTelemetry Collector to receive them. Below are examples for different deployment environments. Choose the option that matches your infrastructure.
Docker Compose (`docker-compose.yml`):

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4318:4318" # OTLP HTTP receiver
      - "8889:8889" # Prometheus exporter metrics
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
    command:
      - "--config=/etc/otelcol-contrib/config.yaml"
```

Kubernetes:

```yaml
# otel-collector-deployment.yaml
# Create the ConfigMap referenced below first, e.g.:
#   kubectl create configmap otel-collector-config \
#     --from-file=config.yaml=otel-collector-config.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          ports:
            - containerPort: 4318
            - containerPort: 8889
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
```

Helm:

```bash
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector \
  --set config.receivers.otlp.protocols.http.endpoint=0.0.0.0:4318 \
  --set config.exporters.prometheus.endpoint=0.0.0.0:8889
```

Available Metrics
Hanzo KMS exposes the following key metrics in OpenTelemetry format:
Core API Metrics
These metrics track all HTTP API requests to Hanzo KMS, including request counts, latency, and errors. Use these to monitor overall API health, identify performance bottlenecks, and track usage patterns across users and machine identities.
Metric Name: kms.http.server.request.count
Type: Counter
Unit: {request}
Description: Total number of API requests to Hanzo KMS (covers both human users and machine identities)
Attributes:
- `kms.organization.id` (string): Organization ID
- `kms.organization.name` (string): Organization name (e.g., "Platform Engineering Team")
- `kms.user.id` (string, optional): User ID if human user
- `kms.user.email` (string, optional): User email (e.g., "jane.doe@cisco.com")
- `kms.identity.id` (string, optional): Machine identity ID
- `kms.identity.name` (string, optional): Machine identity name (e.g., "prod-k8s-operator")
- `kms.auth.method` (string, optional): Auth method used
- `http.request.method` (string): HTTP method (GET, POST, PUT, DELETE)
- `http.route` (string): API endpoint route pattern
- `http.response.status_code` (int): HTTP status code
- `kms.project.id` (string, optional): Project ID
- `kms.project.name` (string, optional): Project name
- `user_agent.original` (string, optional): User agent string
- `client.address` (string, optional): IP address
Metric Name: kms.http.server.request.duration
Type: Histogram
Unit: s (seconds)
Description: API request latency
Buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
Attributes:
- `kms.organization.id` (string): Organization ID
- `kms.organization.name` (string): Organization name
- `kms.user.id` (string, optional): User ID if human user
- `kms.user.email` (string, optional): User email
- `kms.identity.id` (string, optional): Machine identity ID
- `kms.identity.name` (string, optional): Machine identity name
- `http.request.method` (string): HTTP method
- `http.route` (string): API endpoint route pattern
- `http.response.status_code` (int): HTTP status code
- `kms.project.id` (string, optional): Project ID
- `kms.project.name` (string, optional): Project name
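The `kms.http.server.request.duration` histogram above reports cumulative per-bucket counts; Prometheus's `histogram_quantile()` turns those into latency quantiles by linear interpolation. A simplified sketch of that calculation (the bucket counts below are invented, and Prometheus's edge-case handling is omitted):

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    ending with the +Inf bucket. Mirrors the linear interpolation used
    by Prometheus's histogram_quantile(), minus its edge cases.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # quantile falls in the open-ended bucket
            # Interpolate linearly inside the bucket containing the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Invented cumulative counts for the duration buckets listed above:
buckets = [(0.005, 10), (0.01, 40), (0.025, 70), (0.05, 90),
           (0.1, 95), (0.25, 98), (0.5, 99), (1, 100),
           (2.5, 100), (5, 100), (10, 100), (float("inf"), 100)]
p95 = estimate_quantile(0.95, buckets)
print(p95)  # 0.1
```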
Metric Name: kms.http.server.error.count
Type: Counter
Unit: {error}
Description: API errors grouped by actor (for identifying misconfigured services)
Attributes:
- `kms.organization.id` (string): Organization ID
- `kms.organization.name` (string): Organization name
- `kms.user.id` (string, optional): User ID if human
- `kms.user.email` (string, optional): User email
- `kms.identity.id` (string, optional): Identity ID if machine
- `kms.identity.name` (string, optional): Identity name
- `http.route` (string): API endpoint where error occurred
- `http.request.method` (string): HTTP method
- `error.type` (string): Error category/type (client_error, server_error, auth_error, rate_limit_error, etc.)
- `kms.project.id` (string, optional): Project ID
- `kms.project.name` (string, optional): Project name
- `client.address` (string, optional): IP address
- `user_agent.original` (string, optional): User agent information
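Counters like this one only ever increase, so dashboards derive rates from the difference between two scrapes — roughly what `sum by (kms_identity_name) (rate(kms_http_server_error_count[5m]))` computes in PromQL. The same idea in plain Python (the identity names and counter values below are invented):

```python
# Two snapshots of the error counter, keyed by (identity, error type),
# taken five minutes apart. Values are invented for illustration.
t0 = {("prod-k8s-operator", "auth_error"): 120, ("ci-runner", "client_error"): 40}
t1 = {("prod-k8s-operator", "auth_error"): 300, ("ci-runner", "client_error"): 41}
interval_s = 300  # seconds between the two snapshots

rates = {}
for key, end in t1.items():
    start = t0.get(key, 0)
    identity, _error_type = key
    # Counters only increase, so the delta over the window is the error count.
    rates[identity] = rates.get(identity, 0) + (end - start) / interval_s

print(rates["prod-k8s-operator"])  # 0.6
```

A sudden jump in one identity's rate (here, 0.6 errors/s for the hypothetical `prod-k8s-operator`) is exactly the "misconfigured service" signal this metric is designed to surface.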
Secret Operations Metrics
These metrics provide visibility into secret access patterns, helping you understand which secrets are being accessed, by whom, and from where. Essential for security auditing and access pattern analysis.
Metric Name: kms.secret.read.count
Type: Counter
Unit: {operation}
Description: Number of secret read operations
Attributes:
- `kms.organization.id` (string): Organization ID
- `kms.organization.name` (string): Organization name
- `kms.project.id` (string): Project ID
- `kms.project.name` (string): Project name (e.g., "payment-service-secrets")
- `kms.environment` (string): Environment (dev, staging, prod)
- `kms.secret.path` (string): Path to secrets (e.g., "/microservice-a/database")
- `kms.secret.name` (string, optional): Name of secret
- `kms.user.id` (string, optional): User ID if human
- `kms.user.email` (string, optional): User email
- `kms.identity.id` (string, optional): Machine identity ID
- `kms.identity.name` (string, optional): Machine identity name
- `user_agent.original` (string, optional): User agent/SDK information
- `client.address` (string, optional): IP address
Authentication Metrics
These metrics track authentication attempts and outcomes, enabling you to monitor login success rates, detect potential security threats, and identify authentication issues.
Metric Name: kms.auth.attempt.count
Type: Counter
Unit: {attempt}
Description: Authentication attempts (both successful and failed)
Attributes:
- `kms.organization.id` (string): Organization ID
- `kms.organization.name` (string): Organization name
- `kms.user.id` (string, optional): User ID if human (if identifiable)
- `kms.user.email` (string, optional): User email (if identifiable)
- `kms.identity.id` (string, optional): Identity ID if machine (if identifiable)
- `kms.identity.name` (string, optional): Identity name (if identifiable)
- `kms.auth.method` (string): Authentication method attempted
- `kms.auth.result` (string): success or failure
- `error.type` (string, optional): Reason for failure if failed (invalid_credentials, expired_token, invalid_token, etc.)
- `client.address` (string): IP address
- `user_agent.original` (string, optional): User agent/client information
- `kms.auth.attempt.username` (string, optional): Attempted username/email (if available)
Key Management Interoperability Protocol Metrics
These metrics track Key Management Interoperability Protocol (KMIP) operations, providing visibility into key management activities including key creation, retrieval, activation, revocation, and destruction.
Metric Name: kms.kmip.operation.count
Type: Counter
Unit: {operation}
Description: Number of KMIP operations performed
Attributes:
- `kms.kmip.operation.type` (string): Operation type (create, get, get_attributes, activate, revoke, destroy, locate, register)
- `kms.organization.id` (string): Organization ID
- `kms.project.id` (string): Project ID
- `kms.kmip.client.id` (string): KMIP client ID performing the operation
- `kms.kmip.object.id` (string, optional): Managed object/key ID
- `kms.kmip.object.name` (string, optional): Managed object/key name
- `kms.identity.id` (string, optional): Machine identity ID
- `kms.identity.name` (string, optional): Machine identity name
- `user_agent.original` (string, optional): User agent string
- `client.address` (string, optional): Client IP address
Integration & Secret Sync Metrics
These metrics monitor secret synchronization operations between Hanzo KMS and external systems, helping you track sync health, identify integration failures, and troubleshoot connectivity issues.
Integration secret sync error count
- Labels: `version`, `integration`, `integrationId`, `type`, `status`, `name`, `projectId`
- Example: Monitor integration sync failures across different services

Secret sync operation error count
- Labels: `version`, `destination`, `syncId`, `projectId`, `type`, `status`, `name`
- Example: Track secret sync failures to external systems

Secret import operation error count
- Labels: `version`, `destination`, `syncId`, `projectId`, `type`, `status`, `name`
- Example: Monitor secret import failures

Secret removal operation error count
- Labels: `version`, `destination`, `syncId`, `projectId`, `type`, `status`, `name`
- Example: Track secret removal operation failures
System Metrics
These low-level HTTP metrics are automatically collected by OpenTelemetry's instrumentation layer, providing baseline performance data for all HTTP traffic.
- HTTP server request duration metrics (histogram buckets, count, sum)
- HTTP client request duration metrics (histogram buckets, count, sum)
Troubleshooting
If your metrics are not showing up in Prometheus or your monitoring system, check the following:
- Verify `OTEL_TELEMETRY_COLLECTION_ENABLED=true` is set in your Hanzo KMS environment variables
- Ensure the correct `OTEL_EXPORT_TYPE` is set (`prometheus` or `otlp`)
- Check network connectivity between Hanzo KMS and your monitoring services (Prometheus or OTLP collector)
- For pull-based monitoring: verify port 9464 is exposed and accessible
- For push-based monitoring: verify the OTLP endpoint URL is correct and reachable
- Check the Hanzo KMS backend logs for any errors related to metrics export
If you're experiencing authentication errors with the OpenTelemetry Collector:
- Verify the basic auth credentials in your OTLP configuration match between Hanzo KMS and the collector
- Check that `OTEL_COLLECTOR_BASIC_AUTH_USERNAME` and `OTEL_COLLECTOR_BASIC_AUTH_PASSWORD` match the credentials in your `otel-collector-config.yaml`
- Ensure the htpasswd format in the collector configuration is correct
- Test the collector endpoint manually using curl with the same credentials to verify they work
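The manual credential test can be done with any HTTP client, not just curl. As a sketch with Python's standard library — building (but not sending) an authenticated OTLP request, using the example endpoint and `kms`/`kms` credentials from earlier in this guide:

```python
import base64
import urllib.request

# Example values from this guide; substitute your own deployment's.
endpoint = "http://otel-collector:4318/v1/metrics"
username, password = "kms", "kms"

token = base64.b64encode(f"{username}:{password}".encode()).decode()
req = urllib.request.Request(
    endpoint,
    data=b"{}",  # an empty OTLP/HTTP JSON payload is enough to exercise auth
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    },
    method="POST",
)
print(req.get_header("Authorization"))  # Basic a21zOmttcw==

# To actually send it, the collector must be reachable:
#   urllib.request.urlopen(req)
# An HTTP 401 response means the credentials do not match the collector's.
```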