
Monitoring and Telemetry Setup

Learn how to set up monitoring and telemetry for your self-hosted Hanzo KMS instance using Grafana, Prometheus, and OpenTelemetry.

Hanzo KMS provides comprehensive monitoring and telemetry capabilities to help you monitor the health, performance, and usage of your self-hosted instance. This guide covers setting up monitoring using Grafana with two different telemetry collection approaches.

Overview

Hanzo KMS exports metrics in OpenTelemetry (OTEL) format, which provides maximum flexibility for your monitoring infrastructure. While this guide focuses on Grafana, the OTEL format means you can easily integrate with:

  • Cloud-native monitoring: AWS CloudWatch, Google Cloud Monitoring, Azure Monitor
  • Observability platforms: Datadog, New Relic, Splunk, Dynatrace
  • Custom backends: Any system that supports OTEL ingestion
  • Traditional monitoring: Prometheus, Grafana (as covered in this guide)

Hanzo KMS supports two telemetry collection methods:

  1. Pull-based (Prometheus): Exposes metrics on a dedicated endpoint for Prometheus to scrape
  2. Push-based (OTLP): Sends metrics to an OpenTelemetry Collector via the OTLP protocol

Both approaches provide the same metrics data in OTEL format, so you can choose the one that best fits your infrastructure and monitoring strategy.

Prerequisites

  • A running self-hosted Hanzo KMS instance
  • Access to deploy monitoring services (Prometheus, Grafana, etc.)
  • Basic understanding of Prometheus and Grafana

Setup

Environment Variables

Configure the following environment variables in your Hanzo KMS backend:

# Enable telemetry collection
OTEL_TELEMETRY_COLLECTION_ENABLED=true

# Choose export type: "prometheus" or "otlp"
OTEL_EXPORT_TYPE=prometheus

Method 1: Pull-based (Prometheus)

Setting OTEL_EXPORT_TYPE=prometheus exposes metrics on port 9464 at the /metrics endpoint for Prometheus to scrape. The metrics are exposed in Prometheus format but originate from OpenTelemetry instrumentation.

Configuration

OTEL_TELEMETRY_COLLECTION_ENABLED=true
OTEL_EXPORT_TYPE=prometheus

Expose the metrics port in your Hanzo KMS backend:

  • Docker: Expose port 9464
  • Kubernetes: Create a service exposing port 9464
  • Other: Ensure port 9464 is accessible to your monitoring stack

Create prometheus.yml:

global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: "kms"
    scrape_interval: 30s
    static_configs:
      - targets: ["kms-backend:9464"] # Adjust hostname/port based on your deployment
    metrics_path: "/metrics"

Replace kms-backend:9464 with the actual hostname and port where your Hanzo KMS backend is running. This could be:

  • Docker Compose: kms-backend:9464 (service name)
  • Kubernetes: kms-backend.default.svc.cluster.local:9464 (service name)
  • Bare Metal: 192.168.1.100:9464 (actual IP address)
  • Cloud: your-kms.example.com:9464 (domain name)

Deployment Options

Once you've configured Hanzo KMS to expose metrics, you'll need to deploy Prometheus to scrape and store them. Below are examples for different deployment environments. Choose the option that matches your infrastructure.

Docker Compose (docker-compose.yml):

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
Kubernetes:

# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config

---
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: ClusterIP
Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus -f prometheus-values.yaml

Put your scrape configuration (see the prometheus.yml example above) in prometheus-values.yaml — the chart nests it under serverFiles rather than server.config, so check the chart's values.yaml for the exact structure.
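Once Prometheus is running, Grafana needs it as a data source. You can add it through the UI, or provision it declaratively. A minimal sketch follows — the datasource name is arbitrary, and the URL assumes Prometheus is reachable at prometheus:9090; Grafana reads provisioning files from /etc/grafana/provisioning/datasources:

```yaml
# e.g. mounted at /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090 # adjust to your Prometheus hostname
    isDefault: true
```

With this file mounted into the Grafana container, the data source appears automatically on startup and dashboards can query the Hanzo KMS metrics immediately.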

Method 2: Push-based (OTLP)

This approach sends metrics directly to an OpenTelemetry Collector via the OTLP protocol. It is the most flexible option, since the collector can be configured to export to multiple backends simultaneously.

Configuration

# Enable telemetry collection
OTEL_TELEMETRY_COLLECTION_ENABLED=true

# Push metrics to an OpenTelemetry Collector
OTEL_EXPORT_TYPE=otlp

# OTLP HTTP endpoint of your collector
OTEL_EXPORT_OTLP_ENDPOINT=http://otel-collector:4318/v1/metrics

# Basic auth credentials (must match the collector's htpasswd entry)
OTEL_COLLECTOR_BASIC_AUTH_USERNAME=kms
OTEL_COLLECTOR_BASIC_AUTH_PASSWORD=kms

# Interval between metric pushes (30000 = every 30 seconds)
OTEL_OTLP_PUSH_INTERVAL=30000

Create otel-collector-config.yaml:

extensions:
  health_check:
  pprof:
  zpages:
  basicauth/server:
    htpasswd:
      inline: |
        your_username:your_password

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        auth:
          authenticator: basicauth/server

  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets: [kms-backend:9464]
          metric_relabel_configs:
            - action: labeldrop
              regex: "service_instance_id|service_name"

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    auth:
      authenticator: basicauth/server
    resource_to_telemetry_conversion:
      enabled: true

service:
  extensions: [basicauth/server, health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

Replace your_username:your_password with your chosen credentials; they must match the values you set in Hanzo KMS's OTEL_COLLECTOR_BASIC_AUTH_USERNAME and OTEL_COLLECTOR_BASIC_AUTH_PASSWORD environment variables. Note that the prometheus receiver defined above is not wired into the metrics pipeline — it is only needed if you also scrape a pull-based endpoint, in which case add it to the pipeline's receivers list.

Create Prometheus configuration for the collector:

global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: "otel-collector"
    scrape_interval: 30s
    static_configs:
      - targets: ["otel-collector:8889"] # Adjust hostname/port based on your deployment
    metrics_path: "/metrics"

Replace otel-collector:8889 with the actual hostname and port where your OpenTelemetry Collector is running. This could be:

  • Docker Compose: otel-collector:8889 (service name)
  • Kubernetes: otel-collector.default.svc.cluster.local:8889 (service name)
  • Bare Metal: 192.168.1.100:8889 (actual IP address)
  • Cloud: your-collector.example.com:8889 (domain name)

Deployment Options

Once you've created the collector configuration, deploy the collector so it can receive metrics from Hanzo KMS. Below are examples for different deployment environments. Choose the option that matches your infrastructure.

Docker Compose (docker-compose.yml):

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - 4318:4318 # OTLP http receiver
      - 8889:8889 # Prometheus exporter metrics
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
    command:
      - "--config=/etc/otelcol-contrib/config.yaml"
Kubernetes:

# otel-collector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          ports:
            - containerPort: 4318
            - containerPort: 8889
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
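The Deployment alone isn't reachable from other pods. Mirroring the Prometheus Service example earlier in this guide, a ClusterIP Service can expose the collector's ports — a sketch, assuming the app: otel-collector label from the manifest above:

```yaml
# otel-collector-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-http
      port: 4318 # OTLP HTTP receiver
      targetPort: 4318
    - name: prom-exporter
      port: 8889 # Prometheus exporter metrics
      targetPort: 8889
  type: ClusterIP
```

With this in place, Hanzo KMS can push to http://otel-collector:4318/v1/metrics and Prometheus can scrape otel-collector:8889 from inside the cluster.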
Helm:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector \
  --set mode=deployment \
  --set config.receivers.otlp.protocols.http.endpoint=0.0.0.0:4318 \
  --set config.exporters.prometheus.endpoint=0.0.0.0:8889

Recent versions of this chart require mode to be set explicitly. For anything beyond a trivial configuration, pass a values file with -f instead of chaining --set flags.

Available Metrics

Hanzo KMS exposes the following key metrics in OpenTelemetry format:

Core API Metrics

These metrics track all HTTP API requests to Hanzo KMS, including request counts, latency, and errors. Use these to monitor overall API health, identify performance bottlenecks, and track usage patterns across users and machine identities.

Metric Name: kms.http.server.request.count

Type: Counter

Unit: {request}

Description: Total number of API requests to Hanzo KMS (covers both human users and machine identities)

Attributes:

  • kms.organization.id (string): Organization ID
  • kms.organization.name (string): Organization name (e.g., "Platform Engineering Team")
  • kms.user.id (string, optional): User ID if human user
  • kms.user.email (string, optional): User email (e.g., "jane.doe@cisco.com")
  • kms.identity.id (string, optional): Machine identity ID
  • kms.identity.name (string, optional): Machine identity name (e.g., "prod-k8s-operator")
  • kms.auth.method (string, optional): Auth method used
  • http.request.method (string): HTTP method (GET, POST, PUT, DELETE)
  • http.route (string): API endpoint route pattern
  • http.response.status_code (int): HTTP status code
  • kms.project.id (string, optional): Project ID
  • kms.project.name (string, optional): Project name
  • user_agent.original (string, optional): User agent string
  • client.address (string, optional): IP address
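When scraped through Prometheus, OTel metric and attribute names are typically translated by replacing dots with underscores, and counters usually gain a _total suffix — so this metric generally surfaces as kms_http_server_request_count_total (verify the exact name on your /metrics endpoint). A query like the following shows per-route request rates:

```promql
# Requests per second over the last 5 minutes, split by route and method
sum by (http_route, http_request_method) (
  rate(kms_http_server_request_count_total[5m])
)
```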

Metric Name: kms.http.server.request.duration

Type: Histogram

Unit: s (seconds)

Description: API request latency

Buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

Attributes:

  • kms.organization.id (string): Organization ID
  • kms.organization.name (string): Organization name
  • kms.user.id (string, optional): User ID if human user
  • kms.user.email (string, optional): User email
  • kms.identity.id (string, optional): Machine identity ID
  • kms.identity.name (string, optional): Machine identity name
  • http.request.method (string): HTTP method
  • http.route (string): API endpoint route pattern
  • http.response.status_code (int): HTTP status code
  • kms.project.id (string, optional): Project ID
  • kms.project.name (string, optional): Project name
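Assuming the usual OTel-to-Prometheus name translation (dots to underscores, plus a unit suffix on histograms — confirm against your /metrics output), the 95th-percentile latency per route can be derived from the histogram buckets:

```promql
# p95 API latency per route over the last 5 minutes
histogram_quantile(
  0.95,
  sum by (http_route, le) (
    rate(kms_http_server_request_duration_seconds_bucket[5m])
  )
)
```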

Metric Name: kms.http.server.error.count

Type: Counter

Unit: {error}

Description: API errors grouped by actor (for identifying misconfigured services)

Attributes:

  • kms.organization.id (string): Organization ID
  • kms.organization.name (string): Organization name
  • kms.user.id (string, optional): User ID if human
  • kms.user.email (string, optional): User email
  • kms.identity.id (string, optional): Identity ID if machine
  • kms.identity.name (string, optional): Identity name
  • http.route (string): API endpoint where error occurred
  • http.request.method (string): HTTP method
  • error.type (string): Error category/type (client_error, server_error, auth_error, rate_limit_error, etc.)
  • kms.project.id (string, optional): Project ID
  • kms.project.name (string, optional): Project name
  • client.address (string, optional): IP address
  • user_agent.original (string, optional): User agent information
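Dividing this counter by the request counter kms.http.server.request.count yields an error ratio per route, which is often more actionable than raw error counts. The metric names below assume the common dots-to-underscores translation with a _total counter suffix:

```promql
# Fraction of requests that errored, per route, over the last 5 minutes
sum by (http_route) (rate(kms_http_server_error_count_total[5m]))
  /
sum by (http_route) (rate(kms_http_server_request_count_total[5m]))
```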

Secret Operations Metrics

These metrics provide visibility into secret access patterns, helping you understand which secrets are being accessed, by whom, and from where. Essential for security auditing and access pattern analysis.

Metric Name: kms.secret.read.count

Type: Counter

Unit: {operation}

Description: Number of secret read operations

Attributes:

  • kms.organization.id (string): Organization ID
  • kms.organization.name (string): Organization name
  • kms.project.id (string): Project ID
  • kms.project.name (string): Project name (e.g., "payment-service-secrets")
  • kms.environment (string): Environment (dev, staging, prod)
  • kms.secret.path (string): Path to secrets (e.g., "/microservice-a/database")
  • kms.secret.name (string, optional): Name of secret
  • kms.user.id (string, optional): User ID if human
  • kms.user.email (string, optional): User email
  • kms.identity.id (string, optional): Machine identity ID
  • kms.identity.name (string, optional): Machine identity name
  • user_agent.original (string, optional): User agent/SDK information
  • client.address (string, optional): IP address
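For access-pattern analysis, a top-k query highlights the most frequently read secret paths. Names here assume the usual dots-to-underscores translation and a _total counter suffix:

```promql
# Ten most-read secret paths in the last hour, grouped by project
topk(10,
  sum by (kms_project_name, kms_secret_path) (
    increase(kms_secret_read_count_total[1h])
  )
)
```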

Authentication Metrics

These metrics track authentication attempts and outcomes, enabling you to monitor login success rates, detect potential security threats, and identify authentication issues.

Metric Name: kms.auth.attempt.count

Type: Counter

Unit: {attempt}

Description: Authentication attempts (both successful and failed)

Attributes:

  • kms.organization.id (string): Organization ID
  • kms.organization.name (string): Organization name
  • kms.user.id (string, optional): User ID if human (if identifiable)
  • kms.user.email (string, optional): User email (if identifiable)
  • kms.identity.id (string, optional): Identity ID if machine (if identifiable)
  • kms.identity.name (string, optional): Identity name (if identifiable)
  • kms.auth.method (string): Authentication method attempted
  • kms.auth.result (string): success or failure
  • error.type (string, optional): Reason for failure if failed (invalid_credentials, expired_token, invalid_token, etc.)
  • client.address (string): IP address
  • user_agent.original (string, optional): User agent/client information
  • kms.auth.attempt.username (string, optional): Attempted username/email (if available)
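A useful starting point for threat detection is watching failed attempts by source address. The metric and label names below assume the common OTel-to-Prometheus translation (dots become underscores, counters gain _total):

```promql
# Failed authentication attempts per second, by method and client address
sum by (kms_auth_method, client_address) (
  rate(kms_auth_attempt_count_total{kms_auth_result="failure"}[5m])
)
```

A sustained spike from a single client.address is a candidate for alerting or IP blocking.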

Key Management Interoperability Protocol (KMIP) Metrics

These metrics track Key Management Interoperability Protocol (KMIP) operations, providing visibility into key management activities including key creation, retrieval, activation, revocation, and destruction.

Metric Name: kms.kmip.operation.count

Type: Counter

Unit: {operation}

Description: Number of KMIP operations performed

Attributes:

  • kms.kmip.operation.type (string): Operation type (create, get, get_attributes, activate, revoke, destroy, locate, register)
  • kms.organization.id (string): Organization ID
  • kms.project.id (string): Project ID
  • kms.kmip.client.id (string): KMIP client ID performing the operation
  • kms.kmip.object.id (string, optional): Managed object/key ID
  • kms.kmip.object.name (string, optional): Managed object/key name
  • kms.identity.id (string, optional): Machine identity ID
  • kms.identity.name (string, optional): Machine identity name
  • user_agent.original (string, optional): User agent string
  • client.address (string, optional): Client IP address
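To watch KMIP activity by operation type — again assuming dots become underscores and the counter gains a _total suffix in Prometheus:

```promql
# KMIP operations per second, broken down by operation type
sum by (kms_kmip_operation_type) (
  rate(kms_kmip_operation_count_total[5m])
)
```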

Integration & Secret Sync Metrics

These metrics monitor secret synchronization operations between Hanzo KMS and external systems, helping you track sync health, identify integration failures, and troubleshoot connectivity issues.

Integration secret sync error count

  • Labels: version, integration, integrationId, type, status, name, projectId
  • Example: Monitor integration sync failures across different services

Secret sync operation error count

  • Labels: version, destination, syncId, projectId, type, status, name
  • Example: Track secret sync failures to external systems

Secret import operation error count

  • Labels: version, destination, syncId, projectId, type, status, name
  • Example: Monitor secret import failures

Secret removal operation error count

  • Labels: version, destination, syncId, projectId, type, status, name
  • Example: Track secret removal operation failures

System Metrics

These low-level HTTP metrics are automatically collected by OpenTelemetry's instrumentation layer, providing baseline performance data for all HTTP traffic.

HTTP server request duration metrics (histogram buckets, count, sum)

HTTP client request duration metrics (histogram buckets, count, sum)

Troubleshooting

If your metrics are not showing up in Prometheus or your monitoring system, check the following:

  • Verify OTEL_TELEMETRY_COLLECTION_ENABLED=true is set in your Hanzo KMS environment variables
  • Ensure the correct OTEL_EXPORT_TYPE is set (prometheus or otlp)
  • Check network connectivity between Hanzo KMS and your monitoring services (Prometheus or OTLP collector)
  • For pull-based monitoring: Verify port 9464 is exposed and accessible
  • For push-based monitoring: Verify the OTLP endpoint URL is correct and reachable
  • Check Hanzo KMS backend logs for any errors related to metrics export

If you're experiencing authentication errors with the OpenTelemetry Collector:

  • Verify basic auth credentials in your OTLP configuration match between Hanzo KMS and the collector
  • Check that OTEL_COLLECTOR_BASIC_AUTH_USERNAME and OTEL_COLLECTOR_BASIC_AUTH_PASSWORD match the credentials in your otel-collector-config.yaml
  • Ensure the htpasswd format in the collector configuration is correct
  • Test the collector endpoint manually using curl with the same credentials to verify they work
