Docker Monitoring and Observability

guideline_type: "technology"
scope: "docker-monitoring"
audience: ["developers", "devops", "ai-assistants"]
last_updated: "2025-09-15"
dependencies: ["docker-overview.md", "docker-architecture.md"]
related_files: ["docker-compose.yml", "config/monitoring/", "config/grafana/", "config/prometheus/*"]
ai_context: "Monitoring setup, Prometheus metrics, Grafana dashboards, health checks, and log aggregation"

📊 Monitoring and Observability

Prometheus Metrics

All services expose standardized metrics:

# Service labels for Prometheus autodiscovery
labels:
  - "prometheus.scrape=true"
  - "prometheus.port=8080"
  - "prometheus.path=/actuator/prometheus"
  - "prometheus.service=${SERVICE_NAME}"
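The labels above only carry the information a discovery mechanism needs; how they are turned into scrape targets depends on the setup. As a minimal sketch, assuming the labels arrive as `key=value` strings and that a missing port or path falls back to `8080` and `/metrics` (both defaults are assumptions, as is the helper name `scrape_target`), the mapping could look like this:

```python
def scrape_target(labels, host):
    """Build a scrape URL from prometheus.* labels, or None if scraping is off."""
    # Labels are "key=value" strings, as in the compose snippet above.
    parsed = dict(item.split("=", 1) for item in labels)
    if parsed.get("prometheus.scrape") != "true":
        return None
    port = parsed.get("prometheus.port", "8080")    # assumed default
    path = parsed.get("prometheus.path", "/metrics")  # assumed default
    return f"http://{host}:{port}{path}"


labels = [
    "prometheus.scrape=true",
    "prometheus.port=8080",
    "prometheus.path=/actuator/prometheus",
]
print(scrape_target(labels, "api-gateway"))
```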

🤖 AI assistant note: accessing the monitoring stack:

Grafana: http://localhost:3000 (admin/admin)

Prometheus: http://localhost:9090

Metrics endpoints: /actuator/prometheus for Spring services

Health checks: /actuator/health for readiness probes

Grafana Dashboards

Prebuilt dashboards:

Infrastructure Overview: CPU, Memory, Disk, Network
Spring Boot Services: JVM Metrics, HTTP Requests, Circuit Breaker
Database Performance: PostgreSQL Connections, Query Performance
Message Queue: Kafka Consumer Lag, Throughput
Business Metrics: application-specific KPIs

Health Check Matrix

Service	Endpoint / Command	Expected result	Timeout
API Gateway	`/actuator/health`	`{"status":"UP"}`	15s
Ping Service	`/actuator/health/readiness`	HTTP 200	3s
PostgreSQL	`pg_isready`	Connection OK	5s
Redis	`redis-cli ping`	PONG	5s
Keycloak	`/health/ready`	HTTP 200	5s
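The HTTP rows of the matrix can be checked programmatically. The following is a sketch under stated assumptions, not the project's actual probe: it treats "healthy" as HTTP 200 plus a JSON body with `"status": "UP"` (the API Gateway expectation above), and the function names `is_up` and `check` are hypothetical.

```python
import json
import urllib.request


def is_up(status_code, body):
    """Return True for HTTP 200 with a JSON body whose "status" is "UP"."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        return False  # not JSON, or JSON that is not an object


def check(url, timeout_s):
    """Probe one endpoint; any network error or timeout counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return is_up(resp.status, resp.read().decode("utf-8", errors="replace"))
    except OSError:
        return False
```

For example, `check("http://localhost:8081/actuator/health", 15)` would apply the 15 s timeout from the matrix, assuming the API Gateway is published on port 8081.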

Log Aggregation

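# Centralized logging with the ELK stack (optional)
docker-compose -f docker-compose.yml -f docker-compose.logging.yml up -d

# Parse structured (JSON) logs; --no-log-prefix drops the "service |" prefix
# and fromjson? skips lines that are not valid JSON
docker-compose logs --follow --tail=100 --no-log-prefix api-gateway | jq -rR 'fromjson? | objects | .message'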

🎯 AI Assistants: Monitoring Quick Reference

Monitoring URLs

Grafana Dashboard: http://localhost:3000 (admin/admin)
Prometheus Targets: http://localhost:9090/targets
Prometheus Metrics: http://localhost:9090/metrics
Service Health: http://localhost:<port>/actuator/health
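Besides the targets page, the same information is available from the Prometheus HTTP API. As a rough sketch, the instant query `up` on `/api/v1/query` returns one sample per scrape target, and a value of 0 marks a target as down. The helper names and the 5 s timeout are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def down_instances(response):
    """Return instance labels whose `up` sample is 0 in an instant-query response."""
    results = response.get("data", {}).get("result", [])
    return sorted(
        r["metric"].get("instance", "?")
        for r in results
        if float(r["value"][1]) == 0.0  # sample is [timestamp, "value"]
    )


def query_up(base_url="http://localhost:9090"):
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

With the stack running, `down_instances(query_up())` lists the targets Prometheus currently cannot scrape.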

Key Metrics

Metric type	Example	Description
JVM Memory	`jvm_memory_used_bytes`	Memory usage of Java services
HTTP Requests	`http_requests_total`	API request counter
Database Connections	`hikaricp_connections`	Connections in the pool
Kafka Lag	`kafka_consumer_lag`	Consumer lag
Custom Business	`meldestelle_registrations_total`	Domain-specific KPIs
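To see what these metrics look like on the wire, it can help to read the text a `/actuator/prometheus` endpoint returns. The sketch below sums all samples of one metric from such a payload. It is deliberately simplified (no timestamps, no `}` inside label values), and the sample payload is made up for illustration.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')


def sum_metric(text, name):
    """Sum all samples of one metric across label sets; skip comments."""
    total = 0.0
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP / TYPE lines
        m = SAMPLE.match(line)
        if m and m.group(1) == name:
            total += float(m.group(3))
    return total


# Illustrative payload, not captured from a real service.
sample = """# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.0E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 5.0E6
hikaricp_connections{pool="HikariPool-1"} 10.0
"""
print(sum_metric(sample, "jvm_memory_used_bytes"))
```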

Health Check Commands

# Check all services
docker-compose ps

# Service-specific health checks
curl -s http://localhost:8082/actuator/health | jq '.status'
curl -s http://localhost:8081/actuator/health | jq '.status'

# Infrastructure health checks
docker-compose exec postgres pg_isready -U meldestelle -d meldestelle
docker-compose exec redis redis-cli ping
curl -s http://localhost:8180/health/ready

Log Analysis

# Follow service logs in real time
docker-compose logs -f <service-name>

# Filter error logs
docker-compose logs <service-name> | grep ERROR

# Show ERROR messages from JSON logs (--no-log-prefix drops the "service |" prefix;
# fromjson? skips non-JSON lines)
docker-compose logs --no-log-prefix api-gateway | jq -rR 'fromjson? | objects | select(.level=="ERROR") | .message'

# Find timing-related log lines
docker-compose logs api-gateway | grep -iE "took|duration|time"
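Where `jq` is not available, the same ERROR filter can be approximated in a few lines of Python. This is a sketch that assumes JSON log records with `level` and `message` fields (the fields used in the commands above) and tolerates both the compose line prefix and plain-text lines; the function name is hypothetical.

```python
import json


def error_messages(lines):
    """Yield .message of JSON log lines with level ERROR; skip everything else."""
    for line in lines:
        # docker-compose may prefix lines with "service  | "; keep the JSON part.
        start = line.find("{")
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except ValueError:
            continue  # plain-text line such as a banner or stack trace
        if isinstance(record, dict) and record.get("level") == "ERROR":
            yield record.get("message", "")
```

It can be fed from a pipe, for example `docker-compose logs api-gateway | python filter_errors.py`, with a script (hypothetical name) that passes `sys.stdin` to `error_messages`.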

Dashboard Setup

Infrastructure Dashboard

{
  "dashboard": {
    "title": "Meldestelle Infrastructure",
    "panels": [
      {
        "title": "CPU Usage",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m]) * 100"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "targets": [
          {
            "expr": "container_memory_usage_bytes / container_spec_memory_limit_bytes * 100"
          }
        ]
      }
    ]
  }
}

Application Dashboard

{
  "dashboard": {
    "title": "Meldestelle Services",
    "panels": [
      {
        "title": "HTTP Requests/sec",
        "targets": [
          {
            "expr": "rate(http_requests_total[1m])"
          }
        ]
      },
      {
        "title": "Response Time",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}
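Both dashboard definitions share the same shape: a title plus a list of panels, each with one PromQL target. If more dashboards of this kind are needed, they could be generated rather than hand-written. The following is a minimal sketch that reproduces only the fields shown above; a real Grafana dashboard usually carries more (panel types, grid positions, data source), and the helper name `dashboard` is an assumption.

```python
import json


def dashboard(title, panels):
    """Build the structure shown above from (panel title, PromQL) pairs."""
    return {
        "dashboard": {
            "title": title,
            "panels": [
                {"title": name, "targets": [{"expr": expr}]}
                for name, expr in panels
            ],
        }
    }


services = dashboard(
    "Meldestelle Services",
    [
        ("HTTP Requests/sec", "rate(http_requests_total[1m])"),
        ("Response Time",
         "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"),
    ],
)
print(json.dumps(services, indent=2))
```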

Alerting Rules

# prometheus/alerts.yaml
groups:
  - name: meldestelle.rules
    rules:
    - alert: ServiceDown
      expr: up == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Service {{ $labels.instance }} is down"

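    - alert: HighMemoryUsage
      # Containers without a memory limit report a limit of 0; filter them out
      # so the division does not yield +Inf and fire the alert permanently.
      expr: (container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)) > 0.8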
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on {{ $labels.instance }}"

    - alert: DatabaseConnectionsFull
      expr: hikaricp_connections_active >= hikaricp_connections_max * 0.8
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Database connection pool nearly exhausted"
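The `for:` clause in these rules means an alert does not fire on the first failed evaluation: it stays pending until the expression has been true continuously for the given duration, and any false evaluation resets it. The simulation below is a simplified model of that behaviour, not Prometheus code; the evaluation timestamps are made up and follow the 15 s `evaluation_interval` used in this guide.

```python
def alert_states(samples, for_seconds):
    """Map (timestamp, condition) evaluations to inactive/pending/firing."""
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, condition in samples:
        if not condition:
            active_since = None  # any false evaluation resets the alert
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# ServiceDown: `up == 0` with `for: 1m`, evaluated every 15 s.
evals = [(0, False), (15, True), (30, True), (45, True),
         (60, True), (75, True), (90, False)]
print(alert_states(evals, 60))
```

In this run the alert is pending from t=15 and only fires at t=75, one full minute after the condition first held.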

Monitoring Maintenance

# Reload the Prometheus configuration
# (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Export a Grafana dashboard
curl -s -H "Authorization: Bearer <token>" \
  http://localhost:3000/api/dashboards/uid/<dashboard-uid> > dashboard_backup.json

# Clean up monitoring data
# WARNING: this permanently deletes all stored metrics
docker-compose exec prometheus rm -rf /prometheus/data
docker-compose restart prometheus

# Truncate log files of monitoring services (Grafana shown)
docker-compose exec grafana find /var/log -name "*.log" -exec truncate -s 0 {} \;
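A dashboard exported via `GET /api/dashboards/uid/<dashboard-uid>` is wrapped in a `meta`/`dashboard` envelope and cannot be posted back unchanged. As a sketch of the restore side, assuming the backup is re-imported through `POST /api/dashboards/db`, the payload could be prepared like this (the helper name is hypothetical):

```python
import json


def to_import_payload(exported, overwrite=True):
    """Turn an exported dashboard response into a POST /api/dashboards/db body."""
    dash = dict(exported["dashboard"])  # copy so the backup stays untouched
    # Numeric ids are instance-specific; the uid identifies the dashboard.
    dash["id"] = None
    return {"dashboard": dash, "overwrite": overwrite}


# Illustrative export, shortened to the relevant fields.
backup = {
    "meta": {"slug": "meldestelle-services"},
    "dashboard": {"id": 7, "uid": "abc123", "title": "Meldestelle Services"},
}
print(json.dumps(to_import_payload(backup)))
```

In practice `backup` would be loaded from `dashboard_backup.json` with `json.load`.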

Performance Tuning

# prometheus.yaml - optimized configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/alerts.yaml"

scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api-gateway:8081', 'ping-service:8082']
    scrape_interval: 10s

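  - job_name: 'infrastructure'
    # PostgreSQL and Redis do not serve Prometheus metrics on their native
    # ports (5432, 6379), so scraping them directly leaves the targets down.
    # Point this job at exporters instead. The service names below are
    # assumptions; the ports are the postgres_exporter and redis_exporter defaults.
    static_configs:
      - targets: ['postgres-exporter:9187', 'redis-exporter:9121']
    scrape_interval: 30s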

Navigation:

docker-overview - Fundamentals and philosophy
docker-architecture - Container services and structure
docker-development - Development workflow
docker-production - Production deployment
docker-troubleshooting - Troubleshooting
