meldestelle/.junie/guidelines/technology-guides/docker/docker-monitoring.md
2025-12-04 03:34:11 +01:00

7.1 KiB

Docker-Monitoring und Observability


guideline_type: "technology" scope: "docker-monitoring" audience: ["developers", "devops", "ai-assistants"] last_updated: "2025-09-15" dependencies: ["docker-overview.md", "docker-architecture.md"] related_files: ["docker-compose.yml", "config/monitoring/", "config/grafana/", "config/prometheus/*"] ai_context: "Monitoring-Setup, Prometheus-Metriken, Grafana-Dashboards, Health-Checks und Log-Aggregation"


📊 Monitoring und Observability

Prometheus Metrics

Alle Services exposieren standardisierte Metrics:

# Service-Labels für Prometheus Autodiscovery
labels:
  - "prometheus.scrape=true"
  - "prometheus.port=8080"
  - "prometheus.path=/actuator/prometheus"
  - "prometheus.service=${SERVICE_NAME}"

🤖 AI-Assistant Hinweis: Monitoring-Stack Zugriff:

Grafana Dashboards

Vorgefertigte Dashboards:

  • Infrastructure Overview: CPU, Memory, Disk, Network
  • Spring Boot Services: JVM Metrics, HTTP Requests, Circuit Breaker
  • Database Performance: PostgreSQL Connections, Query Performance
  • Message Queue: Kafka Consumer Lag, Throughput
  • Business Metrics: Application-spezifische KPIs

Health Check Matrix

Service Endpoint Erwartung Timeout
API Gateway /actuator/health {"status":"UP"} 15s
Ping Service /actuator/health/readiness HTTP 200 3s
PostgreSQL pg_isready Connection OK 5s
Redis redis-cli ping PONG 5s
Keycloak /health/ready HTTP 200 5s

Log Aggregation

# Centralized logging mit ELK Stack (optional)
docker-compose -f docker-compose.yaml -f docker-compose.logging.yml up -d

# Log-Parsing für strukturierte Logs
docker-compose logs --follow --tail=100 api-gateway | jq -r '.message'

🎯 AI-Assistenten: Monitoring-Schnellreferenz

Monitoring-URLs

Wichtige Metrics

Metric-Typ Beispiel Beschreibung
JVM Memory jvm_memory_used_bytes Speicherverbrauch Java-Services
HTTP Requests http_requests_total API-Request-Zähler
Database Connections hikaricp_connections Pool-Verbindungen
Kafka Lag kafka_consumer_lag Consumer-Verzögerung
Custom Business meldestelle_registrations_total Fachliche KPIs

Health-Check Befehle

# Alle Services prüfen
docker-compose ps

# Service-spezifische Health-Checks
curl -s http://localhost:8082/actuator/health | jq '.status'
curl -s http://localhost:8081/actuator/health | jq '.status'

# Infrastructure Health-Checks
docker-compose exec postgres pg_isready -U meldestelle -d meldestelle
docker-compose exec redis redis-cli ping
curl -s http://localhost:8180/health/ready

Log-Analyse

# Service-Logs in Echtzeit
docker-compose logs -f <service-name>

# Error-Logs filtern
docker-compose logs <service-name> | grep ERROR

# JSON-Logs strukturiert anzeigen
docker-compose logs api-gateway | jq -r '. | select(.level=="ERROR") | .message'

# Performance-Logs analysieren
docker-compose logs api-gateway | grep -i "took\|duration\|time"

Dashboard-Setup

Infrastructure-Dashboard

{
  "dashboard": {
    "title": "Meldestelle Infrastructure",
    "panels": [
      {
        "title": "CPU Usage",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m]) * 100"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "targets": [
          {
            "expr": "container_memory_usage_bytes / container_spec_memory_limit_bytes * 100"
          }
        ]
      }
    ]
  }
}

Application-Dashboard

{
  "dashboard": {
    "title": "Meldestelle Services",
    "panels": [
      {
        "title": "HTTP Requests/sec",
        "targets": [
          {
            "expr": "rate(http_requests_total[1m])"
          }
        ]
      },
      {
        "title": "Response Time",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}

Alerting-Regeln

# prometheus/alerts.yaml
groups:
  - name: meldestelle.rules
    rules:
    - alert: ServiceDown
      expr: up == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Service {{ $labels.instance }} is down"

    - alert: HighMemoryUsage
      expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on {{ $labels.instance }}"

    - alert: DatabaseConnectionsFull
      expr: hikaricp_connections_active >= hikaricp_connections_max * 0.8
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Database connection pool nearly exhausted"

Monitoring-Wartung

# Prometheus-Konfiguration neu laden
curl -X POST http://localhost:9090/-/reload

# Grafana-Dashboards exportieren
curl -s -H "Authorization: Bearer <token>" \
  http://localhost:3000/api/dashboards/uid/<dashboard-uid> > dashboard_backup.json

# Monitoring-Data bereinigen
docker-compose exec prometheus rm -rf /prometheus/data
docker-compose restart prometheus

# Log-Rotation für Monitoring-Services
docker-compose exec grafana find /var/log -name "*.log" -exec truncate -s 0 {} \;

Performance-Tuning

# prometheus.yaml - Optimierte Konfiguration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/alerts.yaml"

scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api-gateway:8081', 'ping-service:8082']
    scrape_interval: 10s

  - job_name: 'infrastructure'
    static_configs:
      - targets: ['postgres:5432', 'redis:6379']
    scrape_interval: 30s

Navigation: