(vision) SCS/DDD

Service Discovery einführen Consul als Service-Registry implementieren Services für automatische Registrierung konfigurieren Dynamisches Service-Routing im API-Gateway einrichten Health-Checks für jeden Service implementieren
2025-07-22 00:03:51 +02:00
parent 1ecac43d72
commit 8229e8e571
9 changed files with 212 additions and 54 deletions
@@ -0,0 +1,199 @@
+# Meldestelle Monitoring System
+
+This document describes the monitoring system set up for the Meldestelle application. The monitoring system includes metrics collection, visualization, centralized logging, and alerting.
+
+## Components
+
+The monitoring system consists of the following components:
+
+1. **Prometheus** - For metrics collection and storage
+2. **Grafana** - For metrics visualization and dashboards
+3. **ELK Stack** - For centralized logging (Elasticsearch, Logstash, Kibana)
+4. **Alertmanager** - For alert management and notifications
+
+## Architecture
+
+The monitoring system is deployed as Docker containers alongside the Meldestelle application. The components interact as follows:
+
+- The Meldestelle application exposes metrics at the `/metrics` endpoint
+- Prometheus scrapes metrics from the application and stores them
+- Grafana visualizes the metrics from Prometheus
+- The application sends logs to Logstash
+- Logstash processes the logs and sends them to Elasticsearch
+- Kibana visualizes the logs from Elasticsearch
+- Prometheus evaluates alerting rules and sends alerts to Alertmanager
+- Alertmanager manages alerts and sends notifications via configured channels (email, Slack, etc.)
+
+## Setup
+
+The monitoring system is configured in the `docker-compose.yml` file and the configuration files in the `config/monitoring` directory.
+
+### Prerequisites
+
+- Docker and Docker Compose
+- The Meldestelle application running with metrics enabled
+
+### Starting the Monitoring System
+
+To start the monitoring system, run:
+
+```bash
+docker-compose up -d prometheus grafana alertmanager
+```
+
+To start the ELK Stack, run:
+
+```bash
+docker-compose up -d elasticsearch logstash kibana
+```
+
+### Testing the Monitoring System
+
+A test script is provided to verify that the monitoring system is working correctly:
+
+```bash
+./test-monitoring.sh
+```
+
+## Accessing the Monitoring Tools
+
+- **Prometheus**: http://localhost:9090
+- **Grafana**: http://localhost:3000 (default credentials: admin/admin)
+- **Alertmanager**: http://localhost:9093
+- **Kibana**: http://localhost:5601
+
+## Metrics
+
+The following metrics are collected by Prometheus:
+
+### JVM Metrics
+
+- Memory usage (heap and non-heap)
+- Garbage collection statistics
+- Thread counts
+- Class loading statistics
+- CPU usage
+
+### Application Metrics
+
+- HTTP request counts
+- HTTP request durations
+- Error rates
+- Custom business metrics
+
+## Dashboards
+
+Grafana dashboards are provided for visualizing the metrics:
+
+- **JVM Dashboard**: Shows JVM metrics such as memory usage, garbage collection, and thread counts
+- **Application Dashboard**: Shows application metrics such as request rates, error rates, and response times
+
+## Alerting
+
+Alerting is configured in Prometheus and Alertmanager. The following alerts are defined:
+
+- **High Memory Usage**: Triggered when JVM heap memory usage exceeds 85% for 5 minutes
+- **High CPU Usage**: Triggered when CPU usage exceeds 85% for 5 minutes
+- **High Error Rate**: Triggered when the error rate exceeds 5% for 2 minutes
+- **Service Unavailable**: Triggered when the service is down for 1 minute
+- **Slow Response Time**: Triggered when the average response time exceeds 1 second for 5 minutes
+- **High GC Pause Time**: Triggered when the average GC pause time exceeds 0.5 seconds for 5 minutes
+
+Alerts are sent to the configured notification channels (email and Slack).
+
+## Logging
+
+Logs are collected by Logstash, stored in Elasticsearch, and visualized in Kibana. The following log sources are configured:
+
+- Application logs via TCP (JSON format)
+- File logs from the `/var/log/meldestelle` directory
+
+## Configuration Files
+
+- **Prometheus**: `config/monitoring/prometheus.yml`
+- **Alertmanager**: `config/monitoring/alertmanager/alertmanager.yml`
+- **Alerting Rules**: `config/monitoring/prometheus/rules/alerts.yml`
+- **Grafana Dashboards**: `config/monitoring/grafana/dashboards/`
+- **Grafana Datasources**: `config/monitoring/grafana/provisioning/datasources/`
+- **Logstash**: `config/monitoring/elk/logstash.conf`
+- **Elasticsearch**: `config/monitoring/elk/elasticsearch.yml`
+
+## Troubleshooting
+
+### Prometheus
+
+- Check if Prometheus is running: `docker-compose ps prometheus`
+- Check Prometheus logs: `docker-compose logs prometheus`
+- Verify that Prometheus can scrape metrics: http://localhost:9090/targets
+- Check if alerting rules are loaded: http://localhost:9090/rules
+
+### Grafana
+
+- Check if Grafana is running: `docker-compose ps grafana`
+- Check Grafana logs: `docker-compose logs grafana`
+- Verify that Grafana can connect to Prometheus: http://localhost:3000/datasources
+
+### Alertmanager
+
+- Check if Alertmanager is running: `docker-compose ps alertmanager`
+- Check Alertmanager logs: `docker-compose logs alertmanager`
+- Verify that Alertmanager is receiving alerts: http://localhost:9093/#/alerts
+
+### ELK Stack
+
+- Check if Elasticsearch is running: `docker-compose ps elasticsearch`
+- Check Elasticsearch logs: `docker-compose logs elasticsearch`
+- Check if Logstash is running: `docker-compose ps logstash`
+- Check Logstash logs: `docker-compose logs logstash`
+- Check if Kibana is running: `docker-compose ps kibana`
+- Check Kibana logs: `docker-compose logs kibana`
+- Verify that Elasticsearch is receiving logs: http://localhost:9200/_cat/indices
+- Verify that Kibana can connect to Elasticsearch: http://localhost:5601/app/management/kibana/indexPatterns
+
+## Maintenance
+
+### Backup and Restore
+
+- Prometheus data is stored in the `prometheus_data` volume
+- Grafana data is stored in the `grafana_data` volume
+- Alertmanager data is stored in the `alertmanager_data` volume
+- Elasticsearch data is stored in the `elasticsearch_data` volume
+
+To backup these volumes, use Docker's volume backup functionality:
+
+```bash
+docker run --rm -v prometheus_data:/source -v $(pwd)/backup:/backup alpine tar -czf /backup/prometheus_data.tar.gz -C /source .
+```
+
+To restore from a backup:
+
+```bash
+docker run --rm -v prometheus_data:/target -v $(pwd)/backup:/backup alpine sh -c "rm -rf /target/* && tar -xzf /backup/prometheus_data.tar.gz -C /target"
+```
+
+### Updating
+
+To update the monitoring components, update the image tags in the `docker-compose.yml` file and run:
+
+```bash
+docker-compose pull prometheus grafana alertmanager
+docker-compose up -d prometheus grafana alertmanager
+```
+
+## Security Considerations
+
+- The monitoring system is configured for development and testing purposes
+- For production use, consider the following security measures:
+  - Enable authentication for Prometheus
+  - Use strong passwords for Grafana
+  - Configure TLS for all components
+  - Restrict access to the monitoring endpoints
+  - Use environment variables for sensitive configuration values
+  - Implement network segmentation to isolate the monitoring system
+
+## Further Reading
+
+- [Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)
+- [Grafana Documentation](https://grafana.com/docs/grafana/latest/)
+- [Alertmanager Documentation](https://prometheus.io/docs/alerting/latest/alertmanager/)
+- [ELK Stack Documentation](https://www.elastic.co/guide/index.html)
@@ -62,20 +62,6 @@ implementation("io.ktor:ktor-client-cio:${libs.versions.ktor.get()}")
 Create a service registration component in the shared-kernel module:

 ```kotlin
-package at.mocode.shared.discovery
-
-import at.mocode.shared.config.AppConfig
-import com.orbitz.consul.Consul
-import com.orbitz.consul.model.agent.ImmutableRegistration
-import com.orbitz.consul.model.agent.Registration
-import kotlinx.coroutines.CoroutineScope
-import kotlinx.coroutines.Dispatchers
-import kotlinx.coroutines.delay
-import kotlinx.coroutines.launch
-import java.net.InetAddress
-import java.util.*
-import kotlin.time.Duration.Companion.seconds
-
 class ServiceRegistration(
    private val serviceName: String,
    private val servicePort: Int,
@@ -222,17 +208,6 @@ implementation("io.ktor:ktor-serialization-kotlinx-json:${libs.versions.ktor.get
 Create a service discovery component in the API Gateway:

 ```kotlin
-package at.mocode.gateway.discovery
-
-import com.orbitz.consul.Consul
-import com.orbitz.consul.model.health.ServiceHealth
-import io.ktor.client.*
-import io.ktor.client.engine.cio.*
-import io.ktor.client.request.*
-import io.ktor.http.*
-import java.net.URI
-import java.util.concurrent.ConcurrentHashMap
-
 class ServiceDiscovery(
    private val consulHost: String = "consul",
    private val consulPort: Int = 8500
@@ -157,7 +157,7 @@ private fun getRolePermissions(roles: List<UserRole>): List<Permission> {
    roles.forEach { role ->
        when (role) {
            UserRole.ADMIN -> {
-                permissions.addAll(Permission.values())
+                permissions.addAll(Permission.entries.toTypedArray())
            }
            UserRole.VEREINS_ADMIN -> {
                permissions.addAll(listOf(
@@ -354,7 +354,7 @@ val PipelineContext<Unit, ApplicationCall>.userAuthContext: UserAuthContext?
    get() = call.principal<JWTPrincipal>()?.getUserAuthContext()

 /**
- * Application call extension to check if user has specific role.
+ * Application call extension to check if the user has a specific role.
 */
 fun ApplicationCall.hasRole(role: UserRole): Boolean {
    val authContext = principal<JWTPrincipal>()?.getUserAuthContext()
@@ -362,7 +362,7 @@ fun ApplicationCall.hasRole(role: UserRole): Boolean {
 }

 /**
- * Application call extension to check if user has specific permission.
+ * Application call extension to check if the user has specific permission.
 */
 fun ApplicationCall.hasPermission(permission: Permission): Boolean {
    val authContext = principal<JWTPrincipal>()?.getUserAuthContext()
@@ -95,23 +95,23 @@ class CachingConfig(
    }

    /**
-     * Put a value in cache with TTL in minutes
+     * Put a value in a cache with TTL in minutes
     */
    fun <T> put(cacheName: String, key: String, value: T, ttlMinutes: Long = defaultTtlMinutes) {
        val stats = cacheStats.computeIfAbsent(cacheName) { CacheStats() }
        stats.puts++

-        // Store in local cache
+        // Store in a local cache
        val expiresAt = System.currentTimeMillis() + TimeUnit.MINUTES.toMillis(ttlMinutes)
        val entry = CacheEntry(value as Any, expiresAt)
        getCacheMap(cacheName)[key] = entry
    }

    /**
-     * Remove a value from cache
+     * Remove a value from the cache
     */
    fun remove(cacheName: String, key: String) {
-        // Remove from local cache
+        // Remove from the local cache
        getCacheMap(cacheName).remove(key)
    }

@@ -136,7 +136,7 @@ class CachingConfig(
    }

    /**
-     * Get the appropriate cache map based on cache name
+     * Get the appropriate cache map based on the cache name
     */
    private fun getCacheMap(cacheName: String): ConcurrentHashMap<String, CacheEntry<Any>> {
        return when (cacheName) {
@@ -1,18 +1,13 @@
 package at.mocode.gateway.config

 import io.ktor.server.application.*
-import io.ktor.server.plugins.*
 import io.ktor.server.request.*
 import io.ktor.server.routing.*
 import io.ktor.util.*
 import io.micrometer.core.instrument.Counter
-import io.micrometer.core.instrument.MeterRegistry
 import io.micrometer.core.instrument.Timer
-import io.micrometer.core.instrument.binder.MeterBinder
 import io.micrometer.prometheus.PrometheusMeterRegistry
-import java.time.Duration
 import java.util.concurrent.ConcurrentHashMap
-import java.util.concurrent.TimeUnit

 /**
 * Custom application metrics configuration.
@@ -4,19 +4,10 @@ import at.mocode.dto.base.ApiResponse
 import at.mocode.shared.config.AppConfig
 import io.ktor.http.*
 import io.ktor.server.application.*
-import io.ktor.server.metrics.micrometer.*
 import io.ktor.server.plugins.calllogging.*
 import io.ktor.server.plugins.statuspages.*
 import io.ktor.server.request.*
 import io.ktor.server.response.*
-import io.ktor.server.routing.*
-import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics
-import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics
-import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics
-import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics
-import io.micrometer.core.instrument.binder.system.ProcessorMetrics
-import io.micrometer.prometheus.PrometheusConfig
-import io.micrometer.prometheus.PrometheusMeterRegistry
 import org.slf4j.event.Level
 import java.time.LocalDateTime
 import java.time.format.DateTimeFormatter
@@ -131,7 +131,7 @@ class ServiceDiscovery(
     * @return The complete URL
     */
    fun buildServiceUrl(instance: ServiceInstance, path: String): String {
-        val baseUrl = "http://${instance.host}:${instance.port}"
+        val baseUrl = "https://${instance.host}:${instance.port}"
        return URI(baseUrl).resolve(path).toString()
    }

@@ -143,7 +143,7 @@ class ServiceDiscovery(
     */
    suspend fun isServiceHealthy(serviceName: String): Boolean {
        try {
-            val response = httpClient.get("http://$consulHost:$consulPort/v1/health/service/$serviceName?passing=true")
+            val response = httpClient.get("https://$consulHost:$consulPort/v1/health/service/$serviceName?passing=true")
            val responseBody = response.bodyAsText()
            val healthyServices = Json.decodeFromString<List<Any>>(responseBody)
            return healthyServices.isNotEmpty()
@@ -1,6 +1,5 @@
 package at.mocode.gateway.plugins

-import at.mocode.gateway.config.CachingConfig
 import at.mocode.gateway.config.getCachingConfig
 import io.ktor.http.*
 import io.ktor.server.application.*
@@ -10,7 +9,6 @@ import io.ktor.util.pipeline.*
 import java.security.MessageDigest
 import java.text.SimpleDateFormat
 import java.util.*
-import kotlin.text.Charsets

 /**
 * Configures enhanced HTTP caching headers for the application.
@@ -190,7 +188,7 @@ suspend fun PipelineContext<Unit, ApplicationCall>.checkLastModifiedAndRespond(t
                call.respond(HttpStatusCode.NotModified)
                return true
            }
-        } catch (e: Exception) {
+        } catch (_: Exception) {
            // If we can't parse the date, ignore it
        }
    }
@@ -217,7 +215,7 @@ suspend fun <T> PipelineContext<Unit, ApplicationCall>.checkCacheAndRespond(
    val application = call.application
    val cachingConfig = try {
        application.getCachingConfig()
-    } catch (e: Exception) {
+    } catch (_: Exception) {
        return false
    }

@@ -25,7 +25,7 @@ dependencyResolutionManagement {
                includeGroupAndSubgroups("com.google")
            }
        }
-        // Add JCenter repository (archive)
+        // Add a JCenter repository (archive)
        maven {
            url = uri("https://jcenter.bintray.com")
        }