mirror of
https://github.com/azaion/admin.git
synced 2026-04-22 22:06:33 +00:00
Step 5: Observability
Role: Site Reliability Engineer (SRE)
Goal: Define the logging, metrics, tracing, and alerting strategy.
Constraints: This is a strategy document. Describe what to implement, not how to wire it.
Steps
- Read `architecture.md` and component specs for service boundaries
- Research observability best practices for the tech stack
Logging
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console only (not retained), staging = 7 days, production = 30 days
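
To make the target log shape concrete, here is a minimal sketch using only Python's standard `logging` module. It is an illustration of the fields and levels listed above, not a prescribed implementation: `JsonFormatter`, `build_logger`, and the idea of passing `correlation_id` and `context` through `extra=` are assumptions made for this example.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the required fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service  # hypothetical service name supplied by the caller

    def format(self, record: logging.LogRecord) -> str:
        # Python names this level WARNING; the strategy calls it WARN.
        level = "WARN" if record.levelname == "WARNING" else record.levelname
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": level,
            "service": self.service,
            # Assumption: callers pass these two via logging's `extra=` argument.
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(entry)


def build_logger(service: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes structured JSON to the given stream."""
    logger = logging.getLogger(service)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # DEBUG stays off outside dev
    logger.propagate = False
    return logger
```

A call such as `build_logger("example-service").info("order placed", extra={"correlation_id": "abc-123", "context": {"plan": "basic"}})` would emit one JSON line to stdout. Keeping PII out of `message` and `context` remains the caller's responsibility; this sketch does not filter it.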
Metrics
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
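
As a rough picture of what a scrape of `/metrics` might return, the sketch below renders the application metrics named above in the Prometheus text exposition format using only the standard library. The `Histogram` class, `render_metrics` function, and bucket bounds are hypothetical. A real service would more likely use an existing Prometheus client library.

```python
from bisect import bisect_left


class Histogram:
    """Cumulative-bucket histogram, the shape Prometheus expects for durations."""

    def __init__(self, name: str, bounds: list[float]) -> None:
        self.name = name
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value; overflow lands in +Inf.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self) -> list[str]:
        lines = [f"# TYPE {self.name} histogram"]
        running = 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count  # buckets are cumulative
            le = "+Inf" if bound == float("inf") else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return lines


def render_metrics(counters: dict[str, int], gauges: dict[str, float],
                   histograms: list[Histogram]) -> str:
    """Build the text body a /metrics endpoint would serve."""
    lines: list[str] = []
    for name, value in counters.items():
        lines += [f"# TYPE {name} counter", f"{name} {value}"]
    for name, value in gauges.items():
        lines += [f"# TYPE {name} gauge", f"{name} {value}"]
    for histogram in histograms:
        lines += histogram.render()
    return "\n".join(lines) + "\n"
```

Here `request_count` and `error_count` are treated as counters and `active_connections` as a gauge. That typing is an assumption, since the list above only marks `request_duration` as a histogram.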
Distributed Tracing
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
Alerting
| Severity | Response Time | Condition Examples |
|---|---|---|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
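
One way to read the table is as an ordered set of rules, checked from most to least severe. The sketch below encodes the example conditions that have explicit thresholds. `ServiceSnapshot`, `classify`, and the field names are hypothetical, and the "elevated latency" condition is left out because the table gives no threshold for it.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Response-time targets from the table. "Next business day" depends on a
# calendar, so it is represented as None here.
RESPONSE_TIME = {
    "critical": timedelta(minutes=5),
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=4),
    "low": None,
}


@dataclass
class ServiceSnapshot:
    """Hypothetical point-in-time view of one service."""
    healthy: bool            # result of the health check
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float
    baseline_p95_ms: float
    disk_used: float         # fraction of disk in use, 0.0 to 1.0
    warnings: int = 0        # count of non-critical warnings


def classify(snapshot: ServiceSnapshot) -> Optional[str]:
    """Return the highest matching severity, or None when nothing should fire."""
    if not snapshot.healthy:
        return "critical"
    if snapshot.error_rate > 0.05 or snapshot.p95_latency_ms > 2 * snapshot.baseline_p95_ms:
        return "high"
    if snapshot.disk_used > 0.80:
        return "medium"
    if snapshot.warnings > 0:
        return "low"
    return None
```

Checking the most severe rule first means a service that is both down and low on disk pages once at Critical rather than raising two alerts.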
Dashboards
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
Self-verification
- Structured logging format defined with required fields
- Metrics endpoint specified per service
- OpenTelemetry tracing configured
- Alert severities with response times defined
- Dashboards cover operations and business metrics
- PII exclusion from logs addressed
Save action
Write `observability.md` using `templates/observability.md`.