mirror of
https://github.com/azaion/detections-semantic.git
synced 2026-04-22 22:16:37 +00:00
# Observability
## Logging
### Detection Log
| Field | Type | Description |
|-------|------|-------------|
| ts | ISO 8601 | Detection timestamp |
| frame_id | uint64 | Source frame |
| gps_denied_lat | float64 | GPS-denied latitude |
| gps_denied_lon | float64 | GPS-denied longitude |
| tier | uint8 | Tier that produced detection |
| class | string | Detection class label |
| confidence | float32 | Detection confidence |
| bbox | float32[4] | centerX, centerY, width, height (normalized) |
| freshness | string | Freshness tag (footpaths only) |
| tier2_result | string | Tier 2 classification |
| tier2_confidence | float32 | Tier 2 confidence |
| tier3_used | bool | Whether VLM was invoked |
| thumbnail_path | string | Path to ROI thumbnail |
- **Format**: JSON-lines, append-only
- **Location**: `/data/output/detections.jsonl`
- **Rotation**: None (a filesystem-level circular buffer is used for L1 frames)
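
A minimal sketch of how one detection record could be serialized and appended, using the field names from the table above. The names `DetectionRecord`, `to_json_line`, and `append_detection` are hypothetical, and writing `null` for fields that do not apply (for example `freshness` on non-footpath detections) is an assumption, not something the source specifies.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DetectionRecord:
    """One detection log entry; field names follow the Detection Log table."""
    ts: str                      # ISO 8601 timestamp
    frame_id: int
    gps_denied_lat: float
    gps_denied_lon: float
    tier: int
    class_: str                  # serialized as "class" (a reserved word in Python)
    confidence: float
    bbox: List[float]            # [centerX, centerY, width, height], normalized
    tier3_used: bool
    thumbnail_path: str
    # Assumption: fields that do not apply are written as null.
    freshness: Optional[str] = None
    tier2_result: Optional[str] = None
    tier2_confidence: Optional[float] = None


def to_json_line(rec: DetectionRecord) -> str:
    """Serialize a record to a single JSON line (no trailing newline)."""
    d = asdict(rec)
    d["class"] = d.pop("class_")
    return json.dumps(d, separators=(",", ":"))


def append_detection(rec: DetectionRecord,
                     path: str = "/data/output/detections.jsonl") -> None:
    """Append one record; append mode never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(rec) + "\n")
```

Opening the file in append mode for every record keeps the log append-only and leaves earlier lines untouched if the process dies mid-write.
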
### Gimbal Command Log
- **Format**: Text, one line per command (timestamp, command type, target angles, CRC status, retry count)
- **Location**: `/data/output/gimbal.log`
### System Health Log
- **Format**: JSON-lines, one entry per second
- **Fields**: `timestamp`, `t_junction`, `power_watts`, `gpu_mem_mb`, `cpu_mem_mb`, `degradation_level`, `gimbal_alive`, `semantic_alive`, `vlm_alive`, `nvme_free_pct`
- **Location**: `/data/output/health.jsonl`
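
A rough sketch of a once-per-second health logger, assuming a caller-supplied `read_sample` function (hypothetical) that returns the documented fields as a dict. The function names and the use of a UTC ISO 8601 string for `timestamp` are assumptions.

```python
import json
import time
from datetime import datetime, timezone
from typing import IO, Callable, Dict, Optional

HEALTH_FIELDS = ("t_junction", "power_watts", "gpu_mem_mb", "cpu_mem_mb",
                 "degradation_level", "gimbal_alive", "semantic_alive",
                 "vlm_alive", "nvme_free_pct")


def health_entry(sample: Dict[str, object], now: Optional[datetime] = None) -> str:
    """Build one JSON line: a timestamp followed by the documented fields."""
    missing = [k for k in HEALTH_FIELDS if k not in sample]
    if missing:
        raise KeyError(f"missing health fields: {missing}")
    now = now or datetime.now(timezone.utc)
    entry = {"timestamp": now.isoformat()}  # assumption: UTC, ISO 8601
    entry.update({k: sample[k] for k in HEALTH_FIELDS})
    return json.dumps(entry)


def run_health_logger(read_sample: Callable[[], Dict[str, object]], out: IO[str],
                      ticks: int, period_s: float = 1.0) -> None:
    """Write `ticks` entries, one per period, scheduled against a monotonic clock."""
    next_tick = time.monotonic()
    for _ in range(ticks):
        out.write(health_entry(read_sample()) + "\n")
        out.flush()
        next_tick += period_s
        # Sleep until the next scheduled tick so write time does not accumulate as drift.
        time.sleep(max(0.0, next_tick - time.monotonic()))
```

Scheduling against `time.monotonic()` rather than sleeping a fixed second keeps the entry rate close to one per second even when sampling or disk writes take a noticeable fraction of the period.
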
### Application Error Log
- **Format**: Text with severity levels (ERROR, WARN, INFO)
- **Location**: `/data/output/app.log`
- **Content**: Exceptions, timeouts, CRC failures, frame skips, VLM errors
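
One way such a log could be produced with Python's standard `logging` module is sketched here. The logger name `app`, the helper `make_app_logger`, and the line layout (timestamp, level, message) are assumptions; the source only fixes the severity names and the file location.

```python
import logging


def make_app_logger(path: str = "/data/output/app.log") -> logging.Logger:
    """Return a logger that writes INFO and above as plain text lines."""
    # Python labels this level "WARNING"; relabel it to match the documented "WARN".
    # Note that addLevelName changes the label process-wide.
    logging.addLevelName(logging.WARNING, "WARN")
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `make_app_logger` more than once would attach a second handler and duplicate every line, so it is meant to be called once at startup.
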
## Metrics (In-Memory)
The system is air-gapped, so there is no external metrics service. Metrics are computed in memory and exposed through the health API endpoint:

| Metric | Type | Description |
|--------|------|-------------|
| frames_processed_total | Counter | Total frames through Tier 1 |
| frames_skipped_quality | Counter | Frames rejected by quality gate |
| detections_total | Counter | Total detections produced (all tiers) |
| tier1_latency_ms | Histogram | Tier 1 inference time |
| tier2_latency_ms | Histogram | Tier 2 processing time |
| tier3_latency_ms | Histogram | Tier 3 VLM time |
| poi_queue_depth | Gauge | Current POI queue size |
| degradation_level | Gauge | Current degradation level |
| t_junction_celsius | Gauge | Current junction temperature |
| power_draw_watts | Gauge | Current power draw |
| gpu_memory_used_mb | Gauge | Current GPU memory |
| gimbal_crc_failures | Counter | Total CRC failures on UART |
| vlm_crashes | Counter | VLM process crash count |

**Exposed via**: `GET /api/v1/health` (JSON response with all metrics)
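
A minimal sketch of an in-memory registry covering the three metric types in the table. The class name `Metrics`, the fixed bucket bounds, and the JSON layout returned by `snapshot` are all assumptions; the source only states that the endpoint returns all metrics as JSON.

```python
import threading
from bisect import bisect_left
from typing import Dict, List, Sequence


class Metrics:
    """Thread-safe in-memory counters, gauges, and fixed-bucket histograms."""

    def __init__(self, histogram_buckets_ms: Sequence[float]):
        self._lock = threading.Lock()
        self._counters: Dict[str, int] = {}
        self._gauges: Dict[str, float] = {}
        self._bounds: List[float] = sorted(histogram_buckets_ms)
        self._hists: Dict[str, dict] = {}

    def inc(self, name: str, n: int = 1) -> None:
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + n

    def set_gauge(self, name: str, value: float) -> None:
        with self._lock:
            self._gauges[name] = value

    def observe(self, name: str, value_ms: float) -> None:
        with self._lock:
            h = self._hists.setdefault(
                name,
                {"counts": [0] * (len(self._bounds) + 1), "sum": 0.0, "count": 0})
            # bisect_left places a value equal to a bound in that bound's bucket;
            # values above the last bound land in the final overflow bucket.
            h["counts"][bisect_left(self._bounds, value_ms)] += 1
            h["sum"] += value_ms
            h["count"] += 1

    def snapshot(self) -> dict:
        """Return a JSON-serializable copy of all metrics (layout is an assumption)."""
        with self._lock:
            return {
                "counters": dict(self._counters),
                "gauges": dict(self._gauges),
                "histograms": {
                    k: {"bounds_ms": list(self._bounds), "counts": list(v["counts"]),
                        "sum": v["sum"], "count": v["count"]}
                    for k, v in self._hists.items()
                },
            }
```

A handler for the health endpoint could return `json.dumps(metrics.snapshot())`. Fixed buckets keep histogram memory constant regardless of flight duration, which matters when nothing is scraped or flushed externally.
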
## Alerting
There is no external alerting system. Alerting is limited to:

1. Degradation level changes → logged to the health log and the detection log
2. Critical events (VLM crash, gimbal loss, thermal critical) → logged with severity ERROR
3. Operator display shows the current degradation level as a status indicator
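
The first item implies edge-triggered behavior: something is emitted when the level changes, not on every sample. A small sketch of that idea follows; `DegradationWatcher`, the integer levels, the initial level of 0, and the sink callbacks are hypothetical.

```python
from typing import Callable, List

# A sink receives (previous_level, new_level); for example, one sink per log.
ChangeSink = Callable[[int, int], None]


class DegradationWatcher:
    """Calls every sink once per level transition, not once per sample."""

    def __init__(self, sinks: List[ChangeSink], initial_level: int = 0):
        self._sinks = sinks
        self._level = initial_level

    def update(self, level: int) -> bool:
        """Record the latest level; return True if it differed from the previous one."""
        if level == self._level:
            return False
        previous, self._level = self._level, level
        for sink in self._sinks:
            sink(previous, level)
        return True
```

Feeding `update` from the once-per-second health sample would produce one notification per transition while the health log itself continues to record the level every second.
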
## Post-Flight Analysis
After landing, NVMe data is extracted via USB for offline analysis:
- `detections.jsonl` → import into annotation tool for TP/FP labeling
- `frames/` → source material for training dataset expansion
- `health.jsonl` → thermal/power profile for hardware optimization
- `gimbal.log` → PID tuning analysis
- `app.log` → debugging and issue diagnosis
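
As a starting point for offline analysis, the following sketch tallies detections per class and per tier from an extracted `detections.jsonl`. The function name `summarize_detections` is hypothetical, and skipping unparseable lines is an assumption made to tolerate a final line truncated by power loss.

```python
import json
from collections import Counter
from typing import Tuple


def summarize_detections(path: str) -> Tuple[Counter, Counter, int]:
    """Count detections by class and by tier; return (by_class, by_tier, skipped)."""
    by_class: Counter = Counter()
    by_tier: Counter = Counter()
    skipped = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                # Assumption: a partial line (e.g. cut off at shutdown) is dropped, not fatal.
                skipped += 1
                continue
            by_class[rec["class"]] += 1
            by_tier[rec["tier"]] += 1
    return by_class, by_tier, skipped
```

Reading line by line keeps memory use flat for long flights, and a nonzero `skipped` count is a quick signal that the log was cut off mid-write.
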