Observability
Logging
Detection Log
| Field | Type | Description |
|---|---|---|
| ts | ISO 8601 | Detection timestamp |
| frame_id | uint64 | Source frame |
| gps_denied_lat | float64 | GPS-denied latitude |
| gps_denied_lon | float64 | GPS-denied longitude |
| tier | uint8 | Tier that produced detection |
| class | string | Detection class label |
| confidence | float32 | Detection confidence |
| bbox | float32[4] | centerX, centerY, width, height (normalized) |
| freshness | string | Freshness tag (footpaths only) |
| tier2_result | string | Tier 2 classification |
| tier2_confidence | float32 | Tier 2 confidence |
| tier3_used | bool | Whether VLM was invoked |
| thumbnail_path | string | Path to ROI thumbnail |
Format: JSON-lines, append-only
Location: /data/output/detections.jsonl
Rotation: None (circular buffer at filesystem level for L1 frames)
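As an illustration of the append-only JSON-lines format, the following minimal sketch builds one record with the field names from the table and appends it to a file. The helper names `make_detection` and `append_detection`, and the use of `None` for fields that do not apply, are assumptions made for this example and are not taken from the actual implementation.

```python
import json
from datetime import datetime, timezone


def make_detection(frame_id, lat, lon, tier, label, confidence, bbox):
    """Build a detection record using the field names from the table above."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [centerX, centerY, width, height]")
    return {
        "ts": datetime.now(timezone.utc).isoformat(),  # ISO 8601 timestamp
        "frame_id": frame_id,
        "gps_denied_lat": lat,
        "gps_denied_lon": lon,
        "tier": tier,
        "class": label,
        "confidence": confidence,
        "bbox": list(bbox),
        # Assumption: fields that do not apply are written as null/false.
        "freshness": None,
        "tier2_result": None,
        "tier2_confidence": None,
        "tier3_used": False,
        "thumbnail_path": None,
    }


def append_detection(path, record):
    """Append one record as a single JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Opening the file in append mode for every record keeps each write independent, so an interrupted flight leaves at most one incomplete trailing line.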
Gimbal Command Log
Format: Text, one line per command (timestamp, command type, target angles, CRC status, retry count)
Location: /data/output/gimbal.log
System Health Log
Format: JSON-lines, 1 entry per second
Fields: timestamp, t_junction, power_watts, gpu_mem_mb, cpu_mem_mb, degradation_level, gimbal_alive, semantic_alive, vlm_alive, nvme_free_pct
Location: /data/output/health.jsonl
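A health entry could be serialized along the following lines. This is a sketch only: the function name `health_line` and the strict check for missing or unexpected fields are assumptions, and the sensor values are expected to be supplied by the caller once per second.

```python
import json
from datetime import datetime, timezone

# Field names as listed for the system health log.
HEALTH_FIELDS = (
    "timestamp", "t_junction", "power_watts", "gpu_mem_mb", "cpu_mem_mb",
    "degradation_level", "gimbal_alive", "semantic_alive", "vlm_alive",
    "nvme_free_pct",
)


def health_line(sample):
    """Serialize one health sample as a JSON line body (without the newline).

    The timestamp defaults to the current UTC time unless the sample provides one.
    """
    entry = {"timestamp": datetime.now(timezone.utc).isoformat(), **sample}
    missing = set(HEALTH_FIELDS) - set(entry)
    extra = set(entry) - set(HEALTH_FIELDS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    # Emit keys in a fixed order so lines are easy to compare offline.
    return json.dumps({name: entry[name] for name in HEALTH_FIELDS})
```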
Application Error Log
Format: Text with severity levels (ERROR, WARN, INFO)
Location: /data/output/app.log
Content: Exceptions, timeouts, CRC failures, frame skips, VLM errors
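One way to produce a text log with these severity labels is Python's standard `logging` module. In this sketch the logger name `app`, the helper `make_app_logger`, and the line layout (timestamp, level, message) are assumptions; the source only specifies the severity levels and the file location.

```python
import logging


def make_app_logger(path):
    """Return a file logger that writes ERROR, WARN, and INFO lines to path."""
    # The standard library labels this level "WARNING"; rename it to "WARN".
    logging.addLevelName(logging.WARNING, "WARN")
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)  # DEBUG messages are dropped
    logger.propagate = False
    # Remove handlers left over from an earlier call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A call such as `make_app_logger("/data/output/app.log").error("VLM timeout")` would then append one timestamped ERROR line.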
Metrics (In-Memory)
No external metrics service is available because the system is air-gapped. Metrics are computed in memory and exposed via the health API endpoint:
| Metric | Type | Description |
|---|---|---|
| frames_processed_total | Counter | Total frames through Tier 1 |
| frames_skipped_quality | Counter | Frames rejected by quality gate |
| detections_total | Counter | Total detections produced (all tiers) |
| tier1_latency_ms | Histogram | Tier 1 inference time |
| tier2_latency_ms | Histogram | Tier 2 processing time |
| tier3_latency_ms | Histogram | Tier 3 VLM time |
| poi_queue_depth | Gauge | Current POI queue size |
| degradation_level | Gauge | Current degradation level |
| t_junction_celsius | Gauge | Current junction temperature |
| power_draw_watts | Gauge | Current power draw |
| gpu_memory_used_mb | Gauge | Current GPU memory |
| gimbal_crc_failures | Counter | Total CRC failures on UART |
| vlm_crashes | Counter | VLM process crash count |
Exposed via: GET /api/v1/health (JSON response with all metrics)
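The in-memory registry behind such an endpoint might look like the sketch below. The class name `Metrics`, its method names, the bucket boundaries, and the shape of the snapshot are all assumptions for illustration; the source states only that the endpoint returns every metric as JSON.

```python
import json
import threading


class Metrics:
    """Thread-safe in-memory counters, gauges, and latency histograms."""

    def __init__(self, buckets_ms=(5, 10, 25, 50, 100, 250, 500, 1000)):
        self._lock = threading.Lock()
        self._buckets = tuple(buckets_ms)  # upper bounds in ms (assumed values)
        self._counters = {}
        self._gauges = {}
        self._histograms = {}

    def inc(self, name, amount=1):
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + amount

    def set_gauge(self, name, value):
        with self._lock:
            self._gauges[name] = value

    def observe(self, name, value_ms):
        with self._lock:
            hist = self._histograms.setdefault(
                name,
                {"count": 0, "sum": 0.0, "buckets": [0] * (len(self._buckets) + 1)},
            )
            hist["count"] += 1
            hist["sum"] += value_ms
            # First bucket whose upper bound covers the value; the last slot
            # collects everything above the largest bound (non-cumulative).
            index = next(
                (i for i, bound in enumerate(self._buckets) if value_ms <= bound),
                len(self._buckets),
            )
            hist["buckets"][index] += 1

    def snapshot(self):
        """Return a JSON-serializable copy of all metrics."""
        with self._lock:
            snap = dict(self._counters)
            snap.update(self._gauges)
            for name, hist in self._histograms.items():
                snap[name] = {**hist, "buckets": list(hist["buckets"])}
            return snap
```

A handler for GET /api/v1/health could then return `json.dumps(metrics.snapshot())` as the response body.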
Alerting
There is no external alerting system. Alerts are surfaced as follows:
- Degradation level changes → logged to health log + detection log
- Critical events (VLM crash, gimbal loss, thermal critical) → logged with severity ERROR
- Operator display shows current degradation level as status indicator
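The degradation-level rule above can be sketched as a small change detector that writes the same event to every configured log. The class name `DegradationAlerter`, the event name, and the event field names are hypothetical; the sinks stand in for the health log and the detection log.

```python
import json
from datetime import datetime, timezone


class DegradationAlerter:
    """Emit one event per degradation-level change to all sinks."""

    def __init__(self, sinks, initial_level=0):
        self._sinks = list(sinks)  # file-like objects with a write() method
        self._level = initial_level

    def update(self, new_level):
        """Record the new level; return True only if it differs from the last one."""
        if new_level == self._level:
            return False
        event = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": "degradation_level_change",  # hypothetical event name
            "from": self._level,
            "to": new_level,
        }
        line = json.dumps(event) + "\n"
        for sink in self._sinks:
            sink.write(line)
        self._level = new_level
        return True
```

Writing only on transitions keeps the logs free of repeated entries while the level is stable.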
Post-Flight Analysis
After landing, NVMe data is extracted via USB for offline analysis:
- detections.jsonl → import into annotation tool for TP/FP labeling
- frames/ → source material for training dataset expansion
- health.jsonl → thermal/power profile for hardware optimization
- gimbal.log → PID tuning analysis
- app.log → debugging and issue diagnosis
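For offline analysis, a first pass over detections.jsonl might look like the following sketch. The helper names `load_jsonl` and `summarize` are assumptions, as is the choice to skip unparseable lines (for example, a final line truncated by power loss) instead of failing.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read a JSON-lines file; return (records, number_of_unparseable_lines)."""
    records, skipped = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                skipped += 1  # e.g. a partially written last line
    return records, skipped


def summarize(records):
    """Count detections per class and per tier, and how often the VLM ran."""
    return {
        "by_class": Counter(r["class"] for r in records),
        "by_tier": Counter(r["tier"] for r in records),
        "tier3_used": sum(1 for r in records if r.get("tier3_used")),
    }
```

The per-class and per-tier counts give a quick sanity check before records are imported into an annotation tool.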