Files
Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
Made-with: Cursor
2026-03-26 00:20:30 +02:00

3.2 KiB

Observability

Logging

Detection Log

Field Type Description
ts ISO 8601 Detection timestamp
frame_id uint64 Source frame
gps_denied_lat float64 GPS-denied latitude
gps_denied_lon float64 GPS-denied longitude
tier uint8 Tier that produced detection
class string Detection class label
confidence float32 Detection confidence
bbox float32[4] centerX, centerY, width, height (normalized)
freshness string Freshness tag (footpaths only)
tier2_result string Tier 2 classification
tier2_confidence float32 Tier 2 confidence
tier3_used bool Whether VLM was invoked
thumbnail_path string Path to ROI thumbnail

Format: JSON-lines, append-only Location: /data/output/detections.jsonl Rotation: None (circular buffer at filesystem level for L1 frames)

Gimbal Command Log

Format: Text, one line per command (timestamp, command type, target angles, CRC status, retry count) Location: /data/output/gimbal.log

System Health Log

Format: JSON-lines, 1 entry per second Fields: timestamp, t_junction, power_watts, gpu_mem_mb, cpu_mem_mb, degradation_level, gimbal_alive, semantic_alive, vlm_alive, nvme_free_pct Location: /data/output/health.jsonl

Application Error Log

Format: Text with severity levels (ERROR, WARN, INFO) Location: /data/output/app.log Content: Exceptions, timeouts, CRC failures, frame skips, VLM errors

Metrics (In-Memory)

No external metrics service (air-gapped). Metrics are computed in-memory and exposed via health API endpoint:

Metric Type Description
frames_processed_total Counter Total frames through Tier 1
frames_skipped_quality Counter Frames rejected by quality gate
detections_total Counter Total detections produced (all tiers)
tier1_latency_ms Histogram Tier 1 inference time
tier2_latency_ms Histogram Tier 2 processing time
tier3_latency_ms Histogram Tier 3 VLM time
poi_queue_depth Gauge Current POI queue size
degradation_level Gauge Current degradation level
t_junction_celsius Gauge Current junction temperature
power_draw_watts Gauge Current power draw
gpu_memory_used_mb Gauge Current GPU memory
gimbal_crc_failures Counter Total CRC failures on UART
vlm_crashes Counter VLM process crash count

Exposed via: GET /api/v1/health (JSON response with all metrics)

Alerting

No external alerting system. Alerts are:

  1. Degradation level changes → logged to health log + detection log
  2. Critical events (VLM crash, gimbal loss, thermal critical) → logged with severity ERROR
  3. Operator display shows current degradation level as status indicator

Post-Flight Analysis

After landing, NVMe data is extracted via USB for offline analysis:

  • detections.jsonl → import into annotation tool for TP/FP labeling
  • frames/ → source material for training dataset expansion
  • health.jsonl → thermal/power profile for hardware optimization
  • gimbal.log → PID tuning analysis
  • app.log → debugging and issue diagnosis