# Observability ## Logging ### Detection Log | Field | Type | Description | |-------|------|-------------| | ts | ISO 8601 | Detection timestamp | | frame_id | uint64 | Source frame | | gps_denied_lat | float64 | GPS-denied latitude | | gps_denied_lon | float64 | GPS-denied longitude | | tier | uint8 | Tier that produced detection | | class | string | Detection class label | | confidence | float32 | Detection confidence | | bbox | float32[4] | centerX, centerY, width, height (normalized) | | freshness | string | Freshness tag (footpaths only) | | tier2_result | string | Tier 2 classification | | tier2_confidence | float32 | Tier 2 confidence | | tier3_used | bool | Whether VLM was invoked | | thumbnail_path | string | Path to ROI thumbnail | **Format**: JSON-lines, append-only **Location**: `/data/output/detections.jsonl` **Rotation**: None (circular buffer at filesystem level for L1 frames) ### Gimbal Command Log **Format**: Text, one line per command (timestamp, command type, target angles, CRC status, retry count) **Location**: `/data/output/gimbal.log` ### System Health Log **Format**: JSON-lines, 1 entry per second **Fields**: timestamp, t_junction, power_watts, gpu_mem_mb, cpu_mem_mb, degradation_level, gimbal_alive, semantic_alive, vlm_alive, nvme_free_pct **Location**: `/data/output/health.jsonl` ### Application Error Log **Format**: Text with severity levels (ERROR, WARN, INFO) **Location**: `/data/output/app.log` **Content**: Exceptions, timeouts, CRC failures, frame skips, VLM errors ## Metrics (In-Memory) No external metrics service (air-gapped). Metrics are computed in-memory and exposed via health API endpoint: | Metric | Type | Description | |--------|------|-------------| | frames_processed_total | Counter | Total frames through Tier 1 | | frames_skipped_quality | Counter | Frames rejected by quality gate | | detections_total | Counter | Total detections produced (all tiers) | | tier1_latency_ms | Histogram | Tier 1 inference time | | tier2_latency_ms | Histogram | Tier 2 processing time | | tier3_latency_ms | Histogram | Tier 3 VLM time | | poi_queue_depth | Gauge | Current POI queue size | | degradation_level | Gauge | Current degradation level | | t_junction_celsius | Gauge | Current junction temperature | | power_draw_watts | Gauge | Current power draw | | gpu_memory_used_mb | Gauge | Current GPU memory | | gimbal_crc_failures | Counter | Total CRC failures on UART | | vlm_crashes | Counter | VLM process crash count | **Exposed via**: GET /api/v1/health (JSON response with all metrics) ## Alerting No external alerting system. Alerts are: 1. Degradation level changes → logged to health log + detection log 2. Critical events (VLM crash, gimbal loss, thermal critical) → logged with severity ERROR 3. Operator display shows current degradation level as status indicator ## Post-Flight Analysis After landing, NVMe data is extracted via USB for offline analysis: - `detections.jsonl` → import into annotation tool for TP/FP labeling - `frames/` → source material for training dataset expansion - `health.jsonl` → thermal/power profile for hardware optimization - `gimbal.log` → PID tuning analysis - `app.log` → debugging and issue diagnosis