Semantic Detection System — Data Model
Design Principle
This is a real-time streaming pipeline on an edge device, not a CRUD application. There is no database. Data falls into two categories:
- Runtime structs: in-memory only and never persisted. Most exist for one processing cycle and are then discarded; POI queue entries and GimbalState, described below, live longer.
- Persistent logs: append-only flat files on NVMe SSD
Runtime Structs (in-memory only)
These are C/Cython structs or Python dataclasses. They are created, consumed by the next pipeline stage, and garbage-collected. No persistence.
FrameContext
Wraps a camera frame with metadata for the current processing cycle.
| Field | Type | Description |
|---|---|---|
| frame_id | uint64 | Sequential counter |
| timestamp | float64 | Capture time (epoch seconds) |
| image | numpy array (H,W,3) | Raw frame pixels |
| scan_level | uint8 | 1 or 2 |
| quality_score | float32 | Laplacian variance (computed on capture) |
| pan | float32 | Gimbal pan at capture |
| tilt | float32 | Gimbal tilt at capture |
| zoom | float32 | Zoom level at capture |
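One possible rendering of the table above as a Python dataclass is sketched here. The mapping of uint64/float32 to plain int/float, the loose typing of image, and the scan_level check are assumptions made for illustration, not the project's actual definition.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FrameContext:
    frame_id: int         # uint64 sequential counter
    timestamp: float      # capture time, epoch seconds
    image: Any            # numpy array (H, W, 3); typed loosely to keep the sketch stdlib-only
    scan_level: int       # 1 or 2
    quality_score: float  # Laplacian variance, computed on capture
    pan: float            # gimbal pan at capture
    tilt: float           # gimbal tilt at capture
    zoom: float           # zoom level at capture

    def __post_init__(self) -> None:
        # Assumed validation: the table only allows scan levels 1 and 2.
        if self.scan_level not in (1, 2):
            raise ValueError(f"scan_level must be 1 or 2, got {self.scan_level}")
```

Freezing the dataclass fits the "created, consumed, discarded" lifecycle: a stage that needs different values builds a new instance instead of mutating one that another stage may still hold.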
YoloDetection (external input)
Received from existing YOLO pipeline. Consumed by semantic pipeline, not stored.
| Field | Type | Description |
|---|---|---|
| centerX | float32 | Normalized center X (0-1) |
| centerY | float32 | Normalized center Y (0-1) |
| width | float32 | Normalized width (0-1) |
| height | float32 | Normalized height (0-1) |
| classNum | int32 | Class index |
| label | string | Class label |
| confidence | float32 | 0-1 |
| mask | numpy array (H,W) | Segmentation mask (if seg model) |
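Because the box fields are normalized and center-based, a consumer that crops pixels has to convert them first. The helper below is a minimal sketch of that conversion; the function name and the clamping behavior are assumptions, not part of the YOLO pipeline's interface.

```python
def bbox_to_pixels(center_x, center_y, width, height, frame_w, frame_h):
    """Convert a normalized (centerX, centerY, width, height) box to
    integer pixel corners (x1, y1, x2, y2), clamped to the frame."""

    def clamp(value, upper):
        return max(0, min(int(round(value)), upper))

    x1 = (center_x - width / 2) * frame_w
    y1 = (center_y - height / 2) * frame_h
    x2 = (center_x + width / 2) * frame_w
    y2 = (center_y + height / 2) * frame_h
    return clamp(x1, frame_w), clamp(y1, frame_h), clamp(x2, frame_w), clamp(y2, frame_h)


# A box covering the central quarter of a 1920x1080 frame.
print(bbox_to_pixels(0.5, 0.5, 0.5, 0.5, 1920, 1080))  # (480, 270, 1440, 810)
```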
POI
In-memory queue entry. Created when Tier 1 detects a point of interest, removed after investigation or timeout. Max size configurable (default 10).
| Field | Type | Description |
|---|---|---|
| poi_id | uint64 | Counter |
| frame_id | uint64 | Frame that triggered this POI |
| trigger_class | string | Class that triggered (footpath_winter, branch_pile, etc.) |
| scenario_name | string | Which search scenario triggered this POI |
| investigation_type | string | "path_follow", "area_sweep", or "zoom_classify" |
| confidence | float32 | Trigger confidence |
| bbox | float32[4] | Bounding box in frame |
| priority | float32 | Computed: confidence × priority_boost × recency |
| status | enum | queued / investigating / done / timeout |
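The priority formula and the bounded queue can be sketched as follows. The document does not define recency or say what happens when the queue is full, so the half-life decay, the created_at field, and the "drop the lowest-priority entry" policy are all assumptions chosen for illustration; only a subset of the POI fields is shown.

```python
from dataclasses import dataclass


@dataclass
class POI:
    poi_id: int
    frame_id: int
    trigger_class: str
    confidence: float
    created_at: float            # hypothetical field (epoch seconds), used only for recency
    priority_boost: float = 1.0  # assumed to come from the triggering scenario
    status: str = "queued"       # queued / investigating / done / timeout


def recency(age_s: float, half_life_s: float = 30.0) -> float:
    """Assumed decay: 1.0 for a fresh POI, 0.5 after one half-life."""
    return 0.5 ** (max(age_s, 0.0) / half_life_s)


def priority(poi: POI, now: float) -> float:
    """confidence x priority_boost x recency, as in the table above."""
    return poi.confidence * poi.priority_boost * recency(now - poi.created_at)


def enqueue(queue: list, poi: POI, now: float, max_size: int = 10) -> None:
    """Add a POI; if the queue exceeds max_size, drop the lowest-priority entry."""
    queue.append(poi)
    if len(queue) > max_size:
        queue.remove(min(queue, key=lambda p: priority(p, now)))
```

Computing priority on demand rather than storing it keeps the recency term current: a POI that waits in the queue loses rank without anyone having to update it.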
GimbalState
Current gimbal position. Single instance, updated on every gimbal feedback message.
| Field | Type | Description |
|---|---|---|
| pan | float32 | Current pan angle |
| tilt | float32 | Current tilt angle |
| zoom | float32 | Current zoom level |
| target_pan | float32 | Commanded pan |
| target_tilt | float32 | Commanded tilt |
| target_zoom | float32 | Commanded zoom |
| last_heartbeat | float64 | Last response timestamp |
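A sketch of how the single GimbalState instance might be queried is shown below. The is_alive and on_target helpers, the 1-second timeout, and the 0.5 tolerance (in whatever unit the angles use) are hypothetical; the document only defines the fields.

```python
from dataclasses import dataclass


@dataclass
class GimbalState:
    pan: float = 0.0
    tilt: float = 0.0
    zoom: float = 1.0
    target_pan: float = 0.0
    target_tilt: float = 0.0
    target_zoom: float = 1.0
    last_heartbeat: float = 0.0  # epoch seconds of the last gimbal response

    def is_alive(self, now: float, timeout_s: float = 1.0) -> bool:
        """True if the gimbal has responded within timeout_s (assumed threshold)."""
        return (now - self.last_heartbeat) <= timeout_s

    def on_target(self, tol: float = 0.5) -> bool:
        """True if pan and tilt are within tol of the commanded values."""
        return (abs(self.pan - self.target_pan) <= tol
                and abs(self.tilt - self.target_tilt) <= tol)
```

A heartbeat check of this kind is one plausible source for the gimbal_available flag in the health log, though the document does not say how that flag is derived.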
Persistent Data (NVMe flat files)
DetectionLogEntry → detections.jsonl
One JSON line per confirmed detection. This is the primary system output.
| Field | Type | Required | Description |
|---|---|---|---|
| ts | string (ISO 8601) | Yes | Detection timestamp |
| frame_id | uint64 | Yes | Source frame |
| gps_denied_lat | float64 | No | GPS-denied latitude (null if unavailable) |
| gps_denied_lon | float64 | No | GPS-denied longitude |
| tier | uint8 | Yes | 1, 2, or 3 |
| class | string | Yes | Detection class label |
| confidence | float32 | Yes | 0-1 |
| bbox | float32[4] | Yes | centerX, centerY, width, height (normalized) |
| freshness | string | No | "high_contrast" / "low_contrast" (footpaths only) |
| tier2_result | string | No | Tier 2 classification |
| tier2_confidence | float32 | No | Tier 2 confidence |
| tier3_used | bool | Yes | Whether VLM was invoked |
| thumbnail_path | string | No | Saved ROI thumbnail path |
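Writing one entry per line can be sketched with the standard library alone. The function names and the required-field check are illustrative assumptions; the field names come from the table above.

```python
import json
from datetime import datetime, timezone

REQUIRED = ("ts", "frame_id", "tier", "class", "confidence", "bbox", "tier3_used")


def make_entry(frame_id, tier, cls, confidence, bbox, tier3_used, ts=None, **optional):
    """Build a DetectionLogEntry dict. Optional fields are passed by keyword;
    a value of None is written as JSON null (e.g. gps_denied_lat when unavailable)."""
    entry = {
        "ts": ts or datetime.now(timezone.utc).isoformat(),
        "frame_id": frame_id,
        "tier": tier,
        "class": cls,
        "confidence": confidence,
        "bbox": list(bbox),  # centerX, centerY, width, height (normalized)
        "tier3_used": tier3_used,
    }
    entry.update(optional)
    return entry


def append_jsonl(path, entry):
    """Append one entry as a single compact JSON line."""
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, separators=(",", ":")) + "\n")
```

Opening the file in append mode for each write keeps the log append-only, and a partially written last line after power loss affects only that one record.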
HealthLogEntry → health.jsonl
One JSON line per second. System health snapshot.
| Field | Type | Description |
|---|---|---|
| ts | string (ISO 8601) | Timestamp |
| t_junction | float32 | Junction temperature °C |
| power_watts | float32 | Power draw |
| gpu_mem_mb | uint32 | GPU memory used |
| vlm_available | bool | VLM capability flag |
| gimbal_available | bool | Gimbal capability flag |
| semantic_available | bool | Semantic capability flag |
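Since the health log is plain JSON Lines, post-flight analysis needs no special tooling. The summary function below is a hypothetical example of such a reader, not part of the system; it uses only the t_junction and power_watts fields from the table above.

```python
import json


def summarize_health(lines):
    """Return peak junction temperature and mean power draw over an
    iterable of JSONL lines, or None if there are no samples."""
    temps, watts = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        temps.append(record["t_junction"])
        watts.append(record["power_watts"])
    if not temps:
        return None
    return {
        "samples": len(temps),
        "max_t_junction": max(temps),
        "mean_power_watts": sum(watts) / len(watts),
    }
```

An open file object can be passed directly as lines, for example `summarize_health(open("health.jsonl", encoding="utf-8"))`.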
RecordedFrame → frames/{frame_id}.jpg
JPEG file per recorded frame. Metadata embedded in filename (frame_id). Correlation to detections via frame_id in detections.jsonl.
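The frame_id correlation can be sketched as a simple join between detection lines and the frames directory. Function names are illustrative; the sketch assumes the filename is exactly the decimal frame_id with no zero-padding, as the path pattern above suggests.

```python
import json
from pathlib import Path


def frame_path(frames_dir, frame_id):
    """Path of the recorded JPEG for a frame: frames/{frame_id}.jpg."""
    return Path(frames_dir) / f"{frame_id}.jpg"


def detections_with_frames(jsonl_lines, frames_dir):
    """Yield (detection, path) pairs; path is None when that frame was not recorded."""
    for line in jsonl_lines:
        if not line.strip():
            continue
        detection = json.loads(line)
        path = frame_path(frames_dir, detection["frame_id"])
        yield detection, (path if path.exists() else None)
```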
Config → config.yaml
Single YAML file with all runtime parameters. Versioned (version: 1 field). Updated via USB.
What Is NOT Stored
| Item | Why not |
|---|---|
| FootpathMask (segmentation mask) | Transient. Exists ~50ms during Tier 2 processing. Too large to log (HxW binary). |
| PathSkeleton | Transient. Derivative of mask. |
| EndpointROI crop | Thumbnail saved only if detection confirmed. Raw crop discarded. |
| YoloDetection input | External system's data. We consume it, don't archive it. |
| POI queue state | Runtime queue. Not useful after flight. Detections capture the outcomes. |
| Raw VLM response text | Optionally logged inside DetectionLogEntry if tier3_used=true. Not stored separately. |
Storage Budget (256GB NVMe)
| Data | Write Rate | Per Hour | 4-Hour Flight |
|---|---|---|---|
| detections.jsonl | ~1 KB/detection, ~100 detections/min | ~6 MB | ~24 MB |
| health.jsonl | ~200 bytes/s | ~720 KB | ~3 MB |
| frames/ (L1, 2 FPS) | ~100 KB/frame, 2 FPS | ~720 MB | ~2.9 GB |
| frames/ (L2, 30 FPS) | ~100 KB/frame, 30 FPS | ~10.8 GB | ~43 GB (if L2 100% of time) |
| gimbal.log | ~50 bytes/command, 10 Hz | ~1.8 MB | ~7 MB |
| Total (typical L1-heavy) | | ~1.5 GB | ~6 GB |
| Total (L2-heavy) | | ~11 GB | ~46 GB |
A 256 GB NVMe comfortably supports 5+ typical flights (at ~6 GB per flight, roughly 40 would fit) or 5+ hours of L2-heavy operation (at ~11 GB per hour, roughly 23 hours) before the circular buffer kicks in.
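The budget figures can be re-derived from the write rates with a few lines of arithmetic. This sketch assumes decimal units, that frames are recorded at either the L1 or the L2 rate at any given moment, and that the full 256 GB is usable (filesystem overhead is ignored). The l2_fraction parameter is hypothetical; the document does not state the L1/L2 mix behind the "typical" total.

```python
KB, MB, GB = 1e3, 1e6, 1e9  # decimal units, matching the table's rounded figures

# Bytes per second for each stream, taken from the Write Rate column.
RATES = {
    "detections.jsonl": 1 * KB * 100 / 60,  # ~1 KB/detection, ~100 detections/min
    "health.jsonl": 200,                    # ~200 bytes/s
    "frames_l1": 100 * KB * 2,              # ~100 KB/frame at 2 FPS
    "frames_l2": 100 * KB * 30,             # ~100 KB/frame at 30 FPS
    "gimbal.log": 50 * 10,                  # ~50 bytes/command at 10 Hz
}


def hourly_bytes(l2_fraction: float) -> float:
    """Bytes written per hour when l2_fraction of the time is spent at L2."""
    logs = RATES["detections.jsonl"] + RATES["health.jsonl"] + RATES["gimbal.log"]
    frames = (1 - l2_fraction) * RATES["frames_l1"] + l2_fraction * RATES["frames_l2"]
    return (logs + frames) * 3600


def hours_until_full(capacity_bytes: float, l2_fraction: float) -> float:
    """Hours of recording before capacity_bytes is exhausted."""
    return capacity_bytes / hourly_bytes(l2_fraction)
```

With these rates, an L2 share of about 8% gives roughly 1.5 GB per hour, which matches the "typical L1-heavy" total, and `hours_until_full(256 * GB, 1.0)` comes out at a little under 24 hours. Frame recording dominates in every case; the three logs together add under 10 MB per hour.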
Migration Strategy
Not applicable — no relational database. Config changes handled by YAML versioning:
- version: 1 field in config.yaml
- New fields get defaults (backward-compatible)
- Breaking changes: bump version, include migration notes in USB update package
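The versioning rules above might be enforced at load time roughly as follows. The sketch assumes config.yaml has already been parsed into a dict by a YAML library; the function name and the poi_queue_max_size key are hypothetical (only the default of 10 comes from the POI section).

```python
SUPPORTED_VERSION = 1

# Hypothetical defaults; a real table would list every optional parameter.
DEFAULTS = {
    "poi_queue_max_size": 10,
}


def resolve_config(raw: dict) -> dict:
    """Validate the version field and fill in defaults for missing keys."""
    version = raw.get("version")
    if version != SUPPORTED_VERSION:
        # A breaking change bumps the version; refuse to guess at the old layout.
        raise ValueError(
            f"config version {version!r} is not supported "
            f"(expected {SUPPORTED_VERSION}); see the migration notes"
        )
    # Values from the file win; defaults cover fields added after it was written.
    return {**DEFAULTS, **raw}
```

Merging defaults under the loaded values is what keeps new fields backward-compatible: an older config.yaml that lacks a key still resolves to a complete configuration.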