# Semantic Detection System: Planning Report

## Executive Summary

Planned a three-tier semantic detection system for UAV reconnaissance that identifies camouflaged/concealed positions by detecting footpaths, tracing them to endpoints, and classifying concealment via heuristic + VLM analysis. The architecture decomposes into 6 components, 2 common helpers, and 8 Jira epics totaling an estimated 42–68 story points, with a behavior tree orchestrating a two-level scan strategy (wide sweep + detailed investigation) driven by data-configurable search scenarios.

## Problem Statement

Existing YOLO-based UAV detection pipelines cannot identify camouflaged military positions such as FPV operator hideouts, hidden artillery, and branch-covered dugouts. A semantic layer is needed that detects terrain indicators (footpaths, branch piles, dark entrances), traces spatial patterns to potential concealment points, and optionally confirms via visual language model analysis, all while controlling a camera gimbal through a two-level scan strategy on a resource-constrained Jetson Orin Nano Super.

## Architecture Overview

Three-tier inference pipeline (Tier 1: YOLOE TensorRT detection, Tier 2: heuristic spatial analysis, Tier 3: optional VLM deep analysis) orchestrated by a py_trees behavior tree. The scan controller manages L1 wide-area sweep and L2 detailed investigation with 4 investigation types (path_follow, cluster_follow, area_sweep, zoom_classify), each driven by YAML-configured search scenarios. Graceful degradation via capability flags handles VLM unavailability, gimbal failure, and thermal throttling.

**Technology stack**: Cython/Python 3.11, TensorRT FP16, NanoLLM (VILA1.5-3B), py_trees 2.4.0, OpenCV, scikit-image, pyserial, Docker on JetPack 6.2

**Deployment**: Development (x86 workstation with mock services) and Production (Jetson Orin Nano Super on UAV, NVMe SSD, air-gapped)

## Component Summary

| # | Component | Purpose | Dependencies | Epic |
|---|-----------|---------|-------------|------|
| H1 | Config | YAML config loading, validation, typed access | None | Bootstrap |
| H2 | Types | Shared dataclasses (FrameContext, Detection, POI, etc.) | None | Bootstrap |
| 01 | ScanController | BT orchestrator for L1/L2 scan with scenario dispatch | All components | Epic 7 |
| 02 | Tier1Detector | YOLOE TensorRT FP16 detection + segmentation | Config, Types | Epic 2 |
| 03 | Tier2SpatialAnalyzer | Mask tracing (footpaths) + cluster tracing (defense systems) | Config, Types | Epic 3 |
| 04 | VLMClient | IPC client to NanoLLM Docker container (Unix socket) | Config, Types | Epic 4 |
| 05 | GimbalDriver | ViewLink serial protocol + PID path following | Config, Types | Epic 5 |
| 06 | OutputManager | Detection logging, frame recording, operator delivery | Config, Types | Epic 6 |

**Implementation order**:

1. Phase 1: Bootstrap (Config + Types + project scaffold)
2. Phase 2: Tier1Detector, Tier2SpatialAnalyzer, VLMClient, GimbalDriver, OutputManager (parallel)
3. Phase 3: ScanController (integrates all components)
4. Phase 4: Integration Tests

## System Flows

| Flow | Description | Key Components |
|------|-------------|---------------|
| Main Pipeline | Frame → Tier1 → EvaluatePOI → queue | ScanController, Tier1Detector |
| L2 Path Follow | POI → zoom → trace mask → PID follow → waypoint analysis → VLM (optional) | ScanController, Tier2, GimbalDriver, VLMClient |
| L2 Cluster Follow | Cluster POI → trace cluster → visit waypoints → classify each | ScanController, Tier2, GimbalDriver |
| Health Degradation | Thermal/failure → disable capability → fallback behavior | ScanController |
| Recording | Frame/detection → NVMe write → circular buffer management | OutputManager |

Reference `system-flows.md` for full details.

## Risk Summary

| Level | Count | Key Risks |
|-------|-------|-----------|
| Critical | 1 | R05: Seasonal model generalization (phased rollout mitigates) |
| High | 4 | R01 backbone accuracy, R02 VLM load latency, R03 heuristic FP rate, R04 GPU memory pressure |
| Medium | 4 | R06 config complexity, R07 fragmented masks, R08 ViewLink effort, R09 operator overload |
| Low | 4 | R10 GIL, R11 NVMe writes, R12 py_trees overhead, R13 scenario extensibility |

**Iterations completed**: 1

**All Critical/High risks mitigated**: Yes. All have documented mitigation strategies and contingency plans.

Reference `risk_mitigations.md` for full register.

## Test Coverage

| Component | Integration | Performance | Security | Acceptance | AC Coverage |
|-----------|-------------|-------------|----------|------------|-------------|
| ScanController | 11 tests | 2 tests | 2 tests | 7 tests | 10 ACs |
| Tier1Detector | 5 tests | 3 tests | 1 test | 2 tests | 4 ACs |
| Tier2SpatialAnalyzer | 9 tests | 2 tests | 1 test | 4 tests | 7 ACs |
| VLMClient | 7 tests | 2 tests | 2 tests | 2 tests | 2 ACs |
| GimbalDriver | 9 tests | 3 tests | 1 test | 3 tests | 5 ACs |
| OutputManager | 9 tests | 2 tests | 2 tests | 2 tests | 2 ACs |
| **Total** | **50** | **14** | **9** | **20** | n/a |

**Overall acceptance criteria coverage**: 27 / 28 ACs covered (96%)

- AC-28 (training dataset requirements) is not covered: it is data annotation scope, not runtime behavior

## Epic Roadmap

| Order | Jira ID | Epic | Component | Effort | Dependencies |
|-------|---------|------|-----------|--------|-------------|
| 1 | AZ-130 | Bootstrap & Initial Structure | Config, Types, scaffold | M (5-8 pts) | None |
| 2 | AZ-131 | Tier1Detector | Tier1Detector | M (5-8 pts) | AZ-130 |
| 3 | AZ-132 | Tier2SpatialAnalyzer | Tier2SpatialAnalyzer | M (5-8 pts) | AZ-130 |
| 4 | AZ-133 | VLMClient | VLMClient | S (3-5 pts) | AZ-130 |
| 5 | AZ-134 | GimbalDriver | GimbalDriver | L (8-13 pts) | AZ-130 |
| 6 | AZ-135 | OutputManager | OutputManager | S (3-5 pts) | AZ-130 |
| 7 | AZ-136 | ScanController | ScanController | L (8-13 pts) | AZ-130–AZ-135 |
| 8 | AZ-137 | Integration Tests | System-level | M (5-8 pts) | AZ-136 |

**Total estimated effort**: 42–68 story points

## Key Decisions Made

| # | Decision | Rationale | Alternatives Rejected |
|---|----------|-----------|----------------------|
| 1 | Three-tier architecture (YOLOE + heuristic + VLM) | Graceful degradation; VLM optional | CNN-based Tier 2 (V2 CNN removed) |
| 2 | NanoLLM replaces vLLM | vLLM unstable on Jetson; NanoLLM purpose-built | vLLM, Ollama |
| 3 | py_trees Behavior Tree for orchestration | Preemptive, extensible, proven | State machine, custom loop |
| 4 | Data-driven YAML search scenarios | New scenarios without code changes | Hardcoded investigation logic |
| 5 | Tier2SpatialAnalyzer (unified mask + cluster) | Single component, two strategies, unified output | Separate PathTracer and ClusterTracer components |
| 6 | No traditional DB: runtime structs + NVMe flat files | Embedded edge device; no DB overhead | SQLite, Redis |
| 7 | Sequential GPU scheduling (YOLOE then VLM) | 8GB shared RAM constraint | Concurrent execution (impossible) |
| 8 | FP16 TRT only (INT8 deferred) | INT8 unstable on Jetson currently | INT8 quantization |
| 9 | Phased seasonal rollout (winter first) | Critical R05 mitigation | Multi-season from day 1 |

## Open Questions

| # | Question | Impact | Assigned To |
|---|----------|--------|-------------|
| 1 | ViewLink protocol: native checksum or CRC-16 wrapper? | GimbalDriver implementation detail | Dev (during spike) |
| 2 | YOLOE-11 vs YOLOE-26: which backbone wins benchmark? | Tier1Detector engine selection | Dev (benchmark sprint) |
| 3 | VILA1.5-3B prompt optimization for concealment detection | VLM accuracy on target domain | Dev + domain expert |

## Artifact Index

| File | Description |
|------|-------------|
| `architecture.md` | System architecture, tech stack, ADRs |
| `system-flows.md` | 5 system flows with sequence descriptions |
| `data_model.md` | Runtime structs + persistent file formats |
| `deployment/containerization.md` | Docker strategy |
| `deployment/ci_cd_pipeline.md` | CI/CD pipeline stages |
| `deployment/environment_strategy.md` | Dev vs production config |
| `deployment/observability.md` | Logging, metrics, alerting |
| `deployment/deployment_procedures.md` | Rollout, rollback, health checks |
| `risk_mitigations.md` | 13 risks with mitigations |
| `components/01_scan_controller/description.md` | ScanController spec |
| `components/01_scan_controller/tests.md` | ScanController test spec |
| `components/02_tier1_detector/description.md` | Tier1Detector spec |
| `components/02_tier1_detector/tests.md` | Tier1Detector test spec |
| `components/03_tier2_spatial_analyzer/description.md` | Tier2SpatialAnalyzer spec |
| `components/03_tier2_spatial_analyzer/tests.md` | Tier2SpatialAnalyzer test spec |
| `components/04_vlm_client/description.md` | VLMClient spec |
| `components/04_vlm_client/tests.md` | VLMClient test spec |
| `components/05_gimbal_driver/description.md` | GimbalDriver spec |
| `components/05_gimbal_driver/tests.md` | GimbalDriver test spec |
| `components/06_output_manager/description.md` | OutputManager spec |
| `components/06_output_manager/tests.md` | OutputManager test spec |
| `common-helpers/01_helper_config.md` | Config helper spec |
| `common-helpers/02_helper_types.md` | Types helper spec |
| `integration_tests/environment.md` | Test environment spec |
| `integration_tests/test_data.md` | Test data management |
| `integration_tests/functional_tests.md` | Functional test scenarios |
| `integration_tests/non_functional_tests.md` | Non-functional test scenarios |
| `integration_tests/traceability_matrix.md` | AC-to-test traceability |
| `epics.md` | Jira epic specifications (AZ-130 through AZ-137) |
| `diagrams/components.md` | Component diagram, dependency graph, data flow (Mermaid) |
| `diagrams/flows/flow_main_pipeline.md` | Main pipeline Mermaid diagram |

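As a cross-check on the Epic Roadmap, the sketch below recomputes the 42–68 story point total from the per-epic T-shirt sizes and derives a dependency-respecting build order with the standard-library `graphlib` module. The epic IDs, sizes, and dependencies are copied from the roadmap table; the `EPICS` structure and the function names are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Point ranges for each T-shirt size, as given in the roadmap's Effort column.
EFFORT = {"S": (3, 5), "M": (5, 8), "L": (8, 13)}

# Epic ID -> (size, dependencies), transcribed from the Epic Roadmap table.
EPICS = {
    "AZ-130": ("M", []),
    "AZ-131": ("M", ["AZ-130"]),
    "AZ-132": ("M", ["AZ-130"]),
    "AZ-133": ("S", ["AZ-130"]),
    "AZ-134": ("L", ["AZ-130"]),
    "AZ-135": ("S", ["AZ-130"]),
    "AZ-136": ("L", ["AZ-130", "AZ-131", "AZ-132", "AZ-133", "AZ-134", "AZ-135"]),
    "AZ-137": ("M", ["AZ-136"]),
}


def total_effort(epics: dict[str, tuple[str, list[str]]]) -> tuple[int, int]:
    """Sum the low and high ends of every epic's point range."""
    low = sum(EFFORT[size][0] for size, _ in epics.values())
    high = sum(EFFORT[size][1] for size, _ in epics.values())
    return low, high


def build_order(epics: dict[str, tuple[str, list[str]]]) -> list[str]:
    """Return one valid order in which each epic follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {epic: set(deps) for epic, (_, deps) in epics.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    low, high = total_effort(EPICS)
    print(f"Total effort: {low}-{high} story points")
    print("One valid order:", build_order(EPICS))
```

The five Phase 2 epics depend only on AZ-130, so their relative order in the output is arbitrary, which matches the report's note that they can proceed in parallel.
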