detections-semantic/_docs/02_plans/epics.md
Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
2026-03-26 00:20:30 +02:00

Jira Epics — Semantic Detection System

Epics created in Jira project AZ (AZAION) on 2026-03-20.

Epic → Jira ID Mapping

| # | Epic | Jira ID |
|---|------|---------|
| 1 | Bootstrap & Initial Structure | AZ-130 |
| 2 | Tier1Detector — YOLOE TensorRT Inference | AZ-131 |
| 3 | Tier2SpatialAnalyzer — Spatial Pattern Analysis | AZ-132 |
| 4 | VLMClient — NanoLLM IPC Client | AZ-133 |
| 5 | GimbalDriver — ViewLink Serial Control | AZ-134 |
| 6 | OutputManager — Recording & Logging | AZ-135 |
| 7 | ScanController — Behavior Tree Orchestrator | AZ-136 |
| 8 | Integration Tests — End-to-End System Testing | AZ-137 |

Dependency Order

1. AZ-130 Bootstrap & Initial Structure  (no dependencies)
2. AZ-131 Tier1Detector                  (depends on AZ-130)
3. AZ-132 Tier2SpatialAnalyzer           (depends on AZ-130)  ← parallel with 2,4,5,6
4. AZ-133 VLMClient                      (depends on AZ-130)  ← parallel with 2,3,5,6
5. AZ-134 GimbalDriver                   (depends on AZ-130)  ← parallel with 2,3,4,6
6. AZ-135 OutputManager                  (depends on AZ-130)  ← parallel with 2,3,4,5
7. AZ-136 ScanController                 (depends on AZ-130–AZ-135)
8. AZ-137 Integration Tests              (depends on AZ-136)

Epic 1: Bootstrap & Initial Structure

Summary: Scaffold the project: folder structure, shared models, interfaces, stubs, CI/CD config, Docker setup, test infrastructure.

Problem / Context: The semantic detection module needs a clean project scaffold that integrates with the existing Cython + TensorRT codebase. All components share Config and Types helpers that must exist before any implementation.

Scope:

In Scope:

  • Project folder structure matching architecture
  • Config helper: YAML loading, validation, typed access, dev/prod configs
  • Types helper: all shared dataclasses (FrameContext, Detection, POI, GimbalState, CapabilityFlags, SpatialAnalysisResult, Waypoint, VLMResponse, SearchScenario)
  • Interface stubs for all 6 components
  • Docker setup: dev Dockerfile, docker-compose with mock services
  • CI pipeline config: lint, test, build stages
  • Test infrastructure: pytest setup, fixture directory, mock factories
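
The validate-then-typed-access flow for the Config helper can be sketched as below, assuming a mapping already parsed from YAML (e.g. via `yaml.safe_load`); the names (`SemanticConfig`, `Tier1Config`, `load_config`) and fields are illustrative, not the spec's:

```python
# Hypothetical sketch of the Config helper: validation + typed access over a
# parsed mapping. In the real helper the dict would come from yaml.safe_load().
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier1Config:
    engine_path: str
    conf_threshold: float = 0.25

@dataclass(frozen=True)
class SemanticConfig:
    version: int
    tier1: Tier1Config

def load_config(raw: dict) -> SemanticConfig:
    """Validate a parsed config mapping and return a typed view of it."""
    if "version" not in raw:
        raise ValueError("config missing required 'version' field")
    t1 = raw.get("tier1", {})
    if not isinstance(t1.get("engine_path"), str):
        raise ValueError("tier1.engine_path must be a string")
    # Unknown keys are tolerated so schema additions don't break older code
    return SemanticConfig(
        version=int(raw["version"]),
        tier1=Tier1Config(
            engine_path=t1["engine_path"],
            conf_threshold=float(t1.get("conf_threshold", 0.25)),
        ),
    )

cfg = load_config({"version": 1, "tier1": {"engine_path": "/models/yoloe.engine"}})
print(cfg.tier1.conf_threshold)  # default applied when the key is absent
```

Frozen dataclasses keep the typed view read-only, which matches the "validation tolerant of additions" mitigation: new YAML keys are ignored rather than rejected.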

Out of Scope:

  • Component implementation (handled in component epics)
  • Real hardware integration
  • Model training / export

Dependencies:

  • Epic dependencies: None (first epic)
  • External: Existing detections repository access

Effort Estimation: M / 5-8 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | Config helper loads and validates YAML | Unit tests pass for valid/invalid configs |
| 2 | Types helper defines all shared structs | All dataclasses importable, fields match spec |
| 3 | Docker dev environment boots | `docker compose up` succeeds, health endpoint responds |
| 4 | CI pipeline runs lint + test | Pipeline passes on scaffold code |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | Cython build system integration | Start with pure Python, Cython-ize later |
| 2 | Config schema changes during development | Version field in config; validation tolerant of additions |

Labels: component:bootstrap, type:platform

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Create project folder structure and pyproject.toml | 1 |
| Task | Implement Config helper with YAML validation | 3 |
| Task | Implement Types helper with all shared dataclasses | 2 |
| Task | Create interface stubs for all 6 components | 2 |
| Task | Docker dev setup + docker-compose | 3 |
| Task | CI pipeline config (lint, test, build) | 2 |
| Task | Test infrastructure (pytest, fixtures, mock factories) | 2 |

Epic 2: Tier1Detector — YOLOE TensorRT Inference

Summary: Wrap YOLOE TensorRT FP16 inference for detection + segmentation on aerial frames.

Problem / Context: The system needs fast (<100ms) object detection including segmentation masks for footpaths and concealment indicators. Must support both YOLOE-11 and YOLOE-26 backbones.

Scope:

In Scope:

  • TRT engine loading with class name configuration
  • Frame preprocessing (resize, normalize)
  • Detection + segmentation inference
  • NMS handling for YOLOE-11 (NMS-free for YOLOE-26)
  • ONNX Runtime fallback for dev environment
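
The TensorRT-first / ONNX-fallback behavior could look roughly like the sketch below; the function name and paths are assumptions, not the module's real API:

```python
# Sketch of backend selection: prefer the pre-exported TRT engine on Jetson,
# fall back to ONNX Runtime in the dev environment. Illustrative only.
from pathlib import Path

def select_backend(engine_path: str, onnx_path: str) -> str:
    """Return which inference backend to load, preferring TensorRT."""
    try:
        import tensorrt  # noqa: F401  # only present on the Jetson image
        if Path(engine_path).exists():
            return "tensorrt"
    except ImportError:
        pass  # dev machine without TRT: try the ONNX fallback
    if Path(onnx_path).exists():
        return "onnxruntime"
    raise FileNotFoundError("no usable model artifact found")
```

Probing the import rather than a config flag keeps the same code path working on both the Jetson image and a TRT-less dev container.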

Out of Scope:

  • Model training and export (separate repo)
  • Custom class fine-tuning

Dependencies:

  • Epic dependencies: Bootstrap
  • External: Pre-exported TRT engine files, Ultralytics 8.4.x

Effort Estimation: M / 5-8 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | Inference ≤100ms on Jetson Orin Nano Super | PT-01 passes |
| 2 | New classes P≥80%, R≥80% on validation set | AT-01 passes |
| 3 | Existing classes not degraded | AT-02 passes (mAP50 within 2% of baseline) |
| 4 | GPU memory ≤2.5GB | PT-03 passes |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R01: Backbone accuracy on concealment data | Benchmark sprint; dual backbone strategy |
| 2 | YOLOE-26 NMS-free model packaging | Validate engine metadata detection at load time |

Labels: component:tier1-detector, type:inference

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Spike | Benchmark YOLOE-11 vs YOLOE-26 on 200 annotated frames | 3 |
| Task | Implement TRT engine loader with class name config | 3 |
| Task | Implement detect() with preprocessing + postprocessing | 3 |
| Task | Add ONNX Runtime fallback for dev environment | 2 |
| Task | Write unit + integration tests for Tier1Detector | 2 |

Epic 3: Tier2SpatialAnalyzer — Spatial Pattern Analysis

Summary: Analyze spatial patterns from Tier 1 detections — trace footpath masks and cluster discrete objects — producing waypoints for gimbal navigation.

Problem / Context: After Tier 1 detects objects, spatial reasoning is needed to trace footpaths to endpoints (concealed positions) and group clustered objects (defense networks). This is the core semantic reasoning layer.

Scope:

In Scope:

  • Mask tracing: skeletonize → prune → endpoints → classify
  • Cluster tracing: spatial clustering → visit order → per-point classify
  • ROI classification heuristic (darkness + contrast)
  • Freshness tagging for mask traces
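
The endpoints step of the skeletonize → prune → endpoints chain can be sketched in pure NumPy; skeletonization itself would come from `skimage.morphology.skeletonize`, so the input here is assumed to already be a 1-pixel-wide binary skeleton:

```python
# Sketch of endpoint extraction for mask tracing: an endpoint is a skeleton
# pixel with exactly one 8-connected neighbor. Input is a binary skeleton.
import numpy as np

def skeleton_endpoints(skel: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) coordinates of skeleton endpoints."""
    s = np.pad(skel.astype(np.uint8), 1)
    # Sum of the 8 neighbors of every pixel, via shifted views of the padding
    nbrs = sum(
        s[1 + dy : s.shape[0] - 1 + dy, 1 + dx : s.shape[1] - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    ys, xs = np.nonzero((skel > 0) & (nbrs == 1))
    return list(zip(ys.tolist(), xs.tolist()))

# A straight 1-pixel path has exactly two endpoints
path = np.zeros((5, 7), dtype=np.uint8)
path[2, 1:6] = 1
print(skeleton_endpoints(path))  # [(2, 1), (2, 5)]
```

Each endpoint found this way would then go to the ROI classification heuristic to decide whether it marks a concealed position.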

Out of Scope:

  • CNN classifier (removed from V1)
  • Machine learning-based classification (V2+)

Dependencies:

  • Epic dependencies: Bootstrap
  • External: scikit-image, scipy

Effort Estimation: M / 5-8 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | trace_mask ≤200ms on 1080p mask | PT-01 passes |
| 2 | Concealed position recall ≥60% | AT-01 passes |
| 3 | Footpath endpoint detection ≥70% | AT-02 passes |
| 4 | Freshness tags correctly assigned | AT-03 passes (≥80% high-contrast correct) |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R03: High false positive rate from heuristic | Conservative thresholds; per-season config |
| 2 | R07: Fragmented masks | Morphological closing; min-branch pruning |

Labels: component:tier2-spatial-analyzer, type:inference

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement mask tracing pipeline (skeletonize, prune, endpoints) | 5 |
| Task | Implement cluster tracing (spatial clustering, visit order) | 3 |
| Task | Implement analyze_roi heuristic (darkness + contrast + freshness) | 3 |
| Task | Write unit + integration tests for Tier2SpatialAnalyzer | 2 |

Epic 4: VLMClient — NanoLLM IPC Client

Summary: IPC client for communicating with the NanoLLM Docker container via Unix domain socket for Tier 3 visual language model analysis.

Problem / Context: Ambiguous Tier 2 results need deep visual analysis via VILA1.5-3B VLM. The VLM runs in a separate Docker container; this client manages the IPC protocol and model lifecycle (load/unload for GPU memory management).

Scope:

In Scope:

  • Unix domain socket client (connect, disconnect)
  • JSON IPC protocol (analyze, load_model, unload_model, status)
  • Model lifecycle management (load on demand, unload to free GPU)
  • Timeout handling, retry logic, availability tracking
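
The request/response shape of the socket client might look like the sketch below. The framing (newline-delimited JSON) and the socket path are assumptions; the real NanoLLM protocol details come from the container's spec:

```python
# Hedged sketch of a JSON-over-Unix-socket client. Framing, socket path, and
# message fields are illustrative assumptions, not the real IPC contract.
import json
import socket

class VLMClient:
    def __init__(self, sock_path: str = "/tmp/nanollm.sock", timeout: float = 5.0):
        self.sock_path = sock_path
        self.timeout = timeout
        self._sock = None

    def connect(self) -> None:
        self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self._sock.settimeout(self.timeout)
        self._sock.connect(self.sock_path)

    def _request(self, payload: dict) -> dict:
        # One newline-terminated JSON object per request and per response
        self._sock.sendall(json.dumps(payload).encode() + b"\n")
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = self._sock.recv(4096)
            if not chunk:
                raise ConnectionError("socket closed mid-response")
            buf += chunk
        return json.loads(buf)

    def analyze(self, image_path: str, prompt: str) -> dict:
        return self._request({"cmd": "analyze", "image": image_path, "prompt": prompt})
```

A production client would layer the scope items — retries, the 3-failure availability tracker, and load_model/unload_model lifecycle calls — on top of `_request`.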

Out of Scope:

  • VLM model selection/training
  • NanoLLM Docker container itself (pre-built)

Dependencies:

  • Epic dependencies: Bootstrap
  • External: NanoLLM Docker container with VILA1.5-3B

Effort Estimation: S / 3-5 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | Round-trip analyze ≤5s | PT-01 passes |
| 2 | GPU memory released on unload | PT-02 passes (≤ baseline + 50MB) |
| 3 | 3 consecutive failures → unavailable | IT-06 passes |
| 4 | Temp files cleaned after analyze | ST-02 passes |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R02: VLM load latency (5-10s) | Predictive loading when first POI queued |
| 2 | R04: GPU memory pressure | Sequential scheduling; explicit unload |

Labels: component:vlm-client, type:integration

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement Unix socket client (connect, disconnect, protocol) | 3 |
| Task | Implement model lifecycle management (load, unload, status) | 2 |
| Task | Implement analyze() with timeout, retry, availability tracking | 3 |
| Task | Write unit + integration tests for VLMClient | 2 |

Epic 5: GimbalDriver — ViewLink Serial Control

Summary: Hardware adapter for the ViewPro A40 gimbal, implementing the ViewLink serial protocol for pan/tilt/zoom control and PID-based path following.

Problem / Context: The scan controller needs to point the camera at specific angles and follow paths. This requires implementing the ViewLink Serial Protocol V3.3.3 over UART, plus a PID controller for smooth path tracking. A mock TCP mode enables development without hardware.

Scope:

In Scope:

  • ViewLink protocol: send commands, receive state, checksum validation
  • Pan/tilt/zoom absolute control
  • PID dual-axis path following
  • Mock TCP mode for development
  • Retry logic with CRC validation
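
One axis of the dual-axis path follower can be sketched as a PID loop with clamped-integral anti-windup; the gains and limits below are placeholders, not tuned values:

```python
# Illustrative single-axis PID with anti-windup. Two instances (pan, tilt)
# would form the dual-axis follower; all constants are placeholder values.
class PIDAxis:
    def __init__(self, kp: float, ki: float, kd: float,
                 i_limit: float = 10.0, out_limit: float = 30.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_limit = i_limit          # anti-windup clamp on the integral
        self.out_limit = out_limit      # max commanded rate, deg/s
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        # Clamp the integral so long saturation periods don't wind it up
        self.integral = max(-self.i_limit,
                            min(self.i_limit, self.integral + error * dt))
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * deriv
        return max(-self.out_limit, min(self.out_limit, out))
```

Exposing the gains and both limits as constructor arguments matches the mitigation for the PID-tuning risk: values come from config and get adjusted during the bench test phase.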

Out of Scope:

  • Advanced gimbal features (tracking modes, stabilization tuning)
  • Hardware EMI mitigation (physical, not software)

Dependencies:

  • Epic dependencies: Bootstrap
  • External: ViewLink Protocol V3.3.3 specification, pyserial

Effort Estimation: L / 8-13 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | Command latency ≤500ms | PT-01 passes |
| 2 | Zoom transition ≤2s | PT-02, AT-02 pass |
| 3 | PID keeps path in center 50% | AT-03 passes (≥90% of cycles) |
| 4 | Smooth transitions (jerk ≤50 deg/s³) | PT-03 passes |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R08: ViewLink protocol implementation effort | ArduPilot C++ reference; mock mode for parallel dev |
| 2 | PID tuning on real hardware | Configurable gains; bench test phase |

Labels: component:gimbal-driver, type:hardware

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Spike | Parse ViewLink V3.3.3 protocol spec, document packet format | 3 |
| Task | Implement UART/TCP connection layer with mock mode | 3 |
| Task | Implement command send/receive with checksum and retry | 5 |
| Task | Implement PID dual-axis controller with anti-windup | 3 |
| Task | Implement zoom_to_poi, return_to_sweep, follow_path | 3 |
| Task | Write unit + integration tests for GimbalDriver | 2 |
| Task | Mock gimbal TCP server for dev/test | 2 |

Epic 6: OutputManager — Recording & Logging

Summary: Facade over all persistent output: detection logging, frame recording, health logging, gimbal logging, and operator detection delivery.

Problem / Context: Every flight produces data needed for operator situational awareness, post-flight review, and training data collection. Recording must never block inference. NVMe storage requires circular buffer management.

Scope:

In Scope:

  • Detection JSON-lines logger (append, flush)
  • JPEG frame recorder (L1 at 2 FPS, L2 at 30 FPS)
  • Health and gimbal command logging
  • Operator delivery in YOLO-compatible format
  • NVMe storage monitoring and circular buffer
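
The never-block-the-caller requirement suggests a bounded queue drained by a background writer thread; a minimal sketch, with illustrative names, could be:

```python
# Sketch of a non-blocking JSON-lines logger: the caller enqueues records and
# a daemon thread appends them to disk; a full queue drops the record rather
# than stalling inference. Class and field names are assumptions.
import json
import queue
import threading

class DetectionLogger:
    def __init__(self, path: str, maxsize: int = 1024):
        self.path = path
        self.q: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self._t = threading.Thread(target=self._writer, daemon=True)
        self._t.start()

    def log(self, record: dict) -> None:
        try:
            self.q.put_nowait(record)   # never block the inference thread
        except queue.Full:
            self.dropped += 1           # surfaced later via health logging

    def _writer(self) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            while True:
                rec = self.q.get()
                if rec is None:         # shutdown sentinel
                    return
                f.write(json.dumps(rec) + "\n")
                f.flush()

    def close(self) -> None:
        self.q.put(None)
        self._t.join()
```

The same queue-and-drop pattern covers the 30 FPS frame recorder, where the dropped counter feeds the "write failures don't block caller" acceptance criterion.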

Out of Scope:

  • Data export/upload tools
  • Long-term storage management

Dependencies:

  • Epic dependencies: Bootstrap
  • External: NVMe SSD, OpenCV for JPEG encoding

Effort Estimation: S / 3-5 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | 30 FPS frame recording without dropped frames | PT-01 passes |
| 2 | No memory leak under sustained load | PT-02 passes (≤10MB growth) |
| 3 | Storage warning triggers at <20% free | IT-07 passes |
| 4 | Write failures don't block caller | IT-09 passes |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R11: NVMe write latency at 30 FPS | Async writes; drop frames if queue backs up |
| 2 | R09: Operator overload | Confidence thresholds; detection throttle |

Labels: component:output-manager, type:data

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement detection JSON-lines logger | 2 |
| Task | Implement JPEG frame recorder with rate control | 3 |
| Task | Implement health + gimbal command logging | 1 |
| Task | Implement operator delivery in YOLO format | 2 |
| Task | Implement NVMe storage monitor + circular buffer | 3 |
| Task | Write unit + integration tests for OutputManager | 2 |

Epic 7: ScanController — Behavior Tree Orchestrator

Summary: Central orchestrator implementing the two-level scan strategy via a py_trees behavior tree with data-driven search scenarios.

Problem / Context: All components need coordination: L1 sweep finds POIs, L2 investigation analyzes them via the appropriate subtree (path_follow, cluster_follow, area_sweep, zoom_classify), and results flow to the operator. Health monitoring provides graceful degradation.

Scope:

In Scope:

  • Behavior tree structure (Root, HealthGuard, L2Investigation, L1Sweep, Idle)
  • All 4 investigation subtrees (PathFollow, ClusterFollow, AreaSweep, ZoomClassify)
  • POI queue with priority management and deduplication
  • EvaluatePOI with scenario-aware trigger matching
  • Search scenario YAML loading and dispatching
  • Health API endpoint (/api/v1/health)
  • Capability flags and graceful degradation
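
The POI queue behavior named above (priority ordering, deduplication, bounded size) can be sketched with the standard library; field names and the dedup key are assumptions:

```python
# Sketch of the POI queue: highest priority pops first, duplicate POI ids are
# rejected, and the queue is capped. Names here are illustrative only.
import heapq
import itertools

class POIQueue:
    def __init__(self, max_size: int = 32):
        self.max_size = max_size
        self._heap: list = []           # entries: (-priority, seq, poi_id)
        self._seen: set = set()
        self._seq = itertools.count()   # tie-breaker: FIFO within a priority

    def push(self, poi_id: str, priority: float) -> bool:
        """Enqueue a POI; returns False if duplicate or queue is full."""
        if poi_id in self._seen or len(self._heap) >= self.max_size:
            return False
        heapq.heappush(self._heap, (-priority, next(self._seq), poi_id))
        self._seen.add(poi_id)
        return True

    def pop(self) -> str:
        """Remove and return the highest-priority POI id."""
        _, _, poi_id = heapq.heappop(self._heap)
        self._seen.discard(poi_id)
        return poi_id
```

In the behavior tree, EvaluatePOI would push here during the L1 sweep and the L2Investigation branch would pop when it wins the tick.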

Out of Scope:

  • Component internals (delegated to respective components)
  • New investigation types (extensible via BT subtrees)

Dependencies:

  • Epic dependencies: Bootstrap, Tier1Detector, Tier2SpatialAnalyzer, VLMClient, GimbalDriver, OutputManager
  • External: py_trees 2.4.0, FastAPI

Effort Estimation: L / 8-13 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | L1→L2 transition ≤2s | PT-01 passes |
| 2 | Full L1→L2→L1 cycle works | AT-03 passes |
| 3 | POI queue orders by priority | IT-09 passes |
| 4 | HealthGuard degrades gracefully | IT-11 passes |
| 5 | Coexists with YOLO (≤5% FPS reduction) | PT-02 passes |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R06: Config complexity → runtime errors | Validation at startup; skip invalid scenarios |
| 2 | Single-threaded BT bottleneck | Leaf nodes delegate to optimized C/TRT backends |

Labels: component:scan-controller, type:orchestration

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement BT skeleton: Root, HealthGuard, L1Sweep, L2Investigation, Idle | 5 |
| Task | Implement EvaluatePOI with scenario-aware matching + cluster aggregation | 3 |
| Task | Implement PathFollowSubtree (TraceMask, PIDFollow, WaypointAnalysis) | 5 |
| Task | Implement ClusterFollowSubtree (TraceCluster, VisitLoop, ClassifyWaypoint) | 3 |
| Task | Implement AreaSweepSubtree and ZoomClassifySubtree | 3 |
| Task | Implement POI queue (priority, deduplication, max size) | 2 |
| Task | Implement health API endpoint + capability flags | 2 |
| Task | Write unit + integration tests for ScanController | 3 |

Epic 8: Integration Tests — End-to-End System Testing

Summary: Implement the black-box integration test suite defined in _docs/02_plans/integration_tests/.

Problem / Context: The system must be validated end-to-end using Docker-based tests that treat the semantic detection module as a black box, verifying all acceptance criteria and cross-component interactions.

Scope:

In Scope:

  • Functional integration tests (all scenarios from integration_tests/functional_tests.md)
  • Non-functional tests (performance, resilience)
  • Docker test environment setup
  • Test data management
  • CI integration
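
A black-box check over the health endpoint might be shaped as below; the payload keys (`status`, `capabilities`) are assumptions from this plan, not the implemented API, and in the real suite the body would come from GET /api/v1/health on the docker-compose stack:

```python
# Sketch of one black-box assertion from the functional suite: the test only
# inspects the health endpoint's JSON, never module internals. Payload shape
# (status/capabilities keys) is an assumption, not a fixed contract.
def assert_healthy(body: dict) -> None:
    """Fail the test if the health payload reports the system degraded."""
    assert body.get("status") == "ok", f"unexpected status: {body.get('status')}"
    degraded = [k for k, v in body.get("capabilities", {}).items() if not v]
    assert not degraded, f"degraded capabilities: {degraded}"

# Example payload as the compose stack might report it when fully up
assert_healthy({"status": "ok", "capabilities": {"tier1": True, "vlm": True}})
```

Keeping assertions on the JSON body (rather than internal state) is what lets the same tests run unchanged against mocks in CI and the real stack in acceptance.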

Out of Scope:

  • Unit tests (covered in component epics)
  • Field testing on real hardware

Dependencies:

  • Epic dependencies: ScanController (all components integrated)
  • External: Docker, test data set

Effort Estimation: M / 5-8 points

Acceptance Criteria:

| # | Criterion | Measurable Condition |
|---|-----------|----------------------|
| 1 | All functional test scenarios pass | Green CI for functional suite |
| 2 | ≥76% AC coverage in traceability matrix | Coverage report |
| 3 | Docker test env boots with `docker compose up` | Setup documented and reproducible |

Risks:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | Test data availability (annotated imagery) | Use synthetic data for CI; real data for acceptance |
| 2 | Mock services diverge from real behavior | Keep mocks minimal; integration tests catch drift |

Labels: component:integration-tests, type:testing

Child Issues:

| Type | Title | Points |
|------|-------|--------|
| Task | Docker test environment + compose setup | 3 |
| Task | Implement functional integration tests (positive scenarios) | 5 |
| Task | Implement functional integration tests (negative/edge scenarios) | 3 |
| Task | Implement non-functional tests (performance, resilience) | 3 |
| Task | CI integration and test reporting | 2 |

Summary

| # | Jira ID | Epic | T-shirt | Points Range | Dependencies |
|---|---------|------|---------|--------------|--------------|
| 1 | AZ-130 | Bootstrap & Initial Structure | M | 5-8 | None |
| 2 | AZ-131 | Tier1Detector | M | 5-8 | AZ-130 |
| 3 | AZ-132 | Tier2SpatialAnalyzer | M | 5-8 | AZ-130 |
| 4 | AZ-133 | VLMClient | S | 3-5 | AZ-130 |
| 5 | AZ-134 | GimbalDriver | L | 8-13 | AZ-130 |
| 6 | AZ-135 | OutputManager | S | 3-5 | AZ-130 |
| 7 | AZ-136 | ScanController | L | 8-13 | AZ-130–AZ-135 |
| 8 | AZ-137 | Integration Tests | M | 5-8 | AZ-136 |
|   |        | Total | | 42-68 | |