detections-semantic/_docs/02_plans/epics.md
Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
Made-with: Cursor
2026-03-26 00:20:30 +02:00

# Jira Epics — Semantic Detection System
> Epics created in Jira project AZ (AZAION) on 2026-03-20.
## Epic → Jira ID Mapping
| # | Epic | Jira ID |
|---|------|---------|
| 1 | Bootstrap & Initial Structure | AZ-130 |
| 2 | Tier1Detector — YOLOE TensorRT Inference | AZ-131 |
| 3 | Tier2SpatialAnalyzer — Spatial Pattern Analysis | AZ-132 |
| 4 | VLMClient — NanoLLM IPC Client | AZ-133 |
| 5 | GimbalDriver — ViewLink Serial Control | AZ-134 |
| 6 | OutputManager — Recording & Logging | AZ-135 |
| 7 | ScanController — Behavior Tree Orchestrator | AZ-136 |
| 8 | Integration Tests — End-to-End System Testing | AZ-137 |
## Dependency Order
```
1. AZ-130 Bootstrap & Initial Structure (no dependencies)
2. AZ-131 Tier1Detector (depends on AZ-130)
3. AZ-132 Tier2SpatialAnalyzer (depends on AZ-130) ← parallel with 2,4,5,6
4. AZ-133 VLMClient (depends on AZ-130) ← parallel with 2,3,5,6
5. AZ-134 GimbalDriver (depends on AZ-130) ← parallel with 2,3,4,6
6. AZ-135 OutputManager (depends on AZ-130) ← parallel with 2,3,4,5
7. AZ-136 ScanController (depends on AZ-130 through AZ-135)
8. AZ-137 Integration Tests (depends on AZ-136)
```
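
The dependency order above can be checked mechanically. The sketch below feeds the same graph to Python's standard `graphlib` module and groups the epics into waves that can run in parallel. The name `parallel_waves` is hypothetical and only serves the illustration.

```python
from graphlib import TopologicalSorter

# Epic -> set of epics it depends on, copied from the dependency list above.
DEPENDS_ON = {
    "AZ-130": set(),
    "AZ-131": {"AZ-130"},
    "AZ-132": {"AZ-130"},
    "AZ-133": {"AZ-130"},
    "AZ-134": {"AZ-130"},
    "AZ-135": {"AZ-130"},
    "AZ-136": {"AZ-130", "AZ-131", "AZ-132", "AZ-133", "AZ-134", "AZ-135"},
    "AZ-137": {"AZ-136"},
}


def parallel_waves(graph):
    """Group epics into waves; every epic in a wave can start at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on a circular dependency
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = parallel_waves(DEPENDS_ON)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```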
---
## Epic 1: Bootstrap & Initial Structure
**Summary**: Scaffold the project: folder structure, shared models, interfaces, stubs, CI/CD config, Docker setup, test infrastructure.
**Problem / Context**: The semantic detection module needs a clean project scaffold that integrates with the existing Cython + TensorRT codebase. All components share Config and Types helpers that must exist before any implementation.
**Scope**:
In Scope:
- Project folder structure matching architecture
- Config helper: YAML loading, validation, typed access, dev/prod configs
- Types helper: all shared dataclasses (FrameContext, Detection, POI, GimbalState, CapabilityFlags, SpatialAnalysisResult, Waypoint, VLMResponse, SearchScenario)
- Interface stubs for all 6 components
- Docker setup: dev Dockerfile, docker-compose with mock services
- CI pipeline config: lint, test, build stages
- Test infrastructure: pytest setup, fixture directory, mock factories
Out of Scope:
- Component implementation (handled in component epics)
- Real hardware integration
- Model training / export
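
As an illustration of the Config helper in scope above, here is a minimal validation sketch. The key names (`version`, `environment`, `tier1_engine_path`), the `AppConfig` class and `ConfigError` are assumptions made for the example; the real schema is not defined in this document. Unknown keys are ignored on purpose, so that adding fields to the YAML does not break an older reader.

```python
from dataclasses import dataclass

# Hypothetical schema: these keys are illustrative, not taken from the project.
REQUIRED_KEYS = {"version": int, "environment": str, "tier1_engine_path": str}
ALLOWED_ENVIRONMENTS = {"dev", "prod"}


class ConfigError(ValueError):
    """Raised when a config mapping fails validation."""


@dataclass(frozen=True)
class AppConfig:
    version: int
    environment: str
    tier1_engine_path: str


def validate_config(raw: dict) -> AppConfig:
    """Validate a parsed YAML mapping and return typed access to its values."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in raw:
            raise ConfigError(f"missing required key: {key}")
        if not isinstance(raw[key], expected_type):
            raise ConfigError(f"{key} must be of type {expected_type.__name__}")
    if raw["environment"] not in ALLOWED_ENVIRONMENTS:
        raise ConfigError(f"environment must be one of {sorted(ALLOWED_ENVIRONMENTS)}")
    # Extra keys in `raw` are dropped here instead of being rejected.
    return AppConfig(**{key: raw[key] for key in REQUIRED_KEYS})


# In the real helper the mapping would come from a YAML parser, for example
# yaml.safe_load() from PyYAML; a literal dict stands in for it here.
config = validate_config(
    {
        "version": 1,
        "environment": "dev",
        "tier1_engine_path": "models/tier1.engine",
        "extra": True,
    }
)
print(config)
```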
**Dependencies**:
- Epic dependencies: None (first epic)
- External: Existing detections repository access
**Effort Estimation**: M / 5-8 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Config helper loads and validates YAML | Unit tests pass for valid/invalid configs |
| 2 | Types helper defines all shared structs | All dataclasses importable, fields match spec |
| 3 | Docker dev environment boots | `docker compose up` succeeds, health endpoint responds |
| 4 | CI pipeline runs lint + test | Pipeline passes on scaffold code |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | Cython build system integration | Start with pure Python, Cython-ize later |
| 2 | Config schema changes during development | Version field in config, validation tolerant of additions |
**Labels**: `component:bootstrap`, `type:platform`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Create project folder structure and pyproject.toml | 1 |
| Task | Implement Config helper with YAML validation | 3 |
| Task | Implement Types helper with all shared dataclasses | 2 |
| Task | Create interface stubs for all 6 components | 2 |
| Task | Docker dev setup + docker-compose | 3 |
| Task | CI pipeline config (lint, test, build) | 2 |
| Task | Test infrastructure (pytest, fixtures, mock factories) | 2 |
---
## Epic 2: Tier1Detector — YOLOE TensorRT Inference
**Summary**: Wrap YOLOE TensorRT FP16 inference for detection + segmentation on aerial frames.
**Problem / Context**: The system needs fast (<100ms) object detection including segmentation masks for footpaths and concealment indicators. Must support both YOLOE-11 and YOLOE-26 backbones.
**Scope**:
In Scope:
- TRT engine loading with class name configuration
- Frame preprocessing (resize, normalize)
- Detection + segmentation inference
- NMS handling for YOLOE-11 (NMS-free for YOLOE-26)
- ONNX Runtime fallback for dev environment
Out of Scope:
- Model training and export (separate repo)
- Custom class fine-tuning
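
The NMS item in scope above could look roughly like the following. This is a generic, class-agnostic greedy NMS in pure Python, not the Ultralytics or TensorRT implementation. The `nms_free` flag and the 0.5 IoU threshold are assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def postprocess(boxes, scores, nms_free):
    """Skip NMS when the engine output is already NMS-free (hypothetical flag)."""
    if nms_free:
        return list(range(len(boxes)))
    return greedy_nms(boxes, scores)


BOXES = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
SCORES = [0.9, 0.8, 0.7]
print(postprocess(BOXES, SCORES, nms_free=False))
print(postprocess(BOXES, SCORES, nms_free=True))
```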
**Dependencies**:
- Epic dependencies: Bootstrap
- External: Pre-exported TRT engine files, Ultralytics 8.4.x
**Effort Estimation**: M / 5-8 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Inference ≤100ms on Jetson Orin Nano Super | PT-01 passes |
| 2 | New classes P≥80%, R≥80% on validation set | AT-01 passes |
| 3 | Existing classes not degraded | AT-02 passes (mAP50 within 2% of baseline) |
| 4 | GPU memory ≤2.5GB | PT-03 passes |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R01: Backbone accuracy on concealment data | Benchmark sprint; dual backbone strategy |
| 2 | YOLOE-26 NMS-free model packaging | Validate engine metadata detection at load time |
**Labels**: `component:tier1-detector`, `type:inference`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Spike | Benchmark YOLOE-11 vs YOLOE-26 on 200 annotated frames | 3 |
| Task | Implement TRT engine loader with class name config | 3 |
| Task | Implement detect() with preprocessing + postprocessing | 3 |
| Task | Add ONNX Runtime fallback for dev environment | 2 |
| Task | Write unit + integration tests for Tier1Detector | 2 |
---
## Epic 3: Tier2SpatialAnalyzer — Spatial Pattern Analysis
**Summary**: Analyze spatial patterns from Tier 1 detections — trace footpath masks and cluster discrete objects — producing waypoints for gimbal navigation.
**Problem / Context**: After Tier 1 detects objects, spatial reasoning is needed to trace footpaths to endpoints (concealed positions) and group clustered objects (defense networks). This is the core semantic reasoning layer.
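
To make the idea of tracing a footpath to its endpoints concrete, here is a small sketch of the endpoint step. It assumes the mask has already been thinned to a one-pixel-wide skeleton (for example with `skimage.morphology.skeletonize`) and treats a skeleton pixel with exactly one 8-connected neighbour as an endpoint. The function name and the nested-list input format are assumptions for the example.

```python
def skeleton_endpoints(skeleton):
    """Return (row, col) of skeleton pixels that have exactly one 8-neighbour."""
    pixels = {
        (r, c)
        for r, row in enumerate(skeleton)
        for c, value in enumerate(row)
        if value
    }
    endpoints = []
    for r, c in sorted(pixels):
        neighbours = sum(
            (r + dr, c + dc) in pixels
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
        if neighbours == 1:
            endpoints.append((r, c))
    return endpoints


# A tiny hand-made skeleton: a path that runs down and then bends right.
SKELETON = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
ENDPOINTS = skeleton_endpoints(SKELETON)
print(ENDPOINTS)
```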
**Scope**:
In Scope:
- Mask tracing: skeletonize → prune → endpoints → classify
- Cluster tracing: spatial clustering → visit order → per-point classify
- ROI classification heuristic (darkness + contrast)
- Freshness tagging for mask traces
Out of Scope:
- CNN classifier (removed from V1)
- Machine learning-based classification (V2+)
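
For the visit-order step of cluster tracing, one simple option is a greedy nearest-neighbour ordering, sketched below. This is an assumption about how the order could be chosen, not the method specified for the component. The points are taken to be already-clustered object centres in any consistent coordinate system.

```python
import math


def visit_order(points, start):
    """Order points by repeatedly moving to the nearest unvisited one."""
    remaining = list(points)
    current = start
    order = []
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest
    return order


CLUSTER = [(10, 0), (1, 0), (5, 0)]
ORDER = visit_order(CLUSTER, start=(0, 0))
print(ORDER)
```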
**Dependencies**:
- Epic dependencies: Bootstrap
- External: scikit-image, scipy
**Effort Estimation**: M / 5-8 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | trace_mask ≤200ms on 1080p mask | PT-01 passes |
| 2 | Concealed position recall ≥60% | AT-01 passes |
| 3 | Footpath endpoint detection ≥70% | AT-02 passes |
| 4 | Freshness tags correctly assigned | AT-03 passes (≥80% high-contrast correct) |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R03: High false positive rate from heuristic | Conservative thresholds; per-season config |
| 2 | R07: Fragmented masks | Morphological closing; min-branch pruning |
**Labels**: `component:tier2-spatial-analyzer`, `type:inference`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Implement mask tracing pipeline (skeletonize, prune, endpoints) | 5 |
| Task | Implement cluster tracing (spatial clustering, visit order) | 3 |
| Task | Implement analyze_roi heuristic (darkness + contrast + freshness) | 3 |
| Task | Write unit + integration tests for Tier2SpatialAnalyzer | 2 |
---
## Epic 4: VLMClient — NanoLLM IPC Client
**Summary**: IPC client for communicating with the NanoLLM Docker container via Unix domain socket for Tier 3 visual language model analysis.
**Problem / Context**: Ambiguous Tier 2 results need deep visual analysis via VILA1.5-3B VLM. The VLM runs in a separate Docker container; this client manages the IPC protocol and model lifecycle (load/unload for GPU memory management).
**Scope**:
In Scope:
- Unix domain socket client (connect, disconnect)
- JSON IPC protocol (analyze, load_model, unload_model, status)
- Model lifecycle management (load on demand, unload to free GPU)
- Timeout handling, retry logic, availability tracking
Out of Scope:
- VLM model selection/training
- NanoLLM Docker container itself (pre-built)
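
A minimal sketch of the JSON IPC exchange listed in scope above. The framing (one JSON object per line) and the field names `command`, `ok` and `echo` are assumptions; only the command names come from this epic. `socket.socketpair()` stands in for connecting to the container's socket path, and the reader assumes a single request or response in flight at a time.

```python
import json
import socket


def send_message(sock, message):
    """Send one JSON object terminated by a newline."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def recv_message(sock, timeout_s=5.0):
    """Read bytes until a newline and decode them as one JSON object."""
    sock.settimeout(timeout_s)  # a timeout error is raised if the peer stalls
    buffer = bytearray()
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        buffer.extend(chunk)
    return json.loads(buffer.decode("utf-8"))


# Both ends live in one process here, purely for demonstration.
client, server = socket.socketpair()
send_message(client, {"command": "status"})
request = recv_message(server)
send_message(server, {"ok": True, "echo": request["command"]})
reply = recv_message(client)
print(reply)
client.close()
server.close()
```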
**Dependencies**:
- Epic dependencies: Bootstrap
- External: NanoLLM Docker container with VILA1.5-3B
**Effort Estimation**: S / 3-5 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Round-trip analyze ≤5s | PT-01 passes |
| 2 | GPU memory released on unload | PT-02 passes (≤baseline+50MB) |
| 3 | 3 consecutive failures → unavailable | IT-06 passes |
| 4 | Temp files cleaned after analyze | ST-02 passes |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R02: VLM load latency (5-10s) | Predictive loading when first POI queued |
| 2 | R04: GPU memory pressure | Sequential scheduling; explicit unload |
**Labels**: `component:vlm-client`, `type:integration`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Implement Unix socket client (connect, disconnect, protocol) | 3 |
| Task | Implement model lifecycle management (load, unload, status) | 2 |
| Task | Implement analyze() with timeout, retry, availability tracking | 3 |
| Task | Write unit + integration tests for VLMClient | 2 |
---
## Epic 5: GimbalDriver — ViewLink Serial Control
**Summary**: Hardware adapter for the ViewPro A40 gimbal, implementing the ViewLink serial protocol for pan/tilt/zoom control and PID-based path following.
**Problem / Context**: The scan controller needs to point the camera at specific angles and follow paths. This requires implementing the ViewLink Serial Protocol V3.3.3 over UART, plus a PID controller for smooth path tracking. A mock TCP mode enables development without hardware.
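
The PID controller mentioned above could be structured as one controller per axis. The sketch below uses conditional integration as its anti-windup scheme: the integral term is frozen while the output is saturated. The gains, the output limit and the units (error in degrees, output in degrees per second) are placeholder assumptions, not tuned values.

```python
class PidAxis:
    """Single-axis PID with output clamping and conditional integration."""

    def __init__(self, kp, ki, kd, output_limit):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.output_limit = output_limit
        self._integral = 0.0
        self._prev_error = None

    def update(self, error, dt):
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        integral = self._integral + error * dt
        raw = self.kp * error + self.ki * integral + self.kd * derivative
        output = max(-self.output_limit, min(self.output_limit, raw))
        if raw == output:
            # Not saturated: accept the new integral. While saturated the
            # integral stays frozen, which prevents windup.
            self._integral = integral
        return output


# Dual-axis use: one independent controller for pan and one for tilt.
pan = PidAxis(kp=1.0, ki=1.0, kd=0.0, output_limit=10.0)
tilt = PidAxis(kp=1.0, ki=1.0, kd=0.0, output_limit=10.0)
pan_rate = pan.update(error=100.0, dt=0.1)
tilt_rate = tilt.update(error=-2.0, dt=0.1)
print(pan_rate, tilt_rate)
```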
**Scope**:
In Scope:
- ViewLink protocol: send commands, receive state, checksum validation
- Pan/tilt/zoom absolute control
- PID dual-axis path following
- Mock TCP mode for development
- Retry logic on failed checksum validation
Out of Scope:
- Advanced gimbal features (tracking modes, stabilization tuning)
- Hardware EMI mitigation (physical, not software)
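
The checksum and retry items in scope above can be sketched on a deliberately generic frame. The layout used here (two sync bytes, a length byte, the payload, then an XOR checksum over length and payload) is hypothetical and is not the ViewLink V3.3.3 packet format, which has to come from the specification. `transport` is any callable that sends a frame and returns the reply bytes.

```python
from functools import reduce

HEADER = b"\xAA\x55"  # hypothetical sync bytes, not taken from the ViewLink spec


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes in data."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def build_frame(payload: bytes) -> bytes:
    """HEADER + length + payload + checksum; payload must be at most 255 bytes."""
    body = bytes([len(payload)]) + payload
    return HEADER + body + bytes([xor_checksum(body)])


def parse_frame(frame: bytes) -> bytes:
    """Validate a frame and return its payload, or raise ValueError."""
    if len(frame) < len(HEADER) + 2 or not frame.startswith(HEADER):
        raise ValueError("bad header")
    body, checksum = frame[len(HEADER):-1], frame[-1]
    if xor_checksum(body) != checksum:
        raise ValueError("checksum mismatch")
    if body[0] != len(body) - 1:
        raise ValueError("length mismatch")
    return body[1:]


def send_with_retry(transport, payload, attempts=3):
    """Send a frame and retry while the reply fails validation."""
    frame = build_frame(payload)
    last_error = None
    for _ in range(attempts):
        try:
            return parse_frame(transport(frame))
        except ValueError as error:
            last_error = error
    raise IOError(f"no valid reply after {attempts} attempts") from last_error


# Fake transport: the first reply is corrupted, the second one is valid.
replies = iter([b"\xAA\x55\x01\x07\x00", build_frame(b"\x07")])
RESULT = send_with_retry(lambda frame: next(replies), b"\x01\x02")
print(RESULT)
```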
**Dependencies**:
- Epic dependencies: Bootstrap
- External: ViewLink Protocol V3.3.3 specification, pyserial
**Effort Estimation**: L / 8-13 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Command latency ≤500ms | PT-01 passes |
| 2 | Zoom transition ≤2s | PT-02, AT-02 pass |
| 3 | PID keeps path in center 50% | AT-03 passes (≥90% of cycles) |
| 4 | Smooth transitions (jerk ≤50 deg/s³) | PT-03 passes |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R08: ViewLink protocol implementation effort | ArduPilot C++ reference; mock mode for parallel dev |
| 2 | PID tuning on real hardware | Configurable gains; bench test phase |
**Labels**: `component:gimbal-driver`, `type:hardware`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Spike | Parse ViewLink V3.3.3 protocol spec, document packet format | 3 |
| Task | Implement UART/TCP connection layer with mock mode | 3 |
| Task | Implement command send/receive with checksum and retry | 5 |
| Task | Implement PID dual-axis controller with anti-windup | 3 |
| Task | Implement zoom_to_poi, return_to_sweep, follow_path | 3 |
| Task | Write unit + integration tests for GimbalDriver | 2 |
| Task | Mock gimbal TCP server for dev/test | 2 |
---
## Epic 6: OutputManager — Recording & Logging
**Summary**: Facade over all persistent output: detection logging, frame recording, health logging, gimbal logging, and operator detection delivery.
**Problem / Context**: Every flight produces data needed for operator situational awareness, post-flight review, and training data collection. Recording must never block inference. NVMe storage requires circular buffer management.
**Scope**:
In Scope:
- Detection JSON-lines logger (append, flush)
- JPEG frame recorder (L1 at 2 FPS, L2 at 30 FPS)
- Health and gimbal command logging
- Operator delivery in YOLO-compatible format
- NVMe storage monitoring and circular buffer
Out of Scope:
- Data export/upload tools
- Long-term storage management
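
One way to keep recording from blocking inference is a bounded queue drained by a background thread, with new items dropped when the queue is full. The sketch below shows that pattern with a JSON-lines sink. The class name `AsyncWriter`, the queue size and the record fields are assumptions for the example.

```python
import io
import json
import queue
import threading


class AsyncWriter:
    """Hand items to a background thread; drop them when the queue is full."""

    def __init__(self, write, max_pending=64):
        self._write = write
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item):
        """Never blocks; returns False when the item had to be dropped."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def close(self):
        """Flush everything already queued, then stop the worker."""
        self._queue.put(None)  # sentinel, processed after all queued items
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            try:
                self._write(item)
            except OSError:
                pass  # a failed write must not kill the worker


# An in-memory buffer stands in for the detection log file.
log = io.StringIO()
writer = AsyncWriter(lambda record: log.write(json.dumps(record) + "\n"))
for frame_id in range(3):
    writer.submit({"frame": frame_id, "label": "footpath"})
writer.close()
print(log.getvalue(), end="")
```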
**Dependencies**:
- Epic dependencies: Bootstrap
- External: NVMe SSD, OpenCV for JPEG encoding
**Effort Estimation**: S / 3-5 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | 30 FPS frame recording without dropped frames | PT-01 passes |
| 2 | No memory leak under sustained load | PT-02 passes (≤10MB growth) |
| 3 | Storage warning triggers at <20% free | IT-07 passes |
| 4 | Write failures don't block caller | IT-09 passes |
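
The storage warning in criterion 3 and the circular buffer from the scope can be reduced to two small pure functions, sketched below. The 20% threshold comes from the table above. The function names and the `(timestamp, size_bytes, name)` tuple layout are assumptions for the example.

```python
import shutil


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def storage_warning(free_bytes, total_bytes, threshold=0.20):
    """True when free space has dropped below the threshold."""
    return free_bytes / total_bytes < threshold


def oldest_first_eviction(files, bytes_to_free):
    """Pick the oldest recordings to delete until enough bytes are freed.

    `files` is an iterable of (timestamp, size_bytes, name) tuples.
    """
    victims = []
    freed = 0
    for _, size, name in sorted(files):
        if freed >= bytes_to_free:
            break
        victims.append(name)
        freed += size
    return victims


RECORDINGS = [(3, 10, "c.jpg"), (1, 10, "a.jpg"), (2, 10, "b.jpg")]
print(storage_warning(free_bytes=19, total_bytes=100))
print(oldest_first_eviction(RECORDINGS, bytes_to_free=15))
```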
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R11: NVMe write latency at 30 FPS | Async writes; drop frames if queue backs up |
| 2 | R09: Operator overload | Confidence thresholds; detection throttle |
**Labels**: `component:output-manager`, `type:data`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Implement detection JSON-lines logger | 2 |
| Task | Implement JPEG frame recorder with rate control | 3 |
| Task | Implement health + gimbal command logging | 1 |
| Task | Implement operator delivery in YOLO format | 2 |
| Task | Implement NVMe storage monitor + circular buffer | 3 |
| Task | Write unit + integration tests for OutputManager | 2 |
---
## Epic 7: ScanController — Behavior Tree Orchestrator
**Summary**: Central orchestrator implementing the two-level scan strategy via a py_trees behavior tree with data-driven search scenarios.
**Problem / Context**: All components need coordination: L1 sweep finds POIs, L2 investigation analyzes them via the appropriate subtree (path_follow, cluster_follow, area_sweep, zoom_classify), and results flow to the operator. Health monitoring provides graceful degradation.
**Scope**:
In Scope:
- Behavior tree structure (Root, HealthGuard, L2Investigation, L1Sweep, Idle)
- All 4 investigation subtrees (PathFollow, ClusterFollow, AreaSweep, ZoomClassify)
- POI queue with priority management and deduplication
- EvaluatePOI with scenario-aware trigger matching
- Search scenario YAML loading and dispatching
- Health API endpoint (/api/v1/health)
- Capability flags and graceful degradation
Out of Scope:
- Component internals (delegated to respective components)
- New investigation types (extensible via BT subtrees)
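
The POI queue in scope above (priority, deduplication, max size) could be built on `heapq`, as sketched below. Deduplicating by a POI identifier and evicting the lowest-priority entry when the queue is full are assumptions; the document does not specify either policy. All names are hypothetical.

```python
import heapq
import itertools


class PoiQueue:
    """Priority queue of POI ids with deduplication and a size cap."""

    def __init__(self, max_size=32):
        self._heap = []  # entries: (-priority, insertion sequence, poi_id)
        self._ids = set()
        self._counter = itertools.count()
        self._max_size = max_size

    def __len__(self):
        return len(self._heap)

    def push(self, poi_id, priority):
        """Return True if the POI was queued, False if rejected."""
        if poi_id in self._ids:
            return False  # duplicate
        if len(self._heap) >= self._max_size:
            worst = max(self._heap)  # largest -priority is the lowest priority
            if -worst[0] >= priority:
                return False  # the new POI is not better than the worst one
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            self._ids.discard(worst[2])
        heapq.heappush(self._heap, (-priority, next(self._counter), poi_id))
        self._ids.add(poi_id)
        return True

    def pop(self):
        """Remove and return the id with the highest priority."""
        _, _, poi_id = heapq.heappop(self._heap)
        self._ids.discard(poi_id)
        return poi_id


pois = PoiQueue(max_size=2)
pois.push("poi-a", 0.5)
pois.push("poi-b", 0.9)
pois.push("poi-a", 0.8)  # rejected as a duplicate
pois.push("poi-c", 0.7)  # evicts poi-a, the lowest priority entry
FIRST, SECOND = pois.pop(), pois.pop()
print(FIRST, SECOND)
```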
**Dependencies**:
- Epic dependencies: Bootstrap, Tier1Detector, Tier2SpatialAnalyzer, VLMClient, GimbalDriver, OutputManager
- External: py_trees 2.4.0, FastAPI
**Effort Estimation**: L / 8-13 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | L1→L2 transition ≤2s | PT-01 passes |
| 2 | Full L1→L2→L1 cycle works | AT-03 passes |
| 3 | POI queue orders by priority | IT-09 passes |
| 4 | HealthGuard degrades gracefully | IT-11 passes |
| 5 | Coexists with YOLO (≤5% FPS reduction) | PT-02 passes |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | R06: Config complexity → runtime errors | Validation at startup; skip invalid scenarios |
| 2 | Single-threaded BT bottleneck | Leaf nodes delegate to optimized C/TRT backends |
**Labels**: `component:scan-controller`, `type:orchestration`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Implement BT skeleton: Root, HealthGuard, L1Sweep, L2Investigation, Idle | 5 |
| Task | Implement EvaluatePOI with scenario-aware matching + cluster aggregation | 3 |
| Task | Implement PathFollowSubtree (TraceMask, PIDFollow, WaypointAnalysis) | 5 |
| Task | Implement ClusterFollowSubtree (TraceCluster, VisitLoop, ClassifyWaypoint) | 3 |
| Task | Implement AreaSweepSubtree and ZoomClassifySubtree | 3 |
| Task | Implement POI queue (priority, deduplication, max size) | 2 |
| Task | Implement health API endpoint + capability flags | 2 |
| Task | Write unit + integration tests for ScanController | 3 |
---
## Epic 8: Integration Tests — End-to-End System Testing
**Summary**: Implement the black-box integration test suite defined in `_docs/02_plans/integration_tests/`.
**Problem / Context**: The system must be validated end-to-end using Docker-based tests that treat the semantic detection module as a black box, verifying all acceptance criteria and cross-component interactions.
**Scope**:
In Scope:
- Functional integration tests (all scenarios from integration_tests/functional_tests.md)
- Non-functional tests (performance, resilience)
- Docker test environment setup
- Test data management
- CI integration
Out of Scope:
- Unit tests (covered in component epics)
- Field testing on real hardware
**Dependencies**:
- Epic dependencies: ScanController (all components integrated)
- External: Docker, test data set
**Effort Estimation**: M / 5-8 points
**Acceptance Criteria**:
| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | All functional test scenarios pass | Green CI for functional suite |
| 2 | ≥76% AC coverage in traceability matrix | Coverage report |
| 3 | Docker test env boots with `docker compose up` | Setup documented and reproducible |
**Risks**:
| # | Risk | Mitigation |
|---|------|------------|
| 1 | Test data availability (annotated imagery) | Use synthetic data for CI; real data for acceptance |
| 2 | Mock services diverge from real behavior | Keep mocks minimal; integration tests catch drift |
**Labels**: `component:integration-tests`, `type:testing`
**Child Issues**:
| Type | Title | Points |
|------|-------|--------|
| Task | Docker test environment + compose setup | 3 |
| Task | Implement functional integration tests (positive scenarios) | 5 |
| Task | Implement functional integration tests (negative/edge scenarios) | 3 |
| Task | Implement non-functional tests (performance, resilience) | 3 |
| Task | CI integration and test reporting | 2 |
---
## Summary
| # | Jira ID | Epic | T-shirt | Points Range | Dependencies |
|---|---------|------|---------|-------------|-------------|
| 1 | AZ-130 | Bootstrap & Initial Structure | M | 5-8 | None |
| 2 | AZ-131 | Tier1Detector | M | 5-8 | AZ-130 |
| 3 | AZ-132 | Tier2SpatialAnalyzer | M | 5-8 | AZ-130 |
| 4 | AZ-133 | VLMClient | S | 3-5 | AZ-130 |
| 5 | AZ-134 | GimbalDriver | L | 8-13 | AZ-130 |
| 6 | AZ-135 | OutputManager | S | 3-5 | AZ-130 |
| 7 | AZ-136 | ScanController | L | 8-13 | AZ-130 through AZ-135 |
| 8 | AZ-137 | Integration Tests | M | 5-8 | AZ-136 |
| **Total** | | | | **42-68** | |