# Jira Epics — Semantic Detection System

> Epics created in Jira project AZ (AZAION) on 2026-03-20.

## Epic → Jira ID Mapping

| # | Epic | Jira ID |
|---|------|---------|
| 1 | Bootstrap & Initial Structure | AZ-130 |
| 2 | Tier1Detector — YOLOE TensorRT Inference | AZ-131 |
| 3 | Tier2SpatialAnalyzer — Spatial Pattern Analysis | AZ-132 |
| 4 | VLMClient — NanoLLM IPC Client | AZ-133 |
| 5 | GimbalDriver — ViewLink Serial Control | AZ-134 |
| 6 | OutputManager — Recording & Logging | AZ-135 |
| 7 | ScanController — Behavior Tree Orchestrator | AZ-136 |
| 8 | Integration Tests — End-to-End System Testing | AZ-137 |

## Dependency Order

```
1. AZ-130 Bootstrap & Initial Structure (no dependencies)
2. AZ-131 Tier1Detector (depends on AZ-130) ← parallel with 3,4,5,6
3. AZ-132 Tier2SpatialAnalyzer (depends on AZ-130) ← parallel with 2,4,5,6
4. AZ-133 VLMClient (depends on AZ-130) ← parallel with 2,3,5,6
5. AZ-134 GimbalDriver (depends on AZ-130) ← parallel with 2,3,4,6
6. AZ-135 OutputManager (depends on AZ-130) ← parallel with 2,3,4,5
7. AZ-136 ScanController (depends on AZ-130–AZ-135)
8. AZ-137 Integration Tests (depends on AZ-136)
```
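The ordering above amounts to a layered topological sort of the epic dependency graph. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module: the `DEPENDS_ON` mapping is transcribed from the list, while `execution_waves` is a hypothetical helper name. Each wave holds the epics that can proceed in parallel once the previous wave is done.

```python
from graphlib import TopologicalSorter

# Epic -> epics it depends on, transcribed from the dependency list.
DEPENDS_ON = {
    "AZ-130": set(),
    "AZ-131": {"AZ-130"},
    "AZ-132": {"AZ-130"},
    "AZ-133": {"AZ-130"},
    "AZ-134": {"AZ-130"},
    "AZ-135": {"AZ-130"},
    "AZ-136": {"AZ-130", "AZ-131", "AZ-132", "AZ-133", "AZ-134", "AZ-135"},
    "AZ-137": {"AZ-136"},
}


def execution_waves(graph):
    """Group epics into waves; everything inside one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all epics whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = execution_waves(DEPENDS_ON)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```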

---

## Epic 1: Bootstrap & Initial Structure

**Summary**: Scaffold the project: folder structure, shared models, interfaces, stubs, CI/CD config, Docker setup, and test infrastructure.

**Problem / Context**: The semantic detection module needs a clean project scaffold that integrates with the existing Cython + TensorRT codebase. All components share Config and Types helpers that must exist before any implementation.

**Scope**:

In Scope:
- Project folder structure matching architecture
- Config helper: YAML loading, validation, typed access, dev/prod configs
- Types helper: all shared dataclasses (FrameContext, Detection, POI, GimbalState, CapabilityFlags, SpatialAnalysisResult, Waypoint, VLMResponse, SearchScenario)
- Interface stubs for all 6 components
- Docker setup: dev Dockerfile, docker-compose with mock services
- CI pipeline config: lint, test, build stages
- Test infrastructure: pytest setup, fixture directory, mock factories

Out of Scope:
- Component implementation (handled in component epics)
- Real hardware integration
- Model training / export

**Dependencies**:
- Epic dependencies: None (first epic)
- External: Existing detections repository access

**Effort Estimation**: M / 5-8 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Config helper loads and validates YAML | Unit tests pass for valid/invalid configs |
| 2 | Types helper defines all shared structs | All dataclasses importable, fields match spec |
| 3 | Docker dev environment boots | `docker compose up` succeeds and the health endpoint responds |
| 4 | CI pipeline runs lint + test | Pipeline passes on scaffold code |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | Cython build system integration | Start with pure Python, Cython-ize later |
| 2 | Config schema changes during development | Version field in config, validation tolerant of additions |

**Labels**: `component:bootstrap`, `type:platform`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Create project folder structure and pyproject.toml | 1 |
| Task | Implement Config helper with YAML validation | 3 |
| Task | Implement Types helper with all shared dataclasses | 2 |
| Task | Create interface stubs for all 6 components | 2 |
| Task | Docker dev setup + docker-compose | 3 |
| Task | CI pipeline config (lint, test, build) | 2 |
| Task | Test infrastructure (pytest, fixtures, mock factories) | 2 |
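As an illustration of the Config helper in this epic's scope, and of the "version field, validation tolerant of additions" mitigation in the risk table, the sketch below validates an already-parsed mapping, such as what PyYAML's `yaml.safe_load` would return. Every name in it (`Tier1Config`, `engine_path`, `confidence_threshold`, `SUPPORTED_VERSION`, `parse_config`) is hypothetical, since the real schema is not specified here. It assumes Python 3.9+.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any

SUPPORTED_VERSION = 1  # hypothetical schema version


class ConfigError(ValueError):
    """Raised when a config mapping fails validation."""


@dataclass(frozen=True)
class Tier1Config:
    """Hypothetical typed view of a `tier1` config section."""
    engine_path: str
    confidence_threshold: float = 0.25


def parse_config(raw: Mapping[str, Any]) -> tuple[Tier1Config, list[str]]:
    """Validate a parsed config; return the typed section and the ignored keys."""
    if raw.get("version") != SUPPORTED_VERSION:
        raise ConfigError(f"unsupported config version: {raw.get('version')!r}")
    section = raw.get("tier1")
    if not isinstance(section, Mapping):
        raise ConfigError("missing or malformed 'tier1' section")

    known = {f.name for f in fields(Tier1Config)}
    # Tolerate additions: unknown keys are reported back, not rejected.
    ignored = sorted(k for k in section if k not in known)
    try:
        cfg = Tier1Config(**{k: v for k, v in section.items() if k in known})
    except TypeError as exc:  # a required field is missing
        raise ConfigError(f"tier1: {exc}") from exc

    if not isinstance(cfg.engine_path, str) or not cfg.engine_path:
        raise ConfigError("tier1.engine_path must be a non-empty string")
    threshold = cfg.confidence_threshold
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ConfigError("tier1.confidence_threshold must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ConfigError("tier1.confidence_threshold must be within [0, 1]")
    return cfg, ignored
```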

---

## Epic 2: Tier1Detector — YOLOE TensorRT Inference

**Summary**: Wrap YOLOE TensorRT FP16 inference for detection + segmentation on aerial frames.

**Problem / Context**: The system needs fast (≤100ms) object detection, including segmentation masks for footpaths and concealment indicators. It must support both YOLOE-11 and YOLOE-26 backbones.

**Scope**:

In Scope:
- TRT engine loading with class name configuration
- Frame preprocessing (resize, normalize)
- Detection + segmentation inference
- NMS handling for YOLOE-11 (NMS-free for YOLOE-26)
- ONNX Runtime fallback for dev environment

Out of Scope:
- Model training and export (separate repo)
- Custom class fine-tuning

**Dependencies**:
- Epic dependencies: Bootstrap
- External: Pre-exported TRT engine files, Ultralytics 8.4.x

**Effort Estimation**: M / 5-8 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Inference ≤100ms on Jetson Orin Nano Super | PT-01 passes |
| 2 | New classes P≥80%, R≥80% on validation set | AT-01 passes |
| 3 | Existing classes not degraded | AT-02 passes (mAP50 within 2% of baseline) |
| 4 | GPU memory ≤2.5GB | PT-03 passes |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R01: Backbone accuracy on concealment data | Benchmark sprint; dual backbone strategy |
| 2 | YOLOE-26 NMS-free model packaging | Validate engine metadata detection at load time |

**Labels**: `component:tier1-detector`, `type:inference`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Spike | Benchmark YOLOE-11 vs YOLOE-26 on 200 annotated frames | 3 |
| Task | Implement TRT engine loader with class name config | 3 |
| Task | Implement detect() with preprocessing + postprocessing | 3 |
| Task | Add ONNX Runtime fallback for dev environment | 2 |
| Task | Write unit + integration tests for Tier1Detector | 2 |
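This epic's scope lists NMS handling for YOLOE-11, while YOLOE-26 is NMS-free. For reference, here is a plain-Python sketch of greedy IoU-based non-maximum suppression, which is the standard technique and is not taken from the Tier1Detector implementation. The box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are all assumptions. A production path would more likely run this vectorized or inside the exported engine.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


BOXES = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
SCORES = [0.9, 0.8, 0.7]
KEPT = nms(BOXES, SCORES)  # the second box overlaps the first (IoU ~0.68) and is dropped
```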

---

## Epic 3: Tier2SpatialAnalyzer — Spatial Pattern Analysis

**Summary**: Analyze spatial patterns from Tier 1 detections (tracing footpath masks and clustering discrete objects) to produce waypoints for gimbal navigation.

**Problem / Context**: After Tier 1 detects objects, spatial reasoning is needed to trace footpaths to endpoints (concealed positions) and group clustered objects (defense networks). This is the core semantic reasoning layer.

**Scope**:

In Scope:
- Mask tracing: skeletonize → prune → endpoints → classify
- Cluster tracing: spatial clustering → visit order → per-point classify
- ROI classification heuristic (darkness + contrast)
- Freshness tagging for mask traces

Out of Scope:
- CNN classifier (removed from V1)
- Machine learning-based classification (V2+)

**Dependencies**:
- Epic dependencies: Bootstrap
- External: scikit-image, scipy

**Effort Estimation**: M / 5-8 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | trace_mask ≤200ms on 1080p mask | PT-01 passes |
| 2 | Concealed position recall ≥60% | AT-01 passes |
| 3 | Footpath endpoint detection ≥70% | AT-02 passes |
| 4 | Freshness tags correctly assigned | AT-03 passes (≥80% high-contrast correct) |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R03: High false positive rate from heuristic | Conservative thresholds; per-season config |
| 2 | R07: Fragmented masks | Morphological closing; min-branch pruning |

**Labels**: `component:tier2-spatial-analyzer`, `type:inference`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement mask tracing pipeline (skeletonize, prune, endpoints) | 5 |
| Task | Implement cluster tracing (spatial clustering, visit order) | 3 |
| Task | Implement analyze_roi heuristic (darkness + contrast + freshness) | 3 |
| Task | Write unit + integration tests for Tier2SpatialAnalyzer | 2 |
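In the mask-tracing pipeline from this epic's scope (skeletonize → prune → endpoints → classify), the endpoint step can be illustrated without any imaging library: on a one-pixel-wide skeleton, an endpoint is a pixel with exactly one 8-connected neighbour. The sketch below assumes the skeleton is already available (for example from scikit-image's `skeletonize`) and represents it as rows of `#`/`.` characters purely for readability. `find_endpoints` is a hypothetical name, and pruning and classification are not shown.

```python
def find_endpoints(skeleton):
    """Return (row, col) of skeleton pixels with exactly one 8-connected neighbour."""
    pixels = {
        (r, c)
        for r, row in enumerate(skeleton)
        for c, ch in enumerate(row)
        if ch == "#"
    }
    offsets = [
        (dr, dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    endpoints = []
    for r, c in sorted(pixels):
        degree = sum((r + dr, c + dc) in pixels for dr, dc in offsets)
        if degree == 1:  # a path end; junctions and interior pixels have degree >= 2
            endpoints.append((r, c))
    return endpoints


# A toy L-shaped footpath skeleton: one end at the top, one at the right.
SKELETON = [
    "......",
    ".#....",
    ".#....",
    ".####.",
    "......",
]
ENDPOINTS = find_endpoints(SKELETON)
```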

---

## Epic 4: VLMClient — NanoLLM IPC Client

**Summary**: IPC client for communicating with the NanoLLM Docker container via Unix domain socket for Tier 3 visual language model analysis.

**Problem / Context**: Ambiguous Tier 2 results need deep visual analysis via VILA1.5-3B VLM. The VLM runs in a separate Docker container; this client manages the IPC protocol and model lifecycle (load/unload for GPU memory management).

**Scope**:

In Scope:
- Unix domain socket client (connect, disconnect)
- JSON IPC protocol (analyze, load_model, unload_model, status)
- Model lifecycle management (load on demand, unload to free GPU)
- Timeout handling, retry logic, availability tracking

Out of Scope:
- VLM model selection/training
- NanoLLM Docker container itself (pre-built)

**Dependencies**:
- Epic dependencies: Bootstrap
- External: NanoLLM Docker container with VILA1.5-3B

**Effort Estimation**: S / 3-5 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Round-trip analyze ≤5s | PT-01 passes |
| 2 | GPU memory released on unload | PT-02 passes (≤baseline+50MB) |
| 3 | 3 consecutive failures → unavailable | IT-06 passes |
| 4 | Temp files cleaned after analyze | ST-02 passes |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R02: VLM load latency (5-10s) | Predictive loading when first POI queued |
| 2 | R04: GPU memory pressure | Sequential scheduling; explicit unload |

**Labels**: `component:vlm-client`, `type:integration`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement Unix socket client (connect, disconnect, protocol) | 3 |
| Task | Implement model lifecycle management (load, unload, status) | 2 |
| Task | Implement analyze() with timeout, retry, availability tracking | 3 |
| Task | Write unit + integration tests for VLMClient | 2 |
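Two pieces of this epic lend themselves to a short sketch: framing a JSON IPC request, and marking the VLM unavailable after 3 consecutive failures (acceptance criterion 3). Only the command names and the threshold of 3 come from this document. The wire format below (one JSON object per line with a `command` field), the names `encode_request` and `AvailabilityTracker`, and the choice to let a single success reset the failure count are assumptions.

```python
import json

# Command names as listed in the scope of this epic.
COMMANDS = {"analyze", "load_model", "unload_model", "status"}


def encode_request(command, **params):
    """Serialize one request as a newline-terminated JSON object (assumed framing)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return (json.dumps({"command": command, **params}) + "\n").encode("utf-8")


class AvailabilityTracker:
    """Flags the VLM as unavailable after N consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def available(self):
        return self.consecutive_failures < self.max_failures

    def record_success(self):
        # Assumption: one successful call restores availability.
        self.consecutive_failures = 0

    def record_failure(self):
        self.consecutive_failures += 1


tracker = AvailabilityTracker()
for _ in range(3):
    tracker.record_failure()  # e.g. a timeout or a socket error
UNAVAILABLE_AFTER_THREE = not tracker.available
```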

---

## Epic 5: GimbalDriver — ViewLink Serial Control

**Summary**: Hardware adapter for the ViewPro A40 gimbal, implementing the ViewLink serial protocol for pan/tilt/zoom control and PID-based path following.

**Problem / Context**: The scan controller needs to point the camera at specific angles and follow paths. This requires implementing the ViewLink Serial Protocol V3.3.3 over UART, plus a PID controller for smooth path tracking. A mock TCP mode enables development without hardware.

**Scope**:

In Scope:
- ViewLink protocol: send commands, receive state, checksum validation
- Pan/tilt/zoom absolute control
- PID dual-axis path following
- Mock TCP mode for development
- Retry logic with CRC validation

Out of Scope:
- Advanced gimbal features (tracking modes, stabilization tuning)
- Hardware EMI mitigation (physical, not software)

**Dependencies**:
- Epic dependencies: Bootstrap
- External: ViewLink Protocol V3.3.3 specification, pyserial

**Effort Estimation**: L / 8-13 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | Command latency ≤500ms | PT-01 passes |
| 2 | Zoom transition ≤2s | PT-02, AT-02 pass |
| 3 | PID keeps path in center 50% | AT-03 passes (≥90% of cycles) |
| 4 | Smooth transitions (jerk ≤50 deg/s³) | PT-03 passes |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R08: ViewLink protocol implementation effort | ArduPilot C++ reference; mock mode for parallel dev |
| 2 | PID tuning on real hardware | Configurable gains; bench test phase |

**Labels**: `component:gimbal-driver`, `type:hardware`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Spike | Parse ViewLink V3.3.3 protocol spec, document packet format | 3 |
| Task | Implement UART/TCP connection layer with mock mode | 3 |
| Task | Implement command send/receive with checksum and retry | 5 |
| Task | Implement PID dual-axis controller with anti-windup | 3 |
| Task | Implement zoom_to_poi, return_to_sweep, follow_path | 3 |
| Task | Write unit + integration tests for GimbalDriver | 2 |
| Task | Mock gimbal TCP server for dev/test | 2 |
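The child issue "PID dual-axis controller with anti-windup" can be pictured with a single-axis sketch that is instantiated once per axis. It uses output clamping with conditional integration, one common anti-windup scheme; this document does not say which scheme GimbalDriver uses. The class name `AxisPID`, the gains, and the output limit are hypothetical, and the jerk limit from the acceptance criteria is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AxisPID:
    """Single-axis PID with clamped output; all gains here are illustrative."""
    kp: float
    ki: float
    kd: float
    out_limit: float  # symmetric limit on the commanded rate
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        output = max(-self.out_limit, min(self.out_limit, raw))
        # Anti-windup: accept the new integral only while the output is
        # unsaturated, or when the error pulls a saturated output back in range.
        if raw == output or error * raw < 0:
            self.integral = candidate
        return output


# Dual-axis use: one controller per axis (pan and tilt).
pan = AxisPID(kp=1.0, ki=1.0, kd=0.0, out_limit=10.0)
tilt = AxisPID(kp=1.0, ki=1.0, kd=0.0, out_limit=10.0)
pan_rate = pan.update(error=100.0, dt=0.1)  # large error: output clamps at the limit
tilt_rate = tilt.update(error=1.0, dt=0.1)  # small error: output stays in the linear range
```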

---

## Epic 6: OutputManager — Recording & Logging

**Summary**: Facade over all persistent output: detection logging, frame recording, health logging, gimbal logging, and operator detection delivery.

**Problem / Context**: Every flight produces data needed for operator situational awareness, post-flight review, and training data collection. Recording must never block inference. NVMe storage requires circular buffer management.

**Scope**:

In Scope:
- Detection JSON-lines logger (append, flush)
- JPEG frame recorder (L1 at 2 FPS, L2 at 30 FPS)
- Health and gimbal command logging
- Operator delivery in YOLO-compatible format
- NVMe storage monitoring and circular buffer

Out of Scope:
- Data export/upload tools
- Long-term storage management

**Dependencies**:
- Epic dependencies: Bootstrap
- External: NVMe SSD, OpenCV for JPEG encoding

**Effort Estimation**: S / 3-5 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | 30 FPS frame recording without dropped frames | PT-01 passes |
| 2 | No memory leak under sustained load | PT-02 passes (≤10MB growth) |
| 3 | Storage warning triggers at <20% free | IT-07 passes |
| 4 | Write failures don't block caller | IT-09 passes |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R11: NVMe write latency at 30 FPS | Async writes; drop frames if queue backs up |
| 2 | R09: Operator overload | Confidence thresholds; detection throttle |

**Labels**: `component:output-manager`, `type:data`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement detection JSON-lines logger | 2 |
| Task | Implement JPEG frame recorder with rate control | 3 |
| Task | Implement health + gimbal command logging | 1 |
| Task | Implement operator delivery in YOLO format | 2 |
| Task | Implement NVMe storage monitor + circular buffer | 3 |
| Task | Write unit + integration tests for OutputManager | 2 |
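The R11 mitigation ("async writes; drop frames if queue backs up") and the requirement that recording never block inference map naturally onto a bounded queue drained by a background thread. The sketch below uses only the standard library. `AsyncFrameWriter`, the `sink` callable that performs the actual disk write, and the default queue size are hypothetical, and a real recorder would log write errors instead of silently ignoring them.

```python
import queue
import threading


class AsyncFrameWriter:
    """Non-blocking frame submission; frames are dropped when the queue is full."""

    def __init__(self, sink, max_pending=64):
        self._sink = sink  # callable(frame_id, payload) doing the blocking write
        self._queue = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.dropped = 0

    def start(self):
        self._thread.start()

    def submit(self, frame_id, payload):
        """Called from the inference loop; returns immediately in every case."""
        try:
            self._queue.put_nowait((frame_id, payload))
            return True
        except queue.Full:
            self.dropped += 1  # back-pressure: drop the frame rather than stall
            return False

    def close(self):
        self._queue.put(None)  # sentinel; blocking is acceptable at shutdown
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            try:
                self._sink(*item)
            except OSError:
                pass  # a failed write must not reach the caller


written = []
writer = AsyncFrameWriter(lambda frame_id, payload: written.append(frame_id), max_pending=2)
# The worker is not running yet, so the third frame finds the queue full.
results = [writer.submit(i, b"jpeg-bytes") for i in range(3)]
writer.start()
writer.close()
```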

---

## Epic 7: ScanController — Behavior Tree Orchestrator

**Summary**: Central orchestrator implementing the two-level scan strategy via a py_trees behavior tree with data-driven search scenarios.

**Problem / Context**: All components need coordination: L1 sweep finds POIs, L2 investigation analyzes them via the appropriate subtree (path_follow, cluster_follow, area_sweep, zoom_classify), and results flow to the operator. Health monitoring provides graceful degradation.

**Scope**:

In Scope:
- Behavior tree structure (Root, HealthGuard, L2Investigation, L1Sweep, Idle)
- All 4 investigation subtrees (PathFollow, ClusterFollow, AreaSweep, ZoomClassify)
- POI queue with priority management and deduplication
- EvaluatePOI with scenario-aware trigger matching
- Search scenario YAML loading and dispatching
- Health API endpoint (/api/v1/health)
- Capability flags and graceful degradation

Out of Scope:
- Component internals (delegated to respective components)
- New investigation types (extensible via BT subtrees)

**Dependencies**:
- Epic dependencies: Bootstrap, Tier1Detector, Tier2SpatialAnalyzer, VLMClient, GimbalDriver, OutputManager
- External: py_trees 2.4.0, FastAPI

**Effort Estimation**: L / 8-13 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | L1→L2 transition ≤2s | PT-01 passes |
| 2 | Full L1→L2→L1 cycle works | AT-03 passes |
| 3 | POI queue orders by priority | IT-09 passes |
| 4 | HealthGuard degrades gracefully | IT-11 passes |
| 5 | Coexists with YOLO (≤5% FPS reduction) | PT-02 passes |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | R06: Config complexity → runtime errors | Validation at startup; skip invalid scenarios |
| 2 | Single-threaded BT bottleneck | Leaf nodes delegate to optimized C/TRT backends |

**Labels**: `component:scan-controller`, `type:orchestration`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Implement BT skeleton: Root, HealthGuard, L1Sweep, L2Investigation, Idle | 5 |
| Task | Implement EvaluatePOI with scenario-aware matching + cluster aggregation | 3 |
| Task | Implement PathFollowSubtree (TraceMask, PIDFollow, WaypointAnalysis) | 5 |
| Task | Implement ClusterFollowSubtree (TraceCluster, VisitLoop, ClassifyWaypoint) | 3 |
| Task | Implement AreaSweepSubtree and ZoomClassifySubtree | 3 |
| Task | Implement POI queue (priority, deduplication, max size) | 2 |
| Task | Implement health API endpoint + capability flags | 2 |
| Task | Write unit + integration tests for ScanController | 3 |
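The POI queue child issue (priority, deduplication, max size) can be sketched with `heapq` from the standard library. The deduplication key is left to the caller (for instance a quantized position), and the eviction rule when the queue is full (replace the lowest-priority entry only if the newcomer ranks strictly higher) is an assumption; the document does not define either. `POIQueue` and its method names are hypothetical.

```python
import heapq
import itertools


class POIQueue:
    """Max-priority queue with key-based deduplication and a size cap."""

    def __init__(self, max_size=32):
        self._heap = []  # entries: (-priority, sequence, key, poi)
        self._keys = set()
        self._sequence = itertools.count()  # tie-breaker: first in, first out
        self._max_size = max_size

    def push(self, key, priority, poi):
        """Return True if the POI was queued, False if rejected."""
        if key in self._keys:
            return False  # duplicate of a POI that is already queued
        if len(self._heap) >= self._max_size:
            worst = max(self._heap)  # lowest priority; newest among ties
            if -priority >= worst[0]:
                return False  # the newcomer does not outrank the worst entry
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            self._keys.discard(worst[2])
        heapq.heappush(self._heap, (-priority, next(self._sequence), key, poi))
        self._keys.add(key)
        return True

    def pop(self):
        """Return the highest-priority POI, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, key, poi = heapq.heappop(self._heap)
        self._keys.discard(key)
        return poi


poi_queue = POIQueue(max_size=2)
PUSH_RESULTS = [
    poi_queue.push("cell-a", 0.5, "footpath A"),
    poi_queue.push("cell-a", 0.9, "footpath A again"),  # duplicate key
    poi_queue.push("cell-b", 0.9, "cluster B"),
    poi_queue.push("cell-c", 0.1, "low-priority C"),    # full, and ranks lowest
    poi_queue.push("cell-d", 0.7, "footpath D"),        # evicts "footpath A"
]
POP_ORDER = [poi_queue.pop(), poi_queue.pop(), poi_queue.pop()]
```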

---

## Epic 8: Integration Tests — End-to-End System Testing

**Summary**: Implement the black-box integration test suite defined in `_docs/02_plans/integration_tests/`.

**Problem / Context**: The system must be validated end-to-end using Docker-based tests that treat the semantic detection module as a black box, verifying all acceptance criteria and cross-component interactions.

**Scope**:

In Scope:
- Functional integration tests (all scenarios from `integration_tests/functional_tests.md`)
- Non-functional tests (performance, resilience)
- Docker test environment setup
- Test data management
- CI integration

Out of Scope:
- Unit tests (covered in component epics)
- Field testing on real hardware

**Dependencies**:
- Epic dependencies: ScanController (all components integrated)
- External: Docker, test data set

**Effort Estimation**: M / 5-8 points

**Acceptance Criteria**:

| # | Criterion | Measurable Condition |
|---|-----------|---------------------|
| 1 | All functional test scenarios pass | Green CI for functional suite |
| 2 | ≥76% AC coverage in traceability matrix | Coverage report |
| 3 | Docker test env boots with `docker compose up` | Setup documented and reproducible |

**Risks**:

| # | Risk | Mitigation |
|---|------|------------|
| 1 | Test data availability (annotated imagery) | Use synthetic data for CI; real data for acceptance |
| 2 | Mock services diverge from real behavior | Keep mocks minimal; integration tests catch drift |

**Labels**: `component:integration-tests`, `type:testing`

**Child Issues**:

| Type | Title | Points |
|------|-------|--------|
| Task | Docker test environment + compose setup | 3 |
| Task | Implement functional integration tests (positive scenarios) | 5 |
| Task | Implement functional integration tests (negative/edge scenarios) | 3 |
| Task | Implement non-functional tests (performance, resilience) | 3 |
| Task | CI integration and test reporting | 2 |

---

## Summary

| # | Jira ID | Epic | T-shirt | Points Range | Dependencies |
|---|---------|------|---------|-------------|-------------|
| 1 | AZ-130 | Bootstrap & Initial Structure | M | 5-8 | None |
| 2 | AZ-131 | Tier1Detector | M | 5-8 | AZ-130 |
| 3 | AZ-132 | Tier2SpatialAnalyzer | M | 5-8 | AZ-130 |
| 4 | AZ-133 | VLMClient | S | 3-5 | AZ-130 |
| 5 | AZ-134 | GimbalDriver | L | 8-13 | AZ-130 |
| 6 | AZ-135 | OutputManager | S | 3-5 | AZ-130 |
| 7 | AZ-136 | ScanController | L | 8-13 | AZ-130–AZ-135 |
| 8 | AZ-137 | Integration Tests | M | 5-8 | AZ-136 |
| **Total** | | | | **42-68** | |
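The total row can be cross-checked by summing the per-epic ranges. A small sketch, with the ranges transcribed from the table above (`POINT_RANGES` is just an illustrative name):

```python
# (low, high) story-point range per epic, copied from the summary table.
POINT_RANGES = {
    "AZ-130": (5, 8),
    "AZ-131": (5, 8),
    "AZ-132": (5, 8),
    "AZ-133": (3, 5),
    "AZ-134": (8, 13),
    "AZ-135": (3, 5),
    "AZ-136": (8, 13),
    "AZ-137": (5, 8),
}

TOTAL_LOW = sum(low for low, _ in POINT_RANGES.values())
TOTAL_HIGH = sum(high for _, high in POINT_RANGES.values())
print(f"{TOTAL_LOW}-{TOTAL_HIGH}")  # prints 42-68, matching the Total row
```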