Revise acceptance criteria and restrictions documentation to clarify recent updates and specifications. Key changes include enhanced definitions for position accuracy, image processing quality, and operational parameters, as well as updates to camera specifications and validation requirements. This revision aims to improve clarity and ensure alignment with project goals.

Oleksandr Bezdieniezhnykh
2026-05-01 16:24:46 +03:00
parent 3f173c1bb7
commit 7e15868d39
62 changed files with 6878 additions and 13 deletions
@@ -0,0 +1,275 @@
# Risk Assessment — Architecture Review — Iteration 01
## Evaluator Pass Summary
| Check | Result | Notes |
|-------|--------|-------|
| Single Responsibility | Pass | Components each own one primary concern: ingest, VIO, safety, retrieval, verification, cache, MAVLink, FDR, validation |
| Dumb Code / Smart Data | Pass | Complex behavior is mostly expressed through DTOs, mode labels, covariance fields, manifests, and gates |
| Interface Consistency | Pass with fix | Safety wrapper no longer directly depends on cache lifecycle for anchor acceptance; cache freshness/provenance travels through `AnchorDecision` |
| Circular Dependencies | Pass with caution | Runtime flow is acyclic at component ownership level; MAVLink remains a bidirectional protocol adapter but owns no localization policy |
| Missing Interactions | Pass | Pre-VIO occlusion, IMU-only blackout, relocalization, tile writes, FDR, and SITL validation are all represented |
| Security Considerations | Pass | Signed cache sidecars, source/system ID checks, spoofing rejection, and no in-flight satellite-provider access are covered |
| Performance Bottlenecks | Pass | Jetson latency, VPR/local matching, FDR append pressure, PostgreSQL availability, and thermal limits are identified |
| API Contracts | Pass | Core DTO handoffs are documented: `FramePacket`, `VioStatePacket`, `AnchorDecision`, `PositionEstimate`, `FdrEvent` |
## Risk Scoring Matrix
| | Low Impact | Medium Impact | High Impact |
|--|------------|---------------|-------------|
| **High Probability** | Medium | High | Critical |
| **Medium Probability** | Low | Medium | High |
| **Low Probability** | Low | Low | Medium |
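The matrix above can be read as a plain lookup from (probability, impact) to score. The sketch below is illustrative only: the `Level` enum, `RISK_MATRIX`, and `risk_score` names are assumptions introduced here, not identifiers from the project.

```python
from enum import Enum


class Level(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Keys are (probability, impact); values are copied from the scoring matrix.
RISK_MATRIX = {
    (Level.HIGH, Level.LOW): "Medium",
    (Level.HIGH, Level.MEDIUM): "High",
    (Level.HIGH, Level.HIGH): "Critical",
    (Level.MEDIUM, Level.LOW): "Low",
    (Level.MEDIUM, Level.MEDIUM): "Medium",
    (Level.MEDIUM, Level.HIGH): "High",
    (Level.LOW, Level.LOW): "Low",
    (Level.LOW, Level.MEDIUM): "Low",
    (Level.LOW, Level.HIGH): "Medium",
}


def risk_score(probability: Level, impact: Level) -> str:
    """Return the score label for one register row."""
    return RISK_MATRIX[(probability, impact)]
```

For example, a Medium-probability, High-impact row such as R01 scores `risk_score(Level.MEDIUM, Level.HIGH)`, which is `"High"`.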
## Acceptance Criteria by Risk Level
| Level | Action Required |
|-------|-----------------|
| Low | Accepted and monitored |
| Medium | Mitigation plan required before implementation |
| High | Mitigation + contingency plan required, reviewed during implementation |
| Critical | Must be resolved before proceeding to next planning step |
## Risk Register
| ID | Risk | Category | Probability | Impact | Score | Mitigation | Owner | Status |
|----|------|----------|-------------|--------|-------|------------|-------|--------|
| R01 | ADTi 20MP 20L V1 public specs conflict with planning assumptions for resolution, FPS, lens, interface, and temperature | Technical / External | Medium | High | High | Pin manufacturer datasheet and exact lens/interface before implementation; make camera calibration/spec task a bootstrap blocker | Camera ingest/calibration | Mitigated by gate |
| R02 | BASALT may underperform or lose tracking on nadir fixed-wing low-parallax terrain | Technical | Medium | High | High | Public replay with MUN-FRL/ALTO/Kagaru/EPFL where applicable, representative target replay, OpenVINS reference comparison, Kimera backup path | BASALT VIO adapter | Mitigated by validation |
| R03 | BASALT confidence/covariance may under-report real error | Safety | Medium | High | High | Wrapper owns covariance calibration; compare against ground truth, satellite residuals, and OpenVINS reference; never emit optimistic `horiz_accuracy` | Safety/anchor wrapper | Mitigated by wrapper design |
| R04 | Total occlusion detector may false-negative and feed unusable frames into VIO | Safety / Technical | Medium | High | High | Conservative pre-VIO occlusion gate, FDR status, tests for total blackout, and fallback to IMU-only `dead_reckoned` mode | Camera ingest/calibration | Mitigated by spec/test |
| R05 | IMU-only blackout propagation could be trusted too long | Safety | Medium | High | High | Monotonic covariance growth, `dead_reckoned` label, and `fix_type=0` with `horiz_accuracy=999.0` once blackout exceeds 30 s or covariance exceeds 500 m | Safety/anchor wrapper | Mitigated by AC gate |
| R06 | DINOv2-VLAD + ALIKED/DISK-LightGlue exceeds Jetson latency/memory budget | Performance | Medium | High | High | Trigger-only execution, CPU FAISS first, top-K caps, model profiling, TensorRT only after fidelity checks | Satellite retrieval / Anchor verification | Mitigated by profiling gates |
| R07 | PostgreSQL/PostGIS local DB is unavailable or too heavy for onboard runtime | Technical / Operational | Medium | High | High | Run local onboard PostgreSQL, health-check before flight, keep large payloads in files, fail mission cache validation if DB unavailable | Cache lifecycle / FDR | Mitigated by deployment gates |
| R08 | Generated tile cache poisoning corrupts future anchors | Security / Safety | Low | High | Medium | Sigma gate, provenance sidecars, post-flight Satellite Service voting, no direct promotion to trusted basemap | Cache/tile lifecycle | Mitigated by policy |
| R09 | Public datasets do not cover final target terrain or commercial license needs | External / Schedule | Medium | Medium | Medium | Use public data for de-risking only; representative synchronized target data remains mandatory for acceptance | Validation harness | Mitigated by acceptance rule |
| R10 | MAVLink `GPS_INPUT` parameters or Plane behavior differs from assumptions | Integration | Medium | High | High | Plane SITL release gate with production parameters, spoofing/failsafe tests, raw field validation with pymavlink | MAVLink/GCS integration | Mitigated by SITL gate |
| R11 | FDR appends or PostgreSQL indexing interferes with hot-path latency | Performance | Medium | Medium | Medium | Append asynchronously, use CBOR payload segments for high-volume data, keep PostgreSQL as event index/query surface | FDR/observability | Mitigated by design |
| R12 | GPL/non-commercial tooling accidentally enters production or acceptance evidence | Legal / Compliance | Low | High | Medium | Keep OpenVINS/ORB-SLAM3 reference-only; license-tag datasets before CI; SuperPoint only after legal approval | Validation harness / Architecture | Mitigated by gates |
## Detailed Risk Analysis
### R01: Camera Specification Mismatch
**Description**: Public ADTi pages show 5456 × 3632 stills, 2 fps continuous capture, Sony E mount, and -10 to 40 °C operation. The project needs the exact production lens, camera interface, sustained capture behavior, thermal behavior, and calibration model.
**Trigger conditions**: Manufacturer documentation or hardware testing contradicts assumed FPS, interface, temperature, or lens characteristics.
**Affected components**: Camera ingest/calibration, BASALT VIO adapter, validation harness, deployment procedures.
**Mitigation strategy**:
1. Make camera specification verification a bootstrap task.
2. Require a manufacturer datasheet or hardware measurement before any implementation work assumes 3 fps capture or hot-environment operation.
3. Version calibration data by exact camera/lens/interface.
**Contingency plan**: Reduce frame rate assumptions, adjust latency tests, or select a different navigation camera/lens/interface.
**Residual risk after mitigation**: Medium.
**Documents updated**: `glossary.md`, `architecture.md`, `components/01_camera_ingest_calibration/description.md`, `deployment/deployment_procedures.md`.
---
### R02: BASALT Nadir Fixed-Wing Fit
**Description**: BASALT is a strong VIO candidate, but fixed downward cameras over planar terrain can cause low-parallax and texture-degeneracy cases.
**Trigger conditions**: Public or representative replay shows high drift, frequent tracking loss, or poor initialization.
**Affected components**: BASALT VIO adapter, safety/anchor wrapper, validation harness.
**Mitigation strategy**:
1. Run MUN-FRL first for synchronized nadir camera + IMU + ground truth.
2. Add ALTO/Kagaru/EPFL slices where available for aerial/fixed-wing realism.
3. Compare against OpenVINS reference and Kimera backup.
**Contingency plan**: Keep the Kimera backup; build a project-owned fallback estimator around OpenCV + IMU only if replay evidence requires it.
**Residual risk after mitigation**: Medium.
**Documents updated**: `architecture.md`, `components/02_basalt_vio_adapter/description.md`, `tests/test-data.md`.
---
### R03: Covariance Under-Reporting
**Description**: Incorrect confidence is more dangerous than no estimate because the flight controller may trust a false fix.
**Trigger conditions**: Replay error exceeds reported covariance, or anchors are accepted despite inconsistent residuals.
**Affected components**: Safety/anchor wrapper, MAVLink/GCS integration, FDR/observability.
**Mitigation strategy**:
1. Make wrapper covariance the product authority, not BASALT raw confidence.
2. Validate calibration against ground truth, satellite residuals, and OpenVINS reference.
3. Map `horiz_accuracy` so it never under-reports the 95% semi-major covariance axis.
**Contingency plan**: Degrade to no-fix sooner and require operator relocalization or mission abort behavior.
**Residual risk after mitigation**: Medium.
**Documents updated**: `architecture.md`, `components/03_safety_anchor_wrapper/description.md`, `tests/blackbox-tests.md`.
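Mitigation step 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the horizontal covariance is a 2x2 matrix in square metres, "95%" is taken as the chi-square quantile for two degrees of freedom, and the function names and the `resolution_m` parameter are hypothetical rather than the wrapper's actual API.

```python
import math

# Chi-square 95% quantile for 2 degrees of freedom: -2 * ln(0.05), about 5.991.
CHI2_95_2DOF = -2.0 * math.log(0.05)


def semi_major_95(cxx: float, cyy: float, cxy: float) -> float:
    """95% semi-major axis (m) of the covariance [[cxx, cxy], [cxy, cyy]]."""
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix.
    mean = 0.5 * (cxx + cyy)
    half_diff = 0.5 * (cxx - cyy)
    lam_max = mean + math.hypot(half_diff, cxy)
    return math.sqrt(CHI2_95_2DOF * lam_max)


def horiz_accuracy(cxx: float, cyy: float, cxy: float,
                   resolution_m: float = 0.1) -> float:
    """Round the axis up to the reporting resolution, never down."""
    axis = semi_major_95(cxx, cyy, cxy)
    rounded = math.ceil(axis / resolution_m) * resolution_m
    # Guard against floating-point rounding landing just below the axis.
    return max(axis, rounded)
```

Rounding up rather than to nearest is the point of the design: any quantization error then makes the reported value more conservative, not less.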
---
### R04: Total Occlusion Detection Failure
**Description**: If total occlusion is not detected before VIO, BASALT may receive unusable frames and produce misleading state updates.
**Trigger conditions**: Lens cover, cloud/whiteout, decode failure, underexposure/overexposure, or textureless frame reaches VIO as usable.
**Affected components**: Camera ingest/calibration, safety/anchor wrapper, BASALT VIO adapter.
**Mitigation strategy**:
1. Camera ingest exposes `OcclusionReport` and sets `usable_for_vio=false` for total occlusion/blackout.
2. Total occlusion bypasses BASALT for that frame.
3. Safety wrapper switches to IMU-only `dead_reckoned` propagation with monotonic covariance growth.
**Contingency plan**: Tune detector conservatively and accept temporary false-positive IMU-only degradation over false VIO confidence.
**Residual risk after mitigation**: Medium.
**Documents updated**: `components/01_camera_ingest_calibration/description.md`, `components/03_safety_anchor_wrapper/description.md`, `system-flows.md`, `diagrams/flows/flow_normal_localization.md`, `tests/resilience-tests.md`.
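A pre-VIO gate of the kind described in the mitigation strategy might look like the sketch below. Only `OcclusionReport`, `usable_for_vio`, and `detect_occlusion` are named in the source; the `reason` field, the thresholds, and the use of simple mean and standard-deviation statistics on 8-bit grayscale pixels are assumptions made for illustration, and a real detector would need tuning against representative imagery.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Optional, Sequence


@dataclass(frozen=True)
class OcclusionReport:
    usable_for_vio: bool
    reason: str  # hypothetical diagnostic field, e.g. for FDR status


def detect_occlusion(pixels: Optional[Sequence[int]],
                     dark_mean: float = 10.0,
                     bright_mean: float = 245.0,
                     min_std: float = 2.0) -> OcclusionReport:
    """Classify one grayscale frame (values 0..255) before it reaches VIO."""
    if not pixels:
        # Missing or empty buffer is treated as a decode failure.
        return OcclusionReport(False, "decode_failure")
    mean = fmean(pixels)
    if mean <= dark_mean:
        return OcclusionReport(False, "underexposed")
    if mean >= bright_mean:
        return OcclusionReport(False, "overexposed")
    if pstdev(pixels) < min_std:
        # Almost no intensity variation: nothing for VIO to track.
        return OcclusionReport(False, "textureless")
    return OcclusionReport(True, "ok")
```

Every failure branch returns `usable_for_vio=False`, so an ambiguous frame degrades to IMU-only propagation rather than feeding BASALT, which matches the conservative bias stated in the contingency plan.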
---
### R05: IMU-Only Mode Over-Trust
**Description**: IMU-only propagation drifts quickly and must be treated as an emergency bridge, not a long-duration solution.
**Trigger conditions**: Blackout lasts longer than 30 seconds or covariance exceeds 500 m.
**Affected components**: Safety/anchor wrapper, MAVLink/GCS integration, FDR/observability.
**Mitigation strategy**:
1. Emit `source_label=dead_reckoned` during IMU-only mode.
2. Grow covariance monotonically.
3. Emit `fix_type=0`, `horiz_accuracy=999.0`, and `VISUAL_BLACKOUT_FAILSAFE` at thresholds.
**Contingency plan**: Stop publishing valid fixes and require relocalization/operator action.
**Residual risk after mitigation**: Low.
**Documents updated**: `components/03_safety_anchor_wrapper/description.md`, `system-flows.md`, `tests/blackbox-tests.md`, `tests/resilience-tests.md`, `tests/traceability-matrix.md`.
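The threshold behavior can be sketched as a small state update plus an output mapping. The 30 s and 500 m limits, `fix_type=0`, `horiz_accuracy=999.0`, `dead_reckoned`, and `VISUAL_BLACKOUT_FAILSAFE` come from the text above; the `ImuOnlyState` type, the function signatures, treating "covariance" as a horizontal uncertainty in metres, and the below-threshold `fix_type` value are assumptions for illustration.

```python
from dataclasses import dataclass

MAX_BLACKOUT_S = 30.0
MAX_SIGMA_M = 500.0
NO_FIX_ACCURACY_M = 999.0


@dataclass(frozen=True)
class ImuOnlyState:
    blackout_s: float = 0.0
    sigma_m: float = 0.0  # horizontal uncertainty in metres (assumed unit)


def propagate_imu_only(state: ImuOnlyState, dt_s: float,
                       predicted_sigma_m: float) -> ImuOnlyState:
    """Advance the blackout timer; uncertainty may grow but never shrink."""
    return ImuOnlyState(
        blackout_s=state.blackout_s + dt_s,
        sigma_m=max(state.sigma_m, predicted_sigma_m),
    )


def to_output(state: ImuOnlyState, degraded_fix_type: int = 3) -> dict:
    """Map the state to output fields; degraded_fix_type is a placeholder."""
    if state.blackout_s > MAX_BLACKOUT_S or state.sigma_m > MAX_SIGMA_M:
        return {
            "source_label": "dead_reckoned",
            "fix_type": 0,
            "horiz_accuracy": NO_FIX_ACCURACY_M,
            "status": "VISUAL_BLACKOUT_FAILSAFE",
        }
    return {
        "source_label": "dead_reckoned",
        "fix_type": degraded_fix_type,
        "horiz_accuracy": state.sigma_m,
        "status": None,
    }
```

Taking the maximum of the previous and predicted uncertainty is one simple way to enforce monotonic growth even if the upstream predictor briefly reports a smaller value.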
---
### R06: Trigger Path Performance
**Description**: DINOv2-VLAD and learned local matching can exceed Jetson latency/memory limits.
**Trigger conditions**: Relocalization exceeds p95 latency, memory budget, or causes thermal throttling.
**Affected components**: Satellite retrieval, anchor verification, validation harness.
**Mitigation strategy**:
1. Keep VPR/local matching trigger-based.
2. Use CPU FAISS first and bounded top-K.
3. Accept optimized engines only after descriptor-fidelity tests pass.
**Contingency plan**: Reduce descriptor resolution/model size, reduce top-K, or fall back to classical features for emergency operation.
**Residual risk after mitigation**: Medium.
**Documents updated**: `architecture.md`, `components/04_satellite_retrieval/description.md`, `components/05_anchor_verification/description.md`, `tests/performance-tests.md`.
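The p95 latency trigger condition can be checked with a small profiling gate. This is a sketch only: the nearest-rank percentile definition, the function names, and the budget value passed by the caller are assumptions, since the document does not state the actual latency budget here.

```python
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # rank = ceil(0.95 * n), computed in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_latency_budget(samples_ms: Sequence[float],
                          budget_ms: float) -> bool:
    """True when the measured p95 does not exceed the caller's budget."""
    return p95(samples_ms) <= budget_ms
```

A profiling run on the Jetson would collect per-trigger relocalization latencies and fail the gate when `within_latency_budget` returns `False`.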
---
### R07: Onboard PostgreSQL/PostGIS Availability
**Description**: PostgreSQL/PostGIS is now the structured metadata store. If local DB availability or resource use is poor, cache/FDR queries may fail.
**Trigger conditions**: Local DB does not start, DB files corrupt, DB consumes too much memory/I/O, or migrations fail.
**Affected components**: Cache/tile lifecycle, FDR/observability, deployment procedures.
**Mitigation strategy**:
1. Require local onboard PostgreSQL health check before flight.
2. Store large imagery/descriptors/CBOR payloads as files, not DB blobs.
3. Treat DB unavailability as a mission-cache validation blocker.
**Contingency plan**: Abort mission-cache activation, then either run only no-cache degraded modes or resync/rebuild the DB before flight.
**Residual risk after mitigation**: Medium.
**Documents updated**: `data_model.md`, `architecture.md`, `components/06_cache_tile_lifecycle/description.md`, `components/08_fdr_observability/description.md`, `deployment/environment_strategy.md`.
---
### R08: Cache Poisoning
**Description**: A bad generated tile could be written back and later used as a trusted anchor.
**Trigger conditions**: Generated tile is promoted despite high parent covariance, stale source, bad sidecar, or inconsistent overlap voting.
**Affected components**: Cache/tile lifecycle, safety/anchor wrapper, Satellite Service integration.
**Mitigation strategy**:
1. Require tile-write sigma gates.
2. Store generated tiles as candidates with signed sidecars.
3. Promote only through post-flight Satellite Service validation/voting.
**Contingency plan**: Quarantine generated tiles and invalidate affected cache regions.
**Residual risk after mitigation**: Low.
**Documents updated**: `architecture.md`, `components/06_cache_tile_lifecycle/description.md`, `tests/security-tests.md`.
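The three mitigation steps amount to a two-stage gate: onboard code can at most create a candidate, and only post-flight voting can promote it. The sketch below illustrates that shape; the type names, fields, sigma limit parameter, and simple-majority rule are all hypothetical and stand in for whatever policy the Satellite Service actually applies.

```python
from dataclasses import dataclass
from enum import Enum


class TileState(Enum):
    REJECTED = "rejected"
    CANDIDATE = "candidate"
    TRUSTED = "trusted"


@dataclass(frozen=True)
class GeneratedTile:
    parent_sigma_m: float          # uncertainty of the pose that produced it
    sidecar_signature_valid: bool  # signed provenance sidecar checked
    source_fresh: bool             # source imagery not stale


def onboard_write_decision(tile: GeneratedTile,
                           max_sigma_m: float) -> TileState:
    """Onboard gate: returns REJECTED or CANDIDATE, never TRUSTED."""
    if not tile.sidecar_signature_valid or not tile.source_fresh:
        return TileState.REJECTED
    if tile.parent_sigma_m > max_sigma_m:
        return TileState.REJECTED
    return TileState.CANDIDATE


def postflight_promotion(state: TileState, votes_for: int,
                         votes_total: int) -> TileState:
    """Post-flight gate: promote a candidate only on a strict majority."""
    if state is not TileState.CANDIDATE or votes_total <= 0:
        return state
    if votes_for * 2 > votes_total:
        return TileState.TRUSTED
    return TileState.REJECTED
```

Keeping `TRUSTED` unreachable from the onboard function encodes the "no direct promotion to trusted basemap" policy in the control flow itself.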
---
### R09: Dataset Coverage / Licensing
**Description**: Public datasets may not match target terrain, may lack raw synchronized IMU, or may have non-commercial restrictions.
**Trigger conditions**: MUN-FRL/ALTO/Kagaru/EPFL slices are unavailable, unrepresentative, or license-incompatible for acceptance.
**Affected components**: Validation harness, BASALT VIO adapter, anchor verification.
**Mitigation strategy**:
1. Use public datasets for de-risking only.
2. License-tag datasets before CI jobs.
3. Require representative synchronized target data for final acceptance.
**Contingency plan**: Collect a target replay dataset before final acceptance.
**Residual risk after mitigation**: Medium.
**Documents updated**: `tests/test-data.md`, `deployment/environment_strategy.md`, `deployment/ci_cd_pipeline.md`.
---
### R10: Plane `GPS_INPUT` Integration
**Description**: ArduPilot Plane EKF and `GPS_INPUT` handling may differ from assumptions, especially around accuracy fields, ignore flags, velocity fields, and spoofing transitions.
**Trigger conditions**: Plane SITL rejects or mishandles emitted `GPS_INPUT`, or QGC status is insufficient.
**Affected components**: MAVLink/GCS integration, safety/anchor wrapper, validation harness.
**Mitigation strategy**:
1. Use pymavlink for exact `GPS_INPUT` field control.
2. Gate release on Plane SITL with production parameters.
3. Validate spoofing/failsafe and QGC status behavior.
**Contingency plan**: Adjust parameter guidance/output fields before hardware deployment.
**Residual risk after mitigation**: Medium.
**Documents updated**: `components/07_mavlink_gcs_integration/description.md`, `tests/environment.md`, `deployment/ci_cd_pipeline.md`.
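Exact field control, as in mitigation step 1, starts with building the `GPS_INPUT` payload explicitly. The sketch below assembles the fields as a plain dictionary using the field names, units, and ignore-flag bit values from the MAVLink common message set. The function name, the choice of which fields to ignore, and the placeholder values for week time, DOP, and satellite count are assumptions; whether Plane accepts them is precisely what the SITL gate is meant to establish.

```python
# Bit values from the MAVLink GPS_INPUT_IGNORE_FLAGS enum.
GPS_INPUT_IGNORE_FLAG_VEL_HORIZ = 8
GPS_INPUT_IGNORE_FLAG_VEL_VERT = 16
GPS_INPUT_IGNORE_FLAG_SPEED_ACCURACY = 32


def gps_input_fields(time_usec: int, lat_deg: float, lon_deg: float,
                     alt_m: float, fix_type: int,
                     horiz_accuracy_m: float,
                     vert_accuracy_m: float) -> dict:
    """Build GPS_INPUT fields with velocity marked as not provided."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude out of range")
    ignore = (GPS_INPUT_IGNORE_FLAG_VEL_HORIZ
              | GPS_INPUT_IGNORE_FLAG_VEL_VERT
              | GPS_INPUT_IGNORE_FLAG_SPEED_ACCURACY)
    return {
        "time_usec": time_usec,
        "gps_id": 0,
        "ignore_flags": ignore,
        "time_week_ms": 0,          # placeholder, to be validated in SITL
        "time_week": 0,             # placeholder, to be validated in SITL
        "fix_type": fix_type,
        "lat": round(lat_deg * 1e7),  # degrees * 1e7 (int32 on the wire)
        "lon": round(lon_deg * 1e7),
        "alt": float(alt_m),          # metres
        "hdop": 1.0,                # placeholder
        "vdop": 1.0,                # placeholder
        "vn": 0.0, "ve": 0.0, "vd": 0.0,  # ignored via ignore_flags
        "speed_accuracy": 0.0,            # ignored via ignore_flags
        "horiz_accuracy": float(horiz_accuracy_m),
        "vert_accuracy": float(vert_accuracy_m),
        "satellites_visible": 0,    # placeholder
    }
```

With pymavlink, a dictionary like this should be usable as keyword arguments to the generated `gps_input_send` method on a connection's `mav` object, which keeps field validation testable without a live link.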
## Architecture/Component Changes Applied
| Risk ID | Document Modified | Change Description |
|---------|-------------------|--------------------|
| R04 | `components/01_camera_ingest_calibration/description.md` | Added explicit `detect_occlusion`, `OcclusionReport`, and pre-VIO bypass behavior |
| R04/R05 | `components/03_safety_anchor_wrapper/description.md` | Added `propagate_imu_only`, `total_occlusion`, monotonic covariance behavior, and no direct cache lifecycle dependency |
| R07 | `data_model.md` | Replaced embedded DB references with PostgreSQL/PostGIS structured metadata and CBOR FDR payload segments |
| R07 | `architecture.md` | Added PostgreSQL/PostGIS ADR and FDR storage decision |
| R05 | `tests/blackbox-tests.md` / `tests/resilience-tests.md` | Made total occlusion and IMU-only blackout behavior explicit |
## Summary
**Total risks identified**: 12
**Critical**: 0 | **High**: 8 | **Medium**: 4 | **Low**: 0
**Risks mitigated this iteration**: 12
**Risks requiring user decision**: None immediately. Future decisions are tied to exact camera hardware proof, dataset license approval, and representative data collection timing.