[AZ-240] Update product implementation and task decomposition processes

- Refined task decomposition steps to ensure implementation tasks are atomic and complexity does not exceed 5 points.
- Enhanced the product implementation process with a completeness gate to verify task outcomes against architecture promises before proceeding to testing.
- Updated dependencies table to reflect new tasks and their relationships, ensuring all test tasks are linked to product remediation tasks.
- Adjusted workflow documentation to clarify entry points for task decomposition and implementation contexts.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 14:11:13 +00:00 · 2026-05-05 01:02:25 +03:00
parent 9fb9e4a349
commit 827d4fe644
32 changed files with 1399 additions and 89 deletions
@@ -1,8 +1,8 @@
 # Dependencies Table

-**Date**: 2026-05-03
-**Total Tasks**: 14
-**Total Complexity Points**: 60
+**Date**: 2026-05-04
+**Total Tasks**: 24
+**Total Complexity Points**: 108
 **Lessons applied**: No `_docs/LESSONS.md` file exists; no prior estimation or dependency lessons were available.

 | Task | Name | Complexity | Dependencies | Epic |
@@ -21,9 +21,29 @@
 | AZ-230 | satellite_service_vpr_retrieval | 5 | AZ-223, AZ-225, AZ-229 | AZ-214 |
 | AZ-231 | anchor_verification_matching | 5 | AZ-223, AZ-225, AZ-230 | AZ-215 |
 | AZ-232 | safety_anchor_state_machine | 5 | AZ-223, AZ-224, AZ-227, AZ-228, AZ-231 | AZ-216 |
+| AZ-240 | native_vio_backend_integration | 5 | AZ-228 | AZ-213 |
+| AZ-241 | real_satellite_vpr_descriptor_retrieval | 5 | AZ-230 | AZ-214 |
+| AZ-242 | real_anchor_feature_matching_ransac | 5 | AZ-231, AZ-241 | AZ-215 |
+| AZ-233 | test_infrastructure | 5 | AZ-240, AZ-241, AZ-242 | AZ-218 |
+| AZ-234 | replay_geolocation_confidence_tests | 3 | AZ-233 | AZ-218 |
+| AZ-235 | vio_replay_performance_tests | 5 | AZ-233, AZ-240 | AZ-218 |
+| AZ-236 | satellite_anchor_cache_tests | 5 | AZ-233, AZ-241, AZ-242 | AZ-218 |
+| AZ-237 | mavlink_blackout_spoofing_tests | 5 | AZ-233 | AZ-218 |
+| AZ-238 | cold_start_restart_tests | 5 | AZ-233 | AZ-218 |
+| AZ-239 | jetson_resource_endurance_tests | 5 | AZ-233 | AZ-218 |

 ## Verification Notes

 - No task exceeds 5 complexity points.
- E2E/blackbox test work remains outside this product implementation task set and is deferred to the greenfield Decompose Tests phase.
- The graph is acyclic: foundations precede adapters/stores, then VIO/retrieval/matching, then safety wrapper orchestration.
+- Test implementation tasks are appended under Blackbox Tests (AZ-218); the test infrastructure bootstrap now depends on the product remediation tasks so tests do not validate scaffold behavior.
+- The graph is acyclic: product foundations precede adapters/stores, then VIO/retrieval/matching, then safety wrapper orchestration; remediation tasks close native VIO, real VPR, and real matching gaps before affected blackbox tests run.
+
+## Test Coverage Verification
+
+- AZ-234 covers FT-P-01, FT-P-02, and NFT-PERF-01.
+- AZ-235 covers FT-P-03 and NFT-PERF-02 after AZ-240 provides the real native VIO path.
+- AZ-236 covers FT-P-04, FT-N-01, FT-N-03, NFT-PERF-03, NFT-RES-04, NFT-SEC-01, NFT-SEC-02, NFT-SEC-04, and NFT-RES-LIM-03 after AZ-241 and AZ-242 provide real VPR retrieval and anchor matching.
+- AZ-237 covers FT-N-02, NFT-RES-01, and NFT-SEC-03.
+- AZ-238 covers NFT-RES-02, NFT-RES-03, NFT-PERF-04, and NFT-RES-LIM-05.
+- AZ-239 covers NFT-RES-LIM-01, NFT-RES-LIM-02, and NFT-RES-LIM-04.
+- All traceability-matrix AC and restriction groups remain covered by at least one test task.
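
The verification notes in the hunk above (no task over 5 points, acyclic graph) can be checked mechanically. The following is a minimal sketch, not part of this commit. It covers only the ten rows added in this hunk and treats AZ-228, AZ-230, and AZ-231 as already-defined upstream nodes. `TASKS`, `MAX_POINTS`, and `verify` are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter

# Rows copied from the added lines of the Dependencies Table:
# task -> (complexity points, dependencies).
TASKS = {
    "AZ-240": (5, ["AZ-228"]),
    "AZ-241": (5, ["AZ-230"]),
    "AZ-242": (5, ["AZ-231", "AZ-241"]),
    "AZ-233": (5, ["AZ-240", "AZ-241", "AZ-242"]),
    "AZ-234": (3, ["AZ-233"]),
    "AZ-235": (5, ["AZ-233", "AZ-240"]),
    "AZ-236": (5, ["AZ-233", "AZ-241", "AZ-242"]),
    "AZ-237": (5, ["AZ-233"]),
    "AZ-238": (5, ["AZ-233"]),
    "AZ-239": (5, ["AZ-233"]),
}
MAX_POINTS = 5  # complexity cap stated in the Verification Notes


def verify(tasks):
    """Return (tasks over the cap, a valid execution order or None if cyclic)."""
    over = sorted(t for t, (points, _) in tasks.items() if points > MAX_POINTS)
    graph = {t: set(deps) for t, (_, deps) in tasks.items()}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        order = None
    return over, order


OVER_BUDGET, ORDER = verify(TASKS)
ADDED_POINTS = sum(points for points, _ in TASKS.values())
```

The ten added rows sum to 48 points, which agrees with the header change from 60 to 108 total points and from 14 to 24 tasks.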
@@ -0,0 +1,163 @@
+# Test Infrastructure
+
+**Task**: AZ-233_test_infrastructure
+**Name**: Test Infrastructure
+**Description**: Scaffold the blackbox and e2e test project: runner, deterministic fixtures, isolated replay/SITL environment, reporting, and external dependency stubs.
+**Complexity**: 5 points
+**Dependencies**: AZ-240_native_vio_backend_integration, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
+**Component**: Blackbox Tests
+**Tracker**: AZ-233
+**Epic**: AZ-218
+
+## Test Project Folder Layout
+
+```text
+e2e/
+├── replay/
+│   ├── run_replay.py
+│   ├── scenarios/
+│   └── reports/
+├── fixtures/
+│   ├── cache/
+│   ├── mavlink/
+│   ├── telemetry/
+│   └── expected/
+├── tests/
+│   ├── test_still_image_replay.py
+│   ├── test_vio_replay.py
+│   ├── test_satellite_anchor.py
+│   ├── test_blackout_spoofing.py
+│   ├── test_resource_limits.py
+│   └── test_security_gates.py
+├── mocks/
+│   ├── satellite_cache_stub/
+│   ├── ardupilot_sitl/
+│   └── qgc_observer/
+└── reports/
+```
+
+### Layout Rationale
+
+The test project keeps blackbox/e2e runner code outside product runtime internals. Scenario definitions, fixtures, mocks, and reports are separated so tests can reset state between runs and produce release evidence without importing private component modules.
+
+Test implementation starts only after remediation tasks AZ-240, AZ-241, and AZ-242 close the native VIO, real satellite VPR, and real anchor matching gaps found during autodev verification.
+
+## Mock Services
+
+| Mock Service | Replaces | Interfaces | Behavior |
+|-------------|----------|------------|----------|
+| `satellite_cache_stub` | Offline Azaion Suite Satellite Service cache package | Local COG/manifest/descriptor fixture volume | Serves preloaded valid, stale, unsigned, hash-mismatched, and low-resolution cache fixtures; never performs network fetches during flight-mode tests. |
+| `ardupilot_sitl` | ArduPilot Plane flight controller | MAVLink telemetry and `GPS_INPUT` receiving path | Emits generated IMU, attitude, GPS health, spoofing, and failsafe traces; records injected `GPS_INPUT` for assertions. |
+| `qgc_observer` | QGroundControl status consumer | MAVLink/tlog parser | Records downsampled `STATUSTEXT`, status, and failsafe messages for rate and content assertions. |
+
+### Mock Control API
+
+Each mock or runner fixture must expose deterministic scenario controls for normal replay, stale cache, missing cache, spoofed GPS, blackout, restart, and resource-load modes. Recorded interactions must be queryable after each test run for assertions.
+
+## Docker Test Environment
+
+### `docker-compose.test.yml` Structure
+
+| Service | Image / Build | Purpose | Depends On |
+|---------|---------------|---------|------------|
+| `gps-denied-service` | Project runtime image or local package mount | System under test | `satellite-cache-stub` |
+| `replay-consumer` | Python replay/test harness | Feeds frames, telemetry, cache data, and faults | `gps-denied-service`, mock services |
+| `satellite-cache-stub` | Fixture volume/service | Provides offline cache manifests, sidecars, descriptors, and generated invalid variants | none |
+| `ardupilot-plane-sitl` | SITL container or local process wrapper | Validates `GPS_INPUT`, spoofing, and failsafe behavior | `gps-denied-service` |
+| `qgc-observer` | MAVLink log parser | Verifies GCS-visible status output | `ardupilot-plane-sitl` |
+
+### Networks and Volumes
+
+- `replay-net`: connects the runtime, replay consumer, and satellite-cache stub.
+- `sitl-net`: connects the runtime, ArduPilot Plane SITL, and QGC observer.
+- `input-data`: read-only mount for `_docs/00_problem/input_data/`.
+- `expected-results`: read-only mount for expected coordinate and report fixtures.
+- `derkachi-replay`: read-only mount for `flight_derkachi.mp4` and `data_imu.csv`.
+- `satellite-cache`: fixture cache volume with valid and invalid manifests.
+- `fdr-output`: fresh per-run output volume for FDR and report artifacts.
+
+## Test Runner Configuration
+
+**Framework**: Python pytest-style replay harness.
+**Entry point**: `run-blackbox-replay` or equivalent pytest command that executes scenario groups and writes reports.
+**Reports**: CSV summary plus FDR validation Markdown.
+
+### Fixture Strategy
+
+| Fixture | Scope | Purpose |
+|---------|-------|---------|
+| `project_60_still_images` | session | Provides 60 nadir images and expected WGS84 centers. |
+| `derkachi_video_telemetry` | session | Provides synchronized video, IMU, and `GLOBAL_POSITION_INT` replay data. |
+| `cache_integrity_fixtures` | function | Provides valid, stale, unsigned, hash-mismatched, and low-resolution cache variants. |
+| `sitl_spoofing_scenarios` | function | Provides generated GPS loss/spoofing and blackout traces. |
+| `public_nadir_vio_candidates` | optional/session | Provides public or representative synchronized datasets when available. |
+
+## Test Data Fixtures
+
+| Data Set | Source | Format | Used By |
+|----------|--------|--------|---------|
+| `project_60_still_images` | `_docs/00_problem/input_data/` | JPG + metadata | Still-image accuracy, confidence, latency smoke |
+| `expected_frame_centers` | `_docs/00_problem/input_data/coordinates.csv` and expected-results report | CSV/Markdown | Geolocation assertions |
+| `derkachi_video_telemetry` | `_docs/00_problem/input_data/flight_derkachi/` | MP4 + CSV | VIO replay, latency, resilience |
+| `cache_integrity_fixtures` | generated fixture volume | COG/manifest/sidecar/index fixtures | Cache freshness, poisoning, no-fetch tests |
+| `sitl_spoofing_scenarios` | generated by SITL harness | MAVLink/tlog traces | Spoofing, blackout, failsafe, GCS status |
+| `public_nadir_vio_candidates` | pinned external fixtures | dataset-specific | Final VIO and satellite-anchor validation |
+
+### Data Isolation
+
+Every run uses read-only input fixtures and fresh run-scoped output directories. FDR, generated tiles, tlogs, and reports are written only to per-run output volumes. Mock state and generated fixtures are reset before each scenario group.
+
+## Test Reporting
+
+**Format**: CSV summary and Markdown evidence report.
+**Output paths**: `test-results/blackbox-report.csv` and `test-results/fdr-validation-summary.md`.
+**Required columns**: Test ID, test name, input dataset, execution time, result, error distance, source label, covariance 95% semi-major, `GPS_INPUT.fix_type`, and error message.
+
+## Acceptance Criteria
+
+**AC-1: Test environment starts**
+Given the Docker/replay test environment
+When the test stack starts
+Then the runtime, replay consumer, cache fixture, SITL, and observer services are reachable or report a clear blocked prerequisite.
+
+**AC-2: External dependency stubs are deterministic**
+Given a scenario config for cache, MAVLink, QGC, or fixture behavior
+When the replay consumer executes it
+Then mocks produce repeatable responses and expose recorded interactions for assertions.
+
+**AC-3: Test runner executes scenario groups**
+Given valid fixtures and a running test environment
+When the test runner starts
+Then it discovers and executes blackbox, performance, resilience, security, and resource-limit scenario groups.
+
+**AC-4: Reports are generated**
+Given a completed or blocked test run
+When reporting finishes
+Then CSV and Markdown evidence files are written with the required columns, metrics, artifact paths, and blocked-prerequisite reasons.
+
+## Non-Functional Requirements
+
+**Reliability**
+- Missing hardware, public datasets, calibration, or SITL prerequisites are reported as `blocked`, not `passed`.
+
+**Security**
+- Fixture stubs must not access external satellite-provider or Suite service networks during in-flight test scenarios.
+
+**Data Isolation**
+- No test may mutate source fixtures or write FDR/generated-tile artifacts outside run-scoped output paths.
+
+## Constraints
+
+- The test suite must use public runtime boundaries only: navigation frames, telemetry, offline cache, MAVLink output, QGC status, and FDR outputs.
+- The suite must not import private estimator, BASALT, wrapper, or tile-manager internals.
+- Hardware-specific Jetson gates remain release-gate tests and may be skipped or blocked in ordinary local replay.
+
+## Risks & Mitigation
+
+**Risk 1: Environment prerequisites hide real failures**
+- *Risk*: Missing hardware, calibration, or datasets could be treated as success.
+- *Mitigation*: Report unavailable prerequisites as `blocked` with explicit artifact evidence.
+
+**Risk 2: Fixture mutation contaminates later runs**
+- *Risk*: Generated FDR, cache, or SITL output changes expected input fixtures.
+- *Mitigation*: Use read-only fixture mounts and fresh run-scoped output volumes for every execution.
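
The reporting contract in the file above (required columns, and `blocked` never recorded as `passed`) could be enforced by the report writer itself. This is a hedged sketch, not the committed harness. The snake_case header spellings, the unit suffixes, and the helper names `write_report` and `blocked_row` are assumptions. Only the column list and the `blocked` status come from the spec.

```python
import csv
import io

# Header spellings and unit suffixes are assumed; the list mirrors "Required columns".
REQUIRED_COLUMNS = [
    "test_id", "test_name", "input_dataset", "execution_time_s", "result",
    "error_distance_m", "source_label", "covariance_95_semi_major_m",
    "gps_input_fix_type", "error_message",
]
ALLOWED_RESULTS = {"passed", "failed", "blocked"}


def write_report(rows, stream):
    """Write rows as CSV, rejecting incomplete rows and unknown result values."""
    writer = csv.DictWriter(stream, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        if row["result"] not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result: {row['result']!r}")
        writer.writerow(row)


def blocked_row(test_id, test_name, dataset, reason):
    """Build a row for a scenario whose prerequisite is unavailable."""
    row = dict.fromkeys(REQUIRED_COLUMNS, "")
    row.update(test_id=test_id, test_name=test_name, input_dataset=dataset,
               result="blocked", error_message=reason)
    return row


buffer = io.StringIO()
write_report(
    [blocked_row("NFT-RES-LIM-01", "Jetson Memory Budget", "", "Jetson hardware unavailable")],
    buffer,
)
REPORT_TEXT = buffer.getvalue()
```

Keeping the status vocabulary in one place makes it harder for a skipped hardware gate to be written as a pass by accident.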
@@ -0,0 +1,88 @@
+# Replay Geolocation And Confidence Tests
+
+**Task**: AZ-234_replay_geolocation_confidence_tests
+**Name**: Replay Geolocation And Confidence Tests
+**Description**: Implement blackbox tests for still-image geolocation, confidence/source-label output, and replay latency smoke.
+**Complexity**: 3 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-234
+**Epic**: AZ-218
+
+## Problem
+
+The project needs deterministic blackbox evidence that the 60-image replay path emits WGS84 frame-center estimates with required confidence fields and latency metrics.
+
+## Outcome
+
+- Still-image replay reports per-frame coordinate error and aggregate threshold results.
+- Every emitted estimate includes covariance, source label, and anchor-age fields.
+- Replay smoke latency and dropped-frame metrics are captured in the shared report format.
+
+## Scope
+
+### Included
+
+- FT-P-01 Still-Image Frame Center Geolocation.
+- FT-P-02 Position Confidence Output Contract.
+- NFT-PERF-01 Per-Frame Latency On Project Still Images.
+- CSV and Markdown evidence output for these scenarios.
+
+### Excluded
+
+- Synchronized VIO video/IMU replay.
+- Satellite-anchor VPR/local matching.
+- Jetson-only release-gate profiling.
+
+## Acceptance Criteria
+
+**AC-1: Still-image coordinates are validated**
+Given the 60-image project fixture and expected frame-center coordinates
+When the replay test runs
+Then per-frame WGS84 error is reported and aggregate 50 m / 20 m thresholds are evaluated.
+
+**AC-2: Confidence output contract is validated**
+Given emitted position estimates from the replay
+When the test inspects public output fields
+Then each estimate includes WGS84 coordinates, 95% covariance semi-major axis, source label, and anchor age.
+
+**AC-3: Replay latency is measured**
+Given the still-image replay runs at the configured smoke rate
+When processing completes
+Then capture-to-output latency and dropped-frame rate are recorded with pass/fail or blocked status.
+
+## Non-Functional Requirements
+
+**Performance**
+- Replay smoke evidence includes p50/p95/p99 latency and dropped-frame rate.
+
+**Reliability**
+- Missing or invalid expected-coordinate fixtures fail fixture validation before scenario execution.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Expected-coordinate loader validation | Invalid coordinates are rejected before replay |
+| AC-2 | Report field validation | Missing confidence/source fields fail the scenario |
+| AC-3 | Latency metric aggregation | p50/p95/p99 and dropped-frame metrics are emitted |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | `project_60_still_images`, `expected_frame_centers` | FT-P-01 | >=80% within 50 m and >=50% within 20 m or explicit failure | Reliability |
+| AC-2 | Same replay output | FT-P-02 | 100% of emitted estimates include required confidence fields | Reliability |
+| AC-3 | Replay smoke run | NFT-PERF-01 | Latency and drop-rate metrics are recorded | Performance |
+
+## Constraints
+
+- Tests must use public replay input and output artifacts only.
+- Input fixtures must be mounted read-only.
+- Blocked prerequisites must be reported as `blocked`, not `passed`.
+
+## Risks & Mitigation
+
+**Risk 1: Calibration limits are mistaken for product failure**
+- *Risk*: Fixture limits can make absolute accuracy inconclusive.
+- *Mitigation*: Report the fixture source and threshold basis with each failure.
@@ -0,0 +1,89 @@
+# VIO Replay Performance Tests
+
+**Task**: AZ-235_vio_replay_performance_tests
+**Name**: VIO Replay Performance Tests
+**Description**: Implement synchronized video/IMU replay tests for VIO output, covariance evidence, and replay performance metrics.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure, AZ-240_native_vio_backend_integration
+**Component**: Blackbox Tests
+**Tracker**: AZ-235
+**Epic**: AZ-218
+
+## Problem
+
+The runtime needs blackbox evidence that synchronized navigation video and flight-controller telemetry can drive VIO/wrapper output with honest confidence and measurable performance.
+
+This test task must run after AZ-240 so it validates the real native VIO path rather than the deterministic scaffold.
+
+## Outcome
+
+- Derkachi video/telemetry fixture alignment is validated before replay.
+- Synchronized replay produces frame-by-frame output or a clear blocked/failure reason.
+- Latency, completion rate, memory, trajectory comparison, and calibration-gated checks are reported.
+
+## Scope
+
+### Included
+
+- FT-P-03 BASALT VIO Replay With Synchronized Video/Telemetry.
+- NFT-PERF-02 BASALT + Wrapper Replay Latency.
+- Public/representative dataset prerequisite reporting.
+
+### Excluded
+
+- Satellite-anchor local verification.
+- SITL spoofing/failsafe scenarios.
+- Thermal/endurance release gates.
+
+## Acceptance Criteria
+
+**AC-1: Replay fixture alignment is validated**
+Given the Derkachi MP4 and telemetry CSV
+When fixture validation runs
+Then duration, frame-to-telemetry ratio, and timestamp monotonicity are verified before replay.
+
+**AC-2: Synchronized replay emits estimates**
+Given a valid synchronized video/IMU replay fixture
+When replay executes
+Then estimates are emitted frame-by-frame with source labels, covariance, and segment evidence.
+
+**AC-3: VIO performance evidence is reported**
+Given replay completed or blocked
+When reporting finishes
+Then latency, completion rate, memory, and calibration/public-dataset prerequisite status are written.
+
+## Non-Functional Requirements
+
+**Performance**
+- Reports include per-frame latency and memory metrics where the environment can measure them.
+
+**Reliability**
+- Calibration-gated absolute accuracy checks must be marked explicitly instead of silently passing.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Video/telemetry validator | Invalid duration or timestamp alignment blocks replay |
+| AC-2 | Replay result parser | Missing per-frame confidence fields fail the scenario |
+| AC-3 | Calibration gate reporting | Missing calibration/public data is reported as blocked |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | `derkachi_video_telemetry` | FT-P-03 fixture validation | Fixture accepted only when alignment rules pass | Reliability |
+| AC-2 | Valid synchronized replay | FT-P-03 output | Continuous estimates for normal overlapping segments or explicit degradation | Reliability |
+| AC-3 | Replay performance run | NFT-PERF-02 | Latency, completion rate, and memory evidence are recorded | Performance |
+
+## Constraints
+
+- Tests must not import BASALT/OpenVINS/Kimera internals directly.
+- Public/representative datasets are optional prerequisites and may produce blocked results.
+- Raw input video and telemetry fixtures remain read-only.
+
+## Risks & Mitigation
+
+**Risk 1: Hardware or dataset prerequisites are unavailable**
+- *Risk*: The scenario cannot produce final accuracy evidence locally.
+- *Mitigation*: Emit blocked results with exact missing prerequisite and continue other scenario groups.
@@ -0,0 +1,102 @@
+# Satellite Anchor Cache Tests
+
+**Task**: AZ-236_satellite_anchor_cache_tests
+**Name**: Satellite Anchor Cache Tests
+**Description**: Implement blackbox, security, and performance tests for satellite-anchor retrieval, local verification, cache integrity, and no in-flight external access.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
+**Component**: Blackbox Tests
+**Tracker**: AZ-236
+**Epic**: AZ-218
+
+## Problem
+
+Satellite anchors and cache fixtures are safety-critical: invalid, stale, poisoned, or externally fetched data must not become trusted localization output.
+
+This test task must run after AZ-241 and AZ-242 so it validates real local VPR retrieval and real anchor feature matching rather than scaffold evidence gates.
+
+## Outcome
+
+- Accepted anchors include retrieval, matching, geometry, freshness, and provenance evidence.
+- Invalid/stale/poisoned cache fixtures cannot produce trusted anchors or trusted generated tiles.
+- No in-flight Satellite Service or provider access occurs when cache data is missing.
+
+## Scope
+
+### Included
+
+- FT-P-04 Satellite Service And Anchor Verification.
+- FT-N-01 Repetitive Or Low-Texture Imagery.
+- FT-N-03 Invalid Or Stale Satellite Cache.
+- NFT-PERF-03 Relocalization Trigger Path Latency.
+- NFT-RES-04 Tile Cache Freshness Degradation.
+- NFT-SEC-01 Signed Cache Manifest Enforcement.
+- NFT-SEC-02 Cache Poisoning Write Gate.
+- NFT-SEC-04 No In-Flight Satellite Provider Access.
+- NFT-RES-LIM-03 Satellite Cache Storage Budget.
+
+### Excluded
+
+- VIO synchronized replay.
+- MAVLink spoofing/failsafe behavior.
+- Jetson thermal endurance.
+
+## Acceptance Criteria
+
+**AC-1: Verified anchors include evidence**
+Given a valid local cache/index fixture and relocalization trigger
+When retrieval and verification run
+Then accepted anchors include candidate IDs, scores, MRE, inliers, covariance, and tile provenance.
+
+**AC-2: Unsafe candidates are rejected**
+Given low-texture, stale, unsigned, hash-mismatched, or low-resolution fixtures
+When anchor/cache tests run
+Then no invalid candidate emits a trusted `satellite_anchored` estimate or trusted generated tile.
+
+**AC-3: No in-flight external access occurs**
+Given flight-mode replay with missing cache data
+When relocalization is requested
+Then the system reports degraded/no-candidate behavior without satellite-provider or Suite service network calls.
+
+**AC-4: Cache and trigger-path metrics are reported**
+Given cache and relocalization scenarios complete
+When reporting finishes
+Then latency, MRE, trust level, freshness, and storage-budget evidence are written.
+
+## Non-Functional Requirements
+
+**Security**
+- Invalid cache data must not be trusted or promoted.
+
+**Performance**
+- Trigger-path latency and bounded top-K behavior are measured.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Anchor evidence parser | Required evidence fields are present |
+| AC-2 | Invalid cache fixture generator | Stale/unsigned/hash-mismatched fixtures are produced deterministically |
+| AC-3 | Network-block assertion | Unexpected external calls fail the scenario |
+| AC-4 | Cache metrics report | Latency, freshness, and storage metrics are present |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Public/cache fixture | FT-P-04 | Accepted anchors meet MRE/evidence requirements | Performance |
+| AC-2 | Ambiguous and invalid cache fixtures | FT-N-01, FT-N-03, NFT-SEC-01, NFT-SEC-02 | 0 unsafe trusted outputs | Security |
+| AC-3 | Network-blocked flight-mode replay | NFT-SEC-04 | Missing cache causes degraded behavior, not fetch | Security |
+| AC-4 | Relocalization/cache runs | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Metrics and storage evidence are recorded | Performance |
+
+## Constraints
+
+- Tests must use local preloaded cache/index fixtures only.
+- External network access during flight-mode scenarios is a failure.
+- VPAir and UZH FPV licensing must be respected before use as commercial acceptance evidence.
+
+## Risks & Mitigation
+
+**Risk 1: Dataset licensing blocks final anchor evidence**
+- *Risk*: Public dataset terms prevent commercial acceptance use.
+- *Mitigation*: Mark dataset-specific checks blocked and keep generated cache fixtures for deterministic security coverage.
@@ -0,0 +1,94 @@
+# MAVLink Blackout Spoofing Tests
+
+**Task**: AZ-237_mavlink_blackout_spoofing_tests
+**Name**: MAVLink Blackout Spoofing Tests
+**Description**: Implement SITL/replay tests for visual blackout, spoofed GPS, MAVLink source validation, degraded covariance, no-fix thresholds, and QGC status.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-237
+**Epic**: AZ-218
+
+## Problem
+
+The system must prove that spoofed GPS and unauthorized MAVLink messages cannot override estimator state during visual blackout or degraded operation.
+
+## Outcome
+
+- Blackout and spoofing traces drive visible degraded-mode transitions.
+- Covariance, `GPS_INPUT`, QGC status, and FDR evidence match the safety thresholds.
+- Unauthorized MAVLink sources are rejected and recorded.
+
+## Scope
+
+### Included
+
+- FT-N-02 GPS Spoofing During Total Visual Blackout.
+- NFT-RES-01 Total Visual Blackout With GPS Spoofing.
+- NFT-SEC-03 MAVLink Source And Spoofing Rejection.
+
+### Excluded
+
+- Still-image geolocation accuracy.
+- Satellite-anchor cache poisoning.
+- Cold-start and restart trials.
+
+## Acceptance Criteria
+
+**AC-1: Blackout transitions to dead reckoning**
+Given a replay/SITL trace with total camera blackout and spoofed GPS
+When the scenario runs
+Then the system enters `dead_reckoned` mode within the required frame or timing threshold.
+
+**AC-2: Degraded output thresholds are enforced**
+Given blackout continues beyond configured thresholds
+When estimates are emitted
+Then covariance grows monotonically and `GPS_INPUT` fields degrade to no-fix/failsafe values at the specified limits.
+
+**AC-3: Spoofed or unauthorized MAVLink inputs are rejected**
+Given spoofed real-GPS measurements or unauthorized MAVLink source IDs
+When messages arrive during normal or blackout operation
+Then no confident position estimate is produced from those inputs.
+
+**AC-4: Operator and FDR evidence is visible**
+Given degraded-mode transitions occur
+When reporting completes
+Then QGC status and FDR evidence show promotion, demotion, blackout, and failsafe events at expected rates.
+
+## Non-Functional Requirements
+
+**Safety**
+- Spoofed GPS must not be promoted during blackout without the documented recovery gates.
+
+**Reliability**
+- Missing SITL prerequisites are reported as blocked with exact setup evidence.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Scenario trigger builder | Blackout and spoofing events are generated deterministically |
+| AC-2 | Threshold assertion logic | Fix type, covariance, and `horiz_accuracy` thresholds are checked |
+| AC-3 | MAVLink source filter assertion | Unauthorized source messages fail the scenario |
+| AC-4 | Status/FDR parser | Expected status events and rates are validated |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | SITL or replay spoofing trace | FT-N-02, NFT-RES-01 | Dead-reckoned transition within timing threshold | Safety |
+| AC-2 | Continued blackout | FT-N-02, NFT-RES-01 | Monotonic covariance and no-fix/failsafe fields | Safety |
+| AC-3 | Unauthorized/spoofed MAVLink messages | NFT-SEC-03 | No confident estimate from bad source | Safety |
+| AC-4 | QGC/FDR outputs | FT-N-02, NFT-SEC-03 | Status and evidence are visible and rate-limited | Reliability |
+
+## Constraints
+
+- ArduPilot Plane SITL is the authoritative autopilot target.
+- v1 asserts `GPS_INPUT` output and intentional absence of ODOMETRY.
+- Tests must not depend on Mission Planner or PX4 behavior.
+
+## Risks & Mitigation
+
+**Risk 1: SITL setup varies by environment**
+- *Risk*: Local runs may not have SITL installed or configured.
+- *Mitigation*: Report blocked prerequisites clearly and keep replay-level assertions runnable where possible.
@@ -0,0 +1,95 @@
+# Cold Start Restart Tests
+
+**Task**: AZ-238_cold_start_restart_tests
+**Name**: Cold Start Restart Tests
+**Description**: Implement tests for cold start, companion restart, sharp-turn/disconnected relocalization, and first-fix resource spikes.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-238
+**Epic**: AZ-218
+
+## Problem
+
+The test suite must prove that the runtime recovers from disconnected visual segments and companion restarts without hiding missing prerequisites or unsafe degraded behavior.
+
+## Outcome
+
+- Sharp-turn/disconnected-segment scenarios trigger relocalization or explicit degraded output.
+- Companion restart scenarios measure first valid output timing and FDR evidence.
+- Cold-start trials record first-fix latency and resource spikes.
+
+## Scope
+
+### Included
+
+- NFT-RES-02 Sharp Turn And Disconnected Segment Relocalization.
+- NFT-RES-03 Companion Computer Restart Mid-Flight.
+- NFT-PERF-04 Cold Boot Time To First Fix.
+- NFT-RES-LIM-05 Cold Start Resource Spike.
+
+### Excluded
+
+- Long thermal endurance.
+- FDR 8-hour rollover load.
+- Cache poisoning and no-fetch security tests.
+
+## Acceptance Criteria
+
+**AC-1: Disconnected segments trigger relocalization**
+Given a sharp-turn or disconnected segment fixture
+When replay reaches the low-overlap transition
+Then relocalization is requested and the system either reconnects via verified anchor or reports degraded status.
+
+**AC-2: Companion restart recovery is measured**
+Given a replay/SITL mission in progress
+When the GPS-denied service is restarted
+Then first valid output timing, FC-state handoff behavior, and FDR restart evidence are recorded.
+
+**AC-3: Cold-start trials report first-fix timing**
+Given cold-start conditions and local cache/index prerequisites
+When 50 trials run or are blocked
+Then the p95 time-to-first-fix result or exact blocked prerequisite is reported.
+
+**AC-4: Cold-start resource spikes are captured**
+Given initialization begins
+When engines/indexes/cache are loaded
+Then peak memory and initialization-stage timing are recorded where measurable.
+
+## Non-Functional Requirements
+
+**Reliability**
+- Missing calibration, public datasets, or hardware prerequisites must not be treated as passing.
+
+**Performance**
+- First-fix timing and peak memory are reported with percentile summaries where enough trials run.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Relocalization trigger assertion | Missing-position thresholds trigger request checks |
+| AC-2 | Restart report parser | Restart and first-output events are present |
+| AC-3 | Trial aggregation | p95 first-fix summary or blocked reason is emitted |
+| AC-4 | Resource metric parser | Peak memory and stage timings are captured |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Sharp-turn/disconnected replay | NFT-RES-02 | Verified relocalization or degraded evidence | Reliability |
+| AC-2 | Mission restart trace | NFT-RES-03 | First valid output and FDR restart evidence | Reliability |
+| AC-3 | Cold-start harness | NFT-PERF-04 | p95 first fix <30 s or blocked prerequisite | Performance |
+| AC-4 | Cold-start resource monitoring | NFT-RES-LIM-05 | Peak memory <8 GB or blocked/failure evidence | Performance |
+
+## Constraints
+
+- Restart tests must preserve fixture read-only guarantees.
+- Trial loops must be bounded and report partial results if interrupted.
+- Hardware-only assertions must be clearly marked when not runnable locally.
+
+## Risks & Mitigation
+
+**Risk 1: Long cold-start trials are expensive**
+- *Risk*: Full 50-run evidence may not be practical on every PR.
+- *Mitigation*: Support smoke mode for PRs and full mode for release gates, with clear report labels.
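A minimal sketch of the AC-3 trial aggregation described above, assuming a nearest-rank p95 and the 30 s limit from the blackbox table. The names `FirstFixSummary`, `p95_nearest_rank`, `summarize_first_fix`, and the `partial` status label are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FirstFixSummary:
    status: str               # "passed", "failed", "partial", or "blocked" (labels assumed)
    trials: int               # number of trials that produced a timing
    p95_s: Optional[float]    # None when no timing is available
    reason: Optional[str] = None


def p95_nearest_rank(samples: Sequence[float]) -> float:
    """Nearest-rank p95: smallest sample with at least 95% of samples at or below it."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math, 1-based
    return ordered[rank - 1]


def summarize_first_fix(
    times_s: Sequence[float],
    required_trials: int = 50,
    limit_s: float = 30.0,
    blocked_reason: Optional[str] = None,
) -> FirstFixSummary:
    # A missing prerequisite is reported as blocked and never counted as a pass.
    if blocked_reason is not None:
        return FirstFixSummary("blocked", len(times_s), None, blocked_reason)
    # An interrupted or smoke-mode run reports partial results with a clear label.
    if len(times_s) < required_trials:
        p95 = p95_nearest_rank(times_s) if times_s else None
        reason = f"{len(times_s)} of {required_trials} trials ran"
        return FirstFixSummary("partial", len(times_s), p95, reason)
    p95 = p95_nearest_rank(times_s)
    status = "passed" if p95 < limit_s else "failed"
    return FirstFixSummary(status, len(times_s), p95)
```

With 50 trials the nearest-rank p95 is the 48th smallest timing, so up to two slow outliers can exceed the limit without failing the gate.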
@@ -0,0 +1,94 @@
+# Jetson Resource Endurance Tests
+
+**Task**: AZ-239_jetson_resource_endurance_tests
+**Name**: Jetson Resource Endurance Tests
+**Description**: Implement release-gate resource and endurance tests for Jetson memory, thermal/power behavior, and FDR rollover.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-239
+**Epic**: AZ-218
+
+## Problem
+
+Release readiness requires hardware/resource evidence that cannot be proven by ordinary unit tests or short local replay runs.
+
+## Outcome
+
+- Jetson memory and thermal/power metrics are captured where hardware is available.
+- FDR 8-hour synthetic load verifies rollover, storage cap, and retained payload classes.
+- Hardware-only prerequisites are reported as blocked when not available.
+
+## Scope
+
+### Included
+
+- NFT-RES-LIM-01 Jetson Memory Budget.
+- NFT-RES-LIM-02 Thermal And Power Envelope.
+- NFT-RES-LIM-04 Flight Data Recorder Rollover.
+
+### Excluded
+
+- Still-image replay accuracy.
+- Satellite anchor/cache security tests.
+- Cold-start first-fix trials.
+
+## Acceptance Criteria
+
+**AC-1: Jetson memory budget is measured**
+Given Jetson hardware or equivalent production target is available
+When sustained replay and trigger-path workload runs
+Then CPU/GPU shared memory, process RSS, CUDA allocations, and OOM/throttle status are recorded.
+
+**AC-2: Thermal and power endurance is validated or blocked**
+Given thermal test prerequisites are available
+When the sustained 25 W workload runs
+Then throttle flags, temperatures, clocks, and latency are recorded for the required duration; otherwise the run reports blocked prerequisites.
+
+**AC-3: FDR rollover is validated**
+Given an 8-hour synthetic mission load
+When FDR output reaches rollover conditions
+Then storage remains within the cap, rollover is logged, and no payload class is silently dropped.
+
+**AC-4: Evidence artifacts are complete**
+Given each resource/endurance scenario has completed or been blocked
+When reporting finishes
+Then metrics, duration, environment, status, and artifact paths are written.
+
+## Non-Functional Requirements
+
+**Performance**
+- Resource evidence must include duration and sampling interval.
+
+**Reliability**
+- Hardware-unavailable results are `blocked`, not `passed`.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Resource metric parser | Memory and throttle fields are present |
+| AC-2 | Blocked prerequisite reporter | Missing hardware/thermal setup records blocked status |
+| AC-3 | FDR rollover report parser | Storage, rollover, and payload-class fields are validated |
+| AC-4 | Evidence manifest writer | Artifact paths and run metadata are present |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Jetson/prod-equivalent hardware | NFT-RES-LIM-01 | Peak memory <8 GB or explicit failure | Performance |
+| AC-2 | Thermal/power test setup | NFT-RES-LIM-02 | No throttle over required duration or blocked/failure | Performance |
+| AC-3 | Synthetic 8-hour mission load | NFT-RES-LIM-04 | FDR cap and rollover behavior are evidenced | Reliability |
+| AC-4 | Resource/endurance reports | All included scenarios | Complete artifact manifest and status | Reliability |
+
+## Constraints
+
+- These tests are release-gate oriented and may be skipped or blocked in ordinary PR mode.
+- Raw frames must not be retained during FDR load tests.
+- Resource tests must not write outside run-scoped output directories.
+
+## Risks & Mitigation
+
+**Risk 1: Hardware gates are unavailable during local development**
+- *Risk*: Developers cannot run full evidence locally.
+- *Mitigation*: Support blocked status and separate PR smoke mode from release-gate execution.
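The `blocked`-versus-`passed` rule and the AC-4 evidence fields in this task can be sketched as follows. This is an illustration under assumptions: `ScenarioEvidence`, `classify_memory_run`, and `render_manifest` are hypothetical names, and the manifest is returned as a JSON string so that the caller decides which run-scoped directory receives it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class ScenarioEvidence:
    scenario: str                 # e.g. "NFT-RES-LIM-01"
    status: str                   # "passed", "failed", or "blocked"
    duration_s: float
    sampling_interval_s: float
    environment: str
    metrics: Dict[str, float] = field(default_factory=dict)
    artifact_paths: List[str] = field(default_factory=list)
    blocked_reason: Optional[str] = None


def classify_memory_run(
    hardware_available: bool,
    peak_memory_gb: Optional[float],
    limit_gb: float = 8.0,
) -> str:
    # Unavailable hardware or a missing measurement is blocked, never passed.
    if not hardware_available or peak_memory_gb is None:
        return "blocked"
    return "passed" if peak_memory_gb < limit_gb else "failed"


def render_manifest(records: Sequence[ScenarioEvidence]) -> str:
    # Refuse to emit a blocked record that does not name its prerequisite.
    for record in records:
        if record.status == "blocked" and not record.blocked_reason:
            raise ValueError(f"{record.scenario}: blocked status needs a reason")
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)
```

Keeping classification separate from rendering lets the unit tests for AC-2 and AC-4 run without Jetson hardware.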
@@ -0,0 +1,95 @@
+# Native VIO Backend Integration
+
+**Task**: AZ-240_native_vio_backend_integration
+**Name**: Native VIO Backend Integration
+**Description**: Replace the deterministic VIO placeholder path with a real native backend integration boundary for representative replay.
+**Complexity**: 5 points
+**Dependencies**: AZ-228_vio_adapter
+**Component**: VIO Adapter
+**Tracker**: AZ-240
+**Epic**: AZ-213
+
+## Problem
+
+The current VIO adapter satisfies the public contract with deterministic scaffold behavior, but it does not exercise a real native VIO backend for synchronized replay.
+
+## Outcome
+
+- A production-capable native VIO bridge is available behind the existing `VioBackend` protocol.
+- Backend-specific setup remains isolated from the public VIO adapter boundary.
+- Existing timestamp mismatch, tracking-loss, health, and no-WGS84-authority behavior is preserved.
+
+## Scope
+
+### Included
+
+- Native/backend bridge implementation behind `VioBackend`.
+- Backend initialization and runtime failure mapping into explicit health/error states.
+- Replay-driven relative pose, velocity, bias, tracking quality, and covariance output.
+- Tests that prove the real backend path is selected when configured.
+
+### Excluded
+
+- Absolute WGS84 authority or safety fusion.
+- Satellite-anchor fallback logic.
+- Direct test imports of backend internals.
+
+## Dependencies
+
+### Document Dependencies
+
+- `_docs/02_document/components/02_vio_adapter/description.md`
+- `_docs/02_document/contracts/shared/runtime_contracts.md`
+- `_docs/02_document/contracts/shared/geometry_time_sync.md`
+- `_docs/02_document/contracts/shared/config_errors_telemetry.md`
+
+## Acceptance Criteria
+
+**AC-1: Native backend path emits VIO state**
+Given synchronized replay frames and telemetry
+When VIO processing runs with the native backend enabled
+Then the adapter emits a relative VIO state packet from the native path.
+
+**AC-2: Backend failures are explicit**
+Given backend initialization or runtime failure
+When VIO processing or health reporting runs
+Then the adapter surfaces an explicit error and degraded or failed health state.
+
+**AC-3: Existing safety boundaries remain intact**
+Given timestamp mismatch, low tracking quality, or successful native output
+When the adapter returns a result
+Then degraded behavior, tracking quality, and absence of WGS84 authority remain intact.
+
+## Non-Functional Requirements
+
+**Performance**
+- Replay execution must expose latency and memory metrics for later Jetson profiling gates.
+
+**Reliability**
+- Backend failures must not be hidden behind deterministic fallback success.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Configured native backend path | Native estimate is used, not deterministic fallback |
+| AC-2 | Backend init/runtime failure | Explicit error and degraded/failed health |
+| AC-3 | Timestamp/quality boundaries | Existing degraded/no-WGS84 behavior preserved |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Derkachi or representative synchronized replay | Native VIO replay path | Relative estimates are emitted, or the run is blocked with a real prerequisite reason | Performance |
+
+## Constraints
+
+- Keep backend-specific dependencies behind the `vio_adapter` native boundary.
+- Do not make the VIO adapter the safety or WGS84 authority.
+- If required native packages are unavailable locally, tests must skip or block with explicit prerequisite evidence rather than passing through the deterministic fallback.
+
+## Risks & Mitigation
+
+**Risk 1: Native dependency unavailable in local CI**
+- *Risk*: The real backend cannot run on all developer machines.
+- *Mitigation*: Provide dependency-gated tests that fail only when the backend is configured but broken, and report blocked prerequisites for full replay gates.
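The reliability rule in this task (a configured but broken native backend must not fall through to the deterministic path) could look roughly like the sketch below. Only the protocol name `VioBackend` comes from the spec; its `estimate` method, `RelativeState`, `BackendUnavailable`, `DeterministicBackend`, and `select_backend` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class RelativeState:
    frame_id: int
    source: str              # "native" or "deterministic"
    tracking_quality: float


class VioBackend(Protocol):
    # Hypothetical method; the real protocol surface is defined in the repository.
    def estimate(self, frame_id: int) -> RelativeState: ...


class BackendUnavailable(RuntimeError):
    """Raised when the native backend is configured but cannot be loaded."""


class DeterministicBackend:
    def estimate(self, frame_id: int) -> RelativeState:
        return RelativeState(frame_id, "deterministic", 1.0)


def select_backend(native_enabled: bool, load_native: Callable[[], VioBackend]) -> VioBackend:
    # The deterministic path is used only when the native path is not requested.
    if not native_enabled:
        return DeterministicBackend()
    try:
        return load_native()
    except ImportError as exc:
        # Surface the failure explicitly instead of silently falling back.
        raise BackendUnavailable(f"native VIO backend not loadable: {exc}") from exc
```

A test harness can catch `BackendUnavailable` and record a blocked prerequisite, which keeps a missing native package distinguishable from a passing run.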
@@ -0,0 +1,95 @@
+# Real Satellite VPR Descriptor Retrieval
+
+**Task**: AZ-241_real_satellite_vpr_descriptor_retrieval
+**Name**: Real Satellite VPR Descriptor Retrieval
+**Description**: Replace the tuple-similarity satellite retrieval scaffold with the real local descriptor/index retrieval path promised by the Satellite Service design.
+**Complexity**: 5 points
+**Dependencies**: AZ-230_satellite_service_vpr_retrieval
+**Component**: Satellite Service
+**Tracker**: AZ-241
+**Epic**: AZ-214
+
+## Problem
+
+The current Satellite Service can load in-memory descriptor records and rank them with local tuple similarity, but it does not yet integrate the real offline descriptor/index retrieval path.
+
+## Outcome
+
+- Local mission cache descriptor/index packages can be loaded by the runtime retrieval path.
+- Retrieval uses the selected CPU FAISS/DINOv2-VLAD-compatible boundary where available.
+- Freshness filtering, bounded top-K output, descriptor-fidelity checks, and no in-flight network behavior remain intact.
+
+## Scope
+
+### Included
+
+- Local descriptor/index package loading from the offline cache boundary.
+- Real local VPR retrieval implementation behind the public Satellite Service API.
+- Explicit degraded/no-candidate/index failure behavior.
+- Tests that distinguish the real retrieval path from the current tuple-similarity scaffold.
+
+### Excluded
+
+- Local feature matching, RANSAC, or anchor acceptance.
+- In-flight provider or Suite service calls.
+- TensorRT/ONNX optimization unless descriptor-fidelity gates are in place.
+
+## Dependencies
+
+### Document Dependencies
+
+- `_docs/02_document/components/04_satellite_retrieval/description.md`
+- `_docs/02_document/contracts/shared/runtime_contracts.md`
+- `_docs/02_document/contracts/shared/config_errors_telemetry.md`
+- `_docs/02_document/components/06_cache_tile_lifecycle/description.md`
+
+## Acceptance Criteria
+
+**AC-1: Real local index readiness is reported**
+Given a valid local descriptor/index package
+When the Satellite Service loads the package
+Then readiness reflects the real local index and loaded record count.
+
+**AC-2: Real top-K retrieval returns candidates**
+Given a relocalization request and loaded local index
+When retrieval runs
+Then bounded candidates come from the real local descriptor/index path with scores, footprints, and freshness state.
+
+**AC-3: Missing or invalid indexes degrade safely**
+Given missing, corrupt, incompatible, or empty local index data
+When retrieval runs
+Then the result is explicit degraded/no-candidate behavior without unsafe anchors or network calls.
+
+## Non-Functional Requirements
+
+**Performance**
+- Retrieval remains trigger-based and exposes latency metrics for Jetson profiling.
+
+**Security**
+- Retrieval must not perform in-flight provider or Suite service calls.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Real index package load | Ready status references loaded real index data |
+| AC-2 | Query against fixture index | Candidates come from the real retrieval path |
+| AC-3 | Missing/corrupt index | Explicit degraded/no-candidate result |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-2 | Public/cache fixture with descriptor index | VPR recall and top-K policy | Candidate bounds, freshness, and latency evidence are reported | Performance |
+
+## Constraints
+
+- Use only local preloaded cache/index data during flight-mode retrieval.
+- Keep optional optimized engines behind descriptor-fidelity gates.
+- Missing native/index prerequisites must be reported as blocked, not silently passed by the scaffold path.
+
+## Risks & Mitigation
+
+**Risk 1: Heavy native/index dependencies do not run in ordinary CI**
+- *Risk*: The real retrieval path needs packages or data unavailable in local CI.
+- *Mitigation*: Keep fast contract tests for package parsing and dependency-gated integration tests for real index execution.
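To illustrate the retrieval contract in AC-2 and AC-3 (freshness filtering, bounded top-K, explicit degraded or no-candidate results), here is a small stand-in that ranks by brute-force cosine similarity in pure Python. It is not the FAISS/DINOv2-VLAD path named in the spec, and `IndexRecord`, `RetrievalResult`, `retrieve_top_k`, and the status strings are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class IndexRecord:
    tile_id: str
    descriptor: Tuple[float, ...]
    fresh: bool


@dataclass(frozen=True)
class RetrievalResult:
    status: str                                 # "ok", "no_candidate", or "degraded"
    candidates: Tuple[Tuple[str, float], ...]   # (tile_id, score), best first


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query: Sequence[float],
    records: Sequence[IndexRecord],
    k: int = 5,
) -> RetrievalResult:
    # Missing or dimension-incompatible index data degrades explicitly.
    if not records or any(len(r.descriptor) != len(query) for r in records):
        return RetrievalResult("degraded", ())
    # Stale tiles are filtered out before ranking.
    fresh = [r for r in records if r.fresh]
    if not fresh:
        return RetrievalResult("no_candidate", ())
    scored = sorted(
        ((r.tile_id, cosine(query, r.descriptor)) for r in fresh),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return RetrievalResult("ok", tuple(scored[:k]))  # output is bounded to k
```

The function touches only the in-memory records it is given, which mirrors the constraint that flight-mode retrieval uses local preloaded data only.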
@@ -0,0 +1,94 @@
+# Real Anchor Feature Matching And RANSAC
+
+**Task**: AZ-242_real_anchor_feature_matching_ransac
+**Name**: Real Anchor Feature Matching And RANSAC
+**Description**: Replace the scaffold that only gates precomputed evidence with real local feature matching and geometry verification behind the Anchor Verification boundary.
+**Complexity**: 5 points
+**Dependencies**: AZ-231_anchor_verification_matching, AZ-241_real_satellite_vpr_descriptor_retrieval
+**Component**: Anchor Verification
+**Tracker**: AZ-242
+**Epic**: AZ-215
+
+## Problem
+
+The current Anchor Verification component can classify precomputed `MatchEvidence`, but it does not yet run real feature extraction, matching, homography estimation, or RANSAC/USAC geometry checks.
+
+## Outcome
+
+- Approved matcher profiles can compute correspondence evidence from frame imagery and candidate tile data.
+- Geometry verification produces inliers, MRE, homography/provenance, runtime, and rejection reasons.
+- Existing safety gates continue to reject unsafe candidates before any anchor is trusted.
+
+## Scope
+
+### Included
+
+- Matcher bridge for approved ALIKED/DISK + LightGlue and SIFT/ORB baseline profiles where dependencies are available.
+- Homography and RANSAC/USAC evidence generation from local imagery/tile fixtures.
+- Integration with existing `GeometryGatedAnchorVerifier` decision output.
+- Benchmark reporting from actual matching paths.
+
+### Excluded
+
+- VPR candidate ranking.
+- Safety wrapper fusion/promotion policy.
+- Per-frame steady-state VIO hot path execution.
+
+## Dependencies
+
+### Document Dependencies
+
+- `_docs/02_document/components/05_anchor_verification/description.md`
+- `_docs/02_document/contracts/shared/runtime_contracts.md`
+- `_docs/02_document/components/04_satellite_retrieval/description.md`
+
+## Acceptance Criteria
+
+**AC-1: Matching path computes evidence**
+Given a usable frame and fresh candidate tile
+When anchor verification runs
+Then matcher evidence is computed from local imagery and includes inliers, MRE, homography, provenance, and runtime.
+
+**AC-2: Unsafe candidates are rejected**
+Given low inliers, high reprojection error, stale or untrusted provenance, or geometry failure
+When verification runs
+Then no accepted anchor decision is emitted for that candidate.
+
+**AC-3: Real matcher benchmark is reportable**
+Given configured matcher profiles and fixture inputs
+When benchmark runs
+Then runtime and quality metrics are reported from actual matching paths.
+
+## Non-Functional Requirements
+
+**Performance**
+- Learned matching remains trigger-based and benchmarked separately from the VIO hot path.
+
+**Reliability**
+- Missing matcher dependencies or fixture data must be explicit blocked prerequisites, not passing scaffold behavior.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Fixture matching path | Evidence is computed from imagery/tile input |
+| AC-2 | Bad geometry/provenance | Candidate is rejected with reason |
+| AC-3 | Matcher benchmark | Runtime and quality metrics come from real path |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Aerial/cache fixture pair | Anchor verification path | Accepted anchors meet MRE/inlier gates with real evidence | Performance |
+
+## Constraints
+
+- Keep native feature extraction and RANSAC acceleration under `anchor_verification`.
+- Do not trust precomputed evidence in production paths without provenance checks.
+- SuperPoint or other legally restricted models remain excluded unless explicitly approved.
+
+## Risks & Mitigation
+
+**Risk 1: False anchor acceptance**
+- *Risk*: Real cross-domain matching can produce plausible but unsafe geometry.
+- *Mitigation*: Preserve freshness, provenance, inlier, MRE, and downstream safety gates; add negative fixtures for low-texture and stale-cache cases.
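The AC-2 rejection rule can be sketched as a gate that collects every failing check, so that a rejected candidate always carries its reasons. `MatchEvidence` is named in the spec, but the fields shown here, the `AnchorDecision` type, the `gate_anchor` function, and the thresholds (30 inliers, 2.0 px MRE) are illustrative assumptions and not the project's values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MatchEvidence:
    # Field names are assumed for this sketch.
    inliers: int
    mre_px: float             # mean reprojection error in pixels
    provenance_trusted: bool
    tile_fresh: bool
    geometry_ok: bool         # homography estimation succeeded


@dataclass(frozen=True)
class AnchorDecision:
    accepted: bool
    reasons: Tuple[str, ...]  # empty when accepted


def gate_anchor(
    evidence: MatchEvidence,
    min_inliers: int = 30,      # hypothetical threshold
    max_mre_px: float = 2.0,    # hypothetical threshold
) -> AnchorDecision:
    reasons: List[str] = []
    if not evidence.geometry_ok:
        reasons.append("geometry_failure")
    if evidence.inliers < min_inliers:
        reasons.append("low_inliers")
    if evidence.mre_px > max_mre_px:
        reasons.append("high_reprojection_error")
    if not evidence.tile_fresh:
        reasons.append("stale_tile")
    if not evidence.provenance_trusted:
        reasons.append("untrusted_provenance")
    # Any single failing check is enough to reject the candidate.
    return AnchorDecision(accepted=not reasons, reasons=tuple(reasons))
```

Reporting all failing checks, instead of stopping at the first, makes negative fixtures for low-texture and stale-cache cases easier to assert against.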
@@ -0,0 +1,74 @@
+# Implementation Report
+
+**Feature**: Product runtime
+**Cycle**: 1
+**Date**: 2026-05-04
+**Status**: Superseded — remediation pending
+
+## Summary
+
+Greenfield product implementation completed the initial GPS-denied onboard runtime scaffold and component behavior tasks. Later product verification identified required remediation work before the flow can advance to testability revision.
+
+- Total tasks completed: 14
+- Completed batches: 9
+- Blocked tasks: 0
+- Code review verdicts: PASS for all batch reviews and cumulative review
+- Final test run: 49 passed
+
+## Completed Tasks
+
+| Task | Name | Batch | Status |
+|------|------|-------|--------|
+| AZ-219 | initial_structure | 1 | Done |
+| AZ-220 | shared_runtime_contracts | 2 | Done |
+| AZ-221 | shared_geometry_time_sync | 3 | Done |
+| AZ-222 | runtime_config_errors_telemetry | 3 | Done |
+| AZ-223 | camera_ingest_calibration | 4 | Done |
+| AZ-224 | mavlink_gcs_gateway | 4 | Done |
+| AZ-225 | tile_manager_cache_manifest | 4 | Done |
+| AZ-227 | fdr_event_recorder | 4 | Done |
+| AZ-226 | generated_tile_orthorectification | 5 | Done |
+| AZ-228 | vio_adapter | 6 | Done |
+| AZ-229 | satellite_service_sync | 6 | Done |
+| AZ-230 | satellite_service_vpr_retrieval | 7 | Done |
+| AZ-231 | anchor_verification_matching | 8 | Done |
+| AZ-232 | safety_anchor_state_machine | 9 | Done |
+
+## Batch Outcomes
+
+| Batch | Tasks | Code Review | Tests (cumulative suite) |
+|-------|-------|-------------|-------|
+| 1 | AZ-219_initial_structure | PASS | 5 passed |
+| 2 | AZ-220_shared_runtime_contracts | PASS | 11 passed |
+| 3 | AZ-221_shared_geometry_time_sync, AZ-222_runtime_config_errors_telemetry | PASS | 17 passed |
+| 4 | AZ-223_camera_ingest_calibration, AZ-224_mavlink_gcs_gateway, AZ-225_tile_manager_cache_manifest, AZ-227_fdr_event_recorder | PASS | 29 passed |
+| 5 | AZ-226_generated_tile_orthorectification | PASS | 32 passed |
+| 6 | AZ-228_vio_adapter, AZ-229_satellite_service_sync | PASS | 38 passed |
+| 7 | AZ-230_satellite_service_vpr_retrieval | PASS | 42 passed |
+| 8 | AZ-231_anchor_verification_matching | PASS | 45 passed |
+| 9 | AZ-232_safety_anchor_state_machine | PASS | 49 passed |
+
+## Acceptance Coverage
+
+All acceptance criteria documented in the product implementation task specs are covered by tests recorded in the batch reports:
+
+- Shared contracts, configuration, errors, telemetry, geometry, and time-sync behavior are validated by shared unit tests.
+- Component runtime boundaries for camera ingest, MAVLink/GCS, tile management, FDR, VIO, Satellite Service, anchor verification, and safety/anchor state management are validated by component unit tests.
+- Safety-critical behavior for explicit errors, no raw-frame retention, no mid-flight Satellite Service calls, conservative generated-tile writes, rejected unsafe anchors, monotonic blackout degradation, and honest covariance is covered by the current unit suite.
+
+## Review Summary
+
+- Batch reviews: `_docs/03_implementation/reviews/batch_01_review.md` through `_docs/03_implementation/reviews/batch_09_review.md`
+- Cumulative review: `_docs/03_implementation/reviews/cumulative_review_batches_01-09_cycle1_report.md`
+- Auto-fix attempts: 0 across all batches
+- Stuck agents: none
+
+## Final Verification
+
+- `.venv/bin/python -m black --check src tests e2e/replay` passed.
+- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
+- `.venv/bin/python -m pytest` passed: 49 tests.
+
+## Next Step
+
+Autodev should remain at Step 7, Implement, until remediation tasks AZ-240 through AZ-242 are implemented and the Product Implementation Completeness Gate produces `_docs/03_implementation/implementation_completeness_cycle1_report.md` without unresolved `FAIL` classifications.
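The advancement condition stated above can be expressed as a small predicate. This is a sketch under assumptions: the function name, the `Done` task status string, and the shape of the gate classifications (a mapping from checked item to a label such as `PASS` or `FAIL`) are hypothetical, since the report format is not shown here.

```python
from typing import Mapping

REMEDIATION_TASKS = ("AZ-240", "AZ-241", "AZ-242")


def may_leave_step_7(
    task_status: Mapping[str, str],
    gate_classifications: Mapping[str, str],
) -> bool:
    # Every remediation task must be implemented.
    tasks_done = all(task_status.get(task) == "Done" for task in REMEDIATION_TASKS)
    # The completeness gate must have produced a report.
    gate_ran = bool(gate_classifications)
    # No classification in that report may remain FAIL.
    no_fail = all(label != "FAIL" for label in gate_classifications.values())
    return tasks_done and gate_ran and no_fail
```

An empty classification mapping is treated as "gate not run", so a missing report cannot be mistaken for a clean one.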
@@ -0,0 +1,65 @@
+# Code Review Report
+
+**Batch**: cumulative batches 01-09, cycle 1
+**Date**: 2026-05-04
+**Verdict**: PASS
+
+## Scope
+
+- Task specs reviewed: AZ-219 through AZ-232.
+- Batch reports reviewed: `_docs/03_implementation/batch_01_cycle1_report.md` through `_docs/03_implementation/batch_09_cycle1_report.md`.
+- Code scope reviewed: `src/`, `tests/`, and `e2e/replay`.
+- Architecture references reviewed: `_docs/02_document/architecture.md` and `_docs/02_document/module-layout.md`.
+
+## Findings
+
+| # | Severity | Category | File:Line | Title |
+|---|----------|----------|-----------|-------|
+| - | - | - | - | No findings |
+
+## Phase Results
+
+### Phase 1: Context Loading
+
+All 14 product implementation tasks, the project restrictions, the solution overview, module layout, architecture, and batch reports were reviewed.
+
+### Phase 2: Spec Compliance
+
+Every task acceptance criterion is covered by the per-batch reports and unit tests. The final full suite passed with 49 tests.
+
+### Phase 3: Code Quality
+
+Formatter and lint checks passed:
+
+- `.venv/bin/python -m black --check src tests e2e/replay`
+- `.venv/bin/python -m ruff check src tests e2e/replay`
+
+No dead imports, style errors, or obvious duplicated component-local contract shapes were found.
+
+### Phase 4: Security Quick-Scan
+
+No hardcoded secrets, `eval`, `exec`, shell subprocess usage, insecure deserialization, or sensitive-data logging patterns were found in `src/`.
+
+### Phase 5: Performance Scan
+
+The implemented code remains lightweight and trigger-oriented for the current scaffold/runtime-contract level. Heavy VPR, matching, Jetson, SITL, and endurance profiling remain release-gate work for later test implementation and deploy phases.
+
+### Phase 6: Cross-Task Consistency
+
+Shared DTOs and component interfaces are consistently consumed through public package surfaces. Batch-level reports show all dependencies were implemented before consumers.
+
+### Phase 7: Architecture Compliance
+
+Observed imports align with the component public API layout:
+
+- Runtime components import shared helpers and contracts through `shared/*` public modules.
+- Cross-component imports use package-level public exports such as `tile_manager`, not internal component files.
+- No component imports from `internal/`, `_*.py`, or native bridge paths owned by another component.
+
+No architecture baseline file exists, so no baseline delta section is required.
+
+## Verification
+
+- `.venv/bin/python -m black --check src tests e2e/replay` passed.
+- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
+- `.venv/bin/python -m pytest` passed: 49 tests.
@@ -0,0 +1,56 @@
+# Code Testability Assessment
+
+**Date**: 2026-05-04
+**Autodev step**: Greenfield Step 8 — Code Testability Revision
+**Outcome**: Code is testable — no changes needed
+
+## Scope Reviewed
+
+- Test specifications in `_docs/02_document/tests/`
+- Traceability matrix in `_docs/02_document/tests/traceability-matrix.md`
+- Runtime source under `src/`
+- Existing unit tests under `tests/`
+- Product implementation report `_docs/03_implementation/implementation_report_product_runtime_cycle1.md`
+
+## Testability Result
+
+The implemented product runtime can support the planned tests without a testability-focused refactor.
+
+- Runtime components expose public package-level APIs through `__init__.py`, `types.py`, and `interfaces.py`.
+- Component behavior is expressed through data models and class/protocol boundaries that can be constructed directly in tests.
+- External systems are represented as boundary objects or planned black-box fixtures, not hardwired network calls.
+- No direct filesystem, environment, subprocess, socket, HTTP, global singleton, or wall-clock usage was found in `src/` that would block deterministic tests.
+- Planned hardware, SITL, Jetson, and dataset dependencies belong in test harness tasks and can report `blocked` when prerequisites are unavailable.
+
+## Scenario Review
+
+| Scenario Area | Testability Assessment |
+|---------------|------------------------|
+| Unit/component tests | Current public classes and DTOs are directly constructible and already covered by 49 passing tests. |
+| Black-box replay | The planned harness can drive public frame, telemetry, cache, MAVLink, status, and FDR boundaries without importing runtime internals. |
+| VIO and anchor replay | Heavy BASALT, FAISS, and matcher dependencies can be represented by test harness fixtures or backend boundaries in test tasks. |
+| SITL/MAVLink tests | The MAVLink/GCS gateway exposes validation and status behavior without requiring live hardware for unit-level coverage. |
+| Jetson/resource tests | Hardware-specific release gates are environment-dependent and do not require runtime refactoring before test-task implementation. |
+| Security/cache tests | Cache, freshness, no-fetch, and generated-tile trust behavior is exposed through public component methods. |
+
+## Reviewed Test Artifacts
+
+- `_docs/02_document/tests/blackbox-tests.md`
+- `_docs/02_document/tests/e2e-test-suite.md`
+- `_docs/02_document/tests/environment.md`
+- `_docs/02_document/tests/performance-tests.md`
+- `_docs/02_document/tests/resilience-tests.md`
+- `_docs/02_document/tests/resource-limit-tests.md`
+- `_docs/02_document/tests/security-tests.md`
+- `_docs/02_document/tests/test-data.md`
+- `_docs/02_document/tests/traceability-matrix.md`
+
+## Verification
+
+- `.venv/bin/python -m black --check src tests e2e/replay` passed.
+- `.venv/bin/python -m ruff check src tests e2e/replay` passed.
+- `.venv/bin/python -m pytest` passed: 49 tests.
+
+## Next Step
+
+Proceed to Greenfield Step 9, Decompose Tests.
@@ -4,11 +4,11 @@
 flow: greenfield
 step: 7
 name: Implement
-status: in_progress
+status: not_started
 tracker: jira
 sub_step:
-  phase: 1
-  name: batch-loop
-  detail: "batch 9: AZ-232_safety_anchor_state_machine"
+  phase: 0
+  name: awaiting-invocation
+  detail: "Product implementation incomplete: AZ-240..AZ-242 remediation tasks are pending. Re-run Step 7 and the Product Implementation Completeness Gate before Step 8 or test tasks."
 retry_count: 0
 cycle: 1