[AZ-233] Update Docker Compose and enhance test documentation

- Modified the Docker Compose configuration to include an input root for replay tests and added an environment variable for enabling SITL.
- Enhanced documentation for various testing processes, including the addition of a Runtime Completeness Decomposition Gate and clarifications on internal module testing requirements.
- Updated the implementation completeness report to reflect the current state and added new test cases for performance and resilience scenarios.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-06 05:03:48 +03:00
parent 2485763d09
commit cab7b5d020
20 changed files with 265 additions and 41 deletions
+1 -1
@@ -14,7 +14,7 @@ Build a Jetson-hosted onboard localization pipeline for fixed-wing GPS-denied fl
- Tile Manager: manage COGs, manifests, freshness/provenance, orthorectified generated tiles, and local tile metadata.
- MAVLink/GCS integration: consume FC telemetry and emit `GPS_INPUT`/QGC status.
- FDR/observability: record replayable mission evidence under storage caps.
- Validation harness: run still-image, public dataset, SITL, Jetson, and representative replay tests.
- Validation harness: run local pytest plus Docker replay smoke for still-image, cache, SITL/QGC stub, security, Jetson-prerequisite, public dataset, and representative replay tests.
### Principles / Non-Negotiables
+7 -3
@@ -228,11 +228,15 @@ Read top-to-bottom; an upper layer may import from a lower layer but never the r
Violations of this table are Architecture findings in code-review Phase 7 and are High severity.
## Out-of-Product E2E Test Suite
## Out-of-Product Blackbox / E2E Test Suite
The e2e replay/SITL/Jetson validation suite is not a product component and must not receive Step 6 product implementation tasks. It owns test-support artifacts under `tests/blackbox/**`, `tests/e2e/**`, `e2e/replay/**`, and `e2e/reports/**`, and it exercises the runtime only through public file, MAVLink, cache, status, and FDR interfaces.
The blackbox/e2e replay/SITL/Jetson validation suite is not a product component and must not receive Step 6 product implementation tasks. It owns test-support artifacts under `tests/blackbox/**`, `e2e/replay/**`, `e2e/fixtures/**`, `e2e/mocks/**`, `docker-compose.test.yml`, and `deployment/docker/Dockerfile.replay`, and it exercises the runtime only through public file, MAVLink, cache, status, and FDR interfaces.
- **Technologies**: Python, pytest-style runner, Docker/compose, pymavlink/log parser, ArduPilot Plane SITL, QGC observer/log parser, CSV/Markdown reports
- **Technologies**: Python, pytest-style runner, Docker/compose, deterministic fixture stubs, ArduPilot Plane SITL/QGC observer placeholders, CSV/Markdown reports
- **Entry points**:
- Local: `python3 -m pytest`
- Replay: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
- Compose: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`
## Self-Verification
+17
@@ -0,0 +1,17 @@
# Ripple Log Cycle 1
## Scope
Task-mode documentation refresh for Cycle 1 test implementation tasks `AZ-233` through `AZ-239`, plus Step 11 replay-gate fixes.
## Ripple Analysis
- No product component module docs were refreshed because the changed implementation surface is the out-of-product blackbox/e2e replay harness under `tests/blackbox/**`, `e2e/replay/**`, `docker-compose.test.yml`, and `deployment/docker/Dockerfile.replay`.
- `_docs/02_document/module-layout.md` was refreshed because the out-of-product test-suite path list now includes actual implemented paths and entry points.
- `_docs/02_document/architecture.md` was refreshed because the validation harness responsibility now includes the implemented Docker replay smoke gate.
- `_docs/02_document/tests/environment.md` was refreshed because the replay harness entry points, output paths, and local-vs-Jetson gate behavior changed.
- `_docs/02_document/tests/*` and `_docs/02_document/tests/traceability-matrix.md` were refreshed during Step 12 to capture implementation-learned replay-smoke scenario IDs.
## Import-Graph Result
No reverse-import product ripple was found or required. The replay harness imports product runtime modules only from tests; product runtime modules do not import the replay harness.
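The reverse-import rule above can be checked mechanically. The following is a minimal sketch, not the project's actual gate: it assumes the harness lives under the top-level packages `e2e` and `tests`, and the helper names `imported_modules` and `imports_harness` are hypothetical.

```python
import ast


def imported_modules(source):
    """Return the absolute module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the product package.
            modules.add(node.module)
    return modules


def imports_harness(source, harness_prefixes=("e2e", "tests")):
    """True if the source imports a harness package (assumed prefixes)."""
    return any(
        module == prefix or module.startswith(prefix + ".")
        for module in imported_modules(source)
        for prefix in harness_prefixes
    )
```

Running `imports_harness` over every file under the product source tree and failing on the first `True` would enforce that product runtime modules never import the replay harness.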
+8 -3
@@ -44,9 +44,12 @@
## Consumer Application
**Tech stack**: Python replay harness with pytest-style assertions and MAVLink log parsing.
**Tech stack**: Python replay harness with pytest-style assertions, Docker/compose orchestration, deterministic cache/SITL/QGC stubs, and CSV/Markdown report generation.
**Entry point**: `run-blackbox-replay` command to be created during implementation; this planning artifact defines required behavior, not code.
**Entry points**:
- Local functional suite: `python3 -m pytest`
- Replay harness: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
- Docker replay gate: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`
### Communication With System Under Test
@@ -81,7 +84,7 @@
**Columns**: Test ID, Test Name, Input Dataset, Execution Time (ms), Result, Error Distance (m), Source Label, Covariance 95% Semi-Major (m), `GPS_INPUT.fix_type`, Error Message.
**Output path**: `./test-results/blackbox-report.csv` and `./test-results/fdr-validation-summary.md`.
**Output path**: `data/test-results/<run-id>/blackbox-report.csv` and `data/test-results/<run-id>/fdr-validation-summary.md` on the host; `/app/data/test-results/<run-id>/...` inside the replay container.
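As an illustration of the run-scoped layout and the column list above, the sketch below builds the two report paths and renders a CSV with the documented columns. The function names `report_paths` and `render_csv` are hypothetical and are not taken from the replay harness.

```python
import csv
import io
from pathlib import PurePosixPath

# Column names copied from the report definition above.
REPORT_COLUMNS = [
    "Test ID", "Test Name", "Input Dataset", "Execution Time (ms)", "Result",
    "Error Distance (m)", "Source Label", "Covariance 95% Semi-Major (m)",
    "GPS_INPUT.fix_type", "Error Message",
]


def report_paths(run_id, in_container=False):
    """Return (csv_path, summary_path) for one run, host or container view."""
    root = PurePosixPath("/app/data/test-results" if in_container else "data/test-results")
    run_dir = root / run_id
    return run_dir / "blackbox-report.csv", run_dir / "fdr-validation-summary.md"


def render_csv(rows):
    """Render report rows as CSV text; missing fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REPORT_COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the run ID in the directory name means repeated runs never overwrite earlier evidence.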
## Test Execution
@@ -107,6 +110,8 @@ Use Docker or local host replay for deterministic, reproducible tests that do no
Docker/replay mode is suitable for PR checks and nightly validation, but it does not prove Jetson latency, memory, thermal, or camera-driver behavior.
Current Docker replay smoke evidence is expected to pass `FT-P-01`, `NFT-PERF-INFRA`, `NFT-RES-INFRA`, and `NFT-SEC-INFRA`. `NFT-RES-LIM-INFRA` remains blocked on local non-Jetson runners with an explicit target-hardware prerequisite.
### Local Hardware Mode
Use local Jetson hardware for release gates:
@@ -94,3 +94,24 @@
**Pass criteria**: 95th percentile <30 s over 50 runs.
**Duration**: 50 cold-start trials.
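The pass criterion above can be evaluated as follows. This is a sketch under one assumption the source does not settle: it uses the nearest-rank definition of the 95th percentile. The names `p95_nearest_rank` and `cold_start_passes` are hypothetical.

```python
def p95_nearest_rank(samples):
    """95th percentile by the nearest-rank method (an assumed definition)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cold_start_passes(samples_s, limit_s=30.0, required_runs=50):
    """True if enough trials ran and the p95 is strictly below the limit."""
    return len(samples_s) >= required_runs and p95_nearest_rank(samples_s) < limit_s
```

With 50 trials the nearest-rank p95 is the 48th smallest value, so at most two trials may exceed 30 s before the gate fails.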
---
### NFT-PERF-INFRA: Replay Evidence Smoke
**Summary**: Validate that the Docker replay harness records timing evidence for the runnable local replay subset.
**Traces to**: AZ-234 AC-3, AZ-233 AC-3, AZ-233 AC-4
**Metric**: Scenario execution time and report generation status.
**Preconditions**:
- Docker replay environment is available.
- Project input fixtures are mounted read-only into the replay consumer.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Run the replay consumer in Docker mode | Confirm the performance smoke scenario executes |
| 2 | Inspect the generated CSV and FDR summary | Confirm execution time and artifact paths are recorded |
**Pass criteria**: `NFT-PERF-INFRA` reports `pass` and writes run-scoped CSV/Markdown evidence; Jetson-only performance evidence remains in release-gate resource tests.
@@ -83,3 +83,25 @@
| 2 | Inspect emitted estimate | No stale tile produces `satellite_anchored` label past hard rejection threshold |
**Pass criteria**: Freshness decay and hard rejection match AC-NEW-6.
---
### NFT-RES-INFRA: Replay/SITL Prerequisite Smoke
**Summary**: Validate that the Docker replay environment can execute the resilience scenario group with deterministic SITL/QGC stubs.
**Traces to**: AZ-237 AC-1, AZ-237 AC-4, AZ-233 AC-1, AZ-233 AC-3
**Preconditions**:
- `ardupilot-plane-sitl` and `qgc-observer` services are started by `docker-compose.test.yml`.
- `GPSD_ENABLE_SITL=1` is set only for the Docker replay stub environment.
**Fault injection**:
- Run the blackout/restart control smoke scenario through the replay consumer.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Start Docker replay services | SITL and QGC observer stubs are reachable to the replay consumer |
| 2 | Execute the resilience smoke scenario | The report records a `pass` result instead of a missing-SITL prerequisite block |
**Pass criteria**: `NFT-RES-INFRA` reports `pass` in Docker replay mode; live SITL release-candidate scenarios remain covered by `NFT-RES-01` and `FT-N-02`.
@@ -83,3 +83,18 @@
**Duration**: 50 cold-start trials.
**Pass criteria**: First valid `GPS_INPUT` <30 s p95; peak memory <8 GB; no first-run engine build occurs at runtime.
---
### NFT-RES-LIM-INFRA: Jetson Hardware Prerequisite Smoke
**Summary**: Validate that local replay reports Jetson-only resource gates as blocked unless target hardware is explicitly enabled.
**Traces to**: AZ-239 AC-1, AZ-239 AC-2, AZ-239 AC-4, AZ-233 Reliability NFR
**Monitoring**:
- Replay report status, blocked reason, and run-scoped artifact path.
**Duration**: One Docker replay smoke run.
**Pass criteria**: On non-Jetson local runners, the scenario reports `blocked` with `Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware`; on Jetson release-gate runners, it must collect the metrics required by `NFT-RES-LIM-01`, `NFT-RES-LIM-02`, and `NFT-RES-LIM-05`.
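A prerequisite gate of this kind might look like the sketch below. Only the environment variable name and the blocked-reason string come from the criteria above; the function name `jetson_gate`, the returned dictionary shape, and the `ready` status are assumptions for illustration.

```python
import os

# Blocked reason copied verbatim from the pass criteria above.
BLOCKED_REASON = "Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware"


def jetson_gate(env=None):
    """Report whether Jetson-only resource scenarios may run (hypothetical shape)."""
    env = os.environ if env is None else env
    if env.get("GPSD_ENABLE_JETSON") == "1":
        # "ready" is an assumed status; real runs must still collect the metrics.
        return {"result": "ready", "reason": ""}
    return {"result": "blocked", "reason": BLOCKED_REASON}
```

Passing the environment as a parameter keeps the gate testable on local runners without mutating `os.environ`.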
+15
@@ -60,3 +60,18 @@
| 2 | Run replay requiring missing tile | System reports degraded/relocalization-needed status, not an external fetch |
**Pass criteria**: 0 outbound satellite-provider or Suite Service calls during runtime; missing cache data produces controlled degraded behavior.
---
### NFT-SEC-INFRA: Invalid Cache No-Fetch Smoke
**Summary**: Validate that the replay harness treats untrusted cache fixtures as a successful security rejection, not as a trusted anchor.
**Traces to**: AZ-236 AC-2, AZ-236 AC-3, AZ-233 Security NFR
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Run replay with `cache_variant=stale` | Satellite cache stub marks the manifest untrusted and records no network fetch |
| 2 | Inspect replay evidence | Scenario reports `pass`, `source_label=untrusted_cache_rejected`, and `GPS_INPUT.fix_type=0` |
**Pass criteria**: The invalid cache smoke scenario passes only when the untrusted fixture is rejected and no external satellite-provider or Suite Service network fetch is attempted.
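The combined condition in the pass criteria can be expressed as a single check over the report row and the stub's recorded interactions. This is a sketch only: the row keys reuse the CSV column names from the report definition, while `invalid_cache_smoke_passes` and the `network_fetches` list are hypothetical.

```python
def invalid_cache_smoke_passes(row, network_fetches):
    """True only if the untrusted cache was rejected and nothing was fetched."""
    return (
        row.get("Result") == "pass"
        and row.get("Source Label") == "untrusted_cache_rejected"
        # CSV values arrive as strings, so compare on the string form.
        and str(row.get("GPS_INPUT.fix_type")) == "0"
        and len(network_fetches) == 0
    )
```

Any recorded fetch fails the check even when the row itself looks correct, which matches the requirement that rejection and no-fetch must both hold.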
@@ -59,6 +59,37 @@
| R-GCS-01 | QGroundControl supported GCS | FT-N-02, NFT-SEC-03 | Covered |
| R-SAFETY-01 | False-position, cold-start, spoofing, and failsafe constraints | FT-N-01, FT-N-02, NFT-PERF-04, NFT-RES-01 | Covered |
## Cycle 1 Implementation-Learned Test Coverage
| Task AC ID | Task Acceptance Criterion Summary | Test IDs | Coverage |
|------------|-----------------------------------|----------|----------|
| AZ-233 AC-1 | Docker/replay environment starts or reports clear blocked prerequisites | NFT-RES-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-2 | External dependency stubs are deterministic and record interactions | NFT-SEC-INFRA, NFT-RES-INFRA | Covered |
| AZ-233 AC-3 | Runner executes blackbox, performance, resilience, security, and resource-limit groups | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-4 | CSV and Markdown evidence reports are generated with required fields | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-234 AC-1 | Still-image WGS84 error is reported against expected coordinates | FT-P-01 | Covered |
| AZ-234 AC-2 | Confidence output contract fields are validated | FT-P-02 | Covered |
| AZ-234 AC-3 | Replay latency and dropped-frame metrics are recorded | NFT-PERF-INFRA, NFT-PERF-01 | Covered |
| AZ-235 AC-1 | Derkachi fixture alignment is validated before replay | FT-P-03 | Covered |
| AZ-235 AC-2 | Synchronized replay emits frame-by-frame estimates or explicit degradation | FT-P-03 | Covered |
| AZ-235 AC-3 | VIO latency, completion, memory, and calibration status are reported | NFT-PERF-02 | Covered |
| AZ-236 AC-1 | Verified anchors include retrieval, matching, geometry, freshness, and provenance evidence | FT-P-04 | Covered |
| AZ-236 AC-2 | Unsafe cache or low-texture candidates are rejected | FT-N-01, FT-N-03, NFT-SEC-INFRA | Covered |
| AZ-236 AC-3 | Flight-mode missing-cache behavior does not fetch external satellite data | NFT-SEC-04, NFT-SEC-INFRA | Covered |
| AZ-236 AC-4 | Cache and trigger-path metrics are reported | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Covered |
| AZ-237 AC-1 | Blackout transitions to dead reckoning within threshold | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-2 | Degraded covariance and no-fix/failsafe thresholds are enforced | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-3 | Spoofed or unauthorized MAVLink inputs are rejected | NFT-SEC-03 | Covered |
| AZ-237 AC-4 | QGC and FDR degraded-mode evidence is visible | FT-N-02, NFT-SEC-03, NFT-RES-INFRA | Covered |
| AZ-238 AC-1 | Disconnected segments trigger relocalization or degraded status | NFT-RES-02 | Covered |
| AZ-238 AC-2 | Companion restart first-output and FDR evidence are recorded | NFT-RES-03 | Covered |
| AZ-238 AC-3 | Cold-start trials report first-fix timing or blocked prerequisite | NFT-PERF-04, NFT-RES-LIM-05 | Covered |
| AZ-238 AC-4 | Cold-start resource spikes are captured where measurable | NFT-RES-LIM-05 | Covered |
| AZ-239 AC-1 | Jetson memory budget is measured on target hardware | NFT-RES-LIM-01, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-2 | Thermal/power endurance is validated or blocked with reason | NFT-RES-LIM-02, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-3 | FDR rollover behavior is validated | NFT-RES-LIM-04 | Covered |
| AZ-239 AC-4 | Resource/endurance evidence artifacts are complete | NFT-RES-LIM-01, NFT-RES-LIM-02, NFT-RES-LIM-04, NFT-RES-LIM-INFRA | Covered |
## Coverage Summary
| Category | Total Items | Covered | Not Covered | Coverage % |
@@ -79,3 +110,4 @@
- Derkachi project data supports synchronized video/IMU/GPS trajectory replay for FT-P-03 and NFT-PERF-02.
- Derkachi project data is calibration-limited: raw camera intrinsics, lens distortion, and camera-to-body transform are still required before final absolute accuracy thresholds can be treated as production acceptance.
- Phase 3 must validate camera calibration inputs and public/calibrated dataset acquisition before FT-P-03, FT-P-04, and NFT-PERF-02 can be used for final signoff.
- Cycle 1 Docker replay smoke evidence currently passes blackbox, performance, resilience, and security infrastructure scenarios; Jetson resource evidence remains a target-hardware release gate and is reported as blocked on local runners.
@@ -2,24 +2,24 @@
**Cycle**: 1
**Date**: 2026-05-05
**Outcome**: Product implementation complete
**Outcome**: FAIL — product implementation incomplete
## Summary
All product implementation tasks for cycle 1 are implemented or have explicit runtime prerequisite boundaries. The remediation tasks close the previously identified gaps in native VIO selection, local descriptor/index VPR retrieval, and computed anchor matching/geometry verification.
Product implementation was previously marked complete, but Step 11 exposed a false-positive gate: tests passed against scaffold/fake contract behavior while the actual A-Z runtime path, especially real VIO execution, is not implemented. Product implementation must return to Step 7 and create remediation tasks before downstream test gates can be trusted.
## Product Task Classifications
| Task | Classification | Evidence |
|------|----------------|----------|
| AZ-219 through AZ-232 | PASS | Prior batch reports 01-09 and cumulative review 01-09 |
| AZ-240 | PASS | `src/vio_adapter/interfaces.py`, `src/vio_adapter/native/__init__.py`, `tests/unit/test_vio_adapter.py` |
| AZ-219 through AZ-232 | NEEDS RECHECK | Prior batch reports 01-09 and cumulative review 01-09 were not audited under the stricter runtime completeness gate |
| AZ-240 | FAIL | `src/vio_adapter/interfaces.py` exposes `NativeVioBackend`, but default runtime behavior is `ReplayVioBackend`; `src/vio_adapter/native/__init__.py` only re-exports protocol wrappers and does not execute a real BASALT/native VIO engine |
| AZ-241 | PASS | `src/satellite_service/interfaces.py`, `src/satellite_service/types.py`, `src/satellite_service/native/__init__.py`, `tests/unit/test_satellite_service_vpr.py` |
| AZ-242 | PASS | `src/anchor_verification/interfaces.py`, `src/anchor_verification/types.py`, `src/anchor_verification/native/__init__.py`, `tests/unit/test_anchor_verification.py` |
## Remediation Evidence
- VIO now exposes `NativeVioBackend` behind the `VioBackend` protocol, fills latency metrics, maps initialization/runtime failures into explicit health/error envelopes, and keeps WGS84 authority out of the adapter.
- VIO currently exposes `NativeVioBackend` behind the `VioBackend` protocol, but the production/native engine is not actually integrated. This is a scaffold, not product-complete VIO.
- Satellite retrieval now loads local descriptor/index packages from cache files, builds a CPU FAISS-compatible descriptor index, requires query descriptors for retrieval, and degrades safely for missing or invalid index data.
- Anchor verification now computes matcher evidence from frame/tile keypoints through `KeypointRansacMatcher`, reports runtime/quality metrics, and routes computed evidence through the existing freshness, provenance, inlier, MRE, and homography gates.
@@ -39,4 +39,4 @@ Checked changed component source for unresolved implementation markers:
## Required Follow-Up
No product remediation tasks remain. Autodev may advance to Step 8, Code Testability Revision.
Autodev must return to Step 7, rerun the Product Implementation Completeness Gate under the stricter rules, create remediation tasks sized at 5 points or less, and implement the missing runtime behavior before Step 8 or Step 11 may pass.
+6 -6
@@ -2,13 +2,13 @@
## Current Step
flow: greenfield
step: 11
name: Run Tests
status: not_started
step: 7
name: Implement
status: in_progress
tracker: jira
sub_step:
phase: 0
name: awaiting-invocation
detail: ""
phase: 15
name: product-completeness-gate
detail: "Reset after failed reality check: VIO/A-Z runtime path is scaffolded; tests passed against stubs/contracts instead of actual system behavior"
retry_count: 0
cycle: 1