[AZ-233] Update Docker Compose and enhance test documentation

- Modified the Docker Compose configuration to include an input root for replay tests and added an environment variable for enabling SITL.
- Enhanced documentation for various testing processes, including the addition of a Runtime Completeness Decomposition Gate and clarifications on internal module testing requirements.
- Updated the implementation completeness report to reflect the current state and added new test cases for performance and resilience scenarios.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-06 05:03:48 +03:00
parent 2485763d09
commit cab7b5d020
20 changed files with 265 additions and 41 deletions
+8 -3
View File
@@ -44,9 +44,12 @@
## Consumer Application
**Tech stack**: Python replay harness with pytest-style assertions and MAVLink log parsing.
**Tech stack**: Python replay harness with pytest-style assertions, Docker/compose orchestration, deterministic cache/SITL/QGC stubs, and CSV/Markdown report generation.
**Entry point**: `run-blackbox-replay` command to be created during implementation; this planning artifact defines required behavior, not code.
**Entry points**:
- Local functional suite: `python3 -m pytest`
- Replay harness: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
- Docker replay gate: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`
### Communication With System Under Test
@@ -81,7 +84,7 @@
**Columns**: Test ID, Test Name, Input Dataset, Execution Time (ms), Result, Error Distance (m), Source Label, Covariance 95% Semi-Major (m), `GPS_INPUT.fix_type`, Error Message.
**Output path**: `./test-results/blackbox-report.csv` and `./test-results/fdr-validation-summary.md`.
**Output path**: `data/test-results/<run-id>/blackbox-report.csv` and `data/test-results/<run-id>/fdr-validation-summary.md` on the host; `/app/data/test-results/<run-id>/...` inside the replay container.
## Test Execution
@@ -107,6 +110,8 @@ Use Docker or local host replay for deterministic, reproducible tests that do no
Docker/replay mode is suitable for PR checks and nightly validation, but it does not prove Jetson latency, memory, thermal, or camera-driver behavior.
Current Docker replay smoke evidence is expected to pass `FT-P-01`, `NFT-PERF-INFRA`, `NFT-RES-INFRA`, and `NFT-SEC-INFRA`. `NFT-RES-LIM-INFRA` remains blocked on local non-Jetson runners with an explicit target-hardware prerequisite.
### Local Hardware Mode
Use local Jetson hardware for release gates:
@@ -94,3 +94,24 @@
**Pass criteria**: 95th percentile <30 s over 50 runs.
**Duration**: 50 cold-start trials.
---
### NFT-PERF-INFRA: Replay Evidence Smoke
**Summary**: Validate that the Docker replay harness records timing evidence for the runnable local replay subset.
**Traces to**: AZ-234 AC-3, AZ-233 AC-3, AZ-233 AC-4
**Metric**: Scenario execution time and report generation status.
**Preconditions**:
- Docker replay environment is available.
- Project input fixtures are mounted read-only into the replay consumer.
| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Run the replay consumer in Docker mode | Confirm the performance smoke scenario executes |
| 2 | Inspect the generated CSV and FDR summary | Confirm execution time and artifact paths are recorded |
**Pass criteria**: `NFT-PERF-INFRA` reports `pass` and writes run-scoped CSV/Markdown evidence; Jetson-only performance evidence remains in release-gate resource tests.
@@ -83,3 +83,25 @@
| 2 | Inspect emitted estimate | No stale tile produces `satellite_anchored` label past hard rejection threshold |
**Pass criteria**: Freshness decay and hard rejection match AC-NEW-6.
---
### NFT-RES-INFRA: Replay/SITL Prerequisite Smoke
**Summary**: Validate that the Docker replay environment can execute the resilience scenario group with deterministic SITL/QGC stubs.
**Traces to**: AZ-237 AC-1, AZ-237 AC-4, AZ-233 AC-1, AZ-233 AC-3
**Preconditions**:
- `ardupilot-plane-sitl` and `qgc-observer` services are started by `docker-compose.test.yml`.
- `GPSD_ENABLE_SITL=1` is set only for the Docker replay stub environment.
**Fault injection**:
- Run the blackout/restart control smoke scenario through the replay consumer.
| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Start Docker replay services | SITL and QGC observer stubs are reachable to the replay consumer |
| 2 | Execute the resilience smoke scenario | The report records a `pass` result instead of a missing-SITL prerequisite block |
**Pass criteria**: `NFT-RES-INFRA` reports `pass` in Docker replay mode; live SITL release-candidate scenarios remain covered by `NFT-RES-01` and `FT-N-02`.
@@ -83,3 +83,18 @@
**Duration**: 50 cold-start trials.
**Pass criteria**: First valid `GPS_INPUT` <30 s p95; peak memory <8 GB; no first-run engine build occurs at runtime.
---
### NFT-RES-LIM-INFRA: Jetson Hardware Prerequisite Smoke
**Summary**: Validate that local replay reports Jetson-only resource gates as blocked unless target hardware is explicitly enabled.
**Traces to**: AZ-239 AC-1, AZ-239 AC-2, AZ-239 AC-4, AZ-233 Reliability NFR
**Monitoring**:
- Replay report status, blocked reason, and run-scoped artifact path.
**Duration**: One Docker replay smoke run.
**Pass criteria**: On non-Jetson local runners, the scenario reports `blocked` with `Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware`; on Jetson release-gate runners, it must collect the metrics required by `NFT-RES-LIM-01`, `NFT-RES-LIM-02`, and `NFT-RES-LIM-05`.
+15
View File
@@ -60,3 +60,18 @@
| 2 | Run replay requiring missing tile | System reports degraded/relocalization-needed status, not an external fetch |
**Pass criteria**: 0 outbound satellite-provider or Suite Service calls during runtime; missing cache data produces controlled degraded behavior.
---
### NFT-SEC-INFRA: Invalid Cache No-Fetch Smoke
**Summary**: Validate that the replay harness treats untrusted cache fixtures as a successful security rejection, not as a trusted anchor.
**Traces to**: AZ-236 AC-2, AZ-236 AC-3, AZ-233 Security NFR
| Step | Consumer Action | Expected Response |
|------|-----------------|-------------------|
| 1 | Run replay with `cache_variant=stale` | Satellite cache stub marks the manifest untrusted and records no network fetch |
| 2 | Inspect replay evidence | Scenario reports `pass`, `source_label=untrusted_cache_rejected`, and `GPS_INPUT.fix_type=0` |
**Pass criteria**: The invalid cache smoke scenario passes only when the untrusted fixture is rejected and no external satellite-provider or Suite service network fetch is attempted.
@@ -59,6 +59,37 @@
| R-GCS-01 | QGroundControl supported GCS | FT-N-02, NFT-SEC-03 | Covered |
| R-SAFETY-01 | False-position, cold-start, spoofing, and failsafe constraints | FT-N-01, FT-N-02, NFT-PERF-04, NFT-RES-01 | Covered |
## Cycle 1 Implementation-Learned Test Coverage
| Task AC ID | Task Acceptance Criterion Summary | Test IDs | Coverage |
|------------|-----------------------------------|----------|----------|
| AZ-233 AC-1 | Docker/replay environment starts or reports clear blocked prerequisites | NFT-RES-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-2 | External dependency stubs are deterministic and record interactions | NFT-SEC-INFRA, NFT-RES-INFRA | Covered |
| AZ-233 AC-3 | Runner executes blackbox, performance, resilience, security, and resource-limit groups | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-233 AC-4 | CSV and Markdown evidence reports are generated with required fields | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
| AZ-234 AC-1 | Still-image WGS84 error is reported against expected coordinates | FT-P-01 | Covered |
| AZ-234 AC-2 | Confidence output contract fields are validated | FT-P-02 | Covered |
| AZ-234 AC-3 | Replay latency and dropped-frame metrics are recorded | NFT-PERF-INFRA, NFT-PERF-01 | Covered |
| AZ-235 AC-1 | Derkachi fixture alignment is validated before replay | FT-P-03 | Covered |
| AZ-235 AC-2 | Synchronized replay emits frame-by-frame estimates or explicit degradation | FT-P-03 | Covered |
| AZ-235 AC-3 | VIO latency, completion, memory, and calibration status are reported | NFT-PERF-02 | Covered |
| AZ-236 AC-1 | Verified anchors include retrieval, matching, geometry, freshness, and provenance evidence | FT-P-04 | Covered |
| AZ-236 AC-2 | Unsafe cache or low-texture candidates are rejected | FT-N-01, FT-N-03, NFT-SEC-INFRA | Covered |
| AZ-236 AC-3 | Flight-mode missing-cache behavior does not fetch external satellite data | NFT-SEC-04, NFT-SEC-INFRA | Covered |
| AZ-236 AC-4 | Cache and trigger-path metrics are reported | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Covered |
| AZ-237 AC-1 | Blackout transitions to dead reckoning within threshold | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-2 | Degraded covariance and no-fix/failsafe thresholds are enforced | FT-N-02, NFT-RES-01 | Covered |
| AZ-237 AC-3 | Spoofed or unauthorized MAVLink inputs are rejected | NFT-SEC-03 | Covered |
| AZ-237 AC-4 | QGC and FDR degraded-mode evidence is visible | FT-N-02, NFT-SEC-03, NFT-RES-INFRA | Covered |
| AZ-238 AC-1 | Disconnected segments trigger relocalization or degraded status | NFT-RES-02 | Covered |
| AZ-238 AC-2 | Companion restart first-output and FDR evidence are recorded | NFT-RES-03 | Covered |
| AZ-238 AC-3 | Cold-start trials report first-fix timing or blocked prerequisite | NFT-PERF-04, NFT-RES-LIM-05 | Covered |
| AZ-238 AC-4 | Cold-start resource spikes are captured where measurable | NFT-RES-LIM-05 | Covered |
| AZ-239 AC-1 | Jetson memory budget is measured on target hardware | NFT-RES-LIM-01, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-2 | Thermal/power endurance is validated or blocked with reason | NFT-RES-LIM-02, NFT-RES-LIM-INFRA | Covered |
| AZ-239 AC-3 | FDR rollover behavior is validated | NFT-RES-LIM-04 | Covered |
| AZ-239 AC-4 | Resource/endurance evidence artifacts are complete | NFT-RES-LIM-01, NFT-RES-LIM-02, NFT-RES-LIM-04, NFT-RES-LIM-INFRA | Covered |
## Coverage Summary
| Category | Total Items | Covered | Not Covered | Coverage % |
@@ -79,3 +110,4 @@
- Derkachi project data supports synchronized video/IMU/GPS trajectory replay for FT-P-03 and NFT-PERF-02.
- Derkachi project data is calibration-limited: raw camera intrinsics, lens distortion, and camera-to-body transform are still required before final absolute accuracy thresholds can be treated as production acceptance.
- Phase 3 must validate camera calibration inputs and public/calibrated dataset acquisition before FT-P-03, FT-P-04, and NFT-PERF-02 can be used for final signoff.
- Cycle 1 Docker replay smoke evidence currently passes blackbox, performance, resilience, and security infrastructure scenarios; Jetson resource evidence remains a target-hardware release gate and is reported as blocked on local runners.