Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-22 10:11:13 +00:00
[AZ-233] Update Docker Compose and enhance test documentation
- Modified the Docker Compose configuration to include an input root for replay tests and added an environment variable for enabling SITL.
- Enhanced documentation for various testing processes, including the addition of a Runtime Completeness Decomposition Gate and clarifications on internal module testing requirements.
- Updated the implementation completeness report to reflect the current state and added new test cases for performance and resilience scenarios.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -14,7 +14,7 @@ Build a Jetson-hosted onboard localization pipeline for fixed-wing GPS-denied fl
 - Tile Manager: manage COGs, manifests, freshness/provenance, orthorectified generated tiles, and local tile metadata.
 - MAVLink/GCS integration: consume FC telemetry and emit `GPS_INPUT`/QGC status.
 - FDR/observability: record replayable mission evidence under storage caps.
-- Validation harness: run still-image, public dataset, SITL, Jetson, and representative replay tests.
+- Validation harness: run local pytest plus Docker replay smoke for still-image, cache, SITL/QGC stub, security, Jetson-prerequisite, public dataset, and representative replay tests.

 ### Principles / Non-Negotiables

@@ -228,11 +228,15 @@ Read top-to-bottom; an upper layer may import from a lower layer but never the r

 Violations of this table are Architecture findings in code-review Phase 7 and are High severity.

-## Out-of-Product E2E Test Suite
+## Out-of-Product Blackbox / E2E Test Suite

-The e2e replay/SITL/Jetson validation suite is not a product component and must not receive Step 6 product implementation tasks. It owns test-support artifacts under `tests/blackbox/**`, `tests/e2e/**`, `e2e/replay/**`, and `e2e/reports/**`, and it exercises the runtime only through public file, MAVLink, cache, status, and FDR interfaces.
+The blackbox/e2e replay/SITL/Jetson validation suite is not a product component and must not receive Step 6 product implementation tasks. It owns test-support artifacts under `tests/blackbox/**`, `e2e/replay/**`, `e2e/fixtures/**`, `e2e/mocks/**`, `docker-compose.test.yml`, and `deployment/docker/Dockerfile.replay`, and it exercises the runtime only through public file, MAVLink, cache, status, and FDR interfaces.

-- **Technologies**: Python, pytest-style runner, Docker/compose, pymavlink/log parser, ArduPilot Plane SITL, QGC observer/log parser, CSV/Markdown reports
+- **Technologies**: Python, pytest-style runner, Docker/compose, deterministic fixture stubs, ArduPilot Plane SITL/QGC observer placeholders, CSV/Markdown reports
+- **Entry points**:
+  - Local: `python3 -m pytest`
+  - Replay: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
+  - Compose: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`

 ## Self-Verification

@@ -0,0 +1,17 @@
+# Ripple Log Cycle 1
+
+## Scope
+
+Task-mode documentation refresh for Cycle 1 test implementation tasks `AZ-233` through `AZ-239`, plus Step 11 replay-gate fixes.
+
+## Ripple Analysis
+
+- No product component module docs were refreshed because the changed implementation surface is the out-of-product blackbox/e2e replay harness under `tests/blackbox/**`, `e2e/replay/**`, `docker-compose.test.yml`, and `deployment/docker/Dockerfile.replay`.
+- `_docs/02_document/module-layout.md` was refreshed because the out-of-product test-suite path list now includes actual implemented paths and entry points.
+- `_docs/02_document/architecture.md` was refreshed because the validation harness responsibility now includes the implemented Docker replay smoke gate.
+- `_docs/02_document/tests/environment.md` was refreshed because the replay harness entry points, output paths, and local-vs-Jetson gate behavior changed.
+- `_docs/02_document/tests/*` and `_docs/02_document/tests/traceability-matrix.md` were refreshed during Step 12 to capture implementation-learned replay-smoke scenario IDs.
+
+## Import-Graph Result
+
+No reverse-import product ripple was found or required. The replay harness imports product runtime modules only from tests; product runtime modules do not import the replay harness.
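
The import-graph rule above (product modules never import the replay harness) could be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes the harness lives in top-level packages named `e2e` and `tests`, and the helper name `harness_imports` is hypothetical.

```python
import ast

# Assumption: the harness is importable as top-level packages "e2e" and "tests".
HARNESS_PREFIXES = ("e2e", "tests")


def harness_imports(source: str, prefixes=HARNESS_PREFIXES) -> list[str]:
    """Return harness modules imported by a piece of product source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the product package.
            names = [node.module]
        else:
            continue
        for name in names:
            if name.split(".")[0] in prefixes:
                found.append(name)
    return found
```

Running this over every product source file and failing on a non-empty result would turn the "no reverse import" statement into a repeatable check.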
@@ -44,9 +44,12 @@

 ## Consumer Application

-**Tech stack**: Python replay harness with pytest-style assertions and MAVLink log parsing.
+**Tech stack**: Python replay harness with pytest-style assertions, Docker/compose orchestration, deterministic cache/SITL/QGC stubs, and CSV/Markdown report generation.

-**Entry point**: `run-blackbox-replay` command to be created during implementation; this planning artifact defines required behavior, not code.
+**Entry points**:
+- Local functional suite: `python3 -m pytest`
+- Replay harness: `python -m e2e.replay.run_replay --output-dir <dir> --input-root <fixture-root>`
+- Docker replay gate: `docker compose -f docker-compose.test.yml run --build --rm replay-consumer`
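
For scripted use, the replay harness entry point listed above can be assembled as an argument list. This is only a sketch: the module path and the two flags come from the documented command, while the helper name `replay_command` and the example directories are assumptions made for illustration.

```python
import sys
from pathlib import Path


def replay_command(output_dir: Path, input_root: Path) -> list[str]:
    """Build the argv for the documented replay harness entry point."""
    return [
        sys.executable, "-m", "e2e.replay.run_replay",
        "--output-dir", str(output_dir),
        "--input-root", str(input_root),
    ]


if __name__ == "__main__":
    # Hypothetical directories, for illustration only.
    cmd = replay_command(Path("data/test-results/local"), Path("e2e/fixtures"))
    print(" ".join(cmd))
    # To actually launch the harness: subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when fixture paths contain spaces.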

 ### Communication With System Under Test

@@ -81,7 +84,7 @@

 **Columns**: Test ID, Test Name, Input Dataset, Execution Time (ms), Result, Error Distance (m), Source Label, Covariance 95% Semi-Major (m), `GPS_INPUT.fix_type`, Error Message.

-**Output path**: `./test-results/blackbox-report.csv` and `./test-results/fdr-validation-summary.md`.
+**Output path**: `data/test-results/<run-id>/blackbox-report.csv` and `data/test-results/<run-id>/fdr-validation-summary.md` on the host; `/app/data/test-results/<run-id>/...` inside the replay container.
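
The run-scoped layout can be expressed as a small path helper. This is a sketch under assumptions: the function name `report_paths` is hypothetical, and it assumes the container uses the same two file names under `/app/data/test-results/<run-id>/`, which the text elides.

```python
from pathlib import PurePosixPath

HOST_ROOT = PurePosixPath("data/test-results")
CONTAINER_ROOT = PurePosixPath("/app/data/test-results")


def report_paths(run_id: str, in_container: bool = False) -> dict[str, PurePosixPath]:
    """Return the run-scoped report locations for one replay run."""
    root = (CONTAINER_ROOT if in_container else HOST_ROOT) / run_id
    return {
        "csv": root / "blackbox-report.csv",
        # Assumption: same file name inside the container as on the host.
        "fdr_summary": root / "fdr-validation-summary.md",
    }
```

Scoping both reports by run ID keeps evidence from successive runs from overwriting each other, which the previous fixed `./test-results/` paths could not guarantee.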

 ## Test Execution

@@ -107,6 +110,8 @@ Use Docker or local host replay for deterministic, reproducible tests that do no

 Docker/replay mode is suitable for PR checks and nightly validation, but it does not prove Jetson latency, memory, thermal, or camera-driver behavior.

+Current Docker replay smoke evidence is expected to pass `FT-P-01`, `NFT-PERF-INFRA`, `NFT-RES-INFRA`, and `NFT-SEC-INFRA`. `NFT-RES-LIM-INFRA` remains blocked on local non-Jetson runners with an explicit target-hardware prerequisite.
+
 ### Local Hardware Mode

 Use local Jetson hardware for release gates:
@@ -94,3 +94,24 @@
 **Pass criteria**: 95th percentile <30 s over 50 runs.

 **Duration**: 50 cold-start trials.
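
The cold-start criterion (95th percentile under 30 s over 50 runs) can be evaluated as sketched below. The test plan does not say which percentile definition is used, so the nearest-rank method here is an assumption, as are the function names.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic; the rank is 1-based.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cold_start_passes(seconds: list[float], limit_s: float = 30.0, runs: int = 50) -> bool:
    """True when enough trials were recorded and their p95 is below the limit."""
    return len(seconds) >= runs and p95(seconds) < limit_s
```

With 50 trials the nearest-rank p95 is the 48th-smallest value, so up to two slow outliers are tolerated.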
+
+---
+
+### NFT-PERF-INFRA: Replay Evidence Smoke
+
+**Summary**: Validate that the Docker replay harness records timing evidence for the runnable local replay subset.
+
+**Traces to**: AZ-234 AC-3, AZ-233 AC-3, AZ-233 AC-4
+
+**Metric**: Scenario execution time and report generation status.
+
+**Preconditions**:
+- Docker replay environment is available.
+- Project input fixtures are mounted read-only into the replay consumer.
+
+| Step | Consumer Action | Measurement |
+|------|-----------------|-------------|
+| 1 | Run the replay consumer in Docker mode | Confirm the performance smoke scenario executes |
+| 2 | Inspect the generated CSV and FDR summary | Confirm execution time and artifact paths are recorded |
+
+**Pass criteria**: `NFT-PERF-INFRA` reports `pass` and writes run-scoped CSV/Markdown evidence; Jetson-only performance evidence remains in release-gate resource tests.
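
Step 2 of this scenario (inspecting the generated CSV) could be automated roughly as follows. The column names `Test ID`, `Execution Time (ms)`, and `Result` are taken from the report column list in the environment document; the helper names and the idea of passing the CSV as a string are assumptions for illustration.

```python
import csv
import io


def scenario_row(report_csv: str, test_id: str):
    """Return the first report row for test_id, or None if it is absent."""
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["Test ID"] == test_id:
            return row
    return None


def perf_smoke_passes(report_csv: str) -> bool:
    """Check that NFT-PERF-INFRA passed and recorded an execution time."""
    row = scenario_row(report_csv, "NFT-PERF-INFRA")
    if row is None:
        return False
    has_timing = row["Execution Time (ms)"].strip() != ""
    return row["Result"].strip().lower() == "pass" and has_timing
```

A missing row is treated as a failure, so a report that silently omits the scenario cannot pass the check.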
@@ -83,3 +83,25 @@
 | 2 | Inspect emitted estimate | No stale tile produces `satellite_anchored` label past hard rejection threshold |

 **Pass criteria**: Freshness decay and hard rejection match AC-NEW-6.
+
+---
+
+### NFT-RES-INFRA: Replay/SITL Prerequisite Smoke
+
+**Summary**: Validate that the Docker replay environment can execute the resilience scenario group with deterministic SITL/QGC stubs.
+
+**Traces to**: AZ-237 AC-1, AZ-237 AC-4, AZ-233 AC-1, AZ-233 AC-3
+
+**Preconditions**:
+- `ardupilot-plane-sitl` and `qgc-observer` services are started by `docker-compose.test.yml`.
+- `GPSD_ENABLE_SITL=1` is set only for the Docker replay stub environment.
+
+**Fault injection**:
+- Run the blackout/restart control smoke scenario through the replay consumer.
+
+| Step | Action | Expected Behavior |
+|------|--------|-------------------|
+| 1 | Start Docker replay services | SITL and QGC observer stubs are reachable from the replay consumer |
+| 2 | Execute the resilience smoke scenario | The report records a `pass` result instead of a missing-SITL prerequisite block |
+
+**Pass criteria**: `NFT-RES-INFRA` reports `pass` in Docker replay mode; live SITL release-candidate scenarios remain covered by `NFT-RES-01` and `FT-N-02`.
@@ -83,3 +83,18 @@
 **Duration**: 50 cold-start trials.

 **Pass criteria**: First valid `GPS_INPUT` <30 s p95; peak memory <8 GB; no first-run engine build occurs at runtime.
+
+---
+
+### NFT-RES-LIM-INFRA: Jetson Hardware Prerequisite Smoke
+
+**Summary**: Validate that local replay reports Jetson-only resource gates as blocked unless target hardware is explicitly enabled.
+
+**Traces to**: AZ-239 AC-1, AZ-239 AC-2, AZ-239 AC-4, AZ-233 Reliability NFR
+
+**Monitoring**:
+- Replay report status, blocked reason, and run-scoped artifact path.
+
+**Duration**: One Docker replay smoke run.
+
+**Pass criteria**: On non-Jetson local runners, the scenario reports `blocked` with `Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware`; on Jetson release-gate runners, it must collect the metrics required by `NFT-RES-LIM-01`, `NFT-RES-LIM-02`, and `NFT-RES-LIM-05`.
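
The blocked-versus-run decision described in these pass criteria can be sketched as a small gate. The environment variable and the blocked-reason string are quoted from the criteria; the function name `jetson_gate`, the returned dictionary shape, and the `"run"` status value are assumptions, not the harness's actual interface.

```python
import os

BLOCKED_REASON = "Jetson prerequisite blocked: set GPSD_ENABLE_JETSON=1 on target hardware"


def jetson_gate(env=None) -> dict:
    """Decide whether Jetson-only resource scenarios may run."""
    env = os.environ if env is None else env
    if env.get("GPSD_ENABLE_JETSON") == "1":
        # Hypothetical status value: metrics collection proceeds on target hardware.
        return {"status": "run", "reason": ""}
    return {"status": "blocked", "reason": BLOCKED_REASON}
```

Reporting an explicit `blocked` status with a reason, rather than skipping silently, keeps the missing Jetson evidence visible in the replay report.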
@@ -60,3 +60,18 @@
 | 2 | Run replay requiring missing tile | System reports degraded/relocalization-needed status, not an external fetch |

 **Pass criteria**: 0 outbound satellite-provider or Suite Service calls during runtime; missing cache data produces controlled degraded behavior.
+
+---
+
+### NFT-SEC-INFRA: Invalid Cache No-Fetch Smoke
+
+**Summary**: Validate that the replay harness treats untrusted cache fixtures as a successful security rejection, not as a trusted anchor.
+
+**Traces to**: AZ-236 AC-2, AZ-236 AC-3, AZ-233 Security NFR
+
+| Step | Consumer Action | Expected Response |
+|------|-----------------|-------------------|
+| 1 | Run replay with `cache_variant=stale` | Satellite cache stub marks the manifest untrusted and records no network fetch |
+| 2 | Inspect replay evidence | Scenario reports `pass`, `source_label=untrusted_cache_rejected`, and `GPS_INPUT.fix_type=0` |
+
+**Pass criteria**: The invalid cache smoke scenario passes only when the untrusted fixture is rejected and no external satellite-provider or Suite Service network fetch is attempted.
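
The pass condition for this scenario combines three observations, which can be sketched as one predicate. The label `untrusted_cache_rejected` and the `fix_type` value 0 come from the scenario table; the `ReplayEvidence` structure and its field names are hypothetical stand-ins for whatever the harness actually records.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayEvidence:
    """Hypothetical summary of what one replay run recorded."""
    source_label: str
    fix_type: int
    network_fetches: list = field(default_factory=list)


def invalid_cache_smoke_passes(ev: ReplayEvidence) -> bool:
    """Pass only if the untrusted fixture was rejected and nothing was fetched."""
    return (
        ev.source_label == "untrusted_cache_rejected"
        and ev.fix_type == 0
        and not ev.network_fetches
    )
```

Note the inversion relative to ordinary scenarios: here a rejected cache is the success case, and any recorded fetch fails the check even if the rejection itself happened.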
@@ -59,6 +59,37 @@
 | R-GCS-01 | QGroundControl supported GCS | FT-N-02, NFT-SEC-03 | Covered |
 | R-SAFETY-01 | False-position, cold-start, spoofing, and failsafe constraints | FT-N-01, FT-N-02, NFT-PERF-04, NFT-RES-01 | Covered |

+## Cycle 1 Implementation-Learned Test Coverage
+
+| Task AC ID | Task Acceptance Criterion Summary | Test IDs | Coverage |
+|------------|-----------------------------------|----------|----------|
+| AZ-233 AC-1 | Docker/replay environment starts or reports clear blocked prerequisites | NFT-RES-INFRA, NFT-RES-LIM-INFRA | Covered |
+| AZ-233 AC-2 | External dependency stubs are deterministic and record interactions | NFT-SEC-INFRA, NFT-RES-INFRA | Covered |
+| AZ-233 AC-3 | Runner executes blackbox, performance, resilience, security, and resource-limit groups | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
+| AZ-233 AC-4 | CSV and Markdown evidence reports are generated with required fields | FT-P-01, NFT-PERF-INFRA, NFT-RES-INFRA, NFT-SEC-INFRA, NFT-RES-LIM-INFRA | Covered |
+| AZ-234 AC-1 | Still-image WGS84 error is reported against expected coordinates | FT-P-01 | Covered |
+| AZ-234 AC-2 | Confidence output contract fields are validated | FT-P-02 | Covered |
+| AZ-234 AC-3 | Replay latency and dropped-frame metrics are recorded | NFT-PERF-INFRA, NFT-PERF-01 | Covered |
+| AZ-235 AC-1 | Derkachi fixture alignment is validated before replay | FT-P-03 | Covered |
+| AZ-235 AC-2 | Synchronized replay emits frame-by-frame estimates or explicit degradation | FT-P-03 | Covered |
+| AZ-235 AC-3 | VIO latency, completion, memory, and calibration status are reported | NFT-PERF-02 | Covered |
+| AZ-236 AC-1 | Verified anchors include retrieval, matching, geometry, freshness, and provenance evidence | FT-P-04 | Covered |
+| AZ-236 AC-2 | Unsafe cache or low-texture candidates are rejected | FT-N-01, FT-N-03, NFT-SEC-INFRA | Covered |
+| AZ-236 AC-3 | Flight-mode missing-cache behavior does not fetch external satellite data | NFT-SEC-04, NFT-SEC-INFRA | Covered |
+| AZ-236 AC-4 | Cache and trigger-path metrics are reported | NFT-PERF-03, NFT-RES-04, NFT-RES-LIM-03 | Covered |
+| AZ-237 AC-1 | Blackout transitions to dead reckoning within threshold | FT-N-02, NFT-RES-01 | Covered |
+| AZ-237 AC-2 | Degraded covariance and no-fix/failsafe thresholds are enforced | FT-N-02, NFT-RES-01 | Covered |
+| AZ-237 AC-3 | Spoofed or unauthorized MAVLink inputs are rejected | NFT-SEC-03 | Covered |
+| AZ-237 AC-4 | QGC and FDR degraded-mode evidence is visible | FT-N-02, NFT-SEC-03, NFT-RES-INFRA | Covered |
+| AZ-238 AC-1 | Disconnected segments trigger relocalization or degraded status | NFT-RES-02 | Covered |
+| AZ-238 AC-2 | Companion restart first-output and FDR evidence are recorded | NFT-RES-03 | Covered |
+| AZ-238 AC-3 | Cold-start trials report first-fix timing or blocked prerequisite | NFT-PERF-04, NFT-RES-LIM-05 | Covered |
+| AZ-238 AC-4 | Cold-start resource spikes are captured where measurable | NFT-RES-LIM-05 | Covered |
+| AZ-239 AC-1 | Jetson memory budget is measured on target hardware | NFT-RES-LIM-01, NFT-RES-LIM-INFRA | Covered |
+| AZ-239 AC-2 | Thermal/power endurance is validated or blocked with reason | NFT-RES-LIM-02, NFT-RES-LIM-INFRA | Covered |
+| AZ-239 AC-3 | FDR rollover behavior is validated | NFT-RES-LIM-04 | Covered |
+| AZ-239 AC-4 | Resource/endurance evidence artifacts are complete | NFT-RES-LIM-01, NFT-RES-LIM-02, NFT-RES-LIM-04, NFT-RES-LIM-INFRA | Covered |
+
 ## Coverage Summary

 | Category | Total Items | Covered | Not Covered | Coverage % |
@@ -79,3 +110,4 @@
 - Derkachi project data supports synchronized video/IMU/GPS trajectory replay for FT-P-03 and NFT-PERF-02.
 - Derkachi project data is calibration-limited: raw camera intrinsics, lens distortion, and camera-to-body transform are still required before final absolute accuracy thresholds can be treated as production acceptance.
 - Phase 3 must validate camera calibration inputs and public/calibrated dataset acquisition before FT-P-03, FT-P-04, and NFT-PERF-02 can be used for final signoff.
+- Cycle 1 Docker replay smoke evidence currently passes blackbox, performance, resilience, and security infrastructure scenarios; Jetson resource evidence remains a target-hardware release gate and is reported as blocked on local runners.