[autodev] Update configuration and documentation for cycle-1
ci/woodpecker/push/02-build-push Pipeline failed

- Enhanced `.env.example` with detailed CMake build flags and replay-mode strategy flags for development and CI environments.
- Updated `.gitignore` to include a new deploy rollback bookmark.
- Revised `_docs/_autodev_state.md` to reflect the current task status and steps.
- Added new lessons to `_docs/LESSONS.md` regarding testing and architectural improvements.
- Documented changes in `_docs/02_document/deployment/ci_cd_pipeline.md` to reflect the relaxed OpenCV version pin.
- Updated test data documentation in `_docs/02_document/tests/test-data.md` to clarify fixture usage and paths.

This commit continues the cycle-1 documentation sync and addresses various configuration updates for improved clarity and functionality.
Oleksandr Bezdieniezhnykh
2026-05-20 08:05:35 +03:00
parent ab92946833
commit bf13549b32
34 changed files with 3689 additions and 42 deletions
@@ -98,7 +98,7 @@ The Dockerfile receives the args; `cmake -DBUILD_VINS_MONO=$BUILD_VINS_MONO -DBU
| .NET dependency CVEs | `dotnet list package --vulnerable --include-transitive` | Critical / High severity |
| C++ dependency CVEs | Manual audit via SBOM matched against NVD; `osv-scanner` for known submodule pins | Critical / High severity |
| Image scan | Trivy on all CI-built images | Critical / High severity |
| OpenCV pin gate | CI step asserts the resolved OpenCV version is `≥ 4.12.0` (D-CROSS-CVE-1) | Any version `< 4.12.0` |
| OpenCV pin gate | CI step asserts the resolved OpenCV version is within the cycle-1 relaxed band `>=4.11.0.86,<4.12` (D-CROSS-CVE-1 — see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`; original target `>=4.12.0` replays once gtsam ships numpy-2 wheels) | Any version `< 4.11.0.86` OR `>= 4.12` while leftover is open |
| GTSAM CVE re-scan | Monthly scheduled workflow against the GTSAM commit pinned in `cmake/dependencies.cmake` | Any new published CVE |
### Push images (Tier-1)
+6 -1
@@ -42,6 +42,11 @@ The Task workflow's three update levels each have a different effective scope th
- **Batch 1 (this session, 2026-05-19)**: C2 (VPR), C2.5 (Rerank), C3 (Matcher) — cycle-1 reality paragraphs + OpenCV pin relaxation (where applicable) + C3 xfeat Tier-2 follow-up note. Source of truth crossed: `runtime_root/airborne_bootstrap.py` (`_C*_STRATEGIES`, `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`, `C3_MATCHER_BUILD_FLAGS`), `runtime_root/vpr_factory.py`, per-component `config.py`.
- **Remaining (10 components)**: C3.5 (AdHoP), C4 (Pose), C5 (StateEstimator), C6 (TileCache), C7 (Inference), C8 (FC adapter), C10 (Provisioning), C11 (TileManager), C12 (OperatorOrchestrator), C13 (FDR).
- **Helpers (8 files)**: `imu_preintegrator`, `se3_utils`, `lightglue_runtime`, `wgs_converter`, `sha256_sidecar`, `engine_filename_schema`, `ransac_filter`, `descriptor_normaliser`.
4. **`tests/*.md`** — pick up cycle-1 deltas; Step 12 (Test-Spec Sync) already touched `traceability-matrix.md` and `resilience-tests.md` in the uncommitted working tree; the remaining test-doc surfaces (`blackbox-tests.md`, `performance-tests.md`, `resource-limit-tests.md`, `security-tests.md`, `tier2-jetson-testing.md`, `environment.md`, `test-data.md`) should be checked against the ~36 done Blackbox Tests task specs.
4. **`tests/*.md`** — pick up cycle-1 deltas; Step 12 (Test-Spec Sync) already touched `traceability-matrix.md` and `resilience-tests.md` in the uncommitted working tree; the remaining test-doc surfaces (`blackbox-tests.md`, `performance-tests.md`, `resource-limit-tests.md`, `security-tests.md`, `tier2-jetson-testing.md`, `environment.md`, `test-data.md`) should be checked against the ~36 done Blackbox Tests task specs. **DONE** — 2026-05-19 session (autodev Step 13 phase 9 tests-doc-updates):
- **`environment.md`** — added § Harness Implementation Layout (47-evaluator `runner/helpers/` inventory, `runner/reporting/` CSV+evidence bundler from AZ-445/446, `fixtures/{injectors,sitl_replay,sitl_replay_builder}` layout, `e2e/jetson/` Tier-2 entrypoint); added § Replay-Mode Skip Gating (`E2E_SITL_REPLAY_DIR` + `sitl_replay_ready` marker from AZ-594/595/598/599); replaced raw-compose Tier-1 + Tier-2 examples with `e2e/docker/run-tier1.sh` + `e2e/jetson/run-tier2.sh` selector-parity wrappers (AZ-444 AC-1); aligned OpenCV pin reference to the cycle-1 `>=4.11.0.86,<4.12` floor with leftover cross-reference; fixed stale `tests/fixtures/` and `tests/runner` paths to `e2e/fixtures/` and `e2e/runner`.
- **`test-data.md`** — added rows for `sitl-replay-fixture-p01`, `sitl-replay-fixture-p02`, and `fc-proxy-schedule` (AZ-596/598/599); revised `cve-jpeg-fixture` row to reflect the relaxed cycle-1 OpenCV pin band with leftover cross-reference; added § Data Isolation paragraph for committed-fixture mode; fixed stale `tests/fixtures/` paths to `e2e/fixtures/`.
- **`security-tests.md`** — NFT-SEC-04 pin assertion + Pass Criteria updated to the cycle-1 relaxed band with leftover cross-reference; replay condition documented.
- **`ci_cd_pipeline.md`** (deployment doc — adjacent hygiene) — OpenCV pin-gate row revised to the relaxed band with replay condition.
- **Unchanged** — `blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `resource-limit-tests.md`, `tier2-jetson-testing.md`, `traceability-matrix.md`. Step 12 (Test-Spec Sync) already aligned these with implementation; no further gaps found against the ~50 done test task specs (AZ-406-446, AZ-594-600, AZ-618-619). `tests/e2e/replay/test_derkachi_1min.py` path references in `resilience-tests.md` + `tier2-jetson-testing.md` verified — file exists at that path at repo root, distinct from the `e2e/tests/` blackbox harness.
The component-level pass (item 2 + 3) is the bulk of the work. Each component-batch session should re-read this ripple log + the relevant component description.md + the task specs in `_docs/02_tasks/done/AZ-*_<component>*.md` + the actual source in `src/gps_denied_onboard/components/<c>/`. Per-batch session pattern proven this cycle: read `runtime_root/airborne_bootstrap.py` `_C*_STRATEGIES` + `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` rows for each slug, cross-check against component `config.py` defaults, then add (a) a "Cycle-1 operational reality" paragraph in § 1 of `description.md`, (b) an OpenCV pin row in § 5 if the component imports `cv2`, (c) a "Cycle-1 Tier-2 follow-up dependencies" subsection in § 7 only when a strategy exists in code but is parked from the airborne registry.
+115 -16
@@ -33,8 +33,8 @@ CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightl
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`tests/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`tests/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios | — |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
### Networks
@@ -47,7 +47,7 @@ CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightl
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `tests/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
@@ -117,9 +117,80 @@ volumes:
## Consumer Application
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.12.0 (frame source replay), numpy + scipy (geodesic-distance assertions in WGS84).
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0.86,<4.12 (frame source replay; see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).
**Entry point**: `pytest tests/e2e/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`).
**Entry point**: `pytest e2e/tests/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`) and VioStrategy (`okvis2` / `klt_ransac`) via the session-scoped conftest fixtures.
### Harness Implementation Layout
The blackbox harness implementation lives under `e2e/` (NOT the SUT source tree — public-boundary discipline enforced by `e2e/README.md`):
```
e2e/
├── docker/ Tier-1 entrypoint
│ ├── docker-compose.test.yml Compose stack (services from § Services above)
│ ├── docker-compose.tier2-bridge.yml Compose override for paired-host Tier-2 SITL bridging
│ ├── run-tier1.sh AZ-444 selector-parity wrapper
│ └── secrets/ Mounted Docker secrets (mavlink-passkey)
├── jetson/ Tier-2 entrypoint
│ ├── run-tier2.sh AZ-444 selector-parity wrapper (control-host side)
│ ├── tier2-on-jetson.sh SSH-orchestrated on-Jetson half
│ ├── tier2.service systemd unit template
│ ├── jtop_parser.py jetson_stats / jtop telemetry parser (NFT-LIM-01)
│ └── tegrastats_parser.py tegrastats parser (NFT-LIM-04)
├── runner/ e2e-runner image
│ ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│ ├── helpers/ Per-AC evaluator + observer modules (47 evaluators
│ │ covering accuracy, AP/iNav contract, blackout-spoof,
│ │ cache poisoning, cold-start, companion reboot,
│ │ CVE probe, e2e latency, egress observer, escalation
│ │ ladder, FDR reader, frame-source replay, IMU replay,
│ │ injector fixtures, MAVLink signing, MAVProxy tlog,
│ │ memory budget, mid-flight tile, mock suite-sat audit,
│ │ Monte Carlo envelope, MRE, multi-segment, outage
│ │ request, outlier tolerance, registration classifier,
│ │ retrieval, sharp-turn, sitl_observer, smoothing,
│ │ spoof promotion, storage budget, streaming, thermal
│ │ envelope, tile-cache inspector, TTFF — see
│ │ `e2e/runner/helpers/` for the authoritative list)
│ └── reporting/ CSV reporter + evidence bundler (AZ-445/446)
│ ├── csv_reporter.py Emits `report.csv` per § Reporting
│ ├── evidence_bundler.py Collects per-run `.tlog`, FDR, telemetry CSVs
│ └── nfr_recorder.py NFR per-stage latency + budget recorder
├── fixtures/ Fixture builders + captured fixtures
│ ├── tile-cache-builder/ `tile-cache-fixture` builder
│ ├── age-injector/ `synth-age-tile-set` builder (FT-N-05)
│ ├── injectors/ Runtime injectors:
│ │ ├── outlier.py `outlier-injection-derkachi` (FT-N-01)
│ │ ├── blackout_spoof.py `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│ │ ├── multi_segment.py `multi-segment-derkachi` (FT-P-08)
│ │ ├── cold_boot.py `cold-boot-fixture` (NFT-PERF-03)
│ │ └── fc_proxy.py FC-inbound blackout/spoof proxy (FT-N-04 driver)
│ ├── sitl_replay/ Captured offline FDR-replay fixtures
│ │ └── p01/ FT-P-01 capture set (see test-data.md)
│ ├── sitl_replay_builder/ Captured-fixture builder framework (AZ-598-600)
│ │ ├── builder.py VideoSource × TlogSource × FdrProjection strategies
│ │ ├── build_p01_fixtures.py FT-P-01 still-image builder
│ │ └── build_p02_fixtures.py FT-P-02 Derkachi builder
│ ├── mock-suite-sat/ `mock-suite-sat-service` Docker image
│ ├── secrets/ Test-only secrets (mavlink-test-passkey.txt)
│ └── security/ Security fixtures (cve-2025-53644.jpg)
├── tests/ Pytest target: positive/, negative/, performance/,
│ resilience/, security/, resource_limit/
└── _unit_tests/ Out-of-container unit tests for harness internals
(runs as part of project pytest, no Docker required)
```
### Replay-Mode Skip Gating
Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the `E2E_SITL_REPLAY_DIR` environment variable is unset, those scenarios skip cleanly via a `sitl_replay_ready` pytest marker (per AZ-594/595/598/599). To activate them:
```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```
The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) regenerates these fixtures from `_docs/00_problem/input_data/` against a live compose stack; the captured artifacts are then committed under `e2e/fixtures/sitl_replay/<scenario>/`. See `e2e/fixtures/sitl_replay_builder/README.md` for the framework, supported scenarios, and per-scenario builder invocations.
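The gating described above can be sketched as a collection hook. This is a minimal illustration and not the harness's actual `conftest.py`: the helper name `replay_dir_or_none` is hypothetical, and treating a path that is not a directory the same as an unset variable is an assumption beyond what the text states.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REPLAY_ENV_VAR = "E2E_SITL_REPLAY_DIR"
REPLAY_MARKER = "sitl_replay_ready"


def replay_dir_or_none(environ: Mapping[str, str]) -> Optional[Path]:
    """Return the replay directory, or None when replay mode is not usable.

    Hypothetical helper: a set-but-missing directory is treated like an
    unset variable (an assumption, not documented harness behavior).
    """
    raw = environ.get(REPLAY_ENV_VAR, "").strip()
    if not raw:
        return None
    path = Path(raw)
    return path if path.is_dir() else None


def pytest_collection_modifyitems(config, items):
    """Skip every test carrying the replay marker when no fixture is available."""
    import pytest  # imported lazily so the helper above works without pytest

    if replay_dir_or_none(os.environ) is not None:
        return
    skip = pytest.mark.skip(reason=f"{REPLAY_ENV_VAR} is unset or not a directory")
    for item in items:
        if REPLAY_MARKER in item.keywords:
            item.add_marker(skip)
```

Under this sketch, a scenario would opt in with `@pytest.mark.sitl_replay_ready`, and the marker would need to be registered in `pytest.ini` to avoid unknown-marker warnings.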
### Communication with system under test
@@ -191,7 +262,7 @@ volumes:
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan returned zero indicators because no source code exists yet — this is the planning phase. Decompose → Implement will produce `requirements.txt` / `pyproject.toml` / Cargo.toml entries that confirm: `tensorrt`, `pycuda`, `pymavlink`, `gtsam`, `faiss-gpu`, `opencv-python>=4.12.0`, `jetson-stats`.)
(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation: `pyproject.toml` confirms `tensorrt`, `pymavlink`, `gtsam==4.2.1`, `faiss-gpu`, `opencv-python>=4.11.0.86,<4.12` (cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — the original `>=4.12.0` target replays once gtsam ships numpy-2 wheels), and `jetson-stats`. `pycuda` was NOT added — TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without `pycuda`.)
### Execution instructions — Tier-1 (Docker)
@@ -200,19 +271,33 @@ volumes:
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; otherwise falls back to CPU TensorRT).
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.
**How to start**:
**How to start** (preferred — selector-parity wrapper from AZ-444):
```bash
./e2e/docker/run-tier1.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--enable-chamber]
```
`run-tier1.sh` and `e2e/jetson/run-tier2.sh` accept the same `-k <selector>` flag and emit the same pytest invocation modulo the `TIER` env var (AZ-444 AC-1).
Raw-compose equivalent (when bypassing the wrapper for debugging):
```bash
cd e2e/docker
export FC_ADAPTER=ardupilot # or: inav (parameterized per scenario in CI)
export VIO_STRATEGY=okvis2 # or: klt_ransac (production binary)
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner
```
The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.
**Environment variables**:
- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for production binary; `vins_mono` only when the research binary `BUILD_VINS_MONO=ON` is the build.
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR` — when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID` — per-invocation run identifier; defaults to `local-${USER}-${EPOCH}` in development, CI sets it from the workflow run id. Determines the `e2e-results/run-${RUN_ID}/` output directory.
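As a rough illustration of how these variables constrain a run, the sketch below validates the adapter and strategy selection and derives the output directory. The function names are hypothetical, reading `BUILD_VINS_MONO` from the environment is an assumption (the text only says `vins_mono` requires the research build), and `EPOCH` is assumed to mean Unix seconds.

```python
import time
from pathlib import Path
from typing import Mapping, Tuple

FC_ADAPTERS = {"ardupilot", "inav"}
PRODUCTION_VIO = {"okvis2", "klt_ransac"}
RESEARCH_ONLY_VIO = {"vins_mono"}


def validate_selection(environ: Mapping[str, str]) -> Tuple[str, str]:
    """Check FC_ADAPTER and VIO_STRATEGY against the documented value sets."""
    fc = environ.get("FC_ADAPTER", "")
    vio = environ.get("VIO_STRATEGY", "")
    if fc not in FC_ADAPTERS:
        raise ValueError(f"FC_ADAPTER must be one of {sorted(FC_ADAPTERS)}, got {fc!r}")
    allowed = set(PRODUCTION_VIO)
    # Assumption: the research build is signalled through BUILD_VINS_MONO=ON.
    if environ.get("BUILD_VINS_MONO", "OFF") == "ON":
        allowed |= RESEARCH_ONLY_VIO
    if vio not in allowed:
        raise ValueError(f"VIO_STRATEGY must be one of {sorted(allowed)}, got {vio!r}")
    return fc, vio


def resolve_run_dir(environ: Mapping[str, str],
                    results_root: Path = Path("e2e-results")) -> Path:
    """Return e2e-results/run-<RUN_ID>, using the local default when RUN_ID is unset."""
    run_id = environ.get("RUN_ID")
    if not run_id:
        user = environ.get("USER", "unknown")
        run_id = f"local-{user}-{int(time.time())}"  # EPOCH assumed to be Unix seconds
    return results_root / f"run-{run_id}"
```

For example, `resolve_run_dir({"RUN_ID": "1234"})` yields `e2e-results/run-1234`, which is where `report.csv` would be expected.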
**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
@@ -225,20 +310,34 @@ The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).
**How to start**:
**How to start** (AZ-444 selector-parity wrapper):
```bash
cd e2e/jetson
sudo systemctl restart gps-denied-onboard.service
./run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 8h
# or:
./run-tier2.sh --fc-adapter inav --vio-strategy klt_ransac --duration 5min
./e2e/jetson/run-tier2.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--duration 5min|8h] \
[--enable-chamber] \
[--reflash]
```
Outputs the same CSV format as Tier-1 (one report.csv per run).
The Tier-2 SITL stack runs on a paired x86 host via:
```bash
docker compose \
-f e2e/docker/docker-compose.test.yml \
-f e2e/docker/docker-compose.tier2-bridge.yml up ...
```
When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (`tier2-on-jetson.sh`). When `TIER2_HOST=localhost` and the script runs on the Jetson itself, it delegates directly without SSH. Outputs the same CSV format as Tier-1 (one report.csv per run) plus tegrastats + jtop CSVs in the evidence bundle.
**Environment variables**: same as Tier-1 plus:
- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH` — control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE` = `/dev/video0` (production) or file path for replay mode.
`gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` for `--build-kind=asan`) MUST be installed via systemd on the Jetson — `e2e/jetson/tier2.service` is the template. See `_docs/03_implementation/jetson_harness_setup.md` for the physical provisioning steps.
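The control-host versus on-Jetson delegation can be pictured with the sketch below. `run-tier2.sh` itself is a shell script whose internals are not shown here, so this is only an illustration of the rule: the function name, the remote script path, and the exact `ssh` invocation are assumptions.

```python
import shlex
from typing import List, Mapping, Sequence

REQUIRED_FOR_SSH = ("TIER2_HOST", "TIER2_USER", "TIER2_KEY_PATH")


def jetson_command(environ: Mapping[str, str],
                   remote_script: str = "./tier2-on-jetson.sh",
                   extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for the Jetson half of a Tier-2 run.

    TIER2_HOST=localhost means the caller is already on the Jetson, so the
    script is run directly. Any other host goes through SSH and requires
    all three TIER2_* wiring variables.
    """
    script_cmd = [remote_script, *extra_args]
    if environ.get("TIER2_HOST") == "localhost":
        return script_cmd
    missing = [name for name in REQUIRED_FOR_SSH if not environ.get(name)]
    if missing:
        raise ValueError(f"missing SSH wiring variables: {', '.join(missing)}")
    target = f"{environ['TIER2_USER']}@{environ['TIER2_HOST']}"
    # The remote command is passed as one shell-quoted string.
    return ["ssh", "-i", environ["TIER2_KEY_PATH"], target, shlex.join(script_cmd)]
```

The returned list is suitable for `subprocess.run(argv, check=True)`; failing early on missing variables keeps a misconfigured control host from attempting a partial run.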
### CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
+2 -2
@@ -75,9 +75,9 @@ These tests cover the security-relevant AC and the Mode B revisions that introdu
| 2 | Push `cve-jpeg-fixture` to every code path that uses OpenCV imread/imdecode: nav-camera frame source (C1), satellite tile thumbnail re-load (C4), tile cache import (C6) | Each path either decodes cleanly OR returns a graceful error |
| 3 | Observe ASan output | 0 buffer-overflow / use-after-free / uninitialized-read reports |
| 4 | Observe SUT process exit code | Process does NOT crash; if rejection path taken, exit code is 0 + error logged |
| 5 | CI step: lint the lockfile / pyproject.toml / requirements.txt for the OpenCV version pin | Pin asserts `opencv-python >= 4.12.0` (or platform-equivalent) |
| 5 | CI step: lint the lockfile / pyproject.toml / requirements.txt for the OpenCV version pin | Pin asserts `opencv-python>=4.11.0.86,<4.12` (cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`; original target was `>=4.12.0` and will replay once gtsam ships numpy-2 wheels) |
**Pass criteria**: ASan clean; no crash; pinned version 4.12.0 in dependency manifest.
**Pass criteria**: ASan clean; no crash; pinned version satisfies the cycle-1 floor `opencv-python>=4.11.0.86,<4.12` (D-CROSS-CVE-1 follow-up open). The leftover-replay condition lifts the floor back to `>=4.12.0` once upstream constraints clear.
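A minimal sketch of the step-5 pin lint, under stated assumptions: it is not the project's CI step, the function names are hypothetical, and the version comparison handles only plain dotted numeric versions rather than the full PEP 440 grammar.

```python
import re
from typing import Tuple

BAND_LOW = (4, 11, 0, 86)   # inclusive lower bound of the cycle-1 band
BAND_HIGH = (4, 12)         # exclusive upper bound
EXPECTED_SPECIFIERS = {">=4.11.0.86", "<4.12"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '4.11.0.86' into a tuple."""
    parts = text.strip().split(".")
    if not parts or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(p) for p in parts)


def in_relaxed_band(version: str) -> bool:
    """True when a resolved version lies in >=4.11.0.86,<4.12."""
    v = parse_version(version)
    return BAND_LOW <= v < BAND_HIGH


def declared_pin_ok(requirement: str) -> bool:
    """True when a manifest line declares exactly the relaxed band for opencv-python."""
    match = re.fullmatch(r"\s*opencv-python\s*(.*)", requirement)
    if not match:
        return False
    specs = {s.replace(" ", "") for s in match.group(1).split(",") if s.strip()}
    return specs == EXPECTED_SPECIFIERS
```

Checking both the declared specifier and the resolved version catches two different failures: a manifest that drifted from the documented band, and a lockfile that resolved outside it. When the leftover closes, the constants would move back to the `>=4.12.0` target.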
---
+11 -6
@@ -7,14 +7,17 @@
| `still-image-set-60` | 60 nadir aerial images `AD000001-60.jpg` from `_docs/00_problem/input_data/` with WGS84 frame-center GT in `coordinates.csv` and per-image accuracy table in `expected_results/position_accuracy.csv`. Captured at 400 m AGL with ADTi 20MP 20L V1 (per `data_parameters.md`). Slow cadence (~1 per 2-3 s), so suitable for satellite-anchor frame-center tests, NOT frame-to-frame VIO. | FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-RES-03 (Monte Carlo), NFT-PERF-04 | Bind-mounted from `_docs/00_problem/input_data/` to `/test-data` in `e2e-runner` (read-only) | None — read-only fixture |
| `still-image-sat-refs-2` | Two paired Google Maps reference images `AD000001_gmaps.png`, `AD000002_gmaps.png`. Insufficient for full satellite-anchor coverage of the 60-image set; supplements the tile-cache fixture for AC-2.1b cross-validation only. | FT-P-05 (subset), FT-P-19 | Same as above | Same |
| `derkachi-fixture` | Cropped nadir flight footage `flight_derkachi/flight_derkachi.mp4` (H.264, 880×720, 30 fps, ~490.07 s = 14,700 frames) plus synchronized FC telemetry `flight_derkachi/data_imu.csv` (4,900 rows @ 10 Hz, columns `timestamp(ms)`, `Time`, `SCALED_IMU2.*`, `GLOBAL_POSITION_INT.*`). Three video frames per telemetry row. The `GLOBAL_POSITION_INT` columns are the trajectory ground truth. | FT-P-02, FT-P-04, FT-P-07, FT-P-10, FT-N-01 (synth on top), FT-N-02, FT-N-03 (synth), FT-N-04 (synth), NFT-PERF-01, NFT-PERF-02, NFT-RES-01, NFT-RES-02, NFT-RES-03 (Monte Carlo), NFT-RES-04, NFT-LIM-02 (8 h synth load loop) | Same bind mount as above | Same |
| `tile-cache-fixture` | Pre-built FAISS HNSW index + tile filesystem covering: (a) the 60 still-image footprints at 0.3-0.5 m/px, (b) the Derkachi route bbox at the same resolution. Built once per CI run by `tests/fixtures/tile-cache-builder/` from the `_gmaps.png` references and from a curated public-data subset (when D-PROJ-3 is resolved — until then, stub-tile content for footprints not paired with `_gmaps.png`). Tile manifest schema per `restrictions.md` § Satellite Imagery. | FT-P-01, FT-P-05, FT-P-15, FT-P-16, FT-P-17, FT-P-19, FT-N-05, FT-N-06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01 (poisoning test), NFT-SEC-02 (egress) | Built into named Docker volume `tile-cache-fixture`; mounted read-only into SUT at `/var/azaion/tile-cache` | Volume removed at teardown |
| `synth-age-tile-set` | Two clones of the tile-cache-fixture with manifest `capture_date` field synthetically aged: `synth-age-7mo` (>6 mo, exceeds AC-8.2 active-conflict threshold) and `synth-age-13mo` (>12 mo, exceeds rear threshold). Tile pixels unchanged; only manifest dates differ. | FT-N-05, FT-N-06 | Built from `tile-cache-fixture` by date-mutating script in `tests/fixtures/age-injector/` | Volume removed at teardown |
| `outlier-injection-derkachi` | Synthetic adversarial overlay on `derkachi-fixture`: every Nth frame replaced by a random crop from a far-away tile (>350 m offset, per AC-3.1) to inject a visual outlier. Three injection densities: `light` (1 in 100), `medium` (1 in 10), `heavy` (1 in 3). Generated at runtime by `tests/fixtures/injectors/outlier.py`. | FT-N-01 | Generated at scenario start, written to `tmpfs` in `e2e-runner`, mounted into SUT as a derived frame source | Auto-cleared at teardown (tmpfs) |
| `tile-cache-fixture` | Pre-built FAISS HNSW index + tile filesystem covering: (a) the 60 still-image footprints at 0.3-0.5 m/px, (b) the Derkachi route bbox at the same resolution. Built once per CI run by `e2e/fixtures/tile-cache-builder/` from the `_gmaps.png` references and from a curated public-data subset (when D-PROJ-3 is resolved — until then, stub-tile content for footprints not paired with `_gmaps.png`). Tile manifest schema per `restrictions.md` § Satellite Imagery. | FT-P-01, FT-P-05, FT-P-15, FT-P-16, FT-P-17, FT-P-19, FT-N-05, FT-N-06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01 (poisoning test), NFT-SEC-02 (egress) | Built into named Docker volume `tile-cache-fixture`; mounted read-only into SUT at `/var/azaion/tile-cache` | Volume removed at teardown |
| `synth-age-tile-set` | Two clones of the tile-cache-fixture with manifest `capture_date` field synthetically aged: `synth-age-7mo` (>6 mo, exceeds AC-8.2 active-conflict threshold) and `synth-age-13mo` (>12 mo, exceeds rear threshold). Tile pixels unchanged; only manifest dates differ. | FT-N-05, FT-N-06 | Built from `tile-cache-fixture` by date-mutating script in `e2e/fixtures/age-injector/` | Volume removed at teardown |
| `outlier-injection-derkachi` | Synthetic adversarial overlay on `derkachi-fixture`: every Nth frame replaced by a random crop from a far-away tile (>350 m offset, per AC-3.1) to inject a visual outlier. Three injection densities: `light` (1 in 100), `medium` (1 in 10), `heavy` (1 in 3). Generated at runtime by `e2e/fixtures/injectors/outlier.py`. | FT-N-01 | Generated at scenario start, written to `tmpfs` in `e2e-runner`, mounted into SUT as a derived frame source | Auto-cleared at teardown (tmpfs) |
| `blackout-spoof-derkachi` | Synthetic overlay on `derkachi-fixture`: pure-black frames inserted in 5 s / 15 s / 35 s windows AND simultaneous spoofed-GPS injection on the FC inbound stream. Spoof pattern: realistic-looking GPS jumps the trajectory 200-500 m in `north_east_random_direction`. Three windows produce three sub-scenarios per AC-NEW-8. Generated at runtime. | FT-N-04, NFT-RES-04 | Same | Same |
| `multi-segment-derkachi` | Synthetic overlay: 3+ blackout segments distributed across the Derkachi flight to exercise satellite-reference re-localization (AC-3.3) without spoofing. Generated at runtime. | FT-P-08 | Same | Same |
| `cold-boot-fixture` | The state needed to validate AC-NEW-1: a frozen FC pose (`GLOBAL_POSITION_INT` snapshot at flight-resume time) + the tile-cache-fixture + a blank FDR. Test cold-boots the SUT and measures TTFF. | NFT-PERF-03 (AC-NEW-1) | The frozen FC pose is a JSON fixture in `tests/fixtures/cold-boot/`; SUT is restarted (`docker compose restart gps-denied-onboard`) and TTFF is measured from container-ready event to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival at SITL | Container restart only |
| `mavlink-passkey` | A test-only MAVLink 2.0 signing passkey (32-byte hex). Used for D-C8-9 ArduPilot-track signing channel. NEVER reused outside test environment; checked-in as `tests/fixtures/secrets/mavlink-test-passkey.txt` with explicit comment "TEST ONLY". | FT-P-09 (AP track), NFT-SEC-03 | Loaded via Docker secret into SUT environment | None — fixture file |
| `cve-jpeg-fixture` | Crafted JPEG that triggers CVE-2025-53644 (uninitialized stack pointer → heap buffer write) in OpenCV 4.10/4.11. The pinned ≥4.12.0 must process it without crash and either decode safely or reject. | NFT-SEC-04 | Local-data-only fixture file at `tests/fixtures/security/cve-2025-53644.jpg` (sourced from public PoC, license-checked) | None — fixture file |
| `cold-boot-fixture` | The state needed to validate AC-NEW-1: a frozen FC pose (`GLOBAL_POSITION_INT` snapshot at flight-resume time) + the tile-cache-fixture + a blank FDR. Test cold-boots the SUT and measures TTFF. | NFT-PERF-03 (AC-NEW-1) | The frozen FC pose is a JSON fixture in `e2e/fixtures/cold-boot/`; SUT is restarted (`docker compose restart gps-denied-onboard`) and TTFF is measured from container-ready event to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival at SITL | Container restart only |
| `mavlink-passkey` | A test-only MAVLink 2.0 signing passkey (32-byte hex). Used for D-C8-9 ArduPilot-track signing channel. NEVER reused outside test environment; checked-in as `e2e/fixtures/secrets/mavlink-test-passkey.txt` with explicit comment "TEST ONLY". | FT-P-09 (AP track), NFT-SEC-03 | Loaded via Docker secret into SUT environment | None — fixture file |
| `cve-jpeg-fixture` | Crafted JPEG that triggers CVE-2025-53644 (uninitialized stack pointer → heap buffer write) in OpenCV 4.10/4.11. The currently-pinned `opencv-python>=4.11.0.86,<4.12` must process it without crash and either decode safely or reject. NFT-SEC-04 also exercises ASan to confirm no buffer overflow. The original D-CROSS-CVE-1 spec required `>=4.12.0`; the pin is held below 4.12 because gtsam==4.2 ships only numpy-1 wheels (see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — D-CROSS-CVE-1 leftover OPEN until upstream gtsam targets numpy>=2). | NFT-SEC-04 | Local-data-only fixture file at `e2e/fixtures/security/cve-2025-53644.jpg` (sourced from public PoC, license-checked) | None — fixture file |
| `sitl-replay-fixture-p01` | Pre-captured FT-P-01 FDR-replay set built by `e2e/fixtures/sitl_replay_builder/build_p01_fixtures.py` from the 60 still images. Contains `outbound_messages_<fc_kind>_<host>.json` (per-image lat/lon emitted by SUT; `null` entries encode timeouts), `observer_<fc_kind>_<host>.json` (sitl_observer config), `stills.mp4` (60-image stitched video), `stationary.tlog` (synthetic stationary IMU/ATTITUDE), `fdr.jsonl` (FDR archive). Activated by `E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01` (see environment.md § Replay-Mode Skip Gating). | FT-P-01 | Pre-committed at `e2e/fixtures/sitl_replay/p01/`; rebuild via `python -m e2e.fixtures.sitl_replay_builder.build_p01_fixtures --input-dir _docs/00_problem/input_data --output-dir e2e/fixtures/sitl_replay/p01 --fc-kind ardupilot --host sitl-host` | None — committed fixture |
| `sitl-replay-fixture-p02` | Pre-captured FT-P-02 Derkachi drift FDR-replay set built by `e2e/fixtures/sitl_replay_builder/build_p02_fixtures.py` from `flight_derkachi.mp4` + `data_imu.csv`. Contains `derkachi.tlog`, `fdr/fdr.jsonl`, `observer_<fc_kind>_<host>.json`. iNav not supported by current builder — ArduPilot only. | FT-P-02 | Pre-committed at `e2e/fixtures/sitl_replay/p02/`; rebuild via `python -m e2e.fixtures.sitl_replay_builder.build_p02_fixtures --derkachi-dir _docs/00_problem/input_data/flight_derkachi --output-dir e2e/fixtures/sitl_replay/p02 --fc-kind ardupilot --host sitl-host` | None — committed fixture |
| `fc-proxy-schedule` | JSON schedule loaded by `e2e/fixtures/injectors/fc_proxy.BlackoutSpoofProxy` to drive FT-N-04 blackout + spoofed-GPS windows on the FC inbound stream. Schedule format: `window_start_ms`, `window_end_ms`, `spoof_pattern` per window. Loaded via `BlackoutSpoofProxy.from_schedule_file(schedule_path)` and replayed by `runner/helpers/fc_proxy_runtime.drive_fc_proxy(...)` (AZ-596). | FT-N-04, NFT-RES-04 | Generated alongside the scenario's `blackout-spoof-derkachi` overlay; written to per-test tmpfs OR pre-captured under `e2e/fixtures/sitl_replay/<scenario>/proxy_schedule.json` when in FDR-replay mode | Auto-cleared at teardown (tmpfs) or committed (FDR-replay) |
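To make the `fc-proxy-schedule` row concrete, the sketch below parses a schedule carrying the three documented per-window fields. It is not the real `BlackoutSpoofProxy.from_schedule_file` loader: the top-level JSON array layout, the `SpoofWindow` and `load_schedule` names, the half-open window convention, and the overlap check are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpoofWindow:
    window_start_ms: int
    window_end_ms: int
    spoof_pattern: str


def load_schedule(text: str) -> List[SpoofWindow]:
    """Parse a schedule (assumed: JSON array of window objects), sorted by start time."""
    windows = [
        SpoofWindow(int(e["window_start_ms"]), int(e["window_end_ms"]), str(e["spoof_pattern"]))
        for e in json.loads(text)
    ]
    windows.sort(key=lambda w: w.window_start_ms)
    for w in windows:
        if w.window_end_ms <= w.window_start_ms:
            raise ValueError(f"empty or inverted window: {w}")
    for prev, cur in zip(windows, windows[1:]):
        if cur.window_start_ms < prev.window_end_ms:
            raise ValueError(f"overlapping windows: {prev} and {cur}")
    return windows


def active_window(windows: List[SpoofWindow], t_ms: int) -> Optional[SpoofWindow]:
    """Return the window covering t_ms, treating windows as half-open [start, end)."""
    for w in windows:
        if w.window_start_ms <= t_ms < w.window_end_ms:
            return w
    return None
```

A 5 s window starting at 10 s into the replay would appear as `{"window_start_ms": 10000, "window_end_ms": 15000, "spoof_pattern": "north_east_random_direction"}` under this assumed layout.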
## Data Isolation Strategy
@@ -24,6 +27,8 @@ For Tier-2 (Jetson hardware), the same isolation discipline applies but at the s
Synthetic-injection fixtures (`outlier-injection-derkachi`, `blackout-spoof-derkachi`, `multi-segment-derkachi`, `synth-age-tile-set`) are generated into per-test tmpfs and never written back to a persistent volume.
`sitl-replay-fixture-*` and `fc-proxy-schedule` (when in FDR-replay mode) are committed under `e2e/fixtures/sitl_replay/<scenario>/` and read read-only by the replay-mode scenarios. They are not regenerated per test — the builders under `e2e/fixtures/sitl_replay_builder/` are invoked manually (or by a fixture-refresh CI job) when the SUT replay contract changes. When `E2E_SITL_REPLAY_DIR` is unset, the gated scenarios skip cleanly via the `sitl_replay_ready` pytest marker (per AZ-594/595/598/599) and the harness falls back to live-mode (which requires the full Docker compose stack).
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |