[autodev] Update configuration and documentation for cycle-1

- Enhanced `.env.example` with detailed CMake build flags and replay-mode strategy flags for development and CI environments.
- Updated `.gitignore` to include a new deploy rollback bookmark.
- Revised `_docs/_autodev_state.md` to reflect the current task status and steps.
- Added new lessons to `_docs/LESSONS.md` regarding testing and architectural improvements.
- Documented changes in `_docs/02_document/deployment/ci_cd_pipeline.md` to reflect the relaxed OpenCV version pin.
- Updated test data documentation in `_docs/02_document/tests/test-data.md` to clarify fixture usage and paths.

This commit continues the cycle-1 documentation sync and addresses various configuration updates for improved clarity and functionality.
Oleksandr Bezdieniezhnykh
2026-05-20 08:05:35 +03:00
parent ab92946833
commit bf13549b32
34 changed files with 3689 additions and 42 deletions
+115 -16
@@ -33,8 +33,8 @@ CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightl
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`tests/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`tests/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios | — |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
### Networks
@@ -47,7 +47,7 @@ CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightl
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `tests/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
@@ -117,9 +117,80 @@ volumes:
## Consumer Application
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.12.0 (frame source replay), numpy + scipy (geodesic-distance assertions in WGS84).
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).
**Entry point**: `pytest tests/e2e/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`).
**Entry point**: `pytest e2e/tests/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`) and VioStrategy (`okvis2` / `klt_ransac`) via the session-scoped conftest fixtures.
### Harness Implementation Layout
The blackbox harness implementation lives under `e2e/` (NOT the SUT source tree — public-boundary discipline enforced by `e2e/README.md`):
```
e2e/
├── docker/ Tier-1 entrypoint
│ ├── docker-compose.test.yml Compose stack (services from § Services above)
│ ├── docker-compose.tier2-bridge.yml Compose override for paired-host Tier-2 SITL bridging
│ ├── run-tier1.sh AZ-444 selector-parity wrapper
│ └── secrets/ Mounted Docker secrets (mavlink-passkey)
├── jetson/ Tier-2 entrypoint
│ ├── run-tier2.sh AZ-444 selector-parity wrapper (control-host side)
│ ├── tier2-on-jetson.sh SSH-orchestrated on-Jetson half
│ ├── tier2.service systemd unit template
│ ├── jtop_parser.py jetson_stats / jtop telemetry parser (NFT-LIM-01)
│ └── tegrastats_parser.py tegrastats parser (NFT-LIM-04)
├── runner/ e2e-runner image
│ ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│ ├── helpers/ Per-AC evaluator + observer modules (47 evaluators
│ │ covering accuracy, AP/iNav contract, blackout-spoof,
│ │ cache poisoning, cold-start, companion reboot,
│ │ CVE probe, e2e latency, egress observer, escalation
│ │ ladder, FDR reader, frame-source replay, IMU replay,
│ │ injector fixtures, MAVLink signing, MAVProxy tlog,
│ │ memory budget, mid-flight tile, mock suite-sat audit,
│ │ Monte Carlo envelope, MRE, multi-segment, outage
│ │ request, outlier tolerance, registration classifier,
│ │ retrieval, sharp-turn, sitl_observer, smoothing,
│ │ spoof promotion, storage budget, streaming, thermal
│ │ envelope, tile-cache inspector, TTFF — see
│ │ `e2e/runner/helpers/` for the authoritative list)
│ └── reporting/ CSV reporter + evidence bundler (AZ-445/446)
│ ├── csv_reporter.py Emits `report.csv` per § Reporting
│ ├── evidence_bundler.py Collects per-run `.tlog`, FDR, telemetry CSVs
│ └── nfr_recorder.py NFR per-stage latency + budget recorder
├── fixtures/ Fixture builders + captured fixtures
│ ├── tile-cache-builder/ `tile-cache-fixture` builder
│ ├── age-injector/ `synth-age-tile-set` builder (FT-N-05)
│ ├── injectors/ Runtime injectors:
│ │ ├── outlier.py `outlier-injection-derkachi` (FT-N-01)
│ │ ├── blackout_spoof.py `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│ │ ├── multi_segment.py `multi-segment-derkachi` (FT-P-08)
│ │ ├── cold_boot.py `cold-boot-fixture` (NFT-PERF-03)
│ │ └── fc_proxy.py FC-inbound blackout/spoof proxy (FT-N-04 driver)
│ ├── sitl_replay/ Captured offline FDR-replay fixtures
│ │ └── p01/ FT-P-01 capture set (see test-data.md)
│ ├── sitl_replay_builder/ Captured-fixture builder framework (AZ-598-600)
│ │ ├── builder.py VideoSource × TlogSource × FdrProjection strategies
│ │ ├── build_p01_fixtures.py FT-P-01 still-image builder
│ │ └── build_p02_fixtures.py FT-P-02 Derkachi builder
│ ├── mock-suite-sat/ `mock-suite-sat-service` Docker image
│ ├── secrets/ Test-only secrets (mavlink-test-passkey.txt)
│ └── security/ Security fixtures (cve-2025-53644.jpg)
├── tests/ Pytest target: positive/, negative/, performance/,
│ resilience/, security/, resource_limit/
└── _unit_tests/ Out-of-container unit tests for harness internals
(runs as part of project pytest, no Docker required)
```
### Replay-Mode Skip Gating
Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the `E2E_SITL_REPLAY_DIR` environment variable is unset, those scenarios skip cleanly via a `sitl_replay_ready` pytest marker (per AZ-594/595/598/599). To activate them:
```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```
The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) regenerates these fixtures from `_docs/00_problem/input_data/` against a live compose stack; the captured artifacts are then committed under `e2e/fixtures/sitl_replay/<scenario>/`. See `e2e/fixtures/sitl_replay_builder/README.md` for the framework, supported scenarios, and per-scenario builder invocations.
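The skip gating described above could be wired up roughly as follows. This is a minimal sketch, not the harness's actual `conftest.py`: the helper name `replay_skip_reason`, the use of the `pytest_collection_modifyitems` hook, and the choice to treat an empty value like an unset one are all assumptions made for illustration. Only the `E2E_SITL_REPLAY_DIR` variable and the `sitl_replay_ready` marker name come from the text.

```python
import os
from typing import Mapping, Optional

REPLAY_ENV_VAR = "E2E_SITL_REPLAY_DIR"
REPLAY_MARKER = "sitl_replay_ready"


def replay_skip_reason(env: Mapping[str, str]) -> Optional[str]:
    """Return a skip reason when replay mode is inactive, else None.

    Assumption: an empty or whitespace-only value counts as unset.
    """
    if env.get(REPLAY_ENV_VAR, "").strip():
        return None
    return f"{REPLAY_ENV_VAR} is unset; captured-fixture replay scenarios are skipped"


def pytest_collection_modifyitems(config, items):
    """Hypothetical conftest.py hook: skip marked scenarios when replay is off."""
    import pytest  # imported lazily so the helper above is usable without pytest

    reason = replay_skip_reason(os.environ)
    if reason is None:
        return
    for item in items:
        # Only scenarios carrying the marker are affected; others run as usual.
        if item.get_closest_marker(REPLAY_MARKER):
            item.add_marker(pytest.mark.skip(reason=reason))
```

Keeping the decision in a pure function makes the gating testable out-of-container, which fits the `_unit_tests/` layout shown earlier.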
### Communication with system under test
@@ -191,7 +262,7 @@ volumes:
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan returned zero indicators because no source code exists yet — this is the planning phase. Decompose → Implement will produce `requirements.txt` / `pyproject.toml` / Cargo.toml entries that confirm: `tensorrt`, `pycuda`, `pymavlink`, `gtsam`, `faiss-gpu`, `opencv-python>=4.12.0`, `jetson-stats`.)
(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation: `pyproject.toml` confirms `tensorrt`, `pymavlink`, `gtsam==4.2.1`, `faiss-gpu`, `opencv-python>=4.11.0.86,<4.12` (cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — the original `>=4.12.0` target replays once gtsam ships numpy-2 wheels), and `jetson-stats`. `pycuda` was NOT added — TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without `pycuda`.)
### Execution instructions — Tier-1 (Docker)
@@ -200,19 +271,33 @@ volumes:
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; otherwise falls back to CPU TensorRT).
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.
**How to start**:
**How to start** (preferred — selector-parity wrapper from AZ-444):
```bash
./e2e/docker/run-tier1.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--enable-chamber]
```
`run-tier1.sh` and `e2e/jetson/run-tier2.sh` accept the same `-k <selector>` flag and emit the same pytest invocation modulo the `TIER` env var (AZ-444 AC-1).
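To illustrate what selector parity means here, the sketch below models both wrappers as one function that differs only in the `TIER` value it exports. The function name, the `tier1` / `tier2` values, and the example selector are hypothetical; the real wrappers are shell scripts and may assemble the command differently.

```python
from typing import Dict, List, Optional, Tuple


def build_pytest_invocation(
    tier: str, selector: Optional[str] = None
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment, argv) for a harness run.

    Assumption: the tier is communicated only through the TIER variable,
    so argv is identical across tiers for the same selector.
    """
    argv = ["pytest", "e2e/tests/"]
    if selector:
        argv += ["-k", selector]
    return {"TIER": tier}, argv


# Hypothetical selector; both tiers yield the same argv.
env1, argv1 = build_pytest_invocation("tier1", "ft_p_01")
env2, argv2 = build_pytest_invocation("tier2", "ft_p_01")
```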
Raw-compose equivalent (when bypassing the wrapper for debugging):
```bash
cd e2e/docker
export FC_ADAPTER=ardupilot # or: inav (parameterized per scenario in CI)
export VIO_STRATEGY=okvis2 # or: klt_ransac (production binary)
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner
```
The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.
**Environment variables**:
- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for the production binary; `vins_mono` only when the research binary (`BUILD_VINS_MONO=ON`) is the one being built.
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR` — when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID` — per-invocation run identifier; defaults to `local-${USER}-${EPOCH}` in development, CI sets it from the workflow run id. Determines the `e2e-results/run-${RUN_ID}/` output directory.
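The variables above can be resolved and validated in one place. The following is a sketch under stated assumptions rather than the harness's own code: the function name, the `research_build` flag, the `unknown` fallback for a missing `USER`, and the reading of `EPOCH` as Unix seconds are all illustrative choices.

```python
import time
from typing import Dict, Mapping, Optional

FC_ADAPTERS = {"ardupilot", "inav"}
PRODUCTION_VIO = {"okvis2", "klt_ransac"}
RESEARCH_VIO = PRODUCTION_VIO | {"vins_mono"}


def resolve_run_config(
    env: Mapping[str, str],
    research_build: bool = False,
    epoch: Optional[int] = None,
) -> Dict[str, str]:
    """Validate the run environment and derive the results directory."""
    fc = env.get("FC_ADAPTER", "")
    if fc not in FC_ADAPTERS:
        raise ValueError(f"FC_ADAPTER must be one of {sorted(FC_ADAPTERS)}, got {fc!r}")

    vio = env.get("VIO_STRATEGY", "")
    allowed = RESEARCH_VIO if research_build else PRODUCTION_VIO
    if vio not in allowed:
        raise ValueError(f"VIO_STRATEGY must be one of {sorted(allowed)}, got {vio!r}")

    run_id = env.get("RUN_ID")
    if not run_id:
        # Development default: local-${USER}-${EPOCH} (EPOCH assumed to be Unix seconds).
        if epoch is None:
            epoch = int(time.time())
        run_id = f"local-{env.get('USER', 'unknown')}-{epoch}"

    return {
        "fc_adapter": fc,
        "vio_strategy": vio,
        "run_id": run_id,
        "results_dir": f"e2e-results/run-{run_id}",
    }
```

Rejecting `vins_mono` unless the research build is requested mirrors the production/research split in the list above.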
**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
@@ -225,20 +310,34 @@ The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).
**How to start**:
**How to start** (AZ-444 selector-parity wrapper):
```bash
cd e2e/jetson
sudo systemctl restart gps-denied-onboard.service
./run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 8h
# or:
./run-tier2.sh --fc-adapter inav --vio-strategy klt_ransac --duration 5min
./e2e/jetson/run-tier2.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--duration 5min|8h] \
[--enable-chamber] \
[--reflash]
```
Outputs the same CSV format as Tier-1 (one report.csv per run).
The Tier-2 SITL stack runs on a paired x86 host via:
```bash
docker compose \
-f e2e/docker/docker-compose.test.yml \
-f e2e/docker/docker-compose.tier2-bridge.yml up ...
```
When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (`tier2-on-jetson.sh`). When `TIER2_HOST=localhost` and the script runs on the Jetson itself, it delegates directly without SSH. Outputs the same CSV format as Tier-1 (one report.csv per run) plus tegrastats + jtop CSVs in the evidence bundle.
**Environment variables**: same as Tier-1 plus:
- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH` — control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE`: `/dev/video0` (production) or a file path for replay mode.
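The control-host versus on-Jetson dispatch described above can be sketched as a small decision function. This is illustrative only: the function name and the remote script path are assumptions, and the real `run-tier2.sh` is a shell script whose SSH options are not shown in this document.

```python
from typing import List, Mapping

SSH_VARS = ("TIER2_HOST", "TIER2_USER", "TIER2_KEY_PATH")


def tier2_command(
    env: Mapping[str, str],
    remote_script: str = "e2e/jetson/tier2-on-jetson.sh",  # hypothetical path
) -> List[str]:
    """Build the command that starts the on-Jetson half of a Tier-2 run."""
    host = env.get("TIER2_HOST", "")
    if host == "localhost":
        # Running on the Jetson itself: delegate directly, no SSH.
        return [remote_script]

    missing = [name for name in SSH_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing SSH wiring variables: {', '.join(missing)}")

    return [
        "ssh",
        "-i", env["TIER2_KEY_PATH"],
        f"{env['TIER2_USER']}@{host}",
        remote_script,
    ]
```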
`gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` for `--build-kind=asan`) MUST be installed via systemd on the Jetson — `e2e/jetson/tier2.service` is the template. See `_docs/03_implementation/jetson_harness_setup.md` for the physical provisioning steps.
### CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.