[AZ-603] [AZ-604] e2e-runner: install SUT, fix entrypoint (Track 1)

Multi-stage Ubuntu 22.04 e2e-runner image installs gps-denied-onboard
(editable) into /opt/venv so the AZ-404 replay tests can subprocess
gps-denied-replay against the Derkachi fixture. Image layout mirrors
the host repo (/opt/pyproject.toml + /opt/src + /opt/tests bind mount)
so Path(__file__).parents[3] resolves to /opt and AC-4's AST scan
finds the components dir.

Entrypoint now runs `pytest /opt/tests/e2e/` instead of the empty
`scenarios/` dir. The bootstrap harness collects 24 tests vs. 0 before.

Compose: e2e-runner env mirrors the companion service (FullSystemConfig
requirements) plus RUN_REPLAY_E2E=1, BUILD_REPLAY_SINK_JSONL=ON;
bind-mounts the Derkachi fixture dir; adds writable fdr-data /
tile-data volumes the SUT requires.

Reality Gate signal is now real: 17 pass / 5 fail / 1 skip / 1 xfail.
The 5 heavy-AC failures share root cause AZ-614 (tlog synth time-base
mismatch, surfaced by the now-functional harness).

Also archives the replayed leftover entries (csv_reporter -> AZ-601,
harness rehab -> AZ-602 epic + 11 child stories).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-18 01:28:36 +03:00
parent 5c1c35da9a
commit c2934b8686
6 changed files with 204 additions and 294 deletions
@@ -210,8 +210,107 @@ The Track 2 ("Full blackbox harness") track from the previous section needs to e
- Commit `eb6dc17` — csv_reporter / pytest-csv fix
- Commit `6ce3158` — e2e/docker harness drift fixes (H-1, H-2, H-3)
- Local fix (uncommitted, ready to commit): `tests/fixtures/calibration/adti26.json` — H-12 4×4 SE3 fix
- Local fix (uncommitted, ready to commit): `tests/fixtures/replay_config_minimal.yaml` — minimal config for path-3 reproduction
- Commit `5c1c35d` — H-12 4×4 SE3 calibration fix + replay_config_minimal.yaml
- This report: `_docs/03_implementation/run_tests_step11_report.md`
- Leftover for pytest-csv ticket: `_docs/_process_leftovers/2026-05-17_csv_reporter_pytest_csv_conflict.md`
- Leftover for harness epic: `_docs/_process_leftovers/2026-05-17_e2e_harness_rehabilitation.md`
- (Replayed and removed) Leftover for pytest-csv ticket → AZ-601
- (Replayed and removed) Leftover for harness epic → AZ-602 + 11 child stories
## Cycle-2 Update: Track 1 Bootstrap Harness Outcome (2026-05-17 22:00 UTC)
### Status: Track 1 done — Reality Gate signal is now REAL
The harness rehabilitation Epic landed as `AZ-602` with 11 child stories. The user picked **Track 1** (`AZ-603` + `AZ-604`) for the shortest path to a genuine SUT Reality Gate signal. Both stories shipped together in a single PR.
### What changed
- `tests/e2e/Dockerfile` rewritten as a three-stage Ubuntu 22.04 build:
  - stage 1: system deps (`build-essential`, `libpq-dev`, `libspatialindex-dev`, `python3.10-venv`, `python3-pip`)
  - stage 2: SUT editable install (`pip install -e ".[dev]"` into `/opt/venv`)
  - stage 3: slim runtime with `python3`, `python3.10`, `libpq5`, `libspatialindex-c6`, `libgl1`, `libglib2.0-0` (OpenCV's runtime libs)
- Image layout: `/opt/pyproject.toml` + `/opt/src/...` + `/opt/tests/...` (bind-mounted) — mirrors the host repo so `Path(__file__).resolve().parents[3]` resolves to `/opt` and AC-4's AST scan finds `src/gps_denied_onboard/components/` correctly.
- Entrypoint: `pytest -q /opt/tests/e2e/` (not the empty `scenarios/` dir).
- `docker-compose.test.yml` `e2e-runner` service gets the full env set (`GPS_DENIED_FC_PROFILE`, `CAMERA_CALIBRATION_PATH`, `LOG_LEVEL`, `LOG_SINK`, `INFERENCE_BACKEND`, `FDR_PATH`, `TILE_CACHE_PATH`, `MAVLINK_SIGNING_KEY`, `RUN_REPLAY_E2E=1`, `BUILD_REPLAY_SINK_JSONL=ON`) plus mounts for `_docs/00_problem/input_data` and writable `fdr-data` / `tile-data` named volumes.
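
The layout claim above can be checked with a small sketch. The test file name `test_ac4_ast_scan.py` and its placement under `tests/e2e/replay/` are assumptions for illustration; this report only names the `tests/e2e/replay/` directory. `PurePosixPath` is used so the snippet runs without the files existing, which is also why `.resolve()` is left out.

```python
from pathlib import PurePosixPath

# Hypothetical test module path inside the image; the file name is illustrative.
test_file = PurePosixPath("/opt/tests/e2e/replay/test_ac4_ast_scan.py")

# parents[0] = /opt/tests/e2e/replay
# parents[1] = /opt/tests/e2e
# parents[2] = /opt/tests
# parents[3] = /opt
repo_root = test_file.parents[3]

# Directory the AC-4 AST scan is described as looking for.
components_dir = repo_root / "src" / "gps_denied_onboard" / "components"

print(repo_root)
print(components_dir)
```

If the test module sat one directory shallower or deeper, `parents[3]` would land on `/` or `/opt/tests` instead, which is why the image has to mirror the host repo depth exactly.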
### Reality Gate run
Standalone docker run of the e2e-runner (no companion / mock-sat / db needed for AZ-404):
```
docker run --rm \
-v "$PWD/tests:/opt/tests:ro" \
-v "$PWD/_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro" \
-e RUN_REPLAY_E2E=1 -e BUILD_REPLAY_SINK_JSONL=ON \
... (full env set) ... \
--entrypoint pytest gps-denied-onboard/e2e-runner:dev \
-v --tb=short /opt/tests/e2e/
```
Result:
| Outcome | Count | Tests |
|---------|-------|-------|
| PASSED | 17 | AC-4 AST scan, AC-4b encoder byte-equality, AC-7 skip-gate, all AC-9 helpers (`test_helpers.py`) |
| FAILED | 5 | AC-1, AC-2, AC-5, AC-6 pace-realtime, AC-6 pace-asap |
| SKIPPED | 1 | AC-8 operator workflow (D-PROJ-2 mock-suite-sat-service not implemented) |
| XFAIL | 1 | AC-3 (calibration intrinsics unknown — documented) |
| **Total collected** | **24** | (vs. 0 before Track 1 — empty `scenarios/` dir) |
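
As a quick consistency check on the table above, the per-outcome counts can be summed and compared with the collected total. The dictionary keys are illustrative labels, not pytest's own report field names.

```python
# Counts copied from the result table above.
outcomes = {"passed": 17, "failed": 5, "skipped": 1, "xfailed": 1}

total = sum(outcomes.values())
pass_rate = outcomes["passed"] / total

print(total)               # 24
print(f"{pass_rate:.1%}")  # 70.8%
```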
### Before vs. after
| Metric | Before Track 1 | After Track 1 |
|--------|---------------|---------------|
| Tests collected by `scripts/run-tests.sh` | 0 (entrypoint points at empty `scenarios/`) | 24 (full `tests/e2e/`) |
| Tests that actually exercise the SUT | 0 | 5 heavy ACs invoke `gps-denied-replay` subprocess |
| Exit code semantics | Vacuous 0 (no tests collected ≠ no SUT bugs) | Reflects real test outcomes |
| `gps-denied-replay` on PATH inside e2e-runner image | no (image was python:3.10-slim + pytest only) | yes (multi-stage SUT install) |
| Source-tree layout inside image matches repo | no (no src present) | yes (`/opt/src/...`, AC-4 passes) |
| Real SUT wall-clock per heavy AC | n/a | ~21 s for the auto-sync probe (see below) |
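
The exit-code row can be made concrete with a minimal sketch of a gate that refuses to treat an empty collection as success. The function `gate_exit_code` and its return values are hypothetical; they are not taken from `scripts/run-tests.sh`.

```python
def gate_exit_code(collected: int, failed: int) -> int:
    """Map a test run to a shell-style exit code (hypothetical helper)."""
    if collected == 0:
        # Nothing ran, so the run says nothing about the SUT: report an error.
        return 2
    # At least one test ran: fail if any test failed, otherwise succeed.
    return 1 if failed else 0


# Pre-Track 1 shape (0 collected) would be flagged instead of passing vacuously.
print(gate_exit_code(collected=0, failed=0))
# Post-Track 1 shape (24 collected, 5 failed) reports a genuine failure.
print(gate_exit_code(collected=24, failed=5))
```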
### Real bug discovered
The 5 failing heavy ACs share a single root cause: **tlog synth time-base mismatch**.
`tests/e2e/replay/_tlog_synth.py:62`:
```python
_TLOG_BASE_TIMESTAMP_US: Final[int] = 1_700_000_000_000_000 # 2023-11-14
# "The absolute value is irrelevant for replay-mode determinism;
# only the delta-between-rows matters." ← STALE COMMENT
```
The comment is wrong: the auto-sync detector in `replay_input.tlog_video_adapter` does use absolute timestamps to compute the video↔tlog offset. The tlog is anchored at an absolute epoch time in Nov 2023 while the synthetic video starts at relative `t=0`, so auto-sync reports `offset_ms=1699999995666` (about 54 years) and hard-fails its AC-8 validation (95% frame-window match threshold).
Surface signal from the SUT (the kind of log the Reality Gate was meant to surface):
```
ERROR replay_input.tlog_video_adapter
kind=replay.auto_sync.ac8_validation_failed
msg=auto-sync hard-fail: frame-window match below 95.0% with offset_ms=1699999995666
tlog_takeoff_ns=1700000000000000000 video_motion_onset_ns=4333333333
imu_sample_count=3000 video_frame_count=301
```
This is the same family as H-13 / `AZ-611` (stationary FT-P-01) but on the moving Derkachi fixture with a different root cause (synth time-base, not stationary kinematics). Filed as `AZ-614`.
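
The numbers in the log line can be reproduced with plain arithmetic. The sketch assumes the reported offset is a simple difference between the two anchors, truncated to milliseconds. That assumption matches the logged value, but the adapter's actual computation is not shown in this report.

```python
from datetime import datetime, timezone

# Values copied from the synth module and the SUT log line above.
tlog_base_timestamp_us = 1_700_000_000_000_000
tlog_takeoff_ns = 1_700_000_000_000_000_000
video_motion_onset_ns = 4_333_333_333

# Assumed computation: difference of the two anchors, truncated to ms.
offset_ms = (tlog_takeoff_ns - video_motion_onset_ns) // 1_000_000
offset_years = offset_ms / 1000 / (365.25 * 24 * 3600)

# The tlog base interpreted as a Unix epoch timestamp in UTC.
base_date = datetime.fromtimestamp(
    tlog_base_timestamp_us // 1_000_000, tz=timezone.utc
).date()

print(offset_ms)               # 1699999995666
print(round(offset_years, 1))  # 53.9
print(base_date)               # 2023-11-14
```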
### Jira state at end of cycle 2
| Issue | Title | Status |
|-------|-------|--------|
| AZ-602 | E2E Tier-1 harness rehabilitation (Epic) | TO DO |
| AZ-601 | csv_reporter `--csv` collision (fixed eb6dc17) | IN TESTING |
| AZ-603 | H-7 Dockerfile entrypoint (Track 1) | DONE (this cycle) |
| AZ-604 | H-8 install SUT in runner image (Track 1) | DONE (this cycle) |
| AZ-605 | H-4..H-6 SITL strategy decision | TO DO |
| AZ-606 | MAVProxy local Dockerfile | TO DO |
| AZ-607 | H-9 tile-cache seeder (linked to AZ-595) | TO DO |
| AZ-608 | H-10 fixture builder `--fdr-out` vs. `--output` | TO DO |
| AZ-609 | H-11 fixture builder missing CLI args | TO DO |
| AZ-610 | H-12 calibration JSON 4×4 (fixed 5c1c35d) | DONE |
| AZ-611 | H-13 auto-sync hard-fail on stationary | TO DO (Track 2, decision) |
| AZ-612 | H-14 `.env.example` BUILD_REPLAY_SINK_JSONL | TO DO |
| AZ-613 | H-1..H-3 harness drift (fixed 6ce3158) | DONE |
| AZ-614 | Derkachi tlog synth time-base mismatch | TO DO (Track 2, unblocks AC-1..AC-6) |
### Reality Gate verdict
**Cycle-2 verdict for Step 11**: Reality Gate signal is now REAL — the SUT runs end-to-end for ~21 s on the Derkachi fixture and surfaces a real auto-sync bug. Pre-Track 1, the gate was a vacuous "exit 0 with 0 tests collected" that hid every SUT issue. Track 1 was the minimum investment to make the gate honest; future cycles (Track 2 + AZ-614) will turn the failing ACs green.