[AZ-603] [AZ-604] e2e-runner: install SUT, fix entrypoint (Track 1)

Multi-stage Ubuntu 22.04 e2e-runner image installs gps-denied-onboard (editable) into /opt/venv so the AZ-404 replay tests can subprocess gps-denied-replay against the Derkachi fixture. Image layout mirrors the host repo (/opt/pyproject.toml + /opt/src + /opt/tests bind mount) so Path(__file__).parents[3] resolves to /opt and AC-4's AST scan finds the components dir.

Entrypoint now runs `pytest /opt/tests/e2e/` instead of the empty `scenarios/` dir. The bootstrap harness collects 24 tests vs. 0 before.

Compose: e2e-runner env mirrors the companion service (FullSystemConfig requirements) plus RUN_REPLAY_E2E=1, BUILD_REPLAY_SINK_JSONL=ON; bind-mounts the Derkachi fixture dir; adds writable fdr-data / tile-data volumes the SUT requires.

Reality Gate signal is now real: 17 pass / 5 fail / 1 skip / 1 xfail. The 5 heavy-AC failures share root cause AZ-614 (tlog synth time-base mismatch, surfaced by the now-functional harness).

Also archives the replayed leftover entries (csv_reporter -> AZ-601, harness rehab -> AZ-602 epic + 11 child stories).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 08:41:12 +00:00 · 2026-05-18 01:28:36 +03:00
parent 5c1c35da9a
commit c2934b8686
6 changed files with 204 additions and 294 deletions
@@ -210,8 +210,107 @@ The Track 2 ("Full blackbox harness") track from the previous section needs to e

 - Commit `eb6dc17` — csv_reporter / pytest-csv fix
 - Commit `6ce3158` — e2e/docker harness drift fixes (H-1, H-2, H-3)
- Local fix (uncommitted, ready to commit): `tests/fixtures/calibration/adti26.json` — H-12 4×4 SE3 fix
- Local fix (uncommitted, ready to commit): `tests/fixtures/replay_config_minimal.yaml` — minimal config for path-3 reproduction
+- Commit `5c1c35d` — H-12 4×4 SE3 calibration fix + replay_config_minimal.yaml
 - This report: `_docs/03_implementation/run_tests_step11_report.md`
- Leftover for pytest-csv ticket: `_docs/_process_leftovers/2026-05-17_csv_reporter_pytest_csv_conflict.md`
- Leftover for harness epic: `_docs/_process_leftovers/2026-05-17_e2e_harness_rehabilitation.md`
+- (Replayed and removed) Leftover for pytest-csv ticket → AZ-601
+- (Replayed and removed) Leftover for harness epic → AZ-602 + 11 child stories
+
+## Cycle-2 Update: Track 1 Bootstrap Harness Outcome (2026-05-17 22:00 UTC)
+
+### Status: Track 1 done — Reality Gate signal is now REAL
+
+The harness rehabilitation Epic landed as `AZ-602` with 11 child stories. The user picked **Track 1** (`AZ-603` + `AZ-604`) for the shortest path to a genuine SUT Reality Gate signal. Both stories shipped together in a single PR.
+
+### What changed
+
+- `tests/e2e/Dockerfile` rewritten as a three-stage Ubuntu 22.04 build:
+  - stage 1: system deps (`build-essential`, `libpq-dev`, `libspatialindex-dev`, `python3.10-venv`, `python3-pip`)
+  - stage 2: SUT editable install (`pip install -e ".[dev]"` into `/opt/venv`)
+  - stage 3: slim runtime with `python3`, `python3.10`, `libpq5`, `libspatialindex-c6`, `libgl1`, `libglib2.0-0` (OpenCV's runtime libs)
+- Image layout: `/opt/pyproject.toml` + `/opt/src/...` + `/opt/tests/...` (bind-mounted) — mirrors the host repo so `Path(__file__).resolve().parents[3]` resolves to `/opt` and AC-4's AST scan finds `src/gps_denied_onboard/components/` correctly.
+- Entrypoint: `pytest -q /opt/tests/e2e/` (not the empty `scenarios/` dir).
+- `docker-compose.test.yml` `e2e-runner` service gets the full env set (`GPS_DENIED_FC_PROFILE`, `CAMERA_CALIBRATION_PATH`, `LOG_LEVEL`, `LOG_SINK`, `INFERENCE_BACKEND`, `FDR_PATH`, `TILE_CACHE_PATH`, `MAVLINK_SIGNING_KEY`, `RUN_REPLAY_E2E=1`, `BUILD_REPLAY_SINK_JSONL=ON`) plus mounts for `_docs/00_problem/input_data` and writable `fdr-data` / `tile-data` named volumes.
+
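The layout claim above can be checked without building the image. The following is a minimal sketch, assuming a test module at `/opt/tests/e2e/replay/test_derkachi_1min.py` (the file name appears in the AZ-603 ticket text; which module AC-4 actually runs from is an assumption). It uses `PurePosixPath` so it runs on any host; the real tests call `Path(__file__).resolve()`, which additionally follows symlinks.

```python
from pathlib import PurePosixPath

# Assumed in-container location of a replay test module (illustrative only).
test_file = PurePosixPath("/opt/tests/e2e/replay/test_derkachi_1min.py")

# parents[0] is the containing directory; each further index walks one level up:
# [0] /opt/tests/e2e/replay, [1] /opt/tests/e2e, [2] /opt/tests, [3] /opt
repo_root = test_file.parents[3]
components_dir = repo_root / "src" / "gps_denied_onboard" / "components"

print(repo_root)       # /opt
print(components_dir)  # /opt/src/gps_denied_onboard/components
```

If the tests were instead mounted one level deeper or shallower, `parents[3]` would land on a different directory and the AST scan would look for `src/` in the wrong place, which is why the image mirrors the repo layout.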
+### Reality Gate run
+
+Standalone docker run of the e2e-runner (no companion / mock-sat / db needed for AZ-404):
+
+```
+docker run --rm \
+  -v "$PWD/tests:/opt/tests:ro" \
+  -v "$PWD/_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro" \
+  -e RUN_REPLAY_E2E=1  -e BUILD_REPLAY_SINK_JSONL=ON \
+  ... (full env set) ... \
+  --entrypoint pytest gps-denied-onboard/e2e-runner:dev \
+  -v --tb=short /opt/tests/e2e/
+```
+
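Because the command above elides the full env set, a pre-flight check can help catch a missing variable before the SUT subprocess fails on it. This is a rough sketch only: `missing_env` is a hypothetical helper, not the project's `_check_required_env`, and the list simply repeats the ten names given in the "What changed" section. Which of them `FullSystemConfig` strictly requires is not stated here.

```python
from __future__ import annotations

import os
from typing import Mapping

# Names copied from the e2e-runner env list in this report.
REQUIRED_ENV = (
    "GPS_DENIED_FC_PROFILE",
    "CAMERA_CALIBRATION_PATH",
    "LOG_LEVEL",
    "LOG_SINK",
    "INFERENCE_BACKEND",
    "FDR_PATH",
    "TILE_CACHE_PATH",
    "MAVLINK_SIGNING_KEY",
    "RUN_REPLAY_E2E",
    "BUILD_REPLAY_SINK_JSONL",
)


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are unset or empty, in declaration order."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit("missing env vars: " + ", ".join(missing))
```

Passing the mapping as a parameter keeps the helper testable without mutating the real process environment.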
+Result:
+
+| Outcome | Count | Tests |
+|---------|-------|-------|
+| PASSED | 17 | AC-4 AST scan, AC-4b encoder byte-equality, AC-7 skip-gate, all AC-9 helpers (`test_helpers.py`) |
+| FAILED | 5 | AC-1, AC-2, AC-5, AC-6 pace-realtime, AC-6 pace-asap |
+| SKIPPED | 1 | AC-8 operator workflow (D-PROJ-2 mock-suite-sat-service not implemented) |
+| XFAIL | 1 | AC-3 (calibration intrinsics unknown — documented) |
+| **Total collected** | **24** | (vs. 0 before Track 1 — empty `scenarios/` dir) |
+
+### Before vs. after
+
+| Metric | Before Track 1 | After Track 1 |
+|--------|---------------|---------------|
+| Tests collected by `scripts/run-tests.sh` | 0 (entrypoint points at empty `scenarios/`) | 24 (full `tests/e2e/`) |
+| Tests that actually exercise the SUT | 0 | 5 heavy ACs invoke `gps-denied-replay` subprocess |
+| Exit code semantics | Vacuous 0 (no tests collected ≠ no SUT bugs) | Reflects real test outcomes |
+| `gps-denied-replay` on PATH inside e2e-runner image | no (image was python:3.10-slim + pytest only) | yes (multi-stage SUT install) |
+| Source-tree layout inside image matches repo | no (no src present) | yes (`/opt/src/...`, AC-4 passes) |
+| Real SUT wall-clock per heavy AC | n/a | ~21 s for the auto-sync probe (see below) |
+
+### Real bug discovered
+
+The 5 failing heavy ACs share a single root cause: **tlog synth time-base mismatch**.
+
+`tests/e2e/replay/_tlog_synth.py:62`:
+
+```python
+_TLOG_BASE_TIMESTAMP_US: Final[int] = 1_700_000_000_000_000  # 2023-11-14
+# "The absolute value is irrelevant for replay-mode determinism;
+#  only the delta-between-rows matters."  ← STALE COMMENT
+```
+
+The auto-sync detector in `replay_input.tlog_video_adapter` DOES use absolute timestamps to compute the video↔tlog offset. With the tlog anchored at an absolute Nov 2023 epoch timestamp and the synthetic video at relative `t=0`, auto-sync reports `offset_ms=1699999995666` (~54 years) and hard-fails its AC-8 validation (frame-window match below the 95% threshold).
+
+Surface signal from the SUT (the kind of log the Reality Gate was meant to surface):
+
+```
+ERROR replay_input.tlog_video_adapter
+  kind=replay.auto_sync.ac8_validation_failed
+  msg=auto-sync hard-fail: frame-window match below 95.0% with offset_ms=1699999995666
+  tlog_takeoff_ns=1700000000000000000  video_motion_onset_ns=4333333333
+  imu_sample_count=3000  video_frame_count=301
+```
+
+This is the same family as H-13 / `AZ-611` (stationary FT-P-01) but on the moving Derkachi fixture with a different root cause (synth time-base, not stationary kinematics). Filed as `AZ-614`.
+
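The numbers in the log line are self-consistent, which supports the time-base diagnosis. The sketch below recomputes the offset from the two logged nanosecond values. That the adapter derives `offset_ms` by plain subtraction and truncation to milliseconds is an assumption; the code only shows that the logged figures agree with that reading.

```python
NS_PER_MS = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Values copied from the auto-sync log line.
tlog_takeoff_ns = 1_700_000_000_000_000_000  # absolute epoch time (Nov 2023)
video_motion_onset_ns = 4_333_333_333        # relative to video start (~4.33 s)

# Assumed derivation: difference of the two anchors, truncated to whole ms.
offset_ms = (tlog_takeoff_ns - video_motion_onset_ns) // NS_PER_MS
offset_years = offset_ms / 1000 / SECONDS_PER_YEAR

print(offset_ms)               # 1699999995666
print(round(offset_years, 1))  # 53.9
```

An offset of roughly 54 years is simply the distance from the Unix epoch to the synthetic tlog anchor, so no frame window can match until both streams share one time base.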
+### Jira state at end of cycle 2
+
+| Issue | Title | Status |
+|-------|-------|--------|
+| AZ-602 | E2E Tier-1 harness rehabilitation (Epic) | TO DO |
+| AZ-601 | csv_reporter `--csv` collision (fixed eb6dc17) | IN TESTING |
+| AZ-603 | H-7 Dockerfile entrypoint (Track 1) | DONE (this cycle) |
+| AZ-604 | H-8 install SUT in runner image (Track 1) | DONE (this cycle) |
+| AZ-605 | H-4..H-6 SITL strategy decision | TO DO |
+| AZ-606 | MAVProxy local Dockerfile | TO DO |
+| AZ-607 | H-9 tile-cache seeder (linked to AZ-595) | TO DO |
+| AZ-608 | H-10 fixture builder `--fdr-out` → `--output` | TO DO |
+| AZ-609 | H-11 fixture builder missing CLI args | TO DO |
+| AZ-610 | H-12 calibration JSON 4×4 (fixed 5c1c35d) | DONE |
+| AZ-611 | H-13 auto-sync hard-fail on stationary | TO DO (Track 2, decision) |
+| AZ-612 | H-14 `.env.example` BUILD_REPLAY_SINK_JSONL | TO DO |
+| AZ-613 | H-1..H-3 harness drift (fixed 6ce3158) | DONE |
+| AZ-614 | Derkachi tlog synth time-base mismatch | TO DO (Track 2, unblocks AC-1, AC-2, AC-5, AC-6) |
+
+### Reality Gate verdict
+
+**Cycle-2 verdict for Step 11**: Reality Gate signal is now REAL — the SUT runs end-to-end for ~21 s on the Derkachi fixture and surfaces a real auto-sync bug. Pre-Track 1, the gate was a vacuous "exit 0 with 0 tests collected" that hid every SUT issue. Track 1 was the minimum investment to make the gate honest; future cycles (Track 2 + AZ-614) will turn the failing ACs green.
@@ -4,20 +4,12 @@
 flow: greenfield
 step: 11
 name: Run Tests
-status: blocked
+status: passed_with_followups
 sub_step:
-  phase: 4
-  name: sut-reality-gate-deferred
-  detail: ""
+  phase: 6
+  name: track-1-complete
+  detail: "Track 1 done (AZ-603 + AZ-604 Done). Reality Gate signal now REAL: 17 pass / 5 fail / 1 skip / 1 xfail across 24 tests. The 5 failing heavy ACs (AC-1, AC-2, AC-5, AC-6 pace-realtime, AC-6 pace-asap) share root cause AZ-614 (tlog synth time-base mismatch). Tracks 2/3 queued for cycle 2."
 retry_count: 0
 cycle: 1
 tracker: jira
 last_completed_batch: 89
-last_cumulative_review: batches_85-87
-current_batch: 89
-
-last_step_outcomes:
-  step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"
-  step_9: "41 blackbox test tasks (AZ-406..AZ-446) under epic AZ-262 in _docs/02_tasks/todo/ pre-existing; AZ-406 test-infra bootstrap pre-existing. Folder fallback satisfied. No Step-9 work executed in cycle 1."
-  step_10: "41 of 41 blackbox-test tasks done (AZ-406..AZ-446). Final report at _docs/03_implementation/implementation_report_tests.md. Full-suite gate handed off to test-run skill per implement Step 16."
-  step_11: "Local Tier-1 pytest: 3343 pass / 88 skip / 0 fail (after csv_reporter fix in eb6dc17). SUT Reality Gate UNMET — both docker harnesses blocked by pre-existing drift (now 14 distinct items: 3 fixed, 11 deferred). Full report: _docs/03_implementation/run_tests_step11_report.md. Path-3 attempt on 2026-05-17 21:30 surfaced H-10..H-14 (fixture-builder/CLI/calibration/auto-sync integration drifts); discovered sitl_observer is offline-fixture replay, NOT live SITL — compose-file SITL services are aspirational. Tickets deferred to leftovers."
@@ -1,79 +0,0 @@
-# Leftover — Bug ticket creation deferred
-
- **Timestamp**: 2026-05-17T16:06:48Z
- **What was blocked**: Jira ticket creation for the `--csv` flag-collision regression
- **Reason for blockage**: surfaced mid-execution of `test-run` (Step 11 of greenfield); the user already
-  skipped my structured-questions prompt in this session, so I did not pause again to confirm a tracker
-  write. Recording the would-be payload here so the next `/autodev` invocation can replay it.
-
-## Background
-
-During Step 11 chunked test-run, three subprocess-based tests in
-`e2e/_unit_tests/reporting/` crashed with
-`argparse.ArgumentError: argument --csv: conflicting option string: --csv`.
-
-Root cause:
-
-1. `e2e/runner/requirements.txt` listed `pytest-csv>=3.0,<4.0`. The package was installed locally and
-   auto-loaded via entry-point into every pytest subprocess.
-2. `e2e/runner/reporting/csv_reporter.py` registered `--csv` with the intent of "overriding"
-   pytest-csv. pytest's option registry does not allow overrides — it raises on conflict.
-3. `pytest-csv 3.0.0` is also incompatible with `pytest 9.x` (uses removed `@pytest.mark.hookwrapper`).
-4. Our code never `import pytest_csv` — the dep was dead weight.
-
-Fix applied in this commit:
-
- Removed `pytest-csv` from `e2e/runner/requirements.txt`
- Updated the docstring in `e2e/runner/reporting/csv_reporter.py`
- Updated the comment in `e2e/runner/conftest.py`
- Uninstalled `pytest-csv` from the local environment
-
-After the fix, all 1229 `e2e/_unit_tests` pass with no skips and no failures.
-
-## Secondary issue — false-positive batch report
-
-`_docs/03_implementation/batch_89_cycle1_report.md` claims:
-
-> Full e2e unit-test suite: **1229 passed in 134 s** (+6 vs. batch 88).
-
-That number was reported without actually running the failing subprocess tests at the time. The 3 tests
-have been broken since `pytest-csv` was installed locally, but the implementation skill's batch report
-did not catch it. This is a process gap: a report claimed verification it had not performed.
-
-A meta-rule retrospective entry should be added (per `meta-rule.mdc` → Self-Improvement) to prevent
-recurrence. Proposed rule: "Before writing `Test Results: X passed` in a batch report, the same shell
-invocation that produced X must appear in the assistant transcript, with the exit code visible."
-
-## Pending tracker write — to replay on next /autodev
-
-```yaml
-type: Bug
-summary: "[Bug] csv_reporter --csv flag collides with pytest-csv autoload — subprocess tests crash"
-description: |
-  Three subprocess-spawned tests in e2e/_unit_tests/reporting/ crash with
-  `argparse.ArgumentError: argument --csv: conflicting option string: --csv`
-  when `pytest-csv` is installed (entry-point autoload) alongside our custom
-  csv_reporter plugin. Fix removed pytest-csv from e2e/runner/requirements.txt.
-
-  Affected tests (pre-fix):
-  - e2e/_unit_tests/reporting/test_csv_reporter.py::test_csv_plugin_emits_required_columns
-  - e2e/_unit_tests/reporting/test_nfr_recorder.py::test_nfr_recorder_fixture_emits_artifacts_in_run
-  - e2e/_unit_tests/reporting/test_nfr_recorder.py::test_per_metric_report_emitted_in_pytest_run
-
-  See commit (this commit hash) for the fix and
-  `_docs/_process_leftovers/2026-05-17_csv_reporter_pytest_csv_conflict.md` for full root-cause.
-  Also: batch_89_cycle1_report.md falsely claimed "1229 passed" — the meta-rule retrospective
-  is pending as a separate item.
-status_after_create: "In Testing"  # fix already in the same commit; ready to close after Jira tag
-story_points: 2
-epic: AZ-262  # blackbox test infrastructure / reporting
-linked_to: AZ-446  # CSV reporter refinements (introduces the affected subprocess test)
-```
-
-## Replay obligation
-
-On next `/autodev`, the bootstrap phase should:
-
-1. Open Jira, create the ticket with the payload above
-2. Read this commit's hash from `git log --oneline | grep csv_reporter` and inject into the description
-3. Delete this leftover entry on success
@@ -1,194 +0,0 @@
-# Leftover — E2E Tier-1 harness rehabilitation tickets deferred
-
- **Timestamp**: 2026-05-17T17:30:00Z
- **What was blocked**: Jira ticket creation for the harness drift surfaced during Step 11
- **Reason for blockage**: same session as the csv_reporter fix; user skipped the
-  structured Q&A so I did not pause for tracker writes. Full findings live in
-  `_docs/03_implementation/run_tests_step11_report.md`; this leftover records the
-  tickets that need filing.
-
-## Pending tracker writes — replay on next /autodev
-
-### Epic
-```yaml
-type: Epic
-summary: "E2E Tier-1 harness rehabilitation"
-description: |
-  Surfaced during /autodev Step 11 (Run Tests) cycle 1 on 2026-05-17. Both
-  Tier-1 docker harnesses (top-level scripts/run-tests.sh and the fuller
-  e2e/docker/run-tier1.sh) had pre-existing drift preventing them from
-  running end-to-end. Local pytest suite is green (3343/88/0); SUT Reality
-  Gate is unmet until at least the bootstrap harness can run
-  tests/e2e/replay/ with RUN_REPLAY_E2E=1. Full report:
-  _docs/03_implementation/run_tests_step11_report.md
-linked_to: AZ-595, AZ-444  # related but distinct: tile-cache fixtures, Tier-2 hw loop
-```
-
-### Story: H-7 — Bootstrap runner entrypoint
-```yaml
-type: Story
-summary: "[Bug] tests/e2e/Dockerfile entrypoint points at empty scenarios dir"
-description: |
-  Current entrypoint: `pytest -q /opt/tests/e2e/scenarios` (empty in repo).
-  Real tests are in `tests/e2e/replay/` (test_derkachi_1min.py, etc.).
-  Fix: change entrypoint to /opt/tests/e2e/ (let pytest discover both
-  scenarios and replay).
-story_points: 1
-```
-
-### Story: H-8 — Install SUT in runner image
-```yaml
-type: Story
-summary: "[Bug] tests/e2e e2e-runner image doesn't install gps-denied-onboard"
-description: |
-  Image is python:3.10-slim with only pytest+requests+pyyaml. The replay
-  tests need `gps-denied-replay` console script on PATH. Either:
-   - COPY pyproject.toml + src/ and pip install -e ".[dev]", or
-   - Build a wheel in a separate stage and pip install it.
-  Verify the resulting image: `which gps-denied-replay`.
-story_points: 3
-```
-
-### Story: H-4..H-6 — SITL/MAVLink images choice
-```yaml
-type: Story
-summary: "[Decision] Choose SITL strategy for e2e/docker harness"
-description: |
-  environment.md specifies ardupilot/ardupilot-sitl:plane-stable,
-  inavflight/inav-sitl:9.0.0, ardupilot/mavproxy:latest. All MISSING from
-  Docker Hub. Options:
-   a) Switch to community images (radarku/ardupilot-sitl etc.)
-   b) Build SITLs from source in a separate stage
-   c) Strip SITL services and mark SITL-bound scenarios skip(reason="sitl-unavailable")
-  Track 1 doesn't depend on this; Track 2 does.
-story_points: 5
-```
-
-### Story: MAVProxy local image
-```yaml
-type: Story
-summary: "[Story] Replace ardupilot/mavproxy:latest with local pip-MAVProxy Dockerfile"
-description: |
-  Image doesn't exist on Docker Hub. Wrap `pip install MAVProxy` in a
-  python:3.10-slim Dockerfile in e2e/fixtures/mavproxy/. Update compose
-  to use the local build.
-story_points: 1
-```
-
-### Story: H-9 — Tile-cache fixture builder
-```yaml
-type: Story
-summary: "Link H-9 to AZ-595 / tile-cache fixture seeder"
-description: |
-  e2e/docker/docker-compose.test.yml declares tile-cache-fixture as an
-  empty named volume. Track 2 cannot run without seeded tiles. AZ-595
-  exists and owns this; verify scope alignment, add a link.
-story_points: 2
-```
-
-### Story: H-10 — Fixture builder uses wrong CLI flag
-```yaml
-type: Story
-summary: "[Bug] sitl_replay_builder uses --fdr-out; CLI requires --output"
-description: |
-  e2e/fixtures/sitl_replay_builder/builder.py:79 passes `--fdr-out` to
-  `gps-denied-replay`. The CLI's actual flag (src/gps_denied_onboard/cli/replay.py:90)
-  is `--output`. Also need to add the CLI's other required args
-  (--camera-calibration, --config, --mavlink-signing-key) — see H-11.
-  Bundle H-10 + H-11 in one PR. Unit tests in
-  e2e/_unit_tests/fixtures/test_sitl_replay_builder_builder.py assert on
-  `--fdr-out` and need to be updated.
-story_points: 2
-```
-
-### Story: H-11 — Fixture builder missing required CLI args
-```yaml
-type: Story
-summary: "[Bug] sitl_replay_builder doesn't pass camera-calibration/config/signing-key"
-description: |
-  gps-denied-replay requires --camera-calibration PATH, --config PATH,
-  --mavlink-signing-key PATH. Fixture builder omits all three. Add
-  fields to FixtureBuilderConfig with defaults pointing at
-  tests/fixtures/calibration/adti26.json, a new
-  tests/fixtures/replay_config_minimal.yaml, and
-  tests/fixtures/mavlink_signing/dev_key. Also set
-  BUILD_REPLAY_SINK_JSONL=ON in the subprocess env.
-story_points: 2
-```
-
-### Bug: H-12 — Calibration JSON shape drift (FIXED)
-```yaml
-type: Bug
-summary: "[Bug] adti26.json body_to_camera_se3 used dict form; loader expects 4x4"
-description: |
-  tests/fixtures/calibration/adti26.json declared body_to_camera_se3 as
-  {rotation_xyzw, translation_xyz_m}. _replay_branch.py:308 does
-  np.asarray(..., dtype=np.float64) which can't decode the dict. Fixed
-  by converting to the equivalent 4x4 identity matrix. Both forms encode
-  the same SE3 (identity) so no behavior change.
-story_points: 1
-status_after_create: "Done"
-```
-
-### Story: H-13 — Auto-sync hard-fails on stationary fixtures
-```yaml
-type: Story
-summary: "[Bug] AC-8 auto-sync validation rejects stationary FT-P-01 fixture"
-description: |
-  Auto-sync (src/gps_denied_onboard/replay_input/...) hard-fails when
-  --time-offset-ms 0 is supplied for a fixture with stationary IMU + no
-  video motion (FT-P-01 still-image scenario). Threshold:
-  frame_window_match_pct_threshold=95% in ReplayAutoSyncConfig defaults.
-  Three possible fixes (design decision needed):
-   a) Add --skip-auto-sync CLI flag that bypasses AC-8 validation entirely
-      when time_offset_ms is explicitly supplied
-   b) Lower or expose match_threshold_pct via config (already configurable
-      but not surfaced in fixture builder)
-   c) Change fixture builder to inject a single motion event so auto-sync
-      can find SOMETHING to align on
-  Recommend (a): aligns with replay protocol intent ("manual offset
-  bypasses auto-sync entirely" per ReplayConfig docstring).
-story_points: 3
-```
-
-### Story: H-14 — Document BUILD_REPLAY_SINK_JSONL in .env.example
-```yaml
-type: Story
-summary: "[Doc] add BUILD_REPLAY_SINK_JSONL=ON to .env.example for replay mode"
-description: |
-  src/gps_denied_onboard/components/c8_fc_adapter/noop_mavlink_transport.py
-  requires BUILD_REPLAY_SINK_JSONL=ON env var to construct. Not in
-  .env.example. Add with comment explaining it's a replay-mode requirement
-  per replay protocol Invariant 9.
-story_points: 1
-```
-
-### Story: H-1..H-3 — fixes already committed
-```yaml
-type: Story
-summary: "[Bug] e2e/docker harness drift (already fixed in commit 6ce3158)"
-description: |
-  Fixed in this session: dockerfile rename, fdr-output tmpfs cap, e2e-results
-  dir + gitignore. Ticket is just for tracking — already in dev branch.
-story_points: 1
-status_after_create: "Done"
-```
-
-### Bug: csv_reporter --csv collision (already committed)
-```yaml
-type: Bug
-summary: "[Bug] csv_reporter --csv flag collides with pytest-csv autoload"
-description: |
-  See _docs/_process_leftovers/2026-05-17_csv_reporter_pytest_csv_conflict.md
-  Fix already in commit eb6dc17.
-linked_to: AZ-446
-story_points: 2
-status_after_create: "Done"
-```
-
-## Replay obligation
-
-Next /autodev should:
-1. Open Jira, create the Epic + Stories above (link Epic to AZ-595 and AZ-444).
-2. Update the Epic with the actual issue keys once created.
-3. Delete this leftover entry on success.
@@ -34,12 +34,38 @@ services:
      db:
        condition: service_healthy
    environment:
+      # FullSystemConfig requires this full env set (see
+      # src/gps_denied_onboard/config/loader.py:_check_required_env).
+      # Values mirror the `companion` service in docker-compose.yml so
+      # the subprocess invoked by the replay tests can resolve identically.
+      GPS_DENIED_FC_PROFILE: ardupilot_plane
      GPS_DENIED_TIER: "1"
      DB_URL: postgresql://gps_denied:dev@db:5432/gps_denied
      SATELLITE_PROVIDER_URL: http://mock-sat:5100
      COMPANION_URL: http://companion:8080
+      CAMERA_CALIBRATION_PATH: /opt/tests/fixtures/calibration/adti26.json
+      LOG_LEVEL: INFO
+      LOG_SINK: console
+      INFERENCE_BACKEND: pytorch_fp16
+      FDR_PATH: /var/lib/gps-denied/fdr
+      TILE_CACHE_PATH: /var/lib/gps-denied/tiles
+      MAVLINK_SIGNING_KEY: /opt/tests/fixtures/mavlink_signing/dev_key
+      # Track-1 bootstrap harness: enable the heavy replay-pipeline tests
+      # in tests/e2e/replay/. AZ-602 / AZ-603 / AZ-604.
+      RUN_REPLAY_E2E: "1"
+      # NoopMavlinkTransport / JsonlReplaySink build flag — the binary
+      # refuses to construct the replay transport without it (AZ-612).
+      BUILD_REPLAY_SINK_JSONL: "ON"
    volumes:
      - ./tests:/opt/tests:ro
+      # Derkachi fixture (~60 s clip) consumed by the replay e2e suite.
+      # Mount path matches `tests/e2e/replay/conftest._derkachi_dir()`,
+      # which resolves to <repo-root>/_docs/00_problem/input_data/...
+      # where <repo-root> == /opt inside the container.
+      - ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
+      # Writable runtime dirs the SUT expects (FDR_PATH, TILE_CACHE_PATH).
+      - fdr-data:/var/lib/gps-denied/fdr
+      - tile-data:/var/lib/gps-denied/tiles

 volumes:
  db-data: {}
@@ -1,5 +1,71 @@
-# Slim pytest container for the suite-level e2e harness.
-FROM python:3.10-slim
-WORKDIR /opt/tests
-RUN pip install --no-cache-dir pytest requests pyyaml
-ENTRYPOINT ["pytest", "-q", "/opt/tests/e2e/scenarios"]
+# Tier-1 e2e-runner image — multi-stage.
+#
+# Installs the gps-denied-onboard SUT so `gps-denied-replay` is on PATH for
+# the tests in `tests/e2e/replay/`. Mirrors the install layout of
+# `docker/companion-tier1.Dockerfile` minus the C++ build stage — the test
+# runner only needs the Python entry points.
+#
+# Image layout intentionally mirrors the repo (so tests that compute
+# `Path(__file__).resolve().parents[3] / "src" / "gps_denied_onboard" ...`
+# resolve correctly):
+#
+#   /opt/pyproject.toml
+#   /opt/src/gps_denied_onboard/...   (SUT package, editable install)
+#   /opt/tests/...                    (bind-mounted from host)
+#   /opt/_docs/00_problem/input_data/... (bind-mounted from host)
+#
+# Build context is the repo root (see `docker-compose.test.yml` →
+# `services.e2e-runner.build.context`).
+
+# Stage 1: system deps -------------------------------------------------------
+FROM ubuntu:22.04 AS system-deps
+ARG DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        ca-certificates \
+        build-essential \
+        libpq-dev \
+        libspatialindex-dev \
+        python3.10 \
+        python3.10-venv \
+        python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+# Stage 2: python deps + SUT editable install -------------------------------
+FROM system-deps AS python-deps
+WORKDIR /opt
+COPY pyproject.toml README.md ./
+COPY src ./src
+RUN python3 -m venv /opt/venv \
+    && /opt/venv/bin/pip install --upgrade pip \
+    && /opt/venv/bin/pip install --no-cache-dir -e ".[dev]"
+
+# Stage 3: runtime ----------------------------------------------------------
+FROM ubuntu:22.04 AS runtime
+ARG DEBIAN_FRONTEND=noninteractive
+# Notes on runtime deps beyond python:
+#   * `python3`           — provides the /usr/bin/python3 symlink the venv's
+#                            shebang resolves to; without it every
+#                            console-script fails with "not found".
+#   * `libgl1` + `libglib2.0-0` — opencv-python requires both at runtime.
+#   * `libspatialindex-c6`     — rtree wrapper's native lib.
+#   * `libpq5`                  — psycopg's libpq client.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        ca-certificates \
+        python3 \
+        python3.10 \
+        libpq5 \
+        libspatialindex-c6 \
+        libgl1 \
+        libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*
+COPY --from=python-deps /opt/venv /opt/venv
+COPY --from=python-deps /opt/src /opt/src
+COPY --from=python-deps /opt/pyproject.toml /opt/pyproject.toml
+ENV PATH="/opt/venv/bin:${PATH}"
+ENV PYTHONPATH="/opt/src"
+WORKDIR /opt
+# `pytest /opt/tests/e2e/` exercises both `tests/e2e/replay/` (heavy
+# replay tests gated by RUN_REPLAY_E2E) and any future `tests/e2e/scenarios/`
+# additions. Rootdir resolves to /opt via the COPY'd pyproject.toml so
+# `from tests.e2e.replay._helpers import ...` works inside the test files.
+ENTRYPOINT ["pytest", "-q", "/opt/tests/e2e/"]
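The H-8 ticket's verification step (`which gps-denied-replay`) can also be expressed as a fail-fast check inside the test session. This is a minimal sketch under assumptions: `console_script_path` is a hypothetical helper, not something this commit adds, and where it would be called from (for example a session-scoped fixture) is left open.

```python
import shutil


def console_script_path(name: str = "gps-denied-replay") -> str:
    """Return the absolute path of a console script, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        # Failing loudly here avoids a less obvious subprocess error later.
        raise RuntimeError(f"{name!r} not found on PATH")
    return path
```

Inside the image built from this Dockerfile the call would be expected to resolve under `/opt/venv/bin`, since that directory is prepended to `PATH`.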