[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root, fully separated from src/gps_denied_onboard/** and from the in-process tests/** tree, and commits to the contracts every subsequent test ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL + mock Suite Sat Service + mavproxy listener + e2e-runner onto one e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 / NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host while the SUT runs natively on the Jetson under systemd.

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats / jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx, orjson, pydantic, structlog, pytest 8.x). The runner deliberately does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only, chamber_only, vins_mono, deferred_ac) tied to environment.md parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per test with the exact 11-column schema from environment.md § Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths). XFAIL surfaced only when a test carries @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture that copies per-test artifacts (.tlog, FDR archives, screenshots, tegrastats / jtop CSVs) into the run bundle and records relative paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships today (no downstream task dep): WGS84 distance / forward-bearing / offset via pyproj with NaN rejection.

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up), GET /tiles/audit + /mock/audit (per-run read-back), POST /mock/config (force-status, response delay), POST /mock/reset (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector, injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,security,resource_limit}/):
- Mirror of the test-spec category grouping in _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke run inside the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle, conftest skip rules, helper modules, parsers, mock app, compose YAML structural contract, public-boundary enforcement) without Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot) → compose YAML test + directory layout + smoke test (Docker-bound)
- AC-2 (mock services) → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check deferred to AZ-416 / AZ-417
- AC-4 (CSV columns) → in-process plugin lifecycle test emits the exact 11-column schema
- AC-5 (egress isolation) → static config test + runtime probe in Docker-bound smoke
- AC-6 (Tier-2 contract) → tegrastats + jtop parser unit tests + jetson/* layout test; full Tier-2 contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix) → vins_mono skip-rule cases + tests/positive/test_smoke
- AC-9 (skip semantics) → 9 conftest skip-rule unit tests

Module layout entry for blackbox_tests was added in 2026-05-16 preparatory commit d7a17a8 so this diff stays focused on the harness scaffold.

AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 11:31:13 +00:00 · 2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,53 @@
+# e2e-runner image — drives the SUT through public boundaries only.
+#
+# CRITICAL: this image MUST NOT install the SUT package and MUST NOT have
+# `src/gps_denied_onboard/` on its PYTHONPATH. The pytest tree it runs lives
+# at `/test-suite` (bind-mounted) and imports only from the `runner.*` package
+# baked into this image (see PYTHONPATH below), never from the SUT.
+#
+# Image size target: ≤ 2 GB (AZ-406 Risk 1 mitigation). The heavy ML stack
+# (tensorrt, gtsam, faiss, cuda) lives in the SUT image, not here.
+
+FROM python:3.12-slim-bookworm AS base
+
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    PIP_DISABLE_PIP_VERSION_CHECK=1
+
+# --- system deps for OpenCV runtime + libffi (msp_gps_toy linkage) + libssl + tini ---
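+# The non-headless opencv-python wheel loads libGL.so.1 (libgl1) and the GLib
+# libraries (libglib2.0-0) when `cv2` is imported; its image codecs are bundled
+# in the wheel. tini is a small init that reaps zombie children when pytest
+# forks (`--forked`).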
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        libgl1 \
+        libglib2.0-0 \
+        libffi8 \
+        libssl3 \
+        tini \
+        ca-certificates \
+        curl \
+        netcat-openbsd \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /opt/e2e-runner
+
+COPY requirements.txt /opt/e2e-runner/requirements.txt
+RUN pip install --no-cache-dir -r /opt/e2e-runner/requirements.txt
+
+# Runner package — conftest, helpers, reporting plugins. Copied AFTER pip
+# install so source-only changes don't bust the heavy layer cache.
+COPY __init__.py /opt/e2e-runner/runner/__init__.py
+COPY conftest.py /opt/e2e-runner/runner/conftest.py
+COPY pytest.ini /opt/e2e-runner/pytest.ini
+COPY reporting /opt/e2e-runner/runner/reporting
+COPY helpers /opt/e2e-runner/runner/helpers
+
+ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner
+
+# `/test-suite` is bind-mounted by docker-compose (../tests). The runner's
+# default cwd is its own root; the docker-compose `command:` replaces the
+# default CMD below with an explicit `pytest /test-suite ...` invocation,
+# while the tini ENTRYPOINT stays in place.
+WORKDIR /opt/e2e-runner
+
+ENTRYPOINT ["/usr/bin/tini", "--"]
+CMD ["pytest", "/test-suite"]
@@ -0,0 +1,10 @@
+"""e2e-runner package.
+
+Top-level package for the blackbox harness — owns the conftest, the CSV
+reporter plugin, the evidence bundler, and the boundary-driving helpers
+(`frame_source_replay`, `imu_replay`, `sitl_observer`, `mavproxy_tlog_reader`,
+`fdr_reader`, `geo`).
+
+IMPORTANT: nothing under this package may import from `gps_denied_onboard.*`.
+The harness interacts with the SUT only via public boundaries.
+"""
@@ -0,0 +1,214 @@
+"""Top-level pytest conftest for the blackbox e2e harness.
+
+Responsibilities:
+    1. Session-level parameterization over ``(fc_adapter, vio_strategy)``.
+    2. Skip-rule enforcement per the traceability matrix
+       (`_docs/02_document/tests/traceability-matrix.md`):
+         - AC-7.1, AC-7.2 → SKIP (deferred — no AI-camera fixture)
+         - RESTRICT-CAM-2 → SKIP (paired with AC-7.x)
+         - AC-NEW-5 chamber portion → SKIP unless --enable-chamber
+         - RESTRICT-HW-2 chamber portion → SKIP unless --enable-chamber
+         - Tier-2-only tests → SKIP on tier1-docker
+         - `vins_mono` parametrization → SKIP on production-build sessions
+    3. Session fixtures shared by per-scenario tests (`run_id`, `tier`,
+       `evidence_dir`, `mock_suite_sat_url`) and registration of the
+       reporting plugins.
+
+The boundary-driving fixtures (`sitl_observer`, `mavproxy_tlog`,
+`fdr_reader`, `mock_suite_sat_client`) are not defined in this scaffold;
+their helper modules live under ``runner.helpers.*``.
+"""
+
+from __future__ import annotations
+
+import os
+from collections.abc import Iterator
+from pathlib import Path
+
+import pytest
+
+
+# ---------------------------------------------------------------------------
+# Command-line options
+# ---------------------------------------------------------------------------
+
+
+def pytest_addoption(parser: pytest.Parser) -> None:
+    """Harness-level options (not exposed to individual tests)."""
+    group = parser.getgroup("e2e-runner", "Blackbox e2e harness options")
+    group.addoption(
+        "--enable-chamber",
+        action="store_true",
+        default=False,
+        help="Enable thermal-chamber-gated tests (AC-NEW-5 hot-soak, RESTRICT-HW-2). "
+        "Requires the chamber-attached Jetson runner; default off.",
+    )
+    group.addoption(
+        "--build-kind",
+        action="store",
+        default=os.environ.get("BUILD_KIND", "production"),
+        choices=("production", "research"),
+        help="Selects which VIO strategies are valid: production excludes vins_mono.",
+    )
+    group.addoption(
+        "--evidence-out",
+        action="store",
+        default=os.environ.get("EVIDENCE_OUT", "/e2e-results/evidence"),
+        help="Directory the evidence bundler writes per-run artifacts to.",
+    )
+    group.addoption(
+        "--allow-no-skip-reason",
+        action="store_true",
+        default=False,
+        help="Allow @pytest.mark.deferred_ac without an explicit reason= kwarg. "
+        "Default off — every deferred AC must cite its traceability-matrix row.",
+    )
+
+
+# ---------------------------------------------------------------------------
+# Parameterization matrix
+# ---------------------------------------------------------------------------
+
+_FC_ADAPTERS = ("ardupilot", "inav")
+_VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
+
+
+def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
+    """Parametrize tests that request the ``fc_adapter`` / ``vio_strategy`` fixtures.
+
+    Tests opt in by listing the fixture name in their signature. Tests that
+    explicitly do not depend on the matrix simply do not request the fixture.
+    """
+    if "fc_adapter" in metafunc.fixturenames:
+        env_default = os.environ.get("FC_ADAPTER")
+        if env_default:
+            metafunc.parametrize("fc_adapter", [env_default], ids=[env_default])
+        else:
+            metafunc.parametrize("fc_adapter", _FC_ADAPTERS, ids=_FC_ADAPTERS)
+    if "vio_strategy" in metafunc.fixturenames:
+        env_default = os.environ.get("VIO_STRATEGY")
+        if env_default:
+            metafunc.parametrize("vio_strategy", [env_default], ids=[env_default])
+        else:
+            metafunc.parametrize("vio_strategy", _VIO_STRATEGIES, ids=_VIO_STRATEGIES)
+
+
+# ---------------------------------------------------------------------------
+# Skip-rule enforcement (deterministic; runs at collection time)
+# ---------------------------------------------------------------------------
+
+
+def pytest_collection_modifyitems(
+    config: pytest.Config, items: list[pytest.Item]
+) -> None:
+    """Apply traceability-matrix-driven skips before any test executes.
+
+    The mapping between AC / RESTRICT IDs and the SKIP reason strings is the
+    one declared in `_docs/02_document/tests/traceability-matrix.md` §
+    Uncovered Items Analysis. Any change to that matrix MUST be mirrored
+    here (and vice-versa) — the unit tests in
+    `e2e/_unit_tests/test_traceability_skip_rules.py` catch drift.
+    """
+    tier = os.environ.get("TIER", "tier1-docker")
+    chamber_enabled = config.getoption("--enable-chamber")
+    build_kind = config.getoption("--build-kind")
+
+    skip_tier2 = pytest.mark.skip(reason="Tier-2 only — Jetson hardware required")
+    skip_chamber = pytest.mark.skip(
+        reason="Chamber-gated — run with --enable-chamber on the chamber-attached Jetson runner"
+    )
+    skip_research = pytest.mark.skip(
+        reason="vins_mono is research-build-only per D-C1-1-SUB-A"
+    )
+
+    for item in items:
+        # ----- Tier-2 only -----
+        if "tier2_only" in item.keywords and tier != "tier2-jetson":
+            item.add_marker(skip_tier2)
+            continue
+
+        # ----- Chamber only -----
+        if "chamber_only" in item.keywords and not chamber_enabled:
+            item.add_marker(skip_chamber)
+            continue
+
+        # ----- Research-build vs production matrix -----
+        # Skip vins_mono on production-build runs (the marker is set on the
+        # parametrize id, not the test fn — we check the param id).
+        if build_kind == "production":
+            call_params = getattr(item, "callspec", None)
+            if call_params is not None and call_params.params.get("vio_strategy") == "vins_mono":
+                item.add_marker(skip_research)
+                continue
+
+        # ----- Deferred-AC traceability-matrix skips -----
+        deferred = item.get_closest_marker("deferred_ac")
+        if deferred is not None:
+            reason = deferred.kwargs.get("reason")
+            if reason is None and not config.getoption("--allow-no-skip-reason"):
+                # Every deferred_ac MUST cite its matrix row to prevent silent
+                # coverage erosion. This is a skip with a pointed reason, not a
+                # hard collection failure, so the rest of the run proceeds.
+                item.add_marker(
+                    pytest.mark.skip(
+                        reason=(
+                            "deferred_ac marker without reason= kwarg; cite the "
+                            "traceability-matrix row that justifies the deferral, "
+                            "or run with --allow-no-skip-reason for local debugging."
+                        )
+                    )
+                )
+                continue
+            verdict = deferred.kwargs.get("verdict", "skip").lower()
+            if verdict == "xfail":
+                item.add_marker(pytest.mark.xfail(reason=reason or "deferred AC (xfail)", strict=False))
+            else:
+                item.add_marker(
+                    pytest.mark.skip(
+                        reason=(
+                            reason
+                            or "deferred AC — see _docs/02_document/tests/traceability-matrix.md"
+                        )
+                    )
+                )
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(scope="session")
+def run_id() -> str:
+    return os.environ.get("RUN_ID", "local")
+
+
+@pytest.fixture(scope="session")
+def tier() -> str:
+    return os.environ.get("TIER", "tier1-docker")
+
+
+@pytest.fixture(scope="session")
+def evidence_dir(pytestconfig: pytest.Config, run_id: str) -> Path:
+    base = Path(pytestconfig.getoption("--evidence-out"))
+    target = base if base.name == "evidence" else base / "evidence"
+    target.mkdir(parents=True, exist_ok=True)
+    return target
+
+
+@pytest.fixture(scope="session")
+def mock_suite_sat_url() -> str:
+    return os.environ.get("MOCK_SUITE_SAT_URL", "http://mock-suite-sat-service:8080")
+
+
+# ---------------------------------------------------------------------------
+# Plugin registration
+# ---------------------------------------------------------------------------
+
+# The CSV reporter plugin is a separate module so the unit tests can exercise
+# it directly without going through a real pytest run. It is registered via
+# `pytest_plugins` so docker-compose's `--csv=...` flag binds to our column
+# set rather than the upstream pytest-csv default.
+pytest_plugins = [
+    "runner.reporting.csv_reporter",
+    "runner.reporting.evidence_bundler",
+]
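The rule order in `pytest_collection_modifyitems` above (tier, then chamber, then the production / `vins_mono` rule) can be restated as a pure function, which makes the precedence easier to reason about. This is only a sketch: `skip_reason` and its parameters are hypothetical names that do not exist in the harness, the reason strings are shortened, and the `deferred_ac` branch is left out.

```python
from __future__ import annotations


def skip_reason(
    markers: set[str],
    *,
    tier: str = "tier1-docker",
    chamber_enabled: bool = False,
    build_kind: str = "production",
    vio_strategy: str | None = None,
) -> str | None:
    """Return the first matching skip reason, or None if the test should run.

    Hypothetical helper mirroring the rule order of the conftest:
    tier2_only first, then chamber_only, then production + vins_mono.
    """
    if "tier2_only" in markers and tier != "tier2-jetson":
        return "Tier-2 only"
    if "chamber_only" in markers and not chamber_enabled:
        return "Chamber-gated"
    if build_kind == "production" and vio_strategy == "vins_mono":
        return "vins_mono is research-build-only"
    return None
```

A test carrying both `tier2_only` and `chamber_only` on a Tier-1 run is reported with the Tier-2 reason, because the first matching rule wins.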
@@ -0,0 +1,13 @@
+"""Public-boundary helper modules used by every blackbox test.
+
+Modules:
+    frame_source_replay   — replay images/video to the SUT's V4L2 file source
+    imu_replay            — replay `data_imu.csv` at 10 Hz to the FC inbound
+    sitl_observer         — AP/iNav read-side observers (param reads, GPS_RAW_INT, MSP queries)
+    mavproxy_tlog_reader  — parse `.tlog` files emitted by `mavproxy-listener`
+    fdr_reader            — post-run filesystem read of the FDR archive
+    geo                   — WGS84 geodesic helpers (distance, bearing, offset via pyproj)
+
+These modules MUST NOT import from `gps_denied_onboard.*`. Public-boundary
+discipline is enforced by `e2e/_unit_tests/test_no_sut_imports.py`.
+"""
@@ -0,0 +1,59 @@
+"""Post-run filesystem read of the FDR archive.
+
+The FDR archive is a line-delimited JSON record stream per AZ-272 / AZ-273.
+Each line is an `FdrRecord` envelope (producer_id, type, monotonic_ms,
+payload). The runner image must NEVER import the SUT's FdrRecord schema
+directly — it parses the JSON bytes and validates against a duplicate
+record-type allowlist baked into this module.
+
+Public surface only; concrete parser + assertion helpers are owned by
+AZ-441 (NFT-LIM-02 — FDR size budget) and the resilience scenario tasks
+that need to crawl the archive (AZ-432, AZ-433, AZ-435).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Iterator
+
+
+@dataclass(frozen=True)
+class FdrRecord:
+    """Mirror of `gps_denied_onboard.fdr_client.records.FdrRecord` — public-boundary copy.
+
+    The schema is duplicated intentionally; if the SUT's FDR schema evolves
+    in a breaking way, this duplicate file fails to parse (visible drift)
+    rather than silently following along.
+    """
+
+    producer_id: str
+    monotonic_ms: int
+    record_type: str
+    payload: dict[str, object]
+
+
+def iter_records(fdr_archive_root: Path) -> Iterator[FdrRecord]:
+    """Iterate every FDR record in the archive root (ordered by monotonic_ms).
+
+    Raises NotImplementedError until AZ-441 supplies the orjson-backed parser.
+    """
+    raise NotImplementedError(
+        "fdr_reader.iter_records is owned by AZ-441 — AZ-406 supplies only "
+        "the public surface."
+    )
+
+
+def archive_size_bytes(fdr_archive_root: Path) -> int:
+    """Sum the size of every file under ``fdr_archive_root``.
+
+    Concrete implementation here — it's a thin os.walk + stat loop that
+    NFT-LIM-02 needs as soon as a real archive lands.
+    """
+    if not fdr_archive_root.exists():
+        return 0
+    total = 0
+    for p in fdr_archive_root.rglob("*"):
+        if p.is_file():
+            total += p.stat().st_size
+    return total
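The module docstring describes the archive as line-delimited JSON with an envelope of `producer_id`, `type`, `monotonic_ms` and `payload`. A minimal sketch of what a parser for that envelope might look like follows. It uses the standard-library `json` module rather than the orjson-backed parser AZ-441 is expected to supply, it takes an iterable of lines because the on-disk file layout is not specified here, and `parse_fdr_lines` is a hypothetical name.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FdrRecord:
    producer_id: str
    monotonic_ms: int
    record_type: str
    payload: dict[str, object]


def parse_fdr_lines(lines: Iterable[str]) -> Iterator[FdrRecord]:
    """Parse JSON-lines envelopes and yield them ordered by monotonic_ms.

    Assumption: envelope keys are producer_id, type, monotonic_ms, payload.
    A missing key raises KeyError, so schema drift is visible instead of
    being silently tolerated.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        raw = json.loads(line)
        records.append(
            FdrRecord(
                producer_id=str(raw["producer_id"]),
                monotonic_ms=int(raw["monotonic_ms"]),
                record_type=str(raw["type"]),
                payload=dict(raw["payload"]),
            )
        )
    yield from sorted(records, key=lambda r: r.monotonic_ms)
```

Sorting in memory keeps the sketch short; a real archive crawler would likely need a streaming merge if archives approach the NFT-LIM-02 size budget.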
@@ -0,0 +1,77 @@
+"""Replay images / video to the SUT's V4L2 file frame source.
+
+Two replay modes:
+    1. Image-set replay (FT-P-01, FT-P-05) — emit a sequence of JPEG / PNG
+       still images at a configurable rate to the file frame source path the
+       SUT polls.
+    2. Video replay (FT-P-02, FT-P-04, FT-N-01..04, NFT-PERF-*) — decode an
+       MP4 with OpenCV and emit frames at the encoded FPS (or a user-supplied
+       rate for fast-forward).
+
+The actual frame-source path inside the SUT container is configured via the
+``ONBOARD_FRAME_SOURCE_PATH`` environment variable on the SUT — the runner
+writes to a shared tmpfs volume mounted at the same path inside both
+containers.
+
+This file currently provides the public surface used by per-scenario tests;
+concrete implementations land alongside their consuming test tasks
+(AZ-407 onward). The intent is that `FrameSourceReplayer` is a stable API
+the test specs can rely on while the underlying replay strategy is filled
+in incrementally.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Protocol
+
+
+@dataclass(frozen=True)
+class ReplayCadence:
+    """Frame-rate / pace configuration for a replay session."""
+
+    fps: float = 10.0
+    realtime: bool = True
+
+
+class FrameSink(Protocol):
+    """Abstract destination for replayed frames (file path or memory queue)."""
+
+    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None:
+        ...
+
+
+class FrameSourceReplayer:
+    """Public surface for replaying frames into the SUT's frame-source path.
+
+    AZ-407 (Static fixture builders) supplies the concrete still-image replay
+    implementation; AZ-408 (Runtime synthetic-injection) supplies the video
+    + injector variants. AZ-406 only commits to the contract.
+    """
+
+    def __init__(self, sink: FrameSink, cadence: ReplayCadence | None = None) -> None:
+        self._sink = sink
+        self._cadence = cadence or ReplayCadence()
+
+    def replay_image_directory(self, directory: Path) -> int:
+        """Replay every image in ``directory`` (sorted by name). Returns count emitted.
+
+        Raises NotImplementedError until AZ-407 lands. Tests that need this
+        path should mark themselves @pytest.mark.skip(reason="awaiting AZ-407")
+        until then; AC-1 (smoke) does not depend on this surface.
+        """
+        raise NotImplementedError(
+            "FrameSourceReplayer.replay_image_directory is owned by AZ-407 — "
+            "AZ-406 supplies only the public surface."
+        )
+
+    def replay_video(self, video_path: Path) -> int:
+        """Replay an MP4 / .h264 file frame-by-frame. Returns count emitted.
+
+        Raises NotImplementedError until AZ-408 lands.
+        """
+        raise NotImplementedError(
+            "FrameSourceReplayer.replay_video is owned by AZ-408 — "
+            "AZ-406 supplies only the public surface."
+        )
@@ -0,0 +1,54 @@
+"""WGS84 geodesic helpers: geodesic distance + bearing for accuracy assertions.
+
+Wraps `pyproj.Geod` (WGS84 ellipsoid) for the few operations the blackbox
+tests need. Kept deliberately small — broader geo math (UTM, MGRS, datum
+conversions) is NOT in scope for the e2e harness.
+
+All inputs are degrees lat / lon (WGS84); all distances are meters.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from pyproj import Geod
+
+_WGS84 = Geod(ellps="WGS84")
+
+
+@dataclass(frozen=True)
+class GeodeticDelta:
+    """Bearing + distance + back-bearing between two WGS84 points."""
+
+    distance_m: float
+    forward_bearing_deg: float
+    reverse_bearing_deg: float
+
+
+def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
+    """Geodesic distance in meters between two WGS84 points (via `pyproj.Geod.inv`).
+
+    Raises ValueError on NaN inputs (defensive — silent NaN propagation in
+    a test assertion is the kind of bug this helper exists to prevent).
+    """
+    for name, value in (("lat1", lat1), ("lon1", lon1), ("lat2", lat2), ("lon2", lon2)):
+        if value != value:  # NaN check
+            raise ValueError(f"distance_m: {name} is NaN")
+    _, _, d = _WGS84.inv(lon1, lat1, lon2, lat2)
+    return float(d)
+
+
+def delta(lat1: float, lon1: float, lat2: float, lon2: float) -> GeodeticDelta:
+    """Full geodetic delta: distance + forward/reverse bearings."""
+    fwd_az, rev_az, d = _WGS84.inv(lon1, lat1, lon2, lat2)
+    return GeodeticDelta(
+        distance_m=float(d),
+        forward_bearing_deg=float(fwd_az),
+        reverse_bearing_deg=float(rev_az),
+    )
+
+
+def offset(lat: float, lon: float, bearing_deg: float, distance_m: float) -> tuple[float, float]:
+    """Project ``(lat, lon)`` by ``distance_m`` along ``bearing_deg`` (degrees CW from north)."""
+    new_lon, new_lat, _ = _WGS84.fwd(lon, lat, bearing_deg, distance_m)
+    return float(new_lat), float(new_lon)
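In the file above only `distance_m` guards against NaN; `delta` and `offset` pass their inputs straight to pyproj. One way to share the guard across all three would be a small decorator. The sketch below is standard-library only and demonstrates the idea on a toy function; `reject_nan` and `midpoint_lat` are hypothetical names, not part of `helpers/geo.py`.

```python
from __future__ import annotations

import functools
import math
from typing import Callable, TypeVar

T = TypeVar("T")


def reject_nan(func: Callable[..., T]) -> Callable[..., T]:
    """Raise ValueError if any float argument (positional or keyword) is NaN."""

    @functools.wraps(func)
    def wrapper(*args: object, **kwargs: object) -> T:
        # Positional arguments are reported by index, keyword arguments by name.
        named = list(enumerate(args)) + list(kwargs.items())
        for name, value in named:
            if isinstance(value, float) and math.isnan(value):
                raise ValueError(f"{func.__name__}: argument {name!r} is NaN")
        return func(*args, **kwargs)

    return wrapper


@reject_nan
def midpoint_lat(lat1: float, lat2: float) -> float:
    """Toy function standing in for a geodesic helper."""
    return (lat1 + lat2) / 2.0
```

Applied to the real helpers, the decorator would make a NaN coordinate fail loudly at the call site instead of propagating into a test assertion.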
@@ -0,0 +1,53 @@
+"""Replay `data_imu.csv` to the FC inbound at 10 Hz.
+
+CSV schema (from `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`):
+    timestamp_ms,ax,ay,az,gx,gy,gz,roll_deg,pitch_deg,yaw_deg,baro_m
+
+Owned by AZ-406 (public surface) + AZ-407 (concrete file-driver
+implementation). This module commits to the type signatures the
+per-scenario tests will import; the actual MAVLink / MSP2 emission is
+wired up by the downstream task.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Protocol
+
+
+@dataclass(frozen=True)
+class ImuSample:
+    """One row of `data_imu.csv` after parsing into native units."""
+
+    timestamp_ms: int
+    accel_mss: tuple[float, float, float]
+    gyro_rps: tuple[float, float, float]
+    attitude_rad: tuple[float, float, float]  # roll, pitch, yaw (radians)
+    baro_alt_m: float
+
+
+class FcInboundEmitter(Protocol):
+    """Abstract emitter — concrete impls are MAVLink (AP) or MSP2 (iNav)."""
+
+    def emit(self, sample: ImuSample) -> None:
+        ...
+
+
+class ImuReplayer:
+    """Drives an `FcInboundEmitter` from a CSV file at the recorded cadence."""
+
+    def __init__(self, emitter: FcInboundEmitter, rate_hz: float = 10.0) -> None:
+        self._emitter = emitter
+        self._rate_hz = rate_hz
+
+    def replay(self, csv_path: Path) -> int:
+        """Replay the CSV file. Returns the number of samples emitted.
+
+        Concrete implementation is owned by AZ-407 (FT-P-02 derkachi-drift
+        + FT-P-04 frame-to-frame registration are the first consumers).
+        """
+        raise NotImplementedError(
+            "ImuReplayer.replay is owned by AZ-407 — AZ-406 supplies only "
+            "the public surface."
+        )
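The CSV header listed in the module docstring carries attitude in degrees, while `ImuSample.attitude_rad` is in radians, so a parser has to convert. A minimal sketch of that row-to-sample step follows. It assumes, without confirmation from the source, that `ax..az` are already in m/s² and `gx..gz` in rad/s, and that `timestamp_ms` is an integer; `parse_imu_csv` is a hypothetical name and the real file driver belongs to AZ-407.

```python
from __future__ import annotations

import csv
import io
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImuSample:
    timestamp_ms: int
    accel_mss: tuple[float, float, float]
    gyro_rps: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]  # roll, pitch, yaw (radians)
    baro_alt_m: float


def parse_imu_csv(text: str) -> list[ImuSample]:
    """Parse CSV text with the documented header into ImuSample rows."""
    samples = []
    for row in csv.DictReader(io.StringIO(text)):
        samples.append(
            ImuSample(
                timestamp_ms=int(row["timestamp_ms"]),
                # Assumption: accelerometer and gyro columns need no unit conversion.
                accel_mss=(float(row["ax"]), float(row["ay"]), float(row["az"])),
                gyro_rps=(float(row["gx"]), float(row["gy"]), float(row["gz"])),
                # The *_deg columns are converted to radians here.
                attitude_rad=(
                    math.radians(float(row["roll_deg"])),
                    math.radians(float(row["pitch_deg"])),
                    math.radians(float(row["yaw_deg"])),
                ),
                baro_alt_m=float(row["baro_m"]),
            )
        )
    return samples
```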
@@ -0,0 +1,48 @@
+"""Parse `.tlog` files emitted by `mavproxy-listener`.
+
+`.tlog` is the standard MAVLink telemetry-log format: each message is an
+8-byte big-endian unix-microsecond timestamp followed by the wire bytes of the
+MAVLink frame. pymavlink ships `mavutil.mavlogfile`, which knows how to iterate this.
+
+This module exposes a small typed wrapper so per-scenario tests can:
+    1. Filter for the message types they care about.
+    2. Compute summary statistics (count per type, message-rate Hz, ratio
+       of signed vs unsigned messages for NFT-SEC-03).
+    3. Attach the source `.tlog` path to the evidence bundler.
+
+Concrete iteration logic is owned by AZ-416 (FT-P-09-AP); AZ-406 commits
+to the public surface.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Iterator
+
+
+@dataclass(frozen=True)
+class TlogMessage:
+    timestamp_us: int
+    msg_type: str
+    signed: bool
+    fields: dict[str, object]
+
+
+def iter_messages(tlog_path: Path) -> Iterator[TlogMessage]:
+    """Iterate `.tlog` messages oldest-first.
+
+    AZ-406 raises until AZ-416 fills in the pymavlink-backed iterator.
+    """
+    raise NotImplementedError(
+        "mavproxy_tlog_reader.iter_messages is owned by AZ-416 — "
+        "AZ-406 supplies only the public surface."
+    )
+
+
+def count_by_type(tlog_path: Path) -> dict[str, int]:
+    """Return ``{msg_type: count}`` for every distinct message type."""
+    counts: dict[str, int] = {}
+    for msg in iter_messages(tlog_path):
+        counts[msg.msg_type] = counts.get(msg.msg_type, 0) + 1
+    return counts
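To make the `.tlog` framing concrete, here is a small standard-library sketch that splits a log buffer into timestamped frames and flags MAVLink 2 frames carrying a signature, which is the raw ingredient for the signed-vs-unsigned ratio mentioned for NFT-SEC-03. It is not the pymavlink-backed iterator AZ-416 owns: it does not decode message types or verify checksums, and `split_tlog` is a hypothetical name.

```python
from __future__ import annotations

import struct
from typing import Iterator

_V1_MAGIC = 0xFE
_V2_MAGIC = 0xFD
_IFLAG_SIGNED = 0x01  # MAVLink 2 incompat flag: frame ends with a 13-byte signature


def split_tlog(buf: bytes) -> Iterator[tuple[int, bytes, bool]]:
    """Yield (timestamp_us, frame_bytes, signed) for each record in a tlog buffer.

    Stops at the first truncated or unrecognized record instead of resyncing.
    """
    pos = 0
    while pos + 8 + 2 <= len(buf):
        # 8-byte big-endian timestamp in microseconds precedes every frame.
        (timestamp_us,) = struct.unpack_from(">Q", buf, pos)
        magic, payload_len = buf[pos + 8], buf[pos + 9]
        signed = False
        if magic == _V1_MAGIC:
            frame_len = 6 + payload_len + 2  # header + payload + checksum
        elif magic == _V2_MAGIC:
            if pos + 8 + 3 > len(buf):
                return
            signed = bool(buf[pos + 10] & _IFLAG_SIGNED)
            frame_len = 10 + payload_len + 2 + (13 if signed else 0)
        else:
            return
        end = pos + 8 + frame_len
        if end > len(buf):
            return
        yield timestamp_us, buf[pos + 8 : end], signed
        pos = end
```

A signed ratio then falls out as `sum(s for _, _, s in frames) / len(frames)` over the collected tuples, assuming the log is non-empty.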
@@ -0,0 +1,59 @@
+"""ArduPilot Plane / iNav SITL state-read observers.
+
+Reads what the SUT delivered to the FC over its external-positioning
+interface, without ever bypassing the FC's own acceptance path. This is
+the only legal way for blackbox tests to assert AC-4.3 (FC output contract):
+every assertion goes through the SITL's state machine.
+
+Public surface only; concrete pymavlink / yamspy / msp_gps_toy subprocess
+plumbing is owned by AZ-416 (FT-P-09-AP) and AZ-417 (FT-P-09-iNav).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Literal, Protocol
+
+FcKind = Literal["ardupilot", "inav"]
+
+
+@dataclass(frozen=True)
+class FcGpsState:
+    """The subset of FC state the e2e tests assert against.
+
+    AP: assembled from EKF source-set + GLOBAL_POSITION_INT replay-back.
+    iNav: assembled from MSP2 GPS-provider state + getRawGPS query.
+    """
+
+    primary_source: str  # "MAV" (AP gps_type=14) or "MSP" (iNav)
+    last_position_lat_deg: float
+    last_position_lon_deg: float
+    last_position_alt_m: float
+    fix_quality: int  # 0..6 per NMEA convention
+    horizontal_accuracy_m: float
+    last_update_age_ms: int
+
+
+class FcSitlObserver(Protocol):
+    """Common observer protocol — implemented by `ArduPilotObserver` + `InavObserver`."""
+
+    fc_kind: FcKind
+
+    def read_gps_state(self) -> FcGpsState:
+        ...
+
+    def read_parameter(self, name: str) -> float | int | str | None:
+        ...
+
+
+def get_observer(fc_kind: FcKind, host: str) -> FcSitlObserver:
+    """Factory — returns the matching observer for the requested FC.
+
+    AZ-416/417 own the concrete return types. AZ-406 raises until those
+    tasks land so test authors can plumb the observer through their
+    fixtures without yet running them.
+    """
+    raise NotImplementedError(
+        f"sitl_observer.get_observer({fc_kind=}, {host=}) is owned by "
+        "AZ-416 (AP) / AZ-417 (iNav) — AZ-406 supplies only the contract."
+    )
@@ -0,0 +1,12 @@
+[pytest]
+minversion = 8.0
+addopts = -ra --strict-markers --timeout=300
+markers =
+    tier2_only: scenario only valid on Tier-2 Jetson hardware (SKIP on tier1-docker)
+    chamber_only: scenario requires the thermal chamber rig (SKIP unless --enable-chamber)
+    research_build_only: scenario only valid on a research build (SKIP when vio_strategy=vins_mono is selected on production matrix)
+    deferred_ac: scenario maps to an AC marked NOT COVERED / PARTIAL in the traceability matrix; emits SKIP or XFAIL with the matrix-mapped reason
+    traces_to(ids): comma-separated AC/RESTRICT IDs the test exercises (consumed by csv_reporter for the `traces_to` column)
+    smoke: minimal verification that the harness boots end-to-end
+filterwarnings =
+    ignore::DeprecationWarning:pymavlink.*
@@ -0,0 +1,7 @@
+"""CSV reporter + evidence bundler — pytest plugins registered by the runner conftest.
+
+`csv_reporter` overrides the upstream pytest-csv default columns with the
+exact column set declared in `_docs/02_document/tests/environment.md` §
+Reporting; `evidence_bundler` collects per-run `.tlog`, FDR archives,
+screenshots, profiler traces, tegrastats / jtop CSVs into a single bundle.
+"""
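As an illustration of the 11-column contract from `environment.md` § Reporting, the following sketch writes rows with `csv.DictWriter`, which already provides the two behaviours a strict schema needs: missing columns are filled with an empty string, and unknown keys raise `ValueError`. `write_rows` is a hypothetical helper, not the plugin's actual emission path, and the sample row values are made up for the example.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, Mapping, TextIO

CSV_COLUMNS = (
    "test_id",
    "test_name",
    "traces_to",
    "fc_adapter",
    "vio_strategy",
    "tier",
    "started_at_utc",
    "execution_time_ms",
    "result",
    "error_message",
    "evidence_paths",
)


def write_rows(rows: Iterable[Mapping[str, object]], out: TextIO) -> None:
    """Write a header plus one line per row, always in CSV_COLUMNS order."""
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing keys become "" (restval default); extra keys raise ValueError.
        writer.writerow(dict(row))


buf = io.StringIO()
write_rows([{"test_id": "FT-P-01", "tier": "tier1-docker", "result": "PASS"}], buf)
```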
@@ -0,0 +1,254 @@
+"""CSV reporter pytest plugin.
+
+Emits one row per test with the exact columns declared in
+``_docs/02_document/tests/environment.md`` § Reporting:
+
+    test_id, test_name, traces_to, fc_adapter, vio_strategy, tier,
+    started_at_utc, execution_time_ms, result, error_message, evidence_paths
+
+Why a custom plugin rather than `pytest-csv` defaults?
+    - `pytest-csv` is dependency-installed for its column-extension hooks, but
+      its default emission is `name`/`status`/`duration` — our matrix needs the
+      `traces_to`, `fc_adapter`, `vio_strategy`, `tier`, `started_at_utc`,
+      `evidence_paths` columns to feed the downstream badge generator and
+      regression detector.
+
+Result classification per AC-9:
+    - PASS / FAIL / SKIP map 1:1 to pytest's own outcome.
+    - XFAIL is emitted when the test was marked `deferred_ac(verdict="xfail",
+      reason=...)` and the body raised (the standard pytest XFAIL path).
+
+The plugin is unit-tested in ``e2e/_unit_tests/reporting/test_csv_reporter.py``.
+"""
+
+from __future__ import annotations
+
+import csv
+import os
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import pytest
+
+UTC = timezone.utc
+
+CSV_COLUMNS: tuple[str, ...] = (
+    "test_id",
+    "test_name",
+    "traces_to",
+    "fc_adapter",
+    "vio_strategy",
+    "tier",
+    "started_at_utc",
+    "execution_time_ms",
+    "result",
+    "error_message",
+    "evidence_paths",
+)
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _parametrize_value(item: pytest.Item, name: str, default: str = "n/a") -> str:
+    cs = getattr(item, "callspec", None)
+    if cs is None:
+        return default
+    return str(cs.params.get(name, default))
+
+
+def _traces_to(item: pytest.Item) -> str:
+    marker = item.get_closest_marker("traces_to")
+    if marker is None:
+        return ""
+    ids = marker.args[0] if marker.args else marker.kwargs.get("ids", "")
+    if isinstance(ids, (list, tuple, set)):
+        return ",".join(str(i) for i in ids)
+    return str(ids)
+
+
+def _test_id(item: pytest.Item) -> str:
+    """Stable test id for the CSV `test_id` column.
+
+    Prefers an explicit ``@pytest.mark.test_id("FT-P-01")`` if set, otherwise
+    falls back to pytest's nodeid which is unique per parametrize variant.
+    """
+    marker = item.get_closest_marker("test_id")
+    if marker is not None and marker.args:
+        return str(marker.args[0])
+    return item.nodeid
+
+
+def _outcome_to_result(report: pytest.TestReport, item: pytest.Item) -> str:
+    if report.outcome == "passed":
+        # Per the module docstring, PASS maps 1:1 to pytest's own outcome;
+        # XFAIL is reserved for deferred-AC tests whose body raised.
+        return "PASS"
+    if report.outcome == "failed":
+        return "FAIL"
+    if report.outcome == "skipped":
+        # pytest reports an expected failure as outcome "skipped" on the call
+        # phase; a deferred_ac(verdict="xfail") marker tells it apart from SKIP.
+        deferred = item.get_closest_marker("deferred_ac")
+        if report.when == "call" and deferred is not None:
+            if deferred.kwargs.get("verdict") == "xfail":
+                return "XFAIL"
+        return "SKIP"
+    # Unknown outcome: should never happen with stock pytest, but emit a
+    # visible FAIL rather than swallow it silently.
+    return f"FAIL ({report.outcome})"
+
+
+# ---------------------------------------------------------------------------
+# Row builder (exposed for unit tests)
+# ---------------------------------------------------------------------------
+
+
+def build_row(
+    item: pytest.Item,
+    report: pytest.TestReport,
+    started_at_utc: str,
+    execution_time_ms: int,
+    evidence_paths: list[str] | None = None,
+) -> dict[str, str]:
+    """Build the CSV row for a finished test.
+
+    Public function — unit-tested directly without spinning a pytest run.
+    """
+    result = _outcome_to_result(report, item)
+    error_message = ""
+    if report.outcome == "failed":
+        # `longreprtext` is the canonical pytest rendering of the traceback;
+        # we collapse it to a single line for CSV friendliness and truncate
+        # to keep the row from blowing past a reasonable limit.
+        raw = report.longreprtext or repr(getattr(report, "longrepr", ""))
+        error_message = raw.replace("\n", " | ")[:2000]
+    elif report.outcome == "skipped":
+        # `longrepr` on a skip is a 3-tuple (file, lineno, reason).
+        if isinstance(report.longrepr, tuple) and len(report.longrepr) == 3:
+            error_message = str(report.longrepr[2])
+        else:
+            error_message = str(getattr(report, "longrepr", "")).replace("\n", " | ")[:2000]
+
+    return {
+        "test_id": _test_id(item),
+        "test_name": item.name,
+        "traces_to": _traces_to(item),
+        "fc_adapter": _parametrize_value(item, "fc_adapter"),
+        "vio_strategy": _parametrize_value(item, "vio_strategy"),
+        "tier": os.environ.get("TIER", "tier1-docker"),
+        "started_at_utc": started_at_utc,
+        "execution_time_ms": str(execution_time_ms),
+        "result": result,
+        "error_message": error_message,
+        "evidence_paths": ",".join(evidence_paths or []),
+    }
+
+
+# ---------------------------------------------------------------------------
+# Plugin hooks
+# ---------------------------------------------------------------------------
+
+
+class _CsvReporter:
+    def __init__(self, output_path: Path) -> None:
+        self._path = output_path
+        self._path.parent.mkdir(parents=True, exist_ok=True)
+        # Per-item start times, recorded at logstart, so execution_time_ms
+        # covers setup + call wall-clock rather than the call phase alone
+        # (which omits any boundary-fixture setup cost).
+        self._start_times: dict[str, tuple[float, str]] = {}
+        self._evidence: dict[str, list[str]] = {}
+        self._rows: list[dict[str, str]] = []
+
+    # --- lifecycle hooks ---
+
+    def pytest_runtest_logstart(self, nodeid: str, location: Any) -> None:  # noqa: ARG002 (pytest hook signature)
+        self._start_times[nodeid] = (time.monotonic(), datetime.now(UTC).isoformat(timespec="seconds"))
+
+    def pytest_runtest_logreport(self, report: pytest.TestReport) -> None:
+        # We emit one row per item, taken from the `call` phase. Items whose
+        # setup is skipped (e.g. `pytest.skip()` inside a fixture) or fails
+        # (a fixture error) never reach `call`, so for those we use the
+        # `setup` phase report instead.
+        item = getattr(report, "_item", None)  # populated by pytest_runtest_makereport below
+        if item is None:
+            return
+        if report.when == "call" or (report.when == "setup" and report.outcome != "passed"):
+            fallback = (time.monotonic(), datetime.now(UTC).isoformat(timespec="seconds"))
+            start_mono, start_iso = self._start_times.get(report.nodeid, fallback)
+            elapsed_ms = int((time.monotonic() - start_mono) * 1000)
+            evidence = self._evidence.get(report.nodeid, [])
+            row = build_row(item, report, start_iso, elapsed_ms, evidence)
+            self._rows.append(row)
+
+    @pytest.hookimpl(hookwrapper=True)
+    def pytest_runtest_makereport(
+        self,
+        item: pytest.Item,
+        call: pytest.CallInfo[None],  # noqa: ARG002 (pytest hook signature)
+    ) -> Any:
+        # Tag each report with its originating item so logreport above can
+        # read parametrize ids / markers without a global lookup. A
+        # hookwrapper around `pytest_runtest_makereport` sees every phase's
+        # report (setup, call, teardown) right after pytest builds it.
+        outcome = yield
+        report = outcome.get_result()
+        if report is not None:
+            report._item = item  # noqa: SLF001 (intentional plugin attribute)
+
+    def pytest_sessionfinish(self, session: pytest.Session, exitstatus: int) -> None:  # noqa: ARG002
+        with self._path.open("w", newline="", encoding="utf-8") as fh:
+            writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
+            writer.writeheader()
+            writer.writerows(self._rows)
+
+    # --- public surface for the evidence_bundler plugin to attach paths ---
+
+    def attach_evidence(self, nodeid: str, evidence_path: str) -> None:
+        self._evidence.setdefault(nodeid, []).append(evidence_path)
+
+
+_REPORTER_KEY = pytest.StashKey["_CsvReporter | None"]()
+
+
+def pytest_addoption(parser: pytest.Parser) -> None:
+    group = parser.getgroup("e2e-runner", "Blackbox e2e harness options")
+    group.addoption(
+        "--csv",
+        action="store",
+        default=None,
+        help="Path to the CSV report (one row per test). Default off — set to enable.",
+    )
+    group.addoption(
+        "--csv-columns",
+        action="store",
+        default=",".join(CSV_COLUMNS),
+        help="Comma-separated column order. Default = environment.md § Reporting.",
+    )
+
+
+def pytest_configure(config: pytest.Config) -> None:
+    config.stash[_REPORTER_KEY] = None
+    csv_path = config.getoption("--csv")
+    if csv_path:
+        reporter = _CsvReporter(Path(csv_path))
+        config.stash[_REPORTER_KEY] = reporter
+        config.pluginmanager.register(reporter, name="e2e-csv-reporter")
+    # `traces_to` and `test_id` are pytest markers — register them so
+    # --strict-markers doesn't error on first use.
+    config.addinivalue_line(
+        "markers", "traces_to(ids): comma-separated AC/RESTRICT IDs the test exercises"
+    )
+    config.addinivalue_line(
+        "markers", "test_id(name): override the test_id column (default = pytest nodeid)"
+    )
+
+
+def reporter_for(config: pytest.Config) -> _CsvReporter | None:
+    """Public accessor — used by `evidence_bundler` to attach evidence paths."""
+    return config.stash.get(_REPORTER_KEY, None)
@@ -0,0 +1,84 @@
+"""Evidence bundler pytest plugin.
+
+For each test, collects supporting artifacts (`.tlog`, FDR archive snapshots,
+screenshots, profiler traces, tegrastats / jtop CSVs) into a per-run bundle
+at ``--evidence-out`` (default ``/e2e-results/<run-id>/evidence/``) and
+records the resulting paths in the CSV reporter's ``evidence_paths`` column.
+
+The bundler is INERT by default: tests opt in by requesting the
+``attach_evidence`` fixture and calling it with a file path. The runner
+conftest registers this plugin via `pytest_plugins`.
+"""
+
+from __future__ import annotations
+
+import shutil
+from collections.abc import Callable
+from pathlib import Path
+
+import pytest
+
+from .csv_reporter import reporter_for
+
+
+def _safe_relpath(target: Path, base: Path) -> str:
+    try:
+        return str(target.relative_to(base))
+    except ValueError:
+        # If the target isn't under base, fall back to its absolute path.
+        # The caller passes the bundle copy (not the source, which may live
+        # anywhere, e.g. /tlogs/<run>.tlog), so this branch is defensive.
+        return str(target)
+
+
+@pytest.fixture
+def attach_evidence(
+    request: pytest.FixtureRequest,
+    evidence_dir: Path,
+) -> Callable[[str | Path], str]:
+    """Copy a file into the run evidence bundle and record its CSV path.
+
+    Returns a callable ``attach(path) -> str`` — the test invokes it after
+    capturing an artifact (e.g., the .tlog file or an FDR snapshot). The
+    returned string is the path that will appear in the CSV
+    ``evidence_paths`` column.
+
+    The implementation copies the file (rather than moving it) so the same
+    artifact can be referenced by multiple tests if needed.
+    """
+    nodeid = request.node.nodeid
+    config = request.config
+    reporter = reporter_for(config)
+    bundle_root = evidence_dir / _slug(nodeid)
+    bundle_root.mkdir(parents=True, exist_ok=True)
+
+    def _attach(path: str | Path) -> str:
+        src = Path(path)
+        if not src.exists():
+            raise FileNotFoundError(f"attach_evidence: {src} not found")
+        dst = bundle_root / src.name
+        # If a test attaches the same name twice in one run, disambiguate.
+        if dst.exists():
+            stem, suffix = src.stem, src.suffix
+            counter = 1
+            while dst.exists():
+                dst = bundle_root / f"{stem}__{counter}{suffix}"
+                counter += 1
+        shutil.copy2(src, dst)
+        rel = _safe_relpath(dst, evidence_dir.parent)
+        if reporter is not None:
+            reporter.attach_evidence(nodeid, rel)
+        return rel
+
+    return _attach
+
+
+def _slug(nodeid: str) -> str:
+    """Filesystem-safe slug for the nodeid (no path separators or brackets; collisions are unlikely but possible)."""
+    return (
+        nodeid.replace("/", "_")
+        .replace("::", "__")
+        .replace("[", "_")
+        .replace("]", "")
+        .replace(" ", "")
+    )
@@ -0,0 +1,36 @@
+# e2e-runner image dependencies.
+#
+# Pin reasoning:
+#   - `opencv-python>=4.12.0` honors D-CROSS-CVE-1 (the runner image does NOT
+#     depend on gtsam — the numpy<2 ABI block that forces the SUT pin does not
+#     apply here; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md).
+#   - Versions match the SUT pyproject where feasible (numpy 1.x line, pyproj 3.6+, pydantic 2.x).
+#   - pytest 8.x is the stable line; pytest-csv 3.x supplies the columns the CSV reporter plugin extends.
+
+pytest>=8.0,<9.0
+pytest-timeout>=2.2,<3.0
+pytest-xdist>=3.5,<4.0
+pytest-forked>=1.6,<2.0
+pytest-csv>=3.0,<4.0
+
+# MAVLink ground side — used for both AP signing-handshake assertions and the
+# passive listener that consumes mavproxy-listener's forwarded UDP stream.
+pymavlink>=2.4
+
+# Geodesic + frame replay + numerical assertion stack.
+opencv-python>=4.12.0
+numpy>=1.26,<2.0
+scipy>=1.11,<2.0
+geopy>=2.4,<3.0
+pyproj>=3.6,<4.0
+
+# HTTP client for talking to mock-suite-sat-service, plus YAML parsing and schema validation.
+httpx>=0.28,<1.0
+pyyaml>=6.0
+pydantic>=2.5,<3.0
+
+# Structured logging in the runner side (mirrors the SUT logger choice).
+structlog>=24.1
+
+# FDR archive reader uses orjson for the line-delimited JSON record format.
+orjson>=3.9,<4.0