[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
  + mock Suite Sat Service + mavproxy listener + e2e-runner onto one
  e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
  NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
  compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
  while the SUT runs natively on the Jetson under systemd.
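
A static check of the egress-isolation contract could look roughly like the sketch below. It works on an already-parsed compose mapping; the `sut` service name and the helper name `egress_violations` are assumptions made for illustration, and only `e2e-net`, `internal: true`, `e2e-runner` and `mock-suite-sat-service` are taken from this commit.

```python
from __future__ import annotations

# Hypothetical, trimmed-down result of parsing docker-compose.test.yml.
# The "sut" service name is assumed; the network name and the
# `internal: true` flag come from the commit message.
COMPOSE = {
    "services": {
        "sut": {"networks": ["e2e-net"]},
        "e2e-runner": {"networks": ["e2e-net"]},
        "mock-suite-sat-service": {"networks": ["e2e-net"]},
    },
    "networks": {"e2e-net": {"driver": "bridge", "internal": True}},
}


def egress_violations(compose: dict) -> list[str]:
    """Return human-readable reasons the compose file could leak egress."""
    problems: list[str] = []
    networks = compose.get("networks") or {}
    # Every declared network must be marked internal.
    for name, spec in networks.items():
        if not (spec or {}).get("internal", False):
            problems.append(f"network {name!r} is not internal")
    for svc, spec in (compose.get("services") or {}).items():
        # Host networking bypasses the bridge entirely.
        if spec.get("network_mode") == "host":
            problems.append(f"service {svc!r} uses host networking")
        # A service may only join networks declared in this file.
        for net in spec.get("networks", []):
            if net not in networks:
                problems.append(f"service {svc!r} joins undeclared network {net!r}")
    return problems
```

An empty list means the static contract holds; the runtime probe in the Docker-bound smoke test remains the stronger check.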

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
  jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x). The runner deliberately
  does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md
  parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
  test with the exact 11-column schema from environment.md §
  Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths). XFAIL surfaced only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records relative
  paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
  (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
  AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
  today (no downstream task dep): WGS84 distance / forward-bearing
  / offset via pyproj, with NaN rejection in distance_m.
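
As a rough illustration of the 11-column contract, the sketch below renders rows with the standard-library `csv` module and rejects rows that miss a column. The column names are the ones listed above; `render_rows` and the sample values are hypothetical and are not part of the reporter plugin.

```python
from __future__ import annotations

import csv
import io

# Column order as listed in the commit message (environment.md, Reporting).
CSV_COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
)


def render_rows(rows: list[dict[str, str]]) -> str:
    """Render rows as CSV text, refusing rows that miss or add columns."""
    buf = io.StringIO()
    # extrasaction="raise" makes DictWriter reject unknown keys.
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS, extrasaction="raise")
    writer.writeheader()
    for row in rows:
        missing = set(CSV_COLUMNS) - row.keys()
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical sample row, for illustration only.
SAMPLE = dict.fromkeys(CSV_COLUMNS, "")
SAMPLE.update(test_id="FT-P-01", test_name="test_smoke", result="PASS",
              tier="tier1-docker", execution_time_ms="1200")
```

Failing loudly on a missing column is a design choice for the sketch: `csv.DictWriter` would otherwise fill the gap with an empty string.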

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST
  /mock/config (force-status, response delay), POST /mock/reset
  (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
  fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
  fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke test, run
  inside the e2e-runner image once Docker has brought the stack up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle,
  conftest skip rules, helper modules, parsers, mock app, compose
  YAML structural contract, public-boundary enforcement) without
  Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot)         → compose YAML test + directory layout
                                + smoke test (Docker-bound)
- AC-2 (mock services)       → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
                                deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)         → in-process plugin lifecycle test
                                emits the exact 11-column schema
- AC-5 (egress isolation)    → static config test + runtime probe
                                in Docker-bound smoke
- AC-6 (Tier-2 contract)     → tegrastats + jtop parser unit tests
                                + jetson/* layout test; full Tier-2
                                contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)  → vins_mono skip-rule cases +
                                tests/positive/test_smoke
- AC-9 (skip semantics)      → 9 conftest skip-rule unit tests

The module layout entry for blackbox_tests was added in the 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,53 @@
# e2e-runner image — drives the SUT through public boundaries only.
#
# CRITICAL: this image MUST NOT install the SUT package and MUST NOT have
# `src/gps_denied_onboard/` on its PYTHONPATH. The pytest tree it runs lives
# at `/test-suite` (bind-mounted) and imports only from `e2e.runner.*` paths
# baked into this image — never from the SUT.
#
# Image size target: ≤ 2 GB (AZ-406 Risk 1 mitigation). The heavy ML stack
# (tensorrt, gtsam, faiss, cuda) lives in the SUT image, not here.
FROM python:3.12-slim-bookworm AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1
# --- system deps for OpenCV runtime + libffi (msp_gps_toy linkage) + libssl + tini ---
# The opencv-python wheel links against libGL and GLib at import time (libgl1,
# libglib2.0-0); its JPEG/PNG codecs are bundled in the wheel itself. tini is a
# small init that reaps zombie children when pytest forks (`--forked`).
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
        libffi8 \
        libssl3 \
        tini \
        ca-certificates \
        curl \
        netcat-openbsd \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /opt/e2e-runner
COPY requirements.txt /opt/e2e-runner/requirements.txt
RUN pip install --no-cache-dir -r /opt/e2e-runner/requirements.txt
# Runner package — conftest, helpers, reporting plugins. Copied AFTER pip
# install so source-only changes don't bust the heavy layer cache.
COPY __init__.py /opt/e2e-runner/runner/__init__.py
COPY conftest.py /opt/e2e-runner/runner/conftest.py
COPY pytest.ini /opt/e2e-runner/pytest.ini
COPY reporting /opt/e2e-runner/runner/reporting
COPY helpers /opt/e2e-runner/runner/helpers
ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner
# `/test-suite` is bind-mounted by docker-compose (../tests). The runner
# default cwd is its own root; the docker-compose `command:` replaces the
# default CMD below (the tini ENTRYPOINT stays in place) with the explicit
# `pytest /test-suite ...` invocation.
WORKDIR /opt/e2e-runner
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["pytest", "/test-suite"]
@@ -0,0 +1,10 @@
"""e2e-runner package.
Top-level package for the blackbox harness — owns the conftest, the CSV
reporter plugin, the evidence bundler, and the boundary-driving helpers
(`frame_source_replay`, `imu_replay`, `sitl_observer`, `mavproxy_tlog_reader`,
`fdr_reader`, `geo`).
IMPORTANT: nothing under this package may import from `gps_denied_onboard.*`.
The harness interacts with the SUT only via public boundaries.
"""
@@ -0,0 +1,214 @@
"""Top-level pytest conftest for the blackbox e2e harness.

Responsibilities:

1. Session-level parameterization over ``(fc_adapter, vio_strategy)``.
2. Skip-rule enforcement per the traceability matrix
   (`_docs/02_document/tests/traceability-matrix.md`):

   - AC-7.1, AC-7.2 → SKIP (deferred — no AI-camera fixture)
   - RESTRICT-CAM-2 → SKIP (paired with AC-7.x)
   - AC-NEW-5 chamber portion → SKIP unless --enable-chamber
   - RESTRICT-HW-2 chamber portion → SKIP unless --enable-chamber
   - Tier-2-only tests → SKIP on tier1-docker
   - `vins_mono` parametrization → SKIP on production-build sessions

3. Wiring of the boundary-driving fixtures (`sitl_observer`,
   `mavproxy_tlog`, `fdr_reader`, `mock_suite_sat_client`) consumed by
   per-scenario tests.

The actual boundary-driving fixtures import helper modules from
``runner.helpers.*``. They are registered here but their implementations
live in the helpers package.
"""

from __future__ import annotations

import os
from collections.abc import Iterator
from pathlib import Path

import pytest

# ---------------------------------------------------------------------------
# Command-line options
# ---------------------------------------------------------------------------


def pytest_addoption(parser: pytest.Parser) -> None:
    """Harness-level options (not exposed to individual tests)."""
    group = parser.getgroup("e2e-runner", "Blackbox e2e harness options")
    group.addoption(
        "--enable-chamber",
        action="store_true",
        default=False,
        help="Enable thermal-chamber-gated tests (AC-NEW-5 hot-soak, RESTRICT-HW-2). "
        "Requires the chamber-attached Jetson runner; default off.",
    )
    group.addoption(
        "--build-kind",
        action="store",
        default=os.environ.get("BUILD_KIND", "production"),
        choices=("production", "research"),
        help="Selects which VIO strategies are valid: production excludes vins_mono.",
    )
    group.addoption(
        "--evidence-out",
        action="store",
        default=os.environ.get("EVIDENCE_OUT", "/e2e-results/evidence"),
        help="Directory the evidence bundler writes per-run artifacts to.",
    )
    group.addoption(
        "--allow-no-skip-reason",
        action="store_true",
        default=False,
        help="Allow @pytest.mark.deferred_ac without an explicit reason= kwarg. "
        "Default off — every deferred AC must cite its traceability-matrix row.",
    )

# ---------------------------------------------------------------------------
# Parameterization matrix
# ---------------------------------------------------------------------------

_FC_ADAPTERS = ("ardupilot", "inav")
_VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")


def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
    """Parametrize tests that request the ``fc_adapter`` / ``vio_strategy`` fixtures.

    Tests opt in by listing the fixture name in their signature. Tests that
    explicitly do not depend on the matrix simply do not request the fixture.
    """
    if "fc_adapter" in metafunc.fixturenames:
        env_default = os.environ.get("FC_ADAPTER")
        if env_default:
            metafunc.parametrize("fc_adapter", [env_default], ids=[env_default])
        else:
            metafunc.parametrize("fc_adapter", _FC_ADAPTERS, ids=_FC_ADAPTERS)
    if "vio_strategy" in metafunc.fixturenames:
        env_default = os.environ.get("VIO_STRATEGY")
        if env_default:
            metafunc.parametrize("vio_strategy", [env_default], ids=[env_default])
        else:
            metafunc.parametrize("vio_strategy", _VIO_STRATEGIES, ids=_VIO_STRATEGIES)

# ---------------------------------------------------------------------------
# Skip-rule enforcement (deterministic; runs at collection time)
# ---------------------------------------------------------------------------


def pytest_collection_modifyitems(
    config: pytest.Config, items: list[pytest.Item]
) -> None:
    """Apply traceability-matrix-driven skips before any test executes.

    The mapping between AC / RESTRICT IDs and the SKIP reason strings is the
    one declared in `_docs/02_document/tests/traceability-matrix.md` §
    Uncovered Items Analysis. Any change to that matrix MUST be mirrored
    here (and vice-versa) — the unit tests in
    `e2e/_unit_tests/test_traceability_skip_rules.py` catch drift.
    """
    tier = os.environ.get("TIER", "tier1-docker")
    chamber_enabled = config.getoption("--enable-chamber")
    build_kind = config.getoption("--build-kind")

    skip_tier2 = pytest.mark.skip(reason="Tier-2 only — Jetson hardware required")
    skip_chamber = pytest.mark.skip(
        reason="Chamber-gated — run with --enable-chamber on the chamber-attached Jetson runner"
    )
    skip_research = pytest.mark.skip(
        reason="vins_mono is research-build-only per D-C1-1-SUB-A"
    )

    for item in items:
        # ----- Tier-2 only -----
        if "tier2_only" in item.keywords and tier != "tier2-jetson":
            item.add_marker(skip_tier2)
            continue

        # ----- Chamber only -----
        if "chamber_only" in item.keywords and not chamber_enabled:
            item.add_marker(skip_chamber)
            continue

        # ----- Research-build vs production matrix -----
        # Skip vins_mono on production-build runs. No marker on the test
        # function drives this rule; it keys off the parametrized
        # `vio_strategy` value in the item's callspec.
        if build_kind == "production":
            call_params = getattr(item, "callspec", None)
            if call_params is not None and call_params.params.get("vio_strategy") == "vins_mono":
                item.add_marker(skip_research)
                continue

        # ----- Deferred-AC traceability-matrix skips -----
        deferred = item.get_closest_marker("deferred_ac")
        if deferred is not None:
            reason = deferred.kwargs.get("reason")
            if reason is None and not config.getoption("--allow-no-skip-reason"):
                # Every deferred_ac MUST cite its matrix row to prevent silent
                # coverage erosion. A missing reason does not abort collection:
                # the item is skipped with a reason that names the omission.
                item.add_marker(
                    pytest.mark.skip(
                        reason=(
                            "deferred_ac marker without reason= kwarg; cite the "
                            "traceability-matrix row that justifies the deferral, "
                            "or run with --allow-no-skip-reason for local debugging."
                        )
                    )
                )
                continue
            verdict = deferred.kwargs.get("verdict", "skip").lower()
            if verdict == "xfail":
                item.add_marker(pytest.mark.xfail(reason=reason or "deferred AC (xfail)", strict=False))
            else:
                item.add_marker(
                    pytest.mark.skip(
                        reason=(
                            reason
                            or "deferred AC — see _docs/02_document/tests/traceability-matrix.md"
                        )
                    )
                )

# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture(scope="session")
def run_id() -> str:
    return os.environ.get("RUN_ID", "local")


@pytest.fixture(scope="session")
def tier() -> str:
    return os.environ.get("TIER", "tier1-docker")


@pytest.fixture(scope="session")
def evidence_dir(pytestconfig: pytest.Config, run_id: str) -> Path:
    base = Path(pytestconfig.getoption("--evidence-out"))
    target = base if base.name == "evidence" else base / "evidence"
    target.mkdir(parents=True, exist_ok=True)
    return target


@pytest.fixture(scope="session")
def mock_suite_sat_url() -> str:
    return os.environ.get("MOCK_SUITE_SAT_URL", "http://mock-suite-sat-service:8080")


# ---------------------------------------------------------------------------
# Plugin registration
# ---------------------------------------------------------------------------
# The CSV reporter plugin is a separate module so the unit tests can exercise
# it directly without going through a real pytest run. It is registered via
# `pytest_plugins` so docker-compose's `--csv=...` flag binds to our column
# set rather than the upstream pytest-csv default.
pytest_plugins = [
    "runner.reporting.csv_reporter",
    "runner.reporting.evidence_bundler",
]
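
The precedence of the skip rules in `pytest_collection_modifyitems` can be restated as a pure function, which is convenient for reasoning about the order of the checks without a pytest session. This is only a sketch: `skip_rule` and its return labels are hypothetical names, not part of the conftest.

```python
from __future__ import annotations


def skip_rule(
    markers: set[str],
    *,
    tier: str,
    chamber_enabled: bool,
    build_kind: str,
    vio_strategy: str | None = None,
) -> str | None:
    """Return the label of the first rule that applies, or None to run the test.

    The order mirrors the conftest loop: tier, chamber, research build,
    then deferred_ac.
    """
    if "tier2_only" in markers and tier != "tier2-jetson":
        return "tier2_only"
    if "chamber_only" in markers and not chamber_enabled:
        return "chamber_only"
    if build_kind == "production" and vio_strategy == "vins_mono":
        return "vins_mono"
    if "deferred_ac" in markers:
        return "deferred_ac"
    return None
```

Because the first match wins, a test that is both `tier2_only` and `deferred_ac` reports the Tier-2 reason on a Tier-1 run.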
@@ -0,0 +1,13 @@
"""Public-boundary helper modules used by every blackbox test.

Modules:
    frame_source_replay  — replay images/video to the SUT's V4L2 file source
    imu_replay           — replay `data_imu.csv` at 10 Hz to the FC inbound
    sitl_observer        — AP/iNav read-side observers (param reads, GPS_RAW_INT, MSP queries)
    mavproxy_tlog_reader — parse `.tlog` files emitted by `mavproxy-listener`
    fdr_reader           — post-run filesystem read of the FDR archive
    geo                  — WGS84 ellipsoidal geodesic helpers (via `pyproj.Geod`)

These modules MUST NOT import from `gps_denied_onboard.*`. Public-boundary
discipline is enforced by `e2e/_unit_tests/test_no_sut_imports.py`.
"""
@@ -0,0 +1,59 @@
"""Post-run filesystem read of the FDR archive.

The FDR archive is a line-delimited JSON record stream per AZ-272 / AZ-273.
Each line is an `FdrRecord` envelope (producer_id, type, monotonic_ms,
payload). The runner image must NEVER import the SUT's FdrRecord schema
directly — it parses the JSON bytes and validates against a duplicate
record-type allowlist baked into this module.

Public surface only; concrete parser + assertion helpers are owned by
AZ-441 (NFT-LIM-02 — FDR size budget) and the resilience scenario tasks
that need to crawl the archive (AZ-432, AZ-433, AZ-435).
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class FdrRecord:
    """Mirror of `gps_denied_onboard.fdr_client.records.FdrRecord` — public-boundary copy.

    The schema is duplicated intentionally; if the SUT's FDR schema evolves
    in a breaking way, this duplicate file fails to parse (visible drift)
    rather than silently following along.
    """

    producer_id: str
    monotonic_ms: int
    record_type: str
    payload: dict[str, object]


def iter_records(fdr_archive_root: Path) -> Iterator[FdrRecord]:
    """Iterate every FDR record in the archive root (ordered by monotonic_ms).

    Raises NotImplementedError until AZ-441 supplies the orjson-backed parser.
    """
    raise NotImplementedError(
        "fdr_reader.iter_records is owned by AZ-441 — AZ-406 supplies only "
        "the public surface."
    )


def archive_size_bytes(fdr_archive_root: Path) -> int:
    """Sum the size of every file under ``fdr_archive_root``.

    Concrete implementation here: a thin ``Path.rglob`` + ``stat`` loop that
    NFT-LIM-02 needs as soon as a real archive lands. Returns 0 when the
    root does not exist.
    """
    if not fdr_archive_root.exists():
        return 0
    total = 0
    for p in fdr_archive_root.rglob("*"):
        if p.is_file():
            total += p.stat().st_size
    return total
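
For orientation, a line-delimited JSON reader of the kind `iter_records` is meant to become might look like the sketch below. It is not the AZ-441 implementation: it uses the standard-library `json` module rather than orjson, and the `*.jsonl` file pattern, the allowlist contents and the name `iter_records_sketch` are all assumptions. The envelope keys follow the module docstring (`producer_id`, `type`, `monotonic_ms`, `payload`).

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Hypothetical allowlist; the real record types are defined by the SUT schema.
ALLOWED_RECORD_TYPES = frozenset({"pose", "health"})


@dataclass(frozen=True)
class FdrRecord:
    producer_id: str
    monotonic_ms: int
    record_type: str
    payload: dict[str, object]


def iter_records_sketch(archive_root: Path) -> Iterator[FdrRecord]:
    """Yield records from every *.jsonl file (assumed suffix), by monotonic_ms."""
    records: list[FdrRecord] = []
    for path in sorted(archive_root.rglob("*.jsonl")):
        with path.open("rb") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                raw = json.loads(line)
                record_type = str(raw["type"])
                if record_type not in ALLOWED_RECORD_TYPES:
                    raise ValueError(f"{path}:{lineno}: unknown type {record_type!r}")
                records.append(
                    FdrRecord(
                        producer_id=str(raw["producer_id"]),
                        monotonic_ms=int(raw["monotonic_ms"]),
                        record_type=record_type,
                        payload=dict(raw["payload"]),
                    )
                )
    # Sorting in memory is the simplest way to honor the ordering contract.
    records.sort(key=lambda r: r.monotonic_ms)
    yield from records
```

Loading everything before sorting keeps the sketch short; a real archive bounded by the NFT-LIM-02 size budget may or may not make that acceptable.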
@@ -0,0 +1,77 @@
"""Replay images / video to the SUT's V4L2 file frame source.

Two replay modes:

1. Image-set replay (FT-P-01, FT-P-05) — emit a sequence of JPEG / PNG
   still images at a configurable rate to the file frame source path the
   SUT polls.
2. Video replay (FT-P-02, FT-P-04, FT-N-01..04, NFT-PERF-*) — decode an
   MP4 with OpenCV and emit frames at the encoded FPS (or a user-supplied
   rate for fast-forward).

The actual frame-source path inside the SUT container is configured via the
``ONBOARD_FRAME_SOURCE_PATH`` environment variable on the SUT — the runner
writes to a shared tmpfs volume mounted at the same path inside both
containers.

This file currently provides the public surface used by per-scenario tests;
concrete implementations land alongside their consuming test tasks
(AZ-407 onward). The intent is that `FrameSourceReplayer` is a stable API
the test specs can rely on while the underlying replay strategy is filled
in incrementally.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ReplayCadence:
    """Frame-rate / pace configuration for a replay session."""

    fps: float = 10.0
    realtime: bool = True


class FrameSink(Protocol):
    """Abstract destination for replayed frames (file path or memory queue)."""

    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None:
        ...


class FrameSourceReplayer:
    """Public surface for replaying frames into the SUT's frame-source path.

    AZ-407 (Static fixture builders) supplies the concrete still-image replay
    implementation; AZ-408 (Runtime synthetic-injection) supplies the video
    + injector variants. AZ-406 only commits to the contract.
    """

    def __init__(self, sink: FrameSink, cadence: ReplayCadence | None = None) -> None:
        self._sink = sink
        self._cadence = cadence or ReplayCadence()

    def replay_image_directory(self, directory: Path) -> int:
        """Replay every image in ``directory`` (sorted by name). Returns count emitted.

        Raises NotImplementedError until AZ-407 lands. Tests that need this
        path should mark themselves @pytest.mark.skip(reason="awaiting AZ-407")
        until then; AC-1 (smoke) does not depend on this surface.
        """
        raise NotImplementedError(
            "FrameSourceReplayer.replay_image_directory is owned by AZ-407 — "
            "AZ-406 supplies only the public surface."
        )

    def replay_video(self, video_path: Path) -> int:
        """Replay an MP4 / .h264 file frame-by-frame. Returns count emitted.

        Raises NotImplementedError until AZ-408 lands.
        """
        raise NotImplementedError(
            "FrameSourceReplayer.replay_video is owned by AZ-408 — "
            "AZ-406 supplies only the public surface."
        )
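
Until the concrete replayers land, the `FrameSink` protocol can already be exercised with an in-memory sink. The sketch below is an illustration only: `MemoryFrameSink`, `frame_timestamps_ms` and `replay_frames` are hypothetical names, and real-time pacing (sleeping between frames) is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryFrameSink:
    """Hypothetical FrameSink that records frames instead of writing files."""

    frames: list[tuple[int, bytes]] = field(default_factory=list)

    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None:
        self.frames.append((timestamp_ms, jpeg_bytes))


def frame_timestamps_ms(count: int, fps: float, start_ms: int = 0) -> list[int]:
    """Evenly spaced timestamps for `count` frames at `fps`, rounded to ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Compute each stamp from the index to avoid accumulating rounding error.
    return [start_ms + round(i * 1000.0 / fps) for i in range(count)]


def replay_frames(frames: list[bytes], sink: MemoryFrameSink, fps: float = 10.0) -> int:
    """Push already-encoded frames into the sink; returns the count emitted."""
    for payload, ts in zip(frames, frame_timestamps_ms(len(frames), fps)):
        sink.write_frame(payload, ts)
    return len(frames)
```

Deriving every timestamp from its frame index, rather than adding a rounded period repeatedly, keeps long replays from drifting.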
@@ -0,0 +1,54 @@
"""WGS84 geodesic helpers — Vincenty distance + bearing for accuracy assertions.

Wraps `pyproj.Geod` (WGS84 ellipsoid) for the few operations the blackbox
tests need. Note that `pyproj.Geod` solves the geodesic problem with
Karney's algorithm rather than Vincenty's iteration; "Vincenty" here is
shorthand for ellipsoidal geodesic distance. Kept deliberately small —
broader geo math (UTM, MGRS, datum conversions) is NOT in scope for the
e2e harness.

All inputs are degrees lat / lon (WGS84); all distances are meters.
"""

from __future__ import annotations

import math
from dataclasses import dataclass

from pyproj import Geod

_WGS84 = Geod(ellps="WGS84")


@dataclass(frozen=True)
class GeodeticDelta:
    """Bearing + distance + back-bearing between two WGS84 points."""

    distance_m: float
    forward_bearing_deg: float
    reverse_bearing_deg: float


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Ellipsoidal geodesic distance in meters between two WGS84 points.

    Raises ValueError on NaN inputs (defensive — silent NaN propagation in
    a test assertion is the kind of bug this helper exists to prevent).
    """
    for name, value in (("lat1", lat1), ("lon1", lon1), ("lat2", lat2), ("lon2", lon2)):
        if math.isnan(value):
            raise ValueError(f"distance_m: {name} is NaN")
    # Geod.inv takes (lon, lat) order and returns (fwd_az, back_az, distance).
    _, _, d = _WGS84.inv(lon1, lat1, lon2, lat2)
    return float(d)


def delta(lat1: float, lon1: float, lat2: float, lon2: float) -> GeodeticDelta:
    """Full geodetic delta: distance + forward/reverse bearings."""
    fwd_az, rev_az, d = _WGS84.inv(lon1, lat1, lon2, lat2)
    return GeodeticDelta(
        distance_m=float(d),
        forward_bearing_deg=float(fwd_az),
        reverse_bearing_deg=float(rev_az),
    )


def offset(lat: float, lon: float, bearing_deg: float, distance_m: float) -> tuple[float, float]:
    """Project ``(lat, lon)`` by ``distance_m`` along ``bearing_deg`` (degrees CW from north)."""
    new_lon, new_lat, _ = _WGS84.fwd(lon, lat, bearing_deg, distance_m)
    return float(new_lat), float(new_lon)
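
A spherical haversine distance is a cheap, dependency-free cross-check for the ellipsoidal helpers above. To be clear, this is a swapped-in approximation and not what `geo.py` does: it assumes a spherical Earth with a mean radius of 6,371,008.8 m and can differ from the ellipsoidal result by up to roughly half a percent, so it is only suitable as a coarse sanity bound. The name `haversine_m` is hypothetical.

```python
from __future__ import annotations

import math

# Mean Earth radius in meters (spherical approximation, assumed for this sketch).
_MEAN_EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere; inputs in degrees."""
    for name, value in (("lat1", lat1), ("lon1", lon1), ("lat2", lat2), ("lon2", lon2)):
        if math.isnan(value):
            raise ValueError(f"haversine_m: {name} is NaN")
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against floating-point overshoot just above 1.0.
    return 2 * _MEAN_EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
```

One degree of latitude along a meridian comes out at about 111.195 km under this model.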
@@ -0,0 +1,53 @@
"""Replay `data_imu.csv` to the FC inbound at 10 Hz.

CSV schema (from `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`):

    timestamp_ms,ax,ay,az,gx,gy,gz,roll_deg,pitch_deg,yaw_deg,baro_m

Owned by AZ-406 (public surface) + AZ-407 (concrete file-driver
implementation). This module commits to the type signatures the
per-scenario tests will import; the actual MAVLink / MSP2 emission is
wired up by the downstream task.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ImuSample:
    """One row of `data_imu.csv` after parsing into native units."""

    timestamp_ms: int
    accel_mss: tuple[float, float, float]
    gyro_rps: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]  # roll, pitch, yaw (radians)
    baro_alt_m: float


class FcInboundEmitter(Protocol):
    """Abstract emitter — concrete impls are MAVLink (AP) or MSP2 (iNav)."""

    def emit(self, sample: ImuSample) -> None:
        ...


class ImuReplayer:
    """Drives an `FcInboundEmitter` from a CSV file at the recorded cadence."""

    def __init__(self, emitter: FcInboundEmitter, rate_hz: float = 10.0) -> None:
        self._emitter = emitter
        self._rate_hz = rate_hz

    def replay(self, csv_path: Path) -> int:
        """Replay the CSV file. Returns the number of samples emitted.

        Concrete implementation is owned by AZ-407 (FT-P-02 derkachi-drift
        + FT-P-04 frame-to-frame registration are the first consumers).
        """
        raise NotImplementedError(
            "ImuReplayer.replay is owned by AZ-407 — AZ-406 supplies only "
            "the public surface."
        )
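
The mapping from one CSV row to an `ImuSample` could be sketched as follows. The column names come from the schema in the module docstring; the assumption that `ax..az` are already in m/s² and `gx..gz` already in rad/s is mine, as is the helper name `parse_imu_csv`. Only the attitude columns are converted, from degrees to radians.

```python
from __future__ import annotations

import csv
import io
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImuSample:
    timestamp_ms: int
    accel_mss: tuple[float, float, float]
    gyro_rps: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]  # roll, pitch, yaw (radians)
    baro_alt_m: float


def parse_imu_csv(text: str) -> list[ImuSample]:
    """Parse CSV text with the data_imu.csv header into ImuSample rows."""
    samples: list[ImuSample] = []
    for row in csv.DictReader(io.StringIO(text)):
        samples.append(
            ImuSample(
                timestamp_ms=int(row["timestamp_ms"]),
                # Assumed to be in m/s^2 and rad/s already (not confirmed by the source).
                accel_mss=(float(row["ax"]), float(row["ay"]), float(row["az"])),
                gyro_rps=(float(row["gx"]), float(row["gy"]), float(row["gz"])),
                # The CSV stores attitude in degrees; ImuSample wants radians.
                attitude_rad=(
                    math.radians(float(row["roll_deg"])),
                    math.radians(float(row["pitch_deg"])),
                    math.radians(float(row["yaw_deg"])),
                ),
                baro_alt_m=float(row["baro_m"]),
            )
        )
    return samples
```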
@@ -0,0 +1,48 @@
"""Parse `.tlog` files emitted by `mavproxy-listener`.

`.tlog` is the MAVLink telemetry-log format written by ground stations such
as MAVProxy: each message is an 8-byte big-endian microsecond Unix timestamp
followed by the wire bytes of the MAVLink frame. pymavlink ships
`mavlogfile`, which knows how to iterate this.

This module exposes a small typed wrapper so per-scenario tests can:

1. Filter for the message types they care about.
2. Compute summary statistics (count per type, message-rate Hz, ratio
   of signed vs unsigned messages for NFT-SEC-03).
3. Attach the source `.tlog` path to the evidence bundler.

Concrete iteration logic is owned by AZ-416 (FT-P-09-AP); AZ-406 commits
to the public surface.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class TlogMessage:
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, object]


def iter_messages(tlog_path: Path) -> Iterator[TlogMessage]:
    """Iterate `.tlog` messages oldest-first.

    AZ-406 raises until AZ-416 fills in the pymavlink-backed iterator.
    """
    raise NotImplementedError(
        "mavproxy_tlog_reader.iter_messages is owned by AZ-416 — "
        "AZ-406 supplies only the public surface."
    )


def count_by_type(tlog_path: Path) -> dict[str, int]:
    """Return ``{msg_type: count}`` for every distinct message type.

    Built on `iter_messages`, so it also raises NotImplementedError until
    AZ-416 lands.
    """
    counts: dict[str, int] = {}
    for msg in iter_messages(tlog_path):
        counts[msg.msg_type] = counts.get(msg.msg_type, 0) + 1
    return counts
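
The summary statistics listed in the module docstring (count per type, message rate, signed ratio) do not depend on how messages are read, so they can be sketched against any iterable of `TlogMessage`. The function name `summarize` and the shape of the returned dict are hypothetical; the rate is computed as intervals over elapsed time, which is one reasonable definition among several.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TlogMessage:
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, object]


def summarize(messages: Iterable[TlogMessage]) -> dict[str, object]:
    """Count per type, overall rate in Hz, and the fraction of signed messages."""
    counts: Counter[str] = Counter()
    signed = 0
    first_us: int | None = None
    last_us = 0
    for msg in messages:
        counts[msg.msg_type] += 1
        signed += int(msg.signed)
        if first_us is None:
            first_us = msg.timestamp_us
        last_us = msg.timestamp_us
    total = sum(counts.values())
    duration_s = (last_us - first_us) / 1e6 if first_us is not None else 0.0
    return {
        "counts": dict(counts),
        # N messages span N-1 intervals; report 0.0 when no interval exists.
        "rate_hz": (total - 1) / duration_s if duration_s > 0 else 0.0,
        "signed_ratio": signed / total if total else 0.0,
    }
```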
@@ -0,0 +1,59 @@
"""ArduPilot Plane / iNav SITL state-read observers.

Reads what the SUT delivered to the FC over its external-positioning
interface, without ever bypassing the FC's own acceptance path. This is
the only legal way for blackbox tests to assert AC-4.3 (FC output contract):
every assertion goes through the SITL's state machine.

Public surface only; concrete pymavlink / yamspy / msp_gps_toy subprocess
plumbing is owned by AZ-416 (FT-P-09-AP) and AZ-417 (FT-P-09-iNav).
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Protocol

FcKind = Literal["ardupilot", "inav"]


@dataclass(frozen=True)
class FcGpsState:
    """The subset of FC state the e2e tests assert against.

    AP: assembled from EKF source-set + GLOBAL_POSITION_INT replay-back.
    iNav: assembled from MSP2 GPS-provider state + getRawGPS query.
    """

    primary_source: str  # "MAV" (AP gps_type=14) or "MSP" (iNav)
    last_position_lat_deg: float
    last_position_lon_deg: float
    last_position_alt_m: float
    fix_quality: int  # 0..6 per NMEA convention
    horizontal_accuracy_m: float
    last_update_age_ms: int


class FcSitlObserver(Protocol):
    """Common observer protocol — implemented by `ArduPilotObserver` + `InavObserver`."""

    fc_kind: FcKind

    def read_gps_state(self) -> FcGpsState:
        ...

    def read_parameter(self, name: str) -> float | int | str | None:
        ...


def get_observer(fc_kind: FcKind, host: str) -> FcSitlObserver:
    """Factory — returns the matching observer for the requested FC.

    AZ-416/417 own the concrete return types. AZ-406 raises until those
    tasks land so test authors can plumb the observer through their
    fixtures without yet running them.
    """
    raise NotImplementedError(
        f"sitl_observer.get_observer({fc_kind=}, {host=}) is owned by "
        "AZ-416 (AP) / AZ-417 (iNav) — AZ-406 supplies only the contract."
    )
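
While `get_observer` still raises, test authors can plumb fixtures against a stand-in that satisfies the same protocol. The sketch below is one way to do that; `FakeObserver`, `assert_fresh_fix`, the parameter name `GPS_TYPE` and all sample values are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FcGpsState:
    primary_source: str
    last_position_lat_deg: float
    last_position_lon_deg: float
    last_position_alt_m: float
    fix_quality: int
    horizontal_accuracy_m: float
    last_update_age_ms: int


@dataclass
class FakeObserver:
    """Hypothetical in-memory stand-in with the FcSitlObserver shape."""

    fc_kind: str
    state: FcGpsState
    params: dict[str, float | int | str] = field(default_factory=dict)

    def read_gps_state(self) -> FcGpsState:
        return self.state

    def read_parameter(self, name: str) -> float | int | str | None:
        return self.params.get(name)


def assert_fresh_fix(observer: FakeObserver, max_age_ms: int) -> None:
    """Example assertion helper: the FC must hold a recent, valid fix."""
    state = observer.read_gps_state()
    assert state.fix_quality > 0, "FC reports no fix"
    assert state.last_update_age_ms <= max_age_ms, "FC position is stale"


# Illustrative values only.
FAKE_AP = FakeObserver(
    fc_kind="ardupilot",
    state=FcGpsState("MAV", 50.0, 36.0, 120.0, 3, 2.5, 80),
    params={"GPS_TYPE": 14},
)
```

Because the protocol is structural, the fake needs no inheritance; it only has to expose `fc_kind`, `read_gps_state` and `read_parameter`.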
@@ -0,0 +1,12 @@
[pytest]
minversion = 8.0
addopts = -ra --strict-markers --timeout=300
markers =
    tier2_only: scenario only valid on Tier-2 Jetson hardware (SKIP on tier1-docker)
    chamber_only: scenario requires the thermal chamber rig (SKIP unless --enable-chamber)
    research_build_only: scenario only valid on a research build (SKIP when vio_strategy=vins_mono is selected on production matrix)
    deferred_ac: scenario maps to an AC marked NOT COVERED / PARTIAL in the traceability matrix; emits SKIP or XFAIL with the matrix-mapped reason
    traces_to(ids): comma-separated AC/RESTRICT IDs the test exercises (consumed by csv_reporter for the `traces_to` column)
    smoke: minimal verification that the harness boots end-to-end
filterwarnings =
    ignore::DeprecationWarning:pymavlink.*
@@ -0,0 +1,7 @@
"""CSV reporter + evidence bundler — pytest plugins registered by the runner conftest.
`csv_reporter` overrides the upstream pytest-csv default columns with the
exact column set declared in `_docs/02_document/tests/environment.md` §
Reporting; `evidence_bundler` collects per-run `.tlog`, FDR archives,
screenshots, profiler traces, tegrastats / jtop CSVs into a single bundle.
"""
@@ -0,0 +1,254 @@
"""CSV reporter pytest plugin.

Emits one row per test with the exact columns declared in
``_docs/02_document/tests/environment.md`` § Reporting:

    test_id, test_name, traces_to, fc_adapter, vio_strategy, tier,
    started_at_utc, execution_time_ms, result, error_message, evidence_paths

Why a custom plugin rather than `pytest-csv` defaults?

- `pytest-csv` is dependency-installed for its column-extension hooks, but
  its default emission is `name`/`status`/`duration` — our matrix needs the
  `traces_to`, `fc_adapter`, `vio_strategy`, `tier`, `started_at_utc`,
  `evidence_paths` columns to feed the downstream badge generator and
  regression detector.

Result classification per AC-9:

- PASS / FAIL / SKIP map 1:1 to pytest's own outcome.
- XFAIL is emitted when the test was marked `deferred_ac(verdict="xfail",
  reason=...)` and the body raised (the standard pytest XFAIL path).

The plugin is unit-tested in ``e2e/_unit_tests/reporting/test_csv_reporter.py``.
"""

from __future__ import annotations

import csv
import os
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

import pytest

UTC = timezone.utc

CSV_COLUMNS: tuple[str, ...] = (
    "test_id",
    "test_name",
    "traces_to",
    "fc_adapter",
    "vio_strategy",
    "tier",
    "started_at_utc",
    "execution_time_ms",
    "result",
    "error_message",
    "evidence_paths",
)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _parametrize_value(item: pytest.Item, name: str, default: str = "n/a") -> str:
    cs = getattr(item, "callspec", None)
    if cs is None:
        return default
    return str(cs.params.get(name, default))


def _traces_to(item: pytest.Item) -> str:
    marker = item.get_closest_marker("traces_to")
    if marker is None:
        return ""
    ids = marker.args[0] if marker.args else marker.kwargs.get("ids", "")
    if isinstance(ids, (list, tuple, set)):
        return ",".join(str(i) for i in ids)
    return str(ids)


def _test_id(item: pytest.Item) -> str:
    """Stable test id for the CSV `test_id` column.

    Prefers an explicit ``@pytest.mark.test_id("FT-P-01")`` if set, otherwise
    falls back to pytest's nodeid which is unique per parametrize variant.
    """
    marker = item.get_closest_marker("test_id")
    if marker is not None and marker.args:
        return str(marker.args[0])
    return item.nodeid


def _is_deferred_xfail(item: pytest.Item) -> bool:
    """True when the item carries ``deferred_ac(verdict="xfail", ...)``."""
    deferred = item.get_closest_marker("deferred_ac")
    if deferred is None:
        return False
    # Case-insensitive, matching the conftest, which lower-cases the verdict.
    return str(deferred.kwargs.get("verdict", "skip")).lower() == "xfail"


def _outcome_to_result(report: pytest.TestReport, item: pytest.Item) -> str:
    if report.outcome == "passed":
        # A deferred-xfail test whose body passes (pytest's XPASS under
        # strict=False) is still reported as XFAIL.
        if report.when == "call" and _is_deferred_xfail(item):
            return "XFAIL"
        return "PASS"
    if report.outcome == "failed":
        return "FAIL"
    if report.outcome == "skipped":
        # pytest reports an expected failure as a "skipped" call-phase outcome.
        if report.when == "call" and _is_deferred_xfail(item):
            return "XFAIL"
        return "SKIP"
    # Unknown outcome — should never happen with stock pytest, but emit a
    # visible FAIL rather than swallow it silently.
    return f"FAIL ({report.outcome})"

# ---------------------------------------------------------------------------
# Row builder (exposed for unit tests)
# ---------------------------------------------------------------------------


def build_row(
    item: pytest.Item,
    report: pytest.TestReport,
    started_at_utc: str,
    execution_time_ms: int,
    evidence_paths: list[str] | None = None,
) -> dict[str, str]:
    """Build the CSV row for a finished test.

    Public function — unit-tested directly without spinning a pytest run.
    """
    result = _outcome_to_result(report, item)
    error_message = ""
    if report.outcome == "failed":
        # `longreprtext` is the canonical pytest rendering of the traceback;
        # we collapse it to a single line for CSV friendliness and truncate
        # to keep the row from blowing past a reasonable limit.
        raw = report.longreprtext or repr(getattr(report, "longrepr", ""))
        error_message = raw.replace("\n", " | ")[:2000]
    elif report.outcome == "skipped":
        # `longrepr` on a skip is a 3-tuple (file, lineno, reason).
        if isinstance(report.longrepr, tuple) and len(report.longrepr) == 3:
            error_message = str(report.longrepr[2])
        else:
            error_message = str(getattr(report, "longrepr", ""))[:2000]
    return {
        "test_id": _test_id(item),
        "test_name": item.name,
        "traces_to": _traces_to(item),
        "fc_adapter": _parametrize_value(item, "fc_adapter"),
        "vio_strategy": _parametrize_value(item, "vio_strategy"),
        "tier": os.environ.get("TIER", "tier1-docker"),
        "started_at_utc": started_at_utc,
        "execution_time_ms": str(execution_time_ms),
        "result": result,
        "error_message": error_message,
        "evidence_paths": ",".join(evidence_paths or []),
    }


# ---------------------------------------------------------------------------
# Plugin hooks
# ---------------------------------------------------------------------------


class _CsvReporter:
    def __init__(self, output_path: Path) -> None:
        self._path = output_path
        self._path.parent.mkdir(parents=True, exist_ok=True)
        # Per-item start times, recorded at logstart, so the reported duration
        # is setup+call wall-clock rather than the call phase alone (which
        # would omit any boundary-fixture setup cost).
        self._start_times: dict[str, tuple[float, str]] = {}
        self._evidence: dict[str, list[str]] = {}
        self._rows: list[dict[str, str]] = []

    # --- lifecycle hooks ---

    def pytest_runtest_logstart(self, nodeid: str, location: Any) -> None:  # noqa: ARG002 (pytest hook signature)
        self._start_times[nodeid] = (
            time.monotonic(),
            datetime.now(UTC).isoformat(timespec="seconds"),
        )

    def pytest_runtest_logreport(self, report: pytest.TestReport) -> None:
        # We emit one row per item, taken from the `call` phase. Setup-phase
        # skips and failures (e.g. `pytest.skip()` or an error inside a
        # fixture) never reach a `call` phase, so for those we use the
        # `setup` phase report instead; otherwise the test would have no row.
        item = getattr(report, "_item", None)  # populated by pytest_runtest_makereport below
        if item is None:
            return
        setup_without_call = report.when == "setup" and report.outcome in ("skipped", "failed")
        if report.when == "call" or setup_without_call:
            fallback = (time.monotonic(), datetime.now(UTC).isoformat(timespec="seconds"))
            start_mono, start_iso = self._start_times.get(report.nodeid, fallback)
            elapsed_ms = int((time.monotonic() - start_mono) * 1000)
            evidence = self._evidence.get(report.nodeid, [])
            row = build_row(item, report, start_iso, elapsed_ms, evidence)
            self._rows.append(row)

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_makereport(self, item: pytest.Item, call: pytest.CallInfo[Any]) -> Any:  # noqa: ARG002
        # Tag each report with its originating item so logreport above can
        # read parametrize ids / markers without a global lookup. Wrapping
        # the makereport hook avoids monkeypatching `config.hook`, which is
        # not guaranteed to be the proxy that `item.ihook` resolves to.
        outcome = yield
        report = outcome.get_result()
        report._item = item  # noqa: SLF001 (intentional plugin attribute)

    def pytest_sessionfinish(self, session: pytest.Session, exitstatus: int) -> None:  # noqa: ARG002
        with self._path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
            writer.writeheader()
            writer.writerows(self._rows)

    # --- public surface for the evidence_bundler plugin to attach paths ---

    def attach_evidence(self, nodeid: str, evidence_path: str) -> None:
        self._evidence.setdefault(nodeid, []).append(evidence_path)


_REPORTER_KEY = pytest.StashKey["_CsvReporter | None"]()


def pytest_addoption(parser: pytest.Parser) -> None:
    group = parser.getgroup("e2e-runner", "Blackbox e2e harness options")
    group.addoption(
        "--csv",
        action="store",
        default=None,
        help="Path to the CSV report (one row per test). Default off; set to enable.",
    )
    group.addoption(
        "--csv-columns",
        action="store",
        default=",".join(CSV_COLUMNS),
        help="Comma-separated column order. Default = environment.md § Reporting.",
    )


def pytest_configure(config: pytest.Config) -> None:
    config.stash[_REPORTER_KEY] = None
    csv_path = config.getoption("--csv")
    if csv_path:
        reporter = _CsvReporter(Path(csv_path))
        config.stash[_REPORTER_KEY] = reporter
        config.pluginmanager.register(reporter, name="e2e-csv-reporter")
    # `traces_to` and `test_id` are pytest markers; register them
    # unconditionally so --strict-markers doesn't error on first use.
    config.addinivalue_line(
        "markers", "traces_to(ids): comma-separated AC/RESTRICT IDs the test exercises"
    )
    config.addinivalue_line(
        "markers", "test_id(name): override the test_id column (default = pytest nodeid)"
    )


def reporter_for(config: pytest.Config) -> _CsvReporter | None:
    """Public accessor, used by `evidence_bundler` to attach evidence paths."""
    return config.stash.get(_REPORTER_KEY, None)
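
Reviewer note: to sanity-check the row format outside a pytest run, the following is a minimal standalone sketch of what `build_row` and `pytest_sessionfinish` produce together. It assumes the 11-column order listed in the commit message (the real `CSV_COLUMNS` constant is defined outside this hunk); `collapse_error`, `write_rows`, and every sample value are hypothetical stand-ins rather than parts of the plugin.

```python
import csv
import io

# Assumed column order, copied from the commit message; the plugin's real
# CSV_COLUMNS constant is defined outside this hunk.
CSV_COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
)


def collapse_error(raw: str, limit: int = 2000) -> str:
    """Hypothetical helper mirroring build_row: one line, bounded length."""
    return raw.replace("\n", " | ")[:limit]


def write_rows(rows: list[dict[str, str]]) -> str:
    """Serialize rows with csv.DictWriter, as the reporter does, to a string."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(CSV_COLUMNS))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative values only; none of these come from a real run.
row = {
    "test_id": "FT-P-01",
    "test_name": "test_example[ardupilot]",
    "traces_to": "AC-9",
    "fc_adapter": "ardupilot",
    "vio_strategy": "",
    "tier": "tier1-docker",
    "started_at_utc": "2026-01-01T00:00:00+00:00",
    "execution_time_ms": "1234",
    "result": "FAIL",
    "error_message": collapse_error("AssertionError\nassert 1 == 2"),
    "evidence_paths": ",".join(["a.tlog", "b.csv"]),
}

text = write_rows([row])
parsed = list(csv.DictReader(io.StringIO(text, newline="")))
print(parsed[0]["error_message"])
print(parsed[0]["evidence_paths"])
```

The comma-joined `evidence_paths` cell survives the round trip because `csv.DictWriter` quotes any field containing the delimiter, so consumers should split that column only after CSV parsing.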
@@ -0,0 +1,84 @@
"""Evidence bundler pytest plugin.

For each test, collects supporting artifacts (`.tlog`, FDR archive snapshots,
screenshots, profiler traces, tegrastats / jtop CSVs) into a per-run bundle
at ``--evidence-out`` (default ``/e2e-results/<run-id>/evidence/``) and
records the resulting paths in the CSV reporter's ``evidence_paths`` column.

The bundler is INERT by default: tests opt in by calling the
``attach_evidence`` fixture with a file path. The runner conftest registers
this plugin via `pytest_plugins`.
"""

from __future__ import annotations

import shutil
from collections.abc import Callable
from pathlib import Path

import pytest

from .csv_reporter import reporter_for


def _safe_relpath(target: Path, base: Path) -> str:
    try:
        return str(target.relative_to(base))
    except ValueError:
        # If the target isn't under base, we still record its absolute path.
        # The bundle copy in attach_evidence makes the absolute fallback
        # robust to arbitrary source locations (e.g. /tlogs/<run>.tlog).
        return str(target)


@pytest.fixture
def attach_evidence(
    request: pytest.FixtureRequest,
    evidence_dir: Path,
) -> Callable[[str | Path], str]:
    """Copy a file into the run evidence bundle and record its CSV path.

    Returns a callable ``attach(path) -> str``. The test invokes it after
    capturing an artifact (e.g., the .tlog file or an FDR snapshot). The
    returned string is the path that will appear in the CSV
    ``evidence_paths`` column.

    The implementation copies the file (rather than moving it) so the same
    artifact can be referenced by multiple tests if needed.
    """
    nodeid = request.node.nodeid
    config = request.config
    reporter = reporter_for(config)
    bundle_root = evidence_dir / _slug(nodeid)
    bundle_root.mkdir(parents=True, exist_ok=True)

    def _attach(path: str | Path) -> str:
        src = Path(path)
        if not src.exists():
            raise FileNotFoundError(f"attach_evidence: {src} not found")
        dst = bundle_root / src.name
        # If a test attaches the same name twice in one run, disambiguate.
        if dst.exists():
            stem, suffix = src.stem, src.suffix
            counter = 1
            while dst.exists():
                dst = bundle_root / f"{stem}__{counter}{suffix}"
                counter += 1
        shutil.copy2(src, dst)
        rel = _safe_relpath(dst, evidence_dir.parent)
        if reporter is not None:
            reporter.attach_evidence(nodeid, rel)
        return rel

    return _attach
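

def _slug(nodeid: str) -> str:
    """Filesystem-friendly slug for the nodeid.

    Replaces path separators, ``::`` and ``[`` with underscores and drops
    ``]`` and spaces. The mapping is not injective (``a/b`` and ``a_b`` yield
    the same slug), so uniqueness relies on nodeids not differing only in
    those characters.
    """
    return (
        nodeid.replace("/", "_")
        .replace("::", "__")
        .replace("[", "_")
        .replace("]", "")
        .replace(" ", "")
    )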
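
Reviewer note: the slug and name-clash behaviour described above can be exercised without pytest. The sketch below repeats the same replacement chain and the same `__N` suffix loop in standalone form; `slug`, `copy_unique`, the nodeid, and the file name are hypothetical names chosen for illustration, and the real fixture additionally records each path with the CSV reporter.

```python
import shutil
import tempfile
from pathlib import Path


def slug(nodeid: str) -> str:
    """Same replacement chain as `_slug` in the plugin."""
    return (
        nodeid.replace("/", "_")
        .replace("::", "__")
        .replace("[", "_")
        .replace("]", "")
        .replace(" ", "")
    )


def copy_unique(src: Path, bundle_root: Path) -> Path:
    """Copy src into bundle_root, appending __N to the stem on a name clash."""
    dst = bundle_root / src.name
    counter = 1
    while dst.exists():
        dst = bundle_root / f"{src.stem}__{counter}{src.suffix}"
        counter += 1
    shutil.copy2(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "flight.tlog"
    src.write_bytes(b"\x00")
    # Hypothetical nodeid, used only for illustration.
    bundle = root / "evidence" / slug("tests/test_nav.py::test_fix[ardupilot]")
    bundle.mkdir(parents=True)
    # Attaching the same source three times yields three distinct copies.
    copied = [copy_unique(src, bundle).name for _ in range(3)]
    bundle_name = bundle.name

print(bundle_name)
print(copied)
```

Because the original is copied rather than moved, the source file stays in place and each later attachment of the same name gets the next free counter.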
@@ -0,0 +1,36 @@
# e2e-runner image dependencies.
#
# Pin reasoning:
# - `opencv-python>=4.12.0` honors D-CROSS-CVE-1 (the runner image does NOT
# depend on gtsam — the numpy<2 ABI block that forces the SUT pin does not
# apply here; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md).
# - Versions match the SUT pyproject where feasible (numpy 1.x line, pyproj 3.6+, pydantic 2.x).
# - pytest 8.x is the stable line; pytest-csv 3.x supplies the columns the CSV reporter plugin extends.
pytest>=8.0,<9.0
pytest-timeout>=2.2,<3.0
pytest-xdist>=3.5,<4.0
pytest-forked>=1.6,<2.0
pytest-csv>=3.0,<4.0
# MAVLink ground side — used for both AP signing-handshake assertions and the
# passive listener that consumes mavproxy-listener's forwarded UDP stream.
pymavlink>=2.4
# Geodesic + frame replay + numerical assertion stack.
opencv-python>=4.12.0
numpy>=1.26,<2.0
scipy>=1.11,<2.0
geopy>=2.4,<3.0
pyproj>=3.6,<4.0
# HTTP client for talking to mock-suite-sat-service.
httpx>=0.28,<1.0
pyyaml>=6.0
pydantic>=2.5,<3.0
# Structured logging in the runner side (mirrors the SUT logger choice).
structlog>=24.1
# FDR archive reader uses orjson for the line-delimited JSON record format.
orjson>=3.9,<4.0
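
Reviewer note: the commit message states that the runner image deliberately does not install the SUT package. One way to keep that property from regressing is a guard test inside the runner image; the sketch below is only a suggestion. It assumes the SUT's import name is `gps_denied_onboard` (inferred from the `src/gps_denied_onboard/**` path), and both function names are hypothetical.

```python
import importlib.util

# Assumed import name, inferred from the src/gps_denied_onboard/** path.
SUT_PACKAGE = "gps_denied_onboard"


def sut_is_importable(name: str = SUT_PACKAGE) -> bool:
    """Return True if a top-level package of this name can be located."""
    return importlib.util.find_spec(name) is not None


def test_runner_image_has_no_sut_package() -> None:
    # Hypothetical guard: fails if a later requirements change pulls the
    # SUT package into the blackbox runner image.
    assert not sut_is_importable(), f"{SUT_PACKAGE} must not be installed in the runner"
```

`importlib.util.find_spec` returns `None` for a missing top-level package without importing anything, so the check has no side effects on the runner process.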