[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
  + mock Suite Sat Service + mavproxy listener + e2e-runner onto one
  e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
  NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
  compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
  while the SUT runs natively on the Jetson under systemd.
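The AC-5 runtime probe (listed under AC coverage) only needs to show that nothing on the `internal: true` bridge can open an off-bridge connection. Below is a minimal sketch, assuming a plain TCP connect attempt is an adequate check. `egress_blocked` is a hypothetical helper, not the harness's actual probe.

```python
import socket


def egress_blocked(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) cannot be established.

    Intended to run inside a container attached to the internal-only
    e2e-net bridge, where every off-bridge destination should be unreachable.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return False  # handshake succeeded: egress is NOT isolated
    except OSError:
        # Covers DNS failure (socket.gaierror), connection refused,
        # unreachable network and timeouts; all are OSError subclasses.
        return True
```

Inside the e2e-runner container, the probe would assert `egress_blocked(...)` for a handful of public endpoints. On a workstation with normal connectivity the same calls return `False`, which is what makes the check meaningful.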

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
  jtop parsers feed per-sample telemetry into the evidence bundle.
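The tegrastats parser turns each output line into a per-sample record for the evidence bundle. Here is a minimal sketch, assuming the common `RAM used/totalMB ... CPU [load%@freq,...] ... GR3D_FREQ n%` layout. The exact field set varies between JetPack releases, and `TegrastatsSample` / `parse_tegrastats_line` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

_RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
_CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
_CORE_RE = re.compile(r"\s*(\d+)%")
_GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")


@dataclass(frozen=True)
class TegrastatsSample:
    ram_used_mb: int
    ram_total_mb: int
    cpu_load_pct: tuple[int, ...]  # one entry per online core
    gpu_load_pct: int | None       # None when GR3D_FREQ is absent


def parse_tegrastats_line(line: str) -> TegrastatsSample:
    """Parse one tegrastats output line into a per-sample record."""
    ram = _RAM_RE.search(line)
    cpu = _CPU_RE.search(line)
    if ram is None or cpu is None:
        raise ValueError(f"unrecognised tegrastats line: {line!r}")
    # Cores look like "12%@1190"; offline cores are reported as "off" and skipped.
    loads = tuple(
        int(m.group(1))
        for m in (_CORE_RE.match(core) for core in cpu.group(1).split(","))
        if m is not None
    )
    gpu = _GPU_RE.search(line)
    return TegrastatsSample(
        ram_used_mb=int(ram.group(1)),
        ram_total_mb=int(ram.group(2)),
        cpu_load_pct=loads,
        gpu_load_pct=int(gpu.group(1)) if gpu else None,
    )
```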

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x). The runner deliberately
  does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md
  parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
  test with the exact 11-column schema from environment.md §
  Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths). An XFAIL result is reported only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records relative
  paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
  (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
  AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
  complete in this commit (no downstream task dependency): WGS84
  distance / forward-bearing / offset via pyproj, with NaN rejection.
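The CSV reporter's contract is that every row carries exactly the 11 columns from environment.md § Reporting, in that order. Below is a minimal sketch of the row-rendering step, assuming `evidence_paths` arrives as a list of bundle-relative paths joined with `;` (a hypothetical separator choice). `render_report` is not the plugin's actual API.

```python
import csv
import io

# Column order as listed in environment.md § Reporting.
CSV_COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
)


def render_report(rows: list[dict[str, object]]) -> str:
    """Render result rows as CSV text with exactly the 11 reporting columns."""
    buf = io.StringIO()
    # extrasaction="raise" rejects rows that carry unexpected keys.
    writer = csv.DictWriter(
        buf, fieldnames=CSV_COLUMNS, extrasaction="raise", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        missing = set(CSV_COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row missing columns: {sorted(missing)}")
        out = dict(row)
        # Evidence paths recorded by attach_evidence, flattened into one cell.
        out["evidence_paths"] = ";".join(str(p) for p in row["evidence_paths"])
        writer.writerow(out)
    return buf.getvalue()
```

Failing loudly on both missing and extra keys keeps the schema check in one place, which is what the AC-4 plugin lifecycle test relies on.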

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST
  /mock/config (force-status, response delay), POST /mock/reset
  (clears audit between tests), GET /mock/health.
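The control endpoints imply a small amount of per-run state behind the mock. A hypothetical in-memory model of that state is shown below; it is not the FastAPI app itself. The 201 default status, and the assumption that reset also restores the default config, are illustrative choices.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class MockSatState:
    """Hypothetical in-memory state behind the mock's control endpoints."""

    forced_status: int | None = None  # set via POST /mock/config (force-status)
    response_delay_s: float = 0.0     # set via POST /mock/config (response delay)
    audit: list[dict[str, object]] = field(default_factory=list)

    def ingest_tile(self, tile: dict[str, object]) -> int:
        """Model POST /tiles: audit the request, then return an HTTP status."""
        self.audit.append(dict(tile))  # audited even when a failure is forced
        if self.response_delay_s > 0:
            time.sleep(self.response_delay_s)
        return self.forced_status if self.forced_status is not None else 201

    def configure(self, *, force_status: int | None = None, delay_s: float = 0.0) -> None:
        """Model POST /mock/config."""
        self.forced_status = force_status
        self.response_delay_s = delay_s

    def reset(self) -> None:
        """Model POST /mock/reset: clear the audit (and, assumed here, the config)."""
        self.forced_status = None
        self.response_delay_s = 0.0
        self.audit.clear()
```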

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
  fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
  fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke test; it
  runs inside the e2e-runner image once Docker brings the stack up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercise the harness internals (CSV reporter plugin lifecycle,
  conftest skip rules, helper modules, parsers, mock app, compose
  YAML structural contract, public-boundary enforcement) without
  Docker / SITL. 97 unit tests, all passing.
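Public-boundary enforcement is delegated to `e2e/_unit_tests/test_no_sut_imports.py`, whose body is not part of this excerpt. One plausible approach is a static AST scan. The sketch below assumes that approach; `find_sut_imports` is a hypothetical name, and dynamic imports such as `importlib.import_module("...")` would not be caught.

```python
from __future__ import annotations

import ast
from pathlib import Path

FORBIDDEN_ROOT = "gps_denied_onboard"


def find_sut_imports(root: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line, module) for every static import of the SUT package."""
    hits: list[tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                modules = [node.module]  # relative imports (level > 0) are skipped
            else:
                continue
            for mod in modules:
                if mod == FORBIDDEN_ROOT or mod.startswith(FORBIDDEN_ROOT + "."):
                    hits.append((path, node.lineno, mod))
    return hits
```

A unit test would then assert `find_sut_imports(Path("e2e")) == []`.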

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot)         → compose YAML test + directory layout
                                + smoke test (Docker-bound)
- AC-2 (mock services)       → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
                                deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)         → in-process plugin lifecycle test
                                emits the exact 11-column schema
- AC-5 (egress isolation)    → static config test + runtime probe
                                in Docker-bound smoke
- AC-6 (Tier-2 contract)     → tegrastats + jtop parser unit tests
                                + jetson/* layout test; full Tier-2
                                contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)  → vins_mono skip-rule cases +
                                tests/positive/test_smoke
- AC-9 (skip semantics)      → 9 conftest skip-rule unit tests
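The AC-9 skip rules map markers onto the run's parametrize axes. Below is a minimal sketch of that mapping as a pure function, which is also what makes the nine skip-rule unit tests possible without Docker. `RunEnv` and `skip_reason` are hypothetical names, the axis values are assumptions, and `deferred_ac` is left out because it yields XFAIL rather than a skip.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunEnv:
    """Hypothetical view of the parametrize axes a run was launched with."""

    tier: str          # e.g. "tier1" or "tier2"
    in_chamber: bool   # hypothetical flag: chamber environment available
    vio_strategy: str  # e.g. "vins_mono" or another strategy name


def skip_reason(markers: set[str], env: RunEnv) -> str | None:
    """Return a skip reason for a test carrying ``markers``, or None to run it."""
    if "tier2_only" in markers and env.tier != "tier2":
        return "tier2_only: requires the Jetson Tier-2 run"
    if "chamber_only" in markers and not env.in_chamber:
        return "chamber_only: requires the chamber environment"
    if "vins_mono" in markers and env.vio_strategy != "vins_mono":
        return "vins_mono: only runs on the vins_mono parametrize axis"
    return None
```

In conftest.py, a `pytest_collection_modifyitems` hook could collect each item's marker names via `item.iter_markers()` and call `item.add_marker(pytest.mark.skip(reason=...))` whenever this returns a reason.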

The module layout entry for blackbox_tests was added in the 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,13 @@
"""Public-boundary helper modules used by every blackbox test.
Modules:
frame_source_replay — replay images/video to the SUT's V4L2 file source
imu_replay — replay `data_imu.csv` at 10 Hz to the FC inbound
sitl_observer — AP/iNav read-side observers (param reads, GPS_RAW_INT, MSP queries)
mavproxy_tlog_reader — parse `.tlog` files emitted by `mavproxy-listener`
fdr_reader — post-run filesystem read of the FDR archive
geo — WGS84 geodesic helpers (pyproj.Geod)
These modules MUST NOT import from `gps_denied_onboard.*`. Public-boundary
discipline is enforced by `e2e/_unit_tests/test_no_sut_imports.py`.
"""
@@ -0,0 +1,59 @@
"""Post-run filesystem read of the FDR archive.
The FDR archive is a line-delimited JSON record stream per AZ-272 / AZ-273.
Each line is an `FdrRecord` envelope (producer_id, type, monotonic_ms,
payload). The runner image must NEVER import the SUT's FdrRecord schema
directly — it parses the JSON bytes and validates against a duplicate
record-type allowlist baked into this module.
Public surface only; concrete parser + assertion helpers are owned by
AZ-441 (NFT-LIM-02 — FDR size budget) and the resilience scenario tasks
that need to crawl the archive (AZ-432, AZ-433, AZ-435).
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class FdrRecord:
    """Mirror of `gps_denied_onboard.fdr_client.records.FdrRecord` (public-boundary copy).

    The schema is duplicated intentionally: if the SUT's FDR schema evolves
    in a breaking way, parsing against this copy fails (visible drift)
    rather than silently following along.
    """

    producer_id: str
    monotonic_ms: int
    record_type: str  # the envelope's "type" field
    payload: dict[str, object]


def iter_records(fdr_archive_root: Path) -> Iterator[FdrRecord]:
    """Iterate every FDR record in the archive root (ordered by monotonic_ms).

    Raises NotImplementedError until AZ-441 supplies the orjson-backed parser.
    """
    raise NotImplementedError(
        "fdr_reader.iter_records is owned by AZ-441 — AZ-406 supplies only "
        "the public surface."
    )


def archive_size_bytes(fdr_archive_root: Path) -> int:
    """Sum the size of every file under ``fdr_archive_root``.

    Returns 0 if the root does not exist. Implemented concretely here: it is
    a thin rglob + stat loop that NFT-LIM-02 needs as soon as a real archive
    lands.
    """
    if not fdr_archive_root.exists():
        return 0
    total = 0
    for p in fdr_archive_root.rglob("*"):
        if p.is_file():
            total += p.stat().st_size
    return total
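Here is a minimal sketch of what the `iter_records` parser could look like, given the envelope described in the module docstring. It uses stdlib `json` in place of the orjson backend the docstring names. It also assumes archive files end in `.jsonl`, and the allowlist values are placeholders; the real record types come from AZ-272 / AZ-273.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Placeholder allowlist: the real record types are defined by AZ-272 / AZ-273.
ALLOWED_RECORD_TYPES = frozenset({"pose", "health", "event"})


@dataclass(frozen=True)
class FdrRecord:
    producer_id: str
    monotonic_ms: int
    record_type: str
    payload: dict[str, object]


def iter_records(fdr_archive_root: Path) -> Iterator[FdrRecord]:
    """Parse every *.jsonl file under the root and yield records by monotonic_ms."""
    records: list[FdrRecord] = []
    for path in sorted(fdr_archive_root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                raw = json.loads(line)
                rtype = raw["type"]  # envelope "type" maps onto record_type
                if rtype not in ALLOWED_RECORD_TYPES:
                    raise ValueError(f"{path}:{lineno}: unknown record type {rtype!r}")
                records.append(FdrRecord(
                    producer_id=str(raw["producer_id"]),
                    monotonic_ms=int(raw["monotonic_ms"]),
                    record_type=rtype,
                    payload=dict(raw["payload"]),
                ))
    # sorted() is stable, so equal timestamps keep file / line order.
    yield from sorted(records, key=lambda r: r.monotonic_ms)
```

Ordering by `monotonic_ms` across files requires reading the whole archive before yielding. That is acceptable at NFT-LIM-02's size budget, but a streaming k-way merge would be needed for archives that do not fit in memory.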
@@ -0,0 +1,77 @@
"""Replay images / video to the SUT's V4L2 file frame source.

Two replay modes:
  1. Image-set replay (FT-P-01, FT-P-05): emit a sequence of JPEG / PNG
     still images at a configurable rate to the file frame source path the
     SUT polls.
  2. Video replay (FT-P-02, FT-P-04, FT-N-01..04, NFT-PERF-*): decode an
     MP4 with OpenCV and emit frames at the encoded FPS (or a user-supplied
     rate for fast-forward).

The actual frame-source path inside the SUT container is configured via the
``ONBOARD_FRAME_SOURCE_PATH`` environment variable on the SUT; the runner
writes to a shared tmpfs volume mounted at the same path inside both
containers.

This file currently provides the public surface used by per-scenario tests;
concrete implementations land alongside their consuming test tasks
(AZ-407 onward). The intent is that `FrameSourceReplayer` is a stable API
the test specs can rely on while the underlying replay strategy is filled
in incrementally.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ReplayCadence:
    """Frame-rate / pace configuration for a replay session."""

    fps: float = 10.0
    realtime: bool = True


class FrameSink(Protocol):
    """Abstract destination for replayed frames (file path or memory queue)."""

    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None:
        ...


class FrameSourceReplayer:
    """Public surface for replaying frames into the SUT's frame-source path.

    AZ-407 (Static fixture builders) supplies the concrete still-image replay
    implementation; AZ-408 (Runtime synthetic-injection) supplies the video
    + injector variants. AZ-406 only commits to the contract.
    """

    def __init__(self, sink: FrameSink, cadence: ReplayCadence | None = None) -> None:
        self._sink = sink
        self._cadence = cadence or ReplayCadence()

    def replay_image_directory(self, directory: Path) -> int:
        """Replay every image in ``directory`` (sorted by name). Returns count emitted.

        Raises NotImplementedError until AZ-407 lands. Tests that need this
        path should mark themselves @pytest.mark.skip(reason="awaiting AZ-407")
        until then; AC-1 (smoke) does not depend on this surface.
        """
        raise NotImplementedError(
            "FrameSourceReplayer.replay_image_directory is owned by AZ-407 — "
            "AZ-406 supplies only the public surface."
        )

    def replay_video(self, video_path: Path) -> int:
        """Replay an MP4 / .h264 file frame-by-frame. Returns count emitted.

        Raises NotImplementedError until AZ-408 lands.
        """
        raise NotImplementedError(
            "FrameSourceReplayer.replay_video is owned by AZ-408 — "
            "AZ-406 supplies only the public surface."
        )
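Because the SUT polls a file path on a shared tmpfs volume, a file sink must never expose a half-written frame. Below is a sketch of image-set replay under that constraint. `AtomicFileSink` and the free function `replay_image_directory` are hypothetical stand-ins for the AZ-407 implementation. The timestamps are synthesized from the cadence (an assumption), and PNG bytes are passed through the `jpeg_bytes` parameter unchanged.

```python
from __future__ import annotations

import os
import time
from pathlib import Path
from typing import Protocol

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


class FrameSink(Protocol):
    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None: ...


class AtomicFileSink:
    """Hypothetical FrameSink that overwrites the single file the SUT polls."""

    def __init__(self, target: Path) -> None:
        self._target = target
        self._tmp = target.with_name(target.name + ".tmp")

    def write_frame(self, jpeg_bytes: bytes, timestamp_ms: int) -> None:
        # Write aside, then rename over the target: os.replace is atomic on
        # POSIX within one filesystem, so the poller never sees a partial frame.
        self._tmp.write_bytes(jpeg_bytes)
        os.replace(self._tmp, self._target)


def replay_image_directory(
    directory: Path, sink: FrameSink, fps: float = 10.0, realtime: bool = True
) -> int:
    """Emit every image in ``directory`` in name order; return the count emitted."""
    period_s = 1.0 / fps
    frames = sorted(p for p in directory.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    start = time.monotonic()
    for i, path in enumerate(frames):
        if realtime:
            # Sleep until this frame's slot relative to the start time, so
            # per-frame write cost does not accumulate into drift.
            delay = start + i * period_s - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        sink.write_frame(path.read_bytes(), timestamp_ms=round(i * period_s * 1000))
    return len(frames)
```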
@@ -0,0 +1,54 @@
"""WGS84 geodesic helpers: distance + bearing for accuracy assertions.

Wraps `pyproj.Geod` (WGS84 ellipsoid; geodesics are solved with Karney's
algorithms from GeographicLib, not Vincenty's iteration) for the few
operations the blackbox tests need. Kept deliberately small: broader geo math (UTM, MGRS, datum
conversions) is NOT in scope for the e2e harness.
All inputs are degrees lat / lon (WGS84); all distances are meters.
"""
from __future__ import annotations
from dataclasses import dataclass
from pyproj import Geod
_WGS84 = Geod(ellps="WGS84")


@dataclass(frozen=True)
class GeodeticDelta:
    """Bearing + distance + back-bearing between two WGS84 points."""

    distance_m: float
    forward_bearing_deg: float
    reverse_bearing_deg: float


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Geodesic (ellipsoidal) distance in meters between two WGS84 points.

    Raises ValueError on NaN inputs (defensive: silent NaN propagation in
    a test assertion is the kind of bug this helper exists to prevent).
    """
    for name, value in (("lat1", lat1), ("lon1", lon1), ("lat2", lat2), ("lon2", lon2)):
        if value != value:  # NaN is the only float that is not equal to itself
            raise ValueError(f"distance_m: {name} is NaN")
    _, _, d = _WGS84.inv(lon1, lat1, lon2, lat2)
    return float(d)


def delta(lat1: float, lon1: float, lat2: float, lon2: float) -> GeodeticDelta:
    """Full geodetic delta: distance + forward/reverse bearings."""
    fwd_az, rev_az, d = _WGS84.inv(lon1, lat1, lon2, lat2)
    return GeodeticDelta(
        distance_m=float(d),
        forward_bearing_deg=float(fwd_az),
        reverse_bearing_deg=float(rev_az),
    )


def offset(lat: float, lon: float, bearing_deg: float, distance_m: float) -> tuple[float, float]:
    """Project ``(lat, lon)`` by ``distance_m`` along ``bearing_deg`` (degrees CW from north)."""
    new_lon, new_lat, _ = _WGS84.fwd(lon, lat, bearing_deg, distance_m)
    return float(new_lat), float(new_lon)
@@ -0,0 +1,53 @@
"""Replay `data_imu.csv` to the FC inbound at 10 Hz.
CSV schema (from `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`):
timestamp_ms,ax,ay,az,gx,gy,gz,roll_deg,pitch_deg,yaw_deg,baro_m
Owned by AZ-406 (public surface) + AZ-407 (concrete file-driver
implementation). This module commits to the type signatures the
per-scenario tests will import; the actual MAVLink / MSP2 emission is
wired up by the downstream task.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ImuSample:
    """One row of `data_imu.csv` after parsing into native units."""

    timestamp_ms: int
    accel_mss: tuple[float, float, float]
    gyro_rps: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]  # roll, pitch, yaw (radians)
    baro_alt_m: float


class FcInboundEmitter(Protocol):
    """Abstract emitter — concrete impls are MAVLink (AP) or MSP2 (iNav)."""

    def emit(self, sample: ImuSample) -> None:
        ...


class ImuReplayer:
    """Drives an `FcInboundEmitter` from a CSV file at ``rate_hz`` (10 Hz by default)."""

    def __init__(self, emitter: FcInboundEmitter, rate_hz: float = 10.0) -> None:
        self._emitter = emitter
        self._rate_hz = rate_hz

    def replay(self, csv_path: Path) -> int:
        """Replay the CSV file. Returns the number of samples emitted.

        Concrete implementation is owned by AZ-407 (FT-P-02 derkachi-drift
        + FT-P-04 frame-to-frame registration are the first consumers).
        """
        raise NotImplementedError(
            "ImuReplayer.replay is owned by AZ-407 — AZ-406 supplies only "
            "the public surface."
        )
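Here is a sketch of the two halves AZ-407 has to supply: parsing the documented CSV schema into `ImuSample`, and pacing emission at `rate_hz`. It assumes `ax..az` are already m/s² and `gx..gz` already rad/s, so only the `*_deg` attitude columns are converted. The free functions `parse_imu_csv` / `replay` and the `realtime` flag are hypothetical.

```python
from __future__ import annotations

import csv
import math
import time
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol, TextIO


@dataclass(frozen=True)
class ImuSample:
    timestamp_ms: int
    accel_mss: tuple[float, float, float]
    gyro_rps: tuple[float, float, float]
    attitude_rad: tuple[float, float, float]
    baro_alt_m: float


class FcInboundEmitter(Protocol):
    def emit(self, sample: ImuSample) -> None: ...


def parse_imu_csv(fh: TextIO) -> Iterator[ImuSample]:
    """Parse rows of the data_imu.csv schema into ImuSample values."""
    for row in csv.DictReader(fh):
        yield ImuSample(
            timestamp_ms=int(row["timestamp_ms"]),
            accel_mss=(float(row["ax"]), float(row["ay"]), float(row["az"])),
            gyro_rps=(float(row["gx"]), float(row["gy"]), float(row["gz"])),
            # Only attitude is converted: degrees -> radians.
            attitude_rad=tuple(
                math.radians(float(row[k])) for k in ("roll_deg", "pitch_deg", "yaw_deg")
            ),
            baro_alt_m=float(row["baro_m"]),
        )


def replay(
    samples: Iterable[ImuSample],
    emitter: FcInboundEmitter,
    rate_hz: float = 10.0,
    realtime: bool = True,
) -> int:
    """Emit samples at a fixed rate; return the number emitted."""
    period_s = 1.0 / rate_hz
    start = time.monotonic()
    count = 0
    for count, sample in enumerate(samples, start=1):
        if realtime:
            # Schedule against the start time to avoid cumulative drift.
            delay = start + (count - 1) * period_s - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        emitter.emit(sample)
    return count
```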
@@ -0,0 +1,48 @@
"""Parse `.tlog` files emitted by `mavproxy-listener`.
`.tlog` is the standard MAVLink telemetry-log format written by MAVProxy:
each record is an 8-byte big-endian timestamp (microseconds since the Unix
epoch) followed by the raw wire bytes of one MAVLink frame. pymavlink ships
`mavutil.mavlogfile`, which knows how to iterate this.

This module exposes a small typed wrapper so per-scenario tests can:
  1. Filter for the message types they care about.
  2. Compute summary statistics (count per type, message-rate Hz, ratio
     of signed vs unsigned messages for NFT-SEC-03).
  3. Attach the source `.tlog` path to the evidence bundler.
Concrete iteration logic is owned by AZ-416 (FT-P-09-AP); AZ-406 commits
to the public surface.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class TlogMessage:
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, object]


def iter_messages(tlog_path: Path) -> Iterator[TlogMessage]:
    """Iterate `.tlog` messages oldest-first.

    AZ-406 raises until AZ-416 fills in the pymavlink-backed iterator.
    """
    raise NotImplementedError(
        "mavproxy_tlog_reader.iter_messages is owned by AZ-416 — "
        "AZ-406 supplies only the public surface."
    )


def count_by_type(tlog_path: Path) -> dict[str, int]:
    """Return ``{msg_type: count}`` for every distinct message type."""
    counts: dict[str, int] = {}
    for msg in iter_messages(tlog_path):
        counts[msg.msg_type] = counts.get(msg.msg_type, 0) + 1
    return counts
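The signed-vs-unsigned ratio for NFT-SEC-03 depends only on MAVLink framing, not on message decoding. The dependency-free sketch below splits raw `.tlog` bytes into timestamped frames and flags MAVLink 2 frames whose `incompat_flags` signing bit is set. It is meant as a cross-check alongside the pymavlink-backed iterator AZ-416 owns, not a replacement. CRCs are not validated, and `iter_raw_frames` / `signed_fraction` are hypothetical names.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass
from typing import Iterable, Iterator

MAVLINK_V1_STX = 0xFE
MAVLINK_V2_STX = 0xFD
MAVLINK_IFLAG_SIGNED = 0x01  # MAVLink 2 incompat_flags bit: signature appended
SIGNATURE_LEN = 13           # link_id (1) + timestamp (6) + signature (6)
HEADER_LEN = {MAVLINK_V1_STX: 6, MAVLINK_V2_STX: 10}  # bytes before the payload
CRC_LEN = 2


@dataclass(frozen=True)
class RawFrame:
    timestamp_us: int
    version: int
    msg_id: int
    signed: bool
    frame: bytes


def iter_raw_frames(data: bytes) -> Iterator[RawFrame]:
    """Split .tlog bytes into timestamped MAVLink frames (no decode, no CRC check)."""
    pos = 0
    while pos + 8 < len(data):  # need the 8-byte timestamp plus the STX byte
        (timestamp_us,) = struct.unpack_from(">Q", data, pos)
        start = pos + 8
        stx = data[start]
        if stx not in HEADER_LEN:
            raise ValueError(f"unexpected start byte 0x{stx:02x} at offset {start}")
        header_len = HEADER_LEN[stx]
        if start + header_len > len(data):
            return  # truncated trailing header
        payload_len = data[start + 1]
        if stx == MAVLINK_V1_STX:
            version, signed = 1, False
            msg_id = data[start + 5]
        else:
            version = 2
            signed = bool(data[start + 2] & MAVLINK_IFLAG_SIGNED)
            msg_id = int.from_bytes(data[start + 7:start + 10], "little")  # 24-bit id
        length = header_len + payload_len + CRC_LEN + (SIGNATURE_LEN if signed else 0)
        if start + length > len(data):
            return  # truncated trailing frame
        yield RawFrame(timestamp_us, version, msg_id, signed, data[start:start + length])
        pos = start + length


def signed_fraction(frames: Iterable[RawFrame]) -> float:
    """Fraction of frames carrying a MAVLink 2 signature (0.0 for an empty log)."""
    total = signed = 0
    for f in frames:
        total += 1
        signed += f.signed
    return signed / total if total else 0.0
```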
@@ -0,0 +1,59 @@
"""ArduPilot Plane / iNav SITL state-read observers.
Reads what the SUT delivered to the FC over its external-positioning
interface, without ever bypassing the FC's own acceptance path. This is
the only legal way for blackbox tests to assert AC-4.3 (FC output contract):
every assertion goes through the SITL's state machine.
Public surface only; concrete pymavlink / yamspy / msp_gps_toy subprocess
plumbing is owned by AZ-416 (FT-P-09-AP) and AZ-417 (FT-P-09-iNav).
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import Literal, Protocol
FcKind = Literal["ardupilot", "inav"]


@dataclass(frozen=True)
class FcGpsState:
    """The subset of FC state the e2e tests assert against.

    AP: assembled from EKF source-set + GLOBAL_POSITION_INT replay-back.
    iNav: assembled from MSP2 GPS-provider state + getRawGPS query.
    """

    primary_source: str  # "MAV" (AP gps_type=14) or "MSP" (iNav)
    last_position_lat_deg: float
    last_position_lon_deg: float
    last_position_alt_m: float
    fix_quality: int  # 0..6 per NMEA convention
    horizontal_accuracy_m: float
    last_update_age_ms: int


class FcSitlObserver(Protocol):
    """Common observer protocol — implemented by `ArduPilotObserver` + `InavObserver`."""

    fc_kind: FcKind

    def read_gps_state(self) -> FcGpsState:
        ...

    def read_parameter(self, name: str) -> float | int | str | None:
        ...


def get_observer(fc_kind: FcKind, host: str) -> FcSitlObserver:
    """Factory — returns the matching observer for the requested FC.

    AZ-416/417 own the concrete return types. AZ-406 raises until those
    tasks land so test authors can plumb the observer through their
    fixtures without yet running them.
    """
    raise NotImplementedError(
        f"sitl_observer.get_observer({fc_kind=}, {host=}) is owned by "
        "AZ-416 (AP) / AZ-417 (iNav) — AZ-406 supplies only the contract."
    )
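Because `get_observer` raises until AZ-416 / AZ-417 land, assertion helpers written against the `FcSitlObserver` protocol cannot yet run against SITL. A hypothetical in-memory implementation of the protocol lets such helpers be unit-tested out of container now. `StaticObserver` and `assert_external_fix_fresh` are illustrative names, and the age threshold is caller-chosen.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

FcKind = Literal["ardupilot", "inav"]


@dataclass(frozen=True)
class FcGpsState:
    primary_source: str
    last_position_lat_deg: float
    last_position_lon_deg: float
    last_position_alt_m: float
    fix_quality: int
    horizontal_accuracy_m: float
    last_update_age_ms: int


class StaticObserver:
    """Hypothetical in-memory FcSitlObserver for testing helpers without SITL."""

    def __init__(
        self,
        fc_kind: FcKind,
        state: FcGpsState,
        params: dict[str, float | int | str] | None = None,
    ) -> None:
        self.fc_kind = fc_kind
        self._state = state
        self._params = dict(params or {})

    def read_gps_state(self) -> FcGpsState:
        return self._state

    def read_parameter(self, name: str) -> float | int | str | None:
        return self._params.get(name)  # None for unknown parameters


def assert_external_fix_fresh(obs, expected_source: str, max_age_ms: int) -> FcGpsState:
    """Assert the FC is using the expected external source and the fix is fresh."""
    state = obs.read_gps_state()
    assert state.primary_source == expected_source, (
        f"source {state.primary_source!r} != {expected_source!r}"
    )
    assert state.last_update_age_ms <= max_age_ms, (
        f"fix is {state.last_update_age_ms} ms old (limit {max_age_ms} ms)"
    )
    return state
```

Once AZ-416 / AZ-417 land, the same helper takes the real observer returned by `get_observer` unchanged. This works because it depends only on the protocol's two methods.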