[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
  + mock Suite Sat Service + mavproxy listener + e2e-runner onto one
  e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
  NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
  compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
  while the SUT runs natively on the Jetson under systemd.
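
The egress-isolation property above can also be checked statically. The sketch below is only an illustration: it operates on an already-parsed compose mapping (a plain dict standing in for the loaded YAML), and the helper name `assert_egress_isolated` and the sample services are assumptions, not the harness's actual test.

```python
def assert_egress_isolated(compose: dict, network: str = "e2e-net") -> None:
    """Raise AssertionError unless `network` is internal and is the only
    network every service attaches to."""
    net = compose.get("networks", {}).get(network)
    if not net or net.get("internal") is not True:
        raise AssertionError(f"{network} must declare internal: true")
    for name, svc in compose.get("services", {}).items():
        # Compose accepts list or mapping form; list() yields names for both.
        attached = list(svc.get("networks", []))
        if attached != [network]:
            raise AssertionError(f"service {name!r} must attach only to {network}")


# Hypothetical, heavily trimmed compose document.
sample = {
    "networks": {"e2e-net": {"driver": "bridge", "internal": True}},
    "services": {
        "sut": {"networks": ["e2e-net"]},
        "e2e-runner": {"networks": {"e2e-net": {}}},
    },
}
assert_egress_isolated(sample)  # passes silently
```

A service that attaches to a second, non-internal network would fail this check, which is the failure mode the network-layer isolation is meant to rule out.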

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
  jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x). The runner deliberately
  does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md
  parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
  test with the exact 11-column schema from environment.md §
  Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths). XFAIL surfaced only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records relative
  paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
  (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
  AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
  today (no downstream task dep) — WGS84 distance / forward-bearing
  / offset via pyproj with NaN rejection.
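
As a rough illustration of the 11-column report schema listed above, the sketch below writes one row with the standard-library `csv` module. It is not the pytest plugin itself: `write_report`, the sample values, and the `;`-joined encoding of `evidence_paths` are assumptions made for the example.

```python
import csv
import io

REPORT_COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
)


def write_report(rows, fh) -> int:
    """Write rows using exactly REPORT_COLUMNS; reject incomplete rows."""
    writer = csv.DictWriter(fh, fieldnames=list(REPORT_COLUMNS))
    writer.writeheader()
    count = 0
    for row in rows:
        missing = set(REPORT_COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        writer.writerow(row)  # DictWriter itself rejects unknown keys
        count += 1
    return count


# Hypothetical row; every value here is illustrative only.
example_row = {
    "test_id": "FT-P-01",
    "test_name": "test_smoke",
    "traces_to": "AC-1",
    "fc_adapter": "ardupilot",
    "vio_strategy": "okvis2",
    "tier": "tier2-jetson",
    "started_at_utc": "2026-05-16T13:22:44.000+00:00",
    "execution_time_ms": 1234,
    "result": "passed",
    "error_message": "",
    "evidence_paths": ";".join(["evidence/jtop.csv", "evidence/tegrastats.csv"]),
}

buf = io.StringIO()
write_report([example_row], buf)
header = buf.getvalue().splitlines()[0]
```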

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST
  /mock/config (force-status, response delay), POST /mock/reset
  (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
  fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
  fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke test, run
  inside the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercise the harness internals (CSV reporter plugin lifecycle,
  conftest skip rules, helper modules, parsers, mock app, compose
  YAML structural contract, public-boundary enforcement) without
  Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot)         → compose YAML test + directory layout
                                + smoke test (Docker-bound)
- AC-2 (mock services)       → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
                                deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)         → in-process plugin lifecycle test
                                emits the exact 11-column schema
- AC-5 (egress isolation)    → static config test + runtime probe
                                in Docker-bound smoke
- AC-6 (Tier-2 contract)     → tegrastats + jtop parser unit tests
                                + jetson/* layout test; full Tier-2
                                contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)  → vins_mono skip-rule cases +
                                tests/positive/test_smoke
- AC-9 (skip semantics)      → 9 conftest skip-rule unit tests
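
The skip semantics covered by AC-9 can be pictured as a pure decision function. The sketch below is an assumption-laden illustration, not the shipped conftest.py: it covers only `tier2_only` and `chamber_only`, infers their meaning from the marker names and the `--enable-chamber` flag, and uses a hypothetical Tier-1 value `tier1-docker` (only `tier2-jetson` appears in this commit).

```python
from __future__ import annotations


def skip_reason(marks: set[str], tier: str, chamber_enabled: bool) -> str | None:
    """Return a skip reason for a test carrying `marks`, or None to run it.

    Assumed semantics (not confirmed by the harness source):
      * tier2_only   -> runs only when tier == "tier2-jetson"
      * chamber_only -> runs only when the chamber flag is enabled
    """
    if "tier2_only" in marks and tier != "tier2-jetson":
        return "tier2_only: requires the Tier-2 Jetson runner"
    if "chamber_only" in marks and not chamber_enabled:
        return "chamber_only: requires --enable-chamber"
    return None


# "tier1-docker" is a placeholder name for the workstation tier.
print(skip_reason({"tier2_only"}, "tier1-docker", False))
print(skip_reason({"tier2_only"}, "tier2-jetson", False))
print(skip_reason({"chamber_only"}, "tier2-jetson", True))
```

Keeping the decision in a pure function like this is one way to unit-test skip rules without Docker or SITL, which matches the out-of-container testing approach described earlier.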

The module layout entry for blackbox_tests was added in the 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,129 @@
"""Sample jtop (jetson-stats) Python API → per-sample CSV rows.

Unlike tegrastats, which is a stdout stream, jtop exposes a Python API
that emits a polled state dictionary. We poll at a caller-supplied
cadence and convert the relevant fields to CSV columns aligned with the
tegrastats output where the two overlap.

Schema (CSV columns):
    timestamp_utc_iso, ram_used_mb, ram_total_mb, gpu_load_pct,
    gpu_freq_mhz, cpu_load_avg_pct, soc_temp_c, gpu_temp_c, power_mw,
    extras_json

Usage:
    python3 jtop_parser.py --out out.csv --interval 1.0
"""
from __future__ import annotations

import argparse
import csv
import json
import time
from datetime import datetime, timezone
from pathlib import Path

UTC = timezone.utc

CSV_COLUMNS = (
    "timestamp_utc_iso",
    "ram_used_mb",
    "ram_total_mb",
    "gpu_load_pct",
    "gpu_freq_mhz",
    "cpu_load_avg_pct",
    "soc_temp_c",
    "gpu_temp_c",
    "power_mw",
    "extras_json",
)


def state_to_row(state: object) -> dict[str, object]:
    """Convert one jtop polled-state object to a CSV row.

    `state` is whatever `jtop.jtop().stats` returns; on real Jetson runs it
    is a `JtopStats` dataclass-ish object exposing `ram`, `gpu`, `cpu`,
    `temperature`, `power`. We extract defensively because the jetson-stats
    schema has shifted across versions.
    """

    def _get(obj: object, *path: str, default: object = "") -> object:
        # Walk dict keys or attributes along `path`. Stop at the first miss
        # so we never keep traversing into the default value itself.
        cur = obj
        for key in path:
            if isinstance(cur, dict):
                cur = cur.get(key)
            else:
                cur = getattr(cur, key, None)
            if cur is None:
                return default
        return cur

    row: dict[str, object] = {
        "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
        "ram_used_mb": _get(state, "ram", "used"),
        "ram_total_mb": _get(state, "ram", "tot"),
        "gpu_load_pct": _get(state, "gpu", "load"),
        "gpu_freq_mhz": _get(state, "gpu", "freq", "cur"),
        "cpu_load_avg_pct": _get(state, "cpu", "load_avg", default=""),
        "soc_temp_c": _get(state, "temperature", "SOC", default=""),
        "gpu_temp_c": _get(state, "temperature", "GPU", default=""),
        "power_mw": _get(state, "power", "total", default=""),
        "extras_json": "",
    }
    return row


def run(out_path: Path, interval_s: float, samples_max: int | None = None) -> int:
    """Poll jtop and write rows to ``out_path``. Returns rows written.

    On hosts without jetson-stats installed (e.g., unit-test runs on dev
    workstations), the import raises ImportError and the function writes a
    single "stub" row naming the missing dependency, then returns. This
    keeps Tier-2 dry runs and CI smoke tests working without forcing CI to
    install jetson-stats.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)
    rows_written = 0
    try:
        from jtop import jtop  # type: ignore[import-untyped]
    except ImportError as exc:
        with out_path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
            writer.writeheader()
            writer.writerow(
                {
                    **{col: "" for col in CSV_COLUMNS},
                    "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
                    "extras_json": json.dumps({"stub": True, "missing_dep": "jetson-stats", "import_error": str(exc)}),
                }
            )
        return 1
    with jtop() as poll, out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
        writer.writeheader()
        while poll.ok():
            row = state_to_row(poll.stats)
            writer.writerow(row)
            fh.flush()
            rows_written += 1
            if samples_max is not None and rows_written >= samples_max:
                break
            time.sleep(interval_s)
    return rows_written


def main() -> int:
    parser = argparse.ArgumentParser(description="Sample jtop → CSV.")
    parser.add_argument("--out", type=Path, required=True)
    parser.add_argument("--interval", type=float, default=1.0, help="Poll interval in seconds.")
    parser.add_argument("--samples-max", type=int, default=None)
    args = parser.parse_args()
    n = run(args.out, args.interval, args.samples_max)
    print(f"jtop_parser: wrote {n} rows to {args.out}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
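
The defensive extraction used by `state_to_row` is easiest to see in isolation. The following is a minimal standalone sketch of the same idea; `get_path` and both sample payloads are hypothetical and do not reflect the real jetson-stats schema.

```python
from types import SimpleNamespace


def get_path(obj, *path, default=""):
    """Walk dict keys or attributes along `path`; return `default` on any miss."""
    cur = obj
    for key in path:
        if isinstance(cur, dict):
            cur = cur.get(key)
        else:
            cur = getattr(cur, key, None)
        if cur is None:
            return default
    return cur


# Hypothetical payloads: one dict-shaped, one attribute-shaped.
as_dict = {"ram": {"used": 2345, "tot": 7858}, "gpu": {"load": 12}}
as_attrs = SimpleNamespace(ram=SimpleNamespace(used=2345), power=None)

print(get_path(as_dict, "ram", "used"))         # 2345
print(get_path(as_dict, "gpu", "freq", "cur"))  # empty string: "freq" is absent
print(get_path(as_attrs, "ram", "used"))        # 2345
print(get_path(as_attrs, "power", "total"))     # empty string: power is None
```

Returning an empty string rather than raising keeps the CSV row complete even when a jetson-stats release renames or drops a field.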
@@ -0,0 +1,148 @@
#!/usr/bin/env bash
# Tier-2 Jetson hardware-loop entrypoint.
#
# Usage:
# ./run-tier2.sh --fc-adapter <ardupilot|inav> --vio-strategy <okvis2|klt_ransac|vins_mono> [--duration <5min|8h>] [--enable-chamber] [--jetson-host <host>]
#
# Pre-requisites (verified at startup):
# * The Jetson is provisioned per `_docs/02_document/tests/environment.md`
# § Execution instructions — Tier-2 (JetPack 6.2, CUDA, TensorRT 10.3, cuDNN).
# * `gps-denied-onboard.service` is installed via systemd
# (`tier2.service` is the template; operator copies it to /etc/systemd/system).
# * SITLs + mock + listener + runner reachable on the same network via
# `docker compose -f e2e/docker/docker-compose.test.yml -f e2e/docker/docker-compose.tier2-bridge.yml up ...`
# on a paired x86 host. (Same-Jetson SITL is also supported: JETSON_HOST defaults to localhost unless --jetson-host is given.)
#
# Outputs the same CSV format as Tier-1 to ./e2e-results/run-${RUN_ID}/report.csv
# plus the per-sample tegrastats + jtop CSVs in the evidence bundle.
set -euo pipefail
FC_ADAPTER=""
VIO_STRATEGY=""
DURATION="5min"
ENABLE_CHAMBER=0
JETSON_HOST_OVERRIDE=""
usage() {
  grep -E '^# ' "$0" | sed 's/^# //'
  exit 1
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --fc-adapter)     FC_ADAPTER="$2"; shift 2 ;;
    --vio-strategy)   VIO_STRATEGY="$2"; shift 2 ;;
    --duration)       DURATION="$2"; shift 2 ;;
    --enable-chamber) ENABLE_CHAMBER=1; shift ;;
    --jetson-host)    JETSON_HOST_OVERRIDE="$2"; shift 2 ;;
    -h|--help)        usage ;;
    *) echo "Unknown arg: $1" >&2; usage ;;
  esac
done

if [[ -z "$FC_ADAPTER" || -z "$VIO_STRATEGY" ]]; then
  echo "ERROR: --fc-adapter and --vio-strategy are required" >&2
  usage
fi

case "$FC_ADAPTER" in
  ardupilot|inav) ;;
  *) echo "ERROR: --fc-adapter must be ardupilot or inav (got: $FC_ADAPTER)" >&2; exit 2 ;;
esac

case "$VIO_STRATEGY" in
  okvis2|klt_ransac|vins_mono) ;;
  *) echo "ERROR: --vio-strategy must be okvis2 | klt_ransac | vins_mono (got: $VIO_STRATEGY)" >&2; exit 2 ;;
esac

# RUN_ID — caller may set; default is utc-stamp + adapter pair.
: "${RUN_ID:=tier2-$(date -u +%Y%m%dT%H%M%SZ)-${FC_ADAPTER}-${VIO_STRATEGY}}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
RESULTS_DIR="${REPO_ROOT}/e2e-results/run-${RUN_ID}"
EVIDENCE_DIR="${RESULTS_DIR}/evidence"
mkdir -p "${EVIDENCE_DIR}"
echo "[tier2] RUN_ID=${RUN_ID}"
echo "[tier2] FC_ADAPTER=${FC_ADAPTER} VIO_STRATEGY=${VIO_STRATEGY} DURATION=${DURATION}"
echo "[tier2] RESULTS_DIR=${RESULTS_DIR}"
# ---------------------------------------------------------------------------
# Pre-flight: confirm the SUT systemd unit is healthy.
# ---------------------------------------------------------------------------
if ! systemctl is-active --quiet gps-denied-onboard.service; then
  echo "[tier2] gps-denied-onboard.service is not active — attempting restart..." >&2
  sudo systemctl restart gps-denied-onboard.service
  sleep 3
  if ! systemctl is-active --quiet gps-denied-onboard.service; then
    echo "[tier2] FATAL: gps-denied-onboard.service failed to start" >&2
    sudo systemctl status gps-denied-onboard.service --no-pager || true
    exit 3
  fi
fi

# ---------------------------------------------------------------------------
# Start tegrastats + jtop background samplers (evidence bundle inputs).
# ---------------------------------------------------------------------------
TEGRA_CSV="${EVIDENCE_DIR}/tegrastats.csv"
JTOP_CSV="${EVIDENCE_DIR}/jtop.csv"
# tegrastats is sampled at 5 Hz here (--interval 200 ms; its own default is
# slower); the parser converts each line to a per-sample CSV row.
if command -v tegrastats >/dev/null 2>&1; then
  tegrastats --interval 200 \
    | python3 "${SCRIPT_DIR}/tegrastats_parser.py" --out "${TEGRA_CSV}" &
  TEGRA_PID=$!
else
  echo "[tier2] WARNING: tegrastats not in PATH — skipping that evidence channel." >&2
  TEGRA_PID=
fi

if command -v jtop >/dev/null 2>&1; then
  python3 "${SCRIPT_DIR}/jtop_parser.py" --out "${JTOP_CSV}" --interval 1.0 &
  JTOP_PID=$!
else
  echo "[tier2] WARNING: jtop not in PATH — skipping that evidence channel." >&2
  JTOP_PID=
fi

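cleanup() {
  local rc=$?
  # Disarm the traps first so the `exit` below does not re-enter cleanup
  # when we got here via INT or TERM.
  trap - EXIT INT TERM
  [[ -n "${TEGRA_PID:-}" ]] && kill "${TEGRA_PID}" 2>/dev/null || true
  [[ -n "${JTOP_PID:-}" ]] && kill "${JTOP_PID}" 2>/dev/null || true
  echo "[tier2] cleanup complete (rc=${rc})"
  exit "${rc}"
}
trap cleanup EXIT INT TERM
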
# ---------------------------------------------------------------------------
# Run the e2e suite — the runner image is the SAME as Tier-1; only TIER differs.
# ---------------------------------------------------------------------------
JETSON_HOST_ARG="${JETSON_HOST_OVERRIDE:-localhost}"
CHAMBER_ARG=()
[[ "${ENABLE_CHAMBER}" -eq 1 ]] && CHAMBER_ARG=("--enable-chamber")

(
  cd "${REPO_ROOT}/e2e/docker"
  RUN_ID="${RUN_ID}" \
  FC_ADAPTER="${FC_ADAPTER}" \
  VIO_STRATEGY="${VIO_STRATEGY}" \
  TIER="tier2-jetson" \
  JETSON_HOST="${JETSON_HOST_ARG}" \
  docker compose \
    -f docker-compose.test.yml \
    -f docker-compose.tier2-bridge.yml \
    run --rm \
    -e TIER=tier2-jetson \
    e2e-runner \
    pytest /test-suite \
      --csv="/e2e-results/run-${RUN_ID}/report.csv" \
      --csv-columns="test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths" \
      --evidence-out="/e2e-results/run-${RUN_ID}/evidence" \
      --build-kind=production \
      "${CHAMBER_ARG[@]}"
)

echo "[tier2] Suite complete. Report: ${RESULTS_DIR}/report.csv"
@@ -0,0 +1,131 @@
"""Parse tegrastats output stream → per-sample CSV rows.

tegrastats emits one line per sample. Depending on the JetPack release, a
line may start with a local timestamp before the first tag
("RAM 2345/7858MB ..."); the parser ignores it and stamps each row with
the host UTC clock at parse time. Each line includes RAM, GPU MHz, GPU
load, CPU load per-core, and thermal zone readings.

This parser is intentionally tolerant of unknown fields — JetPack 6.2 vs
6.3 vary in which tags they emit. The full raw line is kept in the
``extras_json`` column so downstream analysis can still inspect anything
the parser did not extract.

Schema (CSV columns):
    timestamp_utc_iso, ram_used_mb, ram_total_mb, gpu_load_pct,
    gpu_freq_mhz, cpu_load_avg_pct, soc_temp_c, gpu_temp_c, extras_json

Usage:
    tegrastats --interval 200 | python3 tegrastats_parser.py --out out.csv
"""
from __future__ import annotations

import argparse
import csv
import json
import re
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import IO

UTC = timezone.utc

CSV_COLUMNS = (
    "timestamp_utc_iso",
    "ram_used_mb",
    "ram_total_mb",
    "gpu_load_pct",
    "gpu_freq_mhz",
    "cpu_load_avg_pct",
    "soc_temp_c",
    "gpu_temp_c",
    "extras_json",
)

_RAM_RE = re.compile(r"RAM\s+(\d+)/(\d+)MB")
_GR3D_RE = re.compile(r"GR3D_FREQ\s+(\d+)%@?(\d+)?")
_CPU_RE = re.compile(r"CPU\s+\[([^\]]+)\]")
_SOC_TEMP_RE = re.compile(r"(?:SOC|cpu)@(\d+(?:\.\d+)?)C", re.IGNORECASE)
_GPU_TEMP_RE = re.compile(r"GPU@(\d+(?:\.\d+)?)C", re.IGNORECASE)


def parse_line(line: str) -> dict[str, object] | None:
    """Parse one tegrastats line. Returns None if the line is empty."""
    line = line.strip()
    if not line:
        return None
    row: dict[str, object] = {
        "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
        "ram_used_mb": "",
        "ram_total_mb": "",
        "gpu_load_pct": "",
        "gpu_freq_mhz": "",
        "cpu_load_avg_pct": "",
        "soc_temp_c": "",
        "gpu_temp_c": "",
        "extras_json": "",
    }
    if m := _RAM_RE.search(line):
        row["ram_used_mb"] = m.group(1)
        row["ram_total_mb"] = m.group(2)
    if m := _GR3D_RE.search(line):
        row["gpu_load_pct"] = m.group(1)
        if m.group(2):
            row["gpu_freq_mhz"] = m.group(2)
    if m := _CPU_RE.search(line):
        cpu_field = m.group(1)
        # Pattern looks like "67%@1190,55%@1190,..." or "off,55%@1190,..."
        loads: list[float] = []
        for tok in cpu_field.split(","):
            head = tok.strip().split("%", 1)[0]
            try:
                loads.append(float(head))
            except ValueError:
                # Offline cores are reported as "off"; leave them out of the mean.
                continue
        if loads:
            row["cpu_load_avg_pct"] = f"{sum(loads) / len(loads):.1f}"
    if m := _SOC_TEMP_RE.search(line):
        row["soc_temp_c"] = m.group(1)
    if m := _GPU_TEMP_RE.search(line):
        row["gpu_temp_c"] = m.group(1)
    # The full raw line always goes into extras for downstream debugging,
    # so no tag is silently dropped even when no regex above matches it.
    extras = {"raw": line}
    row["extras_json"] = json.dumps(extras, separators=(",", ":"))
    return row


def stream_to_csv(source: IO[str], out_path: Path) -> int:
    """Stream tegrastats lines from ``source`` to a CSV file. Returns rows written."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    rows_written = 0
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
        writer.writeheader()
        for line in source:
            row = parse_line(line)
            if row is None:
                continue
            writer.writerow(row)
            fh.flush()
            rows_written += 1
    return rows_written


def main() -> int:
    parser = argparse.ArgumentParser(description="Parse tegrastats to CSV.")
    parser.add_argument("--out", type=Path, required=True)
    args = parser.parse_args()
    n = stream_to_csv(sys.stdin, args.out)
    print(f"tegrastats_parser: wrote {n} rows to {args.out}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
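
To make the regex extraction concrete, the sketch below applies the same five patterns to one sample line. The line is hand-written to match those patterns and is not captured from a real device; `summarize` is a hypothetical helper, not part of the parser.

```python
import re

# Same patterns as in tegrastats_parser.py.
RAM_RE = re.compile(r"RAM\s+(\d+)/(\d+)MB")
GR3D_RE = re.compile(r"GR3D_FREQ\s+(\d+)%@?(\d+)?")
CPU_RE = re.compile(r"CPU\s+\[([^\]]+)\]")
SOC_TEMP_RE = re.compile(r"(?:SOC|cpu)@(\d+(?:\.\d+)?)C", re.IGNORECASE)
GPU_TEMP_RE = re.compile(r"GPU@(\d+(?:\.\d+)?)C", re.IGNORECASE)

# Illustrative line only (assumed layout).
LINE = (
    "RAM 2345/7858MB (lfb 4x4MB) SWAP 0/3929MB "
    "CPU [67%@1190,off,55%@1190,12%@1190] GR3D_FREQ 45%@624 "
    "cpu@45.5C gpu@43C"
)


def summarize(line: str) -> dict[str, str]:
    """Extract the parser's numeric fields from one line as strings."""
    out: dict[str, str] = {}
    if m := RAM_RE.search(line):
        out["ram_used_mb"], out["ram_total_mb"] = m.group(1), m.group(2)
    if m := GR3D_RE.search(line):
        out["gpu_load_pct"] = m.group(1)
        if m.group(2):
            out["gpu_freq_mhz"] = m.group(2)
    if m := CPU_RE.search(line):
        loads = []
        for tok in m.group(1).split(","):
            head = tok.strip().split("%", 1)[0]
            try:
                loads.append(float(head))
            except ValueError:
                continue  # "off" cores are excluded from the mean
        if loads:
            out["cpu_load_avg_pct"] = f"{sum(loads) / len(loads):.1f}"
    if m := SOC_TEMP_RE.search(line):
        out["soc_temp_c"] = m.group(1)
    if m := GPU_TEMP_RE.search(line):
        out["gpu_temp_c"] = m.group(1)
    return out


result = summarize(LINE)
print(result)
```

For this line the mean CPU load is computed over the three online cores (67, 55, 12), giving 44.7; the `off` core does not count as zero.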
@@ -0,0 +1,44 @@
# systemd unit template for the SUT on Tier-2 Jetson runners.
#
# Copy to /etc/systemd/system/gps-denied-onboard.service, edit the
# Environment= lines for the local deployment, then:
# sudo systemctl daemon-reload
# sudo systemctl enable --now gps-denied-onboard.service
#
# `run-tier2.sh` checks the unit before each suite and calls
# `systemctl restart` if it is not active, so the unit must be
# self-restoring. RestartSec is short because Tier-2 tests budget 4 hours
# per matrix entry (`environment.md` § Timeout) and a slow restart cuts
# into that budget.
[Unit]
Description=gps-denied-onboard companion service (Tier-2 Jetson)
After=network-online.target
Wants=network-online.target
[Service]
Type=exec
User=azaion
Group=azaion
WorkingDirectory=/opt/gps-denied-onboard
Environment=ONBOARD_FC_ADAPTER=ardupilot
Environment=ONBOARD_VIO_STRATEGY=okvis2
Environment=MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey
Environment=TILE_CACHE_PATH=/var/azaion/tile-cache
Environment=FDR_OUTPUT_PATH=/var/azaion/fdr
ExecStart=/opt/gps-denied-onboard/bin/gps-denied-onboard --config /etc/azaion/onboard.yaml
Restart=on-failure
RestartSec=2s
StandardOutput=journal
StandardError=journal
# Resource budget mirrors restrictions.md § Onboard Hardware: 25 W TDP,
# 8 GB shared LPDDR5. systemd cgroup limits are a defence-in-depth gate;
# the SUT enforces these internally too.
MemoryMax=6G
TasksMax=512
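# Hardening for the SUT process: /proc/sys and /sys become read-only and
# kernel-module loading is denied. tegrastats / jtop are launched by
# run-tier2.sh outside this unit, so these settings do not affect them.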
ProtectKernelTunables=true
ProtectKernelModules=true
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target