[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic AZ-262
(E-BBT). Establishes the e2e/ directory tree at the repo root, fully
separated from src/gps_denied_onboard/** and from the in-process tests/**
tree, and commits to the contracts every subsequent test ticket
(AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL +
  mock Suite Sat Service + mavproxy listener + e2e-runner onto one e2e-net
  bridge with internal: true (enforces RESTRICT-SAT-1 / NFT-SEC-02 egress
  isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-compose
  SUT so Tier-2 pairs SITLs + mock + runner on an x86 host while the SUT
  runs natively on the Jetson under systemd.

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats / jtop
  parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs (pymavlink,
  opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx, orjson, pydantic,
  structlog, pytest 8.x). The runner deliberately does NOT install the SUT
  package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md parametrize
  axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per test
  with the exact 11-column schema from environment.md § Reporting (test_id,
  test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc,
  execution_time_ms, result, error_message, evidence_paths). XFAIL surfaced
  only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture that
  copies per-test artifacts (.tlog, FDR archives, screenshots, tegrastats /
  jtop CSVs) into the run bundle and records relative paths into the
  reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces (concrete
  implementations owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per
  the dependency table); helpers/geo.py ships today (no downstream task
  dep): WGS84 distance / forward-bearing / offset via pyproj with NaN
  rejection.

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST /mock/config
  (force-status, response delay), POST /mock/reset (clears audit between
  tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static fixtures),
  AZ-408 (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439
  (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke run inside
  the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle, conftest
  skip rules, helper modules, parsers, mock app, compose YAML structural
  contract, public-boundary enforcement) without Docker / SITL. 97 unit
  tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot) → compose YAML test + directory layout + smoke test
  (Docker-bound)
- AC-2 (mock services) → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check deferred to
  AZ-416 / AZ-417
- AC-4 (CSV columns) → in-process plugin lifecycle test emits the exact
  11-column schema
- AC-5 (egress isolation) → static config test + runtime probe in
  Docker-bound smoke
- AC-6 (Tier-2 contract) → tegrastats + jtop parser unit tests + jetson/*
  layout test; full Tier-2 contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix) → vins_mono skip-rule cases +
  tests/positive/test_smoke
- AC-9 (skip semantics) → 9 conftest skip-rule unit tests

Module layout entry for blackbox_tests was added in 2026-05-16 preparatory
commit d7a17a8 so this diff stays focused on the harness scaffold.

AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 13:41:14 +00:00 · 2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,129 @@
+"""Sample jtop (jetson-stats) Python API → per-sample CSV rows.
+
+Unlike tegrastats which is a stdout stream, jtop exposes a Python API
+that emits a polled state dictionary. We poll at a caller-supplied
+cadence and convert the relevant fields to CSV columns aligned with the
+tegrastats output where the two overlap.
+
+Schema (CSV columns):
+    timestamp_utc_iso, ram_used_mb, ram_total_mb, gpu_load_pct,
+    gpu_freq_mhz, cpu_load_avg_pct, soc_temp_c, gpu_temp_c, power_mw,
+    extras_json
+
+Usage:
+    python3 jtop_parser.py --out out.csv --interval 1.0
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import json
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+
+UTC = timezone.utc
+
+
+CSV_COLUMNS = (
+    "timestamp_utc_iso",
+    "ram_used_mb",
+    "ram_total_mb",
+    "gpu_load_pct",
+    "gpu_freq_mhz",
+    "cpu_load_avg_pct",
+    "soc_temp_c",
+    "gpu_temp_c",
+    "power_mw",
+    "extras_json",
+)
+
+
+def state_to_row(state: object) -> dict[str, object]:
+    """Convert one jtop polled-state object to a CSV row.
+
+    `state` is whatever `jtop.jtop().stats` returns; on real Jetson runs it
+    is a `JtopStats` dataclass-ish object exposing `ram`, `gpu`, `cpu`,
+    `temperature`, `power`. We extract defensively because jetson-stats
+    schema has shifted across versions.
+    """
+
+    def _get(obj: object, *path: str, default: object = "") -> object:
+        cur = obj
+        for key in path:
+            if cur is None:
+                return default
+            if isinstance(cur, dict):
+                cur = cur.get(key, default)
+            else:
+                cur = getattr(cur, key, default)
+        return cur if cur is not None else default
+
+    row: dict[str, object] = {
+        "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
+        "ram_used_mb": _get(state, "ram", "used"),
+        "ram_total_mb": _get(state, "ram", "tot"),
+        "gpu_load_pct": _get(state, "gpu", "load"),
+        "gpu_freq_mhz": _get(state, "gpu", "freq", "cur"),
+        "cpu_load_avg_pct": _get(state, "cpu", "load_avg", default=""),
+        "soc_temp_c": _get(state, "temperature", "SOC", default=""),
+        "gpu_temp_c": _get(state, "temperature", "GPU", default=""),
+        "power_mw": _get(state, "power", "total", default=""),
+        "extras_json": "",
+    }
+    return row
+
+
+def run(out_path: Path, interval_s: float, samples_max: int | None = None) -> int:
+    """Poll jtop and write rows to ``out_path``. Returns rows written.
+
+    On hosts without jetson-stats installed (e.g., unit-test runs on dev
+    workstations), the import raises ImportError; the function then writes
+    a single "stub" row naming the missing dependency and returns. This keeps
+    Tier-2 dry runs and CI smoke happy without forcing CI to install jetson-stats.
+    """
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    rows_written = 0
+    try:
+        from jtop import jtop  # type: ignore[import-untyped]
+    except ImportError as exc:
+        with out_path.open("w", newline="", encoding="utf-8") as fh:
+            writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
+            writer.writeheader()
+            writer.writerow(
+                {
+                    **{col: "" for col in CSV_COLUMNS},
+                    "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
+                    "extras_json": json.dumps({"stub": True, "missing_dep": "jetson-stats", "import_error": str(exc)}),
+                }
+            )
+        return 1
+
+    with jtop() as poll, out_path.open("w", newline="", encoding="utf-8") as fh:
+        writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
+        writer.writeheader()
+        while poll.ok():
+            row = state_to_row(poll.stats)
+            writer.writerow(row)
+            fh.flush()
+            rows_written += 1
+            if samples_max is not None and rows_written >= samples_max:
+                break
+            time.sleep(interval_s)
+    return rows_written
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Sample jtop → CSV.")
+    parser.add_argument("--out", type=Path, required=True)
+    parser.add_argument("--interval", type=float, default=1.0, help="Poll interval in seconds.")
+    parser.add_argument("--samples-max", type=int, default=None)
+    args = parser.parse_args()
+    n = run(args.out, args.interval, args.samples_max)
+    print(f"jtop_parser: wrote {n} rows to {args.out}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
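The defensive extraction in `state_to_row` can be exercised without a Jetson. The sketch below re-implements the same lookup idea as a standalone helper (`nested_get` is a hypothetical name, not part of the commit) and runs it against two hand-made states, one dict-shaped and one attribute-shaped. Both states are assumptions for illustration; neither is captured from a real jetson-stats session.

```python
from types import SimpleNamespace


def nested_get(obj, *path, default=""):
    """Walk `path` through dicts or attribute objects; return `default` on any miss."""
    cur = obj
    for key in path:
        if cur is None:
            return default
        if isinstance(cur, dict):
            cur = cur.get(key, default)
        else:
            cur = getattr(cur, key, default)
    return cur if cur is not None else default


# Hypothetical states (assumed shapes, not real jtop output).
dict_state = {"ram": {"used": 2345, "tot": 7858}, "gpu": {"load": 12}}
attr_state = SimpleNamespace(ram=SimpleNamespace(used=2345, tot=7858), gpu=None)

dict_ram = nested_get(dict_state, "ram", "used")             # found via dict keys
attr_ram = nested_get(attr_state, "ram", "used")             # found via attributes
missing_freq = nested_get(dict_state, "gpu", "freq", "cur")  # path breaks part-way
none_gpu = nested_get(attr_state, "gpu", "load")             # intermediate value is None
```

A broken path yields the empty-string default rather than raising, which is what lets a CSV row be written even when a field is absent.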
@@ -0,0 +1,148 @@
+#!/usr/bin/env bash
+# Tier-2 Jetson hardware-loop entrypoint.
+#
+# Usage:
+#   ./run-tier2.sh --fc-adapter <ardupilot|inav> --vio-strategy <okvis2|klt_ransac|vins_mono> [--duration <5min|8h>] [--enable-chamber] [--jetson-host <host>]
+#
+# Pre-requisites (verified at startup):
+#   * The Jetson is provisioned per `_docs/02_document/tests/environment.md`
+#     § Execution instructions — Tier-2 (JetPack 6.2, CUDA, TensorRT 10.3, cuDNN).
+#   * `gps-denied-onboard.service` is installed via systemd
+#     (`tier2.service` is the template; operator copies it to /etc/systemd/system).
+#   * SITLs + mock + listener + runner reachable on the same network via
+#     `docker compose -f e2e/docker/docker-compose.test.yml -f e2e/docker/docker-compose.tier2-bridge.yml up ...`
+#     on a paired x86 host. (Same-Jetson SITL is also supported: pass --jetson-host localhost, which is the default.)
+#
+# Outputs the same CSV format as Tier-1 to ./e2e-results/run-${RUN_ID}/report.csv
+# plus the per-sample tegrastats + jtop CSVs in the evidence bundle.
+
+set -euo pipefail
+
+FC_ADAPTER=""
+VIO_STRATEGY=""
+DURATION="5min"
+ENABLE_CHAMBER=0
+JETSON_HOST_OVERRIDE=""
+
+usage() {
+    grep -E '^# ' "$0" | sed 's/^# //'
+    exit 1
+}
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --fc-adapter) FC_ADAPTER="$2"; shift 2 ;;
+        --vio-strategy) VIO_STRATEGY="$2"; shift 2 ;;
+        --duration) DURATION="$2"; shift 2 ;;
+        --enable-chamber) ENABLE_CHAMBER=1; shift ;;
+        --jetson-host) JETSON_HOST_OVERRIDE="$2"; shift 2 ;;
+        -h|--help) usage ;;
+        *) echo "Unknown arg: $1" >&2; usage ;;
+    esac
+done
+
+if [[ -z "$FC_ADAPTER" || -z "$VIO_STRATEGY" ]]; then
+    echo "ERROR: --fc-adapter and --vio-strategy are required" >&2
+    usage
+fi
+
+case "$FC_ADAPTER" in
+    ardupilot|inav) ;;
+    *) echo "ERROR: --fc-adapter must be ardupilot or inav (got: $FC_ADAPTER)" >&2; exit 2 ;;
+esac
+
+case "$VIO_STRATEGY" in
+    okvis2|klt_ransac|vins_mono) ;;
+    *) echo "ERROR: --vio-strategy must be okvis2 | klt_ransac | vins_mono (got: $VIO_STRATEGY)" >&2; exit 2 ;;
+esac
+
+# RUN_ID — caller may set; default is utc-stamp + adapter pair.
+: "${RUN_ID:=tier2-$(date -u +%Y%m%dT%H%M%SZ)-${FC_ADAPTER}-${VIO_STRATEGY}}"
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+RESULTS_DIR="${REPO_ROOT}/e2e-results/run-${RUN_ID}"
+EVIDENCE_DIR="${RESULTS_DIR}/evidence"
+
+mkdir -p "${EVIDENCE_DIR}"
+
+echo "[tier2] RUN_ID=${RUN_ID}"
+echo "[tier2] FC_ADAPTER=${FC_ADAPTER}  VIO_STRATEGY=${VIO_STRATEGY}  DURATION=${DURATION}"
+echo "[tier2] RESULTS_DIR=${RESULTS_DIR}"
+
+# ---------------------------------------------------------------------------
+# Pre-flight: confirm the SUT systemd unit is healthy.
+# ---------------------------------------------------------------------------
+if ! systemctl is-active --quiet gps-denied-onboard.service; then
+    echo "[tier2] gps-denied-onboard.service is not active — attempting restart..." >&2
+    sudo systemctl restart gps-denied-onboard.service
+    sleep 3
+    if ! systemctl is-active --quiet gps-denied-onboard.service; then
+        echo "[tier2] FATAL: gps-denied-onboard.service failed to start" >&2
+        sudo systemctl status gps-denied-onboard.service --no-pager || true
+        exit 3
+    fi
+fi
+
+# ---------------------------------------------------------------------------
+# Start tegrastats + jtop background samplers (evidence bundle inputs).
+# ---------------------------------------------------------------------------
+TEGRA_CSV="${EVIDENCE_DIR}/tegrastats.csv"
+JTOP_CSV="${EVIDENCE_DIR}/jtop.csv"
+
+# Sample tegrastats at 5 Hz (--interval 200 ms); parser converts to per-sample CSV rows.
+if command -v tegrastats >/dev/null 2>&1; then
+    tegrastats --interval 200 \
+        | python3 "${SCRIPT_DIR}/tegrastats_parser.py" --out "${TEGRA_CSV}" &
+    TEGRA_PID=$!
+else
+    echo "[tier2] WARNING: tegrastats not in PATH — skipping that evidence channel." >&2
+    TEGRA_PID=
+fi
+
+if command -v jtop >/dev/null 2>&1; then
+    python3 "${SCRIPT_DIR}/jtop_parser.py" --out "${JTOP_CSV}" --interval 1.0 &
+    JTOP_PID=$!
+else
+    echo "[tier2] WARNING: jtop not in PATH — skipping that evidence channel." >&2
+    JTOP_PID=
+fi
+
+cleanup() {
+    local rc=$?
+    [[ -n "${TEGRA_PID:-}" ]] && kill "${TEGRA_PID}" 2>/dev/null || true
+    [[ -n "${JTOP_PID:-}" ]] && kill "${JTOP_PID}" 2>/dev/null || true
+    echo "[tier2] cleanup complete (rc=${rc})"
+    exit "${rc}"
+}
+trap cleanup EXIT INT TERM
+
+# ---------------------------------------------------------------------------
+# Run the e2e suite — the runner image is the SAME as Tier-1; only TIER differs.
+# ---------------------------------------------------------------------------
+JETSON_HOST_ARG="${JETSON_HOST_OVERRIDE:-localhost}"
+CHAMBER_ARG=()
+[[ "${ENABLE_CHAMBER}" -eq 1 ]] && CHAMBER_ARG=("--enable-chamber")
+
+(
+    cd "${REPO_ROOT}/e2e/docker"
+    RUN_ID="${RUN_ID}" \
+    FC_ADAPTER="${FC_ADAPTER}" \
+    VIO_STRATEGY="${VIO_STRATEGY}" \
+    TIER="tier2-jetson" \
+    JETSON_HOST="${JETSON_HOST_ARG}" \
+    docker compose \
+        -f docker-compose.test.yml \
+        -f docker-compose.tier2-bridge.yml \
+        run --rm \
+        -e TIER=tier2-jetson \
+        e2e-runner \
+        pytest /test-suite \
+            --csv="/e2e-results/run-${RUN_ID}/report.csv" \
+            --csv-columns="test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths" \
+            --evidence-out="/e2e-results/run-${RUN_ID}/evidence" \
+            --build-kind=production \
+            "${CHAMBER_ARG[@]}"
+)
+
+echo "[tier2] Suite complete. Report: ${RESULTS_DIR}/report.csv"
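The `--csv-columns` argument above pins the report to the 11-column schema named in the commit message. As a minimal sketch of what one conforming row looks like, the snippet below writes a header and a single row with the standard `csv` module. It is not the `csv_reporter.py` plugin; `write_report` is a hypothetical helper, and every value in the example row (including the `test_id` and the `PASS` result string) is an assumption for illustration.

```python
import csv
import io

# Column order copied from the --csv-columns argument in run-tier2.sh.
REPORT_COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
)


def write_report(rows):
    """Serialize rows to CSV text, filling any missing column with ''."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(REPORT_COLUMNS))
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in REPORT_COLUMNS})
    return buf.getvalue()


# Hypothetical row; identifiers and values are illustrative only.
example_row = {
    "test_id": "EXAMPLE-01",
    "test_name": "test_smoke",
    "traces_to": "AC-1",
    "fc_adapter": "ardupilot",
    "vio_strategy": "okvis2",
    "tier": "tier2-jetson",
    "started_at_utc": "2026-05-16T13:22:44Z",
    "execution_time_ms": 1200,
    "result": "PASS",
    "evidence_paths": "evidence/tegrastats.csv",
}

report_text = write_report([example_row])
header_line, data_line = report_text.splitlines()
```

Filling absent keys with an empty string keeps every row at exactly 11 fields, so a missing `error_message` on a passing test does not shift later columns.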
@@ -0,0 +1,131 @@
+"""Parse tegrastats output stream → per-sample CSV rows.
+
+tegrastats emits one line per sample ("RAM 2345/7858MB ..."), covering RAM,
+GPU MHz, GPU load, CPU load per-core, and thermal zone readings. Rows are
+stamped with the parser's receive time, not a timestamp read from the line.
+
+This parser is intentionally tolerant of unknown fields — JetPack 6.2 vs
+6.3 vary in which tags they emit. The full raw line is stored in the
+``extras_json`` column so downstream analysis can still inspect it.
+
+Schema (CSV columns):
+    timestamp_utc_iso, ram_used_mb, ram_total_mb, gpu_load_pct,
+    gpu_freq_mhz, cpu_load_avg_pct, soc_temp_c, gpu_temp_c, extras_json
+
+Usage:
+    tegrastats --interval 200 | python3 tegrastats_parser.py --out out.csv
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import json
+import re
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import IO
+
+UTC = timezone.utc
+
+
+CSV_COLUMNS = (
+    "timestamp_utc_iso",
+    "ram_used_mb",
+    "ram_total_mb",
+    "gpu_load_pct",
+    "gpu_freq_mhz",
+    "cpu_load_avg_pct",
+    "soc_temp_c",
+    "gpu_temp_c",
+    "extras_json",
+)
+
+_RAM_RE = re.compile(r"RAM\s+(\d+)/(\d+)MB")
+_GR3D_RE = re.compile(r"GR3D_FREQ\s+(\d+)%@?(\d+)?")
+_CPU_RE = re.compile(r"CPU\s+\[([^\]]+)\]")
+_SOC_TEMP_RE = re.compile(r"(?:SOC|cpu)@(\d+(?:\.\d+)?)C", re.IGNORECASE)
+_GPU_TEMP_RE = re.compile(r"GPU@(\d+(?:\.\d+)?)C", re.IGNORECASE)
+
+
+def parse_line(line: str) -> dict[str, object] | None:
+    """Parse one tegrastats line. Returns None if the line is blank."""
+    line = line.strip()
+    if not line:
+        return None
+
+    row: dict[str, object] = {
+        "timestamp_utc_iso": datetime.now(UTC).isoformat(timespec="milliseconds"),
+        "ram_used_mb": "",
+        "ram_total_mb": "",
+        "gpu_load_pct": "",
+        "gpu_freq_mhz": "",
+        "cpu_load_avg_pct": "",
+        "soc_temp_c": "",
+        "gpu_temp_c": "",
+        "extras_json": "",
+    }
+
+    if m := _RAM_RE.search(line):
+        row["ram_used_mb"] = m.group(1)
+        row["ram_total_mb"] = m.group(2)
+
+    if m := _GR3D_RE.search(line):
+        row["gpu_load_pct"] = m.group(1)
+        if m.group(2):
+            row["gpu_freq_mhz"] = m.group(2)
+
+    if m := _CPU_RE.search(line):
+        cpu_field = m.group(1)
+        # Pattern looks like "67%@1190,55%@1190,..." or "off,55%@1190,..."
+        loads: list[float] = []
+        for tok in cpu_field.split(","):
+            head = tok.strip().split("%", 1)[0]
+            try:
+                loads.append(float(head))
+            except ValueError:
+                continue
+        if loads:
+            row["cpu_load_avg_pct"] = f"{sum(loads) / len(loads):.1f}"
+
+    if m := _SOC_TEMP_RE.search(line):
+        row["soc_temp_c"] = m.group(1)
+    if m := _GPU_TEMP_RE.search(line):
+        row["gpu_temp_c"] = m.group(1)
+
+    # The full raw line is always kept in extras for downstream debugging,
+    # so we never silently drop data.
+    extras = {"raw": line}
+    row["extras_json"] = json.dumps(extras, separators=(",", ":"))
+    return row
+
+
+def stream_to_csv(source: IO[str], out_path: Path) -> int:
+    """Stream tegrastats lines from ``source`` to a CSV file. Returns rows written."""
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    rows_written = 0
+    with out_path.open("w", newline="", encoding="utf-8") as fh:
+        writer = csv.DictWriter(fh, fieldnames=list(CSV_COLUMNS))
+        writer.writeheader()
+        for line in source:
+            row = parse_line(line)
+            if row is None:
+                continue
+            writer.writerow(row)
+            fh.flush()
+            rows_written += 1
+    return rows_written
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Parse tegrastats to CSV.")
+    parser.add_argument("--out", type=Path, required=True)
+    args = parser.parse_args()
+    n = stream_to_csv(sys.stdin, args.out)
+    print(f"tegrastats_parser: wrote {n} rows to {args.out}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
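To make the field extraction concrete, the sketch below applies the same RAM and CPU patterns to one hand-written line. The sample string is an assumption in the general tegrastats shape, not output captured from hardware, and the variable names are illustrative. It shows how an `off` core is skipped when averaging per-core load.

```python
import re

# Same patterns as the parser's _RAM_RE and _CPU_RE.
RAM_RE = re.compile(r"RAM\s+(\d+)/(\d+)MB")
CPU_RE = re.compile(r"CPU\s+\[([^\]]+)\]")

# Hand-written sample line (assumed format, for illustration only).
sample = "RAM 2345/7858MB CPU [67%@1190,off,55%@1190] GR3D_FREQ 12%@624 GPU@45.5C"

ram = RAM_RE.search(sample)
ram_used_mb, ram_total_mb = ram.group(1), ram.group(2)

loads = []
for tok in CPU_RE.search(sample).group(1).split(","):
    head = tok.strip().split("%", 1)[0]  # text before the first '%'
    try:
        loads.append(float(head))
    except ValueError:
        continue  # "off" cores carry no load figure, so they are skipped

# Mean over online cores only, formatted to one decimal like the parser.
cpu_load_avg_pct = f"{sum(loads) / len(loads):.1f}"
```

Averaging only the cores that report a load means a powered-down core does not drag the mean toward zero.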
@@ -0,0 +1,44 @@
+# systemd unit template for the SUT on Tier-2 Jetson runners.
+#
+# Copy to /etc/systemd/system/gps-denied-onboard.service, edit the
+# Environment= lines for the local deployment, then:
+#     sudo systemctl daemon-reload
+#     sudo systemctl enable --now gps-denied-onboard.service
+#
+# `run-tier2.sh` restarts the unit when it finds it inactive at suite
+# start, so the unit must come back healthy on its own. RestartSec is short
+# because Tier-2 tests budget 4 hours per matrix entry (`environment.md`
+# § Timeout) and a slow restart cuts into that budget.
+
+[Unit]
+Description=gps-denied-onboard companion service (Tier-2 Jetson)
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=exec
+User=azaion
+Group=azaion
+WorkingDirectory=/opt/gps-denied-onboard
+Environment=ONBOARD_FC_ADAPTER=ardupilot
+Environment=ONBOARD_VIO_STRATEGY=okvis2
+Environment=MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey
+Environment=TILE_CACHE_PATH=/var/azaion/tile-cache
+Environment=FDR_OUTPUT_PATH=/var/azaion/fdr
+ExecStart=/opt/gps-denied-onboard/bin/gps-denied-onboard --config /etc/azaion/onboard.yaml
+Restart=on-failure
+RestartSec=2s
+StandardOutput=journal
+StandardError=journal
+# Resource budget mirrors restrictions.md § Onboard Hardware: 25 W TDP,
+# 8 GB shared LPDDR5. systemd cgroup limits are a defence-in-depth gate;
+# the SUT enforces these internally too.
+MemoryMax=6G
+TasksMax=512
+# Hardening for the SUT only: /sys and /proc/sys stay readable but read-only. tegrastats / jtop run outside this unit.
+ProtectKernelTunables=true
+ProtectKernelModules=true
+NoNewPrivileges=true
+
+[Install]
+WantedBy=multi-user.target