Oleksandr Bezdieniezhnykh 6599d828d2 [AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter
Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.

AZ-407 — Static fixture builders (3pt):
  * tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
    deterministic tile-cache-fixture Docker volume from
    _docs/00_problem/input_data/. Reproducibility primitives: sorted
    iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
    threaded with seeded stub descriptors.
  * age-injector/{age_injector.py, inject.sh} clones the volume and
    shifts capture_date by N×30.44 days; tile JPEG bytes preserved
    bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
  * cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
    Derkachi sector centre, schema v1.
  * secrets/mavlink-test-passkey.txt: 64-hex with required
    `# TEST ONLY` header line per AC-5. Passkey-equality test now
    compares the secret line after stripping the header.
  * security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
    (truncated SOS marker). OpenCV 4.11.x rejects gracefully with
    imdecode → None. AZ-439 will sharpen for ASan instrumentation.
  * Top-level Makefile with `make fixtures` / `make fixtures-*` /
    `make e2e-tier1*` / `make unit-tests` targets.
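
The age-injector arithmetic above can be sketched in integer Bash. This is an illustration only, not the code in age_injector.py: the function names are hypothetical, and it assumes that "shifting" means moving capture_date into the past so the tile looks older. One synthetic month is 30.44 days, which is exactly 2,630,016 seconds, so no floating point is needed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from age_injector.py.
# One synthetic month = 30.44 days = 30.44 * 86400 = 2630016 seconds.
SECONDS_PER_SYNTH_MONTH=2630016

# age_shift_seconds N -> number of seconds in N synthetic months.
age_shift_seconds() {
    local months="$1"
    echo $(( months * SECONDS_PER_SYNTH_MONTH ))
}

# shift_capture_epoch EPOCH N -> EPOCH moved N synthetic months back.
# Assumption: the shift direction is into the past.
shift_capture_epoch() {
    local epoch="$1" months="$2"
    echo $(( epoch - $(age_shift_seconds "$months") ))
}

age_shift_seconds 7     # 18410112 seconds (213.08 days)
age_shift_seconds 13    # 34190208 seconds (395.72 days)
```

Working in whole seconds avoids rounding drift, which matters if the shifted dates are expected to be reproducible across runs.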

AZ-444 — Tier-2 Jetson harness wrapper (5pt):
  * run-tier2.sh rewritten as orchestrator. Detects local
    (TIER2_HOST=localhost, or unset on aarch64) vs remote (ssh into
    TIER2_HOST).
    New flags: -k/--selector, --build-kind production|asan,
    --reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
    --dry-run.
  * tier2-on-jetson.sh (new) — on-device delegate. Verifies
    gps-denied-onboard{,-asan}.service health; restarts with 5s
    tolerance; spawns tegrastats + jtop parallel samplers; tails
    ASan unit's journal in asan mode; drives docker compose with
    TIER=tier2-jetson; forwards SELECTOR to pytest -k.
  * docker/run-tier1.sh (new) — selector-parity sibling.
  * AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
    --dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
    loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
    unit-test layer).
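
The local vs remote decision in run-tier2.sh can be expressed as a pure function, which makes it testable without a Jetson. The sketch below is a hedged restatement of that logic; `resolve_tier2_host` and `tier2_mode` are hypothetical names, and the architecture is passed in as an argument instead of being read from `uname -m` so that both branches can be exercised on any machine.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real script inlines this logic.

# resolve_tier2_host HOST ARCH
# Prints the effective host. An empty HOST falls back to "localhost" on
# aarch64 and is an error (status 5) on any other architecture.
resolve_tier2_host() {
    local host="$1" arch="$2"
    if [[ -n "$host" ]]; then
        echo "$host"
    elif [[ "$arch" == "aarch64" ]]; then
        echo "localhost"
    else
        echo "ERROR: TIER2_HOST must be set on a non-Jetson host (arch: $arch)" >&2
        return 5
    fi
}

# tier2_mode HOST -> "local" when HOST is localhost, otherwise "remote".
tier2_mode() {
    if [[ "$1" == "localhost" ]]; then
        echo "local"
    else
        echo "remote"
    fi
}
```

A caller would typically invoke it as `resolve_tier2_host "${TIER2_HOST:-}" "$(uname -m)"`.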

AZ-445 — CSV reporter + evidence bundler refinements (2pt):
  * reporting/nfr_recorder.py (new) — pytest plugin. Provides the
    `nfr_recorder` fixture with record_metric(name, value, ac_id)
    and partial(ac_id, reason). At session end emits:
      - per-nfr/<scenario_id>.json (AC-1)
      - traceability-status.json with every AC ID parsed from
        traceability-matrix.md, classified Covered/PARTIAL/NOT
        COVERED with source scenario IDs (AC-2)
      - regression-baseline.json with all numeric metrics (AC-3)
  * csv_reporter.py extended — `_outcome_to_result` consults the
    aggregator; rows flip PASS → PARTIAL when an AC was marked
    PARTIAL by nfr_recorder (AC-4). Graceful fallback when
    aggregator isn't registered (unit-test contexts).
  * conftest.py registers nfr_recorder in pytest_plugins.
  * New --traceability-matrix CLI flag seeds the NOT COVERED rows.

Build / config:
  * pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
    tile-cache-builder unit test (broad enough to keep torchvision's
    Pillow 12 pin happy; the production builder runs inside its own
    Docker image with its own pin).
  * Updated test_directory_layout.py to cover 10 new files + replaced
    the byte-equal passkey assertion with the header-stripping
    variant.

Test results:
  * 157 focused tests pass (was 97 in batch 67; +60 new across this
    batch). No regressions.

Module-layout / spec drift:
  * AZ-407 spec text says `tests/fixtures/...`; module-layout
    blackbox_tests entry (commit d7a17a8) authoritatively places the
    harness under `e2e/`. Implementation followed the layout entry.
  * AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
    at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
    consistency.
  * Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
    AZ-419's own Dependencies field.

Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:18:01 +03:00

#!/usr/bin/env bash
# Tier-2 Jetson hardware-loop entrypoint (orchestrator).
#
# This script runs FROM a control host (typically x86) and ssh-orchestrates
# the on-Jetson half (`tier2-on-jetson.sh`). When invoked on the Jetson
# itself (TIER2_HOST=localhost, or TIER2_HOST unset and uname -m == aarch64),
# it delegates directly without going through ssh.
#
# Usage:
#   ./run-tier2.sh \
#       --fc-adapter <ardupilot|inav> \
#       --vio-strategy <okvis2|klt_ransac|vins_mono> \
#       [-k <pytest selector>] \
#       [--build-kind <production|asan>] \
#       [--duration <5min|8h>] \
#       [--enable-chamber] \
#       [--reflash] \
#       [--dry-run]
#
# Required env vars (when TIER2_HOST != localhost):
#   TIER2_HOST      Jetson hostname or IP
#   TIER2_USER      SSH user on the Jetson
#   TIER2_KEY_PATH  Path to the SSH private key
#
# Pre-requisites verified at startup:
#   * The Jetson is provisioned per `_docs/02_document/tests/environment.md`
#     § Execution instructions — Tier-2 (JetPack 6.2, CUDA, TensorRT 10.3,
#     cuDNN).
#   * `gps-denied-onboard.service` (or `gps-denied-onboard-asan.service`
#     for --build-kind=asan) is installed via systemd. `tier2.service` is
#     the template.
#   * SITLs + mock + listener + runner reachable on the same network via
#     `docker compose -f e2e/docker/docker-compose.test.yml
#                     -f e2e/docker/docker-compose.tier2-bridge.yml up ...`
#     on a paired x86 host (same as Tier-1's `docker-compose.test.yml`
#     network).
#
# Outputs the same CSV format as Tier-1 to
#   ./e2e-results/run-${RUN_ID}/report.csv
# plus the per-sample tegrastats + jtop CSVs in the evidence bundle.
set -euo pipefail
FC_ADAPTER=""
VIO_STRATEGY=""
SELECTOR=""
BUILD_KIND="production"
DURATION="5min"
ENABLE_CHAMBER=0
RUN_REFLASH=0
DRY_RUN=0
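usage() {
    # Print only the header comment block (from the line after the shebang
    # up to the first non-comment line), so inline comments further down
    # the script do not leak into the help text.
    awk 'NR > 1 && /^#/ { sub(/^# ?/, ""); print; next } NR > 1 { exit }' "$0" >&2
    exit 1
}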
while [[ $# -gt 0 ]]; do
    case "$1" in
        --fc-adapter)     FC_ADAPTER="$2"; shift 2 ;;
        --vio-strategy)   VIO_STRATEGY="$2"; shift 2 ;;
        -k|--selector)    SELECTOR="$2"; shift 2 ;;
        --build-kind)     BUILD_KIND="$2"; shift 2 ;;
        --duration)       DURATION="$2"; shift 2 ;;
        --enable-chamber) ENABLE_CHAMBER=1; shift ;;
        --reflash)        RUN_REFLASH=1; shift ;;
        --dry-run)        DRY_RUN=1; shift ;;
        -h|--help)        usage ;;
        *) echo "Unknown arg: $1" >&2; usage ;;
    esac
done
if [[ -z "$FC_ADAPTER" || -z "$VIO_STRATEGY" ]]; then
    echo "ERROR: --fc-adapter and --vio-strategy are required" >&2
    usage
fi
case "$FC_ADAPTER" in
    ardupilot|inav) ;;
    *) echo "ERROR: --fc-adapter must be ardupilot or inav (got: $FC_ADAPTER)" >&2; exit 2 ;;
esac
case "$VIO_STRATEGY" in
    okvis2|klt_ransac|vins_mono) ;;
    *) echo "ERROR: --vio-strategy must be okvis2 | klt_ransac | vins_mono (got: $VIO_STRATEGY)" >&2; exit 2 ;;
esac
case "$BUILD_KIND" in
    production|asan) ;;
    *) echo "ERROR: --build-kind must be production or asan (got: $BUILD_KIND)" >&2; exit 2 ;;
esac
# AC-6 (image-flash gating). Even when --reflash is requested, refuse to
# proceed unless the operator has acknowledged via TIER2_REFLASH_ACK=1.
# This is a two-key gate so a stray flag flip in CI cannot accidentally
# re-provision a development board.
if [[ "${RUN_REFLASH}" -eq 1 ]]; then
    if [[ "${TIER2_REFLASH_ACK:-0}" != "1" ]]; then
        echo "ERROR: --reflash requires TIER2_REFLASH_ACK=1 in the env" >&2
        echo " This is a destructive operation; set the ack to" >&2
        echo " confirm you intend to re-flash the Jetson via" >&2
        echo " nvidia-sdkmanager-cli." >&2
        exit 4
    fi
fi
# RUN_ID — caller may set; default is utc-stamp + adapter pair.
: "${RUN_ID:=tier2-$(date -u +%Y%m%dT%H%M%SZ)-${FC_ADAPTER}-${VIO_STRATEGY}}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
# ---------------------------------------------------------------------------
# Determine mode:
# * local mode — run on the Jetson itself; no ssh wrapper.
# Triggered when TIER2_HOST=localhost OR is unset on an aarch64 host.
# * remote mode — orchestrator: ssh into TIER2_HOST and execute the
# on-Jetson delegate there.
# ---------------------------------------------------------------------------
TIER2_HOST="${TIER2_HOST:-}"
if [[ -z "${TIER2_HOST}" ]]; then
    if [[ "$(uname -m)" == "aarch64" ]]; then
        TIER2_HOST="localhost"
    else
        echo "ERROR: TIER2_HOST must be set when running from a non-Jetson host" >&2
        echo " (uname -m is $(uname -m); this script is not running on a Jetson)" >&2
        exit 5
    fi
fi
echo "[tier2] RUN_ID=${RUN_ID}"
echo "[tier2] FC_ADAPTER=${FC_ADAPTER} VIO_STRATEGY=${VIO_STRATEGY} BUILD_KIND=${BUILD_KIND}"
echo "[tier2] SELECTOR='${SELECTOR}' DURATION=${DURATION} ENABLE_CHAMBER=${ENABLE_CHAMBER}"
echo "[tier2] TIER2_HOST=${TIER2_HOST}"
# ---------------------------------------------------------------------------
# Build the ssh command prefix for the orchestrator mode.
# ---------------------------------------------------------------------------
SSH_CMD=""
if [[ "${TIER2_HOST}" != "localhost" ]]; then
    : "${TIER2_USER:?TIER2_USER must be set for remote orchestrator mode}"
    : "${TIER2_KEY_PATH:?TIER2_KEY_PATH must be set for remote orchestrator mode}"
    if [[ ! -f "${TIER2_KEY_PATH}" ]]; then
        echo "ERROR: TIER2_KEY_PATH does not point at a real file: ${TIER2_KEY_PATH}" >&2
        exit 6
    fi
    SSH_CMD="ssh -o StrictHostKeyChecking=accept-new -i ${TIER2_KEY_PATH} ${TIER2_USER}@${TIER2_HOST}"
fi
# ---------------------------------------------------------------------------
# AC-2: idempotent provisioning. apt update + install is idempotent on
# its own; we still skip it when python3-pip is already installed (the
# `dpkg -s` check below) because re-running it on every test invocation
# is needlessly slow.
# ---------------------------------------------------------------------------
provision_jetson() {
    local PROVISION_CMD
    PROVISION_CMD="set -eu;
if ! dpkg -s python3-pip >/dev/null 2>&1; then
    sudo apt-get update;
    sudo apt-get install -y --no-install-recommends \
        python3-pip docker.io openssh-client iproute2;
fi"
    if [[ "${TIER2_HOST}" == "localhost" ]]; then
        bash -c "${PROVISION_CMD}"
    else
        # shellcheck disable=SC2086
        ${SSH_CMD} "${PROVISION_CMD}"
    fi
}
# ---------------------------------------------------------------------------
# AC-6: reflash via NVIDIA's sdkmanager-cli. This is the destructive
# path; only runs when --reflash AND TIER2_REFLASH_ACK=1 are BOTH set.
# ---------------------------------------------------------------------------
reflash_jetson() {
    local FLASH_CMD
    FLASH_CMD="set -eu;
if ! command -v nvidia-sdkmanager-cli >/dev/null 2>&1; then
    echo 'ERROR: nvidia-sdkmanager-cli not installed on Jetson' >&2
    exit 7
fi
echo '[tier2] re-flashing JetPack image via nvidia-sdkmanager-cli...' >&2
nvidia-sdkmanager-cli flash --target-spec jetson-orin-nano-super"
    if [[ "${TIER2_HOST}" == "localhost" ]]; then
        bash -c "${FLASH_CMD}"
    else
        # shellcheck disable=SC2086
        ${SSH_CMD} "${FLASH_CMD}"
    fi
}
# ---------------------------------------------------------------------------
# Execute the on-Jetson delegate.
# ---------------------------------------------------------------------------
ENV_PREFIX=(
    "RUN_ID=${RUN_ID}"
    "FC_ADAPTER=${FC_ADAPTER}"
    "VIO_STRATEGY=${VIO_STRATEGY}"
    "BUILD_KIND=${BUILD_KIND}"
    "SELECTOR=${SELECTOR}"
    "ENABLE_CHAMBER=${ENABLE_CHAMBER}"
    "JETSON_HOST=${TIER2_HOST}"
)
if [[ "${TIER2_HOST}" == "localhost" ]]; then
    DELEGATE_CMD=(env "${ENV_PREFIX[@]}" "${SCRIPT_DIR}/tier2-on-jetson.sh")
else
    # Remote mode: rsync the e2e/ tree onto the Jetson and run the
    # delegate over ssh. We mirror the repo to /opt/azaion-e2e/ on the
    # Jetson; subsequent invocations are incremental via rsync's default
    # delta-transfer.
    REMOTE_REPO="/opt/azaion-e2e"
    RSYNC_CMD="rsync -az --delete -e 'ssh -o StrictHostKeyChecking=accept-new -i ${TIER2_KEY_PATH}' ${REPO_ROOT}/e2e/ ${TIER2_USER}@${TIER2_HOST}:${REMOTE_REPO}/e2e/"
    DELEGATE_CMD=(
        bash -c
        "${RSYNC_CMD} && ${SSH_CMD} \"env $(printf '%q ' "${ENV_PREFIX[@]}")${REMOTE_REPO}/e2e/jetson/tier2-on-jetson.sh\""
    )
fi
if [[ "${DRY_RUN}" -eq 1 ]]; then
    echo "[tier2] --dry-run: showing actions that would execute, then exiting."
    echo "[tier2] provision: ${SSH_CMD:-(local)} apt-get install -y python3-pip docker.io openssh-client iproute2"
    if [[ "${RUN_REFLASH}" -eq 1 ]]; then
        echo "[tier2] reflash: ${SSH_CMD:-(local)} nvidia-sdkmanager-cli flash --target-spec jetson-orin-nano-super"
    fi
    echo "[tier2] delegate: ${DELEGATE_CMD[*]}"
    exit 0
fi
provision_jetson
[[ "${RUN_REFLASH}" -eq 1 ]] && reflash_jetson
"${DELEGATE_CMD[@]}"
echo "[tier2] Suite complete. RUN_ID=${RUN_ID}"
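
The script above keeps the ssh prefix in a plain string (`SSH_CMD`) and relies on word splitting, which would break if `TIER2_KEY_PATH` contained spaces. One possible hardening, shown here only as a sketch, is to hold the prefix in a Bash array. The names `build_ssh_argv` and `SSH_ARGV` are hypothetical and do not exist in the script; the sample key path is made up.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to the string-valued SSH_CMD.

# build_ssh_argv KEY_PATH USER HOST
# Populates the global array SSH_ARGV with one element per ssh argument,
# so a key path containing spaces stays a single argument.
build_ssh_argv() {
    local key_path="$1" user="$2" host="$3"
    SSH_ARGV=(
        ssh
        -o StrictHostKeyChecking=accept-new
        -i "$key_path"
        "${user}@${host}"
    )
}

build_ssh_argv "/home/ci/my keys/jetson_ed25519" tester jetson-01
printf '%s\n' "${#SSH_ARGV[@]}"   # prints 6

# A remote command would then be run as:
#   "${SSH_ARGV[@]}" "uname -m"
```

Adopting this would also mean changing every `${SSH_CMD}` call site and the `--dry-run` echo lines, so any unit tests asserting on the dry-run output would need to be updated alongside.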