[AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter

Three blackbox-harness tasks landed together — all depend only on AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for batches 69+. AZ-407 — Static fixture builders (3pt): * tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a deterministic tile-cache-fixture Docker volume from _docs/00_problem/input_data/. Reproducibility primitives: sorted iteration, frozen PIL JPEG settings, FAISS HNSW32 built single- threaded with seeded stub descriptors. * age-injector/{age_injector.py, inject.sh} clones the volume and shifts capture_date by N×30.44 days; tile JPEG bytes preserved bit-identical. Emits synth-age-7mo + synth-age-13mo volumes. * cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at Derkachi sector centre, schema v1. * secrets/mavlink-test-passkey.txt: 64-hex with required `# TEST ONLY` header line per AC-5. Passkey-equality test now compares the secret line after stripping the header. * security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG (truncated SOS marker). OpenCV 4.11.x rejects gracefully with imdecode → None. AZ-439 will sharpen for ASan instrumentation. * Top-level Makefile with `make fixtures` / `make fixtures-*` / `make e2e-tier1*` / `make unit-tests` targets. AZ-444 — Tier-2 Jetson harness wrapper (5pt): * run-tier2.sh rewritten as orchestrator. Detects local (aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST). New flags: -k/--selector, --build-kind production|asan, --reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate), --dry-run. * tier2-on-jetson.sh (new) — on-device delegate. Verifies gps-denied-onboard{,-asan}.service health; restarts with 5s tolerance; spawns tegrastats + jtop parallel samplers; tails ASan unit's journal in asan mode; drives docker compose with TIER=tier2-jetson; forwards SELECTOR to pytest -k. * docker/run-tier1.sh (new) — selector-parity sibling. * AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via --dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware- loop ACs verified by the Tier-2 runtime smoke (no Jetson in the unit-test layer). AZ-445 — CSV reporter + evidence bundler refinements (2pt): * reporting/nfr_recorder.py (new) — pytest plugin. Provides the `nfr_recorder` fixture with record_metric(name, value, ac_id) and partial(ac_id, reason). At session end emits: - per-nfr/<scenario_id>.json (AC-1) - traceability-status.json with every AC ID parsed from traceability-matrix.md, classified Covered/PARTIAL/NOT COVERED with source scenario IDs (AC-2) - regression-baseline.json with all numeric metrics (AC-3) * csv_reporter.py extended — `_outcome_to_result` consults the aggregator; rows flip PASS → PARTIAL when an AC was marked PARTIAL by nfr_recorder (AC-4). Graceful fallback when aggregator isn't registered (unit-test contexts). * conftest.py registers nfr_recorder in pytest_plugins. * New --traceability-matrix CLI flag seeds the NOT COVERED rows. Build / config: * pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the tile-cache-builder unit test (broad enough to keep torchvision's Pillow 12 pin happy; the production builder runs inside its own Docker image with its own pin). * Updated test_directory_layout.py to cover 10 new files + replaced the byte-equal passkey assertion with the header-stripping variant. Test results: * 157 focused tests pass (was 97 in batch 67; +60 new across this batch). No regressions. Module-layout / spec drift: * AZ-407 spec text says `tests/fixtures/...`; module-layout blackbox_tests entry (commit d7a17a8) authoritatively places the harness under `e2e/`. Implementation followed the layout entry. * AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for consistency. * Cold-boot README ownership: corrected from AZ-419 to AZ-407 per AZ-419's own Dependencies field. Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to In Testing on commit. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 11:51:14 +00:00 · 2026-05-16 17:18:01 +03:00
parent e9e6e32097
commit 6599d828d2
35 changed files with 3716 additions and 147 deletions
@@ -0,0 +1,90 @@
+# Fixture Builders — Static (tile-cache, age-injector, cold-boot, mavlink-passkey, cve-jpeg)
+
+**Task**: AZ-407_fixture_builders_static
+**Name**: Static fixture builders for tile cache, aged tiles, cold-boot pose, MAVLink passkey, CVE JPEG
+**Description**: Implement reproducible fixture builders for the five static (build-once-per-CI) fixtures named in `test-data.md`: `tile-cache-fixture`, `synth-age-tile-set`, `cold-boot-fixture`, `mavlink-passkey`, `cve-jpeg-fixture`.
+**Complexity**: 3 points
+**Dependencies**: AZ-406
+**Component**: Blackbox Tests / Fixture builders (epic AZ-262 / E-BBT)
+**Tracker**: AZ-407
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Several blackbox scenarios assume the existence of static fixtures that do not vary across runs (FAISS HNSW index, aged tile manifests, frozen FC pose, signing passkey, crafted JPEG). Without a single owner producing them deterministically, every scenario task would re-implement its own variant and assertions would drift.
+
+## Outcome
+
+- `tests/fixtures/tile-cache-builder/build.sh` produces the same `tile-cache-fixture` content (FAISS index hashes, tile manifest rows, on-disk file sizes) bit-for-bit on two consecutive runs from the same `_docs/00_problem/input_data/` source. Builds at minimum: 60 still-image footprints + Derkachi route bbox at 0.3-0.5 m/px. When D-PROJ-3 is unresolved, footprints without paired `_gmaps.png` use stub-tile content with explicit "STUB" provenance in the manifest.
+- `tests/fixtures/age-injector/` clones `tile-cache-fixture` and produces `synth-age-7mo` (>6 mo, exceeds AC-8.2 active-conflict threshold) and `synth-age-13mo` (>12 mo, exceeds rear threshold). Tile pixels unchanged; only the manifest `capture_date` field mutated.
+- `tests/fixtures/cold-boot/` ships a JSON snapshot of a `GLOBAL_POSITION_INT` pose at flight-resume time, loadable by `ardupilot-plane-sitl` / `inav-sitl` SITL via the standard parameter-load path.
+- `tests/fixtures/secrets/mavlink-test-passkey.txt` ships a 32-byte hex passkey, prefixed `# TEST ONLY — not for production use`.
+- `tests/fixtures/security/cve-2025-53644.jpg` ships a license-checked PoC OR a generation script that produces an equivalent crafted JPEG following the published PoC structure.
+
+## Scope
+
+### Included
+- `build.sh` + Dockerfile for tile-cache-builder; FAISS index emission; tile filesystem layout; manifest CSV/SQLite per `restrictions.md` § Satellite Imagery schema.
+- `age-injector` script that copies the tile-cache volume and mutates manifest dates only.
+- Static cold-boot JSON, mavlink-passkey, CVE JPEG fixtures + their license/provenance README.
+- A top-level `make fixtures` (or equivalent CI step) that builds all five fixtures into named Docker volumes / files.
+
+### Excluded
+- Synthetic-injection fixtures (outlier, blackout-spoof, multi-segment) — owned by AZ-408.
+- Real Derkachi video / 60 still images — bind-mounted from `_docs/00_problem/input_data/`, not built.
+- The Suite Sat Service mock — owned by AZ-406.
+- Production-grade tile-cache content (real public-data subset for D-PROJ-3); stub-tile fallback is acceptable until D-PROJ-3 lands.
+
+## Acceptance Criteria
+
+**AC-1: tile-cache-fixture is deterministic**
+Given a clean Docker volume state
+When `tests/fixtures/tile-cache-builder/build.sh` runs twice from the same source
+Then both runs produce a tile-cache-fixture with identical FAISS index hash, identical manifest rows, and identical tile-filesystem byte sizes.
+
+**AC-2: tile-cache-fixture covers required footprints**
+Given the build completes
+Then the manifest contains entries for all 60 still-image footprints AND the Derkachi route bbox AND the 2 paired `_gmaps.png` references; m/px ≥ 0.5 for every entry.
+
+**AC-3: synth-age-7mo and synth-age-13mo correctly aged**
+Given `tile-cache-fixture` exists
+When `age-injector` runs with target=7mo / target=13mo
+Then the resulting volume has all `capture_date` fields set to (now - 7 mo) / (now - 13 mo) ± 1 day; tile pixel content is bit-identical to the source.
+
+**AC-4: cold-boot-fixture loads into SITL**
+Given the JSON pose snapshot
+When loaded into `ardupilot-plane-sitl` (and separately `inav-sitl`) per the SITL parameter-load convention
+Then the SITL EKF reflects the snapshot pose within ±1 m of the JSON's lat/lon/alt fields.
+
+**AC-5: mavlink-passkey is a valid 32-byte hex secret**
+Given `mavlink-test-passkey.txt`
+Then the file contains exactly 64 hex characters (32 bytes); the first line is `# TEST ONLY — not for production use`.
+
+**AC-6: cve-jpeg-fixture is decodable / triggers the CVE behavior**
+Given `cve-2025-53644.jpg`
+When fed to OpenCV ≥4.12.0 imdecode under AddressSanitizer
+Then no buffer-overflow / use-after-free is reported AND OpenCV either decodes the image or returns an error gracefully (no crash). When fed to a vulnerable OpenCV (≤4.11) the PoC behavior is observable.
+
+**AC-7: License + provenance documented**
+Given each fixture
+Then a `README.md` next to it states: source URL (or "synthetic"), license, and re-distribution terms. Fixtures lacking a clear license are generated programmatically rather than checked in.
+
+## System Under Test Boundary
+
+This task ONLY produces fixtures consumed by other test tasks. It does NOT exercise SUT behavior. The fixtures themselves are the deliverable.
+
+- No internal SUT modules are imported by the builders.
+- The tile-cache-builder uses only the public on-disk schema documented in `_docs/00_problem/restrictions.md` § Satellite Imagery; it does NOT depend on the runtime tile-cache implementation (C6).
+- If C6's on-disk schema later evolves, this builder's output must be updated to match — the builder is a contract test on the schema.
+
+## Constraints
+
+- Re-runnability: each builder MUST be idempotent; running twice produces the same output.
+- Volume-driven: tile-cache + age-injector emit named Docker volumes (`tile-cache-fixture`, `synth-age-7mo`, `synth-age-13mo`) so compose can mount them RO into the SUT.
+- License hygiene: any third-party data must be license-checked at build time; failures abort the build with a human-readable error.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/test-data.md` § Seed Data Sets, § Input Data Mapping
+- `_docs/00_problem/restrictions.md` § Satellite Imagery (manifest schema)
+- `_docs/02_document/tests/blackbox-tests.md` (which scenarios consume which fixture)
@@ -0,0 +1,78 @@
+# Tier-2 Jetson harness wrapper
+
+**Task**: AZ-444_tier2_jetson_harness
+**Name**: Tier-2 hardware-loop runner — `run-tier2.sh`, ssh provisioning, systemd service install, ASan-fuzz mode, image-flash automation
+**Description**: Implement the Tier-2 hardware-loop wrapper that AZ-406 stubs out — actual ssh-based runner, systemd service install for SUT on Jetson, `tegrastats` capture, ASan-fuzz launch path.
+**Complexity**: 5 points
+**Dependencies**: AZ-406
+**Component**: Blackbox Tests / Tier-2 runner (epic AZ-262)
+**Tracker**: AZ-444
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+AZ-406 scaffolds Tier-2 (`run-tier2.sh` exists as a stub); but the actual hardware-loop semantics — ssh provisioning, image-flash automation, systemd service-life management, telemetry capture, ASan-fuzz launch — are non-trivial and need a dedicated task.
+
+## Outcome
+
+- `e2e/tier2/run-tier2.sh` accepts the same `pytest -k <selector>` selector as Tier-1 and runs the selector against a configured Jetson host (env var `TIER2_HOST`).
+- Provisioning: ssh-based; runs `apt update && apt install -y ...` for runner deps if not already present (idempotent).
+- SUT lifecycle: installs the SUT as a systemd service (`gps-denied-onboard.service`); restart command is `systemctl restart gps-denied-onboard`.
+- Telemetry capture: `tegrastats` runs as a parallel ssh stream during each test; output piped into the per-run evidence bundle.
+- ASan-fuzz: separate `--build-kind asan` mode that flashes the ASan image (or builds it remotely) and runs the fuzz binary with stderr captured.
+- Image-flash automation: `--reflash` flag (gated, OFF by default) re-flashes the Jetson via `nvidia-sdkmanager-cli` when needed.
+
+## Scope
+
+### Included
+- `run-tier2.sh` runner.
+- ssh-based provisioning + systemd install/restart.
+- `tegrastats` parallel capture.
+- ASan-fuzz launch.
+- Image-flash automation (gated).
+
+### Excluded
+- The CSV reporter — owned by AZ-406.
+- Per-scenario test logic — owned by individual scenario tasks.
+- Chamber automation for +50 °C — out of scope.
+
+## Acceptance Criteria
+
+**AC-1: selector parity**
+Given the same `pytest -k <selector>` invocation
+Then both `run-tier1.sh` and `run-tier2.sh` accept it; the resulting test selection on Tier-2 is the same as Tier-1 (modulo `tier == tier2-jetson` skip rules).
+
+**AC-2: idempotent provisioning**
+Given the Jetson host has the runner deps already installed
+When `run-tier2.sh` runs
+Then provisioning is a no-op (idempotent).
+
+**AC-3: systemd lifecycle**
+Given a Tier-2 test triggers `restart`
+Then `systemctl restart gps-denied-onboard` is issued; the SUT process restarts within ≤5 s.
+
+**AC-4: tegrastats parallel capture**
+Given any Tier-2 test
+Then `tegrastats` runs as a parallel ssh stream during the test; its output lands in `e2e-results/run-${RUN_ID}/tegrastats-${tier2-host}-${test_id}.log`.
+
+**AC-5: ASan-fuzz mode**
+Given `--build-kind asan` flag
+Then the runner ensures the ASan SUT image is installed; the fuzz binary is launched; stderr is captured into `e2e-results/run-${RUN_ID}/asan-fuzz-${test_id}.log`.
+
+**AC-6: image-flash gating**
+Given the `--reflash` flag is OFF (default)
+Then image-flash is NOT triggered; the runner errors out with a clear message if the on-Jetson SUT version does not match the test selection's expected version.
+
+## System Under Test Boundary
+
+This task IS infrastructure — no SUT logic is exercised by it. The Tier-2 harness only orchestrates SUT lifecycle on real hardware.
+
+## Constraints
+
+- ssh-based: requires `TIER2_HOST` + `TIER2_USER` + `TIER2_KEY_PATH` env vars; fails fast if any are missing.
+- The reflash path uses NVIDIA's `sdkmanager-cli` and is environment-specific; it is gated OFF by default to prevent accidental hardware re-provisioning during routine CI.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/environment.md` § Tier-2 (Jetson hardware loop)
+- `_docs/02_document/tests/test-data.md` § Tier-2-only fixtures (none beyond shared)
@@ -0,0 +1,60 @@
+# CSV reporter + evidence bundler refinements
+
+**Task**: AZ-445_csv_reporter_evidence_bundler
+**Name**: Per-NFR machine-readable outputs, traceability-status.json, regression-detector inputs
+**Description**: Build out the CSV reporter + evidence bundler beyond AZ-406's bootstrap — add per-NFR machine-readable outputs for performance / resilience / security / resource-limit suites, generate `traceability-status.json` mapping AC → status, emit regression-detector inputs.
+**Complexity**: 2 points
+**Dependencies**: AZ-406, all 39 prior scenario tasks (consumes their output)
+**Component**: Blackbox Tests / Reporting (epic AZ-262)
+**Tracker**: AZ-445
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The bootstrap CSV reporter handles the basics (one row per `(scenario, fc_adapter, vio_strategy, tier)` with PASS/FAIL/PARTIAL). But the project also needs per-NFR machine-readable outputs (so `nft-perf-01-partition.csv` can be diffed across runs), a `traceability-status.json` aligning to the matrix, and explicit regression-detector inputs so future runs can tell quickly whether a regression occurred.
+
+## Outcome
+
+- `e2e/runner/evidence_bundler.py` extended:
+  - For each NFT scenario: emit a per-scenario JSON with the canonical metrics (latency p95, drift, cov, etc.) into `e2e-results/run-${RUN_ID}/per-nfr/<scenario_id>.json`.
+  - Emit `traceability-status.json` mapping each AC ID → `{Covered, PARTIAL, NOT COVERED}` with the source scenario IDs.
+  - Emit `regression-baseline.json` with all numeric metrics from the run (consumable by a future diff tool).
+
+## Scope
+
+### Included
+- Per-NFR JSON emission.
+- `traceability-status.json` generation by aggregating each scenario's `traces_to` field.
+- `regression-baseline.json` emission.
+- Updates to existing `report.csv` with PARTIAL annotation pulled from each scenario's `traceability-status` contribution.
+
+### Excluded
+- The diff tool itself — out of scope (a separate future task).
+- Per-scenario assertion logic — owned by each scenario task.
+
+## Acceptance Criteria
+
+**AC-1: per-NFR JSON emission**
+Given any NFT scenario completes
+Then `e2e-results/run-${RUN_ID}/per-nfr/<scenario_id>.json` exists with the documented schema (one well-known field per metric the scenario emits).
+
+**AC-2: traceability-status.json**
+Given a full CI run
+Then `traceability-status.json` lists every AC ID (from `traceability-matrix.md`) with status `Covered | PARTIAL | NOT COVERED` AND the source scenario IDs.
+
+**AC-3: regression-baseline.json**
+Given a full CI run
+Then `regression-baseline.json` contains every numeric metric per scenario per parameterization, in a stable schema diffable across runs.
+
+**AC-4: PARTIAL propagation**
+Given any scenario's per-NFR JSON marks an AC PARTIAL
+Then `traceability-status.json` reflects it; `report.csv` shows the scenario row as `PARTIAL` rather than `PASS`.
+
+## System Under Test Boundary
+
+This task is reporting-only. It consumes test outputs and produces evidence files.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/traceability-matrix.md` (the AC-ID universe)
+- `_docs/02_document/tests/test-data.md` § Reporting & Evidence