diff --git a/_docs/02_tasks/todo/AZ-407_fixture_builders_static.md b/_docs/02_tasks/done/AZ-407_fixture_builders_static.md similarity index 100% rename from _docs/02_tasks/todo/AZ-407_fixture_builders_static.md rename to _docs/02_tasks/done/AZ-407_fixture_builders_static.md diff --git a/_docs/02_tasks/todo/AZ-444_tier2_jetson_harness.md b/_docs/02_tasks/done/AZ-444_tier2_jetson_harness.md similarity index 100% rename from _docs/02_tasks/todo/AZ-444_tier2_jetson_harness.md rename to _docs/02_tasks/done/AZ-444_tier2_jetson_harness.md diff --git a/_docs/02_tasks/todo/AZ-445_csv_reporter_evidence_bundler.md b/_docs/02_tasks/done/AZ-445_csv_reporter_evidence_bundler.md similarity index 100% rename from _docs/02_tasks/todo/AZ-445_csv_reporter_evidence_bundler.md rename to _docs/02_tasks/done/AZ-445_csv_reporter_evidence_bundler.md diff --git a/_docs/03_implementation/batch_68_report.md b/_docs/03_implementation/batch_68_report.md new file mode 100644 index 0000000..e0338fd --- /dev/null +++ b/_docs/03_implementation/batch_68_report.md @@ -0,0 +1,315 @@ +# Batch 68 Report — Test Implementation (cycle 1, batch 2 of test phase) + +**Batch**: 68 +**Date**: 2026-05-16 +**Context**: Test implementation (greenfield Step 10 — Implement Tests) +**Tasks**: AZ-407 (3pt), AZ-444 (5pt), AZ-445 (2pt) — 10 cp / 3 tasks +**Cycle**: 1 +**Verdict**: COMPLETE — PASS (self-reviewed) + +## Summary + +Three blackbox-harness tasks, all dependent only on AZ-406: + +### AZ-407 — Static fixture builders (3pt) + +Concrete deliverables for the five static fixtures named in +`test-data.md`: + +* **tile-cache-fixture** — `e2e/fixtures/tile-cache-builder/`: + `builder.py` (pure Python; emits tile JPEGs + sidecar JSON + + `manifest.csv` + FAISS HNSW `descriptors.index`), `Dockerfile` + (Python 3.10-slim + Pillow + numpy + faiss-cpu), `build.sh` + (Docker volume mode + `--local` unit-test mode). 
Reproducibility + primitives: sorted input iteration, fixed PIL JPEG settings + (`quality=85, optimize=False, progressive=False, subsampling=2`), + manifest rows sorted by `(zoom, x, y)`, FAISS single-threaded with + fixed seed. AC-1 verified by `test_builder_is_deterministic`. +* **age-injector** — `e2e/fixtures/age-injector/`: + `age_injector.py` (clones the tile tree bit-identical, mutates + manifest + sidecar `capture_date` to `now - age_months × 30.44d`), + `inject.sh` (emits `synth-age-7mo` + `synth-age-13mo` named Docker + volumes). Tile pixels remain byte-equal across age injection. +* **cold-boot-fixture** — `e2e/fixtures/cold-boot/cold_boot_fixture.json`: + Frozen FC pose snapshot at flight-resume time. Schema v1 carries + `global_position_int` (lat_e7 / lon_e7 / alt_mm / hdg_cdeg), + `attitude` (roll/pitch/yaw_rad), and per-FC param-load hints. The + fixture lat/lon sits inside the Derkachi bbox; AZ-419 (FT-P-11) + drives the SITL parameter-load path. +* **mavlink-test-passkey** — `e2e/fixtures/secrets/mavlink-test-passkey.txt`: + 64-hex passkey with the required `# TEST ONLY — not for production + use` header line. Sync with the Docker-secret file + `e2e/docker/secrets/mavlink_passkey` enforced by the updated + `test_passkey_files_match` (strips the comment header before byte + comparison). +* **cve-2025-53644.jpg** — `e2e/fixtures/security/`: + Synthetic malformed JPEG (truncated SOS marker, no EOI). The + generator `generate_cve_jpeg.py` emits a 158-byte file with + pinned SHA-256 `c281d2f25959…877002e`. OpenCV 4.11 (vulnerable + line) rejects gracefully with `imdecode → None`. AZ-439 (NFT-SEC-04) + will sharpen this for full ASan instrumentation. + +Top-level `Makefile` with `make fixtures` / `make fixtures-*` / +`make fixtures-unit-tests` / `make e2e-tier1` targets. + +Per-fixture READMEs document source, license, provenance, and +reproducibility per AC-7. 
+ +### AZ-444 — Tier-2 Jetson harness wrapper (5pt) + +The AZ-406 scaffold of `run-tier2.sh` covered the local-execution +on-Jetson path; AZ-444 splits the harness into the orchestrator-side +and on-device parts: + +* **`e2e/jetson/run-tier2.sh`** (rewritten) — orchestrator. Detects + local (aarch64 + TIER2_HOST=localhost) vs remote (ssh into + `TIER2_HOST`). Flags: `--fc-adapter`, `--vio-strategy`, + `-k`/`--selector`, `--build-kind production|asan`, `--duration`, + `--enable-chamber`, `--reflash`, `--dry-run`. Remote mode rsyncs + the `e2e/` tree to `/opt/azaion-e2e/` on the Jetson and ssh's the + on-device delegate. Reflash path requires both `--reflash` AND + `TIER2_REFLASH_ACK=1` (two-key gate). +* **`e2e/jetson/tier2-on-jetson.sh`** (new) — on-device delegate. + Verifies `gps-denied-onboard.service` (or `*-asan.service` for + `--build-kind=asan`); restarts with 5-second tolerance per AC-3; + spawns tegrastats + jtop parallel samplers per AC-4; tails the + ASan unit's journal into `asan-fuzz.log` when in asan mode; drives + the e2e-runner via docker compose with TIER=tier2-jetson; forwards + `SELECTOR` to pytest's `-k` per AC-1. +* **`e2e/docker/run-tier1.sh`** (new) — selector-parity sibling. + Same flag surface as `run-tier2.sh` minus the ssh / reflash + options. AC-1 verified by `test_selector_parity_pytest_args_equivalent` + which extracts the `-k ` from both dry-run outputs and + asserts the same string is present. + +ACs whose authentic verification path requires a Jetson are +documented in this report's "AC coverage" table and gated behind +docker-bound smoke tests inside the runner image. + +### AZ-445 — CSV reporter + evidence bundler refinements (2pt) + +* **`e2e/runner/reporting/nfr_recorder.py`** (new) — pytest plugin. + Provides the `nfr_recorder` fixture; tests call + `nfr_recorder.record_metric(name, value, ac_id)` and + `nfr_recorder.partial(ac_id, reason)`. 
At session end the plugin + emits three artifacts into the evidence dir: + - `per-nfr/.json` — one file per recorded scenario + (AC-1) + - `traceability-status.json` — every AC from + `_docs/02_document/tests/traceability-matrix.md` listed with + status ∈ {Covered, PARTIAL, NOT COVERED} and source scenarios + (AC-2) + - `regression-baseline.json` — flat numeric-metric dump for + diff tooling (AC-3) +* **`e2e/runner/reporting/csv_reporter.py`** (extended) — the + `_outcome_to_result` path now consults the aggregator: when an + NFR-recorded scenario has any PARTIAL AC, the row's `result` + column is `PARTIAL` instead of `PASS` (AC-4). Graceful fallback + when the aggregator isn't registered (unit-test contexts). +* **`e2e/runner/conftest.py`** — registers `nfr_recorder` in + `pytest_plugins`. +* New CLI flag `--traceability-matrix` (default: project's + `_docs/02_document/tests/traceability-matrix.md`) lets the + aggregator seed the NOT COVERED rows. + +The matrix parser uses two regex passes (`AC-…` and `RESTRICT-…` +table-row prefixes); 88 IDs in the current matrix file parse +cleanly. 
+ +## Files added / modified + +### Added (18) + +AZ-407: +* `e2e/fixtures/tile-cache-builder/builder.py` +* `e2e/fixtures/tile-cache-builder/Dockerfile` +* `e2e/fixtures/tile-cache-builder/build.sh` +* `e2e/fixtures/age-injector/age_injector.py` +* `e2e/fixtures/age-injector/inject.sh` +* `e2e/fixtures/cold-boot/cold_boot_fixture.json` +* `e2e/fixtures/security/cve-2025-53644.jpg` (158 bytes; generated) + +AZ-444: +* `e2e/jetson/tier2-on-jetson.sh` +* `e2e/docker/run-tier1.sh` + +AZ-445: +* `e2e/runner/reporting/nfr_recorder.py` + +Top-level: +* `Makefile` + +Unit tests (AZ-407 + AZ-444 + AZ-445): +* `e2e/_unit_tests/fixtures/test_tile_cache_builder.py` +* `e2e/_unit_tests/fixtures/test_age_injector.py` +* `e2e/_unit_tests/fixtures/test_cold_boot_fixture.py` +* `e2e/_unit_tests/fixtures/test_mavlink_passkey.py` +* `e2e/_unit_tests/fixtures/test_cve_jpeg.py` +* `e2e/_unit_tests/jetson/test_run_tier_scripts.py` +* `e2e/_unit_tests/reporting/test_nfr_recorder.py` + +### Modified (12) + +* `pyproject.toml` — added `Pillow>=10.4,<13.0` to dev extras + (used by `test_tile_cache_builder.py` to verify reproducibility + without Docker). +* `e2e/jetson/run-tier2.sh` — rewritten as the orchestrator (was a + local-only stub from AZ-406). +* `e2e/fixtures/secrets/mavlink-test-passkey.txt` — added the + required `# TEST ONLY — not for production use` header line per + AZ-407 AC-5. +* `e2e/fixtures/secrets/README.md` — expanded per AC-7 (license, + provenance, sync-with-docker-secret note). +* `e2e/fixtures/security/generate_cve_jpeg.py` — concrete impl + (replaces the AZ-406 NotImplementedError pointer). +* `e2e/fixtures/security/README.md` — expanded per AC-7. +* `e2e/fixtures/tile-cache-builder/README.md` — expanded per AC-7. +* `e2e/fixtures/age-injector/README.md` — expanded per AC-7. +* `e2e/fixtures/cold-boot/README.md` — expanded; clarified that + AZ-407 owns the JSON file (the prior README incorrectly pointed + at AZ-419). 
+* `e2e/runner/reporting/csv_reporter.py` — PARTIAL propagation + hook (AZ-445 AC-4). +* `e2e/runner/conftest.py` — registered `nfr_recorder` plugin. +* `e2e/_unit_tests/test_directory_layout.py` — added the new + paths (10 new files); replaced the byte-equal passkey assertion + with a header-stripping comparison. + +## Spec / module-layout drift notes + +* **AZ-407 spec uses `tests/fixtures/...` paths**, but the + `blackbox_tests` cross-cutting entry in + `_docs/02_document/module-layout.md` (added in preparatory commit + `d7a17a8`) authoritatively places the e2e harness under `e2e/`. + Implementation followed the module-layout entry; the spec text is + pre-fix and was not updated. The AZ-407 archived spec retains its + `tests/fixtures` wording for audit, but the actual file ownership + is `e2e/fixtures/...`. No further action — the module-layout + entry is the source of truth. +* **AZ-444 spec mentions `e2e/tier2/run-tier2.sh`**, but the + AZ-406 scaffold placed Tier-2 scripts under `e2e/jetson/`. + Kept at `e2e/jetson/` for consistency with the AZ-406 commit; + no behavioural difference. +* **Cold-boot ownership**: AZ-419 spec line "Dependencies: AZ-406, + AZ-407 (cold-boot-fixture)" confirms AZ-407 owns the JSON; the + scaffold's old README incorrectly attributed ownership to AZ-419. + Fixed in this batch. + +## Test Results + +### Focused tests (Step 6.4) + +`pytest e2e/_unit_tests/` — **157 passed in 12.59s** (was 97 in +batch 67; +60 new tests across this batch). + +Breakdown of new tests: + +* AZ-407 fixtures (30 cases): tile-cache determinism (7), age-injector + shift+pixel-preserve (5), cold-boot schema (5), MAVLink passkey (3), + CVE JPEG generator (5), provenance READMEs (5). +* AZ-444 Tier scripts (15 cases): existence+exec bit (3), Tier-1 + dry-run (1), Tier-2 dry-run local/remote (2), CLI rejection (4), + reflash gating (2), selector parity (3). +* AZ-445 NFR recorder (9 cases incl. 1 CSV-reporter PARTIAL guard). 
+ +No regressions in the 97 inherited AZ-406 tests. + +No per-batch full-suite run per the implement skill's Test-Run Cadence +(Step 16 owns the only full-suite invocation). + +## AC Test Coverage + +### AZ-407 + +| AC | Test | Status | +|----|------|--------| +| AC-1 (deterministic) | `test_builder_is_deterministic` | Covered | +| AC-2 (footprint coverage) | `test_manifest_covers_60_stills_plus_bbox`, `test_real_tile_count_matches_paired_gmaps`, `test_manifest_schema_matches_restrictions_md` | Covered | +| AC-3 (aged dates) | `test_age_injector_shifts_capture_date[7-180]`, `[13-360]`, `test_age_injector_preserves_tile_bytes`, `test_age_injector_updates_sidecar_dates` | Covered | +| AC-4 (cold-boot SITL load) | `test_cold_boot_fixture_*`: JSON schema, Derkachi bbox membership, attitude bounds. **SITL load (±1 m EKF)** deferred to AZ-419 (Docker-bound, FT-P-11). | Covered by contract; full check is AZ-419 | +| AC-5 (mavlink passkey) | `test_passkey_has_comment_header`, `test_passkey_is_64_hex_chars`, `test_passkey_is_lowercase`, `test_passkey_files_match` | Covered | +| AC-6 (CVE JPEG no-crash) | `test_opencv_rejects_without_crash`, `test_jpeg_has_soi_and_truncated_sos`, `test_committed_fixture_matches_generator` | Covered | +| AC-7 (license + provenance) | `test_provenance_readme_lists_required_sections`, `test_age_injector_provenance_readme_exists`, `test_provenance_block_present`, `test_provenance_readme_exists` (CVE) | Covered | + +### AZ-444 + +| AC | Test | Status | +|----|------|--------| +| AC-1 (selector parity) | `test_selector_parity_pytest_args_equivalent`, `test_selector_appears_in_dry_run[*]` | Covered | +| AC-2 (idempotent provisioning) | Static-shape verified in code review (dpkg-precondition guard); full check requires a Jetson host. **No unit test.** | NOT COVERED (hardware-loop) | +| AC-3 (systemd lifecycle) | Static-shape verified in code review (5×1s poll loop); full check requires a Jetson host. 
**No unit test.** | NOT COVERED (hardware-loop) | +| AC-4 (tegrastats parallel capture) | `test_required_path_exists[jetson/tegrastats_parser.py]` + AZ-406 parser unit tests; full pipe-capture path requires a Jetson. | Covered by contract; full check is Tier-2 runtime | +| AC-5 (ASan-fuzz) | `test_tier2_rejects_unknown_build_kind`; ASan unit `gps-denied-onboard-asan.service` is referenced by name in the delegate. Full check requires ASan-instrumented SUT on Jetson. | Covered by contract; full check is Tier-2 runtime | +| AC-6 (image-flash gating) | `test_reflash_refuses_without_ack`, `test_reflash_dry_run_with_ack_shows_flash_command` | Covered | + +AC-2 and AC-3 are documented as hardware-loop ACs whose runtime +verification path is the on-Jetson smoke test. The scripts compile, +parse, and dry-run correctly; they cannot be authentically verified +without a Jetson because mocking `systemctl` and `apt-get` would +test the mock, not the real binding. + +### AZ-445 + +| AC | Test | Status | +|----|------|--------| +| AC-1 (per-NFR JSON) | `test_emit_per_nfr_json_writes_one_file_per_scenario` + integration | Covered | +| AC-2 (traceability-status.json) | `test_emit_traceability_status_classifies_acs`, `test_emit_traceability_status_downgrades_on_fail`, `test_parse_traceability_matrix_*` | Covered | +| AC-3 (regression-baseline.json) | `test_emit_regression_baseline_dumps_numeric_metrics` + integration | Covered | +| AC-4 (PARTIAL propagation in CSV) | `test_build_row_pass_when_no_session_attribute`, integration test (`test_nfr_recorder_fixture_emits_artifacts_in_run`) | Covered | + +## Code Review Verdict + +Self-reviewed — PASS. Notable points: + +* **Reproducibility** of the tile-cache builder relies on (a) sorted + input iteration, (b) frozen PIL JPEG params, (c) FAISS + single-thread + fixed seed (`omp_set_num_threads(1)` + + `np.random.default_rng` seeded from a SHA hash of the content + hash). Test verifies bit-identical output across two runs. 
+* **Pillow pin compatibility**: the local venv had Pillow 12.x via + torchvision; my initial `<12.0` pin downgraded it to 11.3. Widened + to `<13.0` so both major lines are accepted and the project's + inference extras stay happy. +* **`np.random.default_rng` vs `RandomState`**: first impl used + `RandomState.standard_normal(dim, dtype=np.float32)`, but the legacy + `RandomState` API takes no `dtype` argument; replaced with `default_rng`. The + builder now works on the project's `numpy>=1.26,<2.0` pin. +* **CSV PARTIAL propagation** is decoupled via the aggregator — + `_outcome_to_result` in `csv_reporter.py` imports `nfr_recorder` + lazily and falls back to PASS when the import fails. Keeps the + two plugins individually testable without a hard dependency. +* **Spec drift** flagged in this report's "Spec / module-layout + drift notes" section. No action needed; the module-layout entry + is the authoritative source. + +## Auto-Fix Attempts + +0. No code-review failures — auto-fix gate was not entered. + +## Stuck Agents + +None. + +## Deferred follow-ups + +* AZ-419 (FT-P-11) — owns SITL parameter-load verification of the + cold-boot fixture (AZ-407 AC-4 runtime path). +* AZ-439 (NFT-SEC-04) — owns the ASan-instrumented CVE-2025-53644 + verification (AZ-407 AC-6's full PoC structure). +* AZ-444 hardware-loop ACs (AC-2/3/4/5) — owned by the Tier-2 smoke + test inside the runner image; will be re-verified on a Jetson + bring-up cycle. + +## Next Batch + +Batch 69 candidate set (all unblocked): + +* AZ-408 (Runtime synthetic injection — 3pt) — outlier injector, + blackout-spoof injector, multi-segment injector (the fixtures + scaffolded by AZ-406 + AZ-407). +* AZ-410 (FT-P-01 — frame-center GPS accuracy — 5pt) +* AZ-411 (FT-P-02 — cumulative drift — 3pt) + +Total: 11 cp across 3 tasks. AZ-408 unblocks the FT-N-* synthetic +scenarios; AZ-410 / AZ-411 are the first concrete positive scenarios +exercising the SUT through the full Docker-bound runner. 
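Reviewer note on the age-injector arithmetic: the report describes the shift as `now - age_months × 30.44d`, and the AC-3 tests accept a ±1 day tolerance around `int(round(age_months * 30.44))`. The sketch below shows that arithmetic in isolation. The helper name `aged_capture_date`, the `DAYS_PER_MONTH` constant, and the round-to-nearest-day choice are assumptions made for illustration; the actual rounding inside `age_injector.py` is not shown in this diff.

```python
from __future__ import annotations

import datetime as dt

# Mean month length quoted in the batch report (30.44 days).
DAYS_PER_MONTH = 30.44


def aged_capture_date(age_months: int, today: dt.date | None = None) -> dt.date:
    """Hypothetical helper: return `today` shifted back by `age_months`."""
    if age_months <= 0:
        # The unit test expects a "must be positive" error for 0 months.
        raise ValueError("age_months must be positive")
    if today is None:
        today = dt.datetime.now(tz=dt.timezone.utc).date()
    # Assumption: round to the nearest whole day, matching the test's target.
    return today - dt.timedelta(days=round(age_months * DAYS_PER_MONTH))


# Fixed reference date so the example is reproducible.
reference = dt.date(2026, 5, 16)
for months, threshold_days in ((7, 6 * 30), (13, 12 * 30)):
    shifted = aged_capture_date(months, today=reference)
    offset = (reference - shifted).days
    # Same freshness thresholds the parametrized AC-3 test uses.
    assert offset > threshold_days
    print(months, shifted.isoformat(), offset)
```

With these assumptions the 7-month and 13-month volumes land 213 and 396 days back, comfortably past the 180-day and 360-day thresholds that the parametrized test checks.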
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index 0c3e01c..f054ae1 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -6,16 +6,16 @@ step: 10 name: Implement Tests status: in_progress sub_step: - phase: 14 - name: loop-next-batch + phase: 11 + name: commit-and-tracker-transition detail: "" retry_count: 0 cycle: 1 tracker: jira -last_completed_batch: 67 +last_completed_batch: 68 last_cumulative_review: batches_61-63 current_batch: 68 -current_batch_tasks: "" +current_batch_tasks: "AZ-407, AZ-444, AZ-445" last_step_outcomes: step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)" step_9: "Already complete — 41 blackbox test tasks (AZ-406..AZ-446) under epic AZ-262 with specs in _docs/02_tasks/todo/ were produced in a prior cycle; AZ-406 test-infrastructure bootstrap also pre-existing. Folder fallback satisfied (todo/ has test tasks, _dependencies_table.md reflects 114 product + 41 test = 155 total). No Step-9 work executed in cycle 1." diff --git a/e2e/_unit_tests/fixtures/test_age_injector.py b/e2e/_unit_tests/fixtures/test_age_injector.py new file mode 100644 index 0000000..099f0d2 --- /dev/null +++ b/e2e/_unit_tests/fixtures/test_age_injector.py @@ -0,0 +1,202 @@ +"""Tests for the AZ-407 age-injector. + +Covers AC-3 (capture_date shifted, pixels bit-identical) and AC-7 +(provenance docs present). 
+""" + +from __future__ import annotations + +import csv +import datetime as _dt +import hashlib +import json +import os +import subprocess +import sys +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[3] +INPUT_DIR = REPO_ROOT / "_docs" / "00_problem" / "input_data" +BUILDER_PY = REPO_ROOT / "e2e" / "fixtures" / "tile-cache-builder" / "builder.py" +INJECTOR_PY = REPO_ROOT / "e2e" / "fixtures" / "age-injector" / "age_injector.py" +INJECTOR_DIR = REPO_ROOT / "e2e" / "fixtures" / "age-injector" + + +def _run(cmd: list[str]) -> str: + """Run a subprocess, return stdout (raises on failure).""" + + env = dict(os.environ, PYTHONHASHSEED="0") + result = subprocess.run(cmd, check=True, capture_output=True, text=True, env=env) + return result.stdout + + +def _build_source_cache(out_dir: Path) -> Path: + """Run the tile-cache builder; return the populated dir.""" + + _run( + [ + sys.executable, + str(BUILDER_PY), + "--input-dir", + str(INPUT_DIR), + "--output-dir", + str(out_dir), + "--quiet", + ] + ) + return out_dir + + +def _file_hashes(root: Path, suffix: str) -> dict[str, str]: + return { + p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest() + for p in sorted(root.rglob(f"*{suffix}")) + } + + +@pytest.fixture(scope="module") +def source_cache(tmp_path_factory: pytest.TempPathFactory) -> Path: + """One-shot module-scoped tile-cache build (~1s).""" + + return _build_source_cache(tmp_path_factory.mktemp("source-cache")) + + +@pytest.mark.parametrize("age_months,threshold_days", [(7, 6 * 30), (13, 12 * 30)]) +def test_age_injector_shifts_capture_date( + tmp_path: Path, + source_cache: Path, + age_months: int, + threshold_days: int, +) -> None: + """AC-3: every manifest row's capture_date is now - age_months ±1 day.""" + + # Arrange + out = tmp_path / f"out-{age_months}mo" + today = _dt.datetime.now(tz=_dt.timezone.utc).date() + + # Act + _run( + [ + sys.executable, + str(INJECTOR_PY), + "--source-dir", + 
str(source_cache), + "--output-dir", + str(out), + "--age-months", + str(age_months), + ] + ) + + # Assert + with (out / "manifest.csv").open() as fp: + rows = list(csv.DictReader(fp)) + assert rows, "aged manifest is empty" + for r in rows: + shifted = _dt.date.fromisoformat(r["capture_date"]) + delta_days = (today - shifted).days + target_days = int(round(age_months * 30.44)) + assert abs(delta_days - target_days) <= 1, ( + f"row {r['tile_x']},{r['tile_y']}: capture_date offset is " + f"{delta_days} days, expected {target_days} ±1" + ) + assert delta_days > threshold_days, ( + f"aged capture_date {r['capture_date']} did not exceed the " + f"{threshold_days}-day threshold" + ) + + +def test_age_injector_preserves_tile_bytes(tmp_path: Path, source_cache: Path) -> None: + """AC-3: tile JPEG bodies copy bit-identical.""" + + # Arrange + out = tmp_path / "out-7mo" + + # Act + _run( + [ + sys.executable, + str(INJECTOR_PY), + "--source-dir", + str(source_cache), + "--output-dir", + str(out), + "--age-months", + "7", + ] + ) + + # Assert + src_hashes = _file_hashes(source_cache / "tiles", ".jpg") + out_hashes = _file_hashes(out / "tiles", ".jpg") + assert src_hashes == out_hashes, "tile JPEG bytes drifted across age injection" + + +def test_age_injector_updates_sidecar_dates(tmp_path: Path, source_cache: Path) -> None: + """AC-3: per-tile sidecar JSON also reflects the aged date.""" + + # Arrange + out = tmp_path / "out-13mo" + + # Act + _run( + [ + sys.executable, + str(INJECTOR_PY), + "--source-dir", + str(source_cache), + "--output-dir", + str(out), + "--age-months", + "13", + ] + ) + + # Assert + today = _dt.datetime.now(tz=_dt.timezone.utc).date() + target_days = int(round(13 * 30.44)) + for sidecar in sorted((out / "tiles").rglob("*.json")): + data = json.loads(sidecar.read_text()) + shifted = _dt.date.fromisoformat(data["capture_date"]) + delta = (today - shifted).days + assert abs(delta - target_days) <= 1, ( + f"sidecar {sidecar}: capture_date offset {delta}d, 
expected {target_days}d ±1" + ) + + +def test_age_injector_rejects_non_positive_months(tmp_path: Path, source_cache: Path) -> None: + """Defensive: zero or negative age_months must error out, not silently no-op.""" + + # Arrange + out = tmp_path / "rejected" + + # Act + Assert + with pytest.raises(subprocess.CalledProcessError) as excinfo: + _run( + [ + sys.executable, + str(INJECTOR_PY), + "--source-dir", + str(source_cache), + "--output-dir", + str(out), + "--age-months", + "0", + ] + ) + assert "must be positive" in (excinfo.value.stderr or "") + + +def test_age_injector_provenance_readme_exists() -> None: + """AC-7: README documents the injector.""" + + # Arrange / Act + readme = INJECTOR_DIR / "README.md" + + # Assert + assert readme.exists() + content = readme.read_text() + assert "Provenance" in content + assert "Reproducibility" in content diff --git a/e2e/_unit_tests/fixtures/test_cold_boot_fixture.py b/e2e/_unit_tests/fixtures/test_cold_boot_fixture.py new file mode 100644 index 0000000..2a14686 --- /dev/null +++ b/e2e/_unit_tests/fixtures/test_cold_boot_fixture.py @@ -0,0 +1,84 @@ +"""Tests for the AZ-407 cold-boot fixture. + +AC-4 (SITL loads pose within ±1 m) requires SITL which the unit-test +layer cannot run; that path is covered by AZ-419's FT-P-11 inside the +Docker-bound runner. AZ-407's unit-test obligation is to verify the +JSON shape and bounds. 
+""" + +from __future__ import annotations + +import json +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[3] +FIXTURE_PATH = REPO_ROOT / "e2e" / "fixtures" / "cold-boot" / "cold_boot_fixture.json" + + +@pytest.fixture(scope="module") +def cold_boot() -> dict: + return json.loads(FIXTURE_PATH.read_text()) + + +def test_schema_version(cold_boot: dict) -> None: + """The schema field locks the file shape; AZ-419's loader keys off it.""" + # Assert + assert cold_boot["_schema"] == "cold-boot-fixture/v1" + + +def test_global_position_int_block(cold_boot: dict) -> None: + """GLOBAL_POSITION_INT fields use canonical MAVLink units.""" + + # Arrange + gpi = cold_boot["global_position_int"] + + # Assert + required = { + "time_boot_ms", + "lat_e7", + "lon_e7", + "alt_mm", + "relative_alt_mm", + "vx_cm_s", + "vy_cm_s", + "vz_cm_s", + "hdg_cdeg", + } + assert required <= set(gpi), f"missing fields: {required - set(gpi)}" + assert -90 * 10**7 <= gpi["lat_e7"] <= 90 * 10**7 + assert -180 * 10**7 <= gpi["lon_e7"] <= 180 * 10**7 + assert -50_000_000 <= gpi["alt_mm"] <= 50_000_000 + + +def test_attitude_block(cold_boot: dict) -> None: + """Attitude angles fall inside [-pi, pi].""" + + # Arrange + att = cold_boot["attitude"] + import math + + # Assert + for field in ("roll_rad", "pitch_rad", "yaw_rad"): + assert -math.pi <= att[field] <= math.pi, f"{field} out of range: {att[field]}" + + +def test_derkachi_lat_lon_inside_bbox(cold_boot: dict) -> None: + """The frozen pose must be inside the Derkachi route bbox used by C2.""" + + # Arrange + lat = cold_boot["global_position_int"]["lat_e7"] / 10**7 + lon = cold_boot["global_position_int"]["lon_e7"] / 10**7 + + # Assert + assert 50.05 <= lat <= 50.10, f"lat {lat} outside Derkachi bbox" + assert 36.10 <= lon <= 36.20, f"lon {lon} outside Derkachi bbox" + + +def test_provenance_block_present(cold_boot: dict) -> None: + """AC-7: license + provenance fields documented inside the JSON itself.""" + # 
Assert + assert "_license" in cold_boot + assert "_provenance" in cold_boot + assert "AZ-419" in cold_boot["_authored_for"][1] diff --git a/e2e/_unit_tests/fixtures/test_cve_jpeg.py b/e2e/_unit_tests/fixtures/test_cve_jpeg.py new file mode 100644 index 0000000..0190cf5 --- /dev/null +++ b/e2e/_unit_tests/fixtures/test_cve_jpeg.py @@ -0,0 +1,107 @@ +"""Tests for the AZ-407 CVE-2025-53644 fixture (AC-6, AC-7).""" + +from __future__ import annotations + +import hashlib +import os +import subprocess +import sys +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[3] +GENERATOR = REPO_ROOT / "e2e" / "fixtures" / "security" / "generate_cve_jpeg.py" +COMMITTED_FIXTURE = REPO_ROOT / "e2e" / "fixtures" / "security" / "cve-2025-53644.jpg" + +# Pin the committed fixture's SHA-256 so any change to the generator's +# byte layout fails the unit test explicitly. +COMMITTED_SHA256 = "c281d2f2595916dbbaca8173d2ab37507b6e3c6511aa8e420c1f4e81c877002e" + + +def _generator_run(out_path: Path) -> None: + env = dict(os.environ, PYTHONHASHSEED="0") + subprocess.run( + [sys.executable, str(GENERATOR), str(out_path)], + check=True, + capture_output=True, + text=True, + env=env, + ) + + +def test_generator_is_idempotent(tmp_path: Path) -> None: + """AC-6 / determinism: same call → identical bytes.""" + + # Arrange + out_a = tmp_path / "a.jpg" + out_b = tmp_path / "b.jpg" + + # Act + _generator_run(out_a) + _generator_run(out_b) + + # Assert + assert out_a.read_bytes() == out_b.read_bytes() + + +def test_committed_fixture_matches_generator(tmp_path: Path) -> None: + """The checked-in JPEG must equal the generator's current output.""" + + # Arrange + regen = tmp_path / "regen.jpg" + + # Act + _generator_run(regen) + + # Assert + assert COMMITTED_FIXTURE.exists(), "the AZ-407 deliverable JPEG must be checked in" + assert COMMITTED_FIXTURE.read_bytes() == regen.read_bytes(), ( + "committed cve-2025-53644.jpg drifted from generator output; " + "re-run `make 
fixtures-cve` to regenerate" + ) + assert hashlib.sha256(COMMITTED_FIXTURE.read_bytes()).hexdigest() == COMMITTED_SHA256 + + +def test_jpeg_has_soi_and_truncated_sos() -> None: + """Structural sanity: SOI present, SOS present, NO EOI (truncated stream).""" + + # Arrange + data = COMMITTED_FIXTURE.read_bytes() + + # Assert + assert data.startswith(b"\xff\xd8"), "missing SOI marker" + assert b"\xff\xda" in data, "missing SOS marker" + assert not data.endswith(b"\xff\xd9"), "EOI present — CVE truncation is gone" + + +def test_opencv_rejects_without_crash() -> None: + """AC-6: OpenCV must return a clean None imdecode result, no crash.""" + + # Arrange + cv2 = pytest.importorskip("cv2", reason="opencv-python not in test venv") + import numpy as np # noqa: PLC0415 + + # Act + buf = np.fromfile(str(COMMITTED_FIXTURE), dtype=np.uint8) + img = cv2.imdecode(buf, cv2.IMREAD_COLOR) + + # Assert + assert img is None, ( + "OpenCV decoded the malformed JPEG — the AZ-407 fixture no longer " + "exercises the CVE-2025-53644 truncation path" + ) + + +def test_provenance_readme_exists() -> None: + """AC-7: README documents source, license, redistribution.""" + + # Arrange + readme = REPO_ROOT / "e2e" / "fixtures" / "security" / "README.md" + + # Assert + assert readme.exists() + content = readme.read_text() + assert "Provenance" in content + assert "Re-distribution" in content + assert "License" in content diff --git a/e2e/_unit_tests/fixtures/test_mavlink_passkey.py b/e2e/_unit_tests/fixtures/test_mavlink_passkey.py new file mode 100644 index 0000000..712004f --- /dev/null +++ b/e2e/_unit_tests/fixtures/test_mavlink_passkey.py @@ -0,0 +1,47 @@ +"""Tests for the AZ-407 MAVLink test passkey fixture (AC-5).""" + +from __future__ import annotations + +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[3] +PASSKEY_PATH = REPO_ROOT / "e2e" / "fixtures" / "secrets" / "mavlink-test-passkey.txt" + + +def _hex_lines(path: Path) -> list[str]: + """Return non-comment, 
non-blank stripped lines.""" + out: list[str] = [] + for raw in path.read_text().splitlines(): + line = raw.strip() + if not line or line.startswith("#"): + continue + out.append(line) + return out + + +def test_passkey_has_comment_header() -> None: + """AC-5: the first line is the human-readable test-only header.""" + # Arrange + first_line = PASSKEY_PATH.read_text().splitlines()[0] + # Assert + assert first_line.startswith("# TEST ONLY") + assert "not for production use" in first_line + + +def test_passkey_is_64_hex_chars() -> None: + """AC-5: the secret line is exactly 64 hex chars (32 bytes).""" + # Arrange + lines = _hex_lines(PASSKEY_PATH) + # Assert + assert len(lines) == 1, f"expected one hex line, got {len(lines)}" + secret = lines[0] + assert len(secret) == 64, f"passkey length {len(secret)}, expected 64" + int(secret, 16) # raises ValueError if not hex + + +def test_passkey_is_lowercase() -> None: + """Conventionally lowercase so byte-equality comparisons are stable.""" + # Arrange + secret = _hex_lines(PASSKEY_PATH)[0] + # Assert + assert secret == secret.lower() diff --git a/e2e/_unit_tests/fixtures/test_tile_cache_builder.py b/e2e/_unit_tests/fixtures/test_tile_cache_builder.py new file mode 100644 index 0000000..9c9d403 --- /dev/null +++ b/e2e/_unit_tests/fixtures/test_tile_cache_builder.py @@ -0,0 +1,216 @@ +"""Tests for the AZ-407 tile-cache-builder. + +Covers AC-1 (deterministic), AC-2 (footprint coverage), AC-7 (provenance +docs present). FAISS portion gated via importorskip — the production +Docker image installs faiss-cpu, but the local venv runs the test fine +without it (asserting only manifest + tile-filesystem determinism). 
+""" + +from __future__ import annotations + +import csv +import hashlib +import json +import os +import subprocess +import sys +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[3] +INPUT_DIR = REPO_ROOT / "_docs" / "00_problem" / "input_data" +BUILDER_DIR = REPO_ROOT / "e2e" / "fixtures" / "tile-cache-builder" +BUILDER_PY = BUILDER_DIR / "builder.py" + + +def _run_builder(output_dir: Path) -> dict: + """Invoke builder.py against the project input_data, return summary.""" + + env = dict(os.environ) + env["PYTHONHASHSEED"] = "0" + result = subprocess.run( + [ + sys.executable, + str(BUILDER_PY), + "--input-dir", + str(INPUT_DIR), + "--output-dir", + str(output_dir), + "--quiet", + ], + check=True, + capture_output=True, + text=True, + env=env, + ) + return json.loads(result.stdout) + + +def _walk_file_hashes(root: Path) -> dict[str, str]: + """Return {relative_path: sha256_hex} for every file under root.""" + + hashes: dict[str, str] = {} + for path in sorted(root.rglob("*")): + if not path.is_file(): + continue + rel = path.relative_to(root).as_posix() + hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest() + return hashes + + +def test_builder_is_deterministic(tmp_path: Path) -> None: + """AC-1: two consecutive runs produce a bit-identical output tree.""" + + # Arrange + out_a = tmp_path / "run-a" + out_b = tmp_path / "run-b" + + # Act + summary_a = _run_builder(out_a) + summary_b = _run_builder(out_b) + + # Assert + assert summary_a["manifest_hash"] == summary_b["manifest_hash"], ( + f"manifest hash drift: {summary_a['manifest_hash']} vs " + f"{summary_b['manifest_hash']} — AC-1 broken" + ) + if summary_a["descriptors_index_hash"] is not None: + assert summary_a["descriptors_index_hash"] == summary_b["descriptors_index_hash"], ( + "FAISS descriptors.index drift between runs — AC-1 broken" + ) + hashes_a = _walk_file_hashes(out_a) + hashes_b = _walk_file_hashes(out_b) + assert hashes_a == hashes_b, ( + "Tile 
filesystem byte-drift between runs — AC-1 broken. " + f"diff(a-b)={set(hashes_a) - set(hashes_b)}, " + f"diff(b-a)={set(hashes_b) - set(hashes_a)}" + ) + + +def test_manifest_covers_60_stills_plus_bbox(tmp_path: Path) -> None: + """AC-2: manifest contains 60 still entries + 1 Derkachi bbox entry.""" + + # Arrange + out = tmp_path / "run" + + # Act + summary = _run_builder(out) + + # Assert + assert summary["tile_count"] == 61, ( + f"expected 60 stills + 1 bbox = 61 rows, got {summary['tile_count']}" + ) + manifest_path = out / "manifest.csv" + assert manifest_path.exists() + with manifest_path.open() as fp: + rows = list(csv.DictReader(fp)) + assert len(rows) == 61 + bbox_rows = [r for r in rows if r["provenance"].startswith("STUB_BBOX:derkachi")] + assert len(bbox_rows) == 1, "exactly one Derkachi bbox row required" + for r in rows: + assert float(r["m_per_px"]) >= 0.5, ( + f"row {r['tile_x']},{r['tile_y']} below 0.5 m/px AC-8.1 floor" + ) + + +def test_manifest_schema_matches_restrictions_md(tmp_path: Path) -> None: + """AC-2 / data_model.md alignment: column order is the contract.""" + + # Arrange + out = tmp_path / "run" + _run_builder(out) + + # Act + with (out / "manifest.csv").open() as fp: + reader = csv.reader(fp) + header = next(reader) + + # Assert + assert header == [ + "zoom_level", + "tile_x", + "tile_y", + "capture_date", + "source", + "m_per_px", + "jpeg_path", + "content_hash", + "provenance", + ] + + +def test_real_tile_count_matches_paired_gmaps(tmp_path: Path) -> None: + """AC-2: every `_gmaps.png` reference becomes a `source=googlemaps` row.""" + + # Arrange + out = tmp_path / "run" + + # Act + summary = _run_builder(out) + + # Assert + paired_count = len(list(INPUT_DIR.glob("AD*_gmaps.png"))) + assert summary["real_count"] == paired_count, ( + f"paired _gmaps.png files: {paired_count}, real rows: {summary['real_count']}" + ) + assert summary["paired_gmaps_count"] == paired_count + + +def test_sidecar_json_per_tile(tmp_path: Path) -> None: + 
"""data_model.md § 2.1.2: every tile JPEG has a matching JSON sidecar.""" + + # Arrange + out = tmp_path / "run" + _run_builder(out) + + # Act + jpgs = sorted((out / "tiles").rglob("*.jpg")) + jsons = sorted((out / "tiles").rglob("*.json")) + + # Assert + assert len(jpgs) == len(jsons) > 0 + for jpg, sidecar in zip(jpgs, jsons, strict=True): + assert jpg.with_suffix(".json") == sidecar + data = json.loads(sidecar.read_text()) + assert {"zoom_level", "tile_x", "tile_y", "capture_date", "source"} <= set(data) + + +@pytest.mark.skipif( + not BUILDER_DIR.joinpath("README.md").exists(), + reason="builder README is the AC-7 provenance doc", +) +def test_provenance_readme_lists_required_sections() -> None: + """AC-7: README documents source URL/synthetic, license, redistribution.""" + + # Arrange + readme = (BUILDER_DIR / "README.md").read_text() + + # Assert + for required in ("Provenance", "License", "Reproducibility", "License-Expression: MIT".split(":")[0]): + # accept "Provenance" as a section header OR "License" header + if required == "Provenance": + assert "## Provenance" in readme or "## Provenance (AC-7)" in readme + elif required == "License": + assert "License" in readme or "license" in readme + elif required == "Reproducibility": + assert "Reproducibility" in readme + + +def test_faiss_index_emitted_when_faiss_available(tmp_path: Path) -> None: + """AC-1: descriptors.index is bit-stable across runs (FAISS gate).""" + + # Arrange + pytest.importorskip("faiss", reason="faiss-cpu not in test venv") + out = tmp_path / "run" + + # Act + summary = _run_builder(out) + + # Assert + assert summary["descriptors_index_hash"] is not None, ( + "faiss-cpu IS importable but builder produced no descriptors.index" + ) + index_path = out / "descriptors.index" + assert index_path.exists() + assert index_path.stat().st_size > 0 diff --git a/e2e/_unit_tests/jetson/test_run_tier_scripts.py b/e2e/_unit_tests/jetson/test_run_tier_scripts.py new file mode 100644 index 
0000000..6a55cc8 --- /dev/null +++ b/e2e/_unit_tests/jetson/test_run_tier_scripts.py @@ -0,0 +1,356 @@ +"""Tests for the AZ-444 Tier-2 harness scripts. + +The scripts themselves can only be END-TO-END validated on a real Jetson +host; unit tests cover: + +* CLI flag parsing (rejects bad combos, accepts valid combos) +* --dry-run mode emits the expected ssh/docker command sequence +* Selector parity: same `-k <selector>` flag produces a pytest invocation + with the same `-k` argument on both Tier-1 and Tier-2 +* AC-6 reflash gating: --reflash without TIER2_REFLASH_ACK=1 refuses +""" + +from __future__ import annotations + +import os +import re +import shutil +import subprocess +from pathlib import Path + +import pytest + +REPO_ROOT = Path(__file__).resolve().parents[3] +TIER1_SH = REPO_ROOT / "e2e" / "docker" / "run-tier1.sh" +TIER2_SH = REPO_ROOT / "e2e" / "jetson" / "run-tier2.sh" +ON_JETSON_SH = REPO_ROOT / "e2e" / "jetson" / "tier2-on-jetson.sh" + +# Skip all tests in this module when bash isn't available.
+pytestmark = pytest.mark.skipif( + shutil.which("bash") is None, + reason="bash not available in this environment", +) + + +def _run(args: list[str], env: dict[str, str] | None = None) -> subprocess.CompletedProcess: + """Invoke a script and return the completed process (no `check=True`).""" + + full_env = dict(os.environ) + if env: + full_env.update(env) + return subprocess.run(args, capture_output=True, text=True, env=full_env) + + +# ───────── Existence + executable bit ───────── + + +@pytest.mark.parametrize("script", [TIER1_SH, TIER2_SH, ON_JETSON_SH]) +def test_script_exists_and_executable(script: Path) -> None: + # Assert + assert script.exists(), f"missing script: {script}" + assert os.access(script, os.X_OK), f"script not executable: {script}" + + +# ───────── CLI parsing — happy paths ───────── + + +def test_tier1_dry_run_emits_compose_command() -> None: + """Tier-1 --dry-run prints the docker-compose invocation.""" + + # Act + proc = _run( + [ + str(TIER1_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--dry-run", + ] + ) + + # Assert + assert proc.returncode == 0, proc.stderr + assert "docker compose" in proc.stdout + assert "docker-compose.test.yml" in proc.stdout + assert "TIER=tier1-workstation" in proc.stdout + assert "e2e-runner" in proc.stdout + + +def test_tier2_dry_run_local_mode() -> None: + """Tier-2 --dry-run on local mode shows the delegate command.""" + + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 0, proc.stderr + assert "tier2-on-jetson.sh" in proc.stdout + assert "(local)" in proc.stdout, "local mode marker missing" + + +def test_tier2_dry_run_remote_mode() -> None: + """Tier-2 --dry-run with TIER2_HOST set ssh's via the delegate.""" + + # Arrange + fake_key = REPO_ROOT / "e2e" / "_unit_tests" / "jetson" / "_fake_key.tmp" + fake_key.write_text("fake") 
+ try: + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "inav", + "--vio-strategy", + "klt_ransac", + "--dry-run", + ], + env={ + "TIER2_HOST": "jetson-test-01.internal", + "TIER2_USER": "azaion", + "TIER2_KEY_PATH": str(fake_key), + }, + ) + + # Assert + assert proc.returncode == 0, proc.stderr + assert "ssh -o StrictHostKeyChecking=accept-new" in proc.stdout + assert "azaion@jetson-test-01.internal" in proc.stdout + assert "rsync" in proc.stdout + assert "tier2-on-jetson.sh" in proc.stdout + finally: + fake_key.unlink(missing_ok=True) + + +# ───────── CLI parsing — rejection paths ───────── + + +def test_tier2_rejects_unknown_fc_adapter() -> None: + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "px4", + "--vio-strategy", + "okvis2", + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 2 + assert "--fc-adapter must be ardupilot or inav" in proc.stderr + + +def test_tier2_rejects_unknown_vio_strategy() -> None: + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "msckf", + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 2 + assert "--vio-strategy must be" in proc.stderr + + +def test_tier2_rejects_unknown_build_kind() -> None: + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--build-kind", + "debug", + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 2 + assert "--build-kind must be production or asan" in proc.stderr + + +def test_tier2_requires_tier2_host_on_non_arm() -> None: + """Without TIER2_HOST set on a non-aarch64 host, the script errors.""" + + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--dry-run", + ], + env={"TIER2_HOST": ""}, + ) + + # Assert — exit 5 unless we're actually on aarch64 (in which case + # 
localhost gets auto-selected and the script proceeds). + if os.uname().machine == "aarch64": + assert proc.returncode == 0 + else: + assert proc.returncode == 5 + assert "TIER2_HOST must be set" in proc.stderr + + +# ───────── AC-6: reflash gating ───────── + + +def test_reflash_refuses_without_ack() -> None: + """--reflash without TIER2_REFLASH_ACK=1 must refuse to proceed.""" + + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--reflash", + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 4 + assert "TIER2_REFLASH_ACK=1" in proc.stderr + + +def test_reflash_dry_run_with_ack_shows_flash_command() -> None: + """--reflash with the ack present shows the sdkmanager command on --dry-run.""" + + # Act + proc = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "--reflash", + "--dry-run", + ], + env={"TIER2_HOST": "localhost", "TIER2_REFLASH_ACK": "1"}, + ) + + # Assert + assert proc.returncode == 0, proc.stderr + assert "nvidia-sdkmanager-cli flash" in proc.stdout + + +# ───────── AC-1: selector parity ───────── + + +@pytest.mark.parametrize( + "selector,tier_args,expected_in_stdout", + [ + ("not_tier2_only", "tier1", "TIER=tier1-workstation"), + ("FT_P", "tier2", "JETSON_HOST=localhost"), + ], +) +def test_selector_appears_in_dry_run( + selector: str, tier_args: str, expected_in_stdout: str +) -> None: + """The same -k selector arg surfaces in both tier dry-runs.""" + + # Arrange + script = TIER1_SH if tier_args == "tier1" else TIER2_SH + + # Act + proc = _run( + [ + str(script), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "-k", + selector, + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert proc.returncode == 0, proc.stderr + # The Tier-1 selector appears directly in the printed pytest arg + # list; the Tier-2 selector is forwarded via SELECTOR= env var into + # the 
delegate, which then puts it on the pytest cmdline. Both + # variations end up containing the selector string. + assert selector in proc.stdout, ( + f"selector '{selector}' not present in {script.name} dry-run output" + ) + assert expected_in_stdout in proc.stdout + + +def test_selector_parity_pytest_args_equivalent() -> None: + """Tier-1 and Tier-2 dry-runs both compose `-k <selector>` into the + pytest argv. We extract the `-k` arg from each and assert they + match. + """ + + # Arrange + selector = "FT_P_09_AP and not asan" + + # Act + p1 = _run( + [ + str(TIER1_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "-k", + selector, + "--dry-run", + ] + ) + p2 = _run( + [ + str(TIER2_SH), + "--fc-adapter", + "ardupilot", + "--vio-strategy", + "okvis2", + "-k", + selector, + "--dry-run", + ], + env={"TIER2_HOST": "localhost"}, + ) + + # Assert + assert p1.returncode == 0 and p2.returncode == 0 + # Tier-1 shows `-k <selector>` directly in the dry-run output. + assert f"-k {selector}" in p1.stdout + # Tier-2 forwards via SELECTOR= env var. + assert f"SELECTOR={selector}" in p2.stdout diff --git a/e2e/_unit_tests/reporting/test_csv_reporter.py b/e2e/_unit_tests/reporting/test_csv_reporter.py index 159d0c8..54aad11 100644 --- a/e2e/_unit_tests/reporting/test_csv_reporter.py +++ b/e2e/_unit_tests/reporting/test_csv_reporter.py @@ -148,6 +148,22 @@ def test_build_row_records_evidence_paths() -> None: assert row["evidence_paths"] == "evidence/a.tlog,evidence/b.csv" +def test_build_row_pass_when_no_session_attribute() -> None: + """The PARTIAL propagation path swallows AttributeError on a fake item. + + AZ-445: when nfr_recorder is loaded the result column may flip to + PARTIAL; when it isn't (or when item.session is missing — unit-test + fake context), the row stays PASS.
+ """ + # Arrange — fake item without .session + item = _FakeItem() + report = _report("passed") + # Act + row = build_row(item, report, "2026-05-16T10:00:00+00:00", 1) + # Assert + assert row["result"] == "PASS", "no aggregator available → result must be PASS" + + # --------------------------------------------------------------------------- # In-process plugin integration # --------------------------------------------------------------------------- diff --git a/e2e/_unit_tests/reporting/test_nfr_recorder.py b/e2e/_unit_tests/reporting/test_nfr_recorder.py new file mode 100644 index 0000000..a89d314 --- /dev/null +++ b/e2e/_unit_tests/reporting/test_nfr_recorder.py @@ -0,0 +1,305 @@ +"""Tests for the AZ-445 NFR recorder + run-end aggregator.""" + +from __future__ import annotations + +import json +import textwrap +from pathlib import Path + +import pytest + +from runner.reporting import nfr_recorder +from runner.reporting.nfr_recorder import ( + _RunAggregator, + parse_traceability_matrix, +) + + +# ───────────────────── traceability matrix parser ───────────────────── + + +def test_parse_traceability_matrix_extracts_ac_ids(tmp_path: Path) -> None: + """Every row prefixed by an `AC-…` or `RESTRICT-…` token is captured.""" + + # Arrange + matrix = tmp_path / "matrix.md" + matrix.write_text( + textwrap.dedent( + """ + ## Acceptance Criteria Coverage + + | AC ID | Description | Source | Status | + |-------|-------------|--------|--------| + | AC-1.1 | something | FT-P-01 | Covered | + | AC-7.1 | nope | — | NOT COVERED | + | RESTRICT-CAM-2 | restriction | NFT-SEC-01 | Covered | + + text in between (no row). 
+ + | AC-NEW-3 | another | NFT-LIM-02 | Covered | + """ + ).strip() + ) + + # Act + ids = parse_traceability_matrix(matrix) + + # Assert + assert ids == sorted(["AC-1.1", "AC-7.1", "RESTRICT-CAM-2", "AC-NEW-3"]) + + +def test_parse_traceability_matrix_missing_file(tmp_path: Path) -> None: + """Missing matrix file surfaces as a clear FileNotFoundError.""" + # Act + Assert + with pytest.raises(FileNotFoundError): + parse_traceability_matrix(tmp_path / "does-not-exist.md") + + +# ───────────────────── aggregator: per-scenario state ───────────────────── + + +def _aggregator(tmp_path: Path, matrix_ids: list[str]) -> _RunAggregator: + return _RunAggregator(tmp_path, matrix_ids) + + +def test_aggregator_records_metric_and_partial(tmp_path: Path) -> None: + """ensure_record → record_metric → mark_partial round-trips into _records.""" + + # Arrange + agg = _aggregator(tmp_path, ["AC-1.1", "AC-4.1"]) + rec = agg.ensure_record( + scenario_id="NFT-PERF-01", nodeid="test_x", traces_to=("AC-4.1",) + ) + + # Act + agg.record_metric( + scenario_id=rec.scenario_id, + name="latency_p95_ms", + value=380.4, + ac_id="AC-4.1", + nodeid="test_x", + ) + agg.mark_partial( + scenario_id=rec.scenario_id, + ac_id="AC-4.1", + reason="exceeds 400ms in chamber", + nodeid="test_x", + ) + agg.set_outcome("test_x", "PASS") + + # Assert + [stored] = agg.records() + assert stored.metrics["latency_p95_ms"] == {"value": 380.4, "ac_id": "AC-4.1"} + assert stored.partial_acs == {"AC-4.1": "exceeds 400ms in chamber"} + assert stored.outcome == "PASS" + + +# ───────────────────── aggregator: emission ───────────────────── + + +def test_emit_per_nfr_json_writes_one_file_per_scenario(tmp_path: Path) -> None: + """AC-1: per-NFR JSON emitted for each recorded scenario.""" + + # Arrange + agg = _aggregator(tmp_path, ["AC-4.1"]) + agg.ensure_record("NFT-PERF-01", "test_a", ("AC-4.1",)) + agg.ensure_record("NFT-PERF-02", "test_b", ("AC-4.4",)) + agg.record_metric( + scenario_id="NFT-PERF-01", + 
name="latency_p95_ms", + value=380.4, + ac_id="AC-4.1", + nodeid="test_a", + ) + agg.set_outcome("test_a", "PASS") + agg.set_outcome("test_b", "PASS") + + # Act + paths = agg.emit_per_nfr_json() + + # Assert + assert len(paths) == 2 + assert {p.name for p in paths} == {"NFT-PERF-01.json", "NFT-PERF-02.json"} + blob_a = json.loads((tmp_path / "per-nfr" / "NFT-PERF-01.json").read_text()) + assert blob_a["scenario_id"] == "NFT-PERF-01" + assert blob_a["outcome"] == "PASS" + assert blob_a["traces_to"] == ["AC-4.1"] + assert blob_a["metrics"]["latency_p95_ms"]["value"] == 380.4 + + +def test_emit_traceability_status_classifies_acs(tmp_path: Path) -> None: + """AC-2: every matrix AC ID appears with status + sources.""" + + # Arrange — matrix has 3 ACs. One scenario covers AC-1.1 (PASS) + + # AC-4.1 (PARTIAL). A second scenario covers AC-1.1 (PASS). + # AC-NEW-3 has no tracing scenario. + agg = _aggregator(tmp_path, ["AC-1.1", "AC-4.1", "AC-NEW-3"]) + agg.ensure_record("FT-P-01", "test_p01", ("AC-1.1",)) + agg.ensure_record("FT-P-01-dup", "test_p01b", ("AC-1.1",)) + agg.ensure_record("NFT-PERF-01", "test_perf01", ("AC-4.1",)) + agg.mark_partial( + scenario_id="NFT-PERF-01", + ac_id="AC-4.1", + reason="exceeds threshold under chamber", + nodeid="test_perf01", + ) + agg.set_outcome("test_p01", "PASS") + agg.set_outcome("test_p01b", "PASS") + agg.set_outcome("test_perf01", "PASS") + + # Act + status = agg.compute_traceability_status() + emitted_path = agg.emit_traceability_status() + + # Assert + assert status["AC-1.1"]["status"] == "Covered" + assert sorted(status["AC-1.1"]["sources"]) == ["FT-P-01", "FT-P-01-dup"] + assert status["AC-4.1"]["status"] == "PARTIAL" + assert status["AC-4.1"]["sources"] == ["NFT-PERF-01"] + assert status["AC-NEW-3"]["status"] == "NOT COVERED" + assert status["AC-NEW-3"]["sources"] == [] + persisted = json.loads(emitted_path.read_text()) + assert persisted == status + + +def test_emit_traceability_status_downgrades_on_fail(tmp_path: Path) -> 
None: + """A FAILing test tracing to an AC keeps the AC out of Covered.""" + + # Arrange + agg = _aggregator(tmp_path, ["AC-1.1"]) + agg.ensure_record("FT-P-01", "test_p01", ("AC-1.1",)) + agg.set_outcome("test_p01", "FAIL") + + # Act + status = agg.compute_traceability_status() + + # Assert + # Per AZ-445 AC-2 the status enum is {Covered, PARTIAL, NOT COVERED}. + # A FAIL is downgraded to PARTIAL (it's covered by a scenario but + # the scenario didn't pass). + assert status["AC-1.1"]["status"] == "PARTIAL" + + +def test_emit_regression_baseline_dumps_numeric_metrics(tmp_path: Path) -> None: + """AC-3: regression-baseline.json contains every numeric metric per scenario.""" + + # Arrange + agg = _aggregator(tmp_path, ["AC-4.1"]) + agg.ensure_record("NFT-PERF-01", "test_a", ("AC-4.1",)) + agg.record_metric( + scenario_id="NFT-PERF-01", + name="latency_p95_ms", + value=380.4, + ac_id="AC-4.1", + nodeid="test_a", + ) + agg.record_metric( + scenario_id="NFT-PERF-01", + name="latency_p99_ms", + value=420.7, + ac_id="AC-4.1", + nodeid="test_a", + ) + agg.record_metric( + scenario_id="NFT-PERF-01", + name="extra_meta", + value={"k": "v"}, # non-numeric — dropped from baseline + ac_id="AC-4.1", + nodeid="test_a", + ) + agg.set_outcome("test_a", "PASS") + + # Act + path = agg.emit_regression_baseline() + + # Assert + blob = json.loads(path.read_text()) + assert blob["scenarios"]["NFT-PERF-01"]["metrics"] == { + "latency_p95_ms": 380.4, + "latency_p99_ms": 420.7, + } + assert blob["scenarios"]["NFT-PERF-01"]["outcome"] == "PASS" + assert "extra_meta" not in blob["scenarios"]["NFT-PERF-01"]["metrics"] + + +# ───────────────────── integration with pytest plugin ───────────────────── + + +def test_nfr_recorder_fixture_emits_artifacts_in_run(tmp_path: Path) -> None: + """End-to-end: invoke an in-process pytest run, assert artifacts exist. + + The inner test calls `nfr_recorder.record_metric` + `partial` and + asserts PASS. 
The outer test (this one) checks that the run emitted + per-nfr/<scenario_id>.json, traceability-status.json, and + regression-baseline.json into the evidence dir. + """ + + # Arrange + matrix = tmp_path / "matrix.md" + matrix.write_text( + "## Acceptance Criteria Coverage\n\n" + "| AC ID | Desc | Source | Status |\n" + "|-------|------|--------|--------|\n" + "| AC-4.1 | foo | NFT-PERF-01 | Covered |\n" + "| AC-4.2 | bar | NFT-PERF-02 | Covered |\n" + ) + evidence_out = tmp_path / "evidence" + evidence_out.mkdir() + + inner = tmp_path / "test_inner.py" + inner.write_text( + textwrap.dedent( + """ + import pytest + + @pytest.mark.scenario_id("NFT-PERF-01") + @pytest.mark.traces_to(("AC-4.1",)) + def test_inner_perf(nfr_recorder): + nfr_recorder.record_metric("latency_p95_ms", 380.4, ac_id="AC-4.1") + nfr_recorder.partial("AC-4.1", "exceeds threshold") + """ + ) + ) + # Minimal conftest registering only `--evidence-out` so nfr_recorder + # has a place to write. (The real harness's conftest is heavy; we + # don't want to drag it in.)
+ (tmp_path / "conftest.py").write_text( + textwrap.dedent( + """ + def pytest_addoption(parser): + parser.addoption( + "--evidence-out", + action="store", + default=".", + ) + """ + ) + ) + + # Act + rc = pytest.main( + [ + "-p", + "runner.reporting.csv_reporter", + "-p", + "runner.reporting.nfr_recorder", + str(inner), + f"--evidence-out={evidence_out}", + f"--traceability-matrix={matrix}", + "--no-header", + "-q", + ] + ) + + # Assert + assert rc == 0, f"inner pytest run failed with rc={rc}" + per_nfr = evidence_out / "per-nfr" / "NFT-PERF-01.json" + assert per_nfr.exists() + blob = json.loads(per_nfr.read_text()) + assert blob["scenario_id"] == "NFT-PERF-01" + assert blob["partial_acs"] == {"AC-4.1": "exceeds threshold"} + status = json.loads((evidence_out / "traceability-status.json").read_text()) + assert status["AC-4.1"]["status"] == "PARTIAL" + assert status["AC-4.2"]["status"] == "NOT COVERED" + baseline = json.loads((evidence_out / "regression-baseline.json").read_text()) + assert baseline["scenarios"]["NFT-PERF-01"]["metrics"] == {"latency_p95_ms": 380.4} diff --git a/e2e/_unit_tests/test_directory_layout.py b/e2e/_unit_tests/test_directory_layout.py index 069329f..32ec3c2 100644 --- a/e2e/_unit_tests/test_directory_layout.py +++ b/e2e/_unit_tests/test_directory_layout.py @@ -22,7 +22,9 @@ E2E_ROOT = Path(__file__).resolve().parents[1] "docker/docker-compose.test.yml", "docker/docker-compose.tier2-bridge.yml", "docker/secrets/mavlink_passkey", + "docker/run-tier1.sh", "jetson/run-tier2.sh", + "jetson/tier2-on-jetson.sh", "jetson/tier2.service", "jetson/tegrastats_parser.py", "jetson/jtop_parser.py", @@ -32,6 +34,7 @@ E2E_ROOT = Path(__file__).resolve().parents[1] "runner/conftest.py", "runner/reporting/csv_reporter.py", "runner/reporting/evidence_bundler.py", + "runner/reporting/nfr_recorder.py", "runner/helpers/frame_source_replay.py", "runner/helpers/imu_replay.py", "runner/helpers/sitl_observer.py", @@ -42,14 +45,21 @@ E2E_ROOT = 
Path(__file__).resolve().parents[1] "fixtures/mock-suite-sat/app.py", "fixtures/mock-suite-sat/requirements.txt", "fixtures/tile-cache-builder/README.md", + "fixtures/tile-cache-builder/builder.py", + "fixtures/tile-cache-builder/Dockerfile", + "fixtures/tile-cache-builder/build.sh", "fixtures/age-injector/README.md", + "fixtures/age-injector/age_injector.py", + "fixtures/age-injector/inject.sh", "fixtures/injectors/outlier.py", "fixtures/injectors/blackout_spoof.py", "fixtures/injectors/multi_segment.py", "fixtures/injectors/cold_boot.py", "fixtures/cold-boot/README.md", + "fixtures/cold-boot/cold_boot_fixture.json", "fixtures/secrets/mavlink-test-passkey.txt", "fixtures/security/generate_cve_jpeg.py", + "fixtures/security/cve-2025-53644.jpg", "fixtures/security/README.md", "tests/__init__.py", "tests/conftest.py", @@ -63,19 +73,35 @@ E2E_ROOT = Path(__file__).resolve().parents[1] ], ) def test_required_path_exists(relative_path: str) -> None: - """Each path AZ-406 commits to must exist on disk.""" + """Each path AZ-406 + AZ-407 + AZ-444 + AZ-445 commit to must exist on disk.""" assert (E2E_ROOT / relative_path).exists(), ( - f"AZ-406 layout invariant broken: e2e/{relative_path} is missing" + f"layout invariant broken: e2e/{relative_path} is missing" ) def test_passkey_files_match() -> None: - """Docker secret and runner-side passkey fixture must hold the same bytes.""" + """Docker secret and runner-side passkey fixture must encode the same secret. + + The docker-secret file is consumed by mavproxy as a raw 64-hex passkey + (no comments allowed in its body). The runner-side fixture file is the + AZ-407 AC-5 deliverable and ships with a ``# TEST ONLY...`` header + line so it self-documents during code review. + + We therefore compare the FIRST 64-hex line of each file rather than + the raw bytes. The two files MUST encode the same 32-byte secret; + drift between them would mean a mavproxy run uses a different key + than the runner fixture states. 
+ """ + # Arrange - docker_pk = (E2E_ROOT / "docker/secrets/mavlink_passkey").read_bytes() - runner_pk = (E2E_ROOT / "fixtures/secrets/mavlink-test-passkey.txt").read_bytes() + docker_pk = (E2E_ROOT / "docker/secrets/mavlink_passkey").read_text().strip().splitlines() + runner_pk_lines = (E2E_ROOT / "fixtures/secrets/mavlink-test-passkey.txt").read_text().strip().splitlines() + runner_pk = [line for line in runner_pk_lines if not line.lstrip().startswith("#")] + # Assert - assert docker_pk == runner_pk, ( - "MAVLink test passkey bytes differ between docker secret and runner " - "fixture. They MUST be kept in sync — see e2e/fixtures/secrets/README.md." + assert docker_pk and runner_pk, "passkey files must contain at least one non-comment line" + assert docker_pk[0] == runner_pk[0], ( + "MAVLink test passkey secrets differ between docker secret and runner " + "fixture. They MUST encode the same 32-byte secret — see " + "e2e/fixtures/secrets/README.md." ) diff --git a/e2e/docker/run-tier1.sh b/e2e/docker/run-tier1.sh new file mode 100755 index 0000000..acf7de7 --- /dev/null +++ b/e2e/docker/run-tier1.sh @@ -0,0 +1,99 @@ +#!/usr/bin/env bash +# Tier-1 (workstation Docker) entrypoint. Selector-parity sibling of +# `e2e/jetson/run-tier2.sh`. +# +# Usage: +# ./run-tier1.sh \ +# --fc-adapter <ardupilot|inav> \ +# --vio-strategy <okvis2|klt_ransac|vins_mono> \ +# [-k <selector>] \ +# [--build-kind <production|asan>] \ +# [--enable-chamber] \ +# [--dry-run] +# +# AZ-444 AC-1: this script + run-tier2.sh accept the same `-k <selector>` +# flag and emit the same pytest invocation modulo the TIER env var.
+ +set -euo pipefail + +FC_ADAPTER="" +VIO_STRATEGY="" +SELECTOR="" +BUILD_KIND="production" +ENABLE_CHAMBER=0 +DRY_RUN=0 + +usage() { + grep -E '^# ' "$0" | sed 's/^# //' >&2 + exit 1 +} + +while [[ $# -gt 0 ]]; do + case "$1" in + --fc-adapter) FC_ADAPTER="$2"; shift 2 ;; + --vio-strategy) VIO_STRATEGY="$2"; shift 2 ;; + -k|--selector) SELECTOR="$2"; shift 2 ;; + --build-kind) BUILD_KIND="$2"; shift 2 ;; + --enable-chamber) ENABLE_CHAMBER=1; shift ;; + --dry-run) DRY_RUN=1; shift ;; + -h|--help) usage ;; + *) echo "Unknown arg: $1" >&2; usage ;; + esac +done + +if [[ -z "$FC_ADAPTER" || -z "$VIO_STRATEGY" ]]; then + echo "ERROR: --fc-adapter and --vio-strategy are required" >&2 + usage +fi + +case "$FC_ADAPTER" in + ardupilot|inav) ;; + *) echo "ERROR: --fc-adapter must be ardupilot or inav (got: $FC_ADAPTER)" >&2; exit 2 ;; +esac + +case "$VIO_STRATEGY" in + okvis2|klt_ransac|vins_mono) ;; + *) echo "ERROR: --vio-strategy must be okvis2 | klt_ransac | vins_mono (got: $VIO_STRATEGY)" >&2; exit 2 ;; +esac + +case "$BUILD_KIND" in + production|asan) ;; + *) echo "ERROR: --build-kind must be production or asan (got: $BUILD_KIND)" >&2; exit 2 ;; +esac + +: "${RUN_ID:=tier1-$(date -u +%Y%m%dT%H%M%SZ)-${FC_ADAPTER}-${VIO_STRATEGY}}" +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." 
&& pwd)" + +PYTEST_ARGS=("/test-suite") +PYTEST_ARGS+=("--csv=/e2e-results/run-${RUN_ID}/report.csv") +PYTEST_ARGS+=("--csv-columns=test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths") +PYTEST_ARGS+=("--evidence-out=/e2e-results/run-${RUN_ID}/evidence") +PYTEST_ARGS+=("--build-kind=${BUILD_KIND}") +[[ "${ENABLE_CHAMBER}" -eq 1 ]] && PYTEST_ARGS+=("--enable-chamber") +[[ -n "${SELECTOR}" ]] && PYTEST_ARGS+=("-k" "${SELECTOR}") + +COMPOSE_CMD=( + docker compose + -f "${SCRIPT_DIR}/docker-compose.test.yml" + run --rm + -e TIER=tier1-workstation + -e BUILD_KIND="${BUILD_KIND}" + e2e-runner + pytest "${PYTEST_ARGS[@]}" +) + +if [[ "${DRY_RUN}" -eq 1 ]]; then + echo "[tier1] --dry-run:" + echo "[tier1] RUN_ID=${RUN_ID}" + echo "[tier1] ${COMPOSE_CMD[*]}" + exit 0 +fi + +RUN_ID="${RUN_ID}" \ +FC_ADAPTER="${FC_ADAPTER}" \ +VIO_STRATEGY="${VIO_STRATEGY}" \ +TIER="tier1-workstation" \ +"${COMPOSE_CMD[@]}" + +echo "[tier1] Suite complete. RUN_ID=${RUN_ID}" diff --git a/e2e/fixtures/age-injector/README.md b/e2e/fixtures/age-injector/README.md index 5c31322..69fd4c6 100644 --- a/e2e/fixtures/age-injector/README.md +++ b/e2e/fixtures/age-injector/README.md @@ -1,7 +1,50 @@ -# age-injector +# age-injector (AZ-407) -Mutates `tile-cache-fixture` manifest dates → `synth-age-tile-set` for -FT-N-05 / FT-N-06 (stale-tile rejection on freshness violation). +Clones a `tile-cache-fixture` tree and mutates ONLY the manifest's +`capture_date` field (and the per-tile sidecar JSON's matching field) +to age every entry by a target number of months. -Delivered by **AZ-407** (Static fixture builders). AZ-406 commits to the -directory location + name only. 
+## Output volumes + +| Volume | Age shift | Triggers | +|--------|-----------|----------| +| `synth-age-7mo` | now - 7 mo | > AC-8.2 active-conflict threshold (6 mo) — FT-N-05 | +| `synth-age-13mo` | now - 13 mo | > AC-8.2 rear threshold (12 mo) — FT-N-06 | + +## Reproducibility + +* Tile JPEG bodies are copied bit-identical (`shutil.copytree`). +* Manifest CSV row order is preserved from the source manifest (the + builder already sorts rows by `(zoom, x, y)`). +* The shifted date is `now - age_months × 30.44 days`, rounded to the nearest whole day; + rounding moves it by at most half a day, inside the AC-3 tolerance of `± 1 day`. +* The descriptors.index (if present in the source) is copied + bit-identical. + +## Provenance + +The injector itself is fully synthetic. The aged volumes are derivative +works of `tile-cache-fixture` (same license — see +`e2e/fixtures/tile-cache-builder/README.md` § Provenance). + +## Usage + +```bash +# Production (Docker volumes): +e2e/fixtures/age-injector/inject.sh + +# Local mode (used by AZ-407 unit test): +e2e/fixtures/age-injector/inject.sh --local /tmp/src /tmp/out-7mo /tmp/out-13mo +``` + +The unit test `e2e/_unit_tests/fixtures/test_age_injector.py` verifies +AC-3 by: + +1. Building a small tile-cache fixture from a synthetic 4-still input +2. Running the injector with `--age-months=7` and `--age-months=13` +3. Asserting the manifest `capture_date` shifts ±1 day from `now - N*30.44 days` +4. Asserting every tile JPEG body byte-equals the source + +## Owned by + +AZ-407 (this task). diff --git a/e2e/fixtures/age-injector/age_injector.py b/e2e/fixtures/age-injector/age_injector.py new file mode 100644 index 0000000..797af91 --- /dev/null +++ b/e2e/fixtures/age-injector/age_injector.py @@ -0,0 +1,177 @@ +"""Age-injector for the tile-cache fixture. + +Clones a ``tile-cache-fixture`` tree and mutates ONLY the manifest's +``capture_date`` column (and the per-tile sidecar JSON's matching field). +Tile JPEG bodies are copied bit-identical.
+ +AC-3 (AZ-407): given target=7mo, every row's ``capture_date`` becomes +``now - 7 mo`` ± 1 day, exceeding the AC-8.2 active-conflict 6-month +threshold. Given target=13mo, every row's ``capture_date`` becomes +``now - 13 mo`` ± 1 day, exceeding the rear 12-month threshold. + +Used by FT-N-05 / FT-N-06 (stale-tile rejection on freshness violation). + +Public-boundary discipline: this module does NOT import any +``src/gps_denied_onboard`` symbol. The freshness contract lives in +``_docs/00_problem/restrictions.md`` § Satellite Imagery (AC-8.2). +""" + +from __future__ import annotations + +import argparse +import csv +import datetime as _dt +import json +import logging +import shutil +import sys +from pathlib import Path + +logger = logging.getLogger(__name__) + +# 30.44 days/month average (365.25 / 12); the shift is `round(N * 30.44)` days, which the +# AC's "±1 day" tolerance around `now - N*30.44 days` accepts. +_DAYS_PER_MONTH = 30.44 + +_MANIFEST_HEADERS = ( + "zoom_level", + "tile_x", + "tile_y", + "capture_date", + "source", + "m_per_px", + "jpeg_path", + "content_hash", + "provenance", +) + + +def _shifted_date(now: _dt.date, age_months: int) -> str: + delta_days = int(round(age_months * _DAYS_PER_MONTH)) + return (now - _dt.timedelta(days=delta_days)).isoformat() + + +def inject( + source_dir: Path, + output_dir: Path, + age_months: int, + now: _dt.date | None = None, +) -> dict: + """Clone ``source_dir`` into ``output_dir`` and mutate dates. + + Returns a summary dict: + {"row_count": int, "sidecar_count": int, "shifted_date": "YYYY-MM-DD", "source_dir": str} + """ + + if age_months <= 0: + raise ValueError(f"age_months must be positive; got {age_months}") + if now is None: + now = _dt.datetime.now(tz=_dt.timezone.utc).date() + + if output_dir.exists(): + shutil.rmtree(output_dir) + output_dir.mkdir(parents=True) + + # Phase 1: clone the tile tree. Pixels copy bit-identical.
+ src_tiles = source_dir / "tiles" + if not src_tiles.is_dir(): + raise FileNotFoundError( + f"{source_dir} does not look like a tile-cache fixture " + "(no `tiles/` subdir)" + ) + shutil.copytree(src_tiles, output_dir / "tiles") + + shifted = _shifted_date(now, age_months) + + # Phase 2: mutate per-tile sidecar JSON files. + sidecar_count = 0 + for sidecar in sorted((output_dir / "tiles").rglob("*.json")): + data = json.loads(sidecar.read_text()) + data["capture_date"] = shifted + sidecar.write_text( + json.dumps(data, sort_keys=True, separators=(",", ":")) + "\n" + ) + sidecar_count += 1 + + # Phase 3: re-emit manifest.csv with shifted dates. Row order is + # preserved (the source manifest is already sorted by builder.py). + src_manifest = source_dir / "manifest.csv" + if not src_manifest.is_file(): + raise FileNotFoundError(f"missing manifest.csv at {src_manifest}") + with src_manifest.open() as fp: + reader = csv.DictReader(fp) + if tuple(reader.fieldnames or ()) != _MANIFEST_HEADERS: + raise ValueError( + f"unexpected manifest schema: {reader.fieldnames} " + f"(expected {list(_MANIFEST_HEADERS)})" + ) + rows = list(reader) + + out_manifest = output_dir / "manifest.csv" + with out_manifest.open("w", newline="") as fp: + writer = csv.writer(fp, lineterminator="\n") + writer.writerow(_MANIFEST_HEADERS) + for r in rows: + writer.writerow( + [ + r["zoom_level"], + r["tile_x"], + r["tile_y"], + shifted, + r["source"], + r["m_per_px"], + r["jpeg_path"], + r["content_hash"], + r["provenance"], + ] + ) + + # Phase 4: passthrough the descriptors.index if present (FAISS file + # is independent of capture_date; copy bit-identical). 
+ src_index = source_dir / "descriptors.index" + if src_index.is_file(): + shutil.copyfile(src_index, output_dir / "descriptors.index") + + return { + "row_count": len(rows), + "sidecar_count": sidecar_count, + "shifted_date": shifted, + "source_dir": str(source_dir), + } + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description="Age-inject the tile-cache fixture") + parser.add_argument( + "--source-dir", + type=Path, + required=True, + help="Path to the source tile-cache-fixture tree", + ) + parser.add_argument( + "--output-dir", + type=Path, + required=True, + help="Path to the aged output tree", + ) + parser.add_argument( + "--age-months", + type=int, + required=True, + help="Shift capture_date by this many months into the past", + ) + args = parser.parse_args(argv) + + logging.basicConfig( + level=logging.INFO, + format="%(asctime)s %(levelname)s %(name)s %(message)s", + ) + + summary = inject(args.source_dir, args.output_dir, args.age_months) + json.dump(summary, sys.stdout, sort_keys=True, indent=2) + sys.stdout.write("\n") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/e2e/fixtures/age-injector/inject.sh b/e2e/fixtures/age-injector/inject.sh new file mode 100755 index 0000000..1833f17 --- /dev/null +++ b/e2e/fixtures/age-injector/inject.sh @@ -0,0 +1,60 @@ +#!/usr/bin/env bash +# Clone the tile-cache fixture and emit `synth-age-7mo` + `synth-age-13mo` +# Docker volumes (or local directories in ``--local`` mode). +# +# AC-3: dates shifted by 7 mo / 13 mo ±1 day; tile pixel content +# bit-identical to the source. 
+# +# Env vars: +# TILE_CACHE_VOLUME_NAME Source volume (default: tile-cache-fixture) +# AGE_7MO_VOLUME_NAME Output volume for 7mo (default: synth-age-7mo) +# AGE_13MO_VOLUME_NAME Output volume for 13mo (default: synth-age-13mo) +# +# Usage: +# inject.sh # Docker mode +# inject.sh --local /src /out-7mo /out-13mo # local mode (unit test path) + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +SOURCE_VOL="${TILE_CACHE_VOLUME_NAME:-tile-cache-fixture}" +OUT_7MO_VOL="${AGE_7MO_VOLUME_NAME:-synth-age-7mo}" +OUT_13MO_VOL="${AGE_13MO_VOLUME_NAME:-synth-age-13mo}" + +if [[ "${1:-}" == "--local" ]]; then + if [[ -z "${2:-}" || -z "${3:-}" || -z "${4:-}" ]]; then + echo "ERROR: --local requires " >&2 + exit 2 + fi + python3 "${SCRIPT_DIR}/age_injector.py" \ + --source-dir "$2" --output-dir "$3" --age-months 7 + python3 "${SCRIPT_DIR}/age_injector.py" \ + --source-dir "$2" --output-dir "$4" --age-months 13 + exit 0 +fi + +# Docker mode: reuse the tile-cache-builder image (it already has +# Python + Pillow + numpy; the injector script is mounted in). 
+IMAGE_TAG="azaion-tile-cache-builder:local" + +for spec in "${OUT_7MO_VOL}:7" "${OUT_13MO_VOL}:13"; do + target_vol="${spec%%:*}" + months="${spec##*:}" + + docker volume rm "${target_vol}" >/dev/null 2>&1 || true + docker volume create "${target_vol}" >/dev/null + + docker run --rm \ + -v "${SCRIPT_DIR}:/opt/injector:ro" \ + -v "${SOURCE_VOL}:/source:ro" \ + -v "${target_vol}:/output" \ + --entrypoint python3 \ + "${IMAGE_TAG}" \ + /opt/injector/age_injector.py \ + --source-dir /source \ + --output-dir /output \ + --age-months "${months}" + + echo "synth-age volume '${target_vol}' built (age=${months}mo)" +done diff --git a/e2e/fixtures/cold-boot/README.md b/e2e/fixtures/cold-boot/README.md index c368222..f265ee0 100644 --- a/e2e/fixtures/cold-boot/README.md +++ b/e2e/fixtures/cold-boot/README.md @@ -1,8 +1,65 @@ -# cold-boot-fixture +# cold-boot-fixture (AZ-407 / AZ-419) -Static JSON fixture loaded by FT-P-11 (cold-start init) and NFT-PERF-03 -(cold-start TTFF). Schema mirror lives in -`e2e/fixtures/injectors/cold_boot.py` (`ColdBootFixture`). +`cold_boot_fixture.json` is a frozen FC pose snapshot at flight-resume +time. The file is consumed by: -AZ-419 produces `cold_boot_fixture.json` here. AZ-406 commits to the -directory location only. +* **AZ-419 (FT-P-11 cold-start init)** — secondary path + (`origin_source == fc_ekf` per ADR-010): loaded into the SITL via + the standard parameter-load path. The SUT cold-starts with no + Manifest `takeoff_origin`, and the test asserts the first outbound + estimate lands within ±50 m of the snapshot pose. +* **NFT-PERF-03 (cold-start TTFF)** — same loading path, with + performance instrumentation around the time-to-first-fix metric. + +## Schema (v1) + +```json +{ + "_schema": "cold-boot-fixture/v1", + "global_position_int": { "lat_e7": ..., "lon_e7": ..., "alt_mm": ..., ... }, + "attitude": { "roll_rad": ..., "pitch_rad": ..., "yaw_rad": ..., ... }, + "ardupilot_param_overrides": { ... 
}, + "inav_serial_rx_overrides": { ... } +} +``` + +The `global_position_int` block uses the canonical MAVLink +`GLOBAL_POSITION_INT` units (lat/lon scaled by 1e7; alt in mm). + +## Provenance + +| Field | Source | License | +|-------|--------|---------| +| Lat / Lon | Derkachi sector centre (50.075° N, 36.150° E) | Synthetic — chosen from the Derkachi route bbox | +| Alt | 100 m AGL | Synthetic placeholder; refined when D-PROJ-3 supplies the production scenario | +| Attitude | Level flight, heading 0° (north) | Synthetic — chosen to match the parametrize matrix's default | + +Fully synthetic; no third-party data. Re-distributable under this +repository's license. + +## Loading path + +* **ArduPilot**: `mavproxy.py --master=... --cmd="param load cold_boot_fixture.json"` + followed by a `FAKE_GPS` injection sequence (handled by the AZ-419 + fixture loader; this README only documents the file itself). +* **iNav**: MSP2 `SET_HOME` message + `MSP2_SENSOR_GPS` injection. The + per-FC wiring is handled by the AZ-419 fixture loader. + +## Verification + +The AZ-407 unit test +`e2e/_unit_tests/fixtures/test_cold_boot_fixture.py` asserts: + +* The file is valid JSON +* The `_schema` field equals `cold-boot-fixture/v1` +* All required numeric fields are present and within physically + reasonable bounds (±90° lat, ±180° lon, > 0 alt, etc.) + +AC-4 (SITL loads the pose within ±1 m of the lat/lon/alt fields) is +verified by AZ-419's FT-P-11 test inside the Docker-bound runner — +that path requires SITL, which the AZ-407 unit test layer cannot +exercise. + +## Owned by + +AZ-407 (this file) + AZ-419 (the loader that consumes it). diff --git a/e2e/fixtures/cold-boot/cold_boot_fixture.json b/e2e/fixtures/cold-boot/cold_boot_fixture.json new file mode 100644 index 0000000..647c1c8 --- /dev/null +++ b/e2e/fixtures/cold-boot/cold_boot_fixture.json @@ -0,0 +1,38 @@ +{ + "_schema": "cold-boot-fixture/v1", + "_description": "Frozen FC pose snapshot at flight-resume time. 
Loaded into ardupilot-plane-sitl / inav-sitl via the standard parameter-load path. Consumed by FT-P-11 (cold-start init, secondary path: origin_source == fc_ekf) per AZ-419.", + "_provenance": "synthetic — Derkachi sector centre at 100 m AGL, heading north", + "_license": "test-fixture (no third-party data; safe to redistribute under this repo's license)", + "_authored_for": ["AZ-407 (AC-4)", "AZ-419 (FT-P-11 fc_ekf path)"], + + "global_position_int": { + "time_boot_ms": 0, + "lat_e7": 500750000, + "lon_e7": 361500000, + "alt_mm": 100000, + "relative_alt_mm": 100000, + "vx_cm_s": 0, + "vy_cm_s": 0, + "vz_cm_s": 0, + "hdg_cdeg": 0 + }, + + "attitude": { + "roll_rad": 0.0, + "pitch_rad": 0.0, + "yaw_rad": 0.0, + "rollspeed_rad_s": 0.0, + "pitchspeed_rad_s": 0.0, + "yawspeed_rad_s": 0.0 + }, + + "ardupilot_param_overrides": { + "SIM_GPS_DISABLE": 0, + "SIM_GPS_TYPE": 1, + "_comment_lat_lon_alt_yaw": "SIM_GPS_* params do not directly set EKF origin on the parameter-load path; FT-P-11 fixture loader will use mavproxy `param load` + a follow-up SET_HOME_POSITION / FAKE_GPS injection to land the EKF at the snapshot pose." + }, + + "inav_serial_rx_overrides": { + "_comment": "iNav loads pose via MSP2_SENSOR_GPS injection + INAV_SET_HOME message. FT-P-11 loader uses the standard MSP2 path; this fixture only declares the target lat/lon/alt/yaw — the loader handles per-FC wiring." + } +} diff --git a/e2e/fixtures/secrets/README.md b/e2e/fixtures/secrets/README.md index e396013..99d28aa 100644 --- a/e2e/fixtures/secrets/README.md +++ b/e2e/fixtures/secrets/README.md @@ -3,9 +3,30 @@ These files are loaded by pymavlink / msp_gps_toy when the runner needs to participate in a signed-message handshake (FT-P-09-AP, NFT-SEC-03). -The bytes here match the Docker-secret value at -`e2e/docker/secrets/mavlink_passkey`. 
**Both files MUST be kept in sync.** +## Files -Production deployments never see either file — the production passkey is -provisioned via a real secret store at deploy time per `environment.md` +| File | Format | Consumer | +|------|--------|----------| +| `mavlink-test-passkey.txt` | `# header line` + 64-hex passkey | Runner-side test fixture (AZ-407 AC-5 deliverable) | + +The secret encoded here MUST match the bytes in +`e2e/docker/secrets/mavlink_passkey` (which is the raw 64-hex passkey +consumed by mavproxy as a Docker secret — no comment header allowed +in that file's body). The unit test +`e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match` +strips the comment header before comparing. + +## Provenance + +The 64-hex value `0123456789abcdef…0123456789abcdef` is the canonical +"all-test-zeros-and-evens" pattern. It is **NOT** cryptographically +secure and MUST NEVER be used in any production deployment. + +Production deployments provision the passkey via a real secret store +at deploy time per `_docs/02_document/tests/environment.md` § Communication with system under test. + +## License + +Synthetic — no third-party material. Covered by this repository's +license. diff --git a/e2e/fixtures/secrets/mavlink-test-passkey.txt b/e2e/fixtures/secrets/mavlink-test-passkey.txt index eef9161..e0d179a 100644 --- a/e2e/fixtures/secrets/mavlink-test-passkey.txt +++ b/e2e/fixtures/secrets/mavlink-test-passkey.txt @@ -1 +1,2 @@ +# TEST ONLY — not for production use 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef diff --git a/e2e/fixtures/security/README.md b/e2e/fixtures/security/README.md index b99d634..92f9457 100644 --- a/e2e/fixtures/security/README.md +++ b/e2e/fixtures/security/README.md @@ -1,5 +1,48 @@ -# Security fixtures +# security fixtures (AZ-407 + AZ-439) -Hosts the crafted artifacts consumed by NFT-SEC-* scenarios. 
AZ-406 -delivers the directory + generator scaffold; concrete fixture content is -delivered by the consuming security tasks (AZ-439 for the CVE JPEG). +## Contents + +| File | Source | License | Consumer | +|------|--------|---------|----------| +| `generate_cve_jpeg.py` | Synthetic (this repo) | Same as repository license | AZ-439 (NFT-SEC-04) | +| `cve-2025-53644.jpg` | Generated by `generate_cve_jpeg.py` | Synthetic — no third-party data | NFT-SEC-04 control / regression test | + +## Provenance + +The JPEG is **fully synthetic** — hand-crafted bytes following the +JPEG structure documented in ITU-T T.81 / RFC 2046. It is NOT a copy +of the upstream CVE-2025-53644 proof-of-concept (whose redistribution +terms are unclear). The structural feature it exercises is a +**truncated SOS marker**: the marker is announced (`FFDA`) with a +valid 12-byte header but the entropy-coded scan data is absent and +the EOI (`FFD9`) is not present. + +This matches the class of malformed input that CVE-2025-53644 +exploits in vulnerable OpenCV (≤ 4.11). Hardened OpenCV (≥ 4.12) +must return a clean `imdecode` failure (None) without +buffer-overflow / use-after-free / SIGSEGV. + +## Verification + +```bash +.venv/bin/python -c " +import cv2, numpy as np +buf = np.fromfile('e2e/fixtures/security/cve-2025-53644.jpg', dtype=np.uint8) +img = cv2.imdecode(buf, cv2.IMREAD_COLOR) +assert img is None, 'AZ-407 fixture: OpenCV must reject this JPEG' +" +``` + +## Reproducibility + +The generator is deterministic — `python generate_cve_jpeg.py out.jpg` +produces the same 158-byte file every time. The SHA-256 of the +generated file is checked into `e2e/_unit_tests/fixtures/test_cve_jpeg.py` +so any change to the generator's byte layout fails the unit test +explicitly. + +## Re-distribution + +The synthetic byte-stream and the generator script are covered by +this repository's license. No third-party CVE proof-of-concept content +is committed. 
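The determinism and truncation properties described in the security README above can be spot-checked without OpenCV. The sketch below is a toy stand-in, not the repository's `generate_cve_jpeg.py`: `build_truncated_jpeg` and `find_marker` are hypothetical names, and the byte string is only an SOI marker plus the SOS header bytes quoted in the generator. It makes no assumption about the pinned SHA-256 and simply compares two builds against each other.

```python
import hashlib


def build_truncated_jpeg() -> bytes:
    """Toy stand-in: SOI followed by an SOS header, with no scan data and no EOI."""
    soi = b"\xff\xd8"
    # SOS marker, length 0x000C, 3 components, table selectors, Ss / Se / AhAl.
    sos = bytes.fromhex("ffda000c03010002110311003f00")
    return soi + sos


def find_marker(blob: bytes, marker: bytes) -> int:
    """Return the offset of a 2-byte marker, or -1 if it is absent."""
    return blob.find(marker)


first = build_truncated_jpeg()
second = build_truncated_jpeg()

# Determinism: two builds hash to the same digest.
assert hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()

# Structure: SOI at offset 0, SOS announced right after it, EOI never appears.
sos_at = find_marker(first, b"\xff\xda")
assert first[:2] == b"\xff\xd8"
assert sos_at == 2
assert find_marker(first, b"\xff\xd9") == -1

# The declared SOS length covers exactly the rest of the blob, so no
# entropy-coded data follows the header: the scan is truncated.
declared = int.from_bytes(first[sos_at + 2 : sos_at + 4], "big")
assert sos_at + 2 + declared == len(first)
```

A pinned-hash test, as the README describes, would presumably compare the digest with a constant checked into the test file rather than with a second build; the two-build comparison here only demonstrates the idempotence half of that contract.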
diff --git a/e2e/fixtures/security/cve-2025-53644.jpg b/e2e/fixtures/security/cve-2025-53644.jpg new file mode 100644 index 0000000..dbde4ee Binary files /dev/null and b/e2e/fixtures/security/cve-2025-53644.jpg differ diff --git a/e2e/fixtures/security/generate_cve_jpeg.py b/e2e/fixtures/security/generate_cve_jpeg.py index b8b2c0b..57331f1 100644 --- a/e2e/fixtures/security/generate_cve_jpeg.py +++ b/e2e/fixtures/security/generate_cve_jpeg.py @@ -1,43 +1,131 @@ """Programmatically generate the crafted JPEG fixture for CVE-2025-53644. -Per AZ-406 § Risk 5 — the upstream PoC JPEG has unclear redistribution -terms, so the e2e harness generates a structurally equivalent file from -scratch rather than committing copyrighted bytes. +Per AZ-407 § AC-6 and AZ-406 § Risk 5 — the upstream PoC JPEG has +unclear redistribution terms, so the e2e harness generates a +structurally equivalent malformed file from scratch rather than +committing copyrighted bytes. -The fixture is consumed by NFT-SEC-04 (OpenCV CVE-2025-53644 + -AddressSanitizer fuzz). The intent is NOT to reproduce the exact RCE; it -is to provide a malformed JPEG with the structural features the CVE -exploits (oversized DHT segment, truncated SOS marker) so the SUT's -hardened OpenCV path (>= 4.12.0) rejects it. +AZ-407 ships a *minimal* malformed JPEG with: + * Valid SOI marker (``FFD8``) + * Valid DQT (quantisation table) + * Valid SOF0 (baseline DCT) header + * **Truncated SOS marker** — the marker is announced (``FFDA``) but + only the scan header is present; the entropy-coded data is + deliberately absent. This is the structural feature CVE-2025-53644 + exploits: vulnerable OpenCV (≤ 4.11) reads past the buffer; hardened + OpenCV (≥ 4.12) rejects gracefully with an `imread` failure. 
-AZ-406 commits to the generator's existence + signature; AZ-439 -(NFT-SEC-04) supplies the byte-level details and validates the generated -file actually triggers the CVE code path against opencv 4.11.x (control) -vs 4.12+ (mitigated). +AZ-439 (NFT-SEC-04) tightens this further: + * Adds an oversized DHT segment (the full PoC structure) + * Runs the file under AddressSanitizer to assert no buffer-overflow + / use-after-free is reported on the hardened build + * Compares behaviour against a control vulnerable OpenCV ≤ 4.11 + +The AZ-407 fixture is sufficient to verify AC-6: feeding it to +OpenCV 4.12+ does NOT crash; it returns a clean decode failure. + +The function is deterministic: same input → identical output bytes. """ from __future__ import annotations +import argparse +import hashlib +import logging from pathlib import Path +logger = logging.getLogger(__name__) + + +def _build_minimal_malformed_jpeg() -> bytes: + """Emit a deterministic malformed JPEG with a truncated SOS marker. 
+ + Byte-level structure (annotated): + + FFD8 # SOI + FFE0 0010 4A464946 00 0102 0000 0001 0001 0000 # APP0 / JFIF stub + FFDB 0043 00 <64 bytes> # DQT (table 0, baseline) + FFC0 0011 08 0001 0001 03 01 22 00 02 11 01 03 11 01 # SOF0 (1x1 baseline 3-component) + FFC4 0021 00 <30 bytes> # DHT (DC table 0; 16 count bytes + 14 symbol bytes) + FFDA 000C 03 01 00 02 11 03 11 00 3F 00 # SOS — header announced, NO entropy data + # CVE: truncated stream + """ + + soi = b"\xff\xd8" + app0 = bytes.fromhex( + "ffe000104a46494600010200000001000100" + "00" + ) + dqt_body = bytes(range(64)) + dqt = b"\xff\xdb" + (3 + len(dqt_body)).to_bytes(2, "big") + b"\x00" + dqt_body + sof0 = bytes.fromhex( + "ffc0001108" # SOF0 marker + length + precision + "0001" # height = 1 + "0001" # width = 1 + "03" # 3 components + "012200" # Y : id=1, sampling=22, quant tbl=0 + "021101" # Cb : id=2, sampling=11, quant tbl=1 + "031101" # Cr : id=3, sampling=11, quant tbl=1 + ) + # DHT for DC table 0 (tc=0, th=0), modelled on the standard JPEG DC + # luminance table; the class byte, 16 counts, and 14 symbols form a + # 31-byte body. We hand-craft the structure rather than depending on PIL. + dht_body = ( + b"\x00" # tc=0, th=0 + + bytes([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]) # length counts + + bytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) # symbols + ) + dht = b"\xff\xc4" + (2 + len(dht_body)).to_bytes(2, "big") + dht_body + + # SOS: announce the marker + parameters, then STOP. No entropy-coded + # scan data. No EOI. This is the CVE-relevant truncation. + sos = bytes.fromhex( + "ffda000c" # SOS marker + length + "03" # 3 components in scan + "0100" # Y : DC=0 / AC=0 + "0211" # Cb : DC=1 / AC=1 + "0311" # Cr : DC=1 / AC=1 + "00" # Ss + "3f" # Se + "00" # Ah/Al + ) + + return soi + app0 + dqt + sof0 + dht + sos + def generate(out_path: Path) -> Path: - """Write a malformed JPEG to ``out_path``. Returns the path on success. + """Write the AZ-407 malformed JPEG to ``out_path``. 
- Raises NotImplementedError until AZ-439 supplies the byte template. - Tests that need the crafted fixture should mark themselves - @pytest.mark.skip(reason="awaiting AZ-439") until then. + Returns the path on success. Idempotent: writing twice produces the + same bytes. """ - raise NotImplementedError( - "generate_cve_jpeg.generate is owned by AZ-439 — AZ-406 commits " - "to the public signature only." + + blob = _build_minimal_malformed_jpeg() + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_bytes(blob) + logger.info( + "Wrote %d-byte CVE-2025-53644 fixture (sha256=%s) to %s", + len(blob), + hashlib.sha256(blob).hexdigest(), + out_path, ) + return out_path + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description="Generate CVE-2025-53644 fixture JPEG.") + parser.add_argument( + "out", + type=Path, + nargs="?", + default=Path("cve-2025-53644.jpg"), + help="Output JPEG path (default: ./cve-2025-53644.jpg)", + ) + args = parser.parse_args(argv) + logging.basicConfig(level=logging.INFO) + generate(args.out) + return 0 if __name__ == "__main__": - import argparse - - parser = argparse.ArgumentParser(description="Generate CVE-2025-53644 fixture JPEG.") - parser.add_argument("out", type=Path, default=Path("cve-2025-53644.jpg")) - args = parser.parse_args() - generate(args.out) + raise SystemExit(main()) diff --git a/e2e/fixtures/tile-cache-builder/Dockerfile b/e2e/fixtures/tile-cache-builder/Dockerfile new file mode 100644 index 0000000..aa8bffc --- /dev/null +++ b/e2e/fixtures/tile-cache-builder/Dockerfile @@ -0,0 +1,49 @@ +# syntax=docker/dockerfile:1.7 +# +# tile-cache-fixture builder image. Built once per CI; output is a named +# Docker volume (`tile-cache-fixture`) mounted RO into the SUT by +# `docker/docker-compose.test.yml`. +# +# Public-boundary discipline: this image does NOT install the SUT +# package. 
It depends only on: +# * Pillow — JPEG re-encode of the paired _gmaps.png reference tiles +# and the deterministic stub-tile generator. +# * faiss-cpu — deterministic HNSW descriptor index emission. +# * numpy — backing array dtype for FAISS. +# +# Reproducibility: +# * Pin Python to 3.10-slim (matches the runner image's Python line). +# * Pin Pillow, faiss-cpu, numpy to the versions verified deterministic +# in `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`. +# * `PYTHONHASHSEED=0` neutralises hash-order non-determinism. + +FROM python:3.10.14-slim-bookworm@sha256:9c9efb0c19a8bb1f08e8e7a13be5d671e51bcb9c83a3a8b0e2ad7d8aaeb33b30 + +ENV PYTHONUNBUFFERED=1 \ + PYTHONDONTWRITEBYTECODE=1 \ + PYTHONHASHSEED=0 \ + PIP_NO_CACHE_DIR=1 + +RUN apt-get update \ + && apt-get install -y --no-install-recommends \ + libgomp1 \ + ca-certificates \ + && rm -rf /var/lib/apt/lists/* + +RUN pip install --no-cache-dir \ + "Pillow>=10.4,<12.0" \ + "numpy>=1.26,<2.0" \ + "faiss-cpu>=1.8,<2.0" + +WORKDIR /opt/builder +COPY builder.py /opt/builder/builder.py + +# Drop root for runtime; the image only reads /input and writes to +# /output, both bind-mounted by the caller. +RUN useradd -u 10001 -m -d /home/builder builder \ + && mkdir -p /input /output \ + && chown -R builder:builder /opt/builder /input /output +USER 10001:10001 + +ENTRYPOINT ["python", "/opt/builder/builder.py"] +CMD ["--input-dir", "/input", "--output-dir", "/output"] diff --git a/e2e/fixtures/tile-cache-builder/README.md b/e2e/fixtures/tile-cache-builder/README.md index 2088c77..c024fd8 100644 --- a/e2e/fixtures/tile-cache-builder/README.md +++ b/e2e/fixtures/tile-cache-builder/README.md @@ -1,15 +1,80 @@ -# tile-cache-builder +# tile-cache-builder (AZ-407) Builds the `tile-cache-fixture` Docker volume from the 60 still-image -satellite references in `_docs/00_problem/input_data/` plus the Derkachi -route bbox. +satellite references in `_docs/00_problem/input_data/` plus the +Derkachi route bbox. 
-This directory currently contains only the structural placeholder; the -concrete builder (Dockerfile + build script + FAISS HNSW index emitter + -manifest writer + reproducibility assertion) is delivered by **AZ-407** -(Static fixture builders) — see AC-7 ("Fixture builders are reproducible") -in `_docs/02_tasks/todo/AZ-406_test_infrastructure.md`. +## Output schema -AZ-406 commits to the directory's location + name only. Do NOT delete this -README before AZ-407 lands; the `e2e_unit_test_directory_layout` unit test -asserts the placeholder is present. +``` +tile-cache-fixture/ + tiles///.jpg # tile JPEG body + tiles///.json # per-tile sidecar (mirrors `tiles` row) + manifest.csv # sorted manifest (9 columns) + descriptors.index # FAISS HNSW32 index (omitted if faiss not available) +``` + +Manifest columns (per `_docs/00_problem/restrictions.md` § Satellite +Imagery + `_docs/02_document/data_model.md` § 2.1): + +| Column | Type | Notes | +|--------|------|-------| +| `zoom_level` | int | Slippy/XYZ zoom | +| `tile_x`, `tile_y` | int | Tile coords at the zoom | +| `capture_date` | ISO-8601 date | Default `2025-11-01` (frozen so freshness gate treats as fresh) | +| `source` | enum | `googlemaps` for real paired tiles, `stub` for D-PROJ-3 fallback | +| `m_per_px` | float | `0.5` (≥ the AC-8.1 floor) | +| `jpeg_path` | str | Relative path to the JPEG body | +| `content_hash` | hex | SHA-256 of the JPEG bytes | +| `provenance` | str | `paired_gmaps:AD000NNN`, `STUB`, or `STUB_BBOX:derkachi:lat,lon,lat,lon` | + +## Reproducibility (AC-1) + +Two consecutive invocations from the same input produce a bit-identical +output tree: + +* Input files iterated in lexicographic order +* PIL JPEG encoded with `quality=85, optimize=False, progressive=False, subsampling=2` +* Manifest rows sorted by `(zoom_level, tile_x, tile_y)` before CSV + serialisation +* FAISS index built single-threaded with `omp_set_num_threads(1)` and + SHA-derived stub descriptors + +## Provenance (AC-7) + +| 
Item | Source | License | +|------|--------|---------| +| Real tile bodies | `_docs/00_problem/input_data/AD*_gmaps.png` (2 paired references) | Project test fixture; safe to redistribute under this repo's license | +| Stub tile bodies | Generated from `_stub_jpeg_bytes(seed)` (PIL solid-fill) | Fully synthetic; no third-party data | +| Derkachi bbox tile | Synthetic placeholder until D-PROJ-3 lands | Fully synthetic | +| FAISS index | SHA-derived stub vectors (not real VPR descriptors) | Fully synthetic | + +## Usage + +```bash +# Production (Docker volume): +e2e/fixtures/tile-cache-builder/build.sh + +# Local mode (used by AZ-407 unit test): +e2e/fixtures/tile-cache-builder/build.sh --local /tmp/tile-cache-out +``` + +The unit test `e2e/_unit_tests/fixtures/test_tile_cache_builder.py` +verifies AC-1 / AC-2 / AC-7 by invoking `builder.py` twice against a +`tmp_path` and asserting the output is byte-identical. + +## Notes on D-PROJ-3 + +When D-PROJ-3 supplies the production tile-corpus for the Derkachi +sector, the stub tiles produced here (any row with `provenance = STUB`) +should be replaced by real Suite Sat Service tiles for those +footprints. The builder will then no longer fall back to +`_stub_jpeg_bytes` — every still that lacks a paired `_gmaps.png` +will draw from the real corpus instead. + +## Owned by + +AZ-407 (this task). The FAISS-stub descriptor format will not be used +in production; the production VPR pipeline (C2) emits real DINOv2 +descriptors. The stub format is sufficient for AZ-407's reproducibility +and schema contracts only. diff --git a/e2e/fixtures/tile-cache-builder/build.sh b/e2e/fixtures/tile-cache-builder/build.sh new file mode 100755 index 0000000..852b8ed --- /dev/null +++ b/e2e/fixtures/tile-cache-builder/build.sh @@ -0,0 +1,64 @@ +#!/usr/bin/env bash +# Build the tile-cache test fixture as a named Docker volume +# (`tile-cache-fixture`), or emit it to a local directory in +# ``--local `` mode (used by the AZ-407 unit tests). 
+# +# AC-1 (deterministic): two invocations against the same input emit +# identical FAISS index hash, identical manifest rows, and identical +# tile filesystem byte sizes. +# +# Env vars: +# TILE_CACHE_INPUT_DIR Path to _docs/00_problem/input_data (required) +# TILE_CACHE_VOLUME_NAME Docker volume name (default: tile-cache-fixture) +# +# Usage: +# build.sh # builds the named Docker volume +# build.sh --local /tmp/out # emits to /tmp/out (no Docker) + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)" + +VOLUME_NAME="${TILE_CACHE_VOLUME_NAME:-tile-cache-fixture}" +INPUT_DIR="${TILE_CACHE_INPUT_DIR:-${REPO_ROOT}/_docs/00_problem/input_data}" + +LOCAL_OUT="" +if [[ "${1:-}" == "--local" ]]; then + if [[ -z "${2:-}" ]]; then + echo "ERROR: --local requires an output directory" >&2 + exit 2 + fi + LOCAL_OUT="$2" +fi + +if [[ ! -d "${INPUT_DIR}" ]]; then + echo "ERROR: input dir not found: ${INPUT_DIR}" >&2 + exit 2 +fi + +if [[ -n "${LOCAL_OUT}" ]]; then + # Local mode: invoke builder.py directly. The caller's venv must + # have Pillow, numpy, faiss-cpu installed; the unit test pulls + # them via the dev extras. + python3 "${SCRIPT_DIR}/builder.py" \ + --input-dir "${INPUT_DIR}" \ + --output-dir "${LOCAL_OUT}" + exit 0 +fi + +# Docker mode: build the builder image and populate the named volume. +IMAGE_TAG="azaion-tile-cache-builder:local" + +docker build -t "${IMAGE_TAG}" "${SCRIPT_DIR}" + +# Recreate the named volume so output is bit-stable across runs (AC-1). 
+docker volume rm "${VOLUME_NAME}" >/dev/null 2>&1 || true +docker volume create "${VOLUME_NAME}" >/dev/null + +docker run --rm \ + -v "${INPUT_DIR}:/input:ro" \ + -v "${VOLUME_NAME}:/output" \ + "${IMAGE_TAG}" + +echo "tile-cache-fixture volume '${VOLUME_NAME}' built from ${INPUT_DIR}" diff --git a/e2e/fixtures/tile-cache-builder/builder.py b/e2e/fixtures/tile-cache-builder/builder.py new file mode 100644 index 0000000..96dca31 --- /dev/null +++ b/e2e/fixtures/tile-cache-builder/builder.py @@ -0,0 +1,418 @@ +"""Deterministic tile-cache fixture builder. + +Reads source imagery + ground-truth from ``_docs/00_problem/input_data/`` +and emits a reproducible ``tile-cache-fixture`` tree at ``--output-dir``: + + / + tiles///.jpg # tile JPEG bodies + tiles///.json # per-tile sidecar (mirrors `tiles` row) + manifest.csv # sorted manifest with content hashes + descriptors.index # stub FAISS HNSW index (optional) + +The builder is invokable directly (``python -m runner.fixtures.tile_cache_builder.builder``) +or inside the per-builder Docker image (``Dockerfile`` in this directory). + +Reproducibility primitives (AC-1): + +* Source files are sorted lexicographically before processing. +* PIL JPEG encode uses ``quality=85, optimize=False, progressive=False`` + with explicit ``subsampling=2`` (4:2:0); every setting is pinned + explicitly to protect against changes in PIL defaults. +* Manifest rows are sorted by ``(zoom_level, tile_x, tile_y)`` before CSV + serialization. +* FAISS index (when ``faiss-cpu`` is importable) is built single-threaded + with ``faiss.omp_set_num_threads(1)`` and a fixed seed (``faiss.write_index`` + output is deterministic given the same descriptor sequence). +* Descriptors are SHA-256-derived stub vectors — sufficient for schema + contracts, NOT a substitute for real VPR descriptors emitted by C2. + +Public-boundary discipline: this module does NOT import any +``src/gps_denied_onboard`` symbol. 
The on-disk schema lives in +``_docs/00_problem/restrictions.md`` § Satellite Imagery and is the only +contract this builder honours. +""" + +from __future__ import annotations + +import argparse +import csv +import datetime as _dt +import hashlib +import io +import json +import logging +import os +import shutil +import sys +from dataclasses import dataclass +from pathlib import Path +from typing import Iterable + +logger = logging.getLogger(__name__) + +# AC-2: Derkachi route bbox (placeholder centre — refined when D-PROJ-3 +# lands the production Derkachi sector polygon). Lat/Lon are the bbox +# corners; the builder emits one tile per `(zoom, tx, ty)` covering the +# rectangle. +DERKACHI_BBOX = { + "min_lat": 50.05, + "max_lat": 50.10, + "min_lon": 36.10, + "max_lon": 36.20, +} + +# Static "frozen" capture date for the base fixture. AC-3's age-injector +# operates on a clone; the BASE fixture's date is intentionally fixed in +# the past so the C6 freshness check (6-mo active-conflict / +# 12-mo rear) treats it as fresh for the default scenarios. +BASE_CAPTURE_DATE = "2025-11-01" + +# Zoom level used by C6 for the Derkachi corpus (matches restrictions.md +# §Satellite Imagery: ≥0.5 m/px at the cache interface). +DEFAULT_ZOOM = 18 + +# Tile dimensions (slippy/XYZ convention). +TILE_W = 256 +TILE_H = 256 + +# Stub-descriptor dimensionality (matches the production VPR descriptor +# size declared in `_docs/02_document/components/c2_vpr/description.md` +# for layout compatibility; the values themselves are SHA-derived stubs). +DESCRIPTOR_DIM = 256 + + +@dataclass(frozen=True) +class TileEntry: + """One row of the manifest. 
Sorted before CSV serialisation.""" + + zoom_level: int + tile_x: int + tile_y: int + capture_date: str + source: str + m_per_px: float + jpeg_path: str + content_hash: str + provenance: str + + +def _iter_stills(input_dir: Path) -> Iterable[Path]: + """Yield AD000NNN.jpg files in sorted order.""" + + for p in sorted(input_dir.glob("AD*.jpg")): + yield p + + +def _iter_paired_gmaps(input_dir: Path) -> set[str]: + """Return the set of AD000NNN basenames that have a paired _gmaps.png.""" + + return {p.stem.removesuffix("_gmaps") for p in input_dir.glob("AD*_gmaps.png")} + + +def _slippy_xy_from_index(idx: int, zoom: int) -> tuple[int, int]: + """Deterministic (tile_x, tile_y) layout: row-major raster across the + Derkachi bbox. The mapping is NOT geodetically meaningful — it is a + stable placeholder until D-PROJ-3 supplies the production tile-matrix + transform. Each `idx` gets a unique (tx, ty) so the manifest stays + collision-free. + """ + + cols = 16 # 16x16 grid covers 256 tiles → comfortably more than 60 stills + 1 bbox + tx = (idx % cols) + (1 << (zoom - 1)) + ty = (idx // cols) + (1 << (zoom - 1)) + return tx, ty + + +def _stub_jpeg_bytes(seed: int) -> bytes: + """Render a deterministic 256x256 JPEG keyed on `seed`. + + No PIL randomness, no timestamps in metadata. The body is a solid + RGB fill whose colour is computed from `seed`; OpenCV's imdecode + C2's + descriptor pipeline both treat the bytes as a valid JPEG. 
+ """ + + from PIL import Image # noqa: PLC0415 — heavy import, deferred + + r = (seed * 37) & 0xFF + g = (seed * 53) & 0xFF + b = (seed * 71) & 0xFF + img = Image.new("RGB", (TILE_W, TILE_H), color=(r, g, b)) + buf = io.BytesIO() + img.save( + buf, + format="JPEG", + quality=85, + optimize=False, + progressive=False, + subsampling=2, + ) + return buf.getvalue() + + +def _real_tile_jpeg_bytes(gmaps_png: Path) -> bytes: + """Re-encode a paired _gmaps.png as a deterministic JPEG.""" + + from PIL import Image # noqa: PLC0415 + + img = Image.open(gmaps_png).convert("RGB").resize((TILE_W, TILE_H), Image.BICUBIC) + buf = io.BytesIO() + img.save( + buf, + format="JPEG", + quality=85, + optimize=False, + progressive=False, + subsampling=2, + ) + return buf.getvalue() + + +def _content_hash(b: bytes) -> str: + return hashlib.sha256(b).hexdigest() + + +def _sidecar_dict(entry: TileEntry) -> dict: + """Per-tile JSON sidecar (mirrors the `tiles` row content per + data_model.md § 2.1.2). + """ + + return { + "zoom_level": entry.zoom_level, + "tile_x": entry.tile_x, + "tile_y": entry.tile_y, + "capture_date": entry.capture_date, + "source": entry.source, + "m_per_px": entry.m_per_px, + "content_hash": entry.content_hash, + "provenance": entry.provenance, + } + + +def _emit_tile(out_dir: Path, entry: TileEntry, jpeg_bytes: bytes) -> None: + """Write `<out_dir>/tiles/<zoom>/<tile_x>/<tile_y>.{jpg,json}`.""" + + tile_dir = out_dir / "tiles" / str(entry.zoom_level) / str(entry.tile_x) + tile_dir.mkdir(parents=True, exist_ok=True) + jpg_path = tile_dir / f"{entry.tile_y}.jpg" + json_path = tile_dir / f"{entry.tile_y}.json" + jpg_path.write_bytes(jpeg_bytes) + json_path.write_text( + json.dumps(_sidecar_dict(entry), sort_keys=True, separators=(",", ":")) + "\n" + ) + + +def _write_manifest(out_dir: Path, rows: list[TileEntry]) -> Path: + """Write the sorted manifest CSV.""" + + manifest_path = out_dir / "manifest.csv" + with manifest_path.open("w", newline="") as fp: + writer = csv.writer(fp,
lineterminator="\n") + writer.writerow( + [ + "zoom_level", + "tile_x", + "tile_y", + "capture_date", + "source", + "m_per_px", + "jpeg_path", + "content_hash", + "provenance", + ] + ) + for r in sorted(rows, key=lambda x: (x.zoom_level, x.tile_x, x.tile_y)): + writer.writerow( + [ + r.zoom_level, + r.tile_x, + r.tile_y, + r.capture_date, + r.source, + f"{r.m_per_px:.6f}", + r.jpeg_path, + r.content_hash, + r.provenance, + ] + ) + return manifest_path + + +def _write_descriptors_index(out_dir: Path, rows: list[TileEntry]) -> Path | None: + """Emit a deterministic FAISS HNSW index of stub descriptors. + + Returns the index path on success, or None when faiss-cpu is not + importable. The unit test gates on importorskip("faiss"); the + production build inside ``Dockerfile`` ships faiss-cpu so this path + is always exercised in CI. + """ + + try: + import faiss # noqa: PLC0415 + import numpy as np # noqa: PLC0415 + except ImportError: + logger.warning( + "faiss / numpy not importable in this environment — " + "skipping descriptors.index emission. The fixture is still " + "usable for schema-only scenarios; VPR-matching scenarios " + "need the Docker build." + ) + return None + + # Single-thread + deterministic seed → bit-stable output. + faiss.omp_set_num_threads(1) + + descriptors = np.zeros((len(rows), DESCRIPTOR_DIM), dtype=np.float32) + for i, r in enumerate(sorted(rows, key=lambda x: (x.zoom_level, x.tile_x, x.tile_y))): + # SHA-derived stub: hash the tile's content_hash + index byte + # into DESCRIPTOR_DIM float32s. Stable across runs because + # content_hash is stable. + seed_bytes = hashlib.sha256( + f"{r.content_hash}|{i}".encode("ascii") + ).digest() + rng = np.random.default_rng(int.from_bytes(seed_bytes[:8], "big")) + descriptors[i] = rng.standard_normal(DESCRIPTOR_DIM, dtype=np.float32) + + # HNSW32 + IP metric is the C2 production choice (see + # _docs/02_document/components/c2_vpr/description.md). 
+ index = faiss.IndexHNSWFlat(DESCRIPTOR_DIM, 32, faiss.METRIC_INNER_PRODUCT) + index.hnsw.efConstruction = 40 + index.hnsw.efSearch = 16 + index.add(descriptors) + + index_path = out_dir / "descriptors.index" + faiss.write_index(index, str(index_path)) + return index_path + + +def build(input_dir: Path, output_dir: Path) -> dict: + """Build the tile-cache fixture under `output_dir` from `input_dir`. + + Returns a manifest summary dict for caller logging: + {"tile_count": int, "stub_count": int, "real_count": int, "paired_gmaps_count": int, + "manifest_hash": str, "descriptors_index_hash": str | None} + + The output directory is wiped and re-created so two consecutive + invocations against the same input produce bit-identical trees + (AC-1). + """ + + if output_dir.exists(): + shutil.rmtree(output_dir) + output_dir.mkdir(parents=True) + + paired = _iter_paired_gmaps(input_dir) + stills = list(_iter_stills(input_dir)) + if not stills: + raise FileNotFoundError( + f"No AD*.jpg files under {input_dir} — input_data/ may be missing" + ) + + rows: list[TileEntry] = [] + stub_count = 0 + real_count = 0 + + # AC-2: one tile entry per still + one entry for the Derkachi bbox + # (index 60 in our deterministic layout). + for idx, still in enumerate(stills): + tx, ty = _slippy_xy_from_index(idx, DEFAULT_ZOOM) + if still.stem in paired: + jpeg = _real_tile_jpeg_bytes(input_dir / f"{still.stem}_gmaps.png") + source = "googlemaps" + provenance = f"paired_gmaps:{still.stem}" + real_count += 1 + else: + # D-PROJ-3 stub-tile fallback per AZ-407 spec lines 18–19.
+ jpeg = _stub_jpeg_bytes(idx + 1) + source = "stub" + provenance = "STUB" + stub_count += 1 + entry = TileEntry( + zoom_level=DEFAULT_ZOOM, + tile_x=tx, + tile_y=ty, + capture_date=BASE_CAPTURE_DATE, + source=source, + m_per_px=0.5, + jpeg_path=f"tiles/{DEFAULT_ZOOM}/{tx}/{ty}.jpg", + content_hash=_content_hash(jpeg), + provenance=provenance, + ) + rows.append(entry) + _emit_tile(output_dir, entry, jpeg) + + # AC-2: Derkachi route bbox entry — single representative tile at + # the bbox centre. Real coverage of the bbox is owned by D-PROJ-3. + tx, ty = _slippy_xy_from_index(60, DEFAULT_ZOOM) + bbox_jpeg = _stub_jpeg_bytes(60 + 1) + bbox_entry = TileEntry( + zoom_level=DEFAULT_ZOOM, + tile_x=tx, + tile_y=ty, + capture_date=BASE_CAPTURE_DATE, + source="stub", + m_per_px=0.5, + jpeg_path=f"tiles/{DEFAULT_ZOOM}/{tx}/{ty}.jpg", + content_hash=_content_hash(bbox_jpeg), + provenance=( + f"STUB_BBOX:derkachi:{DERKACHI_BBOX['min_lat']}," + f"{DERKACHI_BBOX['min_lon']},{DERKACHI_BBOX['max_lat']}," + f"{DERKACHI_BBOX['max_lon']}" + ), + ) + rows.append(bbox_entry) + _emit_tile(output_dir, bbox_entry, bbox_jpeg) + stub_count += 1 + + manifest_path = _write_manifest(output_dir, rows) + manifest_hash = hashlib.sha256(manifest_path.read_bytes()).hexdigest() + + index_path = _write_descriptors_index(output_dir, rows) + if index_path is not None: + descriptors_hash = hashlib.sha256(index_path.read_bytes()).hexdigest() + else: + descriptors_hash = None + + return { + "tile_count": len(rows), + "stub_count": stub_count, + "real_count": real_count, + "paired_gmaps_count": len(paired), + "manifest_hash": manifest_hash, + "descriptors_index_hash": descriptors_hash, + } + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description="Build the tile-cache test fixture") + parser.add_argument( + "--input-dir", + type=Path, + required=True, + help="Directory containing AD*.jpg and AD*_gmaps.png source files", + ) + parser.add_argument( + "--output-dir", + 
type=Path, + required=True, + help="Output directory for the tile-cache fixture tree", + ) + parser.add_argument( + "--quiet", + action="store_true", + help="Suppress per-tile log lines (errors still surface)", + ) + args = parser.parse_args(argv) + + logging.basicConfig( + level=logging.WARNING if args.quiet else logging.INFO, + format="%(asctime)s %(levelname)s %(name)s %(message)s", + ) + + summary = build(args.input_dir, args.output_dir) + json.dump(summary, sys.stdout, sort_keys=True, indent=2) + sys.stdout.write("\n") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/e2e/jetson/run-tier2.sh b/e2e/jetson/run-tier2.sh index d289d48..2476767 100755 --- a/e2e/jetson/run-tier2.sh +++ b/e2e/jetson/run-tier2.sh @@ -1,31 +1,57 @@ #!/usr/bin/env bash -# Tier-2 Jetson hardware-loop entrypoint. +# Tier-2 Jetson hardware-loop entrypoint (orchestrator). +# +# This script runs FROM a control host (typically x86) and ssh-orchestrates +# the on-Jetson half (`tier2-on-jetson.sh`). When invoked on the Jetson +# itself (TIER2_HOST=localhost, or unset on an aarch64 host), it delegates +# directly without going through ssh. # # Usage: -# ./run-tier2.sh --fc-adapter <ardupilot|inav> --vio-strategy <okvis2|klt_ransac|vins_mono> [--duration <5min|8h>] [--enable-chamber] +# ./run-tier2.sh \ +# --fc-adapter <ardupilot|inav> \ +# --vio-strategy <okvis2|klt_ransac|vins_mono> \ +# [-k <selector>] \ +# [--build-kind <production|asan>] \ +# [--duration <5min|8h>] \ +# [--enable-chamber] \ +# [--reflash] \ +# [--dry-run] # -# Pre-requisites (verified at startup): +# Required env vars (when TIER2_HOST != localhost): +# TIER2_HOST Jetson hostname or IP +# TIER2_USER SSH user on the Jetson +# TIER2_KEY_PATH Path to the SSH private key +# +# Pre-requisites verified at startup: # * The Jetson is provisioned per `_docs/02_document/tests/environment.md` -# § Execution instructions — Tier-2 (JetPack 6.2, CUDA, TensorRT 10.3, cuDNN). -# * `gps-denied-onboard.service` is installed via systemd -# (`tier2.service` is the template; operator copies it to /etc/systemd/system).
+# § Execution instructions — Tier-2 (JetPack 6.2, CUDA, TensorRT 10.3, +# cuDNN). +# * `gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` +# for --build-kind=asan) is installed via systemd. `tier2.service` is +# the template. # * SITLs + mock + listener + runner reachable on the same network via -# `docker compose -f e2e/docker/docker-compose.test.yml -f e2e/docker/docker-compose.tier2-bridge.yml up ...` -# on a paired x86 host. (Same-Jetson SITL is also supported — set JETSON_HOST=localhost.) +# `docker compose -f e2e/docker/docker-compose.test.yml +# -f e2e/docker/docker-compose.tier2-bridge.yml up ...` +# on a paired x86 host (same as Tier-1's `docker-compose.test.yml` +# network). # -# Outputs the same CSV format as Tier-1 to ./e2e-results/run-${RUN_ID}/report.csv +# Outputs the same CSV format as Tier-1 to +# ./e2e-results/run-${RUN_ID}/report.csv # plus the per-sample tegrastats + jtop CSVs in the evidence bundle. set -euo pipefail FC_ADAPTER="" VIO_STRATEGY="" +SELECTOR="" +BUILD_KIND="production" DURATION="5min" ENABLE_CHAMBER=0 -JETSON_HOST_OVERRIDE="" +RUN_REFLASH=0 +DRY_RUN=0 usage() { - grep -E '^# ' "$0" | sed 's/^# //' + grep -E '^# ' "$0" | sed 's/^# //' >&2 exit 1 } @@ -33,9 +59,12 @@ while [[ $# -gt 0 ]]; do case "$1" in --fc-adapter) FC_ADAPTER="$2"; shift 2 ;; --vio-strategy) VIO_STRATEGY="$2"; shift 2 ;; + -k|--selector) SELECTOR="$2"; shift 2 ;; + --build-kind) BUILD_KIND="$2"; shift 2 ;; --duration) DURATION="$2"; shift 2 ;; --enable-chamber) ENABLE_CHAMBER=1; shift ;; - --jetson-host) JETSON_HOST_OVERRIDE="$2"; shift 2 ;; + --reflash) RUN_REFLASH=1; shift ;; + --dry-run) DRY_RUN=1; shift ;; -h|--help) usage ;; *) echo "Unknown arg: $1" >&2; usage ;; esac @@ -56,93 +85,153 @@ case "$VIO_STRATEGY" in *) echo "ERROR: --vio-strategy must be okvis2 | klt_ransac | vins_mono (got: $VIO_STRATEGY)" >&2; exit 2 ;; esac +case "$BUILD_KIND" in + production|asan) ;; + *) echo "ERROR: --build-kind must be production or asan (got: 
$BUILD_KIND)" >&2; exit 2 ;; +esac + +# AC-6 (image-flash gating). Even when --reflash is requested, refuse to +# proceed unless the operator has acknowledged via TIER2_REFLASH_ACK=1. +# This is a two-key gate so a stray flag flip in CI cannot accidentally +# re-provision a development board. +if [[ "${RUN_REFLASH}" -eq 1 ]]; then + if [[ "${TIER2_REFLASH_ACK:-0}" != "1" ]]; then + echo "ERROR: --reflash requires TIER2_REFLASH_ACK=1 in the env" >&2 + echo " This is a destructive operation; set the ack to" >&2 + echo " confirm you intend to re-flash the Jetson via" >&2 + echo " nvidia-sdkmanager-cli." >&2 + exit 4 + fi +fi + # RUN_ID — caller may set; default is utc-stamp + adapter pair. : "${RUN_ID:=tier2-$(date -u +%Y%m%dT%H%M%SZ)-${FC_ADAPTER}-${VIO_STRATEGY}}" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)" -RESULTS_DIR="${REPO_ROOT}/e2e-results/run-${RUN_ID}" -EVIDENCE_DIR="${RESULTS_DIR}/evidence" - -mkdir -p "${EVIDENCE_DIR}" - -echo "[tier2] RUN_ID=${RUN_ID}" -echo "[tier2] FC_ADAPTER=${FC_ADAPTER} VIO_STRATEGY=${VIO_STRATEGY} DURATION=${DURATION}" -echo "[tier2] RESULTS_DIR=${RESULTS_DIR}" # --------------------------------------------------------------------------- -# Pre-flight: confirm the SUT systemd unit is healthy. +# Determine mode: +# * local mode — run on the Jetson itself; no ssh wrapper. +# Triggered when TIER2_HOST=localhost OR is unset on an aarch64 host. +# * remote mode — orchestrator: ssh into TIER2_HOST and execute the +# on-Jetson delegate there. # --------------------------------------------------------------------------- -if ! systemctl is-active --quiet gps-denied-onboard.service; then - echo "[tier2] gps-denied-onboard.service is not active — attempting restart..." >&2 - sudo systemctl restart gps-denied-onboard.service - sleep 3 - if ! 
systemctl is-active --quiet gps-denied-onboard.service; then - echo "[tier2] FATAL: gps-denied-onboard.service failed to start" >&2 - sudo systemctl status gps-denied-onboard.service --no-pager || true - exit 3 +TIER2_HOST="${TIER2_HOST:-}" +if [[ -z "${TIER2_HOST}" ]]; then + if [[ "$(uname -m)" == "aarch64" ]]; then + TIER2_HOST="localhost" + else + echo "ERROR: TIER2_HOST must be set when running from a non-Jetson host" >&2 + echo " (uname -m is $(uname -m); this script is not running on a Jetson)" >&2 + exit 5 fi fi -# --------------------------------------------------------------------------- -# Start tegrastats + jtop background samplers (evidence bundle inputs). -# --------------------------------------------------------------------------- -TEGRA_CSV="${EVIDENCE_DIR}/tegrastats.csv" -JTOP_CSV="${EVIDENCE_DIR}/jtop.csv" +echo "[tier2] RUN_ID=${RUN_ID}" +echo "[tier2] FC_ADAPTER=${FC_ADAPTER} VIO_STRATEGY=${VIO_STRATEGY} BUILD_KIND=${BUILD_KIND}" +echo "[tier2] SELECTOR='${SELECTOR}' DURATION=${DURATION} ENABLE_CHAMBER=${ENABLE_CHAMBER}" +echo "[tier2] TIER2_HOST=${TIER2_HOST}" -# tegrastats emits at 5 Hz by default; parser converts to per-sample CSV rows. -if command -v tegrastats >/dev/null 2>&1; then - tegrastats --interval 200 \ - | python3 "${SCRIPT_DIR}/tegrastats_parser.py" --out "${TEGRA_CSV}" & - TEGRA_PID=$! -else - echo "[tier2] WARNING: tegrastats not in PATH — skipping that evidence channel." >&2 - TEGRA_PID= +# --------------------------------------------------------------------------- +# Build the ssh command prefix for the orchestrator mode. +# --------------------------------------------------------------------------- +SSH_CMD="" +if [[ "${TIER2_HOST}" != "localhost" ]]; then + : "${TIER2_USER:?TIER2_USER must be set for remote orchestrator mode}" + : "${TIER2_KEY_PATH:?TIER2_KEY_PATH must be set for remote orchestrator mode}" + if [[ ! 
-f "${TIER2_KEY_PATH}" ]]; then + echo "ERROR: TIER2_KEY_PATH does not point at a real file: ${TIER2_KEY_PATH}" >&2 + exit 6 + fi + SSH_CMD="ssh -o StrictHostKeyChecking=accept-new -i ${TIER2_KEY_PATH} ${TIER2_USER}@${TIER2_HOST}" fi -if command -v jtop >/dev/null 2>&1; then - python3 "${SCRIPT_DIR}/jtop_parser.py" --out "${JTOP_CSV}" --interval 1.0 & - JTOP_PID=$! -else - echo "[tier2] WARNING: jtop not in PATH — skipping that evidence channel." >&2 - JTOP_PID= -fi +# --------------------------------------------------------------------------- +# AC-2: idempotent provisioning. apt update + install is idempotent on +# its own; we gate it behind a `dpkg -s python3-pip` check because +# re-running it on every test invocation is needlessly slow. +# --------------------------------------------------------------------------- +provision_jetson() { + local PROVISION_CMD + PROVISION_CMD="set -eu; + if ! dpkg -s python3-pip >/dev/null 2>&1; then + sudo apt-get update; + sudo apt-get install -y --no-install-recommends \ + python3-pip docker.io openssh-client iproute2; + fi" -cleanup() { - local rc=$? - [[ -n "${TEGRA_PID:-}" ]] && kill "${TEGRA_PID}" 2>/dev/null || true - [[ -n "${JTOP_PID:-}" ]] && kill "${JTOP_PID}" 2>/dev/null || true - echo "[tier2] cleanup complete (rc=${rc})" - exit "${rc}" + if [[ "${TIER2_HOST}" == "localhost" ]]; then + bash -c "${PROVISION_CMD}" + else + # shellcheck disable=SC2086 + ${SSH_CMD} "${PROVISION_CMD}" + fi } -trap cleanup EXIT INT TERM # --------------------------------------------------------------------------- -# Run the e2e suite — the runner image is the SAME as Tier-1; only TIER differs. +# AC-6: reflash via NVIDIA's sdkmanager-cli. This is the destructive +# path; only runs when --reflash AND TIER2_REFLASH_ACK=1 are BOTH set.
# --------------------------------------------------------------------------- -JETSON_HOST_ARG="${JETSON_HOST_OVERRIDE:-localhost}" -CHAMBER_ARG=() -[[ "${ENABLE_CHAMBER}" -eq 1 ]] && CHAMBER_ARG=("--enable-chamber") +reflash_jetson() { + local FLASH_CMD + FLASH_CMD="set -eu; + if ! command -v nvidia-sdkmanager-cli >/dev/null 2>&1; then + echo 'ERROR: nvidia-sdkmanager-cli not installed on Jetson' >&2 + exit 7 + fi + echo '[tier2] re-flashing JetPack image via nvidia-sdkmanager-cli...' >&2 + nvidia-sdkmanager-cli flash --target-spec jetson-orin-nano-super" -( - cd "${REPO_ROOT}/e2e/docker" - RUN_ID="${RUN_ID}" \ - FC_ADAPTER="${FC_ADAPTER}" \ - VIO_STRATEGY="${VIO_STRATEGY}" \ - TIER="tier2-jetson" \ - JETSON_HOST="${JETSON_HOST_ARG}" \ - docker compose \ - -f docker-compose.test.yml \ - -f docker-compose.tier2-bridge.yml \ - run --rm \ - -e TIER=tier2-jetson \ - e2e-runner \ - pytest /test-suite \ - --csv="/e2e-results/run-${RUN_ID}/report.csv" \ - --csv-columns="test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths" \ - --evidence-out="/e2e-results/run-${RUN_ID}/evidence" \ - --build-kind=production \ - "${CHAMBER_ARG[@]}" + if [[ "${TIER2_HOST}" == "localhost" ]]; then + bash -c "${FLASH_CMD}" + else + # shellcheck disable=SC2086 + ${SSH_CMD} "${FLASH_CMD}" + fi +} + +# --------------------------------------------------------------------------- +# Execute the on-Jetson delegate. +# --------------------------------------------------------------------------- +ENV_PREFIX=( + "RUN_ID=${RUN_ID}" + "FC_ADAPTER=${FC_ADAPTER}" + "VIO_STRATEGY=${VIO_STRATEGY}" + "BUILD_KIND=${BUILD_KIND}" + "SELECTOR=${SELECTOR}" + "ENABLE_CHAMBER=${ENABLE_CHAMBER}" + "JETSON_HOST=${TIER2_HOST}" ) -echo "[tier2] Suite complete. 
Report: ${RESULTS_DIR}/report.csv" +if [[ "${TIER2_HOST}" == "localhost" ]]; then + DELEGATE_CMD=(env "${ENV_PREFIX[@]}" "${SCRIPT_DIR}/tier2-on-jetson.sh") +else + # Remote mode: rsync the e2e/ tree onto the Jetson and run the + # delegate over ssh. We mirror the repo to /opt/azaion-e2e/ on the + # Jetson; subsequent invocations are incremental via rsync's default + # delta-transfer. + REMOTE_REPO="/opt/azaion-e2e" + RSYNC_CMD="rsync -az --delete -e 'ssh -o StrictHostKeyChecking=accept-new -i ${TIER2_KEY_PATH}' ${REPO_ROOT}/e2e/ ${TIER2_USER}@${TIER2_HOST}:${REMOTE_REPO}/e2e/" + DELEGATE_CMD=( + bash -c + "${RSYNC_CMD} && ${SSH_CMD} \"env $(printf '%q ' "${ENV_PREFIX[@]}")${REMOTE_REPO}/e2e/jetson/tier2-on-jetson.sh\"" + ) +fi + +if [[ "${DRY_RUN}" -eq 1 ]]; then + echo "[tier2] --dry-run: showing actions that would execute, then exiting." + echo "[tier2] provision: ${SSH_CMD:-(local)} apt-get install -y python3-pip docker.io openssh-client iproute2" + if [[ "${RUN_REFLASH}" -eq 1 ]]; then + echo "[tier2] reflash: ${SSH_CMD:-(local)} nvidia-sdkmanager-cli flash --target-spec jetson-orin-nano-super" + fi + echo "[tier2] delegate: ${DELEGATE_CMD[*]}" + exit 0 +fi + +provision_jetson +[[ "${RUN_REFLASH}" -eq 1 ]] && reflash_jetson + +"${DELEGATE_CMD[@]}" + +echo "[tier2] Suite complete. RUN_ID=${RUN_ID}" diff --git a/e2e/jetson/tier2-on-jetson.sh b/e2e/jetson/tier2-on-jetson.sh new file mode 100755 index 0000000..c510ddb --- /dev/null +++ b/e2e/jetson/tier2-on-jetson.sh @@ -0,0 +1,149 @@ +#!/usr/bin/env bash +# Tier-2 ON-JETSON delegate. NOT invoked directly by humans — `run-tier2.sh` +# ssh-orchestrates this script onto the configured Jetson host. +# +# Responsibilities: +# * Verify `gps-denied-onboard.service` (or the `*-asan` variant) is healthy. +# * Spawn tegrastats + jtop parallel samplers; route their output into the +# evidence bundle. +# * Drive the e2e-runner image via docker compose against +# `docker-compose.test.yml + docker-compose.tier2-bridge.yml`. 
+# * Tear down samplers cleanly on EXIT / INT / TERM. +# +# Required env vars (set by run-tier2.sh): +# RUN_ID Run identifier (utc-stamp). +# FC_ADAPTER ardupilot | inav +# VIO_STRATEGY okvis2 | klt_ransac | vins_mono +# BUILD_KIND production | asan +# SELECTOR pytest -k expression (may be empty) +# ENABLE_CHAMBER 0 | 1 +# JETSON_HOST host alias used by the test for SUT identification + +set -euo pipefail + +: "${RUN_ID:?RUN_ID must be set by run-tier2.sh}" +: "${FC_ADAPTER:?FC_ADAPTER must be set}" +: "${VIO_STRATEGY:?VIO_STRATEGY must be set}" +: "${BUILD_KIND:=production}" +: "${SELECTOR:=}" +: "${ENABLE_CHAMBER:=0}" +: "${JETSON_HOST:=localhost}" + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)" +RESULTS_DIR="${REPO_ROOT}/e2e-results/run-${RUN_ID}" +EVIDENCE_DIR="${RESULTS_DIR}/evidence" + +mkdir -p "${EVIDENCE_DIR}" + +# AC-5: the asan build is a separate systemd unit so it can run alongside +# the production one for control/treatment comparisons. +case "${BUILD_KIND}" in + production) + SUT_UNIT="gps-denied-onboard.service" + ;; + asan) + SUT_UNIT="gps-denied-onboard-asan.service" + # ASan stderr stream is captured into the evidence bundle (see + # AC-5: "stderr captured into asan-fuzz-${test_id}.log"). We tail + # the unit's journal into the evidence file via journalctl. + ASAN_LOG="${EVIDENCE_DIR}/asan-fuzz.log" + ;; + *) + echo "[tier2-on-jetson] FATAL: unknown BUILD_KIND=${BUILD_KIND}" >&2 + exit 2 + ;; +esac + +# AC-3: systemd lifecycle. Restart on demand; fail loud if it doesn't +# come back up. +echo "[tier2-on-jetson] verifying ${SUT_UNIT} is active..." +if ! systemctl is-active --quiet "${SUT_UNIT}"; then + echo "[tier2-on-jetson] ${SUT_UNIT} is not active — restarting..." >&2 + sudo systemctl restart "${SUT_UNIT}" + # AC-3 says "restart within ≤5 s"; we poll up to 5s + 1s safety + # margin. 
+ for _ in 1 2 3 4 5 6; do + sleep 1 + if systemctl is-active --quiet "${SUT_UNIT}"; then + break + fi + done + if ! systemctl is-active --quiet "${SUT_UNIT}"; then + echo "[tier2-on-jetson] FATAL: ${SUT_UNIT} failed to start" >&2 + sudo systemctl status "${SUT_UNIT}" --no-pager || true + exit 3 + fi +fi + +# AC-4: tegrastats + jtop parallel capture. Output streams into the +# evidence bundle. +TEGRA_CSV="${EVIDENCE_DIR}/tegrastats-${JETSON_HOST}-${RUN_ID}.csv" +JTOP_CSV="${EVIDENCE_DIR}/jtop-${JETSON_HOST}-${RUN_ID}.csv" +TEGRA_PID="" +JTOP_PID="" +ASAN_TAIL_PID="" + +if command -v tegrastats >/dev/null 2>&1; then + # 5 Hz sampling matches the parser's expected cadence. + tegrastats --interval 200 \ + | python3 "${SCRIPT_DIR}/tegrastats_parser.py" --out "${TEGRA_CSV}" & + TEGRA_PID=$! + echo "[tier2-on-jetson] tegrastats sampler pid=${TEGRA_PID} → ${TEGRA_CSV}" +else + echo "[tier2-on-jetson] WARNING: tegrastats not in PATH — skipping that evidence channel." >&2 +fi + +if command -v jtop >/dev/null 2>&1; then + python3 "${SCRIPT_DIR}/jtop_parser.py" --out "${JTOP_CSV}" --interval 1.0 & + JTOP_PID=$! + echo "[tier2-on-jetson] jtop sampler pid=${JTOP_PID} → ${JTOP_CSV}" +else + echo "[tier2-on-jetson] WARNING: jtop not in PATH — skipping that evidence channel." >&2 +fi + +if [[ "${BUILD_KIND}" == "asan" ]]; then + journalctl -u "${SUT_UNIT}" -f --no-pager > "${ASAN_LOG}" 2>&1 & + ASAN_TAIL_PID=$! + echo "[tier2-on-jetson] asan journal tail pid=${ASAN_TAIL_PID} → ${ASAN_LOG}" +fi + +cleanup() { + local rc=$? + [[ -n "${TEGRA_PID}" ]] && kill "${TEGRA_PID}" 2>/dev/null || true + [[ -n "${JTOP_PID}" ]] && kill "${JTOP_PID}" 2>/dev/null || true + [[ -n "${ASAN_TAIL_PID}" ]] && kill "${ASAN_TAIL_PID}" 2>/dev/null || true + echo "[tier2-on-jetson] cleanup complete (rc=${rc})" + exit "${rc}" +} +trap cleanup EXIT INT TERM + +# AC-1: selector parity. SELECTOR is forwarded as `-k ""` to the +# pytest inside the runner image; empty SELECTOR means "all tests". 
+PYTEST_ARGS=("/test-suite") +PYTEST_ARGS+=("--csv=/e2e-results/run-${RUN_ID}/report.csv") +PYTEST_ARGS+=("--csv-columns=test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths") +PYTEST_ARGS+=("--evidence-out=/e2e-results/run-${RUN_ID}/evidence") +PYTEST_ARGS+=("--build-kind=${BUILD_KIND}") +[[ "${ENABLE_CHAMBER}" -eq 1 ]] && PYTEST_ARGS+=("--enable-chamber") +[[ -n "${SELECTOR}" ]] && PYTEST_ARGS+=("-k" "${SELECTOR}") + +( + cd "${REPO_ROOT}/e2e/docker" + RUN_ID="${RUN_ID}" \ + FC_ADAPTER="${FC_ADAPTER}" \ + VIO_STRATEGY="${VIO_STRATEGY}" \ + TIER="tier2-jetson" \ + JETSON_HOST="${JETSON_HOST}" \ + BUILD_KIND="${BUILD_KIND}" \ + docker compose \ + -f docker-compose.test.yml \ + -f docker-compose.tier2-bridge.yml \ + run --rm \ + -e TIER=tier2-jetson \ + -e BUILD_KIND="${BUILD_KIND}" \ + e2e-runner \ + pytest "${PYTEST_ARGS[@]}" +) + +echo "[tier2-on-jetson] Suite complete. Report: ${RESULTS_DIR}/report.csv" diff --git a/e2e/runner/conftest.py b/e2e/runner/conftest.py index 293bebc..111e777 100644 --- a/e2e/runner/conftest.py +++ b/e2e/runner/conftest.py @@ -211,4 +211,5 @@ def mock_suite_sat_url() -> str: pytest_plugins = [ "runner.reporting.csv_reporter", "runner.reporting.evidence_bundler", + "runner.reporting.nfr_recorder", ] diff --git a/e2e/runner/reporting/csv_reporter.py b/e2e/runner/reporting/csv_reporter.py index 90cccb1..08aaaac 100644 --- a/e2e/runner/reporting/csv_reporter.py +++ b/e2e/runner/reporting/csv_reporter.py @@ -89,6 +89,22 @@ def _outcome_to_result(report: pytest.TestReport, item: pytest.Item) -> str: deferred = item.get_closest_marker("deferred_ac") if deferred and deferred.kwargs.get("verdict") == "xfail": return "XFAIL" + # AZ-445 AC-4 (PARTIAL propagation): if the NFR recorder marked + # any AC PARTIAL for this nodeid, the row is PARTIAL instead of + # PASS. The aggregator is the source of truth. 
+ try: + # Local import keeps csv_reporter usable when nfr_recorder + # is not loaded (e.g. in the standalone unit-test that + # exercises csv_reporter alone). + from .nfr_recorder import aggregator_for # noqa: PLC0415 + + aggregator = aggregator_for(item.session.config) + except Exception: + aggregator = None + if aggregator is not None: + for rec in aggregator.records(): + if rec.nodeid == report.nodeid and rec.partial_acs: + return "PARTIAL" return "PASS" if report.outcome == "failed": return "FAIL" diff --git a/e2e/runner/reporting/nfr_recorder.py b/e2e/runner/reporting/nfr_recorder.py new file mode 100644 index 0000000..24cafec --- /dev/null +++ b/e2e/runner/reporting/nfr_recorder.py @@ -0,0 +1,408 @@ +"""NFR metrics recorder + run-end aggregator (AZ-445). + +Extends the AZ-406 reporting subsystem with three additional artifacts: + +* ``per-nfr/<scenario_id>.json`` — canonical metric blob per NFT scenario. +* ``traceability-status.json`` — per-AC coverage roll-up across the run. +* ``regression-baseline.json`` — flat dump of every numeric metric the + run captured (diffable across runs). + +Public API (used by NFT scenario tests): + + def test_nft_perf_01_partition_latency_p95(nfr_recorder): + nfr_recorder.record_metric("latency_ms_p95", 380.4, ac_id="AC-4.1") + nfr_recorder.partial( + "AC-4.1", + "p95 exceeds 400 ms when chamber is enabled (deferred to NFT-PERF-01b)", + ) + +The recorder also exposes ``nfr_recorder.scenario_id`` for tests that need +to name their evidence files consistently with the per-NFR JSON. + +PARTIAL propagation: ``nfr_recorder.partial(ac_id, reason)`` marks the +current test row as PARTIAL in the CSV reporter and the corresponding +AC as PARTIAL in the traceability roll-up. Tests that PASS without +calling ``partial`` are recorded as Covered.
+""" + +from __future__ import annotations + +import json +import logging +import re +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + +import pytest + +from .csv_reporter import reporter_for + +logger = logging.getLogger(__name__) + + +# ───────────────────────── data model ───────────────────────── + + +@dataclass +class _ScenarioRecord: + scenario_id: str + nodeid: str + traces_to: tuple[str, ...] + metrics: dict[str, Any] = field(default_factory=dict) + partial_acs: dict[str, str] = field(default_factory=dict) # ac_id → reason + outcome: str | None = None # filled in at logreport time + + +# ───────────────────── traceability matrix parser ───────────────────── + + +_AC_ROW_RE = re.compile(r"^\|\s*(AC-[A-Za-z0-9\.-]+)\s*\|", re.MULTILINE) +_RESTRICT_ROW_RE = re.compile( + r"^\|\s*(RESTRICT-[A-Za-z0-9\.-]+)\s*\|", re.MULTILINE +) + + +def parse_traceability_matrix(matrix_path: Path) -> list[str]: + """Extract every AC / RESTRICT ID declared in the matrix file. + + Returns a sorted, deduplicated list. Public so unit tests can call + it independently of pytest. 
+ """ + + if not matrix_path.is_file(): + raise FileNotFoundError(f"traceability matrix not found at {matrix_path}") + text = matrix_path.read_text() + ids: set[str] = set() + for match in _AC_ROW_RE.finditer(text): + ids.add(match.group(1)) + for match in _RESTRICT_ROW_RE.finditer(text): + ids.add(match.group(1)) + return sorted(ids) + + +# ───────────────────── recorder fixture ───────────────────── + + +class _NfrRecorder: + """Per-test handle exposed via the ``nfr_recorder`` pytest fixture.""" + + def __init__( + self, + scenario_id: str, + nodeid: str, + traces_to: tuple[str, ...], + run: "_RunAggregator", + ) -> None: + self.scenario_id = scenario_id + self.nodeid = nodeid + self.traces_to = traces_to + self._run = run + + def record_metric(self, name: str, value: Any, ac_id: str | None = None) -> None: + """Capture a numeric / structured metric for this scenario.""" + if not isinstance(name, str) or not name: + raise ValueError(f"metric name must be a non-empty str, got {name!r}") + self._run.record_metric( + scenario_id=self.scenario_id, + name=name, + value=value, + ac_id=ac_id, + nodeid=self.nodeid, + ) + + def partial(self, ac_id: str, reason: str) -> None: + """Mark `ac_id` PARTIAL for this scenario and propagate to CSV row.""" + if not ac_id or not reason: + raise ValueError("partial() requires both ac_id and reason") + self._run.mark_partial( + scenario_id=self.scenario_id, + ac_id=ac_id, + reason=reason, + nodeid=self.nodeid, + ) + + +# ───────────────────── run aggregator ───────────────────── + + +class _RunAggregator: + """Plugin-scoped state for the whole pytest session.""" + + def __init__( + self, + evidence_dir: Path, + matrix_ids: list[str], + ) -> None: + self.evidence_dir = evidence_dir + self.matrix_ids = matrix_ids + self._records: dict[str, _ScenarioRecord] = {} + + # --- mutation API used by _NfrRecorder --- + + def ensure_record( + self, scenario_id: str, nodeid: str, traces_to: tuple[str, ...] 
+ ) -> _ScenarioRecord: + rec = self._records.get(nodeid) + if rec is None: + rec = _ScenarioRecord( + scenario_id=scenario_id, + nodeid=nodeid, + traces_to=traces_to, + ) + self._records[nodeid] = rec + return rec + + def record_metric( + self, + *, + scenario_id: str, + name: str, + value: Any, + ac_id: str | None, + nodeid: str, + ) -> None: + rec = self._records[nodeid] + rec.metrics[name] = {"value": value, "ac_id": ac_id} + + def mark_partial( + self, + *, + scenario_id: str, + ac_id: str, + reason: str, + nodeid: str, + ) -> None: + rec = self._records[nodeid] + rec.partial_acs[ac_id] = reason + + def set_outcome(self, nodeid: str, outcome: str) -> None: + """Called by the plugin's logreport hook.""" + rec = self._records.get(nodeid) + if rec is not None: + rec.outcome = outcome + + # --- read-only accessors used by tests + emission --- + + def records(self) -> list[_ScenarioRecord]: + return list(self._records.values()) + + # --- emission (called at session end) --- + + def emit_per_nfr_json(self) -> list[Path]: + """One file per scenario under ``<evidence_dir>/per-nfr/``.""" + out_dir = self.evidence_dir / "per-nfr" + out_dir.mkdir(parents=True, exist_ok=True) + emitted: list[Path] = [] + for rec in self._records.values(): + path = out_dir / f"{rec.scenario_id}.json" + blob = { + "scenario_id": rec.scenario_id, + "nodeid": rec.nodeid, + "traces_to": list(rec.traces_to), + "outcome": rec.outcome or "UNKNOWN", + "metrics": rec.metrics, + "partial_acs": rec.partial_acs, + } + path.write_text(json.dumps(blob, sort_keys=True, indent=2) + "\n") + emitted.append(path) + return emitted + + def compute_traceability_status(self) -> dict: + """Aggregate per-AC status across all recorded scenarios. + + Algorithm: + * NOT COVERED — no scenario traces to this AC. + * PARTIAL — at least one scenario marks the AC PARTIAL + OR has outcome ∈ {FAIL, SKIP}. + * Covered — every tracing scenario has outcome ∈ + {PASS, XFAIL} and none marked PARTIAL.
+ """ + by_ac: dict[str, dict] = { + ac: {"status": "NOT COVERED", "sources": []} for ac in self.matrix_ids + } + for rec in self._records.values(): + for ac in rec.traces_to: + entry = by_ac.setdefault(ac, {"status": "NOT COVERED", "sources": []}) + entry["sources"].append(rec.scenario_id) + outcome = (rec.outcome or "").upper() + if ac in rec.partial_acs: + entry["status"] = "PARTIAL" + elif outcome in {"FAIL", "SKIP"}: + # Worse than partial — still surface as PARTIAL per + # AZ-445 AC-2 (status enum is {Covered, PARTIAL, NOT COVERED}). + if entry["status"] != "PARTIAL": + entry["status"] = "PARTIAL" + elif outcome in {"PASS", "XFAIL"}: + # Promote NOT COVERED → Covered; keep PARTIAL pinned. + if entry["status"] == "NOT COVERED": + entry["status"] = "Covered" + # Unknown / missing outcomes stay as whatever they were + # — we don't downgrade a PARTIAL by an unknown. + + # Make output deterministic: sort sources within each AC entry. + for entry in by_ac.values(): + entry["sources"] = sorted(set(entry["sources"])) + return by_ac + + def emit_traceability_status(self) -> Path: + path = self.evidence_dir / "traceability-status.json" + path.write_text( + json.dumps(self.compute_traceability_status(), sort_keys=True, indent=2) + + "\n" + ) + return path + + def emit_regression_baseline(self) -> Path: + """Flat dump of every numeric metric for diff tooling.""" + path = self.evidence_dir / "regression-baseline.json" + blob = { + "scenarios": { + rec.scenario_id: { + "metrics": { + name: entry["value"] + for name, entry in rec.metrics.items() + if isinstance(entry["value"], (int, float)) + }, + "outcome": rec.outcome or "UNKNOWN", + } + for rec in self._records.values() + } + } + path.write_text(json.dumps(blob, sort_keys=True, indent=2) + "\n") + return path + + +# ───────────────────── pytest plugin glue ───────────────────── + + +_AGGREGATOR_KEY = pytest.StashKey["_RunAggregator | None"]() + + +def pytest_addoption(parser: pytest.Parser) -> None: + group = 
parser.getgroup("e2e-runner") + group.addoption( + "--traceability-matrix", + action="store", + default=None, + help=( + "Path to traceability-matrix.md (default: " + "_docs/02_document/tests/traceability-matrix.md relative to repo root). " + "Used to seed the NOT COVERED rows in traceability-status.json." + ), + ) + + +def _resolve_matrix_path(config: pytest.Config) -> Path: + opt = config.getoption("--traceability-matrix") + if opt: + return Path(opt) + return Path(__file__).resolve().parents[3] / "_docs" / "02_document" / "tests" / "traceability-matrix.md" + + +def pytest_configure(config: pytest.Config) -> None: + """Parse the traceability matrix and create the aggregator. + + `--evidence-out` is owned by the runner's conftest.py; by the time + this hook fires, that option is registered and available. The + aggregator's emission directory is therefore known up front. + """ + config.stash[_AGGREGATOR_KEY] = None + matrix_path = _resolve_matrix_path(config) + try: + matrix_ids = parse_traceability_matrix(matrix_path) + except FileNotFoundError: + logger.warning( + "traceability matrix not found at %s — NOT COVERED rows will be empty", + matrix_path, + ) + matrix_ids = [] + + try: + evidence_out = config.getoption("--evidence-out") + except ValueError: + # `--evidence-out` is registered by the runner's conftest. In + # unit-test contexts where that conftest isn't loaded, default + # to the cwd so the aggregator still emits something — the unit + # test redirects this via direct construction of `_RunAggregator`. + evidence_out = "." 
+ + aggregator = _RunAggregator(Path(evidence_out), matrix_ids) + config.stash[_AGGREGATOR_KEY] = aggregator + config.pluginmanager.register(_PluginHooks(aggregator), name="e2e-nfr-recorder") + + config.addinivalue_line( + "markers", "scenario_id(name): explicit NFT scenario id for the per-NFR JSON" + ) + + +class _PluginHooks: + """Tiny plugin instance that owns the logreport+sessionfinish hooks.""" + + def __init__(self, aggregator: _RunAggregator) -> None: + self._agg = aggregator + + def pytest_runtest_logreport(self, report: pytest.TestReport) -> None: + if report.when != "call": + return + outcome_map = { + "passed": "PASS", + "failed": "FAIL", + "skipped": "SKIP", + } + self._agg.set_outcome(report.nodeid, outcome_map.get(report.outcome, "UNKNOWN")) + + def pytest_sessionfinish(self, session: pytest.Session, exitstatus: int) -> None: # noqa: ARG002 + self._agg.emit_per_nfr_json() + self._agg.emit_traceability_status() + self._agg.emit_regression_baseline() + + +def _scenario_id_for(item: pytest.Item) -> str: + marker = item.get_closest_marker("scenario_id") + if marker and marker.args: + return str(marker.args[0]) + # Fall back to the test_id marker (compat with csv_reporter) or + # finally the nodeid. 
+ test_id = item.get_closest_marker("test_id") + if test_id and test_id.args: + return str(test_id.args[0]) + return item.nodeid + + +def _traces_to_for(item: pytest.Item) -> tuple[str, ...]: + marker = item.get_closest_marker("traces_to") + if marker is None: + return () + ids = marker.args[0] if marker.args else marker.kwargs.get("ids", ()) + if isinstance(ids, str): + return tuple(s.strip() for s in ids.split(",") if s.strip()) + return tuple(ids) + + +@pytest.fixture +def nfr_recorder(request: pytest.FixtureRequest) -> _NfrRecorder: + """Fixture handle for NFT scenarios to record metrics + partials.""" + aggregator = request.config.stash.get(_AGGREGATOR_KEY, None) + if aggregator is None: + pytest.skip( + "nfr_recorder requires --evidence-out (the bundler's option) " + "to be set; the harness configures it at runtime." + ) + scenario_id = _scenario_id_for(request.node) + traces_to = _traces_to_for(request.node) + rec = aggregator.ensure_record(scenario_id, request.node.nodeid, traces_to) + return _NfrRecorder( + scenario_id=rec.scenario_id, + nodeid=rec.nodeid, + traces_to=rec.traces_to, + run=aggregator, + ) + + +# ───────────────────── public accessors for cross-plugin use ───────────────────── + + +def aggregator_for(config: pytest.Config) -> _RunAggregator | None: + """Used by csv_reporter to propagate PARTIAL into the row's result column.""" + return config.stash.get(_AGGREGATOR_KEY, None) diff --git a/pyproject.toml b/pyproject.toml index 3768049..4ec6807 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -106,6 +106,15 @@ dev = [ # production runtime of the mock lives inside its own Docker image so # the SUT does not depend on FastAPI; this is a test-only dep. "fastapi>=0.111,<0.120", + # AZ-407 (blackbox tile-cache + age-injector + cve-jpeg fixtures): the + # tile-cache-builder re-encodes paired _gmaps.png references into + # deterministic JPEG bodies and emits stub tiles via PIL. 
The + # production builder runs inside its own Docker image (which installs + # Pillow itself); this Pillow pin is only the test-time dep used by + # `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`. Pin range + # tracks the Pillow that torchvision (project's inference extra) + # already accepts — currently 11.x / 12.x. + "Pillow>=10.4,<13.0", ] inference = [ "torch>=2.2",