# Batch 68 Report: Test Implementation (cycle 1, batch 2 of test phase)

**Batch**: 68
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10, Implement Tests)
**Tasks**: AZ-407 (3pt), AZ-444 (5pt), AZ-445 (2pt); 10 cp / 3 tasks
**Cycle**: 1
**Verdict**: COMPLETE, PASS (self-reviewed)

## Summary

Three blackbox-harness tasks, all dependent only on AZ-406:

### AZ-407: Static fixture builders (3pt)

Concrete deliverables for the five static fixtures named in `test-data.md`:

* **tile-cache-fixture** (`e2e/fixtures/tile-cache-builder/`): `builder.py` (pure Python; emits tile JPEGs + sidecar JSON + `manifest.csv` + FAISS HNSW `descriptors.index`), `Dockerfile` (Python 3.10-slim + Pillow + numpy + faiss-cpu), `build.sh` (Docker volume mode + `--local` unit-test mode). Reproducibility primitives: sorted input iteration, fixed PIL JPEG settings (`quality=85, optimize=False, progressive=False, subsampling=2`), manifest rows sorted by `(zoom, x, y)`, FAISS single-threaded with fixed seed. AC-1 verified by `test_builder_is_deterministic`.
* **age-injector** (`e2e/fixtures/age-injector/`): `age_injector.py` (clones the tile tree bit-identical, mutates manifest + sidecar `capture_date` to `now - age_months × 30.44d`), `inject.sh` (emits `synth-age-7mo` + `synth-age-13mo` named Docker volumes). Tile pixels remain byte-equal across age injection.
* **cold-boot-fixture** (`e2e/fixtures/cold-boot/cold_boot_fixture.json`): frozen FC pose snapshot at flight-resume time. Schema v1 carries `global_position_int` (lat_e7 / lon_e7 / alt_mm / hdg_cdeg), `attitude` (roll/pitch/yaw_rad), and per-FC param-load hints. The fixture lat/lon sits inside the Derkachi bbox; AZ-419 (FT-P-11) drives the SITL parameter-load path.
* **mavlink-test-passkey** (`e2e/fixtures/secrets/mavlink-test-passkey.txt`): 64-hex passkey with the required `# TEST ONLY — not for production use` header line.
  Sync with the Docker-secret file `e2e/docker/secrets/mavlink_passkey` is enforced by the updated `test_passkey_files_match` (strips the comment header before byte comparison).
* **cve-2025-53644.jpg** (`e2e/fixtures/security/`): synthetic malformed JPEG (truncated SOS marker, no EOI). The generator `generate_cve_jpeg.py` emits a 158-byte file with pinned SHA-256 `c281d2f25959…877002e`. OpenCV 4.11 (vulnerable line) rejects it gracefully with `imdecode → None`. AZ-439 (NFT-SEC-04) will sharpen this for full ASan instrumentation.

A top-level `Makefile` provides the `make fixtures` / `make fixtures-*` / `make fixtures-unit-tests` / `make e2e-tier1` targets. Per-fixture READMEs document source, license, provenance, and reproducibility per AC-7.

### AZ-444: Tier-2 Jetson harness wrapper (5pt)

The AZ-406 scaffold of `run-tier2.sh` covered the local-execution on-Jetson path; AZ-444 splits the harness into the orchestrator-side and on-device parts:

* **`e2e/jetson/run-tier2.sh`** (rewritten): orchestrator. Detects local (aarch64 + TIER2_HOST=localhost) vs remote (ssh into `TIER2_HOST`). Flags: `--fc-adapter`, `--vio-strategy`, `-k`/`--selector`, `--build-kind production|asan`, `--duration`, `--enable-chamber`, `--reflash`, `--dry-run`. Remote mode rsyncs the `e2e/` tree to `/opt/azaion-e2e/` on the Jetson and invokes the on-device delegate over ssh. The reflash path requires both `--reflash` AND `TIER2_REFLASH_ACK=1` (two-key gate).
* **`e2e/jetson/tier2-on-jetson.sh`** (new): on-device delegate. Verifies `gps-denied-onboard.service` (or `*-asan.service` for `--build-kind=asan`); restarts with 5-second tolerance per AC-3; spawns tegrastats + jtop parallel samplers per AC-4; tails the ASan unit's journal into `asan-fuzz.log` when in asan mode; drives the e2e-runner via docker compose with TIER=tier2-jetson; forwards `SELECTOR` to pytest's `-k` per AC-1.
* **`e2e/docker/run-tier1.sh`** (new): selector-parity sibling. Same flag surface as `run-tier2.sh` minus the ssh / reflash options.
  AC-1 is verified by `test_selector_parity_pytest_args_equivalent`, which extracts the `-k ` argument from both dry-run outputs and asserts the same string is present.

ACs whose authentic verification path requires a Jetson are documented in this report's "AC Test Coverage" tables and gated behind docker-bound smoke tests inside the runner image.

### AZ-445: CSV reporter + evidence bundler refinements (2pt)

* **`e2e/runner/reporting/nfr_recorder.py`** (new): pytest plugin. Provides the `nfr_recorder` fixture; tests call `nfr_recorder.record_metric(name, value, ac_id)` and `nfr_recorder.partial(ac_id, reason)`. At session end the plugin emits three artifacts into the evidence dir:
  - `per-nfr/.json`: one file per recorded scenario (AC-1)
  - `traceability-status.json`: every AC from `_docs/02_document/tests/traceability-matrix.md` listed with status ∈ {Covered, PARTIAL, NOT COVERED} and source scenarios (AC-2)
  - `regression-baseline.json`: flat numeric-metric dump for diff tooling (AC-3)
* **`e2e/runner/reporting/csv_reporter.py`** (extended): the `_outcome_to_result` path now consults the aggregator, so when an NFR-recorded scenario has any PARTIAL AC, the row's `result` column is `PARTIAL` instead of `PASS` (AC-4). It falls back gracefully when the aggregator isn't registered (unit-test contexts).
* **`e2e/runner/conftest.py`**: registers `nfr_recorder` in `pytest_plugins`.
* New CLI flag `--traceability-matrix` (default: project's `_docs/02_document/tests/traceability-matrix.md`) lets the aggregator seed the NOT COVERED rows. The matrix parser uses two regex passes (`AC-…` and `RESTRICT-…` table-row prefixes); 88 IDs in the current matrix file parse cleanly.
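
The two-pass matrix parsing and NOT COVERED seeding described above could be sketched roughly as follows. This is a minimal illustration, not the code in `nfr_recorder.py`: the regex patterns, the function names `parse_matrix_ids` and `classify`, the ID shapes in the sample, and the rule that PARTIAL takes precedence over Covered are all assumptions.

```python
import re

# Hypothetical row patterns; the real matrix row format may differ.
AC_ROW = re.compile(r"^\|\s*(AC-[\w.\-]+)\s*\|", re.MULTILINE)
RESTRICT_ROW = re.compile(r"^\|\s*(RESTRICT-[\w.\-]+)\s*\|", re.MULTILINE)


def parse_matrix_ids(markdown_text):
    """Collect AC and RESTRICT IDs from table rows, keeping first-seen order."""
    ids = []
    for pattern in (AC_ROW, RESTRICT_ROW):  # two independent regex passes
        for match in pattern.finditer(markdown_text):
            if match.group(1) not in ids:
                ids.append(match.group(1))
    return ids


def classify(all_ids, covered, partial):
    """Seed every ID as NOT COVERED, then upgrade from recorded results.

    Assumption: PARTIAL wins when an ID appears in both sets.
    """
    status = {ac_id: "NOT COVERED" for ac_id in all_ids}
    for ac_id in covered:
        if ac_id in status:
            status[ac_id] = "Covered"
    for ac_id in partial:
        if ac_id in status:
            status[ac_id] = "PARTIAL"
    return status


# Made-up sample rows for illustration only; not copied from the real matrix.
SAMPLE = """\
| ID | Requirement | Scenario |
|----|-------------|----------|
| AC-1.1 | sample requirement A | scenario-a |
| AC-1.2 | sample requirement B | scenario-b |
| RESTRICT-HW-1 | sample restriction | scenario-c |
"""

ids = parse_matrix_ids(SAMPLE)
result = classify(ids, covered={"AC-1.1"}, partial={"AC-1.2"})
```

Seeding from the matrix first means an AC with no recording test still appears in the output, which is what makes the NOT COVERED rows visible at all.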
## Files added / modified

### Added (18)

AZ-407:

* `e2e/fixtures/tile-cache-builder/builder.py`
* `e2e/fixtures/tile-cache-builder/Dockerfile`
* `e2e/fixtures/tile-cache-builder/build.sh`
* `e2e/fixtures/age-injector/age_injector.py`
* `e2e/fixtures/age-injector/inject.sh`
* `e2e/fixtures/cold-boot/cold_boot_fixture.json`
* `e2e/fixtures/security/cve-2025-53644.jpg` (158 bytes; generated)

AZ-444:

* `e2e/jetson/tier2-on-jetson.sh`
* `e2e/docker/run-tier1.sh`

AZ-445:

* `e2e/runner/reporting/nfr_recorder.py`

Top-level:

* `Makefile`

Unit tests (AZ-407 + AZ-444 + AZ-445):

* `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`
* `e2e/_unit_tests/fixtures/test_age_injector.py`
* `e2e/_unit_tests/fixtures/test_cold_boot_fixture.py`
* `e2e/_unit_tests/fixtures/test_mavlink_passkey.py`
* `e2e/_unit_tests/fixtures/test_cve_jpeg.py`
* `e2e/_unit_tests/jetson/test_run_tier_scripts.py`
* `e2e/_unit_tests/reporting/test_nfr_recorder.py`

### Modified (12)

* `pyproject.toml`: added `Pillow>=10.4,<13.0` to dev extras (used by `test_tile_cache_builder.py` to verify reproducibility without Docker).
* `e2e/jetson/run-tier2.sh`: rewritten as the orchestrator (was a local-only stub from AZ-406).
* `e2e/fixtures/secrets/mavlink-test-passkey.txt`: added the required `# TEST ONLY — not for production use` header line per AZ-407 AC-5.
* `e2e/fixtures/secrets/README.md`: expanded per AC-7 (license, provenance, sync-with-docker-secret note).
* `e2e/fixtures/security/generate_cve_jpeg.py`: concrete impl (replaces the AZ-406 NotImplementedError pointer).
* `e2e/fixtures/security/README.md`: expanded per AC-7.
* `e2e/fixtures/tile-cache-builder/README.md`: expanded per AC-7.
* `e2e/fixtures/age-injector/README.md`: expanded per AC-7.
* `e2e/fixtures/cold-boot/README.md`: expanded; clarified that AZ-407 owns the JSON file (the prior README incorrectly pointed at AZ-419).
* `e2e/runner/reporting/csv_reporter.py`: PARTIAL propagation hook (AZ-445 AC-4).
* `e2e/runner/conftest.py`: registered `nfr_recorder` plugin.
* `e2e/_unit_tests/test_directory_layout.py`: added the new paths (10 new files); replaced the byte-equal passkey assertion with a header-stripping comparison.

## Spec / module-layout drift notes

* **AZ-407 spec uses `tests/fixtures/...` paths**, but the `blackbox_tests` cross-cutting entry in `_docs/02_document/module-layout.md` (added in preparatory commit `d7a17a8`) authoritatively places the e2e harness under `e2e/`. Implementation followed the module-layout entry; the spec text is pre-fix and was not updated. The AZ-407 archived spec retains its `tests/fixtures` wording for audit, but the actual file ownership is `e2e/fixtures/...`. No further action: the module-layout entry is the source of truth.
* **AZ-444 spec mentions `e2e/tier2/run-tier2.sh`**, but the AZ-406 scaffold placed Tier-2 scripts under `e2e/jetson/`. Kept at `e2e/jetson/` for consistency with the AZ-406 commit; no behavioural difference.
* **Cold-boot ownership**: AZ-419 spec line "Dependencies: AZ-406, AZ-407 (cold-boot-fixture)" confirms AZ-407 owns the JSON; the scaffold's old README incorrectly attributed ownership to AZ-419. Fixed in this batch.

## Test Results

### Focused tests (Step 6.4)

`pytest e2e/_unit_tests/`: **157 passed in 12.59s** (was 97 in batch 67; +60 new tests across this batch).

Breakdown of new tests:

* AZ-407 fixtures (30 cases): tile-cache determinism (7), age-injector shift+pixel-preserve (5), cold-boot schema (5), MAVLink passkey (3), CVE JPEG generator (5), provenance READMEs (5).
* AZ-444 Tier scripts (15 cases): existence+exec bit (3), Tier-1 dry-run (1), Tier-2 dry-run local/remote (2), CLI rejection (4), reflash gating (2), selector parity (3).
* AZ-445 NFR recorder (9 cases incl. 1 CSV-reporter PARTIAL guard).

No regressions in the 97 inherited AZ-406 tests. No per-batch full-suite run per the implement skill's Test-Run Cadence (Step 16 owns the only full-suite invocation).
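
To make the MAVLink passkey cases concrete, the header-stripping comparison that replaced the byte-equal assertion could look roughly like the sketch below. The helper names (`strip_comment_header`, `passkeys_match`, `is_64_lowercase_hex`) and the placeholder key are hypothetical; the actual logic in `test_passkey_files_match` may differ.

```python
def strip_comment_header(raw: bytes) -> bytes:
    """Drop every line that starts with '#' and trim surrounding whitespace."""
    payload = [ln for ln in raw.splitlines() if not ln.lstrip().startswith(b"#")]
    return b"\n".join(payload).strip()


def passkeys_match(fixture_bytes: bytes, secret_bytes: bytes) -> bool:
    """Compare the two files byte-for-byte after removing comment lines."""
    return strip_comment_header(fixture_bytes) == strip_comment_header(secret_bytes)


def is_64_lowercase_hex(payload: bytes) -> bool:
    """True when the payload is exactly 64 lowercase hexadecimal characters."""
    return len(payload) == 64 and all(c in b"0123456789abcdef" for c in payload)


# Placeholder value for illustration; not a real passkey.
key = b"ab" * 32

# The fixture file carries the TEST ONLY header (\xe2\x80\x94 is the UTF-8
# encoding of the dash in that header); the Docker secret is assumed to
# hold the bare key.
fixture = b"# TEST ONLY \xe2\x80\x94 not for production use\n" + key + b"\n"
secret = key

same = passkeys_match(fixture, secret)
well_formed = is_64_lowercase_hex(strip_comment_header(fixture))
```

Comparing the stripped payloads lets the fixture file carry its warning header while the Docker secret stays a bare key.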
## AC Test Coverage

### AZ-407

| AC | Test | Status |
|----|------|--------|
| AC-1 (deterministic) | `test_builder_is_deterministic` | Covered |
| AC-2 (footprint coverage) | `test_manifest_covers_60_stills_plus_bbox`, `test_real_tile_count_matches_paired_gmaps`, `test_manifest_schema_matches_restrictions_md` | Covered |
| AC-3 (aged dates) | `test_age_injector_shifts_capture_date[7-180]`, `[13-360]`, `test_age_injector_preserves_tile_bytes`, `test_age_injector_updates_sidecar_dates` | Covered |
| AC-4 (cold-boot SITL load) | `test_cold_boot_fixture_*`: JSON schema, Derkachi bbox membership, attitude bounds. **SITL load (±1 m EKF)** deferred to AZ-419 (Docker-bound, FT-P-11). | Covered by contract; full check is AZ-419 |
| AC-5 (mavlink passkey) | `test_passkey_has_comment_header`, `test_passkey_is_64_hex_chars`, `test_passkey_is_lowercase`, `test_passkey_files_match` | Covered |
| AC-6 (CVE JPEG no-crash) | `test_opencv_rejects_without_crash`, `test_jpeg_has_soi_and_truncated_sos`, `test_committed_fixture_matches_generator` | Covered |
| AC-7 (license + provenance) | `test_provenance_readme_lists_required_sections`, `test_age_injector_provenance_readme_exists`, `test_provenance_block_present`, `test_provenance_readme_exists` (CVE) | Covered |

### AZ-444

| AC | Test | Status |
|----|------|--------|
| AC-1 (selector parity) | `test_selector_parity_pytest_args_equivalent`, `test_selector_appears_in_dry_run[*]` | Covered |
| AC-2 (idempotent provisioning) | Static-shape verified in code review (dpkg-precondition guard); full check requires a Jetson host. **No unit test.** | NOT COVERED (hardware-loop) |
| AC-3 (systemd lifecycle) | Static-shape verified in code review (5×1s poll loop); full check requires a Jetson host. **No unit test.** | NOT COVERED (hardware-loop) |
| AC-4 (tegrastats parallel capture) | `test_required_path_exists[jetson/tegrastats_parser.py]` + AZ-406 parser unit tests; full pipe-capture path requires a Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-5 (ASan-fuzz) | `test_tier2_rejects_unknown_build_kind`; ASan unit `gps-denied-onboard-asan.service` is referenced by name in the delegate. Full check requires ASan-instrumented SUT on Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-6 (image-flash gating) | `test_reflash_refuses_without_ack`, `test_reflash_dry_run_with_ack_shows_flash_command` | Covered |

AC-2 and AC-3 are documented as hardware-loop ACs whose runtime verification path is the on-Jetson smoke test. The scripts compile, parse, and dry-run correctly; they cannot be authentically verified without a Jetson because mocking `systemctl` and `apt-get` would test the mock, not the real binding.

### AZ-445

| AC | Test | Status |
|----|------|--------|
| AC-1 (per-NFR JSON) | `test_emit_per_nfr_json_writes_one_file_per_scenario` + integration | Covered |
| AC-2 (traceability-status.json) | `test_emit_traceability_status_classifies_acs`, `test_emit_traceability_status_downgrades_on_fail`, `test_parse_traceability_matrix_*` | Covered |
| AC-3 (regression-baseline.json) | `test_emit_regression_baseline_dumps_numeric_metrics` + integration | Covered |
| AC-4 (PARTIAL propagation in CSV) | `test_build_row_pass_when_no_session_attribute`, integration test (`test_nfr_recorder_fixture_emits_artifacts_in_run`) | Covered |

## Code Review Verdict

Self-reviewed: PASS. Notable points:

* **Reproducibility** of the tile-cache builder relies on (a) sorted input iteration, (b) frozen PIL JPEG params, (c) FAISS single-thread + fixed seed (`omp_set_num_threads(1)` + `np.random.default_rng` seeded from a SHA hash of the content hash). Test verifies bit-identical output across two runs.
* **Pillow pin compatibility**: the local venv had Pillow 12.x via torchvision; my initial `<12.0` pin downgraded it to 11.3. Widened to `<13.0` so both major lines are accepted and the project's inference extras stay happy.
* **`np.random.default_rng` vs `RandomState`**: first impl used `RandomState.standard_normal(dim, dtype=np.float32)`, but the legacy `RandomState` API does not accept a `dtype` argument; replaced with `default_rng`, whose `standard_normal` does. The builder now works on the project's `numpy>=1.26,<2.0` pin.
* **CSV PARTIAL propagation** is decoupled via the aggregator: `_outcome_to_result` in `csv_reporter.py` imports `nfr_recorder` lazily and falls back to PASS when the import fails. This keeps the two plugins individually testable without a hard dependency.
* **Spec drift** flagged in this report's "Spec / module-layout drift notes" section. No action needed; the module-layout entry is the authoritative source.

## Auto-Fix Attempts

0 attempts. No code-review failures; the auto-fix gate was not entered.

## Stuck Agents

None.

## Deferred follow-ups

* AZ-419 (FT-P-11): owns SITL parameter-load verification of the cold-boot fixture (AZ-407 AC-4 runtime path).
* AZ-439 (NFT-SEC-04): owns the ASan-instrumented CVE-2025-53644 verification (AZ-407 AC-6's full PoC structure).
* AZ-444 hardware-loop ACs (AC-2/3/4/5): owned by the Tier-2 smoke test inside the runner image; will be re-verified on a Jetson bring-up cycle.

## Next Batch

Batch 69 candidate set (all unblocked):

* AZ-408 (Runtime synthetic injection, 3pt): outlier injector, blackout-spoof injector, multi-segment injector (the fixtures scaffolded by AZ-406 + AZ-407).
* AZ-410 (FT-P-01, frame-center GPS accuracy, 5pt)
* AZ-411 (FT-P-02, cumulative drift, 3pt)

Total: 11 cp across 3 tasks. AZ-408 unblocks the FT-N-* synthetic scenarios; AZ-410 / AZ-411 are the first concrete positive scenarios exercising the SUT through the full Docker-bound runner.