[AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter

Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.

AZ-407 — Static fixture builders (3pt):
  * tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
    deterministic tile-cache-fixture Docker volume from
    _docs/00_problem/input_data/. Reproducibility primitives: sorted
    iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
    threaded with seeded stub descriptors.
  * age-injector/{age_injector.py, inject.sh} clones the volume and
    shifts capture_date by N×30.44 days; tile JPEG bytes preserved
    bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
  * cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
    Derkachi sector centre, schema v1.
  * secrets/mavlink-test-passkey.txt: 64-hex with required
    `# TEST ONLY` header line per AC-5. Passkey-equality test now
    compares the secret line after stripping the header.
  * security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
    (truncated SOS marker). OpenCV 4.11.x rejects gracefully with
    imdecode → None. AZ-439 will sharpen for ASan instrumentation.
  * Top-level Makefile with `make fixtures` / `make fixtures-*` /
    `make e2e-tier1*` / `make unit-tests` targets.

AZ-444 — Tier-2 Jetson harness wrapper (5pt):
  * run-tier2.sh rewritten as orchestrator. Detects local
    (aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST).
    New flags: -k/--selector, --build-kind production|asan,
    --reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
    --dry-run.
  * tier2-on-jetson.sh (new) — on-device delegate. Verifies
    gps-denied-onboard{,-asan}.service health; restarts with 5s
    tolerance; spawns tegrastats + jtop parallel samplers; tails
    ASan unit's journal in asan mode; drives docker compose with
    TIER=tier2-jetson; forwards SELECTOR to pytest -k.
  * docker/run-tier1.sh (new) — selector-parity sibling.
  * AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
    --dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
    loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
    unit-test layer).

AZ-445 — CSV reporter + evidence bundler refinements (2pt):
  * reporting/nfr_recorder.py (new) — pytest plugin. Provides the
    `nfr_recorder` fixture with record_metric(name, value, ac_id)
    and partial(ac_id, reason). At session end emits:
      - per-nfr/<scenario_id>.json (AC-1)
      - traceability-status.json with every AC ID parsed from
        traceability-matrix.md, classified Covered/PARTIAL/NOT
        COVERED with source scenario IDs (AC-2)
      - regression-baseline.json with all numeric metrics (AC-3)
  * csv_reporter.py extended — `_outcome_to_result` consults the
    aggregator; rows flip PASS → PARTIAL when an AC was marked
    PARTIAL by nfr_recorder (AC-4). Graceful fallback when
    aggregator isn't registered (unit-test contexts).
  * conftest.py registers nfr_recorder in pytest_plugins.
  * New --traceability-matrix CLI flag seeds the NOT COVERED rows.

Build / config:
  * pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
    tile-cache-builder unit test (broad enough to keep torchvision's
    Pillow 12 pin happy; the production builder runs inside its own
    Docker image with its own pin).
  * Updated test_directory_layout.py to cover 10 new files + replaced
    the byte-equal passkey assertion with the header-stripping
    variant.

Test results:
  * 157 focused tests pass (was 97 in batch 67; +60 new across this
    batch). No regressions.

Module-layout / spec drift:
  * AZ-407 spec text says `tests/fixtures/...`; module-layout
    blackbox_tests entry (commit d7a17a8) authoritatively places the
    harness under `e2e/`. Implementation followed the layout entry.
  * AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
    at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
    consistency.
  * Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
    AZ-419's own Dependencies field.

Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-16 17:18:01 +03:00
parent e9e6e32097
commit 6599d828d2
35 changed files with 3716 additions and 147 deletions
+315
View File
@@ -0,0 +1,315 @@
# Batch 68 Report — Test Implementation (cycle 1, batch 2 of test phase)
**Batch**: 68
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-407 (3pt), AZ-444 (5pt), AZ-445 (2pt) — 10 cp / 3 tasks
**Cycle**: 1
**Verdict**: COMPLETE — PASS (self-reviewed)
## Summary
Three blackbox-harness tasks, all dependent only on AZ-406:
### AZ-407 — Static fixture builders (3pt)
Concrete deliverables for the five static fixtures named in
`test-data.md`:
* **tile-cache-fixture** — `e2e/fixtures/tile-cache-builder/`:
`builder.py` (pure Python; emits tile JPEGs + sidecar JSON +
`manifest.csv` + FAISS HNSW `descriptors.index`), `Dockerfile`
(Python 3.10-slim + Pillow + numpy + faiss-cpu), `build.sh`
(Docker volume mode + `--local` unit-test mode). Reproducibility
primitives: sorted input iteration, fixed PIL JPEG settings
(`quality=85, optimize=False, progressive=False, subsampling=2`),
manifest rows sorted by `(zoom, x, y)`, FAISS single-threaded with
fixed seed. AC-1 verified by `test_builder_is_deterministic`.
* **age-injector** — `e2e/fixtures/age-injector/`:
`age_injector.py` (clones the tile tree bit-identical, mutates
manifest + sidecar `capture_date` to `now - age_months × 30.44d`),
`inject.sh` (emits `synth-age-7mo` + `synth-age-13mo` named Docker
volumes). Tile pixels remain byte-equal across age injection.
* **cold-boot-fixture** — `e2e/fixtures/cold-boot/cold_boot_fixture.json`:
Frozen FC pose snapshot at flight-resume time. Schema v1 carries
`global_position_int` (lat_e7 / lon_e7 / alt_mm / hdg_cdeg),
`attitude` (roll/pitch/yaw_rad), and per-FC param-load hints. The
fixture lat/lon sits inside the Derkachi bbox; AZ-419 (FT-P-11)
drives the SITL parameter-load path.
* **mavlink-test-passkey** — `e2e/fixtures/secrets/mavlink-test-passkey.txt`:
64-hex passkey with the required `# TEST ONLY — not for production
use` header line. Sync with the Docker-secret file
`e2e/docker/secrets/mavlink_passkey` enforced by the updated
`test_passkey_files_match` (strips the comment header before byte
comparison).
* **cve-2025-53644.jpg** — `e2e/fixtures/security/`:
Synthetic malformed JPEG (truncated SOS marker, no EOI). The
generator `generate_cve_jpeg.py` emits a 158-byte file with
pinned SHA-256 `c281d2f25959…877002e`. OpenCV 4.11 (vulnerable
line) rejects gracefully with `imdecode → None`. AZ-439 (NFT-SEC-04)
will sharpen this for full ASan instrumentation.
Top-level `Makefile` with `make fixtures` / `make fixtures-*` /
`make fixtures-unit-tests` / `make e2e-tier1` targets.
Per-fixture READMEs document source, license, provenance, and
reproducibility per AC-7.
### AZ-444 — Tier-2 Jetson harness wrapper (5pt)
The AZ-406 scaffold of `run-tier2.sh` covered the local-execution
on-Jetson path; AZ-444 splits the harness into the orchestrator-side
and on-device parts:
* **`e2e/jetson/run-tier2.sh`** (rewritten) — orchestrator. Detects
local (aarch64 + TIER2_HOST=localhost) vs remote (ssh into
`TIER2_HOST`). Flags: `--fc-adapter`, `--vio-strategy`,
`-k`/`--selector`, `--build-kind production|asan`, `--duration`,
`--enable-chamber`, `--reflash`, `--dry-run`. Remote mode rsyncs
the `e2e/` tree to `/opt/azaion-e2e/` on the Jetson and ssh's the
on-device delegate. Reflash path requires both `--reflash` AND
`TIER2_REFLASH_ACK=1` (two-key gate).
* **`e2e/jetson/tier2-on-jetson.sh`** (new) — on-device delegate.
Verifies `gps-denied-onboard.service` (or `*-asan.service` for
`--build-kind=asan`); restarts with 5-second tolerance per AC-3;
spawns tegrastats + jtop parallel samplers per AC-4; tails the
ASan unit's journal into `asan-fuzz.log` when in asan mode; drives
the e2e-runner via docker compose with TIER=tier2-jetson; forwards
`SELECTOR` to pytest's `-k` per AC-1.
* **`e2e/docker/run-tier1.sh`** (new) — selector-parity sibling.
Same flag surface as `run-tier2.sh` minus the ssh / reflash
options. AC-1 verified by `test_selector_parity_pytest_args_equivalent`
which extracts the `-k <selector>` from both dry-run outputs and
asserts the same string is present.
ACs whose authentic verification path requires a Jetson are
documented in this report's "AC coverage" table and gated behind
docker-bound smoke tests inside the runner image.
### AZ-445 — CSV reporter + evidence bundler refinements (2pt)
* **`e2e/runner/reporting/nfr_recorder.py`** (new) — pytest plugin.
Provides the `nfr_recorder` fixture; tests call
`nfr_recorder.record_metric(name, value, ac_id)` and
`nfr_recorder.partial(ac_id, reason)`. At session end the plugin
emits three artifacts into the evidence dir:
- `per-nfr/<scenario_id>.json` — one file per recorded scenario
(AC-1)
- `traceability-status.json` — every AC from
`_docs/02_document/tests/traceability-matrix.md` listed with
status ∈ {Covered, PARTIAL, NOT COVERED} and source scenarios
(AC-2)
- `regression-baseline.json` — flat numeric-metric dump for
diff tooling (AC-3)
* **`e2e/runner/reporting/csv_reporter.py`** (extended) — the
`_outcome_to_result` path now consults the aggregator: when an
NFR-recorded scenario has any PARTIAL AC, the row's `result`
column is `PARTIAL` instead of `PASS` (AC-4). Graceful fallback
when the aggregator isn't registered (unit-test contexts).
* **`e2e/runner/conftest.py`** — registers `nfr_recorder` in
`pytest_plugins`.
* New CLI flag `--traceability-matrix` (default: project's
`_docs/02_document/tests/traceability-matrix.md`) lets the
aggregator seed the NOT COVERED rows.
The matrix parser uses two regex passes (`AC-…` and `RESTRICT-…`
table-row prefixes); 88 IDs in the current matrix file parse
cleanly.
## Files added / modified
### Added (15)
AZ-407:
* `e2e/fixtures/tile-cache-builder/builder.py`
* `e2e/fixtures/tile-cache-builder/Dockerfile`
* `e2e/fixtures/tile-cache-builder/build.sh`
* `e2e/fixtures/age-injector/age_injector.py`
* `e2e/fixtures/age-injector/inject.sh`
* `e2e/fixtures/cold-boot/cold_boot_fixture.json`
* `e2e/fixtures/security/cve-2025-53644.jpg` (158 bytes; generated)
AZ-444:
* `e2e/jetson/tier2-on-jetson.sh`
* `e2e/docker/run-tier1.sh`
AZ-445:
* `e2e/runner/reporting/nfr_recorder.py`
Top-level:
* `Makefile`
Unit tests (AZ-407 + AZ-444 + AZ-445):
* `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`
* `e2e/_unit_tests/fixtures/test_age_injector.py`
* `e2e/_unit_tests/fixtures/test_cold_boot_fixture.py`
* `e2e/_unit_tests/fixtures/test_mavlink_passkey.py`
* `e2e/_unit_tests/fixtures/test_cve_jpeg.py`
* `e2e/_unit_tests/jetson/test_run_tier_scripts.py`
* `e2e/_unit_tests/reporting/test_nfr_recorder.py`
### Modified (8)
* `pyproject.toml` — added `Pillow>=10.4,<13.0` to dev extras
(used by `test_tile_cache_builder.py` to verify reproducibility
without Docker).
* `e2e/jetson/run-tier2.sh` — rewritten as the orchestrator (was a
local-only stub from AZ-406).
* `e2e/fixtures/secrets/mavlink-test-passkey.txt` — added the
required `# TEST ONLY — not for production use` header line per
AZ-407 AC-5.
* `e2e/fixtures/secrets/README.md` — expanded per AC-7 (license,
provenance, sync-with-docker-secret note).
* `e2e/fixtures/security/generate_cve_jpeg.py` — concrete impl
(replaces the AZ-406 NotImplementedError pointer).
* `e2e/fixtures/security/README.md` — expanded per AC-7.
* `e2e/fixtures/tile-cache-builder/README.md` — expanded per AC-7.
* `e2e/fixtures/age-injector/README.md` — expanded per AC-7.
* `e2e/fixtures/cold-boot/README.md` — expanded; clarified that
AZ-407 owns the JSON file (the prior README incorrectly pointed
at AZ-419).
* `e2e/runner/reporting/csv_reporter.py` — PARTIAL propagation
hook (AZ-445 AC-4).
* `e2e/runner/conftest.py` — registered `nfr_recorder` plugin.
* `e2e/_unit_tests/test_directory_layout.py` — added the new
paths (10 new files); replaced the byte-equal passkey assertion
with a header-stripping comparison.
## Spec / module-layout drift notes
* **AZ-407 spec uses `tests/fixtures/...` paths**, but the
`blackbox_tests` cross-cutting entry in
`_docs/02_document/module-layout.md` (added in preparatory commit
`d7a17a8`) authoritatively places the e2e harness under `e2e/`.
Implementation followed the module-layout entry; the spec text is
pre-fix and was not updated. The AZ-407 archived spec retains its
`tests/fixtures` wording for audit, but the actual file ownership
is `e2e/fixtures/...`. No further action — the module-layout
entry is the source of truth.
* **AZ-444 spec mentions `e2e/tier2/run-tier2.sh`**, but the
AZ-406 scaffold placed Tier-2 scripts under `e2e/jetson/`.
Kept at `e2e/jetson/` for consistency with the AZ-406 commit;
no behavioural difference.
* **Cold-boot ownership**: AZ-419 spec line "Dependencies: AZ-406,
AZ-407 (cold-boot-fixture)" confirms AZ-407 owns the JSON; the
scaffold's old README incorrectly attributed ownership to AZ-419.
Fixed in this batch.
## Test Results
### Focused tests (Step 6.4)
`pytest e2e/_unit_tests/`**157 passed in 12.59s** (was 97 in
batch 67; +60 new tests across this batch).
Breakdown of new tests:
* AZ-407 fixtures (30 cases): tile-cache determinism (7), age-injector
shift+pixel-preserve (5), cold-boot schema (5), MAVLink passkey (3),
CVE JPEG generator (5), provenance READMEs (5).
* AZ-444 Tier scripts (15 cases): existence+exec bit (3), Tier-1
dry-run (1), Tier-2 dry-run local/remote (2), CLI rejection (4),
reflash gating (2), selector parity (3).
* AZ-445 NFR recorder (9 cases incl. 1 CSV-reporter PARTIAL guard).
No regressions in the 97 inherited AZ-406 tests.
No per-batch full-suite run per the implement skill's Test-Run Cadence
(Step 16 owns the only full-suite invocation).
## AC Test Coverage
### AZ-407
| AC | Test | Status |
|----|------|--------|
| AC-1 (deterministic) | `test_builder_is_deterministic` | Covered |
| AC-2 (footprint coverage) | `test_manifest_covers_60_stills_plus_bbox`, `test_real_tile_count_matches_paired_gmaps`, `test_manifest_schema_matches_restrictions_md` | Covered |
| AC-3 (aged dates) | `test_age_injector_shifts_capture_date[7-180]`, `[13-360]`, `test_age_injector_preserves_tile_bytes`, `test_age_injector_updates_sidecar_dates` | Covered |
| AC-4 (cold-boot SITL load) | `test_cold_boot_fixture_*`: JSON schema, Derkachi bbox membership, attitude bounds. **SITL load (±1 m EKF)** deferred to AZ-419 (Docker-bound, FT-P-11). | Covered by contract; full check is AZ-419 |
| AC-5 (mavlink passkey) | `test_passkey_has_comment_header`, `test_passkey_is_64_hex_chars`, `test_passkey_is_lowercase`, `test_passkey_files_match` | Covered |
| AC-6 (CVE JPEG no-crash) | `test_opencv_rejects_without_crash`, `test_jpeg_has_soi_and_truncated_sos`, `test_committed_fixture_matches_generator` | Covered |
| AC-7 (license + provenance) | `test_provenance_readme_lists_required_sections`, `test_age_injector_provenance_readme_exists`, `test_provenance_block_present`, `test_provenance_readme_exists` (CVE) | Covered |
### AZ-444
| AC | Test | Status |
|----|------|--------|
| AC-1 (selector parity) | `test_selector_parity_pytest_args_equivalent`, `test_selector_appears_in_dry_run[*]` | Covered |
| AC-2 (idempotent provisioning) | Static-shape verified in code review (dpkg-precondition guard); full check requires a Jetson host. **No unit test.** | NOT COVERED (hardware-loop) |
| AC-3 (systemd lifecycle) | Static-shape verified in code review (5×1s poll loop); full check requires a Jetson host. **No unit test.** | NOT COVERED (hardware-loop) |
| AC-4 (tegrastats parallel capture) | `test_required_path_exists[jetson/tegrastats_parser.py]` + AZ-406 parser unit tests; full pipe-capture path requires a Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-5 (ASan-fuzz) | `test_tier2_rejects_unknown_build_kind`; ASan unit `gps-denied-onboard-asan.service` is referenced by name in the delegate. Full check requires ASan-instrumented SUT on Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-6 (image-flash gating) | `test_reflash_refuses_without_ack`, `test_reflash_dry_run_with_ack_shows_flash_command` | Covered |
AC-2 and AC-3 are documented as hardware-loop ACs whose runtime
verification path is the on-Jetson smoke test. The scripts compile,
parse, and dry-run correctly; they cannot be authentically verified
without a Jetson because mocking `systemctl` and `apt-get` would
test the mock, not the real binding.
### AZ-445
| AC | Test | Status |
|----|------|--------|
| AC-1 (per-NFR JSON) | `test_emit_per_nfr_json_writes_one_file_per_scenario` + integration | Covered |
| AC-2 (traceability-status.json) | `test_emit_traceability_status_classifies_acs`, `test_emit_traceability_status_downgrades_on_fail`, `test_parse_traceability_matrix_*` | Covered |
| AC-3 (regression-baseline.json) | `test_emit_regression_baseline_dumps_numeric_metrics` + integration | Covered |
| AC-4 (PARTIAL propagation in CSV) | `test_build_row_pass_when_no_session_attribute`, integration test (`test_nfr_recorder_fixture_emits_artifacts_in_run`) | Covered |
## Code Review Verdict
Self-reviewed — PASS. Notable points:
* **Reproducibility** of the tile-cache builder relies on (a) sorted
input iteration, (b) frozen PIL JPEG params, (c) FAISS
single-thread + fixed seed (`omp_set_num_threads(1)` +
`np.random.default_rng` seeded from a SHA hash of the content
hash). Test verifies bit-identical output across two runs.
* **Pillow pin compatibility**: the local venv had Pillow 12.x via
torchvision; my initial `<12.0` pin downgraded it to 11.3. Widened
to `<13.0` so both major lines are accepted and the project's
inference extras stay happy.
* **`np.random.default_rng` vs `RandomState`**: first impl used
`RandomState.standard_normal(dim, dtype=np.float32)` which doesn't
accept `dtype` in older numpy; replaced with `default_rng`. The
builder now works on the project's `numpy>=1.26,<2.0` pin.
* **CSV PARTIAL propagation** is decoupled via the aggregator —
`_outcome_to_result` in `csv_reporter.py` imports `nfr_recorder`
lazily and falls back to PASS when the import fails. Keeps the
two plugins individually testable without a hard dependency.
* **Spec drift** flagged in this report's "Spec / module-layout
drift notes" section. No action needed; the module-layout entry
is the authoritative source.
## Auto-Fix Attempts
0. No code-review failures — auto-fix gate was not entered.
## Stuck Agents
None.
## Deferred follow-ups
* AZ-419 (FT-P-11) — owns SITL parameter-load verification of the
cold-boot fixture (AZ-407 AC-4 runtime path).
* AZ-439 (NFT-SEC-04) — owns the ASan-instrumented CVE-2025-53644
verification (AZ-407 AC-6's full PoC structure).
* AZ-444 hardware-loop ACs (AC-2/3/4/5) — owned by the Tier-2 smoke
test inside the runner image; will be re-verified on a Jetson
bring-up cycle.
## Next Batch
Batch 69 candidate set (all unblocked):
* AZ-408 (Runtime synthetic injection — 3pt) — outlier injector,
blackout-spoof injector, multi-segment injector (the fixtures
scaffolded by AZ-406 + AZ-407).
* AZ-410 (FT-P-01 — frame-center GPS accuracy — 5pt)
* AZ-411 (FT-P-02 — cumulative drift — 3pt)
Total: 11 cp across 3 tasks. AZ-408 unblocks the FT-N-* synthetic
scenarios; AZ-410 / AZ-411 are the first concrete positive scenarios
exercising the SUT through the full Docker-bound runner.