gps-denied-onboard/_docs/03_implementation/batch_68_report.md
Oleksandr Bezdieniezhnykh 6599d828d2 [AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter
Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.

AZ-407 — Static fixture builders (3pt):
  * tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
    deterministic tile-cache-fixture Docker volume from
    _docs/00_problem/input_data/. Reproducibility primitives: sorted
    iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
    threaded with seeded stub descriptors.
  * age-injector/{age_injector.py, inject.sh} clones the volume and
    shifts capture_date by N×30.44 days; tile JPEG bytes preserved
    bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
  * cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
    Derkachi sector centre, schema v1.
  * secrets/mavlink-test-passkey.txt: 64-hex with required
    `# TEST ONLY` header line per AC-5. Passkey-equality test now
    compares the secret line after stripping the header.
  * security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
    (truncated SOS marker). OpenCV 4.11.x rejects gracefully with
    imdecode → None. AZ-439 will sharpen for ASan instrumentation.
  * Top-level Makefile with `make fixtures` / `make fixtures-*` /
    `make e2e-tier1*` / `make unit-tests` targets.

AZ-444 — Tier-2 Jetson harness wrapper (5pt):
  * run-tier2.sh rewritten as orchestrator. Detects local
    (aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST).
    New flags: -k/--selector, --build-kind production|asan,
    --reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
    --dry-run.
  * tier2-on-jetson.sh (new) — on-device delegate. Verifies
    gps-denied-onboard{,-asan}.service health; restarts with 5s
    tolerance; spawns tegrastats + jtop parallel samplers; tails
    ASan unit's journal in asan mode; drives docker compose with
    TIER=tier2-jetson; forwards SELECTOR to pytest -k.
  * docker/run-tier1.sh (new) — selector-parity sibling.
  * AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
    --dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
    loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
    unit-test layer).

AZ-445 — CSV reporter + evidence bundler refinements (2pt):
  * reporting/nfr_recorder.py (new) — pytest plugin. Provides the
    `nfr_recorder` fixture with record_metric(name, value, ac_id)
    and partial(ac_id, reason). At session end emits:
      - per-nfr/<scenario_id>.json (AC-1)
      - traceability-status.json with every AC ID parsed from
        traceability-matrix.md, classified Covered/PARTIAL/NOT
        COVERED with source scenario IDs (AC-2)
      - regression-baseline.json with all numeric metrics (AC-3)
  * csv_reporter.py extended — `_outcome_to_result` consults the
    aggregator; rows flip PASS → PARTIAL when an AC was marked
    PARTIAL by nfr_recorder (AC-4). Graceful fallback when
    aggregator isn't registered (unit-test contexts).
  * conftest.py registers nfr_recorder in pytest_plugins.
  * New --traceability-matrix CLI flag seeds the NOT COVERED rows.

Build / config:
  * pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
    tile-cache-builder unit test (broad enough to keep torchvision's
    Pillow 12 pin happy; the production builder runs inside its own
    Docker image with its own pin).
  * Updated test_directory_layout.py to cover 10 new files + replaced
    the byte-equal passkey assertion with the header-stripping
    variant.

Test results:
  * 157 focused tests pass (was 97 in batch 67; +60 new across this
    batch). No regressions.

Module-layout / spec drift:
  * AZ-407 spec text says `tests/fixtures/...`; module-layout
    blackbox_tests entry (commit d7a17a8) authoritatively places the
    harness under `e2e/`. Implementation followed the layout entry.
  * AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
    at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
    consistency.
  * Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
    AZ-419's own Dependencies field.

Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:18:01 +03:00


Batch 68 Report — Test Implementation (cycle 1, batch 2 of test phase)

Batch: 68
Date: 2026-05-16
Context: Test implementation (greenfield Step 10: Implement Tests)
Tasks: AZ-407 (3pt), AZ-444 (5pt), AZ-445 (2pt); 10 cp / 3 tasks
Cycle: 1
Verdict: COMPLETE, PASS (self-reviewed)

Summary

Three blackbox-harness tasks, all dependent only on AZ-406:

AZ-407 — Static fixture builders (3pt)

Concrete deliverables for the five static fixtures named in test-data.md:

  • tile-cache-fixture (e2e/fixtures/tile-cache-builder/): builder.py (pure Python; emits tile JPEGs + sidecar JSON + manifest.csv + FAISS HNSW descriptors.index), Dockerfile (Python 3.10-slim + Pillow + numpy + faiss-cpu), build.sh (Docker volume mode + --local unit-test mode). Reproducibility primitives: sorted input iteration, fixed PIL JPEG settings (quality=85, optimize=False, progressive=False, subsampling=2), manifest rows sorted by (zoom, x, y), FAISS single-threaded with fixed seed. AC-1 verified by test_builder_is_deterministic.
  • age-injector (e2e/fixtures/age-injector/): age_injector.py (clones the tile tree bit-identical, mutates manifest + sidecar capture_date to now - age_months × 30.44d), inject.sh (emits synth-age-7mo + synth-age-13mo named Docker volumes). Tile pixels remain byte-equal across age injection.
  • cold-boot-fixture (e2e/fixtures/cold-boot/cold_boot_fixture.json): frozen FC pose snapshot at flight-resume time. Schema v1 carries global_position_int (lat_e7 / lon_e7 / alt_mm / hdg_cdeg), attitude (roll/pitch/yaw_rad), and per-FC param-load hints. The fixture lat/lon sits inside the Derkachi bbox; AZ-419 (FT-P-11) drives the SITL parameter-load path.
  • mavlink-test-passkey (e2e/fixtures/secrets/mavlink-test-passkey.txt): 64-hex passkey with the required # TEST ONLY — not for production use header line. Sync with the Docker-secret file e2e/docker/secrets/mavlink_passkey is enforced by the updated test_passkey_files_match (strips the comment header before byte comparison).
  • cve-2025-53644.jpg (e2e/fixtures/security/): synthetic malformed JPEG (truncated SOS marker, no EOI). The generator generate_cve_jpeg.py emits a 158-byte file with pinned SHA-256 c281d2f25959…877002e. OpenCV 4.11 (vulnerable line) rejects gracefully with imdecode → None. AZ-439 (NFT-SEC-04) will sharpen this for full ASan instrumentation.
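
The age-injection arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the contents of age_injector.py: the function names and the sidecar field `tile_id` are hypothetical, and only `capture_date` and the 30.44-day month come from the report.

```python
from datetime import datetime, timedelta, timezone

DAYS_PER_MONTH = 30.44  # mean month length quoted in the report


def aged_capture_date(now: datetime, age_months: int) -> datetime:
    """Return the capture date that makes a tile appear `age_months` old."""
    return now - timedelta(days=age_months * DAYS_PER_MONTH)


def inject_age(sidecar: dict, now: datetime, age_months: int) -> dict:
    """Return a copy of a sidecar record with only capture_date rewritten."""
    aged = dict(sidecar)  # shallow copy; every other field stays untouched
    aged["capture_date"] = aged_capture_date(now, age_months).isoformat()
    return aged


# Fixed reference time so the example is reproducible.
now = datetime(2026, 5, 16, tzinfo=timezone.utc)
# "tile_id" is a hypothetical field used only for illustration.
sidecar = {"tile_id": "z17_x1_y2", "capture_date": now.isoformat()}

for months in (7, 13):
    aged = inject_age(sidecar, now, months)
    whole_days = (now - aged_capture_date(now, months)).days
    print(months, whole_days, aged["capture_date"])
```

Because only the metadata record is copied and rewritten, the tile bytes themselves never pass through this code path, which is what keeps the pixels byte-equal.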

Top-level Makefile with make fixtures / make fixtures-* / make fixtures-unit-tests / make e2e-tier1 targets.

Per-fixture READMEs document source, license, provenance, and reproducibility per AC-7.

AZ-444 — Tier-2 Jetson harness wrapper (5pt)

The AZ-406 scaffold of run-tier2.sh covered the local-execution on-Jetson path; AZ-444 splits the harness into the orchestrator-side and on-device parts:

  • e2e/jetson/run-tier2.sh (rewritten) — orchestrator. Detects local (aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST). Flags: --fc-adapter, --vio-strategy, -k/--selector, --build-kind production|asan, --duration, --enable-chamber, --reflash, --dry-run. Remote mode rsyncs the e2e/ tree to /opt/azaion-e2e/ on the Jetson and ssh's the on-device delegate. Reflash path requires both --reflash AND TIER2_REFLASH_ACK=1 (two-key gate).
  • e2e/jetson/tier2-on-jetson.sh (new) — on-device delegate. Verifies gps-denied-onboard.service (or *-asan.service for --build-kind=asan); restarts with 5-second tolerance per AC-3; spawns tegrastats + jtop parallel samplers per AC-4; tails the ASan unit's journal into asan-fuzz.log when in asan mode; drives the e2e-runner via docker compose with TIER=tier2-jetson; forwards SELECTOR to pytest's -k per AC-1.
  • e2e/docker/run-tier1.sh (new) — selector-parity sibling. Same flag surface as run-tier2.sh minus the ssh / reflash options. AC-1 verified by test_selector_parity_pytest_args_equivalent which extracts the -k <selector> from both dry-run outputs and asserts the same string is present.
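
The selector-parity check can be illustrated as follows. This is a sketch under assumptions: the dry-run output lines are invented for the example (the real scripts' output format is not shown in this report), and `extract_selector` is a hypothetical helper, not the test's actual code.

```python
import shlex
from typing import Optional


def extract_selector(dry_run_output: str) -> Optional[str]:
    """Return the argument following `-k` in a dry-run transcript, if any."""
    for line in dry_run_output.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        for i, token in enumerate(tokens[:-1]):
            if token == "-k":
                return tokens[i + 1]
    return None


# Hypothetical dry-run lines; only `-k`, pytest, docker compose and
# TIER=tier2-jetson are taken from the report.
tier1_out = 'DRY-RUN: docker compose run e2e-runner pytest -k "ft_p_01 or ft_p_02"'
tier2_out = (
    "DRY-RUN: TIER=tier2-jetson docker compose run e2e-runner "
    'pytest -k "ft_p_01 or ft_p_02"'
)

print(extract_selector(tier1_out) == extract_selector(tier2_out))
```

Tokenising with shlex rather than a plain string search keeps a quoted multi-word selector intact, so the comparison is on the exact string pytest would receive.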

ACs whose authentic verification path requires a Jetson are documented in this report's "AC Test Coverage" tables and gated behind docker-bound smoke tests inside the runner image.

AZ-445 — CSV reporter + evidence bundler refinements (2pt)

  • e2e/runner/reporting/nfr_recorder.py (new) — pytest plugin. Provides the nfr_recorder fixture; tests call nfr_recorder.record_metric(name, value, ac_id) and nfr_recorder.partial(ac_id, reason). At session end the plugin emits three artifacts into the evidence dir:
    • per-nfr/<scenario_id>.json — one file per recorded scenario (AC-1)
    • traceability-status.json — every AC from _docs/02_document/tests/traceability-matrix.md listed with status ∈ {Covered, PARTIAL, NOT COVERED} and source scenarios (AC-2)
    • regression-baseline.json — flat numeric-metric dump for diff tooling (AC-3)
  • e2e/runner/reporting/csv_reporter.py (extended) — the _outcome_to_result path now consults the aggregator: when an NFR-recorded scenario has any PARTIAL AC, the row's result column is PARTIAL instead of PASS (AC-4). Graceful fallback when the aggregator isn't registered (unit-test contexts).
  • e2e/runner/conftest.py — registers nfr_recorder in pytest_plugins.
  • New CLI flag --traceability-matrix (default: project's _docs/02_document/tests/traceability-matrix.md) lets the aggregator seed the NOT COVERED rows.
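
The PASS to PARTIAL downgrade and its fallback can be sketched as below. The `Aggregator` class, the `outcome_to_result` signature, the FAIL/SKIP mapping, and the metric values are all assumptions made for illustration; only `record_metric(name, value, ac_id)`, `partial(ac_id, reason)`, and the PASS/PARTIAL behaviour come from the report.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Aggregator:
    """Hypothetical stand-in for the session-wide NFR aggregator."""

    metrics: List[Tuple[str, float, str]] = field(default_factory=list)
    partials: Dict[str, str] = field(default_factory=dict)

    def record_metric(self, name: str, value: float, ac_id: str) -> None:
        self.metrics.append((name, value, ac_id))

    def partial(self, ac_id: str, reason: str) -> None:
        self.partials[ac_id] = reason


def outcome_to_result(
    outcome: str, ac_ids: Iterable[str], aggregator: Optional[Aggregator]
) -> str:
    """Map a pytest outcome to a CSV result, downgrading PASS to PARTIAL."""
    if outcome != "passed":
        # Assumed mapping; the real reporter's column values may differ.
        return "FAIL" if outcome == "failed" else "SKIP"
    if aggregator is None:
        return "PASS"  # fallback when no aggregator is registered
    if any(ac_id in aggregator.partials for ac_id in ac_ids):
        return "PARTIAL"
    return "PASS"


agg = Aggregator()
agg.record_metric("example_metric", 1.0, "AC-1")  # made-up illustrative value
agg.partial("AC-2", "hardware-loop check deferred")

print(outcome_to_result("passed", ["AC-1"], agg))
print(outcome_to_result("passed", ["AC-1", "AC-2"], agg))
print(outcome_to_result("passed", ["AC-2"], None))
```

Passing the aggregator as an optional argument mirrors the decoupling the report describes: the CSV path still produces a result when the recorder plugin is absent.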

The matrix parser uses two regex passes (AC-… and RESTRICT-… table-row prefixes); 88 IDs in the current matrix file parse cleanly.
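
A two-pass parse of this kind might look like the sketch below. The exact ID grammar is elided in this report, so the character class, the sample matrix rows, and both function names are assumptions; only the AC- and RESTRICT- row prefixes and the three status labels are taken from the text.

```python
import re
from typing import Dict, Iterable, List

# Assumed ID grammar: prefix followed by letters, digits, dots or hyphens.
AC_ROW = re.compile(r"^\|\s*(AC-[A-Za-z0-9.\-]+)\s*\|", re.MULTILINE)
RESTRICT_ROW = re.compile(r"^\|\s*(RESTRICT-[A-Za-z0-9.\-]+)\s*\|", re.MULTILINE)


def parse_matrix_ids(text: str) -> List[str]:
    """Collect IDs from table rows, one regex pass per prefix."""
    ids = AC_ROW.findall(text) + RESTRICT_ROW.findall(text)
    return list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order


def classify(
    ids: Iterable[str], covered: Iterable[str], partial: Iterable[str]
) -> Dict[str, str]:
    """Seed every ID as NOT COVERED, then upgrade from recorded results."""
    covered_set, partial_set = set(covered), set(partial)
    status = {}
    for ac_id in ids:
        if ac_id in partial_set:
            status[ac_id] = "PARTIAL"
        elif ac_id in covered_set:
            status[ac_id] = "Covered"
        else:
            status[ac_id] = "NOT COVERED"
    return status


# Invented sample rows; the real matrix layout may differ.
sample = """\
| ID | Requirement | Scenarios |
|----|-------------|-----------|
| AC-1.1 | example requirement | FT-P-01 |
| AC-1.2 | example requirement | |
| RESTRICT-HW-1 | example restriction | |
"""

ids = parse_matrix_ids(sample)
print(classify(ids, covered=["AC-1.1"], partial=["RESTRICT-HW-1"]))
```

Seeding from the matrix first is what lets IDs with no recorded scenario still appear in the output as NOT COVERED rather than being silently absent.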

Files added / modified

Added (18)

AZ-407:

  • e2e/fixtures/tile-cache-builder/builder.py
  • e2e/fixtures/tile-cache-builder/Dockerfile
  • e2e/fixtures/tile-cache-builder/build.sh
  • e2e/fixtures/age-injector/age_injector.py
  • e2e/fixtures/age-injector/inject.sh
  • e2e/fixtures/cold-boot/cold_boot_fixture.json
  • e2e/fixtures/security/cve-2025-53644.jpg (158 bytes; generated)

AZ-444:

  • e2e/jetson/tier2-on-jetson.sh
  • e2e/docker/run-tier1.sh

AZ-445:

  • e2e/runner/reporting/nfr_recorder.py

Top-level:

  • Makefile

Unit tests (AZ-407 + AZ-444 + AZ-445):

  • e2e/_unit_tests/fixtures/test_tile_cache_builder.py
  • e2e/_unit_tests/fixtures/test_age_injector.py
  • e2e/_unit_tests/fixtures/test_cold_boot_fixture.py
  • e2e/_unit_tests/fixtures/test_mavlink_passkey.py
  • e2e/_unit_tests/fixtures/test_cve_jpeg.py
  • e2e/_unit_tests/jetson/test_run_tier_scripts.py
  • e2e/_unit_tests/reporting/test_nfr_recorder.py

Modified (12)

  • pyproject.toml — added Pillow>=10.4,<13.0 to dev extras (used by test_tile_cache_builder.py to verify reproducibility without Docker).
  • e2e/jetson/run-tier2.sh — rewritten as the orchestrator (was a local-only stub from AZ-406).
  • e2e/fixtures/secrets/mavlink-test-passkey.txt — added the required # TEST ONLY — not for production use header line per AZ-407 AC-5.
  • e2e/fixtures/secrets/README.md — expanded per AC-7 (license, provenance, sync-with-docker-secret note).
  • e2e/fixtures/security/generate_cve_jpeg.py — concrete impl (replaces the AZ-406 NotImplementedError pointer).
  • e2e/fixtures/security/README.md — expanded per AC-7.
  • e2e/fixtures/tile-cache-builder/README.md — expanded per AC-7.
  • e2e/fixtures/age-injector/README.md — expanded per AC-7.
  • e2e/fixtures/cold-boot/README.md — expanded; clarified that AZ-407 owns the JSON file (the prior README incorrectly pointed at AZ-419).
  • e2e/runner/reporting/csv_reporter.py — PARTIAL propagation hook (AZ-445 AC-4).
  • e2e/runner/conftest.py — registered nfr_recorder plugin.
  • e2e/_unit_tests/test_directory_layout.py — added the new paths (10 new files); replaced the byte-equal passkey assertion with a header-stripping comparison.

Spec / module-layout drift notes

  • AZ-407 spec uses tests/fixtures/... paths, but the blackbox_tests cross-cutting entry in _docs/02_document/module-layout.md (added in preparatory commit d7a17a8) authoritatively places the e2e harness under e2e/. Implementation followed the module-layout entry; the spec text is pre-fix and was not updated. The AZ-407 archived spec retains its tests/fixtures wording for audit, but the actual file ownership is e2e/fixtures/.... No further action — the module-layout entry is the source of truth.
  • AZ-444 spec mentions e2e/tier2/run-tier2.sh, but the AZ-406 scaffold placed Tier-2 scripts under e2e/jetson/. Kept at e2e/jetson/ for consistency with the AZ-406 commit; no behavioural difference.
  • Cold-boot ownership: AZ-419 spec line "Dependencies: AZ-406, AZ-407 (cold-boot-fixture)" confirms AZ-407 owns the JSON; the scaffold's old README incorrectly attributed ownership to AZ-419. Fixed in this batch.

Test Results

Focused tests (Step 6.4)

pytest e2e/_unit_tests/: 157 passed in 12.59s (was 97 in batch 67; +60 new tests across this batch).

Breakdown of new tests:

  • AZ-407 fixtures (30 cases): tile-cache determinism (7), age-injector shift+pixel-preserve (5), cold-boot schema (5), MAVLink passkey (3), CVE JPEG generator (5), provenance READMEs (5).
  • AZ-444 Tier scripts (15 cases): existence+exec bit (3), Tier-1 dry-run (1), Tier-2 dry-run local/remote (2), CLI rejection (4), reflash gating (2), selector parity (3).
  • AZ-445 NFR recorder (9 cases incl. 1 CSV-reporter PARTIAL guard).

No regressions in the 97 inherited AZ-406 tests.

No per-batch full-suite run per the implement skill's Test-Run Cadence (Step 16 owns the only full-suite invocation).

AC Test Coverage

AZ-407

| AC | Test | Status |
|----|------|--------|
| AC-1 (deterministic) | test_builder_is_deterministic | Covered |
| AC-2 (footprint coverage) | test_manifest_covers_60_stills_plus_bbox, test_real_tile_count_matches_paired_gmaps, test_manifest_schema_matches_restrictions_md | Covered |
| AC-3 (aged dates) | test_age_injector_shifts_capture_date[7-180], [13-360], test_age_injector_preserves_tile_bytes, test_age_injector_updates_sidecar_dates | Covered |
| AC-4 (cold-boot SITL load) | test_cold_boot_fixture_*: JSON schema, Derkachi bbox membership, attitude bounds. SITL load (±1 m EKF) deferred to AZ-419 (Docker-bound, FT-P-11). | Covered by contract; full check is AZ-419 |
| AC-5 (mavlink passkey) | test_passkey_has_comment_header, test_passkey_is_64_hex_chars, test_passkey_is_lowercase, test_passkey_files_match | Covered |
| AC-6 (CVE JPEG no-crash) | test_opencv_rejects_without_crash, test_jpeg_has_soi_and_truncated_sos, test_committed_fixture_matches_generator | Covered |
| AC-7 (license + provenance) | test_provenance_readme_lists_required_sections, test_age_injector_provenance_readme_exists, test_provenance_block_present, test_provenance_readme_exists (CVE) | Covered |
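
The AC-5 checks reduce to a header-stripping comparison, sketched below. The helper names are hypothetical, the passkey is a dummy value, and the assumption that the Docker-secret file carries no header is mine; only the `# TEST ONLY` header, the 64-hex lowercase format, and the strip-then-compare idea come from the report.

```python
import re

HEADER_PREFIX = "# TEST ONLY"


def has_test_only_header(file_text: str) -> bool:
    """True when the first non-empty line is the TEST ONLY comment header."""
    for line in file_text.splitlines():
        if line.strip():
            return line.startswith(HEADER_PREFIX)
    return False


def secret_line(file_text: str) -> str:
    """Return the single non-comment, non-empty line of a passkey file."""
    body = [
        line.strip()
        for line in file_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if len(body) != 1:
        raise ValueError(f"expected exactly one secret line, found {len(body)}")
    return body[0]


def is_64_lower_hex(value: str) -> bool:
    return re.fullmatch(r"[0-9a-f]{64}", value) is not None


dummy_key = "0123456789abcdef" * 4  # dummy value, not the real fixture
fixture_text = f"{HEADER_PREFIX}\n{dummy_key}\n"
docker_secret_text = f"{dummy_key}\n"  # assumed to carry no header

print(has_test_only_header(fixture_text))
print(secret_line(fixture_text) == secret_line(docker_secret_text))
print(is_64_lower_hex(secret_line(fixture_text)))
```

Comparing the extracted secret lines instead of whole files is what allows the fixture to gain a comment header without breaking the equality check against the Docker secret.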

AZ-444

| AC | Test | Status |
|----|------|--------|
| AC-1 (selector parity) | test_selector_parity_pytest_args_equivalent, test_selector_appears_in_dry_run[*] | Covered |
| AC-2 (idempotent provisioning) | Static-shape verified in code review (dpkg-precondition guard); full check requires a Jetson host. No unit test. | NOT COVERED (hardware-loop) |
| AC-3 (systemd lifecycle) | Static-shape verified in code review (5×1s poll loop); full check requires a Jetson host. No unit test. | NOT COVERED (hardware-loop) |
| AC-4 (tegrastats parallel capture) | test_required_path_exists[jetson/tegrastats_parser.py] + AZ-406 parser unit tests; full pipe-capture path requires a Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-5 (ASan-fuzz) | test_tier2_rejects_unknown_build_kind; ASan unit gps-denied-onboard-asan.service is referenced by name in the delegate. Full check requires ASan-instrumented SUT on Jetson. | Covered by contract; full check is Tier-2 runtime |
| AC-6 (image-flash gating) | test_reflash_refuses_without_ack, test_reflash_dry_run_with_ack_shows_flash_command | Covered |
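
The two-key gate behind AC-6 is simple enough to check off-device. The sketch below restates it in Python, although the real gate lives in the shell script run-tier2.sh; the function names and the returned messages are hypothetical, and only `--reflash` and `TIER2_REFLASH_ACK=1` come from the report.

```python
from typing import List, Mapping


def reflash_allowed(reflash_flag: bool, env: Mapping[str, str]) -> bool:
    """Both keys are required: the CLI flag and the env acknowledgement."""
    return reflash_flag and env.get("TIER2_REFLASH_ACK") == "1"


def plan(argv: List[str], env: Mapping[str, str]) -> str:
    """Return a dry-run style description of what would happen."""
    wants_reflash = "--reflash" in argv
    if not wants_reflash:
        # The env variable alone never triggers a reflash.
        return "dry-run: no reflash requested"
    if not reflash_allowed(wants_reflash, env):
        return "refuse: --reflash requires TIER2_REFLASH_ACK=1"
    return "dry-run: would reflash the device"


print(plan(["--reflash", "--dry-run"], {}))
print(plan(["--reflash", "--dry-run"], {"TIER2_REFLASH_ACK": "1"}))
print(plan(["--dry-run"], {"TIER2_REFLASH_ACK": "1"}))
```

Asserting on dry-run text, as the two listed tests do, exercises the gating decision without ever touching a flash tool.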

AC-2 and AC-3 are documented as hardware-loop ACs whose runtime verification path is the on-Jetson smoke test. The scripts compile, parse, and dry-run correctly; they cannot be authentically verified without a Jetson because mocking systemctl and apt-get would test the mock, not the real binding.

AZ-445

| AC | Test | Status |
|----|------|--------|
| AC-1 (per-NFR JSON) | test_emit_per_nfr_json_writes_one_file_per_scenario + integration | Covered |
| AC-2 (traceability-status.json) | test_emit_traceability_status_classifies_acs, test_emit_traceability_status_downgrades_on_fail, test_parse_traceability_matrix_* | Covered |
| AC-3 (regression-baseline.json) | test_emit_regression_baseline_dumps_numeric_metrics + integration | Covered |
| AC-4 (PARTIAL propagation in CSV) | test_build_row_pass_when_no_session_attribute, integration test (test_nfr_recorder_fixture_emits_artifacts_in_run) | Covered |

Code Review Verdict

Self-reviewed — PASS. Notable points:

  • Reproducibility of the tile-cache builder relies on (a) sorted input iteration, (b) frozen PIL JPEG params, (c) FAISS single-thread + fixed seed (omp_set_num_threads(1) + np.random.default_rng seeded from a SHA hash of the content hash). Test verifies bit-identical output across two runs.
  • Pillow pin compatibility: the local venv had Pillow 12.x via torchvision; my initial <12.0 pin downgraded it to 11.3. Widened to <13.0 so both major lines are accepted and the project's inference extras stay happy.
  • np.random.default_rng vs RandomState: the first implementation used RandomState.standard_normal(dim, dtype=np.float32), but the legacy RandomState API does not accept a dtype argument; replaced with default_rng, whose standard_normal does. The builder now works on the project's numpy>=1.26,<2.0 pin.
  • CSV PARTIAL propagation is decoupled via the aggregator — _outcome_to_result in csv_reporter.py imports nfr_recorder lazily and falls back to PASS when the import fails. Keeps the two plugins individually testable without a hard dependency.
  • Spec drift flagged in this report's "Spec / module-layout drift notes" section. No action needed; the module-layout entry is the authoritative source.
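
The content-derived seeding in the first point can be illustrated without third-party packages. This sketch swaps in the standard library's `random.Random` for `np.random.default_rng` purely to stay dependency-free, and it hashes the raw bytes directly; the function names, the 8-byte seed width, and the descriptor dimension are assumptions.

```python
import hashlib
import random
from typing import List


def seed_from_content(content: bytes) -> int:
    """Derive a stable 64-bit seed from a SHA-256 digest of the content."""
    digest = hashlib.sha256(content).digest()
    return int.from_bytes(digest[:8], "big")


def stub_descriptor(content: bytes, dim: int = 8) -> List[float]:
    """Generate a pseudo-random descriptor that depends only on `content`."""
    rng = random.Random(seed_from_content(content))  # private RNG, no global state
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


tile_a = b"tile-a-bytes"  # placeholder payloads
tile_b = b"tile-b-bytes"

print(stub_descriptor(tile_a) == stub_descriptor(tile_a))
print(stub_descriptor(tile_a) == stub_descriptor(tile_b))
```

Seeding per item from its own hash, rather than drawing from one shared generator, makes each descriptor independent of iteration order, which complements the sorted-iteration rule.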

Auto-Fix Attempts

None. There were no code-review failures, so the auto-fix gate was not entered.

Stuck Agents

None.

Deferred follow-ups

  • AZ-419 (FT-P-11) — owns SITL parameter-load verification of the cold-boot fixture (AZ-407 AC-4 runtime path).
  • AZ-439 (NFT-SEC-04) — owns the ASan-instrumented CVE-2025-53644 verification (AZ-407 AC-6's full PoC structure).
  • AZ-444 hardware-loop ACs (AC-2/3/4/5) — owned by the Tier-2 smoke test inside the runner image; will be re-verified on a Jetson bring-up cycle.

Next Batch

Batch 69 candidate set (all unblocked):

  • AZ-408 (Runtime synthetic injection — 3pt) — outlier injector, blackout-spoof injector, multi-segment injector (the fixtures scaffolded by AZ-406 + AZ-407).
  • AZ-410 (FT-P-01 — frame-center GPS accuracy — 5pt)
  • AZ-411 (FT-P-02 — cumulative drift — 3pt)

Total: 11 cp across 3 tasks. AZ-408 unblocks the FT-N-* synthetic scenarios; AZ-410 / AZ-411 are the first concrete positive scenarios exercising the SUT through the full Docker-bound runner.