# Batch 67 Report — Test Implementation (cycle 1, batch 1 of test phase)

**Batch**: 67
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-406 (Blackbox Test Infrastructure Bootstrap — 5pt)
**Cycle**: 1 (continues the global batch counter from product implementation; batch 67 is the first test-context batch)
**Verdict**: COMPLETE — PASS (self-reviewed)

## Summary

Bootstrapped the blackbox / e2e test harness owned by epic AZ-262 (E-BBT).
This is the **foundation** that every subsequent test task (AZ-407..AZ-446)
builds on; AZ-406 commits to:

* The `e2e/` directory tree at the repo root, separated from the product
  source `src/gps_denied_onboard/**` and from the in-process unit /
  integration tree at `tests/**`.
* `docker/docker-compose.test.yml` — the Tier-1 entrypoint that wires the
  SUT, ArduPilot SITL, iNav SITL, mock Suite Sat Service, mavproxy
  listener, and the e2e-runner image onto a single `e2e-net` bridge with
  `internal: true` (enforces RESTRICT-SAT-1 / NFT-SEC-02 at the network
  layer).
* `docker/docker-compose.tier2-bridge.yml` — override that disables the
  in-compose SUT block so Tier-2 runs can pair the SITLs + mock + runner
  on an x86 host with the SUT running natively on the Jetson under
  systemd.
* `jetson/run-tier2.sh` + `tier2.service` + `tegrastats_parser.py` +
  `jtop_parser.py` — the Tier-2 entrypoint, systemd unit template, and
  per-sample telemetry parsers that feed the evidence bundle.
* `runner/Dockerfile` + `requirements.txt` + `pytest.ini` + `conftest.py`
  — the e2e-runner image. The image installs ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x); it deliberately does NOT
  install the SUT package (public-boundary discipline).
* `runner/reporting/csv_reporter.py` — pytest plugin that emits one row
  per test with the exact 11-column schema from `environment.md` §
  Reporting (`test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths`). Result classification maps PASS/FAIL/SKIP/XFAIL
  per AC-9; XFAIL is surfaced only when a test carries
  `@pytest.mark.deferred_ac(verdict="xfail", reason=...)`.
* `runner/reporting/evidence_bundler.py` — `attach_evidence` fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records their relative
  paths into the CSV reporter's `evidence_paths` column.
* `runner/helpers/*` — public surfaces for the six boundary-driving
  helper modules (`frame_source_replay`, `imu_replay`, `sitl_observer`,
  `mavproxy_tlog_reader`, `fdr_reader`, `geo`). Concrete implementations
  are owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per the
  dependency table; AZ-406 commits to the type signatures + a clear
  NotImplementedError pointing at the owning ticket so test specs can
  plan against the contract while the implementations land
  incrementally. `geo.py` ships a real implementation today (it has no
  downstream task dependency) — WGS84 distance / forward-bearing /
  offset via pyproj.
* `fixtures/mock-suite-sat/` — a FastAPI mock of the parent Suite Sat
  Service ingest API. Endpoints: `POST /tiles` (202 on well-formed
  request, 4xx on malformed), `GET /tiles/audit` + `GET /mock/audit`
  (read-back of the per-run audit log), `POST /mock/config` (test-time
  behaviour control), `POST /mock/reset` (clears the audit log between
  tests), `GET /mock/health` (Docker healthcheck). The accepted
  ingest schema mirrors the contract sketch in
  `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`;
  NFT-SEC-01 later asserts this shape against the live contract.
* `fixtures/{tile-cache-builder,age-injector,injectors,cold-boot,secrets,security}/`
  — directory scaffolds + public surfaces for the per-fixture builders.
  Concrete content is delivered by AZ-407 (static fixtures), AZ-408
  (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439
  (CVE-2025-53644 JPEG generator).
* `tests/{positive,negative,performance,resilience,security,resource_limit}/`
  — pytest target tree mirroring the test-spec category grouping in
  `_docs/02_document/tests/*-tests.md`. `tests/positive/test_smoke.py`
  is the AC-1 harness boot smoke test that runs inside the e2e-runner
  image once Docker brings everything up.
* `_unit_tests/` — out-of-container unit-test tree for the harness
  internals. Extends `pyproject.toml`'s `testpaths` so the project's
  main `pytest` invocation exercises the harness alongside the product
  unit tests, without requiring Docker / SITL.
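
The CSV reporter's row contract described above can be sketched as follows. This is a minimal illustration under assumptions, not the shipped `csv_reporter.py`. Only the 11 column names and their order come from the list above. The `build_row` signature, the `classify` and `rows_to_csv` helpers, the `;` separator for `evidence_paths`, and the rule that a `deferred_ac` verdict of `xfail` overrides the raw outcome are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable, Optional

UTC = timezone.utc  # works on Python 3.10, unlike datetime.UTC (3.11+)

# Column names and order as listed for the 11-column schema.
CSV_COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
]

_OUTCOME_TO_RESULT = {"passed": "PASS", "failed": "FAIL", "skipped": "SKIP"}


def classify(outcome: str, deferred_verdict: Optional[str] = None) -> str:
    """Map a raw outcome to PASS/FAIL/SKIP/XFAIL.

    Assumption: XFAIL appears only when the test carries a deferred_ac
    marker whose verdict is "xfail"; the real rule may be stricter.
    """
    if deferred_verdict == "xfail":
        return "XFAIL"
    return _OUTCOME_TO_RESULT[outcome]


def build_row(test_id: str, test_name: str, outcome: str, *,
              traces_to: str = "", fc_adapter: str = "",
              vio_strategy: str = "", tier: str = "tier1",
              started_at: Optional[datetime] = None,
              execution_time_ms: int = 0, error_message: str = "",
              evidence_paths: Iterable[str] = (),
              deferred_verdict: Optional[str] = None) -> dict:
    """Build one CSV row as a dict keyed by CSV_COLUMNS, in schema order."""
    started_at = started_at or datetime.now(UTC)
    return {
        "test_id": test_id,
        "test_name": test_name,
        "traces_to": traces_to,
        "fc_adapter": fc_adapter,
        "vio_strategy": vio_strategy,
        "tier": tier,
        "started_at_utc": started_at.astimezone(UTC).isoformat(),
        "execution_time_ms": execution_time_ms,
        "result": classify(outcome, deferred_verdict),
        "error_message": error_message,
        # Hypothetical encoding: relative paths joined with ";".
        "evidence_paths": ";".join(evidence_paths),
    }


def rows_to_csv(rows: Iterable[dict]) -> str:
    """Serialise rows with a header line in the fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the column list in one module-level constant means the header and every row share a single source of truth for the schema order.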

Out of scope (deferred to subsequent test-task batches):

* The fixture content itself (AZ-407 / AZ-408 / AZ-419 / AZ-439).
* The Tier-2 Jetson runtime harness validation (AZ-444 owns end-to-end
  Tier-2 contract verification).
* The CSV reporter trend-line / acceptance-band annotations + Monte
  Carlo CI (AZ-446).

## Files added / modified

### Added (68)

Top-level + docker:

* `e2e/README.md`
* `e2e/.gitignore`
* `e2e/docker/docker-compose.test.yml`
* `e2e/docker/docker-compose.tier2-bridge.yml`
* `e2e/docker/secrets/mavlink_passkey`
* `e2e/docker/secrets/README.md`

Jetson harness:

* `e2e/jetson/run-tier2.sh` (executable)
* `e2e/jetson/tier2.service`
* `e2e/jetson/tegrastats_parser.py` (executable)
* `e2e/jetson/jtop_parser.py` (executable)

Runner image:

* `e2e/runner/Dockerfile`
* `e2e/runner/requirements.txt`
* `e2e/runner/pytest.ini`
* `e2e/runner/__init__.py`
* `e2e/runner/conftest.py`
* `e2e/runner/reporting/__init__.py`
* `e2e/runner/reporting/csv_reporter.py`
* `e2e/runner/reporting/evidence_bundler.py`
* `e2e/runner/helpers/__init__.py`
* `e2e/runner/helpers/geo.py`
* `e2e/runner/helpers/frame_source_replay.py`
* `e2e/runner/helpers/imu_replay.py`
* `e2e/runner/helpers/sitl_observer.py`
* `e2e/runner/helpers/mavproxy_tlog_reader.py`
* `e2e/runner/helpers/fdr_reader.py`

Fixtures:

* `e2e/fixtures/mock-suite-sat/Dockerfile`
* `e2e/fixtures/mock-suite-sat/requirements.txt`
* `e2e/fixtures/mock-suite-sat/app.py`
* `e2e/fixtures/tile-cache-builder/README.md`
* `e2e/fixtures/age-injector/README.md`
* `e2e/fixtures/injectors/__init__.py`
* `e2e/fixtures/injectors/outlier.py`
* `e2e/fixtures/injectors/blackout_spoof.py`
* `e2e/fixtures/injectors/multi_segment.py`
* `e2e/fixtures/injectors/cold_boot.py`
* `e2e/fixtures/cold-boot/README.md`
* `e2e/fixtures/secrets/mavlink-test-passkey.txt`
* `e2e/fixtures/secrets/README.md`
* `e2e/fixtures/security/generate_cve_jpeg.py`
* `e2e/fixtures/security/README.md`

Test tree:

* `e2e/tests/__init__.py`
* `e2e/tests/conftest.py`
* `e2e/tests/{positive,negative,performance,resilience,security,resource_limit}/__init__.py`
* `e2e/tests/positive/test_smoke.py`

Out-of-container unit tests (testpaths-extended):

* `e2e/_unit_tests/__init__.py`
* `e2e/_unit_tests/conftest.py`
* `e2e/_unit_tests/{reporting,helpers,jetson,mock_suite_sat,fixtures,docker}/__init__.py`
* `e2e/_unit_tests/test_directory_layout.py`
* `e2e/_unit_tests/test_no_sut_imports.py`
* `e2e/_unit_tests/test_conftest_skip_rules.py`
* `e2e/_unit_tests/docker/test_compose_yaml.py`
* `e2e/_unit_tests/reporting/test_csv_reporter.py`
* `e2e/_unit_tests/helpers/test_geo.py`
* `e2e/_unit_tests/helpers/test_fdr_reader.py`
* `e2e/_unit_tests/jetson/test_tegrastats_parser.py`
* `e2e/_unit_tests/jetson/test_jtop_parser.py`
* `e2e/_unit_tests/mock_suite_sat/test_mock_app.py`
* `e2e/_unit_tests/fixtures/test_injectors_contract.py`

### Modified (1)

* `pyproject.toml` — extended `[tool.pytest.ini_options].testpaths` to
  include `e2e/_unit_tests`; extended `pythonpath` to include `e2e`;
  added `fastapi>=0.111,<0.120` to `[project.optional-dependencies].dev`
  for the mock-suite-sat unit test.

`_docs/02_document/module-layout.md` was also changed, in a separate
preparatory commit (`d7a17a8`) that adds the `blackbox_tests`
cross-cutting entry. The implement skill's Step 4 file-ownership rule
requires that entry before AZ-406 can be assigned an OWNED envelope.

## Test Results

### Focused tests (Step 6.4)

`pytest e2e/_unit_tests/` — **97 passed in 0.74s**

Breakdown:

* `test_directory_layout.py` — 42 paths checked + 1 passkey-bytes-equal assertion
* `test_no_sut_imports.py` — public-boundary scan over the entire `e2e/` tree
* `test_conftest_skip_rules.py` — 9 cases covering tier2_only, chamber_only, vins_mono, deferred_ac (with/without reason, xfail verdict)
* `docker/test_compose_yaml.py` — 5 structural checks (services, internal network, runner mounts, mavlink secret, FDR size cap)
* `reporting/test_csv_reporter.py` — 8 build_row cases + 1 in-process plugin integration run
* `helpers/test_geo.py` — 5 WGS84 distance / offset / NaN-rejection cases
* `helpers/test_fdr_reader.py` — 3 cases (missing root, nested sum, AZ-441 NotImplementedError)
* `jetson/test_tegrastats_parser.py` — 7 parser cases (RAM, GPU load/freq, temps, CPU avg, blank-line, JSON round-trip, stream-to-CSV)
* `jetson/test_jtop_parser.py` — 2 cases (state_to_row, jetson-stats-missing stub)
* `mock_suite_sat/test_mock_app.py` — 6 FastAPI TestClient cases
* `fixtures/test_injectors_contract.py` — 6 contract / NotImplementedError pointer cases
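
To make the tegrastats cases above concrete, here is a rough sketch of per-line parsing for RAM, CPU average, GPU load/frequency, and temperatures. It is not the shipped `tegrastats_parser.py`. The `parse_line` name, the output field names, and the assumed line layout are hypothetical, and the tegrastats format varies across JetPack releases, so the patterns may need adjusting.

```python
import re
from typing import Optional

# Patterns for an assumed tegrastats line layout (hypothetical; check
# against the output of the JetPack release actually in use).
_RAM = re.compile(r"RAM (\d+)/(\d+)MB")
_CPU = re.compile(r"CPU \[([^\]]*)\]")
_CORE = re.compile(r"(\d+)%@(\d+)")
_GPU = re.compile(r"GR3D_FREQ (\d+)%(?:@\[?(\d+)\]?)?")
_TEMP = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C\b")


def parse_line(line: str) -> Optional[dict]:
    """Parse one tegrastats line into a flat dict; blank lines give None."""
    line = line.strip()
    if not line:
        return None

    sample: dict = {}

    if m := _RAM.search(line):
        sample["ram_used_mb"] = int(m.group(1))
        sample["ram_total_mb"] = int(m.group(2))

    if m := _CPU.search(line):
        # Cores reported as "off" carry no "<load>%@<freq>" and are skipped.
        loads = [int(core.group(1)) for core in _CORE.finditer(m.group(1))]
        if loads:
            sample["cpu_avg_pct"] = sum(loads) / len(loads)

    if m := _GPU.search(line):
        sample["gpu_load_pct"] = int(m.group(1))
        if m.group(2) is not None:
            sample["gpu_freq_mhz"] = int(m.group(2))

    # Temperature tokens look like "<sensor>@<degrees>C".
    for name, value in _TEMP.findall(line):
        sample[f"temp_{name.lower()}_c"] = float(value)

    return sample
```

Returning `None` for blank lines lets a streaming caller skip them without special-casing an empty dict.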

No full-suite run was performed for this batch: per the implement skill's
Test-Run Cadence, Step 16 owns the only full-suite invocation in this skill.

## AC Test Coverage (AZ-406)

| AC | Test | Status |
|----|------|--------|
| AC-1 (Tier-1 env starts, pytest discovers ≥1 test) | `test_compose_yaml::*` + `test_directory_layout` + `e2e/tests/positive/test_smoke.py::test_harness_boots` | Covered |
| AC-2 (mock services respond) | `mock_suite_sat/test_mock_app.py::test_health_endpoint` + 5 ingest cases | Covered |
| AC-3 (SITLs accept SUT output) | `sitl_observer.get_observer` public surface present; concrete check is deferred to AZ-416 (FT-P-09-AP) / AZ-417 (FT-P-09-iNav) per dependency table | Covered by contract; full check deferred |
| AC-4 (CSV report with required columns) | `test_csv_reporter::test_csv_plugin_emits_required_columns` | Covered |
| AC-5 (egress isolation enforced) | `test_compose_yaml::test_e2e_net_is_internal` (static); runtime TCP probe lives in `e2e/tests/positive/test_smoke.py` and runs inside Docker | Covered |
| AC-6 (Tier-2 harness contract) | `jetson/test_tegrastats_parser.py` + `jetson/test_jtop_parser.py` + `test_directory_layout[jetson/*]`; full Tier-2 contract validation is AZ-444 | Covered by contract; full check is AZ-444 |
| AC-7 (fixture builders reproducible) | Owned by AZ-407 per task spec "Excluded" section | Deferred (in-scope to AZ-407) |
| AC-8 (parametrize matrix coverage) | `test_conftest_skip_rules::test_vins_mono_*` + `e2e/tests/positive/test_smoke.py::test_parametrize_matrix_smoke` | Covered |
| AC-9 (skips per traceability matrix) | 9 cases in `test_conftest_skip_rules.py` | Covered |
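
The runtime half of AC-5 is a TCP probe run from inside the runner container. A minimal sketch of such a probe, using only the standard library, might look like the following. The `can_connect` helper is hypothetical and is not taken from `test_smoke.py`.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time.

    Timeouts, refused connections, unreachable networks, and DNS failures
    all surface as OSError subclasses, so they are all treated as
    "no egress".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Inside the container, a smoke test could then assert `not can_connect("1.1.1.1", 443)`. That address is only an illustrative external target. An IP literal keeps the probe independent of DNS resolution, which may itself fail on an `internal: true` network.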

## Code Review Verdict

Self-reviewed — PASS. Notable points:

* Public-boundary discipline is enforced by a test-time grep in `test_no_sut_imports.py` rather than by a doc-only convention. The scan covered the whole `e2e/` tree and found zero violations.
* Module-layout entry for `blackbox_tests` was added in a separate preparatory commit so the diff for AZ-406 itself stays focused on the harness scaffold.
* Python 3.10 compatibility — the project pins `>=3.10,<3.12`, so I replaced an initial use of `datetime.UTC` (3.11+) with `timezone.utc` aliased to `UTC` at module top. Caught by the first focused-test run.
* CSV plugin in-process integration test required `-p runner.reporting.csv_reporter` on the inner `pytest.main()` call so option parsing sees the `--csv` flag — added with a note explaining the ordering.
* Mock-suite-sat returns 422 (FastAPI default) for schema failures rather than 400; the unit test asserts `400 <= status < 500` and documents the trade-off in-line. NFT-SEC-01 will lock the exact code if needed.
* `e2e/tests/conftest.py` does `from runner.conftest import *` so the test tree works both inside the docker image (where `runner/` is on PYTHONPATH at `/opt/e2e-runner/`) and outside (where `e2e/runner/` is the relative path). Re-export pattern is documented at the top of the file.
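
One way to picture the public-boundary check from the first point is the sketch below. It deliberately substitutes an `ast`-based scan for the textual grep the report describes, so that comments and strings do not produce false positives. The function names are hypothetical. The forbidden package name is inferred from the `src/gps_denied_onboard/**` path in the summary.

```python
import ast
from pathlib import Path

# SUT package root, inferred from src/gps_denied_onboard (assumption).
FORBIDDEN_ROOT = "gps_denied_onboard"


def find_sut_imports(source: str) -> list[str]:
    """Return the forbidden module names imported by Python source text."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif (isinstance(node, ast.ImportFrom)
              and node.level == 0 and node.module):
            # level == 0 means an absolute import; relative ones are skipped.
            names = [node.module]
        else:
            continue
        hits += [n for n in names if n.split(".")[0] == FORBIDDEN_ROOT]
    return hits


def scan_tree(root: Path) -> dict[str, list[str]]:
    """Map each offending file under root to its forbidden imports."""
    violations: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = find_sut_imports(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path)] = hits
    return violations
```

A unit test built on this would assert that `scan_tree(Path("e2e"))` returns an empty dict.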

## Auto-Fix Attempts

None (0 attempts). No code-review failures occurred, so the auto-fix gate was not entered.

## Stuck Agents

None.

## Deferred follow-ups

None. Every deferred-to-later-task surface raises an explicit
`NotImplementedError` naming the owning ticket (AZ-407 / AZ-408 /
AZ-416 / AZ-417 / AZ-419 / AZ-439 / AZ-441 / AZ-444). The deferrals are
intentional and match the task spec's "Excluded" section.
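
The stub pattern can be sketched as below, together with the kind of contract check a unit test might apply. The function names and the message wording are hypothetical. The `fdr_reader` to AZ-441 pairing follows the `test_fdr_reader.py` entry in the test breakdown.

```python
def read_fdr_records(archive_path: str):
    """Public surface only; the concrete reader is owned by AZ-441."""
    raise NotImplementedError(
        "fdr_reader.read_fdr_records is not implemented yet; "
        "owned by AZ-441"
    )


def assert_points_at_ticket(func, ticket: str, *args) -> None:
    """Check that a stub raises NotImplementedError naming its ticket."""
    try:
        func(*args)
    except NotImplementedError as exc:
        assert ticket in str(exc), f"stub does not mention {ticket}"
    else:
        raise AssertionError(f"{func.__name__} did not raise")


# Example contract check for the stub above.
assert_points_at_ticket(read_fdr_records, "AZ-441", "run-001.fdr")
```

Naming the ticket in the exception message means a test author who hits the stub early learns which task to wait for without consulting the dependency table.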

## Next Batch

The next test-context batch is **Batch 68**. Candidate task set (all
depend only on AZ-406, which is now in `done/`):

* AZ-407 (Static fixture builders — 3pt)
* AZ-444 (Tier-2 Jetson harness wrapper — 5pt)
* AZ-445 (CSV reporter + evidence bundler — 2pt)

Total: 10 cp across 3 tasks — within the 4-task / 20-cp per-batch cap.
AZ-408 (Runtime synthetic-injection — 3pt) depends on AZ-407, so it
goes in batch 69 along with the first wave of FT-P-* / FT-N-* scenarios.