Oleksandr Bezdieniezhnykh 59d9116d36 [AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)
Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
  + mock Suite Sat Service + mavproxy listener + e2e-runner onto one
  e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
  NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
  compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
  while the SUT runs natively on the Jetson under systemd.

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
  jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x). The runner deliberately
  does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md
  parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
  test with the exact 11-column schema from environment.md §
  Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths). XFAIL surfaced only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records relative
  paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
  (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
  AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
  today (no downstream task dep) — WGS84 distance / forward-bearing
  / offset via pyproj with NaN rejection.

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST
  /mock/config (force-status, response delay), POST /mock/reset
  (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
  fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
  fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke run
  inside the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle,
  conftest skip rules, helper modules, parsers, mock app, compose
  YAML structural contract, public-boundary enforcement) without
  Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot)         → compose YAML test + directory layout
                                + smoke test (Docker-bound)
- AC-2 (mock services)       → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
                                deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)         → in-process plugin lifecycle test
                                emits the exact 11-column schema
- AC-5 (egress isolation)    → static config test + runtime probe
                                in Docker-bound smoke
- AC-6 (Tier-2 contract)     → tegrastats + jtop parser unit tests
                                + jetson/* layout test; full Tier-2
                                contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)  → vins_mono skip-rule cases +
                                tests/positive/test_smoke
- AC-9 (skip semantics)      → 9 conftest skip-rule unit tests

Module layout entry for blackbox_tests was added in 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 16:22:44 +03:00


Batch 67 Report — Test Implementation (cycle 1, batch 1 of test phase)

Batch: 67
Date: 2026-05-16
Context: Test implementation (greenfield Step 10: Implement Tests)
Tasks: AZ-406 (Blackbox Test Infrastructure Bootstrap, 5pt)
Cycle: 1 (continues the global batch counter from product implementation; batch 67 is the first test-context batch)
Verdict: COMPLETE, PASS (self-reviewed)

Summary

Bootstrapped the blackbox / e2e test harness owned by epic AZ-262 (E-BBT). This is the foundation that every subsequent test task (AZ-407..AZ-446) builds on; AZ-406 commits to:

  • The e2e/ directory tree at the repo root, separated from the product source src/gps_denied_onboard/** and from the in-process unit / integration tree at tests/**.
  • docker/docker-compose.test.yml — the Tier-1 entrypoint that wires the SUT, ArduPilot SITL, iNav SITL, mock Suite Sat Service, mavproxy listener, and the e2e-runner image onto a single e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 / NFT-SEC-02 at the network layer).
  • docker/docker-compose.tier2-bridge.yml — override that disables the in-compose SUT block so Tier-2 runs can pair the SITLs + mock + runner on an x86 host with the SUT running natively on the Jetson under systemd.
  • jetson/run-tier2.sh + tier2.service + tegrastats_parser.py + jtop_parser.py — the Tier-2 entrypoint, systemd unit template, and per-sample telemetry parsers that feed the evidence bundle.
  • runner/Dockerfile + requirements.txt + pytest.ini + conftest.py — the e2e-runner image. The image installs ONLY ground-side libs (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx, orjson, pydantic, structlog, pytest 8.x); it deliberately does NOT install the SUT package (public-boundary discipline).
  • runner/reporting/csv_reporter.py — pytest plugin that emits one row per test with the exact 11-column schema from environment.md § Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths). Result classification maps PASS/FAIL/SKIP/XFAIL per AC-9; XFAIL is surfaced only when a test carries @pytest.mark.deferred_ac(verdict="xfail", reason=...).
  • runner/reporting/evidence_bundler.py: the attach_evidence fixture that copies per-test artifacts (.tlog, FDR archives, screenshots, tegrastats / jtop CSVs) into the run bundle and records their relative paths into the CSV reporter's evidence_paths column.
  • runner/helpers/* — public surfaces for the six boundary-driving helper modules (frame_source_replay, imu_replay, sitl_observer, mavproxy_tlog_reader, fdr_reader, geo). Concrete implementations are owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per the dependency table; AZ-406 commits to the type signatures + a clear NotImplementedError pointing at the owning ticket so test specs can plan against the contract while the implementations land incrementally. geo.py ships a real implementation today (it has no downstream task dependency) — WGS84 distance / forward-bearing / offset via pyproj.
  • fixtures/mock-suite-sat/ — a FastAPI mock of the parent Suite Sat Service ingest API. Endpoints: POST /tiles (202 on well-formed request, 4xx on malformed), GET /tiles/audit + GET /mock/audit (read-back of the per-run audit log), POST /mock/config (test-time behaviour control), POST /mock/reset (clears the audit log between tests), GET /mock/health (Docker healthcheck). The accepted ingest schema mirrors the contract sketch in _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md; NFT-SEC-01 later asserts this shape against the live contract.
  • fixtures/{tile-cache-builder,age-injector,injectors,cold-boot,secrets,security}/ — directory scaffolds + public surfaces for the per-fixture builders. Concrete content is delivered by AZ-407 (static fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439 (CVE-2025-53644 JPEG generator).
  • tests/{positive,negative,performance,resilience,security,resource_limit}/ — pytest target tree mirroring the test-spec category grouping in _docs/02_document/tests/*-tests.md. tests/positive/test_smoke.py is the AC-1 harness boot smoke test that runs inside the e2e-runner image once Docker brings everything up.
  • _unit_tests/ — out-of-container unit-test tree for the harness internals. Extends pyproject.toml's testpaths so the project's main pytest invocation exercises the harness alongside the product unit tests, without requiring Docker / SITL.
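To make the reporter contract above concrete, here is a minimal sketch of a row builder that emits the 11 columns in the listed order. Only the column names come from this report; the build_row signature, the ";" separator for multi-valued cells, and the rows_to_csv helper are assumptions for illustration, not the plugin's actual code.

```python
import csv
import io
from datetime import datetime, timezone

UTC = timezone.utc  # also works on Python 3.10, where datetime.UTC does not exist

# Column order as listed in this report (environment.md § Reporting).
CSV_COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
]


def build_row(test_id, test_name, outcome, started_at, duration_s,
              traces_to=(), fc_adapter="", vio_strategy="", tier="tier1",
              error_message="", evidence_paths=()):
    """Hypothetical row builder: returns a dict with exactly the 11 columns."""
    return {
        "test_id": test_id,
        "test_name": test_name,
        "traces_to": ";".join(traces_to),            # separator is an assumption
        "fc_adapter": fc_adapter,
        "vio_strategy": vio_strategy,
        "tier": tier,
        "started_at_utc": started_at.astimezone(UTC).isoformat(),
        "execution_time_ms": int(round(duration_s * 1000)),
        "result": outcome.upper(),                   # PASS / FAIL / SKIP / XFAIL
        "error_message": error_message,
        "evidence_paths": ";".join(evidence_paths),  # relative paths in the bundle
    }


def rows_to_csv(rows):
    """Serialise rows with a header line, in the fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because csv.DictWriter raises ValueError on unknown keys, a fixed fieldnames list gives a cheap guard against the schema drifting away from the 11 columns.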

Out of scope (deferred to subsequent test-task batches):

  • The fixture content itself (AZ-407 / AZ-408 / AZ-419 / AZ-439).
  • The Tier-2 Jetson runtime harness validation (AZ-444 owns end-to-end Tier-2 contract verification).
  • The CSV reporter trend-line / acceptance-band annotations + Monte Carlo CI (AZ-446).

Files added / modified

Added (50)

Top-level + docker:

  • e2e/README.md
  • e2e/.gitignore
  • e2e/docker/docker-compose.test.yml
  • e2e/docker/docker-compose.tier2-bridge.yml
  • e2e/docker/secrets/mavlink_passkey
  • e2e/docker/secrets/README.md

Jetson harness:

  • e2e/jetson/run-tier2.sh (executable)
  • e2e/jetson/tier2.service
  • e2e/jetson/tegrastats_parser.py (executable)
  • e2e/jetson/jtop_parser.py (executable)

Runner image:

  • e2e/runner/Dockerfile
  • e2e/runner/requirements.txt
  • e2e/runner/pytest.ini
  • e2e/runner/__init__.py
  • e2e/runner/conftest.py
  • e2e/runner/reporting/__init__.py
  • e2e/runner/reporting/csv_reporter.py
  • e2e/runner/reporting/evidence_bundler.py
  • e2e/runner/helpers/__init__.py
  • e2e/runner/helpers/geo.py
  • e2e/runner/helpers/frame_source_replay.py
  • e2e/runner/helpers/imu_replay.py
  • e2e/runner/helpers/sitl_observer.py
  • e2e/runner/helpers/mavproxy_tlog_reader.py
  • e2e/runner/helpers/fdr_reader.py

Fixtures:

  • e2e/fixtures/mock-suite-sat/Dockerfile
  • e2e/fixtures/mock-suite-sat/requirements.txt
  • e2e/fixtures/mock-suite-sat/app.py
  • e2e/fixtures/tile-cache-builder/README.md
  • e2e/fixtures/age-injector/README.md
  • e2e/fixtures/injectors/__init__.py
  • e2e/fixtures/injectors/outlier.py
  • e2e/fixtures/injectors/blackout_spoof.py
  • e2e/fixtures/injectors/multi_segment.py
  • e2e/fixtures/injectors/cold_boot.py
  • e2e/fixtures/cold-boot/README.md
  • e2e/fixtures/secrets/mavlink-test-passkey.txt
  • e2e/fixtures/secrets/README.md
  • e2e/fixtures/security/generate_cve_jpeg.py
  • e2e/fixtures/security/README.md

Test tree:

  • e2e/tests/__init__.py
  • e2e/tests/conftest.py
  • e2e/tests/{positive,negative,performance,resilience,security,resource_limit}/__init__.py
  • e2e/tests/positive/test_smoke.py

Out-of-container unit tests (testpaths-extended):

  • e2e/_unit_tests/__init__.py
  • e2e/_unit_tests/conftest.py
  • e2e/_unit_tests/{reporting,helpers,jetson,mock_suite_sat,fixtures,docker}/__init__.py
  • e2e/_unit_tests/test_directory_layout.py
  • e2e/_unit_tests/test_no_sut_imports.py
  • e2e/_unit_tests/test_conftest_skip_rules.py
  • e2e/_unit_tests/docker/test_compose_yaml.py
  • e2e/_unit_tests/reporting/test_csv_reporter.py
  • e2e/_unit_tests/helpers/test_geo.py
  • e2e/_unit_tests/helpers/test_fdr_reader.py
  • e2e/_unit_tests/jetson/test_tegrastats_parser.py
  • e2e/_unit_tests/jetson/test_jtop_parser.py
  • e2e/_unit_tests/mock_suite_sat/test_mock_app.py
  • e2e/_unit_tests/fixtures/test_injectors_contract.py

Modified (1)

  • pyproject.toml — extended [tool.pytest.ini_options].testpaths to include e2e/_unit_tests; extended pythonpath to include e2e; added fastapi>=0.111,<0.120 to [project.optional-dependencies].dev for the mock-suite-sat unit test.

(_docs/02_document/module-layout.md was also updated, in a separate preparatory commit (d7a17a8) that adds the blackbox_tests cross-cutting entry; the implement skill's Step 4 file-ownership rule requires that entry before AZ-406 can be assigned an OWNED envelope.)

Test Results

Focused tests (Step 6.4)

pytest e2e/_unit_tests/ → 97 passed in 0.74s

Breakdown:

  • test_directory_layout.py — 42 paths checked + 1 passkey-bytes-equal assertion
  • test_no_sut_imports.py — public-boundary scan over the entire e2e/ tree
  • test_conftest_skip_rules.py — 9 cases covering tier2_only, chamber_only, vins_mono, deferred_ac (with/without reason, xfail verdict)
  • docker/test_compose_yaml.py — 5 structural checks (services, internal network, runner mounts, mavlink secret, FDR size cap)
  • reporting/test_csv_reporter.py — 8 build_row cases + 1 in-process plugin integration run
  • helpers/test_geo.py — 5 WGS84 distance / offset / NaN-rejection cases
  • helpers/test_fdr_reader.py — 3 cases (missing root, nested sum, AZ-441 NotImplementedError)
  • jetson/test_tegrastats_parser.py — 7 parser cases (RAM, GPU load/freq, temps, CPU avg, blank-line, JSON round-trip, stream-to-CSV)
  • jetson/test_jtop_parser.py — 2 cases (state_to_row, jetson-stats-missing stub)
  • mock_suite_sat/test_mock_app.py — 6 FastAPI TestClient cases
  • fixtures/test_injectors_contract.py — 6 contract / NotImplementedError pointer cases
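As an illustration of what the tegrastats parser cases above exercise (RAM, GPU load/freq, temperatures, CPU average, blank lines), the following is a minimal regex-based sketch. It assumes the commonly seen tegrastats line layout (RAM used/total in MB, a bracketed per-core CPU list, GR3D_FREQ, and NAME@tempC fields); the function name and output keys are hypothetical and do not reflect the actual tegrastats_parser.py.

```python
import re

RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
GPU_RE = re.compile(r"GR3D_FREQ (\d+)%(?:@\[?(\d+)\]?)?")
CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C")


def parse_tegrastats_line(line):
    """Parse one tegrastats sample into a flat dict; blank lines give None."""
    line = line.strip()
    if not line:
        return None
    row = {}
    m = RAM_RE.search(line)
    if m:
        row["ram_used_mb"] = int(m.group(1))
        row["ram_total_mb"] = int(m.group(2))
    m = GPU_RE.search(line)
    if m:
        row["gpu_load_pct"] = int(m.group(1))
        # The frequency suffix is not present on every platform or release.
        row["gpu_freq_mhz"] = int(m.group(2)) if m.group(2) else None
    m = CPU_RE.search(line)
    if m:
        # Cores reported as "off" carry no load figure and are skipped.
        loads = [int(core.split("%")[0])
                 for core in m.group(1).split(",") if "%" in core]
        row["cpu_avg_pct"] = sum(loads) / len(loads) if loads else None
    row["temps_c"] = {name: float(val) for name, val in TEMP_RE.findall(line)}
    return row
```

Returning None for blank lines lets a streaming caller filter samples with a simple truthiness check before writing CSV rows.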

No per-batch full-suite run was performed, per the implement skill's Test-Run Cadence (Step 16 owns the only full-suite invocation in this skill).

AC Test Coverage (AZ-406)

| AC | Test | Status |
|---|---|---|
| AC-1 (Tier-1 env starts, pytest discovers ≥1 test) | test_compose_yaml::* + test_directory_layout + e2e/tests/positive/test_smoke.py::test_harness_boots | Covered |
| AC-2 (mock services respond) | mock_suite_sat/test_mock_app.py::test_health_endpoint + 5 ingest cases | Covered |
| AC-3 (SITLs accept SUT output) | sitl_observer.get_observer public surface present; concrete check is deferred to AZ-416 (FT-P-09-AP) / AZ-417 (FT-P-09-iNav) per dependency table | Covered by contract; full check deferred |
| AC-4 (CSV report with required columns) | test_csv_reporter::test_csv_plugin_emits_required_columns | Covered |
| AC-5 (egress isolation enforced) | test_compose_yaml::test_e2e_net_is_internal (static); runtime TCP probe lives in e2e/tests/positive/test_smoke.py and runs inside Docker | Covered |
| AC-6 (Tier-2 harness contract) | jetson/test_tegrastats_parser.py + jetson/test_jtop_parser.py + test_directory_layout[jetson/*]; full Tier-2 contract validation is AZ-444 | Covered by contract; full check is AZ-444 |
| AC-7 (fixture builders reproducible) | Owned by AZ-407 per task spec "Excluded" section | Deferred (in-scope to AZ-407) |
| AC-8 (parametrize matrix coverage) | test_conftest_skip_rules::test_vins_mono_* + e2e/tests/positive/test_smoke.py::test_parametrize_matrix_smoke | Covered |
| AC-9 (skips per traceability matrix) | 9 cases in test_conftest_skip_rules.py | Covered |
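The AC-9 skip semantics can be pictured as a pure decision function over a test's markers and the run environment. The sketch below is an assumption-heavy illustration: the marker names come from this report, but the exact conditions (in particular that vins_mono tests run only when the VIO strategy is vins_mono), the default reason text, and the function signature are guesses, not the behaviour of conftest.py.

```python
def skip_decision(markers, tier, chamber_available=False, vio_strategy="default"):
    """Return (verdict, reason), where verdict is 'run', 'skip', or 'xfail'.

    markers maps a marker name to its keyword arguments, for example
    {"deferred_ac": {"verdict": "xfail", "reason": "pending AZ-416"}}.
    """
    if "tier2_only" in markers and tier != "tier2":
        return "skip", "tier2_only: requires a Tier-2 (Jetson) run"
    if "chamber_only" in markers and not chamber_available:
        return "skip", "chamber_only: requires the chamber environment"
    # Assumption: a vins_mono test is only meaningful on the vins_mono axis.
    if "vins_mono" in markers and vio_strategy != "vins_mono":
        return "skip", "vins_mono: VIO strategy is " + vio_strategy
    if "deferred_ac" in markers:
        kwargs = markers["deferred_ac"]
        reason = kwargs.get("reason") or "deferred acceptance criterion"
        # XFAIL is surfaced only on an explicit verdict="xfail".
        verdict = "xfail" if kwargs.get("verdict") == "xfail" else "skip"
        return verdict, reason
    return "run", ""
```

Keeping the decision free of pytest objects is what makes it testable out of container; a collection hook would only translate the returned verdict into pytest.skip or an xfail marker.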

Code Review Verdict

Self-reviewed — PASS. Notable points:

  • Public-boundary discipline enforced by a runtime grep in test_no_sut_imports.py rather than a doc-only convention. The whole e2e/ tree was scanned and zero violations were found.
  • Module-layout entry for blackbox_tests was added in a separate preparatory commit so the diff for AZ-406 itself stays focused on the harness scaffold.
  • Python 3.10 compatibility — the project pins >=3.10,<3.12, so I replaced an initial use of datetime.UTC (3.11+) with timezone.utc aliased to UTC at module top. Caught by the first focused-test run.
  • CSV plugin in-process integration test required -p runner.reporting.csv_reporter on the inner pytest.main() call so option parsing sees the --csv flag — added with a note explaining the ordering.
  • Mock-suite-sat returns 422 (FastAPI default) for schema failures rather than 400; the unit test asserts 400 <= status < 500 and documents the trade-off in-line. NFT-SEC-01 will lock the exact code if needed.
  • e2e/tests/conftest.py does from runner.conftest import * so the test tree works both inside the docker image (where runner/ is on PYTHONPATH at /opt/e2e-runner/) and outside (where e2e/runner/ is the relative path). Re-export pattern is documented at the top of the file.
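The public-boundary scan mentioned in the first point can be sketched as a line-oriented regex check. This is a simplified illustration, not the contents of test_no_sut_imports.py: the function names are hypothetical, and the pattern only catches statements that begin with an import of gps_denied_onboard (it would miss, for example, "import os, gps_denied_onboard").

```python
import re
from pathlib import Path

# Matches "import gps_denied_onboard" and "from gps_denied_onboard[.x] import ...",
# but not look-alike packages such as gps_denied_onboard_tools.
SUT_IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+gps_denied_onboard(?:[.\s]|$)")


def find_sut_imports_in_text(text):
    """Return (line_number, stripped_line) for each SUT import in text."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if SUT_IMPORT_RE.match(line)
    ]


def find_sut_imports(root):
    """Scan every .py file under root; return (path, line_number, line) tuples."""
    violations = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in find_sut_imports_in_text(text):
            violations.append((str(path), lineno, line))
    return violations
```

A unit test could then assert that find_sut_imports("e2e") returns an empty list, which turns the boundary rule into a failing test rather than a documentation convention.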

Auto-Fix Attempts

None. There were no code-review failures, so the auto-fix gate was not entered.

Stuck Agents

None.

Deferred follow-ups

None — all deferred-to-later-task surfaces are explicit NotImplementedError calls naming the owning ticket (AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-419 / AZ-439 / AZ-441 / AZ-444). The deferrals are intentional and match the task spec's "Excluded" section.
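A contract-only surface of the kind described above might look like the following sketch. The pairing of the FDR reader with AZ-441 comes from this report; the deferred_to helper, the read_fdr_records name, and the message wording are hypothetical.

```python
def deferred_to(ticket, surface):
    """Build a NotImplementedError that names the owning ticket (hypothetical helper)."""
    return NotImplementedError(
        f"{surface} is a contract-only surface in AZ-406; "
        f"the implementation is owned by {ticket}."
    )


def read_fdr_records(root):
    """Public signature only: callers can plan against it before AZ-441 lands."""
    raise deferred_to("AZ-441", "fdr_reader.read_fdr_records")
```

Raising at call time rather than at import time keeps the module importable, so test specs and collection can reference the surface while the implementation is still pending.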

Next Batch

The next test-context batch is Batch 68. Candidate task set (all depend only on AZ-406, which is now in done/):

  • AZ-407 (Static fixture builders — 3pt)
  • AZ-444 (Tier-2 Jetson harness wrapper — 5pt)
  • AZ-445 (CSV reporter + evidence bundler — 2pt)

Total: 10 cp across 3 tasks — within the 4-task / 20-cp per-batch cap. AZ-408 (Runtime synthetic-injection — 3pt) depends on AZ-407, so it goes in batch 69 along with the first wave of FT-P-* / FT-N-* scenarios.