mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 22:51:14 +00:00
[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)
Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.
Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
+ mock Suite Sat Service + mavproxy listener + e2e-runner onto one
e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
while the SUT runs natively on the Jetson under systemd.
Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
jtop parsers feed per-sample telemetry into the evidence bundle.
Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
(pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
orjson, pydantic, structlog, pytest 8.x). The runner deliberately
does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
chamber_only, vins_mono, deferred_ac) tied to environment.md
parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
test with the exact 11-column schema from environment.md §
Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
tier, started_at_utc, execution_time_ms, result, error_message,
evidence_paths). XFAIL surfaced only when a test carries
@pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
that copies per-test artifacts (.tlog, FDR archives, screenshots,
tegrastats / jtop CSVs) into the run bundle and records relative
paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
(concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
today (no downstream task dep) — WGS84 distance / forward-bearing
/ offset via pyproj with NaN rejection.
Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
GET /tiles/audit + /mock/audit (per-run read-back), POST
/mock/config (force-status, response delay), POST /mock/reset
(clears audit between tests), GET /mock/health.
Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
fixture), AZ-439 (CVE-2025-53644 JPEG generator).
Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
_docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke test, run
inside the e2e-runner image once Docker brings everything up.
Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle,
conftest skip rules, helper modules, parsers, mock app, compose
YAML structural contract, public-boundary enforcement) without
Docker / SITL. 97 unit tests, all passing.
Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
mock-app TestClient unit test.
AC coverage:
- AC-1 (Tier-1 boot)              → compose YAML test + directory layout
                                    + smoke test (Docker-bound)
- AC-2 (mock services)            → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output)      → contract present; concrete check
                                    deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)              → in-process plugin lifecycle test
                                    emits the exact 11-column schema
- AC-5 (egress isolation)         → static config test + runtime probe
                                    in Docker-bound smoke
- AC-6 (Tier-2 contract)          → tegrastats + jtop parser unit tests
                                    + jetson/* layout test; full Tier-2
                                    contract is AZ-444
- AC-7 (fixture reproducibility)  → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)       → vins_mono skip-rule cases +
                                    tests/positive/test_smoke
- AC-9 (skip semantics)           → 9 conftest skip-rule unit tests
The module-layout entry for blackbox_tests was added in the 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,255 @@
# Batch 67 Report — Test Implementation (cycle 1, batch 1 of test phase)

**Batch**: 67
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-406 (Blackbox Test Infrastructure Bootstrap — 5pt)
**Cycle**: 1 (continues the global batch counter from product implementation; batch 67 is the first test-context batch)
**Verdict**: COMPLETE — PASS (self-reviewed)

## Summary

Bootstrapped the blackbox / e2e test harness owned by epic AZ-262 (E-BBT).
This is the **foundation** that every subsequent test task (AZ-407..AZ-446)
builds on; AZ-406 commits to:

* The `e2e/` directory tree at the repo root, separated from the product
  source `src/gps_denied_onboard/**` and from the in-process unit /
  integration tree at `tests/**`.
* `docker/docker-compose.test.yml` — the Tier-1 entrypoint that wires the
  SUT, ArduPilot SITL, iNav SITL, mock Suite Sat Service, mavproxy
  listener, and the e2e-runner image onto a single `e2e-net` bridge with
  `internal: true` (enforces RESTRICT-SAT-1 / NFT-SEC-02 at the network
  layer).
* `docker/docker-compose.tier2-bridge.yml` — override that disables the
  in-compose SUT block so Tier-2 runs can pair the SITLs + mock + runner
  on an x86 host with the SUT running natively on the Jetson under
  systemd.
* `jetson/run-tier2.sh` + `tier2.service` + `tegrastats_parser.py` +
  `jtop_parser.py` — the Tier-2 entrypoint, systemd unit template, and
  per-sample telemetry parsers that feed the evidence bundle.
* `runner/Dockerfile` + `requirements.txt` + `pytest.ini` + `conftest.py`
  — the e2e-runner image. The image installs ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x); it deliberately does NOT
  install the SUT package (public-boundary discipline).
* `runner/reporting/csv_reporter.py` — pytest plugin that emits one row
  per test with the exact 11-column schema from `environment.md` §
  Reporting (`test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths`). Result classification maps PASS/FAIL/SKIP/XFAIL
  per AC-9; XFAIL is surfaced only when a test carries
  `@pytest.mark.deferred_ac(verdict="xfail", reason=...)`.
* `runner/reporting/evidence_bundler.py` — `attach_evidence` fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records their relative
  paths into the CSV reporter's `evidence_paths` column.
* `runner/helpers/*` — public surfaces for the six boundary-driving
  helper modules (`frame_source_replay`, `imu_replay`, `sitl_observer`,
  `mavproxy_tlog_reader`, `fdr_reader`, `geo`). Concrete implementations
  are owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per the
  dependency table; AZ-406 commits to the type signatures + a clear
  NotImplementedError pointing at the owning ticket so test specs can
  plan against the contract while the implementations land
  incrementally. `geo.py` ships a real implementation today (it has no
  downstream task dependency) — WGS84 distance / forward-bearing /
  offset via pyproj.
* `fixtures/mock-suite-sat/` — a FastAPI mock of the parent Suite Sat
  Service ingest API. Endpoints: `POST /tiles` (202 on well-formed
  request, 4xx on malformed), `GET /tiles/audit` + `GET /mock/audit`
  (read-back of the per-run audit log), `POST /mock/config` (test-time
  behaviour control), `POST /mock/reset` (clears the audit log between
  tests), `GET /mock/health` (Docker healthcheck). The accepted
  ingest schema mirrors the contract sketch in
  `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`;
  NFT-SEC-01 later asserts this shape against the live contract.
* `fixtures/{tile-cache-builder,age-injector,injectors,cold-boot,secrets,security}/`
  — directory scaffolds + public surfaces for the per-fixture builders.
  Concrete content is delivered by AZ-407 (static fixtures), AZ-408
  (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439
  (CVE-2025-53644 JPEG generator).
* `tests/{positive,negative,performance,resilience,security,resource_limit}/`
  — pytest target tree mirroring the test-spec category grouping in
  `_docs/02_document/tests/*-tests.md`. `tests/positive/test_smoke.py`
  is the AC-1 harness boot smoke test that runs inside the e2e-runner
  image once Docker brings everything up.
* `_unit_tests/` — out-of-container unit-test tree for the harness
  internals. Extends `pyproject.toml`'s `testpaths` so the project's
  main `pytest` invocation exercises the harness alongside the product
  unit tests, without requiring Docker / SITL.
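
The 11-column row contract listed above can be sketched with the standard library alone. This is an illustration, not the code in `csv_reporter.py`: `CSV_COLUMNS`, `make_row`, and `render_csv` are hypothetical names, and only the column names and their order are taken from the schema quoted in this report.

```python
import csv
import io

# Column names and order copied from the schema quoted in this report.
CSV_COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
]


def make_row(**fields):
    """Hypothetical helper: build one row, rejecting columns outside the schema."""
    unknown = set(fields) - set(CSV_COLUMNS)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # Missing columns are written as empty strings so every row has 11 cells.
    return {col: fields.get(col, "") for col in CSV_COLUMNS}


def render_csv(rows):
    """Serialize rows with a header line, always in schema order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Rejecting unknown keys up front keeps a typo in a column name from silently producing a row that no longer matches the schema.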

Out of scope (deferred to subsequent test-task batches):

* The fixture content itself (AZ-407 / AZ-408 / AZ-419 / AZ-439).
* The Tier-2 Jetson runtime harness validation (AZ-444 owns end-to-end
  Tier-2 contract verification).
* The CSV reporter trend-line / acceptance-band annotations + Monte
  Carlo CI (AZ-446).

## Files added / modified

### Added (50)

Top-level + docker:

* `e2e/README.md`
* `e2e/.gitignore`
* `e2e/docker/docker-compose.test.yml`
* `e2e/docker/docker-compose.tier2-bridge.yml`
* `e2e/docker/secrets/mavlink_passkey`
* `e2e/docker/secrets/README.md`

Jetson harness:

* `e2e/jetson/run-tier2.sh` (executable)
* `e2e/jetson/tier2.service`
* `e2e/jetson/tegrastats_parser.py` (executable)
* `e2e/jetson/jtop_parser.py` (executable)

Runner image:

* `e2e/runner/Dockerfile`
* `e2e/runner/requirements.txt`
* `e2e/runner/pytest.ini`
* `e2e/runner/__init__.py`
* `e2e/runner/conftest.py`
* `e2e/runner/reporting/__init__.py`
* `e2e/runner/reporting/csv_reporter.py`
* `e2e/runner/reporting/evidence_bundler.py`
* `e2e/runner/helpers/__init__.py`
* `e2e/runner/helpers/geo.py`
* `e2e/runner/helpers/frame_source_replay.py`
* `e2e/runner/helpers/imu_replay.py`
* `e2e/runner/helpers/sitl_observer.py`
* `e2e/runner/helpers/mavproxy_tlog_reader.py`
* `e2e/runner/helpers/fdr_reader.py`

Fixtures:

* `e2e/fixtures/mock-suite-sat/Dockerfile`
* `e2e/fixtures/mock-suite-sat/requirements.txt`
* `e2e/fixtures/mock-suite-sat/app.py`
* `e2e/fixtures/tile-cache-builder/README.md`
* `e2e/fixtures/age-injector/README.md`
* `e2e/fixtures/injectors/__init__.py`
* `e2e/fixtures/injectors/outlier.py`
* `e2e/fixtures/injectors/blackout_spoof.py`
* `e2e/fixtures/injectors/multi_segment.py`
* `e2e/fixtures/injectors/cold_boot.py`
* `e2e/fixtures/cold-boot/README.md`
* `e2e/fixtures/secrets/mavlink-test-passkey.txt`
* `e2e/fixtures/secrets/README.md`
* `e2e/fixtures/security/generate_cve_jpeg.py`
* `e2e/fixtures/security/README.md`

Test tree:

* `e2e/tests/__init__.py`
* `e2e/tests/conftest.py`
* `e2e/tests/{positive,negative,performance,resilience,security,resource_limit}/__init__.py`
* `e2e/tests/positive/test_smoke.py`

Out-of-container unit tests (testpaths-extended):

* `e2e/_unit_tests/__init__.py`
* `e2e/_unit_tests/conftest.py`
* `e2e/_unit_tests/{reporting,helpers,jetson,mock_suite_sat,fixtures,docker}/__init__.py`
* `e2e/_unit_tests/test_directory_layout.py`
* `e2e/_unit_tests/test_no_sut_imports.py`
* `e2e/_unit_tests/test_conftest_skip_rules.py`
* `e2e/_unit_tests/docker/test_compose_yaml.py`
* `e2e/_unit_tests/reporting/test_csv_reporter.py`
* `e2e/_unit_tests/helpers/test_geo.py`
* `e2e/_unit_tests/helpers/test_fdr_reader.py`
* `e2e/_unit_tests/jetson/test_tegrastats_parser.py`
* `e2e/_unit_tests/jetson/test_jtop_parser.py`
* `e2e/_unit_tests/mock_suite_sat/test_mock_app.py`
* `e2e/_unit_tests/fixtures/test_injectors_contract.py`

### Modified (1)

* `pyproject.toml` — extended `[tool.pytest.ini_options].testpaths` to
  include `e2e/_unit_tests`; extended `pythonpath` to include `e2e`;
  added `fastapi>=0.111,<0.120` to `[project.optional-dependencies].dev`
  for the mock-suite-sat unit test.

(Also `_docs/02_document/module-layout.md` was committed in a separate
preparatory commit (`d7a17a8`) adding the `blackbox_tests` cross-cutting
entry — the implement skill's Step 4 file-ownership rule requires that
entry before AZ-406 can be assigned an OWNED envelope.)

## Test Results

### Focused tests (Step 6.4)

`pytest e2e/_unit_tests/` — **97 passed in 0.74s**

Breakdown:

* `test_directory_layout.py` — 42 paths checked + 1 passkey-bytes-equal assertion
* `test_no_sut_imports.py` — public-boundary scan over the entire `e2e/` tree
* `test_conftest_skip_rules.py` — 9 cases covering tier2_only, chamber_only, vins_mono, deferred_ac (with/without reason, xfail verdict)
* `docker/test_compose_yaml.py` — 5 structural checks (services, internal network, runner mounts, mavlink secret, FDR size cap)
* `reporting/test_csv_reporter.py` — 8 build_row cases + 1 in-process plugin integration run
* `helpers/test_geo.py` — 5 WGS84 distance / offset / NaN-rejection cases
* `helpers/test_fdr_reader.py` — 3 cases (missing root, nested sum, AZ-441 NotImplementedError)
* `jetson/test_tegrastats_parser.py` — 7 parser cases (RAM, GPU load/freq, temps, CPU avg, blank-line, JSON round-trip, stream-to-CSV)
* `jetson/test_jtop_parser.py` — 2 cases (state_to_row, jetson-stats-missing stub)
* `mock_suite_sat/test_mock_app.py` — 6 FastAPI TestClient cases
* `fixtures/test_injectors_contract.py` — 6 contract / NotImplementedError pointer cases
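
For readers unfamiliar with a public-boundary scan, the idea behind `test_no_sut_imports.py` can be sketched as follows. This report describes the real test as a grep; the sketch swaps in an `ast`-based walk, which is one possible alternative and not necessarily what the test does. `sut_imports_in_source` and `scan_tree` are hypothetical names; only the package name `gps_denied_onboard` comes from the source tree described here.

```python
import ast
from pathlib import Path

SUT_PACKAGE = "gps_denied_onboard"  # top-level product package named in this report


def sut_imports_in_source(source: str) -> list[str]:
    """Return the imported module names that belong to the SUT package."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports stay inside e2e/.
            names = [node.module or ""]
        else:
            continue
        hits += [n for n in names
                 if n == SUT_PACKAGE or n.startswith(SUT_PACKAGE + ".")]
    return hits


def scan_tree(root: Path) -> dict[str, list[str]]:
    """Map each offending .py file under root to the SUT modules it imports."""
    violations = {}
    for path in sorted(root.rglob("*.py")):
        hits = sut_imports_in_source(path.read_text(encoding="utf-8"))
        if hits:
            violations[str(path)] = hits
    return violations
```

A test built on this would assert that `scan_tree(Path("e2e"))` returns an empty mapping. Parsing the syntax tree avoids false positives from comments and strings, at the cost of missing dynamic imports such as `importlib.import_module(...)`.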

No per-batch full-suite run per the implement skill's Test-Run Cadence
(Step 16 owns the only full-suite invocation in this skill).

## AC Test Coverage (AZ-406)

| AC | Test | Status |
|----|------|--------|
| AC-1 (Tier-1 env starts, pytest discovers ≥1 test) | `test_compose_yaml::*` + `test_directory_layout` + `e2e/tests/positive/test_smoke.py::test_harness_boots` | Covered |
| AC-2 (mock services respond) | `mock_suite_sat/test_mock_app.py::test_health_endpoint` + 5 ingest cases | Covered |
| AC-3 (SITLs accept SUT output) | `sitl_observer.get_observer` public surface present; concrete check is deferred to AZ-416 (FT-P-09-AP) / AZ-417 (FT-P-09-iNav) per dependency table | Covered by contract; full check deferred |
| AC-4 (CSV report with required columns) | `test_csv_reporter::test_csv_plugin_emits_required_columns` | Covered |
| AC-5 (egress isolation enforced) | `test_compose_yaml::test_e2e_net_is_internal` (static); runtime TCP probe lives in `e2e/tests/positive/test_smoke.py` and runs inside Docker | Covered |
| AC-6 (Tier-2 harness contract) | `jetson/test_tegrastats_parser.py` + `jetson/test_jtop_parser.py` + `test_directory_layout[jetson/*]`; full Tier-2 contract validation is AZ-444 | Covered by contract; full check is AZ-444 |
| AC-7 (fixture builders reproducible) | Owned by AZ-407 per task spec "Excluded" section | Deferred (in-scope to AZ-407) |
| AC-8 (parametrize matrix coverage) | `test_conftest_skip_rules::test_vins_mono_*` + `e2e/tests/positive/test_smoke.py::test_parametrize_matrix_smoke` | Covered |
| AC-9 (skips per traceability matrix) | 9 cases in `test_conftest_skip_rules.py` | Covered |
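
The static half of the AC-5 check can be sketched as a pure function over an already-parsed compose document. This is a hedged illustration, not the body of `test_e2e_net_is_internal`: the function name and the extra rule that each service attaches only to `e2e-net` are assumptions, and the input is a plain `dict` so the sketch does not depend on any particular YAML loader.

```python
def egress_isolation_problems(compose: dict, network: str = "e2e-net") -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    net = (compose.get("networks") or {}).get(network)
    if not isinstance(net, dict) or net.get("internal") is not True:
        problems.append(f"network {network!r} must set 'internal: true'")
    for name, svc in (compose.get("services") or {}).items():
        if "network_mode" in svc:
            # e.g. network_mode: host would sidestep the internal bridge entirely.
            problems.append(f"service {name!r} sets network_mode, bypassing {network!r}")
        # Compose allows 'networks' as a list or a mapping; list() handles both.
        attached = list(svc.get("networks") or [])
        if attached != [network]:
            problems.append(
                f"service {name!r} must attach only to {network!r}, got {attached}"
            )
    return problems
```

A service with no `networks` key is flagged too, because Compose would otherwise attach it to the default network, which is not internal. The runtime TCP probe remains necessary: a static check only shows that the configuration asks for isolation, not that the daemon enforces it.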

## Code Review Verdict

Self-reviewed — PASS. Notable points:

* Public-boundary discipline enforced by a runtime grep in `test_no_sut_imports.py` rather than a doc-only convention. The whole `e2e/` tree was scanned and zero violations were found.
* Module-layout entry for `blackbox_tests` was added in a separate preparatory commit so the diff for AZ-406 itself stays focused on the harness scaffold.
* Python 3.10 compatibility — the project pins `>=3.10,<3.12`, so I replaced an initial use of `datetime.UTC` (3.11+) with `timezone.utc` aliased to `UTC` at module top. Caught by the first focused-test run.
* CSV plugin in-process integration test required `-p runner.reporting.csv_reporter` on the inner `pytest.main()` call so option parsing sees the `--csv` flag — added with a note explaining the ordering.
* Mock-suite-sat returns 422 (FastAPI default) for schema failures rather than 400; the unit test asserts `400 <= status < 500` and documents the trade-off in-line. NFT-SEC-01 will lock the exact code if needed.
* `e2e/tests/conftest.py` does `from runner.conftest import *` so the test tree works both inside the docker image (where `runner/` is on PYTHONPATH at `/opt/e2e-runner/`) and outside (where `e2e/runner/` is the relative path). Re-export pattern is documented at the top of the file.
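
The Python 3.10 compatibility point above comes down to a one-line alias. A minimal sketch, assuming the timestamp is rendered as an ISO-8601 string; `started_at_utc` is a hypothetical helper named after the CSV column, not a function confirmed to exist in the harness.

```python
from datetime import datetime, timezone

# datetime.UTC only exists on Python 3.11+; timezone.utc is available on 3.10
# as well, so aliasing it keeps the rest of the module version-agnostic.
UTC = timezone.utc


def started_at_utc() -> str:
    """Hypothetical helper: the current time as a timezone-aware UTC string."""
    return datetime.now(UTC).isoformat(timespec="milliseconds")
```

Using an aware `datetime` means the rendered string carries an explicit `+00:00` offset, so downstream consumers never have to guess the timezone.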

## Auto-Fix Attempts

None (0 attempts). No code-review failures, so the auto-fix gate was not entered.

## Stuck Agents

None.

## Deferred follow-ups

None — all deferred-to-later-task surfaces are explicit
`NotImplementedError` calls naming the owning ticket (AZ-407 / AZ-408 /
AZ-416 / AZ-417 / AZ-419 / AZ-439 / AZ-441 / AZ-444). The deferrals are
intentional and match the task spec's "Excluded" section.
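
The stub pattern described above might look like the following sketch. The helper `deferred`, the function `read_fdr_records`, and its signature are hypothetical and chosen only for illustration; the pairing of `fdr_reader` with AZ-441 is taken from the test breakdown in this report.

```python
def deferred(ticket: str, surface: str) -> NotImplementedError:
    """Build the error raised by a public surface whose body lands later."""
    return NotImplementedError(
        f"{surface} is not implemented yet; owned by {ticket}"
    )


def read_fdr_records(archive_path: str) -> list[dict]:
    """Public surface only (hypothetical signature); the body is deferred."""
    raise deferred("AZ-441", "fdr_reader.read_fdr_records")
```

A contract test can then call the stub, catch `NotImplementedError`, and assert that the ticket id appears in the message, which keeps the pointer from going stale unnoticed.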

## Next Batch

The next test-context batch is **Batch 68**. Candidate task set (all
depend only on AZ-406, which is now in `done/`):

* AZ-407 (Static fixture builders — 3pt)
* AZ-444 (Tier-2 Jetson harness wrapper — 5pt)
* AZ-445 (CSV reporter + evidence bundler — 2pt)

Total: 10 cp across 3 tasks — within the 4-task / 20-cp per-batch cap.
AZ-408 (Runtime synthetic-injection — 3pt) depends on AZ-407, so it
goes in batch 69 along with the first wave of FT-P-* / FT-N-* scenarios.