Commit Graph

17 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh 5def1a3eb3 [AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests
Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight
tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators
  for tile generation rate, Mode B Fact #105 schema check, footprint+
  GSD dedup (via geo.distance_m), upload-audit reconciliation, and
  the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the
  mock-suite-sat-service /tiles/audit endpoint with strict response-
  shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture
records are missing (tests-as-gates discipline). 52 new unit tests
(41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over,
upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 15:28:39 +03:00
Oleksandr Bezdieniezhnykh 7d1288e4ba [AZ-421] Batch 82: FT-P-15 + FT-P-16 + FT-P-18 cache / offline / no-raw-retention
FT-P-15: parse FDR `cache-self-check` records; assert every tile-manifest
entry has CRS, tile_matrix, dimension, m_per_px, capture_date, source,
compression; m_per_px >= 0.5 (or rejected by FDR `tile-load-rejected`).

FT-P-16: read `docker network inspect e2e-net` + `docker inspect <sut>`
snapshots; assert `Internal == true` AND SUT attached only to e2e-net.
The 0-egress semantic of AC-8.3 is enforced structurally.

FT-P-18: walk FDR + tile-cache, probe JPEG dimensions via stdlib SOF
parser, reject any file matching nav-camera raw pattern (5472x3648 or
880x720). Extrapolate thumbnail-log size to 8h; assert < 1 GB.

Adds runner.helpers.tile_cache_inspector with five evaluators
(manifest schema, offline mode, raw-frame detection, thumbnail budget,
JPEG dimension probe) + walk_files helper. Pure-logic coverage: 43
new unit tests; full e2e/_unit_tests/ suite 793 passing (was 746).
Scenarios skip locally when SITL replay fixture or docker-inspect
env vars are missing; production hooks (cache-self-check FDR record,
tile-load-rejected events, docker-inspect snapshots) are tracked
outside this task.

See _docs/03_implementation/batch_82_report.md +
reviews/batch_82_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 15:09:58 +03:00
Oleksandr Bezdieniezhnykh bb744d9078 [AZ-420] Batch 81: FT-P-12 + FT-P-13 GCS scenarios
FT-P-12: parse mavproxy-listener tlog over a 60 s Derkachi replay and
assert SUT->GCS GLOBAL_POSITION_INT cadence lands in [1, 2] Hz (AC-6.1).

FT-P-13: inject `RELOC:<lat>,<lon>,<radius_m>` STATUSTEXT while the SUT
is in dead_reckoned; verify FDR `c8.gcs.operator_command` ack <=2s,
`anchor_search_region` centre shifts toward the hint, and no
BAD_SIGNATURE / UNAUTHORIZED / REJECTED STATUSTEXT lands in the
post-inject window (AC-6.2).

Adds runner.helpers.gcs_telemetry_evaluator (rate, hint-ack correlation,
haversine search-region shift, rejection scan) and
sitl_observer.capture_gcs_tlog (parity surface to capture_ap_tlog).
Pure-logic coverage: 39 new unit tests; full e2e/_unit_tests/ suite
746 passing (was 700). Scenarios skip locally on missing SITL replay
fixture; production hooks (inbound STATUSTEXT parser, anchor_search_region
FDR emitter) tracked outside this task.

See _docs/03_implementation/batch_81_report.md +
reviews/batch_81_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 14:46:08 +03:00
Oleksandr Bezdieniezhnykh 7fb3cb3f34 [AZ-600] Batch 80: refactor sitl_replay_builder to strategy pattern
Replace per-scenario fixture builders with a parameterized strategy
framework so future Derkachi-based scenarios compose existing pieces
instead of duplicating ~200 lines of orchestration per scenario.

New e2e/fixtures/sitl_replay_builder/builder.py:
- VideoSource ABC + StillImagesSource, Mp4PassthroughSource
- TlogSource ABC + SyntheticStationaryTlog, ImuCsvTlog
- FdrProjection ABC + RawFdrPassthrough, OutboundMessagesProjection
- FixtureBuilderConfig + build_fixtures(cfg) orchestrator
- Consolidated MAVLink pack_raw_imu / pack_attitude helpers
- Consolidated run_gps_denied_replay + write_observer_fixture

build_p01_fixtures.py: 423 -> 107 lines (75% reduction).
build_p02_fixtures.py: 292 -> 98 lines (66% reduction).
_common.py: deleted (folded into builder.py).

Tests reorganized:
- test_sitl_replay_builder_builder.py (new, 33 strategy-level tests)
- test_sitl_replay_builder.py (slimmed, 6 FT-P-01 integration)
- test_sitl_replay_builder_p02.py (slimmed, 7 FT-P-02 integration)

README documents the strategy framework + a worked example for
adding FT-P-04 in ~30 lines (no new strategy code required).

Regression gate: 700 passing (was 686; +14 from finer-grained
coverage of new strategy classes and the build_fixtures orchestrator).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 14:19:08 +03:00
Oleksandr Bezdieniezhnykh 4e0717e543 [AZ-599] Batch 79: FT-P-02 Derkachi builder + _common.py extraction
- Add build_p02_fixtures.py: IMU CSV → tlog conversion (RAW_IMU +
  ATTITUDE pairs, centidegrees→radians yaw) and orchestrator that
  runs gps-denied replay against Derkachi MP4 + generated tlog,
  verifying ≥1 record_type="estimate" in the FDR archive.
- Extract run_gps_denied_replay + FDR-parent-dir helpers into
  sitl_replay_builder/_common.py; refactor build_p01_fixtures.py
  to import from _common (b78 tests preserved).
- Add 20 unit tests under e2e/_unit_tests/fixtures/test_sitl_
  replay_builder_p02.py covering AC-1..AC-5; total unit suite
  686/686 passing (regression gate AC-6).
- README updated to document FT-P-01 + FT-P-02 builders.
- Advance autodev state: last_completed_batch=79, current_batch=80;
  prune verbose detail blob.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 13:40:07 +03:00
Oleksandr Bezdieniezhnykh 47ad43f913 [AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder
Phase 1: extend sitl_observer with cursor-based `wait_for_outbound`
returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json`
fixtures. Three outcomes: message, TimeoutError (null entries), or
RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to
use `fc_kind=` kwarg.

Phase 2: FT-P-01 vertical-slice fixture builder under
`e2e/fixtures/sitl_replay_builder/`. Reuses the production
`gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as
1 fps MP4 + synthetic stationary tlog (pymavlink); run replay;
project FDR outbound estimates into the schema. Avoids the
13+ cp of SUT-side frame-ingestion that a live-SITL-capture path
would have required. Live execution remains a manual operator step.

+35 unit tests (664 total, up from 637). K=3 cumulative review for
b76-b78 documents the offline-replay arc convergence.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 12:08:02 +03:00
Oleksandr Bezdieniezhnykh f49d803252 [AZ-597] Batch 77: replay_mode helpers + 13 scenario stub rewires
Add `runner/helpers/replay_mode.py` (NullFrameSink, NullFcInboundEmitter,
default_frame_period_ms, load_replay_json, resolve_replay_subdir,
imu_replay_noop) and rewire all 13 scenarios off their local
`_resolve_*` / `_drive_*` / `_push_*` NotImplementedError stubs.

Closes the offline FDR-replay execution path. `grep raise
NotImplementedError` under `e2e/tests/` now returns zero matches. +17
unit tests (626 total, up from 608). Unit-test behaviour unchanged
(scenarios still skip via b75 sitl_replay_ready gate when
E2E_SITL_REPLAY_DIR is unset).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 09:52:05 +03:00
Oleksandr Bezdieniezhnykh 6554d568f1 [AZ-596] Batch 76: fc_proxy_runtime driver (FDR-replay mode)
Add `runner/helpers/fc_proxy_runtime.py` wrapping the existing
`BlackoutSpoofProxy` (AZ-406) with a scenario-facing `drive_fc_proxy`
entry point. FDR-replay mode only: loads `schedule.json`, optionally
activates the proxy against a caller clock for alignment verification,
and writes a `proxy_drive_report.json` audit record into
`${E2E_SITL_REPLAY_DIR}` for downstream evaluators.

Replaces the local `_drive_fc_proxy` stub in FT-N-04. Adds 3
@property accessors on `BlackoutSpoofProxy` so the wrapper does not
reach into private attributes. +11 unit tests (608 total, up from
596). Live-mode router wiring remains out of scope (future ticket).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 09:08:48 +03:00
Oleksandr Bezdieniezhnykh 43fdef1aac [AZ-595] Batch 75: sitl_observer FDR-replay + scenario probe cleanup
Implement all 11 `sitl_observer` public surfaces as an offline
FDR-replay strategy (reads JSON fixtures under `${E2E_SITL_REPLAY_DIR}`
instead of live pymavlink/yamspy). Replace 12 per-scenario
`_harness_helpers_implemented` probes with one shared session-scoped
`sitl_replay_ready` fixture in `e2e/tests/conftest.py`.

Net: -636 LoC of duplicated scenario gating, +17 LoC shared fixture,
+38 new unit tests (596 total, up from 558). Includes K=3 cumulative
review for batches 73-75 (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 09:00:55 +03:00
Oleksandr Bezdieniezhnykh 1d260f7e41 [AZ-594] Implement core-three harness stubs (fdr_reader, frame_source_replay, imu_replay)
Replaces the NotImplementedError stubs AZ-406 reserved on three runner-
side helpers; these were stranded from any tracker ticket since
AZ-407/408 never came back to fill them. Concrete bodies:

* fdr_reader.iter_records: JSONL parser + wire-envelope validator;
  recursive *.jsonl walk; projects {schema_version, ts, producer_id,
  kind, payload} to runner-side FdrRecord with record_type/monotonic_ms
  renames; yields oldest-first.
* frame_source_replay.replay_video: OpenCV VideoCapture decode + JPEG
  re-encode; auto-detects file vs directory; injectable sleep_fn for
  unit-test pacing.
* imu_replay.ImuReplayer.replay: csv.DictReader parse; degrees->radians
  attitude conversion; tolerates scientific notation; same sleep_fn
  injection pattern.

Adds 34 unit tests (14 + 10 + 10). Full e2e unit suite: 558 passed (+31).
Existing scenario _harness_helpers_implemented probes still return False
because they also depend on sitl_observer / fc_proxy_runtime stubs that
remain pending; scenario probe cleanup is out of AZ-594 scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 08:42:12 +03:00
Oleksandr Bezdieniezhnykh 2d6d44af5d [AZ-424] [AZ-425] [AZ-426] Implement negatives set (FT-N-01/03/04)
Adds three pure-logic evaluators + scenarios + unit tests covering the
project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5,
AC-NEW-8):

* outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift
  bound + 3-frame covariance-monotonic window over the AZ-408 outlier
  injector's medium-density manifest.
* outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive
  missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT
  arrives at 2 s ±500 ms, dead_reckoned label during outage, and no
  FC EKF divergence.
* blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across
  the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection,
  monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz,
  35 s escalation thresholds, and recovery gate.

Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay /
SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the
AC logic today. Full e2e unit-test suite: 527 passed (+67).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 08:26:16 +03:00
Oleksandr Bezdieniezhnykh a644debdb7 [AZ-416] [AZ-417] [AZ-419] Test batch 72: FT-P-09 AP/iNav + FT-P-11 cold start
- AZ-416 (FT-P-09-AP): fills mavproxy_tlog_reader.iter_messages with
  pymavlink body (AZ-406 surface kept); adds ap_contract_evaluator
  covering AC-1 (signing handshake <=5s), AC-2 (GPS_INPUT >=4.5 Hz),
  AC-3 (EK3_SRC1_POSXY=3), AC-4 (GPS_RAW_INT health >=80%); scenario
  forces fc_adapter=ardupilot.
- AZ-417 (FT-P-09-iNav): msp_frame_observer covering AC-2 (MSP rate)
  and AC-3 (fix_type/provider/numSat); scenario forces
  fc_adapter=inav.
- AZ-419 (FT-P-11): cold_start_evaluator covering AC-1 (operator
  manifest origin), AC-2 (FC EKF fallback), AC-3 (no-origin abort),
  AC-4 (bounded-delta conflict, ADR-010 Principle #11 amended);
  scenario parametrized on origin_source plus dedicated no-origin
  abort scenario.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader / sitl_observer extensions.
- +67 unit tests; full e2e unit suite: 460 passed.
- K=3 cumulative review fired: PASS for batches 70-72.

See _docs/03_implementation/batch_72_report.md,
_docs/03_implementation/reviews/batch_72_review.md,
_docs/03_implementation/cumulative_review_batches_70-72_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 07:49:17 +03:00
Oleksandr Bezdieniezhnykh c6e6cba237 [AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing
- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering
  AC-1 (gyro_z run detection + synthetic-overlay fallback),
  AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance),
  AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario
  files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw +
  smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3
  (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407
  leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and
_docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 07:12:24 +03:00
Oleksandr Bezdieniezhnykh 29ac16cfcb [AZ-409] [AZ-412] [AZ-413] Batch 70: FT-P-01/04/05/06 scenarios
AZ-409 (3pt) — FT-P-01 still-image frame-center accuracy:
- accuracy_evaluator.py: GT loader + Vincenty error + AC-2/AC-3 pass-counts
- test_ft_p_01_still_image_accuracy.py: scenario gated on frame_source_replay
  + sitl_observer NotImplementedError; AC-4 timeout discipline

AZ-412 (3pt) — FT-P-04 Derkachi f2f registration >=95% on normal segments:
- registration_classifier.py: accel-derived attitude + overlap heuristic
  + success ratio with AC-3 sharp-turn exclusion
- test_ft_p_04_derkachi_f2f_registration.py: scenario gated on
  frame_source_replay + imu_replay + fdr_reader

AZ-413 (3pt) — FT-P-05 + FT-P-06 cross-domain MRE budgets:
- mre_evaluator.py: per-image budget (strict <2.5px) + 95th-percentile
  via numpy linear interp + combined report
- test_ft_p_05_sat_anchor.py: cross-domain scenario, reuses
  accuracy_evaluator for geodesic join
- test_ft_p_06_mre_budgets.py: pure piggyback on FT-P-04 + FT-P-05 CSV
  evidence; skips when either upstream CSV missing

Tests: 325 unit tests pass (+77 vs batch 69).
Reports: batch_70_report.md, batch_70_review.md (PASS).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 18:10:46 +03:00
Oleksandr Bezdieniezhnykh 702a0c0ff3 [AZ-408] [AZ-410] [AZ-411] Batch 69: synth injectors + FT-P-02/03/14
AZ-408 (3pt) — Replace AZ-406 injector scaffolds with concrete generators:
- outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset
- blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment;
  AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas
- multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage
- fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard
- _common.py: derive_rng + tile-manifest reader + tmpfs helpers
- injector_fixtures.py: pytest fixtures wired via runner conftest

AZ-410 (3pt) — FT-P-02 cumulative drift between satellite anchors:
- anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction,
  AC-4 monotonicity check, CSV evidence
- test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper
  NotImplementedError (frame_source_replay / fdr_reader / imu_replay)

AZ-411 (2pt) — FT-P-03 + FT-P-14 schema + WGS84:
- estimate_schema.py: AC-1 schema completeness, AC-2 source-label set
  containment, AC-3 WGS84 range + int32 1e-7 decode
- test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario

Tests: 248 unit tests pass (+91 vs batch 68).
Reports: batch_69_report.md, batch_69_review.md (PASS),
cumulative_review_batches_67-69_cycle1_report.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:54:00 +03:00
Oleksandr Bezdieniezhnykh 6599d828d2 [AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter
Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.

AZ-407 — Static fixture builders (3pt):
  * tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
    deterministic tile-cache-fixture Docker volume from
    _docs/00_problem/input_data/. Reproducibility primitives: sorted
    iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
    threaded with seeded stub descriptors.
  * age-injector/{age_injector.py, inject.sh} clones the volume and
    shifts capture_date by N×30.44 days; tile JPEG bytes preserved
    bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
  * cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
    Derkachi sector centre, schema v1.
  * secrets/mavlink-test-passkey.txt: 64-hex with required
    `# TEST ONLY` header line per AC-5. Passkey-equality test now
    compares the secret line after stripping the header.
  * security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
    (truncated SOS marker). OpenCV 4.11.x rejects gracefully with
    imdecode → None. AZ-439 will sharpen for ASan instrumentation.
  * Top-level Makefile with `make fixtures` / `make fixtures-*` /
    `make e2e-tier1*` / `make unit-tests` targets.

AZ-444 — Tier-2 Jetson harness wrapper (5pt):
  * run-tier2.sh rewritten as orchestrator. Detects local
    (aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST).
    New flags: -k/--selector, --build-kind production|asan,
    --reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
    --dry-run.
  * tier2-on-jetson.sh (new) — on-device delegate. Verifies
    gps-denied-onboard{,-asan}.service health; restarts with 5s
    tolerance; spawns tegrastats + jtop parallel samplers; tails
    ASan unit's journal in asan mode; drives docker compose with
    TIER=tier2-jetson; forwards SELECTOR to pytest -k.
  * docker/run-tier1.sh (new) — selector-parity sibling.
  * AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
    --dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
    loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
    unit-test layer).

AZ-445 — CSV reporter + evidence bundler refinements (2pt):
  * reporting/nfr_recorder.py (new) — pytest plugin. Provides the
    `nfr_recorder` fixture with record_metric(name, value, ac_id)
    and partial(ac_id, reason). At session end emits:
      - per-nfr/<scenario_id>.json (AC-1)
      - traceability-status.json with every AC ID parsed from
        traceability-matrix.md, classified Covered/PARTIAL/NOT
        COVERED with source scenario IDs (AC-2)
      - regression-baseline.json with all numeric metrics (AC-3)
  * csv_reporter.py extended — `_outcome_to_result` consults the
    aggregator; rows flip PASS → PARTIAL when an AC was marked
    PARTIAL by nfr_recorder (AC-4). Graceful fallback when
    aggregator isn't registered (unit-test contexts).
  * conftest.py registers nfr_recorder in pytest_plugins.
  * New --traceability-matrix CLI flag seeds the NOT COVERED rows.

Build / config:
  * pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
    tile-cache-builder unit test (broad enough to keep torchvision's
    Pillow 12 pin happy; the production builder runs inside its own
    Docker image with its own pin).
  * Updated test_directory_layout.py to cover 10 new files + replaced
    the byte-equal passkey assertion with the header-stripping
    variant.

Test results:
  * 157 focused tests pass (was 97 in batch 67; +60 new across this
    batch). No regressions.

Module-layout / spec drift:
  * AZ-407 spec text says `tests/fixtures/...`; module-layout
    blackbox_tests entry (commit d7a17a8) authoritatively places the
    harness under `e2e/`. Implementation followed the layout entry.
  * AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
    at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
    consistency.
  * Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
    AZ-419's own Dependencies field.

Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:18:01 +03:00
Oleksandr Bezdieniezhnykh 59d9116d36 [AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)
Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
  + mock Suite Sat Service + mavproxy listener + e2e-runner onto one
  e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
  NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
  compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
  while the SUT runs natively on the Jetson under systemd.

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
  jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
  (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
  orjson, pydantic, structlog, pytest 8.x). The runner deliberately
  does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
  chamber_only, vins_mono, deferred_ac) tied to environment.md
  parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
  test with the exact 11-column schema from environment.md §
  Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
  tier, started_at_utc, execution_time_ms, result, error_message,
  evidence_paths). XFAIL surfaced only when a test carries
  @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
  that copies per-test artifacts (.tlog, FDR archives, screenshots,
  tegrastats / jtop CSVs) into the run bundle and records relative
  paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
  mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
  (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
  AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
  today (no downstream task dep) — WGS84 distance / forward-bearing
  / offset via pyproj with NaN rejection.

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
  GET /tiles/audit + /mock/audit (per-run read-back), POST
  /mock/config (force-status, response delay), POST /mock/reset
  (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
  fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
  fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
  _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke run
  inside the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle,
  conftest skip rules, helper modules, parsers, mock app, compose
  YAML structural contract, public-boundary enforcement) without
  Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
  extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
  mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot)         → compose YAML test + directory layout
                                + smoke test (Docker-bound)
- AC-2 (mock services)       → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
                                deferred to AZ-416 / AZ-417
- AC-4 (CSV columns)         → in-process plugin lifecycle test
                                emits the exact 11-column schema
- AC-5 (egress isolation)    → static config test + runtime probe
                                in Docker-bound smoke
- AC-6 (Tier-2 contract)     → tegrastats + jtop parser unit tests
                                + jetson/* layout test; full Tier-2
                                contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix)  → vins_mono skip-rule cases +
                                tests/positive/test_smoke
- AC-9 (skip semantics)      → 9 conftest skip-rule unit tests

Module layout entry for blackbox_tests was added in 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 16:22:44 +03:00