[AZ-416] [AZ-417] [AZ-419] Test batch 72: FT-P-09 AP/iNav + FT-P-11 cold start

- AZ-416 (FT-P-09-AP): fills mavproxy_tlog_reader.iter_messages with
  pymavlink body (AZ-406 surface kept); adds ap_contract_evaluator
  covering AC-1 (signing handshake <=5s), AC-2 (GPS_INPUT >=4.5 Hz),
  AC-3 (EK3_SRC1_POSXY=3), AC-4 (GPS_RAW_INT health >=80%); scenario
  forces fc_adapter=ardupilot.
- AZ-417 (FT-P-09-iNav): msp_frame_observer covering AC-2 (MSP rate)
  and AC-3 (fix_type/provider/numSat); scenario forces
  fc_adapter=inav.
- AZ-419 (FT-P-11): cold_start_evaluator covering AC-1 (operator
  manifest origin), AC-2 (FC EKF fallback), AC-3 (no-origin abort),
  AC-4 (bounded-delta conflict, ADR-010 Principle #11 amended);
  scenario parametrized on origin_source plus dedicated no-origin
  abort scenario.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader / sitl_observer extensions.
- +67 unit tests; full e2e unit suite: 460 passed.
- K=3 cumulative review fired: PASS for batches 70-72.

See _docs/03_implementation/batch_72_report.md,
_docs/03_implementation/reviews/batch_72_review.md,
_docs/03_implementation/cumulative_review_batches_70-72_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 07:49:17 +03:00
parent c6e6cba237
commit a644debdb7
19 changed files with 3041 additions and 9 deletions
@@ -0,0 +1,142 @@
# Batch 72 Report — Test Implementation (cycle 1, batch 6 of test phase)
**Batch**: 72
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-416 (5pt), AZ-417 (3pt), AZ-419 (3pt) — 11 cp / 3 tasks
**Cycle**: 1
**Verdict**: COMPLETE — PASS (self-reviewed + K=3 cumulative reviewed; see
`reviews/batch_72_review.md` and `cumulative_review_batches_70-72_cycle1_report.md`)
## Summary
FC contract conformance + cold-start init — the three remaining
scenarios that consume mavproxy / signing / cold-boot fixtures already
built in batches 67-68. Same pattern as prior batches:
* Pure-logic helper under `e2e/runner/helpers/` (everything the
scenario can express without docker-bound SITL access).
* Scenario file(s) under `e2e/tests/positive/`, parameterized across
conftest fixtures, skip-gated on upstream replay / SITL observer
/ FDR helpers (auto-activates when AZ-441 + AZ-407 leftovers land).
* Helper-driven unit test file under `e2e/_unit_tests/helpers/`.
### AZ-416 — FT-P-09-AP ArduPilot signing + GPS_INPUT contract (5pt)
* **`runner/helpers/mavproxy_tlog_reader.py`** — AZ-416 fills in the
pymavlink-backed `iter_messages` body that AZ-406 reserved. Uses
`mavutil.mavlink_connection(str(tlog_path))` with `recv_match` to
iterate frames; exposes `TlogMessage(timestamp_us, msg_type, signed,
fields)`. The `signed` flag uses `msg.get_signed()` with a
defensive `AttributeError` fallback. The function is FAIL-FAST on
missing files (raises FileNotFoundError); pymavlink's BAD_DATA
frames are skipped silently per the standard idiom.
* **`runner/helpers/ap_contract_evaluator.py`** — four analysers plus one materialisation helper:
- `observe_signing_handshake` (AC-1): first signed frame within
`HANDSHAKE_BUDGET_S = 5.0` s AND no `BAD_SIGNATURE` STATUSTEXT
within that window.
- `compute_gps_input_rate` (AC-2): GPS_INPUT cadence ≥4.5 Hz
(constant `GPS_INPUT_MIN_RATE_HZ`).
- `validate_ek3_src1_posxy` (AC-3): the AP EKF source-set parameter
must equal `EK3_SRC1_POSXY_REQUIRED = 3` (GPS).
- `evaluate_gps_raw_int_health` (AC-4): GPS_RAW_INT
`fix_type ≥ 3 AND eph ≤ 200` for ≥80 % of the window.
- `collect_messages_to_list` — explicit single-pass-iterator
materialisation so multiple analysers can share the tlog.
* **`tests/positive/test_ft_p_09_ap_signing.py`** — scenario forces
`fc_adapter=ardupilot` (skips other adapters), parameterised per
`vio_strategy`. Records `signing_handshake_s`,
`gps_input_rate_hz`, `ek3_src1_posxy`, `gps_raw_int_healthy_fraction`
NFR metrics with AC IDs.
* **22 unit tests** in `test_ap_contract_evaluator.py` + **6** in
`test_mavproxy_tlog_reader.py`.
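The AC-2 and AC-4 checks above reduce to simple arithmetic over the parsed frames. The following is a minimal sketch of that arithmetic, not the helper's actual code: the `TlogMessage` field names and the thresholds come from this report, while the function names `gps_input_rate_hz` and `gps_raw_int_healthy_fraction` and the constants `MAX_EPH` and `MIN_HEALTHY_FRACTION` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable

GPS_INPUT_MIN_RATE_HZ = 4.5   # AC-2 threshold named in the report
MIN_FIX_TYPE = 3              # AC-4: fix_type >= 3
MAX_EPH = 200                 # AC-4: eph <= 200 (hypothetical constant name)
MIN_HEALTHY_FRACTION = 0.80   # AC-4: healthy share (hypothetical constant name)


@dataclass(frozen=True)
class TlogMessage:
    """Field names follow the report; the real dataclass may differ."""
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, Any] = field(default_factory=dict)


def gps_input_rate_hz(messages: Iterable[TlogMessage]) -> float:
    """Mean GPS_INPUT cadence: number of intervals over elapsed seconds."""
    stamps = sorted(m.timestamp_us for m in messages if m.msg_type == "GPS_INPUT")
    if len(stamps) < 2 or stamps[-1] == stamps[0]:
        return 0.0
    return (len(stamps) - 1) / ((stamps[-1] - stamps[0]) / 1e6)


def gps_raw_int_healthy_fraction(messages: Iterable[TlogMessage]) -> float:
    """Share of GPS_RAW_INT samples with a 3D fix and eph within the ceiling."""
    samples = [m for m in messages if m.msg_type == "GPS_RAW_INT"]
    if not samples:
        return 0.0
    healthy = sum(
        1
        for m in samples
        if m.fields.get("fix_type", 0) >= MIN_FIX_TYPE
        and m.fields.get("eph", 65535) <= MAX_EPH
    )
    return healthy / len(samples)
```

A scenario would then compare the results against `GPS_INPUT_MIN_RATE_HZ` and `MIN_HEALTHY_FRACTION`. Counting samples rather than time-weighting the healthy fraction is an assumption of this sketch.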
### AZ-417 — FT-P-09-iNav MSP2_SENSOR_GPS contract (3pt)
* **`runner/helpers/msp_frame_observer.py`** — pure logic for AC-2
(`compute_rate_hz` with `MSP2_SENSOR_GPS_FUNCTION_ID = 0x1F03` +
`MIN_OBSERVED_RATE_HZ = 4.5`) and AC-3 (`evaluate_inav_gps_state`
with `MIN_FIX_TYPE = 3` and `REQUIRED_PROVIDER = "MSP"`).
* **`tests/positive/test_ft_p_09_inav.py`** — scenario forces
`fc_adapter=inav` (skips other adapters), parameterised per
`vio_strategy`. Probes TCP handshake via
`sitl_observer.observe_inav_tcp_handshake` (gated), captures MSP
frames via `collect_inav_msp_frames` (gated), queries iNav GPS
state via `query_inav_gps_state` (gated).
* **14 unit tests** in `test_msp_frame_observer.py`.
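The AC-2 rate check can be sketched as follows. The constant names and `compute_rate_hz` are taken from this report; the `MspFrameSample` and `RateReport` shapes and the function signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

MSP2_SENSOR_GPS_FUNCTION_ID = 0x1F03
MIN_OBSERVED_RATE_HZ = 4.5


@dataclass(frozen=True)
class MspFrameSample:
    """Hypothetical capture record: arrival time plus MSP function id."""
    timestamp_s: float
    function_id: int


@dataclass(frozen=True)
class RateReport:
    """Hypothetical report shape with a single boolean `passes` property."""
    frame_count: int
    rate_hz: float

    @property
    def passes(self) -> bool:
        return self.rate_hz >= MIN_OBSERVED_RATE_HZ


def compute_rate_hz(
    samples: Iterable[MspFrameSample],
    function_id: int = MSP2_SENSOR_GPS_FUNCTION_ID,
) -> RateReport:
    """Keep only frames with the requested function id, then average the cadence."""
    stamps = sorted(s.timestamp_s for s in samples if s.function_id == function_id)
    if len(stamps) < 2 or stamps[-1] <= stamps[0]:
        return RateReport(len(stamps), 0.0)
    return RateReport(len(stamps), (len(stamps) - 1) / (stamps[-1] - stamps[0]))
```

Filtering by function id before computing the rate keeps unrelated MSP traffic from inflating the observed cadence.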
### AZ-419 — FT-P-11 cold-start init (3pt)
* **`runner/helpers/cold_start_evaluator.py`** — covers ADR-010's
primary + secondary + bounded-delta paths plus AC-3 no-origin
abort:
- `write_manifest` / `read_manifest` — test-fixture builder for the
C10 Manifest's `flight.takeoff_origin` (the test fabricates one
instead of fetching from C12 because the SUT consumes a Manifest
file path, not a service URL).
- `read_cold_boot_fixture` — parse the AZ-408 fixture JSON into a
typed `ColdBootSnapshot` (converts `lat_e7 / lon_e7 / alt_mm` to
decimal degrees + meters).
- `evaluate_first_estimate` (AC-1/2/4): distance vs expected origin
+ source_label rule for bounded-delta + FDR record audit.
- `evaluate_no_origin_path` (AC-3): SUT must produce NO outbound
estimate AND FDR must record `c5.cold_start_origin.unavailable`.
- Constants for accuracy budget (50 m), bounded-delta trigger
(200 m), forbidden first-label (`satellite_anchored`), and the
three FDR record types.
* **`tests/positive/test_ft_p_11_cold_start_init.py`** — two scenario
functions:
- `test_ft_p_11_cold_start_origin_variants` — parametrized on
`origin_source ∈ {operator_manifest, fc_ekf,
bounded_delta_conflict}`; one fixture / one assertion path per
variant.
- `test_ft_p_11_cold_start_no_origin_aborts` — AC-3 dedicated
scenario.
Both rely on `sitl_observer.prepare_sitl_cold_boot` +
`prepare_sitl_no_gps` (gated until AZ-407 leftovers land).
* **19 unit tests** in `test_cold_start_evaluator.py`.
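The fixture conversion and the 50 m accuracy check can be sketched as below. The fixture keys, `ColdBootSnapshot` and `ACCURACY_BUDGET_M` are named in these reports; `snapshot_from_fixture`, `haversine_m` and `within_accuracy_budget` are hypothetical. The sketch uses a spherical haversine distance for brevity, which is not necessarily what the project's `geo.py` uses.

```python
import math
from dataclasses import dataclass

ACCURACY_BUDGET_M = 50.0
EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class ColdBootSnapshot:
    lat_deg: float
    lon_deg: float
    alt_m: float


def snapshot_from_fixture(raw: dict) -> ColdBootSnapshot:
    """Scale integer fixture fields: 1e-7 degrees to degrees, millimetres to metres."""
    return ColdBootSnapshot(
        lat_deg=raw["lat_e7"] / 1e7,
        lon_deg=raw["lon_e7"] / 1e7,
        alt_m=raw["alt_mm"] / 1000.0,
    )


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (spherical model)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_accuracy_budget(
    est_lat_deg: float, est_lon_deg: float, origin: ColdBootSnapshot
) -> bool:
    """True when the first estimate lies within the budget of the expected origin."""
    distance = haversine_m(est_lat_deg, est_lon_deg, origin.lat_deg, origin.lon_deg)
    return distance <= ACCURACY_BUDGET_M
```

At these distances the spherical error is far below the 50 m budget, so the choice of distance formula should not change a pass/fail outcome except very close to the boundary.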
## Tests
* **Full e2e unit suite**: 460 passed in 134.35 s (was 393 at end of
batch 71 → +67 net new tests this batch).
* **Pre-existing**: the macOS-only `/e2e-results` plugin issue still
affects scenario invocation outside Docker; the unit suite is unaffected.
## Files Touched
**New helpers:**
* `e2e/runner/helpers/msp_frame_observer.py`
* `e2e/runner/helpers/ap_contract_evaluator.py`
* `e2e/runner/helpers/cold_start_evaluator.py`
**Modified helper:**
* `e2e/runner/helpers/mavproxy_tlog_reader.py` — AZ-416 fills the
pymavlink-backed `iter_messages` body that AZ-406 reserved
(NotImplementedError → real iterator). Surface unchanged.
**New unit tests:**
* `e2e/_unit_tests/helpers/test_mavproxy_tlog_reader.py` (6 tests)
* `e2e/_unit_tests/helpers/test_ap_contract_evaluator.py` (22 tests)
* `e2e/_unit_tests/helpers/test_msp_frame_observer.py` (14 tests)
* `e2e/_unit_tests/helpers/test_cold_start_evaluator.py` (19 tests)
**New scenarios:**
* `e2e/tests/positive/test_ft_p_09_ap_signing.py`
* `e2e/tests/positive/test_ft_p_09_inav.py`
* `e2e/tests/positive/test_ft_p_11_cold_start_init.py`
**Updated:**
* `e2e/_unit_tests/test_directory_layout.py` — added 6 new paths.
**Archived:**
* `_docs/02_tasks/todo/AZ-416_*.md` → `done/`
* `_docs/02_tasks/todo/AZ-417_*.md` → `done/`
* `_docs/02_tasks/todo/AZ-419_*.md` → `done/`
## Cumulative Review Trigger
K=3 FIRED at end of batch 72 (last cumulative covered batches 67-69;
since then 70 + 71 + 72 = 3 batches). Report written:
`_docs/03_implementation/cumulative_review_batches_70-72_cycle1_report.md`.
Verdict: PASS. Next cumulative trigger: end of batch 75.
@@ -0,0 +1,194 @@
# Cumulative Code Review Report — Batches 70-72 (cycle 1, test phase)
**Date**: 2026-05-16
**Mode**: cumulative
**Scope**: union of files changed in batches 70, 71, 72 of cycle 1
(the test-implementation phase batches that followed the
`batches_67-69` cumulative review).
**Verdict**: PASS
## Batch coverage
| Batch | Tasks | Theme |
|-------|-------|-------|
| 70 | AZ-409, AZ-412, AZ-413 | Still-image accuracy (FT-P-01), Derkachi frame-to-frame registration (FT-P-04), satellite anchor + MRE budgets (FT-P-05 + FT-P-06) |
| 71 | AZ-414, AZ-415, AZ-418 | Sharp-turn recovery + failure twin (FT-P-07 + FT-N-02), multi-segment relocalisation (FT-P-08), GTSAM smoothing-loop look-back (FT-P-10) |
| 72 | AZ-416, AZ-417, AZ-419 | ArduPilot GPS_INPUT contract + signing handshake (FT-P-09-AP), iNav MSP2_SENSOR_GPS contract (FT-P-09-iNav), cold-start initialization (FT-P-11 — 3 origin_source variants + no-origin abort) |
Cycle 1 product implementation under `src/gps_denied_onboard/**` is
out of scope; drift between product and test phases is checked by
`test_no_sut_imports.py` (passing).
## Phase 1 — Context Loading
* Read `_docs/02_document/module-layout.md` § `blackbox_tests`.
* Read `_docs/02_document/architecture.md` § layering.
* Reviewed batch reports `batch_70_report.md`, `batch_71_report.md`,
`batch_72_report.md` (in-progress draft).
* Reviewed task specs AZ-409, AZ-410 (prior), AZ-411 (prior), AZ-412,
AZ-413, AZ-414, AZ-415, AZ-416, AZ-417, AZ-418, AZ-419.
* Cross-referenced the prior `cumulative_review_batches_67-69`
conclusions to verify the K=3 cumulative cadence is honoured.
## Phase 2 — Spec Compliance
Per-task AC coverage at the end of batch 72:
| Task | Status |
|------|--------|
| AZ-409 (FT-P-01) | Helper + scenario + 20 unit tests; AC-1..AC-7 covered |
| AZ-412 (FT-P-04) | Helper + scenario + 26 unit tests; AC-1..AC-5 covered |
| AZ-413 (FT-P-05 + FT-P-06) | Helper + 2 scenarios + 22 unit tests; AC-1..AC-4 covered (FT-P-06 piggybacks on FT-P-04 + FT-P-05 evidence CSVs) |
| AZ-414 (FT-P-07 + FT-N-02) | Helper + 2 scenarios + 30 unit tests; AC-1..AC-7 (FT-P-07) AND AC-1..AC-7 (FT-N-02) covered via the shared `sharp_turn_detector` helper |
| AZ-415 (FT-P-08) | Helper + scenario + 16 unit tests; AC-1..AC-4 covered |
| AZ-416 (FT-P-09-AP) | Helper + scenario + 22 unit tests (ap_contract_evaluator) + 6 unit tests (mavproxy_tlog_reader); AC-1..AC-5 + D-C8-9 covered |
| AZ-417 (FT-P-09-iNav) | Helper + scenario + 14 unit tests; AC-1..AC-4 covered |
| AZ-418 (FT-P-10) | Helper + scenario + 15 unit tests; AC-1..AC-3 covered |
| AZ-419 (FT-P-11) | Helper + 2 scenarios + 19 unit tests; AC-1..AC-5 covered (3 origin_source parametrize variants + 1 no-origin abort scenario) |
All scenarios are skip-gated on the AZ-441 / AZ-407 leftovers
(`frame_source_replay`, `imu_replay`, `fdr_reader`, `sitl_observer`
ext methods); pure-logic acceptance is fully covered in the
`e2e/_unit_tests/helpers/` test files.
## Phase 3 — Code Quality
* **Single responsibility**: each helper owns ONE analytic concern:
- `accuracy_evaluator` — still-image Vincenty + pass-count rules
- `registration_classifier` — IMU-derived attitude + normal-segment
classification + success ratio
- `mre_evaluator` — per-image cross-domain + 95th-percentile MRE
- `anchor_pair_detector` — drift binning + monotonicity
- `estimate_schema` — schema validation + WGS84 range + int32
decoding
- `sharp_turn_detector` — gyro_z run detection + during-turn label/cov
+ recovery lag/drift/heading
- `multi_segment_evaluator` — multi-window relocalisation
- `smoothing_evaluator` — raw + smoothed pose pair + improvement rate
- `mavproxy_tlog_reader` — pymavlink tlog frame iteration
- `ap_contract_evaluator` — signing handshake + GPS_INPUT rate +
EK3 source-set + GPS_RAW_INT health
- `msp_frame_observer` — MSP rate + iNav GPS state evaluation
- `cold_start_evaluator` — Manifest build/read + cold-boot snapshot
parse + first-estimate / no-origin / bounded-delta evaluation
* **No suppressed errors**: the only narrow `try`/`except` is in
`mavproxy_tlog_reader.iter_messages` for pymavlink's `BAD_DATA` +
per-message `to_dict` exceptions — documented in the docstring as
the standard pymavlink iteration idiom.
* **AAA discipline**: all 460 unit tests use `# Arrange / # Act /
# Assert`.
* **No narration comments** in any new module; docstrings carry
intent + AC mapping + Mode B Facts where relevant (Fact #107 in
`smoothing_evaluator`, Fact #109 noted in scenario docstrings of
AZ-416 + AZ-417, ADR-010 Principle #11 in `cold_start_evaluator`).
## Phase 4 — Security
* **`test_no_sut_imports.py` passes** — no e2e helper or test file
imports `src/gps_denied_onboard`.
* **Signing channel observability**: AZ-416 helper observes signed
frames + BAD_SIGNATURE STATUSTEXT events without ever validating
the signature itself (that's pymavlink + AP-side wiring). The
scenario "Forbidden" list (no bypass to unsigned channel) is
honoured — `passes` returns False if any `BAD_SIGNATURE` STATUSTEXT
appears in the handshake window OR no signed frame arrives.
* **Test passkey hygiene**: `test_passkey_files_match` (pre-existing)
still passes; AZ-416 scenario consumes the docker-secret fixture
only.
* **No credentials in source**: confirmed by grep across all batch
72 added modules.
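The handshake rule described above (a signed frame inside the budget and no `BAD_SIGNATURE` STATUSTEXT in the same window) can be sketched as follows. `HANDSHAKE_BUDGET_S`, `observe_signing_handshake` and the `TlogMessage` fields are named in this report; the `HandshakeReport` shape, the `start_us` parameter and the substring match on the STATUSTEXT `text` field are assumptions. Messages are assumed to arrive in chronological order.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Sequence

HANDSHAKE_BUDGET_S = 5.0


@dataclass(frozen=True)
class TlogMessage:
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HandshakeReport:
    """Hypothetical report shape; only observes, never validates signatures."""
    first_signed_s: Optional[float]
    bad_signature_seen: bool

    @property
    def passes(self) -> bool:
        return (
            self.first_signed_s is not None
            and self.first_signed_s <= HANDSHAKE_BUDGET_S
            and not self.bad_signature_seen
        )


def observe_signing_handshake(
    messages: Sequence[TlogMessage], start_us: int
) -> HandshakeReport:
    budget_end_us = start_us + int(HANDSHAKE_BUDGET_S * 1e6)
    first_signed_s: Optional[float] = None
    bad_signature_seen = False
    for msg in messages:
        if msg.timestamp_us < start_us:
            continue
        if msg.signed and first_signed_s is None:
            first_signed_s = (msg.timestamp_us - start_us) / 1e6
        in_window = msg.timestamp_us <= budget_end_us
        text = str(msg.fields.get("text", "")).upper()
        if msg.msg_type == "STATUSTEXT" and in_window and "BAD_SIGNATURE" in text:
            bad_signature_seen = True
    return HandshakeReport(first_signed_s, bad_signature_seen)
```

Because `passes` is False both when no signed frame arrives and when one arrives late, a scenario built on this shape has no path that silently falls back to an unsigned channel.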
## Phase 5 — Performance
* Across all 12 helpers added in batches 70-72, every analyser is O(N)
over its input.
* `mavproxy_tlog_reader` materialises to a list ONCE per scenario via
`ap_contract_evaluator.collect_messages_to_list` so multiple
analysers can share the result — the alternative (re-iterating the
generator) would re-open the pymavlink connection per analyser.
* No nested CSV reads or repeated geodesic recomputations in any
helper across the three batches.
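The reason for materialising once can be seen with any Python generator. The sketch below uses a stand-in generator of message type names instead of a real tlog; `fake_tlog_frames` is hypothetical, and `collect_messages_to_list` is shown with an assumed signature.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def collect_messages_to_list(messages: Iterable[T]) -> list[T]:
    """Drain a single-pass iterator once so several analysers can share it."""
    return list(messages)


def fake_tlog_frames() -> Iterator[str]:
    """Stand-in for a tlog iterator; yields message type names only."""
    yield from ("HEARTBEAT", "GPS_INPUT", "GPS_INPUT", "GPS_RAW_INT")


# A generator can be consumed only once: the second pass sees nothing.
stream = fake_tlog_frames()
first_pass = sum(1 for name in stream if name == "GPS_INPUT")
second_pass = sum(1 for name in stream if name == "GPS_INPUT")

# A materialised list supports any number of independent passes.
shared = collect_messages_to_list(fake_tlog_frames())
counts = [sum(1 for name in shared if name == wanted)
          for wanted in ("GPS_INPUT", "GPS_RAW_INT")]
```

The trade-off is memory: the whole capture is held at once, which is acceptable for short scenario windows.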
## Phase 6 — Cross-Task Consistency
Verified across all 9 tasks in the 70-72 window:
* **Skip gate pattern**: every scenario uses an
`_*_harness_implemented` fixture that probes one or more
`NotImplementedError`-raising helpers and skips with a single,
spec-referenced message naming the upstream owner (AZ-441 / AZ-407)
and the pure-logic unit-test file that DOES cover the AC.
* **Constants discipline**: every scenario assertion message
references the helper's exported constant by name (e.g.
`ace.HANDSHAKE_BUDGET_S`, `cse.BOUNDED_DELTA_TRIGGER_M`,
`std.MAX_RECOVERY_FRAMES_SAFETY_MS`), not magic numbers.
* **Evidence emission**: every scenario emits per-scenario NFR metrics
via `nfr_recorder.record_metric(name, value, ac_id=…)`. Per-test CSV
artifacts use `write_csv_evidence(out, …)` returning the path —
same idiom in `accuracy_evaluator`, `mre_evaluator`,
`multi_segment_evaluator`, `smoothing_evaluator`,
`sharp_turn_detector`.
* **Trace markers**: every scenario uses `@pytest.mark.traces_to(...)`
with comma-separated AC IDs, matching the
`monorepo-document`-owned traceability format used by batches 67-69.
* **Helper return shape**: every analyser returns a frozen
`@dataclass` with a `passes` (or `passes_distance`, `passes_rate`,
etc.) property — so the scenario assertion is one boolean check
with a structured-data message.
* **No drift in shared types**: `TlogMessage` (AZ-406 surface, AZ-416
body) used identically across `mavproxy_tlog_reader.count_by_type`
and `ap_contract_evaluator.*` analysers.
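The skip-gate pattern can be sketched as a small probe loop. This is an illustration under assumptions, not the project's fixture: `first_unimplemented` and `skip_reason` are hypothetical names, and the probes are plain zero-argument callables.

```python
from typing import Callable, Mapping, Optional


def first_unimplemented(probes: Mapping[str, Callable[[], object]]) -> Optional[str]:
    """Return the name of the first probe raising NotImplementedError, else None."""
    for name, probe in probes.items():
        try:
            probe()
        except NotImplementedError:
            return name
    return None


def skip_reason(missing: str, owner: str, unit_test_file: str) -> str:
    """Build one skip message naming the upstream owner and the covering unit tests."""
    return (
        f"{missing} is not implemented yet (owner: {owner}); "
        f"pure-logic coverage lives in {unit_test_file}"
    )
```

Inside a pytest fixture, a non-`None` result would typically be passed to `pytest.skip(skip_reason(...))`, so the scenario starts running on its own once the upstream helper stops raising.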
## Phase 7 — Architecture Compliance
* **Module-layout invariant**: every new helper is under
`e2e/runner/helpers/`; every new scenario under
`e2e/tests/{positive,negative}/`; every new unit test under
`e2e/_unit_tests/helpers/`. `test_directory_layout.py` parametrize
list updated to enforce the invariant — 75 path entries pass.
* **Public-boundary**: every scenario uses only the FDR `record_type`
+ `payload` dict schema, outbound estimate stream, and SITL
observer surface; no SUT internals consumed.
* **Backwards compat with AZ-406 surface**: `mavproxy_tlog_reader`
filled in its body without changing the `TlogMessage` dataclass
shape or the `iter_messages` / `count_by_type` signatures, so
downstream consumers (FT-P-03/14 schema scenario, others) keep
working.
## Phase 8 — Test Suite Health Trend
| Batch end | Total tests | Delta |
|-----------|-------------|-------|
| 69 | 257 | (baseline) |
| 70 | 325 | +68 |
| 71 | 393 | +68 |
| 72 | 460 | +67 |
Net: +203 unit tests across batches 70-72 / 12 new helper modules + 9
new scenario files + 1 modified scenario file (FT-P-09-AP wired up
through the previously stub-only `mavproxy_tlog_reader`).
The pre-existing macOS-only `/e2e-results` plugin issue (scenario
invocation outside Docker) is unchanged by batches 70-72 and does not
affect the unit suite.
## Cross-Batch Consistency Verdict
PASS — no behavioural drift between batches; helper module shape +
scenario skeleton + skip-gate pattern + constants discipline + NFR
metrics format + traces_to marker format all identical across the 9
tasks.
## Architecture Compliance Verdict
PASS — public-boundary blackbox stance preserved across all 12 new
helpers; pymavlink boundary correctly placed at the tlog reader;
ADR-010 Principle #11 (as amended) explicitly encoded in
`cold_start_evaluator`; Mode B Fact #107 preserved in
`smoothing_evaluator` docstring.
## Final Verdict
**PASS** — Batches 70-72 (AZ-409, AZ-412, AZ-413, AZ-414, AZ-415,
AZ-416, AZ-417, AZ-418, AZ-419 — 9 tasks / 27 cp) ready for the next
K=3 cumulative review at end of batch 75.
@@ -0,0 +1,176 @@
# Code Review Report
**Batch**: 72 — AZ-416, AZ-417, AZ-419
**Date**: 2026-05-16
**Verdict**: PASS
## Findings
(none)
## Findings Sweep
### Phase 1 — Context Loading
Loaded specs `AZ-416_ft_p_09_ap_signing.md`, `AZ-417_ft_p_09_inav.md`,
`AZ-419_ft_p_11_cold_start_init.md`. Re-read existing
`runner/helpers/mavproxy_tlog_reader.py` (AZ-406 surface to be filled
in by AZ-416 per the docstring), `sitl_observer.py`, `fdr_reader.py`,
`geo.py`. Read `fixtures/cold-boot/cold_boot_fixture.json` for FT-P-11
secondary path origin. Verified pymavlink ≥2.4 install + the
`MAVLink_message.get_signed()` API surface in the venv.
### Phase 2 — Spec Compliance
**AZ-416 (FT-P-09-AP)**
| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (signing handshake ≤5 s, no BAD_SIGNATURE) | `test_handshake_passes_when_first_signed_within_window`, `test_handshake_fails_when_no_signed_within_window`, `test_handshake_fails_when_signed_arrives_after_budget`, `test_handshake_fails_on_bad_signature_statustext`, scenario assertion via `observe_signing_handshake` | Covered |
| AC-2 (GPS_INPUT ≥4.5 Hz for 5 Hz target) | `test_gps_input_rate_at_5hz_for_60s_passes`, `test_gps_input_rate_at_boundary_passes`, `test_gps_input_rate_below_minimum_fails`, scenario assertion via `compute_gps_input_rate` | Covered |
| AC-3 (EK3_SRC1_POSXY == 3) | `test_validate_ek3_src1_posxy_passes_at_3`, scenario via `validate_ek3_src1_posxy(sitl_observer.read_ap_parameter(...))` | Covered |
| AC-4 (GPS_RAW_INT healthy fraction ≥80 %) | `test_gps_raw_int_health_all_healthy_passes`, `test_gps_raw_int_health_at_80_pct_boundary_passes`, `test_gps_raw_int_health_below_80_pct_fails`, `test_gps_raw_int_health_eph_threshold_strict`, scenario via `evaluate_gps_raw_int_health` | Covered |
| AC-5 (vio_strategy parameterization; `fc_adapter` fixed to `ardupilot`) | scenario uses `vio_strategy` fixture from conftest; `fc_adapter != "ardupilot"` is skipped — collection across 6 variants reduces to 3 active variants | Covered |
| D-C8-9 (signing-handshake observability) | `traces_to` marker + handshake report includes `setup_signing_seen` | Covered |
Also: AZ-416's `mavproxy_tlog_reader.iter_messages` body landed
(previously raised NotImplementedError per the AZ-406 commit). 6 unit
tests in `test_mavproxy_tlog_reader.py` exercise the parser against
synthetic tlogs.
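A reader of this kind can be sketched as below, under stated assumptions. The pymavlink calls used (`mavutil.mavlink_connection`, `recv_match`, `get_type`, `to_dict`, `get_signed`) exist in pymavlink; reading the timestamp from the private `_timestamp` attribute and the exact `TlogMessage` construction are assumptions of this sketch, not a copy of the committed code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Iterator


@dataclass(frozen=True)
class TlogMessage:
    timestamp_us: int
    msg_type: str
    signed: bool
    fields: dict[str, Any] = field(default_factory=dict)


def iter_messages(tlog_path: Path) -> Iterator[TlogMessage]:
    """Fail fast on a missing file, then return a lazy single-pass iterator."""
    if not tlog_path.is_file():
        raise FileNotFoundError(tlog_path)
    return _read(tlog_path)


def _read(tlog_path: Path) -> Iterator[TlogMessage]:
    # Imported lazily so the missing-file check works without pymavlink installed.
    from pymavlink import mavutil

    conn = mavutil.mavlink_connection(str(tlog_path))
    try:
        while True:
            msg = conn.recv_match(blocking=False)
            if msg is None:  # end of log
                break
            if msg.get_type() == "BAD_DATA":  # undecodable bytes, skipped
                continue
            try:
                signed = bool(msg.get_signed())
            except AttributeError:  # defensive: message object without signing info
                signed = False
            # Assumption: mavutil stores the log timestamp (seconds) on `_timestamp`.
            timestamp_us = int(getattr(msg, "_timestamp", 0.0) * 1e6)
            yield TlogMessage(timestamp_us, msg.get_type(), signed, msg.to_dict())
    finally:
        conn.close()
```

Splitting the existence check from the generator body matters: a bare generator function would defer `FileNotFoundError` until the first `next()` call, which would not be fail-fast.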
**AZ-417 (FT-P-09-iNav)**
| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (TCP connect to inav-sitl:5760 ≤5 s) | scenario via `sitl_observer.observe_inav_tcp_handshake` (skip-gated) | Covered (gated) |
| AC-2 (MSP2_SENSOR_GPS ≥4.5 Hz for 5 Hz target) | `test_compute_rate_at_target_passes`, `test_compute_rate_at_boundary_passes`, `test_compute_rate_below_minimum_fails`, `test_compute_rate_filters_function_id`, scenario via `compute_rate_hz` | Covered |
| AC-3 (fix_type ≥3, provider=MSP, numSat matches emitted) | `test_evaluate_gps_state_passes_at_minimum_fix`, `test_evaluate_gps_state_fails_on_low_fix_type`, `test_evaluate_gps_state_fails_on_wrong_provider`, `test_evaluate_gps_state_fails_on_num_sat_mismatch`, scenario via `evaluate_inav_gps_state` | Covered |
| AC-4 (vio_strategy parameterization; `fc_adapter` fixed to `inav`) | scenario uses `vio_strategy` fixture; skips when `fc_adapter != "inav"` | Covered |
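The AC-3 state check in the table above combines three conditions. A minimal sketch follows, assuming a hypothetical `InavGpsStateReport` shape and function signature; `MIN_FIX_TYPE`, `REQUIRED_PROVIDER` and `evaluate_inav_gps_state` are the names used in the batch report.

```python
from dataclasses import dataclass

MIN_FIX_TYPE = 3
REQUIRED_PROVIDER = "MSP"


@dataclass(frozen=True)
class InavGpsStateReport:
    """Hypothetical report shape: raw observations plus derived verdicts."""
    fix_type: int
    provider: str
    num_sat: int
    expected_num_sat: int

    @property
    def failures(self) -> list[str]:
        """Human-readable reasons, useful in a structured assertion message."""
        reasons = []
        if self.fix_type < MIN_FIX_TYPE:
            reasons.append(f"fix_type {self.fix_type} < {MIN_FIX_TYPE}")
        if self.provider != REQUIRED_PROVIDER:
            reasons.append(f"provider {self.provider!r} != {REQUIRED_PROVIDER!r}")
        if self.num_sat != self.expected_num_sat:
            reasons.append(f"numSat {self.num_sat} != emitted {self.expected_num_sat}")
        return reasons

    @property
    def passes(self) -> bool:
        return not self.failures


def evaluate_inav_gps_state(
    fix_type: int, provider: str, num_sat: int, expected_num_sat: int
) -> InavGpsStateReport:
    return InavGpsStateReport(fix_type, provider, num_sat, expected_num_sat)
```

Keeping the reasons on the report lets the scenario assert on `passes` alone while still printing which of the three conditions failed.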
**AZ-419 (FT-P-11)**
| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (operator_manifest: estimate ≤50 m of A; FDR `cold_start_origin.set(source="manifest")`) | `test_evaluate_operator_manifest_passes_at_origin`, `test_evaluate_operator_manifest_passes_just_inside_budget`, `test_evaluate_operator_manifest_fails_just_outside_budget`, scenario assertion | Covered |
| AC-2 (fc_ekf: estimate ≤50 m of FC EKF snapshot; FDR `source="fc_ekf"`) | `test_evaluate_fc_ekf_passes`, scenario assertion | Covered |
| AC-3 (no origin → SUT refuses takeoff; FDR `cold_start_origin.unavailable`) | `test_evaluate_no_origin_passes_when_silent_and_fdr_records_abort`, `test_evaluate_no_origin_fails_when_sut_emits_anything`, `test_evaluate_no_origin_fails_when_fdr_missing_unavailable_signal`, dedicated scenario `test_ft_p_11_cold_start_no_origin_aborts` | Covered |
| AC-4 (bounded-delta conflict: operator wins; source_label != satellite_anchored; FDR `gps_bounded_delta.reject`) | `test_evaluate_bounded_delta_conflict_operator_wins`, `test_evaluate_bounded_delta_fails_when_label_is_satellite_anchored`, scenario assertion (third parametrize variant) | Covered |
| AC-5 (parameterization across `fc_adapter, vio_strategy, origin_source`) | scenario uses conftest's `fc_adapter` + `vio_strategy`; parametrizes `origin_source ∈ {operator_manifest, fc_ekf, bounded_delta_conflict}` separately | Covered |
ADR-010 Principle #11, as amended ("operator origin wins on bounded-delta
conflict; FC GPS logged as suspect"), is explicitly encoded as
`BOUNDED_DELTA_TRIGGER_M = 200.0` + the `c5.gps_bounded_delta.reject`
record audit.
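The bounded-delta and no-origin rules can be sketched as follows. The trigger value, the forbidden label, the two FDR record types and `evaluate_no_origin_path` come from this review; the constant suffixes after `FDR_RECORD_`, the `BoundedDeltaReport` shape, and the choice of a strict `>` comparison at 200 m are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

BOUNDED_DELTA_TRIGGER_M = 200.0
FORBIDDEN_FIRST_LABEL_BOUNDED_DELTA = "satellite_anchored"
FDR_RECORD_BOUNDED_DELTA_REJECT = "c5.gps_bounded_delta.reject"      # suffix assumed
FDR_RECORD_ORIGIN_UNAVAILABLE = "c5.cold_start_origin.unavailable"   # suffix assumed


def has_record(fdr_records: Sequence[Mapping], record_type: str) -> bool:
    """Single pass over FDR records looking for one record type."""
    return any(r.get("record_type") == record_type for r in fdr_records)


@dataclass(frozen=True)
class BoundedDeltaReport:
    operator_fc_delta_m: float  # distance between operator origin and FC GPS
    first_label: str            # source_label of the first outbound estimate
    reject_recorded: bool       # FDR holds the bounded-delta reject record

    @property
    def conflict_triggered(self) -> bool:
        return self.operator_fc_delta_m > BOUNDED_DELTA_TRIGGER_M

    @property
    def passes(self) -> bool:
        if not self.conflict_triggered:
            return True  # assumption: the rule constrains only the conflict case
        return (
            self.first_label != FORBIDDEN_FIRST_LABEL_BOUNDED_DELTA
            and self.reject_recorded
        )


def evaluate_no_origin_path(
    outbound_estimates: Sequence[object], fdr_records: Sequence[Mapping]
) -> bool:
    """AC-3: nothing emitted AND the FDR records the unavailable-origin signal."""
    return not outbound_estimates and has_record(
        fdr_records, FDR_RECORD_ORIGIN_UNAVAILABLE
    )
```

Requiring both conditions in the conflict case means a correct label with no FDR audit record still fails, as does an audit record paired with the forbidden label.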
### Phase 3 — Code Quality
* **Single responsibility**: `mavproxy_tlog_reader` only iterates/counts
tlog frames (file I/O concern); `ap_contract_evaluator` only consumes
`TlogMessage` iterables (analytics concern); `msp_frame_observer`
only consumes captured MSP samples. `cold_start_evaluator` is one
module because the three FT-P-11 variants share a single FDR record
vocabulary + Manifest schema; splitting them would force the scenario
to import three near-identical modules.
* **No suppressed errors**: `mavproxy_tlog_reader.iter_messages`
catches the narrow `BAD_DATA` + per-message `to_dict` exceptions
(documented in pymavlink) and continues, but the file-not-found
+ connection-close paths raise / surface naturally. No bare `except`
in any new module.
* **AAA comment discipline**: every test uses `# Arrange / # Act /
# Assert`; sections omitted when not needed.
* **No narration comments**: docstrings explain non-obvious intent
(AC mapping, why orphans excluded, why `collect_messages_to_list` exists,
why `EK3_SRC1_POSXY = 3` is the only acceptance value).
### Phase 4 — Security
* **No SUT imports**: confirmed by `test_no_sut_imports.py` (passing in
the full suite). None of the new modules import from
`src.gps_denied_onboard`.
* **Signing handshake stance**: the helper does NOT validate signatures
itself (that's pymavlink's job); it only counts signed-frame arrivals
and `BAD_SIGNATURE` STATUSTEXT incidents. If signing fails in any way
AC-1 fails — the scenario does NOT bypass to an unsigned channel
(per spec "Forbidden" list).
* **No secrets in source**: the AP scenario looks up
`mavlink-test-passkey.txt` from the on-disk fixture (already
verified by `test_passkey_files_match` in `test_directory_layout.py`).
The passkey itself is the AZ-407 / AZ-408 fixture, NOT a production
key.
* **No SQL/shell injection surface**: all helpers operate on bytes /
pathlib / dict; no subprocess calls in the helper layer (subprocess
for `msp_gps_toy` is the SITL-observer's responsibility).
### Phase 5 — Performance
* `mavproxy_tlog_reader.iter_messages` is a single pass over the tlog;
pymavlink's `recv_match(blocking=False)` is the standard idiom.
* `ap_contract_evaluator` consumes the materialised list ONCE per
analyser; `collect_messages_to_list` is the documented choice
(mavlink_connection's iterator closes on exhaustion so re-iteration
isn't safe). For typical 60 s of mavproxy traffic at ~50 msg/s this
is ≤3000 messages → trivial in memory.
* `cold_start_evaluator._scan_fdr_for_cold_start` is one pass.
* No nested loops over the same data.
### Phase 6 — Cross-Task Consistency
* **Pattern parity with batches 69 + 70 + 71**:
- Skip gate (`_*_harness_implemented` fixture) for missing upstream
replay/SITL/FDR helpers — same pattern as
`test_ft_p_02/04/05/07/08/10_*`.
- `_NullSink` probe — same idiom as the prior 5 scenario files.
- Evidence side-channel via `nfr_recorder.record_metric(name, value,
ac_id=…)` — same pattern as `test_ft_p_01/04/05/07/08/10_*`.
- Module-level constants (`UPPER_SNAKE`) for budgets — matches
`multi_segment_evaluator`, `mre_evaluator`, `smoothing_evaluator`,
`sharp_turn_detector`.
- Helper modules importable from `runner.helpers.*`.
* **No drift**: scenarios reuse the helper's constants (no magic
numbers) — `HANDSHAKE_BUDGET_S`, `GPS_INPUT_MIN_RATE_HZ`,
`MIN_FIX_TYPE`, `ACCURACY_BUDGET_M`, `BOUNDED_DELTA_TRIGGER_M`,
`FDR_RECORD_*`, `FORBIDDEN_FIRST_LABEL_BOUNDED_DELTA`.
* **No legacy NotImplementedError test left behind**: verified no test
asserts `iter_messages` raises NotImplementedError (was AZ-406's
surface contract; AZ-416 owns the body per docstring).
### Phase 7 — Architecture Compliance
* **Public-boundary discipline**: confirmed by `test_no_sut_imports.py`
(passing). Helpers consume pymavlink (a third-party MAVLink
reference impl, not SUT internals) + FDR record schema (record_type
+ payload dict) + outbound estimate schema. The signing handshake
observer specifically does NOT import the SUT's signing-key state
per the spec "Forbidden" list.
* **Directory layout**: new paths added to `test_directory_layout.py`
parametrize list (`runner/helpers/{msp_frame_observer,
ap_contract_evaluator, cold_start_evaluator}.py`,
`tests/positive/test_ft_p_{09_ap_signing, 09_inav, 11_cold_start_init}.py`).
All variants pass.
* **Determinism**: all helpers are deterministic — no `time.time()`,
no RNG; pymavlink parses bytes deterministically.
### Phase 8 — Test Suite Health
* Total: **460 passed in 134.35 s** (was 393 at end of batch 71).
* New tests this batch: **+67** (msp_frame_observer: 14;
mavproxy_tlog_reader: 6; ap_contract_evaluator: 22;
cold_start_evaluator: 19; directory_layout new entries: 6).
* Pre-existing macOS-only `/e2e-results` plugin issue still present —
affects scenario test invocation outside Docker only; unit suite
unaffected. Out of batch scope.
## Cross-Task Consistency Verdict
PASS — no cross-task drift, no duplicated logic across the four new
helpers, shared `TlogMessage` type used consistently between
`mavproxy_tlog_reader` and `ap_contract_evaluator`.
## Architecture Compliance Verdict
PASS — public-boundary blackbox stance preserved; no SUT imports;
pymavlink boundary correctly placed at the tlog reader.
## Final Verdict
**PASS** — Batch 72 (AZ-416 + AZ-417 + AZ-419) ready for commit.
@@ -12,8 +12,8 @@ sub_step:
retry_count: 0
cycle: 1
tracker: jira
-last_completed_batch: 71
-last_cumulative_review: batches_67-69
+last_completed_batch: 72
+last_cumulative_review: batches_70-72
last_step_outcomes:
step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"
step_9: "Already complete — 41 blackbox test tasks (AZ-406..AZ-446) under epic AZ-262 with specs in _docs/02_tasks/todo/ were produced in a prior cycle; AZ-406 test-infrastructure bootstrap also pre-existing. Folder fallback satisfied (todo/ has test tasks, _dependencies_table.md reflects 114 product + 41 test = 155 total). No Step-9 work executed in cycle 1."