diff --git a/_docs/02_document/tests/resilience-tests.md b/_docs/02_document/tests/resilience-tests.md
index dc62acf..cfe41cb 100644
--- a/_docs/02_document/tests/resilience-tests.md
+++ b/_docs/02_document/tests/resilience-tests.md
@@ -106,3 +106,40 @@
 - Recovery gate is honored — early recovery attempts (FC GPS healthy for <10 s) MUST NOT promote spoofed GPS back into the estimator.
 
 **Duration**: ~10 min total for three windows.
+
+---
+
+### NFT-RES-05: Composition-root bootstrap contract (replay-mode minimal config, operator-error contract, Tier-2 log boundary)
+
+**Summary**: Validates the airborne composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687. Three sub-cases pin the cross-cycle plumbing that ALL Tier-1 and Tier-2 replay tests depend on:
+(a) replay-mode `build_pre_constructed(config)` succeeds without a `c6_tile_cache` config block (AZ-687 AC-687-1);
+(b) on a misconfigured airborne bootstrap, the SUT exits non-zero (`EXIT_GENERIC_FAILURE`, `1`) and its stderr carries the `airborne_bootstrap:` prefix, the consuming component slug, the missing infrastructure key or `BUILD_*` flag, and one actionable sentence (AZ-618 operator-facing-error-contract NFR + AC-4 non-success branch);
+(c) every successful Tier-2 replay run logs both `replay.compose_root.ready` and `replay.input.frame_emitted` to stdout (AZ-618 AC-5, AZ-687 AC-687-3).
+**Traces to**: AC-NEW-1 (precondition for cold-start TTFF) and AC-4.1 (precondition for the latency budget). The scenario protects the assembly path that every product-AC test relies on.
+
+**Preconditions**:
+- Tier-1 is acceptable for sub-cases (a) and (b); sub-case (c) is Tier-2-only.
+- For (a): a replay CLI invocation with a synthesized `Config` whose `mode == "replay"` and whose `components` section does NOT include a `c6_tile_cache` block. Public surface: `scripts/run_replay.sh` (or an equivalent CLI entrypoint) pointed at `_docs/00_problem/input_data/flight_derkachi/`.
+- For (b): a minimal live-mode `Config` selecting a component strategy whose required `BUILD_*` flag is set to `OFF` in the environment (e.g. `c2_vpr.strategy="net_vlad"` with `BUILD_PYTORCH_FP16_RUNTIME=OFF`). No SUT internals are imported; the test reads only the exit code and stderr.
+- For (c): the standard Tier-2 Jetson e2e replay invocation per `tests/e2e/replay/test_derkachi_1min.py` (AC-1, AC-2, AC-5, AC-6).
+
+**Fault injection** (sub-case b only): set the gating `BUILD_*` env var to `OFF` for a strategy that the config selects, then start the SUT.
+
+**Steps**:
+
+| Sub-case | Step | Consumer Action | Expected Behavior |
+|----------|------|-----------------|-------------------|
+| (a) replay minimal | 1 | Start SUT in replay mode against `derkachi-fixture`, with no `c6_tile_cache` block in the synthesized config | SUT process reaches the replay coordinator without exiting; stdout contains `replay.compose_root.ready` within 60 s |
+| (a) replay minimal | 2 | Observe at least one frame emitted by the replay input | stdout contains at least one `replay.input.frame_emitted` log line within the same run |
+| (b) misconfig | 1 | Start SUT in live mode with a config that selects a strategy whose `BUILD_*` flag is OFF | SUT exits with `EXIT_GENERIC_FAILURE` (`1`) |
+| (b) misconfig | 2 | Read SUT stderr | stderr contains the `airborne_bootstrap:` prefix AND the consuming component slug (e.g. `c2_vpr`) AND either the missing infrastructure key (e.g. `c7_inference`) or the gating `BUILD_*` flag name AND one actionable sentence (regex: `` set `BUILD_[A-Z0-9_]+` `` OR `` ensure `[a-z0-9_]+\.[a-z0-9_]+` is ``) |
+| (c) Tier-2 log boundary | 1 | Run `scripts/run-tests-jetson.sh tests/e2e/replay/test_derkachi_1min.py` (or its compose-bypass equivalent) | Each of AC-1 / AC-2 / AC-5 / AC-6 produces stdout containing both the `replay.compose_root.ready` AND `replay.input.frame_emitted` boundary log lines |
+
+**Pass criteria**:
+- (a) Process does not exit during bootstrap; both boundary log lines appear within 60 s of process start.
+- (b) Exit code is `1`; stderr matches the four-field contract (prefix + component slug + missing key/flag + actionable sentence).
+- (c) Every AC sub-test in the Tier-2 invocation has both log lines in its captured stdout; the batch report's `Tier-2 evidence:` field references the terminal log path per `_docs/02_document/tests/tier2-jetson-testing.md`.
+
+**Duration**: ~3 min (Tier-1 sub-cases a+b combined); sub-case (c) is observed inside the existing Tier-2 e2e replay run and needs no separate duration.
+
+**Rationale**: This scenario formalizes the cross-cycle bootstrap invariant. The Tier-1 contract path bypasses `compose_root`'s registry-driven assembly via `replay_components_factory`, so without this test the failure modes caught by AZ-687 (replay-mode `KeyError`) and AZ-618 (empty `pre_constructed`) could regress silently. Both were missed by the Tier-1 suite, which was 3343/0 green at the moment of the Jetson rerun. See `_docs/02_document/tests/tier2-jetson-testing.md` § Rationale.
diff --git a/_docs/02_document/tests/traceability-matrix.md b/_docs/02_document/tests/traceability-matrix.md
index a37ae34..9b39257 100644
--- a/_docs/02_document/tests/traceability-matrix.md
+++ b/_docs/02_document/tests/traceability-matrix.md
@@ -20,7 +20,7 @@ This matrix is the canonical view of test coverage for the planning context. It
 | AC-3.3 | Handle ≥3 disconnected segments via satellite re-loc | FT-P-08 | Covered |
 | AC-3.4 | On ≥3 frames + ≥2 s outage, request operator re-loc; FC dead-reckons | FT-N-03 | Covered |
 | AC-3.5 | Visual blackout + spoofed GPS failsafe | FT-N-04 | Covered |
-| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01 (Tier-2) | Covered |
+| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01 (Tier-2), NFT-RES-05 (bootstrap precondition) | Covered |
 | AC-4.2 | Memory <8 GB on Jetson | NFT-LIM-01 (Tier-2) | Covered |
 | AC-4.3 | FC output contract: GPS_INPUT (AP) + MSP2_SENSOR_GPS (iNav) with honest covariance | FT-P-03, FT-P-09-AP, FT-P-09-iNav | Covered |
 | AC-4.4 | Estimates streamed frame-by-frame | NFT-PERF-02 | Covered |
@@ -39,7 +39,7 @@ This matrix is the canonical view of test coverage for the planning context. It
 | AC-8.4 | Mid-flight tile generation with quality metadata | FT-P-17 | Covered |
 | AC-8.5 | No raw nav/AI-cam frame retention except thumbnail log | FT-P-18 | Covered |
 | AC-8.6 | Satellite relocalization scale-ratio + scene-change | FT-P-19 (scale FULL; scene-change PARTIAL) | PARTIAL — scene-change subset reduced confidence (only 2/60 stills have paired sat refs; no labeled change-pair dataset). Independent of the AC-NEW-4 / AC-NEW-7 multi-flight gap (those rows were resolved by AC-text relaxation 2026-05-09; AC-8.6 scene-change still requires a labeled change-pair dataset that synthetic perturbations cannot substitute for). Mitigation: deferred to a follow-up cycle when labeled change-pair data becomes available; surfaced in the Step 4 risk register |
-| AC-NEW-1 | Cold-start TTFF <30 s p95 | NFT-PERF-03 (Tier-2) | Covered |
+| AC-NEW-1 | Cold-start TTFF <30 s p95 | NFT-PERF-03 (Tier-2), NFT-RES-05 (bootstrap precondition) | Covered |
 | AC-NEW-2 | Spoofing-promotion latency <3 s p95 | NFT-PERF-04 | Covered |
 | AC-NEW-3 | FDR ≤64 GB / flight, no silent drops | NFT-LIM-02 | Covered |
 | AC-NEW-4 | False-position safety: P(>500 m)<0.1%, P(>1 km)<0.01% | NFT-RES-03 | Covered — AC text relaxed 2026-05-09 to Monte-Carlo-over-current-data with stated 95% CI (Plan Phase 2a.0 outcome). Multi-flight statistical headroom is residual risk in the Step 4 risk register; D-PROJ-3 reopens validation when additional multi-flight data becomes available |
@@ -76,6 +76,8 @@ This matrix is the canonical view of test coverage for the planning context. It
 ## Coverage Summary
 
 > Revised 2026-05-09 (Plan Phase 2a.0 outcomes): three rows moved PARTIAL → Covered (AC-NEW-4, AC-NEW-7, RESTRICT-FAIL-2) following AC-text relaxation per Q3=B. Restriction row count corrected from 19 to 20 (pre-existing arithmetic error).
+>
+> Revised 2026-05-19 (Greenfield Step 12 cycle-update, autodev): NFT-RES-05 appended to `resilience-tests.md`, capturing the composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687 (replay-mode minimal config, `AirborneBootstrapError` operator-error contract, Tier-2 `replay.compose_root.ready` + `replay.input.frame_emitted` log-boundary gate). NFT-RES-05 is added to the AC-NEW-1 and AC-4.1 rows as bootstrap-precondition coverage; no coverage counts move because the scenario is supplementary and does not promote any PARTIAL row.
 
 | Category | Total Items | Covered | PARTIAL | Not Covered | Coverage % (Covered + PARTIAL counted half) |
 |----------|-----------|---------|---------|-------------|--------------------------------------------|
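The NFT-RES-05 sub-case (b) and (c) assertions reduce to string checks on the captured exit code, stderr, and per-AC stdout. A minimal Python sketch, assuming the harness already holds those as strings; the helper names (`check_misconfig_contract`, `missing_boundaries`) and the sample stderr message are hypothetical and not taken from the repository. The regexes are the ones from step (b)-2 with the Markdown escaping removed.

```python
import re

# Actionable-sentence regexes from NFT-RES-05 step (b)-2, with the Markdown
# escaping removed: the backticks are literal characters in the SUT message.
ACTIONABLE_RE = re.compile(
    r"set `BUILD_[A-Z0-9_]+`|ensure `[a-z0-9_]+\.[a-z0-9_]+` is"
)

# Boundary log lines required in every Tier-2 replay run (sub-case c).
BOUNDARY_MARKERS = ("replay.compose_root.ready", "replay.input.frame_emitted")

EXIT_GENERIC_FAILURE = 1  # exit code pinned by step (b)-1


def check_misconfig_contract(exit_code, stderr, component_slug, key_or_flag):
    """Return the names of violated contract fields; an empty list means pass.

    component_slug: e.g. "c2_vpr". key_or_flag: the missing infrastructure key
    (e.g. "c7_inference") or the gating BUILD_* flag the test switched OFF.
    """
    failures = []
    if exit_code != EXIT_GENERIC_FAILURE:
        failures.append("exit_code")
    if "airborne_bootstrap:" not in stderr:
        failures.append("prefix")
    if component_slug not in stderr:
        failures.append("component_slug")
    if key_or_flag not in stderr:
        failures.append("key_or_flag")
    if ACTIONABLE_RE.search(stderr) is None:
        failures.append("actionable_sentence")
    return failures


def missing_boundaries(stdout):
    """Return the boundary markers absent from one AC sub-test's stdout."""
    return [m for m in BOUNDARY_MARKERS if m not in stdout]


if __name__ == "__main__":
    # Hypothetical stderr text, written only to exercise the checks above;
    # it is not the SUT's actual message.
    stderr = ("airborne_bootstrap: c2_vpr strategy net_vlad is unavailable; "
              "set `BUILD_PYTORCH_FP16_RUNTIME` to ON and rebuild.")
    print(check_misconfig_contract(1, stderr, "c2_vpr", "BUILD_PYTORCH_FP16_RUNTIME"))  # []
    print(missing_boundaries("replay.compose_root.ready\n"))  # ['replay.input.frame_emitted']
```

Matching plain substrings of the exit code and stderr keeps the check in line with the precondition that no SUT internals are imported. The regex is deliberately strict: a message such as `` set `BUILD_PYTORCH_FP16_RUNTIME=ON` `` fails it, because the closing backtick must follow the flag name directly.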