diff --git a/_docs/02_tasks/todo/AZ-618_airborne_bootstrap_pre_constructed.md b/_docs/02_tasks/todo/AZ-618_airborne_bootstrap_pre_constructed.md
new file mode 100644
index 0000000..3f8d325
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-618_airborne_bootstrap_pre_constructed.md
@@ -0,0 +1,255 @@
+# AZ-618 — Airborne main() builds pre_constructed infrastructure for compose_root
+
+**Task**: AZ-618_airborne_bootstrap_pre_constructed
+**Name**: Airborne bootstrap pre_constructed assembly (cross-cutting Tier-1)
+**Description**: Land an `airborne_bootstrap.build_pre_constructed(config) -> dict[str, Any]` function (or equivalent in-`main()` wiring) that constructs every infrastructure object the registered airborne-strategy wrappers require, and call `compose_root(config, pre_constructed=...)` with the result from `runtime_root.main()`. Without this, `compose_root()` raises `AirborneBootstrapError` on the first wrapper lookup (`c1_vio` reaches for `pre_constructed['c13_fdr']` and finds nothing) and the binary cannot reach takeoff.
+**Complexity**: 5 points (cross-cutting; touches up to 12 infrastructure slots, but each slot reuses an existing per-component builder; GPU init for `c7_inference` + `c3_lightglue_runtime` + `c3_feature_extractor` is the only genuinely new wiring)
+**Dependencies**: AZ-591 (registry registration is the prerequisite — without it the wrappers do not run at all). Helper / runtime classes consumed by the wrappers are all already in `done/` per their own task IDs (c13_fdr → AZ-273+, c6_descriptor_index → AZ-306, c6_tile_store → AZ-303+, c7_inference → AZ-320+, c3_lightglue_runtime + c3_feature_extractor → AZ-278+, c2_82_ransac_filter → AZ-358, c5_imu_preintegrator → AZ-276, c5_se3_utils → AZ-277, c5_wgs_converter → AZ-284, c5_isam2_graph_handle → AZ-381).
+**Component**: runtime_root (cross-cutting)
+**Tracker**: AZ-618
+**Epic**: AZ-602 (E2E Tier-1 harness rehabilitation — parent set during ticket creation)
+
+## Problem
+
+Step 11 (Run Tests) cycle 1 Jetson tier-2 e2e rerun #3 surfaced this gap. With AZ-614 (synth time-base) + AZ-611 (skip-auto-sync) + AZ-602 (compose `BUILD_*` flag completeness) all landed, the Derkachi 1-min replay path now passes every layer up to and including:
+
+```
+replay.compose_root.ready: pace=asap resolved_offset_ms=0 auto_sync_used=false
+```
+
+…then crashes inside `runtime_root.airborne_bootstrap._require`:
+
+```
+runtime_root: airborne_bootstrap: component 'c4_pose' requires
+pre_constructed['c282_ransac_filter'] to be populated before compose_root() runs;
+available keys in constructed: ['clock', 'fc_adapter', 'frame_source',
+'mavlink_transport', 'replay_sink'].
+Production main() must build infrastructure (c13_fdr, c6_*, c7_inference, etc.)
+into pre_constructed and pass it to compose_root(config, pre_constructed=...).
+Tests stub it via the same kwarg.
+```
+
+**Cause**: `runtime_root.main()` (`src/gps_denied_onboard/runtime_root/__init__.py:636`) calls `register_airborne_strategies()` (registers the wrapper factories — AZ-591 work) and then `compose_root(config)` with **no** `pre_constructed=`. The wrappers' `_require(constructed, "c13_fdr", "c1_vio")` etc. raise on the first lookup because the dict is empty.
+
+**Why hidden until now**: every prior Reality-Gate run died at auto-sync (AZ-614 root cause, 2026-05-17) BEFORE the composition graph was walked. AZ-591 was self-described as registering the "registry seam" — it explicitly deferred the `pre_constructed` assembly to a follow-up. That follow-up is this task.
+
+**Why both binaries are affected**: the live `gps-denied-onboard` binary would crash at the same lookup the moment any component reaches into `pre_constructed`. Existing unit tests for `compose_root` (`tests/unit/test_az401_compose_root_replay.py`, 38 passing) pass only because they inject a stub via the `replay_components_factory` kwarg, bypassing the registry-driven path entirely. There is currently no test that exercises the production assembly.
+
+## Outcome
+
+- `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py` exposes a new
+  public `build_pre_constructed(config: Config) -> dict[str, Any]` that returns
+  a dict populated with every key in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`
+  (12 distinct infrastructure objects: `c13_fdr`, `c6_descriptor_index`,
+  `c6_tile_store`, `c7_inference`, `c3_lightglue_runtime`,
+  `c3_feature_extractor`, `c282_ransac_filter`, `c5_wgs_converter`,
+  `c5_se3_utils`, `c5_isam2_graph_handle`, `c5_imu_preintegrator`, `clock`).
+  GPU-touching builders (`c7_inference`, `c3_lightglue_runtime`,
+  `c3_feature_extractor`) are gated by their existing `BUILD_*` env flags;
+  when a flag is OFF, the builder either skips (if the matching component
+  strategy is not selected by config) or raises a clear operator-facing error
+  naming the missing flag.
+
+- `src/gps_denied_onboard/runtime_root/__init__.py::main()` calls
+  `register_airborne_strategies()` followed by
+  `pre_constructed = build_pre_constructed(config)` and then
+  `compose_root(config, pre_constructed=pre_constructed)`. The
+  `EXIT_FDR_OPEN_FAILURE` path already covers FDR open failures; this task
+  extends the existing `RuntimeError` catch to surface
+  `AirborneBootstrapError` with a clear operator-facing message rather than
+  the current implicit traceback.
+
+- New unit tests under `tests/unit/runtime_root/test_az618_pre_constructed.py`
+  verify:
+  - AC-1: `build_pre_constructed(config)` returns a dict containing every key
+    in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` flattened (no duplicates).
+  - AC-2: A config that selects every default strategy completes
+    `compose_root(config, pre_constructed=build_pre_constructed(config))`
+    without raising. (Heavy infrastructure objects may be stubbed via the
+    existing `_BUILD_*` env flags — the test asserts the seam, not the
+    runtime.)
+  - AC-3: When a required `BUILD_*` flag is OFF but the matching component
+    strategy IS selected by config, the builder raises a clear error naming
+    both the missing flag and the consuming component slug.
+  - AC-4: `runtime_root.main()` end-to-end on a minimal config returns 0
+    (success) when all `BUILD_*` flags + infra deps resolve; returns
+    `EXIT_GENERIC_FAILURE` with the `AirborneBootstrapError` message in
+    stderr when a required infra dep cannot be constructed.
+
+- Existing Jetson tier-2 e2e replay tests
+  (`tests/e2e/replay/test_derkachi_1min.py`) cross the
+  `replay.compose_root.ready` log boundary and reach the per-frame inference
+  loop. The 5 currently-failing ACs (AC-1, AC-2, AC-5, AC-6 × 2) advance to
+  exercising C1..C8 end-to-end on the GPU — at which point any remaining
+  failure is a different, deeper class of bug and out of scope for this task.
+
+## Scope
+
+### Included
+
+- New / refactored module: `runtime_root/airborne_bootstrap.py` —
+  `build_pre_constructed(config)` function with one internal builder per
+  required key. Builders reuse existing helper / strategy factories (no new
+  infrastructure logic — only assembly).
+
+- `runtime_root/__init__.py::main()` modification: insert
+  `build_pre_constructed(config)` call between `register_airborne_strategies()`
+  and `compose_root(config, ...)`. Add `AirborneBootstrapError` to the
+  exception block so it surfaces with `EXIT_GENERIC_FAILURE` and a clear
+  operator-facing message.
+
+- New unit tests: `tests/unit/runtime_root/test_az618_pre_constructed.py`
+  covering AC-1..AC-4.
+
+- 6 internal phases — each phase is one source-file delta + matching unit
+  test, and they may be batched but MUST land in dependency order:
+
+  1. **c13_fdr + clock** — foundational. The FDR client + WallClock helper
+     (live) / TlogDerivedClock reuse (replay) — both already exist; the
+     builder is an assembly step.
+  2. **c6_descriptor_index + c6_tile_store** — descriptor faiss index +
+     tile cache storage. AZ-306 + AZ-303 already built the runtime classes.
+  3. **c7_inference engine** — GPU model load. PyTorch FP16 vs. TensorRT
+     selected by config; `BUILD_TENSORRT_RUNTIME` / `BUILD_PYTORCH_FP16_RUNTIME`
+     env flags gate the import path.
+  4. **c3_lightglue_runtime + c3_feature_extractor** — ALIKED / DISK
+     LightGlue. Gated by `BUILD_C3_MATCHER_DISK_LIGHTGLUE` /
+     `BUILD_C3_MATCHER_ALIKED_LIGHTGLUE` env flags.
+  5. **c282_ransac_filter** — small, stateless OpenCV-USAC wrapper.
+  6. **c5 helpers** — `c5_imu_preintegrator`, `c5_se3_utils`,
+     `c5_wgs_converter`, `c5_isam2_graph_handle`. All four are already-done
+     helpers; the builder is pure assembly.
+
+### Excluded
+
+- Changing the per-component helper / strategy factory signatures. Each
+  builder consumes the existing factory's documented surface (e.g.
+  `make_fdr_client(...)`, `build_inference_runtime(config, ...)`); no
+  changes to those signatures are in scope.
+- GPU build-flag matrix expansion. The `BUILD_*` env flag system is already
+  in place per component (`config.components.*.strategy`); this task only
+  consumes the existing flags. New flags are out of scope.
+- Operator binary (`operator_bootstrap.py`) extensions. AZ-591 deferred the
+  operator-side pre_constructed assembly; this task is airborne-only.
+  Operator binary's current direct-factory path is not affected.
+- Replay-branch wiring beyond what already exists. Replay continues to
+  supply `frame_source` / `fc_adapter` / `clock` / `mavlink_transport` /
+  `replay_sink` via `build_replay_components`; this task adds the
+  airborne-side keys ABOVE that set in the same `pre_constructed` dict.
+- Refactor of `airborne_bootstrap.py`'s wrapper-factory layer. The existing
+  `_c1_vio_wrapper`, `_c2_vpr_wrapper`, etc. functions consume `constructed`
+  correctly today; only the dict-population layer is new.
+
+## Acceptance Criteria
+
+**AC-1: `build_pre_constructed(config)` populates every required key**
+Given a process where `register_airborne_strategies()` has run
+And a `Config` selecting every component's default strategy
+When `build_pre_constructed(config)` is called
+Then the returned dict contains exactly the set of keys
+  `set.union(*AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS.values())`
+And no key maps to `None`.
+
+**AC-2: `compose_root(config, pre_constructed=...)` reaches takeoff**
+Given `register_airborne_strategies()` has run
+And `pre_constructed = build_pre_constructed(config)` for the default config
+When `compose_root(config, pre_constructed=pre_constructed)` runs
+Then it returns a `RuntimeRoot` whose `components` dict contains all 7
+  registered slots (c1_vio, c2_vpr, c2_5_rerank, c3_matcher, c3_5_adhop,
+  c4_pose, c5_state) without raising `AirborneBootstrapError`.
+
+**AC-3: `BUILD_*` flag mismatch surfaces a clear error**
+Given the config selects `c2_vpr.strategy="net_vlad"` (requires `c7_inference`)
+And `BUILD_PYTORCH_FP16_RUNTIME=OFF`
+When `build_pre_constructed(config)` is called
+Then it raises `AirborneBootstrapError` whose message names both
+  `c7_inference` (the missing infrastructure) and the gating
+  `BUILD_*` flag.
+
+**AC-4: `runtime_root.main()` end-to-end exit codes**
+Given a minimal in-process `Config` that selects all-defaults
+When `main(config)` is called with every `BUILD_*` flag the defaults need
+Then it returns `0` (success) and the runtime_root constructed log line
+  fires.
+And when a single required infra dep is forcibly unavailable
+Then it returns `EXIT_GENERIC_FAILURE` (`1`) and stderr contains the
+  `airborne_bootstrap:` prefix with the missing key and consuming component.
+
+**AC-5: Jetson tier-2 e2e replay tests cross compose_root.ready**
+Given the AZ-618 changes are landed
+And the Jetson tier-2 e2e harness is invoked
+  (`tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match`
+  + AC-2 + AC-5 + AC-6 ×2)
+Then each test progresses BEYOND `replay.compose_root.ready` (cross-cycle
+  smoke: log appears in stdout AND the per-frame pipeline log
+  `replay.input.frame_emitted` fires at least once per test).
+This AC verifies the airborne wiring is correct end-to-end; whether
+  the per-frame results pass each AC's substantive threshold (count
+  match, schema match, determinism, pace) is gated by other tasks and
+  not blocking this AC.
+
+## Non-Functional Requirements
+
+- **Startup time**: `build_pre_constructed(config)` must complete within
+  60 s on Jetson Orin Nano (JetPack 6.2.2+b24) for the default config.
+  GPU model load + TensorRT engine cache compilation dominate; if the
+  engine cache is cold and exceeds 60 s, log a one-line progress
+  notice at 30 s.
+- **Memory**: peak resident set after `build_pre_constructed` must be
+  < 2 GB on Jetson (excluding the inference model itself; the model is
+  separately bounded by AZ-320's NFRs).
+- **Determinism**: invoking `build_pre_constructed(config)` twice in the
+  same process MUST produce equivalent dicts (every key present, every
+  builder callable). Re-invocation is not expected in production but
+  IS expected in tests; the second call must not raise on already-loaded
+  GPU resources.
+- **Operator-facing error contract**: every `AirborneBootstrapError`
+  message MUST include (a) the consuming component slug, (b) the
+  missing dependency key or `BUILD_*` flag, and (c) one actionable
+  sentence pointing at the fix (e.g. "set `BUILD_C3_MATCHER_DISK_LIGHTGLUE=ON`"
+  or "ensure `c13_fdr.path` is writable").
+
+## Dependencies
+
+- AZ-591 (registry registration prerequisite) — DONE
+- All component runtime classes/factories listed under the **Dependencies**
+  field above — DONE per individual task IDs
+
+## Constraints
+
+- This task MUST NOT touch any per-component factory signature. All
+  changes are confined to `runtime_root/airborne_bootstrap.py`,
+  `runtime_root/__init__.py`, and the new test file.
+- This task MUST NOT introduce new `BUILD_*` env flags. Reuse the
+  existing per-strategy `BUILD_*` matrix already gated by each
+  component's strategy factory.
+- Do not stub or mock the inference engine in production code. The
+  `c7_inference` builder MUST exercise the real (PyTorch FP16 or
+  TensorRT) runtime when called from `main()`. Tests MAY stub it
+  via `build_pre_constructed` mock seams documented in the new test
+  file.
+
+## Implementation Notes
+
+- 6-phase internal split (see Scope.Included). Phases land in
+  dependency order; AC tests for each phase live with the phase
+  but the full AC-1..AC-4 suite only goes green after phase 6.
+- The Jetson-only AC-5 cannot be run from the Mac dev host. The
+  task is "done" when AC-1..AC-4 pass locally + AC-5 passes on
+  the operator's Jetson per `scripts/run-tests-jetson.sh`.
+- AZ-591's task spec called out this exact follow-up (see its
+  "AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS docstring": *"production
+  wiring populates them from the takeoff orchestrator — separate
+  task — AZ-591 follow-up infrastructure-prep"*). This is that task.
+
+## Evidence
+
+- Step-11 Cycle-3 addendum: `_docs/03_implementation/run_tests_step11_report.md`
+  (committed `e054a55`)
+- Jetson tier-2 e2e rerun #3 terminal output:
+  `/Users/obezdienie001/.cursor/projects/Users-obezdienie001-dev-azaion-suite-gps-denied-onboard/terminals/110515.txt`
+  (2026-05-18 06:01 UTC, log lines for `replay.compose_root.ready` +
+  `airborne_bootstrap` raise).
+- `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` definition site:
+  `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py:92`.
+- Current incomplete `main()`: `src/gps_denied_onboard/runtime_root/__init__.py:636`.
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md
index 160e8bc..f94a4af 100644
--- a/_docs/_autodev_state.md
+++ b/_docs/_autodev_state.md
@@ -2,13 +2,13 @@
 
 ## Current Step
 flow: greenfield
-step: 11
-name: Run Tests
-status: passed_with_followups
+step: 7
+name: Implement
+status: not_started
 sub_step:
-  phase: 8
-  name: az614-az611-landed-bootstrap-gap-discovered
-  detail: "AZ-614 + AZ-611 + AZ-602 build-flags + AZ-615 tilde-fix all landed (commits e114bfd, bd41956, 324bbd6, b7012d2). Jetson Cycle-3 rerun (terminal 110515.txt): replay path now reaches `replay.compose_root.ready: auto_sync_used=false`, then crashes in `runtime_root.airborne_bootstrap` with `pre_constructed['c282_ransac_filter']` missing. Same 5 heavy ACs still fail but 3 layers deeper — `runtime_root.main()` calls `register_airborne_strategies()` but does NOT build c13_fdr/c6_*/c7_inference/c3_*/c2_82_ransac_filter into pre_constructed. Filed AZ-618 (Story under AZ-602, 5 pts capped). Pending user decision on whether to start AZ-618 immediately or close out Step 11 with the current Reality-Gate signal."
+  phase: 0
+  name: awaiting-invocation
+  detail: "AZ-618 task spec in todo/ (Step 11 gate sent flow back per greenfield rule: missing internal product implementation = back to Implement)"
 retry_count: 0
 cycle: 1
 tracker: jira
diff --git a/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md b/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
index 3594f2b..e5d20e1 100644
--- a/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
+++ b/_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
@@ -1,7 +1,7 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-17T16:23+03:00 (Europe/Kyiv) — PyPI still shows
+**Last replay attempt**: 2026-05-18T20:35+03:00 (Europe/Kyiv) — PyPI still shows
 `gtsam==4.2.1` as the latest stable (`requires_dist: numpy<2.0.0,>=1.11.0`);
 `gtsam==4.3a0` alpha exists but is not a stable wheel target. Replay condition
 (numpy>=2 stable wheels) still NOT met. Leftover remains open.