[AZ-591] Add airborne_bootstrap to populate _STRATEGY_REGISTRY

Batch 66 — fixes the production gap surfaced during the cycle-1
completeness-gate post-mortem: the central _STRATEGY_REGISTRY was
empty in production source, so compose_root() raised
StrategyNotLinkedError on the first component lookup and the
airborne binary couldn't reach takeoff.

Changes:

- New module `src/.../runtime_root/airborne_bootstrap.py` exposes
  `register_airborne_strategies()` and a documented
  `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` table. The function
  registers 14 entries into the central registry across 7
  strategy-selecting slots (c1_vio + c2_vpr + c2_5_rerank +
  c3_matcher + c3_5_adhop + c4_pose + c5_state). Per-slot wrappers
  adapt the registry-factory signature (config, constructed) to each
  per-component factory's kwarg surface and raise an
  AirborneBootstrapError when a required infrastructure dep is
  missing from constructed.

- `compose_root` gains a `pre_constructed` kwarg in live mode,
  symmetric with the replay-mode seam. Replay entries still take
  precedence on key collision (ADR-011). Existing callers unaffected
  (kwarg defaults to None).

- `runtime_root/__init__.py::main()` now calls
  `register_airborne_strategies()` before `compose_root(config)` so
  production binaries no longer crash at the registry-lookup step.

- Lazy-loading preserved: state_factory's private _STATE_REGISTRY is
  populated lazily inside the c5_state wrapper, gated by
  BUILD_STATE_GTSAM_ISAM2 / BUILD_STATE_ESKF env flags. pose_factory's
  own lazy-import fallback handles c4_pose without an explicit
  register() call.

- 7 new unit tests in
  `tests/unit/runtime_root/test_az591_airborne_bootstrap.py` cover
  AC-1..AC-5 plus the negative-path AirborneBootstrapError contract
  and a key-table consistency check. Full unit suite: 2105 passed /
  88 environment-gated skips / 0 failures.

End-to-end takeoff still needs a follow-up task to wire infrastructure
pre-construction (c13_fdr / c6_* / c7_inference / etc.) into the
pre_constructed dict passed to compose_root. That follow-up is gated
by AZ-591 landing first; recommended split into per-component
infrastructure-prep tasks (3pt each).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-16 12:58:38 +03:00
parent 6d51e06886
commit f7a99282fb
6 changed files with 796 additions and 4 deletions
@@ -0,0 +1,145 @@
# AZ-591 — compose_root per-binary bootstrap: populate `_STRATEGY_REGISTRY`
**Task**: AZ-591_compose_root_per_binary_bootstrap
**Name**: compose_root per-binary bootstrap (cross-cutting Tier-1)
**Description**: Land `airborne_bootstrap.py` + `operator_bootstrap.py` modules under `runtime_root/` that call `register_strategy(...)` for every (component, strategy) pair their respective binary needs. Wire the airborne entrypoint `main()` to call `register_airborne_strategies()` before `compose_root(config)`. Without this, `compose_root()` raises `StrategyNotLinkedError` on the first component lookup and the binary cannot reach takeoff.
**Complexity**: 5 points (cross-cutting; touches 7 component slots but each slot is a small factory wrapper)
**Dependencies**: AZ-270 (compose_root surface), AZ-331 (c1_vio factory), AZ-339 (c2_vpr factory), AZ-352 (c2.5 factory), AZ-355 (c4_pose factory), AZ-380 (c5_state factory), AZ-345 (c3_matcher factory), AZ-368 (c3.5_adhop factory) — all already in `done/`.
**Component**: runtime_root (cross-cutting)
**Tracker**: AZ-591
**Epic**: AZ-246 (E-CC-CONF — Cross-Cutting / Composition Root)
## Problem
The Product Implementation Completeness Gate cycle 1 (2026-05-16) initially classified AZ-332 (OKVIS2 skeleton binding) as `FAIL` and created the now-closed AZ-589 + AZ-590 remediation tasks. Investigation of those remediation tasks surfaced the actual production gap: it has nothing to do with OKVIS2 or VINS-Mono specifically.
**The central `_STRATEGY_REGISTRY` is dormant**:
- `src/gps_denied_onboard/runtime_root/__init__.py` defines `_STRATEGY_REGISTRY: dict[tuple[str, str], _Registration]` and the public `register_strategy(component_slug, strategy_name, factory, *, tier, depends_on)` API.
- A workspace-wide `grep -nE 'register_strategy\s*\(' src/` returns **only the definition site** — no module under `src/` ever calls `register_strategy()`. The only call sites are inside `tests/unit/test_az270_compose_root.py` (test fixtures that mutate the registry per-test).
- `compose_root(config)` calls `_compose()` which walks `config.components` and invokes `_resolve_strategy(slug, strategy_name, allowed_tiers)`. For any component slug whose config block declares a `strategy` field, `_resolve_strategy` looks up `(slug, strategy_name)` in `_STRATEGY_REGISTRY`. Since the registry is empty, it raises `StrategyNotLinkedError`.
**Affected component slots** (every component config block with a `strategy: str` field — confirmed via `rg 'strategy:\s*str' src/.../components/*/config.py`):
| Component | Default strategy | Available strategies | Tier(s) |
|-----------|------------------|----------------------|---------|
| `c1_vio` | `klt_ransac` | `okvis2`, `vins_mono`, `klt_ransac` | airborne |
| `c2_vpr` | `net_vlad` | `net_vlad`, `ultra_vpr`, `mega_loc`, `mix_vpr`, `sela_vpr`, `eigen_places`, `salad` | airborne |
| `c2_5_rerank` | `inlier_count` | `inlier_count` (single) | airborne |
| `c3_matcher` | `disk_lightglue` | `disk_lightglue`, `aliked_lightglue` | airborne |
| `c3_5_adhop` | `adhop` | `adhop` (single) | airborne |
| `c4_pose` | `opencv_gtsam` | `opencv_gtsam` (single) | airborne |
| `c5_state` | `gtsam_isam2` | `gtsam_isam2`, `eskf_baseline` | airborne |
(Components without a `strategy` field — `c6_tile_cache`, `c7_inference`, `c8_fc_adapter`, `c11_tile_manager`, `c12_operator_orchestrator`, `c13_fdr` — use direct factories that `compose_root` consumes from `pre_constructed`, NOT the registry path. They are NOT in scope for this task.)
## Outcome
- `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py` exists and exposes `register_airborne_strategies() -> None`. The function calls `register_strategy(...)` for every (component, strategy) pair in the 7-row table above, with `tier="airborne"`. Each registered factory is a small wrapper that adapts the existing per-component factory (`vio_factory.build_vio_strategy`, `vpr_factory.build_vpr_strategy`, etc.) to the `(config, constructed)` registry-factory signature.
- `src/gps_denied_onboard/runtime_root/operator_bootstrap.py` exists and exposes `register_operator_strategies() -> None`. It is a placeholder for symmetry and future-proofing: the operator-binary slots (`c10_provisioning`, `c11_tile_manager`, `c12_operator_orchestrator`) don't have a `strategy: str` field today, so the operator binary's `compose_operator` flow already works without registrations.
- The airborne entrypoint `runtime_root/__init__.py::main()` calls `register_airborne_strategies()` immediately BEFORE the first `compose_root(config)` call. Wired idempotently: re-invoking `main()` (e.g. in tests) does not raise on the repeated `register_strategy(...)` calls because each new registration compares equal to the existing entry.
- The wrapper factories declare `depends_on=(...)` such that `_topo_order()` produces a sensible construction order: dependencies that already exist in the per-component factory signatures (e.g. `c1_vio` needs `fdr_client` from `c13_fdr`) are either surfaced as `depends_on` edges OR pulled from the `constructed` dict when the dependency (here `c13_fdr`) is in `pre_constructed`, whichever path matches the production assembly.
- New unit tests `tests/unit/runtime_root/test_az591_airborne_bootstrap.py` verify:
  - AC-1: `register_airborne_strategies()` populates the registry for all 7 component slots (one entry per non-test strategy).
  - AC-2: `compose_root(config)` against a config that selects `c1_vio.strategy="klt_ransac"` + every other component's default strategy completes without raising `StrategyNotLinkedError`.
  - AC-3: `register_airborne_strategies()` is idempotent: calling it twice in the same process does not raise.
  - AC-4: A config that selects a strategy not registered (e.g. `c2_vpr.strategy="not_a_strategy"`) raises `StrategyNotLinkedError` with the available-strategies list populated.
  - AC-5: The `tier="airborne"` filter keeps airborne registrations out of operator lookups (verified by calling `compose_operator(config)` when only the airborne registrations exist and confirming `StrategyNotLinkedError`).
## Scope
### Included
- `runtime_root/airborne_bootstrap.py` (new) — `register_airborne_strategies()` + per-component wrapper factories.
- `runtime_root/operator_bootstrap.py` (new, minimal) — placeholder for the operator entrypoint's future registry needs; today only `clear_pose_registry` / `clear_state_registry` style cleanup is needed.
- `runtime_root/__init__.py::main()` modification: insert `register_airborne_strategies()` call before `compose_root(config)`.
- `tests/unit/runtime_root/test_az591_airborne_bootstrap.py` (new) — AC-1..AC-5 suite.
### Excluded
- C++ binding work for OKVIS2 (`AZ-592`) and VINS-Mono (`AZ-593`) — these Tier-2 tasks are parked in `backlog/` until their hardware + CI prerequisites are provisioned. The bootstrap registers the c1_vio:okvis2 + c1_vio:vins_mono slots so the registry seam is correct, but the strategy factory still raises `StrategyNotAvailableError` at construction time when `BUILD_OKVIS2=OFF` (existing behaviour from `vio_factory.py`, unchanged).
- Refactoring the per-component factory signatures from `(config, fdr_client=...)` to `(config, constructed)` — instead, the bootstrap's wrapper factories adapt one signature to the other. The per-component factories are stable surfaces and should not change shape inside this task.
- Operator binary strategy registrations beyond the placeholder — the operator binary's actual strategy use is handled by direct factories today (`build_flights_api_client`, etc.) which compose_operator already consumes correctly.
- Replay-branch additions — `compose_root`'s replay path uses `pre_constructed`, which is orthogonal to the registry-driven path this task fixes.
## Acceptance Criteria
**AC-1: Bootstrap populates the airborne registry with 7 component slots**
Given a fresh process where `_STRATEGY_REGISTRY` is empty
When `register_airborne_strategies()` is called
Then `list_registered_strategies("c1_vio")` returns `["klt_ransac", "okvis2", "vins_mono"]` (sorted); same exhaustive list for c2_vpr / c2_5_rerank / c3_matcher / c3_5_adhop / c4_pose / c5_state; every registered factory carries `tier="airborne"`.
**AC-2: compose_root reaches takeoff with default strategies + klt_ransac**
Given `register_airborne_strategies()` has been called
And a config that selects `c1_vio.strategy="klt_ransac"`, `c2_vpr.strategy="net_vlad"`, `c3_matcher.strategy="disk_lightglue"`, `c4_pose.strategy="opencv_gtsam"`, `c5_state.strategy="gtsam_isam2"` (i.e. defaults)
When `compose_root(config)` runs (with required env populated)
Then it returns a `RuntimeRoot` whose `components` dict contains all 7 registered slots; no `StrategyNotLinkedError` is raised.
**AC-3: Idempotent registration**
Given `register_airborne_strategies()` has been called once
When it is called a second time in the same process
Then no exception is raised; the registry retains the same 14+ entries (the second call is a no-op because each `_Registration` record equals the stored one).
**AC-4: Unknown strategy in config still raises with useful message**
Given `register_airborne_strategies()` has been called
And a config selects `c2_vpr.strategy="not_a_real_strategy"`
When `compose_root(config)` runs
Then `StrategyNotLinkedError` is raised with `strategy_name="not_a_real_strategy"`, `component_slug="c2_vpr"`, `available_strategies` including `"net_vlad"` etc., and `reason="not linked"`.
**AC-5: Tier isolation prevents airborne registrations from leaking into compose_operator**
Given `register_airborne_strategies()` has been called (no operator registrations)
When `compose_operator(config)` runs against the same config
Then it raises `StrategyNotLinkedError` on the first airborne-tier slot it tries to resolve, with `reason` mentioning the tier mismatch; no airborne strategy is constructed by the operator binary path.
## Non-Functional Requirements
**Performance**
- `register_airborne_strategies()` cost ≤ 50 ms on cold import (it's effectively 14 dict inserts + their dependency-resolution).
**Reliability**
- No raw `RuntimeError` from the registry path should reach the operator — every failure mode passes through `StrategyNotLinkedError` with the contextual fields populated (already true of the existing surface).
## Constraints
- The wrapper factories MUST use the existing per-component factories. NEVER duplicate the BUILD_* flag gating logic inside the bootstrap — `vio_factory.build_vio_strategy` already does that for c1_vio, and similarly for each component.
- AZ-507 cross-component import rule: `runtime_root/airborne_bootstrap.py` is the composition root, so it MAY import from any component's Public API. NEVER reach into a component's internal modules; always go through the per-component factory.
- The `depends_on` declarations MUST be consistent with the per-component factory signatures. Document any inferred ordering in the wrapper factory's docstring.
## Risks & Mitigation
**Risk 1: Per-component factory signatures don't match `(config, constructed)`**
- *Risk*: `build_vio_strategy(config, *, fdr_client)` takes `fdr_client` as a kwarg, not from a `constructed` dict. Adapting requires the wrapper to read `constructed["c13_fdr"]` and pass it as `fdr_client=...`. But `c13_fdr` is constructed by the takeoff path (`take_off()`), NOT by `compose_root`'s registry path. So the wrapper's `constructed` may not contain `c13_fdr` at call time.
- *Mitigation*: For c1_vio specifically, the existing `take_off()` flow passes `fdr_client` separately via `other_components_factory(config, writer, fc_adapter)`. The bootstrap's wrapper for c1_vio should match this — it expects `constructed` to contain `c13_fdr`, raises a clear error if not, and the airborne entrypoint orchestrates `take_off()` to populate `constructed["c13_fdr"]` before calling `compose_root`. Document the call-order invariant in `airborne_bootstrap.py`.
**Risk 2: Compose-root construction order doesn't match the live takeoff path**
- *Risk*: `_topo_order` runs Kahn's algorithm over the `depends_on` graph; the production `take_off()` runs a specific ordered sequence (writer → flight header → fc_adapter → other components). Disagreement between these two orderings can produce subtle bugs.
- *Mitigation*: For now, the airborne bootstrap registers ONLY the 7 strategy-selecting component slots. The `take_off()` / `_replay_branch` flows continue to own c13_fdr / c8_fc_adapter / c6_tile_cache / c7_inference / replay components via their existing direct factories. The `pre_constructed` mechanism lets the registry-driven `_compose` see them already-built. Document this explicitly in the bootstrap module docstring.
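
Kahn's algorithm over `depends_on`, with already-built entries treated as satisfied, is what lets the registry path coexist with `pre_constructed`. A sketch: `topo_order` is a hypothetical helper rather than the real `_topo_order`, and the edge table reproduces the inter-component edges listed in the Implementation Notes.

```python
from collections import deque
from typing import Iterable, Mapping

# Inter-slot edges as listed in the Implementation Notes (slot -> its dependencies).
AIRBORNE_DEPENDS_ON: dict[str, tuple[str, ...]] = {
    "c1_vio": (),
    "c2_vpr": (),
    "c2_5_rerank": ("c2_vpr",),
    "c3_matcher": (),
    "c3_5_adhop": ("c3_matcher",),
    "c4_pose": ("c1_vio", "c3_matcher"),
    "c5_state": ("c1_vio", "c4_pose"),
}


def topo_order(depends_on: Mapping[str, Iterable[str]],
               already_built: Iterable[str] = ()) -> list[str]:
    """Kahn's algorithm; dependencies in already_built count as satisfied."""
    built = set(already_built)
    pending = [slot for slot in depends_on if slot not in built]
    indegree = {slot: 0 for slot in pending}
    dependents: dict[str, list[str]] = {slot: [] for slot in pending}
    for slot in pending:
        for dep in depends_on[slot]:
            if dep in built:
                continue  # e.g. infrastructure handed in via pre_constructed
            if dep not in indegree:
                raise KeyError(f"{slot} depends on unknown slot {dep!r}")
            indegree[slot] += 1
            dependents[dep].append(slot)
    ready = deque(sorted(slot for slot, n in indegree.items() if n == 0))
    order: list[str] = []
    while ready:
        slot = ready.popleft()
        order.append(slot)
        for nxt in sorted(dependents[slot]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(pending):
        stuck = sorted(set(pending) - set(order))
        raise ValueError(f"dependency cycle among: {stuck}")
    return order
```

Any order that respects the edges is valid; the `take_off()` sequence only has to agree with it on the edges both declare, which is why the bootstrap leaves c13_fdr and the other infrastructure outside the graph.
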
## Notes
- This task does NOT validate end-to-end on the airborne binary because that requires a real Jetson + nav-camera + FC. It validates that `compose_root()` returns a `RuntimeRoot` without raising — the unit-test gate. End-to-end binary validation lives in the Tier-2 Jetson harness (AZ-444).
- After this task lands, the cycle-1 completeness gate report at `_docs/03_implementation/implementation_completeness_cycle1_report.md` should be re-read: the `FAIL` classification for AZ-332 + AZ-333 is re-classified to `BLOCKED on Tier-2 prerequisites` per AZ-592 / AZ-593. The actual production blocker (this task) is being remediated here.
- The user's PBI complexity rule caps PBIs at 5pt. This task sits at that cap and stays within it only because all 7 slots use the same wrapper pattern, so the slot count doesn't multiply complexity. If any slot's wrapper needs more than a few-line factory adapter, split that slot's wrapper into its own PBI (`AZ-591_<slug>_bootstrap`).
## Implementation Notes (2026-05-16, batch 66)
**Outcome**: Landed `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py` with `register_airborne_strategies()` registering 14 entries into the central `_STRATEGY_REGISTRY` across 7 component slots (c1_vio, c2_vpr, c2_5_rerank, c3_matcher, c3_5_adhop, c4_pose, c5_state). Each slot's wrapper extracts infrastructure deps from `constructed` by documented key (see `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`) and forwards to the existing per-component factory (`build_vio_strategy`, `build_vpr_strategy`, etc.). Inter-component dependency edges are declared via `register_strategy(... depends_on=...)` so `_topo_order()` respects the runtime data-flow ordering (c2_vpr → c2_5_rerank; c3_matcher → c3_5_adhop; c1_vio + c3_matcher → c4_pose; c1_vio + c4_pose → c5_state).
**API extension**: `compose_root(config, *, pre_constructed, replay_components_factory)` now accepts a `pre_constructed` kwarg in live mode (previously only used in replay mode via `replay_components`). This is the seam the bootstrap wrappers rely on for infrastructure deps. Existing `compose_root` callers are unaffected (the kwarg defaults to `None`).
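
One way to picture the seam, assuming the live-mode `pre_constructed` dict and the replay components are merged into the initial `constructed` mapping before `_compose` runs. `seed_constructed` is a hypothetical helper; the precedence rule (replay entries win on key collision, per ADR-011) comes from the commit message.

```python
from typing import Any, Mapping, Optional


def seed_constructed(
    pre_constructed: Optional[Mapping[str, Any]] = None,
    replay_components: Optional[Mapping[str, Any]] = None,
) -> dict[str, Any]:
    """Build the initial `constructed` dict that registry factories read from."""
    constructed: dict[str, Any] = dict(pre_constructed or {})
    # Replay entries are applied last so they win on key collision (ADR-011).
    constructed.update(replay_components or {})
    return constructed
```

Because both arguments default to `None`, a caller that passes neither gets an empty dict, which matches the claim that existing `compose_root` callers are unaffected.
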
**main() integration**: `runtime_root/__init__.py::main()` now calls `register_airborne_strategies()` BEFORE `compose_root(config)`. Production binaries that call this `main()` no longer crash with `StrategyNotLinkedError` at the registry-lookup step. Note: end-to-end takeoff still requires a separate task to wire infrastructure pre-construction (c13_fdr, c6_descriptor_index, c7_inference, etc.) into the `pre_constructed` dict passed to `compose_root`. The wrappers fail loudly with `AirborneBootstrapError` if a dep is missing — that's the actionable next-step error for that follow-up task.
**Lazy-loading preservation**: The bootstrap module's top-level imports pull in the runtime_root factory modules (`vio_factory`, `vpr_factory`, etc.), which are thin and import-time-safe: they don't transitively import gtsam, opencv-cuda, or other heavy deps. The c5_state private registry (`_STATE_REGISTRY`) is populated lazily inside `_c5_state_wrapper` via `_ensure_state_strategy_registered(config)`, which checks `BUILD_STATE_GTSAM_ISAM2` / `BUILD_STATE_ESKF` env flags before importing the gtsam-bound module. c4_pose's `_POSE_REGISTRY` is populated by `pose_factory._resolve_factory`'s own lazy-import fallback, so no explicit `register()` from this bootstrap is needed.
**Tests**: 7 tests in `tests/unit/runtime_root/test_az591_airborne_bootstrap.py` cover the 5 ACs plus two additional checks:
- AC-1 — every slot has the expected strategy set after `register_airborne_strategies()`.
- AC-2 — `compose_root(config, pre_constructed=...)` reaches completion with stubbed wrappers; topological order honoured.
- AC-3 — idempotent re-registration.
- AC-4 — unknown strategy in config raises `StrategyNotLinkedError` with available-strategies list.
- AC-5 — airborne registrations are tier-isolated from `compose_operator`.
- Plus a negative-path test that the production wrappers surface `AirborneBootstrapError` with the missing-key name when `pre_constructed` is empty.
- Plus a consistency test that `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` covers every registered slot.
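
A sketch of that consistency check, plus the per-slot lookup a wrapper could use to name the missing key. The table contents are illustrative: only the c1_vio to c13_fdr dependency is stated in this task, so the other tuples are placeholders. `key_table_problems` and `missing_pre_constructed` are hypothetical helpers.

```python
from typing import Collection, Mapping

AIRBORNE_SLOTS = ("c1_vio", "c2_vpr", "c2_5_rerank", "c3_matcher",
                  "c3_5_adhop", "c4_pose", "c5_state")

# Illustrative contents: only c1_vio -> c13_fdr is stated in this task; the other
# entries are placeholders for whatever the real table documents.
AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS: dict[str, tuple[str, ...]] = {
    "c1_vio": ("c13_fdr",),
    "c2_vpr": (),
    "c2_5_rerank": (),
    "c3_matcher": (),
    "c3_5_adhop": (),
    "c4_pose": (),
    "c5_state": (),
}


def key_table_problems(table: Mapping[str, Collection[str]],
                       registered_slots: Collection[str]) -> list[str]:
    """Return human-readable mismatches between the key table and the registered slots."""
    problems = []
    missing = sorted(set(registered_slots) - set(table))
    extra = sorted(set(table) - set(registered_slots))
    if missing:
        problems.append(f"registered slots with no table entry: {missing}")
    if extra:
        problems.append(f"table entries for unregistered slots: {extra}")
    return problems


def missing_pre_constructed(table: Mapping[str, Collection[str]], slot: str,
                            pre_constructed: Mapping[str, object]) -> list[str]:
    """Keys the slot needs that the caller did not supply."""
    return [key for key in table[slot] if key not in pre_constructed]
```
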
**Test results**: 7/7 new tests pass; 8/8 existing `test_az270_compose_root.py` tests still pass (no regression from the `pre_constructed` kwarg extension); full unit suite 2105 passed / 88 environment-gated skips / 0 failures.
**Follow-up not in this task**: The actual infrastructure pre-construction (building c13_fdr / c6_descriptor_index / c7_inference / c3_lightglue_runtime / c282_ransac_filter / c5_imu_preintegrator / etc. into a dict and passing it to `compose_root(..., pre_constructed=...)`) is a separate cross-cutting task. AZ-591 surfaces the registry seam; that follow-up wires the infrastructure side. Recommended split: per-component infrastructure-prep tasks (3pt each) gated by their existing factory's BUILD_* flag, sequenced behind AZ-591.