[AZ-618] Task spec + autodev rewind to Step 7

Step 11 gate failed per greenfield rule: 5 e2e ACs reach
`replay.compose_root.ready` and then crash inside
runtime_root.airborne_bootstrap on the first pre_constructed
lookup. That is "missing internal product implementation",
which the gate description routes back to Implement.

* Task spec AZ-618 (255 lines, 5 pts, 6-phase internal split,
  AC-1..AC-5) parked in _docs/02_tasks/todo/. Phases land in
  dependency order: c13_fdr+clock -> c6_* -> c7_inference ->
  c3_lightglue+features -> c282_ransac_filter -> c5 helpers.
* Autodev state: step 7 (Implement), status not_started,
  sub_step awaiting-invocation, cycle 1. retry_count = 0.
* Leftover D-CROSS-CVE-1: replay attempted, still deferred
  (gtsam 4.2.1 on PyPI still pins numpy<2.0.0); timestamp
  bumped to 2026-05-18T20:35+03:00.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-18 20:42:25 +03:00
parent e054a55804
commit bcdc17bd74
3 changed files with 262 additions and 7 deletions
@@ -0,0 +1,255 @@
# AZ-618 — Airborne main() builds pre_constructed infrastructure for compose_root
**Task**: AZ-618_airborne_bootstrap_pre_constructed
**Name**: Airborne bootstrap pre_constructed assembly (cross-cutting Tier-1)
**Description**: Land an `airborne_bootstrap.build_pre_constructed(config) -> dict[str, Any]` function (or equivalent in-`main()` wiring) that constructs every infrastructure object the registered airborne-strategy wrappers require, and call `compose_root(config, pre_constructed=...)` with the result from `runtime_root.main()`. Without this, `compose_root()` raises `AirborneBootstrapError` on the first wrapper lookup (`c1_vio` reaches for `pre_constructed['c13_fdr']` and finds nothing) and the binary cannot reach takeoff.
**Complexity**: 5 points (cross-cutting; touches up to 12 infrastructure slots, but each slot reuses an existing per-component builder; GPU init for `c7_inference` + `c3_lightglue_runtime` + `c3_feature_extractor` is the only genuinely new wiring)
**Dependencies**: AZ-591 (registry registration is the prerequisite — without it the wrappers do not run at all). Helper / runtime classes consumed by the wrappers are all already in `done/` per their own task IDs (c13_fdr → AZ-273+, c6_descriptor_index → AZ-306, c6_tile_store → AZ-303+, c7_inference → AZ-320+, c3_lightglue_runtime + c3_feature_extractor → AZ-278+, c2_82_ransac_filter → AZ-358, c5_imu_preintegrator → AZ-276, c5_se3_utils → AZ-277, c5_wgs_converter → AZ-284, c5_isam2_graph_handle → AZ-381).
**Component**: runtime_root (cross-cutting)
**Tracker**: AZ-618
**Epic**: AZ-602 (E2E Tier-1 harness rehabilitation — parent set during ticket creation)
## Problem
Step 11 (Run Tests) cycle 1 Jetson tier-2 e2e rerun #3 surfaced this gap. With AZ-614 (synth time-base) + AZ-611 (skip-auto-sync) + AZ-602 (compose `BUILD_*` flag completeness) all landed, the Derkachi 1-min replay path now passes every layer up to and including:
```
replay.compose_root.ready: pace=asap resolved_offset_ms=0 auto_sync_used=false
```
…then crashes inside `runtime_root.airborne_bootstrap._require`:
```
runtime_root: airborne_bootstrap: component 'c4_pose' requires
pre_constructed['c282_ransac_filter'] to be populated before compose_root() runs;
available keys in constructed: ['clock', 'fc_adapter', 'frame_source',
'mavlink_transport', 'replay_sink'].
Production main() must build infrastructure (c13_fdr, c6_*, c7_inference, etc.)
into pre_constructed and pass it to compose_root(config, pre_constructed=...).
Tests stub it via the same kwarg.
```
**Cause**: `runtime_root.main()` (`src/gps_denied_onboard/runtime_root/__init__.py:636`) calls `register_airborne_strategies()` (registers the wrapper factories — AZ-591 work) and then `compose_root(config)` with **no** `pre_constructed=`. The wrappers' `_require(...)` calls (e.g. `_require(constructed, "c13_fdr", "c1_vio")`) then raise on the first lookup because `constructed` holds none of the airborne infrastructure keys; on the replay path it contains only the five replay-supplied keys listed in the error above.
**Why hidden until now**: every prior Reality-Gate run died at auto-sync (AZ-614 root cause, 2026-05-17) BEFORE the composition graph was walked. AZ-591 was self-described as registering the "registry seam" — it explicitly deferred the `pre_constructed` assembly to a follow-up. That follow-up is this task.
**Why both binaries are affected**: the live `gps-denied-onboard` binary would crash at the same lookup the moment any component reaches into `pre_constructed`. Existing unit tests for `compose_root` (`tests/unit/test_az401_compose_root_replay.py`, 38 passing) pass only because they inject a stub via the `replay_components_factory` kwarg, bypassing the registry-driven path entirely. There is currently no test that exercises the production assembly.
## Outcome
- `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py` exposes a new
public `build_pre_constructed(config: Config) -> dict[str, Any]` that returns
a dict populated with every key in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`
(12 distinct infrastructure objects: `c13_fdr`, `c6_descriptor_index`,
`c6_tile_store`, `c7_inference`, `c3_lightglue_runtime`,
`c3_feature_extractor`, `c282_ransac_filter`, `c5_wgs_converter`,
`c5_se3_utils`, `c5_isam2_graph_handle`, `c5_imu_preintegrator`, `clock`).
GPU-touching builders (`c7_inference`, `c3_lightglue_runtime`,
`c3_feature_extractor`) are gated by their existing `BUILD_*` env flags;
when a flag is OFF, the builder either skips (if the matching component
strategy is not selected by config) or raises a clear operator-facing error
naming the missing flag.
- `src/gps_denied_onboard/runtime_root/__init__.py::main()` calls
`register_airborne_strategies()` followed by
`pre_constructed = build_pre_constructed(config)` and then
`compose_root(config, pre_constructed=pre_constructed)`. The
`EXIT_FDR_OPEN_FAILURE` path already covers FDR open failures; this task
extends the existing `RuntimeError` catch to surface
`AirborneBootstrapError` with a clear operator-facing message rather than
the current implicit traceback.
- New unit tests under `tests/unit/runtime_root/test_az618_pre_constructed.py`
verify:
- AC-1: `build_pre_constructed(config)` returns a dict containing every key
in the flattened, de-duplicated union of the
`AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` values.
- AC-2: A config that selects every default strategy completes
`compose_root(config, pre_constructed=build_pre_constructed(config))`
without raising. (Heavy infrastructure objects may be stubbed via the
existing `BUILD_*` env flags; the test asserts the seam, not the
runtime.)
- AC-3: When a required `BUILD_*` flag is OFF but the matching component
strategy IS selected by config, the builder raises a clear error naming
both the missing flag and the consuming component slug.
- AC-4: `runtime_root.main()` end-to-end on a minimal config returns 0
(success) when all `BUILD_*` flags + infra deps resolve; returns
`EXIT_GENERIC_FAILURE` with the `AirborneBootstrapError` message in
stderr when a required infra dep cannot be constructed.
- Existing Jetson tier-2 e2e replay tests
(`tests/e2e/replay/test_derkachi_1min.py`) cross the
`replay.compose_root.ready` log boundary and reach the per-frame inference
loop. The 5 currently-failing ACs (AC-1, AC-2, AC-5, AC-6 × 2) advance to
exercising C1..C8 end-to-end on the GPU — at which point any remaining
failure is a different, deeper class of bug and out of scope for this task.
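The skip-or-raise rule for `BUILD_*`-gated builders in the first Outcome bullet can be sketched as follows. This is a minimal illustration, not the planned implementation: the `_BUILDERS` table, the placeholder builder return values, the dict-shaped `config` (component slug to strategy name), and the injectable `env` parameter are all hypothetical. Only the `ON`/`OFF` flag values and the key, flag, and component names come from this spec.

```python
import os
from typing import Any, Callable, Mapping, Optional


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the real exception type."""


# Hypothetical table: key -> (gating flag or None, consuming component, builder).
# Real builders would wrap the existing per-component factories.
_BUILDERS: dict[str, tuple[Optional[str], str, Callable[[Mapping[str, Any]], Any]]] = {
    "clock": (None, "runtime_root", lambda cfg: "wall-clock"),
    "c13_fdr": (None, "c1_vio", lambda cfg: "fdr-client"),
    "c7_inference": ("BUILD_PYTORCH_FP16_RUNTIME", "c2_vpr", lambda cfg: "engine"),
}


def build_pre_constructed(
    config: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> dict[str, Any]:
    """Assemble infrastructure objects, honouring BUILD_* gates."""
    out: dict[str, Any] = {}
    for key, (flag, component, builder) in _BUILDERS.items():
        if flag is not None and env.get(flag, "OFF").upper() != "ON":
            if config.get(component) is None:
                continue  # Flag OFF and no strategy selected: skip the builder.
            raise AirborneBootstrapError(
                f"airborne_bootstrap: component {component!r} requires "
                f"pre_constructed[{key!r}], which is gated by {flag}. "
                f"Set {flag}=ON."
            )
        out[key] = builder(config)
    return out


selected = {"c2_vpr": "net_vlad"}
built = build_pre_constructed(selected, env={"BUILD_PYTORCH_FP16_RUNTIME": "ON"})
skipped = build_pre_constructed({}, env={})
try:
    build_pre_constructed(selected, env={})
    error_text = ""
except AirborneBootstrapError as exc:
    error_text = str(exc)
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment; whether the real function reads `os.environ` directly is not specified here.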
## Scope
### Included
- New / refactored module: `runtime_root/airborne_bootstrap.py`
`build_pre_constructed(config)` function with one internal builder per
required key. Builders reuse existing helper / strategy factories (no new
infrastructure logic — only assembly).
- `runtime_root/__init__.py::main()` modification: insert
`build_pre_constructed(config)` call between `register_airborne_strategies()`
and `compose_root(config, ...)`. Add `AirborneBootstrapError` to the
exception block so it surfaces with `EXIT_GENERIC_FAILURE` and a clear
operator-facing message.
- New unit tests: `tests/unit/runtime_root/test_az618_pre_constructed.py`
covering AC-1..AC-4.
- 6 internal phases — each phase is one source-file delta + matching unit
test, and they may be batched but MUST land in dependency order:
1. **c13_fdr + clock** — foundational. The FDR client + WallClock helper
(live) / TlogDerivedClock reuse (replay) — both already exist; the
builder is an assembly step.
2. **c6_descriptor_index + c6_tile_store** — descriptor faiss index +
tile cache storage. AZ-306 + AZ-303 already built the runtime classes.
3. **c7_inference engine** — GPU model load. PyTorch FP16 vs. TensorRT
selected by config; `BUILD_TENSORRT_RUNTIME` / `BUILD_PYTORCH_FP16_RUNTIME`
env flags gate the import path.
4. **c3_lightglue_runtime + c3_feature_extractor** — ALIKED / DISK
LightGlue. Gated by `BUILD_C3_MATCHER_DISK_LIGHTGLUE` /
`BUILD_C3_MATCHER_ALIKED_LIGHTGLUE` env flags.
5. **c282_ransac_filter** — small, stateless OpenCV-USAC wrapper.
6. **c5 helpers**: `c5_imu_preintegrator`, `c5_se3_utils`,
`c5_wgs_converter`, `c5_isam2_graph_handle`. All four are already-done
helpers; the builder is pure assembly.
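One way to make the phase ordering above explicit in code is a fixed phase table that the assembly loop walks in order. The sketch below is hypothetical: `PHASES`, `build_in_phase_order`, and the builder signature (each builder receives everything built so far) are illustrative names, and only the grouping of keys per phase is taken from the list above.

```python
from typing import Any, Callable

# Phase order and key grouping copied from the six phases listed above.
PHASES: list[tuple[str, tuple[str, ...]]] = [
    ("c13_fdr + clock", ("c13_fdr", "clock")),
    ("c6", ("c6_descriptor_index", "c6_tile_store")),
    ("c7_inference", ("c7_inference",)),
    ("c3", ("c3_lightglue_runtime", "c3_feature_extractor")),
    ("c282_ransac_filter", ("c282_ransac_filter",)),
    ("c5 helpers", ("c5_imu_preintegrator", "c5_se3_utils",
                    "c5_wgs_converter", "c5_isam2_graph_handle")),
]

# Hypothetical builder shape: takes the partially built dict, returns the object.
Builder = Callable[[dict[str, Any]], Any]


def build_in_phase_order(builders: dict[str, Builder]) -> dict[str, Any]:
    """Run one builder per key, phase by phase, in the order of PHASES."""
    constructed: dict[str, Any] = {}
    for _phase_name, keys in PHASES:
        for key in keys:
            # Each builder can read anything produced by earlier phases.
            constructed[key] = builders[key](constructed)
    return constructed


# Stub builders that record how many objects existed when they ran.
all_keys = [key for _name, keys in PHASES for key in keys]
result = build_in_phase_order({k: (lambda built: len(built)) for k in all_keys})
```

Because Python dicts preserve insertion order, `list(result)` reflects the phase order, which gives a unit test a cheap way to pin it.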
### Excluded
- Changing the per-component helper / strategy factory signatures. Each
builder consumes the existing factory's documented surface (e.g.
`make_fdr_client(...)`, `build_inference_runtime(config, ...)`); no
changes to those signatures are in scope.
- GPU build-flag matrix expansion. The `BUILD_*` env flag system is already
in place per component (`config.components.*.strategy`); this task only
consumes the existing flags. New flags are out of scope.
- Operator binary (`operator_bootstrap.py`) extensions. AZ-591 deferred the
operator-side pre_constructed assembly; this task is airborne-only.
Operator binary's current direct-factory path is not affected.
- Replay-branch wiring beyond what already exists. Replay continues to
supply `frame_source` / `fc_adapter` / `clock` / `mavlink_transport` /
`replay_sink` via `build_replay_components`; this task adds the
airborne-side keys on top of that set in the same `pre_constructed` dict.
- Refactor of `airborne_bootstrap.py`'s wrapper-factory layer. The existing
`_c1_vio_wrapper`, `_c2_vpr_wrapper`, etc. functions consume `constructed`
correctly today; only the dict-population layer is new.
## Acceptance Criteria
**AC-1: `build_pre_constructed(config)` populates every required key**
Given a process where `register_airborne_strategies()` has run
And a `Config` selecting every component's default strategy
When `build_pre_constructed(config)` is called
Then the returned dict contains exactly the set of keys
`set.union(*AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS.values())`
And no key maps to `None`.
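A minimal sketch of the AC-1 assertion, assuming `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` maps each component slug to a `set` of keys (which is what `set.union(*....values())` implies). The mapping contents below are illustrative only; the real definition lives in `airborne_bootstrap.py`, and the overlap on `c13_fdr` is invented to show why the union is needed.

```python
from typing import Any

# Illustrative contents, NOT the real mapping. Values must be plain sets:
# set.union(*values) rejects a frozenset as its first argument.
AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS: dict[str, set[str]] = {
    "c1_vio": {"c13_fdr"},
    "c2_vpr": {"c7_inference", "c13_fdr"},  # hypothetical overlap
    "c4_pose": {"c282_ransac_filter"},
}


def check_ac1(pre_constructed: dict[str, Any]) -> None:
    """Assert the dict has exactly the required keys and no None values."""
    required = set.union(*AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS.values())
    actual = set(pre_constructed)
    # Symmetric difference lists both missing and unexpected keys on failure.
    assert actual == required, sorted(required ^ actual)
    none_keys = [key for key, value in pre_constructed.items() if value is None]
    assert not none_keys, none_keys
```

In a real test the argument would be the return value of `build_pre_constructed(config)` for the default config.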
**AC-2: `compose_root(config, pre_constructed=...)` reaches takeoff**
Given `register_airborne_strategies()` has run
And `pre_constructed = build_pre_constructed(config)` for the default config
When `compose_root(config, pre_constructed=pre_constructed)` runs
Then it returns a `RuntimeRoot` whose `components` dict contains all 7
registered slots (c1_vio, c2_vpr, c2_5_rerank, c3_matcher, c3_5_adhop,
c4_pose, c5_state) without raising `AirborneBootstrapError`.
**AC-3: `BUILD_*` flag mismatch surfaces a clear error**
Given the config selects `c2_vpr.strategy="net_vlad"` (requires `c7_inference`)
And `BUILD_PYTORCH_FP16_RUNTIME=OFF`
When `build_pre_constructed(config)` is called
Then it raises `AirborneBootstrapError` whose message names both
`c7_inference` (the missing infrastructure) and the gating
`BUILD_*` flag.
**AC-4: `runtime_root.main()` end-to-end exit codes**
Given a minimal in-process `Config` that selects all-defaults
When `main(config)` is called with every `BUILD_*` flag the defaults need
Then it returns `0` (success) and the runtime_root constructed log line
fires.
And when a single required infra dep is forcibly unavailable
Then it returns `EXIT_GENERIC_FAILURE` (`1`) and stderr contains the
`airborne_bootstrap:` prefix with the missing key and consuming component.
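The exit-code mapping in AC-4 can be sketched with the build and compose steps injected as callables. The injection is an assumption made so the example runs standalone; the real `main()` calls `build_pre_constructed` and `compose_root` directly, and `register_airborne_strategies()` is omitted here. `EXIT_GENERIC_FAILURE = 1` is the value stated above.

```python
import io
import sys
from typing import Any, Callable, TextIO

EXIT_GENERIC_FAILURE = 1  # Value stated in AC-4.


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the real exception type."""


def main(
    config: Any,
    *,
    build: Callable[[Any], dict[str, Any]],
    compose: Callable[..., Any],
    stderr: TextIO = sys.stderr,
) -> int:
    """Build infrastructure, compose the runtime, map failures to exit codes."""
    try:
        pre_constructed = build(config)
        compose(config, pre_constructed=pre_constructed)
    except AirborneBootstrapError as exc:
        # Operator-facing line on stderr instead of an implicit traceback.
        print(exc, file=stderr)
        return EXIT_GENERIC_FAILURE
    return 0


def _failing_build(config: Any) -> dict[str, Any]:
    raise AirborneBootstrapError(
        "airborne_bootstrap: component 'c1_vio' requires pre_constructed['c13_fdr']"
    )


captured = io.StringIO()
ok_code = main({}, build=lambda cfg: {"c13_fdr": object()},
               compose=lambda cfg, pre_constructed: None, stderr=captured)
fail_code = main({}, build=_failing_build,
                 compose=lambda cfg, pre_constructed: None, stderr=captured)
```

Catching `AirborneBootstrapError` by name, rather than relying on a broad `RuntimeError` handler, keeps the operator-facing message separate from unrelated runtime failures.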
**AC-5: Jetson tier-2 e2e replay tests cross compose_root.ready**
Given the AZ-618 changes are landed
When the Jetson tier-2 e2e harness is invoked
(`tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match`
+ AC-2 + AC-5 + AC-6 ×2)
Then each test progresses BEYOND `replay.compose_root.ready` (cross-cycle
smoke: log appears in stdout AND the per-frame pipeline log
`replay.input.frame_emitted` fires at least once per test).
This AC verifies the airborne wiring is correct end-to-end; whether
the per-frame results pass each AC's substantive threshold (count
match, schema match, determinism, pace) is gated by other tasks and
not blocking this AC.
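The smoke condition in AC-5 reduces to a check over captured stdout. The helper below is a hypothetical sketch: the function name is invented, and requiring the frame log to appear after the ready line is a slightly stricter reading of "BEYOND" than the AC text strictly demands.

```python
def crossed_compose_root(stdout: str) -> bool:
    """True if compose_root.ready was logged and a frame was emitted after it."""
    lines = stdout.splitlines()
    ready_at = next(
        (i for i, line in enumerate(lines) if "replay.compose_root.ready" in line),
        None,
    )
    if ready_at is None:
        return False  # Never reached the composition boundary.
    # At least one per-frame log line must follow the ready marker.
    return any("replay.input.frame_emitted" in line for line in lines[ready_at + 1:])
```

Each of the five e2e tests could call this on its captured process output before applying its own substantive thresholds.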
## Non-Functional Requirements
- **Startup time**: `build_pre_constructed(config)` must complete within
60 s on Jetson Orin Nano (JetPack 6.2.2+b24) for the default config.
GPU model load + TensorRT engine cache compilation dominate; a cold
engine cache may push the build past 60 s, in which case log a one-line
progress notice at the 30 s mark.
- **Memory**: peak resident set after `build_pre_constructed` must be
< 2 GB on Jetson (excluding the inference model itself; the model is
separately bounded by AZ-320's NFRs).
- **Determinism**: invoking `build_pre_constructed(config)` twice in the
same process MUST produce equivalent dicts (every key present, every
builder callable). Re-invocation is not expected in production but
IS expected in tests; the second call must not raise on already-loaded
GPU resources.
- **Operator-facing error contract**: every `AirborneBootstrapError`
message MUST include (a) the consuming component slug, (b) the
missing dependency key or `BUILD_*` flag, and (c) one actionable
sentence pointing at the fix (e.g. "set `BUILD_C3_MATCHER_DISK_LIGHTGLUE=ON`"
or "ensure `c13_fdr.path` is writable").
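The three-part error contract in the last requirement above could be enforced at construction time rather than by convention. The sketch below is one hypothetical way to do that: `BootstrapFailure` and its `render()` format are illustrative, and pairing `c3_matcher` with `BUILD_C3_MATCHER_DISK_LIGHTGLUE` in the usage lines is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapFailure:
    """Carries the three fields every AirborneBootstrapError message needs."""

    component: str  # (a) consuming component slug
    missing: str    # (b) missing dependency key or BUILD_* flag
    fix: str        # (c) one actionable sentence

    def __post_init__(self) -> None:
        # Refuse to build a message that omits any part of the contract.
        for name in ("component", "missing", "fix"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")

    def render(self) -> str:
        """Format the operator-facing message with the airborne_bootstrap prefix."""
        return (
            f"airborne_bootstrap: component {self.component!r} "
            f"requires {self.missing}. {self.fix.rstrip('.')}."
        )


text = BootstrapFailure(
    component="c3_matcher",
    missing="BUILD_C3_MATCHER_DISK_LIGHTGLUE",
    fix="Set BUILD_C3_MATCHER_DISK_LIGHTGLUE=ON",
).render()
```

A test for AC-3 or AC-4 can then assert on the three fields instead of pattern-matching free-form strings.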
## Dependencies
- AZ-591 (registry registration prerequisite) — DONE
- All component runtime classes/factories listed under the **Dependencies**
field above — DONE per individual task IDs
## Constraints
- This task MUST NOT touch any per-component factory signature. All
changes are confined to `runtime_root/airborne_bootstrap.py`,
`runtime_root/__init__.py`, and the new test file.
- This task MUST NOT introduce new `BUILD_*` env flags. Reuse the
existing per-strategy `BUILD_*` matrix already gated by each
component's strategy factory.
- Do not stub or mock the inference engine in production code. The
`c7_inference` builder MUST exercise the real (PyTorch FP16 or
TensorRT) runtime when called from `main()`. Tests MAY stub it
via `build_pre_constructed` mock seams documented in the new test
file.
## Implementation Notes
- 6-phase internal split (see Scope.Included). Phases land in
dependency order; AC tests for each phase live with the phase
but the full AC-1..AC-4 suite only goes green after phase 6.
- The Jetson-only AC-5 cannot be run from the Mac dev host. The
task is "done" when AC-1..AC-4 pass locally + AC-5 passes on
the operator's Jetson per `scripts/run-tests-jetson.sh`.
- AZ-591's task spec called out this exact follow-up (see its
`AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` docstring: *"production
wiring populates them from the takeoff orchestrator — separate
task — AZ-591 follow-up infrastructure-prep"*). This is that task.
## Evidence
- Step-11 Cycle-3 addendum: `_docs/03_implementation/run_tests_step11_report.md`
(committed `e054a55`)
- Jetson tier-2 e2e rerun #3 terminal output:
`/Users/obezdienie001/.cursor/projects/Users-obezdienie001-dev-azaion-suite-gps-denied-onboard/terminals/110515.txt`
(2026-05-18 06:01 UTC, log lines for `replay.compose_root.ready` +
`airborne_bootstrap` raise).
- `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` definition site:
`src/gps_denied_onboard/runtime_root/airborne_bootstrap.py:92`.
- Current incomplete `main()`: `src/gps_denied_onboard/runtime_root/__init__.py:636`.
+6 -6
@@ -2,13 +2,13 @@
 ## Current Step
 flow: greenfield
-step: 11
-name: Run Tests
-status: passed_with_followups
+step: 7
+name: Implement
+status: not_started
 sub_step:
-phase: 8
-name: az614-az611-landed-bootstrap-gap-discovered
-detail: "AZ-614 + AZ-611 + AZ-602 build-flags + AZ-615 tilde-fix all landed (commits e114bfd, bd41956, 324bbd6, b7012d2). Jetson Cycle-3 rerun (terminal 110515.txt): replay path now reaches `replay.compose_root.ready: auto_sync_used=false`, then crashes in `runtime_root.airborne_bootstrap` with `pre_constructed['c282_ransac_filter']` missing. Same 5 heavy ACs still fail but 3 layers deeper — `runtime_root.main()` calls `register_airborne_strategies()` but does NOT build c13_fdr/c6_*/c7_inference/c3_*/c2_82_ransac_filter into pre_constructed. Filed AZ-618 (Story under AZ-602, 5 pts capped). Pending user decision on whether to start AZ-618 immediately or close out Step 11 with the current Reality-Gate signal."
+phase: 0
+name: awaiting-invocation
+detail: "AZ-618 task spec in todo/ (Step 11 gate sent flow back per greenfield rule: missing internal product implementation = back to Implement)"
 retry_count: 0
 cycle: 1
 tracker: jira
@@ -1,7 +1,7 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-17T16:23+03:00 (Europe/Kyiv) — PyPI still shows
+**Last replay attempt**: 2026-05-18T20:35+03:00 (Europe/Kyiv) — PyPI still shows
 `gtsam==4.2.1` as the latest stable (`requires_dist: numpy<2.0.0,>=1.11.0`);
 `gtsam==4.3a0` alpha exists but is not a stable wheel target. Replay condition
 (numpy>=2 stable wheels) still NOT met. Leftover remains open.