mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 19:01:14 +00:00
[AZ-618] Task spec + autodev rewind to Step 7
Step 11 gate failed per greenfield rule: 5 e2e ACs reach `replay.compose_root.ready` and then crash inside runtime_root.airborne_bootstrap on the first pre_constructed lookup. That is "missing internal product implementation", which the gate description routes back to Implement.

* Task spec AZ-618 (255 lines, 5 pts, 6-phase internal split, AC-1..AC-5) parked in _docs/02_tasks/todo/. Phases land in dependency order: c13_fdr+clock -> c6_* -> c7_inference -> c3_lightglue+features -> c282_ransac_filter -> c5 helpers.
* Autodev state: step 7 (Implement), status not_started, sub_step awaiting-invocation, cycle 1. retry_count = 0.
* Leftover D-CROSS-CVE-1: replay attempted, still deferred (gtsam 4.2.1 on PyPI still pins numpy<2.0.0); timestamp bumped to 2026-05-18T20:35+03:00.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,255 @@
# AZ-618: Airborne main() builds pre_constructed infrastructure for compose_root

**Task**: AZ-618_airborne_bootstrap_pre_constructed
**Name**: Airborne bootstrap pre_constructed assembly (cross-cutting Tier-1)
**Description**: Land an `airborne_bootstrap.build_pre_constructed(config) -> dict[str, Any]` function (or equivalent in-`main()` wiring) that constructs every infrastructure object the registered airborne-strategy wrappers require, and call `compose_root(config, pre_constructed=...)` with the result from `runtime_root.main()`. Without this, `compose_root()` raises `AirborneBootstrapError` on the first wrapper lookup (`c1_vio` reaches for `pre_constructed['c13_fdr']` and finds nothing) and the binary cannot reach takeoff.
**Complexity**: 5 points (cross-cutting; touches up to 12 infrastructure slots, but each slot reuses an existing per-component builder; GPU init for `c7_inference` + `c3_lightglue_runtime` + `c3_feature_extractor` is the only genuinely new wiring)
**Dependencies**: AZ-591 (registry registration is the prerequisite; without it the wrappers do not run at all). Helper / runtime classes consumed by the wrappers are all already in `done/` per their own task IDs (c13_fdr → AZ-273+, c6_descriptor_index → AZ-306, c6_tile_store → AZ-303+, c7_inference → AZ-320+, c3_lightglue_runtime + c3_feature_extractor → AZ-278+, c2_82_ransac_filter → AZ-358, c5_imu_preintegrator → AZ-276, c5_se3_utils → AZ-277, c5_wgs_converter → AZ-284, c5_isam2_graph_handle → AZ-381).
**Component**: runtime_root (cross-cutting)
**Tracker**: AZ-618
**Epic**: AZ-602 (E2E Tier-1 harness rehabilitation; parent set during ticket creation)

## Problem

Step 11 (Run Tests) cycle 1 Jetson tier-2 e2e rerun #3 surfaced this gap. With AZ-614 (synth time-base) + AZ-611 (skip-auto-sync) + AZ-602 (compose `BUILD_*` flag completeness) all landed, the Derkachi 1-min replay path now passes every layer up to and including:

```
replay.compose_root.ready: pace=asap resolved_offset_ms=0 auto_sync_used=false
```

…then crashes inside `runtime_root.airborne_bootstrap._require`:

```
runtime_root: airborne_bootstrap: component 'c4_pose' requires
pre_constructed['c282_ransac_filter'] to be populated before compose_root() runs;
available keys in constructed: ['clock', 'fc_adapter', 'frame_source',
'mavlink_transport', 'replay_sink'].
Production main() must build infrastructure (c13_fdr, c6_*, c7_inference, etc.)
into pre_constructed and pass it to compose_root(config, pre_constructed=...).
Tests stub it via the same kwarg.
```

**Cause**: `runtime_root.main()` (`src/gps_denied_onboard/runtime_root/__init__.py:636`) calls `register_airborne_strategies()` (which registers the wrapper factories, the AZ-591 work) and then `compose_root(config)` with **no** `pre_constructed=`. The wrappers' lookups, such as `_require(constructed, "c13_fdr", "c1_vio")`, raise on the first missing key because the dict holds none of the airborne infrastructure objects: at most the five replay-supplied keys listed in the error above.

**Why hidden until now**: every prior Reality-Gate run died at auto-sync (AZ-614 root cause, 2026-05-17) BEFORE the composition graph was walked. AZ-591 was self-described as registering the "registry seam" and explicitly deferred the `pre_constructed` assembly to a follow-up. That follow-up is this task.

**Why both binaries are affected**: the live `gps-denied-onboard` binary would crash at the same lookup the moment any component reaches into `pre_constructed`. Existing unit tests for `compose_root` (`tests/unit/test_az401_compose_root_replay.py`, 38 passing) pass only because they inject a stub via the `replay_components_factory` kwarg, bypassing the registry-driven path entirely. There is currently no test that exercises the production assembly.

## Outcome

- `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py` exposes a new
  public `build_pre_constructed(config: Config) -> dict[str, Any]` that returns
  a dict populated with every key in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`
  (12 distinct infrastructure objects: `c13_fdr`, `c6_descriptor_index`,
  `c6_tile_store`, `c7_inference`, `c3_lightglue_runtime`,
  `c3_feature_extractor`, `c282_ransac_filter`, `c5_wgs_converter`,
  `c5_se3_utils`, `c5_isam2_graph_handle`, `c5_imu_preintegrator`, `clock`).
  GPU-touching builders (`c7_inference`, `c3_lightglue_runtime`,
  `c3_feature_extractor`) are gated by their existing `BUILD_*` env flags;
  when a flag is OFF, the builder either skips (if the matching component
  strategy is not selected by config) or raises a clear operator-facing error
  naming the missing flag.

- `src/gps_denied_onboard/runtime_root/__init__.py::main()` calls
  `register_airborne_strategies()` followed by
  `pre_constructed = build_pre_constructed(config)` and then
  `compose_root(config, pre_constructed=pre_constructed)`. The
  `EXIT_FDR_OPEN_FAILURE` path already covers FDR open failures; this task
  extends the existing `RuntimeError` catch to surface
  `AirborneBootstrapError` with a clear operator-facing message rather than
  the current implicit traceback.

- New unit tests under `tests/unit/runtime_root/test_az618_pre_constructed.py`
  verify:
  - AC-1: `build_pre_constructed(config)` returns a dict containing every key
    in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` flattened (no duplicates).
  - AC-2: A config that selects every default strategy completes
    `compose_root(config, pre_constructed=build_pre_constructed(config))`
    without raising. (Heavy infrastructure objects may be stubbed via the
    existing `_BUILD_*` env flags; the test asserts the seam, not the
    runtime.)
  - AC-3: When a required `BUILD_*` flag is OFF but the matching component
    strategy IS selected by config, the builder raises a clear error naming
    both the missing flag and the consuming component slug.
  - AC-4: `runtime_root.main()` end-to-end on a minimal config returns 0
    (success) when all `BUILD_*` flags + infra deps resolve; returns
    `EXIT_GENERIC_FAILURE` with the `AirborneBootstrapError` message in
    stderr when a required infra dep cannot be constructed.

- Existing Jetson tier-2 e2e replay tests
  (`tests/e2e/replay/test_derkachi_1min.py`) cross the
  `replay.compose_root.ready` log boundary and reach the per-frame inference
  loop. The 5 currently-failing ACs (AC-1, AC-2, AC-5, AC-6 × 2) advance to
  exercising C1..C8 end-to-end on the GPU. At that point any remaining
  failure is a different, deeper class of bug and out of scope for this task.

## Scope

### Included

- New / refactored module: `runtime_root/airborne_bootstrap.py`, with a
  `build_pre_constructed(config)` function with one internal builder per
  required key. Builders reuse existing helper / strategy factories (no new
  infrastructure logic, only assembly).

- `runtime_root/__init__.py::main()` modification: insert
  `build_pre_constructed(config)` call between `register_airborne_strategies()`
  and `compose_root(config, ...)`. Add `AirborneBootstrapError` to the
  exception block so it surfaces with `EXIT_GENERIC_FAILURE` and a clear
  operator-facing message.

- New unit tests: `tests/unit/runtime_root/test_az618_pre_constructed.py`
  covering AC-1..AC-4.

- 6 internal phases: each phase is one source-file delta + matching unit
  test, and they may be batched but MUST land in dependency order:

  1. **c13_fdr + clock**: foundational. The FDR client plus the WallClock
     helper (live) or TlogDerivedClock reuse (replay). Both already exist;
     the builder is an assembly step.
  2. **c6_descriptor_index + c6_tile_store**: descriptor faiss index +
     tile cache storage. AZ-306 + AZ-303 already built the runtime classes.
  3. **c7_inference engine**: GPU model load. PyTorch FP16 vs. TensorRT is
     selected by config; `BUILD_TENSORRT_RUNTIME` / `BUILD_PYTORCH_FP16_RUNTIME`
     env flags gate the import path.
  4. **c3_lightglue_runtime + c3_feature_extractor**: ALIKED / DISK
     LightGlue. Gated by `BUILD_C3_MATCHER_DISK_LIGHTGLUE` /
     `BUILD_C3_MATCHER_ALIKED_LIGHTGLUE` env flags.
  5. **c282_ransac_filter**: small, stateless OpenCV-USAC wrapper.
  6. **c5 helpers**: `c5_imu_preintegrator`, `c5_se3_utils`,
     `c5_wgs_converter`, `c5_isam2_graph_handle`. All four are already-done
     helpers; the builder is pure assembly.
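
The phase ordering above can be expressed as data, so that the assembly loop cannot run a builder before the phases it depends on. This is a minimal sketch, not the project's implementation: the `Builder` signature, `PHASES`, `build_in_phase_order`, and the stub builders are all hypothetical; a real builder would call the component's existing factory.

```python
from typing import Any, Callable

# Hypothetical builder signature: receives everything built so far.
Builder = Callable[[dict[str, Any]], Any]

# One entry per phase, in the dependency order the spec prescribes.
PHASES: list[tuple[str, list[str]]] = [
    ("1: c13_fdr + clock", ["c13_fdr", "clock"]),
    ("2: c6", ["c6_descriptor_index", "c6_tile_store"]),
    ("3: c7", ["c7_inference"]),
    ("4: c3", ["c3_lightglue_runtime", "c3_feature_extractor"]),
    ("5: c282", ["c282_ransac_filter"]),
    ("6: c5 helpers", ["c5_imu_preintegrator", "c5_se3_utils",
                       "c5_wgs_converter", "c5_isam2_graph_handle"]),
]


def build_in_phase_order(builders: dict[str, Builder]) -> dict[str, Any]:
    """Run every builder phase by phase, passing along what is already built."""
    built: dict[str, Any] = {}
    for _phase_name, keys in PHASES:
        for key in keys:
            built[key] = builders[key](built)
    return built


def _stub(name: str) -> Builder:
    """Placeholder builder that records which keys existed when it ran."""
    return lambda built: (name, tuple(built))


stub_builders = {key: _stub(key) for _, keys in PHASES for key in keys}
result = build_in_phase_order(stub_builders)
```

With the stub builders, `result` holds the 12 keys in phase order, and each stub's recorded tuple shows which earlier objects were available to it.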

### Excluded

- Changing the per-component helper / strategy factory signatures. Each
  builder consumes the existing factory's documented surface (e.g.
  `make_fdr_client(...)`, `build_inference_runtime(config, ...)`); no
  changes to those signatures are in scope.
- GPU build-flag matrix expansion. The `BUILD_*` env flag system is already
  in place per component (`config.components.*.strategy`); this task only
  consumes the existing flags. New flags are out of scope.
- Operator binary (`operator_bootstrap.py`) extensions. AZ-591 deferred the
  operator-side pre_constructed assembly; this task is airborne-only.
  Operator binary's current direct-factory path is not affected.
- Replay-branch wiring beyond what already exists. Replay continues to
  supply `frame_source` / `fc_adapter` / `clock` / `mavlink_transport` /
  `replay_sink` via `build_replay_components`; this task adds the
  airborne-side keys in addition to that set, in the same `pre_constructed`
  dict.
- Refactor of `airborne_bootstrap.py`'s wrapper-factory layer. The existing
  `_c1_vio_wrapper`, `_c2_vpr_wrapper`, etc. functions consume `constructed`
  correctly today; only the dict-population layer is new.
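
Because `clock` appears both in the replay-supplied set and in the 12 airborne keys, the two sources have to be combined with a defined precedence. The sketch below assumes the replay-supplied object wins on overlap (consistent with the "TlogDerivedClock reuse (replay)" note in phase 1, but not confirmed by the spec); `layer_airborne_keys` and the builder mapping are hypothetical names.

```python
from typing import Any, Callable, Mapping


def layer_airborne_keys(
    replay_components: Mapping[str, Any],
    airborne_builders: Mapping[str, Callable[[], Any]],
) -> dict[str, Any]:
    """Start from replay-supplied objects and build only the keys still missing."""
    pre_constructed = dict(replay_components)
    for key, build in airborne_builders.items():
        if key not in pre_constructed:  # e.g. replay already supplies `clock`
            pre_constructed[key] = build()
    return pre_constructed


# Illustrative values only; real objects come from the existing factories.
replay = {"clock": "tlog_derived_clock", "frame_source": "replay_frames"}
builders = {"clock": lambda: "wall_clock", "c13_fdr": lambda: "fdr_client"}
merged = layer_airborne_keys(replay, builders)
```

On the live path `replay_components` would be empty, so every airborne builder runs and `clock` comes from the airborne side.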

## Acceptance Criteria

**AC-1: `build_pre_constructed(config)` populates every required key**
Given a process where `register_airborne_strategies()` has run
And a `Config` selecting every component's default strategy
When `build_pre_constructed(config)` is called
Then the returned dict contains exactly the set of keys
`set.union(*AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS.values())`
And no key maps to `None`.
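
A check for AC-1 could look like the sketch below. The shape of `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` is an assumption here (component slug mapped to a collection of key names), and `REQUIRED_EXAMPLE` is illustrative data, not the real table. The sketch flattens with `set().union(...)` because that form accepts any iterable values, whereas the unbound `set.union(...)` form needs its first argument to be an actual `set`.

```python
from typing import Any, Iterable, Mapping

# Illustrative stand-in for AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS.
# The per-component assignments are examples, not the real table.
REQUIRED_EXAMPLE: Mapping[str, Iterable[str]] = {
    "c1_vio": frozenset({"c13_fdr", "clock"}),
    "c4_pose": frozenset({"c282_ransac_filter", "c13_fdr"}),
}


def expected_keys(required: Mapping[str, Iterable[str]]) -> set[str]:
    """Flatten the per-component requirements into one de-duplicated key set."""
    return set().union(*required.values())


def check_pre_constructed(
    pre_constructed: Mapping[str, Any],
    required: Mapping[str, Iterable[str]],
) -> list[str]:
    """Return human-readable problems; an empty list means AC-1 holds."""
    expected = expected_keys(required)
    actual = set(pre_constructed)
    problems: list[str] = []
    if expected - actual:
        problems.append(f"missing keys: {sorted(expected - actual)}")
    if actual - expected:
        problems.append(f"unexpected keys: {sorted(actual - expected)}")
    none_keys = sorted(k for k, v in pre_constructed.items() if v is None)
    if none_keys:
        problems.append(f"keys mapped to None: {none_keys}")
    return problems
```

A unit test can then assert `check_pre_constructed(build_pre_constructed(config), AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS) == []` and get a readable failure message for free.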

**AC-2: `compose_root(config, pre_constructed=...)` reaches takeoff**
Given `register_airborne_strategies()` has run
And `pre_constructed = build_pre_constructed(config)` for the default config
When `compose_root(config, pre_constructed=pre_constructed)` runs
Then it returns a `RuntimeRoot` whose `components` dict contains all 7
registered slots (c1_vio, c2_vpr, c2_5_rerank, c3_matcher, c3_5_adhop,
c4_pose, c5_state) without raising `AirborneBootstrapError`.

**AC-3: `BUILD_*` flag mismatch surfaces a clear error**
Given the config selects `c2_vpr.strategy="net_vlad"` (requires `c7_inference`)
And `BUILD_PYTORCH_FP16_RUNTIME=OFF`
When `build_pre_constructed(config)` is called
Then it raises `AirborneBootstrapError` whose message names both
`c7_inference` (the missing infrastructure) and the gating
`BUILD_*` flag.
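
The skip-or-raise rule for flag-gated builders can be sketched as one small decision function. This assumes flags are read from the environment as the strings `ON` / `OFF` (as the examples in this spec suggest) and that an unset flag counts as OFF; `should_build` and its parameters are hypothetical, and the error class is a stand-in for the project's own.

```python
import os
from typing import Mapping, Optional, Sequence


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the project's bootstrap error (hypothetical definition)."""


def should_build(
    key: str,
    flag: str,
    consumers: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether a flag-gated builder runs.

    consumers: slugs of the config-selected components that need `key`.
    Returns False when nothing selected needs the object (skip), True when
    the flag is ON, and raises when a consumer exists but the flag is OFF.
    """
    env = os.environ if env is None else env
    if not consumers:
        return False
    if env.get(flag, "OFF").strip().upper() != "ON":  # unset treated as OFF (assumption)
        raise AirborneBootstrapError(
            f"airborne_bootstrap: component {consumers[0]!r} requires "
            f"pre_constructed[{key!r}], but {flag} is OFF; "
            f"set {flag}=ON or select a strategy that does not need {key!r}."
        )
    return True
```

For the AC-3 scenario, `should_build("c7_inference", "BUILD_PYTORCH_FP16_RUNTIME", ["c2_vpr"], {"BUILD_PYTORCH_FP16_RUNTIME": "OFF"})` raises with the key, the flag, and the consuming slug all present in the message.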

**AC-4: `runtime_root.main()` end-to-end exit codes**
Given a minimal in-process `Config` that selects all-defaults
When `main(config)` is called with every `BUILD_*` flag the defaults need
Then it returns `0` (success) and the runtime_root constructed log line
fires.
And when a single required infra dep is forcibly unavailable
Then it returns `EXIT_GENERIC_FAILURE` (`1`) and stderr contains the
`airborne_bootstrap:` prefix with the missing key and consuming component.
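
The exit-code contract in AC-4 can be illustrated with a stripped-down `main()`. In this sketch the three collaborators are injected as parameters so the example runs on its own; the real `main()` calls the project's `register_airborne_strategies`, `build_pre_constructed`, and `compose_root` directly, and its existing `RuntimeError` and `EXIT_FDR_OPEN_FAILURE` handling is omitted here.

```python
import io
import sys
from typing import Any, Callable, Optional, TextIO

EXIT_OK = 0
EXIT_GENERIC_FAILURE = 1  # value as stated in AC-4


class AirborneBootstrapError(RuntimeError):
    """Stand-in for the project's bootstrap error (hypothetical definition)."""


def main(
    config: Any,
    *,
    register: Callable[[], None],
    build_pre_constructed: Callable[[Any], dict],
    compose_root: Callable[..., Any],
    stderr: Optional[TextIO] = None,
) -> int:
    """Register strategies, assemble infrastructure, compose; map errors to exit codes."""
    stderr = sys.stderr if stderr is None else stderr
    try:
        register()
        pre_constructed = build_pre_constructed(config)
        compose_root(config, pre_constructed=pre_constructed)
    except AirborneBootstrapError as exc:
        # Operator-facing one-liner instead of an implicit traceback.
        print(f"runtime_root: {exc}", file=stderr)
        return EXIT_GENERIC_FAILURE
    return EXIT_OK


def _failing_build(config: Any) -> dict:
    raise AirborneBootstrapError(
        "airborne_bootstrap: component 'c4_pose' requires "
        "pre_constructed['c282_ransac_filter']"
    )


captured = io.StringIO()
ok_code = main({}, register=lambda: None, build_pre_constructed=lambda c: {},
               compose_root=lambda c, pre_constructed: None, stderr=captured)
fail_code = main({}, register=lambda: None, build_pre_constructed=_failing_build,
                 compose_root=lambda c, pre_constructed: None, stderr=captured)
```

Injecting the collaborators is only a device for this example; a unit test against the real `main()` would more likely patch the module-level functions.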

**AC-5: Jetson tier-2 e2e replay tests cross compose_root.ready**
Given the AZ-618 changes are landed
And the Jetson tier-2 e2e harness is invoked
(`tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match`
+ AC-2 + AC-5 + AC-6 ×2)
Then each test progresses BEYOND `replay.compose_root.ready` (cross-cycle
smoke: log appears in stdout AND the per-frame pipeline log
`replay.input.frame_emitted` fires at least once per test).
This AC verifies the airborne wiring is correct end-to-end; whether
the per-frame results pass each AC's substantive threshold (count
match, schema match, determinism, pace) is gated by other tasks and
not blocking this AC.
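
The smoke condition in AC-5 amounts to a two-marker scan of captured stdout. The helper below is a hypothetical sketch; it additionally assumes that the first `replay.input.frame_emitted` line of interest appears after the `replay.compose_root.ready` line, which the AC implies but does not state. The `frame_emitted` payload in the sample is made up for illustration.

```python
READY_MARKER = "replay.compose_root.ready"
FRAME_MARKER = "replay.input.frame_emitted"


def crossed_compose_root(stdout_text: str) -> bool:
    """True if the ready marker appears and a frame marker follows it."""
    lines = stdout_text.splitlines()
    ready_at = next(
        (i for i, line in enumerate(lines) if READY_MARKER in line), None
    )
    if ready_at is None:
        return False
    return any(FRAME_MARKER in line for line in lines[ready_at + 1:])


# Sample output: the first line is quoted from this spec, the second line's
# payload is illustrative only.
sample_pass = (
    "replay.compose_root.ready: pace=asap resolved_offset_ms=0 auto_sync_used=false\n"
    "replay.input.frame_emitted: seq=0\n"
)
sample_fail = (
    "replay.compose_root.ready: pace=asap resolved_offset_ms=0 auto_sync_used=false\n"
    "runtime_root: airborne_bootstrap: component 'c4_pose' requires ...\n"
)
```

The failing sample corresponds to today's behaviour: the ready line is logged, then the bootstrap error ends the run before any frame is emitted.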

## Non-Functional Requirements

- **Startup time**: `build_pre_constructed(config)` must complete within
  60 s on Jetson Orin Nano (JetPack 6.2.2+b24) for the default config.
  GPU model load + TensorRT engine cache compilation dominate. A cold
  engine cache may push the build past 60 s; in that case, log a one-line
  progress notice at 30 s.
- **Memory**: peak resident set after `build_pre_constructed` must be
  < 2 GB on Jetson (excluding the inference model itself; the model is
  separately bounded by AZ-320's NFRs).
- **Determinism**: invoking `build_pre_constructed(config)` twice in the
  same process MUST produce equivalent dicts (every key present, every
  builder callable). Re-invocation is not expected in production but
  IS expected in tests; the second call must not raise on already-loaded
  GPU resources.
- **Operator-facing error contract**: every `AirborneBootstrapError`
  message MUST include (a) the consuming component slug, (b) the
  missing dependency key or `BUILD_*` flag, and (c) one actionable
  sentence pointing at the fix (e.g. "set `BUILD_C3_MATCHER_DISK_LIGHTGLUE=ON`"
  or "ensure `c13_fdr.path` is writable").
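
One way to emit the 30 s progress notice without touching the builders is a timer that fires only if the build is still running. This is a sketch under that reading of the startup-time requirement (notice whenever the build is still in progress at the 30 s mark); `progress_notice` and the logger name are hypothetical.

```python
import logging
import threading
import time
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger("runtime_root.airborne_bootstrap")  # hypothetical name


@contextmanager
def progress_notice(
    delay_s: float,
    message: str,
    emit: Callable[[str], None] = logger.warning,
) -> Iterator[None]:
    """Emit `message` once if the wrapped block is still running after delay_s."""
    timer = threading.Timer(delay_s, emit, args=(message,))
    timer.daemon = True  # never keep the process alive just for the notice
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no-op if the timer already fired


# Intended use (not executed here):
#   with progress_notice(30.0, "airborne_bootstrap: still building (cold engine cache?)"):
#       pre_constructed = build_pre_constructed(config)
```

A fast build cancels the timer before it fires, so the warm-cache path logs nothing extra.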

## Dependencies

- AZ-591 (registry registration prerequisite): DONE
- All component runtime classes/factories listed under the **Dependencies**
  field above: DONE per individual task IDs

## Constraints

- This task MUST NOT touch any per-component factory signature. All
  changes are confined to `runtime_root/airborne_bootstrap.py`,
  `runtime_root/__init__.py`, and the new test file.
- This task MUST NOT introduce new `BUILD_*` env flags. Reuse the
  existing per-strategy `BUILD_*` matrix already gated by each
  component's strategy factory.
- Do not stub or mock the inference engine in production code. The
  `c7_inference` builder MUST exercise the real (PyTorch FP16 or
  TensorRT) runtime when called from `main()`. Tests MAY stub it
  via `build_pre_constructed` mock seams documented in the new test
  file.

## Implementation Notes

- 6-phase internal split (see Scope.Included). Phases land in
  dependency order; AC tests for each phase live with the phase
  but the full AC-1..AC-4 suite only goes green after phase 6.
- The Jetson-only AC-5 cannot be run from the Mac dev host. The
  task is "done" when AC-1..AC-4 pass locally + AC-5 passes on
  the operator's Jetson per `scripts/run-tests-jetson.sh`.
- AZ-591's task spec called out this exact follow-up (see its
  `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` docstring: *"production
  wiring populates them from the takeoff orchestrator — separate
  task — AZ-591 follow-up infrastructure-prep"*). This is that task.

## Evidence

- Step-11 Cycle-3 addendum: `_docs/03_implementation/run_tests_step11_report.md`
  (committed `e054a55`)
- Jetson tier-2 e2e rerun #3 terminal output:
  `/Users/obezdienie001/.cursor/projects/Users-obezdienie001-dev-azaion-suite-gps-denied-onboard/terminals/110515.txt`
  (2026-05-18 06:01 UTC, log lines for `replay.compose_root.ready` +
  `airborne_bootstrap` raise).
- `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS` definition site:
  `src/gps_denied_onboard/runtime_root/airborne_bootstrap.py:92`.
- Current incomplete `main()`: `src/gps_denied_onboard/runtime_root/__init__.py:636`.
@@ -2,13 +2,13 @@
 ## Current Step
 flow: greenfield
-step: 11
+step: 7
-name: Run Tests
+name: Implement
-status: passed_with_followups
+status: not_started
 sub_step:
-phase: 8
+phase: 0
-name: az614-az611-landed-bootstrap-gap-discovered
+name: awaiting-invocation
-detail: "AZ-614 + AZ-611 + AZ-602 build-flags + AZ-615 tilde-fix all landed (commits e114bfd, bd41956, 324bbd6, b7012d2). Jetson Cycle-3 rerun (terminal 110515.txt): replay path now reaches `replay.compose_root.ready: auto_sync_used=false`, then crashes in `runtime_root.airborne_bootstrap` with `pre_constructed['c282_ransac_filter']` missing. Same 5 heavy ACs still fail but 3 layers deeper — `runtime_root.main()` calls `register_airborne_strategies()` but does NOT build c13_fdr/c6_*/c7_inference/c3_*/c2_82_ransac_filter into pre_constructed. Filed AZ-618 (Story under AZ-602, 5 pts capped). Pending user decision on whether to start AZ-618 immediately or close out Step 11 with the current Reality-Gate signal."
+detail: "AZ-618 task spec in todo/ (Step 11 gate sent flow back per greenfield rule: missing internal product implementation = back to Implement)"
 retry_count: 0
 cycle: 1
 tracker: jira
@@ -1,7 +1,7 @@
 # D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block

 **Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-17T16:23+03:00 (Europe/Kyiv) — PyPI still shows
+**Last replay attempt**: 2026-05-18T20:35+03:00 (Europe/Kyiv) — PyPI still shows
 `gtsam==4.2.1` as the latest stable (`requires_dist: numpy<2.0.0,>=1.11.0`);
 `gtsam==4.3a0` alpha exists but is not a stable wheel target. Replay condition
 (numpy>=2 stable wheels) still NOT met. Leftover remains open.