Imports AZ-943 (OKVIS2 binding: real ThreadedSlam wiring; AZ-592 split
1/3, 5pt) from Jira into a local task spec at
_docs/02_tasks/todo/AZ-943_okvis2_threadedslam_binding.md so the
implement skill batch loop has the input it needs.
Dependency table: +AZ-943 row, +preamble entry, totals 180→181 tasks /
570→575 SP. AZ-944 + AZ-945 stay Jira-only this session per the
AZ-943→AZ-944→AZ-945 Blocks chain (their local specs land when their
Implement turns come up).
State file trimmed from 52 lines to schema-compliant 13 lines per
.cursor/skills/autodev/state.md (sub_step.detail must be a one-line
pointer, not a logbook). Resume context lives in the new task spec +
git log of 94d2358 (AZ-918..AZ-922 baseline fixes).
AZ-942 and AZ-923 are parked. The state file's "Open Items At Pause"
list is recorded in git log via this commit's body and is no longer
retained in the state file going forward.
Co-authored-by: Cursor <cursoragent@cursor.com>
Replaces the tlog two-clock replay surface with a single-clock path
driven by the Derkachi-schema CSV. --imu is the new required CLI arg;
--tlog stays as a deprecated alias (warned + ignored when --imu set)
until AZ-895 deletes it.
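A minimal argparse sketch of the alias behaviour described above. It assumes the replay CLI is built on argparse (not confirmed here), and parse_replay_args is a hypothetical name. Because --imu is required, a supplied --tlog can only ever be warned about and dropped:

```python
import argparse
import warnings
from pathlib import Path


def parse_replay_args(argv: list[str]) -> argparse.Namespace:
    """Hypothetical parser: --imu is required, --tlog is a deprecated, ignored alias."""
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    parser.add_argument("--imu", type=Path, required=True,
                        help="Derkachi-schema CSV driving the single-clock replay")
    parser.add_argument("--tlog", type=Path, default=None,
                        help="DEPRECATED: accepted for compatibility, ignored when --imu is set")
    args = parser.parse_args(argv)
    if args.tlog is not None:
        # argparse already guarantees --imu is present, so --tlog is never used:
        # warn and drop it so nothing downstream can read the old path.
        warnings.warn("--tlog is deprecated and ignored because --imu is set",
                      DeprecationWarning, stacklevel=2)
        args.tlog = None
    return args
```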
* csv_ground_truth.py parses the 15-column schema, fails fast at
startup on every documented schema fault (AC-5).
* CsvReplayFcAdapter slots into ReplayInputBundle.fc_adapter alongside
the tlog sibling; mirrors Invariant-5 outbound wiring; inbound bus is
intentionally a no-op since the loop reads CSV directly.
* _run_replay_loop branches on imu_csv_path, stamps
VioOutput.emitted_at_ns from the CSV-derived frame_end_ns (AC-4),
closing the AZ-848 two-clock surface for the new path.
* AZ-896 ships the operator-facing format spec at
_docs/02_document/contracts/replay/csv_replay_format.md plus a
20-row example CSV (AC-3 regression-locked).
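The fail-fast idea can be sketched as a loader that validates the header and every row before the replay loop starts. This is an illustration only: the three column names in the usage are placeholders (the real schema has 15 columns, documented in csv_replay_format.md), every field is assumed numeric, and CsvSchemaError / load_csv_strict are hypothetical names:

```python
import csv
from typing import Iterable, Sequence


class CsvSchemaError(ValueError):
    """Hypothetical error raised at startup for any schema fault."""


def load_csv_strict(lines: Iterable[str], expected_columns: Sequence[str]) -> list[dict[str, float]]:
    """Parse every row up front and fail on the first schema fault found."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        raise CsvSchemaError("empty file: missing header row")
    if header != list(expected_columns):
        raise CsvSchemaError(f"header mismatch: expected {list(expected_columns)}, got {header}")
    rows: list[dict[str, float]] = []
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(expected_columns):
            raise CsvSchemaError(
                f"line {line_no}: expected {len(expected_columns)} fields, got {len(fields)}")
        try:
            values = [float(cell) for cell in fields]
        except ValueError as exc:
            raise CsvSchemaError(f"line {line_no}: non-numeric field") from exc
        rows.append(dict(zip(expected_columns, values)))
    if not rows:
        raise CsvSchemaError("header present but no data rows")
    return rows
```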
Tests: 11 + 12 new unit tests, plus updates to AZ-401 import-boundary
and AZ-402 CLI suites. Full unit suite 2,327 passed / 86 skipped.
Co-authored-by: Cursor <cursoragent@cursor.com>
Cycle-3 refactor run 02-az507 (RouteSpec relocation + module-layout
refresh + AZ-270 lint widening). Single batch of 3 tasks; epic AZ-844.
AZ-845 — Relocate RouteSpec DTO to _types/route.py (rule-9 fix):
* New canonical home: src/gps_denied_onboard/_types/route.py
(frozen+slots dataclass; full docstring carried over verbatim).
* c11_tile_manager/route_client.py imports from _types.route.
* replay_input/tlog_route.py and replay_input/__init__.py keep
re-exports for backward-compat (RouteSpec in __all__).
* 5 test files updated to import from _types.route for symmetry.
* Identity-preserving re-export verified by new test
test_az845_routespec_canonical_home_and_reexport_identity.
AZ-846 — Refresh module-layout.md cycle-3 entries:
* c11_tile_manager Internal list rewritten with all 8 internals
(alphabetised) — corrects a stale entry that referenced files
(satellite_provider_*.py) that no longer exist.
* shared/replay_input file list adds errors.py (cycle-2 carry),
tlog_ground_truth.py (cycle-2 carry), tlog_route.py (cycle-3 NEW).
* shared/_types section registers route.py with provenance line.
* Out-of-scope cycle-2 carry-overs (replay_api/, cli/render_map.py,
helpers/gps_compare.py, etc.) intentionally untouched.
AZ-847 — Widen test_az270 lint to enforce full rule-9 allow-list:
* test_ac6_only_compose_root_imports_concrete_strategies now walks
every components/<X>/*.py ImportFrom/Import and rejects anything
not in the rule-9 allow-list (own subpackage + _types + helpers
+ config/logging/fdr_client/clock + frame_source interface-only).
* Strict superset of the original AC-6 narrow check.
* Reports zero violations on the codebase post-AZ-845.
* Two principled carve-outs documented in the test docstring:
- components/<X>/bench/** path skip (measurement code legitimately
constructs production strategies via runtime_root factories).
- register_* lazy self-registration imports from
runtime_root.<X>_factory (central-registry plugin pattern).
* Both carve-outs surfaced to user via Choose A/B/C/D Risk-1
protocol; user skipped both — agent proceeded with documented
defaults. Doc-only follow-up tracked in
_docs/_process_leftovers/2026-05-24_az847_rule9_wording_followup.md
for rule-9 wording update in module-layout.md.
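A rough sketch of how an ast-based allow-list walk like the one described can work. Assumptions: components live under gps_denied_onboard.components.<X>, only first-party (gps_denied_onboard.*) imports are checked, single-dot relative imports count as the own subpackage, and the bench/** and register_* carve-outs are not modelled. find_disallowed_imports is a hypothetical helper, not the test's code:

```python
import ast

ROOT = "gps_denied_onboard"  # first-party root package, per the src/ layout above


def _matches(module: str, prefix: str) -> bool:
    return module == prefix or module.startswith(prefix + ".")


def find_disallowed_imports(source: str, own_subpackage: str, allowed: tuple[str, ...]) -> list[str]:
    """List first-party imports in `source` that fall outside the allow-list.

    Stdlib and third-party imports are ignored; only gps_denied_onboard.* imports
    and relative imports reaching above the own subpackage are reported.
    """
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level == 1:
                continue  # `from . import x` / `from .mod import y`: own subpackage
            if node.level > 1:
                violations.append("." * node.level + (node.module or ""))
                continue
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            if not _matches(module, ROOT):
                continue  # stdlib / third-party: out of scope for this rule
            if _matches(module, own_subpackage) or any(_matches(module, p) for p in allowed):
                continue
            violations.append(module)
    return violations
```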
Test results: 2287 passed, 90 skipped (environmental — Docker / CUDA
/ TensorRT / Jetson hardware / fixtures), 0 failed. Focused subset
(replay_input/ + c11_tile_manager/ + test_az270_compose_root.py)
also clean: 169 passed, 1 skipped.
Tracker: AZ-845/846/847 transitioned In Progress -> In Testing.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Changed autodev state sub_step to reflect new phase and task details: updated phase from 7 to 2, renamed task to 'refactor-analysis-gate', and revised detail to indicate the creation of new tasks AZ-844, AZ-845, AZ-846, and AZ-847, awaiting Phase-2 gate.
- Updated dependencies table with the latest task counts and complexity points, reflecting the addition of new tasks and the closure of AZ-777 in Jira. Total tasks now stand at 173 with 557 complexity points.
Wraps the AZ-699 verdict-report path with the AZ-839
operator_pre_flight_setup C3 fixture so a single Tier-2 test
takes only (tlog, video, calibration) and runs the full 7-step
pipeline on the Jetson harness without operator hand-curation.
New surface (tests-only, no src/ changes):
- tests/e2e/replay/_e2e_orchestrator.py — orchestrator with
OrchestratorStep enum, OrchestrationFailure exception (step
prefix per AC-5), OrchestrationReport dataclass,
write_effective_replay_config helper, and
run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py — 17 unit
tests covering each failure mode + happy path with mocked
subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py — Tier-2 +
RUN_REPLAY_E2E gated integration test asserting verdict
report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4,
AC-6).
The effective config write overlays c6_tile_cache.root_dir
onto the static operator YAML at runtime so the airborne
subprocess shares the cache_root the C3 fixture chose. Field-
level merge — every other operator-config block stays
verbatim. The static YAML on disk is never touched.
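A sketch of the field-level overlay, assuming the operator YAML has already been parsed into nested dicts with component blocks under a top-level components key (the config.components access pattern elsewhere in this log suggests that, but the exact YAML layout is an assumption). overlay_cache_root is a hypothetical helper; it deep-copies so the loaded static config is never mutated:

```python
import copy
from typing import Any


def overlay_cache_root(operator_config: dict[str, Any], cache_root: str) -> dict[str, Any]:
    """Return a copy of the operator config with only c6_tile_cache.root_dir replaced."""
    effective = copy.deepcopy(operator_config)  # the static config object stays untouched
    components = effective.setdefault("components", {})
    components.setdefault("c6_tile_cache", {})["root_dir"] = cache_root
    return effective
```

Writing the returned dict to a temp file, rather than back to the operator path, is what keeps the on-disk YAML unchanged.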
Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips
were 9 pre-existing + 1 new tier2). No src/ touched, no
AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by
inspection.
Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the placeholder operator_pre_flight_setup pytest fixture (the
mkdir stub at tests/e2e/replay/conftest.py:293-310) with a real driver
that wires C1 (AZ-836 RouteSpec) + C2 (AZ-838 SatelliteProviderRouteClient)
+ C11 (AZ-316 HttpTileDownloader) + C10 (AZ-322 DescriptorBatcher)
end-to-end and yields a typed PopulatedC6Cache. AZ-306 FAISS
sidecar triple-consistency is verified post-rebuild via a caller-
supplied descriptor_index_factory; partial sidecars are cleaned up on
failure (AC-7) while pre-existing warm-cache files are preserved.
Algorithm lives in tests/e2e/replay/_operator_pre_flight.py with
pure dependency injection so the AC-8 unit suite (11 tests covering
happy / transient-retry / terminal-failure / validation-error /
tamper-detection / cleanup-on-failure) runs against stubs and the
AC-9 Tier-2 integration test runs the same algorithm against the
real Jetson harness. The conftest fixture skip-gates on RUN_REPLAY_E2E
+ SATELLITE_PROVIDER_URL + SATELLITE_PROVIDER_API_KEY + BUILD_FAISS_INDEX +
GPS_DENIED_OPERATOR_CONFIG_PATH and wires deps through the existing
runtime_root factories. Supersedes AZ-777 Phase 3.
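The cleanup-on-failure rule (remove partial output, keep warm-cache files) can be sketched with an injected populate step, in the same dependency-injection spirit as the driver. run_with_cleanup and demo are hypothetical; the sketch snapshots existing files by path, so it would not restore a pre-existing file that the failed step overwrote:

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_with_cleanup(cache_dir: Path, populate: Callable[[Path], None]) -> None:
    """Run an injected populate step; on failure delete only the files it created."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    warm = {p for p in cache_dir.rglob("*") if p.is_file()}  # pre-existing cache snapshot
    try:
        populate(cache_dir)
    except Exception:
        created = [p for p in cache_dir.rglob("*") if p.is_file() and p not in warm]
        for path in created:
            path.unlink()  # partial sidecars from this attempt only
        raise


def demo() -> tuple[bool, bool]:
    """Usage: a failing stub writes a partial index, then raises."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "warm_tile.bin").write_bytes(b"keep me")

        def failing_populate(d: Path) -> None:
            (d / "index.faiss.partial").write_bytes(b"half-written")
            raise RuntimeError("sidecar triple mismatch")

        try:
            run_with_cleanup(root, failing_populate)
        except RuntimeError:
            pass
        # (warm file survived, partial file removed)
        return (root / "warm_tile.bin").exists(), not (root / "index.faiss.partial").exists()
```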
Co-authored-by: Cursor <cursoragent@cursor.com>
Operator-side HTTP client + CLI that takes a RouteSpec from AZ-836
and onboards it via satellite-provider's POST /api/satellite/route:
pre-emptive AZ-809 validation, request submission, polling until
mapsReady, and POST /api/satellite/tiles/inventory verify.
Lives in c11_tile_manager (shares the parent-suite HTTP/JWT plumbing
and the BUILD_C11_TILE_MANAGER gate); route failures get their own
SatelliteProviderRouteError hierarchy, split off from the tile errors
so the tile path and route path stay independent. 30 unit tests + 1
RUN_E2E-gated integration test.
Pre-emptive validator tracks the actual AZ-809 server bounds
(points [2,500], zoom [0,22]) instead of the AZ-838 spec's narrower
client-only bounds; flagged as F1 in batch_107_cycle3_report.md
for user decision (accept-and-update-spec / revert-to-spec).
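A sketch of the pre-emptive check plus the mapsReady poll, using the server bounds quoted above. Everything else is assumed: the names, the (lat, lon) point tuples, and a status dict carrying a boolean mapsReady field. sleep and clock are injectable so the loop can be unit-tested without real waiting:

```python
import time
from typing import Callable


class RouteValidationError(ValueError):
    """Hypothetical stand-in for the client's pre-emptive validation error."""


MIN_POINTS, MAX_POINTS = 2, 500  # AZ-809 server bounds quoted above
MIN_ZOOM, MAX_ZOOM = 0, 22


def validate_route(points: list[tuple[float, float]], zoom: int) -> None:
    """Reject a request locally before it is sent to the server."""
    if not MIN_POINTS <= len(points) <= MAX_POINTS:
        raise RouteValidationError(f"points: {len(points)} not in [{MIN_POINTS}, {MAX_POINTS}]")
    if not MIN_ZOOM <= zoom <= MAX_ZOOM:
        raise RouteValidationError(f"zoom: {zoom} not in [{MIN_ZOOM}, {MAX_ZOOM}]")


def poll_until_ready(fetch_status: Callable[[], dict], timeout_s: float,
                     interval_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_status until it reports mapsReady or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("mapsReady"):
            return status
        if clock() >= deadline:
            raise TimeoutError("route did not reach mapsReady before the deadline")
        sleep(interval_s)
```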
Co-authored-by: Cursor <cursoragent@cursor.com>
First building block of Epic AZ-835. Pure function that consumes
an ArduPilot binary tlog and returns a RouteSpec (waypoints +
per-waypoint coverage radius + provenance) suitable for posting
to satellite-provider's POST /api/satellite/route endpoint.
Pipeline:
- Load GPS fixes via existing load_tlog_ground_truth (AZ-697).
- Trim leading + trailing rows below takeoff thresholds
(speed >= 2 m/s AND AGL >= 5 m by default; configurable).
- Coarsen to <= max_waypoints via iterative Douglas-Peucker on
the local-ENU projection (WgsConverter.latlonalt_to_local_enu,
AZ-279). DP tolerance is caller-supplied or binary-searched
(<= 32 iterations, <= 1 m convergence).
Public surface (re-exported from replay_input/__init__.py):
- RouteSpec (frozen, slots, with provenance fields).
- RouteExtractionError (subclass of ReplayInputAdapterError).
- extract_route_from_tlog().
Tests: 14 unit tests cover AC-1..AC-10 plus edge cases (custom
DP tolerance, invalid inputs, error hierarchy, too-short segment).
AC-1 exercises the real Derkachi tlog; the test's lat/lon bounds
are widened to match actual GPS extent (50.0800..50.0840 /
36.1070..36.1145) — the AZ-836 spec's tighter IMU-derived bounds
(50.0808..50.0832 / 36.1070..36.1134) cover only the IMU-active
window, not GPS-active takeoff/landing fringes that the trim
thresholds (per spec) correctly include. See
_docs/03_implementation/batch_106_cycle3_report.md "Spec drift
surfaced" for the full note.
Semantics decision documented inline: max_waypoints is enforced
only in auto-tolerance mode; with an explicit DP tolerance the
result reflects that exact tolerance.
AZ-836 moved to done/.
Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-835 Epic (E2E real-flight validation pipeline, ~17 SP across
6 children C1-C6) supersedes AZ-777 Phase 3+ (bbox-based static
seed). Children C3-C6 deliberately not yet filed — will be
re-estimated after C1+C2 land from real RouteSpec shape and
Route API client ergonomics.
- AZ-836 (C1, 3 SP): TlogRouteExtractor — pure function over
.tlog binary returning RouteSpec (waypoints + suggested
region size). Deps: AZ-697 (load_tlog_ground_truth, done),
AZ-279 (WGS converter, done).
- AZ-838 (C2, 3 SP): SatelliteProviderRouteClient + seed_route.py
CLI mirror of seed_region.py. Hard-depends on AZ-836's
RouteSpec dataclass.
- _dependencies_table.md updated with the three new rows.
Workspace-boundary rule expansion: codifies the sibling-repo
task-spec exception (the only permitted write into a sibling
repo) and the "External Systems Are Black Boxes" rule
(contract-only consumption of producer repos like
satellite-provider).
Bookkeeping: _autodev_state.md condensed to under 30 lines per the
state.md conciseness rule; the opencv-pin leftover's replay condition
was re-checked on 2026-05-22 (gtsam is still only at 4.2, so the
condition is unchanged).
Co-authored-by: Cursor <cursoragent@cursor.com>
Phase 1 hotfix:
- C11 HttpTileDownloader adapted to satellite-provider v2.0.0
z/x/y inventory contract (bulk POST keyed by slippy-map coords).
- Unit tests rewritten to exercise the new inventory schema.
- E2E smoke test updated to match the v2.0.0 wire.
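For reference, the standard Web-Mercator slippy-map formula that maps a WGS84 position to z/x/y keys. This is the generic OSM tiling scheme, assumed (not confirmed here) to match satellite-provider's convention; latlon_to_tile is a hypothetical helper and the inventory request body that carries the keys is not shown:

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int, int]:
    """Standard Web-Mercator slippy-map tile indices (z, x, y), with y growing southwards."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon == 180, latitudes at the Mercator limit) into the valid range.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```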
Phase 2 (Derkachi seed + smoke-validated on Jetson):
- tests/fixtures/derkachi_c6/{README,bbox.yaml,seed_region.py}
drives POST /api/satellite/region against satellite-provider
with Google Maps as the imagery source. Smoke run produced
4 regions, 175 tiles, inventory 32/32.
- scripts/mint_dev_jwt.py + run-tests-jetson.sh auto-mint and
export SATELLITE_PROVIDER_API_KEY using JWT_SECRET / JWT_ISSUER
/ JWT_AUDIENCE env vars (no host port mappings; e2e-runner
reaches SP via internal docker network only).
Spec amendment: AZ-777 todo spec updated to record the
Google Maps imagery source decision and STOP-gate state.
AZ-777 Phase 3+ work is superseded by Epic AZ-835 (see next
commit).
Co-authored-by: Cursor <cursoragent@cursor.com>
Cycle-3 /autodev session discovered material drift between the prior
session's rewritten AZ-777 spec and current codebase reality. Refreshed
the spec, re-synced Jira (description + summary updated, status
unchanged at In Progress), appended an addendum to the 2026-05-21
decision log capturing the findings, and slimmed the state file to
the conciseness rule.
Findings reconciled:
- Tier-1 (docker-compose.test.yml) is deprecated per 2026-05-20 env
policy; original Phase 1 mods there are out of scope.
- Jetson compose ALREADY has satellite-provider + satellite-provider-postgres
services (lineage AZ-688 / AZ-691 / AZ-692). No new
service definitions needed; only e2e-runner env block.
- Port / protocol: 8080 HTTPS (self-signed dev cert), not 5101 HTTP.
- C11 contract drift: _LIST_PATH/_GET_PATH constants in
tile_downloader.py don't match the real /api/satellite/tiles/inventory
+ /tiles/{z}/{x}/{y} endpoints. Phase 1 now includes
C11 contract adaptation (the largest single sub-deliverable).
- arm64 manifest of mcr.microsoft.com/dotnet/aspnet:10.0 verified;
Risk 3 closed.
- mock-sat retired from Jetson + D-PROJ-2 /api/satellite/upload
shipped on parent; mock-sat retention closed.
8-pt complexity unchanged. Single-ticket containment preserved.
Phase boundaries (STOP gates) preserved. No code changed yet —
this commit is spec / state / decision-log only; next /autodev
session executes Phase 1.
Co-authored-by: Cursor <cursoragent@cursor.com>
Original spec called for direct OSM/CARTO downloads, contradicting
architecture (C11 owns tile network I/O against parent-suite
satellite-provider .NET 8 service; C10 batches descriptors over the
populated C6, never touches the upstream). Rewritten spec drives the
production C10/C11 pipeline against the real satellite-provider
running in docker-compose.test.yml, replacing the mock-suite-sat-service
GET stub. Complexity 5 -> 8 pts (single-ticket override).
Decision log: _docs/_process_leftovers/2026-05-21_az777_complexity_override.md.
Jira AZ-777 description + summary synced. Autodev state
pauses for next session to pick up Phase 1 (satellite-provider
stand-up + smoke test).
Co-authored-by: Cursor <cursoragent@cursor.com>
New replay_api component: FastAPI service wrapping the offline
gps-denied-replay pipeline. POST tlog+video (multipart) → either
sync 200 with result/map/report URLs, or async 202 + job id with
/jobs/{id} polling. Magic-byte validation, bearer auth, in-memory
JobRegistry with concurrency + queue caps (429 on overflow).
Helper accuracy_report.py promoted from tests/ to src/ because the
API needs the Markdown report writer at runtime; all AZ-699 imports
re-pointed. OpenAPI spec exported to docs.
18/18 unit tests pass (AC-1 sync, AC-2 async, AC-3 state machine,
AC-5 auth, AC-6 health, AC-8 concurrency, AC-9 magic-byte). Full
unit suite: 2251 pass, 86 skip, 1 pre-existing C12 cold-start flake
(unchanged). mypy --strict clean on the new surface.
Co-authored-by: Cursor <cursoragent@cursor.com>
New operator-side console-script renders a self-contained HTML map
(folium / Leaflet) comparing the estimator's JSONL track against
the tlog ground-truth track. Pinned visual style: red truth + blue
estimated polylines, start/end markers per track, 100 m + 50 m
scale circles, optional AZ-699 accuracy-summary banner, and an
--offline-tiles mode (with optional local tile-URL template) for
Jetsons without internet.
folium is gated behind a new [operator-tools] optional-dep so the
airborne binary's cold-start NFR is unaffected (C12 binary doesn't
import the new module). 14 new unit tests pin polyline count,
marker count, scale-circle radii, summary embedding, offline-tile
behaviour, and full CLI smoke. Zero mypy --strict errors.
Refines the 2026-05-20 Jetson-only test policy: unit tests may run
locally, e2e/perf/resilience/security stay Jetson-only. Documented
in _docs/02_document/tests/environment.md (Where each tier runs)
and .cursor/rules/testing.mdc (Test environment for this project).
Co-authored-by: Cursor <cursoragent@cursor.com>
New e2e test runs gps-denied-replay --auto-trim against the real
derkachi.tlog + flight video + AZ-702 calibration, computes the
horizontal-error distribution (mean/p50/p95/p99 + 10/25/50/100 m
threshold-hit share), writes _docs/06_metrics/real_flight_validation_{date}.md,
and asserts honest PASS/FAIL with no @xfail
mask. AZ-404's 1-min test is untouched (sibling, not replacement).
Extends gps_compare.py with HorizontalErrorDistribution +
percentile_sorted (numpy-equivalent linear interpolation). New
test helper _report_writer.py renders the canonical Markdown
schema documented as FT-P-20 in blackbox-tests.md.
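The linear-interpolation percentile that numpy.percentile uses by default can be written over pre-sorted data as below, together with the threshold-hit share. Both functions are hypothetical stand-ins for percentile_sorted and HorizontalErrorDistribution, and counting an error exactly on a threshold as a hit (<=) is an assumption:

```python
import math
from typing import Sequence


def percentile_linear(sorted_values: Sequence[float], pct: float) -> float:
    """Percentile of pre-sorted data, interpolating linearly between ranks.

    Same rule as numpy.percentile's default method: rank = pct / 100 * (n - 1).
    """
    if not sorted_values:
        raise ValueError("empty sample")
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be in [0, 100]")
    rank = pct / 100.0 * (len(sorted_values) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def threshold_hit_share(errors_m: Sequence[float],
                        thresholds_m: Sequence[float] = (10.0, 25.0, 50.0, 100.0)) -> dict[float, float]:
    """Fraction of samples whose horizontal error is within each threshold."""
    n = len(errors_m)
    return {t: sum(1 for e in errors_m if e <= t) / n for t in thresholds_m}
```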
16 new unit tests pin distribution arithmetic, verdict gate,
failure-message templating (references calibration acquisition
method per AC-3), and report layout. 129 passed in focused
regression, 3 skipped (real video / Tier-2 prerequisites).
Zero new mypy --strict errors.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds find_aligned_window cross-correlation (NCC, per-window unit norm)
between IMU energy and video optical-flow magnitude. Returns
AlignedWindow{tlog_start_ns, tlog_end_ns, offset_ms, confidence,
used_fallback}, with fallback to head-takeoff on low confidence to
preserve AZ-405 behavior. TlogReplayFcAdapter honors tlog_start_ns and
skips pre-window messages. New --auto-trim CLI flag, mutex with
--time-offset-ms. AC-1..AC-4 covered by unit tests; AC-5 skipped (no
real flight_derkachi.mp4 in repo). 106 tests pass in regression slice.
Zero new mypy --strict errors.
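The core of an NCC offset search, sketched in pure Python: slide the shorter optical-flow series along the IMU-energy series, normalise each window to zero mean and unit norm, and keep the best-scoring lag. It assumes both series are already resampled to a common rate; find_offset, its 0.5 confidence floor and the offset-0 fallback encoding are hypothetical:

```python
import math
from typing import Sequence


def _ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean, unit-norm correlation of two equal-length windows, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(y * y for y in db))
    if na == 0.0 or nb == 0.0:
        return 0.0  # a flat window carries no alignment information
    return sum(x * y for x, y in zip(da, db)) / (na * nb)


def find_offset(imu_energy: Sequence[float], flow_mag: Sequence[float],
                min_confidence: float = 0.5) -> tuple[int, float, bool]:
    """Return (offset_samples, confidence, used_fallback)."""
    m = len(flow_mag)
    if m == 0 or m > len(imu_energy):
        raise ValueError("flow series must be non-empty and no longer than the IMU series")
    best_k, best_c = 0, -1.0
    for k in range(len(imu_energy) - m + 1):
        c = _ncc(imu_energy[k:k + m], flow_mag)
        if c > best_c:
            best_k, best_c = k, c
    if best_c < min_confidence:
        return 0, best_c, True  # low confidence: fall back to head-takeoff (offset 0)
    return best_k, best_c, False
```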
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 98 (cycle 2) — first two PBIs of epic AZ-696 (real-flight
validation harness):
AZ-697: direct binary-tlog GPS-truth extractor
- New src/gps_denied_onboard/replay_input/tlog_ground_truth.py reads
GLOBAL_POSITION_INT (with GPS_RAW_INT fallback) from a binary
ArduPilot tlog via pymavlink.mavutil and returns a frozen+slotted
TlogGroundTruth DTO with per-record ts_ns / lat_deg / lon_deg / alt_m
/ hdg_deg / vx_m_s / vy_m_s / vz_m_s.
- Promoted l2_horizontal_m + match_percentage + GroundTruthRow from
tests/e2e/replay/_helpers.py into the new production module
src/gps_denied_onboard/helpers/gps_compare.py. The e2e helper now
re-exports the same objects (identity, not copies) so existing test
imports continue working untouched.
- tests/e2e/replay/conftest.py prefers the real derkachi.tlog when
present, falls back to the CSV synth path otherwise.
- 22 new unit tests cover AC-1..AC-5 (mypy --strict subprocess test
included). All passing.
AZ-702: Topotek KHP20S30 factory-sheet camera calibration
- New _docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json:
fx = fy = 4644.444, cx = 960, cy = 540, HFOV ~ 23.3 deg, VFOV ~ 13.2
deg, computed from the published 8.5 mm focal length + 1/2.8" sensor
+ 1920x1080 capture at lowest zoom step. Distortion zeroed,
body_to_camera_se3 = identity with nadir convention. Acquisition
method explicitly recorded as factory_sheet so downstream code can
expect higher residual error than a lab calibration.
- _docs/00_problem/input_data/flight_derkachi/camera_info.md updated
to document the assumptions, expected residual error window, and
conftest pick-up rule.
- tests/e2e/replay/conftest.py::_calibration_path() prefers
khp20s30_factory.json when present, falls back to adti26.json.
- 9 new unit tests cover AC-1..AC-4 (schema, intrinsics traceback,
doc reference, conftest pick-up). All passing.
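The quoted FOVs follow from the pinhole relation FOV = 2 * atan((extent_px / 2) / f_px) with a centred principal point. A sketch that checks them from fx/fy and the 1920x1080 extent. The sensor width used to derive fx is not restated here, so the focal_px_from_mm usage relies on an illustrative 6.0 mm width, not the KHP20S30's:

```python
import math


def fov_deg(focal_px: float, extent_px: float) -> float:
    """Full field of view along one axis for a pinhole model with a centred principal point."""
    return math.degrees(2.0 * math.atan((extent_px / 2.0) / focal_px))


def focal_px_from_mm(focal_mm: float, sensor_extent_mm: float, extent_px: int) -> float:
    """Pinhole relation f_px = f_mm * pixels / sensor_extent_mm along the same axis."""
    return focal_mm * extent_px / sensor_extent_mm


FX = FY = 4644.444        # values from khp20s30_factory.json as quoted above
HFOV = fov_deg(FX, 1920)  # ~23.36 deg
VFOV = fov_deg(FY, 1080)  # ~13.26 deg
```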
Test run: 45 new tests, all passing. Full-suite gate deferred to
Step 16 (after the last batch in cycle 2 per the implement skill).
Adjacent note (not fixed in this batch, recorded in the batch report):
auto_sync.py has the same redundant pymavlink type:ignore + a few
numpy/cv2 mypy --strict issues. None on this batch's path.
Refs: _docs/03_implementation/batch_98_cycle2_report.md
Refs: _docs/02_tasks/done/AZ-697_tlog_ground_truth_extractor.md
Refs: _docs/02_tasks/done/AZ-702_khp20s30_calibration.md
Co-authored-by: Cursor <cursoragent@cursor.com>
Pre-implement chore commit to land orchestration artifacts produced by
autodev cycle-2 Step 9 (New Task), so that Step 10 (Implement) starts
against a clean working tree.
What's included:
- .gitignore: exclude _docs/00_problem/input_data/**/*.{tlog,mp4,h264}
(derkachi.tlog is a 5.8 MB binary input and stays out-of-band).
- _docs/02_tasks/todo/AZ-697..AZ-702: 6 new PBI specs under epic AZ-696
(tlog ground-truth extractor, mid-flight trim+align, real-flight
validation runner, replay map viz, HTTP replay API, KHP20S30 calib).
- _docs/02_tasks/_dependencies_table.md: dep edges for the 6 PBIs.
- _docs/_autodev_state.md: status -> in_progress, step 10 cycle 2.
- _docs/_process_leftovers/...opencv_pin_deferred.md: replay-attempt
timestamp refreshed (gtsam-numpy-2 wheels still not published;
leftover remains open).
No source code is modified by this commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
Replay CLI synthesizes a minimal Config whose `components` mapping
omits the strategy-component blocks (`c6_tile_cache`, `c7_inference`,
`c5_state`) the airborne bootstrap historically read unconditionally.
Add `_replay_omits_component_block` and gate the c6 seeds, the c7 +
c3_lightglue_runtime pair, and the c5 (estimator, handle) eager build
on `config.mode == "replay" AND block absent`. Live mode and any
replay config that DOES populate the blocks remain unchanged — the
guard is conditional, not blanket.
The skip is safe because compose_root's per-component wrappers only
run for slugs in `config.components`; absent blocks mean absent
wrappers, so the seeded slots would never be read. Fix lives at the
BUILD-PRE-CONSTRUCTED layer per the spec's explicit "no silent fallback
in `_c6_config`" constraint.
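The guard condition reduces to a two-part predicate. A sketch with a hypothetical ConfigSketch stand-in; the real _replay_omits_component_block signature may differ:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for the real Config."""
    mode: str  # "live" or "replay"
    components: dict[str, Any] = field(default_factory=dict)


def replay_omits_component_block(config: ConfigSketch, slug: str) -> bool:
    # Both halves must hold: live mode never skips, and a replay config that
    # does carry the block is seeded exactly as in live mode.
    return config.mode == "replay" and slug not in config.components


def seeds_to_build(config: ConfigSketch) -> list[str]:
    gated = ["c6_tile_cache", "c7_inference", "c5_state"]
    return [slug for slug in gated if not replay_omits_component_block(config, slug)]
```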
Covers AC-687-1 / AC-687-2 / AC-687-4. AC-687-3 (Jetson Tier-2 e2e
replay) requires an out-of-band hardware re-run; evidence destination
documented in autodev state.
Co-authored-by: Cursor <cursoragent@cursor.com>
Jetson Tier-2 e2e on 2026-05-19 11:27 surfaced a NEW gap one phase
deeper than where Rerun 3 died: build_pre_constructed seeds
c6_descriptor_index unconditionally, which reads
config.components["c6_tile_cache"] via storage_factory._c6_config.
The replay CLI synthesizes a Config that has no c6_tile_cache
block, so AC-1/2/5/6 fail with KeyError 'c6_tile_cache'.
Bootstrap (no source code changes):
- AZ-687 (Story, To Do, 2pt, Epic AZ-602; blocks AZ-618)
- Task spec in _docs/02_tasks/todo/
- _dependencies_table.md row + header narrative
- _docs/_autodev_state.md detail repointed at AZ-687
- _docs/03_implementation/jetson_runs/ Tier-2 evidence
The fix itself lives in batch 97 (next session): guard the c6/c7
seeds at the BUILD-PRE-CONSTRUCTED layer when config.mode ==
"replay". Per existing storage_factory._c6_config docstring the
silent-fallback path is explicitly rejected — the bootstrap layer
is the right seam.
Co-authored-by: Cursor <cursoragent@cursor.com>
Wire register_airborne_strategies + build_pre_constructed +
compose_root(config, pre_constructed=...) into runtime_root.main(). The
existing exception block now catches AirborneBootstrapError distinctly
before the broader (ConfigurationError, StrategyNotLinkedError,
RuntimeError) clause so the operator-facing "airborne_bootstrap:"
prefix carried by every bootstrap error reaches stderr cleanly with
EXIT_GENERIC_FAILURE rather than getting absorbed into a generic
backtrace.
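Why the ordering matters, assuming AirborneBootstrapError subclasses RuntimeError (the text implies the broader clause would otherwise absorb it): Python runs the first matching except clause, so the specific handler has to come first. Exception classes and the exit-code value below are stand-ins:

```python
import io
import sys
from typing import Callable, TextIO

EXIT_OK = 0
EXIT_GENERIC_FAILURE = 1  # assumed value; the real constant may differ


class ConfigurationError(Exception): ...
class StrategyNotLinkedError(Exception): ...


class AirborneBootstrapError(RuntimeError):
    """Assumed here to subclass RuntimeError, which is what makes ordering matter."""


def main_sketch(run: Callable[[], None], stderr: TextIO = sys.stderr) -> int:
    try:
        run()
    except AirborneBootstrapError as exc:
        # Listed first: the RuntimeError in the clause below would otherwise catch it.
        print(exc, file=stderr)  # message already starts with "airborne_bootstrap:"
        return EXIT_GENERIC_FAILURE
    except (ConfigurationError, StrategyNotLinkedError, RuntimeError) as exc:
        print(f"{type(exc).__name__}: {exc}", file=stderr)
        return EXIT_GENERIC_FAILURE
    return EXIT_OK


def run_and_capture(run: Callable[[], None]) -> tuple[int, str]:
    """Usage helper: run main_sketch with stderr captured in memory."""
    buf = io.StringIO()
    return main_sketch(run, stderr=buf), buf.getvalue()
```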
This closes the AZ-618 umbrella: AZ-619..AZ-623 + AZ-625 had built
each pre_constructed key; this batch lands the integration that the
production main() actually invokes them. Both the live
gps-denied-onboard and replay gps-denied-replay binaries dispatch
through this main() per ADR-011, so both reach takeoff with
pre_constructed populated end-to-end.
Tests: tests/unit/runtime_root/test_az618_pre_constructed.py adds 6
tests covering AC-618-1..AC-618-4 + AZ-624 local handler-ordering
regression guard. The strategy factories are stubbed at the
airborne_bootstrap module boundary so the test exercises the
integration seam without standing up gtsam / FAISS / TensorRT /
PyTorch / OpenCV at unit-test scope.
AC-618-5 (Jetson tier-2 e2e) is BLOCKED on operator-supplied hardware
evidence: scripts/run-tests-jetson.sh
tests/e2e/replay/test_derkachi_1min.py must run on Jetson Orin Nano
(JetPack 6.2.2+b24) and the terminal log path + JetPack version + run
timestamp captured per _docs/02_document/tests/tier2-jetson-testing.md.
Quality gates: ruff format clean, ruff lint clean, 6/6 new umbrella
tests pass, 261/261 runtime_root + c5_state regression suite passes,
25/25 test_az401_compose_root_replay regression passes, full Tier-1
unit suite 2150/2151 passes (1 unrelated pre-existing failure:
c12_operator_orchestrator subprocess cold-start NFR fails on Mac dev
host's Python startup ~700 ms; not regressed by AZ-624). Code review
verdict PASS (1 Low finding; full report in
_docs/03_implementation/reviews/batch_96_review.md).
Archives AZ-624 task spec + AZ-618 umbrella reference to done/.
Co-authored-by: Cursor <cursoragent@cursor.com>
Wire the airborne bootstrap to seed pre_constructed['c5_isam2_graph_handle']
so c4_pose's compose-time lookup is satisfied (c4_pose runs before c5_state in
topological order; the iSAM2 graph handle is built INSIDE the C5 estimator's
constructor and so must be produced eagerly at bootstrap time).
build_pre_constructed now invokes a new internal _build_c5_state_estimator_pair
helper that calls state_factory.build_state_estimator once, captures the
(estimator, handle) tuple, and seeds two slots: 'c5_isam2_graph_handle' for
C4's lookup, and an internal '_c5_prebuilt_estimator' look-aside key for the
C5 wrapper's short-circuit. _c5_state_wrapper checks the look-aside key first
and returns the prebuilt instance as-is — the SAME object the handle was
extracted from, so c4_pose._isam2_handle and c5_state._isam2_handle reference
ONE object across the C4 / C5 seam (AC-625.3 cross-seam identity invariant).
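The build-once, seed-twice pattern, sketched with stand-in classes. The two slot keys are the ones named above; Estimator, GraphHandle, seed_c5 and c5_wrapper are hypothetical:

```python
from typing import Any, Callable

HANDLE_KEY = "c5_isam2_graph_handle"
LOOKASIDE_KEY = "_c5_prebuilt_estimator"


class GraphHandle:
    """Stand-in for the iSAM2 graph handle."""


class Estimator:
    def __init__(self) -> None:
        self.isam2_handle = GraphHandle()  # created inside the constructor


Factory = Callable[[], tuple[Estimator, GraphHandle]]


def build_state_estimator() -> tuple[Estimator, GraphHandle]:
    estimator = Estimator()
    return estimator, estimator.isam2_handle


def seed_c5(pre_constructed: dict[str, Any], factory: Factory) -> None:
    estimator, handle = factory()          # built exactly once, eagerly
    pre_constructed[HANDLE_KEY] = handle   # read by C4 at compose time
    pre_constructed[LOOKASIDE_KEY] = estimator


def c5_wrapper(pre_constructed: dict[str, Any], factory: Factory) -> Estimator:
    prebuilt = pre_constructed.get(LOOKASIDE_KEY)
    if prebuilt is not None:
        return prebuilt                    # short-circuit: owner of the seeded handle
    estimator, _ = factory()               # fallback when nothing was seeded
    return estimator
```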
C5_STATE_BUILD_FLAGS mirrors state_factory._STATE_BUILD_FLAGS so the bootstrap
can name the gating BUILD_STATE_* flag in operator errors before the lower
level StateEstimatorConfigError fires (AC-625.2). When the factory itself
rejects the configuration with the flag ON, the error wraps into
AirborneBootstrapError with __cause__ preserved (matches AZ-621 / AZ-622
patterns).
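A sketch of the flag check plus the cause-preserving wrap. The strategy name isam2, the flag BUILD_STATE_ISAM2 and the truthy-string parsing in is_build_flag_on are all hypothetical; `raise ... from exc` is the standard Python mechanism that sets __cause__:

```python
import os
from typing import Any, Callable, Mapping


class StateEstimatorConfigError(Exception):
    """Stand-in for the lower-level factory error."""


class AirborneBootstrapError(RuntimeError):
    """Operator-facing bootstrap error (stand-in)."""


# Hypothetical strategy -> gating-flag mirror; the real table's keys and flags differ.
STATE_BUILD_FLAGS = {"isam2": "BUILD_STATE_ISAM2"}


def is_build_flag_on(name: str, env: Mapping[str, str] = os.environ) -> bool:
    return env.get(name, "0").strip().lower() in {"1", "true", "yes", "on"}


def build_state(strategy: str, factory: Callable[[str], Any],
                env: Mapping[str, str] = os.environ) -> Any:
    flag = STATE_BUILD_FLAGS.get(strategy)
    if flag is None:
        raise AirborneBootstrapError(f"airborne_bootstrap: unknown c5_state strategy {strategy!r}")
    if not is_build_flag_on(flag, env):
        # Name the gating flag before the factory's own, less specific error can fire.
        raise AirborneBootstrapError(f"airborne_bootstrap: c5_state strategy {strategy!r} requires {flag}=1")
    try:
        return factory(strategy)
    except StateEstimatorConfigError as exc:
        # `from exc` keeps the original error reachable as __cause__.
        raise AirborneBootstrapError(f"airborne_bootstrap: c5_state factory rejected config: {exc}") from exc
```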
Constraints respected per AZ-618 umbrella: no per-component factory signature
changed; additive on top of AZ-619..AZ-623; no edits under state_factory,
pose_factory, or c5_state internals.
Tests: tests/unit/runtime_root/test_az625_c5_isam2_graph_handle_ordering.py
adds 8 tests covering AC-625.1..3 (presence + Protocol conformance, internal
key invariant, BUILD-flag-OFF error, unknown-strategy error, factory error
wrapping, cross-seam identity, wrapper short-circuit, wrapper fallback).
Autouse stubs added to test_az619/620/621/622/623 so prior phase tests stay
isolated from the new builder.
Quality gates: ruff format clean, ruff lint clean, 32/32 phase tests pass,
255/255 runtime_root + c5_state regression suite passes. Code review verdict
PASS (2 Low findings; full report in
_docs/03_implementation/reviews/batch_95_review.md).
Co-authored-by: Cursor <cursoragent@cursor.com>
build_pre_constructed now populates c3_lightglue_runtime
(LightGlueRuntime) + c3_feature_extractor (FeatureExtractor) on top
of AZ-619/620/621. Strategy-specific BUILD_MATCHER_* flag mismatch
raises AirborneBootstrapError naming the missing flag and the c3_matcher
consumer; the c7 InferenceRuntime built earlier in the bootstrap is
reused as the engine source, so nothing is built twice at this layer.
C3MatcherConfig gains optional lightglue_weights_path: Path | None
for the operator's deployment config; production main() (AZ-624)
populates it. Real LightGlue inference correctness is verified by
AZ-624's Jetson AC-5 run per the AZ-622 Tier-2 Note.
Phase tests for AZ-619/620/621 gain an autouse _stub_c3_matcher_builders
fixture so additivity assertions remain valid as the bootstrap grows.
Code review: PASS_WITH_WARNINGS (3 Low: signature drift from spec,
_is_build_flag_on duplication across 3 runtime_root modules, and
BuildConfig literal mirrored with per-strategy build configs). All
deferred to future hygiene PBIs.
Co-authored-by: Cursor <cursoragent@cursor.com>
Third subtask of AZ-618. Extends airborne_bootstrap.build_pre_constructed
additively with c7_inference (GPU InferenceRuntime). Wraps the existing
inference_factory.build_inference_runtime so a BUILD_TENSORRT_RUNTIME /
BUILD_PYTORCH_FP16_RUNTIME mismatch surfaces a clear operator-facing
AirborneBootstrapError naming BOTH airborne C7 flags plus the consuming
component slug, rather than bubbling up RuntimeNotAvailableError with no
context.
New public const C7_AIRBORNE_BUILD_FLAGS pairs each airborne runtime
with its gating env flag (onnx_trt_ep deliberately omitted — research
only). Tests stub at the factory boundary; real GPU/TensorRT load
remains Tier-2 only (consolidated at AZ-624). AZ-619 and AZ-620 test
files extended with a _stub_c7_inference_builder autouse fixture
mirroring the AZ-620 pattern for _build_c6_*.
18/18 runtime_root unit tests pass.
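A sketch of a runtime-to-flag table and a mismatch error that names both airborne flags plus the consumer. The two flag names come from the text; the runtime keys, the consumer slug in the usage and check_c7_runtime are assumptions:

```python
# Runtime keys are hypothetical; the two flag names are the ones named above.
C7_FLAGS_SKETCH: dict[str, str] = {
    "tensorrt": "BUILD_TENSORRT_RUNTIME",
    "pytorch_fp16": "BUILD_PYTORCH_FP16_RUNTIME",
}


class AirborneBootstrapError(RuntimeError):
    """Operator-facing bootstrap error (stand-in)."""


def check_c7_runtime(runtime: str, enabled_flags: set[str], consumer: str) -> str:
    """Return the gating flag, or raise an error that names every airborne C7 flag."""
    flag = C7_FLAGS_SKETCH.get(runtime)
    if flag is None or flag not in enabled_flags:
        all_flags = " / ".join(sorted(C7_FLAGS_SKETCH.values()))
        raise AirborneBootstrapError(
            f"airborne_bootstrap: {consumer} needs c7_inference runtime {runtime!r}, "
            f"but its build flag is not enabled (airborne C7 flags: {all_flags})"
        )
    return flag
```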
Co-authored-by: Cursor <cursoragent@cursor.com>
Prior session committed AZ-619 (Phase A of AZ-618) as 8abfb02,
transitioned the tracker, and archived the spec, but did not write
the batch report. Content reconstructed from git show + the AZ-619
task spec + the prior _docs/_autodev_state.md sub_step.detail.
No code change. Pure audit-trail housekeeping.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds airborne_bootstrap.build_pre_constructed(config) returning a
dict with the two foundational keys: a per-binary shared FdrClient
under "c13_fdr" (via make_fdr_client with the new
AIRBORNE_MAIN_PRODUCER_ID constant) and a fresh WallClock under
"clock". Phases B..F (AZ-620..AZ-624) extend this function
additively without breaking the AZ-619 contract.
The c13_fdr instance is identity-stable across calls (per the
make_fdr_client per-producer cache) so callers can call
build_pre_constructed twice and get the same FdrClient back -
AC-619.2.
Replay-mode override is unchanged: compose_root merges
replay_components over pre_constructed so the WallClock here is
replaced by TlogDerivedClock in replay binaries (existing
contract documented in compose_root's docstring).
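Both properties, sketched with stdlib tools: functools.lru_cache gives one instance per producer id, and a dict merge where the right-hand side wins models the replay override. The producer id string and all names here are hypothetical:

```python
from functools import lru_cache
from typing import Any


class FdrClientSketch:
    def __init__(self, producer_id: str) -> None:
        self.producer_id = producer_id


@lru_cache(maxsize=None)
def make_fdr_client_sketch(producer_id: str) -> FdrClientSketch:
    """One client per producer id for the life of the process."""
    return FdrClientSketch(producer_id)


def build_pre_constructed_sketch() -> dict[str, Any]:
    return {
        "c13_fdr": make_fdr_client_sketch("airborne_main"),  # cached: same object every call
        "clock": object(),                                   # fresh per call (stand-in clock)
    }


def merge_for_compose(pre_constructed: dict[str, Any], replay_components: dict[str, Any]) -> dict[str, Any]:
    # Right-hand side wins, so a replay clock replaces the wall clock.
    return {**pre_constructed, **replay_components}
```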
Tests: 5 new unit tests under tests/unit/runtime_root/
test_az619_pre_constructed_phase_a.py, all passing. AZ-591 not
regressed (12/12 in the combined run).
Spec moved to _docs/02_tasks/done/.
Co-authored-by: Cursor <cursoragent@cursor.com>
The AZ-618 spec author flagged "likely a true 8" with a recommended
6-subtask split; combined with the user-rule cap on PBI complexity
(create at 2-3pt, max 5pt) the right move was to split before any
implementation began. Subtasks created in Jira as children of AZ-618:
  AZ-619 (Phase A)  c13_fdr + clock                              2pt
  AZ-620 (Phase B)  c6_descriptor_index + c6_tile_store          3pt
  AZ-621 (Phase C)  c7_inference engine                          3pt
  AZ-622 (Phase D)  c3_lightglue_runtime + c3_feature_extractor  3pt
  AZ-623 (Phase E)  c282_ransac_filter + c5 helpers              3pt
  AZ-624 (Phase F)  wire main() + AC-1..AC-5 + Jetson            2pt
Aggregate: 16pt actionable work (vs. AZ-618's original 5pt filing,
which the author had already qualified as understated). AZ-618 stays
In Progress in Jira as the umbrella tracker; its task spec file is
now an umbrella reference pointing to the 6 phase-specific spec files.
Deps table updated: AZ-618 row reduced to 0pt with subtask deps; six
new rows added; header counts refreshed (156 -> 162 tasks, 522 -> 533
points). Autodev state set to phase=1 (parse) for the next batch =
AZ-619 (Phase A) only.
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 89 — adds optional `band`, `ci95_low`, `ci95_high` kw-only
parameters to `_NfrRecorder.record_metric` and emits a new per-metric
report.csv artifact (one row per scenario × metric, columns:
scenario_id, metric_name, value, value_band, ci95_low, ci95_high,
ac_id, outcome). Backwards compatible — existing 4-arg callers
unchanged; an unbalanced ci95 pair raises ValueError. report.csv is
written once per pytest session from `pytest_sessionfinish`, so the
annotation pass runs once per CI invocation regardless of how many
(fc_adapter, vio_strategy) combinations the session covers (AC-3).
`regression-baseline.json` is intentionally kept flat to preserve the
diff contract used by regression-detection tooling.
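The recorder contract can be sketched as below. SketchRecorder and render_report are hypothetical names (not the real _NfrRecorder API), and the outcome column is left blank because the session-end annotation logic is not described here:

```python
import csv
import io
from dataclasses import dataclass, field

REPORT_COLUMNS = [
    "scenario_id", "metric_name", "value", "value_band",
    "ci95_low", "ci95_high", "ac_id", "outcome",
]


@dataclass
class SketchRecorder:
    """Hypothetical stand-in for _NfrRecorder (not the real class)."""

    scenario_id: str
    rows: list = field(default_factory=list)

    def record_metric(self, name, value, ac_id, *, band=None,
                      ci95_low=None, ci95_high=None):
        # Existing positional callers are unaffected; the new args are kw-only.
        if (ci95_low is None) != (ci95_high is None):
            raise ValueError("ci95_low and ci95_high must be given together")
        self.rows.append({
            "scenario_id": self.scenario_id,
            "metric_name": name,
            "value": value,
            "value_band": "" if band is None else band,
            "ci95_low": "" if ci95_low is None else ci95_low,
            "ci95_high": "" if ci95_high is None else ci95_high,
            "ac_id": ac_id,
            "outcome": "",  # filled by the session-end annotation pass (not modelled)
        })


def render_report(rows) -> str:
    # One CSV row per (scenario, metric) in the documented column order.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=REPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The validation runs before the row is appended, so a rejected call leaves the report untouched.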
NFT-RES-03 + NFT-PERF-01 scenarios updated to pass real bands and
compute empirical 2.5/97.5-percentile ci95 from their own sample
streams (per-iteration envelope ratios for Monte Carlo,
per-frame latency samples for N-sample latency).
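One way to compute such an empirical interval with the standard library, shown as a sketch: statistics.quantiles with n=40 yields cut points every 2.5%, so the first and last are the 2.5th and 97.5th percentiles. The helper name is hypothetical and the scenarios' own interpolation rule may differ:

```python
import statistics


def empirical_ci95(samples):
    """Return (low, high) at the 2.5th and 97.5th sample percentiles."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for an empirical CI")
    # n=40 puts cut points every 2.5%: index 0 is 2.5%, index 38 is 97.5%.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

The same helper applies unchanged to per-iteration envelope ratios or per-frame latency samples, since it only needs a flat sequence of numbers.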
Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446
band/CI behavior, value-error on unbalanced ci95, report.csv columns,
explicit-path override, and end-to-end emission via the pytest
plugin). Code review: PASS_WITH_WARNINGS — 1 Low (empirical-CI
semantics, documented inline), 1 Medium carried over from batch 88's
cumulative-review backlog (write_csv_evidence + _resolve_fixture_path
duplication is outside AZ-446 reporting scope).
This commit closes Step 10 Implement Tests for cycle 1 (41 of 41
blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to
Step 11 Run Tests next.
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 88 — adds four resource-limit blackbox scenarios + pure-logic
helpers + unit tests:
- NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets;
AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams.
- NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation
against 50 GiB; ±60 s replay-window slack for AC-1.
- NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE):
aggregate ≤ 100 GiB across tile-cache + tile-cache-write +
fdr-output; thumbnail-log < 1 GiB (strict bound, 8 h-extrapolated).
- NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99
≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written
to traceability-status.json. Thresholds fixture lives at
e2e/fixtures/jetson/thermal-thresholds.json (moved from the
task spec's suggested tests/fixtures/ path so the file stays
inside the blackbox_tests Owns: e2e/** envelope).
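The 30 min → 8 h extrapolation used by NFT-LIM-02 and the thumbnail-log check is a factor-of-16 linear scale-up. A minimal sketch follows; the function names, the inclusive ≤ comparison for the FDR budget, and the 30 min observation window for the thumbnail log are assumptions:

```python
GIB = 1024 ** 3


def extrapolate_linear(observed_bytes: float, observed_s: float, target_s: float) -> float:
    """Project a size measured over observed_s seconds onto target_s seconds."""
    if observed_s <= 0:
        raise ValueError("observed duration must be positive")
    return observed_bytes * (target_s / observed_s)


def fdr_within_budget(observed_bytes: float,
                      observed_s: float = 30 * 60,
                      target_s: float = 8 * 3600,
                      budget_bytes: float = 50 * GIB) -> bool:
    # NFT-LIM-02 shape: a 30 min sample projected to 8 h, compared to 50 GiB.
    return extrapolate_linear(observed_bytes, observed_s, target_s) <= budget_bytes


def thumbnail_log_ok(observed_bytes: float,
                     observed_s: float = 30 * 60,
                     target_s: float = 8 * 3600) -> bool:
    # Strict bound: exactly 1 GiB after extrapolation fails.
    return extrapolate_linear(observed_bytes, observed_s, target_s) < GIB
```

For example, 3 GiB recorded in 30 min projects to 48 GiB over 8 h and passes, while 64 MiB of thumbnail log projects to exactly 1 GiB and fails the strict bound.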
All four helpers are public-boundary-only (no src/gps_denied_onboard
imports). Scenarios skip cleanly in the Tier-1 docker harness pending
AZ-595 (SITL replay builder) for the four shared fixture inputs and
AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios.
Code review: PASS_WITH_WARNINGS (0/0/2/1: 2 Medium, 1 Low). Both
Mediums are the carried-over write_csv_evidence + _resolve_fixture_path
duplication, deferred to AZ-446 (batch 89). The Low is the self-resolved
AZ-443 fixture ownership drift documented in the review.
Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new
directory-layout entry); 24 resource_limit scenarios collect and skip
cleanly under runner/pytest.ini.
Co-authored-by: Cursor <cursoragent@cursor.com>
Archive AZ-420 to done/; add cumulative review for batches 79-81 (PASS,
no new findings); advance autodev state to await batch 82.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.
AZ-407 — Static fixture builders (3pt):
* tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
deterministic tile-cache-fixture Docker volume from
_docs/00_problem/input_data/. Reproducibility primitives: sorted
iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
threaded with seeded stub descriptors.
* age-injector/{age_injector.py, inject.sh} clones the volume and
shifts capture_date by N×30.44 days; tile JPEG bytes preserved
bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
* cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
Derkachi sector centre, schema v1.
* secrets/mavlink-test-passkey.txt: 64 hex characters with the required
`# TEST ONLY` header line per AC-5. The passkey-equality test now
compares the secret line after stripping the header.
* security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
(truncated SOS marker). OpenCV 4.11.x rejects it gracefully (imdecode
returns None). AZ-439 will sharpen the fixture for ASan instrumentation.
* Top-level Makefile with `make fixtures` / `make fixtures-*` /
`make e2e-tier1*` / `make unit-tests` targets.
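The age-injector's date shift can be pictured as follows. This sketch assumes the injector moves capture_date into the past (older imagery) and keeps fractional days. shift_capture_date is a hypothetical name, not the script's actual function:

```python
from datetime import datetime, timedelta

DAYS_PER_MONTH = 30.44  # average month length quoted for age-injector


def shift_capture_date(capture: datetime, months: int) -> datetime:
    """Move a tile's capture timestamp `months` average months into the past."""
    if months < 0:
        raise ValueError("months must be non-negative")
    # Only the timestamp changes, in line with the bit-identical JPEG guarantee.
    return capture - timedelta(days=months * DAYS_PER_MONTH)
```

Under these assumptions, synth-age-7mo and synth-age-13mo correspond to months=7 and months=13.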
AZ-444 — Tier-2 Jetson harness wrapper (5pt):
* run-tier2.sh rewritten as orchestrator. Detects local
(aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST).
New flags: -k/--selector, --build-kind production|asan,
--reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
--dry-run.
* tier2-on-jetson.sh (new) — on-device delegate. Verifies
gps-denied-onboard{,-asan}.service health; restarts with 5s
tolerance; spawns tegrastats + jtop parallel samplers; tails
ASan unit's journal in asan mode; drives docker compose with
TIER=tier2-jetson; forwards SELECTOR to pytest -k.
* docker/run-tier1.sh (new) — selector-parity sibling.
* AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
--dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
unit-test layer).
AZ-445 — CSV reporter + evidence bundler refinements (2pt):
* reporting/nfr_recorder.py (new) — pytest plugin. Provides the
`nfr_recorder` fixture with record_metric(name, value, ac_id)
and partial(ac_id, reason). At session end emits:
- per-nfr/<scenario_id>.json (AC-1)
- traceability-status.json with every AC ID parsed from
traceability-matrix.md, classified Covered/PARTIAL/NOT
COVERED with source scenario IDs (AC-2)
- regression-baseline.json with all numeric metrics (AC-3)
* csv_reporter.py extended — `_outcome_to_result` consults the
aggregator; rows flip PASS → PARTIAL when an AC was marked
PARTIAL by nfr_recorder (AC-4). Graceful fallback when
aggregator isn't registered (unit-test contexts).
* conftest.py registers nfr_recorder in pytest_plugins.
* New --traceability-matrix CLI flag seeds the NOT COVERED rows.
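The PASS → PARTIAL flip and the three-way traceability classification can be sketched as below. Both function names, the outcome-to-result mapping, and the rule that PARTIAL takes precedence over Covered are assumptions for illustration. Passing partial_acs=None models the "aggregator isn't registered" fallback:

```python
from typing import Dict, List, Optional, Set

_BASE_RESULTS = {"passed": "PASS", "failed": "FAIL", "skipped": "SKIP"}


def outcome_to_result(outcome: str, ac_id: str,
                      partial_acs: Optional[Set[str]]) -> str:
    """Map a pytest outcome to a CSV result, downgrading PASS to PARTIAL."""
    result = _BASE_RESULTS.get(outcome, "ERROR")
    if partial_acs is None:
        # No aggregator registered (e.g. unit-test context): plain mapping.
        return result
    if result == "PASS" and ac_id in partial_acs:
        return "PARTIAL"
    return result


def classify_ac(ac_id: str, covering: Dict[str, List[str]],
                partial_acs: Set[str]) -> str:
    """Classify one traceability-matrix AC as Covered, PARTIAL or NOT COVERED."""
    if ac_id in partial_acs:
        return "PARTIAL"
    if covering.get(ac_id):
        return "Covered"
    return "NOT COVERED"
```

Only passing rows are downgraded; a failing scenario stays FAIL even when its AC is marked PARTIAL.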
Build / config:
* pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
tile-cache-builder unit test (broad enough to keep torchvision's
Pillow 12 pin happy; the production builder runs inside its own
Docker image with its own pin).
* Updated test_directory_layout.py to cover 10 new files + replaced
the byte-equal passkey assertion with the header-stripping
variant.
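The header-stripping passkey comparison might look like this. The sketch assumes the `# TEST ONLY` header is the first non-blank line and the file holds exactly one secret line. passkey_secret is a hypothetical helper name:

```python
import re

_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def passkey_secret(text: str) -> str:
    """Return the 64-hex secret from a passkey file with a '# TEST ONLY' header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# TEST ONLY"):
        raise ValueError("missing '# TEST ONLY' header line")
    secrets = [ln for ln in lines[1:] if not ln.startswith("#")]
    if len(secrets) != 1 or not _HEX64.fullmatch(secrets[0]):
        raise ValueError("expected exactly one 64-hex secret line")
    return secrets[0]
```

Comparing passkey_secret() outputs rather than raw file bytes keeps the equality test stable if the header wording changes.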
Test results:
* 157 focused tests pass (was 97 in batch 67; +60 new across this
batch). No regressions.
Module-layout / spec drift:
* AZ-407 spec text says `tests/fixtures/...`; module-layout
blackbox_tests entry (commit d7a17a8) authoritatively places the
harness under `e2e/`. Implementation followed the layout entry.
* AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
consistency.
* Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
AZ-419's own Dependencies field.
Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.
Co-authored-by: Cursor <cursoragent@cursor.com>