mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 13:11:14 +00:00
ab92946833af12768bb9ef3a00f40f56809dfcf2
62 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ab92946833 |
[autodev] Step 13 partial: helpers 5-8 cycle-1 doc sync
Batch 5b completes the helpers sweep for cycle-1 Step 13. For each of the four remaining helpers (sha256_sidecar, engine_filename_schema, ransac_filter, descriptor_normaliser): - Append "Cycle-1 operational reality" section to the existing common-helpers/<NN>_*.md, documenting the shipped interface, exception types, public constants, determinism / validation invariants, and AZ-task lineage. Specific cycle-1 facts captured per helper: - sha256_sidecar (AZ-280): single Sha256SidecarError hierarchy, SIDECAR_SUFFIX public constant, sidecar format is pure lowercase 64-char hex (no JSON), verbatim ".sha256" suffix append, streaming digests in 1 MiB chunks, verify-returns-False semantics for missing payload vs. raise for missing sidecar, byte-deterministic aggregate_hash with sorted-by-str basenames. - engine_filename_schema (AZ-281): EngineFilenameSchemaError, ENGINE_SUFFIX and ALLOWED_PRECISIONS public constants, strict model validation ([a-z0-9_]+ ≤64 chars no __), dotted version regex, non-bool sm validation, matches_host ignores precision by design. - ransac_filter (AZ-282 / AZ-623): RansacFilterError, frozen RansacResult dataclass, cv2.setRNGSeed(0) determinism, median-not-mean residual, NaN for empty inliers, min_inliers is informational only, filter_correspondences uses perspectiveTransform vs. compute_reprojection_residual uses projectPoints, OK to import se3_utils (both Layer 1). - descriptor_normaliser (AZ-283 / AZ-338): DescriptorNormaliserError, ALLOWED_DTYPES = (float16, float32), float32 norm computation with dtype-preserving cast-back, new intra_cluster_normalise method for NetVLAD per-cluster L2 (AZ-338), descriptor_metric returns "inner_product" string. Two contract files (descriptor_normaliser.md and ransac_filter.md mention follow-up) need follow-up minor revisions to match shipped surface; queued for the contracts-folder sweep. Bumps _docs/_autodev_state.md sub_step to tests-doc-updates phase 9. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
4fdf1968af |
[autodev] Step 13 partial: helpers 1-4 cycle-1 doc sync
Batch 5a of the cycle-1 doc sync. For each of the four foundation helpers (imu_preintegrator, se3_utils, lightglue_runtime, wgs_converter): - Append "Cycle-1 operational reality" section to the existing common-helpers/<NN>_*.md, documenting what the shipped implementation actually exposes vs. the design- intent sketch (interfaces, exception types, public constants, AZ-task lineage). Specific cycle-1 facts captured per helper: - imu_preintegrator (AZ-276): make_imu_preintegrator factory, BMI088-class noise defaults, single ImuPreintegrationError exception, actual return type is PreintegratedCombinedMeasurements (consumer builds the CombinedImuFactor), destructive reset_with_bias semantics, first-sample-not-integrated dt=0 handling. - se3_utils (AZ-277): SE3 = gtsam.Pose3 re-export, Se3InvalidMatrixError, strict caller-orthogonalisation invariant, _DEFAULT_ROT_ATOL=1e-6 and small-angle Taylor cutoff for exp_map, is_valid_rotation predicate, strict dtype=float64 everywhere. - lightglue_runtime (AZ-278 / R14 fix): EngineHandle Protocol-typed constructor, LightGlueRuntimeError + LightGlueConcurrentAccessError, non-blocking concurrent- access guard (raises rather than serialises), match_batch equal-length precondition, composition-root single-instance into C2.5 + C3. - wgs_converter (AZ-279 + AZ-490): WEB_MERCATOR_MAX_LAT_DEG and MAX_ZOOM constants, WgsConversionError, ECEF arrays are ndarray(3,) float64, new horizontal_distance_m method (AZ-490 takeoff-origin bounded-delta gate), slippy-map tile math hand-rolled to match satellite-provider on-disk layout. Two contract files (imu_preintegrator.md and wgs_converter.md) need follow-up minor revisions to match shipped surface; queued for the next contracts-folder sweep, noted inline in each helper's new section. Also refresh D-CROSS-CVE-1 opencv-pin leftover replay timestamp (8-min debounce — gtsam upstream state cannot change in that window). Bumps _docs/_autodev_state.md sub_step detail. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
12aba8139f |
[autodev] Step 13 partial: c10/c11/c12/c13 cycle-1 doc sync
Batch 4 of the cycle-1 component-doc sync. For each of C10
(provisioning), C11 (tilemanager), C12 (operator_orchestrator),
and C13 (fdr):
- Append "Cycle-1 operational reality" paragraph to § 1
documenting the actual cycle-1 wiring path:
- C10: operator-side / cross-tier; NOT in _STRATEGY_REGISTRY;
composed via runtime_root/c10_factory.py with six per-service
factories; reuses C7 InferenceRuntime for engine compile;
AZ-323 Ed25519 signer + C10ManifestConfig signing-mode gate;
AZ-324 ManifestVerifierImpl with airborne/operator modes;
AZ-507 c6 cuts kept in c10_factory; AZ-687 N/A.
- C11: operator-workstation-only; airborne build target
excludes source tree (ADR-004 / AC-8.4); composed via
runtime_root/c11_factory.py with three per-service factories;
distinct FdrClient producer_ids for signing_key + tile_uploader;
AZ-320 IdempotentRetryTileUploader wraps by default;
AZ-507 keeps c6 surfaces caller-injected; AZ-687 N/A.
- C12: operator-workstation CLI binary; airborne build excludes
source tree (ADR-004 + Principle #9); composed via
runtime_root/c12_factory.py; OperatorOrchestratorServices
dataclass aggregates AZ-326/327/328/329/330/489 services with
sibling fields defaulting to None; AZ-507 cuts via
RemoteCacheProvisionerInvoker + TileDownloaderCut/UploaderCut;
AZ-687 N/A.
- C13: airborne infrastructure; pre_constructed[c13_fdr] seeded
FIRST via make_fdr_client(AIRBORNE_MAIN_PRODUCER_ID, config)
(AZ-619 Phase A); per-producer _CACHE gives AC-619.2 singleton;
AZ-274 drop-oldest overrun policy wired at construction;
c1_vio / c5_state require it, c2_5/c3/c3_5/c4 optional; AZ-687
guard explicitly does NOT apply — seed runs before any block
presence check so replay binaries still write FDR.
Also bump _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md
replay timestamp to 17:18 (start of this /autodev invocation);
gtsam==4.2.1 still requires numpy<2.0.0 so the relaxed opencv pin
remains in effect.
Update _docs/_autodev_state.md sub_step.detail to record batch
4/~5 done; next batch is the 8 helpers under common-helpers/.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
76f460c88a |
[autodev] Step 13 partial: c6/c7/c8 cycle-1 doc sync
Batch 3 of the cycle-1 component-doc sync. For each of C6
(tile_cache), C7 (inference), C8 (fc_adapter):
- Append "Cycle-1 operational reality" paragraph to § 1
documenting the actual cycle-1 wiring path:
- C6: infrastructure seeded via build_pre_constructed's
c6_descriptor_index (BUILD_FAISS_INDEX-gated) and
c6_tile_store slots; no _STRATEGY_REGISTRY slot;
AZ-687 replay-mode guard skips both seeds when the
minimal replay Config omits the c6_tile_cache block.
- C7: single InferenceRuntime built once via
_build_c7_inference, identity-shared as the engine
source for c3_lightglue_runtime (AZ-622 phase D);
C7_AIRBORNE_BUILD_FLAGS lists tensorrt (production-
default) + pytorch_fp16 (Tier-0 fallback);
onnx_trt_ep deliberately omitted from airborne flags;
AZ-687 replay-mode guard cascades to c3_lightglue_runtime.
- C8: composed via a SEPARATE registry path
(runtime_root/fc_factory.py) with its own _FC_REGISTRY
+ _GCS_REGISTRY; per-binary bootstrap modules register
concrete strategies under BUILD_FC_* / BUILD_GCS_*
flags; bind_outbound_emit_thread enforces the
single-writer outbound invariant (AC-6).
- Add "Cycle-1 Tier-2 follow-up dependencies" subsection
in § 7 of C7 only: onnx_trt_ep is implemented and the
inference_factory recognises BUILD_ONNX_TRT_EP_RUNTIME,
but airborne config selecting it raises a clean
AirborneBootstrapError pointing only at the two airborne
options. C6 and C8 have no parked Tier-2 strategies for
cycle-1.
None of c6/c7/c8 import cv2 directly, so no OpenCV pin
row is added to § 5 (D-CROSS-CVE-1 leftover stays as it
is; the relaxed pin is recorded against c2.5/c3/c3.5/c4/c5
where the imports actually live).
Also refresh the D-CROSS-CVE-1 leftover replay timestamp
(condition still upstream-gated: gtsam wheels remain
numpy<2) and bump the autodev state's sub_step.detail to
record "batch 3/~5 done (c6/c7/c8); 4 components + 8
helpers + tests/ remain".
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
39a7267a23 |
[autodev] Step 13 partial: c3_5/c4/c5 cycle-1 doc sync
Batch 2 of the cycle-1 component-doc sync. For each of C3.5 (AdHoP), C4 (Pose), C5 (State): - Append "Cycle-1 operational reality" paragraph to § 1 documenting the _STRATEGY_REGISTRY wiring, the AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS slot, and the composition-time errors raised on missing seeds. - Relax the OpenCV pin in § 5 to >=4.11.0.86,<4.12 with a pointer to the D-CROSS-CVE-1 leftover (C5 adds a new row for the AZ-389 orthorectifier subsystem's cv2 import). - Add "Cycle-1 Tier-2 follow-up dependencies" subsection in § 7 where applicable: C3.5 calls out the airborne registry's omission of PassthroughRefiner; C5 calls out the AZ-389 orthorectifier wiring (default OFF) and the AZ-624 operator-supplied flight metadata that must land before flipping orthorectifier.enabled=True. C4 has no parked Tier-2 (only opencv_gtsam is defined). Also refresh the D-CROSS-CVE-1 leftover replay timestamp (condition still upstream-gated: gtsam wheels remain numpy<2) and bump the autodev state's sub_step.detail to record "batch 2/~5 done (c3_5/c4/c5); 7 components + 8 helpers + tests/ remain". Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
c1f27e4681 |
[autodev] Step 13 partial: c1/c2/c2_5/c3 cycle-1 doc sync
Item 2 (C1) + item 3 batch 1 of ~5 (C2 VPR, C2.5 Rerank, C3 Matcher) of the cycle-1 component-description reconciliation called out in ripple_log_cycle1.md. For each touched description.md: - Add a "Cycle-1 operational reality" paragraph in section 1 that names the _STRATEGY_REGISTRY + register_airborne_strategies() runtime gate (AZ-591), the pre_constructed dict path through compose_root (AZ-618 umbrella), the per-component AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS row, and any cycle-1 strategy-default vs documented-primary disambiguation (net_vlad as the C2 default; xfeat parked from the C3 airborne registry). - Relax the OpenCV row in section 5 Key Dependencies to the D-CROSS-CVE-1 cycle-1 pin (>=4.11.0.86,<4.12) wherever the component imports cv2 (C2 preprocessors, C2.5 ORB placeholder, C3 RANSAC + reprojection). - Add a "Cycle-1 Tier-2 follow-up dependencies" subsection in section 7 only for components with a strategy module that is built but parked from the airborne registry (C3 xfeat). Refresh ripple_log_cycle1.md follow-up ordering with per-batch progress + extracted batch pattern so the next batch session has a self-contained recipe. Bump _autodev_state.md sub_step.detail to reflect batch 1 completion (10 components + 8 helpers + tests/ remain). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
bb9c408597 |
[autodev] Step 12 cycle-1 sync: tests/resilience+traceability
Backfill the uncommitted Step 12 (Test-Spec Sync) output for the resilience-tests and traceability-matrix surfaces; these were produced by the test-spec skill in cycle-update mode but never landed as a git commit before the flow moved to Step 13. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1ca9a59b0b |
[autodev] Step 13 partial: arch + module-layout cycle-1 sync
Item 1 of the deferred Step 13 refresh set per _docs/02_document/ripple_log_cycle1.md. architecture.md: - Components C1: KltRansac is the cycle-1 operational default while AZ-332/AZ-333 are BLOCKED awaiting Tier-2 prerequisites; ADR-001 / ADR-002 unchanged (the seam holds; the selection shifted). - Principle #3: same KltRansac note (cross-link to Components). - § Technology Stack: OpenCV pin row reflects the cycle-1 relaxation to >=4.11.0.86,<4.12 with the leftover-file pointer; OKVIS2 + VINS- Mono rows note BLOCKED with AZ-592 / AZ-593 follow-ups. - § NFR: Dependency CVE pinning row notes the relaxation and the CVE-2025-53644 re-validation owed before close. - § ADR-001: cycle-1 operational note (KltRansac default; AZ-332/333 facade-only; AZ-589/590 closed Won't-Fix). - § ADR-009: new Cycle-1 implementation subsection covers _STRATEGY_REGISTRY + register_strategy (AZ-591) and the pre_constructed kwarg + build_pre_constructed (AZ-618 umbrella; Phases A-F including AZ-625 / AZ-687). module-layout.md: - shared/runtime_root entry: package layout (was single file in the Plan-era sketch); new public-surface table covering __init__.py, airborne_bootstrap.py, _replay_branch.py, and the per-component factory modules; ownership rows extended (AZ-591, AZ-618, AZ-625, AZ-687). system-flows.md: intentionally not modified — F2 / F8 narratives are at the component-flow abstraction level and do not reference compose_root / pre_constructed mechanics, so they have not drifted. Items 2-4 of the ripple-log refresh set (C1 description, the other 13 components, 8 helpers, tests/*.md) remain deferred to subsequent sessions. State: Step 13 stays in_progress; sub_step advanced to phase 6 (component-doc-updates). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
4f122b604d |
[autodev] Step 13 partial: system-level cycle-1 doc sync
Updates _docs/02_document/ to capture the highest-leverage cycle-1 deltas after 97 implementation batches: - FINAL_report.md: revise Decision 9 to reflect the actual opencv-python pin (>=4.11.0.86,<4.12; D-CROSS-CVE-1 deferred per leftover); new "Cycle 1 Implementation Status" section documents the _STRATEGY_REGISTRY + pre_constructed composition-root additions (AZ-591, AZ-618/AZ-619..AZ-624), AZ-332 + AZ-333 BLOCKED with parked Tier-2 follow-ups AZ-592 + AZ-593, AZ-589 + AZ-590 closed Won't-Fix, Step 11 Run Tests results (3343 passed / 88 skipped / 0 failed local; Docker harness rehab tracked by AZ-602), and the deferred-reconciliation list. - glossary.md: 5 new cycle-1 entries (_STRATEGY_REGISTRY, airborne_bootstrap, KltRansac as production-default Tier-1 VIO, pre_constructed kwarg, Tier-1 task / Tier-2 task capability classification). Status line notes the cycle-1 additions pending re-confirmation. - ripple_log_cycle1.md (new): explains why per-file enumeration is N/A for end-of-cycle-1 sync, lists the three doc-update levels and their effective scope, and records the recommended follow-up ordering for the deferred component / helper / contract / test passes. Step 13 deferred: architecture.md, module-layout.md, system-flows.md, 14 component description.md + tests.md, 8 helper docs, 18 contract subfolders, 7 test docs (~50+ files; ~80 product tasks + ~8 helper tasks + ~36 blackbox test tasks). Filed in FINAL_report.md and ripple_log_cycle1.md; resume in a fresh conversation per the 2026-05-18 LESSONS.md guidance. State: greenfield / Step 13 / in_progress / phase 5 (system-level-updates) / cycle 1. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d066a23cb1 |
[autodev] Add Tier-2 Jetson testing strategy doc
Codifies that Tier-1 (local pytest + Docker) is necessary but NOT sufficient: Tier-2 (Jetson Orin Nano via run-tests-jetson.sh) is the product-completeness gate for runtime_root, c7_inference, c3_matcher, c2_5_rerank, replay_input, and the replay CLI. Documents the mandatory-Tier-2 scope, what Tier-1-only stubs cannot prove, the operating procedure, and what batch reports must capture for in-scope changes. Surfaced by the Step-11 cycle-1 finding that AZ-618 was only caught because Tier-2 was actually run. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d7a17a8248 |
[AZ-406] Add blackbox_tests cross-cutting entry to module-layout.md
The 41 blackbox/e2e test tasks (AZ-406..AZ-446 under epic AZ-262) all declare Component=Blackbox Tests, but module-layout.md had no matching Per-Component Mapping entry. The implement skill's Step 4 (File Ownership) requires every batch's component to be resolvable in module-layout.md. Add a `blackbox_tests` entry in the Shared / Cross-Cutting section that owns the top-level `e2e/` directory (separate from `tests/`), documents the public-boundary discipline (no SUT imports), and clarifies that boundary-driven performance/resilience/security scenarios live under `e2e/tests/<category>/` rather than under `tests/perf|security|resilience/`. Also update Layout Rule #7 to reflect the harness split and the state file's sub_step to parse-and-detect-progress (Step 10 entry). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8149083cac |
[AZ-405] Replay — replay_input/ coordinator + IMU take-off auto-sync
Adds the Layer-4 cross-cutting `replay_input/` module per ADR-011: ReplayInputAdapter converges (video, tlog) into the standard FrameSource + FcAdapter + Clock surfaces the airborne composition root consumes. Owns time-alignment between video frames and tlog IMU/attitude ticks (manual via --time-offset-ms or auto via the AZ-405 IMU-take-off detector + Farneback motion-onset detector). Auto-sync algorithm (auto_sync.py): - Tlog take-off detector: sustained vertical-accel excess > 0.5 g for >= 0.5 s + sustained attitude-rate magnitude > 1 rad/s. - Video motion-onset detector: dense Farneback flow magnitude > 1.5 px sustained >= 0.5 s (deterministic per AC-10). - compute_offset combines the two; confidence = min(tlog, video). - validate_offset_or_fail implements the AC-9 95 % frame-window match validator with configurable threshold + window. ReplayInputAdapter.open() ordering (AC-13): 1. Load tlog samples + fail-fast on missing RAW_IMU/SCALED_IMU2 or ATTITUDE BEFORE any video read. 2. Resolve offset (auto-sync OR manual override; manual bypasses the detectors entirely per AC-8). 3. Run AC-9 validator on resolved offset; raise auto-sync hard-fail for AC-7 (CLI exit 2 mapping). 4. Build single Clock instance per pace (TlogDerived/ASAP, Wall/REAL). 5. Construct VideoFileFrameSource and TlogReplayFcAdapter with the resolved offset baked in (replay protocol Invariant 8). Structured log + FDR records on auto-sync detected / low-confidence / AC-8 hard-fail kinds. Idempotent close (AC-12). Tests: 25 unit tests across tests/unit/replay_input/ covering all 13 ACs (kernel-level synthetic fixtures for AC-1..AC-10; coordinator- level OpenCV synthetic videos + faked pymavlink for AC-6..AC-13). Contract update: replay_protocol.md v2.0.0 added fdr_client to the ReplayInputAdapter __init__ signature (was missing in the prose; the task spec already listed it in the allowed-imports section). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5adf3dd04f |
[AZ-265] Replay as configuration of airborne binary (ADR-011)
Re-design replay mode per user direction: replay is no longer a fourth Docker image with a reduced component set, but a `config.mode = "replay"` branch of the single airborne binary. The pre-flight workflow (route in suite UI -> C12 tile download via real satellite-provider -> C10 manifest+engines build) is identical between live and replay; only three strategies swap at compose time: FrameSource: Live <-> Video FcAdapter: Pymavlink/MSP2 <-> TlogReplay MavlinkTransport: Serial <-> Noop The C8 outbound MAVLink encoders run unchanged in both modes; their bytes hit `NoopMavlinkTransport` in replay and disappear. A new `JsonlReplaySink` taps C5's `EstimatorOutput` stream so the parent-suite UI sees per-tick coordinates by tailing `results.jsonl`. MAVLink 2.0 signing key remains mandatory (operator supplies a dummy file). A new `replay_input/` Layer-4 cross-cutting coordinator owns `(video, tlog) -> (FrameSource, FcAdapter, Clock)` convergence; the composition root sees only standard interfaces past `.open()`. Docs: - architecture.md: new ADR-011 with full rationale; ADR-002 binary narrative updated. - contracts/replay/replay_protocol.md: bumped to v2.0.0; 12 invariants (notably mode-agnosticism + encoder byte-equality + signing key mandatory + real C6 cache in replay). - module-layout.md: Build-Time Exclusion Map dropped from 4 to 3 binary columns; replay-mode `BUILD_*` flags default ON in airborne; `shared/replay_input` cross-cutting entry added. - epics.md: E-DEMO-REPLAY scope reframed; story points 27-32 -> 19-24. Task respecs: - AZ-401: shrunk 3 -> 2 pts; `compose_root` mode branch + JSONL sink + NoopMavlinkTransport wiring; legacy `compose_replay` export deleted. - AZ-402: console-script wrapper that mutates `config.mode = "replay"` and dispatches into the shared airborne main; `--mavlink-signing-key` mandatory. - AZ-403: CANCELLED. Moved to done/ with banner; Jira transition deferred via `_docs/_process_leftovers/2026-05-14_az_403_cancellation_pending_tracker.md`. - AZ-404: AC-4 reworded as mode-agnosticism AST scan + encoder byte-equality test; new AC-8 operator-workflow rehearsal. - AZ-405: also owns the `replay_input/` module + `ReplayInputAdapter`. _dependencies_table.md updated: AZ-401 gains AZ-405 dep; AZ-404 drops AZ-403 dep; AZ-403 row marked CANCELLED. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
fa3742d582 |
[AZ-399] [AZ-400] C8 TlogReplayFcAdapter + ReplaySink + JsonlReplaySink
Opens E-DEMO-REPLAY (AZ-265): the two C8 strategies that let the upcoming compose_replay (AZ-401) and gps-denied-replay CLI (AZ-402) run the production C1-C5 pipeline against a recorded (.tlog, video) pair without touching live FC I/O. AZ-400 lands the contract ReplaySink Protocol (emit + close per replay_protocol.md v1.0.0) and JsonlReplaySink: orjson-serialised JSONL, fsync-on-close, build-flag gated (BUILD_REPLAY_SINK_JSONL), double-close idempotent, FDR mirror on open/close. The drifted AZ-390 stub in interface.py is removed; the canonical Protocol now lives in replay_sink.py per module-layout.md and is re-exported via __init__.py. AZ-390 conformance test widened. AZ-399 lands TlogReplayFcAdapter: full FcAdapter Protocol surface, build-flag gated (BUILD_TLOG_REPLAY_ADAPTER), pymavlink stream-parse with bounded pre-scan + fail-fast on missing required messages (R-DEMO-3), dedicated decode thread feeding the existing AZ-391 SubscriptionBus. Outbound surface raises FcEmitError per Invariant 5; request_source_set_switch raises SourceSetSwitchNotSupportedError. Pacing honours Invariant 6 via Clock.sleep_until_ns. time_offset_ms shifts every emitted received_at per Invariant 8. Non-monotonic timestamps raise FcOpenError. Test coverage: 188 c8_fc_adapter tests pass; 1 skipped (AZ-399 AC-1 500 MB tlog RSS bound, deferred to AZ-404 e2e behind RUN_REPLAY_E2E). Code review: PASS_WITH_WARNINGS — 1 Medium (mapping logic duplicates AZ-391 live decoder; intentional today, four behavioural deltas documented), 2 Low. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5dfd9a577e |
[AZ-526] Consolidate _iso_ts_from_clock into helpers/iso_timestamps
Closes cumulative review 46-48 F1 (Medium) + F3 (Low). Adds iso_ts_from_clock(clock) alongside iso_ts_now() in the Layer-1 helper; migrates four duplicate definitions in c2_vpr (net_vlad, ultra_vpr, _faiss_bridge) and c12_operator_orchestrator (operator_reloc_service). Output format flipped +00:00 -> Z to align with iso_ts_now() and the canonical FDR _TS fixture (FDR schema test passes unmodified). 18 helper AC tests + 186 sibling tests pass; ruff clean. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5441ea2017 |
[AZ-508] Consolidate _iso_ts_now into helpers/iso_timestamps
Batch 48 / Cycle 1 (greenfield Step 7). Closes cumulative review batches 31-33 F2 and 28-30 F3 by replacing the duplicated private _iso_ts_now() one-liners with a single Layer-1 helper: src/gps_denied_onboard/helpers/iso_timestamps.py iso_ts_now() -> str Output format matches the canonical FDR _TS fixture (YYYY-MM-DDTHH:MM:SS.ffffffZ); no FDR schema change. Migrated call-sites (3): c7_inference/onnx_trt_ep_runtime, c7_inference/thermal_publisher, plus the 3 c6_tile_cache callers that previously imported from the local c6_tile_cache/_timestamp shim (now deleted, superseded by the Layer-1 helper). Spec drift resolved (Choose A, user-approved): spec listed 5 call sites + +00:00 regex; on-disk reality at batch start is 3 sites + Z-suffix matching every existing helper and the FDR _TS fixture. Spec preamble + AC-2 regex updated in the task file; documented in batch_48_cycle1_report.md. Tests: 9 new AC tests (AC-1..AC-7 + Layer-1 invariant + public-surface defensive); 216 focused tests pass including the unmodified AZ-272 FDR schema suite and AZ-270 / AZ-507 layering lints. Verdict: PASS (no findings). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
af0dbe863a |
[AZ-338] [AZ-283] C2 NetVLAD mandatory simple-baseline VprStrategy
NetVLAD is the C2 comparative baseline per the engine rule (every production-default backbone ships with a simple-baseline alongside). Runs on the C7 PyTorch FP16 runtime (NOT TRT) so a TRT engine compile bug cannot simultaneously break NetVLAD AND UltraVPR. Production changes: - c2_vpr/net_vlad.py — NetVladStrategy + module-level create() factory. Constructor wires InferenceRuntimeCut + DescriptorIndexCut + NetVladBackbonePreprocessor + DescriptorNormaliser + FaissBridge. embed_query pipeline: preprocess -> runtime.infer -> dual-stage normalisation (intra-cluster THEN global L2) -> VprQuery. retrieve_topk delegates one-line to FaissBridge. - c2_vpr/_net_vlad_architecture.py — Arandjelovic et al. 2016 NetVLAD layer over torchvision VGG16 features + optional Linear PCA projection to descriptor_dim (default 4096; published Pittsburgh reference uses K*D=64*512=32768 raw + Linear(32768, 4096) PCA). - c2_vpr/_preprocessor_net_vlad.py — OpenCV-based image preprocessor: decode -> centre-crop square -> resize (480, 480) -> ImageNet normalisation -> FP16 NCHW. Calibration is not consumed (NetVLAD is calibration-agnostic per published preprocessing chain). - c2_vpr/inference_runtime_cut.py — NEW AZ-507 consumer-side cut mirroring C7 InferenceRuntime; lets c2_vpr stay AZ-507-clean. - c2_vpr/config.py — added netvlad_descriptor_dim: int = 4096 knob. - helpers/descriptor_normaliser.py — added intra_cluster_normalise (DescriptorNormaliser v1.0.0 -> v1.1.0; backward-compatible add). - runtime_root/vpr_factory.py — added _register_strategy_architecture helper that binds (MODEL_NAME, architecture_factory(descriptor_dim)) to C7's architecture registry before delegating to the strategy's create() factory. Keeps the c7 import at L4, preserves AZ-507. - fdr_client/records.py — registered vpr.embed_query, vpr.backbone_error, vpr.preprocess_error record kinds. Tests: - tests/unit/c2_vpr/test_net_vlad.py — 31 tests covering all 11 ACs + preprocessor contract + architecture factory + constructor validation + FDR record emission. - tests/unit/test_az283_descriptor_normaliser.py — +8 tests for the new intra_cluster_normalise. - tests/unit/test_az272_fdr_record_schema.py — +3 fixture payloads. Full unit suite: 1608 passed / 80 env-skipped (+43 new tests). Per-batch code review (batch_46_review.md): PASS_WITH_WARNINGS (4 Low-severity hygiene findings; no Critical/High/Medium). Architectural notes: - The spec implied c2_vpr.net_vlad.create() registers the architecture with C7. That violates AZ-507 (no cross-component imports). Resolved by exposing MODEL_NAME + architecture_factory(descriptor_dim) on the strategy module and having the composition root perform the C7 bind. - C7 PyTorch runtime API names in the spec (forward, load_engine) were outdated; aligned implementation with the live v1.0.0 Protocol (infer, compile_engine + deserialize_engine). Spec hygiene flagged in review F2. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a92e5ee482 |
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Doc sweep: arch + glossary for Batch 44
Propagate Batch 44 SRP refactor (C11 internal flight-state gate moved to C12; PostLandingUploadOrchestrator gates on flight_footer.clean_shutdown; OperatorReLocService dispatches AC-3.4 hints via OperatorCommandTransport) into the suite-wide architecture documents that the per-component sweep in Phase F did not yet cover. Files updated: - architecture.md: C11/C12 component entries, principle #4 phrasing, Data Model table (FlightStateSignal annotation + new FlightFooterRecord / PostLandingUploadRequest / ReLocHint rows), post-landing + reloc data-flow summaries, ADR-004 "Why the gate moved to C12" rationale, deployment + security wording. - glossary.md: Tile Manager entry — gate-removal note. - data_model.md: FlightStateSignal row clarified; new rows for Batch 44 DTOs. - system-flows.md: F10 row, dependencies, full F10 prose + preconditions + mermaid + error table reworked around the footer-based gate. - epics.md: E-C11 scope/interface/AC/child-issue table (gate stripped, AZ-317 superseded); E-C12 scope/interface/AC/child- issue table expanded with PostLandingUploadOrchestrator, OperatorReLocService, FdrFooterReader, OperatorCommandTransport. - FINAL_report.md: component table rows 12 + 13. - components/10_c8_fc_adapter/description.md: removed stale claim that C11 TileUploader consumes FlightStateSignal. - contracts/c6_tile_cache/tile_metadata_store.md: minor C12 naming fix. Tests: 1543 passed / 80 skipped — doc-only sweep, no regressions. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5fe67023b2 |
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit: * AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader. * AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars). * AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded. * AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title. Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99. Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets. See `_docs/03_implementation/batch_44_cycle1_report.md`. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a06b107fc3 |
[AZ-320] Add C11 IdempotentRetryTileUploader decorator
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets: - In-call (per-batch) — re-invokes inner on PARTIAL outcome up to `max_in_call_retries` times with capped exponential backoff (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an operator hint via `next_retry_at_s = now + backoff_cap_s`. - Per-tile (cross-call) — atomically increments c6's `tiles.upload_attempts` counter for every rejection; once a tile hits `max_per_tile_attempts` it is forward-only transitioned to `voting_status = upload_giveup` (excluded from `pending_uploads`). Each transition emits FDR `kind="c11.upload.giveup"` plus an ERROR log. C6 contract changes (AZ-303 v1.3.0): - VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED). - TileMetadataStore.increment_upload_attempts(tile_id) -> int added with NotImplementedError default for backwards-compat. - Migration 0003_c11_upload_attempts: additive column + widened ck_tiles_voting_status (preserves IS NULL clause). C11 wiring: - C11RetryConfig + disable_retry_decorator on C11Config. - build_tile_uploader wraps in decorator by default; bypass flag returns the bare HttpTileUploader. New `clock` keyword. Cross-component isolation honoured (AZ-507): the decorator declares `_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore and references `UPLOAD_GIVEUP` via a local string constant — no c6 imports. Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum update + alembic head bump + AZ-272 schema fixture. 238 passed across c11/c6/fdr suites; pre-existing perf microbenches unrelated. Code review: PASS_WITH_WARNINGS (5 Low/Informational findings, docs-level or downstream-CI-blocked). See _docs/03_implementation/reviews/batch_41_review.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f01a5058ab |
[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)
Implements the C10 internal phase that walks every C6 tile, embeds through C2's backbone via the AZ-321-produced engine, and rebuilds the AZ-306 FAISS HNSW index in one atomic write. - DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry) - BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl - DescriptorBatchError for OOM / dim-mismatch / missing-output failures - Empty-corpus surfaces as outcome=failure with explicit hint to run C11 - Per-10% progress callback + DEBUG logs (no engine bytes leaked) - Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener, DescriptorIndexRebuilder) so c10 stays within AZ-270 lint - runtime_root.c10_factory adds build_descriptor_batcher + three C6->C10 adapters - 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental (Protocol conformance, query pass-through, handle release, config) Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3b7265757b |
[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory. The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end-to-end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp duplicate-load abort during pytest (no-op on CI Linux). 21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance fake updated with from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing OKVIS2 submodule failure unrelated. Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e2bebefdfc |
[AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added _types/inference_errors.py shim re-exporting EngineBuildError + CalibrationCacheError from c7_inference; narrowed C10 EngineCompiler's except Exception to the two typed errors so unknown exceptions propagate (AC-3). Rewrote module-layout.md "Imports from" sections for 9 components + added Rule 9; appended an architecture.md ADR-009 note explaining why components must go through _types/*. AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate (C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest AND manifest_hash so re-planned routes change the cache identity (AC-15/AC-16). 20 unit tests cover all 16 ACs. AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json sidecar self-hash, Ed25519 trust-key set, schema parse with absolute/.. path rejection + takeoff_origin in-bbox check, stream SHA-256 per artifact with multi-failure accumulation. Operator mode re-derives tiles_coverage_sha256 from C6; airborne mode trusts the signed aggregate. 19 unit tests cover all 17 ACs. Composition root: c10_factory.build_manifest_builder + build_manifest_verifier + c6_tile_metadata_store_to_tiles_query adapter (the one place that legitimately imports both C6 and C10 without violating the AZ-270 lint). Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml. Tests: 1300 passed, 80 skipped (env-only), ruff clean for all AZ-323/324 files. AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++ pybind11 toolchain not present in this environment. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0dfe7c5301 |
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema — a host change rebuilds at the new path and leaves the old files untouched (AC-4). Design corrections vs. the task spec body: - The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins. - The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled. - The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable` — a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine` — and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload). Production - engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method. - config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`. - runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block. - components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block. Tests - test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR. Docs - module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block. - batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted). Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0ad3278b12 |
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep". Production - onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_ fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback= True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8). - config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig. - fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds. Tests - test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR. - test_protocol_conformance.py: drops onnx_trt_ep from the missing- module parametrize now that the module ships. - test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests. Docs - module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list. - batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted). Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
18a69022b3 |
[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget
Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
54942f3052 |
chore: c6 docs-hygiene from cumulative_review_batches_28-30
Land F1+F2+F3 from the PASS_WITH_WARNINGS cumulative review of batches 28-30 (AZ-305 / AZ-307 / AZ-308) before continuing to batch 31. All three are bounded by the c6_tile_cache component; no public API contract change beyond the new error re-export. F1 (Medium / Architecture): Re-export CacheBudgetExhaustedError from c6_tile_cache package __init__ so consumers can catch the AZ-308 budget-exhaustion variant without widening to TileCacheError (which drops the needed_bytes / available_bytes / evicted_count diagnostics). F2 (Medium / Architecture): Refresh the c6_tile_cache section of module-layout.md so the Public API line and the Internal-files list reflect what is actually on disk after batches 28-30 (drop the stale Tile / TileRecord / connection.py entries; add the AZ-305 postgres_filesystem_store + tools.py, AZ-307 freshness_gate, AZ-308 cache_budget_enforcer entries; pivot the Public API bullet to the __init__.__all__ as canonical, mirroring the c7_inference section format). F3 (Low / Maintainability): Promote the triplicate intra-module _iso_ts_now() helper into a single c6_tile_cache._timestamp.iso_ts_now and import it from postgres_filesystem_store, freshness_gate, and cache_budget_enforcer. FDR record envelope ts format now has one source of truth. Test impact: tests/unit/c6_tile_cache: 105 passed, 57 skipped (pre-existing Docker-compose skip markers). No new tests required for F1/F2 (re-export + doc) and F3 (pure refactor; existing tests assert FDR record shape, not the helper symbol). Autodev state advanced to awaiting-invocation; next session resumes greenfield Step 7 at batch 31 (AZ-298 TensorrtRuntime). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d571ca25f9 |
[AZ-308] c6 CacheBudgetEnforcer: 10 GB hard cap + LRU sweep
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately when total_disk_bytes() + needed_bytes <= budget, otherwise iterates lru_candidates in eviction_batch_size batches, deletes via delete_tile, emits one INFO log per evicted tile (c6.evicted) and one FDR record per eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5). Raises CacheBudgetExhaustedError AFTER a full sweep if the budget cannot be met. BudgetEnforcedTileStore decorates a TileStore so the policy stays separable from PostgresFilesystemStore. Composition root in storage_factory.build_tile_store wires the wrapper unconditionally. PostgresFilesystemStore now accepts lru_clock: Clock | None = None; when set, read_tile_pixels calls record_lru_access(tile_id, now) so eviction picks the right LRU candidates. Production wiring injects WallClock(); AZ-305 unit tests still construct without the clock and keep their pass-through semantics. Contract tile_store.md bumped to v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family; shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
39ff47087f |
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale). Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path. FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts). Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression — load_config requires the env mapping). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d1c1cd9ab4 |
[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols in a single class. Filesystem-backed JPEG I/O (atomic sidecar write, read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting, upload bookkeeping). Wires composition via `from_config` classmethod. Key behaviors: - AC-3 strict reading: INSERT runs first inside an open transaction; duplicate-key collisions raise `TileMetadataError` BEFORE any byte is written, leaving the original file + sidecar byte-identical. Atomic sidecar write happens inside the same transaction; commit closes it. Comp-delete remains as a safety net for the rare commit-after-write failure path. - AC-2 content-hash gate runs before any I/O. - Construction performs an orphan-file reconciliation scan and emits an INFO `c6.store.construct` log with steady-state stats. Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0, forward-compatible) and a thin operator CLI at `c6_tile_cache.tools dump` for inspection. Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used on the F3 read-hot path. Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes (1215 passed, 8 skipped, 1 pre-existing unrelated failure `test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS, Jetson-only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1141d17769 |
[AZ-300] [AZ-301] [AZ-302] [AZ-304] docs: sync module-layout for c6+c7
Cumulative review of batches 23-27 (cycle 1) surfaced three Medium
documentation-drift findings on module-layout.md. All three fixed
inline per user direction:
F1: c7_inference Internal list expanded with architecture_registry,
config, engine_gate, errors, manifest, thermal_publisher (added
across AZ-300/301/302).
F2: c6_tile_cache `connection.py` re-attributed from AZ-304 (which
deferred it) to AZ-305 with a "planned, not landed yet" tag.
F3: c7_inference Public API description rewritten by category
(Protocol + DTOs + component services + config + error family)
with a pointer to __init__.py's __all__ for the canonical list.
Cumulative review report: _docs/03_implementation/cumulative_review_
batches_23-27_cycle1_report.md (PASS_WITH_WARNINGS).
Autodev state moved to status: paused_user_requested per user
choice; /autodev will resume at greenfield Step 7 (next batch
selection) on next invocation.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
dde838d2cc |
[AZ-304] C6 Postgres schema: additive 0002 migration + UUIDv5
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.
Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.
Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.
Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
21f5a30d09 |
refactor: update autodev state and tile metadata store version
- Changed autodev state to reflect the transition from batch 26 to batch 27, updating the phase and details for the compute-batch step. - Incremented the version of the tile metadata store from 1.0.0 to 1.1.0, refining the uniqueness invariant to use a natural key that includes flight_id, allowing coexistence of multiple rows for the same tile from different flights. - Updated the last modified date in the tile metadata store documentation to reflect recent changes. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
48ea1e2fc2 |
[AZ-343] C2.5 InlierCountReRanker + shared FeatureExtractor helper
Implements the production-default ReRankStrategy: K=10 → N=3 by single-pair LightGlue inlier count, with strict drop-and-continue (INV-8) on per-candidate TileFetch / backbone / zero-inlier failures and RerankAllCandidatesFailedError on zero survivors. Composition root injects the shared LightGlueRuntime + Clock + the new FeatureExtractor helper (an L1 placeholder OpenCvOrbExtractor that unblocks AZ-343 and future C3 strategies — task scope expansion). Architectural notes: - Cross-component imports stay banned; tile_store types as `object` and the C6 TileCacheError family is duck-typed by class module prefix (same workaround AZ-348 adopted for c7_inference; proper fix is to relocate TileCacheError to _types/ in a follow-up). - Clock injection follows the replay contract (AZ-398 Invariant 2); reranked_at is sourced from clock.monotonic_ns(). - AZ-342 factory grew `feature_extractor` + `clock` + `fdr_client` parameters; existing AZ-342 conformance tests updated. Tests: 19 new AC-1..AC-12 + mixed-failure scenarios in test_inlier_count_reranker.py; existing AZ-342 suite (26) still green. Full repo sweep 1093 passed / 2 skipped (cmake/actionlint not on PATH). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9a605c8514 |
[AZ-348] C3.5 ConditionalRefiner Protocol + factory + PassthroughRefiner
Defines the public `ConditionalRefiner` Protocol (PEP 544 @runtime_checkable, two methods: `refine_if_needed` + `was_invoked`), extends `MatchResult` in-place with two default-valued refinement fields (`refinement_label`, `refinement_added_latency_ms`), defines the `RefinerError` family (`RefinerBackboneError`, `RefinerConfigError`), and ships the trivial `PassthroughRefiner` reference impl. Both refiner strategies are linked unconditionally — no `BUILD_REFINER_*` flag (NOT ADR-002 territory). Runtime selection only per ADR-001. `PassthroughRefiner` returns the input `MatchResult` by reference (bit-identical correspondences per contract INV-5) and always reports `was_invoked() is False`. Documentation: renames `module-layout.md` `c3_5_adhop` Public API symbol from `AdHoPRefinementStrategy` to `ConditionalRefiner` (AC-14) so the doc agrees with `description.md` and the contract. AC-9 (single-thread binding) deferred to AZ-270 runtime-root composition, mirroring AZ-336 / AZ-342 / AZ-344 Risk-4 precedent. AC-7 for the `"adhop"` strategy stops at `ModuleNotFoundError` because the AdHoP backbone is owned by AZ-349. All other ACs + NFRs covered by 36 new conformance tests. Architectural note: `PassthroughRefiner.inference_runtime` is typed as `object` because the L3→L3 import ban (`test_az270_compose_root`) forbids c3_5_adhop from importing c7_inference; the runtime-root factory narrows the type at construction time. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
89c223882b |
[AZ-344] C3 CrossDomainMatcher Protocol + factory + RollingHealthWindow
Defines the public `CrossDomainMatcher` Protocol (PEP 544 @runtime_checkable, two methods: `match` + `health_snapshot`), the three frozen+slotted DTOs (`CandidateMatchSet`, `MatchResult`, `MatcherHealth`) in the L1 `_types/matcher.py` layer, the `MatcherError` family (`MatcherBackboneError`, `InsufficientInliersError`), and the composition-root `build_matcher_strategy` factory with lazy-import + `BUILD_MATCHER_<variant>` gating per ADR-002. `RollingHealthWindow` accumulator (60 s, amortised O(1) update, strict O(1) snapshot) is constructed by the factory and injected into every concrete matcher so all backbones share window semantics; this is what backs C5's spoof-promotion gate. Legacy placeholder `MatchResult` removed from `_types/matching.py`; import-only consumers (`c4_pose.interface`, `c3_5_adhop.interface`) repointed at the new `_types/matcher.py` home — zero behavioural change to those components. AC-9 (single-thread binding) and AC-10 (LightGlueRuntime identity-share with C2.5) deferred to AZ-270 runtime-root composition, mirroring the AZ-342 Risk-4 escape clause. All other ACs + NFRs covered by 70 new conformance tests. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d6756f1855 |
[AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for the InlierCountReRanker (AZ-343) and the future C3 CrossDomainMatcher consumer (AZ-344). No concrete re-ranker is implemented here. * ReRankStrategy Protocol (single rerank(frame, vpr_result, n, calibration) -> RerankResult method) with all 8 invariants in the docstring — notably INV-8 drop-and-continue (per-candidate failure NEVER propagates unless every candidate fails). * DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult; frozen+slots; tuple-not-list for RerankResult.candidates; tile_id encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any c6_tile_cache (L3) import per module-layout.md. * Error family: RerankError + RerankBackboneError + RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError escapes rerank(); RerankBackboneError is caught inside the per- candidate loop, logged ERROR, FDR-stamped, candidate dropped. * C2_5RerankConfig (strategy enum default "inlier_count", top_n int default 3) with strict validation at load; registered into Config.components on c2_5_rerank import. * build_rerank_strategy(config, *, tile_store, lightglue_runtime) factory: 1-strategy resolution table, lazy import, BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError mapping. The shared LightGlueRuntime is constructor-injected (R14 fix: neither C2.5 nor C3 owns its lifecycle). Renamed the Protocol from the existing stub "RerankStrategy" to "ReRankStrategy" to match the contract; updated module-layout.md. Removed the legacy RerankResult shape from _types/vpr.py — the v1.0.0 shape lives in _types/rerank.py. Excluded per task spec: * Concrete InlierCountReRanker (AZ-343). * C3 matcher protocol task (AZ-344, next in batch). * AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share between C2.5/C3 — deferred per task spec Risk 3 until the generic compose_root thread-binding registry and the C3 factory both land. Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy smoke test is removed. Full sweep: 997 passed (one pre-existing flake in test_az296_takeoff_abort, subprocess timing, unrelated to this commit; passes in isolation). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3665acef66 |
[AZ-336] C2 VprStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for every concrete C2 backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340) and the C2.5 ReRanker consumer side. No backbone is implemented here. * VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim) + BackbonePreprocessor C2-internal Protocol (NOT in Public API per description.md § 6). * DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all frozen + slots; tuple-not-list for VprResult.candidates so the immutability invariant truly holds. * Error family: VprError + VprBackboneError + VprPreprocessError + IndexUnavailableError; same-named but namespace-distinct from c6_tile_cache.IndexUnavailableError (the c2 family is the closed envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form). * C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path) with strict validation at load; registered into Config.components on c2_vpr import. * build_vpr_strategy factory with 7-strategy resolution table, lazy import, BUILD_VPR_<variant> gating, ImportError→ StrategyNotAvailableError mapping, and pre-flight descriptor_dim match against DescriptorIndex.descriptor_dim() — mismatch fires ConfigError at startup, NOT at first frame. Contract change vs the v1.0.0 draft: factory takes descriptor_index: DescriptorIndex (not tile_store: TileStore) because descriptor_dim() lives on DescriptorIndex per C6's Public API. The contract markdown is updated to match. Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple, keeping _types/ (L1) free of any c6_tile_cache (L3) import per module-layout.md. Consumers reconstruct TileId at the C6 boundary. Excluded per task spec: * Concrete backbones (AZ-337..AZ-340). * FAISS HNSW retrieve wiring (AZ-341). * DescriptorNormaliser helper (AZ-283, already shipped). * AC-9 single-thread binding — deferred per task spec Risk 4 until the generic compose_root thread-binding registry is in place (today each factory owns its own, e.g. fc_factory). Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py covering AC-1..AC-8, the error family, the config validation, the factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full sweep 973 passed, 2 skipped (CI-only cmake / actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8a83166261 |
[AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
e0be591b06 |
[AZ-489] [AZ-490] ADR-010 design pass: operator-mission as cold-start anchor
Architecture, contracts, and task amendments for the flight-route-driven preflight + cold-start origin feature (ADR-010). No source code touched in this commit; the implementation commits for AZ-489 / AZ-490 / AZ-419 land separately. * architecture.md: ADR-010, new Principle #14, amended Principle #11, external systems gain flights service + Mission Planner UI, data model gains Flight / Waypoint / TakeoffOrigin. * system-flows.md: F1 gains phase 0 (Flight resolve), F2 gains cold-start ladder, F7 gains mid-flight bounded-delta GPS gate. * glossary.md: Flight, Flights API, Mid-flight bounded-delta GPS gate, Mission Planner UI, Takeoff origin, Waypoint. * C10: description + cache_provisioner + manifest_verifier bumped to v1.1 carrying takeoff_origin + flight_id in the manifest hash. * C12: description updated + new flights_api_client.md contract v1.0. * C5: description + state_estimator_protocol bumped to v1.1 with set_takeoff_origin + 3-clause spoof-promotion gate. * AZ-323/324/325/326/328/419 amended in place. AZ-490 spec created (C5 set_takeoff_origin entrypoint). * Dependencies table: 142 tasks / 478 pts / 15 forward edges (2 new tasks, 2 backward deps, 2 forward deps from AZ-419). * Leftovers cleared: 2026-05-11 Jira transition entries for AZ-355 and AZ-386 are deleted (Jira reconnected; both already transitioned in their respective implementation commits). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
beed43724f |
[AZ-381] C5 StateEstimator protocol + factory + C8 DTO reshape
- Add StateEstimator Protocol (6 methods, @runtime_checkable) + DTOs (EstimatorOutput, EstimatorHealth, IsamState, PoseSourceLabel, Quat) in _types/state.py per state_estimator_protocol.md v1.0.0. - Add C5 error hierarchy (StateEstimatorError + 3 subclasses) and C5StateConfig (strategy, keyframe_window, spoof gates, no_estimate_fallback_s) with __post_init__ validation. - Add ISam2GraphHandle Protocol + ISam2GraphHandleImpl skeleton (all 4 methods raise NotImplementedError naming AZ-382 as owner). - Add build_state_estimator factory + bind_state_ingest_thread for single-writer enforcement; ADR-002 build-flag gating (BUILD_STATE_<variant>); INFO log on success. - Strict reshape of legacy EstimatorOutput / EstimatorHealth across all 6 C8 production files (_outbound_provenance, _covariance_projector, pymavlink_ardupilot_adapter, msp2_inav_adapter, mavlink_gcs_adapter, interface) + 6 C8 test files (UUID frame_id, LatLonAlt position_wgs84, Quat orientation, PoseSourceLabel enum source_label). Remove ad-hoc DTOs from _types/pose.py and from C4's public __init__ (EstimatorOutput is a C5 concept, not a C4 one). - 20 AZ-381 AC tests (10 ACs + 4 config range + NFR + conformance). - Full suite: 521 passed, 2 skipped (+20 vs Batch 11). - Contracts: state_estimator_protocol.md v1.0.0 -> active; composition_root_protocol.md v1.2.0 -> v1.3.0 (additive state block + factory + ingest-thread binding). - Impl report: _docs/03_implementation/batch_12_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
362e93c626 |
[AZ-390] [AZ-392] C8 FC/GCS adapter foundation + covariance projector
Adds the C8 foundation: - FcAdapter / GcsAdapter / ReplaySink Protocols + contract DTOs in _types/fc.py (PortConfig, FcKind, FlightState, GpsStatus, Severity, TelemetryKind, FcTelemetryFrame, FlightStateSignal, GpsHealth, OperatorCommand, Subscription, Imu/Attitude samples). - Disjoint FcAdapterError / GcsAdapterError trees with SourceSetSwitchNotSupportedError <: SourceSetSwitchError per AC-9. - FcConfig + GcsConfig cross-cutting Config blocks with config-load validation (unknown strategy rejected at __post_init__). - runtime_root/fc_factory.py: build_fc_adapter / build_gcs_adapter with BUILD_FC_*/BUILD_GCS_* flag gating + INFO log on load + single-writer outbound-thread binding. - CovarianceProjector (helper, AZ-392): 6x6 -> 3x3 -> 2x2 -> sqrt(lambda_max) reduction; AP returns float m, iNav returns int mm with uint16 clamp + WARN + FDR record. Non-SPD / NaN / wrong-shape raise FcEmitError and emit an FDR ERROR record carrying frame_id. Contracts: - composition_root_protocol.md 1.1.0 -> 1.2.0 (added fc/gcs blocks + build_fc_adapter / build_gcs_adapter + outbound-thread binding). - fc_adapter_protocol.md unchanged (this batch implements v1.0.0). Tests: 410 pass / 2 skip / 0 fail (+53 new tests in batch 8). AZ-391 (inbound subscription) deferred to batch 9 — pulls YAMSPy as a new external dependency (iNav MSP2 decode). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e4ecdaf619 |
[AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b5dd6031d2 |
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
880eabcb3f |
Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8171fcb29e |
[AZ-263] [AZ-264] [AZ-265] Decompose: layout, helpers epic, replay epic
Decompose Step 1 + Step 1.5 + new cycle-1 epics: - Step 1 (Bootstrap): AZ-263 spec at _docs/02_tasks/todo/. Single top-level Python package src/gps_denied_onboard/ + nested components/ subpackage per user feedback (replaces earlier src/gps_denied/ + sibling src/components/ split). - Step 1.5 (Module Layout): _docs/02_document/module-layout.md is the file-ownership map consumed by /implement Step 4. Covers all 14 components + cross-cuttings (_types, config, logging, fdr_client, helpers x8, frame_source, clock, runtime_root, cli/replay, healthcheck), 5-layer layering, and the Build-Time Exclusion Map for all 4 binaries (airborne, research, operator-tooling, replay-cli). - New epic AZ-264 (E-CC-HELPERS): re-homes the 8 shared helpers from per-component child-issues into a single cross-cutting epic per the decompose skill cross-cutting rule. R14 (LightGlue circular dep) is structurally prevented because both C2.5 and C3 import gps_denied_onboard.helpers.lightglue_runtime. - New epic AZ-265 (E-DEMO-REPLAY): offline replay mode (video + tlog -> per-tick coordinate stream). 8 child tasks, 27-32 pts. Reuses C8 FcAdapter via TlogReplayFcAdapter strategy + new VideoFileFrameSource + JsonlReplaySink + compose_replay composition root + gps-denied-replay CLI + auto-sync via IMU take-off detection (per how_to_test.md). NO ROS dependency. - Plan Final report at FINAL_report.md. - _autodev_state.md updated with handoff notes for Step 2 execution in a fresh chat (~290 MCP calls expected; epic ordering documented). Step 2 task PLAN approved (97 implementation tasks across 18 epics) but EXECUTION deferred per user choice to a fresh chat. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
64542d32fc |
Update autodev state, architecture documentation, and glossary terms
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities. |
||
|
|
723f574b14 |
Update autodev state and glossary definitions
Modified the autodev state to transition to phase 10, updating the sub-step name and details to reflect the latest architectural review changes. Enhanced the glossary entry for VioStrategy to clarify its functionality, build-time exclusions, and implications for deployment and research binaries, ensuring alignment with recent architectural decisions. |
||
|
|
c19c76481c |
Update autodev skill documentation and acceptance criteria
Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process. |
||
|
|
8382cdae10 | start over again |