mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 20:11:15 +00:00
21cef8bdce011c039bbd0a97d8787c2f9d2a1c94
91 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ceb24b5a62 |
[AZ-334] C1 KLT/RANSAC strategy — engine-rule simple-baseline VIO
Implement KltRansacStrategy, the ADR-002 engine-rule mandatory simple-baseline VioStrategy for E-C1. Pure-Python facade over OpenCV's cv2.goodFeaturesToTrack / calcOpticalFlowPyrLK / findEssentialMat / recoverPose pipeline — no C++/pybind11 binding by design so a Tier-0 workstation runs the strategy with `pip install opencv-python` and the BUILD_KLT_RANSAC=ON gate alone. Constructor + state machine + FDR transition spine mirror Okvis2Strategy + VinsMonoStrategy so the AZ-331 factory + IT-12 comparative harness treat all three as drop-in substitutable; the duplication is the consolidation target now formally in scope for the next cumulative review (batches 52-54). AC coverage: AC-1..AC-11 + NFR-perf mapped to passing tests (25 tests, 23 pass + 2 tier-2 skipped on dev/CI runners; all 25 pass under GPS_DENIED_TIER=2). Honest-covariance invariant (AC-9) implemented as residual-scatter / (N_inliers - 5) with an inlier- count penalty — no client-side floor or smoother; cov Frobenius grows monotonically across DEGRADED. Camera-agnostic source (AC-11) enforced by CI-grep gate that excludes docstring text. Test-Run Cadence: focused suite tests/unit/c1_vio/ green (95 passed, 6 skipped); config-loader + compose-root suites green; full-suite gate deferred to Step 16 per implement skill. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
6a5954bdae |
[AZ-333] C1 VINS-Mono strategy — research-only comparative VIO
VinsMonoStrategy: Python facade conforming to AZ-331 Protocol; mirrors the AZ-332 OKVIS2 facade so the AZ-331 factory + IT-12 comparative harness can treat both as drop-in substitutable. Native binding is a pybind11 skeleton compiled behind BUILD_VINS_MONO=ON (default OFF for airborne / operator-tooling / replay-cli per module-layout.md Build-Time Exclusion Map). Real vins_estimator wiring is the Tier-2 follow-up. VinsMonoConfig added to c1_vio/config.py with sliding-window / feature-tracker / marginalisation / opt-iteration knobs plus __post_init__ validation; exported through the package __init__. cpp/vins_mono/CMakeLists.txt replaces the AZ-263 placeholder with full pybind11 wiring: Risk-1 mitigation forces VINS_MONO_USE_ROS=OFF; Risk-2 mitigation links Eigen from the same cpp/_third_party/eigen pin as OKVIS2; Risk-3 mitigation enforces BUILD_VINS_MONO=OFF in deployment binaries via the gate at the top of the file. Tests: 17 new in test_vins_mono_strategy.py (15 pass + 2 tier2 skip); fake_vins_mono_binding fixture added to conftest.py mirroring the fake_okvis2_binding pattern; test_protocol_conformance updated to drop vins_mono from _STRATEGIES_WITHOUT_PY_MODULE so the existing parametrised factory tests route through the new strategy. Focused c1_vio suite: 72 passed, 4 skipped. Full suite: 1788 passed, 1 unrelated pre-existing flake (c12 cold-start perf, env-bound). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
2ce300ddb1 |
[AZ-527] Archive AZ-527 + batch 52 report + state bump
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f6a180e5df |
[AZ-340] [AZ-527] Archive AZ-340 + batch 51 report + cumulative review 49-51
Bookkeeping for batch 51 close: - Archive AZ-340 spec todo/ -> done/ - Add _docs/03_implementation/batch_51_cycle1_report.md - Add _docs/03_implementation/cumulative_review_batches_49-51_cycle1_report.md Verdict: PASS_WITH_WARNINGS. F1 (Medium) escalates the 2-way _assert_engine_output_dim near-duplicate from cumulative-46-48 to a 7-way duplication after AZ-339 + AZ-340; new hygiene PBI AZ-527 formally created. F2 (Low) carries the AC-10 ConfigError vs literal ConfigurationError spec drift (documentation only). - File AZ-527 hygiene PBI (Hygiene -- consolidate _assert_engine_output_dim into a c2-internal helper, 2pt, AZ-255 E-C2). Add the spec stub at _docs/02_tasks/todo/AZ-527_*.md. - Refresh _docs/02_tasks/_dependencies_table.md: +AZ-527 row, totals bumped to 148 tasks / 491 points. - Bump _docs/_autodev_state.md: last_completed_batch=51, last_cumulative_review=batches_49-51. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0d65ff4705 |
[AZ-339] C2 MegaLoc + MixVPR secondary VPR backbones
Adds two research-only VprStrategy implementations for the IT-12 comparative-study matrix. MegaLocStrategy (D=2048, 322x322) and MixVprStrategy (D=4096, 320x320), both via C7 TensorRT FP16 with their own concrete BackbonePreprocessor. Single-stage global L2 normalisation; retrieval delegated to FaissBridge; FDR records + structured logs identical to UltraVPR. BUILD_VPR_MEGALOC and BUILD_VPR_MIXVPR ON for research/replay-cli only, OFF for airborne and operator-tooling (fail-fast at composition root via existing AZ-336 factory). Uses helpers.iso_ts_from_clock from day 1 — no new timestamp helper duplicates introduced. 36 parametrised AC tests + 25 protocol-conformance + 18 helper regression tests pass; 1690 / 1690 unit tests pass (excluding 1 pre-existing flaky cold-start subprocess test in c12). Verdict: PASS_WITH_WARNINGS — one Medium follow-on (AZ-527 to consolidate 4-way _assert_engine_output_dim) + one Low AC wording drift. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5dfd9a577e |
[AZ-526] Consolidate _iso_ts_from_clock into helpers/iso_timestamps
Closes cumulative review 46-48 F1 (Medium) + F3 (Low). Adds iso_ts_from_clock(clock) alongside iso_ts_now() in the Layer-1 helper; migrates four duplicate definitions in c2_vpr (net_vlad, ultra_vpr, _faiss_bridge) and c12_operator_orchestrator (operator_reloc_service). Output format flipped +00:00 -> Z to align with iso_ts_now() and the canonical FDR _TS fixture (FDR schema test passes unmodified). 18 helper AC tests + 186 sibling tests pass; ruff clean. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5441ea2017 |
[AZ-508] Consolidate _iso_ts_now into helpers/iso_timestamps
Batch 48 / Cycle 1 (greenfield Step 7). Closes cumulative review batches 31-33 F2 and 28-30 F3 by replacing the duplicated private _iso_ts_now() one-liners with a single Layer-1 helper: src/gps_denied_onboard/helpers/iso_timestamps.py iso_ts_now() -> str Output format matches the canonical FDR _TS fixture (YYYY-MM-DDTHH:MM:SS.ffffffZ); no FDR schema change. Migrated call-sites (3): c7_inference/onnx_trt_ep_runtime, c7_inference/thermal_publisher, plus the 3 c6_tile_cache callers that previously imported from the local c6_tile_cache/_timestamp shim (now deleted, superseded by the Layer-1 helper). Spec drift resolved (Choose A, user-approved): spec listed 5 call sites + +00:00 regex; on-disk reality at batch start is 3 sites + Z-suffix matching every existing helper and the FDR _TS fixture. Spec preamble + AC-2 regex updated in the task file; documented in batch_48_cycle1_report.md. Tests: 9 new AC tests (AC-1..AC-7 + Layer-1 invariant + public-surface defensive); 216 focused tests pass including the unmodified AZ-272 FDR schema suite and AZ-270 / AZ-507 layering lints. Verdict: PASS (no findings). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b64f3a1b93 |
[AZ-337] Archive task spec + batch 47 report + state bump
- _docs/02_tasks/todo/AZ-337_c2_ultra_vpr.md -> _docs/02_tasks/done/AZ-337_c2_ultra_vpr.md - _docs/03_implementation/batch_47_cycle1_report.md (new) - _docs/_autodev_state.md: last_completed_batch 46 -> 47; sub_step.detail "batch 47 complete - selecting batch 48" AZ-337 transitioned in Jira: In Progress -> In Testing. Batches 45/46/47 close the C2 production path (Protocol + FaissBridge + NetVLAD baseline + UltraVPR primary). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
773d589d34 |
[AZ-338] Archive task spec + batch 46 report + state bump
- _docs/02_tasks/todo/AZ-338_c2_net_vlad.md -> _docs/02_tasks/done/AZ-338_c2_net_vlad.md - _docs/03_implementation/batch_46_cycle1_report.md (new) - _docs/_autodev_state.md: last_completed_batch 45 -> 46; sub_step.detail "batch 46 complete - selecting batch 47" AZ-338 transitioned in Jira: In Progress -> In Testing. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1682dc354b |
[AZ-341] Archive AZ-341 + batch 45 report
Batch 45 (AZ-341 C2 FAISS retrieve wiring) post-commit bookkeeping: - Move AZ-341 task spec to done/ (implement skill step 13). - Write batch_45_cycle1_report.md (test results, AC coverage, architectural decisions, findings carried into cumulative review). - Bump state.last_completed_batch 44 → 45. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
25836925c9 |
[AZ-329] [AZ-330] Archive Batch 44 task files to done/
Implementation completed in Batch 44 (commit
|
||
|
|
5fe67023b2 |
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit: * AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader. * AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars). * AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded. * AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title. Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99. Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets. See `_docs/03_implementation/batch_44_cycle1_report.md`. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
7644b25e8c |
[AZ-328] C12 BuildCacheOrchestrator + remote C10 invoker (Batch 43)
Implements F1 pre-flight cache build orchestrator on the operator workstation. Composes C11 TileDownloader (AZ-316), C12 CompanionBringup (AZ-327), C12 FlightsApiClient (AZ-489), and the new RemoteCacheProvisionerInvoker into one sequenced flow guarded by a filelock-backed workstation-side lockfile. Architectural decisions: - Phase-0 flight-resolve runs BEFORE the lockfile (ADR-010): a flight that cannot be resolved is an operator-input error, not a contended- resource error. Enforced by AC-11 + AC-14. - Consumer-side cuts (AZ-507) for C11 + C10 types: local Protocols / mirror DTOs in tile_downloader_cut.py and _types.py; external errors matched by name-based whitelisting so unknown exceptions still propagate per AC-6. Cross-component type translation lives at the composition root (c12_factory). - Failure surfacing: recognised operational failures (download error, companion not ready, build error, flight-resolve error) return as CacheBuildReport(outcome=failure, failure_phase=...). Only lockfile contention raises (BuildLockHeldError) since no phase ever ran. - Workstation-side filelock library (project pin); no custom primitive. - Remote C10 stdout streamed line-by-line as DEBUG with api_key / auth_token redacted before logging (defence-in-depth). - CLI is now a thin adapter; all workflow logic lives in build_cache.py. operator-tool build-cache exit codes map per CacheBuildReport.failure_phase + failure_exception_type. Tests: 116 c12 unit tests pass (29 new for AZ-328 covering 15/15 ACs + NFR-perf-overhead microbench; 7 new for remote_c10_invoker; 3 new for file_lock; test_cli_build_cache rewritten for new orchestrator interface). Full repo suite: 1522 passed, 80 skipped. Also: replays Batch 42's ruff format leftover for c12 flights_api + test_az489 files (formatter ran over the c12 directory after new files were added). Pure whitespace; no behaviour change. Full report: _docs/03_implementation/batch_43_cycle1_report.md Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
91ce1c2047 |
[AZ-326] [AZ-327] C12 operator-tool CLI + companion SSH bringup
AZ-326 (3pt): operator-tool Click CLI shell at src/gps_denied_onboard/components/c12_operator_tooling/cli.py with six subcommands (download, build-cache, upload-pending, reloc-confirm, verify-ready, set-sector); SectorClassificationStore (atomic-write JSON under ~/.azaion/onboard/sector-classifications.json); freshness-table lookup driving AC-NEW-6; EXIT_* constants; AZ-266 structured-JSON log wiring to a rotating ~/.azaion/onboard/c12-tooling.log handler; operator-tool console-script entry in pyproject.toml. AZ-327 (3pt): CompanionBringup orchestrator at src/gps_denied_onboard/components/c12_operator_tooling/companion_bringup.py that opens an SSH session against the companion (paramiko per project pin), checks the four pre-flight artifacts (Manifest, expected engines, sha256 sidecars, calibration), and returns a ReadinessReport per description.md S2; CompanionUnreachableError + ContentHashMismatchError with operator-friendly remediation hints; ParamikoSshSessionFactory + RemoteSidecarVerifier (sha256sum + cat over SSH, no bytes pulled to the workstation); paramiko>=3.4,<4.0 dep added. NFR-perf-cold-start fix: PEP 562 lazy __getattr__ in c12_operator_tooling/__init__.py and flights_api/__init__.py defers HttpxFlightsApiClient (httpx), ParamikoSshSession[Factory] (paramiko + cryptography), bbox_from_waypoints / takeoff_origin_from_flight (numpy + pyproj). cli.py imports from leaf flights_api modules. operator-tool --help cold start: ~870ms -> <200ms typical, <500ms p99. Includes 73 unit tests (incl. paramiko-version-drift smoke per AZ-327 Risk 1) + console-script integration test. All 1494 repo-wide unit tests pass; 80 skips are pre-existing environment gates. Batch report: _docs/03_implementation/batch_42_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a06b107fc3 |
[AZ-320] Add C11 IdempotentRetryTileUploader decorator
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets: - In-call (per-batch) — re-invokes inner on PARTIAL outcome up to `max_in_call_retries` times with capped exponential backoff (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an operator hint via `next_retry_at_s = now + backoff_cap_s`. - Per-tile (cross-call) — atomically increments c6's `tiles.upload_attempts` counter for every rejection; once a tile hits `max_per_tile_attempts` it is forward-only transitioned to `voting_status = upload_giveup` (excluded from `pending_uploads`). Each transition emits FDR `kind="c11.upload.giveup"` plus an ERROR log. C6 contract changes (AZ-303 v1.3.0): - VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED). - TileMetadataStore.increment_upload_attempts(tile_id) -> int added with NotImplementedError default for backwards-compat. - Migration 0003_c11_upload_attempts: additive column + widened ck_tiles_voting_status (preserves IS NULL clause). C11 wiring: - C11RetryConfig + disable_retry_decorator on C11Config. - build_tile_uploader wraps in decorator by default; bypass flag returns the bare HttpTileUploader. New `clock` keyword. Cross-component isolation honoured (AZ-507): the decorator declares `_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore and references `UPLOAD_GIVEUP` via a local string constant — no c6 imports. Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum update + alembic head bump + AZ-272 schema fixture. 238 passed across c11/c6/fdr suites; pre-existing perf microbenches unrelated. Code review: PASS_WITH_WARNINGS (5 Low/Informational findings, docs-level or downstream-CI-blocked). See _docs/03_implementation/reviews/batch_41_review.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
90f4ac78f4 |
[AZ-316] Implement C11 HttpTileDownloader (batch 40)
Lands the operator-side pre-flight download path: authenticated httpx GETs against satellite-provider, RESTRICT-SAT-4 (>= 0.5 m/px) enforcement at the C11 boundary, c6 writes via consumer-side cuts (_TileWriterLike, _BudgetEnforcerLike), per-(flight_id, request_hash) journal under cache_root/.c11/journal/ for idempotent re-runs (AC-8, AC-12), 429 Retry-After + 5xx exponential backoff handling, fail-fast on TLS / 401 / 403, and a redacted-bearer auth-header policy. Architecture: - AZ-507 cross-component rule held: tile_downloader.py imports zero c6 symbols; the composition-root _C6DownloadAdapter in runtime_root/c11_factory.py absorbs c6's TileMetadata / TileSource / FreshnessLabel / VotingStatus enum assembly. - Sleep-callable injection (not full Clock) per Batch 39 precedent; default routes through WallClock.sleep_until_ns to keep the AZ-398 invariant intact. - No FDR records on the download path; spec mandates structured logs only (8 log kinds wired: session.start/end, resolution_rejected, freshness_rejected_summary, freshness_downgraded, batch.retry, provider.failed, budget.exceeded, idempotent_no_op). Tests: 14 new downloader unit tests covering AC-1..AC-9, AC-11, AC-12 plus throughput NFR + 429 HTTP-date + 429 budget exhaustion; 2 new TileDownloader Protocol conformance tests (AC-10). Full unit suite: 1420 passed, 80 skipped (env-gated), 0 failed. Code review: PASS_WITH_WARNINGS (5 Low findings, all documentation or downstream-blocked). See _docs/03_implementation/reviews/ batch_40_review.md and batch_40_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
610e8a743c |
[AZ-319] C11 HttpTileUploader (post-landing upload path)
Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's per-flight signing, and consumer-side cuts over c6 storage. Implements the full upload flow: gate ON_GROUND -> start_session -> enumerate pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded on ack -> end_session in finally. Honours Retry-After (RFC 7231 int + HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403. Adds C11Config block, three FDR kinds (tile.queued, tile.rejected, batch.complete), and the build_tile_uploader composition-root factory. Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270). Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272 schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
cde237e236 |
[AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key
Batch 38 (cycle 1) lands the two upload-side prerequisites the upcoming AZ-319 TileUploader needs to authenticate per-flight sessions against the parent suite's D-PROJ-2 ingest contract. AZ-317 FlightStateGate: - confirm_on_ground() defence-in-depth gate atop ADR-004 process isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF, LANDING, and source-failure (mapped to UNKNOWN with original exception preserved on __cause__). - ERROR log on refusal, INFO log on pass, single source call per invocation (no polling, no retry). AZ-318 PerFlightKeyManager: - Per-flight ephemeral Ed25519 keypair via the project-pinned cryptography library; sign(payload) -> 64-byte Ed25519 signature. - Best-effort zeroisation of a project-controlled bytearray mirror on end_session; OpenSSL-side buffer freed via dropped reference. - __del__ safety net with WARN log if end_session was missed. - start_session emits FDR kind=c11.upload.session.key.public so the safety officer can correlate flights with key fingerprints. - record_signature_rejection emits FDR + ERROR log on parent-suite ingest rejection (security-critical, never silently dropped). Shared C11 plumbing: - TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError, SessionNotActiveError, SignatureRejectedError envelope). - FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs. - FlightStateSource Protocol on c11_tile_manager.interface. - runtime_root.c11_factory factories for both new services. - Two new FDR kinds registered in fdr_client.records central KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in lockstep so the central test stays green. Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80 skipped (documented Docker / Tier-2 / CUDA gates). Code review: PASS_WITH_WARNINGS — 2 Low findings documented in _docs/03_implementation/reviews/batch_38_review.md (dev-host vs operator-workstation perf bound; spec text named StrEnum but project pins Python 3.10). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f7b2e70085 |
[AZ-325] C10 CacheProvisioner orchestrator
Implements the public top-level F1 build orchestrator for E-C10 per contract v1.1.0. Composes EngineCompiler (AZ-321), DescriptorBatcher (AZ-322), and ManifestBuilder (AZ-323) into a single idempotent operation guarded by a fcntl-backed cache_root/.c10.lock and a post-build coverage walk. Adds: - CacheProvisionerImpl + FilelockFileLockFactory (provisioner.py) - BuildRequest/BuildReport/BuildOutcome/SectorClassification DTOs + FileLockFactory Protocol + replaced placeholder CacheProvisioner Protocol with v1.1.0 surface (interface.py) - C10ProvisionerConfig wired into C10ProvisioningConfig (config.py) - BuildLockHeldError + ManifestCoverageError (errors.py) - build_cache_provisioner composition root (c10_factory.py) - 18 tests covering AC-1..AC-16 + NFR-perf-coverage-walk - filelock>=3.13,<4.0 (single new third-party dep) Idempotence (CP-INV-1) reuses AZ-323's _compute_manifest_hash / _aggregate_tile_hash so the build-identity decision agrees byte-for- byte with the Manifest's recorded manifest_hash. Coverage rollback uses a .prev rename snapshot. Diagnostic compile_engines_for_corpus is lock-free per AC-10. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f01a5058ab |
[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)
Implements the C10 internal phase that walks every C6 tile, embeds through C2's backbone via the AZ-321-produced engine, and rebuilds the AZ-306 FAISS HNSW index in one atomic write. - DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry) - BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl - DescriptorBatchError for OOM / dim-mismatch / missing-output failures - Empty-corpus surfaces as outcome=failure with explicit hint to run C11 - Per-10% progress callback + DEBUG logs (no engine bytes leaked) - Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener, DescriptorIndexRebuilder) so c10 stays within AZ-270 lint - runtime_root.c10_factory adds build_descriptor_batcher + three C6->C10 adapters - 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental (Protocol conformance, query pass-through, handle release, config) Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3b7265757b |
[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory. The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end-to-end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp duplicate-load abort during pytest (no-op on CI Linux). 21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance fake updated with from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing OKVIS2 submodule failure unrelated. Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e2bebefdfc |
[AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added _types/inference_errors.py shim re-exporting EngineBuildError + CalibrationCacheError from c7_inference; narrowed C10 EngineCompiler's except Exception to the two typed errors so unknown exceptions propagate (AC-3). Rewrote module-layout.md "Imports from" sections for 9 components + added Rule 9; appended an architecture.md ADR-009 note explaining why components must go through _types/*. AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate (C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest AND manifest_hash so re-planned routes change the cache identity (AC-15/AC-16). 20 unit tests cover all 16 ACs. AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json sidecar self-hash, Ed25519 trust-key set, schema parse with absolute/.. path rejection + takeoff_origin in-bbox check, stream SHA-256 per artifact with multi-failure accumulation. Operator mode re-derives tiles_coverage_sha256 from C6; airborne mode trusts the signed aggregate. 19 unit tests cover all 17 ACs. Composition root: c10_factory.build_manifest_builder + build_manifest_verifier + c6_tile_metadata_store_to_tiles_query adapter (the one place that legitimately imports both C6 and C10 without violating the AZ-270 lint). Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml. Tests: 1300 passed, 80 skipped (env-only), ruff clean for all AZ-323/324 files. AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++ pybind11 toolchain not present in this environment. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
08e657d433 |
[AZ-507] [AZ-508] Onboard hygiene PBIs from batches 31-33 review
Open two ~2-point hygiene PBIs surfaced by _docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md: - AZ-507 (parent AZ-246 / E-CC-CONF) — align module-layout.md cross-component import rules with the AZ-270 lint test. Resolves the doc-vs-lint contradiction surfaced on AZ-321 by tightening the doc (option (a) from the review) + hoisting EngineBuildError / CalibrationCacheError to _types/inference_errors.py. - AZ-508 (parent AZ-264 / E-CC-HELPERS) — consolidate 5 identical _iso_ts_now() one-liners across c6_tile_cache + c7_inference into a single Layer-1 helper at helpers/iso_timestamps.py. Dependencies table headers bumped: 142 -> 144 tasks, 478 -> 482 points (product 345 -> 349). State file's pause-point detail cleared; next sub_step is the implement skill's Step 3 (compute next batch) for batch 34. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0dfe7c5301 |
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema — a host change rebuilds at the new path and leaves the old files untouched (AC-4). Design corrections vs. the task spec body: - The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins. - The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled. - The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable` — a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine` — and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload). Production - engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method. - config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`. - runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block. - components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block. Tests - test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR. Docs - module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block. - batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted). Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0ad3278b12 |
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep". Production - onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_ fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback= True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8). - config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig. - fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds. Tests - test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR. - test_protocol_conformance.py: drops onnx_trt_ep from the missing- module parametrize now that the module ships. - test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests. Docs - module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list. - batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted). Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
18a69022b3 |
[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget
Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d571ca25f9 |
[AZ-308] c6 CacheBudgetEnforcer: 10 GB hard cap + LRU sweep
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately when total_disk_bytes() + needed_bytes <= budget, otherwise iterates lru_candidates in eviction_batch_size batches, deletes via delete_tile, emits one INFO log per evicted tile (c6.evicted) and one FDR record per eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5). Raises CacheBudgetExhaustedError AFTER a full sweep if the budget cannot be met. BudgetEnforcedTileStore decorates a TileStore so the policy stays separable from PostgresFilesystemStore. Composition root in storage_factory.build_tile_store wires the wrapper unconditionally. PostgresFilesystemStore now accepts lru_clock: Clock | None = None; when set, read_tile_pixels calls record_lru_access(tile_id, now) so eviction picks the right LRU candidates. Production wiring injects WallClock(); AZ-305 unit tests still construct without the clock and keep their pass-through semantics. Contract tile_store.md bumped to v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family; shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
39ff47087f |
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale). Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path. FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts). Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression — load_config requires the env mapping). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d1c1cd9ab4 |
[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols in a single class. Filesystem-backed JPEG I/O (atomic sidecar write, read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting, upload bookkeeping). Wires composition via `from_config` classmethod. Key behaviors: - AC-3 strict reading: INSERT runs first inside an open transaction; duplicate-key collisions raise `TileMetadataError` BEFORE any byte is written, leaving the original file + sidecar byte-identical. Atomic sidecar write happens inside the same transaction; commit closes it. Comp-delete remains as a safety net for the rare commit-after-write failure path. - AC-2 content-hash gate runs before any I/O. - Construction performs an orphan-file reconciliation scan and emits an INFO `c6.store.construct` log with steady-state stats. Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0, forward-compatible) and a thin operator CLI at `c6_tile_cache.tools dump` for inspection. Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used on the F3 read-hot path. Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes (1215 passed, 8 skipped, 1 pre-existing unrelated failure `test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS, Jetson-only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
dde838d2cc |
[AZ-304] C6 Postgres schema: additive 0002 migration + UUIDv5
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.
Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.
Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.
Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
21f5a30d09 |
refactor: update autodev state and tile metadata store version
- Changed autodev state to reflect the transition from batch 26 to batch 27, updating the phase and details for the compute-batch step. - Incremented the version of the tile metadata store from 1.0.0 to 1.1.0, refining the uniqueness invariant to use a natural key that includes flight_id, allowing coexistence of multiple rows for the same tile from different flights. - Updated the last modified date in the tile metadata store documentation to reflect recent changes. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
49a67f770d |
[AZ-302] C7 ThermalStatePublisher — jtop/NVML 1 Hz background poller
Implements AZ-297 InferenceRuntime's thermal_state() side: a singleton background-thread publisher that polls jtop (preferred) or pynvml (fallback) at config.thermal_poll_hz, stores an atomic ThermalState snapshot, and emits c7.thermal_transition FDR records on every throttle flip with a WARN log on entry and an INFO log on exit. Default-safe on TelemetryUnavailableError per Invariant I-6 with a 1-Hz rate-limited WARN. Sources return a raw ThermalReading; the publisher stamps measured_at_ns via its injected Clock so _JtopSource / _PynvmlSource stay clean of direct time.* calls (Invariant 2). _poll_once is the deterministic test seam — start() spawns the production thread. - c7.thermal_transition registered in fdr_client.records KNOWN_PAYLOAD_KEYS - [telemetry] optional dep group (jetson-stats, pynvml) added to pyproject - 14 unit tests (AC-1..AC-6, AC-8, NFR-default-safe, structural) green; AC-7 / AC-1 microbench / NFR-perf-poll Tier-2 deferred - full unit suite: 1140 passed, 11 expected Tier-2/CUDA skips Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
59f56c032f |
[AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator
AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:
1. filename schema parse -> EngineSchemaMismatchError(reason=...)
2. schema tuple match -> EngineSchemaMismatchError(expected,got)
3. sidecar present -> EngineSidecarMissingError
4. sidecar trust -> EngineHashMismatchError(stage=sidecar)
5. manifest match -> EngineHashMismatchError(stage=manifest)
Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).
Production code (new):
- components/c7_inference/engine_gate.py -- EngineGate, HostTuple,
read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
tensorrt.__version__; raises RuntimeError on Tier-1)
- components/c7_inference/manifest.py -- DeploymentManifest,
ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
type level: __getitem__ raises EngineHashMismatchError on
missing key, NEVER KeyError, so the gate cannot silently pass
- components/c7_inference/__init__.py -- re-exports the new
public surface
Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
1. HostTuple lives in engine_gate.py (the only consumer);
re-exported from package __init__.py.
2. read_host_tuple takes precision as a keyword argument — three
of four fields come from the host, precision is engine-build
metadata supplied by the caller.
3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
run on every CI host.
Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.
NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.
AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.
Suite: 1134 passed / 11 skipped. No regressions outside the new
files.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
65ad2168ed |
[AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy
AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.
Production code (new):
- components/c7_inference/pytorch_fp16_runtime.py — runtime +
PytorchEngineHandle + output-shape adapter
- components/c7_inference/architecture_registry.py — torch-free
register_architecture / default_registry / ArchitectureFactory
(Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
code)
- components/c7_inference/__init__.py — re-exports the registry
mechanism. Still does NOT import the concrete strategy module
(Invariant I-5)
- components/c7_inference/config.py — adds per_frame_debug_log bool
field (gates the DEBUG per-frame latency log)
Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.
Tests (modified):
- test_protocol_conformance.py — narrowed
test_ac5_build_inference_runtime_flag_on_but_module_missing
parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
still covered until AZ-298 / AZ-299 ship.
CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
1. Constructor conforms to AZ-297 factory shape (config positional;
thermal_publisher + registry + clock keyword-only optionals).
AZ-302 will update the factory to thread thermal_publisher.
2. Architecture registry uses extras["model_name"] as lookup key
(avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
registry has no per-backbone input-shape metadata.
Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
1ebab29a4f |
[AZ-332] C1 OKVIS2 Strategy: facade + binding skeleton
Python facade (`Okvis2Strategy`) is production-quality and satisfies
AZ-331's `VioStrategy` protocol; full AC-1..10 coverage with
AC-9 + NFR-perf marked `tier2`. The C++ pybind11 binding compiles
and loads but throws `OkvisFatalException("estimator not yet wired")`
on first `add_frame` — the `okvis::ThreadedKFVio` wiring is a tier2
follow-up the Step-15 Product Completeness Gate is expected to track
as a remediation task.
Resolved contradictions:
* Constructor signature aligned with the AZ-331 factory: `(config, *,
fdr_client, clock=None)`. Calibration / preintegrator / logger
built internally from config. No churn on AZ-331.
* IMU substrate: OKVIS2 owns its internal estimator IMU integration;
the AZ-276 `ImuPreintegrator` is a separate substrate consumed by
E-C5's fusion graph. Single source of truth lives at the sample
stream, not the integrator instance.
* FDR API: `FdrClient.enqueue(record)` with new `vio.health` kind
added to AZ-272 `KNOWN_PAYLOAD_KEYS`.
CI matrix forces `-DBUILD_OKVIS2=OFF` until the tier2 wiring task
brings Ceres / SuiteSparse / OKVIS2 vendored submodules into the
Linux build.
Files: 17 added/modified across `c1_vio/`, `fdr_client/records.py`,
`cpp/okvis2/CMakeLists.txt`, CI workflow, AZ-332 task spec
(implementation-notes section), batch 23 report.
Tests: 17 new (15 tier1 + 2 tier2). Full Tier-1 suite: 1109 pass,
2 skipped (env), 2 deselected (tier2). No regressions.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
48ea1e2fc2 |
[AZ-343] C2.5 InlierCountReRanker + shared FeatureExtractor helper
Implements the production-default ReRankStrategy: K=10 → N=3 by single-pair LightGlue inlier count, with strict drop-and-continue (INV-8) on per-candidate TileFetch / backbone / zero-inlier failures and RerankAllCandidatesFailedError on zero survivors. Composition root injects the shared LightGlueRuntime + Clock + the new FeatureExtractor helper (an L1 placeholder OpenCvOrbExtractor that unblocks AZ-343 and future C3 strategies — task scope expansion). Architectural notes: - Cross-component imports stay banned; tile_store types as `object` and the C6 TileCacheError family is duck-typed by class module prefix (same workaround AZ-348 adopted for c7_inference; proper fix is to relocate TileCacheError to _types/ in a follow-up). - Clock injection follows the replay contract (AZ-398 Invariant 2); reranked_at is sourced from clock.monotonic_ns(). - AZ-342 factory grew `feature_extractor` + `clock` + `fdr_client` parameters; existing AZ-342 conformance tests updated. Tests: 19 new AC-1..AC-12 + mixed-failure scenarios in test_inlier_count_reranker.py; existing AZ-342 suite (26) still green. Full repo sweep 1093 passed / 2 skipped (cmake/actionlint not on PATH). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9a605c8514 |
[AZ-348] C3.5 ConditionalRefiner Protocol + factory + PassthroughRefiner
Defines the public `ConditionalRefiner` Protocol (PEP 544 @runtime_checkable, two methods: `refine_if_needed` + `was_invoked`), extends `MatchResult` in-place with two default-valued refinement fields (`refinement_label`, `refinement_added_latency_ms`), defines the `RefinerError` family (`RefinerBackboneError`, `RefinerConfigError`), and ships the trivial `PassthroughRefiner` reference impl. Both refiner strategies are linked unconditionally — no `BUILD_REFINER_*` flag (NOT ADR-002 territory). Runtime selection only per ADR-001. `PassthroughRefiner` returns the input `MatchResult` by reference (bit-identical correspondences per contract INV-5) and always reports `was_invoked() is False`. Documentation: renames `module-layout.md` `c3_5_adhop` Public API symbol from `AdHoPRefinementStrategy` to `ConditionalRefiner` (AC-14) so the doc agrees with `description.md` and the contract. AC-9 (single-thread binding) deferred to AZ-270 runtime-root composition, mirroring AZ-336 / AZ-342 / AZ-344 Risk-4 precedent. AC-7 for the `"adhop"` strategy stops at `ModuleNotFoundError` because the AdHoP backbone is owned by AZ-349. All other ACs + NFRs covered by 36 new conformance tests. Architectural note: `PassthroughRefiner.inference_runtime` is typed as `object` because the L3→L3 import ban (`test_az270_compose_root`) forbids c3_5_adhop from importing c7_inference; the runtime-root factory narrows the type at construction time. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
89c223882b |
[AZ-344] C3 CrossDomainMatcher Protocol + factory + RollingHealthWindow
Defines the public `CrossDomainMatcher` Protocol (PEP 544 @runtime_checkable, two methods: `match` + `health_snapshot`), the three frozen+slotted DTOs (`CandidateMatchSet`, `MatchResult`, `MatcherHealth`) in the L1 `_types/matcher.py` layer, the `MatcherError` family (`MatcherBackboneError`, `InsufficientInliersError`), and the composition-root `build_matcher_strategy` factory with lazy-import + `BUILD_MATCHER_<variant>` gating per ADR-002. `RollingHealthWindow` accumulator (60 s, amortised O(1) update, strict O(1) snapshot) is constructed by the factory and injected into every concrete matcher so all backbones share window semantics; this is what backs C5's spoof-promotion gate. Legacy placeholder `MatchResult` removed from `_types/matching.py`; import-only consumers (`c4_pose.interface`, `c3_5_adhop.interface`) repointed at the new `_types/matcher.py` home — zero behavioural change to those components. AC-9 (single-thread binding) and AC-10 (LightGlueRuntime identity-share with C2.5) deferred to AZ-270 runtime-root composition, mirroring the AZ-342 Risk-4 escape clause. All other ACs + NFRs covered by 70 new conformance tests. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d6756f1855 |
[AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for the InlierCountReRanker (AZ-343) and the future C3 CrossDomainMatcher consumer (AZ-344). No concrete re-ranker is implemented here. * ReRankStrategy Protocol (single rerank(frame, vpr_result, n, calibration) -> RerankResult method) with all 8 invariants in the docstring — notably INV-8 drop-and-continue (per-candidate failure NEVER propagates unless every candidate fails). * DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult; frozen+slots; tuple-not-list for RerankResult.candidates; tile_id encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any c6_tile_cache (L3) import per module-layout.md. * Error family: RerankError + RerankBackboneError + RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError escapes rerank(); RerankBackboneError is caught inside the per- candidate loop, logged ERROR, FDR-stamped, candidate dropped. * C2_5RerankConfig (strategy enum default "inlier_count", top_n int default 3) with strict validation at load; registered into Config.components on c2_5_rerank import. * build_rerank_strategy(config, *, tile_store, lightglue_runtime) factory: 1-strategy resolution table, lazy import, BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError mapping. The shared LightGlueRuntime is constructor-injected (R14 fix: neither C2.5 nor C3 owns its lifecycle). Renamed the Protocol from the existing stub "RerankStrategy" to "ReRankStrategy" to match the contract; updated module-layout.md. Removed the legacy RerankResult shape from _types/vpr.py — the v1.0.0 shape lives in _types/rerank.py. Excluded per task spec: * Concrete InlierCountReRanker (AZ-343). * C3 matcher protocol task (AZ-344, next in batch). * AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share between C2.5/C3 — deferred per task spec Risk 3 until the generic compose_root thread-binding registry and the C3 factory both land. Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy smoke test is removed. Full sweep: 997 passed (one pre-existing flake in test_az296_takeoff_abort, subprocess timing, unrelated to this commit; passes in isolation). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3665acef66 |
[AZ-336] C2 VprStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for every concrete C2 backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340) and the C2.5 ReRanker consumer side. No backbone is implemented here. * VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim) + BackbonePreprocessor C2-internal Protocol (NOT in Public API per description.md § 6). * DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all frozen + slots; tuple-not-list for VprResult.candidates so the immutability invariant truly holds. * Error family: VprError + VprBackboneError + VprPreprocessError + IndexUnavailableError; same-named but namespace-distinct from c6_tile_cache.IndexUnavailableError (the c2 family is the closed envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form). * C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path) with strict validation at load; registered into Config.components on c2_vpr import. * build_vpr_strategy factory with 7-strategy resolution table, lazy import, BUILD_VPR_<variant> gating, ImportError→ StrategyNotAvailableError mapping, and pre-flight descriptor_dim match against DescriptorIndex.descriptor_dim() — mismatch fires ConfigError at startup, NOT at first frame. Contract change vs the v1.0.0 draft: factory takes descriptor_index: DescriptorIndex (not tile_store: TileStore) because descriptor_dim() lives on DescriptorIndex per C6's Public API. The contract markdown is updated to match. Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple, keeping _types/ (L1) free of any c6_tile_cache (L3) import per module-layout.md. Consumers reconstruct TileId at the C6 boundary. Excluded per task spec: * Concrete backbones (AZ-337..AZ-340). * FAISS HNSW retrieve wiring (AZ-341). * DescriptorNormaliser helper (AZ-283, already shipped). * AC-9 single-thread binding — deferred per task spec Risk 4 until the generic compose_root thread-binding registry is in place (today each factory owns its own, e.g. fc_factory). Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py covering AC-1..AC-8, the error family, the config validation, the factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full sweep 973 passed, 2 skipped (CI-only cmake / actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
823c0f1b2e |
[AZ-398] Replay: FrameSource + Clock Protocols + Clock injection
Ship the two Layer-1 cross-cutting Protocols replay mode needs to leave production C1-C5 components mode-agnostic (Invariant 1) and replay- deterministic (Invariant 2). Live + replay binaries see the same interfaces; only the strategy differs. * Clock Protocol (monotonic_ns / time_ns / sleep_until_ns) + WallClock (live + REALTIME replay) + TlogDerivedClock (ASAP replay; advance-on-call; non-monotonic source → ClockOrderingError). * FrameSource Protocol (next_frame -> NavCameraFrame | None / close) + LiveCameraFrameSource (cv2.VideoCapture device index) + VideoFileFrameSource (cv2.VideoCapture file). * Build-flag gating: BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_LIVE_CAMERA_FRAME_SOURCE (constructor-time check; Tier-0 OFF refuses construction with FrameSourceConfigError). * Composition-root factories: build_clock + build_frame_source. * Injected Clock across every component that previously called time.monotonic_ns() / time.sleep() directly: c5_state (estimator, ESKF, fallback watcher, source-label SM, isam2 handle), c8_fc_adapter (inbound MAVLink + MSP2, AP outbound, iNav outbound, QGC GCS), c13_fdr writer, c12_operator_tooling httpx flights client. All constructors default to WallClock() so existing call sites keep live-binary behaviour without a wiring change. * AC-4 CI guard (tests/_meta/test_no_direct_time_in_components.py) AST-scans components/**/*.py for direct time.monotonic_ns / time.time_ns / time.sleep references and fails loudly with file:line. * Conformance + factory tests: tests/unit/clock + tests/unit/frame_source. * Test fixture updates: FallbackWatcher / SourceLabelStateMachine clock_ns is now required (removed time.monotonic_ns default); test_az388 patches estimator._clock instead of a module-level time; test_az393 ardupilot adapter uses a _FixedClock test double. Excluded per the task spec: TlogReplayFcAdapter (AZ-399), ReplaySink (AZ-400), compose_replay (AZ-401), CLI (AZ-402), Docker/CI (AZ-403), E2E fixture (AZ-404), IMU auto-sync (AZ-405). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
6c7d24f7e0 |
[AZ-331] C1 VioStrategy: Protocol + DTOs + factory + C5 migration
Freezes the c1_vio Public API per _docs/02_document/contracts/c1_vio/vio_strategy_protocol.md v1.0.0: - VioStrategy Protocol (4 methods: process_frame, reset_to_warm_start, health_snapshot, current_strategy_label) in components/c1_vio/interface.py. - DTOs (VioOutput, VioHealth, FeatureQuality, WarmStartPose) + VioState enum in _types/nav.py — L1 placement so C5 + C13 consume them without crossing the components.* boundary (AZ-270 AC-6). The new VioOutput shape (frame_id: str, relative_pose_T: gtsam.Pose3, pose_covariance_6x6, imu_bias, feature_quality, emitted_at_ns) replaces the AZ-263 scaffolding in _types/vio.py, which is now deleted. - VioError family (VioInitializingError / VioDegradedError / VioFatalError) in components/c1_vio/errors.py. Documented rationale: the degraded-operation path returns a VioOutput with inflated covariance + VioHealth.state=DEGRADED rather than raising VioDegradedError — the error type exists only for the rare degraded->fatal transition. - C1VioConfig per-component config block (strategy enum, lost_frame_threshold default 9, warm_start_max_frames default 5) with constructor-time validation rejecting unknown strategy labels. - StrategyNotAvailableError added to runtime_root/errors.py; composition-time error distinct from the VioError family. - Composition-root factory build_vio_strategy in runtime_root/vio_factory.py with three BUILD_* gates (BUILD_OKVIS2, BUILD_VINS_MONO, BUILD_KLT_RANSAC). Concrete strategy modules are imported lazily via __import__ AFTER the flag check — Tier-0 workstation builds with the flag OFF MUST NOT load the strategy module (Risk-2 / I-5; verifiable via sys.modules). - 36 conformance tests cover all 9 ACs + NFR-perf-factory (p99 build under 200 ms x 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; AC-9 asserts the frame_id annotation is 'str' (PEP-563 stringified). C5 migration (consumers of the new VioOutput shape): - gtsam_isam2_estimator.py + eskf_baseline.py: replaced vio.timestamp -> vio.emitted_at_ns (drops _datetime_to_ns on the VIO path), vio.pose_se3 -> vio.relative_pose_T (gtsam.Pose3 direct; drops _pose_se3_to_gtsam / _pose_se3_to_array), vio.covariance_6x6 -> vio.pose_covariance_6x6 (rename). - key_for_frame signature widened to UUID | int | str to accept the new str frame_id. - 4 C5 test files migrated to the new VioOutput shape with helper fixtures producing ImuBias + FeatureQuality + str frame_id. - c5_state/interface.py TYPE_CHECKING import path updated. Bootstrap healthcheck + test_types_importable updated to drop the deleted _types/vio module and pick up _types/inference (AZ-297) in the same sweep. Full unit-test sweep: 884 passed, 2 pre-existing environment skips (cmake, actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
daff5d4d1c |
[AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md v1.0.0: - InferenceRuntime Protocol (6 methods: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label) in components/c7_inference/interface.py. - DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig, EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py — placed at the L1 types layer so C10 can re-export EngineCacheEntry without crossing the components.* boundary (AZ-270 AC-6). - ThermalState DTO expanded in _types/thermal.py from the AZ-355 forward-declared stub to the AZ-297 contract shape (cpu/gpu temp, thermal_throttle_active, measured_clock_mhz, measured_at_ns, is_telemetry_available). Invariant I-6: when telemetry is unavailable, throttle is False. - Error family rooted at c7_inference.errors.RuntimeError (9 subtypes: EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError). RuntimeNotAvailableError stays in runtime_root/errors.py — composition-time, outside the family. - C7InferenceConfig per-component config block (runtime label, thermal_poll_hz, engine_cache_dir) with constructor-time validation rejecting unknown runtime labels. - Composition-root factory build_inference_runtime in runtime_root/inference_factory.py with three BUILD_* gates (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME, BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported lazily via __import__ AFTER the flag check, so a Tier-0 build with the flag OFF MUST NOT load the strategy module (AC-5 / I-5; verifiable via sys.modules). - 37 conformance tests cover all 8 ACs + NFR-perf-factory (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; also asserts all 9 error subtypes are documented. Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py (replaced by the AZ-297 canonical shape in _types/inference.py); updated the LightGlue-flavoured EngineHandle Protocol docstring in _types/manifests.py to rationalize its intentional dual existence with the C7 opaque EngineHandle (same name, different consumer-side cut, mirroring the C4/C5 ISam2GraphHandle pattern). Stale ThermalState.throttle docstring references in c4_pose/config.py, c4_pose/interface.py, and _types/pose.py updated to thermal_throttle_active. Full unit-test sweep: 843 passed, 2 pre-existing environment skips (cmake, actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f925af9de3 |
[AZ-303] C6 storage interfaces: Protocols + DTOs + factories
Freezes the c6_tile_cache Public API per
_docs/02_document/contracts/c6_tile_cache/{tile_store,tile_metadata_store,
descriptor_index}.md v1.0.0:
- Three runtime_checkable Protocols (TileStore 4-method, TileMetadataStore
9-method, DescriptorIndex 5-method) in components/c6_tile_cache/interface.py.
- Frozen DTOs + enums (TileId, TileMetadata, TileMetadataPersistent,
TileQualityMetadata, Bbox, SectorBoundary, HnswParams, IndexMetadata,
TileSource, FreshnessLabel, VotingStatus, SectorClassification) in
components/c6_tile_cache/_types.py. Constructor-time validation rejects
out-of-range zoom_level / lat / lon and inverted Bbox.
- TilePixelHandle ABC for read-only mmap access (Invariant I-4).
- TileCacheError family (6 subtypes) + IndexBuildError (deliberately
outside the family) in components/c6_tile_cache/errors.py.
- C6TileCacheConfig per-component config block, registered on package
import; validates known runtime labels at construction time.
- Composition-root factories build_tile_store / build_tile_metadata_store /
build_descriptor_index in runtime_root/storage_factory.py, with lazy
concrete-impl imports gated by BUILD_FAISS_INDEX (AC-5 / Risk 2:
no module-level FAISS import when the flag is OFF).
- RuntimeNotAvailableError defined in runtime_root/errors.py to be shared
with AZ-297 (composition-time error, distinct from per-component
runtime errors).
51 conformance tests cover all 10 ACs + NFR-perf-factory (p99 build_*
under 50 ms across 1000 calls) + NFR-reliability-error-family. AC-9
introspects each contract file's Shape table and asserts method
parity against the runtime Protocol.
Retired the AZ-263 scaffolding SectorClassification (dataclass) and
TileQualityMetadata from _types/tile.py since their canonical home is
now c6_tile_cache._types; Tile and TileRecord remain in _types/tile.py
until c3_matcher (AZ-344) and c11_tile_manager (AZ-316/319) retire
their interface stubs.
Full unit-test sweep: 791 passed, 2 pre-existing environment skips.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
8a83166261 |
[AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
72a06edab0 |
[AZ-489] C12 FlightsApiClient + offline JSON loader + bbox helper
ADR-010 primary cold-start path now has a real source for the cache bbox and the takeoff origin. Single concrete strategy (`HttpxFlightsApiClient`) behind a `@runtime_checkable` Protocol; offline JSON fallback (`load_flight_file`) shares the same DTO shape per FAC-INV-1. * `flights_api/interface.py` — `FlightsApiClient` Protocol + `FlightDto` + `WaypointDto` + `WaypointObjective` / `WaypointSource` enums (plain frozen-slotted dataclasses, matching project's LatLonAlt / PoseEstimate pattern). * `flights_api/errors.py` — 8-class hierarchy under `FlightsApiError`. * `flights_api/_parser.py` — shared JSON validator: range checks, lat/lon bounds, contiguous ordinals, finite floats, enum membership. * `flights_api/bbox.py` — `bbox_from_waypoints` envelopes lat/lon and inflates by a horizontal-distance buffer via WgsConverter ENU round-trip (NOT degree-space); `takeoff_origin_from_flight` passes waypoints[0] through unrounded. * `flights_api/file_loader.py` — orjson-backed offline loader. * `flights_api/httpx_client.py` — concrete client with ONE retry on transient 5xx + connect errors; token redaction at every log site; test-injectable transport + sleep. * `runtime_root/c12_factory.py` — `build_flights_api_client(config)`; re-exported from `runtime_root/__init__.py`. OperatorToolServices aggregate intentionally deferred to AZ-328 per scope discipline. * `pyproject.toml` — `httpx>=0.28,<1.0` added (chosen over `requests` for native `MockTransport` testing). Tests: 28 cases across AC-1..AC-18 plus extras (malformed JSON, negative buffer, zero buffer, missing top-level fields, negative ordinal, empty-flight takeoff). Full repo run: 713 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e0be591b06 |
[AZ-489] [AZ-490] ADR-010 design pass: operator-mission as cold-start anchor
Architecture, contracts, and task amendments for the flight-route-driven preflight + cold-start origin feature (ADR-010). No source code touched in this commit; the implementation commits for AZ-489 / AZ-490 / AZ-419 land separately. * architecture.md: ADR-010, new Principle #14, amended Principle #11, external systems gain flights service + Mission Planner UI, data model gains Flight / Waypoint / TakeoffOrigin. * system-flows.md: F1 gains phase 0 (Flight resolve), F2 gains cold-start ladder, F7 gains mid-flight bounded-delta GPS gate. * glossary.md: Flight, Flights API, Mid-flight bounded-delta GPS gate, Mission Planner UI, Takeoff origin, Waypoint. * C10: description + cache_provisioner + manifest_verifier bumped to v1.1 carrying takeoff_origin + flight_id in the manifest hash. * C12: description updated + new flights_api_client.md contract v1.0. * C5: description + state_estimator_protocol bumped to v1.1 with set_takeoff_origin + 3-clause spoof-promotion gate. * AZ-323/324/325/326/328/419 amended in place. AZ-490 spec created (C5 set_takeoff_origin entrypoint). * Dependencies table: 142 tasks / 478 pts / 15 forward edges (2 new tasks, 2 backward deps, 2 forward deps from AZ-419). * Leftovers cleared: 2026-05-11 Jira transition entries for AZ-355 and AZ-386 are deleted (Jira reconnected; both already transitioned in their respective implementation commits). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
db27e25630 |
[AZ-355] C4 PoseEstimator Protocol + factory + DTOs + composition
Land the foundational C4 surface AZ-358 (Marginals) and AZ-361 (Hybrid) build on top of: - PoseEstimator Protocol (@runtime_checkable): estimate(...) + current_covariance_mode(). - Error hierarchy: PoseEstimatorError, PnpFailureError, PoseEstimatorConfigError; CovarianceDegradedWarning as a Warning subclass (warnings.warn path, not raised). - ISam2GraphHandle Protocol stub (READ-ONLY view, get_pose_key only) decoupled from C5's concrete ISam2GraphHandleImpl. - C4PoseConfig (frozen dataclass) + register on c4_pose import. - runtime_root/pose_factory.build_pose_estimator with lazy-import fallback; INFO log c4.pose.strategy_loaded; shares ingest-thread binding with C5 per ADR-003. DTO restructuring (cross-cutting): retire the legacy raw-4x4 PoseEstimate(int frame_id, datetime timestamp, pose_se3, ...) and ship the contract shape PoseEstimate(UUID, LatLonAlt, Quat, np.ndarray, CovarianceMode, PoseSourceLabel, last_satellite_anchor_age_ms, emitted_at). C5 add_pose_anchor in both gtsam_isam2 + eskf_baseline migrated in lockstep via WGS84->ENU + Quat->R helpers; test fixtures updated. VIO output stays on the raw shape until AZ-331 (C1 protocol) lands. LatLonAlt upgraded to slots=True per AC-2. ThermalState stub added to _types/thermal.py so the Protocol typechecks pre-AZ-302. Tests: 25 new in tests/unit/c4_pose/test_az355_pose_protocol.py covering AC-1..AC-10 + factory wiring + config validation; full repo: 685 passed, 2 pre-existing CI-only skips. Jira transition deferred: MCP "Not connected"; leftover entry in _docs/_process_leftovers/2026-05-11_jira_transition_az355_deferred.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
c0bdb57957 |
[AZ-386] C5 ESKF baseline: 16-state error-state KF (NumPy)
Implements the mandatory simple-baseline StateEstimator per AC-2.1a engine-rule at C5 (IT-12 comparative study vs iSAM2). NumPy-only; no GTSAM dependency so BUILD_STATE_ESKF=ON binaries ship without GTSAM at all. - 16-state error vector (pos 3 + vel 3 + rot 3 + ba 3 + bg 3 + dt 1) over a textbook nominal-state / error-state ESKF split. - add_fc_imu: full nonlinear IMU integration + linearised F P F^T + Q covariance propagation per IMU sample. - add_vio: simplified relative-pose update (snapshot-based; baseline scope, documented). - add_pose_anchor: absolute-pose update; integrates BOTH marginals and jacobian modes (no skip — ESKF has no graph; AC-4). - AC-9 divergence test: Mahalanobis r^T S^-1 r > 100 (10 sigma) on the innovation covariance S = H P H^T + R. - AC-5 SPD: Cholesky-positive enforcement on every emitted covariance; non-SPD raises EstimatorFatalError and locks state to LOST. - AC-6 honesty: smoothed_history entries carry smoothed=False; deviation from C5 contract Invariant 7 documented in module + report. - AC-7 / AC-10 BUILD_STATE_ESKF gating: works through existing factory infra (state_factory._STATE_BUILD_FLAGS). - AC-8: SourceLabelStateMachine + FallbackWatcher auto-wired eagerly in __init__, same pattern as the iSAM2 estimator. Tests: 20 new unit tests covering AC-1..AC-10 + robustness checks. Full suite: 660 passed, 2 skipped (CI-only). The AZ-386 Jira transition to Done is deferred (Atlassian MCP returned 'Not connected'); recorded in _docs/_process_leftovers/ for replay on the next autodev invocation per the Leftovers Mechanism. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
098aabac0c |
[AZ-387] C5 smoothed-history → FDR side-channel
After every successful current_estimate(), emit one c5.state.smoothed_history FDR record per newly-smoothed past keyframe from IncrementalFixedLagSmoother. AC-4.5 (revised): the smoothed stream goes ONLY to FDR; the C8 outbound forward-time stream is unaffected. Idempotency via _smoothed_fdr_watermark_s (smoother-native float seconds); the same pose key is never emitted twice. Hook is best-effort — internal failures log warnings but do not raise, so a smoother divergence cannot contaminate the forward-time path. Cross-task invariants documented: - AC-3 ESKF no-op — AZ-386 installs an inert hook on the ESKF. - AC-4 No C8 leak — enforced at the C8 boundary by AZ-261. 8 new unit tests against AC-1/2/5/6 + robustness (no-FDR-client, marginals failure). Full suite: 640 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |