mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 14:51:13 +00:00
3c2b63ce22a2f4c97de92cf49ab5bc8b0164411d
21 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4eac24f37a |
[AZ-358] [AZ-361] C4 OpenCVGtsamPoseEstimator + Jacobian thermal hybrid
Implement the single production-default C4 PoseEstimator strategy. AZ-358 — Marginals path: OpenCV solvePnPRansac (SOLVEPNP_IPPE) on best-candidate inliers, PriorFactorPose3 with Jacobian-derived initial covariance, flushed into C5's iSAM2 graph via the widened ISam2GraphHandle.update(graph, values, None) (Option B). Posterior covariance from compute_marginals().marginalCovariance(pose_key) with SPD-defensive Cholesky check. Tile pixel -> ENU world conversion via the shared WgsConverter + a configurable tile_size_px. Two spec deviations now documented in the AZ-358 task file: PriorFactorPose3 over GenericProjectionFactorCal3DS2 (avoids unbounded landmark variables; same Fisher information on the pose marginal) and explicit (graph, values, timestamps) update args (aligns with C5's impl). AZ-361 — Jacobian + thermal hybrid: per-frame dispatch on thermal_state.thermal_throttle_active selects the cv2.projectPoints- derived 6x6 information matrix (with ridge regularisation) as the emitted covariance. Skips the iSAM2 factor add under throttle (Invariant 12). Emits CovarianceDegradedWarning via warnings.warn (never raised); paired WARN log + FDR record rate-limited per covariance_degraded_warn_window_ns (default 60 s) via an injected monotonic Clock. Supersedes the AZ-358 NotImplementedError stub. Widens ISam2GraphHandle from get_pose_key only to all five C4-facing methods (add_factor, update, compute_marginals, last_anchor_age_ms); C5's existing ISam2GraphHandleImpl already satisfies the superset, so no C5 source change this batch. Threads fdr_client + clock through pose_factory composition. Registers two new FDR payload kinds: pose.frame_done (per-call telemetry; both success and PnpFailureError paths) and pose.covariance_degraded (per-window throttle exposure). Tests: 21 new (AZ-358 AC-1..11 + AZ-361 AC-1..10/12/13; AZ-361 AC-11 RMSE-ratio informational per spec, not asserted). Updates 2 existing test files for Protocol widening and the FDR-schema round trip. Code review verdict: PASS_WITH_WARNINGS (5 findings: Medium x2, Low x3; none blocking). Full suite: 1958 passed, 1 unrelated host-dependent perf failure (c12 CLI cold-start, pre-existing). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a1185d0a28 |
[AZ-345] [AZ-346] [AZ-347] [AZ-349] C3 matchers + C3.5 AdHoP refiner
Implement the three concrete C3 CrossDomainMatcher strategies plus the C3.5 production-default AdHoPRefiner. C3 (AZ-345/346/347): - DiskLightGlueMatcher + AlikedLightGlueMatcher share a single shared _pipeline.run_lightglue_pipeline orchestrator (decode -> query extract -> per-candidate loop -> RANSAC sort -> health update -> FDR emit) so the only per-backbone delta is the keypoint+descriptor extractor closure. ALIKED adds a create-time engine output-schema probe (AC-special-1). - XFeatMatcher owns its own per-candidate loop (single forward fuses extraction + matching); it re-uses the shared FDR emission helpers to keep telemetry byte-identical across strategies. lightglue_runtime parameter accepted by factory but discarded (AC-special-1). - All three consume the shared LightGlueRuntime / RansacFilter / RollingHealthWindow helpers; no helper forks. InferenceRuntimeCut consumer-side Protocol added per AZ-507. C3.5 (AZ-349): - AdHoPRefiner implements the <= conditional gate, runs the OrthoLoC AdHoP TRT engine over best-candidate correspondences, re-runs RANSAC on the perspective-preconditioned set, and emits an enriched MatchResult with refinement_label="adhop". - Invariant 4 passthrough fall-through: any RefinerBackboneError (TRT failure, OOM, NaN, bad shape) is caught, logged ERROR, FDR-emitted with error: true, and converted to passthrough that still counts against the rolling invocation-rate window. MemoryError and other non-listed exceptions propagate by design (AC-5 closed-set semantics). - Rolling 60-s invocation-rate window + rate-limited WARN log (configurable via ratelimited_warn_window_ns; default 60 s). Shared changes: - C3MatcherConfig + C3_5RefinerConfig extended with the new weights/threshold/window fields. - matcher_factory + refiner_factory optionally forward clock + fdr_client to the strategy's create(); backward-compatible. - fdr_client.records registers five new kinds: matcher.frame_done, matcher.backbone_error, matcher.insufficient_inliers, matcher.all_failed, refiner.frame_done. Tests: 66 new (43 C3 parametrised + 23 AdHoP) covering 47/47 ACs; focused suite green; full project test suite green except for one pre-existing flaky CLI cold-start timing test unrelated to this batch. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
06f655d8fb |
[AZ-335] C1 warm-start hint persistence + F8 reboot recovery wiring
Adds JsonSidecarWarmStartHintStore (atomic JSON + SHA-256 sidecar via AZ-280) inside c1_vio, plus the cross-strategy WarmStartWiredStrategy wrapper + prime_warm_start_from_disk / prime_warm_start_from_fc hooks at runtime_root. AC-7 post-reset covariance inflation and AC-8 "no fake confidence" baseline floor are enforced at the wiring layer so no strategy module needed edits. Adds three c1_vio config fields (warm_start_store_dir, warm_start_save_period_frames, post_reset_covariance_inflation_factor) and registers the new FDR kind vio.warm_start. 34 unit tests cover all 10 ACs + 3 NFRs. Verdict PASS_WITH_WARNINGS — see _docs/03_implementation/reviews/batch_56_review.md for the four non-blocking documentation findings (F1 cold-start log kind shorthand, F2 strategy-frame pose semantics, F3 dev-hardware perf smoke, F4 runtime_root importing c1-internal _facade_spine for shared FDR conventions). Closes AZ-335; depends on AZ-528 (batch 55). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
af0dbe863a |
[AZ-338] [AZ-283] C2 NetVLAD mandatory simple-baseline VprStrategy
NetVLAD is the C2 comparative baseline per the engine rule (every production-default backbone ships with a simple-baseline alongside). Runs on the C7 PyTorch FP16 runtime (NOT TRT) so a TRT engine compile bug cannot simultaneously break NetVLAD AND UltraVPR. Production changes: - c2_vpr/net_vlad.py — NetVladStrategy + module-level create() factory. Constructor wires InferenceRuntimeCut + DescriptorIndexCut + NetVladBackbonePreprocessor + DescriptorNormaliser + FaissBridge. embed_query pipeline: preprocess -> runtime.infer -> dual-stage normalisation (intra-cluster THEN global L2) -> VprQuery. retrieve_topk delegates one-line to FaissBridge. - c2_vpr/_net_vlad_architecture.py — Arandjelovic et al. 2016 NetVLAD layer over torchvision VGG16 features + optional Linear PCA projection to descriptor_dim (default 4096; published Pittsburgh reference uses K*D=64*512=32768 raw + Linear(32768, 4096) PCA). - c2_vpr/_preprocessor_net_vlad.py — OpenCV-based image preprocessor: decode -> centre-crop square -> resize (480, 480) -> ImageNet normalisation -> FP16 NCHW. Calibration is not consumed (NetVLAD is calibration-agnostic per published preprocessing chain). - c2_vpr/inference_runtime_cut.py — NEW AZ-507 consumer-side cut mirroring C7 InferenceRuntime; lets c2_vpr stay AZ-507-clean. - c2_vpr/config.py — added netvlad_descriptor_dim: int = 4096 knob. - helpers/descriptor_normaliser.py — added intra_cluster_normalise (DescriptorNormaliser v1.0.0 -> v1.1.0; backward-compatible add). - runtime_root/vpr_factory.py — added _register_strategy_architecture helper that binds (MODEL_NAME, architecture_factory(descriptor_dim)) to C7's architecture registry before delegating to the strategy's create() factory. Keeps the c7 import at L4, preserves AZ-507. - fdr_client/records.py — registered vpr.embed_query, vpr.backbone_error, vpr.preprocess_error record kinds. Tests: - tests/unit/c2_vpr/test_net_vlad.py — 31 tests covering all 11 ACs + preprocessor contract + architecture factory + constructor validation + FDR record emission. - tests/unit/test_az283_descriptor_normaliser.py — +8 tests for the new intra_cluster_normalise. - tests/unit/test_az272_fdr_record_schema.py — +3 fixture payloads. Full unit suite: 1608 passed / 80 env-skipped (+43 new tests). Per-batch code review (batch_46_review.md): PASS_WITH_WARNINGS (4 Low-severity hygiene findings; no Critical/High/Medium). Architectural notes: - The spec implied c2_vpr.net_vlad.create() registers the architecture with C7. That violates AZ-507 (no cross-component imports). Resolved by exposing MODEL_NAME + architecture_factory(descriptor_dim) on the strategy module and having the composition root perform the C7 bind. - C7 PyTorch runtime API names in the spec (forward, load_engine) were outdated; aligned implementation with the live v1.0.0 Protocol (infer, compile_engine + deserialize_engine). Spec hygiene flagged in review F2. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
88f6ae6dce |
[AZ-341] C2 FAISS HNSW retrieve wiring (FaissBridge + AZ-507 cut)
Shared retrieve_topk plumbing for every concrete C2 VprStrategy:
- FaissBridge centralises the c6 search_topk → VprResult pipeline,
the defended-in-depth INV-4 check (exactly k, distance-ascending),
the WARN-threshold check on distances[0], optional per-frame DEBUG
log, and one `vpr.retrieve_topk` FDR record per call with latency
measurement.
- DescriptorIndexCut Protocol — consumer-side structural cut of c6
DescriptorIndex.search_topk (AZ-507); keeps c2_vpr c6-import-free.
- C2VprConfig gains warn_top1_threshold + debug_per_frame_distances
knobs with validators.
- KNOWN_PAYLOAD_KEYS registers vpr.retrieve_topk for the FDR record
schema with payload {frame_id, backbone_label, top10_distances,
latency_us}; companion fixture added to the AZ-272 roundtrip suite.
- 22 unit tests cover AC-1..AC-11 + NFR-perf microbench (p95 ≤ 0.5 ms)
+ constructor and retrieve-argument validation.
Verdict: PASS_WITH_WARNINGS (2 Low findings — duplicated ISO-ts
helper across c2/c5/c11/c12, captured in AZ-508 hygiene PBI;
spec-listed but unused `normaliser` parameter dropped — INV-3 makes
the embedding L2-normalised at the strategy's `embed_query`).
Tests: 1565 passed / 80 skipped (was 1543; +22 new tests).
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
5fe67023b2 |
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit: * AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader. * AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars). * AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded. * AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title. Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99. Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets. See `_docs/03_implementation/batch_44_cycle1_report.md`. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a06b107fc3 |
[AZ-320] Add C11 IdempotentRetryTileUploader decorator
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets: - In-call (per-batch) — re-invokes inner on PARTIAL outcome up to `max_in_call_retries` times with capped exponential backoff (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an operator hint via `next_retry_at_s = now + backoff_cap_s`. - Per-tile (cross-call) — atomically increments c6's `tiles.upload_attempts` counter for every rejection; once a tile hits `max_per_tile_attempts` it is forward-only transitioned to `voting_status = upload_giveup` (excluded from `pending_uploads`). Each transition emits FDR `kind="c11.upload.giveup"` plus an ERROR log. C6 contract changes (AZ-303 v1.3.0): - VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED). - TileMetadataStore.increment_upload_attempts(tile_id) -> int added with NotImplementedError default for backwards-compat. - Migration 0003_c11_upload_attempts: additive column + widened ck_tiles_voting_status (preserves IS NULL clause). C11 wiring: - C11RetryConfig + disable_retry_decorator on C11Config. - build_tile_uploader wraps in decorator by default; bypass flag returns the bare HttpTileUploader. New `clock` keyword. Cross-component isolation honoured (AZ-507): the decorator declares `_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore and references `UPLOAD_GIVEUP` via a local string constant — no c6 imports. Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum update + alembic head bump + AZ-272 schema fixture. 238 passed across c11/c6/fdr suites; pre-existing perf microbenches unrelated. Code review: PASS_WITH_WARNINGS (5 Low/Informational findings, docs-level or downstream-CI-blocked). See _docs/03_implementation/reviews/batch_41_review.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
610e8a743c |
[AZ-319] C11 HttpTileUploader (post-landing upload path)
Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's per-flight signing, and consumer-side cuts over c6 storage. Implements the full upload flow: gate ON_GROUND -> start_session -> enumerate pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded on ack -> end_session in finally. Honours Retry-After (RFC 7231 int + HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403. Adds C11Config block, three FDR kinds (tile.queued, tile.rejected, batch.complete), and the build_tile_uploader composition-root factory. Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270). Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272 schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
cde237e236 |
[AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key
Batch 38 (cycle 1) lands the two upload-side prerequisites the upcoming AZ-319 TileUploader needs to authenticate per-flight sessions against the parent suite's D-PROJ-2 ingest contract. AZ-317 FlightStateGate: - confirm_on_ground() defence-in-depth gate atop ADR-004 process isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF, LANDING, and source-failure (mapped to UNKNOWN with original exception preserved on __cause__). - ERROR log on refusal, INFO log on pass, single source call per invocation (no polling, no retry). AZ-318 PerFlightKeyManager: - Per-flight ephemeral Ed25519 keypair via the project-pinned cryptography library; sign(payload) -> 64-byte Ed25519 signature. - Best-effort zeroisation of a project-controlled bytearray mirror on end_session; OpenSSL-side buffer freed via dropped reference. - __del__ safety net with WARN log if end_session was missed. - start_session emits FDR kind=c11.upload.session.key.public so the safety officer can correlate flights with key fingerprints. - record_signature_rejection emits FDR + ERROR log on parent-suite ingest rejection (security-critical, never silently dropped). Shared C11 plumbing: - TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError, SessionNotActiveError, SignatureRejectedError envelope). - FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs. - FlightStateSource Protocol on c11_tile_manager.interface. - runtime_root.c11_factory factories for both new services. - Two new FDR kinds registered in fdr_client.records central KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in lockstep so the central test stays green. Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80 skipped (documented Docker / Tier-2 / CUDA gates). Code review: PASS_WITH_WARNINGS — 2 Low findings documented in _docs/03_implementation/reviews/batch_38_review.md (dev-host vs operator-workstation perf bound; spec text named StrEnum but project pins Python 3.10). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0ad3278b12 |
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep". Production - onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_ fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback= True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8). - config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig. - fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds. Tests - test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR. - test_protocol_conformance.py: drops onnx_trt_ep from the missing- module parametrize now that the module ships. - test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests. Docs - module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list. - batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted). Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d571ca25f9 |
[AZ-308] c6 CacheBudgetEnforcer: 10 GB hard cap + LRU sweep
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately when total_disk_bytes() + needed_bytes <= budget, otherwise iterates lru_candidates in eviction_batch_size batches, deletes via delete_tile, emits one INFO log per evicted tile (c6.evicted) and one FDR record per eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5). Raises CacheBudgetExhaustedError AFTER a full sweep if the budget cannot be met. BudgetEnforcedTileStore decorates a TileStore so the policy stays separable from PostgresFilesystemStore. Composition root in storage_factory.build_tile_store wires the wrapper unconditionally. PostgresFilesystemStore now accepts lru_clock: Clock | None = None; when set, read_tile_pixels calls record_lru_access(tile_id, now) so eviction picks the right LRU candidates. Production wiring injects WallClock(); AZ-305 unit tests still construct without the clock and keep their pass-through semantics. Contract tile_store.md bumped to v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family; shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
39ff47087f |
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale). Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path. FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts). Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression — load_config requires the env mapping). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d1c1cd9ab4 |
[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols in a single class. Filesystem-backed JPEG I/O (atomic sidecar write, read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting, upload bookkeeping). Wires composition via `from_config` classmethod. Key behaviors: - AC-3 strict reading: INSERT runs first inside an open transaction; duplicate-key collisions raise `TileMetadataError` BEFORE any byte is written, leaving the original file + sidecar byte-identical. Atomic sidecar write happens inside the same transaction; commit closes it. Comp-delete remains as a safety net for the rare commit-after-write failure path. - AC-2 content-hash gate runs before any I/O. - Construction performs an orphan-file reconciliation scan and emits an INFO `c6.store.construct` log with steady-state stats. Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0, forward-compatible) and a thin operator CLI at `c6_tile_cache.tools dump` for inspection. Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used on the F3 read-hot path. Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes (1215 passed, 8 skipped, 1 pre-existing unrelated failure `test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS, Jetson-only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
49a67f770d |
[AZ-302] C7 ThermalStatePublisher — jtop/NVML 1 Hz background poller
Implements AZ-297 InferenceRuntime's thermal_state() side: a singleton background-thread publisher that polls jtop (preferred) or pynvml (fallback) at config.thermal_poll_hz, stores an atomic ThermalState snapshot, and emits c7.thermal_transition FDR records on every throttle flip with a WARN log on entry and an INFO log on exit. Default-safe on TelemetryUnavailableError per Invariant I-6 with a 1-Hz rate-limited WARN. Sources return a raw ThermalReading; the publisher stamps measured_at_ns via its injected Clock so _JtopSource / _PynvmlSource stay clean of direct time.* calls (Invariant 2). _poll_once is the deterministic test seam — start() spawns the production thread. - c7.thermal_transition registered in fdr_client.records KNOWN_PAYLOAD_KEYS - [telemetry] optional dep group (jetson-stats, pynvml) added to pyproject - 14 unit tests (AC-1..AC-6, AC-8, NFR-default-safe, structural) green; AC-7 / AC-1 microbench / NFR-perf-poll Tier-2 deferred - full unit suite: 1140 passed, 11 expected Tier-2/CUDA skips Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1ebab29a4f |
[AZ-332] C1 OKVIS2 Strategy: facade + binding skeleton
Python facade (`Okvis2Strategy`) is production-quality and satisfies
AZ-331's `VioStrategy` protocol; full AC-1..10 coverage with
AC-9 + NFR-perf marked `tier2`. The C++ pybind11 binding compiles
and loads but throws `OkvisFatalException("estimator not yet wired")`
on first `add_frame` — the `okvis::ThreadedKFVio` wiring is a tier2
follow-up the Step-15 Product Completeness Gate is expected to track
as a remediation task.
Resolved contradictions:
* Constructor signature aligned with the AZ-331 factory: `(config, *,
fdr_client, clock=None)`. Calibration / preintegrator / logger
built internally from config. No churn on AZ-331.
* IMU substrate: OKVIS2 owns its internal estimator IMU integration;
the AZ-276 `ImuPreintegrator` is a separate substrate consumed by
E-C5's fusion graph. Single source of truth lives at the sample
stream, not the integrator instance.
* FDR API: `FdrClient.enqueue(record)` with new `vio.health` kind
added to AZ-272 `KNOWN_PAYLOAD_KEYS`.
CI matrix forces `-DBUILD_OKVIS2=OFF` until the tier2 wiring task
brings Ceres / SuiteSparse / OKVIS2 vendored submodules into the
Linux build.
Files: 17 added/modified across `c1_vio/`, `fdr_client/records.py`,
`cpp/okvis2/CMakeLists.txt`, CI workflow, AZ-332 task spec
(implementation-notes section), batch 23 report.
Tests: 17 new (15 tier1 + 2 tier2). Full Tier-1 suite: 1109 pass,
2 skipped (env), 2 deselected (tier2). No regressions.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
8a83166261 |
[AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
e4ecdaf619 |
[AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b5dd6031d2 |
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
ba20c2d195 |
[AZ-273] [AZ-274] [AZ-275] [AZ-267] [AZ-268] FDR producer chain + log bridge + contract test
AZ-273: lock-free SPSC ring buffer with pre-allocated slots, power-of- two capacity, opt-in SPSC guard, and EnqueueResult / FdrSpscViolationError on the public surface. make_fdr_client caches one client per producer_id and reads capacity from config.fdr.per_producer_capacity with fallback to queue_size. AZ-274: default_overrun_policy implements drop-oldest + retry + immediate marker emission, with prior-marker dropped_count folding via _evict_one so user-loss info is never lost across iterations. ERROR diagnostic is rate-limited to <=1/sec per producer. AZ-275: FakeFdrSink mirrors the FdrClient public surface and reuses the production default_overrun_policy via a duck-typed _PolicyAdapter. The test-only records/all_records_ever properties let component tests assert both in-buffer and lifetime state. tests/conftest.py registers the fake_fdr_sink fixture and an AST architecture lint forbids production imports of fakes. AZ-267: FdrLogBridgeHandler installs on the root logger via wire_log_bridge and forwards only WARN+ERROR records into the FDR with kind="log". Thread-local recursion guard short-circuits internal logging; saturated- queue diagnostics go to stderr every N=1000 drops. AZ-268: tests/contract/log_schema.py covers every row of the schema's Test Cases table plus the "DEBUG+INFO never reach FDR" invariant. pyproject.toml registers the contract pytest marker and the contract-mandated log_schema.py file-name. 251 unit + contract tests pass (48 new). Review verdict: PASS_WITH_WARNINGS; findings are NFR-perf deferrals + documented relaxation of AZ-274 AC-2 coalescing under permanently-stalled consumer. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3acc7f33dd |
[AZ-270] [AZ-272] [AZ-279] [AZ-281] [AZ-283] Compose root + FDR schema + 3 Layer-1 helpers
AZ-270: composition root with strategy registry, tier-gated lookup, topo-order construction, all-or-nothing teardown, StrategyNotLinkedError payload. AZ-272: orjson-backed FdrRecord serialise/parse with forward-compat for unknown payload + top-level fields and canonical overrun-record shape. AZ-279: pyproj-backed WGS84/ECEF/ENU + OSM slippy-map tile math with WgsConversionError for shape/range/zoom guards. AZ-281: strict EngineFilenameSchema build/parse/matches_host with anchored regex + enum validation; round-trip identity by construction. AZ-283: dtype-preserving (fp16/fp32) single + batch L2 normaliser with zero-norm safety and descriptor_metric() source-of-truth. pyproject.toml pins pyproj>=3.6 and orjson>=3.9 (named-backend deps per the AZ-272 / AZ-279 contracts). New DTOs LatLonAlt + BoundingBox and EngineCacheKey + HostCapabilities land in _types/ to back the helper contracts. 203 unit tests pass (64 new). Review verdict: PASS_WITH_WARNINGS; findings are perf-NFR deferrals + dep amendment + minor docstring polish. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b12db61444 |
[AZ-263] Bootstrap: repo skeleton + Docker + CI + Alembic + Tier-1 tests
Implements the AZ-263 / E-BOOT initial structure task:
- Python src/-layout package `gps_denied_onboard/` with per-component
interface stubs (14 components), type-only DTOs under `_types/`,
shared helpers under `helpers/` (R14 LightGlue ownership), structured
JSON logging, runtime composition root with env-var fail-fast gate,
healthcheck module shared by Docker and CI smoke.
- CMake top-level + `cmake/{build_options,dependencies,strategies}.cmake`
with the BUILD_* per-binary flags (ADR-002) and pinned external git
refs for OKVIS2 / VINS-Mono / GTSAM / FAISS / OpenCV >=4.12.0.
- Three Dockerfiles (companion-tier1, operator-tooling,
mock-suite-sat-service) + two compose files (dev + Tier-1 test).
- Four GitHub Actions workflows: ci.yml (lint/unit/integration/dual
binary build/SBOM diff/security), ci-tier2.yml (self-hosted Jetson
AC-bound NFTs), release.yml, cve-rescan.yml.
- Two CI gate scripts: `ci/sbom_diff.py` (deployment SBOM subset +
R02 exclusion), `ci/opencv_pin_gate.py` (>=4.12.0 enforcement,
D-CROSS-CVE-1).
- Alembic-driven Postgres 16 initial migration `0001_initial.py`
mirroring satellite-provider tiles + flights + sector_classifications
+ manifests + engine_cache_entries (data_model.md s 2).
- Tier-1 test scaffolding: 95 passing unit tests covering every AC,
per-component smoke tests, structured logging JSON output check,
env-var gate check, healthcheck import check. Two CI-gated tests
(cmake configure, actionlint) skip locally with explicit reasons.
- Batch report + code review report under `_docs/03_implementation/`.
Verdict: PASS_WITH_WARNINGS (two Low findings, both informational).
Co-authored-by: Cursor <cursoragent@cursor.com>
|