mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 19:01:14 +00:00
0dfe7c53012ec9f7d04174fb47bdfd7b479cd0f1
58 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0dfe7c5301 |
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema — a host change rebuilds at the new path and leaves the old files untouched (AC-4). Design corrections vs. the task spec body: - The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins. - The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled. - The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable` — a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine` — and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload). Production - engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method. - config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`. - runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block. - components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block. Tests - test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR. Docs - module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block. - batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted). Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0ad3278b12 |
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep". Production - onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_ fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback= True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8). - config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig. - fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds. Tests - test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR. - test_protocol_conformance.py: drops onnx_trt_ep from the missing- module parametrize now that the module ships. - test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests. Docs - module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list. - batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted). Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
18a69022b3 |
[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget
Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d571ca25f9 |
[AZ-308] c6 CacheBudgetEnforcer: 10 GB hard cap + LRU sweep
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately when total_disk_bytes() + needed_bytes <= budget, otherwise iterates lru_candidates in eviction_batch_size batches, deletes via delete_tile, emits one INFO log per evicted tile (c6.evicted) and one FDR record per eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5). Raises CacheBudgetExhaustedError AFTER a full sweep if the budget cannot be met. BudgetEnforcedTileStore decorates a TileStore so the policy stays separable from PostgresFilesystemStore. Composition root in storage_factory.build_tile_store wires the wrapper unconditionally. PostgresFilesystemStore now accepts lru_clock: Clock | None = None; when set, read_tile_pixels calls record_lru_access(tile_id, now) so eviction picks the right LRU candidates. Production wiring injects WallClock(); AZ-305 unit tests still construct without the clock and keep their pass-through semantics. Contract tile_store.md bumped to v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family; shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
39ff47087f |
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale). Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path. FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts). Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression — load_config requires the env mapping). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d1c1cd9ab4 |
[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols in a single class. Filesystem-backed JPEG I/O (atomic sidecar write, read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting, upload bookkeeping). Wires composition via `from_config` classmethod. Key behaviors: - AC-3 strict reading: INSERT runs first inside an open transaction; duplicate-key collisions raise `TileMetadataError` BEFORE any byte is written, leaving the original file + sidecar byte-identical. Atomic sidecar write happens inside the same transaction; commit closes it. Comp-delete remains as a safety net for the rare commit-after-write failure path. - AC-2 content-hash gate runs before any I/O. - Construction performs an orphan-file reconciliation scan and emits an INFO `c6.store.construct` log with steady-state stats. Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0, forward-compatible) and a thin operator CLI at `c6_tile_cache.tools dump` for inspection. Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used on the F3 read-hot path. Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes (1215 passed, 8 skipped, 1 pre-existing unrelated failure `test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS, Jetson-only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
dde838d2cc |
[AZ-304] C6 Postgres schema: additive 0002 migration + UUIDv5
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.
Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.
Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.
Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
49a67f770d |
[AZ-302] C7 ThermalStatePublisher — jtop/NVML 1 Hz background poller
Implements AZ-297 InferenceRuntime's thermal_state() side: a singleton background-thread publisher that polls jtop (preferred) or pynvml (fallback) at config.thermal_poll_hz, stores an atomic ThermalState snapshot, and emits c7.thermal_transition FDR records on every throttle flip with a WARN log on entry and an INFO log on exit. Default-safe on TelemetryUnavailableError per Invariant I-6 with a 1-Hz rate-limited WARN. Sources return a raw ThermalReading; the publisher stamps measured_at_ns via its injected Clock so _JtopSource / _PynvmlSource stay clean of direct time.* calls (Invariant 2). _poll_once is the deterministic test seam — start() spawns the production thread. - c7.thermal_transition registered in fdr_client.records KNOWN_PAYLOAD_KEYS - [telemetry] optional dep group (jetson-stats, pynvml) added to pyproject - 14 unit tests (AC-1..AC-6, AC-8, NFR-default-safe, structural) green; AC-7 / AC-1 microbench / NFR-perf-poll Tier-2 deferred - full unit suite: 1140 passed, 11 expected Tier-2/CUDA skips Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
59f56c032f |
[AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator
AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:
1. filename schema parse -> EngineSchemaMismatchError(reason=...)
2. schema tuple match -> EngineSchemaMismatchError(expected,got)
3. sidecar present -> EngineSidecarMissingError
4. sidecar trust -> EngineHashMismatchError(stage=sidecar)
5. manifest match -> EngineHashMismatchError(stage=manifest)
Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).
Production code (new):
- components/c7_inference/engine_gate.py -- EngineGate, HostTuple,
read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
tensorrt.__version__; raises RuntimeError on Tier-1)
- components/c7_inference/manifest.py -- DeploymentManifest,
ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
type level: __getitem__ raises EngineHashMismatchError on
missing key, NEVER KeyError, so the gate cannot silently pass
- components/c7_inference/__init__.py -- re-exports the new
public surface
Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
1. HostTuple lives in engine_gate.py (the only consumer);
re-exported from package __init__.py.
2. read_host_tuple takes precision as a keyword argument — three
of four fields come from the host, precision is engine-build
metadata supplied by the caller.
3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
run on every CI host.
Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.
NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.
AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.
Suite: 1134 passed / 11 skipped. No regressions outside the new
files.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
65ad2168ed |
[AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy
AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.
Production code (new):
- components/c7_inference/pytorch_fp16_runtime.py — runtime +
PytorchEngineHandle + output-shape adapter
- components/c7_inference/architecture_registry.py — torch-free
register_architecture / default_registry / ArchitectureFactory
(Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
code)
- components/c7_inference/__init__.py — re-exports the registry
mechanism. Still does NOT import the concrete strategy module
(Invariant I-5)
- components/c7_inference/config.py — adds per_frame_debug_log bool
field (gates the DEBUG per-frame latency log)
Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.
Tests (modified):
- test_protocol_conformance.py — narrowed
test_ac5_build_inference_runtime_flag_on_but_module_missing
parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
still covered until AZ-298 / AZ-299 ship.
CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
1. Constructor conforms to AZ-297 factory shape (config positional;
thermal_publisher + registry + clock keyword-only optionals).
AZ-302 will update the factory to thread thermal_publisher.
2. Architecture registry uses extras["model_name"] as lookup key
(avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
registry has no per-backbone input-shape metadata.
Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
1ebab29a4f |
[AZ-332] C1 OKVIS2 Strategy: facade + binding skeleton
Python facade (`Okvis2Strategy`) is production-quality and satisfies
AZ-331's `VioStrategy` protocol; full AC-1..10 coverage with
AC-9 + NFR-perf marked `tier2`. The C++ pybind11 binding compiles
and loads but throws `OkvisFatalException("estimator not yet wired")`
on first `add_frame` — the `okvis::ThreadedKFVio` wiring is a tier2
follow-up the Step-15 Product Completeness Gate is expected to track
as a remediation task.
Resolved contradictions:
* Constructor signature aligned with the AZ-331 factory: `(config, *,
fdr_client, clock=None)`. Calibration / preintegrator / logger
built internally from config. No churn on AZ-331.
* IMU substrate: OKVIS2 owns its internal estimator IMU integration;
the AZ-276 `ImuPreintegrator` is a separate substrate consumed by
E-C5's fusion graph. Single source of truth lives at the sample
stream, not the integrator instance.
* FDR API: `FdrClient.enqueue(record)` with new `vio.health` kind
added to AZ-272 `KNOWN_PAYLOAD_KEYS`.
CI matrix forces `-DBUILD_OKVIS2=OFF` until the tier2 wiring task
brings Ceres / SuiteSparse / OKVIS2 vendored submodules into the
Linux build.
Files: 17 added/modified across `c1_vio/`, `fdr_client/records.py`,
`cpp/okvis2/CMakeLists.txt`, CI workflow, AZ-332 task spec
(implementation-notes section), batch 23 report.
Tests: 17 new (15 tier1 + 2 tier2). Full Tier-1 suite: 1109 pass,
2 skipped (env), 2 deselected (tier2). No regressions.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
48ea1e2fc2 |
[AZ-343] C2.5 InlierCountReRanker + shared FeatureExtractor helper
Implements the production-default ReRankStrategy: K=10 → N=3 by single-pair LightGlue inlier count, with strict drop-and-continue (INV-8) on per-candidate TileFetch / backbone / zero-inlier failures and RerankAllCandidatesFailedError on zero survivors. Composition root injects the shared LightGlueRuntime + Clock + the new FeatureExtractor helper (an L1 placeholder OpenCvOrbExtractor that unblocks AZ-343 and future C3 strategies — task scope expansion). Architectural notes: - Cross-component imports stay banned; tile_store types as `object` and the C6 TileCacheError family is duck-typed by class module prefix (same workaround AZ-348 adopted for c7_inference; proper fix is to relocate TileCacheError to _types/ in a follow-up). - Clock injection follows the replay contract (AZ-398 Invariant 2); reranked_at is sourced from clock.monotonic_ns(). - AZ-342 factory grew `feature_extractor` + `clock` + `fdr_client` parameters; existing AZ-342 conformance tests updated. Tests: 19 new AC-1..AC-12 + mixed-failure scenarios in test_inlier_count_reranker.py; existing AZ-342 suite (26) still green. Full repo sweep 1093 passed / 2 skipped (cmake/actionlint not on PATH). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9a605c8514 |
[AZ-348] C3.5 ConditionalRefiner Protocol + factory + PassthroughRefiner
Defines the public `ConditionalRefiner` Protocol (PEP 544 @runtime_checkable, two methods: `refine_if_needed` + `was_invoked`), extends `MatchResult` in-place with two default-valued refinement fields (`refinement_label`, `refinement_added_latency_ms`), defines the `RefinerError` family (`RefinerBackboneError`, `RefinerConfigError`), and ships the trivial `PassthroughRefiner` reference impl. Both refiner strategies are linked unconditionally — no `BUILD_REFINER_*` flag (NOT ADR-002 territory). Runtime selection only per ADR-001. `PassthroughRefiner` returns the input `MatchResult` by reference (bit-identical correspondences per contract INV-5) and always reports `was_invoked() is False`. Documentation: renames `module-layout.md` `c3_5_adhop` Public API symbol from `AdHoPRefinementStrategy` to `ConditionalRefiner` (AC-14) so the doc agrees with `description.md` and the contract. AC-9 (single-thread binding) deferred to AZ-270 runtime-root composition, mirroring AZ-336 / AZ-342 / AZ-344 Risk-4 precedent. AC-7 for the `"adhop"` strategy stops at `ModuleNotFoundError` because the AdHoP backbone is owned by AZ-349. All other ACs + NFRs covered by 36 new conformance tests. Architectural note: `PassthroughRefiner.inference_runtime` is typed as `object` because the L3→L3 import ban (`test_az270_compose_root`) forbids c3_5_adhop from importing c7_inference; the runtime-root factory narrows the type at construction time. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
89c223882b |
[AZ-344] C3 CrossDomainMatcher Protocol + factory + RollingHealthWindow
Defines the public `CrossDomainMatcher` Protocol (PEP 544 @runtime_checkable, two methods: `match` + `health_snapshot`), the three frozen+slotted DTOs (`CandidateMatchSet`, `MatchResult`, `MatcherHealth`) in the L1 `_types/matcher.py` layer, the `MatcherError` family (`MatcherBackboneError`, `InsufficientInliersError`), and the composition-root `build_matcher_strategy` factory with lazy-import + `BUILD_MATCHER_<variant>` gating per ADR-002. `RollingHealthWindow` accumulator (60 s, amortised O(1) update, strict O(1) snapshot) is constructed by the factory and injected into every concrete matcher so all backbones share window semantics; this is what backs C5's spoof-promotion gate. Legacy placeholder `MatchResult` removed from `_types/matching.py`; import-only consumers (`c4_pose.interface`, `c3_5_adhop.interface`) repointed at the new `_types/matcher.py` home — zero behavioural change to those components. AC-9 (single-thread binding) and AC-10 (LightGlueRuntime identity-share with C2.5) deferred to AZ-270 runtime-root composition, mirroring the AZ-342 Risk-4 escape clause. All other ACs + NFRs covered by 70 new conformance tests. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d6756f1855 |
[AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for the InlierCountReRanker (AZ-343) and the future C3 CrossDomainMatcher consumer (AZ-344). No concrete re-ranker is implemented here. * ReRankStrategy Protocol (single rerank(frame, vpr_result, n, calibration) -> RerankResult method) with all 8 invariants in the docstring — notably INV-8 drop-and-continue (per-candidate failure NEVER propagates unless every candidate fails). * DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult; frozen+slots; tuple-not-list for RerankResult.candidates; tile_id encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any c6_tile_cache (L3) import per module-layout.md. * Error family: RerankError + RerankBackboneError + RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError escapes rerank(); RerankBackboneError is caught inside the per- candidate loop, logged ERROR, FDR-stamped, candidate dropped. * C2_5RerankConfig (strategy enum default "inlier_count", top_n int default 3) with strict validation at load; registered into Config.components on c2_5_rerank import. * build_rerank_strategy(config, *, tile_store, lightglue_runtime) factory: 1-strategy resolution table, lazy import, BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError mapping. The shared LightGlueRuntime is constructor-injected (R14 fix: neither C2.5 nor C3 owns its lifecycle). Renamed the Protocol from the existing stub "RerankStrategy" to "ReRankStrategy" to match the contract; updated module-layout.md. Removed the legacy RerankResult shape from _types/vpr.py — the v1.0.0 shape lives in _types/rerank.py. Excluded per task spec: * Concrete InlierCountReRanker (AZ-343). * C3 matcher protocol task (AZ-344, next in batch). * AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share between C2.5/C3 — deferred per task spec Risk 3 until the generic compose_root thread-binding registry and the C3 factory both land. Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy smoke test is removed. Full sweep: 997 passed (one pre-existing flake in test_az296_takeoff_abort, subprocess timing, unrelated to this commit; passes in isolation). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3665acef66 |
[AZ-336] C2 VprStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for every concrete C2 backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340) and the C2.5 ReRanker consumer side. No backbone is implemented here. * VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim) + BackbonePreprocessor C2-internal Protocol (NOT in Public API per description.md § 6). * DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all frozen + slots; tuple-not-list for VprResult.candidates so the immutability invariant truly holds. * Error family: VprError + VprBackboneError + VprPreprocessError + IndexUnavailableError; same-named but namespace-distinct from c6_tile_cache.IndexUnavailableError (the c2 family is the closed envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form). * C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path) with strict validation at load; registered into Config.components on c2_vpr import. * build_vpr_strategy factory with 7-strategy resolution table, lazy import, BUILD_VPR_<variant> gating, ImportError→ StrategyNotAvailableError mapping, and pre-flight descriptor_dim match against DescriptorIndex.descriptor_dim() — mismatch fires ConfigError at startup, NOT at first frame. Contract change vs the v1.0.0 draft: factory takes descriptor_index: DescriptorIndex (not tile_store: TileStore) because descriptor_dim() lives on DescriptorIndex per C6's Public API. The contract markdown is updated to match. Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple, keeping _types/ (L1) free of any c6_tile_cache (L3) import per module-layout.md. Consumers reconstruct TileId at the C6 boundary. Excluded per task spec: * Concrete backbones (AZ-337..AZ-340). * FAISS HNSW retrieve wiring (AZ-341). * DescriptorNormaliser helper (AZ-283, already shipped). * AC-9 single-thread binding — deferred per task spec Risk 4 until the generic compose_root thread-binding registry is in place (today each factory owns its own, e.g. fc_factory). Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py covering AC-1..AC-8, the error family, the config validation, the factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full sweep 973 passed, 2 skipped (CI-only cmake / actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
823c0f1b2e |
[AZ-398] Replay: FrameSource + Clock Protocols + Clock injection
Ship the two Layer-1 cross-cutting Protocols replay mode needs to leave production C1-C5 components mode-agnostic (Invariant 1) and replay- deterministic (Invariant 2). Live + replay binaries see the same interfaces; only the strategy differs. * Clock Protocol (monotonic_ns / time_ns / sleep_until_ns) + WallClock (live + REALTIME replay) + TlogDerivedClock (ASAP replay; advance-on-call; non-monotonic source → ClockOrderingError). * FrameSource Protocol (next_frame -> NavCameraFrame | None / close) + LiveCameraFrameSource (cv2.VideoCapture device index) + VideoFileFrameSource (cv2.VideoCapture file). * Build-flag gating: BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_LIVE_CAMERA_FRAME_SOURCE (constructor-time check; Tier-0 OFF refuses construction with FrameSourceConfigError). * Composition-root factories: build_clock + build_frame_source. * Injected Clock across every component that previously called time.monotonic_ns() / time.sleep() directly: c5_state (estimator, ESKF, fallback watcher, source-label SM, isam2 handle), c8_fc_adapter (inbound MAVLink + MSP2, AP outbound, iNav outbound, QGC GCS), c13_fdr writer, c12_operator_tooling httpx flights client. All constructors default to WallClock() so existing call sites keep live-binary behaviour without a wiring change. * AC-4 CI guard (tests/_meta/test_no_direct_time_in_components.py) AST-scans components/**/*.py for direct time.monotonic_ns / time.time_ns / time.sleep references and fails loudly with file:line. * Conformance + factory tests: tests/unit/clock + tests/unit/frame_source. * Test fixture updates: FallbackWatcher / SourceLabelStateMachine clock_ns is now required (removed time.monotonic_ns default); test_az388 patches estimator._clock instead of a module-level time; test_az393 ardupilot adapter uses a _FixedClock test double. Excluded per the task spec: TlogReplayFcAdapter (AZ-399), ReplaySink (AZ-400), compose_replay (AZ-401), CLI (AZ-402), Docker/CI (AZ-403), E2E fixture (AZ-404), IMU auto-sync (AZ-405). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
6c7d24f7e0 |
[AZ-331] C1 VioStrategy: Protocol + DTOs + factory + C5 migration
Freezes the c1_vio Public API per _docs/02_document/contracts/c1_vio/vio_strategy_protocol.md v1.0.0: - VioStrategy Protocol (4 methods: process_frame, reset_to_warm_start, health_snapshot, current_strategy_label) in components/c1_vio/interface.py. - DTOs (VioOutput, VioHealth, FeatureQuality, WarmStartPose) + VioState enum in _types/nav.py — L1 placement so C5 + C13 consume them without crossing the components.* boundary (AZ-270 AC-6). The new VioOutput shape (frame_id: str, relative_pose_T: gtsam.Pose3, pose_covariance_6x6, imu_bias, feature_quality, emitted_at_ns) replaces the AZ-263 scaffolding in _types/vio.py, which is now deleted. - VioError family (VioInitializingError / VioDegradedError / VioFatalError) in components/c1_vio/errors.py. Documented rationale: the degraded-operation path returns a VioOutput with inflated covariance + VioHealth.state=DEGRADED rather than raising VioDegradedError — the error type exists only for the rare degraded->fatal transition. - C1VioConfig per-component config block (strategy enum, lost_frame_threshold default 9, warm_start_max_frames default 5) with constructor-time validation rejecting unknown strategy labels. - StrategyNotAvailableError added to runtime_root/errors.py; composition-time error distinct from the VioError family. - Composition-root factory build_vio_strategy in runtime_root/vio_factory.py with three BUILD_* gates (BUILD_OKVIS2, BUILD_VINS_MONO, BUILD_KLT_RANSAC). Concrete strategy modules are imported lazily via __import__ AFTER the flag check — Tier-0 workstation builds with the flag OFF MUST NOT load the strategy module (Risk-2 / I-5; verifiable via sys.modules). - 36 conformance tests cover all 9 ACs + NFR-perf-factory (p99 build under 200 ms x 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; AC-9 asserts the frame_id annotation is 'str' (PEP-563 stringified). C5 migration (consumers of the new VioOutput shape): - gtsam_isam2_estimator.py + eskf_baseline.py: replaced vio.timestamp -> vio.emitted_at_ns (drops _datetime_to_ns on the VIO path), vio.pose_se3 -> vio.relative_pose_T (gtsam.Pose3 direct; drops _pose_se3_to_gtsam / _pose_se3_to_array), vio.covariance_6x6 -> vio.pose_covariance_6x6 (rename). - key_for_frame signature widened to UUID | int | str to accept the new str frame_id. - 4 C5 test files migrated to the new VioOutput shape with helper fixtures producing ImuBias + FeatureQuality + str frame_id. - c5_state/interface.py TYPE_CHECKING import path updated. Bootstrap healthcheck + test_types_importable updated to drop the deleted _types/vio module and pick up _types/inference (AZ-297) in the same sweep. Full unit-test sweep: 884 passed, 2 pre-existing environment skips (cmake, actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
daff5d4d1c |
[AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md v1.0.0: - InferenceRuntime Protocol (6 methods: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label) in components/c7_inference/interface.py. - DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig, EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py — placed at the L1 types layer so C10 can re-export EngineCacheEntry without crossing the components.* boundary (AZ-270 AC-6). - ThermalState DTO expanded in _types/thermal.py from the AZ-355 forward-declared stub to the AZ-297 contract shape (cpu/gpu temp, thermal_throttle_active, measured_clock_mhz, measured_at_ns, is_telemetry_available). Invariant I-6: when telemetry is unavailable, throttle is False. - Error family rooted at c7_inference.errors.RuntimeError (9 subtypes: EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError). RuntimeNotAvailableError stays in runtime_root/errors.py — composition-time, outside the family. - C7InferenceConfig per-component config block (runtime label, thermal_poll_hz, engine_cache_dir) with constructor-time validation rejecting unknown runtime labels. - Composition-root factory build_inference_runtime in runtime_root/inference_factory.py with three BUILD_* gates (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME, BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported lazily via __import__ AFTER the flag check, so a Tier-0 build with the flag OFF MUST NOT load the strategy module (AC-5 / I-5; verifiable via sys.modules). - 37 conformance tests cover all 8 ACs + NFR-perf-factory (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; also asserts all 9 error subtypes are documented. Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py (replaced by the AZ-297 canonical shape in _types/inference.py); updated the LightGlue-flavoured EngineHandle Protocol docstring in _types/manifests.py to rationalize its intentional dual existence with the C7 opaque EngineHandle (same name, different consumer-side cut, mirroring the C4/C5 ISam2GraphHandle pattern). Stale ThermalState.throttle docstring references in c4_pose/config.py, c4_pose/interface.py, and _types/pose.py updated to thermal_throttle_active. Full unit-test sweep: 843 passed, 2 pre-existing environment skips (cmake, actionlint). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f925af9de3 |
[AZ-303] C6 storage interfaces: Protocols + DTOs + factories
Freezes the c6_tile_cache Public API per
_docs/02_document/contracts/c6_tile_cache/{tile_store,tile_metadata_store,
descriptor_index}.md v1.0.0:
- Three runtime_checkable Protocols (TileStore 4-method, TileMetadataStore
9-method, DescriptorIndex 5-method) in components/c6_tile_cache/interface.py.
- Frozen DTOs + enums (TileId, TileMetadata, TileMetadataPersistent,
TileQualityMetadata, Bbox, SectorBoundary, HnswParams, IndexMetadata,
TileSource, FreshnessLabel, VotingStatus, SectorClassification) in
components/c6_tile_cache/_types.py. Constructor-time validation rejects
out-of-range zoom_level / lat / lon and inverted Bbox.
- TilePixelHandle ABC for read-only mmap access (Invariant I-4).
- TileCacheError family (6 subtypes) + IndexBuildError (deliberately
outside the family) in components/c6_tile_cache/errors.py.
- C6TileCacheConfig per-component config block, registered on package
import; validates known runtime labels at construction time.
- Composition-root factories build_tile_store / build_tile_metadata_store /
build_descriptor_index in runtime_root/storage_factory.py, with lazy
concrete-impl imports gated by BUILD_FAISS_INDEX (AC-5 / Risk 2:
no module-level FAISS import when the flag is OFF).
- RuntimeNotAvailableError defined in runtime_root/errors.py to be shared
with AZ-297 (composition-time error, distinct from per-component
runtime errors).
51 conformance tests cover all 10 ACs + NFR-perf-factory (p99 build_*
under 50 ms across 1000 calls) + NFR-reliability-error-family. AC-9
introspects each contract file's Shape table and asserts method
parity against the runtime Protocol.
Retired the AZ-263 scaffolding SectorClassification (dataclass) and
TileQualityMetadata from _types/tile.py since their canonical home is
now c6_tile_cache._types; Tile and TileRecord remain in _types/tile.py
until c3_matcher (AZ-344) and c11_tile_manager (AZ-316/319) retire
their interface stubs.
Full unit-test sweep: 791 passed, 2 pre-existing environment skips.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
8a83166261 |
[AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
72a06edab0 |
[AZ-489] C12 FlightsApiClient + offline JSON loader + bbox helper
ADR-010 primary cold-start path now has a real source for the cache bbox and the takeoff origin. Single concrete strategy (`HttpxFlightsApiClient`) behind a `@runtime_checkable` Protocol; offline JSON fallback (`load_flight_file`) shares the same DTO shape per FAC-INV-1. * `flights_api/interface.py` — `FlightsApiClient` Protocol + `FlightDto` + `WaypointDto` + `WaypointObjective` / `WaypointSource` enums (plain frozen-slotted dataclasses, matching project's LatLonAlt / PoseEstimate pattern). * `flights_api/errors.py` — 8-class hierarchy under `FlightsApiError`. * `flights_api/_parser.py` — shared JSON validator: range checks, lat/lon bounds, contiguous ordinals, finite floats, enum membership. * `flights_api/bbox.py` — `bbox_from_waypoints` envelopes lat/lon and inflates by a horizontal-distance buffer via WgsConverter ENU round-trip (NOT degree-space); `takeoff_origin_from_flight` passes waypoints[0] through unrounded. * `flights_api/file_loader.py` — orjson-backed offline loader. * `flights_api/httpx_client.py` — concrete client with ONE retry on transient 5xx + connect errors; token redaction at every log site; test-injectable transport + sleep. * `runtime_root/c12_factory.py` — `build_flights_api_client(config)`; re-exported from `runtime_root/__init__.py`. OperatorToolServices aggregate intentionally deferred to AZ-328 per scope discipline. * `pyproject.toml` — `httpx>=0.28,<1.0` added (chosen over `requests` for native `MockTransport` testing). Tests: 28 cases across AC-1..AC-18 plus extras (malformed JSON, negative buffer, zero buffer, missing top-level fields, negative ordinal, empty-flight takeoff). Full repo run: 713 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
db27e25630 |
[AZ-355] C4 PoseEstimator Protocol + factory + DTOs + composition
Land the foundational C4 surface AZ-358 (Marginals) and AZ-361 (Hybrid) build on top of: - PoseEstimator Protocol (@runtime_checkable): estimate(...) + current_covariance_mode(). - Error hierarchy: PoseEstimatorError, PnpFailureError, PoseEstimatorConfigError; CovarianceDegradedWarning as a Warning subclass (warnings.warn path, not raised). - ISam2GraphHandle Protocol stub (READ-ONLY view, get_pose_key only) decoupled from C5's concrete ISam2GraphHandleImpl. - C4PoseConfig (frozen dataclass) + register on c4_pose import. - runtime_root/pose_factory.build_pose_estimator with lazy-import fallback; INFO log c4.pose.strategy_loaded; shares ingest-thread binding with C5 per ADR-003. DTO restructuring (cross-cutting): retire the legacy raw-4x4 PoseEstimate(int frame_id, datetime timestamp, pose_se3, ...) and ship the contract shape PoseEstimate(UUID, LatLonAlt, Quat, np.ndarray, CovarianceMode, PoseSourceLabel, last_satellite_anchor_age_ms, emitted_at). C5 add_pose_anchor in both gtsam_isam2 + eskf_baseline migrated in lockstep via WGS84->ENU + Quat->R helpers; test fixtures updated. VIO output stays on the raw shape until AZ-331 (C1 protocol) lands. LatLonAlt upgraded to slots=True per AC-2. ThermalState stub added to _types/thermal.py so the Protocol typechecks pre-AZ-302. Tests: 25 new in tests/unit/c4_pose/test_az355_pose_protocol.py covering AC-1..AC-10 + factory wiring + config validation; full repo: 685 passed, 2 pre-existing CI-only skips. Jira transition deferred: MCP "Not connected"; leftover entry in _docs/_process_leftovers/2026-05-11_jira_transition_az355_deferred.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
c0bdb57957 |
[AZ-386] C5 ESKF baseline: 16-state error-state KF (NumPy)
Implements the mandatory simple-baseline StateEstimator per AC-2.1a engine-rule at C5 (IT-12 comparative study vs iSAM2). NumPy-only; no GTSAM dependency so BUILD_STATE_ESKF=ON binaries ship without GTSAM at all. - 16-state error vector (pos 3 + vel 3 + rot 3 + ba 3 + bg 3 + dt 1) over a textbook nominal-state / error-state ESKF split. - add_fc_imu: full nonlinear IMU integration + linearised F P F^T + Q covariance propagation per IMU sample. - add_vio: simplified relative-pose update (snapshot-based; baseline scope, documented). - add_pose_anchor: absolute-pose update; integrates BOTH marginals and jacobian modes (no skip — ESKF has no graph; AC-4). - AC-9 divergence test: Mahalanobis r^T S^-1 r > 100 (10 sigma) on the innovation covariance S = H P H^T + R. - AC-5 SPD: Cholesky-positive enforcement on every emitted covariance; non-SPD raises EstimatorFatalError and locks state to LOST. - AC-6 honesty: smoothed_history entries carry smoothed=False; deviation from C5 contract Invariant 7 documented in module + report. - AC-7 / AC-10 BUILD_STATE_ESKF gating: works through existing factory infra (state_factory._STATE_BUILD_FLAGS). - AC-8: SourceLabelStateMachine + FallbackWatcher auto-wired eagerly in __init__, same pattern as the iSAM2 estimator. Tests: 20 new unit tests covering AC-1..AC-10 + robustness checks. Full suite: 660 passed, 2 skipped (CI-only). The AZ-386 Jira transition to Done is deferred (Atlassian MCP returned 'Not connected'); recorded in _docs/_process_leftovers/ for replay on the next autodev invocation per the Leftovers Mechanism. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
098aabac0c |
[AZ-387] C5 smoothed-history → FDR side-channel
After every successful current_estimate(), emit one c5.state.smoothed_history FDR record per newly-smoothed past keyframe from IncrementalFixedLagSmoother. AC-4.5 (revised): the smoothed stream goes ONLY to FDR; the C8 outbound forward-time stream is unaffected. Idempotency via _smoothed_fdr_watermark_s (smoother-native float seconds); the same pose key is never emitted twice. Hook is best-effort — internal failures log warnings but do not raise, so a smoother divergence cannot contaminate the forward-time path. Cross-task invariants documented: - AC-3 ESKF no-op — AZ-386 installs an inert hook on the ESKF. - AC-4 No C8 leak — enforced at the C8 boundary by AZ-261. 8 new unit tests against AC-1/2/5/6 + robustness (no-FDR-client, marginals failure). Full suite: 640 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
7cbd17ee83 |
[AZ-385] C5 SourceLabelStateMachine + spoof-promotion gate
Implements Invariants 5 + 8 + AC-NEW-2 / AC-NEW-8: the EstimatorOutput.source_label now reflects a real state machine (DEAD_RECKONED → SATELLITE_ANCHORED ↔ VISUAL_PROPAGATED) governed by a spoof-promotion gate that latches closed on FC SPOOFED GPS health and re-opens only when BOTH conditions hold — ≥10 s STABLE_NON_SPOOFED AND next anchor within spoof_promotion_visual_consistency_tol_m. Every reject emits a c5.state.spoof_rejected FDR record plus a subscriber-fan-out STATUSTEXT (severity WARNING, 50-char cap per MAVLink). FDR and subscriber paths bypass the standard logger so silencing logs cannot suppress the spoof trail (R07 / AC-6). GtsamIsam2StateEstimator now eagerly builds the SM from C5StateConfig in __init__; new public methods notify_gps_health() (delegates to SM, called by composition root from C8 inbound) and subscribe_spoof_rejection() (composition root attaches C8's QgcTelemetryAdapter here). health_snapshot.spoof_promotion_blocked + current_estimate.source_label now flow from the live SM. 25 new unit tests across all 12 ACs plus cancellation, subscriber exception isolation, and estimator wire-up integration cases. One AZ-384 test renamed + updated to expect DEAD_RECKONED before any anchor (was VISUAL_PROPAGATED placeholder pre-AZ-385). Full suite: 632 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
31a300f8a2 |
[AZ-388] C5 AC-5.2 no-estimate fallback detector + signal emission
Implements Invariant 9 / AC-5.2: when current_estimate cannot return a fresh output for >= state.no_estimate_fallback_s (default 3.0 s), emit ONE engagement signal (FDR kind=c5.state.no_estimate_fallback_engaged + GCS STATUSTEXT severity CRITICAL); on recovery, ONE recovery signal (FDR kind=c5.state.no_estimate_fallback_recovered + STATUSTEXT NOTICE). Rate-limited via single _in_fallback latch (AC-2: 30 s sustained no-estimate still emits exactly one engagement). New FallbackWatcher class owns the state machine; estimator wires it through constructor + current_estimate entry/success hooks. Public check_fallback_state(now_ns) watchdog (NFR p99 <= 5 us) + subscribe APIs let C8 outbound react without coupling C5 to a concrete GCS adapter at construction. Severity enum extended with CRITICAL=2 and NOTICE=5 to match MAVLink MAV_SEVERITY. 18 new unit tests across all 8 ACs, deterministic synthetic clock, integration tests patch monotonic_ns through GtsamIsam2StateEstimator to drive AC-7 iSAM2 leg (ESKF leg deferred to AZ-386). Full suite: 607 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b3ad94c155 |
[AZ-384] C5 marginals + current_estimate/smoothed_history/health_snapshot
Replaces the last three NotImplementedError placeholders on GtsamIsam2StateEstimator with real Marginals + output methods: - current_estimate(): recovers the 6x6 Marginals covariance for the most-recently committed pose key, enforces the SPD invariant via np.linalg.cholesky (Invariant 10), converts the local-ENU pose translation to WGS84 via the shared WgsConverter, derives a body->world quaternion, and emits a fresh EstimatorOutput (smoothed=False, Invariant 4). On SPD failure transitions isam2_state -> LOST and raises EstimatorFatalError (AC-5.2 path). - smoothed_history(n): iterates the smoother's active POSE keys via _smoother.calculateEstimate().keys() (filtered by GTSAM symbol char) and the smoother timestamps via ts_map.at(key) - workaround for the pinned gtsam_unstable build's non-iterable FixedLagSmootherKeyTimestampMap. Bounded by K (Invariant 6); every entry has smoothed=True (Invariant 7). - health_snapshot(): cheap O(1) accumulator read; reports IsamState lifecycle, pose-key count, AC-NEW-8 cov_norm_growing_for_s rolling 60s deque-backed counter, and spoof_promotion_blocked via the AZ-385 state machine injection point. Adds two public injection points for AZ-385/composition root: set_enu_origin(LatLonAlt) and attach_source_label_state_machine(machine). Defaults: (0, 0, 0) ENU origin, VISUAL_PROPAGATED source label, spoof_promotion_blocked=False. Wires _record_committed_pose_key into the three add_* success paths so current_estimate only reads keys that have real values in iSAM2. The JACOBIAN path in add_pose_anchor deliberately skips this call - Invariant 3 keeps the JACOBIAN pose out of the iSAM2 graph. Tests: +27 in tests/unit/c5_state/test_az384_marginals_outputs.py covering all 10 ACs. Three obsolete AZ-382 tests (test_ac10_*_raises_named_az384) removed. Full suite: 589 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
fd848266d1 |
[AZ-383] C5 add_vio/add_pose_anchor/add_fc_imu factor adds
Replaces AZ-382 NotImplementedError placeholders with real GTSAM factor adds wired against the iSAM2 graph handle: - add_vio -> BetweenFactorPose3 between consecutive VIO pose keys (first call primes the chain; AZ-388 owns first-keyframe seeding). - add_pose_anchor -> mode-dispatch per pose.covariance_mode: "marginals" -> PriorFactorPose3 + handle.update(); "jacobian" -> skip iSAM2 add per AZ-361 contract. Both paths bump _last_anchor_ns via time.monotonic_ns(). - add_fc_imu -> shared ImuPreintegrator.integrate_window + reset_for_new_keyframe; builds a CombinedImuFactor between the prev/curr (X, V, B) keyframe triple. Introduces new 'v' (velocity) and 'b' (bias) GTSAM key namespaces decoupled from the VIO/pose frame_id mapping. Invariant 2 - non-decreasing timestamps - enforced per call with EstimatorDegradedError + c5.state.out_of_order log. Every successful add emits a structured DEBUG *_ok log; every failure emits a structured ERROR *_failed log and raises through the C5 error hierarchy (R05). Contract-vs-reality fix-ups also landed: - StateEstimator Protocol: add_fc_imu(ImuWindow) - was incorrectly annotated as ImuTelemetrySample by AZ-381. - _last_anchor_ns semantics switched to monotonic_ns() to match last_anchor_age_ms. - create() factory back-wires the ISam2GraphHandle to the estimator via the new attach_handle() method. Tests: +21 in tests/unit/c5_state/test_az383_factor_adds.py covering all 8 ACs with mock ISam2GraphHandle instances. Three obsolete AZ-382 tests (test_ac10_add_*_raises_named_az383) removed. Full suite: 565 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8b394a98c6 |
[AZ-382] C5 GtsamIsam2StateEstimator skeleton + real iSAM2 handle bodies
- Add GtsamIsam2StateEstimator owning the GTSAM substrate:
gtsam.ISAM2(ISAM2Params()) + gtsam_unstable.IncrementalFixedLagSmoother
(K * 1/3 s window per D-C5-3) + NonlinearFactorGraph + Values.
- Module-level create(...) factory + register() helper for
register_state_estimator("gtsam_isam2", create). Opt-in registration
per ADR-002 — no auto-import.
- Key-management policy: key_for_frame(UUID) -> int via
gtsam.symbol('x', counter); idempotent re-lookup.
- Replace all four NotImplementedError bodies in _isam2_handle.py with
real GTSAM calls:
* add_factor → estimator._graph.add(factor); R05 defensive logging
on success/failure; EstimatorDegradedError on failure.
* update → _isam2.update + _smoother.update; empty
FixedLagSmootherKeyTimestampMap substituted for timestamps=None;
EstimatorFatalError on either failure.
* compute_marginals → gtsam.Marginals(getFactorsUnsafe(),
calculateEstimate()).
* last_anchor_age_ms → (monotonic_ns - _last_anchor_ns) // 1e6.
- StateEstimator Protocol methods on the estimator still raise
NotImplementedError naming AZ-383 (factor adds) / AZ-384
(marginals + outputs).
- AZ-382 AC tests: 27 cases covering 10/10 ACs + factory integration.
- AZ-381 test_ac8_handle_methods_raise_named_task removed (obsolete:
bodies are real now); test_ac8_handle_is_isam2_graph_handle retained.
- Full suite: 547 passed (+26 vs B12), 2 skipped.
- Impl report: _docs/03_implementation/batch_13_cycle1_report.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
beed43724f |
[AZ-381] C5 StateEstimator protocol + factory + C8 DTO reshape
- Add StateEstimator Protocol (6 methods, @runtime_checkable) + DTOs (EstimatorOutput, EstimatorHealth, IsamState, PoseSourceLabel, Quat) in _types/state.py per state_estimator_protocol.md v1.0.0. - Add C5 error hierarchy (StateEstimatorError + 3 subclasses) and C5StateConfig (strategy, keyframe_window, spoof gates, no_estimate_fallback_s) with __post_init__ validation. - Add ISam2GraphHandle Protocol + ISam2GraphHandleImpl skeleton (all 4 methods raise NotImplementedError naming AZ-382 as owner). - Add build_state_estimator factory + bind_state_ingest_thread for single-writer enforcement; ADR-002 build-flag gating (BUILD_STATE_<variant>); INFO log on success. - Strict reshape of legacy EstimatorOutput / EstimatorHealth across all 6 C8 production files (_outbound_provenance, _covariance_projector, pymavlink_ardupilot_adapter, msp2_inav_adapter, mavlink_gcs_adapter, interface) + 6 C8 test files (UUID frame_id, LatLonAlt position_wgs84, Quat orientation, PoseSourceLabel enum source_label). Remove ad-hoc DTOs from _types/pose.py and from C4's public __init__ (EstimatorOutput is a C5 concept, not a C4 one). - 20 AZ-381 AC tests (10 ACs + 4 config range + NFR + conformance). - Full suite: 521 passed, 2 skipped (+20 vs Batch 11). - Contracts: state_estimator_protocol.md v1.0.0 -> active; composition_root_protocol.md v1.2.0 -> v1.3.0 (additive state block + factory + ingest-thread binding). - Impl report: _docs/03_implementation/batch_12_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8a9cf88a46 |
[AZ-396] [AZ-397] Batch 11: C8 source-set switch + QGC telemetry adapter
AZ-396: PymavlinkArdupilotAdapter.request_source_set_switch body sends MAV_CMD_SET_EKF_SOURCE_SET, awaits COMMAND_ACK with timeout, enforces Invariant 11 idempotence (1s rate-limit + skip-after-success). Adds runtime_root.SpoofRecoverySink to bridge C5 spoof-promotion-recovered signal to the C8 outbound thread via a bounded dispatch queue. FcConfig gains spoof_recovery_source_set + source_set_switch_timeout_ms. AZ-397: QgcTelemetryAdapter implements GcsAdapter strategy: MAVLink 2.0 to QGC, emit_summary downsamples 5Hz to configurable summary_rate_hz [0.5, 5.0] via integer modulo, emit_status_text mirrors to GCS link, subscribe_operator_commands translates COMMAND_LONG / PARAM_REQUEST_* / REQUEST_DATA_STREAM / MISSION_* / SET_MODE into OperatorCommand DTOs and audits each receipt to FDR. FcKind.GCS_QGC added for PortConfig. Tests: 25 new (12 AZ-396 + 13 AZ-397); full suite 501 passing, 2 skipped. Contracts unchanged (additive FcConfig fields, range relaxation on GcsConfig.summary_rate_hz, additive FcKind enum value). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1e0be08e8a |
[AZ-393] [AZ-394] [AZ-395] C8 outbound chain + AP MAVLink2 signing
AZ-393 ArduPilot outbound: PymavlinkArdupilotAdapter encodes EstimatorOutput to MAVLink2 GPS_INPUT via gps_input_send; emits NAMED_VALUE_FLOAT(name="src_lbl") every frame and STATUSTEXT on source_label transition (1 Hz per-severity cap). Smoothed-output guard (Invariant 6), single-writer thread (Invariant 8), SPD propagation. Shared helper _outbound_provenance.py owns the canonical source-label-to-float table + transition rate-limiter. AZ-394 iNav outbound: Msp2InavAdapter encodes EstimatorOutput to hand-rolled MSP2_SENSOR_GPS (0x1F03, 52-byte LE payload via _msp2_sensor_gps_encoder.py + YAMSPy send_RAW_msg). Secondary unsigned MAVLink channel for STATUSTEXT transitions. open() rejects non-None signing_key (RESTRICT-COMM-2 / Invariant 2); request_source_set_switch raises SourceSetSwitchNotSupportedError (Invariant 9 verified: never calls setup_signing on secondary). AZ-395 AP MAVLink2 signing: ephemeral per-flight 32-byte key from secrets.token_bytes; pymavlink setup_signing handshake at open(); in-place bytearray zeroisation on close(); mid-flight signing-failure detection (ERROR log + WARNING STATUSTEXT + no raise; threshold configurable). Key never logged / persisted / serialised (regex-scanned by AC-4/AC-5). BUILD_DEV_STATIC_KEY=ON enables repeatable static-key dev path; rejected at open() when the build flag is absent. Shared: EstimatorOutput.smoothed (default False) added for the Invariant 6 gate at the C8 boundary; FcConfig extended with dev_static_signing_key + signing_failure_threshold (additive defaults; cross-field validation in __post_init__). Tests: 33 new AC tests (11 + 11 + 11) covering all 30 ACs; full suite 476 passing / 2 skipped / 0 failing (was 443). Contract surfaces unchanged at fc_adapter_protocol v1.0.0 and composition_root v1.2.0. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a61d2d3f4b |
[AZ-391] C8 inbound: MAVLink + MSP2 decoders + rings + bus + warm-start
Adds the C8 inbound producer side: - TelemetryRing[T]: bounded drop-oldest ring; first-overflow INFO log + monotonic dropped_count. - SubscriptionBus + SubscriptionHandle: synchronous fan-out, lock- released-before-callback to avoid deadlock; subscriber crash caught + DEBUG-logged so one bad subscriber cannot kill the decode loop. - PymavlinkInboundDecoder: pymavlink-based AP decoder for RAW_IMU, SCALED_IMU2, ATTITUDE, GPS_RAW_INT, GPS2_RAW, HEARTBEAT, STATUSTEXT. Out-of-order drop (Invariant 7) per-kind WARN. STATUSTEXT spoofing sentinel promotes subsequent GPS to GpsStatus.SPOOFED within 5 s. AC-5.1 warm-start hint cached on first 3D+ fix; embedded into every FlightStateSignal. - Msp2InavInboundDecoder: YAMSPy-based iNav polling decoder for IMU / attitude / GPS / flight-state. signed=False always (RESTRICT-COMM-2); GpsStatus.SPOOFED is unreachable on iNav. Adds yamspy>=0.3.3 + pyserial>=3.5 to pyproject.toml. Tests: 443 pass / 2 skip / 0 fail (+33 in batch 9). Contract: no drift on fc_adapter_protocol.md v1.0.0; this batch implements the inbound producer side without changing signatures. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
362e93c626 |
[AZ-390] [AZ-392] C8 FC/GCS adapter foundation + covariance projector
Adds the C8 foundation: - FcAdapter / GcsAdapter / ReplaySink Protocols + contract DTOs in _types/fc.py (PortConfig, FcKind, FlightState, GpsStatus, Severity, TelemetryKind, FcTelemetryFrame, FlightStateSignal, GpsHealth, OperatorCommand, Subscription, Imu/Attitude samples). - Disjoint FcAdapterError / GcsAdapterError trees with SourceSetSwitchNotSupportedError <: SourceSetSwitchError per AC-9. - FcConfig + GcsConfig cross-cutting Config blocks with config-load validation (unknown strategy rejected at __post_init__). - runtime_root/fc_factory.py: build_fc_adapter / build_gcs_adapter with BUILD_FC_*/BUILD_GCS_* flag gating + INFO log on load + single-writer outbound-thread binding. - CovarianceProjector (helper, AZ-392): 6x6 -> 3x3 -> 2x2 -> sqrt(lambda_max) reduction; AP returns float m, iNav returns int mm with uint16 clamp + WARN + FDR record. Non-SPD / NaN / wrong-shape raise FcEmitError and emit an FDR ERROR record carrying frame_id. Contracts: - composition_root_protocol.md 1.1.0 -> 1.2.0 (added fc/gcs blocks + build_fc_adapter / build_gcs_adapter + outbound-thread binding). - fc_adapter_protocol.md unchanged (this batch implements v1.0.0). Tests: 410 pass / 2 skip / 0 fail (+53 new tests in batch 8). AZ-391 (inbound subscription) deferred to batch 9 — pulls YAMSPy as a new external dependency (iNav MSP2 decode). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e4ecdaf619 |
[AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b5dd6031d2 |
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
33486588de |
[AZ-271] [AZ-276] [AZ-278] [AZ-282] Finish cross-cutting helpers + relax opencv pin
E-CC-HELPERS closes with the three remaining Layer-1 helpers and E-CC-CONF closes with the env > YAML > defaults precedence test gate. All four tickets ship with frozen public surfaces, hermetic unit tests, and no upward (components.*) imports. * AZ-271 — tests/unit/shared/config/test_precedence.py (5 ACs + smoke test + helper that names the layer in failure messages). * AZ-282 — helpers/ransac_filter.py: static RansacFilter + RansacResult; cv2.setRNGSeed(0) for byte-equal determinism; median residual semantics pinned by contract. * AZ-276 — helpers/imu_preintegrator.py + make_imu_preintegrator; GTSAM PreintegratedCombinedMeasurements; strict-monotonic ts_ns guard runs before any state mutation. Adjacent hygiene: _types/nav.py ImuSample/ImuWindow now use ts_ns:int and the spec-mandated ImuBias dataclass. * AZ-278 — helpers/lightglue_runtime.py: structural R14 fix. LightGlueRuntime + non-blocking concurrent-access guard that raises rather than serialising. EngineHandle Protocol in _types/manifests.py + KeypointSet/CorrespondenceSet in _types/matching.py (Protocol surface adds approved by spec). Dependency conflict (Finding 1, user-approved): gtsam 4.2 (PyPI) is numpy-1.x-ABI only; opencv-python>=4.12 needs numpy>=2 at runtime. Resolution: opencv-python pin relaxed to >=4.11.0.86,<4.12. The D-CROSS-CVE-1 ratchet at ci/opencv_pin_gate.py is held at 4.11.0 with the original 4.12.0 floor restored once a numpy-2-compatible gtsam wheel ships. Full replay procedure in _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md. Tests: 294 passed, 2 skipped (cmake/actionlint env-skips, pre-existing). 43 new tests added for batch 5. Ruff check + format clean. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
ba20c2d195 |
[AZ-273] [AZ-274] [AZ-275] [AZ-267] [AZ-268] FDR producer chain + log bridge + contract test
AZ-273: lock-free SPSC ring buffer with pre-allocated slots, power-of- two capacity, opt-in SPSC guard, and EnqueueResult / FdrSpscViolationError on the public surface. make_fdr_client caches one client per producer_id and reads capacity from config.fdr.per_producer_capacity with fallback to queue_size. AZ-274: default_overrun_policy implements drop-oldest + retry + immediate marker emission, with prior-marker dropped_count folding via _evict_one so user-loss info is never lost across iterations. ERROR diagnostic is rate-limited to <=1/sec per producer. AZ-275: FakeFdrSink mirrors the FdrClient public surface and reuses the production default_overrun_policy via a duck-typed _PolicyAdapter. The test-only records/all_records_ever properties let component tests assert both in-buffer and lifetime state. tests/conftest.py registers the fake_fdr_sink fixture and an AST architecture lint forbids production imports of fakes. AZ-267: FdrLogBridgeHandler installs on the root logger via wire_log_bridge and forwards only WARN+ERROR records into the FDR with kind="log". Thread-local recursion guard short-circuits internal logging; saturated- queue diagnostics go to stderr every N=1000 drops. AZ-268: tests/contract/log_schema.py covers every row of the schema's Test Cases table plus the "DEBUG+INFO never reach FDR" invariant. pyproject.toml registers the contract pytest marker and the contract-mandated log_schema.py file-name. 251 unit + contract tests pass (48 new). Review verdict: PASS_WITH_WARNINGS; findings are NFR-perf deferrals + documented relaxation of AZ-274 AC-2 coalescing under permanently-stalled consumer. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3acc7f33dd |
[AZ-270] [AZ-272] [AZ-279] [AZ-281] [AZ-283] Compose root + FDR schema + 3 Layer-1 helpers
AZ-270: composition root with strategy registry, tier-gated lookup, topo-order construction, all-or-nothing teardown, StrategyNotLinkedError payload. AZ-272: orjson-backed FdrRecord serialise/parse with forward-compat for unknown payload + top-level fields and canonical overrun-record shape. AZ-279: pyproj-backed WGS84/ECEF/ENU + OSM slippy-map tile math with WgsConversionError for shape/range/zoom guards. AZ-281: strict EngineFilenameSchema build/parse/matches_host with anchored regex + enum validation; round-trip identity by construction. AZ-283: dtype-preserving (fp16/fp32) single + batch L2 normaliser with zero-norm safety and descriptor_metric() source-of-truth. pyproject.toml pins pyproj>=3.6 and orjson>=3.9 (named-backend deps per the AZ-272 / AZ-279 contracts). New DTOs LatLonAlt + BoundingBox and EngineCacheKey + HostCapabilities land in _types/ to back the helper contracts. 203 unit tests pass (64 new). Review verdict: PASS_WITH_WARNINGS; findings are perf-NFR deferrals + dep amendment + minor docstring polish. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8e71f6c002 |
[AZ-266] [AZ-269] [AZ-277] [AZ-280] Cross-cutting log/config + SE3/SHA256 helpers
AZ-266: schema-compliant JSON logging entrypoint, level normalisation, handler-topology guard, format-error fallback (log_record_schema v1.0.0). AZ-269: env > YAML > defaults config loader, frozen Config dataclass, missing-var fail-fast with pointer to .env.example, component-block registry. AZ-277: GTSAM-backed SE3Utils (matrix<->SE3 + exp/log/adjoint) with strict orthogonality, dtype, and bottom-row contract enforcement. AZ-280: atomicwrites-backed write_atomic + independent verify + order-deterministic aggregate_hash; sidecar format strictness. pyproject.toml pins gtsam>=4.2,<5.0 and atomicwrites>=1.4,<2.0 (named-backend deps per the AZ-277 / AZ-280 contracts). 139 unit tests pass (44 new). Review verdict: PASS_WITH_WARNINGS; findings are perf-NFR + journald deferrals, no blocking issues. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b12db61444 |
[AZ-263] Bootstrap: repo skeleton + Docker + CI + Alembic + Tier-1 tests
Implements the AZ-263 / E-BOOT initial structure task:
- Python src/-layout package `gps_denied_onboard/` with per-component
interface stubs (14 components), type-only DTOs under `_types/`,
shared helpers under `helpers/` (R14 LightGlue ownership), structured
JSON logging, runtime composition root with env-var fail-fast gate,
healthcheck module shared by Docker and CI smoke.
- CMake top-level + `cmake/{build_options,dependencies,strategies}.cmake`
with the BUILD_* per-binary flags (ADR-002) and pinned external git
refs for OKVIS2 / VINS-Mono / GTSAM / FAISS / OpenCV >=4.12.0.
- Three Dockerfiles (companion-tier1, operator-tooling,
mock-suite-sat-service) + two compose files (dev + Tier-1 test).
- Four GitHub Actions workflows: ci.yml (lint/unit/integration/dual
binary build/SBOM diff/security), ci-tier2.yml (self-hosted Jetson
AC-bound NFTs), release.yml, cve-rescan.yml.
- Two CI gate scripts: `ci/sbom_diff.py` (deployment SBOM subset +
R02 exclusion), `ci/opencv_pin_gate.py` (>=4.12.0 enforcement,
D-CROSS-CVE-1).
- Alembic-driven Postgres 16 initial migration `0001_initial.py`
mirroring satellite-provider tiles + flights + sector_classifications
+ manifests + engine_cache_entries (data_model.md s 2).
- Tier-1 test scaffolding: 95 passing unit tests covering every AC,
per-component smoke tests, structured logging JSON output check,
env-var gate check, healthcheck import check. Two CI-gated tests
(cmake configure, actionlint) skip locally with explicit reasons.
- Batch report + code review report under `_docs/03_implementation/`.
Verdict: PASS_WITH_WARNINGS (two Low findings, both informational).
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
8382cdae10 | start over again | ||
|
|
2425f8e6fd |
[AZ-243] Integrate production native VIO runtime
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
2ba44a33c5 |
[AZ-238] [AZ-239] Add resource restart tests
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
5acd14b792 |
[AZ-234] [AZ-235] [AZ-236] [AZ-237] Add replay tests
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
c30fd4f67d |
[AZ-233] Add blackbox replay infrastructure
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
70f786f2d1 |
[AZ-240] [AZ-241] [AZ-242] Add native retrieval remediation
Implement the product remediation paths required before greenfield code testability revision: native VIO backend selection, local VPR descriptor index retrieval, and computed anchor matching gates. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9fb9e4a349 |
[AZ-232] Add safety anchor state machine
Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
7819ae7a38 |
[AZ-231] Add anchor verification gates
Co-authored-by: Cursor <cursoragent@cursor.com> |