mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 16:31:14 +00:00
5fe67023b221f928f5f199d404675e03cfbf18da
72 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
5fe67023b2 |
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit: * AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader. * AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars). * AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded. * AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title. Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99. Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets. See `_docs/03_implementation/batch_44_cycle1_report.md`. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
2d88d3d674 |
[Batch 44 prep] Add batch 44 implementation plan
Captures the architectural plan agreed in the prior /autodev session: C12 package rename (c12_operator_tooling -> c12_operator_orchestrator), C11 internal flight-state gate removal (SRP fix; supersedes AZ-317), AZ-329 PostLandingUploadOrchestrator rewrite around flight_footer FDR record, and AZ-330 OperatorReLocService implementation. Execution starts in the next /autodev invocation; this commit makes the planning artifact durable so the batch executes against a fixed plan. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
7644b25e8c |
[AZ-328] C12 BuildCacheOrchestrator + remote C10 invoker (Batch 43)
Implements F1 pre-flight cache build orchestrator on the operator workstation. Composes C11 TileDownloader (AZ-316), C12 CompanionBringup (AZ-327), C12 FlightsApiClient (AZ-489), and the new RemoteCacheProvisionerInvoker into one sequenced flow guarded by a filelock-backed workstation-side lockfile. Architectural decisions: - Phase-0 flight-resolve runs BEFORE the lockfile (ADR-010): a flight that cannot be resolved is an operator-input error, not a contended- resource error. Enforced by AC-11 + AC-14. - Consumer-side cuts (AZ-507) for C11 + C10 types: local Protocols / mirror DTOs in tile_downloader_cut.py and _types.py; external errors matched by name-based whitelisting so unknown exceptions still propagate per AC-6. Cross-component type translation lives at the composition root (c12_factory). - Failure surfacing: recognised operational failures (download error, companion not ready, build error, flight-resolve error) return as CacheBuildReport(outcome=failure, failure_phase=...). Only lockfile contention raises (BuildLockHeldError) since no phase ever ran. - Workstation-side filelock library (project pin); no custom primitive. - Remote C10 stdout streamed line-by-line as DEBUG with api_key / auth_token redacted before logging (defence-in-depth). - CLI is now a thin adapter; all workflow logic lives in build_cache.py. operator-tool build-cache exit codes map per CacheBuildReport.failure_phase + failure_exception_type. Tests: 116 c12 unit tests pass (29 new for AZ-328 covering 15/15 ACs + NFR-perf-overhead microbench; 7 new for remote_c10_invoker; 3 new for file_lock; test_cli_build_cache rewritten for new orchestrator interface). Full repo suite: 1522 passed, 80 skipped. Also: replays Batch 42's ruff format leftover for c12 flights_api + test_az489 files (formatter ran over the c12 directory after new files were added). Pure whitespace; no behaviour change. Full report: _docs/03_implementation/batch_43_cycle1_report.md Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
099c75c6f8 |
chore: cumulative review batches 40-42 (PASS_WITH_WARNINGS)
5 findings: F1 (Medium / Maintainability) - _iso_now copies grew to 8 across c11 + c13 + c7, AZ-508 hygiene PBI no longer matches reality; F2-F5 (Low) - triplicated atomic-write JSON helpers, 4x duplicated SectorClassification enum (acknowledged by ADR-009), recurring "outcome=failure" prose vs typed-exception drift across the C11 trio, and an NFR-perf-cold-start near-miss that prompted PEP 562 lazy-import discipline in c12. None block the implement loop. Updated _autodev_state.md last_cumulative_review to batches_40-42. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
91ce1c2047 |
[AZ-326] [AZ-327] C12 operator-tool CLI + companion SSH bringup
AZ-326 (3pt): operator-tool Click CLI shell at src/gps_denied_onboard/components/c12_operator_tooling/cli.py with six subcommands (download, build-cache, upload-pending, reloc-confirm, verify-ready, set-sector); SectorClassificationStore (atomic-write JSON under ~/.azaion/onboard/sector-classifications.json); freshness-table lookup driving AC-NEW-6; EXIT_* constants; AZ-266 structured-JSON log wiring to a rotating ~/.azaion/onboard/c12-tooling.log handler; operator-tool console-script entry in pyproject.toml. AZ-327 (3pt): CompanionBringup orchestrator at src/gps_denied_onboard/components/c12_operator_tooling/companion_bringup.py that opens an SSH session against the companion (paramiko per project pin), checks the four pre-flight artifacts (Manifest, expected engines, sha256 sidecars, calibration), and returns a ReadinessReport per description.md S2; CompanionUnreachableError + ContentHashMismatchError with operator-friendly remediation hints; ParamikoSshSessionFactory + RemoteSidecarVerifier (sha256sum + cat over SSH, no bytes pulled to the workstation); paramiko>=3.4,<4.0 dep added. NFR-perf-cold-start fix: PEP 562 lazy __getattr__ in c12_operator_tooling/__init__.py and flights_api/__init__.py defers HttpxFlightsApiClient (httpx), ParamikoSshSession[Factory] (paramiko + cryptography), bbox_from_waypoints / takeoff_origin_from_flight (numpy + pyproj). cli.py imports from leaf flights_api modules. operator-tool --help cold start: ~870ms -> <200ms typical, <500ms p99. Includes 73 unit tests (incl. paramiko-version-drift smoke per AZ-327 Risk 1) + console-script integration test. All 1494 repo-wide unit tests pass; 80 skips are pre-existing environment gates. Batch report: _docs/03_implementation/batch_42_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a06b107fc3 |
[AZ-320] Add C11 IdempotentRetryTileUploader decorator
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets: - In-call (per-batch) — re-invokes inner on PARTIAL outcome up to `max_in_call_retries` times with capped exponential backoff (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an operator hint via `next_retry_at_s = now + backoff_cap_s`. - Per-tile (cross-call) — atomically increments c6's `tiles.upload_attempts` counter for every rejection; once a tile hits `max_per_tile_attempts` it is forward-only transitioned to `voting_status = upload_giveup` (excluded from `pending_uploads`). Each transition emits FDR `kind="c11.upload.giveup"` plus an ERROR log. C6 contract changes (AZ-303 v1.3.0): - VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED). - TileMetadataStore.increment_upload_attempts(tile_id) -> int added with NotImplementedError default for backwards-compat. - Migration 0003_c11_upload_attempts: additive column + widened ck_tiles_voting_status (preserves IS NULL clause). C11 wiring: - C11RetryConfig + disable_retry_decorator on C11Config. - build_tile_uploader wraps in decorator by default; bypass flag returns the bare HttpTileUploader. New `clock` keyword. Cross-component isolation honoured (AZ-507): the decorator declares `_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore and references `UPLOAD_GIVEUP` via a local string constant — no c6 imports. Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum update + alembic head bump + AZ-272 schema fixture. 238 passed across c11/c6/fdr suites; pre-existing perf microbenches unrelated. Code review: PASS_WITH_WARNINGS (5 Low/Informational findings, docs-level or downstream-CI-blocked). See _docs/03_implementation/reviews/batch_41_review.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
90f4ac78f4 |
[AZ-316] Implement C11 HttpTileDownloader (batch 40)
Lands the operator-side pre-flight download path: authenticated httpx GETs against satellite-provider, RESTRICT-SAT-4 (>= 0.5 m/px) enforcement at the C11 boundary, c6 writes via consumer-side cuts (_TileWriterLike, _BudgetEnforcerLike), per-(flight_id, request_hash) journal under cache_root/.c11/journal/ for idempotent re-runs (AC-8, AC-12), 429 Retry-After + 5xx exponential backoff handling, fail-fast on TLS / 401 / 403, and a redacted-bearer auth-header policy. Architecture: - AZ-507 cross-component rule held: tile_downloader.py imports zero c6 symbols; the composition-root _C6DownloadAdapter in runtime_root/c11_factory.py absorbs c6's TileMetadata / TileSource / FreshnessLabel / VotingStatus enum assembly. - Sleep-callable injection (not full Clock) per Batch 39 precedent; default routes through WallClock.sleep_until_ns to keep the AZ-398 invariant intact. - No FDR records on the download path; spec mandates structured logs only (8 log kinds wired: session.start/end, resolution_rejected, freshness_rejected_summary, freshness_downgraded, batch.retry, provider.failed, budget.exceeded, idempotent_no_op). Tests: 14 new downloader unit tests covering AC-1..AC-9, AC-11, AC-12 plus throughput NFR + 429 HTTP-date + 429 budget exhaustion; 2 new TileDownloader Protocol conformance tests (AC-10). Full unit suite: 1420 passed, 80 skipped (env-gated), 0 failed. Code review: PASS_WITH_WARNINGS (5 Low findings, all documentation or downstream-blocked). See _docs/03_implementation/reviews/ batch_40_review.md and batch_40_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3a61a4f5bf |
chore: cumulative review batches 37-39 (PASS_WITH_WARNINGS)
Captures the C11 operator-side trio landing (AZ-317/318/319) plus the C10 build-orchestrator close-out (AZ-325) and the AZ-515 canonical-hash extraction. Three Low findings, all documentation-level drift between spec text and as-built code; none block Batch 40. Resolves prior F1 (AZ-515 closed the verifier-into-builder private import). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
610e8a743c |
[AZ-319] C11 HttpTileUploader (post-landing upload path)
Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's per-flight signing, and consumer-side cuts over c6 storage. Implements the full upload flow: gate ON_GROUND -> start_session -> enumerate pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded on ack -> end_session in finally. Honours Retry-After (RFC 7231 int + HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403. Adds C11Config block, three FDR kinds (tile.queued, tile.rejected, batch.complete), and the build_tile_uploader composition-root factory. Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270). Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272 schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
cde237e236 |
[AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key
Batch 38 (cycle 1) lands the two upload-side prerequisites the upcoming AZ-319 TileUploader needs to authenticate per-flight sessions against the parent suite's D-PROJ-2 ingest contract. AZ-317 FlightStateGate: - confirm_on_ground() defence-in-depth gate atop ADR-004 process isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF, LANDING, and source-failure (mapped to UNKNOWN with original exception preserved on __cause__). - ERROR log on refusal, INFO log on pass, single source call per invocation (no polling, no retry). AZ-318 PerFlightKeyManager: - Per-flight ephemeral Ed25519 keypair via the project-pinned cryptography library; sign(payload) -> 64-byte Ed25519 signature. - Best-effort zeroisation of a project-controlled bytearray mirror on end_session; OpenSSL-side buffer freed via dropped reference. - __del__ safety net with WARN log if end_session was missed. - start_session emits FDR kind=c11.upload.session.key.public so the safety officer can correlate flights with key fingerprints. - record_signature_rejection emits FDR + ERROR log on parent-suite ingest rejection (security-critical, never silently dropped). Shared C11 plumbing: - TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError, SessionNotActiveError, SignatureRejectedError envelope). - FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs. - FlightStateSource Protocol on c11_tile_manager.interface. - runtime_root.c11_factory factories for both new services. - Two new FDR kinds registered in fdr_client.records central KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in lockstep so the central test stays green. Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80 skipped (documented Docker / Tier-2 / CUDA gates). Code review: PASS_WITH_WARNINGS — 2 Low findings documented in _docs/03_implementation/reviews/batch_38_review.md (dev-host vs operator-workstation perf bound; spec text named StrEnum but project pins Python 3.10). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f7b2e70085 |
[AZ-325] C10 CacheProvisioner orchestrator
Implements the public top-level F1 build orchestrator for E-C10 per contract v1.1.0. Composes EngineCompiler (AZ-321), DescriptorBatcher (AZ-322), and ManifestBuilder (AZ-323) into a single idempotent operation guarded by a fcntl-backed cache_root/.c10.lock and a post-build coverage walk. Adds: - CacheProvisionerImpl + FilelockFileLockFactory (provisioner.py) - BuildRequest/BuildReport/BuildOutcome/SectorClassification DTOs + FileLockFactory Protocol + replaced placeholder CacheProvisioner Protocol with v1.1.0 surface (interface.py) - C10ProvisionerConfig wired into C10ProvisioningConfig (config.py) - BuildLockHeldError + ManifestCoverageError (errors.py) - build_cache_provisioner composition root (c10_factory.py) - 18 tests covering AC-1..AC-16 + NFR-perf-coverage-walk - filelock>=3.13,<4.0 (single new third-party dep) Idempotence (CP-INV-1) reuses AZ-323's _compute_manifest_hash / _aggregate_tile_hash so the build-identity decision agrees byte-for- byte with the Manifest's recorded manifest_hash. Coverage rollback uses a .prev rename snapshot. Diagnostic compile_engines_for_corpus is lock-free per AC-10. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
684ec2601c |
chore: record cumulative review batches 34-36 + state
Cumulative code review for batches 34-36 (AZ-507, AZ-323, AZ-324, AZ-306, AZ-322) per implement skill Step 14.5 K=3 cadence. Verdict: PASS_WITH_WARNINGS — 0 Critical / 0 High / 0 Medium / 3 Low (all Maintainability). Previous review's Medium F1 (doc-vs-lint) is RESOLVED by AZ-507. Carryover-Low findings tracked: - F1: manifest_verifier imports private _aggregate_tile_hash from manifest_builder; promote to public or extract to a shared module (1-pt follow-up PBI). - F2: AZ-508 task spec stale — c6 already consolidated within-component, c7 has 2 active copies (+ a new thermal_publisher copy not in spec). - F3: consumer-side Protocol cut pattern still un-documented in architecture.md; pattern now 9+ instances and is the established cross-component contract surface. State updated: last_cumulative_review = batches_34-36; sub_step = parse-tasks; batch 37 (AZ-325 C10 CacheProvisioner solo, 3pt) is next. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
f01a5058ab |
[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)
Implements the C10 internal phase that walks every C6 tile, embeds through C2's backbone via the AZ-321-produced engine, and rebuilds the AZ-306 FAISS HNSW index in one atomic write. - DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry) - BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl - DescriptorBatchError for OOM / dim-mismatch / missing-output failures - Empty-corpus surfaces as outcome=failure with explicit hint to run C11 - Per-10% progress callback + DEBUG logs (no engine bytes leaked) - Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener, DescriptorIndexRebuilder) so c10 stays within AZ-270 lint - runtime_root.c10_factory adds build_descriptor_batcher + three C6->C10 adapters - 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental (Protocol conformance, query pass-through, handle release, config) Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3b7265757b |
[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory. The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end-to-end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp duplicate-load abort during pytest (no-op on CI Linux). 21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance fake updated with from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing OKVIS2 submodule failure unrelated. Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
9b6e0b81f5 |
chore: backfill batch_34_cycle1_report from commit e2bebef
The previous /autodev session committed batch-34 (AZ-507 + AZ-323 +
AZ-324) and recorded the completion in _autodev_state.md but never
wrote the batch report file. Backfill the report now so the
cumulative-review trigger and resumability scans see the true latest
batch on disk. Reconstructed from commit
|
||
|
|
defe80dc75 |
chore: record cumulative review batches 31-33 + state
Cumulative review covering AZ-298 / AZ-299 / AZ-321: PASS_WITH_WARNINGS. 0 Critical, 0 High, 1 Medium, 2 Low. Medium: `module-layout.md` declares c10 may import from c7 Public API but `test_az270_compose_root.test_ac6` forbids ANY cross-component import — doc-vs-lint mismatch surfaced by AZ-321; refactor pivoted to `CompileEngineCallable` local Protocol cut. Flagged for hygiene PBI; not blocking. Low: `_iso_ts_now` now duplicated five times across c7+c6; consumer-side Protocol cut pattern recurring (LightGlue `EngineHandle` + `CompileEngineCallable`). Both deferred to the next hygiene cycle. State advances to phase 3 (compute-next-batch) with last_cumulative_review=batches_31-33 so the next /autodev invocation enters at the right point. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0dfe7c5301 |
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema — a host change rebuilds at the new path and leaves the old files untouched (AC-4). Design corrections vs. the task spec body: - The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins. - The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled. - The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable` — a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine` — and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload). Production - engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method. - config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`. - runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block. - components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block. Tests - test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR. Docs - module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block. - batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted). Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
0ad3278b12 |
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep". Production - onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_ fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback= True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8). - config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig. - fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds. Tests - test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR. - test_protocol_conformance.py: drops onnx_trt_ep from the missing- module parametrize now that the module ships. - test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests. Docs - module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list. - batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted). Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
18a69022b3 |
[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget
Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
afe42f451c |
chore: record cumulative review batches 28-30 + state
PASS_WITH_WARNINGS verdict for batches 28-30 (AZ-305, AZ-307, AZ-308); five findings, all Medium/Low — module-layout drift + cross-batch DRY. No Critical/High, no auto-fix gate; per implement Step 14.5, PASS_WITH_WARNINGS continues to the next batch. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d571ca25f9 |
[AZ-308] c6 CacheBudgetEnforcer: 10 GB hard cap + LRU sweep
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately when total_disk_bytes() + needed_bytes <= budget, otherwise iterates lru_candidates in eviction_batch_size batches, deletes via delete_tile, emits one INFO log per evicted tile (c6.evicted) and one FDR record per eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5). Raises CacheBudgetExhaustedError AFTER a full sweep if the budget cannot be met. BudgetEnforcedTileStore decorates a TileStore so the policy stays separable from PostgresFilesystemStore. Composition root in storage_factory.build_tile_store wires the wrapper unconditionally. PostgresFilesystemStore now accepts lru_clock: Clock | None = None; when set, read_tile_pixels calls record_lru_access(tile_id, now) so eviction picks the right LRU candidates. Production wiring injects WallClock(); AZ-305 unit tests still construct without the clock and keep their pass-through semantics. Contract tile_store.md bumped to v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family; shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
39ff47087f |
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale). Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path. FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts). Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression — load_config requires the env mapping). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
d1c1cd9ab4 |
[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols in a single class. Filesystem-backed JPEG I/O (atomic sidecar write, read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting, upload bookkeeping). Wires composition via `from_config` classmethod. Key behaviors: - AC-3 strict reading: INSERT runs first inside an open transaction; duplicate-key collisions raise `TileMetadataError` BEFORE any byte is written, leaving the original file + sidecar byte-identical. Atomic sidecar write happens inside the same transaction; commit closes it. Comp-delete remains as a safety net for the rare commit-after-write failure path. - AC-2 content-hash gate runs before any I/O. - Construction performs an orphan-file reconciliation scan and emits an INFO `c6.store.construct` log with steady-state stats. Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0, forward-compatible) and a thin operator CLI at `c6_tile_cache.tools dump` for inspection. Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used on the F3 read-hot path. Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes (1215 passed, 8 skipped, 1 pre-existing unrelated failure `test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS, Jetson-only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1141d17769 |
[AZ-300] [AZ-301] [AZ-302] [AZ-304] docs: sync module-layout for c6+c7
Cumulative review of batches 23-27 (cycle 1) surfaced three Medium
documentation-drift findings on module-layout.md. All three fixed
inline per user direction:
F1: c7_inference Internal list expanded with architecture_registry,
config, engine_gate, errors, manifest, thermal_publisher (added
across AZ-300/301/302).
F2: c6_tile_cache `connection.py` re-attributed from AZ-304 (which
deferred it) to AZ-305 with a "planned, not landed yet" tag.
F3: c7_inference Public API description rewritten by category
(Protocol + DTOs + component services + config + error family)
with a pointer to __init__.py's __all__ for the canonical list.
Cumulative review report: _docs/03_implementation/cumulative_review_
batches_23-27_cycle1_report.md (PASS_WITH_WARNINGS).
Autodev state moved to status: paused_user_requested per user
choice; /autodev will resume at greenfield Step 7 (next batch
selection) on next invocation.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
dde838d2cc |
[AZ-304] C6 Postgres schema: additive 0002 migration + UUIDv5
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.
Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.
Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.
Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
49a67f770d |
[AZ-302] C7 ThermalStatePublisher — jtop/NVML 1 Hz background poller
Implements AZ-297 InferenceRuntime's thermal_state() side: a singleton background-thread publisher that polls jtop (preferred) or pynvml (fallback) at config.thermal_poll_hz, stores an atomic ThermalState snapshot, and emits c7.thermal_transition FDR records on every throttle flip with a WARN log on entry and an INFO log on exit. Default-safe on TelemetryUnavailableError per Invariant I-6 with a 1-Hz rate-limited WARN. Sources return a raw ThermalReading; the publisher stamps measured_at_ns via its injected Clock so _JtopSource / _PynvmlSource stay clean of direct time.* calls (Invariant 2). _poll_once is the deterministic test seam — start() spawns the production thread. - c7.thermal_transition registered in fdr_client.records KNOWN_PAYLOAD_KEYS - [telemetry] optional dep group (jetson-stats, pynvml) added to pyproject - 14 unit tests (AC-1..AC-6, AC-8, NFR-default-safe, structural) green; AC-7 / AC-1 microbench / NFR-perf-poll Tier-2 deferred - full unit suite: 1140 passed, 11 expected Tier-2/CUDA skips Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
59f56c032f |
[AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator
AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:
1. filename schema parse -> EngineSchemaMismatchError(reason=...)
2. schema tuple match -> EngineSchemaMismatchError(expected,got)
3. sidecar present -> EngineSidecarMissingError
4. sidecar trust -> EngineHashMismatchError(stage=sidecar)
5. manifest match -> EngineHashMismatchError(stage=manifest)
Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).
Production code (new):
- components/c7_inference/engine_gate.py -- EngineGate, HostTuple,
read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
tensorrt.__version__; raises RuntimeError on Tier-1)
- components/c7_inference/manifest.py -- DeploymentManifest,
ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
type level: __getitem__ raises EngineHashMismatchError on
missing key, NEVER KeyError, so the gate cannot silently pass
- components/c7_inference/__init__.py -- re-exports the new
public surface
Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
1. HostTuple lives in engine_gate.py (the only consumer);
re-exported from package __init__.py.
2. read_host_tuple takes precision as a keyword argument — three
of four fields come from the host, precision is engine-build
metadata supplied by the caller.
3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
run on every CI host.
Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.
NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.
AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.
Suite: 1134 passed / 11 skipped. No regressions outside the new
files.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
65ad2168ed |
[AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy
AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.
Production code (new):
- components/c7_inference/pytorch_fp16_runtime.py — runtime +
PytorchEngineHandle + output-shape adapter
- components/c7_inference/architecture_registry.py — torch-free
register_architecture / default_registry / ArchitectureFactory
(Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
code)
- components/c7_inference/__init__.py — re-exports the registry
mechanism. Still does NOT import the concrete strategy module
(Invariant I-5)
- components/c7_inference/config.py — adds per_frame_debug_log bool
field (gates the DEBUG per-frame latency log)
Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.
Tests (modified):
- test_protocol_conformance.py — narrowed
test_ac5_build_inference_runtime_flag_on_but_module_missing
parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
still covered until AZ-298 / AZ-299 ship.
CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.
Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
1. Constructor conforms to AZ-297 factory shape (config positional;
thermal_publisher + registry + clock keyword-only optionals).
AZ-302 will update the factory to thread thermal_publisher.
2. Architecture registry uses extras["model_name"] as lookup key
(avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
registry has no per-backbone input-shape metadata.
Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
1ebab29a4f |
[AZ-332] C1 OKVIS2 Strategy: facade + binding skeleton
Python facade (`Okvis2Strategy`) is production-quality and satisfies
AZ-331's `VioStrategy` protocol; full AC-1..10 coverage with
AC-9 + NFR-perf marked `tier2`. The C++ pybind11 binding compiles
and loads but throws `OkvisFatalException("estimator not yet wired")`
on first `add_frame` — the `okvis::ThreadedKFVio` wiring is a tier2
follow-up the Step-15 Product Completeness Gate is expected to track
as a remediation task.
Resolved contradictions:
* Constructor signature aligned with the AZ-331 factory: `(config, *,
fdr_client, clock=None)`. Calibration / preintegrator / logger
built internally from config. No churn on AZ-331.
* IMU substrate: OKVIS2 owns its internal estimator IMU integration;
the AZ-276 `ImuPreintegrator` is a separate substrate consumed by
E-C5's fusion graph. Single source of truth lives at the sample
stream, not the integrator instance.
* FDR API: `FdrClient.enqueue(record)` with new `vio.health` kind
added to AZ-272 `KNOWN_PAYLOAD_KEYS`.
CI matrix forces `-DBUILD_OKVIS2=OFF` until the tier2 wiring task
brings Ceres / SuiteSparse / OKVIS2 vendored submodules into the
Linux build.
Files: 17 added/modified across `c1_vio/`, `fdr_client/records.py`,
`cpp/okvis2/CMakeLists.txt`, CI workflow, AZ-332 task spec
(implementation-notes section), batch 23 report.
Tests: 17 new (15 tier1 + 2 tier2). Full Tier-1 suite: 1109 pass,
2 skipped (env), 2 deselected (tier2). No regressions.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
9c35776bcb |
chore: pre-batch-23 carry-over (state + AZ-332 plan)
Handoff artifacts from the prior /autodev session that stopped at Step 7 sub_step compute-next-batch: - _docs/_autodev_state.md: pointer updated to batch 23, AZ-332 only (AZ-345 deferred — dep AZ-346 not yet in done/). - _docs/03_implementation/AZ-332_implementation_plan.md: locked-in decisions (no ROS 2, no Python re-impl, three-env split: macOS dev / Ubuntu CI / Jetson tier2) + step-by-step playbook for next session. Pre-batch chore commit per implement skill prereq #4 (clean tree required before AZ-332 commit so the batch diff stays focused). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
48281db9e9 |
[AZ-381] Fix ISam2GraphHandleImpl missing get_pose_key + comments
F1 (High/Architecture) from cumulative review of batches 01-22: `ISam2GraphHandleImpl` did not satisfy C4's `ISam2GraphHandle` Protocol stub (AZ-355) because it lacked `get_pose_key`. `pose_factory`'s isinstance gate would have raised at composition. Two Protocols (C4 minimal consumer cut, C5 richer producer surface) are intentional per AZ-355 Risk 1 — the impl just needed to expose the canonical name. Delegates to estimator.key_for_frame. Added cross-component conformance test asserting the C5 impl satisfies both Protocols, so future drift trips a unit test. F2 (Medium/Maintainability): added justifying comments at four `except: pass` sites in runtime_root, c8_fc_adapter (ap + inav), and c13_fdr writer. No behavioral change. Updated cumulative review report verdict from FAIL to PASS and recorded a post-mortem on the initial misframing (treated the dual-Protocol design as duplication on first read). Autodev state: batch 22 done, cumulative-review PASS, ready for batch 23. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8a83166261 |
[AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
72a06edab0 |
[AZ-489] C12 FlightsApiClient + offline JSON loader + bbox helper
ADR-010 primary cold-start path now has a real source for the cache bbox and the takeoff origin. Single concrete strategy (`HttpxFlightsApiClient`) behind a `@runtime_checkable` Protocol; offline JSON fallback (`load_flight_file`) shares the same DTO shape per FAC-INV-1. * `flights_api/interface.py` — `FlightsApiClient` Protocol + `FlightDto` + `WaypointDto` + `WaypointObjective` / `WaypointSource` enums (plain frozen-slotted dataclasses, matching project's LatLonAlt / PoseEstimate pattern). * `flights_api/errors.py` — 8-class hierarchy under `FlightsApiError`. * `flights_api/_parser.py` — shared JSON validator: range checks, lat/lon bounds, contiguous ordinals, finite floats, enum membership. * `flights_api/bbox.py` — `bbox_from_waypoints` envelopes lat/lon and inflates by a horizontal-distance buffer via WgsConverter ENU round-trip (NOT degree-space); `takeoff_origin_from_flight` passes waypoints[0] through unrounded. * `flights_api/file_loader.py` — orjson-backed offline loader. * `flights_api/httpx_client.py` — concrete client with ONE retry on transient 5xx + connect errors; token redaction at every log site; test-injectable transport + sleep. * `runtime_root/c12_factory.py` — `build_flights_api_client(config)`; re-exported from `runtime_root/__init__.py`. OperatorToolServices aggregate intentionally deferred to AZ-328 per scope discipline. * `pyproject.toml` — `httpx>=0.28,<1.0` added (chosen over `requests` for native `MockTransport` testing). Tests: 28 cases across AC-1..AC-18 plus extras (malformed JSON, negative buffer, zero buffer, missing top-level fields, negative ordinal, empty-flight takeoff). Full repo run: 713 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
db27e25630 |
[AZ-355] C4 PoseEstimator Protocol + factory + DTOs + composition
Land the foundational C4 surface AZ-358 (Marginals) and AZ-361 (Hybrid) build on top of: - PoseEstimator Protocol (@runtime_checkable): estimate(...) + current_covariance_mode(). - Error hierarchy: PoseEstimatorError, PnpFailureError, PoseEstimatorConfigError; CovarianceDegradedWarning as a Warning subclass (warnings.warn path, not raised). - ISam2GraphHandle Protocol stub (READ-ONLY view, get_pose_key only) decoupled from C5's concrete ISam2GraphHandleImpl. - C4PoseConfig (frozen dataclass) + register on c4_pose import. - runtime_root/pose_factory.build_pose_estimator with lazy-import fallback; INFO log c4.pose.strategy_loaded; shares ingest-thread binding with C5 per ADR-003. DTO restructuring (cross-cutting): retire the legacy raw-4x4 PoseEstimate(int frame_id, datetime timestamp, pose_se3, ...) and ship the contract shape PoseEstimate(UUID, LatLonAlt, Quat, np.ndarray, CovarianceMode, PoseSourceLabel, last_satellite_anchor_age_ms, emitted_at). C5 add_pose_anchor in both gtsam_isam2 + eskf_baseline migrated in lockstep via WGS84->ENU + Quat->R helpers; test fixtures updated. VIO output stays on the raw shape until AZ-331 (C1 protocol) lands. LatLonAlt upgraded to slots=True per AC-2. ThermalState stub added to _types/thermal.py so the Protocol typechecks pre-AZ-302. Tests: 25 new in tests/unit/c4_pose/test_az355_pose_protocol.py covering AC-1..AC-10 + factory wiring + config validation; full repo: 685 passed, 2 pre-existing CI-only skips. Jira transition deferred: MCP "Not connected"; leftover entry in _docs/_process_leftovers/2026-05-11_jira_transition_az355_deferred.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
c0bdb57957 |
[AZ-386] C5 ESKF baseline: 16-state error-state KF (NumPy)
Implements the mandatory simple-baseline StateEstimator per AC-2.1a engine-rule at C5 (IT-12 comparative study vs iSAM2). NumPy-only; no GTSAM dependency so BUILD_STATE_ESKF=ON binaries ship without GTSAM at all. - 16-state error vector (pos 3 + vel 3 + rot 3 + ba 3 + bg 3 + dt 1) over a textbook nominal-state / error-state ESKF split. - add_fc_imu: full nonlinear IMU integration + linearised F P F^T + Q covariance propagation per IMU sample. - add_vio: simplified relative-pose update (snapshot-based; baseline scope, documented). - add_pose_anchor: absolute-pose update; integrates BOTH marginals and jacobian modes (no skip — ESKF has no graph; AC-4). - AC-9 divergence test: Mahalanobis r^T S^-1 r > 100 (10 sigma) on the innovation covariance S = H P H^T + R. - AC-5 SPD: Cholesky-positive enforcement on every emitted covariance; non-SPD raises EstimatorFatalError and locks state to LOST. - AC-6 honesty: smoothed_history entries carry smoothed=False; deviation from C5 contract Invariant 7 documented in module + report. - AC-7 / AC-10 BUILD_STATE_ESKF gating: works through existing factory infra (state_factory._STATE_BUILD_FLAGS). - AC-8: SourceLabelStateMachine + FallbackWatcher auto-wired eagerly in __init__, same pattern as the iSAM2 estimator. Tests: 20 new unit tests covering AC-1..AC-10 + robustness checks. Full suite: 660 passed, 2 skipped (CI-only). The AZ-386 Jira transition to Done is deferred (Atlassian MCP returned 'Not connected'); recorded in _docs/_process_leftovers/ for replay on the next autodev invocation per the Leftovers Mechanism. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
098aabac0c |
[AZ-387] C5 smoothed-history → FDR side-channel
After every successful current_estimate(), emit one c5.state.smoothed_history FDR record per newly-smoothed past keyframe from IncrementalFixedLagSmoother. AC-4.5 (revised): the smoothed stream goes ONLY to FDR; the C8 outbound forward-time stream is unaffected. Idempotency via _smoothed_fdr_watermark_s (smoother-native float seconds); the same pose key is never emitted twice. Hook is best-effort — internal failures log warnings but do not raise, so a smoother divergence cannot contaminate the forward-time path. Cross-task invariants documented: - AC-3 ESKF no-op — AZ-386 installs an inert hook on the ESKF. - AC-4 No C8 leak — enforced at the C8 boundary by AZ-261. 8 new unit tests against AC-1/2/5/6 + robustness (no-FDR-client, marginals failure). Full suite: 640 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
7cbd17ee83 |
[AZ-385] C5 SourceLabelStateMachine + spoof-promotion gate
Implements Invariants 5 + 8 + AC-NEW-2 / AC-NEW-8: the EstimatorOutput.source_label now reflects a real state machine (DEAD_RECKONED → SATELLITE_ANCHORED ↔ VISUAL_PROPAGATED) governed by a spoof-promotion gate that latches closed on FC SPOOFED GPS health and re-opens only when BOTH conditions hold — ≥10 s STABLE_NON_SPOOFED AND next anchor within spoof_promotion_visual_consistency_tol_m. Every reject emits a c5.state.spoof_rejected FDR record plus a subscriber-fan-out STATUSTEXT (severity WARNING, 50-char cap per MAVLink). FDR and subscriber paths bypass the standard logger so silencing logs cannot suppress the spoof trail (R07 / AC-6). GtsamIsam2StateEstimator now eagerly builds the SM from C5StateConfig in __init__; new public methods notify_gps_health() (delegates to SM, called by composition root from C8 inbound) and subscribe_spoof_rejection() (composition root attaches C8's QgcTelemetryAdapter here). health_snapshot.spoof_promotion_blocked + current_estimate.source_label now flow from the live SM. 25 new unit tests across all 12 ACs plus cancellation, subscriber exception isolation, and estimator wire-up integration cases. One AZ-384 test renamed + updated to expect DEAD_RECKONED before any anchor (was VISUAL_PROPAGATED placeholder pre-AZ-385). Full suite: 632 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
31a300f8a2 |
[AZ-388] C5 AC-5.2 no-estimate fallback detector + signal emission
Implements Invariant 9 / AC-5.2: when current_estimate cannot return a fresh output for >= state.no_estimate_fallback_s (default 3.0 s), emit ONE engagement signal (FDR kind=c5.state.no_estimate_fallback_engaged + GCS STATUSTEXT severity CRITICAL); on recovery, ONE recovery signal (FDR kind=c5.state.no_estimate_fallback_recovered + STATUSTEXT NOTICE). Rate-limited via single _in_fallback latch (AC-2: 30 s sustained no-estimate still emits exactly one engagement). New FallbackWatcher class owns the state machine; estimator wires it through constructor + current_estimate entry/success hooks. Public check_fallback_state(now_ns) watchdog (NFR p99 <= 5 us) + subscribe APIs let C8 outbound react without coupling C5 to a concrete GCS adapter at construction. Severity enum extended with CRITICAL=2 and NOTICE=5 to match MAVLink MAV_SEVERITY. 18 new unit tests across all 8 ACs, deterministic synthetic clock, integration tests patch monotonic_ns through GtsamIsam2StateEstimator to drive AC-7 iSAM2 leg (ESKF leg deferred to AZ-386). Full suite: 607 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b3ad94c155 |
[AZ-384] C5 marginals + current_estimate/smoothed_history/health_snapshot
Replaces the last three NotImplementedError placeholders on GtsamIsam2StateEstimator with real Marginals + output methods: - current_estimate(): recovers the 6x6 Marginals covariance for the most-recently committed pose key, enforces the SPD invariant via np.linalg.cholesky (Invariant 10), converts the local-ENU pose translation to WGS84 via the shared WgsConverter, derives a body->world quaternion, and emits a fresh EstimatorOutput (smoothed=False, Invariant 4). On SPD failure transitions isam2_state -> LOST and raises EstimatorFatalError (AC-5.2 path). - smoothed_history(n): iterates the smoother's active POSE keys via _smoother.calculateEstimate().keys() (filtered by GTSAM symbol char) and the smoother timestamps via ts_map.at(key) - workaround for the pinned gtsam_unstable build's non-iterable FixedLagSmootherKeyTimestampMap. Bounded by K (Invariant 6); every entry has smoothed=True (Invariant 7). - health_snapshot(): cheap O(1) accumulator read; reports IsamState lifecycle, pose-key count, AC-NEW-8 cov_norm_growing_for_s rolling 60s deque-backed counter, and spoof_promotion_blocked via the AZ-385 state machine injection point. Adds two public injection points for AZ-385/composition root: set_enu_origin(LatLonAlt) and attach_source_label_state_machine(machine). Defaults: (0, 0, 0) ENU origin, VISUAL_PROPAGATED source label, spoof_promotion_blocked=False. Wires _record_committed_pose_key into the three add_* success paths so current_estimate only reads keys that have real values in iSAM2. The JACOBIAN path in add_pose_anchor deliberately skips this call - Invariant 3 keeps the JACOBIAN pose out of the iSAM2 graph. Tests: +27 in tests/unit/c5_state/test_az384_marginals_outputs.py covering all 10 ACs. Three obsolete AZ-382 tests (test_ac10_*_raises_named_az384) removed. Full suite: 589 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
fd848266d1 |
[AZ-383] C5 add_vio/add_pose_anchor/add_fc_imu factor adds
Replaces AZ-382 NotImplementedError placeholders with real GTSAM factor adds wired against the iSAM2 graph handle: - add_vio -> BetweenFactorPose3 between consecutive VIO pose keys (first call primes the chain; AZ-388 owns first-keyframe seeding). - add_pose_anchor -> mode-dispatch per pose.covariance_mode: "marginals" -> PriorFactorPose3 + handle.update(); "jacobian" -> skip iSAM2 add per AZ-361 contract. Both paths bump _last_anchor_ns via time.monotonic_ns(). - add_fc_imu -> shared ImuPreintegrator.integrate_window + reset_for_new_keyframe; builds a CombinedImuFactor between the prev/curr (X, V, B) keyframe triple. Introduces new 'v' (velocity) and 'b' (bias) GTSAM key namespaces decoupled from the VIO/pose frame_id mapping. Invariant 2 - non-decreasing timestamps - enforced per call with EstimatorDegradedError + c5.state.out_of_order log. Every successful add emits a structured DEBUG *_ok log; every failure emits a structured ERROR *_failed log and raises through the C5 error hierarchy (R05). Contract-vs-reality fix-ups also landed: - StateEstimator Protocol: add_fc_imu(ImuWindow) - was incorrectly annotated as ImuTelemetrySample by AZ-381. - _last_anchor_ns semantics switched to monotonic_ns() to match last_anchor_age_ms. - create() factory back-wires the ISam2GraphHandle to the estimator via the new attach_handle() method. Tests: +21 in tests/unit/c5_state/test_az383_factor_adds.py covering all 8 ACs with mock ISam2GraphHandle instances. Three obsolete AZ-382 tests (test_ac10_add_*_raises_named_az383) removed. Full suite: 565 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8b394a98c6 |
[AZ-382] C5 GtsamIsam2StateEstimator skeleton + real iSAM2 handle bodies
- Add GtsamIsam2StateEstimator owning the GTSAM substrate:
gtsam.ISAM2(ISAM2Params()) + gtsam_unstable.IncrementalFixedLagSmoother
(K * 1/3 s window per D-C5-3) + NonlinearFactorGraph + Values.
- Module-level create(...) factory + register() helper for
register_state_estimator("gtsam_isam2", create). Opt-in registration
per ADR-002 — no auto-import.
- Key-management policy: key_for_frame(UUID) -> int via
gtsam.symbol('x', counter); idempotent re-lookup.
- Replace all four NotImplementedError bodies in _isam2_handle.py with
real GTSAM calls:
* add_factor → estimator._graph.add(factor); R05 defensive logging
on success/failure; EstimatorDegradedError on failure.
* update → _isam2.update + _smoother.update; empty
FixedLagSmootherKeyTimestampMap substituted for timestamps=None;
EstimatorFatalError on either failure.
* compute_marginals → gtsam.Marginals(getFactorsUnsafe(),
calculateEstimate()).
* last_anchor_age_ms → (monotonic_ns - _last_anchor_ns) // 1e6.
- StateEstimator Protocol methods on the estimator still raise
NotImplementedError naming AZ-383 (factor adds) / AZ-384
(marginals + outputs).
- AZ-382 AC tests: 27 cases covering 10/10 ACs + factory integration.
- AZ-381 test_ac8_handle_methods_raise_named_task removed (obsolete:
bodies are real now); test_ac8_handle_is_isam2_graph_handle retained.
- Full suite: 547 passed (+26 vs B12), 2 skipped.
- Impl report: _docs/03_implementation/batch_13_cycle1_report.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
||
|
|
beed43724f |
[AZ-381] C5 StateEstimator protocol + factory + C8 DTO reshape
- Add StateEstimator Protocol (6 methods, @runtime_checkable) + DTOs (EstimatorOutput, EstimatorHealth, IsamState, PoseSourceLabel, Quat) in _types/state.py per state_estimator_protocol.md v1.0.0. - Add C5 error hierarchy (StateEstimatorError + 3 subclasses) and C5StateConfig (strategy, keyframe_window, spoof gates, no_estimate_fallback_s) with __post_init__ validation. - Add ISam2GraphHandle Protocol + ISam2GraphHandleImpl skeleton (all 4 methods raise NotImplementedError naming AZ-382 as owner). - Add build_state_estimator factory + bind_state_ingest_thread for single-writer enforcement; ADR-002 build-flag gating (BUILD_STATE_<variant>); INFO log on success. - Strict reshape of legacy EstimatorOutput / EstimatorHealth across all 6 C8 production files (_outbound_provenance, _covariance_projector, pymavlink_ardupilot_adapter, msp2_inav_adapter, mavlink_gcs_adapter, interface) + 6 C8 test files (UUID frame_id, LatLonAlt position_wgs84, Quat orientation, PoseSourceLabel enum source_label). Remove ad-hoc DTOs from _types/pose.py and from C4's public __init__ (EstimatorOutput is a C5 concept, not a C4 one). - 20 AZ-381 AC tests (10 ACs + 4 config range + NFR + conformance). - Full suite: 521 passed, 2 skipped (+20 vs Batch 11). - Contracts: state_estimator_protocol.md v1.0.0 -> active; composition_root_protocol.md v1.2.0 -> v1.3.0 (additive state block + factory + ingest-thread binding). - Impl report: _docs/03_implementation/batch_12_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
8a9cf88a46 |
[AZ-396] [AZ-397] Batch 11: C8 source-set switch + QGC telemetry adapter
AZ-396: PymavlinkArdupilotAdapter.request_source_set_switch body sends MAV_CMD_SET_EKF_SOURCE_SET, awaits COMMAND_ACK with timeout, enforces Invariant 11 idempotence (1s rate-limit + skip-after-success). Adds runtime_root.SpoofRecoverySink to bridge C5 spoof-promotion-recovered signal to the C8 outbound thread via a bounded dispatch queue. FcConfig gains spoof_recovery_source_set + source_set_switch_timeout_ms. AZ-397: QgcTelemetryAdapter implements GcsAdapter strategy: MAVLink 2.0 to QGC, emit_summary downsamples 5Hz to configurable summary_rate_hz [0.5, 5.0] via integer modulo, emit_status_text mirrors to GCS link, subscribe_operator_commands translates COMMAND_LONG / PARAM_REQUEST_* / REQUEST_DATA_STREAM / MISSION_* / SET_MODE into OperatorCommand DTOs and audits each receipt to FDR. FcKind.GCS_QGC added for PortConfig. Tests: 25 new (12 AZ-396 + 13 AZ-397); full suite 501 passing, 2 skipped. Contracts unchanged (additive FcConfig fields, range relaxation on GcsConfig.summary_rate_hz, additive FcKind enum value). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
1e0be08e8a |
[AZ-393] [AZ-394] [AZ-395] C8 outbound chain + AP MAVLink2 signing
AZ-393 ArduPilot outbound: PymavlinkArdupilotAdapter encodes EstimatorOutput to MAVLink2 GPS_INPUT via gps_input_send; emits NAMED_VALUE_FLOAT(name="src_lbl") every frame and STATUSTEXT on source_label transition (1 Hz per-severity cap). Smoothed-output guard (Invariant 6), single-writer thread (Invariant 8), SPD propagation. Shared helper _outbound_provenance.py owns the canonical source-label-to-float table + transition rate-limiter. AZ-394 iNav outbound: Msp2InavAdapter encodes EstimatorOutput to hand-rolled MSP2_SENSOR_GPS (0x1F03, 52-byte LE payload via _msp2_sensor_gps_encoder.py + YAMSPy send_RAW_msg). Secondary unsigned MAVLink channel for STATUSTEXT transitions. open() rejects non-None signing_key (RESTRICT-COMM-2 / Invariant 2); request_source_set_switch raises SourceSetSwitchNotSupportedError (Invariant 9 verified: never calls setup_signing on secondary). AZ-395 AP MAVLink2 signing: ephemeral per-flight 32-byte key from secrets.token_bytes; pymavlink setup_signing handshake at open(); in-place bytearray zeroisation on close(); mid-flight signing-failure detection (ERROR log + WARNING STATUSTEXT + no raise; threshold configurable). Key never logged / persisted / serialised (regex-scanned by AC-4/AC-5). BUILD_DEV_STATIC_KEY=ON enables repeatable static-key dev path; rejected at open() when the build flag is absent. Shared: EstimatorOutput.smoothed (default False) added for the Invariant 6 gate at the C8 boundary; FcConfig extended with dev_static_signing_key + signing_failure_threshold (additive defaults; cross-field validation in __post_init__). Tests: 33 new AC tests (11 + 11 + 11) covering all 30 ACs; full suite 476 passing / 2 skipped / 0 failing (was 443). Contract surfaces unchanged at fc_adapter_protocol v1.0.0 and composition_root v1.2.0. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
a61d2d3f4b |
[AZ-391] C8 inbound: MAVLink + MSP2 decoders + rings + bus + warm-start
Adds the C8 inbound producer side: - TelemetryRing[T]: bounded drop-oldest ring; first-overflow INFO log + monotonic dropped_count. - SubscriptionBus + SubscriptionHandle: synchronous fan-out, lock- released-before-callback to avoid deadlock; subscriber crash caught + DEBUG-logged so one bad subscriber cannot kill the decode loop. - PymavlinkInboundDecoder: pymavlink-based AP decoder for RAW_IMU, SCALED_IMU2, ATTITUDE, GPS_RAW_INT, GPS2_RAW, HEARTBEAT, STATUSTEXT. Out-of-order drop (Invariant 7) per-kind WARN. STATUSTEXT spoofing sentinel promotes subsequent GPS to GpsStatus.SPOOFED within 5 s. AC-5.1 warm-start hint cached on first 3D+ fix; embedded into every FlightStateSignal. - Msp2InavInboundDecoder: YAMSPy-based iNav polling decoder for IMU / attitude / GPS / flight-state. signed=False always (RESTRICT-COMM-2); GpsStatus.SPOOFED is unreachable on iNav. Adds yamspy>=0.3.3 + pyserial>=3.5 to pyproject.toml. Tests: 443 pass / 2 skip / 0 fail (+33 in batch 9). Contract: no drift on fc_adapter_protocol.md v1.0.0; this batch implements the inbound producer side without changing signatures. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
362e93c626 |
[AZ-390] [AZ-392] C8 FC/GCS adapter foundation + covariance projector
Adds the C8 foundation: - FcAdapter / GcsAdapter / ReplaySink Protocols + contract DTOs in _types/fc.py (PortConfig, FcKind, FlightState, GpsStatus, Severity, TelemetryKind, FcTelemetryFrame, FlightStateSignal, GpsHealth, OperatorCommand, Subscription, Imu/Attitude samples). - Disjoint FcAdapterError / GcsAdapterError trees with SourceSetSwitchNotSupportedError <: SourceSetSwitchError per AC-9. - FcConfig + GcsConfig cross-cutting Config blocks with config-load validation (unknown strategy rejected at __post_init__). - runtime_root/fc_factory.py: build_fc_adapter / build_gcs_adapter with BUILD_FC_*/BUILD_GCS_* flag gating + INFO log on load + single-writer outbound-thread binding. - CovarianceProjector (helper, AZ-392): 6x6 -> 3x3 -> 2x2 -> sqrt(lambda_max) reduction; AP returns float m, iNav returns int mm with uint16 clamp + WARN + FDR record. Non-SPD / NaN / wrong-shape raise FcEmitError and emit an FDR ERROR record carrying frame_id. Contracts: - composition_root_protocol.md 1.1.0 -> 1.2.0 (added fc/gcs blocks + build_fc_adapter / build_gcs_adapter + outbound-thread binding). - fc_adapter_protocol.md unchanged (this batch implements v1.0.0). Tests: 410 pass / 2 skip / 0 fail (+53 new tests in batch 8). AZ-391 (inbound subscription) deferred to batch 9 — pulls YAMSPy as a new external dependency (iNav MSP2 decode). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
e4ecdaf619 |
[AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
b5dd6031d2 |
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
33486588de |
[AZ-271] [AZ-276] [AZ-278] [AZ-282] Finish cross-cutting helpers + relax opencv pin
E-CC-HELPERS closes with the three remaining Layer-1 helpers and E-CC-CONF closes with the env > YAML > defaults precedence test gate. All four tickets ship with frozen public surfaces, hermetic unit tests, and no upward (components.*) imports. * AZ-271 — tests/unit/shared/config/test_precedence.py (5 ACs + smoke test + helper that names the layer in failure messages). * AZ-282 — helpers/ransac_filter.py: static RansacFilter + RansacResult; cv2.setRNGSeed(0) for byte-equal determinism; median residual semantics pinned by contract. * AZ-276 — helpers/imu_preintegrator.py + make_imu_preintegrator; GTSAM PreintegratedCombinedMeasurements; strict-monotonic ts_ns guard runs before any state mutation. Adjacent hygiene: _types/nav.py ImuSample/ImuWindow now use ts_ns:int and the spec-mandated ImuBias dataclass. * AZ-278 — helpers/lightglue_runtime.py: structural R14 fix. LightGlueRuntime + non-blocking concurrent-access guard that raises rather than serialising. EngineHandle Protocol in _types/manifests.py + KeypointSet/CorrespondenceSet in _types/matching.py (Protocol surface adds approved by spec). Dependency conflict (Finding 1, user-approved): gtsam 4.2 (PyPI) is numpy-1.x-ABI only; opencv-python>=4.12 needs numpy>=2 at runtime. Resolution: opencv-python pin relaxed to >=4.11.0.86,<4.12. The D-CROSS-CVE-1 ratchet at ci/opencv_pin_gate.py is held at 4.11.0 with the original 4.12.0 floor restored once a numpy-2-compatible gtsam wheel ships. Full replay procedure in _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md. Tests: 294 passed, 2 skipped (cmake/actionlint env-skips, pre-existing). 43 new tests added for batch 5. Ruff check + format clean. Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
ba20c2d195 |
[AZ-273] [AZ-274] [AZ-275] [AZ-267] [AZ-268] FDR producer chain + log bridge + contract test
AZ-273: lock-free SPSC ring buffer with pre-allocated slots, power-of- two capacity, opt-in SPSC guard, and EnqueueResult / FdrSpscViolationError on the public surface. make_fdr_client caches one client per producer_id and reads capacity from config.fdr.per_producer_capacity with fallback to queue_size. AZ-274: default_overrun_policy implements drop-oldest + retry + immediate marker emission, with prior-marker dropped_count folding via _evict_one so user-loss info is never lost across iterations. ERROR diagnostic is rate-limited to <=1/sec per producer. AZ-275: FakeFdrSink mirrors the FdrClient public surface and reuses the production default_overrun_policy via a duck-typed _PolicyAdapter. The test-only records/all_records_ever properties let component tests assert both in-buffer and lifetime state. tests/conftest.py registers the fake_fdr_sink fixture and an AST architecture lint forbids production imports of fakes. AZ-267: FdrLogBridgeHandler installs on the root logger via wire_log_bridge and forwards only WARN+ERROR records into the FDR with kind="log". Thread-local recursion guard short-circuits internal logging; saturated- queue diagnostics go to stderr every N=1000 drops. AZ-268: tests/contract/log_schema.py covers every row of the schema's Test Cases table plus the "DEBUG+INFO never reach FDR" invariant. pyproject.toml registers the contract pytest marker and the contract-mandated log_schema.py file-name. 251 unit + contract tests pass (48 new). Review verdict: PASS_WITH_WARNINGS; findings are NFR-perf deferrals + documented relaxation of AZ-274 AC-2 coalescing under permanently-stalled consumer. Co-authored-by: Cursor <cursoragent@cursor.com> |