Commit Graph

234 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh 16a4582c3f chore: close tile-schema leftover, start batch 28 (AZ-305)
AZ-304 (batches 23-27 cumulative review) landed the onboard portion of
the tile-schema design (UUIDv5 helpers + 0002 migration + location_hash
field). The remaining cross-workspace satellite-provider hand-off is
tracked separately in that repo's todo. Autodev state advances to
sub_step.batch-loop for the next batch.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 17:19:31 +03:00
Oleksandr Bezdieniezhnykh 1141d17769 [AZ-300] [AZ-301] [AZ-302] [AZ-304] docs: sync module-layout for c6+c7
Cumulative review of batches 23-27 (cycle 1) surfaced three Medium
documentation-drift findings on module-layout.md. All three fixed
inline per user direction:

F1: c7_inference Internal list expanded with architecture_registry,
    config, engine_gate, errors, manifest, thermal_publisher (added
    across AZ-300/301/302).

F2: c6_tile_cache `connection.py` re-attributed from AZ-304 (which
    deferred it) to AZ-305 with a "planned, not landed yet" tag.

F3: c7_inference Public API description rewritten by category
    (Protocol + DTOs + component services + config + error family)
    with a pointer to __init__.py's __all__ for the canonical list.

Cumulative review report: _docs/03_implementation/cumulative_review_
batches_23-27_cycle1_report.md (PASS_WITH_WARNINGS).

Autodev state moved to status: paused_user_requested per user
choice; /autodev will resume at greenfield Step 7 (next batch
selection) on next invocation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 17:12:30 +03:00
Oleksandr Bezdieniezhnykh dde838d2cc [AZ-304] C6 Postgres schema: additive 0002 migration + UUIDv5
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.

Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.

Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.

Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 17:05:41 +03:00
Oleksandr Bezdieniezhnykh 21f5a30d09 refactor: update autodev state and tile metadata store version
- Changed autodev state to reflect the transition from batch 26 to batch 27, updating the phase and details for the compute-batch step.
- Incremented the version of the tile metadata store from 1.0.0 to 1.1.0, refining the uniqueness invariant to use a natural key that includes flight_id, allowing coexistence of multiple rows for the same tile from different flights.
- Updated the last modified date in the tile metadata store documentation to reflect recent changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 16:33:23 +03:00
Oleksandr Bezdieniezhnykh ca37f8849d chore: record batch 26 push + queued candidates in autodev state
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 14:22:11 +03:00
Oleksandr Bezdieniezhnykh 49a67f770d [AZ-302] C7 ThermalStatePublisher — jtop/NVML 1 Hz background poller
Implements AZ-297 InferenceRuntime's thermal_state() side: a singleton
background-thread publisher that polls jtop (preferred) or pynvml
(fallback) at config.thermal_poll_hz, stores an atomic ThermalState
snapshot, and emits c7.thermal_transition FDR records on every throttle
flip with a WARN log on entry and an INFO log on exit. Default-safe on
TelemetryUnavailableError per Invariant I-6 with a 1-Hz rate-limited
WARN.

Sources return a raw ThermalReading; the publisher stamps measured_at_ns
via its injected Clock so _JtopSource / _PynvmlSource stay clean of
direct time.* calls (Invariant 2). _poll_once is the deterministic test
seam — start() spawns the production thread.

- c7.thermal_transition registered in fdr_client.records KNOWN_PAYLOAD_KEYS
- [telemetry] optional dep group (jetson-stats, pynvml) added to pyproject
- 14 unit tests (AC-1..AC-6, AC-8, NFR-default-safe, structural)
  green; AC-7 / AC-1 microbench / NFR-perf-poll Tier-2 deferred
- full unit suite: 1140 passed, 11 expected Tier-2/CUDA skips

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:33:37 +03:00
Oleksandr Bezdieniezhnykh 59f56c032f [AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator
AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:

  1. filename schema parse  -> EngineSchemaMismatchError(reason=...)
  2. schema tuple match     -> EngineSchemaMismatchError(expected,got)
  3. sidecar present        -> EngineSidecarMissingError
  4. sidecar trust          -> EngineHashMismatchError(stage=sidecar)
  5. manifest match         -> EngineHashMismatchError(stage=manifest)

Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).

Production code (new):
 - components/c7_inference/engine_gate.py  -- EngineGate, HostTuple,
   read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
   tensorrt.__version__; raises RuntimeError on Tier-1)
 - components/c7_inference/manifest.py     -- DeploymentManifest,
   ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
   type level: __getitem__ raises EngineHashMismatchError on
   missing key, NEVER KeyError, so the gate cannot silently pass
 - components/c7_inference/__init__.py     -- re-exports the new
   public surface

Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
 1. HostTuple lives in engine_gate.py (the only consumer);
    re-exported from package __init__.py.
 2. read_host_tuple takes precision as a keyword argument — three
    of four fields come from the host, precision is engine-build
    metadata supplied by the caller.
 3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
    run on every CI host.

Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.

NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.

AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.

Suite: 1134 passed / 11 skipped. No regressions outside the new
files.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:20:21 +03:00
Oleksandr Bezdieniezhnykh 65ad2168ed [AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy
AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.

Production code (new):
 - components/c7_inference/pytorch_fp16_runtime.py — runtime +
   PytorchEngineHandle + output-shape adapter
 - components/c7_inference/architecture_registry.py — torch-free
   register_architecture / default_registry / ArchitectureFactory
   (Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
   code)
 - components/c7_inference/__init__.py — re-exports the registry
   mechanism. Still does NOT import the concrete strategy module
   (Invariant I-5)
 - components/c7_inference/config.py — adds per_frame_debug_log bool
   field (gates the DEBUG per-frame latency log)

Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.

Tests (modified):
 - test_protocol_conformance.py — narrowed
   test_ac5_build_inference_runtime_flag_on_but_module_missing
   parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
   still covered until AZ-298 / AZ-299 ship.

CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
 1. Constructor conforms to AZ-297 factory shape (config positional;
    thermal_publisher + registry + clock keyword-only optionals).
    AZ-302 will update the factory to thread thermal_publisher.
 2. Architecture registry uses extras["model_name"] as lookup key
    (avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
 3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
    registry has no per-backbone input-shape metadata.

Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:13:21 +03:00
Oleksandr Bezdieniezhnykh fce80290bc chore: park batch 24 plan; AZ-300 blocked on [inference] extras
batch 23 (AZ-332) is committed + pushed; AZ-332 transitioned to In
Testing. Batch 24 next-task computation revealed AZ-300 (C7
PytorchFp16Runtime) cannot proceed without `pip install -e .[inference]`
(torch + torchvision + onnxruntime). State file now reflects this gate
so the next /autodev invocation knows the explicit Choose A/B/C is
queued.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:00:08 +03:00
Oleksandr Bezdieniezhnykh 1ebab29a4f [AZ-332] C1 OKVIS2 Strategy: facade + binding skeleton
Python facade (`Okvis2Strategy`) is production-quality and satisfies
AZ-331's `VioStrategy` protocol; full AC-1..10 coverage with
AC-9 + NFR-perf marked `tier2`. The C++ pybind11 binding compiles
and loads but throws `OkvisFatalException("estimator not yet wired")`
on first `add_frame` — the `okvis::ThreadedKFVio` wiring is a tier2
follow-up the Step-15 Product Completeness Gate is expected to track
as a remediation task.

Resolved contradictions:

* Constructor signature aligned with the AZ-331 factory: `(config, *,
  fdr_client, clock=None)`. Calibration / preintegrator / logger
  built internally from config. No churn on AZ-331.
* IMU substrate: OKVIS2 owns its internal estimator IMU integration;
  the AZ-276 `ImuPreintegrator` is a separate substrate consumed by
  E-C5's fusion graph. Single source of truth lives at the sample
  stream, not the integrator instance.
* FDR API: `FdrClient.enqueue(record)` with new `vio.health` kind
  added to AZ-272 `KNOWN_PAYLOAD_KEYS`.

CI matrix forces `-DBUILD_OKVIS2=OFF` until the tier2 wiring task
brings Ceres / SuiteSparse / OKVIS2 vendored submodules into the
Linux build.

Files: 17 added/modified across `c1_vio/`, `fdr_client/records.py`,
`cpp/okvis2/CMakeLists.txt`, CI workflow, AZ-332 task spec
(implementation-notes section), batch 23 report.

Tests: 17 new (15 tier1 + 2 tier2). Full Tier-1 suite: 1109 pass,
2 skipped (env), 2 deselected (tier2). No regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 09:56:45 +03:00
Oleksandr Bezdieniezhnykh 9c35776bcb chore: pre-batch-23 carry-over (state + AZ-332 plan)
Handoff artifacts from the prior /autodev session that stopped at
Step 7 sub_step compute-next-batch:

- _docs/_autodev_state.md: pointer updated to batch 23, AZ-332 only
  (AZ-345 deferred — dep AZ-346 not yet in done/).
- _docs/03_implementation/AZ-332_implementation_plan.md: locked-in
  decisions (no ROS 2, no Python re-impl, three-env split: macOS dev /
  Ubuntu CI / Jetson tier2) + step-by-step playbook for next session.

Pre-batch chore commit per implement skill prereq #4 (clean tree
required before AZ-332 commit so the batch diff stays focused).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 09:18:20 +03:00
Oleksandr Bezdieniezhnykh 48ea1e2fc2 [AZ-343] C2.5 InlierCountReRanker + shared FeatureExtractor helper
Implements the production-default ReRankStrategy: K=10 → N=3 by
single-pair LightGlue inlier count, with strict drop-and-continue
(INV-8) on per-candidate TileFetch / backbone / zero-inlier failures
and RerankAllCandidatesFailedError on zero survivors. Composition
root injects the shared LightGlueRuntime + Clock + the new
FeatureExtractor helper (an L1 placeholder OpenCvOrbExtractor that
unblocks AZ-343 and future C3 strategies — task scope expansion).

Architectural notes:
- Cross-component imports stay banned; tile_store types as `object`
  and the C6 TileCacheError family is duck-typed by class module
  prefix (same workaround AZ-348 adopted for c7_inference; proper
  fix is to relocate TileCacheError to _types/ in a follow-up).
- Clock injection follows the replay contract (AZ-398 Invariant 2);
  reranked_at is sourced from clock.monotonic_ns().
- AZ-342 factory grew `feature_extractor` + `clock` + `fdr_client`
  parameters; existing AZ-342 conformance tests updated.

Tests: 19 new AC-1..AC-12 + mixed-failure scenarios in
test_inlier_count_reranker.py; existing AZ-342 suite (26) still
green. Full repo sweep 1093 passed / 2 skipped (cmake/actionlint
not on PATH).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 06:22:40 +03:00
Oleksandr Bezdieniezhnykh 9a605c8514 [AZ-348] C3.5 ConditionalRefiner Protocol + factory + PassthroughRefiner
Defines the public `ConditionalRefiner` Protocol (PEP 544
@runtime_checkable, two methods: `refine_if_needed` +
`was_invoked`), extends `MatchResult` in-place with two
default-valued refinement fields (`refinement_label`,
`refinement_added_latency_ms`), defines the `RefinerError` family
(`RefinerBackboneError`, `RefinerConfigError`), and ships the
trivial `PassthroughRefiner` reference impl.

Both refiner strategies are linked unconditionally — no
`BUILD_REFINER_*` flag (NOT ADR-002 territory). Runtime selection
only per ADR-001. `PassthroughRefiner` returns the input
`MatchResult` by reference (bit-identical correspondences per
contract INV-5) and always reports `was_invoked() is False`.

Documentation: renames `module-layout.md` `c3_5_adhop` Public API
symbol from `AdHoPRefinementStrategy` to `ConditionalRefiner`
(AC-14) so the doc agrees with `description.md` and the contract.

AC-9 (single-thread binding) deferred to AZ-270 runtime-root
composition, mirroring AZ-336 / AZ-342 / AZ-344 Risk-4 precedent.
AC-7 for the `"adhop"` strategy stops at `ModuleNotFoundError`
because the AdHoP backbone is owned by AZ-349. All other ACs +
NFRs covered by 36 new conformance tests.

Architectural note: `PassthroughRefiner.inference_runtime` is
typed as `object` because the L3→L3 import ban
(`test_az270_compose_root`) forbids c3_5_adhop from importing
c7_inference; the runtime-root factory narrows the type at
construction time.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 05:52:36 +03:00
Oleksandr Bezdieniezhnykh 89c223882b [AZ-344] C3 CrossDomainMatcher Protocol + factory + RollingHealthWindow
Defines the public `CrossDomainMatcher` Protocol (PEP 544
@runtime_checkable, two methods: `match` + `health_snapshot`),
the three frozen+slotted DTOs (`CandidateMatchSet`, `MatchResult`,
`MatcherHealth`) in the L1 `_types/matcher.py` layer, the
`MatcherError` family (`MatcherBackboneError`,
`InsufficientInliersError`), and the composition-root
`build_matcher_strategy` factory with lazy-import +
`BUILD_MATCHER_<variant>` gating per ADR-002.

`RollingHealthWindow` accumulator (60 s, amortised O(1) update,
strict O(1) snapshot) is constructed by the factory and injected
into every concrete matcher so all backbones share window
semantics; this is what backs C5's spoof-promotion gate.

Legacy placeholder `MatchResult` removed from `_types/matching.py`;
import-only consumers (`c4_pose.interface`, `c3_5_adhop.interface`)
repointed at the new `_types/matcher.py` home — zero behavioural
change to those components.

AC-9 (single-thread binding) and AC-10 (LightGlueRuntime
identity-share with C2.5) deferred to AZ-270 runtime-root
composition, mirroring the AZ-342 Risk-4 escape clause. All other
ACs + NFRs covered by 70 new conformance tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 05:43:33 +03:00
Oleksandr Bezdieniezhnykh d6756f1855 [AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for the InlierCountReRanker (AZ-343) and
the future C3 CrossDomainMatcher consumer (AZ-344). No concrete
re-ranker is implemented here.

* ReRankStrategy Protocol (single rerank(frame, vpr_result, n,
  calibration) -> RerankResult method) with all 8 invariants in the
  docstring — notably INV-8 drop-and-continue (per-candidate failure
  NEVER propagates unless every candidate fails).
* DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult;
  frozen+slots; tuple-not-list for RerankResult.candidates; tile_id
  encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any
  c6_tile_cache (L3) import per module-layout.md.
* Error family: RerankError + RerankBackboneError +
  RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError
  escapes rerank(); RerankBackboneError is caught inside the per-
  candidate loop, logged ERROR, FDR-stamped, candidate dropped.
* C2_5RerankConfig (strategy enum default "inlier_count", top_n int
  default 3) with strict validation at load; registered into
  Config.components on c2_5_rerank import.
* build_rerank_strategy(config, *, tile_store, lightglue_runtime)
  factory: 1-strategy resolution table, lazy import,
  BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError
  mapping. The shared LightGlueRuntime is constructor-injected
  (R14 fix: neither C2.5 nor C3 owns its lifecycle).

Renamed the Protocol from the existing stub "RerankStrategy" to
"ReRankStrategy" to match the contract; updated module-layout.md.
Removed the legacy RerankResult shape from _types/vpr.py — the
v1.0.0 shape lives in _types/rerank.py.

Excluded per task spec:
* Concrete InlierCountReRanker (AZ-343).
* C3 matcher protocol task (AZ-344, next in batch).
* AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share
  between C2.5/C3 — deferred per task spec Risk 3 until the generic
  compose_root thread-binding registry and the C3 factory both land.

Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in
tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy
smoke test is removed. Full sweep: 997 passed (one pre-existing
flake in test_az296_takeoff_abort, subprocess timing, unrelated to
this commit; passes in isolation).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 05:31:27 +03:00
Oleksandr Bezdieniezhnykh 3665acef66 [AZ-336] C2 VprStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for every concrete C2 backbone (UltraVPR,
NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340)
and the C2.5 ReRanker consumer side. No backbone is implemented here.

* VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim)
  + BackbonePreprocessor C2-internal Protocol (NOT in Public API per
  description.md § 6).
* DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all
  frozen + slots; tuple-not-list for VprResult.candidates so the
  immutability invariant truly holds.
* Error family: VprError + VprBackboneError + VprPreprocessError +
  IndexUnavailableError; same-named but namespace-distinct from
  c6_tile_cache.IndexUnavailableError (the c2 family is the closed
  envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form).
* C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path)
  with strict validation at load; registered into Config.components on
  c2_vpr import.
* build_vpr_strategy factory with 7-strategy resolution table, lazy
  import, BUILD_VPR_<variant> gating, ImportError→
  StrategyNotAvailableError mapping, and pre-flight descriptor_dim
  match against DescriptorIndex.descriptor_dim() — mismatch fires
  ConfigError at startup, NOT at first frame.

Contract change vs the v1.0.0 draft: factory takes descriptor_index:
DescriptorIndex (not tile_store: TileStore) because descriptor_dim()
lives on DescriptorIndex per C6's Public API. The contract markdown
is updated to match.

Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple,
keeping _types/ (L1) free of any c6_tile_cache (L3) import per
module-layout.md. Consumers reconstruct TileId at the C6 boundary.

Excluded per task spec:
* Concrete backbones (AZ-337..AZ-340).
* FAISS HNSW retrieve wiring (AZ-341).
* DescriptorNormaliser helper (AZ-283, already shipped).
* AC-9 single-thread binding — deferred per task spec Risk 4 until the
  generic compose_root thread-binding registry is in place (today
  each factory owns its own, e.g. fc_factory).

Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py
covering AC-1..AC-8, the error family, the config validation, the
factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full
sweep 973 passed, 2 skipped (CI-only cmake / actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 05:25:35 +03:00
Oleksandr Bezdieniezhnykh 823c0f1b2e [AZ-398] Replay: FrameSource + Clock Protocols + Clock injection
Ship the two Layer-1 cross-cutting Protocols replay mode needs to leave
production C1-C5 components mode-agnostic (Invariant 1) and replay-
deterministic (Invariant 2). Live + replay binaries see the same
interfaces; only the strategy differs.

* Clock Protocol (monotonic_ns / time_ns / sleep_until_ns) +
  WallClock (live + REALTIME replay) + TlogDerivedClock (ASAP replay;
  advance-on-call; non-monotonic source → ClockOrderingError).
* FrameSource Protocol (next_frame -> NavCameraFrame | None / close)
  + LiveCameraFrameSource (cv2.VideoCapture device index) +
  VideoFileFrameSource (cv2.VideoCapture file).
* Build-flag gating: BUILD_VIDEO_FILE_FRAME_SOURCE,
  BUILD_LIVE_CAMERA_FRAME_SOURCE (constructor-time check; Tier-0 OFF
  refuses construction with FrameSourceConfigError).
* Composition-root factories: build_clock + build_frame_source.
* Injected Clock across every component that previously called
  time.monotonic_ns() / time.sleep() directly: c5_state (estimator,
  ESKF, fallback watcher, source-label SM, isam2 handle), c8_fc_adapter
  (inbound MAVLink + MSP2, AP outbound, iNav outbound, QGC GCS),
  c13_fdr writer, c12_operator_tooling httpx flights client. All
  constructors default to WallClock() so existing call sites keep
  live-binary behaviour without a wiring change.
* AC-4 CI guard (tests/_meta/test_no_direct_time_in_components.py)
  AST-scans components/**/*.py for direct time.monotonic_ns /
  time.time_ns / time.sleep references and fails loudly with file:line.
* Conformance + factory tests: tests/unit/clock + tests/unit/frame_source.
* Test fixture updates: FallbackWatcher / SourceLabelStateMachine
  clock_ns is now required (removed time.monotonic_ns default);
  test_az388 patches estimator._clock instead of a module-level time;
  test_az393 ardupilot adapter uses a _FixedClock test double.

Excluded per the task spec: TlogReplayFcAdapter (AZ-399), ReplaySink
(AZ-400), compose_replay (AZ-401), CLI (AZ-402), Docker/CI (AZ-403),
E2E fixture (AZ-404), IMU auto-sync (AZ-405).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 05:10:01 +03:00
Oleksandr Bezdieniezhnykh 6c7d24f7e0 [AZ-331] C1 VioStrategy: Protocol + DTOs + factory + C5 migration
Freezes the c1_vio Public API per
_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md v1.0.0:

- VioStrategy Protocol (4 methods: process_frame, reset_to_warm_start,
  health_snapshot, current_strategy_label) in
  components/c1_vio/interface.py.
- DTOs (VioOutput, VioHealth, FeatureQuality, WarmStartPose) + VioState
  enum in _types/nav.py — L1 placement so C5 + C13 consume them without
  crossing the components.* boundary (AZ-270 AC-6). The new VioOutput
  shape (frame_id: str, relative_pose_T: gtsam.Pose3,
  pose_covariance_6x6, imu_bias, feature_quality, emitted_at_ns)
  replaces the AZ-263 scaffolding in _types/vio.py, which is now
  deleted.
- VioError family (VioInitializingError / VioDegradedError /
  VioFatalError) in components/c1_vio/errors.py. Documented
  rationale: the degraded-operation path returns a VioOutput with
  inflated covariance + VioHealth.state=DEGRADED rather than raising
  VioDegradedError — the error type exists only for the rare
  degraded->fatal transition.
- C1VioConfig per-component config block (strategy enum,
  lost_frame_threshold default 9, warm_start_max_frames default 5)
  with constructor-time validation rejecting unknown strategy labels.
- StrategyNotAvailableError added to runtime_root/errors.py;
  composition-time error distinct from the VioError family.
- Composition-root factory build_vio_strategy in
  runtime_root/vio_factory.py with three BUILD_* gates (BUILD_OKVIS2,
  BUILD_VINS_MONO, BUILD_KLT_RANSAC). Concrete strategy modules are
  imported lazily via __import__ AFTER the flag check — Tier-0
  workstation builds with the flag OFF MUST NOT load the strategy
  module (Risk-2 / I-5; verifiable via sys.modules).
- 36 conformance tests cover all 9 ACs + NFR-perf-factory
  (p99 build under 200 ms x 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; AC-9 asserts the frame_id
  annotation is 'str' (PEP-563 stringified).

C5 migration (consumers of the new VioOutput shape):
- gtsam_isam2_estimator.py + eskf_baseline.py: replaced
  vio.timestamp -> vio.emitted_at_ns (drops _datetime_to_ns on the
  VIO path), vio.pose_se3 -> vio.relative_pose_T (gtsam.Pose3 direct;
  drops _pose_se3_to_gtsam / _pose_se3_to_array), vio.covariance_6x6
  -> vio.pose_covariance_6x6 (rename).
- key_for_frame signature widened to UUID | int | str to accept the
  new str frame_id.
- 4 C5 test files migrated to the new VioOutput shape with helper
  fixtures producing ImuBias + FeatureQuality + str frame_id.
- c5_state/interface.py TYPE_CHECKING import path updated.

Bootstrap healthcheck + test_types_importable updated to drop the
deleted _types/vio module and pick up _types/inference (AZ-297) in
the same sweep.

Full unit-test sweep: 884 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 04:44:31 +03:00
Oleksandr Bezdieniezhnykh daff5d4d1c [AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per
_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine,
  deserialize_engine, infer, release_engine, thermal_state,
  current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig,
  EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py
  — placed at the L1 types layer so C10 can re-export EngineCacheEntry
  without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355
  forward-declared stub to the AZ-297 contract shape (cpu/gpu temp,
  thermal_throttle_active, measured_clock_mhz, measured_at_ns,
  is_telemetry_available). Invariant I-6: when telemetry is
  unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes:
  EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
  EngineSchemaMismatchError, EngineSidecarMissingError,
  CalibrationCacheError, InferenceError, OutOfMemoryError,
  TelemetryUnavailableError). RuntimeNotAvailableError stays in
  runtime_root/errors.py — composition-time, outside the family.
- C7InferenceConfig per-component config block (runtime label,
  thermal_poll_hz, engine_cache_dir) with constructor-time validation
  rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in
  runtime_root/inference_factory.py with three BUILD_* gates
  (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME,
  BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported
  lazily via __import__ AFTER the flag check, so a Tier-0 build with
  the flag OFF MUST NOT load the strategy module (AC-5 / I-5;
  verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory
  (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; also asserts all 9 error
  subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py
(replaced by the AZ-297 canonical shape in _types/inference.py); updated
the LightGlue-flavoured EngineHandle Protocol docstring in
_types/manifests.py to rationalize its intentional dual existence
with the C7 opaque EngineHandle (same name, different consumer-side
cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py,
c4_pose/interface.py, and _types/pose.py updated to
thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 04:30:14 +03:00
Oleksandr Bezdieniezhnykh f925af9de3 [AZ-303] C6 storage interfaces: Protocols + DTOs + factories
Freezes the c6_tile_cache Public API per
_docs/02_document/contracts/c6_tile_cache/{tile_store,tile_metadata_store,
descriptor_index}.md v1.0.0:

- Three runtime_checkable Protocols (TileStore 4-method, TileMetadataStore
  9-method, DescriptorIndex 5-method) in components/c6_tile_cache/interface.py.
- Frozen DTOs + enums (TileId, TileMetadata, TileMetadataPersistent,
  TileQualityMetadata, Bbox, SectorBoundary, HnswParams, IndexMetadata,
  TileSource, FreshnessLabel, VotingStatus, SectorClassification) in
  components/c6_tile_cache/_types.py. Constructor-time validation rejects
  out-of-range zoom_level / lat / lon and inverted Bbox.
- TilePixelHandle ABC for read-only mmap access (Invariant I-4).
- TileCacheError family (6 subtypes) + IndexBuildError (deliberately
  outside the family) in components/c6_tile_cache/errors.py.
- C6TileCacheConfig per-component config block, registered on package
  import; validates known runtime labels at construction time.
- Composition-root factories build_tile_store / build_tile_metadata_store /
  build_descriptor_index in runtime_root/storage_factory.py, with lazy
  concrete-impl imports gated by BUILD_FAISS_INDEX (AC-5 / Risk 2:
  no module-level FAISS import when the flag is OFF).
- RuntimeNotAvailableError defined in runtime_root/errors.py to be shared
  with AZ-297 (composition-time error, distinct from per-component
  runtime errors).

51 conformance tests cover all 10 ACs + NFR-perf-factory (p99 build_*
under 50 ms across 1000 calls) + NFR-reliability-error-family. AC-9
introspects each contract file's Shape table and asserts method
parity against the runtime Protocol.

Retired the AZ-263 scaffolding SectorClassification (dataclass) and
TileQualityMetadata from _types/tile.py since their canonical home is
now c6_tile_cache._types; Tile and TileRecord remain in _types/tile.py
until c3_matcher (AZ-344) and c11_tile_manager (AZ-316/319) retire
their interface stubs.

Full unit-test sweep: 791 passed, 2 pre-existing environment skips.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 04:21:44 +03:00
Oleksandr Bezdieniezhnykh 48281db9e9 [AZ-381] Fix ISam2GraphHandleImpl missing get_pose_key + comments
F1 (High/Architecture) from cumulative review of batches 01-22:
`ISam2GraphHandleImpl` did not satisfy C4's `ISam2GraphHandle`
Protocol stub (AZ-355) because it lacked `get_pose_key`.
`pose_factory`'s isinstance gate would have raised at composition.
Two Protocols (C4 minimal consumer cut, C5 richer producer surface)
are intentional per AZ-355 Risk 1 — the impl just needed to expose
the canonical name. Delegates to estimator.key_for_frame.

Added cross-component conformance test asserting the C5 impl
satisfies both Protocols, so future drift trips a unit test.

F2 (Medium/Maintainability): added justifying comments at four
`except: pass` sites in runtime_root, c8_fc_adapter (ap + inav),
and c13_fdr writer. No behavioral change.

Updated cumulative review report verdict from FAIL to PASS and
recorded a post-mortem on the initial misframing
(treated the dual-Protocol design as duplication on first read).

Autodev state: batch 22 done, cumulative-review PASS,
ready for batch 23.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 03:55:41 +03:00
Oleksandr Bezdieniezhnykh 8a83166261 [AZ-490] C5 set_takeoff_origin entrypoint + bounded-delta GPS gate
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.

- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
  sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
  isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
  WgsConverter.horizontal_distance_m vs smoother estimate; reject
  resets the dwell-time counter so AZ-385 cannot re-promote off bad
  GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
  subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
  default_takeoff_origin_sigma_horiz_m=5,
  default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
  gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
  skipped (pre-existing CI tooling skips).

Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 02:53:58 +03:00
Oleksandr Bezdieniezhnykh 72a06edab0 [AZ-489] C12 FlightsApiClient + offline JSON loader + bbox helper
ADR-010 primary cold-start path now has a real source for the cache bbox
and the takeoff origin. Single concrete strategy (`HttpxFlightsApiClient`)
behind a `@runtime_checkable` Protocol; offline JSON fallback (`load_flight_file`)
shares the same DTO shape per FAC-INV-1.

* `flights_api/interface.py` — `FlightsApiClient` Protocol + `FlightDto`
  + `WaypointDto` + `WaypointObjective` / `WaypointSource` enums (plain
  frozen-slotted dataclasses, matching project's LatLonAlt / PoseEstimate
  pattern).
* `flights_api/errors.py` — 8-class hierarchy under `FlightsApiError`.
* `flights_api/_parser.py` — shared JSON validator: range checks, lat/lon
  bounds, contiguous ordinals, finite floats, enum membership.
* `flights_api/bbox.py` — `bbox_from_waypoints` envelopes lat/lon and
  inflates by a horizontal-distance buffer via WgsConverter ENU
  round-trip (NOT degree-space); `takeoff_origin_from_flight` passes
  waypoints[0] through unrounded.
* `flights_api/file_loader.py` — orjson-backed offline loader.
* `flights_api/httpx_client.py` — concrete client with ONE retry on
  transient 5xx + connect errors; token redaction at every log site;
  test-injectable transport + sleep.
* `runtime_root/c12_factory.py` — `build_flights_api_client(config)`;
  re-exported from `runtime_root/__init__.py`. OperatorToolServices
  aggregate intentionally deferred to AZ-328 per scope discipline.
* `pyproject.toml` — `httpx>=0.28,<1.0` added (chosen over `requests`
  for native `MockTransport` testing).

Tests: 28 cases across AC-1..AC-18 plus extras (malformed JSON,
negative buffer, zero buffer, missing top-level fields, negative
ordinal, empty-flight takeoff). Full repo run: 713 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 01:28:49 +03:00
Oleksandr Bezdieniezhnykh e0be591b06 [AZ-489] [AZ-490] ADR-010 design pass: operator-mission as cold-start anchor
Architecture, contracts, and task amendments for the flight-route-driven
preflight + cold-start origin feature (ADR-010). No source code touched
in this commit; the implementation commits for AZ-489 / AZ-490 / AZ-419
land separately.

* architecture.md: ADR-010, new Principle #14, amended Principle #11,
  external systems gain flights service + Mission Planner UI, data
  model gains Flight / Waypoint / TakeoffOrigin.
* system-flows.md: F1 gains phase 0 (Flight resolve), F2 gains
  cold-start ladder, F7 gains mid-flight bounded-delta GPS gate.
* glossary.md: Flight, Flights API, Mid-flight bounded-delta GPS gate,
  Mission Planner UI, Takeoff origin, Waypoint.
* C10: description + cache_provisioner + manifest_verifier bumped to
  v1.1 carrying takeoff_origin + flight_id in the manifest hash.
* C12: description updated + new flights_api_client.md contract v1.0.
* C5: description + state_estimator_protocol bumped to v1.1 with
  set_takeoff_origin + 3-clause spoof-promotion gate.
* AZ-323/324/325/326/328/419 amended in place. AZ-490 spec created
  (C5 set_takeoff_origin entrypoint).
* Dependencies table: 142 tasks / 478 pts / 15 forward edges
  (2 new tasks, 2 backward deps, 2 forward deps from AZ-419).
* Leftovers cleared: 2026-05-11 Jira transition entries for AZ-355
  and AZ-386 are deleted (Jira reconnected; both already transitioned
  in their respective implementation commits).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 01:28:05 +03:00
Oleksandr Bezdieniezhnykh db27e25630 [AZ-355] C4 PoseEstimator Protocol + factory + DTOs + composition
Land the foundational C4 surface AZ-358 (Marginals) and AZ-361
(Hybrid) build on top of:

- PoseEstimator Protocol (@runtime_checkable): estimate(...) +
  current_covariance_mode().
- Error hierarchy: PoseEstimatorError, PnpFailureError,
  PoseEstimatorConfigError; CovarianceDegradedWarning as a Warning
  subclass (warnings.warn path, not raised).
- ISam2GraphHandle Protocol stub (READ-ONLY view, get_pose_key only)
  decoupled from C5's concrete ISam2GraphHandleImpl.
- C4PoseConfig (frozen dataclass) + register on c4_pose import.
- runtime_root/pose_factory.build_pose_estimator with lazy-import
  fallback; INFO log c4.pose.strategy_loaded; shares ingest-thread
  binding with C5 per ADR-003.

DTO restructuring (cross-cutting): retire the legacy raw-4x4
PoseEstimate(int frame_id, datetime timestamp, pose_se3, ...) and
ship the contract shape PoseEstimate(UUID, LatLonAlt, Quat,
np.ndarray, CovarianceMode, PoseSourceLabel,
last_satellite_anchor_age_ms, emitted_at). C5 add_pose_anchor in
both gtsam_isam2 + eskf_baseline migrated in lockstep via
WGS84->ENU + Quat->R helpers; test fixtures updated. VIO output
stays on the raw shape until AZ-331 (C1 protocol) lands.

LatLonAlt upgraded to slots=True per AC-2. ThermalState stub added
to _types/thermal.py so the Protocol typechecks pre-AZ-302.

Tests: 25 new in tests/unit/c4_pose/test_az355_pose_protocol.py
covering AC-1..AC-10 + factory wiring + config validation; full
repo: 685 passed, 2 pre-existing CI-only skips.

Jira transition deferred: MCP "Not connected"; leftover entry in
_docs/_process_leftovers/2026-05-11_jira_transition_az355_deferred.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 10:32:14 +03:00
Oleksandr Bezdieniezhnykh c0bdb57957 [AZ-386] C5 ESKF baseline: 16-state error-state KF (NumPy)
Implements the mandatory simple-baseline StateEstimator per AC-2.1a
engine-rule at C5 (IT-12 comparative study vs iSAM2). NumPy-only;
no GTSAM dependency so BUILD_STATE_ESKF=ON binaries ship without
GTSAM at all.

- 16-state error vector (pos 3 + vel 3 + rot 3 + ba 3 + bg 3 + dt 1)
  over a textbook nominal-state / error-state ESKF split.
- add_fc_imu: full nonlinear IMU integration + linearised F P F^T + Q
  covariance propagation per IMU sample.
- add_vio: simplified relative-pose update (snapshot-based; baseline
  scope, documented).
- add_pose_anchor: absolute-pose update; integrates BOTH marginals and
  jacobian modes (no skip — ESKF has no graph; AC-4).
- AC-9 divergence test: Mahalanobis r^T S^-1 r > 100 (10 sigma) on the
  innovation covariance S = H P H^T + R.
- AC-5 SPD: Cholesky-positive enforcement on every emitted covariance;
  non-SPD raises EstimatorFatalError and locks state to LOST.
- AC-6 honesty: smoothed_history entries carry smoothed=False; deviation
  from C5 contract Invariant 7 documented in module + report.
- AC-7 / AC-10 BUILD_STATE_ESKF gating: works through existing factory
  infra (state_factory._STATE_BUILD_FLAGS).
- AC-8: SourceLabelStateMachine + FallbackWatcher auto-wired eagerly
  in __init__, same pattern as the iSAM2 estimator.

Tests: 20 new unit tests covering AC-1..AC-10 + robustness checks.
Full suite: 660 passed, 2 skipped (CI-only).

The AZ-386 Jira transition to Done is deferred (Atlassian MCP returned
'Not connected'); recorded in _docs/_process_leftovers/ for replay on
the next autodev invocation per the Leftovers Mechanism.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 10:12:30 +03:00
Oleksandr Bezdieniezhnykh 098aabac0c [AZ-387] C5 smoothed-history → FDR side-channel
After every successful current_estimate(), emit one
c5.state.smoothed_history FDR record per newly-smoothed past
keyframe from IncrementalFixedLagSmoother. AC-4.5 (revised): the
smoothed stream goes ONLY to FDR; the C8 outbound forward-time
stream is unaffected.

Idempotency via _smoothed_fdr_watermark_s (smoother-native float
seconds); the same pose key is never emitted twice. Hook is
best-effort — internal failures log warnings but do not raise, so
a smoother divergence cannot contaminate the forward-time path.

Cross-task invariants documented:
- AC-3 ESKF no-op — AZ-386 installs an inert hook on the ESKF.
- AC-4 No C8 leak — enforced at the C8 boundary by AZ-261.

8 new unit tests against AC-1/2/5/6 + robustness (no-FDR-client,
marginals failure). Full suite: 640 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 07:13:44 +03:00
Oleksandr Bezdieniezhnykh 7cbd17ee83 [AZ-385] C5 SourceLabelStateMachine + spoof-promotion gate
Implements Invariants 5 + 8 + AC-NEW-2 / AC-NEW-8: the
EstimatorOutput.source_label now reflects a real state machine
(DEAD_RECKONED → SATELLITE_ANCHORED ↔ VISUAL_PROPAGATED) governed by
a spoof-promotion gate that latches closed on FC SPOOFED GPS health
and re-opens only when BOTH conditions hold — ≥10 s
STABLE_NON_SPOOFED AND next anchor within
spoof_promotion_visual_consistency_tol_m.

Every reject emits a c5.state.spoof_rejected FDR record plus a
subscriber-fan-out STATUSTEXT (severity WARNING, 50-char cap per
MAVLink). FDR and subscriber paths bypass the standard logger so
silencing logs cannot suppress the spoof trail (R07 / AC-6).

GtsamIsam2StateEstimator now eagerly builds the SM from C5StateConfig
in __init__; new public methods notify_gps_health() (delegates to
SM, called by composition root from C8 inbound) and
subscribe_spoof_rejection() (composition root attaches C8's
QgcTelemetryAdapter here). health_snapshot.spoof_promotion_blocked
+ current_estimate.source_label now flow from the live SM.

25 new unit tests across all 12 ACs plus cancellation, subscriber
exception isolation, and estimator wire-up integration cases. One
AZ-384 test renamed + updated to expect DEAD_RECKONED before any
anchor (was VISUAL_PROPAGATED placeholder pre-AZ-385).

Full suite: 632 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 07:06:38 +03:00
Oleksandr Bezdieniezhnykh 31a300f8a2 [AZ-388] C5 AC-5.2 no-estimate fallback detector + signal emission
Implements Invariant 9 / AC-5.2: when current_estimate cannot return a
fresh output for >= state.no_estimate_fallback_s (default 3.0 s), emit
ONE engagement signal (FDR kind=c5.state.no_estimate_fallback_engaged
+ GCS STATUSTEXT severity CRITICAL); on recovery, ONE recovery signal
(FDR kind=c5.state.no_estimate_fallback_recovered + STATUSTEXT NOTICE).
Rate-limited via single _in_fallback latch (AC-2: 30 s sustained
no-estimate still emits exactly one engagement).

New FallbackWatcher class owns the state machine; estimator wires it
through constructor + current_estimate entry/success hooks. Public
check_fallback_state(now_ns) watchdog (NFR p99 <= 5 us) + subscribe
APIs let C8 outbound react without coupling C5 to a concrete GCS
adapter at construction. Severity enum extended with CRITICAL=2 and
NOTICE=5 to match MAVLink MAV_SEVERITY.

18 new unit tests across all 8 ACs, deterministic synthetic clock,
integration tests patch monotonic_ns through GtsamIsam2StateEstimator
to drive AC-7 iSAM2 leg (ESKF leg deferred to AZ-386).

Full suite: 607 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 06:53:22 +03:00
Oleksandr Bezdieniezhnykh b3ad94c155 [AZ-384] C5 marginals + current_estimate/smoothed_history/health_snapshot
Replaces the last three NotImplementedError placeholders on
GtsamIsam2StateEstimator with real Marginals + output methods:

- current_estimate(): recovers the 6x6 Marginals covariance for the
  most-recently committed pose key, enforces the SPD invariant via
  np.linalg.cholesky (Invariant 10), converts the local-ENU pose
  translation to WGS84 via the shared WgsConverter, derives a
  body->world quaternion, and emits a fresh EstimatorOutput
  (smoothed=False, Invariant 4). On SPD failure transitions
  isam2_state -> LOST and raises EstimatorFatalError (AC-5.2 path).
- smoothed_history(n): iterates the smoother's active POSE keys via
  _smoother.calculateEstimate().keys() (filtered by GTSAM symbol
  char) and the smoother timestamps via ts_map.at(key) - workaround
  for the pinned gtsam_unstable build's non-iterable
  FixedLagSmootherKeyTimestampMap. Bounded by K (Invariant 6); every
  entry has smoothed=True (Invariant 7).
- health_snapshot(): cheap O(1) accumulator read; reports
  IsamState lifecycle, pose-key count, AC-NEW-8
  cov_norm_growing_for_s rolling 60s deque-backed counter, and
  spoof_promotion_blocked via the AZ-385 state machine injection
  point.

Adds two public injection points for AZ-385/composition root:
set_enu_origin(LatLonAlt) and attach_source_label_state_machine(machine).
Defaults: (0, 0, 0) ENU origin, VISUAL_PROPAGATED source label,
spoof_promotion_blocked=False.

Wires _record_committed_pose_key into the three add_* success paths
so current_estimate only reads keys that have real values in iSAM2.
The JACOBIAN path in add_pose_anchor deliberately skips this call -
Invariant 3 keeps the JACOBIAN pose out of the iSAM2 graph.

Tests: +27 in tests/unit/c5_state/test_az384_marginals_outputs.py
covering all 10 ACs. Three obsolete AZ-382 tests
(test_ac10_*_raises_named_az384) removed. Full suite: 589 passed,
2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 06:20:01 +03:00
Oleksandr Bezdieniezhnykh fd848266d1 [AZ-383] C5 add_vio/add_pose_anchor/add_fc_imu factor adds
Replaces AZ-382 NotImplementedError placeholders with real GTSAM factor
adds wired against the iSAM2 graph handle:

- add_vio  -> BetweenFactorPose3 between consecutive VIO pose keys
  (first call primes the chain; AZ-388 owns first-keyframe seeding).
- add_pose_anchor -> mode-dispatch per pose.covariance_mode:
  "marginals" -> PriorFactorPose3 + handle.update();
  "jacobian"  -> skip iSAM2 add per AZ-361 contract.
  Both paths bump _last_anchor_ns via time.monotonic_ns().
- add_fc_imu -> shared ImuPreintegrator.integrate_window +
  reset_for_new_keyframe; builds a CombinedImuFactor between the
  prev/curr (X, V, B) keyframe triple. Introduces new 'v' (velocity)
  and 'b' (bias) GTSAM key namespaces decoupled from the VIO/pose
  frame_id mapping.

Invariant 2 - non-decreasing timestamps - enforced per call with
EstimatorDegradedError + c5.state.out_of_order log. Every successful
add emits a structured DEBUG *_ok log; every failure emits a
structured ERROR *_failed log and raises through the C5 error
hierarchy (R05).

Contract-vs-reality fix-ups also landed:

- StateEstimator Protocol: add_fc_imu(ImuWindow) - was incorrectly
  annotated as ImuTelemetrySample by AZ-381.
- _last_anchor_ns semantics switched to monotonic_ns() to match
  last_anchor_age_ms.
- create() factory back-wires the ISam2GraphHandle to the estimator
  via the new attach_handle() method.

Tests: +21 in tests/unit/c5_state/test_az383_factor_adds.py covering
all 8 ACs with mock ISam2GraphHandle instances. Three obsolete
AZ-382 tests (test_ac10_add_*_raises_named_az383) removed. Full
suite: 565 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 06:07:45 +03:00
Oleksandr Bezdieniezhnykh 8b394a98c6 [AZ-382] C5 GtsamIsam2StateEstimator skeleton + real iSAM2 handle bodies
- Add GtsamIsam2StateEstimator owning the GTSAM substrate:
  gtsam.ISAM2(ISAM2Params()) + gtsam_unstable.IncrementalFixedLagSmoother
  (K * 1/3 s window per D-C5-3) + NonlinearFactorGraph + Values.
- Module-level create(...) factory + register() helper for
  register_state_estimator("gtsam_isam2", create). Opt-in registration
  per ADR-002 — no auto-import.
- Key-management policy: key_for_frame(UUID) -> int via
  gtsam.symbol('x', counter); idempotent re-lookup.
- Replace all four NotImplementedError bodies in _isam2_handle.py with
  real GTSAM calls:
  * add_factor → estimator._graph.add(factor); R05 defensive logging
    on success/failure; EstimatorDegradedError on failure.
  * update → _isam2.update + _smoother.update; empty
    FixedLagSmootherKeyTimestampMap substituted for timestamps=None;
    EstimatorFatalError on either failure.
  * compute_marginals → gtsam.Marginals(getFactorsUnsafe(),
    calculateEstimate()).
  * last_anchor_age_ms → (monotonic_ns - _last_anchor_ns) // 1e6.
- StateEstimator Protocol methods on the estimator still raise
  NotImplementedError naming AZ-383 (factor adds) / AZ-384
  (marginals + outputs).
- AZ-382 AC tests: 27 cases covering 10/10 ACs + factory integration.
- AZ-381 test_ac8_handle_methods_raise_named_task removed (obsolete:
  bodies are real now); test_ac8_handle_is_isam2_graph_handle retained.
- Full suite: 547 passed (+26 vs B12), 2 skipped.
- Impl report: _docs/03_implementation/batch_13_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 05:51:23 +03:00
Oleksandr Bezdieniezhnykh beed43724f [AZ-381] C5 StateEstimator protocol + factory + C8 DTO reshape
- Add StateEstimator Protocol (6 methods, @runtime_checkable) + DTOs
  (EstimatorOutput, EstimatorHealth, IsamState, PoseSourceLabel, Quat)
  in _types/state.py per state_estimator_protocol.md v1.0.0.
- Add C5 error hierarchy (StateEstimatorError + 3 subclasses) and
  C5StateConfig (strategy, keyframe_window, spoof gates,
  no_estimate_fallback_s) with __post_init__ validation.
- Add ISam2GraphHandle Protocol + ISam2GraphHandleImpl skeleton (all
  4 methods raise NotImplementedError naming AZ-382 as owner).
- Add build_state_estimator factory + bind_state_ingest_thread for
  single-writer enforcement; ADR-002 build-flag gating
  (BUILD_STATE_<variant>); INFO log on success.
- Strict reshape of legacy EstimatorOutput / EstimatorHealth across
  all 6 C8 production files (_outbound_provenance,
  _covariance_projector, pymavlink_ardupilot_adapter,
  msp2_inav_adapter, mavlink_gcs_adapter, interface) + 6 C8 test
  files (UUID frame_id, LatLonAlt position_wgs84, Quat orientation,
  PoseSourceLabel enum source_label). Remove ad-hoc DTOs from
  _types/pose.py and from C4's public __init__ (EstimatorOutput is a
  C5 concept, not a C4 one).
- 20 AZ-381 AC tests (10 ACs + 4 config range + NFR + conformance).
- Full suite: 521 passed, 2 skipped (+20 vs Batch 11).
- Contracts: state_estimator_protocol.md v1.0.0 -> active;
  composition_root_protocol.md v1.2.0 -> v1.3.0 (additive state
  block + factory + ingest-thread binding).
- Impl report: _docs/03_implementation/batch_12_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 05:35:20 +03:00
Oleksandr Bezdieniezhnykh 8a9cf88a46 [AZ-396] [AZ-397] Batch 11: C8 source-set switch + QGC telemetry adapter
AZ-396: PymavlinkArdupilotAdapter.request_source_set_switch body sends
MAV_CMD_SET_EKF_SOURCE_SET, awaits COMMAND_ACK with timeout, enforces
Invariant 11 idempotence (1s rate-limit + skip-after-success). Adds
runtime_root.SpoofRecoverySink to bridge C5 spoof-promotion-recovered
signal to the C8 outbound thread via a bounded dispatch queue.
FcConfig gains spoof_recovery_source_set + source_set_switch_timeout_ms.

AZ-397: QgcTelemetryAdapter implements GcsAdapter strategy: MAVLink 2.0
to QGC, emit_summary downsamples 5Hz to configurable summary_rate_hz
[0.5, 5.0] via integer modulo, emit_status_text mirrors to GCS link,
subscribe_operator_commands translates COMMAND_LONG / PARAM_REQUEST_*
/ REQUEST_DATA_STREAM / MISSION_* / SET_MODE into OperatorCommand DTOs
and audits each receipt to FDR. FcKind.GCS_QGC added for PortConfig.

Tests: 25 new (12 AZ-396 + 13 AZ-397); full suite 501 passing, 2 skipped.
Contracts unchanged (additive FcConfig fields, range relaxation on
GcsConfig.summary_rate_hz, additive FcKind enum value).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 05:06:56 +03:00
Oleksandr Bezdieniezhnykh 1e0be08e8a [AZ-393] [AZ-394] [AZ-395] C8 outbound chain + AP MAVLink2 signing
AZ-393 ArduPilot outbound: PymavlinkArdupilotAdapter encodes
EstimatorOutput to MAVLink2 GPS_INPUT via gps_input_send; emits
NAMED_VALUE_FLOAT(name="src_lbl") every frame and STATUSTEXT on
source_label transition (1 Hz per-severity cap). Smoothed-output
guard (Invariant 6), single-writer thread (Invariant 8), SPD
propagation. Shared helper _outbound_provenance.py owns the
canonical source-label-to-float table + transition rate-limiter.

AZ-394 iNav outbound: Msp2InavAdapter encodes EstimatorOutput to
hand-rolled MSP2_SENSOR_GPS (0x1F03, 52-byte LE payload via
_msp2_sensor_gps_encoder.py + YAMSPy send_RAW_msg). Secondary
unsigned MAVLink channel for STATUSTEXT transitions. open()
rejects non-None signing_key (RESTRICT-COMM-2 / Invariant 2);
request_source_set_switch raises SourceSetSwitchNotSupportedError
(Invariant 9 verified: never calls setup_signing on secondary).

AZ-395 AP MAVLink2 signing: ephemeral per-flight 32-byte key
from secrets.token_bytes; pymavlink setup_signing handshake at
open(); in-place bytearray zeroisation on close(); mid-flight
signing-failure detection (ERROR log + WARNING STATUSTEXT + no
raise; threshold configurable). Key never logged / persisted /
serialised (regex-scanned by AC-4/AC-5). BUILD_DEV_STATIC_KEY=ON
enables repeatable static-key dev path; rejected at open() when
the build flag is absent.

Shared: EstimatorOutput.smoothed (default False) added for the
Invariant 6 gate at the C8 boundary; FcConfig extended with
dev_static_signing_key + signing_failure_threshold (additive
defaults; cross-field validation in __post_init__).

Tests: 33 new AC tests (11 + 11 + 11) covering all 30 ACs; full
suite 476 passing / 2 skipped / 0 failing (was 443). Contract
surfaces unchanged at fc_adapter_protocol v1.0.0 and
composition_root v1.2.0.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 04:47:44 +03:00
Oleksandr Bezdieniezhnykh a61d2d3f4b [AZ-391] C8 inbound: MAVLink + MSP2 decoders + rings + bus + warm-start
Adds the C8 inbound producer side:
- TelemetryRing[T]: bounded drop-oldest ring; first-overflow INFO log
  + monotonic dropped_count.
- SubscriptionBus + SubscriptionHandle: synchronous fan-out, lock-
  released-before-callback to avoid deadlock; subscriber crash caught
  + DEBUG-logged so one bad subscriber cannot kill the decode loop.
- PymavlinkInboundDecoder: pymavlink-based AP decoder for RAW_IMU,
  SCALED_IMU2, ATTITUDE, GPS_RAW_INT, GPS2_RAW, HEARTBEAT, STATUSTEXT.
  Out-of-order drop (Invariant 7) per-kind WARN. STATUSTEXT spoofing
  sentinel promotes subsequent GPS to GpsStatus.SPOOFED within 5 s.
  AC-5.1 warm-start hint cached on first 3D+ fix; embedded into
  every FlightStateSignal.
- Msp2InavInboundDecoder: YAMSPy-based iNav polling decoder for IMU /
  attitude / GPS / flight-state. signed=False always (RESTRICT-COMM-2);
  GpsStatus.SPOOFED is unreachable on iNav.

Adds yamspy>=0.3.3 + pyserial>=3.5 to pyproject.toml.

Tests: 443 pass / 2 skip / 0 fail (+33 in batch 9).

Contract: no drift on fc_adapter_protocol.md v1.0.0; this batch
implements the inbound producer side without changing signatures.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 04:28:14 +03:00
Oleksandr Bezdieniezhnykh 362e93c626 [AZ-390] [AZ-392] C8 FC/GCS adapter foundation + covariance projector
Adds the C8 foundation:
- FcAdapter / GcsAdapter / ReplaySink Protocols + contract DTOs in
  _types/fc.py (PortConfig, FcKind, FlightState, GpsStatus, Severity,
  TelemetryKind, FcTelemetryFrame, FlightStateSignal, GpsHealth,
  OperatorCommand, Subscription, Imu/Attitude samples).
- Disjoint FcAdapterError / GcsAdapterError trees with
  SourceSetSwitchNotSupportedError <: SourceSetSwitchError per AC-9.
- FcConfig + GcsConfig cross-cutting Config blocks with config-load
  validation (unknown strategy rejected at __post_init__).
- runtime_root/fc_factory.py: build_fc_adapter / build_gcs_adapter
  with BUILD_FC_*/BUILD_GCS_* flag gating + INFO log on load +
  single-writer outbound-thread binding.
- CovarianceProjector (helper, AZ-392): 6x6 -> 3x3 -> 2x2 ->
  sqrt(lambda_max) reduction; AP returns float m, iNav returns int mm
  with uint16 clamp + WARN + FDR record. Non-SPD / NaN / wrong-shape
  raise FcEmitError and emit an FDR ERROR record carrying frame_id.

Contracts:
- composition_root_protocol.md 1.1.0 -> 1.2.0 (added fc/gcs blocks +
  build_fc_adapter / build_gcs_adapter + outbound-thread binding).
- fc_adapter_protocol.md unchanged (this batch implements v1.0.0).

Tests: 410 pass / 2 skip / 0 fail (+53 new tests in batch 8).

AZ-391 (inbound subscription) deferred to batch 9 — pulls YAMSPy as
a new external dependency (iNav MSP2 decode).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 04:17:59 +03:00
Oleksandr Bezdieniezhnykh e4ecdaf619 [AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs
atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a
kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest
tile when the per-flight 64 MiB cap is exceeded. Adds optional
frame_id to the snapshot payload (fdr_record_schema bump).

AZ-295: RecordKindPolicy with two paired gates:
- enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError
  for raw_nav_frame / raw_ai_cam_frame at the call site, defending
  AC-8.5 / RESTRICT-UAV-4.
- gate_for_writer (writer-side) tumbling-window rate-caps
  failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are
  coalesced into kind="overrun" records with the originating
  producer slug.

AZ-296: take_off() composition-root sequence with strict ordering
(writer.__init__ -> start -> open_flight -> fc_adapter.__init__ ->
fc_adapter.open). On FdrOpenError, logs ERROR record, calls
writer.stop(), prints the documented FATAL line to stderr, and
sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped
to v1.1.0 with the new constants + takeoff-sequence section.

29 new tests; full suite 356 passed / 2 skipped / 0 failures.
No new dependencies (stdlib only).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:52:07 +03:00
Oleksandr Bezdieniezhnykh b5dd6031d2 [AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered
FdrClient SPSC ring buffer to per-flight segment files; per-segment
size rotation; cross-process fcntl.flock filelock on flight_root;
ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.

AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight /
close_flight lifecycle methods; four per-flight monotonic counters
(records_written, records_dropped_overrun, bytes_written,
rollover_count) reported by the footer; flight_id mismatch and
close-without-open are typed errors.

AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight
directory, drops the oldest CLOSED segment when total > cap (default
64 GiB), emits a kind="segment_rollover" record per drop. Never drops
the currently-open segment or segment 0 alone; cap_misconfigured path
logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).

Schema: bumped fdr_record_schema flight_header / flight_footer payload
key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no
prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig
nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes,
debug_log_per_record).

Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite
323 passed, 2 pre-existing skips, 0 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:38:58 +03:00
Oleksandr Bezdieniezhnykh 33486588de [AZ-271] [AZ-276] [AZ-278] [AZ-282] Finish cross-cutting helpers + relax opencv pin
E-CC-HELPERS closes with the three remaining Layer-1 helpers and
E-CC-CONF closes with the env > YAML > defaults precedence test
gate. All four tickets ship with frozen public surfaces, hermetic
unit tests, and no upward (components.*) imports.

* AZ-271 — tests/unit/shared/config/test_precedence.py (5 ACs + smoke
  test + helper that names the layer in failure messages).
* AZ-282 — helpers/ransac_filter.py: static RansacFilter +
  RansacResult; cv2.setRNGSeed(0) for byte-equal determinism;
  median residual semantics pinned by contract.
* AZ-276 — helpers/imu_preintegrator.py + make_imu_preintegrator;
  GTSAM PreintegratedCombinedMeasurements; strict-monotonic ts_ns
  guard runs before any state mutation. Adjacent hygiene:
  _types/nav.py ImuSample/ImuWindow now use ts_ns:int and the
  spec-mandated ImuBias dataclass.
* AZ-278 — helpers/lightglue_runtime.py: structural R14 fix.
  LightGlueRuntime + non-blocking concurrent-access guard that
  raises rather than serialising. EngineHandle Protocol in
  _types/manifests.py + KeypointSet/CorrespondenceSet in
  _types/matching.py (Protocol surface adds approved by spec).

Dependency conflict (Finding 1, user-approved): gtsam 4.2 (PyPI) is
numpy-1.x-ABI only; opencv-python>=4.12 needs numpy>=2 at runtime.
Resolution: opencv-python pin relaxed to >=4.11.0.86,<4.12. The
D-CROSS-CVE-1 ratchet at ci/opencv_pin_gate.py is held at 4.11.0
with the original 4.12.0 floor restored once a numpy-2-compatible
gtsam wheel ships. Full replay procedure in
_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md.

Tests: 294 passed, 2 skipped (cmake/actionlint env-skips,
pre-existing). 43 new tests added for batch 5. Ruff check + format
clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:23:33 +03:00
Oleksandr Bezdieniezhnykh ba20c2d195 [AZ-273] [AZ-274] [AZ-275] [AZ-267] [AZ-268] FDR producer chain + log bridge + contract test
AZ-273: lock-free SPSC ring buffer with pre-allocated slots, power-of-
two capacity, opt-in SPSC guard, and EnqueueResult / FdrSpscViolationError
on the public surface. make_fdr_client caches one client per producer_id
and reads capacity from config.fdr.per_producer_capacity with fallback
to queue_size.
AZ-274: default_overrun_policy implements drop-oldest + retry + immediate
marker emission, with prior-marker dropped_count folding via _evict_one
so user-loss info is never lost across iterations. ERROR diagnostic is
rate-limited to <=1/sec per producer.
AZ-275: FakeFdrSink mirrors the FdrClient public surface and reuses the
production default_overrun_policy via a duck-typed _PolicyAdapter. The
test-only records/all_records_ever properties let component tests assert
both in-buffer and lifetime state. tests/conftest.py registers the
fake_fdr_sink fixture and an AST architecture lint forbids production
imports of fakes.
AZ-267: FdrLogBridgeHandler installs on the root logger via wire_log_bridge
and forwards only WARN+ERROR records into the FDR with kind="log".
Thread-local recursion guard short-circuits internal logging; saturated-
queue diagnostics go to stderr every N=1000 drops.
AZ-268: tests/contract/log_schema.py covers every row of the schema's
Test Cases table plus the "DEBUG+INFO never reach FDR" invariant.
pyproject.toml registers the contract pytest marker and the
contract-mandated log_schema.py file-name.
251 unit + contract tests pass (48 new). Review verdict:
PASS_WITH_WARNINGS; findings are NFR-perf deferrals + documented
relaxation of AZ-274 AC-2 coalescing under permanently-stalled consumer.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:00:49 +03:00
Oleksandr Bezdieniezhnykh 3acc7f33dd [AZ-270] [AZ-272] [AZ-279] [AZ-281] [AZ-283] Compose root + FDR schema + 3 Layer-1 helpers
AZ-270: composition root with strategy registry, tier-gated lookup,
topo-order construction, all-or-nothing teardown, StrategyNotLinkedError
payload.
AZ-272: orjson-backed FdrRecord serialise/parse with forward-compat for
unknown payload + top-level fields and canonical overrun-record shape.
AZ-279: pyproj-backed WGS84/ECEF/ENU + OSM slippy-map tile math with
WgsConversionError for shape/range/zoom guards.
AZ-281: strict EngineFilenameSchema build/parse/matches_host with
anchored regex + enum validation; round-trip identity by construction.
AZ-283: dtype-preserving (fp16/fp32) single + batch L2 normaliser with
zero-norm safety and descriptor_metric() source-of-truth.
pyproject.toml pins pyproj>=3.6 and orjson>=3.9 (named-backend deps per
the AZ-272 / AZ-279 contracts). New DTOs LatLonAlt + BoundingBox and
EngineCacheKey + HostCapabilities land in _types/ to back the helper
contracts.
203 unit tests pass (64 new). Review verdict: PASS_WITH_WARNINGS;
findings are perf-NFR deferrals + dep amendment + minor docstring polish.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 02:03:36 +03:00
Oleksandr Bezdieniezhnykh 8e71f6c002 [AZ-266] [AZ-269] [AZ-277] [AZ-280] Cross-cutting log/config + SE3/SHA256 helpers
AZ-266: schema-compliant JSON logging entrypoint, level normalisation,
handler-topology guard, format-error fallback (log_record_schema v1.0.0).
AZ-269: env > YAML > defaults config loader, frozen Config dataclass,
missing-var fail-fast with pointer to .env.example, component-block registry.
AZ-277: GTSAM-backed SE3Utils (matrix<->SE3 + exp/log/adjoint) with strict
orthogonality, dtype, and bottom-row contract enforcement.
AZ-280: atomicwrites-backed write_atomic + independent verify +
order-deterministic aggregate_hash; sidecar format strictness.
pyproject.toml pins gtsam>=4.2,<5.0 and atomicwrites>=1.4,<2.0
(named-backend deps per the AZ-277 / AZ-280 contracts).
139 unit tests pass (44 new). Review verdict: PASS_WITH_WARNINGS;
findings are perf-NFR + journald deferrals, no blocking issues.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 01:33:42 +03:00
Oleksandr Bezdieniezhnykh b12db61444 [AZ-263] Bootstrap: repo skeleton + Docker + CI + Alembic + Tier-1 tests
Implements the AZ-263 / E-BOOT initial structure task:

- Python src/-layout package `gps_denied_onboard/` with per-component
  interface stubs (14 components), type-only DTOs under `_types/`,
  shared helpers under `helpers/` (R14 LightGlue ownership), structured
  JSON logging, runtime composition root with env-var fail-fast gate,
  healthcheck module shared by Docker and CI smoke.
- CMake top-level + `cmake/{build_options,dependencies,strategies}.cmake`
  with the BUILD_* per-binary flags (ADR-002) and pinned external git
  refs for OKVIS2 / VINS-Mono / GTSAM / FAISS / OpenCV >=4.12.0.
- Three Dockerfiles (companion-tier1, operator-tooling,
  mock-suite-sat-service) + two compose files (dev + Tier-1 test).
- Four GitHub Actions workflows: ci.yml (lint/unit/integration/dual
  binary build/SBOM diff/security), ci-tier2.yml (self-hosted Jetson
  AC-bound NFTs), release.yml, cve-rescan.yml.
- Two CI gate scripts: `ci/sbom_diff.py` (deployment SBOM subset +
  R02 exclusion), `ci/opencv_pin_gate.py` (>=4.12.0 enforcement,
  D-CROSS-CVE-1).
- Alembic-driven Postgres 16 initial migration `0001_initial.py`
  mirroring satellite-provider tiles + flights + sector_classifications
  + manifests + engine_cache_entries (data_model.md s 2).
- Tier-1 test scaffolding: 95 passing unit tests covering every AC,
  per-component smoke tests, structured logging JSON output check,
  env-var gate check, healthcheck import check. Two CI-gated tests
  (cmake configure, actionlint) skip locally with explicit reasons.
- Batch report + code review report under `_docs/03_implementation/`.

Verdict: PASS_WITH_WARNINGS (two Low findings, both informational).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 01:00:28 +03:00
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00
Oleksandr Bezdieniezhnykh 8171fcb29e [AZ-263] [AZ-264] [AZ-265] Decompose: layout, helpers epic, replay epic
Decompose Step 1 + Step 1.5 + new cycle-1 epics:

- Step 1 (Bootstrap): AZ-263 spec at _docs/02_tasks/todo/. Single
  top-level Python package src/gps_denied_onboard/ + nested
  components/ subpackage per user feedback (replaces earlier
  src/gps_denied/ + sibling src/components/ split).
- Step 1.5 (Module Layout): _docs/02_document/module-layout.md is
  the file-ownership map consumed by /implement Step 4. Covers all
  14 components + cross-cuttings (_types, config, logging,
  fdr_client, helpers x8, frame_source, clock, runtime_root,
  cli/replay, healthcheck), 5-layer layering, and the Build-Time
  Exclusion Map for all 4 binaries (airborne, research,
  operator-tooling, replay-cli).
- New epic AZ-264 (E-CC-HELPERS): re-homes the 8 shared helpers
  from per-component child-issues into a single cross-cutting
  epic per the decompose skill cross-cutting rule. R14
  (LightGlue circular dep) is structurally prevented because
  both C2.5 and C3 import gps_denied_onboard.helpers.lightglue_runtime.
- New epic AZ-265 (E-DEMO-REPLAY): offline replay mode (video +
  tlog -> per-tick coordinate stream). 8 child tasks, 27-32 pts.
  Reuses C8 FcAdapter via TlogReplayFcAdapter strategy + new
  VideoFileFrameSource + JsonlReplaySink + compose_replay
  composition root + gps-denied-replay CLI + auto-sync via IMU
  take-off detection (per how_to_test.md). NO ROS dependency.
- Plan Final report at FINAL_report.md.
- _autodev_state.md updated with handoff notes for Step 2
  execution in a fresh chat (~290 MCP calls expected; epic
  ordering documented).

Step 2 task PLAN approved (97 implementation tasks across 18
epics) but EXECUTION deferred per user choice to a fresh chat.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-10 03:14:42 +03:00
Oleksandr Bezdieniezhnykh 64542d32fc Update autodev state, architecture documentation, and glossary terms
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
2026-05-10 00:21:34 +03:00
Oleksandr Bezdieniezhnykh 723f574b14 Update autodev state and glossary definitions
Modified the autodev state to transition to phase 10, updating the sub-step name and details to reflect the latest architectural review changes. Enhanced the glossary entry for VioStrategy to clarify its functionality, build-time exclusions, and implications for deployment and research binaries, ensuring alignment with recent architectural decisions.
2026-05-09 04:53:38 +03:00
Oleksandr Bezdieniezhnykh c19c76481c Update autodev skill documentation and acceptance criteria
Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
2026-05-09 03:10:57 +03:00
Oleksandr Bezdieniezhnykh 846670a5c5 Refactor documentation for splittable artifacts and update references
Updated various documentation files to clarify the handling of splittable artifacts, allowing for folder equivalents of key markdown files when they exceed size limits. Adjusted references in multiple sections to reflect this new structure, ensuring consistency across the research methodology. Enhanced clarity on the saving actions and artifact organization, particularly for `01_source_registry.md`, `02_fact_cards.md`, and `06_component_fit_matrix.md`. This change aims to improve usability and maintainability of the research documentation.
2026-05-08 23:39:30 +03:00