Wraps HttpTileUploader (AZ-319) with two bounded retry budgets:
- In-call (per-batch) — re-invokes inner on PARTIAL outcome up to
`max_in_call_retries` times with capped exponential backoff
(`min(base ** attempt_number, cap)`). On exhaustion: surfaces an
operator hint via `next_retry_at_s = now + backoff_cap_s`.
- Per-tile (cross-call) — atomically increments c6's
`tiles.upload_attempts` counter for every rejection; once a tile
hits `max_per_tile_attempts` it is forward-only transitioned to
`voting_status = upload_giveup` (excluded from `pending_uploads`).
Each transition emits FDR `kind="c11.upload.giveup"` plus an
ERROR log.
C6 contract changes (AZ-303 v1.3.0):
- VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED).
- TileMetadataStore.increment_upload_attempts(tile_id) -> int added
with NotImplementedError default for backwards-compat.
- Migration 0003_c11_upload_attempts: additive column +
widened ck_tiles_voting_status (preserves IS NULL clause).
C11 wiring:
- C11RetryConfig + disable_retry_decorator on C11Config.
- build_tile_uploader wraps in decorator by default; bypass flag
returns the bare HttpTileUploader. New `clock` keyword.
Cross-component isolation honoured (AZ-507): the decorator declares
`_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore
and references `UPLOAD_GIVEUP` via a local string constant — no c6
imports.
Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum
update + alembic head bump + AZ-272 schema fixture. 238 passed across
c11/c6/fdr suites; pre-existing perf microbenches unrelated.
Code review: PASS_WITH_WARNINGS (5 Low/Informational findings,
docs-level or downstream-CI-blocked). See
_docs/03_implementation/reviews/batch_41_review.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
Production-default DescriptorIndex strategy backed by the faiss-cpu
PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end
to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild
(.index + .sha256 sidecar + .meta.json), triple-consistency load
check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional
warm-up query at construction, FAISS RuntimeError rewrap to
IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config
classmethod wired into runtime_root.storage_factory.
The original spec required a custom pybind11 wrapper over a vendored
FAISS HEAD; the user opted for the upstream faiss-cpu wheel after
research fact #92 confirmed ARM64 wheel availability for Jetson and
the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/
placeholder removed; BUILD_FAISS_INDEX flag retained as a
runtime/factory gate (no native target). Spec rewritten end-to-end and
archived to _docs/02_tasks/done/.
C6TileCacheConfig extended with faiss_index_path and
faiss_warmup_query_path fields. tests/conftest.py sets
KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp
duplicate-load abort during pytest (no-op on CI Linux). 21 new tests
cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance
fake updated with from_config classmethod.
Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1
pre-existing OKVIS2 submodule failure unrelated.
Doc sync: module-layout.md, components/08_c6_tile_cache/description.md
§5, batch_35_cycle1_report.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.
AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.
AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.
Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).
Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.
Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.
AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.
Co-authored-by: Cursor <cursoragent@cursor.com>
Land the C10 per-model engine compile + cache-reuse orchestrator.
`EngineCompiler.compile_engines_for_corpus(request)` walks the
corpus, computes the canonical engine filename via AZ-281
`EngineFilenameSchema.build`, and either reuses the cached binary
(cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates
to the AZ-297 `compile_engine` on the injected runtime (cache miss;
the runtime owns the write path). Returns one `EngineCompileResult`
per backbone carrying the canonical `EngineCacheEntry`, outcome
(BUILT / REUSED), and `compile_duration_s` (None on reuse).
Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename
schema — a host change rebuilds at the new path and leaves the old
files untouched (AC-4).
Design corrections vs. the task spec body:
- The spec proposed a c10-local `EngineCacheEntry` carrying outcome
and duration; that name is already taken by the AZ-297 canonical
DTO. The wrapper is renamed `EngineCompileResult`; the canonical
shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in
the AZ-297 Protocol. `HostCapabilities` is threaded through
`EngineCompileRequest` instead so the composition root owns host
probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule
`test_az270_compose_root.test_ac6`). `engine_compiler.py` defines
`CompileEngineCallable` — a structural Protocol cut of
`InferenceRuntime` exposing only `compile_engine` — and catches
broad `Exception` (re-raising preserves the original type;
`error_class` is recorded in the ERROR log payload).
Production
- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`,
`EngineCompileRequest`, `EngineCompileResult`,
`EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol;
`EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig`
(`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate
positive shape dims and duplicate model_name detection in
`__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires
the existing `build_inference_runtime` factory through;
`build_backbone_specs(config)` materialises the `BackboneSpec`
tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321
surface and registers the new config block.
Tests
- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar
sibling case for AC-5. Tier-1 via fake runtime that writes through
the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2
placeholders for the cache-hit p99 NFR (200 MB engine sweep) and
kill-during-compile atomic-write NFR.
Docs
- module-layout.md: c10_provisioning Per-Component Mapping lists the
new internal modules (engine_compiler.py, config.py), the
composition-root c10_factory.py, the AZ-321 public re-export
surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md:
PASS_WITH_WARNINGS (4 Low findings accepted).
Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined
unit suite (excluding pending components) 543 passing, 21
env-skipped.
Co-authored-by: Cursor <cursoragent@cursor.com>
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05:
when the TRT-direct path (AZ-298) cannot deserialise a cached engine
or when the operator explicitly selects ORT, the system stays in the
air at degraded latency rather than dropping the request. Conforms to
the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep".
Production
- onnx_trt_ep_runtime.py: compile_engine is a no-op returning an
EngineCacheEntry pointing at the source .onnx; deserialize_engine
is gate-first for .engine entries and gate-skip for .onnx, builds
an ORT InferenceSession with the provider list
[TensorrtExecutionProvider, CUDAExecutionProvider,
CPUExecutionProvider], stages cached engines into the ORT TRT EP
cache directory via symlink-or-copy, warms up with one session.run
after construction, and honours config.inference.ort_disallow_cpu_
fallback by raising EngineDeserializeError when the active provider
resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep
WARN log plus gcs_alert callback on first call when is_fallback=
True; release_engine is idempotent. _build_provider_args is the
single point that pins TRT EP option-key names (Risk-3) and caps
trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8).
- config.py: adds ort_trt_cache_dir (validated non-empty) and
ort_disallow_cpu_fallback to C7InferenceConfig.
- fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and
c7.cpu_fallback FDR record kinds.
Tests
- test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback
alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1
via fake ORT session; Tier-2 placeholders skip on macOS dev for
numerical FP16 comparison and session-creation perf NFR.
- test_protocol_conformance.py: drops onnx_trt_ep from the missing-
module parametrize now that the module ships.
- test_az272_fdr_record_schema.py: extends per-kind fixture builder
to cover the two new C7 FDR kinds in the roundtrip / schema-version
AC tests.
Docs
- module-layout.md: replaces the pending onnx_trt_runtime row with
the shipped onnx_trt_ep_runtime row + capabilities list.
- batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch
+ self-review (PASS_WITH_WARNINGS, 4 Low findings accepted).
Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit
suite (excluding pending components) 529 passing, 19 env-skipped.
Co-authored-by: Cursor <cursoragent@cursor.com>
Implement the production-default InferenceRuntime strategy on JetPack
6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT
lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig
hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with
EngineGate-first ordering and a pre-allocation GPU memory budget gate,
infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA
stream, idempotent release_engine, and an injected
ThermalStatePublisher delegation for thermal_state.
INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a
.calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new
.calib_cache.dataset_sha256 sidecar that records the dataset content
hash at compile time; reuse only when both agree, rebuild silently on
dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar
(never silently overwritten).
GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any
TRT call beyond the gate (AC-6); a pre-allocation refusal raises
OutOfMemoryError and leaves the resident state unchanged.
TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the
methods that need them so the module loads cleanly on Tier-0 hosts.
A standalone CLI entry (python -m
gps_denied_onboard.components.c7_inference.tensorrt_runtime compile
<onnx> <build_config.json>) is wired for C10 CacheProvisioner
(AZ-321) to invoke pre-flight without holding a runtime instance.
C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and
trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated
in __post_init__.
Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5
/ AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1
via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize
placeholders skip with prerequisite reason and live in the AZ-298
Tier-2 microbench harness. Code review verdict
PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied).
Co-authored-by: Cursor <cursoragent@cursor.com>
Land F1+F2+F3 from the PASS_WITH_WARNINGS cumulative review of
batches 28-30 (AZ-305 / AZ-307 / AZ-308) before continuing to
batch 31. All three are bounded by the c6_tile_cache component;
no public API contract change beyond the new error re-export.
F1 (Medium / Architecture):
Re-export CacheBudgetExhaustedError from c6_tile_cache package
__init__ so consumers can catch the AZ-308 budget-exhaustion
variant without widening to TileCacheError (which drops the
needed_bytes / available_bytes / evicted_count diagnostics).
F2 (Medium / Architecture):
Refresh the c6_tile_cache section of module-layout.md so the
Public API line and the Internal-files list reflect what is
actually on disk after batches 28-30 (drop the stale
Tile / TileRecord / connection.py entries; add the AZ-305
postgres_filesystem_store + tools.py, AZ-307 freshness_gate,
AZ-308 cache_budget_enforcer entries; pivot the Public API
bullet to the __init__.__all__ as canonical, mirroring the
c7_inference section format).
F3 (Low / Maintainability):
Promote the triplicate intra-module _iso_ts_now() helper into
a single c6_tile_cache._timestamp.iso_ts_now and import it
from postgres_filesystem_store, freshness_gate, and
cache_budget_enforcer. FDR record envelope ts format now has
one source of truth.
Test impact:
tests/unit/c6_tile_cache: 105 passed, 57 skipped (pre-existing
Docker-compose skip markers). No new tests required for F1/F2
(re-export + doc) and F3 (pure refactor; existing tests assert
FDR record shape, not the helper symbol).
Autodev state advanced to awaiting-invocation; next session
resumes greenfield Step 7 at batch 31 (AZ-298 TensorrtRuntime).
Co-authored-by: Cursor <cursoragent@cursor.com>
CacheBudgetEnforcer.reserve_headroom(needed_bytes) returns immediately
when total_disk_bytes() + needed_bytes <= budget, otherwise iterates
lru_candidates in eviction_batch_size batches, deletes via delete_tile,
emits one INFO log per evicted tile (c6.evicted) and one FDR record per
eviction batch (c6.eviction_batch, evicted_tile_ids capped to 5).
Raises CacheBudgetExhaustedError AFTER a full sweep if the budget
cannot be met. BudgetEnforcedTileStore decorates a TileStore so the
policy stays separable from PostgresFilesystemStore. Composition root
in storage_factory.build_tile_store wires the wrapper unconditionally.
PostgresFilesystemStore now accepts lru_clock: Clock | None = None;
when set, read_tile_pixels calls record_lru_access(tile_id, now) so
eviction picks the right LRU candidates. Production wiring injects
WallClock(); AZ-305 unit tests still construct without the clock and
keep their pass-through semantics. Contract tile_store.md bumped to
v1.1.0 to add CacheBudgetExhaustedError to the TileCacheError family;
shared FDR schema bumped to v1.3.0 for the new c6.eviction_batch kind.
Co-authored-by: Cursor <cursoragent@cursor.com>
Replaces the AZ-305 pass-through _evaluate_freshness hook with the
production FreshnessGate. Loads tile_freshness_rules + sector
classifications once at construction, builds an rtree index, and on
every evaluate() either returns metadata unchanged (FRESH), stamps
freshness_label=DOWNGRADED (stable_rear + stale), or raises
FreshnessRejectionError carrying tile_id / age_seconds /
classification / rule diagnostics (active_conflict + stale).
Constructed inside PostgresFilesystemStore.from_config; the public
storage_factory signature is preserved so AZ-305 unit tests still
build the store with freshness_gate=None for the pass-through path.
FDR schema bumped to v1.2.0: adds c6.freshness.rejected and
c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them
opaquely). Operator CLI `python -m c6_tile_cache.freshness_gate
explain` dry-runs the decision for a (lat, lon, capture_ts).
Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes
os.environ to load_config (AZ-305 regression — load_config requires
the env mapping).
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds the production PostgresFilesystemStore implementing both protocols
in a single class. Filesystem-backed JPEG I/O (atomic sidecar write,
read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting,
upload bookkeeping). Wires composition via `from_config` classmethod.
Key behaviors:
- AC-3 strict reading: INSERT runs first inside an open transaction;
duplicate-key collisions raise `TileMetadataError` BEFORE any byte is
written, leaving the original file + sidecar byte-identical. Atomic
sidecar write happens inside the same transaction; commit closes it.
Comp-delete remains as a safety net for the rare commit-after-write
failure path.
- AC-2 content-hash gate runs before any I/O.
- Construction performs an orphan-file reconciliation scan and emits an
INFO `c6.store.construct` log with steady-state stats.
Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0,
forward-compatible) and a thin operator CLI at
`c6_tile_cache.tools dump` for inspection.
Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used
on the F3 read-hot path.
Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus
MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes
(1215 passed, 8 skipped, 1 pre-existing unrelated failure
`test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS,
Jetson-only).
Co-authored-by: Cursor <cursoragent@cursor.com>
Cumulative review of batches 23-27 (cycle 1) surfaced three Medium
documentation-drift findings on module-layout.md. All three fixed
inline per user direction:
F1: c7_inference Internal list expanded with architecture_registry,
config, engine_gate, errors, manifest, thermal_publisher (added
across AZ-300/301/302).
F2: c6_tile_cache `connection.py` re-attributed from AZ-304 (which
deferred it) to AZ-305 with a "planned, not landed yet" tag.
F3: c7_inference Public API description rewritten by category
(Protocol + DTOs + component services + config + error family)
with a pointer to __init__.py's __all__ for the canonical list.
Cumulative review report: _docs/03_implementation/cumulative_review_
batches_23-27_cycle1_report.md (PASS_WITH_WARNINGS).
Autodev state moved to status: paused_user_requested per user
choice; /autodev will resume at greenfield Step 7 (next batch
selection) on next invocation.
Co-authored-by: Cursor <cursoragent@cursor.com>
Strictly additive Alembic migration on the AZ-263 baseline (data_model
.md § 6.1 / § 6.3): six new tiles columns (tile_uuid UNIQUE,
location_hash, content_sha256, disk_bytes, accessed_at, uploaded_at),
four new btree indices, one UNIQUE expression index over the
COALESCE-zero-uuid natural key, CHECK widening of
ck_tiles_freshness_status to the AZ-263 + AZ-303 vocabulary UNION,
four NULLable bbox columns on sector_classifications, and a new
tile_freshness_rules table seeded with the two default thresholds.
Pinned UUIDv5 namespace (TILE_NAMESPACE_UUID =
5b8d0c2e-1a4f-4b3a-8c9d-e7f6a3b2c1d0) + derive_tile_id /
derive_location_hash helpers cross-coordinated with
satellite-provider. Migration runner apply_migrations(config) drives
Alembic command.upgrade("head") against the AZ-263 env with one
retry on PG SQLSTATE 40001 and structured INFO logs on apply / no-op.
Contract bump tile_metadata_store.md v1.1.0 -> v1.2.0 adds
TileMetadata.location_hash: UUID | None = None (non-breaking).
module-layout.md updated so c6_tile_cache explicitly Owns
db/migrations/**.
Tier-1 tests: UUIDv5 determinism + locked vectors + DSN resolution +
retry mocked DBAPIError -> 1180 passed, 32 skipped. Tier-2 docker
schema tests gated by @pytest.mark.docker run against the existing
docker-compose.test.yml db service.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Changed autodev state to reflect the transition from batch 26 to batch 27, updating the phase and details for the compute-batch step.
- Incremented the version of the tile metadata store from 1.0.0 to 1.1.0, refining the uniqueness invariant to use a natural key that includes flight_id, allowing coexistence of multiple rows for the same tile from different flights.
- Updated the last modified date in the tile metadata store documentation to reflect recent changes.
Co-authored-by: Cursor <cursoragent@cursor.com>
Implements the production-default ReRankStrategy: K=10 → N=3 by
single-pair LightGlue inlier count, with strict drop-and-continue
(INV-8) on per-candidate TileFetch / backbone / zero-inlier failures
and RerankAllCandidatesFailedError on zero survivors. Composition
root injects the shared LightGlueRuntime + Clock + the new
FeatureExtractor helper (an L1 placeholder OpenCvOrbExtractor that
unblocks AZ-343 and future C3 strategies — task scope expansion).
Architectural notes:
- Cross-component imports stay banned; tile_store types as `object`
and the C6 TileCacheError family is duck-typed by class module
prefix (same workaround AZ-348 adopted for c7_inference; proper
fix is to relocate TileCacheError to _types/ in a follow-up).
- Clock injection follows the replay contract (AZ-398 Invariant 2);
reranked_at is sourced from clock.monotonic_ns().
- AZ-342 factory grew `feature_extractor` + `clock` + `fdr_client`
parameters; existing AZ-342 conformance tests updated.
Tests: 19 new AC-1..AC-12 + mixed-failure scenarios in
test_inlier_count_reranker.py; existing AZ-342 suite (26) still
green. Full repo sweep 1093 passed / 2 skipped (cmake/actionlint
not on PATH).
Co-authored-by: Cursor <cursoragent@cursor.com>
Defines the public `ConditionalRefiner` Protocol (PEP 544
@runtime_checkable, two methods: `refine_if_needed` +
`was_invoked`), extends `MatchResult` in-place with two
default-valued refinement fields (`refinement_label`,
`refinement_added_latency_ms`), defines the `RefinerError` family
(`RefinerBackboneError`, `RefinerConfigError`), and ships the
trivial `PassthroughRefiner` reference impl.
Both refiner strategies are linked unconditionally — no
`BUILD_REFINER_*` flag (NOT ADR-002 territory). Runtime selection
only per ADR-001. `PassthroughRefiner` returns the input
`MatchResult` by reference (bit-identical correspondences per
contract INV-5) and always reports `was_invoked() is False`.
Documentation: renames `module-layout.md` `c3_5_adhop` Public API
symbol from `AdHoPRefinementStrategy` to `ConditionalRefiner`
(AC-14) so the doc agrees with `description.md` and the contract.
AC-9 (single-thread binding) deferred to AZ-270 runtime-root
composition, mirroring AZ-336 / AZ-342 / AZ-344 Risk-4 precedent.
AC-7 for the `"adhop"` strategy stops at `ModuleNotFoundError`
because the AdHoP backbone is owned by AZ-349. All other ACs +
NFRs covered by 36 new conformance tests.
Architectural note: `PassthroughRefiner.inference_runtime` is
typed as `object` because the L3→L3 import ban
(`test_az270_compose_root`) forbids c3_5_adhop from importing
c7_inference; the runtime-root factory narrows the type at
construction time.
Co-authored-by: Cursor <cursoragent@cursor.com>
Defines the public `CrossDomainMatcher` Protocol (PEP 544
@runtime_checkable, two methods: `match` + `health_snapshot`),
the three frozen+slotted DTOs (`CandidateMatchSet`, `MatchResult`,
`MatcherHealth`) in the L1 `_types/matcher.py` layer, the
`MatcherError` family (`MatcherBackboneError`,
`InsufficientInliersError`), and the composition-root
`build_matcher_strategy` factory with lazy-import +
`BUILD_MATCHER_<variant>` gating per ADR-002.
`RollingHealthWindow` accumulator (60 s, amortised O(1) update,
strict O(1) snapshot) is constructed by the factory and injected
into every concrete matcher so all backbones share window
semantics; this is what backs C5's spoof-promotion gate.
Legacy placeholder `MatchResult` removed from `_types/matching.py`;
import-only consumers (`c4_pose.interface`, `c3_5_adhop.interface`)
repointed at the new `_types/matcher.py` home — zero behavioural
change to those components.
AC-9 (single-thread binding) and AC-10 (LightGlueRuntime
identity-share with C2.5) deferred to AZ-270 runtime-root
composition, mirroring the AZ-342 Risk-4 escape clause. All other
ACs + NFRs covered by 70 new conformance tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
Foundational scaffolding for the InlierCountReRanker (AZ-343) and
the future C3 CrossDomainMatcher consumer (AZ-344). No concrete
re-ranker is implemented here.
* ReRankStrategy Protocol (single rerank(frame, vpr_result, n,
calibration) -> RerankResult method) with all 8 invariants in the
docstring — notably INV-8 drop-and-continue (per-candidate failure
NEVER propagates unless every candidate fails).
* DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult;
frozen+slots; tuple-not-list for RerankResult.candidates; tile_id
encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any
c6_tile_cache (L3) import per module-layout.md.
* Error family: RerankError + RerankBackboneError +
RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError
escapes rerank(); RerankBackboneError is caught inside the per-
candidate loop, logged ERROR, FDR-stamped, candidate dropped.
* C2_5RerankConfig (strategy enum default "inlier_count", top_n int
default 3) with strict validation at load; registered into
Config.components on c2_5_rerank import.
* build_rerank_strategy(config, *, tile_store, lightglue_runtime)
factory: 1-strategy resolution table, lazy import,
BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError
mapping. The shared LightGlueRuntime is constructor-injected
(R14 fix: neither C2.5 nor C3 owns its lifecycle).
Renamed the Protocol from the existing stub "RerankStrategy" to
"ReRankStrategy" to match the contract; updated module-layout.md.
Removed the legacy RerankResult shape from _types/vpr.py — the
v1.0.0 shape lives in _types/rerank.py.
Excluded per task spec:
* Concrete InlierCountReRanker (AZ-343).
* C3 matcher protocol task (AZ-344, next in batch).
* AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share
between C2.5/C3 — deferred per task spec Risk 3 until the generic
compose_root thread-binding registry and the C3 factory both land.
Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in
tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy
smoke test is removed. Full sweep: 997 passed (one pre-existing
flake in test_az296_takeoff_abort, subprocess timing, unrelated to
this commit; passes in isolation).
Co-authored-by: Cursor <cursoragent@cursor.com>
Foundational scaffolding for every concrete C2 backbone (UltraVPR,
NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340)
and the C2.5 ReRanker consumer side. No backbone is implemented here.
* VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim)
+ BackbonePreprocessor C2-internal Protocol (NOT in Public API per
description.md § 6).
* DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all
frozen + slots; tuple-not-list for VprResult.candidates so the
immutability invariant truly holds.
* Error family: VprError + VprBackboneError + VprPreprocessError +
IndexUnavailableError; same-named but namespace-distinct from
c6_tile_cache.IndexUnavailableError (the c2 family is the closed
envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form).
* C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path)
with strict validation at load; registered into Config.components on
c2_vpr import.
* build_vpr_strategy factory with 7-strategy resolution table, lazy
import, BUILD_VPR_<variant> gating, ImportError→
StrategyNotAvailableError mapping, and pre-flight descriptor_dim
match against DescriptorIndex.descriptor_dim() — mismatch fires
ConfigError at startup, NOT at first frame.
Contract change vs the v1.0.0 draft: factory takes descriptor_index:
DescriptorIndex (not tile_store: TileStore) because descriptor_dim()
lives on DescriptorIndex per C6's Public API. The contract markdown
is updated to match.
Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple,
keeping _types/ (L1) free of any c6_tile_cache (L3) import per
module-layout.md. Consumers reconstruct TileId at the C6 boundary.
Excluded per task spec:
* Concrete backbones (AZ-337..AZ-340).
* FAISS HNSW retrieve wiring (AZ-341).
* DescriptorNormaliser helper (AZ-283, already shipped).
* AC-9 single-thread binding — deferred per task spec Risk 4 until the
generic compose_root thread-binding registry is in place (today
each factory owns its own, e.g. fc_factory).
Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py
covering AC-1..AC-8, the error family, the config validation, the
factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full
sweep 973 passed, 2 skipped (CI-only cmake / actionlint).
Co-authored-by: Cursor <cursoragent@cursor.com>
Add operator warm-start path to C5 StateEstimator Protocol and both
implementations (GtsamIsam2StateEstimator, EskfStateEstimator), plus
the third clause of the AZ-385 spoof-promotion gate.
- StateEstimator Protocol: set_takeoff_origin(origin, sigma_horiz_m,
sigma_vert_m) -> None.
- iSAM2: PriorFactorPose3 at origin with diagonal sigmas, single
isam2.update().
- ESKF: zero _nominal_pos, overwrite _P position block with sigma**2.
- SourceLabelStateMachine.process_gps_sample bounded-delta clause:
WgsConverter.horizontal_distance_m vs smoother estimate; reject
resets the dwell-time counter so AZ-385 cannot re-promote off bad
GPS.
- New EstimatorAlreadyStartedError (StateEstimatorConfigError
subclass) on late call after first add_*.
- C5StateConfig: spoof_promotion_bounded_delta_m=200,
default_takeoff_origin_sigma_horiz_m=5,
default_takeoff_origin_sigma_vert_m=10.
- New GpsSample DTO + WgsConverter.horizontal_distance_m helper.
- 4 new FDR kinds (cold_start_origin.{set,unavailable},
gps_bounded_delta.{accept,reject}) registered in AZ-272 schema.
- 33 new unit tests cover AC-1..AC-15; full repo 750 passed / 2
skipped (pre-existing CI tooling skips).
Docs synced: protocol contract, C5 component description,
architecture, glossary, system-flows, C10 provisioning description.
Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs
atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a
kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest
tile when the per-flight 64 MiB cap is exceeded. Adds optional
frame_id to the snapshot payload (fdr_record_schema bump).
AZ-295: RecordKindPolicy with two paired gates:
- enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError
for raw_nav_frame / raw_ai_cam_frame at the call site, defending
AC-8.5 / RESTRICT-UAV-4.
- gate_for_writer (writer-side) tumbling-window rate-caps
failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are
coalesced into kind="overrun" records with the originating
producer slug.
AZ-296: take_off() composition-root sequence with strict ordering
(writer.__init__ -> start -> open_flight -> fc_adapter.__init__ ->
fc_adapter.open). On FdrOpenError, logs ERROR record, calls
writer.stop(), prints the documented FATAL line to stderr, and
sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped
to v1.1.0 with the new constants + takeoff-sequence section.
29 new tests; full suite 356 passed / 2 skipped / 0 failures.
No new dependencies (stdlib only).
Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-291 — FileFdrWriter: single writer thread draining every registered
FdrClient SPSC ring buffer to per-flight segment files; per-segment
size rotation; cross-process fcntl.flock filelock on flight_root;
ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.
AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight /
close_flight lifecycle methods; four per-flight monotonic counters
(records_written, records_dropped_overrun, bytes_written,
rollover_count) reported by the footer; flight_id mismatch and
close-without-open are typed errors.
AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight
directory, drops the oldest CLOSED segment when total > cap (default
64 GiB), emits a kind="segment_rollover" record per drop. Never drops
the currently-open segment or segment 0 alone; cap_misconfigured path
logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).
Schema: bumped fdr_record_schema flight_header / flight_footer payload
key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no
prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig
nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes,
debug_log_per_record).
Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite
323 passed, 2 pre-existing skips, 0 regressions.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.
State file updated to greenfield Step 7 (Implement), not_started.
Co-authored-by: Cursor <cursoragent@cursor.com>
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
Modified the autodev state to transition to phase 10, updating the sub-step name and details to reflect the latest architectural review changes. Enhanced the glossary entry for VioStrategy to clarify its functionality, build-time exclusions, and implications for deployment and research binaries, ensuring alignment with recent architectural decisions.
Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
- Modified the Docker Compose configuration to include an input root for replay tests and added an environment variable for enabling SITL.
- Enhanced documentation for various testing processes, including the addition of a Runtime Completeness Decomposition Gate and clarifications on internal module testing requirements.
- Updated the implementation completeness report to reflect the current state and added new test cases for performance and resilience scenarios.
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep VIO package and native bridge paths backend-neutral so BASALT remains an implementation choice rather than a component boundary.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.