Wires the airborne composition root for replay-as-configuration (ADR-011):
- compose_root(config) branches on config.mode in {"live", "replay"}.
Live behaviour is unchanged; replay builds ReplayInputAdapter,
attaches JsonlReplaySink, and injects NoopMavlinkTransport.
- New private module runtime_root/_replay_branch.py holds the
replay-only strategy graph + build-flag gate + calibration loader.
- Config gains Config.mode (Literal["live","replay"]) plus
Config.replay sub-block with nested ReplayAutoSyncConfig that mirrors
the AZ-405 AutoSyncConfig DTO; YAML loader + ENV map updated.
Absorbs the AZ-400 transport-seam retrofit that AZ-401 strictly
required but AZ-400 had not delivered:
- New MavlinkTransport Protocol (write/bytes_written/close).
- NoopMavlinkTransport (replay; build-flag gated, idempotent close,
thread-safe byte counter).
- SerialMavlinkTransport (live, no-op restructure of existing pymavlink
byte path; encoder retrofit to actually USE it is the AZ-558
follow-up).
AZ-401 AC-9 (NoopMavlinkTransport.bytes_written > 0 after C8 encoders
run) is BLOCKED on AZ-558 — the encoder routing retrofit is out of
the AZ-401 task envelope (FORBIDDEN files: pymavlink_ardupilot_adapter,
msp2_inav_adapter). AZ-558 spec, batch_61_review.md, and the test's
@pytest.mark.skip rationale all carry the deferral reason.
Tests: 22 compose_root replay-branch tests + 17 transport tests.
Full regression: 2063 passed, 86 environment-skips, 1 documented
skip (AC-9 / AZ-558), 1 pre-existing flaky perf test deselected.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds the Layer-4 cross-cutting `replay_input/` module per ADR-011:
ReplayInputAdapter converges (video, tlog) into the standard
FrameSource + FcAdapter + Clock surfaces the airborne composition
root consumes. Owns time-alignment between video frames and tlog
IMU/attitude ticks (manual via --time-offset-ms or auto via the
AZ-405 IMU-take-off detector + Farneback motion-onset detector).
Auto-sync algorithm (auto_sync.py):
- Tlog take-off detector: sustained vertical-accel excess > 0.5 g for
>= 0.5 s + sustained attitude-rate magnitude > 1 rad/s.
- Video motion-onset detector: dense Farneback flow magnitude > 1.5 px
sustained >= 0.5 s (deterministic per AC-10).
- compute_offset combines the two; confidence = min(tlog, video).
- validate_offset_or_fail implements the AC-9 95 % frame-window match
validator with configurable threshold + window.
ReplayInputAdapter.open() ordering (AC-13):
1. Load tlog samples + fail-fast on missing RAW_IMU/SCALED_IMU2 or
ATTITUDE BEFORE any video read.
2. Resolve offset (auto-sync OR manual override; manual bypasses the
detectors entirely per AC-8).
3. Run AC-9 validator on resolved offset; raise auto-sync hard-fail
for AC-7 (CLI exit 2 mapping).
4. Build single Clock instance per pace (TlogDerived/ASAP, Wall/REAL).
5. Construct VideoFileFrameSource and TlogReplayFcAdapter with the
resolved offset baked in (replay protocol Invariant 8).
Structured log + FDR records on auto-sync detected / low-confidence /
AC-8 hard-fail kinds. Idempotent close (AC-12).
Tests: 25 unit tests across tests/unit/replay_input/ covering all 13
ACs (kernel-level synthetic fixtures for AC-1..AC-10; coordinator-
level OpenCV synthetic videos + faked pymavlink for AC-6..AC-13).
Contract update: replay_protocol.md v2.0.0 added fdr_client to the
ReplayInputAdapter __init__ signature (was missing in the prose; the
task spec already listed it in the allowed-imports section).
Co-authored-by: Cursor <cursoragent@cursor.com>
Re-design replay mode per user direction: replay is no longer a fourth
Docker image with a reduced component set, but a `config.mode = "replay"`
branch of the single airborne binary. The pre-flight workflow (route in
suite UI -> C12 tile download via real satellite-provider -> C10
manifest+engines build) is identical between live and replay; only three
strategies swap at compose time:
FrameSource: Live <-> Video
FcAdapter: Pymavlink/MSP2 <-> TlogReplay
MavlinkTransport: Serial <-> Noop
The C8 outbound MAVLink encoders run unchanged in both modes; their
bytes hit `NoopMavlinkTransport` in replay and disappear. A new
`JsonlReplaySink` taps C5's `EstimatorOutput` stream so the parent-suite
UI sees per-tick coordinates by tailing `results.jsonl`. MAVLink 2.0
signing key remains mandatory (operator supplies a dummy file).
A new `replay_input/` Layer-4 cross-cutting coordinator owns
`(video, tlog) -> (FrameSource, FcAdapter, Clock)` convergence; the
composition root sees only standard interfaces past `.open()`.
Docs:
- architecture.md: new ADR-011 with full rationale; ADR-002 binary
narrative updated.
- contracts/replay/replay_protocol.md: bumped to v2.0.0; 12 invariants
(notably mode-agnosticism + encoder byte-equality + signing key
mandatory + real C6 cache in replay).
- module-layout.md: Build-Time Exclusion Map dropped from 4 to 3 binary
columns; replay-mode `BUILD_*` flags default ON in airborne;
`shared/replay_input` cross-cutting entry added.
- epics.md: E-DEMO-REPLAY scope reframed; story points 27-32 -> 19-24.
Task respecs:
- AZ-401: shrunk 3 -> 2 pts; `compose_root` mode branch + JSONL sink +
NoopMavlinkTransport wiring; legacy `compose_replay` export deleted.
- AZ-402: console-script wrapper that mutates `config.mode = "replay"`
and dispatches into the shared airborne main; `--mavlink-signing-key`
mandatory.
- AZ-403: CANCELLED. Moved to done/ with banner; Jira transition deferred
via `_docs/_process_leftovers/2026-05-14_az_403_cancellation_pending_tracker.md`.
- AZ-404: AC-4 reworded as mode-agnosticism AST scan + encoder
byte-equality test; new AC-8 operator-workflow rehearsal.
- AZ-405: also owns the `replay_input/` module + `ReplayInputAdapter`.
_dependencies_table.md updated: AZ-401 gains AZ-405 dep; AZ-404 drops
AZ-403 dep; AZ-403 row marked CANCELLED.
Co-authored-by: Cursor <cursoragent@cursor.com>
Opens E-DEMO-REPLAY (AZ-265): the two C8 strategies that let the
upcoming compose_replay (AZ-401) and gps-denied-replay CLI (AZ-402)
run the production C1-C5 pipeline against a recorded (.tlog, video)
pair without touching live FC I/O.
AZ-400 lands the contract ReplaySink Protocol (emit + close per
replay_protocol.md v1.0.0) and JsonlReplaySink: orjson-serialised
JSONL, fsync-on-close, build-flag gated (BUILD_REPLAY_SINK_JSONL),
double-close idempotent, FDR mirror on open/close. The drifted
AZ-390 stub in interface.py is removed; the canonical Protocol now
lives in replay_sink.py per module-layout.md and is re-exported via
__init__.py. AZ-390 conformance test widened.
AZ-399 lands TlogReplayFcAdapter: full FcAdapter Protocol surface,
build-flag gated (BUILD_TLOG_REPLAY_ADAPTER), pymavlink stream-parse
with bounded pre-scan + fail-fast on missing required messages
(R-DEMO-3), dedicated decode thread feeding the existing AZ-391
SubscriptionBus. Outbound surface raises FcEmitError per Invariant 5;
request_source_set_switch raises SourceSetSwitchNotSupportedError.
Pacing honours Invariant 6 via Clock.sleep_until_ns. time_offset_ms
shifts every emitted received_at per Invariant 8. Non-monotonic
timestamps raise FcOpenError.
Test coverage: 188 c8_fc_adapter tests pass; 1 skipped (AZ-399 AC-1
500 MB tlog RSS bound, deferred to AZ-404 e2e behind RUN_REPLAY_E2E).
Code review: PASS_WITH_WARNINGS — 1 Medium (mapping logic duplicates
AZ-391 live decoder; intentional today, four behavioural deltas
documented), 2 Low.
Co-authored-by: Cursor <cursoragent@cursor.com>
Implement the single production-default C4 PoseEstimator strategy.
AZ-358 — Marginals path: OpenCV solvePnPRansac (SOLVEPNP_IPPE) on
best-candidate inliers, PriorFactorPose3 with Jacobian-derived initial
covariance, flushed into C5's iSAM2 graph via the widened
ISam2GraphHandle.update(graph, values, None) (Option B). Posterior
covariance from compute_marginals().marginalCovariance(pose_key) with
SPD-defensive Cholesky check. Tile pixel -> ENU world conversion via
the shared WgsConverter + a configurable tile_size_px. Two spec
deviations now documented in the AZ-358 task file: PriorFactorPose3
over GenericProjectionFactorCal3DS2 (avoids unbounded landmark
variables; same Fisher information on the pose marginal) and explicit
(graph, values, timestamps) update args (aligns with C5's impl).
AZ-361 — Jacobian + thermal hybrid: per-frame dispatch on
thermal_state.thermal_throttle_active selects the cv2.projectPoints-
derived 6x6 information matrix (with ridge regularisation) as the
emitted covariance. Skips the iSAM2 factor add under throttle
(Invariant 12). Emits CovarianceDegradedWarning via warnings.warn
(never raised); paired WARN log + FDR record rate-limited per
covariance_degraded_warn_window_ns (default 60 s) via an injected
monotonic Clock. Supersedes the AZ-358 NotImplementedError stub.
Widens ISam2GraphHandle from get_pose_key only to all five C4-facing
methods (add_factor, update, compute_marginals, last_anchor_age_ms);
C5's existing ISam2GraphHandleImpl already satisfies the superset, so
no C5 source change this batch. Threads fdr_client + clock through
pose_factory composition.
Registers two new FDR payload kinds: pose.frame_done (per-call
telemetry; both success and PnpFailureError paths) and
pose.covariance_degraded (per-window throttle exposure).
Tests: 21 new (AZ-358 AC-1..11 + AZ-361 AC-1..10/12/13; AZ-361 AC-11
RMSE-ratio informational per spec, not asserted). Updates 2 existing
test files for Protocol widening and the FDR-schema round trip.
Code review verdict: PASS_WITH_WARNINGS (5 findings: Medium x2,
Low x3; none blocking). Full suite: 1958 passed, 1 unrelated
host-dependent perf failure (c12 CLI cold-start, pre-existing).
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds JsonSidecarWarmStartHintStore (atomic JSON + SHA-256 sidecar via
AZ-280) inside c1_vio, plus the cross-strategy WarmStartWiredStrategy
wrapper + prime_warm_start_from_disk / prime_warm_start_from_fc hooks
at runtime_root. AC-7 post-reset covariance inflation and AC-8 "no
fake confidence" baseline floor are enforced at the wiring layer so
no strategy module needed edits. Adds three c1_vio config fields
(warm_start_store_dir, warm_start_save_period_frames,
post_reset_covariance_inflation_factor) and registers the new FDR
kind vio.warm_start. 34 unit tests cover all 10 ACs + 3 NFRs.
Verdict PASS_WITH_WARNINGS — see
_docs/03_implementation/reviews/batch_56_review.md for the four
non-blocking documentation findings (F1 cold-start log kind shorthand,
F2 strategy-frame pose semantics, F3 dev-hardware perf smoke, F4
runtime_root importing c1-internal _facade_spine for shared FDR
conventions).
Closes AZ-335; depends on AZ-528 (batch 55).
Co-authored-by: Cursor <cursoragent@cursor.com>
Replace 3-way byte-equivalent orchestration-spine duplication across
okvis2.py / vins_mono.py / klt_ransac.py with a single c1-internal
helper at components/c1_vio/_facade_spine.py. Closes cumulative
review batches 52-54 Finding F1. No behaviour change — all existing
AZ-332 / AZ-333 / AZ-334 AC tests pass unmodified (114 c1_vio tests
green, 237 with adjacent regression suite).
The helper exposes 5 stateless free functions (now_iso, bias_norm,
se3_from_4x4, frame_ts_ns, frame_image) and a FacadeSpine mixin
class providing _classify_state / _tick_lost / _emit_transition.
Concrete strategies inherit the mixin and set spine-required
instance attributes in __init__. Mirrors the AZ-527 precedent for
c2_vpr-side _assert_engine_output_dim consolidation.
New test file test_az528_facade_spine.py covers AC-1..AC-8 with 19
tests, including an AST regression guard that prevents future
re-introduction of the consolidated free functions in any strategy
module, plus a Risk-1 static check that every strategy's __init__
assigns every spine-required attribute.
Archive AZ-528 task spec to done/, bump autodev state to batch 56.
Co-authored-by: Cursor <cursoragent@cursor.com>
Verdict: PASS_WITH_WARNINGS — auto-chain allowed per implement skill
Step 14.5. AZ-528 created as the formal hygiene PBI for the c1_vio
strategy facade orchestration-spine 3-way duplication (Medium /
Maintainability) — the deferred F1 finding from B53 + B54 per-batch
reviews. AZ-527 closes the parallel c2_vpr-side helper duplication
finding (carried over from cumulative-49-51 F1).
Carry-overs: F2 (B52-54 test-fake / _patch_pose_recovery sharing) +
cumulative-49-51 F2 (AC-10 spec wording drift across c2_vpr specs)
remain informational; no code defect, no active drift.
Next cumulative review trigger fires after Batch 57 (every K=3).
Co-authored-by: Cursor <cursoragent@cursor.com>
Implement KltRansacStrategy, the ADR-002 engine-rule mandatory
simple-baseline VioStrategy for E-C1. Pure-Python facade over
OpenCV's cv2.goodFeaturesToTrack / calcOpticalFlowPyrLK /
findEssentialMat / recoverPose pipeline — no C++/pybind11 binding
by design so a Tier-0 workstation runs the strategy with
`pip install opencv-python` and the BUILD_KLT_RANSAC=ON gate alone.
Constructor + state machine + FDR transition spine mirror
Okvis2Strategy + VinsMonoStrategy so the AZ-331 factory + IT-12
comparative harness treat all three as drop-in substitutable; the
duplication is the consolidation target now formally in scope for
the next cumulative review (batches 52-54).
AC coverage: AC-1..AC-11 + NFR-perf mapped to passing tests
(25 tests, 23 pass + 2 tier-2 skipped on dev/CI runners; all 25
pass under GPS_DENIED_TIER=2). Honest-covariance invariant (AC-9)
implemented as residual-scatter / (N_inliers - 5) with an inlier-
count penalty — no client-side floor or smoother; cov Frobenius
grows monotonically across DEGRADED. Camera-agnostic source
(AC-11) enforced by CI-grep gate that excludes docstring text.
Test-Run Cadence: focused suite tests/unit/c1_vio/ green (95 passed,
6 skipped); config-loader + compose-root suites green; full-suite
gate deferred to Step 16 per implement skill.
Co-authored-by: Cursor <cursoragent@cursor.com>
VinsMonoStrategy: Python facade conforming to AZ-331 Protocol; mirrors
the AZ-332 OKVIS2 facade so the AZ-331 factory + IT-12 comparative
harness can treat both as drop-in substitutable. Native binding is a
pybind11 skeleton compiled behind BUILD_VINS_MONO=ON (default OFF for
airborne / operator-tooling / replay-cli per module-layout.md
Build-Time Exclusion Map). Real vins_estimator wiring is the Tier-2
follow-up.
VinsMonoConfig added to c1_vio/config.py with sliding-window /
feature-tracker / marginalisation / opt-iteration knobs plus
__post_init__ validation; exported through the package __init__.
cpp/vins_mono/CMakeLists.txt replaces the AZ-263 placeholder with full
pybind11 wiring: Risk-1 mitigation forces VINS_MONO_USE_ROS=OFF;
Risk-2 mitigation links Eigen from the same cpp/_third_party/eigen pin
as OKVIS2; Risk-3 mitigation enforces BUILD_VINS_MONO=OFF in
deployment binaries via the gate at the top of the file.
Tests: 17 new in test_vins_mono_strategy.py (15 pass + 2 tier2 skip);
fake_vins_mono_binding fixture added to conftest.py mirroring the
fake_okvis2_binding pattern; test_protocol_conformance updated to drop
vins_mono from _STRATEGIES_WITHOUT_PY_MODULE so the existing
parametrised factory tests route through the new strategy.
Focused c1_vio suite: 72 passed, 4 skipped. Full suite: 1788 passed,
1 unrelated pre-existing flake (c12 cold-start perf, env-bound).
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds two research-only VprStrategy implementations for the IT-12
comparative-study matrix. MegaLocStrategy (D=2048, 322x322) and
MixVprStrategy (D=4096, 320x320), both via C7 TensorRT FP16 with
their own concrete BackbonePreprocessor. Single-stage global L2
normalisation; retrieval delegated to FaissBridge; FDR records +
structured logs identical to UltraVPR. BUILD_VPR_MEGALOC and
BUILD_VPR_MIXVPR ON for research/replay-cli only, OFF for airborne
and operator-tooling (fail-fast at composition root via existing
AZ-336 factory). Uses helpers.iso_ts_from_clock from day 1 — no
new timestamp helper duplicates introduced.
36 parametrised AC tests + 25 protocol-conformance + 18 helper
regression tests pass; 1690 / 1690 unit tests pass (excluding 1
pre-existing flaky cold-start subprocess test in c12). Verdict:
PASS_WITH_WARNINGS — one Medium follow-on (AZ-527 to consolidate
4-way _assert_engine_output_dim) + one Low AC wording drift.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closes cumulative review 46-48 F1 (Medium) + F3 (Low). Adds
iso_ts_from_clock(clock) alongside iso_ts_now() in the Layer-1
helper; migrates four duplicate definitions in c2_vpr (net_vlad,
ultra_vpr, _faiss_bridge) and c12_operator_orchestrator
(operator_reloc_service). Output format flipped +00:00 -> Z to
align with iso_ts_now() and the canonical FDR _TS fixture (FDR
schema test passes unmodified).
18 helper AC tests + 186 sibling tests pass; ruff clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 48 / Cycle 1 (greenfield Step 7). Closes cumulative review
batches 31-33 F2 and 28-30 F3 by replacing the duplicated private
_iso_ts_now() one-liners with a single Layer-1 helper:
src/gps_denied_onboard/helpers/iso_timestamps.py
iso_ts_now() -> str
Output format matches the canonical FDR _TS fixture
(YYYY-MM-DDTHH:MM:SS.ffffffZ); no FDR schema change.
Migrated call-sites (3): c7_inference/onnx_trt_ep_runtime,
c7_inference/thermal_publisher, plus the 3 c6_tile_cache callers
that previously imported from the local c6_tile_cache/_timestamp
shim (now deleted, superseded by the Layer-1 helper).
Spec drift resolved (Choose A, user-approved): spec listed 5 call
sites + +00:00 regex; on-disk reality at batch start is 3 sites +
Z-suffix matching every existing helper and the FDR _TS fixture.
Spec preamble + AC-2 regex updated in the task file; documented in
batch_48_cycle1_report.md.
Tests: 9 new AC tests (AC-1..AC-7 + Layer-1 invariant +
public-surface defensive); 216 focused tests pass including the
unmodified AZ-272 FDR schema suite and AZ-270 / AZ-507 layering
lints. Verdict: PASS (no findings).
Co-authored-by: Cursor <cursoragent@cursor.com>
User feedback after a transitionJiraIssue call returned a bare
{"success": true} that I trusted blindly: the rule should require
explicit verification and stop-and-ask on any ambiguous response.
Two targeted clarifications:
- .cursor/rules/tracker.mdc - Tracker Availability Gate now lists
the full set of failure modes (non-2xx, timeout, empty body,
opaque success) and bans automatic retries. Adds an explicit
read-back requirement when the response is minimal, and adds
"abort" to the user-choice menu.
- .cursor/skills/implement/SKILL.md - Step 5 (In Progress) and
Step 12 (In Testing) now spell out the STOP-and-ASK rule inline
instead of just pointing at tracker.mdc. Adds the read-back
verification step for opaque responses.
Co-authored-by: Cursor <cursoragent@cursor.com>
UltraVPR is the Documentary Lead's PRIMARY backbone per
description.md § 1 and is wired by default
(config.c2_vpr.strategy = "ultra_vpr"). Runs on the C7 TensorRT
runtime (AZ-298) or ONNX-Runtime fallback (AZ-299); explicitly NOT
on the PyTorch FP16 runtime so a TRT engine compile bug can fall
back to NetVLAD without simultaneously breaking both strategies.
Production changes:
- c2_vpr/ultra_vpr.py - UltraVprStrategy + module-level create()
factory. embed_query pipeline: preprocess -> runtime.infer ->
single-stage L2 -> VprQuery. retrieve_topk delegates one-line to
FaissBridge. Engine load + output-shape assertion happen at
create() time (AC-6) so misconfiguration surfaces at startup,
not 17 minutes into a flight. UltraVPR has D=512 fixed (NOT a
config knob; AC-5 / AC-6 / AC-7 all assume 512). Single-stage L2
(no intra-cluster step like NetVLAD; spy-test enforces this so a
future refactor cannot silently regress recall).
- c2_vpr/_preprocessor_ultra_vpr.py - centre-crop using the camera
calibration's principal point (cx, cy from intrinsics_3x3),
falling back to geometric centre + WARN log when calibration is
absent (AC-9). Resize -> (384, 384) -> ImageNet mean/std ->
FP16 NCHW.
- No composition-root changes: UltraVPR consumes a pre-compiled
.trt engine (no PyTorch nn.Module), so the strategy module does
NOT expose MODEL_NAME / architecture_factory. The composition-
root _register_strategy_architecture helper no-ops cleanly for
this case (verified by test_create_does_not_register_pytorch_architecture).
Tests:
- tests/unit/c2_vpr/test_ultra_vpr.py - 29 tests covering all 12
ACs + preprocessor contract + constructor validation + FDR
record emission + single-stage L2 enforcement.
Full unit suite: 1637 passed / 80 env-skipped (+29 new tests).
Per-batch code review (batch_47_review.md): PASS_WITH_WARNINGS
(3 Low-severity findings; no Critical / High / Medium):
- F1: _iso_ts_from_clock is now the 7th copy (AZ-508 will close).
- F2: AZ-337 spec uses outdated C7 API names; affects upcoming
AZ-339 / AZ-340. Spec-hygiene PBI recommended.
- F3: principal-point fallback uses (0, 0) zero-detection for
missing calibration; safe but tightens when intrinsics become
Optional.
Architectural notes:
- AZ-507 layering clean. Imports only InferenceRuntimeCut,
DescriptorIndexCut, c2_vpr internals, _types, helpers,
clock, fdr_client. Architecture lint test passes.
- Pattern parity with NetVLAD (B46) where semantics permit;
UltraVPR-specific paths (single-stage L2, 'embedding' output
key, TRT runtime, no architecture registry, principal-point
crop) are clearly localised.
Co-authored-by: Cursor <cursoragent@cursor.com>
PASS_WITH_WARNINGS verdict covering AZ-328 (BuildCacheOrchestrator),
AZ-329 (PostLandingUploadOrchestrator + FdrFooterReader), AZ-330
(OperatorReLocService), AZ-523/AZ-524 (C11 internal gate removal +
c12_operator_orchestrator rename), and AZ-341 (FaissBridge +
DescriptorIndexCut).
Four Low-severity findings, all hygiene or carry-over: F1 ISO
timestamp helper duplicated across 6 modules (AZ-508 hygiene PBI
exists), F2 IndexUnavailableError namespace duplication c2/c6
flagged for spec/docstring alignment, F3 AZ-341 spec lists unused
normaliser parameter, F4 carry-over cold-start microbench host-load
flake.
Full unit suite 1565 passed / 80 env-skipped at close of window.
No new layer-direction or AZ-507 violations introduced; three new
structural Protocol cuts (TileDownloaderCut, FdrFooterReader,
DescriptorIndexCut) all follow the same shape.
State file updated: last_cumulative_review batches_40-42 ->
batches_43-45.
Co-authored-by: Cursor <cursoragent@cursor.com>
Implements F1 pre-flight cache build orchestrator on the operator
workstation. Composes C11 TileDownloader (AZ-316), C12 CompanionBringup
(AZ-327), C12 FlightsApiClient (AZ-489), and the new
RemoteCacheProvisionerInvoker into one sequenced flow guarded by a
filelock-backed workstation-side lockfile.
Architectural decisions:
- Phase-0 flight-resolve runs BEFORE the lockfile (ADR-010): a flight
that cannot be resolved is an operator-input error, not a contended-
resource error. Enforced by AC-11 + AC-14.
- Consumer-side cuts (AZ-507) for C11 + C10 types: local Protocols /
mirror DTOs in tile_downloader_cut.py and _types.py; external errors
matched by name-based whitelisting so unknown exceptions still
propagate per AC-6. Cross-component type translation lives at the
composition root (c12_factory).
- Failure surfacing: recognised operational failures (download error,
companion not ready, build error, flight-resolve error) return as
CacheBuildReport(outcome=failure, failure_phase=...). Only lockfile
contention raises (BuildLockHeldError) since no phase ever ran.
- Workstation-side filelock library (project pin); no custom primitive.
- Remote C10 stdout streamed line-by-line as DEBUG with api_key /
auth_token redacted before logging (defence-in-depth).
- CLI is now a thin adapter; all workflow logic lives in
build_cache.py. operator-tool build-cache exit codes map per
CacheBuildReport.failure_phase + failure_exception_type.
Tests: 116 c12 unit tests pass (29 new for AZ-328 covering 15/15 ACs +
NFR-perf-overhead microbench; 7 new for remote_c10_invoker; 3 new for
file_lock; test_cli_build_cache rewritten for new orchestrator
interface). Full repo suite: 1522 passed, 80 skipped.
Also: replays Batch 42's ruff format leftover for c12 flights_api +
test_az489 files (formatter ran over the c12 directory after new
files were added). Pure whitespace; no behaviour change.
Full report: _docs/03_implementation/batch_43_cycle1_report.md
Co-authored-by: Cursor <cursoragent@cursor.com>
5 findings: F1 (Medium / Maintainability) - _iso_now copies grew to 8
across c11 + c13 + c7, AZ-508 hygiene PBI no longer matches reality;
F2-F5 (Low) - triplicated atomic-write JSON helpers, 4x duplicated
SectorClassification enum (acknowledged by ADR-009), recurring
"outcome=failure" prose vs typed-exception drift across the C11 trio,
and an NFR-perf-cold-start near-miss that prompted PEP 562 lazy-import
discipline in c12. None block the implement loop.
Updated _autodev_state.md last_cumulative_review to batches_40-42.
Co-authored-by: Cursor <cursoragent@cursor.com>
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets:
- In-call (per-batch) — re-invokes inner on PARTIAL outcome up to
`max_in_call_retries` times with capped exponential backoff
(`min(base ** attempt_number, cap)`). On exhaustion: surfaces an
operator hint via `next_retry_at_s = now + backoff_cap_s`.
- Per-tile (cross-call) — atomically increments c6's
`tiles.upload_attempts` counter for every rejection; once a tile
hits `max_per_tile_attempts` it is forward-only transitioned to
`voting_status = upload_giveup` (excluded from `pending_uploads`).
Each transition emits FDR `kind="c11.upload.giveup"` plus an
ERROR log.
C6 contract changes (AZ-303 v1.3.0):
- VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED).
- TileMetadataStore.increment_upload_attempts(tile_id) -> int added
with NotImplementedError default for backwards-compat.
- Migration 0003_c11_upload_attempts: additive column +
widened ck_tiles_voting_status (preserves IS NULL clause).
C11 wiring:
- C11RetryConfig + disable_retry_decorator on C11Config.
- build_tile_uploader wraps in decorator by default; bypass flag
returns the bare HttpTileUploader. New `clock` keyword.
Cross-component isolation honoured (AZ-507): the decorator declares
`_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore
and references `UPLOAD_GIVEUP` via a local string constant — no c6
imports.
Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum
update + alembic head bump + AZ-272 schema fixture. 238 passed across
c11/c6/fdr suites; pre-existing perf microbenches unrelated.
Code review: PASS_WITH_WARNINGS (5 Low/Informational findings,
docs-level or downstream-CI-blocked). See
_docs/03_implementation/reviews/batch_41_review.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
Captures the C11 operator-side trio landing (AZ-317/318/319) plus the
C10 build-orchestrator close-out (AZ-325) and the AZ-515 canonical-hash
extraction. Three Low findings, all documentation-level drift between
spec text and as-built code; none block Batch 40. Resolves prior F1
(AZ-515 closed the verifier-into-builder private import).
Co-authored-by: Cursor <cursoragent@cursor.com>
Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's
per-flight signing, and consumer-side cuts over c6 storage. Implements
the full upload flow: gate ON_GROUND -> start_session -> enumerate
pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded
on ack -> end_session in finally. Honours Retry-After (RFC 7231 int +
HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403.
Adds C11Config block, three FDR kinds (tile.queued, tile.rejected,
batch.complete), and the build_tile_uploader composition-root factory.
Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270).
Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272
schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed.
Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 38 (cycle 1) lands the two upload-side prerequisites the
upcoming AZ-319 TileUploader needs to authenticate per-flight
sessions against the parent suite's D-PROJ-2 ingest contract.
AZ-317 FlightStateGate:
- confirm_on_ground() defence-in-depth gate atop ADR-004 process
isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF,
LANDING, and source-failure (mapped to UNKNOWN with original
exception preserved on __cause__).
- ERROR log on refusal, INFO log on pass, single source call per
invocation (no polling, no retry).
AZ-318 PerFlightKeyManager:
- Per-flight ephemeral Ed25519 keypair via the project-pinned
cryptography library; sign(payload) -> 64-byte Ed25519 signature.
- Best-effort zeroisation of a project-controlled bytearray mirror
on end_session; OpenSSL-side buffer freed via dropped reference.
- __del__ safety net with WARN log if end_session was missed.
- start_session emits FDR kind=c11.upload.session.key.public so the
safety officer can correlate flights with key fingerprints.
- record_signature_rejection emits FDR + ERROR log on parent-suite
ingest rejection (security-critical, never silently dropped).
Shared C11 plumbing:
- TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError,
SessionNotActiveError, SignatureRejectedError envelope).
- FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs.
- FlightStateSource Protocol on c11_tile_manager.interface.
- runtime_root.c11_factory factories for both new services.
- Two new FDR kinds registered in fdr_client.records central
KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in
lockstep so the central test stays green.
Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80
skipped (documented Docker / Tier-2 / CUDA gates).
Code review: PASS_WITH_WARNINGS — 2 Low findings documented in
_docs/03_implementation/reviews/batch_38_review.md (dev-host vs
operator-workstation perf bound; spec text named StrEnum but
project pins Python 3.10).
Co-authored-by: Cursor <cursoragent@cursor.com>
Implements the public top-level F1 build orchestrator for E-C10 per
contract v1.1.0. Composes EngineCompiler (AZ-321), DescriptorBatcher
(AZ-322), and ManifestBuilder (AZ-323) into a single idempotent
operation guarded by a fcntl-backed cache_root/.c10.lock and a
post-build coverage walk.
Adds:
- CacheProvisionerImpl + FilelockFileLockFactory (provisioner.py)
- BuildRequest/BuildReport/BuildOutcome/SectorClassification DTOs +
FileLockFactory Protocol + replaced placeholder CacheProvisioner
Protocol with v1.1.0 surface (interface.py)
- C10ProvisionerConfig wired into C10ProvisioningConfig (config.py)
- BuildLockHeldError + ManifestCoverageError (errors.py)
- build_cache_provisioner composition root (c10_factory.py)
- 18 tests covering AC-1..AC-16 + NFR-perf-coverage-walk
- filelock>=3.13,<4.0 (single new third-party dep)
Idempotence (CP-INV-1) reuses AZ-323's _compute_manifest_hash /
_aggregate_tile_hash so the build-identity decision agrees byte-for-
byte with the Manifest's recorded manifest_hash. Coverage rollback
uses a .prev rename snapshot. Diagnostic compile_engines_for_corpus
is lock-free per AC-10.
Co-authored-by: Cursor <cursoragent@cursor.com>
Cumulative code review for batches 34-36 (AZ-507, AZ-323, AZ-324,
AZ-306, AZ-322) per implement skill Step 14.5 K=3 cadence.
Verdict: PASS_WITH_WARNINGS — 0 Critical / 0 High / 0 Medium / 3 Low
(all Maintainability). Previous review's Medium F1 (doc-vs-lint) is
RESOLVED by AZ-507. Carryover-Low findings tracked:
- F1: manifest_verifier imports private _aggregate_tile_hash from
manifest_builder; promote to public or extract to a shared module
(1-pt follow-up PBI).
- F2: AZ-508 task spec stale — c6 already consolidated within-component,
c7 has 2 active copies (+ a new thermal_publisher copy not in spec).
- F3: consumer-side Protocol cut pattern still un-documented in
architecture.md; pattern now 9+ instances and is the established
cross-component contract surface.
State updated: last_cumulative_review = batches_34-36; sub_step =
parse-tasks; batch 37 (AZ-325 C10 CacheProvisioner solo, 3pt) is next.
Co-authored-by: Cursor <cursoragent@cursor.com>
Production-default DescriptorIndex strategy backed by the faiss-cpu
PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end
to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild
(.index + .sha256 sidecar + .meta.json), triple-consistency load
check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional
warm-up query at construction, FAISS RuntimeError rewrap to
IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config
classmethod wired into runtime_root.storage_factory.
The original spec required a custom pybind11 wrapper over a vendored
FAISS HEAD; the user opted for the upstream faiss-cpu wheel after
research fact #92 confirmed ARM64 wheel availability for Jetson and
the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/
placeholder removed; BUILD_FAISS_INDEX flag retained as a
runtime/factory gate (no native target). Spec rewritten end-to-end and
archived to _docs/02_tasks/done/.
C6TileCacheConfig extended with faiss_index_path and
faiss_warmup_query_path fields. tests/conftest.py sets
KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp
duplicate-load abort during pytest (no-op on CI Linux). 21 new tests
cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance
fake updated with from_config classmethod.
Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1
pre-existing OKVIS2 submodule failure unrelated.
Doc sync: module-layout.md, components/08_c6_tile_cache/description.md
§5, batch_35_cycle1_report.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
cmake 4.3.2, libomp 22.1.5, pybind11 3.0.4 (Python pkg) installed
locally; FAISS C++ source still to be vendored by AZ-306 itself.
sub_step.detail cleared per state.md conciseness rule.
Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.
AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.
AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.
Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).
Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.
Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.
AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.
Co-authored-by: Cursor <cursoragent@cursor.com>
Open two ~2-point hygiene PBIs surfaced by
_docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md:
- AZ-507 (parent AZ-246 / E-CC-CONF) — align module-layout.md
cross-component import rules with the AZ-270 lint test. Resolves the
doc-vs-lint contradiction surfaced on AZ-321 by tightening the doc
(option (a) from the review) + hoisting EngineBuildError /
CalibrationCacheError to _types/inference_errors.py.
- AZ-508 (parent AZ-264 / E-CC-HELPERS) — consolidate 5 identical
_iso_ts_now() one-liners across c6_tile_cache + c7_inference into a
single Layer-1 helper at helpers/iso_timestamps.py.
Dependencies table headers bumped: 142 -> 144 tasks, 478 -> 482 points
(product 345 -> 349). State file's pause-point detail cleared; next
sub_step is the implement skill's Step 3 (compute next batch) for
batch 34.
Co-authored-by: Cursor <cursoragent@cursor.com>
Cumulative review covering AZ-298 / AZ-299 / AZ-321:
PASS_WITH_WARNINGS. 0 Critical, 0 High, 1 Medium, 2 Low.
Medium: `module-layout.md` declares c10 may import from c7
Public API but `test_az270_compose_root.test_ac6` forbids ANY
cross-component import — doc-vs-lint mismatch surfaced by
AZ-321; refactor pivoted to `CompileEngineCallable` local
Protocol cut. Flagged for hygiene PBI; not blocking.
Low: `_iso_ts_now` now duplicated five times across c7+c6;
consumer-side Protocol cut pattern recurring (LightGlue
`EngineHandle` + `CompileEngineCallable`). Both deferred to
the next hygiene cycle.
State advances to phase 3 (compute-next-batch) with
last_cumulative_review=batches_31-33 so the next /autodev
invocation enters at the right point.
Co-authored-by: Cursor <cursoragent@cursor.com>
Land the C10 per-model engine compile + cache-reuse orchestrator.
`EngineCompiler.compile_engines_for_corpus(request)` walks the
corpus, computes the canonical engine filename via AZ-281
`EngineFilenameSchema.build`, and either reuses the cached binary
(cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates
to the AZ-297 `compile_engine` on the injected runtime (cache miss;
the runtime owns the write path). Returns one `EngineCompileResult`
per backbone carrying the canonical `EngineCacheEntry`, outcome
(BUILT / REUSED), and `compile_duration_s` (None on reuse).
Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename
schema — a host change rebuilds at the new path and leaves the old
files untouched (AC-4).
Design corrections vs. the task spec body:
- The spec proposed a c10-local `EngineCacheEntry` carrying outcome
and duration; that name is already taken by the AZ-297 canonical
DTO. The wrapper is renamed `EngineCompileResult`; the canonical
shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in
the AZ-297 Protocol. `HostCapabilities` is threaded through
`EngineCompileRequest` instead so the composition root owns host
probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule
`test_az270_compose_root.test_ac6`). `engine_compiler.py` defines
`CompileEngineCallable` — a structural Protocol cut of
`InferenceRuntime` exposing only `compile_engine` — and catches
broad `Exception` (re-raising preserves the original type;
`error_class` is recorded in the ERROR log payload).
Production
- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`,
`EngineCompileRequest`, `EngineCompileResult`,
`EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol;
`EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig`
(`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate
positive shape dims and duplicate model_name detection in
`__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires
the existing `build_inference_runtime` factory through;
`build_backbone_specs(config)` materialises the `BackboneSpec`
tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321
surface and registers the new config block.
Tests
- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar
sibling case for AC-5. Tier-1 via fake runtime that writes through
the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2
placeholders for the cache-hit p99 NFR (200 MB engine sweep) and
kill-during-compile atomic-write NFR.
Docs
- module-layout.md: c10_provisioning Per-Component Mapping lists the
new internal modules (engine_compiler.py, config.py), the
composition-root c10_factory.py, the AZ-321 public re-export
surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md:
PASS_WITH_WARNINGS (4 Low findings accepted).
Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined
unit suite (excluding pending components) 543 passing, 21
env-skipped.
Co-authored-by: Cursor <cursoragent@cursor.com>
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05:
when the TRT-direct path (AZ-298) cannot deserialise a cached engine
or when the operator explicitly selects ORT, the system stays in the
air at degraded latency rather than dropping the request. Conforms to
the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep".
Production
- onnx_trt_ep_runtime.py: compile_engine is a no-op returning an
EngineCacheEntry pointing at the source .onnx; deserialize_engine
is gate-first for .engine entries and gate-skip for .onnx, builds
an ORT InferenceSession with the provider list
[TensorrtExecutionProvider, CUDAExecutionProvider,
CPUExecutionProvider], stages cached engines into the ORT TRT EP
cache directory via symlink-or-copy, warms up with one session.run
after construction, and honours config.inference.ort_disallow_cpu_
fallback by raising EngineDeserializeError when the active provider
resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep
WARN log plus gcs_alert callback on first call when is_fallback=
True; release_engine is idempotent. _build_provider_args is the
single point that pins TRT EP option-key names (Risk-3) and caps
trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8).
- config.py: adds ort_trt_cache_dir (validated non-empty) and
ort_disallow_cpu_fallback to C7InferenceConfig.
- fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and
c7.cpu_fallback FDR record kinds.
Tests
- test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback
alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1
via fake ORT session; Tier-2 placeholders skip on macOS dev for
numerical FP16 comparison and session-creation perf NFR.
- test_protocol_conformance.py: drops onnx_trt_ep from the missing-
module parametrize now that the module ships.
- test_az272_fdr_record_schema.py: extends per-kind fixture builder
to cover the two new C7 FDR kinds in the roundtrip / schema-version
AC tests.
Docs
- module-layout.md: replaces the pending onnx_trt_runtime row with
the shipped onnx_trt_ep_runtime row + capabilities list.
- batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch
+ self-review (PASS_WITH_WARNINGS, 4 Low findings accepted).
Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit
suite (excluding pending components) 529 passing, 19 env-skipped.
Co-authored-by: Cursor <cursoragent@cursor.com>