Oleksandr Bezdieniezhnykh f7b2e70085 [AZ-325] C10 CacheProvisioner orchestrator
Implements the public top-level F1 build orchestrator for E-C10 per
contract v1.1.0. Composes EngineCompiler (AZ-321), DescriptorBatcher
(AZ-322), and ManifestBuilder (AZ-323) into a single idempotent
operation guarded by a fcntl-backed cache_root/.c10.lock and a
post-build coverage walk.

Adds:
- CacheProvisionerImpl + FilelockFileLockFactory (provisioner.py)
- BuildRequest/BuildReport/BuildOutcome/SectorClassification DTOs +
  FileLockFactory Protocol + replaced placeholder CacheProvisioner
  Protocol with v1.1.0 surface (interface.py)
- C10ProvisionerConfig wired into C10ProvisioningConfig (config.py)
- BuildLockHeldError + ManifestCoverageError (errors.py)
- build_cache_provisioner composition root (c10_factory.py)
- 18 tests covering AC-1..AC-16 + NFR-perf-coverage-walk
- filelock>=3.13,<4.0 (single new third-party dep)

Idempotence (CP-INV-1) reuses AZ-323's _compute_manifest_hash /
_aggregate_tile_hash so the build-identity decision agrees byte-for-
byte with the Manifest's recorded manifest_hash. Coverage rollback
uses a .prev rename snapshot. Diagnostic compile_engines_for_corpus
is lock-free per AC-10.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 05:00:16 +03:00


Batch 37 — Cycle 1 Report

Date: 2026-05-13
Batch: 37 (single task — closes the C10 build-phase trilogy AZ-321/322/323/325)
Tasks: AZ-325 (C10 CacheProvisioner orchestrator, 3pt)
Status: complete; AZ-325 pending transition to "In Testing".

Scope

AZ-325 implements CacheProvisionerImpl — the public top-level F1 build orchestrator for E-C10. It composes EngineCompiler (AZ-321), DescriptorBatcher (AZ-322), and ManifestBuilder (AZ-323) into a single idempotent operation guarded by a filesystem lockfile and a post-build coverage walk.

This unblocks E-C12 OperatorTooling — c10 build becomes a one-liner — and provides the final assembly point for D-C10-1 idempotence and D-C10-3 ManifestCoverageError.

Architectural Decisions

1. Public surface lives in interface.py only

The contract _docs/02_document/contracts/c10_provisioning/cache_provisioner.md v1.1.0 defines CacheProvisioner Protocol + BuildRequest / BuildReport / BuildOutcome / SectorClassification DTOs + FileLockFactory Protocol. These all live in interface.py — the single public API surface for the component. The implementation (provisioner.py) imports the Protocols and DTOs from there and declares only the implementation classes in its own __all__. This matches the pattern established by AZ-321 / AZ-323 / AZ-324.

2. Build-identity hash byte-aligned with AZ-323

AZ-325's idempotence check has to match the manifest_hash AZ-323 wrote into the prior Manifest.json byte-for-byte. Re-implementing the hash formula here would risk drift. We instead import AZ-323's existing _compute_manifest_hash and _aggregate_tile_hash helpers directly and reconstruct the inputs the helpers need from a combination of the new BuildRequest (for tiles_coverage_sha256, calibration_sha256, sector/bbox/zoom/origin/flight) and the prior Manifest's recorded artifacts (engine SHA-256s, descriptor index SHA-256). The leading underscore on the helpers is acknowledged technical debt: it remains finding F1 from the batch 31-33 cumulative review, with a deferred hygiene PBI to extract a shared _build_identity module after AZ-324 ships. The decision is documented inline in provisioner.py:43-50.

3. Idempotence path performs zero compile / embed / write work

CP-INV-1 + AC-2 are explicit: a warm idempotent re-run must result in zero calls to compile_engines_for_corpus, zero calls to populate_descriptors, zero calls to build_manifest, and the on-disk Manifest.json must remain byte-identical (mtime unchanged). The orchestrator never instantiates a write path before the idempotence check returns; it touches only tile_metadata_store.query_by_bbox (a read), the Manifest.json parse, and the SHA-256 of calibration_path. Spies in the unit tests verify each of these conditions.
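A minimal sketch of the short-circuit and of a spy-style check, under stated assumptions: identity_hash is a stand-in for AZ-323's helpers (the real formula is not shown in this report), and the build function, its parameters, and the input dictionary are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from unittest.mock import Mock


def identity_hash(inputs: dict) -> str:
    # Stand-in for AZ-323's hash helpers; the real formula is an unknown here.
    return hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()


def build(manifest_path: Path, inputs: dict, compiler, batcher, builder) -> str:
    """Return "idempotent" without touching any write path when hashes match."""
    if manifest_path.exists():
        prior = json.loads(manifest_path.read_text())
        if prior.get("manifest_hash") == identity_hash(inputs):
            return "idempotent"  # read-only path: no compile, embed, or write
    compiler.compile_engines_for_corpus()
    batcher.populate_descriptors()
    builder.build_manifest()
    return "built"


with TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "Manifest.json"
    inputs = {"calibration_sha256": "abc", "zoom": 18}  # hypothetical inputs
    manifest.write_text(json.dumps({"manifest_hash": identity_hash(inputs)}))
    before = (manifest.read_bytes(), manifest.stat().st_mtime_ns)

    # Mocks act as spies: they record calls without doing any work.
    compiler, batcher, builder = Mock(), Mock(), Mock()
    outcome = build(manifest, inputs, compiler, batcher, builder)

    compiler.compile_engines_for_corpus.assert_not_called()
    batcher.populate_descriptors.assert_not_called()
    builder.build_manifest.assert_not_called()
    unchanged = (manifest.read_bytes(), manifest.stat().st_mtime_ns) == before
```

Comparing both the bytes and st_mtime_ns catches a rewrite that happens to produce identical content, which a byte comparison alone would miss.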

4. Coverage rollback uses .prev snapshot, not in-memory bytes

_run_active_build snapshots the prior-good Manifest by renaming Manifest.json → Manifest.json.prev BEFORE the active phases run. Every error path (engine compile raise, descriptor batcher raise, manifest builder raise, ManifestCoverageError) calls _restore_prior_manifest, which deletes the new partial Manifest and renames .prev back. This guarantees CP-INV-2 (failed build leaves cache no worse than at start) without holding bytes in memory across the whole build.
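The rename-snapshot pattern can be sketched as below. This is an illustration rather than the code in provisioner.py: run_active_build and failing_phases are hypothetical names, and removing the .prev file after a successful build is an assumption the report does not confirm.

```python
import os
from pathlib import Path
from tempfile import TemporaryDirectory


def run_active_build(manifest: Path, build_phases) -> None:
    """Run build_phases; on any failure, put the prior Manifest back."""
    prev = manifest.with_name(manifest.name + ".prev")
    had_prior = manifest.exists()
    if had_prior:
        os.replace(manifest, prev)  # snapshot by rename; no bytes held in memory
    try:
        build_phases(manifest)
    except Exception:
        manifest.unlink(missing_ok=True)  # drop the partial new Manifest
        if had_prior:
            os.replace(prev, manifest)    # restore the prior-good Manifest
        raise
    else:
        if had_prior:
            prev.unlink(missing_ok=True)  # assumption: snapshot dropped on success


def failing_phases(manifest: Path) -> None:
    # Simulates a build that writes a partial Manifest and then fails.
    manifest.write_text("partial")
    raise RuntimeError("coverage walk failed")


with TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "Manifest.json"
    manifest.write_text("prior-good")
    try:
        run_active_build(manifest, failing_phases)
    except RuntimeError:
        pass
    restored = manifest.read_text()
    prev_left_behind = manifest.with_name("Manifest.json.prev").exists()
```

os.replace is used for both renames because it overwrites an existing destination on every platform, whereas os.rename raises on Windows if the target exists.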

5. Lockfile uses filelock package (fcntl-backed on POSIX)

The FileLockFactory Protocol is the seam; the default FilelockFileLockFactory wraps filelock.FileLock (fcntl flock on POSIX → kernel auto-releases on process exit, satisfying the SIGKILL clause of AC-8; msvcrt locks on Windows). On acquisition timeout, the wrapper re-raises as the contract's typed BuildLockHeldError. Lockfile cleanup is best-effort — a leftover .c10.lock is harmless (filelock re-uses the file on next acquisition); the kernel-level advisory lock is what enforces mutual exclusion.
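To show the kernel-level mechanism the paragraph relies on, the sketch below swaps the filelock package for a direct fcntl.flock call from the standard library (POSIX only). It is a single non-blocking attempt with no timeout polling, so it is simpler than what filelock.FileLock does. hold_build_lock is a hypothetical name, and BuildLockHeldError here is a local stand-in for the contract's typed error.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path
from tempfile import TemporaryDirectory


class BuildLockHeldError(RuntimeError):
    """Local stand-in for the contract's typed error."""


@contextmanager
def hold_build_lock(lock_path: Path):
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Advisory exclusive lock; fails immediately if another holder exists.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise BuildLockHeldError(str(lock_path)) from exc
        yield
    finally:
        # Closing the descriptor releases the lock. The kernel does the same
        # when the process exits, including after SIGKILL.
        os.close(fd)


with TemporaryDirectory() as tmp:
    lock_path = Path(tmp) / ".c10.lock"
    with hold_build_lock(lock_path):
        try:
            with hold_build_lock(lock_path):
                second_acquired = True
        except BuildLockHeldError:
            second_acquired = False
    # The lockfile is still on disk, yet the next acquisition succeeds.
    with hold_build_lock(lock_path):
        reacquired = True
```

The last acquisition illustrates why a leftover lockfile is harmless: the file's presence carries no meaning, only the flock held on an open descriptor does.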

6. Diagnostic compile_engines_for_corpus is lock-free

AC-10 / CP-TC-11: the engine-only diagnostic passthrough does NOT acquire the lockfile. Operators run this for hardware-change scenarios where forcing a full transactional build would be overkill, and the lock-free path keeps it from contending with a concurrently-held lock from an unrelated build_cache_artifacts invocation (covered by test_diagnostic_engine_compile_does_not_acquire_lock).
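The shape of that test can be approximated with a mock lock factory, as in the sketch below. ProvisionerSketch and the argument-free acquire() call are hypothetical simplifications; only the two method names come from this report.

```python
from contextlib import nullcontext
from unittest.mock import Mock


class ProvisionerSketch:
    def __init__(self, engine_compiler, lock_factory) -> None:
        self._engine_compiler = engine_compiler
        self._lock_factory = lock_factory

    def build_cache_artifacts(self):
        # Full build: runs under the lock (other phases omitted in this sketch).
        with self._lock_factory.acquire():
            return self._engine_compiler.compile_engines_for_corpus()

    def compile_engines_for_corpus(self):
        # Diagnostic passthrough: delegates directly, never touches the lock.
        return self._engine_compiler.compile_engines_for_corpus()


compiler = Mock()
lock_factory = Mock()
lock_factory.acquire.return_value = nullcontext()

sketch = ProvisionerSketch(compiler, lock_factory)

sketch.compile_engines_for_corpus()
lock_calls_after_diagnostic = lock_factory.acquire.call_count

sketch.build_cache_artifacts()
lock_calls_after_build = lock_factory.acquire.call_count
```

Counting calls on the factory, rather than inspecting the filesystem, keeps the check independent of whichever lock backend is injected.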

7. C10ProvisionerConfig lives at the top of C10ProvisioningConfig

The new config dataclass (coverage_strict, lock_timeout_s, manifest_filename) is wired in as C10ProvisioningConfig.provisioner, matching the existing manifest / engine_compiler sub-block pattern. The composition root reads block.provisioner and passes it directly into the orchestrator's constructor.
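A sketch of the wiring, assuming frozen dataclasses and illustrative defaults. The three field names and the provisioner attribute come from this report; the default values, the field types, and the single-argument factory signature are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class C10ProvisionerConfig:
    coverage_strict: bool = True              # default is an assumption
    lock_timeout_s: float = 30.0              # default is an assumption
    manifest_filename: str = "Manifest.json"  # default is an assumption


@dataclass(frozen=True)
class C10ProvisioningConfig:
    # The existing manifest / engine_compiler sub-blocks are omitted here.
    provisioner: C10ProvisionerConfig = field(default_factory=C10ProvisionerConfig)


class CacheProvisionerImpl:
    def __init__(self, config: C10ProvisionerConfig) -> None:
        self.config = config


def build_cache_provisioner(config: C10ProvisioningConfig) -> CacheProvisionerImpl:
    # Simplified: the real factory also receives the injected collaborators.
    return CacheProvisionerImpl(config.provisioner)


block = C10ProvisioningConfig(provisioner=C10ProvisionerConfig(lock_timeout_s=5.0))
provisioner = build_cache_provisioner(block)
```

Passing only the sub-block into the constructor means the orchestrator cannot read unrelated configuration, which keeps its dependencies explicit.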

Files Changed

Production code (new)

  • src/gps_denied_onboard/components/c10_provisioning/provisioner.py — CacheProvisionerImpl (orchestrator) + _LockGuard + FilelockFileLockFactory.

Production code (modified)

  • pyproject.toml — added filelock>=3.13,<4.0 (single new third-party dep, per task constraint).
  • src/gps_denied_onboard/components/c10_provisioning/interface.py — replaced placeholder CacheProvisioner Protocol with v1.1.0 surface; added BuildOutcome, BuildRequest, BuildReport, SectorClassification, FileLockFactory.
  • src/gps_denied_onboard/components/c10_provisioning/errors.py — added BuildLockHeldError, ManifestCoverageError.
  • src/gps_denied_onboard/components/c10_provisioning/config.py — added C10ProvisionerConfig + integrated as C10ProvisioningConfig.provisioner sub-block.
  • src/gps_denied_onboard/components/c10_provisioning/__init__.py — re-exported new public symbols.
  • src/gps_denied_onboard/runtime_root/c10_factory.py — added build_cache_provisioner(config, *, engine_compiler, descriptor_batcher, manifest_builder, tile_metadata_store, host, precision, clock) composition-root factory.

Tests (new)

  • tests/unit/c10_provisioning/test_cache_provisioner.py — 18 tests covering AC-1..AC-16 + NFR-perf-coverage-walk + test_diagnostic_engine_compile_does_not_acquire_lock supplemental. AC-12 (cold-build benchmark) is wired with pytest.skip() — runs manually on Tier-1 GPU host only.

Test Results

  • 17 / 17 AZ-325 tests pass; 1 GPU-only test skipped as expected.
  • 80 / 80 targeted runs pass on tests/unit/c10_provisioning/ (excluding the pre-existing AZ-322 faiss-import failure) + tests/unit/composition_root/.
  • One pre-existing failure is unchanged from HEAD: tests/unit/c10_provisioning/test_descriptor_batcher.py::test_ac6_descriptor_id_mapping_matches_az306_scheme fails with ModuleNotFoundError: No module named 'faiss' because faiss is an optional Tier-1 dependency. Verified as pre-existing by git stash + re-run on HEAD. Not introduced by AZ-325; tracked in the context of _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md.

Decisions Ledger

| Decision | Rationale |
|---|---|
| Public surface centralised in interface.py | Mirrors AZ-321 / AZ-323 / AZ-324; one source of truth for contract Protocols + DTOs |
| Idempotence uses AZ-323's private hash helpers | Byte-for-byte agreement with the on-disk manifest_hash; refactor deferred to a hygiene PBI |
| .prev rollback over in-memory snapshot | Lower memory pressure for large Manifests; rename is atomic |
| filelock chosen over fasteners | Already idiomatic for the project size; fcntl-backed; SIGKILL-safe |
| Diagnostic passthrough is lock-free | AC-10; operator-controlled engine-only re-compile must not contend with a held lock |
| C10ProvisionerConfig is a sub-block of C10ProvisioningConfig | Matches existing manifest / engine_compiler pattern; keeps the config tree shallow |

Notes

  • build_cache_provisioner is wired but no integration test exists yet for the full real-AZ-321/322/323 pipeline (requires GPU + FAISS + TRT). E2E coverage lands with AZ-326 (T5 orchestrator) which composes the provisioner into the operator CLI.
  • F1 from the batch 31-33 cumulative review (verifier importing private helper from manifest_builder) carries over; AZ-325 also depends on the same private helpers. The hygiene PBI to extract a shared _build_identity module is intentionally deferred: both consumers (AZ-324 verifier + AZ-325 provisioner) need the same helper, and a single refactor PBI after AZ-326 is cleaner than touching each consumer twice.
  • The OKVIS2 cmake submodule failure (carryover from batch 35/36) remains and is independent of this batch.