gps-denied-onboard/_docs/02_document/contracts/c10_provisioning/cache_provisioner.md

Contract: CacheProvisioner (C10)

Type: Python Protocol (@runtime_checkable), a local in-process API.

Producer task: AZ-325_c10_cache_provisioner

Consumers:

  • C12 Operator Tooling: orchestrates the F1 build sequence (C11 TileDownloader → CacheProvisioner.build_cache_artifacts) and surfaces the BuildReport to the operator (E-C12 / AZ-253).
  • C13 FDR: out of scope for build (F1 is offline / pre-flight); F2's verify is owned by the ManifestVerifier contract.

Purpose

CacheProvisioner is the public top-level surface for the C10 build phase. It composes EngineCompiler (AZ-321), DescriptorBatcher (AZ-322), and ManifestBuilder (AZ-323) into a single idempotent operation that the operator runs after C11 TileDownloader has populated C6. The Provisioner enforces the following:

  • D-C10-1 idempotence: skip the rebuild when the build-identity hash matches the prior Manifest.
  • D-C10-3 coverage: every shipped artifact under cache_root MUST be listed in the Manifest (no smuggled files); a violation raises ManifestCoverageError.
  • D-C10-6 hardware-tied engine reuse: delegated to AZ-321.

It does NOT touch satellite-provider (per epic § Architecture notes); tile I/O is C11's responsibility.

Public Surface

from __future__ import annotations  # DTO names (see DTOs) are resolved lazily

from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class CacheProvisioner(Protocol):
    """Public top-level orchestrator for C10 cache build.

    Idempotent: if the prior Manifest's build-identity hash matches the
    request's, returns `outcome=IDEMPOTENT_NO_OP` without rebuilding.
    Otherwise composes engine compile + descriptor population + Manifest
    write + coverage check.
    """

    def build_cache_artifacts(self, request: BuildRequest) -> BuildReport: ...
    def compile_engines_for_corpus(self, request: EngineCompileRequest) -> tuple[EngineCacheEntry, ...]: ...
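As a rough illustration of what `@runtime_checkable` provides to consumers, the sketch below re-declares the Protocol with untyped signatures so that it runs on its own. `FakeProvisioner` and `MissingMethod` are hypothetical test doubles, not part of the contract.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CacheProvisioner(Protocol):
    # Signatures are left untyped to keep this sketch self-contained.
    def build_cache_artifacts(self, request): ...
    def compile_engines_for_corpus(self, request): ...


class FakeProvisioner:
    """Hypothetical test double that provides both methods."""

    def build_cache_artifacts(self, request):
        return None

    def compile_engines_for_corpus(self, request):
        return ()


class MissingMethod:
    """Hypothetical class that lacks compile_engines_for_corpus."""

    def build_cache_artifacts(self, request):
        return None


print(isinstance(FakeProvisioner(), CacheProvisioner))  # True
print(isinstance(MissingMethod(), CacheProvisioner))    # False
```

A runtime `isinstance` check against a Protocol only verifies that the methods are present; it does not validate their signatures or return types, so it complements rather than replaces static type checking.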

DTOs

from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class SectorClassification(Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


class BuildOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    IDEMPOTENT_NO_OP = "idempotent_no_op"


@dataclass(frozen=True)
class Bbox:
    lat_min: float
    lon_min: float
    lat_max: float
    lon_max: float


@dataclass(frozen=True)
class BuildRequest:
    bbox: Bbox
    zoom_levels: tuple[int, ...]
    sector_class: SectorClassification
    calibration_path: Path
    cache_root: Path
    key_path: Path  # operator signing key per C10-ST-01


@dataclass(frozen=True)
class BuildReport:
    outcome: BuildOutcome
    engines_built: int
    engines_reused: int
    descriptors_generated: int
    manifest_hash: str | None
    manifest_path: Path | None
    failure_reason: str | None
    elapsed_s: float

(EngineCompileRequest and EngineCacheEntry are AZ-321's; re-exported for convenience.)

Exceptions

| Exception | When raised | Caller action |
|---|---|---|
| BuildLockHeldError | Another build_cache_artifacts invocation holds the cache_root lockfile (per description.md § 7 race-condition mitigation). | Operator waits / kills the other process; not retried automatically. |
| ManifestCoverageError | After build, an orphan file exists under cache_root that is not listed in the Manifest. | Build is rolled back to prior-good Manifest (if present); operator inspects the orphan. |
| EngineBuildError, CalibrationCacheError | Propagated from AZ-321 / AZ-298. | Operator triages GPU / calibration. |
| DescriptorBatchError | Propagated from AZ-322. | Operator triages GPU OOM / model. |
| ManifestWriteError | Propagated from AZ-323 (key fingerprint mismatch in operator mode, key load failure, atomic-write failure). | Operator inspects key / disk. |

BuildOutcome.FAILURE is reserved for soft failures captured in BuildReport (missing tiles in C6, coverage warning when configured non-strict). Hard errors raise.

Invariants

| ID | Invariant | Why |
|---|---|---|
| CP-INV-1 | Idempotence: if Manifest.json exists at cache_root AND its manifest_hash equals the build-identity hash for the new request → outcome=IDEMPOTENT_NO_OP, ZERO new compiles, ZERO new embeds, ZERO new Manifest writes; the existing Manifest is left untouched. | D-C10-1; warm re-run ≤ 1 min envelope (C10-PT-01). |
| CP-INV-2 | A failed build_cache_artifacts does NOT leave the cache in a worse state than at the start: new engines may exist (cache hits) but the Manifest is either the previous-good one OR rolled back; the FAISS index is either the previous-good one OR atomically replaced. | Operators can retry safely. |
| CP-INV-3 | After a SUCCESS outcome, ManifestCoverageError has been verified absent: every file under cache_root (recursively, excluding the Manifest itself + sidecars + sig) is listed in the Manifest's artifacts. | D-C10-3: no smuggled artifacts in the takeoff cache. |
| CP-INV-4 | Concurrent build_cache_artifacts calls on the same cache_root are mutually exclusive via a filesystem lockfile at cache_root/.c10.lock. | description.md § 7 race-condition mitigation. |
| CP-INV-5 | cache_root must already exist; build_cache_artifacts does NOT create the directory tree (operator workflow places it). | Avoids accidental builds in unintended paths. |
| CP-INV-6 | No network calls (no satellite-provider, no Postgres TLS to a remote DB beyond the local instance, no metric push). | Epic § Architecture notes: C10 is workstation-local. |
| CP-INV-7 | The operator key file at request.key_path is opened exactly once (via AZ-323's signer) and zeroized when out of scope; this contract does NOT cache the key in memory across calls. | Operator key hygiene. |
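The coverage check behind CP-INV-3 could look roughly like the sketch below. The contract names Manifest.json and .c10.lock; the signature filename, the decision to exclude the lockfile, and the function name `find_orphans` are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: the contract names Manifest.json and .c10.lock; the signature
# filename is hypothetical, as is treating the lockfile as excluded.
EXCLUDED = frozenset({"Manifest.json", "Manifest.json.sig", ".c10.lock"})


def find_orphans(cache_root: Path, manifest_artifacts: set[str]) -> list[str]:
    """Return sorted relative paths of files under cache_root not listed in the Manifest."""
    orphans = []
    for path in cache_root.rglob("*"):
        if not path.is_file():
            continue  # directories are not artifacts
        rel = path.relative_to(cache_root).as_posix()
        if rel in EXCLUDED or rel in manifest_artifacts:
            continue
        orphans.append(rel)
    return sorted(orphans)
```

Under this sketch, a non-empty result is the condition that would lead to ManifestCoverageError and a rollback to the prior-good Manifest.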

Non-Goals

  • Tile fetch from satellite-provider — owned by E-C11 / C11 TileDownloader.
  • Engine deserialization at takeoff — owned by E-C7 / AZ-298 + C5 takeoff arming.
  • Manifest verification — owned by AZ-324's ManifestVerifier (separate contract).
  • Multi-cache management (rotating between sector caches) — operator runs build_cache_artifacts per cache_root.
  • Garbage collection of stale engines — explicit operator action; not part of the build flow.
  • Resumable build (mid-build process kill → resume from last batch) — out of scope; restart from scratch.

Versioning

  • v1.0.0 — initial Protocol surface (this document).
  • Breaking changes: changing BuildRequest shape, removing a BuildOutcome, adding a required field — bump major.
  • Additive changes: new optional kwarg, new BuildOutcome value, new field on BuildReport — bump minor. Consumers MUST handle unknown outcomes gracefully (treat as FAILURE).
  • Patch: clarifications, doc edits.

| Version | Date | Notes | Author |
|---|---|---|---|
| 1.0.0 | 2026-05-10 | Initial contract, produced by AZ-325 (E-C10 decomposition) | autodev |
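One way a consumer might satisfy the "treat unknown outcomes as FAILURE" rule is to allowlist the outcomes it understands instead of testing for FAILURE directly. In the sketch below, `PARTIAL` is an invented value that stands in for whatever a future minor version might add, and `cache_is_usable` is a hypothetical helper.

```python
from enum import Enum


class BuildOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    IDEMPOTENT_NO_OP = "idempotent_no_op"
    PARTIAL = "partial"  # hypothetical: not in v1.0.0, used only to test the fallback


# Outcomes this consumer knows to mean "the cache is ready".
_KNOWN_GOOD = frozenset({"success", "idempotent_no_op"})


def cache_is_usable(outcome: Enum) -> bool:
    """Allowlist check: any outcome this consumer does not recognize counts as failure."""
    return outcome.value in _KNOWN_GOOD
```

Writing the check as `outcome is not BuildOutcome.FAILURE` would silently accept a newly added value, which is the behavior the versioning rule is meant to prevent.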

Test Cases (consumer side)

| ID | Scenario | Expected Outcome |
|---|---|---|
| CP-TC-1 | Cold build with all dependencies satisfied | outcome=SUCCESS; counts > 0; Manifest at cache_root/Manifest.json |
| CP-TC-2 | Warm build, identical request | outcome=IDEMPOTENT_NO_OP; counts all 0; Manifest unchanged on disk |
| CP-TC-3 | Warm build, different bbox | outcome=SUCCESS; rebuild happens; new Manifest replaces old (atomic) |
| CP-TC-4 | C6 has zero tiles for the requested scope | outcome=FAILURE; failure_reason directs operator to run C11 first |
| CP-TC-5 | Concurrent invocation while another build in progress | BuildLockHeldError; second invocation does not corrupt state |
| CP-TC-6 | An orphan file exists under cache_root after build | ManifestCoverageError; rolled back to prior Manifest if present |
| CP-TC-7 | Operator key file fingerprint not in allowlist (operator mode) | ManifestWriteError (propagated from AZ-323); ZERO file writes |
| CP-TC-8 | EngineBuildError mid-compile | Exception propagates; partial cache state consistent (atomic engines on disk for those that succeeded; Manifest NOT updated) |
| CP-TC-9 | DescriptorBatchError (persistent CUDA OOM) | Exception propagates; engines may be on disk; Manifest NOT updated |
| CP-TC-10 | Conformance: isinstance(impl, CacheProvisioner) | True |
| CP-TC-11 | compile_engines_for_corpus directly callable for re-compile-only flows | Returns tuple[EngineCacheEntry, ...]; no descriptor / Manifest work |
| CP-TC-12 | Cold build wall-clock benchmark on Tier-1 dev workstation, 1k tiles, 3 backbones | ≤ 12 min (NFR C10-PT-01) |
| CP-TC-13 | Warm idempotent re-run benchmark | ≤ 1 min (NFR C10-PT-01) |
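CP-TC-5 can be exercised against a lock like the one CP-INV-4 describes. The sketch below is one possible implementation, assuming exclusive file creation is an acceptable locking primitive on the workstation filesystem. `cache_build_lock` is a hypothetical name and the exception class is a stand-in for the contract's.

```python
import os
from contextlib import contextmanager
from pathlib import Path


class BuildLockHeldError(Exception):
    """Stand-in for the contract's exception."""


@contextmanager
def cache_build_lock(cache_root: Path):
    """Hold cache_root/.c10.lock for the duration of the with-block."""
    lock_path = cache_root / ".c10.lock"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can create the lockfile.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise BuildLockHeldError(str(lock_path)) from None
    try:
        # Record the owner PID to help an operator identify the holder.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield lock_path
    finally:
        os.close(fd)
        lock_path.unlink()
```

A process killed mid-build leaves a stale lockfile under this scheme, so an operator would have to remove it by hand. Whether the real implementation handles stale locks is not stated in this contract.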