# C10: Pre-flight Cache Provisioning

## 1. High-Level Overview

**Purpose**: build the **model-derived** pre-flight cache artifacts on top of an already-populated tile store, and verify them at takeoff. After C11 `TileDownloader` has fetched tiles into C6, C10 orchestrates: compile/deserialize TensorRT engines via C7 → batch each tile through C2's backbone for descriptors → atomically write the FAISS HNSW index with SHA-256 sidecars (D-C10-3) → write the Manifest with a hash of (model + calibration + corpus + sector_class **+ takeoff_origin**) for D-C10-1 idempotence. The `takeoff_origin` is supplied by C12 (derived from `Flight.waypoints[0]` via the `FlightsApiClient`, ADR-010 + AZ-489); C10 treats it as one more identity field and bakes it into both the Manifest body and the manifest-hash. At F2 takeoff load, C10 runs `verify_manifest` (D-C10-3 SHA-256 content-hash gate) before allowing the system to arm; the verifier also surfaces `takeoff_origin` so the companion's composition root can pass it to `C5.set_takeoff_origin(origin, sigma_horiz_m, sigma_vert_m)` before any sensor sample (AZ-490).

**C10 does NOT touch `satellite-provider`.** Tile I/O, both download (F1 inbound) and post-landing upload (F10), lives in C11 (Tile Manager). C10 reads tiles from C6 and writes engines + descriptors + manifest to filesystem and Postgres. The split is operational: C11 carries the operator-side network identity (TLS API key for download, per-flight signing key for upload) and the airborne-exclusion property (ADR-004); C10 carries the model identity and the takeoff-load verifier, neither of which needs to leave the workstation/companion enclave at runtime.

**Architectural Pattern**: Coordinator, with a single concrete implementation `CacheProvisioner` behind two interfaces (`CacheProvisioner` for the F1 build phase, `ManifestVerifier` for F2's content-hash gate). The interfaces are split because F2 only needs the verifier and shouldn't pull in the full provisioning code path.
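The D-C10-1 idempotence check described in the Purpose paragraph can be illustrated with a minimal sketch. It assumes the manifest hash is a SHA-256 over a canonical JSON encoding of the identity fields; the actual encoding is defined in data_model.md and may differ. `ManifestIdentity`, `manifest_hash`, and `is_idempotent_no_op` are hypothetical names introduced only for this example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ManifestIdentity:
    """Hypothetical container for the fields the manifest hash covers."""
    model_sha256: str
    calibration_sha256: str
    corpus_sha256: str  # assumed: a digest summarising the tile corpus
    sector_class: str   # "active_conflict" or "stable_rear"
    takeoff_origin: Optional[Tuple[float, float, float]]  # (lat, lon, alt) or None


def manifest_hash(identity: ManifestIdentity) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    payload = {
        "model": identity.model_sha256,
        "calibration": identity.calibration_sha256,
        "corpus": identity.corpus_sha256,
        "sector_class": identity.sector_class,
        "takeoff_origin": identity.takeoff_origin,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_idempotent_no_op(identity: ManifestIdentity, previous_hash: Optional[str]) -> bool:
    """True when a prior build exists and its hash matches the requested identity."""
    return previous_hash is not None and manifest_hash(identity) == previous_hash
```

Because `takeoff_origin` is part of the hashed payload, changing the origin for the same model, calibration, and corpus yields a different hash and therefore forces a rebuild rather than a no-op.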
**Upstream dependencies**:
- C12 OperatorTooling → triggers `build_cache_artifacts(...)` after C11 `TileDownloader` has populated C6.
- C6 TileStore + TileMetadataStore + DescriptorIndex → read source (tiles + metadata), write target (FAISS index).
- C7 InferenceRuntime → engine compile + deserialize.
- C2 backbone (via C7 engine) → descriptor batched generation.

**Downstream consumers**:
- F2 takeoff load → consumes `verify_manifest` outcome.

## 2. Internal Interfaces

### Interface: `CacheProvisioner`

| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `build_cache_artifacts` | `BuildRequest` | `BuildReport` | No (offline; minutes) | `EngineBuildError`, `DescriptorBatchError`, `ManifestWriteError`, `IdempotentNoOp` |
| `compile_engines_for_corpus` | `BackboneList` | `list[EngineCacheEntry]` | No | `EngineBuildError`, `CalibrationCacheError` |

### Interface: `ManifestVerifier`

| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `verify_manifest` | `manifest_path: Path` | `VerificationResult` | No | `ManifestNotFoundError`, `ContentHashMismatchError` |

**Input/Output DTOs**:

```
BuildRequest:
  bbox: BoundingBox (lat_min, lon_min, lat_max, lon_max)  # scopes which C6 tiles are in the manifest
  zoom_levels: list[int]
  sector_class: enum {active_conflict, stable_rear}       # baked into manifest
  calibration_path: Path
  cache_root: Path
  takeoff_origin: LatLonAlt | None                        # ADR-010 / AZ-489; baked into manifest + hash
  flight_id: UUID | None                                  # ADR-010; pass-through provenance, baked into manifest

BuildReport:
  engines_built: int
  engines_reused: int
  descriptors_generated: int
  manifest_hash: sha256
  outcome: enum {success, failure, idempotent_no_op}
  failure_reason: string (optional)

Manifest: see data_model.md (carries takeoff_origin + flight_id when set; hash includes them)

EngineCacheEntry: see data_model.md

VerificationResult:
  manifest_hash_match: bool
  per_artifact_hash_match: dict[Path, bool]
  takeoff_origin: LatLonAlt | None                        # passed through from manifest for C5 warm-start (AZ-490)
  flight_id: UUID | None
  outcome: enum {pass, fail}
  fail_reasons: list[string]
```

## 3. External API Specification

Not applicable. C10 has no network surface: all I/O is local filesystem + local Postgres.

## 4. Data Access Patterns

C10 reads `tiles` rows from C6 (scoped to the build's bbox + zoom_levels), writes the FAISS `.index` to filesystem via `Sha256Sidecar`, and writes Manifest + `manifests` row to Postgres via C6.

### Storage Estimates

| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|-----------------|---------------------|----------|------------|-------------|
| Manifest | one per build per cached area | ~10 KB (YAML/JSON) | negligible | per build |
| SHA-256 sidecars | one per artifact (.index, calibration JSON, manifest, .engine) | 64 B (hex digest) | negligible | per build |

### Data Management

**Seed data**: none. C10 writes from scratch (or D-C10-1 idempotently no-ops). Tiles must already be in C6 (placed there by C11 `TileDownloader`); a missing-tiles condition is a build error, not a download trigger.

**Rollback**: D-C10-1 manifest-hash check makes provisioning idempotent. Atomic writes (atomicwrites package) prevent partial states; on partial failure, the previous-good cache remains until the new one is fully written.

## 5. Implementation Details

**Algorithmic Complexity**: dominated by descriptor batched generation on Jetson (GPU-bound). Worst-case ~400 km² provisioning is ≤ tens of minutes (offline, not time-critical per AC-8.3). Tile network bandwidth is **not** in C10's budget; that cost is in C11.

**State Management**: stateless w.r.t. flight lifetime. No connection state; all dependencies are local.
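The atomic-write-plus-sidecar pattern referenced under Data Access Patterns and Rollback can be sketched as follows. This is an illustration under stated assumptions, not the project's `Sha256Sidecar` helper: it uses the standard library (`tempfile` + `os.replace`) in place of the `atomicwrites` package, and the `.sha256` suffix and the function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def _atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_with_sidecar(path: Path, data: bytes) -> Path:
    """Atomically write an artifact and a sidecar holding its hex SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    _atomic_write_bytes(path, data)
    sidecar = path.with_name(path.name + ".sha256")  # assumed naming convention
    _atomic_write_bytes(sidecar, digest.encode("ascii"))
    return sidecar


def verify_sidecar(path: Path) -> bool:
    """Recompute the artifact digest and compare it with the stored sidecar."""
    sidecar = path.with_name(path.name + ".sha256")
    if not (path.is_file() and sidecar.is_file()):
        return False
    expected = sidecar.read_text(encoding="ascii").strip()
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == expected
```

In this sketch the artifact is replaced before its sidecar, so a crash between the two renames leaves a new artifact next to a stale digest; `verify_sidecar` then reports a mismatch instead of silently accepting the file.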
**Key Dependencies**:

| Library | Version | Purpose |
|---------|---------|---------|
| atomicwrites | latest | Atomic file replacement for `.index` + Manifest (D-C10-3) |
| hashlib (stdlib) | stdlib | SHA-256 content-hash sidecars |
| PyYAML / orjson | per project pin | Manifest serialization |
| numpy | per project pin | Descriptor batch ndarray container (AZ-322 `DescriptorBatcher`) |

**AZ-322 internal phase (`DescriptorBatcher`)**: The `populate_descriptors` phase walks every tile in C6 for the requested `(bbox, zoom_levels, sector_class)`, embeds them through C7's `InferenceRuntime` (via `C7EngineBackboneEmbedder`, the default `BackboneEmbedder` impl), and hands the resulting `(N, descriptor_dim)` ndarray to AZ-306's `DescriptorIndex.rebuild_from_descriptors` for the atomic FAISS index write. CUDA OOM is handled via halve-and-retry bounded by `C10BatcherConfig.max_oom_retries` (default 1: 64 → 32, then succeed-or-fail-fast) so a real GPU regression surfaces in seconds rather than via silent retries. Per-10% progress is emitted both as DEBUG logs (`c10.descriptor.progress`) and via an optional `progress_callback` so operator tooling can wire a TTY/GUI bar without touching the batcher itself. The descriptor int64 id formula is the canonical AZ-306 scheme (`int.from_bytes(sha256("zoom|lat|lon").first8, "big", signed=True)`). It is duplicated locally in C10 to avoid a circular dependency back into C6 internals; diverging from the AZ-306 scheme would break AC-6.

**Error Handling Strategy**:
- `EngineBuildError` / `CalibrationCacheError`: surfaced from C7. Never silently fall back; operator must intervene.
- `DescriptorBatchError`: CUDA OOM during descriptor generation. Halve batch size and retry once; if still OOM, surface to operator.
- `ManifestWriteError`: filesystem error or atomic-write rollback. Cache marked invalid; operator must re-run.
- `IdempotentNoOp`: D-C10-1 manifest-hash matched the prior build's hash; skip rebuild; emit no-op report.
- `ContentHashMismatchError` (F2): refuse takeoff; STATUSTEXT to GCS; FDR records the event; operator must re-run F1.
- **Missing tiles in C6 for the requested bbox/zoom**: surface as `BuildReport.failure` with explicit instruction to run C11 `TileDownloader` first; do **not** fall back to a network fetch, because that responsibility lives in C11.

## 6. Extensions and Helpers

| Helper | Purpose | Used By |
|--------|---------|---------|
| `Sha256Sidecar` | atomic write + content-hash sidecar pattern | C6, C7, C10 |
| `EngineFilenameSchema` | self-describing filename per D-C10-7 | C7, C10 |
| `WgsConverter` | bbox math | C4, C5, C6, C8, C10 |

## 7. Caveats & Edge Cases

**Known limitations**:
- C10 depends on C6 already containing the tiles for the requested bbox + zoom levels. The F1 cache-build workflow (C12) sequences `C11 TileDownloader → C10 build_cache_artifacts`; C10 alone is not a complete F1.
- D-C10-3 SHA-256 content-hash gate must cover EVERY artifact: every tile (the per-tile hash is computed at C11 download time and stored in C6), the FAISS `.index`, the calibration JSON, and the Manifest itself. Missing sidecars are a release-blocking defect.

**Potential race conditions**:
- Concurrent `build_cache_artifacts` invocations on the same cache root would corrupt state. The single-process operator orchestrator wraps each invocation in a filesystem lockfile (the same lockfile C11 honours); if a second invocation tries to start, it fails with an explicit error.

**Performance bottlenecks**:
- Descriptor batched generation is GPU-bound; batching is the main lever (D-C7-1 INT8/FP16 mix decision applies).
- Engine compile is workspace-bound on Jetson; D-C10-6 calibration cache reuse is the main lever.

## 8. Dependency Graph

**Must be implemented after**: C6 (read source for tiles, write target for FAISS), C7 (engine + descriptor runtime), C2 (backbone interface for descriptor generation; called via C7).

**Can be implemented in parallel with**: C8, C13.
**Blocks**: C12 (operator can't sequence F1 without C10 ready), F1, F2 (verify_manifest), F8 (warm-cache verify on reboot recovery).

## 9. Logging Strategy

| Log Level | When | Example |
|-----------|------|---------|
| ERROR | `EngineBuildError`, `DescriptorBatchError`, `ManifestWriteError`, `ContentHashMismatchError` (F2) | `C10 engine build failed: backbone=disk; takeoff blocked` |
| WARN | engine cache miss falls through to build | `C10 engine cache miss: model=ultra_vpr; sm=87, jp=6.2, trt=10.3, fp16; rebuild` |
| INFO | Build start/end + report; verify_manifest pass | `C10 build complete: engines=4, descriptors=87654, manifest_hash=…; outcome=success` |
| DEBUG | per-tile descriptor batch progress | `C10 descriptor batch progress: 12345/87654 (14%)` |

**Log format**: structured JSON.

**Log storage**: stdout (operator tool); journald (companion verify); FDR via C13 (only for F2 verify_manifest events; provisioning is offline and goes to operator-facing logs, not flight FDR).
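As an illustration of the structured JSON log format, here is a minimal sketch built on Python's standard `logging` module. The field names (`level`, `logger`, `message`), the `fields` key passed through `extra`, and the helper names are assumptions made for this example; the project's actual log schema is not specified in this document.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (hypothetical schema)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured key/value pairs supplied via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def make_logger(name: str = "c10") -> logging.Logger:
    """Logger that writes JSON lines to stdout, as the operator tool would."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


log = make_logger()
log.info(
    "C10 build complete",
    extra={"fields": {"engines": 4, "descriptors": 87654, "outcome": "success"}},
)
```

Keeping the counters as separate JSON fields rather than embedding them in the message string lets operator tooling filter on `outcome` or `descriptors` without parsing free text.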