Implements F1 pre-flight cache build orchestrator on the operator workstation. Composes C11 TileDownloader (AZ-316), C12 CompanionBringup (AZ-327), C12 FlightsApiClient (AZ-489), and the new RemoteCacheProvisionerInvoker into one sequenced flow guarded by a filelock-backed workstation-side lockfile. Architectural decisions: - Phase-0 flight-resolve runs BEFORE the lockfile (ADR-010): a flight that cannot be resolved is an operator-input error, not a contended- resource error. Enforced by AC-11 + AC-14. - Consumer-side cuts (AZ-507) for C11 + C10 types: local Protocols / mirror DTOs in tile_downloader_cut.py and _types.py; external errors matched by name-based whitelisting so unknown exceptions still propagate per AC-6. Cross-component type translation lives at the composition root (c12_factory). - Failure surfacing: recognised operational failures (download error, companion not ready, build error, flight-resolve error) return as CacheBuildReport(outcome=failure, failure_phase=...). Only lockfile contention raises (BuildLockHeldError) since no phase ever ran. - Workstation-side filelock library (project pin); no custom primitive. - Remote C10 stdout streamed line-by-line as DEBUG with api_key / auth_token redacted before logging (defence-in-depth). - CLI is now a thin adapter; all workflow logic lives in build_cache.py. operator-tool build-cache exit codes map per CacheBuildReport.failure_phase + failure_exception_type. Tests: 116 c12 unit tests pass (29 new for AZ-328 covering 15/15 ACs + NFR-perf-overhead microbench; 7 new for remote_c10_invoker; 3 new for file_lock; test_cli_build_cache rewritten for new orchestrator interface). Full repo suite: 1522 passed, 80 skipped. Also: replays Batch 42's ruff format leftover for c12 flights_api + test_az489 files (formatter ran over the c12 directory after new files were added). Pure whitespace; no behaviour change. Full report: _docs/03_implementation/batch_43_cycle1_report.md Co-authored-by: Cursor <cursoragent@cursor.com>
12 KiB
Batch 43 — Cycle 1 Report
Date: 2026-05-13 Batch: 43 Tasks: AZ-328 (C12 Build-Cache Orchestrator, 5pt) Status: complete; ticket ready to transition to "In Testing".
Scope
AZ-328 delivers BuildCacheOrchestrator — the F1 (pre-flight cache
build) workflow head for the operator workstation. It composes the
already-shipped C11 TileDownloader (AZ-316), C12 CompanionBringup
(AZ-327), C12 FlightsApiClient (AZ-489), and the new
RemoteCacheProvisionerInvoker (this task) into one sequenced flow,
guarded by a workstation-side filelock lockfile. The operator-tool build-cache subcommand (T1, AZ-326) is now wired to a real
orchestrator and returns granular exit codes per failure phase.
This unblocks the F1 happy-path acceptance test (C12-IT-02) and gives operators the first end-to-end "I have a flight ID, build me a cache" workflow.
Architectural Decisions
1. Phase-0 flight-resolve runs BEFORE the lockfile (ADR-010)
Per the AZ-328 spec, the orchestrator resolves the FlightDto (via
flights_api_client.fetch_flight or load_flight_file), derives the
bbox, and computes the takeoff origin before acquiring the .c12.lock
file. Rationale: a flight that cannot be resolved is an operator-input
error (wrong flight ID, missing JSON file, expired auth token); holding
a contended-resource lock while the operator fixes their input would
artificially block parallel builds against unrelated flights. AC-11 and
AC-14 enforce.
2. Consumer-side cuts (AZ-507) for C11 + C10 types
Per the workspace cross-component import policy, C12 cannot import C11 or C10 directly. Implemented as:
tile_downloader_cut.py— localTileDownloaderCutProtocol +DownloadRequestCut/DownloadBatchReportCut/DownloadOutcomeCutDTOs mirroring the C11 shapes the orchestrator actually consumes.RemoteBuildOutcome+RemoteBuildReportin_types.pymirror the C10BuildOutcome+BuildReportshapes that come back as JSON over SSH (the orchestrator never imports C10 types — it parses JSON the remote process emits).- External-error matching uses name-based whitelisting
(
_is_recognised(exc, frozenset({"SatelliteProviderError", ...}))) rather thanisinstance. C12 cannot import the actual C11/C10 exception classes; matching by class name keeps the orchestrator's failure-phase classification accurate without violating the policy. Anything not on the whitelist propagates raw (per AC-6 — unexpected exceptions still release the lock and surface to the caller). - C11 → C12 type translation lives at the composition root
(
c12_factory.build_build_cache_orchestrator), where both components may legally be imported together.
3. Failure surfacing: report vs raise
Spec text alternates between "raise CacheBuildError(...)" and
"return CacheBuildReport(outcome=failure, ...)". Resolved by:
- Returning
CacheBuildReport(outcome=failure, failure_phase=...)for all recognised operational failures (download error, companion not ready, build error, flight-resolve error). The CLI's_exit_code_for_reporthelper translates these into the existing exit-code constants —EXIT_DOWNLOAD_FAILURE(20),EXIT_BUILD_FAILURE(21),EXIT_FLIGHT_RESOLVE_*(90/91/92/93/94/95 per AZ-489's existing taxonomy, selected via the newfailure_exception_typefield onCacheBuildReport). - Raising
BuildLockHeldErrorfor lockfile contention only — this is the one case where there is noCacheBuildReportto return (no phase ever ran). The CLI maps it toEXIT_LOCK_HELD(50).
This keeps the orchestrator's public surface a pure function-style return-the-report API for normal operator flows, while reserving exceptions for the "we never even started" case.
4. Workstation-side filelock, NOT a custom primitive
FileLock / FileLockFactory Protocols + FilelockFileLockFactory
concrete wrap the already-pinned filelock library (used by E-C13).
Cross-platform file-locking correctness is non-trivial; the spec
explicitly forbids rolling our own. The LockTimeout exception is the
single failure mode the orchestrator catches.
5. Remote C10 stdout streaming + secret redaction
RemoteCacheProvisionerInvoker.invoke line-iterates the SSH session's
stdout, logging each line as kind="c10.remote.progress" at DEBUG.
Memory stays bounded even for multi-hour builds. Before logging, every
line is filtered through a redactor that replaces the literal api_key
and auth_token values with <REDACTED> (defence-in-depth — guards
against C10 itself accidentally echoing them). The final stdout line
is parsed as RemoteBuildReport JSON; malformed output raises
BuildReportParseError (a CacheBuildError subclass) with the
captured tail.
6. CLI request construction now lives in cli.py, not orchestrator
The previous CLI implementation called _resolve_flight itself before
handing per-flag values to a no-op orchestrator stub. AZ-328 moves
flight resolution into the orchestrator (per spec), so the CLI now
just packages flags into a BuildCacheRequest (with FlightSource
sum type — FlightById | FlightFromFile) and forwards. The CLI is
now a thin adapter; all the workflow logic lives in
build_cache.py.
Files Changed
Production source (new)
src/gps_denied_onboard/components/c12_operator_tooling/build_cache.py—BuildCacheOrchestratorclass + sequencedbuild_cache(...)flow_is_recognisedhelper for name-based external-error matching.
src/gps_denied_onboard/components/c12_operator_tooling/file_lock.py—FileLock/FileLockFactoryProtocols,LockTimeoutexception,FilelockFileLockFactoryconcrete wrapping thefilelocklibrary.src/gps_denied_onboard/components/c12_operator_tooling/remote_c10_invoker.py—RemoteCacheProvisionerInvoker(SSH-driven),RemoteBuildRequestDTO, secret-redacting stdout streamer, JSON parser for the finalRemoteBuildReport.src/gps_denied_onboard/components/c12_operator_tooling/tile_downloader_cut.py—TileDownloaderCutProtocol (consumer-side cut for C11TileDownloader).
Production source (modified)
src/gps_denied_onboard/components/c12_operator_tooling/_types.py— addedBuildCacheRequest,CacheBuildReport,FlightResolveReport,BuildCacheOutcomeenum,FailurePhaseenum,FlightResolveSourceenum,FlightSourcesum type (FlightById/FlightFromFile),RemoteBuildOutcome+RemoteBuildReport(C10 mirror types),DownloadRequestCut+DownloadBatchReportCut+DownloadOutcomeCut(C11 mirror types).src/gps_denied_onboard/components/c12_operator_tooling/errors.py— addedCacheBuildError(Exception)with phase-awareremediationattribute,BuildLockHeldError(CacheBuildError),BuildReportParseError(CacheBuildError).src/gps_denied_onboard/components/c12_operator_tooling/config.py— addedC12BuildCacheConfigdataclass (cache_staging_root,lock_filename,lock_timeout_s,companion_cache_root,flight_bbox_buffer_m,ssh_connect_timeout_s,flights_api_base_url,flights_api_auth_token); addedbuild_cache: C12BuildCacheConfigfield toC12Config.src/gps_denied_onboard/components/c12_operator_tooling/cli.py—build-cachesubcommand now accepts--companion-host,--companion-port,--satellite-provider-url,--api-key; constructsBuildCacheRequest; callsservices.build_cache_orchestrator.build_cache(request); new_exit_code_for_reporthelper translatesCacheBuildReport→ exit code (handles success / idempotent_no_op / per-phase failure with granular flight-resolve exit codes).src/gps_denied_onboard/components/c12_operator_tooling/__init__.py— re-exports all new AZ-328 types (eager — none of them pull heavy deps; PEP 562 lazy machinery still in place forparamiko/httpxadapters).src/gps_denied_onboard/runtime_root/c12_factory.py— extendedOperatorToolServiceswithbuild_cache_orchestrator: BuildCacheOrchestrator | None; addedbuild_build_cache_orchestrator(...)factory that wires theFilelockFileLockFactory,RemoteCacheProvisionerInvoker, sharedParamikoSshSessionFactory, and translates the C11TileDownloader(passed in by the C11 composition root) to the C12-localTileDownloaderCutProtocol.
Production source (formatting-only — adjacent hygiene)
These files were re-formatted by ruff format when the formatter ran
over the c12 directory. No behaviour change; pure whitespace /
line-wrapping normalisation:
src/gps_denied_onboard/components/c12_operator_tooling/flights_api/_parser.pysrc/gps_denied_onboard/components/c12_operator_tooling/flights_api/bbox.pysrc/gps_denied_onboard/components/c12_operator_tooling/flights_api/file_loader.pysrc/gps_denied_onboard/components/c12_operator_tooling/flights_api/httpx_client.pytests/unit/c12_operator_tooling/test_az489_flights_api_client.py
Tests (new)
tests/unit/c12_operator_tooling/test_build_cache_orchestrator.py— 29 tests covering all 15 ACs + NFR-perf-overhead microbench.tests/unit/c12_operator_tooling/test_remote_c10_invoker.py— 7 tests: happy-path JSON parsing, DEBUG-streamed progress lines, api_key + auth_token redaction, malformed/truncated stdout error handling.tests/unit/c12_operator_tooling/test_file_lock.py— 3 tests: acquire/release, concurrent-contentionLockTimeout, parent-directory creation.
Tests (modified)
tests/unit/c12_operator_tooling/test_cli_build_cache.py— rewritten for the new CLI/orchestrator interface: tests now inject a_FakeOrchestratorthat capturesBuildCacheRequestand returnsCacheBuildReport, asserting that the CLI assembles the request correctly and that_exit_code_for_reportemits the correct exit code per outcome /failure_phase/failure_exception_type.
Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|---|---|---|---|---|---|
| AZ-328_c12_build_cache_orchestrator | Done | 4 prod-new + 6 prod-modified + 3 tests-new + 1 test-modified | 116 c12 unit tests pass (29 new) | 15/15 ACs + NFR-perf-overhead | None |
AC Test Coverage: All covered
Every acceptance criterion (AC-1 through AC-15) plus the
NFR-perf-overhead microbench has a directly-validating test in
test_build_cache_orchestrator.py. Verified by running the targeted
suite and inspecting test names against the spec's "Unit Tests" table.
Code Review Verdict: PASS
Findings
None of severity Low or higher.
Notes (informational):
httpx_client.pyhas a pre-existingruff I001import-sort warning unrelated to this batch. Not in scope; left for the file's owning task to address.- Adjacent formatting-only changes to four
flights_api/*.pyfiles and onetest_az489_*.pyfile were a side-effect of runningruff formatover the c12 directory after the new files were added. They normalise existing pre-formatter style and are safe to ship.
Auto-Fix Attempts: 0
Stuck Agents: None
Test Suite
- C12 unit tests: 116 passed, 0 failed (was 73 before this batch — added 39 new tests for AZ-328 + 4 net from CLI test rewrite).
- Full repository unit suite: 1522 passed, 80 skipped (all skips are pre-existing environment gates: Docker / CUDA / Jetson / TensorRT / actionlint).
- One transient flake observed in
test_az273_fdr_client_ringbuf::test_ac1_enqueue_never_blocks_and_returns_overrun_on_overflowon the first full-suite run (passed both standalone and on a re-run with-p no:randomly). Unrelated to this batch (C13 / FDR area). Not blocking. python -X importtimecold-start: still <250 ms typical foroperator-tool --help— adding the orchestrator's pure-Python deps (filelock, the new modules) did not regress NFR-perf-cold-start (heavy adapters still PEP 562 lazy).
Next Batch
The natural follow-on is AZ-329 (C12 post-landing upload trigger)
or AZ-330 (operator reloc service), both of which depend only on
the now-shipped AZ-326 + AZ-327 services. Confirm with
_docs/02_tasks/_dependencies_table.md at the start of Batch 44.