gps-denied-onboard/_docs/02_document/components/12_c11_tilemanager/description.md
Oleksandr Bezdieniezhnykh 5fe67023b2 [AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary
in one atomic commit:

* AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the
  `flight_footer` FDR record's `clean_shutdown` field; 4 refusal
  modes; new FdrFooterReader Protocol + LocalFdrFooterReader.
* AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization
  hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol
  cut (E-C8 owns the future pymavlink concrete); new FDR record
  kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals,
  reason 200 chars).
* AZ-523 C11 internal flight-state gate removed (SRP refactor):
  `confirm_flight_state` / `FlightStateSignal` use /
  `FlightStateNotOnGroundError` deleted from C11; TileUploader
  contract bumped to v2.0.0 (frozen) with migration note; AZ-317
  superseded.
* AZ-524 Package rename `c12_operator_tooling` →
  `c12_operator_orchestrator` across source, tests, pyproject,
  CMake, Dockerfile, compose, CI, runtime-root services class
  (`OperatorOrchestratorServices`) + factory function
  (`build_operator_orchestrator`), logger namespaces, config slug,
  docs, and the E-C12 epic title.

Tests: 1543 passed, 80 skipped (all environment gates). Targeted
AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start
NFR-perf still ≤ 500 ms p99.

Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump
comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523
+ AZ-524 created and closed as audit-trail tickets.

See `_docs/03_implementation/batch_44_cycle1_report.md`.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 19:42:46 +03:00


C11 — Tile Manager

1. High-Level Overview

Purpose: own the operator-side network I/O against satellite-provider for the onboard tile corpus, in both directions:

  • Download (pre-flight, F1): fetch tiles from satellite-provider for the operational area, apply AC-NEW-6 freshness gating, and write into C6 (TileStore + TileMetadataStore). C11 is the only path that crosses the workstation/companion enclave to the parent suite for tile pixels — C10 reads from the populated C6 store and never touches satellite-provider itself.
  • Upload (post-landing, F10): read pending mid-flight tiles from C6 and POST to satellite-provider's ingest endpoint (D-PROJ-2 contract sketch). C11 itself does NOT gate on flight state — it is a dumb pipe; the post-landing safety gate is owned by C12's PostLandingUploadOrchestrator (AZ-329 / Batch 44), which checks the C13 flight_footer FDR record for clean_shutdown=True before invoking TileUploader.upload_pending_tiles.

C11 is a separate operator-side binary / image. The airborne companion image's CMake target deliberately excludes the entire c11_tilemanager/ source tree so the airborne process cannot accidentally execute either the download path or the upload path even via reflection or config error (ADR-004 process-level isolation, AC-8.4). Both directions of tile I/O are operator-driven on the operator workstation; the companion only consumes the populated C6 store while airborne.

Architectural Pattern: Pipeline behind two interfaces (TileDownloader, TileUploader) under one component, consistent with C8's multi-interface shape (FC-AP, FC-iNav, GCS adapters under one component). The two interfaces are bundled into C11 because they share auth (TLS + service-internal API key for download, per-flight onboard signing key for upload), HTTP client, network configuration, deployment unit (operator-tooling tarball), and the airborne-exclusion property — splitting them into two components would duplicate all of that. They are kept as two interfaces so SRP is preserved at the call-site level: C12 binds TileDownloader for the F1 cache-build workflow, TileUploader for the F10 post-landing trigger; neither is forced to depend on the other.

Upstream dependencies:

  • C12 OperatorOrchestrator → invokes TileDownloader.download_tiles_for_area(...) during F1, and TileUploader.upload_pending_tiles(...) post-landing via PostLandingUploadOrchestrator once the flight_footer clean_shutdown gate passes.
  • C6 TileStore + TileMetadataStore → write target during download (source = googlemaps); read source during upload (source = onboard_ingest, voting_status = pending).
  • Operator workstation OS → invocation entry point (CLI / tray app, owned by C12).
  • satellite-provider (external) → GET /api/satellite/tiles?bbox=…&zoom=… for download (existing); POST /api/satellite/tiles/ingest for upload (D-PROJ-2 design task #1; planned, not yet implemented service-side).

Downstream consumers:

  • C10 CacheProvisioner reads the populated C6 store after a TileDownloader run completes; C10 does not call C11 directly. C12 sequences the two steps.
  • Upload side: no onboard consumer; the parent-suite voting layer (D-PROJ-2 design task #2) consumes the uploaded tiles asynchronously.

2. Internal Interfaces

Interface: TileDownloader

| Method | Input | Output | Async | Error Types |
| --- | --- | --- | --- | --- |
| download_tiles_for_area | DownloadRequest | DownloadBatchReport | No (offline; minutes) | SatelliteProviderError, FreshnessRejectionError, ResolutionRejectionError, CacheBudgetExceededError, IdempotentNoOp |
| enumerate_remote_coverage | bbox: BoundingBox, zoom_levels: list[int] | list[TileSummary] | No | SatelliteProviderError |

Interface: TileUploader

| Method | Input | Output | Async | Error Types |
| --- | --- | --- | --- | --- |
| enumerate_pending_tiles | flight_id: uuid (optional) | list[TileMetadata] | No | TileMetadataError |
| upload_pending_tiles | UploadRequest | UploadBatchReport | No | SatelliteProviderError, RateLimitedError, SignatureRejectedError |

C11 no longer exposes confirm_flight_state — the post-landing flight-state gate moved to C12 (PostLandingUploadOrchestrator, AZ-329) per Batch 44. FlightStateNotOnGroundError is retired from C11; the corresponding refusal now lives at the C12 boundary as FlightStateNotConfirmedError.
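
A minimal sketch of the two interfaces as Python typing.Protocol classes, assuming the Python codebase implied by the pyproject and httpx dependency. The DTO names are bound to Any as placeholders, and run_f1_cache_build is a hypothetical C12-side helper. It illustrates the call-site SRP point: the F1 workflow depends on TileDownloader only, and TileUploader carries no flight-state method.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, runtime_checkable
from uuid import UUID

# Placeholder DTO aliases so the sketch is self-contained (hypothetical).
DownloadRequest = Any
DownloadBatchReport = Any
UploadRequest = Any
UploadBatchReport = Any
TileMetadata = Any
BoundingBox = Any
TileSummary = Any


@runtime_checkable
class TileDownloader(Protocol):
    """Pre-flight (F1) direction: satellite-provider -> C6."""

    def download_tiles_for_area(self, request: DownloadRequest) -> DownloadBatchReport: ...

    def enumerate_remote_coverage(
        self, bbox: BoundingBox, zoom_levels: list[int]
    ) -> list[TileSummary]: ...


@runtime_checkable
class TileUploader(Protocol):
    """Post-landing (F10) direction: C6 -> satellite-provider ingest.

    Deliberately no flight-state method; the gate lives in C12.
    """

    def enumerate_pending_tiles(self, flight_id: Optional[UUID] = None) -> list[TileMetadata]: ...

    def upload_pending_tiles(self, request: UploadRequest) -> UploadBatchReport: ...


def run_f1_cache_build(downloader: TileDownloader, request: DownloadRequest) -> DownloadBatchReport:
    # The F1 workflow binds only TileDownloader; it never sees TileUploader.
    return downloader.download_tiles_for_area(request)
```

Because the protocols are structural, a concrete C11 class can satisfy both while each caller still declares only the one it needs.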

Input/Output DTOs:

DownloadRequest:
  bbox:                       BoundingBox (lat_min, lon_min, lat_max, lon_max)
  zoom_levels:                list[int]
  sector_class:               enum {active_conflict, stable_rear}
  satellite_provider_url:     URL
  service_api_key:            string
  cache_root:                 Path

DownloadBatchReport:
  tiles_downloaded:                 int
  tiles_rejected_freshness:         int
  tiles_rejected_resolution:        int   # RESTRICT-SAT-4 < 0.5 m/px
  tiles_downgraded:                 int
  freshness_summary:                dict[freshness_label, count]
  outcome:                          enum {success, failure, idempotent_no_op}
  failure_reason:                   string (optional)

UploadRequest:
  flight_id:                  uuid (optional; defaults to all flights with pending tiles)
  batch_size:                 int
  satellite_provider_url:     URL

UploadBatchReport:
  batch_uuid:                       uuid (assigned by satellite-provider per D-PROJ-2 contract)
  per_tile_status:                  list[(tile_id, status: enum {queued, rejected, duplicate, superseded})]
  retry_count:                      int
  next_retry_at_s:                  int (when retried)
  outcome:                          enum {success, partial, failure}
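
The DTO listing above maps naturally onto Python dataclasses and enums. This is a sketch under assumptions: URLs are plain strings, the BoundingBox invariant (min strictly below max, so no antimeridian-crossing boxes) is assumed rather than specified, and service_api_key is excluded from repr so the key cannot leak into log lines.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
from uuid import UUID


@dataclass(frozen=True)
class BoundingBox:
    lat_min: float
    lon_min: float
    lat_max: float
    lon_max: float

    def __post_init__(self) -> None:
        # Assumed invariant: min strictly below max (no antimeridian-crossing boxes).
        if self.lat_min >= self.lat_max or self.lon_min >= self.lon_max:
            raise ValueError("BoundingBox min must be strictly below max")


class SectorClass(enum.Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


class DownloadOutcome(enum.Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    IDEMPOTENT_NO_OP = "idempotent_no_op"


class UploadOutcome(enum.Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


class TileUploadStatus(enum.Enum):
    QUEUED = "queued"
    REJECTED = "rejected"
    DUPLICATE = "duplicate"
    SUPERSEDED = "superseded"


@dataclass(frozen=True)
class DownloadRequest:
    bbox: BoundingBox
    zoom_levels: list[int]
    sector_class: SectorClass
    satellite_provider_url: str
    service_api_key: str = field(repr=False)  # excluded from repr so it never reaches logs
    cache_root: Path


@dataclass
class DownloadBatchReport:
    outcome: DownloadOutcome
    tiles_downloaded: int = 0
    tiles_rejected_freshness: int = 0
    tiles_rejected_resolution: int = 0  # RESTRICT-SAT-4
    tiles_downgraded: int = 0
    freshness_summary: dict[str, int] = field(default_factory=dict)
    failure_reason: Optional[str] = None


@dataclass(frozen=True)
class UploadRequest:
    batch_size: int
    satellite_provider_url: str
    flight_id: Optional[UUID] = None  # None: all flights with pending tiles


@dataclass
class UploadBatchReport:
    batch_uuid: UUID  # assigned by satellite-provider
    outcome: UploadOutcome
    per_tile_status: list[tuple[str, TileUploadStatus]] = field(default_factory=list)
    retry_count: int = 0
    next_retry_at_s: Optional[int] = None  # set only when a retry is scheduled
```

Fields with defaults are moved after required ones because dataclasses require that ordering; the field set itself matches the listing.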

3. External API Specification

C11 is a client of satellite-provider's REST surface in both directions.

3.1 Download — read path (existing satellite-provider API)

| Endpoint | Method | Auth | Rate Limit | Description |
| --- | --- | --- | --- | --- |
| /api/satellite/tiles?bbox=…&zoom=… | GET | TLS + service-internal API key | parent-suite enforces | Paged tile blobs + metadata for a bounding box at the given zoom level(s). |

C11 honours Retry-After on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Resolution below 0.5 m/px (RESTRICT-SAT-4) is rejected at the C11 boundary, not pushed downstream.
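
The retry rules above (honour Retry-After on 429, back off on 5xx, fail fast on auth) can be sketched as a pure decision function. The constants, the full-jitter backoff, and the FailFast exception are assumptions, not project values. TLS failures surface as client exceptions before any status code exists, so the caller would map those to FailFast directly.

```python
import random
from typing import Optional

# Assumed policy constants; the document does not fix concrete values.
BASE_DELAY_S = 1.0
MAX_DELAY_S = 60.0
MAX_ATTEMPTS = 5


class FailFast(Exception):
    """TLS / auth failures: never retried."""


def next_delay_s(status: int, attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if no retry should happen.

    attempt is the zero-based count of attempts already made.
    """
    if status in (401, 403):
        raise FailFast(f"auth rejected (HTTP {status})")
    if attempt >= MAX_ATTEMPTS:
        return None
    if status == 429:
        # Honour a delta-seconds Retry-After; HTTP-date values fall back to backoff here.
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())
        return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** attempt)
    if 500 <= status < 600:
        # Exponential backoff with full jitter to avoid synchronized retries.
        return random.uniform(0, min(MAX_DELAY_S, BASE_DELAY_S * 2 ** attempt))
    return None  # 2xx and other 4xx responses are final
```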

3.2 Upload — write path (D-PROJ-2 contract sketch, planned)

| Endpoint | Method | Auth | Rate Limit | Description |
| --- | --- | --- | --- | --- |
| /api/satellite/tiles/ingest (parent-suite, planned) | POST | Per-flight onboard signing key (D-C8-9 = (d) family); each tile carries the signature | parent-suite enforces | Multipart upload of one or more tiles; response 202 with batch UUID + per-tile status. |

Request schema (multipart fields per tile):

  • tile_blob (JPEG body, byte-identical to satellite-provider's existing tile format)
  • zoomLevel (int)
  • latitude / longitude (double)
  • tile_size_meters (double), tile_size_pixels (int)
  • capture_timestamp (ISO 8601), flight_id (UUID), companion_id (string)
  • quality_metadata (JSON; TileQualityMetadata per data_model.md)
  • signature (per-flight onboard signing key signature over the payload)

Response: 202 Accepted + {batch_uuid: UUID, per_tile_status: [...]}.

Test substitute during NFT-SEC-01 / FT-P-17 / IT runs: the e2e-test mock-suite-sat-service fixture (under tests/fixtures/mock-suite-sat-service/) implements the planned POST surface so upload integration tests can run before D-PROJ-2 ships service-side. Download integration tests run against the real satellite-provider (its existing GET surface is already implemented). The mock is not a component and is never reached in production.

4. Data Access Patterns

C11 reads from / writes to C6 (the local store) and reads from / writes to satellite-provider (network). It owns no relational state of its own beyond a small download-progress journal and a small upload-progress journal.

Caching Strategy

| Data | Cache Type | TTL | Invalidation |
| --- | --- | --- | --- |
| Download-progress journal | filesystem, alongside the operator workstation cache root | until a download_tiles_for_area run completes | per-area run, on completion |
| Pending-upload journal | filesystem, alongside the operator workstation cache root | until upload acknowledged | per-batch acknowledgment removes entries from the journal |
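
A sketch of the pending-upload journal's lifecycle, assuming a single JSON file keyed by tile id (the real layout is not specified here). It uses the standard-library temp-file plus os.replace pattern as a stand-in for the atomicwrites dependency listed below; the point is that a crash mid-write never leaves a half-written journal, and only acknowledged tiles leave it.

```python
from __future__ import annotations

import contextlib
import json
import os
import tempfile
from pathlib import Path


class PendingUploadJournal:
    """Hypothetical JSON journal of tiles awaiting ingest acknowledgment."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, dict]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _store(self, entries: dict[str, dict]) -> None:
        # Write a temp file in the same directory, fsync it, then atomically swap it in.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, prefix=self.path.name, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(entries, fh, sort_keys=True)
                fh.flush()
                os.fsync(fh.fileno())
            os.replace(tmp, self.path)
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp)
            raise

    def add(self, tile_id: str, entry: dict) -> None:
        entries = self._load()
        entries[tile_id] = entry
        self._store(entries)

    def acknowledge(self, tile_ids: list[str]) -> None:
        # Only tiles the service acknowledged are removed; failed ones stay pending.
        entries = self._load()
        for tid in tile_ids:
            entries.pop(tid, None)
        self._store(entries)

    def pending(self) -> list[str]:
        return sorted(self._load())
```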

Storage Estimates

| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
| --- | --- | --- | --- | --- |
| Download-progress journal | a few hundred rows per area provisioning | ~256 B / row | <1 MB | per provisioning run |
| Pending-upload journal | a few hundred per flight | ~256 B / row | <1 MB | per flight |
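
As a sanity check on the <1 MB figure, taking 500 rows as an illustrative upper end of "a few hundred" (an assumption, not a measured count):

```latex
500 \times 256\ \text{B} = 128\,000\ \text{B} = 125\ \text{KiB} \ll 1\ \text{MB}
```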

Data Management

Seed data: none — both journals are empty until the operator triggers a download or an upload run.

Rollback: the download path is idempotent. Re-running download_tiles_for_area for an unchanged (bbox, zoom_levels, sector_class) tuple triggers C10's manifest-hash check (D-C10-1) downstream, so the engine/descriptor build is skipped. The upload path is idempotent on the service side via the (zoomLevel, lat, lon, capture_timestamp, companion_id, flight_id) dedup key.
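
Why service-side dedup makes upload retries safe can be shown with a toy receiver. FakeIngest is hypothetical and is not the mock-suite-sat-service fixture; it only demonstrates that resending the same journal entry maps to the duplicate per-tile status instead of creating a second tile. Note that the float lat/lon only match if they are serialized identically on retry, which holds when the same journal entry is resent.

```python
from __future__ import annotations

from typing import NamedTuple


class IngestDedupKey(NamedTuple):
    zoomLevel: int
    lat: float
    lon: float
    capture_timestamp: str  # ISO 8601
    companion_id: str
    flight_id: str


class FakeIngest:
    """Toy stand-in for the planned ingest endpoint's dedup behaviour."""

    def __init__(self) -> None:
        self._seen: set[IngestDedupKey] = set()

    def ingest(self, key: IngestDedupKey) -> str:
        # A retried tile with an identical key is reported, not stored twice.
        if key in self._seen:
            return "duplicate"
        self._seen.add(key)
        return "queued"
```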

5. Implementation Details

Algorithmic Complexity:

  • Download: linear in tile count; bandwidth-bound by the operator workstation's link to satellite-provider.
  • Upload: linear in pending tile count; bandwidth-bound; bursty post-landing.

State Management: stateless except for the two journals.

Key Dependencies:

| Library | Version | Purpose |
| --- | --- | --- |
| httpx | per project pin | GET (download) + multipart POST (upload) to satellite-provider |
| atomicwrites | latest | Journal updates |
| cryptography | per project pin | Per-flight signing key (upload payload signing); the production satellite-provider ingest endpoint and the e2e-test mock-suite-sat-service fixture both verify with the same key family |

Error Handling Strategy:

  • SatelliteProviderError: HTTP timeout / 5xx / TLS failure on either direction. Retry-with-backoff on 5xx; fail fast on TLS / auth. On download, surface to operator + takeoff blocked. On upload, leave tiles in the pending-upload journal and surface to operator. Do not delete uploaded tiles from C6 until acknowledged.
  • RateLimitedError (429): obey Retry-After; the operator can also re-invoke later. Same handling either direction.
  • FreshnessRejectionError / ResolutionRejectionError: download-side only. Per AC-NEW-6 / RESTRICT-SAT-4 — never silently downgrade fresh-required tiles in active_conflict sectors. Surface counts in the DownloadBatchReport.
  • CacheBudgetExceededError: download-side only. Pre-flight free-space check against AC-8.3 (≤ 10 GB). Fail fast with explicit budget delta; no partial write.
  • SignatureRejectedError: upload-side only. Per-flight signing key was rejected by satellite-provider. This is a security-critical event — do NOT silently drop; surface to operator + log to FDR.
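
The CacheBudgetExceededError rule (fail fast with an explicit budget delta, before any write) can be sketched as a pre-flight check. The 10 GB limit is read as decimal gigabytes, and the estimated download size is taken as an input; both are assumptions, since the document does not say how the estimate is produced.

```python
from pathlib import Path
import shutil

# AC-8.3 budget, read as decimal GB (assumption; GiB would be 10 * 1024**3).
CACHE_BUDGET_BYTES = 10 * 10**9


class CacheBudgetExceededError(RuntimeError):
    def __init__(self, needed: int, allowed: int) -> None:
        self.delta_bytes = needed - allowed
        super().__init__(
            f"cache write would exceed limit by {self.delta_bytes} bytes "
            f"(needed {needed}, allowed {allowed})"
        )


def check_cache_budget(
    estimated_bytes: int,
    current_cache_bytes: int,
    free_disk_bytes: int,
    budget_bytes: int = CACHE_BUDGET_BYTES,
) -> None:
    """Raise before any write so a refused run leaves no partial tiles in C6."""
    total = current_cache_bytes + estimated_bytes
    if total > budget_bytes:
        raise CacheBudgetExceededError(total, budget_bytes)
    if estimated_bytes > free_disk_bytes:
        raise CacheBudgetExceededError(estimated_bytes, free_disk_bytes)


def free_bytes(cache_root: Path) -> int:
    # shutil.disk_usage returns (total, used, free) for the filesystem holding cache_root.
    return shutil.disk_usage(cache_root).free
```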

Post-landing safety: C11's upload path no longer gates on flight state internally. The check now lives in C12's PostLandingUploadOrchestrator (AZ-329 / Batch 44), which refuses to invoke TileUploader.upload_pending_tiles unless the C13 flight_footer FDR record records clean_shutdown=True for the target flight. ADR-004 process-level isolation remains the primary control — C11 should never run on the companion at all.
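
A sketch of the C12-side gate described above, showing that C11 is only reached after the footer check. The read_footer method, the FlightFooter shape, and the class names other than FlightStateNotConfirmedError are hypothetical; only two refusal paths (missing footer, unclean shutdown) are shown, not the full set the orchestrator implements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol
from uuid import UUID


class FlightStateNotConfirmedError(RuntimeError):
    """Refusal raised at the C12 boundary, never inside C11."""


@dataclass(frozen=True)
class FlightFooter:
    flight_id: UUID
    clean_shutdown: bool


class FdrFooterReader(Protocol):
    def read_footer(self, flight_id: UUID) -> Optional[FlightFooter]: ...


class UploaderLike(Protocol):
    def upload_pending_tiles(self, request: object) -> object: ...


class PostLandingGateSketch:
    def __init__(self, footers: FdrFooterReader, uploader: UploaderLike) -> None:
        self._footers = footers
        self._uploader = uploader

    def upload(self, flight_id: UUID, request: object) -> object:
        footer = self._footers.read_footer(flight_id)
        if footer is None:
            raise FlightStateNotConfirmedError(f"no flight_footer record for {flight_id}")
        if not footer.clean_shutdown:
            raise FlightStateNotConfirmedError(f"flight {flight_id} did not shut down cleanly")
        # Only now is the C11 "dumb pipe" invoked.
        return self._uploader.upload_pending_tiles(request)
```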

6. Extensions and Helpers

| Helper | Purpose | Used By |
| --- | --- | --- |
| TileSignaturePayloadBuilder | constructs the signed payload for the D-PROJ-2 contract (upload) | C11 only; keep inside the component |
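
One way TileSignaturePayloadBuilder could work, as a sketch only: it canonicalizes the multipart metadata fields from section 3.2 plus a SHA-256 of the tile blob into sorted-key JSON, then signs that byte string. Ed25519 from the cryptography package is an assumed stand-in for the "(d) family" key, and the canonical payload layout is hypothetical; the real D-PROJ-2 contract may define both differently.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey, Ed25519PublicKey


@dataclass(frozen=True)
class TileUploadFields:
    # Names mirror the multipart fields in section 3.2.
    zoomLevel: int
    latitude: float
    longitude: float
    tile_size_meters: float
    tile_size_pixels: int
    capture_timestamp: str  # ISO 8601
    flight_id: str
    companion_id: str
    quality_metadata: dict


class TileSignaturePayloadBuilder:
    """Builds the canonical bytes that get signed (hypothetical layout)."""

    @staticmethod
    def build(tile: TileUploadFields, tile_blob: bytes) -> bytes:
        doc = asdict(tile)
        doc["tile_sha256"] = hashlib.sha256(tile_blob).hexdigest()
        # Sorted keys + compact separators give a deterministic encoding.
        return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_tile(key: Ed25519PrivateKey, tile: TileUploadFields, tile_blob: bytes) -> bytes:
    return key.sign(TileSignaturePayloadBuilder.build(tile, tile_blob))


def verify_tile(pub: Ed25519PublicKey, sig: bytes, tile: TileUploadFields, tile_blob: bytes) -> bool:
    try:
        pub.verify(sig, TileSignaturePayloadBuilder.build(tile, tile_blob))
        return True
    except InvalidSignature:
        return False
```

Hashing the blob instead of embedding it keeps the signed payload small while still binding the signature to the exact JPEG bytes.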

7. Caveats & Edge Cases

Known limitations:

  • D-PROJ-2 ingest endpoint is NOT yet implemented service-side. Until parent-suite delivers the endpoint, C11 will fail every upload — the pending-upload journal accumulates. Operator workflow tolerates this.
  • The e2e-test mock-suite-sat-service fixture implements only the planned POST contract (per the leftover file). Download integration tests run against the real satellite-provider. Production runs reach satellite-provider directly in both directions; the fixture is never on the production path.
  • TileDownloader requires the operator workstation to have network reach to satellite-provider (the only path that crosses out of the workstation enclave). Pre-flight network configuration is an operator concern owned by C12; C11 fails fast if reachability is missing.

Potential race conditions:

  • If the operator launches two TileDownloader runs concurrently against the same cache root, a filesystem lockfile (operated by C12 tooling) prevents corrupting C6's tile rows. Same lockfile gates concurrent TileUploader invocations.
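
A minimal lockfile sketch, assuming an exclusive-create file in the cache root; the lock name and helper are hypothetical, and the document places the real lockfile in C12 tooling. O_CREAT | O_EXCL makes acquisition atomic on a local filesystem. A crash can leave a stale lock behind, which this sketch does not handle.

```python
import contextlib
import os
from pathlib import Path


class CacheRootLocked(RuntimeError):
    pass


@contextlib.contextmanager
def cache_root_lock(cache_root: Path, name: str = ".c11.lock"):
    path = cache_root / name
    try:
        # Atomic create-if-absent: a second run fails instead of racing.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise CacheRootLocked(f"another run holds {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for operators
        yield path
    finally:
        os.close(fd)
        os.unlink(path)
```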

Performance bottlenecks:

  • Download: bandwidth-bound by the operator workstation's satellite-provider link; descriptor / engine work is downstream in C10 (offline, minutes).
  • Upload: bandwidth-bound. Per-flight upload volume is bounded by the F4 mid-flight tile-generation cap (typically a few hundred tiles, each 50–200 KB, i.e. tens of MB per flight).
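
Taking 300 tiles as an illustrative "few hundred" (an assumption), the per-flight volume range works out as:

```latex
300 \times 50\ \text{KB} = 15\ \text{MB}, \qquad 300 \times 200\ \text{KB} = 60\ \text{MB}
```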

8. Dependency Graph

Must be implemented after: C6 (read source for upload, write target for download), satellite-provider (download path; existing) + D-PROJ-2 endpoint (upload path; the e2e-test mock-suite-sat-service fixture covers tests until the real endpoint ships).

Can be implemented in parallel with: anything except C6 changes.

Blocks: F1 (pre-flight cache build cannot start without TileDownloader), F10 (post-landing upload cannot start without TileUploader).

9. Logging Strategy

| Log Level | When | Example |
| --- | --- | --- |
| ERROR | SignatureRejectedError, persistent SatelliteProviderError, CacheBudgetExceededError | C11 upload failure: signature rejected by satellite-provider |
| WARN | one-off network failure, scheduled retry, freshness-driven rejections (counts) | C11 batch upload retry: batch_uuid=…; next_retry_in_s=30 |
| INFO | session start/end; per-batch report (download + upload) | C11 download complete: 87654 tiles, 12 stale-rejected; bbox=… |
| DEBUG | per-tile request/response | C11 tile uploaded: tile_id=(z=18,lat=…,lon=…); status=queued |

Log format: structured JSON. Log storage: operator workstation log file (e.g. ~/.azaion/onboard/c11-tilemanager.log); also writes per-run summaries (download report, upload report) to the operator workstation cache root for audit. The companion's FDR is NOT involved (C11 doesn't run on the companion).
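
A structured-JSON logging sketch using only the standard logging module. The logger name c11.tilemanager and the convention of passing structured fields via extra={"fields": {...}} are assumptions for illustration, not the project's configured namespace.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line; structured fields come from record.fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(path: str, name: str = "c11.tilemanager") -> logging.Logger:
    # Hypothetical namespace; call once per process to avoid duplicate handlers.
    logger = logging.getLogger(name)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage, matching the WARN row above: logger.warning("C11 batch upload retry", extra={"fields": {"batch_uuid": batch_uuid, "next_retry_in_s": 30}}).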