GPS-Denied Onboard — Deployment Procedures
Date: 2026-05-09 (Plan Phase 2c — initial draft). Inputs:
_docs/02_document/architecture.md § 3 (Deployment Model) + § 7 (Security); _docs/02_document/data_model.md § 4 (Migration Strategy); environment_strategy.md; ADR-002, ADR-004, ADR-005; AC-NEW-1, AC-NEW-3, AC-NEW-4, AC-NEW-5.
Deployment scope and model
This project does not ship a service; it ships an embedded edge image plus an operator-orchestrator bundle. The "deployment" patterns from the standard template (blue-green / rolling / canary) are not applicable. Deployment for this project means:
| Artifact | Target | Deployment mechanism |
|---|---|---|
| JetPack image (gps-denied-jetpack-<semver>-<sha>.img) | Production Jetson Orin Nano Super on a UAV | Operator flashes the image onto the Jetson via NVIDIA sdkmanager or Etcher-style dd from the operator workstation |
| Operator-orchestrator tarball | Operator workstation | Operator extracts; docker compose up -d brings up mock-suite-sat-service (when offline) + operator-orchestrator |
| Tier-1 dev compose | Developer workstation | Developer runs docker compose up from repo root |
Zero-downtime is not a goal: a UAV is not in service while it is being re-flashed. The deployment cadence is per-airframe maintenance, not per-request availability.
Strategy: the closest analogue to a "rolling deploy" is the operator's fleet-management process (re-flash one UAV at a time across the fleet). The fleet-management process is the operator's concern, not this project's; this document covers the per-airframe procedure.
Pre-deployment artifact assembly (release engineer)
Performed once per release on Tier-1 + Tier-2 CI; produces signed artifacts stored in the release bucket.
- Tag a commit on main. CI runs the full pipeline (ci_cd_pipeline.md).
- Tier-1 produces:
  - companion-tier1:deployment-<sha> and companion-tier1:research-<sha> Docker images (pushed to registry).
  - mock-suite-sat-service:<sha> Docker image.
  - operator-orchestrator:<sha> Docker image.
  - SBOM artifacts for both binaries (deployment and research).
  - operator-orchestrator-<semver>-<sha>.tar.gz containing the operator-orchestrator image + mock-sat image + their compose file + verification script + relevant docs.
- Tier-2 produces:
  - Native deployment-binary build on the self-hosted Jetson runner.
  - SBOM verification: byte-equal (after canonicalization) to Tier-1's deployment-binary SBOM. Mismatch fails the release.
  - JetPack image build: a JetPack 6.2 base image with the deployment binary + PostgreSQL 16 + base migrations + /etc/gps-denied/runtime.yaml template preinstalled. Output: gps-denied-jetpack-<semver>-<sha>.img.
- Signing (Tier-1):
  - Both Docker image manifests are signed with the project's release key.
  - The JetPack image is signed; checksum is published as a separate signed file (gps-denied-jetpack-<semver>-<sha>.img.sha256.sig).
  - The operator-orchestrator tarball is signed.
- Release bucket: artifacts uploaded; release notes published; the previous release's artifacts retained for at least 90 days for rollback support.
A release fails if any step above fails — including any AC-bound NFT failure on Tier-2 (ci_cd_pipeline.md § AC-bound NFTs).
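The Tier-2 SBOM check above compares the two SBOMs byte-for-byte after canonicalization, but this document does not say how canonicalization is done. The following is a minimal sketch under stated assumptions: the SBOMs are JSON, the per-build fields to ignore are named `timestamp` and `serialNumber` (hypothetical choices), and list order is already stable. The function names are illustrative, not the project's API.

```python
import json

# Assumption: keys with these names differ between builds and are excluded
# before comparison. The real exclusion list is not given in this document.
VOLATILE_KEYS = frozenset({"timestamp", "serialNumber"})


def _strip_volatile(node):
    """Recursively drop volatile keys from dicts; leave other values as-is."""
    if isinstance(node, dict):
        return {k: _strip_volatile(v) for k, v in node.items() if k not in VOLATILE_KEYS}
    if isinstance(node, list):
        return [_strip_volatile(v) for v in node]
    return node


def canonical_sbom_bytes(sbom_text: str) -> bytes:
    """Serialize an SBOM with sorted keys and fixed separators."""
    doc = _strip_volatile(json.loads(sbom_text))
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return canonical.encode("utf-8")


def sboms_match(tier1_text: str, tier2_text: str) -> bool:
    """True when both SBOMs are byte-equal after canonicalization."""
    return canonical_sbom_bytes(tier1_text) == canonical_sbom_bytes(tier2_text)
```

Sorting keys removes serializer-dependent ordering, so any remaining difference reflects a real content difference. This sketch does not reorder lists; if the SBOM generator emits components in a non-deterministic order, a sort step would be needed as well.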
Pre-takeoff readiness gate ("health check" analog)
Production has no /health/live HTTP endpoint (no listener; NFT-SEC-05). The companion's "health check" is the pre-takeoff readiness gate: a sequence of checks that runs at takeoff load and decides whether the companion is ready to emit external position to the FC.
| Check | What it validates | Action on failure |
|---|---|---|
| Manifest content-hash gate (D-C10-3) | The on-disk manifest matches the operator-staged manifest hash (data_model.md § 2.4) | FDR record 0x000D ContentHashGateFail + STATUSTEXT critical + companion refuses to publish a GPS_INPUT / MSP2_SENSOR_GPS source |
| Camera calibration JSON validation | File present + schema-valid + content-hash matches manifests.calibration_artifact_hash | Same |
| FAISS .index mmap + content-hash | mmap succeeds + content-hash matches manifests.descriptor_index_hash | Same |
| TRT engine cache verification | All required engines present per engine_cache_entries; each engine's content-hash matches engine_hash | Same |
| alembic current == head | DB schema is up-to-date for this binary | Same |
| MAVLink-2.0 signing handshake (AP profile) | Signed handshake with the FC succeeds within AC-NEW-1 30 s budget (D-C8-9 = (d)) | FDR record MavlinkSigningKeyRotated with reason "handshake_failed" + STATUSTEXT critical + companion refuses to emit |
| Per-flight key generation | Both per-flight ephemeral keys (MAVLink signing + onboard tile signing) generated and persisted under /var/lib/gps-denied/per-flight/ | Same |
| Initial frame → emit pipeline test | First nav-camera frame reaches C8 outbound encoder; EmittedExternalPosition produced | Same |
| Network egress is denied | Verify no outbound network egress is possible (DNS blackhole effective, iptables OUTPUT REJECT loaded) — defense-in-depth on architecture.md § 7 + NFT-SEC-05 | FDR critical + STATUSTEXT + refuse to emit |
The gate completes within the AC-NEW-1 30 s p95 budget; failure produces a clear FDR + STATUSTEXT trail and the companion's GPS_INPUT / MSP2_SENSOR_GPS channel stays silent — the FC operates as if no companion-GPS source is available, which is the correct safe-default.
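The fail-closed behavior described above can be sketched as a sequential runner that stops at the first failing check and reports which one failed. This is an illustration only: `GateResult`, `run_readiness_gate`, and the check names are hypothetical, and treating the AC-NEW-1 30 s p95 budget as a hard deadline is a simplifying assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GateResult:
    ready: bool                  # True only if every check passed in budget
    failed_check: Optional[str]  # name of the first failing check, if any
    elapsed_s: float


def run_readiness_gate(
    checks: Sequence[Tuple[str, Callable[[], bool]]],
    budget_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> GateResult:
    """Run checks in order; any failure, exception, or overrun means not ready."""
    start = clock()
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            # A crashing check is treated as a failed check (fail closed).
            ok = False
        elapsed = clock() - start
        if not ok:
            return GateResult(False, name, elapsed)
        if elapsed > budget_s:
            return GateResult(False, "budget_exceeded", elapsed)
    return GateResult(True, None, clock() - start)
```

A caller would keep the GPS_INPUT / MSP2_SENSOR_GPS channel silent unless `ready` is true, and would use `failed_check` to select the FDR record and STATUSTEXT to emit.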
Production deployment procedure (per-airframe)
This is the per-airframe deployment procedure performed by the operator, NOT by CI.
1. Pre-deploy approval
Required before any production-bound flight:
- Release notes for the target version reviewed; AC-NEW-4 / AC-NEW-7 statistical summaries reviewed.
- All Tier-2 AC-bound NFTs green at the target version (ci_cd_pipeline.md § AC-bound NFTs).
- Security audit of the target version completed (Tier-1 SBOM clean of unpatched CVEs; D-CROSS-CVE-1).
- D-PROJ-1 calibration step performed on the target Jetson + UAV pairing (hybrid factory + checkerboard-refined; ~1 day per deployed unit).
- Rollback artifact (the previous release's JetPack image) is staged on the operator workstation.
- FDR retention policy for this airframe confirmed (default 30 days; environment_strategy.md § Database Management).
- If switching FC profile (ardupilot_plane ↔ inav), FC firmware compatibility confirmed.
2. Pre-deploy checks (operator workstation)
```bash
# Verify the artifact bundle integrity: first the signature on the checksum
# file, then the image against that checksum.
cosign verify-blob \
  --signature gps-denied-jetpack-<semver>-<sha>.img.sha256.sig \
  --key gps-denied-release-key.pub \
  gps-denied-jetpack-<semver>-<sha>.img.sha256
sha256sum -c gps-denied-jetpack-<semver>-<sha>.img.sha256

# Verify the operator-orchestrator tarball.
cosign verify-blob \
  --signature operator-orchestrator-<semver>-<sha>.tar.gz.sig \
  --key gps-denied-release-key.pub \
  operator-orchestrator-<semver>-<sha>.tar.gz
```
3. Pre-flight cache build (operator-orchestrator C12)
Performed on the operator workstation, with satellite-provider reachable (locally mirrored or via lab VPN).
```bash
docker compose -f operator-orchestrator-compose.yml up -d
# Operator opens http://127.0.0.1:8080
```
The C12 UI walks the operator through:
- Upload / select the target operational sector (GeoJSON polygon).
- Set sector classifications (active_conflict ↔ stable_rear), which drive the freshness threshold (data_model.md § 2.3).
- Tile download from satellite-provider (parent suite), which produces tiles rows with source='googlemaps' + filesystem JPEGs.
- Descriptor (FAISS) index generation across the loaded tile corpus.
- TRT engine compilation on the workstation (Tier-2 emulation if no Jetson is present, or directly on a co-located Jetson dev kit).
- Manifest generation: hash over (model bundle + calibration JSON + corpus + sector classifications + descriptor index + engine cache).
- Output: a sealed pre-flight bundle on a USB drive or staged for direct ethernet transfer.
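The manifest hash in the steps above covers six inputs. The document does not define how they are combined, so the following is a sketch under assumptions: SHA-256 as the hash, the component names in `COMPONENT_ORDER`, and the length-prefixed framing are all illustrative choices, not the project's actual manifest format.

```python
import hashlib
from typing import Mapping

# Assumption: a fixed component order so the result does not depend on the
# order in which the caller supplies the digests.
COMPONENT_ORDER = (
    "model_bundle",
    "calibration_json",
    "corpus",
    "sector_classifications",
    "descriptor_index",
    "engine_cache",
)


def manifest_hash(component_digests: Mapping[str, bytes]) -> str:
    """Combine per-component digests into one hex manifest hash."""
    missing = [name for name in COMPONENT_ORDER if name not in component_digests]
    if missing:
        raise ValueError(f"missing manifest components: {missing}")
    h = hashlib.sha256()
    for name in COMPONENT_ORDER:
        # Length-prefix each field so adjacent fields cannot be confused
        # with one another when concatenated.
        for field in (name.encode("utf-8"), component_digests[name]):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()
```

Because every component contributes to the result, changing any one input (for example re-classifying a sector) yields a different manifest hash, which is the property the pre-takeoff content-hash gate relies on.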
4. JetPack image flash
Operator flashes the target JetPack image onto the Jetson:
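```bash
# Replace /dev/sdX with the target device; dd overwrites it without confirmation.
sudo dd if=gps-denied-jetpack-<semver>-<sha>.img of=/dev/sdX bs=4M status=progress
sync
# Alternatively, use NVIDIA SDK Manager for a more guided flow.
```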
The flashed image contains:
- JetPack 6.2 base
- The deployment binary preinstalled at /opt/gps-denied/
- PostgreSQL 16 with alembic schema initialized at the target migration head
- /etc/gps-denied/runtime.yaml template (the operator fills in airframe-specific values: fc_profile, companion_id)
- A systemd unit gps-denied.service that auto-starts at boot
The image is identical across UAVs; per-airframe configuration (/etc/gps-denied/runtime.yaml) is filled in after flash.
5. Per-airframe configuration
Operator boots the Jetson in maintenance mode, ssh's in (this is the only time the Jetson has any inbound network surface; closed before takeoff), and:
```bash
sudo $EDITOR /etc/gps-denied/runtime.yaml
# Set: fc_profile, companion_id, fdr_retention_days, log_level

sudo gps-denied-cli stage-cache /mnt/usb/gps-denied-cache-<sector-id>.tar.gz
# Stages the operator-prepared cache + calibration + manifest into /var/lib/gps-denied/.

sudo gps-denied-cli verify-readiness
# Runs all gate checks except MAVLink signing handshake (which requires the FC to be powered).
```
6. UAV integration
- Wire the Jetson UART/USB to the FC.
- For ArduPilot Plane: configure FC parameters per the AP-side checklist (EK3_SRC1_POSXY = 3 or per D-C8-2 = (b) configuration, AHRS_EKF_TYPE = 3).
- For iNav: configure gps_provider = MSP, gps_ublox_use_galileo = OFF.
- Power up the FC; verify MAVLink signing handshake completes within 30 s (AC-NEW-1).
7. First-flight commissioning
The first flight on a freshly-deployed airframe is a commissioning flight, not a production flight:
- Operator stays in line-of-sight.
- AC-5.2 fallback (FC IMU-only) is the primary safety net during commissioning.
- Operator manually triggers a MAV_CMD_REQUEST_MESSAGE to confirm GPS_INPUT is being received and the FC's EKF source-set switch responds correctly.
- If everything looks healthy on the GCS dashboard for 5+ minutes of cruise, the airframe is cleared for production flights.
8. Post-deploy monitoring
Post first commissioning flight:
- FDR retrieved and visualized on operator workstation (operator-orchestrator C12 dashboard, observability.md § 5.1).
- AC-NEW-4 statistics for the commissioning flight reviewed; outliers investigated.
- No FDR segment drops; no ContentHashGateFail events.
- Mid-flight tile generation working (post-landing upload is handled separately; see § Post-landing tile upload).
- If everything green, the deployment is finalised; the previous release's JetPack image can be archived (still kept for rollback).
Post-landing tile upload (per-flight, ADR-004)
Per AC-8.4 + ADR-004, tiles generated mid-flight are uploaded to satellite-provider only after landing, using the operator-orchestrator's C11 Tile Manager (TileUploader interface; a separate binary, never linked into the airborne image).
```bash
# Operator plugs the companion's NVM into the workstation OR ssh's into the powered-off-then-re-booted Jetson.
docker compose run operator-orchestrator \
  python -m operator_orchestrator.tilemanager upload \
  --flight-id <uuid> \
  --satellite-provider "$SATELLITE_PROVIDER_URL" \
  --signing-pubkey-fingerprint <fingerprint>
```
Behavior:
- Reads the local tiles rows where source='onboard_ingest' AND voting_status='pending' AND flight_id=<uuid>.
- Reads the corresponding JPEG body + sidecar JSON from filesystem.
- Reads the per-flight onboard tile-signing private key (still on the companion's NVM until FDR rolls over).
- Submits to satellite-provider's POST /api/satellite/tiles/ingest endpoint (D-PROJ-2 contract).
- On 2xx success: deletes local row + JPEG + sidecar + emits FDR event tile_uploaded.
- On 4xx: leaves local data; emits FDR event tile_upload_failed with reason; operator decides next steps (likely a parent-suite issue).
- On 5xx: retries with exponential backoff; persistent failure → tile_upload_failed + operator review.
When the parent-suite voting layer (D-PROJ-2 design task #2) ships, this flow does NOT change on the onboard side — the parent suite's promotion logic is invisible to onboard-side upload.
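The 2xx / 4xx / 5xx handling listed under Behavior can be sketched as a small retry loop. This is an illustration, not the TileUploader implementation: `upload_tile`, the injected `post` callable, the attempt limit, and the base delay are all hypothetical, and treating any status outside 2xx and 5xx like a 4xx (no retry) is an assumption.

```python
import time
from typing import Callable

TILE_UPLOADED = "tile_uploaded"
TILE_UPLOAD_FAILED = "tile_upload_failed"


def upload_tile(
    post: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call post() (returns an HTTP status) and map the outcome to an FDR event name."""
    for attempt in range(max_attempts):
        status = post()
        if 200 <= status < 300:
            return TILE_UPLOADED
        if not (500 <= status < 600):
            # 4xx (and, by assumption, anything else unexpected): do not retry.
            return TILE_UPLOAD_FAILED
        if attempt < max_attempts - 1:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * (2 ** attempt))
    # Persistent 5xx: give up and leave the decision to the operator.
    return TILE_UPLOAD_FAILED
```

In this sketch the caller would delete the local row, JPEG, and sidecar only after `tile_uploaded` is returned, so a failed or interrupted upload never loses local data.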
Rollback Procedures
Trigger criteria
| Severity | Trigger | Decision-maker |
|---|---|---|
| Critical (per-airframe) | Commissioning flight fails AC-5.2 fallback (the FC IMU-only fallback also failed; airframe lost) | Safety review board (out of scope of this project) |
| Critical (fleet-wide) | A post-deploy AC-NEW-4 outlier indicates a regression: P(err > 1 km) measured on a real flight exceeds the AC threshold by a factor of ≥ 2 | Suite security + onboard team lead |
| High (per-airframe) | Commissioning flight passes but post-flight FDR analysis shows AC-NEW-4 / AC-NEW-7 regression vs. prior release | Onboard team lead |
| High (per-airframe) | Operator unable to complete pre-flight readiness gate (manifest hash gate fails repeatedly) | Operator + onboard team lead |
| Medium (per-airframe) | Sustained dead_reckoned periods longer than expected; FDR segment drops occurring | Operator + onboard team lead (post-flight investigation; may not warrant immediate rollback) |
Rollback steps (per-airframe)
- Re-flash the previous release's JetPack image onto the affected Jetson (same procedure as § 4 with the previous artifact).
- Re-stage the previous release's pre-flight bundle (the operator workstation retains it in the operator-orchestrator cache for ≥ 30 days).
- Re-run the pre-takeoff readiness gate.
- Confirm AC-5.2 fallback is still functional (it is FC firmware behavior; rolling back the companion image cannot break it, but verify on the GCS).
- Document the rollback in the post-mortem template; include FDR snapshots from the offending flight (if any) plus the versions of the rollback artifacts.
Database rollback (data_model.md § 4.2 reversibility)
Per data_model.md § 4.2, every Alembic migration MUST implement a working downgrade(). Rolling back the JetPack image to the previous release rolls back the schema to whatever migration head the previous release uses. Concretely:
- The previous release's JetPack image contains its own Alembic migration tree.
- On boot, the previous-release runtime asserts alembic current == head_for_that_release. If the database is on a NEWER head (because the airframe ran the new release between deployments), the runtime invokes alembic downgrade <previous-release-head> automatically.
- If a migration is not reversible (which requires an explicit ADR per data_model.md § 4.2), the rollback must be manually adjudicated by the operator + onboard team lead. This case is rare by policy.
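The boot-time decision described above can be sketched as a pure function that compares the database's current revision with the head expected by the booted release. The function name, the linear `history` list, and the upgrade branch are assumptions for illustration; the sketch also assumes the runtime can see the full revision history, including revisions newer than its own head.

```python
from typing import Sequence


def plan_schema_action(current: str, release_head: str, history: Sequence[str]) -> str:
    """Return the Alembic command needed to reach release_head, or 'none'.

    history lists revision ids oldest-first and is assumed to be linear
    (no branches).
    """
    try:
        current_idx = history.index(current)
        head_idx = history.index(release_head)
    except ValueError as exc:
        raise ValueError("revision not present in migration history") from exc
    if current_idx == head_idx:
        return "none"
    if current_idx > head_idx:
        # Database was migrated by a newer release: roll the schema back.
        return f"alembic downgrade {release_head}"
    # Assumption: a database older than the release head is upgraded.
    return f"alembic upgrade {release_head}"
```

For example, with history `["a1", "b2", "c3"]`, a database at `c3` booted under a release whose head is `b2` yields `alembic downgrade b2`. An irreversible migration would still need the manual adjudication described above; this sketch does not model that case.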
Post-mortem
Required after every rollback (per-airframe or fleet-wide):
- Timeline: when was the new release flashed; when did the failure surface; when was rollback initiated.
- Root cause: which AC was missed; which component is implicated; was it a regression introduced by this release or by a hardware/operational variable change.
- What went wrong in the release process: did Tier-2 CI catch it; if not, why not.
- Prevention: new test scenario added to NFT suite; new lint check; new rule in _docs/LESSONS.md.
- Distribution: post-mortem report stored under _docs/06_metrics/incident_<YYYY-MM-DD>_<topic>.md (per autodev failure-handling protocol).
Deployment Checklist
Pre-flash:
- All Tier-2 AC-bound NFTs green at target version
- Security scan clean (zero critical / high CVEs; SBOM diff passes ADR-002 enforcement)
- Both Docker images built and pushed (deployment + research)
- JetPack image built, signed, checksummed
- Operator-orchestrator tarball built, signed
- Pre-flight bundle prepared by operator (cache + calibration + manifest)
- Pre-takeoff readiness gate behavior verified on a bench Jetson before flashing onto the production unit
- Rollback artifact (previous release JetPack image) staged on operator workstation
- FDR retention policy confirmed for the target airframe
Post-flash:
- First-flight commissioning flight cleared per § 7
- FDR retrieved and analyzed; AC-NEW-4 / AC-NEW-7 statistics within expected envelope
- Post-landing upload procedure tested end-to-end (companion → operator workstation → satellite-provider)
- Operator runbook updated with airframe-specific notes (e.g., "this airframe has UART2 wired to FC")
Tier-2 enablement
Until the Tier-2 self-hosted Jetson runner is fully provisioned:
- AC-bound NFTs are gated as manual trigger only on PRs (ci_cd_pipeline.md § Manual-trigger override).
- The merge gate on dev excludes Tier-2 NFTs; the merge gate on stage and main retains the full gate.
- The pre-takeoff readiness gate (§ Pre-takeoff readiness gate) is unaffected: it runs on the Jetson at every takeoff regardless of CI gating posture.
When the Tier-2 runner is in steady state, this section is removed and the merge gates harmonize across dev / stage / main.