# GPS-Denied Onboard: Deployment Procedures

> Date: 2026-05-09 (Plan Phase 2c, initial draft).
>
> Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model) + § 7 (Security); `_docs/02_document/data_model.md` § 4 (Migration Strategy); environment_strategy.md; ADR-002, ADR-004, ADR-005; AC-NEW-1, AC-NEW-3, AC-NEW-4, AC-NEW-5.

## Deployment scope and model

This project does **not** ship a service; it ships an **embedded edge image** plus an **operator-orchestrator bundle**. The "deployment" patterns from the standard template (blue-green / rolling / canary) are not applicable. Deployment for this project means:

| Artifact | Target | Deployment mechanism |
|---|---|---|
| **JetPack image** (`gps-denied-jetpack--.img`) | Production Jetson Orin Nano Super on a UAV | Operator flashes the image onto the Jetson via NVIDIA `sdkmanager` or `Etcher`-style `dd` from the operator workstation |
| **Operator tooling tarball** | Operator workstation | Operator extracts; `docker compose up -d` brings up `mock-suite-sat-service` (when offline) + `operator-orchestrator` |
| **Tier-1 dev compose** | Developer workstation | Developer runs `docker compose up` from repo root |

**Zero-downtime is not a goal**: a UAV is not in service while it is being re-flashed. The deployment cadence is per-airframe maintenance, not per-request availability.

**Strategy**: the closest analogue to a "rolling deploy" is the operator's fleet-management process (re-flash one UAV at a time across the fleet). The fleet-management process is the operator's concern, not this project's; this document covers the per-airframe procedure.

## Pre-deployment artifact assembly (release engineer)

Performed once per release on Tier-1 + Tier-2 CI; produces signed artifacts stored in the release bucket.

1. Tag a commit on `main`. CI runs the full pipeline (`ci_cd_pipeline.md`).
2. **Tier-1 produces**:
   - `companion-tier1:deployment-` and `companion-tier1:research-` Docker images (pushed to registry).
   - `mock-suite-sat-service:` Docker image.
   - `operator-orchestrator:` Docker image.
   - SBOM artifacts for both binaries (deployment and research).
   - `operator-orchestrator--.tar.gz` containing the operator-orchestrator image + mock-sat image + their compose file + verification script + relevant docs.
3. **Tier-2 produces**:
   - Native deployment-binary build on the self-hosted Jetson runner.
   - SBOM verification: byte-equal (after canonicalization) to Tier-1's deployment-binary SBOM. Mismatch fails the release.
   - **JetPack image build**: a JetPack 6.2 base image with the deployment binary + PostgreSQL 16 + base migrations + `/etc/gps-denied/runtime.yaml` template preinstalled. Output: `gps-denied-jetpack--.img`.
4. **Signing** (Tier-1):
   - Both companion Docker image manifests (deployment and research) are signed with the project's release key.
   - The JetPack image is signed; its checksum is published as a separate signed file (`gps-denied-jetpack--.img.sha256.sig`).
   - The operator-orchestrator tarball is signed.
5. **Release bucket**: artifacts uploaded; release notes published; the previous release's artifacts retained for at least 90 days for rollback support.

A release fails if any step above fails, including any AC-bound NFT failure on Tier-2 (`ci_cd_pipeline.md` § AC-bound NFTs).

## Pre-takeoff readiness gate ("health check" analog)

Production has no `/health/live` HTTP endpoint (no listener; NFT-SEC-05). The companion's "health check" is the **pre-takeoff readiness gate**: a sequence of checks that runs at takeoff load and decides whether the companion is ready to emit external position to the FC.

| Check | What it validates | Action on failure |
|---|---|---|
| Manifest content-hash gate (D-C10-3) | The on-disk manifest matches the operator-staged manifest hash (data_model.md § 2.4) | FDR record `0x000D ContentHashGateFail` + STATUSTEXT critical + companion refuses to publish a `GPS_INPUT` / `MSP2_SENSOR_GPS` source |
| Camera calibration JSON validation | File present + schema-valid + content-hash matches `manifests.calibration_artifact_hash` | Same |
| FAISS `.index` mmap + content-hash | mmap succeeds + content-hash matches `manifests.descriptor_index_hash` | Same |
| TRT engine cache verification | All required engines present per `engine_cache_entries`; each engine's content-hash matches `engine_hash` | Same |
| `alembic current == head` | DB schema is up-to-date for this binary | Same |
| MAVLink-2.0 signing handshake (AP profile) | Signed handshake with the FC succeeds within AC-NEW-1 30 s budget (D-C8-9 = (d)) | FDR record `MavlinkSigningKeyRotated` with reason "handshake_failed" + STATUSTEXT critical + companion refuses to emit |
| Per-flight key generation | Both per-flight ephemeral keys (MAVLink signing + onboard tile signing) generated and persisted under `/var/lib/gps-denied/per-flight/` | Same |
| Initial frame → emit pipeline test | First nav-camera frame reaches C8 outbound encoder; `EmittedExternalPosition` produced | Same |
| Network egress is denied | Verify no outbound network egress is possible (DNS blackhole effective, iptables OUTPUT REJECT loaded); defense-in-depth on architecture.md § 7 + NFT-SEC-05 | FDR critical + STATUSTEXT + refuse to emit |

The gate completes within the AC-NEW-1 30 s p95 budget; failure produces a clear FDR + STATUSTEXT trail and the companion's `GPS_INPUT` / `MSP2_SENSOR_GPS` channel stays silent. The FC then operates as if no companion-GPS source is available, which is the correct safe default.
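
The fail-closed ordering of the gate can be illustrated with a minimal sketch. This is not the companion's implementation: `run_readiness_gate`, `GateResult`, and the check names in the demo are hypothetical, and treating the AC-NEW-1 30 s p95 budget as a hard deadline is a simplifying assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

GATE_BUDGET_S = 30.0  # assumption: AC-NEW-1 p95 budget used as a hard deadline


@dataclass(frozen=True)
class GateResult:
    ready: bool                  # True only if every check passed in time
    reason: str                  # "ok", "check_failed", or "budget_exceeded"
    failed_check: Optional[str]  # name of the check where the gate stopped
    elapsed_s: float


def run_readiness_gate(
    checks: Iterable[Tuple[str, Callable[[], bool]]],
    budget_s: float = GATE_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> GateResult:
    """Run checks in order and stop at the first failure (fail closed)."""
    start = clock()
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False  # a crashing check counts as a failed check
        elapsed = clock() - start
        if not passed:
            return GateResult(False, "check_failed", name, elapsed)
        if elapsed > budget_s:
            return GateResult(False, "budget_exceeded", name, elapsed)
    return GateResult(True, "ok", None, clock() - start)


if __name__ == "__main__":
    # Hypothetical stand-ins for three of the checks in the table above.
    demo_checks = [
        ("manifest_content_hash", lambda: True),
        ("alembic_current_is_head", lambda: False),
        ("network_egress_denied", lambda: True),
    ]
    result = run_readiness_gate(demo_checks)
    # A real caller would write the FDR record and STATUSTEXT here and
    # keep the GPS_INPUT / MSP2_SENSOR_GPS channel silent when not ready.
    print(result.ready, result.reason, result.failed_check)
```

The injectable `clock` parameter exists only so the budget path can be exercised in tests without waiting 30 s.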

## Production deployment procedure (per-airframe)

This is the per-airframe deployment procedure performed by the operator, NOT by CI.

### 1. Pre-deploy approval

Required before any production-bound flight:

- [ ] Release notes for the target version reviewed; AC-NEW-4 / AC-NEW-7 statistical summaries reviewed.
- [ ] All Tier-2 AC-bound NFTs green at the target version (`ci_cd_pipeline.md` § AC-bound NFTs).
- [ ] Security audit of the target version completed (Tier-1 SBOM clean of unpatched CVEs; D-CROSS-CVE-1).
- [ ] D-PROJ-1 calibration step performed on the target Jetson + UAV pairing (hybrid factory + checkerboard-refined; ~1 day per deployed unit).
- [ ] Rollback artifact (the previous release's JetPack image) is staged on the operator workstation.
- [ ] FDR retention policy for this airframe confirmed (default 30 days; environment_strategy.md § Database Management).
- [ ] If switching FC profile (`ardupilot_plane` ↔ `inav`), FC firmware compatibility confirmed.

### 2. Pre-deploy checks (operator workstation)

```sh
# Verify the artifact bundle integrity.
cosign verify-blob \
  --signature gps-denied-jetpack--.img.sha256.sig \
  --key gps-denied-release-key.pub \
  gps-denied-jetpack--.img.sha256
sha256sum -c gps-denied-jetpack--.img.sha256

# Verify the operator-orchestrator tarball.
cosign verify-blob \
  --signature operator-orchestrator--.tar.gz.sig \
  --key gps-denied-release-key.pub \
  operator-orchestrator--.tar.gz
```

### 3. Pre-flight cache build (operator-orchestrator C12)

Performed on the operator workstation, with `satellite-provider` reachable (locally mirrored or via lab VPN).

```sh
docker compose -f operator-orchestrator-compose.yml up -d
# Operator opens http://127.0.0.1:8080
```

The C12 UI walks the operator through:

1. Upload / select the target operational sector (GeoJSON polygon).
2. Set sector classifications (`active_conflict` ↔ `stable_rear`); this drives the freshness threshold (data_model.md § 2.3).
3. Tile download from `satellite-provider` (parent suite); produces `tiles` rows with `source='googlemaps'` + filesystem JPEGs.
4. Descriptor (FAISS) index generation across the loaded tile corpus.
5. TRT engine compilation on the workstation (Tier-2 emulation if no Jetson is present, or directly on a co-located Jetson dev kit).
6. Manifest generation: hash over (model bundle + calibration JSON + corpus + sector classifications + descriptor index + engine cache).
7. Output: a sealed pre-flight bundle on a USB drive or staged for direct ethernet transfer.

### 4. JetPack image flash

Operator flashes the target JetPack image onto the Jetson:

```sh
sudo dd if=gps-denied-jetpack--.img of=/dev/sdX bs=4M status=progress
# OR via NVIDIA SDK Manager for a more guided flow.
sync
```

The flashed image contains:

- JetPack 6.2 base
- The deployment binary preinstalled at `/opt/gps-denied/`
- PostgreSQL 16 with `alembic` schema initialized at the target migration head
- `/etc/gps-denied/runtime.yaml` template (the operator fills in airframe-specific values: `fc_profile`, `companion_id`)
- A systemd unit `gps-denied.service` that auto-starts at boot

The image is **identical across UAVs**; per-airframe configuration (`/etc/gps-denied/runtime.yaml`) is filled in after flash.

### 5. Per-airframe configuration

Operator boots the Jetson in maintenance mode, ssh's in (this is the only time the Jetson has any inbound network surface; closed before takeoff), and:

```sh
sudo $EDITOR /etc/gps-denied/runtime.yaml
# Set: fc_profile, companion_id, fdr_retention_days, log_level

sudo gps-denied-cli stage-cache /mnt/usb/gps-denied-cache-.tar.gz
# Stages the operator-prepared cache + calibration + manifest into /var/lib/gps-denied/.

sudo gps-denied-cli verify-readiness
# Runs all gate checks except MAVLink signing handshake (which requires the FC to be powered).
```

### 6. UAV integration

- Wire the Jetson UART/USB to the FC.
- For ArduPilot Plane: configure FC parameters per the AP-side checklist (`EK3_SRC1_POSXY = 3` or per D-C8-2 = (b) configuration, `AHRS_EKF_TYPE = 3`).
- For iNav: configure `gps_provider = MSP`, `gps_ublox_use_galileo = OFF`.
- Power up the FC; verify MAVLink signing handshake completes within 30 s (AC-NEW-1).

### 7. First-flight commissioning

The first flight on a freshly-deployed airframe is a **commissioning flight**, not a production flight:

- Operator stays in line-of-sight.
- AC-5.2 fallback (FC IMU-only) is the primary safety net during commissioning.
- Operator manually triggers a `MAV_CMD_REQUEST_MESSAGE` to confirm `GPS_INPUT` is being received and the FC's EKF source-set switch responds correctly.
- If everything looks healthy on the GCS dashboard for 5+ minutes of cruise, the airframe is cleared for production flights.

### 8. Post-deploy monitoring

After the first commissioning flight:

- [ ] FDR retrieved and visualized on operator workstation (operator-orchestrator C12 dashboard, observability.md § 5.1).
- [ ] AC-NEW-4 statistics for the commissioning flight reviewed; outliers investigated.
- [ ] No FDR segment drops; no `ContentHashGateFail` events.
- [ ] Mid-flight tile generation working (the post-landing upload is handled separately).
- [ ] If everything is green, the deployment is finalised; the previous release's JetPack image can be archived (still kept for rollback).

## Post-landing tile upload (per-flight, ADR-004)

Per AC-8.4 + ADR-004, upload of tiles generated mid-flight to `satellite-provider` is **post-landing only**, and uses the operator-orchestrator's C11 Tile Manager (`TileUploader` interface; a separate binary, never linked into the airborne image).

```sh
# Operator plugs the companion's NVM into the workstation OR ssh's into the powered-off-then-re-booted Jetson.
docker compose run operator-orchestrator \
  python -m operator_orchestrator.tilemanager upload \
  --flight-id \
  --satellite-provider "$SATELLITE_PROVIDER_URL" \
  --signing-pubkey-fingerprint
```

Behavior:

- Reads the local `tiles` rows where `source='onboard_ingest' AND voting_status='pending' AND flight_id=`.
- Reads the corresponding JPEG body + sidecar JSON from filesystem.
- Reads the per-flight onboard tile-signing private key (still on the companion's NVM until FDR rolls over).
- Submits to `satellite-provider`'s `POST /api/satellite/tiles/ingest` endpoint (D-PROJ-2 contract).
- On 2xx success: deletes local row + JPEG + sidecar + emits FDR event `tile_uploaded`.
- On 4xx: leaves local data; emits FDR event `tile_upload_failed` with reason; operator decides next steps (likely a parent-suite issue).
- On 5xx: retries with exponential backoff; persistent failure → `tile_upload_failed` + operator review.

When the parent-suite voting layer (D-PROJ-2 design task #2) ships, this flow does NOT change on the onboard side; the parent suite's promotion logic is invisible to onboard-side upload.

## Rollback Procedures

### Trigger criteria

| Severity | Trigger | Decision-maker |
|---|---|---|
| Critical (per-airframe) | Commissioning flight fails AC-5.2 fallback (the FC IMU-only fallback also failed; airframe lost) | Safety review board (out of scope of this project) |
| Critical (fleet-wide) | Any post-deploy AC-NEW-4 outlier indicates a regression: P(err > 1 km) measured on a real flight exceeds the AC threshold by ≥ 2x | Suite security + onboard team lead |
| High (per-airframe) | Commissioning flight passes but post-flight FDR analysis shows AC-NEW-4 / AC-NEW-7 regression vs. prior release | Onboard team lead |
| High (per-airframe) | Operator unable to complete pre-flight readiness gate (manifest hash gate fails repeatedly) | Operator + onboard team lead |
| Medium (per-airframe) | Sustained `dead_reckoned` periods longer than expected; FDR segment drops occurring | Operator + onboard team lead (post-flight investigation; may not warrant immediate rollback) |

### Rollback steps (per-airframe)

1. **Re-flash** the previous release's JetPack image onto the affected Jetson (same procedure as § 4 with the previous artifact).
2. **Re-stage** the previous release's pre-flight bundle (the operator workstation retains it in the operator-orchestrator cache for ≥ 30 days).
3. **Re-run** the pre-takeoff readiness gate.
4. **Confirm** AC-5.2 fallback is still functional (it is FC firmware behavior; rolling back the companion image cannot break it, but verify on the GCS).
5. **Document** the rollback in the post-mortem template; include FDR snapshots from the offending flight (if any) plus the rollback artifact versions.

### Database rollback (data_model.md § 4.2 reversibility)

Per data_model.md § 4.2, every Alembic migration MUST implement a working `downgrade()`. Rolling back the JetPack image to the previous release rolls back the schema to whatever migration head the previous release uses. Concretely:

- The previous release's JetPack image contains its own Alembic migration tree.
- On boot, the previous-release runtime asserts `alembic current == head_for_that_release`. If the database is on a NEWER head (because the airframe ran the new release between deployments), the runtime invokes `alembic downgrade ` automatically.
- If a migration is **not reversible** (which requires an explicit ADR; data_model.md § 4.2), the rollback must be manually adjudicated by the operator + onboard team lead. This case is rare by policy.
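
The boot-time decision described in the bullets above can be sketched as a pure function. This is an illustration under stated assumptions, not the runtime's code: `plan_schema_action`, the revision ids, and the `irreversible` set are hypothetical, the ordered `history` is assumed to be available to the caller, and the "refuse" outcome for an older or unknown revision is an assumption because the text only covers the newer-head case. A real runtime would then call Alembic to perform the downgrade.

```python
from typing import AbstractSet, Sequence


def plan_schema_action(
    current: str,
    release_head: str,
    history: Sequence[str],
    irreversible: AbstractSet[str] = frozenset(),
) -> str:
    """Decide what the booting release should do about the DB schema.

    history is the ordered list of revision ids, oldest first (assumed known).
    Returns one of: "ok", "downgrade", "manual", "refuse".
    """
    if current == release_head:
        return "ok"  # alembic current == head for this release
    if current not in history or release_head not in history:
        return "refuse"  # unknown revision: assumption, fail closed
    cur_i = history.index(current)
    head_i = history.index(release_head)
    if cur_i < head_i:
        return "refuse"  # DB older than this release's head: not covered by the text
    # Revisions that a downgrade would have to undo, newest last.
    to_undo = history[head_i + 1 : cur_i + 1]
    if any(rev in irreversible for rev in to_undo):
        return "manual"  # needs operator + onboard team lead adjudication
    return "downgrade"


if __name__ == "__main__":
    revisions = ["rev_a", "rev_b", "rev_c"]  # hypothetical ids, oldest first
    print(plan_schema_action("rev_c", "rev_b", revisions))
    print(plan_schema_action("rev_c", "rev_a", revisions, irreversible={"rev_b"}))
```

Keeping the decision separate from the Alembic invocation makes the rollback policy testable on a workstation without a database.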

### Post-mortem

Required after every rollback (per-airframe or fleet-wide):

- Timeline: when was the new release flashed; when did the failure surface; when was rollback initiated.
- Root cause: which AC was missed; which component is implicated; was it a regression introduced by this release or by a hardware/operational variable change.
- What went wrong in the release process: did Tier-2 CI catch it; if not, why not.
- Prevention: new test scenario added to NFT suite; new lint check; new rule in `_docs/LESSONS.md`.
- Distribution: post-mortem report stored under `_docs/06_metrics/incident__.md` (per autodev failure-handling protocol).

## Deployment Checklist

Pre-flash:

- [ ] All Tier-2 AC-bound NFTs green at target version
- [ ] Security scan clean (zero critical / high CVEs; SBOM diff passes ADR-002 enforcement)
- [ ] Both Docker images built and pushed (deployment + research)
- [ ] JetPack image built, signed, checksummed
- [ ] Operator-tooling tarball built, signed
- [ ] Pre-flight bundle prepared by operator (cache + calibration + manifest)
- [ ] Pre-takeoff readiness gate behavior verified on a bench Jetson before flashing onto the production unit
- [ ] Rollback artifact (previous release JetPack image) staged on operator workstation
- [ ] FDR retention policy confirmed for the target airframe

Post-flash:

- [ ] First-flight commissioning flight cleared per § 7
- [ ] FDR retrieved and analyzed; AC-NEW-4 / AC-NEW-7 statistics within expected envelope
- [ ] Post-landing upload procedure tested end-to-end (companion → operator workstation → `satellite-provider`)
- [ ] Operator runbook updated with airframe-specific notes (e.g., "this airframe has UART2 wired to FC")

## Tier-2 enablement

Until the Tier-2 self-hosted Jetson runner is fully provisioned:

- AC-bound NFTs are gated as **manual trigger only** on PRs (`ci_cd_pipeline.md` § Manual-trigger override).
- The merge gate on `dev` excludes Tier-2 NFTs; the merge gate on `stage` and `main` retains the full gate.
- The pre-takeoff readiness gate (§ Pre-takeoff readiness gate) is unaffected; it runs on the Jetson at every takeoff regardless of CI gating posture.

When the Tier-2 runner is in steady state, this section is removed and the merge gates harmonize across `dev` / `stage` / `main`.
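
The interim gating posture can be summarized in a few lines. This is only a sketch of the rule stated above; `tier2_nfts_required` and the `runner_steady_state` flag are hypothetical names, and the actual gate configuration lives in `ci_cd_pipeline.md`.

```python
def tier2_nfts_required(branch: str, runner_steady_state: bool) -> bool:
    """Return True if the merge gate for `branch` includes Tier-2 NFTs."""
    if branch in ("stage", "main"):
        return True  # the full gate is always retained on these branches
    if branch == "dev":
        # Excluded during enablement; harmonized once the runner is steady.
        return runner_steady_state
    raise ValueError(f"no merge gate defined for branch {branch!r}")


if __name__ == "__main__":
    for name in ("dev", "stage", "main"):
        print(name, tier2_nfts_required(name, runner_steady_state=False))
```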