Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-21 08:21:13 +00:00
[AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit:

* AZ-329 PostLandingUploadOrchestrator: gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader.
* AZ-330 OperatorReLocService: AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars).
* AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded.
* AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title.

Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99.

Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets.

See `_docs/03_implementation/batch_44_cycle1_report.md`.

Co-authored-by: Cursor <cursoragent@cursor.com>
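A minimal Python sketch of two behaviors the message above describes, under assumptions about shapes the commit does not show: the AZ-329 upload gate (the C11 upload proceeds only when the flight's `flight_footer` record reports `clean_shutdown`) and the AZ-330 log redaction (lat/lon to 5 decimals, reason to 200 chars). `FdrFooterReader` is the Protocol name from the commit, but its `read_footer` method, the `FlightFooter` record type, the `UploadDecision` values, `gate_upload`, `redact_reloc_fields` and `InMemoryFooterReader` are all hypothetical. The commit mentions four refusal modes without naming them, so only two illustrative ones appear here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class FlightFooter:
    """Hypothetical view of the `flight_footer` FDR record: only the field the gate reads."""
    flight_id: str
    clean_shutdown: bool


class FdrFooterReader(Protocol):
    """Protocol name from the commit; the `read_footer` signature is an assumption."""

    def read_footer(self, flight_id: str) -> Optional[FlightFooter]:
        ...


class UploadDecision(Enum):
    ALLOW = "allow"
    # Illustrative refusal modes only; the commit lists four but does not name them.
    REFUSE_NO_FOOTER = "refuse_no_footer"
    REFUSE_UNCLEAN_SHUTDOWN = "refuse_unclean_shutdown"


def gate_upload(reader: FdrFooterReader, flight_id: str) -> UploadDecision:
    """Hypothetical gate: allow the C11 upload only if the footer reports a clean shutdown."""
    footer = reader.read_footer(flight_id)
    if footer is None:
        # No footer record for this flight: refuse rather than guess.
        return UploadDecision.REFUSE_NO_FOOTER
    if not footer.clean_shutdown:
        return UploadDecision.REFUSE_UNCLEAN_SHUTDOWN
    return UploadDecision.ALLOW


LATLON_DECIMALS = 5     # from the commit: lat/lon logged to 5 decimals
MAX_REASON_CHARS = 200  # from the commit: reason truncated to 200 chars


def redact_reloc_fields(lat: float, lon: float, reason: str) -> Dict[str, str]:
    """Hypothetical helper: format a re-loc hint's fields for logging."""
    return {
        "lat": f"{lat:.{LATLON_DECIMALS}f}",
        "lon": f"{lon:.{LATLON_DECIMALS}f}",
        "reason": reason[:MAX_REASON_CHARS],
    }


class InMemoryFooterReader:
    """Hypothetical test double standing in for LocalFdrFooterReader (its API is not shown)."""

    def __init__(self, footers: Dict[str, FlightFooter]) -> None:
        self._footers = footers

    def read_footer(self, flight_id: str) -> Optional[FlightFooter]:
        return self._footers.get(flight_id)
```

For example, a reader holding a footer with `clean_shutdown=False` yields `UploadDecision.REFUSE_UNCLEAN_SHUTDOWN`, and an unknown flight id yields `REFUSE_NO_FOOTER`. Because `FdrFooterReader` is a `typing.Protocol`, the in-memory double satisfies it structurally without inheriting from it, which is one way such a Protocol cut keeps the gate testable without an FDR on disk.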
@@ -19,7 +19,7 @@ The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in
 | Build (Tier-2 deployment binary) | PR merge to `dev`, `stage`, `main` | Tier-2 (self-hosted Jetson) | Native build on Jetson green; deployment binary SBOM matches Tier-1 deployment SBOM |
 | AC-bound NFTs (Tier-2) | PR merge to `dev`, `stage`, `main`; manual on PR | Tier-2 | NFT-PERF-* (AC-4.1, AC-NEW-1, AC-NEW-2), NFT-LIM-* (AC-4.2, AC-NEW-3), NFT-RES-* (AC-NEW-4, AC-NEW-7), IT-12 (comparative study) all pass thresholds in `tests/traceability-matrix.md` |
 | JetPack image build | Tag on `main` | Tier-2 | JetPack 6.2 image built with deployment binary preinstalled, signed, and attested |
-| Operator tooling tarball | Tag on `main` | Tier-1 | Tarball contains C11 Tile Manager (both `TileDownloader` and `TileUploader`) + C12 Operator Pre-flight Tooling + mock-sat-service compose + verification script |
+| Operator tooling tarball | Tag on `main` | Tier-1 | Tarball contains C11 Tile Manager (both `TileDownloader` and `TileUploader`) + C12 Operator Pre-flight Orchestrator + mock-sat-service compose + verification script |

 Tier-2 jobs are the **only** AC-bound jobs. Everything else runs on Tier-1.

@@ -146,7 +146,7 @@ Runs on tag push to `main`. Produces `gps-denied-jetpack-<semver>-<sha>.img` (th

 ### Operator tooling tarball (release-only)

-Bundles `operator-tooling` Docker image + `mock-suite-sat-service` Docker image + their compose file + a verification script + the documentation under `_docs/02_document/`. The tarball is uploaded to the release bucket alongside the JetPack image.
+Bundles `operator-orchestrator` Docker image + `mock-suite-sat-service` Docker image + their compose file + a verification script + the documentation under `_docs/02_document/`. The tarball is uploaded to the release bucket alongside the JetPack image.

 ## Caching Strategy

@@ -9,7 +9,7 @@ This project has **asymmetric containerization** by design (architecture.md § 3

 - **Tier-1** (workstation): Docker is the universal runtime. Dev, lint, unit, most integration, and `mock-suite-sat-service` all run in Docker compose.
 - **Tier-2 (Jetson)**: **NO Docker**. The deployed JetPack image runs the deployment binary natively. TensorRT INT8 calibration caches and `jetson-stats` thermal telemetry are most reliable without a container layer (D-C7-9 + D-C10-6). The "image" is a JetPack 6.2 system image with the deployment binary preinstalled.
-- **Operator workstation**: Docker is used for the local `satellite-provider` mirror, the `mock-suite-sat-service` (when offline), and the operator-tooling stack (C11 Tile Manager + C12 Operator Pre-flight Tooling).
+- **Operator workstation**: Docker is used for the local `satellite-provider` mirror, the `mock-suite-sat-service` (when offline), and the operator-orchestrator stack (C11 Tile Manager + C12 Operator Pre-flight Orchestrator).

 Three Dockerfiles are maintained; the airborne companion uses **none of them** in production.

@@ -43,9 +43,9 @@ e2e-test fixture only — implements the planned D-PROJ-2 ingest contract (`POST
 | Health check | HTTP `GET /healthz` (returns 200 if listening + storage backend mounted). 10 s interval. |
 | Exposed ports | `5100/tcp` (matches `satellite-provider`'s port so the same client config works) |
 | Key build args | `MOCK_FAILURE_PROFILE` (default `none`; used by NFT-SEC-01 to inject latency / 5xx / partial responses) |
-| Notes | The mock is a release artifact (operator-tooling tarball includes its compose file). When the real `satellite-provider` D-PROJ-2 endpoint ships, the mock is retired. |
+| Notes | The mock is a release artifact (operator-orchestrator tarball includes its compose file). When the real `satellite-provider` D-PROJ-2 endpoint ships, the mock is retired. |

-### `operator-tooling` (Operator workstation Tile Manager + pre-flight UI, C11 + C12)
+### `operator-orchestrator` (Operator workstation Tile Manager + pre-flight UI, C11 + C12)

 | Property | Value |
 |----------|-------|
@@ -53,7 +53,7 @@ e2e-test fixture only — implements the planned D-PROJ-2 ingest contract (`POST
 | Build image | `python:3.10-slim` (no native deps; pure Python plus `httpx` for both download and upload, `psycopg` for read/write of C6 mirror, `cryptography` for upload signing) |
 | Stages | `python-deps` → `runtime` |
 | User | `operator` (non-root) |
-| Health check | `python -m operator_tooling.healthcheck` (validates `satellite-provider` reachable). 30 s interval. |
+| Health check | `python -m operator_orchestrator.healthcheck` (validates `satellite-provider` reachable). 30 s interval. |
 | Exposed ports | `8080/tcp` (operator pre-flight UI, C12); no inbound network for C11 Tile Manager (it's a CLI / one-shot tool, both directions) |
 | Key build args | `INCLUDE_PRE_FLIGHT_UI=true` (default; can be turned off for headless CLI-only deployments) |
 | Notes | **C11 Tile Manager (both `TileDownloader` and `TileUploader`) is in this image, NEVER in `gps-denied-companion-tier1`** (ADR-004 process-level isolation). The airborne deployment binary on Tier-2 also does not contain C11. |
@@ -120,11 +120,11 @@ services:
       interval: 5s
     networks: [ gps-denied-net ]

-  operator-tooling:
+  operator-orchestrator:
     build:
       context: .
-      dockerfile: docker/operator-tooling.Dockerfile
-    image: gps-denied/operator-tooling:dev
+      dockerfile: docker/operator-orchestrator.Dockerfile
+    image: gps-denied/operator-orchestrator:dev
     environment:
       - SATELLITE_PROVIDER_URL=http://mock-sat:5100
       - COMPANION_DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
@@ -207,7 +207,7 @@ Tier-2 CI runs the same deployment binary directly on the self-hosted Jetson run
 | CI build (deployment binary) | `<registry>/gps-denied/companion-tier1:deployment-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-a1b2c3d` |
 | CI build (research binary) | `<registry>/gps-denied/companion-tier1:research-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:research-a1b2c3d` |
 | Mock sat service | `<registry>/gps-denied/mock-suite-sat-service:<git-sha>` | `ghcr.io/azaion/gps-denied/mock-suite-sat-service:a1b2c3d` |
-| Operator tooling | `<registry>/gps-denied/operator-tooling:<git-sha>` | `ghcr.io/azaion/gps-denied/operator-tooling:a1b2c3d` |
+| Operator tooling | `<registry>/gps-denied/operator-orchestrator:<git-sha>` | `ghcr.io/azaion/gps-denied/operator-orchestrator:a1b2c3d` |
 | Release | `<registry>/gps-denied/<image>:<semver>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-1.2.0` |
 | Local dev | `gps-denied/<image>:dev` | `gps-denied/companion-tier1:dev` |
 | JetPack image (Tier-2) | `gps-denied-jetpack-<semver>-<sha>.img` | `gps-denied-jetpack-1.2.0-a1b2c3d.img` (file artifact, not a container tag) |
@@ -5,12 +5,12 @@

 ## Deployment scope and model

-This project does **not** ship a service; it ships an **embedded edge image** plus an **operator-tooling bundle**. The "deployment" patterns from the standard template (blue-green / rolling / canary) are not applicable. Deployment for this project means:
+This project does **not** ship a service; it ships an **embedded edge image** plus an **operator-orchestrator bundle**. The "deployment" patterns from the standard template (blue-green / rolling / canary) are not applicable. Deployment for this project means:

 | Artifact | Target | Deployment mechanism |
 |---|---|---|
 | **JetPack image** (`gps-denied-jetpack-<semver>-<sha>.img`) | Production Jetson Orin Nano Super on a UAV | Operator flashes the image onto the Jetson via NVIDIA `sdkmanager` or `Etcher`-style `dd` from the operator workstation |
-| **Operator tooling tarball** | Operator workstation | Operator extracts; `docker compose up -d` brings up `mock-suite-sat-service` (when offline) + `operator-tooling` |
+| **Operator tooling tarball** | Operator workstation | Operator extracts; `docker compose up -d` brings up `mock-suite-sat-service` (when offline) + `operator-orchestrator` |
 | **Tier-1 dev compose** | Developer workstation | Developer runs `docker compose up` from repo root |

 **Zero-downtime is not a goal**: a UAV is not in service while it is being re-flashed. The deployment cadence is per-airframe maintenance, not per-request availability.
@@ -25,9 +25,9 @@ Performed once per release on Tier-1 + Tier-2 CI; produces signed artifacts stor
 2. **Tier-1 produces**:
    - `companion-tier1:deployment-<sha>` and `companion-tier1:research-<sha>` Docker images (pushed to registry).
    - `mock-suite-sat-service:<sha>` Docker image.
-   - `operator-tooling:<sha>` Docker image.
+   - `operator-orchestrator:<sha>` Docker image.
    - SBOM artifacts for both binaries (deployment and research).
-   - `operator-tooling-<semver>-<sha>.tar.gz` containing the operator-tooling image + mock-sat image + their compose file + verification script + relevant docs.
+   - `operator-orchestrator-<semver>-<sha>.tar.gz` containing the operator-orchestrator image + mock-sat image + their compose file + verification script + relevant docs.
 3. **Tier-2 produces**:
    - Native deployment-binary build on the self-hosted Jetson runner.
    - SBOM verification: byte-equal (after canonicalization) to Tier-1's deployment-binary SBOM. Mismatch fails the release.
@@ -35,7 +35,7 @@ Performed once per release on Tier-1 + Tier-2 CI; produces signed artifacts stor
 4. **Signing** (Tier-1):
    - Both Docker image manifests are signed with the project's release key.
    - The JetPack image is signed; checksum is published as a separate signed file (`gps-denied-jetpack-<semver>-<sha>.img.sha256.sig`).
-   - The operator-tooling tarball is signed.
+   - The operator-orchestrator tarball is signed.
 5. **Release bucket**: artifacts uploaded; release notes published; the previous release's artifacts retained for at least 90 days for rollback support.

 A release fails if any step above fails — including any AC-bound NFT failure on Tier-2 (`ci_cd_pipeline.md` § AC-bound NFTs).
@@ -85,19 +85,19 @@ cosign verify-blob \

 sha256sum -c gps-denied-jetpack-<semver>-<sha>.img.sha256

-# Verify the operator-tooling tarball.
+# Verify the operator-orchestrator tarball.
 cosign verify-blob \
-  --signature operator-tooling-<semver>-<sha>.tar.gz.sig \
+  --signature operator-orchestrator-<semver>-<sha>.tar.gz.sig \
   --key gps-denied-release-key.pub \
-  operator-tooling-<semver>-<sha>.tar.gz
+  operator-orchestrator-<semver>-<sha>.tar.gz
 ```

-### 3. Pre-flight cache build (operator-tooling C12)
+### 3. Pre-flight cache build (operator-orchestrator C12)

 Performed on the operator workstation, with `satellite-provider` reachable (locally mirrored or via lab VPN).

 ```sh
-docker compose -f operator-tooling-compose.yml up -d
+docker compose -f operator-orchestrator-compose.yml up -d
 # Operator opens http://127.0.0.1:8080
 ```

@@ -164,7 +164,7 @@ The first flight on a freshly-deployed airframe is a **commissioning flight**, n

 Post first commissioning flight:

-- [ ] FDR retrieved and visualized on operator workstation (operator-tooling C12 dashboard, observability.md § 5.1).
+- [ ] FDR retrieved and visualized on operator workstation (operator-orchestrator C12 dashboard, observability.md § 5.1).
 - [ ] AC-NEW-4 statistics for the commissioning flight reviewed; outliers investigated.
 - [ ] No FDR segment drops; no `ContentHashGateFail` events.
 - [ ] Mid-flight tile generation working (post-landing upload — handle that separately).
@@ -172,12 +172,12 @@ Post first commissioning flight:

 ## Post-landing tile upload (per-flight, ADR-004)

-Per AC-8.4 + ADR-004, mid-flight tile upload to `satellite-provider` is **post-landing only**, and uses the operator-tooling's C11 Tile Manager (`TileUploader` interface; a separate binary, never linked into the airborne image).
+Per AC-8.4 + ADR-004, mid-flight tile upload to `satellite-provider` is **post-landing only**, and uses the operator-orchestrator's C11 Tile Manager (`TileUploader` interface; a separate binary, never linked into the airborne image).

 ```sh
 # Operator plugs the companion's NVM into the workstation OR ssh's into the powered-off-then-re-booted Jetson.
-docker compose run operator-tooling \
-  python -m operator_tooling.tilemanager upload \
+docker compose run operator-orchestrator \
+  python -m operator_orchestrator.tilemanager upload \
   --flight-id <uuid> \
   --satellite-provider $SATELLITE_PROVIDER_URL \
   --signing-pubkey-fingerprint <fingerprint>
@@ -210,7 +210,7 @@ When the parent-suite voting layer (D-PROJ-2 design task #2) ships, this flow do
 ### Rollback steps (per-airframe)

 1. **Re-flash** the previous release's JetPack image onto the affected Jetson (same procedure as § 4 with the previous artifact).
-2. **Re-stage** the previous release's pre-flight bundle (the operator workstation retains it in the operator-tooling cache for ≥ 30 days).
+2. **Re-stage** the previous release's pre-flight bundle (the operator workstation retains it in the operator-orchestrator cache for ≥ 30 days).
 3. **Re-run** the pre-takeoff readiness gate.
 4. **Confirm** AC-5.2 fallback is still functional (it is FC firmware behavior; rolling back the companion image cannot break it, but verify on the GCS).
 5. **Document** the rollback in the post-mortem template; include FDR snapshots from the offending flight (if any) plus the rollback artifacts versions.
@@ -141,7 +141,7 @@ This means the threat surface on a captured companion reduces to "what is in the
 |---|---|---|
 | Per-flight MAVLink signing key | Every flight (per-flight ephemeral) | Automated at takeoff load |
 | Per-flight onboard tile-signing key | Every flight (per-flight ephemeral) | Automated at takeoff load |
-| `SATELLITE_PROVIDER_API_KEY` | Operator-managed; rotated when an operator workstation is reissued or compromised is suspected | Operator workstation hardening procedure (out of scope of this document; operator-tooling C12 owns it) |
+| `SATELLITE_PROVIDER_API_KEY` | Operator-managed; rotated when an operator workstation is reissued or compromised is suspected | Operator workstation hardening procedure (out of scope of this document; operator-orchestrator C12 owns it) |
 | Production binary signing key | Per release cycle or on suspected compromise | Release engineer rotates; new key fingerprint is published in release notes; verification scripts on the operator workstation pull the latest fingerprint |
 | JetPack image signing key | Same as production binary signing key | Same |

@@ -12,7 +12,7 @@ Observability therefore splits into three regimes:
 | Regime | Where | Live or post-flight | Primary mechanism |
 |---|---|---|---|
 | **In-flight onboard** | Production Jetson, in flight | Live (to FDR ring) + best-effort live (to GCS) | FDR binary record stream + GCS STATUSTEXT / NAMED_VALUE_FLOAT |
-| **Post-flight onboard** | Operator workstation after pulling the FDR | Post-flight | FDR replay + visualization in operator-tooling C12 |
+| **Post-flight onboard** | Operator workstation after pulling the FDR | Post-flight | FDR replay + visualization in operator-orchestrator C12 |
 | **CI / dev (Tier-1, Tier-2)** | Workstation Docker / Jetson CI runner | Live | Standard structured logging + Prometheus metrics endpoint where applicable |

 The sections below are organized by regime.
@@ -85,7 +85,7 @@ There is no Prometheus endpoint on the production airborne companion. The justif

 When the operator plugs the companion in post-landing:

-1. **FDR retrieval** (operator tooling C12 — feature, not in scope of this document's structure but observability-impacting): operator-tooling reads the FDR ring, copies it to the workstation, and seals the in-flight ring. The companion's per-flight ephemeral keys are deleted at this step (environment_strategy.md § Per-flight key lifecycle).
+1. **FDR retrieval** (operator tooling C12 — feature, not in scope of this document's structure but observability-impacting): operator-orchestrator reads the FDR ring, copies it to the workstation, and seals the in-flight ring. The companion's per-flight ephemeral keys are deleted at this step (environment_strategy.md § Per-flight key lifecycle).
 2. **Visualization** (operator tooling C12): the workstation renders:
    - Time-series of `horiz_accuracy`, `vert_accuracy`, `last_anchor_age_ms`, source label timeline, thermal-throttle hybrid switches, and CPU / GPU / temp.
    - Map view: emitted positions vs. (when available) FC `GLOBAL_POSITION_INT` ground truth.
@@ -173,7 +173,7 @@ Collection interval: 15 s (typical Prometheus default; Tier-2 NFT runs may use 1

 The runtime is a single in-process Python program with no cross-service hops in flight (architecture.md § 5 internal communication is all in-process). Distributed tracing is therefore not applicable to the production runtime.

-The Tier-1 integration setup DOES involve cross-container hops (companion ↔ mock-sat ↔ db ↔ e2e-runner), but those are exercised by the e2e test framework's own log + status capture; OpenTelemetry is not provisioned for this project. If a future cycle introduces a multi-process companion (which ADR-004 explicitly rejected for the airborne profile but might appear on the operator workstation for C11 Tile Manager + C12 Operator Pre-flight Tooling), tracing can be reconsidered then.
+The Tier-1 integration setup DOES involve cross-container hops (companion ↔ mock-sat ↔ db ↔ e2e-runner), but those are exercised by the e2e test framework's own log + status capture; OpenTelemetry is not provisioned for this project. If a future cycle introduces a multi-process companion (which ADR-004 explicitly rejected for the airborne profile but might appear on the operator workstation for C11 Tile Manager + C12 Operator Pre-flight Orchestrator), tracing can be reconsidered then.

 ## 4. Alerting (post-flight, not in-flight)

@@ -201,7 +201,7 @@ There is no PagerDuty / on-call rotation for this project; in-flight failures ar

 ### 5.1 Operator workstation post-flight dashboard

-Built into operator-tooling C12. Per flight:
+Built into operator-orchestrator C12. Per flight:

 - Time series: source label, `horiz_accuracy`, `last_anchor_age_ms`, CPU%, GPU%, temp.
 - Event markers: VISUAL_BLACKOUT entries, spoofing events, signing key rotations, thermal hybrid switches.
@@ -227,6 +227,6 @@ Out of scope by design. The GCS is the only live operator surface; all other ins

 ## 6. Open Items / Plan-Phase Carryforward

-- **Long-term FDR archive** (multi-flight statistical headroom): D-PROJ-3 (multi-flight fixture acquisition for AC-NEW-4 / AC-NEW-7) is not pursued this cycle. If pursued in a future cycle, post-flight FDR archives become a corpus contribution path; the operator-tooling FDR-retrieval step would need an explicit "contribute to corpus" toggle.
+- **Long-term FDR archive** (multi-flight statistical headroom): D-PROJ-3 (multi-flight fixture acquisition for AC-NEW-4 / AC-NEW-7) is not pursued this cycle. If pursued in a future cycle, post-flight FDR archives become a corpus contribution path; the operator-orchestrator FDR-retrieval step would need an explicit "contribute to corpus" toggle.
 - **Telemetry-link encryption** beyond MAVLink-2.0 signing: out of scope; addressed by physical link assumptions in the threat model (architecture.md § 7).
 - **iNav signing**: still has no equivalent to MAVLink-2.0 signing (Mode B Source #129). Carryforward Plan-phase action: file a feature request upstream; meanwhile observability for iNav-profile flights is the same as AP-profile minus the `MavlinkSigningKeyRotated` records (which are NULL on iNav flights per data_model.md § 2.2).