# GPS-Denied Onboard — Containerization

> Date: 2026-05-09 (Plan Phase 2c, initial draft).
>
> Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); `_docs/00_problem/restrictions.md` § Onboard Hardware; ADR-002 (build-time exclusion of unused strategies); ADR-005 (Tier-1 / Tier-2 are first-class).

## Containerization scope

This project has **asymmetric containerization** by design (architecture.md § 3, ADR-005):

- **Tier-1 (workstation)**: Docker is the universal runtime. Development, linting, unit tests, most integration tests, and `mock-suite-sat-service` all run under Docker Compose.
- **Tier-2 (Jetson)**: **NO Docker**. The deployed JetPack image runs the deployment binary natively. TensorRT INT8 calibration caches and `jetson-stats` thermal telemetry are most reliable without a container layer (D-C7-9 + D-C10-6). The "image" is a JetPack 6.2 system image with the deployment binary preinstalled.
- **Operator workstation**: Docker is used for the local `satellite-provider` mirror, the `mock-suite-sat-service` (when offline), and the operator-orchestrator stack (C11 Tile Manager + C12 Operator Pre-flight Orchestrator).

Three component Dockerfiles are maintained; the airborne companion uses **none of them** in production.

## Component Dockerfiles
### `gps-denied-companion-tier1` (Tier-1 dev / CI only)

This image is for fast iterative development on a workstation. It is **never** flashed onto a Jetson.

| Property | Value |
|----------|-------|
| Base image | `nvidia/cuda:12.6.0-runtime-ubuntu22.04` (or `python:3.10-slim` if the dev box has no GPU) |
| Build image | `nvidia/cuda:12.6.0-devel-ubuntu22.04` |
| Stages | `system-deps` → `python-deps` → `cpp-build` (CMake + GTSAM + FAISS + OpenCV + OKVIS2 + KltRansac) → `runtime` |
| User | `companion` (UID 1000, non-root) |
| Health check | `python -m gps_denied.healthcheck` (validates that the calibration JSON is loadable, the DB is reachable, and the FAISS index is mmap-able). 30 s interval. |
| Exposed ports | `5101/tcp` (companion control plane, Tier-1 only; Tier-2 production has no inbound network) |
| Key build args | `BUILD_VINS_MONO=OFF`, `BUILD_SALAD=OFF` for the deployment build; `BUILD_VINS_MONO=ON`, `BUILD_SALAD=ON` for the research build |
| Notes | Two distinct image tags are built on every PR: `companion-tier1:deployment-<sha>` and `companion-tier1:research-<sha>` (ADR-002). |

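The health-check row lists three conditions. The following is a minimal, stdlib-only sketch of a probe that checks all three. It is not the real `gps_denied.healthcheck`. `FAISS_INDEX_PATH` is a hypothetical variable name, since the compose file does not define one. "DB reachable" is approximated by a TCP connect to the host and port in `DB_URL`, not by a real PostgreSQL login.

```python
import json
import mmap
import os
import socket
from urllib.parse import urlparse


def calibration_loadable(path: str) -> bool:
    """True if the calibration file exists and parses as a JSON object."""
    try:
        with open(path, encoding="utf-8") as fh:
            return isinstance(json.load(fh), dict)
    except (OSError, ValueError):  # missing file, bad JSON, bad encoding
        return False


def db_reachable(db_url: str, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to the host:port in db_url succeeds (no DB login is attempted)."""
    parsed = urlparse(db_url)
    if not parsed.hostname:
        return False
    try:
        with socket.create_connection((parsed.hostname, parsed.port or 5432), timeout=timeout_s):
            return True
    except (OSError, ValueError):  # refused/unresolvable host, or an invalid port
        return False


def index_mmapable(path: str) -> bool:
    """True if the index file can be memory-mapped read-only (empty files cannot)."""
    try:
        with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ):
            return True
    except (OSError, ValueError):
        return False


def run_checks(env: dict) -> dict:
    """Evaluate every check. FAISS_INDEX_PATH is a hypothetical variable name."""
    return {
        "calibration": calibration_loadable(env.get("CAMERA_CALIBRATION_PATH", "")),
        "db": db_reachable(env.get("DB_URL", "")),
        "faiss_index": index_mmapable(env.get("FAISS_INDEX_PATH", "")),
    }


def main(env=None) -> int:
    """Return a Docker HEALTHCHECK exit status: 0 = healthy, 1 = unhealthy."""
    results = run_checks(dict(os.environ) if env is None else env)
    return 0 if all(results.values()) else 1
```

A module built this way would presumably finish with `sys.exit(main())` under `if __name__ == "__main__":`. Docker treats exit status 0 as healthy and 1 as unhealthy.
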
### `mock-suite-sat-service` (Tier-1 e2e-test fixture; ADR-007 reversed 2026-05-09 — fixture only, not a component)

This image is an e2e-test fixture only. It implements the planned D-PROJ-2 ingest contract (`POST /api/satellite/tiles/ingest`) so that upload integration tests can run before the real endpoint ships service-side. Production never reaches it: the architectural counterparty for upload is the real `satellite-provider`. Download integration tests target the real `satellite-provider` directly (its GET surface is already implemented), not this fixture. Source lives under `tests/fixtures/mock-suite-sat-service/`, NOT `src/components/`.

| Property | Value |
|----------|-------|
| Base image | `mcr.microsoft.com/dotnet/aspnet:8.0-alpine` (matches the parent suite's stack) |
| Build image | `mcr.microsoft.com/dotnet/sdk:8.0-alpine` |
| Stages | `restore` → `build` → `publish` → `runtime` |
| User | `mock` (non-root) |
| Health check | HTTP `GET /healthz` (returns 200 if the service is listening and the storage backend is mounted). 10 s interval. |
| Exposed ports | `5100/tcp` (matches `satellite-provider`'s port so the same client config works) |
| Key build args | `MOCK_FAILURE_PROFILE` (default `none`; used by NFT-SEC-01 to inject latency, 5xx errors, or partial responses; the compose files set it as a runtime environment variable) |
| Notes | The mock is a release artifact (the operator-orchestrator tarball includes its compose file). When the real `satellite-provider` D-PROJ-2 endpoint ships, the mock is retired. |

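Under Compose, `depends_on: { condition: service_healthy }` waits on `/healthz` automatically. A harness that starts the mock outside Compose needs some other way to wait for that signal. The following is a hypothetical polling helper for that case. It is a sketch and not part of the fixture's code.

```python
import time
import urllib.error  # HTTPError/URLError live here; both are OSError subclasses
import urllib.request

# Talk to the service directly, ignoring any HTTP(S)_PROXY settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_healthz(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll GET {base_url}/healthz until it answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _OPENER.open(f"{base_url}/healthz", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and non-2xx replies (urllib raises HTTPError,
            # a subclass of URLError and therefore of OSError) all mean "not healthy yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_healthz("http://localhost:5100")` returns `True` once the mock reports healthy. It returns `False` if the mock does not report healthy within 30 s.
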
### `operator-orchestrator` (Operator workstation Tile Manager + pre-flight UI, C11 + C12)

| Property | Value |
|----------|-------|
| Base image | `python:3.10-slim` |
| Build image | `python:3.10-slim` (no native deps; pure Python plus `httpx` for both download and upload, `psycopg` for read/write of the C6 mirror, `cryptography` for upload signing) |
| Stages | `python-deps` → `runtime` |
| User | `operator` (non-root) |
| Health check | `python -m operator_orchestrator.healthcheck` (validates that `satellite-provider` is reachable). 30 s interval. |
| Exposed ports | `8080/tcp` (operator pre-flight UI, C12); no inbound port for the C11 Tile Manager, which is a one-shot CLI tool covering both download and upload |
| Key build args | `INCLUDE_PRE_FLIGHT_UI=true` (default; can be turned off for headless CLI-only deployments) |
| Notes | **C11 Tile Manager (both `TileDownloader` and `TileUploader`) is in this image, NEVER in `gps-denied-companion-tier1`** (ADR-004 process-level isolation). The airborne deployment binary on Tier-2 does not contain C11 either. |

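A static check over the companion's Python sources can stop the ADR-004 boundary from regressing. The following is a minimal sketch using the standard `ast` module. `c11_tile_manager` is a hypothetical placeholder for wherever C11 actually lives. The check only catches static `import` statements, so dynamic `importlib` imports would slip through.

```python
import ast
from pathlib import Path

# Hypothetical top-level package name for C11; substitute the real one.
FORBIDDEN_PACKAGE = "c11_tile_manager"


def forbidden_imports(src_root: Path, forbidden: str = FORBIDDEN_PACKAGE) -> list[str]:
    """Return "relative/path.py:line" for every static import of `forbidden` or its submodules."""
    hits = []
    for path in sorted(src_root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]
            else:
                # Not an import, or a relative import (which stays inside the importing package).
                continue
            if any(n == forbidden or n.startswith(forbidden + ".") for n in names):
                hits.append(f"{path.relative_to(src_root).as_posix()}:{node.lineno}")
    return hits
```

In CI, run this against the companion's source root and fail the build on any non-empty result. The prefix test matches `c11_tile_manager.uploader`. It does not match an unrelated name such as `c11_tile_manager_extras`.
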
## Docker Compose — Local Development

```yaml
# docker-compose.yml
services:
  companion:
    build:
      context: .
      dockerfile: docker/companion-tier1.Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
        BUILD_SALAD: "OFF"
    image: gps-denied/companion-tier1:dev
    environment:
      - DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - CAMERA_CALIBRATION_PATH=/fixtures/calibration/adti26.json
      - LOG_LEVEL=DEBUG
      - GPS_DENIED_FC_PROFILE=ardupilot_plane
    volumes:
      - ./tests/fixtures:/fixtures:ro
      - tile-cache:/var/lib/gps-denied/tiles
      - fdr:/var/lib/gps-denied/fdr
    depends_on:
      db: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "python", "-m", "gps_denied.healthcheck"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks: [ gps-denied-net ]

  mock-sat:
    build:
      # The fixture's source lives under tests/fixtures/, not at the repository root.
      context: ./tests/fixtures/mock-suite-sat-service
      dockerfile: Dockerfile
    image: gps-denied/mock-suite-sat-service:dev
    environment:
      - ASPNETCORE_URLS=http://+:5100
      - MOCK_FAILURE_PROFILE=none
    volumes:
      - mock-sat-tiles:/srv/tiles
    healthcheck:
      test: ["CMD", "wget", "-q", "-O-", "http://localhost:5100/healthz"]
      interval: 10s
    networks: [ gps-denied-net ]

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=gps_denied
      - POSTGRES_USER=gps_denied
      - POSTGRES_PASSWORD=dev
    volumes:
      - db-data:/var/lib/postgresql/data
      - ./docker/db-init:/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "gps_denied"]
      interval: 5s
    networks: [ gps-denied-net ]

  operator-orchestrator:
    build:
      context: .
      dockerfile: docker/operator-orchestrator.Dockerfile
    image: gps-denied/operator-orchestrator:dev
    environment:
      - SATELLITE_PROVIDER_URL=http://mock-sat:5100
      - COMPANION_DB_URL=postgresql://gps_denied:dev@db:5432/gps_denied
    ports:
      - "8080:8080"
    depends_on:
      mock-sat: { condition: service_healthy }
      db: { condition: service_healthy }  # COMPANION_DB_URL points at this service
    networks: [ gps-denied-net ]

volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net:
```

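`depends_on` with `condition: service_healthy` controls start-up order. A service starts only after every service it lists reports healthy. The following sketch models that ordering with Python's `graphlib` to show which services could start together. It illustrates the semantics; it is not how Compose is implemented. It is also coarser than Compose, which starts each service as soon as its own dependencies are healthy.

```python
from graphlib import TopologicalSorter


def startup_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group services into start-up waves; a service appears only after all its dependencies.

    `depends_on` maps a service to the services it lists under `depends_on:`.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every service whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the `companion` entry, `startup_waves({"companion": {"db", "mock-sat"}})` gives `[["db", "mock-sat"], ["companion"]]`. So `db` and `mock-sat` start together, and `companion` waits for both health checks.
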
## Docker Compose — Tier-1 Integration & Blackbox Tests

```yaml
# docker-compose.test.yml
services:
  companion:
    extends:
      file: docker-compose.yml
      service: companion
    environment:
      - LOG_LEVEL=INFO
      - GPS_DENIED_REPLAY_FIXTURE=/fixtures/flight_derkachi
      - GPS_DENIED_TIER=1

  mock-sat:
    extends:
      file: docker-compose.yml
      service: mock-sat
    volumes:
      - ./tests/fixtures/tiles_corpus:/srv/tiles:ro

  db:
    extends:
      file: docker-compose.yml
      service: db
    volumes:
      - ./tests/fixtures/seed-db.sql:/docker-entrypoint-initdb.d/01_seed.sql:ro

  e2e-runner:
    build:
      context: ./e2e
      dockerfile: Dockerfile
    image: gps-denied/e2e-runner:dev
    depends_on:
      companion: { condition: service_healthy }
      mock-sat: { condition: service_healthy }
      db: { condition: service_healthy }
    environment:
      - PYTEST_ARGS=--csv=/results/report.csv -v
    volumes:
      - ./e2e/results:/results
    networks: [ gps-denied-net ]  # same network as the services under test

# `extends` does not import the base file's top-level volumes and networks; declare them here.
volumes:
  tile-cache:
  fdr:
  db-data:
  mock-sat-tiles:

networks:
  gps-denied-net:
```

Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build`. With `--exit-code-from e2e-runner`, the command exits with the test runner's status (the flag implies `--abort-on-container-exit`), so CI can gate on it directly.

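When a service uses `extends`, Compose merges `environment` entries by variable name, and the extending file's values win. Volumes are merged the same way, keyed by container mount path. The following is a minimal sketch of the environment rule for list-style `KEY=VALUE` entries. It is an illustration, not Compose's code.

```python
from typing import Dict, List, Optional


def merge_environment(base: List[str], override: List[str]) -> Dict[str, Optional[str]]:
    """Merge list-style environment entries; entries in `override` win on duplicate names.

    A bare `NAME` (no "=") is kept as None, meaning "take the value from the host shell".
    Only the first "=" splits name from value, so values may themselves contain "=".
    """
    merged: Dict[str, Optional[str]] = {}
    for entry in [*base, *override]:
        name, sep, value = entry.partition("=")
        merged[name] = value if sep else None
    return merged
```

Applied to `companion`, the merge gives the test service three things:

- It keeps `DB_URL`, `SATELLITE_PROVIDER_URL`, `CAMERA_CALIBRATION_PATH`, and `GPS_DENIED_FC_PROFILE` from `docker-compose.yml`.
- It replaces `LOG_LEVEL=DEBUG` with `INFO`.
- It adds the replay and tier variables.

By the mount-path rule, the read-only `tiles_corpus` bind mount replaces the `mock-sat-tiles` volume at `/srv/tiles`.
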
## Tier-2 — Jetson runtime (NO Docker)

The Tier-2 deployment is a **JetPack 6.2 system image**, not a container. Its assembly is documented in `deployment_procedures.md` § Production Deployment. Key constraints driving the no-Docker decision (architecture.md § 3, D-C7-9 + D-C10-6):

1. **TensorRT INT8 calibration caches**: most reliable when the SM / JetPack / TensorRT triple the cache was built against matches the host exactly; the container-host abstraction is a known source of drift.
2. **`jetson-stats` thermal telemetry**: needs root and sysfs access; runs cleanest on bare metal.
3. **AC-NEW-1 cold-start budget (30 s p95)**: container start adds 1–2 s of overhead that the budget cannot afford.
4. **AC-NEW-3 FDR storage (≤ 64 GB)**: the FDR ring is mounted directly on the host's NVM; a container layer would either bind-mount it (no benefit) or copy it (which defeats the storage guarantee).

Tier-2 CI runs the same deployment binary directly on the self-hosted Jetson runner, with no container shim.

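The first constraint is, in effect, an exact-match check. The following is a hypothetical pre-flight sketch. It assumes each calibration cache carries a record of the SM architecture, JetPack release, and TensorRT version it was built against. This project may store that information differently, or not at all.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class BuildTriple:
    """SM architecture, JetPack release, and TensorRT version an artifact was built for."""
    sm: str
    jetpack: str
    tensorrt: str


def triple_mismatches(cache: BuildTriple, host: BuildTriple) -> List[str]:
    """Describe every field where the cache's build triple differs from the host's."""
    return [
        f"{f.name}: cache={getattr(cache, f.name)!r} host={getattr(host, f.name)!r}"
        for f in fields(BuildTriple)
        if getattr(cache, f.name) != getattr(host, f.name)
    ]
```

An empty list means the triples agree. On any mismatch, a deployment could refuse to load the cache, or regenerate it, rather than risk drift. The example values in the sketch's tests (for instance `"6.2"` for JetPack) are illustrative only.
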
## Image Tagging Strategy

| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build (deployment binary) | `<registry>/gps-denied/companion-tier1:deployment-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-a1b2c3d` |
| CI build (research binary) | `<registry>/gps-denied/companion-tier1:research-<git-sha>` | `ghcr.io/azaion/gps-denied/companion-tier1:research-a1b2c3d` |
| Mock sat service | `<registry>/gps-denied/mock-suite-sat-service:<git-sha>` | `ghcr.io/azaion/gps-denied/mock-suite-sat-service:a1b2c3d` |
| Operator orchestrator | `<registry>/gps-denied/operator-orchestrator:<git-sha>` | `ghcr.io/azaion/gps-denied/operator-orchestrator:a1b2c3d` |
| Release | `<registry>/gps-denied/<image>:<semver>` (companion images prefix the binary track, e.g. `deployment-<semver>`) | `ghcr.io/azaion/gps-denied/companion-tier1:deployment-1.2.0` |
| Local dev | `gps-denied/<image>:dev` | `gps-denied/companion-tier1:dev` |
| JetPack image (Tier-2) | `gps-denied-jetpack-<semver>-<sha>.img` | `gps-denied-jetpack-1.2.0-a1b2c3d.img` (file artifact, not a container tag) |

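One helper can generate the tag formats in the table, so CI jobs do not assemble them by hand. The following is a minimal sketch; the function names are hypothetical. `ghcr.io/azaion` appears only in the usage example because it is the table's example registry.

```python
from typing import Optional

NAMESPACE = "gps-denied"
TRACKS = ("deployment", "research")  # ADR-002 binary tracks of companion-tier1


def image_ref(image: str, version: str, registry: Optional[str] = None, track: Optional[str] = None) -> str:
    """Build `[<registry>/]gps-denied/<image>:[<track>-]<version>`.

    `version` is a short git SHA for CI builds, a semver for releases, or "dev" locally.
    `track` is only meaningful for the companion-tier1 image.
    """
    if track is not None and track not in TRACKS:
        raise ValueError(f"unknown binary track: {track!r}")
    tag = f"{track}-{version}" if track else version
    repo = f"{NAMESPACE}/{image}"
    return f"{registry}/{repo}:{tag}" if registry else f"{repo}:{tag}"


def jetpack_artifact(semver: str, sha: str) -> str:
    """File name of the Tier-2 JetPack system image (a file artifact, not a container tag)."""
    return f"gps-denied-jetpack-{semver}-{sha}.img"
```

For example, `image_ref("companion-tier1", "a1b2c3d", registry="ghcr.io/azaion", track="deployment")` reproduces the first example row. The examples use a 7-character short SHA, such as the one printed by `git rev-parse --short=7 HEAD`.
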
## SBOM and binary track

CI emits both Tier-1 binary tracks on every PR (ADR-002). After build, an SBOM diff step asserts:

- The deployment-binary SBOM **must NOT** include `vins_mono`, `salad`, or any other research-only library.
- The research-binary SBOM **must** include every strategy listed in the architecture.

A failing SBOM diff fails the PR. SBOM artifacts are attached to the release; they are NOT shipped on the deployed Jetson image (they live only in the release artifacts directory).

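Both SBOM assertions reduce to set operations over component names. The following sketch assumes the SBOMs are CycloneDX JSON: a top-level `components` array whose entries may nest further `components`. Matching on lower-cased `name` is a simplification; a production check might key on package URLs instead. `RESEARCH_ONLY` lists only the two libraries this section names.

```python
import json
from typing import Iterable, List, Set

# Research-only libraries named in this document; the full list is project-specific.
RESEARCH_ONLY = {"vins_mono", "salad"}


def load_sbom(path: str) -> dict:
    """Read a CycloneDX JSON SBOM from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def component_names(sbom: dict) -> Set[str]:
    """Lower-cased names of all components, including nested `components` arrays."""
    names: Set[str] = set()
    stack = list(sbom.get("components", []))
    while stack:
        comp = stack.pop()
        names.add(comp["name"].lower())
        stack.extend(comp.get("components", []))
    return names


def sbom_violations(deployment: dict, research: dict, strategies: Iterable[str]) -> List[str]:
    """Return one message per failed assertion; an empty list means the diff passes."""
    deployed = component_names(deployment)
    researched = component_names(research)
    errors = [
        f"deployment SBOM contains research-only component {name!r}"
        for name in sorted(RESEARCH_ONLY & deployed)
    ]
    errors += [
        f"research SBOM is missing strategy {name!r}"
        for name in sorted({s.lower() for s in strategies} - researched)
    ]
    return errors
```

CI would call `sbom_violations(load_sbom(...), load_sbom(...), strategies)` on the two files and fail the PR if the result is non-empty.
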
## .dockerignore

```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
**/.venv
**/venv
**/.pytest_cache
**/.mypy_cache
*.md
.env*
docker-compose*.yml
tests/fixtures/large_replays/
```

The `tests/fixtures/large_replays/` exclusion is critical. That directory holds the Derkachi flight footage (multi-GB), which is mounted into the test containers via `volumes:` rather than baked into images.