# Phase 4 — Configuration & Infrastructure Review

**Review date**: 2026-05-19
**Scope**: Container build files, docker-compose topology, env templates, committed-secret hygiene, network policy, gitignore.
**Files reviewed**:

- `docker/companion-tier1.Dockerfile`
- `docker/operator-orchestrator.Dockerfile`
- `docker/mock-suite-sat-service.Dockerfile`
- `e2e/runner/Dockerfile`
- `e2e/fixtures/mock-suite-sat/Dockerfile`
- `e2e/fixtures/tile-cache-builder/Dockerfile`
- `e2e/docker/docker-compose.test.yml`
- `e2e/docker/docker-compose.tier2-bridge.yml`
- `e2e/docker/secrets/{README.md,mavlink_passkey}`
- `e2e/fixtures/secrets/{README.md,mavlink-test-passkey.txt}`
- `e2e/runner/requirements.txt`
- `.env.example`
- `.gitignore`
- `scripts/run-tests.sh`, `scripts/run-tests-jetson.sh`

## Summary

| Severity (this project) | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 3 |
| Low | 6 |
| Informational / positive observations | 8 |

Counts follow findings F14–F22 and observations P5–P12, with F16 counted once at its higher (Medium) rating.

The closed-system threat model (no inbound listeners, no airborne network egress; see Phase 3 § A04) caps the blast radius of any container-hardening gap. Every Medium-severity finding here would be rated High in a multi-tenant or internet-exposed deployment; they stay at Medium because the airborne / operator-workstation surface keeps an in-container attacker contained.

## Findings
### F14 — Production Dockerfiles run as `root` (no `USER` directive)

**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile` (entrypoint at line 55), `docker/operator-orchestrator.Dockerfile` (entrypoint at line 22).

Neither production Dockerfile drops privileges before `ENTRYPOINT`. The Python runtime executes as UID 0 inside the container.

**Project-specific context**: the SUT has no inbound network listener, so an external attacker has no direct path to in-container code execution. The risk is post-compromise: any RCE through a dependency vulnerability (for example, a future equivalent of Phase 1's F1–F12) executes as root in the container, with write access to the mounted volumes (`/var/azaion/fdr`, `/var/azaion/tile-cache:ro`). The read-only tile-cache mount limits damage there, but `fdr-output` is read-write.

**Evidence the pattern is known to the project**: `e2e/fixtures/tile-cache-builder/Dockerfile:43-46` already implements the correct pattern:

```Dockerfile
RUN useradd -u 10001 -m -d /home/builder builder \
    && mkdir -p /input /output \
    && chown -R builder:builder /opt/builder /input /output
USER 10001:10001
```

**Remediation**: replicate the same `useradd` + `chown` + `USER` block in both production Dockerfiles. Choose a stable UID (e.g., 10100 for the companion, 10200 for the orchestrator) and chown `/opt/gps-denied`, `/opt/venv`, `/var/azaion/fdr` accordingly.
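
A CI guard along the following lines could keep this finding from regressing. It is a minimal sketch, not project code: the function names are hypothetical, it assumes the base image defaults to root (as the official `ubuntu` and `python` images do), and it only recognises instructions that start a line, so it ignores line continuations and build arguments.

```python
from typing import Optional


def final_stage_user(dockerfile_text: str) -> Optional[str]:
    """Return the USER in effect at the end of the last build stage, if any."""
    user = None
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            # A new stage does not inherit USER from the previous stage.
            user = None
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """True when the final stage never switches away from root."""
    user = final_stage_user(dockerfile_text)
    if user is None:
        return True  # assumption: the base image itself runs as root
    name = user.split(":", 1)[0]
    return name in ("root", "0")
```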

---

### F15 — Production images install `[dev]` extras

**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile:27` (`pip install --no-cache-dir -e ".[dev]"`), `docker/operator-orchestrator.Dockerfile:14` (`pip install --no-cache-dir -e ".[dev]"`).

The production runtime image ships with the `[dev]` extras: `pytest`, `pytest-asyncio`, `ruff`, `mypy`, `black`, `pytest-cov`, etc. This (a) roughly doubles the image size, (b) widens the attack surface inside the container (each test-only dependency is a CVE candidate, and dev tools such as `pytest` parse user-supplied files), and (c) complicates the dependency-lockfile audit.

**Project-specific context**: same closed-system bound as F14; an attacker needs in-container execution first. Once there, however, these packages substantially increase the number of importable Python modules available to the attacker.

**Remediation**: define a runtime-only extras group in `pyproject.toml` (or rely on the base install with no extras) and use `pip install --no-cache-dir -e ".[runtime]"` or just `pip install --no-cache-dir -e .` in the production Dockerfile. Keep `[dev]` for developer environments and the e2e-runner only.
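
To keep `[dev]` from creeping back into a production image, a small lint could scan the Dockerfiles for it. The sketch below is illustrative only: `dev_extras_installs` is a hypothetical helper, and it assumes the shell-form `pip install` lines quoted in the locations above rather than exec-form `RUN [...]` instructions.

```python
import re
from typing import List, Tuple

# Matches the bracketed extras list in a requirement such as ".[dev]".
_EXTRAS = re.compile(r"\[([^\]]+)\]")


def dev_extras_installs(dockerfile_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each pip install that requests 'dev'."""
    hits = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        if "pip install" not in line:
            continue
        for group in _EXTRAS.findall(line):
            extras = {name.strip() for name in group.split(",")}
            if "dev" in extras:
                hits.append((lineno, line.strip()))
                break  # report each line once
    return hits
```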

---

### F16 — Test-stack base images use moving / `latest` tags

**Severity (this project)**: Medium (for `ardupilot/mavproxy:latest`), Low (for `ardupilot/ardupilot-sitl:plane-stable`)
**Locations**:

- `e2e/docker/docker-compose.test.yml:41`: `ardupilot/ardupilot-sitl:plane-stable`
- `e2e/docker/docker-compose.test.yml:67`: `ardupilot/mavproxy:latest`
- `e2e/docker/docker-compose.test.yml:49`: `inavflight/inav-sitl:9.0.0` (already pinned; good)

**Project-specific context**: the test stack runs on the `e2e-net` network, which is declared `internal: true` (egress blocked), so a hostile image's network capability is neutered at the Docker level. The remaining risk is a build-reproducibility regression: a new release published under the same tag could break or silently change SITL behaviour between CI runs.

**Remediation**: pin both to explicit versions (`mavproxy:1.8.55` style) or to a SHA256 digest (`mavproxy@sha256:...`), matching the pattern at `e2e/fixtures/tile-cache-builder/Dockerfile:20`, which uses a full SHA256 digest.
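
The distinction drawn here between digest pins, version tags, and moving tags can be approximated mechanically. The following is a rough sketch under stated assumptions: `classify_image_ref` and its three labels are hypothetical, and treating only tags that begin with a three-part version number as pinned is a heuristic of this example, not a project rule.

```python
import re

# Heuristic: a tag that starts with MAJOR.MINOR.PATCH is treated as a version pin.
_FULL_VERSION = re.compile(r"\d+\.\d+\.\d+")


def classify_image_ref(ref: str) -> str:
    """Classify an image reference as 'digest', 'version', or 'moving'."""
    if "@sha256:" in ref:
        return "digest"
    # Only the part after the last '/' can carry a tag; earlier colons
    # belong to a registry host:port.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "moving"  # no tag means the implicit 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return "version" if _FULL_VERSION.match(tag) else "moving"
```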

---

### F17 — Production Dockerfile base images use floating tags

**Severity (this project)**: Low
**Locations**: `docker/companion-tier1.Dockerfile:8,38` (`ubuntu:22.04`), `docker/operator-orchestrator.Dockerfile:4` (`python:3.10-slim`).

These tags receive security-patch updates without explicit opt-in. That is desirable for OS patching, but it conflicts with bit-reproducible builds and the supply-chain audit goal.

**Project-specific context**: Ubuntu LTS and `python:slim` are reasonable defaults; the failure mode is "two builds of the same commit hash produce different base layers", which complicates incident response (which `libc6` did the failing build ship?).

**Remediation**: pin to a SHA256 digest at release-tag time and bump it explicitly on dependency-refresh cycles. Same pattern as `tile-cache-builder/Dockerfile:20`.

---

### F18 — Orphan / stale `docker/mock-suite-sat-service.Dockerfile`

**Severity (this project)**: Low
**Location**: `docker/mock-suite-sat-service.Dockerfile`.

This file references `tests/fixtures/mock-suite-sat-service/`, a path that does not exist (the real fixture lives at `e2e/fixtures/mock-suite-sat/`). It declares port 5100 and health path `/healthz`, whereas the working build (`e2e/docker/docker-compose.test.yml:54`, `build: ../fixtures/mock-suite-sat`) uses port 8080 and `/mock/health`. No active compose target references the `docker/`-side file.

**Project-specific context**: not a runtime vulnerability; orphan artifacts are dead code in the build system. The risk is operator confusion ("which Dockerfile does the mock build from?") and accidental future use of the broken file.

**Remediation**: delete `docker/mock-suite-sat-service.Dockerfile`, or fix it to be a thin wrapper around `e2e/fixtures/mock-suite-sat/Dockerfile`. (Project pattern: `docker/` should hold production-only Dockerfiles; test fixtures should live under `e2e/`.)

---

### F19 — Unused `curl` binary in production runtime image

**Severity (this project)**: Low
**Location**: `docker/operator-orchestrator.Dockerfile:9` (`curl` in the runtime apt-get install).

The healthcheck uses `python3 -m gps_denied_onboard.healthcheck` (line 20), not `curl`. `curl` is a classic post-compromise tool (data exfiltration, second-stage payload fetch) and provides no runtime value.

**Remediation**: remove `curl` from the runtime apt-get install line.

---

### F20 — Runner image `opencv-python>=4.12.0` has no upper bound

**Severity (this project)**: Low
**Location**: `e2e/runner/requirements.txt:25`.

The header comment at lines 4–6 correctly notes that the runner does not depend on `gtsam`, so the D-CROSS-CVE-1 `numpy<2` ABI block does not apply. Even so, the requirement has no upper bound: a future opencv 5.x release could ship a behaviour break that lands automatically on the next CI rebuild.

**Remediation**: add an upper bound consistent with the rest of `requirements.txt` style: `opencv-python>=4.12.0,<5.0`.
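
A quick scan for other unbounded requirements could look like the sketch below. It is deliberately simple and makes assumptions: both helper names are hypothetical, it does no full PEP 440 parsing, and it does not handle URL requirements. A clause counts as an upper bound when it starts with `<`, `==`, or `~=`.

```python
import re
from typing import List


def has_upper_bound(requirement: str) -> bool:
    """True if a requirement line caps the version it accepts."""
    # Drop trailing comments and environment markers.
    line = requirement.split("#", 1)[0].split(";", 1)[0].strip()
    match = re.search(r"[<>=!~]", line)
    if match is None:
        return False  # bare package name: any version is accepted
    clauses = [c.strip() for c in line[match.start():].split(",")]
    return any(c.startswith(("<", "==", "~=")) for c in clauses)


def unbounded_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that have no upper bound."""
    unbounded = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        if not has_upper_bound(line):
            unbounded.append(line)
    return unbounded
```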

---

### F21 — Stale path in `.env.example`

**Severity (this project)**: Low
**Location**: `.env.example:29`, which sets `MAVLINK_SIGNING_KEY=tests/fixtures/mavlink_signing/dev_key`.

That path predates the secrets reorganization that introduced `e2e/fixtures/secrets/mavlink-test-passkey.txt` and `e2e/docker/secrets/mavlink_passkey`, so it is confusing for a new developer.

**Remediation**: update to the current path conventions. Also note that the env var name itself (`MAVLINK_SIGNING_KEY`) is inconsistent with the env var that docker-compose actually sets (`MAVLINK_SIGNING_PASSKEY_FILE`); align the two.

---

### F22 — Production WORKDIR is not chowned

**Severity (this project)**: Low (depends on whether F14 is fixed first)
**Locations**: `docker/companion-tier1.Dockerfile:50` (`WORKDIR /opt/gps-denied`), `docker/operator-orchestrator.Dockerfile:12`.

If and when F14's non-root `USER` directive is added, the runtime user will not own `/opt/gps-denied` and will fail to write any artefact there (e.g., the tmpfs FDR pre-buffer). Today the problem is dormant because the container runs as root. It is filed as a remediation item coupled to F14.

**Remediation**: when adding the `USER` directive, also add `chown -R <uid>:<gid> /opt/gps-denied /opt/venv /var/azaion`.

---

## Positive Observations
### P5 — Test network is enforced as `internal: true`

`e2e/docker/docker-compose.test.yml:117-124` declares `e2e-net.internal: true`. The SUT, mock, runner, and SITLs can talk to each other, but none can reach the public internet. The e2e-runner verifies this at runtime by attempting a TCP connect to `1.1.1.1:443` and expecting it to fail (AC-5 of `NFT-SEC-02`). This is the docker-compose-layer counterpart to the production iptables / DNS blackhole (RESTRICT-OPS-1 / NFT-SEC-05).
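
A probe of this kind can be sketched as follows. This illustrates the idea and is not the runner's actual test: the function name and the default timeout are assumptions, and only the `1.1.1.1:443` target comes from the description above.

```python
import socket


def egress_blocked(host: str, port: int, timeout: float = 3.0) -> bool:
    """True when a TCP connection to host:port cannot be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: egress is open
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return True


# Hypothetical use inside the runner container:
#     assert egress_blocked("1.1.1.1", 443)
```

Treating any `OSError` as "blocked" keeps the check simple, at the cost of not distinguishing a deliberate network block from an unrelated connectivity failure.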
### P6 — Committed test secrets are demonstrably synthetic

Both committed secret files (`e2e/docker/secrets/mavlink_passkey` and `e2e/fixtures/secrets/mavlink-test-passkey.txt`) contain the same canonical pattern `0123456789abcdef...` repeated, and both README files explicitly state "TEST ONLY — not for production use" with the production-side wiring documented. The `e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match` assertion keeps the two files in lock-step (verified separately during the SUT review). No real secret is in version control.

### P7 — `e2e/runner/Dockerfile` follows the public-boundary contract

The runner image:

- Pins `python:3.12-slim-bookworm` (line 11), an explicit tag.
- Uses `tini` as PID 1 (zombie reaping under `pytest --forked`).
- Does NOT install the SUT package and explicitly excludes `src/` from `PYTHONPATH` (line 45 sets `ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner` only).
- Sets `PYTHONDONTWRITEBYTECODE=1`, `PYTHONUNBUFFERED=1`, `PIP_NO_CACHE_DIR=1`, `PIP_DISABLE_PIP_VERSION_CHECK=1`.

### P8 — `e2e/fixtures/tile-cache-builder/Dockerfile` is gold-standard

It pins Python to a SHA256 digest (`python:3.10.14-slim-bookworm@sha256:...`), pins every Python dependency with version bounds, drops to a numbered non-root user (`USER 10001:10001`), explicitly chowns the workdir, and sets `PYTHONHASHSEED=0` for reproducibility (line 24). This is the pattern the rest of the project should match.

### P9 — `.gitignore` covers secrets and build artefacts comprehensively

`*.key`, `.env`, and `.env.local` are ignored. The single explicit allow (`!tests/fixtures/mavlink_signing/dev_key`) is documented in the README. Build outputs (`.engine`, `.calib`, `.index`, `.faiss`, `.onnx`, `.trt`) and CMake artefacts (`build/`, `_skbuild/`, `compile_commands.json`) are excluded.

### P10 — Docker secrets are used for the test SUT (not env vars)

`e2e/docker/docker-compose.test.yml:30-32` mounts the test mavlink passkey via a Docker `secrets:` declaration (`mavlink_passkey` → `/run/secrets/mavlink_passkey`), not via the `environment:` block. The SUT reads from `MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey`, so the passkey content never enters the container environment. Production mirrors the same wiring, with the file mounted from a real secret store. Correct pattern.
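
The file-based wiring described above can be illustrated with a short sketch. It is not the SUT's implementation: `load_passkey` is a hypothetical helper, and reading the file as ASCII text with surrounding whitespace stripped is an assumption of this example. Only the `MAVLINK_SIGNING_PASSKEY_FILE` variable name comes from the compose file.

```python
import os
from pathlib import Path
from typing import Mapping


def load_passkey(env: Mapping[str, str] = os.environ) -> str:
    """Read the passkey from the file named by MAVLINK_SIGNING_PASSKEY_FILE."""
    path = env.get("MAVLINK_SIGNING_PASSKEY_FILE")
    if not path:
        raise RuntimeError("MAVLINK_SIGNING_PASSKEY_FILE is not set")
    # Only the path travels through the environment; the secret itself
    # stays in the mounted file.
    return Path(path).read_text(encoding="ascii").strip()
```

Passing the environment as a parameter keeps the helper easy to exercise in unit tests without touching the real process environment.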
### P11 — Healthchecks defined on every service

`gps-denied-onboard` (line 35) and `mock-suite-sat-service` (line 61) each declare a healthcheck, and the production Dockerfiles themselves declare `HEALTHCHECK`. `depends_on` uses `condition: service_healthy` for the SUT and mock (lines 106-109).

### P12 — `internal: true` AND no `ports:` block

No production service in `docker-compose.test.yml` publishes a port to the host. The only host-reachable surface is the `e2e-results` bind mount, a read-only artefact dropbox (line 142). This is defense in depth on top of `internal: true`.

## Cross-Reference Index

| Source | Phase 4 § | Note |
|---|---|---|
| `_docs/02_document/deployment/containerization.md` | F14, F15, F17, F22 | Documents the project's container conventions |
| `_docs/02_document/deployment/environment_strategy.md` | F16, F21 | Documents the env-var contract |
| `_docs/02_document/tests/environment.md` § Communication with SUT | P10, F21 | Production passkey wiring |
| `_docs/05_security/dependency_scan.md` | F15, F20 | Phase 1 deps audit (the dev extras shipping to production are part of Phase 1's surface) |
| `_docs/02_document/tests/security-tests.md` § NFT-SEC-02 | P5 | The harness-side enforcement of the `internal: true` network |
| `e2e/fixtures/tile-cache-builder/Dockerfile` | F14, F17 | Project's existing reference implementation of the pattern |

## Self-Verification

- [x] All Dockerfiles in the repo scanned: 6 files (`docker/*.Dockerfile` × 3, `e2e/runner/Dockerfile`, `e2e/fixtures/*/Dockerfile` × 2)
- [x] All docker-compose files scanned: 2 (`docker-compose.test.yml`, `docker-compose.tier2-bridge.yml`)
- [x] All committed secret files inspected; content verified as synthetic test data
- [x] `.gitignore` reviewed for secret-exclusion completeness
- [x] `.env.example` reviewed for accidentally-committed credentials
- [x] Findings cite file:line evidence
- [x] Project-specific severity calibration applied (closed-system threat model recognized)