Oleksandr Bezdieniezhnykh bf13549b32
[autodev] Update configuration and documentation for cycle-1
- Enhanced `.env.example` with detailed CMake build flags and replay-mode strategy flags for development and CI environments.
- Updated `.gitignore` to include a new deploy rollback bookmark.
- Revised `_docs/_autodev_state.md` to reflect the current task status and steps.
- Added new lessons to `_docs/LESSONS.md` regarding testing and architectural improvements.
- Documented changes in `_docs/02_document/deployment/ci_cd_pipeline.md` to reflect the relaxed OpenCV version pin.
- Updated test data documentation in `_docs/02_document/tests/test-data.md` to clarify fixture usage and paths.

This commit continues the cycle-1 documentation sync and addresses various configuration updates for improved clarity and functionality.
2026-05-20 08:05:35 +03:00

# Phase 4 — Configuration & Infrastructure Review
**Review date**: 2026-05-19
**Scope**: Container build files, docker-compose topology, env templates, committed-secret hygiene, network policy, gitignore.
**Files reviewed**:
- `docker/companion-tier1.Dockerfile`
- `docker/operator-orchestrator.Dockerfile`
- `docker/mock-suite-sat-service.Dockerfile`
- `e2e/runner/Dockerfile`
- `e2e/fixtures/mock-suite-sat/Dockerfile`
- `e2e/fixtures/tile-cache-builder/Dockerfile`
- `e2e/docker/docker-compose.test.yml`
- `e2e/docker/docker-compose.tier2-bridge.yml`
- `e2e/docker/secrets/{README.md,mavlink_passkey}`
- `e2e/fixtures/secrets/{README.md,mavlink-test-passkey.txt}`
- `e2e/runner/requirements.txt`
- `.env.example`
- `.gitignore`
- `scripts/run-tests.sh`, `scripts/run-tests-jetson.sh`
## Summary
| Severity (this project) | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 4 |
| Low | 5 |
| Informational / positive observations | 7 |
The closed-system threat model (no inbound listeners, no airborne network egress — see Phase 3 § A04) caps the blast radius of any container-hardening gap. The Medium-severity findings would all be raised to High in a multi-tenant or internet-exposed deployment; here they are Medium because the airborne / operator-workstation surface keeps an in-container attacker contained.
## Findings
### F14 — Production Dockerfiles run as `root` (no `USER` directive)
**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile` (entrypoint at line 55), `docker/operator-orchestrator.Dockerfile` (entrypoint at line 22).
Neither production Dockerfile drops privileges before `ENTRYPOINT`. The Python runtime executes as UID 0 inside the container.
**Project-specific context**: the SUT has no inbound network listener, so an external attacker has no direct path to in-container code execution. The risk is post-compromise: any RCE via a dependency vulnerability (e.g., a future equivalent of Phase 1's F1–F12) executes as root in the container, with write access to mounted volumes (`/var/azaion/fdr`, `/var/azaion/tile-cache:ro`). The read-only mount limits damage to the tile cache, but `fdr-output` is RW.
**Evidence the pattern is known to the project**: `e2e/fixtures/tile-cache-builder/Dockerfile:43-46` already implements the correct pattern:
```Dockerfile
RUN useradd -u 10001 -m -d /home/builder builder \
    && mkdir -p /input /output \
    && chown -R builder:builder /opt/builder /input /output
USER 10001:10001
```
**Remediation**: replicate the same `useradd` + `chown` + `USER` block in both production Dockerfiles. Choose a stable UID (e.g., 10100 for the companion, 10200 for the orchestrator) and chown `/opt/gps-denied`, `/opt/venv`, `/var/azaion/fdr` accordingly.
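
A minimal sketch of that block for the companion image, assuming it is placed after the application and virtualenv are installed and before `ENTRYPOINT`. The user and group name `companion` is hypothetical; UID 10100 and the three paths come from the remediation text above and should be checked against the real Dockerfile layout.

```Dockerfile
# Sketch only: "companion" is a hypothetical user/group name; 10100 is the
# example UID from the remediation above, not an existing project convention.
# The group is created explicitly so the numeric GID is guaranteed to match.
RUN groupadd -g 10100 companion \
    && useradd -u 10100 -g 10100 -m -d /home/companion companion \
    && mkdir -p /var/azaion/fdr \
    && chown -R companion:companion /opt/gps-denied /opt/venv /var/azaion/fdr
# Drop privileges by numeric UID:GID so the runtime does not depend on name lookup.
USER 10100:10100
```

Build-time ownership only applies to paths baked into the image. A host bind mount at `/var/azaion/fdr` keeps the host's ownership, so the mounted directory would also need to be writable by UID 10100.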
---
### F15 — Production images install `[dev]` extras
**Severity (this project)**: Medium
**Locations**: `docker/companion-tier1.Dockerfile:27` (`pip install --no-cache-dir -e ".[dev]"`), `docker/operator-orchestrator.Dockerfile:14` (`pip install --no-cache-dir -e ".[dev]"`).
The production runtime image ships with the `[dev]` extras: `pytest`, `pytest-asyncio`, `ruff`, `mypy`, `black`, `pytest-cov`, etc. This (a) ~doubles image size, (b) increases the attack surface inside the container (each test-only dep is a CVE candidate, and dev tools like `pytest` parse user-supplied files), and (c) muddies the dependency lockfile audit.
**Project-specific context**: same closed-system bound as F14 — an attacker needs in-container execution first. But these packages substantially increase the count of in-process Python modules under control of an attacker.
**Remediation**: define a runtime-only extras group in `pyproject.toml` (or rely on the base install with no extras) and use `pip install --no-cache-dir -e ".[runtime]"` or just `pip install --no-cache-dir -e .` in the production Dockerfile. Keep `[dev]` for developer environments and the e2e-runner only.
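
As a sketch of the second option, the production install line would drop the extras selector entirely. This assumes the base dependencies declared in `pyproject.toml` are sufficient at runtime, which has not been verified here.

```Dockerfile
# Production image: base install only, so the [dev] extras (pytest, ruff, mypy, ...)
# are not pulled in. Keeps the editable (-e) install used by the current Dockerfiles.
RUN pip install --no-cache-dir -e .
```

The `[runtime]` variant only makes sense once a matching extras group exists in `pyproject.toml`; until then the plain install above is the safer default.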
---
### F16 — Test-stack base images use moving / `latest` tags
**Severity (this project)**: Medium (for `mavproxy:latest`), Low (for `ardupilot-plane-sitl:plane-stable`)
**Locations**:
- `e2e/docker/docker-compose.test.yml:41` → `ardupilot/ardupilot-sitl:plane-stable`
- `e2e/docker/docker-compose.test.yml:67` → `ardupilot/mavproxy:latest`
- `e2e/docker/docker-compose.test.yml:49` → `inavflight/inav-sitl:9.0.0` (this one IS pinned, which is good)
**Project-specific context**: the test stack runs in `e2e-net.internal: true` (egress blocked), so a hostile image's network capability is neutered at the docker level. The remaining risk is a build-reproducibility regression: a new release published under the same moving tag could silently break or change SITL behaviour between CI runs.
**Remediation**: pin both to explicit versions (`mavproxy:1.8.55` style) or to SHA256 digest (`mavproxy@sha256:...`) — match the pattern at `e2e/fixtures/tile-cache-builder/Dockerfile:20` which uses a full SHA256 digest.
---
### F17 — Production Dockerfile base images use floating tags
**Severity (this project)**: Low
**Locations**: `docker/companion-tier1.Dockerfile:8,38` (`ubuntu:22.04`), `docker/operator-orchestrator.Dockerfile:4` (`python:3.10-slim`).
These tags receive security-patch updates without explicit opt-in. That is desirable for OS patching, but it conflicts with bit-reproducible builds and the supply-chain audit goal.
**Project-specific context**: Ubuntu LTS and `python:slim` are reasonable defaults; the failure mode is "two builds of the same commit hash produce different base layers", which complicates incident response (which `libc6` did the failing build ship?).
**Remediation**: pin to SHA256 digest at release-tag time; bump explicitly on dependency-refresh cycles. Same pattern as `tile-cache-builder/Dockerfile:20`.
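
A sketch of a digest-pinned base for the multi-stage companion build, assuming both stages keep sharing the same Ubuntu base. The `BASE_IMAGE` argument and the stage names are hypothetical, and `<digest>` is a placeholder for the value resolved at release-tag time.

```Dockerfile
# Declared before the first FROM so every stage can reference it.
# <digest> is a placeholder; resolve it at release time, for example with
#   docker buildx imagetools inspect ubuntu:22.04
ARG BASE_IMAGE=ubuntu:22.04@sha256:<digest>

# Hypothetical stage names; both stages resolve to identical base layers.
FROM ${BASE_IMAGE} AS builder
# ... build steps ...

FROM ${BASE_IMAGE} AS runtime
# ... runtime steps ...
```

Keeping the tag next to the digest documents which release the digest was taken from; when both are present, the image is resolved by digest.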
---
### F18 — Orphan / stale `docker/mock-suite-sat-service.Dockerfile`
**Severity (this project)**: Low
**Location**: `docker/mock-suite-sat-service.Dockerfile`.
This file references `tests/fixtures/mock-suite-sat-service/`, a path that does NOT exist (the real fixture lives at `e2e/fixtures/mock-suite-sat/`), and declares port 5100 with health path `/healthz`. The working build (`e2e/docker/docker-compose.test.yml:54`, `build: ../fixtures/mock-suite-sat`) uses port 8080 with path `/mock/health`. The `docker/`-side file is not referenced by any active compose target.
**Project-specific context**: not a runtime vulnerability — orphan artifacts are dead code in the build system. The risk is operator confusion ("which Dockerfile does the mock build from?") and accidental future use of the broken file.
**Remediation**: delete `docker/mock-suite-sat-service.Dockerfile`, OR fix it to be a thin wrapper around `e2e/fixtures/mock-suite-sat/Dockerfile`. (Project pattern: `docker/` should hold production-only Dockerfiles; test fixtures should live under `e2e/`.)
---
### F19 — Unused `curl` binary in production runtime image
**Severity (this project)**: Low
**Location**: `docker/operator-orchestrator.Dockerfile:9` (`curl` in the runtime apt-get install).
The healthcheck uses `python3 -m gps_denied_onboard.healthcheck` (line 20), not curl. `curl` is a classic post-compromise tool (data exfil, second-stage payload fetch) and provides no runtime value.
**Remediation**: remove `curl` from the runtime apt-get install line.
---
### F20 — Runner image `opencv-python>=4.12.0` has no upper bound
**Severity (this project)**: Low
**Location**: `e2e/runner/requirements.txt:25`.
While the docstring at lines 4–6 correctly notes that the runner does not depend on `gtsam` (so the D-CROSS-CVE-1 numpy<2 ABI block doesn't apply), the pin has no upper bound: a future opencv 5.x release could ship a behaviour break that lands automatically on the next CI rebuild.
**Remediation**: add an upper bound consistent with the rest of `requirements.txt` style: `opencv-python>=4.12.0,<5.0`.
---
### F21 — Stale path in `.env.example`
**Severity (this project)**: Low
**Location**: `.env.example:29` → `MAVLINK_SIGNING_KEY=tests/fixtures/mavlink_signing/dev_key`.
That path predates the secrets reorganization that landed `e2e/fixtures/secrets/mavlink-test-passkey.txt` + `e2e/docker/secrets/mavlink_passkey`. Confusing for a new developer.
**Remediation**: update to the current path conventions. Also note that the env var name itself (`MAVLINK_SIGNING_KEY`) is inconsistent with the production env var the docker-compose actually sets (`MAVLINK_SIGNING_PASSKEY_FILE`); align both.
---
### F22 — Production WORKDIR is not chowned
**Severity (this project)**: Low (depends on whether F14 is fixed first)
**Location**: `docker/companion-tier1.Dockerfile:50` (`WORKDIR /opt/gps-denied`), `docker/operator-orchestrator.Dockerfile:12`.
If/when F14's non-root `USER` directive is added, the runtime user will not own `/opt/gps-denied` and will fail to write any artefact there (e.g., the tmpfs FDR pre-buffer). Today this is dormant because the container runs as root. Filing as a coupled remediation item to F14.
**Remediation**: when adding the `USER` directive, also add `chown -R <uid>:<gid> /opt/gps-denied /opt/venv /var/azaion`.
---
## Positive Observations
### P5 — Test network is enforced as `internal: true`
`e2e/docker/docker-compose.test.yml:117-124` declares `e2e-net.internal: true`. The SUT, mock, runner, and SITLs can talk to each other but none can reach the public internet. The e2e-runner verifies this at runtime by attempting a TCP connect to `1.1.1.1:443` (AC-5 of `NFT-SEC-02`). This is the docker-compose-layer counterpart to the production iptables / DNS blackhole (RESTRICT-OPS-1 / NFT-SEC-05).
### P6 — Committed test secrets are demonstrably synthetic
Both committed secrets files (`e2e/docker/secrets/mavlink_passkey` and `e2e/fixtures/secrets/mavlink-test-passkey.txt`) contain the same canonical pattern `0123456789abcdef...` repeated, and both README files explicitly state "TEST ONLY — not for production use" with the production-side wiring documented. The `e2e/_unit_tests/test_directory_layout.py::test_passkey_files_match` assertion keeps the two files in lock-step (verified separately during the SUT review). No real secret is in version control.
### P7 — `e2e/runner/Dockerfile` follows the public-boundary contract
The runner image:
- Pins `python:3.12-slim-bookworm` (line 11) — explicit tag.
- Uses `tini` as PID 1 (zombie reaping under `pytest --forked`).
- Does NOT install the SUT package and explicitly excludes `src/` from `PYTHONPATH` (line 45 — `ENV PYTHONPATH=/opt/e2e-runner:/opt/e2e-runner/runner` only).
- Sets `PYTHONDONTWRITEBYTECODE=1`, `PYTHONUNBUFFERED=1`, `PIP_NO_CACHE_DIR=1`, `PIP_DISABLE_PIP_VERSION_CHECK=1`.
### P8 — `e2e/fixtures/tile-cache-builder/Dockerfile` is gold-standard
It pins Python to SHA256-digest (`python:3.10.14-slim-bookworm@sha256:...`), pins every Python dep with version bounds, drops to a numbered non-root user (`USER 10001:10001`), explicitly chowns the workdir, and sets `PYTHONHASHSEED=0` for reproducibility (line 24). This is the pattern the rest of the project should match.
### P9 — `.gitignore` covers secrets and build artefacts comprehensively
`*.key`, `.env`, `.env.local` are blocked. The single explicit allow (`!tests/fixtures/mavlink_signing/dev_key`) is documented in the README. Build outputs (`.engine`, `.calib`, `.index`, `.faiss`, `.onnx`, `.trt`) are excluded. CMake artefacts (`build/`, `_skbuild/`, `compile_commands.json`) are excluded.
### P10 — Docker secrets are used for the test SUT (not env vars)
`e2e/docker/docker-compose.test.yml:30-32` mounts the test mavlink passkey via a Docker `secrets:` declaration (`mavlink_passkey` → `/run/secrets/mavlink_passkey`), not via the `environment:` block. The SUT reads from `MAVLINK_SIGNING_PASSKEY_FILE=/run/secrets/mavlink_passkey`, so the passkey content never crosses the container env. Production mirrors the same wiring (with a file mounted from a real secret store). Correct pattern.
### P11 — Healthchecks defined on every service
`gps-denied-onboard` (line 35), `mock-suite-sat-service` (line 61), and the production Dockerfiles themselves all declare HEALTHCHECK. `depends_on` uses `condition: service_healthy` for the SUT and mock (lines 106-109).
### P12 — `internal: true` AND no `ports:` block
No production service in `docker-compose.test.yml` publishes a port to the host. The only host-reachable surface is via the `e2e-results` bind mount, which is a read-only artefact dropbox (line 142). Defense-in-depth on top of `internal: true`.
## Cross-Reference Index
| Source | Phase 4 § | Note |
|---|---|---|
| `_docs/02_document/deployment/containerization.md` | F14, F15, F17, F22 | Documents the project's container conventions |
| `_docs/02_document/deployment/environment_strategy.md` | F16, F21 | Documents the env-var contract |
| `_docs/02_document/tests/environment.md` § Communication with SUT | P10, F21 | Production passkey wiring |
| `_docs/05_security/dependency_scan.md` | F15, F20 | Phase 1 deps audit (the dev extras shipping to production are part of Phase 1's surface) |
| `_docs/02_document/tests/security-tests.md` § NFT-SEC-02 | P5 | The harness-side enforcement of the `internal: true` network |
| `e2e/fixtures/tile-cache-builder/Dockerfile` | F14, F17 | Project's existing reference implementation of the pattern |
## Self-Verification
- [x] All Dockerfiles in the repo scanned: 6 files (`docker/*.Dockerfile` × 3, `e2e/runner/Dockerfile`, `e2e/fixtures/*/Dockerfile` × 2)
- [x] All docker-compose files scanned: 2 (`docker-compose.test.yml`, `docker-compose.tier2-bridge.yml`)
- [x] All committed secret files inspected; content verified as synthetic test data
- [x] `.gitignore` reviewed for secret-exclusion completeness
- [x] `.env.example` reviewed for accidentally-committed credentials
- [x] Findings cite file:line evidence
- [x] Project-specific severity calibration applied (closed-system threat model recognized)