autopilot/_docs/00_problem/security_approach.md

# Security Approach

Threat model + non-negotiable security principles. Specific schemes / libraries / algorithms (HMAC vs ed25519, Unix-domain socket peer-cred mechanism, etc.) are design choices and live in `_docs/02_document/architecture.md` + per-component specs. (Audited against `.cursor/rules/artifact-srp.mdc`.)

## Threat model

The autopilot runs onboard a flying UAV. The threats it must defend against on the MVP timeline:

1. **Hijack of operator commands over the radio link.** Even with modem-level link encryption, an attacker who acquires session state could replay a confirm / decline / target-follow / abort command and seize control of the system's behaviour. The radio link is hostile territory; link encryption alone cannot be the entire defence.
2. **Crafted input payloads** (image / video crops sent to onboard models, malformed messages on the airframe link, oversize attachments to any onboard service) exploiting decoders, memory bugs, or causing resource exhaustion.
3. **Unstructured model output** corrupting downstream decisions and producing false operator-facing confidence (e.g. a free-form VLM text response treated as if it were a trusted, structured API response).
4. **Mid-flight peer spoofing** — a fake sibling service (Tier 1 detection, mission service, or any local IPC peer) impersonating a trusted dependency.
5. **Forensic / audit gaps** — wall-clock drift breaking operator-command timestamping, post-mission diff attribution, or replay-protection windowing.
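
To make threat 3 concrete, here is a minimal sketch of projecting free-form model text onto a fixed schema and failing closed on any deviation. It is an illustration only: the `ModelVerdict` type, its two fields, the label allow-list, and the assumption that the model is prompted to emit JSON are all hypothetical and not taken from the architecture or component specs.

```python
import json
from dataclasses import dataclass

# Hypothetical allow-list; the real label set belongs to the component spec.
ALLOWED_LABELS = frozenset({"vehicle", "person", "structure", "unknown"})


class SchemaViolation(ValueError):
    """Raised when model output cannot be projected onto the fixed schema."""


@dataclass(frozen=True)
class ModelVerdict:
    """Hypothetical structured form of a model response."""
    label: str
    confidence: float


def project_model_output(raw_text: str) -> ModelVerdict:
    """Return a ModelVerdict, or raise SchemaViolation (fail closed)."""
    try:
        data = json.loads(raw_text)
    except (ValueError, RecursionError) as exc:
        raise SchemaViolation("output is not parseable JSON") from exc
    # Exact key match: extra or missing keys are violations, not warnings.
    if not isinstance(data, dict) or set(data) != {"label", "confidence"}:
        raise SchemaViolation("unexpected top-level structure or keys")
    label, confidence = data["label"], data["confidence"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise SchemaViolation("label is outside the allow-list")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise SchemaViolation("confidence is not a number")
    # NaN fails this comparison as well, so it is rejected here.
    if not 0.0 <= confidence <= 1.0:
        raise SchemaViolation("confidence is outside [0, 1]")
    return ModelVerdict(label=label, confidence=float(confidence))
```

Under these assumptions the caller never sees raw model text: it receives either a `ModelVerdict` or an exception, and the exception path is where a schema-violation counter for the health endpoint would be incremented.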

**Out of scope** (lives elsewhere in the suite or is not relevant to the airborne payload):

- Cloud-hosted secret management — autopilot does not call cloud services.
- Multi-tenancy — single mission per flight; single operator-or-paired-operator session per flight.
- Web-attack surface — the operator browser UI lives in the Ground Station, not in autopilot.
- OTA update signing — Watchtower at the suite level owns it; autopilot only consumes signed images.

## Non-negotiable security principles

These constraints state that a rule MUST exist, not how it is implemented. The chosen mechanism for each is a design decision and lives in `_docs/02_document/architecture.md`.

- **Operator commands MUST be authenticated, signed, and replay-protected.** Every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that is validated before any state change. A command that fails validation is logged at WARN or higher and discarded before it reaches the system's state machine; it is never permitted to take effect.
- **No cloud egress for inference.** Tier 2 + Tier 3 (if enabled) MUST run on the same compute as the rest of autopilot. No HTTP / external network call originating from autopilot for inference is permitted.
- **No silent error swallowing for security-relevant failures.** Signature invalid, peer-credential mismatch, schema violation, oversize payload rejected — each MUST surface through the health endpoint and the structured log.
- **Bounded input for any model call.** Crop size + format allow-list + patched image decoders. Crafted-input and resource-exhaustion mitigation is mandatory; "accept anything and hope the decoder handles it" is not acceptable.
- **Schema validation for any non-deterministic model output.** Free-form generative output (e.g. VLM text) MUST be projected onto a fixed structured schema before it crosses any decision boundary inside autopilot. Schema violation MUST fail closed.
- **Local IPC peer authorisation.** Any onboard IPC peer that autopilot trusts MUST be identifiable as the expected local process (not just "anyone who can reach the socket"). The mechanism is a design choice.
- **Health endpoint MUST reflect security state.** Pre-flight BIT covers reachability + warm-up of every external dependency; the same endpoint surfaces in-flight security signals (repeated signature failures, peer-credential mismatch, schema-violation rate).
- **Wall-clock binding requirement.** Operator-command timestamping requires a trusted clock source. Wall-clock MUST be bound to GPS time once GPS is locked, or to NTP at boot. For either source, `clock_source` + `last_sync_at` MUST be recorded. Drift > 200 ms surfaces health yellow (the AC enforces the threshold; this rule mandates the binding).
- **Airframe MAVLink integrity.** Whether the airframe link MUST use MAVLink-2 message signing depends on whether the link is physically isolated. If it is not physically isolated, message signing MUST be enabled. (The decision and the mechanism are tracked as Q6 in `architecture.md §8`.)
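
As an illustration of the first principle in the list above, the following sketch shows one way a session-bound, replay-resistant check could be structured. It is not the chosen scheme: Q9 is still open, and HMAC-SHA256 with a strictly increasing counter is used here only because it fits in the standard library. The message layout, the `CommandVerifier` class, and the `rejected` counter are hypothetical.

```python
import hashlib
import hmac
import logging
import struct

log = logging.getLogger("autopilot.command_auth")  # hypothetical logger name

ALLOWED_COMMANDS = frozenset({"confirm", "decline", "target-follow", "abort"})


def _mac(key: bytes, session_id: bytes, counter: int, command: str) -> bytes:
    # Length-prefix the session id so field boundaries are unambiguous.
    msg = (
        struct.pack(">H", len(session_id)) + session_id
        + struct.pack(">Q", counter)
        + command.encode("utf-8")
    )
    return hmac.new(key, msg, hashlib.sha256).digest()


def sign_command(key: bytes, session_id: bytes, counter: int, command: str) -> bytes:
    """Sender side: tag binding the command to a session and a counter."""
    return _mac(key, session_id, counter, command)


class CommandVerifier:
    """Receiver side: accept a command only if every check passes."""

    def __init__(self, key: bytes, session_id: bytes) -> None:
        self._key = key
        self._session_id = session_id
        self._last_counter = -1
        self.rejected = 0  # assumption: would feed the health endpoint

    def verify(self, session_id: bytes, counter: int, command: str, tag: bytes) -> bool:
        ok = bool(
            session_id == self._session_id             # session-bound
            and command in ALLOWED_COMMANDS
            and 0 <= counter < 2**64                   # fits the packed field
            and hmac.compare_digest(tag, _mac(self._key, session_id, counter, command))
            and counter > self._last_counter           # replay-resistant
        )
        if ok:
            # Advance the counter only after the tag has been verified.
            self._last_counter = counter
        else:
            self.rejected += 1
            log.warning("operator command rejected (counter=%r)", counter)
        return ok
```

A caller would change state only when `verify` returns `True`. Every rejection is logged at WARN and counted, so a failed check is visible without ever taking effect. A counter is used instead of a timestamp window here to keep the sketch independent of the clock-binding rule; a real scheme might combine both.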

## What this system does NOT own

- Modem-link encryption setup — handled at the radio layer below autopilot.
- Suite-wide TLS / certificate provisioning — delegated to suite-level deployment (`../_infra/`).
- OTA update signing — Watchtower; autopilot consumes already-signed images. Boot-time self-check + rollback policy is an open suite-level question (Q10 in `architecture.md §8`).
- Annotation / training-data security — lives in the `ai-training` repo.
- Operator browser UI auth — Ground Station owns it; the modem-side handshake is jointly specified per the operator-command auth scheme (Q9).

## Open security decisions (tracked in `_docs/02_document/architecture.md §8`)

- **Q6** — MAVLink-2 message signing on the airframe link.
- **Q9** — Operator-command authentication scheme (HMAC / ed25519 / MAVLink-2-extension / separate envelope).
- **Q10** — Software rollback policy on the airframe (boot-time self-check, A/B partition, watchdog rollback).
- **Q11** — Multi-operator session policy (single active operator vs quorum).
- **Q12** — Comms blackout during banking turns (tolerate vs suppress lost-link failsafe during known turn arcs).

None of these block the rest of the design. Each affected component spec calls out the question it depends on and the temporary contract used until the question resolves.