Mirror of https://github.com/azaion/autopilot.git (synced 2026-06-21 13:11:11 +00:00)
[AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller, misc/camera, python_scaffold, root Dockerfile, autopilot.pro, legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria, security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model, glossary, decision-rationale, deployment, 13 component descriptions, tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed, ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks AZ-640..AZ-686.
Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,76 @@
# Component — `detection_client`

**Layer**: Perception (data plane in)
**Status**: forward-looking design (Rust)

## 1. Purpose

Bi-directional gRPC client to the external `../detections` service. It streams frames out and receives bounding-box detections back. The same bounding boxes are reused by `semantic_analyzer` (Tier 2 ROI selection) and by `telemetry_stream` (operator overlay). This is the only component in autopilot that talks to `../detections`.

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| `Frame` | `frame_ingest` | up to 30 fps | Skipped when `ai_locked` is set. |
| Tier-1 service config | startup config | once | gRPC endpoint, TLS settings, request budget, max concurrent streams. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| `DetectionBatch` | `scan_controller`, `semantic_analyzer`, `telemetry_stream` | `{ frame_seq: u64, detections: Vec<Detection>, latency_ms, model_version }` |
| Health metric | health aggregator | gRPC connection state, `requests_in_flight`, `latency_p50/p99`, `errors_by_kind`, `model_version`. |

`Detection` mirrors the `../detections` contract: `{ class_id, class_name, confidence, bbox_normalized, optional_mask_or_polyline, source_frame_seq }`.

## 4. Key Responsibilities

- Maintain a single bi-directional gRPC stream to `../detections`. Reconnect on stream loss with bounded exponential backoff.
- Frame budgeting: respect the Tier-1 ≤100 ms/frame target by dropping older in-flight frames if a new frame arrives before the previous response (configurable).
- Validate the response payload against the schema version the client was built against. Surface a hard error on schema mismatch; do not silently downcast.
- Tag each `DetectionBatch` with the source frame's monotonic timestamp so downstream consumers can compute end-to-end latency.
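
The frame-budgeting rule above can be illustrated with a minimal sketch of the in-flight window. The type `InFlightWindow`, its methods, and the `max_in_flight` parameter are hypothetical names chosen for illustration; the real component may track more than sequence numbers.

```rust
use std::collections::VecDeque;

/// Sliding window of in-flight frame sequence numbers.
/// Hypothetical type for illustration; not part of the documented contract.
pub struct InFlightWindow {
    max_in_flight: usize,
    seqs: VecDeque<u64>,
}

impl InFlightWindow {
    pub fn new(max_in_flight: usize) -> Self {
        assert!(max_in_flight > 0, "window must hold at least one frame");
        Self { max_in_flight, seqs: VecDeque::new() }
    }

    /// Registers a newly sent frame and returns the sequence numbers that
    /// were evicted (oldest first) to stay within the budget.
    pub fn on_send(&mut self, seq: u64) -> Vec<u64> {
        let mut dropped = Vec::new();
        while self.seqs.len() >= self.max_in_flight {
            if let Some(old) = self.seqs.pop_front() {
                dropped.push(old);
            }
        }
        self.seqs.push_back(seq);
        dropped
    }

    /// Handles a response. Returns `true` if the frame was still in flight,
    /// `false` if it had already been dropped (a late response to discard).
    pub fn on_response(&mut self, seq: u64) -> bool {
        if let Some(pos) = self.seqs.iter().position(|&s| s == seq) {
            self.seqs.remove(pos);
            true
        } else {
            false
        }
    }

    /// Number of frames currently awaiting a response.
    pub fn len(&self) -> usize {
        self.seqs.len()
    }
}
```

With `max_in_flight = 1` this reduces to "always keep only the newest frame", which is one possible reading of the configurable drop policy.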

## 5. Internal State

- gRPC channel, stream handle, reconnect state.
- Sliding window of in-flight frame sequence numbers.
- Last-known model version (echoed by `../detections` on each response or on stream init).

State is in-process only.

## 6. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| `../detections` unreachable | gRPC connect error | Bounded exponential backoff; health → red after threshold; `scan_controller` continues but the `detection_client` health flag is red. |
| Mid-stream cancellation by server | stream error | Reopen the stream; avoid losing in-flight frames where possible (best-effort retry of the latest frame only). |
| Schema mismatch | response decode error | Hard error to the health aggregator; reject the response; alert. |
| Model version change at runtime | new `model_version` on the stream | Log it; if the change implies new classes, surface to `scan_controller` so per-class thresholds can be reloaded. |
| Sustained latency above budget | `latency_p99 > 100 ms` over a sliding window | Health → yellow; `scan_controller` may degrade to alternate-frame inference. |
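
The `latency_p99 > 100 ms` check over a sliding window could look roughly like the sketch below. It assumes a fixed-size sample window and the nearest-rank percentile definition; `LatencyWindow` and its methods are hypothetical names, and the window size is an illustrative parameter rather than a documented value.

```rust
use std::collections::VecDeque;

/// Fixed-size sliding window of round-trip latencies in milliseconds.
/// Hypothetical helper; the percentile uses the nearest-rank method.
pub struct LatencyWindow {
    cap: usize,
    samples_ms: VecDeque<u32>,
}

impl LatencyWindow {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "window must hold at least one sample");
        Self { cap, samples_ms: VecDeque::with_capacity(cap) }
    }

    /// Records a sample, evicting the oldest one when the window is full.
    pub fn record(&mut self, latency_ms: u32) {
        if self.samples_ms.len() == self.cap {
            self.samples_ms.pop_front();
        }
        self.samples_ms.push_back(latency_ms);
    }

    /// Nearest-rank p99 of the current window, or `None` if it is empty.
    pub fn p99(&self) -> Option<u32> {
        if self.samples_ms.is_empty() {
            return None;
        }
        let mut sorted: Vec<u32> = self.samples_ms.iter().copied().collect();
        sorted.sort_unstable();
        // rank = ceil(0.99 * n), 1-based.
        let rank = (sorted.len() * 99 + 99) / 100;
        Some(sorted[rank - 1])
    }

    /// True when the window's p99 exceeds the budget (health would go yellow).
    pub fn over_budget(&self, budget_ms: u32) -> bool {
        self.p99().map_or(false, |p| p > budget_ms)
    }
}
```

Sorting on every query is acceptable for small windows; a streaming quantile estimator would be the alternative if the window grows large.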

## 7. Dependencies

**In-process**: `frame_ingest` (input), `scan_controller` / `semantic_analyzer` / `telemetry_stream` (output).

**External**:
- `../detections` gRPC service. Contract owner: `../_docs/03_detections.md`. Bi-directional streaming.

## 8. Non-Functional Targets

| Concern | Target |
|---|---|
| Per-frame round-trip latency | ≤100 ms (Tier-1 NFR; mostly owned by `../detections`, autopilot's call budget respects it) |
| Reconnect latency | ≤2 s after `../detections` returns |
| Throughput | up to 30 fps at 1080p |
| Backpressure | drop oldest in-flight rather than queue indefinitely |

## 9. Open Questions

- Versioning strategy of the gRPC contract (covered in `architecture.md §8 Q4`).

## 10. References

- `architecture.md §1`, `§3`, `§7.6`.
- `system-flows.md §F1`.
- `../_docs/03_detections.md`.
- `data_model.md §Detection`, `§DetectionBatch`.

@@ -0,0 +1,74 @@
# Component — `frame_ingest`

**Layer**: Perception (data plane in)
**Status**: forward-looking design (Rust)

## 1. Purpose

Pull RTSP from the ViewPro A40 camera, decode H.264/H.265 to raw frames, attach a monotonic timestamp and sequence number, and hand each frame to the downstream consumers (`detection_client`, `movement_detector`, `telemetry_stream`) without copying frame buffers more than once.

Frames are the system's primary input. Everything downstream of `frame_ingest` is rate-limited by it.

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| RTSP video stream | ViewPro A40 (via airframe IP/port) | 30 fps at 1080p (60 fps capable) | TCP or UDP transport per camera config. Re-opens on failure with bounded backoff. |
| Camera startup config | Static config (env or CLI) | once at process start | Stream URL, transport, decode codec preference. |
| `bringCameraDown` / `bringCameraUp` health signal | local supervisor (if present) | event | Optional. Used by deployments that gate AI access to the camera (e.g., during RC takeover). When `down` is asserted, `frame_ingest` continues decoding for `telemetry_stream` but flags frames as "AI-locked" so downstream consumers skip detection. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| `Frame` | `detection_client`, `movement_detector`, `telemetry_stream` | `{ seq: u64, capture_ts_monotonic: ns, decode_ts_monotonic: ns, pixels: Arc<Bytes>, width, height, pix_fmt, ai_locked: bool }` |
| Health metric | health aggregator | `frames/s`, `decode_ms_p50/p99`, `last_frame_age_ms`, `reopens_total`, `decode_errors_total` |
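
As a rough illustration of the `Frame` shape and the no-copy fan-out, the sketch below uses `Arc<[u8]>` from the standard library as a stand-in for `Arc<Bytes>`, so that it compiles without external crates. Field types not stated in the table (`u32` dimensions, nanoseconds as `u64`) are assumptions, `pix_fmt` is omitted for brevity, and `fan_out` is a hypothetical helper rather than the planned broadcast channel.

```rust
use std::sync::Arc;

/// Illustrative `Frame`; `Arc<[u8]>` stands in for `Arc<Bytes>`.
#[derive(Clone, Debug)]
pub struct Frame {
    pub seq: u64,
    pub capture_ts_monotonic_ns: u64,
    pub decode_ts_monotonic_ns: u64,
    pub pixels: Arc<[u8]>,
    pub width: u32,
    pub height: u32,
    pub ai_locked: bool,
}

/// Hands one handle per consumer. Cloning a `Frame` only bumps the
/// reference count on `pixels`; the pixel buffer itself is never copied.
pub fn fan_out(frame: &Frame, consumers: usize) -> Vec<Frame> {
    (0..consumers).map(|_| frame.clone()).collect()
}
```

Each consumer receives its own `Frame` value with cheap metadata, while all of them point at the same decoded buffer.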

## 4. Key Responsibilities

- Open the RTSP session and recover from transient connection loss with bounded exponential backoff.
- Decode frames using a hardware decoder where available (NVDEC on Jetson) with software fallback.
- Stamp each frame with a monotonic capture timestamp at the earliest practical point in the pipeline; this is what `movement_detector` uses for telemetry-skew checks.
- Publish frames through a single multi-consumer channel (Tokio broadcast or equivalent) using `Arc<Bytes>` for pixel data so consumers do not copy.
- Drop frames if downstream consumers fall behind beyond a configured queue depth; record the drop with a reason (`detection_client_slow`, `movement_detector_slow`, or `telemetry_slow`) and surface it through the health endpoint.

## 5. Internal State

- RTSP session handle and reconnect state (closed / connecting / streaming / failing).
- Last-frame timestamp and sequence number.
- Per-consumer drop counters.

State is in-process only; nothing persists across restarts.

## 6. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| RTSP connection refused / lost | TCP connect error / read timeout | Bounded exponential backoff (1 s → 30 s cap); health flips to yellow after first failure, red after `last_frame_age_ms` exceeds a configured threshold. |
| Decode error on a single frame | decoder returns error | Drop the frame; increment `decode_errors_total`; do not abort the stream. |
| Decoder cold-start latency | first-frame timestamp far from session-open | Surface `decode_ms_first_frame` once; not an alert by itself. |
| Downstream consumer slow | broadcast channel back-pressure | Drop the oldest frame for that consumer; counter-tagged drop; warning on sustained drops. |
| Camera output format mismatch | unexpected SPS/PPS | Hard-fail at session open with an explicit error; do not silently pick a wrong decode path. |
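
The bounded exponential backoff (1 s → 30 s cap) named in the table can be sketched as a pure function. This assumes plain doubling per attempt with no jitter; `backoff_delay` and its parameters are illustrative names, not an agreed interface.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based).
/// Doubles `base` on every attempt and never exceeds `cap`.
/// Sketch only: no jitter is applied.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

With `base = 1 s` and `cap = 30 s`, the delays run 1, 2, 4, 8, 16, 30, 30, ... seconds. The same helper would fit the other components that mention bounded backoff, though their caps are not specified here.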

## 7. Dependencies

**In-process**: none upstream; downstream consumers are `detection_client`, `movement_detector`, `telemetry_stream`.

**External**:
- ViewPro A40 RTSP (live).
- Hardware video decoder (NVDEC on Jetson) via FFmpeg / GStreamer or a Rust binding.

## 8. Non-Functional Targets

| Concern | Target |
|---|---|
| End-to-end frame latency (RTSP rx → publish to consumers) | ≤30 ms p99 on Jetson Orin Nano. |
| Frame drop rate | ≤0.1 % under normal conditions. |
| Reconnect latency after camera reboot | ≤5 s from camera availability. |
| Memory | one decoded-frame buffer pool with bounded size; no unbounded growth on slow consumers. |

## 9. References

- `architecture.md §1 System Context`, `§3 Components`, `§7.6 Solution Architecture`.
- `system-flows.md §F1 Frame pipeline`.
- `data_model.md §Frame`.

@@ -0,0 +1,78 @@
# Component — `gimbal_controller`

**Layer**: Action (data plane out)
**Status**: forward-looking design (Rust); ViewPro A40 vendor protocol

## 1. Purpose

Drives the ViewPro A40 gimbal: pan (yaw), tilt (pitch), and zoom. Honours the ≤2 s zoom-transition budget and ≤500 ms decision-to-movement latency. Owns the zoom-out sweep, the smooth-pan path-tracking primitive used during the zoom-in level (follow-the-footpath behaviour), and the centre-window primitive used during target-follow.

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| `GimbalCommand` | `scan_controller` | per state-machine tick or per zoom-in plan step | yaw / pitch / zoom goal; or pan plan; or centre-on-target. |
| Sweep config | startup config | once | Zoom-out sweep pattern (pendulum / raster / lawn-mower; see `architecture.md §8 Q1`). |
| Live gimbal status | ViewPro A40 (vendor protocol) | as emitted by camera | yaw / pitch / zoom feedback + faults. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| Vendor-protocol commands | ViewPro A40 (UDP) | yaw / pitch / zoom commands |
| `GimbalState` | `frame_ingest` (for telemetry tagging), `movement_detector` (for ego-motion compensation) | `{ yaw, pitch, zoom, ts_monotonic, command_in_flight: bool }` |
| Health metric | health aggregator | `commands_per_min`, `decision_to_movement_p99_ms`, `zoom_transition_p99_ms`, `vendor_faults_total`. |

## 4. Key Responsibilities

- Send vendor-protocol commands to the ViewPro A40 over UDP. Re-issue on timeout with bounded retry.
- Run the zoom-out sweep pattern when `scan_controller` is in `ZoomedOut` (pattern itself depends on `architecture.md §8 Q1` resolution).
- For the zoom-in path-follow, accept a pan plan (sequence of yaw / pitch / zoom goals with timing) from `scan_controller` / `semantic_analyzer` and execute it smoothly.
- For target-follow, accept a centre-on-target stream (target bbox normalized) from `scan_controller` and command the gimbal to keep the target inside the centre 25 % of frame while visible.
- Stamp every emitted command with a monotonic timestamp so `movement_detector` can synchronise it with frames.
- Surface vendor-protocol faults to health and to `scan_controller`.
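
The centre-window primitive could be sketched as below. Two points are assumptions rather than documented behaviour: the normalised bbox is taken to be `(x_min, y_min, x_max, y_max)` in `[0, 1]`, and "centre 25 % of frame" is read as 25 % of the frame area, i.e. a centred window with side 0.5 on each axis. `BBoxNorm` and `centre_window_error` are hypothetical names.

```rust
/// Normalised bounding box; coordinates are assumed to lie in [0, 1].
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BBoxNorm {
    pub x_min: f32,
    pub y_min: f32,
    pub x_max: f32,
    pub y_max: f32,
}

impl BBoxNorm {
    /// Centre point of the box in normalised frame coordinates.
    pub fn centre(&self) -> (f32, f32) {
        ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)
    }
}

/// Returns `None` when the target centre already lies inside the centred
/// window of side `window_side`; otherwise returns the (dx, dy) offset of
/// the target from the frame centre, which a controller could turn into a
/// yaw / pitch correction.
pub fn centre_window_error(bbox: BBoxNorm, window_side: f32) -> Option<(f32, f32)> {
    let (cx, cy) = bbox.centre();
    let (dx, dy) = (cx - 0.5, cy - 0.5);
    let half = window_side / 2.0;
    if dx.abs() <= half && dy.abs() <= half {
        None
    } else {
        Some((dx, dy))
    }
}
```

If the 25 % figure is meant linearly instead, only `window_side` changes (0.25 rather than 0.5).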

## 5. Internal State

- Last-known commanded yaw / pitch / zoom.
- Last-known reported yaw / pitch / zoom (from gimbal feedback).
- Sweep pattern state (current direction, dwell counter).
- Current execution mode: `Sweep | PanPlan | CentreOnTarget | Idle`.

State is in-process only.

## 6. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| ViewPro A40 not responding | command timeout | Bounded exponential backoff; health → yellow then red; `scan_controller` is informed and may pause zoom-in. |
| Decision-to-movement above budget | self-instrumented | Health → yellow; investigate (likely UDP loss or vendor firmware issue). |
| Zoom transition stalls | feedback shows no zoom progress | Re-issue command; health → yellow; report to `scan_controller`. |
| Target lost during target-follow | feedback + tracker | Surface `target_lost` to `scan_controller`; controller decides to release follow. |
| Conflicting commands | execution-mode mismatch | Reject the lower-priority command; log a hard error; never silently merge. |

## 7. Dependencies

**In-process** (input): `scan_controller`.
**In-process** (output): `frame_ingest`, `movement_detector` (timestamped state).

**External**: ViewPro A40 over UDP (vendor protocol).

## 8. Non-Functional Targets

| Concern | Target |
|---|---|
| Decision-to-movement latency | ≤500 ms |
| Zoom transition (medium → high) | ≤2 s |
| Sweep pattern stability | bounded jitter; no overshoot beyond configured FOV bounds |
| Target-follow centre-window | target inside centre 25 % of frame while visible |

## 9. Open Questions

- Sweep pattern specification (`architecture.md §8 Q1`): pendulum / raster / lawn-mower; FOV per zoom tier; dwell time per direction.

## 10. References

- `architecture.md §3`, `§6 NFR`, `§7.6 Solution Architecture`.
- `system-flows.md §F2 Movement detection (zoom-out + zoom-in)`.
- `data_model.md §GimbalState`.

@@ -0,0 +1,124 @@
# Component — `mapobjects_store`

**Layer**: Decision + Memory
**Status**: forward-looking design (Rust); on-device working copy of the central MapObjects state, mission-bracketed

## 1. Purpose

On-device, H3-indexed working copy of the centrally maintained MapObjects state plus the IgnoredItems list, scoped to the active mission's bounding box. Computes new / moved / existing / removed diffs across survey passes and is the source of truth for the operator-decline suppression rule **for the duration of the active mission**.

This is **not** a private database. It is hydrated pre-flight from the central `missions` API (`/missions/{id}/mapobjects`) and the mission's full pass diff is pushed back post-flight. The central observation log + computed current view are authoritative across missions and across UAVs (`architecture.md §7.13`).

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| Pre-flight pull payload | `mission_client` (from `missions` API) | once per mission | Hydrates `current_state` + `pending_ignored`. |
| New detection / movement candidate (with MGRS + class + size) | `scan_controller` | per detection | Each is classified as new / moved / existing. |
| `IgnoredItem` append | `scan_controller` (on operator decline) | event | `(MGRS, class_group)` plus operator metadata. |
| End-of-pass marker | `scan_controller` / `mission_executor` | event per pass over a region | Triggers the removed-candidate sweep. |
| Mission delete cascade | suite-level missions API hook (process-level config; not a network call) | event | Drops mission-scoped objects on mission deletion. |
| Post-flight push trigger | `mission_executor` | once per mission, on terminal state | Causes `mission_client` to drain `pending_observations` + `pending_ignored` to the central API. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| `MapObjectClassification` | `scan_controller` | `new \| moved \| existing \| removed_candidate` per detection |
| `IgnoredItem` match | `scan_controller` | suppression flag for (MGRS, class_group) |
| Pass diff | `mission_client` (post-flight upload) + `operator_bridge` (optionally surfaced in-flight) | new / moved / removed lists per pass |
| Sync state | `scan_controller`, health aggregator | `synced \| cached_fallback \| degraded`; `pending_observations_count`, `pending_ignored_count` |

## 4. Key Responsibilities

- **Pre-flight hydrate** from `mission_client` pull. Establish `current_state` and `pending_ignored`. Surface `sync_state` (`synced` or `cached_fallback` or `degraded`).
- Compute H3 cell for each detection at the configured resolution (default res 10, about 66 m average edge length).
- Build the composite key `H3_cell + class`. Maintain an in-memory hashmap; persist asynchronously to disk for crash recovery.
- Answer queries: `classify(detection) → new | moved | existing` using k-ring lookup and `(distance_threshold_m, move_threshold_m, similar_classes)` configuration.
- After a region's scan-pass ends, return objects in the region that were not re-observed as `removed_candidate`s (the operator decides on actual removal).
- Maintain the `IgnoredItem` set; answer suppression queries (`is_ignored(MGRS, class_group)`).
- Append every NEW / MOVED / EXISTING / REMOVED-CANDIDATE / IgnoredItem event to `pending_observations` / `pending_ignored` for the post-flight push (in-flight central writes are forbidden; see Frozen choice 6 in `architecture.md §7.3`).
- **Post-flight push**: hand the contents of `pending_observations` + `pending_ignored` to `mission_client` for `POST /missions/{id}/mapobjects` and `POST /missions/{id}/mapobjects/ignored`. On ack, clear pending; on failure, persist for retry.
- On `DELETE /missions/{id}` cascade signal (received via `mission_client`), drop all objects scoped to that mission. The central side cascades as well.
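
The `classify` query can be illustrated with a simplified sketch that leaves out the H3 index entirely and scans a flat list in a local metric frame. The threshold semantics are an assumption: the nearest same-class object within `distance_threshold_m` counts as existing, one within the larger `move_threshold_m` counts as moved, and anything else is new. `similar_classes` handling is omitted, and all type and field names are hypothetical.

```rust
/// Outcome of classifying one detection against known objects.
#[derive(Clone, Debug, PartialEq)]
pub enum Classification {
    New,
    Moved { from_id: u64 },
    Existing { id: u64 },
}

/// Known object in a local east/north frame (metres). Illustrative only;
/// the design keys objects by H3 cell + class instead.
#[derive(Clone, Debug)]
pub struct MapObject {
    pub id: u64,
    pub class: String,
    pub east_m: f64,
    pub north_m: f64,
}

pub struct ClassifyConfig {
    pub distance_threshold_m: f64,
    pub move_threshold_m: f64,
}

/// Finds the nearest same-class object and applies the two thresholds.
/// A k-ring lookup would replace the linear scan over `known`.
pub fn classify(
    known: &[MapObject],
    class: &str,
    east_m: f64,
    north_m: f64,
    cfg: &ClassifyConfig,
) -> Classification {
    let nearest = known
        .iter()
        .filter(|o| o.class == class)
        .map(|o| (o.id, ((o.east_m - east_m).powi(2) + (o.north_m - north_m).powi(2)).sqrt()))
        .min_by(|a, b| a.1.total_cmp(&b.1));
    match nearest {
        Some((id, d)) if d <= cfg.distance_threshold_m => Classification::Existing { id },
        Some((id, d)) if d <= cfg.move_threshold_m => Classification::Moved { from_id: id },
        _ => Classification::New,
    }
}
```

The k-ring lookup in the design serves to bound the candidate set so the per-detection cost stays near constant; the thresholding step would remain the same shape.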

## 5. Sync state machine

```text
fresh_boot
   │
   ├──> pre-flight pull
   │       │
   │       ├── 200 OK ────────────> synced
   │       ├── unreachable ────────> [operator ack required]
   │       │                          │
   │       │                          ├── ack on cache ──> cached_fallback
   │       │                          └── abort ─────────> BIT fail
   │       └── 4xx ─────────────────> BIT fail
   │
   ├── (during flight; in-process writes only)
   │       │
   │       ├── pending_observations grow
   │       └── pending_ignored grow
   │
   └── post-flight push
           │
           ├── 200 OK on both endpoints ──> synced (pending cleared)
           ├── partial ────────────────────> retry per-endpoint
           └── persistent failure ─────────> degraded (operator warning, manual replay)
```
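
The diagram above can be transcribed into a small transition function. This is a sketch under assumptions: the enum and variant names are hypothetical, the intermediate "operator ack required" step is modelled as its own state, and the partial-push case is treated as staying in the current state (retry per endpoint), so it has no event here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyncState {
    FreshBoot,
    AwaitingOperatorAck,
    Synced,
    CachedFallback,
    Degraded,
    BitFail,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyncEvent {
    PullOk,
    PullUnreachable,
    PullClientError, // 4xx from the pre-flight pull
    OperatorAckCache,
    OperatorAbort,
    PushOk, // 200 OK on both post-flight endpoints
    PushPersistentFailure,
}

/// Returns the next state, or `None` when the event is not valid in the
/// current state (callers would treat that as a logic error).
pub fn next_state(state: SyncState, event: SyncEvent) -> Option<SyncState> {
    use SyncEvent::*;
    use SyncState::*;
    match (state, event) {
        (FreshBoot, PullOk) => Some(Synced),
        (FreshBoot, PullUnreachable) => Some(AwaitingOperatorAck),
        (FreshBoot, PullClientError) => Some(BitFail),
        (AwaitingOperatorAck, OperatorAckCache) => Some(CachedFallback),
        (AwaitingOperatorAck, OperatorAbort) => Some(BitFail),
        (Synced | CachedFallback, PushOk) => Some(Synced),
        (Synced | CachedFallback, PushPersistentFailure) => Some(Degraded),
        _ => None,
    }
}
```

Returning `Option` keeps invalid transitions explicit instead of silently ignoring them, in line with the "never silent" rule used elsewhere in this component.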

## 6. Internal State

- In-memory hashmap of `(H3_cell + class) → MapObject`.
- `IgnoredItem` set keyed by `(MGRS, class_group)`.
- Per-region pass tracker for removed-candidate detection.
- `pending_observations`: ordered log of NEW / MOVED / REMOVED-CANDIDATE / EXISTING events not yet pushed centrally.
- `pending_ignored`: ordered log of IgnoredItem appends not yet pushed centrally.
- `sync_state`: enum + last-pull timestamp + last-push timestamp + last error.
- Persistence layer (engine TBD; see Open Questions) for crash recovery and post-flight upload durability.

## 7. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| Pre-flight pull unreachable | network | Surface BIT degradation; operator must acknowledge cached fallback or abort. Never silent. |
| Pre-flight pull stale beyond freshness window | last-fetch-at compared to configured staleness | `sync_state = degraded`; operator must acknowledge or abort. |
| Persistence write failure | engine error | Log + retry; in-memory state continues authoritative for this mission; health → yellow. |
| Persistence corruption on startup | checksum / open failure | Refuse to start with stale state; require explicit recovery (engine-specific); surface to operator at startup. |
| H3 query inconsistency near cell boundaries | algorithmic | Always query the k-ring (k=2 default) so boundary objects are matched anyway. |
| Mission cascade signal lost | absent signal | `DELETE /missions/{id}` is the only cleanup trigger; on lost signal, mission-scoped objects accumulate. Operator-driven manual purge is acceptable. |
| Post-flight push partial success | per-endpoint status | Independent retry per endpoint; do not roll back the successful one. |
| Post-flight push persistent failure | bounded retries exhausted | `sync_state = degraded`; pending diff persisted on disk; operator-visible warning; manual replay supported. Mission's central data integrity at risk until replayed. |
| In-flight crash | startup detects non-empty `pending_*` for a terminated mission | `mission_client` runs the post-flight push at startup before BIT completes for any new mission. |

## 8. Dependencies

**In-process**: `scan_controller`, `mission_client` (for pull/push round-trips), `mission_executor` (for post-flight trigger).

**External**: H3 spatial-index library (Rust crate). Persistent store engine: TBD (SQLite + H3 extension / KV / in-memory + snapshot; see Open Questions). Central API contract via `mission_client`'s extension of the `missions` API (per `architecture.md §7.13`).

## 9. Non-Functional Targets

| Concern | Target |
|---|---|
| Per-detection classify latency | O(1); p99 ≤1 ms |
| Pre-flight pull time | ≤30 s for a 30 km × 30 km mission area (per `architecture.md §6 NFR`) |
| Post-flight push time | ≤2 min for a 60 min mission's pass diff (per `architecture.md §6 NFR`) |
| Persistent-store size (single mission) | bounded; configurable retention |
| Crash recovery time | ≤2 s to a usable state; in-flight crash → next-boot push of pending |
| Boundary correctness | guaranteed by k-ring query |

## 10. Open Questions

- **Engine choice** (`architecture.md §8 Q3`): SQLite + H3 extension / KV / in-memory + snapshot.
- **Central API schema details** (`architecture.md §8 Q7`): paging strategy, photo-reference upload mechanism, observation-history retention policy.
- **Conflict resolution rules** (`architecture.md §8 Q8`): exact projection from observation log to current view; REMOVED-claim expiry window; multi-class disambiguation.
- Optimal H3 resolution per terrain class.
- Class-group definitions (`military_vehicle_group` vs `concealed_position_group` vs `movement_candidate`), currently in `scan_controller` config.

## 11. References

- `architecture.md §3`, `§5 Architectural Principles` (MapObjects are mission-bracketed and centrally synchronised), `§6 NFR`, `§7.9 MapObjects (H3 spatial index)`, `§7.10 Sync Message Format`, `§7.11 Target Relocation`, `§7.12 New vs Existing object detection`, `§7.13 MapObjects Sync`.
- `system-flows.md §F7 MapObjects + ignored-items` (in-flight diff), `§F8 MapObjects sync (central DB, mission-bracketing)`.
- `data_model.md §MapObject`, `§IgnoredItem`, `§MapObjectObservation`, `§MapObjectsBundle`.
- `../_docs/02_missions.md` (mission cascade contract; new MapObjects endpoints).

@@ -0,0 +1,87 @@
# Component — `mavlink_layer`

**Layer**: Action (data plane out)
**Status**: forward-looking design (Rust); hand-rolled (no third-party SDK)

## 1. Purpose

Hand-rolled MAVLink v2 transport. Implements only the ~10–15 commands this codebase needs (full list in `architecture.md §7.7`). Owns serialisation / deserialisation, heartbeat, sequence numbers, retry, and a single connection abstraction (UDP or serial, picked at startup from CLI / env). No third-party SDK is used, which eliminates the largest current dependency-risk item.

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| Outgoing `COMMAND_LONG`, `MISSION_*`, `SET_MODE` | `mission_executor` | per state transition | Hand-rolled message constructors per command. |
| Outgoing heartbeat | self (timer) | 1 Hz | `HEARTBEAT` to keep the autopilot's GCS-link alive. |
| Connection URI | startup config | once | `udp://...` or `serial:///dev/...`. |
| MAVLink-2 signing config | startup config | once | If supported by the link, signing is enabled; otherwise the link is treated as trusted. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| Decoded MAVLink messages | `mission_executor`, `telemetry_stream`, `movement_detector` (for UAV motion telemetry) | typed enum per message kind |
| Connection state | health aggregator | `connected`, `last_heartbeat_age_ms`, `tx_seq`, `rx_seq`, `parse_errors_total`, `signing_enabled`. |

The supported message surface (concise list; full table in `architecture.md §7.7`):

- `HEARTBEAT` (bidir)
- `COMMAND_LONG` subset (out): arm/disarm, takeoff, set-mode, change-speed, change-alt, land, RTL
- `COMMAND_ACK` (in)
- `MISSION_COUNT`, `MISSION_REQUEST_INT`, `MISSION_ITEM_INT`, `MISSION_ACK`, `MISSION_SET_CURRENT`, `MISSION_CURRENT`, `MISSION_ITEM_REACHED`, `MISSION_CLEAR_ALL`
- `GLOBAL_POSITION_INT`, `ATTITUDE`, `SYS_STATUS`, `EXTENDED_SYS_STATE`, `STATUSTEXT`
- `SET_MODE` (out, fixed-wing)

## 4. Key Responsibilities

- Open and maintain the MAVLink connection (UDP or serial). Reconnect on transport loss with bounded backoff.
- Encode outgoing messages with correct sequence numbers, system / component IDs, and (when enabled) MAVLink-2 signing.
- Decode incoming messages with strict validation: reject malformed frames, unknown message IDs, and signing failures.
- Emit a 1 Hz heartbeat. Detect autopilot heartbeat timeouts and surface to health.
- Demux `COMMAND_ACK` to the originating caller (per `command_id`); enforce a wall-clock ack timeout.
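
The ack demultiplexing and timeout rule in the last bullet could be sketched with an in-flight command map. `PendingCommands` and its methods are hypothetical names; the caller type `C` stands for whatever handle the real code uses to notify the originator (for example a channel sender), and time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-flight command map: `command_id → (caller, deadline)`.
/// Hypothetical sketch; `C` is the handle used to reach the caller.
pub struct PendingCommands<C> {
    timeout: Duration,
    in_flight: HashMap<u16, (C, Instant)>,
}

impl<C> PendingCommands<C> {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, in_flight: HashMap::new() }
    }

    /// Registers an outgoing command. Returns the previous caller if a
    /// command with the same id was still waiting (it is superseded).
    pub fn register(&mut self, command_id: u16, caller: C, now: Instant) -> Option<C> {
        self.in_flight
            .insert(command_id, (caller, now + self.timeout))
            .map(|(c, _)| c)
    }

    /// Routes a `COMMAND_ACK` to its originator; `None` means nobody is
    /// waiting for this id (unsolicited or already timed out).
    pub fn on_ack(&mut self, command_id: u16) -> Option<C> {
        self.in_flight.remove(&command_id).map(|(c, _)| c)
    }

    /// Removes and returns every command whose deadline has passed, so the
    /// timeout can be bubbled up to the caller.
    pub fn expire(&mut self, now: Instant) -> Vec<(u16, C)> {
        let expired: Vec<u16> = self
            .in_flight
            .iter()
            .filter(|(_, (_, deadline))| *deadline <= now)
            .map(|(id, _)| *id)
            .collect();
        expired
            .into_iter()
            .filter_map(|id| self.in_flight.remove(&id).map(|(c, _)| (id, c)))
            .collect()
    }
}
```

Keying by command id alone means only one outstanding command per id; whether that restriction is acceptable is a design decision this sketch does not settle.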

## 5. Internal State

- Connection handle (UDP socket or serial port).
- Outgoing sequence number.
- In-flight command map (`command_id → (caller, deadline)`).
- Per-message-kind parse error counters.

State is in-process only.

## 6. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| Transport open failure | OS error | Bounded backoff; surface to health → red. |
| Heartbeat from autopilot missing | wall-clock timeout | Surface `link_lost` to health and to `mission_executor`; do not silently fail. |
| Command-ack timeout | wall-clock | Bubble timeout to `mission_executor`; the executor decides retry vs failure. |
| Malformed inbound frame | parser error | Drop the frame; increment counter; do not abort the link. |
| MAVLink-2 signing mismatch (if enabled) | signature check | Reject the frame; alert; do not silently accept. |
| Sequence-number gap | rx_seq vs expected | Log; not a hard failure on its own. |
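
Part of rejecting malformed frames in a hand-rolled parser is the frame checksum. MAVLink uses a 16-bit CRC (CRC-16/MCRF4XX, seeded with `0xFFFF`) computed over the frame bytes after the start marker, followed by a per-message `CRC_EXTRA` byte. The sketch below shows only that calculation; the function names are illustrative, and frame layout and signature handling are left out.

```rust
/// Folds one byte into the running CRC-16/MCRF4XX value.
pub fn crc_accumulate(byte: u8, crc: u16) -> u16 {
    let mut tmp = byte ^ (crc & 0xff) as u8;
    tmp ^= tmp << 4;
    let tmp = tmp as u16;
    (crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)
}

/// CRC-16/MCRF4XX over `bytes`, starting from the 0xFFFF seed.
pub fn crc16_mcrf4xx(bytes: &[u8]) -> u16 {
    bytes.iter().fold(0xFFFF, |crc, &b| crc_accumulate(b, crc))
}

/// Checksum as carried in a MAVLink frame: the CRC over the header (without
/// the start marker) and payload, with the message's CRC_EXTRA byte folded
/// in last.
pub fn mavlink_checksum(header_and_payload: &[u8], crc_extra: u8) -> u16 {
    crc_accumulate(crc_extra, crc16_mcrf4xx(header_and_payload))
}
```

A parser would compare `mavlink_checksum(...)` against the two checksum bytes of the received frame and count a mismatch as a malformed frame.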

## 7. Dependencies

**In-process** (input): `mission_executor`.
**In-process** (output): `mission_executor`, `telemetry_stream`, `movement_detector`.

**External**: ArduPilot / PX4 over MAVLink v2 (UDP or serial).

## 8. Non-Functional Targets

| Concern | Target |
|---|---|
| Per-message round-trip on a healthy link | ≤50 ms p99 |
| Heartbeat cadence | 1 Hz out |
| Command-ack timeout | configurable; default 1 s, with retry handled by `mission_executor` |
| Reconnect after transport loss | ≤2 s on serial / ≤5 s on UDP |
| Message subset | ~10–15 commands only; adding more requires explicit design review |

## 9. Open Questions

- **MAVLink-2 message signing** (`architecture.md §8 Q6`): whether the airframe link enables signing or treats the link as trusted.

## 10. References

- `architecture.md §3`, `§5 Architectural Principles` (no MAVSDK, no silent error swallowing), `§7.7 MAVLink and Piloting`.
- `system-flows.md §F6 Mission lifecycle`.

@@ -0,0 +1,93 @@
# Component — `mission_client`

**Layer**: Action (data plane out)
**Status**: forward-looking design (Rust)

## 1. Purpose

Pulls the mission from the external `missions` API on start, validates it against the shared `mission-schema` artefact, and supplies the parsed mission to `mission_executor`. It also POSTs middle-waypoint inserts on operator-confirmed targets, owns the **MapObjects pre-flight pull / post-flight push** round trips against the same `missions` API, and survives transient connection loss with bounded retry.

`autopilot` and `missions` are **separate repos** with a shared `mission-schema`. There is no in-process mission database in autopilot. The MapObjects endpoints (`/missions/{id}/mapobjects` GET + POST) are an extension of the `missions` API per `architecture.md §7.13`.

## 2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| `mission_id` | startup CLI / env | once | Identifies which mission to fetch. |
| Missions API endpoint + auth | startup config | once | HTTPS REST; auth model TBD per `../_docs/02_missions.md`. |
| Middle-waypoint POST request | `mission_executor` (via `scan_controller` / `operator_bridge`) | event | The mission with the inserted middle waypoint. |
| Mission-update notification | missions API (push or poll) | event | Optional; if missions API supports change notifications, propagate to `mission_executor`. |
| MapObjects post-flight push trigger | `mission_executor` (on terminal state) + `mapobjects_store` (pending diff handle) | once per mission | Triggers the F8 post-flight upload. |

## 3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| Parsed mission | `mission_executor` | `{ waypoints: Vec<MissionWaypoint>, geofences: Vec<Geofence>, return_point, mission_id, schema_version }` |
| Pre-flight MapObjects bundle | `mapobjects_store` | `{ map_objects, ignored_items, fetched_at, schema_version, fallback_used: bool }` |
| Post-flight push status | `mapobjects_store`, health aggregator | per-endpoint ack / retry / failure |
| Mission cascade signal (`DELETE /missions/{id}` echoed by missions API) | `mapobjects_store` | event |
| Health metric | health aggregator | `last_fetch_ts`, `fetch_errors_total`, `schema_version`, `connection_state`, `mapobjects_pull_state`, `mapobjects_push_pending`. |

## 4. Key Responsibilities
|
||||
|
||||
- Fetch the mission by `mission_id` on startup. Validate against `mission-schema`. Reject on schema-invalid; do not silently downcast.
|
||||
- **MapObjects pre-flight pull.** Immediately after the mission fetch succeeds, call `GET /missions/{id}/mapobjects` (and `GET /missions/{id}/mapobjects/ignored` if separated). Hand the bundle to `mapobjects_store`. On failure, surface to `mission_executor` BIT (F9) — operator may acknowledge cached fallback or abort. Never silent.
|
||||
- POST middle-waypoint updates; await ack; surface failure to `mission_executor` (which decides whether to halt, RTL, or proceed with the original mission).
|
||||
- **MapObjects post-flight push.** When `mission_executor` reaches a terminal state, drain `mapobjects_store`'s pending diff and call `POST /missions/{id}/mapobjects` + `POST /missions/{id}/mapobjects/ignored`. Independent retry per endpoint with bounded backoff. On persistent failure, persist pending diff on disk and surface a warning (operator may manually replay).
|
||||
- **Crash-recovery push.** At startup, if `mapobjects_store` reports non-empty pending diff for a previously terminated mission, run the post-flight push for that mission BEFORE BIT for any new mission begins.
|
||||
- On `DELETE /missions/{id}` (observed via missions API or out-of-band signal), notify `mapobjects_store` to drop mission-scoped objects.
|
||||
- Survive transient connection loss with bounded exponential backoff. Pre-flight, this delays takeoff. In-flight, missing connectivity does not stop execution of the already-in-memory mission. (No central writes happen in-flight by design — Frozen choice 6.)
|
||||
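The bounded exponential backoff named above can be sketched as follows. This is a minimal illustration, not the component's actual API: `RetryPolicy`, `delay_after`, and `retry_bounded` are hypothetical names, and the base and cap delays are placeholders. Only the default of 5 attempts comes from §8.

```rust
use std::time::Duration;

/// Hypothetical retry policy; names and delay values are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    /// Total attempts allowed (the document's default for synchronous calls is 5).
    pub max_attempts: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Delay to wait after failed attempt `attempt` (1-based),
    /// or `None` once the attempt budget is exhausted.
    pub fn delay_after(&self, attempt: u32) -> Option<Duration> {
        if attempt == 0 || attempt >= self.max_attempts {
            return None;
        }
        // Double the delay on every attempt; saturate instead of overflowing.
        let factor = 1u32.checked_shl(attempt - 1).unwrap_or(u32::MAX);
        Some(self.base_delay.saturating_mul(factor).min(self.max_delay))
    }
}

/// Calls `op` until it succeeds or the budget runs out. Sleeping is injected
/// so the sketch stays testable; the last error is returned on exhaustion.
pub fn retry_bounded<T, E>(
    policy: &RetryPolicy,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) => match policy.delay_after(attempt) {
                Some(delay) => {
                    sleep(delay);
                    attempt += 1;
                }
                None => return Err(err),
            },
        }
    }
}
```

With `max_attempts: 5` the operation is tried at most five times with four waits in between. The 24 h durable window for the post-flight push would need a separate, disk-backed schedule that this sketch does not cover.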
|
||||
## 5. Internal State
|
||||
|
||||
- Currently active mission (the original, plus any patched version from middle-waypoint inserts).
|
||||
- Schema version reported by missions API at fetch.
|
||||
- MapObjects pull state: `not_started | in_flight | synced | cached_fallback | failed`.
|
||||
- MapObjects push queue: per-mission pending diff with retry counter and last-failure reason.
|
||||
- Retry counter and last-failure reason for each endpoint.
|
||||
|
||||
State is in-process only **except** for the post-flight push queue, which is durable on disk so a crash mid-mission does not lose the diff.
|
||||
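One possible typed encoding of the pull states listed above, together with how BIT (F9) might read them. The enum variants mirror the documented states; `PullGate` and `pull_gate` are hypothetical, and treating `failed` as blocking is an assumption drawn from the "acknowledge cached fallback or abort" rule.

```rust
/// Mirrors the MapObjects pull states listed in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MapObjectsPullState {
    NotStarted,
    InFlight,
    Synced,
    CachedFallback,
    Failed,
}

/// Hypothetical BIT-side verdict on the pre-flight pull.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PullGate {
    Pass,
    NeedsOperatorAck,
    Blocked,
    Pending,
}

/// Maps a pull state to a BIT verdict. A cached fallback passes only after
/// the operator has acknowledged it; nothing degrades silently.
pub fn pull_gate(state: MapObjectsPullState, operator_acked_fallback: bool) -> PullGate {
    use MapObjectsPullState::*;
    match state {
        Synced => PullGate::Pass,
        CachedFallback if operator_acked_fallback => PullGate::Pass,
        CachedFallback => PullGate::NeedsOperatorAck,
        // Assumption: with no usable bundle the only option left is abort.
        Failed => PullGate::Blocked,
        NotStarted | InFlight => PullGate::Pending,
    }
}
```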
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| Missions API unreachable at startup | HTTP error / DNS failure | Bounded retry; if max-retry exceeded, refuse to start the mission; health → red; surface to operator. |
|
||||
| Schema mismatch (mission or mapobjects) | response decoder | Refuse to start the mission; surface raw response (size-capped) for offline analysis. |
|
||||
| Pre-flight MapObjects pull fails | HTTP error / timeout | BIT degrades; operator may acknowledge cached fallback or abort. Never silent. |
|
||||
| Mid-flight middle-waypoint POST fails | HTTP error | `mission_executor` decides: continue with the existing in-memory mission, or RTL if the failure is persistent. |
|
||||
| Post-flight MapObjects push fails | HTTP error / 5xx | Persist pending diff on disk; bounded retry with exponential backoff; operator-visible warning after max retries. |
|
||||
| Post-flight push partial success | per-endpoint status | Independent retry per endpoint; do not roll back the successful one. |
|
||||
| Mission deleted mid-flight | `DELETE` notification | Surface to operator; safe-shutdown decision is a policy in `mission_executor` (default: continue current mission and notify on landing). The post-flight push will receive 404; data preserved as orphaned for forensic review. |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process** (input): startup config, `mission_executor`, `operator_bridge` (via `scan_controller`), `mapobjects_store` (pending-diff handle).
|
||||
**In-process** (output): `mission_executor`, `mapobjects_store`.
|
||||
|
||||
**External**: missions API (HTTPS REST), including the MapObjects extension. Contract owner: `../_docs/02_missions.md` (with the §7.13 extension proposed in this repo).
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Startup mission fetch | ≤5 s on healthy connectivity |
|
||||
| Pre-flight MapObjects pull | ≤30 s for a 30 km × 30 km mission area |
|
||||
| Middle-waypoint POST | ≤2 s on healthy connectivity |
|
||||
| Post-flight MapObjects push | ≤2 min for a 60 min mission's pass diff; persisted on disk if push fails |
|
||||
| Bounded retry | configurable max; default 5 attempts with exponential backoff for synchronous calls; 24 h durable retry window for the post-flight push |
|
||||
|
||||
## 9. Open Questions
|
||||
|
||||
- **`mission-schema` extraction location** (`architecture.md §8 Q5`): `_infra/` at suite root, or a small third repo.
|
||||
- **MapObjects endpoint contract** (`architecture.md §8 Q7`): paging, photo-ref upload, retention policy.
|
||||
- **MapObjects conflict resolution** (`architecture.md §8 Q8`): server-side; this component only consumes the result.
|
||||
- Auth / session model for the missions API (per `../_docs/02_missions.md`).
|
||||
|
||||
## 10. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles` (separate repos + shared schema; MapObjects mission-bracketed), `§7.6 Solution Architecture`, `§7.13 MapObjects Sync`.
|
||||
- `system-flows.md §F6 Mission lifecycle`, `§F8 MapObjects sync`.
|
||||
- `data_model.md §MissionItem`, `§MissionWaypoint`, `§Geofence`, `§MapObjectsBundle`, `§MapObjectObservation`.
|
||||
- `../_docs/02_missions.md`.
|
||||
@@ -0,0 +1,94 @@
|
||||
# Component — `mission_executor`
|
||||
|
||||
**Layer**: Action (data plane out)
|
||||
**Status**: forward-looking design (Rust)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Drives the airframe through a typed state machine: connect → health-check → **pre-flight self-test (BIT, F9)** → (variant-specific arm/takeoff or wait-for-AUTO) → upload mission → fly mission → land. Owns geofence enforcement (both INCLUSION and EXCLUSION), the **lost-link failsafe ladder** (F10), and **battery / fuel threshold enforcement**. Inserts middle waypoints on operator-confirmed targets and resumes the original mission after target-follow ends. Issues all autopilot-facing commands through `mavlink_layer`. Triggers post-flight MapObjects push (F8) on terminal state.
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| Mission JSON (parsed) | `mission_client` | once at start; on middle-waypoint update | Contains waypoints + INCLUSION/EXCLUSION geofences + return point. |
|
||||
| Airframe variant | startup config | once | `multirotor` or `fixed_wing`. |
|
||||
| MAVLink telemetry | `mavlink_layer` | continuous | Position, attitude, mode, sys-status, mission progress. |
|
||||
| Middle-waypoint hint | `scan_controller` (from `operator_bridge`) | event on operator confirm | Triggers mission re-upload. |
|
||||
| Target-follow release / loss / timeout | `scan_controller` | event | Triggers reverting to the original mission. |
|
||||
| Health input from peer components | health aggregator | continuous | Used for the health-check gate before takeoff. |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| MAVLink commands (arm, takeoff, set-mode, change-speed, change-alt, land, RTL, mission-clear, mission-upload, set-current) | `mavlink_layer` | per state transition |
|
||||
| UAV telemetry (forwarded) | `scan_controller`, `movement_detector`, `telemetry_stream` | continuous |
|
||||
| Mission state | `scan_controller`, `operator_bridge` | event on transitions |
|
||||
| Health metric | health aggregator | current state, `state_duration_ms`, `transition_failures_by_state`, geofence violations, retry counts. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Run the variant-specific state machine (see `architecture.md §7.7`):
|
||||
- **Multirotor**: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
|
||||
- **Fixed-wing**: `DISCONNECTED → CONNECTED → HEALTH_OK → BIT_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND → POST_FLIGHT_SYNC → DONE`.
|
||||
- Apply bounded retry with exponential backoff at every transition; explicit max-retry; on exceeding it, health flips to red and the executor surfaces the failure via `operator_bridge`. **No infinite retry.**
|
||||
- **Run pre-flight BIT (F9)** before transitioning to `ARMED` / `WAIT_AUTO`. BIT covers every dependency in `architecture.md §5` plus mission load + MapObjects pre-flight pull (cached fallback acknowledged) + persistent-store free space + wall-clock binding. On BIT FAIL, no transition. On DEGRADED, surface to operator for signed acknowledgement (per Q9).
|
||||
- **Run the lost-link failsafe ladder (F10)** every tick: `LinkOk → LinkDegraded → LinkLost → LinkLostInFollow`. Default RTL after 30 s grace; configurable. MAVLink-link loss to the flight controller (ArduPilot / PX4) is a separate, more severe event: health → red, and the airframe failsafe takes over (we do NOT override it).
|
||||
- **Enforce battery / fuel thresholds.** Read `SYS_STATUS` / `EXTENDED_SYS_STATE` continuously; trigger RTL at `battery ≤ rtl_threshold` (default 25 %); land-now at `battery ≤ hard_floor` (default 15 %); operator override only via signed command.
|
||||
- Enforce geofences. INCLUSION violations halt forward progress and trigger RTL; EXCLUSION violations trigger the same. Both are honoured (the earlier C++ behaviour silently ignored EXCLUSION; the new design rejects that).
|
||||
- On middle-waypoint hint: recompute the mission (`current_position → middle_waypoint → resume_original_route`), `MISSION_CLEAR_ALL`, re-upload via the standard sequence, `MISSION_SET_CURRENT(0)`, and resume.
|
||||
- On target-follow ending: recompute and re-upload the original mission from the current position; resume.
|
||||
- **Trigger post-flight MapObjects push (F8)** on entry to `POST_FLIGHT_SYNC` — that is, after `LAND` completes (or after RTL completes, or after operator-acknowledged abort). Hand off to `mission_client`.
|
||||
- Forward MAVLink telemetry to `scan_controller` (for proximity priority + middle-waypoint computation), to `movement_detector` (for ego-motion compensation), and to `telemetry_stream` (for operator overlay).
|
||||
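The two variant-specific happy paths above can be written as a single typed transition function. This is a sketch only: the state names follow the sequences in this section, while `Variant`, `State`, and `next_state` are hypothetical identifiers, and failure, retry, and RTL edges are deliberately left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Variant {
    Multirotor,
    FixedWing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Disconnected,
    Connected,
    HealthOk,
    BitOk,
    Armed,
    TakeOff,
    MissionUploaded,
    WaitAuto,
    FlyMission,
    Land,
    PostFlightSync,
    Done,
}

/// Next state on the nominal path, or `None` when there is none
/// (terminal `Done`, or a state this variant never enters).
pub fn next_state(variant: Variant, state: State) -> Option<State> {
    use State::*;
    Some(match (variant, state) {
        (_, Disconnected) => Connected,
        (_, Connected) => HealthOk,
        (_, HealthOk) => BitOk,
        // Multirotor arms and takes off before the mission upload.
        (Variant::Multirotor, BitOk) => Armed,
        (Variant::Multirotor, Armed) => TakeOff,
        (Variant::Multirotor, TakeOff) => MissionUploaded,
        (Variant::Multirotor, MissionUploaded) => FlyMission,
        // Fixed-wing uploads first, then waits for the pilot to switch to AUTO.
        (Variant::FixedWing, BitOk) => MissionUploaded,
        (Variant::FixedWing, MissionUploaded) => WaitAuto,
        (Variant::FixedWing, WaitAuto) => FlyMission,
        (_, FlyMission) => Land,
        (_, Land) => PostFlightSync,
        (_, PostFlightSync) => Done,
        _ => return None,
    })
}
```

Keeping the variant in the match key means an invalid pair such as fixed-wing `Armed` has no successor at all, rather than relying on a runtime flag.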
|
||||
## 5. Internal State
|
||||
|
||||
- Current state + variant.
|
||||
- Currently active mission (original) + active patched mission (with middle waypoint), if any.
|
||||
- Per-transition retry counter and last-failure reason.
|
||||
- Mission progress (current item index).
|
||||
- Geofence violation history (for diagnostics).
|
||||
|
||||
State is in-process only; restart re-runs the state machine from `DISCONNECTED`.
|
||||
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| MAVLink connection lost | heartbeat timeout from `mavlink_layer` | Bounded retry; health → red after threshold; state machine pauses (does not reset). |
|
||||
| Health-check gate fails (sensors not ok, low battery, etc.) | telemetry inspection | Stay in `CONNECTED` state; alert; no takeoff. |
|
||||
| BIT FAIL on any item | F9 evaluation | No transition to `BIT_OK`; remain in `HEALTH_OK`; surface the report to the operator. |
|
||||
| Mission upload `MISSION_ACK` rejection | `mavlink_layer` response | Bounded retry with full re-upload; on max-retry, health → red, surface to operator. |
|
||||
| Geofence INCLUSION exit | telemetry vs polygon | Trigger RTL via MAVLink; surface alert; transition to `LAND`. |
|
||||
| Geofence EXCLUSION entry | telemetry vs polygon | Trigger RTL via MAVLink; surface alert; transition to `LAND`. |
|
||||
| Operator/Ground-Station modem link lost | F10 ladder evaluation | `LinkDegraded` (5–30 s) → health yellow + queue events; `LinkLost` (>30 s) → RTL; `LinkLostInFollow` (>30 s in target-follow) → 30 s grace then RTL. Configurable. |
|
||||
| MAVLink-link loss to ArduPilot/PX4 | heartbeat timeout | Health → red; airframe's own MAVLink failsafe takes over (we do NOT override). |
|
||||
| Battery ≤ `rtl_threshold` (default 25 %) | `SYS_STATUS` | Trigger RTL; surface alert; transition to `LAND`. |
|
||||
| Battery ≤ `hard_floor` (default 15 %) | `SYS_STATUS` | Land-now via `MAV_CMD_NAV_LAND` at safest reachable point; health → red. |
|
||||
| Operator override of safety threshold | signed command (Q9) | Permitted; recorded in audit log with operator ID + rationale. |
|
||||
| Middle-waypoint compute fails (e.g., target outside INCLUSION) | pre-upload validation | Reject the hint with reason; surface to `operator_bridge`; original mission continues. |
|
||||
| Target-follow handover from `scan_controller` while not yet airborne | state guard | Reject; surface error; never deliver target-follow before `FLY_MISSION`. |
|
||||
| Post-flight MapObjects push fails | F8 status | Persist pending diff on disk; bounded retry; operator-visible warning after max retries. State machine still reaches `DONE` so a new mission can start. |
|
||||
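The lost-link ladder and battery rows in the table above reduce to two small pure functions. The sketch below uses the documented defaults (5 s, 30 s, 30 s grace, 25 %, 15 %); every type and function name is hypothetical, and how an operator override feeds in is not modelled.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkState {
    LinkOk,
    LinkDegraded,
    LinkLost,
    LinkLostInFollow,
}

/// Configurable ladder thresholds; `Default` carries the documented values.
#[derive(Debug, Clone, Copy)]
pub struct LinkThresholds {
    pub degraded_after: Duration,
    pub lost_after: Duration,
    pub follow_grace: Duration,
}

impl Default for LinkThresholds {
    fn default() -> Self {
        Self {
            degraded_after: Duration::from_secs(5),
            lost_after: Duration::from_secs(30),
            follow_grace: Duration::from_secs(30),
        }
    }
}

/// Classifies the operator link from the time since the last message.
pub fn classify_link(silence: Duration, in_target_follow: bool, t: &LinkThresholds) -> LinkState {
    if silence > t.lost_after {
        if in_target_follow { LinkState::LinkLostInFollow } else { LinkState::LinkLost }
    } else if silence >= t.degraded_after {
        LinkState::LinkDegraded
    } else {
        LinkState::LinkOk
    }
}

/// RTL fires immediately on `LinkLost`, and after the extra grace in follow.
pub fn should_rtl(silence: Duration, in_target_follow: bool, t: &LinkThresholds) -> bool {
    match classify_link(silence, in_target_follow, t) {
        LinkState::LinkLost => true,
        LinkState::LinkLostInFollow => silence > t.lost_after + t.follow_grace,
        _ => false,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatteryAction {
    Continue,
    Rtl,
    LandNow,
}

/// The hard floor is checked first so it always wins over the RTL threshold.
pub fn battery_action(remaining_pct: u8, rtl_threshold: u8, hard_floor: u8) -> BatteryAction {
    if remaining_pct <= hard_floor {
        BatteryAction::LandNow
    } else if remaining_pct <= rtl_threshold {
        BatteryAction::Rtl
    } else {
        BatteryAction::Continue
    }
}
```

Because both functions are pure, they can be evaluated on every tick and unit-tested without a MAVLink connection.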
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process** (input): `mission_client`, `mavlink_layer`, `scan_controller`, health aggregator.
|
||||
**In-process** (output): `mavlink_layer`, `scan_controller`, `movement_detector`, `telemetry_stream`, `operator_bridge`.
|
||||
|
||||
**External**: ArduPilot / PX4 over MAVLink (mediated by `mavlink_layer`).
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Time-to-takeoff (multirotor, healthy startup) | bounded; no infinite waits |
|
||||
| Mission-upload retry budget | configurable max; default 3 attempts |
|
||||
| Geofence response time | ≤500 ms from violation detection to RTL command |
|
||||
| Middle-waypoint re-upload | ≤2 s end-to-end |
|
||||
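The symmetric geofence check behind the response-time target above could look like the following. It is a sketch under stated assumptions: positions are already projected into a local planar frame (metres), polygons are simple, and `Point`, `FenceKind`, `Geofence`, and `violates` are hypothetical names rather than the shapes defined in `data_model.md`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FenceKind {
    Inclusion,
    Exclusion,
}

#[derive(Debug, Clone)]
pub struct Geofence {
    pub kind: FenceKind,
    pub polygon: Vec<Point>,
}

/// Ray-casting point-in-polygon test: count how many edges a horizontal ray
/// from `p` crosses; an odd count means the point is inside.
pub fn contains(polygon: &[Point], p: Point) -> bool {
    let n = polygon.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (polygon[i], polygon[j]);
        // Only edges that straddle the ray's y can be crossed, so the
        // division below never sees a zero denominator.
        if (a.y > p.y) != (b.y > p.y) {
            let x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if p.x < x_cross {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// INCLUSION is violated outside the polygon, EXCLUSION inside it.
pub fn violates(fence: &Geofence, p: Point) -> bool {
    match fence.kind {
        FenceKind::Inclusion => !contains(&fence.polygon, p),
        FenceKind::Exclusion => contains(&fence.polygon, p),
    }
}
```

Both fence kinds go through the same `contains` routine, which keeps EXCLUSION from being skipped by a separate code path.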
|
||||
## 9. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles` (bounded retry, geofence symmetric, lost-link mandatory, BIT mandatory, MapObjects mission-bracketed), `§7.3 Reliability and safety`, `§7.7 MAVLink and Piloting` (lost-link ladder + battery thresholds).
|
||||
- `system-flows.md §F6 Mission lifecycle`, `§F8 MapObjects sync`, `§F9 Pre-flight self-test`, `§F10 Lost-link failsafe ladder`.
|
||||
- `data_model.md §MissionItem`, `§MissionWaypoint`, `§Geofence`.
|
||||
@@ -0,0 +1,96 @@
|
||||
# Component — `movement_detector`
|
||||
|
||||
**Layer**: Perception (data plane in)
|
||||
**Status**: forward-looking design (Rust + OpenCV bindings; learned-CV fallback per `architecture.md §8 Q14`)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Detect small moving point/cluster candidates that are not yet classifiable by Tier 1, in **both** the zoom-out and zoom-in scan levels, and enqueue them as POIs for confirmation. Compensates for UAV and gimbal motion using synchronised telemetry; naive frame differencing is rejected.
|
||||
|
||||
The component is suppressed only during `scan_controller`'s `TargetFollow` state (the gimbal is dominated by tracking commands during follow).
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| `Frame` | `frame_ingest` | up to 30 fps | Skipped when `ai_locked` is set. In `TargetFollow`, frames are still consumed to keep the motion history warm, but no candidates are emitted (see §4). |
|
||||
| Gimbal angle (yaw, pitch) | `gimbal_controller` | per frame, monotonic-timestamped | Telemetry-skew gate: reject samples where frame ↔ gimbal skew exceeds the configured tolerance for the current zoom band. |
|
||||
| Zoom state | `gimbal_controller` | per frame, monotonic-timestamped | Drives zoom-band selection (`zoomed_out` vs `zoomed_in`) and per-band thresholds; also used for residual-motion scaling. |
|
||||
| UAV motion telemetry | `mavlink_layer` (via `mission_executor`) | 10 Hz target | Position + attitude + velocity + monotonic timestamp. |
|
||||
| Active-state hint | `scan_controller` | event | `enable_zoomed_out` / `enable_zoomed_in` / `disable` (the latter is set during `TargetFollow`). |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| `MovementCandidate` | `scan_controller` | `{ frame_seq, bbox_normalized, residual_velocity_estimate, telemetry_quality, source_frame_ts, source_zoom_band }` |
|
||||
| Health metric | health aggregator | `enabled`, `current_zoom_band`, `candidates_per_min_zoomed_out`, `candidates_per_min_zoomed_in`, `telemetry_skew_drops_total`, `compensation_quality_per_band`. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Compute per-frame ego-motion using OpenCV optical flow / global motion estimation (e.g. sparse pyramidal Lucas-Kanade tracking, dense Farnebäck flow, or a feature-based homography), refined by the synchronised gimbal + UAV telemetry.
|
||||
- Subtract estimated ego-motion from per-pixel motion; cluster the residuals.
|
||||
- Emit clusters that meet the **per-zoom-band** minimum size + persistence threshold as `MovementCandidate`s, capped to honour the system-wide ≤5 POIs/min operator-review budget shared with `scan_controller`.
|
||||
- Self-disable in `TargetFollow`. The component still consumes frames while disabled (to keep its motion-history warm) but emits no candidates.
|
||||
- Tag each emitted candidate with `source_zoom_band` so `scan_controller` can apply zoom-band-aware queueing logic (described in `system-flows.md §F2`).
|
||||
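The "subtract ego-motion, keep the residuals" step can be illustrated without OpenCV. The sketch below swaps in a much simpler ego-motion model than the one described above: a single global translation taken as the per-axis median of the flow vectors, standing in for the homography refined by gimbal and UAV telemetry. `Flow` and `residual_movers` are hypothetical names, and clustering and persistence are out of scope.

```rust
/// One tracked point: image position plus measured flow, in pixels per frame.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Flow {
    pub x: f32,
    pub y: f32,
    pub dx: f32,
    pub dy: f32,
}

/// Upper median of a non-empty vector.
fn median(mut values: Vec<f32>) -> f32 {
    values.sort_by(|a, b| a.total_cmp(b));
    values[values.len() / 2]
}

/// Subtracts the estimated ego-motion from every flow vector and returns the
/// points whose residual magnitude reaches `residual_floor_px`.
/// The returned `dx` / `dy` hold the residual, not the raw flow.
pub fn residual_movers(flows: &[Flow], residual_floor_px: f32) -> Vec<Flow> {
    if flows.is_empty() {
        return Vec::new();
    }
    // Simplified ego-motion: most points belong to the static background,
    // so the median flow approximates the camera-induced motion.
    let ego_dx = median(flows.iter().map(|f| f.dx).collect());
    let ego_dy = median(flows.iter().map(|f| f.dy).collect());
    flows
        .iter()
        .map(|f| Flow { dx: f.dx - ego_dx, dy: f.dy - ego_dy, ..*f })
        .filter(|r| r.dx.hypot(r.dy) >= residual_floor_px)
        .collect()
}
```

A pure-translation model breaks down under rotation and zoom changes, which is one reason the design calls for telemetry-refined compensation rather than anything this simple.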
|
||||
## 5. Per-zoom-band tuning
|
||||
|
||||
The same code path runs at zoom-out and zoom-in, but the configuration differs because the pixel-to-metre ratio differs by ~10×.
|
||||
|
||||
| Knob | Zoom-out (typical) | Zoom-in (typical) |
|
||||
|---|---|---|
|
||||
| Cluster persistence threshold | 3–5 frames | 6–10 frames (gimbal-pan-induced flicker is more frequent at narrow FOV) |
|
||||
| Residual-velocity floor | low (small physical motion is enough) | higher (small physical motion is amplified pixel-wise; raising the floor reduces FP from compensation residuals) |
|
||||
| Telemetry-skew tolerance | 50 ms frame ↔ gimbal, 100 ms frame ↔ UAV | 25 ms frame ↔ gimbal, 50 ms frame ↔ UAV (stricter — gimbal slewing dominates zoomed FOV) |
|
||||
| Enqueue-latency budget | ≤1 s | ≤1.5 s (allows brief gimbal-stability window) |
|
||||
| FP cap (per-band) | per `architecture.md §6 NFR` | per `architecture.md §6 NFR`; if exceeded, fallback per Q14 |
|
||||
|
||||
Exact values are mission-tunable; defaults are calibrated during the benchmark gate.
|
||||
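A per-band configuration and the telemetry-skew gate it drives might be expressed as below. The skew tolerances and latency budgets are the typical values from the table; the persistence values are the lower ends of the listed ranges, picked only for illustration. `ZoomBand`, `BandTuning`, and `telemetry_in_sync` are hypothetical names.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoomBand {
    ZoomedOut,
    ZoomedIn,
}

#[derive(Debug, Clone, Copy)]
pub struct BandTuning {
    pub min_persistence_frames: u32,
    pub gimbal_skew_tolerance: Duration,
    pub uav_skew_tolerance: Duration,
    pub enqueue_latency_budget: Duration,
}

fn abs_diff(a: Duration, b: Duration) -> Duration {
    if a > b { a - b } else { b - a }
}

impl BandTuning {
    /// Typical values from the tuning table; real defaults come out of the
    /// benchmark gate and are mission-tunable.
    pub fn typical(band: ZoomBand) -> Self {
        match band {
            ZoomBand::ZoomedOut => Self {
                min_persistence_frames: 3,
                gimbal_skew_tolerance: Duration::from_millis(50),
                uav_skew_tolerance: Duration::from_millis(100),
                enqueue_latency_budget: Duration::from_millis(1000),
            },
            ZoomBand::ZoomedIn => Self {
                min_persistence_frames: 6,
                gimbal_skew_tolerance: Duration::from_millis(25),
                uav_skew_tolerance: Duration::from_millis(50),
                enqueue_latency_budget: Duration::from_millis(1500),
            },
        }
    }

    /// Skew gate: all three monotonic timestamps must agree within the
    /// band's tolerances, otherwise the frame's compensation is dropped.
    pub fn telemetry_in_sync(&self, frame_ts: Duration, gimbal_ts: Duration, uav_ts: Duration) -> bool {
        abs_diff(frame_ts, gimbal_ts) <= self.gimbal_skew_tolerance
            && abs_diff(frame_ts, uav_ts) <= self.uav_skew_tolerance
    }
}
```

For example, a frame with 40 ms of gimbal skew passes the zoom-out gate but is dropped at zoom-in, where the tolerance is 25 ms.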
|
||||
## 6. Internal State
|
||||
|
||||
- Rolling motion-history buffer (a few seconds of frames + telemetry). One buffer per zoom band; switching bands does not invalidate the buffer for the other.
|
||||
- Per-cluster persistence counters (per zoom band).
|
||||
- Telemetry-sync state machine.
|
||||
- `current_zoom_band` derived from `gimbal_controller`'s zoom state.
|
||||
|
||||
State is in-process only.
|
||||
|
||||
## 7. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| Telemetry skew above tolerance (per zoom band) | timestamp delta exceeds threshold | Drop that frame's compensation; do not emit candidates for the affected window; counter-tagged drop. |
|
||||
| Optical-flow degenerate | flow magnitudes implausible (e.g. camera failure, full motion blur) | Skip emission for that frame; surface as a health signal on sustained occurrence. |
|
||||
| Sustained candidate flood at zoom-in (FP cap exceeded) | `candidates_per_min_zoomed_in` over a sliding window | Suppress zoom-in emission only; keep zoom-out emission running; surface health → yellow; this is the trigger condition for the Q14 fallback. |
|
||||
| Sustained candidate flood at zoom-out (FP cap exceeded) | `candidates_per_min_zoomed_out` over a sliding window | Down-rank lowest-confidence candidates; surface health → yellow; never silently drop without counting. |
|
||||
| Component disabled by `scan_controller` | active-state hint = `disable` | Emit zero candidates; keep motion history warm. |
|
||||
|
||||
## 8. Dependencies
|
||||
|
||||
**In-process**: `frame_ingest`, `gimbal_controller`, `mavlink_layer` (telemetry forwarded via `mission_executor`), `scan_controller`.
|
||||
|
||||
**External**: OpenCV (patched, version-pinned). Optional: a learned-CV crate / module (RAFT-derivative or CNN motion-segmentation) behind a build-time feature flag — engaged only when the Q14 fallback is required.
|
||||
|
||||
## 9. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Candidate enqueue latency (zoom-out) | ≤1 s from detection to POI in queue |
|
||||
| Candidate enqueue latency (zoom-in) | ≤1.5 s from detection to POI in queue |
|
||||
| False-positive rate at the operator surface | bounded by `scan_controller`'s ≤5 POIs/min cap; per-zoom-band internal caps prevent zoom-in starving zoom-out |
|
||||
| CPU budget on Jetson | configurable; must coexist with Tier 1 (running in `../detections`) and Tier 2 |
|
||||
| Telemetry-skew tolerance | per-zoom-band; defaults in §5 |
|
||||
|
||||
## 10. Open Questions
|
||||
|
||||
- **Q14 fallback selection** (`architecture.md §8`): if classical OpenCV fails the per-zoom-band FP cap at zoom-in, the choice of fallback module (learned optical flow vs CNN motion-segmentation vs IMU-tighter-coupled classical) is open. The interface contract is fixed (`Frame + telemetry → Vec<MovementCandidate>`).
|
||||
- Minimum cluster persistence threshold across zoom bands (refined during benchmark gate).
|
||||
- Whether to keep the per-band motion-history buffers across zoom-band transitions or reset them on transition (§6 currently keeps one buffer per band and preserves both).
|
||||
|
||||
## 11. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles` (ego-motion compensation mandatory; movement runs at both zoom levels), `§7.6 Movement detector`, `§8 Q14`.
|
||||
- `system-flows.md §F2 Movement detection (zoom-out + zoom-in)`.
|
||||
- `data_model.md §MovementCandidate`.
|
||||
@@ -0,0 +1,89 @@
|
||||
# Component — `operator_bridge`
|
||||
|
||||
**Layer**: Action (data plane out)
|
||||
**Status**: forward-looking design (Rust)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Surfaces POIs to the operator (via the always-on `telemetry_stream`) and routes operator commands (confirm / decline / target-follow start / target-follow release / safety-override / BIT-degraded-acknowledge) back into autopilot. On decline, appends an `IgnoredItem`. On confirm, hands a middle-waypoint hint to `mission_executor`. On target-follow start / release, drives `scan_controller`'s state transition. **Validates every operator command's authentication signature, replay protection, and session binding before dispatching it** — the modem link's encryption alone is not sufficient (per `architecture.md §5` and Q9).
|
||||
|
||||
The Ground Station is the operator-facing UI; `operator_bridge` is the autopilot-side counterpart.
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| POI surface request | `scan_controller` | per POI | Includes Tier 1, Tier 2, and (optional) Tier 3 evidence. |
|
||||
| POI dequeue / replace | `scan_controller` | event | When the queue rotates (cap, age-out, or completion). |
|
||||
| Operator command (confirm / decline / target-follow start / target-follow release / safety-override / BIT-degraded-acknowledge) | Ground Station (via `telemetry_stream`) | event | Acked back to operator with command id + result. |
|
||||
| Modem link state | `telemetry_stream` | event | Used to decide whether to surface POIs at all (see Failure Modes). |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| Operator-facing POI event | `telemetry_stream` (which pushes to Ground Station) | `{ poi_id, mgrs, class_group, confidence, vlm_status, tier2_evidence_summary, photo_metadata }` |
|
||||
| `IgnoredItem` append | `mapobjects_store` (via `scan_controller`) | on operator decline |
|
||||
| Middle-waypoint hint | `mission_executor` (via `scan_controller`) | on operator confirm |
|
||||
| Target-follow start / release | `scan_controller` | on operator command |
|
||||
| Health metric | health aggregator | `pois_surfaced_per_min`, `decision_latency_p50/p99` (operator-side), `commands_in_flight`. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Translate `POI` events from `scan_controller` into the wire format defined in `architecture.md §7.10 Drone ⇄ Operator Sync Message Format` and push them through `telemetry_stream`.
|
||||
- Receive operator commands on the return path; **validate the authentication signature, replay-protection sequence number, and session token** before any other processing. Reject and surface to health on signature failure, sequence-number reuse, or unknown session.
|
||||
- Validate the command id matches a POI in flight (or a target-follow session, BIT report, or safety-override scope); ack the operator with the result.
|
||||
- Stamp each surfaced POI with its deadline from the confidence-scaled operator decision window (40 % → 30 s, 100 % → 120 s, linear). The timeout itself is enforced by `scan_controller`; this component only ensures the surfaced POI carries the deadline.
|
||||
- On confirm, hand `(target_mgrs, target_class)` to `scan_controller` (which forwards a middle-waypoint hint to `mission_executor`).
|
||||
- On decline, hand `(MGRS, class_group)` to `scan_controller` for `IgnoredItem` append.
|
||||
- Forward BIT-degraded acknowledgements (signed) to `mission_executor` (F9), and safety-override commands (signed) for battery / lost-link suppression to `mission_executor` (F10).
|
||||
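The validation order described above (signature, then session, then replay protection) can be sketched as a small gate. The authentication scheme itself is still open (Q9), so the signature check is reduced to a boolean computed elsewhere. `CommandGate`, `Reject`, and the strictly-increasing sequence rule are assumptions for illustration, not the committed design.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reject {
    AuthFailed,
    ReplayDetected,
}

/// Tracks the highest sequence number accepted per known session.
#[derive(Debug, Default)]
pub struct CommandGate {
    last_seq: HashMap<String, u64>,
}

impl CommandGate {
    /// Registers a session after the operator has authenticated.
    /// Sequence numbers for a fresh session start at 1.
    pub fn open_session(&mut self, token: &str) {
        self.last_seq.insert(token.to_owned(), 0);
    }

    /// Admits a command only if its signature verified, its session is
    /// known, and its sequence number is strictly newer than the last one.
    /// `signature_ok` stands in for whatever scheme Q9 settles on.
    pub fn admit(
        &mut self,
        session_token: &str,
        sequence_number: u64,
        signature_ok: bool,
    ) -> Result<(), Reject> {
        if !signature_ok {
            return Err(Reject::AuthFailed);
        }
        let last = self
            .last_seq
            .get_mut(session_token)
            .ok_or(Reject::AuthFailed)?;
        if sequence_number <= *last {
            return Err(Reject::ReplayDetected);
        }
        *last = sequence_number;
        Ok(())
    }
}
```

Rejected commands leave the stored sequence number untouched, so a forged message cannot advance the counter and lock out the legitimate operator. How retransmissions interact with the idempotency cache is not settled by this sketch.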
|
||||
## 5. Internal State
|
||||
|
||||
- Currently surfaced POIs by id (with deadlines).
|
||||
- In-flight target-follow session (if any).
|
||||
- Per-command idempotency keys.
|
||||
|
||||
State is in-process only.
|
||||
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| Modem link down | `telemetry_stream` health | Stop surfacing POIs; queue them in `scan_controller` (whose cap still applies); resume on reconnect. F10 lost-link ladder owns the larger response. |
|
||||
| Operator command for unknown POI id | command validation | Ack with error; do not act on it. |
|
||||
| Operator command after deadline | command validation | Ack with `expired`; do not act on it. |
|
||||
| Duplicate operator command (re-tx) | idempotency key | Ack with the cached result; do not double-act. |
|
||||
| `scan_controller` rejects the confirm (e.g., already in target-follow) | response from controller | Ack operator with `rejected: already_following`; surface the active target. |
|
||||
| Operator command signature invalid | auth check | Reject with `auth_failed`; log; surface health → red on sustained failures (potential hostile injection). |
|
||||
| Operator command sequence number reused | replay-protection check | Reject with `replay_detected`; log; do not act on it. |
|
||||
| Unknown session token | session validation | Reject with `auth_failed`; log; require operator re-auth at Ground Station. |
|
||||
| Operator attempts to acknowledge a BIT FAIL as DEGRADED | severity check | Rejected by validation; surface to operator as `cannot_acknowledge_fail`. |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process** (input): `scan_controller`, `telemetry_stream`.
|
||||
**In-process** (output): `scan_controller` (for state transitions), indirectly `mapobjects_store` and `mission_executor` (via `scan_controller`).
|
||||
|
||||
**External**: Ground Station API (operator-facing); contract owned by `../_docs/04_system_design_clarifications.md`.
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| POI surface → operator visible | ≤1 s under normal modem conditions |
|
||||
| Operator command → autopilot effect | ≤1 s under normal modem conditions |
|
||||
| Idempotency window | 60 s (per-command-id cache) |
|
||||
|
||||
## 9. Open Questions
|
||||
|
||||
- Ground Station API contract (`architecture.md §8 Q2`): stream protocol (WebRTC / WebSocket-H.264 / gRPC server-streaming?), session/auth model, bbox-overlay rendering.
|
||||
- **Operator-command authentication scheme** (`architecture.md §8 Q9`): HMAC over (session_token, sequence_number, payload) vs JWT-style ed25519 vs MAVLink-2 signing extended to operator commands vs separate envelope. The principle is committed; the scheme is open.
|
||||
- **Multi-operator session policy** (`architecture.md §8 Q11`): single active operator at a time, or quorum?
|
||||
|
||||
## 10. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles` (operator commands authenticated, signed, replay-protected), `§7.10 Drone ⇄ Operator Sync Message Format`, `§8 Q9 / Q11`.
|
||||
- `system-flows.md §F5 Operator round trip`, `§F9 Pre-flight self-test`, `§F10 Lost-link failsafe ladder`.
|
||||
- `data_model.md §POI`, `§IgnoredItem`, `§OperatorCommand`.
|
||||
- `../_docs/04_system_design_clarifications.md`.
|
||||
@@ -0,0 +1,96 @@
|
||||
# Component — `scan_controller`
|
||||
|
||||
**Layer**: Decision + Memory
|
||||
**Status**: forward-looking design (Rust)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
The system's brain: a deterministic typed state machine with three states, `ZoomedOut`, `ZoomedIn { roi, hold_started_at }`, and `TargetFollow { target_id, started_at }`. It owns the POI queue, timeouts, the ≤5 POIs/min operator-review cap, the confidence-scaled operator-decision window, gimbal command issuance, and the new/moved/existing/removed dispatch into `mapobjects_store`.
|
||||
|
||||
The full behaviour-tree spec — including tick scenarios and the 15 fixed-wing rules — lives in `system-flows.md §F4`.
|
||||
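As a rough illustration of the typed state machine described above, the three states map naturally onto a Rust enum with payloads. The state names and fields follow this section; `Roi`, `Event`, and `step` are hypothetical, and the event set is a coarse simplification of the behaviour tree in `system-flows.md §F4`.

```rust
use std::time::Instant;

/// Hypothetical normalized region of interest.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Roi {
    pub cx: f32,
    pub cy: f32,
    pub w: f32,
    pub h: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Coarse, illustrative events. `HoldEnded` covers decline, timeout, and
/// hold completion; `FollowEnded` covers release, loss, and timeout.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    PoiSelected(Roi),
    OperatorConfirmed { target_id: u64 },
    HoldEnded,
    FollowEnded,
}

/// Pure transition function: every accepted (state, event) pair is listed,
/// and anything else leaves the state unchanged.
pub fn step(state: ScanState, event: Event, now: Instant) -> ScanState {
    use ScanState::*;
    match (state, event) {
        (ZoomedOut, Event::PoiSelected(roi)) => ZoomedIn { roi, hold_started_at: now },
        (ZoomedOut | ZoomedIn { .. }, Event::OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        (ZoomedIn { .. }, Event::HoldEnded) => ZoomedOut,
        (TargetFollow { .. }, Event::FollowEnded) => ZoomedOut,
        // Example: a second confirm while already following is ignored here;
        // the rejection ack is sent by operator_bridge.
        (unchanged, _) => unchanged,
    }
}
```

Carrying `roi` and `target_id` inside the variants means they cannot exist outside the state they belong to, which is the point of avoiding ad-hoc booleans.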
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| `DetectionBatch` | `detection_client` | per frame | Tier 1 primitives. |
|
||||
| `MovementCandidate` | `movement_detector` | per frame at both zoom-out and zoom-in (suppressed only during `TargetFollow`) | Each candidate carries `source_zoom_band`. |
|
||||
| `Tier2Evidence` | `semantic_analyzer` | per zoom-in hold | Path / endpoint / concealment scoring. |
|
||||
| `VlmAssessment` | `vlm_client` (optional) | per zoom-in endpoint hold | `status: disabled` if VLM is off. |
|
||||
| Operator commands | `operator_bridge` | event | confirm / decline / target-follow start / target-follow release. Authenticated, signed, replay-protected upstream of this component. |
|
||||
| UAV telemetry | `mavlink_layer` (via `mission_executor`) | 10 Hz target | Position used for proximity-weighted POI priority and middle-waypoint computation. |
|
||||
| Mission state | `mission_executor` | event | Current waypoint, mission progress; used for sweep-vs-route alignment. |
|
||||
| MapObjects sync state | `mapobjects_store` | event at startup + post-flight | `synced` / `cached_fallback` / `degraded` — surfaces a health flag and (for `degraded`) suppresses MapObject diff classifications until corrected. |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| `GimbalCommand` (yaw / pitch / zoom) | `gimbal_controller` | per state-machine tick or per zoom-in plan step |
|
||||
| `POI` to operator | `operator_bridge` (then `telemetry_stream`) | enqueue / dequeue events |
|
||||
| Middle-waypoint hint | `mission_executor` | event on operator-confirmed target |
|
||||
| MapObjects update | `mapobjects_store` | new / moved / existing / removed dispatch |
|
||||
| Health metric | health aggregator | `state`, `pois_in_queue`, `pois_per_min`, `tick_latency_p99`, `last_state_change_ts`, `mapobjects_sync_state`. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Run the `ZoomedOut` / `ZoomedIn` / `TargetFollow` state machine. Transitions are explicit, typed, and fully enumerated; no ad-hoc booleans.
|
||||
- Maintain the POI queue ordered by `confidence × proximity_to_current_camera × age_factor`. Hard-cap output to ≤5 POIs/min surfaced to the operator.
|
||||
- Apply the confidence-scaled operator decision window (40 % → 30 s, 100 % → 120 s, linear; below 40 % the POI is not surfaced). Timeout = forget; decline = `IgnoredItem` entry via `mapobjects_store`.
|
||||
- Suppress new POIs whose `(MGRS, class_group)` matches an existing `IgnoredItem`.
|
||||
- For each new detection or movement candidate: compute the H3 cell, ask `mapobjects_store` to classify as new / moved / existing, and only surface non-existing entries.
|
||||
- **Zoom-in candidate handling.** When a `MovementCandidate` arrives with `source_zoom_band = zoomed_in`, evaluate against the current ROI: if inside, bump current-ROI confidence; if outside the ROI but inside the broader zoomed FOV, enqueue as a candidate-POI; only interrupt the current zoom-in hold if the candidate's priority exceeds the current hold's priority.
|
||||
- On operator confirmation: hand a middle-waypoint hint to `mission_executor`, transition to `TargetFollow`, and command `gimbal_controller` to keep the target in the centre 25 % of frame.
|
||||
- On operator decline, timeout, or target loss: return to `ZoomedOut`. Only a decline appends an `IgnoredItem`; a timeout is simply forgotten.
|
||||
- On `mapobjects_store` reporting `sync_state = degraded`, surface health → red and **do not** classify new detections (avoid corrupting the central observation log on next push); continue to surface POIs to the operator on Tier-1 + movement evidence alone.
|
||||
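The confidence-scaled decision window and the queue priority from the list above can be sketched in a few lines. This is a minimal illustration, not the component's actual code: the function and constant names are hypothetical, confidence is assumed to be a fraction in `[0, 1]`, and the three priority factors are assumed to be pre-normalised to the same range.

```rust
/// POIs below this confidence are never surfaced to the operator.
const CONFIDENCE_FLOOR: f64 = 0.40;
/// Decision window at the confidence floor (40 % -> 30 s).
const WINDOW_AT_FLOOR_S: f64 = 30.0;
/// Decision window at full confidence (100 % -> 120 s).
const WINDOW_AT_FULL_S: f64 = 120.0;

/// Operator decision window in seconds, linear between the two anchor points.
/// Returns `None` when the POI must not be surfaced (confidence below the
/// floor). Values above 1.0 and NaN are also rejected, which is an assumption.
pub fn decision_window_secs(confidence: f64) -> Option<f64> {
    if !(CONFIDENCE_FLOOR..=1.0).contains(&confidence) {
        return None;
    }
    let t = (confidence - CONFIDENCE_FLOOR) / (1.0 - CONFIDENCE_FLOOR);
    Some(WINDOW_AT_FLOOR_S + t * (WINDOW_AT_FULL_S - WINDOW_AT_FLOOR_S))
}

/// Queue priority as the plain product named in the list above.
/// Higher values are served first.
pub fn poi_priority(confidence: f64, proximity: f64, age_factor: f64) -> f64 {
    confidence * proximity * age_factor
}
```

Under these assumptions a POI at 70 % confidence gets a window of roughly 75 s, halfway between the two anchor points.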
|
||||
## 5. Internal State
|
||||
|
||||
The state machine lives entirely in this component. State variables:
|
||||
|
||||
- Current state: `ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }`.
|
||||
- POI queue: ordered, with per-entry priority and queue position.
|
||||
- Per-class operator-decision-window thresholds.
|
||||
- Last-N tick timestamps for tick-latency observability.
|
||||
- Frame-rate floor monitor: when sustained FPS < 10, suppress `ZoomedOut → ZoomedIn` transitions and surface health → yellow.
|
||||
|
||||
State is in-process only; restart starts in `ZoomedOut` with an empty queue.
|
||||
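One way to express the three-state machine as a typed Rust enum with an explicit transition function is sketched below. The state names and their fields come from the list above; the event names (`PoiSelected`, `HoldElapsed`, `OperatorReleased`, and so on), the integer id types, and the `fps_ok` parameter are assumptions made for illustration only.

```rust
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: u32, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Hypothetical event set; the real behaviour tree may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ScanEvent {
    PoiSelected { roi: u32 },
    OperatorConfirmed { target_id: u64 },
    OperatorDeclined,
    OperatorReleased,
    DecisionTimeout,
    HoldElapsed,
    TargetLost,
}

/// Pure transition function. `fps_ok` is false while sustained FPS is below
/// the floor, in which case ZoomedOut -> ZoomedIn is suppressed.
pub fn transition(state: ScanState, event: ScanEvent, now: Instant, fps_ok: bool) -> ScanState {
    use ScanEvent::*;
    use ScanState::*;
    match (state, event) {
        (ZoomedOut, PoiSelected { roi }) if fps_ok => ZoomedIn { roi, hold_started_at: now },
        // Either the FPS floor suppressed the zoom-in or the event does not apply.
        (ZoomedOut, _) => ZoomedOut,
        (ZoomedIn { .. }, OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        (ZoomedIn { .. }, OperatorDeclined | DecisionTimeout | HoldElapsed) => ZoomedOut,
        // Events that do not apply during a zoom-in hold leave the state unchanged.
        (s @ ZoomedIn { .. }, PoiSelected { .. } | TargetLost | OperatorReleased) => s,
        (TargetFollow { .. }, TargetLost | OperatorReleased) => ZoomedOut,
        (s @ TargetFollow { .. }, _) => s,
    }
}
```

Keeping the function pure (state and event in, state out) leaves side effects such as appending an `IgnoredItem` or commanding the gimbal to the caller, and makes the transition table easy to unit-test.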
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| `detection_client` health red | health input | Continue zoom-out sweep; emit no new POIs from Tier 1; movement candidates still flow. |
|
||||
| `movement_detector` health red | health input | Continue; lose movement-candidate enqueueing. |
|
||||
| `semantic_analyzer` health red | health input | Skip Tier 2; surface POIs with Tier-1-only evidence; flag in operator overlay. |
|
||||
| `vlm_client` returns `status: disabled \| timeout \| ipc_error \| schema_invalid` | per-call status | Surface POI without VLM evidence (fail-closed). |
|
||||
| `gimbal_controller` not ready | health input | Stay in current state; alert; do not silently drop scan steps. |
|
||||
| `operator_bridge` disconnected | health input | Continue zoom-out (operator UI is unreachable, but the system must not crash); pause POI surfacing; resume on reconnect. F10 lost-link ladder owns the larger response. |
|
||||
| `mapobjects_store` sync degraded | sync_state input | Suppress diff classifications; surface POIs on Tier-1 + movement only; health → red. |
|
||||
| Sustained FPS < 10 | self-instrumented | Suppress zoom-in transitions; health → yellow. |
|
||||
| Tick-latency above budget | self-instrumented | Health → yellow; investigate (likely upstream consumer slowness). |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process** (input): `detection_client`, `movement_detector`, `semantic_analyzer`, `vlm_client`, `operator_bridge`, `mission_executor`, `mapobjects_store`.
|
||||
**In-process** (output): `gimbal_controller`, `operator_bridge`, `mission_executor`, `mapobjects_store`.
|
||||
|
||||
**External**: none directly. All external integrations are mediated by other components.
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Tick latency | ≤10 ms p99 |
|
||||
| POI enqueue → operator surface | ≤1 s in normal load |
|
||||
| POI rate to operator | ≤5 POIs/min (hard cap) |
|
||||
| Zoom-out → zoom-in transition | ≤2 s including physical zoom |
|
||||
| Zoom-in hold duration | configurable; default 5 s/POI |
|
||||
| Target-follow centre-window | target inside centre 25 % of frame while visible |
|
||||
| Frame-rate floor | ≥10 fps sustained; below this, suppress zoom-in transitions |
|
||||
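The hard cap of ≤5 POIs/min could be enforced with a sliding-window limiter along the following lines. This is a sketch under stated assumptions: `PoiRateLimiter` and `try_surface` are hypothetical names, and the choice of a sliding window (rather than fixed one-minute buckets) is not specified by the source.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window limiter: at most `max_per_window` POIs surfaced within
/// any interval of length `window`.
pub struct PoiRateLimiter {
    max_per_window: usize,
    window: Duration,
    surfaced_at: VecDeque<Instant>,
}

impl PoiRateLimiter {
    pub fn new(max_per_window: usize, window: Duration) -> Self {
        Self { max_per_window, window, surfaced_at: VecDeque::new() }
    }

    /// Returns `true` and records the event if a POI may be surfaced at `now`;
    /// returns `false` (recording nothing) if the cap would be exceeded.
    pub fn try_surface(&mut self, now: Instant) -> bool {
        // Forget surfacing events that have aged out of the window.
        while let Some(&oldest) = self.surfaced_at.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.surfaced_at.pop_front();
            } else {
                break;
            }
        }
        if self.surfaced_at.len() < self.max_per_window {
            self.surfaced_at.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the limiter, keeps the logic deterministic under test.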
|
||||
## 9. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles`, `§6 NFR`, `§7.6 Scan controller and POI queue`, `§7.12 New vs Existing / Moved / Removed Object Detection`, `§7.13 MapObjects Sync`.
|
||||
- `system-flows.md §F4 Scan controller behaviour tree` (full BT spec, tick scenarios, 15 fixed-wing rules).
|
||||
- `data_model.md §POI`, `§IgnoredItem`, `§MapObject`.
|
||||
@@ -0,0 +1,70 @@
|
||||
# Component — `semantic_analyzer`
|
||||
|
||||
**Layer**: Perception (data plane in)
|
||||
**Status**: forward-looking design (Rust + ONNX/TensorRT bindings)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Tier 2 of the perception pipeline. Reasons over zoom-in crops using a primitive graph plus a lightweight ROI CNN. Active only when `scan_controller` is in `ZoomedIn`. Owns path-freshness scoring, endpoint scoring, branch choice at intersections, and concealment-POI scoring. Operates on bounded ROIs only — never full frames.
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| `DetectionBatch` (Tier 1 primitives) | `detection_client` | per zoom-in frame | Used for primitive-graph construction (paths, branches, entrances, trees). |
|
||||
| Zoom-in frame + ROI selection | `frame_ingest` (frame), `scan_controller` (ROI bounds) | per zoom-in hold | Bounded crop only; full frame is not consumed. |
|
||||
| Per-class config | startup config | once | Confidence floors, freshness thresholds, branch-priority rules. |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| `Tier2Evidence` | `scan_controller` | `{ roi_id, path_freshness, endpoint_score, concealment_score, recommended_next_action: PanFollowFootpath \| HoldEndpoint \| PanBroad \| ReturnToZoomOut, source_detections: Vec<DetectionId> }` |
|
||||
| `Pan plan` | `scan_controller` (then `gimbal_controller`) | sequence of pan goals for footpath following |
|
||||
| Health metric | health aggregator | `tier2_latency_p50/p99`, `roi_size_bytes_p99`, `errors_total`. |
|
||||
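The `Tier2Evidence` shape in the table above might map onto Rust types roughly as follows. The field names and the four `recommended_next_action` variants come from the table; the concrete numeric types, the `DetectionId` alias, the use of `Option` to model an undefined `path_freshness`, and the `graph_invalid` helper are all assumptions.

```rust
/// Assumed id type; the real `DetectionId` is defined elsewhere.
pub type DetectionId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextAction {
    PanFollowFootpath,
    HoldEndpoint,
    PanBroad,
    ReturnToZoomOut,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Tier2Evidence {
    pub roi_id: u64,
    /// `None` models an undefined freshness score.
    pub path_freshness: Option<f32>,
    pub endpoint_score: f32,
    pub concealment_score: f32,
    pub recommended_next_action: NextAction,
    pub source_detections: Vec<DetectionId>,
}

impl Tier2Evidence {
    /// Hypothetical helper for an inconsistent primitive graph: no freshness
    /// score, zeroed scores (an assumption), and a recommendation to zoom out.
    pub fn graph_invalid(roi_id: u64, source_detections: Vec<DetectionId>) -> Self {
        Self {
            roi_id,
            path_freshness: None,
            endpoint_score: 0.0,
            concealment_score: 0.0,
            recommended_next_action: NextAction::ReturnToZoomOut,
            source_detections,
        }
    }
}
```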
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Build a small primitive graph from Tier-1 detections inside the ROI: path nodes (footpaths, roads), endpoint nodes (branch piles, dark entrances, dugouts), context nodes (trees, tree blocks).
|
||||
- Score path freshness using the freshness model (texture, edge clarity, undisturbed-surroundings cues).
|
||||
- Score concealment for endpoint candidates.
|
||||
- At intersections, recommend the freshest / most-promising branch for `gimbal_controller` to pan toward (routed via `scan_controller`); emit a follow plan that keeps the path centred while the UAV moves.
|
||||
- Bound every inference call by a strict ROI size and timeout. Never run on a full frame.
|
||||
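Bounding every inference call by ROI size and wall-clock budget could look like the sketch below, which uses only the standard library. `run_bounded`, `Tier2Error`, and the closure standing in for the CNN session are hypothetical. One caveat: a worker that overruns its budget is detached rather than cancelled, so a real engine would also need its own cancellation or stream-level timeout.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Tier2Error {
    /// ROI rejected before any decoding or inference.
    Oversize { bytes: usize, limit: usize },
    /// Inference did not finish within the budget.
    Timeout,
    /// Worker thread died (e.g. panicked) before producing a result.
    WorkerFailed,
}

/// Runs `infer` on `roi` only if the ROI is within `max_roi_bytes`, and waits
/// at most `budget` for the result.
pub fn run_bounded<T, F>(
    roi: Vec<u8>,
    max_roi_bytes: usize,
    budget: Duration,
    infer: F,
) -> Result<T, Tier2Error>
where
    T: Send + 'static,
    F: FnOnce(Vec<u8>) -> T + Send + 'static,
{
    if roi.len() > max_roi_bytes {
        return Err(Tier2Error::Oversize { bytes: roi.len(), limit: max_roi_bytes });
    }
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(infer(roi));
    });
    match rx.recv_timeout(budget) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(Tier2Error::Timeout),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(Tier2Error::WorkerFailed),
    }
}
```

The size check runs before the worker is spawned, so an oversize ROI never reaches the inference path.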
|
||||
## 5. Internal State
|
||||
|
||||
- ROI-scoped primitive graphs (per-ROI lifetime; dropped on zoom-in exit).
|
||||
- Lightweight CNN session (ONNX/TensorRT engine).
|
||||
|
||||
State is in-process only.
|
||||
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| ROI size exceeds limit | pre-decode size check | Reject the ROI; surface to `scan_controller` as `tier2_oversize`; do not decode. |
|
||||
| Inference timeout (>200 ms) | wall-clock | Return `Tier2Evidence` with `status: timeout`; `scan_controller` decides to skip VLM and surface a low-evidence POI. |
|
||||
| CNN session OOM or hardware error | inference call error | Health → red on sustained errors; `scan_controller` falls back to Tier-1-only POI surfacing. |
|
||||
| Inconsistent primitive graph (e.g., disconnected paths) | graph validation step | Emit `Tier2Evidence` with `recommended_next_action: ReturnToZoomOut` and `path_freshness: undefined`. |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process**: `detection_client`, `frame_ingest`, `scan_controller`.
|
||||
|
||||
**External**: ONNX Runtime / TensorRT (whichever the lightweight CNN ships with), OpenCV (preprocessing).
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Per-ROI latency | ≤200 ms p99 |
|
||||
| Concealed-position recall | ≥60 % |
|
||||
| Concealed-position precision | ≥20 % (operators filter) |
|
||||
| Footpath detection recall | ≥70 % |
|
||||
| ROI memory footprint | bounded; no unbounded buffering |
|
||||
|
||||
## 9. References
|
||||
|
||||
- `architecture.md §3`, `§7.6 Tier 2 semantic analyzer`, `§7.5 Training Data`.
|
||||
- `system-flows.md §F1 Frame pipeline`, `§F4 Scan controller behaviour tree`.
|
||||
- `data_model.md §Tier2Evidence`.
|
||||
@@ -0,0 +1,78 @@
|
||||
# Component — `telemetry_stream`
|
||||
|
||||
**Layer**: Telemetry plane (always-on, parallel to the decision loop)
|
||||
**Status**: forward-looking design (Rust)
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Continuous, always-on push of the camera feed + UAV telemetry + bbox overlay to the Ground Station API over modem. Carries operator commands (confirm / decline / target-follow start / target-follow release) on the return path. Independent of the decision loop — the operator always sees the live feed, not just on detection.
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| Decoded `Frame` | `frame_ingest` | up to 30 fps | Re-encoded for the modem link bandwidth. |
|
||||
| `DetectionBatch` | `detection_client` | per frame | Used to build the bbox overlay (server-burn-in or client-render — see Open Questions). |
|
||||
| `MovementCandidate` (zoom-out + zoom-in) | `scan_controller` (forwarded) | per candidate | Surfaced in operator overlay; the `source_zoom_band` tag is preserved so the overlay can render zoom-out vs zoom-in candidates differently. |
|
||||
| UAV telemetry | `mavlink_layer` (via `mission_executor`) | 10 Hz | Position, attitude, mode, sys-status. |
|
||||
| Gimbal state | `gimbal_controller` | per change | yaw / pitch / zoom. |
|
||||
| `POI` events | `operator_bridge` | per POI surface / dequeue | Passed straight through. |
|
||||
| Operator commands | Ground Station (return path) | event | Forwarded to `operator_bridge`. |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| Outbound stream | Ground Station API (over modem) | per stream protocol (TBD — see Open Questions) |
|
||||
| Inbound operator commands | `operator_bridge` | event |
|
||||
| Health metric | health aggregator | `link_state`, `bandwidth_used_mbps`, `frame_drop_rate`, `last_command_received_ts`. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Encode and push the camera feed + telemetry + bbox overlay continuously, regardless of detection state.
|
||||
- Apply bandwidth-aware rate adaptation (drop bbox-overlay frequency before frame frequency; drop frame frequency before resolution).
|
||||
- Surface the modem link state to the health aggregator; `operator_bridge` consults this to decide whether to surface POIs.
|
||||
- Receive operator commands on the return path; forward to `operator_bridge` with monotonic timestamps.
|
||||
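The degradation order for rate adaptation (overlay frequency first, then frame frequency, then resolution) can be illustrated with a single step-down function. The `StreamProfile` fields, the halving policy, and all three floor values are hypothetical; only the ordering is taken from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamProfile {
    pub overlay_hz: u32,
    pub frame_fps: u32,
    pub height_px: u32,
}

// Hypothetical floors; real values would come from link characterisation.
const MIN_OVERLAY_HZ: u32 = 1;
const MIN_FRAME_FPS: u32 = 5;
const MIN_HEIGHT_PX: u32 = 360;

/// Applies one degradation step under bandwidth pressure. Only one dimension
/// changes per call, which avoids abrupt full-quality to no-feed jumps.
pub fn step_down(p: StreamProfile) -> StreamProfile {
    if p.overlay_hz > MIN_OVERLAY_HZ {
        StreamProfile { overlay_hz: (p.overlay_hz / 2).max(MIN_OVERLAY_HZ), ..p }
    } else if p.frame_fps > MIN_FRAME_FPS {
        StreamProfile { frame_fps: (p.frame_fps / 2).max(MIN_FRAME_FPS), ..p }
    } else if p.height_px > MIN_HEIGHT_PX {
        StreamProfile { height_px: (p.height_px / 2).max(MIN_HEIGHT_PX), ..p }
    } else {
        // Already at every floor; nothing left to shed.
        p
    }
}
```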
|
||||
## 5. Internal State
|
||||
|
||||
- Stream session handle.
|
||||
- Rate-adaptation state machine.
|
||||
- In-flight frame buffer (bounded).
|
||||
|
||||
State is in-process only.
|
||||
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| Modem link down | transport error / heartbeat | Surface `link_lost`; pause outbound push (do not buffer indefinitely); `operator_bridge` pauses POI surfacing. |
|
||||
| Bandwidth saturation | adaptive monitor | Reduce bbox-overlay rate, then frame rate, then resolution; surface to health → yellow. |
|
||||
| Inbound command unparseable | parser error | Reject; ack with error; do not act. |
|
||||
| Inbound command from unauthenticated peer | session check (per Ground Station contract) | Reject; alert. |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process** (input): `frame_ingest`, `detection_client`, `scan_controller`, `mavlink_layer`, `gimbal_controller`, `operator_bridge`.
|
||||
**In-process** (output): `operator_bridge` (return-path commands).
|
||||
|
||||
**External**: Ground Station API. Contract owner: `../_docs/04_system_design_clarifications.md`.
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| End-to-end glass-to-operator latency | bounded by modem characteristics; target ≤2 s p99 on a healthy link |
|
||||
| Always-on | yes; not detection-gated |
|
||||
| Rate adaptation | smooth; no sudden full-resolution → no-feed transitions |
|
||||
| Outbound buffering | bounded; no unbounded growth on slow link |
|
||||
|
||||
## 9. Open Questions
|
||||
|
||||
- **Ground Station API contract** (`architecture.md §8 Q2`): stream protocol (WebRTC / WebSocket-H.264 / gRPC server-streaming?), session/auth model, bbox-overlay rendering (server-side burn-in vs client-side render).
|
||||
|
||||
## 10. References
|
||||
|
||||
- `architecture.md §3`, `§5 Architectural Principles` (always-on stream, no silent error swallowing), `§7.6 Integration and reliability`.
|
||||
- `system-flows.md §F5 Operator round trip`.
|
||||
- `../_docs/04_system_design_clarifications.md`.
|
||||
@@ -0,0 +1,82 @@
|
||||
# Component — `vlm_client` (optional)
|
||||
|
||||
**Layer**: Perception (data plane in)
|
||||
**Status**: forward-looking design (Rust); optional behind a feature flag and a runtime config flag
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
Tier 3 of the perception pipeline. Asks a local NanoLLM/VILA1.5-3B process to confirm a zoom-in endpoint POI using one bounded ROI crop and a short prompt. Returns a structured `VlmAssessment`. The free-form VLM text is **not** a downstream API contract — only the validated structured output is.
|
||||
|
||||
VLM is optional; the system MUST function correctly when VLM is disabled or absent.
|
||||
|
||||
## 2. Inputs
|
||||
|
||||
| Input | Source | Cadence | Notes |
|
||||
|---|---|---|---|
|
||||
| Zoom-in ROI crop + prompt | `scan_controller` | per zoom-in endpoint hold | One bounded crop, short prompt, short answer. |
|
||||
| `vlm_enabled` runtime flag | startup config | once at start (re-readable on SIGHUP if implemented) | Gates whether `scan_controller` calls this component at all. |
|
||||
| IPC socket path | startup config | once | Unix-domain socket to the NanoLLM process. |
|
||||
|
||||
## 3. Outputs
|
||||
|
||||
| Output | Consumer | Shape |
|
||||
|---|---|---|
|
||||
| `VlmAssessment` | `scan_controller` | `{ label, confidence, status: ok \| inconclusive \| timeout \| schema_invalid \| ipc_error \| disabled, source_roi_id, latency_ms, model_version }` |
|
||||
| Health metric | health aggregator | `enabled`, `vlm_latency_p50/p99`, `errors_by_kind`, `peer_cred_check_pass_rate`. |
|
||||
|
||||
## 4. Key Responsibilities
|
||||
|
||||
- Validate the ROI payload (size, format) **before** sending it across the IPC channel.
|
||||
- Maintain the Unix-domain-socket connection to the NanoLLM process; perform a peer-credential check on connect (where supported by the platform).
|
||||
- Send one bounded ROI + short prompt; await one short response within ≤5 s.
|
||||
- Validate the response against the `VlmAssessment` schema; on schema-invalid, return `status: schema_invalid` to `scan_controller` and surface to health.
|
||||
- Return `status: disabled` when the runtime flag is `false`; `scan_controller` treats this identically to "VLM not present" and proceeds with Tier 2 evidence alone.
|
||||
- Capture `model_version` (whatever the NanoLLM process reports for its loaded weights) on every assessment for forensic correlation; log the version on change.
|
||||
|
||||
## 5. Internal State
|
||||
|
||||
- IPC socket handle and peer-credential cache.
|
||||
- In-flight request map (request id → caller).
|
||||
|
||||
State is in-process only.
|
||||
|
||||
## 6. Failure Modes
|
||||
|
||||
| Failure | Detection | Behaviour |
|
||||
|---|---|---|
|
||||
| VLM process not reachable | connect / send error | Return `status: ipc_error`; bounded-backoff reconnect; health → yellow then red. |
|
||||
| Peer-cred check fails | platform API | Hard-fail the connect; do not retry without operator intervention; health → red. |
|
||||
| Response timeout (>5 s) | wall-clock | Return `status: timeout`; do not block `scan_controller` past the budget. |
|
||||
| Schema-invalid response | response parser | Return `status: schema_invalid`; log the raw response (size-capped) for offline analysis. |
|
||||
| ROI payload too large | pre-send size check | Return `status: schema_invalid` synchronously; never send. |
|
||||
| Optional component absent at build time | feature flag off at compile | `scan_controller` depends only on the `VlmAssessment` provider trait; the default impl returns `status: disabled`. The binary builds and runs identically without `vlm_client`. |
|
||||
|
||||
## 7. Dependencies
|
||||
|
||||
**In-process**: `scan_controller`.
|
||||
|
||||
**External**: NanoLLM / VILA1.5-3B local process. IPC over Unix-domain socket. No network egress.
|
||||
|
||||
## 8. Non-Functional Targets
|
||||
|
||||
| Concern | Target |
|
||||
|---|---|
|
||||
| Per-ROI latency | ≤5 s p99 |
|
||||
| Memory budget | within the 6 GB shared budget after Tier 1 + Tier 2 |
|
||||
| Cloud egress | **none** (hard rule) |
|
||||
| Failure mode | fail-closed — never surface a POI with VLM evidence on a degraded VLM call |
|
||||
|
||||
## 9. Optionality Model
|
||||
|
||||
Two complementary mechanisms; the implementation chooses one or both:
|
||||
|
||||
1. **Runtime flag (`vlm_enabled`)** gated by the benchmark-gate result. When `false`, `scan_controller` skips VLM confirmation; the zoom-in hold proceeds with Tier 2 evidence alone.
|
||||
2. **Build-time feature module.** `vlm_client` is a separate Cargo feature; the binary builds, links, and runs identically when the feature is off. `scan_controller` depends on a `VlmAssessmentProvider` trait whose default impl returns `status: disabled`.
|
||||
|
||||
Whichever mechanism is used, the observable behaviour must be the same: the system functions correctly with VLM absent, only losing the zoom-in confirmation step.
|
||||
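A minimal sketch of the provider trait with a default `disabled` implementation follows. The trait name `VlmAssessmentProvider` and the `VlmAssessment` fields come from this document; the method signature, the field types, the `NoVlm` type, and the `vlm_evidence` helper are assumptions. Treating every status other than `ok` (including `inconclusive`) as "no VLM evidence" is one reading of the fail-closed rule, not a confirmed design decision.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub label: Option<String>,
    pub confidence: f32,
    pub status: VlmStatus,
    pub source_roi_id: u64,
    pub latency_ms: u32,
    pub model_version: Option<String>,
}

pub trait VlmAssessmentProvider {
    /// Default impl: VLM absent or disabled. A real client would override this.
    fn assess(&self, source_roi_id: u64, _roi_crop: &[u8], _prompt: &str) -> VlmAssessment {
        VlmAssessment {
            label: None,
            confidence: 0.0,
            status: VlmStatus::Disabled,
            source_roi_id,
            latency_ms: 0,
            model_version: None,
        }
    }
}

/// Provider used when the Cargo feature is off or `vlm_enabled` is false.
pub struct NoVlm;
impl VlmAssessmentProvider for NoVlm {}

/// Fail-closed: only an `Ok` assessment with a label counts as VLM evidence.
pub fn vlm_evidence(a: &VlmAssessment) -> Option<(&str, f32)> {
    match a.status {
        VlmStatus::Ok => a.label.as_deref().map(|label| (label, a.confidence)),
        VlmStatus::Inconclusive
        | VlmStatus::Timeout
        | VlmStatus::SchemaInvalid
        | VlmStatus::IpcError
        | VlmStatus::Disabled => None,
    }
}
```

With this shape, `scan_controller` can hold a `Box<dyn VlmAssessmentProvider>` (or a generic parameter) and behave identically whether the real client or `NoVlm` is plugged in.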
|
||||
## 10. References
|
||||
|
||||
- `architecture.md §5 Architectural Principles` (no cloud egress, fail-closed), `§7.6 Local VLM confirmation`.
|
||||
- `system-flows.md §F3 VLM confirmation` (with explicit fail-closed and disabled branches).
|
||||
- `data_model.md §VlmAssessment`.
|
||||