[AZ-650] mission_executor pre-flight BIT (F9) gate (batch 8)

AZ-650 (mission_executor pre-flight Built-In Test):
- BitEvaluator trait + BitItemStatus { Pass, Degraded, Fail, Skipped }
  + BitReport + BitOverall fusion. Pluggable per-item evaluators so
  the composition root decides which dependencies are wired today.
- BitController owns evaluator list + mpsc ack channel + sticky-pass
  + ack deadline. Publishes bit_ok via tokio watch — composition root
  pipes it into the telemetry projection where the existing FSM
  bit_ok guard already consumes it (no FSM changes needed).
- BitState { Idle, Pass, AwaitingAck { report_id }, Failed { reason } }
  with broadcast::Sender<BitEvent> for operator-side observability.
  Sticky-pass semantics: once Pass is reached (directly or via signed
  ack on a Degraded report), the controller stops re-evaluating —
  BIT is a one-shot pre-flight gate, not a continuous monitor.
- BitDegradedAck arrives pre-validated by operator_bridge; the
  controller only matches report_id and applies the operator id to
  the audit log.
- Concrete evaluators landed today (4 of 12 spec items, the rest
  depend on components still in todo/):
  - StateDirFreeSpaceEvaluator (dir creatable/readable; a statvfs
    free-space check is a documented follow-up).
  - WallClockBoundEvaluator (chrono::Utc::now vs configurable bound).
  - MissionLoadedEvaluator (waypoint count via Arc<Mutex<usize>>).
  - MapObjectsSyncedEvaluator (maps SyncState -> BIT status per Q9).

Tests:
- ac1_all_pass_proceeds, ac2_fail_blocks_transition,
  ac3_degraded_requires_signed_ack (+ mismatched_ack supplement),
  ac4_degraded_ack_timeout_fails_the_bit — all 4 ACs green.
- Pure next_state table covered by lib unit tests.
- Per-evaluator unit tests for Pass/Fail/Degraded branches.

Quality gates:
- cargo fmt: clean.
- cargo clippy -p mission_executor --tests -- -D warnings: 0 warns.
- cargo test --workspace: all green.
- Pre-existing flake in state_machine::ac3_bounded_retry_then_success
  (batch 7 report) is unchanged by this batch; it passes on rerun.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-19 19:12:48 +03:00
parent 2bcd4a8059
commit 8a4bd00526
15 changed files with 1373 additions and 47 deletions
@@ -1,69 +0,0 @@
# Pre-Flight BIT (F9)
**Task**: AZ-650_mission_executor_bit_f9
**Name**: Built-In Test gate before ARMED/WAIT_AUTO
**Description**: Pre-flight Built-In Test (F9). Gates the transition to `ARMED` (multirotor) or `WAIT_AUTO` (fixed-wing). Covers every dependency in `architecture.md §5` plus mission load + MapObjects pre-flight pull (cached fallback acknowledged) + persistent-store free space + wall-clock binding. On FAIL no transition. On DEGRADED, surface to operator for signed acknowledgement (per Q9).
**Complexity**: 5 points
**Dependencies**: AZ-640_initial_structure, AZ-648_mission_executor_state_machine, AZ-649_mission_executor_telemetry_forwarding, AZ-644_mission_client_pull_and_schema, AZ-646_mission_client_mapobjects_pull
**Component**: mission_executor
**Tracker**: AZ-650
**Epic**: AZ-636
## Problem
The airframe must not be armed until every load-bearing dependency is verified healthy and every load-bearing input has been ingested. The BIT is the deliberate gate that captures `architecture.md §5` "BIT mandatory" + `system-flows.md §F9`. On FAIL the executor MUST refuse to transition past `BIT_OK`. On DEGRADED the executor surfaces a signed acknowledgement requirement to the operator (per Q9) and only proceeds when ack is observed.
## Outcome
- `Bit::evaluate(env) -> BitReport { items: Vec<BitItem { name, status: Pass | Degraded | Fail, detail }> }` returns a structured report.
- BIT items cover (at minimum): `mavlink_link`, `gimbal_link`, `camera_rtsp`, `detection_grpc`, `movement_telemetry_sync_ready`, `mapobjects_synced_or_cached_acked`, `mission_loaded`, `state_dir_free_space`, `wall_clock_bound`, `tier2_session_ready` (if enabled), `vlm_session_ready` (if enabled), `operator_bridge_session`.
- On `Fail` for any item, the state machine does NOT transition past `BIT_OK`; the report surfaces via `operator_bridge`.
- On `Degraded` items, the state machine waits for a signed `BitDegradedAck` from `operator_bridge` (matching the report id); on ack, proceeds; on timeout (configurable; default 5 min), surfaces failure.
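The report shape and status fusion described in this list can be sketched as follows. This is a minimal illustration, not the landed implementation: the "worst status wins" ordering, the `overall()` method (the spec writes `BitReport.overall` as a field), the `BitItem::new` helper, and the choice to fuse an empty report to `Fail` are all assumptions made for the example.

```rust
/// Status of a single BIT item, as listed in the Outcome section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BitItemStatus {
    Pass,
    Degraded,
    Fail,
}

/// One evaluated item: name, status, and a human-readable detail string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BitItem {
    pub name: String,
    pub status: BitItemStatus,
    pub detail: String,
}

impl BitItem {
    /// Hypothetical convenience constructor used only by this sketch.
    pub fn new(name: &str, status: BitItemStatus, detail: &str) -> Self {
        Self {
            name: name.to_string(),
            status,
            detail: detail.to_string(),
        }
    }
}

/// Structured report returned by a BIT run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BitReport {
    pub items: Vec<BitItem>,
}

impl BitReport {
    /// Assumed fusion rule: any Fail gives Fail, otherwise any Degraded
    /// gives Degraded, otherwise Pass.
    /// Assumption: an empty report fuses to Fail, so a run with no
    /// evaluators wired can never open the gate.
    pub fn overall(&self) -> BitItemStatus {
        if self.items.is_empty() {
            return BitItemStatus::Fail;
        }
        if self.items.iter().any(|i| i.status == BitItemStatus::Fail) {
            BitItemStatus::Fail
        } else if self.items.iter().any(|i| i.status == BitItemStatus::Degraded) {
            BitItemStatus::Degraded
        } else {
            BitItemStatus::Pass
        }
    }
}
```

Under these assumptions, a report with one `Degraded` item and no `Fail` items fuses to `Degraded`, which is the case that triggers the signed-ack wait.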
## Scope
### Included
- BIT item evaluators (one per item).
- Report aggregation + status fusion.
- Signed `BitDegradedAck` handling (the auth check itself lives in `operator_bridge` — this task only consumes the validated event).
- Timeout for ack.
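A "one evaluator per item" design could look roughly like the sketch below. The trait shape, the `WallClockBoundCheck` type, its injected `clock` closure, and the `run_all` helper are hypothetical names chosen for illustration; the example uses `std::time::SystemTime` only to stay dependency-free.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BitItemStatus {
    Pass,
    Degraded,
    Fail,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BitItem {
    pub name: &'static str,
    pub status: BitItemStatus,
    pub detail: String,
}

/// One evaluator per BIT item; the composition root decides which are wired.
pub trait BitEvaluator {
    fn name(&self) -> &'static str;
    fn evaluate(&self) -> BitItem;
}

/// Hypothetical wall-clock check: fails when the clock reads earlier than
/// a configured lower bound. The clock is injected so tests are deterministic.
pub struct WallClockBoundCheck {
    pub not_before: SystemTime,
    pub clock: Box<dyn Fn() -> SystemTime>,
}

impl BitEvaluator for WallClockBoundCheck {
    fn name(&self) -> &'static str {
        "wall_clock_bound"
    }

    fn evaluate(&self) -> BitItem {
        let now = (self.clock)();
        let (status, detail) = if now >= self.not_before {
            (BitItemStatus::Pass, "clock is at or after the bound".to_string())
        } else {
            (BitItemStatus::Fail, "clock is earlier than the bound".to_string())
        };
        BitItem { name: self.name(), status, detail }
    }
}

/// Runs every wired evaluator in order and collects the items.
pub fn run_all(evaluators: &[Box<dyn BitEvaluator>]) -> Vec<BitItem> {
    evaluators.iter().map(|e| e.evaluate()).collect()
}

/// Hypothetical helper: a bound one thousand seconds after the Unix epoch.
pub fn example_bound() -> SystemTime {
    SystemTime::UNIX_EPOCH + Duration::from_secs(1_000)
}
```

Keeping each check behind a trait object means an item whose dependency is not built yet can simply be left out of the list rather than stubbed to pass.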
### Excluded
- BIT UI / operator overlay (Ground Station + `operator_bridge`).
- Operator-command auth validation (lives in `operator_bridge` — task 41).
## Acceptance Criteria
**AC-1: All-pass BIT proceeds**
Given every dependency is healthy
When the executor reaches `HEALTH_OK` and runs BIT
Then `BitReport.overall = Pass`, the machine transitions to `BIT_OK`, and proceeds to `ARMED` (multirotor) or `MISSION_UPLOADED` (fixed-wing).
**AC-2: Fail blocks transition**
Given `camera_rtsp` reports `Fail`
When BIT runs
Then `BitReport.overall = Fail`, the machine stays at `HEALTH_OK`, and the report is observable via `operator_bridge`.
**AC-3: Degraded requires signed ack**
Given `mapobjects_synced_or_cached_acked` reports `Degraded` (cached fallback)
When BIT runs
Then the executor waits; only after a signed `BitDegradedAck` matching the report id does the machine transition to `BIT_OK`.
**AC-4: Degraded ack timeout fails the BIT**
Given a Degraded report with no ack within the configured timeout (default 5 min)
When the timeout fires
Then `BitReport.overall = Fail`, the machine stays at `HEALTH_OK`, and the timeout is observable.
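The four acceptance criteria can be read as a small pure transition function. The sketch below is one possible encoding, not the landed code: the `BitInput` enum, the `u64` report id, the timeout reason string, and the rule that every unlisted (state, input) pair leaves the state unchanged are assumptions for illustration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BitState {
    Idle,
    Pass,
    AwaitingAck { report_id: u64 },
    Failed { reason: String },
}

/// Hypothetical inputs: a fused report outcome, an already validated ack,
/// or the ack deadline firing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BitInput {
    ReportPass,
    ReportDegraded { report_id: u64 },
    ReportFail { reason: String },
    Ack { report_id: u64 },
    AckTimeout,
}

/// Pure transition function; no I/O, so it can be covered by a table test.
pub fn next_state(state: &BitState, input: &BitInput) -> BitState {
    match (state, input) {
        // Once Pass is reached, the gate stays open.
        (BitState::Pass, _) => BitState::Pass,
        // AC-1: an all-pass report opens the gate.
        (BitState::Idle, BitInput::ReportPass) => BitState::Pass,
        // AC-2: a failing report blocks the transition.
        (BitState::Idle, BitInput::ReportFail { reason }) => BitState::Failed {
            reason: reason.clone(),
        },
        // AC-3: a degraded report waits for an ack with the same report id.
        (BitState::Idle, BitInput::ReportDegraded { report_id }) => BitState::AwaitingAck {
            report_id: *report_id,
        },
        (BitState::AwaitingAck { report_id: want }, BitInput::Ack { report_id: got })
            if want == got =>
        {
            BitState::Pass
        }
        // AC-4: no ack before the deadline fails the BIT.
        (BitState::AwaitingAck { .. }, BitInput::AckTimeout) => BitState::Failed {
            reason: "degraded ack timeout".to_string(),
        },
        // Assumption: anything else (for example a mismatched ack) is ignored.
        (other, _) => other.clone(),
    }
}
```

A mismatched ack falls through to the last arm, so the controller keeps waiting until either the matching ack or the timeout arrives.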
## Non-Functional Requirements
**Performance**
- BIT evaluation completes in ≤2 s when all dependencies are healthy.
**Reliability**
- No silent FAIL; every item's status is observable.
## Runtime Completeness
- **Named capability**: F9 BIT — production gate before arming.
- **Production code that must exist**: real evaluators that read live health from each dependency; real signed-ack consumption path.
- **Unacceptable substitutes**: a hardcoded "BIT always passes" path in production is unacceptable.