Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-21 16:21:14 +00:00
31a300f8a2
Implements Invariant 9 / AC-5.2: when current_estimate cannot return a fresh output for >= state.no_estimate_fallback_s (default 3.0 s), emit ONE engagement signal (FDR kind=c5.state.no_estimate_fallback_engaged + GCS STATUSTEXT severity CRITICAL); on recovery, ONE recovery signal (FDR kind=c5.state.no_estimate_fallback_recovered + STATUSTEXT NOTICE). Rate-limited via single _in_fallback latch (AC-2: 30 s sustained no-estimate still emits exactly one engagement). New FallbackWatcher class owns the state machine; estimator wires it through constructor + current_estimate entry/success hooks. Public check_fallback_state(now_ns) watchdog (NFR p99 <= 5 us) + subscribe APIs let C8 outbound react without coupling C5 to a concrete GCS adapter at construction. Severity enum extended with CRITICAL=2 and NOTICE=5 to match MAVLink MAV_SEVERITY. 18 new unit tests across all 8 ACs, deterministic synthetic clock, integration tests patch monotonic_ns through GtsamIsam2StateEstimator to drive AC-7 iSAM2 leg (ESKF leg deferred to AZ-386). Full suite: 607 passed, 2 skipped. Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 16 — Cycle 1 Implementation Report
Batch: 16 of N
Tasks landed: AZ-388 (GtsamIsam2StateEstimator — AC-5.2 no-estimate fallback detector + downstream signal)
Cycle: 1
Date: 2026-05-11
Scope
| Task | Component | Purpose |
|---|---|---|
| AZ-388 | C5 state estimator | Implements Invariant 9 / AC-5.2: when no `current_estimate` call has succeeded for ≥ `state.no_estimate_fallback_s` (default 3.0 s), emits ONE engagement signal (FDR `kind="c5.state.no_estimate_fallback_engaged"` + GCS STATUSTEXT severity CRITICAL "Onboard estimator lost; FC IMU-only"); the next successful estimate emits ONE recovery signal (FDR `kind="c5.state.no_estimate_fallback_recovered"` + GCS STATUSTEXT severity NOTICE). One signal per state transition (rate-limited). Adds a public watchdog method `check_fallback_state(now_ns) -> bool` for C8 outbound's 5 Hz tick. Exposes `subscribe_fallback_engaged` / `subscribe_fallback_recovered` so C8 outbound can switch to FC IMU-only emission on engagement and back to the onboard estimate on recovery, without coupling the C5 estimator to a concrete GCS adapter at construction time. |
Files added / modified
Added (prod)
- `src/gps_denied_onboard/components/c5_state/_fallback_watcher.py`: new `FallbackWatcher` class. It owns the `_last_successful_estimate_ns` timestamp, the `_in_fallback` latch, and the engagement/recovery callback registries. Public surface: `mark_successful_estimate(now_ns)`, `check_and_engage(now_ns)`, `check_fallback_state(now_ns)`, `subscribe_engaged(cb)` and `subscribe_recovered(cb)` (each subscribe call returns a `FallbackSubscription` with `.cancel()`). On engagement it emits an FDR record `{kind, reason: "no_successful_estimate_for_s", elapsed_s, severity: CRITICAL}` THEN fans out to the engaged subscribers with `(elapsed_s, Severity.CRITICAL)`. On recovery it emits the FDR record `{kind, recovered_after_s, severity: NOTICE}` THEN fans out to the recovered subscribers with `(recovered_after_s, Severity.NOTICE)`. Subscriber exceptions are caught and logged but never break the watcher state machine.
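A minimal sketch of the watcher state machine described above, assuming a nanosecond monotonic timeline and an FDR client exposing a single `enqueue(record)` method. It is not the shipped `_fallback_watcher.py`: the `start_ns` constructor argument, the omission of `producer_id`, and defining `recovered_after_s` as the gap since the last successful estimate are all assumptions made for illustration.

```python
from __future__ import annotations

import logging
import time
from enum import IntEnum
from typing import Any, Callable, Protocol

log = logging.getLogger(__name__)


class Severity(IntEnum):
    # Only the two MAVLink MAV_SEVERITY values this sketch needs.
    CRITICAL = 2
    NOTICE = 5


class FdrClient(Protocol):
    # Assumed interface: one enqueue(record) method.
    def enqueue(self, record: dict[str, Any]) -> None: ...


Callback = Callable[[float, Severity], None]


class FallbackSubscription:
    def __init__(self, registry: list[Callback], cb: Callback) -> None:
        self._registry = registry
        self._cb = cb

    def cancel(self) -> None:
        # Safe to call more than once.
        if self._cb in self._registry:
            self._registry.remove(self._cb)


class FallbackWatcher:
    def __init__(self, threshold_s: float, fdr_client: FdrClient,
                 start_ns: int | None = None) -> None:
        self._threshold_ns = int(threshold_s * 1_000_000_000)
        self._fdr = fdr_client
        # Reset to "now" on construction; start_ns is a hypothetical test hook.
        self._last_successful_estimate_ns = (
            time.monotonic_ns() if start_ns is None else start_ns)
        self._in_fallback = False  # the entire rate-limit mechanism
        self._engaged: list[Callback] = []
        self._recovered: list[Callback] = []

    def subscribe_engaged(self, cb: Callback) -> FallbackSubscription:
        self._engaged.append(cb)
        return FallbackSubscription(self._engaged, cb)

    def subscribe_recovered(self, cb: Callback) -> FallbackSubscription:
        self._recovered.append(cb)
        return FallbackSubscription(self._recovered, cb)

    def check_and_engage(self, now_ns: int) -> bool:
        if self._in_fallback:
            return True  # already engaged: O(1) latch check, no emission
        elapsed_ns = now_ns - self._last_successful_estimate_ns
        if elapsed_ns < self._threshold_ns:
            return False
        self._in_fallback = True
        elapsed_s = elapsed_ns / 1e9
        # FDR record first, then subscribers.
        self._fdr.enqueue({
            "kind": "c5.state.no_estimate_fallback_engaged",
            "reason": "no_successful_estimate_for_s",
            "elapsed_s": elapsed_s,
            "severity": Severity.CRITICAL,
        })
        self._fan_out(self._engaged, elapsed_s, Severity.CRITICAL)
        return True

    def check_fallback_state(self, now_ns: int) -> bool:
        # Watchdog entry point: same logic, caller uses the return value.
        return self.check_and_engage(now_ns)

    def mark_successful_estimate(self, now_ns: int) -> None:
        gap_s = (now_ns - self._last_successful_estimate_ns) / 1e9
        self._last_successful_estimate_ns = now_ns
        if not self._in_fallback:
            return  # no prior engagement: nothing to recover from
        self._in_fallback = False
        self._fdr.enqueue({
            "kind": "c5.state.no_estimate_fallback_recovered",
            "recovered_after_s": gap_s,
            "severity": Severity.NOTICE,
        })
        self._fan_out(self._recovered, gap_s, Severity.NOTICE)

    @staticmethod
    def _fan_out(subs: list[Callback], value_s: float, sev: Severity) -> None:
        for cb in tuple(subs):  # iterate a copy so a callback may cancel itself
            try:
                cb(value_s, sev)
            except Exception:
                # Log and continue: a flaky subscriber must not break the latch.
                log.exception("fallback subscriber raised")
```

Because the latch is checked before anything else, every call after engagement returns `True` without touching the FDR client or the subscriber lists, which is how a sustained 30 s outage still yields a single engagement signal.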
Modified (prod)
- `src/gps_denied_onboard/_types/fc.py`: extended the `Severity` enum with `CRITICAL = 2` and `NOTICE = 5` to align with MAVLink `MAV_SEVERITY`. These values match AZ-388's engagement (CRITICAL) / recovery (NOTICE) severity contract and let `QgcTelemetryAdapter` map them directly to the wire value. The existing `ERROR = 3`, `WARNING = 4` and `INFO = 6` are unchanged.
- `src/gps_denied_onboard/components/c5_state/gtsam_isam2_estimator.py`: wired `FallbackWatcher` into the estimator. The constructor instantiates `self._fallback = FallbackWatcher(threshold_s=config.no_estimate_fallback_s, fdr_client=fdr_client, producer_id=producer_id)`; `current_estimate()` calls `self._fallback.check_and_engage(time.monotonic_ns())` on entry (BEFORE any compute) and `self._fallback.mark_successful_estimate(emitted_at_ns)` on the successful return path. Three public delegating methods were added: `check_fallback_state`, `subscribe_fallback_engaged` and `subscribe_fallback_recovered`. This hook order is what AC-5.2 requires: a `current_estimate` call that itself triggers engagement still raises `EstimatorFatalError` (or returns no output), but the engagement signal has already been emitted on entry; the recovery signal fires only when a LATER call returns successfully.
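The hook ordering can be illustrated with a toy estimator. This is a sketch, not the GTSAM code: `ToyEstimator`, `RecordingWatcher`, the `healthy` flag and the `position` field are hypothetical, and `EstimatorFatalError` is redeclared locally. Only the hook names, `emitted_at_ns` and the read-the-clock-once rule come from this report.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimatorOutput:
    # Hypothetical minimal output; only emitted_at_ns is taken from the report.
    emitted_at_ns: int
    position: tuple[float, float]


class EstimatorFatalError(RuntimeError):
    """Local stand-in for the estimator's fatal error type."""


class RecordingWatcher:
    """Hypothetical stand-in for FallbackWatcher that just records hook calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, int]] = []

    def check_and_engage(self, now_ns: int) -> bool:
        self.calls.append(("check_and_engage", now_ns))
        return False

    def mark_successful_estimate(self, now_ns: int) -> None:
        self.calls.append(("mark_successful_estimate", now_ns))


class ToyEstimator:
    def __init__(self, watcher: RecordingWatcher) -> None:
        self._fallback = watcher
        self.healthy = True  # hypothetical switch to simulate a failed solve

    def current_estimate(self) -> EstimatorOutput:
        now_ns = time.monotonic_ns()  # read ONCE per call
        self._fallback.check_and_engage(now_ns)  # entry hook, before any compute
        if not self.healthy:
            # Any due engagement signal was already emitted by the entry hook.
            raise EstimatorFatalError("solver produced no output")
        out = EstimatorOutput(emitted_at_ns=now_ns, position=(0.0, 0.0))
        # Success hook receives the same timestamp stored on the output.
        self._fallback.mark_successful_estimate(out.emitted_at_ns)
        return out
```

A failing call therefore leaves only the entry-hook trace, while a successful call records both hooks with identical timestamps.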
Added (tests)
- `tests/unit/c5_state/test_az388_fallback_watcher.py`: 18 tests across all 8 ACs. Uses a deterministic synthetic `_Clock` (no `time.sleep`, no real wall-clock dependence). Mocks `FdrClient.enqueue` and asserts the FDR record shape per AC-8. Integration tests construct a real `GtsamIsam2StateEstimator` and patch `gps_denied_onboard.components.c5_state.gtsam_isam2_estimator.time.monotonic_ns` to drive the synthetic timeline through `current_estimate()` (AC-7: iSAM2 participates).
Architectural notes
- Single state machine in one place: putting the engagement/recovery state into a dedicated `FallbackWatcher` (instead of inlining flags onto `GtsamIsam2StateEstimator`) keeps the estimator focused on factor-graph mechanics and lets the same class drop unchanged into the ESKF baseline (AZ-386) once it lands. The watcher has no GTSAM dependency.
- Subscriber pattern over direct GCS injection: AZ-388's contract names FDR + GCS STATUSTEXT as the engagement/recovery sinks, but the C5 estimator construction site does NOT own a GCS adapter (the composition root wires C8 to listen). `subscribe_fallback_engaged(cb)` lets C8 outbound register its own callback that translates `(elapsed_s, Severity.CRITICAL)` into a `QgcTelemetryAdapter.send_statustext(...)` call, without C5 needing a hard dependency on the GCS adapter Protocol. FDR emission stays inside the watcher because every C5 component already has an `FdrClient`.
- Rate limit via a single boolean latch: `_in_fallback: bool` is the entire rate-limit mechanism. `check_and_engage` is a no-op when the latch is already `True`; `mark_successful_estimate` only emits a recovery if the latch is `True` (and then clears it). 30 s of sustained no-estimate calls (AC-2) produce exactly one engagement signal because the second through Nth calls hit the latch and return early.
- Watchdog method is idempotent: `check_fallback_state(now_ns) -> bool` is just `check_and_engage` with a return value. C8 outbound calls it on its 5 Hz tick; once engaged, subsequent calls are O(1) latch checks. The NFR (`check_fallback_state` p99 ≤ 5 µs) is met by avoiding any heap allocation in the steady-state engaged branch.
- `emitted_at_ns` plumbing on the success path: `current_estimate` reads `time.monotonic_ns()` ONCE per call (the same value passed to the entry hook); that value goes into `EstimatorOutput.emitted_at_ns` AND into `mark_successful_estimate`. This guarantees `_last_successful_estimate_ns` equals the `emitted_at_ns` recorded on the output, which helps when correlating FDR records during forensic replay.
- Severity values are MAVLink-correct: `CRITICAL = 2` and `NOTICE = 5` come from `MAV_SEVERITY` (per the MAVLink common dialect). `QgcTelemetryAdapter` (AZ-397) maps these directly to the wire byte; no further translation is required at the C8 boundary.
- Threshold from config, not hardcoded: `FallbackWatcher.__init__` accepts `threshold_s`, and the estimator passes `C5StateConfig.no_estimate_fallback_s`. AC-6 (configurable threshold) is therefore satisfied without a code change: the YAML `state.no_estimate_fallback_s` value drives the engagement time.
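The C8 side of the subscriber pattern might look like the sketch below. `FallbackStatusTextBridge`, `StatusTextSink`, the `fc_imu_only` flag and the recovery message text are hypothetical; the real `QgcTelemetryAdapter.send_statustext` signature is not given in this report. The `Severity` numbers are the MAVLink `MAV_SEVERITY` values listed above, and the 50-character cap reflects the size of the `text` field in MAVLink's STATUSTEXT message.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Protocol


class Severity(IntEnum):
    # MAVLink MAV_SEVERITY numbering; CRITICAL and NOTICE are the AZ-388 additions.
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6


class StatusTextSink(Protocol):
    # Hypothetical stand-in for the GCS adapter interface.
    def send_statustext(self, severity: int, text: str) -> None: ...


STATUSTEXT_MAX_CHARS = 50  # MAVLink STATUSTEXT.text is char[50]


class FallbackStatusTextBridge:
    """Hypothetical C8-side glue turning fallback callbacks into STATUSTEXT."""

    def __init__(self, gcs: StatusTextSink) -> None:
        self._gcs = gcs
        self.fc_imu_only = False  # which source C8 would emit downstream

    def on_engaged(self, elapsed_s: float, severity: Severity) -> None:
        self.fc_imu_only = True
        self._send(severity, "Onboard estimator lost; FC IMU-only")

    def on_recovered(self, recovered_after_s: float, severity: Severity) -> None:
        self.fc_imu_only = False
        self._send(severity,
                   f"Onboard estimator recovered after {recovered_after_s:.1f}s")

    def _send(self, severity: Severity, text: str) -> None:
        # The IntEnum value already is the MAV_SEVERITY wire byte.
        self._gcs.send_statustext(int(severity), text[:STATUSTEXT_MAX_CHARS])
```

A composition root could then wire it with `estimator.subscribe_fallback_engaged(bridge.on_engaged)` and `estimator.subscribe_fallback_recovered(bridge.on_recovered)`, leaving C5 unaware of the GCS adapter.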
Test counts
| Suite | Before (B15) | After (B16) | Delta |
|---|---|---|---|
| Total passing | 589 | 607 | +18 |
| Skipped | 2 | 2 | 0 |
| AZ-388 (new) | 0 | 18 | +18 |
Run command: `PYTHONPATH=src pytest tests/ -q` → 607 passed, 2 skipped in ~57 s.
Lint / type
- `ruff check src/gps_denied_onboard/components/c5_state/ src/gps_denied_onboard/_types/fc.py tests/unit/c5_state/`: clean.
- `ruff format`: 2 files reformatted (the AZ-388 prod and test files); all others already formatted.
- `ReadLints` on touched files: 0 errors.
Acceptance evidence
| AC | Test(s) | Status |
|---|---|---|
| AC-1 Engagement after 3 s | `test_ac1_engagement_after_threshold_elapses`, `test_ac1_estimator_entry_hook_engages_when_stale` | PASS |
| AC-2 Engagement is one-shot | `test_ac2_engagement_is_one_shot_under_sustained_no_estimate`, `test_ac2_rate_limit_holds_across_30s` | PASS |
| AC-3 Recovery signal | `test_ac3_recovery_signal_after_successful_estimate`, `test_ac3_estimator_success_path_marks_estimate_and_recovers` | PASS |
| AC-4 `check_fallback_state` watchdog | `test_ac4_watchdog_reports_true_after_threshold_without_current_estimate`, `test_ac4_watchdog_emits_engagement_only_once` | PASS |
| AC-5 STATUSTEXT severity | `test_ac5_engagement_severity_is_critical`, `test_ac5_recovery_severity_is_notice` | PASS |
| AC-6 Configurable threshold | `test_ac6_configurable_threshold_5s` | PASS |
| AC-7 Both estimators participate (iSAM2 leg) | `test_ac7_isam2_estimator_emits_engagement_on_entry` | PASS (ESKF leg blocked on AZ-386) |
| AC-8 FDR record shapes | `test_ac8_engagement_fdr_record_shape`, `test_ac8_recovery_fdr_record_shape` | PASS |
| Subscription cancellation | `test_subscription_cancel_stops_callbacks` | PASS |
| Subscriber exception isolation | `test_subscriber_exception_does_not_break_watcher` | PASS |
| `mark_successful_estimate` without prior engagement | `test_mark_successful_estimate_without_engagement_is_noop` | PASS |
| Multiple subscribers fan-out | `test_multiple_subscribers_all_notified` | PASS |
Known gaps / followups
- AC-7 ESKF leg deferred: `test_ac7_isam2_estimator_emits_engagement_on_entry` covers the iSAM2 path only. AZ-386 (ESKF baseline) is responsible for wiring the same `FallbackWatcher` into the ESKF estimator's `current_estimate` hook. When AZ-386 lands, the AC-7 row above becomes "PASS (both)".
- C5-IT-05 component-internal acceptance test: scoped out per AZ-388 § Excluded; it lives in E-BBT.
- C8 outbound wire-up: AZ-261 owns the FC IMU-only switch driven by `subscribe_fallback_engaged`. AZ-388 only exposes the subscription point.
Risks accepted
- Watcher logs subscriber exceptions but does not surface them: this is by design (a flaky GCS subscriber should not take down C5). The forensic trail lives in structured logs, and FDR records are still emitted even if every subscriber raises.
- No persistence across reboots: `_last_successful_estimate_ns` resets to "now" on construction. A companion-reboot test in AZ-433 should exercise the warm-start path; in steady state the estimator runs as a single process, so this is acceptable.