Oleksandr Bezdieniezhnykh 31a300f8a2 [AZ-388] C5 AC-5.2 no-estimate fallback detector + signal emission
Implements Invariant 9 / AC-5.2: when current_estimate cannot return a
fresh output for >= state.no_estimate_fallback_s (default 3.0 s), emit
ONE engagement signal (FDR kind=c5.state.no_estimate_fallback_engaged
+ GCS STATUSTEXT severity CRITICAL); on recovery, ONE recovery signal
(FDR kind=c5.state.no_estimate_fallback_recovered + STATUSTEXT NOTICE).
Rate-limited via single _in_fallback latch (AC-2: 30 s sustained
no-estimate still emits exactly one engagement).

New FallbackWatcher class owns the state machine; estimator wires it
through constructor + current_estimate entry/success hooks. Public
check_fallback_state(now_ns) watchdog (NFR p99 <= 5 us) + subscribe
APIs let C8 outbound react without coupling C5 to a concrete GCS
adapter at construction. Severity enum extended with CRITICAL=2 and
NOTICE=5 to match MAVLink MAV_SEVERITY.

18 new unit tests across all 8 ACs, deterministic synthetic clock,
integration tests patch monotonic_ns through GtsamIsam2StateEstimator
to drive AC-7 iSAM2 leg (ESKF leg deferred to AZ-386).

Full suite: 607 passed, 2 skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 06:53:22 +03:00


Batch 16 — Cycle 1 Implementation Report

  • Batch: 16 of N
  • Tasks landed: AZ-388 (GtsamIsam2StateEstimator: AC-5.2 no-estimate fallback detector + downstream signal)
  • Cycle: 1
  • Date: 2026-05-11

Scope

| Task | Component | Purpose |
|---|---|---|
| AZ-388 | C5 state estimator | Implements Invariant 9 / AC-5.2. A window of ≥ state.no_estimate_fallback_s (default 3.0 s) with no successful current_estimate emits ONE engagement signal (FDR kind="c5.state.no_estimate_fallback_engaged" + GCS STATUSTEXT severity CRITICAL "Onboard estimator lost; FC IMU-only"). A subsequent successful estimate emits ONE recovery signal (FDR kind="c5.state.no_estimate_fallback_recovered" + GCS STATUSTEXT severity NOTICE). One signal per state transition (rate-limited). Adds a public watchdog method check_fallback_state(now_ns) -> bool for C8 outbound's 5 Hz tick. Exposes subscribe_fallback_engaged / subscribe_fallback_recovered so C8 outbound can switch to FC IMU-only emission on engagement and return to the onboard estimate on recovery, without coupling the C5 estimator to a concrete GCS adapter at construction time. |

Files added / modified

Added (prod)

  • src/gps_denied_onboard/components/c5_state/_fallback_watcher.py: new FallbackWatcher class. It owns the _last_successful_estimate_ns timestamp, the _in_fallback latch, and the engagement/recovery callback registries. Public surface: mark_successful_estimate(now_ns), check_and_engage(now_ns), check_fallback_state(now_ns), subscribe_engaged(cb), subscribe_recovered(cb). Each subscribe call returns a FallbackSubscription with .cancel(). On engagement, it emits an FDR record {kind, reason: "no_successful_estimate_for_s", elapsed_s, severity: CRITICAL} and THEN fans out to engaged-subscribers with (elapsed_s, Severity.CRITICAL). On recovery, it emits FDR {kind, recovered_after_s, severity: NOTICE} and THEN fans out to recovered-subscribers with (recovered_after_s, Severity.NOTICE). Subscriber exceptions are caught and logged, but they never break the watcher state machine.
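Below is a minimal sketch of the watcher state machine described above. It assumes Python 3.9+. Several details are assumptions, not the shipped _fallback_watcher.py:

- the FdrClient protocol shape;
- the producer_id field in the record;
- severity serialised by enum name;
- the optional now_ns constructor argument, added so a synthetic clock can drive it;
- reading recovered_after_s as "time since the previous successful estimate".

```python
import logging
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Dict, List, Optional, Protocol

log = logging.getLogger(__name__)


class Severity(IntEnum):
    CRITICAL = 2  # MAV_SEVERITY_CRITICAL
    NOTICE = 5    # MAV_SEVERITY_NOTICE


class FdrClient(Protocol):
    # Assumed shape: the real FdrClient.enqueue signature is not shown in the report.
    def enqueue(self, record: Dict[str, Any]) -> None: ...


Callback = Callable[[float, Severity], None]


@dataclass
class FallbackSubscription:
    _registry: List[Callback]
    _cb: Callback

    def cancel(self) -> None:
        if self._cb in self._registry:  # safe to call more than once
            self._registry.remove(self._cb)


class FallbackWatcher:
    def __init__(self, threshold_s: float, fdr_client: FdrClient, producer_id: str,
                 now_ns: Optional[int] = None) -> None:  # now_ns: sketch-only assumption
        self._threshold_ns = int(round(threshold_s * 1_000_000_000))
        self._fdr = fdr_client
        self._producer_id = producer_id
        # Starts at "now" on construction (no persistence across reboots).
        self._last_successful_estimate_ns = time.monotonic_ns() if now_ns is None else now_ns
        self._in_fallback = False
        self._engaged: List[Callback] = []
        self._recovered: List[Callback] = []

    def subscribe_engaged(self, cb: Callback) -> FallbackSubscription:
        self._engaged.append(cb)
        return FallbackSubscription(self._engaged, cb)

    def subscribe_recovered(self, cb: Callback) -> FallbackSubscription:
        self._recovered.append(cb)
        return FallbackSubscription(self._recovered, cb)

    def check_and_engage(self, now_ns: int) -> None:
        if self._in_fallback:
            return  # latch already set: one engagement per outage
        elapsed_ns = now_ns - self._last_successful_estimate_ns
        if elapsed_ns < self._threshold_ns:
            return
        self._in_fallback = True  # set before emitting so re-entrant calls are no-ops
        elapsed_s = elapsed_ns / 1e9
        self._emit("c5.state.no_estimate_fallback_engaged",
                   {"reason": "no_successful_estimate_for_s", "elapsed_s": elapsed_s},
                   Severity.CRITICAL)
        self._fan_out(self._engaged, elapsed_s, Severity.CRITICAL)

    def check_fallback_state(self, now_ns: int) -> bool:
        self.check_and_engage(now_ns)
        return self._in_fallback

    def mark_successful_estimate(self, now_ns: int) -> None:
        was_in_fallback = self._in_fallback
        # Assumption: recovered_after_s = time since the previous successful estimate.
        recovered_after_s = (now_ns - self._last_successful_estimate_ns) / 1e9
        self._last_successful_estimate_ns = now_ns  # update before fan-out
        self._in_fallback = False
        if was_in_fallback:
            self._emit("c5.state.no_estimate_fallback_recovered",
                       {"recovered_after_s": recovered_after_s}, Severity.NOTICE)
            self._fan_out(self._recovered, recovered_after_s, Severity.NOTICE)

    def _emit(self, kind: str, fields: Dict[str, Any], severity: Severity) -> None:
        # FDR record first; subscribers are notified afterwards by the caller.
        self._fdr.enqueue({"kind": kind, "producer_id": self._producer_id,
                           **fields, "severity": severity.name})

    @staticmethod
    def _fan_out(subs: List[Callback], value_s: float, severity: Severity) -> None:
        for cb in list(subs):  # copy: a callback may cancel itself mid-iteration
            try:
                cb(value_s, severity)
            except Exception:  # isolate subscribers from the state machine
                log.exception("fallback subscriber raised; watcher state unchanged")
```

In this sketch, the latch and the last-success timestamp are updated before any record is emitted or any subscriber runs. As a result, a subscriber that calls back into the watcher sees consistent state and cannot trigger a second engagement.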

Modified (prod)

  • src/gps_denied_onboard/_types/fc.py — extended Severity enum with CRITICAL = 2 and NOTICE = 5 to align with MAVLink MAV_SEVERITY. These values match AZ-388's engagement (CRITICAL) / recovery (NOTICE) severity contract and let QgcTelemetryAdapter map directly to the wire value. Existing ERROR = 3, WARNING = 4, INFO = 6 unchanged.
  • src/gps_denied_onboard/components/c5_state/gtsam_isam2_estimator.py: wired FallbackWatcher into the estimator.
    - The constructor instantiates self._fallback = FallbackWatcher(threshold_s=config.no_estimate_fallback_s, fdr_client=fdr_client, producer_id=producer_id).
    - current_estimate() calls self._fallback.check_and_engage(time.monotonic_ns()) on entry (BEFORE any compute). On the successful return path it calls self._fallback.mark_successful_estimate(emitted_at_ns), where emitted_at_ns is that same entry reading.
    - Three public delegating methods were added: check_fallback_state, subscribe_fallback_engaged and subscribe_fallback_recovered.

    This hook order is what AC-5.2 requires. If the call that triggers engagement then raises EstimatorFatalError (or returns no output), the engagement signal has already been emitted on entry. The recovery signal fires only when a LATER call returns successfully.
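The sketch below shows the entry/success hook order. It is an illustration under assumptions, not the real estimator:

- solve is a hypothetical stand-in for the iSAM2 update.
- RecordingWatcher is a test double in place of FallbackWatcher.
- EstimatorOutput carries only emitted_at_ns.
- Returning None models the "no output" path.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class EstimatorFatalError(RuntimeError):
    """Stand-in for the estimator's fatal error type."""


@dataclass(frozen=True)
class EstimatorOutput:
    emitted_at_ns: int  # pose/covariance fields omitted in this sketch


class RecordingWatcher:
    """Hypothetical test double that records hook calls instead of emitting signals."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int]] = []

    def check_and_engage(self, now_ns: int) -> None:
        self.calls.append(("check_and_engage", now_ns))

    def mark_successful_estimate(self, now_ns: int) -> None:
        self.calls.append(("mark_successful_estimate", now_ns))


class EstimatorSketch:
    def __init__(self, watcher: RecordingWatcher, solve: Callable[[], bool]) -> None:
        self._fallback = watcher
        self._solve = solve  # hypothetical: True = fresh output, False = none; may raise

    def current_estimate(self) -> Optional[EstimatorOutput]:
        now_ns = time.monotonic_ns()             # read ONCE per call
        self._fallback.check_and_engage(now_ns)  # entry hook, BEFORE any compute
        if not self._solve():                    # EstimatorFatalError propagates from here
            return None                          # no output, so no success mark
        out = EstimatorOutput(emitted_at_ns=now_ns)
        self._fallback.mark_successful_estimate(out.emitted_at_ns)  # success path only
        return out
```

Because the engagement check runs before solve(), a failing call still gets a chance to engage. Only a call that reaches the end of current_estimate can mark success and so clear the latch.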

Added (tests)

  • tests/unit/c5_state/test_az388_fallback_watcher.py: 18 tests covering all 8 ACs, plus four supporting behaviours (subscription cancellation, subscriber exception isolation, recovery call without prior engagement, multi-subscriber fan-out).
    - Uses a deterministic synthetic _Clock (no time.sleep, no dependence on the real clock).
    - Mocks FdrClient.enqueue and asserts the FDR record shape per AC-8.
    - Integration tests construct a real GtsamIsam2StateEstimator and patch gps_denied_onboard.components.c5_state.gtsam_isam2_estimator.time.monotonic_ns. This drives the synthetic timeline through current_estimate() (AC-7: iSAM2 participates).

Architectural notes

  • Single state machine in one place — putting the engagement/recovery state into a dedicated FallbackWatcher (instead of inlining flags onto GtsamIsam2StateEstimator) keeps the estimator focused on factor-graph mechanics and lets the same class drop unchanged into the ESKF baseline (AZ-386) once it lands. The watcher has no GTSAM dependency.
  • Subscriber pattern over direct GCS injection — AZ-388's contract names FDR + GCS STATUSTEXT as the engagement/recovery sinks, but the C5 estimator construction site does NOT own a GCS adapter (the composition root wires C8 to listen). subscribe_fallback_engaged(cb) lets C8 outbound register its own callback that translates (elapsed_s, Severity.CRITICAL) into a QgcTelemetryAdapter.send_statustext(...) call without C5 needing a hard dependency on the GCS adapter Protocol. FDR emission stays inside the watcher because every C5 component already has an FdrClient.
  • Rate-limit via a single boolean latch: the _in_fallback: bool flag is the entire rate-limit mechanism. check_and_engage is a no-op when the latch is already True. mark_successful_estimate emits a recovery only if the latch is True, and then clears it. Sustained 30 s of no-estimate calls (AC-2) therefore produces exactly one engagement signal, because the second through Nth calls hit the latch and return early.
  • Watchdog method is idempotent after engagement: check_fallback_state(now_ns) -> bool is check_and_engage plus a return value (whether the latch is set). C8 outbound calls it on its 5 Hz tick. Once engaged, subsequent calls are O(1) latch checks. The NFR (check_fallback_state p99 ≤ 5 µs) is addressed by avoiding any heap allocation in the steady-state engaged branch.
  • emitted_at_ns plumbing on the success path: current_estimate reads time.monotonic_ns() ONCE per call (the same value passed to the entry hook). That value goes into EstimatorOutput.emitted_at_ns AND into mark_successful_estimate. This guarantees that _last_successful_estimate_ns equals the emitted_at_ns recorded on the output, which helps when correlating FDR records during forensic replay.
  • Severity values are MAVLink-correct: CRITICAL = 2 and NOTICE = 5 are the MAV_SEVERITY values from the MAVLink common dialect. QgcTelemetryAdapter (AZ-397) maps them directly to the wire byte, so no further translation is needed at the C8 boundary.
  • Threshold from config, not hardcoded: FallbackWatcher.__init__ accepts threshold_s, and the estimator passes C5StateConfig.no_estimate_fallback_s. AC-6 (configurable threshold) is therefore satisfied without a code change, because the YAML state.no_estimate_fallback_s value drives the engagement time.
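The latch, the 5 Hz watchdog, and the subscriber pattern can be exercised together in a cut-down sketch. The following are hypothetical stand-ins, not the real FallbackWatcher, QgcTelemetryAdapter or composition root:

- LatchWatcher, which handles engagement only;
- FakeGcs and its send_statustext(severity, text) signature;
- wire_c8.

The Severity enum mirrors the values listed for _types/fc.py. Over 30 s of synthetic 5 Hz ticks with no successful estimate, exactly one STATUSTEXT is sent, carrying MAV_SEVERITY wire value 2.

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class Severity(IntEnum):
    # Values are MAVLink MAV_SEVERITY, so int(sev) is the wire byte.
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6


class LatchWatcher:
    """Hypothetical cut-down watcher: threshold check plus the _in_fallback latch."""

    def __init__(self, threshold_s: float, now_ns: int = 0) -> None:
        self._threshold_ns = int(round(threshold_s * 1_000_000_000))
        self._last_ok_ns = now_ns
        self._in_fallback = False
        self._engaged: List[Callable[[float, Severity], None]] = []

    def subscribe_engaged(self, cb: Callable[[float, Severity], None]) -> None:
        self._engaged.append(cb)

    def check_fallback_state(self, now_ns: int) -> bool:
        # Engaged steady state skips straight to the return: no arithmetic, no callbacks.
        if not self._in_fallback:
            elapsed_ns = now_ns - self._last_ok_ns
            if elapsed_ns >= self._threshold_ns:
                self._in_fallback = True
                for cb in self._engaged:
                    cb(elapsed_ns / 1e9, Severity.CRITICAL)
        return self._in_fallback


class FakeGcs:
    """Hypothetical stand-in for QgcTelemetryAdapter; records STATUSTEXT sends."""

    def __init__(self) -> None:
        self.sent: List[Tuple[int, str]] = []

    def send_statustext(self, severity: int, text: str) -> None:
        self.sent.append((severity, text))


def wire_c8(watcher: LatchWatcher, gcs: FakeGcs) -> None:
    # Composition root: C8 owns the GCS adapter; C5 only exposes the subscription.
    watcher.subscribe_engaged(
        lambda elapsed_s, sev: gcs.send_statustext(int(sev), "Onboard estimator lost; FC IMU-only")
    )


def run_5hz_ticks(watcher: LatchWatcher, duration_s: float) -> int:
    """Drive the watchdog at 5 Hz on a synthetic timeline; return ticks reporting True."""
    tick_ns = 200_000_000
    engaged_ticks = 0
    for i in range(1, int(round(duration_s * 5)) + 1):
        if watcher.check_fallback_state(i * tick_ns):
            engaged_ticks += 1
    return engaged_ticks
```

With the default 3.0 s threshold, ticks 15 through 150 (t = 3.0 s to 30.0 s) all report True. Only tick 15 runs the callback; the other 135 return straight from the latch.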

Test counts

| Suite | Before (B15) | After (B16) | Delta |
|---|---|---|---|
| Total passing | 589 | 607 | +18 |
| Skipped | 2 | 2 | 0 |
| AZ-388 (new) | 0 | 18 | +18 |

Run command: PYTHONPATH=src pytest tests/ -q. Result: 607 passed, 2 skipped in ~57 s.

Lint / type

  • ruff check src/gps_denied_onboard/components/c5_state/ src/gps_denied_onboard/_types/fc.py tests/unit/c5_state/ — clean.
  • ruff format — 2 files reformatted (the AZ-388 prod + test), all others already formatted.
  • ReadLints on touched files — 0 errors.

Acceptance evidence

| AC / behaviour | Test(s) | Status |
|---|---|---|
| AC-1 Engagement after 3 s | test_ac1_engagement_after_threshold_elapses, test_ac1_estimator_entry_hook_engages_when_stale | PASS |
| AC-2 Engagement is one-shot | test_ac2_engagement_is_one_shot_under_sustained_no_estimate, test_ac2_rate_limit_holds_across_30s | PASS |
| AC-3 Recovery signal | test_ac3_recovery_signal_after_successful_estimate, test_ac3_estimator_success_path_marks_estimate_and_recovers | PASS |
| AC-4 check_fallback_state watchdog | test_ac4_watchdog_reports_true_after_threshold_without_current_estimate, test_ac4_watchdog_emits_engagement_only_once | PASS |
| AC-5 STATUSTEXT severity | test_ac5_engagement_severity_is_critical, test_ac5_recovery_severity_is_notice | PASS |
| AC-6 Configurable threshold | test_ac6_configurable_threshold_5s | PASS |
| AC-7 Both estimators participate (iSAM2 leg) | test_ac7_isam2_estimator_emits_engagement_on_entry | PASS (ESKF leg blocked on AZ-386) |
| AC-8 FDR record shapes | test_ac8_engagement_fdr_record_shape, test_ac8_recovery_fdr_record_shape | PASS |
| Subscription cancellation | test_subscription_cancel_stops_callbacks | PASS |
| Subscriber exception isolation | test_subscriber_exception_does_not_break_watcher | PASS |
| mark_successful_estimate without prior engagement | test_mark_successful_estimate_without_engagement_is_noop | PASS |
| Multiple subscribers fan-out | test_multiple_subscribers_all_notified | PASS |

Known gaps / followups

  • AC-7 ESKF leg deferred: test_ac7_isam2_estimator_emits_engagement_on_entry covers the iSAM2 path only. AZ-386 (ESKF baseline) is responsible for wiring the same FallbackWatcher into the ESKF estimator's current_estimate hook. When AZ-386 lands, the AC-7 row above becomes "PASS (both)".
  • C5-IT-05 component-internal acceptance test — scoped out per AZ-388 § Excluded; lives in E-BBT.
  • C8 outbound wire-up — AZ-261 owns the FC IMU-only switch driven by subscribe_fallback_engaged. AZ-388 only exposes the subscription point.

Risks accepted

  • Watcher logs subscriber exceptions but doesn't surface them — by design (a flaky GCS subscriber should not take down C5). Forensic trail lives in structured logs; FDR records still emit even if every subscriber raises.
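  • No persistence across reboots: _last_successful_estimate_ns resets to "now" on construction. After a companion reboot, engagement therefore cannot fire until threshold_s has elapsed since the watcher was built. A companion-reboot test in AZ-433 should exercise this warm-start path. In steady state the estimator runs in a single process, so this is acceptable.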