# Resilience Tests

Failure and recovery scenarios at the SPA's observable boundary: bearer expiry, refresh-cookie loss, upstream 5xx, network partition, oversized uploads, and SSE drop. Every fault is injected at the network or browser layer; the UI is observed for graceful degradation and recovery.

---

### NFT-RES-01: 401 → refresh → retry recovery is transparent

**Summary**: An authenticated request that returns 401 mid-session triggers a refresh and is retried without unmounting the routed view.

**Traces to**: AC-01, AC-23

**Preconditions**:
- Active session on `/flights`.

**Fault injection**:
- Force a 401 on the next outbound authenticated request.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Stub the next request to return 401 once | First response is 401 |
| 2 | Observe the SPA's reaction | `POST /api/admin/auth/refresh` is sent with `credentials: 'include'`; on 200, the original request is retried with the new bearer |
| 3 | Inspect `` children | Not unmounted; re-render delta ≤ 1 |

**Pass criteria**: row 03 (request sequence), row 11 (re-render delta ≤ 1), and row 12 (refresh count == 1) all hold simultaneously.

**Expected result source**: `results_report.md` rows 03, 11, 12.

---

### NFT-RES-02: SSE bearer rotation reconnects both streams within 5 s

**Summary**: A bearer rotation while SSE streams are open (live-GPS and annotation-status) closes both streams and reopens them with the new token.

**Traces to**: AC-24

**Preconditions**:
- Two `EventSource` connections open (live-GPS and annotation-status).

**Fault injection**:
- Trigger a server-driven rotation of the bearer (force a refresh).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Rotate the bearer | New token held in memory |
| 2 | Observe each `EventSource` | Closes, then reopens with the new `?token=` |
| 3 | Measure the maximum reconnect time | ≤ 5 000 ms |

**Pass criteria**: each stream closes and reopens exactly once; maximum reconnect time ≤ 5 000 ms (`results_report.md` row 13).
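
A minimal sketch of the close-and-reopen-once behavior NFT-RES-02 checks, assuming a single manager owns every open stream and the auth layer calls `rotate()` after each successful refresh. `SseStreamManager`, `StreamLike`, and `StreamFactory` are hypothetical names, not the SPA's actual code, and the sketch does not model the 5 000 ms timing bound. The token rides in the query string because the `EventSource` constructor accepts only a URL and a `withCredentials` option, with no way to set an `Authorization` header.

```typescript
// The only part of EventSource this sketch relies on.
interface StreamLike {
  close(): void;
}

// Hypothetical factory: in the browser, (url) => new EventSource(url).
type StreamFactory = (url: string) => StreamLike;

// Hypothetical owner of all SSE streams; reopens each one exactly once
// when the bearer changes.
class SseStreamManager {
  private readonly streams = new Map<string, StreamLike>();
  private readonly factory: StreamFactory;
  private token: string;

  constructor(factory: StreamFactory, token: string) {
    this.factory = factory;
    this.token = token;
  }

  private urlFor(path: string): string {
    // EventSource cannot send custom headers, so the bearer goes in the query.
    return `${path}?token=${encodeURIComponent(this.token)}`;
  }

  // Opens (or reopens) the stream for `path` with the current token.
  open(path: string): void {
    this.streams.get(path)?.close();
    this.streams.set(path, this.factory(this.urlFor(path)));
  }

  // Called after a successful refresh; a no-op if the token did not change.
  rotate(newToken: string): void {
    if (newToken === this.token) return;
    this.token = newToken;
    for (const path of [...this.streams.keys()]) {
      this.open(path); // closes the old instance, opens one with the new token
    }
  }
}
```

In the browser a real `EventSource` satisfies `StreamLike` because it has `close()`; a test injects a fake that records opened and closed URLs, which is enough to assert "exactly once per stream" for step 2.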

**Status**: `quarantined` until SSE reconnect-on-rotation ships (Step 8 hardening).

**Expected result source**: `results_report.md` row 13.

---

### NFT-RES-03: Network offline at boot shows an error state, not an offline mode

**Summary**: With the network disabled, app boot produces a user-visible error state, NOT a cached UI served by a service worker.

**Traces to**: AC-N3

**Preconditions**:
- Browser network disabled (or all `/api/*` stubs respond with an offline error).

**Fault injection**:
- All outbound `/api/*` requests fail with `net::ERR_INTERNET_DISCONNECTED` (or equivalent); static assets still load, per Step 1.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Load the SPA | Static assets are served by nginx; API calls fail |
| 2 | Observe the DOM | A login or general error surface is present |
| 3 | Inspect `navigator.serviceWorker.controller` | `null` |

**Pass criteria**: row 93: error/login-failed state present; no service worker controller.

**Expected result source**: `results_report.md` row 93.

---

### NFT-RES-04: ProtectedRoute loading timeout fallback after 10 s

**Summary**: The `` spinner has a bounded loading window; a stalled auth bootstrap surfaces a retry CTA or an error message.

**Traces to**: AC-17

**Preconditions**:
- Bootstrap refresh stubbed to never resolve.

**Fault injection**:
- `POST /api/admin/auth/refresh` hangs (no response).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Mount `` | Spinner rendered |
| 2 | Advance fake time to 10 s | Timeout fires |
| 3 | Inspect the DOM | Retry CTA or error message present; spinner unmounted |

**Pass criteria**: row 59: fallback present, spinner absent.

**Status**: `quarantined` until the timeout fix lands (Step 4).

**Expected result source**: `results_report.md` row 59.

---

### NFT-RES-05: Settings save with upstream 500 recovers UI state

**Summary**: A 500 on settings save surfaces an error and resets the `saving` flag.
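
The try/catch/finally shape that NFT-RES-05 and NFT-RES-06 expect can be sketched as follows, assuming the save handler receives a fetch-like function and a mutable view-state object. `saveSettings`, `SaveState`, `FetchLike`, and the error strings are hypothetical; only the endpoint path comes from this plan. The two scenarios need separate handling because `fetch` resolves (with `ok === false`) on an HTTP 500 and rejects only on a network failure.

```typescript
// Minimal view state for the sketch; the real store may differ.
interface SaveState {
  saving: boolean;
  error: string | null;
}

// Hypothetical fetch-like signature so tests can inject failures.
type FetchLike = (
  url: string,
  init: { method: string; body: string },
) => Promise<{ ok: boolean; status: number }>;

async function saveSettings(
  state: SaveState,
  payload: unknown,
  doFetch: FetchLike,
): Promise<void> {
  state.saving = true;
  state.error = null;
  try {
    const res = await doFetch("/api/annotations/settings/system", {
      method: "PUT",
      body: JSON.stringify(payload),
    });
    // fetch resolves on HTTP errors, so a 500 must be checked explicitly (NFT-RES-05).
    if (!res.ok) {
      state.error = `Save failed (HTTP ${res.status})`;
    }
  } catch {
    // fetch rejects on a network drop (NFT-RES-06).
    state.error = "Save failed: network error";
  } finally {
    // Runs on success, HTTP error, and rejection alike.
    state.saving = false;
  }
}
```

Resetting `saving` in `finally` rather than in each branch is what makes the two fault paths converge on the same observable state.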

**Traces to**: AC-27

**Preconditions**:
- Form filled in a valid state on `/settings`.

**Fault injection**:
- Upstream `PUT /api/annotations/settings/system` returns 500 after T ms.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Click Save | PUT issued |
| 2 | Stub responds with 500 within 2 s | Request fails |
| 3 | Inspect within 2 s of the response | Toast or inline error shown; `saving === false`; no route navigation |

**Pass criteria**: row 68.

**Status**: `quarantined` until the Step 4 try/finally fix.

**Expected result source**: `results_report.md` row 68.

---

### NFT-RES-06: Settings save with network drop resets state via try/finally

**Summary**: When the underlying `fetch` rejects (network drop), `saving` resets and the user sees an error.

**Traces to**: AC-27

**Preconditions**:
- Form filled in a valid state.

**Fault injection**:
- Network drop mid-PUT (`fetch` rejects).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Click Save | PUT issued |
| 2 | Stub rejects the `fetch` promise | Rejection delivered |
| 3 | Inspect | `saving === false`; error surfaced |

**Pass criteria**: row 69.

**Status**: `quarantined`.

**Expected result source**: `results_report.md` row 69.

---

### NFT-RES-07: nginx 413 on oversized upload surfaces a user-visible error

**Summary**: An upload that exceeds `client_max_body_size 500M` returns 413; the UI presents a user-facing message (no silent failure, no `alert()`).

**Traces to**: AC-10

**Preconditions**:
- Authenticated; `` open.

**Fault injection**:
- Drop a 501 MiB (525,336,576-byte) synthetic file. nginx reads `500M` as 500 × 1024 × 1024 = 524,288,000 bytes, so a decimal 501 MB file (501,000,000 bytes) would stay under the limit and never trigger a 413.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Upload the 501 MiB file | Upload starts |
| 2 | nginx rejects the request | 413 delivered |
| 3 | Inspect the UI | Error containing the i18n "file too large" string; `alert()` not invoked |

**Pass criteria**: row 39.

**Expected result source**: `results_report.md` row 39.
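
The oversized-file size in NFT-RES-07 follows from nginx's binary size suffixes (`k` = 1024 bytes, `m` = 1024 × 1024 bytes). A minimal sketch of the arithmetic, which handles only the `k`/`m` suffixes, plus a hypothetical mapping from the upload response status to an i18n key. `nginxSizeToBytes`, `uploadErrorKey`, and both key names are assumptions; the plan only requires the "file too large" string to appear.

```typescript
// nginx size suffixes are binary: "1m" = 1024 * 1024 bytes.
const MIB = 1024 * 1024;

// Parses sizes like "500M", "8k", or "1024" into bytes (k/m suffixes only).
function nginxSizeToBytes(size: string): number {
  const m = /^(\d+)([kKmM]?)$/.exec(size.trim());
  if (!m) throw new Error(`unsupported size: ${size}`);
  const n = Number(m[1]);
  const unit = (m[2] ?? "").toLowerCase();
  if (unit === "m") return n * MIB;
  if (unit === "k") return n * 1024;
  return n;
}

// Hypothetical i18n keys: a 413 gets the dedicated "file too large"
// message instead of a generic failure or an alert().
function uploadErrorKey(status: number): string | null {
  if (status === 413) return "upload.fileTooLarge";
  if (status >= 400) return "upload.failed";
  return null;
}
```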

---

### NFT-RES-08: Expired refresh cookie redirects to /login

**Summary**: When the refresh cookie is missing or expired and a 401 occurs, the SPA redirects the user to `/login` instead of silently looping on refresh.

**Traces to**: AC-01, AC-22

**Preconditions**:
- Refresh cookie cleared from the browser cookie jar.

**Fault injection**:
- 401 on any authenticated call; the subsequent `POST /api/admin/auth/refresh` also returns 401.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Issue an authenticated call | 401 |
| 2 | Refresh attempted | 401 (no cookie) |
| 3 | Observe routing | Redirect to `/login` |

**Pass criteria**: final URL is `/login`; no infinite refresh loop (exactly one refresh attempt). Derived from AC-01 + AC-22; no `results_report.md` row currently binds the loop bound. Phase 3 flags this, and if accepted, the loop bound is added to row 03.

---

### NFT-RES-09: Annotation download tainted-canvas fallback

**Summary**: When `.toBlob()` throws on a tainted canvas (cross-origin video frame), the user sees an error rather than a silent no-op.

**Traces to**: NFR (`04_verification_log.md` finding on `handleDownload` tainted-canvas risk)

**Preconditions**:
- An annotation is loaded from a cross-origin video served without CORS approval, so drawing its frame taints the canvas.

**Fault injection**:
- The cross-origin video source taints the canvas; `toBlob` throws a `SecurityError`.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Click Download | Export attempted |
| 2 | `toBlob` throws | Error handled |
| 3 | Inspect the UI | User-visible error; no `alert()`; no silent swallow |

**Pass criteria**: row 96: error surfaced; no silent swallow; no fabricated blob; no `alert()`.

**Expected result source**: `results_report.md` row 96.

---

### NFT-RES-10: SSE server disconnect surfaces a connection-lost indicator

**Summary**: When the suite server closes a live-GPS or status-events SSE stream without a bearer rotation, the UI does NOT present stale data as live and DOES indicate that the connection was lost.
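
One way to keep stale data from being rendered as live is to derive the indicator from the stream's lifecycle events. The reducer below is a hypothetical sketch (`ConnState`, `StreamView`, `StreamEvent`, and `reduce` are illustrative names, not the SPA's code); in the browser, the `EventSource` `open`, `message`, and `error` events would dispatch into it. Because the browser fires `error` before each automatic retry and `open` when a retry succeeds, the same reducer also covers the auto-retry path in step 3.

```typescript
// Connection states the UI can render; names are illustrative.
type ConnState = "connecting" | "live" | "lost";

interface StreamView {
  state: ConnState;
  lastMessageAt: number | null; // ms timestamp of the last SSE message
}

type StreamEvent =
  | { kind: "open" }
  | { kind: "message"; at: number }
  | { kind: "error" };

function reduce(view: StreamView, ev: StreamEvent): StreamView {
  switch (ev.kind) {
    case "open":
      return { ...view, state: "live" };
    case "message":
      return { state: "live", lastMessageAt: ev.at };
    case "error":
      // Keep lastMessageAt so the UI can label the data as stale.
      return { ...view, state: "lost" };
  }
}

// Data may be presented as live only while the stream is open.
function isLive(view: StreamView): boolean {
  return view.state === "live";
}
```

Keeping `lastMessageAt` on `error` lets the UI show a stale-data badge with the age of the last update instead of discarding the data.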

**Traces to**: AC-08, AC-09, AC-24

**Preconditions**:
- One SSE stream open.

**Fault injection**:
- Server force-closes the stream (no rotation).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Drop the stream from the server | `error` fires on the `EventSource` |
| 2 | Observe the DOM | A connection-lost indicator (or stale-data badge) is rendered |
| 3 | Observe reconnect behavior | `EventSource` auto-retries per the browser default; if the SPA re-creates it, exactly one new instance |

**Pass criteria**: row 97: connection-lost indicator OR reconnect attempt within 10 s; stale data NOT rendered as live; reconnect attempts ≤ 1 in the 10 s window.

**Expected result source**: `results_report.md` row 97.

---

### NFT-RES-11: Tile endpoint 401/503 does NOT crash the map

**Summary**: When the `satellite-provider` `/tiles/{z}/{x}/{y}` endpoint returns 401 (cookie-auth failure) or 503 (Google Maps upstream down), the SPA renders a broken-tile placeholder for the failing tile(s) and the rest of the application keeps working: no React error boundary fires and the page does not crash.

**Traces to**: AC-41 (AZ-498 NFR-Reliability)

**Preconditions**:
- `` mounted with a valid `VITE_SATELLITE_TILE_URL`.
- Tile endpoint configured to return 401 (auth failure) OR 503 (upstream provider down) for one or more tile coordinates.

**Fault injection**:
- (auth-failure variant) Strip or invalidate the satellite-provider auth cookie before the SPA fetches a tile; the tile endpoint responds 401.
- (upstream-down variant) Configure the test stub to return 503 for `GET /tiles/{z}/{x}/{y}`.
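
For both variants, the stub only has to pick a status per tile coordinate. A minimal sketch, assuming the harness lets a handler choose the response status from the request URL; `tileStubStatus`, `TileFault`, and the `z/x/y` key format are hypothetical.

```typescript
type TileFault = "auth-failure" | "upstream-down";

// Matches the trailing /tiles/{z}/{x}/{y} segment of a request path.
const TILE_PATH = /\/tiles\/(\d+)\/(\d+)\/(\d+)$/;

// Returns the status a hypothetical test stub should send for a tile URL:
// failing coordinates get 401 or 503, other tiles 200, non-tile paths 404.
function tileStubStatus(
  url: string,
  fault: TileFault,
  failing: ReadonlySet<string>,
): number {
  const path = url.split("?")[0]; // ignore any query string
  const m = TILE_PATH.exec(path);
  if (!m) return 404;
  const key = `${m[1]}/${m[2]}/${m[3]}`; // "z/x/y"
  if (!failing.has(key)) return 200;
  return fault === "auth-failure" ? 401 : 503;
}
```

With Playwright, for example, a `page.route('**/tiles/**', ...)` handler could call it and answer with `route.fulfill({ status })`; emptying the `failing` set afterwards exercises the step 4 recovery path.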

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|-------------------|
| 1 | Mount ``; trigger a tile load that fails per the fault | Leaflet emits a `tileerror` event for the affected coordinate |
| 2 | Observe the rendered map | Broken-tile placeholder shown in the failing cell; surrounding tiles continue rendering normally |
| 3 | Observe the rest of the SPA (header, side panels, navigation) | Remains interactive; no React error boundary fires; no console error of category `Uncaught` |
| 4 | Exercise a recovery path (auth restored OR upstream back) | Next pan/zoom fetches the tile successfully; the placeholder is replaced with the imagery |

**Pass criteria**:
- A 401 response on a tile request MUST NOT crash the map: a broken-tile placeholder is rendered in the failing cell and the rest of the SPA stays interactive.
- A 503 response is treated the same as a 404 or other transient failure (fault budget): the recovery path works once the upstream returns.
- No new uncaught console error is attributable to the failed tile.

**Expected result source**: AZ-498 NFR-Reliability (no `results_report.md` row needed; observable through DOM state and the console).

**Note on follow-up**: AZ-498 risk #5 flags an optional `tileerror` listener on `` that surfaces a structured warning plus an optional inline banner ("Imagery unavailable; please re-sign-in"). If that lands, this scenario gains a Step 5 asserting that the banner appears within 2 s of the first tile error.
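
If the AZ-498 follow-up lands, the listener could look roughly like the sketch below. It assumes only the documented Leaflet behavior that a tile layer emits `tileerror` with the failing tile's `coords`; `ImageryBannerWatcher`, `TileErrorSource`, and the injectable clock are hypothetical. A Step 5 assertion could then compare the banner's appearance time against `firstErrorTime`.

```typescript
interface TileCoords {
  x: number;
  y: number;
  z: number;
}

// Minimal subset of a Leaflet-style layer: only `tileerror` is modeled.
interface TileErrorSource {
  on(type: "tileerror", fn: (e: { coords: TileCoords }) => void): void;
}

// Hypothetical watcher: records the first tile error and the set of failed
// tiles, and reports whether the "Imagery unavailable" banner should show.
class ImageryBannerWatcher {
  private firstErrorAt: number | null = null;
  private readonly failed = new Set<string>();
  private readonly now: () => number;

  constructor(layer: TileErrorSource, now: () => number = Date.now) {
    this.now = now;
    layer.on("tileerror", (e) => {
      const { x, y, z } = e.coords;
      this.failed.add(`${z}/${x}/${y}`); // de-duplicates repeated errors
      if (this.firstErrorAt === null) this.firstErrorAt = this.now();
    });
  }

  get bannerVisible(): boolean {
    return this.firstErrorAt !== null;
  }

  get firstErrorTime(): number | null {
    return this.firstErrorAt;
  }

  get failedTileCount(): number {
    return this.failed.size;
  }
}
```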