[AZ-455] Decompose Step 3 — test task specs (AZ-457..AZ-482)

Adds 26 blackbox-test task specs under epic AZ-455, plus the matching rows in _dependencies_table.md. Each task depends on AZ-456 (test infrastructure). Advances the autodev existing-code flow from Step 5 to Step 6 (Implement Tests, cycle 1), ready for batch implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 23:01:11 +00:00 · 2026-05-11 01:49:44 +03:00
parent 83dee6759f
commit 15a878d6f1
28 changed files with 1678 additions and 8 deletions
@@ -1,5 +1,7 @@
 # Task Dependencies

+## Epic AZ-447 — 01-testability-refactoring (Autodev Step 4)
+
 | Task | Name | Epic | Complexity | Depends on |
 |------|------|------|-----------|------------|
 | AZ-448 | C01 — Externalize OWM API key | AZ-447 | 2 | None |
@@ -10,9 +12,64 @@
 | AZ-453 | C06 — `navigateToLoginImpl()` accessor | AZ-447 | 2 | None |
 | AZ-454 | C07 — Document `setToken/getToken` | AZ-447 | 1 | None |

-## Notes
+### Notes (AZ-447)

 - Epic AZ-447 is the umbrella for the autodev existing-code Step 4 testability run (`01-testability-refactoring`).
 - AZ-448 and AZ-449 share `src/features/flights/flightPlanUtils.ts` and should land in one commit to avoid a mid-state where the URL still hardcodes a base while the key is externalized.
- Total: 14 complexity points across 7 tasks.
- Every task fits the existing-code flow Step 4 allowed-change list (externalize hardcoded URLs/credentials, wrap globals in thin accessors, comment-only documentation). Deferred items are in `_docs/04_refactoring/01-testability-refactoring/deferred_to_refactor.md`.
+- Total: 14 complexity points across 7 tasks. **Status: closed** — all tasks done (see `_docs/04_refactoring/01-testability-refactoring/FINAL_report.md`).
+- Every task fit the existing-code flow Step 4 allowed-change list (externalize hardcoded URLs/credentials, wrap globals in thin accessors, comment-only documentation). Deferred items are in `_docs/04_refactoring/01-testability-refactoring/deferred_to_refactor.md`.
+
+---
+
+## Epic AZ-455 — Blackbox Tests (Autodev Step 5, tests-only decomposition)
+
+| Task | Name | Epic | Complexity | Depends on |
+|------|------|------|-----------|------------|
+| AZ-456 | Test Infrastructure (Vitest + MSW + Playwright + static) | AZ-455 | 5 | None |
+| AZ-457 | Auth & token handling (11 scenarios) | AZ-455 | 5 | AZ-456 |
+| AZ-458 | SSE lifecycle + bearer rotation (9 scenarios) | AZ-455 | 5 | AZ-456 |
+| AZ-459 | Wire-contract enum compliance (4 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-460 | Annotation save URL + payload contract (2 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-461 | Detection endpoints sync/async/long-video (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-462 | Overlay window membership edges (4 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-463 | Flight selection persistence + memory soaks (4 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-464 | Bulk-validate URL + body + UI sync (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-465 | i18n parity + t() coverage + detector + persistence (4 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-466 | Destructive UX + ConfirmDialog + no-alert (8 scenarios) | AZ-455 | 4 | AZ-456 |
+| AZ-467 | ProtectedRoute spinner + timeout + RBAC (7 scenarios) | AZ-455 | 4 | AZ-456 |
+| AZ-468 | Header flight dropdown a11y + Escape (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-469 | Browser support + responsive variants (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-470 | Panel-width debounced PUT + rehydration (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-471 | CanvasEditor draw/resize/multi-select/zoom/pan (5 scenarios) | AZ-455 | 5 | AZ-456 |
+| AZ-472 | DetectionClasses load + hotkeys + click + fallback (4 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-473 | PhotoMode switch + auto-select + yoloId wire (3 scenarios) | AZ-455 | 2 | AZ-456, AZ-472 |
+| AZ-474 | Tile-split + YOLO parser + auto-zoom + indicator (6 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-475 | Numeric form hygiene (2 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-476 | Upload 501 MB → 413 → user-visible error (2 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-477 | Settings save 500/network resilience (5 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-478 | Network offline + SSE disconnect + tainted-canvas (3 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-479 | Bundle ≤2 MB + mission-planner excluded + FCP + soak (4 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-480 | Prod image nginx:alpine + 500M + 9 routes + edge RAM (5 scenarios) | AZ-455 | 3 | AZ-456 |
+| AZ-481 | CI image tag scheme + OCI labels + revision binding (3 scenarios) | AZ-455 | 2 | AZ-456 |
+| AZ-482 | Secrets/banned-libs/AC-N1 anti-criterion (6 scenarios) | AZ-455 | 3 | AZ-456 |
+
+### Notes (AZ-455)
+
+- **Epic AZ-455** is the umbrella for the autodev existing-code Step 5 tests-only decomposition.
+- **Scenario count**: 117 distinct scenarios across 26 implementation tasks (T02..T27).
+- **Total complexity**: 79 points across 27 tasks (including infrastructure AZ-456).
+- **Critical-path dependency**: every test task depends on AZ-456 (Test Infrastructure). AZ-456 MUST be implemented first; tests cannot be implemented in parallel until the harness is in place.
+- **Internal cross-deps** (non-blocking for ordering, but worth noting for review):
+  - AZ-473 (PhotoMode) reuses the DetectionClasses fixtures landed by AZ-472; flagged as a soft dep to avoid duplicating fixture wiring.
+  - AZ-457 (Auth) lands `setToken / getToken / setNavigateToLogin` test helpers that AZ-458 (SSE bearer rotation), AZ-467 (ProtectedRoute), and AZ-468 (Header dropdown — uses authed page) all rely on. Recommended landing order after AZ-456: AZ-457 → everything else in parallel.
+  - AZ-466 (Destructive UX) lands the `data-destructive` marker + `<DestructiveButton>` wrapper used by the static enforcement check; per-feature surfaces (admin user delete, class delete, flight delete) need the marker before AZ-466 can pass — handled in-task by AZ-466 itself (lands the wrapper AND the markers across existing destructive surfaces in scope).
+  - AZ-479 (bundle/FCP) is the only test task that requires `bun run build` to succeed against the SPA; if a Phase-B feature breaks build, AZ-479 starts failing first.
+  - AZ-480 / AZ-481 require the production image build pipeline; they can be implemented in parallel with the rest but their CI lane is conditional on the `dev` branch (out-of-band from feature merge gates).
+- **Profile distribution**:
+  - `static` only: AZ-481, AZ-482, parts of AZ-459, AZ-465, AZ-466, AZ-474, AZ-479, AZ-480
+  - `fast`-gated (no gating e2e companion): AZ-462, AZ-468, AZ-470 (e2e companion present but not gating), AZ-471 (mostly fast, plus an e2e smoke), AZ-475, AZ-477
+  - `fast + e2e`: most of the rest
+  - `e2e (long-running)`: AZ-463 NFT-RES-LIM-06/07, AZ-479 NFT-RES-LIM-05 — tagged `@long-running`, run on `dev`/`stage` merges only
+  - `e2e (requires-docker)`: AZ-480 — requires the suite docker-compose stack
+  - `e2e (requires-ci)`: AZ-481 NFT-RES-LIM-12/13 — local skip allowed
+- **Quarantine scenarios**: FT-P-12 (async video detect, AZ-461) starts QUARANTINEd until AC-25 / Phase B lands; the `verification_pending` enums in AZ-459 stay QUARANTINEd until the Step 4 .NET-service inspection clears the flag in the snapshot.
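The landing order described in the notes above (AZ-456 first, then AZ-457, then the rest in parallel) can be derived mechanically from the dependency column. The following is a minimal sketch, assuming a hypothetical `landingWaves` helper and a four-task-plus-one subset of the table; the AZ-458 → AZ-457 edge is the soft dependency from the notes, not a row in the table.

```typescript
// Hypothetical sketch: `landingWaves` is an illustrative name, not project code.
// Task IDs come from the AZ-455 table; AZ-458's edge to AZ-457 is the soft
// dependency mentioned in the notes, added here as an assumption.
type Deps = Record<string, string[]>;

// Group tasks into waves: a task lands once all of its dependencies have landed.
function landingWaves(deps: Deps): string[][] {
  const landed = new Set<string>();
  let pending = Object.keys(deps);
  const waves: string[][] = [];
  while (pending.length > 0) {
    const wave = pending
      .filter((task) => deps[task].every((dep) => landed.has(dep)))
      .sort();
    if (wave.length === 0) {
      throw new Error("dependency cycle or unknown dependency");
    }
    wave.forEach((task) => landed.add(task));
    pending = pending.filter((task) => !landed.has(task));
    waves.push(wave);
  }
  return waves;
}

const subset: Deps = {
  "AZ-456": [],
  "AZ-457": ["AZ-456"],
  "AZ-458": ["AZ-456", "AZ-457"],
  "AZ-472": ["AZ-456"],
  "AZ-473": ["AZ-456", "AZ-472"],
};
const waves = landingWaves(subset);
console.log(waves);
```

Each inner array is a set of tasks that could be implemented in parallel once the previous wave has landed.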
@@ -0,0 +1,79 @@
+# Test — Auth & Token Handling
+
+**Task**: AZ-457_test_auth_token_handling
+**Name**: Auth & token-handling blackbox suite
+**Description**: Implement every blackbox test that exercises the in-memory bearer, the refresh cookie, the 401→refresh→retry path, and the redirect-to-/login flow. Spans `fast` (MSW) and `e2e` (real `admin/`) profiles.
+**Complexity**: 5 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 01_api-transport + 02_auth (Blackbox Tests)
+**Tracker**: AZ-457
+**Epic**: AZ-455
+
+## Problem
+
+The SPA's auth surface (`<AuthContext>` + `src/api/client.ts` + `<ProtectedRoute>`) is the gate every other test depends on. It mixes a memory-only bearer (AC-02 / O2), an HttpOnly refresh cookie (AC-03), a 401-retry loop (AC-23), and a redirect on refresh failure (C06 from autodev Step 4). Tests must lock down the wire contract AND the storage contract without leaking into production.
+
+## Outcome
+
+- 11 test scenarios pass in their declared profile, all referencing `results_report.md` rows for expected observables.
+- The fast suite uses MSW to drive the `/api/admin/auth/refresh` path; e2e uses the real `admin/` service.
+- No test stubs `src/api/client.ts`, `<AuthContext>`, or `<ProtectedRoute>` internals — every assertion is observable at the DOM, network, or browser-storage surface.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-01 — bootstrap refresh sends `credentials:'include'` | fast | blackbox-tests.md | 02 |
+| FT-P-02 — 401 → refresh → retry sequence | fast + e2e | blackbox-tests.md | 03, 12 |
+| FT-P-03 — refresh transparency, no `<ProtectedRoute>` unmount | fast | blackbox-tests.md | 11 |
+| FT-N-04 — unauthenticated `/admin` → redirect to `/login` | fast | blackbox-tests.md | row(s) per blackbox-tests.md FT-N-04 |
+| NFT-SEC-01 — bearer never in localStorage/sessionStorage | fast | security-tests.md | 01 |
+| NFT-SEC-02 — refresh cookie not in `document.cookie` | fast | security-tests.md | 04 |
+| NFT-SEC-03 — refresh cookie has `Secure; HttpOnly; SameSite=Strict` | e2e | security-tests.md | 05 |
+| NFT-SEC-04 — `credentials:'include'` on every authed fetch | fast | security-tests.md | 06 |
+| NFT-PERF-02 — exactly one refresh round trip per cycle | fast | performance-tests.md | 12 |
+| NFT-RES-01 — 401→refresh→retry is transparent end-to-end | fast | resilience-tests.md | 03, 11 |
+| NFT-RES-08 — refresh cookie expired → redirect to `/login` | fast | resilience-tests.md | row(s) per NFT-RES-08 |
+
+### Excluded
+
+- SSE bearer rotation (covered in 03_test_sse_lifecycle).
+- `/settings` / `/admin` RBAC beyond the unauthenticated-redirect case (covered in 12_test_protected_route_rbac).
+- ConfirmDialog and destructive-action gating (covered in 11_test_destructive_ux).
+
+## Acceptance Criteria
+
+**AC-1: All 11 scenarios implemented**
+Given the test infrastructure from AZ-456 is in place,
+When `bun run test:fast` and `bun run test:e2e` execute,
+Then every scenario above is present, runs in its declared profile, and references its `results_report.md` row in test comments.
+
+**AC-2: Black-box discipline**
+No test imports anything from `src/api/client.ts` (other than the public `setToken` / `getToken` / `setNavigateToLogin` accessors per AC-23 / autodev Step 4 testability changes); no test imports `<AuthContext>` internals; no test asserts on React state directly.
+
+**AC-3: Storage assertions are exhaustive**
+NFT-SEC-01 / NFT-SEC-02 assert that **for the duration of the test** the bearer never appears in `localStorage`, `sessionStorage`, or `document.cookie` — not just at one snapshot.
+
+**AC-4: Redirect assertion uses the accessor**
+FT-N-04 and NFT-RES-08 install a `setNavigateToLogin(spy)` and assert `spy` was called exactly once with no arguments; they do NOT globally stub `window.location`.
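AC-3 above asks for a whole-test guarantee rather than a single snapshot. One way to get that is to record every write instead of inspecting the final state. This is a minimal sketch only: `RecordingStorage` and `bearerEverStored` are hypothetical names, and the real suite may observe browser storage through a different mechanism.

```typescript
// Hypothetical sketch: `RecordingStorage` and `bearerEverStored` are
// illustrative names; the real suite may observe storage differently.
class RecordingStorage {
  private data: Record<string, string> = {};
  // Every value ever written, kept even after removal or overwrite.
  readonly history: string[] = [];

  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : null;
  }

  setItem(key: string, value: string): void {
    this.data[key] = value;
    this.history.push(value);
  }

  removeItem(key: string): void {
    delete this.data[key];
  }
}

// True if the bearer appeared in any value written during the test.
function bearerEverStored(storage: RecordingStorage, bearer: string): boolean {
  return storage.history.some((value) => value.indexOf(bearer) !== -1);
}

const storage = new RecordingStorage();
storage.setItem("theme", "dark");
storage.setItem("debug", "token=abc.def.ghi");
storage.removeItem("debug"); // a snapshot taken now would miss the leak
```

A snapshot check after the `removeItem` call sees nothing, while the history-based check still reports the transient write.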
+
+## System Under Test Boundary
+
+- The system under test is the assembled SPA: React tree + `src/api/client.ts` + `<AuthContext>` + `<ProtectedRoute>` + Router.
+- Allowed stubs: the suite's `admin/` `auth/refresh` and `auth/login` endpoints — stubbed via MSW in `fast`, real service in `e2e`.
+- Disallowed: stubbing `src/api/client.ts`, `<AuthContext>`, `<ProtectedRoute>`, or any other internal SPA module. If `<AuthContext>` is not behaving as expected, the test FAILS — it does not bypass it.
+- Expected observables compared against `_docs/00_problem/input_data/expected_results/results_report.md` rows 01, 02, 03, 04, 05, 06, 11, 12, and the rows for FT-N-04 / NFT-RES-08 as listed in `traceability-matrix.md`.
+
+## Constraints
+
+- One test file per top-level concern; co-locate next to source (e.g., `src/api/client.test.ts`, `src/auth/AuthContext.test.tsx`, `src/auth/ProtectedRoute.test.tsx`).
+- MSW handlers live in `tests/msw/handlers/admin.ts`; per-test overrides via `server.use(...)` only.
+- E2E auth uses the `op_alice` seed user (test-data.md).
+
+## Risks & Mitigation
+
+**Risk 1 — Refresh-cookie HttpOnly invisibility**
+- *Risk*: AC-03 requires `HttpOnly` — by spec, the cookie is invisible to JS, so the fast profile cannot directly assert presence.
+- *Mitigation*: in `fast`, MSW asserts the response `Set-Cookie` header carries the three flags (the server contract). In `e2e`, the test uses Playwright's `context.cookies()` (which sees HttpOnly cookies) to verify presence + flags. NFT-SEC-03 runs e2e-only.
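The three-flag check from the mitigation above can be expressed as a small pure function over the raw `Set-Cookie` header value. This is a sketch under assumptions: `missingCookieFlags` is a hypothetical helper, and the cookie name `refresh` is made up for the example.

```typescript
// Hypothetical sketch: `missingCookieFlags` is an illustrative helper name and
// the cookie name `refresh` below is an assumption.
// Attribute names are compared in lower case because cookie attribute names
// are case-insensitive.
function missingCookieFlags(setCookie: string): string[] {
  const attrs = setCookie
    .split(";")
    .slice(1) // the first segment is the name=value pair
    .map((part) => part.trim().toLowerCase());
  const required = ["secure", "httponly", "samesite=strict"];
  return required.filter((flag) => attrs.indexOf(flag) === -1);
}

const good = "refresh=opaque; Path=/; Secure; HttpOnly; SameSite=Strict";
const bad = "refresh=opaque; Path=/; Secure; SameSite=Lax";

console.log(missingCookieFlags(good));
console.log(missingCookieFlags(bad));
```

Returning the list of missing flags, rather than a boolean, lets the failure message name exactly which part of the server contract was broken.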
@@ -0,0 +1,76 @@
+# Test — SSE Lifecycle & Bearer Rotation
+
+**Task**: AZ-458_test_sse_lifecycle
+**Name**: SSE lifecycle + bearer-rotation reconnect
+**Description**: Implement every blackbox test that exercises the SPA's SSE streams — live-GPS, annotation-status — covering open/close lifecycle and the bearer-rotation reconnect (≤5 s) after a token refresh.
+**Complexity**: 5 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 01_api-transport + 05_flights + 06_annotations (Blackbox Tests)
+**Tracker**: AZ-458
+**Epic**: AZ-455
+
+## Problem
+
+The SPA holds two long-running `EventSource` connections — live-GPS (per `<FlightsPage>`) and annotation-status (per `<AnnotationsPage>`) — using the in-query-string bearer pattern (ADR-008). The open/close timing and the reconnect-on-refresh path must be deterministic; otherwise tests downstream see flapping streams and undefined ordering.
+
+## Outcome
+
+- 9 test scenarios pass, asserting open/close timing and bearer-rotation reconnect behavior.
+- Fast tests use `tests/helpers/sse-mock.ts` (per AZ-456 Risk 3); e2e tests use the embedded LiveGPS + annotation-status generators in the suite test images.
+- No test reads or manipulates `src/api/sse.ts` internals.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-09 — annotation-status SSE opens on `<AnnotationsPage>` mount | fast + e2e | blackbox-tests.md | per FT-P-09 |
+| FT-P-10 — annotation-status SSE closes on unmount | fast | blackbox-tests.md | per FT-P-10 |
+| FT-P-18 — live-GPS SSE opens within 5 s of flight select | fast + e2e | blackbox-tests.md | per FT-P-18 |
+| FT-P-19 — live-GPS SSE closes within 1 s of deselect | fast | blackbox-tests.md | per FT-P-19 |
+| NFT-PERF-03 — SSE bearer-rotation reconnect ≤ 5 s | e2e | performance-tests.md | per NFT-PERF-03 |
+| NFT-PERF-04 — live-GPS SSE opens within 5 s of flight select | fast | performance-tests.md | per NFT-PERF-04 |
+| NFT-PERF-05 — live-GPS SSE closes within 1 s of deselect | fast | performance-tests.md | per NFT-PERF-05 |
+| NFT-PERF-06 — annotation-status SSE unsubscribes within 1 s on page unmount | fast | performance-tests.md | per NFT-PERF-06 |
+| NFT-RES-02 — SSE bearer rotation — both streams reconnect within 5 s | e2e | resilience-tests.md | per NFT-RES-02 |
+
+### Excluded
+
+- SSE server-disconnect indicator (covered in 23_test_network_resilience).
+- The 401-retry path on `fetch` (covered in 02_test_auth_token_handling).
+- The async-video detect SSE (covered in 06_test_detection_endpoints — quarantined as Phase B target).
+
+## Acceptance Criteria
+
+**AC-1: Open/close timing**
+Given a freshly-mounted `<FlightsPage>` or `<AnnotationsPage>`,
+When the relevant SSE should open or close per the scenario,
+Then the event happens within the timing budget and the observable EventSource state matches (`readyState` 1, i.e. `EventSource.OPEN`, for open; `readyState` 2, i.e. `EventSource.CLOSED`, for close).
+
+**AC-2: Bearer rotation**
+Given an active SSE stream and an upcoming token refresh,
+When the bearer rotates,
+Then both live-GPS and annotation-status streams reconnect with the new bearer within 5 s; no event from before the rotation is replayed AFTER the new bearer is in use.
+
+**AC-3: No internal stubs**
+The fast profile uses the published `sse-mock.ts` helper to simulate the wire-level SSE event stream. Tests MUST NOT replace `src/api/sse.ts` with a stub; if the SSE module misbehaves, the test FAILS.
+
+## System Under Test Boundary
+
+- System under test: `src/api/sse.ts` + the consumer hooks in `<FlightsPage>` / `<AnnotationsPage>` + the React tree.
+- Allowed stubs: the suite-side SSE endpoints — stubbed via the MSW SSE adapter / `sse-mock.ts` in `fast`, real `flights/` and `annotations/` services in `e2e`.
+- Disallowed: stubbing `src/api/sse.ts`, the SPA's SSE hooks, or the React tree.
+- Expected observables per `results_report.md` rows for FT-P-09, 10, 18, 19, NFT-PERF-03..06, NFT-RES-02.
+
+## Constraints
+
+- E2E tests rely on the embedded LiveGPS simulator and annotation-status event generator in the suite's test-mode images (per test-data.md).
+- Bearer in SSE URL `?token=` per ADR-008 — tests assert the URL pattern, not stripped headers.
+- Bearer-rotation tests MUST drive the rotation through `<AuthContext>`'s normal refresh path, not by directly calling `setToken`.
+
+## Risks & Mitigation
+
+**Risk 1 — MSW SSE polyfill brittleness (AZ-456 Risk 3)**
+- *Risk*: MSW 2.x does not have first-class SSE support. The `sse-mock.ts` helper introduces a small abstraction that, if buggy, makes tests pass that shouldn't.
+- *Mitigation*: every fast SSE scenario also has an e2e companion that exercises the real wire. If the two disagree, the e2e wins and the fast test is QUARANTINEd until the helper is fixed.
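The bearer-rotation budget (AC-2) reduces to a check over a log of stream opens. The sketch below assumes a hypothetical `SseOpen` log shape and hypothetical endpoint paths; only the `?token=` query parameter (ADR-008) and the 5 s budget come from this spec.

```typescript
// Hypothetical sketch: the `SseOpen` log shape, helper names, and endpoint
// paths are assumptions; only `?token=` and the 5 s budget come from the spec.
interface SseOpen {
  url: string; // URL the EventSource was opened with
  atMs: number; // time of the open, in ms on the test clock
}

function tokenOf(url: string): string | null {
  // The base is only needed so relative URLs parse; it is never asserted on.
  return new URL(url, "http://localhost").searchParams.get("token");
}

// True if some stream opened with `newToken` at or after the rotation and no
// later than `budgetMs` after it.
function reconnectedInBudget(
  opens: SseOpen[],
  newToken: string,
  rotatedAtMs: number,
  budgetMs = 5000,
): boolean {
  return opens.some(
    (open) =>
      tokenOf(open.url) === newToken &&
      open.atMs >= rotatedAtMs &&
      open.atMs - rotatedAtMs <= budgetMs,
  );
}

const opens: SseOpen[] = [
  { url: "/api/flights/live-gps?token=old", atMs: 0 },
  { url: "/api/flights/live-gps?token=new", atMs: 12300 },
];
```

With a rotation at 10 000 ms the reconnect at 12 300 ms is inside the budget; with a rotation at 5 000 ms the same reconnect would be 7 300 ms late and the check fails.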
@@ -0,0 +1,76 @@
+# Test — Wire-Contract Enum Compliance
+
+**Task**: AZ-459_test_wire_contract_enums
+**Name**: Enum compliance + MediaType hygiene
+**Description**: Implement every blackbox test that asserts the SPA's on-wire enum values (`AnnotationStatus`, `MediaStatus`, `Affiliation`, `CombatReadiness`, `MediaType`, `AnnotationSource`) match the contract pinned in `enum_spec_snapshot.json`, plus the MediaType magic-literal hygiene check.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 07_dataset + 06_annotations + cross-cutting (Blackbox Tests)
+**Tracker**: AZ-459
+**Epic**: AZ-455
+
+## Problem
+
+`AC-04` and `AC-29` require the UI to send/accept the exact numeric values from the suite spec. Today the UI carries drift (documented in `enum_spec_snapshot.json` § `ui_drift_summary`) — for `AnnotationStatus` the UI sends `Edited=1` while the contract pins `Edited=20`. Tests must FAIL on the drift so the regression is visible until the Phase B fix lands; they must NOT silently accept the wrong values.
+
+## Outcome
+
+- 4 test scenarios pass (or fail loudly to document the drift) per the contract pin.
+- Tests load `_docs/00_problem/input_data/enum_spec_snapshot.json` once per test file and compare runtime values against the pinned values.
+- `verification_pending: true` enums (`CombatReadiness`, `MediaType`) are quarantined with a clear marker until the Step 4 .NET-service inspection lands.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-04 — AnnotationStatus enum on the wire | static + fast | blackbox-tests.md | 14 |
+| FT-P-05 — MediaStatus / Affiliation / CombatReadiness enums match the spec | static + fast | blackbox-tests.md | 15, 16, 17 |
+| FT-P-06 — detection wire payload — affiliation + combatReadiness in spec value sets | fast + e2e | blackbox-tests.md | 18, 19 |
+| FT-N-15 — MediaType magic-literal / magic-string hygiene | static + fast | blackbox-tests.md | per FT-N-15 |
+
+### Excluded
+
+- Annotation save body shape (covered in 05_test_annotations_endpoint).
+- DetectionClasses ordering / hotkey path (covered in 17_test_detection_classes).
+
+## Acceptance Criteria
+
+**AC-1: Snapshot-driven assertion**
+Given `_docs/00_problem/input_data/enum_spec_snapshot.json` is the contract pin,
+When any test in this task runs,
+Then it loads the snapshot and asserts the runtime wire value matches the pinned value per rows 14-19 of `results_report.md`.
+
+**AC-2: Drift surfaces, not silently passes**
+Given the UI today drifts on `AnnotationStatus` (Edited=1 vs spec 20) and similar,
+When the test runs against today's UI,
+Then it FAILS with a message naming the enum and the observed-vs-expected pair. (It does NOT pass by re-reading the UI's value as authoritative.)
+
+**AC-3: verification_pending markers**
+Given `CombatReadiness` and `MediaType` carry `verification_pending: true` in the snapshot,
+When the test sees the flag,
+Then it emits a clear QUARANTINE marker (CSV `Result: QUARANTINE`) and a comment naming the resolution path (Step 4 .NET-service inspection).
+
+**AC-4: MediaType magic-literal hygiene (FT-N-15)**
+Given the source tree at HEAD,
+When the static check scans `src/` for hardcoded numeric `MediaType` literals,
+Then any literal not wrapped in the typed enum is flagged.
+
+## System Under Test Boundary
+
+- System under test: the actual fetch URLs and payloads issued by the SPA, the rendered DOM that surfaces enum-derived state, and (for FT-N-15) the source tree itself.
+- Allowed stubs: MSW handlers that capture the outbound request and assert the numeric values.
+- Disallowed: importing `src/types/index.ts` and comparing the snapshot to it — that's a tautology. The test compares the snapshot to the RUNTIME wire output. Importing the typed enum SHAPES for declaration is fine per `P9` / black-box discipline (the enums are the wire contract).
+- Expected observables compared against `_docs/00_problem/input_data/expected_results/results_report.md` rows 14-19.
+
+## Constraints
+
+- Snapshot must be read once per module; cache via Vitest module-level import.
+- FT-N-15 runs as a `ripgrep`-driven static check from `scripts/run-tests.sh --static-only`.
+
+## Risks & Mitigation
+
+**Risk 1 — verification_pending lifts mid-implementation**
+- *Risk*: If Step 4 .NET-service inspection lands while these tests are being implemented, the snapshot rewrites and a previously-QUARANTINEd test may flip to PASS or FAIL.
+- *Mitigation*: tests read the snapshot at runtime — they auto-adapt. The QUARANTINE marker is conditional on the runtime `verification_pending` flag.
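The three outcomes described in AC-1 through AC-3 (PASS, FAIL on drift, QUARANTINE on `verification_pending`) can be sketched as one decision function. The `PinnedEnum` shape is an assumption about how `enum_spec_snapshot.json` might be structured; only the flag name and the `Edited=1` vs `Edited=20` drift come from this spec.

```typescript
// Hypothetical sketch: the `PinnedEnum` shape is an assumption about the
// snapshot's structure; `checkWireValue` is an illustrative name.
interface PinnedEnum {
  values: Record<string, number>;
  verification_pending?: boolean;
}

type Verdict = { result: "PASS" | "FAIL" | "QUARANTINE"; message: string };

function checkWireValue(
  enumName: string,
  member: string,
  pinned: PinnedEnum,
  observed: number,
): Verdict {
  // The flag is read at runtime, so the verdict adapts when the snapshot changes.
  if (pinned.verification_pending) {
    return {
      result: "QUARANTINE",
      message: `${enumName}: awaiting Step 4 .NET-service inspection`,
    };
  }
  const expected = pinned.values[member];
  if (observed !== expected) {
    return {
      result: "FAIL",
      message: `${enumName}.${member}: observed ${observed}, expected ${expected}`,
    };
  }
  return { result: "PASS", message: `${enumName}.${member} matches the pin` };
}

const annotationStatus: PinnedEnum = { values: { Edited: 20 } };
const verdict = checkWireValue("AnnotationStatus", "Edited", annotationStatus, 1);
console.log(verdict.result, verdict.message);
```

The `observed` argument is meant to come from a captured outbound request, not from the UI's own type definitions, so the comparison does not become the tautology the boundary section rules out.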
@@ -0,0 +1,64 @@
+# Test — Annotations Endpoint & Payload Shape
+
+**Task**: AZ-460_test_annotations_endpoint
+**Name**: Annotation save URL + payload contract
+**Description**: Implement the two blackbox tests that pin the annotation-save endpoint URL (AC-N5 doubly-prefixed canary) and the full required-fields payload shape (`Source`, `WaypointId`, `videoTime`, `mediaId`, `detections`, `status`).
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (Blackbox Tests)
+**Tracker**: AZ-460
+**Epic**: AZ-455
+
+## Problem
+
+Two prior regressions (`_docs/02_document/modules/src__features__annotations.md` finding #32 and the AC-N5 doubly-prefixed URL) can recur silently if no test pins the contract. The annotation save path is one mutation away from losing AI vs Manual provenance or dropping the waypoint association; neither can be recovered after the fact.
+
+## Outcome
+
+- 2 test scenarios pass per the contract.
+- Tests assert ALL six required keys are present in every save, regardless of which UI path issued the save (AI suggestion accept, manual draw, bulk-edit).
+- The endpoint URL canary detects doubly-prefixed regressions immediately.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-07 — annotation save endpoint URL is doubly-prefixed | fast + e2e | blackbox-tests.md | per FT-P-07 |
+| FT-P-08 — annotation save body contains all required fields | fast + e2e | blackbox-tests.md | row 23 |
+
+### Excluded
+
+- Bulk-validate path (covered in 09_test_bulk_validate).
+- Annotation-status SSE (covered in 03_test_sse_lifecycle).
+- Tile-split path (covered in 19_test_tile_split_zoom).
+
+## Acceptance Criteria
+
+**AC-1: URL canary**
+FT-P-07 captures the outbound URL for every annotation save in the test and asserts it equals the expected canary (`/api/annotations/annotations/...`).
+
+**AC-2: Required-fields presence**
+FT-P-08 captures the outbound body for every annotation save and asserts ALL of `Source`, `WaypointId`, `videoTime`, `mediaId`, `detections`, `status` are present, with `Source` ∈ `{AI, Manual}` per the wire contract.
+
+**AC-3: Multiple entry points**
+The required-fields check runs for at least three save entry points: AI suggestion accept, manual draw, and bulk-edit save — to catch path-specific regressions.
+
+## System Under Test Boundary
+
+- System under test: `<AnnotationsPage>` + `<CanvasEditor>` + `src/api/client.ts` save call path.
+- Allowed stubs: MSW for the suite's annotations endpoint (fast); real `annotations/` service (e2e).
+- Disallowed: stubbing the components above; reading their internal state.
+- Expected observables compared against `results_report.md` row 23 + the row for FT-P-07.
+
+## Constraints
+
+- Tests MUST use the typed wire shape from `src/types/index.ts` (enum compliance is in 04_test_wire_contract_enums; here we only check key presence).
+- E2E tests run after a fresh seed so the test can issue multiple saves without polluting state.
+
+## Risks & Mitigation
+
+**Risk 1 — Source field default**
+- *Risk*: If a future commit defaults `Source` to one branch unconditionally, the test may pass but the provenance is lost.
+- *Mitigation*: AC-3 — the test runs for both AI and Manual entry points and asserts the specific value, not just presence.
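The URL canary and the required-fields check can be written as pure functions over a captured request. This is a sketch under assumptions: the helper names are hypothetical, `Source` is shown as a string for readability (the wire may carry a numeric enum value), and the canary is treated as a path prefix because the spec elides the rest of the URL.

```typescript
// Hypothetical sketch: helper names are illustrative. The six keys, the
// {AI, Manual} value set, and the URL prefix are taken from the spec; the
// string representation of `Source` and the field values are assumptions.
const REQUIRED_KEYS = [
  "Source",
  "WaypointId",
  "videoTime",
  "mediaId",
  "detections",
  "status",
];

function missingKeys(body: Record<string, unknown>): string[] {
  return REQUIRED_KEYS.filter((key) => !(key in body));
}

// Asserting the specific value per entry point guards against a `Source`
// that is present but defaulted to one branch.
function sourceMatches(
  body: Record<string, unknown>,
  expected: "AI" | "Manual",
): boolean {
  return body["Source"] === expected;
}

function isCanaryUrl(pathname: string): boolean {
  return pathname.indexOf("/api/annotations/annotations/") === 0;
}

const manualSave = {
  Source: "Manual",
  WaypointId: "wp-1",
  videoTime: 12.5,
  mediaId: "m-1",
  detections: [],
  status: 0,
};

console.log(missingKeys(manualSave), sourceMatches(manualSave, "Manual"));
```

Running `sourceMatches` once per entry point (AI suggestion accept with `"AI"`, manual draw with `"Manual"`) is what distinguishes a real provenance value from a hardcoded default.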
@@ -0,0 +1,62 @@
+# Test — Detection Endpoints (Sync / Async / Long-Video)
+
+**Task**: AZ-461_test_detection_endpoints
+**Name**: Detection endpoints — sync image + async video (Phase B target) + long-video header
+**Description**: Implement the three blackbox tests that exercise the `detect/` service's three detection paths from the SPA: sync image detect (FT-P-11), async video detect SSE (FT-P-12, quarantined as Phase B target), and the `X-Refresh-Token` header carried on long-video detect (FT-P-13).
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (detection sub-surface) (Blackbox Tests)
+**Tracker**: AZ-461
+**Epic**: AZ-455
+
+## Problem
+
+Detection is multimodal: sync for images (today), async with SSE updates for video (Phase B per AC-25), and long-video has its own header contract for token rotation. Without targeted tests, regressions slip in mode-by-mode.
+
+## Outcome
+
+- 3 test scenarios are present in their declared profile and reference their `results_report.md` rows.
+- FT-P-12 is QUARANTINEd in CI with a clear marker until the async-video path lands.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-11 — sync image detect endpoint | fast + e2e | blackbox-tests.md | per FT-P-11 |
+| FT-P-12 — async video detect endpoint + SSE (target — Phase B) | fast (quarantined) | blackbox-tests.md | per FT-P-12 |
+| FT-P-13 — long-video detect carries `X-Refresh-Token` header | fast | blackbox-tests.md | per FT-P-13 |
+
+### Excluded
+
+- Annotation save after detect (covered in 05_test_annotations_endpoint).
+- Detection class CRUD (covered in 17_test_detection_classes).
+
+## Acceptance Criteria
+
+**AC-1: Sync image detect URL + body**
+FT-P-11 asserts the outbound POST URL + body shape for sync image detect against the contract.
+
+**AC-2: Async video detect — quarantined**
+FT-P-12 is implemented and registered, but marked `Result: QUARANTINE` in the CSV report until AC-25 (Phase B) lands. The test code itself runs (does not just `xit`) and produces a clear log entry "FT-P-12 awaits AC-25 / async video detect impl".
+
+**AC-3: Header carried**
+FT-P-13 asserts every long-video detect request carries the `X-Refresh-Token` header.
+
+## System Under Test Boundary
+
+- System under test: the SPA's `<AnnotationsPage>` (or its detect-trigger sub-component) + `src/api/client.ts`.
+- Allowed stubs: MSW for `/api/detect/*` (fast); real `detect/` service (e2e — async video path stays quarantined until the suite has it).
+- Disallowed: stubbing the SPA components or constructing the request manually from a unit test.
+- Expected observables per `results_report.md` rows for FT-P-11, 12, 13.
+
+## Constraints
+
+- FT-P-12 is QUARANTINEd in CI, NOT skipped: the result must appear in the report with that status, traceable to AC-25.
+
+## Risks & Mitigation
+
+**Risk 1 — Async video detect path lands while tests are being implemented**
+- *Risk*: AC-25 may go from target to shipped, flipping FT-P-12 from QUARANTINE to PASS / FAIL.
+- *Mitigation*: FT-P-12 reads the runtime UI to detect whether the path exists; the QUARANTINE marker is conditional on absence.
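For AC-3 (FT-P-13), the header assertion has to hold for every captured long-video request, and header names must be compared case-insensitively. The sketch below assumes a hypothetical `CapturedRequest` shape and made-up URLs under `/api/detect/`; only the header name comes from this spec.

```typescript
// Hypothetical sketch: `CapturedRequest` is an assumed shape for whatever the
// test harness records, and the URLs are made up for the example.
interface CapturedRequest {
  url: string;
  headers: Record<string, string>;
}

// HTTP header names are case-insensitive, so compare in lower case.
// An empty value is treated as "not carried".
function hasHeader(req: CapturedRequest, name: string): boolean {
  const wanted = name.toLowerCase();
  return Object.keys(req.headers).some(
    (header) => header.toLowerCase() === wanted && req.headers[header] !== "",
  );
}

// Returns the URLs of requests that did not carry the header.
function requestsMissingHeader(reqs: CapturedRequest[], name: string): string[] {
  return reqs.filter((req) => !hasHeader(req, name)).map((req) => req.url);
}

const captured: CapturedRequest[] = [
  { url: "/api/detect/long-video/1", headers: { "x-refresh-token": "r1" } },
  { url: "/api/detect/long-video/2", headers: { Authorization: "Bearer t" } },
];

console.log(requestsMissingHeader(captured, "X-Refresh-Token"));
```

Reporting the offending URLs makes a failure point at the specific request that dropped the header.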
@@ -0,0 +1,57 @@
+# Test — Annotation Overlay Window Membership
+
+**Task**: AZ-462_test_overlay_membership
+**Name**: Overlay membership at the in-window edges
+**Description**: Pin the 4 inclusive/exclusive edge cases for the annotation overlay window (FT-P-14, FT-P-15, FT-N-01, FT-N-02). These edges have caused subtle off-by-one regressions before; without explicit edge-tests they recur silently.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (overlay window) (Blackbox Tests)
+**Tracker**: AZ-462
+**Epic**: AZ-455
+
+## Problem
+
+The overlay window logic treats an annotation as "in-window" iff `lowerBound <= ts <= upperBound`. Off-by-one mistakes (strict instead of inclusive, or shifted by one frame interval) only surface when the playback head is exactly on the boundary, a state that test scenarios rarely hit by accident.
+
+## Outcome
+
+- 4 edge-case scenarios pass per the inclusive-boundary contract.
+- Tests are deterministic — they construct exact-edge fixtures rather than relying on real-time playback.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-14 — overlay membership at the lower in-window edge | fast | blackbox-tests.md |
+| FT-P-15 — overlay membership at the upper in-window edge | fast | blackbox-tests.md |
+| FT-N-01 — overlay annotation below the lower bound is NOT rendered | fast | blackbox-tests.md |
+| FT-N-02 — overlay annotation above the upper bound is NOT rendered | fast | blackbox-tests.md |
+
+### Excluded
+
+- Overlay style / Z-order / color (covered by class-color tests in component scenarios).
+- Overlay clicking / selection (covered by 16_test_canvas_bbox).
+
+## Acceptance Criteria
+
+**AC-1: Inclusive boundary**
+FT-P-14 / FT-P-15 assert an annotation EXACTLY on `lowerBound` (resp. `upperBound`) IS rendered in the overlay.
+
+**AC-2: Strict exclusion**
+FT-N-01 / FT-N-02 assert an annotation one frame interval before / after the window is NOT rendered.
+
+**AC-3: Reads the DOM, not internal state**
+The "rendered" assertion queries the canvas / overlay DOM nodes (or rendered SVG / Leaflet markers). It does NOT inspect React component state.
+
+## System Under Test Boundary
+
+- System under test: `<CanvasEditor>` + overlay annotation rendering.
+- Allowed stubs: MSW for annotation list responses (the fixture pins exact timestamps).
+- Disallowed: stubbing the overlay logic itself, or asserting on React state.
+- Expected observables compared against `results_report.md` rows for FT-P-14, 15, FT-N-01, 02.
+
+## Constraints
+
+- Fixtures must pin timestamps to exact boundaries via deterministic numeric values; no `new Date()` or similar.
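The four edge cases follow directly from the inclusive rule `lowerBound <= ts <= upperBound`. Below is a minimal sketch of the predicate and the exact-edge fixtures; the function name, the bounds, and the 40 ms frame interval are illustrative assumptions, and a real test would assert on the rendered overlay rather than on this predicate.

```typescript
// Hypothetical sketch: names, bounds, and the 40 ms frame interval are
// illustrative; only the inclusive rule comes from the spec.
function isInWindow(ts: number, lowerBound: number, upperBound: number): boolean {
  return lowerBound <= ts && ts <= upperBound;
}

const lowerBound = 1000; // ms, fixed numbers rather than wall-clock time
const upperBound = 2000;
const frameInterval = 40;

// One fixture per scenario, each pinned exactly on or one frame outside an edge.
const edgeFixtures = [
  { scenario: "FT-P-14", ts: lowerBound, rendered: true },
  { scenario: "FT-P-15", ts: upperBound, rendered: true },
  { scenario: "FT-N-01", ts: lowerBound - frameInterval, rendered: false },
  { scenario: "FT-N-02", ts: upperBound + frameInterval, rendered: false },
];

// Fixtures whose expectation disagrees with the inclusive rule.
const mismatches = edgeFixtures.filter(
  (fixture) => isInWindow(fixture.ts, lowerBound, upperBound) !== fixture.rendered,
);

console.log(mismatches.length);
```

Swapping either `<=` for `<` makes FT-P-14 or FT-P-15 show up in `mismatches`, which is the off-by-one these scenarios exist to catch.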
@@ -0,0 +1,67 @@
+# Test — Flight Selection, Persistence & Memory Soak
+
+**Task**: AZ-463_test_flight_selection_persistence
+**Name**: Flight selection persistence + 100-selection soak + 1-hour SSE soak
+**Description**: Implement the blackbox tests that pin the `PUT /api/annotations/settings/user` persistence path for the selected flight, the rehydration-on-boot path, and the two memory-soak tests (100 selections + 1-hour live-GPS SSE) that catch listener leaks.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 10_app-shell + 05_flights (Blackbox Tests)
+**Tracker**: AZ-463
+**Epic**: AZ-455
+
+## Problem
+
+The selected-flight state is the SPA's most-touched mutable singleton (per `<FlightContext>`). It must persist via the suite, rehydrate on boot, and NOT leak SSE listeners or `<FlightContext>` consumers across rapid selections. Past regressions in this surface manifested as memory creep over long sessions, hard to catch without a soak.
+
+## Outcome
+
+- 4 scenarios pass — selection persistence, rehydration, listener leak guard, memory soak.
+- Soak tests run in CI but are gated to the `e2e` profile and tagged `@long-running` so they don't gate every commit.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-16 — flight selection persists via `PUT /api/annotations/settings/user` | fast + e2e | blackbox-tests.md |
+| FT-P-17 — selected-flight rehydration on boot | fast + e2e | blackbox-tests.md |
+| NFT-RES-LIM-06 — live-GPS SSE 1-hour soak — no listener leak, no memory creep | e2e (long-running) | resource-limit-tests.md |
+| NFT-RES-LIM-07 — 100 sequential flight selections — no leaked SSEs, no leaked Contexts | e2e (long-running) | resource-limit-tests.md |
+
+### Excluded
+
+- Live-GPS SSE timing (covered in 03_test_sse_lifecycle).
+- Flight CRUD / waypoint shape (covered in subsequent feature-cycle tasks; out of scope for the baseline test suite).
+
+## Acceptance Criteria
+
+**AC-1: Persistence wire pattern**
+FT-P-16 captures the outbound `PUT /api/annotations/settings/user` body and asserts the new `selectedFlightId` is in the payload (per the wire contract).
+
+**AC-2: Rehydration**
+FT-P-17 boots a fresh `<App>` against a seed where `op_alice` has `selectedFlightId` set; asserts the SPA renders that flight as initially selected without explicit user action.
+
+**AC-3: Listener leak guard**
+NFT-RES-LIM-07 performs 100 sequential `select(flightA) → select(flightB)` cycles; asserts the active EventSource count never exceeds 1 at the end of each cycle and the total `<FlightContext>` consumer count remains bounded.
+
+**AC-4: Memory soak**
+NFT-RES-LIM-06 maintains a live-GPS SSE open for 1 hour with the simulator emitting at 1 Hz; asserts the heap snapshot at t=3600 s is within 10% of the heap snapshot at t=60 s (allows for fixture growth, rejects leaks).
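The AC-4 tolerance check can be sketched as below. The helper names are hypothetical, and reading "within 10%" as a one-sided bound (growth only, since a smaller heap is not a leak) is an assumption rather than something the spec states.

```typescript
// Hypothetical helpers; only the 10% figure comes from AC-4.
// Returns relative heap growth, e.g. 0.08 for an 8% increase.
function heapGrowthRatio(baselineBytes: number, finalBytes: number): number {
  if (baselineBytes <= 0) {
    throw new RangeError("baseline heap size must be positive");
  }
  return (finalBytes - baselineBytes) / baselineBytes;
}

// Assumption: the check is one-sided, so a shrinking heap always passes.
function passesHeapSoak(baselineBytes: number, finalBytes: number, tolerance = 0.1): boolean {
  return heapGrowthRatio(baselineBytes, finalBytes) <= tolerance;
}
```

In the e2e test the two inputs would be the `usedJSHeapSize` readings taken at t=60 s and t=3600 s.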
+
+## System Under Test Boundary
+
+- System under test: `<FlightContext>` + `<FlightsPage>` + `src/api/sse.ts` + `src/api/client.ts`.
+- Allowed stubs: MSW for fast; real `flights/` + `annotations/` services for e2e.
+- Disallowed: stubbing `<FlightContext>` or its consumers; reading React state directly.
+- Expected observables per `results_report.md` rows for FT-P-16, 17 + the resource-limit row binding for the soak tests (rows 64-65 region per traceability-matrix.md).
+
+## Constraints
+
+- Long-running tests (NFT-RES-LIM-06, 07) tagged `@long-running` in the Playwright config; CI only runs them on `dev`/`stage` merges, not on every commit.
+- Memory measurements via `page.evaluate(() => performance.memory.usedJSHeapSize)` in Chromium; Firefox skipped for these two (Firefox does not expose `performance.memory`).
+
+## Risks & Mitigation
+
+**Risk 1 — Soak test flakiness on shared CI runners**
+- *Risk*: A noisy neighbor on the runner inflates heap measurements, false-failing the soak.
+- *Mitigation*: 10% tolerance + retry once on `dev`/`stage`; manual approval required on `main`.
@@ -0,0 +1,55 @@
+# Test — Bulk-Validate (Dataset)
+
+**Task**: AZ-464_test_bulk_validate
+**Name**: Bulk-validate URL + body + UI sync
+**Description**: Implement the 3 blackbox tests that pin the dataset bulk-validate path: outbound URL, request body shape, and the post-validate UI sync (≤2 s).
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 07_dataset (Blackbox Tests)
+**Tracker**: AZ-464
+**Epic**: AZ-455
+
+## Problem
+
+Bulk-validate is a single mutation that flips many `mediaStatus` values; getting the URL or the body wrong corrupts an arbitrary number of seeds. A targeted contract test pins both, plus a 2-s sync deadline catches stale-UI regressions.
+
+## Outcome
+
+- 3 scenarios pass per the contract.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-20 — bulk-validate request URL and body | fast + e2e | blackbox-tests.md |
+| FT-P-21 — bulk-validate UI reflects new status within 2 s | fast + e2e | blackbox-tests.md |
+| NFT-PERF-07 — bulk-validate UI reflects new status within 2 s | fast | performance-tests.md |
+
+### Excluded
+
+- Individual annotation status changes (covered in 05_test_annotations_endpoint).
+- Dataset filtering / paging (out of scope for the baseline test suite — feature-cycle work).
+
+## Acceptance Criteria
+
+**AC-1: URL canary**
+FT-P-20 captures the outbound bulk-validate URL and asserts it equals the contract value.
+
+**AC-2: Body shape**
+FT-P-20 captures the outbound body and asserts it carries the expected media-ID set + the target status value.
+
+**AC-3: UI sync deadline**
+FT-P-21 / NFT-PERF-07 measure the wall-clock from response receipt to DOM update of the dataset list rows; asserts ≤2 s.
+
+## System Under Test Boundary
+
+- System under test: `<DatasetPage>` + its bulk-validate action handler + `src/api/client.ts`.
+- Allowed stubs: MSW for the dataset bulk-validate endpoint; real `annotations/` service for e2e.
+- Disallowed: reading React state to assert UI sync — the test reads the rendered DOM (table rows / status badges).
+- Expected observables compared against `results_report.md` rows 36-37.
+
+## Constraints
+
+- Use the `seed_media` fixture's 6-row baseline so the bulk operation is bounded.
@@ -0,0 +1,66 @@
+# Test — i18n Coverage & Persistence
+
+**Task**: AZ-465_test_i18n
+**Name**: i18n key parity + t() coverage + detector + persistence
+**Description**: Implement the 4 blackbox tests that pin the i18n contract: en↔ua key parity (static), `t()` coverage (no raw user-visible strings), boot-time language detector, and persistence across reload.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 03_shared-ui + 10_app-shell (i18n) (Blackbox Tests)
+**Tracker**: AZ-465
+**Epic**: AZ-455
+
+## Problem
+
+A missing translation key or a hardcoded user-visible string only surfaces when a user switches language — by which point it's a customer-visible defect. Static checks + a behavioral test for detect/persist catch these at commit time.
+
+## Outcome
+
+- 4 scenarios pass per the contract.
+- The "no raw strings" check is enforceable in CI and produces a clear allow-list mechanism for legitimate non-i18n text (e.g. brand names).
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| FT-P-22 — i18n key parity en ↔ ua | static | blackbox-tests.md | 45 |
+| FT-P-23 — no raw user-visible strings outside `t(...)` | static | blackbox-tests.md | 46 |
+| FT-P-24 — i18n detector path used at first boot | fast + e2e | blackbox-tests.md | 47 |
+| FT-P-25 — i18n persistence across reload | fast + e2e | blackbox-tests.md | 48 |
+
+### Excluded
+
+- Adding a third language (project is en + ua per scope).
+- Right-to-left (RTL) layout support (not required by any AC).
+
+## Acceptance Criteria
+
+**AC-1: Key parity**
+Static check: `keys(en.json) == keys(ua.json)` (set equality). Test FAILS on any drift.
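A minimal sketch of the set-equality check in AC-1, assuming the bundles are nested JSON objects whose leaves are strings and that keys are compared as dot-joined paths. `flattenKeys`, `keyDrift`, and the sample bundles are hypothetical.

```typescript
type Bundle = { [key: string]: string | Bundle };

// Collects every leaf key as a dot-joined path, e.g. "nav.flights".
function flattenKeys(bundle: Bundle, prefix = ""): string[] {
  const keys: string[] = [];
  for (const name of Object.keys(bundle)) {
    const value = bundle[name];
    const path = prefix === "" ? name : prefix + "." + name;
    if (typeof value === "string") {
      keys.push(path);
    } else if (value !== undefined) {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys;
}

// Reports drift in both directions; parity holds when both lists are empty.
function keyDrift(en: Bundle, ua: Bundle): { missingInUa: string[]; missingInEn: string[] } {
  const enKeys = new Set(flattenKeys(en));
  const uaKeys = new Set(flattenKeys(ua));
  return {
    missingInUa: Array.from(enKeys).filter((k) => !uaKeys.has(k)).sort(),
    missingInEn: Array.from(uaKeys).filter((k) => !enKeys.has(k)).sort(),
  };
}

// Illustrative bundles only; the real check would load en.json and ua.json.
const enSample: Bundle = { nav: { flights: "Flights", dataset: "Dataset" }, save: "Save" };
const uaSample: Bundle = { nav: { flights: "Польоти" }, save: "Зберегти" };
const sampleDrift = keyDrift(enSample, uaSample);
```

Reporting the missing keys per direction, instead of a bare boolean, lets the failing test name the drifted key.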
+
+**AC-2: t() coverage**
+Static check via ripgrep + AST walker: every JSX text node and string-literal `aria-*` / `title` / `placeholder` either lives in an i18n key, is in the allow-list (brand names, version strings), or fails the check.
+
+**AC-3: Detector path**
+First boot with no persisted language preference: SPA reads `navigator.language` and renders the matching bundle.
+
+**AC-4: Persistence**
+After user switches to UA and reloads, the UA bundle is rendered without explicit user action.
+
+## System Under Test Boundary
+
+- System under test: `src/i18n/i18n.ts` + every React component rendering user-visible text.
+- Allowed stubs: none beyond the standard test renderer.
+- Disallowed: reading the i18n state directly — the test asserts the rendered DOM text.
+- Expected observables per rows 45-48.
+
+## Constraints
+
+- Allow-list file lives at `tests/i18n-allowlist.json`; CI enforces it must not grow without a code-review reason.
+
+## Risks & Mitigation
+
+**Risk 1 — AST walker false positives**
+- *Risk*: the t() coverage walker may misclassify dynamic strings (e.g. ``t(`key_${id}`)``) or ternaries.
+- *Mitigation*: explicit allow-list per file, plus a comment marker `// i18n-ok: <reason>` honored by the walker.
@@ -0,0 +1,73 @@
+# Test — Destructive UX & ConfirmDialog
+
+**Task**: AZ-466_test_destructive_ux
+**Name**: Destructive UX policy + ConfirmDialog a11y + no-alert + cancel paths
+**Description**: Implement the 8 blackbox tests that pin the destructive-action policy: every destructive surface (delete class, delete user, etc.) shows a `<ConfirmDialog>` before issuing the request; the dialog has proper a11y; cancel suppresses the request; no `alert()` is ever used.
+**Complexity**: 4 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 03_shared-ui (ConfirmDialog) + 08_admin (Blackbox Tests)
+**Tracker**: AZ-466
+**Epic**: AZ-455
+
+## Problem
+
+Without a uniform destructive-action gate, regressions add a delete button that fires directly — a one-click-data-loss bug. The policy is "no destructive action without ConfirmDialog, no `alert()` anywhere".
+
+## Outcome
+
+- 8 scenarios pass per the policy.
+- A static check enumerates every destructive surface to keep the policy enforceable.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-26 — class-delete with confirmation — happy path | fast + e2e | blackbox-tests.md |
+| FT-P-27 — destructive policy — dialog before request for every destructive surface | fast (static enumeration) | blackbox-tests.md |
+| FT-P-28 — ConfirmDialog has dialog + modal a11y attributes | fast | blackbox-tests.md |
+| FT-P-29 — ConfirmDialog focus trap (Tab cycles inside) | fast | blackbox-tests.md |
+| FT-N-07 — class-delete Cancel path — NO DELETE request issued | fast | blackbox-tests.md |
+| FT-N-08 — Escape on `<ConfirmDialog>` cancels — no destructive request | fast | blackbox-tests.md |
+| NFT-SEC-07 — `alert()` is forbidden anywhere in the SPA | static | security-tests.md |
+| NFT-SEC-08 — ConfirmDialog gates every destructive action | static + fast | security-tests.md |
+
+### Excluded
+
+- ConfirmDialog content / phrasing (covered by i18n parity in 10_test_i18n).
+- Specific delete-target wire shapes (covered by per-feature tasks).
+
+## Acceptance Criteria
+
+**AC-1: Happy path**
+FT-P-26 simulates user clicking Delete → confirming → asserts the DELETE request fires AFTER the confirm.
+
+**AC-2: Cancel paths**
+FT-N-07 / FT-N-08 assert that pressing Cancel / Escape on the dialog suppresses the DELETE request entirely.
+
+**AC-3: a11y**
+FT-P-28 / FT-P-29 assert `role="dialog"`, `aria-modal="true"`, `aria-labelledby` / `aria-describedby` linkage, and a focus trap that keeps Tab inside the dialog.
+
+**AC-4: Policy enforcement**
+FT-P-27 / NFT-SEC-08 — a static check enumerates every surface with a `data-destructive` (or equivalent) attribute and asserts each one mounts a `<ConfirmDialog>` before its mutating handler runs.
+
+**AC-5: No alert()**
+NFT-SEC-07: a ripgrep static check (`rg -nF 'alert(' src/`) returns no hits outside test files.
+
+## System Under Test Boundary
+
+- System under test: `<ConfirmDialog>` + every destructive surface (delete class, delete user, etc.).
+- Allowed stubs: MSW for the suite's delete endpoints (fast); real services (e2e).
+- Disallowed: stubbing `<ConfirmDialog>`; reading its React state.
+- Expected observables per `results_report.md` rows 49-51 + the rows for NFT-SEC-07, 08.
+
+## Constraints
+
+- Static check (FT-P-27 / NFT-SEC-08) requires a discoverable marker on destructive surfaces; this task lands the test, and per-component tasks (already in scope above) wire the markers.
+
+## Risks & Mitigation
+
+**Risk 1 — A destructive surface is added without the marker**
+- *Risk*: a new feature adds a delete-button that bypasses the static check.
+- *Mitigation*: the marker is on the shared `<DestructiveButton>` wrapper; using raw `<button>` for destructive actions is flagged by an ESLint rule landed in this task.
@@ -0,0 +1,68 @@
+# Test — ProtectedRoute, RBAC & Auth Spinner
+
+**Task**: AZ-467_test_protected_route_rbac
+**Name**: ProtectedRoute spinner + timeout + RBAC route gating
+**Description**: Implement the 7 blackbox tests that pin `<ProtectedRoute>`'s auth-loading spinner (a11y), the 10 s timeout fallback, and the client-side RBAC gating for `/admin` and `/settings` (defence in depth; server enforces too).
+**Complexity**: 4 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 02_auth + 03_shared-ui (Blackbox Tests)
+**Tracker**: AZ-467
+**Epic**: AZ-455
+
+## Problem
+
+`<ProtectedRoute>` is the single client-side gate; getting it wrong leaks RBAC. The spinner + timeout fallback prevents indefinite waits if the bootstrap refresh hangs. RBAC tests run with three seed users (Operator, Admin, integrator-Dave-without-SETTINGS).
+
+## Outcome
+
+- 7 scenarios pass: 4 RBAC paths + 3 spinner/timeout paths.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-32 — ProtectedRoute spinner a11y | fast | blackbox-tests.md |
+| FT-P-33 — ProtectedRoute timeout fallback after 10 s | fast | blackbox-tests.md |
+| FT-N-03 — authenticated non-admin → `/admin` redirects to `/flights` | fast + e2e | blackbox-tests.md |
+| FT-N-05 — authenticated user without SETTINGS permission → `/settings` | fast + e2e | blackbox-tests.md |
+| NFT-SEC-05 — `/admin` blocks non-admins client-side | fast | security-tests.md |
+| NFT-SEC-06 — `/settings` route gate per RBAC | fast | security-tests.md |
+| NFT-RES-04 — ProtectedRoute loading timeout fallback after 10 s | fast | resilience-tests.md |
+
+### Excluded
+
+- Unauthenticated `/admin` (FT-N-04 — covered in 02_test_auth_token_handling).
+- Server-side enforcement (suite-level test, out of UI scope).
+
+## Acceptance Criteria
+
+**AC-1: Spinner a11y**
+`role="status"` + `aria-live="polite"` + accessible label on the loading element.
+
+**AC-2: Timeout fallback**
+After 10 s without bootstrap-refresh resolution, fallback UI appears with a retry affordance.
+
+**AC-3: RBAC redirects**
+Operator → `/admin` redirects to `/flights`. integrator-Dave → `/settings` redirects per the policy. Admin → `/admin` reaches it normally.
+
+**AC-4: Both fast + e2e**
+RBAC tests run in both profiles. Fast asserts redirect via Router state; e2e asserts via `page.url()`.
+
+## System Under Test Boundary
+
+- System under test: `<ProtectedRoute>` + Router + `<AuthContext>`.
+- Allowed stubs: MSW for `/api/admin/auth/me` returning the right RBAC payload per seed user; real `admin/` in e2e.
+- Disallowed: stubbing `<ProtectedRoute>`, `<AuthContext>`, or Router.
+- Expected observables per `results_report.md` rows for FT-P-32, 33, FT-N-03, 05, NFT-SEC-05, 06, NFT-RES-04.
+
+## Constraints
+
+- Use seed users `op_alice` (Operator), `admin_carol` (Admin), `integrator_dave` (no SETTINGS) per test-data.md.
+
+## Risks & Mitigation
+
+**Risk 1 — Timeout test flakiness**
+- *Risk*: 10 s wait is long for a fast suite; CI variance may flake.
+- *Mitigation*: use Vitest fake-timers to advance the clock instantaneously.
@@ -0,0 +1,50 @@
+# Test — Header Flight Dropdown a11y
+
+**Task**: AZ-468_test_header_dropdown
+**Name**: Header flight dropdown — closed/open a11y + Escape handler
+**Description**: Implement the 3 blackbox tests pinning the header flight dropdown's open/closed-state a11y attributes and the Escape-to-close handler-detachment behavior.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 10_app-shell (Header) (Blackbox Tests)
+**Tracker**: AZ-468
+**Epic**: AZ-455
+
+## Problem
+
+The header dropdown is keyboard-traversed dozens of times per session; an a11y regression makes the app unusable for keyboard / screen-reader users. The Escape handler must detach on close; a leaked handler would hijack Escape elsewhere.
+
+## Outcome
+
+- 3 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-30 — header flight dropdown closed-state a11y | fast | blackbox-tests.md |
+| FT-P-31 — header flight dropdown open-state a11y | fast | blackbox-tests.md |
+| FT-N-09 — header dropdown Escape — close + handler detached | fast | blackbox-tests.md |
+
+### Excluded
+
+- Flight selection logic itself (covered in 08_test_flight_selection_persistence).
+
+## Acceptance Criteria
+
+**AC-1: Closed state**
+`aria-expanded="false"`; trigger has accessible name; no `aria-activedescendant`.
+
+**AC-2: Open state**
+`aria-expanded="true"`; `role="listbox"` (or `menu`); option list has roles; `aria-activedescendant` points to a real id.
+
+**AC-3: Escape detach**
+After Escape closes the dropdown, the document-level Escape handler installed by the dropdown is removed (tracked via `addEventListener` / `removeEventListener` spies). No leakage into other components' Escape handlers.
+
+## System Under Test Boundary
+
+- System under test: `<Header>` flight dropdown + Escape handler.
+- Allowed stubs: MSW for flights list endpoint.
+- Disallowed: reading dropdown React state.
+- Expected observables per `results_report.md` rows for FT-P-30, 31, FT-N-09.
@@ -0,0 +1,56 @@
+# Test — Browser Support & Responsive Variants
+
+**Task**: AZ-469_test_browser_support_responsive
+**Name**: Browser-support smoke (Chromium + Firefox) + responsive variants (mobile + desktop)
+**Description**: Implement the 3 blackbox tests pinning the cross-browser smoke (AC-18: Chromium + Firefox latest 2) and the responsive bottom-nav / top-bar variants at 480 px / 1024 px breakpoints.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 10_app-shell (Header / nav) (Blackbox Tests)
+**Tracker**: AZ-469
+**Epic**: AZ-455
+
+## Problem
+
+Browser-support and responsive defects are class-of-bug regressions: they don't surface in dev unless the dev runs both browsers at both breakpoints. CI-pinned smoke catches them at commit time.
+
+## Outcome
+
+- 3 scenarios pass; Chromium + Firefox Playwright projects each execute every e2e scenario in this task.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-34 — browser-support smoke (Chromium + Firefox) | e2e | blackbox-tests.md |
+| FT-P-35: mobile bottom-nav variant at 480 px | fast (React Testing Library viewport mock) + e2e | blackbox-tests.md |
+| FT-P-36 — desktop top-bar variant at 1024 px | fast + e2e | blackbox-tests.md |
+
+### Excluded
+
+- Per-page layout (covered by each page's tests).
+- Tablet / non-target breakpoints (no AC).
+
+## Acceptance Criteria
+
+**AC-1: Cross-browser**
+FT-P-34 navigates to `/flights`, `/annotations`, `/dataset` in both Chromium and Firefox; asserts core elements render in both.
+
+**AC-2: Mobile variant**
+At viewport 480×800 the bottom-nav is rendered; the desktop top-bar is hidden.
+
+**AC-3: Desktop variant**
+At viewport 1024×768 the top-bar is rendered; the bottom-nav is hidden.
+
+## System Under Test Boundary
+
+- System under test: app shell + responsive header / nav components.
+- Allowed stubs: standard MSW handlers.
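+- Disallowed: stubbing media-query matchers to force a variant (the viewport-driven `matchMedia` polyfill named under Constraints is the only exception), or reading internal React state for layout.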
+- Expected observables per `results_report.md` rows 60-62.
+
+## Constraints
+
+- Playwright config defines two browser projects (Chromium, Firefox); this task does NOT introduce a third browser.
+- Viewport assertion uses `page.setViewportSize()` in e2e and `matchMedia` polyfill + viewport mock in fast.
@@ -0,0 +1,54 @@
+# Test — Panel-Width Persistence
+
+**Task**: AZ-470_test_panel_width_persistence
+**Name**: Panel-width debounced PUT + rehydration on reload
+**Description**: Implement the 3 blackbox tests pinning the panel-width persistence behavior: a debounced PUT (≤1 s after resize-end) and rehydration of the saved width on reload.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 10_app-shell + 06_annotations (split panels) (Blackbox Tests)
+**Tracker**: AZ-470
+**Epic**: AZ-455
+
+## Problem
+
+Layout preferences are a quality-of-life feature that must NOT thrash the suite with PUTs (debounce required) and MUST persist across reloads (saved-state contract). Both regress silently — users notice but rarely report.
+
+## Outcome
+
+- 3 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-37 — panel-width persistence — debounced PUT on resize end | fast | blackbox-tests.md |
+| FT-P-38 — panel-width rehydration on reload | fast + e2e | blackbox-tests.md |
+| NFT-PERF-08 — panel-width persistence debounce ≤ 1 s after resize-end | fast | performance-tests.md |
+
+### Excluded
+
+- Flight-selection persistence (covered in 08_test_flight_selection_persistence) — different state key but same `PUT /api/annotations/settings/user` endpoint.
+
+## Acceptance Criteria
+
+**AC-1: Debounce window**
+Multiple resize events within 1 s yield exactly one outbound PUT. Asserted via captured request log.
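The AC-1 expectation can be modelled without real timers. The sketch below assumes a trailing-edge debounce and a 1000 ms wait; both are assumptions about the implementation, and `countDebouncedFlushes` is a hypothetical model, not project code.

```typescript
// Models a trailing-edge debounce: a burst of events flushes once, after
// `waitMs` has elapsed with no further event. Returns the number of flushes.
function countDebouncedFlushes(eventTimesMs: number[], waitMs: number): number {
  const sorted = eventTimesMs.slice().sort((a, b) => a - b);
  let flushes = 0;
  let previous: number | null = null;
  for (const t of sorted) {
    // A gap of at least `waitMs` means the earlier burst already flushed,
    // so this event starts a new burst.
    if (previous === null || t - previous >= waitMs) {
      flushes++;
    }
    previous = t;
  }
  return flushes;
}
```

For example, resize events at 0, 100, 250, and 400 ms form one burst and yield one PUT, while a later event at 2000 ms starts a second burst.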
+
+**AC-2: Body shape**
+The PUT body carries the `panelWidths` field (per the contract).
+
+**AC-3: Rehydration**
+After reload with `seed_user_settings.panelWidths` set, the rendered panel widths match the seed.
+
+## System Under Test Boundary
+
+- System under test: panel split component(s) + `<UserSettings>` save path.
+- Allowed stubs: MSW for `PUT /api/annotations/settings/user`.
+- Disallowed: reading internal React state.
+- Expected observables per `results_report.md` rows 64-65.
+
+## Constraints
+
+- Use Vitest fake-timers to simulate the debounce window deterministically.
@@ -0,0 +1,70 @@
+# Test — Canvas Editor (Bounding-Box + Multi-Select + Zoom + Pan)
+
+**Task**: AZ-471_test_canvas_bbox
+**Name**: CanvasEditor manual draw + 8-handle resize + Ctrl+click multi-select + Ctrl+wheel zoom + Ctrl+drag pan
+**Description**: Implement the 5 blackbox tests covering the core canvas interactions: draw bbox, resize via 8 handles, multi-select via Ctrl+click, zoom-around-cursor via Ctrl+wheel, pan via Ctrl+drag.
+**Complexity**: 5 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (CanvasEditor) (Blackbox Tests)
+**Tracker**: AZ-471
+**Epic**: AZ-455
+
+## Problem
+
+The canvas editor is the SPA's most-interactive surface; canvas-pixel coordinates introduce floating-point + DPR gotchas that have caused subtle off-by-pixel regressions. The 5 scenarios pin geometry, modifier-key semantics, and viewport transformation.
+
+## Outcome
+
+- 5 scenarios pass with deterministic numeric fixtures (no `getBoundingClientRect` flakiness across viewport sizes).
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-39 — manual bounding-box draw on `<CanvasEditor>` | fast + e2e | blackbox-tests.md |
+| FT-P-40 — 8-handle bbox resize | fast | blackbox-tests.md |
+| FT-P-41 — Ctrl+click multi-select on canvas | fast | blackbox-tests.md |
+| FT-P-42 — Ctrl+wheel zoom-around-cursor | fast | blackbox-tests.md |
+| FT-P-43 — Ctrl+drag pan on empty canvas | fast | blackbox-tests.md |
+
+### Excluded
+
+- Tile-split interaction (covered in 19_test_tile_split_zoom).
+- Annotation overlay membership (covered in 07_test_overlay_membership).
+- Save-on-change (covered in 05_test_annotations_endpoint).
+
+## Acceptance Criteria
+
+**AC-1: Manual draw geometry**
+Draw a bbox at `(x1,y1)→(x2,y2)`; resulting annotation carries the canonical canvas-coordinate quad. Floating-point compare within ±0.5 px tolerance.
+
+**AC-2: 8-handle resize**
+Drag each of the 8 handles independently; assert resulting bbox geometry per handle. Each handle's anchor (opposite corner / edge midpoint) is invariant during the drag.
+
+**AC-3: Ctrl+click multi-select**
+Ctrl+click on a second bbox adds it to the selection; selection set contains both bboxes (asserted via DOM rendering — selection ring style).
+
+**AC-4: Zoom-around-cursor**
+Ctrl+wheel at cursor `(cx, cy)`: the canvas pixel under `(cx, cy)` BEFORE the wheel equals the canvas pixel under `(cx, cy)` AFTER (within ±0.5 px).
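The AC-4 invariant follows from a simple affine viewport model. The sketch below assumes `screen = canvas * scale + offset`; the `CanvasViewport` shape and both function names are hypothetical, and the real `<CanvasEditor>` transform may be structured differently.

```typescript
// Assumed model: screen = canvas * scale + offset.
interface CanvasViewport {
  scale: number;
  offsetX: number;
  offsetY: number;
}

// Inverse of the model: which canvas point sits under a screen position.
function screenToCanvas(v: CanvasViewport, sx: number, sy: number): { x: number; y: number } {
  return { x: (sx - v.offsetX) / v.scale, y: (sy - v.offsetY) / v.scale };
}

// Rescales by `factor` and recomputes the offset so that the canvas point
// under the cursor (cx, cy) stays under the cursor.
function zoomAroundCursor(v: CanvasViewport, cx: number, cy: number, factor: number): CanvasViewport {
  const anchor = screenToCanvas(v, cx, cy);
  const scale = v.scale * factor;
  return {
    scale,
    offsetX: cx - anchor.x * scale,
    offsetY: cy - anchor.y * scale,
  };
}
```

A test in this style compares `screenToCanvas` at the cursor before and after the zoom and accepts a difference of at most 0.5 px on each axis.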
+
+**AC-5: Empty-canvas pan**
+Ctrl+drag on an empty canvas region: viewport offset shifts by `(dx, dy)`; bbox positions in canvas coords are invariant.
+
+## System Under Test Boundary
+
+- System under test: `<CanvasEditor>` + its pointer/mouse handlers + the canvas-coordinate ↔ viewport transform.
+- Allowed stubs: MSW for annotation load.
+- Disallowed: stubbing the canvas component or asserting on React state. Pointer events are dispatched via RTL `user-event` (or Playwright `dispatchEvent` for e2e).
+- Expected observables per `results_report.md` rows 73-78 (Group 16).
+
+## Constraints
+
+- Use fixed-size canvas (640×480) so coordinate math is deterministic.
+
+## Risks & Mitigation
+
+**Risk 1 — DPR + retina display flakiness**
+- *Risk*: Test runner on a retina display reports different physical pixels than the e2e Docker container.
+- *Mitigation*: Playwright config forces `deviceScaleFactor: 1`; Vitest+jsdom defaults to DPR 1.
@@ -0,0 +1,59 @@
+# Test — DetectionClasses (Load, Hotkeys, Click, Fallback)
+
+**Task**: AZ-472_test_detection_classes
+**Name**: DetectionClasses load + 1-9 hotkeys + click path + empty/5xx fallback
+**Description**: Implement the 4 blackbox tests pinning `<DetectionClasses>` behavior: load from `/api/annotations/classes`, hotkey 1-9 → `classes[(key-1) + P]` (P = current PhotoMode offset), click-to-select path, and the fallback list when the API is empty or 5xx.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (DetectionClasses) + 11_class-colors (Blackbox Tests)
+**Tracker**: AZ-472
+**Epic**: AZ-455
+
+## Problem
+
+`<DetectionClasses>` is keyboard-driven during annotation; the hotkey-to-class mapping changes with PhotoMode (offset P). Wrong P leaks the wrong class number on save — a quality regression hard to spot without explicit hotkey tests.
+
+## Outcome
+
+- 4 scenarios pass — load contract, hotkey arithmetic, click selection, fallback robustness.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-44 — DetectionClasses loads from `/api/annotations/classes` | fast + e2e | blackbox-tests.md |
+| FT-P-45 — class hotkey 1–9 selects `classes[(key-1) + P]` | fast | blackbox-tests.md |
+| FT-P-46 — class click path | fast | blackbox-tests.md |
+| FT-P-47 — fallback class list on API empty/5xx | fast | blackbox-tests.md |
+
+### Excluded
+
+- PhotoMode switching itself (covered in 18_test_photo_mode).
+- Class CRUD via Admin (out of scope; per-feature task in Phase B).
+
+## Acceptance Criteria
+
+**AC-1: Load contract**
+FT-P-44 — outbound URL is `/api/annotations/classes`; response shape matches `seed_classes` (per the contract).
+
+**AC-2: Hotkey arithmetic**
+FT-P-45 — for each PhotoMode P ∈ {0, 20, 40}, pressing keys 1..9 selects the corresponding class from the appropriate window of 9.
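A short sketch of the AC-2 arithmetic, using the `classes[(key-1) + P]` formula from the task description. The function names are hypothetical; the offsets 0, 20, and 40 are the ones listed in AC-2.

```typescript
// Index into the class list for hotkey `key` (1..9) at PhotoMode offset P.
function classIndexForHotkey(key: number, photoModeOffset: number): number {
  if (!Number.isInteger(key) || key < 1 || key > 9) {
    throw new RangeError("hotkey must be an integer from 1 to 9");
  }
  return key - 1 + photoModeOffset;
}

// The nine indexes reachable by hotkey in one PhotoMode.
function hotkeyWindow(photoModeOffset: number): number[] {
  const indexes: number[] = [];
  for (let key = 1; key <= 9; key++) {
    indexes.push(classIndexForHotkey(key, photoModeOffset));
  }
  return indexes;
}
```

With P = 20, keys 1..9 map to indexes 20..28; a table-driven test can iterate over all three offsets and compare the selected class against this window.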
+
+**AC-3: Click**
+FT-P-46 — clicking a class entry selects that class; outbound annotation save (if any) carries the right classId.
+
+**AC-4: Fallback**
+FT-P-47 — when `/api/annotations/classes` returns 200 with `[]` or any 5xx, the fallback class list is rendered; tests assert the fallback length and a marker indicating fallback mode.
+
+## System Under Test Boundary
+
+- System under test: `<DetectionClasses>` + `<PhotoModeContext>` (read-only here).
+- Allowed stubs: MSW for `/api/annotations/classes`.
+- Disallowed: stubbing `<DetectionClasses>` or reading its React state.
+- Expected observables per `results_report.md` rows 73-77 (Group 16).
+
+## Constraints
+
+- `seed_classes` MUST satisfy N≥9 (per test-data.md) so all hotkeys 1..9 are hot in P=0.
@@ -0,0 +1,54 @@
+# Test — PhotoMode Switch & yoloId Wire
+
+**Task**: AZ-473_test_photo_mode
+**Name**: PhotoMode switch + auto-select + yoloId offsetting on the wire
+**Description**: Implement the 3 blackbox tests pinning PhotoMode behavior: switching modes applies the offset filter, a valid class is auto-selected when the prior class is no longer valid, and the on-wire value satisfies `classNum == classId + photoModeOffset`.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (PhotoModeContext + AnnotationsPage) (Blackbox Tests)
+**Tracker**: AZ-473
+**Epic**: AZ-455
+
+## Problem
+
+PhotoMode offsets `classId` to `classNum` on the wire (`classNum == classId + photoModeOffset`). Getting this wrong leaks a different class on every save without any user-visible symptom — until a downstream consumer mis-buckets the data.
+
+## Outcome
+
+- 3 scenarios pass — mode switch, auto-select on invalid, wire offset arithmetic.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-48 — PhotoMode switch — mode set + filter | fast | blackbox-tests.md |
+| FT-P-49 — PhotoMode auto-select when prior class no longer valid | fast | blackbox-tests.md |
+| FT-P-50 — yoloId on the wire — `classNum == classId + photoModeOffset` | fast + e2e | blackbox-tests.md |
+
+### Excluded
+
+- DetectionClasses load + hotkeys (covered in 17_test_detection_classes).
+
+## Acceptance Criteria
+
+**AC-1: Switch sets filter**
+FT-P-48 — toggling PhotoMode updates the rendered class list (filter applied); the selected mode is persisted in `<PhotoModeContext>` (asserted via the rendered filter, not via context read).
+
+**AC-2: Auto-select**
+FT-P-49 — switching to a mode where the currently-selected class is out-of-range auto-selects the first valid class in the new window.
+
+**AC-3: Wire offset**
+FT-P-50 — issue an annotation save in mode P; outbound body carries `classNum == classId + P` for every detection.
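The AC-3 assertion reduces to a per-detection equality over the captured body. The sketch below assumes the test has already extracted the selected `classId` values and the outbound `classNum` values into two arrays in the same order; `wireMatchesOffset` is a hypothetical helper.

```typescript
// True when every outbound classNum equals its classId plus the PhotoMode
// offset. A length mismatch also fails, so a dropped detection is caught.
function wireMatchesOffset(classIds: number[], wireClassNums: number[], photoModeOffset: number): boolean {
  if (classIds.length !== wireClassNums.length) {
    return false;
  }
  return classIds.every((classId, i) => wireClassNums[i] === classId + photoModeOffset);
}
```

Running the same check once per mode (P = 0, 20, 40) covers the constraint that every mode saves a probe annotation.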
+
+## System Under Test Boundary
+
+- System under test: `<PhotoModeContext>` + `<AnnotationsPage>` save call.
+- Allowed stubs: MSW for `/api/annotations/classes` + annotation save.
+- Disallowed: reading `<PhotoModeContext>` state directly.
+- Expected observables per `results_report.md` rows 77-80 region.
+
+## Constraints
+
+- Tests exercise all three modes (P ∈ {0, 20, 40}); each saves a probe annotation and asserts the wire offset.
@@ -0,0 +1,73 @@
+# Test — Tile Split, Zoom Indicator & YOLO Parser
+
+**Task**: AZ-474_test_tile_split_zoom
+**Name**: Tile-split endpoint + YOLO parser + isSplit handling + auto-zoom viewport + zoom indicator + malformed parse
+**Description**: Implement the 6 blackbox tests covering the tile-split feature: endpoint contract, YOLO label parser happy path, `DatasetItem.isSplit` honor, auto-zoom viewport matching tile rect, indicator visibility, and the malformed-label error path.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 07_dataset (tile split / zoom) + 06_annotations (parser) (Blackbox Tests)
+**Tracker**: AZ-474
+**Epic**: AZ-455
+
+## Problem
+
+Tile-split + tile-zoom is the most-numerically-intensive feature in the SPA. A single off-by-one in the YOLO label parser or a stale viewport calc loses every annotation in the tile. The malformed-label path must surface a user-visible error (per AC-39 sad path), not silently produce NaN-rendered boxes.
+
+## Outcome
+
+- 6 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-P-51 — tile-split endpoint contract | fast + e2e | blackbox-tests.md |
+| FT-P-52 — YOLO label parser — happy path | fast | blackbox-tests.md |
+| FT-P-53 — `DatasetItem.isSplit` is honored on the dataset list path | fast + e2e | blackbox-tests.md |
+| FT-P-54 — tile auto-zoom viewport matches tile rect | fast | blackbox-tests.md |
+| FT-P-55 — tile-zoom indicator visible while active | fast | blackbox-tests.md |
+| FT-N-10 — malformed YOLO label surfaces a user-visible error (no silent swallow, no NaN render) | fast | blackbox-tests.md |
+
+### Excluded
+
+- Canvas-pixel coordinate math (covered in 16_test_canvas_bbox).
+- Annotation save body shape (covered in 05_test_annotations_endpoint).
+
+## Acceptance Criteria
+
+**AC-1: Endpoint contract**
+FT-P-51 — outbound URL + body for tile-split match the contract.
+
+**AC-2: YOLO parser happy**
+FT-P-52 — input `"3 0.5 0.5 0.2 0.2"` produces the canonical 5-tuple `{ classId: 3, cx: 0.5, cy: 0.5, w: 0.2, h: 0.2 }`.
+
+**AC-3: isSplit honored**
+FT-P-53 — dataset items with `isSplit: true` render the split affordance; items with `isSplit: false` do not.
+
+**AC-4: Auto-zoom**
+FT-P-54 — entering a tile sets the viewport to the tile rect (±0.5 px tolerance).
+
+**AC-5: Indicator visibility**
+FT-P-55 — while zoomed into a tile, the indicator is visible with the correct ARIA label.
+
+**AC-6: Malformed parser**
+FT-N-10 — input `"garbage"` produces a user-visible error (toast or inline) and NO bbox is rendered for it. Asserted via DOM (error region present) AND `getBoundingClientRect` returning finite values for every box that is rendered (no NaN).
+
+## System Under Test Boundary
+
+- System under test: `<DatasetPage>` + `<TileViewer>` + the YOLO parser module.
+- Allowed stubs: MSW for tile-split endpoint + dataset list.
+- Disallowed: stubbing the parser or reading its state — tests pass input via the public surface (typically a paste / load) and assert the rendered result.
+- Expected observables per `results_report.md` rows 85-90 (Group 17) + the row for FT-N-10.
+
+## Constraints
+
+- The malformed-label test (FT-N-10) MUST NOT pass by virtue of `alert()` — that's banned by NFT-SEC-07. The error must be in-DOM.
+
+## Risks & Mitigation
+
+**Risk 1 — Viewport rect comparison flakiness**
+- *Risk*: `getBoundingClientRect` values depend on layout timing.
+- *Mitigation*: wait for `requestAnimationFrame` before measuring; ±0.5 px tolerance.
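The expected values in AC-2 and AC-6 can be derived from a small reference oracle kept on the test side. The sketch below is an assumption about the label grammar (five whitespace-separated numeric fields) and is not the SPA's parser, which the tests must still exercise only through the public surface; `parseYoloLine` and its result type are hypothetical names.

```typescript
interface YoloBox { classId: number; cx: number; cy: number; w: number; h: number }
type ParseResult = { ok: true; box: YoloBox } | { ok: false; error: string };

/** Test-side oracle: parses one YOLO label line or reports why it is malformed. */
function parseYoloLine(line: string): ParseResult {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 5) {
    return { ok: false, error: `expected 5 fields, got ${parts.length}` };
  }
  const nums = parts.map(Number);
  // Rejects NaN and Infinity so no non-finite value can reach a renderer.
  if (nums.some((n) => !Number.isFinite(n))) {
    return { ok: false, error: "non-numeric field" };
  }
  const [classId, cx, cy, w, h] = nums as [number, number, number, number, number];
  if (!Number.isInteger(classId) || classId < 0) {
    return { ok: false, error: "classId must be a non-negative integer" };
  }
  return { ok: true, box: { classId, cx, cy, w, h } };
}
```

Returning a tagged result instead of throwing mirrors the contract under test: a malformed label is an expected outcome that must become a visible error, not an exception path.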
@@ -0,0 +1,46 @@
+# Test — Form Hygiene (Numeric Inputs)
+
+**Task**: AZ-475_test_form_hygiene
+**Name**: Numeric form input — empty / non-numeric rejection
+**Description**: Implement the 2 blackbox tests pinning numeric form input validation per AC-26: empty input is not silently coerced to zero, and non-numeric input is rejected; in both cases NO PUT fires.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 03_shared-ui (form components) + 09_settings (Blackbox Tests)
+**Tracker**: AZ-475
+**Epic**: AZ-455
+
+## Problem
+
+Silent zero-default on empty numeric input is a common regression (because `Number('') === 0`). It causes user settings to drift to 0 invisibly, which then propagates to the server. Both empty and non-numeric input must be REJECTED at the UI layer, not coerced.
+
+## Outcome
+
+- 2 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-N-11 — numeric field with empty input — no silent zero | fast | blackbox-tests.md |
+| FT-N-12 — numeric field with non-numeric input — rejected | fast | blackbox-tests.md |
+
+### Excluded
+
+- Range / min-max validation (out of scope for the baseline; per-feature tasks in Phase B).
+
+## Acceptance Criteria
+
+**AC-1: Empty rejection**
+FT-N-11 — when a numeric input is cleared and the user attempts to submit, a validation error is shown; no PUT fires.
+
+**AC-2: Non-numeric rejection**
+FT-N-12 — when a numeric input receives "abc", the field surfaces a validation error and no PUT fires.
+
+## System Under Test Boundary
+
+- System under test: every numeric form field in the SPA (Settings, Admin classes, etc.).
+- Allowed stubs: MSW for the PUT endpoint (so the test can verify ABSENCE of the request).
+- Disallowed: stubbing form components.
+- Expected observables per `results_report.md` rows 66-67.
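The `Number('') === 0` pitfall named in the Problem section is easy to reproduce, and the rejection behaviour the two scenarios pin can be sketched as a pure function. This is illustrative only: `parseNumericInput` and its result type are hypothetical and do not describe how the SPA's form components are implemented.

```typescript
type NumericInput =
  | { valid: true; value: number }
  | { valid: false; reason: "empty" | "not-a-number" };

/** Validates raw text from a numeric field without coercing bad input to 0. */
function parseNumericInput(raw: string): NumericInput {
  const trimmed = raw.trim();
  // Must be checked first: Number("") and Number("   ") both evaluate to 0.
  if (trimmed === "") return { valid: false, reason: "empty" };
  const value = Number(trimmed);
  // Number("abc") is NaN; Infinity is rejected as well.
  if (!Number.isFinite(value)) return { valid: false, reason: "not-a-number" };
  return { valid: true, value };
}
```

With a validator of this shape, a submit handler only issues the PUT when `valid` is true, which is the absence-of-request behaviour FT-N-11 and FT-N-12 observe through MSW.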
@@ -0,0 +1,51 @@
+# Test — Upload 500 MB Cap
+
+**Task**: AZ-476_test_upload_size_cap
+**Name**: Upload >500 MB → 413 + user-visible error (no alert)
+**Description**: Implement the 2 blackbox tests pinning the upload size cap behavior (per E9 / AC-10): nginx-side 413 on a 501 MB file is surfaced to the user via a friendly in-DOM error — never `alert()`.
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 06_annotations (upload) (Blackbox Tests)
+**Tracker**: AZ-476
+**Epic**: AZ-455
+
+## Problem
+
+A 413 hidden behind a `console.error` makes the SPA look unresponsive. The contract: when nginx rejects an oversized upload, the SPA shows an i18n-keyed error in the DOM, without `alert()`.
+
+## Outcome
+
+- 2 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-N-06 — upload of 501 MB file surfaces a user-visible 413 error | fast + e2e | blackbox-tests.md |
+| NFT-RES-07 — nginx 413 on oversized upload surfaces user-visible error | fast + e2e | resilience-tests.md |
+
+### Excluded
+
+- Upload success path (out of scope for the baseline).
+- Multi-part chunked uploads (not supported — out of scope).
+
+## Acceptance Criteria
+
+**AC-1: User-visible error**
+FT-N-06 / NFT-RES-07 — attempt to upload a 501 MB file; nginx returns 413; the SPA renders a user-visible in-DOM error (toast or inline) carrying the i18n-keyed message.
+
+**AC-2: No alert**
+The error path does NOT invoke `alert()` (defence in depth — caught by NFT-SEC-07 as well, but asserted here too).
+
+## System Under Test Boundary
+
+- System under test: upload flow component(s) + `src/api/client.ts` upload helper.
+- Allowed stubs: in `fast`, MSW responds 413; in `e2e`, the suite's real nginx enforces the cap.
+- Disallowed: stubbing the upload component.
+- Expected observables per `results_report.md` rows 38-39.
+
+## Constraints
+
+- E2E test sends a 501 MB sparse file (zero-filled) to avoid bloating CI bandwidth.
@@ -0,0 +1,53 @@
+# Test — Settings Save Resilience & 2 s Error Budget
+
+**Task**: AZ-477_test_settings_resilience
+**Name**: Settings save 500 + network drop — `saving` flag reset + error surfaces ≤ 2 s
+**Description**: Implement the 5 blackbox tests covering Settings save resilience: upstream 500, network drop, the 2-second error-surface deadline, and the try/finally state reset that prevents a stuck "saving…" indicator.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 09_settings (Blackbox Tests)
+**Tracker**: AZ-477
+**Epic**: AZ-455
+
+## Problem
+
+Settings save has been observed to leave the `saving` flag set after an upstream failure (annotated in `src__features__settings__SettingsPage.md`). The user sees a forever-spinning button and the page becomes unusable. The contract: on ANY failure (HTTP error or network error), state resets within 2 s AND a user-visible error appears.
+
+## Outcome
+
+- 5 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| FT-N-13 — Settings save with 500 response — `saving` flag reset; error surfaced | fast | blackbox-tests.md |
+| FT-N-14 — Settings save with network failure — try/finally state reset | fast | blackbox-tests.md |
+| NFT-PERF-09 — Settings save error surfaces within 2 s | fast | performance-tests.md |
+| NFT-RES-05 — Settings save with upstream 500 — UI state recovers | fast | resilience-tests.md |
+| NFT-RES-06 — Settings save with network drop — try/finally state reset | fast | resilience-tests.md |
+
+### Excluded
+
+- Settings happy path (covered by per-feature tests in Phase B).
+- Settings RBAC redirect (covered in 12_test_protected_route_rbac).
+
+## Acceptance Criteria
+
+**AC-1: 500 recovery**
+FT-N-13 / NFT-RES-05 — MSW returns 500 on settings PUT; assert: (a) `saving` flag is off (Save button no longer disabled / spinner not shown) within 2 s, (b) an error region is present in DOM.
+
+**AC-2: Network drop**
+FT-N-14 / NFT-RES-06 — MSW returns a network error; assert the same two conditions as AC-1.
+
+**AC-3: Deadline**
+NFT-PERF-09 — measure wall-clock time from response receipt (or network failure) to DOM error visibility; assert ≤ 2 s.
+
+## System Under Test Boundary
+
+- System under test: `<SettingsPage>` + its save handler + `src/api/client.ts`.
+- Allowed stubs: MSW for the settings PUT endpoint (returns 500 / network error per scenario).
+- Disallowed: reading React state — test asserts DOM affordances (disabled button, error region).
+- Expected observables per `results_report.md` rows 68-70 + the rows for NFT-RES-05/06.
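The try/finally reset that FT-N-14 and NFT-RES-06 pin can be sketched independently of React. The code below is a minimal illustration under assumed names (`SaveState`, `saveSettings`, and the injected `put` callback are hypothetical); it is not the handler in `<SettingsPage>`.

```typescript
// Hypothetical state holder; in the SPA this would be component state.
interface SaveState { saving: boolean; error: string | null }

/** Runs a save and guarantees the saving flag is cleared on every outcome. */
async function saveSettings(put: () => Promise<void>, state: SaveState): Promise<void> {
  state.saving = true;
  state.error = null;
  try {
    await put();
  } catch (e) {
    // Covers both HTTP errors surfaced as rejections and network failures.
    state.error = e instanceof Error ? e.message : String(e);
  } finally {
    // Runs on success and on failure, so the button can never stay stuck.
    state.saving = false;
  }
}
```

The blackbox tests never read this state; they observe its consequences in the DOM (Save button re-enabled, error region present), which is why the reset has to happen in `finally` rather than only on the success path.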
@@ -0,0 +1,56 @@
+# Test — Network Resilience (Offline, SSE Disconnect, Tainted-Canvas)
+
+**Task**: AZ-478_test_network_resilience
+**Name**: Network offline at boot + SSE server disconnect indicator + tainted-canvas fallback
+**Description**: Implement the 3 blackbox tests covering network-loss paths: offline-at-boot error state (NO offline mode per E10), SSE server disconnect surfaces a connection-lost indicator, and annotation download falls back gracefully on a tainted canvas.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 01_api-transport + 06_annotations (Blackbox Tests)
+**Tracker**: AZ-478
+**Epic**: AZ-455
+
+## Problem
+
+The SPA explicitly rejects offline mode (E10 / NFT-SEC-12) but must still degrade gracefully when the network drops mid-session. The two paths in scope are the SSE streams (which must surface a connection-lost banner / indicator) and the annotation-download path (which must avoid `SecurityError: tainted canvas` crashes).
+
+## Outcome
+
+- 3 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| NFT-RES-03 — Network offline at boot — error state, no offline mode | fast + e2e | resilience-tests.md | per NFT-RES-03 |
+| NFT-RES-09 — Annotation download tainted-canvas fallback | fast | resilience-tests.md | 96 |
+| NFT-RES-10 — SSE server disconnect — UI surfaces a connection-lost indicator | fast + e2e | resilience-tests.md | 97 |
+
+### Excluded
+
+- 401 → refresh → retry network path (covered in 02_test_auth_token_handling).
+- Settings save resilience (covered in 22_test_settings_resilience).
+- Upload 413 path (covered in 21_test_upload_size_cap).
+
+## Acceptance Criteria
+
+**AC-1: Offline at boot**
+NFT-RES-03 — with all `/api/*` requests returning network errors, the SPA renders an error state and does NOT register a service worker / offline cache (defence in depth — NFT-SEC-12 catches the SW separately).
+
+**AC-2: Tainted-canvas fallback**
+NFT-RES-09 — annotation download via canvas-to-data-URL on a tainted canvas does not crash the page; a user-visible fallback (alternative download path or in-DOM error) is rendered. Asserted per `results_report.md` row 96.
+
+**AC-3: Disconnect indicator**
+NFT-RES-10 — when an SSE EventSource fires `error` with `readyState === 2` (CLOSED), within 2 s a connection-lost indicator is visible in the DOM with the i18n-keyed text. Asserted per `results_report.md` row 97.
+
+## System Under Test Boundary
+
+- System under test: `<App>` boot path, SSE consumers (`<FlightsPage>`, `<AnnotationsPage>`), annotation download handler in `<AnnotationsPage>`.
+- Allowed stubs: MSW for `/api/*` (fast); for e2e, suite `flights/` / `annotations/` services are killed mid-session to drive the disconnect path.
+- Disallowed: stubbing the SPA components.
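+- Tainted canvas is induced in the `fast` profile by drawing a cross-origin `<img>` loaded without the `crossOrigin` attribute; with `crossOrigin="anonymous"` against a stub that omits CORS headers, the image would fail to load instead of tainting the canvas.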
+
+## Constraints
+
+- Service-worker registration assertion: `navigator.serviceWorker.getRegistrations()` resolves to `[]` at all times (defence in depth, also enforced by NFT-SEC-12).
@@ -0,0 +1,65 @@
+# Test — Bundle Size, FCP, & Annotation Session Memory Soak
+
+**Task**: AZ-479_test_bundle_fcp_soak
+**Name**: Bundle ≤ 2 MB gzipped + mission-planner excluded + FCP ≤ 3 s + 30-min annotation soak
+**Description**: Implement the 4 blackbox tests covering the production-bundle budget, the mission-planner exclusion canary, the First-Contentful-Paint budget on `/flights`, and the 30-minute annotation-session memory soak.
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 00_foundation (build) + 10_app-shell (FCP) + 06_annotations (memory) (Blackbox Tests)
+**Tracker**: AZ-479
+**Epic**: AZ-455
+
+## Problem
+
+The 2 MB gzipped budget (NFT-PERF-01 / NFT-RES-LIM-01) bounds first-load on edge hardware. The mission-planner workspace is intentionally excluded from the prod bundle (NFT-RES-LIM-04) — a regression that re-includes it doubles the bundle. FCP ≤ 3 s on `/flights` (NFT-PERF-10) is the user-perceived latency target. The 30-minute annotation soak (NFT-RES-LIM-05) catches slow heap creep that the 1-hour live-GPS soak (T08) does not catch on the annotation surface.
+
+## Outcome
+
+- 4 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| NFT-PERF-01 — Initial JS bundle ≤ 2 MB gzipped | static | performance-tests.md | per NFT-PERF-01 |
+| NFT-RES-LIM-04 — `mission-planner/` is excluded from production bundle | static | resource-limit-tests.md | per NFT-RES-LIM-04 |
+| NFT-PERF-10 — FCP on `/flights` ≤ 3 s | e2e | performance-tests.md | 98 |
+| NFT-RES-LIM-05 — SPA memory stable across 30-minute annotation session | e2e (long-running) | resource-limit-tests.md | per NFT-RES-LIM-05 |
+
+### Excluded
+
+- Per-route lazy-chunk budgets (out of scope for the baseline).
+- Live-GPS 1-hour soak + 100-flight-selection soak (covered in 08_test_flight_selection_persistence).
+
+## Acceptance Criteria
+
+**AC-1: Bundle budget**
+NFT-PERF-01 / NFT-RES-LIM-01 — `bun run build` produces `dist/`; sum of gzipped JS in initial chunks ≤ 2 MB. Static check runs after build.
+
+**AC-2: Mission-planner exclusion**
+NFT-RES-LIM-04 — no chunk in `dist/` references `mission-planner/` source; ripgrep static check returns no hits in built JS.
+
+**AC-3: FCP budget**
+NFT-PERF-10 — Playwright `performance.getEntriesByName('first-contentful-paint')` on `/flights` reports ≤ 3000 ms median over 5 runs on the configured mid-range CPU throttle (4x slowdown per the test env).
+
+**AC-4: Annotation memory soak**
+NFT-RES-LIM-05 — 30-minute annotation session: load 50 media items, annotate each, navigate dataset; heap at t=1800 s within 10% of heap at t=60 s.
+
+## System Under Test Boundary
+
+- System under test: built `dist/` artefacts (AC-1 / AC-2); the running SPA in Chromium (AC-3 / AC-4).
+- Allowed stubs: MSW / suite services as needed for the e2e tests.
+- Disallowed: hand-tuning the bundle to "fit" or excluding files from build for the test.
+
+## Constraints
+
+- AC-3 / AC-4 run on Chromium only (AC-4 reads `performance.memory`, which Firefox lacks); soak tagged `@long-running`.
+- Bundle measurement uses the gzip size, not the raw bytes — matches NFT-PERF-01 contract.
+
+## Risks & Mitigation
+
+**Risk 1 — FCP flakiness from cold-cache state**
+- *Risk*: First navigation to `/flights` includes auth handshake and SSE setup, inflating FCP.
+- *Mitigation*: the test issues a warmup navigation, then measures FCP on the second navigation; warmup is recorded in the CSV but does not gate.
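The pass/fail arithmetic behind AC-3 (median of 5 runs ≤ 3000 ms) and AC-4 (final heap within 10% of the t=60 s baseline) can be kept in pure helpers, separate from the Playwright measurement code. A minimal sketch follows; the function names are hypothetical, and reading "within 10%" as a symmetric relative tolerance is an assumption.

```typescript
/** Median of a non-empty sample; averages the two middle values for even sizes. */
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

/** AC-3: the median FCP across runs must not exceed the budget. */
function fcpWithinBudget(runsMs: number[], budgetMs = 3000): boolean {
  return median(runsMs) <= budgetMs;
}

/** AC-4 (assumed symmetric): |final - baseline| <= tolerance * baseline. */
function heapStable(baselineBytes: number, finalBytes: number, tolerance = 0.1): boolean {
  return Math.abs(finalBytes - baselineBytes) <= baselineBytes * tolerance;
}
```

Gating on the median rather than the mean keeps a single slow run from failing the FCP check, which matches the "median over 5 runs" wording in AC-3.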
@@ -0,0 +1,62 @@
+# Test — Production Image, nginx Routes & Edge-host RAM
+
+**Task**: AZ-480_test_prod_image_nginx_ram
+**Name**: nginx:alpine no-Node + 500M cap + 9 routes + prefix-strip + edge-host RAM
+**Description**: Implement the 5 blackbox tests pinning the production container surface: nginx:alpine image with no Node runtime (NFT-RES-LIM-03), `client_max_body_size 500M` (NFT-RES-LIM-02), exactly 9 location blocks for the suite services (NFT-RES-LIM-09), each route strips its `/api/<service>/` prefix (NFT-RES-LIM-10), and steady-state RAM on the edge image (NFT-RES-LIM-08).
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 00_foundation (build + image) (Blackbox Tests)
+**Tracker**: AZ-480
+**Epic**: AZ-455
+
+## Problem
+
+The production image MUST be `nginx:alpine` (no Node — defence against "I'll just add a tiny Node helper" regressions), enforce a 500 MB body cap (matches NFT-PERF / upload cap), expose exactly 9 nginx routes (one per suite service), strip the route prefix (so upstream services see the canonical path), and keep RAM bounded at steady state. These are five contract pins on the runtime surface that have no source-code home — they live in the image / nginx config and CI build.
+
+## Outcome
+
+- 5 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| NFT-RES-LIM-02 — nginx `client_max_body_size 500M` | static | resource-limit-tests.md |
+| NFT-RES-LIM-03 — Production image is `nginx:alpine` and carries no Node.js | static + e2e | resource-limit-tests.md |
+| NFT-RES-LIM-08 — Edge-host RAM profile of the UI image at steady state | e2e | resource-limit-tests.md |
+| NFT-RES-LIM-09 — nginx routes — exactly 9 location blocks for the suite services | static | resource-limit-tests.md |
+| NFT-RES-LIM-10 — nginx — each route strips its `/api/<service>/` prefix | static + e2e | resource-limit-tests.md |
+
+### Excluded
+
+- CI image-tag scheme + OCI labels (covered in 26_test_ci_image_labels).
+- Bundle exclusions (covered in 24_test_bundle_fcp_soak).
+
+## Acceptance Criteria
+
+**AC-1: nginx config — 500 MB cap**
+NFT-RES-LIM-02 — `grep -E 'client_max_body_size\s+500M'` against `nginx/conf.d/*.conf` returns exactly one hit (in the SPA server block).
+
+**AC-2: nginx:alpine, no Node**
+NFT-RES-LIM-03 — `Dockerfile` base image is `nginx:alpine` (or a versioned variant thereof); `docker run --rm $IMAGE which node` against the produced image exits non-zero. E2e probes the running container.
+
+**AC-3: Steady-state RAM**
+NFT-RES-LIM-08 — boot the production image with the suite test stack; after 5 min of idle traffic, `docker stats $CONTAINER --no-stream --format '{{.MemUsage}}'` reports usage ≤ the published edge budget (per `_docs/02_document/tests/environment.md`).
+
+**AC-4: 9 routes**
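+NFT-RES-LIM-09 — `grep -cE '^\s*location\s+/api/'` against `nginx/conf.d/*.conf` yields exactly 9 matches in total; when the glob expands to several files, `grep -c` prints one count per file, so the counts must be summed.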
+
+**AC-5: Prefix strip**
+NFT-RES-LIM-10 — for each suite service `S`, a request to `/api/<S>/probe` reaches `S` with the path `/probe` (asserted via the suite's echo endpoints). Static check additionally verifies `rewrite ^/api/<S>/(.*)$ /$1 break;` or equivalent for each route.
+
+## System Under Test Boundary
+
+- System under test: `Dockerfile`, `nginx/conf.d/*.conf` (or equivalent), and the built image.
+- Allowed stubs: suite echo endpoints for the prefix-strip e2e assertion.
+- Disallowed: stubbing nginx itself.
+
+## Constraints
+
+- E2E tests (AC-2, AC-3, AC-5) run against the suite docker-compose stack; tagged `@requires-docker`.
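The static halves of AC-4 and AC-5 can also be expressed in TypeScript over the concatenated config text, which avoids per-file `grep -c` output. This is a sketch under assumptions: the helper names are hypothetical, the regex mirrors the `grep -cE` pattern in AC-4, and `stripApiPrefix` models the effect of the `rewrite ... break;` rule rather than parsing nginx syntax.

```typescript
/** Counts `location /api/...` blocks, mirroring `^\s*location\s+/api/`. */
function countApiLocations(conf: string): number {
  return (conf.match(/^\s*location\s+\/api\//gm) ?? []).length;
}

/**
 * Models the prefix strip: /api/<service>/<rest> becomes /<rest>.
 * Returns null when the path does not belong to the given service.
 */
function stripApiPrefix(path: string, service: string): string | null {
  const prefix = `/api/${service}/`;
  return path.startsWith(prefix) ? "/" + path.slice(prefix.length) : null;
}

// Illustrative config fragment; service and upstream names are made up.
const sampleConf = [
  "server {",
  "  client_max_body_size 500M;",
  "  location /api/flights/ { proxy_pass http://flights/; }",
  "  location /api/annotations/ { proxy_pass http://annotations/; }",
  "  location / { try_files $uri /index.html; }",
  "}",
].join("\n");
```

In the real check the input would be the joined contents of `nginx/conf.d/*.conf` and the expected count would be 9; the e2e half of AC-5 still needs the suite's echo endpoints to confirm what the upstream actually receives.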
@@ -0,0 +1,54 @@
+# Test — CI Image Tag Scheme & OCI Labels
+
+**Task**: AZ-481_test_ci_image_labels
+**Name**: CI image tag `${branch}-arm` + OCI labels + revision label
+**Description**: Implement the 3 blackbox tests pinning the CI-produced image surface: tag scheme is `${branch}-arm` (NFT-RES-LIM-11), required OCI labels are present (NFT-RES-LIM-12), and the `org.opencontainers.image.revision` label equals `$CI_COMMIT_SHA` (NFT-RES-LIM-13).
+**Complexity**: 2 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 00_foundation (CI/CD) (Blackbox Tests)
+**Tracker**: AZ-481
+**Epic**: AZ-455
+
+## Problem
+
+Without a tag canary, two builds on the same branch can overwrite each other silently (no `-arm` suffix); without OCI labels, the image is not traceable to a source revision. Both regressions are operationally catastrophic and surface only after a deploy.
+
+## Outcome
+
+- 3 scenarios pass.
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file |
+|----------|---------|-------------|
+| NFT-RES-LIM-11 — CI image tag scheme is `${branch}-arm` | static (CI config) + e2e (against pushed image) | resource-limit-tests.md |
+| NFT-RES-LIM-12 — OCI labels present on the pushed image | static + e2e | resource-limit-tests.md |
+| NFT-RES-LIM-13 — Revision label equals `$CI_COMMIT_SHA` | e2e | resource-limit-tests.md |
+
+### Excluded
+
+- nginx route / image-base assertions (covered in 25_test_prod_image_nginx_ram).
+
+## Acceptance Criteria
+
+**AC-1: Tag scheme**
+NFT-RES-LIM-11 — CI config (`.gitlab-ci.yml` / equivalent) builds with tag pattern `${CI_COMMIT_REF_SLUG}-arm`; the static check asserts the pattern is intact. E2e additionally asserts `docker manifest inspect $REGISTRY/$IMAGE:${BRANCH}-arm` succeeds.
+
+**AC-2: OCI labels present**
+NFT-RES-LIM-12 — `docker inspect $IMAGE` reports the required OCI labels: `org.opencontainers.image.source`, `org.opencontainers.image.revision`, `org.opencontainers.image.title`, `org.opencontainers.image.created` — all non-empty.
+
+**AC-3: Revision binding**
+NFT-RES-LIM-13 — `docker inspect --format='{{index .Config.Labels "org.opencontainers.image.revision"}}' $IMAGE` returns exactly `$CI_COMMIT_SHA`.
+
+## System Under Test Boundary
+
+- System under test: CI config (`.gitlab-ci.yml` / equivalent) + the pushed image's metadata.
+- Allowed stubs: a tag fixture for static-mode validation; in e2e the test inspects the actual image built and pushed by CI.
+- Disallowed: bypassing CI to inject labels for the test.
+
+## Constraints
+
+- E2e tests run only on CI; locally `bun run test:e2e -- --skip-tags @requires-ci` skips them.
+- CI environment provides `CI_COMMIT_SHA` and `CI_COMMIT_REF_SLUG`.
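Once the label map has been read from `docker inspect` output, AC-2 and AC-3 reduce to two pure checks. The sketch below assumes the labels arrive as a plain string map; the helper names are hypothetical, and the required-label list is copied from AC-2.

```typescript
// The four labels AC-2 requires to be present and non-empty.
const REQUIRED_OCI_LABELS = [
  "org.opencontainers.image.source",
  "org.opencontainers.image.revision",
  "org.opencontainers.image.title",
  "org.opencontainers.image.created",
] as const;

type LabelMap = Record<string, string | undefined>;

/** Returns the required labels that are absent or blank. */
function missingOciLabels(labels: LabelMap): string[] {
  return REQUIRED_OCI_LABELS.filter((key) => (labels[key] ?? "").trim() === "");
}

/** AC-3: the revision label must equal the commit SHA exactly. */
function revisionMatches(labels: LabelMap, commitSha: string): boolean {
  return labels["org.opencontainers.image.revision"] === commitSha;
}
```

Reporting the missing label names, rather than a single boolean, makes the CI failure say which label was dropped.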
@@ -0,0 +1,67 @@
+# Test — Secrets in Source + Banned-Lib Hygiene + Anti-Criterion
+
+**Task**: AZ-482_test_secrets_and_banned_libs
+**Name**: OWM key not in source/bundle + no ML/JOSE/SW/dropped libs + AC-N1 anti-criterion
+**Description**: Implement the 6 static blackbox tests pinning the source-/bundle-hygiene contracts: no OpenWeatherMap key in source (NFT-SEC-09), no in-browser ML libs (NFT-SEC-10), no JOSE / response-signature libs (NFT-SEC-11), no service worker (NFT-SEC-12), no dropped legacy features (NFT-SEC-13), and the AC-N1 anti-criterion (no concurrent-edit reconciliation surfaces in source) (NFT-SEC-14).
+**Complexity**: 3 points
+**Dependencies**: AZ-456_test_infrastructure
+**Component**: 00_foundation + cross-cutting (Blackbox Tests)
+**Tracker**: AZ-482
+**Epic**: AZ-455
+
+## Problem
+
+These are six contract pins, all static-checkable and all about what MUST NOT exist in the codebase / built bundle. Each comes from a hard restriction in `_docs/00_problem/restrictions.md` (R*) or `_docs/00_problem/security_approach.md`. Without static enforcement, they regress on the day someone "just adds" a banned library.
+
+## Outcome
+
+- 6 scenarios pass (as static checks).
+
+## Scope
+
+### Included
+
+| Scenario | Profile | Source file | results_report row |
+|----------|---------|-------------|--------------------|
+| NFT-SEC-09 — OpenWeatherMap API key is not shipped in source or bundle | static | security-tests.md | 63 |
+| NFT-SEC-10 — No in-browser ML libs | static | security-tests.md | per NFT-SEC-10 |
+| NFT-SEC-11 — No response-signature / JOSE libs on the request path | static | security-tests.md | per NFT-SEC-11 |
+| NFT-SEC-12 — No service worker — offline mode is explicitly absent | static + e2e | security-tests.md | per NFT-SEC-12 |
+| NFT-SEC-13 — Dropped legacy features are not present in source | static | security-tests.md | per NFT-SEC-13 |
+| NFT-SEC-14 — Anti-criterion AC-N1 — no concurrent-edit reconciliation surfaces | static | security-tests.md | per NFT-SEC-14 |
+
+### Excluded
+
+- Auth-storage assertions (covered in 02_test_auth_token_handling).
+- Destructive `alert()` ban (covered in 11_test_destructive_ux).
+
+## Acceptance Criteria
+
+**AC-1: OWM key absence**
+NFT-SEC-09 — `grep -rn '335799082893fad97fa36118b131f919' src/` returns 0 hits (per the autodev Step 4 C01 refactor); `grep -rn` against `dist/` after `bun run build` also returns 0 hits. The OWM key is provided only via `VITE_OWM_API_KEY` at build time.
+
+**AC-2: No ML libs**
+NFT-SEC-10 — `package.json` does not declare `tensorflow`, `tfjs`, `onnxruntime`, `@tensorflow/*`, `@huggingface/*`; the static check enumerates against an explicit deny-list.
+
+**AC-3: No JOSE / signature libs**
+NFT-SEC-11 — `package.json` does not declare `jose`, `jsonwebtoken`, `node-forge`, `tweetnacl`; same deny-list pattern.
+
+**AC-4: No service worker**
+NFT-SEC-12 — `grep -rn 'serviceWorker' src/` returns no `register(` call; `grep -rn 'sw\.js\|workbox' src/ public/` returns 0 hits; e2e: `navigator.serviceWorker.getRegistrations()` resolves to `[]`.
+
+**AC-5: Dropped features absent**
+NFT-SEC-13 — `grep -rn 'WhatsApp\|TelegramBot\|D-Bus\|libsignal' src/` returns 0 hits (the dropped legacy integrations).
+
+**AC-6: AC-N1 anti-criterion**
+NFT-SEC-14 — no source string matches `concurrent.edit|operational.transform|crdt|y-?websocket` — the SPA has no concurrent-edit reconciliation surface.
+
+## System Under Test Boundary
+
+- System under test: the source tree at HEAD, the built `dist/` after `bun run build`, and `package.json` / `bun.lock`.
+- Allowed stubs: none — these are pure ripgrep / grep / build-output checks.
+- Disallowed: bypassing the build to test against a hand-edited bundle.
+
+## Constraints
+
+- All checks runnable from `scripts/run-tests.sh --static-only`; total runtime ≤ 30 s.
+- AC-2 / AC-3 deny-list lives in `tests/security/banned-deps.json` so additions are visible in code review.
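The deny-list pattern behind AC-2 and AC-3 can be sketched as a pure function over the parsed `package.json`. The shape of `tests/security/banned-deps.json` is not given in this spec, so the flat string array with `scope/*` wildcards used here is an assumption, and the helper names are hypothetical.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

/** Exact match, or scope-prefix match for entries ending in "/*". */
function matchesDenyEntry(name: string, entry: string): boolean {
  return entry.endsWith("/*")
    ? name.startsWith(entry.slice(0, -1)) // "@tensorflow/*" -> "@tensorflow/"
    : name === entry;
}

/** Returns the declared packages that appear on the deny-list, sorted. */
function findBannedDeps(pkg: PackageManifest, denyList: string[]): string[] {
  const declared = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  return declared
    .filter((name) => denyList.some((entry) => matchesDenyEntry(name, entry)))
    .sort();
}
```

Exact matching for unscoped names is deliberate: a deny entry such as `jose` should not flag an unrelated package whose name merely contains that substring.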