# Resilience Tests

> **Status**: produced by autodev `/test-spec` Phase 2 (2026-05-14).
> **Naming**: post-rename target. Resilience scenarios use the side-channel + container-orchestration tools to inject faults; the system-under-test is treated as a black box.
> **Critical scenarios**: AC-3.3 (cascade NOT transaction-wrapped) and AC-3.4 (orphan-row race) are the highest-impact resilience invariants — they intentionally encode current (sub-optimal) behavior so tests catch any silent change. Several other resilience tests verify documented operational behaviors (idempotent migrator, DB-down crash, JWT secret rotation).

---

### NFT-RES-01: Cascade is NOT transaction-wrapped — partial deletes survive mid-walk failure

**Summary**: Verifies the documented ADR-006 carry-forward — when the cascade walk fails mid-way (e.g., `media` table absent), already-committed deletes remain.

**Traces to**: AC-3.3, AC-3.4 (related), AC-10.2

**Preconditions**:

- `fixture_cascade_F3` applied (chain rooted at `mid`)
- `missions` running

**Fault injection**: side-channel `DROP TABLE media CASCADE;` BEFORE the request — this turns the second sub-step of the cascade walk into a `relation does not exist` failure.


**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Side-channel `DROP TABLE media CASCADE` | succeeds |
| 2 | `DELETE /missions/{mid}` (JWT `FL`) | `500`; envelope `{ statusCode:500, message:"Internal server error" }` |
| 3 | Side-channel `SELECT COUNT(*) FROM map_objects WHERE mission_id={mid}` | `count == 0` (work BEFORE the failure point committed — non-zero pre-fault, zero post-fault) |
| 4 | Side-channel `SELECT COUNT(*) FROM missions WHERE id={mid}` | `count == 1` (work AFTER the failure point did NOT run — row remains) |
| 5 | `docker logs missions \| grep "Unhandled exception"` | At least one matching log line containing `relation` and `media` |

**Pass criteria**: `map_objects` count is 0 (deleted before failure) AND `missions` count is 1 (not deleted because failure short-circuited the walk) AND the response is 500 with the redacted body.

**Note**: this test will FAIL once a transaction wrap is added (ADR-006 closure) — at that point ALL deletes will roll back and `map_objects` count will be `>0`. When the transaction wrap lands, update this test.

**Max execution time**: 10s.

---

### NFT-RES-02: Waypoint cascade is NOT transaction-wrapped (same invariant as F3)

**Summary**: Same invariant as NFT-RES-01, scoped to F4 (waypoint cascade).

**Traces to**: AC-4.6, AC-3.3 (same root cause), AC-10.2

**Preconditions**:

- `fixture_cascade_F4` applied (waypoint `wp1` with chain)

**Fault injection**: side-channel `DROP TABLE media CASCADE` BEFORE the request.


**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Side-channel `DROP TABLE media CASCADE` | succeeds |
| 2 | `DELETE /missions/{mid}/waypoints/{wp1}` (JWT `FL`) | `500` |
| 3 | Side-channel `SELECT COUNT(*) FROM detection WHERE annotation_id IN (wp1 chain)` | `count == 0` (deleted before failure) |
| 4 | Side-channel `SELECT COUNT(*) FROM waypoints WHERE id={wp1}` | `count == 1` (not deleted) |

**Pass criteria**: same shape as NFT-RES-01.

**Max execution time**: 10s.

---

### NFT-RES-03: Idempotent migrator — second startup is a no-op

**Summary**: Verifies the migrator can run twice in a row without `relation already exists` errors.

**Traces to**: AC-6.6, AC-6.4

**Preconditions**:

- `seed_empty` (schema migrated once via first `missions` startup)

**Fault injection**: container restart (NOT volume reset).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `docker compose restart missions` | container exits cleanly + restarts |
| 2 | Wait for `GET /health` to return `200` | within 30s |
| 3 | `docker logs missions \| grep -E "(error\|Error\|exception)"` | no NEW error / exception lines after the restart timestamp |
| 4 | Side-channel `\d+ vehicles` | schema unchanged from after the first migrate |

**Pass criteria**: second start completes; no new error log lines; schema unchanged.

**Max execution time**: 60s.

---

### NFT-RES-04: B9 one-shot legacy table drop runs once and is idempotent

**Summary**: Verifies that the post-B9 `DROP TABLE IF EXISTS orthophotos / gps_corrections` block in the migrator is destructive on the FIRST run against a legacy device, and a no-op on subsequent runs.

**Traces to**: AC-6.5, AC-10.5

**Preconditions**:

- `seed_legacy_gps_tables` (schema includes `orthophotos` + `gps_corrections` with 1 row each)
- `missions` NOT yet started for this scenario

**Fault injection**: none — purely observe migrator behavior.


**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Side-channel `SELECT to_regclass('orthophotos'), to_regclass('gps_corrections')` | both NON-NULL (legacy tables present) |
| 2 | `docker compose up -d missions` | container starts |
| 3 | Wait for `GET /health` to return `200` | |
| 4 | Side-channel re-query | both NULL (dropped) |
| 5 | `docker compose restart missions` | |
| 6 | Side-channel re-query | both still NULL (idempotent — no error) |
| 7 | `docker logs missions \| grep -i "does not exist"` | NO log line (because of `IF EXISTS`) |

**Pass criteria**: legacy tables absent after first start; subsequent restarts produce no errors and leave them absent.

**Note**: this test only meaningfully runs on a **post-B9 build**. Before B9 lands, the migrator has no DROP block; gate this scenario on a build-time flag or by inspecting the migrator source.

**Max execution time**: 60s.

---

### NFT-RES-05: DB unreachable at startup — process exits non-zero

**Summary**: Verifies AC-6.7 — DB unreachability causes process exit, NOT silent retry-forever.

**Traces to**: AC-6.7

**Preconditions**:

- `missions` NOT running

**Fault injection**: stop `postgres-test` (`docker compose stop postgres-test`) then start `missions`.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `docker compose stop postgres-test` | |
| 2 | `docker compose up -d missions` | |
| 3 | Poll `docker inspect --format '{{.State.ExitCode}}' missions` every 1s for ≤ 30s | At some point within 30s, the container has exited with non-zero exit code |
| 4 | `docker logs missions` | Contains an Npgsql connection error message (e.g., `Connection refused`) |

**Pass criteria**: container exits with non-zero code within 30s; logs contain a recognisable Npgsql error.

**Max execution time**: 60s.

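
The exit-code polling in step 3 of NFT-RES-05 (and step 3 of NFT-RES-06) could be sketched as below. This is a minimal sketch, not the suite's actual helper: `docker_exit_code` and `wait_for_exit` are hypothetical names, and the container name `missions` is taken from the step table. The sketch reads `.State.Status` alongside `.State.ExitCode` so that a still-running container (whose exit code reads `0`) is not mistaken for a clean exit. The probe is injected so the loop can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable, Optional


def docker_exit_code(container: str = "missions") -> Optional[int]:
    """Return the exit code once the container has exited, else None."""
    result = subprocess.run(
        ["docker", "inspect", "--format",
         "{{.State.Status}} {{.State.ExitCode}}", container],
        capture_output=True,
        text=True,
    )
    parts = result.stdout.split()
    # None while the container is not in the "exited" state or inspect failed.
    if result.returncode != 0 or len(parts) != 2 or parts[0] != "exited":
        return None
    return int(parts[1])


def wait_for_exit(
    probe: Callable[[], Optional[int]],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll `probe` until it reports an exit code; return None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        code = probe()
        if code is not None:
            return code
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

A test would call `wait_for_exit(docker_exit_code)` and assert that the result is neither `None` (timeout) nor `0` (clean exit), which matches the pass criteria above.
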

---

### NFT-RES-06: DB missing (database does not exist) — process exits with Npgsql 3D000

**Summary**: Verifies AC-6.8 — when the `azaion` database does not exist, process exits with the documented PostgreSQL error code.

**Traces to**: AC-6.8

**Preconditions**:

- `postgres-test` running with the `azaion` database NOT yet created (use `POSTGRES_DB=postgres` instead, or `DROP DATABASE azaion`)

**Fault injection**: same as preconditions.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Side-channel `DROP DATABASE IF EXISTS azaion` | |
| 2 | `docker compose up -d missions` | |
| 3 | Poll exit code for ≤ 30s | non-zero |
| 4 | `docker logs missions \| grep "3D000"` | at least one match |

**Pass criteria**: container exits non-zero within 30s; logs contain `3D000`.

**Max execution time**: 60s.

---

### NFT-RES-07: JWT_SECRET rotation invalidates existing tokens

**Summary**: Verifies AC-5.7 — restarting the service with a different `JWT_SECRET` causes previously-valid tokens to fail validation.

**Traces to**: AC-5.7

**Preconditions**:

- `missions` running with `JWT_SECRET=test-secret-32-chars-min!!!!!!!!!`
- Token `T1` minted with the same secret, valid for 1h

**Fault injection**: restart `missions` with `JWT_SECRET=rotated-secret-32-chars-min!!!!!`.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `GET /vehicles` with `Authorization: Bearer T1` | `200` |
| 2 | `docker compose stop missions` | |
| 3 | `docker compose run -e JWT_SECRET=rotated-secret-32-chars-min!!!!! -d missions` | |
| 4 | Wait for `GET /health` 200 | |
| 5 | `GET /vehicles` with `Authorization: Bearer T1` (same token as step 1) | `401` |
| 6 | Mint token `T2` with the new secret, `GET /vehicles` with `T2` | `200` |

**Pass criteria**: `T1` works pre-rotation, fails post-rotation; `T2` works post-rotation.

**Max execution time**: 90s.

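
Why `T1` stops validating after the rotation in NFT-RES-07 can be illustrated with a small standard-library sketch. It assumes the service signs tokens with HS256 (HMAC-SHA256); the spec does not state the algorithm, so treat that as an assumption. `mint` and `verify` are hypothetical helpers rather than the service's code, the `sub` claim value is a placeholder, and claim checks such as expiry are omitted because only the signature matters for this scenario.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(secret: str, signing_input: str) -> str:
    mac = hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256)
    return _b64url(mac.digest())


def mint(secret: str, claims: dict) -> str:
    """Build header.payload.signature for an HS256 token (assumed algorithm)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify(secret: str, token: str) -> bool:
    """Check only the signature; a real validator also checks exp and other claims."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = _sign(secret, signing_input)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


OLD_SECRET = "test-secret-32-chars-min!!!!!!!!!"
NEW_SECRET = "rotated-secret-32-chars-min!!!!!"

claims = {"sub": "test-user", "exp": int(time.time()) + 3600}  # placeholder claims
t1 = mint(OLD_SECRET, claims)  # minted before rotation
t2 = mint(NEW_SECRET, claims)  # minted after rotation
```

Here `verify(OLD_SECRET, t1)` holds while `verify(NEW_SECRET, t1)` does not, which corresponds to the `200` in step 1 and the `401` in step 5; `verify(NEW_SECRET, t2)` corresponds to step 6.
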

---

### NFT-RES-08: TOCTOU on default-vehicle exclusivity (race window)

**Summary**: Verifies AC-1.4 — the clear-then-set is NOT transaction-wrapped, so a concurrent INSERT can leave 2+ defaults.

**Traces to**: AC-1.4 (carry-forward)

**Preconditions**:

- `seed_one_default_vehicle` (default `P1`)

**Fault injection**: a second concurrent client issues `INSERT INTO vehicles (..., is_default=true)` directly to the side-channel DB at the same moment as the API client issues `POST /vehicles { IsDefault:true }`. Synchronisation: the test orchestrator pauses at `pg_advisory_lock(1)` after the service's `UPDATE vehicles SET is_default=FALSE` and BEFORE the service's `INSERT` — implemented via a Postgres function that the test installs and the service path traverses (this requires an instrumented test build; if not available, the test is best-effort by issuing 100 parallel INSERTs).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Run 100 parallel iterations of `(POST /vehicles { IsDefault:true } + side-channel INSERT (..., is_default=true))` | each iteration completes |
| 2 | Side-channel `SELECT COUNT(*) FROM vehicles WHERE is_default=true` | count is `≥ 2` in at least one iteration |

**Pass criteria**: at least one iteration produces `default_count ≥ 2`. If 0 iterations produce the race, the test FAILS — either the race window has been closed (good news; rewrite the test to assert `default_count == 1` and update AC-1.4 to remove the race carry-forward), OR the test concurrency primitive is wrong (investigate).

**Note**: this test is intentionally PROBABILISTIC — it asserts the RACE EXISTS, not that the system is broken. A `PASS` here is bad news for the system but means the test is correctly observing the documented behavior.

**Max execution time**: 30s.

---

## Notes

- Tests that drop tables (NFT-RES-01, NFT-RES-02) or leave extra default rows behind (NFT-RES-08) run in a per-class `IClassFixture` that recreates the schema after each scenario.
- NFT-RES-03 through NFT-RES-07 require container-orchestration via `docker compose` from inside the test runner. The `e2e-consumer` container needs the `docker` CLI + a mounted Docker socket — alternatively, the test runs from the host with `docker compose` available there. Hardware-assessment will lock this preference.
- NFT-RES-08 is intentionally probabilistic and may be flaky on slow runners. CI marks this as `[Trait("Stability","probabilistic")]` and tolerates ≤ 1 failed run per 5; deterministic implementation (advisory lock + instrumented build) is a follow-up.
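
The NFT-RES-08 pass criterion and the CI tolerance in the last note can be written as two small predicates. This is a sketch under assumptions: the function names are hypothetical, and "≤ 1 failed run per 5" is read here as "at most one failure among the five most recent runs", which may not be how CI actually counts.

```python
from typing import Iterable, Sequence


def race_observed(default_counts: Iterable[int]) -> bool:
    """True if any iteration ended with two or more default vehicles."""
    return any(count >= 2 for count in default_counts)


def within_tolerance(
    run_passed: Sequence[bool],
    window: int = 5,
    max_failures: int = 1,
) -> bool:
    """True if the most recent `window` runs contain at most `max_failures` failures."""
    recent = run_passed[-window:]
    failures = sum(1 for passed in recent if not passed)
    return failures <= max_failures
```

For example, `race_observed([1, 1, 2, 1])` is true because one iteration ended with two defaults, so the scenario passes. A history of `[True, False, True, True, True]` stays within tolerance, while two failures in the last five runs do not.
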