Step 5 of autodev existing-code flow.

Epic AZ-563 plus 11 atomic tasks covering all 67 test scenarios from _docs/02_document/tests/* exactly once:
- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred items remain deferred per matrix).
All task headers carry their Jira IDs (tracker: jira). Autodev state advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
Resilience tests (NFT-RES-01..06)
Task: AZ-572
Name: Resilience tests
Description: Implement xUnit tests for the 6 resilience scenarios: RabbitMQ outage during create, Postgres restart, Postgres unreachable, SSE subscriber disconnect mid-stream, FailsafeProducer empty-catch path, stream consumer reconnect.
Complexity: 5 points
Dependencies: AZ-564 (test infrastructure)
Component: Blackbox Tests → Resilience
Tracker: jira
Epic: AZ-563
Scenarios Covered
| Test ID | Source | What it asserts |
|---|---|---|
| NFT-RES-01 | _docs/02_document/tests/resilience-tests.md | RabbitMQ outage during create. POST /annotations returns 200; outbox row stays; on broker recovery, message is delivered. |
| NFT-RES-02 | same | Postgres restart between writes. POST /annotations after restart succeeds without errors. |
| NFT-RES-03 | same | Postgres unreachable during create. POST /annotations returns 5xx; error envelope; no partial state. |
| NFT-RES-04 | same | SSE subscriber disconnects mid-stream. Server tears down channel cleanly; no zombie subscriptions; per-instance state cleanup. |
| NFT-RES-05 | same | Repeated FailsafeProducer empty-catch path (catch{} swallowing IOException). Drain loop survives missing image file; no crash. |
| NFT-RES-06 | same | Stream consumer reconnect. After broker restart, consumer resumes from offset and reads the same messages. |
System Under Test Boundary
- HTTP only for SUT invocations.
- `BrokerFixture.StopBroker()` / `StartBroker()` for NFT-RES-01, NFT-RES-06.
- `docker exec postgres pg_ctl stop/start` (or `docker restart postgres`) for NFT-RES-02, NFT-RES-03.
- NFT-RES-05 deliberately removes a specific image file from the `annotations-images` volume (out-of-band, runner-only) to trigger the empty-catch path. Marked with `[Trait("fs_access", "delete-image-file")]`.
- Long-running scenarios (NFT-RES-06 reconnect window) use a `[Fact(Timeout = 60000)]` cap.
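The Postgres restart used for NFT-RES-02 could be driven from the test runner roughly as follows. This is a minimal sketch, not the project's actual helper: the `PostgresControl` class and its method names are hypothetical, and it assumes the `docker` CLI is on the runner's PATH and that the container is named `postgres`, as in the commands above.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical runner-side helper; the name and shape are assumptions.
public static class PostgresControl
{
    // Container name taken from the commands listed above.
    private const string Container = "postgres";

    // Equivalent of running `docker restart postgres` from a shell.
    public static Task RestartAsync() => RunDockerAsync("restart", Container);

    // Builds the process description without starting anything,
    // so the argument handling can be checked without Docker.
    public static ProcessStartInfo BuildStartInfo(params string[] args)
    {
        var startInfo = new ProcessStartInfo("docker") { UseShellExecute = false };
        foreach (var arg in args)
        {
            // ArgumentList passes each argument separately (no manual quoting).
            startInfo.ArgumentList.Add(arg);
        }
        return startInfo;
    }

    // Runs `docker <args>` and throws if the CLI reports a failure.
    public static async Task RunDockerAsync(params string[] args)
    {
        using var process = Process.Start(BuildStartInfo(args))
            ?? throw new InvalidOperationException("Could not start the docker CLI.");
        await process.WaitForExitAsync();

        if (process.ExitCode != 0)
        {
            var command = "docker " + string.Join(" ", args);
            throw new InvalidOperationException(
                command + " exited with code " + process.ExitCode + ".");
        }
    }
}
```

A test would call `await PostgresControl.RestartAsync()` in its Arrange step and then issue the HTTP request under test; failing loudly on a non-zero exit code keeps a broken runner setup from looking like a SUT defect.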
Acceptance Criteria
AC-1: Every scenario passes per its spec.
AC-2: SUT never crashes during the test class
Given any resilience test in the class is running,
When the runner asserts the test's post-condition,
Then the SUT's /health endpoint still returns 200 (the SUT survives every external failure).
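One way to express the AC-2 post-condition is a small probe that each test calls after its own assertions. The sketch below is an assumption about shape only: `HealthProbe` is a hypothetical name, and the `HttpClient` is assumed to have its `BaseAddress` pointing at the SUT.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for the AC-2 check; not part of the task itself.
public static class HealthProbe
{
    // Returns true only when GET /health answers with 200.
    // The client must have BaseAddress set, because the path is relative.
    public static async Task<bool> IsHealthyAsync(HttpClient client)
    {
        try
        {
            using var response = await client.GetAsync("/health");
            return response.StatusCode == HttpStatusCode.OK;
        }
        catch (HttpRequestException)
        {
            // Connection-level failures (for example, connection refused)
            // are reported as "not healthy" rather than thrown.
            return false;
        }
    }
}
```

Each resilience test could end with an assertion such as `Assert.True(await HealthProbe.IsHealthyAsync(client))`. Request timeouts are not caught here and would surface as an exception in the calling test.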
AC-3: NFT-RES-01 verifies stream delivery on recovery
Given the broker was stopped before a POST /annotations and restarted after,
When BrokerFixture.StartBroker() returns,
Then the stream consumer reads the queued message within 30 s of broker recovery.
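The 30 s window in AC-3 implies polling with a deadline rather than a fixed sleep. A minimal sketch of such a helper, assuming nothing about the real consumer API (`Eventually` and `WaitUntilAsync` are hypothetical names):

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical polling helper; names are illustrative only.
public static class Eventually
{
    // Evaluates the condition at least once, then again after every
    // poll interval. No new evaluation starts once the timeout has
    // elapsed. Returns true as soon as the condition holds.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> condition,
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();
        do
        {
            if (await condition())
            {
                return true;
            }
            await Task.Delay(pollInterval);
        }
        while (stopwatch.Elapsed < timeout);

        return false;
    }
}
```

In NFT-RES-01 the condition would be whatever check the stream consumer exposes for "message received", started right after `BrokerFixture.StartBroker()` returns, with `TimeSpan.FromSeconds(30)` as the timeout. Returning a `bool` instead of throwing leaves the failure message to the test's own assertion.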
Constraints
- AAA pattern.
- `[Trait("traces_to", "AC-F-04, AC-F-12, AC-N-04, SW-03")]` plus per-test specific traces.
- Resilience tests are long; group them in their own xUnit collection to avoid blocking the fast suite.
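Taken together, the constraints above suggest a class skeleton along these lines. It is a sketch under assumptions: the collection name `Resilience`, the class names, the test method name, and the per-test trace value are placeholders, and the body is left as AAA comments instead of guessing at fixture APIs. It requires the xUnit package.

```csharp
using System.Threading.Tasks;
using Xunit;

// Collection name "Resilience" is a placeholder, not taken from the task.
[CollectionDefinition("Resilience")]
public sealed class ResilienceCollection
{
}

// Tests in the same xUnit collection do not run in parallel with each
// other, which suits tests that stop a shared broker or database.
[Collection("Resilience")]
[Trait("traces_to", "AC-F-04, AC-F-12, AC-N-04, SW-03")]
public sealed class ResilienceTests
{
    [Fact(Timeout = 60000)] // cap in milliseconds, as stated in the task
    [Trait("traces_to", "NFT-RES-06")] // hypothetical per-test trace value
    public async Task StreamConsumer_ResumesFromOffset_AfterBrokerRestart()
    {
        // Arrange: publish messages and record the consumer offset.
        // Act: restart the broker through the fixture.
        // Assert: the consumer re-reads the same messages from that offset.
        await Task.CompletedTask; // placeholder so the skeleton compiles
    }
}
```

With the tests in one class, the slow group can be selected or excluded from a run with a filter such as `dotnet test --filter "FullyQualifiedName~ResilienceTests"`.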