docs: Step 4 testability refactor — list-of-changes + 2 task specs

autodev existing-code Step 4 (Code Testability Revision) — invoked
refactor skill in guided mode. Phase 0 (baseline) + Phase 1 (discovery
+ validation) + Phase 2 (analysis + task decomposition) artifacts.

list-of-changes.md identifies two surgical fixes required before the
67-scenario blackbox suite (already specified in _docs/02_document/
tests/) can run against the SUT:

  C01 — env-gate JWKS RequireHttps on ASPNETCORE_ENVIRONMENT=E2ETest
       (architecture.md Open Risks Section 6 prescribes this; the
       mock issuer in e2e/docker-compose.test.yml serves plain HTTP)

  C02 — DNS-resolve RABBITMQ_HOST in FailsafeProducer.ProcessQueue
       (IPAddress.Parse currently throws on every drain cycle when
       host is a service name; latent production-relevant bug, not
       just a test-env issue)

Two task specs in _docs/02_tasks/todo/ (3 story points total).
Independent — no inter-task dependency.

Tracker: local, because the Atlassian MCP reported an error at
task-creation time. Deferred Jira writes (epic + 2 tickets) are
recorded in _docs/_process_leftovers/2026-05-14_testability-tracker.md
for replay when MCP is restored.

Items explicitly deferred to Step 8 Refactor are enumerated in
list-of-changes.md "Deferred to Step 8 Refactor" — including the
FailsafeProducer static helper (F3), the JWKS GetAwaiter().GetResult()
hot path, RB-05/06/08 backlog items, and the MediaService ffprobe
empty-catch.

State: Step 4 in_progress, sub_step 3 (phase-2-task-decomposition).
Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 20:19:27 +03:00
parent 03f879206e
commit 13e9731a8f
12 changed files with 775 additions and 0 deletions
@@ -0,0 +1,61 @@
# Discovery — Component: Auth & Identity (scoped to C01)
**Component**: `06_platform` → Auth & Identity subsystem
**Source files in scope**: `src/Auth/JwtExtensions.cs`
**Component spec reference**: `_docs/02_document/modules/auth-identity.md`
## Purpose
JWT validation for API authorization policies (`ANN`, `DATASET`, `ADM`). Annotations is a **verifier-only** service — all token minting is the admin service's responsibility.
## Affected API / behavior
- `JwtExtensions.AddJwtAuth(IServiceCollection, IConfiguration)` — wires the JWT bearer scheme. The line affected by C01 is the `HttpDocumentRetriever` construction (line 33). No method signature changes. No DI graph changes. The `TokenValidationParameters` block (issuer, audience, lifetime, ES256 alg pinning, signed-tokens requirement) is untouched by this change.
## Coupling map (affected only)
```
Program.cs
└─ builder.Services.AddJwtAuth(builder.Configuration) ← caller of the affected code
Auth/JwtExtensions.cs (AddJwtAuth)
├─ ConfigurationResolver.ResolveRequiredOrThrow ← unaffected
├─ new ConfigurationManager<JsonWebKeySet>( ← container, unaffected
│ jwksUrl,
│ new JwksRetriever(),
│ new HttpDocumentRetriever { RequireHttps = true } ← C01 changes this constant to env-gated
│ )
└─ services.AddAuthentication(...).AddJwtBearer(...) ← unaffected
```
## C01 — input file claims vs. code reality
| Claim in `list-of-changes.md` C01 | Verification against `src/Auth/JwtExtensions.cs` | Status |
|----------------------------------|--------------------------------------------------|--------|
| `RequireHttps = true` on line 33 | Confirmed at line 33 (`new HttpDocumentRetriever { RequireHttps = true }`). | ✓ |
| No `IHostEnvironment` parameter on `AddJwtAuth` today | Confirmed: signature is `AddJwtAuth(IServiceCollection services, IConfiguration configuration)`. Reading `Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT")` inline leaves that signature untouched; adding an environment-name parameter instead would change it, and existing call sites keep compiling only if the new parameter is optional. | ✓ |
| `Program.cs:53` already uses `builder.Environment.EnvironmentName` for the CORS validator | Confirmed (`CorsConfigurationValidator.EnsureSafeForEnvironment(allowedOrigins, allowAnyOrigin, builder.Environment.EnvironmentName)`). | ✓ |
| Test stack sets `ASPNETCORE_ENVIRONMENT=E2ETest` | Confirmed in `e2e/docker-compose.test.yml` line 76 (`ASPNETCORE_ENVIRONMENT: E2ETest`). | ✓ |
| Open Risks §6 in `architecture.md` flags this exact change | Confirmed — `_docs/02_document/architecture.md` Open Risks §6 reads: "JWKS HTTPS-only retrieval blocks plain-HTTP test harness; resolution is `ASPNETCORE_ENVIRONMENT=E2ETest` + relaxed `RequireHttps` for tests, never in production." | ✓ |
| `test-data.md` "Bearer token harness" §2 prescribes the same fix | Confirmed verbatim. | ✓ |
All claims hold; no contradictions to surface to the user.
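A minimal sketch of the env gate that the verified claims describe, assuming the environment name is compared case-insensitively and that anything other than `E2ETest` (including a missing value) keeps HTTPS mandatory. `JwksTransportPolicy` and `RequireHttpsFor` are hypothetical names introduced for illustration; they are not taken from `src/Auth/JwtExtensions.cs`.

```csharp
using System;

// Hypothetical helper, for illustration only; not the project's actual code.
public static class JwksTransportPolicy
{
    // Environment name used by the test stack (e2e/docker-compose.test.yml).
    private const string E2ETestEnvironment = "E2ETest";

    // Returns false only when the host explicitly runs as E2ETest.
    // A null or unknown environment name keeps HTTPS required (fail closed).
    public static bool RequireHttpsFor(string? environmentName) =>
        !string.Equals(environmentName, E2ETestEnvironment, StringComparison.OrdinalIgnoreCase);
}
```

Under these assumptions the line-33 construction could become `new HttpDocumentRetriever { RequireHttps = JwksTransportPolicy.RequireHttpsFor(environmentName) }`, where `environmentName` comes from either of the two sources named in the table above.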
## Issues discovered during scoped analysis (additional to the input file)
None within the C01 scope. The `IssuerSigningKeyResolver` uses `.GetAwaiter().GetResult()` (sync-over-async on the auth hot path) — already enumerated under "Deferred to Step 8 Refactor" in `list-of-changes.md`; the test suite does not depend on substituting it, so no change is required for testability.
## Architecture Vision check
`_docs/02_document/architecture.md` Architecture Vision § "Verifier-only auth, no token issuance in annotations":
- C01 does NOT change verification semantics — algorithm pinning, signature, lifetime, audience, and issuer all remain enforced.
- C01 changes only the *transport requirement* for fetching the public-key document from a non-production issuer URL.
- No contradiction.
## Module-layout check
`_docs/02_document/module-layout.md` Component 06 (`06_platform`) → Auth: the affected file `src/Auth/JwtExtensions.cs` is the documented owner; no boundary crossing.
## Public API impact
None — `AddJwtAuth` signature unchanged; no DTOs, OpenAPI shapes, or HTTP responses affected.
@@ -0,0 +1,75 @@
# Discovery — Component: Realtime Sync / Failsafe Producer (scoped to C02)
**Component**: `02_annotations-realtime-sync`
**Source files in scope**: `src/Services/FailsafeProducer.cs`
**Component spec reference**: `_docs/02_document/modules/rabbitmq-stream-sync.md`
## Purpose
Outbox drain + RabbitMQ Stream producer (`BackgroundService`). Reads `annotations_queue_records`, serializes payloads (MessagePack + gzip), publishes to the `azaion-annotations` stream, then deletes drained rows.
## Affected API / behavior
- `FailsafeProducer.ProcessQueue(CancellationToken)` (lines 54-76) currently constructs `StreamSystem` via:
```csharp
Endpoints = [new IPEndPoint(IPAddress.Parse(config.Host), config.Port)]
```
Affected by C02. No other call site uses `IPAddress.Parse` against `config.Host`.
- The `FailsafeProducer` constructor (lines 24-29) takes `IServiceScopeFactory`, `PathResolver`, `RabbitMqConfig`, `ILogger`. **Unchanged by C02.**
- The static `FailsafeProducer.EnqueueAsync` (line 195) — synchronous outbox row insert called from `AnnotationService` — does NOT use the broker connection and is unaffected.
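The failure mode behind C02 can be reproduced in isolation. The sketch below only illustrates the BCL behavior of `IPAddress.Parse`; `HostParsingDemo` and `ParsesAsIpLiteral` are hypothetical names and do not exist in the codebase.

```csharp
using System;
using System.Net;

// Hypothetical demo type, for illustration only.
public static class HostParsingDemo
{
    // Returns true when IPAddress.Parse accepts the value,
    // false when it rejects it with a FormatException.
    public static bool ParsesAsIpLiteral(string host)
    {
        try
        {
            IPAddress.Parse(host);
            return true;
        }
        catch (FormatException)
        {
            // A service name such as "rabbitmq" is not an IP literal.
            return false;
        }
    }
}
```

With the values quoted in this document, `"127.0.0.1"` parses, while the compose service name `"rabbitmq"` does not.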
## Coupling map (affected only)
```
Program.cs
└─ builder.Services.AddSingleton(rabbitMqConfig) ← unaffected
└─ builder.Services.AddHostedService<FailsafeProducer> ← unaffected
Services/FailsafeProducer.cs
├─ ExecuteAsync ← unaffected (loop / retry envelope)
  ├─ ProcessQueue                                          ← C02 changes one line in this method
  │    ├─ IPAddress.Parse(config.Host)                     ← REPLACED by DNS resolution of the host
│ └─ StreamSystem.Create / Producer.Create ← unchanged
├─ DrainQueue ← unaffected (queue read / msg build / publish / delete)
└─ EnqueueAsync (static) ← unaffected
```
## C02 — input file claims vs. code reality
| Claim in `list-of-changes.md` C02 | Verification against `src/Services/FailsafeProducer.cs` | Status |
|----------------------------------|---------------------------------------------------------|--------|
| `IPAddress.Parse(config.Host)` on line 56 | Confirmed at line 56. | ✓ |
| `IPAddress.Parse` throws `FormatException` for non-IP strings | Verified against .NET BCL contract for `IPAddress.Parse(string)`. | ✓ |
| `config.Host` is populated from `RABBITMQ_HOST` env var | Confirmed at `Program.cs:40` (`Environment.GetEnvironmentVariable("RABBITMQ_HOST") ?? "127.0.0.1"`). | ✓ |
| Test stack sets `RABBITMQ_HOST=rabbitmq` (DNS hostname) | Confirmed in `e2e/docker-compose.test.yml` line 82. | ✓ |
| Test-environment fallback default in `RabbitMqConfig` class is `"rabbitmq"` | Confirmed in `FailsafeProducer.cs:17` (`public string Host { get; set; } = "rabbitmq"`). This means even ignoring `Program.cs`, the *default* triggers the bug. | ✓ |
| `BackgroundService` catches exceptions in `ExecuteAsync` and backs off 10 s | Confirmed at lines 44-48. | ✓ |
| Outbox insert (`EnqueueAsync`) is synchronous from the request thread and unaffected | Confirmed at line 195; called from `AnnotationService.cs:102`. | ✓ |
| `IPEndPoint` ctor requires `IPAddress`, not hostname | Verified: the .NET `IPEndPoint` constructors take an `IPAddress` (or a numeric address), never a hostname. In `RabbitMQ.Stream.Client`, `StreamSystemConfig.Endpoints` is `IList<EndPoint>`, so any `System.Net.EndPoint` is accepted and a `DnsEndPoint` is a theoretical alternative. However, every example in the client repo uses `IPEndPoint`, and the existing code builds its endpoint synchronously with the `IPEndPoint` constructor, so the smallest-change path is to keep `IPEndPoint` and resolve the hostname ourselves. | ✓ |
All claims hold; no contradictions to surface to the user.
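One way the "keep `IPEndPoint` and resolve the hostname ourselves" path could look is sketched below. It assumes .NET 6 or later (for the `Dns.GetHostAddressesAsync` overload that takes a `CancellationToken`) and assumes that preferring an IPv4 address is acceptable for the target networks. `BrokerEndpointResolver` and `ResolveAsync` are hypothetical names; the task spec may settle on a different shape.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only; not the project's actual code.
public static class BrokerEndpointResolver
{
    public static async Task<IPEndPoint> ResolveAsync(
        string host, int port, CancellationToken ct = default)
    {
        // IP literals keep working exactly as before, with no DNS round trip.
        if (IPAddress.TryParse(host, out var literal))
            return new IPEndPoint(literal, port);

        // Hostnames (e.g. a compose service name) go through DNS.
        // A failed lookup surfaces as a SocketException to the caller.
        IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, ct);
        if (addresses.Length == 0)
            throw new InvalidOperationException($"No addresses resolved for host '{host}'.");

        // Assumption: prefer IPv4 when available, otherwise take the first result.
        IPAddress chosen =
            Array.Find(addresses, a => a.AddressFamily == AddressFamily.InterNetwork)
            ?? addresses[0];
        return new IPEndPoint(chosen, port);
    }
}
```

Because a failed lookup throws, the existing catch-and-back-off behavior in `ExecuteAsync` would still apply when the broker hostname cannot be resolved.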
## Issues discovered during scoped analysis (additional to the input file)
1. **`IServiceScopeFactory.CreateScope()` is called inside `DrainQueue` to fetch a scoped `AppDataConnection`** (line 80). This is fine — it follows the documented `BackgroundService` pattern. Not in scope.
2. **`catch { }` at line 138 swallows image-read failures** — already enumerated under "Deferred to Step 8 Refactor" in `list-of-changes.md` (RB-05 tracks the proper logging + metric). No change here.
3. **`ProcessQueue` creates a new `StreamSystem` on every entry** — i.e., on every retry. With C02 applied, this remains the behavior — broker reconnects per outage cycle. Acceptable; matches the documented "broker recovers, drain resumes" behavior in NFT-RES-01. No additional change.
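The retry behavior referenced in item 3 can be pictured with the simplified loop below. It is a sketch of the general catch-and-back-off pattern, not a copy of `ExecuteAsync`; `RetryEnvelope`, `RunAsync`, and the delegate parameter are hypothetical, and the back-off is a parameter here rather than the fixed 10 s the real service uses.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the ExecuteAsync loop, for illustration only.
public static class RetryEnvelope
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> processQueue,
        TimeSpan backoff,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                // Each entry is a fresh attempt; in the real service this is
                // where a new broker connection would be created.
                await processQueue(ct);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                break; // host is shutting down
            }
            catch (Exception)
            {
                // Any other failure: wait, then let the loop retry.
                try { await Task.Delay(backoff, ct); }
                catch (OperationCanceledException) { break; }
            }
        }
    }
}
```

In this shape, a hostname that fails to parse or resolve costs one back-off interval per attempt, and the drain resumes on the first attempt that succeeds.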
## Architecture Vision check
`_docs/02_document/architecture.md` Architecture Vision § "Lifecycle observability via outbox + stream":
- C02 is required to *honor* this vision in any environment where `RABBITMQ_HOST` is a hostname. Today the producer silently never drains in such environments.
- The fix preserves the documented flow (outbox row → batch read → MessagePack serialize → gzip → publish → delete row).
- No contradiction; this change is squarely aligned with the vision.
## Module-layout check
`_docs/02_document/module-layout.md` Component 02 (`02_annotations-realtime-sync`) → `FailsafeProducer` is the documented owner; the static `EnqueueAsync` helper is part of the Component 02 Public API (per F3 in the baseline) and is unchanged. C02 is internal to the component.
## Public API impact
None — `FailsafeProducer` is a `BackgroundService`; its `ExecuteAsync` is called by the host. The static `EnqueueAsync` (the only external surface) is unchanged. No DTOs, no HTTP shapes, no MessagePack wire format affected.
## Wire-format / stream contract check
C02 changes *how the producer reaches the broker*, not what it sends. Verified by reading `DrainQueue` (lines 78-181): the `MessagePackSerializer.Serialize(...)` calls, `Producer.Send(messages, CompressionType.Gzip)`, and the queue-table delete are all downstream of the line C02 touches and are untouched. Consumers (admin's `AnnotationSyncWorker`, AI Training consumer) see identical messages.