annotations/_docs/02_document/system-flows.md
Oleksandr Bezdieniezhnykh 03f879206e docs+src: complete Steps 1-3 outcomes + auth re-sync baseline
This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the now
deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 20:19:05 +03:00

# Azaion.Annotations — System Flows
> Bottom-up: traces in this document are derived from `components/*/description.md`, `modules/*.md`, and the source under `src/`. Mermaid diagrams per flow are linked under `diagrams/flows/`.
## Flow Inventory
| # | Flow Name | Trigger | Primary Components | Criticality |
|---|-----------|---------|---------------------|-------------|
| F1 | Annotation Create (with image bytes) | `POST /annotations` from detections service or UI | 01 + 02 + 06 + 03 | High |
| F2 | Annotation Listing / Read | `GET /annotations`, `GET /annotations/{id}/thumbnail`, `GET /annotations/{id}/image` | 01 + 06 + 03 | High |
| F3 | Real-time SSE Subscription | `GET /annotations/events` from UI | 01 + 02 + 06 | High |
| F4 | Failsafe Outbox Drain → RabbitMQ Stream | `FailsafeProducer` background loop | 02 + 06 | High |
| F5 | Media Upload (single + batch) | `POST /media`, `POST /media/batch` | 03 + 06 | High |
| F6 | Auth Refresh (out-of-process) | Long-running callers refresh against admin's `POST /token/refresh`; annotations only verifies the resulting access token | 06 (verifier) + admin (issuer, out-of-scope) | Medium |
| F7 | Directory Settings Change → Path Cache Reset | `PUT /settings/directories` | 05 + 06 | Medium |
| F8 | Dataset Bulk Status | `PATCH /dataset/.../status`, bulk variant | 04 + 06 | Medium |
## Flow Dependencies
| Flow | Depends on | Shares data with |
|------|------------|-------------------|
| F1 | F5 (media must exist for create-with-`MediaId`) | F2 (read-after-write), F3 (Create-only event publish), F4 (Create-only queue insert, gated by `silent_detection`) |
| F2 | F1 (writes data being read), F5 | F3 (consistency window) |
| F3 | F1 (SSE stream is fed by F1 Create publishes only) | — |
| F4 | F1 (reads outbox written by F1 Create only) | downstream consumers (admin sync, AI training) |
| F5 | — | F1 |
| F6 | — | all `[Authorize]` flows (refreshes the token they use) |
| F7 | — | F1, F2, F4, F5 (all paths via `PathResolver`) |
| F8 | F1 | **none today** — F8 does not feed F3 or F4 (open question) |
---
## Flow F1: Annotation Create (with image bytes)
### Description
Detections service or UI POSTs an annotation payload with image bytes (or a `MediaId` for an existing media row). The service hashes the bytes, derives the annotation id, writes the image to disk, ensures a `media` row exists, persists annotation + detection rows, writes the YOLO label file, publishes an in-process SSE event, and — unless `system_settings.silent_detection` is true — enqueues an outbox row for downstream RabbitMQ stream export. **Thumbnails are not generated in this flow** (they are read-only via `PhysicalFile` from a separately populated path).
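
The id derivation described above (hash over sampled bytes) can be illustrated with a short Python sketch. Everything specific here is an assumption: the sample window size, the head/middle/tail sampling scheme, and the function name `sampled_id` are hypothetical, and `hashlib.blake2b` with an 8-byte digest is only a standard-library stand-in for XxHash64. The point is that the cost stays constant regardless of file size, at the price of ignoring bytes outside the sampled windows.

```python
import hashlib

SAMPLE_SIZE = 4096  # hypothetical window size; the real value is not stated in this document


def sampled_id(data: bytes, sample_size: int = SAMPLE_SIZE) -> str:
    """Derive a 64-bit content id from the length plus head/middle/tail samples."""
    n = len(data)
    if n <= 3 * sample_size:
        samples = [data]  # small input: hash every byte
    else:
        mid = n // 2 - sample_size // 2
        samples = [
            data[:sample_size],            # head window
            data[mid:mid + sample_size],   # middle window
            data[-sample_size:],           # tail window
        ]
    h = hashlib.blake2b(digest_size=8)     # stand-in for XxHash64 (64-bit output)
    h.update(n.to_bytes(8, "little"))      # mix in the total length
    for sample in samples:
        h.update(sample)
    return h.hexdigest()
```

Two large inputs that differ only outside the sampled windows map to the same id in this sketch, which is the collision concern behind widening the hash.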
### Preconditions
- Caller holds a JWT with `permissions: ANN`.
- `directory_settings` row exists (seeded by migrator with `/data/...` defaults).
- Postgres reachable (errors otherwise surfaced as 500 by `ErrorHandlingMiddleware`).
### Sequence Diagram
See `diagrams/flows/flow_annotation_create.md` for the full sequence + flowchart.
```mermaid
sequenceDiagram
autonumber
participant Caller as Detections / UI
participant Ctrl as AnnotationsController (01)
participant Svc as AnnotationService (01)
participant Path as PathResolver (06)
participant DB as PostgreSQL (06)
participant FS as Filesystem
participant Evt as AnnotationEventService (02)
participant Q as annotations_queue_records (DB / 02)
Caller->>Ctrl: POST /annotations (CreateAnnotationRequest, JWT)
Ctrl->>Svc: CreateAnnotation(request, userId from JWT)
alt request.Image bytes provided
Svc->>Svc: ComputeHash (XxHash64 over sampled bytes) -> id
Svc->>Path: GetImagePath(id)
Svc->>FS: write {id}.jpg
Svc->>DB: SELECT media WHERE id=id
opt media row missing
Svc->>DB: INSERT media (Image, MediaStatus.New, ...)
end
else request.MediaId provided
Svc->>DB: SELECT media WHERE id=MediaId (404 if missing)
Svc->>Path: GetImagePath(id)
opt source media file exists & target image missing
Svc->>FS: copy media.Path -> {id}.jpg
end
end
Svc->>DB: INSERT annotations
Svc->>DB: BulkCopy detection rows
Svc->>Path: GetLabelPath(id)
Svc->>FS: write {id}.txt (YOLO)
Svc->>Evt: PublishAsync(AnnotationEventDto)
Svc->>DB: SELECT system_settings (FirstOrDefault)
alt SilentDetection != true
Svc->>Q: FailsafeProducer.EnqueueAsync(db, id, QueueOperation.Created)
end
Svc-->>Ctrl: Annotation
Ctrl-->>Caller: 201 Created (Location: /annotations/{id})
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | Caller | `AnnotationsController` | `CreateAnnotationRequest` + JWT | JSON / Bearer |
| 2 | `AnnotationService` | Filesystem | image bytes | `{id}.jpg` under `images_dir` |
| 3 | `AnnotationService` | DB | `media` row (insert if absent) | SQL via Linq2DB |
| 4 | `AnnotationService` | DB | `annotations` row | SQL |
| 5 | `AnnotationService` | DB | `detection` rows | `BulkCopyAsync` |
| 6 | `AnnotationService` | Filesystem | YOLO label `{id}.txt` | text lines `class cx cy w h` |
| 7 | `AnnotationService` | `AnnotationEventService` | `AnnotationEventDto` | in-memory `Channel<>` |
| 8 | `AnnotationService` | DB outbox | `annotations_queue_records` (operation=Created) | row, only if `SilentDetection != true` |
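
Step 6 writes one `class cx cy w h` line per detection. A minimal Python sketch of that serialization follows; the `Detection` type, the six-decimal precision, and the assumption that coordinates are already normalized to the 0..1 range (the usual YOLO convention) are illustrative and not confirmed by the service code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record; field names are illustrative."""
    class_id: int
    cx: float  # box center x, assumed normalized to 0..1
    cy: float  # box center y, assumed normalized to 0..1
    w: float   # box width, assumed normalized
    h: float   # box height, assumed normalized


def to_yolo_lines(detections) -> str:
    """Render detections as YOLO label text, one 'class cx cy w h' line each."""
    return "\n".join(
        f"{d.class_id} {d.cx:.6f} {d.cy:.6f} {d.w:.6f} {d.h:.6f}"
        for d in detections
    )
```

For example, `to_yolo_lines([Detection(2, 0.5, 0.25, 0.1, 0.2)])` yields `2 0.500000 0.250000 0.100000 0.200000`.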
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Neither bytes nor MediaId provided | request validation | `ArgumentException` in service | mapped to 400 by middleware |
| Referenced `MediaId` not found | media lookup | `KeyNotFoundException` | 404 |
| Filesystem write fails (no perms / disk full) | step 2 / 6 | IOException | 500 via middleware; **NOT transactional with DB** — risk of orphan files on partial failure |
| DB write fails after FS success | steps 3-5 | Linq2DB exception | 500; orphan image / label may remain (open risk) |
| SSE publish fails | step 7 | unbounded channel — failure unlikely | logged via default ASP.NET Core logger |
| Outbox insert fails after SSE publish | step 8 | exception | 500; UI saw the event but downstream stream consumers will not — **observable inconsistency** |
| RabbitMQ unavailable | n/a here | — | F4 handles drain offline — F1 itself is unaffected |
### Performance Expectations
| Metric | Target | Notes |
|--------|--------|-------|
| End-to-end latency | not specified in code | dominant cost: hashing + 3 disk writes; flag for `00_problem` extraction |
| Throughput | not specified | single instance bounded by DB and disk I/O |
---
## Flow F2: Annotation Listing / Read
### Description
UIs and dataset consumers list annotations with filters (e.g., `FlightId`, status) and fetch image / thumbnail bytes. Read path is read-only against Postgres + `PhysicalFile` from the configured directories.
### Preconditions
- Caller holds JWT with `ANN` (or `DATASET` for the dataset variant in F8).
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant UI
participant Ctrl as AnnotationsController (01)
participant Svc as AnnotationService (01)
participant DB
participant Path as PathResolver (06)
participant FS as Filesystem
UI->>Ctrl: GET /annotations?filters
Ctrl->>Svc: GetAnnotations(query)
Svc->>DB: SELECT annotations × detection × media
DB-->>Svc: rows
Svc-->>Ctrl: PaginatedResponse<AnnotationListItem>
Ctrl-->>UI: 200 OK (JSON)
UI->>Ctrl: GET /annotations/{id}/thumbnail
Ctrl->>Path: GetThumbnailPath(id)
Path-->>Ctrl: /data/thumbnails/{id}.jpg
Ctrl->>FS: File.Exists?
alt exists
Ctrl-->>UI: 200 OK (image/jpeg, PhysicalFile)
else missing
Ctrl-->>UI: 404 NotFound
end
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | UI | controller | `GetAnnotationsQuery` | query string |
| 2 | service | DB | filtered join | SQL |
| 3 | service | UI | list + paging metadata | `PaginatedResponse<AnnotationListItem>` |
| 4 | controller | UI | image / thumbnail bytes | `image/jpeg` |
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Missing image file | thumbnail / image route | `File.Exists` false | 404 |
| Auth failure | request | JWT pipeline | 401 / 403 |
| DB error | listing | Linq2DB | 500 via middleware |
---
## Flow F3: Real-time SSE Subscription
### Description
UI opens a long-lived `text/event-stream` connection and receives JSON-serialized `AnnotationEventDto` payloads as they are published. Today the only publisher is F1 Create; F8 and the other annotation mutations do not publish (see F8 and Stakeholder Resolutions).
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant UI
participant Ctrl as AnnotationsController.Events (02 doc-ownership)
participant Evt as AnnotationEventService (02)
participant Producer as F1 Create (only publisher today)
UI->>Ctrl: GET /annotations/events (Accept: text/event-stream, JWT ANN)
Ctrl->>Evt: subscribe(Reader)
loop until cancelled
Producer->>Evt: PublishAsync(AnnotationEventDto)
Evt-->>Ctrl: ReadAllAsync yields event
Ctrl-->>UI: data: {json}\n\n
end
UI--xCtrl: client disconnect / cancel
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | UI | controller | upgrade to SSE | HTTP/1.1 |
| 2 | producer | service | `AnnotationEventDto` | in-memory message |
| 3 | controller | UI | `data: {json}\n\n` | SSE frame |
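
The SSE frame in step 3 is a single `data:` line followed by a blank line. A minimal Python sketch of the framing follows; the function name and the payload fields are hypothetical, since the shape of `AnnotationEventDto` is not given in this document.

```python
import json


def sse_frame(payload: dict) -> str:
    """Serialize one event as a Server-Sent Events frame.

    json.dumps escapes newlines inside strings, so the JSON body always
    fits on a single 'data:' line; the trailing blank line ends the event.
    """
    body = json.dumps(payload, separators=(",", ":"))
    return f"data: {body}\n\n"
```

For example, `sse_frame({"id": "a1"})` returns `'data: {"id":"a1"}\n\n'`.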
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Auth failure | request | JWT pipeline | 401 |
| Client disconnect | streaming | `CancellationToken` | controller exits cleanly |
| Process restart | streaming | n/a | UI must reconnect; **buffered events between disconnect and restart are lost** (intentional — durability handled by F4) |
### Performance Expectations
In-process channel; latency is bounded by `Channel<>` + write-flush — sub-millisecond locally.
---
## Flow F4: Failsafe Outbox Drain → RabbitMQ Stream
### Description
`FailsafeProducer` is a singleton `BackgroundService` that polls `annotations_queue_records`, re-reads image bytes for `Created` operations, packs `AnnotationQueueMessage` / `AnnotationBulkQueueMessage` (MessagePack), and publishes to the `azaion-annotations` RabbitMQ stream. After a successful publish, the row is deleted.
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant FP as FailsafeProducer (02)
participant DB
participant Path as PathResolver (06)
participant FS as Filesystem
participant RMQ as RabbitMQ Stream
loop while host running
FP->>DB: SELECT annotations_queue_records
DB-->>FP: pending rows
loop per row
alt operation = Created
FP->>Path: GetImagePath(annotationId)
FP->>FS: read bytes
end
FP->>FP: serialize MessagePack (Annotation* QueueMessage)
FP->>RMQ: publish stream entry
alt publish ok
FP->>DB: DELETE annotations_queue_records WHERE id = ...
else stream unavailable
FP->>FP: backoff + retry next loop
end
end
end
```
### Data Flow
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | DB | producer | outbox rows | SQL |
| 2 | filesystem | producer | image bytes | binary |
| 3 | producer | RabbitMQ stream | `AnnotationQueueMessage` / `AnnotationBulkQueueMessage` | MessagePack (gzip per impl) |
| 4 | producer | DB | DELETE | SQL |
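
The drain steps above amount to publish-then-delete. The Python sketch below illustrates one polling tick under stated assumptions: SQLite stands in for PostgreSQL, the `outbox` table and its columns are a hypothetical simplification of `annotations_queue_records`, and `publish` is a caller-supplied stand-in for the RabbitMQ stream client that raises `ConnectionError` when the broker is unavailable.

```python
import sqlite3


def drain_once(db: sqlite3.Connection, publish) -> int:
    """Run one drain tick and return the number of rows published.

    A row is deleted only after its publish call returns, so a broker
    outage leaves the row (and all later rows) in place for the next tick.
    """
    rows = db.execute(
        "SELECT id, annotation_id, operation FROM outbox ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, annotation_id, operation in rows:
        try:
            publish(annotation_id, operation)
        except ConnectionError:
            break  # broker unavailable: stop and retry on the next tick
        db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        db.commit()
        sent += 1
    return sent
```

Because the delete follows the publish, a crash between the two steps republishes the row on restart. Delivery in this sketch is therefore at-least-once, and consumers need to tolerate duplicates.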
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| RabbitMQ unreachable | publish | client exception | row stays in outbox; retried next tick |
| Image file missing for `Created` | step 2 | FS read fails | open question — current behavior should be confirmed in code-review (skip vs retry) |
| Concurrent drainers (multiple instances) | step 4 | no leasing | rows may be picked up twice → duplicate stream entries; consumers must dedupe |
### Performance Expectations
Bounded by RabbitMQ stream throughput + disk read for `Created`; durability is the priority (see ADR-003).
---
## Flow F5: Media Upload (single + batch)
### Description
UI uploads media files. `MediaController` accepts a single JSON-described upload (`POST /media`) or a multipart batch (`POST /media/batch` with `waypointId` + `IFormFileCollection`). `MediaService` writes the file under the configured media directory and persists a `media` row.
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant UI
participant Ctrl as MediaController (03)
participant Svc as MediaService (03)
participant Path as PathResolver (06)
participant DB
participant FS as Filesystem
UI->>Ctrl: POST /media[/batch] (multipart or JSON, JWT ANN)
Ctrl->>Svc: CreateMedia / CreateBatch
Svc->>Path: GetMediaDir(...)
Svc->>FS: write file(s) under media dir
Svc->>DB: INSERT media row(s)
Svc-->>Ctrl: created media id(s)
Ctrl-->>UI: 201 Created
```
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Filesystem write fails | service | IOException | 500 |
| Unsupported format | service | format check | 400 (per service validation; confirm during Step 4 verification) |
---
## Flow F6: Auth Refresh — REMOVED
Annotations no longer mints tokens. The legacy `POST /auth/refresh` endpoint and its backing `TokenService` were removed; admin (`POST /token/refresh`) is now the sole refresh issuer for the suite. Detections and any other long-running caller must refresh against admin and pass the resulting access token to annotations.
This service is a **verifier only**: it validates the `Authorization: Bearer …` header against admin's JWKS (`JWT_JWKS_URL`) on every `[Authorize]` route — see `JwtExtensions` in `_docs/02_document/modules/auth-identity.md`.
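
The algorithm pinning that accompanies the switch to ES256 can be illustrated in isolation. The Python sketch below only inspects the token header and rejects any `alg` other than ES256; it deliberately does not verify the signature or fetch the JWKS, so it is not a substitute for the real verifier. The function name `pinned_alg` and the allow-list constant are hypothetical.

```python
import base64
import json

ALLOWED_ALGS = frozenset({"ES256"})  # pinned: HS256, "none", etc. are rejected


def pinned_alg(token: str) -> str:
    """Return the header 'alg' of a compact JWS, or raise ValueError.

    This is a pre-check only: the signature is NOT verified here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWS (expected three segments)")
    # base64url segments omit padding; restore it before decoding
    padded = parts[0] + "=" * (-len(parts[0]) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(header, dict):
        raise ValueError("JWS header is not a JSON object")
    alg = header.get("alg")
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"algorithm {alg!r} is not allowed")
    return alg
```
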
---
## Flow F7: Directory Settings Change → Path Cache Reset
### Description
Admin updates filesystem roots (`videos_dir`, `images_dir`, `labels_dir`, `thumbnails_dir`, `results_dir`, `gps_*`) via `PUT /settings/directories`. `SettingsService` persists the row and **must call** `PathResolver.Reset()` so subsequent reads see the new roots.
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant Admin
participant Ctrl as SettingsController (05)
participant Svc as SettingsService (05)
participant DB
participant Path as PathResolver (06)
Admin->>Ctrl: PUT /settings/directories (UpdateDirectoriesRequest, JWT ADM)
Ctrl->>Svc: UpdateDirectories(request)
Svc->>DB: UPDATE directory_settings
Svc->>Path: Reset()
Svc-->>Ctrl: ok
Ctrl-->>Admin: 204 NoContent
```
### Verified
`SettingsService` calls `pathResolver.Reset()` on directory updates (lines 71 and 85 of `Services/SettingsService.cs`). The invariant holds today.
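
The invariant can be pictured as a read-through cache with an explicit reset. The Python sketch below is an illustration only: the class name `CachedPaths`, the `load` callback, and the dictionary shape are hypothetical and do not mirror the actual `PathResolver` API.

```python
import threading


class CachedPaths:
    """Read-through cache of directory roots with an explicit reset."""

    def __init__(self, load):
        self._load = load              # callable returning {name: path}, e.g. a DB read
        self._lock = threading.Lock()
        self._cache = None             # None means "not loaded yet"

    def get(self, key: str) -> str:
        with self._lock:
            if self._cache is None:    # first read after start or reset
                self._cache = dict(self._load())
            return self._cache[key]

    def reset(self) -> None:
        """Drop the cached roots so the next get() reloads them."""
        with self._lock:
            self._cache = None
```

Without the `reset()` call after a settings update, readers in this sketch keep seeing the old roots until the process restarts, which is why the flow treats the call as mandatory.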
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Multi-instance deployments | n/a | each instance caches independently in its own `PathResolver` singleton | each pod re-loads on next miss; no cross-pod fan-out — flagged for horizontal scale planning |
---
## Flow F8: Dataset Bulk Status
### Description
Dataset Explorer changes annotation status one at a time or in bulk. `DatasetService.UpdateStatus` / `BulkUpdateStatus` issue a direct `UPDATE annotations SET status = ...` via `AppDataConnection`. **Today this flow does NOT publish SSE and does NOT enqueue the failsafe outbox**, so the Annotator UI will not see dataset-driven status changes in real time, and downstream stream consumers will not see the lifecycle event. This is an open behavioral question (see "Open behavioral questions" below and the Stakeholder Resolutions section).
### Routes
- `PATCH /dataset/{annotationId}/status` (single)
- `POST /dataset/bulk-status` with `BulkStatusRequest { AnnotationIds, Status }` (bulk)
Both require `[Authorize(Policy = "DATASET")]`.
### Sequence Diagram
```mermaid
sequenceDiagram
autonumber
participant UI as Dataset Explorer
participant Ctrl as DatasetController (04)
participant Svc as DatasetService (04)
participant DB
UI->>Ctrl: PATCH /dataset/{id}/status OR POST /dataset/bulk-status (JWT DATASET)
Ctrl->>Svc: UpdateStatus(id, status) OR BulkUpdateStatus(request)
alt single
Svc->>DB: UPDATE annotations SET status WHERE id = :id
DB-->>Svc: rowcount
opt rowcount = 0
Svc-->>Ctrl: KeyNotFoundException
Ctrl-->>UI: 404
end
else bulk
Svc->>Svc: validate ids list non-empty (else 400)
Svc->>DB: UPDATE annotations SET status WHERE id IN (:ids)
end
Svc-->>Ctrl: ok
Ctrl-->>UI: 200 / 204
```
### Error Scenarios
| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Empty bulk list | `BulkUpdateStatus` | `ArgumentException` | 400 via middleware |
| Annotation not found (single) | `UpdateStatus` | `updated == 0` | 404 |
| Partial bulk failure under DB error | service | exception mid-update | UPDATE is a single SQL statement (`Set` + `UpdateAsync`) — atomic at the statement level; either all listed rows update or none |
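
The bulk path can be sketched as one parameterized statement guarded by an empty-list check. In the Python illustration below, SQLite stands in for PostgreSQL, `ValueError` plays the role of `ArgumentException`, and the function name and integer status values are hypothetical.

```python
import sqlite3


def bulk_update_status(db: sqlite3.Connection, annotation_ids, status: int) -> int:
    """Set one status on many annotations with a single UPDATE statement.

    Returns the number of rows changed. An empty id list is rejected up
    front, mirroring the 400 response described above.
    """
    if not annotation_ids:
        raise ValueError("annotation_ids must not be empty")
    placeholders = ",".join("?" for _ in annotation_ids)
    with db:  # commits on success, rolls back if the statement raises
        cur = db.execute(
            f"UPDATE annotations SET status = ? WHERE id IN ({placeholders})",
            (status, *annotation_ids),
        )
    return cur.rowcount
```

Only the placeholder list is interpolated into the SQL text; the ids themselves are always bound as parameters.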
### Open behavioral questions
- Should this flow publish SSE so the Annotator UI updates live?
- Should this flow enqueue the outbox so AI training / admin sync reflect dataset status decisions?
- Today the answer to both is "no" — confirm with stakeholders.
---
## Stakeholder Resolutions (Step 4 outcome)
These were the open behavioral questions raised by the verification pass; resolved with the maintainer on 2026-05-14. The architecture doc carries the full ADRs (ADR-008..ADR-011) and the Refactor Backlog (RB-01..RB-06). Summary here:
1. **Silent Update / Delete / dataset-status changes** — confirmed real gap, not intent. World B is the design (drainer is already plumbed for `Validated` and `Deleted` per `FailsafeProducer.cs:108-123`; the producer side was simply never wired in the new HTTP backend after the WPF split). Tracked: ADR-009 / RB-01.
2. **`system_settings.silent_detection`** — debug-time switch superseded by the suite e2e harness. Remove the flag and gating logic. Tracked: ADR-010 / RB-02.
3. **F1 atomicity** — adopt a business-transaction wrapper (transactional outbox): DB rows + outbox commit first, FS writes execute post-commit. Tracked: ADR-008 / RB-03.
4. **Annotation id collision risk** — switch to `XxHash3.Hash128` over the same sampled buffer to keep the hash file-size-independent (videos can be 35 GB) while moving from 64-bit to 128-bit collision space. Tracked: ADR-004 / RB-04.
5. **`FailsafeProducer.EnqueueAsync` static method doing DB I/O** — accepted as-is despite the `coderule.mdc` deviation; documented exception, no refactor.
6. **`detection_classes` static catalog** — promote to admin-managed (`POST/PUT/DELETE /classes` under `[ADM]`) with a read-through cache modeled on `PathResolver.Reset()`. Tracked: ADR-011 / RB-06.
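
Resolution 3 (transactional outbox) can be sketched as follows. This is a minimal Python illustration, not the service's C# code: an in-memory SQLite database stands in for PostgreSQL, and the table layout, `create_annotation`, and `write_files` are hypothetical names. The annotation row and the outbox row commit in one transaction, and the filesystem write runs only after that commit succeeds.

```python
import sqlite3


def create_annotation(db: sqlite3.Connection, annotation_id: str, write_files) -> None:
    """Commit the annotation row and its outbox row together, then write files."""
    with db:  # one transaction: both inserts commit, or neither does
        db.execute("INSERT INTO annotations (id) VALUES (?)", (annotation_id,))
        db.execute(
            "INSERT INTO outbox (annotation_id, operation) VALUES (?, 'Created')",
            (annotation_id,),
        )
    # Post-commit side effect: reached only if the transaction committed.
    write_files(annotation_id)
```

With this ordering a database failure can no longer leave orphan files without rows. The opposite case (committed rows whose file write then fails) still needs a recovery path, which this document does not specify.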
### Sub-questions deferred to RB-01 implementation
- `UpdateAnnotation` (replaces detections, sets `Status=Edited`) → re-enqueue as `Created` (rich payload) or add `QueueOperation.Updated` and a new drainer branch?
- Status transitions other than `→ Validated` / `→ Deleted` — should they enqueue at all?
- `DeleteAnnotation` is hard-delete today even though `AnnotationStatus.Deleted = 40` exists. Confirm hard- vs soft-delete semantics.
### Verified during Step 4
- F7 (`PathResolver.Reset` on directory change) — invariant holds; `SettingsService` calls `Reset` on lines 71 + 85.
- All endpoint routes / policies match controller attributes.
- `AnnotationService.CreateAnnotation` exact sequence (image file → media row → annotation → detections → label file → SSE → outbox).
- `BulkUpdateStatus` empty-list rejection (`ArgumentException`).
- Whole `src/` tree has exactly **two** producer call sites: `AnnotationService.cs:90` (`PublishAsync`) and `:102` (`EnqueueAsync`). All other paths are silent today.
### Open at flow level (residual)
- **F4 missing-file behavior** for `Created` operations: `FailsafeProducer.cs:138` swallows `IOException` silently and emits a stream message with `image = null`. Tracked as RB-05 (architecture doc).
- **F4 multi-drainer dedupe**: still required — outbox uses no leasing. Suite consumer contract should dedupe by `(annotationId, operation)`.
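
The consumer-side dedupe by `(annotationId, operation)` could look like the Python sketch below. The message shape (a dict with `annotationId` and `operation` keys) and the function name are hypothetical, and the in-memory `seen` set is unbounded, so a real consumer would need a bounded or persistent store instead.

```python
def dedupe(messages):
    """Yield each message once per (annotationId, operation) key, in arrival order."""
    seen = set()
    for message in messages:
        key = (message["annotationId"], message["operation"])
        if key in seen:
            continue  # duplicate delivery from a second drainer or a retry
        seen.add(key)
        yield message
```
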
Mermaid renderings of each flow are kept simple (no styling) per the template convention.