mirror of
https://github.com/azaion/annotations.git
synced 2026-06-21 18:51:06 +00:00
03f879206e
This commit captures everything produced during autodev existing-code Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec), together with the targeted auth + CORS re-sync triggered on 2026-05-14 when codebase drift was detected at Step 4 entry. None of this work was previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution, architecture, system flows, glossary, module-layout, per-component specs (01..06), modules, deployment, diagrams, data model, FINAL report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md. Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across blackbox, security, resilience, resource-limit, performance), plus e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh, scripts/run-performance-tests.sh. Coverage 88% over the active scope (40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token issuer with a JWKS-verifier model. AuthController and TokenService removed; JwtExtensions switched from HS256 symmetric to ES256 over admin's JWKS. ConfigurationResolver and CorsConfigurationValidator added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01, SEC-02, SEC-03 marked Closed. One new testability risk recorded in architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec captures the change-spec for downstream services that consumed the now deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
# Azaion.Annotations — Architecture

> **Source of truth for service-internal architecture.** Suite-level integration narrative lives in `../../../suite/_docs/01_annotations.md`. This file documents what the code in `src/` actually implements, derived bottom-up from module and component docs.

## Architecture Vision

**Status**: confirmed-by-user 2026-05-14.

Azaion.Annotations is a single .NET 10 ASP.NET Core service in the Azaion suite that owns the authoritative HTTP + streaming surface for annotation lifecycle, media upload, dataset exploration, and system metadata. State of record is PostgreSQL (Linq2DB + Npgsql) with an idempotent boot-time migrator. Real-time fan-out is in-process SSE; durable cross-service export is a transactional-outbox + RabbitMQ Stream pipeline producing MessagePack frames consumed by the admin sync worker and the AI training pipeline. The runtime is one container per node, ARM64-first via Woodpecker CI, with branch-driven image tags (`dev` | `stage` | `main`).
### Components & responsibilities

- **06 Platform** — shared kernel: DB, enums, JWT, error middleware, paths, composition root.
- **02 Annotations realtime & sync** — SSE channel + RabbitMQ Stream failsafe drainer.
- **01 Annotations REST** — annotation CRUD + image/thumbnail file routes; the lifecycle producer.
- **03 Media** — upload (single + batch), list, download, delete.
- **04 Dataset** — read-heavy `/dataset` surface + `DATASET`-policy status writes (planned to route through `01 Annotations REST` per RB-08).
- **05 Settings & metadata** — system / directory / camera / user settings + `/classes` catalog (becoming admin-managed per RB-06).
### Major data flows

- **F1 — Annotation create**: bytes → image file → DB rows → label file → SSE → outbox; will be wrapped in a business transaction (ADR-008).
- **F3 — SSE subscription**: the UI holds a long-lived `text/event-stream` connection on `/annotations/events` (server push, not long-polling).
- **F4 — Outbox drain**: `FailsafeProducer` pumps queue rows to the RabbitMQ stream `azaion-annotations`.
- **F2 / F5 / F6 / F7 / F8** — read paths, media uploads, auth refresh (removed from this service in the 2026-05-14 auth re-sync; callers now refresh against admin), directory cache reset, dataset bulk status.
### Principles / non-negotiables

- **Wire enums are integer-stable** (suite contract). [inferred-from: `modules/wire-enums.md`, `suite/_docs/01_annotations.md`]
- **Annotation id is content-addressed** via a sampled image-bytes hash; remains file-size-independent (videos to ~5 GB). [inferred-from: `AnnotationService.ComputeHash`, ADR-004]
- **PostgreSQL is the state of record**; the filesystem is a content-addressed cache. [inferred-from: `data_model.md`, `system-flows.md` F1]
- **The transactional outbox is the durability boundary**; SSE is best-effort. [inferred-from: ADR-003 / ADR-008]
- **Lifecycle observability is World B**: every mutation publishes SSE and enqueues the outbox. [inferred-from: `FailsafeProducer` drainer plumbing for `Validated`/`Deleted`; maintainer resolution 2026-05-14 → ADR-009 / RB-01]
- **Soft-delete with file relocation**: `DeleteAnnotation` flips status to `AnnotationStatus.Deleted = 40` and moves files to a deleted-files directory rather than removing rows. [inferred-from: maintainer resolution 2026-05-14 → ADR-009 / RB-01]
- **Stream consumer dedupe contract is owned by this service**: outbox messages must carry enough metadata for downstream consumers to dedupe on `(annotationId, operation, dateTime)`. [inferred-from: maintainer resolution 2026-05-14 → ADR-013 / RB-09]
- **Mission is the canonical domain term**: code currently uses `FlightId`; the suite spec uses `missionId`. Code aligns to suite (rename, not the other way). [inferred-from: `00_discovery.md` drift list; maintainer resolution 2026-05-14 → ADR-012 / RB-07]
- **Dataset writes flow through the annotation domain service**: `04 Dataset` does not edit `annotations` rows directly. [inferred-from: `module-layout.md` Verification Needed §1; maintainer resolution 2026-05-14 → RB-08]
- **DB-driven runtime config**: directory roots and detection classes change at runtime via `ADM` endpoints, not redeploy. [inferred-from: `PathResolver.Reset`, ADR-011]
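The stream consumer dedupe contract listed above can be pictured from the consumer side. The following is a minimal sketch, assuming a hypothetical `OutboxMessageSketch` record and an in-memory set; the real message shape and the consumer's storage are not specified in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape: only the three dedupe fields named above are modelled.
public sealed record OutboxMessageSketch(string AnnotationId, string Operation, DateTime DateTime);

public sealed class DedupingConsumerSketch
{
    // Value tuples compare by value, so the triple works directly as a set key.
    private readonly HashSet<(string, string, DateTime)> _seen = new();

    // Returns true when the message is new and should be processed,
    // false when the same (annotationId, operation, dateTime) was already seen.
    public bool TryAccept(OutboxMessageSketch message) =>
        _seen.Add((message.AnnotationId, message.Operation, message.DateTime));
}
```

A redelivered frame with the same triple is dropped, while a later `Validated` event for the same annotation id passes, because the operation is part of the key.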
### Open questions / drift signals (residual)

- `UserId` body field vs JWT `NameIdentifier` (suite spec lists `UserId` on `POST /annotations`; code uses JWT subject). Reconcile in suite or code.
- The exact dedupe key shape for downstream consumers — `(annotationId, operation, dateTime)` is the working assumption per RB-09; suite consumer doc must be updated to match.

---
## 1. System Context

**Problem being solved**: Provide the canonical HTTP + streaming API for **annotation lifecycle** (create / update / status / delete / list / files), **media** (upload, list, download), **dataset exploration** (`DATASET` policy reads + bulk status writes), and **system metadata** (settings + detection class catalog), with **real-time SSE** push to UI consumers and **failsafe** export to RabbitMQ Stream consumers (admin sync, AI training).

**System boundaries**:
- **Inside**: a single ASP.NET Core process (`Azaion.Annotations.dll`), its embedded migrator, in-memory SSE channel, in-process `BackgroundService` outbox drain, and the on-disk image / label / thumbnail / results layout under `directory_settings`.
- **Outside**: PostgreSQL (state of record), RabbitMQ Streams (durable annotation export), the on-disk media/data filesystem (mounted), and every authenticated HTTP / SSE consumer (UIs, detections service, admin sync worker, AI training).

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|------------------|-----------|---------|
| PostgreSQL | DB (Linq2DB / Npgsql) | Both | State of record (annotations, media, queue, settings, classes) |
| RabbitMQ Streams | Stream client (`RabbitMQ.Stream.Client`) | Outbound | Durable export of annotation lifecycle (`azaion-annotations` stream) |
| Filesystem (mounted) | File I/O | Both | Annotation images, YOLO label `.txt`, thumbnails, results, GPS routes/sat |
| Annotator UI / Dataset Explorer UI | REST + SSE | Inbound | User flows (suite `01_annotations.md`, `09_dataset_explorer.md`) |
| Detections service (suite `detections`) | REST | Inbound | POST annotations after model inference; long-running tokens are refreshed against admin (annotations no longer mints tokens) |
| Admin sync worker / AI training | RabbitMQ Streams | Outbound | Consume `azaion-annotations` stream offsets (suite `Annotation Sync`) |
## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|------------|---------|-----------|
| Language | C# | `net10.0` (`src/Azaion.Annotations.csproj`) | Single language across suite .NET services |
| Framework | ASP.NET Core (minimal hosting + controllers) | net10.0 | Built-in JWT, CORS, Swagger, hosted services |
| ORM / DB driver | Linq2DB + Npgsql | per `csproj` | Linq2DB used for `ITable<>` repositories; Npgsql under the hood |
| Database | PostgreSQL | not pinned in code (URL-driven) | Suite-wide datastore |
| Auth | JWT Bearer (`Microsoft.AspNetCore.Authentication.JwtBearer`) — verifier-only, ES256 over admin's JWKS | net10.0 | Issuer/audience/lifetime/signature all validated; admin is the sole issuer (see Section 7) |
| Messaging | RabbitMQ Streams (`RabbitMQ.Stream.Client`) + MessagePack | per `csproj` | Durable, replayable annotation export |
| API docs | Swashbuckle (Swagger / Swagger UI) | per `csproj` | Always mounted (see ADR-005) |
| Hashing | `System.IO.Hashing` | net10.0 stdlib | Annotation id derived from image bytes hash |
| Hosting | `WebApplication` + `IHostedService` | net10.0 | `FailsafeProducer` runs in-process |
| Container | `mcr.microsoft.com/dotnet/aspnet:10.0` | linux/arm64 + linux/amd64 | Multi-arch image, ARM-first per Woodpecker |
| CI | Woodpecker CI (`.woodpecker/build-arm.yml`) | n/a | Branch-based image tag (`${BRANCH}-arm`) |

**Key constraints (evidenced in code/config)**:
- `DATABASE_URL` is **required** at startup — `ConfigurationResolver.ResolveRequiredOrThrow` throws if not set. The string is auto-converted from `postgresql://user:pass@host:port/db` URI form to Linq2DB's `Host=…;Username=…` form by `Program.ConvertPostgresUrl`.
- JWT verification is **required** at startup — `JWT_ISSUER`, `JWT_AUDIENCE`, and `JWT_JWKS_URL` are all resolved by `ConfigurationResolver.ResolveRequiredOrThrow`. There is no insecure fallback. The JWKS URL must be HTTPS (`HttpDocumentRetriever { RequireHttps = true }`).
- Default directory roots are `/data/{videos,images,labels,results,thumbnails,gps_sat,gps_route}` (migrator `directory_settings` defaults) → operator must mount or override at the DB level via `PUT /settings/directories`.
- CORS is **environment-gated**: `CorsConfigurationValidator.EnsureSafeForEnvironment` refuses to start in `Production` when `CorsConfig:AllowedOrigins` is empty unless `CorsConfig:AllowAnyOrigin=true` is set explicitly. ADR-006 was retired together with the wide-open default.
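The first two constraints above (fail-fast resolution and the PostgreSQL URL conversion) can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/`: the method names are borrowed from the text, but the signatures, the lookup delegates, the env-before-config precedence, and the exact output key order are all hypothetical.

```csharp
using System;

public static class StartupConfigSketch
{
    // Returns the first non-empty value from the environment variable or the
    // config key and throws when both are missing (fail-fast, no default).
    // Assumption: the environment variable wins over the config key.
    public static string ResolveRequiredOrThrow(
        string envName, string configKey,
        Func<string, string?> env, Func<string, string?> config)
    {
        string? value = env(envName);
        if (string.IsNullOrWhiteSpace(value)) value = config(configKey);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException(
                $"Missing required setting: set {envName} or {configKey}.");
        return value;
    }

    // Converts postgresql://user:pass@host:port/db into key=value form.
    public static string ConvertPostgresUrl(string url)
    {
        var uri = new Uri(url);
        string[] userInfo = uri.UserInfo.Split(':', 2);
        string user = Uri.UnescapeDataString(userInfo[0]);
        string password = userInfo.Length > 1 ? Uri.UnescapeDataString(userInfo[1]) : "";
        int port = uri.Port > 0 ? uri.Port : 5432; // 5432 is PostgreSQL's default port
        string database = uri.AbsolutePath.TrimStart('/');
        return $"Host={uri.Host};Port={port};Database={database};Username={user};Password={password}";
    }
}
```

In a real host the two delegates would typically wrap `Environment.GetEnvironmentVariable` and an `IConfiguration` indexer; they are injected here only to keep the sketch free of framework dependencies.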
## 3. Deployment Model

**Environments** (evidenced from CI branches): `dev`, `stage`, `main` → image tag `${CI_COMMIT_BRANCH}-arm` pushed to a private registry resolved from `REGISTRY_HOST` secret.

**Infrastructure**:
- Single .NET service container; container exposes port `8080`.
- Multi-arch build supported in the Dockerfile (`--platform=$BUILDPLATFORM`, `$TARGETARCH`); the ARM Woodpecker pipeline currently only emits `arm64`.
- Scaling is **vertical-only** as written: SSE uses an in-process `Channel<AnnotationEventDto>`, and the `FailsafeProducer` outbox drainer is a per-instance `BackgroundService` — see "Open Architectural Risks".

**Environment-specific configuration** (defaults vs production):

| Config | Source | Development default | Production behavior |
|--------|--------|---------------------|---------------------|
| `DATABASE_URL` | env or `Database:Url` config key | none — fail-fast on missing (`ConfigurationResolver`) | MUST set |
| `JWT_ISSUER` | env or `Jwt:Issuer` config key | none — fail-fast | MUST set (matches admin's issuer) |
| `JWT_AUDIENCE` | env or `Jwt:Audience` config key | none — fail-fast | MUST set (matches admin's audience for this service) |
| `JWT_JWKS_URL` | env or `Jwt:JwksUrl` config key | none — fail-fast; HTTPS required | MUST set to admin's JWKS endpoint |
| `RABBITMQ_HOST` / `RABBITMQ_STREAM_PORT` | env | `127.0.0.1` / `5552` | Override per environment |
| `RABBITMQ_PRODUCER_USER` / `_PASS` | env | `azaion_producer` / `producer_pass` | Override |
| `RABBITMQ_STREAM_NAME` | env | `azaion-annotations` | Usually kept (suite contract) |
| `CorsConfig:AllowedOrigins` | `IConfiguration` (string array) | empty | MUST set (or set `AllowAnyOrigin=true` explicitly) — `CorsConfigurationValidator` refuses to start in Production otherwise |
| `CorsConfig:AllowAnyOrigin` | `IConfiguration` (bool) | false | Explicit opt-in for permissive policy |
| Directory roots (`/data/...`) | DB `directory_settings` | hard-coded SQL defaults | Tune via `PUT /settings/directories` (calls `PathResolver.Reset`) |
| Swagger UI | `Program.cs` | mounted | **Also mounted in prod** (ADR-005) |
| `AZAION_REVISION` | Dockerfile build arg `CI_COMMIT_SHA` | `unknown` | Stamped per-image |
## 4. Data Model Overview

> Detailed ERD, indexes, and migration semantics live in `data_model.md`. This section is the cross-component summary.

**Core entities** (owned by `06_platform`; consumed by feature components):

| Entity | Description | Owned by component |
|--------|-------------|---------------------|
| `media` | Uploaded image/video reference (waypoint-scoped) | `03_media` (writes) / `01_annotations-rest` (reads) |
| `annotations` | Annotation row keyed by image-bytes hash, soft-versioned by `created_date`, `time` (BIGINT ticks) | `01_annotations-rest` |
| `detection` | YOLO bounding boxes (`center_x/y, width, height`, class, affiliation, combat readiness) per annotation | `01_annotations-rest` |
| `annotations_queue_records` | Outbox for failsafe stream sync (`operation`, `annotation_ids` JSON array) | `02_annotations-realtime-sync` (owner; drains and deletes rows) / `01_annotations-rest` (enqueues rows via `FailsafeProducer.EnqueueAsync`) |
| `system_settings` | Singleton-ish org settings + `generate_annotated_image`, `silent_detection` toggles | `05_settings-metadata` |
| `directory_settings` | Filesystem roots consumed by `PathResolver` | `05_settings-metadata` |
| `detection_classes` | Seeded class catalog for UI label/color (ids 0–18, names + Cyrillic short names + hex colors) | `05_settings-metadata` (read-only `ClassesController`) |
| `user_settings` | Per-user UI prefs (panel widths, selected flight) | `05_settings-metadata` |
| `camera_settings` | Calibration (altitude, focal length, sensor width) | `05_settings-metadata` |

**Key relationships**:
- `annotations.media_id` → `media.id` (FK).
- `detection.annotation_id` → `annotations.id` (FK; on annotation update the detections are replaced by service-layer logic, not by a DB cascade).
- `annotations_queue_records.annotation_ids` is a **JSON array of TEXT ids** (no FK); a single outbox row can reference multiple annotations (bulk).

**Data flow summary**:
- **Inbound write (Create)** — *today*: HTTP body → `AnnotationService.CreateAnnotation` → image bytes to `images_dir/{id}.jpg`, optional `media` row insert, `annotations` + `detection` rows, YOLO label to `labels_dir/{id}.txt`, SSE publish, then (if `silent_detection != true`) outbox row → drained by `FailsafeProducer` → MessagePack frame on RabbitMQ stream. **Thumbnails are not produced by this flow** — they are read-only via `PhysicalFile` and presumed populated out-of-band.
- **Inbound write (Update / UpdateStatus / Delete annotations, dataset PATCH / bulk-status)** — *today*: DB-only, silent. *Target* (RB-01): every mutation publishes SSE and enqueues the outbox with the appropriate `QueueOperation` (`Created`, `Validated`, or `Deleted`).
- **Lifecycle ordering** — *target* (RB-03): all DB writes plus the outbox row commit inside a single business transaction; FS writes (image / label / future thumbnail generation) and SSE publish are post-commit, with the outbox row as the durable promise.
- **Inbound read**: HTTP query → DB joins (`annotations × detection × media`) → JSON list (`PaginatedResponse<AnnotationListItem>`); image/thumbnail served as `PhysicalFile`.
## 5. Integration Points

### Internal communication (in-process)

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| `01_annotations-rest` (`AnnotationService`) | `02_annotations-realtime-sync` (`AnnotationEventService`) | C# call | Fire-and-forget publish to `Channel<>` | **Today**: only on Create. **Target (RB-01)**: every mutation publishes (Create, Update, UpdateStatus, Delete) |
| `01_annotations-rest` (`AnnotationService`) | `02_annotations-realtime-sync` (`annotations_queue_records` table) | DB INSERT via `FailsafeProducer.EnqueueAsync` (static helper) | Outbox | **Today**: Create only, gated by `silent_detection`. **Target (RB-01 + RB-02)**: every mutation enqueues with the appropriate `QueueOperation`; gating flag removed |
| `02_annotations-realtime-sync` (`FailsafeProducer`) | `06_platform` (`AppDataConnection`, `PathResolver`) | C# call | Read-then-delete | Drainer is **already plumbed** for `Created`, `Validated`, and `Deleted` operations (see `FailsafeProducer.cs:108–123`) |
| `04_dataset` (`DatasetService.UpdateStatus` / `BulkUpdateStatus`) | `01_annotations-rest` (`AnnotationEventService`) + outbox | shared DB + cross-component call | Direct write today; lifecycle publish + enqueue per RB-01 | Bulk path enqueues a single `Validated` outbox record carrying all ids |
| `05_settings-metadata` (directory PUT) | `06_platform` (`PathResolver.Reset`) | C# call | Cache invalidation | Required after directory change |

### External integrations

| External system | Protocol | Auth | Rate limits | Failure mode |
|-----------------|----------|------|-------------|--------------|
| PostgreSQL | TCP / Linq2DB / Npgsql | Conn string | n/a | Surfaced as 500 via `ErrorHandlingMiddleware` |
| RabbitMQ Stream `azaion-annotations` | Stream protocol (5552) | Stream user/pass (`azaion_producer` default) | Stream-level | `FailsafeProducer` retries; rows stay in `annotations_queue_records` until drained |
| Filesystem (`/data/...`) | POSIX | OS perms | n/a | `IOException` → 500; missing image on GET → 404 |
| HTTP clients (UIs, detections, admin) | REST + SSE | JWT Bearer (`ANN`, `DATASET`, `ADM`) | n/a | `401` if invalid; `403` if missing claim |
## 6. Non-Functional Requirements

> Pulled only from code-level evidence — config defaults, validators, health checks, idempotent migrator. Anything not evidenced is left blank rather than guessed.

| Requirement | Target | Measurement | Priority | Source |
|-------------|--------|-------------|----------|--------|
| Liveness | 200 OK on `GET /health` | route in `Program.cs` | High | `Program.cs` |
| Idempotent startup | DB schema applies cleanly on every boot | `DatabaseMigrator.Migrate` uses `CREATE TABLE IF NOT EXISTS` + `ALTER TABLE … IF NOT EXISTS` and `INSERT … ON CONFLICT DO NOTHING` | High | `Database/DatabaseMigrator.cs` |
| Recovery: queue durability | Annotation lifecycle events are not lost across pod restarts | DB-backed outbox (`annotations_queue_records`) drained by `FailsafeProducer` | High | `Services/FailsafeProducer.cs` |
| Auth lifetime / clock skew | per `JwtExtensions.AddJwtAuth` config | `auth-identity` module | Medium | `Auth/JwtExtensions.cs` |
| Pagination defaults | `PaginatedResponse<T>` total/page/pageSize | applied in list endpoints | Medium | `DTOs/PaginatedResponse.cs` |
| Thumbnail dimensions | `240×135` with `10` border (defaults) | `system_settings.thumbnail_*` | Low | migrator defaults |
| Throughput / latency / availability targets | **not evidenced in code** | — | — | open question, see `00_problem` extraction (Step 6) |
## 7. Security Architecture

**Authentication**: JWT Bearer; **ES256 signature** verified against admin's JWKS endpoint (`JWT_JWKS_URL`, for example `https://admin.azaion.com/.well-known/jwks.json`; the variable has no built-in default and startup fails if it is unset). `ValidateIssuer`, `ValidateAudience`, `RequireSignedTokens`, and `RequireExpirationTime` are all enforced; algorithms are pinned to `EcdsaSha256` to block HS256-confusion forgeries. Admin is the sole token issuer for the suite — annotations no longer holds an HMAC secret and no longer mints tokens (`TokenService` and `POST /auth/refresh` were removed; callers refresh against admin).

**Authorization** (per-endpoint policy claims, all evidenced in controllers):
- `ANN` — `AnnotationsController`, `MediaController`.
- `DATASET` — `DatasetController` (status writes including bulk).
- `ADM` — mutating routes on `SettingsController`.
- `[Authorize]` (any authenticated user) — read endpoints on settings, `ClassesController`.
- `[AllowAnonymous]` — `/health`.

**User identity**: server resolves user from JWT `NameIdentifier` (e.g., `AnnotationsController.Create` parses `User.FindFirstValue(ClaimTypes.NameIdentifier)` → `Guid`). Suite spec sometimes lists `UserId` in body — drift recorded in `00_discovery.md`.

**Data protection**:
- **At rest**: nothing in-code — relies on the underlying Postgres deployment + filesystem.
- **In transit**: terminated outside the container; service speaks plain HTTP on `:8080`.
- **Secrets**: env-driven (`DATABASE_URL`, `JWT_ISSUER`, `JWT_AUDIENCE`, `JWT_JWKS_URL`, `RABBITMQ_*`). `DATABASE_URL` and the three JWT vars now fail-fast on startup if unset (no insecure default). ADR-002 was retired together with `JWT_SECRET`.
- **CORS**: config-driven allow-list (`CorsConfig:AllowedOrigins`); `CorsConfigurationValidator.EnsureSafeForEnvironment` refuses to start in `Production` with an empty list unless `CorsConfig:AllowAnyOrigin=true` is explicitly set. ADR-006 was retired together with the wide-open default.

**Audit logging**: not evidenced beyond ASP.NET Core defaults — open gap; flag in retro/security audit.

**Input validation**: surfaces through model binding + `ErrorHandlingMiddleware` mapping (`400 / 404 / 409 / 500`); detailed validators per DTO live in `DTOs/Requests/` (component specs to confirm during Step 4 verification).
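The algorithm pinning described under Authentication in this section can be illustrated without any JWT library. The sketch below only inspects the token header and accepts nothing but `alg = ES256`; it is a dependency-free illustration of the idea, not the service's validation path (which this document attributes to the JwtBearer handler), and the class and method names are hypothetical. It performs no signature, issuer, audience, or lifetime checks.

```csharp
using System;
using System.Text.Json;

public static class AlgorithmPinSketch
{
    // True only when the JWT header declares alg == "ES256".
    // Header inspection only: the signature is NOT verified here.
    public static bool HasPinnedAlgorithm(string jwt)
    {
        string[] parts = jwt.Split('.');
        if (parts.Length != 3) return false;
        try
        {
            using JsonDocument header = JsonDocument.Parse(DecodeBase64Url(parts[0]));
            return header.RootElement.ValueKind == JsonValueKind.Object
                && header.RootElement.TryGetProperty("alg", out JsonElement alg)
                && alg.ValueKind == JsonValueKind.String
                && alg.GetString() == "ES256";
        }
        catch (Exception ex) when (ex is FormatException or JsonException)
        {
            return false; // malformed base64url or JSON is treated as not pinned
        }
    }

    // JWT segments use unpadded base64url; restore the standard alphabet and padding.
    private static byte[] DecodeBase64Url(string value)
    {
        string padded = value.Replace('-', '+').Replace('_', '/');
        padded = padded.PadRight(padded.Length + (4 - padded.Length % 4) % 4, '=');
        return Convert.FromBase64String(padded);
    }
}
```

Pinning matters because an attacker who can choose the header could otherwise declare `HS256` and sign with the public key as an HMAC secret; rejecting every algorithm except the expected one closes that path before any key is consulted.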
## 8. Key Architectural Decisions (inferred from code)

These ADRs document choices the codebase already evidences. They are descriptive, not prescriptive — call them out so downstream skills can challenge them deliberately.

### ADR-001: In-process SSE via `Channel<T>`

**Context**: Real-time annotation activity must reach the Annotator UI with sub-second latency after a write (no numeric latency target is evidenced in code; see Section 6).

**Decision**: Use a singleton `AnnotationEventService` exposing an unbounded `Channel<AnnotationEventDto>` and serve subscribers from `AnnotationsController.Events` over `text/event-stream`.

**Alternatives considered (implicitly rejected)**:
1. Broker-backed pub/sub (Redis / RabbitMQ exchange) — rejected because it adds a dependency for what is already a single-process workload, and the failsafe queue covers durable export needs.
2. Server-side polling — rejected because it cannot meet sub-second latency cheaply.

**Consequences**: SSE state is **per-instance only**. Horizontal scaling requires a broker fanout layer or sticky sessions on the LB.
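The ADR-001 decision can be pictured with a small, self-contained sketch built on `System.Threading.Channels`. The type names and the event fields are hypothetical (the real `AnnotationEventDto` shape is not listed in this document), and the SSE frame layout shown is only one plausible encoding.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

// Hypothetical event shape; the real AnnotationEventDto fields are not listed here.
public sealed record AnnotationEventSketch(string AnnotationId, string Kind);

public sealed class EventChannelSketch
{
    private readonly Channel<AnnotationEventSketch> _channel =
        Channel.CreateUnbounded<AnnotationEventSketch>();

    // Never blocks: TryWrite on an unbounded channel succeeds until the writer is completed.
    public bool Publish(AnnotationEventSketch evt) => _channel.Writer.TryWrite(evt);

    // Streams events as they arrive; each event goes to exactly one reader of this channel.
    public IAsyncEnumerable<AnnotationEventSketch> ReadAllAsync(CancellationToken ct = default) =>
        _channel.Reader.ReadAllAsync(ct);

    public void Complete() => _channel.Writer.Complete();

    // Formats one event as a text/event-stream frame (blank line terminates the frame).
    public static string ToSseFrame(AnnotationEventSketch evt) =>
        $"event: {evt.Kind}\ndata: {evt.AnnotationId}\n\n";
}
```

Because a channel hands each item to a single reader, a sketch like this serves one subscriber per channel; fanning out to several SSE clients would need one channel per subscriber or a broadcast layer on top. How the service handles that is not stated here.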
### ADR-002 (RETIRED): Symmetric JWT, no issuer/audience validation

**Status**: superseded — annotations is now a JWKS verifier of admin-signed ES256 tokens. `AddJwtAuth(IConfiguration)` pins `ValidAlgorithms = [SecurityAlgorithms.EcdsaSha256]`, enforces `ValidateIssuer`/`ValidateAudience`/`RequireSignedTokens`/`RequireExpirationTime`, and resolves keys through `ConfigurationManager<JsonWebKeySet>` against `JWT_JWKS_URL`. `JWT_SECRET` was removed along with the local refresh path; admin is the sole issuer for the suite. The original ADR is preserved here for historical context only.

### ADR-003: Failsafe outbox + RabbitMQ Stream (not direct publish)

**Context**: Annotation lifecycle must reach external consumers (admin sync, AI training) durably even when RabbitMQ is unavailable at the moment of the write.

**Decision**: Mutations write a row to `annotations_queue_records` (today only `CreateAnnotation` does; every mutation will once RB-01 lands, see ADR-009); the in-process `FailsafeProducer` (`IHostedService`) drains this table and publishes MessagePack frames on the `azaion-annotations` stream, deleting rows after success.

**Alternatives considered**:
1. Direct publish in the request path — rejected because RabbitMQ unavailability would either drop events (`fire-and-forget`) or fail user-visible writes (sync publish).
2. Transactional outbox via Debezium / CDC — heavier, deferred.

**Consequences**: One outbox-drainer per service instance. Multiple instances drain concurrently → safe because the deletion is keyed on `id` and re-reads of disk bytes are idempotent, **but** ordering across consumers is not guaranteed.
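The read-then-delete drain in ADR-003 can be sketched against an in-memory table. This is a minimal illustration under stated assumptions: the row type, the `DrainOnceAsync` name, and the stop-on-first-failure policy are hypothetical, and the real `FailsafeProducer` talks to PostgreSQL and RabbitMQ rather than a `List`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical row shape mirroring the outbox columns named in this document.
public sealed record OutboxRowSketch(long Id, string Operation, IReadOnlyList<string> AnnotationIds);

public static class OutboxDrainSketch
{
    // One drain pass: publish rows in id order and delete each row only after
    // its publish succeeded. A failed publish stops the pass and leaves that
    // row, and every later row, in the table for the next pass.
    public static async Task<int> DrainOnceAsync(
        List<OutboxRowSketch> table, Func<OutboxRowSketch, Task> publish)
    {
        int drained = 0;
        foreach (OutboxRowSketch row in table.OrderBy(r => r.Id).ToList())
        {
            try
            {
                await publish(row);
            }
            catch (Exception)
            {
                break; // broker unavailable: the row stays as the durable promise
            }
            table.RemoveAll(r => r.Id == row.Id); // deletion keyed on id
            drained++;
        }
        return drained;
    }
}
```

Deleting only after a successful publish gives at-least-once delivery: a crash between publish and delete republishes the row on the next pass, which is one reason a consumer-side dedupe key is useful.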
### ADR-004: Annotation id from a sampled `XxHash3.Hash128` of image bytes

**Context**: Annotation rows must be deduplicated when the same image is re-uploaded (e.g., re-runs of the detection pipeline). The system also serves video media up to **3–5 GB**, so hashing must remain **constant-time with respect to file size** to keep create-path latency stable under load.

**Decision** (resolved 2026-05-14): Hash a deterministic **fixed-size sample** with `XxHash3.Hash128` (128-bit output, 32-char lower-case hex). Sample composition is unchanged from the current implementation:
- For inputs **≤ 3072 bytes**: `[length(8 bytes)] + [full bytes]`.
- For inputs **> 3072 bytes**: `[length(8 bytes)] + [first 1024] + [middle 1024 starting at len/2 − 512] + [last 1024]`.

When `MediaId` is provided instead of bytes, the annotation id is reused from the referenced media row.

**Why this combination**:
- **Sampling preserves file-size independence.** Reading a 5 GB video front-to-back just to derive an id is unacceptable on the hot path.
- **`XxHash3.Hash128` over the same sample** keeps the hashing itself O(1) in file size while moving the collision space from 2^64 to 2^128. Distinct large images that happen to share `(length, head 1 KB, middle 1 KB, tail 1 KB)` still collide deterministically — but the practical collision probability among such samples is now negligible at any realistic volume.

**Migration consequences**:
- The annotation `id` column is `TEXT PRIMARY KEY`; switching from 16-char (`XxHash64`) to 32-char (`XxHash3.Hash128`) hex requires no schema change.
- Existing rows keep their 16-char ids; new rows get 32-char ids. Re-create of an image whose original id was generated under `XxHash64` will produce a **different** new id under `XxHash3.Hash128` — i.e., re-creates after the upgrade no longer collide with their pre-upgrade row. Acceptable (and expected): old ids are stable, the deduplication property is preserved going forward, and the upgrade is irreversible by design.

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-04).
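The sample composition in the ADR-004 decision can be sketched as a function that builds the bytes to be hashed. This is an illustration under stated assumptions: the stream-based signature and the names are hypothetical, and the little-endian encoding of the 8-byte length prefix is a guess, since the document does not state the byte order. The hashing call itself is left out; the returned sample would be passed to the 128-bit hash.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class SampledHashInputSketch
{
    private const int Chunk = 1024;
    private const int FullReadLimit = 3 * Chunk; // 3072 bytes

    // Builds [length(8 bytes)] + payload, where payload is the whole input when
    // it is at most 3072 bytes, otherwise head + middle + tail chunks of 1024 bytes.
    // Requires a seekable stream; reads at most 3072 bytes regardless of length.
    public static byte[] BuildSample(Stream source)
    {
        long length = source.Length;
        int payload = (int)Math.Min(length, FullReadLimit);
        var sample = new byte[8 + payload];
        BinaryPrimitives.WriteInt64LittleEndian(sample, length); // byte order is an assumption

        if (length <= FullReadLimit)
        {
            ReadAt(source, 0, sample.AsSpan(8, payload));
            return sample;
        }

        ReadAt(source, 0, sample.AsSpan(8, Chunk));                               // first 1024
        ReadAt(source, length / 2 - Chunk / 2, sample.AsSpan(8 + Chunk, Chunk));  // middle, from len/2 - 512
        ReadAt(source, length - Chunk, sample.AsSpan(8 + 2 * Chunk, Chunk));      // last 1024
        return sample;
    }

    private static void ReadAt(Stream source, long offset, Span<byte> destination)
    {
        source.Seek(offset, SeekOrigin.Begin);
        source.ReadExactly(destination);
    }
}
```

Including the length in the sample means two files that share head, middle, and tail chunks but differ in size still produce different samples, while the work done stays bounded at three small reads.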
### ADR-005: Swagger UI mounted in all environments

**Context**: Internal debugging / partner integration friction.

**Decision**: `app.UseSwagger()` and `app.UseSwaggerUI()` are unconditional in `Program.cs`.

**Consequences**: Schema is publicly readable wherever the service is reachable. If the perimeter is not closed, this leaks endpoint surface — treat as a security finding for production-internet exposure.

### ADR-006 (RETIRED): Wide-open CORS

**Status**: superseded — the default policy now reads `CorsConfig:AllowedOrigins` (string array) and `CorsConfig:AllowAnyOrigin` (boolean opt-in). `CorsConfigurationValidator.EnsureSafeForEnvironment` refuses to start in `Production` when origins are empty and `AllowAnyOrigin` is not explicitly set; a `LogWarning` is emitted in non-production when running with the permissive default. The original ADR is preserved here for historical context only.
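The gating rule that replaced ADR-006 can be written down as a small decision function. The sketch below is hypothetical in its names, its return type, and its handling of the case where both `AllowAnyOrigin` and a non-empty origin list are set (here the explicit opt-in wins); only the three behaviours named in the status paragraph above are taken from the document.

```csharp
using System;
using System.Collections.Generic;

public enum CorsModeSketch
{
    AllowList,                   // non-empty AllowedOrigins
    AnyOriginExplicit,           // AllowAnyOrigin=true was set on purpose
    AnyOriginDefaultWithWarning  // empty list outside Production: permissive, caller should log a warning
}

public static class CorsGateSketch
{
    // Throws in Production when no origins are configured and the permissive
    // policy was not explicitly requested; otherwise reports the effective mode.
    public static CorsModeSketch EnsureSafeForEnvironment(
        string environmentName, IReadOnlyCollection<string> allowedOrigins, bool allowAnyOrigin)
    {
        if (allowAnyOrigin) return CorsModeSketch.AnyOriginExplicit; // precedence is an assumption
        if (allowedOrigins.Count > 0) return CorsModeSketch.AllowList;

        bool isProduction = string.Equals(
            environmentName, "Production", StringComparison.OrdinalIgnoreCase);
        if (isProduction)
            throw new InvalidOperationException(
                "CorsConfig:AllowedOrigins is empty; set origins or CorsConfig:AllowAnyOrigin=true.");

        return CorsModeSketch.AnyOriginDefaultWithWarning;
    }
}
```

Failing at startup rather than at the first cross-origin request means a misconfigured production deploy never begins serving traffic with a wide-open policy.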
### ADR-007: Embedded SQL migrator (not EF migrations / Flyway)

**Context**: Suite values single-binary deploys; the team prefers idempotent boot-time DDL over a separate migration tool.

**Decision**: `DatabaseMigrator.Migrate` runs a single multi-statement script via Linq2DB on every startup. Schema evolution is additive (`ALTER … ADD COLUMN IF NOT EXISTS`).

**Consequences**: Backwards-only, no down migrations. Renames or destructive changes need an explicit out-of-band script. Drift detection requires diffing live DB against `Database/DatabaseMigrator.cs`.

### ADR-008: Annotation lifecycle wrapped in a business transaction (planned)

**Context**: `CreateAnnotation` today touches the filesystem, three DB tables, an in-memory channel, and an outbox row, with no atomicity. World B (lifecycle is observable — see ADR-009) widens this surface to Update / Delete / status-change paths. A naive DB transaction does not wrap the FS writes; we want a single conceptual transactional boundary for the lifecycle, not just for the DB rows.

**Decision** (resolved 2026-05-14, to-be-implemented): introduce a **business-transaction wrapper** for annotation lifecycle operations. Concretely the chosen pattern is the **transactional outbox**:

1. Write all relevant DB rows (annotation / detection / annotations_queue_records) inside a single `db.BeginTransaction` scope.
2. Commit. The outbox row is the durable promise that the post-commit work is owed.
3. **Post-commit**, perform side effects: write image / label / thumbnail files, publish SSE event. These steps are idempotent on retry; the outbox row stays until the drainer succeeds.
4. The drainer (`FailsafeProducer`) is unchanged in role — it consumes the outbox.

**Implications**:
- FS write order shifts: today image is first, before any DB row; after the refactor, DB rows + outbox commit first, then FS writes execute (with the outbox row as the recovery anchor).
- A new abstraction (e.g., `AnnotationLifecycleTransaction` or a thin extension on `AppDataConnection`) is the right place to centralize this. Implementation deferred to RB-03.

**Alternatives considered**:
1. Pure DB transaction wrapping current order — rejected: doesn't cover FS, leaves orphan-file risk.
2. Saga / compensation steps with explicit rollback handlers — rejected: overkill for the linear lifecycle here.

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-03).
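The ordering in the four ADR-008 steps can be sketched with an in-memory stand-in for the database. Everything here is hypothetical (the store, the staging approach, the string-encoded outbox rows); the point is only the sequence: rows and the outbox entry commit together, side effects run afterwards, and a failed side effect leaves the outbox entry in place.

```csharp
using System;
using System.Collections.Generic;

public sealed class LifecycleStoreSketch
{
    public List<string> Annotations { get; } = new();
    public List<string> Outbox { get; } = new();

    // Steps 1-2: stage the annotation row and its outbox row, then commit both
    // together. An exception thrown by `write` discards everything staged.
    public void Commit(Action<List<string>, List<string>> write)
    {
        var stagedAnnotations = new List<string>();
        var stagedOutbox = new List<string>();
        write(stagedAnnotations, stagedOutbox);
        Annotations.AddRange(stagedAnnotations);
        Outbox.AddRange(stagedOutbox);
    }
}

public static class LifecycleSketch
{
    // Returns true when the post-commit side effects completed.
    public static bool Create(
        LifecycleStoreSketch store, string annotationId, Action<string> sideEffects)
    {
        store.Commit((annotations, outbox) =>
        {
            annotations.Add(annotationId);
            outbox.Add($"Created:{annotationId}");
        });

        try
        {
            sideEffects(annotationId); // step 3: file writes and SSE publish, after commit
            return true;
        }
        catch (Exception)
        {
            return false; // the outbox row remains; retrying idempotent effects is safe
        }
    }
}
```

With this order, a crash after the commit can leave files unwritten but never leaves files without rows, which is the orphan-file risk the current image-first order carries.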
|
||
|
||
### ADR-009: Lifecycle observability, World B (planned)

**Context**: Today only `CreateAnnotation` publishes SSE and enqueues the outbox. Update / UpdateStatus / Delete (annotations) and UpdateStatus / BulkUpdateStatus (dataset) are silent. The `QueueOperation` enum already declares `Validated` and `Deleted`, and `FailsafeProducer.cs:108–123` has a dedicated drainer branch for both, which is strong evidence that the design always intended every lifecycle change to be observable. The producer side simply was never wired (the prior WPF codebase blended UI + backend; lifecycle calls likely came from the UI directly, which the new HTTP backend has not replicated).

**Decision** (resolved 2026-05-14, to-be-implemented): every annotation mutation publishes SSE and enqueues the outbox.

Mapping (initial; sub-questions to be resolved at implementation time):

| Mutation | SSE | Outbox `QueueOperation` |
|----------|-----|--------------------------|
| `AnnotationService.CreateAnnotation` | yes (today) | `Created` (today) |
| `AnnotationService.UpdateAnnotation` (replace detections, status → `Edited`) | yes | open: re-enqueue as `Created` (richer payload) **or** add `QueueOperation.Updated` + corresponding drainer branch |
| `AnnotationService.UpdateStatus` (status → `Validated (30)` or `Deleted (40)`) | yes | `Validated` |
| `AnnotationService.UpdateStatus` (other transitions) | yes | open: skip outbox, or always enqueue `Validated`? |
| `AnnotationService.DeleteAnnotation` | yes | `Deleted`. **Soft-delete**: status flips to `AnnotationStatus.Deleted = 40`, the row stays, image / label / thumbnail files relocate to a `deleted_dir` (new `directory_settings` column added by RB-01) |
| `DatasetService.UpdateStatus` / `BulkUpdateStatus` | yes (per-id for bulk) | `Validated` (single record covers the whole bulk via `AnnotationIds`) |

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-01).

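
The soft-delete behavior in the `DeleteAnnotation` row of the ADR-009 mapping could look roughly like the sketch below. It is a language-neutral illustration in Python, not the service's C# code: `soft_delete`, the dictionary-shaped annotation record, and the flat layout of `deleted_dir` are assumptions made for the example. Only the status value 40 comes from the ADR.

```python
import shutil
from pathlib import Path

DELETED = 40  # AnnotationStatus.Deleted, as stated in the ADR-009 mapping


def soft_delete(annotation, file_paths, deleted_dir):
    """Flip the status to Deleted and relocate the files; the record itself stays."""
    annotation["status"] = DELETED
    deleted_dir = Path(deleted_dir)
    deleted_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in map(Path, file_paths):
        # Skipping missing files lets a retry after a partial move finish cleanly.
        if path.exists():
            shutil.move(str(path), str(deleted_dir / path.name))
            moved.append(path.name)
    return moved
```

Calling `soft_delete` a second time with the same paths moves nothing and leaves the status at 40, so the operation is safe to retry.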
### ADR-010: Remove `system_settings.silent_detection`

**Context**: `silent_detection` was a debug-time switch to keep the RabbitMQ stream clean while a developer iterated locally. Now that the suite has e2e tests with isolated queues (per `_docs/_repo-config.yaml` suite-e2e), the in-product flag is dead code: debug isolation belongs in the test harness, not in `system_settings`.

**Decision** (resolved 2026-05-14, to-be-implemented):
- Remove the gating block in `AnnotationService.CreateAnnotation:100–102` (always enqueue).
- Drop `silent_detection` from `system_settings` (column, entity, migrator `CREATE TABLE`, migrator `ALTER` line, any DTO references).
- Remove the field from `UpdateSystemSettingsRequest` if present.

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-02). Schema column removal is a destructive change explicitly authorized by the maintainer.

### ADR-012: Rename `Flight` → `Mission` to align with suite canonical (planned)

**Context**: The suite product spec (`suite/_docs/01_annotations.md`) calls the domain concept `mission` / `missionId`. The code uses `Flight` / `FlightId` (column `media.waypoint_id` + DTO `FlightId` filter). This drift was flagged in `00_discovery.md`.

**Decision** (resolved 2026-05-14, to-be-implemented): align code to the suite. `Flight*` → `Mission*` rename across DTOs, controllers, services, and the relevant query-parameter names. The `media.waypoint_id` column stays (it is the underlying physical identifier; mission is the logical grouping concept above it).

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-07). The rename is scoped to DTOs and code only; no DB column rename is required for this ADR.

### ADR-013: Stream consumer dedupe contract is owned by this service (planned)

**Context**: The failsafe outbox + RabbitMQ Stream pipeline can produce duplicate stream entries when (a) the drainer retries after a partial publish or (b) two service instances both pick up the same outbox row before either deletes it. Today there is no documented dedupe contract; consumers (admin sync, AI training) silently accept whatever they get.

**Decision** (resolved 2026-05-14, to-be-implemented): publish a documented dedupe contract owned by this service. Working shape: consumers MUST dedupe by `(annotationId, operation, dateTime)`. The outbox row's `DateTime` (already populated by `EnqueueAsync`) becomes part of the on-the-wire stream message, alongside the `annotationId` and `operation` already in `AnnotationQueueMessage` / `AnnotationBulkQueueMessage`.

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-09).

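
On the consumer side, the `(annotationId, operation, dateTime)` key from ADR-013 could be applied as in the minimal sketch below. It is a language-neutral illustration in Python: `StreamDeduper` and the dictionary-shaped message are hypothetical, only single-annotation messages are covered, and the in-memory set is an assumption (a real consumer would likely need a bounded or persistent store).

```python
class StreamDeduper:
    """Accepts a message once per (annotationId, operation, dateTime) key."""

    def __init__(self):
        self._seen = set()

    def accept(self, message):
        key = (message["annotationId"], message["operation"], message["dateTime"])
        if key in self._seen:
            return False  # duplicate delivery, e.g. a drainer retry
        self._seen.add(key)
        return True
```

A redelivered message carries the same outbox `DateTime`, so it maps to the same key and is dropped. A later, genuinely new operation on the same annotation carries a different `dateTime` and passes.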
### ADR-011: Detection class catalog is admin-managed with in-memory cache (planned)

**Context**: `detection_classes` is currently seeded by the migrator (19 rows) and read-only via `GET /classes`. Operators have no way to add or correct classes (e.g., the `Smoke`/`Plane` color clash on `#000080`) without a code change and redeploy.

**Decision** (resolved 2026-05-14, to-be-implemented):
- `ClassesController` exposes `POST /classes`, `PUT /classes/{id}`, `DELETE /classes/{id}` under `[Authorize(Policy = "ADM")]`. `GET /classes` stays `[Authorize]`.
- Reads go through a new `DetectionClassCache` (DI singleton) modeled on `PathResolver`: lazy-load on first read, `Reset()` after any write.
- Migrator-seeded rows remain as the bootstrap state; admin writes overwrite them per id.

**Status**: agreed. Implementation lives in the Refactor Backlog (RB-06). Adds a new feature surface; must land before any UI change relying on dynamic class management.

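
The lazy-load and `Reset()` behavior described for `DetectionClassCache` in ADR-011 can be sketched as below. This is a language-neutral illustration in Python, not the planned C# class: the `load_classes` callable, the method names, and the lock are assumptions, and how `PathResolver` is actually implemented is not shown in this document.

```python
import threading


class DetectionClassCache:
    """Read-through cache: loads on first read, reloads after reset()."""

    def __init__(self, load_classes):
        self._load_classes = load_classes  # hypothetical loader for detection_classes rows
        self._lock = threading.Lock()
        self._classes = None

    def get_all(self):
        with self._lock:
            if self._classes is None:
                self._classes = list(self._load_classes())
            return list(self._classes)  # copy, so callers cannot mutate the cache

    def reset(self):
        # Called after any POST / PUT / DELETE so the next read reloads from the DB.
        with self._lock:
            self._classes = None
```

Invalidating on write rather than updating in place keeps the cache logic independent of which write happened; the cost is one extra database read after each admin change.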
## Resolved Architectural Decisions (Step 4 verification)

The following items were surfaced during verification and resolved with the maintainer on 2026-05-14. Each one either becomes an ADR above or maps to a refactor backlog entry below.

| # | Concern | Resolution | Tracked as |
|---|---------|------------|------------|
| 1 | Update / Delete / dataset-status changes are silent on SSE + outbox | Treat as gap; lifecycle is observable (World B): every mutation publishes + enqueues | ADR-009 / RB-01 |
| 2 | `system_settings.silent_detection` semantics | Remove the flag; e2e harness covers debug isolation now | ADR-010 / RB-02 |
| 3 | F1 not transactional across FS + DB + outbox | Wrap lifecycle in a business-transaction (transactional outbox); FS writes happen post-commit | ADR-008 / RB-03 |
| 4 | `XxHash64` over sampled bytes: collision risk | Switch to `XxHash3.Hash128` over the same sample (file-size-independent + 128-bit space) | ADR-004 / RB-04 |
| 5 | `FailsafeProducer.EnqueueAsync` static method does DB I/O, which violates `coderule.mdc` | Accept as-is; documented deviation from rule | (no refactor) |
| 6 | `detection_classes` schema-mutable but no controller writes | Admin-managed CRUD with read-through cache (modeled on `PathResolver`) | ADR-011 / RB-06 |
| 7 | `Flight` (code) vs `mission` (suite spec) drift | Rename code → `Mission*`; suite spec stays canonical | ADR-012 / RB-07 |
| 8 | Dataset writes coupled directly to annotation rows via shared `AppDataConnection` | Route dataset writes through `AnnotationService` (via a public domain interface) | RB-08 |
| 9 | Stream consumer dedupe contract owner | This service owns it; dedupe by `(annotationId, operation, dateTime)` baked into the wire message | ADR-013 / RB-09 |
| 10 | Hard-delete vs soft-delete on `DeleteAnnotation` | Soft-delete: status → `Deleted (40)`, files moved to a `deleted_dir` | ADR-009 (folded in) / RB-01 |

## Remaining Open Architectural Risks

These are residual risks that still need attention from later autodev steps (Test Spec, Refactor, Security Audit). Items previously listed here that were resolved as of 2026-05-14 (Flight/mission drift, dataset coupling, hard-vs-soft delete, JWT issuer/audience validation, CORS environment gating, dev secret fallback) have moved to the Resolved Architectural Decisions table above and the Refactor Backlog below.

1. **Horizontal scaling**: SSE channel is per-instance (singleton `AnnotationEventService`); the failsafe outbox uses no leasing/locking. Two pods will independently drain rows, with deletion keyed on `id`; under high concurrency both can pick the same row before either deletes it, so duplicate stream entries are possible. Consumers must dedupe per ADR-013. (Touched by RB-03 / RB-09 indirectly but not solved by them.)
2. **Swagger exposure** in production: see ADR-005. Belongs to Step 14 (Security Audit). (CORS exposure was resolved by `CorsConfigurationValidator`; ADR-006 retired.)
3. **`UserId` body field vs JWT `NameIdentifier`** drift (suite spec lists `UserId` on `POST /annotations`; code uses JWT subject). Reconcile in the suite spec.
4. **No automated tests**: addressed by autodev Phase A Steps 3–7 (Test Spec → Implement Tests → Run Tests).
5. **`FailsafeProducer.cs:138` swallows `IOException` on image read silently** (`catch { }`). Direct `coderule.mdc` violation. Symptom in product: a missing or unreadable image yields a stream message with `image = null` and no log/metric, so the gap is invisible to operators. Track on Refactor Backlog (RB-05).
6. **JWKS HTTPS-only retrieval blocks containerised test harnesses** that would otherwise serve a static JWKS over plain HTTP. Tests must either run a TLS-terminating sidecar in the test compose stack or rely on test-only configuration that relaxes `RequireHttps`. Not a production risk; a Step 4 (Code Testability Revision) item.

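
For the silent `catch { }` described in the risk list (the `FailsafeProducer.cs:138` item), the logged failure path that RB-05 asks for might take the shape below. This is a language-neutral sketch in Python, not the C# fix: `read_image_or_none`, the logger name, and the `Counter` standing in for a metrics client are all hypothetical.

```python
import logging
from collections import Counter
from pathlib import Path

logger = logging.getLogger("failsafe_producer")  # hypothetical logger name
metrics = Counter()  # stand-in for a real metrics client


def read_image_or_none(path):
    """Return the image bytes, or None after logging and counting the failure."""
    try:
        return Path(path).read_bytes()
    except OSError as exc:
        # The message can still go out without an image, but the gap is now visible.
        metrics["image_read_failures"] += 1
        logger.warning("image read failed for %s: %s", path, exc)
        return None
```

The stream message is still produced with a null image, as today; the difference is that operators get a log line and a counter to alert on.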
## Refactor Backlog

These items are the implementation work for the resolved decisions above. They are **not** part of Step 4 (Verification) corrections; they will be picked up by the autodev existing-code flow at Step 8 (Refactor) and/or new feature tasks in Phase B.

| ID | Scope | Source ADR / Risk | Notes |
|----|-------|--------------------|-------|
| RB-01 | Wire lifecycle publish + outbox enqueue across Update / UpdateStatus / Delete (annotations + dataset). Includes the soft-delete behavior: `DeleteAnnotation` flips `AnnotationStatus → Deleted (40)`, leaves the row, and moves image / label / thumbnail files to a new `deleted_dir` (added to `directory_settings`). Read paths must exclude `Status = Deleted (40)` rows from default lists. | ADR-009 | Open sub-questions: (a) `UpdateAnnotation` mapping: re-enqueue as `Created` or add `QueueOperation.Updated` + drainer branch; (b) which non-Validated/Deleted status transitions enqueue at all |
| RB-02 | Remove `silent_detection` (schema column, entity field, gating logic, DTOs) | ADR-010 | Destructive schema change explicitly authorized |
| RB-03 | Introduce business-transaction wrapper (transactional outbox) for annotation lifecycle | ADR-008 | Reorders FS writes to post-commit; covers all mutation paths |
| RB-04 | Switch annotation id hashing to `XxHash3.Hash128` over the same sampled buffer | ADR-004 | Existing 16-char ids stay; new ids are 32-char hex |
| RB-05 | Replace `catch { }` at `FailsafeProducer.cs:138` with logged failure path; surface as a metric | Open Risk §5 | Downstream consumer should know an image-less message means a real disk error |
| RB-06 | Admin-managed `detection_classes` (CRUD endpoints `[ADM]`, in-memory cache with `Reset()`) | ADR-011 | Migrator seed remains as bootstrap; admin overrides per id; fix `Smoke`/`Plane` color collision while at it |
| RB-07 | Rename `Flight*` → `Mission*` across DTOs, controllers, services, and query-parameter names. `media.waypoint_id` column is unchanged (it's the physical id; mission is the logical concept). | ADR-012 | Code-only rename to align with suite spec; suite stays canonical |
| RB-08 | Decouple `04 Dataset` writes from direct `annotations` row mutations: route status writes through a public `AnnotationService` interface. Reads can stay direct for now (read coupling is lower-risk than write coupling). | Open Risks (former §4) | Likely introduces an `IAnnotationLifecycle` (or similar) interface owned by `01 Annotations REST` that `04 Dataset` consumes via DI |
| RB-09 | Bake `(annotationId, operation, dateTime)` into the on-the-wire stream message; document the dedupe contract in `suite/_docs/01_annotations.md`. | ADR-013 | Coordinate the suite-doc update with admin sync + AI training maintainers |

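
The decoupling described for RB-08 (dataset status writes routed through an interface owned by the annotations component) could look roughly like this. It is a language-neutral sketch in Python, not the C# design: `AnnotationLifecycle` loosely mirrors the `IAnnotationLifecycle` name floated in the backlog, and `update_status`, `bulk_update_status`, and the in-memory implementation are hypothetical.

```python
from typing import Iterable, Protocol


class AnnotationLifecycle(Protocol):
    """Interface owned by the annotations component (hypothetical shape)."""

    def update_status(self, annotation_ids: Iterable[str], status: int) -> None: ...


class DatasetService:
    """Dataset code receives the interface via injection and never touches rows directly."""

    def __init__(self, lifecycle: AnnotationLifecycle):
        self._lifecycle = lifecycle

    def bulk_update_status(self, annotation_ids, status):
        self._lifecycle.update_status(list(annotation_ids), status)


class InMemoryAnnotationLifecycle:
    """Stand-in implementation; a real one would also publish SSE and enqueue the outbox."""

    def __init__(self):
        self.statuses = {}

    def update_status(self, annotation_ids, status):
        for annotation_id in annotation_ids:
            self.statuses[annotation_id] = status
```

Routing writes through one interface means the SSE publish and outbox enqueue from RB-01 only have to be wired in one place.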
## References

- Suite product spec: `../../../suite/_docs/01_annotations.md` (REST contracts, SSE, Annotation Sync, camera, classes).
- Suite dataset narrative: `../../../suite/_docs/09_dataset_explorer.md`.
- Component specs: `components/01..06_*/description.md`.
- Module docs: `modules/*.md`.
- File ownership (downstream skills): `module-layout.md`.
- Component diagram: `diagrams/components.md`.
- Per-flow diagrams: `diagrams/flows/`.