annotations/_docs/02_document/architecture.md
Oleksandr Bezdieniezhnykh d7d1c0ed6a [AZ-PENDING-1] [AZ-PENDING-2] Step 4 close-out: verification + docs
Phase 6 smoke (Docker, _docs/04_refactoring/01-testability-refactoring/
smoke-compose.yml):
  - Annotations app boots clean under ASPNETCORE_ENVIRONMENT=E2ETest.
  - /health 200 OK; /annotations with bearer returns 401 with the
    JWT library's own malformed-token rejection.
  - 0 IDX20108 occurrences in logs (C01 verified).
  - 0 IPAddress.Parse FormatException occurrences; FailsafeProducer
    reaches the broker via Docker DNS (C02 verified).
  - Full smoke report in verification.md.

Phase 7 docs:
  - architecture.md: retire Open Risks §6 (testability blocker
    resolved). Update the constraints block to describe the
    ASPNETCORE_ENVIRONMENT-gated RequireHttps behavior.
  - components/06_platform/description.md: one-liner on JwtExtensions
    JWKS gating.
  - components/02_annotations-realtime-sync/description.md: one-liner
    on FailsafeProducer host resolution accepting literal IP or DNS.
  - tests/test-data.md: refresh the JWKS URL configuration section to
    point at the resolved implementation instead of the open risk.

Task housekeeping:
  - _docs/02_tasks/todo/01_*.md -> done/
  - _docs/02_tasks/todo/02_*.md -> done/
  - _docs/_autodev_state.md: advance to Step 5 (Refactor Backlog Triage).

Tracker IDs remain placeholders pending Atlassian MCP availability —
real IDs to be assigned per
_docs/_process_leftovers/2026-05-14_testability-tracker.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 20:38:14 +03:00


Azaion.Annotations — Architecture

Source of truth for service-internal architecture. Suite-level integration narrative lives in ../../../suite/_docs/01_annotations.md. This file documents what the code in src/ actually implements, derived bottom-up from module and component docs.

Architecture Vision

Status: confirmed-by-user 2026-05-14.

Azaion.Annotations is a single .NET 10 ASP.NET Core service in the Azaion suite that owns the authoritative HTTP + streaming surface for annotation lifecycle, media upload, dataset exploration, and system metadata. State of record is PostgreSQL (Linq2DB + Npgsql) with an idempotent boot-time migrator. Real-time fan-out is in-process SSE; durable cross-service export is a transactional-outbox + RabbitMQ Stream pipeline producing MessagePack frames consumed by the admin sync worker and the AI training pipeline. The runtime is one container per node, ARM64-first via Woodpecker CI, with branch-driven image tags (dev | stage | main).

Components & responsibilities

  • 06 Platform — shared kernel: DB, enums, JWT, error middleware, paths, composition root.
  • 02 Annotations realtime & sync — SSE channel + RabbitMQ Stream failsafe drainer.
  • 01 Annotations REST — annotation CRUD + image/thumbnail file routes; the lifecycle producer.
  • 03 Media — upload (single + batch), list, download, delete.
  • 04 Dataset — read-heavy /dataset surface + DATASET-policy status writes (planned to route through 01 Annotations REST per RB-08).
  • 05 Settings & metadata — system / directory / camera / user settings + /classes catalog (becoming admin-managed per RB-06).

Major data flows

  • F1 — Annotation create: bytes → image file → DB rows → label file → SSE → outbox; will be wrapped in a business transaction (ADR-008).
  • F3 — SSE subscription: UI holds a long-lived text/event-stream connection on /annotations/events.
  • F4 — Outbox drain: FailsafeProducer pumps queue rows to the RabbitMQ stream azaion-annotations.
  • F2 / F5 / F6 / F7 / F8 — read paths, media uploads, auth refresh, directory cache reset, dataset bulk status.

Principles / non-negotiables

  • Wire enums are integer-stable (suite contract). [inferred-from: modules/wire-enums.md, suite/_docs/01_annotations.md]
  • Annotation id is content-addressed via a sampled image-bytes hash; remains file-size-independent (videos to ~5 GB). [inferred-from: AnnotationService.ComputeHash, ADR-004]
  • PostgreSQL is the state of record; the filesystem is a content-addressed cache. [inferred-from: data_model.md, system-flows.md F1]
  • The transactional outbox is the durability boundary; SSE is best-effort. [inferred-from: ADR-003 / ADR-008]
  • Lifecycle observability is World B: every mutation publishes SSE and enqueues the outbox. [inferred-from: FailsafeProducer drainer plumbing for Validated/Deleted; maintainer resolution 2026-05-14 → ADR-009 / RB-01]
  • Soft-delete with file relocation: DeleteAnnotation flips status to AnnotationStatus.Deleted = 40 and moves files to a deleted-files directory rather than removing rows. [inferred-from: maintainer resolution 2026-05-14 → ADR-009 / RB-01]
  • Stream consumer dedupe contract is owned by this service: outbox messages must carry enough metadata for downstream consumers to dedupe on (annotationId, operation, dateTime). [inferred-from: maintainer resolution 2026-05-14 → ADR-013 / RB-09]
  • Mission is the canonical domain term: code currently uses FlightId; the suite spec uses missionId. Code aligns to suite (rename, not the other way). [inferred-from: 00_discovery.md drift list; maintainer resolution 2026-05-14 → ADR-012 / RB-07]
  • Dataset writes flow through the annotation domain service: 04 Dataset does not edit annotations rows directly. [inferred-from: module-layout.md Verification Needed §1; maintainer resolution 2026-05-14 → RB-08]
  • DB-driven runtime config: directory roots and detection classes change at runtime via ADM endpoints, not redeploy. [inferred-from: PathResolver.Reset, ADR-011]

Open questions / drift signals (residual)

  • UserId body field vs JWT NameIdentifier (suite spec lists UserId on POST /annotations; code uses JWT subject). Reconcile in suite or code.
  • The exact dedupe key shape for downstream consumers — (annotationId, operation, dateTime) is the working assumption per RB-09; suite consumer doc must be updated to match.

1. System Context

Problem being solved: Provide the canonical HTTP + streaming API for annotation lifecycle (create / update / status / delete / list / files), media (upload, list, download), dataset exploration (DATASET policy reads + bulk status writes), and system metadata (settings + detection class catalog), with real-time SSE push to UI consumers and failsafe export to RabbitMQ Stream consumers (admin sync, AI training).

System boundaries:

  • Inside: a single ASP.NET Core process (Azaion.Annotations.dll), its embedded migrator, in-memory SSE channel, in-process BackgroundService outbox drain, and the on-disk image / label / thumbnail / results layout under directory_settings.
  • Outside: PostgreSQL (state of record), RabbitMQ Streams (durable annotation export), the on-disk media/data filesystem (mounted), and every authenticated HTTP / SSE consumer (UIs, detections service, admin sync worker, AI training).

External systems:

| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| PostgreSQL | DB (Linq2DB / Npgsql) | Both | State of record (annotations, media, queue, settings, classes) |
| RabbitMQ Streams | Stream client (RabbitMQ.Stream.Client) | Outbound | Durable export of annotation lifecycle (azaion-annotations stream) |
| Filesystem (mounted) | File I/O | Both | Annotation images, YOLO label .txt, thumbnails, results, GPS routes/sat |
| Annotator UI / Dataset Explorer UI | REST + SSE | Inbound | User flows (suite 01_annotations.md, 09_dataset_explorer.md) |
| Detections service (suite detections) | REST | Inbound | POST annotations after model inference; long-running tokens are refreshed against admin (annotations no longer mints tokens) |
| Admin sync worker / AI training | RabbitMQ Streams | Outbound | Consume azaion-annotations stream offsets (suite Annotation Sync) |

2. Technology Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | C# | net10.0 (src/Azaion.Annotations.csproj) | Single language across suite .NET services |
| Framework | ASP.NET Core (minimal hosting + controllers) | net10.0 | Built-in JWT, CORS, Swagger, hosted services |
| ORM / DB driver | Linq2DB + Npgsql | per csproj | Linq2DB used for ITable<> repositories; Npgsql under the hood |
| Database | PostgreSQL | not pinned in code (URL-driven) | Suite-wide datastore |
| Auth | JWT Bearer (Microsoft.AspNetCore.Authentication.JwtBearer); verifier-only, ES256 over admin's JWKS | net10.0 | Issuer/audience/lifetime/signature all validated; admin is the sole issuer (see Section 7) |
| Messaging | RabbitMQ Streams (RabbitMQ.Stream.Client) + MessagePack | per csproj | Durable, replayable annotation export |
| API docs | Swashbuckle (Swagger / Swagger UI) | per csproj | Always mounted (see ADR-005) |
| Hashing | System.IO.Hashing | net10.0 stdlib | Annotation id derived from image bytes hash |
| Hosting | WebApplication + IHostedService | net10.0 | FailsafeProducer runs in-process |
| Container | mcr.microsoft.com/dotnet/aspnet:10.0 | linux/arm64 + linux/amd64 | Multi-arch image, ARM-first per Woodpecker |
| CI | Woodpecker CI (.woodpecker/build-arm.yml) | n/a | Branch-based image tag (${BRANCH}-arm) |

Key constraints (evidenced in code/config):

  • DATABASE_URL is required at startup — ConfigurationResolver.ResolveRequiredOrThrow throws if not set. The string is auto-converted from postgresql://user:pass@host:port/db URI form to Linq2DB's Host=…;Username=… form by Program.ConvertPostgresUrl.
  • JWT verification is required at startup — JWT_ISSUER, JWT_AUDIENCE, and JWT_JWKS_URL are all resolved by ConfigurationResolver.ResolveRequiredOrThrow. There is no insecure fallback. The JWKS URL is fetched with HttpDocumentRetriever, whose RequireHttps flag is gated on ASPNETCORE_ENVIRONMENT: HTTPS is required for any value other than E2ETest (Development, Staging, Production, and unset all enforce HTTPS); only E2ETest relaxes the flag to support the in-cluster mock issuer documented in tests/environment.md. The relaxation is gated in source (src/Auth/JwtExtensions.cs), not in config.
  • Default directory roots are /data/{videos,images,labels,results,thumbnails,gps_sat,gps_route} (migrator directory_settings defaults) → operator must mount or override at the DB level via PUT /settings/directories.
  • CORS is environment-gated: CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production when CorsConfig:AllowedOrigins is empty unless CorsConfig:AllowAnyOrigin=true is set explicitly. ADR-006 was retired together with the wide-open default.
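The DATABASE_URL conversion named in the first constraint above can be illustrated with a minimal sketch. It is written in Python for brevity (the service itself is C#) and is not the code of Program.ConvertPostgresUrl: the exact key names, the default port 5432, the pass-through of non-URI strings, and the example host and credentials are all assumptions.

```python
from urllib.parse import unquote, urlparse


def convert_postgres_url(url: str) -> str:
    """Turn postgresql://user:pass@host:port/db into a key=value connection string."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        # Assumption: anything that is not a postgres URI is already key=value form.
        return url
    parts = {
        "Host": parsed.hostname or "",
        "Port": str(parsed.port or 5432),  # assumed default port
        "Database": parsed.path.lstrip("/"),
        "Username": unquote(parsed.username or ""),  # undo percent-encoding
        "Password": unquote(parsed.password or ""),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


if __name__ == "__main__":
    print(convert_postgres_url("postgresql://ann:s3cret@db.internal:5433/annotations"))
```

Percent-decoding the user and password matters in practice, because a password containing `@` or `/` has to be escaped in URI form but not in key=value form.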

3. Deployment Model

Environments (evidenced from CI branches): dev, stage, main → image tag ${CI_COMMIT_BRANCH}-arm pushed to a private registry resolved from REGISTRY_HOST secret.

Infrastructure:

  • Single .NET service container; container exposes port 8080.
  • Multi-arch build supported in the Dockerfile (--platform=$BUILDPLATFORM, $TARGETARCH); the ARM Woodpecker pipeline currently only emits arm64.
  • Scaling is vertical-only as written: SSE uses an in-process Channel<AnnotationEventDto>, and the FailsafeProducer outbox drainer is a per-instance BackgroundService — see "Open Architectural Risks".

Environment-specific configuration (defaults vs production):

| Config | Source | Development default | Production behavior |
|---|---|---|---|
| DATABASE_URL | env or Database:Url config key | none; fail-fast on missing (ConfigurationResolver) | MUST set |
| JWT_ISSUER | env or Jwt:Issuer config key | none; fail-fast | MUST set (matches admin's issuer) |
| JWT_AUDIENCE | env or Jwt:Audience config key | none; fail-fast | MUST set (matches admin's audience for this service) |
| JWT_JWKS_URL | env or Jwt:JwksUrl config key | none; fail-fast; HTTPS required | MUST set to admin's JWKS endpoint |
| RABBITMQ_HOST / RABBITMQ_STREAM_PORT | env | 127.0.0.1 / 5552 | Override per environment |
| RABBITMQ_PRODUCER_USER / _PASS | env | azaion_producer / producer_pass | Override |
| RABBITMQ_STREAM_NAME | env | azaion-annotations | Usually kept (suite contract) |
| CorsConfig:AllowedOrigins | IConfiguration (string array) | empty | MUST set (or set AllowAnyOrigin=true explicitly); CorsConfigurationValidator refuses to start in Production otherwise |
| CorsConfig:AllowAnyOrigin | IConfiguration (bool) | false | Explicit opt-in for permissive policy |
| Directory roots (/data/...) | DB directory_settings | hard-coded SQL defaults | Tune via PUT /settings/directories (calls PathResolver.Reset) |
| Swagger UI | Program.cs | mounted | Also mounted in prod (ADR-005) |
| AZAION_REVISION | Dockerfile build arg CI_COMMIT_SHA | unknown | Stamped per-image |
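The HTTPS requirement on JWT_JWKS_URL is gated on ASPNETCORE_ENVIRONMENT, as described under Key constraints. The rule can be written down as a small predicate. This is a hedged Python illustration of the documented behavior, not the contents of src/Auth/JwtExtensions.cs; the exact, case-sensitive string comparison and the mock-issuer URL in the test are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse


def jwks_requires_https(environment: Optional[str]) -> bool:
    """HTTPS is enforced for every environment value except E2ETest (unset included)."""
    return environment != "E2ETest"  # assumption: exact, case-sensitive match


def check_jwks_url(url: str, environment: Optional[str]) -> str:
    """Return the URL unchanged, or raise if plain HTTP is used where HTTPS is required."""
    if jwks_requires_https(environment) and urlparse(url).scheme != "https":
        raise ValueError(f"JWKS URL must use https in environment {environment!r}")
    return url
```

Keeping the exception list to a single literal value means a typo in the environment name fails closed: an unknown value still enforces HTTPS.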

4. Data Model Overview

Detailed ERD, indexes, and migration semantics live in data_model.md. This section is the cross-component summary.

Core entities (owned by 06_platform; consumed by feature components):

| Entity | Description | Owned by component |
|---|---|---|
| media | Uploaded image/video reference (waypoint-scoped) | 03_media (writes) / 01_annotations-rest (reads) |
| annotations | Annotation row keyed by image-bytes hash, soft-versioned by created_date, time (BIGINT ticks) | 01_annotations-rest |
| detection | YOLO bounding boxes (center_x/y, width, height, class, affiliation, combat readiness) per annotation | 01_annotations-rest |
| annotations_queue_records | Outbox for failsafe stream sync (operation, annotation_ids JSON array) | 02_annotations-realtime-sync (writer) / 01_annotations-rest (writer side) |
| system_settings | Singleton-ish org settings + generate_annotated_image, silent_detection toggles | 05_settings-metadata |
| directory_settings | Filesystem roots consumed by PathResolver | 05_settings-metadata |
| detection_classes | Seeded class catalog for UI label/color (ids 0–18, names + Cyrillic short names + hex colors) | 05_settings-metadata (read-only ClassesController) |
| user_settings | Per-user UI prefs (panel widths, selected flight) | 05_settings-metadata |
| camera_settings | Calibration (altitude, focal length, sensor width) | 05_settings-metadata |

Key relationships:

  • annotations.media_id → media.id (FK).
  • detection.annotation_id → annotations.id (FK; cascades on annotation update logic in service layer, not DB).
  • annotations_queue_records.annotation_ids is a JSON array of TEXT ids (no FK); single-row outbox entry can reference multiple annotations (bulk).

Data flow summary:

  • Inbound write (Create), today: HTTP body → AnnotationService.CreateAnnotation → image bytes to images_dir/{id}.jpg, optional media row insert, annotations + detection rows, YOLO label to labels_dir/{id}.txt, SSE publish, then (if silent_detection != true) outbox row → drained by FailsafeProducer → MessagePack frame on RabbitMQ stream. Thumbnails are not produced by this flow; they are read-only via PhysicalFile and presumed populated out-of-band.
  • Inbound write (Update / UpdateStatus / Delete annotations, dataset PATCH / bulk-status), today: DB-only, silent. Target (RB-01): every mutation publishes SSE and enqueues the outbox with the appropriate QueueOperation (Created, Validated, or Deleted).
  • Lifecycle ordering, target (RB-03): all DB writes plus the outbox row commit inside a single business transaction; FS writes (image / label / future thumbnail generation) and SSE publish are post-commit, with the outbox row as the durable promise.
  • Inbound read: HTTP query → DB joins (annotations × detection × media) → JSON list (PaginatedResponse<AnnotationListItem>); image/thumbnail served as PhysicalFile.

5. Integration Points

Internal communication (in-process)

| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| 01_annotations-rest (AnnotationService) | 02_annotations-realtime-sync (AnnotationEventService) | C# call | Fire-and-forget publish to Channel<> | Today: only on Create. Target (RB-01): every mutation publishes (Create, Update, UpdateStatus, Delete) |
| 01_annotations-rest (AnnotationService) | 02_annotations-realtime-sync (annotations_queue_records table) | DB INSERT via FailsafeProducer.EnqueueAsync (static helper) | Outbox | Today: Create only, gated by silent_detection. Target (RB-01 + RB-02): every mutation enqueues with the appropriate QueueOperation; gating flag removed |
| 02_annotations-realtime-sync (FailsafeProducer) | 06_platform (AppDataConnection, PathResolver) | C# call | Read-then-delete | Drainer is already plumbed for Created, Validated, and Deleted operations (see FailsafeProducer.cs:108–123) |
| 04_dataset (DatasetService.UpdateStatus / BulkUpdateStatus) | 01_annotations-rest (AnnotationEventService) + outbox | shared DB + cross-component call | Direct write today; lifecycle publish + enqueue per RB-01 | Bulk path enqueues a single Validated outbox record carrying all ids |
| 05_settings-metadata (directory PUT) | 06_platform (PathResolver.Reset) | C# call | Cache invalidation | Required after directory change |

External integrations

| External system | Protocol | Auth | Rate limits | Failure mode |
|---|---|---|---|---|
| PostgreSQL | TCP / Linq2DB / Npgsql | Conn string | n/a | Surfaced as 500 via ErrorHandlingMiddleware |
| RabbitMQ Stream azaion-annotations | Stream protocol (5552) | Stream user/pass (azaion_producer default) | Stream-level | FailsafeProducer retries; rows stay in annotations_queue_records until drained |
| Filesystem (/data/...) | POSIX | OS perms | n/a | IOException → 500; missing image on GET → 404 |
| HTTP clients (UIs, detections, admin) | REST + SSE | JWT Bearer (ANN, DATASET, ADM) | n/a | 401 if invalid; 403 if missing claim |

6. Non-Functional Requirements

Pulled only from code-level evidence — config defaults, validators, health checks, idempotent migrator. Anything not evidenced is left blank rather than guessed.

| Requirement | Target | Measurement | Priority | Source |
|---|---|---|---|---|
| Liveness | 200 OK on GET /health | route in Program.cs | High | Program.cs |
| Idempotent startup | DB schema applies cleanly on every boot | DatabaseMigrator.Migrate uses CREATE TABLE IF NOT EXISTS + ALTER TABLE … IF NOT EXISTS and INSERT … ON CONFLICT DO NOTHING | High | Database/DatabaseMigrator.cs |
| Recovery: queue durability | Annotation lifecycle events are not lost across pod restarts | DB-backed outbox (annotations_queue_records) drained by FailsafeProducer | High | Services/FailsafeProducer.cs |
| Auth lifetime / clock skew | per JwtExtensions.AddJwtAuth config | auth-identity module | Medium | Auth/JwtExtensions.cs |
| Pagination defaults | PaginatedResponse<T> total/page/pageSize | applied in list endpoints | Medium | DTOs/PaginatedResponse.cs |
| Thumbnail dimensions | 240×135 with 10 border (defaults) | system_settings.thumbnail_* | Low | migrator defaults |
| Throughput / latency / availability targets | not evidenced in code | | | open question, see 00_problem extraction (Step 6) |

7. Security Architecture

Authentication: JWT Bearer; ES256 signature verified against admin's JWKS endpoint (JWT_JWKS_URL, default https://admin.azaion.com/.well-known/jwks.json). ValidateIssuer, ValidateAudience, RequireSignedTokens, and RequireExpirationTime are all enforced; algorithms are pinned to EcdsaSha256 to block HS256-confusion forgeries. Admin is the sole token issuer for the suite — annotations no longer holds an HMAC secret and no longer mints tokens (TokenService and POST /auth/refresh were removed; callers refresh against admin).

Authorization (per-endpoint policy claims, all evidenced in controllers):

  • ANN: AnnotationsController, MediaController.
  • DATASET: DatasetController (status writes including bulk).
  • ADM: mutating routes on SettingsController.
  • [Authorize] (any authenticated user): read endpoints on settings, ClassesController.
  • [AllowAnonymous]: /health.

User identity: server resolves user from JWT NameIdentifier (e.g., AnnotationsController.Create parses User.FindFirstValue(ClaimTypes.NameIdentifier) → Guid). Suite spec sometimes lists UserId in body; drift recorded in 00_discovery.md.

Data protection:

  • At rest: nothing in-code — relies on the underlying Postgres deployment + filesystem.
  • In transit: terminated outside the container; service speaks plain HTTP on :8080.
  • Secrets: env-driven (DATABASE_URL, JWT_ISSUER, JWT_AUDIENCE, JWT_JWKS_URL, RABBITMQ_*). DATABASE_URL and the three JWT vars now fail-fast on startup if unset (no insecure default). ADR-002 was retired together with JWT_SECRET.
  • CORS: config-driven allow-list (CorsConfig:AllowedOrigins); CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production with an empty list unless CorsConfig:AllowAnyOrigin=true is explicitly set. ADR-006 was retired together with the wide-open default.
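The CORS startup rule in the last bullet reduces to a three-way decision. The sketch below restates it in Python for illustration only; it is not CorsConfigurationValidator.EnsureSafeForEnvironment itself. The return labels are hypothetical, and the example origin is made up.

```python
from typing import Sequence


def ensure_cors_safe(
    environment: str,
    allowed_origins: Sequence[str],
    allow_any_origin: bool = False,
) -> str:
    """Decide the CORS mode at startup, refusing unsafe Production configs."""
    if allow_any_origin:
        return "any-origin"  # explicit opt-in, allowed everywhere
    if allowed_origins:
        return "allow-list"  # normal case: only the configured origins
    if environment == "Production":
        # Empty allow-list without the explicit opt-in: fail fast.
        raise RuntimeError("CorsConfig:AllowedOrigins is empty in Production")
    # Non-production with nothing configured: permissive default. The document
    # notes a warning is logged in this case.
    return "permissive-default"
```

The point of the explicit flag is that a permissive policy in Production must be a deliberate choice and can never result from a missing config value.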

Audit logging: not evidenced beyond ASP.NET Core defaults — open gap; flag in retro/security audit.

Input validation: surfaces through model binding + ErrorHandlingMiddleware mapping (400 / 404 / 409 / 500); detailed validators per DTO live in DTOs/Requests/ (component specs to confirm during Step 4 verification).

8. Key Architectural Decisions (inferred from code)

These ADRs document choices the codebase already evidences. They are descriptive, not prescriptive — call them out so downstream skills can challenge them deliberately.

ADR-001: In-process SSE via Channel<T>

Context: Real-time annotation activity must reach the Annotator UI within 100ms of a write.

Decision: Use a singleton AnnotationEventService exposing an unbounded Channel<AnnotationEventDto> and serve subscribers from AnnotationsController.Events over text/event-stream.

Alternatives considered (implicitly rejected):

  1. Broker-backed pub/sub (Redis / RabbitMQ exchange) — rejected because it adds a dependency for what is already a single-process workload, and the failsafe queue covers durable export needs.
  2. Server-side polling — rejected because it cannot meet sub-second latency cheaply.

Consequences: SSE state is per-instance only. Horizontal scaling requires a broker fanout layer or sticky sessions on the LB.
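To make the per-instance consequence concrete, here is a minimal Python sketch of an in-process event channel feeding a text/event-stream response. It uses asyncio.Queue as a stand-in for Channel<AnnotationEventDto>; the class name, the event field names, and the single-subscriber shape are assumptions for illustration, not the service's implementation.

```python
import asyncio
import json
from typing import AsyncIterator


def format_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame (data line + blank line)."""
    return f"data: {json.dumps(event, separators=(',', ':'))}\n\n"


class AnnotationEvents:
    """In-memory event channel; its state lives only in this process."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()  # maxsize=0 means unbounded

    def publish(self, event: dict) -> None:
        # Fire-and-forget: never blocks the request path on an unbounded queue.
        self._queue.put_nowait(event)

    async def stream(self) -> AsyncIterator[str]:
        # Each yielded string would be written to the open HTTP response.
        while True:
            yield format_sse(await self._queue.get())
```

Because the queue is an object in one process, a subscriber connected to instance A never sees events published on instance B, which is why horizontal scaling needs a broker or sticky sessions.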

ADR-002 (RETIRED): Symmetric JWT, no issuer/audience validation

Status: superseded — annotations is now a JWKS verifier of admin-signed ES256 tokens. AddJwtAuth(IConfiguration) pins ValidAlgorithms = [SecurityAlgorithms.EcdsaSha256], enforces ValidateIssuer/ValidateAudience/RequireSignedTokens/RequireExpirationTime, and resolves keys through ConfigurationManager<JsonWebKeySet> against JWT_JWKS_URL. JWT_SECRET was removed along with the local refresh path; admin is the sole issuer for the suite. The original ADR is preserved here for historical context only.

ADR-003: Failsafe outbox + RabbitMQ Stream (not direct publish)

Context: Annotation lifecycle must reach external consumers (admin sync, AI training) durably even when RabbitMQ is unavailable at the moment of the write.

Decision: Every mutation writes a row to annotations_queue_records; the in-process FailsafeProducer (IHostedService) drains this table and publishes MessagePack frames on the azaion-annotations stream, deleting rows after success.

Alternatives considered:

  1. Direct publish in the request path — rejected because RabbitMQ unavailability would either drop events (fire-and-forget) or fail user-visible writes (sync publish).
  2. Transactional outbox via Debezium / CDC — heavier, deferred.

Consequences: One outbox-drainer per service instance. Multiple instances may drain concurrently; this is safe against data loss because the deletion is keyed on id and re-reads of disk bytes are idempotent, but two instances can publish the same row before either deletes it (see ADR-013), and ordering across consumers is not guaranteed.
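The drain-then-delete behavior in this ADR can be sketched with an in-memory SQLite table standing in for annotations_queue_records. This is an illustrative Python sketch under assumptions: the integer id column, the helper names, and the `publish` callback (which replaces the RabbitMQ Stream producer and MessagePack encoding) are hypothetical.

```python
import json
import sqlite3
from typing import Callable, List


def open_outbox() -> sqlite3.Connection:
    """Create an in-memory stand-in for the outbox table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE annotations_queue_records ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "operation TEXT NOT NULL, annotation_ids TEXT NOT NULL)"
    )
    return db


def enqueue(db: sqlite3.Connection, operation: str, annotation_ids: List[str]) -> None:
    db.execute(
        "INSERT INTO annotations_queue_records (operation, annotation_ids) VALUES (?, ?)",
        (operation, json.dumps(annotation_ids)),
    )
    db.commit()


def drain_once(db: sqlite3.Connection, publish: Callable[[str, List[str]], None]) -> int:
    """Publish queued rows in id order; delete each row only after it was published."""
    rows = db.execute(
        "SELECT id, operation, annotation_ids FROM annotations_queue_records ORDER BY id"
    ).fetchall()
    drained = 0
    for row_id, operation, annotation_ids in rows:
        try:
            publish(operation, json.loads(annotation_ids))
        except Exception:
            break  # broker unavailable: keep this row and all later ones for the next pass
        db.execute("DELETE FROM annotations_queue_records WHERE id = ?", (row_id,))
        db.commit()
        drained += 1
    return drained
```

A crash between a successful publish and the delete leaves the row in place, so the next pass publishes it again. Delivery is therefore at-least-once, which is the reason consumers need a dedupe contract.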

ADR-004: Annotation id from a sampled XxHash3.Hash128 of image bytes

Context: Annotation rows must be deduplicated when the same image is re-uploaded (e.g., re-runs of the detection pipeline). The system also serves video media up to 3–5 GB, so hashing must remain constant-time with respect to file size to keep create-path latency stable under load.

Decision (resolved 2026-05-14): Hash a deterministic fixed-size sample with XxHash3.Hash128 (128-bit output, 32-char lower-case hex). Sample composition is unchanged from the current implementation:

  • For inputs ≤ 3072 bytes: [length(8 bytes)] + [full bytes].
  • For inputs > 3072 bytes: [length(8 bytes)] + [first 1024] + [middle 1024 starting at len/2 − 512] + [last 1024].

When MediaId is provided instead of bytes, the annotation id is reused from the referenced media row.
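The sample composition above can be written out as a short Python sketch. It reads the middle offset as len/2 − 512 and assumes a little-endian signed 64-bit length prefix; both are assumptions, not confirmed details of AnnotationService.ComputeHash. The Python standard library has no XxHash3, so `stand_in_id` uses BLAKE2b truncated to 128 bits purely to show the 32-character hex shape. It does not produce the same ids as XxHash3.Hash128.

```python
import hashlib
import struct

CHUNK = 1024
FULL_READ_LIMIT = 3 * CHUNK  # 3072 bytes


def build_sample(data: bytes) -> bytes:
    """Build the fixed-size sample: length prefix plus head, middle and tail chunks."""
    header = struct.pack("<q", len(data))  # 8-byte length; byte order is an assumption
    if len(data) <= FULL_READ_LIMIT:
        return header + data
    middle = len(data) // 2 - CHUNK // 2  # len/2 - 512
    return header + data[:CHUNK] + data[middle:middle + CHUNK] + data[-CHUNK:]


def stand_in_id(data: bytes) -> str:
    """128-bit hex id over the sample. Stand-in hash, NOT XxHash3.Hash128."""
    return hashlib.blake2b(build_sample(data), digest_size=16).hexdigest()
```

For a multi-gigabyte file the same three windows would be read with seeks rather than slicing an in-memory buffer, so the work stays bounded at 3080 bytes regardless of file size.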

Why this combination:

  • Sampling preserves file-size independence. Reading a 5 GB video front-to-back just to derive an id is unacceptable on the hot path.
  • XxHash3.Hash128 over the same sample keeps the hashing itself O(1) in file size while moving the collision space from 2^64 to 2^128. Distinct large images that happen to share (length, head 1 KB, middle 1 KB, tail 1 KB) still collide deterministically — but the practical collision probability among such samples is now negligible at any realistic volume.

Migration consequences:

  • The annotation id column is TEXT PRIMARY KEY; switching from 16-char (XxHash64) to 32-char (XxHash3.Hash128) hex requires no schema change.
  • Existing rows keep their 16-char ids; new rows get 32-char ids. Re-creating an image whose original id was generated under XxHash64 produces a different id under XxHash3.Hash128, so re-creates after the upgrade no longer collide with their pre-upgrade row. This is acceptable and expected: old ids stay stable, deduplication is preserved going forward, and the upgrade is irreversible by design.

Status: agreed. Implementation lives in the Refactor Backlog (RB-04).

ADR-005: Swagger UI mounted in all environments

Context: Internal debugging / partner integration friction.

Decision: app.UseSwagger() and app.UseSwaggerUI() are unconditional in Program.cs.

Consequences: Schema is publicly readable wherever the service is reachable. If the perimeter is not closed, this leaks endpoint surface — treat as a security finding for production-internet exposure.

ADR-006 (RETIRED): Wide-open CORS

Status: superseded — the default policy now reads CorsConfig:AllowedOrigins (string array) and CorsConfig:AllowAnyOrigin (boolean opt-in). CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production when origins are empty and AllowAnyOrigin is not explicitly set; a LogWarning is emitted in non-production when running with the permissive default. The original ADR is preserved here for historical context only.

ADR-007: Embedded SQL migrator (not EF migrations / Flyway)

Context: Suite values single-binary deploys; the team prefers idempotent boot-time DDL over a separate migration tool.

Decision: DatabaseMigrator.Migrate runs a single multi-statement script via Linq2DB on every startup. Schema evolution is additive (ALTER … ADD COLUMN IF NOT EXISTS).

Consequences: Forward-only; no down migrations. Renames or destructive changes need an explicit out-of-band script. Drift detection requires diffing live DB against Database/DatabaseMigrator.cs.
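The idempotent boot-time pattern can be demonstrated with SQLite in a few lines of Python. This is an illustration of the technique only: the two-column table and seed rows are hypothetical, and SQLite has no ADD COLUMN IF NOT EXISTS, so that part of the PostgreSQL script is not shown.

```python
import sqlite3

# Every statement is safe to re-run: creation is conditional and seeds skip existing rows.
MIGRATION = """
CREATE TABLE IF NOT EXISTS detection_classes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
INSERT INTO detection_classes (id, name) VALUES (0, 'class-0') ON CONFLICT DO NOTHING;
INSERT INTO detection_classes (id, name) VALUES (1, 'class-1') ON CONFLICT DO NOTHING;
"""


def migrate(db: sqlite3.Connection) -> None:
    """Run the whole script on every startup; repeated runs leave the data unchanged."""
    db.executescript(MIGRATION)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    migrate(connection)
    migrate(connection)  # second boot: no error, no duplicate rows
    print(connection.execute("SELECT COUNT(*) FROM detection_classes").fetchone()[0])
```

Because seed inserts skip existing ids instead of overwriting them, rows edited after the first boot survive later restarts.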

ADR-008: Annotation lifecycle wrapped in a business transaction (planned)

Context: CreateAnnotation today touches the filesystem, three DB tables, an in-memory channel, and an outbox row, with no atomicity. World B (lifecycle is observable — see ADR-009) widens this surface to Update / Delete / status-change paths. A naive DB transaction does not wrap the FS writes; we want a single conceptual transactional boundary for the lifecycle, not just for the DB rows.

Decision (resolved 2026-05-14, to-be-implemented): introduce a business-transaction wrapper for annotation lifecycle operations. Concretely the chosen pattern is the transactional outbox:

  1. Write all relevant DB rows (annotation / detection / annotations_queue_records) inside a single db.BeginTransaction scope.
  2. Commit. The outbox row is the durable promise that the post-commit work is owed.
  3. Post-commit, perform side effects: write image / label / thumbnail files, publish SSE event. These steps are idempotent on retry; the outbox row stays until the drainer succeeds.
  4. The drainer (FailsafeProducer) is unchanged in role — it consumes the outbox.
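The four steps above can be sketched as follows, with SQLite standing in for PostgreSQL. This is a minimal Python illustration of the transactional-outbox ordering, not the planned implementation: the reduced table schemas, the `class_id` column, and the `post_commit` callback (which stands for file writes and the SSE publish) are assumptions.

```python
import json
import sqlite3
from typing import Callable, List, Optional


def open_db() -> sqlite3.Connection:
    """In-memory stand-in with heavily reduced versions of the three tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE annotations (id TEXT PRIMARY KEY);
        CREATE TABLE detection (annotation_id TEXT NOT NULL, class_id INTEGER NOT NULL);
        CREATE TABLE annotations_queue_records (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            operation TEXT NOT NULL,
            annotation_ids TEXT NOT NULL
        );
        """
    )
    return db


def create_annotation(
    db: sqlite3.Connection,
    annotation_id: str,
    class_ids: List[Optional[int]],
    post_commit: Callable[[str], None],
) -> None:
    # Steps 1-2: all rows, including the outbox row, commit or roll back together.
    with db:
        db.execute("INSERT INTO annotations (id) VALUES (?)", (annotation_id,))
        db.executemany(
            "INSERT INTO detection (annotation_id, class_id) VALUES (?, ?)",
            [(annotation_id, class_id) for class_id in class_ids],
        )
        db.execute(
            "INSERT INTO annotations_queue_records (operation, annotation_ids) VALUES (?, ?)",
            ("Created", json.dumps([annotation_id])),
        )
    # Step 3: side effects run only after a successful commit.
    post_commit(annotation_id)
```

If any insert fails, the `with db:` block rolls everything back, so there is never an annotation row without its outbox row or the reverse, and no file is written for a write that did not commit.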

Implications:

  • FS write order shifts: today image is first, before any DB row; after the refactor, DB rows + outbox commit first, then FS writes execute (with the outbox row as the recovery anchor).
  • A new abstraction (e.g., AnnotationLifecycleTransaction or a thin extension on AppDataConnection) is the right place to centralize this. Implementation deferred to RB-03.

Alternatives considered:

  1. Pure DB transaction wrapping current order — rejected: doesn't cover FS, leaves orphan-file risk.
  2. Saga / compensation steps with explicit rollback handlers — rejected: overkill for the linear lifecycle here.

Status: agreed. Implementation lives in the Refactor Backlog (RB-03).

ADR-009: Lifecycle observability — World B (planned)

Context: Today only CreateAnnotation publishes SSE and enqueues the outbox. Update / UpdateStatus / Delete (annotations) and UpdateStatus / BulkUpdateStatus (dataset) are silent. The QueueOperation enum already declares Validated and Deleted, and FailsafeProducer.cs:108–123 has a dedicated drainer branch for both, which is strong evidence that the design always intended every lifecycle change to be observable. The producer side simply was never wired (the prior WPF codebase blended UI + backend; lifecycle calls likely came from the UI directly, which the new HTTP backend has not replicated).

Decision (resolved 2026-05-14, to-be-implemented): every annotation mutation publishes SSE and enqueues the outbox.

Mapping (initial; sub-questions to be resolved at implementation time):

| Mutation | SSE | Outbox QueueOperation |
|---|---|---|
| AnnotationService.CreateAnnotation | yes (today) | Created (today) |
| AnnotationService.UpdateAnnotation (replace detections, status → Edited) | yes | open: re-enqueue as Created (richer payload) or add QueueOperation.Updated + corresponding drainer branch |
| AnnotationService.UpdateStatus (status → Validated (30) or Deleted (40)) | yes | Validated |
| AnnotationService.UpdateStatus (other transitions) | yes | open: skip outbox, or always enqueue Validated? |
| AnnotationService.DeleteAnnotation | yes | Deleted; soft-delete: status flips to AnnotationStatus.Deleted = 40, the row stays, image / label / thumbnail files relocate to a deleted_dir (new directory_settings column added by RB-01) |
| DatasetService.UpdateStatus / BulkUpdateStatus | yes (per-id for bulk) | Validated (single record covers the whole bulk via AnnotationIds) |

Status: agreed. Implementation lives in the Refactor Backlog (RB-01).

ADR-010: Remove system_settings.silent_detection

Context: silent_detection was a debug-time switch to keep the RabbitMQ stream clean while a developer iterated locally. Now that the suite has e2e tests with isolated queues (per _docs/_repo-config.yaml suite-e2e), the in-product flag is dead code — debug isolation belongs in the test harness, not in system_settings.

Decision (resolved 2026-05-14, to-be-implemented):

  • Remove the gating block in AnnotationService.CreateAnnotation:100–102 (always enqueue).
  • Drop silent_detection from system_settings (column, entity, migrator CREATE TABLE, migrator ALTER line, any DTO references).
  • Remove the field from UpdateSystemSettingsRequest if present.

Status: agreed. Implementation lives in the Refactor Backlog (RB-02). Schema column removal is a destructive change explicitly authorized by the maintainer.

ADR-012: Rename Flight → Mission to align with suite canonical (planned)

Context: The suite product spec (suite/_docs/01_annotations.md) calls the domain concept mission / missionId. The code uses Flight / FlightId (table media.waypoint_id + DTO FlightId filter). This drift was flagged in 00_discovery.md.

Decision (resolved 2026-05-14, to-be-implemented): align code to the suite. Flight* → Mission* rename across DTOs, controllers, services, and the relevant query-parameter names. The media.waypoint_id column stays (it is the underlying physical identifier; mission is the logical grouping concept above it).

Status: agreed. Implementation lives in the Refactor Backlog (RB-07). The change is scoped to renames in DTOs and code only; no DB column rename is required for this ADR.

ADR-013: Stream consumer dedupe contract is owned by this service (planned)

Context: The failsafe outbox + RabbitMQ Stream pipeline can produce duplicate stream entries when (a) the drainer retries after a partial publish or (b) two service instances both pick up the same outbox row before either deletes it. Today there is no documented dedupe contract; consumers (admin sync, AI training) silently accept whatever they get.

Decision (resolved 2026-05-14, to-be-implemented): publish a documented dedupe contract owned by this service. Working shape: consumers MUST dedupe by (annotationId, operation, dateTime). The outbox row's DateTime (already populated by EnqueueAsync) becomes part of the on-the-wire stream message, alongside the annotationId and operation already in AnnotationQueueMessage / AnnotationBulkQueueMessage.
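On the consumer side, the working dedupe key could be applied as in the Python sketch below. The message type and its field names are hypothetical stand-ins for the on-the-wire AnnotationQueueMessage shape, and the unbounded in-memory `seen` set is a simplification: a real consumer would bound it or persist it alongside its stream offset.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class StreamMessage(NamedTuple):
    """Hypothetical decoded stream entry; only the dedupe fields are modeled."""
    annotation_id: str
    operation: str
    date_time: str


def dedupe(messages: Iterable[StreamMessage]) -> Iterator[StreamMessage]:
    """Yield each (annotationId, operation, dateTime) combination at most once."""
    seen: Set[Tuple[str, str, str]] = set()
    for message in messages:
        key = (message.annotation_id, message.operation, message.date_time)
        if key in seen:
            continue  # duplicate from a drainer retry or a second instance
        seen.add(key)
        yield message
```

Including dateTime in the key is what lets a legitimate repeat, such as the same annotation being validated again later, pass through while a retried publish of the same outbox row is dropped.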

Status: agreed. Implementation lives in the Refactor Backlog (RB-09).
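
To make the working shape of the contract concrete, here is a minimal consumer-side sketch. It assumes the three fields arrive as a string id, a string operation name, and a DateTime; DedupeKey and StreamDeduper are hypothetical names used for illustration only, not types from this codebase or from any consumer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type; the fields mirror the (annotationId, operation, dateTime)
// tuple named in ADR-013. The operation is kept as a string here because the
// real QueueOperation values are not assumed.
public readonly record struct DedupeKey(string AnnotationId, string Operation, DateTime DateTime);

// Hypothetical consumer-side filter that remembers every key it has seen.
public sealed class StreamDeduper
{
    private readonly HashSet<DedupeKey> _seen = new();

    // Returns true the first time a key is observed and false on a redelivery.
    public bool TryAccept(DedupeKey key) => _seen.Add(key);
}
```

A consumer would call TryAccept before applying a message and skip the message when it returns false. The set in this sketch grows without bound; a real consumer would likely bound it (for example by time window or stream offset), which is outside what the ADR specifies.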

ADR-011: Detection class catalog is admin-managed with in-memory cache (planned)

Context: detection_classes is currently seeded by the migrator (19 rows) and read-only via GET /classes. Operators have no way to add or correct classes (e.g., the Smoke/Plane color clash on #000080) without a code change and redeploy.

Decision (resolved 2026-05-14, to-be-implemented):

  • ClassesController exposes POST /classes, PUT /classes/{id}, DELETE /classes/{id} under [Authorize(Policy = "ADM")]. GET /classes stays [Authorize].
  • Reads go through a new DetectionClassCache (DI singleton) modeled on PathResolver: lazy-load on first read, Reset() after any write.
  • Migrator-seeded rows remain as the bootstrap state; admin writes overwrite them per id.

Status: agreed. Implementation lives in the Refactor Backlog (RB-06). Adds a new feature surface; must land before any UI change relying on dynamic class management.
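
The lazy-load plus Reset() behavior described above could look roughly like the sketch below. The DetectionClass record shape, the loader delegate, and the GetAll method name are assumptions for illustration; the actual PathResolver pattern in the codebase may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical row shape; the real detection_classes entity may carry other fields.
public sealed record DetectionClass(int Id, string Name, string Color);

public sealed class DetectionClassCache
{
    private readonly Func<IReadOnlyList<DetectionClass>> _load;
    private Lazy<IReadOnlyList<DetectionClass>> _lazy;

    // The loader stands in for the database read; it is injected so the
    // cache itself performs no I/O until the first GetAll call.
    public DetectionClassCache(Func<IReadOnlyList<DetectionClass>> load)
    {
        _load = load;
        _lazy = NewLazy();
    }

    // First call runs the loader; later calls return the cached list.
    public IReadOnlyList<DetectionClass> GetAll() => Volatile.Read(ref _lazy).Value;

    // Called after any write: swaps in a fresh Lazy so the next read reloads.
    public void Reset() => Volatile.Write(ref _lazy, NewLazy());

    private Lazy<IReadOnlyList<DetectionClass>> NewLazy() =>
        new(_load, LazyThreadSafetyMode.ExecutionAndPublication);
}
```

In this sketch a write path (POST, PUT, DELETE) would call Reset() after its database change commits. One trade-off to be aware of: with ExecutionAndPublication, a loader exception is cached by the Lazy instance until the next Reset().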

Resolved Architectural Decisions (Step 4 verification)

The following items were surfaced during verification and resolved with the maintainer on 2026-05-14. Each one either becomes an ADR above or maps to a refactor backlog entry below.

| # | Concern | Resolution | Tracked as |
|---|---------|------------|------------|
| 1 | Update / Delete / dataset-status changes are silent on SSE + outbox | Treat as gap; lifecycle is observable (World B): every mutation publishes + enqueues | ADR-009 / RB-01 |
| 2 | system_settings.silent_detection semantics | Remove the flag; e2e harness covers debug isolation now | ADR-010 / RB-02 |
| 3 | F1 not transactional across FS + DB + outbox | Wrap lifecycle in a business-transaction (transactional outbox); FS writes happen post-commit | ADR-008 / RB-03 |
| 4 | XxHash64 over sampled bytes: collision risk | Switch to XxHash3.Hash128 over the same sample (file-size-independent + 128-bit space) | ADR-004 / RB-04 |
| 5 | FailsafeProducer.EnqueueAsync static method does DB I/O (violates coderule.mdc) | Accept as-is; documented deviation from rule (no refactor) | |
| 6 | detection_classes schema-mutable but no controller writes | Admin-managed CRUD with read-through cache (modeled on PathResolver) | ADR-011 / RB-06 |
| 7 | Flight (code) vs mission (suite spec) drift | Rename code → Mission*; suite spec stays canonical | ADR-012 / RB-07 |
| 8 | Dataset writes coupled directly to annotation rows via shared AppDataConnection | Route dataset writes through AnnotationService (via a public domain interface) | RB-08 |
| 9 | Stream consumer dedupe contract owner | This service owns it; dedupe by (annotationId, operation, dateTime) baked into the wire message | ADR-013 / RB-09 |
| 10 | Hard-delete vs soft-delete on DeleteAnnotation | Soft-delete: status → Deleted (40), files moved to a deleted_dir | ADR-009 (folded in) / RB-01 |

Remaining Open Architectural Risks

These are residual risks that still need attention from later autodev steps (Test Spec, Refactor, Security Audit). Items previously listed here that have been resolved as of 2026-05-14 (Flight/mission drift, dataset coupling, hard-vs-soft delete, JWT issuer/audience validation, CORS environment gating, dev secret fallback) moved to the Resolved Architectural Decisions table above and the Refactor Backlog below.

  1. Horizontal scaling: SSE channel is per-instance (singleton AnnotationEventService); the failsafe outbox uses no leasing/locking. Two pods will independently drain rows, with deletion keyed on id; under high concurrency the same row can be picked by both before either deletes — duplicate stream entries possible. Consumers must dedupe per ADR-013. (Touched by RB-03 / RB-09 indirectly but not solved by them.)
  2. Swagger exposure in production: see ADR-005. Belongs to Step 14 (Security Audit). (CORS exposure was resolved by CorsConfigurationValidator; ADR-006 retired.)
  3. UserId body field vs JWT NameIdentifier drift (suite spec lists UserId on POST /annotations; code uses JWT subject). Reconcile in the suite spec.
  4. No automated tests: addressed by autodev Phase A Steps 3–7 (Test Spec → Implement Tests → Run Tests).
  5. FailsafeProducer.cs:138 swallows IOException on image read silently (catch { }). Direct coderule.mdc violation. Symptom in product: a missing or unreadable image yields a stream message with image = null and no log/metric — the gap is invisible to operators. Track on Refactor Backlog (RB-05).
  6. JWKS HTTPS-only retrieval blocks containerised test harnesses. RESOLVED 2026-05-14 by Step 4 (Code Testability Revision). JwtExtensions.AddJwtAuth now gates HttpDocumentRetriever.RequireHttps on ASPNETCORE_ENVIRONMENT == "E2ETest" (case-insensitive). Production / Staging / Development / unset all retain HTTPS-required behavior; only the E2ETest value relaxes the flag. Verified via the smoke harness in _docs/04_refactoring/01-testability-refactoring/verification.md. See _docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md (item C01) for the full change log.
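
The environment gate described in item 6 reduces to a single case-insensitive comparison. The sketch below isolates that decision in a hypothetical helper (JwksTransportPolicy is an illustrative name, not the actual code in JwtExtensions), so the truth table is easy to check.

```csharp
using System;

// Hypothetical helper that isolates the decision; the real logic lives in
// JwtExtensions.AddJwtAuth and may be structured differently.
public static class JwksTransportPolicy
{
    // Returns true when JWKS retrieval must use HTTPS. Only the exact
    // environment name "E2ETest" (any casing) relaxes the requirement;
    // null or any other value keeps HTTPS required.
    public static bool RequireHttps(string? environmentName) =>
        !string.Equals(environmentName, "E2ETest", StringComparison.OrdinalIgnoreCase);
}
```

Under this assumption, the result would be assigned to HttpDocumentRetriever.RequireHttps, with the environment name read from ASPNETCORE_ENVIRONMENT. Defaulting to true for unknown or missing values means a misconfigured deployment fails closed.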

Refactor Backlog

These items are the implementation work for the resolved decisions above. They are not part of Step 4 (Verification) corrections — they will be picked up by the autodev existing-code flow at Step 8 (Refactor) and/or new feature tasks in Phase B.

| ID | Scope | Source ADR / Risk | Notes |
|----|-------|-------------------|-------|
| RB-01 | Wire lifecycle publish + outbox enqueue across Update / UpdateStatus / Delete (annotations + dataset). Includes the soft-delete behavior: DeleteAnnotation flips AnnotationStatus → Deleted (40), leaves the row, and moves image / label / thumbnail files to a new deleted_dir (added to directory_settings). Read paths must filter out Status = Deleted (40) from default lists. | ADR-009 | Open sub-questions: (a) UpdateAnnotation mapping: re-enqueue as Created or add QueueOperation.Updated + drainer branch; (b) which non-Validated/Deleted status transitions enqueue at all |
| RB-02 | Remove silent_detection (schema column, entity field, gating logic, DTOs) | ADR-010 | Destructive schema change explicitly authorized |
| RB-03 | Introduce business-transaction wrapper (transactional outbox) for annotation lifecycle | ADR-008 | Reorders FS writes to post-commit; covers all mutation paths |
| RB-04 | Switch annotation id hashing to XxHash3.Hash128 over the same sampled buffer | ADR-004 | Existing 16-char ids stay; new ids are 32-char hex |
| RB-05 | Replace catch { } at FailsafeProducer.cs:138 with logged failure path; surface as a metric | Open Risk §5 | Downstream consumer should know an image-less message means a real disk error |
| RB-06 | Admin-managed detection_classes (CRUD endpoints [ADM], in-memory cache with Reset()) | ADR-011 | Migrator seed remains as bootstrap; admin overrides per id; fix Smoke/Plane color collision while at it |
| RB-07 | Rename Flight* → Mission* across DTOs, controllers, services, and query-parameter names. media.waypoint_id column is unchanged (it's the physical id; mission is the logical concept). | ADR-012 | Code-only rename to align with suite spec; suite stays canonical |
| RB-08 | Decouple 04 Dataset writes from direct annotations row mutations: route status writes through a public AnnotationService interface. Reads can stay direct for now (read coupling is lower-risk than write coupling). | Open Risks (former §4) | Likely introduces an IAnnotationLifecycle (or similar) interface owned by 01 Annotations REST that 04 Dataset consumes via DI |
| RB-09 | Bake (annotationId, operation, dateTime) into the on-the-wire stream message; document the dedupe contract in suite/_docs/01_annotations.md. | ADR-013 | Coordinate the suite-doc update with admin sync + AI training maintainers |
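
As an illustration of the RB-05 direction (replace the empty catch with a visible failure path), the following sketch reads an image and reports failures through a callback instead of swallowing them. ImageReader, TryReadImage, and the onFailure callback are hypothetical; the actual fix would presumably use the service's own logger and metrics.

```csharp
using System;
using System.IO;

// Hypothetical helper; not the actual FailsafeProducer code.
public static class ImageReader
{
    // Returns the file bytes, or null when the image cannot be read.
    // Unlike an empty catch block, every failure is handed to onFailure,
    // where a caller could log it and increment a metric.
    public static byte[]? TryReadImage(string path, Action<string, Exception> onFailure)
    {
        try
        {
            return File.ReadAllBytes(path);
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            onFailure(path, ex);
            return null;
        }
    }
}
```

The stream message can still carry image = null in this shape, so downstream behavior is unchanged; the difference is that operators now get a signal for each unreadable file.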

References

  • Suite product spec: ../../../suite/_docs/01_annotations.md (REST contracts, SSE, Annotation Sync, camera, classes).
  • Suite dataset narrative: ../../../suite/_docs/09_dataset_explorer.md.
  • Component specs: components/01..06_*/description.md.
  • Module docs: modules/*.md.
  • File ownership (downstream skills): module-layout.md.
  • Component diagram: diagrams/components.md.
  • Per-flow diagrams: diagrams/flows/.