This commit captures everything produced during autodev existing-code Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec), together with the targeted auth + CORS re-sync triggered on 2026-05-14 when codebase drift was detected at Step 4 entry. None of this work was previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution, architecture, system flows, glossary, module-layout, per-component specs (01..06), modules, deployment, diagrams, data model, FINAL report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md. Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across blackbox, security, resilience, resource-limit, performance), plus e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh, scripts/run-performance-tests.sh. Coverage 88% over the active scope (40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token issuer with a JWKS-verifier model. AuthController and TokenService removed; JwtExtensions switched from HS256 symmetric to ES256 over admin's JWKS. ConfigurationResolver and CorsConfigurationValidator added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01, SEC-02, SEC-03 marked Closed. One new testability risk recorded in architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec captures the change-spec for downstream services that consumed the now-deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
Azaion.Annotations — Architecture
Source of truth for service-internal architecture. Suite-level integration narrative lives in ../../../suite/_docs/01_annotations.md. This file documents what the code in src/ actually implements, derived bottom-up from module and component docs.
Architecture Vision
Status: confirmed-by-user 2026-05-14.
Azaion.Annotations is a single .NET 10 ASP.NET Core service in the Azaion suite that owns the authoritative HTTP + streaming surface for annotation lifecycle, media upload, dataset exploration, and system metadata. State of record is PostgreSQL (Linq2DB + Npgsql) with an idempotent boot-time migrator. Real-time fan-out is in-process SSE; durable cross-service export is a transactional-outbox + RabbitMQ Stream pipeline producing MessagePack frames consumed by the admin sync worker and the AI training pipeline. The runtime is one container per node, ARM64-first via Woodpecker CI, with branch-driven image tags (dev | stage | main).
Components & responsibilities
- 06 Platform — shared kernel: DB, enums, JWT, error middleware, paths, composition root.
- 02 Annotations realtime & sync — SSE channel + RabbitMQ Stream failsafe drainer.
- 01 Annotations REST — annotation CRUD + image/thumbnail file routes; the lifecycle producer.
- 03 Media — upload (single + batch), list, download, delete.
- 04 Dataset — read-heavy /dataset surface + DATASET-policy status writes (planned to route through 01 Annotations REST per RB-08).
- 05 Settings & metadata — system / directory / camera / user settings + /classes catalog (becoming admin-managed per RB-06).
Major data flows
- F1 — Annotation create: bytes → image file → DB rows → label file → SSE → outbox; will be wrapped in a business transaction (ADR-008).
- F3 — SSE subscription: UI long-poll on /annotations/events.
- F4 — Outbox drain: FailsafeProducer pumps queue rows to the RabbitMQ stream azaion-annotations.
- F2 / F5 / F6 / F7 / F8 — read paths, media uploads, auth refresh, directory cache reset, dataset bulk status.
Principles / non-negotiables
- Wire enums are integer-stable (suite contract). [inferred-from: modules/wire-enums.md, suite/_docs/01_annotations.md]
- Annotation id is content-addressed via a sampled image-bytes hash; remains file-size-independent (videos to ~5 GB). [inferred-from: AnnotationService.ComputeHash, ADR-004]
- PostgreSQL is the state of record; the filesystem is a content-addressed cache. [inferred-from: data_model.md, system-flows.md F1]
- The transactional outbox is the durability boundary; SSE is best-effort. [inferred-from: ADR-003 / ADR-008]
- Lifecycle observability is World B: every mutation publishes SSE and enqueues the outbox. [inferred-from: FailsafeProducer drainer plumbing for Validated / Deleted; maintainer resolution 2026-05-14 → ADR-009 / RB-01]
- Soft-delete with file relocation: DeleteAnnotation flips status to AnnotationStatus.Deleted = 40 and moves files to a deleted-files directory rather than removing rows. [inferred-from: maintainer resolution 2026-05-14 → ADR-009 / RB-01]
- Stream consumer dedupe contract is owned by this service: outbox messages must carry enough metadata for downstream consumers to dedupe on (annotationId, operation, dateTime). [inferred-from: maintainer resolution 2026-05-14 → ADR-013 / RB-09]
- Mission is the canonical domain term: code currently uses FlightId; the suite spec uses missionId. Code aligns to suite (rename, not the other way). [inferred-from: 00_discovery.md drift list; maintainer resolution 2026-05-14 → ADR-012 / RB-07]
- Dataset writes flow through the annotation domain service: 04 Dataset does not edit annotations rows directly. [inferred-from: module-layout.md Verification Needed §1; maintainer resolution 2026-05-14 → RB-08]
- DB-driven runtime config: directory roots and detection classes change at runtime via ADM endpoints, not redeploy. [inferred-from: PathResolver.Reset, ADR-011]
Open questions / drift signals (residual)
- UserId body field vs JWT NameIdentifier (suite spec lists UserId on POST /annotations; code uses JWT subject). Reconcile in suite or code.
- The exact dedupe key shape for downstream consumers — (annotationId, operation, dateTime) is the working assumption per RB-09; suite consumer doc must be updated to match.
1. System Context
Problem being solved: Provide the canonical HTTP + streaming API for annotation lifecycle (create / update / status / delete / list / files), media (upload, list, download), dataset exploration (DATASET policy reads + bulk status writes), and system metadata (settings + detection class catalog), with real-time SSE push to UI consumers and failsafe export to RabbitMQ Stream consumers (admin sync, AI training).
System boundaries:
- Inside: a single ASP.NET Core process (Azaion.Annotations.dll), its embedded migrator, in-memory SSE channel, in-process BackgroundService outbox drain, and the on-disk image / label / thumbnail / results layout under directory_settings.
- Outside: PostgreSQL (state of record), RabbitMQ Streams (durable annotation export), the on-disk media/data filesystem (mounted), and every authenticated HTTP / SSE consumer (UIs, detections service, admin sync worker, AI training).
External systems:
| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| PostgreSQL | DB (Linq2DB / Npgsql) | Both | State of record (annotations, media, queue, settings, classes) |
| RabbitMQ Streams | Stream client (RabbitMQ.Stream.Client) | Outbound | Durable export of annotation lifecycle (azaion-annotations stream) |
| Filesystem (mounted) | File I/O | Both | Annotation images, YOLO label .txt, thumbnails, results, GPS routes/sat |
| Annotator UI / Dataset Explorer UI | REST + SSE | Inbound | User flows (suite 01_annotations.md, 09_dataset_explorer.md) |
| Detections service (suite detections) | REST | Inbound | POST annotations after model inference; long-running tokens are refreshed against admin (annotations no longer mints tokens) |
| Admin sync worker / AI training | RabbitMQ Streams | Outbound | Consume azaion-annotations stream offsets (suite Annotation Sync) |
2. Technology Stack
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | C# | net10.0 (src/Azaion.Annotations.csproj) | Single language across suite .NET services |
| Framework | ASP.NET Core (minimal hosting + controllers) | net10.0 | Built-in JWT, CORS, Swagger, hosted services |
| ORM / DB driver | Linq2DB + Npgsql | per csproj | Linq2DB used for ITable<> repositories; Npgsql under the hood |
| Database | PostgreSQL | not pinned in code (URL-driven) | Suite-wide datastore |
| Auth | JWT Bearer (Microsoft.AspNetCore.Authentication.JwtBearer) — verifier-only, ES256 over admin's JWKS | net10.0 | Issuer/audience/lifetime/signature all validated; admin is the sole issuer (see Section 7) |
| Messaging | RabbitMQ Streams (RabbitMQ.Stream.Client) + MessagePack | per csproj | Durable, replayable annotation export |
| API docs | Swashbuckle (Swagger / Swagger UI) | per csproj | Always mounted (see ADR-005) |
| Hashing | System.IO.Hashing | net10.0 stdlib | Annotation id derived from image bytes hash |
| Hosting | WebApplication + IHostedService | net10.0 | FailsafeProducer runs in-process |
| Container | mcr.microsoft.com/dotnet/aspnet:10.0 | linux/arm64 + linux/amd64 | Multi-arch image, ARM-first per Woodpecker |
| CI | Woodpecker CI (.woodpecker/build-arm.yml) | n/a | Branch-based image tag (${BRANCH}-arm) |
Key constraints (evidenced in code/config):
- DATABASE_URL is required at startup — ConfigurationResolver.ResolveRequiredOrThrow throws if not set. The string is auto-converted from postgresql://user:pass@host:port/db URI form to Linq2DB's Host=…;Username=… form by Program.ConvertPostgresUrl.
- JWT verification is required at startup — JWT_ISSUER, JWT_AUDIENCE, and JWT_JWKS_URL are all resolved by ConfigurationResolver.ResolveRequiredOrThrow. There is no insecure fallback. The JWKS URL must be HTTPS (HttpDocumentRetriever { RequireHttps = true }).
- Default directory roots are /data/{videos,images,labels,results,thumbnails,gps_sat,gps_route} (migrator directory_settings defaults) → operator must mount or override at the DB level via PUT /settings/directories.
- CORS is environment-gated: CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production when CorsConfig:AllowedOrigins is empty unless CorsConfig:AllowAnyOrigin=true is set explicitly. ADR-006 was retired together with the wide-open default.
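The fail-fast lookup and the URL conversion described in the constraints above can be sketched as follows. This is a minimal illustration, not the service's code: the class name, the method signatures, the dictionary-based settings source, and the fallback port 5432 are all assumptions, since the real ConfigurationResolver and Program.ConvertPostgresUrl are not reproduced in this document.

```csharp
using System;
using System.Collections.Generic;

var settings = new Dictionary<string, string?>
{
    ["DATABASE_URL"] = "postgresql://app:s3cret@db.internal:5432/annotations",
};

var url = StartupConfigSketch.ResolveRequiredOrThrow(settings, "DATABASE_URL", "Database:Url");
Console.WriteLine(StartupConfigSketch.ConvertPostgresUrl(url));
// Host=db.internal;Port=5432;Database=annotations;Username=app;Password=s3cret

// Hypothetical stand-ins for the resolver and converter named in the text.
static class StartupConfigSketch
{
    // Returns the first non-empty value (env name first, then config key);
    // throws otherwise so the process stops at boot instead of running misconfigured.
    public static string ResolveRequiredOrThrow(
        IReadOnlyDictionary<string, string?> settings, string envName, string configKey)
    {
        if (settings.TryGetValue(envName, out var env) && !string.IsNullOrWhiteSpace(env)) return env;
        if (settings.TryGetValue(configKey, out var cfg) && !string.IsNullOrWhiteSpace(cfg)) return cfg;
        throw new InvalidOperationException($"Missing required setting: {envName} (or {configKey}).");
    }

    // Converts postgresql://user:pass@host:port/db into keyword=value form.
    public static string ConvertPostgresUrl(string url)
    {
        var uri = new Uri(url);
        var userInfo = uri.UserInfo.Split(':', 2);
        var user = Uri.UnescapeDataString(userInfo[0]);
        var pass = userInfo.Length > 1 ? Uri.UnescapeDataString(userInfo[1]) : "";
        var port = uri.Port > 0 ? uri.Port : 5432; // assumption: fall back to the usual PostgreSQL port
        var database = uri.AbsolutePath.TrimStart('/');
        return $"Host={uri.Host};Port={port};Database={database};Username={user};Password={pass}";
    }
}
```

Throwing from the resolver rather than returning null keeps the "no insecure fallback" property in one place: a caller cannot forget to check for a missing value.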
3. Deployment Model
Environments (evidenced from CI branches): dev, stage, main → image tag ${CI_COMMIT_BRANCH}-arm pushed to a private registry resolved from REGISTRY_HOST secret.
Infrastructure:
- Single .NET service container; container exposes port 8080.
- Multi-arch build supported in the Dockerfile (--platform=$BUILDPLATFORM, $TARGETARCH); the ARM Woodpecker pipeline currently only emits arm64.
- Scaling is vertical-only as written: SSE uses an in-process Channel<AnnotationEventDto>, and the FailsafeProducer outbox drainer is a per-instance BackgroundService — see "Open Architectural Risks".
Environment-specific configuration (defaults vs production):
| Config | Source | Development default | Production behavior |
|---|---|---|---|
| DATABASE_URL | env or Database:Url config key | none — fail-fast on missing (ConfigurationResolver) | MUST set |
| JWT_ISSUER | env or Jwt:Issuer config key | none — fail-fast | MUST set (matches admin's issuer) |
| JWT_AUDIENCE | env or Jwt:Audience config key | none — fail-fast | MUST set (matches admin's audience for this service) |
| JWT_JWKS_URL | env or Jwt:JwksUrl config key | none — fail-fast; HTTPS required | MUST set to admin's JWKS endpoint |
| RABBITMQ_HOST / RABBITMQ_STREAM_PORT | env | 127.0.0.1 / 5552 | Override per environment |
| RABBITMQ_PRODUCER_USER / _PASS | env | azaion_producer / producer_pass | Override |
| RABBITMQ_STREAM_NAME | env | azaion-annotations | Usually kept (suite contract) |
| CorsConfig:AllowedOrigins | IConfiguration (string array) | empty | MUST set (or set AllowAnyOrigin=true explicitly) — CorsConfigurationValidator refuses to start in Production otherwise |
| CorsConfig:AllowAnyOrigin | IConfiguration (bool) | false | Explicit opt-in for permissive policy |
| Directory roots (/data/...) | DB directory_settings | hard-coded SQL defaults | Tune via PUT /settings/directories (calls PathResolver.Reset) |
| Swagger UI | Program.cs | mounted | Also mounted in prod (ADR-005) |
| AZAION_REVISION | Dockerfile build arg CI_COMMIT_SHA | unknown | Stamped per-image |
4. Data Model Overview
Detailed ERD, indexes, and migration semantics live in data_model.md. This section is the cross-component summary.
Core entities (owned by 06_platform; consumed by feature components):
| Entity | Description | Owned by component |
|---|---|---|
| media | Uploaded image/video reference (waypoint-scoped) | 03_media (writes) / 01_annotations-rest (reads) |
| annotations | Annotation row keyed by image-bytes hash, soft-versioned by created_date, time (BIGINT ticks) | 01_annotations-rest |
| detection | YOLO bounding boxes (center_x/y, width, height, class, affiliation, combat readiness) per annotation | 01_annotations-rest |
| annotations_queue_records | Outbox for failsafe stream sync (operation, annotation_ids JSON array) | 02_annotations-realtime-sync (writer) / 01_annotations-rest (writer side) |
| system_settings | Singleton-ish org settings + generate_annotated_image, silent_detection toggles | 05_settings-metadata |
| directory_settings | Filesystem roots consumed by PathResolver | 05_settings-metadata |
| detection_classes | Seeded class catalog for UI label/color (ids 0–18, names + Cyrillic short names + hex colors) | 05_settings-metadata (read-only ClassesController) |
| user_settings | Per-user UI prefs (panel widths, selected flight) | 05_settings-metadata |
| camera_settings | Calibration (altitude, focal length, sensor width) | 05_settings-metadata |
Key relationships:
- annotations.media_id → media.id (FK).
- detection.annotation_id → annotations.id (FK; cascades on annotation update logic in service layer, not DB).
- annotations_queue_records.annotation_ids is a JSON array of TEXT ids (no FK); single-row outbox entry can reference multiple annotations (bulk).
Data flow summary:
- Inbound write (Create) — today: HTTP body → AnnotationService.CreateAnnotation → image bytes to images_dir/{id}.jpg, optional media row insert, annotations + detection rows, YOLO label to labels_dir/{id}.txt, SSE publish, then (if silent_detection != true) outbox row → drained by FailsafeProducer → MessagePack frame on RabbitMQ stream. Thumbnails are not produced by this flow — they are read-only via PhysicalFile and presumed populated out-of-band.
- Inbound write (Update / UpdateStatus / Delete annotations, dataset PATCH / bulk-status) — today: DB-only, silent. Target (RB-01): every mutation publishes SSE and enqueues the outbox with the appropriate QueueOperation (Created, Validated, or Deleted).
- Lifecycle ordering — target (RB-03): all DB writes plus the outbox row commit inside a single business transaction; FS writes (image / label / future thumbnail generation) and SSE publish are post-commit, with the outbox row as the durable promise.
- Inbound read: HTTP query → DB joins (annotations × detection × media) → JSON list (PaginatedResponse<AnnotationListItem>); image/thumbnail served as PhysicalFile.
5. Integration Points
Internal communication (in-process)
| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| 01_annotations-rest (AnnotationService) | 02_annotations-realtime-sync (AnnotationEventService) | C# call | Fire-and-forget publish to Channel<> | Today: only on Create. Target (RB-01): every mutation publishes (Create, Update, UpdateStatus, Delete) |
| 01_annotations-rest (AnnotationService) | 02_annotations-realtime-sync (annotations_queue_records table) | DB INSERT via FailsafeProducer.EnqueueAsync (static helper) | Outbox | Today: Create only, gated by silent_detection. Target (RB-01 + RB-02): every mutation enqueues with the appropriate QueueOperation; gating flag removed |
| 02_annotations-realtime-sync (FailsafeProducer) | 06_platform (AppDataConnection, PathResolver) | C# call | Read-then-delete | Drainer is already plumbed for Created, Validated, and Deleted operations (see FailsafeProducer.cs:108–123) |
| 04_dataset (DatasetService.UpdateStatus / BulkUpdateStatus) | 01_annotations-rest (AnnotationEventService) + outbox | shared DB + cross-component call | Direct write today; lifecycle publish + enqueue per RB-01 | Bulk path enqueues a single Validated outbox record carrying all ids |
| 05_settings-metadata (directory PUT) | 06_platform (PathResolver.Reset) | C# call | Cache invalidation | Required after directory change |
External integrations
| External system | Protocol | Auth | Rate limits | Failure mode |
|---|---|---|---|---|
| PostgreSQL | TCP / Linq2DB / Npgsql | Conn string | n/a | Surfaced as 500 via ErrorHandlingMiddleware |
| RabbitMQ Stream azaion-annotations | Stream protocol (5552) | Stream user/pass (azaion_producer default) | Stream-level | FailsafeProducer retries; rows stay in annotations_queue_records until drained |
| Filesystem (/data/...) | POSIX | OS perms | n/a | IOException → 500; missing image on GET → 404 |
| HTTP clients (UIs, detections, admin) | REST + SSE | JWT Bearer (ANN, DATASET, ADM) | n/a | 401 if invalid; 403 if missing claim |
6. Non-Functional Requirements
Pulled only from code-level evidence — config defaults, validators, health checks, idempotent migrator. Anything not evidenced is left blank rather than guessed.
| Requirement | Target | Measurement | Priority | Source |
|---|---|---|---|---|
| Liveness | 200 OK on GET /health | route in Program.cs | High | Program.cs |
| Idempotent startup | DB schema applies cleanly on every boot | DatabaseMigrator.Migrate uses CREATE TABLE IF NOT EXISTS + ALTER TABLE … IF NOT EXISTS and INSERT … ON CONFLICT DO NOTHING | High | Database/DatabaseMigrator.cs |
| Recovery: queue durability | Annotation lifecycle events are not lost across pod restarts | DB-backed outbox (annotations_queue_records) drained by FailsafeProducer | High | Services/FailsafeProducer.cs |
| Auth lifetime / clock skew | per JwtExtensions.AddJwtAuth config | auth-identity module | Medium | Auth/JwtExtensions.cs |
| Pagination defaults | PaginatedResponse<T> total/page/pageSize | applied in list endpoints | Medium | DTOs/PaginatedResponse.cs |
| Thumbnail dimensions | 240×135 with 10 border (defaults) | system_settings.thumbnail_* | Low | migrator defaults |
| Throughput / latency / availability targets | not evidenced in code | — | — | open question, see 00_problem extraction (Step 6) |
7. Security Architecture
Authentication: JWT Bearer; ES256 signature verified against admin's JWKS endpoint (JWT_JWKS_URL, default https://admin.azaion.com/.well-known/jwks.json). ValidateIssuer, ValidateAudience, RequireSignedTokens, and RequireExpirationTime are all enforced; algorithms are pinned to EcdsaSha256 to block HS256-confusion forgeries. Admin is the sole token issuer for the suite — annotations no longer holds an HMAC secret and no longer mints tokens (TokenService and POST /auth/refresh were removed; callers refresh against admin).
Authorization (per-endpoint policy claims, all evidenced in controllers):
- ANN — AnnotationsController, MediaController.
- DATASET — DatasetController (status writes including bulk).
- ADM — mutating routes on SettingsController.
- [Authorize] (any authenticated user) — read endpoints on settings, ClassesController.
- [AllowAnonymous] — /health.
User identity: server resolves user from JWT NameIdentifier (e.g., AnnotationsController.Create parses User.FindFirstValue(ClaimTypes.NameIdentifier) → Guid). Suite spec sometimes lists UserId in body — drift recorded in 00_discovery.md.
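As an illustration of the identity resolution described above, the following sketch reads the NameIdentifier claim and parses it as a Guid. The class and method names are hypothetical, and it uses the base-library FindFirst call rather than the FindFirstValue extension so that it runs without ASP.NET Core; how the controller reacts to a missing or malformed claim is not stated in this document, so the sketch simply returns null.

```csharp
using System;
using System.Security.Claims;

var identity = new ClaimsIdentity(
    new[] { new Claim(ClaimTypes.NameIdentifier, "3f2504e0-4f89-11d3-9a0c-0305e82c3301") },
    authenticationType: "Bearer");
var userId = UserIdentitySketch.ResolveUserId(new ClaimsPrincipal(identity));
Console.WriteLine(userId?.ToString() ?? "no user id");

// Hypothetical helper; the real controller code is not shown in this document.
static class UserIdentitySketch
{
    // Returns the caller's id from the NameIdentifier claim,
    // or null when the claim is absent or not a valid Guid.
    public static Guid? ResolveUserId(ClaimsPrincipal user)
    {
        var raw = user.FindFirst(ClaimTypes.NameIdentifier)?.Value;
        return Guid.TryParse(raw, out var id) ? id : null;
    }
}
```

Taking the id from the verified token instead of the request body means a caller cannot attribute an annotation to another user by editing the payload.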
Data protection:
- At rest: nothing in-code — relies on the underlying Postgres deployment + filesystem.
- In transit: terminated outside the container; service speaks plain HTTP on :8080.
- Secrets: env-driven (DATABASE_URL, JWT_ISSUER, JWT_AUDIENCE, JWT_JWKS_URL, RABBITMQ_*). DATABASE_URL and the three JWT vars now fail-fast on startup if unset (no insecure default). ADR-002 was retired together with JWT_SECRET.
- CORS: config-driven allow-list (CorsConfig:AllowedOrigins); CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production with an empty list unless CorsConfig:AllowAnyOrigin=true is explicitly set. ADR-006 was retired together with the wide-open default.
Audit logging: not evidenced beyond ASP.NET Core defaults — open gap; flag in retro/security audit.
Input validation: surfaces through model binding + ErrorHandlingMiddleware mapping (400 / 404 / 409 / 500); detailed validators per DTO live in DTOs/Requests/ (component specs to confirm during Step 4 verification).
8. Key Architectural Decisions (inferred from code)
These ADRs document choices the codebase already evidences. They are descriptive, not prescriptive — call them out so downstream skills can challenge them deliberately.
ADR-001: In-process SSE via Channel<T>
Context: Real-time annotation activity must reach the Annotator UI with sub-second latency after a write.
Decision: Use a singleton AnnotationEventService exposing an unbounded Channel<AnnotationEventDto> and serve subscribers from AnnotationsController.Events over text/event-stream.
Alternatives considered (implicitly rejected):
- Broker-backed pub/sub (Redis / RabbitMQ exchange) — rejected because it adds a dependency for what is already a single-process workload, and the failsafe queue covers durable export needs.
- Server-side polling — rejected because it cannot meet sub-second latency cheaply.
Consequences: SSE state is per-instance only. Horizontal scaling requires a broker fanout layer or sticky sessions on the LB.
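The publish/consume shape behind this decision can be sketched with System.Threading.Channels alone. The event record, its fields, and the helper name below are assumptions for illustration (the real AnnotationEventDto is not listed here), and the console loop stands in for the response stream that an SSE endpoint would write to.

```csharp
using System;
using System.Text.Json;
using System.Threading.Channels;

var channel = Channel.CreateUnbounded<AnnotationEvent>();

// Producer side: fire-and-forget publish from the write path.
// TryWrite does not block on an unbounded channel.
channel.Writer.TryWrite(new AnnotationEvent("abc123", "Created"));
channel.Writer.Complete(); // only so this demo loop terminates

// Consumer side: the loop an SSE endpoint would run for one connection.
await foreach (var e in channel.Reader.ReadAllAsync())
    Console.Write(SseSketch.ToSseFrame(e));

// Hypothetical event shape.
record AnnotationEvent(string AnnotationId, string Kind);

static class SseSketch
{
    // One text/event-stream frame: a "data:" line followed by a blank line.
    public static string ToSseFrame(AnnotationEvent e) =>
        $"data: {JsonSerializer.Serialize(e)}\n\n";
}
```

One property of Channel<T> is worth keeping in mind when reading this sketch: each written item is delivered to exactly one reader, so serving several concurrent subscribers needs something more than a single shared reader loop. How AnnotationEventService handles that is not covered here.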
ADR-002 (RETIRED): Symmetric JWT, no issuer/audience validation
Status: superseded — annotations is now a JWKS verifier of admin-signed ES256 tokens. AddJwtAuth(IConfiguration) pins ValidAlgorithms = [SecurityAlgorithms.EcdsaSha256], enforces ValidateIssuer/ValidateAudience/RequireSignedTokens/RequireExpirationTime, and resolves keys through ConfigurationManager<JsonWebKeySet> against JWT_JWKS_URL. JWT_SECRET was removed along with the local refresh path; admin is the sole issuer for the suite. The original ADR is preserved here for historical context only.
ADR-003: Failsafe outbox + RabbitMQ Stream (not direct publish)
Context: Annotation lifecycle must reach external consumers (admin sync, AI training) durably even when RabbitMQ is unavailable at the moment of the write.
Decision: Every mutation writes a row to annotations_queue_records; the in-process FailsafeProducer (IHostedService) drains this table and publishes MessagePack frames on the azaion-annotations stream, deleting rows after success.
Alternatives considered:
- Direct publish in the request path — rejected because RabbitMQ unavailability would either drop events (fire-and-forget) or fail user-visible writes (sync publish).
- Transactional outbox via Debezium / CDC — heavier, deferred.
Consequences: One outbox-drainer per service instance. Multiple instances drain concurrently → safe because the deletion is keyed on id and re-reads of disk bytes are idempotent, but ordering across consumers is not guaranteed.
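A single drain pass can be sketched against an in-memory stand-in for annotations_queue_records. The row shape, the tryPublish callback, and the stop-at-first-failure policy are assumptions made for the example; the real FailsafeProducer talks to PostgreSQL and RabbitMQ and its retry policy is not described in detail here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var outbox = new List<OutboxRow>
{
    new(1, "Created", new[] { "a1" }),
    new(2, "Validated", new[] { "a1", "a2" }),
};

var brokerUp = false;
Console.WriteLine(DrainSketch.DrainOnce(outbox, _ => brokerUp)); // 0 (broker down, rows kept)
brokerUp = true;
Console.WriteLine(DrainSketch.DrainOnce(outbox, _ => brokerUp)); // 2 (published, then deleted)

// Hypothetical row shape for the outbox table.
record OutboxRow(long Id, string Operation, string[] AnnotationIds);

static class DrainSketch
{
    // Publishes rows in id order and deletes a row only after its publish succeeded.
    // Stops at the first failure so the remaining rows are retried on the next pass.
    public static int DrainOnce(List<OutboxRow> table, Func<OutboxRow, bool> tryPublish)
    {
        var drained = 0;
        foreach (var row in table.OrderBy(r => r.Id).ToList())
        {
            if (!tryPublish(row)) break;
            table.RemoveAll(r => r.Id == row.Id);
            drained++;
        }
        return drained;
    }
}
```

Because deletion happens after the publish, a crash between the two steps leaves the row in place and the next pass publishes it again; delivery is at-least-once, which is why consumers need a dedupe rule.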
ADR-004: Annotation id from a sampled XxHash3.Hash128 of image bytes
Context: Annotation rows must be deduplicated when the same image is re-uploaded (e.g., re-runs of the detection pipeline). The system also serves video media up to 3–5 GB, so hashing must remain constant-time with respect to file size to keep create-path latency stable under load.
Decision (resolved 2026-05-14): Hash a deterministic fixed-size sample with XxHash3.Hash128 (128-bit output, 32-char lower-case hex). Sample composition is unchanged from the current implementation:
- For inputs ≤ 3072 bytes: [length(8 bytes)] + [full bytes].
- For inputs > 3072 bytes: [length(8 bytes)] + [first 1024] + [middle 1024 starting at len/2 − 512] + [last 1024].
When MediaId is provided instead of bytes, the annotation id is reused from the referenced media row.
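The sample composition above can be sketched as a buffer builder. Two things in this sketch are assumptions: the 8-byte length prefix is written little-endian (the byte order is not stated in this document), and the input is an in-memory buffer, whereas a multi-gigabyte video would have to be sampled by seeking in a stream. The hashing step itself is left out because it depends on the System.IO.Hashing package; the returned sample is what would be passed to the 128-bit hash.

```csharp
using System;
using System.Buffers.Binary;

Console.WriteLine(HashSampleSketch.BuildSample(new byte[100]).Length);  // 108
Console.WriteLine(HashSampleSketch.BuildSample(new byte[5000]).Length); // 3080

// Hypothetical helper illustrating the sampling rule; not the service's ComputeHash.
static class HashSampleSketch
{
    const int Chunk = 1024;
    const int Threshold = 3 * Chunk; // 3072

    public static byte[] BuildSample(ReadOnlySpan<byte> data)
    {
        var bodyLength = data.Length <= Threshold ? data.Length : Threshold;
        var sample = new byte[8 + bodyLength];

        // Assumption: the total length is encoded as a little-endian 64-bit integer.
        BinaryPrimitives.WriteInt64LittleEndian(sample, data.Length);
        var body = sample.AsSpan(8);

        if (data.Length <= Threshold)
        {
            data.CopyTo(body); // small input: the whole payload is the sample
            return sample;
        }

        var middleStart = data.Length / 2 - Chunk / 2;        // len/2 - 512
        data[..Chunk].CopyTo(body);                           // first 1024 bytes
        data.Slice(middleStart, Chunk).CopyTo(body[Chunk..]); // middle 1024 bytes
        data[^Chunk..].CopyTo(body[(2 * Chunk)..]);           // last 1024 bytes
        return sample;
    }
}
```

Whatever the input size, the sample never exceeds 3080 bytes, which is what keeps the hashing cost independent of file size.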
Why this combination:
- Sampling preserves file-size independence. Reading a 5 GB video front-to-back just to derive an id is unacceptable on the hot path.
- XxHash3.Hash128 over the same sample keeps the hashing itself O(1) in file size while moving the collision space from 2^64 to 2^128. Distinct large images that happen to share (length, head 1 KB, middle 1 KB, tail 1 KB) still collide deterministically — but the practical collision probability among such samples is now negligible at any realistic volume.
Migration consequences:
- The annotation id column is TEXT PRIMARY KEY; switching from 16-char (XxHash64) to 32-char (XxHash3.Hash128) hex requires no schema change.
- Existing rows keep their 16-char ids; new rows get 32-char ids. Re-create of an image whose original id was generated under XxHash64 will produce a different new id under XxHash3.Hash128 — i.e., re-creates after the upgrade no longer collide with their pre-upgrade row. Acceptable (and expected): old ids are stable, the deduplication property is preserved going forward, and the upgrade is irreversible by design.
Status: agreed. Implementation lives in the Refactor Backlog (RB-04).
ADR-005: Swagger UI mounted in all environments
Context: Internal debugging / partner integration friction.
Decision: app.UseSwagger() and app.UseSwaggerUI() are unconditional in Program.cs.
Consequences: Schema is publicly readable wherever the service is reachable. If the perimeter is not closed, this leaks endpoint surface — treat as a security finding for production-internet exposure.
ADR-006 (RETIRED): Wide-open CORS
Status: superseded — the default policy now reads CorsConfig:AllowedOrigins (string array) and CorsConfig:AllowAnyOrigin (boolean opt-in). CorsConfigurationValidator.EnsureSafeForEnvironment refuses to start in Production when origins are empty and AllowAnyOrigin is not explicitly set; a LogWarning is emitted in non-production when running with the permissive default. The original ADR is preserved here for historical context only.
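The gating rule that replaced this ADR can be sketched as a pure function. The signature and the boolean "should warn" return value are assumptions for illustration; the real validator reads IConfiguration and the host environment and logs through ILogger.

```csharp
using System;

Console.WriteLine(CorsGateSketch.EnsureSafeForEnvironment("Development", Array.Empty<string>(), false)); // True
Console.WriteLine(CorsGateSketch.EnsureSafeForEnvironment("Production", new[] { "https://ui.example" }, false)); // False

// Hypothetical stand-in for CorsConfigurationValidator.EnsureSafeForEnvironment.
static class CorsGateSketch
{
    // Throws when a Production host would start with no allow-list and no explicit opt-in.
    // Returns true when the caller should log a warning (empty allow-list outside Production).
    public static bool EnsureSafeForEnvironment(
        string environment, string[] allowedOrigins, bool allowAnyOrigin)
    {
        if (allowedOrigins.Length > 0) return false; // explicit allow-list, nothing to gate

        var isProduction = string.Equals(environment, "Production", StringComparison.OrdinalIgnoreCase);
        if (isProduction && !allowAnyOrigin)
            throw new InvalidOperationException(
                "CorsConfig:AllowedOrigins is empty; set origins or CorsConfig:AllowAnyOrigin=true.");

        return !isProduction;
    }
}
```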
ADR-007: Embedded SQL migrator (not EF migrations / Flyway)
Context: Suite values single-binary deploys; the team prefers idempotent boot-time DDL over a separate migration tool.
Decision: DatabaseMigrator.Migrate runs a single multi-statement script via Linq2DB on every startup. Schema evolution is additive (ALTER … ADD COLUMN IF NOT EXISTS).
Consequences: Backwards-only, no down migrations. Renames or destructive changes need an explicit out-of-band script. Drift detection requires diffing live DB against Database/DatabaseMigrator.cs.
ADR-008: Annotation lifecycle wrapped in a business transaction (planned)
Context: CreateAnnotation today touches the filesystem, three DB tables, an in-memory channel, and an outbox row, with no atomicity. World B (lifecycle is observable — see ADR-009) widens this surface to Update / Delete / status-change paths. A naive DB transaction does not wrap the FS writes; we want a single conceptual transactional boundary for the lifecycle, not just for the DB rows.
Decision (resolved 2026-05-14, to-be-implemented): introduce a business-transaction wrapper for annotation lifecycle operations. Concretely the chosen pattern is the transactional outbox:
- Write all relevant DB rows (annotation / detection / annotations_queue_records) inside a single db.BeginTransaction scope.
- Commit. The outbox row is the durable promise that the post-commit work is owed.
- Post-commit, perform side effects: write image / label / thumbnail files, publish SSE event. These steps are idempotent on retry; the outbox row stays until the drainer succeeds.
- The drainer (FailsafeProducer) is unchanged in role — it consumes the outbox.
Implications:
- FS write order shifts: today image is first, before any DB row; after the refactor, DB rows + outbox commit first, then FS writes execute (with the outbox row as the recovery anchor).
- A new abstraction (e.g., AnnotationLifecycleTransaction or a thin extension on AppDataConnection) is the right place to centralize this. Implementation deferred to RB-03.
Alternatives considered:
- Pure DB transaction wrapping current order — rejected: doesn't cover FS, leaves orphan-file risk.
- Saga / compensation steps with explicit rollback handlers — rejected: overkill for the linear lifecycle here.
Status: agreed. Implementation lives in the Refactor Backlog (RB-03).
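The ordering this ADR settles on (rows and outbox commit together, side effects follow) can be sketched with an in-memory stand-in for the database. FakeDb, LifecycleSketch, and the string-based rows are all hypothetical; the planned AnnotationLifecycleTransaction abstraction does not exist yet, so this shows only the sequencing, not an API.

```csharp
using System;
using System.Collections.Generic;

var db = new FakeDb();
var steps = new List<string>();
LifecycleSketch.Create(db, "a1", steps);
Console.WriteLine(string.Join(" -> ", steps));
// commit rows+outbox -> write files -> publish SSE

// In-memory stand-in for a database with all-or-nothing commits.
sealed class FakeDb
{
    public List<string> Annotations { get; private set; } = new();
    public List<string> Outbox { get; private set; } = new();

    public void InTransaction(Action<List<string>, List<string>> work)
    {
        var annotations = new List<string>(Annotations);
        var outbox = new List<string>(Outbox);
        work(annotations, outbox); // a throw here leaves committed state untouched
        (Annotations, Outbox) = (annotations, outbox);
    }
}

static class LifecycleSketch
{
    public static void Create(FakeDb db, string annotationId, List<string> steps)
    {
        // 1. The annotation row and the outbox row commit together or not at all.
        db.InTransaction((annotations, outbox) =>
        {
            annotations.Add(annotationId);
            outbox.Add($"Created:{annotationId}");
        });
        steps.Add("commit rows+outbox");

        // 2. Post-commit side effects; each must be safe to repeat on retry.
        steps.Add("write files");
        steps.Add("publish SSE");
    }
}
```

If the process dies after step 1, the committed outbox row is the record that step 2 is still owed.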
ADR-009: Lifecycle observability — World B (planned)
Context: Today only CreateAnnotation publishes SSE and enqueues the outbox. Update / UpdateStatus / Delete (annotations) and UpdateStatus / BulkUpdateStatus (dataset) are silent. The QueueOperation enum already declares Validated and Deleted, and FailsafeProducer.cs:108–123 has a dedicated drainer branch for both — strong evidence that the design always intended every lifecycle change to be observable. The producer side simply was never wired (the prior WPF codebase blended UI + backend; lifecycle calls likely came from the UI directly, which the new HTTP backend has not replicated).
Decision (resolved 2026-05-14, to-be-implemented): every annotation mutation publishes SSE and enqueues the outbox.
Mapping (initial; sub-questions to be resolved at implementation time):
| Mutation | SSE | Outbox QueueOperation |
|---|---|---|
| AnnotationService.CreateAnnotation | yes (today) | Created (today) |
| AnnotationService.UpdateAnnotation (replace detections, status → Edited) | yes | open: re-enqueue as Created (richer payload) or add QueueOperation.Updated + corresponding drainer branch |
| AnnotationService.UpdateStatus (status → Validated (30) or Deleted (40)) | yes | Validated |
| AnnotationService.UpdateStatus (other transitions) | yes | open: skip outbox, or always enqueue Validated? |
| AnnotationService.DeleteAnnotation | yes | Deleted — soft-delete: status flips to AnnotationStatus.Deleted = 40, the row stays, image / label / thumbnail files relocate to a deleted_dir (new directory_settings column added by RB-01) |
| DatasetService.UpdateStatus / BulkUpdateStatus | yes (per-id for bulk) | Validated (single record covers the whole bulk via AnnotationIds) |
Status: agreed. Implementation lives in the Refactor Backlog (RB-01).
ADR-010: Remove system_settings.silent_detection
Context: silent_detection was a debug-time switch to keep the RabbitMQ stream clean while a developer iterated locally. Now that the suite has e2e tests with isolated queues (per _docs/_repo-config.yaml suite-e2e), the in-product flag is dead code — debug isolation belongs in the test harness, not in system_settings.
Decision (resolved 2026-05-14, to-be-implemented):
- Remove the gating block in AnnotationService.CreateAnnotation:100–102 (always enqueue).
- Drop silent_detection from system_settings (column, entity, migrator CREATE TABLE, migrator ALTER line, any DTO references).
- Remove the field from UpdateSystemSettingsRequest if present.
Status: agreed. Implementation lives in the Refactor Backlog (RB-02). Schema column removal is a destructive change explicitly authorized by the maintainer.
ADR-012: Rename Flight → Mission to align with suite canonical (planned)
Context: The suite product spec (suite/_docs/01_annotations.md) calls the domain concept mission / missionId. The code uses Flight / FlightId (table media.waypoint_id + DTO FlightId filter). This drift was flagged in 00_discovery.md.
Decision (resolved 2026-05-14, to-be-implemented): align code to the suite. Flight* → Mission* rename across DTOs, controllers, services, and the relevant query-parameter names. The media.waypoint_id column stays (it is the underlying physical identifier; mission is the logical grouping concept above it).
Status: agreed. Implementation lives in the Refactor Backlog (RB-07). The rename is scoped to DTOs and code only; no DB column rename is required for this ADR.
ADR-013: Stream consumer dedupe contract is owned by this service (planned)
Context: The failsafe outbox + RabbitMQ Stream pipeline can produce duplicate stream entries when (a) the drainer retries after a partial publish or (b) two service instances both pick up the same outbox row before either deletes it. Today there is no documented dedupe contract; consumers (admin sync, AI training) silently accept whatever they get.
Decision (resolved 2026-05-14, to-be-implemented): publish a documented dedupe contract owned by this service. Working shape: consumers MUST dedupe by (annotationId, operation, dateTime). The outbox row's DateTime (already populated by EnqueueAsync) becomes part of the on-the-wire stream message, alongside the annotationId and operation already in AnnotationQueueMessage / AnnotationBulkQueueMessage.
Status: agreed. Implementation lives in the Refactor Backlog (RB-09).
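A consumer-side view of this contract might look like the sketch below. It is Python for illustration only (the service and its real consumers are not written this way); `StreamMessage` and `DedupingConsumer` are hypothetical names, and the only parts taken from this ADR are the three key fields. The in-memory set is an assumption made for brevity; a real consumer would likely need a bounded or persistent store.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StreamMessage:
    # Hypothetical wire shape; only the three key fields come from ADR-013.
    annotation_id: str
    operation: str
    date_time: datetime
    payload: str = ""


class DedupingConsumer:
    """Drops messages whose (annotationId, operation, dateTime) was already seen."""

    def __init__(self) -> None:
        self._seen = set()
        self.handled = []

    def on_message(self, msg: StreamMessage) -> bool:
        key = (msg.annotation_id, msg.operation, msg.date_time)
        if key in self._seen:
            return False  # duplicate from a drainer retry or a second instance
        self._seen.add(key)
        self.handled.append(msg)
        return True
```

Because the key includes dateTime, two legitimate operations of the same kind on the same annotation stay distinct as long as their outbox timestamps differ.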
ADR-011: Detection class catalog is admin-managed with in-memory cache (planned)
Context: detection_classes is currently seeded by the migrator (19 rows) and read-only via GET /classes. Operators have no way to add or correct classes (e.g., the Smoke/Plane color clash on #000080) without a code change and redeploy.
Decision (resolved 2026-05-14, to-be-implemented):
- ClassesController exposes POST /classes, PUT /classes/{id}, DELETE /classes/{id} under [Authorize(Policy = "ADM")]. GET /classes stays [Authorize].
- Reads go through a new DetectionClassCache (DI singleton) modeled on PathResolver: lazy-load on first read, Reset() after any write.
- Migrator-seeded rows remain as the bootstrap state; admin writes overwrite them per id.
Status: agreed. Implementation lives in the Refactor Backlog (RB-06). Adds a new feature surface; must land before any UI change relying on dynamic class management.
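The lazy-load and reset behavior of the cache can be sketched as follows. This is an illustrative Python sketch, not the planned C# class: the `loader` callable (standing in for a database read of detection_classes) and the `get_all` / `reset` method names are assumptions; only the load-on-first-read and reset-after-write behavior comes from the decision above.

```python
import threading
from typing import Callable, Optional


class DetectionClassCache:
    """Loads the catalog on first read and drops it on reset()."""

    def __init__(self, loader: Callable[[], dict]) -> None:
        self._loader = loader  # hypothetical: reads detection_classes from the DB
        self._lock = threading.Lock()
        self._classes: Optional[dict] = None

    def get_all(self) -> dict:
        with self._lock:
            if self._classes is None:
                self._classes = self._loader()  # first read after start or reset
            return dict(self._classes)  # copy so callers cannot mutate the cache

    def reset(self) -> None:
        # Called after any admin write so the next read reloads from the source.
        with self._lock:
            self._classes = None
```

A write path that forgets to call `reset()` would keep serving stale classes, which is why the decision ties the reset to every write.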
Resolved Architectural Decisions (Step 4 verification)
The following items were surfaced during verification and resolved with the maintainer on 2026-05-14. Each one either becomes an ADR above or maps to a refactor backlog entry below.
| # | Concern | Resolution | Tracked as |
|---|---|---|---|
| 1 | Update / Delete / dataset-status changes are silent on SSE + outbox | Treat as gap; lifecycle is observable (World B) — every mutation publishes + enqueues | ADR-009 / RB-01 |
| 2 | system_settings.silent_detection semantics | Remove the flag; e2e harness covers debug isolation now | ADR-010 / RB-02 |
| 3 | F1 not transactional across FS + DB + outbox | Wrap lifecycle in a business-transaction (transactional outbox); FS writes happen post-commit | ADR-008 / RB-03 |
| 4 | XxHash64 over sampled bytes: collision risk | Switch to XxHash3.Hash128 over the same sample (file-size-independent + 128-bit space) | ADR-004 / RB-04 |
| 5 | FailsafeProducer.EnqueueAsync static method does DB I/O; violates coderule.mdc | Accept as-is; documented deviation from rule | (no refactor) |
| 6 | detection_classes schema-mutable but no controller writes | Admin-managed CRUD with read-through cache (modeled on PathResolver) | ADR-011 / RB-06 |
| 7 | Flight (code) vs mission (suite spec) drift | Rename code → Mission*; suite spec stays canonical | ADR-012 / RB-07 |
| 8 | Dataset writes coupled directly to annotation rows via shared AppDataConnection | Route dataset writes through AnnotationService (via a public domain interface) | RB-08 |
| 9 | Stream consumer dedupe contract owner | This service owns it; dedupe by (annotationId, operation, dateTime) baked into the wire message | ADR-013 / RB-09 |
| 10 | Hard-delete vs soft-delete on DeleteAnnotation | Soft-delete: status → Deleted (40), files moved to a deleted_dir | ADR-009 (folded in) / RB-01 |
Remaining Open Architectural Risks
These are residual risks that still need attention from later autodev steps (Test Spec, Refactor, Security Audit). Items previously listed here that have been resolved as of 2026-05-14 (Flight/mission drift, dataset coupling, hard-vs-soft delete, JWT issuer/audience validation, CORS environment gating, dev secret fallback) moved to the Resolved Architectural Decisions table above and the Refactor Backlog below.
- Horizontal scaling: SSE channel is per-instance (singleton AnnotationEventService); the failsafe outbox uses no leasing/locking. Two pods will independently drain rows, with deletion keyed on id; under high concurrency the same row can be picked by both before either deletes, so duplicate stream entries are possible. Consumers must dedupe per ADR-013. (Touched by RB-03 / RB-09 indirectly but not solved by them.)
- Swagger exposure in production: see ADR-005. Belongs to Step 14 (Security Audit). (CORS exposure was resolved by CorsConfigurationValidator; ADR-006 retired.)
- UserId body field vs JWT NameIdentifier drift (suite spec lists UserId on POST /annotations; code uses JWT subject). Reconcile in the suite spec.
- No automated tests: addressed by autodev Phase A Steps 3–7 (Test Spec → Implement Tests → Run Tests).
- FailsafeProducer.cs:138 swallows IOException on image read silently (catch { }). Direct coderule.mdc violation. Symptom in product: a missing or unreadable image yields a stream message with image = null and no log/metric, so the gap is invisible to operators. Track on Refactor Backlog (RB-05).
- JWKS HTTPS-only retrieval blocks containerised test harnesses that would otherwise serve a static JWKS over plain HTTP. Tests must either run a TLS-terminating sidecar in the test compose stack or rely on test-only configuration that relaxes RequireHttps. Not a production risk; a Step 4 (Code Testability Revision) item.
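For the FailsafeProducer.cs:138 item above, the shape of a logged failure path can be sketched as below. This is a Python illustration, not the C# fix itself: `read_image_or_none`, the logger name, and the `metrics` counter are hypothetical, and `OSError` is used as Python's rough counterpart to the IOException the risk describes.

```python
import logging
from collections import Counter
from pathlib import Path
from typing import Optional

logger = logging.getLogger("failsafe_producer")
metrics = Counter()  # hypothetical stand-in for a real metrics client


def read_image_or_none(path: Path) -> Optional[bytes]:
    """Return the image bytes, or None after logging and counting the failure."""
    try:
        return path.read_bytes()
    except OSError as exc:  # rough counterpart to IOException in the C# code
        metrics["image_read_failures"] += 1
        logger.warning("image read failed for %s: %s", path, exc)
        return None
```

The message can still go out with a null image, but the failure now leaves both a log line and a counter increment that operators can alert on.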
Refactor Backlog
These items are the implementation work for the resolved decisions above. They are not part of Step 4 (Verification) corrections — they will be picked up by the autodev existing-code flow at Step 8 (Refactor) and/or new feature tasks in Phase B.
| ID | Scope | Source ADR / Risk | Notes |
|---|---|---|---|
| RB-01 | Wire lifecycle publish + outbox enqueue across Update / UpdateStatus / Delete (annotations + dataset). Includes the soft-delete behavior: DeleteAnnotation flips AnnotationStatus → Deleted (40), leaves the row, and moves image / label / thumbnail files to a new deleted_dir (added to directory_settings). Default list read paths must exclude rows with Status = Deleted (40). | ADR-009 | Open sub-questions: (a) UpdateAnnotation mapping: re-enqueue as Created or add QueueOperation.Updated + drainer branch; (b) which non-Validated/Deleted status transitions enqueue at all |
| RB-02 | Remove silent_detection (schema column, entity field, gating logic, DTOs) | ADR-010 | Destructive schema change explicitly authorized |
| RB-03 | Introduce business-transaction wrapper (transactional outbox) for annotation lifecycle | ADR-008 | Reorders FS writes to post-commit; covers all mutation paths |
| RB-04 | Switch annotation id hashing to XxHash3.Hash128 over the same sampled buffer | ADR-004 | Existing 16-char ids stay; new ids are 32-char hex |
| RB-05 | Replace catch { } at FailsafeProducer.cs:138 with logged failure path; surface as a metric | Open Risk §6 | Downstream consumer should know an image-less message means a real disk error |
| RB-06 | Admin-managed detection_classes (CRUD endpoints [ADM], in-memory cache with Reset()) | ADR-011 | Migrator seed remains as bootstrap; admin overrides per id; fix Smoke/Plane color collision while at it |
| RB-07 | Rename Flight* → Mission* across DTOs, controllers, services, and query-parameter names. media.waypoint_id column is unchanged (it's the physical id; mission is the logical concept). | ADR-012 | Code-only rename to align with suite spec; suite stays canonical |
| RB-08 | Decouple 04 Dataset writes from direct annotations row mutations: route status writes through a public AnnotationService interface. Reads can stay direct for now (read coupling is lower-risk than write coupling). | Open Risks (former §4) | Likely introduces an IAnnotationLifecycle (or similar) interface owned by 01 Annotations REST that 04 Dataset consumes via DI |
| RB-09 | Bake (annotationId, operation, dateTime) into the on-the-wire stream message; document the dedupe contract in suite/_docs/01_annotations.md. | ADR-013 | Coordinate the suite-doc update with admin sync + AI training maintainers |
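The transactional-outbox ordering that RB-03 refers to (row and outbox entry committed together, filesystem writes only after the commit) can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions, not the planned C# wrapper: the two-table schema, `open_db`, `create_annotation`, and the `post_commit` callback are all hypothetical.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    # Hypothetical two-table schema, just enough for the sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE annotations (id TEXT PRIMARY KEY, status INTEGER)")
    conn.execute("CREATE TABLE outbox (annotation_id TEXT, operation TEXT)")
    return conn


def create_annotation(
    conn: sqlite3.Connection,
    annotation_id: str,
    post_commit: Callable[[], None],
) -> None:
    """Write the row and its outbox entry atomically, then run the FS work."""
    with conn:  # commits on success, rolls back if either insert raises
        conn.execute(
            "INSERT INTO annotations (id, status) VALUES (?, ?)",
            (annotation_id, 0),  # 0 is an arbitrary placeholder status
        )
        conn.execute(
            "INSERT INTO outbox (annotation_id, operation) VALUES (?, ?)",
            (annotation_id, "Created"),
        )
    post_commit()  # filesystem writes run only after the commit succeeded
```

If either insert fails, neither row is persisted and `post_commit` never runs. Handling a filesystem failure that happens after the commit is left out of this sketch.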
References
- Suite product spec: ../../../suite/_docs/01_annotations.md (REST contracts, SSE, Annotation Sync, camera, classes).
- Suite dataset narrative: ../../../suite/_docs/09_dataset_explorer.md.
- Component specs: components/01..06_*/description.md.
- Module docs: modules/*.md.
- File ownership (downstream skills): module-layout.md.
- Component diagram: diagrams/components.md.
- Per-flow diagrams: diagrams/flows/.