annotations/_docs/02_document/data_model.md
Oleksandr Bezdieniezhnykh 03f879206e docs+src: complete Steps 1-3 outcomes + auth re-sync baseline
This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the now
deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 20:19:05 +03:00

# Azaion.Annotations — Data Model
> Source-of-truth: `src/Database/DatabaseMigrator.cs` and `src/Database/Entities/*.cs`. Every column name and type below is reproduced from migrator SQL.
## Schema overview
```mermaid
erDiagram
media ||--o{ annotations : "media_id"
annotations ||--o{ detection : "annotation_id"
annotations_queue_records }o..o{ annotations : "annotation_ids JSON (no FK)"
detection_classes ||..o{ detection : "class_num (logical, no FK)"
media {
TEXT id PK
TEXT name
TEXT path
INTEGER media_type "MediaType enum"
INTEGER media_status "MediaStatus enum"
UUID waypoint_id
UUID user_id
TEXT duration "added later (ALTER)"
}
annotations {
TEXT id PK "image-bytes hash (ADR-004)"
TEXT media_id FK
BIGINT time "ticks of TimeSpan"
TIMESTAMP created_date
UUID user_id
INTEGER source "AnnotationSource enum"
INTEGER status "AnnotationStatus enum"
BOOLEAN is_split "added via ALTER"
TEXT split_tile "added via ALTER"
}
detection {
UUID id PK
REAL center_x
REAL center_y
REAL width
REAL height
INTEGER class_num
TEXT label
TEXT description
REAL confidence
INTEGER affiliation "AffiliationEnum"
INTEGER combat_readiness "CombatReadiness"
TEXT annotation_id FK
}
annotations_queue_records {
UUID id PK
TIMESTAMP date_time
INTEGER operation "QueueOperation enum"
TEXT annotation_ids "JSON array of TEXT ids"
}
system_settings {
UUID id PK
TEXT name
TEXT military_unit
INTEGER default_camera_width
NUMERIC default_camera_fov
INTEGER thumbnail_width "default 240"
INTEGER thumbnail_height "default 135"
INTEGER thumbnail_border "default 10"
BOOLEAN generate_annotated_image "default false"
BOOLEAN silent_detection "default false"
}
directory_settings {
UUID id PK
TEXT videos_dir "default /data/videos"
TEXT images_dir "default /data/images"
TEXT labels_dir "default /data/labels"
TEXT results_dir "default /data/results"
TEXT thumbnails_dir "default /data/thumbnails"
TEXT gps_sat_dir "default /data/gps_sat"
TEXT gps_route_dir "default /data/gps_route"
}
detection_classes {
SERIAL id PK
TEXT name
TEXT short_name
TEXT color "hex e.g. #FF0000"
INTEGER max_size_m
INTEGER photo_mode
}
user_settings {
UUID id PK
UUID user_id "UNIQUE (ix_user_settings_user_id)"
UUID selected_flight_id
NUMERIC annotations_left_panel_width
NUMERIC annotations_right_panel_width
NUMERIC dataset_left_panel_width
NUMERIC dataset_right_panel_width
}
camera_settings {
UUID id PK
NUMERIC altitude "default 100"
NUMERIC focal_length "default 50"
NUMERIC sensor_width "default 36"
}
```
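> Mermaid `erDiagram` does not represent JSON-array references; the dotted line for `annotations_queue_records ↔ annotations` is logical only, and there is **no FK** in the schema. The dotted `detection_classes` → `detection` line is likewise a logical reference through `class_num`, not a constraint.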
## Tables
### `media`
Owned writes: `03_media`. Reads: `01_annotations-rest`, `04_dataset`.
| Column | Type | Notes |
|--------|------|-------|
| `id` | TEXT PK | Application-generated |
| `name` | TEXT NOT NULL | |
| `path` | TEXT NOT NULL | Filesystem path under media dir |
| `media_type` | INTEGER NOT NULL DEFAULT 0 | `MediaType` enum (numeric wire — see `wire-enums.md`) |
| `media_status` | INTEGER NOT NULL DEFAULT 0 | `MediaStatus` enum |
| `waypoint_id` | UUID | Indexed `ix_media_waypoint_id` |
| `user_id` | UUID NOT NULL | |
| `duration` | TEXT | Added via `ALTER`; nullable |
### `annotations`
Owned writes: `01_annotations-rest`. Status writes: `04_dataset` (bulk + single PATCH).
| Column | Type | Notes |
|--------|------|-------|
| `id` | TEXT PK | **Hash of image bytes** (ADR-004); behavior when the same bytes are uploaded again is unconfirmed (see open question 1 below) |
| `media_id` | TEXT NOT NULL FK → `media.id` | |
| `time` | BIGINT NOT NULL DEFAULT 0 | Ticks of `TimeSpan` (suite spec stores `time` as ticks) |
| `created_date` | TIMESTAMP NOT NULL DEFAULT NOW() | Indexed `ix_annotations_created_date` |
| `user_id` | UUID NOT NULL | Indexed `ix_annotations_user_id` |
| `source` | INTEGER NOT NULL DEFAULT 0 | `AnnotationSource` enum (AI=0, Manual=1) |
| `status` | INTEGER NOT NULL DEFAULT 0 | `AnnotationStatus` enum (Created=10, Edited=20, …) |
| `is_split` | BOOLEAN NOT NULL DEFAULT false | Added via `ALTER`; tile-splitting flag |
| `split_tile` | TEXT | Added via `ALTER`; nullable tile id reference |
Indexes: `ix_annotations_media_id`, `ix_annotations_created_date`, `ix_annotations_user_id`.
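
Two of the column encodings above lend themselves to a short sketch. The `time` column holds .NET `TimeSpan` ticks, which are 100-nanosecond units, so the conversion is plain integer arithmetic. The `id` derivation is shown only as an illustration: the hash algorithm is not stated in this document, so SHA-256 is an assumption here, and the function names are hypothetical.

```python
import hashlib
from datetime import timedelta

# .NET TimeSpan ticks are 100-nanosecond units, i.e. 10 ticks per microsecond.
TICKS_PER_MICROSECOND = 10


def timedelta_to_ticks(offset: timedelta) -> int:
    """Convert a timedelta to TimeSpan ticks, as stored in annotations.time."""
    # Work on the integer fields of timedelta to avoid float rounding.
    total_microseconds = (
        (offset.days * 86_400 + offset.seconds) * 1_000_000 + offset.microseconds
    )
    return total_microseconds * TICKS_PER_MICROSECOND


def ticks_to_timedelta(ticks: int) -> timedelta:
    """Convert ticks back to a timedelta; sub-microsecond precision is dropped."""
    return timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def annotation_id_from_bytes(image_bytes: bytes) -> str:
    """ASSUMPTION: SHA-256 hex digest. The real algorithm is whatever ADR-004 fixes."""
    return hashlib.sha256(image_bytes).hexdigest()
```

For example, an annotation 1.5 seconds into a video would be stored as `timedelta_to_ticks(timedelta(seconds=1.5))`, which is 15,000,000 ticks.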
### `detection`
| Column | Type | Notes |
|--------|------|-------|
| `id` | UUID PK | |
| `center_x`, `center_y`, `width`, `height` | REAL NOT NULL | YOLO-normalized box |
| `class_num` | INTEGER NOT NULL | Logical reference to `detection_classes.id` |
| `label` | TEXT NOT NULL DEFAULT '' | |
| `description` | TEXT | |
| `confidence` | REAL NOT NULL DEFAULT 0 | |
| `affiliation` | INTEGER NOT NULL DEFAULT 0 | `AffiliationEnum` |
| `combat_readiness` | INTEGER NOT NULL DEFAULT 0 | `CombatReadiness` enum |
| `annotation_id` | TEXT NOT NULL FK → `annotations.id` | Indexed `ix_detection_annotation_id` |
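
As a rough illustration of what "YOLO-normalized" means for the four geometry columns, the sketch below converts a normalized box to pixel corners. It assumes the usual YOLO convention (all four values are fractions of the image size, with the origin at the top-left); the class and function names are hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Mirrors the detection columns center_x, center_y, width, height (0..1)."""
    center_x: float
    center_y: float
    width: float
    height: float


def to_pixel_corners(box: NormalizedBox, image_width: int, image_height: int):
    """Return (left, top, right, bottom) in pixels for a normalized box."""
    half_w = box.width * image_width / 2
    half_h = box.height * image_height / 2
    cx = box.center_x * image_width
    cy = box.center_y * image_height
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

A centered box with `width=0.25` and `height=0.5` on a 1920×1080 frame maps to the corners `(720, 270, 1200, 810)`.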
### `annotations_queue_records` (failsafe outbox)
| Column | Type | Notes |
|--------|------|-------|
| `id` | UUID PK | |
| `date_time` | TIMESTAMP NOT NULL DEFAULT NOW() | |
| `operation` | INTEGER NOT NULL DEFAULT 0 | `QueueOperation` enum |
| `annotation_ids` | TEXT NOT NULL DEFAULT '[]' | JSON array of annotation ids — single or bulk |
No FK to `annotations` — by design, since rows can survive an annotation deletion if export is in flight.
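
Because `annotation_ids` is plain JSON text with no FK behind it, a consumer has to validate the shape and tolerate ids that no longer exist. The following is a minimal sketch of that handling, assuming the column always holds a JSON array of strings; the helper names are hypothetical.

```python
import json


def encode_annotation_ids(ids) -> str:
    """Serialize ids to the JSON-array text held in annotation_ids."""
    return json.dumps(list(ids))


def decode_annotation_ids(raw: str) -> list:
    """Parse the column text, rejecting anything that is not a list of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("annotation_ids must be a JSON array of strings")
    return value


def missing_ids(raw: str, existing_ids: set) -> list:
    """Ids referenced by a queue row that are no longer present in annotations."""
    return [i for i in decode_annotation_ids(raw) if i not in existing_ids]
```

A drain loop could call `missing_ids` before export and skip or log the orphans instead of failing the whole record.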
### `system_settings`
Effectively a singleton (one row in practice). Notable columns:
- `generate_annotated_image` (BOOLEAN) — emits a baked-in annotated image alongside YOLO label when true (suite spec).
- `silent_detection` (BOOLEAN) — suppresses SSE / sync for detection events.
- `thumbnail_*`: defaults are width 240, height 135, border 10.
### `directory_settings`
Roots consumed by `PathResolver` (`06_platform`). Defaults: `/data/{videos,images,labels,results,thumbnails,gps_sat,gps_route}`. Updates require `PathResolver.Reset` (Flow F7 invariant).
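
The reset requirement suggests that resolved roots are cached between reads. The sketch below shows that pattern in isolation; it is a hypothetical stand-in, not the actual `PathResolver`, and the caching behavior itself is an assumption inferred from the invariant above.

```python
class CachedPathResolver:
    """Hypothetical stand-in for PathResolver: caches roots until reset()."""

    def __init__(self, load_settings):
        # load_settings is a callable returning a dict such as
        # {"videos_dir": "/data/videos", ...} read from directory_settings.
        self._load_settings = load_settings
        self._cache = None

    def root(self, key: str) -> str:
        # Load once and serve later lookups from the snapshot.
        if self._cache is None:
            self._cache = dict(self._load_settings())
        return self._cache[key]

    def reset(self) -> None:
        # Drop the snapshot so the next lookup re-reads the settings.
        self._cache = None
```

Under this model, an update to `directory_settings` that is not followed by a reset keeps serving the old root, which is the failure the invariant guards against.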
### `detection_classes`
Seeded with 19 rows (ids 0–18) on first run via `INSERT ... ON CONFLICT (id) DO NOTHING`. Each row carries a name, a Cyrillic short name, a hex color, `max_size_m`, and `photo_mode`; the table below omits `photo_mode`.
| id | name | short_name | color | max_size_m |
|----|------|------------|-------|-------------|
| 0 | ArmorVehicle | Броня | `#FF0000` | 7 |
| 1 | Truck | Вантаж. | `#00FF00` | 8 |
| 2 | Vehicle | Машина | `#0000FF` | 7 |
| 3 | Artillery | Арта | `#FFFF00` | 14 |
| 4 | Shadow | Тінь | `#FF00FF` | 9 |
| 5 | Trenches | Окопи | `#00FFFF` | 10 |
| 6 | MilitaryMan | Військов | `#188021` | 2 |
| 7 | TyreTracks | Накати | `#800000` | 5 |
| 8 | AdditionArmoredTank | Танк.захист | `#008000` | 7 |
| 9 | Smoke | Дим | `#000080` | 8 |
| 10 | Plane | Літак | `#000080` | 12 |
| 11 | Moto | Мото | `#808000` | 3 |
| 12 | CamouflageNet | Сітка | `#800080` | 14 |
| 13 | CamouflageBranches | Гілки | `#2f4f4f` | 8 |
| 14 | Roof | Дах | `#1e90ff` | 15 |
| 15 | Building | Будівля | `#ffb6c1` | 20 |
| 16 | Caponier | Капонір | `#ffb6c1` | 10 |
| 17 | Ammo | БК | `#33658a` | 2 |
| 18 | Protect.Struct | Зуби.драк | `#969647` | 2 |
Note: ids 9 and 10 (`Smoke`, `Plane`) share `#000080`, and ids 15 and 16 (`Building`, `Caponier`) share `#ffb6c1`. Both are pre-existing data quirks, not bugs introduced by this skill.
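
Shared colors in the seed table can be found mechanically. The sketch below copies the id, name, and color columns from the table above and groups ids by case-insensitive color; the function name is hypothetical.

```python
from collections import defaultdict

# (id, name, color) copied from the seed table above.
SEED_CLASSES = [
    (0, "ArmorVehicle", "#FF0000"), (1, "Truck", "#00FF00"),
    (2, "Vehicle", "#0000FF"), (3, "Artillery", "#FFFF00"),
    (4, "Shadow", "#FF00FF"), (5, "Trenches", "#00FFFF"),
    (6, "MilitaryMan", "#188021"), (7, "TyreTracks", "#800000"),
    (8, "AdditionArmoredTank", "#008000"), (9, "Smoke", "#000080"),
    (10, "Plane", "#000080"), (11, "Moto", "#808000"),
    (12, "CamouflageNet", "#800080"), (13, "CamouflageBranches", "#2f4f4f"),
    (14, "Roof", "#1e90ff"), (15, "Building", "#ffb6c1"),
    (16, "Caponier", "#ffb6c1"), (17, "Ammo", "#33658a"),
    (18, "Protect.Struct", "#969647"),
]


def shared_colors(rows):
    """Map each color used by more than one class to the ids that use it."""
    by_color = defaultdict(list)
    for class_id, _name, color in rows:
        by_color[color.lower()].append(class_id)
    return {color: ids for color, ids in by_color.items() if len(ids) > 1}
```

Run against the rows above, `shared_colors(SEED_CLASSES)` reports `#000080` for ids 9 and 10 and `#ffb6c1` for ids 15 and 16.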
### `user_settings`
Per-user UI prefs. Unique index on `user_id` (`ix_user_settings_user_id`). Carries selected flight + four panel widths (annotator left/right, dataset left/right).
### `camera_settings`
Calibration triple `(altitude, focal_length, sensor_width)` with defaults `(100, 50, 36)`.
## Migration strategy
- **Tool**: hand-rolled embedded SQL in `DatabaseMigrator.Migrate`, executed at every startup via Linq2DB.
- **Safety**: every statement is idempotent — `CREATE TABLE IF NOT EXISTS`, `ALTER TABLE … ADD COLUMN IF NOT EXISTS`, seed `INSERT … ON CONFLICT DO NOTHING`.
- **Direction**: forward-only. No down migrations or `DROP` operations; renames or destructive changes require an out-of-band migration.
- **Drift**: the only authoritative schema definition is `src/Database/DatabaseMigrator.cs`. Live DBs should be diffed against it on a regular cadence; suite-level monitoring is out of scope here.
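
The idempotency property in the list above can be demonstrated with a small, self-contained sketch. It uses Python's built-in SQLite purely as a stand-in for the production database, and the statements are illustrative shapes rather than the actual migrator SQL. SQLite has no `ADD COLUMN IF NOT EXISTS`, so that step is emulated with a hypothetical helper.

```python
import sqlite3

# Illustrative statements only; SQLite stands in for the real database.
STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS detection_classes ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL, color TEXT NOT NULL)",
    "INSERT INTO detection_classes (id, name, color)"
    " VALUES (0, 'ArmorVehicle', '#FF0000') ON CONFLICT (id) DO NOTHING",
    "INSERT INTO detection_classes (id, name, color)"
    " VALUES (1, 'Truck', '#00FF00') ON CONFLICT (id) DO NOTHING",
]


def add_column_if_missing(conn, table, column, ddl_type):
    """Emulate ADD COLUMN IF NOT EXISTS, which SQLite does not support."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")


def migrate(conn):
    """Forward-only migration that is safe to run at every startup."""
    for statement in STATEMENTS:
        conn.execute(statement)
    add_column_if_missing(conn, "detection_classes", "short_name", "TEXT")
    conn.commit()
```

Calling `migrate` twice on the same connection leaves exactly two seed rows and a single `short_name` column, which is the behavior the startup-time migrator relies on.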
## Seed data observations
Only `detection_classes` has seeded data; all other tables start empty. `system_settings`, `directory_settings`, and `camera_settings` are inserted **lazily** by their respective services on first read/write — confirm exact upsert semantics in Step 4 verification.
## Backward compatibility
- Wire enums are **integer-stable** (suite contract). Renaming an enum case does not break wire compatibility because numeric values are the contract.
- Annotation id format is the hash of image bytes — changing the hashing algorithm would invalidate cross-build references; treat as a contract.
- MessagePack key order in `DTOs/QueueMessages.cs` is the export contract for RabbitMQ stream consumers — changing it breaks downstream.
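
The integer-stable enum point can be illustrated with a short sketch. The numeric values for `AnnotationSource` come from the `annotations` table above; the renamed variant and the `to_wire` helper are hypothetical and exist only to show that a rename leaves the wire value untouched.

```python
from enum import IntEnum


class AnnotationSource(IntEnum):
    """Values as documented for annotations.source."""
    AI = 0
    MANUAL = 1


class AnnotationSourceRenamed(IntEnum):
    """Hypothetical refactor: case names change, numeric values do not."""
    MODEL = 0
    HUMAN = 1


def to_wire(value: IntEnum) -> int:
    """Only the integer crosses the wire, never the case name."""
    return int(value)
```

A payload produced as `to_wire(AnnotationSource.MANUAL)` decodes on the renamed side as `AnnotationSourceRenamed.HUMAN`, since both map to `1`. Changing a numeric value, by contrast, would break every consumer.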
## Cross-component data ownership
| Component | Writes | Reads |
|-----------|--------|-------|
| `01_annotations-rest` | `annotations`, `detection`, files on disk, `annotations_queue_records` (Created/Updated/Deleted) | `media` |
| `02_annotations-realtime-sync` | drains `annotations_queue_records` | `annotations`, `detection`, file bytes |
| `03_media` | `media`, files on disk | — |
| `04_dataset` | `annotations.status` (single + bulk) → also writes `annotations_queue_records`, publishes SSE | `annotations`, `detection`, `media` |
| `05_settings-metadata` | all `*_settings` tables | `detection_classes` (read-through for UI) |
| `06_platform` | none (pure infra) | `directory_settings` (via `PathResolver`) |
## Open data-model questions (Step 4 verification)
1. **`annotations.id` collisions**: behavior under same-bytes re-upload (insert vs noop vs error) is implicit — confirm in `AnnotationService`.
2. **`annotations_queue_records.annotation_ids` shape**: confirm consistent JSON formatting (escaped strings vs raw) across `Created`, `Updated`, `StatusChanged`, `Deleted`, bulk variants.
3. **`detection_classes` mutability**: the schema permits new rows, but today they arrive only through the migrator seed and no controller exposes writes; confirm whether the class catalog is intended to be DB-managed or static.
4. **`media.duration`**: nullable TEXT — confirm format (`hh:mm:ss` vs ISO 8601 vs ticks).
5. **Lazy upsert** of `system_settings` / `directory_settings` / `camera_settings` first-row creation — confirm services initialize defaults vs rely on user-driven inserts.