diff --git a/_docs/02_document/contracts/data-access/tile-storage.md b/_docs/02_document/contracts/data-access/tile-storage.md
new file mode 100644
index 0000000..35d705b
--- /dev/null
+++ b/_docs/02_document/contracts/data-access/tile-storage.md
@@ -0,0 +1,117 @@
+# Contract: tile-storage
+
+**Component**: DataAccess
+**Producer task**: AZ-484 — `_docs/02_tasks/todo/AZ-484_multi_source_tile_storage.md`
+**Consumer tasks**: AZ-485 (planned T2 — UAV upload endpoint); future tasks adding additional sources (e.g., SatAR)
+**Version**: 1.0.0
+**Status**: draft (frozen on AZ-484 implementation completion)
+**Last Updated**: 2026-05-11
+
+## Purpose
+
+Defines how satellite imagery tiles are persisted in the `tiles` table when more than one acquisition source can write to the same geographic cell. Producers must agree on the source enum, the captured-at semantics, and the per-source UPSERT contract. Readers must use the documented selection rule and tolerate the multi-source row layout.
+
+## Shape
+
+### Schema (PostgreSQL `tiles` table — relevant columns only)
+
+```sql
+-- Pre-existing columns (unchanged)
+id                UUID PRIMARY KEY
+tile_zoom         INT NOT NULL
+tile_x            INT NOT NULL
+tile_y            INT NOT NULL
+latitude          DOUBLE PRECISION NOT NULL
+longitude         DOUBLE PRECISION NOT NULL
+tile_size_meters  DOUBLE PRECISION NOT NULL
+tile_size_pixels  INT NOT NULL
+image_type        VARCHAR(10) NOT NULL
+file_path         VARCHAR(500) NOT NULL
+created_at        TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+updated_at        TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+
+-- New in v1.0.0 (this contract)
+source            VARCHAR(32) NOT NULL   -- enum-stored: 'google_maps' | 'uav'
+captured_at       TIMESTAMP NOT NULL     -- UTC; producer-supplied semantics, see below
+
+-- Vestigial columns (preserved per coderule.mdc; readers MUST NOT depend on them)
+maps_version      VARCHAR(50) NULL
+version           INT NULL
+```
+
+### Field reference
+
+| Field | Type | Required | Description | Constraints |
+|-------|------|----------|-------------|-------------|
+| `source` | enum (`TileSource`) stored as `VARCHAR(32)` | yes | Producer of the tile | `'google_maps'` or `'uav'`. New values require a contract version bump. |
+| `captured_at` | `TIMESTAMP` UTC | yes | Producer-defined "moment the imagery represents" | For `google_maps`: `DateTime.UtcNow` at download time (provider does not expose original imagery date). For `uav`: the UAV capture timestamp supplied by the upload client. Must be UTC; non-UTC must be converted before write. |
+| `(latitude, longitude, tile_zoom, tile_size_meters, source)` | composite | yes | Per-source uniqueness | Enforced via `UNIQUE INDEX idx_tiles_unique_location_source`. |
+
+### Index
+
+```sql
+CREATE UNIQUE INDEX idx_tiles_unique_location_source
+  ON tiles (latitude, longitude, tile_zoom, tile_size_meters, source);
+```
+
+The previous 4-column unique index `(latitude, longitude, tile_zoom, tile_size_meters)` from migration 012 is dropped.
+
+### Producer write API
+
+| Operation | Repository method | Conflict semantics |
+|-----------|-------------------|--------------------|
+| Insert / replace same-source row for a cell | `ITileRepository.InsertAsync(TileEntity)` | `ON CONFLICT (latitude, longitude, tile_zoom, tile_size_meters, source) DO UPDATE SET file_path, tile_x, tile_y, captured_at, updated_at`. Producers MUST set `Source` and `CapturedAt`. |
+| Update by primary key | `ITileRepository.UpdateAsync(TileEntity)` | Updates by `id` only. Caller's responsibility not to violate the unique index. |
+| Delete by primary key | `ITileRepository.DeleteAsync(Guid)` | Removes a single row by `id`; no cascade. |
+
+### Consumer read API and selection rule
+
+| Operation | Repository method | Selection rule |
+|-----------|-------------------|----------------|
+| Read by `id` | `ITileRepository.GetByIdAsync(Guid)` | Returns the row identified by `id` (no source filter). |
+| Read most-recent for a cell by slippy coordinates | `ITileRepository.GetByTileCoordinatesAsync(zoom, x, y)` | Returns the row with the highest `(captured_at, updated_at, id)` tuple across all sources for that cell. At most one row. |
+| Read region | `ITileRepository.GetTilesByRegionAsync(lat, lon, sizeMeters, zoomLevel)` | Returns at most one row per `(latitude, longitude, tile_zoom, tile_size_meters)` group, selected by the same most-recent rule. |
+
+The selection rule is **most-recent across all sources** ordered by `captured_at DESC`, with `(updated_at DESC, id DESC)` as deterministic tie-breakers.
+
+## Invariants
+
+- **Inv-1**: Every row has a non-null `source` whose string value is a member of `TileSource`. Rows with unknown source values are a contract violation.
+- **Inv-2**: Every row has a non-null `captured_at` in UTC.
+- **Inv-3**: At most one row exists per `(latitude, longitude, tile_zoom, tile_size_meters, source)`.
+- **Inv-4**: For any cell with one or more rows, the row returned by `GetByTileCoordinatesAsync` and the per-cell row returned by `GetTilesByRegionAsync` are identical.
+- **Inv-5**: The `source` column value space is closed: only the strings emitted by `TileSource.ToString()` are valid. Adding a new producer requires a new enum member AND a contract version bump (minor).
+- **Inv-6**: `captured_at` semantics are producer-defined per the Field Reference table above; consumers MUST NOT reinterpret it (e.g., consumers MUST NOT assume `captured_at` from `google_maps` reflects original imagery date).
+
+## Non-Goals
+
+- **Not covered**: Per-source historical revision retention. Same-source uploads to the same cell overwrite the previous row by design — this is not a versioned table. Consumers wanting season selection or rollback must propose a v2 schema.
+- **Not covered**: Cross-source merging or compositing at read time. Reads return exactly one row per cell.
+- **Not covered**: Quality scoring, threshold gating, or any policy beyond the selection rule. Quality enforcement happens upstream of the write (T2).
+- **Not covered**: Backwards-compatible reads against the legacy 4-column unique index. Migration 013 is mandatory before any consumer of v1.0.0 runs.
+- **Not covered**: The vestigial `maps_version` and `version` columns. Consumers MUST NOT read them; producers MUST NOT write them in v1.0.0+.
+
+## Versioning Rules
+
+- **Patch (1.0.x)**: Documentation clarifications, additional invariants that do not change runtime behavior, expanded test cases.
+- **Minor (1.x.0)**: Adding a new `TileSource` enum member; adding optional columns that consumers may safely ignore; relaxing constraints in a backward-compatible way.
+- **Major (2.0.0)**: Removing or renaming a column; changing the unique index columns; changing the selection rule (e.g., adding source priority); changing `captured_at` from required to optional or vice versa; introducing per-source historical revisions.
+
+Each version bump requires updating the Change Log below and notifying every consumer listed in the header. If consumers' tasks have not yet been written, the producer task is responsible for surfacing the change to the user before merging.
+
+## Test Cases
+
+| Case | Input | Expected | Notes |
+|------|-------|----------|-------|
+| valid-google-only | Insert `source='google_maps' captured_at=T1` for a fresh cell | Single row returned by region read; `source='google_maps'`, `captured_at=T1`. | Baseline regression case. |
+| valid-multi-source | Insert `google_maps captured_at=T1`, then `uav captured_at=T2 > T1` for same cell | Both rows persisted; `GetByTileCoordinatesAsync` returns the `uav` row. | AC-1 + AC-2 of producer task. |
+| same-source-upsert | Insert `uav captured_at=T1`, then `uav captured_at=T2 > T1` for same cell | Exactly one `uav` row remains, with `captured_at=T2` and updated `file_path`. | AC-3 of producer task. |
+| time-tiebreak | Insert `google_maps captured_at=T`, then `uav captured_at=T` (identical) for same cell | Selection deterministic by `(updated_at DESC, id DESC)` tie-break; result must be reproducible across two test runs with the same seed. | Inv-4 enforcement. |
+| backfill-completeness | Migration 013 against a snapshot DB with N pre-existing rows | Post-migration row count is N; every row has `source='google_maps'` and `captured_at = created_at`. | AC-4 of producer task. |
+| invalid-source | Direct SQL insert with `source='satar'` (not in enum) | Repository read either rejects deserialization or raises a contract violation; behavior MUST surface the violation, not swallow it. | Inv-1 + `coderule.mdc` "never suppress errors silently". |
+
+## Change Log
+
+| Version | Date | Change | Author |
+|---------|------|--------|--------|
+| 1.0.0 | 2026-05-11 | Initial contract — multi-source schema (`source`, `captured_at`), 5-column unique key, most-recent-across-sources read rule. Produced by AZ-484. | autodev (Step 9) |
diff --git a/_docs/02_tasks/_dependencies_table.md b/_docs/02_tasks/_dependencies_table.md
index 2033b93..9d2ed9b 100644
--- a/_docs/02_tasks/_dependencies_table.md
+++ b/_docs/02_tasks/_dependencies_table.md
@@ -58,6 +58,13 @@ Roadmap: `_docs/04_refactoring/03-code-quality-refactoring/analysis/refactoring_
 | AZ-380 | C27 | Delete CalculatePolygonDiagonalDistance | 4 | — | 1 | Done (In Testing) |
 | AZ-372 | C19 | dotnet format + NetAnalyzers + Coverlet | 4 | — | 3 | Done (In Testing) |
 
+### Step 9 cycle 1 — New Task: Multi-source tile storage + UAV upload (AZ-483 epic)
+
+| Task | Title | Depends On | Points | Status |
+|------|-------|-----------|--------|--------|
+| AZ-484 | Multi-source tile storage schema (source + captured_at) | — | 5 | To Do |
+| AZ-485 (planned) | UAV upload endpoint + quality gate | AZ-484, contract `tile-storage.md` v1.0.0 | ~5 | Not yet created (deferred to a future Step 9 loop) |
+
 ## Execution Order
 
 ### Step 6
@@ -77,11 +84,16 @@ Phase 2 (Correctness): AZ-359 → AZ-357 → AZ-362 (AZ-362 needs AZ-353)
 Phase 3 (Structural cleanup): AZ-366 → AZ-377 → AZ-368 → AZ-367 → AZ-369 → AZ-365 → AZ-364 (folds AZ-360) — AZ-377 needs AZ-371
 Phase 4 (Typing/config/tooling/polish): AZ-371 → AZ-370 → AZ-373 → AZ-374 → AZ-375 → AZ-376 → AZ-378 → AZ-379 → AZ-380 → AZ-372
 
+### Step 9 cycle 1 (Multi-source tile storage epic AZ-483)
+1. AZ-484 — Multi-source tile storage schema (foundational)
+2. AZ-485 (planned) — UAV upload endpoint + quality gate (consumer of AZ-484's contract)
+
 ## Total Effort
 
 Step 6: 6 tasks, 17 story points
 Step 8 (02-coupling-refactoring): 6 tasks, 17 story points
 Step 8 (03-code-quality-refactoring): 27 tasks, ~66 story points
+Step 9 cycle 1: 1 task created (AZ-484, 5 pts); 1 deferred (AZ-485)
 
 ## Coverage Verification
 
diff --git a/_docs/02_tasks/todo/AZ-484_multi_source_tile_storage.md b/_docs/02_tasks/todo/AZ-484_multi_source_tile_storage.md
new file mode 100644
index 0000000..25ba527
--- /dev/null
+++ b/_docs/02_tasks/todo/AZ-484_multi_source_tile_storage.md
@@ -0,0 +1,157 @@
+# Multi-source tile storage schema (source + captured_at)
+
+**Task**: AZ-484_multi_source_tile_storage
+**Name**: Multi-source tile storage schema
+**Description**: Extend `tiles` to allow one row per `(cell, source)`; add `captured_at`; update repository read selection to most-recent-across-sources; backfill existing rows; publish v1.0.0 storage contract; amend Architecture Vision.
+**Complexity**: 5 points
+**Dependencies**: None (builds on the post-AZ-357 / C06 schema state)
+**Component**: DataAccess + Common (new TileSource enum) + TileDownloader (BuildTileEntity)
+**Tracker**: AZ-484
+**Epic**: AZ-483
+
+## Problem
+
+The `tiles` table currently allows exactly one row per `(latitude, longitude, tile_zoom, tile_size_meters)` (post-AZ-357 / C06). The Architecture Vision in `architecture.md` already commits to a Layer 1 + Layer 2 model where Google Maps and UAV imagery coexist per cell, but the schema cannot represent two producers for the same geographic cell. T1 (this task) closes that gap so that T2 (UAV upload, AZ-485) has a place to write its rows. Without T1, T2 would have to scaffold a temporary single-source path and immediately throw it away.
+
+## Outcome
+
+- `tiles` table accepts multiple rows per `(latitude, longitude, tile_zoom, tile_size_meters)` distinguished by `source` (enum) and stamps `captured_at` (UTC timestamp).
+- Repository read paths return the most-recent row per cell across sources, deterministically (`captured_at DESC`, then `updated_at DESC`, then `id DESC`).
+- All existing rows are backfilled to `source='google_maps'`, `captured_at=created_at` — zero orphan rows.
+- `architecture.md` § Architecture Vision reflects the N-source model rather than only Layer 1 + Layer 2.
+- `_docs/02_document/contracts/data-access/tile-storage.md` v1.0.0 published and referenced from this task and from the future T2 task (AZ-485).
+- Region/route flows behave identically to today (Google Maps remains the only producer until T2 lands).
+- All 200 unit + 5 smoke tests pass.
+
+## Scope
+
+### Included
+
+- Migration `SatelliteProvider.DataAccess/Migrations/013_AddTileSourceAndCapturedAt.sql`:
+  - Add `source VARCHAR(32)` and `captured_at TIMESTAMP` columns, nullable at first (PostgreSQL rejects `ADD COLUMN ... NOT NULL` without a default on a populated table).
+  - Backfill existing rows: `source='google_maps'`, `captured_at = created_at`; then set both columns `NOT NULL`.
+  - Drop existing unique index `idx_tiles_unique_location` (4-column).
+  - Create new unique index `idx_tiles_unique_location_source` over `(latitude, longitude, tile_zoom, tile_size_meters, source)`.
+  - Wrap the entire migration in a single transaction.
+- New enum `SatelliteProvider.Common.Enums.TileSource { GoogleMaps, Uav }`, string-stored via the existing `EnumStringTypeHandler` pattern (matches AZ-370 status / point-type enums).
+- `SatelliteProvider.DataAccess.Models.TileEntity`: add `Source` (TileSource) and `CapturedAt` (DateTime) fields.
+- `SatelliteProvider.DataAccess.Repositories.TileRepository`:
+  - Update `ColumnList` constant to include `source` and `captured_at`.
+  - Update `GetByTileCoordinatesAsync` and `GetTilesByRegionAsync` to apply the most-recent-across-sources selection rule; `GetByIdAsync` remains a primary-key lookup with no source filter (per the contract) and only needs to return the new columns.
+  - Update `InsertAsync` to UPSERT on the new 5-column key, also setting `captured_at` and (re)setting `source` from the entity.
+  - Update `UpdateAsync` SET clause to include `source` and `captured_at`.
+- `SatelliteProvider.Services.TileDownloader.TileService.BuildTileEntity`: set `Source = TileSource.GoogleMaps` and `CapturedAt = DateTime.UtcNow`.
+- Update existing tests that construct `TileEntity` or mock `ITileRepository` to populate the new fields (12+ sites: `TileServiceTests` ~10, `RegionServiceTests` ~3, `RepositoryRefactorTests` ColumnList assertion, `InfrastructureTests` 2 mocks).
+- New unit tests:
+  - Repository read selection: insert two rows for the same cell with different sources and `captured_at`, assert the latest one is returned by both `GetByTileCoordinatesAsync` and `GetTilesByRegionAsync`.
+  - Repository same-source UPSERT: re-insert a `uav` row, assert single row remains with updated `captured_at` and `file_path`.
+  - Migration backfill: pre/post row count + per-row `source='google_maps'` + `captured_at = created_at` invariants.
+- Documentation:
+  - `_docs/02_document/architecture.md` § Architecture Vision amendment (Layer 1 + Layer 2 → N sources, append-by-source, latest-across-sources on read).
+  - `_docs/02_document/glossary.md`: add `Tile Source` and `Captured At` rows; update `Layer 1` / `Layer 2` rows to acknowledge the N-source generalization.
+  - `_docs/02_document/module-layout.md` § Component: Common Public API: list `SatelliteProvider.Common/Enums/TileSource.cs`.
+- Contract file `_docs/02_document/contracts/data-access/tile-storage.md` v1.0.0 (already drafted; flip Status from `draft` to `frozen` upon implementation completion).
+
+### Excluded
+
+- `POST /api/satellite/upload` implementation — handled by T2 (AZ-485).
+- Any quality assessment, threshold gating, or rejection logic — T2.
+- Multi-revision / season-based / per-source historical retention — out of scope by design (the quality gate, not history retention, is the cache-freshness mechanism).
+- Dropping vestigial `version` or `maps_version` columns (per `coderule.mdc` — column drops require explicit confirmation; user has confirmed leaving them).
+- Public HTTP response shape changes (no new fields on `DownloadTileResponse` or any other response).
+
+## Acceptance Criteria
+
+**AC-1: Schema accepts source + captured_at**
+Given the migration has run
+When a tile row is inserted with `source='uav'` for a cell that already has a `source='google_maps'` row
+Then both rows are stored and the unique index does not reject the insert.
+
+**AC-2: Read returns most-recent across sources**
+Given a cell has two rows: one `source='google_maps' captured_at=T1` and one `source='uav' captured_at=T2 > T1`
+When `GetByTileCoordinatesAsync` (or region read) is called for that cell
+Then the row with `captured_at=T2` is returned (single row).
+
+**AC-3: Same-source UPSERT**
+Given a cell has a row `source='uav' captured_at=T1`
+When another insert arrives for `source='uav' captured_at=T2 > T1`
+Then exactly one `source='uav'` row remains for that cell, with `captured_at=T2` and `file_path` updated.
+
+**AC-4: Backfill leaves no orphans**
+Given the database had N rows in `tiles` before migration
+When `013_AddTileSourceAndCapturedAt.sql` runs
+Then the table still has N rows, every row has `source='google_maps'`, and every row has `captured_at = created_at`.
+
+**AC-5: Google Maps download path uses new fields**
+Given a tile is downloaded via `TileService.DownloadAndStoreSingleTileAsync` or `DownloadAndStoreTilesAsync`
+When the row is inspected
+Then `source = 'google_maps'` and `captured_at` is the UTC timestamp at download time.
+
+**AC-6: Existing flows unchanged**
+Given the post-T1 build
+When `scripts/run-tests.sh --smoke` runs
+Then all 200 unit + 5 smoke scenarios pass with no functional change to region/route output (CSV row count, stitched-image presence, ZIP size, status transitions all match pre-T1 baseline).
+
+**AC-7: Vision and contract documents updated**
+Given T1 is complete
+When `architecture.md`, `glossary.md`, `module-layout.md`, and `_docs/02_document/contracts/data-access/tile-storage.md` are inspected
+Then the multi-source model is documented in all four, the contract is v1.0.0 with Status `frozen`, and the contract is referenced from this task spec.
+
+## Non-Functional Requirements
+
+**Performance**
+- `GetTilesByRegionAsync` 95th-percentile latency must not regress more than 10% vs the pre-T1 baseline. The new 5-column unique index covers the existing read filter, so no regression is expected; the slow-query log threshold remains 500 ms (`TileRepository.SlowQueryThresholdMs`).
+
+**Compatibility**
+- No public HTTP response field added or removed.
+- Existing `tiles.maps_version` and `tiles.version` columns remain present and nullable; no consumer reads them in v1.0.0 of the storage contract.
+
+**Reliability**
+- Migration MUST be transactional. A failure mid-migration MUST leave the table in its pre-migration state — never with a missing unique index or partially backfilled rows.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | `TileRepository.InsertAsync` with two distinct `source` values for the same cell | Both rows inserted; `GetTilesByRegionAsync` returns at most one of them per the selection rule. |
+| AC-2 | `TileRepository.GetByTileCoordinatesAsync` against a cell with rows of distinct sources | Row with the highest `captured_at` returned; tie-break by `updated_at DESC` then `id DESC` is deterministic across runs. |
+| AC-3 | `TileRepository.InsertAsync` re-inserting a `uav` row for the same cell | Exactly one `uav` row remains with the new `captured_at` and `file_path`. |
+| AC-5 | `TileService.BuildTileEntity` (via `DownloadAndStoreSingleTileAsync` test) | Resulting entity has `Source = TileSource.GoogleMaps` and `CapturedAt` close to `DateTime.UtcNow`. |
+| AC-6 | All existing `TileServiceTests`, `RegionServiceTests`, `RepositoryRefactorTests`, `InfrastructureTests` after mock fixups | Suite remains green; `RepositoryRefactorTests` ColumnList assertion updated to include the new columns. |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|--------------------------|--------------|-------------------|-----------------|
+| AC-4 | Apply migration 013 to a fixture DB seeded with the migration-012 schema and ≥3 sample rows | Run the migration | Pre-count = post-count, every row has `source='google_maps'`, every row has `captured_at = created_at`, the new unique index exists, the old one does not | Reliability |
+| AC-6 | Pre-T1 baseline output captured for `BT-01` (single tile), `BT-03` (200m region), `BT-08` (route+ZIP) | Re-run `scripts/run-tests.sh --smoke` after T1 | All 5 smoke scenarios PASS; `./ready/` outputs match the pre-T1 byte-shape (file existence, row counts in CSV, ZIP size within ±1%) | Performance, Compatibility |
+
+## Constraints
+
+- DB column drops require explicit user confirmation (`coderule.mdc`); none performed in this task. The vestigial `version` and `maps_version` columns remain.
+- No rename of any existing column.
+- Migration must be transactional and must fail loudly on any inconsistency rather than silently corrupt state — per `coderule.mdc` "never suppress errors silently".
+- Cross-component calls continue to flow through `ITileRepository` and `ITileService` interfaces in `Common`; no new compile-time `ProjectReference` between Layer-3 sibling components (the AZ-309 invariant).
+
+## Risks & Mitigation
+
+**Risk 1: Migration fails partway against a non-empty production-like DB**
+- *Risk*: A failure mid-migration leaves the table without its unique index or with partially backfilled rows.
+- *Mitigation*: Wrap the entire migration in `BEGIN; ... COMMIT;`. Verify on a Docker-Compose dev DB with seed data before applying anywhere else. Add a dedicated unit/integration test that runs the migration against a fixture DB and asserts the post-state.
+
+**Risk 2: Read-path selection rule mis-interacts with existing region cache hit logic**
+- *Risk*: `TileService.DownloadAndStoreTilesAsync` does an existence check via `GetTilesByRegionAsync` and then re-downloads missing tiles. If the most-recent-per-cell rule changes which row is returned, region processing may double-download or skip cached tiles.
+- *Mitigation*: AC-6 explicitly asserts pre/post equivalence. Smoke-test scenario `BT-03` (200m region) exercises the cache-hit path; pre/post tile-row counts must match.
+
+**Risk 3: TileEntity field additions break test mock construction broadly**
+- *Risk*: ~12 test sites construct `TileEntity` directly or set up `Mock<ITileRepository>`. A missed site leaves a default-zero `CapturedAt` that may fail ORDER BY tie-breaks non-deterministically.
+- *Mitigation*: Pre-implementation audit of all `new TileEntity` and `Mock<ITileRepository>` sites; the implementer must list them in the batch report. The `RepositoryRefactorTests` ColumnList assertion catches incomplete repository updates.
+
+**Risk 4: `EnumStringTypeHandler` registration drift**
+- *Risk*: The existing handler may need an extra registration for `TileSource` to round-trip through Dapper. AZ-370 added similar enums; the registration site must be checked.
+- *Mitigation*: Verify and extend the handler registration as part of this task (not a separate ticket).
+
+## Contract
+
+This task produces the contract at `_docs/02_document/contracts/data-access/tile-storage.md` (v1.0.0).
+Consumers (T2 — UAV upload endpoint AZ-485, future SatAR provider, etc.) MUST read that file — not this task spec — to discover the storage interface.
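
The same-source UPSERT and the most-recent-across-sources read described in the contract can be sketched as follows, using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The trimmed schema, the ISO-8601 text timestamps, and the helper names (`make_row`, `demo`) are illustrative assumptions and not part of the patch; only the conflict target, the `DO UPDATE` column list, and the `ORDER BY` tuple are taken from the contract text.

```python
import sqlite3

# Assumption: SQLite replaces PostgreSQL and the schema keeps only the columns
# that the write and read rules touch. Timestamps are ISO-8601 text, which
# sorts chronologically as plain strings.
SCHEMA = """
CREATE TABLE tiles (
    id               TEXT PRIMARY KEY,
    tile_zoom        INTEGER NOT NULL,
    tile_x           INTEGER NOT NULL,
    tile_y           INTEGER NOT NULL,
    latitude         REAL NOT NULL,
    longitude        REAL NOT NULL,
    tile_size_meters REAL NOT NULL,
    file_path        TEXT NOT NULL,
    source           TEXT NOT NULL,
    captured_at      TEXT NOT NULL,
    updated_at       TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_tiles_unique_location_source
    ON tiles (latitude, longitude, tile_zoom, tile_size_meters, source);
"""

# Per-source UPSERT: a second write from the same source to the same cell
# overwrites the existing row instead of adding a new one.
UPSERT = """
INSERT INTO tiles (id, tile_zoom, tile_x, tile_y, latitude, longitude,
                   tile_size_meters, file_path, source, captured_at, updated_at)
VALUES (:id, :tile_zoom, :tile_x, :tile_y, :latitude, :longitude,
        :tile_size_meters, :file_path, :source, :captured_at, :updated_at)
ON CONFLICT (latitude, longitude, tile_zoom, tile_size_meters, source)
DO UPDATE SET file_path   = excluded.file_path,
              tile_x      = excluded.tile_x,
              tile_y      = excluded.tile_y,
              captured_at = excluded.captured_at,
              updated_at  = excluded.updated_at
"""

# Selection rule: newest captured_at wins, then updated_at, then id.
MOST_RECENT = """
SELECT id, source, captured_at, file_path
FROM tiles
WHERE tile_zoom = ? AND tile_x = ? AND tile_y = ?
ORDER BY captured_at DESC, updated_at DESC, id DESC
LIMIT 1
"""


def make_row(row_id, source, captured_at, file_path):
    """Build one row for a fixed, hypothetical cell (zoom 18, x 10, y 20)."""
    return {
        "id": row_id, "tile_zoom": 18, "tile_x": 10, "tile_y": 20,
        "latitude": 50.0, "longitude": 30.0, "tile_size_meters": 100.0,
        "file_path": file_path, "source": source,
        "captured_at": captured_at, "updated_at": captured_at,
    }


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(UPSERT, make_row("a", "google_maps", "2026-01-01T00:00:00", "g.jpg"))
    conn.execute(UPSERT, make_row("b", "uav", "2026-02-01T00:00:00", "u1.jpg"))
    # Same source, same cell: updates row "b" in place rather than inserting "c".
    conn.execute(UPSERT, make_row("c", "uav", "2026-03-01T00:00:00", "u2.jpg"))
    total = conn.execute("SELECT COUNT(*) FROM tiles").fetchone()[0]
    latest = conn.execute(MOST_RECENT, (18, 10, 20)).fetchone()
    return total, latest
```

In this sketch, three upserts leave two rows (one per source), and the read returns the `uav` row carrying the later `captured_at` and the replaced `file_path`, which mirrors AC-2 and AC-3.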
diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md
index 33160e2..b0f304f 100644
--- a/_docs/_autodev_state.md
+++ b/_docs/_autodev_state.md
@@ -2,13 +2,13 @@
 
 ## Current Step
 flow: existing-code
-step: 9
-name: New Task
+step: 10
+name: Implement
 status: not_started
 sub_step:
   phase: 0
   name: awaiting-invocation
-  detail: ""
+  detail: "AZ-484 only; T2 (AZ-485) deferred to a future cycle"
 retry_count: 0
 cycle: 1
 tracker: jira
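
Since the next step is implementing AZ-484, the transactional behaviour required of migration 013 (AC-4 and the Reliability requirement) can be sketched as below. This is a hedged illustration that again uses Python's `sqlite3` in place of PostgreSQL; the legacy schema is trimmed, and `migrate`, `seeded_db`, and `index_names` are hypothetical helper names. In PostgreSQL the real migration would additionally set both new columns `NOT NULL` after the backfill; SQLite cannot change a column's nullability, so the sketch checks for leftover NULLs instead.

```python
import sqlite3

# Assumption: trimmed pre-migration schema with the legacy 4-column index.
LEGACY_SCHEMA = """
CREATE TABLE tiles (
    id               TEXT PRIMARY KEY,
    latitude         REAL NOT NULL,
    longitude        REAL NOT NULL,
    tile_zoom        INTEGER NOT NULL,
    tile_size_meters REAL NOT NULL,
    created_at       TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_tiles_unique_location
    ON tiles (latitude, longitude, tile_zoom, tile_size_meters);
"""

# Order matters: add nullable columns, backfill, then swap the unique index.
MIGRATION_STEPS = [
    "ALTER TABLE tiles ADD COLUMN source TEXT",
    "ALTER TABLE tiles ADD COLUMN captured_at TEXT",
    "UPDATE tiles SET source = 'google_maps', captured_at = created_at",
    "DROP INDEX idx_tiles_unique_location",
    "CREATE UNIQUE INDEX idx_tiles_unique_location_source "
    "ON tiles (latitude, longitude, tile_zoom, tile_size_meters, source)",
]


def migrate(conn, steps=MIGRATION_STEPS):
    """Run every step in one transaction; any failure rolls all of them back."""
    conn.execute("BEGIN")
    try:
        for statement in steps:
            conn.execute(statement)
        # Fail loudly if the backfill left any row without the new fields.
        missing = conn.execute(
            "SELECT COUNT(*) FROM tiles "
            "WHERE source IS NULL OR captured_at IS NULL"
        ).fetchone()[0]
        if missing:
            raise RuntimeError(f"{missing} rows were not backfilled")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def seeded_db():
    """Create an in-memory DB in the pre-migration state with three rows."""
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT
    # in migrate() are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(LEGACY_SCHEMA)
    conn.executemany(
        "INSERT INTO tiles VALUES (?, ?, ?, ?, ?, ?)",
        [(f"t{i}", 50.0 + i, 30.0, 18, 100.0, f"2026-01-0{i + 1}T00:00:00")
         for i in range(3)],
    )
    return conn


def index_names(conn):
    """List the explicitly created indexes (names starting with 'idx')."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name LIKE 'idx%'"
    )
    return sorted(name for (name,) in rows)
```

A fixture-style check would call `migrate(seeded_db())` and compare row counts, backfilled values, and `index_names` before and after; passing a step list that contains a failing statement should leave the legacy index and column set untouched.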