Update autodev state, architecture documentation, and glossary terms

Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
Oleksandr Bezdieniezhnykh
2026-05-10 00:21:34 +03:00
parent 723f574b14
commit 64542d32fc
52 changed files with 8789 additions and 88 deletions
@@ -0,0 +1,172 @@
# C6 — Tile Cache + Spatial Index
## 1. High-Level Overview
**Purpose**: own the on-companion persistent imagery store (tiles + descriptor index + tile metadata) in a layout that is byte-identical to `satellite-provider`'s on-disk format, so the C11 `TileUploader` post-landing upload (F10) is a straight copy. Provide read access to C2 (descriptor index), C2.5 / C3 (tile pixels), and write access to the orthorectifier path during F4 (mid-flight tile generation), to the C11 `TileDownloader` during F1 (pre-flight tile fetch), and to C10 during F1 (Manifest + FAISS index write).
**Architectural Pattern**: Repository — three concrete stores behind separate interfaces (`TileStore` for pixel + metadata I/O; `TileMetadataStore` for the Postgres spatial index; `DescriptorIndex` for FAISS HNSW). Single concrete implementation per interface today (`PostgresFilesystemStore`, `FaissDescriptorIndex`); future variants (e.g., RocksDB-backed metadata for resource-constrained tiers) can be added behind the same interfaces.
**Upstream dependencies**:
- C11 `TileDownloader` (writes `tiles` rows + JPEGs during F1 pre-flight provisioning, source='googlemaps').
- C10 CacheProvisioner (writes Manifest + FAISS index during F1 pre-flight provisioning, after C11 has populated tiles).
- C5 → orthorectifier → C6 (writes during F4 mid-flight tile generation, source='onboard_ingest').
- C11 `TileUploader` (reads during F10).
**Downstream consumers**:
- C2 VPR (reads descriptor index).
- C2.5 ReRanker (reads tile pixels).
- C3 CrossDomainMatcher (reads tile pixels).
## 2. Internal Interfaces
### Interface: `TileStore`
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `read_tile_pixels` | `tile_id: TileId` | `TilePixelHandle` (mmap-backed) | No | `TileNotFoundError`, `TileFsError` |
| `write_tile` | `tile_blob: bytes, metadata: TileMetadata` | `None` | No | `TileFsError`, `TileMetadataError` |
| `tile_exists` | `tile_id: TileId` | `bool` | No | — |
### Interface: `TileMetadataStore`
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `query_by_bbox` | `bbox, zoom` | `list[TileMetadata]` | No | `TileMetadataError` |
| `insert_metadata` | `TileMetadata` | `None` | No | `TileMetadataError` |
| `mark_voting_status` | `tile_id, status: VotingStatus` | `None` | No | `TileMetadataError` |
### Interface: `DescriptorIndex`
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `search_topk` | `query: ndarray[D], k: int` | `list[(tile_id, distance)]` | No | `IndexUnavailableError` |
| `descriptor_dim` | `()` | `int` | No | — |
| `mmap_handle` | `()` | filesystem path / FAISS reader handle | No | — |
**Input/Output DTOs**:
```
TileId: composite (zoomLevel: int, lat: float, lon: float)
TileMetadata:
tile_id: TileId
tile_size_meters: float
tile_size_pixels: int
capture_timestamp: ISO 8601 datetime
source: enum {googlemaps, onboard_ingest}
freshness_label: enum {fresh, stale_active_conflict, stale_rear, downgraded}
flight_id: uuid (optional, set for onboard_ingest)
companion_id: string (optional, set for onboard_ingest)
quality_metadata: TileQualityMetadata (optional, set for onboard_ingest)
voting_status: enum {pending, trusted, rejected} (default pending for onboard_ingest)
TileQualityMetadata: see data_model.md (TileQualityMetadata entity)
TilePixelHandle: opaque (filesystem path + mmap pointer; consumer must not copy)
```
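The interface tables and DTOs above could be expressed in Python roughly as follows. This is a minimal sketch, not the project's code: the use of `typing.Protocol`, dataclasses, string-valued enums, the `(min_lat, min_lon, max_lat, max_lon)` bbox ordering, and the `InMemoryMetadataStore` test double are all assumptions for illustration. Only the method names, field names, and enum members come from the tables above; `quality_metadata` is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional, Protocol, Tuple


class Source(str, Enum):
    GOOGLEMAPS = "googlemaps"
    ONBOARD_INGEST = "onboard_ingest"


class VotingStatus(str, Enum):
    PENDING = "pending"
    TRUSTED = "trusted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class TileId:
    # Composite key; frozen so it can be used as a dict key.
    zoomLevel: int
    lat: float
    lon: float


@dataclass
class TileMetadata:
    tile_id: TileId
    tile_size_meters: float
    tile_size_pixels: int
    capture_timestamp: datetime
    source: Source
    freshness_label: str  # one of the freshness_label enum values
    flight_id: Optional[str] = None       # set for onboard_ingest
    companion_id: Optional[str] = None    # set for onboard_ingest
    voting_status: Optional[VotingStatus] = None


# Assumed bbox ordering: (min_lat, min_lon, max_lat, max_lon).
BBox = Tuple[float, float, float, float]


class TileMetadataStore(Protocol):
    def query_by_bbox(self, bbox: BBox, zoom: int) -> List[TileMetadata]: ...
    def insert_metadata(self, metadata: TileMetadata) -> None: ...
    def mark_voting_status(self, tile_id: TileId, status: VotingStatus) -> None: ...


class InMemoryMetadataStore:
    """Hypothetical test double; satisfies TileMetadataStore structurally."""

    def __init__(self) -> None:
        self._rows: Dict[TileId, TileMetadata] = {}

    def insert_metadata(self, metadata: TileMetadata) -> None:
        self._rows[metadata.tile_id] = metadata

    def query_by_bbox(self, bbox: BBox, zoom: int) -> List[TileMetadata]:
        min_lat, min_lon, max_lat, max_lon = bbox
        return [
            m for t, m in self._rows.items()
            if t.zoomLevel == zoom
            and min_lat <= t.lat <= max_lat
            and min_lon <= t.lon <= max_lon
        ]

    def mark_voting_status(self, tile_id: TileId, status: VotingStatus) -> None:
        self._rows[tile_id].voting_status = status
```

Keeping each store behind a structural protocol is one way to let a future variant (such as the RocksDB-backed metadata store mentioned in Section 1) be swapped in without touching callers.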
## 3. External API Specification
Not applicable — internal-only; C11 `TileUploader` reads via `TileStore` for upload to `satellite-provider` over an external HTTP API, and C11 `TileDownloader` writes via `TileStore` after fetching from `satellite-provider`. C11 owns those API calls, not C6.
## 4. Data Access Patterns
### Queries
| Query | Frequency | Hot Path | Index Needed |
|-------|-----------|----------|--------------|
| `read_tile_pixels` (3 candidates per frame from C2.5/C3) | 9 Hz | Yes | Filesystem mmap (page-cache backed) |
| `search_topk` from C2 (top-K=10) | 3 Hz | Yes | FAISS HNSW (.index file) |
| `query_by_bbox` from C10/C11/C12 (cache build + post-landing) | offline / pre-flight | No | btree spatial index on (zoomLevel, lat, lon) |
| `insert_metadata` + `write_tile` from F4 orthorectifier | bursty during flight (≤ a few Hz on average) | No (background) | btree spatial index |
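To illustrate the `query_by_bbox` access pattern, the sketch below runs a bounding-box lookup against a composite btree index. It uses an in-memory SQLite database purely as a stand-in for Postgres so the example is self-contained; the column names (`zoom_level`, `lat`, `lon`, `source`) and the bbox ordering are assumptions, since the real schema mirrors `satellite-provider`.

```python
import sqlite3

# SQLite stands in for Postgres here; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (zoom_level INTEGER, lat REAL, lon REAL, source TEXT)")
# Composite btree index on (zoom, lat, lon), as in the table above.
conn.execute("CREATE INDEX tiles_zoom_lat_lon ON tiles (zoom_level, lat, lon)")
conn.executemany(
    "INSERT INTO tiles VALUES (?, ?, ?, ?)",
    [
        (18, 49.94, 36.31, "googlemaps"),
        (18, 49.71, 37.45, "onboard_ingest"),
        (17, 49.94, 36.31, "googlemaps"),
    ],
)


def query_by_bbox(conn, bbox, zoom):
    """Return rows inside bbox = (min_lat, min_lon, max_lat, max_lon) at one zoom."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return conn.execute(
        "SELECT zoom_level, lat, lon, source FROM tiles "
        "WHERE zoom_level = ? AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY lat, lon",
        (zoom, min_lat, max_lat, min_lon, max_lon),
    ).fetchall()


rows = query_by_bbox(conn, (49.9, 36.0, 50.0, 36.5), zoom=18)
```

With this index shape, the equality on zoom and the range on latitude can narrow the index scan, while the longitude bound is applied as a filter over the rows that remain.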
### Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|------|-----------|-----|-------------|
| Tile pixels | OS page cache (filesystem mmap) | flight lifetime | None — files are append-only during flight; F1 wipes for next flight |
| FAISS descriptor index | mmap (`IO_FLAG_MMAP_IFC`) | flight lifetime | F1 rebuild + content-hash gate |
| Tile metadata rows | Postgres internal page cache + connection pool | flight lifetime | F4 inserts append; F1 may truncate-and-reload |
### Storage Estimates
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|-----------------|---------------------|----------|------------|-------------|
| `tiles` (Postgres rows) | ~ tens to hundreds of thousands per cached area | ~256 B per row | up to ~100 MB | bounded by 10 GB AC-8.3 cache budget overall |
| `tiles/{zoomLevel}/{x}/{y}.jpg` | matches `tiles` row count | 50–200 KB / tile | up to ~9 GB (after metadata/index overheads) | bounded by AC-8.3 |
| FAISS HNSW `.index` | one file per provisioning | 200 MB–1 GB depending on backbone descriptor dim | bounded by AC-8.3 carve-out per D-C2-6/9/10 | F1 atomic rebuild |
| Onboard mid-flight ingest tiles (F4) | a few hundred per flight | same as above | bounded by AC-8.3 carve-out | per flight |
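As a quick sanity check on the estimates above, the following arithmetic sketch derives how many tiles fit in the ~9 GB pixel budget, reading the per-tile size as a 50 to 200 KB range and using the ~256 B row size. Decimal units (1 KB = 1000 B) are an assumption; the figures are back-of-envelope, not measured.

```python
KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3

tile_budget_bytes = 9 * GB          # pixel share of the 10 GB cache budget
row_bytes = 256                     # approximate metadata row size

# Tile count bounds for the smallest and largest per-tile JPEG sizes.
max_tiles_small_jpeg = tile_budget_bytes // (50 * KB)
max_tiles_large_jpeg = tile_budget_bytes // (200 * KB)

# Metadata footprint at the upper tile-count bound.
metadata_mb_upper = max_tiles_small_jpeg * row_bytes / MB

print(max_tiles_large_jpeg, max_tiles_small_jpeg, metadata_mb_upper)
```

Under these assumptions the pixel budget holds between 45,000 and 180,000 tiles, and the matching metadata rows take about 46 MB, which agrees with the "tens to hundreds of thousands" row count and the "up to ~100 MB" row total in the table.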
### Data Management
**Seed data**: F1 pre-flight provisioning is the seeding step (operator runs C12 → C11 `TileDownloader` (tiles) → C10 (engines + descriptors + manifest) → C6).
**Rollback**: F1 is idempotent (D-C10-1 manifest-hash check). Mid-flight writes during F4 are append-only; on F8 reboot, partially-written tiles are detected via the SHA-256 content-hash gate (D-C10-3) at takeoff, and the corrupted tile is dropped.
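The takeoff-time gate described above can be sketched as a scan that recomputes each tile's SHA-256 and drops any tile whose sidecar is missing or does not match. This is an illustrative sketch only: the `.sha256` sidecar naming and the `drop_corrupted_tiles` helper are hypothetical, and removal of the matching metadata row is not shown.

```python
import hashlib
from pathlib import Path
from typing import List


def sidecar_path(tile_path: Path) -> Path:
    # Assumed naming convention: "<tile>.jpg.sha256" next to the tile.
    return tile_path.with_name(tile_path.name + ".sha256")


def drop_corrupted_tiles(root: Path) -> List[Path]:
    """Delete tiles whose content hash is missing or mismatched; return them."""
    dropped = []
    # sorted() materialises the listing before any file is deleted.
    for tile in sorted(root.rglob("*.jpg")):
        sidecar = sidecar_path(tile)
        expected = sidecar.read_text().strip() if sidecar.exists() else None
        actual = hashlib.sha256(tile.read_bytes()).hexdigest()
        if expected != actual:
            tile.unlink()
            sidecar.unlink(missing_ok=True)  # Python 3.8+
            dropped.append(tile)
    return dropped
```

A partially written tile would either lack a sidecar or hash differently from the recorded digest, so both cases fall into the same drop branch.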
## 5. Implementation Details
**Algorithmic Complexity**:
- HNSW search: approximately `O(log N)` in corpus size (expected behaviour of the graph search, not a worst-case bound).
- bbox query: btree index `O(log N)` per probe; result-set scan `O(R)` in matched rows.
**State Management**:
- `TileStore` is stateless (filesystem-backed).
- `TileMetadataStore` holds a Postgres connection pool.
- `DescriptorIndex` holds the FAISS reader + mmap'd index file handle; lifetime = flight.
**Key Dependencies**:
| Library | Version | Purpose |
|---------|---------|---------|
| PostgreSQL (server + libpq) | 16.x (mirror of `satellite-provider`'s pin) | Spatial metadata index |
| psycopg / asyncpg | per project pin | Python Postgres client |
| FAISS (Python + C++) | upstream HEAD pinned per Plan-phase | HNSW retrieval |
| atomicwrites | latest | Atomic file replacement for `.index` rebuild (D-C10-3) |
| hashlib (stdlib) | stdlib | SHA-256 content-hash sidecars |
**Error Handling Strategy**:
- `TileNotFoundError`: tile pixel file missing despite metadata row present. Log + drop the candidate; signal cache inconsistency to C13 for FDR.
- `TileFsError`: I/O error on filesystem read/write. F4 writes retry once with backoff; reads do not retry (the candidate is dropped).
- `TileMetadataError`: Postgres failure. Pre-flight: F1 fails fast; takeoff blocked. Mid-flight: F4 writes drop the tile (logged); reads for C2's top-K fail to enrich and downstream uses descriptor distances only (degraded but not fatal).
- `IndexUnavailableError`: FAISS handle invalid (e.g., file replaced concurrently). Treated as fatal in flight; F8 reboot recovery re-mmaps.
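The error types and the "retry once with backoff" rule for F4 writes could look roughly like this. The common `TileCacheError` base class, the `write_with_single_retry` helper, and the default backoff value are assumptions introduced for the sketch; only the four error names come from the list above.

```python
import time


class TileCacheError(Exception):
    """Hypothetical common base class for C6 errors."""


class TileNotFoundError(TileCacheError):
    """Pixel file missing despite a metadata row being present."""


class TileFsError(TileCacheError):
    """I/O error on filesystem read or write."""


class TileMetadataError(TileCacheError):
    """Postgres failure."""


class IndexUnavailableError(TileCacheError):
    """FAISS handle invalid."""


def write_with_single_retry(write_fn, backoff_s=0.05, sleep=time.sleep):
    """Run write_fn; on TileFsError wait once and retry exactly one time.

    A second TileFsError propagates to the caller, which can then log
    and drop the tile. Other error types are never retried.
    """
    try:
        return write_fn()
    except TileFsError:
        sleep(backoff_s)
        return write_fn()
```

Reads would call the underlying function directly, without this wrapper, so a failed read drops the candidate immediately.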
## 6. Extensions and Helpers
| Helper | Purpose | Used By |
|--------|---------|---------|
| `WgsConverter` | shared with C4, C5, C8 | C4, C5, C6 (bbox queries), C8 |
| `Sha256Sidecar` | atomic write + SHA-256 content-hash sidecar pattern (D-C10-3) | C6, C10 |
| `OrthorectifierUtils` | minimal orthorectification math (used by F4 write path) | C5, C6 — keep inside the F4 boundary; if needed by more components, promote |
## 7. Caveats & Edge Cases
**Known limitations**:
- The on-disk layout MUST mirror `satellite-provider` exactly so F10 upload is byte-identical. Any deviation breaks AC-8.4. Verified by NFT-SEC-01 and IT-7.
- Postgres on a Jetson is heavyweight; runtime resource budget validated by NFT-LIM-01 (8 h replay).
**Potential race conditions**:
- F4 writes can race with C2/C2.5/C3 reads on the same tile filesystem. Solution: F4 writes use `atomicwrites` for atomic rename; readers see either the old version or the new version, never partial.
- FAISS `.index` mmap is read-only during a flight; F1 never overwrites it in flight because F1 runs pre-flight only.
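The write-then-rename idea behind the first race-condition mitigation can be sketched with the standard library alone. The spec names the `atomicwrites` package; the code below deliberately swaps in `tempfile` plus `os.replace` to show the same pattern without that dependency, and `atomic_write_bytes` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(dest: Path, data: bytes) -> None:
    """Write data to a temp file in dest's directory, then rename over dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as dest, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # push bytes to disk before the rename
        os.replace(tmp_name, dest)    # atomic replacement of dest
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

On POSIX filesystems a reader that already has the old file mapped keeps seeing the old contents after the rename, while new opens see the new file, which is the "old or new, never partial" property described above.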
**Performance bottlenecks**:
- Postgres metadata writes during F4 are bursty; if bursts approach 10 Hz, the Postgres connection becomes a bottleneck. Bound by configurable backpressure on the F4 path.
- FAISS HNSW first query pays the page-in cost (multi-second). F2 takeoff load forces a warm-up query so F3 first frame is cheap.
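One way to picture the "configurable backpressure on the F4 path" is a bounded queue between the orthorectifier and the metadata writer. The sketch below is an assumption about shape, not the actual design: `BoundedTileWriteQueue`, `offer`, and `max_depth` are hypothetical names, and a rejected offer is counted so the producer can log it rather than drop it silently.

```python
import queue


class BoundedTileWriteQueue:
    """Hypothetical bounded hand-off between F4 producers and the DB writer."""

    def __init__(self, max_depth: int) -> None:
        self._q = queue.Queue(maxsize=max_depth)
        self.rejected = 0  # number of offers refused because the queue was full

    def offer(self, item, timeout_s: float = 0.0) -> bool:
        """Try to enqueue; return False (and count it) when the queue is full."""
        try:
            if timeout_s > 0:
                self._q.put(item, timeout=timeout_s)
            else:
                self._q.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Dequeue the oldest pending write; raises queue.Empty when idle."""
        return self._q.get_nowait()

    def depth(self) -> int:
        return self._q.qsize()
```

Returning `False` pushes the decision back to the producer, which can slow down, retry with a timeout, or log the refusal.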
## 8. Dependency Graph
**Must be implemented after**: external `satellite-provider` on-disk layout reference (read its README + table schema; mirror exactly).
**Can be implemented in parallel with**: C7 — independent paths.
**Blocks**: C2, C2.5, C3, C10, C11 (both `TileDownloader` and `TileUploader`), F1, F2, F3, F4, F8, F10.
## 9. Logging Strategy
| Log Level | When | Example |
|-----------|------|---------|
| ERROR | `TileNotFoundError` (cache inconsistency); `IndexUnavailableError` | `Tile pixel missing for tile_id=(z=18,lat=49.71,lon=37.45)` |
| WARN | F4 write retried; bbox query slower than threshold | `F4 tile write retry: tile_id=…; reason=fs_busy` |
| INFO | Provisioning complete; index loaded | `C6 ready: tiles=87654, faiss_dim=512, index_size_mb=412` |
| DEBUG | per-query timing | `C6 search_topk frame=12345 took=0.4ms; bbox_query took=2.1ms` |
**Log format**: structured JSON.
**Log storage**: stdout / journald / FDR via C13 (ERROR + WARN only; the FDR also captures every successful F4 write as part of the mid-flight-tile-gen log).
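A structured-JSON log line such as the INFO example in the table could be produced with the standard `logging` module as sketched below. The `JsonFormatter` class, the `fields` key passed through `extra`, and the `component` field are assumptions for illustration; note that Python names the WARN level `WARNING`.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "component": "C6",
            "message": record.getMessage(),
        }
        # Structured key/value pairs are passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("c6.sketch")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


stream = io.StringIO()  # stands in for stdout / journald
logger = make_logger(stream)
logger.info("C6 ready", extra={"fields": {"tiles": 87654, "faiss_dim": 512, "index_size_mb": 412}})
```

Keeping counters such as `tiles` and `faiss_dim` as separate JSON keys, rather than interpolating them into the message, makes the lines easier to filter downstream.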
@@ -0,0 +1,168 @@
# Test Specification — C6 Tile Cache + Spatial Index
Component-scoped. Suite-level coverage in `_docs/02_document/tests/*.md`.
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-8.1 | Imagery via Suite Sat Service offline cache, ≥0.5 m/px | FT-P-15, FT-P-16, **C6-IT-01** | Covered |
| AC-8.2 | Tile freshness <6 mo (active-conflict) / <12 mo (rear) | FT-N-05, **C6-IT-02** | Covered |
| AC-8.4 | Mid-flight tile generation with quality metadata | FT-P-17, **C6-IT-03** | Covered |
| AC-NEW-3 | FDR ≤64 GB / flight, no silent drops (mid-flight tiles count toward C13 — C6 must not block) | NFT-LIM-02, **C6-IT-04** | Covered |
| AC-NEW-6 | System rejects/downgrades stale tiles | FT-N-05, FT-N-06, **C6-IT-05** | Covered |
| RESTRICT-SAT-2 | Cache budget 10 GB across operational area | NFT-LIM-03, **C6-IT-06** | Covered |
---
## Component-Internal Tests
### C6-IT-01: filesystem layout byte-identical to `satellite-provider`
**Summary**: tiles written by C11 `TileDownloader` and tiles emitted at mid-flight by C5/C6 produce identical filesystem paths and JPEG byte content for the same `(zoomLevel, lat, lon)` coordinate.
**Traces to**: AC-8.1 (interop with `satellite-provider` for the upload return path)
**Description**: download a known tile via C11 `TileDownloader` (against the `satellite-provider` mock fixture); generate the same tile via the mid-flight orthorectification path (C5 + C6); assert (a) filesystem path equality, (b) JPEG bytes equality, (c) `tiles` table row equality on `(zoomLevel, lat, lon, source, content_sha256)`.
**Input data**: a single Derkachi-area tile (`zoom=18, lat≈49.94, lon≈36.31`).
**Expected result**: byte-identical layout.
**Max execution time**: 10 s.
---
### C6-IT-02: freshness gate at write-time
**Summary**: a tile with `produced_at < now - active_conflict_max_age` is rejected at C6 insert if the target sector is `active_conflict`.
**Traces to**: AC-8.2
**Description**: insert a synthetic tile with `produced_at = now - 7 months` into an `active_conflict` sector; assert C6 raises a freshness-rejection error and does NOT write to disk or DB. Repeat with `stable_rear` — assert the tile is written but flagged `freshness_downgraded`.
**Input data**: synthetic tile fixtures with controlled timestamps.
**Expected result**: rejection in active_conflict; downgrade-write in stable_rear.
**Max execution time**: 5 s.
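The gate exercised by this test can be sketched as a pure decision function. It follows the wording of C6-IT-02 only: one threshold (`active_conflict_max_age`), with the sector choosing between rejection and a downgraded write. The 183-day default, the `Decision` enum, and the function name are assumptions, and the separate rear-sector age limit from AC-8.2 is not modelled because this test does not say how it applies.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    DOWNGRADE = "downgrade"   # written, but flagged freshness_downgraded


def freshness_decision(
    produced_at: datetime,
    sector: str,
    now: datetime,
    active_conflict_max_age: timedelta = timedelta(days=183),  # ~6 months, assumed
) -> Decision:
    """Decide what to do with a tile at write time, per C6-IT-02's wording."""
    if now - produced_at <= active_conflict_max_age:
        return Decision.ACCEPT
    # Tile is older than the threshold: the sector decides the outcome.
    if sector == "active_conflict":
        return Decision.REJECT
    return Decision.DOWNGRADE


now = datetime(2026, 5, 10, tzinfo=timezone.utc)
seven_months_old = now - timedelta(days=213)
```

Passing `now` in explicitly keeps the function deterministic, which suits the synthetic fixtures with controlled timestamps that the test calls for.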
---
### C6-IT-03: mid-flight tile insert with quality metadata
**Summary**: every tile written via the F4 mid-flight gen path has full `quality_metadata` (covariance, inlier count, source-label provenance).
**Traces to**: AC-8.4
**Description**: simulate F4 — feed C5 a 60 s sequence; the orthorectifier emits N tiles; assert each tile's row in `tiles` has non-null `quality_metadata.covariance_norm`, `quality_metadata.inlier_count`, `quality_metadata.source_label`. Empty quality metadata fails the test.
**Input data**: scripted F4 fixture.
**Expected result**: 100% of mid-flight tiles have full metadata.
**Max execution time**: 90 s.
---
### C6-IT-04: write throughput exceeds peak F4 demand
**Summary**: C6 sustains the peak mid-flight write rate (1 tile / 2 s typical, ~5/s peak) without queueing-induced backpressure on C5.
**Traces to**: AC-NEW-3 (no-silent-drop gate)
**Description**: synthetic burst — 100 tiles enqueued at 5 Hz; assert (a) all 100 land in `tiles` table within 30 s, (b) no tile dropped, (c) C5's write-side queue depth never exceeds the configured ceiling.
**Input data**: synthetic burst fixture.
**Expected result**: 100/100 written; no drops.
**Max execution time**: 60 s.
---
### C6-IT-05: stale-tile reject + downgrade across the F1 / F4 / F5 boundary
**Summary**: in F5 (steady-state retrieval), a tile flagged `freshness_downgraded` is returned with the downgrade flag, and any consumer that depends on freshness (e.g., spoof-rejection check) sees the flag.
**Traces to**: AC-NEW-6
**Description**: write a `freshness_downgraded` tile via C6-IT-02's stable_rear path; query via the C2/C3 fetch path; assert the returned tile carries the downgrade flag and that the consumer-side label propagates into the `EstimatorOutput.source_label`.
**Input data**: as C6-IT-02.
**Expected result**: downgrade flag survives the round-trip.
**Max execution time**: 10 s.
---
### C6-IT-06: 10 GB cache budget enforcement
**Summary**: when the operational-area working set hits 10 GB, C6 evicts least-recently-used tiles and logs every eviction; it never silently drops a tile that would put the system over budget.
**Traces to**: RESTRICT-SAT-2
**Description**: synthetic load that fills the cache to 10 GB - 50 MB; insert another 100 MB of tiles; assert (a) total disk usage stays ≤ 10 GB, (b) eviction count matches insert count above the threshold, (c) every eviction is logged at INFO with the evicted tile_id.
**Input data**: synthetic tile generator.
**Expected result**: 10 GB cap held; LRU eviction visible in logs.
**Max execution time**: 5 min on Tier-1 (disk-bound).
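The eviction behaviour this test checks can be illustrated with a small byte-budget LRU tracker. This is a sketch under assumptions, not the component's implementation: `LruByteBudget` is a hypothetical class, it tracks sizes only (no files are touched), and the caller is expected to delete and log each returned tile id.

```python
from collections import OrderedDict
from typing import Hashable, List


class LruByteBudget:
    """Track tile sizes in LRU order and report evictions needed to fit a budget."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.used = 0
        self._sizes = OrderedDict()  # oldest entry first

    def touch(self, tile_id: Hashable) -> None:
        """Mark a tile as recently used (e.g. on read)."""
        self._sizes.move_to_end(tile_id)

    def insert(self, tile_id: Hashable, size: int) -> List[Hashable]:
        """Record a new tile; return the ids evicted to stay within budget."""
        if size > self.budget:
            raise ValueError("tile larger than the whole cache budget")
        if tile_id in self._sizes:
            self.used -= self._sizes.pop(tile_id)
        self._sizes[tile_id] = size
        self.used += size
        evicted = []
        while self.used > self.budget:
            old_id, old_size = self._sizes.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)   # caller deletes the file and logs at INFO
        return evicted
```

Returning the evicted ids instead of deleting inside the tracker keeps the logging requirement in assertion (c) easy to satisfy at the call site.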
---
## Performance Tests
### C6-PT-01: per-tile read latency on Tier-2 (mmap hit)
**Traces to**: AC-4.1 (read-side; mmap is the design assumption for C2/C2.5/C3 hot paths)
**Load scenario**: 3 Hz × 10 candidate tiles per frame × 10 min replay; OS page cache warm.
**Expected results**:
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| `read_tile_pixels` p95 (warm) | ≤ 0.5 ms | 5 ms |
| `read_tile_pixels` p95 (cold first read) | ≤ 50 ms | 200 ms |
---
## Security Tests
### C6-ST-01: content-sha256 mismatch rejects insert
**Summary**: a tile insert with a content_sha256 that doesn't match the actual JPEG bytes is rejected at C6 (defends against the cache-poisoning path D-C10-3 covers at takeoff).
**Traces to**: defensive (D-C10-3 + AC-NEW-7 cumulative)
**Test procedure**:
1. Compute SHA-256 of a known JPEG.
2. Submit insert with the JPEG bytes + a deliberately-wrong SHA-256.
3. Assert insert is rejected and the DB row is not created.
**Pass criteria**: insert rejection + no row.
**Fail criteria**: insert accepted or partial row created.
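The test procedure above maps onto a small check-before-insert routine. The sketch below uses a plain dict in place of the `tiles` table; `checked_insert` and `ContentHashMismatchError` are hypothetical names, not part of the C6 interface.

```python
import hashlib


class ContentHashMismatchError(ValueError):
    """Raised when the claimed SHA-256 does not match the JPEG bytes."""


def checked_insert(rows: dict, tile_key, jpeg_bytes: bytes, claimed_sha256: str) -> None:
    """Insert only if the claimed digest matches; otherwise leave rows untouched."""
    actual = hashlib.sha256(jpeg_bytes).hexdigest()
    if actual != claimed_sha256.lower():
        # Verification happens before any write, so no partial row can exist.
        raise ContentHashMismatchError(f"sha256 mismatch for {tile_key}")
    rows[tile_key] = actual


rows = {}
jpeg = b"\xff\xd8 fake jpeg payload \xff\xd9"
good_digest = hashlib.sha256(jpeg).hexdigest()   # step 1 of the procedure
checked_insert(rows, (18, 49.94, 36.31), jpeg, good_digest)
```

Verifying before writing is what lets the pass criterion ("insert rejection + no row") hold without any cleanup step.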
---
## Acceptance Tests
Covered transitively via FT-P-15 / FT-P-16 / FT-P-17 / FT-N-05 / FT-N-06.
---
## Test Data Management
| Data Set | Source | Size |
|----------|--------|------|
| Single Derkachi tile (`zoom=18`) | curated | <1 MB |
| Synthetic tile fixtures with controlled timestamps | scripted | ~50 MB |
| Synthetic 10 GB cache fill set | scripted, deterministic | 10 GB on disk |
| `flight_derkachi/normal_segment_60_stills/` for F4 simulation | shared | shared |
**Setup**: ephemeral Postgres + filesystem under `tests/tmp/c6/<test-id>/db/` and `…/tiles/`.
**Teardown**: drop tmp directory.
**Data isolation**: per-test ephemeral DB + filesystem.