# Data Model

> Date: 2026-05-09 (Plan Phase 2b, initial draft).
> Inputs: `_docs/02_document/architecture.md` § 4 (Data Model Overview), § 5, § 7; `_docs/02_document/glossary.md`; `_docs/00_problem/{acceptance_criteria,restrictions}.md`; `_docs/01_solution/solution.md`; parent-suite `satellite-provider` schema (mirrored target for byte-identical post-landing upload).
> Scope: persistent and on-disk artifacts produced or consumed by the onboard companion, the operator-side pre-flight tooling (C12), the operator-side Tile Manager (C11: pre-flight tile download + post-landing tile upload), and the FDR (C13). In-flight runtime DTOs (per-frame estimates, IMU windows, etc.) are summarised at the end of this document but are NOT persistent and NOT in the migration scope.

---

## 1. Overview

The onboard system is **mostly streaming / in-memory**: the per-frame pipeline (`NavCameraFrame → C1 + C2 → C2.5 → C3/C3.5 → C4 → C5 → C8`) operates entirely on in-process Python DTOs and never persists those frames or their derived per-frame estimates as relational rows. What IS persistent splits into four backing stores; the data model below is the union of all four.

| Store | Technology | What lives here | Persistence scope |
|---|---|---|---|
| **Relational** | PostgreSQL 16 (mirror of `satellite-provider`'s schema; one DB per companion) | `tiles` (mirrored), `flights`, `sector_classifications`, `manifests`, `engine_cache_entries` | Across flights; survives reboot |
| **Filesystem (tile cache)** | Local NVM (`./tiles/{zoomLevel}/{x}/{y}.jpg` + `./tiles/{zoomLevel}/{x}/{y}.json` sidecar) | Tile JPEG bodies; per-tile metadata sidecar | Across flights; survives reboot |
| **Filesystem (artifacts)** | Local NVM | FAISS HNSW `.index`; TRT engines + INT8 calibration caches; camera calibration JSON; manifest JSON | Across flights; rebuildable from the relational tables + `satellite-provider` |
| **Filesystem (FDR)** | Local NVM, ≤ 64 GB ring per flight | `FdrRecord` segments; AC-8.5 ≤ 0.1 Hz failed-tile thumbnails | Per flight; rolling, oldest-segment-dropped on overrun |

**Architecture-driven principles** (carried verbatim from `architecture.md` Principle list):

1. **The persistent tile schema MUST stay byte-compatible with `satellite-provider`'s** so post-landing upload (D-PROJ-2) is a copy, not a transformation. Onboard-only fields are added as **additive columns**; the canonical Google-Maps-sourced columns must keep their semantics and types.
2. **No raw-frame storage** (AC-8.5). The only image persistence path is the orthorectified `Tile` (mid-flight or pre-flight). The single forensic exception is the AC-8.5 ≤ 0.1 Hz failed-tile thumbnail log inside the FDR.
3. **The camera calibration artifact is the ONLY way camera-specific math enters the system** (Principle #1). Test fixtures (`adti26.json`) and production deployments (`adti20.json`) load different artifacts on the same code path; the schema MUST treat `adti20` vs `adti26` as data, not as a code branch.
4. **No persistent secrets on the airborne companion image.** The MAVLink-2.0 signing key and the per-flight onboard tile-signing key are generated at takeoff load, logged to the FDR, and discarded with the FDR ring. No long-lived secrets table.
5. **Append-only / additive-only by default.** Migrations may add tables, columns, indexes, and CHECK constraints, but MUST NOT rename or drop existing columns without an ADR-recorded deprecation window. The `tiles` schema specifically is **frozen on the canonical columns** and only extensible via additive onboard-only columns.
6. **Honest covariance is a schema concern, not a tuning knob.** Every persisted estimate-derived structure (`TileQualityMetadata`, FDR-emitted-position records) carries the same 2×2 horizontal sub-matrix that drives AC-NEW-4 / AC-NEW-7. Lossy storage of covariance is a defect.

**What is NOT in scope for this document**:

- Per-frame in-memory DTOs (`VioOutput`, `VprQuery`, `VprResult`, `RerankResult`, `MatchResult`, `PoseEstimate`, `EmittedExternalPosition`): they are listed as "non-persistent runtime entities" at the end of this document for completeness only; their schema lives in component specs (Step 3).
- The `satellite-provider`-side voting / trust schema (D-PROJ-2 design task #2). The onboard side writes `voting_status = pending` and reads only `voting_status = trusted` (or operator-overridden `pending`); the actual promotion logic is parent-suite work.
- The operator-workstation pre-flight tooling UI state (C12): that is operator-side code with its own persistence story, out of the onboard companion's data scope.

---

## 2. Persistent Entities

### 2.1 `tiles` (PostgreSQL, MIRRORS `satellite-provider`; additive onboard columns)

The tile is the single most important persistent entity. The schema deliberately starts from `satellite-provider`'s existing PostgreSQL `tiles` table (Google-Maps-sourced) and extends it with onboard-only columns.
The **canonical columns** are readable and writable from both sides; the **onboard-only columns** are NULL on `satellite-provider`-sourced rows and populated only on companion-orthorectified rows.

**Storage split**: row in PostgreSQL ↔ JPEG body on filesystem at `./tiles/{zoomLevel}/{x}/{y}.jpg`. The JPEG is byte-identical to what `satellite-provider` produces / consumes; the sidecar `./tiles/{zoomLevel}/{x}/{y}.json` carries the same row content as a JSON dump for the Tile Manager's batched HTTP form-data payload (D-PROJ-2 contract).

#### 2.1.1 Columns

| Column | Type | Origin | Constraints | Description |
|---|---|---|---|---|
| `id` | `bigserial` PK | both | NOT NULL | Surrogate primary key (consistent with `satellite-provider`). |
| `zoom_level` | `int` | both | NOT NULL, CHECK (`zoom_level` BETWEEN 10 AND 22) | Slippy/XYZ zoom; same semantics as `satellite-provider`. |
| `tile_x` | `int` | both | NOT NULL | Slippy/XYZ X index. |
| `tile_y` | `int` | both | NOT NULL | Slippy/XYZ Y index. |
| `latitude` | `double precision` | both | NOT NULL | Tile center latitude (WGS84). |
| `longitude` | `double precision` | both | NOT NULL | Tile center longitude (WGS84). |
| `tile_size_meters` | `double precision` | both | NOT NULL, CHECK (`tile_size_meters` > 0) | Ground footprint side length in metres (lat-adjusted m/px × pixels). |
| `tile_size_pixels` | `int` | both | NOT NULL, CHECK (`tile_size_pixels` > 0) | Pixel side length (square tiles). |
| `capture_timestamp` | `timestamptz` | both | NOT NULL | When the tile imagery was captured (Google Maps ingest time for `googlemaps`; UAV nav-camera frame timestamp for `onboard_ingest`). |
| `compression` | `text` | both | NOT NULL, default `'jpeg'` | Currently `jpeg` only; reserved for future formats. |
| `crs` | `text` | both | NOT NULL, default `'EPSG:3857'` | Spherical Mercator; same as `satellite-provider`. |
| `source` | `text` | both | NOT NULL, CHECK (`source` IN (`'googlemaps'`, `'onboard_ingest'`)) | New CHECK domain; existing `satellite-provider` rows pre-migration are backfilled `'googlemaps'`. |
| `flight_id` | `uuid` | onboard-only | NULL for `googlemaps`; NOT NULL for `onboard_ingest`; FK → `flights.id` | Which flight produced the tile (D-PROJ-2 design task #1 contract). |
| `companion_id` | `text` | onboard-only | NULL for `googlemaps`; NOT NULL for `onboard_ingest` | Deployed unit identifier; threat-model boundary marker. |
| `quality_metadata` | `jsonb` | onboard-only | NULL for `googlemaps`; NOT NULL for `onboard_ingest`; schema § 2.1.3 | Sufficient inputs for parent-suite voting (D-PROJ-2 #2). |
| `voting_status` | `text` | onboard-only | NULL for `googlemaps`; NOT NULL for `onboard_ingest`; CHECK (`voting_status` IN (`'pending'`, `'trusted'`, `'rejected'`)); default `'pending'` | Cache-poisoning safety budget gate (AC-NEW-7). The companion never writes `'trusted'`; promotion is parent-suite-side only. |
| `freshness_status` | `text` | both | NOT NULL, CHECK (`freshness_status` IN (`'fresh'`, `'stale_warn'`, `'stale_reject'`)); default `'fresh'` | Computed at ingest from `capture_timestamp` vs. the area's `SectorClassification` (active-conflict 6 mo / stable rear 12 mo); demoted on age regardless of `voting_status`. |
| `signature` | `bytea` | onboard-only | NULL for `googlemaps`; NOT NULL for `onboard_ingest` | Per-flight onboard signing key signature over the upload payload (D-PROJ-2 § Design task #1). |
| `created_at` | `timestamptz` | both | NOT NULL, default `now()` | Row creation time on the companion (or on `satellite-provider` for `googlemaps` rows). |
| `updated_at` | `timestamptz` | both | NOT NULL, default `now()` | Updated by trigger on `voting_status` / `freshness_status` change. |

**Composite uniqueness**: `UNIQUE (zoom_level, tile_x, tile_y, source, flight_id)`. `(zoom_level, tile_x, tile_y)` is unique per `(source, flight_id)` tuple, so the same area can have one `googlemaps` row plus N `onboard_ingest` rows from different flights without collision. `flight_id` is treated as the empty UUID `'00000000-...-...-000000000000'` for `googlemaps` rows so the composite uniqueness still holds (PostgreSQL treats NULLs as distinct in a UNIQUE constraint by default, so a NULL `flight_id` would not block duplicate `googlemaps` rows).

**Indexes**:

- B-tree on `(zoom_level, tile_x, tile_y)`: primary spatial lookup path for VPR retrieval and pre-flight cache hydration.
- B-tree on `(latitude, longitude)`: bounding-box queries for sector classification and spatial-coverage reports.
- Partial B-tree on `voting_status` WHERE `source = 'onboard_ingest'`: operator-orchestrator queries for "which mid-flight tiles are still pending promotion?".
- B-tree on `flight_id`: FDR cross-reference; post-landing upload batching.
- B-tree on `created_at`: pruning / rollover queries.

**Filesystem invariant**: for every row with `source = 'onboard_ingest'`, the JPEG body MUST exist at `./tiles/{zoom_level}/{tile_x}/{tile_y}.jpg` and a sidecar `./tiles/{zoom_level}/{tile_x}/{tile_y}.json` MUST exist with the row's full JSON form. The Tile Manager's `TileUploader` (C11) reads both; the row may be deleted from the local PostgreSQL only after `satellite-provider` returns 2xx for that `(flight_id, zoom_level, tile_x, tile_y)` tuple.

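The filesystem invariant above can be checked mechanically. The following is a minimal sketch, not the project's audit code: `TileKey`, `tile_paths`, and `missing_tile_files` are hypothetical names introduced here, and the cache root is passed in explicitly rather than assumed to be `./tiles`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, NamedTuple


class TileKey(NamedTuple):
    """Hypothetical helper: the (zoom_level, tile_x, tile_y) part of a tiles row."""
    zoom_level: int
    tile_x: int
    tile_y: int


def tile_paths(root: Path, key: TileKey) -> tuple[Path, Path]:
    """Return (jpeg_path, sidecar_path) following {zoom_level}/{tile_x}/{tile_y}."""
    base = root / str(key.zoom_level) / str(key.tile_x)
    return base / f"{key.tile_y}.jpg", base / f"{key.tile_y}.json"


def missing_tile_files(root: Path, keys: Iterable[TileKey]) -> list[Path]:
    """List every JPEG or sidecar that the invariant requires but the disk lacks."""
    missing: list[Path] = []
    for key in keys:
        for path in tile_paths(root, key):
            if not path.is_file():
                missing.append(path)
    return missing
```

A caller would feed it the keys of all `source = 'onboard_ingest'` rows and treat a non-empty result as a violation of the invariant.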
#### 2.1.2 Lifecycle

| Trigger | Action on `tiles` row | Action on filesystem | Action on FDR |
|---|---|---|---|
| Pre-flight cache hydration (C11 `TileDownloader` download from `satellite-provider`) | INSERT `source='googlemaps'`, `voting_status=NULL`, `freshness_status` computed | WRITE `./tiles/{z}/{x}/{y}.jpg` (atomic) | n/a (pre-flight) |
| Mid-flight orthorectify-and-store (in-air) | INSERT `source='onboard_ingest'`, `voting_status='pending'`, `quality_metadata` populated | WRITE `./tiles/{z}/{x}/{y}.jpg` (atomic, dedup by composite key) + sidecar JSON | EVENT `tile_emitted` with row PK |
| Freshness re-evaluation (background, periodic per AC-8.2 / AC-NEW-6) | UPDATE `freshness_status` | n/a | EVENT `freshness_demoted` if changed |
| Post-landing upload success (C11 `TileUploader`, op-workstation) | DELETE local row after `satellite-provider` 2xx | DELETE local JPEG + sidecar | EVENT `tile_uploaded` with `(flight_id, z, x, y)` and remote ack ID |
| Post-landing upload failure (retryable) | UPDATE `updated_at`; row retained for retry | retain JPEG + sidecar | EVENT `tile_upload_failed` with reason |

**No DELETE in flight**: rows produced in flight (`source='onboard_ingest'`) MUST NOT be deleted while `flight_state == IN_AIR`. Defense-in-depth on Principle #4 (process-level isolation already prevents the upload code path from being loaded).

#### 2.1.3 `quality_metadata` jsonb sub-schema

For `source = 'onboard_ingest'` rows, `quality_metadata` carries the full input the parent-suite voting layer (D-PROJ-2 design task #2) needs without re-derivation. Schema is fixed; new keys are additive only.

```jsonc
{
  "estimator_label": "satellite_anchored" | "visual_propagated" | "dead_reckoned",
  "covariance_2x2": [[<number>, <number>], [<number>, <number>]],
  "last_anchor_age_ms": <number >= 0>,
  "mre_px": <number >= 0>,               // reprojection error at the contributing match
  "imu_bias_norm": <number >= 0>,        // VIO health proxy
  "vio_strategy": "okvis2" | "vins_mono" | "klt_ransac",
  "vpr_strategy": "ultra_vpr" | "mega_loc" | "mix_vpr" | "sela_vpr" | "eigen_places" | "net_vlad" | "salad",
  "matcher": "disk_lightglue" | "aliked_lightglue" | "xfeat",
  "adhop_invoked": <boolean>,
  "thermal_throttle_active": <boolean>,  // D-CROSS-LATENCY-1 hybrid was active at emit time
  "build_kind": "deployment" | "research" // ADR-002 disambiguation; voting layer may down-weight research-binary tiles
}
```

The `tiles_quality_metadata_schema_v1` jsonschema lives at `_docs/02_document/schemas/tiles_quality_metadata_v1.json` and is referenced by both the local INSERT path and the C11 `TileUploader` payload validator. Schema versioning: bumping any field requires a new `tiles_quality_metadata_v2.json`, and the row stamps the schema version into the jsonb under `_v` (additive-only).

---

### 2.2 `flights` (PostgreSQL, onboard-only)

A lightweight tracking row per flight, used by the FDR's manifest, the Tile Manager `TileUploader` batching, and the cache-poisoning safety audit (AC-NEW-7).

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | `uuid` PK | NOT NULL | Generated at takeoff load. |
| `companion_id` | `text` | NOT NULL | Same as `tiles.companion_id`. |
| `started_at` | `timestamptz` | NOT NULL, default `now()` | First MAVLink HEARTBEAT received post-takeoff. |
| `ended_at` | `timestamptz` | NULL until landing | Set on `flight_state` transition `IN_AIR → ON_GROUND`. |
| `signing_key_fingerprint` | `bytea` | NOT NULL | SHA-256 of the per-flight onboard signing pubkey; private key is never persisted. |
| `mavlink_signing_key_fingerprint` | `bytea` | NULL on iNav-only flights | SHA-256 of the per-flight MAVLink-2.0 signing key (D-C8-9 = (d), AP path only). |
| `fc_profile` | `text` | NOT NULL, CHECK (`fc_profile` IN (`'ardupilot_plane'`, `'inav'`)) | Which FC profile this flight ran. |
| `vio_strategy` | `text` | NOT NULL | Active C1 strategy this flight (ADR-001, locked at startup). |
| `vpr_strategy` | `text` | NOT NULL | Active C2 strategy. |
| `build_kind` | `text` | NOT NULL, CHECK (`build_kind` IN (`'deployment'`, `'research'`)) | Disambiguates IT-12 research-binary flights from production flights. |
| `manifest_id` | `uuid` | NOT NULL, FK → `manifests.id` | Pre-flight manifest used for this flight (ties to model + calibration + corpus + sector). |
| `fdr_path` | `text` | NOT NULL | Local path to the FDR ring directory for this flight. |
| `created_at` | `timestamptz` | NOT NULL, default `now()` | |

**Lifecycle**: row INSERTed at takeoff load; UPDATEd on landing; never UPDATEd in flight after the takeoff-load row has been written. The row is never DELETEd on the companion (the FDR retention policy decides physical removal).

---

### 2.3 `sector_classifications` (PostgreSQL, operator-set, onboard-side cache)

Mirrors operator-orchestrator C12's authoritative sector classification onto the companion so the freshness gate (AC-8.2 / AC-NEW-6) can be evaluated locally without a network call.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | `bigserial` PK | NOT NULL | |
| `polygon_geojson` | `jsonb` | NOT NULL | GeoJSON polygon (WGS84) of the classified area. |
| `classification` | `text` | NOT NULL, CHECK (`classification` IN (`'active_conflict'`, `'stable_rear'`)) | Drives the freshness threshold. |
| `freshness_threshold_days` | `int` | NOT NULL, CHECK (`freshness_threshold_days` IN (180, 365)) | Materialized: 180 for `active_conflict`, 365 for `stable_rear`. |
| `set_by` | `text` | NOT NULL | Operator identifier from C12. |
| `set_at` | `timestamptz` | NOT NULL, default `now()` | |
| `revoked_at` | `timestamptz` | NULL until revoked | Only set when a polygon's classification is changed; the new row supersedes; the old row is retained for audit. |
| `manifest_id` | `uuid` | NOT NULL, FK → `manifests.id` | Which pre-flight manifest baked this classification. |

**Indexes**: GiST on `polygon_geojson` for spatial containment queries; B-tree on `manifest_id`.

**Note**: this table is the onboard cache of operator authority. The companion never *originates* a classification; it only consumes the rows the operator-workstation tooling (C12) staged into the manifest. A row mismatch between the manifest's expected classifications and the table is a content-hash gate failure (D-C10-3) → companion refuses takeoff.

---

### 2.4 `manifests` (PostgreSQL, pre-flight idempotence gate)

Implements D-C10-1 (pre-flight idempotence): the same input bundle (model + calibration + tile corpus + sector classifications) must produce a stable manifest hash. The manifest is the takeoff-load content-hash root; D-C10-3 (atomic write + content-hash gate) verifies it.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | `uuid` PK | NOT NULL | |
| `manifest_hash` | `bytea` | NOT NULL, UNIQUE | SHA-256 over the canonicalized manifest contents. |
| `model_bundle_hash` | `bytea` | NOT NULL | SHA-256 over the (VIO + VPR + matcher + AdHoP) model bundle. |
| `calibration_artifact_hash` | `bytea` | NOT NULL | SHA-256 over the loaded camera calibration JSON. |
| `corpus_hash` | `bytea` | NOT NULL | SHA-256 over the sorted list of `(zoom_level, tile_x, tile_y, content_hash)` for `source='googlemaps'` rows in the manifest's spatial scope. |
| `sector_classifications_hash` | `bytea` | NOT NULL | SHA-256 over the sorted list of `sector_classifications` rows. |
| `engine_cache_bundle_hash` | `bytea` | NOT NULL | SHA-256 over the linked TRT engines + INT8 calibration caches. |
| `descriptor_index_hash` | `bytea` | NOT NULL | SHA-256 over the FAISS `.index` file. |
| `created_at` | `timestamptz` | NOT NULL, default `now()` | |
| `staged_by` | `text` | NOT NULL | Operator workstation identifier. |
| `verified_at` | `timestamptz` | NULL until verified | Set by C10 at takeoff load when the content-hash gate passes. |

**Indexes**: UNIQUE on `manifest_hash`; B-tree on `created_at`.

---

### 2.5 `engine_cache_entries` (PostgreSQL, TRT engine catalogue)

Implements D-C10-7: TRT engines are keyed by the (SM, JetPack, TRT, precision) tuple so a deployed image with the wrong tuple cannot accidentally consume an engine compiled for a different SM.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | `bigserial` PK | NOT NULL | |
| `model_name` | `text` | NOT NULL | Logical name (`disk`, `lightglue`, `ultra_vpr`, `aliked`, `xfeat`, …). |
| `model_revision` | `text` | NOT NULL | Upstream commit / weight revision pin. |
| `sm` | `text` | NOT NULL | Jetson SM (e.g., `87` for Orin). |
| `jetpack_version` | `text` | NOT NULL | E.g., `6.2`. |
| `tensorrt_version` | `text` | NOT NULL | E.g., `10.3`. |
| `precision` | `text` | NOT NULL, CHECK (`precision` IN (`'fp32'`, `'fp16'`, `'int8'`, `'int8_fp16_mixed'`)) | |
| `engine_path` | `text` | NOT NULL | Filesystem path to the `.engine` binary. |
| `engine_size_bytes` | `bigint` | NOT NULL | |
| `engine_hash` | `bytea` | NOT NULL | SHA-256 over the engine file. |
| `int8_calibration_path` | `text` | NULL for non-INT8 | Filesystem path to the INT8 calibration cache. |
| `int8_calibration_hash` | `bytea` | NULL for non-INT8 | SHA-256 over the calibration cache. |
| `created_at` | `timestamptz` | NOT NULL, default `now()` | |

**Composite uniqueness**: `UNIQUE (model_name, model_revision, sm, jetpack_version, tensorrt_version, precision)`.

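To illustrate why the composite key matters, here is a minimal in-memory sketch of an exact-match engine lookup. `EngineKey` and `select_engine` are hypothetical names, the catalogue is a plain dict standing in for the table, and the revision string and path in the usage note are made-up values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineKey:
    """Hypothetical mirror of the six columns in the composite UNIQUE constraint."""
    model_name: str
    model_revision: str
    sm: str
    jetpack_version: str
    tensorrt_version: str
    precision: str


def select_engine(catalogue: dict[EngineKey, str], wanted: EngineKey) -> Optional[str]:
    """Return the engine path for an exact key match, else None.

    There is deliberately no fallback to a "close" key: an engine built
    for another SM, JetPack, or TensorRT version is never selected.
    """
    return catalogue.get(wanted)
```

With a catalogue holding one `disk` engine built for SM `87`, a lookup for the same model on SM `72` returns `None` instead of the mismatched engine; the caller decides how to surface that.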
**Filesystem invariant**: `engine_path` and `int8_calibration_path` (when set) MUST be valid local files; their on-disk SHA-256 MUST match `engine_hash` / `int8_calibration_hash`. C7 verifies the hash on engine load; mismatch ⇒ takeoff refused.

---

### 2.6 Camera calibration artifact (filesystem JSON, NOT in PostgreSQL)

Per Principle #1, camera-specific math enters only via this file. It is a single JSON document loaded once at startup. It is **not** stored in PostgreSQL because it is treated as an *image* of the camera (test fixture vs. production unit) rather than a queryable record: different binaries / different fixtures point at different files via config, and the schema is the same on every code path.

**Path convention**:

- Production: `/etc/gps-denied/calibration/adti20.json` (per-deployed-unit, post D-PROJ-1 hybrid)
- Test fixtures: `tests/fixtures/calibration/adti26.json`

**Schema** (camera_calibration_v1):

```jsonc
{
  "schema_version": 1,
  "camera_name": "adti20" | "adti26" | <string>,
  "model": "ADTi 20MP 20L V1" | <string>,
  "sensor": {
    "width_px": 5472,
    "height_px": 3648,
    "pixel_size_um": 3.76,
    "sensor_size_mm": [23.6, 15.7]
  },
  "intrinsics": {
    "fx_px": <number>,
    "fy_px": <number>,
    "cx_px": <number>,
    "cy_px": <number>
  },
  "distortion": {
    "model": "opencv_radtan" | "opencv_kannala_brandt",
    "coefficients": [<number>, ...]
  },
  "body_to_camera": {
    "rotation_quaternion_xyzw": [<number>, <number>, <number>, <number>],
    "translation_m": [<number>, <number>, <number>]
  },
  "acquisition_method": "factory_sheet" | "checkerboard_refined" | "hybrid",
  "acquisition_timestamp": <timestamp>,
  "acquisition_notes": <string>,
  "content_hash": <hash>
}
```

The artifact's `content_hash` is the value referenced by `manifests.calibration_artifact_hash`.

---

### 2.7 FAISS HNSW descriptor index (filesystem binary, NOT in PostgreSQL)

VPR descriptors are stored in a FAISS-native `.index` file. This is the on-disk form of `VprQuery` candidates' lookup index, not a queryable relational entity.

Implementation details:

- Path: `/var/lib/gps-denied/descriptors/{vpr_strategy}_{model_revision}.index`
- Atomic write: per D-C10-3, write to `*.index.tmp`, fsync, rename via `atomicwrites`.
- Content-hash gate: SHA-256 over the file body, recorded in `manifests.descriptor_index_hash`.
- Companion of the `.index` is a `*.meta.json` carrying `{tile_id → descriptor_offset}` for the rows under the manifest's spatial scope. The mapping uses `tiles.id` PKs, so a tile pruned from the relational table invalidates the offset and forces a re-build of the index.

The descriptor index is **rebuildable** from `(tiles, model_bundle)` + a deterministic VPR feature-extraction pass; therefore it is not part of any backup / DR target: losing the `.index` re-runs C10 descriptor generation on the operator workstation pre-flight.

---

### 2.8 Flight Data Recorder (`FdrRecord`, filesystem ring, NOT in PostgreSQL)

The FDR is a per-flight ≤ 64 GB local ring buffer of structured records. It is **append-only** while the flight is `IN_AIR`; rollover (oldest segment dropped first) is logged but never silent (AC-NEW-3).

**Layout** (per flight):

```
/var/lib/gps-denied/fdr/{flight_id}/
  manifest.json            # flight metadata snapshot (mirror of flights row + manifests row)
  segments/
    seg_00001.bin          # rolling segments (~256 MB each)
    seg_00002.bin
    ...
  thumbnails/              # AC-8.5 ≤ 0.1 Hz failed-tile thumbnail log (forensic exception)
    YYYYMMDD-HHMMSS-.jpg
  rollover.log             # one line per dropped segment
```

**Record schema (`FdrRecord`, length-prefixed binary stream)**:

```
record_header (16 bytes):
  magic         u32 = 0x47464452   # "GFDR"
  version       u16 = 1
  type          u16                # see record types below
  monotonic_ms  u64                # since flight start
record_body:    variable, schema per `type`
record_crc32    u32
```

**Record types (`type` field, additive-only)**:

| Type ID | Name | Body |
|---|---|---|
| `0x0001` | `EmittedExternalPosition` | WGS84 + 6×6 covariance + provenance label + `last_satellite_anchor_age_ms` + per-FC encoding (AP `GPS_INPUT` or iNav `MSP2_SENSOR_GPS`) |
| `0x0002` | `ImuTrace` | timestamped accel + gyro window |
| `0x0003` | `ReceivedMavlinkRaw` | raw MAVLink frame (signed/unsigned), captured `tlog`-style |
| `0x0004` | `ReceivedMsp2Raw` | raw MSP2 frame (iNav profile only) |
| `0x0005` | `SystemHealth` | CPU%, GPU%, temp, throttle flag, RAM, VRAM, NVM remaining |
| `0x0006` | `SourceLabelTransition` | `{satellite_anchored, visual_propagated, dead_reckoned}` transition |
| `0x0007` | `MidFlightTileEmitted` | reference to `tiles.id` + `quality_metadata` snapshot |
| `0x0008` | `MidFlightTileFailed` | reason + thumbnail filename (AC-8.5 forensic exception) |
| `0x0009` | `MavlinkSigningKeyRotated` | new key fingerprint + reason (D-C8-9) |
| `0x000A` | `EkfSourceSetCommand` | D-C8-2 source-set switch event |
| `0x000B` | `VisualBlackoutEvent` | start / end + reason (AC-3.5, AC-NEW-8) |
| `0x000C` | `SpoofingPromotionEvent` | promoted / rejected + 10 s-gate state (AC-NEW-2, AC-NEW-8) |
| `0x000D` | `ContentHashGateFail` | takeoff-load gate fail (D-C10-3) |
| `0x000E` | `ThermalThrottleHybridSwitch` | K=3 ↔ K=2 switch event (D-CROSS-LATENCY-1, ADR-006) |
| `0x000F` | `ComponentLifecycleEvent` | per-component start / stop / fail |

**Backward compatibility**: new record types are appended; readers MUST skip records they don't recognise (each record's length prefix is enough to advance the cursor). No record type is ever renumbered or removed; deprecation is by ceasing to emit.

**Retention**: per-flight ring; on `IN_AIR → ON_GROUND` transition, the ring is sealed and the operator-orchestrator FDR-retrieval workflow (C12) copies it off the companion. The companion auto-prunes flights older than the configured retention window (default: 30 days); the prune log itself is its own FDR record on the next flight.

---

### 2.9 Tile JPEG bodies (filesystem)

JPEG bodies live at `./tiles/{zoomLevel}/{x}/{y}.jpg`. A sidecar `./tiles/{zoomLevel}/{x}/{y}.json` carries the full row content for upload-time payload assembly. Both files are atomic-written (via `atomicwrites`); both are removed only after the corresponding `tiles` row's lifecycle says it is safe (see § 2.1.2). Filesystem and PostgreSQL drift is treated as a defect: the operator-orchestrator C12 has a periodic `consistency_audit` that reports any orphan files / missing files.

---

## 3. Mermaid ERD

```mermaid
erDiagram
    flights ||--o{ tiles : "1 flight produces N onboard tiles"
    flights ||--|| manifests : "1 flight references 1 manifest"
    manifests ||--o{ sector_classifications : "manifest captures N classifications"
    manifests ||--o{ engine_cache_entries : "manifest pins N engines"
    sector_classifications ||--o{ tiles : "polygon contains N tiles (spatial)"

    flights {
        uuid id PK
        text companion_id
        timestamptz started_at
        timestamptz ended_at
        bytea signing_key_fingerprint
        bytea mavlink_signing_key_fingerprint
        text fc_profile
        text vio_strategy
        text vpr_strategy
        text build_kind
        uuid manifest_id FK
        text fdr_path
        timestamptz created_at
    }

    tiles {
        bigserial id PK
        int zoom_level
        int tile_x
        int tile_y
        double latitude
        double longitude
        double tile_size_meters
        int tile_size_pixels
        timestamptz capture_timestamp
        text compression
        text crs
        text source
        uuid flight_id FK
        text companion_id
        jsonb quality_metadata
        text voting_status
        text freshness_status
        bytea signature
        timestamptz created_at
        timestamptz updated_at
    }

    sector_classifications {
        bigserial id PK
        jsonb polygon_geojson
        text classification
        int freshness_threshold_days
        text set_by
        timestamptz set_at
        timestamptz revoked_at
        uuid manifest_id FK
    }

    manifests {
        uuid id PK
        bytea manifest_hash
        bytea model_bundle_hash
        bytea calibration_artifact_hash
        bytea corpus_hash
        bytea sector_classifications_hash
        bytea engine_cache_bundle_hash
        bytea descriptor_index_hash
        timestamptz created_at
        text staged_by
        timestamptz verified_at
    }

    engine_cache_entries {
        bigserial id PK
        text model_name
        text model_revision
        text sm
        text jetpack_version
        text tensorrt_version
        text precision
        text engine_path
        bigint engine_size_bytes
        bytea engine_hash
        text int8_calibration_path
        bytea int8_calibration_hash
        timestamptz created_at
    }
```

> Filesystem-only artifacts (camera calibration JSON, FAISS `.index`, FDR records, tile JPEG bodies) are NOT shown in the ERD.
> Their referential ties to PostgreSQL are: `manifests.calibration_artifact_hash` ↔ calibration JSON; `manifests.descriptor_index_hash` ↔ FAISS `.index`; `flights.fdr_path` ↔ FDR ring directory; `tiles.id` × `(zoom_level, tile_x, tile_y)` ↔ tile JPEG body + sidecar.

---

## 4. Migration Strategy

### 4.1 Versioning tool

**Alembic** (SQLAlchemy-based migrations) is the chosen tool. Justification:

- Python is the host language (per `architecture.md` § 2). Alembic ships with the SQLAlchemy ecosystem the project is already using for any DB-touching code.
- Alembic supports both autogenerate and hand-written migrations; the team will commit **hand-written migrations only**. Autogenerate output is acceptable as a starting point for a migration but must be human-reviewed before commit, because autogenerate cannot infer the additive-only / freeze-canonical rules.
- Alembic's `downgrade()` direction satisfies the reversibility requirement below for additive migrations; non-reversible migrations (rare; only for the controlled-deprecation cases) are explicitly marked.
- Alembic plays well with the per-companion local PostgreSQL deployment model: no shared upstream migration coordinator is needed.

**Coordination with `satellite-provider`**: the parent suite owns the canonical `satellite-provider` schema (whatever migration tool they use, likely EF Core, is irrelevant to the onboard side). The onboard project's `tiles` migration baseline MUST be cross-checked against the latest `satellite-provider` `tiles` schema at every onboard cycle; the onboard-only columns are added on top. A Plan-phase carryforward (Step 4 risk register) captures the ongoing coordination obligation.

### 4.2 Reversibility requirement

**Default = reversible.** Every migration MUST implement a working `downgrade()` that returns the schema to the prior state without data loss. Migrations that cannot be reversed (e.g., DROP COLUMN) are **forbidden by default** and require an ADR + user sign-off as part of the migration commit.

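The "returns the schema to the prior state" requirement can be tested as a round trip. The sketch below is an illustration under stated assumptions, not the project's test: it uses the stdlib `sqlite3` module as a stand-in for PostgreSQL, plain `upgrade()` / `downgrade()` functions that only mimic the shape of an Alembic migration (no Alembic API is called), and a simplified, hypothetical table and index.

```python
import sqlite3


def schema_snapshot(conn: sqlite3.Connection) -> list:
    """Return (name, DDL) for every table and index, sorted by name."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    )
    return [(name, sql) for name, sql in rows]


def upgrade(conn: sqlite3.Connection) -> None:
    # Additive change: a new (simplified) table plus an index on it.
    conn.execute(
        "CREATE TABLE engine_cache_entries "
        "(id INTEGER PRIMARY KEY, model_name TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX ix_engine_model ON engine_cache_entries (model_name)")


def downgrade(conn: sqlite3.Connection) -> None:
    # Exact mirror of upgrade(), applied in reverse order.
    conn.execute("DROP INDEX ix_engine_model")
    conn.execute("DROP TABLE engine_cache_entries")


def round_trips(conn: sqlite3.Connection) -> bool:
    """True if upgrade() changes the schema and downgrade() restores it exactly."""
    before = schema_snapshot(conn)
    upgrade(conn)
    changed = schema_snapshot(conn) != before
    downgrade(conn)
    return changed and schema_snapshot(conn) == before
```

Run against a connection that already holds the baseline schema, `round_trips` returning `False` would indicate a `downgrade()` that leaves residue or an `upgrade()` that did nothing.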
Concretely:

- `op.add_column(...)` ⇄ `op.drop_column(...)`: reversible.
- `op.create_table(...)` ⇄ `op.drop_table(...)`: reversible (data loss is intentional for new tables).
- `op.create_index(...)` ⇄ `op.drop_index(...)`: reversible.
- `op.alter_column(...)` for type changes: only reversible if the type widening is truly bidirectional; otherwise this is a forbidden-by-default migration that requires an ADR.
- Backfills / data migrations: must be either (a) idempotent (safe to re-run in either direction) or (b) split into a separate "data-only" migration whose reverse is a no-op with a documented data-loss note.

### 4.3 Naming convention

```
migrations/versions/YYYY_MM_DD_HHMM_<slug>.py
```

Where:

- `YYYY_MM_DD_HHMM` is the migration's authored timestamp in UTC (matches Alembic's natural ordering and is operator-friendly).
- `<slug>` is a snake_case description of the change in ≤ 6 words: `add_voting_status_to_tiles`, `create_engine_cache_entries`, `freshness_threshold_check`, etc.
- The Alembic `revision` and `down_revision` IDs are 12-character random hex (Alembic default).

`migrations/env.py` uses the `gps_denied.db.metadata` SQLAlchemy `MetaData` object for autogenerate diffs; the connection URL is read from the same config file the runtime uses (no separate migration config).

### 4.4 Migration baseline

The first migration `0001_initial.py` creates `flights`, `manifests`, `sector_classifications`, `engine_cache_entries`, and `tiles` (with the canonical `satellite-provider` columns). The first onboard-extension migration `0002_add_onboard_columns_to_tiles.py` adds the onboard-only columns (`flight_id`, `companion_id`, `quality_metadata`, `voting_status`, `freshness_status`, `signature`) plus the indexes specific to those columns. Splitting the baseline this way keeps the diff against `satellite-provider`'s migration history mechanically clear.

---

## 5. Seed Data

| Table | dev-tier1 (Tier-1 Docker) | staging-tier1 / staging-tier2 (CI) | production |
|---|---|---|---|
| `flights` | One synthetic flight row keyed against `tests/fixtures/flight_derkachi/` | Synthetic flights per replay scenario; matches IT-12 comparative-study fixture set | Real flight data, written by takeoff-load code; NEVER seeded |
| `tiles` (`source='googlemaps'`) | Loaded from `tests/fixtures/tiles_corpus/` (a curated, ≤100 MB subset of `satellite-provider`'s output for the Derkachi area) | Same as dev-tier1; the e2e-test `mock-suite-sat-service` fixture populates additional rows on demand for upload-side scenarios | Written by C11 `TileDownloader` from the operator workstation; NEVER seeded |
| `tiles` (`source='onboard_ingest'`) | Synthetic injected for IT / NFT replays | Synthetic injected for NFT-SEC-01 cache-poisoning Monte-Carlo runs | NEVER seeded (only the in-flight orthorectifier can write) |
| `sector_classifications` | One `active_conflict` polygon over Derkachi + one `stable_rear` polygon (test fixtures) | Same | Loaded by C10 from operator-staged manifest; NEVER seeded |
| `manifests` | One pre-built manifest for the Derkachi fixture set | Multiple: one per CI scenario | Loaded by C10 from operator-staged manifest; NEVER seeded |
| `engine_cache_entries` | Tier-1 has no GPU engines; rows reference Tier-2-built engines pulled from a CI artifact cache | Tier-2 builds engines on first run; rows generated then | Loaded by C10 from operator-staged engine cache; NEVER seeded |

**Hard rule**: production NEVER seeds any of these tables. The pre-flight cache build flow (C11 `TileDownloader` writes `tiles` rows; C10 writes `manifests` and `engine_cache_entries`) is the only writer of canonical pre-flight rows; the takeoff-load flow is the only writer of flight rows; the in-flight orthorectifier is the only writer of `source='onboard_ingest'` rows.
A test fixture writing into a production database is a defect (per `coderule.mdc`: "mocking data is needed only for tests, never mock data for dev or prod env").

**Seed mechanism (Tier-1 / staging)**: `tests/fixtures/seed_db.py` is the canonical Tier-1 seeder; it issues only INSERTs against a dedicated test database. It is invoked by the test harness (`pytest`) and by the `docker compose` developer setup. It does NOT use Alembic; it operates on the schema Alembic has already created.

---

## 6. Backward Compatibility

### 6.1 Default: additive-only

Every schema change is **additive by default**:

- New tables: additive; existing code paths unaffected.
- New columns: additive; default value or NULL; existing rows backfilled per migration data step.
- New indexes: additive; they affect read-path performance only, not semantics.
- New CHECK constraints: additive **only if** all existing rows already satisfy the constraint; otherwise the migration MUST split into "backfill rows" then "add constraint".
- New jsonb keys (e.g., in `tiles.quality_metadata`): additive; readers MUST tolerate missing keys (treat as absent / null).
- New FDR record types: additive; readers skip unknown types.

### 6.2 Forbidden by default

The following changes are **forbidden by default** and require an ADR + user sign-off:

- Renaming a column or table.
- Dropping a column or table.
- Tightening a column type (narrowing).
- Tightening a CHECK constraint that existing rows could violate.
- Removing a value from a CHECK constraint enum.

This rule is the data-model expression of `coderule.mdc`'s "Do not rename any databases or tables or table columns without confirmation."

### 6.3 The `tiles` schema specifically: frozen on canonical columns

Because the `tiles` schema MUST stay byte-compatible with `satellite-provider`'s on-disk and ingest format, the **canonical columns** (those marked "both" in § 2.1.1) are **frozen**: this project never renames, narrows the type of, or drops them.
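The freeze lends itself to a mechanical check in code review or CI. The following is a minimal sketch under stated assumptions, not the project's tooling: `ColumnChange`, `frozen_column_violations`, and the change-kind strings are hypothetical, and the column names in `CANONICAL_TILE_COLUMNS` are placeholders for whichever columns § 2.1.1 marks "both".

```python
from dataclasses import dataclass
from typing import Iterable, List

# HYPOTHETICAL column names for illustration; the authoritative list is the
# set of columns marked "both" in § 2.1.1.
CANONICAL_TILE_COLUMNS = frozenset({"zoom_level", "x", "y"})

# Change kinds that § 6.3 rules out for canonical columns.
FROZEN_KINDS = frozenset({"rename", "narrow_type", "drop"})


@dataclass(frozen=True)
class ColumnChange:
    """A simplified, hypothetical description of one proposed column change."""
    table: str
    column: str
    kind: str  # e.g. "add", "rename", "narrow_type", "drop"


def frozen_column_violations(changes: Iterable[ColumnChange]) -> List[ColumnChange]:
    """Return the proposed changes that would break the canonical-column freeze."""
    return [
        c for c in changes
        if c.table == "tiles"
        and c.column in CANONICAL_TILE_COLUMNS
        and c.kind in FROZEN_KINDS
    ]


proposed = [
    ColumnChange("tiles", "voting_status", "add"),   # onboard-only, additive: allowed
    ColumnChange("tiles", "zoom_level", "rename"),   # canonical rename: flagged
]
violations = frozen_column_violations(proposed)
```

In this sketch only the rename of the placeholder canonical column is reported; the additive onboard-only change passes untouched.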
Coordination with the parent suite is the only path to canonical-column changes; the onboard-only columns can evolve under the additive-only rule above without parent-suite coordination.

If `satellite-provider` introduces a canonical-column change in a future cycle, the onboard project must:

1. Open an ADR documenting the upstream change.
2. Run a migration that mirrors the upstream change exactly.
3. Re-validate that the post-landing upload payload (D-PROJ-2 contract) still matches.
4. Re-run the post-landing upload integration test (`mock-suite-sat-service` swapped for the real `satellite-provider`).

### 6.4 Schema versioning for jsonb sub-schemas

`tiles.quality_metadata` (and any future jsonb fields with non-trivial structure) carries a `_v` integer key. Readers branch on `_v`:

- `_v == 1` (current): the schema in § 2.1.3.
- `_v` missing: treat as `_v == 1` (back-compat for rows written before `_v` was added).
- `_v == 2+`: future additive evolution; readers MUST tolerate unknown additional keys.

Schema-version bumps are tracked in `_docs/02_document/schemas/` (a new `tiles_quality_metadata_v{N}.json` per bump).

### 6.5 FDR file-format compatibility

The FDR `record_header` is fixed at version 1. Every FDR reader (operator-orchestrator, replay tools) MUST:

- Validate `magic == 0x47464452`; on mismatch, treat the segment as corrupt and skip it.
- Read the `version` field; on `version != 1`, refuse to interpret the body and emit an "unknown FDR version" diagnostic.
- For known `version == 1`, read the `type` field; on unknown `type`, advance the cursor by `body_length + 4` (CRC) and continue.

The FDR file format will be bumped to version 2 only if a structural change to `record_header` is required; that change is gated on the same ADR + user sign-off rule as a forbidden-by-default schema change.

---

## 7. Non-Persistent Runtime Entities (referenced for completeness only)

The following DTOs flow through the per-frame pipeline in memory and are **NOT** persistent (except as derived FDR records; see § 2.8 above). They are documented here because § 4 of `architecture.md` lists them as "Core entities", but the data-model migration scope does not include them. Their full schema lives in component specs (Plan Step 3).

| DTO | Producer | Consumers | Persisted as |
|---|---|---|---|
| `NavCameraFrame` | Camera ingest thread | C1, C2 | Never (AC-8.5 forbids raw-frame storage; the only forensic exception is AC-8.5 ≤ 0.1 Hz failed-tile thumbnails inside the FDR) |
| `ImuSample` / `ImuWindow` | C8 inbound side | C1, C5 | FDR `ImuTrace` records (lossy summary, not full stream) |
| `VioOutput` | C1 | C5 | FDR `SystemHealth` (covariance summary only); not as a row |
| `VprQuery` / `VprResult` | C2 | C2.5 | Never (transient retrieval state) |
| `RerankResult` | C2.5 | C3 / C3.5 | Never |
| `MatchResult` | C3 / C3.5 | C4 | Never |
| `PoseEstimate` | C4 → C5 | C8, C13 | FDR `EmittedExternalPosition` records (post-emission); also feeds `tiles.quality_metadata` for in-flight orthorectified tiles |
| `EmittedExternalPosition` | C8 | FC, FDR | FDR record |
| `FlightStateSignal` | C8 inbound side | flight-state guard, FDR | FDR `ComponentLifecycleEvent` on transition |
| `CameraCalibration` (loaded once) | calibration loader | C1, C3, C4 | NOT in PostgreSQL; see § 2.6 |

---

## 8. Open Items / Plan-Phase Carryforward

- **`satellite-provider` schema-drift coordination** (recurring): the canonical-columns freeze depends on an explicit cross-project schema review every cycle. Captured as a Plan-Step-4 risk register entry; carryforward.
- **D-PROJ-2 #1 ingest-endpoint contract**: the `signature` column's exact algorithm (Ed25519 vs ECDSA) and the per-flight key distribution are parent-suite design decisions; the onboard side is contract-flexible and treats `signature` as opaque `bytea`.
- **D-PROJ-2 #2 voting-layer schema**: parent-suite-side; this onboard data model writes `voting_status='pending'` and reads `'trusted'` only. The actual promotion table lives in `satellite-provider`'s schema and is out of scope here.
- **GeoJSON polygon precision** (`sector_classifications.polygon_geojson`): GeoJSON is precision-bounded by JSON number representation; if AC-NEW-7 cache-poisoning safety needs sub-metre polygon edges, a future migration can switch to PostGIS `geography(Polygon, 4326)`. Captured as carryforward (currently no AC requirement to do so).
- **FDR retention policy default**: 30 days post-landing is a reasonable default but is not pinned in any AC; carryforward to the operator-orchestrator spec (C12) for confirmation.
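For reference alongside the FDR items above, the three reader rules from § 6.5 can be sketched as a small segment walker. This is an illustration under loud assumptions, not the FDR reader: the header layout (`<IHHI`, little-endian, fields in the order magic, version, type, body_length), the type codes in `KNOWN_TYPES`, and the helper names `pack_record` and `read_segment` are all hypothetical; the authoritative `record_header` layout is the one in § 2.8. CRC verification is left out, and stopping at an unknown version is this sketch's own choice, made because `body_length` cannot be trusted once the header version differs.

```python
import struct
from typing import Iterator, List, Tuple

FDR_MAGIC = 0x47464452        # from § 6.5
SUPPORTED_VERSION = 1         # from § 6.5
CRC_LEN = 4                   # trailing CRC bytes, from § 6.5
# ASSUMED header layout (little-endian): magic u32, version u16, type u16,
# body_length u32. The real layout is defined in § 2.8.
HEADER = struct.Struct("<IHHI")
# HYPOTHETICAL type codes, for illustration only.
KNOWN_TYPES = frozenset({1, 2})


def pack_record(rtype: int, body: bytes, version: int = SUPPORTED_VERSION) -> bytes:
    """Build one record under the assumed layout, with a zeroed placeholder CRC."""
    return HEADER.pack(FDR_MAGIC, version, rtype, len(body)) + body + b"\x00" * CRC_LEN


def read_segment(buf: bytes, diagnostics: List[str]) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, body) for known records; the CRC is skipped, not verified."""
    cursor = 0
    while cursor + HEADER.size <= len(buf):
        magic, version, rtype, body_len = HEADER.unpack_from(buf, cursor)
        if magic != FDR_MAGIC:
            # Corrupt segment: give up on the rest of it.
            diagnostics.append(f"bad magic at offset {cursor}: segment skipped")
            return
        if version != SUPPORTED_VERSION:
            # Never interpret the body of an unknown header version.
            diagnostics.append(f"unknown FDR version {version} at offset {cursor}")
            return
        body_start = cursor + HEADER.size
        next_cursor = body_start + body_len + CRC_LEN
        if next_cursor > len(buf):
            diagnostics.append(f"truncated record at offset {cursor}")
            return
        if rtype in KNOWN_TYPES:
            yield rtype, buf[body_start:body_start + body_len]
        # Known or unknown type: advance past the body and its CRC.
        cursor = next_cursor


segment = pack_record(1, b"abc") + pack_record(99, b"zz") + pack_record(2, b"")
diags: List[str] = []
records = list(read_segment(segment, diags))
```

Here the record with the unknown type code 99 is stepped over by `body_length + 4` bytes, so the two known records on either side of it are still returned and no diagnostic is emitted.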