mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 22:11:13 +00:00
[AZ-307] c6 FreshnessGate: active-conflict reject + stable-rear downgrade
Replaces the AZ-305 pass-through _evaluate_freshness hook with the production FreshnessGate. Loads tile_freshness_rules + sector classifications once at construction, builds an rtree index, and on every evaluate() either returns metadata unchanged (FRESH), stamps freshness_label=DOWNGRADED (stable_rear + stale), or raises FreshnessRejectionError carrying tile_id / age_seconds / classification / rule diagnostics (active_conflict + stale).

Constructed inside PostgresFilesystemStore.from_config; the public storage_factory signature is preserved so AZ-305 unit tests still build the store with freshness_gate=None for the pass-through path.

FDR schema bumped to v1.2.0: adds c6.freshness.rejected and c6.freshness.downgraded kinds (non-breaking; v1.1 readers route them opaquely).

Operator CLI `python -m c6_tile_cache.freshness_gate explain` dry-runs the decision for a (lat, lon, capture_ts).

Adjacent hygiene: c6_tile_cache.tools._dump_tile now passes os.environ to load_config (AZ-305 regression: load_config requires the env mapping).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,202 @@
# C6 Freshness Gate: Active-Conflict Reject + Stable-Rear Downgrade

**Task**: AZ-307_c6_freshness_gate
**Name**: C6 Freshness Gate
**Description**: Implement the freshness gate that runs at every `write_tile` and `insert_metadata` call site. The gate looks up the sector classification of the target `(lat, lon)` in `sector_boundaries`, reads the per-classification rule from `tile_freshness_rules` (`max_age_seconds`, `action`), and, before the row lands, either raises `FreshnessRejectionError` (active_conflict + stale → reject) or stamps `freshness_label = DOWNGRADED` (stable_rear + stale → downgrade). It replaces the pass-through `_evaluate_freshness` hook that `PostgresFilesystemStore` ships in AZ-305. The rules table is read once at construction (rules are per-flight; the flight is the lifetime). Sector boundaries are cached in an in-memory R-tree (operators define at most a few hundred sectors per flight). An FDR record is emitted on every rejection and every downgrade.
**Complexity**: 2 points
**Dependencies**: AZ-303_c6_storage_interfaces, AZ-304_c6_postgres_schema, AZ-305_c6_postgres_filesystem_store, AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf
**Component**: c6_tile_cache (epic AZ-250 / E-C6)
**Tracker**: AZ-307
**Epic**: AZ-250 (E-C6)

### Document Dependencies

- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md`: Invariants I-2 (active_conflict reject) and I-3 (stable_rear downgrade) are the canonical statement of this task's behaviour.
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md`: defines `FreshnessRejectionError` and `FreshnessLabel`.
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`: `kind="c6.freshness.rejected"` / `kind="c6.freshness.downgraded"` envelopes.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: INFO log shape on rule load and on downgrade; WARN log shape on rejection.

## Problem

Without a real freshness gate:

- AC-8.2 (active_conflict ≤ 6 mo, stable_rear ≤ 12 mo) is unenforced: stale tiles in active sectors land silently and downstream consumers cannot tell them apart.
- AC-NEW-6 (system rejects/downgrades stale tiles) collapses: tests C6-IT-02 / C6-IT-05 cannot pass.
- The pass-through hook in `PostgresFilesystemStore` (AZ-305) accepts every freshness label as-is, so an attacker could feed `freshness_label=FRESH` for a 10-year-old tile and bypass the safety budget.
- The cache-poisoning safety budget (AC-NEW-7) loses one of its layers: the rule-evaluation point is a defence boundary, not a label-trust point.
- Operator sector classifications (set via C12) have no read consumer at the C6 layer, so the classifications would be data-only, never policy.

This task wires the rule-evaluation logic into the AZ-305 store's `_evaluate_freshness` hook.

## Outcome

- A `FreshnessGate` class at `src/gps_denied_onboard/components/c6_tile_cache/freshness_gate.py` with a single public method `evaluate(metadata: TileMetadata) -> TileMetadata` that does one of the following:
  - Returns the same `metadata` if FRESH applies (no policy intervention).
  - Returns a new `metadata` with `freshness_label=DOWNGRADED` if the tile is stable_rear-stale.
  - Raises `FreshnessRejectionError` if the tile is active_conflict-stale.
- Constructor signature: `__init__(self, *, postgres_pool: psycopg_pool.ConnectionPool, fdr_client: FdrClient, logger: Logger, clock: Clock)`. The `clock` injection lets tests advance time deterministically.
- At construction:
  1. Reads `sector_boundaries` once and builds an in-memory R-tree using the `rtree` library (check the requirements file; if `rtree` is not already pinned, this task adds the pin).
  2. Reads `tile_freshness_rules` once and caches the two rules in a frozen dict `{SectorClassification: FreshnessRule}`.
  3. Emits an INFO log: `kind="c6.freshness.loaded"` with `n_sectors`, `rules`.
- `evaluate(metadata)`:
  1. Computes `tile_age_seconds = now - metadata.capture_timestamp` via the injected `clock`.
  2. Queries the R-tree for the sector containing `(metadata.tile_id.lat, metadata.tile_id.lon)`. If multiple sectors match (overlap), the smallest by area wins (deterministic tie-break).
  3. If no sector matches → treats the tile as `STABLE_REAR` (per data_model.md convention; documented as the implicit default).
  4. Looks up the rule for that classification.
  5. If `tile_age_seconds <= rule.max_age_seconds` → returns `metadata` unchanged (FRESH).
  6. Else if `rule.action == 'reject'` → emits FDR `kind="c6.freshness.rejected"` and a WARN log; raises `FreshnessRejectionError(tile_id, age_seconds, classification, rule)`.
  7. Else if `rule.action == 'downgrade'` → emits FDR `kind="c6.freshness.downgraded"` and an INFO log; returns `dataclasses.replace(metadata, freshness_label=FreshnessLabel.DOWNGRADED)`.
- The `PostgresFilesystemStore`'s `_evaluate_freshness` hook is replaced: instead of `return metadata.freshness_label`, it now calls `freshness_gate.evaluate(metadata).freshness_label`. This is a wiring change in AZ-305's class, implemented as a small constructor argument addition (`freshness_gate: Optional[FreshnessGate] = None`) so AZ-305 remains testable in isolation.
- The composition root constructs `FreshnessGate` and passes it to `PostgresFilesystemStore` AFTER the migration runner (AZ-304) has populated the rules table.
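
The `evaluate` decision flow listed above can be sketched as follows. This is a minimal, stdlib-only illustration rather than the shipped class: the dataclass shapes, the `Sector` helper, and the rule ages (180 and 365 days) are assumptions made for the example, a plain list scan stands in for the `rtree` index, and FDR and log emission are omitted.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class FreshnessLabel(Enum):
    FRESH = "fresh"
    DOWNGRADED = "downgraded"


class SectorClassification(Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


@dataclass(frozen=True)
class FreshnessRule:
    max_age_seconds: float
    action: str  # "reject" or "downgrade"


@dataclass(frozen=True)
class TileId:  # hypothetical shape; the real type comes from AZ-303
    lat: float
    lon: float


@dataclass(frozen=True)
class TileMetadata:  # hypothetical subset of the real fields
    tile_id: TileId
    capture_timestamp: datetime
    freshness_label: FreshnessLabel = FreshnessLabel.FRESH


@dataclass(frozen=True)
class Sector:  # one sector_boundaries row reduced to its bounding box
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    classification: SectorClassification

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)

    @property
    def area_deg2(self) -> float:
        # Degrees squared: adequate for ranking, not a true area.
        return (self.max_lat - self.min_lat) * (self.max_lon - self.min_lon)


class FreshnessRejectionError(Exception):  # stand-in for the AZ-303 error
    def __init__(self, tile_id, age_seconds, classification, rule):
        super().__init__(
            f"Tile rejected by freshness gate: {tile_id} age={age_seconds:.0f}s"
        )
        self.tile_id, self.age_seconds = tile_id, age_seconds
        self.classification, self.rule = classification, rule


def evaluate(metadata, sectors, rules, now):
    """Return metadata unchanged, return a downgraded copy, or raise."""
    age = (now - metadata.capture_timestamp).total_seconds()
    tid = metadata.tile_id
    # List scan stands in for the R-tree query in this sketch.
    matches = [s for s in sectors if s.contains(tid.lat, tid.lon)]
    if matches:
        # Overlap tie-break: the smallest bounding box wins.
        classification = min(matches, key=lambda s: s.area_deg2).classification
    else:
        classification = SectorClassification.STABLE_REAR  # implicit default
    rule = rules[classification]
    if age <= rule.max_age_seconds:
        return metadata
    if rule.action == "reject":
        raise FreshnessRejectionError(tid, age, classification, rule)
    if rule.action == "downgrade":
        return dataclasses.replace(
            metadata, freshness_label=FreshnessLabel.DOWNGRADED
        )
    raise ValueError(f"unknown rule action: {rule.action!r}")


DAY = 86_400
# Illustrative ages only; the seeded values live in tile_freshness_rules.
RULES = {
    SectorClassification.ACTIVE_CONFLICT: FreshnessRule(180 * DAY, "reject"),
    SectorClassification.STABLE_REAR: FreshnessRule(365 * DAY, "downgrade"),
}
```

Passing `now` explicitly keeps the sketch deterministic; in the real class that value would come from the injected `clock`.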

## Scope

### Included

- `FreshnessGate` class with `evaluate(metadata)` method.
- Construction-time R-tree build over `sector_boundaries`.
- Construction-time rules-table cache.
- FDR emission on every rejection and every downgrade.
- WARN log on rejection (per `tile_store.md` § log table); INFO log on downgrade (downgrade is recoverable, not an error).
- The smallest-area tie-break for overlapping sector boundaries (deterministic, documented).
- The implicit STABLE_REAR default for `(lat, lon)` outside any sector.
- A constructor `Optional[FreshnessGate]` arg on `PostgresFilesystemStore` so AZ-305 stays unit-testable without this gate.
- Composition-root wiring (the factory `build_tile_store` becomes `build_tile_store(config, freshness_gate)`).
- A standalone CLI `python -m c6_tile_cache.freshness_gate explain <lat> <lon> <capture_iso>` for operators to dry-run the gate.
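
The argument surface of the `explain` subcommand could be parsed along these lines. This is a hedged sketch using only `argparse` from the standard library; the helper names `parse_capture_iso` and `build_parser`, and the choice to treat a timestamp without an offset as UTC, are assumptions for illustration and are not taken from the spec.

```python
import argparse
from datetime import datetime, timezone


def parse_capture_iso(text: str) -> datetime:
    """Parse an ISO-8601 timestamp into an aware datetime."""
    ts = datetime.fromisoformat(text)
    if ts.tzinfo is None:
        # Assumption for this sketch: a timestamp without an offset is UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m c6_tile_cache.freshness_gate"
    )
    sub = parser.add_subparsers(dest="command", required=True)
    explain = sub.add_parser("explain", help="dry-run the gate for one tile")
    explain.add_argument("lat", type=float, help="latitude in degrees")
    explain.add_argument("lon", type=float, help="longitude in degrees")
    explain.add_argument(
        "capture_iso", type=parse_capture_iso, help="capture time, ISO-8601"
    )
    return parser
```

The parsed namespace would then feed a `TileMetadata` built only for the dry run, so the command evaluates the rule without writing a tile.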

### Excluded

- Sector-boundary CRUD: owned by C12 operator tooling.
- Tile-freshness-rule CRUD beyond the migration's seeded defaults: operators can edit at the DB level today; a future task adds an admin API.
- Rule reload mid-flight: out of scope this cycle. The flight is the lifetime; a rule change requires a process restart.
- Cross-sector pose-error voting (the parent-suite D-PROJ-2 voting layer): that lives in `satellite-provider`.
- Time-of-day or seasonal freshness adjustments: not in description.md, out of scope.
- Per-tile freshness override (operator manually marks one tile fresh): out of scope; the operator workaround is to delete + re-insert with a fresh capture_timestamp.

## Acceptance Criteria

**AC-1: Active-conflict stale tile is rejected**
Given a sector classified `ACTIVE_CONFLICT` with the default 6-month rule, and a tile inside it with `capture_timestamp = now - 7 months`
When `evaluate(metadata)` is called
Then `FreshnessRejectionError` is raised with a message naming the tile_id, the age, and the rule; ONE FDR `kind="c6.freshness.rejected"` record is emitted; ONE WARN log is emitted

**AC-2: Active-conflict fresh tile passes**
Given the same sector and a tile with `capture_timestamp = now - 5 months`
When `evaluate(metadata)` is called
Then the call returns `metadata` unchanged; no FDR record is emitted; no WARN log is emitted

**AC-3: Stable-rear stale tile is downgraded**
Given a sector classified `STABLE_REAR` with the default 12-month rule, and a tile inside it with `capture_timestamp = now - 13 months`
When `evaluate(metadata)` is called
Then the returned `TileMetadata` has `freshness_label = FreshnessLabel.DOWNGRADED`; the rest of the metadata is unchanged; ONE FDR `kind="c6.freshness.downgraded"` record is emitted; ONE INFO log is emitted

**AC-4: Stable-rear fresh tile passes**
Given the same sector and a tile with `capture_timestamp = now - 10 months`
When `evaluate(metadata)` is called
Then the call returns `metadata` unchanged; no FDR; no log

**AC-5: Tile outside all sectors defaults to STABLE_REAR**
Given a tile at `(lat, lon)` not contained in any `sector_boundaries` row
When `evaluate(metadata)` is called with a 13-month-old `capture_timestamp`
Then the result is `freshness_label = DOWNGRADED` (the implicit STABLE_REAR default applies); FDR `kind="c6.freshness.downgraded"` is emitted

**AC-6: Overlapping sectors resolve by smallest area**
Given two `sector_boundaries` rows: a 1°×1° ACTIVE_CONFLICT box and a 0.1°×0.1° STABLE_REAR box, with the smaller box fully inside the larger
When `evaluate(metadata)` is called for a tile inside the smaller (and thus also the larger) box
Then the STABLE_REAR rule applies (smallest area wins); a 13-month-old tile is downgraded, NOT rejected

**AC-7: Rules and sectors are loaded once at construction**
Given a `FreshnessGate` instance
When 10000 `evaluate` calls are made
Then no `sector_boundaries` or `tile_freshness_rules` SELECT is observed (verifiable via psycopg query log capture); only the construction-time SELECT pair is observed

**AC-8: FreshnessRejectionError carries diagnostic fields**
Given an active_conflict rejection
When the test inspects the raised exception
Then `exc.tile_id`, `exc.age_seconds`, `exc.classification`, `exc.rule` are populated; the exception message starts with `"Tile rejected by freshness gate"`

**AC-9: FDR record envelopes match contract**
Given a rejection or downgrade
When the FDR record is captured
Then the record matches `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` shape with the documented `kind`, `producer_id="c6_tile_cache.freshness"`, payload `{tile_id, age_seconds, classification, rule_action, rule_max_age_seconds}`
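
As an illustration of the payload named in AC-9, a record builder might look like the sketch below. Only `kind`, `producer_id`, and the five payload keys come from this task; the flat dict envelope and the helper name `build_freshness_fdr_record` are assumptions, and the real envelope defined in `fdr_record_schema.md` may carry further fields.

```python
ALLOWED_KINDS = frozenset(
    {"c6.freshness.rejected", "c6.freshness.downgraded"}
)


def build_freshness_fdr_record(
    kind: str,
    tile_id: str,
    age_seconds: float,
    classification: str,
    rule_action: str,
    rule_max_age_seconds: float,
) -> dict:
    """Assemble the AC-9 fields into a plain dict (hypothetical envelope)."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unexpected FDR kind: {kind!r}")
    return {
        "kind": kind,
        "producer_id": "c6_tile_cache.freshness",
        "payload": {
            "tile_id": tile_id,
            "age_seconds": age_seconds,
            "classification": classification,
            "rule_action": rule_action,
            "rule_max_age_seconds": rule_max_age_seconds,
        },
    }
```

Building the record in one place makes the deep-equal comparison in the AC-9 unit test straightforward.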

**AC-10: Composition wiring change works end-to-end**
Given a `PostgresFilesystemStore` constructed WITH a `FreshnessGate` argument
When `write_tile` is called with a stale active_conflict tile
Then `FreshnessRejectionError` is raised; no JPEG / row / sidecar is written (verifiable via filesystem + DB inspection); the rejection FDR is emitted via the same `FdrClient` AZ-305 already holds
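
The wiring behind AC-10 can be illustrated with a toy store. This is a sketch under stated assumptions, not the AZ-305 class: `StoreSketch`, its in-memory `written` list, and the two-argument `write_tile` are hypothetical stand-ins for the real Postgres and filesystem writes.

```python
from typing import Any, List, Optional, Tuple


class StoreSketch:
    """Toy stand-in for PostgresFilesystemStore (hypothetical)."""

    def __init__(self, *, freshness_gate: Optional[Any] = None) -> None:
        self._freshness_gate = freshness_gate
        # Stands in for the JPEG, row, and sidecar writes.
        self.written: List[Tuple[Any, Any, bytes]] = []

    def _evaluate_freshness(self, metadata: Any) -> Any:
        if self._freshness_gate is None:
            # AZ-305 pass-through path: trust the incoming label.
            return metadata.freshness_label
        # Gate path: may raise before anything is written.
        return self._freshness_gate.evaluate(metadata).freshness_label

    def write_tile(self, metadata: Any, blob: bytes) -> None:
        label = self._evaluate_freshness(metadata)  # runs before any write
        self.written.append((metadata, label, blob))
```

Because the hook runs before the first write, a rejection leaves no partial state behind, which is the property AC-10 checks.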

## Non-Functional Requirements

**Performance**
- `evaluate` p99 ≤ 100 µs. The R-tree point-in-rect lookup over a few hundred sectors is expected to be a small fraction of that budget; the remaining cost is the `now - capture_timestamp` arithmetic and, on the stale path only, the FDR emission. The NFR-perf-evaluate microbenchmark verifies the budget.
- Construction takes ≤ 50 ms even for a few-hundred-sector flight (R-tree build is O(N log N) on a small N).

**Compatibility**
- `rtree` Python library: verify the project pin already includes it; if not, this task adds it (compatible with the project's existing geospatial stack).
- `dataclasses.replace` is stdlib.

**Reliability**
- Construction failure is fail-fast: a malformed `tile_freshness_rules` row (e.g., unknown `action` enum value) raises a `ConfigSchemaError` extension at construction; the composition root catches it and aborts startup with a clear operator message.
- The gate is idempotent: calling `evaluate` on the same `metadata` twice at the same clock reading returns deep-equal results (no hidden state changes).
- The injected `Clock` MUST be the same singleton used by AZ-305's `record_lru_access` and AZ-302's thermal publisher (already a project-wide singleton).
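
The fail-fast rule load could be validated roughly as follows. This is a sketch, assuming rows arrive as `(classification, max_age_seconds, action)` tuples; the local `ConfigSchemaError` class is a stand-in for the project's real error type, and `load_rules` is a hypothetical helper name.

```python
from types import MappingProxyType
from typing import Iterable, Mapping, Tuple


class ConfigSchemaError(Exception):
    """Stand-in for the project's config error type (assumption)."""


VALID_ACTIONS = frozenset({"reject", "downgrade"})
REQUIRED_CLASSIFICATIONS = frozenset({"active_conflict", "stable_rear"})


def load_rules(
    rows: Iterable[Tuple[str, float, str]],
) -> Mapping[str, Tuple[float, str]]:
    """Validate rule rows once and return a read-only mapping."""
    rules = {}
    for classification, max_age_seconds, action in rows:
        if action not in VALID_ACTIONS:
            raise ConfigSchemaError(
                f"tile_freshness_rules: unknown action {action!r} "
                f"for classification {classification!r}"
            )
        if max_age_seconds <= 0:
            raise ConfigSchemaError(
                f"tile_freshness_rules: non-positive max_age_seconds "
                f"for classification {classification!r}"
            )
        rules[classification] = (float(max_age_seconds), action)
    missing = REQUIRED_CLASSIFICATIONS - rules.keys()
    if missing:
        raise ConfigSchemaError(
            f"tile_freshness_rules: missing rules for {sorted(missing)}"
        )
    # Read-only view so evaluate() cannot mutate the cached rules.
    return MappingProxyType(rules)
```

Rejecting bad rows at construction means `evaluate` never has to handle an unknown action on the hot path.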

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Active-conflict + 7-month tile | FreshnessRejectionError; one FDR; one WARN log |
| AC-2 | Active-conflict + 5-month tile | Returns unchanged; no FDR; no WARN |
| AC-3 | Stable-rear + 13-month tile | Returns with freshness_label=DOWNGRADED; one FDR; one INFO |
| AC-4 | Stable-rear + 10-month tile | Returns unchanged; no FDR; no log |
| AC-5 | Tile outside all sectors + 13-month | Defaults to STABLE_REAR; downgraded |
| AC-6 | Overlapping sectors (smaller STABLE_REAR inside larger ACTIVE_CONFLICT) | Smaller wins; downgrade, not reject |
| AC-7 | 10k evaluate calls + query-log capture | Only construction-time SELECTs observed |
| AC-8 | Inspect raised FreshnessRejectionError fields | tile_id, age_seconds, classification, rule populated |
| AC-9 | FDR record shape on reject and downgrade | Matches schema deep-equal |
| AC-10 | E2E PostgresFilesystemStore + FreshnessGate write | FreshnessRejectionError; no fs/db effects |
| NFR-perf-evaluate | Microbench evaluate × 100k | p99 ≤ 100 µs |
| NFR-reliability-malformed-rule | Inject `tile_freshness_rules` row with `action='unknown'` | ConfigSchemaError at construction |

## Constraints

- The R-tree is built ONCE at construction; mid-flight sector boundary changes are NOT honoured (process restart required).
- The implicit STABLE_REAR default for tiles outside all sectors is documented and is the safer default (downgrade, not reject); an operator who wants fail-closed behaviour may add an explicit `whole_world` ACTIVE_CONFLICT sector.
- Tie-break for overlapping sectors is "smallest area wins", deterministic and documented; bbox area is computed via `(max_lat - min_lat) * (max_lon - min_lon)` (degrees², adequate for ranking, not for actual area).
- The gate raises `FreshnessRejectionError` (defined in AZ-303); this task does NOT define new error types.
- The gate's `evaluate` method MUST be idempotent and side-effect-free except for FDR + log emissions; future code review treats internal state mutation as a `Reliability` finding (High).
- `Clock` injection is mandatory: no direct `time.time()` calls; tests assert deterministic output by advancing the fake clock.
- This task does NOT introduce new third-party dependencies beyond `rtree` (verify in requirements).
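
The clock-injection constraint can be illustrated with a small fake. This is a sketch under assumptions: the `now()` method name and the `FakeClock` helper are hypothetical, since the actual `Clock` interface is owned by another task and its signature is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Protocol


class Clock(Protocol):
    """Assumed minimal interface: one method returning an aware datetime."""

    def now(self) -> datetime: ...


class FakeClock:
    """Test-only clock that moves only when the test advances it."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


def tile_age_seconds(clock: Clock, capture_timestamp: datetime) -> float:
    # All time reads go through the injected clock, never time.time().
    return (clock.now() - capture_timestamp).total_seconds()
```

A test can construct the gate with a `FakeClock`, evaluate a tile just inside its age limit, advance the clock past the limit, and evaluate again to cover both branches deterministically.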

## Risks & Mitigation

**Risk 1: R-tree library API drift across pins**
- *Risk*: An `rtree` minor version bump changes the API; constructor calls fail at runtime.
- *Mitigation*: Pin recorded in requirements; the wrapper isolates `rtree` to this single class; future breaks fail-fast at construction.

**Risk 2: Sector-boundary update mid-flight is silently ignored**
- *Risk*: Operator updates sector_boundaries via SQL during a flight; the gate's R-tree is stale; new tile classifications use old boundaries.
- *Mitigation*: Documented constraint: process restart required for boundary changes. Operator workflow: pre-flight sector setup is C12's responsibility; in-flight boundary changes are not in scope.

**Risk 3: STABLE_REAR-default for tiles outside all sectors is too lenient**
- *Risk*: A tile from an unmapped area lands as DOWNGRADED rather than rejected, leaking past the safety budget.
- *Mitigation*: Documented as the safer default (operator adds explicit ACTIVE_CONFLICT whole_world sector for fail-closed). FDR `kind="c6.freshness.downgraded"` carries the classification, so the FDR-trace shows operators which tiles fell through. A future task could add a `config.tile_cache.freshness_gate.no_sector_default` config field; that is out of scope this cycle.

**Risk 4: Smallest-area tie-break interacts badly with adversarial sector layouts**
- *Risk*: An operator (or attacker) inserts a tiny STABLE_REAR sector inside a large ACTIVE_CONFLICT box to bypass rejections.
- *Mitigation*: Sector boundary CRUD is C12-only and operator-authenticated (per architecture's threat model). The smallest-area rule is documented; if abused, the operator audit log (set_by_operator + set_at columns in `sector_boundaries`) surfaces the change.

**Risk 5: Clock-injection mistake (fake clock used in production)**
- *Risk*: Composition root accidentally wires `FakeClock` instead of `WallClock` to the gate; freshness ages are computed against a fixed time; everything looks fresh forever.
- *Mitigation*: AZ-265's `Clock` interface owns the WallClock vs. fake choice via the same composition-root selection that owns thermal-state polling. The factory's per-binary CMake `BUILD_*` flags already separate live (WallClock) from replay (TlogDerivedClock); test wiring is the only place fakes appear. Code review's wiring check (Phase 6 / Architecture) is the canonical guard.

## Runtime Completeness

- **Named capability**: per-sector freshness gate enforcing AC-8.2 / AC-NEW-6 (description.md / E-C6 / data_model.md).
- **Production code that must exist**: real `FreshnessGate` class with R-tree-backed sector lookup, real `tile_freshness_rules` query at construction, real `dataclasses.replace` for the downgrade label, real FDR emission on every reject and downgrade, real WARN/INFO logs.
- **Allowed external stubs**: tests MAY use a fake `Clock`, fake `FdrClient`, fake `Logger`, and an in-memory psycopg fake (testcontainer is also fine; both are equivalent under AZ-304's schema fixture); production wiring uses real WallClock + real AZ-273 `FdrClient` + real AZ-266 `Logger` + real Postgres pool.
- **Unacceptable substitutes**:
  - a hardcoded "everything is fresh" pass-through (defeats the entire point);
  - a Python in-memory boundary list ignoring `sector_boundaries` (would diverge from the operator's source of truth in C12);
  - `time.time()` direct calls without Clock injection (would break test determinism);
  - skipping the R-tree and doing a linear scan over sectors (works at small N but invites future regression at larger N; the R-tree is pre-emptively the right shape per coderule.mdc's "the simplest solution that satisfies all requirements, including maintainability").

## Contract

This task implements behaviour mandated by `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` § Invariants I-2 + I-3. No new contract file is added: the gate is a policy implementation behind an existing Protocol surface (`PostgresFilesystemStore.write_tile / insert_metadata` already raise `FreshnessRejectionError` per the contract; this task supplies the rule-evaluation logic).