[AZ-421] Batch 82: FT-P-15 + FT-P-16 + FT-P-18 cache / offline / no-raw-retention

FT-P-15: parse FDR `cache-self-check` records; assert every tile-manifest
entry has CRS, tile_matrix, dimension, m_per_px, capture_date, source,
compression; m_per_px >= 0.5 (or rejected by FDR `tile-load-rejected`).

FT-P-16: read `docker network inspect e2e-net` + `docker inspect <sut>`
snapshots; assert `Internal == true` AND SUT attached only to e2e-net.
The 0-egress semantic of AC-8.3 is enforced structurally.

FT-P-18: walk FDR + tile-cache, probe JPEG dimensions via stdlib SOF
parser, reject any file matching nav-camera raw pattern (5472x3648 or
880x720). Extrapolate thumbnail-log size to 8h; assert < 1 GB.

Adds runner.helpers.tile_cache_inspector with five evaluators
(manifest schema, offline mode, raw-frame detection, thumbnail budget,
JPEG dimension probe) + walk_files helper. Pure-logic coverage: 43
new unit tests; full e2e/_unit_tests/ suite 793 passing (was 746).
Scenarios skip locally when SITL replay fixture or docker-inspect
env vars are missing; production hooks (cache-self-check FDR record,
tile-load-rejected events, docker-inspect snapshots) are tracked
outside this task.

See _docs/03_implementation/batch_82_report.md +
reviews/batch_82_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-17 15:09:58 +03:00
parent b0296da911
commit 7d1288e4ba
9 changed files with 1693 additions and 3 deletions
+187
View File
@@ -0,0 +1,187 @@
# Batch 82 Report — FT-P-15 + FT-P-16 + FT-P-18 cache / offline / no-raw-retention
**Batch**: 82
**Date**: 2026-05-17
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-421 (3 cp) — single task covering 3 sub-scenarios
**Cycle**: 1
**Verdict**: COMPLETE — PASS_WITH_WARNINGS (self-reviewed; see `reviews/batch_82_review.md`)
## Summary
Implements three storage / cache compliance scenarios that share the
`tile-cache-fixture` + FDR-archive observation surface:
* **FT-P-15** — Tile manifest schema completeness + 0.5 m/px floor
(AC-8.1). Reads FDR `cache-self-check` record + `tile-load-rejected`
events, validates every entry has CRS, tile_matrix, dimension,
m_per_px, capture_date, source, compression; entries below floor
must be explicitly rejected.
* **FT-P-16** — Offline-only operation (AC-8.3 / RESTRICT-SAT-1).
Reads `docker network inspect e2e-net` + `docker inspect <sut>`
JSON snapshots; asserts `e2e-net.Internal == true` AND the SUT is
attached to that network only. The 0-egress semantic is enforced
structurally — no other network is reachable.
* **FT-P-18** — No raw nav/AI-camera frame retention (AC-8.5). Walks
FDR + tile-cache, probes JPEG dimensions, rejects any file whose
extension + dimensions match the nav-camera raw pattern
(5472×3648 or 880×720). Extrapolates thumbnail-log size to 8 h
and asserts < 1 GB.
### AZ-421 — FT-P-15 + FT-P-16 + FT-P-18 (3 cp)
* **`e2e/runner/helpers/tile_cache_inspector.py`** (new, ~370 lines):
pure-logic evaluators sourced from FDR / docker-inspect /
filesystem walks.
* `evaluate_manifest_schema(entries, *, tile_load_rejected_ids,
m_per_px_floor)` → `ManifestSchemaReport` (AC-1, AC-2).
* `evaluate_offline_mode(network_inspect, container_inspect)` →
`OfflineModeReport` (AC-3).
* `detect_raw_frames(file_specs, *, raw_dimensions,
decoded_dimensions, raw_extensions)` → `RawFrameDetectionReport`
(AC-4).
* `evaluate_thumbnail_budget(size_bytes, duration_h)` →
`ThumbnailLogBudgetReport` (AC-5).
* `walk_files(*roots)` — convenience recursive walker.
* `probe_jpeg_dimensions(path)` → `(w, h)` via SOF marker parse,
stdlib-only.
* Module-level constants: `CACHE_SELF_CHECK_FDR_KIND`,
`TILE_LOAD_REJECTED_FDR_KIND`, `MANIFEST_REQUIRED_FIELDS`,
`MANIFEST_M_PER_PX_FLOOR`, `NAV_CAMERA_RAW_DIMENSIONS`,
`THUMBNAIL_LOG_MAX_SIZE_GB_PER_8H`.
* **`e2e/tests/positive/test_ft_p_15_cache_schema.py`** (new, ~115 lines):
FT-P-15 scenario. Skips on missing fixture; fails loudly on empty
`cache-self-check` record. `traces_to(AC-8.1,AC-1,AC-2,AC-6)`.
* **`e2e/tests/positive/test_ft_p_16_offline_only.py`** (new, ~115 lines):
FT-P-16 scenario. Skips on missing `DOCKER_NETWORK_INSPECT_PATH` /
`DOCKER_CONTAINER_INSPECT_PATH` env vars (fixture builder
pre-snapshots these because the runner has no docker-socket access).
`traces_to(AC-8.3,AC-3,AC-6,RESTRICT-SAT-1)`.
* **`e2e/tests/positive/test_ft_p_18_no_raw_retention.py`** (new,
~125 lines): FT-P-18 scenario. Walks FDR + tile-cache once;
probes JPEGs; computes replay duration from FDR `monotonic_ms`
span; evaluates AC-4 + AC-5. `traces_to(AC-8.5,AC-4,AC-5,AC-6)`.
* **`e2e/_unit_tests/helpers/test_tile_cache_inspector.py`** (new,
43 tests): pure-logic coverage for every evaluator + walker +
probe.
* **`e2e/_unit_tests/test_directory_layout.py`** (edited): registers
`runner/helpers/tile_cache_inspector.py` and three new scenario
test paths.
## Tests
Full `e2e/_unit_tests/` suite: **793 passed in 139.27 s** (baseline
746 → +47 net). Run via `python -m pytest e2e/_unit_tests/` from
the workspace root. No flakes, no skips outside the pre-existing
intentional skips.
Collection check on the three new scenario tests: 18 items
(3 tests × 6 `(fc_adapter, vio_strategy)` combinations). Scenario
tests skip locally because `E2E_SITL_REPLAY_DIR` is unset and the
docker-inspect env vars are unset — intended container-vs-host
boundary.
Per-area test counts (this batch):
| File | Tests added |
|------|-------------|
| `test_tile_cache_inspector.py` (new) | 43 |
| `test_directory_layout.py` (edited) | 4 (4 path entries) |
| `test_no_sut_imports.py` (no edit; broader walk) | implicit +1 module covered |
| **Total** | **+47** |
## Acceptance Criteria Verification
| AC | Status | Evidence |
|-----|--------|----------|
| AC-1 — manifest schema completeness | ✓ | `test_ft_p_15_cache_schema` + 12 `test_evaluate_manifest_schema_*` |
| AC-2 — m/px ≥ 0.5 floor (or rejected) | ✓ | Same scenario; below-floor-with-rejection / without-rejection unit tests |
| AC-3 — offline operation (no non-e2e-net egress) | ✓ | `test_ft_p_16_offline_only` + 7 `test_evaluate_offline_mode_*` |
| AC-4 — no raw-frame retention | ✓ | `test_ft_p_18_no_raw_retention` + 9 `test_detect_raw_frames_*` + 5 `test_probe_jpeg_dimensions_*` |
| AC-5 — thumbnail log < 1 GB / 8 h | ✓ | Same scenario; 7 `test_evaluate_thumbnail_budget_*` |
| AC-6 — parameterisation | ✓ | 6 param IDs per scenario; 18 total items collected |
## Code Review Verdict
PASS_WITH_WARNINGS (no Critical, no High; 3 Low notes — see
`reviews/batch_82_review.md`).
## Auto-Fix Attempts
0 (no auto-fix-eligible findings).
## Stuck Agents
None.
## Notable Decisions
* **Single task in batch 82.** AZ-421 internally covers 3
sub-scenarios (FT-P-15 / 16 / 18) — the task spec itself groups
them because they share the `tile-cache-fixture` + FDR
observation surface. Pulling AZ-422/423/427 in would have
produced 7 test files + multiple new helpers in one batch,
exceeding the recent empirical scope per batch (12 sub-scenarios).
AZ-422 / AZ-423 / AZ-427 land as their own batches.
* **AC-3 (offline-only) is enforced structurally, not by packet
count.** The spec says "all egress to non-`e2e-net` destinations
is 0". With `e2e-net.Internal == true` and the SUT attached only
to `e2e-net`, the packet count is provably 0 by Docker's network
policy — there is literally no other network the SUT can reach.
Checking the docker-inspect snapshots is cheaper and more
reliable than per-packet counters.
* **JPEG SOF dimension probe is stdlib-only.** Loading every JPEG
through OpenCV / Pillow just to read `(width, height)` would
decode pixel data we discard. The 30-line SOF parser reads ≤16
bytes per segment hop and terminates in <30 hops on real JPEGs.
* **The `probe_jpeg_dimensions` returns `None` on truncation /
non-JPEG / OSError — does NOT raise.** The downstream
`detect_raw_frames` explicitly treats `None` as "dimension
unknown ≠ raw frame match" (documented). This avoids the test
failing on every directory walk that happens to contain a
corrupt JPEG, while still surfacing real raw-frame retention.
* **Docker inspect via env-var indirection.** The e2e-runner
container does not have docker-socket access (an intentional
security boundary). The fixture builder must `docker network
inspect e2e-net > /e2e-results/net.json` + `docker inspect
gps-denied-onboard > /e2e-results/sut.json` before the runner
starts, and the runner reads those snapshots through env vars.
This is the same pattern AZ-420 used for `gcs_tlog_<host>.tlog`
(fixture-builder responsibility).
## Production Dependencies (forward-look)
FT-P-15 / FT-P-16 / FT-P-18 transitively depend on:
* **FDR `cache-self-check` record** at SUT cold-start — the SUT's
C6 tile-cache loader must emit one record carrying every manifest
entry it loaded. (Cross-checked against the FDR schema documented
in `_docs/02_document/components/c6_*` — slot is reserved; no
producer wires it yet.)
* **FDR `tile-load-rejected` events** — for entries below the m/px
floor (or otherwise rejected by the freshness gate). Reserved
same way.
* **Docker compose `e2e-net` attribute `internal: true`** — owned
by AZ-406. Already wired per the existing compose file.
* **Fixture builder snapshots** of `docker inspect` (AZ-595).
Tests fail loudly when fixture data is missing rather than silently
skipping — the "tests as gates" pattern.
## Out of Scope (deferred)
* DNS blackhole defense-in-depth — owned by NFT-SEC-05 (AZ-437).
* Cache-poisoning safety — owned by NFT-SEC-01 (AZ-436).
* Stale-tile rejection on aged source tiles — owned by FT-N-05
(AZ-427).
* The fixture builder's actual `cache-self-check` FDR synthesis +
docker-inspect JSON capture — owned by AZ-595.
## Next Batch
Batch 83 candidates from `_docs/02_tasks/todo/` (20 remaining): AZ-422
(FT-P-17 + FT-N-06 mid-flight tiles, 3 cp), AZ-423 (FT-P-19 sat
reloc, 3 cp), AZ-427 (FT-N-05 stale-tile rejection, 2 cp). Topo-order
leader is AZ-422. Pick at next `/autodev` invocation.
@@ -0,0 +1,224 @@
# Code Review Report
**Batch**: 82 — AZ-421 (FT-P-15 + FT-P-16 + FT-P-18 cache/offline/no-raw-retention)
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
## Findings
| # | Severity | Category | File:Line | Title |
|----|----------|-----------------|----------------------------------------------------------------------------|----------------------------------------------------------------|
| 1 | Low | Maintainability | `e2e/runner/helpers/tile_cache_inspector.py:120` | `_resolve_entry_id` falls back to `tile_matrix` before synth |
| 2 | Low | Style | `e2e/_unit_tests/helpers/test_tile_cache_inspector.py:139` | Multi-OR assert in synthesised-id test |
| 3 | Low | Scope | `e2e/tests/positive/test_ft_p_16_offline_only.py:80` | Docker inspect JSON env-var indirection requires fixture support |
### Finding Details
**F1: `_resolve_entry_id` lookup order may surface `tile_matrix` as an id**
(Low / Maintainability)
- Location: `e2e/runner/helpers/tile_cache_inspector.py:120-124`
- Description: When an entry lacks both `id` and `tile_id`, the
resolver falls through to `tile_matrix` before synthesising an
`entry_N` placeholder. This can produce duplicate "id" values if
several entries share a tile-matrix, which would in turn block
the `rejected_below_floor_ids` lookup from matching the right
entry.
- Suggestion: leave as-is for now; the FDR schema commits to `id`
being present per `_docs/02_document/components/c6_*` contracts.
The fallback is a defensive read for malformed fixtures. If the
fixture builder ever produces entries without `id`, the AC-1
"missing_fields" check already fails first — the entry-id
resolution is then for diagnostic display only.
- Task: AZ-421
**F2: Multi-OR assert in synthesised-id test** (Low / Style)
- Location: `e2e/_unit_tests/helpers/test_tile_cache_inspector.py:139`
- Description:
`test_evaluate_manifest_schema_entry_id_falls_back_to_synthesised`
uses a 3-way OR assert because the `_resolve_entry_id` resolver
inspects `id``tile_id``tile_matrix``entry_N` and the
test entry happens to have `tile_matrix`. The assert is correct
(covers the actual lookup order) but reads ambiguously.
- Suggestion: leave as-is; tightening the assert would force the
test to know the resolver's internal lookup chain, which is the
exact coupling code review usually flags. Documented here for
future cleanup if the resolver simplifies.
- Task: AZ-421
**F3: Docker inspect indirection requires fixture-builder support**
(Low / Scope)
- Location: `e2e/tests/positive/test_ft_p_16_offline_only.py:80-92`
- Description: The FT-P-16 scenario reads
`docker network inspect e2e-net` + `docker inspect <sut-ctr>` from
JSON files (env vars `DOCKER_NETWORK_INSPECT_PATH` /
`DOCKER_CONTAINER_INSPECT_PATH`) rather than calling `docker`
directly. This is intentional — the e2e-runner container does not
have docker-socket access, and the fixture builder must snapshot
inspect output before the runner starts.
- Suggestion: the fixture builder (AZ-595) needs a thin wrapper
that produces both JSON files at the start of every scenario run
that needs them. Tracked outside this batch.
- Task: AZ-421
## Findings Sweep
### Phase 1 — Context Loading
Read AZ-421 spec, blackbox-tests § FT-P-15/16/18, module-layout (confirmed
`blackbox_tests` owns `e2e/**`), conftest (fixture surface), existing FDR
reader, and recent helpers as templates (`gcs_telemetry_evaluator.py`,
`ap_contract_evaluator.py`).
### Phase 2 — Spec Compliance (AC trace)
* **AC-1 (FT-P-15 manifest schema completeness)** ✓
- Scenario: `test_ft_p_15_cache_schema` walks FDR for
`cache-self-check` records, builds the manifest entry list, calls
`evaluate_manifest_schema`, asserts `report.passes`.
- Pure-logic: 12 `test_evaluate_manifest_schema_*` unit tests
covering full-fields-pass, missing-fields-fail (single + multi
+ ordered), at-floor exactly, empty list, non-numeric m/px,
invalid floor → ValueError, custom required fields.
* **AC-2 (FT-P-15 m/px floor ≥ 0.5)** ✓
- Covered by `ManifestEntryReport.passes_floor` +
`ManifestSchemaReport.passes` (rejects below-floor entries
unless `tile_load_rejected_ids` includes them).
- Pure-logic: below-floor-no-rejection-fails,
below-floor-with-rejection-passes, at-floor-exactly-passes.
* **AC-3 (FT-P-16 offline operation)** ✓
- Scenario: `test_ft_p_16_offline_only` loads two docker-inspect
JSON files, calls `evaluate_offline_mode`, asserts
`report.passes`.
- Pure-logic: 7 `test_evaluate_offline_mode_*` unit tests
(passes, non-internal-fails, extra-network-fails,
no-networks-fails, missing-Internal-key-fails,
non-bool-Internal-fails, custom-expected-network-passes).
* **AC-4 (FT-P-18 no raw-frame retention)** ✓
- Scenario: `test_ft_p_18_no_raw_retention` walks FDR + tile-cache
via `walk_files`, probes JPEG dimensions, calls
`detect_raw_frames`, asserts `report.passes`.
- Pure-logic: 9 `test_detect_raw_frames_*` + 5
`test_probe_jpeg_dimensions_*` + 3 `test_walk_files_*` =
17 unit tests.
* **AC-5 (FT-P-18 thumbnail budget < 1 GB / 8 h)** ✓
- Scenario: computes `thumbnail_log_size_bytes` from the walk +
replay duration from FDR `monotonic_ms` span; calls
`evaluate_thumbnail_budget`; asserts `report.passes`.
- Pure-logic: 7 `test_evaluate_thumbnail_budget_*` unit tests
(under-budget, over-budget, extrapolation math,
zero-duration-fails, negative-size raises, invalid budget
raises, custom-budget-passes).
* **AC-6 (parameterisation)** ✓
- `pytest --collect-only` confirms 6 param IDs per scenario
(`[ardupilot|inav]-[okvis2|klt_ransac|vins_mono]`). All three
tests accept `fc_adapter` + `vio_strategy` fixtures.
### Phase 3 — Code Quality
* SRP: `tile_cache_inspector.py` carries five evaluators
(`evaluate_manifest_schema`, `evaluate_offline_mode`,
`detect_raw_frames`, `evaluate_thumbnail_budget`,
`probe_jpeg_dimensions`) + one walker (`walk_files`). Each
evaluator handles one AC family of one sub-scenario; the JPEG
dimension probe is co-located because it pairs structurally with
`detect_raw_frames`. ✓
* Naming: `m_per_px_floor`, `observed_size_bytes`,
`extrapolated_8h_size_bytes`, `nav_camera_raw_dimensions`
units in names. ✓
* AAA pattern in unit tests with `# Arrange / # Act / # Assert`
comments per coding rule. ✓
* No `try/except` swallows errors. `probe_jpeg_dimensions` catches
`OSError` and returns `None` — documented as "the file is not a
JPEG, the SOF marker is not present, or the file is truncated".
Callers of `probe_jpeg_dimensions` correctly treat `None` as
"dimension unknown" rather than silently zero. ✓
* No code comments narrating mechanics — only docstrings + one
one-liner on the SOF marker byte map (the byte list is part of
the JPEG standard; the link inside the docstring isn't needed
given the standard reference is universally known). ✓
* Function lengths: longest is `probe_jpeg_dimensions` at ~30 lines
including docstring; all under the 50-line / cyclomatic-10
threshold. ✓
### Phase 4 — Security Quick-Scan
* No SQL, no `shell=True`, no `eval`/`exec`. ✓
* No hardcoded secrets / API keys. ✓
* The JPEG SOF parser does bounded reads (every `read` checks
return-length); a malformed JPEG cannot cause unbounded memory
consumption. ✓
* `evaluate_offline_mode` validates `Internal` is a `bool` (not
truthy-coerced) — a string "true" or integer 1 in the inspect
JSON will not silently pass the gate. ✓
* `evaluate_thumbnail_budget` rejects negative size and
zero-or-negative budget. ✓
### Phase 5 — Performance
* `evaluate_manifest_schema`: O(N entries × F fields) — typically
<100 entries × 7 fields, trivial. ✓
* `detect_raw_frames`: O(N files), single pass; extension check
uses a tuple membership test (O(K) where K=8). ✓
* `evaluate_offline_mode`: O(M networks) where M is usually 1. ✓
* `evaluate_thumbnail_budget`: O(1). ✓
* `probe_jpeg_dimensions`: reads only segment headers (≤16 bytes
per segment hop) until SOF; even a multi-MB JPEG terminates
in <30 hops. ✓
* `walk_files`: O(total files under the roots), standard rglob
iteration; no in-memory list buffering. ✓
### Phase 6 — Cross-Task Consistency (single-task batch)
* Naming follows the recent `gcs_telemetry_evaluator` / `*_report`
/ `passes` property convention. ✓
* FDR record types declared as module-level constants
(`CACHE_SELF_CHECK_FDR_KIND`, `TILE_LOAD_REJECTED_FDR_KIND`)
mirrors the b81 pattern (`HINT_FDR_KIND`,
`ANCHOR_SEARCH_REGION_FDR_KIND`). ✓
* Skip-rule pattern (`if not sitl_replay_ready: pytest.skip(...)`)
is consistent with the 18 other scenario tests in `tests/positive`. ✓
### Phase 7 — Architecture Compliance
`_docs/02_document/module-layout.md` declares `blackbox_tests` as the
sole owner of `e2e/**`.
1. **Layer direction**: every import in the six new/edited files
resolves to `runner.helpers.*`, `runner.helpers.fdr_reader`,
`runner.helpers.tile_cache_inspector`, stdlib, or pytest. No
`src/gps_denied_onboard` imports. ✓ (verified by
`test_no_sut_imports.py`).
2. **Public API respect**: scenario tests import only top-level
module symbols from `runner.helpers.*` (no `_private`). ✓
3. **No new cyclic deps**: `tile_cache_inspector` is a leaf consumed
by 3 scenario tests + 1 unit-test module; no back-edges. ✓
4. **Duplicate symbols**: `probe_jpeg_dimensions` is the first JPEG
header parser in the e2e tree. If a future scenario needs the
same probe (e.g., NFT-LIM-02 size budgeting), promote to a
shared `runner/helpers/image_probe.py`. Tracked, not flagged.
5. **Cross-cutting concerns**: file-system walks (`walk_files`)
are local to `tile_cache_inspector` for now. If another scenario
needs filesystem walks for different reasons (e.g., FT-P-17
tile-output verification), promote. ✓
## Regression Gate
Full `e2e/_unit_tests/` suite: **793 passed in 139.27 s**, single run,
no flakes. Up from 746 (batch 81) by +47:
* +43 in new `test_tile_cache_inspector.py` (12 manifest, 7 offline,
9 raw-frames, 7 thumbnail-budget, 5 JPEG-probe, 3 walk).
* +3 new entries in `test_directory_layout.py` (3 scenario test paths).
* +1 from a `test_no_sut_imports.py` walk that now covers the new
helper.
No tests removed. Scenario tests skip locally because
`E2E_SITL_REPLAY_DIR` is unset (intended docker-vs-host boundary).