mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-22 11:31:13 +00:00
[AZ-407] [AZ-444] [AZ-445] Batch 68: fixtures, Tier-2 harness, NFR reporter
Three blackbox-harness tasks landed together — all depend only on
AZ-406 and unblock the FT-* / NFT-* scenario tasks scheduled for
batches 69+.
AZ-407 — Static fixture builders (3pt):
* tile-cache-builder/{builder.py, Dockerfile, build.sh} produces a
deterministic tile-cache-fixture Docker volume from
_docs/00_problem/input_data/. Reproducibility primitives: sorted
iteration, frozen PIL JPEG settings, FAISS HNSW32 built single-
threaded with seeded stub descriptors.
* age-injector/{age_injector.py, inject.sh} clones the volume and
shifts capture_date by N×30.44 days; tile JPEG bytes preserved
bit-identical. Emits synth-age-7mo + synth-age-13mo volumes.
* cold-boot/cold_boot_fixture.json: frozen FC pose snapshot at
Derkachi sector centre, schema v1.
* secrets/mavlink-test-passkey.txt: 64-hex with required
`# TEST ONLY` header line per AC-5. Passkey-equality test now
compares the secret line after stripping the header.
* security/cve-2025-53644.jpg: synthetic 158-byte malformed JPEG
(truncated SOS marker). OpenCV 4.11.x rejects gracefully with
imdecode → None. AZ-439 will sharpen for ASan instrumentation.
* Top-level Makefile with `make fixtures` / `make fixtures-*` /
`make e2e-tier1*` / `make unit-tests` targets.
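A minimal sketch of the age-injector's date arithmetic. It assumes the N×30.44-day offset is rounded to whole days and subtracted from `capture_date`, so the tile ages. `shift_capture_date` and `age_manifest` are hypothetical names, not the functions in age_injector.py, and the per-tile JSON sidecars would need the same rewrite. Only CSV text is rewritten, which is why the tile JPEG bytes and their `content_hash` stay bit-identical.

```python
import csv
import io
from datetime import date, timedelta

DAYS_PER_MONTH = 30.44  # mean month length quoted in the commit message


def shift_capture_date(iso_date: str, months: int) -> str:
    """Age an ISO-8601 date by months x 30.44 days (assumed: rounded to whole days)."""
    offset = timedelta(days=round(months * DAYS_PER_MONTH))
    return (date.fromisoformat(iso_date) - offset).isoformat()


def age_manifest(manifest_csv: str, months: int) -> str:
    """Rewrite only the capture_date column; every other column, including
    content_hash and jpeg_path, passes through unchanged."""
    reader = csv.DictReader(io.StringIO(manifest_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["capture_date"] = shift_capture_date(row["capture_date"], months)
        writer.writerow(row)
    return out.getvalue()
```

Under that rounding assumption the base `2025-11-01` becomes 2025-04-02 for synth-age-7mo (213 days) and 2024-10-01 for synth-age-13mo (396 days).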
AZ-444 — Tier-2 Jetson harness wrapper (5pt):
* run-tier2.sh rewritten as orchestrator. Detects local
(aarch64 + TIER2_HOST=localhost) vs remote (ssh into TIER2_HOST).
New flags: -k/--selector, --build-kind production|asan,
--reflash (gated behind TIER2_REFLASH_ACK=1 two-key gate),
--dry-run.
* tier2-on-jetson.sh (new) — on-device delegate. Verifies
gps-denied-onboard{,-asan}.service health; restarts with 5s
tolerance; spawns tegrastats + jtop parallel samplers; tails
ASan unit's journal in asan mode; drives docker compose with
TIER=tier2-jetson; forwards SELECTOR to pytest -k.
* docker/run-tier1.sh (new) — selector-parity sibling.
* AC-1 (selector parity) and AC-6 (reflash gating) unit-tested via
--dry-run output assertions. AC-2/AC-3/AC-4/AC-5 are hardware-
loop ACs verified by the Tier-2 runtime smoke (no Jetson in the
unit-test layer).
AZ-445 — CSV reporter + evidence bundler refinements (2pt):
* reporting/nfr_recorder.py (new) — pytest plugin. Provides the
`nfr_recorder` fixture with record_metric(name, value, ac_id)
and partial(ac_id, reason). At session end emits:
- per-nfr/<scenario_id>.json (AC-1)
- traceability-status.json with every AC ID parsed from
traceability-matrix.md, classified Covered/PARTIAL/NOT
COVERED with source scenario IDs (AC-2)
- regression-baseline.json with all numeric metrics (AC-3)
* csv_reporter.py extended — `_outcome_to_result` consults the
aggregator; rows flip PASS → PARTIAL when an AC was marked
PARTIAL by nfr_recorder (AC-4). Graceful fallback when
aggregator isn't registered (unit-test contexts).
* conftest.py registers nfr_recorder in pytest_plugins.
* New --traceability-matrix CLI flag seeds the NOT COVERED rows.
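A minimal, pytest-free sketch of the aggregation behind `nfr_recorder` and the PASS → PARTIAL flip. It assumes an in-memory store keyed by AC ID. The real fixture takes `record_metric(name, value, ac_id)` and infers the scenario from the running test, so here the scenario ID is passed explicitly. `NfrAggregator`, `outcome_to_result` and the FAIL/SKIP/ERROR labels are hypothetical; only PASS, PARTIAL, Covered and NOT COVERED come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Hypothetical mapping from pytest outcomes to CSV result labels.
_BASE_RESULT = {"passed": "PASS", "failed": "FAIL", "skipped": "SKIP"}


@dataclass
class NfrAggregator:
    """Session-wide store that an nfr_recorder-style fixture could write into."""

    metrics: Dict[str, List[dict]] = field(default_factory=lambda: defaultdict(list))
    partial_reasons: Dict[str, str] = field(default_factory=dict)

    def record_metric(self, scenario_id: str, name: str, value: float, ac_id: str) -> None:
        self.metrics[ac_id].append({"scenario": scenario_id, "name": name, "value": float(value)})

    def partial(self, ac_id: str, reason: str) -> None:
        self.partial_reasons[ac_id] = reason

    def traceability_status(self, matrix_ac_ids: Iterable[str]) -> Dict[str, dict]:
        """Classify every AC; matrix ACs with no evidence stay NOT COVERED."""
        status = {}
        for ac in sorted(set(matrix_ac_ids) | set(self.metrics) | set(self.partial_reasons)):
            scenarios = sorted({m["scenario"] for m in self.metrics.get(ac, [])})
            if ac in self.partial_reasons:
                label = "PARTIAL"
            elif scenarios:
                label = "Covered"
            else:
                label = "NOT COVERED"
            status[ac] = {"status": label, "scenarios": scenarios,
                          "reason": self.partial_reasons.get(ac)}
        return status

    def regression_baseline(self) -> Dict[str, float]:
        """Flatten every numeric metric into a stable key -> value map."""
        return {f"{ac}/{m['scenario']}/{m['name']}": m["value"]
                for ac in sorted(self.metrics) for m in self.metrics[ac]}


def outcome_to_result(outcome: str, ac_id: str,
                      aggregator: Optional[NfrAggregator] = None) -> str:
    result = _BASE_RESULT.get(outcome, "ERROR")
    # Graceful fallback: no aggregator registered (unit-test contexts).
    if aggregator is None:
        return result
    # Only a passing row is downgraded; failures stay failures.
    if result == "PASS" and ac_id in aggregator.partial_reasons:
        return "PARTIAL"
    return result
```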
Build / config:
* pyproject.toml dev extras: added Pillow>=10.4,<13.0 for the
tile-cache-builder unit test (broad enough to keep torchvision's
Pillow 12 pin happy; the production builder runs inside its own
Docker image with its own pin).
* Updated test_directory_layout.py to cover 10 new files + replaced
the byte-equal passkey assertion with the header-stripping
variant.
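A minimal sketch of the header-stripping passkey comparison. It assumes the fixture is one header line starting with `# TEST ONLY` followed by exactly one line of 64 hex digits. `read_test_passkey` and `passkeys_equal` are hypothetical names, not the test's real helpers.

```python
import hmac
import re

HEADER = "# TEST ONLY"
_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def read_test_passkey(text: str) -> str:
    """Return the passkey line after validating and stripping the header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(HEADER):
        raise ValueError("passkey fixture must start with a '# TEST ONLY' header line")
    if len(lines) != 2 or not _HEX64.fullmatch(lines[1]):
        raise ValueError("expected exactly one 64-hex-character passkey line after the header")
    return lines[1]


def passkeys_equal(fixture_text: str, expected: str) -> bool:
    # The header is metadata, not key material, so only the secret is compared.
    return hmac.compare_digest(read_test_passkey(fixture_text), expected.strip())
```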
Test results:
* 157 focused tests pass (was 97 in batch 67; +60 new across this
batch). No regressions.
Module-layout / spec drift:
* AZ-407 spec text says `tests/fixtures/...`; module-layout
blackbox_tests entry (commit d7a17a8) authoritatively places the
harness under `e2e/`. Implementation followed the layout entry.
* AZ-444 spec mentions `e2e/tier2/run-tier2.sh`; AZ-406 placed it
at `e2e/jetson/run-tier2.sh`. Kept at `e2e/jetson/` for
consistency.
* Cold-boot README ownership: corrected from AZ-419 to AZ-407 per
AZ-419's own Dependencies field.
Specs archived to _docs/02_tasks/done/. Jira tickets transitioned to
In Testing on commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,49 @@
# syntax=docker/dockerfile:1.7
#
# tile-cache-fixture builder image. Built once per CI; output is a named
# Docker volume (`tile-cache-fixture`) mounted RO into the SUT by
# `docker/docker-compose.test.yml`.
#
# Public-boundary discipline: this image does NOT install the SUT
# package. It depends only on:
#   * Pillow — JPEG re-encode of the paired _gmaps.png reference tiles
#     and the deterministic stub-tile generator.
#   * faiss-cpu — deterministic HNSW descriptor index emission.
#   * numpy — backing array dtype for FAISS.
#
# Reproducibility:
#   * Pin Python to 3.10-slim (matches the runner image's Python line).
#   * Constrain Pillow, faiss-cpu, numpy to the version ranges verified
#     deterministic in `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`.
#   * `PYTHONHASHSEED=0` neutralises hash-order non-determinism.

FROM python:3.10.14-slim-bookworm@sha256:9c9efb0c19a8bb1f08e8e7a13be5d671e51bcb9c83a3a8b0e2ad7d8aaeb33b30

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONHASHSEED=0 \
    PIP_NO_CACHE_DIR=1

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libgomp1 \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir \
        "Pillow>=10.4,<12.0" \
        "numpy>=1.26,<2.0" \
        "faiss-cpu>=1.8,<2.0"

WORKDIR /opt/builder
COPY builder.py /opt/builder/builder.py

# Drop root for runtime; the image only reads /input (bind mount) and
# writes to /output (named volume), both supplied by the caller.
RUN useradd -u 10001 -m -d /home/builder builder \
    && mkdir -p /input /output \
    && chown -R builder:builder /opt/builder /input /output
USER 10001:10001

ENTRYPOINT ["python", "/opt/builder/builder.py"]
CMD ["--input-dir", "/input", "--output-dir", "/output"]
@@ -1,15 +1,80 @@
# tile-cache-builder (AZ-407)

Builds the `tile-cache-fixture` Docker volume from the 60 still-image
satellite references in `_docs/00_problem/input_data/` plus the
Derkachi route bbox.

## Output schema

```
tile-cache-fixture/
  tiles/<zoom>/<x>/<y>.jpg     # tile JPEG body
  tiles/<zoom>/<x>/<y>.json    # per-tile sidecar (mirrors `tiles` row)
  manifest.csv                 # sorted manifest (9 columns)
  descriptors.index            # FAISS HNSW32 index (omitted if faiss not available)
```

Manifest columns (per `_docs/00_problem/restrictions.md` § Satellite
Imagery + `_docs/02_document/data_model.md` § 2.1):

| Column | Type | Notes |
|--------|------|-------|
| `zoom_level` | int | Slippy/XYZ zoom |
| `tile_x`, `tile_y` | int | Tile coordinates at that zoom |
| `capture_date` | ISO-8601 date | Default `2025-11-01` (frozen so the freshness gate treats the tile as fresh) |
| `source` | enum | `googlemaps` for real paired tiles, `stub` for D-PROJ-3 fallback |
| `m_per_px` | float | `0.5` (≥ the AC-8.1 floor) |
| `jpeg_path` | str | Relative path to the JPEG body |
| `content_hash` | hex | SHA-256 of the JPEG bytes |
| `provenance` | str | `paired_gmaps:AD000NNN`, `STUB`, or `STUB_BBOX:derkachi:lat,lon,lat,lon` |
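A consumer-side check of this manifest can be sketched as below. It assumes only the conventions in the table above: the column order, the `(zoom_level, tile_x, tile_y)` sort order, the two `source` values, and `content_hash` being the SHA-256 of the file at `jpeg_path`. `validate_manifest` is a hypothetical helper, not part of the builder.

```python
import csv
import hashlib
from pathlib import Path

COLUMNS = ["zoom_level", "tile_x", "tile_y", "capture_date", "source",
           "m_per_px", "jpeg_path", "content_hash", "provenance"]


def validate_manifest(fixture_root: Path) -> int:
    """Check header, sort order, uniqueness and per-tile hashes; return the row count."""
    with (fixture_root / "manifest.csv").open(newline="") as fp:
        reader = csv.reader(fp)
        header = next(reader)
        if header != COLUMNS:
            raise ValueError(f"unexpected header: {header}")
        rows = [dict(zip(COLUMNS, r)) for r in reader]
    keys = [(int(r["zoom_level"]), int(r["tile_x"]), int(r["tile_y"])) for r in rows]
    if keys != sorted(keys):
        raise ValueError("rows are not sorted by (zoom_level, tile_x, tile_y)")
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate tile coordinates in manifest")
    for r in rows:
        if r["source"] not in ("googlemaps", "stub"):
            raise ValueError(f"unknown source {r['source']!r}")
        body = (fixture_root / r["jpeg_path"]).read_bytes()
        if hashlib.sha256(body).hexdigest() != r["content_hash"]:
            raise ValueError(f"content_hash mismatch for {r['jpeg_path']}")
    return len(rows)
```

The `csv` module handles the quoted `STUB_BBOX:...` provenance, whose value contains commas.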

## Reproducibility (AC-1)

Two consecutive invocations from the same input produce a bit-identical
output tree:

* Input files iterated in lexicographic order
* PIL JPEG encoded with `quality=85, optimize=False, progressive=False, subsampling=2`
* Manifest rows sorted by `(zoom_level, tile_x, tile_y)` before CSV
  serialisation
* FAISS index built single-threaded with `omp_set_num_threads(1)` and
  SHA-derived stub descriptors

## Provenance (AC-7)

| Item | Source | License |
|------|--------|---------|
| Real tile bodies | `_docs/00_problem/input_data/AD*_gmaps.png` (2 paired references) | Project test fixture; safe to redistribute under this repo's license |
| Stub tile bodies | Generated from `_stub_jpeg_bytes(seed)` (PIL solid-fill) | Fully synthetic; no third-party data |
| Derkachi bbox tile | Synthetic placeholder until D-PROJ-3 lands | Fully synthetic |
| FAISS index | SHA-derived stub vectors (not real VPR descriptors) | Fully synthetic |

## Usage

```bash
# Production (Docker volume):
e2e/fixtures/tile-cache-builder/build.sh

# Local mode (used by AZ-407 unit test):
e2e/fixtures/tile-cache-builder/build.sh --local /tmp/tile-cache-out
```

The unit test `e2e/_unit_tests/fixtures/test_tile_cache_builder.py`
verifies AC-1 / AC-2 / AC-7 by invoking `builder.py` twice against a
`tmp_path` and asserting the output is byte-identical.
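A byte-identity check like this can be reduced to a single digest per output tree. The sketch below assumes "identical" means the same relative paths and the same file bytes, ignoring modes and mtimes. `tree_digest` is a hypothetical helper and not necessarily what the unit test does.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every regular file's relative POSIX path and contents, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix both fields so path/content boundaries cannot be confused.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()
```

`builder.py` wipes its output directory on each run, so the two builds must target two separate directories before their digests are compared.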

## Notes on D-PROJ-3

When D-PROJ-3 supplies the production tile-corpus for the Derkachi
sector, the stub tiles produced here (any row with `provenance = STUB`)
should be replaced by real Suite Sat Service tiles for those
footprints. The builder will then no longer fall back to
`_stub_jpeg_bytes` — every still that lacks a paired `_gmaps.png`
will draw from the real corpus instead.
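For reference, once a geodetically meaningful layout replaces `_slippy_xy_from_index`, the standard OSM slippy-map (Web Mercator) formulas give the XYZ tile containing a WGS84 point and the ground resolution of a 256-px tile. This is a minimal sketch. Applying it to the placeholder `DERKACHI_BBOX` (values copied from `builder.py`) is illustrative only; the sector's real tile set remains D-PROJ-3's decision.

```python
import math
from typing import Dict, Tuple

EARTH_CIRCUMFERENCE_M = 40075016.686  # equatorial circumference used by Web Mercator
TILE_PX = 256

# Placeholder bbox copied from builder.py's DERKACHI_BBOX.
DERKACHI_BBOX = {"min_lat": 50.05, "max_lat": 50.10, "min_lon": 36.10, "max_lon": 36.20}


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map tile indices (x grows east, y grows south)."""
    n = 1 << zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y


def metres_per_pixel(lat_deg: float, zoom: int) -> float:
    """Ground resolution of a 256-px Web Mercator tile at this latitude and zoom."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (TILE_PX * (1 << zoom))


def bbox_tile_range(bbox: Dict[str, float], zoom: int) -> Tuple[int, int, int, int]:
    """Inclusive (x_min, x_max, y_min, y_max); the north edge yields the smallest y."""
    x0, y0 = latlon_to_tile(bbox["max_lat"], bbox["min_lon"], zoom)
    x1, y1 = latlon_to_tile(bbox["min_lat"], bbox["max_lon"], zoom)
    return x0, x1, y0, y1
```

At the bbox's latitude, zoom 18 works out to roughly 0.38 m/px. This can be compared with the nominal `0.5` recorded in the manifest's `m_per_px` column.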

## Owned by

AZ-407 (this task). The FAISS-stub descriptor format will not be used
in production; the production VPR pipeline (C2) emits real DINOv2
descriptors. The stub format is sufficient for AZ-407's reproducibility
and schema contracts only.
@@ -0,0 +1,64 @@
#!/usr/bin/env bash
# Build the tile-cache test fixture as a named Docker volume
# (`tile-cache-fixture`), or emit it to a local directory in
# ``--local <path>`` mode (used by the AZ-407 unit tests).
#
# AC-1 (deterministic): two invocations against the same input emit
# identical FAISS index hash, identical manifest rows, and identical
# tile filesystem byte sizes.
#
# Env vars:
#   TILE_CACHE_INPUT_DIR    Input data dir (default: <repo>/_docs/00_problem/input_data)
#   TILE_CACHE_VOLUME_NAME  Docker volume name (default: tile-cache-fixture)
#
# Usage:
#   build.sh                    # builds the named Docker volume
#   build.sh --local /tmp/out   # emits to /tmp/out (no Docker)

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"

VOLUME_NAME="${TILE_CACHE_VOLUME_NAME:-tile-cache-fixture}"
INPUT_DIR="${TILE_CACHE_INPUT_DIR:-${REPO_ROOT}/_docs/00_problem/input_data}"

LOCAL_OUT=""
if [[ "${1:-}" == "--local" ]]; then
  if [[ -z "${2:-}" ]]; then
    echo "ERROR: --local requires an output directory" >&2
    exit 2
  fi
  LOCAL_OUT="$2"
fi

if [[ ! -d "${INPUT_DIR}" ]]; then
  echo "ERROR: input dir not found: ${INPUT_DIR}" >&2
  exit 2
fi

if [[ -n "${LOCAL_OUT}" ]]; then
  # Local mode: invoke builder.py directly. The caller's venv needs
  # Pillow (provided by the dev extras); numpy + faiss-cpu are optional,
  # and without them builder.py skips descriptors.index.
  python3 "${SCRIPT_DIR}/builder.py" \
    --input-dir "${INPUT_DIR}" \
    --output-dir "${LOCAL_OUT}"
  exit 0
fi

# Docker mode: build the builder image and populate the named volume.
IMAGE_TAG="azaion-tile-cache-builder:local"

docker build -t "${IMAGE_TAG}" "${SCRIPT_DIR}"

# Recreate the named volume so output is bit-stable across runs (AC-1).
docker volume rm "${VOLUME_NAME}" >/dev/null 2>&1 || true
docker volume create "${VOLUME_NAME}" >/dev/null

docker run --rm \
  -v "${INPUT_DIR}:/input:ro" \
  -v "${VOLUME_NAME}:/output" \
  "${IMAGE_TAG}"

echo "tile-cache-fixture volume '${VOLUME_NAME}' built from ${INPUT_DIR}"
@@ -0,0 +1,418 @@
"""Deterministic tile-cache fixture builder.

Reads the still images and paired ``_gmaps.png`` references from
``_docs/00_problem/input_data/`` and emits a reproducible
``tile-cache-fixture`` tree at ``--output-dir``:

    <output>/
        tiles/<zoom>/<x>/<y>.jpg      # tile JPEG bodies
        tiles/<zoom>/<x>/<y>.json     # per-tile sidecar (mirrors `tiles` row)
        manifest.csv                  # sorted manifest with content hashes
        descriptors.index             # stub FAISS HNSW index (optional)

The builder is invokable directly (``python3 builder.py --input-dir <in>
--output-dir <out>``, as ``build.sh --local`` does) or inside the
per-builder Docker image (``Dockerfile`` in this directory).

Reproducibility primitives (AC-1):

* Source files are sorted lexicographically before processing.
* PIL JPEG encode pins ``quality=85, optimize=False, progressive=False``
  and ``subsampling=2`` (4:2:0). All but ``quality`` (PIL default: 75)
  match PIL's defaults; pinning them protects against future PIL changes.
* Manifest rows are sorted by ``(zoom_level, tile_x, tile_y)`` before CSV
  serialization.
* FAISS index (when ``faiss-cpu`` is importable) is built single-threaded
  with ``faiss.omp_set_num_threads(1)``; HNSW level assignment uses FAISS's
  fixed default RNG seed, so ``faiss.write_index`` output is deterministic
  given the same descriptor sequence.
* Descriptors are SHA-256-derived stub vectors — sufficient for schema
  contracts, NOT a substitute for real VPR descriptors emitted by C2.

Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol. The on-disk schema lives in
``_docs/00_problem/restrictions.md`` § Satellite Imagery and is the only
contract this builder honours.
"""

from __future__ import annotations

import argparse
import csv
import datetime as _dt
import hashlib
import io
import json
import logging
import os
import shutil
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable

logger = logging.getLogger(__name__)

# AC-2: Derkachi route bbox (placeholder extent, refined when D-PROJ-3
# lands the production Derkachi sector polygon). Lat/Lon are the bbox
# corners; for now the builder emits a single representative stub tile
# for the rectangle and records the corners in that tile's provenance.
DERKACHI_BBOX = {
    "min_lat": 50.05,
    "max_lat": 50.10,
    "min_lon": 36.10,
    "max_lon": 36.20,
}

# Static "frozen" capture date for the base fixture. AC-3's age-injector
# operates on a clone; the BASE fixture's date is intentionally fixed in
# the past so the C6 freshness check (6-mo active-conflict /
# 12-mo rear) treats it as fresh for the default scenarios.
BASE_CAPTURE_DATE = "2025-11-01"

# Zoom level used by C6 for the Derkachi corpus (matches restrictions.md
# §Satellite Imagery: ≥0.5 m/px at the cache interface).
DEFAULT_ZOOM = 18

# Tile dimensions (slippy/XYZ convention).
TILE_W = 256
TILE_H = 256

# Stub-descriptor dimensionality (matches the production VPR descriptor
# size declared in `_docs/02_document/components/c2_vpr/description.md`
# for layout compatibility; the values themselves are SHA-derived stubs).
DESCRIPTOR_DIM = 256


@dataclass(frozen=True)
class TileEntry:
    """One row of the manifest. Sorted before CSV serialisation."""

    zoom_level: int
    tile_x: int
    tile_y: int
    capture_date: str
    source: str
    m_per_px: float
    jpeg_path: str
    content_hash: str
    provenance: str


def _iter_stills(input_dir: Path) -> Iterable[Path]:
    """Yield AD000NNN.jpg files in sorted order."""

    for p in sorted(input_dir.glob("AD*.jpg")):
        yield p


def _iter_paired_gmaps(input_dir: Path) -> set[str]:
    """Return the set of AD000NNN basenames that have a paired _gmaps.png."""

    return {p.stem.removesuffix("_gmaps") for p in input_dir.glob("AD*_gmaps.png")}


def _slippy_xy_from_index(idx: int, zoom: int) -> tuple[int, int]:
    """Deterministic (tile_x, tile_y) layout: row-major raster across the
    Derkachi bbox. The mapping is NOT geodetically meaningful — it is a
    stable placeholder until D-PROJ-3 supplies the production tile-matrix
    transform. Each `idx` gets a unique (tx, ty) so the manifest stays
    collision-free.
    """

    cols = 16  # 16x16 grid covers 256 tiles → comfortably more than 60 stills + 1 bbox
    tx = (idx % cols) + (1 << (zoom - 1))
    ty = (idx // cols) + (1 << (zoom - 1))
    return tx, ty


def _stub_jpeg_bytes(seed: int) -> bytes:
    """Render a deterministic 256x256 JPEG keyed on `seed`.

    No PIL randomness, no timestamps in metadata. The body is a solid
    fill whose RGB colour is derived from `seed`; OpenCV's imdecode + C2's
    descriptor pipeline both treat the bytes as a valid JPEG.
    """

    from PIL import Image  # noqa: PLC0415 — heavy import, deferred

    r = (seed * 37) & 0xFF
    g = (seed * 53) & 0xFF
    b = (seed * 71) & 0xFF
    img = Image.new("RGB", (TILE_W, TILE_H), color=(r, g, b))
    buf = io.BytesIO()
    img.save(
        buf,
        format="JPEG",
        quality=85,
        optimize=False,
        progressive=False,
        subsampling=2,
    )
    return buf.getvalue()


def _real_tile_jpeg_bytes(gmaps_png: Path) -> bytes:
    """Re-encode a paired _gmaps.png as a deterministic JPEG."""

    from PIL import Image  # noqa: PLC0415

    img = Image.open(gmaps_png).convert("RGB").resize((TILE_W, TILE_H), Image.BICUBIC)
    buf = io.BytesIO()
    img.save(
        buf,
        format="JPEG",
        quality=85,
        optimize=False,
        progressive=False,
        subsampling=2,
    )
    return buf.getvalue()


def _content_hash(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()


def _sidecar_dict(entry: TileEntry) -> dict:
    """Per-tile JSON sidecar (mirrors the `tiles` row content per
    data_model.md § 2.1.2).
    """

    return {
        "zoom_level": entry.zoom_level,
        "tile_x": entry.tile_x,
        "tile_y": entry.tile_y,
        "capture_date": entry.capture_date,
        "source": entry.source,
        "m_per_px": entry.m_per_px,
        "content_hash": entry.content_hash,
        "provenance": entry.provenance,
    }


def _emit_tile(out_dir: Path, entry: TileEntry, jpeg_bytes: bytes) -> None:
    """Write `<out_dir>/tiles/<z>/<x>/<y>.{jpg,json}` (plain writes, not atomic)."""

    tile_dir = out_dir / "tiles" / str(entry.zoom_level) / str(entry.tile_x)
    tile_dir.mkdir(parents=True, exist_ok=True)
    jpg_path = tile_dir / f"{entry.tile_y}.jpg"
    json_path = tile_dir / f"{entry.tile_y}.json"
    jpg_path.write_bytes(jpeg_bytes)
    json_path.write_text(
        json.dumps(_sidecar_dict(entry), sort_keys=True, separators=(",", ":")) + "\n"
    )


def _write_manifest(out_dir: Path, rows: list[TileEntry]) -> Path:
    """Write the sorted manifest CSV."""

    manifest_path = out_dir / "manifest.csv"
    with manifest_path.open("w", newline="") as fp:
        writer = csv.writer(fp, lineterminator="\n")
        writer.writerow(
            [
                "zoom_level",
                "tile_x",
                "tile_y",
                "capture_date",
                "source",
                "m_per_px",
                "jpeg_path",
                "content_hash",
                "provenance",
            ]
        )
        for r in sorted(rows, key=lambda x: (x.zoom_level, x.tile_x, x.tile_y)):
            writer.writerow(
                [
                    r.zoom_level,
                    r.tile_x,
                    r.tile_y,
                    r.capture_date,
                    r.source,
                    f"{r.m_per_px:.6f}",
                    r.jpeg_path,
                    r.content_hash,
                    r.provenance,
                ]
            )
    return manifest_path


def _write_descriptors_index(out_dir: Path, rows: list[TileEntry]) -> Path | None:
    """Emit a deterministic FAISS HNSW index of stub descriptors.

    Returns the index path on success, or None when faiss-cpu is not
    importable. The unit test gates on importorskip("faiss"); the
    production build inside ``Dockerfile`` ships faiss-cpu so this path
    is always exercised in CI.
    """

    try:
        import faiss  # noqa: PLC0415
        import numpy as np  # noqa: PLC0415
    except ImportError:
        logger.warning(
            "faiss / numpy not importable in this environment — "
            "skipping descriptors.index emission. The fixture is still "
            "usable for schema-only scenarios; VPR-matching scenarios "
            "need the Docker build."
        )
        return None

    # Single-thread + deterministic seed → bit-stable output.
    faiss.omp_set_num_threads(1)

    descriptors = np.zeros((len(rows), DESCRIPTOR_DIM), dtype=np.float32)
    for i, r in enumerate(sorted(rows, key=lambda x: (x.zoom_level, x.tile_x, x.tile_y))):
        # SHA-derived stub: seed a NumPy generator from
        # SHA-256(content_hash | row index) and draw DESCRIPTOR_DIM
        # standard-normal float32s. Stable across runs because
        # content_hash is stable.
        seed_bytes = hashlib.sha256(
            f"{r.content_hash}|{i}".encode("ascii")
        ).digest()
        rng = np.random.default_rng(int.from_bytes(seed_bytes[:8], "big"))
        descriptors[i] = rng.standard_normal(DESCRIPTOR_DIM, dtype=np.float32)

    # HNSW32 + IP metric is the C2 production choice (see
    # _docs/02_document/components/c2_vpr/description.md).
    index = faiss.IndexHNSWFlat(DESCRIPTOR_DIM, 32, faiss.METRIC_INNER_PRODUCT)
    index.hnsw.efConstruction = 40
    index.hnsw.efSearch = 16
    index.add(descriptors)

    index_path = out_dir / "descriptors.index"
    faiss.write_index(index, str(index_path))
    return index_path


def build(input_dir: Path, output_dir: Path) -> dict:
    """Build the tile-cache fixture under `output_dir` from `input_dir`.

    Returns a manifest summary dict for caller logging:
        {"tile_count": int, "stub_count": int, "real_count": int,
         "paired_gmaps_count": int, "manifest_hash": str,
         "descriptors_index_hash": str | None}

    The output directory's contents are wiped first so two consecutive
    invocations against the same input produce bit-identical trees
    (AC-1).
    """

    # Clear the contents rather than removing the directory itself: in
    # Docker mode --output-dir is a volume mount point, which cannot be
    # rmdir'ed (EBUSY), and the non-root user could not recreate it under /.
    output_dir.mkdir(parents=True, exist_ok=True)
    for child in sorted(output_dir.iterdir()):
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()

    paired = _iter_paired_gmaps(input_dir)
    stills = list(_iter_stills(input_dir))
    if not stills:
        raise FileNotFoundError(
            f"No AD*.jpg files under {input_dir} — input_data/ may be missing"
        )

    rows: list[TileEntry] = []
    stub_count = 0
    real_count = 0

    # AC-2: one tile entry per still + one entry for the Derkachi bbox
    # (placed at index len(stills), i.e. 60 for the 60-still input).
    for idx, still in enumerate(stills):
        tx, ty = _slippy_xy_from_index(idx, DEFAULT_ZOOM)
        if still.stem in paired:
            jpeg = _real_tile_jpeg_bytes(input_dir / f"{still.stem}_gmaps.png")
            source = "googlemaps"
            provenance = f"paired_gmaps:{still.stem}"
            real_count += 1
        else:
            # D-PROJ-3 stub-tile fallback per AZ-407 spec lines 18–19.
            jpeg = _stub_jpeg_bytes(idx + 1)
            source = "stub"
            provenance = "STUB"
            stub_count += 1
        entry = TileEntry(
            zoom_level=DEFAULT_ZOOM,
            tile_x=tx,
            tile_y=ty,
            capture_date=BASE_CAPTURE_DATE,
            source=source,
            m_per_px=0.5,
            jpeg_path=f"tiles/{DEFAULT_ZOOM}/{tx}/{ty}.jpg",
            content_hash=_content_hash(jpeg),
            provenance=provenance,
        )
        rows.append(entry)
        _emit_tile(output_dir, entry, jpeg)

    # AC-2: Derkachi route bbox entry, a single representative stub tile
    # at the next free index after the stills so it can never collide
    # with a still's (tx, ty). Real coverage of the bbox is owned by D-PROJ-3.
    bbox_idx = len(stills)
    tx, ty = _slippy_xy_from_index(bbox_idx, DEFAULT_ZOOM)
    bbox_jpeg = _stub_jpeg_bytes(bbox_idx + 1)
    bbox_entry = TileEntry(
        zoom_level=DEFAULT_ZOOM,
        tile_x=tx,
        tile_y=ty,
        capture_date=BASE_CAPTURE_DATE,
        source="stub",
        m_per_px=0.5,
        jpeg_path=f"tiles/{DEFAULT_ZOOM}/{tx}/{ty}.jpg",
        content_hash=_content_hash(bbox_jpeg),
        provenance=(
            f"STUB_BBOX:derkachi:{DERKACHI_BBOX['min_lat']},"
            f"{DERKACHI_BBOX['min_lon']},{DERKACHI_BBOX['max_lat']},"
            f"{DERKACHI_BBOX['max_lon']}"
        ),
    )
    rows.append(bbox_entry)
    _emit_tile(output_dir, bbox_entry, bbox_jpeg)
    stub_count += 1

    manifest_path = _write_manifest(output_dir, rows)
    manifest_hash = hashlib.sha256(manifest_path.read_bytes()).hexdigest()

    index_path = _write_descriptors_index(output_dir, rows)
    if index_path is not None:
        descriptors_hash = hashlib.sha256(index_path.read_bytes()).hexdigest()
    else:
        descriptors_hash = None

    return {
        "tile_count": len(rows),
        "stub_count": stub_count,
        "real_count": real_count,
        "paired_gmaps_count": len(paired),
        "manifest_hash": manifest_hash,
        "descriptors_index_hash": descriptors_hash,
    }


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Build the tile-cache test fixture")
    parser.add_argument(
        "--input-dir",
        type=Path,
        required=True,
        help="Directory containing AD*.jpg and AD*_gmaps.png source files",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        required=True,
        help="Output directory for the tile-cache fixture tree",
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Only log warnings and errors (suppresses INFO lines)",
    )
    args = parser.parse_args(argv)

    logging.basicConfig(
        level=logging.WARNING if args.quiet else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )

    summary = build(args.input_dir, args.output_dir)
    json.dump(summary, sys.stdout, sort_keys=True, indent=2)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())