[AZ-408] [AZ-410] [AZ-411] Batch 69: synth injectors + FT-P-02/03/14

AZ-408 (3pt) — Replace AZ-406 injector scaffolds with concrete generators:
- outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset
- blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment;
  AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas
- multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage
- fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard
- _common.py: derive_rng + tile-manifest reader + tmpfs helpers
- injector_fixtures.py: pytest fixtures wired via runner conftest

AZ-410 (3pt) — FT-P-02 cumulative drift between satellite anchors:
- anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction,
  AC-4 monotonicity check, CSV evidence
- test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper
  NotImplementedError (frame_source_replay / fdr_reader / imu_replay)

AZ-411 (2pt) — FT-P-03 + FT-P-14 schema + WGS84:
- estimate_schema.py: AC-1 schema completeness, AC-2 source-label set
  containment, AC-3 WGS84 range + int32 1e-7 decode
- test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario

Tests: 248 unit tests pass (+91 vs batch 68).
Reports: batch_69_report.md, batch_69_review.md (PASS),
cumulative_review_batches_67-69_cycle1_report.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-16 17:54:00 +03:00
parent ff1b00200c
commit 702a0c0ff3
27 changed files with 4619 additions and 58 deletions
+8 -4
View File
@@ -6,9 +6,13 @@ negative-path scenario:
- outlier.py — outlier-injection-derkachi (FT-N-01)
- blackout_spoof.py — blackout-spoof-derkachi (FT-N-04, NFT-RES-04)
- multi_segment.py — multi-segment-derkachi (FT-P-08)
- cold_boot.py — cold-boot-fixture (FT-P-11, NFT-PERF-03)
- fc_proxy.py — coordinated FC GPS spoof proxy (consumed by
blackout_spoof's runtime path; AZ-408 AC-3)
- cold_boot.py — cold-boot-fixture (FT-P-11, NFT-PERF-03;
deferred to AZ-419)
AZ-406 supplies the package layout + public function signatures; concrete
generators are delivered by **AZ-408** (Runtime synthetic-injection fixture
builders).
AZ-406 supplied the package layout + scaffold dataclasses; AZ-408 (this
batch) replaces every ``NotImplementedError`` with a real generator and
adds the shared ``_common.py`` (deterministic seeds, tile-cache
manifest reader, tmpfs scratch helpers) + ``fc_proxy.py``.
"""
+221
View File
@@ -0,0 +1,221 @@
"""Shared helpers for the AZ-408 runtime synthetic-injection fixture builders.
Three responsibilities, each kept deliberately small:
1. **Deterministic seed derivation** — every injector accepts an integer
``--seed`` flag and must produce bit-identical output across two runs
for the same ``(seed, density|window_seconds|n_segments)`` pair. The
shared ``derive_rng()`` helper hashes the inputs into a 64-bit seed,
so two unrelated injectors don't accidentally share a stream.
2. **Tile-cache manifest read** — the outlier injector needs to pick a
"far-away" tile (per AC-3.1: ≥350 m offset). The tile-cache fixture
(built by AZ-407 / ``e2e/fixtures/tile-cache-builder/builder.py``)
ships a ``manifest.csv`` with the per-tile ground-truth lat/lon
derivable from ``(zoom_level, tile_x, tile_y)`` via the slippy-map
convention. We read the CSV ourselves rather than depending on the
builder package — that keeps the injectors independently testable
without a Docker tile-cache volume present.
3. **Tmpfs scratch root** — AC-6 says "auto-cleared at teardown within
≤2 s". We expose ``tmpfs_root(run_id, scenario)`` so every injector
writes under the same predictable parent (``/tmp/<run_id>/<scenario>/``)
and the pytest fixture wrapper can shutil.rmtree on teardown.
Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol.
"""
from __future__ import annotations
import csv
import hashlib
import math
import shutil
import struct
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
import numpy as np
DEFAULT_SCRATCH_ROOT = Path("/tmp")
def derive_rng(domain: str, *components: object) -> np.random.Generator:
"""Stable RNG keyed on ``(domain, components...)``.
The domain string is a short unique tag per injector (``"outlier"``,
``"blackout_spoof"``, ``"multi_segment"``); the components are the
user-visible knobs (seed, density, window_seconds, etc.).
Two invocations with the same arguments return RNGs that produce the
same sequence of values. Two invocations with different ``domain`` —
even with the same ``components`` — produce independent sequences.
"""
payload = "|".join((domain,) + tuple(str(c) for c in components))
digest = hashlib.sha256(payload.encode("ascii")).digest()
seed64 = struct.unpack(">Q", digest[:8])[0]
return np.random.default_rng(seed64)
def tmpfs_root(run_id: str, scenario: str, base: Path | None = None) -> Path:
"""Return ``<base>/<run_id>/<scenario>/`` (created); used by every injector.
The pytest fixture wrapper passes ``base = pytest's tmp_path_factory``
so unit-test runs stay inside the pytest tmp tree rather than ``/tmp``.
"""
base = base or DEFAULT_SCRATCH_ROOT
out = base / run_id / scenario
out.mkdir(parents=True, exist_ok=True)
return out
def cleanup_tmpfs(path: Path) -> None:
"""``rmtree`` ``path`` if it exists; silent no-op otherwise.
Called from pytest fixture teardown. Per AC-6 the rm must complete
within ≤2 s; ``shutil.rmtree`` of a single-scenario directory with a
few thousand small files reliably finishes in <100 ms.
"""
if path.exists():
shutil.rmtree(path)
# ---------------------------------------------------------------------------
# Tile-cache manifest read (AZ-407 schema)
# ---------------------------------------------------------------------------
# Slippy-map convention — see e2e/fixtures/tile-cache-builder/builder.py
# DEFAULT_ZOOM = 18 — these constants are the contract this module relies
# on (they are NOT imported from the builder to avoid a runtime dependency
# on the tile-cache-builder package at injector-test time).
_TILE_SIZE = 256 # px
@dataclass(frozen=True)
class TileGtRow:
"""One row of the tile-cache manifest, with derived lat/lon centre."""
zoom_level: int
tile_x: int
tile_y: int
capture_date: str
source: str
m_per_px: float
jpeg_path: str
content_hash: str
provenance: str
centre_lat_deg: float
centre_lon_deg: float
def _tile_centre_lat_lon(zoom: int, tx: int, ty: int) -> tuple[float, float]:
"""Slippy XYZ tile centre → (lat_deg, lon_deg).
Standard Web-Mercator inverse of the (tx, ty) tile origin offset by
``+0.5`` to get the centre rather than the NW corner.
"""
n = 2.0 ** zoom
lon_deg = (tx + 0.5) / n * 360.0 - 180.0
lat_rad = math.atan(math.sinh(math.pi * (1 - 2 * (ty + 0.5) / n)))
lat_deg = math.degrees(lat_rad)
return lat_deg, lon_deg
def read_tile_manifest(manifest_csv: Path) -> list[TileGtRow]:
"""Parse the tile-cache ``manifest.csv`` (AZ-407 schema) into typed rows.
Each row gets a derived ``(centre_lat_deg, centre_lon_deg)`` computed
from the slippy tile coordinates — the injectors use this for the
"far-away crop" geodesic check (AC-2).
Raises FileNotFoundError when the manifest is missing — the injector
CLI surfaces this with an explicit "build the tile-cache fixture
first" message. We do NOT silently fall back to a stub manifest;
that would hide a misconfigured test run.
"""
if not manifest_csv.is_file():
raise FileNotFoundError(
f"tile-cache manifest not found at {manifest_csv} — build the "
"tile-cache fixture first (`./e2e/fixtures/tile-cache-builder/build.sh`)"
)
rows: list[TileGtRow] = []
with manifest_csv.open("r", newline="") as fp:
reader = csv.DictReader(fp)
for raw in reader:
zoom = int(raw["zoom_level"])
tx = int(raw["tile_x"])
ty = int(raw["tile_y"])
lat, lon = _tile_centre_lat_lon(zoom, tx, ty)
rows.append(
TileGtRow(
zoom_level=zoom,
tile_x=tx,
tile_y=ty,
capture_date=raw["capture_date"],
source=raw["source"],
m_per_px=float(raw["m_per_px"]),
jpeg_path=raw["jpeg_path"],
content_hash=raw["content_hash"],
provenance=raw["provenance"],
centre_lat_deg=lat,
centre_lon_deg=lon,
)
)
if not rows:
raise ValueError(f"tile-cache manifest at {manifest_csv} is empty")
return rows
def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""Great-circle distance in meters (Haversine).
Used by the injector "far-away" check. We deliberately re-implement
rather than importing ``runner.helpers.geo.distance_m`` — the
injectors must work without pyproj installed (the project's
``[dev]`` extra installs pyproj, but the injectors run inside
minimal Docker images and on bare ground stations).
"""
R = 6_371_000.0
p1 = math.radians(lat1)
p2 = math.radians(lat2)
dp = math.radians(lat2 - lat1)
dl = math.radians(lon2 - lon1)
a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
return float(2 * R * math.asin(math.sqrt(a)))
def far_away_indices(
rows: list[TileGtRow],
src_idx: int,
min_offset_m: float,
) -> list[int]:
"""Return indices of rows whose centre is ≥ ``min_offset_m`` from ``src_idx``."""
src = rows[src_idx]
return [
j
for j, r in enumerate(rows)
if j != src_idx
and haversine_m(src.centre_lat_deg, src.centre_lon_deg, r.centre_lat_deg, r.centre_lon_deg)
>= min_offset_m
]
# ---------------------------------------------------------------------------
# Tiny utilities
# ---------------------------------------------------------------------------
def iter_video_frame_indices(total_frames: int, density_ratio: float) -> Iterable[int]:
"""Yield 1-of-N frame indices for the requested density ratio.
Density is the fraction of frames replaced; e.g., ``density_ratio=0.1``
means every 10th frame (deterministic stride, NOT random sampling) —
we keep the stride deterministic so the unit test's "X-th frame is
replaced" assertion stays stable.
"""
if not 0 < density_ratio <= 1.0:
raise ValueError(f"density_ratio must be in (0, 1]; got {density_ratio}")
stride = max(1, round(1 / density_ratio))
return range(0, total_frames, stride)
+401 -10
View File
@@ -1,27 +1,418 @@
"""blackout-spoof-derkachi — visual blackout + spoofed GPS combination (FT-N-04, NFT-RES-04).
"""blackout-spoof-derkachi — synchronized visual blackout + GPS spoof (FT-N-04, NFT-RES-04).
Concrete generator is owned by AZ-408. AZ-406 commits to the public
signature.
Produces a **schedule** + paired runtime artefacts for a coordinated
visual-blackout / FC-GPS-spoof scenario. The schedule itself is the
single source of truth — the video-overlay portion AND the FC-inbound
proxy patch both read from it so the two streams stay synchronized
within AC-3 (≤40 ms wall-clock alignment).
What ``build()`` writes:
<out_root>/
schedule.json # window_start_ms / window_end_ms,
# spoofed-GPS frame timeline
frames/AD000001.jpg # source frame, OR a black frame inside windows
manifest.csv # per-replaced-frame metadata for tests
summary.json # aggregate (window count, max alignment err, …)
The schedule's ``spoof_gps`` list is consumed by ``fc_proxy.py`` at run
time: the proxy walks its monotonic clock and, when ``now_ms`` falls
inside ``[window_start_ms, window_end_ms]``, replaces inbound GPS frames
with the next pre-computed spoofed record.
Determinism (AC-1 of AZ-408): identical ``(window_seconds, spoof_offset_m,
spoof_bearing_deg, seed)`` reproduce the same schedule and frame outputs.
Spoof-GPS values come from a ``derive_rng("blackout_spoof", …)`` stream;
window timing is deterministic-positional (anchored at 30 % of the source
duration so each window family ends inside the flight). The 200500 m
inter-spoof delta requirement (AC-4 / AC-NEW-8) is enforced by the
delta-bound parameter — no random rejection sampling.
Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol.
"""
from __future__ import annotations
from dataclasses import dataclass
import argparse
import csv
import io
import json
import logging
import math
import shutil
import sys
from dataclasses import dataclass, field
from pathlib import Path
import numpy as np
from ._common import derive_rng, tmpfs_root
logger = logging.getLogger(__name__)
# AC-NEW-8: spoofed GPS jumps 200-500 m between consecutive spoof frames.
_MIN_INTER_SPOOF_DELTA_M = 200.0
_MAX_INTER_SPOOF_DELTA_M = 500.0
# Spoofed-frame cadence — typical FC GPS update rate (10 Hz).
_SPOOF_HZ = 10.0
# AC-4: spoofed fields stay inside typical-flight ranges.
_SPOOF_FIX_TYPES = (3, 4) # GPS_FIX_TYPE_3D / GPS_FIX_TYPE_DGPS
_SPOOF_HDOP_RANGE = (0.5, 2.5)
# Source-frame defaults — overrideable via CLI.
_DEFAULT_SRC_FPS = 30.0
_TILE_W = 256
_TILE_H = 256
@dataclass(frozen=True)
class BlackoutSpoofPlan:
"""Configuration for the blackout-spoof-derkachi fixture.
`blackout_seconds` corresponds to the 5 / 15 / 35 s window family from
NFT-RES-04 (35 s escalation ladder) and FT-N-04 (blackout + spoof).
AZ-408 replaces the AZ-406 scaffold dataclass; the previous shape
(``blackout_seconds`` / ``spoof_offset_m`` / ``spoof_bearing_deg``)
is preserved and extended with the inputs the runtime build path
needs.
"""
source_frames_dir: Path
blackout_seconds: float
spoof_offset_m: float
spoof_bearing_deg: float
seed: int = 0
spoof_offset_m: float = 350.0
spoof_bearing_deg: float = 45.0
source_fps: float = _DEFAULT_SRC_FPS
# AC-NEW-3: the proxy must START emitting spoofed GPS within ≤40 ms
# of the first all-black video frame. This is a documented invariant
# the runtime proxy enforces; we keep it in the plan as the
# "promised" alignment so tests can assert against it.
max_alignment_err_ms: float = 40.0
initial_lat_deg: float = 50.075
initial_lon_deg: float = 36.15
def build(plan: BlackoutSpoofPlan, out_root: Path) -> Path:
raise NotImplementedError("Owned by AZ-408 — AZ-406 supplies only the contract.")
@dataclass(frozen=True)
class SpoofGpsFrame:
"""One spoofed GPS record — what fc_proxy will inject in place of real GPS."""
monotonic_ms: int
lat_deg: float
lon_deg: float
alt_m: float
fix_type: int
hdop: float
@dataclass(frozen=True)
class BlackoutSpoofSchedule:
"""The full coordinated timeline written to ``schedule.json``."""
window_start_ms: int
window_end_ms: int
spoof_gps: list[SpoofGpsFrame] = field(default_factory=list)
blackout_frame_indices: list[int] = field(default_factory=list)
max_alignment_err_ms: float = 40.0
@dataclass(frozen=True)
class BlackoutSpoofReport:
"""Summary of a single ``build()`` run — written to ``summary.json``."""
out_root: Path
schedule: BlackoutSpoofSchedule
blackout_frame_count: int
spoof_frame_count: int
inter_spoof_delta_m_min: float
inter_spoof_delta_m_max: float
def _bearing_offset(lat: float, lon: float, bearing_deg: float, dist_m: float) -> tuple[float, float]:
"""Project ``(lat, lon)`` along ``bearing_deg`` by ``dist_m`` (great-circle)."""
R = 6_371_000.0
br = math.radians(bearing_deg)
lat1 = math.radians(lat)
lon1 = math.radians(lon)
ang = dist_m / R
lat2 = math.asin(math.sin(lat1) * math.cos(ang) + math.cos(lat1) * math.sin(ang) * math.cos(br))
lon2 = lon1 + math.atan2(
math.sin(br) * math.sin(ang) * math.cos(lat1),
math.cos(ang) - math.sin(lat1) * math.sin(lat2),
)
return math.degrees(lat2), math.degrees(lon2)
def _build_spoof_gps_track(
plan: BlackoutSpoofPlan,
window_start_ms: int,
window_end_ms: int,
rng: np.random.Generator,
) -> list[SpoofGpsFrame]:
"""Generate a spoofed-GPS track that satisfies AC-4 + AC-NEW-8.
The track starts at the plan's initial point + spoof_offset_m along
spoof_bearing_deg (the initial "jump" that defines the spoofed
position). Subsequent frames jump 200-500 m in a randomly-perturbed
bearing each step — enforced deterministically by the seeded RNG.
"""
cadence_ms = int(round(1000.0 / _SPOOF_HZ))
frames: list[SpoofGpsFrame] = []
cur_lat, cur_lon = _bearing_offset(
plan.initial_lat_deg, plan.initial_lon_deg, plan.spoof_bearing_deg, plan.spoof_offset_m
)
cur_alt = 300.0 # plausible-cruise altitude (matches `flight_derkachi/camera_info.md`)
cur_bearing = plan.spoof_bearing_deg
t = window_start_ms
while t <= window_end_ms:
delta_m = float(
rng.uniform(_MIN_INTER_SPOOF_DELTA_M, _MAX_INTER_SPOOF_DELTA_M)
)
# Perturb bearing ±60° per step so the spoofed track looks like
# a realistic-but-bad GPS noise pattern (not a straight line).
cur_bearing = (cur_bearing + float(rng.uniform(-60.0, 60.0))) % 360.0
cur_lat, cur_lon = _bearing_offset(cur_lat, cur_lon, cur_bearing, delta_m)
# Stay inside realistic flight altitude range; small noise only.
cur_alt += float(rng.uniform(-2.0, 2.0))
fix_type = int(rng.choice(_SPOOF_FIX_TYPES))
hdop = float(rng.uniform(*_SPOOF_HDOP_RANGE))
frames.append(
SpoofGpsFrame(
monotonic_ms=t,
lat_deg=round(cur_lat, 7),
lon_deg=round(cur_lon, 7),
alt_m=round(cur_alt, 3),
fix_type=fix_type,
hdop=round(hdop, 3),
)
)
t += cadence_ms
return frames
def _black_jpeg_bytes() -> bytes:
"""All-black 256×256 JPEG using the project's pinned PIL settings."""
from PIL import Image # noqa: PLC0415 — heavy import, deferred
img = Image.new("RGB", (_TILE_W, _TILE_H), color=(0, 0, 0))
buf = io.BytesIO()
img.save(
buf,
format="JPEG",
quality=85,
optimize=False,
progressive=False,
subsampling=2,
)
return buf.getvalue()
def build(plan: BlackoutSpoofPlan, out_root: Path) -> BlackoutSpoofReport:
"""Generate the blackout-spoof-derkachi fixture under ``out_root``."""
if plan.blackout_seconds <= 0:
raise ValueError(f"blackout_seconds must be > 0; got {plan.blackout_seconds}")
if out_root.exists():
shutil.rmtree(out_root)
(out_root / "frames").mkdir(parents=True)
src_dir = plan.source_frames_dir
if not src_dir.is_dir():
raise FileNotFoundError(f"source frames directory not found: {src_dir}")
frames = sorted(src_dir.glob("AD*.jpg"))
if not frames:
raise FileNotFoundError(f"no AD*.jpg frames under {src_dir}")
total_frames = len(frames)
src_duration_ms = int(round((total_frames / plan.source_fps) * 1000.0))
# Anchor the window at 30 % of the source duration. The window must
# fit inside the source — if the requested blackout is longer than
# the remaining flight, fall back to "blackout from 30 % to end".
window_start_ms = int(0.3 * src_duration_ms)
window_end_ms = min(
window_start_ms + int(plan.blackout_seconds * 1000), src_duration_ms
)
# Frame-index window in the source frame-stream (frames are at
# ``source_fps`` Hz so a window of ``W`` ms maps to ``W/1000 * fps``
# frames).
first_blackout_frame = int(round(window_start_ms / 1000.0 * plan.source_fps))
last_blackout_frame = int(round(window_end_ms / 1000.0 * plan.source_fps))
blackout_indices = list(range(first_blackout_frame, min(last_blackout_frame, total_frames)))
rng = derive_rng(
"blackout_spoof",
plan.seed,
plan.blackout_seconds,
plan.spoof_offset_m,
plan.spoof_bearing_deg,
)
spoof_frames = _build_spoof_gps_track(plan, window_start_ms, window_end_ms, rng)
schedule = BlackoutSpoofSchedule(
window_start_ms=window_start_ms,
window_end_ms=window_end_ms,
spoof_gps=spoof_frames,
blackout_frame_indices=blackout_indices,
max_alignment_err_ms=plan.max_alignment_err_ms,
)
black_jpeg = _black_jpeg_bytes()
manifest_rows: list[dict] = []
blackout_set = set(blackout_indices)
for frame_idx, frame_path in enumerate(frames):
out_path = out_root / "frames" / frame_path.name
if frame_idx in blackout_set:
out_path.write_bytes(black_jpeg)
manifest_rows.append(
{
"frame_idx": frame_idx,
"src_jpeg_path": frame_path.name,
"kind": "blackout",
"window_start_ms": window_start_ms,
"window_end_ms": window_end_ms,
"seed": plan.seed,
}
)
else:
shutil.copy2(frame_path, out_path)
_write_schedule(out_root, schedule)
_write_manifest(out_root, manifest_rows)
deltas_m: list[float] = []
for prev, nxt in zip(spoof_frames, spoof_frames[1:]):
from ._common import haversine_m as _hav
deltas_m.append(_hav(prev.lat_deg, prev.lon_deg, nxt.lat_deg, nxt.lon_deg))
report = BlackoutSpoofReport(
out_root=out_root,
schedule=schedule,
blackout_frame_count=len(blackout_indices),
spoof_frame_count=len(spoof_frames),
inter_spoof_delta_m_min=min(deltas_m) if deltas_m else 0.0,
inter_spoof_delta_m_max=max(deltas_m) if deltas_m else 0.0,
)
_write_summary(out_root, report)
return report
def _write_schedule(out_root: Path, schedule: BlackoutSpoofSchedule) -> None:
payload = {
"window_start_ms": schedule.window_start_ms,
"window_end_ms": schedule.window_end_ms,
"max_alignment_err_ms": schedule.max_alignment_err_ms,
"blackout_frame_indices": schedule.blackout_frame_indices,
"spoof_gps": [
{
"monotonic_ms": f.monotonic_ms,
"lat_deg": f.lat_deg,
"lon_deg": f.lon_deg,
"alt_m": f.alt_m,
"fix_type": f.fix_type,
"hdop": f.hdop,
}
for f in schedule.spoof_gps
],
}
(out_root / "schedule.json").write_text(
json.dumps(payload, sort_keys=True, indent=2) + "\n"
)
def _write_manifest(out_root: Path, rows: list[dict]) -> None:
manifest = out_root / "manifest.csv"
with manifest.open("w", newline="") as fp:
writer = csv.DictWriter(
fp,
fieldnames=["frame_idx", "src_jpeg_path", "kind", "window_start_ms", "window_end_ms", "seed"],
lineterminator="\n",
)
writer.writeheader()
for row in sorted(rows, key=lambda r: r["frame_idx"]):
writer.writerow(row)
def _write_summary(out_root: Path, report: BlackoutSpoofReport) -> None:
payload = {
"scenario": "blackout-spoof-derkachi",
"window_start_ms": report.schedule.window_start_ms,
"window_end_ms": report.schedule.window_end_ms,
"blackout_frame_count": report.blackout_frame_count,
"spoof_frame_count": report.spoof_frame_count,
"inter_spoof_delta_m_min": round(report.inter_spoof_delta_m_min, 3),
"inter_spoof_delta_m_max": round(report.inter_spoof_delta_m_max, 3),
"max_alignment_err_ms": report.schedule.max_alignment_err_ms,
}
(out_root / "summary.json").write_text(
json.dumps(payload, sort_keys=True, indent=2) + "\n"
)
def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(description="Blackout + spoofed-GPS injection (FT-N-04)")
parser.add_argument("--source-frames", type=Path, required=True)
parser.add_argument(
"--window-seconds",
type=float,
required=True,
help="Blackout window length in seconds (5/15/35 for FT-N-04 / NFT-RES-04 family)",
)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--spoof-offset-m", type=float, default=350.0)
parser.add_argument("--spoof-bearing-deg", type=float, default=45.0)
parser.add_argument("--source-fps", type=float, default=_DEFAULT_SRC_FPS)
parser.add_argument(
"--out-root",
type=Path,
default=None,
help="Output dir. If omitted, /tmp/<run_id>/blackout-spoof-<window_seconds>s/.",
)
parser.add_argument("--run-id", default="local")
parser.add_argument("--quiet", action="store_true")
args = parser.parse_args(argv)
logging.basicConfig(
level=logging.WARNING if args.quiet else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
out_root = args.out_root or tmpfs_root(
args.run_id, f"blackout-spoof-{int(args.window_seconds)}s"
)
plan = BlackoutSpoofPlan(
source_frames_dir=args.source_frames,
blackout_seconds=args.window_seconds,
seed=args.seed,
spoof_offset_m=args.spoof_offset_m,
spoof_bearing_deg=args.spoof_bearing_deg,
source_fps=args.source_fps,
)
report = build(plan, out_root)
summary = {
"scenario": "blackout-spoof-derkachi",
"out_root": str(report.out_root),
"window_start_ms": report.schedule.window_start_ms,
"window_end_ms": report.schedule.window_end_ms,
"blackout_frame_count": report.blackout_frame_count,
"spoof_frame_count": report.spoof_frame_count,
"inter_spoof_delta_m_min": round(report.inter_spoof_delta_m_min, 3),
"inter_spoof_delta_m_max": round(report.inter_spoof_delta_m_max, 3),
"max_alignment_err_ms": report.schedule.max_alignment_err_ms,
}
json.dump(summary, sys.stdout, sort_keys=True, indent=2)
sys.stdout.write("\n")
return 0
if __name__ == "__main__":
raise SystemExit(main())
+209
View File
@@ -0,0 +1,209 @@
"""FC-inbound proxy patch for blackout_spoof — coordinated GPS spoof injection.
The blackout_spoof injector ships a ``schedule.json`` with two paired
artefacts:
1. ``blackout_frame_indices`` — which video frames are replaced with
black frames (the video-overlay portion writes them to disk).
2. ``spoof_gps`` — the pre-computed spoofed GPS frames that must appear
on the FC inbound stream *during the same wall-clock window*.
This module is the runtime piece that consumes the ``spoof_gps`` list:
a stateless **pass-through proxy** with a "timed splice" rule.
Default behaviour: every inbound MAVLink GPS message is forwarded
unchanged to the FC. While the proxy's monotonic clock falls inside
``[window_start_ms, window_end_ms]``, the proxy *replaces* the next
inbound GPS frame with the next pre-computed spoofed record. The
``window_start_ms`` / ``window_end_ms`` are anchored to the proxy's own
monotonic clock (started by ``activate(now_ms_provider, t0)``), which the
test harness aligns with the video-overlay's first black-frame timestamp
to satisfy AC-3 (≤40 ms alignment).
The module is intentionally **transport-agnostic**: it takes a callable
that returns ``now_ms`` (for testability — pytest passes a fake clock)
and exposes ``process_inbound_message(raw_gps)`` which the actual
MAVLink-frame router calls. The router lives outside the AZ-408 task
scope (it's part of the runner image's docker-compose wiring, not the
injector module).
Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol; it operates on opaque "raw GPS frame"
bytes/dicts at the MAVLink protocol level.
"""
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable
NowMsProvider = Callable[[], int]
@dataclass(frozen=True)
class SpoofGpsRecord:
"""Mirror of `blackout_spoof.SpoofGpsFrame` — JSON-parsed at proxy init."""
monotonic_ms: int
lat_deg: float
lon_deg: float
alt_m: float
fix_type: int
hdop: float
@dataclass(frozen=True)
class ProxyAlignmentReport:
"""Reports the actual wall-clock alignment achieved at activation.
Tests assert ``alignment_err_ms <= max_alignment_err_ms`` (AC-3 / AC-NEW-3).
"""
window_start_ms: int
activation_now_ms: int
alignment_err_ms: int
class BlackoutSpoofProxy:
"""Coordinated pass-through proxy. NOT thread-safe; one per scenario.
Lifecycle:
proxy = BlackoutSpoofProxy.from_schedule_file(Path("schedule.json"))
report = proxy.activate(now_ms_provider=time.monotonic_ms)
# … runner forwards GPS frames …
while gps := router.next_inbound_gps():
forwarded = proxy.process_inbound_message(gps)
router.send_to_fc(forwarded)
"""
def __init__(
self,
window_start_ms: int,
window_end_ms: int,
spoof_gps: list[SpoofGpsRecord],
max_alignment_err_ms: float = 40.0,
) -> None:
self._window_start_ms = window_start_ms
self._window_end_ms = window_end_ms
self._spoof_gps = list(spoof_gps)
self._max_alignment_err_ms = max_alignment_err_ms
self._now_ms_provider: NowMsProvider | None = None
self._t0_ms: int | None = None
self._next_spoof_idx = 0
self._activated = False
self._activation_report: ProxyAlignmentReport | None = None
@classmethod
def from_schedule_file(cls, schedule_path: Path) -> "BlackoutSpoofProxy":
"""Load the proxy from a ``schedule.json`` written by blackout_spoof."""
if not schedule_path.is_file():
raise FileNotFoundError(f"schedule.json not found: {schedule_path}")
payload = json.loads(schedule_path.read_text())
spoof_gps = [
SpoofGpsRecord(
monotonic_ms=int(s["monotonic_ms"]),
lat_deg=float(s["lat_deg"]),
lon_deg=float(s["lon_deg"]),
alt_m=float(s["alt_m"]),
fix_type=int(s["fix_type"]),
hdop=float(s["hdop"]),
)
for s in payload["spoof_gps"]
]
return cls(
window_start_ms=int(payload["window_start_ms"]),
window_end_ms=int(payload["window_end_ms"]),
spoof_gps=spoof_gps,
max_alignment_err_ms=float(payload.get("max_alignment_err_ms", 40.0)),
)
def activate(
self,
now_ms_provider: NowMsProvider,
first_blackout_ms: int | None = None,
) -> ProxyAlignmentReport:
"""Bind the proxy to a clock and align ``t0`` to the first blackout frame.
``first_blackout_ms`` (in the proxy's monotonic clock space) is the
timestamp at which the video-overlay emitted its first all-black
frame. The proxy sets ``t0`` so that ``window_start_ms`` matches
that instant; this is what enforces AC-3 (≤40 ms alignment).
If ``first_blackout_ms`` is ``None`` the proxy uses ``now`` as the
anchor — useful for unit tests where the schedule's window starts
at t=0 in proxy time.
"""
now_ms = now_ms_provider()
anchor = first_blackout_ms if first_blackout_ms is not None else now_ms
# Adjust t0 so that ``proxy_time(now) = (now - t0) ≈ window_start_ms``
# at the moment of the first black frame.
self._t0_ms = anchor - self._window_start_ms
self._now_ms_provider = now_ms_provider
self._activated = True
self._activation_report = ProxyAlignmentReport(
window_start_ms=self._window_start_ms,
activation_now_ms=now_ms,
alignment_err_ms=abs(now_ms - anchor),
)
return self._activation_report
@property
def activation_report(self) -> ProxyAlignmentReport | None:
return self._activation_report
def _proxy_time_ms(self) -> int:
if not self._activated or self._now_ms_provider is None or self._t0_ms is None:
raise RuntimeError("proxy not activated — call activate(...) first")
return self._now_ms_provider() - self._t0_ms
def in_window(self) -> bool:
"""True iff the proxy clock is inside the blackout window."""
if not self._activated:
return False
t = self._proxy_time_ms()
return self._window_start_ms <= t <= self._window_end_ms
def process_inbound_message(self, raw_gps: dict) -> dict:
"""Pass-through (no-op) outside the window; spoofed-replace inside it.
``raw_gps`` is a dict in the shape of MAVLink ``GPS_INPUT`` /
``GPS_RAW_INT`` (we treat it as opaque; we just clone the keys
and overwrite the position fields). When the spoof list is
exhausted, the last spoofed frame keeps being emitted (the FC
sees a "stuck" spoofed position — that's what triggers
downstream failsafe escalation).
Calling this before ``activate()`` is a programming error and
raises ``RuntimeError`` — it would otherwise be a silent
passthrough that hides a mis-wired test setup.
"""
if not self._activated:
raise RuntimeError("proxy not activated — call activate(...) first")
if not self.in_window():
return raw_gps
spoof = self._next_spoof_record()
out = dict(raw_gps)
# Normalised + protocol-natural fields (the MAVLink router maps
# these to GPS_INPUT.lat / lon / alt / fix_type / hdop with the
# appropriate scaling; we keep degrees so the layer responsible
# for scaling owns it).
out["lat_deg"] = spoof.lat_deg
out["lon_deg"] = spoof.lon_deg
out["alt_m"] = spoof.alt_m
out["fix_type"] = spoof.fix_type
out["hdop"] = spoof.hdop
out["__spoofed__"] = True
return out
def _next_spoof_record(self) -> SpoofGpsRecord:
if self._next_spoof_idx < len(self._spoof_gps):
rec = self._spoof_gps[self._next_spoof_idx]
self._next_spoof_idx += 1
return rec
return self._spoof_gps[-1]
def emitted_spoof_count(self) -> int:
return self._next_spoof_idx
+291 -6
View File
@@ -1,20 +1,305 @@
"""multi-segment-derkachi — ≥3 disconnected segments via satellite re-loc (FT-P-08).
"""multi-segment-derkachi — ≥3 disjoint blackout windows, NO spoof (FT-P-08).
Concrete generator is owned by AZ-408. AZ-406 commits to the public
signature.
Generates a blackout-only fixture: ``n_segments`` disjoint all-black
windows distributed across the Derkachi flight, with no paired GPS spoof.
Drives the satellite-reference re-localization positive path; explicitly
NOT the security failsafe path (that's FT-N-04 / NFT-RES-04, owned by the
blackout_spoof injector).
Constraints (AC-5):
* ≥3 disjoint blackout windows.
* Consecutive windows separated by ≥30 s of normal frames.
* Total blackout coverage ≤25 % of the source duration.
Window placement is deterministic-positional (anchored at fixed fractions
of the source duration) rather than random — that keeps the test's
"window N starts at second X" assertion stable. The seed is still
accepted for API symmetry with the other injectors but currently does
not affect the output (documented in the dataclass docstring); future
NFT-RES-04 variants may use it to perturb segment lengths.
Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol.
"""
from __future__ import annotations
import argparse
import csv
import io
import json
import logging
import shutil
import sys
from dataclasses import dataclass
from pathlib import Path
from ._common import tmpfs_root
logger = logging.getLogger(__name__)
# Constraint constants (AC-5 of AZ-408).
_MIN_INTER_SEGMENT_GAP_SECONDS = 30.0
_MAX_TOTAL_BLACKOUT_FRACTION = 0.25
_DEFAULT_SRC_FPS = 30.0
_TILE_W = 256
_TILE_H = 256
@dataclass(frozen=True)
class MultiSegmentPlan:
"""Configuration for the multi-segment-derkachi fixture.
AZ-408 replaces the AZ-406 scaffold dataclass; the previous shape
(just ``n_segments`` + ``gap_seconds``) is extended to include the
inputs the build path needs. ``seed`` is accepted for symmetry but
is not currently consumed — segment placement is deterministic-positional.
"""
source_frames_dir: Path
n_segments: int = 3
gap_seconds: float = 12.0
segment_seconds: float = 12.0
source_fps: float = _DEFAULT_SRC_FPS
seed: int = 0
def build(plan: MultiSegmentPlan, out_root: Path) -> Path:
raise NotImplementedError("Owned by AZ-408 — AZ-406 supplies only the contract.")
@dataclass(frozen=True)
class SegmentWindow:
start_ms: int
end_ms: int
first_frame_idx: int
last_frame_idx: int
@dataclass(frozen=True)
class MultiSegmentReport:
out_root: Path
segments: list[SegmentWindow]
source_duration_ms: int
total_blackout_frames: int
total_blackout_fraction: float
def _plan_segments(plan: MultiSegmentPlan, total_frames: int) -> list[SegmentWindow]:
"""Compute the segment windows that satisfy AC-5.
Strategy: place ``n_segments`` windows uniformly across the source
duration, each window starts at ``(i+1) / (n+1)`` of the duration
(so first window is not at t=0 and last window is not at t=END).
Then validate the gap constraint + the total-coverage constraint
and raise if the plan is infeasible (rather than silently truncating).
"""
if plan.n_segments < 3:
raise ValueError(f"n_segments must be ≥3 (AC-5); got {plan.n_segments}")
if plan.segment_seconds <= 0:
raise ValueError(f"segment_seconds must be > 0; got {plan.segment_seconds}")
src_duration_s = total_frames / plan.source_fps
src_duration_ms = int(round(src_duration_s * 1000.0))
seg_ms = int(round(plan.segment_seconds * 1000.0))
segments: list[SegmentWindow] = []
for i in range(plan.n_segments):
anchor_s = src_duration_s * (i + 1) / (plan.n_segments + 1)
start_ms = int(round(anchor_s * 1000.0))
end_ms = min(start_ms + seg_ms, src_duration_ms)
first_frame = int(round(start_ms / 1000.0 * plan.source_fps))
last_frame = int(round(end_ms / 1000.0 * plan.source_fps))
segments.append(
SegmentWindow(
start_ms=start_ms,
end_ms=end_ms,
first_frame_idx=first_frame,
last_frame_idx=min(last_frame, total_frames),
)
)
# AC-5 gap check.
for prev, nxt in zip(segments, segments[1:]):
gap_ms = nxt.start_ms - prev.end_ms
if gap_ms < _MIN_INTER_SEGMENT_GAP_SECONDS * 1000:
raise ValueError(
f"infeasible plan: gap between segment ending at {prev.end_ms} ms "
f"and segment starting at {nxt.start_ms} ms is {gap_ms} ms < "
f"{int(_MIN_INTER_SEGMENT_GAP_SECONDS * 1000)} ms (AC-5). Reduce "
"segment_seconds or n_segments, or use a longer source."
)
# AC-5 coverage check.
total_blackout_ms = sum(s.end_ms - s.start_ms for s in segments)
fraction = total_blackout_ms / max(1, src_duration_ms)
if fraction > _MAX_TOTAL_BLACKOUT_FRACTION:
raise ValueError(
f"infeasible plan: total blackout fraction is {fraction:.3f} "
f"> {_MAX_TOTAL_BLACKOUT_FRACTION:.2f} (AC-5). Reduce "
"segment_seconds or n_segments."
)
return segments
def _black_jpeg_bytes() -> bytes:
from PIL import Image # noqa: PLC0415 — heavy import, deferred
img = Image.new("RGB", (_TILE_W, _TILE_H), color=(0, 0, 0))
buf = io.BytesIO()
img.save(
buf,
format="JPEG",
quality=85,
optimize=False,
progressive=False,
subsampling=2,
)
return buf.getvalue()
def build(plan: MultiSegmentPlan, out_root: Path) -> MultiSegmentReport:
"""Generate the multi-segment-derkachi fixture under ``out_root``."""
if out_root.exists():
shutil.rmtree(out_root)
(out_root / "frames").mkdir(parents=True)
src_dir = plan.source_frames_dir
if not src_dir.is_dir():
raise FileNotFoundError(f"source frames directory not found: {src_dir}")
frames = sorted(src_dir.glob("AD*.jpg"))
if not frames:
raise FileNotFoundError(f"no AD*.jpg frames under {src_dir}")
total_frames = len(frames)
src_duration_ms = int(round(total_frames / plan.source_fps * 1000.0))
segments = _plan_segments(plan, total_frames)
black_jpeg = _black_jpeg_bytes()
manifest_rows: list[dict] = []
blackout_set: set[int] = set()
for seg_idx, seg in enumerate(segments):
for f in range(seg.first_frame_idx, min(seg.last_frame_idx, total_frames)):
blackout_set.add(f)
manifest_rows.append(
{
"frame_idx": f,
"src_jpeg_path": frames[f].name,
"segment_idx": seg_idx,
"segment_start_ms": seg.start_ms,
"segment_end_ms": seg.end_ms,
}
)
for frame_idx, frame_path in enumerate(frames):
out_path = out_root / "frames" / frame_path.name
if frame_idx in blackout_set:
out_path.write_bytes(black_jpeg)
else:
shutil.copy2(frame_path, out_path)
_write_schedule(out_root, segments)
_write_manifest(out_root, manifest_rows)
total_blackout = sum(s.last_frame_idx - s.first_frame_idx for s in segments)
fraction = (sum(s.end_ms - s.start_ms for s in segments)) / max(1, src_duration_ms)
report = MultiSegmentReport(
out_root=out_root,
segments=segments,
source_duration_ms=src_duration_ms,
total_blackout_frames=total_blackout,
total_blackout_fraction=fraction,
)
_write_summary(out_root, report)
return report
def _write_schedule(out_root: Path, segments: list[SegmentWindow]) -> None:
payload = {
"segments": [
{
"start_ms": s.start_ms,
"end_ms": s.end_ms,
"first_frame_idx": s.first_frame_idx,
"last_frame_idx": s.last_frame_idx,
}
for s in segments
]
}
(out_root / "schedule.json").write_text(
json.dumps(payload, sort_keys=True, indent=2) + "\n"
)
def _write_manifest(out_root: Path, rows: list[dict]) -> None:
manifest = out_root / "manifest.csv"
with manifest.open("w", newline="") as fp:
writer = csv.DictWriter(
fp,
fieldnames=["frame_idx", "src_jpeg_path", "segment_idx", "segment_start_ms", "segment_end_ms"],
lineterminator="\n",
)
writer.writeheader()
for row in sorted(rows, key=lambda r: (r["segment_idx"], r["frame_idx"])):
writer.writerow(row)
def _write_summary(out_root: Path, report: MultiSegmentReport) -> None:
payload = {
"scenario": "multi-segment-derkachi",
"n_segments": len(report.segments),
"source_duration_ms": report.source_duration_ms,
"total_blackout_frames": report.total_blackout_frames,
"total_blackout_fraction": round(report.total_blackout_fraction, 6),
"segments": [
{"start_ms": s.start_ms, "end_ms": s.end_ms} for s in report.segments
],
}
(out_root / "summary.json").write_text(
json.dumps(payload, sort_keys=True, indent=2) + "\n"
)
def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(description="Multi-segment blackout (FT-P-08)")
parser.add_argument("--source-frames", type=Path, required=True)
parser.add_argument("--n-segments", type=int, default=3)
parser.add_argument("--segment-seconds", type=float, default=12.0)
parser.add_argument("--source-fps", type=float, default=_DEFAULT_SRC_FPS)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument(
"--out-root",
type=Path,
default=None,
help="Output dir. If omitted, /tmp/<run_id>/multi-segment/.",
)
parser.add_argument("--run-id", default="local")
parser.add_argument("--quiet", action="store_true")
args = parser.parse_args(argv)
logging.basicConfig(
level=logging.WARNING if args.quiet else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
out_root = args.out_root or tmpfs_root(args.run_id, "multi-segment")
plan = MultiSegmentPlan(
source_frames_dir=args.source_frames,
n_segments=args.n_segments,
segment_seconds=args.segment_seconds,
source_fps=args.source_fps,
seed=args.seed,
)
report = build(plan, out_root)
summary = {
"scenario": "multi-segment-derkachi",
"out_root": str(report.out_root),
"n_segments": len(report.segments),
"source_duration_ms": report.source_duration_ms,
"total_blackout_frames": report.total_blackout_frames,
"total_blackout_fraction": round(report.total_blackout_fraction, 6),
}
json.dump(summary, sys.stdout, sort_keys=True, indent=2)
sys.stdout.write("\n")
return 0
if __name__ == "__main__":
raise SystemExit(main())
+296 -10
View File
@@ -1,24 +1,310 @@
"""outlier-injection-derkachi — injects up to 350 m position outliers (FT-N-01).
"""outlier-injection-derkachi — overlay far-away tile crops onto Derkachi frames (FT-N-01).
Concrete generator is owned by AZ-408. AZ-406 commits to the public
signature so test specs can plan against it.
Produces a per-test tmpfs fixture whose ``frames/`` subdirectory mirrors
the source Derkachi frames byte-for-byte EXCEPT that selected frames are
replaced with a JPEG crop pulled from a tile whose centre is ≥350 m
(AC-3.1) from the original frame's GT centre. The companion
``manifest.csv`` records, per replaced frame, ``(frame_idx, src_jpeg_path,
replacement_tile_x, replacement_tile_y, geodesic_offset_m, seed)`` so the
downstream FT-N-01 / FT-P-08 / NFT-RES-04 tests can assert AC-3.1 directly
without re-deriving the geo math.
Density flags ≈ AZ-408 AC-1 / AC-2:
* ``light`` → 1 in 100 frames (replacement ratio 0.01)
* ``medium`` → 1 in 10 frames (replacement ratio 0.10)
* ``heavy`` → 1 in 3 frames (replacement ratio ≈ 0.333)
Determinism (AC-1):
* The frame indices replaced are computed by a deterministic stride
(``_common.iter_video_frame_indices``) — not by random sampling — so two
runs replace the *same* frames.
* The replacement tile for each replaced frame is picked from a
``_common.derive_rng("outlier", seed, density)`` stream — same seed →
same picks.
* Output filenames mirror the source filenames; JPEG bodies are re-encoded
through a pinned PIL pipeline (``quality=85, optimize=False,
progressive=False, subsampling=2``) so the bytes are stable.
Tmpfs (AC-6): the injector writes only under the directory ``out_root``
passes in; the pytest fixture wrapper takes care of teardown.
Public-boundary discipline: this module does NOT import any
``src/gps_denied_onboard`` symbol.
"""
from __future__ import annotations
import argparse
import csv
import io
import json
import logging
import shutil
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Literal
from ._common import (
derive_rng,
far_away_indices,
haversine_m,
iter_video_frame_indices,
read_tile_manifest,
tmpfs_root,
)
logger = logging.getLogger(__name__)
Density = Literal["light", "medium", "heavy"]
_DENSITY_RATIO: dict[Density, float] = {
"light": 1 / 100,
"medium": 1 / 10,
"heavy": 1 / 3,
}
_TILE_W = 256
_TILE_H = 256
@dataclass(frozen=True)
class OutlierInjectionPlan:
"""Configuration for the outlier-injection-derkachi fixture."""
"""Configuration for the outlier-injection-derkachi fixture.
target_segment_seconds: tuple[float, float]
max_offset_m: float = 350.0
n_outliers: int = 5
AZ-408 replaces the AZ-406 scaffold dataclass; the previous shape
(``target_segment_seconds`` / ``max_offset_m`` / ``n_outliers``) was
a placeholder and is no longer used by any test.
"""
source_frames_dir: Path
tile_cache_dir: Path
density: Density
seed: int = 0
min_offset_m: float = 350.0
def build(plan: OutlierInjectionPlan, out_root: Path) -> Path:
"""Generate the fixture under ``out_root``. Returns the produced directory."""
raise NotImplementedError("Owned by AZ-408 — AZ-406 supplies only the contract.")
@dataclass(frozen=True)
class OutlierInjectionReport:
"""Summary of a single ``build()`` run — written to ``manifest.csv``."""
out_root: Path
total_source_frames: int
replaced_frame_count: int
density: Density
min_geodesic_offset_m: float
max_geodesic_offset_m: float
def _gt_centre_for_frame(
frame_idx: int,
tiles: list,
) -> tuple[float, float, int]:
"""Map a source frame to a (lat, lon, src_tile_idx) triple.
For the Derkachi fixture each AD-frame has a paired tile entry in
the tile-cache manifest (`paired_gmaps:ADNNNNNN` in the
`provenance` column). For unpaired frames we fall back to the
bbox tile (`STUB_BBOX:derkachi:*`); if even that's missing we
fall back to the first tile so the injector still runs.
"""
for j, r in enumerate(tiles):
if r.provenance.startswith("paired_gmaps:") and r.provenance.endswith(
f"AD{frame_idx + 1:06d}"
):
return r.centre_lat_deg, r.centre_lon_deg, j
for j, r in enumerate(tiles):
if r.provenance.startswith("STUB_BBOX:"):
return r.centre_lat_deg, r.centre_lon_deg, j
return tiles[0].centre_lat_deg, tiles[0].centre_lon_deg, 0
def _read_replacement_jpeg(tile_cache_dir: Path, jpeg_path: str) -> bytes:
"""Read + re-encode a tile JPEG through PIL with pinned settings.
Re-encoding (rather than raw copy) guarantees the body matches the
builder's encode (PIL ``quality=85, optimize=False, progressive=False,
subsampling=2``) even if the tile was written by a foreign tool.
"""
from PIL import Image # noqa: PLC0415 — heavy import, deferred
src = tile_cache_dir / jpeg_path
img = Image.open(src).convert("RGB").resize((_TILE_W, _TILE_H), Image.BICUBIC)
buf = io.BytesIO()
img.save(
buf,
format="JPEG",
quality=85,
optimize=False,
progressive=False,
subsampling=2,
)
return buf.getvalue()
def build(plan: OutlierInjectionPlan, out_root: Path) -> OutlierInjectionReport:
"""Generate the outlier-injection-derkachi fixture under ``out_root``.
Returns an ``OutlierInjectionReport`` summarising the run. Writes:
<out_root>/
frames/AD000001.jpg # passthrough or replaced
frames/AD000002.jpg # …
manifest.csv # per-replaced-frame metadata
summary.json # report fields, machine-readable
"""
if out_root.exists():
shutil.rmtree(out_root)
(out_root / "frames").mkdir(parents=True)
src_dir = plan.source_frames_dir
if not src_dir.is_dir():
raise FileNotFoundError(f"source frames directory not found: {src_dir}")
frames = sorted(src_dir.glob("AD*.jpg"))
if not frames:
raise FileNotFoundError(f"no AD*.jpg frames under {src_dir}")
tiles = read_tile_manifest(plan.tile_cache_dir / "manifest.csv")
ratio = _DENSITY_RATIO[plan.density]
replace_indices = set(iter_video_frame_indices(len(frames), ratio))
rng = derive_rng("outlier", plan.seed, plan.density)
manifest_rows: list[dict] = []
geodesic_offsets: list[float] = []
for frame_idx, frame_path in enumerate(frames):
out_path = out_root / "frames" / frame_path.name
if frame_idx not in replace_indices:
shutil.copy2(frame_path, out_path)
continue
src_lat, src_lon, src_tile_idx = _gt_centre_for_frame(frame_idx, tiles)
candidates = far_away_indices(tiles, src_tile_idx, plan.min_offset_m)
if not candidates:
raise RuntimeError(
f"no tile in {plan.tile_cache_dir} is ≥{plan.min_offset_m} m "
f"from frame {frame_path.name} — tile cache too small for "
"outlier injection"
)
pick_idx = int(rng.integers(0, len(candidates)))
chosen = tiles[candidates[pick_idx]]
offset_m = haversine_m(
src_lat, src_lon, chosen.centre_lat_deg, chosen.centre_lon_deg
)
geodesic_offsets.append(offset_m)
jpeg = _read_replacement_jpeg(plan.tile_cache_dir, chosen.jpeg_path)
out_path.write_bytes(jpeg)
manifest_rows.append(
{
"frame_idx": frame_idx,
"src_jpeg_path": str(frame_path.name),
"replacement_tile_x": chosen.tile_x,
"replacement_tile_y": chosen.tile_y,
"replacement_zoom": chosen.zoom_level,
"geodesic_offset_m": f"{offset_m:.3f}",
"density": plan.density,
"seed": plan.seed,
}
)
_write_manifest(out_root, manifest_rows)
report = OutlierInjectionReport(
out_root=out_root,
total_source_frames=len(frames),
replaced_frame_count=len(manifest_rows),
density=plan.density,
min_geodesic_offset_m=min(geodesic_offsets) if geodesic_offsets else 0.0,
max_geodesic_offset_m=max(geodesic_offsets) if geodesic_offsets else 0.0,
)
_write_summary(out_root, report)
return report
def _write_manifest(out_root: Path, rows: list[dict]) -> None:
manifest = out_root / "manifest.csv"
with manifest.open("w", newline="") as fp:
writer = csv.DictWriter(
fp,
fieldnames=[
"frame_idx",
"src_jpeg_path",
"replacement_tile_x",
"replacement_tile_y",
"replacement_zoom",
"geodesic_offset_m",
"density",
"seed",
],
lineterminator="\n",
)
writer.writeheader()
for row in sorted(rows, key=lambda r: r["frame_idx"]):
writer.writerow(row)
def _write_summary(out_root: Path, report: OutlierInjectionReport) -> None:
payload = {
"scenario": "outlier-injection-derkachi",
"total_source_frames": report.total_source_frames,
"replaced_frame_count": report.replaced_frame_count,
"density": report.density,
"min_geodesic_offset_m": round(report.min_geodesic_offset_m, 3),
"max_geodesic_offset_m": round(report.max_geodesic_offset_m, 3),
}
(out_root / "summary.json").write_text(
json.dumps(payload, sort_keys=True, indent=2) + "\n"
)
def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(description="Outlier injection (FT-N-01)")
parser.add_argument("--source-frames", type=Path, required=True)
parser.add_argument("--tile-cache", type=Path, required=True)
parser.add_argument("--density", choices=("light", "medium", "heavy"), required=True)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--min-offset-m", type=float, default=350.0)
parser.add_argument(
"--out-root",
type=Path,
default=None,
help="Output dir. If omitted, /tmp/<run_id>/outlier-<density>/.",
)
parser.add_argument("--run-id", default="local")
parser.add_argument("--quiet", action="store_true")
args = parser.parse_args(argv)
logging.basicConfig(
level=logging.WARNING if args.quiet else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
out_root = args.out_root or tmpfs_root(args.run_id, f"outlier-{args.density}")
plan = OutlierInjectionPlan(
source_frames_dir=args.source_frames,
tile_cache_dir=args.tile_cache,
density=args.density,
seed=args.seed,
min_offset_m=args.min_offset_m,
)
report = build(plan, out_root)
summary = {
"scenario": "outlier-injection-derkachi",
"out_root": str(report.out_root),
"total_source_frames": report.total_source_frames,
"replaced_frame_count": report.replaced_frame_count,
"density": report.density,
"min_geodesic_offset_m": round(report.min_geodesic_offset_m, 3),
"max_geodesic_offset_m": round(report.max_geodesic_offset_m, 3),
}
json.dump(summary, sys.stdout, sort_keys=True, indent=2)
sys.stdout.write("\n")
return 0
if __name__ == "__main__":
raise SystemExit(main())