# Solution Draft 02 — Recovering a Clean Nadir Fixture from `2026-05-09 16-10-54.mkv` > **Mode**: B (Solution Assessment) — additive. This draft does **not** modify any runtime component in `_docs/01_solution/solution_draft01.md` (C1…C12). It adds a *fixture-prep developer tool* that converts an OSD-burned-in GCS screen recording into the `flight_derkachi.mp4`-shaped artifact consumed by `tests/e2e/replay/test_az835_e2e_real_flight.py`. > > **Run date**: 2026-05-30. Continues the 2026-05-29 Mode B investigation (`_docs/00_research/_mode_b_2026-05-29_video_extraction/`), with one previously-open "Needs user decision" row now resolved by a fresh tlog scan (Section 5 below). > > **Extraction executed on 2026-05-30**. The primary path (§4.1 Steps 1 + 2) was run against this MKV; the resulting fixture is at [`../../00_problem/input_data/flight_topotek_2026-05-09/`](../../00_problem/input_data/flight_topotek_2026-05-09/) with its own short README. The non-nadir frame filter (§4.1 Steps 4–5) and the companion calibration / IMU files (§4.3) were intentionally NOT produced — they are downstream decisions, not part of "extract a clean video". The verified crop coordinates differ from the 2026-05-29 draft's PoC4 values (which assumed a smaller IR PIP); the current §4.1 numbers reflect what was actually used. > > **Backing artifacts** (read these alongside this draft for full evidence): > - Question decomposition: [`../../00_research/_mode_b_2026-05-29_video_extraction/00_question_decomposition.md`](../../00_research/_mode_b_2026-05-29_video_extraction/00_question_decomposition.md) > - Source registry (17 L1/L2/L3 sources): [`../../00_research/_mode_b_2026-05-29_video_extraction/01_source_registry.md`](../../00_research/_mode_b_2026-05-29_video_extraction/01_source_registry.md) > - Fact cards (18 verified facts incl. local PoC results): [`../../00_research/_mode_b_2026-05-29_video_extraction/02_fact_cards.md`](../../00_research/_mode_b_2026-05-29_video_extraction/02_fact_cards.md) > - Comparison framework: [`../../00_research/_mode_b_2026-05-29_video_extraction/03_comparison_framework.md`](../../00_research/_mode_b_2026-05-29_video_extraction/03_comparison_framework.md) > - Reasoning chain: [`../../00_research/_mode_b_2026-05-29_video_extraction/04_reasoning_chain.md`](../../00_research/_mode_b_2026-05-29_video_extraction/04_reasoning_chain.md) > - Validation log: [`../../00_research/_mode_b_2026-05-29_video_extraction/05_validation_log.md`](../../00_research/_mode_b_2026-05-29_video_extraction/05_validation_log.md) > - Component fit matrix: [`../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`](../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md) > - Inputs: [`../../00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv`](../../00_problem/input_data/10.05.2026/2026-05-09%2016-10-54.mkv) and [`../../00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip`](../../00_problem/input_data/10.05.2026/2026-05-09%2016-09-54.zip) (contains the paired `.tlog`) --- ## 1. TL;DR **Yes, a clean nadir replay fixture can be recovered**, but the answer has two parts that must both be done; doing only one will produce a fixture that quietly misleads the runtime pipeline. | Concern | Recommended primary | Cheap fallback (zero replay-code changes) | |---|---|---| | **Strip the GCS UI chrome (sidebars / minimap / IR-PIP)** | `ffmpeg crop` (deterministic, verbatim pixels) | — same — | | **Handle the gimbal's burned-in OSD (attitude ladder, crosshair, FOV brackets, status text)** | **Inject a static `osd_mask.png` into the existing C3 `kornia.feature.DISK.forward(img, mask=…)` call.** Zero pixel modification, zero fabrication risk. | `ffmpeg crop + delogo` chain (interpolates from neighbor pixels — non-generative; locally verified working as `poc4_delogo.mp4` on the prior run) | | **Filter out frames where the gimbal is not nadir** | **OCR the burned-in pitch text** (Option J) — the *previously* preferred telemetry path is dead for this recording (see Section 5). | Manual labeling pass: ship a small `frame_ranges.yaml` of nadir vs non-nadir segments alongside the MP4. ~30 min of human labour for a 6-minute clip. | **Disqualified**: - Generative video inpainters (VideoPainter / DiffuEraser / VidPivot et al.) — they fabricate terrain, which corrupts VPR/matching evaluation and violates `meta-rule.mdc` "Real Results, Not Simulated Ones". - FFmpeg `tmedian` (temporal median) — both the OSD text *and* the underlying scene change every frame, so the median is smeared in both regions (locally verified as `poc3_crop_tmedian.mp4` on the prior run). **Not available to this project** — the source MKV is what we have; there is no access to the camera, the GCS host, or the upstream RTSP. The ideal-but-out-of-reach path would have been to pull RTSP directly from the gimbal (`rtsp://192.168.144.108:554/stream=0` for the Topotek / Viewpro multi-sensor ball class) or extract DCIM with OSD off via the GimbalControl Ethernet utility (ArduPilot Source #4). That path is documented here only for completeness (and because the `flight_derkachi.mp4` fixture was produced that way, which is why it is already clean); it is not actionable for this data source. The only thing that could replace the cleanup pipeline is the original supplier voluntarily re-recording with OSD off — which is outside this project's control. --- ## 2. What is in `2026-05-09 16-10-54.mkv` ### 2.1 Technical metadata (verified via `ffprobe`) | Field | Value | |---|---| | Container | Matroska (`.mkv`) | | Video codec | H.264 | | Resolution | 1280 × 720 | | Frame rate | 30/1 fps | | Duration | 367.00 s (~6 m 7 s) | | File size | 115 044 545 bytes (~110 MB) | | Audio | AAC (discard at re-encode time with `-an`) | | Bitrate | ~2.5 Mbit/s | ### 2.2 Three overlay layers (Fact #1 — direct 12-frame variance analysis on the prior run) ``` +-----------------------------------------------------------------------+ | GCS chrome top bar (status, mode, GPS, alt) | +------------+----------------------------------------+-----------------+ | | TOP-LEFT HUD (burned by camera) | IR PIP | | SL STATS | · timer 00:04:24 | (live IR/thermal| | (live | · EO/IR zoom, FOV 53.2 ° | stream stamped | | sidebar | · target lat/lon | by the gimbal) | | values) | | | | | [actual EO video region] | | | | crosshair, attitude ladder | | | | FOV brackets, +/-3.7 ° pitch text | | | | +-----------------+ | | BOTTOM-RIGHT GIMBAL TEXT | ROLL / SPEED / | | | · 50.0823, 36.2515 | DIST / BATT / | | | · azimuth, elevation | CURRENT (live) | +------------+----------------------------------------+-----------------+ | Minimap / bottom status bar | +-----------------------------------------------------------------------+ ``` The three layers map to **two removal classes** (per Reasoning Chain Dimension 1): | Layer | Renderer | Removal class | |---|---|---| | GCS UI chrome (sidebars, minimap, status bars) | the GCS application, **after** the video stream arrives | **Pure crop** — discard the columns and rows around the EO region; pixels are *outside* the camera's video, no inpainting needed. | | Burned-in gimbal OSD (attitude ladder, crosshair, FOV brackets, top-left HUD, bottom-right text) | the **camera itself**, before the recorder ever saw the stream | **Mask or inpaint** — these pixels overwrite real EO pixels; you must either tell the downstream not to look at them (mask) or fill them with something visually plausible (inpaint). | | IR PIP (upper-right rectangle, ~360×210 px) | the camera (it stamps its IR channel into a corner of the EO output) | **Crop it out** geometrically — the rectangle is large enough that `delogo`'s interpolation is poor; cleanest to just keep the crop tight enough to exclude it. | ### 2.3 The aircraft / gimbal class (corroborated by the tlog scan in Section 5) - Airframe: **multirotor**, ArduCopter 4.6.3 on Pixhawk6X, QUAD/X frame (`STATUSTEXT`: `'Frame: QUAD/X'`). The project's spec'd nav-camera is a fixed-downward APS-C sensor on a *fixed-wing* per `restrictions.md` — this MKV represents a **different aircraft class** than the primary runtime target. That's a feature (extends test coverage) not a bug, but the fixture's metadata must record the discrepancy. - Gimbal: 3-axis stabilised, pitch range −90° to +20°, yaw range ±180°, roll range ±30° (per the tlog's `GIMBAL_MANAGER_INFORMATION` capability advertisement — Section 5). Consistent with the Topotek / Viewpro multi-sensor ball family identified by the prior run's visual inspection. - GCS: **Mission Planner 1.3.83** (per `STATUSTEXT`). The project's `restrictions.md` mandates QGroundControl as the production GCS; for this fixture, the GCS is just whatever was used to make the recording — not a runtime concern. --- ## 3. Where this fits in the existing solution (and where it does not) ### 3.1 The gap `_docs/01_solution/solution_draft01.md` defines components C1 (VIO), C2 (VPR), C3 (matchers), C4 (PnP), C5 (state estimator), C6 (tile cache), C7 (inference runtime), C8 (FC adapter), C10 (provisioning) and more — all *runtime* concerns on the Jetson Orin Nano Super. None of them is a "data ingestion / fixture-prep" component. Replay fixtures appear in `tests/e2e/replay/` as already-cleaned MP4s (Fact #18). This is fine for `flight_derkachi.mp4`, which arrived pre-cleaned because the operator (a) disabled the gimbal OSD via Topotek's GimbalControl utility, (b) mechanically locked the gimbal nadir, and (c) recorded the direct camera feed at 1080p before cropping to 880×720 (Reasoning Chain Dimension 7). `2026-05-09 16-10-54.mkv` arrived from the *opposite* situation: OSD on, gimbal unconstrained, GCS-screen-recorded. There is no existing project tool to turn this class of input into a usable fixture, which is why the question came up. ### 3.2 What this draft adds A new **fixture-prep developer tool** (location: `tools/fixture_prep/` or `tests/fixtures//build.py`, per existing project layout conventions) that converts one GCS-screen-recorded `.mkv` (plus its paired `.tlog`) into a directory of files in the same shape as `_docs/00_problem/input_data/flight_derkachi/`: ``` input_data/flight_topotek_2026-05-09/ ├── flight_topotek_2026-05-09.mp4 # cleaned, cropped, OSD handled (Section 4) ├── osd_mask.png # 1-channel mask used by Option B (Section 4.1) ├── 2026-05-09 16-09-54.tlog # unpacked from the supplied .zip ├── data_imu.csv # SCALED_IMU2 + GLOBAL_POSITION_INT export ├── frame_ranges.yaml # nadir vs non-nadir frame ranges (Section 4.3) ├── camera_info.md # camera class + calibration provenance └── topotek_gimbal_factory.json # calibration JSON, factory-sheet provenance ``` The tool is offline-only, deterministic, versioned, and reproducible — re-running it on the same input produces byte-identical outputs (the only non-determinism would be inside libx264, which we disable via `-preset placebo -tune zerolatency` or by pinning `-x264-params bframes=0:scenecut=0`, your choice depending on tolerated re-encode time). **It does not change any runtime component.** C1…C12 are untouched. The single optional change in the *test* layer is to teach `tests/e2e/replay/conftest.py::_calibration_path()` (or its sibling helpers) to also look for a companion `osd_mask.png` if Option B is selected — and to forward it as an extra kwarg to whatever wraps `DISK.forward()` inside C3 (see Section 4.1 for the exact one-line change required). --- ## 4. The recommended pipeline (and the cheap fallback) ### 4.1 PRIMARY — A + B + J: crop, mask-aware DISK, OCR pitch filter **Step 1 — Geometric crop (FFmpeg)**: discard the GCS chrome and the IR PIP rectangle. ```bash INPUT="_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv" OUTPUT_DIR="_docs/00_problem/input_data/flight_topotek_2026-05-09" mkdir -p "$OUTPUT_DIR" # Crop coordinates *verified for this specific MKV* by direct frame inspection + # luminance/saturation discontinuity detection on 2026-05-30 (see fixture README). # Output: 610x260 EO-only region anchored at (250, 440) in the 1280x720 source. # # Why these numbers and not the prior research's draft (crop=900:445:50:25): # - The IR PIP is much larger than initially estimated: it spans roughly # x=620..1140, y=35..383 in the source frame. The prior crop's right edge at # x=950 cut into the PIP and the left edge at x=50 still included the # GCS left icon strip + the SL STATS panel. # - The IR PIP rectangle (~520 wide x ~350 tall) is too large for FFmpeg # `delogo` to interpolate cleanly. Geometric exclusion is the only honest # option for this recording. # - The largest *clean* EO rectangle (no GCS chrome, no IR PIP, almost no # burned-in OSD) is in the lower half of the frame, below the IR PIP. # # Re-verify if you ingest a future recording with a different GCS layout or # IR-PIP placement; see fixture README for the derivation script. ffmpeg -y -i "$INPUT" \ -vf "crop=610:260:250:440" \ -an -c:v libx264 -crf 18 -preset medium \ "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4" ``` This is Option A: verbatim pixels, no inpainting, no fabrication, deterministic. On this specific MKV the crop is tight enough that essentially **no** burned-in gimbal OSD survives inside the output (verified on 8 sample frames spread across the recording — variance analysis flagged 1/158 600 = 0.0006 % of pixels as "static OSD-like"). The remaining steps 2–5 below are still relevant for other recordings of this class that may need a looser crop. **Step 2 — Build the OSD mask** (one-time, then versioned in the repo). Build a 1-channel PNG of the same dimensions as the cropped output (900×445), where white (255) marks "real EO pixels — DISK is allowed to detect keypoints here" and black (0) marks "burned-in OSD pixels — DISK must suppress detection here". One quick recipe: ```python # tools/fixture_prep/build_osd_mask.py import cv2, numpy as np from pathlib import Path # Open a sample cropped frame and any image editor; trace the OSD rectangles by hand, # then export as a 900x445 grayscale PNG. The script below is the deterministic alternative: # build the mask from a pixel-stability test over a sample of frames. src = Path("input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4") cap = cv2.VideoCapture(str(src)) n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) sample = [int(n_frames * f) for f in (0.05, 0.15, 0.30, 0.45, 0.60, 0.75, 0.95)] stack = [] for idx in sample: cap.set(cv2.CAP_PROP_POS_FRAMES, idx) ok, frame = cap.read() if not ok: raise RuntimeError(f"frame {idx} unreadable") stack.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)) stack = np.stack(stack) # (N, H, W) std = stack.std(axis=0) # (H, W) mean = stack.mean(axis=0) # (H, W) # OSD heuristic: text/lines render as high-brightness, low-std (the *position* is stable # even if the *value* in that position changes — the bounding box itself does not move). # Real EO terrain over a moving camera is mid-brightness, high-std. osd_likely = (std < 12.0) & (mean > 180.0) # white/bright pixels stable in position osd_likely = cv2.dilate(osd_likely.astype(np.uint8) * 255, np.ones((7, 7), np.uint8)) mask = 255 - osd_likely # invert: white=keep, black=suppress cv2.imwrite("input_data/flight_topotek_2026-05-09/osd_mask.png", mask) ``` The std/mean thresholds above are the right shape but should be tuned by eye on this specific recording — the prior research's variance analysis showed `mean per-pixel std ≈ 30–40` for both the EO region and the GCS sidebars, so a `< 12` threshold cleanly separates burned-in OSD (which has near-zero std in pixels that contain text strokes) from real video. Inspect the saved `osd_mask.png` against a sample frame and refine the thresholds (or hand-trace) before committing it. **Step 3 — One-line wrapper change in C3 to forward the mask** (the only code change this draft proposes). `kornia.feature.DISK.forward(img, mask=None)` already accepts a mask argument of shape `(B, 1, H, W)` with values in `[0, 1]`, and multiplies the score map by it before NMS — keypoints in masked regions are suppressed by construction, with no preprocessing of the pixels themselves (Fact #13, Source #6 Kornia docs L1). The LightGlue maintainer (`cvg/LightGlue#97`) explicitly recommends this approach over post-hoc keypoint filtering. Locate the project's existing `kornia.feature.DISK(...)` instantiation and the call site that invokes it (per `solution_draft01.md` C3 the detector is DISK + LightGlue; the call site is somewhere under `src/.../matchers/` or the runtime DISK wrapper). Pass `mask=` through, where `` is loaded once at fixture-init time from `osd_mask.png` and re-used per frame. Sketch (project-specific paths to be filled in): ```python # Existing feats = self.disk(img, n=self.n_kp) # Becomes feats = self.disk(img, n=self.n_kp, mask=self.osd_mask) ``` `self.osd_mask` is loaded once in `__init__` from `(fixture_dir / "osd_mask.png").read_bytes()` and reshaped to `(1, 1, H, W)` float32 in `[0, 1]`. If the fixture has no `osd_mask.png`, the wrapper falls through to the original mask-less call — so existing `flight_derkachi.mp4` continues to work unchanged. **Step 4 — Frame-level filter (Option J: OCR pitch from burned-in attitude indicator)**. The previously-preferred telemetry path (Option I — parse `MOUNT_STATUS` / `GIMBAL_DEVICE_ATTITUDE_STATUS` from the paired `.tlog`) is **not viable for this recording**. See Section 5 for the evidence. The remaining viable paths are: - **(J) OCR the burned-in pitch number** — the gimbal renders pitch as text such as `-3.7°` in the attitude indicator. Use Tesseract or PaddleOCR per frame on a fixed crop around that text region, then drop frames where `|pitch − (−90°)| > 10°`. Quick recipe (project must add `pytesseract` or `paddleocr` to `requirements-dev.txt`): ```python # tools/fixture_prep/frame_pitch_from_ocr.py import cv2, pytesseract, re, json pat = re.compile(r"(-?\d+\.\d+)") src = "input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4" cap = cv2.VideoCapture(src) out = [] for frame_idx in range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT))): ok, frame = cap.read() if not ok: break # Crop the attitude-indicator text region (coordinates depend on the cropped frame). roi = frame[y0:y1, x0:x1] text = pytesseract.image_to_string(roi, config="--psm 7 -c tessedit_char_whitelist=-0123456789.") m = pat.search(text) out.append({"frame": frame_idx, "pitch_deg": float(m.group(1)) if m else None}) with open("input_data/flight_topotek_2026-05-09/frame_pitch.json", "w") as f: json.dump(out, f) ``` Then derive `frame_ranges.yaml` from `frame_pitch.json` by clustering contiguous frame indices whose pitch is within the nadir band. - **(Manual)** — for a one-off fixture of 6 minutes, the cheapest deterministic alternative is a *manual labeling pass*: a developer watches the cropped video once, notes the frame ranges where the gimbal is at nadir (`[0:42, 1:53][3:58, 6:06]` etc.), and saves the ranges as `frame_ranges.yaml`. ~30 minutes of human labour, zero failure modes, fully reproducible by anyone who can re-watch the same MP4. This is the recommended path **for this specific fixture** unless additional GCS-screen-recorded fixtures are expected, in which case the OCR script amortises across them. **Step 5 — Filter the cropped video down to nadir-only frames** (using `frame_ranges.yaml` from Step 4). ```bash # Re-encode with a select filter restricting to the nadir frame ranges. # Build the select expression programmatically from frame_ranges.yaml. ffmpeg -y -i "$OUTPUT_DIR/flight_topotek_2026-05-09_cropped.mp4" \ -vf "select='between(n,1260,3390)+between(n,7140,10980)',setpts=N/FRAME_RATE/TB" \ -an -c:v libx264 -crf 18 -preset slow \ "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4" ``` (Numbers above are illustrative; the actual `between(n, …)` segments come from the YAML.) ### 4.2 FALLBACK — A + C + (J or manual): no code change to C3 If teaching the C3 wrapper to forward `mask=…` is rejected for this fixture (a reasonable choice to keep `tests/e2e/replay/` purely "drop in an MP4 + a JSON + a CSV" with zero glue code), substitute **Option C** for Option B: replace each burned-in OSD rectangle with FFmpeg `delogo` interpolation. **On this specific MKV, Option C collapses into Option A.** The verified crop in §4.1 Step 1 already produces a near-zero-OSD output (0.0006 % of pixels flagged as static-OSD-like over 8 sample frames), so there are no rectangles left to delogo. The Option-C-versus-Option-B trade-off only re-emerges for hypothetical *other* recordings of this class that need a looser crop — e.g. a recording where the camera HUD is positioned differently and there is no clean rectangle wholly outside it. The generic recipe shape for such a recording would be: ```bash # Template only — instantiate W/H/X/Y for the looser crop and (x,y,w,h) # rectangles for each surviving OSD region, all in cropped-frame coords. # The delogo filter in FFmpeg 8.1 has no 'band' parameter (removed); only x, y, # w, h, show remain. Rectangles must NOT touch the cropped frame's edge. ffmpeg -y -i "$INPUT" \ -vf "crop=W:H:X:Y,\ delogo=x=x1:y=y1:w=w1:h=h1,\ delogo=x=x2:y=y2:w=w2:h=h2,\ ..." \ -an -c:v libx264 -crf 18 -preset medium \ "$OUTPUT_DIR/$FIXTURE_ID_delogo.mp4" ``` **Important caveats on Option C** (when it does need to be used): - `delogo` rectangles must not touch the image edge (no surrounding pixels to interpolate from). - `delogo` produces *new* pixels (interpolated from the immediate neighbourhood). They are not synthesised semantic terrain content, but they *are* new pixels that did not exist in the original camera capture. The downstream feature detector *can* fire on smooth interpolated regions (DISK keypoints sometimes detect on smooth gradient transitions). This is the residual risk of Option C versus Option B; quantify it by running both pipelines on a few nadir segments and comparing the keypoint density inside the masked regions on the Option C output to zero (the trivially-correct value Option B delivers). - `delogo` does **not** scale to rectangles much larger than ~50 px in their shorter dimension. For this MKV the IR PIP is ~520 × 350 px and cannot be cleanly delogo'd at all — geometric exclusion (i.e. the corrected crop in §4.1 Step 1) is the only honest option. - Then chain Step 4 + Step 5 from Section 4.1 on top of this Option C output to get the same nadir-only result. ### 4.3 Companion files | File | Source | Conventions | |---|---|---| | `flight_topotek_2026-05-09.mp4` | Sections 4.1 / 4.2 | H.264, 30 fps, exactly the cropped + OSD-handled + nadir-filtered video. Matches the `flight_derkachi.mp4` shape (any sub-1080p H.264 MP4 the replay harness already accepts). | | `osd_mask.png` | Section 4.1 Step 2 (only for Option B) | 900×445 grayscale PNG, white=keep, black=suppress. Versioned alongside the MP4. | | `2026-05-09 16-09-54.tlog` | Just unzip the `.zip` from `input_data/10.05.2026/` | Identical to the supplied tlog; ArduCopter 4.6.3 (Pixhawk6X), 133 191 messages over 446.8 s. | | `data_imu.csv` | Reuse the existing `derkachi.tlog → data_imu.csv` exporter, retargeted at this new tlog | 10 Hz table of `SCALED_IMU2` and `GLOBAL_POSITION_INT` per the `flight_derkachi/README.md` convention. | | `frame_ranges.yaml` | Section 4.1 Step 4 | List of `(start_frame, end_frame)` pairs the fixture considers "valid nadir frames". | | `camera_info.md` | Hand-written, modelled on `flight_derkachi/camera_info.md` | Records: camera class (Topotek / Viewpro 3-axis multi-sensor ball, per ArduPilot Source #4 + the tlog's `GIMBAL_MANAGER_INFORMATION` cap_flags), recording chain (camera HDMI → GCS app → desktop screen recorder → MKV), and the calibration's provenance flag (`factory_sheet`, per AZ-702 precedent — Fact #17). | | `topotek_gimbal_factory.json` | Same shape as `khp20s30_factory.json` (Fact #17) | Per-camera intrinsics + lens distortion from the camera's published spec sheet. Mark provenance `factory_sheet`. Residual focal-length error expected in the 1–3 % band, same envelope the project already accepts for `flight_derkachi.mp4`. | --- ## 5. New evidence — the paired tlog's gimbal state The prior 2026-05-29 run left exactly one unresolved row in `06_component_fit_matrix.md`: > **Frame filtering by gimbal pointing (PRIMARY) — `pymavlink` parser of paired `.tlog` for `MOUNT_STATUS` / `MOUNT_ORIENTATION`** → **Needs user decision**: depends on whether the paired tlog actually emits `MOUNT_STATUS` for the camera in question; if the gimbal does not report attitude over MAVLink, this option fails. This draft resolves that row by directly scanning the paired tlog. Here is the evidence. ### 5.1 What is in the tlog (`2026-05-09 16-09-54.tlog`, unpacked from the supplied `.zip`) `pymavlink 2.4.49` with `MAVLINK20=1`, `MAVLINK_DIALECT=all`. Scanned: 133 191 messages over 446.8 s, 46 distinct message types. Relevant subset: | Message type | Count | Mean rate | Notes | |---|---|---|---| | `HEARTBEAT` | 1492 | 3.3 Hz | 4 endpoints: `(sys=1, comp=1, autopilot=3, type=2)` = ArduCopter / QUAD multirotor; `(sys=1, comp=191, autopilot=0, type=6)` = a GCS-class component co-resident on sysid 1; `(sys=255, comp=0, autopilot=8, type=18)` and `(sys=255, comp=190, autopilot=8, type=6)` = Mission Planner GCS. | | `ATTITUDE` (vehicle) | 4174 | 9.3 Hz | Body pitch range: min −12.47°, max +4.68°, mean −3.95°. **This is the airframe attitude**, not the gimbal. | | `GIMBAL_DEVICE_ATTITUDE_STATUS` | **4338** | **9.7 Hz** | **All 4338 messages carry the identity quaternion `q = (1.0, 0.0, 0.0, 0.0)`** (exactly one distinct quaternion value across the entire flight). `flags = 0x002c` = `YAW_IN_VEHICLE_FRAME | PITCH_LOCK | ROLL_LOCK`. `failure_flags = 0x00000000`. | | `GIMBAL_MANAGER_INFORMATION` | 26 | discovery exchange | `gimbal_device_id=1`. Capability: pitch range `[−90°, +20°]`, yaw range `±180°`, roll range `±30°`. cap_flags=206847. Confirms the gimbal physically *can* reach nadir; just isn't reporting where it is right now. | | `COMMAND_LONG` distinct cmds | — | — | Only 6 distinct command IDs: 183 (`DO_SET_SERVO ch=15 pwm=1950` — one shot, possibly a release / trigger), 400 (`COMPONENT_ARM_DISARM`, twice), 511 (`SET_MESSAGE_INTERVAL`, 11×), 512 (`REQUEST_MESSAGE`, 100×), 520 (`REQUEST_AUTOPILOT_CAPABILITIES`, 47×), and 42428 (vendor-specific, params all zero). **None of these is a gimbal control command (no `MAV_CMD_DO_MOUNT_CONTROL` = 205, no `MAV_CMD_DO_GIMBAL_MANAGER_PITCHYAW` = 1000).** | | `NAMED_VALUE_FLOAT` names | 1 unique | — | Only `ESCs_CURR`. No gimbal-related custom variable. | | `STATUSTEXT` | 38 | — | Includes `'ArduCopter 4.6.3 - Agile(px6) (92b0cd78)'`, `'Pixhawk6X 001E0036 …'`, `'Frame: QUAD/X'`, `'Mission Planner 1.3.83'`. No gimbal-related text. | ### 5.2 What the identity quaternion really means `q = (1, 0, 0, 0)` is the null rotation. Per the MAVLink GIMBAL_DEVICE_ATTITUDE_STATUS spec, that means "the gimbal is in its default forward-pointing pose" (no rotation away from the body frame's +X). But the prior run's frame-by-frame visual inspection saw the gimbal *clearly pointing forward at t=30s and clearly pointing nadir at t=300s* (Fact #5). The two observations are mutually exclusive: if the gimbal were truly at the null rotation throughout the flight, every frame would look like it does at t=30s (forward). The reconciliation: **the gimbal is being moved by the operator, but the actual angle is not being reported back over MAVLink in this recording.** The gimbal driver is emitting the placeholder identity quaternion every ~100 ms because the ArduPilot mount driver expects to publish *something* at the configured rate, but no real angle is available (the gimbal device either isn't wired to talk back over MAVLink, or it is wired but isn't responding, or it is responding on a different transport — most likely the camera's own Ethernet protocol talking directly to the GCS, bypassing the autopilot). This is consistent with: - Mission Planner being able to control Topotek / Viewpro gimbals directly over Ethernet/UDP, separate from the ArduPilot MAVLink path. - The `DO_SET_SERVO ch=15 pwm=1950` one-shot pointing to a *trigger* (likely shutter / record-toggle), not a per-frame angle command. - The absence of `MAV_CMD_DO_MOUNT_CONTROL` and the absence of `GIMBAL_MANAGER_SET_ATTITUDE` in `COMMAND_LONG`. ### 5.3 Effect on the recommendation | Component-fit row (from the 2026-05-29 component fit matrix) | Original status | Status after Section 5 | |---|---|---| | Frame filtering by gimbal pointing (PRIMARY) — `pymavlink` parser of `MOUNT_STATUS`/`MOUNT_ORIENTATION` from the paired tlog | **Needs user decision** | **❌ Rejected for this recording.** The message type IS present at 9.7 Hz, but every quaternion is the placeholder identity value; the data carries zero information about the actual gimbal angle. | | Frame filtering by gimbal pointing (FALLBACK) — OCR on the burned-in pitch text | Experimental only | **✅ Selected as primary** (or the manual labeling pass, for one-off fixtures of this size). | **No other row in `06_component_fit_matrix.md` changes.** The pixel-handling recommendation (Option B primary, Option C fallback) and the rejections (Options F generative / G temporal-median) stand. ### 5.4 Why this is not a project-runtime issue The project's *runtime* nav-camera per `restrictions.md` is the ADTi 20MP fixed-downward (no gimbal at all). The runtime pipeline never sees a multi-sensor-ball gimbal-attitude stream. So the gap discovered here ("Mission-Planner-driven Topotek gimbals don't expose attitude over MAVLink") is only relevant for fixture preparation, not for the runtime contract. The follow-up "change the recording procedure to enable Topotek's own attitude-publish path" would require camera/GCS access this project does not have, so it is unavailable as a workaround. For any further recordings of this class, **plan on OCR-based pitch recovery (Option J) — or a manual labelling pass per fixture — as the standing strategy**, not as a temporary fallback. --- ## 6. Component fit summary (consolidated) > Full detail per row in [`../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`](../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md). The table below is the *post-tlog-scan* update. | Component area | Candidate | Pinned mode | Status | Notes | |---|---|---|---|---| | GCS-chrome geometric crop | FFmpeg `crop` filter | `crop=900:445:50:25` per recording, derived from variance-map analysis | **Selected** | Trivial, lossless within re-encode, deterministic. PoC1 produced playable output on the prior run. | | OSD pixel handling (PRIMARY) | Kornia `DISK.forward(img, mask=…)` | mask-aware mode, `(B, 1, H, W)` mask multiplied into the DISK score map before NMS | **Selected** | No pixel modification; fabrication-risk = 0. Requires one-line C3 wrapper change to forward `mask=`. Already API-verified against Kornia docs L1 (Source #6) + LightGlue maintainer reply (Source #5). | | OSD pixel handling (FALLBACK) | FFmpeg `delogo` chained | multiple `delogo=x:y:w:h` after `crop`, rectangles inside the cropped frame | **Selected (fallback)** | PoC4 produced `poc4_delogo.mp4` on the prior run. Pick this if the C3 wrapper change is rejected for this fixture. | | OSD pixel handling | FFmpeg `removelogo` PNG mask | `removelogo=mask.png` | **Experimental only** | Failed locally with `Invalid argument` (-22) on FFmpeg 8.1; works in older versions per Source #15. Try first on your team's pinned FFmpeg before falling through to chained `delogo`. | | OSD pixel handling | ProPainter (non-generative video inpainter, ICCV 2023) | mask-guided sparse Transformer with flow completion | **Experimental only** | Highest visual quality among non-generative options. Adds PyTorch+CUDA toolchain; ~0.25 s/frame at 480p (Fact #11). Use only if a future recording's masked regions are too large for `delogo` interpolation. | | OSD pixel handling | VideoPainter / DiffuEraser / VidPivot (and any generative video inpainter) | diffusion-backbone I2V generative inpainter | **❌ Rejected** | Synthesises terrain content. Disqualified by `meta-rule.mdc` "Real Results, Not Simulated Ones" (Fact #12). | | OSD pixel handling | FFmpeg `tmedian` temporal median | `tmedian=radius=N` | **❌ Rejected** | Burned-in OSD text values change every frame, so the static-OSD assumption underneath the technique fails. PoC3 confirmed: smeared, ghosted output (Fact #10). | | Non-nadir frame filter (PRIMARY) | `pymavlink` MOUNT_STATUS / GIMBAL_DEVICE_ATTITUDE_STATUS | parse paired tlog → `frame_idx → gimbal_pitch_deg` table | **❌ Rejected for this recording (NEW)** | Section 5: message present at 9.7 Hz, but all 4338 quaternions are identity (1,0,0,0) — no real angle data. | | Non-nadir frame filter (PRIMARY, new) | OCR (Tesseract or PaddleOCR) on burned-in pitch text | per-frame OCR of the `−3.7°` text in the attitude indicator | **Selected (NEW)** | Was "Experimental only" pre-tlog-scan; promoted to primary now that the telemetry path is dead. Add `pytesseract` or `paddleocr` to `requirements-dev.txt`. | | Non-nadir frame filter (one-off alternative) | Manual labeling pass | developer watches the 6-min clip, marks ranges, commits `frame_ranges.yaml` | **Selected (for this fixture only)** | Cheapest deterministic path; recommended for this specific MKV unless additional GCS-screen-recorded fixtures are expected. | | Calibration JSON | Per-camera `topotek_gimbal_factory.json` (same shape as `khp20s30_factory.json`) | "factory_sheet" provenance per AZ-702 precedent | **Selected** | Project-accepted (Fact #17). Residual 1–3 % focal-length error envelope. | | Companion telemetry CSV | Existing `derkachi.tlog → data_imu.csv` exporter, retargeted | unchanged | **Selected** | Reuses existing tool. | | Source recovery (Option Z) | Pull RTSP / extract DCIM from gimbal with OSD disabled via Topotek GimbalControl utility (ArduPilot Source #4) | n/a — out-of-band camera access | **❌ Not available — no camera / GCS access for this data source.** | Documented here only because it is the cleanest path *in principle* and because the existing `flight_derkachi.mp4` fixture was produced this way. Not actionable for this project's data pipeline. The only way it returns to the table is if the original supplier voluntarily re-records with OSD off — outside this project's control. | --- ## 7. Testing strategy ### 7.1 Functional / integration 1. **Crop-coordinate validation.** Decode 5 frames from `flight_topotek_2026-05-09.mp4` and assert they are 900×445; assert the IR PIP is *not* present in the right-third of the frame; assert the GCS sidebars are *not* present in the leftmost / rightmost columns. 2. **OSD mask validation.** Open `osd_mask.png`, assert dimensions match the MP4, assert the union of black pixels covers ≥95 % of the union of OSD rectangles you would otherwise pass to `delogo`. Optionally, render `cropped_frame * (mask/255.)` and eyeball that the burned-in text is dimmed to black while the EO terrain is preserved. 3. **DISK mask-aware contract.** Add a unit test under `tests/unit/c3_matchers/` that loads the existing DISK wrapper, passes a synthetic 900×445 image with a checkerboard pattern + a corner rectangle of pure white, passes a mask zeroing the corner, and asserts no keypoint is returned at coordinates inside that corner. 4. **End-to-end replay smoke test.** Add a sibling test to `tests/e2e/replay/test_az835_e2e_real_flight.py` parameterised over `flight_topotek_2026-05-09` and confirm the pipeline runs to completion. Track end-to-end accuracy separately under AC-1.x. 5. **Frame-range filter sanity.** Iterate `frame_ranges.yaml` and assert: every range's `start_frame < end_frame`, ranges are non-overlapping, and the union covers at least N seconds of footage (where N is a project-chosen minimum-fixture-duration). ### 7.2 Non-functional - **Reproducibility**: re-run the entire `tools/fixture_prep/` script twice on a clean checkout and assert byte-identical outputs (pin libx264 settings; pin `pytesseract` version; pin Python version in `pyproject.toml`). - **Throughput**: the entire fixture-prep run for one 6-minute MKV should complete in well under 10 minutes on a developer workstation (no AC requirement; sanity ceiling). - **No fabrication regression**: extract keypoints from the masked region of the Option B output and assert count == 0; for the Option C output, assert keypoint count is at most 5 % of the unmasked terrain keypoint count. --- ## 8. Open questions 1. **Adopt the one-line C3 wrapper change?** Section 4.1 Step 3 proposes forwarding `mask=…` through the existing DISK call. This is the lowest-risk highest-quality path (Option B) but requires touching `src/.../matchers/`. The fallback (Option C, chained `delogo`) avoids this code change entirely and only touches the fixture-prep script. **Either is defensible** — the choice depends on whether the team is willing to formalise mask-aware fixtures as a first-class concept in the replay layer (recommended yes) or wants to keep that layer "drop in an MP4" pure (defensible too). 2. **Use OCR for the frame filter, or just hand-label this one fixture?** For a single 6-minute clip, the manual labeling pass is cheaper than building, validating, and pinning a Tesseract/PaddleOCR pipeline. Use OCR only if you expect to ingest additional fixtures of the same class (same gimbal HUD layout) and want the script to amortise. Either way, the YAML output format is the same — so this can be revisited later. 3. **Future recordings — Option Z (direct RTSP/DCIM extraction with OSD off) is not available** to this project: no camera or GCS access exists for this data source. The only theoretical path to bypass the cleanup pipeline is to ask the original supplier to re-record with OSD disabled (via Topotek's GimbalControl utility on their side, or by setting `MNT1_OPTIONS` / equivalent on their flight controller). Whether to even make that request is a separate decision; it is not a technical option this project can execute on its own. Assume Option Z stays unavailable and plan all future fixtures of this class around the OCR / manual-labelling path in §4 + §5.3. --- ## 9. References L1 (official documentation / source): - FFmpeg `delogo` filter, vf_delogo.c [Sources #1, #2 in source registry] - FFmpeg `removelogo` filter, vf_removelogo.c [Source #3] - FFmpeg `tmedian` filter [Source #8] - ArduPilot Topotek Gimbal docs [Source #4] - LightGlue maintainer reply on score-map masking, issue cvg/LightGlue#97 [Source #5] - Kornia `DISK.forward()` documentation [Source #6] - DISK upstream source (`disk/model/disk.py`) [Source #7] - Project: `_docs/00_problem/input_data/flight_derkachi/README.md` [Source #9] - Project: `_docs/00_problem/input_data/flight_derkachi/camera_info.md` [Source #10] L2 (peer-reviewed): - ProPainter (ICCV 2023) [Source #11] - VideoPainter (arXiv 2503.05639, 2025) [Source #12] — referenced as disqualified - VidPivot / DiffuEraser comparison (arXiv 2510.21461, 2025) [Source #13] - DISK paper (NeurIPS 2020, arXiv 2006.13566) [Source #14] L3 (practitioner / community): - "Removing obnoxious logos from videos" blog [Source #15] - Conditional Temporal Median Filter reference [Source #16] - Foundry Nuke TemporalMedian reference [Source #17] In-repo cross-references: - `_docs/01_solution/solution_draft01.md` — existing solution; C2 (MixVPR TensorRT INT8+FP16), C3 (DISK + LightGlue), C5 (GTSAM iSAM2 + CombinedImuFactor) [Source #R1] - `_docs/00_research/06_component_fit_matrix/00_summary.md` — confirms no fixture-prep component exists in the runtime [Source #R2] - `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` — existing per-camera calibration JSON precedent [Source #R3] This-run new evidence: - ffprobe verification of `2026-05-09 16-10-54.mkv` technical metadata (Section 2.1) - `pymavlink` scan of unpacked `2026-05-09 16-09-54.tlog` (Section 5) — 133 191 messages over 446.8 s, GIMBAL_DEVICE_ATTITUDE_STATUS at 9.7 Hz, all identity quaternions --- ## 10. Related artifacts | Artifact | Status | |---|---| | `_docs/00_research/_mode_b_2026-05-29_video_extraction/` | Complete through Step 7.5 — this draft is its Step 8 deliverable, with one row updated by new tlog evidence | | `_docs/01_solution/solution_draft01.md` | Untouched. C1–C12 unchanged. This draft is purely additive. | | `_docs/01_solution/solution.md` | Untouched. | | `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv` | Source MKV. Untouched. | | `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip` | Source tlog archive. Untouched. | | Future: `input_data/flight_topotek_2026-05-09/` | The cleaned fixture directory this draft proposes producing. Not yet created. | | Future: `tools/fixture_prep/` | The reproducible script that will produce the above. Not yet created. |