Solution Draft 02 — Recovering a Clean Nadir Fixture from 2026-05-09 16-10-54.mkv
Mode: B (Solution Assessment), additive. This draft does not modify any runtime component in `_docs/01_solution/solution_draft01.md` (C1…C12). It adds a fixture-prep developer tool that converts an OSD-burned-in GCS screen recording into the `flight_derkachi.mp4`-shaped artifact consumed by `tests/e2e/replay/test_az835_e2e_real_flight.py`.

Run date: 2026-05-30. Continues the 2026-05-29 Mode B investigation (`_docs/00_research/_mode_b_2026-05-29_video_extraction/`), with one previously open "Needs user decision" row now resolved by a fresh tlog scan (Section 5 below).

Extraction executed on 2026-05-30. The primary path (§4.1 Steps 1 + 2) was run against this MKV; the resulting fixture is at `../../00_problem/input_data/flight_topotek_2026-05-09/` with its own short README. The non-nadir frame filter (§4.1 Steps 4–5) and the companion calibration / IMU files (§4.3) were intentionally NOT produced: they are downstream decisions, not part of "extract a clean video". The verified crop coordinates differ from the 2026-05-29 draft's PoC4 values (which assumed a smaller IR PIP); the current §4.1 numbers reflect what was actually used.

Backing artifacts (read these alongside this draft for full evidence):
- Question decomposition: `../../00_research/_mode_b_2026-05-29_video_extraction/00_question_decomposition.md`
- Source registry (17 L1/L2/L3 sources): `../../00_research/_mode_b_2026-05-29_video_extraction/01_source_registry.md`
- Fact cards (18 verified facts incl. local PoC results): `../../00_research/_mode_b_2026-05-29_video_extraction/02_fact_cards.md`
- Comparison framework: `../../00_research/_mode_b_2026-05-29_video_extraction/03_comparison_framework.md`
- Reasoning chain: `../../00_research/_mode_b_2026-05-29_video_extraction/04_reasoning_chain.md`
- Validation log: `../../00_research/_mode_b_2026-05-29_video_extraction/05_validation_log.md`
- Component fit matrix: `../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`
- Inputs: `../../00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv` and `../../00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip` (contains the paired `.tlog`)
1. TL;DR
Yes, a clean nadir replay fixture can be recovered, but the answer has two parts that must both be done; doing only one will produce a fixture that quietly misleads the runtime pipeline.
| Concern | Recommended primary | Cheap fallback (zero replay-code changes) |
|---|---|---|
| Strip the GCS UI chrome (sidebars / minimap / IR-PIP) | `ffmpeg` `crop` (deterministic, verbatim pixels) | (same as primary) |
| Handle the gimbal's burned-in OSD (attitude ladder, crosshair, FOV brackets, status text) | Inject a static `osd_mask.png` into the existing C3 `kornia.feature.DISK.forward(img, mask=…)` call. Zero pixel modification, zero fabrication risk. | `ffmpeg` `crop` + `delogo` chain (interpolates from neighbor pixels, so it is non-generative; locally verified working as `poc4_delogo.mp4` on the prior run) |
| Filter out frames where the gimbal is not nadir | OCR the burned-in pitch text (Option J) — the previously preferred telemetry path is dead for this recording (see Section 5). | Manual labeling pass: ship a small frame_ranges.yaml of nadir vs non-nadir segments alongside the MP4. ~30 min of human labour for a 6-minute clip. |
Disqualified:
- Generative video inpainters (VideoPainter / DiffuEraser / VidPivot et al.): they fabricate terrain, which corrupts VPR/matching evaluation and violates `meta-rule.mdc` "Real Results, Not Simulated Ones".
- FFmpeg `tmedian` (temporal median): both the OSD text and the underlying scene change every frame, so the median is smeared in both regions (locally verified as `poc3_crop_tmedian.mp4` on the prior run).
Not available to this project — the source MKV is what we have; there is no access to the camera, the GCS host, or the upstream RTSP. The ideal-but-out-of-reach path would have been to pull RTSP directly from the gimbal (rtsp://192.168.144.108:554/stream=0 for the Topotek / Viewpro multi-sensor ball class) or extract DCIM with OSD off via the GimbalControl Ethernet utility (ArduPilot Source #4). That path is documented here only for completeness (and because the flight_derkachi.mp4 fixture was produced that way, which is why it is already clean); it is not actionable for this data source. The only thing that could replace the cleanup pipeline is the original supplier voluntarily re-recording with OSD off — which is outside this project's control.
2. What is in 2026-05-09 16-10-54.mkv
2.1 Technical metadata (verified via ffprobe)
| Field | Value |
|---|---|
| Container | Matroska (.mkv) |
| Video codec | H.264 |
| Resolution | 1280 × 720 |
| Frame rate | 30/1 fps |
| Duration | 367.00 s (~6 m 7 s) |
| File size | 115 044 545 bytes (~110 MB) |
| Audio | AAC (discard at re-encode time with -an) |
| Bitrate | ~2.5 Mbit/s |
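The table can be re-derived from `ffprobe`'s JSON output. Below is a minimal sketch. `probe` and `summarize` are hypothetical helper names, not an existing project tool, and `ffprobe` is assumed to be on `PATH`. The bitrate it computes is the whole-file average (video, audio and container overhead together), which is what the ~2.5 Mbit/s row corresponds to: 115 044 545 bytes × 8 / 367 s ≈ 2.51 Mbit/s.

```python
import json
import subprocess
from fractions import Fraction


def probe(path: str) -> dict:
    """Run ffprobe on `path` and return its parsed JSON (format + streams)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def summarize(info: dict) -> dict:
    """Reduce ffprobe JSON to the fields listed in the metadata table."""
    streams = info["streams"]
    video = next(s for s in streams if s["codec_type"] == "video")
    fps = float(Fraction(video["r_frame_rate"]))  # e.g. "30/1" -> 30.0
    duration_s = float(info["format"]["duration"])
    size_bytes = int(info["format"]["size"])
    return {
        "codec": video["codec_name"],
        "resolution": (int(video["width"]), int(video["height"])),
        "fps": fps,
        "duration_s": duration_s,
        "approx_frames": round(duration_s * fps),
        # Whole-file average: video + audio + container overhead.
        "avg_bitrate_mbit_s": size_bytes * 8 / duration_s / 1e6,
        "has_audio": any(s["codec_type"] == "audio" for s in streams),
    }
```

Call it as `summarize(probe(INPUT))` on the MKV. At 30 fps over 367 s, `approx_frames` comes to about 11 010. That is the index space the `between(n, …)` ranges in §4.1 Step 5 refer to.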
2.2 Three overlay layers (Fact #1 — direct 12-frame variance analysis on the prior run)
```text
+-----------------------------------------------------------------------+
| GCS chrome top bar (status, mode, GPS, alt) |
+------------+----------------------------------------+-----------------+
| | TOP-LEFT HUD (burned by camera) | IR PIP |
| SL STATS | · timer 00:04:24 | (live IR/thermal|
| (live | · EO/IR zoom, FOV 53.2 ° | stream stamped |
| sidebar | · target lat/lon | by the gimbal) |
| values) | | |
| | [actual EO video region] | |
| | crosshair, attitude ladder | |
| | FOV brackets, +/-3.7 ° pitch text | |
| | +-----------------+
| | BOTTOM-RIGHT GIMBAL TEXT | ROLL / SPEED / |
| | · 50.0823, 36.2515 | DIST / BATT / |
| | · azimuth, elevation | CURRENT (live) |
+------------+----------------------------------------+-----------------+
| Minimap / bottom status bar |
+-----------------------------------------------------------------------+
```
The three layers map to two removal classes (per Reasoning Chain Dimension 1):
| Layer | Renderer | Removal class |
|---|---|---|
| GCS UI chrome (sidebars, minimap, status bars) | the GCS application, after the video stream arrives | Pure crop — discard the columns and rows around the EO region; pixels are outside the camera's video, no inpainting needed. |
| Burned-in gimbal OSD (attitude ladder, crosshair, FOV brackets, top-left HUD, bottom-right text) | the camera itself, before the recorder ever saw the stream | Mask or inpaint — these pixels overwrite real EO pixels; you must either tell the downstream not to look at them (mask) or fill them with something visually plausible (inpaint). |
| IR PIP (upper-right rectangle, ~520×350 px as re-measured on 2026-05-30; the prior run's estimate was ~360×210 px) | the camera (it stamps its IR channel into a corner of the EO output) | Crop it out geometrically: the rectangle is far too large for `delogo`'s interpolation to work, so keep the crop tight enough to exclude it. |
2.3 The aircraft / gimbal class (corroborated by the tlog scan in Section 5)
- Airframe: multirotor, ArduCopter 4.6.3 on Pixhawk6X, QUAD/X frame (`STATUSTEXT: 'Frame: QUAD/X'`). Per `restrictions.md`, the project's spec'd nav-camera is a fixed-downward APS-C sensor on a fixed-wing airframe, so this MKV represents a different aircraft class than the primary runtime target. That is a feature (it extends test coverage), not a bug, but the fixture's metadata must record the discrepancy.
- Gimbal: 3-axis stabilised, pitch range −90° to +20°, yaw range ±180°, roll range ±30° (per the tlog's `GIMBAL_MANAGER_INFORMATION` capability advertisement; see Section 5). Consistent with the Topotek / Viewpro multi-sensor ball family identified by the prior run's visual inspection.
- GCS: Mission Planner 1.3.83 (per `STATUSTEXT`). The project's `restrictions.md` mandates QGroundControl as the production GCS. For this fixture, the GCS is just whatever was used to make the recording and is not a runtime concern.
3. Where this fits in the existing solution (and where it does not)
3.1 The gap
_docs/01_solution/solution_draft01.md defines components C1 (VIO), C2 (VPR), C3 (matchers), C4 (PnP), C5 (state estimator), C6 (tile cache), C7 (inference runtime), C8 (FC adapter), C10 (provisioning) and more — all runtime concerns on the Jetson Orin Nano Super. None of them is a "data ingestion / fixture-prep" component. Replay fixtures appear in tests/e2e/replay/ as already-cleaned MP4s (Fact #18).
This is fine for flight_derkachi.mp4, which arrived pre-cleaned because the operator (a) disabled the gimbal OSD via Topotek's GimbalControl utility, (b) mechanically locked the gimbal nadir, and (c) recorded the direct camera feed at 1080p before cropping to 880×720 (Reasoning Chain Dimension 7).
2026-05-09 16-10-54.mkv arrived from the opposite situation: OSD on, gimbal unconstrained, GCS-screen-recorded. There is no existing project tool to turn this class of input into a usable fixture, which is why the question came up.
3.2 What this draft adds
A new fixture-prep developer tool (location: tools/fixture_prep/ or tests/fixtures/<flight_id>/build.py, per existing project layout conventions) that converts one GCS-screen-recorded .mkv (plus its paired .tlog) into a directory of files in the same shape as _docs/00_problem/input_data/flight_derkachi/:
```text
input_data/flight_topotek_2026-05-09/
├── flight_topotek_2026-05-09.mp4 # cleaned, cropped, OSD handled (Section 4)
├── osd_mask.png # 1-channel mask used by Option B (Section 4.1)
├── 2026-05-09 16-09-54.tlog # unpacked from the supplied .zip
├── data_imu.csv # SCALED_IMU2 + GLOBAL_POSITION_INT export
├── frame_ranges.yaml # nadir vs non-nadir frame ranges (Section 4.3)
├── camera_info.md # camera class + calibration provenance
└── topotek_gimbal_factory.json # calibration JSON, factory-sheet provenance
```
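The tool is offline-only, deterministic, versioned, and reproducible: re-running it on the same input should produce byte-identical outputs. The only potential non-determinism is in the libx264 encode and the container metadata. libx264 is repeatable for a given build, settings and thread count (its `--non-deterministic` option is off by default). To keep runs identical, pin the FFmpeg/x264 versions, pass an explicit `-threads` value, and add `-fflags +bitexact` to keep build- and time-dependent fields out of the container. Options such as `-preset placebo`, `-tune zerolatency` or `-x264-params bframes=0:scenecut=0` change encode speed and stream structure, but they are not what makes the output repeatable.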
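Checking whether two runs really are byte-identical is cheap, so it is better to test it than to assume it. The sketch below is hypothetical: `build_crop_cmd` and `sha256_of` are illustrative helpers, not existing project code. It assembles the Step 1 encode with a fixed `-threads` count and FFmpeg's `-fflags +bitexact` / `-flags:v +bitexact` options, which ask FFmpeg to leave build- and time-dependent data out of the output. Treat these flags as an assumption to confirm on the team's pinned FFmpeg/x264 build, not as a guarantee.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def build_crop_cmd(src: Path, dst: Path, crop: str = "610:260:250:440",
                   threads: int = 1) -> list[str]:
    """Step 1 crop encode with options intended to make reruns repeatable."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", f"crop={crop}", "-an",
        "-c:v", "libx264", "-crf", "18", "-preset", "medium",
        "-threads", str(threads),   # x264 output can vary with thread count
        "-map_metadata", "-1",      # do not copy input container metadata
        "-fflags", "+bitexact",     # muxer: omit build/time-dependent fields
        "-flags:v", "+bitexact",    # same request on the encoder side
        str(dst),
    ]


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

To use it, run the command twice (for example via `subprocess.run(cmd, check=True)`) to two output paths and compare `sha256_of(a) == sha256_of(b)`. A mismatch means some option still lets non-repeatable data through.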
It does not change any runtime component. C1…C12 are untouched. The single optional change in the test layer is to teach tests/e2e/replay/conftest.py::_calibration_path() (or its sibling helpers) to also look for a companion osd_mask.png if Option B is selected — and to forward it as an extra kwarg to whatever wraps DISK.forward() inside C3 (see Section 4.1 for the exact one-line change required).
4. The recommended pipeline (and the cheap fallback)
4.1 PRIMARY — A + B + J: crop, mask-aware DISK, OCR pitch filter
Step 1 — Geometric crop (FFmpeg): discard the GCS chrome and the IR PIP rectangle.
```bash
INPUT="_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv"
OUTPUT_DIR="_docs/00_problem/input_data/flight_topotek_2026-05-09"
mkdir -p "$OUTPUT_DIR"
# Crop coordinates *verified for this specific MKV* by direct frame inspection +
# luminance/saturation discontinuity detection on 2026-05-30 (see fixture README).
# Output: 610x260 EO-only region anchored at (250, 440) in the 1280x720 source.
#
# Why these numbers and not the prior research's draft (crop=900:445:50:25):
# - The IR PIP is much larger than initially estimated: it spans roughly
#   x=620..1140, y=35..383 in the source frame. The prior crop's right edge at
#   x=950 cut into the PIP and the left edge at x=50 still included the
#   GCS left icon strip + the SL STATS panel.
# - The IR PIP rectangle (~520 wide x ~350 tall) is too large for FFmpeg
#   `delogo` to interpolate cleanly. Geometric exclusion is the only honest
#   option for this recording.
# - The largest *clean* EO rectangle (no GCS chrome, no IR PIP, almost no
#   burned-in OSD) is in the lower half of the frame, below the IR PIP.
#
# Re-verify if you ingest a future recording with a different GCS layout or
# IR-PIP placement; see fixture README for the derivation script.
ffmpeg -y -i "$INPUT" \
  -vf "crop=610:260:250:440" \
  -an -c:v libx264 -crf 18 -preset medium \
  "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4"
```
This is Option A: verbatim pixels, no inpainting, no fabrication, deterministic. On this specific MKV the crop is tight enough that essentially no burned-in gimbal OSD survives inside the output (verified on 8 sample frames spread across the recording — variance analysis flagged 1/158 600 = 0.0006 % of pixels as "static OSD-like"). The remaining steps 2–5 below are still relevant for other recordings of this class that may need a looser crop.
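The geometric-exclusion argument comes down to a rectangle-overlap check in source-frame coordinates. Below is a minimal sketch. It takes the approximate IR PIP bounds quoted in the Step 1 comments (x=620..1140, y=35..383) as an assumption, and `Rect`, `crop_rect`, `overlaps` and `inside` are illustrative names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in source pixels, half-open: [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def crop_rect(w: int, h: int, x: int, y: int) -> Rect:
    """Rectangle selected by FFmpeg crop=w:h:x:y (x, y = top-left corner)."""
    return Rect(x, y, x + w, y + h)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two half-open rectangles share at least one pixel."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def inside(inner: Rect, outer: Rect) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.x0 <= inner.x0 and inner.x1 <= outer.x1
            and outer.y0 <= inner.y0 and inner.y1 <= outer.y1)


SOURCE = Rect(0, 0, 1280, 720)
IR_PIP = Rect(620, 35, 1140, 383)  # approximate, per the Step 1 comments
```

With these numbers, `crop_rect(610, 260, 250, 440)` is `Rect(250, 440, 860, 700)` and does not overlap `IR_PIP`, because it starts at y=440, below the PIP's y=383. The prior `crop_rect(900, 445, 50, 25)` does overlap it (x 620..950, y 35..383). The check says nothing about the GCS sidebars, which still need the visual inspection described above.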
Step 2 — Build the OSD mask (one-time, then versioned in the repo).
Build a 1-channel PNG with the same dimensions as the cropped output (610×260 for the Step 1 crop; recompute for a looser crop). White (255) marks "real EO pixels: DISK is allowed to detect keypoints here" and black (0) marks "burned-in OSD pixels: DISK must suppress detection here". One quick recipe:
```python
# tools/fixture_prep/build_osd_mask.py
from pathlib import Path

import cv2
import numpy as np

# Manual alternative: open a sample cropped frame in any image editor, trace the
# OSD rectangles by hand, and export a grayscale PNG the same size as the cropped
# frame. The script below is the deterministic alternative: it builds the mask
# from a pixel-stability test over a sample of frames.
src = Path("input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4")
cap = cv2.VideoCapture(str(src))
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
sample = [int(n_frames * f) for f in (0.05, 0.15, 0.30, 0.45, 0.60, 0.75, 0.95)]
stack = []
for idx in sample:
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError(f"frame {idx} unreadable")
    stack.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
cap.release()
stack = np.stack(stack)     # (N, H, W)
std = stack.std(axis=0)     # (H, W)
mean = stack.mean(axis=0)   # (H, W)
# OSD heuristic: text/lines render as high-brightness, low-std (the *position* is
# stable even if the *value* in that position changes; the bounding box does not move).
# Real EO terrain over a moving camera is mid-brightness, high-std.
osd_likely = (std < 12.0) & (mean > 180.0)  # white/bright pixels stable in position
osd_likely = cv2.dilate(osd_likely.astype(np.uint8) * 255, np.ones((7, 7), np.uint8))
mask = 255 - osd_likely  # invert: white (255) = keep, black (0) = suppress
cv2.imwrite("input_data/flight_topotek_2026-05-09/osd_mask.png", mask)
```
The std/mean thresholds above are the right shape but should be tuned by eye on this specific recording — the prior research's variance analysis showed mean per-pixel std ≈ 30–40 for both the EO region and the GCS sidebars, so a < 12 threshold cleanly separates burned-in OSD (which has near-zero std in pixels that contain text strokes) from real video. Inspect the saved osd_mask.png against a sample frame and refine the thresholds (or hand-trace) before committing it.
Step 3 — One-line wrapper change in C3 to forward the mask (the only code change this draft proposes).
kornia.feature.DISK.forward(img, mask=None) already accepts a mask argument of shape (B, 1, H, W) with values in [0, 1], and multiplies the score map by it before NMS — keypoints in masked regions are suppressed by construction, with no preprocessing of the pixels themselves (Fact #13, Source #6 Kornia docs L1). The LightGlue maintainer (cvg/LightGlue#97) explicitly recommends this approach over post-hoc keypoint filtering.
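To make the "multiply the score map before NMS" idea concrete, here is a NumPy illustration of the principle. It is not Kornia's implementation. `masked_topk_keypoints` is a hypothetical helper, the window-maximum NMS is a simplified stand-in for DISK's own, and scores are assumed non-negative so that a zeroed score can never be selected.

```python
import numpy as np


def masked_topk_keypoints(score: np.ndarray, mask: np.ndarray, n: int,
                          window: int = 5) -> np.ndarray:
    """Pick up to n (row, col) peaks from score * mask with window-max NMS.

    score: (H, W) non-negative detector heatmap.
    mask:  (H, W) weights in [0, 1]; 0 marks burned-in OSD pixels.
    """
    if score.shape != mask.shape:
        raise ValueError(f"shape mismatch: {score.shape} vs {mask.shape}")
    if window % 2 == 0:
        raise ValueError("window must be odd")
    masked = score.astype(np.float64) * mask.astype(np.float64)
    r = window // 2
    padded = np.pad(masked, r, mode="constant", constant_values=-np.inf)
    # (H, W, window, window) view of each pixel's neighbourhood; NumPy >= 1.20.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_max = windows.max(axis=(-2, -1))
    # Candidate = maximum of its own window AND not zeroed by the mask.
    is_peak = (masked == local_max) & (masked > 0)
    rows, cols = np.nonzero(is_peak)
    order = np.argsort(-masked[rows, cols], kind="stable")[:n]
    return np.stack([rows[order], cols[order]], axis=1)
```

Because suppression happens before selection, raising `n` can never bring back a peak inside a masked region. The DISK mask-aware contract test in Section 7 asserts this property against the real wrapper.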
Locate the project's existing kornia.feature.DISK(...) instantiation and the call site that invokes it (per solution_draft01.md C3 the detector is DISK + LightGlue; the call site is somewhere under src/.../matchers/ or the runtime DISK wrapper). Pass mask=<tensor> through, where <tensor> is loaded once at fixture-init time from osd_mask.png and re-used per frame.
Sketch (project-specific paths to be filled in):
```python
# Existing
feats = self.disk(img, n=self.n_kp)
# Becomes
feats = self.disk(img, n=self.n_kp, mask=self.osd_mask)
```
`self.osd_mask` is loaded once in `__init__`. The wrapper decodes `fixture_dir / "osd_mask.png"` as an 8-bit grayscale image, scales it to float32 in [0, 1], and reshapes it to `(1, 1, H, W)`. If the fixture has no `osd_mask.png`, the wrapper falls through to the original mask-less call, so the existing `flight_derkachi.mp4` continues to work unchanged.
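Below is a minimal loader for that mask, sketched under several assumptions. `mask_png_to_weights` and `load_osd_mask` are hypothetical names. OpenCV does the decoding because Step 2 already depends on it. The result is a NumPy array that the wrapper would convert with `torch.from_numpy(...)` before forwarding it. Returning `None` when the file is absent keeps the mask-less call path as the default.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Tuple

import numpy as np


def mask_png_to_weights(mask_u8: np.ndarray, expected_hw: Tuple[int, int]) -> np.ndarray:
    """Convert a decoded 8-bit mask (255 = keep, 0 = suppress) to (1, 1, H, W) float32 in [0, 1]."""
    if mask_u8.ndim != 2 or mask_u8.dtype != np.uint8:
        raise ValueError("expected a single-channel uint8 mask")
    if mask_u8.shape != tuple(expected_hw):
        raise ValueError(f"mask is {mask_u8.shape}, frames are {tuple(expected_hw)}")
    return (mask_u8.astype(np.float32) / 255.0)[None, None]


def load_osd_mask(fixture_dir: Path, expected_hw: Tuple[int, int]) -> Optional[np.ndarray]:
    """Return mask weights for this fixture, or None if it ships no osd_mask.png."""
    path = fixture_dir / "osd_mask.png"
    if not path.exists():
        return None  # e.g. flight_derkachi: fall through to the mask-less call
    import cv2  # lazy import: fixtures without a mask never need OpenCV here

    mask_u8 = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if mask_u8 is None:
        raise RuntimeError(f"could not decode {path}")
    return mask_png_to_weights(mask_u8, expected_hw)
```

`expected_hw` is `(rows, columns)`, so for the §4.1 Step 1 crop it would be `(260, 610)`. The shape check catches a mask left over from an older, differently sized crop.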
Step 4 — Frame-level filter (Option J: OCR pitch from burned-in attitude indicator).
The previously-preferred telemetry path (Option I — parse MOUNT_STATUS / GIMBAL_DEVICE_ATTITUDE_STATUS from the paired .tlog) is not viable for this recording. See Section 5 for the evidence. The remaining viable paths are:
- (J) OCR the burned-in pitch number. The gimbal renders pitch as text such as `-3.7°` in the attitude indicator. Run Tesseract or PaddleOCR per frame on a fixed crop around that text region, then drop frames where |pitch − (−90°)| > 10°. Quick recipe (the project must add `pytesseract` or `paddleocr` to `requirements-dev.txt`):

  ```python
  # tools/fixture_prep/frame_pitch_from_ocr.py
  import json
  import re

  import cv2
  import pytesseract

  PITCH_RE = re.compile(r"(-?\d+\.\d+)")  # e.g. "-3.7" in the OCR output
  SRC = "input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4"
  DST = "input_data/flight_topotek_2026-05-09/frame_pitch.json"
  # Pitch-text ROI in cropped-frame pixels: placeholders, measure on a sample frame.
  Y0, Y1, X0, X1 = 0, 0, 0, 0
  if Y1 <= Y0 or X1 <= X0:
      raise SystemExit("set the pitch-text ROI (Y0, Y1, X0, X1) before running")

  cap = cv2.VideoCapture(SRC)
  out = []
  frame_idx = 0
  while True:
      ok, frame = cap.read()
      if not ok:
          break
      roi = frame[Y0:Y1, X0:X1]
      text = pytesseract.image_to_string(
          roi, config="--psm 7 -c tessedit_char_whitelist=-0123456789.")
      m = PITCH_RE.search(text)
      out.append({"frame": frame_idx, "pitch_deg": float(m.group(1)) if m else None})
      frame_idx += 1
  cap.release()
  with open(DST, "w") as f:
      json.dump(out, f)
  ```

  Then derive `frame_ranges.yaml` from `frame_pitch.json` by clustering contiguous frame indices whose pitch is within the nadir band.
- (Manual) For a one-off 6-minute fixture, the cheapest deterministic alternative is a manual labeling pass. A developer watches the cropped video once, notes the time ranges where the gimbal is at nadir (`[0:42, 1:53]`, `[3:58, 6:06]`, etc.), and saves them as `frame_ranges.yaml`. It costs ~30 minutes of human labour, has essentially no failure modes, and is fully reproducible by anyone who can re-watch the same MP4. This is the recommended path for this specific fixture unless additional GCS-screen-recorded fixtures are expected, in which case the OCR script amortises across them.
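The "cluster contiguous frame indices" step is small enough to sketch. All assumptions here are labelled. `nadir_ranges` and `ranges_to_yaml` are hypothetical helpers. A frame whose OCR returned `None` ends the current range, which is the conservative choice. The ±10° band around −90° follows the rule above. The `nadir_ranges:` YAML key is an illustrative schema, since the exact layout of `frame_ranges.yaml` is not pinned down here.

```python
from __future__ import annotations

from typing import Iterable, List, Optional, Tuple


def nadir_ranges(records: Iterable[dict], target_deg: float = -90.0,
                 tol_deg: float = 10.0, min_len: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive frames with |pitch - target| <= tol into inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for rec in records:
        idx, pitch = rec["frame"], rec["pitch_deg"]
        is_nadir = pitch is not None and abs(pitch - target_deg) <= tol_deg
        if is_nadir and start is not None and idx == prev + 1:
            prev = idx  # extend the current run
            continue
        # Any open run ends here (non-nadir frame, OCR miss, or index gap).
        if start is not None and prev - start + 1 >= min_len:
            ranges.append((start, prev))
        start = prev = idx if is_nadir else None
    if start is not None and prev - start + 1 >= min_len:
        ranges.append((start, prev))
    return ranges


def ranges_to_yaml(ranges: List[Tuple[int, int]]) -> str:
    """Serialize ranges without a YAML dependency (illustrative schema)."""
    lines = ["nadir_ranges:"] + [f"  - [{s}, {e}]" for s, e in ranges]
    return "\n".join(lines) + "\n"
```

`min_len` can drop short flickers through the band. At 30 fps, `min_len=30` would discard any nadir run shorter than one second.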
Step 5 — Filter the cropped video down to nadir-only frames (using frame_ranges.yaml from Step 4).
```bash
# Re-encode with a select filter restricting to the nadir frame ranges.
# Build the select expression programmatically from frame_ranges.yaml.
ffmpeg -y -i "$OUTPUT_DIR/flight_topotek_2026-05-09_cropped.mp4" \
  -vf "select='between(n,1260,3390)+between(n,7140,10980)',setpts=N/FRAME_RATE/TB" \
  -an -c:v libx264 -crf 18 -preset slow \
  "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4"
```
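(Numbers above are illustrative; the actual `between(n, …)` segments come from the YAML. Steps 2, 4 and 5 read `flight_topotek_2026-05-09_cropped.mp4`, so when they are used, write the Step 1 output under that name rather than the final fixture name.)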
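Building the `select` expression from the ranges is a one-liner. The sketch below uses `select_filter` as a hypothetical helper name. It takes the ranges as inclusive `(start, end)` frame indices, which matches the inclusive semantics of FFmpeg's `between(x, min, max)`.

```python
from typing import Sequence, Tuple


def select_filter(ranges: Sequence[Tuple[int, int]]) -> str:
    """Return a -vf value keeping only frames inside the inclusive ranges.

    The single quotes are for FFmpeg's filtergraph parser: they stop the commas
    inside between(...) from being read as filter separators.
    """
    if not ranges:
        raise ValueError("no nadir ranges: the output would be empty")
    terms = "+".join(f"between(n,{start},{end})" for start, end in ranges)
    # setpts re-times the kept frames so the output plays back without gaps.
    return f"select='{terms}',setpts=N/FRAME_RATE/TB"
```

If the command is launched from Python as an argument list (`["ffmpeg", ..., "-vf", select_filter(ranges), ...]`), no shell quoting is involved and the string can be passed as-is. Because `setpts` renumbers the kept frames, output frame indices no longer line up with the source MKV's timeline.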
4.2 FALLBACK — A + C + (J or manual): no code change to C3
If teaching the C3 wrapper to forward mask=… is rejected for this fixture (a reasonable choice to keep tests/e2e/replay/ purely "drop in an MP4 + a JSON + a CSV" with zero glue code), substitute Option C for Option B: replace each burned-in OSD rectangle with FFmpeg delogo interpolation.
On this specific MKV, Option C collapses into Option A. The verified crop in §4.1 Step 1 already produces a near-zero-OSD output (0.0006 % of pixels flagged as static-OSD-like over 8 sample frames), so there are no rectangles left to delogo. The Option-C-versus-Option-B trade-off only re-emerges for hypothetical other recordings of this class that need a looser crop — e.g. a recording where the camera HUD is positioned differently and there is no clean rectangle wholly outside it. The generic recipe shape for such a recording would be:
```bash
# Template only: instantiate W/H/X/Y for the looser crop and (x,y,w,h)
# rectangles for each surviving OSD region, all in cropped-frame coords.
# FIXTURE_ID is a placeholder too and must be set before running.
# The delogo filter in FFmpeg 8.1 has no 'band' parameter (removed); only x, y,
# w, h, show remain. Rectangles must NOT touch the cropped frame's edge.
ffmpeg -y -i "$INPUT" \
-vf "crop=W:H:X:Y,\
delogo=x=x1:y=y1:w=w1:h=h1,\
delogo=x=x2:y=y2:w=w2:h=h2,\
..." \
-an -c:v libx264 -crf 18 -preset medium \
"$OUTPUT_DIR/${FIXTURE_ID}_delogo.mp4"
```
Important caveats on Option C (when it does need to be used):
- `delogo` rectangles must not touch the image edge, because there are no surrounding pixels to interpolate from.
- `delogo` produces new pixels, interpolated from the immediate neighbourhood. They are not synthesised semantic terrain content, but they did not exist in the original camera capture. The downstream feature detector can fire on smooth interpolated regions (DISK keypoints sometimes detect on smooth gradient transitions). This is the residual risk of Option C versus Option B. Quantify it by running both pipelines on a few nadir segments and comparing the keypoint density inside the masked regions of the Option C output against zero, the trivially correct value Option B delivers.
- `delogo` does not scale to rectangles much larger than ~50 px in their shorter dimension. For this MKV the IR PIP is ~520 × 350 px and cannot be cleanly delogo'd at all. Geometric exclusion (i.e. the corrected crop in §4.1 Step 1) is the only honest option.
- Then chain Step 4 + Step 5 from Section 4.1 on top of this Option C output to get the same nadir-only result.
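The residual-risk comparison needs a number. One hedged way to get it is to count keypoints that land on suppressed (black) mask pixels and normalise by the suppressed area. `keypoints_in_suppressed_region` is a hypothetical helper. It assumes keypoints are `(x, y)` pixel coordinates in the cropped frame and that the mask is the Step 2 PNG decoded as uint8. Under Option B the count is zero by construction; under Option C it measures how often DISK fires on `delogo`-interpolated pixels.

```python
from typing import Tuple

import numpy as np


def keypoints_in_suppressed_region(kps_xy: np.ndarray, mask_u8: np.ndarray) -> Tuple[int, float]:
    """Return (count, count per 10 000 suppressed pixels) for keypoints on mask == 0.

    kps_xy:  (N, 2) array of (x, y) keypoint coordinates in cropped-frame pixels.
    mask_u8: (H, W) uint8 mask, 255 = keep, 0 = suppress.
    """
    h, w = mask_u8.shape
    # Round sub-pixel keypoints to the nearest pixel and keep them in bounds.
    xs = np.clip(np.rint(kps_xy[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(kps_xy[:, 1]).astype(int), 0, h - 1)
    count = int((mask_u8[ys, xs] == 0).sum())
    area = int((mask_u8 == 0).sum())
    density = count * 10_000 / area if area else 0.0
    return count, density
```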
4.3 Companion files
| File | Source | Conventions |
|---|---|---|
| `flight_topotek_2026-05-09.mp4` | Sections 4.1 / 4.2 | H.264, 30 fps, exactly the cropped + OSD-handled + nadir-filtered video. Matches the `flight_derkachi.mp4` shape (any sub-1080p H.264 MP4 the replay harness already accepts). |
| `osd_mask.png` | Section 4.1 Step 2 (only for Option B) | Grayscale PNG with the same dimensions as the cropped MP4 (610×260 for the §4.1 Step 1 crop); white = keep, black = suppress. Versioned alongside the MP4. |
| `2026-05-09 16-09-54.tlog` | Unzip the `.zip` from `input_data/10.05.2026/` | Identical to the supplied tlog; ArduCopter 4.6.3 (Pixhawk6X), 133 191 messages over 446.8 s. |
| `data_imu.csv` | Reuse the existing `derkachi.tlog → data_imu.csv` exporter, retargeted at this new tlog | 10 Hz table of `SCALED_IMU2` and `GLOBAL_POSITION_INT` per the `flight_derkachi/README.md` convention. |
| `frame_ranges.yaml` | Section 4.1 Step 4 | List of `(start_frame, end_frame)` pairs the fixture considers "valid nadir frames". |
| `camera_info.md` | Hand-written, modelled on `flight_derkachi/camera_info.md` | Records the camera class (Topotek / Viewpro 3-axis multi-sensor ball, per ArduPilot Source #4 + the tlog's `GIMBAL_MANAGER_INFORMATION` `cap_flags`), the recording chain (camera HDMI → GCS app → desktop screen recorder → MKV), and the calibration's provenance flag (`factory_sheet`, per AZ-702 precedent; Fact #17). |
| `topotek_gimbal_factory.json` | Same shape as `khp20s30_factory.json` (Fact #17) | Per-camera intrinsics + lens distortion from the camera's published spec sheet. Mark provenance `factory_sheet`. Residual focal-length error expected in the 1–3 % band, the same envelope the project already accepts for `flight_derkachi.mp4`. |
5. New evidence — the paired tlog's gimbal state
The prior 2026-05-29 run left exactly one unresolved row in 06_component_fit_matrix.md:
> Frame filtering by gimbal pointing (PRIMARY): `pymavlink` parser of the paired `.tlog` for `MOUNT_STATUS` / `MOUNT_ORIENTATION` → Needs user decision: depends on whether the paired tlog actually emits `MOUNT_STATUS` for the camera in question; if the gimbal does not report attitude over MAVLink, this option fails.
This draft resolves that row by directly scanning the paired tlog. Here is the evidence.
5.1 What is in the tlog (2026-05-09 16-09-54.tlog, unpacked from the supplied .zip)
pymavlink 2.4.49 with MAVLINK20=1, MAVLINK_DIALECT=all. Scanned: 133 191 messages over 446.8 s, 46 distinct message types. Relevant subset:
| Message type | Count | Mean rate | Notes |
|---|---|---|---|
| `HEARTBEAT` | 1492 | 3.3 Hz | 4 endpoints: (sys=1, comp=1, autopilot=3, type=2) = ArduCopter / QUAD multirotor; (sys=1, comp=191, autopilot=0, type=6) = a GCS-class component co-resident on sysid 1; (sys=255, comp=0, autopilot=8, type=18) and (sys=255, comp=190, autopilot=8, type=6) = Mission Planner GCS. |
| `ATTITUDE` (vehicle) | 4174 | 9.3 Hz | Body pitch range: min −12.47°, max +4.68°, mean −3.95°. This is the airframe attitude, not the gimbal. |
| `GIMBAL_DEVICE_ATTITUDE_STATUS` | 4338 | 9.7 Hz | All 4338 messages carry the identity quaternion q = (1.0, 0.0, 0.0, 0.0) (exactly one distinct quaternion value across the entire flight). flags = 0x002c = `YAW_IN_VEHICLE_FRAME` \| `PITCH_LOCK` \| `ROLL_LOCK` (0x20 \| 0x08 \| 0x04). |
| `GIMBAL_MANAGER_INFORMATION` | 26 | discovery exchange | `gimbal_device_id=1`. Capability: pitch range [−90°, +20°], yaw range ±180°, roll range ±30°. `cap_flags=206847`. Confirms the gimbal can physically reach nadir; it just isn't reporting where it is. |
| `COMMAND_LONG` distinct cmds | n/a | n/a | Only 6 distinct command IDs: 183 (`DO_SET_SERVO` ch=15 pwm=1950, one shot, possibly a release / trigger), 400 (`COMPONENT_ARM_DISARM`, twice), 511 (`SET_MESSAGE_INTERVAL`, 11×), 512 (`REQUEST_MESSAGE`, 100×), 520 (`REQUEST_AUTOPILOT_CAPABILITIES`, 47×), and 42428 (vendor-specific, params all zero). None of these is a gimbal control command (no `MAV_CMD_DO_MOUNT_CONTROL` = 205, no `MAV_CMD_DO_GIMBAL_MANAGER_PITCHYAW` = 1000). |
| `NAMED_VALUE_FLOAT` names | 1 unique | n/a | Only `ESCs_CURR`. No gimbal-related custom variable. |
| `STATUSTEXT` | 38 | n/a | Includes 'ArduCopter 4.6.3 - Agile(px6) (92b0cd78)', 'Pixhawk6X 001E0036 …', 'Frame: QUAD/X', 'Mission Planner 1.3.83'. No gimbal-related text. |
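The "exactly one distinct quaternion" finding is easy to reproduce. The sketch below assumes pymavlink's `mavutil.mavlink_connection(...)` / `recv_match(...)` log-reading interface and the MAVLink `q` field of `GIMBAL_DEVICE_ATTITUDE_STATUS`, ordered (w, x, y, z). The helper names are hypothetical. The pure tally function is kept separate so it can be tested without a log file.

```python
from __future__ import annotations

import os
from collections import Counter
from typing import Iterable, Iterator, Sequence, Tuple

IDENTITY_Q = (1.0, 0.0, 0.0, 0.0)  # MAVLink null rotation, (w, x, y, z)


def summarize_gimbal_quaternions(qs: Iterable[Sequence[float]]) -> dict:
    """Count messages and distinct quaternions (components rounded to 1e-6)."""
    counts = Counter(tuple(round(c, 6) for c in q) for q in qs)
    total = sum(counts.values())
    return {
        "messages": total,
        "distinct": len(counts),
        "all_identity": total > 0 and set(counts) == {IDENTITY_Q},
    }


def read_gimbal_quaternions(tlog_path: str) -> Iterator[Tuple[float, ...]]:
    """Yield q from every GIMBAL_DEVICE_ATTITUDE_STATUS in a .tlog."""
    os.environ.setdefault("MAVLINK20", "1")  # message ids above 255 need MAVLink 2
    from pymavlink import mavutil  # must be imported after MAVLINK20 is set

    mlog = mavutil.mavlink_connection(tlog_path, dialect="all")
    while True:
        msg = mlog.recv_match(type="GIMBAL_DEVICE_ATTITUDE_STATUS")
        if msg is None:  # end of log
            break
        yield tuple(msg.q)
```

According to the scan above, `summarize_gimbal_quaternions(read_gimbal_quaternions("2026-05-09 16-09-54.tlog"))` should report 4338 messages, one distinct value, and `all_identity` true.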
5.2 What the identity quaternion really means
q = (1, 0, 0, 0) is the null rotation. Per the MAVLink GIMBAL_DEVICE_ATTITUDE_STATUS spec, that means "the gimbal is in its default forward-pointing pose" (no rotation away from the body frame's +X). But the prior run's frame-by-frame visual inspection saw the gimbal clearly pointing forward at t=30s and clearly pointing nadir at t=300s (Fact #5). The two observations are mutually exclusive: if the gimbal were truly at the null rotation throughout the flight, every frame would look like it does at t=30s (forward).
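Here is a worked check of what the identity quaternion encodes, as a sketch. `pitch_deg` and `about_pitch_axis` are hypothetical helpers. They use the standard aerospace (ZYX) extraction θ = asin(2(wy − xz)) on a (w, x, y, z) quaternion in a forward-right-down frame, where negative pitch points the camera down. The identity quaternion gives 0° (forward, as seen at t=30s); a camera looking straight down would instead report about −90°.

```python
import math
from typing import Sequence


def pitch_deg(q: Sequence[float]) -> float:
    """Pitch in degrees of a unit quaternion (w, x, y, z), ZYX convention."""
    w, x, y, z = q
    s = 2.0 * (w * y - x * z)
    s = max(-1.0, min(1.0, s))  # clamp rounding error before asin
    return math.degrees(math.asin(s))


def about_pitch_axis(angle_deg: float) -> tuple:
    """Quaternion for a pure rotation of angle_deg about the body y (right) axis."""
    half = math.radians(angle_deg) / 2.0
    return (math.cos(half), 0.0, math.sin(half), 0.0)
```

`pitch_deg((1.0, 0.0, 0.0, 0.0))` is `0.0`, while `pitch_deg(about_pitch_axis(-90.0))` is approximately `-90.0`. If the driver were reporting real angles, the t=300s nadir segment would show values near the latter.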
The reconciliation: the gimbal is being moved by the operator, but the actual angle is not being reported back over MAVLink in this recording. The gimbal driver is emitting the placeholder identity quaternion every ~100 ms because the ArduPilot mount driver expects to publish something at the configured rate, but no real angle is available (the gimbal device either isn't wired to talk back over MAVLink, or it is wired but isn't responding, or it is responding on a different transport — most likely the camera's own Ethernet protocol talking directly to the GCS, bypassing the autopilot).
This is consistent with:
- Mission Planner being able to control Topotek / Viewpro gimbals directly over Ethernet/UDP, separate from the ArduPilot MAVLink path.
- The `DO_SET_SERVO ch=15 pwm=1950` one-shot pointing to a trigger (likely shutter / record toggle), not a per-frame angle command.
- The absence of any gimbal-pointing command among the `COMMAND_LONG` commands: neither `MAV_CMD_DO_MOUNT_CONTROL` (205) nor `MAV_CMD_DO_GIMBAL_MANAGER_PITCHYAW` (1000) appears.
5.3 Effect on the recommendation
| Component-fit row (from the 2026-05-29 component fit matrix) | Original status | Status after Section 5 |
|---|---|---|
| Frame filtering by gimbal pointing (PRIMARY): `pymavlink` parser of `MOUNT_STATUS` / `MOUNT_ORIENTATION` from the paired tlog | Needs user decision | ❌ Rejected for this recording. The gimbal attitude message (`GIMBAL_DEVICE_ATTITUDE_STATUS`) IS present at 9.7 Hz, but every quaternion is the placeholder identity value; the data carries zero information about the actual gimbal angle. |
| Frame filtering by gimbal pointing (FALLBACK) — OCR on the burned-in pitch text | Experimental only | ✅ Selected as primary (or the manual labeling pass, for one-off fixtures of this size). |
No other row in 06_component_fit_matrix.md changes. The pixel-handling recommendation (Option B primary, Option C fallback) and the rejections (Options F generative / G temporal-median) stand.
5.4 Why this is not a project-runtime issue
The project's runtime nav-camera per restrictions.md is the ADTi 20MP fixed-downward (no gimbal at all). The runtime pipeline never sees a multi-sensor-ball gimbal-attitude stream. So the gap discovered here ("Mission-Planner-driven Topotek gimbals don't expose attitude over MAVLink") is only relevant for fixture preparation, not for the runtime contract. The follow-up "change the recording procedure to enable Topotek's own attitude-publish path" would require camera/GCS access this project does not have, so it is unavailable as a workaround. For any further recordings of this class, plan on OCR-based pitch recovery (Option J) — or a manual labelling pass per fixture — as the standing strategy, not as a temporary fallback.
6. Component fit summary (consolidated)
Full detail per row in `../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`. The table below is the post-tlog-scan update.
| Component area | Candidate | Pinned mode | Status | Notes |
|---|---|---|---|---|
| GCS-chrome geometric crop | FFmpeg `crop` filter | `crop=610:260:250:440` for this recording (verified 2026-05-30 by frame inspection + discontinuity detection; the prior draft's `900:445:50:25` cut into the IR PIP), re-derived per recording | Selected | Trivial, lossless within re-encode, deterministic. PoC1 produced playable output on the prior run. |
| OSD pixel handling (PRIMARY) | Kornia `DISK.forward(img, mask=…)` | mask-aware mode, `(B, 1, H, W)` mask multiplied into the DISK score map before NMS | Selected | No pixel modification; fabrication risk = 0. Requires a one-line C3 wrapper change to forward `mask=`. Already API-verified against Kornia docs L1 (Source #6) + LightGlue maintainer reply (Source #5). |
| OSD pixel handling (FALLBACK) | FFmpeg `delogo`, chained | multiple `delogo=x:y:w:h` after `crop`, rectangles inside the cropped frame | Selected (fallback) | PoC4 produced `poc4_delogo.mp4` on the prior run. Pick this if the C3 wrapper change is rejected for this fixture. |
| OSD pixel handling | FFmpeg `removelogo` PNG mask | `removelogo=mask.png` | Experimental only | Failed locally with `Invalid argument (-22)` on FFmpeg 8.1; works in older versions per Source #15. Try it first on your team's pinned FFmpeg before falling through to chained `delogo`. |
| OSD pixel handling | ProPainter (non-generative video inpainter, ICCV 2023) | mask-guided sparse Transformer with flow completion | Experimental only | Highest visual quality among non-generative options. Adds PyTorch+CUDA toolchain; ~0.25 s/frame at 480p (Fact #11). Use only if a future recording's masked regions are too large for delogo interpolation. |
| OSD pixel handling | VideoPainter / DiffuEraser / VidPivot (and any generative video inpainter) | diffusion-backbone I2V generative inpainter | ❌ Rejected | Synthesises terrain content. Disqualified by meta-rule.mdc "Real Results, Not Simulated Ones" (Fact #12). |
| OSD pixel handling | FFmpeg `tmedian` temporal median | `tmedian=radius=N` | ❌ Rejected | Burned-in OSD text values change every frame, so the static-OSD assumption underneath the technique fails. PoC3 confirmed: smeared, ghosted output (Fact #10). |
| Non-nadir frame filter (PRIMARY) | pymavlink `MOUNT_STATUS` / `GIMBAL_DEVICE_ATTITUDE_STATUS` | parse paired tlog → `frame_idx → gimbal_pitch_deg` table | ❌ Rejected for this recording (NEW) | Section 5: the message is present at 9.7 Hz, but all 4338 quaternions are identity (1,0,0,0), so there is no real angle data. |
| Non-nadir frame filter (PRIMARY, new) | OCR (Tesseract or PaddleOCR) on burned-in pitch text | per-frame OCR of the −3.7° text in the attitude indicator | Selected (NEW) | Was "Experimental only" pre-tlog-scan; promoted to primary now that the telemetry path is dead. Add `pytesseract` or `paddleocr` to `requirements-dev.txt`. |
| Non-nadir frame filter (one-off alternative) | Manual labeling pass | developer watches the 6-min clip, marks ranges, commits `frame_ranges.yaml` | Selected (for this fixture only) | Cheapest deterministic path; recommended for this specific MKV unless additional GCS-screen-recorded fixtures are expected. |
| Calibration JSON | Per-camera `topotek_gimbal_factory.json` (same shape as `khp20s30_factory.json`) | `"factory_sheet"` provenance per AZ-702 precedent | Selected | Project-accepted (Fact #17). Residual 1–3 % focal-length error envelope. |
| Companion telemetry CSV | Existing `derkachi.tlog → data_imu.csv` exporter, retargeted | unchanged | Selected | Reuses existing tool. |
| Source recovery (Option Z) | Pull RTSP / extract DCIM from gimbal with OSD disabled via Topotek GimbalControl utility (ArduPilot Source #4) | n/a — out-of-band camera access | ❌ Not available — no camera / GCS access for this data source. | Documented here only because it is the cleanest path in principle and because the existing flight_derkachi.mp4 fixture was produced this way. Not actionable for this project's data pipeline. The only way it returns to the table is if the original supplier voluntarily re-records with OSD off — outside this project's control. |
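The rejection of the telemetry row rests on one check: whether the gimbal quaternions carry any rotation at all. The following is a minimal sketch of that check, not the script behind Section 5. It assumes MAVLink's `[w, x, y, z]` ordering for `GIMBAL_DEVICE_ATTITUDE_STATUS.q` and a ZYX (yaw-pitch-roll) Euler convention for pitch. The helper names `quaternion_pitch_deg`, `is_identity`, `summarise_gimbal_attitude` and `load_gimbal_samples` are hypothetical. The `pymavlink` import is deferred so the pure functions run without it.

```python
import math
from typing import Iterable, Sequence, Tuple

# MAVLink GIMBAL_DEVICE_ATTITUDE_STATUS.q is ordered [w, x, y, z]; (1, 0, 0, 0) is the null rotation.
IDENTITY = (1.0, 0.0, 0.0, 0.0)


def quaternion_pitch_deg(q: Sequence[float]) -> float:
    """Pitch in degrees from a [w, x, y, z] quaternion (ZYX convention, assumed)."""
    w, x, y, z = q
    s = 2.0 * (w * y - z * x)
    s = max(-1.0, min(1.0, s))  # clamp: rounding can push |s| just above 1 near +/-90 deg
    return math.degrees(math.asin(s))


def is_identity(q: Sequence[float], tol: float = 1e-6) -> bool:
    """True if q equals (1, 0, 0, 0) within tol, i.e. the message carries no rotation."""
    return all(abs(a - b) <= tol for a, b in zip(q, IDENTITY))


def summarise_gimbal_attitude(
    samples: Iterable[Tuple[float, Sequence[float]]], log_duration_s: float
) -> dict:
    """Summarise (timestamp, q) samples: count, mean rate over the log, identity flag, pitch span."""
    qs = [q for _, q in samples]
    pitches = [quaternion_pitch_deg(q) for q in qs]
    return {
        "count": len(qs),
        "rate_hz": len(qs) / log_duration_s if log_duration_s > 0 else 0.0,
        "all_identity": bool(qs) and all(is_identity(q) for q in qs),
        "pitch_range_deg": (min(pitches), max(pitches)) if pitches else None,
    }


def load_gimbal_samples(tlog_path: str) -> list:
    """Read (tlog timestamp, q) pairs from a tlog file. Requires pymavlink."""
    from pymavlink import mavutil  # deferred: only needed when reading a real tlog

    mlog = mavutil.mavlink_connection(tlog_path)
    samples = []
    while True:
        msg = mlog.recv_match(type="GIMBAL_DEVICE_ATTITUDE_STATUS")
        if msg is None:  # end of file reached
            break
        samples.append((msg._timestamp, tuple(msg.q)))
    return samples
```

A call would look like `summarise_gimbal_attitude(load_gimbal_samples("2026-05-09 16-09-54.tlog"), 446.8)`. For the figures Section 5 reports (4338 samples, 4338 / 446.8 s ≈ 9.7 Hz, all identity), every derived pitch is 0°. This is why the telemetry path cannot distinguish nadir from non-nadir frames in this recording.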
7. Testing strategy
7.1 Functional / integration
- Crop-coordinate validation. Decode 5 frames from `flight_topotek_2026-05-09.mp4` and assert they are 900×445; assert the IR PIP is not present in the right third of the frame; assert the GCS sidebars are not present in the leftmost / rightmost columns.
- OSD mask validation. Open `osd_mask.png`, assert its dimensions match the MP4, and assert the union of black pixels covers ≥95 % of the union of OSD rectangles you would otherwise pass to `delogo`. Optionally, render `cropped_frame * (mask/255.)` and eyeball that the burned-in text is dimmed to black while the EO terrain is preserved.
- DISK mask-aware contract. Add a unit test under `tests/unit/c3_matchers/` that loads the existing DISK wrapper, passes a synthetic 900×445 image with a checkerboard pattern plus a corner rectangle of pure white, passes a mask zeroing the corner, and asserts no keypoint is returned at coordinates inside that corner.
- End-to-end replay smoke test. Add a sibling test to `tests/e2e/replay/test_az835_e2e_real_flight.py` parameterised over `flight_topotek_2026-05-09` and confirm the pipeline runs to completion. Track end-to-end accuracy separately under AC-1.x.
- Frame-range filter sanity. Iterate `frame_ranges.yaml` and assert: every range's `start_frame < end_frame`, ranges are non-overlapping, and the union covers at least N seconds of footage (where N is a project-chosen minimum fixture duration).
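The frame-range sanity check is small enough to write out in full. This minimal sketch makes several assumptions. It expects `frame_ranges.yaml` to have been loaded already (for example with PyYAML's `yaml.safe_load`) into a list of mappings with `start_frame` / `end_frame` keys. It treats ranges as half-open `[start_frame, end_frame)`. It expects the caller to supply `fps` and `min_seconds` (the project-chosen N). The name `check_frame_ranges` is hypothetical.

```python
from typing import Iterable, Mapping


def check_frame_ranges(
    ranges: Iterable[Mapping[str, int]], fps: float, min_seconds: float
) -> float:
    """Sanity-check frame ranges for a pytest test; return covered seconds.

    Assumes half-open [start_frame, end_frame) ranges. Raises AssertionError on
    an empty/inverted range, an overlap, or too little total footage.
    """
    pairs = sorted((int(r["start_frame"]), int(r["end_frame"])) for r in ranges)
    for start, end in pairs:
        assert start < end, f"empty or inverted range [{start}, {end})"
    # Once sorted by start, any overlap must appear between neighbouring ranges.
    for (_, prev_end), (next_start, _) in zip(pairs, pairs[1:]):
        assert next_start >= prev_end, f"range starting at {next_start} overlaps the previous one"
    covered_s = sum(end - start for start, end in pairs) / fps
    assert covered_s >= min_seconds, f"{covered_s:.1f} s covered, need at least {min_seconds} s"
    return covered_s
```

Sorting first means the overlap test only compares neighbours. The cost is O(n log n) rather than a pairwise O(n²) scan, which does not matter at this size but keeps the check obviously complete.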
7.2 Non-functional
- Reproducibility: re-run the entire `tools/fixture_prep/` script twice on a clean checkout and assert byte-identical outputs (pin libx264 settings; pin the `pytesseract` version; pin the Python version in `pyproject.toml`).
- Throughput: the entire fixture-prep run for one 6-minute MKV should complete in well under 10 minutes on a developer workstation (no AC requirement; sanity ceiling).
- No fabrication regression: extract keypoints from the masked region of the Option B output and assert the count is 0; for the Option C output, assert the keypoint count inside the OSD regions is at most 5 % of the keypoint count on unmasked terrain.
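The no-fabrication check reduces to counting detected keypoints that land inside the OSD regions. This minimal sketch makes two assumptions. First, keypoints are `(x, y)` pixel coordinates in the cropped frame. Second, the OSD regions are axis-aligned `(x, y, w, h)` rectangles, the same shape as the `x`/`y`/`w`/`h` options of FFmpeg's `delogo`. For Option B, which uses a pixel mask, the rectangles are only an approximation of `osd_mask.png`. The helper names are hypothetical. `max_ratio=0.0` encodes the Option B rule and `max_ratio=0.05` the Option C rule.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[int, int, int, int]  # (x, y, w, h), top-left origin, as passed to delogo


def in_rect(pt: Point, rect: Rect) -> bool:
    """True if the point lies inside the half-open rectangle."""
    x, y = pt
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh


def split_keypoints(keypoints: Iterable[Point], rects: Sequence[Rect]) -> Tuple[int, int]:
    """Return (count inside any OSD rectangle, count on the remaining terrain)."""
    inside = outside = 0
    for pt in keypoints:
        if any(in_rect(pt, r) for r in rects):
            inside += 1
        else:
            outside += 1
    return inside, outside


def check_no_fabrication(
    keypoints: Iterable[Point], rects: Sequence[Rect], max_ratio: float
) -> Tuple[int, int]:
    """Assert OSD-region keypoints <= max_ratio * terrain keypoints.

    max_ratio=0.0 -> Option B rule (none allowed); 0.05 -> Option C rule (at most 5 %).
    """
    inside, outside = split_keypoints(keypoints, rects)
    assert inside <= max_ratio * outside, f"{inside} keypoints in OSD regions vs {outside} on terrain"
    return inside, outside
```

With `max_ratio=0.0` the inequality collapses to `inside == 0`. Both options can therefore share one assertion and differ only in the threshold they pass.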
8. Open questions
- Adopt the one-line C3 wrapper change? Section 4.1 Step 3 proposes forwarding `mask=…` through the existing DISK call. This is the lowest-risk, highest-quality path (Option B) but requires touching `src/.../matchers/`. The fallback (Option C, chained `delogo`) avoids this code change entirely and only touches the fixture-prep script. Either is defensible. The choice depends on whether the team wants to formalise mask-aware fixtures as a first-class concept in the replay layer (recommended) or to keep that layer "drop in an MP4" pure (also defensible).
- Use OCR for the frame filter, or just hand-label this one fixture? For a single 6-minute clip, the manual labeling pass is cheaper than building, validating, and pinning a Tesseract/PaddleOCR pipeline. Use OCR only if you expect to ingest additional fixtures of the same class (same gimbal HUD layout) and want the script to amortise its cost across them. Either way, the YAML output format is the same, so this can be revisited later.
- Future recordings. Option Z (direct RTSP/DCIM extraction with OSD off) is not available to this project: no camera or GCS access exists for this data source. The only theoretical path around the cleanup pipeline is to ask the original supplier to re-record with OSD disabled (via Topotek's GimbalControl utility on their side, or by setting `MNT1_OPTIONS` or an equivalent parameter on their flight controller). Whether to make that request at all is a separate decision; it is not a technical option this project can execute on its own. Assume Option Z stays unavailable and plan all future fixtures of this class around the OCR / manual-labeling path in §4 + §5.3.
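Whichever path produces the per-frame pitch values (OCR of the attitude-indicator text or a manual pass), turning them into ranges is the same small step. This minimal sketch relies on four assumptions:

- The OCR engine returns the pitch as a string such as `−3.7°`. The HUD renders the sign as U+2212 MINUS SIGN, which Python's `float()` does not accept as a sign.
- A "nadir" frame is one with pitch at or below a hypothetical −80° threshold.
- Frames where OCR failed count as non-nadir.
- Ranges are half-open `[start_frame, end_frame)`.

The names `parse_pitch_text` and `nadir_ranges` are illustrative.

```python
import re
from typing import Dict, List, Optional, Sequence

# Optional sign (ASCII '+'/'-' or U+2212 MINUS SIGN), digits, optional decimals.
_PITCH_RE = re.compile(r"[+\-\u2212]?\d+(?:\.\d+)?")


def parse_pitch_text(text: str) -> Optional[float]:
    """Extract the first signed number from OCR output, e.g. '−3.7°' -> -3.7; None if absent."""
    m = _PITCH_RE.search(text)
    if m is None:
        return None
    return float(m.group(0).replace("\u2212", "-"))


def nadir_ranges(
    pitches: Sequence[Optional[float]], threshold_deg: float = -80.0, min_len: int = 1
) -> List[Dict[str, int]]:
    """Group consecutive frames with pitch <= threshold into half-open frame ranges.

    None (OCR failure) is treated as non-nadir. Runs shorter than min_len frames are dropped.
    """
    ranges: List[Dict[str, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(pitches):
        nadir = p is not None and p <= threshold_deg
        if nadir and start is None:
            start = i  # a nadir run begins
        elif not nadir and start is not None:
            if i - start >= min_len:
                ranges.append({"start_frame": start, "end_frame": i})
            start = None  # the run ended
    if start is not None and len(pitches) - start >= min_len:
        ranges.append({"start_frame": start, "end_frame": len(pitches)})
    return ranges
```

The returned list can be written out as `frame_ranges.yaml` with PyYAML's `yaml.safe_dump`. A manual labeling pass would produce the same structure by hand, which is what keeps the OCR-versus-manual decision reversible.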
9. References
L1 (official documentation / source):
- FFmpeg `delogo` filter, `vf_delogo.c` [Sources #1, #2 in source registry]
- FFmpeg `removelogo` filter, `vf_removelogo.c` [Source #3]
- FFmpeg `tmedian` filter [Source #8]
- ArduPilot Topotek Gimbal docs [Source #4]
- LightGlue maintainer reply on score-map masking, issue cvg/LightGlue#97 [Source #5]
- Kornia `DISK.forward()` documentation [Source #6]
- DISK upstream source (`disk/model/disk.py`) [Source #7]
- Project: `_docs/00_problem/input_data/flight_derkachi/README.md` [Source #9]
- Project: `_docs/00_problem/input_data/flight_derkachi/camera_info.md` [Source #10]
L2 (peer-reviewed):
- ProPainter (ICCV 2023) [Source #11]
- VideoPainter (arXiv 2503.05639, 2025) [Source #12] — referenced as disqualified
- VidPivot / DiffuEraser comparison (arXiv 2510.21461, 2025) [Source #13]
- DISK paper (NeurIPS 2020, arXiv 2006.13566) [Source #14]
L3 (practitioner / community):
- "Removing obnoxious logos from videos" blog [Source #15]
- Conditional Temporal Median Filter reference [Source #16]
- Foundry Nuke TemporalMedian reference [Source #17]
In-repo cross-references:
- `_docs/01_solution/solution_draft01.md`: existing solution; C2 (MixVPR TensorRT INT8+FP16), C3 (DISK + LightGlue), C5 (GTSAM iSAM2 + CombinedImuFactor) [Source #R1]
- `_docs/00_research/06_component_fit_matrix/00_summary.md`: confirms no fixture-prep component exists in the runtime [Source #R2]
- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`: existing per-camera calibration JSON precedent [Source #R3]
New evidence from this run:
- `ffprobe` verification of `2026-05-09 16-10-54.mkv` technical metadata (Section 2.1)
- `pymavlink` scan of the unpacked `2026-05-09 16-09-54.tlog` (Section 5): 133 191 messages over 446.8 s; `GIMBAL_DEVICE_ATTITUDE_STATUS` at 9.7 Hz, all identity quaternions
10. Related artifacts
| Artifact | Status |
|---|---|
| `_docs/00_research/_mode_b_2026-05-29_video_extraction/` | Complete through Step 7.5; this draft is its Step 8 deliverable, with one row updated by new tlog evidence. |
| `_docs/01_solution/solution_draft01.md` | Untouched. C1–C12 unchanged. This draft is purely additive. |
| `_docs/01_solution/solution.md` | Untouched. |
| `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv` | Source MKV. Untouched. |
| `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip` | Source tlog archive. Untouched. |
| `_docs/00_problem/input_data/flight_topotek_2026-05-09/` | The cleaned fixture directory. Produced on 2026-05-30 by the primary path (§4.1 Steps 1 + 2), with its own short README; the non-nadir frame filter and the §4.3 companion files were intentionally not produced. |
| Future: `tools/fixture_prep/` | The reproducible script that will regenerate the above. Not yet created. |