Update demo replay validation and testing documentation

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.
fixes in tests, autodev update
2026-06-21 11:01:13 +00:00 · 2026-06-20 11:24:43 +03:00 · 2026-06-18 12:13:43 +03:00 · 2026-06-10 05:35:01 +03:00 · 2026-06-09 20:43:15 +03:00 · 2026-06-09 14:06:35 +03:00
284 changed files with 38830 additions and 3588 deletions
@@ -11,10 +11,20 @@ If you want to run a specific skill directly (without the orchestrator), use the
 ```
 /problem                — interactive problem gathering → _docs/00_problem/
 /research               — solution drafts → _docs/01_solution/
-/plan                   — architecture, components, tests → _docs/02_document/
-/decompose              — atomic task specs → _docs/02_tasks/todo/
-/implement              — batched parallel implementation → _docs/03_implementation/
-/deploy                 — containerization, CI/CD, observability → _docs/04_deploy/
+/plan                   — architecture, ADRs, components, tests, epics → _docs/02_document/
+/test-spec              — blackbox/perf/resilience/security test specs → _docs/02_document/tests/
+/decompose              — atomic task specs (multi-mode) → _docs/02_tasks/todo/
+/implement              — sequential dependency-aware batches with code review and completeness gates → _docs/03_implementation/
+/test-run               — runs the test suite (functional / perf modes) with gating
+/code-review            — multi-phase review used by /implement
+/refactor               — 8-phase structured refactoring (incl. testability sub-mode) → _docs/04_refactoring/
+/security               — OWASP-driven audit → _docs/05_security/
+/deploy                 — containerization, CI/CD, environments, observability, procedures, scripts → _docs/04_deploy/
+/release                — execute deploy artifacts in prod, smoke-test, watch, decide rollback → _docs/04_release/
+/document               — bottom-up reverse-engineering of an existing codebase → _docs/02_document/
+/new-task               — interactive feature planning for an existing codebase → _docs/02_tasks/todo/
+/ui-design              — HTML+CSS mockups + design system → _docs/02_document/ui_mockups/
+/retrospective          — metrics + lessons log → _docs/06_metrics/ + _docs/LESSONS.md
 ```

 ## How It Works
@@ -41,148 +51,201 @@ The state file tracks completed steps, key decisions, blockers, and session cont

 Skills auto-chain without pausing between them. The only pauses are:
 - **BLOCKING gates** inside each skill (user must confirm before proceeding)
- **Session boundary** after decompose (suggests new conversation before implement)
+- **Session boundaries** declared in each flow's auto-chain rules (e.g., after `decompose`, after `decompose tests`) — suggested new-conversation breakpoints to keep context fresh

-A typical project runs in 2-4 conversations:
- Session 1: Problem → Research → Research decision
- Session 2: Plan → Decompose
- Session 3: Implement (may span multiple sessions)
- Session 4: Deploy
+There are three flows, resolved on every invocation (see `skills/autodev/SKILL.md` § Flow Resolution):

-Re-entry is seamless: type `/autodev` in a new conversation and the orchestrator reads the state file to pick up exactly where you left off.
+| Flow | When | Steps |
+|------|------|-------|
+| **greenfield** | empty workspace, no source yet | 18 steps: Problem → Research → Plan → UI Design → Test Spec → Decompose → Implement → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (opt) → Performance Test (opt) → Deploy → Release → Retrospective |
+| **existing-code** | source files present | one-time baseline (Document → Architecture Baseline Scan → Test Spec → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → optional Refactor) then a feature-cycle loop (New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security Audit (opt) → Performance Test (opt) → Deploy → Release → Retrospective → loops back to New Task) |
+| **meta-repo** | `.gitmodules`, workspace manifest, or multi-component aggregator | uses `monorepo-*` skills + `_docs/_repo-config.yaml` instead of per-component BUILD-SHIP folders |
+
+A typical greenfield project spans several conversations because of session boundaries. Re-entry is seamless: type `/autodev` in a new conversation and the orchestrator reads `_docs/_autodev_state.md` to pick up exactly where you left off.
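
The flow table above can be read as a small decision function. The following is a minimal sketch only, not the orchestrator's actual logic: the function name `resolve_flow`, the `SOURCE_SUFFIXES` set, and the precedence (meta-repo checked first) are all assumptions made for illustration. The authoritative rules live in `skills/autodev/SKILL.md` § Flow Resolution.

```python
from pathlib import Path

# Assumption: which file suffixes count as "source files" is not stated in
# the docs; this set is purely illustrative.
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs", ".cs", ".java"}


def resolve_flow(workspace: Path) -> str:
    """Pick a flow name for a workspace (hypothetical helper)."""
    # Assumption: a .gitmodules file alone is enough to select meta-repo.
    # Workspace manifests and aggregators are not modelled here.
    if (workspace / ".gitmodules").exists():
        return "meta-repo"

    # Any source file outside _docs/ selects the existing-code flow.
    has_source = any(
        path.is_file()
        and path.suffix in SOURCE_SUFFIXES
        and "_docs" not in path.relative_to(workspace).parts
        for path in workspace.rglob("*")
    )
    return "existing-code" if has_source else "greenfield"
```

Calling `resolve_flow(Path("."))` on an empty directory would return `"greenfield"`; adding a single `main.py` would switch the result to `"existing-code"`.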

 ## Skill Descriptions

 ### autodev (meta-orchestrator)

-Auto-chaining engine that sequences the full BUILD → SHIP workflow. Persists state to `_docs/_autodev_state.md`, tracks key decisions and session context, and flows through problem → research → plan → decompose → implement → deploy without manual skill invocation. Maximizes work per conversation with seamless cross-session re-entry.
+Auto-chaining engine that sequences the full BUILD → SHIP → EVOLVE workflow. Persists state to `_docs/_autodev_state.md`, surfaces top-3 lessons from `_docs/LESSONS.md` at every invocation, replays any `_docs/_process_leftovers/` entries, tracks key decisions and session context, and flows through the active flow's steps without manual skill invocation. Maximizes work per conversation with seamless cross-session re-entry.

 ### problem

-Interactive interview that builds `_docs/00_problem/`. Asks probing questions across 8 dimensions (problem, scope, hardware, software, acceptance criteria, input data, security, operations) until all required files can be written with concrete, measurable content.
+Interactive 4-phase interview that builds `_docs/00_problem/`. Asks probing questions across 8 dimensions (problem & goals, scope, hardware & environment, software & tech, acceptance criteria, input data, security, operational) until all required files can be written with concrete, measurable, quantifiable content. Acceptance criteria must include numeric targets; input data must include `expected_results/` mappings.

 ### research

-8-step deep research methodology. Mode A produces initial solution drafts. Mode B assesses and revises existing drafts. Includes AC assessment, source tiering, fact extraction, comparison frameworks, and validation. Run multiple rounds until the solution is solid.
+8-step deep research methodology. Mode A produces initial solution drafts. Mode B assesses and revises existing drafts. Classifies output as **Technical-component selection** (full per-mode API verification gates apply) or **Non-technical investigation** (gates relaxed). Includes source tiering, fact extraction, comparison frameworks, validation, and exact-fit component selection. Run multiple rounds until the solution is solid.

 ### plan

-6-step planning workflow. Produces integration test specs, architecture, system flows, data model, deployment plan, component specs with interfaces, risk assessment, test specifications, and work item epics. Heavy interaction at BLOCKING gates.
+6-step planning workflow with one half-step (4.5: Architecture Decision Records). Produces blackbox test specs (delegated to test-spec), glossary, architecture vision, architecture document, data model, deployment plan, component specs with interfaces, risk assessment, ADRs, test specifications, and work item epics. Heavy interaction at BLOCKING gates (glossary+vision, architecture, components, mitigations, ADRs).
+
+### test-spec
+
+4-phase test specification workflow. Phase 1 analyzes input data + expected-results completeness. Phase 2 emits 8 test artifacts (environment, test-data, blackbox, performance, resilience, security, resource-limit, traceability matrix). Phase 3 is the hard gate that requires every test to have quantifiable expected results. Phase 4 emits runner scripts. Cycle-update mode for incremental refresh.

 ### decompose

-4-step task decomposition. Produces a bootstrap structure plan, atomic task specs per component, integration test tasks, and a cross-task dependency table. Each task gets a work item ticket and is capped at 8 complexity points.
+Multi-mode task decomposition with 6 internal step files. Implementation mode runs Step 1 (Bootstrap), 1.5 (Module Layout), 1.7 (System-Pipeline owner tasks), 2 (per-component tasks), 4 (Cross-Verification). Tests-only mode runs Step 1t (Test Infrastructure), 3 (Blackbox tasks), 4. Single-component mode runs Step 2 only. Each task is tracker-prefixed and capped at 5 complexity points. The 1.7 step exists specifically to prevent the GPS-passthrough class of failure (see `meta-rule.mdc`).

 ### implement

-Orchestrator that reads task specs, computes dependency-aware execution batches, launches up to 4 parallel implementer subagents, runs code review after each batch, and commits per batch. Does not write code itself.
-
-### deploy
-
-7-step deployment planning. Status check, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts. Produces documents for steps 1-6 and executable scripts in step 7.
+Orchestrator that reads task specs, computes dependency-aware execution batches via topological sort, **implements tasks sequentially within each batch** (no subagents, no parallel execution — see `.cursor/rules/no-subagents.mdc`), runs code review after each batch, runs cumulative code review every K batches, and commits per batch. Has a Product Implementation Completeness Gate (Step 15) that compares promises in task specs / architecture against actual production code, plus a System-Pipeline Audit (Step 15.b) that walks architecture-named pipelines and verifies a real production caller wires each adjacent component pair. Either gate's FAIL stops the cycle until remediation tasks are created.
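
To illustrate what "dependency-aware execution batches via topological sort" can look like, here is a minimal sketch using layered Kahn-style sorting. It is not the skill's implementation: the function name `compute_batches`, the task IDs, and the input shape (a mapping from task to the set of tasks it depends on) are hypothetical. Tasks in one batch have no unmet dependencies on each other, and are listed in the order they would be implemented sequentially.

```python
def compute_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into ordered batches; each batch depends only on earlier ones."""
    # Ignore dependencies on tasks outside this run (assumed already done).
    remaining = {
        task: {d for d in task_deps if d in deps}
        for task, task_deps in deps.items()
    }
    batches: list[list[str]] = []
    while remaining:
        # A task is ready when all of its in-run dependencies are satisfied.
        ready = sorted(task for task, unmet in remaining.items() if not unmet)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        batches.append(ready)
        for task in ready:
            del remaining[task]
        for unmet in remaining.values():
            unmet.difference_update(ready)
    return batches


# Hypothetical task IDs, for illustration only.
example = {
    "T-1": set(),
    "T-2": {"T-1"},
    "T-3": {"T-1"},
    "T-4": {"T-2", "T-3"},
}
print(compute_batches(example))  # [['T-1'], ['T-2', 'T-3'], ['T-4']]
```

A cycle in the dependency table raises `ValueError` rather than looping forever, which is one plausible way to surface a malformed `_dependencies_table.md` early.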

 ### code-review

-Multi-phase code review against task specs. Produces structured findings with verdict: PASS, FAIL, or PASS_WITH_WARNINGS.
+7-phase code review against task specs (Phase 7 is Architecture Compliance against `module-layout.md` and `architecture.md`). Produces structured findings with verdict: PASS, PASS_WITH_WARNINGS, or FAIL. Three modes: full (per batch), baseline (one-time architecture scan of an existing codebase), cumulative (mid-implementation across batches with `## Baseline Delta`).
+
+### test-run
+
+Runs the test suite. Functional mode (default): detects pytest/dotnet/cargo/npm or `scripts/run-tests.sh`, applies a System-Under-Test Reality Gate to refuse passes where internal product modules were stubbed, classifies failures and skips, gates on outcome. Perf mode: detects `scripts/run-performance-tests.sh` or k6/locust/artillery/wrk, captures latency/throughput/error metrics, compares against thresholds.

 ### refactor

-6-phase structured refactoring: baseline, discovery, analysis, safety net, execution, hardening.
+8-phase structured refactoring: baseline → discovery → analysis → safety net → execution → test sync → verification → documentation. Two input modes (Automatic / Guided). Testability sub-mode skips Phase 3 by design and emits a `testability_changes_summary.md` for user review. Each run lives in its own `RUN_DIR` under `_docs/04_refactoring/NN-<run-name>/`.

 ### security

-OWASP-based security testing and audit.
+5-phase OWASP-based audit: dependency scan → static analysis → OWASP Top 10 review → infrastructure review → consolidated security report. Severity-ranked, evidence-based, actionable. Complementary to `code-review` Phase 4 (lightweight security quick-scan).
+
+### deploy
+
+7-step deployment planning. Produces documents for steps 1–6 (status & env, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures) and executable scripts in step 7 (`deploy.sh`, `pull-images.sh`, `start-services.sh`, `stop-services.sh`, `health-check.sh`).
+
+### release
+
+Executes the deployment plan produced by `/deploy` against a target environment. 6 phases: pre-release gate (AC + risk + rollback readiness), strategy select (all-at-once / blue-green / canary / manual), execute (run scripts, monitor exit codes), smoke test (delegate to test-run prod-smoke), watch window (read observability for the configured duration), commit-or-rollback. Outputs `_docs/04_release/release_<version>.md`. Produces a definitive Released / Rolled-Back / Aborted verdict; failure of any phase auto-triggers rollback unless the user opts to investigate.

 ### retrospective

-Collects metrics from implementation batch reports, analyzes trends, produces improvement reports.
+4-step workflow: collect metrics → analyze trends → produce report → update lessons log (`_docs/LESSONS.md`, ring buffer of last 15 entries consumed by `new-task`, `plan`, `decompose`, and `autodev`). Cycle-end (default) and incident modes; incident mode is auto-invoked after a 3-strike failure.
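
The "ring buffer of last 15 entries" behaviour can be sketched with a bounded deque. This is a sketch under stated assumptions, not how the skill edits `_docs/LESSONS.md`: lessons are modelled as plain strings, the helper names are hypothetical, and "top-3" is assumed here to mean the three most recent entries (the docs do not say how lessons are ranked).

```python
from collections import deque

MAX_LESSONS = 15  # buffer size stated in the docs


def append_lesson(lessons: list[str], lesson: str) -> list[str]:
    """Return the lessons with one more entry, dropping the oldest past 15."""
    buffer = deque(lessons, maxlen=MAX_LESSONS)
    buffer.append(lesson)  # a full deque silently discards its oldest item
    return list(buffer)


def top_lessons(lessons: list[str], count: int = 3) -> list[str]:
    """Assumption: 'top' means most recent; newest entry comes first."""
    return lessons[-count:][::-1]
```

With 15 entries already stored, `append_lesson` keeps the length at 15 and evicts the oldest entry, so the log never grows without bound.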

 ### document

-Bottom-up codebase documentation. Analyzes existing code from modules through components to architecture, then retrospectively derives problem/restrictions/acceptance criteria. Alternative entry point for existing codebases — produces the same `_docs/` artifacts as problem + plan, but from code analysis instead of user interview.
+Bottom-up codebase documentation. Analyzes existing code from modules through components to architecture, then retrospectively derives problem/restrictions/acceptance criteria. Alternative entry point for existing codebases — produces the same `_docs/` artifacts as problem + plan, but from code analysis instead of user interview. Two workflow files: `workflows/full.md` (full / focus-area / resume) and `workflows/task.md` (incremental update for a single task).
+
+### new-task
+
+Existing-code feature planning loop. Walks the user through Step 1 (description) → Step 2 (complexity assessment, consults `LESSONS.md`) → Step 3 (research if needed) → Step 4 (codebase analysis incl. test-coverage gap) → Step 4.5 (contract & layout check) → Step 5 (validate assumptions) → Step 6 (write task spec) → Step 7 (tracker ticket) → Step 8 (loop or finalize).
+
+### ui-design
+
+End-to-end UI workflow. Phase 0 (complexity detection: full vs quick) → Phase 1 (context check) → Phase 2 (requirements) → Phase 3 (direction exploration) → Phase 4 (design system synthesis: `DESIGN.md`) → Phase 5 (HTML+Tailwind code generation) → Phase 6 (visual verification, optional MCP enhancements) → Phase 7 (user review) → Phase 8 (iteration). Includes an Applicability Check that refuses to run on non-UI projects.
+
+### monorepo-* (suite-level)
+
+Six skills for meta-repos: `monorepo-discover` (write/refresh `_docs/_repo-config.yaml`), `monorepo-document` (sync unified docs), `monorepo-cicd` (sync CI/compose/env templates), `monorepo-onboard` (atomic add-component), `monorepo-status` (read-only drift report), `monorepo-e2e` (sync suite-level integration harness). They never cross domains; each touches exactly one artifact class.

 ## Developer TODO (Project Mode)

-### BUILD
+The numbered list below mirrors greenfield-flow ordering. Existing-code projects start at `/document`, then enter the feature-cycle loop at `/new-task`. See `skills/autodev/flows/{greenfield,existing-code,meta-repo}.md` for the authoritative step tables.
+
+### BUILD (greenfield)

 ```
-0. /problem                                      — interactive interview → _docs/00_problem/
-   - problem.md                                   (required)
-   - restrictions.md                              (required)
-   - acceptance_criteria.md                       (required)
-   - input_data/                                  (required)
-   - security_approach.md                         (optional)
-
-1. /research                                     — solution drafts → _docs/01_solution/
-   Run multiple times: Mode A → draft, Mode B → assess & revise
-
-2. /plan                                         — architecture, data model, deployment, components, risks, tests, epics → _docs/02_document/
-
-3. /decompose                                    — atomic task specs + dependency table → _docs/02_tasks/todo/
-
-4. /implement                                    — batched parallel agents, code review, commit per batch → _docs/03_implementation/
+1. /problem            — interactive 4-phase interview → _docs/00_problem/
+                          required: problem.md, restrictions.md, acceptance_criteria.md, input_data/
+                          optional: security_approach.md
+2. /research           — solution drafts (Mode A draft, Mode B assess) → _docs/01_solution/
+3. /plan               — glossary, architecture vision, architecture, data model, deployment, components,
+                          risks, ADRs (Step 4.5), test specs, epics → _docs/02_document/
+                          (Step 1 invokes /test-spec internally)
+4. /ui-design          — HTML+Tailwind mockups (UI projects only) → _docs/02_document/ui_mockups/
+5. /test-spec          — produces 8 test-spec artifacts (incl. traceability matrix) → _docs/02_document/tests/
+                          (already invoked from /plan Step 1; Step 5 here is the explicit autodev step)
+6. /decompose          — implementation tasks + module-layout + system-pipeline owner tasks →
+                          _docs/02_tasks/todo/
+7. /implement          — sequential dependency-aware batches; per-batch code-review;
+                          Product Completeness Gate + System-Pipeline Audit → _docs/03_implementation/
+8. (auto) Code Testability Revision  — surgical refactor to make code runnable under tests
+9. /decompose tests    — test-only decomposition mode → _docs/02_tasks/todo/
+10. /implement (tests) — implements test tasks
+11. /test-run          — full functional suite gate
+12. /test-spec --cycle-update  — append implementation-learned scenarios
+13. /document --task   — update affected component / module / architecture docs
+14. /security          — OWASP-based audit (optional gate)
+15. /test-run --perf   — perf/load tests (optional gate)
 ```

 ### SHIP

 ```
-5. /deploy                                       — containerization, CI/CD, environments, observability, procedures → _docs/04_deploy/
+16. /deploy            — containerization, CI/CD, environments, observability, procedures, scripts → _docs/04_deploy/
+17. /release           — execute deploy artifacts in prod, smoke-test, watch, decide rollback → _docs/04_release/
 ```

 ### EVOLVE

 ```
-6. /refactor                                     — structured refactoring → _docs/04_refactoring/
-7. /retrospective                                — metrics, trends, improvement actions → _docs/06_metrics/
+18. /retrospective     — metrics + trends + lessons-log update → _docs/06_metrics/ + _docs/LESSONS.md
+       (cycle-end mode after release; incident mode auto-fires after 3-strike failure)
+
+After greenfield completes, the state file is rewritten to point at the existing-code flow's
+feature-cycle loop, which begins with /new-task and ends with /retrospective. The loop runs once
+per feature with state.cycle incremented.
+
+Off-cycle:
+/refactor              — full 8-phase refactor → _docs/04_refactoring/NN-<run-name>/
+/document              — full reverse-engineering of an unfamiliar codebase
 ```

-Or just use `/autodev` to run steps 0-5 automatically.
+Or just use `/autodev` to run all the above automatically — the orchestrator chooses the right flow, sequences steps, surfaces lessons, processes leftovers, and pauses only at BLOCKING gates and declared session boundaries.

 ## Available Skills

 | Skill | Triggers | Output |
 |-------|----------|--------|
-| **autodev** | "autodev", "auto", "start", "continue", "what's next" | Orchestrates full workflow |
+| **autodev** | "autodev", "auto", "start", "continue", "what's next" | Orchestrates full workflow (3 flows) |
 | **problem** | "problem", "define problem", "new project" | `_docs/00_problem/` |
 | **research** | "research", "investigate" | `_docs/01_solution/` |
-| **plan** | "plan", "decompose solution" | `_docs/02_document/` |
+| **plan** | "plan", "decompose solution" | `_docs/02_document/` (incl. ADRs) |
 | **test-spec** | "test spec", "blackbox tests", "test scenarios" | `_docs/02_document/tests/` + `scripts/` |
-| **decompose** | "decompose", "task decomposition" | `_docs/02_tasks/todo/` |
-| **implement** | "implement", "start implementation" | `_docs/03_implementation/` |
-| **test-run** | "run tests", "test suite", "verify tests" | Test results + verdict |
-| **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS |
+| **decompose** | "decompose", "task decomposition", "decompose tests" | `_docs/02_tasks/todo/` + `_docs/02_document/module-layout.md` |
+| **implement** | "implement", "start implementation" | `_docs/03_implementation/` (sequential — see `no-subagents.mdc`) |
+| **test-run** | "run tests", "test suite", "verify tests", "perf test" | Test results + verdict |
+| **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS (7 phases) |
 | **new-task** | "new task", "add feature", "new functionality" | `_docs/02_tasks/todo/` |
 | **ui-design** | "design a UI", "mockup", "design system" | `_docs/02_document/ui_mockups/` |
-| **refactor** | "refactor", "improve code" | `_docs/04_refactoring/` |
-| **security** | "security audit", "OWASP" | `_docs/05_security/` |
+| **refactor** | "refactor", "improve code", "testability" | `_docs/04_refactoring/NN-<run-name>/` |
+| **security** | "security audit", "OWASP", "vulnerability scan" | `_docs/05_security/` |
 | **document** | "document", "document codebase", "reverse-engineer docs" | `_docs/02_document/` + `_docs/00_problem/` + `_docs/01_solution/` |
-| **deploy** | "deploy", "CI/CD", "observability" | `_docs/04_deploy/` |
-| **retrospective** | "retrospective", "retro" | `_docs/06_metrics/` |
+| **deploy** | "deploy", "CI/CD", "observability", "containerize" | `_docs/04_deploy/` (plans + scripts) |
+| **release** | "release", "ship", "go live", "rollback" | `_docs/04_release/` (executed deploy + verdict) |
+| **retrospective** | "retrospective", "retro", "metrics review" | `_docs/06_metrics/` + `_docs/LESSONS.md` |
+| **monorepo-discover** | "discover monorepo", "scan submodules" | `_docs/_repo-config.yaml` |
+| **monorepo-document** | "sync monorepo docs" | unified `_docs/*.md` |
+| **monorepo-cicd** | "sync compose", "sync ci" | suite-level CI/compose/env templates |
+| **monorepo-onboard** | "onboard component", "register submodule" | atomic component addition |
+| **monorepo-status** | "monorepo status", "drift report" | read-only drift report |
+| **monorepo-e2e** | "suite e2e", "integration harness" | `e2e/docker-compose.suite-e2e.yml` and fixtures |

-## Tools
-
-| Tool | Type | Purpose |
-|------|------|---------|
-| `implementer` | Subagent | Implements a single task. Launched by `/implement`. |
+> The `.cursor/agents/` directory is intentionally empty. Per `.cursor/rules/no-subagents.mdc` the main agent does not delegate to subagents in this workspace; `/implement` runs tasks sequentially.

 ## Project Folder Structure

 ```
-_project.md                              — project-specific config (tracker type, project key, etc.)
 _docs/
-├── _autodev_state.md                  — autodev orchestrator state (progress, decisions, session context)
-├── 00_problem/                          — problem definition, restrictions, AC, input data
+├── _autodev_state.md                   — autodev orchestrator state (≤30 lines; pointer only)
+├── _process_leftovers/                  — deferred tracker writes replayed at next /autodev (per tracker.mdc)
+├── _repo-config.yaml                    — meta-repo only; produced by monorepo-discover
+├── LESSONS.md                           — ring buffer of last 15 actionable lessons (consumed by autodev/new-task/plan/decompose)
+├── 00_problem/                          — problem definition, restrictions, AC, input data + expected_results/
 ├── 00_research/                         — intermediate research artifacts
 ├── 01_solution/                         — solution drafts, tech stack, security analysis
 ├── 02_document/
-│   ├── architecture.md
+│   ├── architecture.md                 — includes ## Architecture Vision (user-confirmed)
+│   ├── glossary.md                     — user-confirmed terminology
 │   ├── system-flows.md
 │   ├── data_model.md
+│   ├── module-layout.md                — per-component Owns/Imports-from/Public API (decompose Step 1.5)
+│   ├── architecture_compliance_baseline.md  — existing-code baseline scan output
 │   ├── risk_mitigations.md
+│   ├── adr/[NNN]_[decision_slug].md    — Architectural Decision Records (plan Step 4.5)
 │   ├── components/[##]_[name]/          — description.md + tests.md per component
+│   ├── contracts/<component>/<name>.md  — versioned public-API contracts
 │   ├── common-helpers/
-│   ├── tests/                           — environment, test data, blackbox, performance, resilience, security, traceability
-│   ├── deployment/                      — containerization, CI/CD, environments, observability, procedures
+│   ├── tests/                           — environment, test-data, blackbox, performance, resilience, security, resource-limit, traceability matrix
 │   ├── ui_mockups/                      — HTML+CSS mockups, DESIGN.md (ui-design skill)
 │   ├── diagrams/
 │   └── FINAL_report.md
@@ -192,12 +255,13 @@ _docs/
 │   ├── backlog/                         — parked tasks (not scheduled yet)
 │   └── done/                            — completed/archived tasks
 ├── 02_task_plans/                       — per-task research artifacts (new-task skill)
-├── 03_implementation/                   — batch reports, implementation_report_*.md
+├── 03_implementation/                   — batch_*_cycle*.md, implementation_report_*.md, implementation_completeness_cycle*.md, cumulative_review_*.md
 │   └── reviews/                         — code review reports per batch
-├── 04_deploy/                           — containerization, CI/CD, environments, observability, procedures, scripts
-├── 04_refactoring/                      — baseline, discovery, analysis, execution, hardening
-├── 05_security/                         — dependency scan, SAST, OWASP review, security report
-└── 06_metrics/                          — retro_[YYYY-MM-DD].md
+├── 04_deploy/                           — containerization, CI/CD, environments, observability, procedures, deploy_scripts.md, reports/
+├── 04_refactoring/NN-<run-name>/        — baseline_metrics, discovery, analysis, test_specs, execution_log, test_sync, verification, FINAL_report (one folder per refactor run)
+├── 04_release/                          — release_<version>.md (one per /release invocation), rollback_<version>.md
+├── 05_security/                         — dependency_scan, static_analysis, owasp_review, infrastructure_review, security_report
+└── 06_metrics/                          — retro_<YYYY-MM-DD>.md, structure_<YYYY-MM-DD>.md, perf_<YYYY-MM-DD>_<run-label>.md, incident_<YYYY-MM-DD>_<skill>.md
 ```

 ## Standalone Mode
@@ -1,105 +0,0 @@
---
-name: implementer
-description: |
-  Implements a single task from its spec file. Use when implementing tasks from _docs/02_tasks/todo/.
-  Reads the task spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
-  Launched by the /implement skill as a subagent.
---
-
-You are a professional software developer implementing a single task.
-
-## Input
-
-You receive from the `/implement` orchestrator:
- Path to a task spec file (e.g., `_docs/02_tasks/todo/[TRACKER-ID]_[short_name].md`)
- Files OWNED (exclusive write access — only you may modify these)
- Files READ-ONLY (shared interfaces, types — read but do not modify)
- Files FORBIDDEN (other agents' owned files — do not touch)
-
-## Context (progressive loading)
-
-Load context in this order, stopping when you have enough:
-
-1. Read the task spec thoroughly — acceptance criteria, scope, constraints, dependencies
-2. Read `_docs/02_tasks/_dependencies_table.md` to understand where this task fits
-3. Read project-level context:
-   - `_docs/00_problem/problem.md`
-   - `_docs/00_problem/restrictions.md`
-   - `_docs/01_solution/solution.md`
-4. Analyze the specific codebase areas related to your OWNED files and task dependencies
-
-## Boundaries
-
-**Always:**
- Run tests before reporting done
- Follow existing code conventions and patterns
- Implement error handling per the project's strategy
- Stay within the task spec's Scope/Included section
-
-**Ask first:**
- Adding new dependencies or libraries
- Creating files outside your OWNED directories
- Changing shared interfaces that other tasks depend on
-
-**Never:**
- Modify files in the FORBIDDEN list
- Skip writing tests
- Change database schema unless the task spec explicitly requires it
- Commit secrets, API keys, or passwords
- Modify CI/CD configuration unless the task spec explicitly requires it
-
-## Process
-
-1. Read the task spec thoroughly — understand every acceptance criterion
-2. Analyze the existing codebase: conventions, patterns, related code, shared interfaces
-3. Research best implementation approaches for the tech stack if needed
-4. If the task has a dependency on an unimplemented component, create a minimal interface mock
-5. Implement the feature following existing code conventions
-6. Implement error handling per the project's defined strategy
-7. Implement unit tests (use Arrange / Act / Assert section comments in language-appropriate syntax)
-8. Implement integration tests — analyze existing tests, add to them or create new
-9. Run all tests, fix any failures
-10. Verify every acceptance criterion is satisfied — trace each AC with evidence
-
-## Stop Conditions
-
- If the same fix fails 3+ times with different approaches, stop and report as blocker
- If blocked on an unimplemented dependency, create a minimal interface mock and document it
- If the task scope is unclear, stop and ask rather than assume
-
-## Completion Report
-
-Report using this exact structure:
-
-```
-## Implementer Report: [task_name]
-
-**Status**: Done | Blocked | Partial
-**Task**: [TRACKER-ID]_[short_name]
-
-### Acceptance Criteria
-| AC | Satisfied | Evidence |
-|----|-----------|----------|
-| AC-1 | Yes/No | [test name or description] |
-| AC-2 | Yes/No | [test name or description] |
-
-### Files Modified
- [path] (new/modified)
-
-### Test Results
- Unit: [X/Y] passed
- Integration: [X/Y] passed
-
-### Mocks Created
- [path and reason, or "None"]
-
-### Blockers
- [description, or "None"]
-```
-
-## Principles
-
- Follow SOLID, KISS, DRY
- Dumb code, smart data
- No unnecessary comments or logs (only exceptions)
- Ask if requirements are ambiguous — do not assume
@@ -3,11 +3,28 @@ description: "Enforces readable, environment-aware coding standards with scope d
 alwaysApply: true
 ---
 # Coding preferences
- Prefer the simplest solution that satisfies all requirements, including maintainability. When in doubt between two approaches, choose the one with fewer moving parts — but never sacrifice correctness, error handling, or readability for brevity.
+
+## Simplicity is the highest priority (MANDATORY)
+
+**Prefer the simplest solution that satisfies all requirements, including maintainability. When in doubt between two approaches, choose the one with fewer moving parts — but never sacrifice correctness, error handling, or readability for brevity.**
+
+This is not a tie-breaker. It is the default. Every new class, layer, cache, hosted service, sliding window, persisted state, event-type variant, or configuration option is a liability — it has to be documented, tested, monitored, migrated, and reasoned about by every reader for the rest of the project's life. Add complexity only when a simpler design has been considered and explicitly rejected for a named, concrete reason tied to a requirement.
+
+Operational checks the agent MUST apply before adding code:
+
+- Before adding a new class, interface, abstract layer, configuration option, or hosted service, **justify in writing** (PR description, task spec, or chat message to the user) why the same effect cannot be achieved by extending an existing component. "Cleaner separation" / "more future-proof" / "more flexible" are NOT justifications unless tied to a concrete upcoming change that the simpler design would make harder.
+- Before introducing a sliding window, smoother, debouncer, in-memory cache, queue, or other stateful in-memory helper, justify why a stateless / on-demand alternative would not meet the requirement. Cite the acceptance criterion the helper is needed for.
+- **Two parallel pipelines for the same conceptual data are a smell.** Examples: two event types that differ only in a boolean flag; two HTTP endpoints that return the same resource shaped differently; two storage paths for the same entity. Either merge them or document on the producer's interface why both must exist and which downstream consumer needs which.
+- **Rehydrate-on-restart logic is a strong signal of over-engineering.** If a feature requires reading state from the DB at startup and re-running it through a state machine, the in-memory state is probably trying to be a database. Consider keeping the state in the DB and querying it on demand instead.
+- When a feature can be expressed in N existing primitives or N+1 (one new primitive + N existing), pick N existing. If you pick N+1, name the new primitive in the PR title.
+
+Violations of this section are reviewable. A reviewer who finds an unjustified abstraction, parallel pipeline, or stateful helper is right to ask for it to be removed.
+
+## Other preferences
 - Follow the Single Responsibility Principle — a class or method should have one reason to change:
  - If a method is hard to name precisely from the caller's perspective, its responsibility is misplaced. Vague names like "candidate", "data", or "item" are a signal — fix the design, not just the name.
  - Logic specific to a platform, variant, or environment belongs in the class that owns that variant, not in the general coordinator. Passing a dependency through is preferable to leaking variant-specific concepts into shared code.
-  - Only use static methods for pure, self-contained computations (constants, simple math, stateless lookups). If a static method involves resource access, side effects, OS interaction, or logic that varies across subclasses or environments — use an instance method or factory class instead. Before implementing a non-trivial static method, ask the user.
+  - Static members: see "Static members (functions / classes)" below — default to injectable instance types; `static` only for pure, simple, stateless helpers (constants, simple math, stateless lookups), never for business logic or anything with side effects/state. Before implementing a non-trivial static method, ask the user.
 - Avoid boilerplate and unnecessary indirection, but never sacrifice readability for brevity.
 - Never suppress errors silently — no `2>/dev/null`, empty `catch` blocks, bare `except: pass`, or discarded error returns. These hide the information you need most when something breaks. If an error is truly safe to ignore, log it or comment why.
 - Do not add comments that merely narrate what the code does. Comments are appropriate for: non-obvious business rules, workarounds with references to issues/bugs, safety invariants, and public API contracts. Make comments as short and concise as possible. Exception: every test must use the Arrange / Act / Assert pattern with language-appropriate comment syntax (`# Arrange` for Python, `// Arrange` for C#/Rust/JS/TS). Omit any section that is not needed (e.g. if there is no setup, skip Arrange; if act and assert are the same line, keep only Assert)
@@ -45,3 +62,81 @@ alwaysApply: true
 - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
 - Never force-push to main or dev branches
 - For new projects, place source code under `src/` (this works for all stacks including .NET). For existing projects, follow the established directory structure. Keep project-level config, tests, and tooling at the repo root.
+- **Never run e2e or CI tests in quiet mode (`-q`).** Always use `-v --tb=short` (or equivalent verbosity flags) in all Dockerfiles, compose files, and scripts that invoke pytest. Full test output must be visible so failures can be diagnosed without re-running. This applies to both Tier-1 (Colima) and Tier-2 (Jetson) harnesses.
+- **Never substitute real algorithm execution with a data passthrough to make tests pass.** If a test is designed to validate output from a specific pipeline (e.g. VIO estimation, sensor fusion, inference), the implementation MUST actually run that pipeline — not bypass it by returning the input data directly as output. Tests that pass by skipping the component they are supposed to exercise create false confidence and hide the fact that the component is not integrated. If the real integration cannot be completed in this session, STOP and report the blocker to the user explicitly. A failing test with an honest explanation is always better than a passing test that proves nothing.
+
+# Language-agnostic engineering principles
+
+The sections below are cross-language paradigms. Each language/framework rule file (e.g. `dotnet.mdc`) is the **stack-specific realization** of these and references back here; the principle lives here, the mechanics live there. When a stack rule and this file appear to conflict, the stack rule wins for that stack (it is the concrete realization) — but flag the divergence so one of the two is corrected.
+
+## Architecture & layering
+
+### Layered separation of concerns
+
+- Keep the **delivery layer thin** (HTTP controllers, CLI commands, message/event handlers, UI handlers): bind/validate input, call **one** business operation, map the result back. **No business logic, no data-store queries, no orchestration in the delivery layer.**
+- Put **business logic behind interfaces in a layer that does not depend on the delivery mechanism** — it must be callable from a different entry point (HTTP, CLI, worker, test) without change. No framework request/response types in a business-layer signature.
+- Put **shared data shapes** (DTOs, value objects, enums, wire contracts) in a layer both can depend on. Dependency direction points **inward**: delivery → business → shared; shared depends on nothing. Never the reverse.
+- Why: business logic fused into the delivery layer can't be reused or unit-tested without booting the whole framework. This is a pragmatic layered split, not a full Clean-Architecture stack — justified for long-lived / complex domains; skip it for throwaway or trivial-CRUD code.
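A minimal sketch of the three layers described above, assuming a hypothetical order-creation feature. `CreateOrderRequest`, `OrderCreated`, `OrderService`, and `handle_create_order` are illustrative names, not part of this repository. The delivery function only binds input, calls one business operation, and maps the result.

```python
from dataclasses import dataclass


# Shared layer: plain data shapes that depend on nothing else.
@dataclass(frozen=True)
class CreateOrderRequest:
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderCreated:
    order_id: int
    total_cents: int


# Business layer: no framework request/response types in any signature.
class OrderService:
    def __init__(self, prices_cents: dict[str, int]) -> None:
        self._prices_cents = prices_cents
        self._next_id = 1  # hypothetical in-memory id source, for the sketch only

    def create_order(self, request: CreateOrderRequest) -> OrderCreated:
        total = self._prices_cents[request.sku] * request.quantity
        created = OrderCreated(order_id=self._next_id, total_cents=total)
        self._next_id += 1
        return created


# Delivery layer: bind input, call ONE business operation, map the result.
def handle_create_order(service: OrderService, payload: dict) -> tuple[int, dict]:
    request = CreateOrderRequest(sku=str(payload["sku"]), quantity=int(payload["quantity"]))
    created = service.create_order(request)
    return 201, {"orderId": created.order_id, "totalCents": created.total_cents}
```

Because `OrderService` knows nothing about the transport, the same instance can be called from a CLI, a worker, or a unit test without booting a web framework.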
+
+### Service results vs. transport envelopes
+
+- A business operation returns a **domain result** (the values it computed) on success; the delivery layer maps that onto the transport/wire shape. The envelope (field names, status code, headers) is a delivery concern; the domain result is not.
+- **A value the business logic *reads to make a decision* is owned by the business layer** and returned by it — even if the response also echoes it back. Don't let the delivery layer independently re-derive it (two sources for one conceptual value is a latent bug). Canonical case: a "server now" timestamp used to compute staleness AND echoed to the client must be the *same* instant the business layer used.
+- A value that is **purely a transport artifact and never read by business logic** (a `Location`/redirect header, a per-response trace id) is owned by the delivery layer; the business layer never sees it.
+- Heuristic: "does business logic read this value to decide something?" — yes → business layer owns and returns it; no (formatting/transport only) → delivery layer owns it.
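The "server now" case can be sketched as follows, assuming a hypothetical device-staleness check. `DeviceStatus`, `get_device_status`, and `to_response` are made-up names, and the five-minute window is an arbitrary illustration. The business function returns the instant it used, and the delivery mapper echoes that same value instead of reading the clock again.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeviceStatus:
    server_now: datetime  # the exact instant the staleness decision used
    is_stale: bool


# Business layer: reads `now` to decide, so it owns and returns it.
def get_device_status(
    last_seen: datetime,
    now: datetime,
    stale_after: timedelta = timedelta(minutes=5),  # hypothetical threshold
) -> DeviceStatus:
    return DeviceStatus(server_now=now, is_stale=(now - last_seen) > stale_after)


# Delivery layer: maps the domain result onto the wire shape, no second clock read.
def to_response(status: DeviceStatus) -> dict:
    return {"serverNow": status.server_now.isoformat(), "isStale": status.is_stale}
```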
+
+## Static members (functions / classes)
+
+- Default to **instance types behind an interface**, injected — that is what is testable (mockable), swappable, and free of hidden global state. `static` is the exception, not the default.
+- **No business logic in a static function — ever.** `static` is for *mechanics* (convert, parse, compute, compare), never for *decisions* (which rule applies, what happens next). Domain decisions live in an injectable service.
+- `static` is appropriate **only** for: pure, stateless, **simple** functions (output depends solely on arguments — no I/O, clock, randomness, shared mutable state — and the body is short and obvious); constants; pure extension/utility helpers; static factory methods. The moment a would-be helper carries domain decisions, branches widely, or is complex enough to deserve its own test suite, make it an instance service.
+- **Never** use `static` for: business/domain logic; anything touching I/O, configuration, time, randomness, or external systems (that is a *service* — define an interface, inject it); or **mutable static state** (a thread-safety and test-isolation hazard — shared state belongs in a single injected instance, never a global mutable field).
+- Library-mandated process-global statics (a metrics registry, a logger handle) are an accepted exception; don't force them behind a bespoke interface.
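A rough illustration of the mechanics-versus-decisions split. Python has no `static` keyword at module level, so a plain module function stands in for a static helper here. `celsius_to_fahrenheit` and `OverheatPolicy` are hypothetical names; the clock is injected so that the decision stays testable.

```python
from datetime import datetime, timezone
from typing import Callable


def celsius_to_fahrenheit(celsius: float) -> float:
    """Pure mechanics: the output depends only on the argument."""
    return celsius * 9 / 5 + 32


class OverheatPolicy:
    """A domain decision: an instance with injected collaborators, not a static."""

    def __init__(self, limit_celsius: float, clock: Callable[[], datetime]) -> None:
        self._limit_celsius = limit_celsius
        self._clock = clock  # time is a service, so it is injected

    def evaluate(self, celsius: float) -> dict:
        return {
            "overheated": celsius > self._limit_celsius,
            "fahrenheit": celsius_to_fahrenheit(celsius),
            "checked_at": self._clock(),
        }
```

In production the clock could be `lambda: datetime.now(timezone.utc)`; a test passes a fixed instant instead.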
+
+## Error handling
+
+Builds on "never suppress errors silently" above. Use exceptions for *exceptional* conditions, not normal control flow.
+
+- **Catch in one place.** Centralize error→response mapping at a single boundary (framework exception handler / middleware / error filter), not via `try/catch` scattered through every method. The only legitimate local `catch` blocks: converting a third-party/framework error into a domain error at a boundary, honoring cancellation, or keeping a long-running loop alive (log-and-continue). Never an empty/silent catch.
+- **Three failure tiers, three treatments:**
+  1. **Input validation** → handled at the boundary/validation pipeline, returns a client-error status; do **not** throw for ordinary request-shape validation.
+  2. **Expected business-rule failures** (not-found, conflict, invariant violation, forbidden-by-rule) → a **typed domain failure**: a business-exception hierarchy **or** a result type — pick one per project and be consistent. Each failure carries the status it maps to; there is **no single blanket business status**: not-found → 404, state-conflict → 409, well-formed-but-invariant-violation → 422, rule-forbidden → 403.
+  3. **Unexpected failures** (bugs, infrastructure) → propagate to the central handler, which returns a **generic, opaque** error to the client (never leak internal messages/stack traces in production) and **logs the full error** with a correlation id. Dev environments may surface detail.
+- **Don't throw on hot per-item paths** (inner loops, per-record processing) — represent the outcome as a return value / counted metric there; exceptions are for request/operation-level outcomes.
+- Pick **one** failure-representation strategy project-wide (typed exceptions *or* a result type) and stick to it; don't mix both for the same kind of failure.
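One possible shape for tiers 2 and 3, sketched with the typed-exception strategy (a result type would be the other valid choice). All class names, slugs, and `run_at_boundary` are hypothetical. Tier 1 (input validation) is assumed to have run before the operation is invoked.

```python
import logging
from typing import Any, Callable

boundary_logger = logging.getLogger("boundary")


class BusinessError(Exception):
    """Base for expected business-rule failures; each subclass fixes its own status."""

    status: int
    slug: str


class NotFoundError(BusinessError):
    status, slug = 404, "not-found"


class ConflictError(BusinessError):
    status, slug = 409, "conflict"


class InvariantViolationError(BusinessError):
    status, slug = 422, "invariant-violation"


class RuleForbiddenError(BusinessError):
    status, slug = 403, "forbidden-by-rule"


def run_at_boundary(operation: Callable[[], Any], correlation_id: str) -> tuple[int, Any]:
    """The single place where errors are mapped to responses."""
    try:
        return 200, operation()
    except BusinessError as error:
        # Tier 2: expected failure, mapped to the status the type carries.
        return error.status, {"type": error.slug, "detail": str(error)}
    except Exception:
        # Tier 3: log the full error, return an opaque body to the client.
        boundary_logger.exception("Unhandled failure correlation_id=%s", correlation_id)
        return 500, {"type": "internal-error", "correlationId": correlation_id}
```

Business code raises the typed failure and never catches it; only the boundary decides what the client sees.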
+
+## Dependency injection
+
+- Prefer **constructor injection**: a type declares the collaborators it needs and they are provided. This is what makes it unit-testable and its dependencies explicit.
+- **Never capture a shorter-lived dependency inside a longer-lived one** (a request/scoped service held by a singleton — a "captive dependency"). Acquire the short-lived dependency per unit of work instead.
+- Don't manually dispose objects the DI container owns — the container manages their lifetime.
+
+## Configuration
+
+- **Bind configuration to typed objects** and **validate it at startup**, so misconfiguration is a boot-time crash, not a 3 AM runtime page.
+- Don't read raw config keys (`config["a:b"]`) inside business code — bind once, inject the typed object.
+- Secrets come from the environment / secret store per environment; never commit real secrets to source-controlled config files.
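A small sketch of bind-once, validate-at-startup configuration. `ReplaySettings`, `load_settings`, and the `REPLAY_*` variable names are invented for illustration. The idea is that a bad value raises while the process boots, and business code only ever receives the typed object.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ReplaySettings:
    base_url: str
    timeout_seconds: float

    def __post_init__(self) -> None:
        # Validation runs on construction, i.e. at startup when settings are loaded.
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError(f"REPLAY_BASE_URL must be an http(s) URL, got {self.base_url!r}")
        if self.timeout_seconds <= 0:
            raise ValueError("REPLAY_TIMEOUT_SECONDS must be positive")


def load_settings(env: Mapping[str, str]) -> ReplaySettings:
    """Bind raw keys exactly once; a missing required key raises KeyError here."""
    return ReplaySettings(
        base_url=env["REPLAY_BASE_URL"],
        timeout_seconds=float(env.get("REPLAY_TIMEOUT_SECONDS", "30")),
    )
```

A composition root would call `load_settings(os.environ)` once and inject the resulting `ReplaySettings` wherever it is needed.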
+
+## Logging (secrets & structure)
+
+Complements the log-level guidance in "Other preferences".
+
+- **Never log secrets, tokens, passwords, or PII.** Use ids, hashes, or redaction.
+- Prefer **structured logging with message templates / named fields** over string concatenation or interpolation — logs stay queryable and don't allocate when the level is disabled.
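For illustration, here is how the template-plus-named-fields idea looks with Python's standard `logging` module. `order_logger` and `log_order_placed` are hypothetical names. The template string stays constant, the values travel as arguments and as `extra` fields, and the message is only formatted if a handler actually emits the record.

```python
import logging

order_logger = logging.getLogger("orders")


def log_order_placed(user_id: int, order_id: str) -> None:
    # Template and arguments are passed separately: no f-string, no concatenation.
    # `extra` attaches named fields to the record for structured sinks.
    order_logger.info(
        "User %s placed order %s",
        user_id,
        order_id,
        extra={"user_id": user_id, "order_id": order_id},
    )
```

When the logger's level is above `INFO`, the call returns before any record is created, so the formatting work is skipped entirely. Only ids are logged here, never tokens or personal data.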
+
+## Data access
+
+- Route all application reads/writes through the project's **ORM / data-access layer**. Raw SQL is forbidden by default and allowed only for narrow, **justified** cases (DDL the ORM can't express, vendor-specific operators/functions, a benchmarked hot path) — each documented in a one-line comment and confined behind a single interface, nowhere else.
+- **Prevent N+1**: eager-load or project explicitly. For read-only queries, opt out of change-tracking where the data layer supports it.
+
+## Boundary discipline
+
+- **Don't pass the framework's request/response context** (HTTP context, raw request/response objects) into business logic. Extract the typed values you need at the boundary and pass those down.
+- **Authorize once at the boundary**, not per handler method; name authorization policies centrally and reference the names — don't inline role/permission strings at call sites.
+
+## Testing (real dependencies)
+
+Complements the AAA convention in "Other preferences".
+
+- **Don't use in-memory or fake data stores for query-correctness tests** — their semantics diverge from the real engine (translation differences, no real transactions/constraints). Use the real engine (e.g. a throwaway container) so tests exercise real behavior. Lightweight fakes are acceptable only for fast smoke tests that don't assert query shape.
+- Share expensive test fixtures (server boot, container) across tests instead of paying the cost per test.
@@ -19,7 +19,7 @@ globs: [".cursor/**"]
 - Kebab-case filenames

 ## Agent Files (.cursor/agents/)
- Must have `name` and `description` in frontmatter
+- The `.cursor/agents/` directory is intentionally empty. Per `.cursor/rules/no-subagents.mdc`, the main agent does not delegate to subagents in this workspace. Do not add agent files here without a corresponding rule change.

 ## Security
 - All `.cursor/` files must be scanned for hidden Unicode before committing (see cursor-security.mdc)
@@ -30,10 +30,11 @@ All rules and skills must reference the single source of truth below. Do NOT res

 | Concern | Threshold | Enforcement |
 |---------|-----------|-------------|
-| Test coverage on business logic | 75% | Aim (warn below); 100% on critical paths |
+| Test coverage on business logic | 75% | Aim (warn below); critical-path floor enforced separately (next row) |
+| Test coverage on critical paths | 90% floor / 100% aim | **90% is the enforcement floor** in CI gates, refactor verification, and release pre-flight. **100% is the aim** — drift below 100% but at-or-above 90% is acceptable; drift below 90% blocks. Critical paths = code paths where a bug would cause data loss, security breach, financial error, or system outage; identify from `acceptance_criteria.md` (must-have) and `_docs/00_problem/security_approach.md`. |
 | Test scenario coverage (vs AC + restrictions) | 75% | Blocking in test-spec Phase 1 and Phase 3 |
-| CI coverage gate | 75% | Fail build below |
+| CI coverage gate | 75% overall, 90% critical-path | Fail build below either threshold |
 | Lint errors (Critical/High) | 0 | Blocking pre-commit |
-| Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate |
+| Code-review auto-fix | Low + Medium (Style/Maint/Perf) + High (Style/Scope) | Critical and Security always escalate. Full categorization: see `.cursor/skills/implement/SKILL.md` § "Auto-Fix eligibility matrix" |

-When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number.
+When a skill or rule needs to cite a threshold, link to this table instead of hardcoding a different number. The full auto-fix eligibility matrix (severity × category) lives in `implement/SKILL.md`; cite that file rather than re-tabulating the matrix.
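As a hedged sketch, the two-threshold CI gate from the table could be evaluated like this. `coverage_gate_failures` and the constant names are hypothetical; the values mirror the table above, and a real gate would read them from one shared source rather than restating them.

```python
# Values mirror the thresholds table; they are restated here only for the sketch.
OVERALL_FLOOR_PCT = 75.0
CRITICAL_PATH_FLOOR_PCT = 90.0


def coverage_gate_failures(overall_pct: float, critical_path_pct: float) -> list[str]:
    """Return one message per violated threshold; an empty list means the gate passes."""
    failures = []
    if overall_pct < OVERALL_FLOOR_PCT:
        failures.append(
            f"overall coverage {overall_pct:.1f}% is below {OVERALL_FLOOR_PCT:.0f}%"
        )
    if critical_path_pct < CRITICAL_PATH_FLOOR_PCT:
        failures.append(
            f"critical-path coverage {critical_path_pct:.1f}% is below {CRITICAL_PATH_FLOOR_PCT:.0f}%"
        )
    return failures
```

Critical-path coverage between 90% and 100% passes the gate; the 100% aim is tracked as drift, not as a build failure.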
@@ -1,17 +1,293 @@
 ---
-description: ".NET/C# coding conventions: naming, async patterns, DI, EF Core, error handling, layered architecture"
+description: ".NET/C# coding conventions: naming, async, DI, EF Core, error handling, logging, validation, testing, HTTP, ASP.NET Core handler discipline"
 globs: ["**/*.cs", "**/*.csproj", "**/*.sln"]
 ---
 # .NET / C#

+## General
+
 - PascalCase for classes, methods, properties, namespaces; camelCase for locals and parameters; prefix interfaces with `I`
- Use `async`/`await` for I/O-bound operations; the `Async` suffix on method names is optional — follow the project's existing convention
- Use dependency injection via constructor injection; register services in `Program.cs`
- Use linq2db for small projects, EF Core with migrations for big ones; avoid raw SQL unless performance-critical; prevent N+1 with `.Include()` or projection
- Use `Result<T, E>` pattern or custom error types over throwing exceptions for expected failures
 - Use `var` when type is obvious; prefer LINQ/lambdas for collections
 - Use C# 10+ features: records for DTOs, pattern matching, null-coalescing
- Layer structure: Controllers -> Services (interfaces) -> Repositories -> Data/EF contexts
- Use Data Annotations or FluentValidation for input validation
- Use middleware for cross-cutting: auth, error handling, logging
- API versioning via URL or header; document with XML comments for Swagger/OpenAPI
+- Layer structure: thin Controllers (HTTP only) -> Services (business logic, behind interfaces) -> EF Core `DbContext`. See "Solution layout & layering" below for the project split.
+- API versioning via URL or header; use XML comments on **controllers and public API surfaces** when Swagger/OpenAPI needs them — not on data shapes (see below).
+- **Do not add `/// <summary>` XML documentation** — especially on **EF entities**, **DTOs** (`*Request`, `*Response`, wire records in `Common`), or enums. These types are self-describing; `///` blocks on every property add noise, drift from the code, and are not required for OpenAPI (schema comes from the type shape). Do not generate or paste them during refactors. Reserve XML docs for non-obvious **behavior** on controllers, services, or public interfaces when the signature alone is insufficient.
+
+## Solution layout & layering (Api / Services / Common)
+
+> General principle (cross-language): see `coderule.mdc` → "Architecture & layering › Layered separation of concerns". This section is the .NET realization.
+
+Split the solution into three projects so business logic is reusable outside HTTP (CLI, workers, tests) and the HTTP layer stays thin. Use the solution's own prefix for the project names (`*.Api`, `*.Services`, `*.Common`):
+
+- **Api project** — the **thin** presentation layer: MVC controllers, middleware, auth wiring, the `Program.cs` composition root, and DI registration. A controller action does **one job**: bind/validate the request, call a single service method, map the result to an HTTP response. **No business logic, no EF queries, no orchestration** in the API layer. The Api project still references the service packages — it is the composition root and owns DI registration, so it legitimately holds every dependency *for wiring*, while each controller's constructor declares only the services it calls.
+- **Services project** — all business logic, behind interfaces (`IXxxService`). Services own EF Core access, orchestration, domain rules, and time/RNG/crypto dependencies (injected, never static). A service must be callable from a non-HTTP host — so **no `HttpContext`, no `IActionResult`/`IResult`, no ASP.NET types** may appear in a service signature or body.
+- **Common project** — types shared by both Api and Services: request/response DTOs (records), enums, wire contracts, shared value objects. No EF, no ASP.NET, no service logic. Dependency direction is `Api → Services → Common` (and `Api → Common`); **never the reverse**.
+
+Why: an HTTP handler that *is* the business logic cannot be reused by a CLI or worker, and forces every test through `WebApplicationFactory`. Keeping logic in the Services project lets it be unit-tested directly and re-hosted. This is the pragmatic layered split (not a full Clean-Architecture 4-layer stack) — a deliberate trade, justified for a long-lived, security-sensitive domain; skip it for throwaway or trivial-CRUD apps.
+
+- **MVC controllers are the API style here**, not Minimal APIs. Controllers give first-class **constructor injection** — declare a controller's dependencies once in its primary constructor, shared across actions — and enable automatic FluentValidation (see Validation). New endpoints are controller actions; legacy Minimal-API `*Endpoints` classes are migrated to controllers and **no new ones should be added**.
+- **HTTP-only concerns stay in the Api project** even after logic moves to Services: cookie `SignInAsync`/`SignOutAsync`, `Retry-After`/streaming headers, SSE frame writing, raw `Request.Body` framing. These are genuinely HTTP and must NOT be pushed into a service.
+
+## Async / await
+
+- Use `async`/`await` for I/O-bound operations; the `Async` suffix on method names is optional — follow the project's existing convention
+- **Avoid `async void`** outside event handlers. The runtime cannot observe exceptions from `async void` — they crash the host. Always return `Task`/`Task<T>` and `await` the call.
+- **Never block on async code** with `.Result`, `.Wait()`, or `.GetAwaiter().GetResult()` in any ASP.NET Core code path. Use `await`. Sync-over-async is a deadlock risk on legacy hosts and a thread-pool starvation risk on Kestrel.
+
+## Dependency injection
+
+> General principle (cross-language): see `coderule.mdc` → "Dependency injection". Below is the .NET realization.
+
+- Use dependency injection via constructor injection; register services in `Program.cs`
+- **Never inject a Scoped service into a Singleton constructor** (captive dependency). Examples: `DbContext` into a `BackgroundService`, `HttpContextAccessor`-derived state into a cache. Inject `IServiceScopeFactory` and create a fresh scope per unit of work:
+  ```csharp
+  using var scope = _scopeFactory.CreateScope();
+  var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
+  ```
+- Don't manually `Dispose` services resolved from the DI container — the container disposes them at scope/app shutdown.
+
+## Configuration / Options
+
+> General principle (cross-language): see `coderule.mdc` → "Configuration". Below is the .NET realization.
+
+- Bind configuration to strongly-typed records via the modern chained syntax with startup validation:
+  ```csharp
+  builder.Services
+      .AddOptions<FooSettings>()
+      .BindConfiguration("Foo")
+      .ValidateDataAnnotations()
+      .ValidateOnStart();
+  ```
+  `ValidateOnStart()` makes misconfiguration a startup crash, not a 3 AM runtime page. DataAnnotations on the options class is the canonical way to express constraints here (`[Range]`, `[Required]`, `[Url]`).
+- Don't read `IConfiguration["Foo:Bar"]` directly in business code. Bind once, inject `IOptions<T>` (or `IOptionsSnapshot<T>` / `IOptionsMonitor<T>` when reload semantics matter).
+- Secrets: User Secrets in Dev, environment variables / Key Vault / Secret Manager in Prod. Never commit real secrets to `appsettings.*.json`.
+
+## Logging
+
+> General principle (cross-language): see `coderule.mdc` → "Logging (secrets & structure)" (never log secrets/PII; prefer structured templates). Below is the .NET realization.
+
+- **Never use `$"..."` interpolation inside `ILogger.Log*` calls.** It allocates regardless of log level and breaks structured logging. Use template parameters (`logger.LogInformation("X happened for {UserId}", userId)`) or — for hot paths — the `[LoggerMessage]` source generator.
+- For any log call on a per-request / per-message hot path, use the `[LoggerMessage]` source generator (.NET 6+). Zero allocation when the level is disabled, no boxing, compile-time placeholder validation:
+  ```csharp
+  public partial class MyService(ILogger<MyService> logger)
+  {
+      [LoggerMessage(EventId = 1001, Level = LogLevel.Information,
+          Message = "User {UserId} placed order {OrderId}")]
+      private partial void LogOrderPlaced(int userId, string orderId);
+  }
+  ```
+  The older `LoggerMessage.Define<>` static-delegate pattern is supported but superseded — prefer the source generator for new code.
+- PascalCase placeholders in templates (`{UserId}`, not `{userId}`) — log aggregators (Seq, Datadog, Splunk) index on placeholder name.
+- Never log secrets, full bearer tokens, passwords, or PII. Use IDs, hashes, or redaction.
+- **Provider for this repo: Serilog** (sole provider, configured in `ObservabilityServiceCollectionExtensions.ConfigureSerilog`) — JSON-per-line to stdout (`CompactJsonFormatter`), `Enrich.FromLogContext()`, the `RedactionEnricher` (driven by `RedactionOptions`) as the PII/secret-redaction backstop, a correlation id from `CorrelationIdMiddleware`, and per-component `MinimumLevel.Override` from `LoggingOptions`. Log through `ILogger<T>` (do not call Serilog's static `Log.*` from application code); the provider stays an implementation detail behind `Microsoft.Extensions.Logging`. The redaction enricher is a backstop, **not** a license to log sensitive values.
+
+## Validation
+
+- **Use FluentValidation** for request DTO / business input validation. Register validators with `services.AddValidatorsFromAssemblyContaining<MarkerType>()`.
+- **Controllers: rely on automatic validation.** Add `AddFluentValidationAutoValidation()` (from `SharpGrip.FluentValidation.AutoValidation.Mvc`) alongside validator registration so validators run **before the action executes**. **Do not** call `await validator.ValidateAsync(...)` by hand in an action — that per-action boilerplate is exactly what auto-validation removes, and a forgotten call ships unvalidated input.
+  - **Mechanism (important — not the legacy pipeline):** SharpGrip is an **action filter** that runs the validator and, on failure, **short-circuits the request with a result from a result factory** — it does **not** populate `ModelState` and lean on `[ApiController]`'s built-in 400. By default the factory returns a `BadRequestObjectResult` wrapping the standard `ValidationProblemDetails` (RFC 7807 `errors` dictionary, always 400).
+  - **Custom error body → implement `IFluentValidationAutoValidationResultFactory` and register it via `config.OverrideDefaultResultFactoryWith<T>()`.** Required whenever the wire contract is anything other than the stock `ValidationProblemDetails` — e.g. this project's slug-keyed `problem+json` (`type = .../problems/<slug>`, first-failure-only) and its per-failure status override (a `bad-current-password` failure returns **401**, not 400). The MVC factory signature receives the **raw** `IDictionary<IValidationContext, ValidationResult>` (3rd parameter) in addition to the ModelState-derived `ValidationProblemDetails`, so `ValidationFailure.ErrorCode` (the slug) and `ValidationFailure.CustomState` (the status override) are available — the ModelState-only path loses both. MVC factories return `IActionResult`; wrap a `ProblemDetails` in `new ObjectResult(pd) { StatusCode = status, ContentTypes = { "application/problem+json" } }` to keep bytes identical to a `TypedResults.Problem(...)` body.
+  - The old `FluentValidation.AspNetCore` built-in auto-validation (the ASP.NET **validation-pipeline** mode, `services.AddFluentValidation(...)`) is **deprecated** — FluentValidation's own docs state it is "no longer recommended for new projects" — and is removed in FluentValidation 12. SharpGrip's action filter is the upstream-blessed automatic successor and runs **async** (the pipeline mode was sync-only, a problem for DB-lookup rules). FluentValidation's *other* recommended path is plain **manual** `ValidateAsync` — acceptable, but rejected here because it repeats the validate/return boilerplate in every action.
+  - .NET 10's native `AddValidation()` is **Minimal-API + DataAnnotations + synchronous only** — not a substitute for FluentValidation here.
+- Invoke a validator explicitly **only** for a rule that cannot run in the model pipeline (e.g. it needs a service result already fetched inside the action). Keep that the exception, not the norm.
+- DataAnnotations are acceptable on Options classes (paired with `.ValidateDataAnnotations()` per the Options section) and on simple non-FluentValidation property checks. Don't mix the two for the **same** DTO.
+
+## JSON serialization (property naming)
+
+- **Set the wire naming convention once, globally**, via `JsonSerializerOptions.PropertyNamingPolicy` — never by decorating every property. The convention is **lower camelCase** (`JsonNamingPolicy.CamelCase`) — the ASP.NET Core Web default and the idiomatic JS/TS-friendly shape. Configure it once in the composition root:
+  ```csharp
+  // Minimal-API / endpoint serialization
+  builder.Services.ConfigureHttpJsonOptions(o =>
+      o.SerializerOptions.PropertyNamingPolicy = JsonNamingPolicy.CamelCase);
+  // MVC controllers
+  builder.Services.AddControllers()
+      .AddJsonOptions(o => o.JsonSerializerOptions.PropertyNamingPolicy = JsonNamingPolicy.CamelCase);
+  ```
+  DTO members stay plain PascalCase C# (`ServerNow`, `DeviceId`) and serialize **and deserialize** as `serverNow`, `deviceId` automatically.
+- **Migration note (BREAKING — not behavior-preserving).** The contract historically shipped `snake_case` (`server_now`, `device_id`, …), consumed raw by the SPA (`web/`), the TS types, E2E/blackbox tests, `TestCommon` DTOs, seed fixtures, and `_docs/`. Flipping the policy to camelCase renames **every field on the wire**, so it is a breaking change tracked as **its own ticket** and must land **atomically** with the SPA + tests + fixtures + docs update (and an API version bump). Do **not** flip the policy — or strip the snake_case attributes — in isolation, and never inside a "behavior-preserving" refactor task.
+- **`[JsonPropertyName("...")]` is for overrides only — names the global policy cannot derive — never the default way to set casing.** It always wins over the policy, so reach for it ONLY when:
+  - the wire name is **irregular** vs. what the policy produces, e.g. acronym casing: the CamelCase policy lowercases a leading run of capitals but leaves a trailing acronym untouched (`IPAddress` → `ipAddress`, but `DeviceID` → `deviceID`) when the contract wants `deviceId`, or an external contract demands an exact string we don't control;
+  - the wire name is **not a valid C# identifier** or otherwise inexpressible by any policy.
+- Decorating every property with `[JsonPropertyName("...")]` to emulate a global policy is a **code-review-fail signal**: it is noise, it drifts, and it silently shadows the policy. If a whole DTO's attributes merely restate what the policy would produce, delete them and rely on the policy.
+- Enum string values use a `JsonStringEnumConverter`; keep its naming policy consistent with the property policy.
+- Grounding: Microsoft's System.Text.Json docs recommend the global `PropertyNamingPolicy` for project-wide conventions and reserve `[JsonPropertyName]` for exact-string overrides (it takes highest precedence and overrides the policy).
+
+## Error handling
+
+> General principle (cross-language): see `coderule.mdc` → "Error handling". This section is the .NET realization (the three-tier model, central handler, opaque-500, and status mapping all originate there).
+
+This project uses a **business-exception model with one central handler** — *not* `Result<T,E>` and *not* per-method `try/catch`. Three failure tiers, three treatments:
+
+1. **Input validation** — handled by the **auto-validation action filter, never by throwing.** FluentValidation auto-validation (see Validation) short-circuits the request before the action runs and returns the `400` (slug-keyed `problem+json` via the custom result factory). Do **not** raise a `ValidationException` for request-shape validation.
+2. **Business-rule violations** (expected, part of the API contract: not-found, conflict, invariant violation, forbidden-by-rule) — the service **throws a `BusinessException` subtype**. Services express failure by throwing; they do **not** return error-wrapper values and do **not** catch their own business exceptions.
+3. **Unexpected failures** (bugs — NRE, invariant breaks; infrastructure — DB unreachable, network) — thrown by the framework/runtime and left to **propagate** to the central handler.
+
+### Business exception hierarchy
+
+- A single abstract base — `abstract class BusinessException : Exception` — carries the HTTP mapping data: an `int Status` and a stable `string Slug` (and optional extension members). Every expected, contract-level failure is a concrete subtype that fixes its own status; **there is no single blanket business status code**:
+  - not-found → `404`
+  - state conflict (duplicate key, concurrent edit, illegal state transition) → `409`
+  - well-formed request that violates a business invariant → `422`
+  - forbidden by a business rule (not auth-scheme denial) → `403`
+- The `Slug`/`Status`/title **must reuse the existing `FleetViewerProblems` slug catalog** (`Common/Problems/`) so the `application/problem+json` wire contract (`type` URI, `title`, `status`, any `code` extension) stays byte-identical to what blackbox tests pin. The catalog stays the single source of truth for the error contract; the exception types reference it.
+- Choose `422` vs `409` by meaning, never interchangeably: `422` = the request is well-formed but the business invariant rejects it; `409` = it conflicts with the resource's current state.
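+
+The hierarchy above could be sketched roughly as follows. `ExampleSlugs` is a hypothetical stand-in for the real `FleetViewerProblems` catalog (whose members are not shown here), and the two subtypes are invented for illustration.
+
+```csharp
+using System;
+
+// Hypothetical stand-in for the slug catalog; real slugs come from FleetViewerProblems.
+public static class ExampleSlugs
+{
+    public const string DeviceNotFound = "device-not-found";
+    public const string DeviceAlreadyRegistered = "device-already-registered";
+}
+
+// The base type carries only the HTTP mapping data the central handler needs.
+public abstract class BusinessException(int status, string slug, string message) : Exception(message)
+{
+    public int Status { get; } = status;
+    public string Slug { get; } = slug;
+}
+
+// Each concrete subtype fixes its own status: not-found is 404 ...
+public sealed class DeviceNotFoundException(Guid deviceId)
+    : BusinessException(404, ExampleSlugs.DeviceNotFound, $"Device {deviceId} was not found.");
+
+// ... and a state conflict is 409.
+public sealed class DeviceAlreadyRegisteredException(Guid deviceId)
+    : BusinessException(409, ExampleSlugs.DeviceAlreadyRegistered, $"Device {deviceId} is already registered.");
+```
+
+A service would then simply `throw new DeviceNotFoundException(id);` and leave the response mapping to the central handler.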
+
+### Central handler (catch in exactly one place)
+
+- Register **one** `IExceptionHandler` via `builder.Services.AddExceptionHandler<...>()` + `AddProblemDetails()` + `app.UseExceptionHandler()`. It maps:
+  - `BusinessException` → `ProblemDetails` built from its `Status` + `FleetViewerProblems.TypePrefix + Slug` (+ extensions). **Do NOT log these as errors** — they are expected 4xx contract outcomes; at most a `Debug`/`Information` line. Logging them at `Error` pollutes the error rate and pages on-call for normal client mistakes.
+  - **everything else (unexpected)** → `500` `ProblemDetails` with a **fixed, opaque production body** — `title: "Unexpected error"`, `detail: "An unexpected error occurred. Our team has been notified."` — and **log the full exception to Serilog at `Error`** (`logger.LogError(ex, ...)`) with the correlation id, so the log entry correlates to the client's response. The body must **never** carry the exception message, stack trace, or any internal detail (information-disclosure risk). In `Development` only, it is acceptable to surface `ex.Message`/stack in the body to aid debugging — gate that on `IHostEnvironment.IsDevelopment()`.
+- **No per-method `try/catch` for error mapping.** A handler/controller does not catch business exceptions to turn them into responses — that is the central handler's only job. Legitimate local `catch` blocks remain only for: converting a third-party/framework exception into a `BusinessException` at a boundary, honoring `OperationCanceledException`, or keeping a background loop alive (catch-log-continue). Never an empty/silent catch (see `coderule.mdc`).
+- **Do not throw on hot per-item paths** (e.g. ingest per-record processing): exceptions are for request-level outcomes, not inner loops — return/skip with a counted metric there.
+- API error responses are always `ProblemDetails` (RFC 7807, since superseded by RFC 9457) with a stable slug `type` when the failure is part of the contract.
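+
+A rough sketch of such a handler, assuming ASP.NET Core 8+. The `TypePrefix` constant is a hypothetical placeholder for `FleetViewerProblems.TypePrefix`, `HttpContext.TraceIdentifier` is used as an assumed correlation id, and the catalog-driven `title` is left out because the catalog's shape is not shown here.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+using Microsoft.AspNetCore.Diagnostics;
+using Microsoft.AspNetCore.Http;
+using Microsoft.AspNetCore.Mvc;
+using Microsoft.Extensions.Hosting;
+using Microsoft.Extensions.Logging;
+
+// Minimal shape of the base type, included only so the sketch compiles on its own.
+public abstract class BusinessException(int status, string slug) : Exception(slug)
+{
+    public int Status { get; } = status;
+    public string Slug { get; } = slug;
+}
+
+public sealed class CentralExceptionHandler(
+    ILogger<CentralExceptionHandler> logger, IHostEnvironment env) : IExceptionHandler
+{
+    // Hypothetical prefix; the real value is FleetViewerProblems.TypePrefix.
+    private const string TypePrefix = "https://example.invalid/problems/";
+
+    public async ValueTask<bool> TryHandleAsync(
+        HttpContext httpContext, Exception exception, CancellationToken cancellationToken)
+    {
+        ProblemDetails problem;
+        if (exception is BusinessException business)
+        {
+            // Expected 4xx contract outcome: log quietly, never at Error.
+            logger.LogInformation("Business rule rejected the request: {Slug}", business.Slug);
+            problem = new ProblemDetails { Status = business.Status, Type = TypePrefix + business.Slug };
+        }
+        else
+        {
+            // Unexpected: full exception goes to the log, the body stays opaque outside Development.
+            logger.LogError(exception, "Unhandled exception, trace {TraceId}", httpContext.TraceIdentifier);
+            problem = new ProblemDetails
+            {
+                Status = StatusCodes.Status500InternalServerError,
+                Title = "Unexpected error",
+                Detail = env.IsDevelopment()
+                    ? exception.ToString()
+                    : "An unexpected error occurred. Our team has been notified.",
+            };
+        }
+
+        httpContext.Response.StatusCode = problem.Status ?? StatusCodes.Status500InternalServerError;
+        await httpContext.Response.WriteAsJsonAsync(
+            problem, options: null, contentType: "application/problem+json", cancellationToken: cancellationToken);
+        return true; // handled: nothing further in the handler chain should run
+    }
+}
+```
+
+Registration would follow the three calls named above: `AddExceptionHandler<CentralExceptionHandler>()`, `AddProblemDetails()`, and `app.UseExceptionHandler()`.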
+
+## HttpClient
+
+- **Never `new HttpClient()` per request** (each disposed client leaves its socket in `TIME_WAIT`, for up to ~240s depending on the OS; under load you exhaust the ephemeral port range).
+- **Never use a naive `static HttpClient`** either (handlers don't rotate, DNS changes are missed).
+- Register via `IHttpClientFactory` — typed or named clients:
+  ```csharp
+  builder.Services.AddHttpClient<MyApiClient>(c => c.BaseAddress = new Uri("https://api.example.com"));
+  ```
+- **Don't capture a typed `HttpClient` in a singleton.** Typed clients are Transient; capturing one in a singleton defeats handler rotation. Inject `IHttpClientFactory` into the singleton and call `CreateClient(name)` per operation, **or** configure `SocketsHttpHandler.PooledConnectionLifetime` so DNS refreshes at the socket level instead of the factory level.
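+
+One possible shape for the singleton case, assuming the `Microsoft.Extensions.Http` package. `GeoLookupClient`, the `"geo"` client name, and the `regions/...` path are hypothetical.
+
+```csharp
+using System;
+using System.Net.Http;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Safe to register as a singleton: it holds the factory, never an HttpClient,
+// so the factory can keep rotating the underlying handlers.
+public sealed class GeoLookupClient(IHttpClientFactory factory)
+{
+    public const string ClientName = "geo"; // hypothetical named client
+
+    public async Task<string> GetRegionAsync(string deviceId, CancellationToken ct)
+    {
+        // Creating a client per operation is cheap; the handlers behind it are pooled.
+        var client = factory.CreateClient(ClientName);
+        return await client.GetStringAsync($"regions/{Uri.EscapeDataString(deviceId)}", ct);
+    }
+}
+```
+
+The named client would be registered once, e.g. `builder.Services.AddHttpClient(GeoLookupClient.ClientName, c => c.BaseAddress = new Uri("https://api.example.com"));`, next to `AddSingleton<GeoLookupClient>()`.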
+
+## Modern C# / nullable reference types
+
+- Enable nullable reference types (`<Nullable>enable</Nullable>`) on every new project.
+- **Don't paper over NRT warnings with `!`** (null-forgiving operator). Prefer:
+  - `required` members (C# 11) for properties the caller must initialize via object initializer.
+  - Constructor parameters for invariants established at construction.
+  - `[NotNullWhen(true)]` / `[NotNull]` / `[MaybeNull]` attributes for `Try*` patterns.
+- Use `ArgumentNullException.ThrowIfNull(x)` at the top of any public method taking a reference-type argument. NRTs are design-time only; library entry points still need runtime guards.
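+
+The three alternatives to `!` can be seen together in a small sketch. `DeviceRegistration` and `SerialNumbers` are hypothetical names for illustration.
+
+```csharp
+using System;
+using System.Diagnostics.CodeAnalysis;
+
+public sealed class DeviceRegistration
+{
+    // `required` forces callers to set these in the object initializer, so no `= null!` is needed.
+    public required string Name { get; init; }
+    public required string SerialNumber { get; init; }
+}
+
+public static class SerialNumbers
+{
+    // [NotNullWhen(true)] tells the compiler `normalized` is non-null whenever this returns true.
+    public static bool TryNormalize(string? raw, [NotNullWhen(true)] out string? normalized)
+    {
+        normalized = string.IsNullOrWhiteSpace(raw) ? null : raw.Trim().ToUpperInvariant();
+        return normalized is not null;
+    }
+
+    public static string Describe(DeviceRegistration registration)
+    {
+        // NRTs are compile-time only, so a public entry point still guards at runtime.
+        ArgumentNullException.ThrowIfNull(registration);
+        return $"{registration.Name} ({registration.SerialNumber})";
+    }
+}
+```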
+
+## Static classes and static members
+
+> General principle (cross-language): see `coderule.mdc` → "Static members (functions / classes)". Below is the .NET realization plus framework-specific exemptions.
+
+Default to **instance classes behind an interface, registered in DI and constructor-injected.** That is what makes a unit testable (mockable), swappable, and free of hidden global state. `static` is the exception, not the default — reach for it only when the alternative below clearly applies.
+
+**No business logic in a static method — ever.** `static` is for *mechanics* (convert, parse, compute, compare), never for *decisions* (what the system should do, which rule applies, what happens next). Domain logic lives in a service.
+
+- **`static` is appropriate ONLY for:**
+  - **Pure, stateless, and SIMPLE functions** — output depends solely on the arguments; no I/O, no clock, no `Random`/`Guid.NewGuid`, no DB/file/network, no mutable shared state; **and** the body is short and obvious (math, encoding/decoding, parsing, formatting, a small predicate). Simplicity — not purity alone — is the bar: the moment a would-be helper carries domain decisions, branches across many cases, or is complex enough to deserve its own unit-test suite, it stops being a "helper." Make it an **instance service behind an interface** so it is injectable, mockable by its collaborators, and discoverable. A complicated *pure* function still belongs in a service.
+  - **Extension methods** over framework or domain types, when the body is pure and simple (e.g. claim/identity readers, enum⇄wire mappers).
+  - **Constants / well-known values** (a `static class` holding `const`s).
+  - **Static factory methods** on a type (private ctor + `public static Create(...)` returning a fully-formed instance) — an accepted construction pattern, distinct from a static *service*.
+- **Never use `static` for:**
+  - **Business / domain logic of any kind**, even if currently it looks "pure." Decisions belong in a tested, injectable service.
+  - A helper that touches I/O, configuration, time, randomness, or any external system — that is a *service*. Define an interface, make it an instance class, inject it. A static method that reaches a DB/clock/file cannot be mocked and forces brittle integration-style tests.
+  - **Mutable static fields of any kind.** Global mutable state is a thread-safety and test-isolation hazard. A cache or in-memory state store belongs in a DI **singleton behind an interface**, never a `static Dictionary`.
+  - Avoiding `new`/DI "ceremony." DI registration is one line and buys testability; saving it is never a reason to go static.
+- **Controllers are instance classes (constructor DI), not static.** A controller is `[ApiController] public sealed class XxxController(IXxxService svc) : ControllerBase { ... }` — dependencies are constructor-injected, actions are thin, and the type is never `static`. This is the standard for all new HTTP code (see "Solution layout & layering").
+- **Transitional exemption — legacy Minimal-API endpoint classes.** Existing `internal static class XxxEndpoints` exposing `MapXxxEndpoints(this RouteGroupBuilder group)` + `static` handler methods are the idiomatic *Minimal-API* pattern (no static state; deps are per-request method parameters; testable via `WebApplicationFactory`) and are **not** a static-class violation **while they exist**. Where the codebase has chosen controllers, migrate them and do **not** add new ones; until migrated, keep handler bodies thin with logic in injected services.
+- The static-OK rule also covers framework callback types that the runtime instantiates or invokes by convention — `AuthenticationHandler<TOptions>`, middleware `InvokeAsync`, `CookieAuthenticationEvents`, route predicates. They legitimately receive `HttpContext`/framework primitives and are not "static-class" or "HttpContext-discipline" violations.
+- **Library-mandated process-global statics are an accepted exception.** Some libraries are *designed* around a process-global, thread-safe static registry — e.g. a metrics library's `static readonly` counter/gauge collectors, or a `static` logger handle. Those `static readonly` fields are not the "mutable static state" this rule bans; do not force them behind a bespoke interface. A stateless utility over the system CSPRNG is likewise acceptable as `static` (folding it behind an interface for consistency with sibling generators is a fine choice, not a requirement).
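+
+To make the mechanics-versus-decisions split concrete, here is a hedged sketch assuming .NET 8's `TimeProvider`. All type names and the five-minute threshold are invented for illustration.
+
+```csharp
+using System;
+
+// Mechanics: pure, stateless, short. Static is fine here.
+public static class Durations
+{
+    public static string ToShortString(TimeSpan age) =>
+        age.TotalMinutes < 1 ? $"{(int)age.TotalSeconds}s" : $"{(int)age.TotalMinutes}m";
+}
+
+// Decision: "is this device stale?" reads the clock and encodes a rule, so it is a service.
+public interface IStalenessPolicy
+{
+    bool IsStale(DateTimeOffset lastSeen);
+}
+
+public sealed class StalenessPolicy(TimeProvider clock) : IStalenessPolicy
+{
+    // Assumed threshold, for illustration only.
+    private static readonly TimeSpan Threshold = TimeSpan.FromMinutes(5);
+
+    public bool IsStale(DateTimeOffset lastSeen) => clock.GetUtcNow() - lastSeen > Threshold;
+}
+
+// Test double: because the clock is injected, a test can pin "now" without touching global state.
+public sealed class FixedClock(DateTimeOffset now) : TimeProvider
+{
+    public override DateTimeOffset GetUtcNow() => now;
+}
+```
+
+Had `IsStale` been a static method calling `DateTimeOffset.UtcNow`, the only way to test the threshold would be to wait or to fabricate timestamps relative to the real clock.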
+
+## Data access (EF Core)
+
+> General principle (cross-language): see `coderule.mdc` → "Data access" (single ORM path, justify raw SQL, prevent N+1). Below is the EF Core realization.
+
+- **Use the project ORM (EF Core for this repo) as the ONLY data-access path for application reads/writes.** Raw SQL via `CommandText`, `FromSqlRaw`, `FromSqlInterpolated`, `ExecuteSqlRaw`, `ExecuteSqlInterpolated`, or `NpgsqlCommand`/`NpgsqlConnection.CreateCommand()` is **forbidden by default** in endpoint, service, and repository code. Reaching for raw SQL because "it's simpler" or "EF generates ugly SQL" is not a valid reason — write the LINQ query, profile if you must, and only then justify a workaround.
+  - Narrow exceptions (each requires a 1-line comment in the code naming the EF limitation being worked around):
+    - **DDL the ORM cannot express** — `CREATE EXTENSION`, vendor enum-cast DEFAULT (`HasDefaultValueSql("'active'::device_state")`). Confine to migrations or to one-shot `IHostedService.StartAsync` bootstrap hooks.
+    - **Vendor-specific operators / functions** (e.g., TimescaleDB `time_bucket`, `make_interval(secs => ...)`, hypertable functions, PostGIS `ST_*`). Wrap each operator in a single repository method behind an interface; nowhere else in the codebase touches raw SQL for that operator. Prefer EF Core function mapping (`HasDbFunction` + `[DbFunction]`) before falling back to `FromSqlInterpolated`.
+    - **Benchmarked hot path** where EF demonstrably generates a worse plan than hand-rolled SQL. Requires a `BenchmarkDotNet` file checked in next to the workaround proving the gap. "We think it's faster" is not a benchmark.
+  - Prevent N+1 with `.Include()` / projection / explicit `.Select()`. New raw-SQL sites that do not fit one of the three exceptions MUST be flagged in code review as **High** severity (Maintainability / Architecture). Reviewers reject the PR until the SQL is either replaced with LINQ or moved behind a justified repository method with the required comment.
+- **`AsNoTracking()` on every read-only query.** The change tracker costs ~50% more memory and 2.9–5.2× more time on typical reads; you pay it for nothing on `GET` endpoints, reports, lookups. For read-heavy services, set `QueryTrackingBehavior.NoTracking` as the DbContext default and opt **in** to tracking with `.AsTracking()` on update paths.
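+
+A small sketch of the read and write paths, assuming EF Core with any relational provider. The entities, `FleetDbContext`, and `DeviceQueries` are hypothetical, and not-found handling is omitted for brevity.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Threading;
+using System.Threading.Tasks;
+using Microsoft.EntityFrameworkCore;
+
+public sealed class Device
+{
+    public Guid Id { get; set; }
+    public string Name { get; set; } = "";
+    public List<Reading> Readings { get; } = new();
+}
+
+public sealed class Reading
+{
+    public long Id { get; set; }
+    public Guid DeviceId { get; set; }
+}
+
+public sealed class FleetDbContext(DbContextOptions<FleetDbContext> options) : DbContext(options)
+{
+    public DbSet<Device> Devices => Set<Device>();
+}
+
+public sealed record DeviceSummary(Guid Id, string Name, int ReadingCount);
+
+public sealed class DeviceQueries(FleetDbContext db)
+{
+    // Read-only entity query: skip the change tracker.
+    public Task<List<Device>> FindByNameAsync(string name, CancellationToken ct) =>
+        db.Devices.AsNoTracking().Where(d => d.Name == name).ToListAsync(ct);
+
+    // Projection: the count is computed in the same SQL statement, so there is no N+1.
+    public Task<List<DeviceSummary>> ListSummariesAsync(CancellationToken ct) =>
+        db.Devices
+            .OrderBy(d => d.Name)
+            .Select(d => new DeviceSummary(d.Id, d.Name, d.Readings.Count))
+            .ToListAsync(ct);
+
+    // Update path: opt in to tracking so SaveChanges sees the modification.
+    public async Task RenameAsync(Guid id, string name, CancellationToken ct)
+    {
+        var device = await db.Devices.AsTracking().SingleAsync(d => d.Id == id, ct);
+        device.Name = name;
+        await db.SaveChangesAsync(ct);
+    }
+}
+```
+
+The explicit `.AsTracking()` matters only when the context default has been switched to `QueryTrackingBehavior.NoTracking`; under the stock default it is redundant but harmless.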
+
+## ASP.NET Core handler discipline (controllers)
+
+> General principle (cross-language): see `coderule.mdc` → "Boundary discipline" (don't leak request/response context into business logic; authorize once at the boundary). Below is the ASP.NET Core realization.
+
+These rules keep controller actions and services free of framework primitives that hide dependencies, defeat unit testing, and bypass the auth/binding pipelines the framework already gives you. (They also apply to the legacy Minimal-API handlers still being migrated.)
+
+### `HttpContext` discipline
+
+- **Do not pass `HttpContext`, `HttpRequest`, `HttpResponse`, or `IHttpContextAccessor` into services or repositories.** Extract the values you need (headers, route values, body, `ClaimsPrincipal`) inside the handler and pass them down as typed parameters.
+- Take `HttpContext` (or `HttpRequest`/`HttpResponse`) as a handler parameter **only** when no binding source can express the requirement. Concrete examples that justify it:
+  - Custom body framing or streaming (you read `Request.Body`/`BodyReader` yourself).
+  - Multiple discriminated payload shapes on one URL that cannot be one DTO.
+  - Pre-allocation size caps that must reject **before** the body materializes into objects.
+  - Writing a custom response envelope that doesn't fit `Results.*`/`TypedResults.*`.
+  Document the reason with a `//` comment on the parameter or above the method.
+- Prefer **separate endpoints/methods** over discriminated payload shapes on one URL. Only fuse them when splitting would duplicate the majority of the validation logic — otherwise you trade testability for one fewer route registration, which is rarely worth it.
+- Default to specific binding sources: `[FromBody]`, `[FromQuery]`, `[FromHeader]`, `[FromRoute]`, `[FromServices]`, `ClaimsPrincipal user`, `CancellationToken cancellationToken`. Each of those is documented, testable, and integrates with OpenAPI.
+
+### JSON deserialization
+
+- **Default to `[FromBody]` + a typed `record`/DTO.** The framework calls `JsonSerializer.DeserializeAsync` for you, validates `Content-Type`, surfaces `BadHttpRequestException` on malformed input, and produces OpenAPI metadata.
+- Direct `JsonDocument` / `Utf8JsonReader` parsing of `Request.Body` is allowed **only** when typed deserialization cannot express the required validation. Allowed reasons:
+  - **Typed slug-keyed error envelopes** that the standard binder cannot produce (e.g., per-field problem+json with a stable `type` URI).
+  - **Pre-allocation size caps** that must reject `batch-too-large` before the array materializes.
+  - **Shape discrimination at parse time** when the alternative is a single fat DTO + runtime branching.
+  Each site needs a one-line comment naming which exception applies.
+- Reading raw `Request.Body` for plain typed JSON content is a code-review-fail signal in the absence of one of the named exceptions.
+
+### Custom authentication schemes
+
+- Custom bearer/token/API-key schemes go through **`AuthenticationHandler<TOptions>`** registered via `AddAuthentication().AddScheme<TOptions, THandler>(name, …)`. Apply `.RequireAuthorization(new AuthorizeAttribute { AuthenticationSchemes = name })` or `[Authorize(AuthenticationSchemes = name)]` on the endpoint.
+- **Do not read `Authorization` / cookie / API-key headers manually inside a handler that is `.AllowAnonymous()`.** That bypasses the auth pipeline, makes the auth logic unreusable for any second endpoint, and forces tests to reach the logic via reflection.
+- If you need a custom 401/403 body envelope (e.g. typed `application/problem+json` with a slug), override `HandleChallengeAsync` / `HandleForbiddenAsync` in the scheme handler — not by bypassing the pipeline.
+- In the endpoint, take `ClaimsPrincipal user` as a parameter and read identity from claims (`user.FindFirstValue(...)`). The auth handler is responsible for putting the right claims on the principal.
+
+### Authorization (declare-once at the boundary)
+
+- Authorize at the **boundary, once** — not per action. In MVC, put `[Authorize(Policy = "...")]` on the **controller class** (or a shared base controller); every action inherits it. Override on a single action with a narrower `[Authorize(Policy = ...)]` / `[AllowAnonymous]` only where it genuinely differs.
+- The Minimal-API equivalent is `group.MapGroup("/...").RequireAuthorization(policy)` on the **route group**. Both compile to the **same authorization metadata** — the group-level fluent call and the class-level attribute are equally correct and equally DRY. Per-method attributes / per-endpoint `RequireAuthorization` are for intentional per-route overrides only.
+- Name policies centrally (a single constants holder) and reference the constant — never inline role strings at the call site.
+
+### Current-user / identity access
+
+- **Inject `ClaimsPrincipal` directly into handlers for current-user identity; read it through the shared `ClaimsPrincipalExtensions` (`GetUserId()`, `GetSessionId()`, `GetDeviceId()`).** Do **not** wrap identity access in an `ICurrentUser` / `ICurrentUserProvider` service by default.
+- Why `ClaimsPrincipal` is the right seam here (not an over-coupling):
+  - It is a **data-driven seam whose producer is the auth handler** — the cookie scheme, `DeviceBearerAuthenticationHandler`, or any future JWT all populate the *same* `ClaimsPrincipal`. The handler is already decoupled from *how* identity was obtained.
+  - It is **available for free** in the HTTP layer — `ControllerBase.User` in a controller action (or a `ClaimsPrincipal user` parameter in a legacy Minimal-API handler), sourced from `HttpContext.User`; no `IHttpContextAccessor`, no scoped registration, no lifetime caveat. Identity stays in the `Api` layer: a controller reads `User`, extracts the IDs it needs via `ClaimsPrincipalExtensions`, and passes **plain values** (`Guid userId`) into the service — `ClaimsPrincipal` does not cross into the Services layer.
+  - It is **testable without an interface**: `ClaimsPrincipal` is `new`-able with arbitrary claims and its behaviour (`IsInRole`, `FindFirst`, the extensions) is fully driven by those claims. Construct a real principal with test claims — preferable to a mocked `IPrincipal`, which can diverge from real claim-matching semantics. (In this repo, handlers are exercised over HTTP via `WebApplicationFactory` with a real login, so identity is never substituted anyway.)
+  - The `ClaimsPrincipalExtensions` already provide the domain-friendly, centralized read surface that a provider's properties would duplicate.
+- A current-user provider adds a scoped `IHttpContextAccessor`-backed service — exactly the captive-dependency shape the DI section warns about — to replace a free, already-abstracted, already-testable binding. That fails the "simplicity is the highest priority" bar unless one of the concrete triggers below holds.
+- **Introduce an `ICurrentUser` abstraction ONLY when a named trigger appears:**
+  1. **Identity is needed outside an HTTP request** — background job, message consumer, worker thread — where `ClaimsPrincipal` cannot be bound from the pipeline. A provider with swappable impls (HTTP-backed vs job-context) earns its keep.
+  2. **The domain layer must consume identity** and you do not want `System.Security.Claims` types leaking into domain code — expose a domain-pure `ICurrentUser` value instead.
+  3. **You need richer-than-claims current-user data** (a loaded `User` entity, tenant, permission set) resolved and cached per request.
+  When introduced: back the HTTP implementation with `IHttpContextAccessor`, register it **Scoped**, never capture it in a singleton, and keep `ClaimsPrincipalExtensions` as the implementation detail it delegates to.
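+
+The "testable without an interface" point can be sketched with nothing but `System.Security.Claims`. `ExampleClaimsExtensions` is a hypothetical stand-in for the repo's `ClaimsPrincipalExtensions`, and storing the user id as a GUID in the `NameIdentifier` claim is an assumption.
+
+```csharp
+using System;
+using System.Security.Claims;
+
+public static class ExampleClaimsExtensions
+{
+    // Assumption: the auth handler stores the user id as a GUID in the NameIdentifier claim.
+    public static Guid GetUserId(this ClaimsPrincipal user)
+    {
+        ArgumentNullException.ThrowIfNull(user);
+        var raw = user.FindFirst(ClaimTypes.NameIdentifier)?.Value;
+        return Guid.TryParse(raw, out var id)
+            ? id
+            : throw new InvalidOperationException("Principal has no valid user id claim.");
+    }
+}
+
+public static class TestPrincipals
+{
+    // Builds a real principal from test claims; no mocking framework involved.
+    public static ClaimsPrincipal For(Guid userId, params string[] roles)
+    {
+        var identity = new ClaimsIdentity(authenticationType: "Test");
+        identity.AddClaim(new Claim(ClaimTypes.NameIdentifier, userId.ToString()));
+        foreach (var role in roles)
+        {
+            identity.AddClaim(new Claim(ClaimTypes.Role, role));
+        }
+        return new ClaimsPrincipal(identity);
+    }
+}
+```
+
+A controller action would call `User.GetUserId()` and pass the resulting `Guid` into the service, so the principal itself never leaves the Api layer.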
+
+### Response shapes
+
+**Controllers (the standard here): default to `ActionResult<T>`.** It mixes the success type `T` with `ActionResult` error shapes, participates in MVC's configured output formatters / content negotiation, and is the most reliable for OpenAPI:
+- Annotate with `[ProducesResponseType]`; the `Type` can be **omitted for the success code** (`[ProducesResponseType(StatusCodes.Status200OK)]`) — it is inferred from `T`. Add one attribute per additional status code (`404`, `409`, …).
+- Return the value directly (`return product;` — implicit cast to `200 OK`) or a `ControllerBase` helper for other shapes (`NotFound()`, `Conflict()`, `BadRequest(error)`, `CreatedAtAction(...)`).
+- The auto-validation action filter already produces the `400` for invalid input before the action runs (see Validation) — don't hand-write that path.
+- Keep the action **thin**: it maps the service's **success value** onto the success shape (`return product;` → `200`, `CreatedAtAction(...)` → `201`) and does not compute the business decision itself. **Expected failures are not mapped here** — the service throws a `BusinessException` subtype and the central `IExceptionHandler` produces the `ProblemDetails` (see Error handling). So a controller action has essentially no error branches: happy path in, success shape out.
+- `TypedResults` / `Results<T1, T2, …>` / `IResult` **are** usable in controllers, but they are the *Minimal-API* idiom and they **bypass MVC's configured output formatters / content negotiation** (they write the response directly — Microsoft Learn: "Does not leverage the configured Formatters"). Prefer `ActionResult<T>` in a controller; reach for `IResult` only for a deliberately format-agnostic raw response.
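+
+A thin controller along these lines might look as follows. `Device`, `DeviceResponse`, `CreateDeviceRequest`, `IDeviceService`, and the route are hypothetical; the service is assumed to throw a `BusinessException` subtype for not-found and conflict.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+using Microsoft.AspNetCore.Http;
+using Microsoft.AspNetCore.Mvc;
+
+public sealed record Device(Guid Id, string Name);            // domain result
+public sealed record DeviceResponse(Guid Id, string Name);    // wire DTO
+public sealed record CreateDeviceRequest(string Name);
+
+// Hypothetical service: returns values on success, throws on expected failures.
+public interface IDeviceService
+{
+    Task<Device> GetAsync(Guid id, CancellationToken ct);
+    Task<Device> CreateAsync(string name, CancellationToken ct);
+}
+
+[ApiController]
+[Route("api/devices")]
+public sealed class DevicesController(IDeviceService devices) : ControllerBase
+{
+    [HttpGet("{id:guid}")]
+    [ProducesResponseType(StatusCodes.Status200OK)]          // type inferred from ActionResult<T>
+    [ProducesResponseType(StatusCodes.Status404NotFound)]
+    public async Task<ActionResult<DeviceResponse>> Get(Guid id, CancellationToken ct)
+    {
+        var device = await devices.GetAsync(id, ct);
+        return new DeviceResponse(device.Id, device.Name);   // implicit cast to 200 OK
+    }
+
+    [HttpPost]
+    [ProducesResponseType(StatusCodes.Status201Created)]
+    [ProducesResponseType(StatusCodes.Status409Conflict)]
+    public async Task<ActionResult<DeviceResponse>> Create(CreateDeviceRequest request, CancellationToken ct)
+    {
+        var device = await devices.CreateAsync(request.Name, ct);
+        var body = new DeviceResponse(device.Id, device.Name);
+        return CreatedAtAction(nameof(Get), new { id = body.Id }, body);
+    }
+}
+```
+
+Neither action has an error branch: the `404` and `409` declared in the attributes are produced by the central handler, not by the controller.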
+
+**Legacy Minimal-API endpoints (until migrated): default to `TypedResults.*`** over `Results.*`. `TypedResults` returns concrete types (`Ok<T>`, `NotFound`, `BadRequest<T>`) that carry OpenAPI metadata and are unit-testable without casting. For handlers that return more than one shape, declare the return type as `Results<T1, T2, ...>` — the compiler enforces every branch returns a declared type and the OpenAPI generator reads the union, so no `Produces`/`ProducesResponseType` attributes are needed:
+  ```csharp
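+  // `IItemStore` is a hypothetical injected lookup service whose `Find` returns `Item?`.
+  app.MapGet("/items/{id}", Results<Ok<Item>, NotFound> (int id, IItemStore store) =>
+      store.Find(id) is { } item ? TypedResults.Ok(item) : TypedResults.NotFound());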
+  ```
+  Don't mix `Results.*` and `TypedResults.*` in the same handler — you lose the metadata.
+
+### Service results vs. wire envelopes
+
+> General principle (cross-language): see `coderule.mdc` → "Architecture & layering › Service results vs. transport envelopes". Below is the .NET realization.
+
+- A service returns a **domain result** — a record of the values it computed (`IReadOnlyList<LiveDevice>`, a small snapshot record) on success, and **throws a `BusinessException` subtype** on an expected failure (see Error handling); it does not return error-wrapper values. The **controller maps the success value onto the wire DTO**. The response envelope (the `*Response` record, its field names, the HTTP status) is an **Api-layer concern**; the domain result is not, and ASP.NET / wire types must not appear in a service signature (see "Solution layout & layering").
+- **A value that the response echoes to the client but that the service ALSO used to compute the result is owned by the service** — it returns that value alongside the data; the controller must NOT independently re-derive it. Two clocks/sources for the same conceptual value is a latent bug.
+  - Canonical case: a "server now" timestamp that a projection uses to decide freshness/staleness (which devices are dropped, what color each gets) **and** is echoed so the client renders relative ages consistently. If the controller stamped its own `DateTimeOffset.UtcNow`, it would diverge from the instant the service filtered against — a boundary bug.
+  - Pattern: the service injects `TimeProvider`, captures the instant **once**, uses it, and returns it inside a domain result — e.g. `LiveSnapshot(DateTimeOffset CapturedAt, IReadOnlyList<LiveDevice> Devices)`. The controller returns `ActionResult<LiveStateResponse>`, mapping `CapturedAt → server_now`. The envelope name and JSON shape stay in the Api layer; the *instant* originates in the Services layer where it is consumed.
+- The opposite case: a value that is **purely an HTTP/transport artifact and is never consumed by domain logic** (a `Location` header, a per-response correlation id minted for tracing) is owned by the **Api layer** and the service never sees it.
+- Heuristic: ask "does the business logic *read* this value to make a decision?" If yes → it lives in the service and is returned. If it is only *formatting/transport* → it lives in the controller.
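+
+The "server now" pattern can be sketched without any ASP.NET types. `LiveSnapshot` follows the shape given above; `LiveDevice`'s members, `LiveStateService`, the five-minute window, and the `From` mapper are assumptions made for illustration.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+public sealed record LiveDevice(Guid Id, DateTimeOffset LastSeen);
+public sealed record LiveSnapshot(DateTimeOffset CapturedAt, IReadOnlyList<LiveDevice> Devices);
+
+public sealed class LiveStateService(TimeProvider clock)
+{
+    // Assumed freshness window, for illustration only.
+    private static readonly TimeSpan FreshFor = TimeSpan.FromMinutes(5);
+
+    public LiveSnapshot Project(IEnumerable<LiveDevice> devices)
+    {
+        // Captured once: the same instant drives the filter and is returned to the caller.
+        var now = clock.GetUtcNow();
+        var fresh = devices.Where(d => now - d.LastSeen <= FreshFor).ToList();
+        return new LiveSnapshot(now, fresh);
+    }
+}
+
+// Api-layer envelope: copies CapturedAt instead of reading a second clock.
+public sealed record LiveStateResponse(DateTimeOffset ServerNow, IReadOnlyList<LiveDevice> Devices)
+{
+    public static LiveStateResponse From(LiveSnapshot snapshot) =>
+        new(snapshot.CapturedAt, snapshot.Devices);
+}
+```
+
+Because the envelope is built purely from the snapshot, the timestamp the client sees is by construction the instant the staleness filter used.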
+
+## Testing
+
+> General principle (cross-language): see `coderule.mdc` → "Testing (real dependencies)" (real engine over fakes for query-correctness; share expensive fixtures). Below is the .NET realization.
+
+- **xUnit** is the test framework for this repo. Use its per-test class lifecycle (constructor = setup, `IDisposable.Dispose` / `IAsyncLifetime.DisposeAsync` = teardown) — that's what most integration-testing patterns assume.
+- **FluentAssertions** for assertions: `result.Should().Be(...)`, `collection.Should().HaveCount(3).And.ContainSingle(x => ...)`, etc. Failure messages are much clearer than raw `Assert.Equal`, and the fluent chain reads like the spec it tests.
+- **`WebApplicationFactory<Program>`** for ASP.NET Core integration tests. It boots the real DI container and pipeline from `Program.cs` in-memory. Expose `Program` to the test project with `public partial class Program;` in `Program.cs`. Share the factory across tests in a class with `IClassFixture<T>` and across classes with `ICollectionFixture<T>` — host-boot is the expensive step; don't re-pay it per test.
+- **Never use the EF Core in-memory provider for query-correctness tests.** Its semantics diverge from real Postgres/SQL Server (LINQ translation differences, no real transactions, no concurrency tokens). Use Testcontainers (real Postgres container via `IAsyncLifetime` on the factory) + Respawn for between-test cleanup. The in-memory provider is acceptable only for fast smoke tests where you're not asserting query shape.
+- Tests follow the Arrange / Act / Assert pattern with `// Arrange` / `// Act` / `// Assert` comments (workspace convention; see `coderule.mdc`).
+
+## Cross-cutting
+
+- Use middleware for cross-cutting: auth, error handling, logging. Standard order in `Program.cs`: forwarded headers → exception handler → HTTPS/HSTS → static files → routing → CORS → authentication → authorization → rate limiter → endpoints.
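+
+As a rough illustration of that order, here is a bare `Program.cs` skeleton, assuming ASP.NET Core 8+. Authentication schemes, CORS policies, and rate-limiter rules are deliberately left empty; the point is only the sequence of `Use*` calls.
+
+```csharp
+using Microsoft.AspNetCore.Builder;
+using Microsoft.Extensions.DependencyInjection;
+using Microsoft.Extensions.Hosting;
+
+var builder = WebApplication.CreateBuilder(args);
+
+builder.Services.AddControllers();
+builder.Services.AddProblemDetails();
+builder.Services.AddCors();
+builder.Services.AddAuthentication();       // schemes omitted in this sketch
+builder.Services.AddAuthorization();
+builder.Services.AddRateLimiter(_ => { });  // no policies configured here
+
+var app = builder.Build();
+
+app.UseForwardedHeaders();                  // must run first so later middleware sees the real scheme/IP
+app.UseExceptionHandler();                  // wraps everything below it
+if (!app.Environment.IsDevelopment())
+{
+    app.UseHsts();
+}
+app.UseHttpsRedirection();
+app.UseStaticFiles();
+app.UseRouting();
+app.UseCors();
+app.UseAuthentication();
+app.UseAuthorization();
+app.UseRateLimiter();                       // after auth, so limits can be partitioned per user
+app.MapControllers();
+
+app.Run();
+```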
@@ -4,6 +4,26 @@ alwaysApply: true
 ---
 # Agent Meta Rules

+## Real Results, Not Simulated Ones
+
+**The goal is a working product, not the appearance of one.**
+
+- If something does not work, STOP and report it honestly. Do not find a way around it.
+- Never produce results by bypassing, faking, stubbing, or passthrough-ing the component that is supposed to produce them. A passing test that skips the real pipeline is worse than a failing test — it hides the truth.
+- If the real implementation is not ready, say so. A clear "this is not implemented yet, here is what is missing" is always the right answer.
+- Do not measure success by whether the output looks correct. Measure it by whether the output was produced by the real system under test.
+- Workarounds that produce the right answer via the wrong path are defects, not solutions.
+
+### When a test reveals missing production code — STOP
+
+This is the specific failure mode that produced the GPS-passthrough scaffold in `runtime_root._run_replay_loop` (May 2026). Generalised so it never repeats:
+
+- If, while implementing or running a test, you discover that the production code path the test is supposed to exercise does not exist (no caller, no integration, no main loop, etc.), **STOP immediately**.
+- Do NOT write a stub, passthrough, fake input source, or shortcut output that would make the test go green. Even when the shortcut is "framed as a scaffold" or "marked as TODO in a docstring", it still defeats the test and lies to the next reader.
+- Surface the gap to the user as a top-of-turn report: name the missing production component, cite the architecture document that promises it, and ask whether to (a) create a tracker ticket for the missing component and let the test fail honestly until the ticket lands, or (b) explicitly de-scope the test, or (c) something the user names.
+- The default outcome is (a): a failing test plus a new tracker ticket. A failing test with an honest reason is information; a passing test that proves nothing is misinformation.
+- Doc-comment disclosures (`# this is a scaffold until X is wired`) DO NOT satisfy this rule. The user must be told in the assistant message, not in code.
+
 ## Execution Safety
 - Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so.

@@ -13,6 +33,31 @@ alwaysApply: true
 ## Critical Thinking
 - Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.

+## Complexity Budget Check (Planning Time)
+
+Before committing to an implementation approach for a non-trivial task, **STOP and present a complexity comparison to the user** via the standard Choose A/B/C/D format. The user picks the trade-off; the agent does NOT unilaterally pick the more complex option to be "more robust" or "more future-proof".
+
+A task is non-trivial if ANY of:
+
+- The estimated complexity (story points) is ≥ 5
+- The implementation touches ≥ 3 components / modules
+- The implementation adds a new persistent data structure (table, materialised view, file format)
+- The implementation adds a new hosted service / background job / periodic timer
+- The implementation adds a sliding window, smoother, debouncer, in-memory cache, or per-entity in-memory state dictionary
+- The implementation adds rehydrate-on-restart logic
+- The implementation adds a new event type that differs from an existing event type only in a boolean / enum field
+
+What to present:
+
+1. **Option A — simplest:** the least-machinery design you can think of that still meets the requirements. Name what is sacrificed (latency? eventual-consistency window? a rarely-hit edge case?).
+2. **Option B — your default:** the design you would otherwise implement, if it is more complex than A. Name what it buys (the specific guarantee, performance gain, or future flexibility).
+3. **Concrete trade-offs:** lines of code added, new abstractions introduced, new failure modes, new operational surface area (restart-rehydration, cache invalidation, dual-pipeline consistency).
+4. **Recommendation:** which option you would pick and why, in one sentence.
+
+This rule fires DURING planning — before code is written. If you discover during implementation that the chosen approach grew a new layer, hosted service, or rehydration path that was not in the original plan, STOP and replay this check.
+
+Skip this rule ONLY when the user has already explicitly chosen the complex approach in an earlier turn, OR when the task is trivial (≤ 2 story points) and hits none of the triggers above.
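As a rough illustration of the trigger list above, the check could be expressed as a small predicate. This is only a sketch: `TaskEstimate`, its field names, and `needs_complexity_check` are hypothetical and do not exist in the repository.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """Hypothetical summary of a planned task (field names are assumptions)."""
    story_points: int
    components_touched: int
    adds_persistent_structure: bool = False      # table, materialised view, file format
    adds_hosted_service_or_timer: bool = False   # hosted service, background job, periodic timer
    adds_in_memory_state: bool = False           # sliding window, smoother, debouncer, cache, per-entity dict
    adds_rehydrate_logic: bool = False           # rehydrate-on-restart
    adds_near_duplicate_event_type: bool = False # differs only in a boolean / enum field


def complexity_triggers(task: TaskEstimate) -> list[str]:
    """Return the names of every trigger the task hits (empty list = none)."""
    checks = [
        (task.story_points >= 5, "story points >= 5"),
        (task.components_touched >= 3, "touches >= 3 components"),
        (task.adds_persistent_structure, "new persistent data structure"),
        (task.adds_hosted_service_or_timer, "new hosted service / job / timer"),
        (task.adds_in_memory_state, "new in-memory state"),
        (task.adds_rehydrate_logic, "rehydrate-on-restart logic"),
        (task.adds_near_duplicate_event_type, "near-duplicate event type"),
    ]
    return [name for hit, name in checks if hit]


def needs_complexity_check(task: TaskEstimate, user_already_chose_complex: bool = False) -> bool:
    """True when the A/B comparison should be presented before implementing."""
    if user_already_chose_complex:
        return False  # the rule is skipped once the user has explicitly chosen
    return bool(complexity_triggers(task))
```

Returning the list of triggers rather than a bare boolean lets the agent name the reason when it presents the comparison.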
+
 ## Skill Discipline

 Do exactly what the skill says. Nothing more.
@@ -8,8 +8,16 @@ globs: ["**/*test*", "**/*spec*", "**/*Test*", "**/tests/**", "**/test/**"]
 - One assertion per test when practical; name tests descriptively: `MethodName_Scenario_ExpectedResult`
 - Test boundary conditions, error paths, and happy paths
 - Use mocks only for external dependencies; prefer real implementations for internal code
-- Aim for 75%+ coverage on business logic; 100% on critical paths (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from security_approach.md). The 75% threshold is canonical — see `cursor-meta.mdc` Quality Thresholds.
+- Aim for 75%+ coverage on business logic; **90% floor / 100% aim on critical paths** (code paths where a bug would cause data loss, security breaches, financial errors, or system outages — identify from acceptance criteria marked as must-have or from `security_approach.md`). 90% is the enforcement floor (blocking in CI / refactor verification / release pre-flight); 100% is the aspirational aim — drift below 100% but at-or-above 90% is acceptable. Both numbers are canonical — see `cursor-meta.mdc` Quality Thresholds.
 - Integration tests use real database (Postgres testcontainers or dedicated test DB)
 - Never use Thread Sleep or fixed delays in tests; use polling or async waits
 - Keep test data factories/builders for reusable test setup
 - Tests must be independent: no shared mutable state between tests
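The revised coverage rule distinguishes a blocking floor from an aspirational aim. A minimal sketch of how a gate might apply the numbers follows. The function name and the verdict strings are hypothetical, and treating the 75% business-logic threshold as blocking is an assumption, since the rule text only says "aim for".

```python
BUSINESS_LOGIC_MIN = 75.0    # canonical threshold for business logic
CRITICAL_PATH_FLOOR = 90.0   # enforcement floor (blocking)
CRITICAL_PATH_AIM = 100.0    # aspirational aim (non-blocking)


def coverage_verdict(business_pct: float, critical_pct: float) -> str:
    """Classify coverage numbers against the thresholds above.

    Verdict strings are illustrative, not an existing CI contract.
    """
    if business_pct < BUSINESS_LOGIC_MIN or critical_pct < CRITICAL_PATH_FLOOR:
        return "FAIL"
    if critical_pct < CRITICAL_PATH_AIM:
        # At or above the 90% floor but short of 100%: acceptable drift.
        return "PASS_BELOW_AIM"
    return "PASS"
```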
+
+## Test environment (this project)
+
+- **Unit tests** (`tests/unit/`): may run locally on the dev workstation (`pytest tests/unit/` in the project venv). Local PASS is equivalent to Jetson PASS for this tier because the suite is fully synthetic.
+- **Blackbox / e2e / performance / resilience / security / resource-limit** tests (`tests/e2e/`, `e2e/tests/`, `tests/perf/`, …): MUST run on the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). Use `scripts/run-tests-jetson.sh` for local dev; CI runs `.woodpecker/01-test.yml` on the colocated arm64 Jetson Woodpecker agent.
+- Do NOT run e2e tests on the local workstation and report the result. If the Jetson is unreachable, the e2e verdict is "not run" — record the gap in `_docs/_process_leftovers/` rather than substituting a local result.
+- Tests gated by `RUN_REPLAY_E2E` or `@pytest.mark.tier2` are expected to SKIP locally; that is correct behaviour, not a failure to investigate.
+- Canonical source for this policy: `_docs/02_document/tests/environment.md` § Where each tier runs (active policy).
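The tier policy above amounts to a small decision rule: unit results count from any host, while every other tier only counts when it ran on the Jetson. The sketch below is illustrative only; `tier_verdict` and the tier labels are hypothetical names, not part of `scripts/run-tests-jetson.sh` or the CI config.

```python
from typing import Optional

# Tiers that must run on the Jetson (or a Jetson-equivalent arm64 agent).
JETSON_ONLY_TIERS = {
    "blackbox", "e2e", "performance", "resilience", "security", "resource-limit",
}


def tier_verdict(tier: str, on_jetson: bool, passed: Optional[bool]) -> str:
    """Return the verdict that may be reported for a test tier.

    `passed` is the raw outcome of the run; it is ignored when the run
    happened on a host whose result must not be reported.
    """
    if tier == "unit":
        # Fully synthetic suite: a local PASS is equivalent to a Jetson PASS.
        return "PASS" if passed else "FAIL"
    if tier in JETSON_ONLY_TIERS:
        if not on_jetson:
            # Never substitute a local result; record the gap instead.
            return "not run"
        return "PASS" if passed else "FAIL"
    raise ValueError(f"unknown tier: {tier}")
```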
@@ -1,9 +1,9 @@
 ---
 name: autodev
 description: |
-  Auto-chaining orchestrator that drives the full BUILD-SHIP workflow from problem gathering through deployment.
+  Auto-chaining orchestrator that drives the full BUILD → SHIP → EVOLVE workflow from problem gathering through release and retrospective.
  Detects current project state from _docs/ folder, resumes from where it left off, and flows through
-  problem → research → plan → test specs → decompose → implement → tests → docs sync → deploy without manual skill invocation.
+  problem → research → plan (incl. ADRs) → test specs → decompose → implement → tests → docs sync → deploy → release → retrospective without manual skill invocation.
  Maximizes work per conversation by auto-transitioning between skills.
  Trigger phrases:
  - "autodev", "auto", "start", "continue"
@@ -15,7 +15,7 @@ disable-model-invocation: true

 # Autodev Orchestrator

-Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autodev` once — the engine handles sequencing, transitions, and re-entry.
+Auto-chaining execution engine that drives the full BUILD → SHIP → EVOLVE workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autodev` once — the engine handles sequencing, transitions, and re-entry.

 ## File Index

@@ -3,7 +3,7 @@
 Workflow for projects with an existing codebase. Structurally it has **two phases**:

 - **Phase A — One-time baseline setup (Steps 1–8)**: runs exactly once per codebase. Documents the code, produces test specs, makes the code testable, writes and runs the initial test suite, optionally refactors with that safety net.
-- **Phase B — Feature cycle (Steps 9–17, loops)**: runs once per new feature. After Step 17 (Retrospective), the flow loops back to Step 9 (New Task) with `state.cycle` incremented.
+- **Phase B — Feature cycle (Steps 9–17, loops)**: runs once per new feature. After Step 17 (Retrospective), the flow loops back to Step 9 (New Task) with `state.cycle` incremented. Step 16.5 (Release) sits between Deploy (16) and Retrospective (17).

 A first-time run executes Phase A then Phase B; every subsequent invocation re-enters Phase B.

@@ -33,7 +33,8 @@ A first-time run executes Phase A then Phase B; every subsequent invocation re-e
 | 13 | Update Docs | document/SKILL.md (task mode) | Task Steps 0–5 |
 | 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
 | 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
-| 16 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 16 | Deploy | deploy/SKILL.md | Step 1–7 (optional) |
+| 16.5 | Release | release/SKILL.md | Phase 1–6 (optional — only if Step 16 completed) |
 | 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |

 After Step 17, the feature cycle completes and the flow loops back to Step 9 with `state.cycle + 1` — see "Re-Entry After Completion" below.
@@ -275,33 +276,63 @@ State-driven: reached by auto-chain from Step 14 (completed or skipped).
 Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Gate") with:
 - question:        `Run performance/load tests before deploy?`
 - option-a-label:  `Run performance tests (recommended for latency-sensitive or high-load systems)`
-- option-b-label:  `Skip — proceed directly to deploy`
+- option-b-label:  `Skip — proceed to deploy choice`
 - recommendation:  `A or B — base on whether acceptance criteria include latency, throughput, or load requirements`
 - target-skill:    `.cursor/skills/test-run/SKILL.md` in **perf mode** (the skill handles runner detection, threshold comparison, and its own A/B/C gate on threshold failures)
 - next-step:       Step 16 (Deploy)

 ---

-**Step 16 — Deploy**
+**Step 16 — Deploy (optional)**
 State-driven: reached by auto-chain from Step 15 (completed or skipped).

-Action: Read and execute `.cursor/skills/deploy/SKILL.md`.
+Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Gate") with:
+- question:        `Run deploy planning or refresh deploy artifacts for this cycle?`
+- option-a-label:  `Run deploy — update scripts/procedures for this release`
+- option-b-label:  `Skip — keep developing; deploy when ready for production`
+- recommendation:  `B during active feature work; A when this cycle should ship`
+- target-skill:    `.cursor/skills/deploy/SKILL.md`
+- next-step:       Step 16.5 (Release) — only when Step 16 was completed; otherwise Step 17 (Retrospective)

-After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 17 (Retrospective).
+On **skip**: mark Step 16 and Step 16.5 as `skipped`; auto-chain to Step 17 (Retrospective in cycle-end mode).
+
+On **complete**: mark Step 16 `completed` and auto-chain to Step 16.5 (Release).
+
+---
+
+**Step 16.5 — Release (optional)**
+State-driven: reached by auto-chain from Step 16 **only when Step 16 status is `completed`**, for the current `state.cycle`. If Step 16 was `skipped`, Step 16.5 is `skipped` and `/release` is not invoked.
+
+Action: Read and execute `.cursor/skills/release/SKILL.md`. The release skill owns its own user interaction (Phase 1 pre-release gate, Phase 2 strategy select, Phase 6 escalation). Autodev does NOT add a wrapping A/B/C gate. Pass cycle context (`cycle: state.cycle`).
+
+After the release skill exits, route on the verdict:
+
+- **Verdict `Released`** → mark Step 16.5 `completed` and auto-chain to Step 17 (Retrospective in cycle-end mode).
+- **Verdict `Released-with-override`** → mark Step 16.5 `completed` AND auto-chain to Step 17 (Retrospective in **incident mode**).
+- **Verdict `Rolled-Back`** → mark Step 16.5 `failed`. Auto-chain to Step 17 (Retrospective in **incident mode**). The cycle does NOT loop back to Step 9.
+- **Verdict `Aborted`** → mark Step 16.5 `not_started` (no live-system change) OR `failed` (live-system touched before abort). Surface the abort reason and STOP. Next `/autodev` invocation re-evaluates Phase B from the failed step.
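The verdict routing above can be summarised as a lookup from verdict to step status, retrospective mode, and loop-back behaviour. This is a sketch under assumptions: `Route` and `route_release` are hypothetical names, and the orchestrator is a prompt-driven skill, not Python code.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    step_status: str           # status recorded for Step 16.5
    retro_mode: Optional[str]  # None means STOP without running Step 17
    loops_back: bool           # whether the cycle may loop back to Step 9


def route_release(verdict: str, live_system_touched: bool = False) -> Route:
    """Map a release verdict to the next orchestrator action."""
    if verdict == "Released":
        return Route("completed", "cycle-end", True)
    if verdict == "Released-with-override":
        return Route("completed", "incident", True)
    if verdict == "Rolled-Back":
        return Route("failed", "incident", False)
    if verdict == "Aborted":
        # Status depends on whether the abort came after a live-system change.
        status = "failed" if live_system_touched else "not_started"
        return Route(status, None, False)
    raise ValueError(f"unknown release verdict: {verdict}")
```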

 ---

 **Step 17 — Retrospective**
-State-driven: reached by auto-chain from Step 16, for the current `state.cycle`.
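+State-driven: reached by auto-chain from Step 16.5 (verdict `Released`, `Released-with-override`, or `Rolled-Back`; an `Aborted` verdict stops before the retrospective) OR when Step 16 and Step 16.5 are both `skipped`, for the current `state.cycle`.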

-Action: Read and execute `.cursor/skills/retrospective/SKILL.md` in **cycle-end mode**. Pass cycle context (`cycle: state.cycle`) so the retro report and LESSONS.md entries record which feature cycle they came from.
+Action: Read and execute `.cursor/skills/retrospective/SKILL.md`. Mode selection:

-After retrospective completes, mark Step 17 as `completed` and enter "Re-Entry After Completion" evaluation.
+- Step 16.5 verdict `Released` → cycle-end mode
+- Step 16.5 verdict `Released-with-override` or `Rolled-Back` → incident mode
+- Step 16 and Step 16.5 both `skipped` → cycle-end mode
+
+Pass cycle context (`cycle: state.cycle`) so the retro report and LESSONS.md entries record which feature cycle they came from.
+
+After retrospective completes:
+
+- If Step 16.5 verdict was `Released` or `Released-with-override`, OR Step 16.5 was `skipped` → mark Step 17 as `completed` and enter "Re-Entry After Completion" evaluation (loop back to Step 9 for cycle N+1).
+- If Step 16.5 verdict was `Rolled-Back` → mark Step 17 as `completed` but do NOT loop back. Surface the incident retro path and STOP.

 ---

 **Re-Entry After Completion**
-State-driven: `state.step == done` OR Step 17 (Retrospective) is completed for `state.cycle`.
+State-driven: `state.step == done`, OR both of the following hold: Step 17 (Retrospective) is completed for `state.cycle`, AND Step 16.5 either ended with verdict `Released` or `Released-with-override` or was `skipped`. A `Rolled-Back` cycle does NOT trigger Re-Entry — the user must explicitly invoke `/autodev` again.

 Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:

@@ -316,7 +347,7 @@ Action: The project completed a full cycle. Print the status banner and automati

 Set `step: 9`, `status: not_started`, and **increment `cycle`** (`cycle: state.cycle + 1`) in the state file, then auto-chain to Step 9 (New Task). Reset `sub_step` to `phase: 0, name: awaiting-invocation, detail: ""` and `retry_count: 0`.
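The state reset described above can be sketched as a pure function. Note the assumption: the real state lives in the Markdown file `_docs/_autodev_state.md`, so the dictionary shape and the name `reenter_cycle` are purely illustrative.

```python
import copy


def reenter_cycle(state: dict) -> dict:
    """Return a new state positioned at Step 9 of the next feature cycle."""
    new_state = copy.deepcopy(state)   # leave the caller's state untouched
    new_state["step"] = 9
    new_state["status"] = "not_started"
    new_state["cycle"] = state["cycle"] + 1
    new_state["sub_step"] = {"phase": 0, "name": "awaiting-invocation", "detail": ""}
    new_state["retry_count"] = 0
    return new_state
```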

-Note: the loop (Steps 9 → 17 → 9) ensures every feature cycle includes: New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security → Performance → Deploy → Retrospective.
+Note: the loop (Steps 9 → 17 → 9) covers: New Task → Implement → Run Tests → Test-Spec Sync → Update Docs → Security → Performance → Deploy (optional) → Release (optional) → Retrospective. The cycle completes (and loops back to Step 9) on a `Released` or `Released-with-override` verdict, or when deploy/release were skipped; rolled-back or aborted releases stop the cycle.

 ## Auto-Chain Rules

@@ -343,9 +374,15 @@ Note: the loop (Steps 9 → 17 → 9) ensures every feature cycle includes: New
 | Test-Spec Sync (12, done or skipped) | Auto-chain → Update Docs (13) |
 | Update Docs (13) | Auto-chain → Security Audit choice (14) |
 | Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) |
-| Performance Test (15, done or skipped) | Auto-chain → Deploy (16) |
-| Deploy (16) | Auto-chain → Retrospective (17) |
-| Retrospective (17) | **Cycle complete** — loop back to New Task (9) with incremented cycle counter |
+| Performance Test (15, done or skipped) | Auto-chain → Deploy choice (16) |
+| Deploy (16, completed) | Auto-chain → Release (16.5) |
+| Deploy (16, skipped) | Mark 16.5 `skipped` → auto-chain → Retrospective (17, cycle-end mode) |
+| Release (16.5, verdict Released) | Auto-chain → Retrospective (17, cycle-end mode) |
+| Release (16.5, verdict Released-with-override) | Auto-chain → Retrospective (17, **incident mode**) |
+| Release (16.5, verdict Rolled-Back) | Auto-chain → Retrospective (17, **incident mode**); cycle does NOT loop back |
+| Release (16.5, verdict Aborted) | STOP — surface abort reason; do not auto-chain |
+| Retrospective (17, after Released / Released-with-override / deploy skipped) | **Cycle complete** — loop back to New Task (9) with incremented cycle counter |
+| Retrospective (17, after Rolled-Back) | Cycle remains incomplete — STOP and surface incident retro path |

 ## Status Summary — Step List

@@ -381,9 +418,10 @@ Flow-specific slot values:
 | 14 | Security Audit             | — |
 | 15 | Performance Test           | — |
 | 16 | Deploy                     | — |
+| 16.5 | Release                  | `DONE (Released \| Released-with-override \| Rolled-Back \| Aborted)` |
 | 17 | Retrospective              | — |

-All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 2, 4, 8, 12, 13, 14, 15 additionally accept `SKIPPED`.
+All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 2, 4, 8, 12, 13, 14, 15, 16, 16.5 additionally accept `SKIPPED`.

 Row rendering format (renders with a phase separator between Step 8 and Step 9):

@@ -406,5 +444,6 @@ Row rendering format (renders with a phase separator between Step 8 and Step 9):
 Step 14   Security Audit           [<state token>]
 Step 15   Performance Test         [<state token>]
 Step 16   Deploy                   [<state token>]
+ Step 16.5 Release                  [<state token>]
 Step 17   Retrospective            [<state token>]
 ```
@@ -1,6 +1,6 @@
 # Greenfield Workflow

-Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective.
+Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy (optional) → Release (optional, only if Deploy ran) → Retrospective.

 ## Step Reference Table

@@ -8,7 +8,7 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
 |------|------|-----------|-------------------|
 | 1 | Problem | problem/SKILL.md | Phase 1–4 |
 | 2 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
-| 3 | Plan | plan/SKILL.md | Step 1–6 + Final |
+| 3 | Plan | plan/SKILL.md | Step 1, 2, 3, 4, 4.5 (ADR Capture), 5, 6 + Final |
 | 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) |
 | 5 | Test Spec | test-spec/SKILL.md | Phases 1–4 |
 | 6 | Decompose | decompose/SKILL.md (implementation task decomposition) | Step 1 + Step 1.5 + Step 2 + Step 4 |
@@ -21,7 +21,8 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
 | 13 | Update Docs | document/SKILL.md (task mode) | Task Steps 0–5 |
 | 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
 | 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
-| 16 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 16 | Deploy | deploy/SKILL.md | Step 1–7 (optional) |
+| 16.5 | Release | release/SKILL.md | Phase 1–6 (optional — only if Step 16 completed) |
 | 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |

 ## Detection Rules
@@ -279,26 +280,55 @@ Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Ga

 ---

-**Step 16 — Deploy**
+**Step 16 — Deploy (optional)**
 State-driven: reached by auto-chain from Step 15 (after Step 15 is completed or skipped).

-Action: Read and execute `.cursor/skills/deploy/SKILL.md`.
+Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Gate") with:
+- question:        `Run deploy planning (scripts, procedures, compose overlays) now?`
+- option-a-label:  `Run deploy — produce/update deploy artifacts and scripts`
+- option-b-label:  `Skip — continue development; deploy when ready for production`
+- recommendation:  `B when the product is not ready to ship; A when targeting a release soon`
+- target-skill:    `.cursor/skills/deploy/SKILL.md`
+- next-step:       Step 16.5 (Release) — only when Step 16 was completed; otherwise Step 17 (Retrospective)

-After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 17 (Retrospective).
+On **skip**: mark Step 16 and Step 16.5 as `skipped`; record in the release report (if one exists) or `_docs/_autodev_state.md` `sub_step.detail` that deploy/release were deferred; auto-chain to Step 17 (Retrospective in cycle-end mode).
+
+On **complete**: mark Step 16 `completed` and auto-chain to Step 16.5 (Release).
+
+---
+
+**Step 16.5 — Release (optional)**
+State-driven: reached by auto-chain from Step 16 **only when Step 16 status is `completed`**. If Step 16 was `skipped`, Step 16.5 is also `skipped` and the flow does not invoke `/release`.
+
+Action: Read and execute `.cursor/skills/release/SKILL.md`. The release skill is responsible for selecting the target environment, executing the deploy artifacts, smoke-testing, watching the rollout, and producing a definitive verdict (`Released`, `Released-with-override`, `Rolled-Back`, or `Aborted`).
+
+The release skill has its own internal BLOCKING gates (Phase 1 pre-release gate, Phase 2 strategy select, Phase 6 user confirmation when soft regression escalates). Autodev does NOT add a wrapping A/B/C gate — the release skill owns its own user interaction.
+
+After the release skill exits:
+
+- **Verdict `Released`** → mark Step 16.5 `completed` and auto-chain to Step 17 (Retrospective in cycle-end mode).
+- **Verdict `Released-with-override`** → mark Step 16.5 `completed` AND auto-chain to Step 17 (Retrospective in **incident mode**) — the override is itself an incident the retrospective must analyze.
+- **Verdict `Rolled-Back`** → mark Step 16.5 `failed`. Auto-chain to Step 17 (Retrospective in **incident mode**). Do NOT consider the project "Done" — the user owns the next move (re-run /implement on a fix branch, re-run /deploy, re-run /release).
+- **Verdict `Aborted`** → mark Step 16.5 `not_started` (the release was never started) OR `failed` if the abort came after Phase 3 had already touched the live system. Surface the abort reason and STOP — do not auto-chain to retrospective.

 ---

 **Step 17 — Retrospective**
-State-driven: reached by auto-chain from Step 16.
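+State-driven: reached by auto-chain from Step 16.5 (verdict `Released`, `Released-with-override`, or `Rolled-Back`; an `Aborted` verdict stops before the retrospective) OR when Step 16 and Step 16.5 are both `skipped` (cycle-end mode — note deploy/release deferred in the retro report).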

-Action: Read and execute `.cursor/skills/retrospective/SKILL.md` in **cycle-end mode**. This closes the cycle's feedback loop by folding metrics into `_docs/06_metrics/retro_<date>.md` and appending the top-3 lessons to `_docs/LESSONS.md`.
+Action: Read and execute `.cursor/skills/retrospective/SKILL.md`. Mode selection:
+
+- Step 16.5 verdict `Released` → cycle-end mode
+- Step 16.5 verdict `Released-with-override` or `Rolled-Back` → incident mode
+
+The retrospective closes the cycle's feedback loop by folding metrics into `_docs/06_metrics/retro_<date>.md` (or `incident_<date>_release.md` in incident mode) and appending the top-3 lessons to `_docs/LESSONS.md`.

 After retrospective completes, mark Step 17 as `completed` and enter "Done" evaluation.

 ---

 **Done**
-State-driven: reached by auto-chain from Step 17. (Sanity check: `_docs/04_deploy/` should contain all expected artifacts — containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md.)
+State-driven: reached by auto-chain from Step 17. (Sanity check: if Step 16 was `completed`, `_docs/04_deploy/` should contain the expected deploy artifacts. If Step 16.5 was `completed`, `_docs/04_release/` should contain a release report with a definitive verdict. Skipped deploy/release is valid — no release report required.)

 Action: Report project completion with summary. Then **rewrite the state file** so the next `/autodev` invocation enters the feature-cycle loop in the existing-code flow:

@@ -336,8 +366,13 @@ On the next invocation, Flow Resolution rule 1 reads `flow: existing-code` and r
 | Test-Spec Sync (12, done or skipped) | Auto-chain → Update Docs (13) |
 | Update Docs (13, done or skipped) | Auto-chain → Security Audit choice (14) |
 | Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) |
-| Performance Test (15, done or skipped) | Auto-chain → Deploy (16) |
-| Deploy (16) | Auto-chain → Retrospective (17) |
+| Performance Test (15, done or skipped) | Auto-chain → Deploy choice (16) |
+| Deploy (16, completed) | Auto-chain → Release (16.5) |
+| Deploy (16, skipped) | Mark 16.5 `skipped` → auto-chain → Retrospective (17, cycle-end mode) |
+| Release (16.5, verdict Released) | Auto-chain → Retrospective (17, cycle-end mode) |
+| Release (16.5, verdict Released-with-override) | Auto-chain → Retrospective (17, **incident mode**) |
+| Release (16.5, verdict Rolled-Back) | Auto-chain → Retrospective (17, **incident mode**); do NOT enter Done |
+| Release (16.5, verdict Aborted) | STOP — surface abort reason; do not auto-chain |
 | Retrospective (17) | Report completion; rewrite state to existing-code flow, step 9 |

 ## Status Summary — Step List
@@ -362,9 +397,10 @@ Flow name: `greenfield`. Render using the banner template in `protocols.md` →
 | 14 | Security Audit             | — |
 | 15 | Performance Test           | — |
 | 16 | Deploy                     | — |
+| 16.5 | Release                  | `DONE (Released \| Released-with-override \| Rolled-Back \| Aborted)` |
 | 17 | Retrospective              | — |

-All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 12, 13, 14, 15 additionally accept `SKIPPED`.
+All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 12, 13, 14, 15, 16, 16.5 additionally accept `SKIPPED`.

 Row rendering format (step-number column is right-padded to 2 characters for alignment):

@@ -385,5 +421,6 @@ Row rendering format (step-number column is right-padded to 2 characters for ali
 Step 14  Security Audit            [<state token>]
 Step 15  Performance Test          [<state token>]
 Step 16  Deploy                    [<state token>]
+ Step 16.5 Release                  [<state token>]
 Step 17  Retrospective             [<state token>]
 ```
@@ -13,7 +13,7 @@ The autodev persists its position to `_docs/_autodev_state.md`. This is a lightw

 ## Current Step
 flow: [greenfield | existing-code | meta-repo]
-step: [1-17 for greenfield, 1-17 for existing-code, 1-6 for meta-repo (incl. fractional 2.5 and 3.5), or "done"]
+step: [1-17 for greenfield (incl. fractional 16.5), 1-17 for existing-code (incl. fractional 16.5), 1-6 for meta-repo (incl. fractional 2.5 and 3.5), or "done"]
 name: [step name from the active flow's Step Reference Table]
 status: [not_started / in_progress / completed / skipped / failed]
 sub_step:
@@ -2,7 +2,7 @@
 name: code-review
 description: |
  Multi-phase code review against task specs with structured findings output.
-  6-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency.
+  7-phase workflow: context loading, spec compliance, code quality, security quick-scan, performance scan, cross-task consistency, architecture compliance.
  Produces a structured report with severity-ranked findings and a PASS/FAIL/PASS_WITH_WARNINGS verdict.
  Invoked by /implement skill after each batch, or manually.
  Trigger phrases:
@@ -106,11 +106,12 @@ When multiple tasks were implemented in the same batch:

 ## Phase 7: Architecture Compliance

-Verify the implemented code respects the architecture documented in `_docs/02_document/architecture.md` and the component boundaries declared in `_docs/02_document/module-layout.md`.
+Verify the implemented code respects the architecture documented in `_docs/02_document/architecture.md`, the component boundaries declared in `_docs/02_document/module-layout.md`, and the **accepted Architectural Decision Records** under `_docs/02_document/adr/`.

 **Inputs**:
 - `_docs/02_document/architecture.md` — layering, allowed dependencies, patterns
 - `_docs/02_document/module-layout.md` — per-component directories, Public API surface, `Imports from` lists, Allowed Dependencies table
+- `_docs/02_document/adr/` — every `Status: Accepted` ADR is an enforceable structural rule. `Status: Proposed`, `Status: Deprecated`, and `Status: Superseded` ADRs are NOT enforced (Proposed = not yet ratified; Deprecated/Superseded = a later ADR overturned it). If the directory does not exist or has only the index file, ADRs are skipped — log this skip in the report so the absence is visible.
 - The cumulative list of changed files (for per-batch invocation) or the full codebase (for baseline invocation)

 **Checks**:
@@ -125,6 +126,11 @@ Verify the implemented code respects the architecture documented in `_docs/02_do

 5. **Cross-cutting concerns not locally re-implemented**: if a file under a component directory contains logic that should live in `shared/<concern>/` (e.g., custom logging setup, config loader, error envelope), flag it. Severity: Medium. Category: Architecture.

+6. **ADR compliance**: for each `Status: Accepted` ADR, confirm the changed code does not contradict the ADR's `Decision`. Two failure modes are flagged:
+   - **ADR-Violation**: the changed code does the opposite of an Accepted ADR's `Decision`. Example: ADR-002 says "We will use Postgres for transactional data" and the changed code introduces a SQLite dependency for a transactional path. Severity: **Critical**. Category: Architecture. The finding cites the ADR by `NNN_<slug>` and the offending file/line.
+   - **ADR-Drift**: the changed code does something the ADR did not anticipate AND that materially affects the ADR's `Consequences` (positive or negative). Example: ADR-004 says "Event-driven cross-component comms" and a changed file introduces a new synchronous HTTP call between two components. Severity: **High**. Category: Architecture. The finding either proposes "Update ADR-NNN to acknowledge the new pattern" or "Remove the drift to align with ADR-NNN" — never silently accepts.
+   The check skips ADRs that are explicitly out of scope of the changed batch (e.g., ADR-001 about deployment pipeline when the batch only touches business-logic files). Use the ADR's `Evidence` section to determine scope: if no Evidence path overlaps with any changed file, skip the ADR for this batch.
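The scoping rule for check 6 (only `Accepted` ADRs, and only those whose Evidence overlaps the batch) could look roughly like the sketch below. The `Adr` record and the overlap definition are assumptions: here an Evidence path overlaps a changed file when it equals the file or is one of its parent directories, which the source does not specify.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Adr:
    """Hypothetical in-memory view of one ADR file."""
    ident: str                      # e.g. "002_postgres"
    status: str                     # Accepted / Proposed / Deprecated / Superseded
    evidence_paths: list[str] = field(default_factory=list)


def _overlaps(evidence: str, changed: str) -> bool:
    """Assumed overlap rule: same path, or evidence is an ancestor directory."""
    evidence_path = PurePosixPath(evidence)
    changed_path = PurePosixPath(changed)
    return changed_path == evidence_path or evidence_path in changed_path.parents


def adrs_in_scope(adrs: list[Adr], changed_files: list[str]) -> list[str]:
    """Return identifiers of ADRs that must be checked for this batch."""
    return [
        adr.ident
        for adr in adrs
        if adr.status == "Accepted"
        and any(_overlaps(e, c) for e in adr.evidence_paths for c in changed_files)
    ]
```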
+
 **Detection approach (per language)**:

 - Python: parse `import` / `from ... import` statements; optionally AST with `ast` module for reliable symbol resolution.
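For the Python case, a minimal sketch of import extraction with the standard-library `ast` module follows. The function name `imported_modules` is hypothetical; comparing the result against a component's `Imports from` list is left to the caller.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the module names referenced by import statements in `source`."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` yields one alias per imported module
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports carry a level; `from . import x` has module None
            modules.add("." * node.level + (node.module or ""))
    return modules
```

Because the source is parsed rather than pattern-matched, import-like text inside strings or comments is not reported.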
@@ -197,7 +203,7 @@ Produce a structured report with findings deduplicated and sorted by severity:

 Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architecture

-`Architecture` findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, or cross-cutting concerns re-implemented locally.
+`Architecture` findings come from Phase 7. They indicate layering violations, Public API bypasses, new cyclic dependencies, duplicate symbols, cross-cutting concerns re-implemented locally, **ADR-Violation** (changed code contradicts an `Accepted` ADR's Decision — Critical), or **ADR-Drift** (changed code introduces a pattern that materially affects an `Accepted` ADR's Consequences without superseding it — High).

 ## Verdict Logic

@@ -232,7 +238,7 @@ The implement skill invokes code-review by:

 1. Reading `.cursor/skills/code-review/SKILL.md`
 2. Providing the inputs above as context (read the files, pass content to the review phases)
-3. Executing all 6 phases sequentially
+3. Executing all 7 phases sequentially
 4. Consuming the verdict from the output

 ### Outputs (returned to the implement skill)
@@ -65,6 +65,7 @@ Announce the selected entrypoint and resolved paths to the user before proceedin
 | 1 Bootstrap Structure | `steps/01_bootstrap-structure.md` | ✓ | — | — |
 | 1t Test Infrastructure | `steps/01t_test-infrastructure.md` | — | — | ✓ |
 | 1.5 Module Layout | `steps/01-5_module-layout.md` | ✓ | — | — |
+| 1.7 System-Pipeline Tasks | `steps/01-7_system-pipeline-tasks.md` | ✓ | — | — |
 | 2 Task Decomposition | `steps/02_task-decomposition.md` | ✓ | ✓ | — |
 | 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | — | — | ✓ |
 | 4 Cross-Verification | `steps/04_cross-verification.md` | ✓ | — | ✓ |
@@ -191,6 +192,20 @@ Read and follow `steps/01-5_module-layout.md`.

 ---

+### Step 1.7: System-Pipeline Tasks (implementation mode only)
+
+Read and follow `steps/01-7_system-pipeline-tasks.md`.
+
+This step exists because per-component task decomposition (Step 2)
+produces one task per component but NEVER produces a task whose
+deliverable is "the production code that drives the end-to-end
+pipeline by calling each component in order against real inputs".
+The architecture document describes the loop; nobody owns it. The
+GPS-passthrough incident (May 2026) is the canonical failure this
+step prevents.
+
+---
+
 ### Step 2: Task Decomposition (implementation and single component modes)

 Read and follow `steps/02_task-decomposition.md`.
@@ -243,6 +258,8 @@ Read and follow `steps/04_cross-verification.md`.
 │       [BLOCKING: user confirms structure]                       │
 │  1.5  Module Layout       → steps/01-5_module-layout.md         │
 │       [BLOCKING: user confirms layout]                          │
+│  1.7  System-Pipeline     → steps/01-7_system-pipeline-tasks.md │
+│       [BLOCKING: user confirms pipeline owners]                 │
 │  2.   Component Tasks     → steps/02_task-decomposition.md      │
 │  4.   Cross-Verification  → steps/04_cross-verification.md      │
 │       [BLOCKING: user confirms dependencies]                    │
@@ -16,7 +16,8 @@
 3. Each component owns ONE top-level directory. Shared code goes under `<root>/shared/` (or language equivalent).
 4. Public API surface = files in the layout's `public:` list for each component; everything else is internal and MUST NOT be imported from other components.
 5. Cross-cutting concerns (logging, error handling, config, telemetry, auth middleware, feature flags, i18n) each get ONE entry under Shared / Cross-Cutting; per-component tasks consume them (see Step 2 cross-cutting rule).
-6. Write `_docs/02_document/module-layout.md` using `templates/module-layout.md` format.
+6. **ADR cross-check**: if `_docs/02_document/adr/` exists, read every `Status: Accepted` ADR. For each, confirm the proposed module layout does not contradict the ADR's `Decision` (e.g., an ADR mandating an event-bus boundary between two components must show up as an `Imports from` exclusion in the layout; an ADR locking a layering style must show up in the Layering table). If an ADR conflicts with the language-conventional layout from step 2, the ADR wins — record the conflict in a `## ADR-driven exceptions to the conventional layout` section of `module-layout.md` with `See ADR NNN_<slug>` references. If the ADR conflict is irreconcilable (the ADR demands something the language genuinely cannot express), STOP and ask the user A/B/C: (A) update the ADR via plan Step 4.5 supersede flow, (B) accept a layered exception with documented rationale, (C) re-open architecture.
+7. Write `_docs/02_document/module-layout.md` using `templates/module-layout.md` format. Each Per-Component Mapping entry that is governed by an ADR includes a trailing `> See ADR NNN_<slug>` line.

 ## Self-verification

@@ -26,6 +27,8 @@
 - [ ] No component's `Imports from` list points at a higher layer
 - [ ] Paths follow the detected language's convention
 - [ ] No two components own overlapping paths
+- [ ] If `_docs/02_document/adr/` exists with Accepted ADRs, every layout decision that an ADR governs has a trailing `> See ADR NNN_<slug>` reference
+- [ ] No Accepted ADR is contradicted by the layout without a documented exception

 ## Save action

@@ -0,0 +1,72 @@
+# Step 1.7: System-Pipeline Tasks (implementation mode only)
+
+**Role**: Professional software architect, integration-focused.
+**Goal**: For every end-to-end pipeline named in `_docs/02_document/architecture.md` and `_docs/02_document/system-flows.md`, ensure there is exactly ONE explicit task that owns the production code that drives that pipeline against real inputs. This step prevents the failure mode where every individual component is "complete" but no production code wires them together (May 2026 GPS-passthrough incident — see `meta-rule.mdc` "When a test reveals missing production code").
+
+**Constraints**:
+
+- This step produces *integration* tasks, not per-component tasks. Per-component tasks come from Step 2.
+- An integration task's owner is typically the composition root, runtime root, main loop, or whichever component the module layout (Step 1.5) names as the "system spine". It is NEVER a leaf component.
+- Each integration task must be sized at 5 points or fewer. If the pipeline is too large for one task, split it into per-stage integration tasks (e.g. "wire ingress → C1", then "wire C1 → C5") rather than one giant task.
+
+## Inputs
+
+| File | Purpose |
+|------|---------|
+| `_docs/02_document/architecture.md` | Source of named end-to-end pipelines and their component sequences |
+| `_docs/02_document/system-flows.md` | Source of operational flows (per-frame loop, request lifecycle, batch job, etc.) |
+| `_docs/02_document/module-layout.md` | Produced by Step 1.5. Names the "system spine" component(s) — typically `runtime_root`, `app`, `main`, `composition`, or equivalent. |
+| `_docs/02_document/components/*/description.md` | Per-component contracts so you can tell which side of a seam each method lives on |
+
+## Steps
+
+1. **Enumerate end-to-end pipelines.** Read `architecture.md` and `system-flows.md`. For each named pipeline / flow that spans 2+ components, record:
+   - The pipeline name (e.g. "per-frame nav loop", "tile-cache build", "operator pre-flight verification").
+   - The ordered sequence of components it touches (e.g. `frame_source → c1_vio → c2_vpr → ... → c5_state → replay_sink`).
+   - The trigger (per-frame, per-request, scheduled, manual).
+   - The output (what the pipeline emits and to whom).
+2. **For each pipeline, locate the owner.** Use `module-layout.md` to find the component that owns the orchestration (the "spine"). If `module-layout.md` does not name one, STOP and ASK the user which component owns the pipeline. Do NOT silently default to the bootstrap structure task — bootstrap is about project skeleton, not behavior.
+3. **Check whether the pipeline is already covered by an existing task spec or by the bootstrap-structure task.** A pipeline is "covered" only if:
+   - A task spec's `Outcome` or `Acceptance Criteria` section explicitly names "drives the {pipeline_name} end-to-end against real production components", AND
+   - That task's owned files include the orchestration code (typically the spine component's main loop / entrypoint).
+4. **For every uncovered pipeline, create a system-integration task spec** in `_docs/02_tasks/todo/` using `.cursor/skills/decompose/templates/task.md`:
+   - **Component**: the spine component from step 2 (e.g. `runtime_root`).
+   - **Outcome**: the production callsite that drives the pipeline exists and runs end-to-end on real inputs.
+   - **Scope / Included**: the orchestration code (loop body, dispatcher, scheduler, entrypoint); explicit list of every component it must call in order; the data type at each seam.
+   - **Acceptance Criteria** (write each as testable):
+     - At least one production caller of every component method in the pipeline can be found by grep — name the methods explicitly.
+     - The orchestration runs against the real production component instances (NOT mocks, NOT a passthrough that bypasses them).
+     - At least one integration test exercises the orchestration end-to-end against real inputs.
+   - **Dependencies**: every per-component task whose component appears in the pipeline.
+   - **Complexity points**: ≤5; split the pipeline if it doesn't fit.
+   - **Tracker**: create a ticket immediately (per `decompose/SKILL.md` "Tracker inline" principle); rename the file to `[TRACKER-ID]_pipeline_<name>.md`.
+5. **Make the integration task a dependency of the integration test task.** If `tests-only` decomposition has already produced an e2e/integration test task for this pipeline, append the new integration task to its `Dependencies` field so the test cannot be "made green" before the integration ships.
+
+## Anti-patterns this step explicitly blocks
+
+- **"compose_root returns a wired runtime"** prose interpreted as "the loop exists". Composition assembles the graph; it is NOT the loop. The loop is the code that pulls inputs, drives each node, and emits outputs. If grep finds zero callers of the leaf components, the loop does not exist regardless of what compose_root does.
+- **Treating the bootstrap-structure task as the home of the main loop.** Bootstrap is project skeleton (package layout, CLI scaffold, build files). It is NOT the main loop. Main loop is its own task.
+- **Per-component tasks claiming integration scope.** A C1 VIO task's deliverable is "C1 works in isolation against unit tests". A C1 task's acceptance criteria MUST NOT include "C1 is wired into the runtime" — that's the integration task's job.
+
+## Self-verification
+
+- [ ] Every pipeline named in `architecture.md` / `system-flows.md` is listed in your enumeration.
+- [ ] Every enumerated pipeline either (a) has an existing covered task, or (b) has a new integration task in `todo/`.
+- [ ] No integration task exceeds 5 complexity points.
+- [ ] Every integration task lists, in `Dependencies`, every per-component task whose component appears in the pipeline.
+- [ ] No integration task is owned by a leaf component — every owner is named in `module-layout.md` as a spine / orchestrator.
+- [ ] Every integration task has a tracker ticket created and the filename renamed to `[TRACKER-ID]_pipeline_<name>.md`.
+
+## Save action
+
+Write the new integration task files into `_docs/02_tasks/todo/`. They will be picked up by Step 2 (Task Decomposition's dependency-table writer) and by Step 4 (Cross-Verification).
+
+## Blocking
+
+**BLOCKING**: Present the pipeline enumeration + the list of new integration tasks to the user. Do NOT proceed to Step 2 until the user confirms:
+
+- The enumeration matches what they expect from the architecture documents.
+- Every uncovered pipeline now has an integration task.
+- The chosen spine owners are correct.
+
+If the user identifies a pipeline you missed, add it before proceeding. If the user names a different spine owner, update the task and re-run self-verification.
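The Step 1.7 self-verification checklist can be approximated mechanically. The following is a minimal sketch, assuming a hypothetical in-memory `IntegrationTask` record; the field names, the `verify_integration_task` helper, and the example tracker ID are illustrative and are not defined by the skill itself.

```python
import re
from dataclasses import dataclass, field

MAX_POINTS = 5  # Step 1.7 constraint: integration tasks are sized at 5 points or fewer
FILENAME = re.compile(r"^.+_pipeline_.+\.md$")  # [TRACKER-ID]_pipeline_<name>.md


@dataclass
class IntegrationTask:
    # Hypothetical view of a task spec; not the real task.md schema.
    filename: str
    owner: str
    points: int
    pipeline_components: list
    dependency_components: set = field(default_factory=set)


def verify_integration_task(task, spine_components):
    """Return the list of checklist failures; an empty list means the task passes."""
    problems = []
    if task.points > MAX_POINTS:
        problems.append("exceeds 5 complexity points; split the pipeline")
    if task.owner not in spine_components:
        problems.append("owner is not a spine / orchestrator component")
    missing = [c for c in task.pipeline_components
               if c not in task.dependency_components]
    if missing:
        problems.append("dependencies missing for: " + ", ".join(missing))
    if not FILENAME.match(task.filename):
        problems.append("file not renamed to [TRACKER-ID]_pipeline_<name>.md")
    return problems
```

A check like this only covers the mechanical items; confirming that the enumeration itself is complete still needs the BLOCKING user review.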
@@ -29,7 +29,7 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`.
 ### Test
 - Unit tests: [framework and command]
 - Blackbox tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
+- Coverage threshold: 75% overall, 90% critical-path floor (100% aim) — per `.cursor/rules/cursor-meta.mdc` Quality Thresholds
 - Coverage report published as pipeline artifact

 ### Security
@@ -291,19 +291,46 @@ For each completed product task:
   - **BLOCKED**: production code exists but cannot be fully verified due to external hardware/data/license/runtime prerequisites; the blocker is explicit and tests report blocked/skipped with reason.
   - **FAIL**: promised production behavior is missing, only scaffolded, or only represented in tests/reports.

+#### 15.b System-Pipeline Check (runs ONCE per gate invocation, after per-task classification)
+
+The per-task classification above (steps 1–8) operates on `_docs/02_tasks/done/`. It catches missing component-local behavior but it CANNOT catch a missing *integration* — there is no task to fail if no task ever owned the integration in the first place. The GPS-passthrough incident (May 2026) escaped this gate because every per-component task in `done/` was honestly complete; the missing piece was the cross-component loop, which had no owning task.
+
+The system-pipeline check fixes that by walking the architecture documents directly, independent of `done/`.
+
+**Inputs**:
+- `_docs/02_document/architecture.md`
+- `_docs/02_document/system-flows.md`
+- Full source tree under the project's production directory (e.g. `src/`).
+
+**Procedure**:
+
+1. **Enumerate end-to-end pipelines.** Read `architecture.md` and `system-flows.md`. For each named pipeline / operational flow that spans 2+ components, record the ordered component sequence and the trigger (per-frame, per-request, scheduled, manual).
+2. **Grep for production callers of each seam method.** For each adjacent pair `A → B` in a pipeline, find a production source file (not under `tests/`, not under a `bench/` package, not a doc) that calls `A`'s public output method AND passes the result into `B`'s public input method.
+3. **Classify the pipeline**:
+   - **WIRED**: a production caller exists and the chain is complete from the first to the last component in the sequence.
+   - **PARTIALLY WIRED**: some adjacent pairs have callers but at least one seam is missing.
+   - **NOT WIRED**: no production code calls the pipeline's components in order. Bench tools, unit tests, and microbenchmarks do NOT count as "wiring".
+4. **Distinguish "wired but stubbed" from "wired with real components"**: a caller that invokes a passthrough / GPS-from-tlog / mock-output-generator instead of the real component is `NOT WIRED` for the purposes of this gate. The seam exists in the source file but the production behavior is faked. Grep for the same scaffold markers Step 15 already enumerates (`placeholder`, `stub`, `passthrough`, `scaffold until`, etc.) inside the caller's body.
+5. **Output**: append a `## System Pipeline Audit` section to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md`. Per-pipeline row: name, sequence, classification, evidence file (the caller, or "NONE FOUND"), remediation suggestion if not `WIRED`.
+
+**Pipeline classification feeds the combined gate below.** Any pipeline that is not `WIRED` is a system-level FAIL that the per-task gate cannot rescue.
+
+**Why this is here and not only in decompose**: decompose Step 1.7 creates integration tasks up front; this check verifies the integration tasks actually got implemented (or, if they were never created, surfaces the gap before the cycle closes). The two layers are belt-and-suspenders by design.
+
 Save the audit to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` with:

 - Per-task classification
 - Evidence files/symbols checked
 - Any unresolved scaffold/native placeholders
 - Any named promised technologies not integrated
+- **System Pipeline Audit table** (per pipeline: name, sequence, WIRED / PARTIALLY WIRED / NOT WIRED, evidence file, remediation suggestion)
 - Required remediation task suggestions, each sized to 5 points or less

 Gate:

- If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, continue to Final Test Run.
- If any product task is `FAIL`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block:
-  - A) Create remediation tasks now and return to implementation
+- If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, AND every enumerated pipeline is `WIRED`, continue to Final Test Run.
+- If any product task is `FAIL` OR any pipeline is `PARTIALLY WIRED` / `NOT WIRED`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block:
+  - A) Create remediation tasks now and return to implementation. (For pipeline FAILs the remediation task is a NEW integration task owned by the spine component per `_docs/02_document/module-layout.md`; it is NOT a test task and NOT a doc task; its deliverable is production code that drives the pipeline against real components.)
  - B) Mark the missing behavior explicitly out of scope in task/docs, then re-run this gate
  - C) Abort for manual correction
 - Recommendation must normally be A unless the user deliberately accepts reduced scope.
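The pipeline classification and the combined gate can be sketched as two small functions. This is an illustration only: the function names and the set-of-pairs representation of seams are assumptions, a seam whose caller contains scaffold markers is simply treated as not wired, and the "explicit prerequisite evidence" required for `BLOCKED` tasks is not modeled.

```python
def classify_pipeline(sequence, wired_seams, stubbed_seams=frozenset()):
    """Classify one pipeline as WIRED / PARTIALLY WIRED / NOT WIRED.

    sequence: ordered component names, e.g. ["frame_source", "c1_vio", "c5_state"].
    wired_seams: set of (A, B) pairs for which a production caller was found.
    stubbed_seams: pairs whose caller is a passthrough or stub; these do not count.
    """
    if len(sequence) < 2:
        raise ValueError("a pipeline spans at least two components")
    seams = list(zip(sequence, sequence[1:]))  # adjacent pairs A -> B
    real = [s for s in seams if s in wired_seams and s not in stubbed_seams]
    if len(real) == len(seams):
        return "WIRED"
    if real:
        return "PARTIALLY WIRED"
    return "NOT WIRED"


def combined_gate(task_results, pipeline_results):
    """Return "CONTINUE" only if every task and every pipeline passes."""
    tasks_ok = all(r in ("PASS", "BLOCKED") for r in task_results.values())
    pipelines_ok = all(r == "WIRED" for r in pipeline_results.values())
    return "CONTINUE" if tasks_ok and pipelines_ok else "STOP"
```

Note that a cycle in which every task is `PASS` still stops if a single pipeline is not `WIRED`, which is the case the per-task gate alone could not catch.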
@@ -84,29 +84,66 @@ Assess the change along these dimensions:
 - **Novelty**: does it involve libraries, protocols, or patterns not already in the codebase?
 - **Risk**: could it break existing functionality or require architectural changes?

-Classification:
+### 2a. Complexity-Points Estimate
+
+Project policy (per the workspace user-rule on ADO points): aim for tasks at 2–3 points (rarely 5). Tasks at 8 points are high risk; tasks at 13 are too complex and MUST be broken down. The new-task skill enforces this here, before producing a single-file task spec.
+
+Map the Scope/Novelty/Risk profile to a points estimate using this table:
+
+| Profile | Points | Examples |
+|---------|--------|----------|
+| All three low | **1–2** | One-line config change; trivial CRUD field addition |
+| Two low + one medium | **3** | Localized refactor; add one well-understood endpoint |
+| One low + two medium, OR all medium | **5** | New small feature touching 2–3 components; integration with a known library |
+| Exactly one high (the other two low or medium) | **8** | Cross-cutting concern across 4+ components; integration with an unfamiliar protocol; significant architectural change |
+| Two or three high | **13** | New subsystem; unfamiliar tech across the stack; multiple unknown unknowns |
+
+If a relevant LESSONS.md entry biases the estimate (e.g., "auth-related changes historically take 2× estimate"), apply the multiplier and round up to the next discrete point on the scale (1, 2, 3, 5, 8, 13).
+
+### 2b. Routing by Complexity
+
+| Estimate | Default routing | Override path |
+|----------|-----------------|---------------|
+| **1–5** | Continue this skill at Step 3 (Research) or Step 4 (Codebase Analysis) — see classification below | — |
+| **8** | **STOP this skill and recommend handoff to `/decompose @<feature_description>`** (single-component decompose mode if the affected scope fits inside one component, default mode if it does not). The user may override and proceed in `/new-task`, but the override must be explicitly chosen. | C) Proceed in /new-task anyway with the user's acknowledgement that the resulting task is high-risk and may need to be re-decomposed mid-implementation |
+| **13** | **STOP this skill — auto-handoff is mandatory.** A 13-point feature cannot be a single task spec. Invoke `/decompose @<feature_description>` (default mode) before writing any task file. Surface the handoff to the user with no override path; this is a hard policy gate. | None — must decompose |
+
+For the auto-handoff path:
+
+1. Render a one-paragraph description of the feature suitable to feed `/decompose` (combine Step 1's verbatim user description with the complexity-points reasoning).
+2. Save it to `_docs/02_task_plans/<feature_slug>/feature-description.md` so the decompose skill has a stable input file.
+3. Either (a) directly auto-chain into `.cursor/skills/decompose/SKILL.md` in default mode with this file as input, or (b) report the handoff to the user along with the exact `/decompose` invocation and stop. Pick (a) only if the user has explicitly enabled auto-chain across skills (e.g., we are inside an `/autodev` invocation); otherwise pick (b).
+
+### 2c. Research vs Skip Research (only for ≤5 estimates)
+
+Classification (independent of points; runs only when points ≤ 5 and Step 2b chose Continue):

 | Category | Criteria | Action |
 |----------|----------|--------|
-| **Needs research** | New libraries/frameworks, unfamiliar protocols, significant architectural change, multiple unknowns | Proceed to Step 3 (Research) |
+| **Needs research** | New libraries/frameworks, unfamiliar protocols, multiple unknowns | Proceed to Step 3 (Research) |
 | **Skip research** | Extends existing functionality, uses patterns already in codebase, straightforward new component with known tech | Skip to Step 4 (Codebase Analysis) |

-Present the assessment to the user:
+Present the full assessment to the user:

 ```
 ══════════════════════════════════════
 COMPLEXITY ASSESSMENT
 ══════════════════════════════════════
- Scope:   [low / medium / high]
- Novelty: [low / medium / high]
- Risk:    [low / medium / high]
+ Scope:    [low / medium / high]
+ Novelty:  [low / medium / high]
+ Risk:     [low / medium / high]
+ Points:   [1 / 2 / 3 / 5 / 8 / 13]   (project aim: 2–3, rarely 5)
+ Routing:  [Continue in /new-task | Hand off to /decompose]
 ══════════════════════════════════════
- Recommendation: [Research needed / Skip research]
- Reason: [one-line justification]
+ Recommendation: [Research needed | Skip research | Decompose required]
+ Reason: [one-line justification, including any LESSONS.md influence]
 ══════════════════════════════════════
 ```

-**BLOCKING**: Ask the user to confirm or override the recommendation before proceeding.
+**BLOCKING**:
+- If points ≤ 5 → ask the user to confirm or override the research recommendation before proceeding.
+- If points = 8 → ask the user to choose between hand-off to /decompose (recommended) and continuing in /new-task with explicit risk acknowledgement.
+- If points = 13 → STOP and present the handoff plan; do not offer a continue-anyway override.
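
The points table, the LESSONS.md multiplier, and the routing rules can be sketched as follows. The function names are hypothetical. Two readings are assumptions of this sketch: a single high dimension maps to 8 (two or three highs map to 13), and the "1–2" band is returned as its upper bound. Capping a multiplied estimate at 13 is also an assumption.

```python
SCALE = (1, 2, 3, 5, 8, 13)
LEVELS = ("low", "medium", "high")


def estimate_points(scope, novelty, risk):
    """Map a Scope/Novelty/Risk profile to a point estimate."""
    profile = (scope, novelty, risk)
    for level in profile:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
    highs = profile.count("high")
    mediums = profile.count("medium")
    if highs >= 2:
        return 13
    if highs == 1:
        return 8
    if mediums >= 2:
        return 5
    if mediums == 1:
        return 3
    return 2  # all three low: the table gives 1–2; upper bound chosen here


def apply_lessons_multiplier(points, multiplier):
    """Apply a LESSONS.md multiplier, rounding up to the next point on the scale."""
    raw = points * multiplier
    for step in SCALE:
        if step >= raw:
            return step
    return SCALE[-1]  # assumption: nothing above 13 exists on the scale


def route(points):
    """Return the default routing for an estimate."""
    if points not in SCALE:
        raise ValueError("estimate must be on the 1, 2, 3, 5, 8, 13 scale")
    if points <= 5:
        return "continue in /new-task"
    if points == 8:
        return "recommend /decompose (user may override)"
    return "/decompose required (no override)"
```

For example, a 3-point estimate with a 2× multiplier gives 6, which rounds up to 8 and therefore changes the routing from "continue" to a recommended handoff.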

 ---

@@ -203,7 +240,13 @@ Apply the four shared-task triggers from `.cursor/skills/decompose/SKILL.md` Ste
  2. Add the layout edit to the task's deliverables; the implementer writes it alongside the code change.
  3. If `module-layout.md` does not exist, STOP and instruct the user to run `/document` first (existing-code flow) or `/decompose` default mode (greenfield). Do not guess.

-Record the classification and any contract/layout deliverables in the working notes; they feed Step 5 (Validate Assumptions) and Step 6 (Create Task).
+- **ADR cross-check** — runs unconditionally for every new-task in any of the three classifications above:
+  1. If `_docs/02_document/adr/` exists, scan every `Status: Accepted` ADR. For each, ask: "would the proposed task either contradict this ADR's `Decision` or materially affect its `Consequences`?"
+  2. **Conflict** (task contradicts an Accepted ADR) → STOP and Choose A/B/C: **A)** Re-scope the task to comply with the ADR, **B)** Propose superseding the ADR — the task spec then includes a deliverable to invoke `/plan --adr-only` (or the next `/plan` cycle's Step 4.5) with `Supersedes: ADR-NNN`, and the new task does NOT proceed until that supersede ADR is `Accepted`, **C)** Park the task in `backlog/` with a `Blocked-By: ADR-NNN review` note. Do not silently approve a contradictory task.
+  3. **Drift** (task changes assumptions an ADR depends on but does not directly contradict it) → record the affected ADR(s) under a new `### ADR Impact` section in the task spec with `> Affects ADR NNN_<slug>: <one-line summary>`. The implementer surfaces this at code-review Phase 7 (which then classifies it as ADR-Drift if not addressed).
+  4. **Aligned** (task implements something an Accepted ADR mandates) → cite the ADR(s) under `### ADR Compliance` in the task spec with `> Implements ADR NNN_<slug>`. Code-review Phase 7 then expects matching evidence in the implemented code.
+
+Record the classification, any contract/layout deliverables, and any ADR cross-check outcomes in the working notes; they feed Step 5 (Validate Assumptions) and Step 6 (Create Task).

 **BLOCKING**: none — this step surfaces findings; the user confirms them in Step 5.

@@ -263,6 +306,9 @@ Present using the Choose format for each decision that has meaningful alternativ
 - [ ] If Step 4.5 classified the task as producer, the `## Contract` section exists and points at a contract file
 - [ ] If Step 4.5 classified the task as consumer, `### Document Dependencies` lists the relevant contract file
 - [ ] If Step 4.5 flagged a layout delta, the task's Scope.Included names the `module-layout.md` edit
+- [ ] If Step 4.5 flagged an ADR conflict, the task is either re-scoped (A), explicitly blocked on a supersede ADR (B), or parked in backlog (C) — never silently bypassed
+- [ ] If Step 4.5 flagged ADR drift, the task spec has an `### ADR Impact` section listing the affected ADR(s)
+- [ ] If Step 4.5 flagged ADR alignment, the task spec has an `### ADR Compliance` section citing the implemented ADR(s)

 ---

@@ -15,7 +15,7 @@ disable-model-invocation: true

 # Solution Planning

-Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and work item epics through a systematic 6-step workflow.
+Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, ADRs, tests, and work item epics through a systematic workflow with seven step files (1, 2, 3, 4, 4.5, 5, 6) plus a Final quality checklist.

 ## Core Principles

@@ -55,7 +55,7 @@ Read `steps/01_artifact-management.md` for directory structure, save timing, sav

 ## Progress Tracking

-At the start of execution, create a TodoWrite with all steps (1 through 6 plus Final). Update status as each step completes.
+At the start of execution, create a TodoWrite with all steps (1, 2, 3, 4, 4.5, 5, 6 plus Final). Update status as each step completes. The fractional Step 4.5 (ADR Capture) sits between Architecture Review (Step 4) and Test Specifications (Step 5).

 ## Workflow

@@ -85,6 +85,16 @@ Read and follow `steps/04_review-risk.md`.

 ---

+### Step 4.5: Architecture Decision Records (ADRs)
+
+Read and follow `steps/04-5_adr-capture.md`.
+
+This step captures the architecture and tech-stack decisions that were made (or revised) in Steps 2–4 as durable, dated, immutable records under `_docs/02_document/adr/`. ADRs are the single thing in `_docs/` that explains the **why** of each major decision after the conversation history is gone. They are consumed by `decompose` (when bootstrapping module layout), `new-task` (when assessing a new feature against existing decisions), `refactor` (when proposing replacements), and any future code-review cycle that needs to confirm a structural choice was deliberate.
+
+This step is **BLOCKING**: the ADR set must be reviewed and confirmed by the user before Step 5 begins.
+
+---
+
 ### Step 5: Test Specifications

 Read and follow `steps/05_test-specifications.md`.
@@ -120,7 +130,7 @@ Read and follow `steps/07_quality-checklist.md`.
 |-----------|--------|
 | Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — planning cannot proceed |
 | Ambiguous requirements | ASK user |
-| Input data coverage below 75% | Search internet for supplementary data, ASK user to validate |
+| Input data coverage below the canonical threshold (`cursor-meta.mdc` Quality Thresholds) | Search internet for supplementary data, ASK user to validate |
 | Technology choice with multiple valid options | ASK user |
 | Component naming | PROCEED, confirm at next BLOCKING gate |
 | File structure within templates | PROCEED |
@@ -146,6 +156,8 @@ Read and follow `steps/07_quality-checklist.md`.
 │    [BLOCKING: user confirms components]                        │
 │ 4. Review & Risk       → risk register, iterations              │
 │    [BLOCKING: user confirms mitigations]                       │
+│ 4.5 ADR Capture        → _docs/02_document/adr/NNN_*.md         │
+│    [BLOCKING: user confirms ADR set]                           │
 │ 5. Test Specifications → per-component test specs               │
 │ 6. Work Item Epics     → epic per component + bootstrap         │
 │    ─────────────────────────────────────────────────           │
@@ -26,6 +26,10 @@ DOCUMENT_DIR/
 │   └── deployment_procedures.md
 ├── risk_mitigations.md
 ├── risk_mitigations_02.md          (iterative, ## as sequence)
+├── adr/
+│   ├── 001_[decision_slug].md
+│   ├── 002_[decision_slug].md
+│   └── ...
 ├── components/
 │   ├── 01_[name]/
 │   │   ├── description.md
@@ -66,6 +70,8 @@ DOCUMENT_DIR/
 | Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
 | Step 3 | Diagrams generated | `diagrams/` |
 | Step 4 | Risk assessment complete | `risk_mitigations.md` |
+| Step 4.5 | Each ADR captured | `adr/NNN_[decision_slug].md` |
+| Step 4.5 | ADR index updated | `adr/README.md` |
 | Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
 | Step 6 | Epics created in work item tracker | Tracker via MCP |
 | Final | All steps complete | `FINAL_report.md` |
@@ -85,3 +91,15 @@ If DOCUMENT_DIR already contains artifacts:
 2. Identify the last completed step based on which artifacts exist
 3. Resume from the next incomplete step
 4. Inform the user which steps are being skipped
+
+#### Step 4.5 (ADR Capture) resumption rule
+
+ADR files have a `Status` field that disambiguates "step in progress" from "step done":
+
+- `Status: Proposed` → Step 4.5 is **in progress**. The user has not yet hit the BLOCKING gate (or hit it and chose B/C/D, which kept files at `Proposed`). Resume Step 4.5 at Phase 4.5f and re-present the BLOCKING Choose to the user. Do NOT skip to Step 5.
+- `Status: Accepted` AND `adr/README.md` index exists AND every Accepted ADR is referenced in the index → Step 4.5 is **done**. Skip to Step 5.
+- `Status: Accepted` but `adr/README.md` is missing or out of date → Step 4.5 is **partially complete**. Resume at Phase 4.5d (Maintain the ADR Index) before moving on.
+- Mixed `Proposed` + `Accepted` files in the same directory → Step 4.5 is **in progress** with prior partial confirmations. Resume at Phase 4.5f and re-present only the still-`Proposed` ADRs.
+- Empty `adr/` directory or no `adr/` directory → Step 4.5 has not started yet. Begin at Phase 4.5a.
+
+The `Date` field on every Accepted ADR is the date the user confirmed it; do not regenerate it during resumption.
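
The resumption rule above reduces to a small decision function. This sketch assumes the ADR files have already been parsed into a mapping from ADR number to `Status`; the `resume_point` name and its return labels are illustrative, and only the `Proposed` and `Accepted` statuses described above are modeled.

```python
def resume_point(statuses, index_exists, indexed_ids=frozenset()):
    """Decide where Step 4.5 resumes.

    statuses: dict mapping ADR number (e.g. "001") to its Status string.
    index_exists: whether adr/README.md is present.
    indexed_ids: ADR numbers referenced in the index.
    """
    if not statuses:
        return "4.5a"  # empty or missing adr/ directory: not started
    if any(status == "Proposed" for status in statuses.values()):
        return "4.5f"  # in progress (including mixed Proposed + Accepted)
    accepted = {adr for adr, status in statuses.items() if status == "Accepted"}
    if not index_exists or not accepted <= set(indexed_ids):
        return "4.5d"  # index missing or out of date
    return "step-5"  # done


print(resume_point({"001": "Accepted", "002": "Proposed"}, True, {"001"}))
```

Checking for `Proposed` first means a mixed directory always returns to the BLOCKING gate before the index is touched.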
@@ -0,0 +1,187 @@
+# Step 4.5: Architecture Decision Records (ADRs)
+
+**Role**: Architect / technical writer
+**Goal**: Capture every major architecture, tech-stack, data-model, and integration decision made during Steps 2–4 as a durable, dated, immutable record under `_docs/02_document/adr/`.
+**Constraints**: ADRs only — do not re-open architecture; do not make new decisions in this step. Document what has been decided, not what is still open.
+
+ADRs are the single thing in `_docs/` that explains the **why** of each major decision after the conversation history is gone. They are consumed by:
+
+- `decompose` Step 1.5 (`steps/01-5_module-layout.md`) — every Accepted ADR is cross-checked against the module-layout proposal; conflicts trigger an explicit Choose between supersede / exception / re-open.
+- `new-task` Step 4.5 (`SKILL.md` § "Step 4.5: Contract & Layout Check") — every new task is classified against Accepted ADRs as Conflict / Drift / Aligned; conflicts STOP the task with a Choose A/B/C; drift adds an `### ADR Impact` section; alignment adds an `### ADR Compliance` section.
+- `refactor` Phase 2b.1 (`phases/02-analysis.md`) — every Accepted ADR is diffed against the proposed roadmap; Violations trigger a BLOCKING supersede gate that produces a `supersede_adr_NNN.md` task before any refactor task is created.
+- `code-review` Phase 7 (`SKILL.md` § "Phase 7: Architecture Compliance") — every changed-files batch is checked against Accepted ADRs; ADR-Violation findings are Critical, ADR-Drift findings are High.
+
+Discipline that still relies on the human: when a downstream skill detects a Drift (or Aligned) case, the resulting task spec MUST include the matching `### ADR Impact` (or `### ADR Compliance`) section; the implementer must address it; the next code-review batch then has the context it needs. Drift left undocumented is the silent-failure path; every consumer hook above is designed to make it visible.
+
+## Inputs
+
+- `_docs/02_document/architecture.md` (incl. confirmed `## Architecture Vision`)
+- `_docs/02_document/glossary.md`
+- `_docs/02_document/data_model.md`
+- `_docs/02_document/system-flows.md`
+- `_docs/02_document/risk_mitigations.md` (and any `risk_mitigations_NN.md` iterations from Step 4)
+- `_docs/02_document/components/[##]_[name]/description.md`
+- `_docs/02_document/deployment/` (CI/CD, environments, observability)
+- `_docs/00_problem/restrictions.md` and `_docs/00_problem/acceptance_criteria.md` (each ADR must reference relevant constraints / AC by ID)
+- Optional: `_docs/01_solution/solution.md` and `_docs/01_solution/tech_stack.md` (research output)
+- Optional: `_docs/LESSONS.md` — surface any lesson categories of `architecture` / `dependencies` that bias the recommendation
+
+## What is an ADR (and what is not)
+
+Capture an ADR when **all** of the following hold:
+
+1. The decision picks between two or more genuinely valid approaches with meaningful trade-offs.
+2. The decision has **downstream consequences** that other decisions, code, or tasks inherit.
+3. The decision is **non-obvious** to a future reader who only sees the final code — they would ask "why was it built this way?" rather than discovering the answer by reading the source.
+
+Do NOT create an ADR for:
+
+- Naming, formatting, or purely cosmetic choices.
+- A choice that is fully implied by a single explicit restriction (`restrictions.md` is itself the record — link to it from the architecture doc instead).
+- A choice the team has not actually made yet — open questions live in `risk_mitigations.md` or `_docs/_process_leftovers/`, not in ADRs.
+- A technology selection where research already produced an exact-fit selection with one viable option (the research doc is the record — link to the relevant `solution_draft*.md` section).
+
+## Process
+
+### Phase 4.5a: Decision Inventory
+
+Walk the inputs and list candidate decisions. For each candidate, record a one-liner:
+
+```
+- [decision] — [trade-off summary] — [downstream consumers] — [evidence file:section]
+```
+
+Inspect at minimum:
+
+| Inspection target | Typical decisions surfaced |
+|-------------------|----------------------------|
+| `architecture.md` § layering | Layering style (clean vs hex vs n-tier), which layer owns transactions, how cross-cutting concerns enter |
+| `architecture.md` § Architecture Vision | The North Star principle (e.g., "edge-first, sync-second"); ADR captures the implication for one specific subsystem |
+| `data_model.md` | Datastore choice (Postgres vs Mongo), partitioning, soft vs hard deletes, schema evolution strategy |
+| `system-flows.md` | Sync vs async boundaries, idempotency strategy, retry policy ownership, error envelope shape |
+| `components/*/description.md` § interfaces | Public-API style (REST vs RPC vs event), versioning strategy, auth/authorization placement |
+| `deployment/containerization.md` | Single container vs sidecar vs init container, base image lineage |
+| `deployment/ci_cd_pipeline.md` | Trunk-based vs feature-branch, gate ordering, deploy strategy (blue-green / canary / all-at-once) |
+| `deployment/observability.md` | Logging stack, metric backend, sampling rate decisions, retention |
+| `risk_mitigations.md` | Risk-acceptance trade-offs (e.g., "we accept N% data loss in exchange for sub-100ms p99") |
+| Tech-stack from `_docs/01_solution/tech_stack.md` | Anything where research recorded ≥2 candidates and a winner |
+
+Drop any candidate that fails the three "what is an ADR" criteria above. Keep the rest.
+
+### Phase 4.5b: Numbering and Slugs
+
+ADRs are numbered globally per project, monotonically, never re-used.
+
+1. List existing files under `_docs/02_document/adr/` matching `^[0-9]{3}_.+\.md$`.
+2. The next ADR number is `max(existing) + 1` (or `001` when no ADR files exist yet), zero-padded to 3 digits.
+3. The slug is kebab-case, ≤6 words, derived from the decision summary. Example: `001_use-postgres-for-transactional-data.md`, `004_event-driven-cross-component-comms.md`.
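The numbering and slug rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`next_adr_number`, `make_slug`, `adr_filename`) are hypothetical, starting at `001` for an empty directory is an assumption, and truncating the summary to its first six words is only one possible way to meet the "≤6 words" rule. The sketch sees only the file names it is given, so a number that belonged to a deleted ADR would have to be tracked elsewhere to honor the never-reuse rule.

```python
import re

# Same pattern the step uses to recognize ADR files.
ADR_FILE_RE = re.compile(r"^([0-9]{3})_.+\.md$")


def next_adr_number(existing_names):
    """Return max(existing) + 1, or 1 when no ADR files match (assumption)."""
    numbers = [
        int(match.group(1))
        for name in existing_names
        if (match := ADR_FILE_RE.match(name))
    ]
    return max(numbers, default=0) + 1


def make_slug(summary, max_words=6):
    """Kebab-case the summary and keep at most max_words words."""
    words = re.findall(r"[a-z0-9]+", summary.lower())
    return "-".join(words[:max_words])


def adr_filename(existing_names, summary):
    """Combine the zero-padded number and the slug into a file name."""
    return f"{next_adr_number(existing_names):03d}_{make_slug(summary)}.md"
```

Non-matching names such as `README.md` are ignored by the pattern, so the index file can live in the same directory listing.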
+
+### Phase 4.5c: Render One ADR Per Decision
+
+For each kept candidate, render the ADR using `templates/adr.md`. Required sections (do NOT omit any):
+
+| Section | Content |
+|---------|---------|
+| **Number** | `NNN` |
+| **Title** | One-line decision statement (matches slug) |
+| **Status** | `Proposed` (only during Step 4.5 iteration) → `Accepted` (after user confirmation at the BLOCKING gate) |
+| **Date** | YYYY-MM-DD (the date the user confirmed) |
+| **Deciders** | The user (project owner) — the AI is not a decider |
+| **Context** | The problem this decision addresses, including links to AC IDs, restriction IDs, risks, and (where relevant) the research draft section |
+| **Decision** | The chosen approach in one sentence, then the supporting detail |
+| **Alternatives Considered** | Each alternative with a one-line "rejected because…" |
+| **Consequences** | Positive (what becomes easier / cheaper / faster) and negative (what becomes harder / locked in / costly to undo). Be honest — every decision has a downside. |
+| **Supersedes / Superseded by** | Empty initially; updated when a future ADR overturns this one |
+| **Evidence** | File-and-section pointers into `_docs/` showing where the decision is reflected (architecture.md § layering, components/02_*/description.md § interface, etc.) |
+
+After rendering, write each file to `_docs/02_document/adr/NNN_<slug>.md`. Keep `Status: Proposed` until the BLOCKING gate.
+
+### Phase 4.5d: Maintain the ADR Index
+
+Write or update `_docs/02_document/adr/README.md` with this exact shape:
+
+```markdown
+# Architecture Decision Records
+
+This index lists every ADR for this project, in number order. ADRs are immutable once `Accepted` —
+new decisions that overturn a prior ADR are recorded as new ADRs whose `Supersedes` field points
+back, and the original ADR's `Superseded by` field is updated.
+
+| # | Title | Status | Date | Supersedes |
+|---|-------|--------|------|------------|
+| 001 | Use Postgres for transactional data | Accepted | 2026-05-21 | — |
+| 002 | Event-driven cross-component comms | Accepted | 2026-05-21 | — |
+| ... | ... | ... | ... | ... |
+```
+
+Sort by `#` ascending. Include all ADRs ever written, even superseded ones — the audit trail is the point.
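As a rough illustration of the index shape, the table could be rendered from a list of ADR records like this. The record keys (`number`, `title`, `status`, `date`, `supersedes`) and the function name are assumptions made for the sketch; the skill does not prescribe a data structure.

```python
# The index uses a dash placeholder for an empty Supersedes cell.
EMPTY_CELL = "\u2014"

HEADER = [
    "| # | Title | Status | Date | Supersedes |",
    "|---|-------|--------|------|------------|",
]


def render_index_rows(adrs):
    """Render the index table, sorted by ADR number ascending.

    Every record is rendered, including superseded ones, because the
    index is meant to be a complete audit trail.
    """
    rows = [
        "| {:03d} | {} | {} | {} | {} |".format(
            adr["number"],
            adr["title"],
            adr["status"],
            adr["date"],
            adr.get("supersedes") or EMPTY_CELL,
        )
        for adr in sorted(adrs, key=lambda adr: adr["number"])
    ]
    return "\n".join(HEADER + rows)
```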
+
+### Phase 4.5e: Cross-Link from architecture.md
+
+In `architecture.md`, every section that reflects an ADR decision gets a one-line trailing reference:
+
+```markdown
+> See ADR 001 (Use Postgres for transactional data), ADR 002 (Event-driven cross-component comms).
+```
+
+Place the reference at the end of the section, after the prose. This lets a future reader of `architecture.md` jump straight to the rationale.
+
+### Phase 4.5f: BLOCKING Gate — User Confirmation
+
+Present the ADR set to the user using the Choose format from `.cursor/skills/autodev/protocols.md` (or plain text if AskQuestion is unavailable):
+
+```
+══════════════════════════════════════
+ DECISION REQUIRED: ADR set captured (N records)
+══════════════════════════════════════
+ 001 — [title]
+ 002 — [title]
+ ...
+══════════════════════════════════════
+ A) Accept all ADRs as written
+ B) Edit specific ADRs (numbers and edits)
+ C) Add a missed decision (description)
+ D) Remove an ADR (number and reason)
+══════════════════════════════════════
+ Recommendation: A — review the rendered set and confirm; corrections are quick on Round 2
+══════════════════════════════════════
+```
+
+Loop:
+
+- **A** → flip every ADR's `Status` from `Proposed` to `Accepted`, set `Date` to today's date, save, exit step.
+- **B** → apply edits, re-present the modified ADRs, loop.
+- **C** → run Phase 4.5a–4.5e for the missed decision only, append to the set, re-present, loop.
+- **D** → confirm with the user that the candidate fails the three "what is an ADR" criteria, remove the file, update the index, loop.
+
+Do NOT mark `Accepted` without an explicit user A.
+
+## Self-verification
+
+- [ ] Every kept candidate from Phase 4.5a has a corresponding file under `adr/`
+- [ ] Every ADR has all required sections (none empty except `Supersedes` / `Superseded by`)
+- [ ] `Decision` sections are one-sentence-then-detail, not "we'll figure it out"
+- [ ] `Alternatives Considered` lists at least one rejected alternative per ADR
+- [ ] `Consequences` lists both positive AND negative consequences (an ADR with no negatives is suspect)
+- [ ] `Evidence` points at real `_docs/` sections that exist on disk
+- [ ] `adr/README.md` index lists every file in the directory and matches their `Status` / `Date`
+- [ ] `architecture.md` has a trailing `See ADR …` reference at every section that an ADR reflects
+- [ ] The user confirmed the set via Choose A; every ADR is `Accepted` with today's date
+
+## Common mistakes
+
+- **Re-opening architecture**: Step 4.5 records, it does not decide. If a candidate decision turns out to be unsettled, that's a Step 2 / Step 4 gap — return there, do not paper over it with a wishy-washy ADR.
+- **Decision-of-the-week**: do not write an ADR for every minor pattern choice. The bar is "non-obvious to a future reader". 5–15 ADRs is typical for a planning round; 40+ is over-capture.
+- **Negative consequences left empty**: every real decision has costs. If you cannot name one, the decision was not actually weighed.
+- **Vague evidence**: `architecture.md` is not enough — point at the specific section. `architecture.md § Layering` ≠ `architecture.md`.
+- **Numbering reuse**: never recycle a number from a deleted ADR. The audit trail is more important than tidy numbering.
+- **Superseding without recording**: when a later cycle overturns an ADR, the new ADR must point at the old one via `Supersedes`, AND the old ADR's `Superseded by` field must be updated. Index reflects both. (This is enforced when `decompose` or `refactor` later updates ADRs.)
+
+## Escalation
+
+| Situation | Action |
+|-----------|--------|
+| Candidate decision is unsettled (the team has not actually decided) | Return to the originating step (2 / 3 / 4); do NOT write a placeholder ADR |
+| Two candidates in Phase 4.5a turn out to be the same decision phrased differently | Merge into one ADR, list both phrasings in `Context` |
+| User picks D (remove an ADR) and the AI judges the decision is genuinely worth recording | Surface the disagreement, ASK why the user wants it removed, defer to user |
+| Existing `adr/` directory has files but `adr/README.md` is missing or stale | Rebuild the index from the directory before adding new ADRs |
@@ -2,7 +2,7 @@

 **Role**: Professional Quality Assurance Engineer

-**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage
+**Goal**: Write test specs for each component achieving the canonical minimum acceptance-criteria coverage (currently 75% — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds; do not restate a different number here)

 **Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.

@@ -0,0 +1,67 @@
+# ADR-{NNN}: {decision-title}
+
+- **Status**: {Proposed | Accepted | Deprecated | Superseded}
+- **Date**: {YYYY-MM-DD}
+- **Deciders**: {user / project owner}
+- **Supersedes**: {ADR-NNN | —}
+- **Superseded by**: {ADR-NNN | —}
+
+## Context
+
+What problem does this decision address? Cite the relevant constraint(s), acceptance criterion / criteria, and risk(s) by ID.
+
+- Acceptance criteria addressed: AC-{ID-1}, AC-{ID-2}
+- Restrictions addressed: R-{ID-1}, R-{ID-2}
+- Risks addressed: RISK-{ID-1}
+- Research source (if any): `_docs/01_solution/solution_draftN.md` § {section}
+
+A short paragraph (3–6 sentences) explaining why a choice is required now and what makes it non-trivial. Do not pre-announce the decision here — that goes in `Decision`. Focus on the forces at play (load, scale, team familiarity, hardware constraints, regulatory drivers, third-party limits).
+
+## Decision
+
+One declarative sentence: **"We will …"** Then 1–3 paragraphs of supporting detail explaining how the decision will be implemented at the boundaries between components.
+
+Be specific. "We will use Postgres" is too thin; "We will use Postgres 16 with logical replication for read scaling, restricting JSONB columns to top-level metadata only, with all transactional data in normalized tables" is the right resolution.
+
+## Alternatives Considered
+
+| Alternative | Rejected because |
+|-------------|------------------|
+| {Alt 1 — short label} | {one line: the cost / mismatch / risk that ruled it out, ideally referencing a measurable criterion} |
+| {Alt 2 — short label} | {one line} |
+| {Alt 3 — short label} | {one line} |
+
+At least one rejected alternative is mandatory. If only one option was ever considered, this is not an ADR — link to the source restriction or research selection from the parent doc instead.
+
+## Consequences
+
+### Positive
+
+- {What becomes easier / cheaper / faster, with concrete examples where possible}
+- {…}
+
+### Negative
+
+- {What becomes harder / locked in / costly to undo}
+- {…}
+
+Every real decision has both. If the negatives section is hard to fill, the alternatives were probably not weighed seriously — return to the prior step.
+
+### Neutral / Open
+
+- {What is unchanged but worth flagging for future readers (e.g., "this does not change the auth boundary; auth remains in component 02_user_management as decided in ADR-003")}
+
+## Evidence
+
+Where this decision is reflected on disk. Use `file:section` links so future readers can jump.
+
+- `_docs/02_document/architecture.md` § {section}
+- `_docs/02_document/data_model.md` § {section}
+- `_docs/02_document/components/{##_name}/description.md` § {section}
+- `_docs/02_document/system-flows.md` § {flow name}
+- `_docs/02_document/deployment/{file}.md` § {section}
+- {add more as needed}
+
+## Notes
+
+Optional. Use for caveats that did not fit above, links to external research, or follow-ups that the team agreed to revisit on a known trigger ("re-evaluate after 6 months in production" / "re-evaluate when load exceeds 10× baseline").
@@ -1,6 +1,6 @@
 # Final Planning Report Template

-Use this template after completing all 6 steps and the quality checklist. Save as `_docs/02_document/FINAL_report.md`.
+Use this template after completing all steps (1, 2, 3, 4, 4.5, 5, 6) and the quality checklist. Save as `_docs/02_document/FINAL_report.md`.

 ---

@@ -39,6 +39,44 @@ Write `RUN_DIR/analysis/research_findings.md`:
 4. Prioritize changes by impact and effort
 5. Reject or escalate any proposed refactor that improves code structure while weakening required behavior, integration contracts, runtime constraints, safety/security posture, or acceptance criteria

+### 2b.1. ADR Superseding Gate (BLOCKING)
+
+A refactor that improves code structure while overturning a documented architecture decision is exactly the kind of silent drift that has repeatedly burned this project (see `meta-rule.mdc` § GPS-passthrough postmortem and the auto-lessons it produced). This gate makes the drift visible and forces a deliberate ADR update.
+
+1. **List candidate ADRs**: read every `Status: Accepted` file in `_docs/02_document/adr/`. If the directory does not exist or contains only the index, log `No ADRs in scope` to `RUN_DIR/analysis/adr_impact.md` and skip the rest of this gate.
+2. **Diff each candidate against the proposed refactor roadmap**: for each ADR, ask the same two questions as code-review Phase 7:
+   - **Violation**: does any roadmap item do the *opposite* of the ADR's `Decision`?
+   - **Drift**: does any roadmap item materially affect the ADR's `Consequences` (positive or negative) without contradicting the Decision outright?
+3. **Classify each impacted ADR** in `RUN_DIR/analysis/adr_impact.md`:
+
+   | ADR | Roadmap item | Impact | Required action |
+   |-----|--------------|--------|-----------------|
+   | NNN | `roadmap-item-NN` | Violation / Drift / Aligned | (filled by Choose A/B/C below) |
+
+4. **For every Violation row, present a BLOCKING Choose**:
+
+   ```
+   ══════════════════════════════════════
+    DECISION REQUIRED: Refactor would violate ADR-NNN (<title>)
+   ══════════════════════════════════════
+    A) Update the ADR via supersede: the refactor produces a NEW ADR
+       (`Supersedes: NNN`) capturing the new Decision, and ADR-NNN's
+       `Superseded by` field is updated. The supersede ADR is itself a
+       deliverable of this refactor run (added to RUN_DIR/analysis/adr_impact.md
+       and to TASKS_DIR as a task) and must be `Accepted` before Phase 4.
+    B) Reduce the refactor scope to NOT violate ADR-NNN
+    C) Re-evaluate ADR-NNN: keep the refactor but only after ADR-NNN is
+       formally re-opened in a new /plan Step 4.5 round
+   ══════════════════════════════════════
+    Recommendation: A — supersede is the only path that keeps the audit
+    trail intact while letting the refactor land
+   ══════════════════════════════════════
+   ```
+
+5. **For every Drift row**: do not block, but the roadmap item must include a `## ADR Impact` section in its task spec citing the affected ADR(s). The implementer surfaces this at code-review Phase 7, which would otherwise classify the change as ADR-Drift (High) without context.
+6. **For every Aligned row**: cite the ADR in the roadmap item's task spec under `## ADR Compliance`. No further action.
+7. **Self-supersede deliverable**: any Choose A path adds a `[##]_supersede_adr_NNN.md` task file to the refactor run's TASKS_DIR with the new ADR text drafted (using `.cursor/skills/plan/templates/adr.md`). The task's only Acceptance Criterion is "ADR file exists at `_docs/02_document/adr/<next>_<slug>.md` with `Status: Accepted`, ADR-NNN's `Superseded by` field updated, and `_docs/02_document/adr/README.md` index reflects both."
+
 Present optional hardening tracks for user to include in the roadmap:

 ```
@@ -67,6 +105,8 @@ Write `RUN_DIR/analysis/refactoring_roadmap.md`:

 **BLOCKING applicability gate**: Before 2c and 2d, every recommendation in the roadmap must be `Selected`. Items marked `Rejected` are excluded. Items marked `Experimental only` or `Needs user decision` require a user decision before task creation.

+**BLOCKING ADR-supersede gate**: Before 2c and 2d, every Violation row in `RUN_DIR/analysis/adr_impact.md` (from 2b.1) must be resolved via Choose A, B, or C. A Violation row with no chosen path blocks task creation.
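The blocking condition above amounts to a simple check over the rows of `adr_impact.md`. A minimal sketch, assuming each table row has already been parsed into a dict with hypothetical keys `impact` and `action` (the parsing itself is not shown, and neither key name comes from the skill):

```python
VALID_RESOLUTIONS = {"A", "B", "C"}


def unresolved_violations(rows):
    """Return Violation rows that have no chosen path (A, B, or C)."""
    return [
        row
        for row in rows
        if row["impact"] == "Violation"
        and row.get("action") not in VALID_RESOLUTIONS
    ]


def may_create_tasks(rows):
    """Task creation (2c / 2d) is allowed only when nothing is unresolved."""
    return not unresolved_violations(rows)
```

Drift and Aligned rows never block in this sketch, which mirrors the gate text: only Violation rows require a recorded choice.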
+
 ## 2c. Create Epic

 Create a work item tracker epic for this refactoring run:
@@ -111,6 +151,10 @@ Convert the finalized `RUN_DIR/list-of-changes.md` into implementable task files
 - [ ] Task dependencies are consistent (no circular dependencies)
 - [ ] `_dependencies_table.md` includes all refactoring tasks
 - [ ] Every task has a work item ticket (or PENDING placeholder)
+- [ ] If `_docs/02_document/adr/` exists with Accepted ADRs, `RUN_DIR/analysis/adr_impact.md` has been written and every Violation row is resolved (A/B/C) — no implicit overrides
+- [ ] For every Violation resolved via Choose A, a `[##]_supersede_adr_NNN.md` task exists in TASKS_DIR with the drafted supersede ADR
+- [ ] For every Drift row, the corresponding roadmap-item task spec has a `## ADR Impact` section
+- [ ] For every Aligned row, the corresponding roadmap-item task spec has a `## ADR Compliance` section

 **Save action**: Write analysis artifacts to RUN_DIR, task files to TASKS_DIR

@@ -15,9 +15,9 @@ Before designing or implementing any new tests, check what already exists:
 1. Scan the project for existing test files (unit tests, integration tests, blackbox tests)
 2. Run the existing test suite — record pass/fail counts
 3. Measure current coverage against the areas being refactored (from `RUN_DIR/list-of-changes.md` file paths)
-4. Assess coverage against thresholds:
+4. Assess coverage against thresholds (canonical: see `.cursor/rules/cursor-meta.mdc` Quality Thresholds — never hardcode a different number):
   - Minimum overall coverage: 75%
-   - Critical path coverage: 90%
+   - Critical path coverage: **90% floor / 100% aim** — 90% is the enforcement floor (blocks Phase 4 if not met); 100% is the aspirational target. Refactors are NOT permitted to drop below 90% on the critical paths covered by the in-scope changes.
   - All public APIs must have blackbox tests
   - All error handling paths must be tested
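The numeric thresholds above can be expressed as a small check. This is only a sketch: the constants repeat the values quoted in this step for illustration, whereas a real check would read them from the canonical Quality Thresholds source rather than hardcode them, and the function name and return shape are assumptions.

```python
# Values quoted in this step; the canonical source remains cursor-meta.mdc.
OVERALL_MIN = 75.0
CRITICAL_FLOOR = 90.0
CRITICAL_AIM = 100.0


def assess_coverage(overall_pct, critical_pct):
    """Compare measured coverage percentages against the quoted thresholds."""
    return {
        "overall_ok": overall_pct >= OVERALL_MIN,
        # Falling below the critical-path floor is what blocks Phase 4.
        "blocks_phase_4": critical_pct < CRITICAL_FLOOR,
        # The aim is aspirational; missing it does not block anything.
        "meets_aim": critical_pct >= CRITICAL_AIM,
    }
```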

@@ -47,7 +47,7 @@ For each uncovered critical area, write test specs to `RUN_DIR/test_specs/[##]_[
 4. Document any discovered issues

 **Self-verification**:
-- [ ] Coverage requirements met (75% overall, 90% critical paths) across existing + new tests
+- [ ] Coverage requirements met (75% overall, 90% critical-path floor — 100% aim — per canonical `cursor-meta.mdc` Quality Thresholds) across existing + new tests
 - [ ] All tests pass on current codebase
 - [ ] All public APIs in refactoring scope have blackbox tests
 - [ ] Test data fixtures are configured
@@ -45,7 +45,7 @@ Write `RUN_DIR/test_sync/new_tests.md`:
 - [ ] All obsolete tests removed or merged
 - [ ] All pre-existing tests pass after updates
 - [ ] New code from Phase 4 has test coverage
-- [ ] Overall coverage meets or exceeds Phase 3 baseline (75% overall, 90% critical paths)
+- [ ] Overall coverage meets or exceeds Phase 3 baseline (75% overall, 90% critical-path floor / 100% aim — per `.cursor/rules/cursor-meta.mdc` Quality Thresholds)
 - [ ] No tests reference removed or renamed code

 **Save action**: Write test_sync artifacts; implemented tests go into the project's test folder
@@ -0,0 +1,290 @@
+---
+name: release
+description: |
+  Executes the deployment plan produced by /deploy against a target environment.
+  Closes the loop between "we have a plan" and "the new version is running in production with a verdict on disk."
+  6-phase workflow: pre-release gate, strategy select, execute, smoke test, watch window, commit-or-rollback.
+  Outputs _docs/04_release/release_<version>_<env>_<YYYY-MM-DD-HHmm>.md with a definitive Released / Rolled-Back / Aborted verdict.
+  Trigger phrases:
+  - "release", "ship", "go live", "release this version"
+  - "deploy to prod", "promote to staging", "roll out"
+  - "rollback", "abort the release"
+category: ship
+tags: [release, deployment, rollback, smoke-test, observability, production]
+disable-model-invocation: true
+---
+
+# Release Execution
+
+The `/deploy` skill produces a plan and scripts. The `/release` skill **runs** them, verifies the live system, watches it for a defined window, and produces a definitive verdict on disk.
+
+## Core Principles
+
+- **Real execution, not simulation**: every phase must actually run against the target environment. If a phase cannot be executed (missing scripts, no SSH access, disabled secrets, registry auth failure), STOP — do not pretend a step succeeded. See `meta-rule.mdc` § "Real Results, Not Simulated Ones".
+- **Verifiable rollback path**: the release does not start until rollback is proven viable for this version. "We can roll back" without evidence is not a rollback path.
+- **Quiet failure is a release failure**: a deploy script that exits 0 but emits no observable signal in the watch window is treated as a regression, not a success.
+- **One release per invocation**: a single `/release` execution targets exactly one version against exactly one environment. Multi-stage promotion (staging → prod) is two invocations, not one.
+- **Never skip the watch window**: even successful deploys can degrade after 5–60 minutes (cache warm-up, scheduled jobs, downstream backpressure). The watch window is mandatory.
+- **Autonomous rollback on hard regressions**: critical health-check failure, error-rate spike above threshold, or smoke-test failure → automatic rollback. Soft regressions (latency drift, capacity warnings) escalate to the user.
+
+## Context Resolution
+
+Fixed paths:
+
+- DEPLOY_DIR: `_docs/04_deploy/`
+- RELEASE_DIR: `_docs/04_release/`
+- SCRIPTS_DIR: `scripts/`
+- DEPLOY_SCRIPT: `scripts/deploy.sh`
+- HEALTH_SCRIPT: `scripts/health-check.sh`
+- ENV_TEMPLATE: `.env.example`
+- OBSERVABILITY_DOC: `_docs/04_deploy/observability.md`
+- ENVIRONMENT_DOC: `_docs/04_deploy/environment_strategy.md`
+- PROCEDURES_DOC: `_docs/04_deploy/deployment_procedures.md`
+- ARCHITECTURE: `_docs/02_document/architecture.md`
+- RESTRICTIONS: `_docs/00_problem/restrictions.md`
+
+Announce the resolved paths and the **target environment + version + strategy** to the user before any phase that touches the live system.
+
+## Inputs (BLOCKING prerequisites)
+
+| Input | Required | Source |
+|-------|----------|--------|
+| Target environment | Yes — ASK user | `environment_strategy.md` enumerates valid options |
+| Target version / image tag | Yes — ASK user | Must exist in the registry; verified in Phase 1 |
+| Rollback target version | Yes — ASK user | Defaults to currently-deployed version if discoverable |
+| `scripts/deploy.sh` | Yes | Produced by `/deploy` Step 7. STOP if missing → run `/deploy` first |
+| `scripts/health-check.sh` | Yes | Same |
+| `_docs/04_deploy/deployment_procedures.md` | Yes | Defines per-environment runbook, manual approval rules, change-window restrictions |
+| `_docs/04_deploy/observability.md` | Yes | Defines watch metrics, thresholds, and dashboards |
+| `_docs/04_deploy/environment_strategy.md` | Yes | Defines target hostnames, registries, secrets, deploy strategy per env |
+
+## Outputs
+
+```
+RELEASE_DIR/
+├── release_<version>_<env>_<YYYY-MM-DD-HHmm>.md       (mandatory; one per invocation)
+├── rollback_<version>_<env>_<YYYY-MM-DD-HHmm>.md      (only when rollback fires; pairs with the release file)
+└── manual_approvals/
+    └── approval_<version>_<env>.md                     (when restrictions require manual approval, written before Phase 3)
+```
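The file-name pattern in the tree above can be built with a short helper. A minimal sketch, assuming `HHmm` means 24-hour hours followed by minutes; the function name and the `kind` parameter are hypothetical.

```python
from datetime import datetime


def report_name(kind, version, env, when):
    """Build '<kind>_<version>_<env>_<YYYY-MM-DD-HHmm>.md'.

    kind is 'release' or 'rollback'; a paired rollback file reuses the
    same version and environment as its release file.
    """
    if kind not in ("release", "rollback"):
        raise ValueError(f"unknown report kind: {kind}")
    stamp = when.strftime("%Y-%m-%d-%H%M")
    return f"{kind}_{version}_{env}_{stamp}.md"
```

For example, `report_name("release", "1.4.2", "staging", datetime(2026, 6, 21, 11, 1))` yields `release_1.4.2_staging_2026-06-21-1101.md`.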
+
+The release report (`templates/release-report.md`) is appended to as each phase completes — it is durable across phase failures and reflects partial progress so the next operator can resume or audit.
+
+## Phases
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│             Release Execution (6-Phase Method)                  │
+├────────────────────────────────────────────────────────────────┤
+│ PREREQ: deploy artifacts on disk; tests green at HEAD          │
+│                                                                │
+│ 1. Pre-Release Gate     → AC + change summary + readiness      │
+│    [BLOCKING: user confirms or aborts]                         │
+│ 2. Strategy Select      → all-at-once / blue-green / canary    │
+│    [BLOCKING: user picks strategy]                             │
+│ 3. Execute              → run deploy.sh, capture exit + logs   │
+│    [AUTO-ROLLBACK on non-zero exit]                            │
+│ 4. Smoke Test           → /test-run prod-smoke in target env   │
+│    [AUTO-ROLLBACK on failure]                                  │
+│ 5. Watch Window         → poll observability for N minutes     │
+│    [AUTO-ROLLBACK on hard threshold breach]                    │
+│ 6. Commit or Rollback   → finalize verdict, update tracker     │
+│    [BLOCKING: user confirms only if soft regression escalated] │
+├────────────────────────────────────────────────────────────────┤
+│ Verdicts: Released · Rolled-Back · Aborted                     │
+└────────────────────────────────────────────────────────────────┘
+```
+
+### Phase 1: Pre-Release Gate
+
+**Goal**: Refuse to start if the system is not ready for a real release.
+
+1. **Acceptance criteria check**: read `_docs/00_problem/acceptance_criteria.md`. If any AC is marked unmet OR if any AC has no associated test marked `Passed` in the latest `test-run` report, STOP and surface the unmet items. Do not let the user override with "ship anyway" without a recorded reason in the release report.
+2. **Test status check**: read the most recent `_docs/06_metrics/perf_*.md` (if perf is required by restrictions) and the latest functional test report. Any failing or skipped test that maps to a critical-path AC blocks the release.
+3. **Change summary**: read the git log between the version-tag-of-last-release and HEAD (or, if no prior release exists, from the project root commit). Render a short list grouped by component: features, fixes, breaking changes, security fixes. Cross-reference against the latest implementation reports under `_docs/03_implementation/`.
+4. **Rollback readiness**:
+   - Confirm the previous version's image is still pullable from the registry (do not deploy without this).
+   - Confirm `scripts/deploy.sh --rollback` works as documented (read the script; if `--rollback` flag is missing, STOP — that is a deploy-skill bug).
+   - Confirm a rollback target exists (e.g., previously-deployed image tag) and is recorded in the release report under `Rollback Plan`.
+5. **Restrictions**: read `_docs/00_problem/restrictions.md` for change-window rules, manual-approval rules, blackout windows, regulatory requirements (e.g., 4-eyes review, ITAR controls). If any apply, gate accordingly — write a `manual_approvals/approval_<version>_<env>.md` file once received.
+6. **Tracker check**: list tracker tickets in the release scope (per `tracker.mdc` rules). Any ticket still in `In Progress` or `Code Review` that maps to a change in the release scope blocks Phase 1. Move-and-deploy is not allowed.
+
+**BLOCKING gate**: present the assembled summary to the user using Choose A/B/C:
+
+```
+══════════════════════════════════════
+ PRE-RELEASE GATE
+══════════════════════════════════════
+ Target env:        {env}
+ Target version:    {version} ({git-sha})
+ Rollback target:   {previous-version}
+ Changes:           N tickets, M components
+   - {summary list}
+ Open risks:        {summary or "none"}
+ Blocking issues:   {summary or "none"}
+══════════════════════════════════════
+ A) Proceed to Strategy Select
+ B) Abort — fix blocking issue and re-invoke
+ C) Edit release scope — exclude a ticket and reassemble
+══════════════════════════════════════
+```
+
+If A → write Phase 1 section to release report, proceed. If B → write `Aborted` verdict to release report with reason, exit. If C → loop back into Phase 1 with edited scope.
+
+### Phase 2: Strategy Select
+
+**Goal**: Pick the deployment strategy that fits the change risk and environment capability.
+
+Read `environment_strategy.md` and `deployment_procedures.md` to learn which strategies the target env supports. Strategies and when each is appropriate:
+
+| Strategy | When to pick | Risk if wrong |
+|----------|--------------|---------------|
+| **all-at-once** | Internal tools, low traffic, well-rehearsed change, env supports nothing else | All users hit the new version simultaneously — bug blast radius is 100% |
+| **blue-green** | Stateless services with a load balancer, env has dual-stack capability | Cutover is binary — observability must be ready to detect issues fast |
+| **canary** | Customer-facing, traffic-tier load balancer in place, gradual rollout possible | Canary metric thresholds must be well-tuned or canary fails for harmless reasons |
+| **manual** | Non-automatable env (one-off VMs, regulated infrastructure, non-Docker host) | The whole release becomes a runbook and the watch window phases are operator-driven; the release skill records but does not execute |
+
+Recommend a default based on:
+- Risk level inferred from change summary (any breaking change → bias toward canary or blue-green)
+- Restrictions (e.g., regulatory rules forcing manual approval at each step)
+- Environment capability (some envs may only support all-at-once)
+
+**BLOCKING gate**: Choose A/B/C/D between strategies. Record the choice in the release report.
+
+### Phase 3: Execute
+
+**Goal**: Actually run the deploy. Capture exit code and full stdout/stderr.
+
+1. Validate environment file (`.env`) exists, all required vars from `.env.example` are set, no placeholder secrets remain.
+2. Source the env file and run `scripts/deploy.sh` against the target host. The script produced by `/deploy` Step 7 is the point of execution; do NOT bypass it. If a strategy-specific flag is needed (e.g., `--canary 5%`), pass it through.
+3. Stream stdout/stderr to the release report, with timestamps, in a fenced code block under `## Phase 3: Execute`.
+4. Capture exit code.
+5. **AUTO-ROLLBACK trigger**: non-zero exit code → immediately invoke Phase 6 with verdict `Rolled-Back: deploy script failure`. Do NOT continue to Phase 4.
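Step 1 of this phase (validating the env file against `.env.example`) might look like the sketch below. It assumes plain `KEY=VALUE` lines with `#` comments and does not handle `export` prefixes or multi-line values. The placeholder markers are purely hypothetical examples; the skill does not define what counts as a placeholder secret, so a project would supply its own list.

```python
# Hypothetical markers for "placeholder" values; adjust per project.
PLACEHOLDER_MARKERS = ("changeme", "<", "todo")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_env(example_text, env_text):
    """Return (missing_keys, placeholder_keys) for the required variables."""
    required = parse_env(example_text)
    actual = parse_env(env_text)
    # A key counts as missing when it is absent or set to an empty value.
    missing = sorted(key for key in required if not actual.get(key))
    placeholders = sorted(
        key
        for key in required
        if actual.get(key)
        and any(marker in actual[key].lower() for marker in PLACEHOLDER_MARKERS)
    )
    return missing, placeholders
```

Either list being non-empty would be a reason to stop before running `scripts/deploy.sh`.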
+
+If `deploy.sh` emits no output for more than the configured idle threshold (default 5 minutes; check `deployment_procedures.md` for an explicit value), treat it as hung — capture a snapshot of what's running on the target, kill the script, and AUTO-ROLLBACK with reason `Deploy hung — manual investigation required`.
+
+**Manual strategy**: if Phase 2 picked `manual`, write a checklist of operator steps from `deployment_procedures.md` to the release report and pause until the user types `done` or `failed`. Phase 3 then records the user's report verbatim.
+
+### Phase 4: Smoke Test
+
+**Goal**: Verify the new version is *actually serving traffic correctly* in the target environment.
+
+1. Resolve the smoke-test command from `_docs/02_document/tests/blackbox-tests.md` § Production Smoke Tests, OR delegate to `/test-run` in `--prod-smoke` mode against the target environment.
+2. The smoke-test set must (a) hit each public endpoint of each component, (b) include at least one read AND one write per public endpoint where applicable, and (c) complete in under 5 minutes total.
+3. Capture pass/fail per case to the release report.
+4. **AUTO-ROLLBACK trigger**: any smoke-test failure → invoke Phase 6 with verdict `Rolled-Back: smoke test failure: <test-name>`.
+
+If smoke tests are **missing** for the target environment (no production-mode test set), STOP: write a leftover entry to `_docs/_process_leftovers/` per `tracker.mdc` and do not proceed to the watch window without smoke coverage. Write `Aborted: smoke tests missing for prod-mode target` to the release report and ASK the user.
+
+### Phase 5: Watch Window
+
+**Goal**: Observe the live system for a defined window to catch latent regressions.
+
+1. Read `observability.md` for the project's metrics, dashboards, and threshold definitions. Required watch metrics for any production target (per cursor-meta convention) include error rate, request rate, p99 latency, and saturation (CPU/memory/queue-depth).
+2. Compute the watch-window duration from `deployment_procedures.md`. If unspecified, default to **15 minutes** for staging and **60 minutes** for production.
+3. Poll the observability backend at 1-minute intervals (or the configured cadence). For each interval, record metric snapshots to the release report.
+4. Threshold rules:
+   - **Hard breach** (auto-rollback): error-rate ≥ 2× baseline, p99 latency ≥ 3× baseline, any health-check failure persisting for 2 consecutive intervals.
+   - **Soft breach** (escalate): metric drift at or above 1.5× baseline but below that metric's hard-breach threshold, single-interval health blip, queue-depth steady but elevated.
+   - **No data** (auto-rollback): if metrics are not flowing within the first 3 minutes, treat the absence as a hard breach, because observability is itself broken.
+5. **AUTO-ROLLBACK trigger**: hard breach at any interval. Move to Phase 6 with verdict `Rolled-Back: <metric> breached <multiplier>× baseline at T+<minutes>`.
+6. **ESCALATE trigger**: soft breach. Pause polling, surface the metric, and ask the user A/B/C:
+   - A) Continue watch — accept current drift, keep polling
+   - B) Roll back now — treat soft drift as hard
+   - C) Extend watch window by N minutes
+7. End of watch window with no breach → proceed to Phase 6.
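
The threshold rules above can be sketched as a per-interval classifier. This is a minimal illustration, not the skill's implementation: the `Snapshot` type and `classify` function are hypothetical names, the baseline is assumed to be non-zero, and the text does not say how to treat p99 drift between 2× and 3× baseline, so the sketch reports it as soft (an assumption).

```python
from dataclasses import dataclass

# Default watch-window durations stated in step 2 (minutes).
DEFAULT_WATCH_MINUTES = {"staging": 15, "production": 60}


@dataclass(frozen=True)
class Snapshot:
    """One polling interval (hypothetical shape, not from the source)."""
    error_rate: float
    p99_latency_ms: float
    healthy: bool


def classify(current: Snapshot, previous_healthy: bool, baseline: Snapshot) -> str:
    """Return 'hard', 'soft', or 'ok' for one polling interval."""
    if baseline.error_rate <= 0 or baseline.p99_latency_ms <= 0:
        raise ValueError("baseline metrics must be positive")
    err_ratio = current.error_rate / baseline.error_rate
    p99_ratio = current.p99_latency_ms / baseline.p99_latency_ms
    # Hard breach: health check failing for 2 consecutive intervals.
    two_failures = (not current.healthy) and (not previous_healthy)
    if err_ratio >= 2.0 or p99_ratio >= 3.0 or two_failures:
        return "hard"
    # Soft breach: drift of 1.5x or more, or a single-interval health blip.
    # ASSUMPTION: p99 between 2x and 3x is unspecified in the text; treated as soft.
    if err_ratio >= 1.5 or p99_ratio >= 1.5 or not current.healthy:
        return "soft"
    return "ok"
```

A `hard` result would map to the auto-rollback trigger in step 5 and a `soft` result to the A/B/C prompt in step 6.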
+
+The watch window cannot be skipped. If the user explicitly demands skipping (e.g., emergency rollforward), record the override reason in the release report and continue, but mark the verdict as `Released-with-override` — this triggers an automatic incident retrospective per `retrospective/SKILL.md`.
+
+### Phase 6: Commit or Rollback
+
+**Goal**: Finalize the release with a definitive verdict on disk.
+
+**Path A — Commit (clean release)**:
+1. Update tracker tickets: every ticket in scope moves to `Released` (or `Done`, per project convention defined in `tracker.mdc` / `_docs/_repo-config.yaml`).
+2. Tag the git HEAD with `release/<version>` (or the project's tag convention from `deployment_procedures.md`).
+3. Write the final `Released` verdict to the release report with a summary table.
+4. Trigger `/retrospective --cycle-end` with this release as the cycle terminus.
+5. Auto-chain to autodev's next step (Retrospective in greenfield, or feature-cycle loop start in existing-code).
+
+**Path B — Rollback (auto-fired or user-elected)**:
+1. Run `scripts/deploy.sh --rollback` with the rollback target captured in Phase 1.
+2. Stream output to a new file `RELEASE_DIR/rollback_<version>_<env>_<YYYY-MM-DD-HHmm>.md` AND append a summary to the original release report under `## Rollback`.
+3. Re-run Phase 4 (smoke test) and a 5-minute mini watch window against the rolled-back version. If THAT also fails, escalate immediately — the system is in an unknown state and needs human takeover.
+4. Update tracker tickets back to `Ready for Release` (or the project's pre-release status).
+5. Write the final `Rolled-Back` verdict with full reason chain.
+6. Auto-trigger `/retrospective --incident` with this release as the incident anchor (per `retrospective/SKILL.md` incident mode).
+7. Do NOT auto-chain to anything else — the user owns the next step.
+
+**Path C — Aborted**:
+Reached only via Phase 1 choice B, the Phase 4 smoke-tests-missing escalation, or any phase that detects a precondition violation. Write `Aborted: <reason>` to the release report. Do not auto-chain.
+
+## Self-verification
+
+- [ ] Release report exists at `RELEASE_DIR/release_<version>_<env>_<timestamp>.md` with verdict (Released / Released-with-override / Rolled-Back / Aborted)
+- [ ] Every phase that ran has a section in the release report with timestamps and tool output
+- [ ] On Released: tracker tickets moved to release status; git tag pushed (if convention)
+- [ ] On Rolled-Back: rollback report exists at `RELEASE_DIR/rollback_<version>_<env>_<timestamp>.md`; tracker tickets moved back to pre-release status; incident retrospective scheduled
+- [ ] On Aborted: reason recorded; no live-system changes attempted; no tracker movement
+- [ ] No phase was skipped without an explicit reason recorded in the release report
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| `scripts/deploy.sh` missing or `--rollback` unsupported | STOP — return to `/deploy` Step 7, do not patch the script in `/release` |
+| Registry auth failure during pre-release | STOP — fix credentials at infra layer (per `coderule.mdc`); do not embed creds in the script |
+| Smoke tests missing for prod target | STOP — write a leftover; do not improvise smoke tests in `/release` |
+| Observability backend unreachable | STOP — observability blindness is itself a release blocker |
+| User asks to skip the watch window | Record override, mark verdict `Released-with-override`, fire incident retro |
+| Rollback also fails its smoke test | ESCALATE to user — system is in unknown state; do not loop deploys |
+| Tracker MCP returns Unauthorized during ticket movement | Per `tracker.mdc`, write a leftover entry; do NOT silently continue without confirming the move |
+| Multiple environments named in user request | STOP — one release per invocation; ask user to pick one |
+| Production smoke test would touch real customer data | STOP — that is a `coderule.mdc` violation; ask user to define a smoke endpoint or test account |
+
+## Common Mistakes
+
+- **Skipping the watch window when "everything looks fine after deploy"** — a deploy that exited 0 is not a release that's stable. Watch is mandatory.
+- **Faking smoke tests** to pass the gate when the prod test set is incomplete. STOP and surface the gap; do not embed prod URLs into ad-hoc curl commands.
+- **Rolling forward through a failure** ("the next deploy will fix it"). Roll back first, fix the cause, then deploy a real fix.
+- **Treating the release report as optional** when only an internal tool changed. Every release writes a report — the audit trail is the value, not the prose volume.
+- **Approving manual gates yourself** without the user's input when restrictions require human approval. The release skill records, the human approves.
+- **Reusing `release_<version>` filenames** across attempted releases. Always include the timestamp in the filename so re-attempts are visible side-by-side.
+- **Letting tracker drift silently** between release attempts. If Phase 6 cannot move tickets, the release is not complete — write a leftover and stop.
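
The timestamped filename convention can be illustrated with a small helper. This is a sketch under assumptions: `report_path` is a hypothetical name, and the `YYYY-MM-DD-HHmm` stamp is taken from the rollback filename pattern in Phase 6 and assumed to apply to release reports as well.

```python
from datetime import datetime


def report_path(release_dir: str, kind: str, version: str, env: str,
                when: datetime) -> str:
    """Build a report filename that always carries a timestamp.

    kind is 'release' or 'rollback'; the stamp format mirrors the
    <YYYY-MM-DD-HHmm> pattern used for rollback reports.
    """
    if kind not in ("release", "rollback"):
        raise ValueError(f"unknown report kind: {kind!r}")
    stamp = when.strftime("%Y-%m-%d-%H%M")
    return f"{release_dir}/{kind}_{version}_{env}_{stamp}.md"
```

Because the stamp is part of the name, two attempts at the same version produce two files rather than one overwritten file.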
+
+## Project Mode vs Standalone
+
+- **Project mode** (default): autodev invokes `/release` after `/deploy`. State writes occur under `_docs/_autodev_state.md`. Full integration with retrospective and feature-cycle loop.
+- **Standalone mode**: `/release` invoked directly with `@<artifact>` (rare; usually only for re-running a rollback against a specific version). All outputs still go to `RELEASE_DIR/`.
+
+## Methodology Quick Reference
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│                Release (6 phases, 3 verdicts)                  │
+├────────────────────────────────────────────────────────────────┤
+│ Phase 1  Pre-Release Gate                                      │
+│           AC + tests + change summary + rollback path          │
+│           [BLOCKING — user A/B/C]                              │
+│ Phase 2  Strategy Select                                       │
+│           all-at-once · blue-green · canary · manual           │
+│           [BLOCKING — user picks]                              │
+│ Phase 3  Execute                                               │
+│           scripts/deploy.sh, capture exit code + logs          │
+│           [AUTO-ROLLBACK on non-zero or hang]                  │
+│ Phase 4  Smoke Test                                            │
+│           /test-run --prod-smoke against target                │
+│           [AUTO-ROLLBACK on any failure]                       │
+│ Phase 5  Watch Window                                          │
+│           Poll observability for N minutes                     │
+│           [AUTO-ROLLBACK on hard breach; escalate on soft]     │
+│ Phase 6  Commit or Rollback                                    │
+│           Released → tracker, tag, retrospective               │
+│           Rolled-Back → tracker reset, incident retrospective  │
+│           Aborted → no live-system change                      │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: real execution · verifiable rollback ·             │
+│             quiet failure = release failure ·                  │
+│             watch window mandatory                             │
+└────────────────────────────────────────────────────────────────┘
+```
@@ -0,0 +1,114 @@
+# Release Report — {version} → {env}
+
+- **Date**: {YYYY-MM-DD HH:MM} {timezone}
+- **Operator**: {user}
+- **Strategy**: {all-at-once | blue-green | canary | manual}
+- **Verdict**: {Released | Released-with-override | Rolled-Back | Aborted}
+- **Verdict reason**: {one-line summary}
+
+## Pre-Release Gate (Phase 1)
+
+### Acceptance Criteria
+
+| AC ID | Status | Evidence |
+|-------|--------|----------|
+| AC-001 | Met / Unmet | path:section, test report, etc. |
+
+### Test Status
+
+| Suite | Pass | Fail | Skip | Source |
+|-------|------|------|------|--------|
+| Functional | N | N | N | _docs/03_implementation/{batch}.md |
+| Performance | N | N | N | _docs/06_metrics/perf_*.md |
+
+### Change Summary
+
+| Component | Tickets | Type |
+|-----------|---------|------|
+| {component} | TKT-001, TKT-002 | feature / fix / breaking / security |
+
+### Rollback Plan
+
+- Previous version: `{previous-version}` (registry digest: `{sha}`)
+- Rollback script: `scripts/deploy.sh --rollback`
+- Rollback target verified pullable: yes / no
+- Rollback target verified bootable in target env: yes / no
+
+### Restrictions / Approvals
+
+- Change-window restrictions: {none | description}
+- Manual approvals required: {none | reference to approval file}
+
+### Tracker State at Gate
+
+- Tickets in scope: {N}
+- Tickets blocking release: {0 — list any}
+
+## Strategy Select (Phase 2)
+
+- Recommended: {strategy} — reasoning
+- Chosen: {strategy} — reasoning (if differs from recommended)
+
+## Execute (Phase 3)
+
+- Start: {timestamp}
+- End: {timestamp}
+- Exit code: {0 / non-zero}
+
+```
+<scripts/deploy.sh stdout/stderr stream, with timestamps>
+```
+
+## Smoke Test (Phase 4)
+
+- Mode: {/test-run --prod-smoke | manual smoke set}
+- Start: {timestamp}
+- End: {timestamp}
+
+| Test | Result | Notes |
+|------|--------|-------|
+| {name} | Pass / Fail | response time, status, etc. |
+
+## Watch Window (Phase 5)
+
+- Duration: {minutes}
+- Cadence: {minutes per poll}
+- Backend: {observability source — Prometheus, CloudWatch, Datadog, etc.}
+
+| T+min | error_rate | rps | p99_latency | saturation | health | notes |
+|-------|------------|-----|-------------|------------|--------|-------|
+| 0 | … | … | … | … | OK | … |
+| 1 | … | … | … | … | OK | … |
+| … | … | … | … | … | … | … |
+
+### Threshold breaches
+
+- {None | "p99 latency 1.7× baseline at T+8 — soft breach, user accepted continuation"}
+
+## Commit or Rollback (Phase 6)
+
+### If Released
+
+- Tracker tickets moved: {list}
+- Git tag pushed: {tag} → {sha}
+- Retrospective scheduled: yes — {/retrospective --cycle-end output path}
+
+### If Rolled-Back
+
+- Trigger: {auto / user-elected}
+- Reason: {phase + one-line cause}
+- Rollback start: {timestamp}
+- Rollback end: {timestamp}
+- Post-rollback smoke: pass / fail
+- Tracker tickets moved back: {list}
+- Incident retrospective scheduled: yes — {/retrospective --incident output path}
+
+### If Aborted
+
+- Phase that aborted: {1 / 2 / 3 / 4 / 5}
+- Reason: {one-line cause}
+- No live-system changes attempted: yes / no (if live changes, document under Phase 3 above and treat as Rolled-Back instead)
+
+## Lessons (one-liners; full incident retro if Rolled-Back / Released-with-override)
+
+- {Optional: short one-liner observations the operator wants the next /retrospective to consider}
@@ -2,9 +2,9 @@
 name: retrospective
 description: |
  Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
-  and produce improvement reports with actionable recommendations.
-  3-step workflow: collect metrics, analyze trends, produce report.
-  Outputs to _docs/06_metrics/.
+  and produce improvement reports plus a lessons-log update with actionable recommendations.
+  4-step workflow: collect metrics, analyze trends, produce report, update lessons log.
+  Outputs to _docs/06_metrics/ and appends to _docs/LESSONS.md (ring buffer, last 15).
  Trigger phrases:
  - "retrospective", "retro", "run retro"
  - "metrics review", "feedback loop"
@@ -232,7 +232,7 @@ Present the report summary to the user.

 ```
 ┌────────────────────────────────────────────────────────────────┐
-│              Retrospective (3-Step Method)                     │
+│              Retrospective (4-Step Method)                     │
 ├────────────────────────────────────────────────────────────────┤
 │ PREREQ: batch reports exist in _docs/03_implementation/        │
 │                                                                │
@@ -202,12 +202,12 @@ If invoked in `cycle-update` mode (see "Invocation Modes" above), read and follo
 | Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
 | Missing input_data/expected_results/results_report.md | **STOP** — ask user to provide expected results mapping using the template |
 | Ambiguous requirements | ASK user |
-| Input data coverage below 75% (Phase 1) | Search internet for supplementary data, ASK user to validate |
+| Input data coverage below the canonical threshold (Phase 1) | Search internet for supplementary data, ASK user to validate. See `.cursor/rules/cursor-meta.mdc` Quality Thresholds for the canonical 75% number — do not hardcode a different threshold here. |
 | Expected results missing or not quantifiable (Phase 1) | ASK user to provide quantifiable expected results before proceeding |
 | Test scenario conflicts with restrictions | ASK user to clarify intent |
 | System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |
 | Test data or expected result not provided for a test scenario (Phase 3) | WARN user and REMOVE the test |
-| Final coverage below 75% after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec |
+| Final coverage below the canonical threshold after removals (Phase 3) | BLOCK — require user to supply data or accept reduced spec (see `cursor-meta.mdc` Quality Thresholds) |

 ## Common Mistakes

@@ -252,7 +252,8 @@ When the user wants to:
 │                                                                      │
 │ Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)    │
 │   → phases/03-data-validation-gate.md                                │
-│   [BLOCKING: coverage ≥ 75% required to pass]                        │
+│   [BLOCKING: coverage ≥ canonical threshold required to pass —       │
+│    see cursor-meta.mdc Quality Thresholds (75%)]                     │
 │                                                                      │
 │ Hardware-Dependency Assessment (BLOCKING, pre-Phase-4)               │
 │   → phases/hardware-assessment.md                                    │
@@ -1,7 +1,7 @@
 # Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)

 **Role**: Professional Quality Assurance Engineer
-**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays above 75%.
+**Goal**: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays at or above the canonical threshold (currently 75%; see `.cursor/rules/cursor-meta.mdc` Quality Thresholds; never hardcode a different number in any phase).
 **Constraints**: This phase is MANDATORY and cannot be skipped.

 ## Step 1 — Build the requirements checklist
@@ -0,0 +1,42 @@
+# AZ-688: dev-only environment for the Jetson e2e harness.
+# Jetson-only test policy (2026-05-20) — see _docs/LESSONS.md.
+#
+# Copy this file to `.env.test` and customize. NEVER commit `.env.test`
+# (gitignored). Sourced by `scripts/run-tests-jetson.sh` before
+# `docker compose up`.
+
+# Suite JWT contract — see ../_docs/10_auth.md. The same secret signs the
+# dev JWT (AZ-690) and validates it at the satellite-provider boundary.
+# MUST be ≥ 32 bytes UTF-8. Generate a fresh value with:
+#   openssl rand -hex 32
+JWT_SECRET=DEV-ONLY-REPLACE-WITH-OPENSSL-RAND-HEX-32-OUTPUT-XXXXXXX
+
+# JWT issuer / audience claims. Dev-only values that ONLY validate against
+# the dev secret above. Production deploys MUST use real values provided
+# by the admin team (the admin API stamps `iss`; satellite-provider
+# validates `aud`).
+JWT_ISSUER=DEV-ONLY-iss-admin-azaion-local
+JWT_AUDIENCE=DEV-ONLY-aud-satellite-provider
+
+# Google Maps Platform key. Left empty: AZ-689 seeds local fixture tiles
+# instead, so the hermetic Derkachi e2e flow never calls Google Maps. If
+# you need to exercise the real GMaps tile-download path, set this to a
+# valid key.
+GOOGLE_MAPS_API_KEY=
+
+# AZ-777: Bearer token C11 sends to satellite-provider as
+# `Authorization: Bearer <token>`. The token is a JWT signed with
+# JWT_SECRET above and stamped with the same iss/aud the provider
+# validates. Mint a dev token with:
+#   python scripts/mint_dev_jwt.py
+# Production deploys retrieve this from the admin API and rotate per
+# operator session — never commit a real one.
+SATELLITE_PROVIDER_API_KEY=PASTE-MINTED-JWT-HERE
+
+# SECURITY: development-only TLS bypass for the parent-suite
+# satellite-provider self-signed dev cert. The compose env block sets
+# SATELLITE_PROVIDER_TLS_INSECURE=1 — it stays inside the Jetson e2e
+# harness, never in production. Production deploys MUST use a real
+# CA-issued cert (or your own internal CA) and leave this unset (or
+# set to "0"). C11 logs a single WARNING at startup whenever the
+# insecure flag is active so the operator can audit it.
@@ -0,0 +1,4 @@
+_docs/00_problem/input_data/flight_derkachi/flight_derkachi.mp4 filter=lfs diff=lfs merge=lfs -text
+models/**/*.pt filter=lfs diff=lfs merge=lfs -text
+models/**/*.onnx filter=lfs diff=lfs merge=lfs -text
+models/**/*.engine filter=lfs diff=lfs merge=lfs -text
@@ -45,6 +45,19 @@ tests/fixtures/tiles_corpus/*.jpg
 tests/fixtures/tiles_corpus/*.png
 e2e/fixtures/sitl_replay/

+# Problem-folder flight-log inputs (binary, out-of-band)
+_docs/00_problem/input_data/**/*.tlog
+_docs/00_problem/input_data/**/*.mp4
+_docs/00_problem/input_data/**/*.h264
+_docs/00_problem/input_data/**/*.mkv
+_docs/00_problem/input_data/**/*.zip
+
+# Locally-generated evidence frames for extraction fixtures (large, regenerable)
+_docs/00_problem/input_data/**/frames_src/
+_docs/00_problem/input_data/**/frames_optA/
+_docs/00_problem/input_data/**/frames_optB/
+_docs/00_problem/input_data/**/frames_optC/
+
 # Editor / OS noise
 .idea/
 .vscode/
@@ -60,9 +73,13 @@ fdr_output/
 tile_cache/
 e2e-results/

+# Local scratch / one-off diagnostics
+_scratch/
+
 # Secrets
 .env
 .env.local
+.env.test
 *.key
 !tests/fixtures/mavlink_signing/dev_key

@@ -12,3 +12,31 @@
 Use this fixture for video/telemetry synchronization checks, representative replay smoke tests, VIO hot-path latency, frame-drop accounting, and trajectory comparison against `GLOBAL_POSITION_INT`. The video and telemetry align at exactly three video frames per telemetry row. Camera intrinsics, lens distortion, raw camera resolution, and exact camera-to-body calibration are still unknown, so this fixture is not sufficient by itself for final production camera calibration or satellite-anchor accuracy claims.
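
The three-frames-per-row alignment can be expressed as a simple index mapping. This is a minimal sketch that assumes frame 0 coincides with telemetry row 0 (zero start offset), which the fixture description does not state; the function names are hypothetical.

```python
FRAMES_PER_TELEMETRY_ROW = 3  # stated ratio for this fixture


def telemetry_row_for_frame(frame_index: int) -> int:
    """Map a video frame index to its telemetry row (assumes zero offset)."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return frame_index // FRAMES_PER_TELEMETRY_ROW


def frames_for_row(row_index: int) -> range:
    """Return the frame indices that share one telemetry row."""
    if row_index < 0:
        raise ValueError("row_index must be non-negative")
    start = row_index * FRAMES_PER_TELEMETRY_ROW
    return range(start, start + FRAMES_PER_TELEMETRY_ROW)
```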

 For the test recording, the rotating camera was mechanically fixed in a downward/nadir orientation. Treat the MP4 as a cleaned/cropped replay fixture rather than the raw camera feed.
+
+## Derkachi C6 reference seeding (cycle 3 — AZ-777 + Epic AZ-835)
+
+The end-to-end replay pipeline needs the C6 tile cache pre-populated with the satellite imagery that covers this flight. The seed scripts live under `tests/fixtures/derkachi_c6/`:
+
+| Script | Purpose |
+|--------|---------|
+| `tests/fixtures/derkachi_c6/seed_region.py` (AZ-777 Phase 2) | Bbox-driven seed. Calls `POST /api/satellite/request` on the running `satellite-provider` to onboard the Derkachi area (~50.05–50.15 lat, 36.05–36.15 lon, zoom 15–18). Companion to the existing bbox-download workflow. |
+| `tests/fixtures/derkachi_c6/seed_route.py` (AZ-838 / Epic AZ-835 C2) | Route-driven seed. Reads `derkachi.tlog`, extracts a ≤ 10-waypoint corridor via `replay_input.tlog_route.extract_route_from_tlog`, posts it to `satellite-provider`'s Route API, polls until `mapsReady=true`, and verifies coverage via inventory. ~100× more tile-efficient than the bbox path for this clip. |
+| `tests/fixtures/derkachi_c6/bbox.yaml` | Derkachi bbox + zoom levels + license-attribution metadata (Google Maps Platform ToS + "Imagery © Google" attribution string). |
+| `tests/fixtures/derkachi_c6/README.md` | Step-by-step re-seeding instructions for when the `satellite-provider` Postgres database is wiped; the license attribution that operators must propagate; a pointer to the parent-suite ticket (TBD) for migrating to a true CC-BY satellite source for production. |
+
+Both seed scripts require:
+
+- A running `satellite-provider` reachable at `SATELLITE_PROVIDER_URL` (typically `https://satellite-provider:8080` inside the Jetson compose network).
+- A valid JWT — either `SATELLITE_PROVIDER_API_KEY` env var or `--auto-mint-jwt` (uses `scripts/mint_dev_jwt.py`).
+- `SATELLITE_PROVIDER_TLS_INSECURE=1` if the parent suite is using the self-signed dev cert (development only — production deploys must validate against a CA-issued cert).
+
+The end-to-end orchestrator test `tests/e2e/replay/test_az835_e2e_real_flight.py` (AZ-840) takes only `(derkachi.tlog, flight_derkachi.mp4, khp20s30_factory.json)` and runs the full 7-step pipeline against a populated C6 — see `_docs/02_document/contracts/replay/replay_protocol.md` Invariant 12.b for the orchestration.
+
+### License attribution caveat (cycle 3)
+
+The Jetson `satellite-provider` instance downloads from the **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. This fixture and the seed scripts are dev/research use only. Production deployment requires either:
+
+- Google Maps Platform licensing review for offline-cache use, OR
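+- A parent-suite ticket to switch satellite-provider's upstream to a satellite source whose license permits offline caching (candidates to evaluate: Esri World Imagery, Mapbox satellite, Sentinel-2; not all of these are CC-BY, so each provider's terms must be verified).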
+
+The "Imagery © Google" attribution string is recorded in the seeded catalog's metadata and must be propagated downstream by any operator workflow that surfaces the imagery.
@@ -1,3 +1,34 @@
-Camera model: Topotek KHP20S30
-Daylight Sensor: 1/2.8" CMOS (2.13 Мп).
- Full HD (1920x1080), 30/60 fps
+# Derkachi camera
+
+Camera model: **Topotek KHP20S30**
+Daylight sensor: 1/2.8" CMOS (Sony IMX291-class, 2.13 MP)
+Image resolution: Full HD 1920×1080 @ 30/60 fps
+Lens: 20× optical zoom, f = 4.7 mm – 94 mm
+
+## Calibration
+
+**File**: [`khp20s30_factory.json`](./khp20s30_factory.json)
+**Acquisition method**: `factory_sheet` (AZ-702 — factory-sheet approximation)
+**Assumed zoom setting**: wide-angle (f = 4.7 mm), HFOV ≈ 59.5°
+
+Per-unit checkerboard refinement is **deferred** (no hardware access to the
+Derkachi unit). The factory-sheet calibration is the cheapest reasonable
+starting point. The residual focal-length error is expected to be in the
+**1–3 %** band; at high AGL this may push horizontal position error past the
+AC-3 100 m budget, in which case AZ-699 (T3 real-flight validation) reports
+the honest finding and a follow-up checkerboard task is filed.
+
+### Why factory-sheet (not checkerboard or PnP-from-tlog)
+
+* **Checkerboard**: needs physical access to the airframe + a known-geometry
+  calibration target. Not in scope for AZ-696.
+* **PnP-from-tlog back-computation**: would require a 5-point task in its own
+  right; deferred as an AZ-696 follow-up if the residual budget proves
+  insufficient.
+
+### Replay-test wiring
+
+`tests/e2e/replay/conftest.py::_calibration_path()` prefers this file when
+present and falls back to `tests/fixtures/calibration/adti26.json` otherwise,
+so dev environments that don't carry the calibration file still exercise the
+AC-1 / AC-2 / AC-5 / AC-6 paths.
@@ -0,0 +1,34 @@
+{
+  "camera_id": "khp20s30_factory",
+  "intrinsics_3x3": [
+    [1680.4469, 0.0,       960.0],
+    [0.0,       1680.4469, 540.0],
+    [0.0,       0.0,         1.0]
+  ],
+  "distortion": [0.0, 0.0, 0.0, 0.0, 0.0],
+  "body_to_camera_se3": [
+    [1.0, 0.0, 0.0, 0.0],
+    [0.0, 1.0, 0.0, 0.0],
+    [0.0, 0.0, 1.0, 0.0],
+    [0.0, 0.0, 0.0, 1.0]
+  ],
+  "acquisition_method": "factory_sheet",
+  "metadata": {
+    "model": "Topotek KHP20S30",
+    "sensor": "1/2.8\" CMOS (Sony IMX291-class), 2.13 MP",
+    "image_resolution_px": [1920, 1080],
+    "sensor_width_mm": 5.37,
+    "sensor_height_mm": 3.02,
+    "assumed_focal_length_mm": 4.7,
+    "focal_length_range_mm": [4.7, 94.0],
+    "assumed_zoom": "wide-angle (max FOV, f=4.7 mm)",
+    "computed_hfov_deg": 59.48,
+    "computed_vfov_deg": 35.62,
+    "intrinsics_formula": "fx = fy = focal_mm * (image_width_px / sensor_width_mm); cx = width/2; cy = height/2",
+    "body_to_camera_convention": "identity-down (nadir, camera-z aligned with aircraft body-z = down per FRD body frame)",
+    "residual_budget_pct": 3.0,
+    "note": "Factory-sheet approximation per AZ-702. The KHP20S30 is a 20x optical-zoom camera (f=4.7-94 mm); the wide-angle f=4.7 mm setting is assumed without per-flight EXIF confirmation. Per-unit checkerboard refinement is deferred — see _docs/00_problem/input_data/flight_derkachi/camera_info.md and the AZ-696 epic. AC-3 (<= 100 m horizontal error) may honestly fail if the assumed focal length is wrong by enough to swamp the 100 m budget at the Derkachi AGL band.",
+    "task": "AZ-702",
+    "epic": "AZ-696"
+  }
+}
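
The intrinsics and field-of-view values in this file follow from the `intrinsics_formula` in its metadata. The sketch below recomputes them from the stated sensor size and assumed focal length; the function names are hypothetical, and the pinhole model with square pixels is the same simplification the formula itself makes.

```python
import math


def pinhole_intrinsics(focal_mm: float, sensor_width_mm: float,
                       width_px: int, height_px: int) -> list:
    """fx = fy = focal_mm * (width_px / sensor_width_mm); principal point at centre."""
    fx = focal_mm * (width_px / sensor_width_mm)
    return [
        [fx, 0.0, width_px / 2.0],
        [0.0, fx, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]


def fov_deg(sensor_mm: float, focal_mm: float) -> float:
    """Full field of view across one sensor dimension, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))


# Values copied from the metadata block above.
K = pinhole_intrinsics(4.7, 5.37, 1920, 1080)
hfov = fov_deg(5.37, 4.7)
vfov = fov_deg(3.02, 4.7)
```

Recomputing with a different `focal_mm` shows how strongly `fx` depends on the unconfirmed zoom setting: a 3 % focal-length error scales `fx` by the same 3 %.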
@@ -0,0 +1,167 @@
+# Question Decomposition — Mode B (focused) — Video Extraction from GCS Recording
+
+> Run date: 2026-05-29. Triggered by user question on
+> `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv`.
+> Active mode: **Mode B** (solution_draft01.md exists). Scope of this run is
+> deliberately narrower than a full solution reassessment — it asks whether the
+> existing solution can ingest a *new representative-data class* (operator-side
+> GCS screen recordings of gimbaled multi-sensor balls) as replay fixtures, and
+> what cleanup pipeline is required.
+
+## Original question
+
+> "I have `2026-05-09 16-10-54.mkv` but it's obscured by other elements. Is it
+> possible to make out of it a proper video as from a nadir camera? What's
+> possible options?"
+
+## Research Output Class
+
+**Technical-component selection** (per SKILL.md → Research Output Class table).
+The deliverable will name specific tools (FFmpeg filters, deep video inpainting
+models, mask-aware feature extractors) that will be implemented or operated
+against. All technical-component gates apply (per-mode API verification, MVE,
+fit matrix, Restrictions × Candidate-Mode sub-matrix).
+
+## Active mode
+
+| Aspect | Value |
+|---|---|
+| Skill mode | Mode B (Solution Assessment) |
+| Existing draft | `_docs/01_solution/solution_draft01.md` (329 lines) |
+| Scope of revision | Additive — propose a new test-fixture-prep component (does **not** alter runtime pipeline) |
+| Output | `_docs/01_solution/solution_draft02.md` |
+| Working dir | `_docs/00_research/_mode_b_2026-05-29_video_extraction/` |
+
+## Question type
+
+**Decision Support** — weigh trade-offs across multiple options for converting
+an OSD-burned-in screen-recorded video into a clean nadir replay fixture.
+
+## Novelty Sensitivity
+
+**Medium**. The underlying tools (FFmpeg filters) have been stable for more than 15 years.
+Deep-learning video inpainting evolves rapidly (E2FGVI 2022 → ProPainter 2023 →
+VideoPainter 2025 → VidPivot 2025); version annotations required.
+
+## Project context grounding
+
+From `_docs/00_problem/`:
+- **Spec'd nav-camera (`restrictions.md`)**: ADTi 20MP 20L V1, APS-C, ~5472×3648
+  px, fixed-downward, no gimbal. The `flight_derkachi.mp4` representative
+  fixture is a Topotek KHP20S30 1/2.8" CMOS, 1920×1080, mechanically locked
+  nadir, OSD-off — already pre-cleaned.
+- **The new MKV is a different class of input**: a screen capture of a Ground
+  Control Station UI displaying a Topotek/Viewpro multi-sensor gimbal feed,
+  1280×720 30 fps H.264, ~6 min 7 s, with three layers of overlay: (a) GCS UI
+  chrome (sidebars, minimap, status bar), (b) gimbal-burned-in OSD (attitude,
+  crosshair, FOV brackets, status text, IR PIP), (c) the underlying EO video.
+- **Use-case (per user's selection)**: replay/test fixture for the runtime
+  C1/C2/C3/C4/C5 pipeline, analogous to `flight_derkachi.mp4`.
+- **Constraint (per user's selection)**: only the recorded MKV is available;
+  cannot re-record with OSD off, cannot lock gimbal nadir, cannot pull RTSP
+  stream from the camera.
+
+## Research subject boundary
+
+| Dimension | Boundary |
+|---|---|
+| Population | Single MKV file + the *class* of similar future GCS screen recordings |
+| Geography | Project's operational area (eastern/southern Ukraine) |
+| Timeframe | Cleanup tooling for legacy recordings (no live-system requirement) |
+| Operating context | Offline, developer workstation; output consumed by `tests/e2e/replay/` |
+| Required interfaces | Input: `.mkv` (any container with H.264). Output: H.264 MP4 ingestable by `flight_derkachi.mp4`-style replay path |
+| Non-functional envelope | Offline (no real-time constraint). Hardware: developer workstation (CPU+optional GPU). Output ≤ a few hundred MB per flight. |
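
The input/output interface in the table above (any H.264 container in, H.264 MP4 out) can be sketched as an FFmpeg invocation built in Python. This is an illustration under assumptions: `crop_command` is a hypothetical helper, the crop rectangle in the usage note is a placeholder rather than a measured value, and only the 1280×720 source size comes from the project context.

```python
def crop_command(src: str, dst: str, width: int, height: int,
                 x: int, y: int, src_width: int = 1280,
                 src_height: int = 720) -> list:
    """Build an ffmpeg argv that crops a fixed rectangle and re-encodes to H.264.

    Uses the FFmpeg crop filter syntax crop=w:h:x:y. Even dimensions are
    required here because libx264 with 4:2:0 chroma rejects odd sizes.
    """
    if width <= 0 or height <= 0 or x < 0 or y < 0:
        raise ValueError("crop rectangle must be non-negative and non-empty")
    if x + width > src_width or y + height > src_height:
        raise ValueError("crop rectangle exceeds the source frame")
    if width % 2 or height % 2:
        raise ValueError("crop width and height must be even")
    return [
        "ffmpeg", "-i", src,
        "-vf", f"crop={width}:{height}:{x}:{y}",
        "-c:v", "libx264",
        "-an",  # drop audio; the replay path only needs video
        dst,
    ]
```

For example, `crop_command("in.mkv", "out.mp4", 960, 540, 160, 90)` returns an argv that can be passed to `subprocess.run` once the real video region has been measured.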
+
+## Project Constraint Matrix (relevant subset)
+
+Extracted from `restrictions.md`, `acceptance_criteria.md`, and the Derkachi
+fixture conventions:
+
+| # | Constraint | Source | Binding for this run? |
+|---|---|---|---|
+| C1 | Replay fixtures must be ingestable by `tests/e2e/replay/test_az835_e2e_real_flight.py` (takes a `.mp4` + `.tlog` + calibration JSON) | `flight_derkachi/README.md` | **Yes** |
+| C2 | Output must NOT have synthetic content fabricated by generative models (would invalidate VPR/matching evaluation — pipeline could anchor on hallucinated features instead of real terrain) | `coderule.mdc` "Real Results, Not Simulated Ones" + `meta-rule.mdc` | **Yes** |
+| C3 | Output frame rate may differ from the spec'd 3 Hz; replay layer subsamples | Existing fixtures (`flight_derkachi.mp4` is 30 fps) | No (downstream handles) |
+| C4 | Frame-to-frame registration must succeed for >95% of normal-flight segments (AC-2.1a) — applies if and only if the cleaned fixture is treated as a normal-flight fixture | `acceptance_criteria.md` | Soft: only if frames qualify as nadir |
+| C5 | Output cannot lie about the underlying camera spec; calibration file must reflect the actual recording source (Topotek/Viewpro, not ADTi 20MP) | `flight_derkachi/camera_info.md` shows the convention is to ship a per-camera calibration JSON | **Yes** |
+| C6 | The pipeline producing fixtures should be **reproducible** (versioned scripts, pinned tool versions) so a re-run produces the same fixture | `coderule.mdc` testing principles | **Yes** |
+| C7 | Cleanup must NOT introduce false-positive features the downstream matcher could anchor on | derived from C2; specific to mask-aware vs inpaint trade-off | **Yes** |
+| C8 | Gimbaled, non-nadir frames must be either filtered out or labeled — feeding forward-looking frames into a nadir-tuned VPR will produce nonsense matches | `restrictions.md` "navigation camera fixed downward (no gimbal)" + project's level-flight assumption | **Yes** |
+
+## Sub-questions
+
+1. **SQ-1 — Layer identification**: What spatially-distinct layers are in the
+   recorded video, and which are removable by cropping vs which require active
+   removal?
+2. **SQ-2 — GCS UI chrome removal**: Best technique to remove the deterministic
+   GCS UI sidebars, minimap, status bar, IR PIP?
+3. **SQ-3 — Gimbal-burned OSD removal**: Best technique to remove burned-in
+   gimbal HUD elements (attitude ladder, crosshair, FOV brackets, status text)
+   without fabricating content the downstream matcher could anchor on?
+4. **SQ-4 — Mask-aware downstream alternative**: Can the project's existing
+   C2/C3 stack (DISK + LightGlue) consume a binary mask of OSD regions
+   directly, sidestepping the need to inpaint at all?
+5. **SQ-5 — Non-nadir frame filtering**: How to detect and exclude frames where
+   the gimbal is pointed off-nadir (the burned-in attitude ladder shows the
+   gimbal angle)?
+6. **SQ-6 — Acceptance against existing replay infrastructure**: What
+   metadata/companion-files does the new fixture need to drop into the
+   `flight_derkachi.mp4`-style replay path?
+
+## Perspectives chosen (≥3)
+
+| Perspective | Why | Sub-questions emphasized |
+|---|---|---|
+| **Implementer / Engineer** | This is fundamentally a tooling/pipeline question — the engineer building the fixture cleanup script needs concrete commands and gotchas | SQ-2, SQ-3, SQ-5 |
+| **Contrarian / Devil's advocate** | The naive "just inpaint it with AI" approach has a specific failure mode in this domain (fabricated terrain features) that must be flagged | SQ-3, SQ-4 |
+| **Domain expert / Academic** | VPR + matching algorithms have published mask-aware inference paths; the question of "do we need clean pixels or can we just signal which pixels to ignore" has a literature answer | SQ-4 |
+
+## Question Explosion (search query variants)
+
+For SQ-1 (layer identification): inspection-based, no web search.
+
+For SQ-2 (GCS UI chrome removal):
+- "FFmpeg crop filter exact pixel coordinates"
+- "FFmpeg crop video specific region command line"
+
+For SQ-3 (gimbal-burned OSD removal):
+- "FFmpeg delogo filter remove static OSD overlay video burned-in HUD"
+- "FFmpeg removelogo PNG mask filter syntax"
+- "ProPainter E2FGVI video inpainting state of the art 2025 2026 mask region"
+- "video OSD removal practitioner experience drone gimbal"
+- "temporal median filter remove static HUD OSD video keep moving content"
+- "drone gimbal video OSD removal extract clean nadir feed Topotek Viewpro"
+
+For SQ-4 (mask-aware downstream):
+- "SuperPoint LightGlue masked feature detection ignore region keypoints"
+- "DISK keypoint detector mask region of interest pytorch implementation"
+- "Kornia DISK mask parameter forward pass"
+
+For SQ-5 (non-nadir frame filtering):
+- "MAVLink MOUNT_STATUS gimbal attitude tlog parsing"
+- "OCR pitch angle text from drone HUD video frame"
+
+For SQ-6 (replay infrastructure):
+- (no web; read project docs directly)
+
+## Component Option Search Plan
+
+| Component area | Option families to cover | Required evidence to mark Selected |
+|---|---|---|
+| Frame extraction & re-encode | Simple baseline (FFmpeg `crop`), Established (FFmpeg `crop` + container remux), Open-source (FFmpeg-python wrapper) | Verified `crop` syntax against FFmpeg 8.1 docs; PoC produces playable output |
+| Static-region OSD removal | Simple (FFmpeg `delogo`), Established (FFmpeg `removelogo` with PNG mask), Open-source (Python+OpenCV inpaint per-frame), SOTA (ProPainter, VideoPainter), Adjacent (temporal-median `tmedian`/`atadenoise`), No-build (skip; pass mask downstream), Known-bad (generative models that fabricate content) | Comparison of per-region quality vs cost vs fabrication risk |
+| Mask-aware downstream matcher | The project's existing DISK + LightGlue path with a binary mask injected | Verified Kornia DISK has a `mask` parameter; verified LightGlue maintainers recommend score-map masking |
+| Non-nadir frame filtering | Tlog-based (parse `MOUNT_STATUS`/`MOUNT_ORIENTATION`), OCR-based (read burned-in pitch text), Pixel-pattern-based (detect attitude-ladder rotation), No-build (accept all frames; downstream covariance grows) | Known whether the paired `.tlog` contains gimbal attitude messages |
+| Calibration metadata | Per-camera JSON file in same form as `khp20s30_factory.json` | Topotek/Viewpro spec sheet exists; "factory_sheet" approximation acceptable per AZ-702 precedent |
+
+## Completeness Audit
+
+- ✅ **Layer identification** covered (SQ-1).
+- ✅ **Removal techniques** covered for both GCS UI (SQ-2) and gimbal OSD (SQ-3).
+- ✅ **Alternative path** considered (SQ-4 — mask-aware matchers, no inpainting).
+- ✅ **Frame relevance** covered (SQ-5 — gimbal pointing).
+- ✅ **Integration** covered (SQ-6 — replay path metadata).
+- ✅ **Contrarian view** covered (generative-AI fabrication risk).
+- 🚫 **Audio handling** — not covered; trivially answered (discard audio stream).
+- 🚫 **Frame rate normalization** — not covered; trivially answered (replay
+  layer already subsamples; preserve native 30 fps).
@@ -0,0 +1,202 @@
+# Source Registry — Mode B Video Extraction Run
+
+> All sources accessed 2026-05-29.
+
+## L1 — Official documentation / source code
+
+### #1 — FFmpeg `delogo` filter (official ffmpeg-filters-docs)
+- URL: https://ayosec.github.io/ffmpeg-filters-docs/6.0/Filters/Video/delogo.html
+- Type: L1 (mirror of official FFmpeg filter docs)
+- Tier rationale: Direct documentation of a built-in FFmpeg filter
+- Key claims: rectangular logo region, parameters `x, y, w, h, show`,
+  interpolation from immediately-outside pixels
+- Verified locally: yes — `ffmpeg -h filter=delogo` on FFmpeg 8.1 confirms the
+  parameter set (the `band` parameter present in older versions has been
+  removed in 8.1)
+
+### #2 — FFmpeg `delogo` source (`vf_delogo.c`)
+- URL: https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_delogo.c
+- Type: L1 (FFmpeg upstream source)
+- Tier rationale: Authoritative implementation
+- Key claims: applies a "simple delogo algorithm" interpolating surrounding
+  pixels into the rectangular logo region
+
+### #3 — FFmpeg `removelogo` source (`vf_removelogo.c`)
+- URL: https://www.ffmpeg.org/doxygen/trunk/vf__removelogo_8c_source.html
+- Type: L1 (FFmpeg upstream source)
+- Tier rationale: Authoritative implementation
+- Key claims: bitmap-mask-based blur; "major improvement on the old delogo
+  filter"; mask must be a PNG where pixels are LOGO (white) vs source (black);
+  "only pixels in the mask that line up to pixels outside the logo are used"
+- Local note: Filter exists in FFmpeg 8.1 but rejected our PNG mask with
+  "Invalid argument" (-22) — likely format expectation is stricter than
+  documented; sub-matrix marks this `Verify` rather than blocking.
+
+### #4 — Topotek Gimbals on ArduPilot Copter docs
+- URL: https://ardupilot.org/copter/docs/common-topotek-gimbal.html
+- Type: L1 (ArduPilot upstream documentation)
+- Tier rationale: Direct integration documentation for the camera class shown
+  in this project's screenshots `1.jpeg`–`4.png`
+- Key claims (relevant subset):
+  - Two RTSP video streams: `rtsp://192.168.144.108:554/stream=0` (1080p) and
+    `stream=1` (480p)
+  - Configuration via "GimbalControl" Ethernet app (OSD on/off configurable)
+  - Captured images/videos retrievable from `camera/DCIM/snap` and
+    `camera/DCIM/record` over Ethernet/SMB
+- Implication for this run: The cleanest source recovery path (raw RTSP or
+  on-camera DCIM) was explicitly excluded by the user's "only have this MKV"
+  constraint, but is recorded here as the recommended Option Z for any future
+  recordings.
+
+### #5 — LightGlue maintainer guidance on mask injection (cvg/LightGlue#97)
+- URL: https://github.com/cvg/LightGlue/issues/97
+- Type: L1 (issue answered by repo maintainer @Phil26AT, an author)
+- Tier rationale: Direct from the project that this codebase already uses
+  (per `solution_draft01.md` C3 component)
+- Key claims:
+  - SuperPoint does **not** natively accept a mask in its forward pass
+  - Two recommended workarounds: (a) extract all keypoints, then filter by
+    mask post-hoc, or (b) multiply the SuperPoint score map by a binary mask
+    before NMS
+  - Maintainer comment: "(b) you would get more points in the specified area,
+    and thus more matches"
+
+### #6 — Kornia `DISK.forward(img, mask=None)` API (Kornia docs)
+- URL: https://kornia.readthedocs.io/en/latest/feature.html
+- Type: L1 (Kornia official documentation)
+- Tier rationale: Authoritative for the Kornia DISK wrapper; relevant because
+  the DISK detector is project's chosen C3 detector per `solution_draft01.md`
+- Key claims:
+  - `kornia.feature.DISK.forward(img, mask=None)` accepts `mask` as
+    `(B, 1, H, W)` with values in `[0, 1]`
+  - "the score map is multiplied by this mask before keypoint detection so
+    that features are suppressed in masked regions"
+- Implication: **the project's existing C3 stack is already mask-capable**.
+  This makes Option B (mask-aware downstream, no inpainting) the lowest-risk
+  high-quality path.
+
+### #7 — DISK upstream source (`disk/model/disk.py`)
+- URL: https://github.com/cvlab-epfl/disk/blob/master/disk/model/disk.py
+- Type: L1 (DISK upstream)
+- Tier rationale: Authoritative for DISK semantics
+- Key claims: DISK produces a per-pixel `heatmap` of detection scores;
+  multiplying this by a spatial mask before NMS / sampling is the canonical
+  way to restrict detection to a region
+
+### #8 — FFmpeg `tmedian` filter (built-in)
+- URL: https://ffmpeg.org/ffmpeg-filters.html#tmedian
+- Type: L1 (FFmpeg official filter docs)
+- Tier rationale: Authoritative
+- Key claims: `tmedian` computes per-pixel temporal median over a configurable
+  radius window; built into recent FFmpeg
+
+### #9 — `flight_derkachi/README.md` (project's existing fixture convention)
+- URL: `_docs/00_problem/input_data/flight_derkachi/README.md` (in-repo)
+- Type: L1 (project documentation)
+- Key claims:
+  - Replay fixture is 880×720 H.264 30 fps MP4 with paired `.tlog`-derived
+    `data_imu.csv` and per-camera calibration JSON
+  - The MP4 is a "cleaned/cropped replay fixture rather than the raw camera
+    feed"
+  - "the rotating camera was mechanically fixed in a downward/nadir orientation"
+- Implication: the new MKV-derived fixture should match the same shape
+  (cleaned/cropped MP4 + calibration JSON + telemetry CSV)
+
+### #10 — `flight_derkachi/camera_info.md`
+- URL: `_docs/00_problem/input_data/flight_derkachi/camera_info.md` (in-repo)
+- Type: L1 (project documentation)
+- Key claims:
+  - Derkachi camera: Topotek KHP20S30, 1/2.8" CMOS, 1920×1080
+  - Calibration via "factory_sheet" approximation (AZ-702) is project-accepted
+    when checkerboard isn't possible — same approach applies to the
+    new gimbal
+
+## L2 — Peer-reviewed papers / preprints
+
+### #11 — ProPainter (ICCV 2023)
+- URL: https://shangchenzhou.com/projects/ProPainter/
+- Date accessed: 2026-05-29
+- Type: L2 (peer-reviewed conference paper, project page)
+- Tier rationale: ICCV 2023 paper; SOTA (at publication) non-generative video
+  inpainting baseline
+- Key claims:
+  - Recurrent flow completion + dual-domain (image+feature) propagation +
+    mask-guided sparse Transformer
+  - 808G FLOPs/10 frames at 480p; 0.249 s/frame on undisclosed GPU
+  - +1.46 dB PSNR vs prior SOTA
+- Relevance: Baseline option for offline OSD inpainting; non-generative means
+  it propagates pixels from neighboring frames (no fabricated content) — this
+  is the property our project requires.
+
+### #12 — VideoPainter (arXiv 2503.05639, 2025)
+- URL: https://arxiv.org/html/2503.05639v3
+- Type: L2 (arXiv preprint)
+- Tier rationale: Most recent generative video inpainting (2025)
+- Key claims:
+  - Generative dual-branch architecture
+  - Outperforms ProPainter on segmentation-based VPBench
+  - **Critical caveat for our use case**: explicitly described as a
+    *generative* model that synthesizes fully-masked-object content
+- Implication: **Disqualified for our use case**. Synthesized terrain features
+  would corrupt VPR/matching evaluation (project's `meta-rule.mdc` "Real
+  Results, Not Simulated Ones").
+
+### #13 — VidPivot / DiffuEraser comparison (arXiv 2510.21461, 2025)
+- URL: https://arxiv.org/html/2510.21461v2
+- Type: L2 (arXiv preprint)
+- Key claims: cross-comparison between ProPainter, DiffuEraser, VideoPainter,
+  VidPivot on object removal; ProPainter "effectively removes the target
+  region but struggles to generate semantically consistent content"
+- Implication: confirms ProPainter is the best non-generative option;
+  generative variants share the fabrication risk.
+
+### #14 — DISK paper (NeurIPS 2020, arXiv 2006.13566)
+- URL: https://arxiv.org/abs/2006.13566
+- Type: L2 (peer-reviewed)
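+- Key claims: DISK is trained with reinforcement learning (policy gradient);
+  it produces a dense detection heatmap; supervision comes from depth or
+  epipolar geometry rather than synthetic homographies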
+- Relevance: confirms DISK exposes a heatmap that can be multiplied by a
+  spatial mask before keypoint sampling
+
+## L3 — Practitioner / blog / community
+
+### #15 — "Removing obnoxious logos from videos" (Domain of the Technomancer blog)
+- URL: https://www.technomancer.com/archives/248
+- Type: L3 (practitioner blog)
+- Key claims: practitioner walkthrough of FFmpeg `delogo`+`removelogo`,
+  including the workflow of building a PNG mask from a single frame screenshot
+
+### #16 — Conditional Temporal Median Filter (kevina.org)
+- URL: http://www.kevina.org/temporal_median/
+- Type: L3 (older practitioner page; methodology still cited)
+- Key claims: motion-conditional temporal median — apply median only where
+  motion is below threshold, preserves moving content while suppressing
+  static artifacts
+- Relevance: the "static OSD on moving video" use case maps directly to this
+  filter family. However, in our test the burned-in OSD is *also moving*
+  visually because text values change every frame, so motion-conditional
+  median has limitations.
+
+### #17 — Foundry Nuke `TemporalMedian` reference
+- URL: https://learn.foundry.com/nuke/content/reference_guide/time_nodes/temporalmedian.html
+- Type: L3 (commercial-tool documentation)
+- Key claims: Nuke's `TemporalMedian` exposes a mask channel; effect can be
+  limited to the masked region only — same pattern that FFmpeg `tmedian` lacks
+  natively
+
+## In-repo cross-references (project artifacts)
+
+### #R1 — `_docs/01_solution/solution_draft01.md`
+- C2 component: MixVPR (TensorRT, INT8+FP16) for retrieval
+- C3 component: DISK + LightGlue for matching
+- C5 component: GTSAM iSAM2 + CombinedImuFactor
+- The pipeline does not have a "data ingestion / fixture-prep" component —
+  this is the gap this run addresses.
+
+### #R2 — `_docs/00_research/06_component_fit_matrix/00_summary.md`
+- Lists every component in the existing solution with selection status
+- Confirms no fixture-cleanup component exists
+
+### #R3 — `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`
+- Existing per-camera calibration JSON convention; new gimbal needs an
+  equivalent
@@ -0,0 +1,283 @@
+# Fact Cards — Mode B Video Extraction Run
+
+> Confidence symbols: ✅ High (L1 official) — ⚠️ Medium (L2 academic / official
+> blog) — ❓ Low (L3 practitioner / inference)
+
+## Layer characterization (from local pixel-variance analysis)
+
+### Fact #1 — Three independent overlay layers
+- **Statement**: The recorded `2026-05-09 16-10-54.mkv` (1280×720 H.264 30 fps,
+  6 m 7 s) contains three spatially-overlapping layers: (a) GCS UI chrome
+  rendered as fixed pixel rectangles by the operator's GCS application,
+  (b) gimbal-burned-in OSD rendered upstream of the recorder by the camera
+  itself (attitude ladder, crosshair, FOV brackets, status text, IR
+  picture-in-picture), (c) the underlying EO video.
+- **Source**: Local 12-frame variance analysis (`/tmp/nadir_research/`),
+  extracted frames at t=10,30,60,90,120,150,180,210,240,270,300,330 s
+- **Confidence**: ✅ High (direct measurement)
+- **Related Dimension**: SQ-1 (layer identification)
+- **Fit Impact**: Establishes the action space — each layer needs its own
+  removal/handling strategy
+
+### Fact #2 — IR PIP is itself a live video stream, not a static element
+- **Statement**: The picture-in-picture in the upper-right (~x=720–1080,
+  y=25–235) has 85% dynamic-pixel fraction across the 12 sample frames,
+  consistent with a live IR/thermal video feed, not a static UI element.
+- **Source**: Local variance analysis
+- **Confidence**: ✅ High
+- **Fit Impact**: Cannot be ignored as "noise". Either crop it out
+  geometrically or treat as an opaque rectangle in the OSD mask.
+
+### Fact #3 — GCS UI sidebars contain live values, not pure-static chrome
+- **Statement**: Left sidebar (SL STATS panel) and right sidebar (ROLL/SPEED/
+  DIST/BATT/CURRENT) have mean per-pixel std ≈30–40 across frames, comparable
+  to the actual EO video region. They are pixel-deterministic — same fixed
+  positions on every frame — but the *values* update.
+- **Source**: Local variance analysis
+- **Confidence**: ✅ High
+- **Fit Impact**: Pure geometric crop removes them entirely; no need to
+  inpaint. Easy.
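
The variance measurement behind Facts #2 to #4 can be sketched as below. This is a minimal illustration, not the script used for the analysis: the function names, the `dynamic_thresh=10.0` cut-off for counting a pixel as "dynamic", and the synthetic frame stack are all assumptions made for the example.

```python
import numpy as np

def temporal_std(frames: np.ndarray) -> np.ndarray:
    """Per-pixel standard deviation across time for a (T, H, W) grayscale stack."""
    return frames.astype(np.float64).std(axis=0)

def region_stats(std_map, x0, x1, y0, y1, dynamic_thresh=10.0):
    """Mean std and fraction of 'dynamic' pixels inside [x0, x1) x [y0, y1).
    dynamic_thresh is an assumed cut-off, not a value from the analysis."""
    region = std_map[y0:y1, x0:x1]
    return float(region.mean()), float((region > dynamic_thresh).mean())

# Synthetic stand-in for 12 sampled frames: constant everywhere except one
# rectangle that changes on every frame (playing the role of a live PIP).
rng = np.random.default_rng(0)
frames = np.full((12, 72, 128), 100, dtype=np.uint8)
frames[:, 10:30, 80:120] = rng.integers(0, 256, size=(12, 20, 40), dtype=np.uint8)

std_map = temporal_std(frames)
live_mean, live_frac = region_stats(std_map, 80, 120, 10, 30)
static_mean, static_frac = region_stats(std_map, 0, 60, 40, 70)
print(f"live region:   mean std {live_mean:.1f}, dynamic fraction {live_frac:.2f}")
print(f"static region: mean std {static_mean:.1f}, dynamic fraction {static_frac:.2f}")
```

A region with a high dynamic fraction at a fixed position is either live UI content or video; telling the two apart still needs visual inspection of the frames.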
+
+### Fact #4 — Gimbal HUD text is *also* dynamic-content text on top of moving video
+- **Statement**: The top-left HUD block (`00:00/00`, timestamps, EO/IR zoom,
+  FOV) and bottom-right gimbal text show high std (≈39–40), because both the
+  HUD values change AND the underlying video changes. The HUD is rendered
+  upstream by the camera and is **always at the same screen position**.
+- **Source**: Local variance analysis + visual inspection of frames
+- **Confidence**: ✅ High
+- **Fit Impact**: Position-deterministic but content-dynamic. Inpainting must
+  either propagate from neighboring frames (temporal) or from spatially
+  adjacent pixels (FFmpeg `delogo`).
+
+### Fact #5 — Frame at t=30 s shows gimbal pointed forward (horizon visible), frame at t=300 s shows nadir
+- **Statement**: The gimbal is operator-pointable; not all frames are nadir.
+  Burned-in attitude indicator shows pitch numbers from `-3.7°` (near level)
+  to clearly off-nadir values. The aircraft also appears to be a multirotor
+  (frame at t=300 s shows DIST=17.0 m at low altitude, inconsistent with
+  fixed-wing 1 km AGL).
+- **Source**: Direct visual inspection of `f_030.png` and `f_300.png`
+- **Confidence**: ✅ High (visual)
+- **Fit Impact**: Frame-level filtering required before treating output as a
+  nadir fixture. The replay pipeline tuned for nadir-only would mis-handle
+  forward-looking frames.
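
A frame-level filter of the kind this fact calls for could look like the sketch below. It assumes gimbal pitch samples `(time, pitch_deg)` have already been extracted from telemetry, that `-90°` denotes straight down, and that a `10°` tolerance is acceptable; none of these are confirmed by the source, and whether the paired `.tlog` carries gimbal attitude at all is still an open question.

```python
import numpy as np

def nadir_frame_mask(frame_times, telem_times, telem_pitch_deg,
                     nadir_deg=-90.0, tol_deg=10.0):
    """Boolean array, True where the gimbal pitch interpolated to each frame
    timestamp lies within tol_deg of nadir_deg.

    nadir_deg and tol_deg are assumed conventions, not project values.
    telem_times must be increasing; np.interp holds the first/last pitch
    value for frames outside the telemetry time range.
    """
    frame_times = np.asarray(frame_times, dtype=float)
    pitch = np.interp(frame_times, telem_times, telem_pitch_deg)
    return np.abs(pitch - nadir_deg) <= tol_deg

# Hypothetical telemetry: near-level for 10 s, then tilting down to nadir.
telem_t = [0.0, 10.0, 20.0, 30.0]
telem_pitch = [-3.7, -3.7, -90.0, -90.0]
frame_t = np.arange(0.0, 30.0, 5.0)  # frames at 0, 5, 10, 15, 20, 25 s

keep = nadir_frame_mask(frame_t, telem_t, telem_pitch)
print(keep.tolist())  # [False, False, False, False, True, True]
```

Frames where `keep` is false would be dropped or labeled before encoding, so the gate stays independent of whichever OSD-handling option is chosen.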
+
+## FFmpeg techniques
+
+### Fact #6 — FFmpeg `crop` is a pixel-level deterministic geometric crop
+- **Statement**: `crop=W:H:X:Y` extracts a `W×H` sub-region whose top-left
+  corner is at `(X, Y)`, using integer pixel coordinates. It always requires
+  a re-encode: video filters cannot be combined with `-c:v copy`.
+- **Source**: Source #1 + locally tested (PoC1 in `/tmp/nadir_research/`)
+- **Confidence**: ✅ High
+- **Related Dimension**: SQ-2 (GCS chrome removal)
+- **Fit Impact**: Trivially implements the entire GCS-chrome-removal step.
+
+### Fact #7 — FFmpeg `delogo` replaces a rectangle with interpolation from neighboring pixels
+- **Statement**: `delogo=x=X:y=Y:w=W:h=H` interpolates from the immediately-
+  outside pixels of the rectangle. In FFmpeg 8.1 the `band` parameter has been
+  removed; only `x, y, w, h, show` remain. The filter is timeline-enabled
+  (can be activated only on certain frames via `enable=` expression).
+- **Source**: Source #1 + Source #2 + locally verified (`ffmpeg -h
+  filter=delogo` on FFmpeg 8.1)
+- **Confidence**: ✅ High
+- **Related Dimension**: SQ-3 (gimbal OSD removal)
+- **Fit Impact**: Cheap, deterministic, works for small rectangles. Quality
+  degrades for large rectangles or when the surrounding scene is richly
+  textured (e.g., text over grass), because the fill is a smooth
+  interpolation of the border pixels.
+- **Caveat**: Cannot place the rectangle touching the image edge — there are
+  no surrounding pixels to interpolate from.
+
+### Fact #8 — Multiple `delogo` filters can be chained via comma
+- **Statement**: A filter graph like
+  `crop=W:H:X:Y,delogo=...,delogo=...,delogo=...` chains successive `delogo`
+  passes, each operating on the output of the previous.
+- **Source**: Locally verified (PoC4 produced `poc4_delogo.mp4` via 3 chained
+  `delogo` filters after `crop`)
+- **Confidence**: ✅ High (direct test)
+- **Fit Impact**: Practical recipe for removing the 5–6 burned-OSD regions in
+  this video.
+
+### Fact #9 — FFmpeg `removelogo` accepts a PNG mask but is fragile in FFmpeg 8.1
+- **Statement**: `removelogo=mask.png` should accept a PNG where black=clean,
+  white=logo. In our local FFmpeg 8.1 tests it failed with `Invalid argument`
+  (`-22`) on both grayscale and RGB masks of the correct dimensions.
+  Documentation (Source #3) suggests strict requirements on the mask format
+  that FFmpeg 8.1 enforces but does not document clearly. Practitioner
+  walkthroughs (Source #15) used the filter successfully on older FFmpeg.
+- **Source**: Source #3 + Source #15 + local test failure
+- **Confidence**: ⚠️ Medium (works in principle, version-dependent in practice)
+- **Fit Impact**: Use chained `delogo` instead, or use a per-frame OpenCV
+  inpaint script if `removelogo` cannot be made to work on the team's pinned
+  FFmpeg version.
+
+### Fact #10 — FFmpeg `tmedian` computes per-pixel temporal median over a window
+- **Statement**: `tmedian=radius=N` outputs each pixel as the median of pixels
+  at the same coordinates over the window of `2N+1` frames. For a moving
+  camera over rich terrain, the underlying scene changes every frame so the
+  temporal median tends to wash out — producing motion-blur-like
+  ghosting rather than clean output.
+- **Source**: Source #8 + locally tested (PoC3 produced
+  `poc3_crop_tmedian.mp4`)
+- **Confidence**: ✅ High (direct test)
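+- **Fit Impact**: **Not suitable** for our case. A temporal median keeps what
+  persists at a pixel and rejects what is transient. Here the OSD sits at a
+  fixed position while the terrain moves underneath it and the OSD values
+  also change every frame, so the filter smears the terrain without removing
+  the overlay. The ghosted output is worse for downstream matching than the
+  original OSD-laden frames.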
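
The failure mode can be reproduced on synthetic data. The sketch below reimplements a plain temporal median in NumPy (it is not FFmpeg's code, and the truncated windows at the clip ends are a simplification) and applies it to a one-row "terrain" that shifts one pixel per frame, with a fixed-position overlay column. All names and sizes are illustrative assumptions.

```python
import numpy as np

def temporal_median(frames: np.ndarray, radius: int) -> np.ndarray:
    """Median over a (2*radius + 1)-frame window centred on each frame.
    Windows are truncated at the clip ends (a simplification)."""
    n_frames = frames.shape[0]
    out = np.empty(frames.shape, dtype=np.float64)
    for t in range(n_frames):
        lo, hi = max(0, t - radius), min(n_frames, t + radius + 1)
        out[t] = np.median(frames[lo:hi], axis=0)
    return out

rng = np.random.default_rng(0)
terrain = rng.integers(0, 256, size=200).astype(np.float64)

# Camera motion: each frame sees the terrain shifted by one pixel.
width, n_frames, osd_col = 64, 7, 32
frames = np.stack([terrain[t:t + width] for t in range(n_frames)])
frames[:, osd_col] = 255.0  # overlay pixel at a fixed screen position

filtered = temporal_median(frames, radius=3)
centre = n_frames // 2
terrain_cols = np.arange(width) != osd_col

# Fraction of terrain pixels in the centre frame altered by the filter
changed_fraction = float(
    np.mean(filtered[centre, terrain_cols] != frames[centre, terrain_cols])
)
# The fixed-position overlay is the persistent signal, so it survives
osd_survives = bool(filtered[centre, osd_col] == 255.0)
print(changed_fraction, osd_survives)
```

The overlay pixel is unchanged while most terrain pixels are replaced by values from neighbouring frames, which is the opposite of what a cleanup step needs.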
+
+## Deep-learning video inpainting
+
+### Fact #11 — ProPainter is the SOTA non-generative video inpainter (as of late 2023)
+- **Statement**: ProPainter (Zhou et al., ICCV 2023) uses recurrent flow
+  completion + dual-domain propagation (image and feature) + mask-guided
+  sparse Transformer. Explicitly described as non-generative — it propagates
+  pixels from non-masked frames rather than synthesizing new content.
+  ~0.249 s/frame at 480p, 808G FLOPs/10 frames.
+- **Source**: #11 (ProPainter project page)
+- **Confidence**: ⚠️ Medium (paper claims; per-deployment runtime varies)
+- **Related Dimension**: SQ-3 (gimbal OSD removal, high-quality option)
+- **Fit Impact**: Highest-quality option for OSD removal that respects the
+  "no fabrication" constraint. Cost: GPU + Python toolchain; offline-only.
+
+### Fact #12 — VideoPainter and successors are *generative* and DISQUALIFIED for our use case
+- **Statement**: VideoPainter (2025), DiffuEraser (2025), VidPivot (2025),
+  OmniPainter use I2V or diffusion backbones to *synthesize* content for fully
+  masked regions. They produce more visually pleasing output than ProPainter
+  but the synthesized content is **not** a faithful representation of the real
+  underlying scene.
+- **Source**: #12 + #13
+- **Confidence**: ✅ High (explicit in the papers)
+- **Related Dimension**: SQ-3
+- **Fit Impact**: **Disqualifier**. Project rule (`meta-rule.mdc` "Real
+  Results, Not Simulated Ones"): a fixture that fabricates terrain features
+  the matcher might anchor on is worse than no fixture. Status: `Rejected`.
+
+## Mask-aware downstream
+
+### Fact #13 — Kornia's `DISK.forward()` accepts a binary mask natively
+- **Statement**: `kornia.feature.DISK.forward(img, mask=None)` takes a mask
+  argument of shape `(B, 1, H, W)` with values in `[0, 1]`. The score map is
+  multiplied by this mask before keypoint detection — keypoints in masked
+  regions are suppressed by construction, with no preprocessing of pixels.
+- **Source**: #6 (Kornia docs L1)
+- **Confidence**: ✅ High
+- **Related Dimension**: SQ-4 (mask-aware downstream)
+- **Fit Impact**: **Lowest-risk highest-quality option**. The project's chosen
+  C3 detector (DISK per `solution_draft01.md`) already supports mask injection
+  out of the box — *no video preprocessing required* beyond the deterministic
+  GCS-chrome crop.
+
+### Fact #14 — LightGlue's matching layer needs no mask; suppression at detect time is sufficient
+- **Statement**: LightGlue's authors recommend (issue #97, by maintainer
+  Phil26AT) suppressing keypoints at detect time via score-map masking; once
+  no keypoints are produced in the masked region, LightGlue has nothing to
+  match there.
+- **Source**: #5 (LightGlue issue, maintainer reply)
+- **Confidence**: ✅ High
+- **Related Dimension**: SQ-4
+- **Fit Impact**: Confirms Option B is feasible end-to-end with the existing
+  C3 stack.
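
Workaround (a) from Source #5, filtering keypoints by mask after detection, can be sketched independently of any detector API. The function name and array layout below are assumptions; the example rectangle reuses the first `delogo` region from the PoC command purely as sample geometry.

```python
import numpy as np

def filter_keypoints(keypoints_xy, descriptors, keep_mask):
    """Drop keypoints whose rounded (x, y) position falls where keep_mask == 0.

    keypoints_xy: (N, 2) array of x, y pixel coordinates
    descriptors:  (N, D) array aligned row-by-row with keypoints_xy
    keep_mask:    (H, W) array, 1 = keep, 0 = suppress
    """
    h, w = keep_mask.shape
    xs = np.clip(np.round(keypoints_xy[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(keypoints_xy[:, 1]).astype(int), 0, h - 1)
    keep = keep_mask[ys, xs] > 0
    return keypoints_xy[keep], descriptors[keep]

# 900x445 cropped frame with one suppressed rectangle (x=5, y=35, w=180, h=115)
keep_mask = np.ones((445, 900), dtype=np.uint8)
keep_mask[35:150, 5:185] = 0

kpts = np.array([[10.0, 40.0],     # inside the OSD rectangle
                 [400.0, 300.0],   # clean terrain
                 [184.4, 149.2],   # rounds to the rectangle's last pixel
                 [186.0, 40.0]])   # just right of the rectangle
desc = np.arange(len(kpts) * 4, dtype=np.float32).reshape(len(kpts), 4)

kept_kpts, kept_desc = filter_keypoints(kpts, desc, keep_mask)
print(kept_kpts.tolist())  # [[400.0, 300.0], [186.0, 40.0]]
```

Post-hoc filtering discards part of the keypoint budget, which is why the maintainer notes that score-map masking yields more points in the kept area.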
+
+## Source recovery (informational; ruled out by user)
+
+### Fact #15 — Topotek/Viewpro multi-sensor balls expose RTSP and DCIM directly
+- **Statement**: Topotek camera class (per ArduPilot integration docs) exposes
+  two RTSP streams (`rtsp://192.168.144.108:554/stream=0` 1080p,
+  `stream=1` 480p) and on-camera recordings retrievable via Ethernet/SMB at
+  `camera/DCIM/snap` and `camera/DCIM/record`. OSD overlays can be disabled
+  via the GimbalControl Ethernet utility.
+- **Source**: #4
+- **Confidence**: ✅ High
+- **Fit Impact**: For *future* recordings this is the dominant path
+  (no cleanup needed). Out of scope for the current MKV per user constraint
+  but recorded as Option Z in the comparison framework.
+
+## Project-context inheritance
+
+### Fact #16 — `flight_derkachi.mp4` is the existing reference fixture shape
+- **Statement**: Existing replay fixture is 880×720 H.264 30 fps MP4, paired
+  with `data_imu.csv` (10 Hz from `.tlog`) and per-camera calibration JSON
+  (`khp20s30_factory.json`). The MP4 is described as "cleaned/cropped replay
+  fixture rather than the raw camera feed" with the "rotating camera
+  mechanically fixed in a downward/nadir orientation".
+- **Source**: #9 + #10
+- **Confidence**: ✅ High
+- **Fit Impact**: New fixture must match this structure to drop into the
+  existing `tests/e2e/replay/test_az835_e2e_real_flight.py` harness.
+
+### Fact #17 — A "factory_sheet" calibration approximation is project-accepted when checkerboard isn't possible
+- **Statement**: The Derkachi calibration was sourced via "factory_sheet"
+  approximation (AZ-702) since per-unit checkerboard refinement was deferred
+  for lack of hardware access. Residual focal-length error expected in 1–3%
+  band. Project acknowledges this is the cheapest acceptable starting point.
+- **Source**: #10 (`camera_info.md`)
+- **Confidence**: ✅ High
+- **Fit Impact**: A new calibration JSON for the Topotek/Viewpro multi-sensor
+  ball can use the same approach — published spec sheet → focal length, FOV,
+  pixel size approximations, marked `factory_sheet` source.
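
The spec-sheet-to-intrinsics step can be sketched with the pinhole model. The helper names are assumptions and the numbers are placeholders chosen for the arithmetic, not values from any Topotek or Viewpro data sheet; the 3% band mirrors the upper end of the residual error quoted above.

```python
import math

def focal_px_from_fov(image_width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from the horizontal field of view."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def focal_px_from_mm(focal_mm: float, pixel_size_um: float) -> float:
    """Pinhole focal length in pixels from lens focal length and pixel pitch."""
    return focal_mm * 1000.0 / pixel_size_um

def error_band(fx: float, rel_err: float = 0.03):
    """Lower and upper focal length for a given relative error."""
    return fx * (1.0 - rel_err), fx * (1.0 + rel_err)

# Placeholder spec values for illustration only
fx = focal_px_from_fov(1920, 60.0)
fx_low, fx_high = error_band(fx)
print(f"fx = {fx:.1f} px, 3% band = [{fx_low:.1f}, {fx_high:.1f}] px")
```

On a zoom camera the FOV changes with zoom level, so a single `factory_sheet` value only holds for the zoom setting used during the recording.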
+
+### Fact #18 — Existing solution has no "data ingestion / fixture-prep" component
+- **Statement**: Components C1 (VIO) through C12 (build cache orchestrator) in
+  `solution_draft01.md` cover runtime + pre-flight + deploy concerns but do
+  not include a fixture-cleanup or data-ingestion component. Fixtures appear
+  in the `tests/e2e/replay/` infrastructure as already-cleaned MP4s.
+- **Source**: #R1 + #R2
+- **Confidence**: ✅ High
+- **Fit Impact**: This is the *gap* the Mode B revision addresses. The new
+  fixture-prep component does not modify the runtime; it adds a developer
+  tool under `tools/` or `tests/fixtures/` that produces fixtures consumable
+  by the existing replay path.
+
+## API Capability Verification — applied to lead candidates
+
+This section is mandatory per SKILL.md → Step 2 → API Capability Verification.
+
+### MVE — Kornia DISK in mask-aware mode
+
+- **Source**: Source #6 (Kornia docs, accessed 2026-05-29)
+- **Inputs in the docs example**: `img` of shape `(B, C, H, W)`, `mask` of
+  shape `(B, 1, H, W)` with values in `[0, 1]`
+- **Outputs in the example**: list of `Features` (keypoints + descriptors)
+  with no keypoints in masked regions
+- **Project inputs**: 1 image (`B=1`), `mask` derived once from a static OSD
+  layout, applied per-frame
+- **Project outputs required**: keypoints + descriptors that can be passed
+  into LightGlue (the project's existing C3.2 component)
+- **Match assessment**: ✅ exact match — Kornia DISK is the same library the
+  existing solution uses; the mask path is documented and exercised by Kornia
+  tests
+- **MVE code (project's expected use):**
+  ```python
+  import numpy as np
+  import torch
+  import kornia.feature as KF
+  from PIL import Image
+
+  disk = KF.DISK.from_pretrained("depth").eval()
+  # Assumed PNG convention: white = OSD pixels, black = clean terrain
+  mask_np = np.asarray(Image.open("osd_mask.png").convert("L")) / 255.0
+  # mask: 1 where keep, 0 where suppress (matches Kornia semantics)
+  mask = torch.from_numpy((mask_np < 0.5).astype("float32"))[None, None]
+  img = ...  # (1, 3, H, W)
+  feats = disk(img, mask=mask, n=2048)
+  ```
+
+### MVE — FFmpeg crop + chained delogo (project's primary cleanup path)
+
+- **Source**: Source #1 (FFmpeg delogo docs) + local PoC4
+- **Inputs in our test**: `2026-05-09 16-10-54.mkv` (1280×720 H.264 30 fps)
+- **Outputs in our test**: `poc4_delogo.mp4` (900×445 H.264 30 fps with three
+  burned-OSD rectangles overwritten by interpolated pixels)
+- **Project inputs**: matches
+- **Project outputs required**: a file the replay harness can consume
+- **Match assessment**: ✅ exact match — local PoC produced a valid playable
+  output, dimensions match the existing fixture convention class
+  (sub-1080p H.264 MP4)
+- **MVE command:**
+  ```bash
+  ffmpeg -i input.mkv \
+    -vf "crop=900:445:50:25,delogo=x=5:y=35:w=180:h=115,delogo=x=395:y=5:w=275:h=70,delogo=x=130:y=265:w=690:h=50" \
+    -an -c:v libx264 -crf 18 fixture.mp4
+  ```
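
To keep the filter string reproducible, it could be generated from a list of rectangles rather than typed by hand. The helper below is a hypothetical sketch: `build_filter` is not an existing project function, and the one-pixel margin check is an assumed reading of the Fact #7 caveat rather than FFmpeg's exact validation rule.

```python
def build_filter(crop, regions):
    """Build an FFmpeg -vf string: one crop followed by chained delogo filters.

    crop:    (w, h, x, y) in source-frame pixels
    regions: list of (x, y, w, h) rectangles in cropped-frame pixels
    """
    cw, ch, cx, cy = crop
    parts = [f"crop={cw}:{ch}:{cx}:{cy}"]
    for x, y, w, h in regions:
        # Fact #7 caveat: delogo needs surrounding pixels on every side.
        # The one-pixel margin is an assumption, not FFmpeg's exact rule.
        if x < 1 or y < 1 or x + w > cw - 1 or y + h > ch - 1:
            raise ValueError(f"delogo region {(x, y, w, h)} touches the frame edge")
        parts.append(f"delogo=x={x}:y={y}:w={w}:h={h}")
    return ",".join(parts)

# Rectangles taken from the PoC command above
vf = build_filter(
    crop=(900, 445, 50, 25),
    regions=[(5, 35, 180, 115), (395, 5, 275, 70), (130, 265, 690, 50)],
)
print(vf)
```

The resulting string can be passed to `ffmpeg -vf`, and the rectangle list can live in a versioned config file next to the fixture.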
+
+### Skipped — VideoPainter / DiffuEraser / VidPivot
+- These candidates are rejected on the fabrication-risk disqualifier (Fact
+  #12), not on API capability. No MVE built; not progressing to Step 7.5
+  Selected status.
@@ -0,0 +1,88 @@
+# Comparison Framework — Video Extraction Options
+
+## Selected Framework Type
+
+**Decision Support** — multiple candidates, weighted on cost vs quality vs
+risk, with the goal of selecting the best path (or composition of paths) for
+the project's replay-fixture use case.
+
+## Selected Dimensions
+
+1. **Output fidelity** — Are the underlying terrain pixels preserved
+   verbatim, or modified/synthesized?
+2. **Fabrication risk** — Could the technique introduce features the
+   downstream matcher could anchor on but that don't exist in reality?
+   (Project's "Real Results, Not Simulated Ones" rule.)
+3. **Pixel coverage** — How much of the original EO video region is usable
+   in the output?
+4. **Cost & complexity** — Lines of code, dependencies, runtime per frame,
+   GPU required?
+5. **Reproducibility** — Same input → same output across runs, machines, and
+   time?
+6. **Project-pipeline integration cost** — How much of the existing C2/C3
+   pipeline needs to change to consume the output?
+7. **Coverage of layers** — Which of the three layers (GCS chrome /
+   gimbal-burned OSD / IR PIP) does the technique address?
+8. **Per-frame gimbal-pointing handling** — Does the technique help filter
+   non-nadir frames?
+
+## Initial Population — Option Matrix
+
+> Notation: ✅ ideal — ✓ acceptable — ⚠️ caveat — ❌ disqualifier
+> Pixel coverage is in % of the 1280×720 original (1280×720 = 921 600 px)
+
+| # | Option | Output fidelity | Fabrication risk | Pixel coverage | Cost & complexity | Reproducibility | C2/C3 integration cost | Layer coverage | Non-nadir filtering |
+|---|---|---|---|---|---|---|---|---|---|
+| **A** | **Crop only** (FFmpeg `crop`) | ✅ Verbatim | ✅ None | ⚠️ ~42% (740×525 ≈ 388 500 px after removing chrome+IR-PIP+minimap; ~70% of EO area) | ✅ Trivial (one filter) | ✅ Bit-deterministic | ✅ Zero changes | GCS chrome: ✅ — Gimbal OSD: ❌ remains burned in — IR PIP: ✅ excluded by tight crop | ❌ No |
+| **B** | **Crop + mask-aware DISK** (Fact #13) | ✅ Verbatim | ✅ None | ✅ ~80% of EO area (mask only suppresses keypoints in OSD pixels, pixels themselves are unchanged) | ✓ Trivial pipeline change: pass `osd_mask.png` to DISK forward call; one-time mask build | ✅ Mask is a static PNG | ⚠️ One-line C3 code change to pass `mask=` parameter | GCS chrome: ✅ — Gimbal OSD: ✅ via score-map suppression — IR PIP: ✅ via mask | ❌ No (orthogonal concern) |
+| **C** | **Crop + chained `delogo`** (Fact #7, #8) | ✓ Mostly verbatim, OSD regions are interpolated from neighbor pixels | ✓ Low — interpolation produces blurry but plausible content; could create weak features but no semantic terrain hallucination | ✅ ~85% (interpolation fills the OSD region) | ✓ Cheap (one FFmpeg invocation, ~5 chained filters) | ✅ Bit-deterministic | ✅ Zero changes (output is plain MP4) | GCS chrome: ✅ — Gimbal OSD: ✓ each OSD region passed to a separate `delogo` — IR PIP: ⚠️ too large for `delogo`, must crop or use removelogo | ❌ No |
+| **D** | **Crop + `removelogo` PNG mask** (Fact #9) | ✓ Mostly verbatim, mask-shaped blur fills OSD regions | ✓ Low (same blur-based approach as `delogo`) | ✅ ~85% | ⚠️ Cheap but version-fragile in our tests on FFmpeg 8.1 (failed); more reliable on older FFmpeg | ✅ if it works on the target version | ✅ Zero changes | All layers via single mask | ❌ No |
+| **E** | **Crop + ProPainter video inpainting** (Fact #11) | ✓ Verbatim where possible, propagated from non-masked frames where occluded | ✓ Low — non-generative, propagation-based; but if the OSD covers the same scene region for many frames the propagation may guess | ✅ ~85% | ❌ Expensive: GPU required; ~0.25 s/frame at 480p, scales with resolution; Python toolchain (PyTorch + custom build) | ✓ Reproducible if model weights pinned | ✅ Zero changes (output is plain MP4) | All layers if mask covers them | ❌ No |
+| **F** | **Crop + temporal-median (`tmedian`)** (Fact #10) | ❌ Smeared — both OSD and underlying scene change per frame; median washes both | ❌ High: smeared output may produce false features or suppress real ones | ⚠️ Coverage is full but quality is degraded everywhere | ✓ Cheap | ✅ | ✅ | All if motion is right; **doesn't work for our case** because OSD values *also* change per frame | ❌ No |
+| **G** | **Crop + generative video inpainting (VideoPainter et al.)** (Fact #12) | ❌ Synthesized | ❌❌ **High** — fabricates terrain features that don't exist | ✅ ~85% | ❌ Very expensive: state-of-the-art generative video inpainters require multi-GB models on H100-class GPUs | ✓ but content is non-deterministic across runs (unless seed pinned) | ✅ output is plain MP4 | All layers | ❌ No |
+| **H** | **Per-frame OpenCV navier-stokes / telea inpaint** (with the same OSD mask) | ✓ Verbatim where possible, deterministic non-generative inpaint | ✓ Low | ✅ ~85% | ✓ Cheap (Python + OpenCV); slower than FFmpeg but trivial code | ✅ | ✅ output is plain MP4 | All layers | ❌ No |
+| **I** | **Tlog-based gimbal-attitude filter** (orthogonal, applied to A/B/C) | n/a — filtering only | n/a | Reduces output to nadir-band frames only | ✓ Cheap if `MOUNT_STATUS`/`MOUNT_ORIENTATION` is in the paired `2026-05-09 16-09-54.tlog` | ✅ | ✓ stand-alone tool that drops frames before encoding | n/a (frame-level) | ✅ **Yes** — gates by gimbal pitch from telemetry |
+| **J** | **OCR-based pitch-from-OSD filter** (orthogonal, applied to A/B/C) | n/a | n/a | Reduces output to nadir-band frames only | ⚠️ More complex (Tesseract or PaddleOCR per-frame) and OCR errors propagate | ✓ | ✓ stand-alone tool | n/a (frame-level) | ✅ via OCR on the `-3.7°` text in the burned attitude indicator |
+| **Z** | **Source recovery** (re-record with OSD off / pull RTSP / pull DCIM) | ✅ Native | ✅ None | ✅ 100% | ✅ Trivial *if* hardware access | ✅ | ✅ Zero changes | All layers (no overlay produced) | ⚠️ Depends on whether gimbal can be locked nadir |
+
+## Composition note
+
+Options are not all mutually exclusive. The three orthogonal axes are:
+- **Pixel handling**: choose ONE of {A, B, C, D, E, F, G, H, Z}
+- **Frame filtering** (non-nadir rejection): choose ZERO OR ONE of {I, J} on
+  top of the pixel-handling choice
+- **Source class**: Option Z replaces all of the above when source access is
+  available; for the current MKV (user constraint = "only have this MKV"), Z
+  is unavailable.
+
+## Recommended composition
+
+**Primary**: **B + I** — crop the GCS chrome geometrically, build a binary
+OSD mask (a PNG once, hand-edited or scripted from the variance map), and
+inject the mask into the project's existing DISK detector via the
+already-supported `mask=` parameter; in parallel, parse the paired
+`.tlog` for gimbal attitude and drop frames where the gimbal is off-nadir.
+
+**Fallback** (when modifying the C3 path is not desirable for this fixture):
+**C + I** — produce a plain `.mp4` via crop + chained `delogo` so the new
+fixture can drop into the existing replay path with **zero** code changes,
+then apply the same tlog-based frame filter.
+
+**Disqualified options**: G (generative inpainting), F (temporal median —
+doesn't work for our case because OSD values change per frame).
+
+**Excluded by user constraint, but recommended for future recordings**:
+Z (source recovery — pull RTSP or DCIM directly from the camera).
+
+## Reasoning summary table
+
+| Question dimension | Winner | Why |
+|---|---|---|
+| Output fidelity | B (and Z when available) | No pixels modified |
+| Fabrication risk | B, A | No new pixels invented |
+| Pixel coverage | B, C, D, H | Whole EO region usable |
+| Cost & complexity | A, C | Single FFmpeg command |
+| Reproducibility | All except G | Deterministic |
+| C2/C3 integration | A, C, D, H | No code changes |
+| Layer coverage | B, D | Single mask handles all |
+| Non-nadir filtering | I (with any pixel option) | Telemetry-driven |
@@ -0,0 +1,247 @@
+# Reasoning Chain — Video Extraction Decisions
+
+## Dimension 1 — Why three layers and not two
+
+### Fact confirmation
+Local 12-frame variance analysis (Fact #1) showed at least three pixel
+populations distinguishable by their behavior over time:
+1. Pixel-stable rectangles around the periphery (left/right sidebars,
+   minimap) — the GCS UI chrome.
+2. Pixel-stable rectangles in the central video area (top-left HUD,
+   top-center attitude ladder, crosshair, FOV brackets, bottom-right
+   coordinates) — gimbal-burned-in OSD.
+3. The dynamic remainder — the actual EO video, plus the IR PIP, which is
+   *itself* a dynamic video stream stamped at fixed coordinates.
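+
+A minimal sketch of how such a per-pixel variance map could be computed,
+assuming the sampled frames are already loaded as a grayscale `(T, H, W)`
+NumPy array. The function names, the synthetic frames, and the threshold
+are illustrative assumptions, not the analysis script behind Fact #1. Low
+variance only flags pixel-stable regions; OSD fields whose text changes,
+and the IR PIP, would still have to be added to a mask by hand.
+
+```python
+import numpy as np
+
+
+def temporal_variance_map(frames: np.ndarray) -> np.ndarray:
+    """Per-pixel variance over time for a (T, H, W) grayscale stack."""
+    return frames.astype(np.float64).var(axis=0)
+
+
+def stable_pixel_mask(frames: np.ndarray, threshold: float) -> np.ndarray:
+    """True where a pixel barely changes across the sampled frames."""
+    return temporal_variance_map(frames) < threshold
+
+
+# Synthetic 12-frame stack: random "scene" noise plus one constant rectangle.
+rng = np.random.default_rng(0)
+frames = rng.integers(0, 256, size=(12, 72, 128)).astype(np.uint8)
+frames[:, 10:20, 5:40] = 200  # stand-in for a pixel-stable overlay region
+
+mask = stable_pixel_mask(frames, threshold=1.0)  # threshold is an assumption
+```
+
+On real footage the threshold would need tuning against compression
+noise, since re-encoded "static" pixels are rarely bit-identical.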
+
+### Reference comparison
+A simpler "UI vs video" two-layer model would suggest a single mask covering
+all overlays. But the IR PIP behaves like the EO video (Fact #2), and the
+GCS chrome includes live-updating values (Fact #3) — so the actual
+distinction that matters is *who renders the pixel and how*, not *whether
+the pixel is constant*:
+- GCS chrome is rendered by the GCS application **after** the camera stream
+  arrives → it's removable by cropping to the region the GCS shows the EO
+  in.
+- Burned-in gimbal OSD is rendered **inside the camera** before the recorder
+  sees it → it's pixel-baked into the EO video and only removable by
+  inpainting or by mask-aware downstream consumption.
+- IR PIP is **also** rendered by the camera (the gimbal stamps the IR
+  channel into a corner of the EO output stream) → behaves like burned-in
+  OSD: pixel-baked, removable only by masking or cropping it out.
+
+### Conclusion
+Three layers, two removal classes:
+- Class 1 (GCS chrome): pure crop.
+- Class 2 (gimbal OSD + IR PIP): mask or inpaint.
+
+### Confidence
+✅ High — pixel-variance evidence is direct measurement.
+
+---
+
+## Dimension 2 — Why mask-aware downstream wins over inpainting
+
+### Fact confirmation
+The project's chosen C3 detector is DISK + LightGlue
+(`solution_draft01.md`). Kornia's DISK accepts a `mask=(B,1,H,W)` parameter
+that multiplies the detection score map (Fact #13). LightGlue's authors
+confirm that suppressing keypoints at detect time is sufficient — once the
+detector returns no keypoints in a region, the matcher has nothing to match
+there (Fact #14).
+
+### Reference comparison
+Inpainting-based options (C, D, E, H) all share the property that they
+synthesize *some* content for the OSD region. Even non-generative
+techniques like FFmpeg `delogo` (interpolation from outside pixels) or
+ProPainter (propagation from neighbor frames) produce pixels that *look*
+like terrain but didn't come from the actual terrain at that location. A
+feature detector running on those inpainted pixels could legitimately fire
+on the inpaint artifacts. Without a mask, the downstream pipeline cannot
+distinguish a real feature from a fake one. With a mask, it doesn't have to:
+the score map is zeroed before NMS, so no keypoint is produced for the OSD
+region in the first place.
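+
+The suppression-before-selection idea can be illustrated with plain
+NumPy. This is a sketch under stated assumptions: it is not Kornia's
+implementation, the helper name is hypothetical, and a simple top-k pick
+stands in for real non-maximum suppression.
+
+```python
+import numpy as np
+
+
+def masked_topk_keypoints(score_map, keep_mask, k):
+    """Zero scores outside keep_mask, then return the top-k (row, col) peaks."""
+    masked = score_map * keep_mask  # suppression happens before selection
+    flat = np.argsort(masked, axis=None)[::-1][:k]
+    rows, cols = np.unravel_index(flat, masked.shape)
+    # Drop zero-score picks so a masked pixel can never be returned.
+    keep = masked[rows, cols] > 0
+    return np.stack([rows[keep], cols[keep]], axis=1)
+
+
+rng = np.random.default_rng(1)
+score_map = rng.random((6, 8)) + 0.01
+score_map[1, 1] = 10.0            # strong response inside the OSD region
+keep_mask = np.ones((6, 8))
+keep_mask[0:3, 0:4] = 0.0         # 0 = OSD pixel, 1 = terrain pixel
+
+keypoints = masked_topk_keypoints(score_map, keep_mask, k=5)
+```
+
+Even though the strongest raw response sits inside the OSD rectangle, it
+is multiplied by zero before selection, so it cannot appear in
+`keypoints`.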
+
+### Conclusion
+Mask-aware downstream is **strictly better** than inpainting for this
+project's use case, because:
+1. Output fidelity is verbatim (no synthesized pixels enter the matcher).
+2. The mask is a single static PNG, computed once from the OSD layout — far
+   simpler than per-frame inpainting.
+3. The integration cost is one parameter on the existing `DISK.forward()`
+   call (Fact #13).
+4. The OSD coverage is the union of all OSD elements, so the mask trivially
+   handles all of them at once (top-left HUD, attitude ladder, crosshair,
+   etc.) without one filter per region.
+
+The only reason to fall back to inpainting (Option C/H) is if we want a
+fixture that can be dropped into the existing replay path **without any
+code change**, because today's replay tooling treats the input MP4 as
+pristine. Even then, the right answer is to extend the replay tooling to
+carry an optional companion `osd_mask.png` per fixture — at which point
+Option B is again preferable.
+
+### Confidence
+✅ High — both the existence of the API and its semantic effect are
+documented at L1 (Kornia docs, LightGlue maintainer reply).
+
+---
+
+## Dimension 3 — Why generative inpainting is disqualified
+
+### Fact confirmation
+VideoPainter (2025), DiffuEraser (2025), VidPivot (2025) and similar SOTA
+inpainters (Fact #12) explicitly *generate* content for masked regions
+using video-diffusion or I2V backbones. The papers claim these models
+produce *plausible* terrain even where the masked region was fully
+occluded.
+
+### Reference comparison
+The project's `meta-rule.mdc` rule "Real Results, Not Simulated Ones" is
+unambiguous: the goal is a working product, not the appearance of one.
+Specifically: "Never produce results by bypassing, faking, stubbing, or
+passthrough-ing the component that is supposed to produce them."
+
+The downstream component is a feature matcher whose entire purpose is to
+detect real terrain features and match them to a satellite tile. A
+generative inpaint inserts plausible-but-false terrain features into the
+input. The matcher cannot tell the difference. It will happily match
+fabricated grass texture to a real satellite-tile region with similar
+texture and produce a confident but wrong fix.
+
+The same argument applies even more sharply to the project's
+**AC-NEW-7** "cache-poisoning safety budget": onboard tiles fed back into
+the basemap must not be misaligned. A fixture validating tile generation
+that includes synthesized terrain features tests the wrong thing — it
+validates that the system handles plausible-looking pixels, not that it
+handles real-flight pixels.
+
+### Conclusion
+Generative inpainters (Option G) are **rejected**. They optimize the wrong
+objective for this project.
+
+### Confidence
+✅ High — disqualifier comes from explicit project rule + reading of
+upstream paper claims.
+
+---
+
+## Dimension 4 — Why temporal median fails for this case
+
+### Fact confirmation
+FFmpeg `tmedian=radius=N` outputs the per-pixel median over `2N+1`
+neighboring frames (Fact #10). This works as an OSD-removal trick when:
+1. The OSD pixels are **stable** (same value every frame, or at least the
+   majority of frames).
+2. The underlying scene **changes** per frame (so the median over the
+   window is dominated by underlying scene values, not OSD values).
+
+In our recorded video, both the OSD values **and** the underlying scene
+change per frame:
+- Burned-in OSD text shows live counters like `00:04:24` that update each
+  second; pitch number `-3.7°` updates with gimbal motion; HDOP/SATS values
+  change.
+- Underlying EO video shows the ground moving as the UAV moves.
+
+### Reference comparison
+A *motion-conditional* temporal median (Source #16, Source #17) — apply
+the median only where motion is below a threshold — addresses the issue in
+principle. But the static-OSD assumption underneath that approach does
+not hold in our case: the *positions* of the OSD elements are static, but
+the *content* in those positions is dynamic.
+
+### Conclusion
+Temporal median is **not suitable** for this video. The local PoC
+(`poc3_crop_tmedian.mp4`) confirms: output shows ghosted, smeared OSD text
+overlapping with smeared/aliased terrain — strictly worse than the
+original for downstream feature matching.
+
+### Confidence
+✅ High — direct experimental result.
+
+---
+
+## Dimension 5 — Why frame filtering by gimbal pointing is mandatory
+
+### Fact confirmation
+Frame at t=30 s shows gimbal pointed forward (sky/horizon visible), frame
+at t=300 s shows gimbal pointed near nadir (ground texture filling frame)
+(Fact #5). The gimbal is operator-controlled — mid-flight pointing is
+common; only a subset of frames are nadir.
+
+### Reference comparison
+The project's nav-camera spec is "fixed downward (no gimbal)"
+(`restrictions.md`). The C2 VPR component is trained / tuned on satellite
+imagery with the assumption that the query is a top-down view of the
+ground. Forward-looking frames (sky, distant horizon, oblique terrain) are
+out-of-distribution for the VPR retrieval and would produce poor or
+spurious matches.
+
+### Conclusion
+A fixture derived from this MKV that contains forward-looking frames is
+not a valid representative-data fixture for the nadir-tuned pipeline. A
+frame-level filter is needed — either:
+- **Option I** (telemetry-based): parse `MOUNT_STATUS`/`MOUNT_ORIENTATION`
+  from the paired `2026-05-09 16-09-54.tlog`. Cheaper and more reliable.
+- **Option J** (OCR-based): read the burned-in `-3.7°` text from the
+  attitude indicator. Lower setup cost (no telemetry parser) but OCR
+  errors propagate.
+
+### Confidence
+✅ High — the gimbal-pointing fact is direct visual evidence; the
+out-of-distribution argument is a derived consequence consistent with the
+project's `restrictions.md` AC-2.1a "nadir ±10° bank/pitch" qualifier.
+
+---
+
+## Dimension 6 — Why this is a fixture-prep tooling concern, not a runtime concern
+
+### Fact confirmation
+The existing `solution_draft01.md` does not have a "data ingestion / fixture
+prep" component (Fact #18). Replay fixtures appear in the test
+infrastructure as already-cleaned MP4s + companion CSV/JSON.
+
+### Reference comparison
+The runtime nav-camera (per project spec) is the ADTi 20MP fixed-downward
+camera without OSD. There is no expectation that the runtime pipeline ever
+sees an OSD-laden frame from a multi-sensor gimbal. So the right place to
+handle this MKV is **not** in the runtime — it is in the developer
+tooling that produces fixtures.
+
+### Conclusion
+The Mode B revision is **additive, not subtractive**: it identifies a gap
+(no fixture-prep component) and adds a developer tool. It does **not**
+modify any runtime component. The C1/C2/C3/C4/C5 components in
+`solution_draft01.md` are unchanged.
+
+### Confidence
+✅ High — direct read of `solution_draft01.md` confirms no such
+component exists.
+
+---
+
+## Dimension 7 — Why the existing `flight_derkachi.mp4` precedent matters
+
+### Fact confirmation
+`flight_derkachi.mp4` is described as "cleaned/cropped replay fixture
+rather than the raw camera feed" with "the rotating camera mechanically
+fixed in a downward/nadir orientation" (Fact #16). It was produced by a
+process that:
+1. Disabled the gimbal OSD (likely via Topotek's GimbalControl Ethernet
+   utility).
+2. Mechanically locked the gimbal nadir.
+3. Recorded a 1080p clean stream.
+4. Cropped to 880×720 (probably to remove residual borders or reframe).
+
+### Reference comparison
+The new MKV represents the *opposite* situation: OSD on, gimbal
+unconstrained, GCS-screen-recorded rather than direct camera capture. The
+existing fixture-creation procedure (steps 1–4 above) does not apply.
+
+### Conclusion
+A new, documented procedure is needed for the GCS-screen-recorded
+class of input. That procedure is the deliverable of this Mode B run
+(see `solution_draft02.md`). It complements the existing Derkachi
+procedure and does not replace it.
+
+### Confidence
+✅ High.
@@ -0,0 +1,133 @@
+# Validation Log — Mode B Video Extraction Run
+
+## Validation scenario
+
+A developer wants to use `2026-05-09 16-10-54.mkv` as a representative
+replay fixture for the GPS-denied pipeline (analogous to
+`flight_derkachi.mp4`), to extend testing to a new aircraft/camera class
+(multi-sensor gimbal ball, multirotor profile) and a new operating
+condition (low-altitude / non-nadir gimbal).
+
+## Expected behavior under each candidate
+
+### Option A (crop only)
+Expected: produces a 740×525-ish MP4 with gimbal OSD elements still burned
+in at the same screen positions. Replay infrastructure consumes it as-is.
+Downstream C2/C3 detect features inside OSD text regions and produce false
+matches. Drift accumulates, AC-2.1a fails.
+
+**Actual** (from PoC1 + reasoning): predicted behavior matches. Output is a
+valid MP4 but feeding it into a feature matcher would produce keypoints
+inside the burned-in `-3.7°` and `FOV 53.2°` text regions, since those
+regions have high local contrast.
+
+### Option B (crop + mask-aware DISK)
+Expected: same MP4 as Option A, plus a static `osd_mask.png` companion file.
+Replay infrastructure modified to inject the mask into the C3 detect call.
+DISK detector returns no keypoints inside masked regions (per Fact #13
+score-map multiplication semantics). LightGlue matches only real-terrain
+features. AC-2.1a passes for nadir frames.
+
+**Actual** (predicted, no end-to-end PoC run): matches the documented
+Kornia DISK contract. The change to the replay tooling is one optional
+`mask=` argument passed to the `DISK.forward()` call. Risk: if the
+existing production code path uses a wrapper around DISK that does not
+forward the `mask=` parameter, the wrapper needs adjustment.
+
+### Option C (crop + chained `delogo`)
+Expected: a 740×525-ish MP4 with OSD regions replaced by interpolation
+from neighboring pixels. Replay infrastructure unchanged. Downstream C2/C3
+detect features in the interpolated regions; some weak features may be
+detected (interpolation produces low-contrast smooth regions) but
+significantly fewer than the original OSD regions.
+
+**Actual** (from PoC4): output looks reasonable with three OSD rectangles
+replaced by smoothed interpolation. Some chained `delogo` filters caused
+issues when their rectangles touched image edges in earlier attempts —
+mitigated by avoiding edge contact.
+
+### Option F (temporal median)
+Expected: smeared, ghosted output as both OSD and underlying terrain
+average together over the window.
+
+**Actual** (from PoC3): confirmed. Output shows visible motion-blur ghosts
+of OSD text across the frame, plus desaturated and smeared underlying
+terrain. **Disqualified.**
+
+### Option I (tlog-based gimbal-pointing filter)
+Expected: parse `MOUNT_STATUS`/`MOUNT_ORIENTATION` messages from the
+companion `2026-05-09 16-09-54.tlog`, build a frame index → gimbal-pitch
+table, drop frames where pitch is more than (e.g.) 10° off-nadir. Output
+preserves only nadir-band frames, suitable for the level-flight VPR
+assumption.
+
+**Actual** (predicted): depends on whether the camera class actually emits
+`MOUNT_STATUS` to the FC. ArduPilot's documented gimbal integration
+(Source #4) confirms gimbal angles are reported back to the FC for some
+Topotek models. **Verify** before relying on this — if the tlog lacks
+gimbal angle, fall back to Option J (OCR).
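+
+A minimal sketch of the frame-index → gimbal-pitch step, assuming the
+pitch samples and their timestamps have already been extracted from the
+tlog and expressed relative to the video start. The function name, the
+`t0` alignment offset, and the sample values are illustrative
+assumptions; the −90° nadir convention and 10° tolerance follow the text
+above.
+
+```python
+import numpy as np
+
+
+def nadir_frame_indices(msg_t, msg_pitch_deg, n_frames, fps=30.0,
+                        t0=0.0, tol_deg=10.0, nadir_deg=-90.0):
+    """Return indices of frames whose interpolated pitch is within tolerance."""
+    frame_t = t0 + np.arange(n_frames) / fps
+    # np.interp holds the first/last sample outside the telemetry time range.
+    pitch = np.interp(frame_t, msg_t, msg_pitch_deg)
+    keep = np.abs(pitch - nadir_deg) <= tol_deg
+    return np.flatnonzero(keep)
+
+
+# Toy telemetry: gimbal tilts from level (0°) to nadir (-90°) in one second.
+msg_t = np.array([0.0, 1.0, 2.0])
+msg_pitch = np.array([0.0, -90.0, -90.0])
+
+kept = nadir_frame_indices(msg_t, msg_pitch, n_frames=9, fps=4.0)
+```
+
+With these toy values only the frames from t = 1.0 s onward are kept;
+the frame at t = 0.75 s (−67.5°) is still 22.5° off nadir and is dropped.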
+
+## Counterexamples
+
+### Counterexample 1: gimbal locked at nadir for the whole flight
+**Scenario**: the user happens to have already locked the gimbal nadir and
+the entire recording is nadir-only. **Implication**: Option I/J becomes a
+no-op; the rest of the pipeline works the same. **No change to
+recommendation.**
+
+### Counterexample 2: text values in OSD overlap with bright terrain
+**Scenario**: the green attitude ladder text overlaps with bright sky in
+forward-looking frames — does Option C `delogo` interpolation produce
+something useful? **Predicted**: only the rectangle is touched; if the
+rectangle covers sky-only pixels, the interpolation produces sky-colored
+output (acceptable). If the rectangle straddles sky/horizon, the
+interpolation may produce a smeared horizon line (mild artifact,
+acceptable for non-nadir frames which would be filtered by Option I/J
+anyway).
+
+### Counterexample 3: future MKV recordings have different OSD layout
+**Scenario**: a later flight uses a different GCS that places the OSD
+elsewhere, breaking the hardcoded coordinates in the chained `delogo`
+recipe. **Implication**: the developer tool must be parametrized, not
+hardcoded. The proposed fixture-prep tool ships with a **per-recording
+OSD profile** (a small YAML or JSON listing the GCS-chrome crop box and
+the OSD rectangles) so adding a new recording class is a few-line config
+change.
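+
+One way such a profile could drive the FFmpeg recipe is sketched below.
+The profile layout, the helper name, and every coordinate are
+hypothetical placeholders, not measured values from the recording; only
+the `crop=w:h:x:y` and `delogo=x=…:y=…:w=…:h=…` filter syntax comes from
+FFmpeg itself. OSD rectangles are assumed to be given in cropped-frame
+coordinates.
+
+```python
+def build_filter_chain(profile):
+    """Build an FFmpeg -vf string: one crop followed by chained delogo filters."""
+    crop = profile["crop"]
+    parts = [f"crop={crop['w']}:{crop['h']}:{crop['x']}:{crop['y']}"]
+    for r in profile["osd_rects"]:
+        # delogo interpolates from surrounding pixels, so a rectangle that
+        # touches the frame border has nothing to interpolate from.
+        touches_edge = (
+            r["x"] <= 0 or r["y"] <= 0
+            or r["x"] + r["w"] >= crop["w"]
+            or r["y"] + r["h"] >= crop["h"]
+        )
+        if touches_edge:
+            raise ValueError(f"OSD rectangle touches the frame edge: {r}")
+        parts.append(f"delogo=x={r['x']}:y={r['y']}:w={r['w']}:h={r['h']}")
+    return ",".join(parts)
+
+
+# Hypothetical profile; real values would come from the variance-map analysis.
+profile = {
+    "crop": {"x": 270, "y": 90, "w": 740, "h": 525},
+    "osd_rects": [
+        {"x": 10, "y": 10, "w": 120, "h": 60},
+        {"x": 300, "y": 8, "w": 140, "h": 40},
+    ],
+}
+
+vf = build_filter_chain(profile)
+```
+
+The resulting string would be passed as `ffmpeg -i in.mkv -vf "<vf>" out.mp4`;
+a new recording class then needs only a new profile, not a new recipe.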
+
+## Review checklist
+
+- [x] Draft conclusions consistent with fact cards
+- [x] No important dimensions missed (audio handling and frame-rate
+      normalization are noted as trivial in `00_question_decomposition.md`'s
+      Completeness Audit)
+- [x] No over-extrapolation — claims are tied to specific facts
+- [x] Conclusions actionable: a developer can follow the recipes in
+      `solution_draft02.md` to produce a new fixture
+- [x] Every selected component matches the project's constraint matrix
+      (verified in `06_component_fit_matrix.md`)
+- [x] Mismatches marked as disqualifiers (Option G, F)
+- [x] Per-mode API capability verified for both lead candidates (Kornia
+      DISK in mask mode, FFmpeg `crop`+`delogo` chain) — both have saved
+      MVE blocks in `02_fact_cards.md`
+
+## Open questions deferred to user / out-of-scope
+
+1. **Does the paired `2026-05-09 16-09-54.tlog` contain `MOUNT_STATUS`
+   messages?** — not verified in this run. Recommendation: open the tlog
+   with `pymavlink` and grep for `MOUNT_STATUS`; if absent, fall back to
+   Option J or accept all frames + downstream covariance.
+2. **Should this fixture replace `flight_derkachi.mp4` as the primary
+   replay fixture, or supplement it?** — supplement. Different aircraft
+   class, different sensor class. Both fixtures have value for
+   different test scenarios.
+3. **Is the project willing to commit to one extra parameter on the
+   `tests/e2e/replay/conftest.py::_calibration_path()` family of helpers
+   for an optional `osd_mask.png` companion?** — recommended yes; it is
+   the cleanest path. Not blocking for this run; can be deferred to a
+   follow-up tracker ticket if Option C fallback is acceptable for now.
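+
+For open question 1, a check along these lines could be enough. It is a
+sketch, assuming `pymavlink` is installed; the helper names are
+hypothetical. The counter works on any iterable of objects exposing
+`get_type()`, so it can be exercised without a tlog.
+
+```python
+from collections import Counter
+
+GIMBAL_TYPES = ("MOUNT_STATUS", "MOUNT_ORIENTATION")
+
+
+def count_gimbal_messages(messages):
+    """Count gimbal-attitude messages in an iterable of MAVLink messages."""
+    counts = Counter(m.get_type() for m in messages)
+    return {t: counts.get(t, 0) for t in GIMBAL_TYPES}
+
+
+def iter_tlog(path):
+    """Yield every message in a tlog file (requires pymavlink)."""
+    from pymavlink import mavutil  # lazy import: the counter works without it
+
+    conn = mavutil.mavlink_connection(path)
+    while True:
+        msg = conn.recv_match(blocking=False)
+        if msg is None:  # end of the log file
+            break
+        yield msg
+```
+
+Usage would be `count_gimbal_messages(iter_tlog("2026-05-09 16-09-54.tlog"))`;
+zero counts for both types would mean falling back to Option J.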
+
+## Validation conclusion
+
+The recommended composition (B + I primary, C + I fallback, Z preferred
+for future recordings) holds up under the validation scenarios. Move to
+Step 7.5 (Component Applicability Gate).
@@ -0,0 +1,100 @@
+# Component Fit Matrix — Video Extraction Pipeline
+
+> Step 7.5 — Component Applicability Gate. Applies because this run is
+> classified as Technical-component selection.
+
+## 7.5.1 Top-level Component Fit Matrix
+
+| Component Area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale |
+|---|---|---|---|---|---|---|---|---|
+| Geometric crop (GCS chrome removal) | FFmpeg `crop` filter | Single static crop box `crop=W:H:X:Y` derived per recording from variance-map analysis; one-shot CLI invocation | Established production | Strip GCS UI sidebars/minimap/IR-PIP from recorded MKV | MVE block in `02_fact_cards.md` (PoC1 produced playable output); docs Source #1 | None for the user's pinned use case (offline tool) | **Selected** | Trivial, lossless within re-encode, deterministic |
+| OSD-pixel handling (PRIMARY) | Kornia `DISK.forward(img, mask=...)` | mask-aware mode `(B, 1, H, W)` mask, multiplied into the DISK score map before NMS | Established production (existing project component) | Suppress keypoint detection inside burned-in OSD regions | MVE block in `02_fact_cards.md`; docs Source #6 | Requires the existing C3 wrapper around DISK to forward the `mask=` parameter (one-line code change) | **Selected** | No pixel modification; fabrication-risk = 0; matches existing C3 stack exactly |
+| OSD-pixel handling (FALLBACK) | FFmpeg `delogo` chained | Multiple `delogo=x:y:w:h` filters chained for each OSD rectangle, after `crop`. **Important**: rectangles must NOT touch image edges (no border pixels to interpolate from) | Simple baseline | Replace burned-in OSD rectangles with interpolation-from-neighbors output, producing a plain MP4 ingestable by the existing replay path with no code changes | MVE block in `02_fact_cards.md` (PoC4); docs Source #1, #2 | Quality degrades when the OSD rectangle is large (e.g., the IR PIP at 360×210 px) — for that, `removelogo` mask or geometric crop is preferred | **Selected (fallback)** | Cheap, deterministic, no toolchain beyond FFmpeg |
+| OSD-pixel handling (EXPERIMENTAL, version-fragile) | FFmpeg `removelogo` PNG mask | Single PNG mask covering all OSD elements, applied via `removelogo=mask.png` | Established production | One-shot OSD removal via mask | Source #3 docs claim it works; local test on FFmpeg 8.1 failed with `Invalid argument` (-22) | Version-fragile; could not be made to work in our local FFmpeg 8.1 with grayscale or RGB masks of correct dimensions | **Experimental only** | Try first if available on team's pinned FFmpeg version; fall back to chained `delogo` |
+| OSD-pixel handling (REJECTED, fabrication risk) | VideoPainter / DiffuEraser / VidPivot (and any generative video inpainter) | Diffusion-backbone or I2V generative inpainter applied to the OSD mask | SOTA / Known bad | High-fidelity-looking OSD removal | Sources #12, #13 — papers explicitly describe synthesis | Generates terrain content that does not exist in the real recording. Project rule "Real Results, Not Simulated Ones" is unambiguous | **Rejected** | Disqualified by `meta-rule.mdc` |
+| OSD-pixel handling (REJECTED, wrong assumption) | FFmpeg `tmedian` temporal median | `tmedian=radius=N` after `crop` | Adjacent domain | Suppress static OSD via temporal median | PoC3 test result | OSD values change every frame (timestamps, gimbal angle, HDOP), so the static-OSD assumption underneath the technique fails. Output is smeared | **Rejected** | Disqualified by direct experimental evidence |
+| OSD-pixel handling (DEFER) | ProPainter | ProPainter checkpoint with mask-guided sparse Transformer | Current SOTA non-generative | High-quality OSD removal that respects no-fabrication constraint | Source #11 paper claims | Adds Python+PyTorch+CUDA toolchain; offline runtime ~0.25 s/frame at 480p; not necessary if Option B is implemented | **Experimental only** | Keep available for cases where Option B's downstream code change is rejected and the masked-region size is too large for `delogo` to interpolate cleanly |
+| Frame filtering by gimbal pointing (PRIMARY) | `pymavlink` parser of paired `.tlog` for `MOUNT_STATUS` / `MOUNT_ORIENTATION` | Read paired `2026-05-09 16-09-54.tlog`, build a `frame_idx → gimbal_pitch_deg` table by interpolating message timestamps to the 30 fps frame timeline, drop frames where `abs(pitch − (-90°)) > 10°` (or per-project nadir tolerance) | Established production | Reject non-nadir frames before encoding the cleaned MP4 | Verified path (`pymavlink` is already used in the project's `derkachi.tlog` pipeline per `flight_derkachi/README.md`) | **Verify**: must confirm the paired tlog actually emits `MOUNT_STATUS` for the camera in question; if the gimbal does not report attitude over MAVLink, this option fails | **Needs user decision** (effectively Selected if tlog has the messages) | Cleanest signal; deterministic; reuses existing project tooling |
+| Frame filtering by gimbal pointing (FALLBACK) | OCR (Tesseract or PaddleOCR) on the burned-in pitch-angle text | Per-frame OCR of the `-3.7°` text in the attitude indicator | Adjacent domain | Recover gimbal pitch when telemetry path is unavailable | OCR libraries are common; no project-specific MVE built | OCR errors propagate; need confidence thresholding | **Experimental only** | Use only if the tlog lacks gimbal attitude |
+| Calibration JSON | Per-camera `khp20s30_factory.json`-equivalent for the Topotek/Viewpro multi-sensor ball | "factory_sheet" approximation per the AZ-702 precedent | Established production (project precedent) | Provide intrinsics consumable by `tests/e2e/replay/` | Source #10 (Derkachi camera_info.md showing the convention) | None — same approach as the existing fixture | **Selected** | Project-accepted precedent |
+| Companion telemetry CSV | Existing `derkachi.tlog → data_imu.csv` exporter, retargeted to `2026-05-09 16-09-54.tlog` | Run the same exporter that produced `data_imu.csv` for Derkachi | Established production (existing project tool) | Provide synchronized IMU data for the new fixture | Existing pipeline (`flight_derkachi/data_imu.csv`); reuses `pymavlink` | None | **Selected** | Reuses existing tool |
+
+## 7.5.2 Restrictions × Candidate-Mode Sub-Matrix
+
+> The "constraints" here come from the run-specific Project Constraint Matrix
+> in `00_question_decomposition.md` (Constraints C1–C8 — fixture must drop
+> into existing replay infrastructure, no fabrication, etc.). Numbered AC
+> from `acceptance_criteria.md` are referenced where directly relevant — but
+> note this is a **fixture-prep tool, not a runtime component**, so most
+> runtime-AC rows are N/A.
+
+### Sub-Matrix — FFmpeg `crop` (geometric chrome removal)
+
+| Constraint / AC | Candidate-mode behavior | Result | Evidence |
+|---|---|---|---|
+| C1 (ingestable by `tests/e2e/replay/`) | Output is a plain H.264 MP4 with arbitrary integer dimensions; the existing replay path consumes 880×720 (Derkachi) so any sub-1080p H.264 MP4 works | ✅ Pass | Fact #6 + #16 |
+| C2 (no synthetic content) | `crop` discards pixels; never invents | ✅ Pass | Fact #6 |
+| C3 (frame rate flexibility) | `crop` preserves frame rate | N/A | — |
+| C5 (calibration honesty) | Crop changes principal point — calibration must be derived for the cropped frame, not the original 1280×720. Per-camera JSON should reflect the cropped image dimensions and shifted principal point | ✅ Pass (with derived calibration) | `flight_derkachi/camera_info.md` precedent |
+| C6 (reproducibility) | Single deterministic FFmpeg command | ✅ Pass | Fact #6 |
+| C7 (no false-positive features) | Cropped pixels are verbatim; remaining OSD is handled by other components | N/A (this component does not address OSD) | — |
+| C8 (non-nadir frame filtering) | Crop is frame-agnostic | N/A | — |
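+
+The C5 row can be made concrete: a pure crop (no rescale) leaves the
+focal lengths in pixels unchanged and shifts the principal point by the
+crop origin. The sketch below assumes a simple pinhole model; the
+function name, dictionary keys, and numeric values are illustrative and
+do not reflect the project's calibration JSON schema.
+
+```python
+def crop_intrinsics(fx, fy, cx, cy, crop_x, crop_y, crop_w, crop_h):
+    """Pinhole intrinsics for an image cropped at (crop_x, crop_y)."""
+    return {
+        "width": crop_w,
+        "height": crop_h,
+        "fx": fx,            # unchanged: cropping does not rescale pixels
+        "fy": fy,
+        "cx": cx - crop_x,   # principal point moves with the crop origin
+        "cy": cy - crop_y,
+    }
+
+
+# Hypothetical values for a 1280x720 source cropped to 740x525.
+cropped = crop_intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0,
+                          crop_x=270, crop_y=90, crop_w=740, crop_h=525)
+```
+
+Distortion coefficients are unaffected by a pure crop, but they remain
+defined relative to the shifted principal point.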
+
+### Sub-Matrix — Kornia DISK in mask-aware mode (PRIMARY)
+
+| Constraint / AC | Candidate-mode behavior | Result | Evidence |
+|---|---|---|---|
+| C1 (ingestable by `tests/e2e/replay/`) | Requires one-line modification to the C3 detector wrapper to forward `mask=` | ✅ Pass with caveat | Fact #13 |
+| C2 (no synthetic content) | Mask suppresses score-map values in OSD regions; pixel values are unchanged | ✅ Pass | Fact #13 + Fact #14 |
+| C5 (calibration honesty) | Mask path orthogonal to calibration | N/A | — |
+| C6 (reproducibility) | Mask is a static PNG file checked into the fixture directory | ✅ Pass | — |
+| C7 (no false-positive features in OSD region) | DISK returns no keypoints in masked region by construction | ✅ Pass | Fact #13 |
+| AC-2.1a (frame-to-frame registration >95%) | OSD region's keypoints removed before matching; matching depends only on real terrain features in the unmasked region | ✅ Pass for nadir frames (subject to C8 filter) | Fact #14 |
+| AC-2.2 (Mean Reprojection Error <1.0 px frame-to-frame) | Reprojection error is computed on real-terrain matches only; not affected by mask | ✅ Pass | — |
+
+### Sub-Matrix — FFmpeg `delogo` chained (FALLBACK)
+
+| Constraint / AC | Candidate-mode behavior | Result | Evidence |
+|---|---|---|---|
+| C1 (ingestable) | Output is plain MP4 | ✅ Pass | Fact #7, #8 + PoC4 |
+| C2 (no synthetic content) | `delogo` interpolates from neighbors — non-generative; no semantic terrain features synthesized | ✅ Pass with caveat (interpolation is *new* pixels, but they are computed from real adjacent pixels and produce smooth low-contrast regions unlikely to spawn false features) | Fact #7 |
+| C6 (reproducibility) | Single deterministic FFmpeg command | ✅ Pass | Fact #7 |
+| C7 (no false-positive features) | Smooth interpolated regions are unlikely to spawn high-confidence keypoints, but they CAN — DISK keypoints can fire on smooth gradient transitions; risk is real but small | ❓ Verify with empirical keypoint-density test on `poc4_delogo.mp4` vs the original | PoC4 visual inspection |
+| AC-2.1a | Conditional on C7 result | ❓ Verify | — |
+
+### Sub-Matrix — `pymavlink` MOUNT_STATUS frame filter (PRIMARY for non-nadir filtering)
+
+| Constraint / AC | Candidate-mode behavior | Result | Evidence |
+|---|---|---|---|
+| C8 (non-nadir frame filtering) | Drops frames where gimbal pitch is off-nadir | ✅ Pass IF the tlog contains MOUNT_STATUS | Source #4 (ArduPilot Topotek docs reference gimbal angle messaging) |
+| C6 (reproducibility) | Deterministic Python script | ✅ Pass | — |
+| Tlog content actually contains MOUNT_STATUS for this gimbal | unverified — depends on whether the operator's autopilot was wired to receive and forward gimbal attitude | ❓ Verify | — |
+
+### Sub-Matrix — Generative video inpainters (REJECTED)
+
+| Constraint / AC | Candidate-mode behavior | Result | Evidence |
+|---|---|---|---|
+| C2 (no synthetic content) | Synthesizes terrain features that do not exist | ❌ Fail | Fact #12 |
+
+## 7.5.3 Decision Summary
+
+| Component area | Selected | Status notes |
+|---|---|---|
+| Chrome removal | FFmpeg `crop` | Selected, no caveats |
+| OSD pixel handling (primary) | Kornia DISK mask-aware mode | Selected, conditional on one-line wrapper change |
+| OSD pixel handling (fallback) | FFmpeg `delogo` chained | Selected fallback for fixtures that must drop into existing replay path with zero code changes |
+| OSD pixel handling (other options) | `removelogo` (Experimental only — version-fragile), ProPainter (Experimental only — toolchain cost), `tmedian` (Rejected — disqualified by experiment), generative inpainters (Rejected — fabrication risk) | — |
+| Non-nadir filter (primary) | `pymavlink` parser of paired tlog | Needs user decision: depends on whether tlog has MOUNT_STATUS |
+| Non-nadir filter (fallback) | OCR on burned-in pitch text | Experimental only |
+| Calibration JSON | Per-camera "factory_sheet" approximation | Selected (project precedent) |
+| Telemetry CSV | Reuse existing tlog → CSV exporter | Selected |
+
+**Blocker check**: One row is **Needs user decision** (tlog content not yet
+verified). The user should be asked to either (a) confirm the tlog has
+gimbal attitude, in which case Option I is Selected, or (b) accept Option J
+fallback / accept all frames, in which case the fixture is supplied without
+filtering and the test plan documents the limitation.
+
+This open decision does not block the *technical recommendation*: the user
+can choose either path and the rest of the pipeline is unchanged. It is
+recorded in `solution_draft02.md`'s "Open questions" section.
@@ -0,0 +1,441 @@
+# Solution Draft 02 — Recovering a Clean Nadir Fixture from `2026-05-09 16-10-54.mkv`
+
+> **Mode**: B (Solution Assessment) — additive. This draft does **not** modify any runtime component in `_docs/01_solution/solution_draft01.md` (C1…C12). It adds a *fixture-prep developer tool* that converts an OSD-burned-in GCS screen recording into the `flight_derkachi.mp4`-shaped artifact consumed by `tests/e2e/replay/test_az835_e2e_real_flight.py`.
+>
+> **Run date**: 2026-05-30. Continues the 2026-05-29 Mode B investigation (`_docs/00_research/_mode_b_2026-05-29_video_extraction/`), with one previously-open "Needs user decision" row now resolved by a fresh tlog scan (Section 5 below).
+>
+> **Extraction executed on 2026-05-30**. The primary path (§4.1 Steps 1 + 2) was run against this MKV; the resulting fixture is at [`../../00_problem/input_data/flight_topotek_2026-05-09/`](../../00_problem/input_data/flight_topotek_2026-05-09/) with its own short README. The non-nadir frame filter (§4.1 Steps 4–5) and the companion calibration / IMU files (§4.3) were intentionally NOT produced — they are downstream decisions, not part of "extract a clean video". The verified crop coordinates differ from the 2026-05-29 draft's PoC4 values (which assumed a smaller IR PIP); the current §4.1 numbers reflect what was actually used.
+>
+> **Backing artifacts** (read these alongside this draft for full evidence):
+> - Question decomposition: [`../../00_research/_mode_b_2026-05-29_video_extraction/00_question_decomposition.md`](../../00_research/_mode_b_2026-05-29_video_extraction/00_question_decomposition.md)
+> - Source registry (17 L1/L2/L3 sources): [`../../00_research/_mode_b_2026-05-29_video_extraction/01_source_registry.md`](../../00_research/_mode_b_2026-05-29_video_extraction/01_source_registry.md)
+> - Fact cards (18 verified facts incl. local PoC results): [`../../00_research/_mode_b_2026-05-29_video_extraction/02_fact_cards.md`](../../00_research/_mode_b_2026-05-29_video_extraction/02_fact_cards.md)
+> - Comparison framework: [`../../00_research/_mode_b_2026-05-29_video_extraction/03_comparison_framework.md`](../../00_research/_mode_b_2026-05-29_video_extraction/03_comparison_framework.md)
+> - Reasoning chain: [`../../00_research/_mode_b_2026-05-29_video_extraction/04_reasoning_chain.md`](../../00_research/_mode_b_2026-05-29_video_extraction/04_reasoning_chain.md)
+> - Validation log: [`../../00_research/_mode_b_2026-05-29_video_extraction/05_validation_log.md`](../../00_research/_mode_b_2026-05-29_video_extraction/05_validation_log.md)
+> - Component fit matrix: [`../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`](../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md)
+> - Inputs: [`../../00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv`](../../00_problem/input_data/10.05.2026/2026-05-09%2016-10-54.mkv) and [`../../00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip`](../../00_problem/input_data/10.05.2026/2026-05-09%2016-09-54.zip) (contains the paired `.tlog`)
+
+---
+
+## 1. TL;DR
+
+**Yes, a clean nadir replay fixture can be recovered**, but the answer has two parts that must both be done; doing only one will produce a fixture that quietly misleads the runtime pipeline.
+
+| Concern | Recommended primary | Cheap fallback (zero replay-code changes) |
+|---|---|---|
+| **Strip the GCS UI chrome (sidebars / minimap / IR-PIP)** | `ffmpeg crop` (deterministic, verbatim pixels) | — same — |
+| **Handle the gimbal's burned-in OSD (attitude ladder, crosshair, FOV brackets, status text)** | **Inject a static `osd_mask.png` into the existing C3 `kornia.feature.DISK.forward(img, mask=…)` call.** Zero pixel modification, zero fabrication risk. | `ffmpeg crop + delogo` chain (interpolates from neighbor pixels — non-generative; locally verified working as `poc4_delogo.mp4` on the prior run) |
+| **Filter out frames where the gimbal is not nadir** | **OCR the burned-in pitch text** (Option J) — the *previously* preferred telemetry path is dead for this recording (see Section 5). | Manual labeling pass: ship a small `frame_ranges.yaml` of nadir vs non-nadir segments alongside the MP4. ~30 min of human labour for a 6-minute clip. |
+
+**Disqualified**:
+- Generative video inpainters (VideoPainter / DiffuEraser / VidPivot et al.) — they fabricate terrain, which corrupts VPR/matching evaluation and violates `meta-rule.mdc` "Real Results, Not Simulated Ones".
+- FFmpeg `tmedian` (temporal median) — both the OSD text *and* the underlying scene change every frame, so the median is smeared in both regions (locally verified as `poc3_crop_tmedian.mp4` on the prior run).
+
+**Not available to this project** — the source MKV is what we have; there is no access to the camera, the GCS host, or the upstream RTSP. The ideal-but-out-of-reach path would have been to pull RTSP directly from the gimbal (`rtsp://192.168.144.108:554/stream=0` for the Topotek / Viewpro multi-sensor ball class) or extract DCIM with OSD off via the GimbalControl Ethernet utility (ArduPilot Source #4). That path is documented here only for completeness (and because the `flight_derkachi.mp4` fixture was produced that way, which is why it is already clean); it is not actionable for this data source. The only thing that could replace the cleanup pipeline is the original supplier voluntarily re-recording with OSD off — which is outside this project's control.
+
+---
+
+## 2. What is in `2026-05-09 16-10-54.mkv`
+
+### 2.1 Technical metadata (verified via `ffprobe`)
+
+| Field | Value |
+|---|---|
+| Container | Matroska (`.mkv`) |
+| Video codec | H.264 |
+| Resolution | 1280 × 720 |
+| Frame rate | 30/1 fps |
+| Duration | 367.00 s (~6 m 7 s) |
+| File size | 115 044 545 bytes (~110 MB) |
+| Audio | AAC (discard at re-encode time with `-an`) |
+| Bitrate | ~2.5 Mbit/s |
+
+### 2.2 Three overlay layers (Fact #1 — direct 12-frame variance analysis on the prior run)
+
+```
+-----------------------------------------------------------------------+
+|  GCS chrome top bar (status, mode, GPS, alt)                          |
+------------+----------------------------------------+-----------------+
+|            |  TOP-LEFT HUD (burned by camera)       | IR PIP          |
+| SL STATS   |  · timer 00:04:24                      | (live IR/thermal|
+| (live      |  · EO/IR zoom, FOV 53.2 °              |  stream stamped |
+| sidebar    |  · target lat/lon                      |  by the gimbal) |
+| values)    |                                         |                 |
+|            |   [actual EO video region]              |                 |
+|            |   crosshair, attitude ladder            |                 |
+|            |   FOV brackets, +/-3.7 ° pitch text     |                 |
+|            |                                         +-----------------+
+|            |  BOTTOM-RIGHT GIMBAL TEXT               | ROLL / SPEED /  |
+|            |  · 50.0823, 36.2515                     | DIST / BATT /   |
+|            |  · azimuth, elevation                   | CURRENT (live)  |
+------------+----------------------------------------+-----------------+
+|  Minimap / bottom status bar                                           |
+-----------------------------------------------------------------------+
+```
+
+The three layers map to **two removal classes** (per Reasoning Chain Dimension 1):
+
+| Layer | Renderer | Removal class |
+|---|---|---|
+| GCS UI chrome (sidebars, minimap, status bars) | the GCS application, **after** the video stream arrives | **Pure crop** — discard the columns and rows around the EO region; pixels are *outside* the camera's video, no inpainting needed. |
+| Burned-in gimbal OSD (attitude ladder, crosshair, FOV brackets, top-left HUD, bottom-right text) | the **camera itself**, before the recorder ever saw the stream | **Mask or inpaint** — these pixels overwrite real EO pixels; you must either tell the downstream not to look at them (mask) or fill them with something visually plausible (inpaint). |
+| IR PIP (upper-right rectangle, ~520×350 px per the §4.1 measurement) | the camera (it stamps its IR channel into a corner of the EO output) | **Crop it out** geometrically: the rectangle is far too large for `delogo` to interpolate cleanly, so the crop must simply exclude it. |
+
+### 2.3 The aircraft / gimbal class (corroborated by the tlog scan in Section 5)
+
+- Airframe: **multirotor**, ArduCopter 4.6.3 on Pixhawk6X, QUAD/X frame (`STATUSTEXT`: `'Frame: QUAD/X'`). The project's spec'd nav-camera is a fixed-downward APS-C sensor on a *fixed-wing* per `restrictions.md` — this MKV represents a **different aircraft class** than the primary runtime target. That's a feature (extends test coverage) not a bug, but the fixture's metadata must record the discrepancy.
+- Gimbal: 3-axis stabilised, pitch range −90° to +20°, yaw range ±180°, roll range ±30° (per the tlog's `GIMBAL_MANAGER_INFORMATION` capability advertisement — Section 5). Consistent with the Topotek / Viewpro multi-sensor ball family identified by the prior run's visual inspection.
+- GCS: **Mission Planner 1.3.83** (per `STATUSTEXT`). The project's `restrictions.md` mandates QGroundControl as the production GCS; for this fixture, the GCS is just whatever was used to make the recording — not a runtime concern.
+
+---
+
+## 3. Where this fits in the existing solution (and where it does not)
+
+### 3.1 The gap
+
+`_docs/01_solution/solution_draft01.md` defines components C1 (VIO), C2 (VPR), C3 (matchers), C4 (PnP), C5 (state estimator), C6 (tile cache), C7 (inference runtime), C8 (FC adapter), C10 (provisioning) and more — all *runtime* concerns on the Jetson Orin Nano Super. None of them is a "data ingestion / fixture-prep" component. Replay fixtures appear in `tests/e2e/replay/` as already-cleaned MP4s (Fact #18).
+
+This is fine for `flight_derkachi.mp4`, which arrived pre-cleaned because the operator (a) disabled the gimbal OSD via Topotek's GimbalControl utility, (b) mechanically locked the gimbal nadir, and (c) recorded the direct camera feed at 1080p before cropping to 880×720 (Reasoning Chain Dimension 7).
+
+`2026-05-09 16-10-54.mkv` arrived from the *opposite* situation: OSD on, gimbal unconstrained, GCS-screen-recorded. There is no existing project tool to turn this class of input into a usable fixture, which is why the question came up.
+
+### 3.2 What this draft adds
+
+A new **fixture-prep developer tool** (location: `tools/fixture_prep/` or `tests/fixtures/<flight_id>/build.py`, per existing project layout conventions) that converts one GCS-screen-recorded `.mkv` (plus its paired `.tlog`) into a directory of files in the same shape as `_docs/00_problem/input_data/flight_derkachi/`:
+
+```
+input_data/flight_topotek_2026-05-09/
+├── flight_topotek_2026-05-09.mp4        # cleaned, cropped, OSD handled (Section 4)
+├── osd_mask.png                         # 1-channel mask used by Option B (Section 4.1)
+├── 2026-05-09 16-09-54.tlog             # unpacked from the supplied .zip
+├── data_imu.csv                         # SCALED_IMU2 + GLOBAL_POSITION_INT export
+├── frame_ranges.yaml                    # nadir vs non-nadir frame ranges (Section 4.1 Step 4)
+├── camera_info.md                       # camera class + calibration provenance
+└── topotek_gimbal_factory.json          # calibration JSON, factory-sheet provenance
+```
+
+The tool is offline-only, deterministic, versioned, and reproducible: re-running it on the same input with the same FFmpeg/libx264 build should produce byte-identical outputs. The main source of run-to-run variation is libx264's multithreading, because the output can change with the thread count. Pin it with `-threads 1`, optionally together with `-x264-params bframes=0:scenecut=0`, depending on the re-encode time you can tolerate. Note that `-preset` and `-tune` change speed and quality, not determinism.
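
The byte-identical claim is cheap to check after two runs of the tool. The sketch below is only an illustration: `mismatched_files` and the two-directory layout are hypothetical names, not part of the existing project, and it assumes each run writes its outputs into its own flat directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large MP4s do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(dir_a, dir_b):
    """Return sorted names of files that differ, or exist in only one run."""
    a, b = Path(dir_a), Path(dir_b)
    names = {p.name for p in a.iterdir() if p.is_file()}
    names |= {p.name for p in b.iterdir() if p.is_file()}
    mismatched = []
    for name in sorted(names):
        pa, pb = a / name, b / name
        if not (pa.is_file() and pb.is_file()) or sha256_of(pa) != sha256_of(pb):
            mismatched.append(name)
    return mismatched
```

An empty list from `mismatched_files(run1_dir, run2_dir)` means the two runs agree byte for byte; any name in the list points at the artifact whose encoder settings still need pinning.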
+
+**It does not change any runtime component.** C1…C12 are untouched. The single optional change in the *test* layer is to teach `tests/e2e/replay/conftest.py::_calibration_path()` (or its sibling helpers) to also look for a companion `osd_mask.png` if Option B is selected — and to forward it as an extra kwarg to whatever wraps `DISK.forward()` inside C3 (see Section 4.1 for the exact one-line change required).
+
+---
+
+## 4. The recommended pipeline (and the cheap fallback)
+
+### 4.1 PRIMARY — A + B + J: crop, mask-aware DISK, OCR pitch filter
+
+**Step 1 — Geometric crop (FFmpeg)**: discard the GCS chrome and the IR PIP rectangle.
+
+```bash
+INPUT="_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv"
+OUTPUT_DIR="_docs/00_problem/input_data/flight_topotek_2026-05-09"
+mkdir -p "$OUTPUT_DIR"
+
+# Crop coordinates *verified for this specific MKV* by direct frame inspection +
+# luminance/saturation discontinuity detection on 2026-05-30 (see fixture README).
+# Output: 610x260 EO-only region anchored at (250, 440) in the 1280x720 source.
+#
+# Why these numbers and not the prior research's draft (crop=900:445:50:25):
+#   - The IR PIP is much larger than initially estimated: it spans roughly
+#     x=620..1140, y=35..383 in the source frame. The prior crop's right edge at
+#     x=950 cut into the PIP and the left edge at x=50 still included the
+#     GCS left icon strip + the SL STATS panel.
+#   - The IR PIP rectangle (~520 wide x ~350 tall) is too large for FFmpeg
+#     `delogo` to interpolate cleanly. Geometric exclusion is the only honest
+#     option for this recording.
+#   - The largest *clean* EO rectangle (no GCS chrome, no IR PIP, almost no
+#     burned-in OSD) is in the lower half of the frame, below the IR PIP.
+#
+# Re-verify if you ingest a future recording with a different GCS layout or
+# IR-PIP placement; see fixture README for the derivation script.
+ffmpeg -y -i "$INPUT" \
+       -vf "crop=610:260:250:440" \
+       -an -c:v libx264 -crf 18 -preset medium \
+       "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4"
+```
+
+This is Option A: verbatim pixels, no inpainting, no fabrication, deterministic. On this specific MKV the crop is tight enough that essentially **no** burned-in gimbal OSD survives inside the output (verified on 8 sample frames spread across the recording — variance analysis flagged 1/158 600 = 0.0006 % of pixels as "static OSD-like"). The remaining steps 2–5 below are still relevant for other recordings of this class that may need a looser crop.
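
The crop geometry can be sanity-checked without touching the video. The sketch below uses only the numbers quoted in the Step 1 comments (source 1280x720, IR PIP at roughly x=620..1140, y=35..383, old crop `900:445:50:25`); the `Rect` helper and function names are illustrative, not existing project code.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge in source pixels
    y: int  # top edge in source pixels
    w: int  # width
    h: int  # height

    @property
    def right(self):
        return self.x + self.w

    @property
    def bottom(self):
        return self.y + self.h


def inside(rect, frame_w, frame_h):
    """True if rect lies entirely within a frame of the given size."""
    return rect.x >= 0 and rect.y >= 0 and rect.right <= frame_w and rect.bottom <= frame_h


def overlaps(a, b):
    """True if the two rectangles share at least one pixel."""
    return a.x < b.right and b.x < a.right and a.y < b.bottom and b.y < a.bottom


# FFmpeg's crop filter takes w:h:x:y, so crop=610:260:250:440 becomes:
CROP = Rect(x=250, y=440, w=610, h=260)
# Prior draft's crop=900:445:50:25:
OLD_CROP = Rect(x=50, y=25, w=900, h=445)
# Approximate IR PIP extent quoted in the Step 1 comments:
IR_PIP = Rect(x=620, y=35, w=1140 - 620, h=383 - 35)

print(inside(CROP, 1280, 720), overlaps(CROP, IR_PIP), overlaps(OLD_CROP, IR_PIP))
```

The verified crop stays inside the frame and clears the IR PIP, while the prior draft's rectangle intersects it. The crop area is 610 × 260 = 158 600 pixels, which is the denominator behind the 0.0006 % figure above.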
+
+**Step 2 — Build the OSD mask** (one-time, then versioned in the repo).
+
+Build a 1-channel PNG of the same dimensions as the cropped output (610×260 for the Step 1 crop above), where white (255) marks "real EO pixels: DISK is allowed to detect keypoints here" and black (0) marks "burned-in OSD pixels: DISK must suppress detection here". One quick recipe:
+
+```python
+# tools/fixture_prep/build_osd_mask.py
+import cv2, numpy as np
+from pathlib import Path
+
+# Option 1: open a sample cropped frame in any image editor, trace the OSD rectangles by hand,
+# and export a grayscale PNG with the same dimensions as the cropped video (610x260 for Step 1).
+# Option 2 (the script below) is the deterministic alternative: build the mask from a
+# pixel-stability test over a sample of frames.
+
+# Assumes the Step 1 output was saved as *_cropped.mp4; adjust the path if it was not.
+src = Path("input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4")
+cap = cv2.VideoCapture(str(src))
+n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+sample = [int(n_frames * f) for f in (0.05, 0.15, 0.30, 0.45, 0.60, 0.75, 0.95)]
+
+stack = []
+for idx in sample:
+    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
+    ok, frame = cap.read()
+    if not ok: raise RuntimeError(f"frame {idx} unreadable")
+    stack.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
+cap.release()
+
+stack = np.stack(stack)            # (N, H, W)
+std = stack.std(axis=0)            # (H, W)
+mean = stack.mean(axis=0)          # (H, W)
+
+# OSD heuristic: text/lines render as high-brightness, low-std (the *position* is stable
+# even if the *value* in that position changes — the bounding box itself does not move).
+# Real EO terrain over a moving camera is mid-brightness, high-std.
+osd_likely = (std < 12.0) & (mean > 180.0)      # white/bright pixels stable in position
+osd_likely = cv2.dilate(osd_likely.astype(np.uint8) * 255, np.ones((7, 7), np.uint8))
+
+mask = 255 - osd_likely                          # invert: white=keep, black=suppress
+cv2.imwrite("input_data/flight_topotek_2026-05-09/osd_mask.png", mask)
+```
+
+The std/mean thresholds above are the right shape but should be tuned by eye on this specific recording — the prior research's variance analysis showed `mean per-pixel std ≈ 30–40` for both the EO region and the GCS sidebars, so a `< 12` threshold cleanly separates burned-in OSD (which has near-zero std in pixels that contain text strokes) from real video. Inspect the saved `osd_mask.png` against a sample frame and refine the thresholds (or hand-trace) before committing it.
+
+**Step 3 — One-line wrapper change in C3 to forward the mask** (the only code change this draft proposes).
+
+`kornia.feature.DISK.forward(img, mask=None)` already accepts a mask argument of shape `(B, 1, H, W)` with values in `[0, 1]`, and multiplies the score map by it before NMS — keypoints in masked regions are suppressed by construction, with no preprocessing of the pixels themselves (Fact #13, Source #6 Kornia docs L1). The LightGlue maintainer (`cvg/LightGlue#97`) explicitly recommends this approach over post-hoc keypoint filtering.
+
+Locate the project's existing `kornia.feature.DISK(...)` instantiation and the call site that invokes it (per `solution_draft01.md` C3 the detector is DISK + LightGlue; the call site is somewhere under `src/.../matchers/` or the runtime DISK wrapper). Pass `mask=<tensor>` through, where `<tensor>` is loaded once at fixture-init time from `osd_mask.png` and re-used per frame.
+
+Sketch (project-specific paths to be filled in):
+
+```python
+# Existing
+feats = self.disk(img, n=self.n_kp)
+# Becomes
+feats = self.disk(img, n=self.n_kp, mask=self.osd_mask)
+```
+
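+`self.osd_mask` is loaded once in `__init__` by decoding `fixture_dir / "osd_mask.png"` as a single-channel image (for example with `cv2.imread(..., cv2.IMREAD_GRAYSCALE)`; `read_bytes()` alone returns the still-encoded PNG), scaling it from `[0, 255]` to `[0, 1]`, and reshaping it to a `(1, 1, H, W)` float32 tensor. If the fixture has no `osd_mask.png`, the wrapper falls through to the original mask-less call, so the existing `flight_derkachi.mp4` continues to work unchanged.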
+
+**Step 4 — Frame-level filter (Option J: OCR pitch from burned-in attitude indicator)**.
+
+The previously-preferred telemetry path (Option I — parse `MOUNT_STATUS` / `GIMBAL_DEVICE_ATTITUDE_STATUS` from the paired `.tlog`) is **not viable for this recording**. See Section 5 for the evidence. The remaining viable paths are:
+
+- **(J) OCR the burned-in pitch number** — the gimbal renders pitch as text such as `-3.7°` in the attitude indicator. Use Tesseract or PaddleOCR per frame on a fixed crop around that text region, then drop frames where `|pitch − (−90°)| > 10°`. Quick recipe (project must add `pytesseract` or `paddleocr` to `requirements-dev.txt`):
+
+  ```python
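+  # tools/fixture_prep/frame_pitch_from_ocr.py
+  import cv2, pytesseract, re, json
+
+  # Signed decimal such as "-3.7".
+  pat = re.compile(r"(-?\d+\.\d+)")
+  # The video must still contain the pitch text. If the Step 1 crop removed it, point `src`
+  # at the uncropped recording instead; cropping alone does not drop or reorder frames.
+  src = "input_data/flight_topotek_2026-05-09/flight_topotek_2026-05-09_cropped.mp4"
+  # PLACEHOLDER bounds of the attitude-indicator text in `src` coordinates;
+  # measure them on a sample frame before running.
+  x0, y0, x1, y1 = 0, 0, 120, 40
+  cap = cv2.VideoCapture(src)
+  out = []
+  for frame_idx in range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT))):
+      ok, frame = cap.read()
+      if not ok:
+          break
+      roi = frame[y0:y1, x0:x1]
+      text = pytesseract.image_to_string(roi, config="--psm 7 -c tessedit_char_whitelist=-0123456789.")
+      m = pat.search(text)
+      # None marks frames where OCR found no number; treat them as unknown, not as nadir.
+      out.append({"frame": frame_idx, "pitch_deg": float(m.group(1)) if m else None})
+  cap.release()
+  with open("input_data/flight_topotek_2026-05-09/frame_pitch.json", "w") as f:
+      json.dump(out, f)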
+  ```
+
+  Then derive `frame_ranges.yaml` from `frame_pitch.json` by clustering contiguous frame indices whose pitch is within the nadir band.
+
+- **(Manual)**: for a one-off fixture of 6 minutes, the cheapest deterministic alternative is a *manual labeling pass*. A developer watches the cropped video once, notes the time ranges where the gimbal is at nadir (`[0:42, 1:53], [3:58, 6:06]` etc.), and saves them as frame ranges in `frame_ranges.yaml`. This takes ~30 minutes of human labour, has no automated failure modes, and is reproducible by anyone who can re-watch the same MP4. It is the recommended path **for this specific fixture** unless additional GCS-screen-recorded fixtures are expected, in which case the OCR script amortises across them.
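
The clustering step for Option J can be sketched as follows. This is a minimal illustration under stated assumptions: the input is the list written by the OCR script (one record per frame, in order, with `pitch_deg` possibly `None`), frames with no OCR reading are treated as non-nadir, and `nadir_ranges` and its `min_len` parameter are hypothetical names introduced here.

```python
def nadir_ranges(records, nadir_deg=-90.0, tol_deg=10.0, min_len=1):
    """Group consecutive nadir frames into inclusive (start, end) index pairs.

    A frame counts as nadir when |pitch - nadir_deg| <= tol_deg.
    Runs shorter than min_len frames are discarded.
    """
    ranges = []
    start = None
    for idx, rec in enumerate(records):
        pitch = rec["pitch_deg"]
        ok = pitch is not None and abs(pitch - nadir_deg) <= tol_deg
        if ok and start is None:
            start = idx                      # a new nadir run begins
        elif not ok and start is not None:
            if idx - start >= min_len:
                ranges.append((records[start]["frame"], records[idx - 1]["frame"]))
            start = None                     # the run ended on the previous frame
    if start is not None and len(records) - start >= min_len:
        ranges.append((records[start]["frame"], records[-1]["frame"]))
    return ranges


sample = [
    {"frame": i, "pitch_deg": p}
    for i, p in enumerate([-3.7, -88.0, -92.5, None, -85.0, -79.9, -90.0])
]
print(nadir_ranges(sample))             # isolated single-frame runs are kept
print(nadir_ranges(sample, min_len=2))  # only the two-frame run survives
```

A non-trivial `min_len` (for example one second of frames) is one way to keep isolated OCR misreads from fragmenting the fixture; the right value would need to be chosen by inspecting real OCR output.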
+
+**Step 5 — Filter the cropped video down to nadir-only frames** (using `frame_ranges.yaml` from Step 4).
+
+```bash
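+# Re-encode with a select filter restricting to the nadir frame ranges.
+# Build the select expression programmatically from frame_ranges.yaml.
+# Assumes the Step 1 output was saved as *_cropped.mp4 so that the final
+# nadir-only file can take the un-suffixed fixture name.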
+ffmpeg -y -i "$OUTPUT_DIR/flight_topotek_2026-05-09_cropped.mp4" \
+       -vf "select='between(n,1260,3390)+between(n,7140,10980)',setpts=N/FRAME_RATE/TB" \
+       -an -c:v libx264 -crf 18 -preset slow \
+       "$OUTPUT_DIR/flight_topotek_2026-05-09.mp4"
+```
+
+(Numbers above are illustrative; the actual `between(n, …)` segments come from the YAML.)
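
Building the `select` expression from the labelled ranges is mechanical. The sketch below assumes the ranges have already been loaded from `frame_ranges.yaml` into Python tuples (YAML parsing is left out because the file's schema is not fixed in this draft) and that the video is 30 fps as reported by `ffprobe`; `to_frame` and `select_filter` are illustrative names.

```python
def to_frame(timestamp, fps=30):
    """Convert an 'm:ss' timestamp to a frame index at the given frame rate."""
    minutes, seconds = timestamp.split(":")
    return (int(minutes) * 60 + int(seconds)) * fps


def select_filter(frame_ranges):
    """Build the -vf argument keeping only the given inclusive frame ranges."""
    if not frame_ranges:
        raise ValueError("at least one frame range is required")
    terms = "+".join(f"between(n,{start},{end})" for start, end in frame_ranges)
    # setpts rewrites timestamps so the kept frames play back without gaps.
    return f"select='{terms}',setpts=N/FRAME_RATE/TB"


ranges = [
    (to_frame("0:42"), to_frame("1:53")),
    (to_frame("3:58"), to_frame("6:06")),
]
print(select_filter(ranges))
```

With the example time ranges from Step 4, this reproduces the illustrative `between(n,1260,3390)+between(n,7140,10980)` expression used in the command above.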
+
+### 4.2 FALLBACK — A + C + (J or manual): no code change to C3
+
+If teaching the C3 wrapper to forward `mask=…` is rejected for this fixture (a reasonable choice to keep `tests/e2e/replay/` purely "drop in an MP4 + a JSON + a CSV" with zero glue code), substitute **Option C** for Option B: replace each burned-in OSD rectangle with FFmpeg `delogo` interpolation.
+
+**On this specific MKV, Option C collapses into Option A.** The verified crop in §4.1 Step 1 already produces a near-zero-OSD output (0.0006 % of pixels flagged as static-OSD-like over 8 sample frames), so there are no rectangles left to delogo. The Option-C-versus-Option-B trade-off only re-emerges for hypothetical *other* recordings of this class that need a looser crop — e.g. a recording where the camera HUD is positioned differently and there is no clean rectangle wholly outside it. The generic recipe shape for such a recording would be:
+
+```bash
+# Template only — instantiate W/H/X/Y for the looser crop and (x,y,w,h)
+# rectangles for each surviving OSD region, all in cropped-frame coords.
+# The delogo filter in FFmpeg 8.1 has no 'band' parameter (removed); only x, y,
+# w, h, show remain. Rectangles must NOT touch the cropped frame's edge.
+ffmpeg -y -i "$INPUT" \
+       -vf "crop=W:H:X:Y,\
+delogo=x=x1:y=y1:w=w1:h=h1,\
+delogo=x=x2:y=y2:w=w2:h=h2,\
+..." \
+       -an -c:v libx264 -crf 18 -preset medium \
+       "$OUTPUT_DIR/${FIXTURE_ID}_delogo.mp4"
+```
+
+**Important caveats on Option C** (when it does need to be used):
+- `delogo` rectangles must not touch the image edge (no surrounding pixels to interpolate from).
+- `delogo` produces *new* pixels (interpolated from the immediate neighbourhood). They are not synthesised semantic terrain content, but they *are* new pixels that did not exist in the original camera capture. The downstream feature detector *can* fire on smooth interpolated regions (DISK keypoints sometimes detect on smooth gradient transitions). This is the residual risk of Option C versus Option B; quantify it by running both pipelines on a few nadir segments and comparing the keypoint density inside the masked regions on the Option C output to zero (the trivially-correct value Option B delivers).
+- `delogo` does **not** scale to rectangles much larger than ~50 px in their shorter dimension. For this MKV the IR PIP is ~520 × 350 px and cannot be cleanly delogo'd at all — geometric exclusion (i.e. the corrected crop in §4.1 Step 1) is the only honest option.
+- Then chain Step 4 + Step 5 from Section 4.1 on top of this Option C output to get the same nadir-only result.
+
+### 4.3 Companion files
+
+| File | Source | Conventions |
+|---|---|---|
+| `flight_topotek_2026-05-09.mp4` | Sections 4.1 / 4.2 | H.264, 30 fps, exactly the cropped + OSD-handled + nadir-filtered video. Matches the `flight_derkachi.mp4` shape (any sub-1080p H.264 MP4 the replay harness already accepts). |
+| `osd_mask.png` | Section 4.1 Step 2 (only for Option B) | Grayscale PNG with the same dimensions as the cropped video (610×260 for the §4.1 Step 1 crop), white=keep, black=suppress. Versioned alongside the MP4. |
+| `2026-05-09 16-09-54.tlog` | Just unzip the `.zip` from `input_data/10.05.2026/` | Identical to the supplied tlog; ArduCopter 4.6.3 (Pixhawk6X), 133 191 messages over 446.8 s. |
+| `data_imu.csv` | Reuse the existing `derkachi.tlog → data_imu.csv` exporter, retargeted at this new tlog | 10 Hz table of `SCALED_IMU2` and `GLOBAL_POSITION_INT` per the `flight_derkachi/README.md` convention. |
+| `frame_ranges.yaml` | Section 4.1 Step 4 | List of `(start_frame, end_frame)` pairs the fixture considers "valid nadir frames". |
+| `camera_info.md` | Hand-written, modelled on `flight_derkachi/camera_info.md` | Records: camera class (Topotek / Viewpro 3-axis multi-sensor ball, per ArduPilot Source #4 + the tlog's `GIMBAL_MANAGER_INFORMATION` cap_flags), recording chain (camera HDMI → GCS app → desktop screen recorder → MKV), and the calibration's provenance flag (`factory_sheet`, per AZ-702 precedent — Fact #17). |
+| `topotek_gimbal_factory.json` | Same shape as `khp20s30_factory.json` (Fact #17) | Per-camera intrinsics + lens distortion from the camera's published spec sheet. Mark provenance `factory_sheet`. Residual focal-length error expected in the 1–3 % band, same envelope the project already accepts for `flight_derkachi.mp4`. |
+
+---
+
+## 5. New evidence — the paired tlog's gimbal state
+
+The prior 2026-05-29 run left exactly one unresolved row in `06_component_fit_matrix.md`:
+
+> **Frame filtering by gimbal pointing (PRIMARY) — `pymavlink` parser of paired `.tlog` for `MOUNT_STATUS` / `MOUNT_ORIENTATION`** → **Needs user decision**: depends on whether the paired tlog actually emits `MOUNT_STATUS` for the camera in question; if the gimbal does not report attitude over MAVLink, this option fails.
+
+This draft resolves that row by directly scanning the paired tlog. Here is the evidence.
+
+### 5.1 What is in the tlog (`2026-05-09 16-09-54.tlog`, unpacked from the supplied `.zip`)
+
+`pymavlink 2.4.49` with `MAVLINK20=1`, `MAVLINK_DIALECT=all`. Scanned: 133 191 messages over 446.8 s, 46 distinct message types. Relevant subset:
+
+| Message type | Count | Mean rate | Notes |
+|---|---|---|---|
+| `HEARTBEAT` | 1492 | 3.3 Hz | 4 endpoints: `(sys=1, comp=1, autopilot=3, type=2)` = ArduCopter / QUAD multirotor; `(sys=1, comp=191, autopilot=0, type=6)` = a GCS-class component co-resident on sysid 1; `(sys=255, comp=0, autopilot=8, type=18)` and `(sys=255, comp=190, autopilot=8, type=6)` = Mission Planner GCS. |
+| `ATTITUDE` (vehicle) | 4174 | 9.3 Hz | Body pitch range: min −12.47°, max +4.68°, mean −3.95°. **This is the airframe attitude**, not the gimbal. |
+| `GIMBAL_DEVICE_ATTITUDE_STATUS` | **4338** | **9.7 Hz** | **All 4338 messages carry the identity quaternion `q = (1.0, 0.0, 0.0, 0.0)`** (exactly one distinct quaternion value across the entire flight). `flags = 0x002c` = `YAW_IN_VEHICLE_FRAME | PITCH_LOCK | ROLL_LOCK`. `failure_flags = 0x00000000`. |
+| `GIMBAL_MANAGER_INFORMATION` | 26 | discovery exchange | `gimbal_device_id=1`. Capability: pitch range `[−90°, +20°]`, yaw range `±180°`, roll range `±30°`. cap_flags=206847. Confirms the gimbal physically *can* reach nadir; just isn't reporting where it is right now. |
+| `COMMAND_LONG` distinct cmds | — | — | Only 6 distinct command IDs: 183 (`DO_SET_SERVO ch=15 pwm=1950`, one shot, possibly a release / trigger), 400 (`COMPONENT_ARM_DISARM`, twice), 511 (`SET_MESSAGE_INTERVAL`, 11×), 512 (`REQUEST_MESSAGE`, 100×), 520 (`REQUEST_AUTOPILOT_CAPABILITIES`, 47×), and 42428 (ArduPilot-dialect `MAV_CMD_DO_SEND_BANNER`, a request to resend the firmware banner; params all zero). **None of these is a gimbal control command (no `MAV_CMD_DO_MOUNT_CONTROL` = 205, no `MAV_CMD_DO_GIMBAL_MANAGER_PITCHYAW` = 1000).** |
+| `NAMED_VALUE_FLOAT` names | 1 unique | — | Only `ESCs_CURR`. No gimbal-related custom variable. |
+| `STATUSTEXT` | 38 | — | Includes `'ArduCopter 4.6.3 - Agile(px6) (92b0cd78)'`, `'Pixhawk6X 001E0036 …'`, `'Frame: QUAD/X'`, `'Mission Planner 1.3.83'`. No gimbal-related text. |
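
Two of the table's derived values can be re-checked by hand: the mean rates (count divided by the 446.8 s log duration) and the decoding of `flags = 0x002c`. The sketch below hard-codes the low bits of `GIMBAL_DEVICE_FLAGS` as transcribed from the MAVLink common dialect; treat the table as an assumption and verify it against the enum shipped with your `pymavlink` version. The function names are illustrative.

```python
# Low bits of GIMBAL_DEVICE_FLAGS (MAVLink common dialect); higher bits omitted.
GIMBAL_DEVICE_FLAGS = {
    0x0001: "RETRACT",
    0x0002: "NEUTRAL",
    0x0004: "ROLL_LOCK",
    0x0008: "PITCH_LOCK",
    0x0010: "YAW_LOCK",
    0x0020: "YAW_IN_VEHICLE_FRAME",
    0x0040: "YAW_IN_EARTH_FRAME",
}


def decode_gimbal_flags(value):
    """Return (names of set known bits, remaining bits not in the table)."""
    names = [name for bit, name in sorted(GIMBAL_DEVICE_FLAGS.items()) if value & bit]
    known_mask = sum(GIMBAL_DEVICE_FLAGS)  # OR of all known bits (they are disjoint)
    return names, value & ~known_mask


def mean_rate_hz(count, duration_s):
    """Mean message rate over the whole log, rounded to one decimal."""
    return round(count / duration_s, 1)


print(decode_gimbal_flags(0x002C))
print(mean_rate_hz(4338, 446.8), mean_rate_hz(4174, 446.8), mean_rate_hz(1492, 446.8))
```

The decoded names match the `YAW_IN_VEHICLE_FRAME | PITCH_LOCK | ROLL_LOCK` reading in the table, and the three rates come out at 9.7, 9.3 and 3.3 Hz.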
+
+### 5.2 What the identity quaternion really means
+
+`q = (1, 0, 0, 0)` is the null rotation. Per the MAVLink GIMBAL_DEVICE_ATTITUDE_STATUS spec, that means "the gimbal is in its default forward-pointing pose" (no rotation away from the body frame's +X). But the prior run's frame-by-frame visual inspection saw the gimbal *clearly pointing forward at t=30s and clearly pointing nadir at t=300s* (Fact #5). The two observations are mutually exclusive: if the gimbal were truly at the null rotation throughout the flight, every frame would look like it does at t=30s (forward).
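
To make the contradiction concrete, the quaternion can be converted to Euler angles. The sketch below assumes the MAVLink `(w, x, y, z)` component order and the usual aerospace Z-Y-X (yaw, pitch, roll) convention; `quat_to_euler_deg` is an illustrative helper, not project code.

```python
import math


def quat_to_euler_deg(q):
    """Convert a unit quaternion (w, x, y, z) to (roll, pitch, yaw) in degrees."""
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (roll, pitch, yaw))


# What the tlog reports in every message:
identity = (1.0, 0.0, 0.0, 0.0)
# What a gimbal pitched straight down would report (rotation of -90 deg about Y):
nadir = (math.cos(-math.pi / 4), 0.0, math.sin(-math.pi / 4), 0.0)

print(quat_to_euler_deg(identity))
print(quat_to_euler_deg(nadir))
```

The identity quaternion yields zero roll, pitch and yaw, while a true nadir pose would yield a pitch of about −90°. A log in which every sample is the identity therefore cannot describe a flight where the camera is visibly looking down at t=300s.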
+
+The reconciliation: **the gimbal is being moved by the operator, but the actual angle is not being reported back over MAVLink in this recording.** The gimbal driver is emitting the placeholder identity quaternion every ~100 ms because the ArduPilot mount driver expects to publish *something* at the configured rate, but no real angle is available (the gimbal device either isn't wired to talk back over MAVLink, or it is wired but isn't responding, or it is responding on a different transport — most likely the camera's own Ethernet protocol talking directly to the GCS, bypassing the autopilot).
+
+This is consistent with:
+- Mission Planner being able to control Topotek / Viewpro gimbals directly over Ethernet/UDP, separate from the ArduPilot MAVLink path.
+- The `DO_SET_SERVO ch=15 pwm=1950` one-shot pointing to a *trigger* (likely shutter / record-toggle), not a per-frame angle command.
+- The absence of `MAV_CMD_DO_MOUNT_CONTROL` (205) and `MAV_CMD_DO_GIMBAL_MANAGER_PITCHYAW` (1000) among the `COMMAND_LONG` command IDs.
+
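The "all quaternions are identity" finding can be re-checked with a few lines of Python. The sketch below is illustrative only: `is_placeholder_stream`, `quat_to_pitch_deg`, and `read_gimbal_quats` are hypothetical helper names, the tolerance is an assumed value, and the pitch formula assumes the usual ZYX Euler convention with MAVLink's `(w, x, y, z)` quaternion order.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0)  # (w, x, y, z): the null rotation


def quat_to_pitch_deg(q):
    """Pitch in degrees from a (w, x, y, z) quaternion, ZYX Euler convention."""
    w, x, y, z = q
    s = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))  # clamp against rounding
    return math.degrees(math.asin(s))


def is_placeholder_stream(quats, tol=1e-6):
    """True when the stream is non-empty and every sample is the identity."""
    seen = False
    for q in quats:
        seen = True
        if any(abs(a - b) > tol for a, b in zip(q, IDENTITY)):
            return False
    return seen


def read_gimbal_quats(tlog_path):
    """Yield the quaternion of every GIMBAL_DEVICE_ATTITUDE_STATUS in a tlog."""
    from pymavlink import mavutil  # imported lazily; only needed for real logs

    conn = mavutil.mavlink_connection(tlog_path)
    while True:
        msg = conn.recv_match(type="GIMBAL_DEVICE_ATTITUDE_STATUS")
        if msg is None:  # end of log
            break
        yield tuple(msg.q)
```

Calling `is_placeholder_stream(read_gimbal_quats(path))` on a tlog returns `True` only when the stream carries no angle information at all, which is the condition this section describes.
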
+### 5.3 Effect on the recommendation
+
+| Component-fit row (from the 2026-05-29 component fit matrix) | Original status | Status after Section 5 |
+|---|---|---|
+| Frame filtering by gimbal pointing (PRIMARY): `pymavlink` parser of `MOUNT_STATUS`/`MOUNT_ORIENTATION` from the paired tlog | **Needs user decision** | **❌ Rejected for this recording.** `GIMBAL_DEVICE_ATTITUDE_STATUS` is present at 9.7 Hz, but every quaternion is the placeholder identity value; the data carries zero information about the actual gimbal angle. |
+| Frame filtering by gimbal pointing (FALLBACK): OCR on the burned-in pitch text | Experimental only | **✅ Selected as primary** (or the manual labeling pass, for one-off fixtures of this size). |
+
+**No other row in `06_component_fit_matrix.md` changes.** The pixel-handling recommendation (Option B primary, Option C fallback) and the rejections (Options F generative / G temporal-median) stand.
+
+### 5.4 Why this is not a project-runtime issue
+
+The project's *runtime* nav-camera per `restrictions.md` is the ADTi 20MP fixed-downward (no gimbal at all). The runtime pipeline never sees a multi-sensor-ball gimbal-attitude stream. So the gap discovered here ("Mission-Planner-driven Topotek gimbals don't expose attitude over MAVLink") is only relevant for fixture preparation, not for the runtime contract. The follow-up "change the recording procedure to enable Topotek's own attitude-publish path" would require camera/GCS access this project does not have, so it is unavailable as a workaround. For any further recordings of this class, **plan on OCR-based pitch recovery (Option J), or a manual labeling pass per fixture, as the standing strategy**, not as a temporary fallback.
+
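As a rough illustration of the OCR-based pitch recovery, the sketch below covers only the post-OCR step: turning the recognised text (for example `−3.7°`) into a number and grouping nadir frames into ranges. It assumes the OCR input has already been cropped to the pitch readout; `parse_pitch_deg`, `nadir_ranges`, and the `-80°` threshold are hypothetical and not taken from the project.

```python
import re

# Optional sign (ASCII hyphen or U+2212 minus), digits, optional decimals.
_PITCH_RE = re.compile(r"([+\-\u2212]?)\s*(\d{1,3}(?:[.,]\d+)?)")


def parse_pitch_deg(text):
    """Parse an OCR'd pitch readout; return None when nothing usable is found."""
    match = _PITCH_RE.search(text)
    if match is None:
        return None
    value = float(match.group(2).replace(",", "."))
    if value > 180.0:  # implausible reading, treat as an OCR failure
        return None
    sign = -1.0 if match.group(1) in ("-", "\u2212") else 1.0
    return sign * value


def nadir_ranges(pitches, threshold_deg=-80.0):
    """Group consecutive nadir frames into half-open (start, end) index ranges.

    A frame counts as nadir when its pitch is at or below threshold_deg;
    frames whose OCR failed (None) always break a range.
    """
    ranges = []
    start = None
    for idx, pitch in enumerate(pitches):
        is_nadir = pitch is not None and pitch <= threshold_deg
        if is_nadir and start is None:
            start = idx
        elif not is_nadir and start is not None:
            ranges.append((start, idx))
            start = None
    if start is not None:
        ranges.append((start, len(pitches)))
    return ranges
```

Treating failed OCR reads as range breaks is the conservative choice: a frame is only kept when its pitch was positively read as nadir.
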
+---
+
+## 6. Component fit summary (consolidated)
+
+> Full detail per row in [`../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md`](../../00_research/_mode_b_2026-05-29_video_extraction/06_component_fit_matrix.md). The table below is the *post-tlog-scan* update.
+
+| Component area | Candidate | Pinned mode | Status | Notes |
+|---|---|---|---|---|
+| GCS-chrome geometric crop | FFmpeg `crop` filter | `crop=900:445:50:25` per recording, derived from variance-map analysis | **Selected** | Trivial, lossless within re-encode, deterministic. PoC1 produced playable output on the prior run. |
+| OSD pixel handling (PRIMARY) | Kornia `DISK.forward(img, mask=…)` | mask-aware mode, `(B, 1, H, W)` mask multiplied into the DISK score map before NMS | **Selected** | No pixel modification; fabrication-risk = 0. Requires one-line C3 wrapper change to forward `mask=`. Already API-verified against Kornia docs L1 (Source #6) + LightGlue maintainer reply (Source #5). |
+| OSD pixel handling (FALLBACK) | FFmpeg `delogo` chained | multiple `delogo=x:y:w:h` after `crop`, rectangles inside the cropped frame | **Selected (fallback)** | PoC4 produced `poc4_delogo.mp4` on the prior run. Pick this if the C3 wrapper change is rejected for this fixture. |
+| OSD pixel handling | FFmpeg `removelogo` PNG mask | `removelogo=mask.png` | **Experimental only** | Failed locally with `Invalid argument` (-22) on FFmpeg 8.1; works in older versions per Source #15. Try first on your team's pinned FFmpeg before falling through to chained `delogo`. |
+| OSD pixel handling | ProPainter (non-generative video inpainter, ICCV 2023) | mask-guided sparse Transformer with flow completion | **Experimental only** | Highest visual quality among non-generative options. Adds PyTorch+CUDA toolchain; ~0.25 s/frame at 480p (Fact #11). Use only if a future recording's masked regions are too large for `delogo` interpolation. |
+| OSD pixel handling | VideoPainter / DiffuEraser / VidPivot (and any generative video inpainter) | diffusion-backbone I2V generative inpainter | **❌ Rejected** | Synthesises terrain content. Disqualified by `meta-rule.mdc` "Real Results, Not Simulated Ones" (Fact #12). |
+| OSD pixel handling | FFmpeg `tmedian` temporal median | `tmedian=radius=N` | **❌ Rejected** | Burned-in OSD text values change every frame, so the static-OSD assumption underneath the technique fails. PoC3 confirmed: smeared, ghosted output (Fact #10). |
+| Non-nadir frame filter (PRIMARY) | `pymavlink` MOUNT_STATUS / GIMBAL_DEVICE_ATTITUDE_STATUS | parse paired tlog → `frame_idx → gimbal_pitch_deg` table | **❌ Rejected for this recording (NEW)** | Section 5: message present at 9.7 Hz, but all 4338 quaternions are identity (1,0,0,0) — no real angle data. |
+| Non-nadir frame filter (PRIMARY, new) | OCR (Tesseract or PaddleOCR) on burned-in pitch text | per-frame OCR of the `−3.7°` text in the attitude indicator | **Selected (NEW)** | Was "Experimental only" pre-tlog-scan; promoted to primary now that the telemetry path is dead. Add `pytesseract` or `paddleocr` to `requirements-dev.txt`. |
+| Non-nadir frame filter (one-off alternative) | Manual labeling pass | developer watches the 6-min clip, marks ranges, commits `frame_ranges.yaml` | **Selected (for this fixture only)** | Cheapest deterministic path; recommended for this specific MKV unless additional GCS-screen-recorded fixtures are expected. |
+| Calibration JSON | Per-camera `topotek_gimbal_factory.json` (same shape as `khp20s30_factory.json`) | "factory_sheet" provenance per AZ-702 precedent | **Selected** | Project-accepted (Fact #17). Residual 1–3 % focal-length error envelope. |
+| Companion telemetry CSV | Existing `derkachi.tlog → data_imu.csv` exporter, retargeted | unchanged | **Selected** | Reuses existing tool. |
+| Source recovery (Option Z) | Pull RTSP / extract DCIM from gimbal with OSD disabled via Topotek GimbalControl utility (ArduPilot Source #4) | n/a — out-of-band camera access | **❌ Not available — no camera / GCS access for this data source.** | Documented here only because it is the cleanest path *in principle* and because the existing `flight_derkachi.mp4` fixture was produced this way. Not actionable for this project's data pipeline. The only way it returns to the table is if the original supplier voluntarily re-records with OSD off — outside this project's control. |
+
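The crop and chained-`delogo` rows above combine into a single FFmpeg `-vf` filter chain. The sketch below builds that string and the matching command line; `build_filter_chain`, `build_ffmpeg_argv`, and the example rectangle are hypothetical, and the bounds check is this sketch's own sanity check rather than a statement about FFmpeg's behaviour.

```python
def build_filter_chain(crop, osd_rects):
    """Return an FFmpeg -vf string: one crop followed by chained delogo filters.

    crop is (width, height, x, y) in source-frame pixels; each OSD rectangle
    is (x, y, w, h) in cropped-frame pixels.
    """
    width, height, x, y = crop
    parts = [f"crop={width}:{height}:{x}:{y}"]
    for rx, ry, rw, rh in osd_rects:
        # Rectangles must lie inside the cropped frame, as the table requires.
        if rx < 0 or ry < 0 or rx + rw > width or ry + rh > height:
            raise ValueError(f"OSD rectangle {(rx, ry, rw, rh)} leaves the frame")
        parts.append(f"delogo=x={rx}:y={ry}:w={rw}:h={rh}")
    return ",".join(parts)


def build_ffmpeg_argv(src, dst, crop, osd_rects):
    """Assemble the ffmpeg argument list without running it."""
    chain = build_filter_chain(crop, osd_rects)
    return ["ffmpeg", "-y", "-i", src, "-vf", chain, "-c:v", "libx264", dst]
```

For example, `build_filter_chain((900, 445, 50, 25), [(10, 10, 120, 30)])` yields `crop=900:445:50:25,delogo=x=10:y=10:w=120:h=30`; the rectangle here is made up, and real coordinates would come from the per-recording OSD layout.
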
+---
+
+## 7. Testing strategy
+
+### 7.1 Functional / integration
+
+1. **Crop-coordinate validation.** Decode 5 frames from `flight_topotek_2026-05-09.mp4` and assert they are 900×445; assert the IR PIP is *not* present in the right-third of the frame; assert the GCS sidebars are *not* present in the leftmost / rightmost columns.
+2. **OSD mask validation.** Open `osd_mask.png`, assert dimensions match the MP4, assert the union of black pixels covers ≥95 % of the union of OSD rectangles you would otherwise pass to `delogo`. Optionally, render `cropped_frame * (mask/255.)` and eyeball that the burned-in text is dimmed to black while the EO terrain is preserved.
+3. **DISK mask-aware contract.** Add a unit test under `tests/unit/c3_matchers/` that loads the existing DISK wrapper, passes a synthetic 900×445 image with a checkerboard pattern + a corner rectangle of pure white, passes a mask zeroing the corner, and asserts no keypoint is returned at coordinates inside that corner.
+4. **End-to-end replay smoke test.** Add a sibling test to `tests/e2e/replay/test_az835_e2e_real_flight.py` parameterised over `flight_topotek_2026-05-09` and confirm the pipeline runs to completion. Track end-to-end accuracy separately under AC-1.x.
+5. **Frame-range filter sanity.** Iterate `frame_ranges.yaml` and assert: every range's `start_frame < end_frame`, ranges are non-overlapping, and the union covers at least N seconds of footage (where N is a project-chosen minimum-fixture-duration).
+
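The frame-range sanity check in item 5 can be sketched as a small pure function. This is a minimal sketch under stated assumptions: the `start_frame` / `end_frame` keys follow the wording above, but the half-open range convention, the `fps` argument, and the function name `validate_frame_ranges` are hypothetical, since the `frame_ranges.yaml` schema is not fixed here.

```python
def validate_frame_ranges(ranges, fps, min_seconds):
    """Validate parsed frame ranges and return the covered duration in seconds.

    ranges is a list of dicts with integer start_frame / end_frame keys,
    interpreted as half-open [start_frame, end_frame) intervals.
    """
    total_frames = 0
    prev_end = None
    for rng in sorted(ranges, key=lambda r: r["start_frame"]):
        start, end = rng["start_frame"], rng["end_frame"]
        if not start < end:
            raise ValueError(f"empty or inverted range: {start}..{end}")
        if prev_end is not None and start < prev_end:
            raise ValueError(f"range starting at {start} overlaps the previous one")
        prev_end = end
        total_frames += end - start
    covered_s = total_frames / fps
    if covered_s < min_seconds:
        raise ValueError(f"only {covered_s:.1f} s covered, need {min_seconds} s")
    return covered_s
```

Sorting by `start_frame` first means overlap only has to be checked against the immediately preceding range.
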
+### 7.2 Non-functional
+
+- **Reproducibility**: run the entire `tools/fixture_prep/` script twice on a clean checkout and assert byte-identical outputs (pin libx264 settings; pin `pytesseract` version; pin Python version in `pyproject.toml`).
+- **Throughput**: the entire fixture-prep run for one 6-minute MKV should complete in well under 10 minutes on a developer workstation (no AC requirement; sanity ceiling).
+- **No fabrication regression**: extract keypoints from the masked region of the Option B output and assert count == 0; for the Option C output, assert keypoint count is at most 5 % of the unmasked terrain keypoint count.
+
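One way to express the reproducibility check is to hash every file produced by two runs and compare the digests. The sketch below assumes each run writes into its own output directory; `digest_tree` and `diff_runs` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch, but large
            # videos would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def diff_runs(run_a, run_b):
    """Return the sorted relative paths that are missing or differ between runs."""
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

A test would then assert `diff_runs(first_output_dir, second_output_dir) == []`; a non-empty list names exactly the files that broke byte-identity.
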
+---
+
+## 8. Open questions
+
+1. **Adopt the one-line C3 wrapper change?** Section 4.1 Step 3 proposes forwarding `mask=…` through the existing DISK call. This is the lowest-risk highest-quality path (Option B) but requires touching `src/.../matchers/`. The fallback (Option C, chained `delogo`) avoids this code change entirely and only touches the fixture-prep script. **Either is defensible** — the choice depends on whether the team is willing to formalise mask-aware fixtures as a first-class concept in the replay layer (recommended yes) or wants to keep that layer "drop in an MP4" pure (defensible too).
+2. **Use OCR for the frame filter, or just hand-label this one fixture?** For a single 6-minute clip, the manual labeling pass is cheaper than building, validating, and pinning a Tesseract/PaddleOCR pipeline. Use OCR only if you expect to ingest additional fixtures of the same class (same gimbal HUD layout) and want the script to amortise. Either way, the YAML output format is the same — so this can be revisited later.
+3. **Future recordings: Option Z (direct RTSP/DCIM extraction with OSD off) is not available** to this project, because no camera or GCS access exists for this data source. The only theoretical path to bypass the cleanup pipeline is to ask the original supplier to re-record with OSD disabled (via Topotek's GimbalControl utility on their side, or by setting `MNT1_OPTIONS` / equivalent on their flight controller). Whether to even make that request is a separate decision; it is not a technical option this project can execute on its own. Assume Option Z stays unavailable and plan all future fixtures of this class around the OCR / manual-labeling path in §4 + §5.3.
+
+---
+
+## 9. References
+
+L1 (official documentation / source):
+- FFmpeg `delogo` filter, vf_delogo.c [Sources #1, #2 in source registry]
+- FFmpeg `removelogo` filter, vf_removelogo.c [Source #3]
+- FFmpeg `tmedian` filter [Source #8]
+- ArduPilot Topotek Gimbal docs [Source #4]
+- LightGlue maintainer reply on score-map masking, issue cvg/LightGlue#97 [Source #5]
+- Kornia `DISK.forward()` documentation [Source #6]
+- DISK upstream source (`disk/model/disk.py`) [Source #7]
+- Project: `_docs/00_problem/input_data/flight_derkachi/README.md` [Source #9]
+- Project: `_docs/00_problem/input_data/flight_derkachi/camera_info.md` [Source #10]
+
+L2 (peer-reviewed):
+- ProPainter (ICCV 2023) [Source #11]
+- VideoPainter (arXiv 2503.05639, 2025) [Source #12] — referenced as disqualified
+- VidPivot / DiffuEraser comparison (arXiv 2510.21461, 2025) [Source #13]
+- DISK paper (NeurIPS 2020, arXiv 2006.13566) [Source #14]
+
+L3 (practitioner / community):
+- "Removing obnoxious logos from videos" blog [Source #15]
+- Conditional Temporal Median Filter reference [Source #16]
+- Foundry Nuke TemporalMedian reference [Source #17]
+
+In-repo cross-references:
+- `_docs/01_solution/solution_draft01.md` — existing solution; C2 (MixVPR TensorRT INT8+FP16), C3 (DISK + LightGlue), C5 (GTSAM iSAM2 + CombinedImuFactor) [Source #R1]
+- `_docs/00_research/06_component_fit_matrix/00_summary.md` — confirms no fixture-prep component exists in the runtime [Source #R2]
+- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` — existing per-camera calibration JSON precedent [Source #R3]
+
+This-run new evidence:
+- ffprobe verification of `2026-05-09 16-10-54.mkv` technical metadata (Section 2.1)
+- `pymavlink` scan of unpacked `2026-05-09 16-09-54.tlog` (Section 5) — 133 191 messages over 446.8 s, GIMBAL_DEVICE_ATTITUDE_STATUS at 9.7 Hz, all identity quaternions
+
+---
+
+## 10. Related artifacts
+
+| Artifact | Status |
+|---|---|
+| `_docs/00_research/_mode_b_2026-05-29_video_extraction/` | Complete through Step 7.5 — this draft is its Step 8 deliverable, with one row updated by new tlog evidence |
+| `_docs/01_solution/solution_draft01.md` | Untouched. C1–C12 unchanged. This draft is purely additive. |
+| `_docs/01_solution/solution.md` | Untouched. |
+| `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-10-54.mkv` | Source MKV. Untouched. |
+| `_docs/00_problem/input_data/10.05.2026/2026-05-09 16-09-54.zip` | Source tlog archive. Untouched. |
+| Future: `input_data/flight_topotek_2026-05-09/` | The cleaned fixture directory this draft proposes producing. Not yet created. |
+| Future: `tools/fixture_prep/` | The reproducible script that will produce the above. Not yet created. |
@@ -262,11 +262,48 @@ source repo
 | ArduPilot Plane FC | MAVLink 2.0 (`GPS_INPUT` 5 Hz; `MAV_CMD_SET_EKF_SOURCE_SET`; `STATUSTEXT` / `NAMED_VALUE_FLOAT`) over UART/USB | MAVLink 2.0 message signing, per-flight key (D-C8-9 = (d)) | 5 Hz periodic emit; signing handshake at takeoff load (≤ 5 s, AC-NEW-1) | Signing handshake fail → companion refuses takeoff; mid-flight signing key compromise → FC ignores unsigned messages, AC-5.2 takes over |
 | iNav FC | MSP2 `MSP2_SENSOR_GPS` over UART; MAVLink outbound for telemetry | None (iNav has no signing) — accepted residual risk per Mode B Source #129 | 5 Hz periodic emit | Mid-flight bad-frame → iNav `mspGPSReceiveNewData()` receives only the latest frame; honest `hPosAccuracy` is the only safety net |
 | QGroundControl (GCS) | MAVLink 2.0 (`STATUSTEXT`, `NAMED_VALUE_FLOAT`, `GPS_RAW_INT`) | Same MAVLink 2.0 signing as the AP path (AP profile); no signing on iNav profile | 1–2 Hz downsampled (AC-6.1); operator commands are best-effort | GCS link drop → companion continues; no mid-flight reconfiguration is required from GCS |
-| `satellite-provider` (pre-flight) | REST over HTTP, OpenAPI at `/swagger`; filesystem access if co-located | TLS + service-internal API key (operator workstation only); the companion never reaches `satellite-provider` directly while airborne | Off-line pre-flight; not time-critical | Cache miss → C11 `TileDownloader` fails fast pre-flight; C10 build is blocked downstream; takeoff blocked |
+| `satellite-provider` (pre-flight read — bbox + slippy-map) | REST `POST /api/satellite/tiles/inventory` (bulk lookup by `(z,x,y)`, ≤ 5000 entries / request) + `GET /tiles/{z}/{x}/{y}` (slippy-map JPEG fetch); OpenAPI at `/swagger`; filesystem access if co-located | JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) over TLS; the dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob accepts the self-signed dev cert. The companion never reaches `satellite-provider` directly while airborne. | Off-line pre-flight; not time-critical | Cache miss → C11 `TileDownloader` fails fast pre-flight; C10 build is blocked downstream; takeoff blocked |
+| `satellite-provider` (pre-flight route seed — cycle 3 / Epic AZ-835) | REST `POST /api/satellite/route` (corridor onboarding; body per `CreateRouteRequest.cs` DTO) + `GET /api/satellite/route/{id}` (status polling; terminal-success `mapsReady=true`) | Same JWT Bearer / TLS-insecure as the read path; validated pre-emptively against AZ-809 `CreateRouteRequestValidator` bounds | Off-line pre-flight; bounded by `poll_max_attempts × poll_interval_s` (default 60 × 5 s) | Terminal failure → `RouteTerminalFailureError`; transient → `RouteTransientError`; validation → `RouteValidationError`. C11's `SatelliteProviderRouteClient` (AZ-838) owns the surface. |
 | `satellite-provider` (post-landing ingest, D-PROJ-2, **planned**) | REST `POST /api/satellite/tiles/ingest` (multipart) | Per-flight onboard signing key (carried with each tile); rate-limited | Bursty post-landing | Endpoint not yet implemented service-side → C11 keeps batches queued locally; never blocks the pre-flight cycle |
 | Operator workstation (pre-flight stage) | Filesystem (USB / Ethernet) | OS-level (operator login) | Not time-critical | Bad-stage detection via Manifest content-hash gate (D-C10-3) |
 | Nav camera | USB / MIPI-CSI / GigE (lens-module dependent) | n/a | 3 Hz | Frame drop / hardware fault → "VISUAL_BLACKOUT" path (AC-3.5, AC-NEW-8) |

+### `satellite-provider` integration (cycle-3 ground truth)
+
+**The Jetson e2e harness now consumes the REAL parent-suite `satellite-provider` .NET service** (lineage AZ-688 / AZ-691 / AZ-692; `satellite-provider` + `satellite-provider-postgres` services in `docker-compose.test.jetson.yml`). The legacy `mock-sat` fixture is retired from the Jetson compose; D-PROJ-2 `POST /api/satellite/upload` has shipped service-side (`Program.cs:211`). Tier-1 `docker-compose.test.yml` is deprecated as of 2026-05-20 per `_docs/02_document/tests/environment.md`.
+
+Two consequences for the architecture:
+
+1. **C11 read contract adapted to the v1.0.0 inventory shape (AZ-777 Phase 1)** — `POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` replace the historical `GET /api/satellite/tiles?bbox=…&zoom=…` shape. The bbox-driven `download_tiles_for_area` entry point and its DTOs are unchanged at the call-site level; the contract adaptation is internal to `HttpTileDownloader`. Auth is JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) over TLS; `SATELLITE_PROVIDER_TLS_INSECURE=1` is a documented dev-only knob for self-signed certs. **Proposed successor (ADR-013 / AZ-976)**: gRPC `satellite.v1.RouteTileDelivery.DeliverRouteTiles` server-streaming with client tile catalog — see `tile_provision_grpc.md`; supersedes the never-shipped inventory REST endpoint.
+2. **Route-driven seeding (Epic AZ-835 / AZ-969)** — the operator submits a tlog-derived `RouteSpec` (produced by `replay_input.tlog_route.extract_route_from_tlog` — AZ-836) via C12 `seed-cache-from-tlog` (AZ-974) or the F11 `replay_api` demo job (AZ-973). E2E fixture `operator_pre_flight_setup` wraps the same production `operator_replay.cache_seed` module.
+
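The route-seed polling bound from the interface table (`poll_max_attempts × poll_interval_s`, default 60 × 5 s, so at most about 300 s) can be illustrated with a small loop. This is not the `SatelliteProviderRouteClient` implementation: the exception class below is a local stand-in that reuses the documented name, the `failed` status key is a hypothetical placeholder for whatever the service reports on terminal failure, and the built-in `TimeoutError` is used here only because the real exhaustion behaviour is not specified in this section.

```python
import time


class RouteTerminalFailureError(RuntimeError):
    """Stand-in for the error raised when the route ends in a failed state."""


def wait_for_route(fetch_status, poll_max_attempts=60, poll_interval_s=5.0,
                   sleep=time.sleep):
    """Poll fetch_status() until mapsReady is true or the attempt budget runs out.

    fetch_status is any callable returning the decoded JSON body of
    GET /api/satellite/route/{id} as a dict.
    """
    for attempt in range(poll_max_attempts):
        status = fetch_status()
        if status.get("mapsReady") is True:
            return status
        if status.get("failed"):  # hypothetical terminal-failure marker
            raise RouteTerminalFailureError(str(status))
        if attempt + 1 < poll_max_attempts:  # no sleep after the last attempt
            sleep(poll_interval_s)
    raise TimeoutError(f"route not ready after {poll_max_attempts} attempts")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a network or real delays.
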
+**Imagery source license attribution (cycle 3)**: the Jetson `satellite-provider` instance downloads from the **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. Dev/research use only; production deployment requires either a Google Maps Platform licensing review or migration to a true CC-BY satellite source on the parent-suite side (parent-suite ticket TBD). Operator-side seed scripts (`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate the "Imagery © Google" attribution.
+
+**AZ-777 Phase 3+ superseded by Epic AZ-835**: AZ-777 originally proposed five phases — wire e2e-runner (Phase 1), seed Derkachi bbox (Phase 2), rewrite `operator_pre_flight_setup` fixture (Phase 3), un-xfail AC-4 / AC-5 (Phase 4), docs (Phase 5). Phases 1+2 shipped under AZ-777 itself (batch 104, cycle 3). Phases 3 and 5 were **superseded** when the user redirected the work to a route-driven flow: Phase 3 → AZ-839 (real fixture wiring C1+C2+C11+C10), Phase 5 → AZ-842 (this docs ticket). Phase 4 (un-xfail) was deferred to backlog after the cycle-4 redesign (AZ-895) took the un-xfail target along a different path and is not on the active epic. The AZ-777 task spec at `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` carries the supersedure banner; this architecture document is the authoritative high-level pointer for that decision.
+
+No new ADR — this is execution of existing decisions (architectural principle #5 satellite-provider on-disk layout end-to-end; ADR-004 process-level isolation unchanged; ADR-011 replay is a configuration unchanged). The architectural surface gained the route-driven seeding path inside C11; nothing else moved.
+
+### Replay input redesign (cycle 4 — single canonical clock + CSV-driven path)
+
+Cycle 4 rebuilt the replay-mode operator-input surface around a single canonical clock to close the AZ-848 ESKF out-of-order regression and to retire the tlog auto-sync surface that produced the misalignment risk in the first place. Four tickets ship the change:
+
+| Ticket | Role | Description |
+|--------|------|-------------|
+| **AZ-894** (CSV adapter) | New primary path | `csv_replay_input.CsvReplayInputAdapter` consumes a paired `(video, CSV)` where the CSV's `Time` column is the canonical clock for every IMU/GPS sample. Gated `BUILD_CSV_REPLAY_ADAPTER=ON` in airborne and research binaries; OFF in operator-orchestrator. |
+| **AZ-895** (auto-sync deprecation) | Removed legacy | `replay_input.auto_sync` (AZ-405) is reduced to a stub that raises on first call; `tlog_video_adapter.py` is reduced to a deprecated stub whose `open()` raises immediately. The legacy `--time-offset-ms` / `--skip-auto-sync` / `--auto-trim` CLI flags are accepted with a warning and ignored. Hard removal is tracked in AZ-908 (cycle 5+ backlog). |
+| **AZ-896** (CSV format spec) | Contract | `_docs/02_document/contracts/replay/csv_replay_format.md` documents the CSV row schema, the row-0-alignment-with-video-frame-0 invariant, and an example `data_imu.csv` shipped under the same path. |
+| **AZ-897** (operator UI) | Cycle 5 — Epic AZ-969 | Dual-timeline `(video, tlog)` alignment UI in `../ui`; uploads raw tlog, calls `replay_api` preview/align/demo endpoints; displays map + verdict. Spec: `../ui/_docs/02_tasks/todo/AZ-897_operator_replay_sync_ui.md`. |
+
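The "accepted with a warning, then ignored" treatment of the legacy flags in the AZ-895 row can be sketched with `argparse`. This is only an illustration of the pattern, not the project's CLI: the `--video` / `--csv` options, the `prog` name, and the value types of the legacy flags are assumptions.

```python
import argparse
import warnings

# Legacy auto-sync flags and the argparse attribute each one maps to.
LEGACY_FLAGS = {
    "--time-offset-ms": "time_offset_ms",
    "--skip-auto-sync": "skip_auto_sync",
    "--auto-trim": "auto_trim",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="replay")
    parser.add_argument("--video")  # hypothetical current options
    parser.add_argument("--csv")
    # Still parsed so that old invocations do not fail outright.
    parser.add_argument("--time-offset-ms", type=float, default=None)
    parser.add_argument("--skip-auto-sync", action="store_true")
    parser.add_argument("--auto-trim", action="store_true")
    return parser


def parse_args(argv):
    """Parse argv, warn about any legacy flag in use, and drop its value."""
    namespace = build_parser().parse_args(argv)
    for flag, dest in LEGACY_FLAGS.items():
        value = getattr(namespace, dest)
        if value is not None and value is not False:
            warnings.warn(f"{flag} is deprecated and ignored", FutureWarning,
                          stacklevel=2)
        delattr(namespace, dest)  # downstream code can no longer read it
    return namespace
```

Deleting the attributes after warning makes "ignored" structural: no later code path can accidentally act on a legacy value.
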
+The architectural rationale is captured in **Invariant 14** of the replay protocol (`_docs/02_document/contracts/replay/replay_protocol.md`): the system runs as a single edge process on a single device; there must be exactly one wall/monotonic clock authoritative for timestamps that cross component boundaries. In live mode that clock is the C8 inbound `FcAdapter`'s FC-boot-relative timestamp; in replay mode (after cycle 4) it is the CSV row's `Time` column. The previous design's two-clock surface (Jetson monotonic at C1 VIO emission, FC-boot at C8 IMU window arrival) produced the AZ-848 regression and is retired with the auto-sync deprecation.
+
+The legacy `TlogReplayFcAdapter` is retained for audit paths — offline FDR analysis and `gps-denied-tlog-to-csv` export (AZ-972). Runtime replay uses the CSV adapter after operator alignment (F11 / Epic AZ-969).
+
+### Demo replay operator flow (cycle 5 — Epic AZ-969)
+
+F11 in `system-flows.md` is the **primary product demo**, not an e2e-test concern. Raw operator inputs are `(video, tlog, calibration)`; alignment produces an AZ-896 CSV on a single canonical clock; route-driven cache seeding uses `extract_route_from_tlog` via C12 / `replay_api` production modules (AZ-974, AZ-973). Backend children: AZ-970 (preview API), AZ-971 (alignment refine), AZ-972 (CSV export), AZ-973 (orchestration), AZ-974 (C12 seed CLI), AZ-975 (docs). UI: AZ-897 in `../ui`.
+
+The cycle-4 `(video, CSV)` upload bypass (AZ-959) remains for operators who already have an aligned CSV; it is not the default demo entry.
+
 ### `satellite-provider` upload contract (per D-PROJ-2 carryforward)

 The onboard side of D-PROJ-2 is fully specified in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`. From this architecture's standpoint:
@@ -274,7 +311,7 @@ The onboard side of D-PROJ-2 is fully specified in `_docs/_process_leftovers/202
 - **`Tile` writes are append-only and idempotent** (the same `(zoomLevel, lat, lon, capture_timestamp, companion_id, flight_id)` tuple is the dedup key).
 - **Quality metadata is mandatory on every uploaded tile** so the planned voting layer can promote `pending → trusted` without re-deriving statistics on the service side.
 - **Onboard tiles never claim the `trusted` status**; they are uploaded as `pending` and the parent-suite voting layer (D-PROJ-2 design task #2) decides promotion.
- **Test substitute**: `mock-suite-sat-service` is an e2e-test-only fixture (under `tests/fixtures/mock-suite-sat-service/`) that implements the upload contract for NFT-SEC-01 / FT-P-17 / IT runs until D-PROJ-2 lands service-side. It is **not a component** in the architectural sense — the production architectural counterparty for both download and upload is the real `satellite-provider`. The fixture is retired the moment the real ingest endpoint ships.
+- **Test substitute**: `mock-suite-sat-service` is an e2e-test-only fixture (under `tests/fixtures/mock-suite-sat-service/`) that implements the upload contract for NFT-SEC-01 / FT-P-17 / IT runs until D-PROJ-2 lands service-side. It is **not a component** in the architectural sense — the production architectural counterparty for both download and upload is the real `satellite-provider`. The fixture is retired the moment the real ingest endpoint ships. (Download + route-seed integration tests on the Jetson harness already run against the real service as of cycle 3.)

 ---

@@ -713,4 +750,69 @@ Two facts surfaced during the Step 7 (Implement) batch loop that contradicted th
 - AZ-405 grows slightly: it now also owns the `replay_input/` coordinator (the natural home for the auto-sync logic + the time-offset application).
 - AZ-404 (E2E replay test) is unchanged in scope but reworded: it asserts mode-agnosticism (Invariant 1) and runs against the unified airborne image — no fourth-image entrypoint to verify.
 - C8 gains a thin `MavlinkTransport` Protocol seam introduced by AZ-400: `SerialMavlinkTransport` (live) and `NoopMavlinkTransport` (replay) implement it. This is a no-op restructure of the existing C8 transport code; the encoders are unchanged. The Protocol seam is the architectural mechanism for Invariant 5 (encoders are byte-identical).
- Demo↔field fidelity is now structurally guaranteed: the same binary runs in both contexts; any drift between them is a behavioural-test failure, not an SBOM-diff failure.
+- Demo↔field fidelity is now structurally guaranteed: the same binary runs in both contexts; any drift between them is a behavioural-test failure, not an SBOM-diff failure.
+
+### ADR-012 — Open-loop ESKF composition profile via `c4_pose.enabled = false` (AZ-776)
+
+**Context**: ADR-009 wires the C4 pose estimator and the C5 state estimator through a shared GTSAM iSAM2 substrate — C4 adds its PnP factor directly to C5's iSAM2 graph (ADR-003). The `c4_pose` slot in `runtime_root/airborne_bootstrap.py` lists `c5_isam2_graph_handle` as a required `pre_constructed` key (AZ-625), and the `OpenCVGtsamPoseEstimator` constructor consumes that handle. This wiring was sound for the steady-state GTSAM-iSAM2 build of C5.
+
+When C5 ships a second strategy, `eskf` (ESKF baseline, AZ-588), the substrate is **not** an iSAM2 graph: ESKF propagates an IMU-driven covariance forward in closed form, with no factor graph behind it. Its `create()` factory returns `(estimator, None)` for the second tuple element (the iSAM2 handle slot). Two facts surfaced from this:
+
+1. **`c4_pose` cannot be the gate.** C4 owns satellite-anchored pose estimation. ESKF runs satellite-free open-loop. Forcing `c4_pose` into the composition when no satellite anchoring is wired means C4 either crashes at construction (no iSAM2 handle) or, worse, gets a fake handle that pretends to anchor poses that nothing produces — a silent passthrough that violates the "Real Results, Not Simulated Ones" meta-rule.
+2. **The replay Tier-2 smoke profile needs an honest minimum.** The AZ-265 replay path's mandatory simple baseline is KLT/RANSAC VIO + ESKF state estimator without any satellite re-anchoring (AZ-777 will add the satellite path on top via the Derkachi C6 reference tile cache). Without an explicit composition profile that excludes C4, every Tier-2 test that wants to exercise the simple baseline either crashes at compose time or has to monkey-patch the registry — both are anti-patterns for an architectural seam.
+
+**Decision**:
+
+1. **`C4PoseConfig.enabled: bool = True` is the user-facing switch for the open-loop ESKF profile.** Default ON preserves the ADR-003 steady-state airborne path. Setting `enabled=False` instructs `compose_root` to remove `c4_pose` from the selection map before topological ordering — the wrapper never runs, the consumer never sees a handle, and the wiring stays honest.
+2. **`compose_root` enforces the C4↔C5 pairing matrix at compose time.** The validation gate lives in `_validate_c4_c5_composition_profile` (called from `compose_root` before `_compose`) and rejects the two off-diagonal cells of the 2×2 (`c4_pose.enabled`, `c5_state.strategy`) matrix with a `CompositionError` naming both blocks. The two valid combinations are:
+   - `c4_pose.enabled=True` + `c5_state.strategy="gtsam_isam2"` — the ADR-003 / ADR-009 steady-state airborne path.
+   - `c4_pose.enabled=False` + `c5_state.strategy="eskf"` — the open-loop ESKF profile (Tier-2 smoke baseline; satellite anchoring deferred to AZ-777).
+   The two **invalid** combinations are rejected with explicit error text:
+   - `enabled=False` + `gtsam_isam2` (an iSAM2 graph with no PnP anchors converges to drift-prone visual-only odometry; the production deployment intent is that gtsam_isam2 always coexists with C4).
+   - `enabled=True` + `eskf` (ESKF has no graph for C4 to anchor against; this is the AZ-776 root-cause pairing the user reported).
+3. **`build_pre_constructed` honours `c4_pose.enabled`.** When disabled, `c5_isam2_graph_handle` is **omitted** from the `pre_constructed` dict — the handle is a C4 consumer requirement, and removing C4 from the selection map removes the requirement. The ESKF estimator itself is still built and cached in the internal `_c5_prebuilt_estimator` slot (so the C5 wrapper short-circuits onto the prebuilt instance), but the iSAM2-shaped seam disappears from the cross-component contract.
+4. **Component selection is the only thing that changes.** The composition root's existing `_compose` mechanics — topological ordering, lazy strategy resolution, build-flag gating — are unchanged. The new `skip_slugs` parameter (a `frozenset[str]`) is the minimal seam that lets `compose_root` instruct `_compose` to drop the disabled component(s); there is no second composition path, no `compose_eskf` function, no mode-aware branch outside the validation gate.
+
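The 2×2 pairing matrix in Decision 2 and the `skip_slugs` seam in Decision 4 can be sketched as follows. This is an illustrative stand-in, not the code in `runtime_root`: the function names and the local `CompositionError` class are hypothetical, and the sketch rejects any strategy outside the two named in the matrix.

```python
class CompositionError(ValueError):
    """Stand-in for the composition root's compose-time error."""


# The only two valid (c4_pose.enabled, c5_state.strategy) pairings.
_VALID_PROFILES = {
    (True, "gtsam_isam2"),  # steady-state airborne path
    (False, "eskf"),        # open-loop ESKF profile
}


def validate_c4_c5_profile(c4_enabled, c5_strategy):
    """Raise CompositionError for any pairing outside the valid set."""
    if (c4_enabled, c5_strategy) not in _VALID_PROFILES:
        raise CompositionError(
            f"c4_pose.enabled={c4_enabled} is incompatible with "
            f"c5_state.strategy={c5_strategy!r}"
        )


def skip_slugs_for(c4_enabled):
    """Component slugs to drop from the selection map before ordering."""
    return frozenset() if c4_enabled else frozenset({"c4_pose"})
```

Keeping the valid pairings in one set makes the gate a lookup rather than a chain of mode-aware branches, and the error message names both configuration blocks.
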
+**Alternatives considered**:
+
+1. **Make `c4_pose` a "soft" dependency of C5 (introspect the strategy at C5 construction time, skip C4 wiring only when `strategy == "eskf"`).** Rejected: this leaks C5-strategy specifics into C4's interface (`PoseEstimator` would have to grow a "you may not be wired" affordance), violates ADR-009 interface-first, and re-introduces the very mode-aware branches Invariant 1 of the replay protocol forbids.
+2. **Make `compose_root` derive `c4_pose.enabled` automatically from `c5_state.strategy` (no user-facing flag).** Rejected: the C4↔C5 coupling is a deliberate design pairing, not a mechanical derivation. Future research strategies (e.g. a non-iSAM2 GTSAM variant, or a satellite-anchored ESKF) may want different combinations; the explicit flag keeps the configuration honest and audit-able.
+3. **Keep the wiring as-is and rely on the registry mechanism to skip C4.** Rejected: `C4PoseConfig` registers itself with the global config registry at module import (via `register_component_block` in `components/c4_pose/__init__.py`), which means even an empty `c4_pose:` block in YAML instantiates the block with defaults and pulls C4 into the selection map. The flag is the only honest opt-out without removing the registration call (which would break the steady-state path).
+4. **Build a synthetic `NullIsam2GraphHandle` that satisfies the Protocol but no-ops on update.** Rejected as the textbook example of the "Real Results, Not Simulated Ones" anti-pattern: it would let C4 run on top of ESKF with no anchoring, producing pose estimates that look real but have no factor-graph grounding. The composition-time gate is the honest answer.
+
+**Consequences**:
+
+- `tests/e2e/replay/conftest.py` writes `c4_pose: { enabled: false }` into the Tier-2 replay `config.yaml`, alongside the existing `c1_vio: klt_ransac` + `c5_state: eskf` block. This is the open-loop profile the replay binary uses for the AZ-265 / AZ-776 simple-baseline tests.
+- `tests/e2e/replay/test_derkachi_1min.py` un-xfails AC-1 (clean exit + per-frame JSONL), AC-2 (schema), AC-5 (determinism), AC-6 realtime, and AC-6 ASAP — these tests only required compose-time success to pass and AZ-776 lands that. AC-3 (≤ 100 m for ≥ 80 % of ticks) **remains** xfailed for AZ-777: ESKF integrates open-loop and drifts unbounded without C2/C3/C4 satellite re-anchoring; the ≤ 100 m threshold cannot be met by physics until the Derkachi C6 reference tile cache lands.
+- `_docs/02_document/contracts/replay/replay_protocol.md` gains a new "Open-loop ESKF composition profile" sub-section in **Composition root extension** plus a new **Invariant 13** ("C4↔C5 pairing matrix is enforced at compose time") that the AZ-776 unit tests own.
+- `_docs/02_document/components/06_c4_pose/description.md` gains an "Enabled flag" sub-section that points at this ADR; the rest of the component contract is unchanged.
+- The unit-test surface at `tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py` owns the seven invariants AZ-776 introduces: `C4PoseConfig.enabled` default-true, AC-1 (open-loop ESKF composes without C4), AC-2 (default GTSAM profile still includes C4), AC-3a + AC-3b (the two forbidden pairings raise `CompositionError`), and the two `pre_constructed` behaviours (`c5_isam2_graph_handle` omitted when C4 disabled, present when C4 enabled). The full suite passes in ~4 s.
+- The composition root's contract surface in `runtime_root/__init__.py` gains one public helper (`CompositionError` was already public; the new `skip_slugs` parameter to `_compose` is module-private). No public CLI flag is added — operators set `c4_pose.enabled = false` in YAML.
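
The compose-time gate described in this ADR can be sketched as follows. This is a minimal illustration, not the project's code: `PairingInput` and `resolve_skip_slugs` are hypothetical names, `CompositionError` is a local stand-in for the real class, and the sketch assumes only the two strategies named in the ADR need pairing checks.

```python
from dataclasses import dataclass


class CompositionError(Exception):
    """Local stand-in for the project's CompositionError."""


@dataclass(frozen=True)
class PairingInput:
    # Hypothetical flattened view of the two config values the gate reads.
    c4_enabled: bool
    c5_strategy: str  # "gtsam_isam2" or "eskf" in this sketch


def resolve_skip_slugs(cfg: PairingInput) -> frozenset[str]:
    """Reject the two forbidden pairings, then return the slugs to drop."""
    if cfg.c4_enabled and cfg.c5_strategy == "eskf":
        raise CompositionError(
            "c4_pose.enabled=true needs a factor graph; c5_state.strategy is eskf"
        )
    if not cfg.c4_enabled and cfg.c5_strategy == "gtsam_isam2":
        raise CompositionError(
            "c4_pose.enabled=false is only valid with c5_state.strategy=eskf"
        )
    # Only a disabled C4 produces a non-empty skip set.
    return frozenset() if cfg.c4_enabled else frozenset({"c4_pose"})
```

Returning a `frozenset` keeps the seam narrow: the caller hands the result to the existing composition mechanics unchanged, and an empty set reproduces the default profile.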
+
+### ADR-013 — gRPC server-streaming tile provision for operator pre-flight (AZ-976)
+
+**Context**: Operator-side cache build (C11/C12 ↔ `satellite-provider`) is off the hot airborne path but dominates time-to-ready when a corridor has thousands of tiles. The current REST shape (`POST /route` + poll + planned `POST /inventory` + N× `GET /tiles/{z}/{x}/{y}`) multiplies round-trips and cannot overlap "tiles already on SP disk" with "tiles still downloading from Google Maps". The inventory POST was specified in AZ-777 but never shipped in satellite-provider; Jetson smoke tests 404 on it today. Both codebases are owned by the same team (.NET satellite-provider, Python gps-denied operator tooling), so a typed streaming contract is feasible without a browser client.
+
+**Decision**:
+
+1. **We will add `satellite.v1.RouteTileDelivery.DeliverRouteTiles`** — unary request (`RouteSpec` + `client_tiles`), server-streaming `RouteTileEvent` (manifest → batches → progress → complete | error) — as the primary operator-side pre-flight transport (Epic AZ-976). Proto: `tile_provision.proto`; human contract: `tile_provision_grpc.md`.
+2. **The request carries `RouteSpec.route_id` (idempotent UUID) plus `ClientTileRecord[]`.** satellite-provider omits tiles when the client catalog already has equal-or-better resolution and equal-or-newer `captured_at` (lower m/px = better).
+3. **First stream event is `RouteManifest`** (`total_candidates`, `skipped_by_client`, `to_deliver`); then `TileBatch` messages with inline JPEGs. Server sends on-disk hits before externally fetched tiles (wire-agnostic ordering; `TilePayload.route_priority` hints along-route order).
+4. **ADR-004 boundary is preserved**: only C11/C12 on the operator workstation import gRPC stubs.
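
The skip rule in decision 2 can be expressed as a small predicate. The sketch below is illustrative only: `TileVersion` and `client_copy_is_sufficient` are hypothetical names, and the field names are assumptions rather than the fields of `ClientTileRecord` in `tile_provision.proto`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TileVersion:
    # Hypothetical record; field names are assumptions, not the proto schema.
    resolution_m_per_px: float
    captured_at: datetime


def client_copy_is_sufficient(client: TileVersion, server: TileVersion) -> bool:
    """True when the server may omit the tile from the stream.

    Lower metres-per-pixel means better resolution, so "equal-or-better"
    is a <= comparison; "equal-or-newer" is a >= comparison on capture time.
    """
    return (
        client.resolution_m_per_px <= server.resolution_m_per_px
        and client.captured_at >= server.captured_at
    )


if __name__ == "__main__":
    old = datetime(2025, 1, 1, tzinfo=timezone.utc)
    new = datetime(2026, 1, 1, tzinfo=timezone.utc)
    print(client_copy_is_sufficient(TileVersion(0.5, new), TileVersion(0.5, old)))
```

Both conditions must hold: a sharper but older client tile is still re-delivered under this reading of the rule.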
+
+**Alternatives considered**:
+
+| Alternative | Rejected because |
+|-------------|------------------|
+| REST `POST /inventory` + parallel GET | Never implemented in satellite-provider; still N+1 HTTP; no overlap of cached vs in-flight fetch |
+| SSE over HTTPS | Weaker typing; both sides are service binaries, not browsers — gRPC + protobuf is the better fit |
+| ZeroMQ between products | Poor fit across WAN/NAT; better kept **inside** satellite-provider's fetch workers |
+| In-flight streaming to UAV | Violates RESTRICT-SAT-1 / ADR-004; wrong reliability model for the aircraft |
+
+**Consequences**:
+
+- Epic AZ-976 decomposes: AZ-977 (SP gRPC server), AZ-978 (C11 client + C12 wiring), AZ-979 (Jetson benchmark + flip default).
+- REST `route_client` + `HttpTileDownloader` remain as fallback until AZ-979 benchmark promotes gRPC.
+- Finished C6 is still staged onto the Jetson via USB/rsync before flight — this ADR optimizes operator wait time, not in-air link dependency.
+
+**Evidence**: `_docs/02_document/contracts/c11_tilemanager/tile_provision.proto`, `tile_provision_grpc.md`, `_docs/02_tasks/todo/AZ-976_grpc_tile_provision_epic.md`.
@@ -0,0 +1,123 @@
+# Architecture Compliance Baseline
+
+> **Purpose.** Single canonical document against which every cumulative-review
+> report (per `.cursor/skills/code-review/SKILL.md` Phase 7 + the implement
+> skill's Step 14.5 cumulative review) computes its `## Baseline Delta` —
+> the count of **carried-over**, **resolved**, and **newly-introduced**
+> architecture violations. Without this file, cumulative reviews log
+> "baseline not found → no Baseline Delta section emitted" and structural
+> regressions are visible only pairwise per batch instead of cumulatively.
+
+**Baseline established**: 2026-05-26 (cycle-4 Step 10, batch 1, AZ-899)
+**Source-of-truth snapshot**: `_docs/06_metrics/structure_2026-05-20.md`
+**Initial violation count**: **0**
+**Cycle of last refresh**: 4
+
+## Source
+
+The "0 violations" claim is grounded in the structural facts captured by the
+cycle-1-close snapshot (`_docs/06_metrics/structure_2026-05-20.md`):
+
+| Fact | Value |
+|------|-------|
+| Inventory entries | 15 (14 production components C1–C13 + 1 cross-cutting `helpers/runtime_root` row) |
+| Import cycles in component graph | 0 (verified across batches 88–92 cumulative reviews; no back-edges) |
+| Contract files | 5 (`fdr_record_schema.md`, `fdr_client_protocol.md`, `log_record_schema.md`, the `shared_satellite_provider_ingest/` placeholder, `shared_flights_api/`) |
+| `_STRATEGY_REGISTRY` composition seam | `runtime_root.airborne_bootstrap` + `runtime_root.operator_bootstrap` (single composition root per binary, ADR-009) |
+| Layering rule | Layer-3 → Layer-4 imports **BANNED**; AZ-507 cross-component contract surface enforced by `tests/unit/test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies` lint |
+
+The architecture is documented in `_docs/02_document/architecture.md` (ADR-001
+monolith, ADR-002 build-time exclusion, ADR-009 interface-first DI,
+ADR-011 single-image live+replay). File ownership is documented in
+`_docs/02_document/module-layout.md`.
+
+## Violations
+
+*None at baseline.*
+
+This section is the append target for every cumulative-review run that
+detects an architecture finding (severity ≥ Medium, category =
+`Architecture`). The append schema is documented under § Update Protocol
+below.
+
+## Update Protocol
+
+### When a cumulative review finds a NEW architecture violation
+
+The reviewing skill (typically `.cursor/skills/code-review/SKILL.md` Phase 7,
+invoked from the implement skill's Step 14.5 cumulative review every K=3
+batches) MUST append a row to § Violations using this schema:
+
+| Field | Example |
+|-------|---------|
+| Finding ID | `arch-2026-06-15-1` (date + sequence within the day) |
+| Batch range | `batches 17–19 cycle 4` |
+| Severity | `High` / `Medium` (Critical findings escalate immediately; Low findings stay in the per-batch report) |
+| Subcategory | `import-cycle` / `cross-component-import` / `parallel-pipeline` / `layer-violation` / `seam-bypass` |
+| File:line | `src/gps_denied_onboard/components/c2_vpr/ultra_vpr.py:117` |
+| One-line summary | `c2_vpr imports c6_tile_cache directly, bypassing the consumer-side Protocol cut required by AZ-507` |
+| Cumulative-review report | `_docs/03_implementation/cumulative_review_batches_17-19_cycle4_report.md` |
+| Status | `OPEN` (newly introduced) |
+
+The append happens IN THIS FILE, not in the cumulative-review report. The
+cumulative-review report references this file's row by Finding ID.
+
+### When a violation is resolved
+
+Update the violating row in place: change `Status: OPEN` to
+`Status: RESOLVED in batch <N> cycle <M> via <commit-hash>`. Do NOT delete
+the row — the audit trail must show both the introduction and the
+resolution.
+
+### When the structural snapshot is refreshed
+
+Any cycle that materially changes structure — new component, new
+cross-component edge, new contract file, new composition root — re-snapshots
+to a fresh `_docs/06_metrics/structure_<YYYY-MM-DD>.md` (the cycle-end
+retrospective triggers this when the diff is non-trivial). When that
+happens:
+
+1. Update the `**Source-of-truth snapshot**` header pointer at the top of
+   this file to the new file.
+2. Update the `Cycle of last refresh` header to the cycle that produced the
+   new snapshot.
+3. Update the § Source table values (inventory-entry count, import-cycle
+   count, contract-file count) to match the new snapshot.
+4. Do NOT clear § Violations — open findings carry across snapshots.
+   Resolution status is per-finding, not per-snapshot.
+
+The refresh procedure is the same one that produced `structure_2026-05-20.md`
+(approach: count `src/gps_denied_onboard/components/*/` directories +
+`src/gps_denied_onboard/runtime_root/` + `helpers/`; run the AZ-270
+composition-root lint to detect cycles; enumerate
+`_docs/02_document/contracts/` subdirectories). If the script has been
+extracted into `tools/structure_snapshot.py` between cycles, use it;
+otherwise the manual approach is documented at the top of the source
+snapshot file.
+
+## Baseline Delta — how cumulative-review reports consume this file
+
+Every cumulative-review report MUST emit a `## Baseline Delta` section with
+three counts derived from this file:
+
+- **Carried-over**: count of rows that were already `Status: OPEN` (or
+  `Status: ACCEPTED-RISK`) at the start of this review's batch window and
+  whose status did not change during it.
+- **Resolved**: count of rows that transitioned from `OPEN` to
+  `RESOLVED in batch ...` during this review's batch window.
+- **Newly-introduced**: count of rows added during this review's batch
+  window.
+
+An empty Baseline Delta (`0 new, 0 resolved, 0 carried-over`) is still
+emitted — its presence confirms the cumulative-review consulted the
+baseline rather than silently skipping the section as in cycles 1–3.
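
As an illustration of how the three counts could be derived, the sketch below models each § Violations row with its status at the start of the batch window and its current status. `ViolationRow` and `baseline_delta` are hypothetical names; the file itself stores rows as markdown, so the parsing step is assumed and omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

UNRESOLVED = frozenset({"OPEN", "ACCEPTED-RISK"})


@dataclass(frozen=True)
class ViolationRow:
    finding_id: str
    # None means the row did not exist when the batch window opened.
    status_at_window_start: Optional[str]
    status_now: str


def baseline_delta(rows: Iterable[ViolationRow]) -> dict:
    """Count carried-over, resolved, and newly-introduced rows."""
    counts = {"carried_over": 0, "resolved": 0, "newly_introduced": 0}
    for row in rows:
        start, now = row.status_at_window_start, row.status_now
        if start is None:
            counts["newly_introduced"] += 1
        elif start == "OPEN" and now.startswith("RESOLVED"):
            counts["resolved"] += 1
        elif start in UNRESOLVED and now == start:
            counts["carried_over"] += 1
    return counts


if __name__ == "__main__":
    # An empty baseline still yields an explicit all-zero delta.
    print(baseline_delta([]))
```

Because the function always returns all three keys, a report built from it emits the section even when every count is zero.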
+
+## References
+
+- Cycle-3 retro § Top 3 Improvement Actions #3 — `_docs/06_metrics/retro_2026-05-26.md`
+- Cycle-1 retro § Top 3 Improvement Actions #3 (original) — `_docs/06_metrics/retro_2026-05-20.md`
+- Source snapshot — `_docs/06_metrics/structure_2026-05-20.md`
+- Existing-code flow Step 2 — `.cursor/skills/autodev/flows/existing-code.md` § "Step 2 — Architecture Baseline Scan"
+- Implement skill Step 14.5 — `.cursor/skills/implement/SKILL.md` § "Cumulative Code Review (every K batches)"
+- Architecture doc — `_docs/02_document/architecture.md`
+- Module-layout — `_docs/02_document/module-layout.md`
@@ -8,6 +8,8 @@

 **Cycle-1 operational reality**: the airborne binary wires C4 through `_STRATEGY_REGISTRY` + `register_airborne_strategies()` (AZ-591) with a single strategy slot (`opencv_gtsam` — `C4PoseConfig.KNOWN_POSE_STRATEGIES = {"opencv_gtsam"}`). Constructor injection flows through the `pre_constructed` dict passed to `compose_root(config, pre_constructed=...)` (AZ-618 umbrella → AZ-623 c5 helpers phase + AZ-625 eager iSAM2 handle phase). The `c4_pose` slot lists `("c282_ransac_filter", "c5_wgs_converter", "c5_se3_utils", "c5_isam2_graph_handle")` in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`; `c13_fdr` and `clock` are optional. The `c5_isam2_graph_handle` slot is the **shared GTSAM substrate seam** — `build_pre_constructed` eagerly invokes `build_state_estimator` once (AZ-625 / Phase E.5) so the (`StateEstimator`, `ISam2GraphHandle`) tuple is constructed BEFORE either the C4 or C5 wrapper runs (C4 runs first in topo order via `_C4_POSE_DEPENDS_ON = ("c1_vio", "c3_matcher")`, then C5 short-circuits on the prebuilt estimator via the internal `_c5_prebuilt_estimator` key). The cross-seam identity invariant (`c4_pose._isam2_handle is c5_state._isam2_handle`) is verified by AC-625.3. Missing required keys raise `AirborneBootstrapError` at composition time, naming the consumer and missing key.

+**Enabled flag (AZ-776 / ADR-012)**: `C4PoseConfig.enabled: bool = True` is the user-facing switch that controls C4's participation in the composition graph. Default ON preserves the ADR-003 steady-state airborne path. Setting `c4_pose.enabled = false` in YAML removes C4 from the component selection map at compose time — the wrapper never runs, the consumer never sees an iSAM2 handle, and `build_pre_constructed` omits `c5_isam2_graph_handle` from the `pre_constructed` dict. The flag exists to support the open-loop ESKF composition profile (the AZ-265 replay Tier-2 smoke baseline) where C5 runs as the `eskf` strategy with no factor graph for C4 to anchor against. `compose_root` enforces the 2×2 pairing matrix between `c4_pose.enabled` and `c5_state.strategy` at compose time and rejects the off-diagonal cells (`enabled=False` + `gtsam_isam2`, `enabled=True` + `eskf`) with a `CompositionError`. See ADR-012 (architecture.md) and Invariant 13 in `_docs/02_document/contracts/replay/replay_protocol.md`.
+
 **Upstream dependencies**:
 - C3.5 → `MatchResult` (refined or passthrough).
 - C5 StateEstimator — supplies the GTSAM iSAM2 handle so C4 can add its factor in-graph (architecture principle: shared substrate per ADR-003).
@@ -2,23 +2,32 @@

 ## 1. High-Level Overview

-**Purpose**: own the operator-side network I/O against `satellite-provider` for the onboard tile corpus, in **both directions**:
+**Purpose**: own the operator-side network I/O against `satellite-provider` for the onboard tile corpus, in **three directions**:

+- **Route seed** (pre-flight, F1, route-driven variant — Cycle 3 / Epic AZ-835): submit a tlog-derived `RouteSpec` (waypoints + per-waypoint coverage radius, produced by `replay_input.tlog_route.extract_route_from_tlog` — AZ-836) to `satellite-provider`'s Route API and poll until corridor tile materialisation completes. Lets the operator pre-commit the cache to where the drone actually flew rather than a bounding box.
 - **Download** (pre-flight, F1): fetch tiles from `satellite-provider` for the operational area, apply AC-NEW-6 freshness gating, and write into C6 (`TileStore` + `TileMetadataStore`). C11 is the **only** path that crosses the workstation/companion enclave to the parent suite for tile pixels — C10 reads from the populated C6 store and never touches `satellite-provider` itself.
 - **Upload** (post-landing, F10): read pending mid-flight tiles from C6 and POST to `satellite-provider`'s ingest endpoint (D-PROJ-2 contract sketch). C11 itself does NOT gate on flight state — it is a dumb pipe; the post-landing safety gate is owned by C12's `PostLandingUploadOrchestrator` (AZ-329 / Batch 44), which checks the C13 `flight_footer` FDR record for `clean_shutdown=True` before invoking `TileUploader.upload_pending_tiles`.

-C11 is a **separate operator-side binary / image**. The airborne companion image's CMake target deliberately excludes the entire `c11_tilemanager/` source tree so the airborne process cannot accidentally execute either the download path or the upload path even via reflection or config error (ADR-004 process-level isolation, AC-8.4). Both directions of tile I/O are operator-driven on the operator workstation; the companion only consumes the populated C6 store while airborne.
+C11 is a **separate operator-side binary / image**. The airborne companion image's CMake target deliberately excludes the entire `c11_tile_manager/` source tree so the airborne process cannot accidentally execute the seed path, the download path, or the upload path even via reflection or config error (ADR-004 process-level isolation, AC-8.4). All three directions of tile I/O are operator-driven on the operator workstation; the companion only consumes the populated C6 store while airborne.

-**Architectural Pattern**: Pipeline behind two interfaces (`TileDownloader`, `TileUploader`) under one component, consistent with C8's multi-interface shape (FC-AP, FC-iNav, GCS adapters under one component). The two interfaces are bundled into C11 because they share auth (TLS + service-internal API key for download, per-flight onboard signing key for upload), HTTP client, network configuration, deployment unit (operator-tooling tarball), and the airborne-exclusion property — splitting them into two components would duplicate all of that. They are kept as **two interfaces** so SRP is preserved at the call-site level: C12 binds `TileDownloader` for the F1 cache-build workflow, `TileUploader` for the F10 post-landing trigger; neither is forced to depend on the other.
+**Architectural Pattern**: Pipeline behind three interfaces (`SatelliteProviderRouteClient`, `TileDownloader`, `TileUploader`) under one component, consistent with C8's multi-interface shape (FC-AP, FC-iNav, GCS adapters under one component). The three interfaces are bundled into C11 because they share auth (JWT Bearer + optional TLS-insecure flag for dev self-signed certs across all three; the upload direction additionally signs each tile with the per-flight onboard signing key), HTTP client (`httpx`), network configuration, deployment unit (operator-tooling tarball), and the airborne-exclusion property — splitting them into separate components would duplicate all of that. They are kept as **three interfaces** so SRP is preserved at the call-site level: C12 binds `SatelliteProviderRouteClient.seed_route` to materialise the corridor cache from a tlog (cycle 3 e2e fixture today; planned C12 production path), `TileDownloader.download_tiles_for_area` for the F1 bbox-driven cache-build workflow, `TileUploader.upload_pending_tiles` for the F10 post-landing trigger; none is forced to depend on the others.

 **Cycle-1 operational reality**: C11 is **operator-workstation-only**, NOT an airborne strategy slot — there is no `c11_tile_manager` slot in `_AIRBORNE_REGISTRATIONS`, no row in `AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS`, and the airborne companion image's build target deliberately excludes the entire `c11_tile_manager/` source tree (ADR-004 process-level isolation; AC-8.4). The operator binary composes C11 via `runtime_root/c11_factory.py`, which exposes three tiny per-service factories — `build_per_flight_key_manager` (AZ-318), `build_tile_uploader` (AZ-319 + AZ-320), and `build_tile_downloader` (AZ-316) — each called explicitly by C12's CLI; no central registry. FDR wiring goes through the per-producer `make_fdr_client` cache: AZ-318 `PerFlightKeyManager` defaults to `make_fdr_client("c11_tile_manager.signing_key", config)`, AZ-319 `HttpTileUploader` to `make_fdr_client("c11_tile_manager.tile_uploader", config)` — both distinct from the airborne `"airborne_main"` producer, so the operator-workstation process gets its own per-component FdrClient instances rather than sharing the airborne singleton. AZ-320's `IdempotentRetryTileUploader` decorator wraps `HttpTileUploader` by default (per-call + per-tile bounded retry); `config.components['c11_tile_manager'].disable_retry_decorator = True` suppresses the wrap for low-level debugging or test wiring that needs to observe the inner uploader. The AZ-507 cross-component cut keeps C11 from importing C6 directly: `tile_store` / `tile_metadata_store` are passed in by the operator-binary composition root as consumer-side cuts; `http_client` (an `httpx.Client`) is also caller-owned so tests can swap in `httpx.MockTransport`. AZ-687 replay-mode guard does not apply — C11 has no airborne footprint.

+**Cycle-3 operational reality (AZ-777 Phase 1 + Epic AZ-835)**: the e2e harness now wires the e2e-runner against the **real** parent-suite `satellite-provider` .NET service in `docker-compose.test.jetson.yml` (lineage AZ-688 / AZ-691 / AZ-692; tier-1 `docker-compose.test.yml` deprecated 2026-05-20). Two consequences cascaded into C11:
+
+- **`TileDownloader` contract adaptation (AZ-777 Phase 1)** — `HttpTileDownloader._INVENTORY_PATH = "/api/satellite/tiles/inventory"` (POST, bulk lookup by (z,x,y)) and `HttpTileDownloader._TILES_PATH = "/tiles"` (GET, slippy-map fetch via `/tiles/{z}/{x}/{y}`). Previously documented as `GET /api/satellite/tiles?bbox=…&zoom=…`; the real `satellite-provider` API surface uses the inventory + slippy-map split per `tile-inventory.md` v1.0.0 (AZ-505). The bbox-driven `download_tiles_for_area` entry point and its `DownloadRequest` / `DownloadBatchReport` DTOs are unchanged at the call-site level; the contract adaptation is internal. Because the inventory response does not carry a `Content-Length` hint, AZ-308's pre-write budget check uses `_DEFAULT_ESTIMATED_TILE_BYTES = 50 000` (conservative over-reserve; typical 256×256 JPEG basemap tile is 8–80 KiB). Auth is `Authorization: Bearer ${SATELLITE_PROVIDER_API_KEY}`; the dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob accepts the self-signed dev cert.
+- **Third interface — `SatelliteProviderRouteClient` (AZ-838 / Epic AZ-835 C2)** — `seed_route(spec: RouteSpec) -> RouteSeedResult` POSTs the spec to `POST /api/satellite/route` (`requestMaps=true`, `createTilesZip=false`), polls `GET /api/satellite/route/{id}` until `mapsReady=true` (or a terminal-failure status), then verifies coverage via `POST /api/satellite/tiles/inventory`. Pre-emptively enforces AZ-809's `CreateRouteRequestValidator` bounds (`points` 2..500; `regionSizeMeters` 100..10 000; `zoomLevel` 0..22; lat/lon ranges) so obviously-bad input fails before the HTTP POST. Default cadence: `poll_interval_s = 5.0`, `poll_max_attempts = 60`, `request_timeout_s = 30.0`. Errors form a dedicated hierarchy (`RouteValidationError` 4xx + RFC 7807 ProblemDetails; `RouteTransientError` 5xx / network / timeout with `__cause__` set; `RouteTerminalFailureError` for non-success terminal status) rooted at `SatelliteProviderRouteError` — independent of `TileManagerError` because the Route API is a corridor-onboarding flow, not a per-tile transfer.
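
The pre-emptive bounds check on route requests might look like the sketch below. It is a hedged illustration: `validate_route_request` is a hypothetical helper, `RouteValidationError` is a local stand-in, and the latitude/longitude limits (±90, ±180) are assumed from WGS-84 convention because the text only says "lat/lon ranges".

```python
from typing import Sequence, Tuple


class RouteValidationError(ValueError):
    """Local stand-in; carries a field -> message dict like an RFC 7807 `errors` member."""

    def __init__(self, errors: dict):
        super().__init__(f"route request failed validation: {errors}")
        self.errors = errors


def validate_route_request(
    points: Sequence[Tuple[float, float]],
    region_size_meters: float,
    zoom_level: int,
) -> None:
    """Raise before any HTTP call if the request is out of bounds."""
    errors = {}
    if not 2 <= len(points) <= 500:
        errors["points"] = "must contain between 2 and 500 entries"
    for i, (lat, lon) in enumerate(points):
        if not -90.0 <= lat <= 90.0:  # assumed range
            errors[f"points[{i}].lat"] = "must be within [-90, 90]"
        if not -180.0 <= lon <= 180.0:  # assumed range
            errors[f"points[{i}].lon"] = "must be within [-180, 180]"
    if not 100 <= region_size_meters <= 10_000:
        errors["regionSizeMeters"] = "must be within [100, 10000]"
    if not 0 <= zoom_level <= 22:
        errors["zoomLevel"] = "must be within [0, 22]"
    if errors:
        raise RouteValidationError(errors)
```

Collecting every violation before raising lets the operator fix a bad spec in one pass instead of one field at a time.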
+
+The route-driven path is exercised today by `tests/e2e/replay/conftest.py::operator_pre_flight_setup` (AZ-839 — replaces the cycle-1 `mkdir` placeholder; yields a `PopulatedC6Cache` dataclass) and `tests/e2e/replay/test_az835_e2e_real_flight.py` (AZ-840 — single test that takes only `(tlog, video, calibration)` and runs the full 7-step pipeline). The C12 production CLI binding for the route path is a future-cycle integration; today's C12 still drives only `download_tiles_for_area` for production pre-flight cache builds.
+
 **Upstream dependencies**:

- C12 OperatorTooling → invokes `TileDownloader.download_tiles_for_area(...)` during F1 and `TileUploader.upload_pending_tiles(...)` post-landing.
+- C12 OperatorTooling → invokes `TileDownloader.download_tiles_for_area(...)` during F1 and `TileUploader.upload_pending_tiles(...)` post-landing. (Cycle-3 e2e fixtures also drive `SatelliteProviderRouteClient.seed_route(...)` for the route-driven F1 variant; C12 production binding for the route path is a future cycle.)
 - C6 TileStore + TileMetadataStore → write target during download (`source = googlemaps`); read source during upload (`source = onboard_ingest`, `voting_status = pending`).
+- `replay_input.tlog_route.RouteSpec` (AZ-836; `_types/route.py` canonical home per AZ-845) → input DTO to `SatelliteProviderRouteClient.seed_route`.
 - Operator workstation OS → invocation entry point (CLI / tray app, owned by C12).
- `satellite-provider` (external) → `GET /api/satellite/tiles?bbox=…&zoom=…` for download; `POST /api/satellite/tiles/ingest` for upload (D-PROJ-2 design task #1, **planned, not yet implemented service-side**).
+- `satellite-provider` (external) → for download: `POST /api/satellite/tiles/inventory` (bulk lookup by (z,x,y)) + `GET /tiles/{z}/{x}/{y}` (slippy-map fetch, per `tile-inventory.md` v1.0.0 / AZ-505); for route seeding: `POST /api/satellite/route` + `GET /api/satellite/route/{id}` (per `CreateRouteRequest.cs` DTO + AZ-809 validator); for upload: `POST /api/satellite/tiles/ingest` (D-PROJ-2 design task #1, **planned, not yet implemented service-side**).

 **Downstream consumers**:

@@ -27,6 +36,12 @@ C11 is a **separate operator-side binary / image**. The airborne companion image

 ## 2. Internal Interfaces

+### Interface: `SatelliteProviderRouteClient` (cycle 3 — AZ-838 / Epic AZ-835 C2)
+
+| Method | Input | Output | Async | Error Types |
+|--------|-------|--------|-------|-------------|
+| `seed_route` | `RouteSpec` (from `_types/route.py`; `name: str \| None` optional) | `RouteSeedResult` | No (poll loop; seconds–minutes) | `RouteValidationError`, `RouteTransientError`, `RouteTerminalFailureError` (all under `SatelliteProviderRouteError`) |
+
 ### Interface: `TileDownloader`

 | Method | Input | Output | Async | Error Types |
@@ -46,6 +61,21 @@ C11 no longer exposes `confirm_flight_state` — the post-landing flight-state g
 **Input/Output DTOs**:

 ```
+RouteSpec (cycle 3 — _types/route.py, produced by replay_input/tlog_route.py):
+  waypoints:                       tuple[tuple[float, float], ...]   # (lat, lon), 1..max_waypoints
+  suggested_region_size_meters:    float                              # per-waypoint coverage radius
+  source_tlog:                     Path                               # provenance
+  source_segment:                  tuple[int, int]                    # (start_idx, end_idx) into tlog GPS rows
+  total_distance_meters:           float                              # along-track distance of active segment
+
+RouteSeedResult (cycle 3 — c11_tile_manager.route_client):
+  route_id:                        uuid
+  terminal_status:                 string
+  maps_ready:                      bool
+  tile_count:                      int
+  elapsed_ms:                      int
+  submitted_payload_sha256:        string
+
 DownloadRequest:
  bbox:                       BoundingBox (lat_min, lon_min, lat_max, lon_max)
  zoom_levels:                list[int]
@@ -78,17 +108,25 @@ UploadBatchReport:

 ## 3. External API Specification

-C11 is a **client** of `satellite-provider`'s REST surface in both directions.
+C11 is a **client** of `satellite-provider`'s REST surface in three directions.

-### 3.1 Download — read path (existing `satellite-provider` API)
+### 3.1 Route seed — corridor materialisation (cycle 3 — AZ-838 / Epic AZ-835 C2)

 | Endpoint | Method | Auth | Rate Limit | Description |
 |----------|--------|------|------------|-------------|
-| `/api/satellite/tiles?bbox=…&zoom=…` | GET | TLS + service-internal API key | parent-suite enforces | Paged tile blobs + metadata for a bounding box at the given zoom level(s). |
+| `/api/satellite/route` | POST | JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) + optional dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` | parent-suite enforces | Submit a `RouteSpec` (waypoints + region size + zoom level). Body shape per `CreateRouteRequest.cs` / `RoutePoint.cs` (`lat` / `lon` JSON property names) / `GeoPoint.cs` DTOs. Query: `requestMaps=true&createTilesZip=false`. Validated pre-emptively against AZ-809 `CreateRouteRequestValidator` rules. |
+| `/api/satellite/route/{id}` | GET | same as above | parent-suite enforces | Poll route processing status. Returns `mapsReady: bool` + a `status` string. Terminal-success: `mapsReady=true`. Terminal-failure: `status ∈ {failed, error, rejected}`. Default cadence: 5 s × ≤ 60 attempts. |
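
The polling behaviour in the table above can be sketched with the HTTP call injected as a callable, which keeps the loop testable without a network. `poll_until_ready` and `fetch_status` are hypothetical names, `RouteTerminalFailureError` is a local stand-in, and raising the built-in `TimeoutError` when attempts run out is an assumption; the text does not say which error the real client raises in that case.

```python
import time
from typing import Any, Callable, Mapping

TERMINAL_FAILURE = frozenset({"failed", "error", "rejected"})


class RouteTerminalFailureError(RuntimeError):
    """Local stand-in; `.detail` carries the response JSON."""

    def __init__(self, detail: Mapping[str, Any]):
        super().__init__(f"route reached terminal failure: {detail.get('status')}")
        self.detail = detail


def poll_until_ready(
    fetch_status: Callable[[], Mapping[str, Any]],
    poll_interval_s: float = 5.0,
    poll_max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll until mapsReady is true, a terminal failure appears, or attempts run out."""
    for attempt in range(poll_max_attempts):
        body = fetch_status()
        if body.get("mapsReady") is True:
            return body
        if str(body.get("status", "")).lower() in TERMINAL_FAILURE:
            raise RouteTerminalFailureError(body)
        if attempt < poll_max_attempts - 1:
            sleep(poll_interval_s)  # no sleep after the final attempt
    raise TimeoutError(f"route not ready after {poll_max_attempts} attempts")
```

With the default cadence the loop gives up after roughly five minutes of waiting, plus the time spent in the status requests themselves.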

-C11 honours `Retry-After` on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Resolution below 0.5 m/px (RESTRICT-SAT-4) is rejected at the C11 boundary, not pushed downstream.
+### 3.2 Download — read path (`satellite-provider` v1.0.0 inventory contract — AZ-505 / AZ-777 Phase 1)

-### 3.2 Upload — write path (D-PROJ-2 contract sketch, **planned**)
+| Endpoint | Method | Auth | Rate Limit | Description |
+|----------|--------|------|------------|-------------|
+| `/api/satellite/tiles/inventory` | POST | JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) + optional dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` | parent-suite enforces | Bulk lookup of `(zoom, x, y)` slippy-map coords (≤ 5000 entries / request); body shape per `tile-inventory.md` v1.0.0. Response order matches request order; each entry carries `present: true|false` plus metadata when present (`resolutionMPerPx`, `producedAt`, …). |
+| `/tiles/{z}/{x}/{y}` | GET | same as above | parent-suite enforces | Slippy-map tile fetch by coordinates (binary JPEG response). Issued only for inventory entries with `present=true`. |
+
+C11 honours `Retry-After` on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Resolution below 0.5 m/px (RESTRICT-SAT-4) is rejected at the C11 boundary, not pushed downstream. Because the inventory response carries no `Content-Length` hint, AZ-308's pre-write budget check uses a conservative `_DEFAULT_ESTIMATED_TILE_BYTES = 50 000` per-tile reserve.
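
Two details of the download path lend themselves to a short sketch: splitting inventory lookups into requests of at most 5000 entries, and the per-tile reserve used by the budget check. The helper names below are hypothetical, and the inventory entries are modelled as plain dicts with a `present` key; this is an assumption about shape, not the `HttpTileDownloader` implementation.

```python
from typing import Iterator, Mapping, Sequence, Tuple

MAX_INVENTORY_ENTRIES = 5000          # per-request cap stated in the table above
DEFAULT_ESTIMATED_TILE_BYTES = 50_000  # per-tile reserve, as quoted in the text

TileCoord = Tuple[int, int, int]  # (zoom, x, y)


def inventory_batches(coords: Sequence[TileCoord]) -> Iterator[Sequence[TileCoord]]:
    """Yield request-sized slices, preserving order."""
    for start in range(0, len(coords), MAX_INVENTORY_ENTRIES):
        yield coords[start:start + MAX_INVENTORY_ENTRIES]


def estimated_download_bytes(inventory: Sequence[Mapping[str, object]]) -> int:
    """Reserve a fixed size for every entry the provider reports as present."""
    present = sum(1 for entry in inventory if entry.get("present") is True)
    return present * DEFAULT_ESTIMATED_TILE_BYTES


def fits_budget(inventory: Sequence[Mapping[str, object]], free_bytes: int) -> bool:
    """Pre-write check: fail before downloading if the reserve exceeds free space."""
    return estimated_download_bytes(inventory) <= free_bytes
```

Only entries with `present=true` count towards the reserve, since absent tiles are never fetched.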
+
+### 3.3 Upload — write path (D-PROJ-2 contract sketch, **planned**)

 | Endpoint | Method | Auth | Rate Limit | Description |
 |----------|--------|------|------------|-------------|
@@ -136,26 +174,28 @@ C11 reads from / writes to C6 (the local store) and reads from / writes to `sate

 **Algorithmic Complexity**:

+- Route seed: bounded by parent-suite tile materialisation latency (~seconds–minutes for the Derkachi corridor; gated by `poll_max_attempts × poll_interval_s`).
 - Download: linear in tile count; bandwidth-bound by the operator workstation's link to `satellite-provider`.
 - Upload: linear in pending tile count; bandwidth-bound; bursty post-landing.

-**State Management**: stateless except for the two journals.
+**State Management**: stateless except for the two journals (download / pending-upload). The route client is fully stateless — each `seed_route` call submits, polls, verifies, and returns.

 **Key Dependencies**:

 | Library | Version | Purpose |
 |---------|---------|---------|
-| httpx | per project pin | GET (download) + multipart POST (upload) to `satellite-provider` |
+| httpx | per project pin | POST inventory + GET slippy-map (download), POST route + GET status (route seed), multipart POST (upload) to `satellite-provider` |
 | atomicwrites | latest | Journal updates |
 | cryptography | per project pin | Per-flight signing key (upload payload signing); the production `satellite-provider` ingest endpoint and the e2e-test `mock-suite-sat-service` fixture both verify with the same key family |

 **Error Handling Strategy**:

- `SatelliteProviderError`: HTTP timeout / 5xx / TLS failure on either direction. Retry-with-backoff on 5xx; fail fast on TLS / auth. On download, surface to operator + takeoff blocked. On upload, leave tiles in the pending-upload journal and surface to operator. **Do not delete uploaded tiles from C6** until acknowledged.
+- `SatelliteProviderError`: HTTP timeout / 5xx / TLS failure on download / upload. Retry-with-backoff on 5xx; fail fast on TLS / auth. On download, surface to operator + takeoff blocked. On upload, leave tiles in the pending-upload journal and surface to operator. **Do not delete uploaded tiles from C6** until acknowledged.
 - `RateLimitedError` (429): obey `Retry-After`; the operator can also re-invoke later. Same handling either direction.
 - `FreshnessRejectionError` / `ResolutionRejectionError`: download-side only. Per AC-NEW-6 / RESTRICT-SAT-4 — never silently downgrade fresh-required tiles in `active_conflict` sectors. Surface counts in the `DownloadBatchReport`.
 - `CacheBudgetExceededError`: download-side only. Pre-flight free-space check against AC-8.3 (≤ 10 GB). Fail fast with explicit budget delta; no partial write.
 - `SignatureRejectedError`: upload-side only. Per-flight signing key was rejected by `satellite-provider`. This is a security-critical event — do NOT silently drop; surface to operator + log to FDR.
+- **Route-seed errors** (cycle 3, dedicated hierarchy under `SatelliteProviderRouteError`): `RouteValidationError` (4xx + RFC 7807 `errors` dict; raised pre-emptively for AZ-809 validator violations BEFORE the HTTP POST), `RouteTransientError` (5xx / network / timeout; carries `__cause__`), `RouteTerminalFailureError` (parent suite reports a non-success terminal status; `.detail` carries the response JSON). Separate hierarchy from `TileManagerError` because the route flow is corridor onboarding, not per-tile transfer.

 Post-landing safety: C11's upload path no longer gates on flight state internally. The check now lives in C12's `PostLandingUploadOrchestrator` (AZ-329 / Batch 44), which refuses to invoke `TileUploader.upload_pending_tiles` unless the C13 `flight_footer` FDR record records `clean_shutdown=True` for the target flight. ADR-004 process-level isolation remains the primary control — C11 should never run on the companion at all.

@@ -170,8 +210,10 @@ Post-landing safety: C11's upload path no longer gates on flight state internall
 **Known limitations**:

 - D-PROJ-2 ingest endpoint is NOT yet implemented service-side. Until parent-suite delivers the endpoint, C11 will fail every upload — the pending-upload journal accumulates. Operator workflow tolerates this.
- The e2e-test `mock-suite-sat-service` fixture implements only the planned POST contract (per the leftover file). Download integration tests run against the real `satellite-provider`. Production runs reach `satellite-provider` directly in both directions; the fixture is never on the production path.
- `TileDownloader` requires the operator workstation to have network reach to `satellite-provider` (the only path that crosses out of the workstation enclave). Pre-flight network configuration is an operator concern owned by C12; C11 fails fast if reachability is missing.
+- The e2e-test `mock-suite-sat-service` fixture implements only the planned POST upload contract (per the leftover file). Download + route-seed integration tests run against the real `satellite-provider` on the Jetson harness. Production runs reach `satellite-provider` directly on all three paths (download, route seed, upload); the fixture is never on the production path.
+- `TileDownloader` and `SatelliteProviderRouteClient` require the operator workstation to have network reach to `satellite-provider` (the only path that crosses out of the workstation enclave). Pre-flight network configuration is an operator concern owned by C12; C11 fails fast if reachability is missing.
+- **Imagery source license attribution (cycle 3 — AZ-777 Phase 2)**: the Jetson `satellite-provider` instance downloads from the **Google Maps** satellite layer (`lyrs=s`), governed by Google Maps Platform Terms of Service. Dev/research use only; the operator-side seed scripts (`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate the "Imagery © Google" attribution string. Production deployment requires either a Google Maps Platform licensing review or migration to a true CC-BY satellite source on the satellite-provider side (parent-suite ticket TBD; surfaced in `_docs/00_problem/input_data/flight_derkachi/README.md`).
+- **Dev TLS cert**: the e2e-runner today accepts the self-signed dev cert via `SATELLITE_PROVIDER_TLS_INSECURE=1`. Production deploys must validate against a CA-issued cert (`SATELLITE_PROVIDER_TLS_INSECURE=0`); the env knob is documented in `.env.test.example` + the smoke test + this section as **development-only**.

 **Potential race conditions**:

@@ -179,25 +221,28 @@ Post-landing safety: C11's upload path no longer gates on flight state internall

 **Performance bottlenecks**:

+- Route seed: parent-suite tile-materialisation latency dominates (corridor onboarding from Google Maps upstream). Bounded by `poll_max_attempts × poll_interval_s` (default 60 × 5 s = 5 min wall-clock ceiling).
 - Download: bandwidth-bound by the operator workstation's `satellite-provider` link; descriptor / engine work is downstream in C10 (offline, minutes).
 - Upload: bandwidth-bound. Per-flight upload volume is bounded by the F4 mid-flight tile gen cap (typically a few hundred tiles, each 50–200 KB → tens of MB per flight).

 ## 8. Dependency Graph

-**Must be implemented after**: C6 (read source for upload, write target for download), `satellite-provider` (download path; existing) + D-PROJ-2 endpoint (upload path; the e2e-test `mock-suite-sat-service` fixture covers tests until the real endpoint ships).
+**Must be implemented after**: C6 (read source for upload, write target for download), `satellite-provider` (download + route-seed paths; existing) + D-PROJ-2 endpoint (upload path; the e2e-test `mock-suite-sat-service` fixture covers tests until the real endpoint ships). `replay_input.tlog_route` (AZ-836) is a soft prerequisite for the route-seed path — the route client accepts any `RouteSpec` regardless of how it was produced, but the cycle-3 e2e fixture wires `extract_route_from_tlog` upstream.

 **Can be implemented in parallel with**: anything except C6 changes.

-**Blocks**: F1 (pre-flight cache build cannot start without `TileDownloader`), F10 (post-landing upload cannot start without `TileUploader`).
+**Blocks**: F1 (pre-flight cache build cannot start without `TileDownloader` or — for the route-driven variant — `SatelliteProviderRouteClient.seed_route`), F10 (post-landing upload cannot start without `TileUploader`).

 ## 9. Logging Strategy

 | Log Level | When | Example |
 |-----------|------|---------|
-| ERROR | `SignatureRejectedError`, persistent `SatelliteProviderError`, `CacheBudgetExceededError` | `C11 upload failure: signature rejected by satellite-provider` |
-| WARN | one-off network failure, scheduled retry, freshness-driven rejections (counts) | `C11 batch upload retry: batch_uuid=…; next_retry_in_s=30` |
-| INFO | session start/end; per-batch report (download + upload) | `C11 download complete: 87654 tiles, 12 stale-rejected; bbox=…` |
-| DEBUG | per-tile request/response | `C11 tile uploaded: tile_id=(z=18,lat=…,lon=…); status=queued` |
+| ERROR | `SignatureRejectedError`, persistent `SatelliteProviderError`, `CacheBudgetExceededError`, `RouteTerminalFailureError` | `C11 upload failure: signature rejected by satellite-provider`; `c11.route.poll.terminal kind=failed route_id=…` |
+| WARN | one-off network failure, scheduled retry, freshness-driven rejections (counts), `RouteTransientError` retries, `RouteValidationError` pre-flight rejections | `C11 batch upload retry: batch_uuid=…; next_retry_in_s=30`; `c11.route.validation_failed field=points reason=below_min(2)` |
+| INFO | session start/end; per-batch report (download + upload); route submit + each poll tick + inventory verify | `C11 download complete: 87654 tiles, 12 stale-rejected; bbox=…`; `c11.route.submit route_id=…`; `c11.route.poll.tick attempt=3 status=processing` |
+| DEBUG | per-tile request/response; per-tile inventory entries | `C11 tile uploaded: tile_id=(z=18,lat=…,lon=…); status=queued` |
+
+Cycle-3 route-client log kinds: `c11.route.submit`, `c11.route.poll.tick`, `c11.route.poll.terminal`, `c11.route.inventory`, `c11.route.validation_failed` (component `c11_tile_manager.route_client`).

 **Log format**: structured JSON.
 **Log storage**: operator workstation log file (e.g. `~/.azaion/onboard/c11-tilemanager.log`); also writes per-run summaries (download report, upload report) to the operator workstation cache root for audit. The companion's FDR is NOT involved (C11 doesn't run on the companion).
@@ -0,0 +1,126 @@
+# Contract: route_client
+
+**Component**: c11_tilemanager
+**Producer task**: AZ-838_satellite_provider_route_client (Epic AZ-835 C2)
+**Consumer tasks**: AZ-839 (`operator_pre_flight_setup` real fixture, Epic AZ-835 C3); AZ-840 (E2E orchestrator test, Epic AZ-835 C4); future C12 production binding (deferred — see § Non-Goals).
+**Version**: 1.0.0
+**Status**: stable
+**Last Updated**: 2026-05-26
+
+## Purpose
+
+The `SatelliteProviderRouteClient` is C11's operator-side **route-onboarding** interface. Given a `RouteSpec` (a coarsened, tlog-derived flight corridor produced by `replay_input.tlog_route.extract_route_from_tlog` — AZ-836), it registers the corridor with the parent-suite `satellite-provider` Route API, polls until materialisation completes, and verifies coverage via the inventory contract.
+
+The route-driven seeding flow lets the operator pre-commit the C6 cache to the precise corridor the drone actually flew rather than a coarse bounding box — typically ~100× more tile-efficient on long, narrow flights.
+
+C11 is operator-side ONLY; ADR-004 forbids the airborne companion image from importing this module.
+
+**Upstream API** (cycle 3 — AZ-838): `POST /api/satellite/route` (corridor onboarding; body shape per `CreateRouteRequest.cs` / `RoutePoint.cs` / `GeoPoint.cs` DTOs; query `requestMaps=true&createTilesZip=false`) + `GET /api/satellite/route/{id}` (status polling; terminal-success when `mapsReady=true`; terminal-failure when `status ∈ {failed, error, rejected}`) + `POST /api/satellite/tiles/inventory` (post-materialisation coverage verification, shared with `tile_downloader`). Authentication: `Authorization: Bearer ${SATELLITE_PROVIDER_API_KEY}`; the dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob accepts the self-signed dev cert.
+
+## Shape
+
+### Function / method API
+
+```python
+import uuid
+from gps_denied_onboard._types.route import RouteSpec   # AZ-845 canonical home
+
+class SatelliteProviderRouteClient:
+    def __init__(
+        self,
+        base_url: str,
+        jwt: str,
+        *,
+        tls_insecure: bool = False,
+        request_timeout_s: float = 30.0,
+        poll_interval_s: float = 5.0,
+        poll_max_attempts: int = 60,
+    ) -> None: ...
+
+    def seed_route(
+        self,
+        spec: RouteSpec,
+        *,
+        name: str | None = None,
+    ) -> RouteSeedResult: ...
+```
+
+| Name | Signature | Throws / Errors | Blocking? |
+|------|-----------|-----------------|-----------|
+| `seed_route` | `(spec: RouteSpec, *, name: str \| None = None) -> RouteSeedResult` | `RouteValidationError`, `RouteTransientError`, `RouteTerminalFailureError` (all under `SatelliteProviderRouteError`) | sync; poll loop bounded by `poll_max_attempts × poll_interval_s` (default 60 × 5 s = 5 min ceiling) |
+
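The bounded poll loop named in the table can be sketched as follows. This is a minimal illustration, not the C11 implementation: `fetch_status` is a hypothetical callable returning the decoded JSON body of `GET /api/satellite/route/{id}`, the two error classes are simplified stand-ins, and the injectable `sleep` exists only to keep the example testable. The status sets and defaults are taken from this contract.

```python
import time
from typing import Any, Callable

# Terminal-failure statuses as listed in this contract.
TERMINAL_FAILURE = frozenset({"failed", "error", "rejected"})


class RouteTransientError(RuntimeError):
    """Stand-in: poll budget exhausted (or transport failure)."""


class RouteTerminalFailureError(RuntimeError):
    """Stand-in: server reported a non-success terminal status."""

    def __init__(self, detail: dict[str, Any]) -> None:
        super().__init__(f"terminal failure: {detail.get('status')}")
        self.detail = detail


def poll_until_terminal(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    poll_interval_s: float = 5.0,
    poll_max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Return the first response with mapsReady=true, or raise."""
    last_status = None
    for attempt in range(1, poll_max_attempts + 1):
        body = fetch_status()
        last_status = body.get("status")
        if body.get("mapsReady") is True:
            return body  # terminal success
        if last_status in TERMINAL_FAILURE:
            raise RouteTerminalFailureError(body)
        # Any other status counts as "still processing".
        if attempt < poll_max_attempts:
            sleep(poll_interval_s)
    raise RouteTransientError(f"poll budget exhausted; last status={last_status!r}")
```

With the defaults, the loop issues at most 60 requests and sleeps 5 s between them, which is where the roughly 5 min ceiling comes from.
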
+### Data DTOs
+
+```python
+import uuid
+from dataclasses import dataclass
+from pathlib import Path
+
+@dataclass(frozen=True, slots=True)
+class RouteSpec:                                          # _types/route.py (AZ-845)
+    waypoints:                       tuple[tuple[float, float], ...]   # (lat, lon)
+    suggested_region_size_meters:    float                              # per-waypoint coverage radius
+    source_tlog:                     Path                               # provenance
+    source_segment:                  tuple[int, int]                    # (start_idx, end_idx) into tlog GPS rows
+    total_distance_meters:           float                              # along-track distance of active segment
+
+@dataclass(frozen=True, slots=True)
+class RouteSeedResult:                                    # c11_tile_manager/route_client.py
+    route_id:                         uuid.UUID
+    terminal_status:                  str                # e.g. "completed", "done", "succeeded"
+    maps_ready:                       bool               # True on terminal success
+    tile_count:                       int                # present=true entries from inventory verify
+    elapsed_ms:                       int                # POST → terminal-status wall time
+    submitted_payload_sha256:         str                # provenance for the inventory verify step
+```
+
+| Field | Type | Required | Description | Constraints |
+|-------|------|----------|-------------|-------------|
+| `RouteSpec.waypoints` | `tuple[tuple[float, float], ...]` | yes | Ordered list of (lat, lon) waypoints | `2 ≤ len(waypoints) ≤ 500` (AZ-809 validator); each `lat ∈ [-90, 90]`, `lon ∈ [-180, 180]` |
+| `RouteSpec.suggested_region_size_meters` | `float` | yes | Per-waypoint coverage radius | `100.0 ≤ value ≤ 10_000.0` (AZ-809 validator) |
+| `RouteSpec.source_tlog` | `Path` | yes | Provenance — which tlog produced this spec | filesystem path |
+| `RouteSeedResult.route_id` | `uuid.UUID` | yes | Server-assigned route id | non-zero |
+| `RouteSeedResult.terminal_status` | `str` | yes | Last status observed from `GET /api/satellite/route/{id}` | one of `{"completed", "failed", "error", "done", "succeeded", "rejected"}` |
+| `RouteSeedResult.maps_ready` | `bool` | yes | True iff parent suite reported `mapsReady=true` (terminal success) | Always True on a returned result; poll-budget exhaustion raises `RouteTransientError` (I-4) instead of returning False |
+| `RouteSeedResult.tile_count` | `int` | yes | Inventory `present=true` count over the route's enumerated coverage | ≥ 0 (lower bound — server may interpolate between waypoints) |
+
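The `RouteSpec` constraints in the table above can be checked before any HTTP call is made. The sketch below assumes a hypothetical helper `validate_route_spec` and a simplified stand-in for `RouteValidationError` (the real class and its constructor are not shown in this contract); only the numeric bounds come from the table, and the error keys are illustrative.

```python
class RouteValidationError(ValueError):
    """Stand-in for C11's error type; carries per-field rejection reasons."""

    def __init__(self, field_errors: dict[str, str]) -> None:
        super().__init__(f"route spec rejected: {field_errors}")
        self.field_errors = field_errors


# Bounds copied from the field table (AZ-809 validator).
MIN_POINTS, MAX_POINTS = 2, 500
MIN_REGION_M, MAX_REGION_M = 100.0, 10_000.0


def validate_route_spec(
    waypoints: tuple[tuple[float, float], ...],
    suggested_region_size_meters: float,
) -> None:
    """Raise RouteValidationError if any bound is violated; return None otherwise."""
    errors: dict[str, str] = {}
    if not MIN_POINTS <= len(waypoints) <= MAX_POINTS:
        errors["points"] = f"count {len(waypoints)} outside [{MIN_POINTS}, {MAX_POINTS}]"
    for i, (lat, lon) in enumerate(waypoints):
        if not -90.0 <= lat <= 90.0:
            errors[f"points[{i}].lat"] = f"{lat} outside [-90, 90]"
        if not -180.0 <= lon <= 180.0:
            errors[f"points[{i}].lon"] = f"{lon} outside [-180, 180]"
    if not MIN_REGION_M <= suggested_region_size_meters <= MAX_REGION_M:
        errors["regionSizeMeters"] = (
            f"{suggested_region_size_meters} outside [{MIN_REGION_M}, {MAX_REGION_M}]"
        )
    if errors:
        raise RouteValidationError(errors)
```

Collecting every violation into one dict, rather than raising on the first, mirrors the per-field shape of the server's RFC 7807 `errors` response, so a caller could render local and server-side rejections the same way.
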
+## Invariants
+
+- I-1: **Pre-emptive validation** rejects obviously-bad input as `RouteValidationError` BEFORE the HTTP POST. The client mirrors the AZ-809 `CreateRouteRequestValidator` bounds (`points` 2..500; `regionSizeMeters` 100..10 000; `zoomLevel` 0..22; lat/lon ranges; `name`/`description` max lengths). The list MUST stay in sync with `SatelliteProvider.Api/Validators/CreateRouteRequestValidator.cs` (parent suite source).
+- I-2: The client POSTs the wire shape exactly per `CreateRouteRequest.cs` + `RoutePoint.cs` + `GeoPoint.cs` (note: `RoutePoint` uses `lat` / `lon` JSON property names for both input and output; the input/output naming asymmetry flagged in AZ-809 AC-10 is a parent-suite concern, not a client adaptation).
+- I-3: Poll cadence MUST respect `poll_interval_s` (lower bound between successive `GET /api/satellite/route/{id}` calls) and `poll_max_attempts` (upper bound on attempt count). The client logs every poll tick at INFO with the observed status.
+- I-4: Terminal-success is exactly `mapsReady=true`. Terminal-failure is exactly `status ∈ {"failed", "error", "rejected"}`. Any other status is treated as "still processing" and triggers the next poll. If the poll budget is exhausted without terminal status, `RouteTransientError` is raised with the last observed status.
+- I-5: 4xx responses with RFC 7807 `ProblemDetails` → `RouteValidationError`; `field_errors` is populated from the `errors` dict so the caller can render per-field rejections.
+- I-6: 5xx / network / timeout → `RouteTransientError` with `__cause__` set to the underlying `httpx` exception. The retry semantics are caller-driven — the route client itself does NOT retry the POST, leaving the policy to the fixture / CLI (e.g., `tests/e2e/replay/conftest.py::operator_pre_flight_setup` retries up to 3 times using C11's `_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`).
+- I-7: The inventory verify step uses `POST /api/satellite/tiles/inventory` (≤ 5000 entries / request) and enumerates the route's tile coverage locally from `(waypoints, suggested_region_size_meters)` using the parent suite's web-Mercator math (`_EARTH_EQUATORIAL_CIRCUMFERENCE_M = 40 075 016.686`). The result is a **lower bound** on actual server coverage — the server may interpolate intermediate corridor tiles that the local enumeration misses; this is documented and acceptable as a sanity-check signal, not a coverage proof.
+
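The local tile enumeration behind I-7 rests on standard slippy-map (web-Mercator) tile arithmetic. A minimal sketch of that arithmetic follows; the function names are hypothetical, and treating the coverage around a waypoint as a square of tiles with a given half-extent is an assumption made for illustration, not the parent suite's exact corridor geometry.

```python
import math

EARTH_EQUATORIAL_CIRCUMFERENCE_M = 40_075_016.686


def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Standard slippy-map conversion from WGS84 degrees to tile (x, y)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so extreme latitudes/longitudes still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_width_m(lat: float, zoom: int) -> float:
    """Ground width of one tile at the given latitude and zoom."""
    return EARTH_EQUATORIAL_CIRCUMFERENCE_M * math.cos(math.radians(lat)) / 2 ** zoom


def tiles_around(
    lat: float, lon: float, zoom: int, half_extent_m: float
) -> set[tuple[int, int, int]]:
    """(zoom, x, y) keys in a square of tiles centred on one waypoint (assumed shape)."""
    n = 2 ** zoom
    span = math.ceil(half_extent_m / tile_width_m(lat, zoom))
    cx, cy = latlon_to_tile(lat, lon, zoom)
    return {
        (zoom, x, y)
        for x in range(max(cx - span, 0), min(cx + span, n - 1) + 1)
        for y in range(max(cy - span, 0), min(cy + span, n - 1) + 1)
    }
```

Taking the union of `tiles_around` over all waypoints gives only per-waypoint coverage, which is consistent with the invariant's point that local enumeration is a lower bound: tiles the server interpolates between waypoints would not appear in it.
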
+## Non-Goals
+
+- Not covered: producing the `RouteSpec` — owned by `replay_input.tlog_route.extract_route_from_tlog` (AZ-836).
+- Not covered: orchestration of when the operator runs the seed — owned by C12 (production binding deferred; cycle-3 e2e fixture `operator_pre_flight_setup` is the current driver — AZ-839).
+- Not covered: FAISS index construction over the populated cache — owned by C10 `DescriptorBatcher`.
+- Not covered: bbox-based seeding — handled by `tile_downloader.download_tiles_for_area` (and by `tests/fixtures/derkachi_c6/seed_region.py` for the e2e fixture).
+- Not covered: multi-route batching — one `RouteSpec` per `seed_route` call. Multi-flight aggregate corridors are an operator-workflow concern.
+
+## Versioning Rules
+
+- **Breaking changes** (renamed method, removed required field, changed return type, parent-suite Route API contract break) require a major version bump. Coordinate with the C3 fixture (AZ-839) and any future C12 production binding via Choose A/B/C/D before bumping.
+- **Non-breaking additions** (new optional constructor kwarg, new field on `RouteSeedResult`, new error variant the consumer catches via `SatelliteProviderRouteError`) require a minor version bump.
+- The pre-emptive validation bounds (I-1) MUST track the parent-suite `CreateRouteRequestValidator.cs` exactly. Drift between client and server validators is a defect, not a version concern — fix the client to match the server.
+
+## Test Cases
+
+| Case | Input | Expected | Notes |
+|------|-------|----------|-------|
+| route-happy-path | `RouteSpec` for Derkachi tlog (2-waypoint corridor, region_size=500m) against a stubbed `satellite-provider` returning `mapsReady=true` on the 2nd poll | `RouteSeedResult` with `maps_ready=True`, `tile_count > 0`, `terminal_status="completed"`, `elapsed_ms` reflects 2 polls | AZ-838 AC-1, AC-2 |
+| validation-empty-points | `RouteSpec(waypoints=(), …)` | `RouteValidationError` raised BEFORE HTTP POST | I-1, AZ-838 AC-6 |
+| validation-too-many-points | `RouteSpec` with 501 waypoints | `RouteValidationError` raised BEFORE HTTP POST | I-1, AZ-838 AC-6 |
+| validation-region-too-large | `RouteSpec(suggested_region_size_meters=10_001.0, …)` | `RouteValidationError` raised BEFORE HTTP POST | I-1, AZ-838 AC-6 |
+| 4xx-problem-details | server returns 400 + RFC 7807 `errors` dict | `RouteValidationError` with `field_errors` populated from the response | I-5, AZ-838 AC-3 |
+| 5xx-transient | server returns 503 | `RouteTransientError` with `__cause__` set to the underlying `httpx` exception | I-6, AZ-838 AC-4 |
+| terminal-failure | server reports `status="failed"` mid-poll | `RouteTerminalFailureError`; `.detail` carries the response JSON | I-4, AZ-838 AC-5 |
+| poll-budget-exhausted | server stays in `status="processing"` past 60 attempts | `RouteTransientError` referencing the last observed status | I-3, I-4 |
+| inventory-verify-counts-present | `mapsReady=true` then inventory POST returns mixed `present=true/false` entries | `tile_count` equals the count of `present=true` entries | I-7 |
+| integration-derkachi | `RouteSpec` from real Derkachi tlog, against the Jetson `satellite-provider` (gated by `RUN_E2E=1` + `SATELLITE_PROVIDER_URL`) | `tile_count > 0`, `maps_ready=True`, completes in ≤ 15 s on the 2-waypoint reference route | AZ-838 AC-10 (Jetson-only, Tier-2) |
+
+## Change Log
+
+| Version | Date | Change | Author |
+|---------|------|--------|--------|
+| 1.0.0 | 2026-05-26 | Initial contract — produced by AZ-838 (Epic AZ-835 C2). Cycle-3 addition; consumed by AZ-839 (`operator_pre_flight_setup` real fixture) and AZ-840 (E2E orchestrator test). | autodev |
@@ -1,18 +1,20 @@
 # Contract: tile_downloader

 **Component**: c11_tilemanager
-**Producer task**: AZ-316_c11_tile_downloader
+**Producer task**: AZ-316_c11_tile_downloader (initial), AZ-777 Phase 1 (cycle-3 inventory-contract adaptation)
 **Consumer tasks**: AZ-253 (E-C12 Operator Pre-flight Tooling — TBD at C12 decompose time)
-**Version**: 1.0.0
-**Status**: draft
-**Last Updated**: 2026-05-10
+**Version**: 1.1.0
+**Status**: stable
+**Last Updated**: 2026-05-26

 ## Purpose

-The `TileDownloader` Protocol is C11's operator-side download interface. C12 invokes it during F1 (pre-flight cache build) to fetch satellite tiles from the parent suite's `satellite-provider` GET surface, apply RESTRICT-SAT-4 resolution gating at the C11 boundary, and write accepted tiles into C6. Freshness rejections surfacing from C6 (AZ-307) are counted and surfaced in the report.
+The `TileDownloader` Protocol is C11's operator-side download interface. C12 invokes it during F1 (pre-flight cache build) to fetch satellite tiles from the parent suite's `satellite-provider` inventory + slippy-map surface, apply RESTRICT-SAT-4 resolution gating at the C11 boundary, and write accepted tiles into C6. Freshness rejections surfacing from C6 (AZ-307) are counted and surfaced in the report.

 C11 is operator-side ONLY; ADR-004 forbids the airborne companion image from importing this module.

+**Upstream API (cycle 3 — AZ-777 Phase 1)**: against the real parent-suite `satellite-provider` v1.0.0 inventory contract — `POST /api/satellite/tiles/inventory` (bulk lookup by `(zoom, x, y)`, ≤ 5000 entries / request, per `tile-inventory.md` v1.0.0 / AZ-505) + `GET /tiles/{z}/{x}/{y}` (slippy-map JPEG fetch, issued only for inventory entries with `present=true`). Authentication: `Authorization: Bearer ${SATELLITE_PROVIDER_API_KEY}`; the dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob accepts the self-signed dev cert (production must validate against a CA-issued cert). Because the inventory response carries no `Content-Length` hint, AZ-308's pre-write budget pre-check uses a conservative `_DEFAULT_ESTIMATED_TILE_BYTES = 50 000` per-tile reserve.
+
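Because the inventory endpoint accepts at most 5000 entries per request, a large area has to be split across several POSTs, and only entries reported as `present=true` go on to the slippy-map GET. A minimal sketch of that batching, assuming hypothetical helper names (`chunk_inventory`, `tile_url`) and leaving the HTTP calls themselves out:

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_INVENTORY_ENTRIES = 5000  # per-request cap from the inventory contract

TileKey = tuple[int, int, int]  # (zoom, x, y)


def chunk_inventory(
    keys: Iterable[TileKey], size: int = MAX_INVENTORY_ENTRIES
) -> Iterator[list[TileKey]]:
    """Yield successive lists of at most `size` tile keys."""
    it = iter(keys)
    while chunk := list(islice(it, size)):
        yield chunk


def tile_url(base_url: str, key: TileKey) -> str:
    """Build the slippy-map URL for one tile key."""
    z, x, y = key
    return f"{base_url.rstrip('/')}/tiles/{z}/{x}/{y}"
```

For example, 12 001 candidate keys would go out as three inventory requests of 5000, 5000 and 2001 entries.
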
 ## Shape

 ### Function / method API
@@ -79,7 +81,7 @@ class TileSummary:
 - I-1: `tiles_downloaded + tiles_rejected_resolution + tiles_rejected_freshness == sum of attempted tiles`. The report accounts for every tile the downloader attempted; no silent drops.
 - I-2: A re-run of `download_tiles_for_area` for the same `(bbox, zoom_levels, sector_class, flight_id)` after a successful prior run is idempotent: `outcome = idempotent_no_op` and no GETs are issued. Idempotence is enforced by C11's download-progress journal under `cache_root/.c11/journal/`.
 - I-3: Every accepted tile passes BOTH the C11 resolution gate (≥ 0.5 m/px per RESTRICT-SAT-4) AND the C6 freshness gate (AZ-307). A tile that fails either is excluded from `tiles_downloaded`.
- I-4: TLS + service-internal API key authenticate the GET; auth failure surfaces as `SatelliteProviderError` and aborts the run with `outcome = failure`. The downloader does NOT fall back to plaintext or unauthenticated requests.
+- I-4: JWT Bearer authentication (`SATELLITE_PROVIDER_API_KEY`) over TLS authenticates the inventory POST and the slippy-map GET; auth failure surfaces as `SatelliteProviderError` and aborts the run with `outcome = failure`. The downloader does NOT fall back to plaintext or unauthenticated requests. `SATELLITE_PROVIDER_TLS_INSECURE=1` is a dev-only knob for self-signed certs; production must run with it unset.
 - I-5: The downloader writes via the AZ-303 `TileStore`/`TileMetadataStore` Protocols; it does NOT touch C6's filesystem layout directly.
 - I-6: A `CacheBudgetExceededError` aborts pre-write with no partial write and `outcome = failure`. The C6 cache budget enforcer (AZ-308) drives the headroom check.

@@ -112,4 +114,5 @@ class TileSummary:

 | Version | Date | Change | Author |
 |---------|------|--------|--------|
+| 1.1.0 | 2026-05-26 | Internal upstream contract adapted to `satellite-provider` v1.0.0 inventory contract (AZ-777 Phase 1): `POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` replace the previous `GET /api/satellite/tiles?bbox=…&zoom=…` shape. `download_tiles_for_area` / `DownloadRequest` / `DownloadBatchReport` surface UNCHANGED — non-breaking minor bump. Auth tightened to JWT Bearer over TLS. Status moved draft → stable. | autodev |
 | 1.0.0 | 2026-05-10 | Initial contract — produced by AZ-316 (E-C11 decomposition) | autodev |
@@ -0,0 +1,95 @@
+syntax = "proto3";
+
+package satellite.v1;
+
+import "google/protobuf/timestamp.proto";
+
+option csharp_namespace = "Satellite.V1";
+
+service RouteTileDelivery {
+  rpc DeliverRouteTiles(DeliverRouteTilesRequest) returns (stream RouteTileEvent);
+}
+
+message DeliverRouteTilesRequest {
+  RouteSpec route = 1;
+  repeated ClientTileRecord client_tiles = 2;
+}
+
+message RouteSpec {
+  string route_id = 1;
+  repeated Waypoint waypoints = 2;
+  double region_size_meters = 3;
+  int32 zoom = 4;
+  repeated GeofencePolygon geofences = 5;
+  bool include_geofence_tiles = 6;
+}
+
+message Waypoint {
+  double lat = 1;
+  double lon = 2;
+}
+
+message GeofencePolygon {
+  repeated Waypoint vertices = 1;
+}
+
+message ClientTileRecord {
+  int32 z = 1;
+  int32 x = 2;
+  int32 y = 3;
+  double resolution_m_per_px = 4;
+  google.protobuf.Timestamp captured_at = 5;
+  optional string source = 6;
+  bytes content_sha256 = 7;
+}
+
+message RouteTileEvent {
+  oneof payload {
+    RouteManifest manifest = 1;
+    TileBatch batch = 2;
+    ProgressUpdate progress = 3;
+    DeliveryComplete complete = 4;
+    DeliveryError error = 5;
+  }
+}
+
+message RouteManifest {
+  uint32 total_candidates = 1;
+  uint32 skipped_by_client = 2;
+  uint32 to_deliver = 3;
+}
+
+message TileBatch {
+  uint32 batch_seq = 1;
+  repeated TilePayload tiles = 2;
+}
+
+message TilePayload {
+  int32 z = 1;
+  int32 x = 2;
+  int32 y = 3;
+  double resolution_m_per_px = 4;
+  google.protobuf.Timestamp captured_at = 5;
+  string source = 6;
+  bytes jpeg = 7;
+  bytes content_sha256 = 8;
+  uint32 route_priority = 9;
+}
+
+message ProgressUpdate {
+  uint32 delivered = 1;
+  uint32 total = 2;
+  uint32 downloading = 3;
+}
+
+message DeliveryComplete {
+  uint32 delivered = 1;
+  uint32 skipped_client = 2;
+  uint32 skipped_server_filter = 3;
+}
+
+message DeliveryError {
+  string code = 1;
+  string message = 2;
+  bool retryable = 3;
+}
@@ -0,0 +1,143 @@
+# Contract: RouteTileDelivery (gRPC)
+
+**Component**: c11_tilemanager (consumer), satellite-provider (producer)
+**Epic**: AZ-976
+**ADR**: ADR-013 (architecture.md)
+**Proto**: `tile_provision.proto` — `package satellite.v1`
+**Version**: 0.3.0
+**Status**: proposed
+**Last Updated**: 2026-06-19
+
+## Purpose
+
+Operator-side **pre-flight cache provisioning**. Client sends route + onboard tile catalog once; server streams `RouteTileEvent` messages until `DeliveryComplete` or `DeliveryError`.
+
+satellite-provider does **not** receive `flight_id` — that is a C6 bookkeeping concern on the gps-denied side only (`route_id` is the wire correlation id).
+
+C11/C12 on the **operator workstation** only. ADR-004: airborne image must not import stubs or open this channel.
+
+## RPC
+
+```protobuf
+service RouteTileDelivery {
+  rpc DeliverRouteTiles(DeliverRouteTilesRequest) returns (stream RouteTileEvent);
+}
+```
+
+| Concern | Rule |
+|---------|------|
+| Auth | gRPC metadata `authorization: Bearer <JWT>` |
+| TLS | Required in production; `SATELLITE_PROVIDER_TLS_INSECURE=1` dev knob |
+| Idempotency | `RouteSpec.route_id` (UUID string) |
+| Resume | Client persists last acked `batch_seq` per `route_id` locally (not on wire) |
+
+## Request
+
+### `DeliverRouteTilesRequest`
+
+| Field | Description |
+|-------|-------------|
+| `route` | Corridor geometry + single zoom |
+| `client_tiles` | Onboard inventory snapshot (route intersection only) |
+
+### `RouteSpec`
+
+| Field | Maps from gps-denied |
+|-------|----------------------|
+| `route_id` | Client-generated UUID per provision job |
+| `waypoints` | `replay_input.tlog_route.RouteSpec.waypoints` |
+| `region_size_meters` | `RouteSpec.suggested_region_size_meters` |
+| `zoom` | Single slippy zoom level (confirmed sufficient) |
+| `geofences` | Optional inclusion polygons |
+| `include_geofence_tiles` | Union geofence tiles with corridor grid |
+
+### `ClientTileRecord`
+
+Canonical key: **`(z, x, y)`**. `source` is informational only — **not** used in skip logic.
+
+| Field | C6 mapping |
+|-------|------------|
+| `resolution_m_per_px` | RESTRICT-SAT-4 (lower = better) |
+| `captured_at` | `TileMetadata.capture_timestamp` |
+| `content_sha256` | `TileMetadata.content_sha256_hex` (hex-decoded; sent on the wire as the raw 32-byte digest) |
+
+## Server skip rule (client catalog)
+
+For each server candidate tile, **omit from stream** when `client_tiles` has matching `(z,x,y)` and **any** of:
+
+1. `client.content_sha256` is non-empty and **equals** server payload hash → skip (byte-identical)
+2. `client.resolution_m_per_px <= server.resolution_m_per_px` **and** `client.captured_at >= server.captured_at` → skip (metadata-sufficient)
+
+`source` is **not** compared.
+
+`RouteManifest.skipped_by_client` counts tiles removed by this rule.
+
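The skip rule above reduces to a small predicate over one client record and one server candidate that already share the same `(z, x, y)`. A sketch, assuming a hypothetical `TileMeta` view holding just the three fields the rule reads (it is not a type from the proto or from C6):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TileMeta:
    """Hypothetical view of the fields the skip rule reads for one (z, x, y)."""

    resolution_m_per_px: float
    captured_at: datetime
    content_sha256: bytes = b""


def should_skip(client: TileMeta, server: TileMeta) -> bool:
    """True when the server may omit the tile; assumes (z, x, y) already matched."""
    # Rule 1: non-empty client hash equal to the server payload hash.
    if client.content_sha256 and client.content_sha256 == server.content_sha256:
        return True
    # Rule 2: client copy is at least as sharp and at least as recent.
    return (
        client.resolution_m_per_px <= server.resolution_m_per_px
        and client.captured_at >= server.captured_at
    )


# A sharper, newer client tile is skipped even without a hash.
old = datetime(2026, 1, 1, tzinfo=timezone.utc)
new = datetime(2026, 6, 1, tzinfo=timezone.utc)
skip_example = should_skip(TileMeta(0.3, new), TileMeta(0.5, old))
```

Note that `source` does not appear in `TileMeta` at all, matching the rule that it is never compared.
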
+## Sector — not on this wire
+
+**Sector** (`active_conflict` vs `stable_rear`) controls **how stale a tile may be before C6 rejects it on write** (AC-NEW-6 freshness). It is an operator decision about the geographic area, not something satellite-provider needs to deliver tiles.
+
+| Layer | Who applies sector |
+|-------|-------------------|
+| satellite-provider | Does not need sector — streams tiles by route geometry |
+| C11 client write | Reads sector from **C11/C12 config** (same as today) when calling C6 freshness gate |
+
+No `SectorClass` field on the gRPC request.
+
+## Response stream: `RouteTileEvent`
+
+Typical sequence:
+
+1. **`RouteManifest`** — `total_candidates`, `skipped_by_client`, `to_deliver`
+2. **`TileBatch`** — monotonic `batch_seq`; on-disk hits first, then freshly fetched
+3. **`ProgressUpdate`** — optional
+4. **`DeliveryComplete`** or **`DeliveryError`**
+
+### `DeliveryComplete` counters
+
+| Field | Meaning |
+|-------|---------|
+| `delivered` | Tiles actually sent in `TileBatch` messages |
+| `skipped_client` | Same as manifest `skipped_by_client` (echo for client verify) |
+| `skipped_server_filter` | Tiles SP required but **did not send** after client dedup — see below |
+
+#### `skipped_server_filter` — what counts
+
+Tiles that entered the post-client-dedup work queue but never appeared in a batch:
+
+| Reason | Example |
+|--------|---------|
+| **Fetch failed** | External imagery provider 404/timeout after retries |
+| **Below SP min resolution** | SP refuses to store/serve below its configured floor |
+| **Geometry clip** | Tile dropped after server-side corridor/geofence validation |
+| **Operational cap** | Job hit max-tiles / rate limit (if SP enforces) |
+
+Tiles skipped by the **client catalog rule** are **not** included here (they are `skipped_client`).
+
+If SP has no server-side filters in v1, `skipped_server_filter` may be **0**; the field is reserved for observability.
+
+### `TilePayload`
+
+| Field | Notes |
+|-------|-------|
+| `content_sha256` | 32-byte SHA-256 of `jpeg`; matches C6 DB invariant |
+| `route_priority` | Lower = earlier along route |
+
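Since `content_sha256` is defined as the raw 32-byte SHA-256 of `jpeg`, a client can verify each payload before writing it. A minimal sketch using only `hashlib`; the helper names are hypothetical, and the hex conversion assumes C6 stores the digest as a lowercase hex string in `content_sha256_hex`:

```python
import hashlib


def payload_hash_ok(jpeg: bytes, content_sha256: bytes) -> bool:
    """Check that content_sha256 is the raw 32-byte SHA-256 digest of jpeg."""
    return len(content_sha256) == 32 and hashlib.sha256(jpeg).digest() == content_sha256


def to_c6_hex(content_sha256: bytes) -> str:
    """Convert the wire digest to a hex string (assumed C6 storage form)."""
    return content_sha256.hex()
```

Rejecting anything that is not exactly 32 bytes also catches the easy mistake of sending the 64-character hex string in the `bytes` field.
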
+## Client write path (gps-denied)
+
+`RouteTileDeliveryClient` (C11):
+
+- Assigns C6 `flight_id` from operator context locally (not from SP)
+- Applies RESTRICT-SAT-4, **sector-based freshness**, AZ-308 budget, download journal
+- Resumes via persisted `route_id` + `batch_seq`
+
+## Migration
+
+REST `route_client` + `HttpTileDownloader` remain fallback until AZ-979 benchmark.
+
+## Change log
+
+| Version | Date | Change |
+|---------|------|--------|
+| 0.3.0 | 2026-06-19 | `ClientTileRecord.content_sha256`; sequential field nums on `TilePayload`; sector/flight_id off wire; skip rule + `skipped_server_filter` defined |
+| 0.2.0 | 2026-06-19 | `satellite.v1.RouteTileDelivery` + `RouteTileEvent` oneof |
+| 0.1.0 | 2026-06-19 | Initial draft (superseded) |
@@ -127,6 +127,12 @@ Lives at `src/gps_denied_onboard/runtime_root/vio_factory.py`. Selects the strat

 - **6×6 SPD covariance always returned**: `pose_covariance_6x6` is symmetric and positive-definite for every `VioOutput`. Implementations MUST NOT return a "tightened" covariance (smaller Frobenius norm) during a degradation event; honest covariance is the safety floor for AC-NEW-4 and AC-NEW-7. A test (covariance-monotonicity contract test, deferred to Step 9 / E-BBT) asserts this across all three strategies.
 - **`frame_id` echo**: `VioOutput.frame_id` equals the input `NavCameraFrame.frame_id`. C5 relies on this for time-aligned factor insertion.
+- **`relative_pose_T.translation()` is in metres** (NOT pixels, NOT unit-length). Every strategy MUST emit metric translation; C5 fuses it directly into the state estimator without further scaling. Monocular strategies (KLT/RANSAC) recover scale through an injected `AltitudeProvider` (see AZ-919/AZ-920); stereo / VIO strategies (OKVIS2, VINS-Mono) get scale from their backend optimization.
+- **`scale_quality` carries the per-frame degraded-mode signal** (AZ-921). Three values:
+  - `"metric"` — translation is in metres, fully trustworthy. ESKF consumes `pose_covariance_6x6` as-is.
+  - `"direction_only"` — translation direction is informative but magnitude is not (near-vertical motion in a nadir camera; banked turn). ESKF overrides `R_meas[0:3, 0:3]` to `_DIRECTION_ONLY_TRANSLATION_SIGMA_M² = 64 m²` so the rotation update is honoured and the position update contributes little.
+  - `"unknown"` — translation is not trustworthy at all (AGL missing, zero inlier flow, hover, stationary). ESKF overrides `R_meas[0:3, 0:3]` to `_UNKNOWN_TRANSLATION_SIGMA_M² = 1e6 m²` so the position update is effectively skipped while the rotation update remains active.
+  Default `"unknown"` on the `VioOutput` dataclass keeps legacy strategies bug-for-bug compatible until they opt in to the AZ-919 `AltitudeProvider` plumbing.
 - **Single-threaded by contract**: each `VioStrategy` instance is bound to one writer thread (the camera ingest thread). Concurrent calls to `process_frame` on the same instance are undefined behaviour. The composition root binds one instance per ingest thread.
 - **`reset_to_warm_start` is destructive**: clears the strategy's keyframe window, IMU integration state, and feature track buffer; subsequent `process_frame` calls re-initialise from the hint. Calling `reset_to_warm_start` mid-flight is allowed (F8 reboot recovery) but must not be issued concurrently with a `process_frame` call on the same instance.
 - **`current_strategy_label()` is constant per instance**: returns the same string for the lifetime of the instance and matches `config.vio.strategy` exactly. The label is FDR-stamped on every `VioHealth` event for AC-NEW-3 audit.
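
The `scale_quality` handling described in the bullets above can be sketched as follows. This is a minimal illustration, not the project's ESKF code: the function name, the constant names, and the choice to write the override as a diagonal block (variance on the diagonal, zeros elsewhere) are assumptions. Only the 64 m² and 1e6 m² values and the `R_meas[0:3, 0:3]` block come from the text.

```python
from typing import List

Matrix = List[List[float]]

# Variances quoted in the contract text (square metres). Names are hypothetical.
DIRECTION_ONLY_TRANSLATION_VAR_M2 = 64.0
UNKNOWN_TRANSLATION_VAR_M2 = 1e6


def measurement_covariance(pose_covariance_6x6: Matrix, scale_quality: str) -> Matrix:
    """Return a copy of the 6x6 covariance with the translation block adjusted.

    Assumes rows/columns 0..2 are translation and 3..5 are rotation.
    """
    if len(pose_covariance_6x6) != 6 or any(len(row) != 6 for row in pose_covariance_6x6):
        raise ValueError("expected a 6x6 covariance")
    r_meas = [list(row) for row in pose_covariance_6x6]  # never mutate the input
    if scale_quality == "metric":
        return r_meas  # consumed as-is
    if scale_quality == "direction_only":
        variance = DIRECTION_ONLY_TRANSLATION_VAR_M2
    elif scale_quality == "unknown":
        variance = UNKNOWN_TRANSLATION_VAR_M2
    else:
        raise ValueError(f"unsupported scale_quality: {scale_quality!r}")
    # Assumption: the override is written as variance * I on the translation block.
    for i in range(3):
        for j in range(3):
            r_meas[i][j] = variance if i == j else 0.0
    return r_meas
```

The rotation block (rows and columns 3..5) is left untouched in every branch, which is what keeps the rotation update active while the position update is de-weighted.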
@@ -0,0 +1,157 @@
+# Replay-input CSV format (AZ-896)
+
+**Status**: canonical operator-facing spec for the `--imu` argument of
+`gps-denied-replay` (AZ-894).
+**Audience**: operators preparing a (video, CSV) replay pair, plus engineers
+implementing alternative replay backends.
+**Companion artifacts**:
+
+- `_docs/02_document/contracts/replay/example_data_imu.csv` — minimal valid
+  example (20 rows = 2 s at 10 Hz).
+- `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` — full Derkachi
+  fixture (4,900 rows = 489.9 s at 10 Hz).
+- Parser implementation:
+  `src/gps_denied_onboard/replay_input/csv_ground_truth.py`.
+
+## Hard contract (read before generating a file)
+
+The replay pipeline trusts the CSV blindly inside the loop. Violations of any
+of the following will produce silently wrong outputs (the parser only catches
+schema-level faults, not semantic ones), so the operator owns these
+invariants:
+
+1. **Nadir camera.** The companion `.mp4` must be a nadir (straight-down)
+   recording. The C1 VIO and C2 VPR stages assume nadir framing; oblique
+   imagery breaks the satellite-anchor and VIO scale recovery.
+2. **Airborne at row 0.** The UAV must already be airborne at the first CSV
+   row / first video frame. The replay pipeline does not implement a
+   take-off detector — feeding a ground-roll segment yields garbage IMU
+   integration.
+3. **Aligned start.** Row 0's `Time = 0.0` must correspond to the first
+   video frame. The CLI does not perform sub-frame alignment; offset the
+   CSV/clip pair offline before invoking `gps-denied-replay`.
+4. **Monotonic, uniformly-spaced `Time`.** Rows must be strictly increasing
+   on `Time` and uniformly spaced (the Derkachi fixture is 10 Hz). The
+   parser enforces monotonicity (AC-5); uniform spacing is the operator's
+   responsibility — non-uniform spacing skews the ESKF prediction step
+   without raising an error.
+
+## Schema
+
+The CSV must be header-first, comma-separated, UTF-8 encoded. Column order
+does not matter — the parser uses `csv.DictReader` and looks up by name —
+but the column **names** must match exactly (case-sensitive).
+
+15 columns are required; 4 known extra columns (the three mag fields and
+`relative_alt`) are tolerated and ignored, as is any other unrecognised column.
+
+### Required columns
+
+CSV columns use the MAVLink wire format (mG accel, mrad/s gyro, FRD
+body frame). The parser converts to SI / FLU at the `ImuSample`
+boundary via
+`gps_denied_onboard.helpers.imu_units.mavlink_imu_to_si_flu` (AZ-918)
+so downstream C5 ESKF + `imu_preintegrator` consumers see the contract
+they were built for. **Operator-facing CSV files keep the raw scaling**
+— the conversion is a parser-internal concern.
+
+| # | Column | Unit (CSV) | Type | Notes |
+|---|--------|------------|------|-------|
+| 1 | `timestamp(ms)` | ms | float | Pixhawk wall clock at sample capture. **Ignored by the replay pipeline** — kept only for trace-back to the original tlog. |
+| 2 | `Time` | s | float | **Canonical replay clock.** Must start at `0.0`, increase monotonically, and be uniformly spaced. The replay loop uses this column for every timestamp it emits. |
+| 3 | `SCALED_IMU2.xacc` | mG, FRD | float | Body-frame X accelerometer, MAVLink `SCALED_IMU2` raw scaling. Converted by the parser to m/s² in `ImuSample.accel_xyz[0]` (FLU body). |
+| 4 | `SCALED_IMU2.yacc` | mG, FRD | float | Body-frame Y accelerometer; sign-flipped during FRD→FLU. |
+| 5 | `SCALED_IMU2.zacc` | mG, FRD | float | Body-frame Z accelerometer; sign-flipped during FRD→FLU. |
+| 6 | `SCALED_IMU2.xgyro` | mrad/s, FRD | float | Body-frame X gyro, MAVLink `SCALED_IMU2` raw scaling. Converted to rad/s in `ImuSample.gyro_xyz[0]` (FLU body). |
+| 7 | `SCALED_IMU2.ygyro` | mrad/s, FRD | float | Body-frame Y gyro; sign-flipped during FRD→FLU. |
+| 8 | `SCALED_IMU2.zgyro` | mrad/s, FRD | float | Body-frame Z gyro; sign-flipped during FRD→FLU. |
+| 9 | `GLOBAL_POSITION_INT.lat` | degrees | float | WGS84 latitude. **Already in decimal degrees** (Derkachi dump convention — pre-divided by 1e7 from MAVLink's int representation). |
+| 10 | `GLOBAL_POSITION_INT.lon` | degrees | float | WGS84 longitude (same convention as `lat`). |
+| 11 | `GLOBAL_POSITION_INT.alt` | mm | float | MSL altitude. Parser divides by 1000 to emit metres. |
+| 12 | `GLOBAL_POSITION_INT.vx` | cm/s | float | NED north velocity. Parser divides by 100 to emit m/s. |
+| 13 | `GLOBAL_POSITION_INT.vy` | cm/s | float | NED east velocity. |
+| 14 | `GLOBAL_POSITION_INT.vz` | cm/s | float | NED down velocity. |
+| 15 | `GLOBAL_POSITION_INT.hdg` | cdeg | float | Heading, 0–35999. Parser divides by 100 to emit degrees. |
+
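The unit and frame conversions listed in the table can be sketched as below. This is an illustration under stated assumptions, not the body of `mavlink_imu_to_si_flu`: the function names are hypothetical, and the use of standard gravity (9.80665 m/s²) for the mG to m/s² step is an assumption about the real helper.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Assumption: standard gravity is used to convert milli-g to m/s^2.
STANDARD_GRAVITY_M_S2 = 9.80665


def frd_to_flu(v: Vec3) -> Vec3:
    """Forward-Right-Down to Forward-Left-Up: X is kept, Y and Z change sign."""
    x, y, z = v
    return (x, -y, -z)


def imu_row_to_si_flu(accel_mg_frd: Vec3, gyro_mrad_s_frd: Vec3) -> Tuple[Vec3, Vec3]:
    """Convert one SCALED_IMU2 sample to (m/s^2, rad/s) in the FLU body frame."""
    ax, ay, az = accel_mg_frd
    gx, gy, gz = gyro_mrad_s_frd
    scale = 1e-3 * STANDARD_GRAVITY_M_S2
    accel_si = (ax * scale, ay * scale, az * scale)
    gyro_si = (gx * 1e-3, gy * 1e-3, gz * 1e-3)
    return frd_to_flu(accel_si), frd_to_flu(gyro_si)


def gps_row_to_si(alt_mm: float, vx_cm_s: float, hdg_cdeg: float) -> Tuple[float, float, float]:
    """Apply the divisors from the table: mm -> m, cm/s -> m/s, cdeg -> deg."""
    return (alt_mm / 1000.0, vx_cm_s / 100.0, hdg_cdeg / 100.0)
```

Applied to a roughly level sample such as `zacc = -984` mG in FRD, the Z acceleration comes out positive (about 9.65 m/s²) in FLU, which is a quick sanity check that the sign flip went the right way.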
+### Tolerated extra columns
+
+The following may be present but are not consumed:
+
+| Column | Reason kept | Reason unused |
+|--------|-------------|---------------|
+| `SCALED_IMU2.xmag`, `.ymag`, `.zmag` | Symmetric with the accel/gyro triples in the Derkachi dump | The current ESKF does not integrate magnetometer; AZ-848 follow-up may add it |
+| `GLOBAL_POSITION_INT.relative_alt` | Present in the MAVLink dump | The replay pipeline uses MSL `alt` only |
+
+Additional columns beyond these are ignored without warning. Missing
+required columns cause the load to raise
+`ReplayInputAdapterError` before the replay loop starts (AC-5).
+
+## Schema-level errors the parser catches
+
+The parser raises `ReplayInputAdapterError` (CLI exit code 1) for any of:
+
+- File does not exist or is not a regular file.
+- File is empty (no header row).
+- File has a header but no data rows.
+- Any required column from the table above is missing from the header.
+- The `Time` column at any row contains a non-numeric / NaN / Inf value.
+- The `Time` column is not strictly increasing (`Time[i] <= Time[i-1]` at any row).
+- Any required IMU or GPS column at any row contains a non-numeric / NaN /
+  Inf value.
+
+The error message includes the row number (1-based, where row 1 is the
+header — so the first data row is row 2). Operators should treat the first
+parse failure as authoritative and fix the source CSV; the parser does not
+continue after the first invalid row.
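
A pre-flight check that mirrors the schema-level rules above can be sketched as follows. It is a stand-alone approximation for operators who want to lint a CSV before invoking the CLI, not the parser in `csv_ground_truth.py`: the function name is hypothetical, the exception class is a local stand-in that only shares its name with the project's, and the file-existence check is omitted because the sketch reads from an already opened stream.

```python
import csv
import io
import math
from typing import Optional, TextIO

REQUIRED_COLUMNS = (
    "timestamp(ms)", "Time",
    "SCALED_IMU2.xacc", "SCALED_IMU2.yacc", "SCALED_IMU2.zacc",
    "SCALED_IMU2.xgyro", "SCALED_IMU2.ygyro", "SCALED_IMU2.zgyro",
    "GLOBAL_POSITION_INT.lat", "GLOBAL_POSITION_INT.lon", "GLOBAL_POSITION_INT.alt",
    "GLOBAL_POSITION_INT.vx", "GLOBAL_POSITION_INT.vy", "GLOBAL_POSITION_INT.vz",
    "GLOBAL_POSITION_INT.hdg",
)
# `timestamp(ms)` must be present but is not consumed, so it is not value-checked here.
NUMERIC_COLUMNS = REQUIRED_COLUMNS[1:]


class ReplayInputAdapterError(Exception):
    """Local stand-in for the project's exception of the same name."""


def _finite_float(raw: Optional[str], column: str, row_number: int) -> float:
    try:
        value = float(raw)  # raises TypeError when the cell is missing (None)
    except (TypeError, ValueError):
        raise ReplayInputAdapterError(f"row {row_number}: {column} is not numeric: {raw!r}") from None
    if not math.isfinite(value):
        raise ReplayInputAdapterError(f"row {row_number}: {column} is NaN or Inf")
    return value


def validate_replay_csv(stream: TextIO) -> int:
    """Return the number of data rows, or raise on the first schema-level fault."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None:
        raise ReplayInputAdapterError("file is empty (no header row)")
    missing = [c for c in REQUIRED_COLUMNS if c not in reader.fieldnames]
    if missing:
        raise ReplayInputAdapterError(f"missing required columns: {missing}")
    previous_time = None
    count = 0
    # Row 1 is the header, so the first data row is reported as row 2.
    for row_number, row in enumerate(reader, start=2):
        values = {c: _finite_float(row[c], c, row_number) for c in NUMERIC_COLUMNS}
        if previous_time is not None and values["Time"] <= previous_time:
            raise ReplayInputAdapterError(f"row {row_number}: Time is not strictly increasing")
        previous_time = values["Time"]
        count += 1
    if count == 0:
        raise ReplayInputAdapterError("header present but no data rows")
    return count
```

Like the parser described above, the sketch stops at the first invalid row. It does not check uniform spacing or row-0 alignment, which remain the operator's responsibility under the hard contract.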
+
+## Operator workflow
+
+```bash
+gps-denied-replay \
+  --video ./flight.mp4 \
+  --imu ./data_imu.csv \
+  --output ./estimator_output.jsonl \
+  --camera-calibration ./calib.json \
+  --config ./config.yaml \
+  --mavlink-signing-key ./signing_key.bin
+```
+
+`--tlog` is accepted as a deprecated alias and will be removed by AZ-895.
+When both `--imu` and `--tlog` are supplied, `--imu` wins and a deprecation
+warning is printed to stderr.
+
+## Deriving a new CSV from an ArduPilot tlog
+
+The Derkachi fixture was produced with `pymavlink`'s `mavlogdump.py`. The
+short version:
+
+```bash
+mavlogdump.py --format csv \
+  --types SCALED_IMU2,GLOBAL_POSITION_INT \
+  ./flight.tlog > ./raw_dump.csv
+```
+
+Then post-process to:
+
+1. Rename / merge the per-message timestamp into a single `Time` column
+   relative to the first row.
+2. Drop pre-takeoff rows (the UAV must be airborne at row 0 — see the hard
+   contract above).
+3. Divide `lat` / `lon` by 1e7 to convert MAVLink's integer representation
+   (degrees × 1e7) into decimal degrees.
+4. Re-sample to a uniform 10 Hz cadence if the tlog dump produced
+   non-uniform spacing.
+
+A reference post-processor script is **not** shipped; operators have
+historically written a one-off Python (often pandas) pipeline per source aircraft.
+
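Steps 1 and 4 of the post-processing list can be sketched with the standard library alone. This is only an illustration of the idea (rebase the clock to the first row, then linearly interpolate one channel onto a uniform grid). The function names are hypothetical, linear interpolation is an assumed choice, and it is not appropriate as-is for `hdg`, which wraps at 36000 cdeg.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def rebase_times(times: Sequence[float]) -> List[float]:
    """Make the first sample t = 0.0 (step 1)."""
    t0 = times[0]
    return [t - t0 for t in times]


def resample_uniform(
    times: Sequence[float], values: Sequence[float], rate_hz: float = 10.0
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate one channel onto a uniform grid (step 4)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    step = 1.0 / rate_hz
    # Number of grid points that fit inside the recorded span.
    count = int(math.floor((times[-1] - times[0]) * rate_hz + 1e-9)) + 1
    out_t: List[float] = []
    out_v: List[float] = []
    for k in range(count):
        t = times[0] + k * step
        hi = min(bisect_right(times, t), len(times) - 1)
        lo = hi - 1
        w = (t - times[lo]) / (times[hi] - times[lo])
        w = min(max(w, 0.0), 1.0)  # guard against float round-off at the ends
        out_t.append(t)
        out_v.append(values[lo] + w * (values[hi] - values[lo]))
    return out_t, out_v
```

A real pipeline would apply this per column after dropping pre-takeoff rows, and would handle heading separately (for example by interpolating on the unit circle).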
+## Cross-references
+
+- AZ-894 — the CLI + adapter that consumes this format.
+- AZ-895 — deletes the legacy `--tlog` argument once all callers migrate.
+- AZ-897 — operator replay UI; links to this page and serves
+  `example_data_imu.csv`.
+- `_docs/02_document/contracts/replay/replay_protocol.md` — the broader
+  replay orchestration contract.
+- `_docs/00_problem/input_data/flight_derkachi/README.md` — fixture
+  provenance and license caveats.
@@ -0,0 +1,21 @@
+timestamp(ms),Time,SCALED_IMU2.xacc,SCALED_IMU2.yacc,SCALED_IMU2.zacc,SCALED_IMU2.xgyro,SCALED_IMU2.ygyro,SCALED_IMU2.zgyro,SCALED_IMU2.xmag,SCALED_IMU2.ymag,SCALED_IMU2.zmag,GLOBAL_POSITION_INT.lat,GLOBAL_POSITION_INT.lon,GLOBAL_POSITION_INT.alt,GLOBAL_POSITION_INT.relative_alt,GLOBAL_POSITION_INT.vx,GLOBAL_POSITION_INT.vy,GLOBAL_POSITION_INT.vz,GLOBAL_POSITION_INT.hdg
+4551116.348,0,21,-3,-984,52,32,-5,312,-1048,442,50.0809634,36.1115442,141290,23.182,-4,-6,-88,35041
+4551216.348,0.1,-68,-9,-995,58,-17,1,309,-1016,441,50.0809634,36.1115441,141360,23.251,-5,-2,-89,35042
+4551316.348,0.2,9,108,-988,69,-65,13,308,-964,436,50.0809633,36.1115441,141410,23.303,-1,-2,-86,35048
+4551416.348,0.3,-20,27,-977,55,10,26,310,-988,438,50.0809633,36.1115441,141450,23.348,-5,-6,-84,35057
+4551516.348,0.4,-40,40,-1026,0,65,10,306,-1076,440,50.0809633,36.111544,141510,23.402,-2,-2,-86,35065
+4551616.348,0.5,30,126,-1050,-1,75,14,321,-1146,442,50.0809633,36.111544,141570,23.464,0,0,-88,35074
+4551716.348,0.6,-64,67,-1031,-31,-6,21,314,-1066,438,50.0809632,36.1115439,141640,23.53,-5,1,-90,35080
+4551816.348,0.7,-22,112,-1027,-61,-88,-5,302,-951,436,50.0809632,36.1115439,141710,23.601,-2,3,-90,35082
+4551916.348,0.8,-123,-16,-998,-55,-104,-12,301,-942,440,50.0809631,36.1115439,141770,23.669,-10,0,-91,35079
+4552016.348,0.9,-64,-13,-1003,13,-70,-30,301,-936,442,50.080963,36.1115439,141860,23.755,-2,0,-90,35073
+4552116.348,1,-22,39,-995,73,20,-18,314,-988,436,50.080963,36.1115439,141930,23.826,-2,-2,-88,35070
+4552216.348,1.1,-49,-69,-984,2,29,1,317,-992,433,50.080963,36.1115438,142010,23.9,-6,-2,-88,35068
+4552316.348,1.2,-16,98,-991,-59,-28,-11,310,-970,435,50.080963,36.1115438,142080,23.975,-1,6,-86,35063
+4552416.348,1.3,-6,169,-998,-29,2,-2,310,-983,435,50.0809629,36.1115438,142150,24.042,-3,5,-83,35059
+4552516.348,1.4,-31,53,-1003,2,13,-10,317,-1042,438,50.0809629,36.1115438,142210,24.102,-3,3,-83,35051
+4552616.348,1.5,-47,21,-1023,13,13,-14,320,-1069,439,50.0809629,36.1115438,142270,24.166,2,2,-83,35047
+4552716.348,1.6,-30,-59,-1020,-18,24,0,315,-1083,438,50.0809629,36.1115439,142340,24.236,-5,1,-86,35049
+4552816.348,1.7,-103,23,-1058,-59,26,-7,314,-1113,442,50.0809629,36.1115439,142430,24.321,-4,4,-90,35050
+4552916.348,1.8,-17,51,-1037,-9,80,11,317,-1087,444,50.0809629,36.1115439,142510,24.404,-5,0,-93,35049
+4553016.348,1.9,-87,72,-1022,-10,-45,0,309,-1004,439,50.0809628,36.111544,142600,24.494,-6,2,-97,35046
@@ -213,6 +213,33 @@ Side notes:
 - `set_takeoff_origin` (AZ-490 / ADR-010) is invoked identically in replay: the operator's pre-flight C10 Manifest is the source of truth in both modes. The tlog's first GPS fix is the **fallback**, gated through the same Principle #11 bounded-delta check.
 - `BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`, populated by the pre-flight C10 build. This is the architectural change vs. v1.0.0 of this contract.

+### Composition profile: open-loop ESKF (AZ-776 / ADR-012)
+
+The replay binary supports a second composition profile alongside the production GTSAM-iSAM2 path: **open-loop ESKF**. It is the Tier-2 smoke baseline for the AZ-265 replay flow and the only profile that can run end-to-end against the Derkachi clip today (the satellite-anchored profile waits on AZ-777's C6 reference tile cache).
+
+The user-facing switch is the `enabled` field on the C4 block, set together with the matching `c5_state.strategy`:
+
+```yaml
+c4_pose:
+  enabled: false
+c5_state:
+  strategy: eskf
+```
+
+When `c4_pose.enabled = false`:
+- `compose_root` removes `c4_pose` from the component selection map before topological ordering; the C4 wrapper never runs, the `OpenCVGtsamPoseEstimator` is never instantiated, and `pose_estimator.estimate(refined)` in the per-frame loop above is a no-op (C5 receives IMU + VIO inputs only, no PnP anchor).
+- `build_pre_constructed` omits `c5_isam2_graph_handle` from the `pre_constructed` dict (no consumer requires it). The ESKF estimator is still pre-built and cached in the internal `_c5_prebuilt_estimator` slot exactly as in the gtsam_isam2 path; the C5 wrapper short-circuits onto the prebuilt instance per AZ-625.
+- The replay per-frame loop is otherwise unchanged — C1 VIO still runs, C5 still ingests, the JSONL sink still emits per video frame. Position drifts open-loop without satellite re-anchoring; AZ-777 closes that half of the loop by adding C2/C3/C4 against the C6 tile cache.
+
+The 2×2 pairing matrix between `c4_pose.enabled` and `c5_state.strategy` is enforced at compose time (Invariant 13 below). The two valid cells are:
+
+| `c4_pose.enabled` | `c5_state.strategy` | Profile | Status |
+|---|---|---|---|
+| `true`  | `gtsam_isam2` | Steady-state airborne (ADR-003 / ADR-009) | Production |
+| `false` | `eskf`        | Open-loop ESKF (this section) | Replay Tier-2 smoke baseline |
+
+The two **invalid** cells (`true` + `eskf` and `false` + `gtsam_isam2`) raise `CompositionError` from `compose_root` with explicit error text naming both blocks. The Tier-2 replay fixture (`tests/e2e/replay/conftest.py`) writes the second valid cell. The first cell is the live + airborne default.
+
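The compose-time pairing check described in this section can be sketched as below. It is a simplified stand-in for whatever `compose_root` actually does: the function name, the plain-dict config shape, and the error wording are assumptions, and `CompositionError` here is a local class that only shares its name with the project's exception.

```python
from typing import Any, Mapping


class CompositionError(Exception):
    """Local stand-in for the project's exception of the same name."""


# The two valid cells of the (c4_pose.enabled, c5_state.strategy) matrix.
VALID_PAIRINGS = {
    (True, "gtsam_isam2"),  # steady-state airborne profile
    (False, "eskf"),        # open-loop ESKF replay baseline
}


def check_c4_c5_pairing(config: Mapping[str, Any]) -> None:
    """Raise CompositionError for either off-diagonal cell."""
    enabled = config["c4_pose"]["enabled"]
    strategy = config["c5_state"]["strategy"]
    if (enabled, strategy) not in VALID_PAIRINGS:
        raise CompositionError(
            f"c4_pose.enabled={enabled!r} cannot be combined with "
            f"c5_state.strategy={strategy!r}; valid pairs are "
            "(true, gtsam_isam2) and (false, eskf)"
        )
```

Naming both blocks in the message follows the requirement above that the error text identify `c4_pose` and `c5_state` together, so an operator does not fix one field and trip over the other.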
 ## Invariants

 1. **Mode-agnostic C1–C7, C13**: production components MUST NOT contain `if config.mode == "replay":` branches. Mode-specific behaviour lives in the strategies (FrameSource / FcAdapter / MavlinkTransport / ReplaySink / Clock). Verified by an explicit grep guard in CI (the AZ-404 E2E test owns this assertion).
@@ -228,6 +255,44 @@ Side notes:
 11. **MAVLink signing key required in replay**: the airborne binary refuses to run without `--mavlink-signing-key PATH` in both modes. In replay the operator supplies a dummy file (well-formed key bytes; no real channel to verify against). This preserves Invariant 5 — the encoders' signing code path runs identically in both modes.
 12. **Real C6 cache in replay**: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow. There is no replay-specific cache shape. Verified by the AZ-404 E2E fixture, which runs the operator's pre-flight flow before invoking the replay CLI.

+    **Sub-invariant 12.a (cycle 3 — AZ-839 / Epic AZ-835 C3)**: the e2e `operator_pre_flight_setup` fixture replaces the cycle-1 `mkdir` placeholder with a real driver that wires C1 (`replay_input.tlog_route.extract_route_from_tlog` — AZ-836) + C2 (`c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route` — AZ-838) + C11 (`tile_downloader.HttpTileDownloader.download_for_bbox`) + C10 (`DescriptorBatcher`) to populate C6 from a tlog-derived corridor. The fixture yields a `PopulatedC6Cache` dataclass (`cache_root`, `tile_store_path`, `faiss_index_path`, `faiss_sidecar_sha256_path`, `faiss_sidecar_meta_path`, `route_spec`, `tile_count`, `elapsed_seconds`). The cache is mounted into a named docker volume that survives across pytest sessions (cold first invocation populates; subsequent invocations within the same compose session reuse — warm cache). Cold-start budget: ≤ 5 min on Tier-2 Jetson; warm: ≤ 30 s. Sidecar triple-consistency (`.index` + `.sha256` + `.meta.json`) per AZ-306 is verified at every fixture yield; mismatch raises `IndexUnavailableError`. The C12 production binding for the route-driven path is a future-cycle integration; production pre-flight still uses the bbox-driven `download_tiles_for_area` path today.
+
+    **Sub-invariant 12.c (cycle 3 — Epic AZ-835: route-driven supersedes bbox)**: route-driven seeding (operator's tlog-derived `RouteSpec` → `POST /api/satellite/route` → corridor materialised by `satellite-provider`) supersedes the legacy AZ-777 bbox-driven approach (`POST /api/satellite/request` over a fixed lat/lon box) for the real-flight validation path. The supersedure rationale is twofold:
+
+    - **Tile efficiency (~100×)**: the AZ-777 bbox for a typical Derkachi-style flight produces ~11,400 z15-z18 tiles (~140 MB, 48 % over the C6 cache budget). A 10-point coarsened route with `regionSizeMeters=500` per point produces ~50-100 unique tiles (~1.5 MB) for the same VPR descriptor lock area. The route-driven path is the only one that fits the AZ-696 reference-fixture budget on Jetson.
+    - **Pre-commitment honesty**: a bbox pre-commits to where the operator *might* fly. A route pre-commits to where they *did* fly. For real-flight validation against ground-truth GPS, the latter is the right primitive — it ensures the FAISS index is populated with descriptors of the tiles the airborne pipeline will actually query, not a superset whose VPR misses are statistically indistinguishable from the AZ-696 AC-3 ≤ 100 m threshold violations.
+
+    AZ-777 Phase 1 (e2e-runner wiring + C11 read-contract adaptation) is **retained and reused** by Epic AZ-835. AZ-777 Phases 3 and 5 are **superseded** by Epic AZ-835 children (AZ-839 for the operator-fixture rewrite, AZ-842 for the docs work). Phase 4 (un-xfail of AC-4/AC-5) was deferred to backlog after cycle-4 AZ-895 took the un-xfail target along a different path; it is not on the active epic.
+
+    **Sub-invariant 12.d (cycle 3 — AZ-839 / Epic AZ-835 C3: fixture failure-handling contract)**: the `operator_pre_flight_setup` fixture must distinguish three failure classes from `SatelliteProviderRouteClient.seed_route` / `HttpTileDownloader.download_for_bbox` and surface them honestly:
+
+    | Class | Source | Fixture response |
+    |-------|--------|------------------|
+    | Validation | `RouteValidationError` (pre-emptive AZ-809 bound violation) or `IndexUnavailableError` (sidecar triple mismatch at yield-time) | Re-raise — operator/test author error, no remediation in the fixture |
+    | Terminal | `RouteTerminalFailureError` (satellite-provider rejected the route id or status polling returned `mapsReady=false` past `poll_max_attempts`) | Re-raise — service-side state cannot be recovered by retry |
+    | Transient | `RouteTransientError` or `TileDownloadError` with HTTP 5xx / network reset | **Retry up to 3 attempts** using C11's existing exponential backoff schedule (`HttpTileDownloader.RETRY_*` constants); re-raise on exhaustion |
+
+    The fixture does NOT swallow transient failures silently — the third attempt's exception surfaces with the full retry history in the message so the test report can distinguish "fixture genuinely tried 3×" from "fixture short-circuited". Cold-start budget of ≤ 5 min on Tier-2 Jetson is measured wall-clock around the entire retry loop, not per-attempt.
+
+    **Sub-invariant 12.b (cycle 3 — AZ-840 / Epic AZ-835 C4)**: the E2E orchestrator test `tests/e2e/replay/test_az835_e2e_real_flight.py` takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on Tier-2 Jetson — no operator hand-curation between steps. The 7 steps are: (1) active flight cut + tlog/video sync via AZ-405; (2) on-fly frame + IMU extraction; (3) auto-create route via AZ-836; (4) POST route to satellite-provider via the C3 fixture's `operator_pre_flight_setup` (delegates to AZ-838); (5) build FAISS index (driven by C3); (6) run gps-denied airborne pipeline against the populated cache + tlog/video/calibration (reuses the airborne composition root path AZ-699 exercises); (7) compute horizontal-error distribution and emit the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`. The verdict report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696 ≥ 80 % within 100 m gate — the success criterion is that the report exists with the honest distribution, not that the verdict is PASS. Gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
+13. **C4↔C5 pairing matrix is enforced at compose time** (AZ-776 / ADR-012): `compose_root` rejects the two off-diagonal cells of the (`c4_pose.enabled`, `c5_state.strategy`) matrix with a `CompositionError` naming both blocks. `enabled=False` + `gtsam_isam2` and `enabled=True` + `eskf` are forbidden. The two valid cells are `enabled=True` + `gtsam_isam2` (production steady-state per ADR-003 / ADR-009) and `enabled=False` + `eskf` (open-loop ESKF — replay Tier-2 smoke baseline; satellite anchoring deferred to AZ-777). Verified by `tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py` AC-3a and AC-3b.
+14. **Single canonical clock & CSV-driven replay path (cycle 4 — AZ-894 / AZ-895 / AZ-896)**: production runs as a single edge process on a single device. There is exactly **one** wall/monotonic clock authoritative for timestamps that cross component boundaries — the clock at the C8 inbound boundary (`FcAdapter`) where IMU windows enter the system. Two-clock surfaces — for example a C1 `VioOutput.emitted_at_ns` derived from the Jetson `monotonic_ns()` paired against a C8 `ImuWindow.ts_end_ns` derived from FC-boot — produced the AZ-848 ESKF out-of-order regression observed in cycle 3 (Jetson clock advanced between IMU window arrival and VIO emission, so the VIO emission timestamp routinely landed *before* the IMU window's `ts_end_ns` when the two were compared as if on the same axis, and ESKF rejected its own VIO updates). All downstream timestamps (`EstimatorOutput.ts_ns`, `JsonlReplaySink` per-row `t`, FDR `flight_event.ts_ns`) MUST derive from a single canonical clock that produces deterministic per-record values for a given input. In live mode the canonical clock is the C8 inbound IMU window's FC-boot-relative timestamp; in replay mode it is the CSV row's `Time` column.
+
+    **Sub-invariant 14.a (CSV-driven replay path — AZ-894)**: the replay-mode operator input is `(video, CSV)`. The CSV row's `Time` column is the canonical clock for the entire replay run: every IMU window emitted by the new `csv_replay_input.CsvReplayInputAdapter` (gated `BUILD_CSV_REPLAY_ADAPTER=ON` in the airborne and research binaries) carries `ts_end_ns` derived from the CSV `Time` column; the `Clock` strategy injected into the composition root is `CsvDerivedClock` which uses the same column. There is no auto-sync (see 14.c below). The CSV must satisfy the format spec at `_docs/02_document/contracts/replay/csv_replay_format.md` (AZ-896) — including the requirement that row 0's `Time` equals video frame 0 (`t=0`) so the airborne pipeline does not need to apply any per-stream offset.
+
+    **Sub-invariant 14.b (tlog adapter audit-only role — AZ-895)**: `TlogReplayFcAdapter` (Sub-invariant 14 of the prior cycles' design) is retained in source for two audit / migration paths and removed from the replay test/demo critical path:
+
+    - **FDR analysis**: one-shot tlog parsing for incident review (e.g. AZ-848 timestamp investigation) — invoked from offline analysis scripts under `tools/`, not from the airborne composition root.
+    - **One-shot tlog → CSV export**: a CLI utility (`gps-denied-tlog-to-csv`) that reads a pymavlink tlog and writes the canonical CSV per AZ-896. This is the migration ramp for users who only have legacy tlog inputs.
+
+    The previous `compose_root(config={"mode": "replay", "replay_input.adapter": "tlog"})` code path is preserved with a one-cycle deprecation warning on startup; removal is tracked in AZ-908 (cycle-5+ backlog). The CSV adapter (`BUILD_CSV_REPLAY_ADAPTER=ON`) is the default and the only path the e2e fixture suite exercises after cycle 4.
+
+    **Sub-invariant 14.c (auto-sync deprecation — AZ-895)**: the `replay_input.auto_sync` module (AZ-405) is reduced to a deprecated stub that performs no synchronisation and raises `ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")` from every public entry point. The CLI flags `--time-offset-ms`, `--skip-auto-sync`, and `--auto-trim` are accepted with a deprecation warning and ignored. The justification: with a single canonical clock at the CSV row level (14.a), there is no second clock to align against — the operator authors the CSV with the correct row-0 alignment, and the fixture verifies row 0's `Time == 0`. Hard removal of the deprecated surface is tracked in AZ-908; this cycle ships only the stub + warnings to preserve source-compat for any downstream caller built against AZ-405's pre-deprecation shape.
+
+    **Sub-invariant 14.d (operator-facing UI — AZ-897, superseded by Invariant 15)**: retained as the historical cycle-4 CSV-only upload spec. The default demo entry is now F11 / AZ-969.
+
+15. **Operator demo replay path (cycle 5 — AZ-969 / F11)**: the default product demo accepts raw `(video, tlog, calibration)` from the suite UI. Alignment is operator-visible (dual timeline bars + explicit refine); the backend exports an AZ-896 CSV whose `Time` column is the single canonical replay clock (Invariant 14.a). Steps: preview timelines (AZ-970) → coarse align + refine (AZ-897, AZ-971) → export CSV (AZ-972) → seed corridor cache from tlog GPS (AZ-974) → run `gps-denied-replay` (AZ-973) → map + verdict. The `(video, pre-authored CSV)` bypass (AZ-959) is optional, not default. E2E tests MUST use the same orchestration modules as production — no parallel test-only graph. AZ-908 (hard removal of alignment stubs) is deferred until AZ-971 ships.
+
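The failure-handling contract of Sub-invariant 12.d can be sketched as a small retry wrapper. This is an assumption-laden illustration, not the fixture's code: the exception classes are local stand-ins that share names with the ones in the table, the backoff values are placeholders rather than the `HttpTileDownloader.RETRY_*` constants, and the `TileDownloadError` HTTP 5xx case is left out for brevity.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class RouteValidationError(Exception):
    """Stand-in: validation class, never retried."""


class RouteTerminalFailureError(Exception):
    """Stand-in: terminal class, never retried."""


class RouteTransientError(Exception):
    """Stand-in: transient class, retried with backoff."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    backoff_s: Sequence[float] = (1.0, 2.0),  # placeholder schedule
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures; let validation and terminal failures propagate."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RouteTransientError as exc:
            history.append(f"attempt {attempt}: {exc}")
            if attempt == max_attempts:
                # Surface the full retry history instead of only the last error.
                raise RouteTransientError("; ".join(history)) from exc
            sleep(backoff_s[min(attempt - 1, len(backoff_s) - 1)])
    raise AssertionError("unreachable")
```

Because only the transient class is caught, validation and terminal errors re-raise on the first attempt, which matches the first two rows of the table. Any wall-clock budget would be measured by the caller around the whole call, not per attempt.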
 ## Producer / Consumer Split

 | Task ID | Scope |
@@ -0,0 +1,251 @@
+openapi: 3.1.0
+info:
+  title: gps-denied-onboard replay API
+  description: HTTP wrapper around the offline `gps-denied-replay` pipeline. Upload
+    (tlog + video [+ calibration]); receive GPS fixes + an accuracy report + an HTML
+    map.
+  version: 1.0.0
+paths:
+  /healthz:
+    get:
+      summary: Healthz
+      operationId: healthz_healthz_get
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema:
+                additionalProperties:
+                  type: string
+                type: object
+                title: Response Healthz Healthz Get
+  /readyz:
+    get:
+      summary: Readyz
+      operationId: readyz_readyz_get
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema: {}
+  /replay:
+    post:
+      summary: Post Replay
+      operationId: post_replay_replay_post
+      parameters:
+      - name: authorization
+        in: header
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          title: Authorization
+      requestBody:
+        required: true
+        content:
+          multipart/form-data:
+            schema:
+              $ref: '#/components/schemas/Body_post_replay_replay_post'
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema: {}
+        '422':
+          description: Validation Error
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/HTTPValidationError'
+  /jobs/{job_id}:
+    get:
+      summary: Get Job
+      operationId: get_job_jobs__job_id__get
+      parameters:
+      - name: job_id
+        in: path
+        required: true
+        schema:
+          type: string
+          title: Job Id
+      - name: authorization
+        in: header
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          title: Authorization
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema:
+                type: object
+                additionalProperties: true
+                title: Response Get Job Jobs  Job Id  Get
+        '422':
+          description: Validation Error
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/HTTPValidationError'
+  /jobs/{job_id}/result:
+    get:
+      summary: Get Result
+      operationId: get_result_jobs__job_id__result_get
+      parameters:
+      - name: job_id
+        in: path
+        required: true
+        schema:
+          type: string
+          title: Job Id
+      - name: authorization
+        in: header
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          title: Authorization
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema: {}
+        '422':
+          description: Validation Error
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/HTTPValidationError'
+  /jobs/{job_id}/map:
+    get:
+      summary: Get Map
+      operationId: get_map_jobs__job_id__map_get
+      parameters:
+      - name: job_id
+        in: path
+        required: true
+        schema:
+          type: string
+          title: Job Id
+      - name: authorization
+        in: header
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          title: Authorization
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema: {}
+        '422':
+          description: Validation Error
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/HTTPValidationError'
+  /jobs/{job_id}/report:
+    get:
+      summary: Get Report
+      operationId: get_report_jobs__job_id__report_get
+      parameters:
+      - name: job_id
+        in: path
+        required: true
+        schema:
+          type: string
+          title: Job Id
+      - name: authorization
+        in: header
+        required: false
+        schema:
+          anyOf:
+          - type: string
+          - type: 'null'
+          title: Authorization
+      responses:
+        '200':
+          description: Successful Response
+          content:
+            application/json:
+              schema: {}
+        '422':
+          description: Validation Error
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/HTTPValidationError'
+components:
+  schemas:
+    Body_post_replay_replay_post:
+      properties:
+        tlog:
+          type: string
+          format: binary
+          title: Tlog
+        video:
+          type: string
+          format: binary
+          title: Video
+        calibration:
+          anyOf:
+          - type: string
+            format: binary
+          - type: 'null'
+          title: Calibration
+        pace:
+          type: string
+          title: Pace
+          default: asap
+        auto_trim:
+          type: boolean
+          title: Auto Trim
+          default: true
+      type: object
+      required:
+      - tlog
+      - video
+      title: Body_post_replay_replay_post
+    HTTPValidationError:
+      properties:
+        detail:
+          items:
+            $ref: '#/components/schemas/ValidationError'
+          type: array
+          title: Detail
+      type: object
+      title: HTTPValidationError
+    ValidationError:
+      properties:
+        loc:
+          items:
+            anyOf:
+            - type: string
+            - type: integer
+          type: array
+          title: Location
+        msg:
+          type: string
+          title: Message
+        type:
+          type: string
+          title: Error Type
+      type: object
+      required:
+      - loc
+      - msg
+      - type
+      title: ValidationError
@@ -0,0 +1,135 @@
+# Contract: `replay_api` HTTP service
+
+**Owner**: AZ-701 (epic AZ-696 / cycle-2 multi-flight demo deliverables).
+**Producer task**: AZ-701 (this contract).
+**Consumer**: any HTTP client — operator dashboards, the parent-suite UI, demo runners, ad-hoc `curl` sessions.
+**Version**: 1.0.0
+**Status**: draft (in-testing on Jetson)
+**Last Updated**: 2026-05-20
+**Module-layout home**:
+- `src/gps_denied_onboard/replay_api/app.py` — FastAPI app factory + uvicorn entrypoint.
+- `src/gps_denied_onboard/replay_api/handlers.py` — request handlers (multipart parse, magic-byte validation, auth dependency).
+- `src/gps_denied_onboard/replay_api/jobs.py` — in-memory `JobRegistry` + `JobRecord` + concurrency limit.
+- `src/gps_denied_onboard/replay_api/storage.py` — per-job temp directory lifecycle + cleanup.
+- `src/gps_denied_onboard/replay_api/interface.py` — `ReplayRunner` Protocol + DTOs (`ReplayJobResult`, `JobState`, `JobSnapshot`).
+- `src/gps_denied_onboard/replay_api/errors.py` — typed HTTP error families.
+- `src/gps_denied_onboard/cli/replay_api_entrypoint.py` — `replay-api` console-script.
+- `docker/replay-api.Dockerfile` — operator-side container image.
+
+## Purpose
+
+Expose the offline replay pipeline (`gps-denied-replay` CLI from AZ-402, plus the `gps-denied-render-map` HTML renderer from AZ-700) over a single HTTP surface so external consumers can upload `(tlog + video [+ calibration])` and receive GPS fixes + an accuracy report + a map without installing the Python stack.
+
+The service is **operator-side only**: it is NOT bundled into the airborne binary. It runs in its own container (`docker/replay-api.Dockerfile`) and is started via the `replay-api` console-script in the `[operator-tools]` optional-dependency group.
+
+## Design invariants
+
+1. **The service does not re-implement the estimator.** It shells out to the existing `gps-denied-replay` console-script. The estimator path is exactly what runs on the airborne binary; the service is a thin HTTP shim.
+2. **No persistent state.** Jobs live in-process; uploads live in per-job temp directories that are deleted on completion or service shutdown. Operators who need durable history persist the JSONL + Markdown report + HTML map artefacts out-of-band.
+3. **Sync vs. async is decided by file size, not by the client.** Videos ≤ `REPLAY_API_SYNC_MAX_BYTES` (default 200 MiB) run inline unless the concurrency limit is already reached at submit time; larger uploads are queued and the client polls.
+4. **Magic-byte file validation** is applied before any data is handed to the estimator. The service refuses uploads whose first bytes do not match the expected `.tlog` (MAVLink magic byte `0xFD` for v2.0) or `.mp4` (`ftyp` box at offset 4) signatures.
+5. **Bearer-token auth** is the only auth surface. Default is **on**; `REPLAY_API_AUTH_REQUIRED=false` opts out for local dev and emits a WARN log on every request.
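Invariant 4 can be illustrated with a minimal sketch. The function names and the `tlog_magic_offset` parameter are hypothetical and not taken from `handlers.py`. The offset is a parameter because some ground-station log writers prefix each packet with an 8-byte timestamp, in which case the MAVLink marker would not sit at byte 0; the contract itself only says "first bytes".

```python
from typing import Optional

MAVLINK_V2_MAGIC = 0xFD  # MAVLink v2.0 start-of-frame marker named in the contract
MP4_FTYP = b"ftyp"       # expected at byte offset 4 of an .mp4 upload


def looks_like_mp4(head: bytes) -> bool:
    """True when bytes 4..8 of the upload head spell the `ftyp` box type."""
    return len(head) >= 8 and head[4:8] == MP4_FTYP


def looks_like_tlog(head: bytes, magic_offset: int = 0) -> bool:
    """True when the byte at `magic_offset` is the MAVLink v2 marker.

    `magic_offset` is an assumption of this sketch, not a documented setting.
    """
    return len(head) > magic_offset and head[magic_offset] == MAVLINK_V2_MAGIC


def classify_upload(head: bytes, tlog_magic_offset: int = 0) -> Optional[str]:
    """Return "mp4", "tlog", or None (which would map to `unsupported_file_kind`)."""
    if looks_like_mp4(head):
        return "mp4"
    if looks_like_tlog(head, tlog_magic_offset):
        return "tlog"
    return None
```

Only the first few bytes of each part need to be read before deciding, so a rejected upload never has to reach the estimator.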
+
+## Public API
+
+The OpenAPI spec is the authoritative source — see `_docs/02_document/contracts/replay_api/openapi.yaml`. The summary below mirrors it for human readers.
+
+### `POST /replay`
+
+Multipart upload accepting:
+- `tlog`: binary `.tlog` file (required).
+- `video`: `.mp4` file (required).
+- `calibration`: camera-calibration JSON (optional; defaults to the AZ-702 KHP20S30 factory-sheet if the operator built the image with that calibration baked in).
+- `pace`: `asap` | `realtime` (form field, optional; default `asap`).
+- `auto_trim`: `true` | `false` (form field, optional; default `true`).
+
+Response shapes:
+
+- **Sync mode** (video ≤ `REPLAY_API_SYNC_MAX_BYTES`):
+  - `200 OK` with body `{ "job_id": "<uuid>", "state": "done", "emissions_jsonl_url": "...", "accuracy_report_md_url": "...", "map_html_url": "..." }`
+- **Async mode** (video > `REPLAY_API_SYNC_MAX_BYTES` OR concurrency limit reached at submit time):
+  - `202 Accepted` with `Location: /jobs/{id}` header and body `{ "job_id": "<uuid>", "state": "queued" | "running", "status_url": "/jobs/{id}" }`
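The sync/async split above can be sketched as a pure decision function. `choose_mode` is a hypothetical helper, and reading "concurrency limit reached" as `running_jobs >= max_concurrent_jobs` is an assumption of this sketch.

```python
from typing import Tuple

SYNC_MAX_BYTES_DEFAULT = 209_715_200  # documented default of REPLAY_API_SYNC_MAX_BYTES
MAX_CONCURRENT_JOBS_DEFAULT = 1       # documented default of REPLAY_API_MAX_CONCURRENT_JOBS


def choose_mode(
    video_bytes: int,
    running_jobs: int,
    sync_max_bytes: int = SYNC_MAX_BYTES_DEFAULT,
    max_concurrent_jobs: int = MAX_CONCURRENT_JOBS_DEFAULT,
) -> Tuple[int, str]:
    """Return (HTTP status, mode) for a new submission.

    The client has no say: only the video size and the current load matter.
    """
    if video_bytes > sync_max_bytes or running_jobs >= max_concurrent_jobs:
        return 202, "async"  # client polls the status URL
    return 200, "sync"       # result URLs are returned inline
```

A client that always checks the status code, and polls `status_url` on `202`, handles both modes with one code path.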
+
+### `GET /jobs/{id}`
+
+Returns the job snapshot:
+
+```json
+{
+  "job_id": "...",
+  "state": "queued" | "running" | "done" | "failed",
+  "submitted_at_utc": "...",
+  "started_at_utc": "...",
+  "finished_at_utc": "...",
+  "error": "<string, present only when state=failed>",
+  "result": { ... },
+  "status_url": "...",
+  "emissions_jsonl_url": "...",
+  "accuracy_report_md_url": "...",
+  "map_html_url": "..."
+}
+```
+
+### `GET /jobs/{id}/result`
+
+Streams the JSONL emissions file. `200 OK` with `Content-Type: application/x-ndjson`. `409 Conflict` when the job is not in state `done`.
+
+### `GET /jobs/{id}/map`
+
+Streams the HTML map produced by AZ-700. `200 OK` with `Content-Type: text/html`. `409 Conflict` when the job is not in state `done`.
+
+### `GET /jobs/{id}/report`
+
+Streams the Markdown accuracy report produced by AZ-699. `200 OK` with `Content-Type: text/markdown`. `409 Conflict` when the job is not in state `done`.
+
+### `GET /healthz`
+
+Liveness probe. `200 OK` with `{"status":"ok"}` whenever the FastAPI app can process requests.
+
+### `GET /readyz`
+
+Readiness probe. `200 OK` only when the `gps-denied-replay` console-script is resolvable on `PATH` AND the storage root is writeable. `503 Service Unavailable` otherwise — Kubernetes / docker-compose health checks should use this, not `/healthz`.
+
+## Errors
+
+All errors are JSON objects of shape `{ "error_code": "...", "message": "...", "details": { ... } }`.
+
+| HTTP | `error_code`                  | When |
+|------|-------------------------------|------|
+| 400  | `unsupported_file_kind`       | Magic-byte validation failed. |
+| 400  | `multipart_missing_field`     | Required field absent. |
+| 401  | `unauthorized`                | Missing or wrong bearer token (when auth required). |
+| 404  | `job_not_found`               | `GET /jobs/{id}*` for an unknown id. |
+| 409  | `job_not_complete`            | Result/map/report requested while job is not `done`. |
+| 413  | `payload_too_large`           | Upload exceeded `REPLAY_API_MAX_UPLOAD_BYTES`. |
+| 429  | `concurrency_limit_reached`   | The queue is full (`REPLAY_API_MAX_QUEUED_JOBS` reached). Exceeding `REPLAY_API_MAX_CONCURRENT_JOBS` alone does not produce this code: the handler accepts the job and queues it. |
+| 500  | `replay_runner_failed`        | The `gps-denied-replay` subprocess exited non-zero. `details.stderr_tail` carries the last 8 KB of stderr. |
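On the client side, the error envelope can be decoded in a way that tolerates unknown codes and extra fields. This is a sketch only: `ApiError` and `parse_error` are hypothetical names, and the fallback values for missing keys are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Codes listed in the table above; the set may grow over time.
KNOWN_ERROR_CODES = frozenset({
    "unsupported_file_kind", "multipart_missing_field", "unauthorized",
    "job_not_found", "job_not_complete", "payload_too_large",
    "concurrency_limit_reached", "replay_runner_failed",
})


@dataclass(frozen=True)
class ApiError:
    error_code: str
    message: str
    details: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_known(self) -> bool:
        return self.error_code in KNOWN_ERROR_CODES


def parse_error(body: str) -> ApiError:
    """Decode an error body, ignoring any fields this client does not know."""
    payload = json.loads(body)
    return ApiError(
        error_code=str(payload.get("error_code", "unknown")),
        message=str(payload.get("message", "")),
        details=dict(payload.get("details") or {}),
    )
```

Treating an unrecognised code as a generic failure, rather than raising, keeps older clients working when new codes are added.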
+
+## Configuration
+
+| Env var                              | Default              | Meaning |
+|--------------------------------------|----------------------|---------|
+| `REPLAY_API_BEARER_TOKEN`            | _none_               | Required when `REPLAY_API_AUTH_REQUIRED=true`. |
+| `REPLAY_API_AUTH_REQUIRED`           | `true`               | Set to `false` to disable bearer-token auth (dev only — WARN logged). |
+| `REPLAY_API_MAX_UPLOAD_BYTES`        | `2147483648` (2 GiB)  | Per-upload hard limit. |
+| `REPLAY_API_SYNC_MAX_BYTES`          | `209715200` (200 MiB) | Video size above which the service switches to async. |
+| `REPLAY_API_MAX_CONCURRENT_JOBS`     | `1`                  | Max running estimator subprocesses. |
+| `REPLAY_API_MAX_QUEUED_JOBS`         | `8`                  | Max queued jobs. Above this the API returns 429. |
+| `REPLAY_API_STORAGE_ROOT`            | `/var/azaion/replay_api` | Per-job temp dir parent. |
+| `REPLAY_API_REPLAY_BINARY`           | `gps-denied-replay`  | Override the replay CLI binary used by the runner. |
+| `REPLAY_API_RENDER_BINARY`           | `gps-denied-render-map` | Override the map-render CLI used by the runner. |
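The table can be read into a typed settings object along these lines. `ReplayApiSettings` and `load_settings` are hypothetical names; treating only the literal `false` as "auth off" and failing fast when the token is missing are assumptions of this sketch, not documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ReplayApiSettings:
    bearer_token: Optional[str]
    auth_required: bool
    max_upload_bytes: int
    sync_max_bytes: int
    max_concurrent_jobs: int
    max_queued_jobs: int
    storage_root: str
    replay_binary: str
    render_binary: str


def load_settings(env: Mapping[str, str] = os.environ) -> ReplayApiSettings:
    """Build settings from environment variables, using the documented defaults."""
    auth_required = env.get("REPLAY_API_AUTH_REQUIRED", "true").strip().lower() != "false"
    token = env.get("REPLAY_API_BEARER_TOKEN") or None
    if auth_required and token is None:
        # Assumption: refuse to start rather than run with an unusable auth setup.
        raise ValueError("REPLAY_API_BEARER_TOKEN is required when auth is enabled")
    return ReplayApiSettings(
        bearer_token=token,
        auth_required=auth_required,
        max_upload_bytes=int(env.get("REPLAY_API_MAX_UPLOAD_BYTES", "2147483648")),
        sync_max_bytes=int(env.get("REPLAY_API_SYNC_MAX_BYTES", "209715200")),
        max_concurrent_jobs=int(env.get("REPLAY_API_MAX_CONCURRENT_JOBS", "1")),
        max_queued_jobs=int(env.get("REPLAY_API_MAX_QUEUED_JOBS", "8")),
        storage_root=env.get("REPLAY_API_STORAGE_ROOT", "/var/azaion/replay_api"),
        replay_binary=env.get("REPLAY_API_REPLAY_BINARY", "gps-denied-replay"),
        render_binary=env.get("REPLAY_API_RENDER_BINARY", "gps-denied-render-map"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in unit tests.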
+
+## Versioning rules
+
+- Breaking changes to request / response schemas bump the major version and ship under `/v2/replay`. The `/replay` path remains v1 for one release after `/v2` ships.
+- The response shape may grow new fields without a version bump; clients MUST tolerate unknown fields.
+- The `error_code` set is append-only; clients MUST tolerate unknown codes.
+- The `state` enum may grow new terminal-style values (e.g. `cancelled`) only with a minor bump documented in the OpenAPI changelog block.
+
+## Out of scope
+
+- Persistent job database — see invariant 2.
+- WebSocket / SSE progress streaming.
+- Authentication beyond bearer token (mTLS / OAuth2 are deliberately out).
+- Multi-node scheduling — a single host runs at most `REPLAY_API_MAX_CONCURRENT_JOBS` subprocesses.
+- A built-in web UI — operator dashboards integrate over HTTP.
@@ -562,6 +562,9 @@ The following DTOs flow through the per-frame pipeline in memory and are **NOT**
 | `PostLandingUploadRequest` | C12 CLI (`upload-pending` subcommand) | C12 `PostLandingUploadOrchestrator` | Never persisted — composed inline from CLI args |
 | `ReLocHint` | C12 CLI (`reloc-confirm` subcommand) | C12 `OperatorReLocService` → `OperatorCommandTransport` (E-C8 concrete) → airborne companion | FDR `c12.reloc.requested` record (full hint un-redacted; `outcome ∈ {sent, failed}`) |
 | `CameraCalibration` (loaded once) | calibration loader | C1, C3, C4 | NOT in PostgreSQL — see § 2.6 |
+| `RouteSpec` (cycle 3 — `_types/route.py`, AZ-845 canonical home; produced by `replay_input/tlog_route.py` AZ-836) | `replay_input.tlog_route.extract_route_from_tlog(tlog, *, max_waypoints, …)` | C11 `SatelliteProviderRouteClient.seed_route` (AZ-838); cycle-3 e2e fixture `operator_pre_flight_setup` (AZ-839); E2E orchestrator test (AZ-840) | NOT in PostgreSQL — transient pre-flight planning DTO. Fields: `waypoints: tuple[(lat, lon)]` (1..max_waypoints), `suggested_region_size_meters: float`, `source_tlog: Path`, `source_segment: (start_idx, end_idx)`, `total_distance_meters: float`. `frozen=True, slots=True`. |
+| `RouteSeedResult` (cycle 3 — `c11_tile_manager/route_client.py`, AZ-838) | C11 `SatelliteProviderRouteClient.seed_route` | cycle-3 e2e fixture `operator_pre_flight_setup` (AZ-839); seed CLI `tests/fixtures/derkachi_c6/seed_route.py` | NOT in PostgreSQL — transient outcome DTO. Fields: `route_id: uuid`, `terminal_status: str`, `maps_ready: bool`, `tile_count: int`, `elapsed_ms: int`, `submitted_payload_sha256: str`. `frozen=True, slots=True`. |
+| `PopulatedC6Cache` (cycle 3 — `tests/e2e/replay/conftest.py`, AZ-839) | `operator_pre_flight_setup` fixture | replay e2e tests including `test_az835_e2e_real_flight.py` (AZ-840) and the AZ-699 verdict test | NOT in PostgreSQL — test-fixture-only DTO. Fields: `cache_root: Path`, `tile_store_path: Path`, `faiss_index_path: Path`, `faiss_sidecar_sha256_path: Path`, `faiss_sidecar_meta_path: Path`, `route_spec: RouteSpec`, `tile_count: int`, `elapsed_seconds: float`. Backed by a docker named volume that survives across pytest sessions in the same compose run. |

 ---

@@ -3,6 +3,18 @@
 > Date: 2026-05-09 (Plan Phase 2c — initial draft).
 > Inputs: `_docs/02_document/architecture.md` § 3 (Deployment Model); ADR-002 (build-time exclusion); ADR-005 (Tier-1 / Tier-2 are first-class); ADR-007 (`mock-suite-sat-service` is an e2e-test fixture; reversed 2026-05-09 from the earlier "real component boundary" framing).

+> **Test-execution policy update — 2026-05-20**: **all tests run on
+> Jetson only.** This Plan-phase document and ADR-005 are partially
+> superseded — Tier-1 (workstation Docker / GitHub-hosted x86) is no
+> longer used for ANY test stage (Lint, Unit, Integration, SBOM, Security
+> below). Only the build/push lanes for `companion-tier1` and
+> `operator-orchestrator` images may continue to run on x86 agents,
+> since those images are registry artefacts consumed downstream (operator
+> workstations). For the operative CI contract see
+> `_docs/04_deploy/ci_cd_pipeline.md`; for the test-environment policy
+> see `_docs/02_document/tests/environment.md` (the source of truth on
+> this decision).
+
 ## Pipeline Overview

 The pipeline has **two execution tiers** (architecture.md ADR-005), reflected in two CI runner pools that share the same workflow definitions but differ in runner labels and active job set:
@@ -19,7 +19,7 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
 6. The composition root is `src/gps_denied_onboard/runtime_root.py`. It is the ONLY place that may import concrete strategy implementations across components — every other cross-component dependency is constructor-injected against an interface (ADR-009).
 7. Tests mirror the component graph 1:1 at `tests/unit/<component>/`. In-process cross-component scenarios that import SUT source live under `tests/integration/`. The **blackbox / e2e** test harness — which MUST NOT import SUT source and exercises the system only via public boundaries (MAVLink / MSP2 / HTTP / filesystem) — lives at the repo-root `e2e/` directory and is owned by the `blackbox_tests` cross-cutting entry (Shared section). Performance, resilience, security, and resource-limit scenarios that are also boundary-driven likewise live under `e2e/tests/<category>/`; only in-process performance/security micro-tests (if any) would live under `tests/perf/`, `tests/security/`, `tests/resilience/`.
 8. Build-time exclusion (ADR-002): each `<component>/_native/` and the corresponding `cpp/<lib>/` carry a CMake `BUILD_<NAME>` flag. The composition root validator refuses to wire a strategy whose flag is OFF.
-9. **AZ-507 cross-component contract surface** — the only places a `components/<X>/*.py` file may import are: its own subpackage (`gps_denied_onboard.components.<X>.*`), `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only). Cross-component contracts (Protocols + typed exceptions) reach consumers through `_types/*` modules — DTOs in the canonical `_types` files (e.g. `_types.inference.EngineCacheEntry`), typed-error envelopes in `_types.inference_errors`, and consumer-side structural `Protocol` cuts defined locally inside each consuming component (e.g. `c10_provisioning.engine_compiler.CompileEngineCallable`). NEVER `from gps_denied_onboard.components.<other_component> import ...` — the AZ-270 `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies` lint enforces this on every `components/**/*.py`. The composition root (`runtime_root/*`) is the single exception; it wires concrete strategies into duck-typed Protocol parameters via constructor injection. This rule is the architectural contract paired with the AZ-270 lint; see `architecture.md` § Cross-Component Contract Surface for the rationale.
+9. **AZ-507 cross-component contract surface** — the only places a `components/<X>/*.py` file may import are: its own subpackage (`gps_denied_onboard.components.<X>.*`), `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only). Cross-component contracts (Protocols + typed exceptions) reach consumers through `_types/*` modules — DTOs in the canonical `_types` files (e.g. `_types.inference.EngineCacheEntry`), typed-error envelopes in `_types.inference_errors`, and consumer-side structural `Protocol` cuts defined locally inside each consuming component (e.g. `c10_provisioning.engine_compiler.CompileEngineCallable`). NEVER `from gps_denied_onboard.components.<other_component> import ...` — the AZ-270 `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies` lint enforces this on every `components/**/*.py`. The composition root (`runtime_root/*`) is the single exception; it wires concrete strategies into duck-typed Protocol parameters via constructor injection. Two narrow carve-outs apply to the lint's enforcement on `components/**/*.py` (AZ-847, source of truth: docstring of `tests/unit/test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies`): (a) **bench exclusion** — `components/<X>/bench/**` files are skipped entirely, since benchmark / measurement code legitimately constructs production strategies via `runtime_root.*` factories (that is its job); (b) **self-registration carve-out** — an `ImportFrom` whose module starts with `gps_denied_onboard.runtime_root.` AND every imported name starts with `register_` is allowed (the registry pattern, e.g. `c5_state.gtsam_isam2_estimator.register` calling `runtime_root.state_factory.register_state_estimator`). Any other import from `runtime_root.*` inside a Layer-3 component (e.g. `build_*` factories, `compose_root`, `StrategyNotLinkedError`) remains a violation. This rule is the architectural contract paired with the AZ-270 lint; see `architecture.md` § Cross-Component Contract Surface for the rationale.
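The two carve-outs in rule 9 can be sketched with the standard `ast` module. This is not the AZ-270 lint itself: `runtime_root_violations` and `is_bench_file` are hypothetical names, the sketch covers only `from ... import ...` statements that target `runtime_root`, and it ignores plain `import` statements and the cross-component check.

```python
import ast
from typing import List

RUNTIME_ROOT = "gps_denied_onboard.runtime_root"


def is_bench_file(path: str) -> bool:
    """True for paths of the form components/<X>/bench/** (carve-out a)."""
    parts = path.replace("\\", "/").split("/")
    return any(
        parts[i] == "components" and parts[i + 2] == "bench"
        for i in range(len(parts) - 3)
    )


def runtime_root_violations(source: str, path: str) -> List[str]:
    """List runtime_root imports in `source` that neither carve-out permits."""
    if is_bench_file(path):
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.level != 0 or not node.module:
            continue
        targets_root = node.module == RUNTIME_ROOT or node.module.startswith(RUNTIME_ROOT + ".")
        if not targets_root:
            continue
        # Carve-out (b): submodule import where every name is a register_* hook.
        self_registration = node.module.startswith(RUNTIME_ROOT + ".") and all(
            alias.name.startswith("register_") for alias in node.names
        )
        if not self_registration:
            names = ", ".join(alias.name for alias in node.names)
            violations.append(f"{path}:{node.lineno}: from {node.module} import {names}")
    return violations
```

A statement that mixes a `register_*` name with any other name is still reported, which matches the "every imported name" wording of the carve-out.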

 ## Per-Component Mapping

@@ -197,6 +197,7 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
  - `msp2_inav_adapter.py` (iNav via MSP2)
  - `mavlink_gcs_adapter.py` (1–2 Hz downsampled summary to QGroundControl)
  - `tlog_replay_adapter.py` (replay-mode `FcAdapter`; gated `BUILD_TLOG_REPLAY_ADAPTER`; ON in airborne per ADR-011; AZ-265)
+  - `csv_replay_adapter.py` (`CsvReplayFcAdapter` — outbound shim for the AZ-894 CSV-driven replay path; same `FcAdapter` Protocol parity as `tlog_replay_adapter`; gated `BUILD_TLOG_REPLAY_ADAPTER` for the airborne replay binary; AZ-894)
  - `replay_sink.py` (`ReplaySink` interface + `JsonlReplaySink` impl; gated `BUILD_REPLAY_SINK_JSONL`; ON in airborne per ADR-011; AZ-265)
  - `noop_mavlink_transport.py` (`NoopMavlinkTransport` for replay-mode outbound bytes; gated `BUILD_REPLAY_SINK_JSONL`; ON in airborne; AZ-265 / AZ-400)
  - `serial_mavlink_transport.py` (`SerialMavlinkTransport` retrofit of the existing live-mode UART transport; AZ-265 / AZ-400 no-op restructure)
@@ -233,8 +234,14 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
  - `__init__.py` (re-exports `TileDownloader`, `TileUploader`)
  - `interface.py` (`TileDownloader`, `TileUploader` Protocols)
 - **Internal**:
-  - `satellite_provider_downloader.py` (REST client against parent-suite `satellite-provider`)
-  - `satellite_provider_uploader.py` (post-landing batch upload, D-PROJ-2 ingest contract)
+  - `_types.py` (component-internal DTOs / enums consumed by the Public API re-exports)
+  - `config.py` (`C11Config` + `C11RetryConfig` dataclasses; registered on import)
+  - `errors.py` (component error family — `TileManagerError` + Tile* subtypes; AZ-838-era additions: `SatelliteProviderRouteError`, `RouteValidationError`, `RouteTransientError`, `RouteTerminalFailureError`)
+  - `idempotent_retry.py` (AZ-320 — bounded retry decorator + per-flight signing-key state)
+  - `route_client.py` (AZ-838 — `SatelliteProviderRouteClient` for the parent-suite Route API; cycle-3 NEW from batch 107)
+  - `signing_key.py` (AZ-318 — per-flight MAVLink 2.0 signing key handshake + key rotation)
+  - `tile_downloader.py` (AZ-316 — REST client against parent-suite `satellite-provider`)
+  - `tile_uploader.py` (AZ-319 — post-landing batch upload, D-PROJ-2 ingest contract)
 - **Owns**: `src/gps_denied_onboard/components/c11_tile_manager/**`, `tests/unit/c11_tile_manager/**`
 - **Imports from**: `_types`, `helpers.sha256_sidecar`, `helpers.wgs_converter`, `config`, `logging`, `fdr_client`. The c6 storage surface (`TileStore`, `TileMetadataStore`) is obtained via constructor-injected consumer-side structural Protocol cuts (see AZ-507 cross-component rule below); composition root wires the concrete c6 strategy in. NEVER `from gps_denied_onboard.components.c6_tile_cache import ...` inside `c11_tile_manager/*.py`.
 - **Consumed by**: `c12_operator_orchestrator`, `runtime_root` (operator binary only — `BUILD_C11_TILE_MANAGER=OFF` for airborne)
@@ -274,9 +281,11 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
 ### shared/_types

 - **Directory**: `src/gps_denied_onboard/_types/`
-- **Purpose**: Cross-component DTOs (NavCameraFrame, ImuSample, ImuWindow, AttitudeWindow, FlightStateSignal, GpsHealth, VioOutput, VprQuery, VprResult, RerankResult, MatchResult, PoseEstimate, EstimatorOutput, EstimatorHealth, Tile, TileQualityMetadata, TileRecord, SectorClassification, CameraCalibration, EmittedExternalPosition, Manifest, EngineCacheEntry). **Type-only stubs**: zero implementation logic.
+- **Purpose**: Cross-component DTOs (NavCameraFrame, ImuSample, ImuWindow, AttitudeWindow, FlightStateSignal, GpsHealth, VioOutput, VprQuery, VprResult, RerankResult, MatchResult, PoseEstimate, EstimatorOutput, EstimatorHealth, Tile, TileQualityMetadata, TileRecord, SectorClassification, CameraCalibration, EmittedExternalPosition, Manifest, EngineCacheEntry, RouteSpec). **Type-only stubs**: zero implementation logic.
 - **Owned by**: AZ-263 (Bootstrap task); subsequent additions are type-only edits owned by the proposing component task.
 - **Consumed by**: every component, every cross-cutting module, the composition root.
+- **Files (selected — see directory for full list)**:
+  - `route.py` (AZ-845 / Epic AZ-835 C1 — canonical home): `RouteSpec` (waypoints + suggested region size + source tlog provenance), produced by `replay_input/tlog_route.py` (re-exported), consumed by `components/c11_tile_manager/route_client.py`

 ### shared/config

@@ -386,11 +395,15 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
 ### shared/replay_input

 - **Directory**: `src/gps_denied_onboard/replay_input/`
-- **Purpose**: Layer-4 cross-cutting coordinator that converges `(video, tlog)` inputs into the standard `FrameSource` + `FcAdapter` + `Clock` surfaces the airborne composition root consumes. Owns the time-alignment between video frames and tlog IMU/attitude ticks (manual via `--time-offset-ms` or automatic via the AZ-405 IMU-take-off detector). The composition root, in replay mode, builds a `ReplayInputAdapter`, calls `.open()`, and wires the returned `ReplayInputBundle` into the same C1–C5 pipeline as live. New under ADR-011 (replaces the v1.0.0 design where replay was a separate composition root).
-  - `__init__.py` (re-exports `ReplayInputAdapter`, `ReplayInputBundle`, `AutoSyncDecision`, `AutoSyncConfig`)
-  - `interface.py` (`ReplayInputAdapter` class declaration + `ReplayInputBundle` DTO)
-  - `tlog_video_adapter.py` (concrete `ReplayInputAdapter` that instantiates `VideoFileFrameSource` + `TlogReplayFcAdapter` + chosen `Clock`)
-  - `auto_sync.py` (AZ-405 IMU-take-off / video-motion-onset detectors + combined offset computation + AC-8 frame-window-match validator)
+- **Purpose**: Layer-4 cross-cutting coordinator. Under AZ-894 the production replay pipeline drives off the operator's IMU+GPS CSV via `CsvReplayFcAdapter`. The legacy `(video, tlog)` auto-sync surface was deprecated by AZ-895 and will be physically removed by AZ-908. The composition root, in replay mode, builds the CSV bundle (frame source + CSV FC adapter + clock) and wires the returned `ReplayInputBundle` into the same C1–C5 pipeline as live. New under ADR-011 (replaces the v1.0.0 design where replay was a separate composition root).
+  - `__init__.py` (re-exports `ReplayInputAdapter`, `ReplayInputBundle`, `AutoSyncDecision`, `AutoSyncConfig`, `ReplayInputAdapterError`, `CsvGpsFix`, `CsvGroundTruth`, `load_csv_ground_truth`, plus the AZ-697 / AZ-836 surfaces: `TlogGpsFix`, `TlogGroundTruth`, `load_tlog_ground_truth`, `RouteSpec`, `RouteExtractionError`, `extract_route_from_tlog`)
+  - `csv_ground_truth.py` (AZ-894 — `load_csv_ground_truth` + `CsvGpsFix` / `CsvGroundTruth`; the canonical replay ground-truth surface)
+  - `interface.py` (`ReplayInputAdapter` class declaration + `ReplayInputBundle` DTO + `AlignedWindow` / `AutoSyncConfig` / `AutoSyncDecision` DTOs — the auto-sync DTOs are deprecated by AZ-895 and slated for removal in AZ-908)
+  - `errors.py` (AZ-405 — `ReplayInputAdapterError` envelope; subclass of `RuntimeError` so the airborne main maps every coordinator-scope failure to CLI exit code 2)
+  - `tlog_video_adapter.py` — **DEPRECATED (AZ-895)**: `ReplayInputAdapter.open()` raises `ReplayInputAdapterError`; retained as an import-stable stub for one cycle. AZ-908 removes it.
+  - `auto_sync.py` — **DEPRECATED (AZ-895)**: every detector + validator raises `ReplayInputAdapterError`; retained as an import-stable stub for one cycle. AZ-908 removes it.
+  - `tlog_ground_truth.py` (AZ-697 — `load_tlog_ground_truth` + `TlogGpsFix` / `TlogGroundTruth` for direct binary tlog GPS-truth extraction; AUDIT-ONLY after AZ-895, retained for the AZ-699 / AZ-701 validation paths against legacy `.tlog` archives)
+  - `tlog_route.py` (AZ-836 — `extract_route_from_tlog` + `RouteExtractionError`; re-exports `RouteSpec` from `_types.route`. Reduces a tlog to a coarsened route via Douglas-Peucker on local ENU; consumed by `c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route`)
  - `tests/`
 - **Owned by**: AZ-265 (E-DEMO-REPLAY) — task AZ-405 (auto-sync + coordinator).
 - **Consumed by**: `runtime_root` (replay-mode branch of `compose_root`); `cli/replay.py`. Layer-4 module: imports from Layer 1 (`frame_source` interface, `clock` interface, `_types`, `config`, `logging`, `fdr_client`, `helpers.wgs_converter`) and instantiates Layer-4 strategies (`c8_fc_adapter.tlog_replay_adapter`, `frame_source.video_file_frame_source`). Does NOT import from Layer 3 (no component-level dependencies).
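The "Douglas-Peucker on local ENU" reduction mentioned for `tlog_route.py` can be sketched as follows. This is a generic illustration, not the project's `extract_route_from_tlog`: the helper names are hypothetical, the projection is a simple equirectangular approximation around the first waypoint, and the sketch uses a distance tolerance `epsilon_m` rather than the `max_waypoints` cap that the real function accepts.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0
LatLon = Tuple[float, float]
EastNorth = Tuple[float, float]


def to_local_en(points: Sequence[LatLon], origin: LatLon) -> List[EastNorth]:
    """Project lat/lon degrees to metres east/north of `origin` (small-area approximation)."""
    lat0, lon0 = math.radians(origin[0]), math.radians(origin[1])
    return [
        ((math.radians(lon) - lon0) * math.cos(lat0) * EARTH_RADIUS_M,
         (math.radians(lat) - lat0) * EARTH_RADIUS_M)
        for lat, lon in points
    ]


def _line_distance(p: EastNorth, a: EastNorth, b: EastNorth) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def douglas_peucker(points: Sequence[EastNorth], epsilon_m: float) -> List[int]:
    """Return the sorted indices of the points kept by Douglas-Peucker."""
    if len(points) < 3:
        return list(range(len(points)))
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:  # iterative form avoids recursion limits on long tracks
        first, last = stack.pop()
        worst_d, worst_i = 0.0, -1
        for i in range(first + 1, last):
            d = _line_distance(points[i], points[first], points[last])
            if d > worst_d:
                worst_d, worst_i = d, i
        if worst_i != -1 and worst_d > epsilon_m:
            keep.add(worst_i)
            stack.append((first, worst_i))
            stack.append((worst_i, last))
    return sorted(keep)


def simplify_route(waypoints: Sequence[LatLon], epsilon_m: float) -> List[LatLon]:
    """Coarsen a lat/lon track, always keeping the first and last points."""
    if not waypoints:
        return []
    en = to_local_en(waypoints, waypoints[0])
    return [waypoints[i] for i in douglas_peucker(en, epsilon_m)]
```

Working in metres rather than degrees lets the tolerance be stated as a physical distance, which is the usual reason for projecting to a local frame first.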
@@ -0,0 +1,62 @@
+# Ripple Log — Cycle 3 (End-of-Cycle Documentation Sync)
+
+> Produced as part of existing-code flow Step 13 (Update Docs, document skill Task mode).
+> Source: `_docs/_autodev_state.md` (`cycle: 3`).
+> Date: 2026-05-26.
+
+## Input set
+
+The 8 task specs in `_docs/02_tasks/done/` whose mtime falls inside cycle 3
+(2026-05-22 .. 2026-05-23):
+
+| Task | Title | Surface |
+|------|-------|---------|
+| AZ-836 | TlogRouteExtractor (Epic AZ-835 C1) | NEW `replay_input/tlog_route.py`, NEW `_types/route.py` (RouteSpec) |
+| AZ-838 | SatelliteProviderRouteClient + `seed_route.py` CLI (Epic AZ-835 C2) | NEW `components/c11_tile_manager/route_client.py`, NEW `tests/fixtures/derkachi_c6/seed_route.py` |
+| AZ-839 | `operator_pre_flight_setup` real fixture (Epic AZ-835 C3) | REWRITE `tests/e2e/replay/conftest.py::operator_pre_flight_setup`, NEW `PopulatedC6Cache` |
+| AZ-840 | E2E orchestrator test (Epic AZ-835 C4) | NEW `tests/e2e/replay/test_az835_e2e_real_flight.py` |
+| AZ-777 | Derkachi C6 reference fixture (Phases 1+2; Phases 3–5 superseded by AZ-839/AZ-841/AZ-842) | MODIFY `c11_tile_manager/tile_downloader.py` (inventory + slippy-map paths), `docker-compose.test.jetson.yml`, `.env.test.example`; NEW `tests/fixtures/derkachi_c6/{seed_region.py,bbox.yaml,README.md}`, NEW `tests/e2e/satellite_provider/test_smoke.py` |
+| AZ-845 | Relocate RouteSpec → `_types/route.py` (refactor 02 anchor) | NEW `_types/route.py`; MODIFY `replay_input/tlog_route.py`, `replay_input/__init__.py`, `components/c11_tile_manager/route_client.py` import |
+| AZ-846 | Refresh `module-layout.md` cycle-3 entries (refactor 02) | MODIFY `_docs/02_document/module-layout.md` ONLY |
+| AZ-847 | Widen AZ-270 lint to enforce full rule-9 allow-list (refactor 02) | MODIFY `tests/unit/test_az270_compose_root.py` ONLY |
+
+## Task Step 0.5 — Import-graph ripple
+
+Reverse-dependency scan for the 4 production source changes:
+
+| Changed file | Importers (production source) | Affected components |
+|--------------|------------------------------|---------------------|
+| `_types/route.py` (NEW) | `replay_input/tlog_route.py`, `replay_input/__init__.py` (re-export), `components/c11_tile_manager/route_client.py`, `components/c11_tile_manager/__init__.py` (re-export) | c11_tile_manager, shared/replay_input, shared/_types |
+| `replay_input/tlog_route.py` (NEW) | `replay_input/__init__.py` (re-export) | shared/replay_input |
+| `components/c11_tile_manager/route_client.py` (NEW) | `components/c11_tile_manager/__init__.py` (re-export) | c11_tile_manager |
+| `components/c11_tile_manager/tile_downloader.py` (MODIFIED — `_INVENTORY_PATH`, `_TILES_PATH`, default per-tile byte estimate) | `runtime_root/c11_factory.py::build_tile_downloader` (constructor unchanged; endpoint constants are module-internal) | c11_tile_manager |
+
+No surprise ripple to other components. All edges land inside `c11_tile_manager` + shared (`_types/`, `replay_input/`), which is consistent with the AZ-507 cross-component allow-list (AZ-845 fixes the previous violation; AZ-846 registers the new files; AZ-847 widens the lint to keep it that way).
+
+## Refresh set for Task Steps 1–4
+
+| Update level | This cycle's refresh set | Status |
+|--------------|-------------------------|--------|
+| Task Step 1 — Module docs | This project's Plan uses component-level granularity; no `_docs/02_document/modules/` folder. Authoritative module-ownership lives in `_docs/02_document/module-layout.md`. | Already refreshed by AZ-846 — sections `c11_tile_manager Internal`, `shared/replay_input`, `_types/` updated to register `route_client.py`, `tlog_route.py`, `route.py`. No further action. |
+| Task Step 2 — Component docs | `components/12_c11_tilemanager/description.md` (3rd interface + endpoint adaptation), `contracts/c11_tilemanager/tile_downloader.md` (endpoint paths), `contracts/c11_tilemanager/route_client.md` (NEW). | Updated this session. |
+| Task Step 3 — System-level docs | `architecture.md` § 5 satellite-provider sub-section (inventory contract + route-driven seeding); `data_model.md` register `RouteSpec` / `RouteSeedResult` / `PopulatedC6Cache` DTOs; `system-flows.md` F1 pre-flight cache build (route-driven variant); `contracts/replay/replay_protocol.md` Invariant 12 sub-section for AZ-839 / AZ-840. | Updated this session. |
+| Task Step 4 — Problem-level docs | `_docs/00_problem/input_data/flight_derkachi/README.md` (point at `tests/fixtures/derkachi_c6/` + license attribution). No AC / restriction / data_parameters drift this cycle. | Updated this session. |
+
+## Files actually changed this session
+
+- `_docs/02_document/components/12_c11_tilemanager/description.md` — add `SatelliteProviderRouteClient` as a third C11 interface; update `TileDownloader` external API rows to the inventory + slippy-map contract; add a Cycle-3 callout to § 1 Overview.
+- `_docs/02_document/contracts/c11_tilemanager/tile_downloader.md` — replace the `GET /api/satellite/tiles?bbox=…&zoom=…` row with the inventory-POST + slippy-map-GET row pair; bump version.
+- `_docs/02_document/contracts/c11_tilemanager/route_client.md` — NEW contract for `SatelliteProviderRouteClient.seed_route`.
+- `_docs/02_document/contracts/replay/replay_protocol.md` — append AZ-839 / AZ-840 sub-section to Invariant 12 covering the route-driven `operator_pre_flight_setup` fixture + `PopulatedC6Cache`.
+- `_docs/02_document/architecture.md` — append a Cycle-3 sub-section to § 5 satellite-provider integration noting the actual inventory-based read path + the route-driven seeding flow (no new ADR).
+- `_docs/02_document/data_model.md` — register `RouteSpec`, `RouteSeedResult`, `PopulatedC6Cache` as cross-component DTOs.
+- `_docs/02_document/system-flows.md` — extend F1 (pre-flight cache build) with the route-driven variant (tlog → RouteSpec → satellite-provider Route API → populated C6 via inventory + slippy-map).
+- `_docs/00_problem/input_data/flight_derkachi/README.md` — append "Derkachi C6 reference seeding" section pointing at `tests/fixtures/derkachi_c6/{seed_region.py,seed_route.py,bbox.yaml,README.md}` + the license-attribution caveat for Google Maps imagery.
+- `_docs/02_document/ripple_log_cycle3.md` — this file (NEW).
+- `_docs/_autodev_state.md` — sub_step progression through Step 13 task phases.
+
+## Out of scope (carried)
+
+- `tests/` doc updates beyond what Step 12 already produced (`_docs/02_document/tests/blackbox-tests.md`, `traceability-matrix.md` — modified by Step 12 in this cycle). Test-spec sync owns those.
+- Cycle-2 doc carry-overs OUTSIDE the three `module-layout.md` sections AZ-846 touched (`replay_api/` Per-Component Mapping entry, `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`). Tracked in cycle-3 retrospective; require a separate follow-up doc task with its own AZ ID.
+- Untracked `_docs/02_document/system-overview.md` (created 2026-05-24 outside the cycle-3 task surface). Reviewed; content is accurate at the abstraction level it presents; no edit required.
@@ -19,6 +19,7 @@
 | F8 | Companion reboot recovery | Companion process restart while FC remains armed | C8 (FC IMU pose ingest), C5, C10 (warm-cache verify), C13 | Medium |
 | F9 | GCS telemetry stream | Per-frame estimate available + GCS link healthy | C5, C8, [[QGroundControl]] | Medium |
 | F10 | Post-landing tile upload | Operator triggers C12 `PostLandingUploadOrchestrator`; orchestrator confirms `flight_footer.clean_shutdown == True` and invokes C11 `TileUploader` | C12 `PostLandingUploadOrchestrator` (operator-side; reads FDR footer), C11 `TileUploader` (operator-side), C6 (read), [[`satellite-provider`]] (D-PROJ-2 endpoint, planned) | High |
+| F11 | Demo replay validation (operator) | Operator uploads `(video, tlog, calibration)` in suite UI; aligns timelines; runs full GPS-denied replay verdict | [[`suite/ui`]] (AZ-897), `replay_api` (AZ-973), `replay_input` (AZ-970–972), C12 `seed-cache-from-tlog` (AZ-974), C11 route seed, C10, airborne replay (`config.mode=replay`) | High |

 ## Flow Dependencies

@@ -34,6 +35,7 @@
 | F8 | F1 + F2 (warm cache survives reboot via content-hash verify) | F3 (resumes once warm), F5 (degraded mode if recovery fails) |
 | F9 | F3 | n/a (read-only outbound) |
 | F10 | F4 (locally-saved tiles), C13 `flight_footer` written on clean shutdown, parent-suite D-PROJ-2 endpoint availability | F1 of the next flight (uploaded tiles enter the basemap once promoted to `trusted`) |
+| F11 | F1 route-driven variant (AZ-974) OR warm cache; E-DEMO-REPLAY (AZ-265) | F1 (corridor cache), replay JSONL + map artifacts consumed by suite UI |

 **Cross-cutting**: F13 FDR-write is not a flow per se — every flow above has an FDR write side-effect. AC-NEW-3 requires every payload class (estimate, IMU, MAVLink, mid-flight tile, system health, failed-tile thumbnail) to be present; rollover is logged, never silent.

@@ -46,11 +48,25 @@
 The operator builds (or refreshes) the per-mission cache before takeoff. F1 has **three phases** sequenced by C12 OperatorTool:

 - **Phase 0 — Flight resolve (C12 `FlightsApiClient`, AZ-489)**: read the operator-authored `Flight` (ordered waypoints + altitudes) either from the parent-suite `flights` REST service (`--flight-id <Guid>`) or from a local JSON export (`--flight-file <path>`). Compute the bounding box as the envelope of waypoint lat/lon plus a configurable buffer (default 1 km). Extract `Flight.waypoints[0].(lat, lon, alt)` as the **takeoff origin**. Both are passed downstream as `BuildRequest` fields.
-- **Phase 1 — Tile download (C11 `TileDownloader`)**: fetch tiles from `satellite-provider` for the bbox computed in Phase 0; apply sector-classified freshness rules (AC-NEW-6) and resolution gate (RESTRICT-SAT-4); write tile rows + JPEGs into C6.
+- **Phase 1 — Tile download (C11 `TileDownloader` — bbox-driven, production path)**: fetch tiles from `satellite-provider` for the bbox computed in Phase 0 via `POST /api/satellite/tiles/inventory` (bulk lookup of `(z,x,y)` coords per `tile-inventory.md` v1.0.0 / AZ-505) + `GET /tiles/{z}/{x}/{y}` (slippy-map JPEG fetch for inventory entries with `present=true`); apply sector-classified freshness rules (AC-NEW-6) and resolution gate (RESTRICT-SAT-4); write tile rows + JPEGs into C6. Auth: JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) over TLS; dev-only `SATELLITE_PROVIDER_TLS_INSECURE=1` accepts self-signed certs.
 - **Phase 2 — Cache artifact build (C10 CacheProvisioner)**: read the populated C6 store; compile/deserialize TRT engines via C7; batch-generate descriptors via the C2 backbone; atomically write the FAISS HNSW index with SHA-256 sidecars; write the Manifest hashing model + calibration + corpus + sector classification **+ takeoff origin** (D-C10-1 idempotence; ADR-010).

 This flow is offline and not time-critical. **Only Phase 0 reaches `flights` REST and Phase 1 reaches `satellite-provider`** — both run on the operator workstation, which is the only host that holds TLS + service-internal credentials. The companion never reaches either service directly (Principle #9 — denied-environment operation).
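
As an illustration of the Phase 0 geometry described above, here is a minimal sketch of deriving the bbox and takeoff origin from a waypoint list. The `Waypoint` type, the function names, and the flat metres-per-degree approximation are assumptions made for this example; they are not taken from `FlightsApiClient`.

```python
import math
from dataclasses import dataclass

# Approximate metres per degree of latitude (assumption for this sketch).
METERS_PER_DEG_LAT = 111_320.0


@dataclass(frozen=True)
class Waypoint:
    """Hypothetical waypoint shape: latitude, longitude, altitude."""
    lat: float
    lon: float
    alt: float


def bbox_with_buffer(waypoints, buffer_m=1000.0):
    """Envelope of waypoint lat/lon, grown by buffer_m on every side.

    Returns (min_lat, min_lon, max_lat, max_lon). Not valid near the poles,
    where the longitude scale factor approaches zero.
    """
    if not waypoints:
        raise ValueError("flight has zero waypoints; cannot derive bbox")
    lats = [w.lat for w in waypoints]
    lons = [w.lon for w in waypoints]
    mid_lat = (min(lats) + max(lats)) / 2.0
    dlat = buffer_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude).
    dlon = buffer_m / (METERS_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    return (min(lats) - dlat, min(lons) - dlon, max(lats) + dlat, max(lons) + dlon)


def takeoff_origin(waypoints):
    """Takeoff origin is the first waypoint's (lat, lon, alt)."""
    if not waypoints:
        raise ValueError("flight has zero waypoints; cannot derive takeoff origin")
    first = waypoints[0]
    return (first.lat, first.lon, first.alt)
```

The empty-list check mirrors the "Flight has zero waypoints" failure row: the sketch fails explicitly rather than returning a degenerate bbox.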

+#### Phase 1 variant — route-driven seeding (cycle 3 — Epic AZ-835 / AZ-836 + AZ-838 + AZ-839)
+
+A tlog-driven alternative to bbox download lets the operator pre-commit the cache to the precise corridor the drone actually flew. **Production bindings** (Epic AZ-969): C12 `seed-cache-from-tlog` (AZ-974) and the `replay_api` demo job (AZ-973) call the same `operator_replay.cache_seed` module. The e2e fixture `operator_pre_flight_setup` (AZ-839) is a thin wrapper over that production path — not a parallel implementation.
+
+Phase-1 sub-steps in the route-driven variant (replaces the bbox download for that invocation):
+
+1. **Extract corridor from tlog** — `replay_input.tlog_route.extract_route_from_tlog(tlog, *, max_waypoints=10)` (AZ-836). Trims pre-takeoff stationary frames, then coarsens the GPS trace to ≤ `max_waypoints` waypoints via Douglas-Peucker in WGS-84 with great-circle distance. Returns a `RouteSpec(waypoints, suggested_region_size_meters, source_tlog, source_segment, total_distance_meters)` — frozen+slots; canonical home `_types/route.py` (AZ-845).
+2. **Submit to satellite-provider** — `c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route(spec)` (AZ-838). Pre-emptively validates against the AZ-809 `CreateRouteRequestValidator` bounds (`points` 2..500; `regionSizeMeters` 100..10 000; `zoomLevel` 0..22; lat/lon ranges) BEFORE the HTTP POST. Then POSTs `/api/satellite/route` with `requestMaps=true&createTilesZip=false` and polls `GET /api/satellite/route/{id}` every 5 s × ≤ 60 attempts until `mapsReady=true` (terminal-success) or a terminal-failure status (`{failed, error, rejected}`). Returns a `RouteSeedResult(route_id, terminal_status, maps_ready, tile_count, elapsed_ms, submitted_payload_sha256)`.
+3. **Populate C6 via C11** — enumerate the route's tile coverage locally from `(waypoints, suggested_region_size_meters)`; invoke `tile_downloader.HttpTileDownloader.download_for_bbox` (existing C11 download path) to pull every corridor tile into C6.
+4. **Build FAISS index via C10** — run `DescriptorBatcher` against the populated C6 using the NetVLAD backbone (per `c2_vpr/config.py:67` default); verify sidecar triple-consistency (`.index` + `.sha256` + `.meta.json`) per AZ-306; a mismatch raises `IndexUnavailableError`.
+5. **Yield `PopulatedC6Cache`** — `(cache_root, tile_store_path, faiss_index_path, faiss_sidecar_sha256_path, faiss_sidecar_meta_path, route_spec, tile_count, elapsed_seconds)`. Backed by a docker named volume that survives across pytest sessions in the same compose run.
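
The validate-then-poll behaviour in sub-step 2 can be sketched as follows. This is a hedged illustration, not the `SatelliteProviderRouteClient` code: the function names, the injected `fetch_status` callable, and the lat/lon ranges (standard WGS-84 limits) are assumptions. The numeric bounds, the terminal-failure statuses, the 5 s × 60 budget, and the exception names come from the text above and the failure-handling table.

```python
import time

# Terminal-failure statuses named in the flow description.
TERMINAL_FAILURES = {"failed", "error", "rejected"}


class RouteValidationError(ValueError):
    """Raised before any HTTP call when the request is out of bounds."""


class RouteTerminalFailureError(RuntimeError):
    """Raised when the server reports a terminal-failure status."""

    def __init__(self, detail):
        super().__init__(f"route materialisation failed: {detail!r}")
        self.detail = detail  # server response JSON


class RouteTransientError(RuntimeError):
    """Raised when the poll budget runs out without a terminal state."""


def validate_route(points, region_size_meters, zoom_level):
    """Client-side check mirroring the documented validator bounds."""
    errors = []
    if not 2 <= len(points) <= 500:
        errors.append("points: expected 2..500 entries")
    if not 100 <= region_size_meters <= 10_000:
        errors.append("regionSizeMeters: expected 100..10000")
    if not 0 <= zoom_level <= 22:
        errors.append("zoomLevel: expected 0..22")
    for i, (lat, lon) in enumerate(points):
        # Assumed ranges: standard WGS-84 latitude/longitude limits.
        if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
            errors.append(f"points[{i}]: lat/lon out of range")
    if errors:
        raise RouteValidationError("; ".join(errors))


def poll_until_ready(fetch_status, *, interval_s=5.0, max_attempts=60,
                     sleep=time.sleep):
    """Call fetch_status() until mapsReady is true or a terminal failure.

    fetch_status is a hypothetical callable returning the decoded JSON of
    GET /api/satellite/route/{id}.
    """
    last = None
    for attempt in range(max_attempts):
        last = fetch_status()
        if last.get("mapsReady") is True:
            return last
        if last.get("status") in TERMINAL_FAILURES:
            raise RouteTerminalFailureError(last)
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise RouteTransientError(f"poll budget exhausted; last status: {last!r}")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a network or real delays, which is one plausible reason to structure a client this way.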
+
+Cold-start budget on Tier-2 Jetson: ≤ 5 min (first invocation, full materialisation + descriptor batching); warm: ≤ 30 s (named-volume reuse).
+
 ### Preconditions

 - Operator workstation has network reach to `satellite-provider` (TLS + service-internal API key).
@@ -88,8 +104,10 @@ sequenceDiagram
    FlightsClient->>FlightsClient: takeoff_origin = waypoints[0].(lat, lon, alt)
    FlightsClient-->>C12OperatorTool: (bbox, takeoff_origin, flight_id)
    C12OperatorTool->>C11TileDownloader: download_tiles_for_area(bbox, zooms, sector_class)
-    C11TileDownloader->>SatelliteProvider: GET /api/satellite/tiles?bbox=&zoom=
-    SatelliteProvider-->>C11TileDownloader: Tile blobs + metadata (paged)
+    C11TileDownloader->>SatelliteProvider: POST /api/satellite/tiles/inventory (bulk z,x,y lookup)
+    SatelliteProvider-->>C11TileDownloader: per-entry present:true|false + metadata
+    C11TileDownloader->>SatelliteProvider: GET /tiles/{z}/{x}/{y} (one per present:true entry)
+    SatelliteProvider-->>C11TileDownloader: Tile JPEG body
    C11TileDownloader->>C11TileDownloader: filter by AC-NEW-6 freshness + RESTRICT-SAT-4 resolution
    C11TileDownloader->>C6TileStore: write tiles to ./tiles/{zoomLevel}/{x}/{y}.jpg + Postgres rows (source='googlemaps')
    C11TileDownloader-->>C12OperatorTool: DownloadBatchReport (counts, freshness summary)
@@ -114,7 +132,7 @@ flowchart TD
    FlightOk -->|yes| ComputeBbox[Compute bbox as envelope of waypoint lat/lon + buffer; take waypoints[0] as takeoff origin]
    ComputeBbox --> Classify[Operator classifies sector active_conflict OR stable_rear]
    Classify --> InvokeC11[C12 invokes C11 TileDownloader with computed bbox]
-    InvokeC11 --> Download[C11 GET /api/satellite/tiles for bbox + zoom]
+    InvokeC11 --> Download["C11 POST /api/satellite/tiles/inventory then GET /tiles/{z}/{x}/{y}"]
    Download --> FreshnessFilter{Freshness ok per AC-8.2 + AC-NEW-6?}
    FreshnessFilter -->|stale and stable_rear| RejectOrDowngrade[Reject or downgrade tile]
    FreshnessFilter -->|stale and active_conflict| RejectOrDowngrade
@@ -149,10 +167,16 @@ flowchart TD
 | 0d | C12 `FlightsApiClient` (offline) | filesystem | `flight_file` JSON in the same DTO shape | JSON read |
 | 0e | C12 `FlightsApiClient` | C12 | `(bbox, takeoff_origin, flight_id)` | in-process |
 | 1 | C12 | C11 `TileDownloader` | `DownloadRequest(bbox, zoom_levels, sector_class)` | in-process call |
-| 2 | C11 | `satellite-provider` REST | `GET /api/satellite/tiles?bbox=…&zoom=…` | HTTPS query |
-| 3 | `satellite-provider` | C11 | Paged tile blobs + metadata rows | JPEG + JSON metadata |
+| 2a | C11 | `satellite-provider` REST | `POST /api/satellite/tiles/inventory` (bulk `(z,x,y)` lookup, ≤ 5000 entries / request; per `tile-inventory.md` v1.0.0) | HTTPS POST JSON body |
+| 2b | `satellite-provider` | C11 | Per-entry `present: true \| false` + metadata when present | JSON response (order matches request order) |
+| 2c | C11 | `satellite-provider` REST | `GET /tiles/{z}/{x}/{y}` (issued only for `present=true` entries) | HTTPS GET |
+| 3 | `satellite-provider` | C11 | Tile JPEG body | binary JPEG |
 | 4 | C11 | C6 filesystem (over USB/Eth) | Tile JPEG bodies | `./tiles/{zoomLevel}/{x}/{y}.jpg` |
 | 5 | C11 | C6 PostgreSQL | Tile metadata rows (`source='googlemaps'`) | SQL INSERT (mirror of `satellite-provider`'s `tiles` table) |
+| 1' (route variant) | tlog file | `replay_input.tlog_route.extract_route_from_tlog` | `RouteSpec(waypoints, suggested_region_size_meters, …)` | in-process call |
+| 2' (route variant) | C11 `SatelliteProviderRouteClient` | `satellite-provider` REST | `POST /api/satellite/route` (`requestMaps=true`); then `GET /api/satellite/route/{id}` poll until `mapsReady=true` | HTTPS POST + repeated GET |
+| 3' (route variant) | C11 | enumerator | local enumeration of corridor `(z,x,y)` coords from `(waypoints, suggested_region_size_meters)` | in-process |
+| 4'+5' (route variant) | C11 | C6 | same as steps 4+5 above (downloads via the same inventory + slippy-map paths) | as above |
 | 6 | C12 | C10 `CacheProvisioner` | `BuildRequest(bbox, zoom_levels, sector_class, calibration_path, takeoff_origin, flight_id)` | in-process call (operator-orchestrator side); RPC over USB/Eth to companion runner |
 | 7 | C10 → C7 | TRT engine cache | TRT engines | `.engine` files keyed by `(SM, JP, TRT, precision)` (D-C10-7) |
 | 8 | C2 backbone (driven by C10) | C6 FAISS index | Descriptor matrix | `.index` (FAISS HNSW), atomicwrites, SHA-256 sidecar |
@@ -168,7 +192,11 @@ flowchart TD
 | Flight file malformed (offline path) | Step 0d | JSON parse failure / schema mismatch | Fail with line / field reference; instruct operator to re-export from Mission Planner UI; takeoff blocked |
 | Flight has zero waypoints | Step 0e | Post-fetch validation | Fail explicitly; cannot derive bbox or takeoff origin; takeoff blocked |
 | Flight bbox exceeds cache budget | Step 0e | Pre-Phase-1 bbox area vs AC-8.3 budget projection | Fail with budget delta; operator must re-plan a smaller route in Mission Planner UI; takeoff blocked |
-| `satellite-provider` unreachable | Step 2 | HTTP timeout / 5xx | C11 `TileDownloader` fails with explicit error; operator retries when network is available; takeoff blocked |
+| `satellite-provider` unreachable | Step 2a/2c (or 2' route variant) | HTTP timeout / 5xx | C11 `TileDownloader` / `SatelliteProviderRouteClient` fails with explicit error; operator retries when network is available; takeoff blocked |
+| `satellite-provider` JWT auth 401/403 | Step 2a/2c (or 2' route variant) | HTTP 401/403 | Fail with explicit error; instruct operator to refresh `SATELLITE_PROVIDER_API_KEY`; takeoff blocked. Never silently fall back to plaintext or unauthenticated requests |
+| Route validation fails (route variant) | Step 1'→2' | Pre-emptive client check against AZ-809 `CreateRouteRequestValidator` bounds | `RouteValidationError` raised BEFORE the HTTP POST; surface field-by-field errors to operator |
+| Route materialisation terminal failure (route variant) | Step 2' poll | `GET /api/satellite/route/{id}` returns `status ∈ {failed, error, rejected}` | `RouteTerminalFailureError` with `.detail` carrying the server response JSON; takeoff blocked |
+| Route poll budget exhausted (route variant) | Step 2' poll | 60 attempts × 5 s ceiling reached without `mapsReady=true` or terminal failure | `RouteTransientError` referencing the last observed status; operator may re-invoke or extend the poll budget |
 | Tile fails freshness | Step 3 (C11) | `tile.capture_timestamp` vs `sector_class` threshold | Reject (active_conflict) or downgrade-no-`satellite_anchored`-label (rear), per AC-NEW-6; counts surface in `DownloadBatchReport` |
 | Resolution below 0.5 m/px | Step 3 (C11) | Tile metadata GSD check (RESTRICT-SAT-4) | Reject; report; takeoff blocked |
 | Insufficient cache budget | Step 4 (C11) | Filesystem free-space check pre-write | Fail fast with explicit budget delta; no partial write |
@@ -1057,6 +1085,96 @@ flowchart TD

 ---

+## Flow F11: Demo replay validation (operator)
+
+### Description
+
+Post-flight **product demo** and **validation** flow. The operator uploads a nav-camera video and ArduPilot `.tlog` through the suite UI (AZ-897), visually aligns the two recordings on dual timeline bars, and runs the same airborne GPS-denied pipeline used in live flight — against a corridor cache seeded from the tlog GPS trace. Output: per-tick estimated positions (JSONL), accuracy map, and PASS/FAIL verdict against tlog ground truth (AZ-696 AC-3).
+
+This is **not** a test-harness shortcut. E2E tests (AZ-840) call the same `replay_api` orchestration (AZ-973) and `operator_replay.cache_seed` (AZ-974) as the UI.
+
+**Phases** (sequenced by `replay_api` demo job or manual CLI equivalents):
+
+1. **Preview** (AZ-970) — parse tlog IMU2 activity + video metadata for UI timelines.
+2. **Align** (AZ-897 + AZ-971) — operator sets a coarse offset; backend refines it via optical-flow + IMU cross-correlation.
+3. **Export** (AZ-972) — write AZ-896 canonical CSV with `Time=0` at aligned video frame 0 (single canonical clock for replay).
+4. **Seed cache** (AZ-974) — `extract_route_from_tlog` → `SatelliteProviderRouteClient.seed_route` → tile download → FAISS build (F1 route-driven variant).
+5. **Replay** — `gps-denied-replay --video … --imu aligned.csv` with `config.mode=replay`; C1–C5 identical to live.
+6. **Verdict** — horizontal-error distribution + map artifact returned to UI.
+
+Advanced bypass: operator may upload a pre-aligned `(video, CSV)` per AZ-959 without steps 1–3.
+
+### Preconditions
+
+- Operator workstation runs `replay_api` (docker-compose or native) with network to `satellite-provider`.
+- Camera calibration JSON for the flight's nav camera.
+- Tlog contains `SCALED_IMU2` (or `RAW_IMU`) and `GLOBAL_POSITION_INT` / `GPS_RAW_INT`.
+- Video covers the active flight segment after alignment.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Operator
+    participant UI as [[suite/ui]] AZ-897
+    participant API as replay_api AZ-973
+    participant Align as replay_input alignment AZ-971
+    participant Export as tlog_to_csv AZ-972
+    participant Seed as operator_replay cache_seed AZ-974
+    participant Sat as [[satellite-provider]]
+    participant Replay as gps-denied-replay
+    participant Pipeline as C1..C5 replay mode
+
+    Operator->>UI: upload video + tlog + calibration
+    UI->>API: POST /replay/preview
+    API-->>UI: video metadata + IMU2 activity timeline
+    Operator->>UI: drag video bar / refine
+    UI->>API: POST /replay/align/refine
+    API->>Align: refine_video_offset
+    Align-->>API: refined_offset_ms + confidence
+    API-->>UI: refined_offset_ms + confidence
+    Operator->>UI: Run demo
+    UI->>API: POST /replay/demo
+    API->>Export: export_aligned_csv
+    API->>Seed: extract_route + seed_route + FAISS
+    Seed->>Sat: POST /api/satellite/route
+    Sat-->>Seed: mapsReady
+    API->>Replay: subprocess --video --imu
+    Replay->>Pipeline: per-frame loop
+    Pipeline-->>API: results.jsonl
+    API-->>UI: map URL + verdict report
+```
+
+### Data flow
+
+| Step | From | To | Data | Format |
+|------|------|----|------|--------|
+| 1 | UI | replay_api | video + tlog multipart | HTTP |
+| 2 | replay_api | UI | timeline preview JSON | JSON |
+| 3 | UI | replay_api | `video_offset_ms` | JSON |
+| 4 | replay_api | disk | aligned `data_imu.csv` | AZ-896 CSV |
+| 5 | replay_api | satellite-provider | `RouteSpec` waypoints | JSON POST |
+| 6 | replay_api | airborne binary | video + CSV + cache config | subprocess |
+| 7 | replay_api | UI | JSONL path, map URL, verdict md | JSON job result |
+
+### Error scenarios
+
+| Error | Detection | Recovery |
+|-------|-----------|----------|
+| Missing IMU in tlog | preview 422 | Operator message; cannot align |
+| Refine hard-fail (< 95 % frame match) | align/refine response | Operator adjusts bar or aborts |
+| Route seed terminal failure | `RouteTerminalFailureError` | Job failed; operator retries |
+| ESKF divergence (no cache) | replay exit ≠ 0 | Ensure step 4 completed; check AZ-963 |
+
+### Performance expectations
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| Preview latency | p95 < 5 s | tlog parse + video probe |
+| Full demo (Derkachi) | ≤ 15 min cold | matches AZ-835 AC-7 |
+| Warm cache reuse | ≤ 30 s for the seed step | named volume / `cache_root` reuse |
+
+---
+
 ## Cross-cutting: FDR write side-effect

 Every flow above produces FDR records (per AC-NEW-3). The cross-cutting rules are:
@@ -0,0 +1,54 @@
+# System Overview Diagram
+
+> Date: 2026-05-24. Plain-English end-to-end view of the GPS-denied onboard pose estimation system, intended for onboarding and presentations. Detailed per-component decomposition lives in `architecture.md`; per-flow sequences in `system-flows.md`.
+
+**One-line goal**: when a drone's GPS is jammed or spoofed, give the flight controller a position fix derived from what the camera sees vs. a pre-loaded satellite map — with an honest accuracy number attached.
+
+```mermaid
+flowchart LR
+  subgraph BEFORE["Before flight — operator workstation"]
+    UI["Mission Planner<br/>(operator draws route)"] --> PREP["Pre-flight setup<br/>• download map tiles<br/>• build search index<br/>• mark takeoff point"]
+    SAT[("Satellite map service")] -. tiles .-> PREP
+  end
+
+  subgraph DURING["During flight — drone companion computer"]
+    CAM[/"Camera<br/>(3 Hz)"/] --> MOTION["Motion tracker<br/>(camera + IMU →<br/>frame-to-frame motion)"]
+    CAM --> MATCH["Map matcher<br/>(find where this frame is<br/>on the satellite map)"]
+    FC[/"Flight controller"/] -- "IMU 100–200 Hz" --> MOTION
+    FC -- "IMU 100–200 Hz" --> FUSE
+    MOTION --> FUSE
+    MATCH --> FUSE["State estimator<br/>(fuse motion + map +<br/>IMU into one position)"]
+    FUSE == "Position + accuracy<br/>+ how we got it" ==> FC
+    CACHE[("Cached map tiles<br/>read-only in flight")] --> MATCH
+  end
+
+  subgraph AFTER["After landing — operator workstation"]
+    UPLOAD["Upload new tiles<br/>captured in flight<br/>(only on clean landing)"]
+  end
+
+  PREP ==> DURING
+  PREP --> CACHE
+  DURING -. flight log .-> UPLOAD
+  UPLOAD -. tiles .-> SAT
+
+  classDef ext fill:#eef,stroke:#88a;
+  classDef store fill:#ffe,stroke:#aa6;
+  class UI,SAT,FC,CAM ext;
+  class CACHE store;
+```
+
+## How to read it in 30 seconds
+
+1. **Before flight** — the operator draws a route in the Mission Planner. The workstation downloads the satellite-map tiles that cover the route, builds a search index over them, and notes the takeoff point.
+2. **During flight** — the drone's camera produces a frame three times a second. Two things happen to each frame in parallel:
+   - The **motion tracker** combines the camera with the flight controller's IMU to estimate how the drone moved since the last frame.
+   - The **map matcher** compares the frame against the cached satellite tiles to find where on the map the drone currently is.
+3. The **state estimator** fuses both signals (plus raw IMU) into a single position estimate, attaches an honest accuracy number, and sends it to the flight controller — which uses it as a drop-in replacement for GPS.
+4. **After landing** — any new map tiles the drone captured during the flight get uploaded back to the satellite map service so the next mission has fresher data.
+
+## Why the picture is shaped this way (invariants worth defending)
+
+- **The drone never talks to the satellite map service in flight.** All tile downloads happen on the operator workstation before takeoff; all tile uploads happen on the operator workstation after landing. The airborne code physically cannot reach the network for tiles. (ADR-004 process isolation.)
+- **Two parallel branches feed the estimator.** Motion tracking (camera + IMU) and map matching (camera + cached tiles) are independent — neither depends on the other to produce a result. The estimator decides how to weigh them on every frame.
+- **The position emitted to the flight controller always carries an honest accuracy number and a provenance label** (`satellite_anchored` / `visual_propagated` / `dead_reckoned`). Overstating accuracy (reporting a smaller error than the estimator can justify) is treated as a defect, not a tuning knob.
+- **Post-landing upload only fires on a clean shutdown** (the flight log's footer record confirms it). If the system crashed or the drone went down hard, mid-flight tiles stay local until an operator triages them.
@@ -593,3 +593,123 @@ All tests run from the `e2e-runner` container against the SUT through public bou

 **Expected outcome**: All mid-flight tiles current-timestamped and fresh.
 **Max execution time**: 6 min.
+
+---
+
+### FT-P-20: Real-flight validation runner — honest verdict + Markdown accuracy report
+
+**Summary**: Runs the full `gps-denied-replay` against the **real** Derkachi binary tlog + flight video + AZ-702 factory-sheet camera calibration, computes the per-emission horizontal-error distribution, and writes a structured Markdown accuracy report. Produces a real PASS/FAIL for AC-3 where the AZ-404 test carries an `@xfail` mask.
+**Traces to**: AZ-699 AC-1..AC-3 (epic AZ-696 AC-3 — the 100 m / 80 % gate).
+**Category**: Position Accuracy
+
+**Preconditions**:
+- `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog` (real binary, multi-flight).
+- `_docs/00_problem/input_data/flight_derkachi/flight_derkachi.mp4` (real recording, > 1 MB; the placeholder used by AZ-404 does not satisfy this gate).
+- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` (AZ-702 calibration).
+- `gps-denied-replay` console-script installed.
+- `RUN_REPLAY_E2E=1` (matches the existing AZ-404 gate).
+
+**Input data**: the real `derkachi.tlog` covers up to three sorties; the AZ-698 segmenter with `--auto-trim` locates the matching flight automatically.
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Invoke `gps-denied-replay --auto-trim ...` with real fixtures | Subprocess exits 0 within the 15-min NFR budget |
+| 2 | Parse JSONL emissions; pair each with the nearest-in-time ground-truth row (binary-tlog GPS via AZ-697) | Distribution computed: count, mean, p50, p95, p99, threshold-hit share at 10/25/50/100 m |
+| 3 | Render the Markdown accuracy report and write `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md` | Report exists with header, run context, horizontal-error stats, threshold-hit table, and (when available) vertical-error stats |
+| 4 | Evaluate the AC-3 gate: ≥ 80 % within 100 m | Verdict is PASS or honest FAIL — no `@xfail` mask |
+| 5 | On FAIL, surface a failure message referencing the calibration acquisition method (factory-sheet / placeholder / unknown) and the residual budget | Operator can attribute the failure without re-reading the source |
+
+**Expected outcome**: PASS when the estimator meets the epic AC-3 gate; honest FAIL otherwise. The Markdown report is the durable artefact (consumed by the cycle retrospective and downstream tuning work).
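
To make the step 2 distribution and the step 4 gate concrete, here is a small sketch that summarises a list of horizontal errors and evaluates the 100 m / 80 % gate. The function names and the nearest-rank percentile method are assumptions for this example; the project's `helpers/accuracy_report.py` may compute percentiles differently.

```python
import math


def percentile_nearest_rank(sorted_values, q):
    """Nearest-rank percentile (assumed method) over a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(q / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_errors(errors_m, thresholds_m=(10, 25, 50, 100)):
    """Count, mean, p50/p95/p99, and the share of samples within each threshold."""
    if not errors_m:
        raise ValueError("no ground-truth pairings to summarize")
    ordered = sorted(errors_m)
    n = len(ordered)
    return {
        "count": n,
        "mean": sum(ordered) / n,
        "p50": percentile_nearest_rank(ordered, 50),
        "p95": percentile_nearest_rank(ordered, 95),
        "p99": percentile_nearest_rank(ordered, 99),
        # Hit share in percent; "within" is treated as <= threshold.
        "hit_share": {
            t: 100.0 * sum(1 for e in ordered if e <= t) / n
            for t in thresholds_m
        },
    }


def ac3_verdict(summary, threshold_m=100, required_share=80.0):
    """PASS when at least required_share percent of samples are within threshold_m."""
    return "PASS" if summary["hit_share"][threshold_m] >= required_share else "FAIL"
```

Keeping the verdict as a pure function of the summary means the report can always be written first, with PASS or FAIL decided from the same numbers the report shows.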
+
+**Max execution time**: 15 min (matches AZ-699 NFR for a single Tier-2 Jetson run).
+
+**Report artefact schema** (canonical, produced by `tests/e2e/replay/_report_writer.py`):
+
+```markdown
+# Real-flight validation — YYYY-MM-DD
+
+**Verdict**: PASS | FAIL (AC-3 gate: ≥ 80 % within 100 m)
+
+## Run context
+
+- Tlog: `<path>`
+- Video: `<path>`
+- Calibration acquisition method: factory-sheet | placeholder | unknown
+- Clip duration: <float> s
+- Emissions consumed: <int>
+- Ground-truth pairings: <int>
+
+## Horizontal error (metres)
+
+| Statistic | Value |
+| --------- | ----- |
+| Mean | <float> |
+| p50  | <float> |
+| p95  | <float> |
+| p99  | <float> |
+
+## Threshold-hit share
+
+| Threshold (m) | Hit share (%) |
+| ------------- | ------------- |
+| 10  | <float> |
+| 25  | <float> |
+| 50  | <float> |
+| 100 | <float> |
+
+## Vertical error (metres)
+
+| Statistic | Value |
+| --------- | ----- |
+| Mean    | <float> |
+| p50     | <float> |
+| p95     | <float> |
+| Samples | <int>   |
+```
+
+The Vertical-error section is replaced by `_No emissions carried a comparable altitude — vertical stats skipped._` when none of the JSONL rows carry an `alt_m` field comparable to the ground-truth altitude.
+
+**Skip semantics**: AZ-699 distinguishes between *missing-prerequisite skip* (cleanly skipped with the missing file's path) and *test-cannot-resolve mask* (`@xfail` — explicitly forbidden by AZ-699 AC-1). The AZ-404 1-min test's `@xfail` on AC-3 is unchanged (AZ-699 AC-4 is "add a new test, don't replace"); FT-P-20 is the honest counterpart that runs alongside it.
+
+---
+
+### FT-P-21: End-to-end orchestrator pipeline from `(tlog, video, calibration)` only
+
+**Summary**: Validates the full 7-step Epic AZ-835 pipeline — given only `(tlog, video, calibration)`, the system auto-extracts a `RouteSpec` (AZ-836), posts it to the real satellite-provider (AZ-838), builds the C6 FAISS index via the route-driven `operator_pre_flight_setup` fixture (AZ-839, supersedes the AZ-777 Phase 3 bbox-seeded placeholder), runs the airborne replay pipeline, and emits a horizontal-error verdict report. No operator hand-curation between steps. Closes the Epic AZ-835 narrative: "give it a tlog + video + calibration, and the system does everything else."
+**Traces to**: AZ-840 AC-1..AC-8 (epic AZ-835 narrative); supplementary product-AC coverage on AC-1.1, AC-1.2, AC-8.3 (pre-loaded cache realised from route, not bbox).
+**Category**: End-to-end Integration + Position Accuracy
+
+**Preconditions**:
+- Tier-2 Jetson with `@pytest.mark.tier2` + `RUN_REPLAY_E2E=1` env (explicit skip-reason naming the missing env var — no silent skip per AZ-840 AC-6).
+- Real `satellite-provider` + `satellite-provider-postgres` services running in `docker-compose.test.jetson.yml` (lineage AZ-688 / AZ-691 / AZ-692; cycle-3 AZ-777 Phase 1 adapted C11 to the real `POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` endpoints).
+- `tests/e2e/replay/conftest.py::operator_pre_flight_setup` from AZ-839 (route-driven C6 population, supersedes the AZ-777 Phase 3 placeholder).
+- `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog` + `flight_derkachi.mp4` (real binary + real video >1 MB).
+- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` (AZ-702 factory-sheet camera calibration).
+- `gps-denied-replay` console-script installed in the e2e-runner image (AZ-604).
+- AZ-776 (ESKF open-loop composition profile) has landed. AZ-848 (the Jetson `eskf_out_of_order` regression) currently blocks the heavy-AC path on Jetson, so FT-P-21 produces its first honest verdict only once AZ-848 lands.
+
+**Input data**: real `derkachi.tlog`, real `flight_derkachi.mp4`, factory-sheet camera calibration. AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)` derives the `RouteSpec` from the tlog itself; no operator authoring required.
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Active-flight cut + tlog/video sync via AZ-405's `tlog_video_adapter` | Active segment located; tlog↔video offset resolved (`replay.compose_root.ready` logs `auto_sync_used=true\|false`, AC-8 escape hatch honored). |
+| 2 | On-the-fly frame + IMU extraction via `VideoFileFrameSource` + `TlogReplayFcAdapter` | Frame and IMU streams co-aligned per AZ-697 ground-truth invariants. |
+| 3 | `extract_route_from_tlog(tlog, max_waypoints=10)` → `RouteSpec` | Route materially follows tlog trajectory; waypoints inside the Derkachi bbox (lat 50.0808..50.0832, lon 36.1070..36.1134) per AZ-836 AC-1. |
+| 4 | `operator_pre_flight_setup` posts route via `SatelliteProviderRouteClient.seed_route`; satellite-provider downloads Google Maps tiles into C6 | Route registered; `mapsReady=true` within poll budget; `tile_count > 0`; warm fixture re-invocation within the same compose session ≤ 30 s (AZ-839 AC-2). |
+| 5 | C10 `DescriptorBatcher` builds the FAISS HNSW NetVLAD index from the populated C6 | Three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check; tamper test raises `IndexUnavailableError` (AZ-839 AC-6). |
+| 6 | Invoke airborne `gps-denied-replay` against the populated cache + tlog/video/calibration | Subprocess runs the per-frame loop end-to-end; emits JSONL outputs (currently blocked by AZ-848 — `eskf_out_of_order` at frame 3 fails the binary with exit 1 deterministically on the Derkachi 1-min clip). |
+| 7 | Compute horizontal-error distribution via `helpers/accuracy_report.py` + `helpers/gps_compare.py`; write verdict report | `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` exists with the honest distribution (PASS or FAIL on the AZ-696 100 m / 80 % gate — verdict emitted **regardless** of PASS/FAIL per AZ-840 AC-2). |
+
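The step-3 bounding-box expectation can be sketched as a small check. This is a minimal illustration, assuming waypoints are available as `(lat, lon)` tuples; the `Waypoint` alias and the `inside_derkachi_bbox` helper are hypothetical names and not part of AZ-836's API. The bbox limits are the ones quoted in step 3. Rejecting an empty route is a choice made for this sketch, so that a route with no waypoints cannot pass by default.

```python
from typing import Iterable, Tuple

# Derkachi bbox limits as quoted in step 3 (AZ-836 AC-1).
LAT_MIN, LAT_MAX = 50.0808, 50.0832
LON_MIN, LON_MAX = 36.1070, 36.1134

Waypoint = Tuple[float, float]  # hypothetical (lat, lon) pair


def inside_derkachi_bbox(waypoints: Iterable[Waypoint]) -> bool:
    """Return True only if at least one waypoint exists and all lie inside the bbox."""
    points = list(waypoints)
    if not points:
        return False  # an empty route must not pass vacuously
    return all(
        LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
        for lat, lon in points
    )
```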
+**Expected outcome**: Verdict report exists with the honest horizontal-error distribution. Test PASSes iff the run meets the AZ-696 100 m / 80 % gate (≥ 80 % of ticks within 100 m of ground truth). Mid-pipeline failures (e.g., satellite-provider rejection at step 4, sidecar mismatch at step 5, ESKF divergence at step 6) fail LOUD with a clear error pointing at the failing step — no silent skip past a failure (AZ-840 AC-5).
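The 100 m / 80 % gate described above can be sketched as follows. This is a minimal illustration under stated assumptions: `meets_accuracy_gate` is a hypothetical name (the real logic lives in `helpers/accuracy_report.py`, whose interface is not shown in this document), "within 100 m" is taken as inclusive, and an empty error list is rejected so that a run with zero emissions cannot pass the gate vacuously.

```python
from typing import Sequence


def meets_accuracy_gate(
    errors_m: Sequence[float],
    limit_m: float = 100.0,
    min_fraction: float = 0.80,
) -> bool:
    """Return True if at least `min_fraction` of ticks are within `limit_m` metres."""
    if not errors_m:
        # Zero ticks means the pipeline produced nothing; that is a failure,
        # not a pass over an empty population.
        raise ValueError("no ticks to evaluate; refusing a vacuous PASS")
    within = sum(1 for err in errors_m if err <= limit_m)
    return within / len(errors_m) >= min_fraction


print(meets_accuracy_gate([10.0, 20.0, 30.0, 40.0, 500.0]))  # 4 of 5 ticks within 100 m
```

Raising on an empty input, rather than returning `False`, keeps the "no emissions" case distinguishable from an honest accuracy FAIL in the verdict report.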
+
+**Max execution time**: 15 min wall-clock on the Derkachi clip (AZ-840 AC-4 soft target for first delivery; hard NFR set after first honest measurement is recorded in the verdict report).
+
+**Relationship to existing tests**:
+- FT-P-20 (AZ-699 real-flight runner) is preserved (AZ-840 AC-7) — FT-P-21 reuses its verdict-report-writing path through `_report_writer.py` rather than superseding it. Either the two tests live side by side, or AZ-699's runner is wrapped by AZ-840's orchestrator with the verdict-writing path preserved.
+- FT-P-15 + FT-P-16 (pre-loaded cache, AC-8.3) remain the canonical bbox-fixture tests; FT-P-21 is the route-driven supplementary test that exercises the same end-state (populated C6) via the production C11→satellite-provider path.
+
+**Implemented as**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (per AZ-840). Unit-tested orchestration helper: `tests/e2e/replay/test_e2e_orchestrator_unit.py` (17 tests covering parameter validation + error propagation between the 7 orchestration steps).
@@ -1,5 +1,41 @@
 # Test Environment

+> **Active policy — 2026-05-20 (refined)**: the canonical CI / release-gate
+> test environment is the Jetson Orin Nano Super (or a Jetson-equivalent
+> arm64 agent). **Unit tests** (`pytest tests/unit/`) MAY be run on a local
+> developer workstation for fast iteration — they are hardware-agnostic by
+> construction, the suite is fully synthetic, and Jetson SSH round-trips add
+> latency without adding signal. **Blackbox / e2e / performance / resilience
+> / security / resource-limit tests** (`tests/e2e/`, `e2e/tests/`,
+> `tests/perf/`, etc.) MUST run on the Jetson — never on a local workstation
+> — because their pass criteria are tied to Jetson wall-clock latency,
+> thermal envelope, and the real-camera + real-FC SITL loop. Workstation x86
+> Docker (the historical "Tier-1" path) is **deprecated** as a supported
+> e2e environment; the Tier-1 sections below are retained as historical
+> reference / traceability only. CI e2e pipelines target the colocated
+> arm64 Jetson Woodpecker agent (see `_docs/04_deploy/ci_cd_pipeline.md`);
+> local-development e2e runs SHOULD use `scripts/run-tests-jetson.sh`
+> against the configured `jetson-e2e` SSH alias rather than
+> `scripts/run-tests.sh`. This refinement supersedes the 2026-05-20 "all
+> tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in
+> the § Test Execution section.
+
+## Where each tier runs (active policy)
+
+| Tier | Local workstation | Jetson (canonical) | When local is the only option |
+|------|--------------------|--------------------|-------------------------------|
+| Unit (`tests/unit/`) | ✅ allowed and encouraged for dev iteration | ✅ also run as part of the Jetson CI lane | always |
+| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | ❌ forbidden — placeholder fixtures + missing hardware = false-negative runs | ✅ required for any merge / release decision | never — if Jetson is unreachable, the e2e verdict is "not run" rather than a local result |
+| Performance / resilience / security / resource-limit | ❌ forbidden | ✅ required | never |
+| Thermal chamber (AC-NEW-5) | ❌ forbidden | ✅ chamber Jetson only | never |
+
+Practical consequences:
+
+- A PR may merge on green local unit tests + green Jetson e2e tests.
+- A PR MUST NOT merge on green local unit tests alone — the Jetson e2e lane is the binding signal.
+- When the Jetson agent is offline, the e2e verdict is "pending Jetson" — record the gap (e.g. via `_docs/_process_leftovers/`) rather than substituting a local run.
+- Tests in `tests/e2e/` that gate on `RUN_REPLAY_E2E` or `@pytest.mark.tier2` will SKIP locally; this is correct behaviour, not a failure to investigate.
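The local-SKIP behaviour in the last bullet can be sketched as an environment gate. This is a hedged illustration, not the repository's actual fixture code: `replay_e2e_skip_reason` is a hypothetical helper, and treating the value `"1"` as the enabling value is an assumption based on the `RUN_REPLAY_E2E=1` usage elsewhere in these docs.

```python
import os
from typing import Mapping, Optional


def replay_e2e_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a skip reason when the replay e2e lane is not enabled, else None."""
    env = os.environ if env is None else env
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E is not set to 1; replay e2e runs only on the Jetson lane"
    return None


print(replay_e2e_skip_reason({}))                        # a reason string: test would SKIP
print(replay_e2e_skip_reason({"RUN_REPLAY_E2E": "1"}))   # None: test would run
```

In a test module the result could feed `pytest.mark.skipif(replay_e2e_skip_reason() is not None, reason=...)`, so a local run reports SKIP instead of a misleading failure.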
+
 ## Overview

 **System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
@@ -15,14 +51,19 @@

 ## Two-tier execution profile

-This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
+> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
+> historical traceability. The active policy is **Jetson-only for every
+> tier except unit tests**, which may also run locally (see banner at the
+> top of this doc). Tier-1 (workstation Docker) is deprecated; only
+> the Tier-2 row continues to describe a supported environment.
+
+This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

 | Tier | Hardware | What it covers | What it skips |
 |------|----------|----------------|---------------|
-| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
-| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
+| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
+| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |

-CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
+CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

 ## Docker Environment (Tier-1)

@@ -213,20 +254,19 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## CI/CD Integration

-**When to run**:
- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
+> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

-**Pipeline stage**:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
+**When to run** (active policy):

-**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
+- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
+- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
+
+**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
+
+**Gate behavior**: any Jetson test failure blocks both PR merge and release tags. Chamber tests are warning-only on PRs and blocking on release tags.

 **Timeout**:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
+- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
 - Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

 ## Reporting
@@ -246,7 +286,27 @@ The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) reg

 ## Test Execution

-**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
+**Decision (2026-05-20, refined later that day)** — **Jetson is the binding e2e environment; unit tests may run locally.** This refines the earlier "Jetson only for everything" wording. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entries):
+
+- The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
+- Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30–90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic — no camera, no SITL, no Jetson-specific runtime — so a local PASS is equivalent to a Jetson PASS for that tier.
+
+**Operational entry points**:
+
+| Tier | Entry point | Where it runs |
+|------|-------------|---------------|
+| Unit (`tests/unit/`) | `pytest tests/unit/ -q` directly, or `scripts/run-tests.sh` | local workstation (Python 3.10+ venv) |
+| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | `scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI) | colocated arm64 Jetson Woodpecker agent — see `_docs/04_deploy/ci_cd_pipeline.md` |
+| Performance / resilience / security / resource-limit | same as e2e | Jetson only |
+| AC-NEW-5 thermal chamber | quarterly + pre-release | `self-hosted-jetson-orin-chamber` |
+
+A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.
+
+The remainder of this section preserves the original 2026-05-09 decision context for traceability.
+
+---
+
+**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

 ### Hardware dependencies found (Phase 3 → Hardware Assessment scan)

@@ -340,8 +400,13 @@ When invoked on a control host (typical), the script SSH-orchestrates the Jetson

 ### CI runner mapping

- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
+**Active mapping (2026-05-20)**:
+
+- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
 - `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

+**Removed (2026-05-20)**:
+
+- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
+
 **Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
@@ -8,8 +8,8 @@ This matrix is the canonical view of test coverage for the planning context. It

 | AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
 |-------|---------------------|----------|----------|
-| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01 | Covered |
-| AC-1.2 | Frame-center GPS within 20 m for ≥50% of normal-flight photos | FT-P-01 | Covered |
+| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01, FT-P-21 (orchestrator-level supplementary) | Covered |
+| AC-1.2 | Frame-center GPS within 20 m for ≥50% of normal-flight photos | FT-P-01, FT-P-21 (orchestrator-level supplementary) | Covered |
 | AC-1.3 | Cumulative drift between satellite-anchored fixes <100 m visual / <50 m IMU-fused | FT-P-02 | Covered |
 | AC-1.4 | Estimate reports 95% covariance + source label | FT-P-03 | Covered |
 | AC-2.1a | Frame-to-frame registration ≥95% on normal segments | FT-P-04 | Covered |
@@ -35,7 +35,7 @@ This matrix is the canonical view of test coverage for the planning context. It
 | AC-7.2 | AI-camera object coordinates from gimbal/zoom/altitude | — | NOT COVERED — same as AC-7.1 |
 | AC-8.1 | Imagery via Suite Sat Service offline cache, ≥0.5 m/px | FT-P-15, FT-P-16, NFT-SEC-02 | Covered |
 | AC-8.2 | Tile freshness <6 mo (active-conflict) / <12 mo (rear) | FT-N-05 | Covered |
-| AC-8.3 | Imagery pre-loaded onto companion before flight | FT-P-15, FT-P-16 | Covered |
+| AC-8.3 | Imagery pre-loaded onto companion before flight | FT-P-15, FT-P-16, FT-P-21 (route-driven via real satellite-provider) | Covered |
 | AC-8.4 | Mid-flight tile generation with quality metadata | FT-P-17 | Covered |
 | AC-8.5 | No raw nav/AI-cam frame retention except thumbnail log | FT-P-18 | Covered |
 | AC-8.6 | Satellite relocalization scale-ratio + scene-change | FT-P-19 (scale FULL; scene-change PARTIAL) | PARTIAL — scene-change subset reduced confidence (only 2/60 stills have paired sat refs; no labeled change-pair dataset). Independent of the AC-NEW-4 / AC-NEW-7 multi-flight gap (those rows were resolved by AC-text relaxation 2026-05-09; AC-8.6 scene-change still requires a labeled change-pair dataset that synthetic perturbations cannot substitute for). Mitigation: deferred to a follow-up cycle when labeled change-pair data becomes available; surfaced in the Step 4 risk register |
@@ -78,6 +78,8 @@ This matrix is the canonical view of test coverage for the planning context. It
 > Revised 2026-05-09 (Plan Phase 2a.0 outcomes): three rows moved PARTIAL → Covered (AC-NEW-4, AC-NEW-7, RESTRICT-FAIL-2) following AC-text relaxation per Q3=B. Restriction row count corrected from 19 to 20 (pre-existing arithmetic error).
 >
 > Revised 2026-05-19 (Greenfield Step 12 cycle-update — autodev): NFT-RES-05 appended to `resilience-tests.md` capturing the composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687 (replay-mode minimal config, `AirborneBootstrapError` operator-error contract, Tier-2 `replay.compose_root.ready` + `replay.input.frame_emitted` log-boundary gate). NFT-RES-05 is added to AC-NEW-1 and AC-4.1 as bootstrap-precondition coverage; no coverage counts move because the scenario is supplementary, not promoting any PARTIAL row.
+>
+> Revised 2026-05-24 (Existing-code cycle-3 Step 12 cycle-update — autodev): FT-P-21 appended to `blackbox-tests.md` capturing the Epic AZ-835 orchestrator-level end-to-end pipeline (AZ-836 `RouteSpec` extractor + AZ-838 `SatelliteProviderRouteClient` + AZ-839 route-driven `operator_pre_flight_setup` + AZ-840 orchestrator test). FT-P-21 is supplementary route-driven coverage on AC-1.1, AC-1.2 (orchestrator-level pipeline accuracy) and AC-8.3 (pre-loaded cache realised via the production C11→satellite-provider path rather than the bbox-seeded FT-P-15/FT-P-16 fixture). No coverage counts move — FT-P-21 supplements already-Covered rows. **Currently blocked on Jetson by AZ-848** (`eskf_out_of_order` regression introduced by AZ-776's missing Jetson-verification gate — pre-existing, surfaced cycle-3 Step 11; tracked locally at `_docs/02_tasks/todo/AZ-848_jetson_eskf_out_of_order_regression.md`). Cycle-3 internal changes (C11 contract adaptation per AZ-777 Phase 1; RouteSpec relocation per AZ-845; module-layout refresh AZ-846; AZ-270 lint widening AZ-847; C12 cold-start unit-NFR threshold relax AZ-844) are implementation-only and produce no new black-box scenarios.

 | Category | Total Items | Covered | PARTIAL | Not Covered | Coverage % (Covered + PARTIAL counted half) |
 |----------|-----------|---------|---------|-------------|--------------------------------------------|
@@ -0,0 +1,135 @@
+# [AZ-776 follow-up] derkachi_1min AC-1/2/5/6 fail on Jetson — VioOutput.emitted_at_ns clock-mismatch with FC IMU timebase
+
+> **SCOPE UPDATE (2026-05-26, cycle-4 planning)**
+>
+> After user decision to switch the primary replay path to user-supplied (video, CSV) pairs (see AZ-894 / AZ-895 / AZ-896 / AZ-897), the tlog-adapter path becomes **audit-only** and this ticket is **no longer bench-blocking**. It remains a real bug and stays open for any future tlog-only flight (flights that ship with a `.tlog` but no companion `data_imu.csv`).
+>
+> **Priority**: backlog (deprioritised from cycle-4 candidate)
+> **Bench-blocking?**: no — AZ-894 supersedes
+> **Production-blocking?**: no — production single-clock model never goes through the tlog adapter
+> **Complexity**: unchanged (5 SP)
+
+**Task**: AZ-848_jetson_eskf_out_of_order_regression
+**Name**: Repair the VioOutput contract — emitted_at_ns must use the frame's timeline timestamp, not process monotonic_ns, so it aligns with the FC IMU timebase that C5 ESKF tracks alongside it
+**Description**: On the Jetson e2e harness (`scripts/run-tests-jetson.sh`), four tests in `tests/e2e/replay/test_derkachi_1min.py` (AC-1, AC-5, AC-6 realtime, AC-6 asap) fail with an identical deterministic root cause `EstimatorFatalError('eskf filter divergence on vio: mahalanobis²=109.765 > 100.0')` at frame 3, preceded by `c5.state.eskf_out_of_order` from `imu_window` (ts_ns=187_370_418_000 < last_added_ts_ns=1_187_232_637_925_619, a ratio of ~6.3·10³, i.e. roughly 4 orders of magnitude apart). In addition, `test_ac3_within_100m_80pct_of_ticks` reports 1 XPASS (probably a vacuous pass: when the binary exits 1 on frame 3, the ≥80 % within 100 m assertion evaluates over zero emissions).
+
+**Revised root cause (2026-05-26 evidence-based investigation)**: NOT an IMU-vs-IMU clock-source mismatch (the original hypothesis was incorrect — RAW_IMU.time_usec and SCALED_IMU2.time_boot_ms share the same FC-boot-relative timebase in the Derkachi tlog: 187–634 s). The actual mismatch is **VioOutput.emitted_at_ns** vs **ImuWindow.ts_end_ns**:
+
+| Source | Code site | Value on Jetson | Timebase |
+|---|---|---|---|
+| `VioOutput.emitted_at_ns` | `klt_ransac.py:274` — `self._clock.monotonic_ns()` | ~1.187·10¹⁵ ns (≈ 13.7 days — Jetson uptime when the run started) | Process monotonic |
+| `imu_window.ts_end_ns` | `tlog_replay_adapter.py:710` — `time_usec * 1000` | ~1.87·10¹¹ ns (≈ 187 s — Pixhawk boot-relative) | FC-boot-relative |
+
+C5 ESKF tracks `_last_added_ts_ns` across BOTH `add_vio` and `add_fc_imu`. Frame 0: `add_vio` sets `_last_added_ts_ns = 1.187·10¹⁵`. Frame 1: `add_fc_imu` checks `1.87·10¹¹ + ~10⁸ < 1.187·10¹⁵` → out_of_order degraded → next add_vio with corrupted nominal state → mahalanobis² = 109.76 > 100 → fatal divergence at frame 3.
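The failure sequence above can be reproduced with a toy model of the shared guard. This is a simplified sketch, not the code in `eskf_baseline.py`: `SharedTimestampGuard` is a hypothetical stand-in, the strict `<` comparison is an assumption, and the "fixed" VIO timestamp is an invented frame-timeline value used only for illustration. The two large constants are taken from the error log in this ticket.

```python
class SharedTimestampGuard:
    """Toy stand-in for one `_last_added_ts_ns` field shared by VIO and IMU adds."""

    def __init__(self) -> None:
        self._last_added_ts_ns = 0

    def add(self, source: str, ts_ns: int) -> bool:
        """Accept a measurement unless it is older than the last accepted one."""
        if ts_ns < self._last_added_ts_ns:
            return False  # models the eskf_out_of_order path
        self._last_added_ts_ns = ts_ns
        return True


VIO_EMITTED_AT_NS = 1_187_232_637_925_619  # process monotonic clock (error log)
IMU_TS_END_NS = 187_370_418_000            # FC-boot-relative (error log)

# Current behaviour: VIO stamps with the process clock, then the IMU window looks stale.
guard = SharedTimestampGuard()
vio_accepted = guard.add("vio", VIO_EMITTED_AT_NS)
imu_accepted = guard.add("imu_window", IMU_TS_END_NS)
print(vio_accepted, imu_accepted)  # True False

# With a frame-timeline VIO stamp (hypothetical value), both share one timebase.
fixed = SharedTimestampGuard()
fixed.add("vio", 187_300_000_000)
imu_accepted_after_fix = fixed.add("imu_window", IMU_TS_END_NS)
print(imu_accepted_after_fix)  # True
```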
+
+**Why this hides on Tier-1**: the test is `@pytest.mark.tier2_only` (skipped on workstation runs). Unit tests use mocked VIO with synthetic clocks, so the contract clash never surfaces.
+
+**Why this hides on a short-uptime Jetson**: a Jetson whose uptime is still below the FC's boot-relative timestamps (≈ 187 s at the start of the Derkachi tlog) has a monotonic_ns smaller than those timestamps; the inequality flips and the bug masquerades as "intermittent passes". The 13.7-day-uptime test box made it deterministic.
+
+**Complexity**: 5 SP (revised up from 3 — the fix touches the C1 contract: `VioOutput.emitted_at_ns` semantics + every C1 strategy that populates it + `_docs/02_document/contracts/c1_vio/` doc + every consumer of `vio.emitted_at_ns` in C5 / C13 / FDR. Plus a determinism test that records monotonic_ns vs frame_ts_ns at frame 0 to lock the invariant in.)
+**Dependencies**: AZ-776 (closed; produced the verification gap that hid this regression)
+**Related**: AZ-883 (SCALED_IMU2 latent ts_ns=0 bug; uncovered during this investigation; separate ticket)
+**Component**: c1_vio (`klt_ransac.py`, `bench/okvis2.py`, `bench/vins_mono.py`, `_facade_spine.py`) + `_types/nav.py` (VioOutput dataclass) + c5_state (`eskf_baseline.py:add_vio` consumes the field) + c13_fdr (consumes `emitted_at_ns` per the docstring's "adaptive-gating decisions")
+**Tracker**: AZ-848 (https://denyspopov.atlassian.net/browse/AZ-848)
+**Parent Epic**: (none — bug surfaced in cycle 3 Step 11)
+
+Jira AZ-848 is the authoritative spec; this file is the in-workspace mirror.
+
+## Symptom
+
+On Jetson (`scripts/run-tests-jetson.sh`), four tests in `tests/e2e/replay/test_derkachi_1min.py` fail with identical root cause:
+
+- `test_ac1_exits_0_jsonl_count_match`
+- `test_ac5_determinism_two_runs_diff`
+- `test_ac6_pace_realtime_60s_within_5pct`
+- `test_ac6_pace_asap_under_30s`
+
+All four assert `gps-denied-replay` exits 0; the binary actually exits 1 on frame 3 with:
+
+```
+ERROR c5_state.eskf_baseline c5.state.eskf_out_of_order
+  source=imu_window ts_ns=187,370,418,000 last_added_ts_ns=1,187,232,637,925,619
+ERROR c5_state.eskf_baseline c5.state.eskf_filter_divergence
+  source=vio mahalanobis_sq=109.76467866548009 threshold_sq=100.0
+ERROR runtime_root.replay_loop replay_loop.state_add_vio_fatal
+  frame=3 EstimatorFatalError('eskf filter divergence on vio: mahalanobis²=109.765 > 100.0')
+```
+
+Mahalanobis distance is identical (109.765) across all four runs — fully deterministic on the Derkachi 1-min clip.
+
+Additionally, `test_ac3_within_100m_80pct_of_ticks` reports XPASS (it was `@xfail` referencing AZ-777). This appears to be a symptom of the same bug: with the binary exiting with code 1 before any GPS-denied emissions land, the `≥ 80 % within 100 m` assertion evaluates against an empty population and passes vacuously. The XPASS is NOT honest evidence that AZ-777 has been completed.
+
+## Origin — AZ-776 verification gap
+
+Commit `8de2716 [AZ-776] Open-loop ESKF composition profile via c4_pose.enabled` removed `@pytest.mark.xfail` decorators from AC-1 (line 61), AC-2 (line 138), AC-5 (line 413), AC-6 realtime (line 453), AC-6 asap (line 479) of `test_derkachi_1min.py`. The AZ-776 spec (`_docs/02_tasks/done/AZ-776_eskf_open_loop_composition_profile.md`) claims under AC-7:
+
+> `_run_replay_loop` in `runtime_root/__init__.py` is exercised end-to-end on Jetson by a non-`xfail` integration test (AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap in `tests/e2e/replay/test_derkachi_1min.py` un-xfail **and pass**).
+
+This was not honored: AZ-776 closed without an honest Jetson run. The closure predates the `meta-rule.mdc` "Real Results, Not Simulated Ones" rule (added 2026-05) that would have caught it.
+
+## Cycle-3 scope (not the cause)
+
+Cycle-3 Step 11 (2026-05-24) surfaced this on the first full Jetson run since cycle 1. Cycle-3's only src change was commit `fd52cc9 [AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint`, which touched four files: `_types/route.py` (new), `c11_tile_manager/route_client.py`, `replay_input/__init__.py`, `replay_input/tlog_route.py`. None of `c5_state`, `c8_fc_adapter`, `runtime_root` were touched. The most recent change to `c5_state/eskf_baseline.py` is AZ-389; to `c8_fc_adapter/tlog_replay_adapter.py` it is AZ-398. Both pre-date cycle 1. The latent contract clash was always there — Jetson uptime + an un-`xfail`ed test combined to make it deterministic.
+
+## Diagnosis evidence (2026-05-26)
+
+`/tmp/inspect_tlog.py` (ad-hoc pymavlink probe against `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog`) — outputs preserved in this session's chat history:
+
+- 4326 RAW_IMU msgs, time_usec ∈ [187,274,914 ; 633,952,656] µs (boot-relative ~187s–~634s)
+- 4330 SCALED_IMU2 msgs, time_boot_ms ∈ [187,274 ; 633,954] ms (same timebase, same range)
+- Both IMU types share the FC's boot timebase → original "two-IMU-clock-source mismatch" hypothesis is REFUTED
+- `klt_ransac.py:274` populates `VioOutput.emitted_at_ns = self._clock.monotonic_ns()` → 1.187·10¹⁵ ns on the test Jetson (uptime 13.7 days)
+- `_types/nav.py:158` documents this contract explicitly: "`emitted_at_ns` is `time.monotonic_ns` at output time."
+- `eskf_baseline.py:492` reads `ts_ns = vio.emitted_at_ns` and stores it in `_last_added_ts_ns` — the same field that `add_fc_imu` checks against `imu_window.ts_end_ns` (FC-boot-relative)
+- Confirmed: the inequality direction MATCHES the AZ-848 error log (`ts_ns=187,370,418,000 < last_added_ts_ns=1,187,232,637,925,619`)
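The magnitudes quoted in the evidence list can be checked with a few lines of arithmetic. This is a worked example only; the two constants are copied from the AZ-848 error log, and the variable names are chosen for this sketch.

```python
import math

VIO_LAST_ADDED_NS = 1_187_232_637_925_619  # last_added_ts_ns from the error log
IMU_TS_NS = 187_370_418_000                # ts_ns from the error log

uptime_days = VIO_LAST_ADDED_NS / 1e9 / 86_400   # ns -> s -> days
fc_boot_seconds = IMU_TS_NS / 1e9                # ns -> s
ratio = VIO_LAST_ADDED_NS / IMU_TS_NS
orders_of_magnitude = math.log10(ratio)

print(f"Jetson monotonic clock: {uptime_days:.1f} days")      # about 13.7 days
print(f"FC boot-relative time:  {fc_boot_seconds:.0f} s")     # about 187 s
print(f"ratio: {ratio:.0f} (~{orders_of_magnitude:.1f} orders of magnitude)")
```

The two timestamps differ by a factor of a few thousand, which is far more than any plausible jitter and is consistent with two unrelated clock sources.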
+
+## Affected files
+
+- `src/gps_denied_onboard/_types/nav.py` — `VioOutput.emitted_at_ns` field + docstring at line 158 (contract change site)
+- `src/gps_denied_onboard/components/c1_vio/klt_ransac.py:274,425,463,592–619` — every site that fills `emitted_at_ns`
+- `src/gps_denied_onboard/components/c1_vio/bench/okvis2.py`, `vins_mono.py` — other C1 strategies that fill `emitted_at_ns`
+- `src/gps_denied_onboard/components/c1_vio/_facade_spine.py` — `frame_ts_ns(frame)` is the existing helper that should be the new source of truth
+- `src/gps_denied_onboard/components/c5_state/eskf_baseline.py:492,502,565` — already reads `vio.emitted_at_ns`; no API change needed once the field's semantics are fixed
+- `src/gps_denied_onboard/components/c13_fdr/**` — read `emitted_at_ns` per the docstring's "adaptive-gating decisions"; behavior change must be evaluated
+- `_docs/02_document/contracts/c1_vio/` — contract docs need re-version (semantic change to a public field)
+- `tests/e2e/replay/test_derkachi_1min.py` — the failing tests; AC-3 XPASS handling per AC-6 below
+
+## Repro
+
+```
+bash scripts/run-tests-jetson.sh
+# pytest report (after ~5 min):
+#   tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match FAILED
+#   tests/e2e/replay/test_derkachi_1min.py::test_ac5_determinism_two_runs_diff FAILED
+#   tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_realtime_60s_within_5pct FAILED
+#   tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_asap_under_30s FAILED
+#   tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks XPASS
+```
+
+## Acceptance Criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | The `VioOutput.emitted_at_ns` contract docstring (`_types/nav.py:158`) no longer says "monotonic_ns at output time"; the field's semantics are documented as "the frame's timeline timestamp aligned with C8 FC IMU timebase, so C5 ESKF can compare against `imu_window.ts_end_ns` without a clock-source mismatch". A version bump is recorded in `_docs/02_document/contracts/c1_vio/`. |
+| AC-2 | Every C1 strategy (`klt_ransac.py`, `bench/okvis2.py`, `bench/vins_mono.py`) populates `emitted_at_ns` from the frame's timestamp (via `frame_ts_ns(frame)` or the strategy's own equivalent), NOT from `monotonic_ns()`. A unit test per strategy asserts the field value equals `frame_ts_ns(frame)`. |
+| AC-3 | A determinism test reads two consecutive frames' `VioOutput.emitted_at_ns` values and asserts they are equal to `frame_ts_ns(frame_n)` and `frame_ts_ns(frame_n+1)` respectively — locking the new invariant. |
+| AC-4 | Fix lands and `test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match` PASSES on Jetson with `RUN_REPLAY_E2E=1` — no `@xfail` re-add. |
+| AC-5 | `test_ac5_determinism_two_runs_diff`, `test_ac6_pace_realtime_60s_within_5pct`, `test_ac6_pace_asap_under_30s` also PASS on Jetson. |
+| AC-6 | XPASS on `test_ac3_within_100m_80pct_of_ticks` is investigated. If symptom of the same bug, returns to honest XFAIL referencing AZ-777 once binary exits 0 cleanly. If genuine pass, AZ-777 is closed instead. |
+| AC-7 | C13 FDR consumers of `emitted_at_ns` are audited — any code path that relied on the field being monotonic-clock-wall-time has its behavior preserved via an explicit `time.monotonic_ns()` recorded under a different name (e.g., `recorded_at_ns`) or its expectation is documented as "frame timeline; not wall clock". |
+| AC-8 | `meta-rule.mdc` "Real Results" gate is honored — no ticket may close `Done` until the operator has eyes on a green Jetson run log line. |
+
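The invariant behind AC-2 and AC-3 can be sketched as a unit test. Everything here is a hypothetical stand-in: the `Frame` and `VioOutput` dataclasses, the `frame_ts_ns` body, and `FrameTimelineStrategy` only mimic the shapes named in this ticket, and the timestamp values are invented. The real types live in `_types/nav.py` and `c1_vio/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:  # hypothetical stand-in for the real frame type
    ts_ns: int


@dataclass(frozen=True)
class VioOutput:  # hypothetical stand-in; only the field under test
    emitted_at_ns: int


def frame_ts_ns(frame: Frame) -> int:
    """Stand-in for the `_facade_spine.py` helper named in the ticket."""
    return frame.ts_ns


class FrameTimelineStrategy:
    """Hypothetical C1 strategy that stamps outputs from the frame timeline."""

    def process(self, frame: Frame) -> VioOutput:
        # The point of the fix: no monotonic_ns() call here.
        return VioOutput(emitted_at_ns=frame_ts_ns(frame))


def test_emitted_at_ns_follows_frame_timeline() -> None:
    strategy = FrameTimelineStrategy()
    frames = [Frame(ts_ns=187_300_000_000), Frame(ts_ns=187_333_333_333)]
    outputs = [strategy.process(frame) for frame in frames]
    assert [out.emitted_at_ns for out in outputs] == [frame_ts_ns(f) for f in frames]


test_emitted_at_ns_follows_frame_timeline()
```

Asserting on two consecutive frames, as AC-3 asks, guards against a strategy that stamps the first frame correctly and then falls back to a process clock.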
+## Notes
+
+- Tracker context: surfaced `cycle: 3, step: 11` on 2026-05-24; root cause re-diagnosed 2026-05-26 (operator-supervised investigation against the actual Derkachi tlog).
+- Local unit suite (`pytest tests/unit/`) passes 2303 / 0 fail / 86 legitimate skips after C12 cold-start threshold relax (`05f1143 [AZ-844]`).
+- Cycle 3 Step 11 verdict was PASS for cycle-3-scope; this ticket captures the wider Jetson regression for next cycle.
+- Local mirror created retroactively 2026-05-24 (cycle 3 Step 12 entry) — Jira AZ-848 filed 2026-05-24 was the original signal; mirror was missing.
+- 2026-05-26: spec materially revised after evidence-based investigation refuted the original "two-IMU-clock-source mismatch" hypothesis. The corrected diagnosis points at the C1 contract (`VioOutput.emitted_at_ns` semantics), not at the C8 adapter. The SCALED_IMU2 latent bug surfaced during this investigation is split out as AZ-883 to keep this ticket's scope tight.
+
+## References
+
+- Jira: https://denyspopov.atlassian.net/browse/AZ-848
+- Run-tests report: `_docs/03_implementation/run_tests_step11_report.md` (Cycle 3 closeout, lines 617–635)
+- Origin spec: `_docs/02_tasks/done/AZ-776_eskf_open_loop_composition_profile.md`
+- Related: AZ-777 (the ticket referenced by the original `@xfail` on `test_ac3_within_100m_80pct_of_ticks`, see AC-6); AZ-883 (SCALED_IMU2 latent bug)
@@ -0,0 +1,74 @@
+# `_handle_imu` mis-reads SCALED_IMU2 timestamps — produces ts_ns=0 for every other IMU sample
+
+> **SCOPE UPDATE (2026-05-26, cycle-4 planning)**
+>
+> Deprioritised behind AZ-894 (CSV-driven replay adapter). This bug only matters once the tlog-adapter path is reactivated for tlog-only flights (flights that ship with a `.tlog` but no companion `data_imu.csv`). Stays open in backlog.
+>
+> **Priority**: backlog (deprioritised from cycle-4 candidate)
+> **Bench-blocking?**: no — AZ-894 supersedes the tlog path for Derkachi
+> **Complexity**: unchanged (2 SP)
+
+**Task**: AZ-883_scaled_imu2_ts_ns_zero_default
+**Name**: Branch `_handle_imu` on message type so SCALED_IMU2 uses `time_boot_ms × 1_000_000` instead of the missing `time_usec` field
+**Description**: `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py:683` routes BOTH `RAW_IMU` and `SCALED_IMU2` messages through `_handle_imu`, which at line 710 reads `getattr(msg, "time_usec", 0) * 1000` to compute `sensor_ts_ns`. SCALED_IMU2 has no `time_usec` field (its time field is `time_boot_ms`, uint32 milliseconds since FC boot), so the `getattr` default-of-zero path fires for every SCALED_IMU2 message. The resulting IMU sample stream alternates RAW_IMU timestamps with `ts_ns=0` values.
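The branching proposed in the task name can be sketched as below. This is an illustration under assumptions, not the adapter's code: `sensor_ts_ns` is a hypothetical free function, and the `SimpleNamespace` objects only imitate the pymavlink message attributes used here (`get_type()`, `time_usec`, `time_boot_ms`). Raising on an unknown type, instead of defaulting to zero, is a design choice made for this sketch.

```python
from types import SimpleNamespace


def sensor_ts_ns(msg) -> int:
    """Derive a nanosecond timestamp from a MAVLink IMU message, by message type."""
    msg_type = msg.get_type()
    if msg_type == "RAW_IMU":
        return int(msg.time_usec) * 1_000          # microseconds -> nanoseconds
    if msg_type == "SCALED_IMU2":
        return int(msg.time_boot_ms) * 1_000_000   # milliseconds -> nanoseconds
    # Fail loudly rather than silently emitting ts_ns=0.
    raise ValueError(f"unsupported IMU message type: {msg_type}")


# Stand-in messages using the first timestamps reported in the evidence list.
raw = SimpleNamespace(get_type=lambda: "RAW_IMU", time_usec=187_274_914)
scaled = SimpleNamespace(get_type=lambda: "SCALED_IMU2", time_boot_ms=187_274)

print(sensor_ts_ns(raw))     # 187274914000
print(sensor_ts_ns(scaled))  # 187274000000
```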
+
+**Evidence (2026-05-26 investigation against `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog`)**:
+
+- 4326 RAW_IMU messages with `time_usec` ∈ [187,274,914 ; 633,952,656] µs (boot-relative microseconds, ~187s–~634s)
+- 4330 SCALED_IMU2 messages with `time_boot_ms` ∈ [187,274 ; 633,954] ms (same FC-boot timebase, same range)
+- Both interleaved in arrival order — every other IMU sample is the affected type
+- `_handle_imu`'s simulated output: 4266 non-monotonic transitions out of 8656 (~49 %) — almost every other transition is non-monotonic because SCALED_IMU2 collapses to ts_ns=0
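
The interleaving described in the evidence list can be reproduced on a tiny synthetic stream. This is a minimal sketch, not the adapter code: `buggy_sensor_ts_ns` and `count_non_monotonic` are hypothetical helper names, and the message values are illustrative rather than taken from the Derkachi tlog.

```python
from types import SimpleNamespace


def buggy_sensor_ts_ns(msg) -> int:
    # Mirrors the expression quoted in the description: a message without
    # time_usec (SCALED_IMU2) falls through to the getattr default of 0.
    return getattr(msg, "time_usec", 0) * 1000


def count_non_monotonic(ts_ns: list[int]) -> int:
    # A transition is non-monotonic when the timestamp fails to increase.
    return sum(1 for prev, cur in zip(ts_ns, ts_ns[1:]) if cur <= prev)


# Synthetic, strictly interleaved stream (illustrative values only).
messages = []
for i in range(4):
    boot_ms = 187_274 + 100 * i
    messages.append(SimpleNamespace(time_usec=boot_ms * 1000))  # RAW_IMU-like
    messages.append(SimpleNamespace(time_boot_ms=boot_ms + 1))  # SCALED_IMU2-like

timestamps = [buggy_sensor_ts_ns(m) for m in messages]
bad = count_non_monotonic(timestamps)
```

In this toy stream every RAW_IMU to SCALED_IMU2 step drops to zero, so about half of the transitions are flagged, which is the same pattern the investigation reports for the real tlog.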
+
+**Why this is currently latent**: C5 ESKF's `add_fc_imu` reads `imu_window.ts_end_ns` (the LAST sample's ts_ns) for monotonicity guarding. If the last sample in the window happens to be RAW_IMU, the guard passes. The per-sample preintegration loop at `eskf_baseline.py:627–647` reads each `sample.ts_ns` individually for delta-t computation, but with ts_ns=0 samples interleaved, the delta-t arithmetic produces negative or near-zero intervals that get silently absorbed by the bias-correction math without raising. It WILL bite once any downstream consumer (FDR replay, latency analyser, deterministic-time gate) does a per-sample monotonicity assertion.
+
+**Why this surfaced now**: the operator-supervised AZ-848 investigation read the Derkachi tlog through pymavlink and observed the interleaving directly. The bug has been present since `_handle_imu` was written (predates cycle 1) and was never caught because no test asserts per-sample IMU monotonicity.
+
+**Complexity**: 2 SP
+**Dependencies**: AZ-848 (split off from its investigation; can land before, after, or in parallel — no shared code path beyond `_handle_imu`)
+**Component**: c8_fc_adapter (`tlog_replay_adapter.py`)
+**Tracker**: AZ-883 (https://denyspopov.atlassian.net/browse/AZ-883) — Jira ticket created 2026-05-26 during cycle 3 release flow; allocated key AZ-883 (next-available, NOT the originally-planned AZ-849)
+**Parent Epic**: (none — bug surfaced during AZ-848 investigation)
+
+## Symptom
+
+If you add a per-sample monotonicity assertion to the C5 ESKF or to the C8 tlog adapter pre-emit gate, every Jetson run against the Derkachi tlog reports 4266 non-monotonic transitions, caused by zero-valued SCALED_IMU2 timestamps interleaved with proper RAW_IMU values. The assertion fires immediately at message index 1 (the first SCALED_IMU2 after the first RAW_IMU).
+
+## Proposed fix
+
+Modify `_handle_imu` (`src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py:709`) to branch on the message type via the caller's already-computed `msg_type`:
+
+```python
+def _handle_imu(self, msg: Any, *, msg_type: str) -> bool:
+    if msg_type == "RAW_IMU":
+        sensor_ts_ns = int(getattr(msg, "time_usec", 0)) * 1000
+    elif msg_type == "SCALED_IMU2":
+        sensor_ts_ns = int(getattr(msg, "time_boot_ms", 0)) * 1_000_000
+    else:
+        raise FcOpenError(
+            f"_handle_imu called with unsupported msg_type={msg_type!r}; "
+            f"expected RAW_IMU or SCALED_IMU2"
+        )
+    ...
+```
+
+Update the caller at line 684 to pass `msg_type=msg_type`. Add a unit test that synthesises a SimpleNamespace with `time_boot_ms=187274` (no `time_usec` field) and verifies the emitted `ImuTelemetrySample.ts_ns == 187_274_000_000`.
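
The unit test described above could look roughly like the following. It is a sketch under stated assumptions: `imu_sensor_ts_ns` is a hypothetical stand-alone mirror of the branching (the real logic lives inside `_handle_imu`), it raises `ValueError` where the proposed fix raises `FcOpenError`, and it uses direct attribute access so a missing field fails loudly instead of defaulting to zero.

```python
from types import SimpleNamespace


def imu_sensor_ts_ns(msg, msg_type: str) -> int:
    """Hypothetical stand-alone mirror of the proposed per-type branching."""
    if msg_type == "RAW_IMU":
        return int(msg.time_usec) * 1_000          # microseconds -> nanoseconds
    if msg_type == "SCALED_IMU2":
        return int(msg.time_boot_ms) * 1_000_000   # milliseconds -> nanoseconds
    raise ValueError(f"unsupported msg_type={msg_type!r}")


def test_scaled_imu2_uses_time_boot_ms():
    msg = SimpleNamespace(time_boot_ms=187274)  # deliberately no time_usec
    assert not hasattr(msg, "time_usec")
    assert imu_sensor_ts_ns(msg, "SCALED_IMU2") == 187_274_000_000


def test_raw_imu_uses_time_usec():
    msg = SimpleNamespace(time_usec=187_274_914)  # deliberately no time_boot_ms
    assert not hasattr(msg, "time_boot_ms")
    assert imu_sensor_ts_ns(msg, "RAW_IMU") == 187_274_914_000
```

Each synthetic message lacks the other type's time field, so a branch that reads the wrong field raises `AttributeError` rather than silently emitting zero.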
+
+Alternative (heavier): pick a single canonical message type at construction time (parameterise the adapter with `imu_source: Literal["RAW_IMU","SCALED_IMU2"]`, auto-detected from the tlog pre-scan) and drop the non-chosen type at the dispatch site. This buys cleaner streams but doubles the test matrix.
+
+The branching fix is simpler and preserves the existing OR-group semantic (`("RAW_IMU", "SCALED_IMU2")` in `_REQUIRED_MESSAGE_GROUPS`).
+
+## Acceptance Criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | `_handle_imu` reads `time_boot_ms × 1_000_000` for SCALED_IMU2 messages and `time_usec × 1000` for RAW_IMU. A unit test exercises both branches with a synthetic SimpleNamespace lacking the OTHER field. |
+| AC-2 | An integration test against the Derkachi tlog (Tier-1; no Jetson hardware needed — only pymavlink + the tlog file) asserts that the IMU stream as seen by the runtime loop is strictly monotonic ts_ns. The test reads at least the first 100 IMU samples and verifies `sample[i+1].ts_ns > sample[i].ts_ns` for all i. |
+| AC-3 | No regression in existing RAW_IMU-only adapter tests. |
+| AC-4 | The fix is independent of AZ-848 — does not require the VioOutput contract change to land first. |
+
+## References
+
+- Jira: https://denyspopov.atlassian.net/browse/AZ-883
+- Origin: AZ-848 investigation, 2026-05-26 cycle 3 Step 16.5 release flow
+- Related: AZ-848 (the VIO contract repair; both surfaced from the same investigation but their fixes are independent)
+- Tlog evidence: `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog`, 8656 IMU samples (4326 RAW_IMU + 4330 SCALED_IMU2 interleaved)
@@ -0,0 +1,61 @@
+# Replay: hard removal of deprecated auto-sync surface (AZ-895 follow-up)
+
+> **BLOCKED by Epic AZ-969 (2026-06-19).** AZ-971 restores alignment kernels as operator-driven refine behind `replay_input/alignment.py`. Do not delete alignment logic until AZ-969 ships. AZ-908 scope shrinks to: remove deprecated CLI flags and `auto_sync.py` stub re-exports only — **not** the new alignment module.
+
+**Task**: AZ-908_replay_auto_sync_hard_removal
+**Name**: Cycle-5+ cleanup that physically removes the auto-sync surface AZ-895 deprecated
+**Description**: Follow-up to AZ-895 (cycle 4). AZ-895 made the auto_sync surface a no-op and deprecated the CLI flags (`--time-offset-ms`, `--skip-auto-sync`, `--auto-trim`) with one-cycle warnings, but left the call sites, config fields, and interface DTOs intact for backward compat. AZ-908 completes the removal in cycle 5+ after a one-cycle deprecation window has passed.
+
+**Complexity**: 3 SP
+**Dependencies**: AZ-895 (hard — must ship first; AZ-908 removes what AZ-895 deprecated), AZ-842 (hard — replay protocol docs coordinate)
+**Component**: replay_input (auto_sync.py + tlog_video_adapter.py + interface.py), cli/replay, runtime_root/_replay_branch + runtime_root/__init__, config/schema + config/loader + config/__init__, replay_api/app
+**Tracker**: AZ-908 (https://denyspopov.atlassian.net/browse/AZ-908)
+**Parent Epic**: (none — cycle-4 replay-input redesign follow-up)
+
+## Why
+
+The auto-sync surface is dead in production code: AZ-894 (cycle 4) made the CSV-driven path mandatory via required `--imu`, and AZ-895 (cycle 4) deprecated the surface. During the one-cycle deprecation window, the warnings fire in real CI runs if any operator scripts still pass the deprecated flags; once that window has passed, the surface can be removed without breaking anyone.
+
+## Touch list (production)
+
+- DELETE `src/gps_denied_onboard/replay_input/auto_sync.py` (currently a no-op stub from AZ-895)
+- DELETE `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` (currently a deprecated coordinator from AZ-895)
+- Drop `AutoSyncConfig`, `AutoSyncDecision`, `AlignedWindow` DTOs from `replay_input/interface.py`. Drop `auto_sync_result` + `aligned_window` fields from `ReplayInputBundle`.
+- Drop `--time-offset-ms`, `--skip-auto-sync`, `--auto-trim` CLI flags from `cli/replay.py` entirely
+- Drop `ReplayConfig.time_offset_ms`, `.skip_auto_sync_validation`, `.auto_trim`, `.auto_sync` from `config/schema.py`. Drop `ReplayAutoSyncConfig` class.
+- Drop `REPLAY_TIME_OFFSET_MS` env var + `auto_sync` block handling from `config/loader.py`
+- Update `runtime_root/_replay_branch.py` to drop any lingering imports / dead code
+- Update `runtime_root/__init__.py` if it references removed symbols
+- Update `replay_api/app.py` if it references removed symbols
+- Update `e2e/fixtures/sitl_replay_builder/builder.py` if it references removed symbols
+
+## Touch list (tests)
+
+- Delete remaining auto-sync test residue (no-op stub tests from AZ-895)
+- Update CLI tests to drop deprecated-flag assertions (the flags no longer exist)
+- Confirm `test_az401_compose_root_replay.py` is clean
+
+## Touch list (docs)
+
+- Update `_docs/02_document/module-layout.md` replay_input file list — remove deleted entries
+- Update `_docs/02_document/contracts/replay/replay_protocol.md` — remove auto-sync surface narrative (coordinate with AZ-842)
+- Update `_docs/02_document/contracts/replay/csv_replay_format.md` cross-references
+
+## Acceptance Criteria
+
+- **AC-1**: All files listed under "DELETE" above are removed from the workspace
+- **AC-2**: Unit tests pass with no auto-sync, AutoSyncConfig, AutoSyncDecision, or AlignedWindow symbols in `src/gps_denied_onboard/**`
+- **AC-3**: CLI `--help` output does not mention `--time-offset-ms`, `--skip-auto-sync`, or `--auto-trim`
+- **AC-4**: `_docs/02_document/module-layout.md` does not mention `auto_sync.py` or `tlog_video_adapter.py`
+- **AC-5**: `tests/e2e/replay/test_derkachi_real_tlog.py` continues to `@xfail` with AZ-848-scoped reason
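
One way to check AC-2 mechanically is a whole-word scan of the source tree for the removed symbols. The sketch below is an assumption about how such a check might be written, not an existing project script; `find_removed_symbols` and `load_sources` are hypothetical names, and the symbol list should be adjusted if AZ-969 keeps any of these names alive in the new alignment module.

```python
import re
from pathlib import Path
from typing import Iterable, Mapping

# Symbols AC-2 expects to be gone (adjust if AZ-969 retains any of them).
REMOVED_SYMBOLS = ("auto_sync", "AutoSyncConfig", "AutoSyncDecision", "AlignedWindow")


def find_removed_symbols(
    sources: Mapping[str, str],
    symbols: Iterable[str] = REMOVED_SYMBOLS,
) -> list[tuple[str, int, str]]:
    """Return (file, line_number, symbol) for every whole-word hit."""
    patterns = [(s, re.compile(rf"\b{re.escape(s)}\b")) for s in symbols]
    hits = []
    for name, text in sorted(sources.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for symbol, pattern in patterns:
                if pattern.search(line):
                    hits.append((name, lineno, symbol))
    return hits


def load_sources(root: Path) -> dict[str, str]:
    """Read every .py file under root, for example src/gps_denied_onboard."""
    return {str(p): p.read_text(encoding="utf-8") for p in root.rglob("*.py")}
```

Because `_` counts as a word character, the word-boundary match does not flag longer identifiers such as `skip_auto_sync`; those would need their own entries in the symbol list.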
+
+## Out of scope
+
+- AZ-848 / AZ-883 structural fix (tlog clock bug) — unchanged from AZ-895
+- Replacing the deprecated coordinator with something else — the CSV path is the replacement (see `_replay_branch._build_csv_bundle`)
+
+## References
+
+- Companion in cycle 4: AZ-894 (CSV adapter), AZ-895 (deprecation)
+- Decision audit trail: this file + AZ-895 batch_03_cycle4_report.md
+- User decision 2026-05-26 (cycle-4 /autodev batch 3): chose Option A (light deprecation now, file AZ-908 for hard removal in cycle 5+) over Option B (full removal in cycle 4).
@@ -0,0 +1,120 @@
+# Direct binary-tlog GPS-truth extractor
+
+**Task**: AZ-697_tlog_ground_truth_extractor
+**Name**: Direct binary-tlog GPS-truth extractor (replaces data_imu.csv middle-man)
+**Description**: New `tlog_ground_truth.py` module that streams `GLOBAL_POSITION_INT` (or falls back to `GPS_RAW_INT`) from a binary ArduPilot tlog into a typed `TlogGroundTruth` DTO. Production helper (not test-only).
+**Complexity**: 3 points
+**Dependencies**: None
+**Component**: replay_input (cross-cutting validation helper)
+**Tracker**: AZ-697
+**Epic**: AZ-696
+
+## Problem
+
+Cycle-1 AC-3 (≤ 100 m horizontal error for 80 % of ticks) was permanently
+`@xfail` partly because the test fed the SUT a tlog synthesized from
+`_docs/00_problem/input_data/flight_derkachi/data_imu.csv`, and read
+ground truth from the same CSV — comparing the estimator to itself.
+
+A real binary `derkachi.tlog` (5.8 MB ArduPilot tlog, MAVLink v2) was
+committed on 2026-05-20. The remaining gap is a direct extractor that
+reads `GLOBAL_POSITION_INT` (or `GPS_RAW_INT`) from the binary and
+returns a typed DTO suitable for the AC-3 comparison helper.
+
+## Outcome
+
+- A new production module `src/gps_denied_onboard/replay_input/tlog_ground_truth.py`
+  exposes `load_tlog_ground_truth(path: Path) -> TlogGroundTruth`.
+- The existing AC-3 comparison helpers (`l2_horizontal_m`,
+  `match_percentage`) move from `tests/e2e/replay/_helpers.py` into
+  `src/gps_denied_onboard/helpers/` so they are production code, not
+  test-only.
+- The replay-test conftest uses the new extractor when the real tlog is
+  present; CSV path remains as a synth-tlog fallback.
+
+## Scope
+
+### Included
+- New `TlogGroundTruth` dataclass (frozen + slotted) with per-record
+  `ts_ns`, `lat_deg`, `lon_deg`, `alt_m`, `hdg_deg`, `vx_m_s`, `vy_m_s`,
+  `vz_m_s` fields.
+- `load_tlog_ground_truth(path)` — lazy `pymavlink.mavutil` open
+  mirroring `replay_input/auto_sync.py::_open_tlog`.
+- Move `l2_horizontal_m` + `match_percentage` from test helpers to
+  `src/gps_denied_onboard/helpers/gps_compare.py`.
+- Wire `tests/e2e/replay/conftest.py` to consume the new path when
+  `derkachi.tlog` exists.
+- Unit tests under `tests/unit/replay_input/test_tlog_ground_truth.py`
+  using a synthetic tlog (extend `tests/e2e/replay/_tlog_synth.py`).
+
+### Excluded
+- Tlog trimming for mid-flight slices — AZ-698 (T2).
+- Accuracy report writing — AZ-699 (T3).
+- Map visualization — AZ-700 (T4).
+
+## Acceptance Criteria
+
+**AC-1: Happy path on real tlog**
+Given the committed `derkachi.tlog`
+When `load_tlog_ground_truth(derkachi.tlog)` runs
+Then it returns `TlogGroundTruth` with `len(records) > 100` and lat ≈ 50.08, lon ≈ 36.11
+
+**AC-2: Empty GPS gracefully**
+Given a tlog with no `GLOBAL_POSITION_INT` / `GPS_RAW_INT` messages
+When the extractor runs
+Then it returns `TlogGroundTruth(records=())` and logs WARN (does NOT raise)
+
+**AC-3: Fallback precedence**
+Given a tlog containing only `GPS_RAW_INT` (no `GLOBAL_POSITION_INT`)
+When the extractor runs
+Then it returns records sourced from `GPS_RAW_INT`
+
+**AC-4: Type safety**
+When `mypy --strict src/gps_denied_onboard/replay_input/tlog_ground_truth.py` runs
+Then it reports zero errors
+
+**AC-5: Comparison helpers in production**
+Given the moved `l2_horizontal_m` + `match_percentage`
+When imported from `gps_denied_onboard.helpers.gps_compare`
+Then they behave identically to the prior test-helpers location (snapshot test)
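
For orientation, the kind of computation these helpers perform can be sketched as below. The function names `horizontal_distance_m` and `percent_within` are hypothetical, and the haversine formula with a mean Earth radius is an assumption of this sketch; the project's `l2_horizontal_m` and `match_percentage` may use different signatures or a different distance model.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def horizontal_distance_m(lat1_deg: float, lon1_deg: float,
                          lat2_deg: float, lon2_deg: float) -> float:
    """Haversine great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def percent_within(errors_m, threshold_m: float) -> float:
    """Percentage of error samples at or below threshold_m (0.0 for no samples)."""
    errors = list(errors_m)
    if not errors:
        return 0.0
    return 100.0 * sum(e <= threshold_m for e in errors) / len(errors)
```

With helpers of this shape, the cycle-1 AC-3 gate reduces to checking that `percent_within(errors, 100.0)` is at least 80.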
+
+## Non-Functional Requirements
+
+**Performance**
+- `load_tlog_ground_truth(derkachi.tlog)` (5.8 MB, ~60 s of GPS at 5 Hz) returns in < 2 s on Tier-1 hardware.
+
+**Reliability**
+- Lazy pymavlink import; missing dep raises `ReplayInputAdapterError` per project convention.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Real derkachi.tlog parse | Non-empty TlogGroundTruth with Derkachi geofence lat/lon |
+| AC-2 | Tlog with no GPS messages | Empty records tuple + WARN log |
+| AC-3 | GPS_RAW_INT fallback | Records sourced from GPS_RAW_INT when GLOBAL_POSITION_INT absent |
+| AC-3 | Mixed GLOBAL_POSITION_INT + GPS_RAW_INT | GLOBAL_POSITION_INT wins per AC-3 |
+| AC-4 | mypy --strict | Zero errors |
+| AC-5 | Helper move snapshot | Same numeric output as prior test-helpers location |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | derkachi.tlog (real) | Load full tlog | ≥ 100 records, Derkachi geofence | Perf < 2s |
+
+## Constraints
+
+- pymavlink is already a project dep (used by C8); MUST be lazy-imported (auto_sync.py pattern).
+- New module MUST follow the project's frozen + slotted dataclass convention.
+- File ownership goes in `_docs/02_document/module-layout.md` per AZ-696 epic layout (no contract — internal helper).
+
+## Risks & Mitigation
+
+**Risk 1: MAVLink unit-conversion bugs**
+- *Risk*: MAVLink encodes lat/lon as integers in degrees × 1e7. Forgetting the divide ships records off by 7 orders of magnitude.
+- *Mitigation*: AC-1 asserts Derkachi geofence values; unit test snapshots a known fixture.
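
The unit conversions behind Risk 1 can be made explicit in one place. The sketch below converts the integer fields of a `GLOBAL_POSITION_INT` message (degE7, millimetres, cm/s, centidegrees, per the MAVLink common message set) to SI values; `global_position_int_to_si` is a hypothetical helper name and the dict output is a simplification, not the extractor's actual return type.

```python
from types import SimpleNamespace

HDG_UNKNOWN = 65535  # UINT16_MAX marks "heading unknown" in GLOBAL_POSITION_INT


def global_position_int_to_si(msg) -> dict[str, float]:
    """Convert GLOBAL_POSITION_INT integer fields to degrees, metres and m/s."""
    return {
        "lat_deg": msg.lat / 1e7,    # degE7 -> degrees
        "lon_deg": msg.lon / 1e7,    # degE7 -> degrees
        "alt_m": msg.alt / 1000.0,   # millimetres (MSL) -> metres
        "hdg_deg": float("nan") if msg.hdg == HDG_UNKNOWN else msg.hdg / 100.0,
        "vx_m_s": msg.vx / 100.0,    # cm/s -> m/s
        "vy_m_s": msg.vy / 100.0,
        "vz_m_s": msg.vz / 100.0,
    }


# Synthetic message near the Derkachi geofence values quoted in AC-1.
sample = SimpleNamespace(lat=500_800_000, lon=361_100_000, alt=150_000,
                         hdg=9_000, vx=1_250, vy=-300, vz=10)
converted = global_position_int_to_si(sample)
```

A fixture like `sample` doubles as the snapshot the mitigation mentions: if the divide is dropped, the latitude assertion fails by seven orders of magnitude.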
+
+**Risk 2: pymavlink import flakiness on Jetson**
+- *Risk*: pymavlink occasionally fails to import on aarch64.
+- *Mitigation*: Lazy import + raise `ReplayInputAdapterError` (existing pattern).
@@ -0,0 +1,212 @@
+# Tlog trim + mid-flight alignment for replay
+
+**Task**: AZ-698_tlog_trim_midflight_alignment
+**Name**: Trim tlog to video window + align mid-flight slices via cross-correlation
+**Description**: Extend `replay_input/auto_sync.py` and `TlogReplayFcAdapter` to handle the case where the video is a mid-flight slice of a longer tlog (not the takeoff). Adds `find_aligned_window` (cross-correlation of IMU energy vs video optical-flow magnitude) and a `--auto-trim` CLI flag.
+**Complexity**: 5 points
+**Dependencies**: AZ-697
+**Component**: replay_input + c8_fc_adapter
+**Tracker**: AZ-698
+**Epic**: AZ-696
+
+## Problem
+
+`replay_input/auto_sync.py::detect_tlog_takeoff` walks the tlog HEAD for
+the takeoff event (sustained vertical accel + attitude rate). When the
+uploaded video covers a **mid-flight slice** (e.g., 20–25 min into a
+30 min flight), takeoff detection lands at t=0 and the resulting offset
+is garbage. The replay coordinator then streams the entire tlog
+start-to-end, wasting I/O on the leading minutes and computing
+estimates against stale tlog samples.
+
+The user's pipeline framing: "tlog is usually bigger than video, and
+usually the last chunk in tlog is relevant" — the system must locate
+the video's window within the tlog and trim accordingly.
+
+## Outcome
+
+- A new `find_aligned_window(tlog_path, video_path, config) -> AlignedWindow`
+  returns `(tlog_start_ns, tlog_end_ns, offset_ms, confidence)`.
+- `TlogReplayFcAdapter.open()` honors `tlog_start_ns` — seeks past
+  pre-window messages so downstream only sees the relevant slice.
+- `gps-denied-replay --auto-trim` is the default for uploads that don't
+  pass `--time-offset-ms` or `--skip-auto-sync`.
+- Existing takeoff-aligned Derkachi clip continues to pass AC-9 (no
+  regression on AZ-405).
+
+## Scope
+
+### Included
+- New `find_aligned_window` algorithm — cross-correlation of:
+  - IMU energy stream (10 Hz subsampled `|a| − 1g` from `RAW_IMU`/`SCALED_IMU2`)
+  - Video optical-flow magnitude (existing `_compute_flow_magnitudes`)
+- New `AlignedWindow` DTO under `replay_input/interface.py`.
+- `TlogReplayFcAdapter._timestamp_filter(tlog_start_ns)` seek logic.
+- `gps-denied-replay --auto-trim` CLI flag wiring.
+- Tests: takeoff-aligned regression + synthetic mid-flight scenario.
+
+### Excluded
+- Real-flight validation runner — AZ-699 (T3).
+- Map visualization — AZ-700 (T4).
+- HTTP API — AZ-701 (T5).
+- Camera calibration — AZ-702 (T6).
+
+## Acceptance Criteria
+
+**AC-1: Backward-compat on takeoff-aligned clip**
+Given the existing Derkachi 60 s clip with synthesized tlog
+When `find_aligned_window` runs
+Then it returns `offset_ms` within ± 50 ms of the current `auto_sync.compute_offset` result
+
+**AC-2: Mid-flight alignment**
+Given a synthetic scenario: tlog covering 0–300 s, video covering 100–110 s with motion onset at tlog t=105 s
+When `find_aligned_window` runs
+Then `tlog_start_ns ≈ 100 s`, `tlog_end_ns ≈ 110 s`, `offset_ms` places video t=0 at tlog t=100 s
+
+**AC-3: Tlog trim honored by replay adapter**
+Given `TlogReplayFcAdapter` opened with `tlog_start_ns = 100 s`
+When messages flow
+Then only messages with `_timestamp ≥ 100 s` reach subscribers
+
+**AC-4: AC-9 frame-window validator passes for both scenarios**
+Given the resolved offset from AC-1 or AC-2
+When the AC-9 validator runs on the aligned window
+Then it returns 0 (≥ 95 % match)
+
+**AC-5: End-to-end CLI smoke**
+Given `gps-denied-replay --auto-trim --video derkachi.mp4 --tlog derkachi.tlog`
+When the run completes
+Then exit code is 0 and the output JSONL is non-empty
+
+## Non-Functional Requirements
+
+**Performance**
+- Alignment over a 30-min tlog completes in < 30 s on Tier-1 hardware (10 Hz subsampled IMU stream).
+
+**Reliability**
+- Low confidence (< `low_confidence_threshold`) falls back to head-takeoff detection (existing behavior).
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Takeoff-aligned offset match | Within ± 50 ms of compute_offset |
+| AC-2 | Mid-flight window discovery | Correct (start_ns, end_ns) |
+| AC-3 | Adapter seek skips pre-window | First emitted ts ≥ tlog_start_ns |
+| AC-4 | Validator on aligned scenarios | Returns 0 |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-5 | Real derkachi inputs + --auto-trim | Full replay CLI run | Clean exit 0 + non-empty JSONL | — |
+
+## Constraints
+
+- Reuse the existing `_find_sustained_event` window-scan utility — no new generic algorithms.
+- IMU subsampling MUST be deterministic (AC-10 across the rest of the replay path).
+- `tlog_start_ns` seek MUST not break the existing AZ-611 `--skip-auto-sync` path.
+
+## Risks & Mitigation
+
+**Risk 1: False maxima during steady cruise**
+- *Risk*: Cross-correlation of steady-state cruise IMU + uniform video flow can have multiple equal-height peaks.
+- *Mitigation*: Report `combined_confidence`; below threshold falls back to head-takeoff or explicit offset.
+
+**Risk 2: Performance on long tlogs**
+- *Risk*: Multi-hour tlogs would slow naive correlation.
+- *Mitigation*: Subsample both streams to 10 Hz before FFT-based correlation.
+
+---
+
+## Implementation Notes (Batch 99 — Cycle 2)
+
+**Status**: In Testing (Jira AZ-698).
+
+### Files changed
+
+Production:
+- `src/gps_denied_onboard/replay_input/interface.py` — added `AlignedWindow` DTO, new `alignment_*` fields on `AutoSyncConfig`, optional `aligned_window` on `ReplayInputBundle`.
+- `src/gps_denied_onboard/replay_input/auto_sync.py` — added `find_aligned_window`, internal `_align_via_cross_correlation` (normalised cross-correlation per sliding window), `_fallback_to_head_takeoff`, `_resample_uniform`, `_zero_mean_normalise`, `_load_tlog_imu_energy_stream`, `_stream_duration_ns`.
+- `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` — added `_run_auto_trim` branch in `open()`, threads `tlog_start_ns` to the adapter and `AlignedWindow` onto the returned bundle, two new `_LOG_KIND_*` logs.
+- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py` — added `_tlog_start_ns` seek hook; `feed_one_message` skips messages with `_timestamp < _tlog_start_ns` and counts the drop.
+- `src/gps_denied_onboard/config/schema.py` — `auto_trim: bool` on `ReplayConfig` (mutex with `time_offset_ms`); `alignment_*` knobs on `ReplayAutoSyncConfig`.
+- `src/gps_denied_onboard/config/loader.py` — coercion entries for the new knobs.
+- `src/gps_denied_onboard/runtime_root/_replay_branch.py` — passes `auto_trim` and the new alignment knobs into the replay adapter constructor.
+- `src/gps_denied_onboard/cli/replay.py` — `--auto-trim` flag wired into `ReplayConfig`.
+
+Tests:
+- `tests/unit/replay_input/test_az698_window_alignment.py` — AC-1..AC-4 + fallback + immutability + CLI smoke (AC-5 skipped: real `flight_derkachi.mp4` is a 134 B placeholder).
+
+### AC coverage
+
+| AC | Test | Result |
+|----|------|--------|
+| AC-1 | `test_ac1_takeoff_aligned_offset_matches_az405_within_50ms` | PASS |
+| AC-2 | `test_ac2_mid_flight_alignment_locates_correct_window` | PASS |
+| AC-3 | `test_ac3_adapter_seek_skips_pre_window_messages`, `test_ac3_adapter_default_no_seek_passes_every_message` | PASS |
+| AC-4 | `test_ac4_validator_passes_for_takeoff_aligned_offset`, `test_ac4_validator_passes_for_mid_flight_offset` | PASS |
+| AC-5 | `test_ac5_cli_auto_trim_smoke_uses_find_aligned_window` | SKIPPED (real video missing) |
+
+### Test results
+
+50 passed, 2 skipped across the replay/c8 regression slice (`test_az698_window_alignment.py`, `test_az405_auto_sync.py`, `test_az405_replay_input_adapter.py`, `test_az399_tlog_replay_adapter.py`, `test_tlog_ground_truth.py`, `test_az697_gps_compare.py`, `test_khp20s30_factory.py`, `test_az687_pre_constructed_replay_mode.py`, `test_az269_config_loader.py`). No regressions.
+
+### Strict typing
+
+`mypy --strict` on the 8 modified `src/` files: 17 errors total, all pre-existing (verified by stashing this batch's `src/` changes and re-running). Zero new errors introduced by AZ-698.
+
+### Known limitations
+
+- AC-5 is a literal skip in this batch. The repo's `flight_derkachi.mp4` is a 134-byte placeholder, not a real recording. Real end-to-end CLI smoke against `derkachi.tlog` + the actual flight video is covered by AZ-699 (validation runner) once the video is sourced.
+- Pre-existing `mypy --strict` errors in `auto_sync.py`, `tlog_replay_adapter.py`, `tlog_video_adapter.py`, `_replay_branch.py`, `cli/replay.py`, and `loader.py` are out of scope per `coderule.mdc` (only fix pre-existing lints in the modified area when necessary). They were not necessary for AZ-698.
+
+### Algorithm note
+
+Implementation uses **normalised cross-correlation with per-window unit-norm** (each `len(flow_arr)`-sized slice of the tlog energy stream is zero-meaned + unit-normed before the dot product with the unit-normed flow stream). This makes the peak confidence scale-invariant: a 10 s motion burst inside a 300 s tlog produces a peak ≥ 0.95, where the original FFT-style correlation with full-length normalisation produced ≤ 0.3 and tripped the low-confidence fallback. Cost is O(N·M); with the 10 Hz subsample and a typical 300 s tlog × 10 s flow window, that's ~3 000 inner products of 100 samples each (roughly 3 × 10^5 multiply-adds), well below the NFR perf budget.
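
The per-window normalisation can be illustrated with a small pure-Python sketch. It is not the code in `auto_sync.py`: `best_window` and `_unit_norm_zero_mean` are hypothetical names, and plain lists stand in for whatever array type the implementation uses.

```python
import math


def _unit_norm_zero_mean(xs):
    mean = sum(xs) / len(xs)
    centred = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centred))
    # A constant window has no shape to correlate against; return zeros.
    return [c / norm for c in centred] if norm > 0.0 else [0.0] * len(xs)


def best_window(energy, flow):
    """Return (start_index, peak_score) of the best-matching window in energy."""
    m = len(flow)
    if m == 0 or len(energy) < m:
        raise ValueError("flow must be non-empty and no longer than energy")
    flow_n = _unit_norm_zero_mean(flow)
    best_idx, best_score = 0, -1.0
    for start in range(len(energy) - m + 1):
        # Normalise each slice on its own, so the score depends on shape only.
        window_n = _unit_norm_zero_mean(energy[start:start + m])
        score = sum(w * f for w, f in zip(window_n, flow_n))
        if score > best_score:  # strict ">" keeps the first maximum, like argmax
            best_idx, best_score = start, score
    return best_idx, best_score


burst = [0.1, 0.5, 1.0, 0.4, 0.2, 0.8, 0.3, 0.1, 0.6, 0.05]
energy = [0.0] * 50 + burst + [0.0] * 50
flow = [3.0 * b + 1.0 for b in burst]  # same shape, different scale and offset
idx, peak = best_window(energy, flow)
```

Because each slice is normalised independently, the rescaled and offset `flow` still scores close to 1.0 at the true position, which is the scale invariance the note describes.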
+
+### Follow-up: multi-flight tlog handling (post-batch-99 review)
+
+User reported that real `derkachi.tlog` contains **three takeoffs at the same field**, but the uploaded video covers only the **last** one. The original AZ-698 implementation was vulnerable in two places:
+
+1. NCC `argmax` returns the **first** index of the maximum — if all three flights produce comparable correlation peaks, the result would lock onto flight 1.
+2. The low-confidence fallback called `detect_tlog_takeoff` on the whole tlog, which is the AZ-405 head-takeoff detector — also locks onto flight 1.
+
+Both contradicted the spec line 22: *"the last chunk in tlog is relevant"*.
+
+Resolution: added a pre-NCC flight segmenter and made the aligner explicitly select the last flight before running NCC. The fallback also now uses the last segment's start instead of head-takeoff detection.
+
+#### New module surface
+
+- `_segment_flights_from_imu_energy(samples, *, motion_threshold, min_flight_duration_ns, max_internal_gap_ns) -> list[(start_ns, end_ns)]`: partitions the IMU energy stream into distinct flights. A flight is a contiguous span where energy stays above threshold, tolerating sub-threshold runs (cruise lulls) shorter than `max_internal_gap_ns`. Note on reuse: `_find_sustained_event` (AZ-405) returns only the FIRST qualifying window by design, so partitioning all flights needs a fresh one-pass walk.
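
A one-pass segmenter with this contract might look like the sketch below. It is an illustration under assumptions, not the shipped function: `segment_flights` is a hypothetical name, samples are assumed to be `(ts_ns, energy)` pairs in time order, and a gap splits a flight only when it is strictly longer than `max_internal_gap_ns`.

```python
def segment_flights(samples, *, motion_threshold, min_flight_duration_ns,
                    max_internal_gap_ns):
    """Partition (ts_ns, energy) samples into [(start_ns, end_ns), ...] flights."""
    flights = []
    start_ns = None        # first above-threshold timestamp of the open span
    last_active_ns = None  # most recent above-threshold timestamp

    def close_span():
        # Keep the open span only if it lasted long enough to be a flight.
        if start_ns is not None and last_active_ns - start_ns >= min_flight_duration_ns:
            flights.append((start_ns, last_active_ns))

    for ts_ns, energy in samples:
        if energy < motion_threshold:
            continue  # sub-threshold samples only matter through the gap they leave
        if start_ns is None:
            start_ns = ts_ns
        elif ts_ns - last_active_ns > max_internal_gap_ns:
            close_span()      # gap too long: the previous flight has ended
            start_ns = ts_ns
        last_active_ns = ts_ns
    close_span()
    return flights


SEC = 1_000_000_000


def _active(t):
    # Two flights (the second with a 3 s lull) plus a 5 s ground blip.
    return 0 <= t <= 60 or (161 <= t <= 220 and not 190 <= t <= 192) or 300 <= t <= 305


stream = [(t * SEC, 0.5 if _active(t) else 0.02) for t in range(400)]
found = segment_flights(stream, motion_threshold=0.10,
                        min_flight_duration_ns=30 * SEC,
                        max_internal_gap_ns=30 * SEC)
```

Selecting the last flight is then `found[-1]`, and an empty result corresponds to the degenerate case where the aligner falls through to whole-tlog NCC.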
+
+#### New config knobs (`AutoSyncConfig` + `ReplayAutoSyncConfig`)
+
+| Knob | Default | Meaning |
+|------|---------|---------|
+| `alignment_segment_motion_threshold_g` | `0.10` | Min IMU energy (`\|a\| − 1g`, g-units) for a sample to count as in-flight. 0.10 captures cruise oscillation while ignoring stationary sensor noise (~ 0.02). |
+| `alignment_segment_min_flight_duration_seconds` | `30.0` | Discards short ground-startup blips. |
+| `alignment_segment_max_internal_gap_seconds` | `30.0` | Sub-threshold gaps shorter than this stay inside a flight. |
+
+#### New observability fields (`AlignedWindow`)
+
+- `flight_count_detected: int` — how many distinct flights the segmenter found. `1` for a clean single-flight tlog. `> 1` means multi-flight; the aligner always selected the last one.
+- `selected_flight_index: int` — zero-based; always `flight_count_detected - 1` when segmentation fired, else `-1` (segmenter found nothing — fall through to whole-tlog NCC, preserving pre-segmentation behaviour for degenerate inputs).
+
+Surfaced via the `replay.auto_trim.resolved` / `replay.auto_trim.fallback_to_takeoff` log records' `kv` dict so the operator can audit segment selection from the FDR.
+
+#### New tests
+
+| Test | Asserts |
+|------|---------|
+| `test_segmenter_one_flight_returns_single_span` | Single-flight tlog → 1 segment, correct bounds |
+| `test_segmenter_three_flights_returns_three_spans_in_order` | 3-flight tlog → 3 segments in chronological order |
+| `test_segmenter_drops_ground_blip_below_min_duration` | < 30 s motion is filtered out |
+| `test_segmenter_keeps_brief_cruise_lull_inside_flight` | 3 s mid-flight lull does NOT split a flight |
+| `test_find_aligned_window_picks_last_flight_for_multi_flight_tlog` | Full `find_aligned_window` pipeline on a 3-flight tlog → resulting window is inside flight 3, not flight 1 or 2 |
+| `test_align_via_cross_correlation_locks_onto_burst_inside_last_segment` | NCC path locks correctly on a pre-restricted-to-last-flight energy stream |
+| `test_find_aligned_window_uses_only_segment_for_segmented_tlog_fallback` | Low-confidence fallback uses segment start (last flight), NOT head-takeoff (flight 1) |
+
+All 19 AZ-698 tests pass, 1 expected skip (AC-5 real-video smoke). 113 tests pass in the broader regression slice — no regressions.
+
+Backward-compat verified: AC-1 / AC-2 / AC-3 / AC-4 tests exercise `_align_via_cross_correlation` directly and continue to pass; the segmentation gate only fires through `find_aligned_window`'s public entry point.
@@ -0,0 +1,153 @@
+# Real-flight validation runner + accuracy report
+
+**Task**: AZ-699_real_flight_validation_runner
+**Name**: Run estimator against real Derkachi tlog + video; compute honest accuracy metrics; write report
+**Description**: New e2e test `test_derkachi_real_tlog.py` that feeds the real `derkachi.tlog` (not the synth) into the replay pipeline, compares the JSONL output against the binary-tlog GPS truth (from AZ-697), and writes a structured Markdown accuracy report. Flips AC-3 from `@xfail` to a real PASS/FAIL verdict.
+**Complexity**: 3 points
+**Dependencies**: AZ-697
+**Component**: Blackbox Tests (epic AZ-696)
+**Tracker**: AZ-699
+**Epic**: AZ-696
+
+## Problem
+
+`tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`
+is permanently `@xfail`. Even when the test runs (Jetson Tier-2), the
+result is hidden — we have no honest measurement of estimator accuracy
+against a real flight. The cycle-1 retrospective (`_docs/06_metrics/retro_2026-05-20.md`)
+flagged this as the highest-impact open verification.
+
+The two contributors:
+1. Synth tlog (compares estimator to itself) — fixed by AZ-697.
+2. Unknown camera intrinsics — addressed by AZ-702 (T6, factory sheet).
+
+This task wires the real tlog + the calibration into a new test and
+produces the honest verdict + a structured report.
+
+## Outcome
+
+- A new test runs the full `gps-denied-replay` against `derkachi.tlog` +
+  `flight_derkachi.mp4` + `khp20s30_factory.json` (or the current
+  fallback) and reports honest accuracy metrics.
+- A structured report at `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md`
+  contains mean / p50 / p95 / p99 horizontal error, % within {10, 25, 50, 100} m,
+  vertical error stats, and notes the calibration assumption.
+- AC-3 emits a real PASS or honest FAIL verdict (no `@xfail` mask).
+
+## Scope
+
+### Included
+- New test `tests/e2e/replay/test_derkachi_real_tlog.py` parallel to the existing 1-min test but using the binary tlog.
+- Metric helpers (mean/p50/p95/p99 percentile + threshold-hit counters) live in `src/gps_denied_onboard/helpers/gps_compare.py` (extends AZ-697).
+- Report writer `tests/e2e/replay/_report_writer.py` (test helper, not production code).
+- Updated `_docs/06_metrics/real_flight_validation_{date}.md` artifact format documented in `_docs/02_document/tests/blackbox-tests.md`.
+
+### Excluded
+- Map visualization — AZ-700.
+- HTTP API — AZ-701.
+- Camera calibration acquisition — AZ-702 (this task ships with whatever calibration is current).
+- Editing the existing `test_derkachi_1min.py` (new test runs alongside).
+
+## Acceptance Criteria
+
+**AC-1: Real PASS/FAIL verdict (no mask)**
+Given the new test on Tier-2 Jetson
+When `pytest tests/e2e/replay/test_derkachi_real_tlog.py -m tier2` runs
+Then the result is PASS or FAIL — no `@xfail`, no `@skip`
+
+**AC-2: Structured report written**
+Given a successful invocation
+When the test finishes
+Then `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md` exists with all required metrics in a Markdown table
+
+**AC-3: FAIL message attributes calibration uncertainty**
+Given the test fails the 80 %/100 m gate
+When the failure message renders
+Then it references the calibration acquisition method (factory-sheet per AZ-702) and the residual budget
+
+**AC-4: Existing 1-min test untouched**
+Given the cycle-1 test `test_ac3_within_100m_80pct_of_ticks`
+When all changes land
+Then the existing `@xfail` test still exists and runs (we add, don't replace)
+
+## Non-Functional Requirements
+
+**Performance**
+- The new test must complete within the existing Jetson Tier-2 wall budget (≤ 15 min for a 60 s clip); for longer clips, report the measured wall time.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-2 | Report writer with mock metrics | Markdown contains every required row |
+| AC-3 | Failure message templating | Contains "calibration: factory-sheet" + budget |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Real derkachi.tlog + video + KHP20S30 calibration | Full replay + accuracy gate | PASS or FAIL (honest) | Perf ≤ 15 min |
+| AC-2 | After AC-1 run | Report file existence + contents | Structured report on disk | — |
+
+## Constraints
+
+- The new test MUST use the existing `gps-denied-replay` console-script — no inlined estimator invocation.
+- The report MUST be Markdown (not HTML/JSON) so it lives alongside other `_docs/06_metrics/` artifacts.
+- Skipping in CI when `RUN_REPLAY_E2E=0` is allowed (matches existing pattern); the test MUST run when the env var is set.
+
+## Risks & Mitigation
+
+**Risk 1: Honest FAIL exposes a true product gap**
+- *Risk*: The estimator may legitimately fail the 100 m/80 % gate even with correct calibration. The Derkachi flight is at cruise altitude, with limited VPR anchor diversity.
+- *Mitigation*: That's the goal — honest measurement. Surface the gap; downstream cycles can tighten.
+
+**Risk 2: tlog format edge cases**
+- *Risk*: Real tlogs may carry non-standard system IDs, dialect mismatches, or corrupt segments.
+- *Mitigation*: AZ-697's AC-3 / AC-4 cover this at the truth-extractor level; this task only consumes the result.
+
+---
+
+## Implementation Notes (Batch 100 — Cycle 2)
+
+**Status**: In Testing (Jira AZ-699).
+
+### Files changed
+
+Production:
+- `src/gps_denied_onboard/helpers/gps_compare.py` — extended with `HorizontalErrorDistribution` DTO, `horizontal_error_distribution(emissions, ground_truth)` single-walk aggregator, and `percentile_sorted(values, pct)` linear-interpolation helper (numpy-equivalent). Re-exports added to `helpers/__init__.py`.
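+
+A linear-interpolation percentile over pre-sorted values, of the kind the `percentile_sorted(values, pct)` entry describes, might look like the sketch below. It assumes `pct` is on a 0 to 100 scale and that the input is already sorted ascending; it illustrates the technique (the same rule as NumPy's default `linear` method) and is not the project's actual implementation.
+
```python
import math
from typing import Sequence


def percentile_sorted(values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile of an ascending-sorted sequence."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be in [0, 100]")
    # Fractional index into the sorted data: 0 maps to the first element,
    # 100 maps to the last one.
    rank = (pct / 100.0) * (len(values) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    # Interpolate between the two neighbouring samples.
    return values[lower] + (values[upper] - values[lower]) * fraction
```
+
+Sorting once and then reading p50, p95, and p99 from the same list keeps the aggregation to a single pass over the sorted errors.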
+
+Test helpers (under `tests/`, not production):
+- `tests/e2e/replay/_report_writer.py` — `render_report`, `format_failure_message`, `verdict_passes_ac3`, plus `ReportContext` DTO and the `AC3_GATE_THRESHOLD_M` / `AC3_GATE_PCT` constants.
+
+Tests:
+- `tests/e2e/replay/test_derkachi_real_tlog.py` — the AZ-699 e2e runner (skipped without real video + `RUN_REPLAY_E2E=1`; honest PASS/FAIL when prerequisites met).
+- `tests/unit/test_az699_report_writer.py` — 16 unit tests covering percentile arithmetic, distribution aggregator, verdict gate, failure-message templating, and report layout.
+
+Documentation:
+- `_docs/02_document/tests/blackbox-tests.md` — new entry **FT-P-20** documenting the artefact schema for `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md`.
+
+### AC coverage
+
+| AC | Test / Artefact | Result |
+|----|-----------------|--------|
+| AC-1 | `test_az699_real_flight_validation_emits_verdict_and_report` | SKIPPED on dev (real video missing); ready to run on Tier-2 Jetson with `RUN_REPLAY_E2E=1` + real video. NO `@xfail` mask. |
+| AC-2 | `test_render_report_contains_all_required_rows_on_pass`, `test_render_report_marks_failure_when_below_gate`, `test_render_report_includes_vertical_when_available` | PASS |
+| AC-3 | `test_failure_message_references_calibration_method_factory_sheet`, `test_failure_message_references_calibration_method_placeholder` | PASS |
+| AC-4 | `tests/e2e/replay/test_derkachi_1min.py` untouched (verified by diff scope: this batch added a sibling file, did not modify the existing one) | PASS |
+
+### Test results
+
+129 passed, 3 skipped (all documented prerequisites: AZ-699 e2e wants real video + Tier-2; AZ-698 AC-5 wants real video; AZ-399 AC-1 wants 500 MB tlog) in the focused regression slice. Full unit suite: 2219 passed, 1 pre-existing unrelated failure in `tests/unit/c12_operator_orchestrator/test_cli_console_script.py::test_cold_start_under_500ms_p99` (CLI cold-start NFR, 8/11 samples > 700 ms; touches none of AZ-697/698/699's code paths).
+
+### Strict typing
+
+`mypy --strict` on the three touched modules (`helpers/gps_compare.py`, `helpers/__init__.py`, `tests/e2e/replay/_report_writer.py`): **Success: no issues found in 3 source files.**
+
+### Known limitations
+
+- AC-1 (the actual real-flight run on Tier-2 Jetson) cannot execute in this dev environment. The test is wired, gated cleanly, and ready — drop a real `flight_derkachi.mp4` (> 1 MB) into `_docs/00_problem/input_data/flight_derkachi/`, set `RUN_REPLAY_E2E=1`, and run on Tier-2 to produce the verdict + report.
+- Calibration acquisition method is read from the JSON's `acquisition_method` field. AZ-702 ships `factory-sheet`; any other value (or absence) is labelled `unknown` in the failure message.
+
+### Design note: helper module location
+
+`gps_compare.py` lives under `src/gps_denied_onboard/helpers/` (production) because both the AZ-699 test and the future AZ-701 HTTP-API path (T5) need it. `_report_writer.py` lives under `tests/e2e/replay/` because it is purely a test artefact — promoting it would invite production code to import a test helper, violating the file ownership rule documented in `_docs/02_document/module-layout.md`.
@@ -0,0 +1,148 @@
+# Replay map visualization (estimated vs ground-truth tracks)
+
+**Task**: AZ-700_replay_map_visualization
+**Name**: HTML map showing estimated GPS track vs tlog ground-truth track
+**Description**: New `gps-denied-render-map` console script. Takes a JSONL of estimator output + a tlog (or CSV fallback) and renders a single-file HTML map (folium / Leaflet) with both tracks in distinct colors, start/end markers, and an embedded accuracy summary from AZ-699.
+**Complexity**: 3 points
+**Dependencies**: AZ-699
+**Component**: cli (offline analysis surface)
+**Tracker**: AZ-700
+**Epic**: AZ-696
+
+## Problem
+
+Today the only feedback from a replay run is a JSONL file. There is no
+way to visually verify whether the estimator is drifting, jumping, or
+roughly tracking the real flight. A human reading the JSONL cannot
+quickly answer "does this make sense geographically?"
+
+The user's pipeline explicitly calls for: "and then show both points on
+the map."
+
+## Outcome
+
+- A standalone CLI `gps-denied-render-map` produces a self-contained
+  HTML map of the estimated track + the tlog ground-truth track for any
+  prior replay run.
+- The map is shareable as a single file (no server required); developers
+  open it locally; AZ-701's HTTP API serves it back to API consumers.
+
+## Scope
+
+### Included
+- New module `src/gps_denied_onboard/cli/render_map.py`.
+- New console script `gps-denied-render-map` in `pyproject.toml`.
+- folium dependency pin in the appropriate `[project.optional-dependencies]` group (NOT in airborne-binary deps — operator-side only).
+- Default map style + tile provider (OpenStreetMap fallback documented for offline use).
+- Auto-fit bounds; distance circles (100 m, 50 m) around start point for scale.
+- Accuracy summary banner (read from `_docs/06_metrics/real_flight_validation_{date}.md` when `--summary` is passed).
+
+### Excluded
+- Interactive time-slider playback (deferred follow-up).
+- Embedded altitude profile chart.
+- Animated marker traversal.
+
+## Acceptance Criteria
+
+**AC-1: CLI produces self-contained HTML**
+Given a JSONL + tlog
+When `gps-denied-render-map --estimated out.jsonl --truth derkachi.tlog --output map.html` runs
+Then `map.html` exists, parses as valid HTML, exits 0
+
+**AC-2: Two distinct tracks visible**
+Given the rendered map opened in a browser
+When inspected
+Then it contains exactly two polyline layers (red = truth, blue = estimated) with start/end markers
+
+**AC-3: Markers + scale circles**
+Given the rendered map
+When parsed
+Then it contains the start (green) + end (black) markers + 100 m + 50 m scale circles
+
+**AC-4: Accuracy summary inclusion**
+Given `--summary _docs/06_metrics/real_flight_validation_2026-XX-XX.md`
+When the map renders
+Then the HTML header contains the accuracy metrics table
+
+**AC-5: Offline fallback documented**
+Given an environment without internet access
+When the map is rendered with `--offline-tiles`
+Then tile loading uses a documented fallback (or fails fast with a clear error if no fallback is configured)
+
+## Non-Functional Requirements
+
+**Compatibility**
+- Output HTML must render in Chrome 110+ and Firefox 110+ without console errors.
+
+**Performance**
+- For a 60 s flight (~600 truth points + ~600 estimated points), render time < 5 s on Tier-1 hardware.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | CLI invocation with synthetic data | Output HTML file exists + non-empty |
+| AC-2 | Parse output HTML | Exactly 2 polyline layers + 4 expected markers |
+| AC-4 | Summary embed | Markdown summary metrics present in HTML |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Real Derkachi replay JSONL + tlog | End-to-end render | HTML opens in browser, both tracks visible | Compat |
+
+## Constraints
+
+- folium MUST be in the operator-only dep group; airborne binary cold-start regression test must remain green.
+- HTML output MUST be self-contained — embedded JS/CSS, no per-page CDN calls in `--offline-tiles` mode.
+- Console script naming follows the project pattern (`gps-denied-<verb>`).
+
+## Risks & Mitigation
+
+**Risk 1: folium dep size**
+- *Risk*: folium pulls ~5 MB of JS. Adding to airborne deps would regress cold-start.
+- *Mitigation*: optional-dependencies group + ADR-002 build-time exclusion principle.
+
+**Risk 2: CDN dependency at render time**
+- *Risk*: Default folium uses Leaflet via CDN — fails on offline Jetsons.
+- *Mitigation*: Document `--offline-tiles` flag; provide bundled assets path or fail-fast.
+
+---
+
+## Implementation Notes (Batch 101 — Cycle 2)
+
+**Status**: In Testing (Jira AZ-700).
+
+### Files changed
+
+Production:
+- `src/gps_denied_onboard/cli/render_map.py` — new module: `RenderInputs` DTO, `render_map_html`, `load_estimated_track`, `load_ground_truth_track`, argparse CLI, `main()`.
+- `pyproject.toml` — new `[project.optional-dependencies] operator-tools = ["folium>=0.16,<1.0"]` group; new console script `gps-denied-render-map = "gps_denied_onboard.cli.render_map:main"`.
+
+Tests:
+- `tests/unit/test_az700_render_map.py` — 14 unit tests covering JSONL parsing, HTML rendering (2 polylines, 4 markers, 2 scale circles, summary embed, offline-tiles toggle), and CLI smoke including a minimal binary-tlog helper.
+
+### AC coverage
+
+| AC   | Test / Artefact                                                                          | Result |
+| ---- | ---------------------------------------------------------------------------------------- | ------ |
+| AC-1 | `test_cli_writes_html_with_default_tiles`                                                | PASS (local). The Jetson e2e visual smoke is `AC-4` and is operator-driven on Tier-2. |
+| AC-2 | `test_render_map_html_emits_two_polylines`, `…emits_four_markers_and_two_circles`        | PASS   |
+| AC-3 | `test_render_map_html_emits_two_polylines`, `…emits_four_markers_and_two_circles`        | PASS — output HTML contains exactly 2 polyline layers (red + blue) and 4 markers + 2 scale circles. |
+| AC-4 | Visual smoke on Tier-2 Jetson (operator opens `map.html` produced by AZ-699's e2e run)   | DEFERRED to Jetson — wired and ready. |
+| AC-5 | `test_render_map_html_offline_tiles_omits_openstreetmap`, `…_template_uses_local_url`    | PASS   |
+
+### Test results
+
+`pytest tests/unit/test_az700_render_map.py` → 14 passed in 2.5 s. Wider regression slice (AZ-697/698/699/700 + replay_input + calibration): 107 passed, 1 skipped (pre-existing AC-5 e2e smoke that needs real video).
+
+### Strict typing
+
+`mypy --strict src/gps_denied_onboard/cli/render_map.py` → **Success: no issues found in 1 source file.** Used `# type: ignore[import-untyped, import-not-found, unused-ignore]` on the lazy folium import so the strict pass is clean whether folium is installed or not.
+
+### Design notes
+
+- folium 0.20 (the latest in the pinned range) was used. The default tile provider is OpenStreetMap (`tiles="OpenStreetMap"`); the AC-5 `--offline-tiles` flag drops the base layer entirely, and `--offline-tiles-template` accepts a local tile-URL template for operators with a bundled tile pack.
+- folium is lazy-imported inside `_import_folium()` so the airborne binary (which does NOT install `[operator-tools]`) doesn't pay for it on cold start. The C12 cold-start NFR is unaffected.
+- The `_write_minimal_tlog` test helper builds a binary tlog with just `GLOBAL_POSITION_INT` records — that's the minimum AZ-697 needs — without coupling the test to the full Derkachi CSV schema used by `tests/e2e/replay/_tlog_synth.py`.
+- All AZ-700 unit tests run locally per the refined test-environment policy (`_docs/02_document/tests/environment.md` § Where each tier runs); the Tier-2 visual-smoke AC-4 stays on the Jetson.
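+
+The lazy-import approach described in the design notes can be sketched as follows. This is an illustration under assumptions: `import_optional` and `render` are hypothetical names (the notes only name `_import_folium()`), and the error wording is invented. The folium calls used (`folium.Map`, `folium.PolyLine`, `add_to`, `get_root().render()`) are part of folium's public API.
+
```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import a module on first use, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"'{module_name}' is not installed; install the '{extra}' extra "
            f"to use this command"
        ) from exc


def render(points: list[tuple[float, float]]) -> str:
    """Render a single blue track as an HTML string (requires folium)."""
    # The heavy dependency is resolved here, not at module import time,
    # so merely importing this module stays cheap.
    folium = import_optional("folium", "operator-tools")
    fmap = folium.Map(location=points[0])
    folium.PolyLine(points, color="blue").add_to(fmap)
    return fmap.get_root().render()
```
+
+Because the import happens inside the function, a process that never calls `render` never loads folium, which is the property the cold-start NFR relies on.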
@@ -0,0 +1,234 @@
+# HTTP Replay API service
+
+**Task**: AZ-701_http_replay_api_service
+**Name**: HTTP API for offline replay (POST tlog+video, return GPS fixes + map URL)
+**Description**: New `replay_api` component (FastAPI) wrapping the offline replay pipeline. One primary endpoint `POST /replay` accepts multipart `(tlog + video [+ calibration])` and returns either a synchronous JSONL+summary or an async job id. Returns links to the map artifact rendered by AZ-700.
+**Complexity**: 5 points
+**Dependencies**: AZ-699, AZ-700
+**Component**: replay_api (new component)
+**Tracker**: AZ-701
+**Epic**: AZ-696
+
+## Problem
+
+The product today has zero HTTP surface. The only ways to invoke the
+estimator on a recorded flight are:
+1. The airborne binary (real-time MAVLink GPS_INPUT — needs the
+   aircraft + FC).
+2. `gps-denied-replay` CLI (operator workstation, Python install
+   required).
+3. `operator-orchestrator` CLI (Click, pre-flight cache only — does
+   NOT run the estimator).
+
+External consumers (operator tools, suite web UIs, demo dashboards,
+other suite services) cannot validate flights without installing the
+full Python stack. The user's pipeline framing explicitly calls for
+"part of the api — tlog and video uploading. and emits gps fixes back
+to the user."
+
+## Outcome
+
+- A new HTTP service exposes `POST /replay` and the supporting `GET /jobs/{id}*` polling endpoints.
+- The service wraps `gps-denied-replay` and AZ-700's map renderer behind a single multipart upload.
+- Containerized; runs in `docker-compose.test.yml`; OpenAPI spec is committed.
+- Authentication is by bearer token; it can be switched off only through an explicit opt-out in dev mode, which logs a WARN.
+
+## Scope
+
+### Included
+- New component `src/gps_denied_onboard/replay_api/`:
+  - `app.py` (FastAPI instance)
+  - `handlers.py` (multipart upload, validation)
+  - `jobs.py` (sync ≤ 2 min videos / async > 2 min)
+  - `storage.py` (temp file lifecycle, cleanup)
+  - `interface.py` (`ReplayRunner` Protocol so handlers are decoupled)
+  - `errors.py` (custom HTTP error families)
+- Endpoints: `POST /replay`, `GET /jobs/{id}`, `GET /jobs/{id}/result`, `GET /jobs/{id}/map`, `GET /healthz`, `GET /readyz`.
+- Bearer-token auth: `REPLAY_API_BEARER_TOKEN` env var; explicit dev opt-out via `REPLAY_API_AUTH_REQUIRED=false`.
+- Upload size limit + concurrent-job limit, env-configurable.
+- New `replay-api` console script (uvicorn entrypoint) in `pyproject.toml`.
+- New `docker/replay-api.Dockerfile` + `docker-compose.test.yml` entry.
+- OpenAPI spec exported to `_docs/02_document/contracts/replay_api/openapi.yaml`.
+- Contract file `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (per shared/api decompose Step 4.5 rule).
+- File-upload magic-byte validation for `.tlog` + `.mp4`.
+
+### Excluded
+- Web UI (parent-suite concern).
+- Persistent job database (in-memory + temp disk is sufficient for v1).
+- Multi-node job distribution.
+- WebSocket streaming of progress.
+
+## Acceptance Criteria
+
+**AC-1: Sync happy path (short video, dev mode)**
+Given `REPLAY_API_AUTH_REQUIRED=false` and a 60 s video
+When `POST /replay` runs with multipart `tlog + video`
+Then response is 200 with JSONL of GPS fixes + accuracy summary inline
+
+**AC-2: Async happy path (long video)**
+Given a > 2-minute video
+When `POST /replay` runs
+Then response is 202 with `Location: /jobs/{id}` and `{job_id, status_url}`
+
+**AC-3: Job state transitions**
+Given an async job
+When polled via `GET /jobs/{id}`
+Then state transitions `queued → running → done` are observable
+
+**AC-4: Result + map served from job id**
+Given a `done` job
+When `GET /jobs/{id}/result` is called
+Then it streams the JSONL; `GET /jobs/{id}/map` returns the HTML map (from AZ-700)
+
+**AC-5: Auth enforced when configured**
+Given `REPLAY_API_BEARER_TOKEN=secret`
+When `POST /replay` runs without `Authorization: Bearer secret`
+Then response is 401
+
+**AC-6: Health endpoints**
+Given the service is up and `gps-denied-replay` console-script is on PATH
+When `GET /healthz` and `GET /readyz` are called
+Then both return 200
+
+**AC-7: OpenAPI + contract documented**
+Given the service is running
+When the OpenAPI spec is exported
+Then `_docs/02_document/contracts/replay_api/openapi.yaml` is committed; `replay_api_protocol.md` documents the versioning rules
+
+**AC-8: Concurrency limit enforced**
+Given `REPLAY_API_MAX_CONCURRENT_JOBS=1`
+When 3 jobs are submitted in quick succession
+Then exactly 1 is `running`; 2 are `queued`
+
+**AC-9: Magic-byte upload validation**
+Given a `POST /replay` with a misnamed `.tlog` (actually a `.zip`)
+When the handler validates
+Then response is 400 with a clear error
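+
+The bearer-token behaviour in AC-5, together with the dev opt-out from the scope list, could be checked along these lines. A minimal sketch: `is_authorized` is a hypothetical helper, and treating auth as required when `REPLAY_API_AUTH_REQUIRED` is unset is an assumption, not something the spec states.
+
```python
import hmac
import logging
from typing import Mapping, Optional

log = logging.getLogger("replay_api.auth")


def is_authorized(authorization: Optional[str], env: Mapping[str, str]) -> bool:
    """Return True when the request may proceed; the caller maps False to 401."""
    # Assumption: auth is required unless explicitly switched off.
    if env.get("REPLAY_API_AUTH_REQUIRED", "true").lower() == "false":
        log.warning("bearer auth disabled via REPLAY_API_AUTH_REQUIRED=false")
        return True
    expected = env.get("REPLAY_API_BEARER_TOKEN", "")
    if not expected or authorization is None:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```
+
+Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to unit-test.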
+
+## Non-Functional Requirements
+
+**Performance**
+- For a 60 s Derkachi video, sync `POST /replay` returns within the `gps-denied-replay` ASAP-mode wall time plus 5 s of overhead on Tier-2 Jetson.
+
+**Security**
+- Magic-byte file validation; reject anything not matching `.tlog` (MAVLink magic 0xFD/0xFE) or `.mp4` (ftyp).
+- Bearer auth is always available; it is disabled only when an explicit env var says so.
+
+**Compatibility**
+- FastAPI / uvicorn / python-multipart pinned; document version compatibility window.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Sync POST → 200 + JSONL | Round-trip succeeds with synth fixtures |
+| AC-2 | Async POST → 202 + job id | 202 with Location header |
+| AC-3 | Job state machine | Transitions observed |
+| AC-5 | Missing/wrong bearer → 401 | Strict failure |
+| AC-8 | Concurrency limit | 2 of 3 queued |
+| AC-9 | Wrong magic bytes → 400 | Clear error |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1, AC-4 | Real derkachi.tlog + video | `curl` round-trip in docker-compose | 200 + JSONL + map HTML | Perf |
+| AC-6 | Container up | Health endpoint checks | 200 OK | — |
+
+## Constraints
+
+- FastAPI MUST live in an operator-only build target; ADR-002 binary-exclusion applies. Airborne binary cold-start regression test must remain green.
+- New component MUST follow interface-first + constructor-injection (Principle #13 in architecture.md).
+- Contract file MUST exist before the endpoint is callable in CI (per decompose Step 4.5 rule).
+
+## Risks & Mitigation
+
+**Risk 1: FastAPI / uvicorn dep weight on airborne binary**
+- *Risk*: Adding the API dep to the airborne binary regresses cold-start.
+- *Mitigation*: Place `replay_api/` in an operator-only optional-dependencies group; CMake / build-time exclusion enforces this.
+
+**Risk 2: HTTP timeout on long videos**
+- *Risk*: Sync mode + a long video → HTTP timeout.
+- *Mitigation*: Async mode triggers automatically above the configured video-length threshold.
+
+**Risk 3: File-upload abuse**
+- *Risk*: Malicious uploads (huge files, zip bombs, fake MIME types).
+- *Mitigation*: Hard size limit (2 GB default), magic-byte validation, temp-file cleanup, configurable disk quota.
+
+## Contract
+
+This task produces the contract at `_docs/02_document/contracts/replay_api/replay_api_protocol.md`.
+Consumers MUST read that file — not this task spec — to discover the interface and versioning rules.
+
+## Implementation Notes (Batch 102, Cycle 2)
+
+### Files Changed
+
+**New production code** — `src/gps_denied_onboard/replay_api/`:
+- `__init__.py` — public exports (`create_app`, DTOs, error families).
+- `errors.py` — `ReplayApiError` hierarchy with stable `error_code` + HTTP `status_code`.
+- `interface.py` — `JobState`, `ReplayInputs`, `ReplayJobResult`, `JobSnapshot`, `ReplayRunner` Protocol.
+- `storage.py` — per-job temp directory lifecycle (`StorageRoot.allocate_job/release_job/cleanup_all`).
+- `jobs.py` — in-memory `JobRegistry` with `max_concurrent` / `max_queued`, `ThreadPoolExecutor` worker pool.
+- `handlers.py` — magic-byte validation (`validate_tlog_kind` for MAVLink v1/v2, `validate_video_kind` for MP4 `ftyp`, `validate_calibration_kind` for JSON), size limits, bearer-token extraction.
+- `app.py` — `create_app(...)` FastAPI factory + `SubprocessReplayRunner` (shells out to `gps-denied-replay --auto-trim` and `gps-denied-render-map`).
+
+**New CLI entrypoint** — `src/gps_denied_onboard/cli/replay_api_entrypoint.py`:
+- `replay-api` console script wired in `pyproject.toml` under the `operator-tools` extra.
+- Parses `--host`, `--port`, `--storage-root`, `--reload`; reads `REPLAY_API_*` env knobs.
+
+**Helper promoted from tests** — `src/gps_denied_onboard/helpers/accuracy_report.py`:
+- Was `tests/e2e/replay/_report_writer.py` (AZ-699 batch). Promoted because `replay_api` needs it at runtime to produce `accuracy_report.md`. Re-exported from `helpers/__init__.py`. All AZ-699 imports re-pointed.
+
+**Contract** — `_docs/02_document/contracts/replay_api/replay_api_protocol.md` (purpose, invariants, endpoints, error families, env config, versioning).
+
+**OpenAPI spec** — `_docs/02_document/contracts/replay_api/openapi.yaml` (auto-exported from the FastAPI app; check in alongside the contract doc).
+
+**Docker** — `docker/replay-api.Dockerfile` + `e2e/docker/docker-compose.test.yml` (`replay-api` service, profile-gated `replay-api`, with `replay-api-storage` volume).
+
+**Dependencies** — `pyproject.toml`:
+- `operator-tools` extra now also pulls `fastapi>=0.111,<0.120`, `uvicorn>=0.30,<1.0`, `python-multipart>=0.0.9,<1.0`.
+- New console script `replay-api`.
+
+**Unit tests** — `tests/unit/replay_api/test_az701_replay_api.py` (18 tests, all passing):
+- AC-1 sync: POST → 200 with `result_url`/`map_url`/`report_url`; JSONL + HTML map served from those URLs.
+- AC-2 async: large video (> sync threshold) → 202 + `Location: /jobs/{id}`.
+- AC-3: job state visible via polling `RUNNING → DONE` and `RUNNING → FAILED`.
+- AC-5: missing bearer → 401; correct bearer → 200.
+- AC-6: `/healthz` always 200; `/readyz` returns 503 when binaries missing.
+- AC-8: third job queued when concurrency limit is 2; 4th rejected with 429.
+- AC-9: zip renamed to `.tlog` or `.mp4` → 400 with stable `error_code`.
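+
+The admission behaviour exercised by the AC-8 tests (run, queue, or reject) can be sketched as a pure decision function. This is a hypothetical illustration: `Admission` and `admit` are invented names, and the real `JobRegistry` may track its counts differently.
+
```python
from enum import Enum


class Admission(Enum):
    RUNNING = "running"
    QUEUED = "queued"
    REJECTED = "rejected"  # the HTTP layer would map this to 429


def admit(running: int, queued: int, max_concurrent: int, max_queued: int) -> Admission:
    """Decide what happens to a newly submitted job."""
    if running < max_concurrent:
        return Admission.RUNNING
    if queued < max_queued:
        return Admission.QUEUED
    return Admission.REJECTED
```
+
+With `max_concurrent=2` and an assumed `max_queued=1`, the third submission is queued and the fourth is rejected, which mirrors the test description above.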
+
+### AC Coverage Matrix
+
+| AC | Status | Evidence |
+|----|--------|----------|
+| AC-1 sync 200 | Done | `test_post_replay_sync_returns_200_with_result_urls` + `test_post_replay_serves_jsonl_and_map_for_done_job` |
+| AC-2 async 202 | Done | `test_post_replay_async_returns_202_when_video_exceeds_sync_bytes` |
+| AC-3 job state machine | Done | `test_job_state_transitions_observable_via_polling`, `test_failed_runner_marks_job_failed`, `test_result_endpoints_409_when_job_not_done` |
+| AC-5 401 on bad bearer | Done | `test_post_replay_returns_401_without_bearer_when_required` + `test_post_replay_accepts_correct_bearer` |
+| AC-6 health endpoints | Done | `test_healthz_always_returns_200` + `test_readyz_returns_503_when_binary_missing` |
+| AC-8 concurrency cap | Done | `test_concurrency_limit_queues_excess_jobs` + `test_queue_full_returns_429` |
+| AC-9 magic-byte rejection | Done | `test_validate_tlog_kind_rejects_zip_renamed_to_tlog`, `test_validate_video_kind_rejects_arbitrary_bytes`, `test_post_replay_rejects_misnamed_zip_as_tlog`, `test_post_replay_rejects_misnamed_zip_as_video` |
+
+### Test Run Summary
+
+- **AZ-701 unit slice**: 18/18 passed (`tests/unit/replay_api/`).
+- **Full unit suite**: 2251 passed, 86 skipped, 1 failed (`test_cold_start_under_500ms_p99` — pre-existing C12 CLI flake unrelated to AZ-701; same failure observed in batches 100 and 101).
+- **Mypy --strict on AZ-701 surface**: clean (9 source files: `replay_api/*`, `helpers/accuracy_report.py`, `cli/replay_api_entrypoint.py`).
+
+### Design Decisions
+
+- **Subprocess runner, not in-process estimator**: `SubprocessReplayRunner` invokes the existing `gps-denied-replay` console script. Keeps the API a thin transport layer; matches the invariant in the contract that the API does NOT re-implement the pipeline.
+- **Pre-allocated job_id**: the handler allocates a job_id, writes uploads into the matching storage dir, then passes the id to `JobRegistry.submit(job_id=...)`. Earlier draft used a separate registry-assigned id and tried to "release-then-resubmit"; that path deleted the dir holding the uploads. Fixed by adding the optional `job_id` parameter.
+- **`from __future__ import annotations` deliberately dropped in `app.py`**: FastAPI 0.119 + Pydantic v2 resolve route-parameter annotations at decoration time. Forward-ref strings break `Annotated[UploadFile, File()]`. The rest of the `replay_api` package keeps the future-annotations import. The reason is captured in the `app.py` module docstring.
+- **Pydantic v2 `Annotated` syntax**: every route parameter uses `Annotated[T, File()/Form()/Header()]` rather than the legacy `T = File(...)` form. The older form raised `PydanticUserError: 'UploadFile' is not fully defined`.
+- **Magic-byte validation is mandatory, not advisory**: matches AC-9 wording ("Wrong magic bytes → 400"). Anything that's not MAVLink v1/v2 (`\xfe` / `\xfd` first byte) is rejected as tlog; anything without `ftyp` in bytes 4-12 is rejected as video. No `application/x-mavlink` content-type sniffing.
+- **State is in-memory only**: matches "no persistent state across restarts" invariant in the contract. Operators wanting durability can layer it externally (or move to AZ-702 follow-on). Documented in the contract.
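+
+The magic-byte rule described above can be illustrated with two small predicates. This is a sketch, not the contents of `handlers.py`: the function names are hypothetical, and it follows the rule exactly as worded here (first byte for tlog, `ftyp` within bytes 4 to 12 for video). One caveat worth checking against real uploads: some ground stations write tlogs with an 8-byte timestamp before every MAVLink frame, in which case the start-of-frame marker would sit at offset 8 rather than 0; the notes do not say which layout the handler expects.
+
```python
MAVLINK_V1_MAGIC = 0xFE
MAVLINK_V2_MAGIC = 0xFD


def looks_like_tlog(head: bytes) -> bool:
    """True when the first byte is a MAVLink v1 or v2 start-of-frame marker."""
    return len(head) >= 1 and head[0] in (MAVLINK_V1_MAGIC, MAVLINK_V2_MAGIC)


def looks_like_mp4(head: bytes) -> bool:
    """True when the ISO-BMFF 'ftyp' box type appears in bytes 4..12."""
    return len(head) >= 8 and b"ftyp" in head[4:12]
```
+
+Only the first dozen bytes of each upload are needed, so the check can run before the rest of the part is written to the job's storage directory.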
+
+### Known Limitations
+
+- `SubprocessReplayRunner` returns `result.stdout`/`stderr` only when the subprocess fails; the success path discards them. Operators wanting a per-job audit log will need a follow-on.
+- No request body streaming — `python-multipart` buffers each part. The 2 GB hard limit guards memory.
+- No rate limiting beyond the concurrency/queue caps. A reverse proxy is the right place for that.
+- E2E test against the real Derkachi flight artefacts is intentionally NOT in scope here (per the testing-environment rule: e2e runs on Jetson only and AZ-699's `test_derkachi_real_tlog.py` already exercises the underlying pipeline).
@@ -0,0 +1,106 @@
+# Topotek KHP20S30 camera calibration (factory-sheet approximation)
+
+**Task**: AZ-702_khp20s30_calibration
+**Name**: Provide a calibration JSON for the Topotek KHP20S30 nadir camera (factory-sheet approximation)
+**Description**: Compute and commit a `CameraCalibrationArtifact` JSON for the Derkachi camera (Topotek KHP20S30) from manufacturer factory data. Replaces the `adti26.json` placeholder that AC-3 currently uses. Documents the residual error vs a per-unit checkerboard refinement.
+**Complexity**: 1 point
+**Dependencies**: None
+**Component**: input_data / shared_helpers
+**Tracker**: AZ-702
+**Epic**: AZ-696
+
+## Problem
+
+`_docs/00_problem/input_data/flight_derkachi/camera_info.md` states the
+Topotek KHP20S30 intrinsics are unknown. `tests/e2e/replay/conftest.py`
+(lines 50–56) substitutes `tests/fixtures/calibration/adti26.json` as a
+placeholder. AC-3 (≤ 100 m horizontal error for 80 % of ticks) is
+`@xfail` until a real calibration ships.
+
+The cheapest reasonable starting point is a factory-sheet approximation
+— compute `K` from the manufacturer's published focal length + sensor
+geometry, accept the 1–3 % focal-length residual as a documented
+budget, and let AC-3 either PASS or honestly FAIL with the residual
+attributed.
+
+## Outcome
+
+- A calibration JSON `khp20s30_factory.json` exists in the Derkachi
+  input directory, parses against the project's
+  `CameraCalibrationArtifact` schema, and documents the acquisition
+  method as `factory_sheet`.
+- `camera_info.md` is updated to reference the new calibration + the
+  residual budget + the deferral handle (`AZ-XXX_checkerboard_refinement`).
+- AZ-699 (T3) uses this calibration as its `--camera-calibration` input.
+
+## Scope
+
+### Included
+- Source manufacturer factory data for the Topotek KHP20S30 (sensor: 1/2.8" CMOS, 2.13 MP, 1920×1080; lens focal length, FOV, pixel pitch).
+- Compute `K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` from `fx = fy = focal_length_mm × (image_width_px / sensor_width_mm)`.
+- Set distortion to `[0, 0, 0, 0, 0]` (factory-sheet approximation).
+- Set `body_to_camera_se3` to identity-down (nadir; camera-z = aircraft-down).
+- Set `acquisition_method = "factory_sheet"`.
+- Write `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json`.
+- Update `_docs/00_problem/input_data/flight_derkachi/camera_info.md`.
+- New unit test under `tests/unit/calibration/` asserting the JSON parses and matches the documented inputs.
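+
+The `K` computation listed above can be written out as a short sketch. The function name is hypothetical, and the numbers in the usage note are placeholders chosen for easy arithmetic, not the KHP20S30's published values; the principal point at the image centre follows the `cx ≈ width/2`, `cy ≈ height/2` expectation in AC-3.
+
```python
def intrinsics_from_factory_sheet(
    focal_length_mm: float,
    sensor_width_mm: float,
    image_width_px: int,
    image_height_px: int,
) -> list[list[float]]:
    """Build a pinhole K with square pixels and a centred principal point."""
    # Pixels per millimetre on the sensor, times the focal length in mm,
    # gives the focal length in pixels.
    fx = focal_length_mm * (image_width_px / sensor_width_mm)
    fy = fx  # square pixels
    cx = image_width_px / 2.0
    cy = image_height_px / 2.0
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
```
+
+With placeholder inputs of a 4.0 mm lens, a 5.0 mm wide sensor, and a 1920×1080 image, this gives `fx = fy = 1536.0`, `cx = 960.0`, and `cy = 540.0`. A focal-length error of a few percent scales `fx` and `fy` by the same fraction, which is the residual budget the task documents.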
+
+### Excluded
+- Physical checkerboard calibration (needs hardware).
+- PnP-from-tlog back-computation (deferred follow-up).
+- Updating `adti26.json` or other test fixtures.
+
+## Acceptance Criteria
+
+**AC-1: Calibration JSON parses**
+Given the new `khp20s30_factory.json`
+When loaded by the project's calibration parser (same schema as `adti26.json`)
+Then it parses without error and all fields are populated
+
+**AC-2: Doc updated**
+Given `camera_info.md` before
+When the calibration is committed
+Then `camera_info.md` says "factory-sheet approximation; per-unit checkerboard refinement deferred — see <future-task>" and lists the residual budget
+
+**AC-3: Unit test snapshot**
+Given the new JSON
+When the unit test runs
+Then it asserts `fx == fy` (square pixels), `cx ≈ width/2`, `cy ≈ height/2`, distortion all zero
+
+**AC-4: T3 consumes this calibration**
+Given AZ-699's `test_derkachi_real_tlog.py`
+When it runs
+Then it loads `khp20s30_factory.json` as `--camera-calibration` (no longer the `adti26.json` placeholder)
+
+## Non-Functional Requirements
+
+**Compatibility**
+- JSON schema MUST be identical to existing calibration fixtures (`adti26.json`) — no schema changes in this task.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | JSON loads via existing parser | Object populated |
+| AC-3 | Field values match factory inputs | fx == fy, cx/cy at centre, zero distortion |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-4 | T3 test pointed at new JSON | T3 launches without calibration parse error | Test starts cleanly | Compat |
+
+## Constraints
+
+- MUST follow the calibration contract in `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` (or wherever the camera-calibration schema lives).
+- MUST be a single committed JSON — no generator script with side effects.
+
+## Risks & Mitigation
+
+**Risk 1: Factory data unavailable at required precision**
+- *Risk*: Topotek does not publish the exact focal length / sensor width to the precision needed.
+- *Mitigation*: Document the gap; ship with the best-available estimate; flag in `camera_info.md` so T3 surfaces the uncertainty in its failure message.
+
+**Risk 2: Residual error exceeds AC-3 budget**
+- *Risk*: 1–3 % focal-length error may push horizontal error past 100 m at 1 km AGL.
+- *Mitigation*: Accept and report it: T3 surfaces the residual as the honest finding, and a follow-up task can pursue checkerboard refinement if needed.
@@ -0,0 +1,168 @@
+# Open-loop ESKF composition profile for the airborne binary
+
+**Task**: AZ-776_eskf_open_loop_composition_profile
+**Name**: Make the c5_state=eskf path runnable end-to-end by allowing c4_pose to be excluded from the airborne composition
+**Description**: Introduce an explicit `c4_pose.enabled: bool` config flag (default True). When False, the airborne bootstrap skips C4 wiring entirely, the C5 ESKF estimator composes alongside C1 (and any other configured components) with no iSAM2 graph handle, and `gps-denied-replay` exits 0 on the Derkachi fixture with one EstimatorOutput per video frame. Unblocks five of the seven `@xfail`-masked Derkachi e2e tests on Jetson.
+**Complexity**: 3 points
+**Dependencies**: None
+**Component**: runtime_root / c4_pose / c5_state
+**Tracker**: AZ-776
+**Epic**: AZ-602
+
+## Problem
+
+The c5_state strategy `eskf` is registered (`_STRATEGY_REGISTRY`),
+documented as the IT-12 mandatory simple baseline (`eskf_baseline.py`
+line 30), and has unit tests
+(`tests/unit/c5_state/test_az386_eskf_baseline.py`). It has NEVER been
+composition-tested end-to-end. When the airborne binary is configured
+with `config.components.c5_state.strategy = eskf`, the bootstrap fails
+at compose time:
+
+```
+PoseEstimatorConfigError: build_pose_estimator: isam2_graph_handle does
+  not satisfy the C4 ISam2GraphHandle Protocol (missing get_pose_key /
+  update / compute_marginals / last_anchor_age_ms?)
+```
+
+Root cause: `EskfStateEstimator._build_eskf_state_estimator` returns
+`(estimator, handle=None)` by design — the ESKF has no GTSAM graph.
+`airborne_bootstrap._build_c5_state_estimator_pair` (line 779) seeds
+`pre_constructed['c5_isam2_graph_handle'] = None`.
+`_c4_pose_wrapper` (line 371) extracts the None.
+`pose_factory.build_pose_estimator` (line 127) rejects it.
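
To see why a `None` handle trips this check, here is a minimal, self-contained sketch of a structural (Protocol) check. `GraphHandleLike`, `FakeGraphHandle`, and `accepts_handle` are hypothetical names for illustration. Only the four member names come from the error text above; the real `ISam2GraphHandle` definition and the logic inside `build_pose_estimator` may differ.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class GraphHandleLike(Protocol):
    """Stand-in for the C4 handle Protocol (member names from the error text)."""

    def get_pose_key(self, *args: Any) -> Any: ...
    def update(self, *args: Any) -> Any: ...
    def compute_marginals(self, *args: Any) -> Any: ...
    def last_anchor_age_ms(self, *args: Any) -> Any: ...


class FakeGraphHandle:
    """A toy object that happens to provide all four members."""

    def get_pose_key(self, *args: Any) -> int:
        return 0

    def update(self, *args: Any) -> None:
        return None

    def compute_marginals(self, *args: Any) -> None:
        return None

    def last_anchor_age_ms(self, *args: Any) -> int:
        return 0


def accepts_handle(handle: object) -> bool:
    """Structural check: True only if every Protocol member is present."""
    return isinstance(handle, GraphHandleLike)


print(accepts_handle(FakeGraphHandle()))  # a conforming handle
print(accepts_handle(None))               # what the ESKF path supplies
```

Under these assumptions the `None` seeded by the ESKF path can never satisfy the check, which is why the failure happens at compose time rather than per frame.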
+
+The IT-12 "ESKF is the mandatory simple baseline" mandate has no
+executable production path today — the baseline exists at the component
+layer but not at the system layer. Every Derkachi e2e attempt with
+`c5_state=eskf` exits non-zero at compose time, producing 0 JSONL rows
+and masking every downstream behaviour the heavy ACs are supposed to
+exercise.
+
+## Outcome
+
+- The airborne bootstrap can compose a binary with `c1_vio` + `c5_state(strategy=eskf)` end-to-end against a real `gps-denied-replay` invocation without composing `c4_pose` and without a stub or no-op pose estimator masking the absence.
+- The composition-mode selection is explicit in YAML — operator sets `config.components.c4_pose.enabled = False` (or omits the block entirely) to opt into the open-loop profile.
+- `_run_replay_loop` in `runtime_root/__init__.py` is exercised end-to-end on Jetson by a non-`xfail` integration test (AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap in `tests/e2e/replay/test_derkachi_1min.py` un-xfail and pass).
+- The replay protocol document records the new open-loop ESKF variant alongside the existing full-GTSAM path so future readers do not have to reverse-engineer it from the code.
+- AZ-602 (E2E Tier-1 harness rehabilitation epic) advances: this task removes 5 of the 7 Derkachi xfails; the remaining 2 (AC-3 and AZ-699) are AZ-777 territory.
+
+## Scope
+
+### Included
+
+- Add an `enabled: bool = True` field to `C4PoseConfig` (and the corresponding YAML schema in `config/schema.py`).
+- In `airborne_bootstrap.build_pre_constructed`: when `_replay_omits_component_block(config, "c4_pose")` OR `config.components["c4_pose"].enabled is False`, skip `_build_c5_state_estimator_pair`'s iSAM2-handle requirement; the C5 ESKF estimator still builds, but the handle slot is absent from the returned dict.
+- In `airborne_bootstrap._AIRBORNE_REGISTRATIONS` (or wherever `compose_root` walks the component slugs): exclude `c4_pose` from the topological walk when the config flag says so. The downstream edge `_C5_STATE_DEPENDS_ON = ("c1_vio", "c4_pose")` becomes `("c1_vio",)` for the open-loop profile (or the dependency-walker treats a deselected slug as a no-op edge).
+- Validation in `compose_root`: when `c4_pose.enabled is False`, refuse `c5_state.strategy = gtsam_isam2` with a clear `CompositionError` — the gtsam_isam2 estimator needs a real handle, so the open-loop profile is only valid against ESKF. Symmetric refusal when `c4_pose.enabled is True` AND `c5_state.strategy = eskf` (the ESKF still returns `handle=None`, so c4 would still reject).
+- `tests/unit/runtime_root/`: composition test that exercises the open-loop profile end-to-end with `compose_root`, asserting (i) the returned components dict contains `c1_vio` and `c5_state` but NOT `c4_pose`; (ii) `gps-denied-replay --config <open-loop yaml>` exits 0 against a minimal fixture and emits one EstimatorOutput per video frame.
+- `tests/unit/runtime_root/`: regression test that the live (full-GTSAM) path still composes identically after the flag is added — `c4_pose.enabled` unset defaults to True and the existing topological walk is unchanged.
+- `_docs/02_document/contracts/replay/replay_protocol.md`: add an `## Open-loop ESKF variant` section documenting the per-frame loop pseudocode (mirrors the existing GTSAM one minus C2–C4 stages) plus the YAML shape that selects it.
+- `_docs/02_document/adr/`: create the ADR folder if absent, then write `ADR-012_open_loop_eskf_composition.md` recording the `c4_pose.enabled` flag, the strategy-pairing rules, and why the per-component flag was chosen over a top-level `composition_profile` knob.
+- `tests/e2e/replay/test_derkachi_1min.py`: remove the `@pytest.mark.xfail` decorators on AC-1 (line 61), AC-2 (line 138), AC-5 (line 413), AC-6 realtime (line 453), AC-6 asap (line 479). Update the test conftest or fixture to drive the open-loop YAML profile.
+- `_docs/02_document/components/06_c4_pose.md`: amend the component doc to document the `enabled` flag.
+
+### Excluded
+
+- Building the Derkachi C6 reference tile cache + descriptor index (AZ-777 territory).
+- Implementing C2/C3/C4 anchoring against Derkachi imagery (depends on AZ-777).
+- Un-xfailing `test_ac3_within_100m_80pct_of_ticks` (`test_derkachi_1min.py` line 174) — that test needs satellite anchoring, requires AZ-777.
+- Un-xfailing `test_az699_real_flight_validation_emits_verdict_and_report` (`test_derkachi_real_tlog.py` line 174) — also requires AZ-777 for accuracy gate.
+- Changing the production default — this task makes ESKF *runnable*, not *the default*.
+- Renaming any existing config field, YAML block, or factory function (per AZ-618 umbrella's "MUST NOT touch any per-component factory signature" constraint).
+
+## Acceptance Criteria
+
+**AC-1: Open-loop profile composes**
+Given a YAML config with `config.components.c4_pose.enabled = False` and `config.components.c5_state.strategy = eskf`
+When `compose_root(config, pre_constructed=build_pre_constructed(config))` runs
+Then it returns a components dict containing `c1_vio` and `c5_state` but NOT `c4_pose`, and no `PoseEstimatorConfigError` is raised
+
+**AC-2: Full GTSAM profile unchanged**
+Given a YAML config with `config.components.c4_pose` absent or `enabled = True` and `config.components.c5_state.strategy = gtsam_isam2`
+When `compose_root` runs
+Then the returned components dict contains `c1_vio`, `c4_pose`, `c5_state` exactly as before this task
+
+**AC-3: Invalid pairing rejected**
+Given a YAML config that pairs `c4_pose.enabled = True` with `c5_state.strategy = eskf`, OR `c4_pose.enabled = False` with `c5_state.strategy = gtsam_isam2`
+When `compose_root` runs
+Then it raises `CompositionError` naming both the conflicting fields and the valid pairings
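
The pairing rule in AC-1 through AC-3 can be sketched as a small compose-time check. This is an illustration under stated assumptions, not the project's implementation: `validate_pairing` is a hypothetical helper, `CompositionError` here is a local stand-in class, and strategies other than `eskf` and `gtsam_isam2` are passed through untouched because the spec does not say how they pair.

```python
class CompositionError(Exception):
    """Local stand-in for the project's CompositionError."""


# The two valid pairings named by AC-1 and AC-2.
VALID_PAIRINGS_TEXT = (
    "c4_pose.enabled=True with c5_state.strategy=gtsam_isam2, "
    "or c4_pose.enabled=False with c5_state.strategy=eskf"
)


def validate_pairing(c4_enabled: bool, c5_strategy: str) -> None:
    """Raise CompositionError for the two invalid pairings listed in AC-3."""
    eskf_with_c4 = c4_enabled and c5_strategy == "eskf"
    isam2_without_c4 = (not c4_enabled) and c5_strategy == "gtsam_isam2"
    if eskf_with_c4 or isam2_without_c4:
        # Name both conflicting fields and the valid pairings, as AC-3 requires.
        raise CompositionError(
            f"c4_pose.enabled={c4_enabled} conflicts with "
            f"c5_state.strategy={c5_strategy}; valid pairings: {VALID_PAIRINGS_TEXT}"
        )


validate_pairing(True, "gtsam_isam2")  # full-GTSAM profile: accepted
validate_pairing(False, "eskf")        # open-loop profile: accepted
```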
+
+**AC-4: Replay binary exits 0 on Derkachi open-loop**
+Given the Derkachi 1-min fixture and a YAML config selecting the open-loop ESKF profile
+When `gps-denied-replay --config <open-loop.yaml> --video <derkachi> --tlog <derkachi> --output <out.jsonl>` runs on Jetson
+Then it exits with code 0 and `<out.jsonl>` contains one EstimatorOutput line per video frame (±10 % to allow for VIO INIT-state skips on the first few frames)
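
A possible way to check the "one line per frame, ±10 %" condition is sketched below. The helper names are hypothetical, and reading the tolerance as symmetric around the frame count is an assumption. The spec only states that the slack exists to absorb VIO INIT-state skips.

```python
import json


def count_jsonl_rows(text: str) -> int:
    """Count non-blank lines; each one must parse as JSON or this raises."""
    rows = 0
    for line in text.splitlines():
        if line.strip():
            json.loads(line)  # fail loudly on a malformed row
            rows += 1
    return rows


def rows_within_tolerance(n_rows: int, n_frames: int, tol: float = 0.10) -> bool:
    """True when the row count lies within +/- tol of the frame count."""
    if n_frames <= 0:
        return False
    return abs(n_rows - n_frames) <= tol * n_frames


# Toy output with three rows for a hypothetical three-frame clip.
sample = '{"tick": 0}\n{"tick": 1}\n\n{"tick": 2}\n'
print(rows_within_tolerance(count_jsonl_rows(sample), n_frames=3))
```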
+
+**AC-5: Replay protocol documents the variant**
+Given the updated `_docs/02_document/contracts/replay/replay_protocol.md`
+When a reader looks for the open-loop ESKF composition
+Then there is a dedicated `## Open-loop ESKF variant` section with per-frame loop pseudocode and the YAML shape that selects it
+
+**AC-6: ADR records the design choice**
+Given the new `_docs/02_document/adr/ADR-012_open_loop_eskf_composition.md`
+When a reader looks for why the per-component flag was chosen
+Then ADR-012 records the alternatives considered (per-component flag vs. top-level profile vs. implicit rule) and the rationale for the per-component flag
+
+**AC-7: Five Derkachi e2e xfails removed**
+Given `tests/e2e/replay/test_derkachi_1min.py` after this task
+When the file is read
+Then the `@pytest.mark.xfail` decorators on AC-1 (line 61), AC-2 (line 138), AC-5 (line 413), AC-6 realtime (line 453), AC-6 asap (line 479) are removed and the underlying tests pass on the Jetson harness
+
+## Non-Functional Requirements
+
+**Performance**
+- No measurable latency regression on the full-GTSAM path (the new flag check is O(1) at compose time, zero overhead per-frame).
+- Open-loop profile per-frame latency budget remains the same 400 ms p95 (the path is strictly shorter — fewer components — so this should improve, not regress).
+
+**Compatibility**
+- Default `c4_pose.enabled = True` so every existing config file behaves identically without modification.
+- No changes to `compose_root` public signature, `build_pre_constructed` public signature, or any per-component factory signature (per AZ-618 umbrella constraint).
+
+**Reliability**
+- Invalid YAML pairings (open-loop + gtsam_isam2 OR full-GTSAM + eskf) MUST fail loud at compose time, never silently fall back. An operator who misconfigures the profile sees a clear error referencing both conflicting fields, not a downstream `KeyError` or stack trace.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1   | `compose_root` with open-loop YAML → returned components dict | Contains `c1_vio`, `c5_state`; no `c4_pose` key |
+| AC-1   | `build_pre_constructed` with open-loop YAML → no `c5_isam2_graph_handle` key in result | Key absent from dict |
+| AC-2   | `compose_root` with default full-GTSAM YAML → returned components dict | Contains `c1_vio`, `c4_pose`, `c5_state` (regression guard) |
+| AC-3   | `compose_root` with `c4_pose.enabled=False` + `c5_state.strategy=gtsam_isam2` | `CompositionError` raised naming both fields |
+| AC-3   | `compose_root` with `c4_pose.enabled=True` + `c5_state.strategy=eskf` | `CompositionError` raised naming both fields |
+| C4PoseConfig | `enabled` field defaults to `True` when unset in YAML | Round-trips through `load_config` |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|--------------|-------------------|----------------|
+| AC-4   | Derkachi 1-min fixture + open-loop YAML profile | `gps-denied-replay` invoked on Jetson via `scripts/run-tests-jetson.sh` with `RUN_REPLAY_E2E=1 GPS_DENIED_TIER=2` | Exit 0; one EstimatorOutput line per video frame ±10 % | Perf, Compat |
+| AC-7   | `test_derkachi_1min.py` AC-1, AC-2, AC-5, AC-6 realtime, AC-6 asap | Tests run on Jetson after this task | All five pass (xfail decorators removed) | Reliability |
+
+## Constraints
+
+- The flag MUST be `c4_pose.enabled: bool`, not a top-level `composition_profile` knob and not an implicit ADR-only rule. User chose this shape during AZ-776 scoping (cycle-3 Step 9, 2026-05-21). Rationale recorded in ADR-012.
+- The flag MUST NOT change the C4 estimator's own runtime behaviour — it ONLY affects whether the component is wired into the composition graph. The C4 estimator itself remains a fully functional component.
+- The `_run_replay_loop` warning at `runtime_root/__init__.py` (`replay_loop.satellite_anchoring_not_wired`, emitted every 25 frames) is the existing honest signal for the open-loop path. Do NOT remove it; AZ-777 will replace the wiring it warns about. The warning remains visible in the JSONL output during the AZ-776 → AZ-777 transition.
+- ADR file naming follows the existing pattern (`ADR-NNN_<slug>.md`). Increment past the highest current ADR number; if `_docs/02_document/adr/` is empty (as today), start at ADR-012 to leave room for the ADR-001..ADR-011 references already in code+docs.
+
+## Risks & Mitigation
+
+**Risk 1: Topological-walk regression on the live path**
+- *Risk*: Modifying `_AIRBORNE_REGISTRATIONS` or the walker logic could silently break the live GTSAM composition.
+- *Mitigation*: AC-2 is a regression guard. Run the full `tests/unit/runtime_root/` suite before declaring done.
+
+**Risk 2: ESKF estimator surfaces a latent bug under the open-loop wiring**
+- *Risk*: Until this task lands, `EskfStateEstimator` has never been driven by a real composition root + replay binary. Latent bugs (state initialization, IMU preintegration handoff, FDR write paths) may surface only at AC-4.
+- *Mitigation*: Reproduce on Tier-1 Colima first (no GPU needed for ESKF). Use the Tier-2 Jetson run only as the AC-4 confirmation, not as the debug environment. Any non-trivial ESKF fix surfaced this way becomes a sibling ticket; do not let it creep into this task's scope.
+
+**Risk 3: Replay protocol doc inconsistency**
+- *Risk*: The new `## Open-loop ESKF variant` section may contradict the existing "Wire C1–C5 + C6 + C7 + C13 exactly as in the live composition" line at line 188.
+- *Mitigation*: Amend line 188 to say "Wire C1–C5 + C6 + C7 + C13 exactly as in the live composition **for the full-GTSAM profile**; see `## Open-loop ESKF variant` for the open-loop case". One sentence; no ambiguity.
+
+### ADR Impact
+
+> Affects ADR-001 (composition root is single registration site): unchanged — still one registration site. The new flag selects *what* is registered, not *where*.
+> Affects ADR-002 (build-flag gate is the lazy-loading boundary): unchanged — `BUILD_*` flags still gate physical linkage; `c4_pose.enabled` is a *runtime* selection on top of a built strategy.
+> Affects ADR-009 (interface-first DI): unchanged — the open-loop path still resolves through interfaces, just with the C4 interface unbound.
+> Affects ADR-011 (replay is a configuration, not a separate composition root): unchanged — open-loop is a configuration of the same airborne composition root; the new flag is orthogonal to the live/replay split.
@@ -0,0 +1,225 @@
+# Derkachi e2e: wire EXISTING parent-suite satellite-provider into the operator pre-flight fixture
+
+> **Status (2026-05-23)**: **CLOSED** — Phases 1+2 shipped (cycle 3); Phases 3–5 **superseded by Epic AZ-835** per the 2026-05-22 user directive (route-driven seeding instead of bbox).
+>
+> | Phase | Outcome |
+> |-------|---------|
+> | Phase 1 (e2e-runner wire + C11 contract adapt + smoke test) | **SHIPPED** — batch 104, 2026-05-21 |
+> | Phase 2 (`seed_region.py` CLI + `bbox.yaml` + license attribution) | **SHIPPED** — between batches 104 and 106 |
+> | Phase 3 (real `operator_pre_flight_setup` fixture) | **SUPERSEDED** → AZ-839 (Epic AZ-835 C3, 5 SP) — route-driven, not bbox |
+> | Phase 4 (un-xfail AC-4 + AC-5) | **SUPERSEDED** → AZ-841 (Epic AZ-835 C5, 1 SP) |
+> | Phase 5 (docs) | **SUPERSEDED** → AZ-842 (Epic AZ-835 C6, 2 SP) |
+>
+> Total credited to AZ-777: 8 SP (per the 2026-05-21 single-ticket-containment override; Phases 1+2 fit within that envelope). Remaining work (~11 SP including AZ-836 / AZ-838 already shipped) is tracked under Epic AZ-835 children.
+>
+> Spec preserved as historical reference. **Do not implement Phases 3–5 from this file** — see the Epic AZ-835 children instead.
+>
+> See also: `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md` (decision log).
+
+**Task**: AZ-777_derkachi_c6_reference_fixture
+**Name**: Drive the production C10/C11 pre-flight pipeline against the parent-suite `satellite-provider` .NET service ALREADY running in the Jetson e2e harness so the Derkachi clip produces a real FAISS-anchored C4/C5 satellite-fix loop end-to-end
+**Description**: The Jetson e2e harness already runs the real `satellite-provider` .NET 8 service (lineage AZ-688 / AZ-691 / AZ-692, services `satellite-provider` + `satellite-provider-postgres` in `docker-compose.test.jetson.yml`), but the e2e-runner still points its `SATELLITE_PROVIDER_URL` at the legacy `mock-sat` fixture and the placeholder `operator_pre_flight_setup` fixture never drives the C10/C11 pipeline. Compounding this, C11's `HttpTileDownloader` path constants (`_LIST_PATH=/api/satellite/tiles`, `_GET_PATH=/api/satellite/tiles/{tile_id}`) do not match the real satellite-provider API surface (`POST /api/satellite/tiles/inventory` for LIST, `GET /tiles/{z}/{x}/{y}` for tile fetch). This task wires the existing service into the e2e-runner, adapts C11 to the real contract, seeds the Derkachi-bbox tile catalog via `POST /api/satellite/request`, replaces the placeholder fixture with a real C10+C11 driver, and un-xfails the Tier-2 Derkachi + AZ-699 verdict tests.
+**Complexity**: 8 points (explicit override of the standard 5-pt PBI cap — see decision log entry 2026-05-21 + spec refresh note at `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md`; scope reconciled with reality 2026-05-21 during cycle-3 batch 104. Single-ticket containment preserved — the four sub-deliverables only deliver demo-confidence value when shipped together.)
+**Dependencies**: AZ-776 done (eskf open-loop composition profile unblocks the replay graph for Derkachi); relies on prior compose-side work AZ-688 / AZ-691 / AZ-692 (closed in Jira without local task spec files — the `satellite-provider` + `satellite-provider-postgres` services + `.env.test.example` are already present)
+**Component**: e2e fixtures / c6_tile_cache / c10_provisioning / c11_tile_manager / docker compose
+**Tracker**: AZ-777
+**Epic**: AZ-602
+
+## Problem
+
+The Derkachi e2e fixture (`_docs/00_problem/input_data/flight_derkachi/`) ships real flight inputs but DOES NOT ship the populated C6 tile cache + FAISS descriptor index the replay protocol requires (`replay_protocol.md` Invariant 12). Three architectural gaps stop the full C1+C2+C3+C4+C5 pipeline from running against Derkachi today:
+
+1. **`e2e-runner` still points at `mock-sat`.** In `docker-compose.test.jetson.yml` the `e2e-runner` env block has `SATELLITE_PROVIDER_URL: http://mock-sat:5100` even though `mock-sat` is no longer defined in that file and the real `satellite-provider` service (https://satellite-provider:8080) IS defined right below.
+2. **C11 contract drift.** `c11_tile_manager/tile_downloader.py:61-62` defines `_LIST_PATH = /api/satellite/tiles` and `_GET_PATH = /api/satellite/tiles`. The real satellite-provider exposes `POST /api/satellite/tiles/inventory` (bulk lookup by z/x/y or `locationHashes`) and `GET /tiles/{z:int}/{x:int}/{y:int}` (slippy-map tile fetch) — different paths, different methods, different schemas (`Program.cs:187-209`).
+3. **`operator_pre_flight_setup` is a placeholder.** The fixture at `tests/e2e/replay/conftest.py` (lines 293-310) `mkdir`s an empty `operator_cache` directory and yields. It does NOT drive C11 download or C10 descriptor-batcher; it does NOT populate C6. The fixture's docstring explicitly calls itself "a stub" pending this ticket.
+
+Production architecture (per `architecture.md` Principle #5 + the C10/C11 descriptions) requires:
+
+- C10 does NOT touch satellite-provider — tile network I/O lives in C11.
+- C11 `HttpTileDownloader` is the production path: authenticated GETs against the parent-suite `satellite-provider`.
+- `satellite-provider` owns OSM/CARTO tile network I/O + license attribution + multi-flight voting layer — the onboard companion is read-only against it (via C11) during pre-flight and read-only against C6 during flight.
+- `mock-sat` is fully obsolete on Jetson (D-PROJ-2 / `POST /api/satellite/upload` shipped — verified at `Program.cs:211`). Tier-1 (`docker-compose.test.yml`) is deprecated per `_docs/02_document/tests/environment.md` 2026-05-20 active policy and is OUT OF SCOPE.
+
+## Outcome
+
+- The e2e-runner in `docker-compose.test.jetson.yml` consumes the existing real `satellite-provider` service over `https://satellite-provider:8080` with a self-signed dev cert and a static Bearer `service_api_key` token. `mock-sat` references removed.
+- C11 `HttpTileDownloader._LIST_PATH` / `_GET_PATH` adapted to the real satellite-provider API surface (`POST /api/satellite/tiles/inventory` for LIST; `GET /tiles/{z}/{x}/{y}` for tile fetch), with the consumer code in `_do_enumerate` + `_download_one_tile` updated to match. All existing C11 unit tests in `tests/unit/c11_tile_manager/` re-greened against the new contract.
+- `satellite-provider`'s tile catalog is seeded with the Derkachi bbox (≈50.05–50.15 lat, 36.05–36.15 lon, zoom 15–18) via `POST /api/satellite/request`. Imagery source: **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`) — verified via 2026-05-22 black-box probe of the running satellite-provider. NOTE: this was originally specced as CARTO Voyager Basemap (CC-BY-3.0); the spec was amended 2026-05-22 after the probe revealed the actual upstream is Google Maps governed by Google Maps Platform Terms of Service. Dev/research use only; production deployment requires Google Maps Platform licensing review OR migration to a true CC-BY source on the satellite-provider side (parent-suite ticket TBD).
+- `tests/e2e/replay/conftest.py::operator_pre_flight_setup` replaced by a real fixture that drives adapted C11 + C10 against the seeded catalog and yields a `PopulatedC6Cache` dataclass mounted via named volumes that survive across pytest sessions.
+- AC-3 (`test_ac3_within_100m_80pct_of_ticks` in `tests/e2e/replay/test_derkachi_1min.py`) un-xfails on Tier-2 Jetson with ≥ 80 % of ticks within 100 m of ground truth.
+- AZ-699 verdict test (`test_az699_real_flight_validation_emits_verdict_and_report`) un-xfails and produces the first honest horizontal-error distribution report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
+
+## Scope
+
+### Included
+
+**Phase 1 — wire e2e-runner against existing satellite-provider + C11 contract adaptation**
+
+- `docker-compose.test.jetson.yml` (only the `e2e-runner` service block changes; the existing `satellite-provider` + `satellite-provider-postgres` blocks are unchanged):
+  - Switch e2e-runner `SATELLITE_PROVIDER_URL: http://mock-sat:5100` → `SATELLITE_PROVIDER_URL: https://satellite-provider:8080`.
+  - Add `SATELLITE_PROVIDER_TLS_INSECURE: "1"` env var (development-only) so that client requests accept the self-signed dev cert. Loud warning + documentation per Risk 2.
+  - Add `SATELLITE_PROVIDER_API_KEY: ${SATELLITE_PROVIDER_API_KEY}` env sourced from `.env.test` (matches existing `JWT_SECRET` pattern; `.env.test.example` already covers JWT_*, this one extends it with one new variable).
+  - Add `e2e-runner.depends_on.satellite-provider: { condition: service_healthy }`.
+  - Remove any residual `mock-sat` reference from the `e2e-runner` env block (the service itself is already gone from the file).
+- **C11 contract adaptation** (in `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`):
+  - Change `_LIST_PATH = "/api/satellite/tiles"` → `_LIST_PATH = "/api/satellite/tiles/inventory"` and switch `_do_enumerate` from GET-with-query-params to POST-with-JSON-body per AZ-505 / `tile-inventory.md` v1.0.0 (body: `{tiles: [{tileZoom, tileX, tileY}, ...]}` OR `{locationHashes: [...]}`; response order matches request order with `present: true|false`).
+  - Change `_GET_PATH = "/api/satellite/tiles"` → `_GET_PATH = "/tiles"` and adjust `_download_one_tile` to build `/tiles/{z}/{x}/{y}` from the inventory hit's coordinates instead of `tile_id`.
+  - Map the response field renames in `TileSummary` construction (existing fields like `tile_id`, `produced_at`, `resolution_m_per_px`, `estimated_bytes` map to whatever the real inventory response uses — verify against `Program.cs` + `tile-inventory.md` and document any per-field adaptation needed).
+  - Update `tests/unit/c11_tile_manager/test_tile_downloader.py` (and any other unit tests touching the LIST/GET paths) to use the new POST contract + slippy-map GET — these are stubbed-response tests, no live service needed.
+- **Smoke test** at `tests/e2e/satellite_provider/test_smoke.py` (new):
+  - Gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
+  - Brings up the docker-compose stack (`satellite-provider` + `satellite-provider-postgres` + dependencies).
+  - TCP-probe `satellite-provider:8080` until healthy.
+  - Issues one Bearer-authenticated `POST /api/satellite/tiles/inventory` for a 1-tile query (a tile in the Derkachi bbox); asserts a 200 response with the documented schema.
+  - For an inventory-present tile, fetches it via `GET /tiles/{z}/{x}/{y}`; asserts that non-empty JPEG bytes are returned.
+  - Asserts the C11-adapted code path (`HttpTileDownloader.download_for_bbox` for a 1-tile bbox) successfully writes to C6's tile store + Postgres metadata table.
+- `docker-compose.test.yml` (Tier-1) is **NOT** modified. Tier-1 e2e is deprecated per `_docs/02_document/tests/environment.md` 2026-05-20 active policy.
+- `.env.test.example` extended with `SATELLITE_PROVIDER_API_KEY=DEV-ONLY-REPLACE-...`.
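
The request shapes described in the C11 contract adaptation above can be sketched without any network I/O. The helper names are hypothetical. The paths and the `tiles` body keys come from the bullets above, while the response shape (a list of objects with a `present` flag, in request order) is an assumption that should be verified against `tile-inventory.md`.

```python
import json
from typing import Dict, Iterable, List, Sequence, Tuple

LIST_PATH = "/api/satellite/tiles/inventory"  # POST with a JSON body
GET_PATH = "/tiles"                           # GET /tiles/{z}/{x}/{y}

Tile = Tuple[int, int, int]  # (zoom, x, y)


def inventory_body(tiles: Iterable[Tile]) -> str:
    """Build the z/x/y variant of the inventory request body."""
    payload = {
        "tiles": [{"tileZoom": z, "tileX": x, "tileY": y} for z, x, y in tiles]
    }
    return json.dumps(payload)


def tile_path(tile: Tile) -> str:
    """Build the slippy-map fetch path from tile coordinates, not a tile_id."""
    z, x, y = tile
    return f"{GET_PATH}/{z}/{x}/{y}"


def present_tiles(tiles: Sequence[Tile], items: Sequence[Dict]) -> List[Tile]:
    """Pair request tiles with response entries by position.

    Relies on the documented guarantee that response order matches request
    order. The per-item dict shape is assumed for illustration.
    """
    return [tile for tile, item in zip(tiles, items) if item.get("present")]


wanted = [(15, 100, 200), (15, 101, 200)]  # placeholder coordinates
print(inventory_body(wanted))
print([tile_path(t) for t in present_tiles(wanted, [{"present": True}, {"present": False}])])
```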
+
+**Phase 2 — Derkachi tile catalog seeding via the real satellite-provider region API**
+
+- `tests/fixtures/derkachi_c6/seed_region.py` (new): a Python helper that calls `POST /api/satellite/request` against the running satellite-provider to register the Derkachi bbox + zoom range. Body schema verified against the actual `RequestRegionRequest` DTO (`{id, latitude, longitude, sizeMeters, zoomLevel, stitchTiles}`) — body shape probe-confirmed 2026-05-22. Imagery source: **Google Maps satellite layer** (`lyrs=s`); satellite-provider owns the actual tile download from Google Maps and applies the freshness gate. Note: see AZ-812 for the planned `latitude/longitude` → `lat/lon` rename on this DTO.
+- `tests/fixtures/derkachi_c6/bbox.yaml`: Derkachi bbox + zoom levels + actual imagery source (Google Maps satellite, not CARTO as originally specced) + license attribution metadata (Google Maps Platform Terms of Service + "Imagery © Google" attribution string).
+- `tests/fixtures/derkachi_c6/README.md`: how to re-seed if the satellite-provider DB is wiped; license attribution operators must propagate ("Imagery © Google"); the dev-only caveat for Google Maps ToS; pointer to the parent-suite ticket (TBD) for migrating to a true CC-BY source for production.
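
For orientation, the sketch below shows how a lat/lon bbox maps onto z/x/y tiles under the standard Web Mercator slippy-map scheme. It is an assumption that satellite-provider indexes tiles exactly this way, and the function names are hypothetical. The bbox values are the approximate Derkachi bounds quoted in the Outcome section.

```python
import math
from typing import List, Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map tile index for a WGS84 point (Web Mercator)."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(
    lat_min: float, lat_max: float, lon_min: float, lon_max: float, zoom: int
) -> List[Tuple[int, int, int]]:
    """Enumerate (zoom, x, y) for every tile touching the bbox."""
    x0, y0 = latlon_to_tile(lat_max, lon_min, zoom)  # north-west corner
    x1, y1 = latlon_to_tile(lat_min, lon_max, zoom)  # south-east corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


# Approximate Derkachi bbox from the spec, at the coarsest requested zoom.
derkachi_z15 = tiles_for_bbox(50.05, 50.15, 36.05, 36.15, zoom=15)
print(len(derkachi_z15))
```

Tile y indices grow southwards, which is why the north-west corner supplies the lower bounds of both ranges.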
+
+**Phase 3 — replace `operator_pre_flight_setup` with a real fixture**
+
+- `tests/e2e/replay/conftest.py::operator_pre_flight_setup`: replace the placeholder. The new fixture:
+  - Reads the Derkachi bbox from `tests/fixtures/derkachi_c6/bbox.yaml`.
+  - Invokes the adapted C11 `HttpTileDownloader` against the running satellite-provider service.
+  - Invokes C10 `DescriptorBatcher` against the populated C6 (NetVLAD backbone per `c2_vpr/config.py:67` default).
+  - Verifies sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency check per AZ-306).
+  - Yields a `PopulatedC6Cache` dataclass that the test bodies consume.
+- Outputs mounted into the e2e-runner container via named volumes that survive across pytest sessions.
+
+**Phase 4 — un-xfail the Tier-2 tests**
+
+- `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`: remove `@pytest.mark.xfail` (still gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`).
+- `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`: remove `@pytest.mark.xfail`. The test body MUST emit the verdict report regardless of PASS/FAIL — the success criterion is that the report exists with the honest distribution.
+
+**Phase 5 — documentation**
+
+- `_docs/02_document/contracts/replay/replay_protocol.md`: extend Invariant 12 with an AZ-777 sub-section describing the operator_pre_flight_setup behaviour against the real satellite-provider.
+- `_docs/00_problem/input_data/flight_derkachi/README.md`: add a Derkachi C6 section pointing at the seed script + bbox config.
+- `_docs/02_document/architecture.md`: append a sub-section to the existing satellite-provider entry noting that the Jetson e2e harness consumes the real .NET service (AZ-688 / AZ-691 / AZ-692 prior art; AZ-777 closes the C11 contract gap and wires the e2e-runner client). Tier-1 status updated to "deprecated 2026-05-20".
+
+### Excluded
+
+- ZERO modifications to `../satellite-provider/`. If a parent-suite gap surfaces beyond C11 adapting to existing endpoints (e.g., inventory response missing fields C11 needs, region-onboarding endpoint rejects the Derkachi payload shape), STOP and file a parent-suite ticket.
+- `docker-compose.test.yml` (Tier-1) — OUT OF SCOPE (deprecated 2026-05-20).
+- Cross-compile / arm64 follow-up — **CLOSED**: `mcr.microsoft.com/dotnet/aspnet:10.0` has an arm64 manifest (verified 2026-05-21 via `docker manifest inspect`). No follow-up ticket needed.
+- `mock-sat` retention — **CLOSED**: already retired from Jetson compose; D-PROJ-2 / `POST /api/satellite/upload` has shipped on the real satellite-provider (`Program.cs:211`).
+- Switching C2 default backbone away from `net_vlad` — out of scope.
+- Persisting populated C6 to git/LFS — named-volume approach unchanged.
+
+## Acceptance Criteria
+
+**AC-1: satellite-provider healthy in Jetson compose**
+Given the existing `satellite-provider` + `satellite-provider-postgres` services in `docker-compose.test.jetson.yml`
+When `docker compose -f docker-compose.test.jetson.yml up satellite-provider` is invoked
+Then both services build, the satellite-provider becomes healthy via TCP probe on port 8080 (per existing healthcheck), and is reachable from any compose-network service via DNS `satellite-provider:8080`
+
+**AC-2: C11 contract aligns with satellite-provider's actual API**
+Given the adapted C11 `_LIST_PATH=/api/satellite/tiles/inventory` (POST) and `_GET_PATH=/tiles/{z}/{x}/{y}` (GET) against the running satellite-provider
+When `tests/e2e/satellite_provider/test_smoke.py` runs `HttpTileDownloader.download_for_bbox` for a 1-tile bbox in the Derkachi region (seeded)
+Then the inventory POST returns 200 with the documented schema, the tile fetch returns non-empty JPEG bytes, and C6's tile store + Postgres metadata both reflect the tile (freshness label `fresh`)
+
+**AC-3: operator_pre_flight_setup drives the production pipeline**
+Given the running satellite-provider with Derkachi tiles seeded
+When `tests/e2e/replay/conftest.py::operator_pre_flight_setup` runs
+Then adapted C11 downloads the Derkachi-bbox tiles into C6, C10 `DescriptorBatcher` builds the FAISS HNSW index using the NetVLAD backbone, the three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check, and the fixture yields a `PopulatedC6Cache` with all three artifact paths populated
+
+**AC-4: Derkachi AC-3 test un-xfails on Tier-2**
+Given AZ-776 landed + the populated C6 from AC-3 mounted into the e2e-runner + `c5_state.strategy = gtsam_isam2` + `c4_pose.enabled = True`
+When `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` runs on Tier-2 Jetson
+Then it un-xfails, the test passes (≥ 80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not `satellite_anchoring_not_wired`)
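
The "≥ 80 % of ticks within 100 m" gate can be illustrated as follows. How the real test pairs estimator ticks with ground truth, and which distance model it uses, is not stated here. Index-wise pairing and a spherical-Earth haversine distance are assumptions of this sketch, as are the function names and sample coordinates.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def fraction_within(
    estimates: Sequence[LatLon], truths: Sequence[LatLon], limit_m: float = 100.0
) -> float:
    """Share of paired ticks whose horizontal error is at most limit_m."""
    errors = [haversine_m(e, t) for e, t in zip(estimates, truths)]
    if not errors:
        return 0.0
    return sum(err <= limit_m for err in errors) / len(errors)


truth = [(50.10, 36.10)] * 5
# Four ticks about 56 m off in latitude, one tick about 222 m off.
est = [(50.1005, 36.10)] * 4 + [(50.102, 36.10)]
gate_passed = fraction_within(est, truth) >= 0.80
print(gate_passed)
```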
+
+**AC-5: AZ-699 verdict report is produced**
+Given AZ-776 landed + the populated C6 from AC-3 + the real flight video + factory calibration
+When `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` runs on Tier-2 Jetson
+Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥ 80 % within 100 m gate (PASS not required for the AC; HONEST report required)
+
+**AC-6: Documentation captures the new architecture seam**
+Given the updated replay protocol doc + Derkachi fixture README + architecture sub-section
+When a new contributor reads them
+Then they understand (i) why the real satellite-provider runs in the Jetson e2e harness, (ii) the C11 contract used against satellite-provider (inventory + slippy-map), (iii) how to re-seed the Derkachi catalog, (iv) what license attribution operators must propagate, and (v) why Tier-1 is deprecated
+
+## Non-Functional Requirements
+
+**Performance**
+- `operator_pre_flight_setup` completes in ≤ 5 minutes on first invocation (cold cache), ≤ 30 seconds on subsequent invocations within the same docker-compose session (warm cache via named volume).
+- C11 inventory POST + per-tile GET round-trips MUST stay within the existing C11 retry/backoff schedule (`_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`). No new retry budget.
+
+**Compatibility**
+- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to satellite-provider's layout (architecture principle #5) — automatic via C6 write path.
+- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex.from_config` impl without code changes — automatic via C6 write path.
+- C11 inventory POST schema MUST match `tile-inventory.md` v1.0.0 (AZ-505). Schema mismatch is a parent-suite bug; this task adapts C11 to the documented v1.0.0 contract, no further patches.
+
+**Reliability**
+- The smoke test (AC-2) MUST fail loud if satellite-provider is unreachable, returns malformed responses, rate-limits, or returns 401/403 (auth failure) — no silent skip.
+- `operator_pre_flight_setup` MUST clean up partial cache state on failure (no half-built FAISS index left).
+- SHA-256 content-hash gate on the FAISS index (per D-C10-3) verified at every fixture yield — mismatch raises `IndexUnavailableError`.
+
+**Security**
+- `SATELLITE_PROVIDER_TLS_INSECURE=1` is a **development-only** override. Documented in `.env.test.example` + the smoke test + the architecture sub-section. Production deploys MUST validate against a real CA-issued cert.
+- `SATELLITE_PROVIDER_API_KEY` sourced from `.env.test`; never committed; same `.gitignore` pattern as `JWT_SECRET`.
+- C11 download goes through the production Bearer-token auth path (`Authorization: Bearer ${SATELLITE_PROVIDER_API_KEY}`) — no auth bypass.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1   | `docker-compose.test.jetson.yml` lints; e2e-runner depends_on satellite-provider | `docker compose -f docker-compose.test.jetson.yml config` exits 0 |
+| AC-2   | C11 `_do_enumerate` against a stubbed POST `/api/satellite/tiles/inventory` response | Returns `list[TileSummary]` with correct field mapping |
+| AC-2   | C11 `_download_one_tile` against a stubbed GET `/tiles/{z}/{x}/{y}` response | Writes tile bytes + sha256 to C6 adapter |
+| AC-3   | `operator_pre_flight_setup` fixture yields a `PopulatedC6Cache` with non-empty tile store + FAISS index | All three sidecar files exist + sha256 triple-consistency holds |
+| AC-3   | Sidecar SHA-256 coherence check inside the fixture | `IndexUnavailableError` raised when one of the three files is tampered |
+| AC-6   | Fixture README documents the seed invocation | Invocation string + license attribution greps cleanly |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|--------------|-------------------|----------------|
+| AC-1   | Jetson compose | `docker compose up satellite-provider` | Both services come up healthy in ≤ 60 s | Perf |
+| AC-2   | Real satellite-provider running + 1-tile-bbox query | C11 adapted HttpTileDownloader against the live service | Tile arrives in C6 + metadata row inserted + freshness=fresh | Reliability |
+| AC-3   | Seeded Derkachi catalog + e2e-runner | `operator_pre_flight_setup` cold + warm invocation | Cold ≤ 5 min, warm ≤ 30 s, all three sidecar files coherent | Perf |
+| AC-4   | AZ-776 landed + populated C6 mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed on Tier-2 Jetson | Test passes (≥ 80 % within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat |
+| AC-5   | AZ-776 landed + populated C6 mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test completes ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf |
+
+## Constraints
+
+- ZERO modifications to files under `../satellite-provider/` (sibling repo). If a parent-suite gap is discovered, STOP and file a parent-suite ticket.
+- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner once the cache is populated. The cache-population phase needs network (satellite-provider downloads from its Google Maps upstream, see the imagery-source constraint below); the airborne replay run is internal-network-only.
+- Imagery source: **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. Originally specced as CC-BY-licensed (CARTO Voyager); amended 2026-05-22 after probe revealed Google Maps is the actual upstream. License attribution string ("Imagery © Google") recorded in the seeded catalog's metadata. Dev/research use only; production deploy requires (a) Google Maps Platform licensing review for offline-cache use, OR (b) parent-suite ticket to add a true CC-BY satellite imagery provider to satellite-provider (Esri World Imagery, Mapbox satellite, Sentinel-2, etc.).
+- The seeded Derkachi catalog size budget is 100 MB on the satellite-provider DB side. Over budget → reduce zoom-level coverage; document in `bbox.yaml`.
+- Tier-1 (`docker-compose.test.yml`) is deprecated and MUST NOT be modified by this task.
+
+## Risks & Mitigation
+
+**Risk 1: C11 inventory response field names drift further from `tile-inventory.md` v1.0.0**
+- *Risk*: Even after fixing `_LIST_PATH` + `_GET_PATH`, the response object fields (`tile_id`, `produced_at`, `resolution_m_per_px`, `estimated_bytes`, etc.) may not match the inventory response's actual field names; or the inventory response may not include all the fields C11's `TileSummary` requires.
+- *Mitigation*: Phase 1 verifies field mapping against `tile-inventory.md` v1.0.0 + `Program.cs::GetTilesInventory` source. Per-field renames are a gps-denied-onboard side concern (C11 adapter); only fields entirely missing from the inventory response warrant a parent-suite ticket.
+
+**Risk 2: Self-signed cert CN/SAN doesn't include `satellite-provider` hostname**
+- *Risk*: The dev cert at `../satellite-provider/certs/api.pfx` may be issued for `localhost` only; via compose DNS `satellite-provider:8080` it would fail SSL verification.
+- *Mitigation*: Phase 1 introduces `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob — accepted as a **development-only** workaround with prominent warnings in `.env.test.example`, the smoke test, and the architecture doc. Production deploys MUST set this to `0` (default) and use a real cert. Regenerating the dev cert with the right SAN is the cleaner long-term fix but lives on the parent-suite side; file a follow-up ticket if the workaround feels brittle.
+
+**Risk 3: ~~satellite-provider doesn't build on arm64~~ — CLOSED 2026-05-21**
+- `mcr.microsoft.com/dotnet/aspnet:10.0` multi-arch manifest verified via `docker manifest inspect`: arm64, amd64, arm/v7 all present. No follow-up needed.
+
+**Risk 4: ~~CARTO Voyager basemap residual is too coarse for AC-4~~ — REDEFINED 2026-05-22**
+- *Original concern*: CC-BY basemap is OSM-derived (street-level features, not satellite features). NetVLAD descriptors may not lock against nadir camera frames well enough for ≥ 80 % within 100 m.
+- *Probe-verified reality (2026-05-22)*: The actual upstream is **Google Maps satellite layer** (`lyrs=s`), which IS high-resolution overhead imagery from genuine satellite/aerial sources. NetVLAD descriptor lock should be strong against nadir camera frames. The original CARTO-coarseness risk is mitigated by the reality.
+- *New risk (replacing it)*: **Google Maps Platform Terms of Service may restrict offline-tile storage** for the C6-style use case (long-lived cache of stored tiles serving as a VPR reference dataset). Acceptable for dev/research; production deployment requires licensing review or a CC-BY-source migration on the satellite-provider side. Surfaced explicitly in `bbox.yaml`, `README.md`, and the architecture doc sub-section.
+- *Mitigation*: AC-5 (AZ-699 verdict report) still serves as the honest signal regardless of imagery quality. If VPR locks well, AC-4 passes; if it doesn't, the verdict report records the actual horizontal-error distribution and points to a follow-up (e.g., higher-zoom seeding, different descriptor backbone, or migrating to a CC-BY satellite source for both licensing AND quality reasons).
+
+**Risk 5: Single-ticket 8-pt complexity exceeds the standard PBI cap**
+- *Risk*: Above the 5-pt cap stated in the project's PBI complexity rule.
+- *Mitigation*: The five phases are explicit STOP-gates. If Phase 1 (wiring + C11 adaptation) fails for reasons outside this ticket's scope (e.g., parent-suite contract drift beyond field renames, cert hostname issue requiring parent-suite regen), the implementer STOPS at the phase boundary, files the parent-suite ticket, and proposes a split into smaller follow-up tickets. The "single ticket" property holds as long as work proceeds linearly; if any phase grinds, decomposition is the escape hatch.
+
+### ADR Impact
+
+> Affects ADR-002 (build-time exclusion): unchanged — C11 is already operator-side-only via process-level isolation (architecture Principle #4 + ADR-004); this task adapts C11's contract but does not change its build-time isolation.
+> Affects ADR-011 (replay is a configuration): unchanged — the per-frame loop is mode-agnostic; this task closes the gap between the live and replay paths' upstream tile source.
+> Implements architecture principle #5 (satellite-provider on-disk layout) end-to-end against a real flight for the first time.
+> No new ADR — the architectural decision is "adapt C11 to the existing satellite-provider contract and wire the e2e harness against the real service", which is execution of existing decisions, not a new one.
@@ -0,0 +1,92 @@
+# End-to-end real-flight validation pipeline (Epic)
+
+**Task**: AZ-835_e2e_real_flight_validation_epic
+**Name**: End-to-end real-flight validation: raw (tlog, video) → route-driven satellite seeding → gps-denied verdict
+**Description**: Drive the full gps-denied-onboard validation pipeline from raw operator inputs to a verdict. Given a `.tlog` binary + a flight video, the system automatically extracts the flight cut, syncs frames to IMU, builds the satellite imagery the descriptor stack needs (route-driven, not bbox-driven), runs the airborne pipeline, and reports the horizontal-error distribution against the tlog's own GPS ground truth. Supersedes AZ-777 Phase 3+ design.
+**Complexity**: Epic — ~17 SP decomposed into 6 child tasks of ≤ 5 SP each (see decomposition table below)
+**Dependencies**: AZ-777 Phase 1 (landed cycle 3 batch 105 — C11 contract adaptation + e2e-runner wiring); AZ-405 (tlog↔video auto-sync adapter); AZ-699 (verdict report writer); AZ-809 SOFT (Route API validation — landing AZ-809 before C2 lets the client consume RFC 7807 validator responses cleanly)
+**Component**: cross-cutting — replay_input + new TlogRouteExtractor + new SatelliteProviderRouteClient + e2e fixtures + tests/e2e/replay
+**Tracker**: AZ-835 (https://denyspopov.atlassian.net/browse/AZ-835)
+**Originating directive**: user (2026-05-22) after AZ-777 Phase 2 deliverables landed — "In the end it should be full e2e flow. You give it a tlog + video, and the system does everything else."
+
+Jira AZ-835 is the authoritative spec; this file mirrors the in-workspace-only sections that gps-denied-onboard implementers will need.
+
+## Goal
+
+A single pytest test takes only `(tlog, video, calibration)` as input and runs the full 7-step pipeline end-to-end on the Jetson harness, producing an honest PASS/FAIL verdict against the AZ-696 AC-3 threshold (≥ 80 % of emissions within 100 m).
+
+## The 7-step pipeline
+
+| # | Step | Existing? | Component / new code |
+|---|------|-----------|----------------------|
+| 1 | Extract active flight cut + sync with video | **Mostly existing** (AZ-405 `tlog_video_adapter.py`) | small extension for take-off/landing boundary detection if needed |
+| 2 | On-fly frame + IMU extraction | **Existing** | `VideoFileFrameSource` + `TlogReplayFcAdapter` (no change) |
+| 3 | Auto-create route from tlog GPS, coarsen to ≤ 10 pts | **New** | `TlogRouteExtractor` (Douglas-Peucker on `GLOBAL_POSITION_INT` rows) → `RouteSpec` |
+| 4 | POST route to satellite-provider, get tiles | **New consumer** | `SatelliteProviderRouteClient` (POST `/api/satellite/route`, poll `mapsReady`) |
+| 5 | Calc FAISS index from tiles | **Mostly existing** | C10 `DescriptorBatcher` runs; new fixture wires C11 → C10 trigger |
+| 6 | Run gps-denied from all the info | **Existing** | `gps-denied-replay` console-script + airborne composition root |
+| 7 | Get GPS fixes, check against tlog GPS | **Existing** | `helpers/accuracy_report.py` + `helpers/gps_compare.py` |
+
+## Decomposition (6 child tasks)
+
+| # | Title | Est | Depends |
+|---|-------|-----|---------|
+| C1 | `TlogRouteExtractor` — extract active segment + coarsen to N waypoints | 3 | — |
+| C2 | `SatelliteProviderRouteClient` + `seed_route.py` CLI | 3 | AZ-809 (soft) |
+| C3 | New `operator_pre_flight_setup` fixture (C1 + C2 + C11 + C10) — replaces placeholder, supersedes AZ-777 Phase 3 | 5 | C1, C2, AZ-777 Phase 1 |
+| C4 | E2E test ingesting raw `(tlog, video)` and running steps 1-7 — extends/replaces AZ-699 verdict test | 3 | C3 |
+| C5 | Un-xfail AZ-777 AC-4 + AC-5 tests | 1 | C4 |
+| C6 | Docs: `replay_protocol.md` Invariant 12 + AZ-777 amendment + new-test README | 2 | C5 |
+
+**Total ~17 SP**.
+
+## Why route-driven seeding (not bbox)
+
+- **Efficiency**: AZ-777 spec bbox = ~11400 tiles z15-z18 (~140 MB, 48% over budget). 10-point coarsened route with `regionSizeMeters=500` per point = ~50-100 unique tiles (~1.5 MB) for the same VPR descriptor lock area. **~100× reduction**.
+- **Honesty**: bbox pre-commits to where the operator *might* fly. Route pre-commits to where they *did* fly. For real-flight validation, the latter is the right primitive.
+- **Probe-confirmed**: Route API works end-to-end in ~15s for a 2-point route per 2026-05-22 black-box probe. Uses `lat`/`lon` already (no AZ-812 rename needed).
+
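The tile-count comparison above can be sanity-checked with a rough sketch of route-driven tile enumeration. It uses the standard slippy-map (Web Mercator) tile formula; treating `regionSizeMeters` as the side of a square centred on each waypoint is an assumption, and the function names are hypothetical, not satellite-provider's or C11's actual code.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Standard slippy-map tile indices (x, y) for a WGS84 point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_waypoint(lat: float, lon: float, region_size_m: float,
                       zoom: int) -> set[tuple[int, int, int]]:
    """Tiles (z, x, y) touching a square of side region_size_m around a waypoint.

    Assumption: coverage is a square; the real service may define it differently.
    """
    half = region_size_m / 2.0
    dlat = half / 111_320.0  # approx. metres per degree of latitude
    dlon = half / (111_320.0 * math.cos(math.radians(lat)))
    x_min, y_min = latlon_to_tile(lat + dlat, lon - dlon, zoom)  # north-west corner
    x_max, y_max = latlon_to_tile(lat - dlat, lon + dlon, zoom)  # south-east corner
    return {(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)}


def tiles_for_route(waypoints, region_size_m: float = 500.0,
                    zooms=(15, 16, 17, 18)) -> set[tuple[int, int, int]]:
    """Union of per-waypoint tiles; overlapping waypoints are deduplicated by the set."""
    tiles: set[tuple[int, int, int]] = set()
    for lat, lon in waypoints:
        for zoom in zooms:
            tiles |= tiles_for_waypoint(lat, lon, region_size_m, zoom)
    return tiles
```

Because nearby waypoints share tiles, the unique count grows much more slowly than the waypoint count, which is where most of the saving over a bbox sweep comes from.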
+## Coordination with prior work
+
+- **AZ-777** — Phase 1 + Phase 2 reused; Phase 3+ design **superseded** by this Epic when C3 lands.
+- **AZ-699** — verdict-report-writing path preserved; C4 extends or wraps it.
+- **AZ-405** — tlog↔video auto-sync adapter reused as-is for step 1.
+- **AZ-702** — camera factory-sheet calibration unchanged.
+- **AZ-696** — ≥ 80 % within 100 m threshold gate unchanged.
+- **AZ-808** — Region-endpoint validation; not on this Epic's critical path (Route used, not Region).
+- **AZ-809** — Route-endpoint validation; soft prereq for C2.
+- **AZ-812** — Region rename to lat/lon; not on this Epic's critical path.
+
+## Acceptance criteria (Epic-level)
+
+**AC-1**: New pytest test gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2` takes only `(tlog, video, calibration)` and runs the full 7-step pipeline on Jetson.
+
+**AC-2**: Step 1 auto-detects active flight cut from raw tlog (take-off → landing) without operator intervention.
+
+**AC-3**: Step 3 produces ≤ 10 waypoints that materially follow the tlog GPS trajectory (DP tolerance documented in config).
+
+**AC-4**: Step 4 succeeds against real satellite-provider on Jetson docker network, downloads route tiles from Google Maps, `mapsReady=true` within runtime budget.
+
+**AC-5**: Step 5 builds FAISS HNSW index over route-seeded C6 cache; sidecar triple-consistency holds (AZ-306).
+
+**AC-6**: Step 7 emits AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with honest horizontal-error distribution — PASS or FAIL on AZ-696 AC-3 threshold, no xfail mask.
+
+**AC-7**: End-to-end run ≤ 15 min on Tier-2 Jetson for the Derkachi clip (soft target for first delivery; hard NFR after first measurement).
+
+**AC-8**: Docs: `replay_protocol.md` Invariant 12 sub-section + AZ-777 marked Phase 3+ superseded + new-test README.
+
+## Out of scope
+
+- Satellite-provider imagery-source migration to CC-BY (parent-suite ticket, TBD).
+- FAISS / NetVLAD backbone replacement.
+- Real-time tlog ingestion (this Epic operates on finished `.tlog` files).
+- Multi-flight aggregate validation.
+- ZERO modifications to `../satellite-provider/` (Route API consumed as-is).
+- CI gating (test stays behind `RUN_REPLAY_E2E=1`).
+
+## References
+
+- Jira AZ-835: https://denyspopov.atlassian.net/browse/AZ-835
+- Supersedes AZ-777 Phase 3+ design (AZ-777 Phase 1 + Phase 2 reused)
+- Probe foundation: 2026-05-22 black-box probe of Route API confirmed end-to-end viability
+- Related: AZ-405, AZ-696, AZ-699, AZ-702, AZ-777, AZ-808, AZ-809, AZ-812
@@ -0,0 +1,86 @@
+# TlogRouteExtractor
+
+**Task**: AZ-836_tlog_route_extractor
+**Name**: TlogRouteExtractor: extract active flight segment + coarsen tlog GPS to ≤N waypoints (AZ-835 C1)
+**Description**: First building block of Epic AZ-835. Pure, testable function that consumes a `.tlog` binary and returns a `RouteSpec` (≤ N waypoints + suggested per-waypoint coverage radius) suitable for posting to satellite-provider's `POST /api/satellite/route` endpoint (consumed by AZ-835 C2 / AZ-838).
+**Complexity**: 3 SP
+**Dependencies**: AZ-697 (`load_tlog_ground_truth` — done); AZ-279 (WGS converter — done); AZ-835 (parent Epic)
+**Component**: `src/gps_denied_onboard/replay_input/tlog_route.py` (new module under `replay_input/`)
+**Tracker**: AZ-836 (https://denyspopov.atlassian.net/browse/AZ-836)
+**Parent Epic**: AZ-835
+
+Jira AZ-836 is the authoritative spec; this file is the in-workspace mirror.
+
+## Public surface
+
+```python
+from dataclasses import dataclass
+from pathlib import Path
+
+@dataclass(frozen=True, slots=True)
+class RouteSpec:
+    waypoints: tuple[tuple[float, float], ...]   # (lat, lon), 1..max_waypoints
+    suggested_region_size_meters: float           # per-waypoint coverage radius
+    source_tlog: Path                             # provenance
+    source_segment: tuple[int, int]               # (start_idx, end_idx) into tlog GPS rows
+    total_distance_meters: float                  # along-track distance of active segment
+
+class RouteExtractionError(ReplayInputAdapterError): ...
+
+def extract_route_from_tlog(
+    tlog: Path,
+    *,
+    max_waypoints: int = 10,
+    min_takeoff_speed_m_s: float = 2.0,
+    min_takeoff_altitude_agl_m: float = 5.0,
+    douglas_peucker_tolerance_m: float | None = None,  # auto-computed if None
+    region_size_meters: float = 500.0,
+) -> RouteSpec: ...
+```
+
+Reuses `replay_input.tlog_ground_truth.load_tlog_ground_truth()` for GPS extraction — no MAVLink re-parsing.
+
+## Active-segment detection
+
+Trim leading + trailing rows where horizontal speed < `min_takeoff_speed_m_s` AND altitude AGL < `min_takeoff_altitude_agl_m`. Both thresholds configurable. If trimmed segment has < 2 fixes, raise `RouteExtractionError` with the explicit threshold values — no silent fallback to the full tlog.
+
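A minimal sketch of the trim rule described above, assuming each GPS row exposes a horizontal speed and an AGL altitude. The `Fix` row type, the inclusive `(start, end)` index convention, and the local `RouteExtractionError` stand-in are illustrative assumptions; the real loader's row shape may differ.

```python
from dataclasses import dataclass


class RouteExtractionError(Exception):
    """Stand-in for the spec's RouteExtractionError (real base class not shown here)."""


@dataclass(frozen=True)
class Fix:
    """Hypothetical GPS row used only for this sketch."""
    speed_m_s: float  # horizontal ground speed
    alt_agl_m: float  # altitude above ground level


def trim_active_segment(fixes, min_takeoff_speed_m_s: float = 2.0,
                        min_takeoff_altitude_agl_m: float = 5.0) -> tuple[int, int]:
    """Return inclusive (start_idx, end_idx) of the active flight segment."""

    def is_idle(fix: Fix) -> bool:
        # A row is idle only when BOTH conditions hold (slow AND near the ground).
        return (fix.speed_m_s < min_takeoff_speed_m_s
                and fix.alt_agl_m < min_takeoff_altitude_agl_m)

    start = 0
    while start < len(fixes) and is_idle(fixes[start]):
        start += 1
    end = len(fixes) - 1
    while end >= start and is_idle(fixes[end]):
        end -= 1

    if end - start + 1 < 2:
        # Fail loudly with the thresholds; never fall back to the full tlog.
        raise RouteExtractionError(
            f"active segment has fewer than 2 fixes "
            f"(min_takeoff_speed_m_s={min_takeoff_speed_m_s}, "
            f"min_takeoff_altitude_agl_m={min_takeoff_altitude_agl_m})"
        )
    return start, end
```

Only leading and trailing rows are trimmed, so a slow hover at altitude in the middle of the flight stays inside the segment.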
+## Coarsening
+
+Douglas-Peucker in WGS84 with great-circle distance metric. Use the existing `helpers.wgs_converter` or `helpers.gps_compare` meter conversion — do NOT reimplement (check both first; pick whichever has the right primitive).
+
+When `douglas_peucker_tolerance_m is None`, auto-compute by binary-search over the tolerance until `len(result) <= max_waypoints`. Halt at convergence (delta < 1 m) or 32 iterations.
+
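The auto-tolerance search can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: it projects to a local flat-earth frame instead of using the project's great-circle helpers, and `coarsen`, `douglas_peucker`, and the helper names are hypothetical. Note that Douglas-Peucker always keeps both endpoints, so this sketch requires `max_waypoints >= 2`.

```python
import math


def _to_local_m(points):
    """Equirectangular projection to metres around the first point (stand-in metric)."""
    lat0, lon0 = points[0]
    k = 111_320.0  # approx. metres per degree of latitude
    cos0 = math.cos(math.radians(lat0))
    return [((lon - lon0) * k * cos0, (lat - lat0) * k) for lat, lon in points]


def _dist_to_segment(p, a, b):
    """Distance from point p to segment a-b in the local metric frame."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(xy, tolerance_m):
    """Iterative Douglas-Peucker; returns the sorted indices that are kept."""
    if len(xy) < 3:
        return list(range(len(xy)))
    keep = {0, len(xy) - 1}
    stack = [(0, len(xy) - 1)]
    while stack:
        lo, hi = stack.pop()
        best_d, best_i = 0.0, -1
        for i in range(lo + 1, hi):
            d = _dist_to_segment(xy[i], xy[lo], xy[hi])
            if d > best_d:
                best_d, best_i = d, i
        if best_i != -1 and best_d > tolerance_m:
            keep.add(best_i)
            stack.extend([(lo, best_i), (best_i, hi)])
    return sorted(keep)


def coarsen(points, max_waypoints, max_iters=32, converge_m=1.0):
    """Binary-search the smallest tolerance that yields <= max_waypoints points."""
    if max_waypoints < 2:
        raise ValueError("this sketch needs max_waypoints >= 2 (both endpoints are kept)")
    if len(points) <= max_waypoints:
        return list(points)  # never coarsen below the natural fix count
    xy = _to_local_m(points)
    lo = 0.0
    # No point is farther than this from any segment, so it always yields 2 points.
    hi = 2.0 * max(math.hypot(x, y) for x, y in xy)
    best = douglas_peucker(xy, hi)
    for _ in range(max_iters):
        if hi - lo < converge_m:
            break
        mid = (lo + hi) / 2.0
        kept = douglas_peucker(xy, mid)
        if len(kept) <= max_waypoints:
            best, hi = kept, mid   # feasible: try a tighter tolerance
        else:
            lo = mid               # too many points: loosen
    return [points[i] for i in best]
```

The search only ever stores feasible results in `best`, so the returned route respects `max_waypoints` even if the point count is not perfectly monotonic in the tolerance.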
+## Validation
+
+- `max_waypoints >= 1` (raise `ValueError`).
+- `region_size_meters > 0` (raise `ValueError`).
+- At least 1 fix from `GLOBAL_POSITION_INT` (preferred) or `GPS_RAW_INT` (fallback); if neither, `RouteExtractionError` referencing missing message types (mirrors AZ-697).
+- Missing tlog file → `RouteExtractionError` (not bare `FileNotFoundError`) so callers can catch one error class.
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | Real Derkachi tlog → RouteSpec with `len(waypoints) <= 10`; every waypoint inside lat 50.0808..50.0832, lon 36.1070..36.1134 |
+| AC-2 | Active-segment trim filters pre-takeoff stationary frames (synthetic 5+ stationary leading fixes → `source_segment[0] > 0`) |
+| AC-3 | `max_waypoints=2` → exactly 2 waypoints |
+| AC-4 | `max_waypoints=100` on N<100 tlog → N waypoints (no coarsening below natural fix count) |
+| AC-5 | Missing tlog → `RouteExtractionError` with path; not `FileNotFoundError` |
+| AC-6 | Tlog with no GPS → `RouteExtractionError` naming missing message types |
+| AC-7 | `RouteSpec` is `frozen=True`, `slots=True`, all provenance fields populated |
+| AC-8 | Auto-tolerance binary-search converges within 32 iters on a 200-fix synthetic trajectory |
+| AC-9 | No I/O beyond tlog read; logging at DEBUG only |
+| AC-10 | Unit tests cover: Derkachi happy path, small/large max_waypoints, missing tlog, missing GPS, custom DP tolerance, custom region size, synthetic stationary-leading trim |
+
+## Out of scope
+
+- Posting to satellite-provider (AZ-838 / C2)
+- Route visualization on a map (future, AZ-700-style)
+- Multi-tlog aggregation
+- Live-stream tlog ingestion
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Reference tlog: `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog`
+- Reuse: `src/gps_denied_onboard/replay_input/tlog_ground_truth.py` (AZ-697), `src/gps_denied_onboard/helpers/gps_compare.py`
@@ -0,0 +1,106 @@
+# SatelliteProviderRouteClient + seed_route.py CLI
+
+**Task**: AZ-838_satellite_provider_route_client
+**Name**: SatelliteProviderRouteClient + seed_route.py CLI: POST tlog-derived route to satellite-provider (AZ-835 C2)
+**Description**: Second building block of Epic AZ-835. Consumer-side HTTP client + CLI wrapper that takes a `RouteSpec` (from AZ-836 / C1) and registers it with satellite-provider's `POST /api/satellite/route` endpoint, polls until `mapsReady=true`, and returns the inventory size for downstream consumption.
+**Complexity**: 3 SP
+**Dependencies**: AZ-836 (C1, RouteSpec dataclass + extractor — hard code dep); AZ-777 Phase 1 (existing satellite-provider HTTP plumbing patterns + JWT handling — done); AZ-809 (Route API validation — SOFT prereq, client pre-emptively validates so it's correct without it); AZ-835 (parent Epic)
+**Component**: new `src/gps_denied_onboard/satellite_provider/route_client.py` + new CLI `tests/fixtures/derkachi_c6/seed_route.py`
+**Tracker**: AZ-838 (https://denyspopov.atlassian.net/browse/AZ-838)
+**Parent Epic**: AZ-835
+
+Jira AZ-838 is the authoritative spec; this file is the in-workspace mirror.
+
+## Public surface
+
+```python
+import uuid
+from dataclasses import dataclass
+from gps_denied_onboard.replay_input.tlog_route import RouteSpec  # AZ-836
+
+@dataclass(frozen=True, slots=True)
+class RouteSeedResult:
+    route_id: uuid.UUID
+    terminal_status: str
+    maps_ready: bool
+    tile_count: int
+    elapsed_ms: int
+    submitted_payload_sha256: str
+
+class SatelliteProviderRouteError(Exception): ...
+class RouteValidationError(SatelliteProviderRouteError): ...      # 4xx + ProblemDetails
+class RouteTransientError(SatelliteProviderRouteError): ...       # 5xx / network / timeout
+class RouteTerminalFailureError(SatelliteProviderRouteError): ... # mapsReady never reached
+
+class SatelliteProviderRouteClient:
+    def __init__(self, base_url: str, jwt: str, *, tls_insecure: bool = False,
+                 request_timeout_s: float = 30.0, poll_interval_s: float = 5.0,
+                 poll_max_attempts: int = 60): ...
+    def seed_route(self, spec: RouteSpec, *, name: str | None = None) -> RouteSeedResult: ...
+```
+
+## Wire shape
+
+No formal Route API contract doc exists in `../satellite-provider/_docs/02_document/contracts/api/` as of 2026-05-22. DTOs are the source of truth:
+
+- `../satellite-provider/SatelliteProvider.Common/DTO/CreateRouteRequest.cs` (top-level)
+- `../satellite-provider/SatelliteProvider.Common/DTO/RoutePoint.cs` (`[JsonPropertyName("lat")] Latitude`, `[JsonPropertyName("lon")] Longitude` — input/output naming asymmetry flagged in AZ-809 AC-10; consume `lat`/`lon` in the JSON)
+- `../satellite-provider/SatelliteProvider.Common/DTO/GeoPoint.cs` (nested geofence point)
+
+Probe 2026-05-22: 2-point route + `requestMaps=true` completes end-to-end in ~15 s.
+
+## Behaviour
+
+- **Pre-emptive validation** against AZ-809 rules — surface as `RouteValidationError` BEFORE HTTP POST:
+  - `points` non-empty AND `len(points) <= 100`
+  - `id` non-zero Guid
+  - `regionSizeMeters > 0` AND `<= 10000`
+  - `zoomLevel` in 15..18 (per AZ-777 Phase 2 bbox config)
+  - Each point's `lat` in -90..90, `lon` in -180..180
+- **Submit** `POST /api/satellite/route` with `requestMaps=true`, `createTilesZip=false`.
+- **Poll** `GET /api/satellite/route/{id}` every `poll_interval_s` up to `poll_max_attempts` until `mapsReady=true` OR terminal failure. Log cadence at INFO.
+- **Return** `RouteSeedResult`; `tile_count` comes from a final `POST /api/satellite/tiles/inventory` that enumerates the route's tile coverage (tile coordinates computed locally from waypoints + `regionSizeMeters`).
+- **Raise** `RouteTerminalFailureError` on terminal failure (`.detail` = SP response JSON).
+- **Raise** `RouteTransientError` on 5xx / network / timeout (`__cause__` = underlying `httpx` exception).
+- **Raise** `RouteValidationError` on 4xx; parse RFC 7807 `errors` dict into `field_errors`.
+
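The pre-emptive validation rules listed above could look roughly like this. It is a sketch only: the payload is modelled as a plain dict using the JSON field names quoted in this spec, and `validate_route_payload` plus the local `RouteValidationError` stand-in (with a `field_errors` attribute) are hypothetical shapes, not the client's actual code.

```python
import uuid


class RouteValidationError(Exception):
    """Stand-in for the spec's RouteValidationError; carries per-field messages."""

    def __init__(self, field_errors: dict[str, list[str]]):
        super().__init__(f"route payload rejected: {field_errors}")
        self.field_errors = field_errors


def _is_number(value) -> bool:
    # bool is a subclass of int in Python; exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_route_payload(payload: dict) -> None:
    """Raise RouteValidationError before any HTTP call if a rule is violated."""
    errors: dict[str, list[str]] = {}

    def fail(field: str, message: str) -> None:
        errors.setdefault(field, []).append(message)

    points = payload.get("points") or []
    if not 1 <= len(points) <= 100:
        fail("points", "must contain between 1 and 100 points")

    route_id = payload.get("id")
    if not isinstance(route_id, uuid.UUID) or route_id.int == 0:
        fail("id", "must be a non-zero Guid")

    size = payload.get("regionSizeMeters")
    if not _is_number(size) or not 0 < size <= 10000:
        fail("regionSizeMeters", "must be > 0 and <= 10000")

    zoom = payload.get("zoomLevel")
    if not isinstance(zoom, int) or isinstance(zoom, bool) or not 15 <= zoom <= 18:
        fail("zoomLevel", "must be in 15..18")

    for i, point in enumerate(points):
        lat, lon = point.get("lat"), point.get("lon")
        if not _is_number(lat) or not -90 <= lat <= 90:
            fail(f"points[{i}].lat", "must be in -90..90")
        if not _is_number(lon) or not -180 <= lon <= 180:
            fail(f"points[{i}].lon", "must be in -180..180")

    if errors:
        raise RouteValidationError(errors)
```

Collecting every violation before raising mirrors the RFC 7807 `errors` dict shape, so local and server-side rejections can be reported the same way.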
+## CLI (`tests/fixtures/derkachi_c6/seed_route.py`)
+
+Mirrors `seed_region.py` (AZ-777 Phase 2):
+
+- Env: `SATELLITE_PROVIDER_URL`, `SATELLITE_PROVIDER_API_KEY`, `SATELLITE_PROVIDER_TLS_INSECURE`, optional `--auto-mint-jwt` (uses `scripts/mint_dev_jwt.py`)
+- Required: `--tlog <path>` (delegates to AZ-836's `extract_route_from_tlog`)
+- Optional: `--max-waypoints` (10), `--region-size-meters` (500), `--name`, `--output-summary <path>`, `--dry-run`
+- Exit codes: 0 success, 71 config malformed, 72 missing env, 73 SP unreachable, 74 4xx, 75 5xx / terminal failure, 76 inventory verification mismatch
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | POSTs wire shape exactly per `CreateRouteRequest.cs` + `RoutePoint.cs` + `GeoPoint.cs` |
+| AC-2 | Polls `GET /api/satellite/route/{id}` until `mapsReady=true` OR terminal failure; respects `poll_max_attempts` + `poll_interval_s` |
+| AC-3 | 4xx + RFC 7807 ProblemDetails → `RouteValidationError`; `field_errors` populated from `errors` dict |
+| AC-4 | 5xx / network / timeout → `RouteTransientError`; `__cause__` = underlying `httpx` exc |
+| AC-5 | Terminal failure → `RouteTerminalFailureError`; `.detail` = SP response JSON |
+| AC-6 | Pre-emptive validation rejects (BEFORE HTTP POST): empty `points`, >100 `points`, missing/zero `id`, missing/zero `regionSizeMeters`, OOR `zoomLevel`, OOR lat/lon |
+| AC-7 | `seed_route.py --dry-run --tlog <derkachi.tlog>`: extracts route, prints planned payload + sha256, exit 0, no HTTP |
+| AC-8 | `seed_route.py --tlog <derkachi.tlog>` against Jetson SP: exit 0, prints `RouteSeedResult`, optional summary JSON |
+| AC-9 | Unit tests (mocked HTTPX): happy path, 400+ProblemDetails, 500 transient, terminal failure, timeout, dry-run, missing env, all pre-emptive validation cases |
+| AC-10 | Integration test gated by `RUN_E2E=1` + `SATELLITE_PROVIDER_URL`: Derkachi route seeded, `tile_count > 0`, `maps_ready=True` |
+
+## Out of scope
+
+- FAISS index from seeded tiles (AZ-835 C3 / C5)
+- C6 cache population (AZ-835 C3 — new `operator_pre_flight_setup` fixture)
+- Modifying satellite-provider source (Route API consumed as-is)
+- Multi-route batching (one RouteSpec → one POST)
+- Authentication beyond existing JWT pattern (AZ-494)
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Sibling: AZ-836 (C1) — RouteSpec source
+- Mirror CLI: `tests/fixtures/derkachi_c6/seed_region.py` (AZ-777 Phase 2)
+- HTTP patterns: `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py` (AZ-316/777 Phase 1)
+- DTOs (in `../satellite-provider/`): `SatelliteProvider.Common/DTO/{CreateRouteRequest,RoutePoint,GeoPoint}.cs`
+- Soft prereq: AZ-809 (Route API validation in satellite-provider)
@@ -0,0 +1,85 @@
+# operator_pre_flight_setup real fixture (AZ-835 C3)
+
+**Task**: AZ-839_operator_pre_flight_setup_real_fixture
+**Name**: operator_pre_flight_setup fixture: wire C1+C2+C11+C10 into real fixture, supersede AZ-777 Phase 3 (AZ-835 C3)
+**Description**: Third building block of Epic AZ-835. Replace the placeholder `operator_pre_flight_setup` fixture (currently a `mkdir` stub at `tests/e2e/replay/conftest.py` lines 293-310) with a real driver that wires C1 (AZ-836) + C2 (AZ-838) + C11 (AZ-777 Phase 1) + C10 to populate C6 from a tlog-derived route. Supersedes AZ-777 Phase 3 (the bbox-seeded placeholder-replacement design) per the 2026-05-22 user directive — route-driven seeding is ~100x more tile-efficient and pre-commits to where the operator did fly per the tlog.
+**Complexity**: 5 SP
+**Dependencies**: AZ-836 (C1, RouteSpec + extractor — In Testing); AZ-838 (C2, SatelliteProviderRouteClient + seed_route.py CLI — In Testing); AZ-777 Phase 1 (e2e-runner ↔ satellite-provider wire + C11 contract adaptation — done, batch 104); AZ-322 (C10 DescriptorBatcher — done); AZ-316+AZ-777 Phase 1 (C11 HttpTileDownloader.download_for_bbox — done); AZ-306 (FAISS sidecar triple-consistency — done); AZ-835 (parent Epic)
+**Component**: `tests/e2e/replay/conftest.py` (`operator_pre_flight_setup` fixture rewrite + new `PopulatedC6Cache` dataclass)
+**Tracker**: AZ-839 (https://denyspopov.atlassian.net/browse/AZ-839)
+**Parent Epic**: AZ-835
+
+Jira AZ-839 is the authoritative spec; this file is the in-workspace mirror.
+
+## Public surface
+
+```python
+from dataclasses import dataclass
+from pathlib import Path
+
+from gps_denied_onboard.replay_input.tlog_route import RouteSpec  # AZ-836
+
+@dataclass(frozen=True, slots=True)
+class PopulatedC6Cache:
+    cache_root: Path                # named-volume mount inside the e2e-runner container
+    tile_store_path: Path           # postgres + filesystem store root
+    faiss_index_path: Path          # .index file
+    faiss_sidecar_sha256_path: Path # .sha256 file
+    faiss_sidecar_meta_path: Path   # .meta.json file
+    route_spec: RouteSpec           # provenance — which tlog/route produced this cache
+    tile_count: int                 # how many tiles ended up in C6
+    elapsed_seconds: float          # wall time, for the AC-1/AC-2 perf budget
+```
+
+The fixture remains a pytest fixture at `tests/e2e/replay/conftest.py::operator_pre_flight_setup`, same `session` scope as today. Input contract unchanged (same args the placeholder takes) plus a new dependency on `RouteSpec` — either fixture-injected or extracted from the test's tlog parameter via `extract_route_from_tlog`.
+
+## Behaviour
+
+1. Read the route spec (fixture-injected or extracted from test tlog via `extract_route_from_tlog`).
+2. Instantiate `SatelliteProviderRouteClient` from env (`SATELLITE_PROVIDER_URL`, `SATELLITE_PROVIDER_API_KEY`, `SATELLITE_PROVIDER_TLS_INSECURE`).
+3. Call `seed_route(route_spec)`. On `RouteValidationError` / `RouteTerminalFailureError` → re-raise with original cause. On `RouteTransientError` → retry up to 3 attempts using C11's `_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`.
+4. Enumerate tile coverage locally (mirror `route_client._enumerate_route_tile_coords` from AZ-838); call C11 `HttpTileDownloader.download_for_bbox` to pull every tile into C6.
+5. Invoke C10 `DescriptorBatcher` against the populated C6 to build the FAISS HNSW index using the NetVLAD backbone (per `c2_vpr/config.py:67` default).
+6. Verify sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency per AZ-306). Mismatch → `IndexUnavailableError`.
+7. Yield `PopulatedC6Cache(...)`. On any failure path, clean up partial cache state (no half-built FAISS index left behind).
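
The retry rule in step 3 can be sketched as below. This is a minimal illustration, not the fixture's implementation: the three exception classes are local stand-ins for the ones named above, and `seed_with_retry`, `MAX_ATTEMPTS`, and the injectable `sleep` parameter are hypothetical names. With 3 attempts, only the first two waits of the quoted schedule are ever used.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


# Local stand-ins for the exception types named in step 3 (assumption:
# the real classes are provided by the AZ-838 route client).
class RouteValidationError(Exception):
    pass


class RouteTerminalFailureError(Exception):
    pass


class RouteTransientError(Exception):
    pass


BACKOFF_SCHEDULE_S: Sequence[float] = (1, 2, 4, 8)  # schedule quoted in step 3
MAX_ATTEMPTS = 3  # hypothetical constant name


def seed_with_retry(seed: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call seed(); retry only on RouteTransientError, at most MAX_ATTEMPTS times."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return seed()
        except RouteTransientError:
            if attempt == MAX_ATTEMPTS:
                raise  # the final attempt's exception is propagated
            sleep(BACKOFF_SCHEDULE_S[attempt - 1])
        # Validation and terminal errors are deliberately not caught,
        # so they propagate immediately with their original traceback.
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps unit tests fast: a stub can record the requested waits instead of blocking.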
+
+**Mount strategy**: write into a named docker volume that survives across pytest sessions. Cold first invocation populates; subsequent invocations within the same compose session reuse (warm cache). Same pattern AZ-777 Phase 3 originally specced; only the cache **source** changes (route, not bbox).
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | Cold first invocation on the Derkachi tlog completes in ≤ 5 min on Tier-2 Jetson (includes satellite-provider Google Maps round-trips). |
+| AC-2 | Warm invocation within the same compose session completes in ≤ 30 s (named-volume reuse). |
+| AC-3 | Yielded `PopulatedC6Cache` has all paths populated; `tile_count > 0`; FAISS sidecar triple-consistency passes (AZ-306). |
+| AC-4 | `RouteValidationError` / `RouteTerminalFailureError` from `seed_route` is re-raised with original cause; no silent swallow. |
+| AC-5 | `RouteTransientError` is retried up to 3 attempts using C11's existing backoff schedule; final attempt's exception is propagated. |
+| AC-6 | Tamper test — corrupt one of the three sidecar files between fixture runs; next invocation raises `IndexUnavailableError`. |
+| AC-7 | On any failure path inside the fixture, partial state is cleaned up (no half-built FAISS index, no orphaned postgres rows). |
+| AC-8 | Unit tests (stubbed `SatelliteProviderRouteClient` + stubbed C11 + stubbed C10) cover: happy path, transient-retry, terminal-failure, validation-error, tamper-detection, cleanup-on-failure. |
+| AC-9 | Integration test gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2` against the Jetson harness produces a real `PopulatedC6Cache` from the Derkachi tlog. |
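
The tamper check in AC-6 can be illustrated with a small sketch. The on-disk layout used here is an assumption, since the AZ-306 contract is not reproduced in this file: the `.sha256` file is taken to hold the hex digest of the `.index` bytes, and the `.meta.json` file is taken to repeat that digest under a hypothetical `index_sha256` key. `IndexUnavailableError` is a local stand-in and `verify_sidecar_triple` is a hypothetical helper name.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(Exception):
    """Local stand-in for the error named in AC-6."""


def verify_sidecar_triple(index_path: Path, sha256_path: Path, meta_path: Path) -> None:
    """Raise IndexUnavailableError unless all three files agree on the index digest."""
    for path in (index_path, sha256_path, meta_path):
        if not path.is_file():
            raise IndexUnavailableError(f"missing sidecar file: {path}")

    actual = hashlib.sha256(index_path.read_bytes()).hexdigest()
    try:
        # Assumed format: first whitespace-separated token is the hex digest.
        tokens = sha256_path.read_text(encoding="utf-8").split()
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except ValueError as exc:  # covers JSON and UTF-8 decoding errors
        raise IndexUnavailableError("unreadable sidecar file") from exc

    recorded = tokens[0].lower() if tokens else ""
    # Assumed key name; the real meta schema may differ.
    meta_digest = meta.get("index_sha256") if isinstance(meta, dict) else None

    if not (actual == recorded == meta_digest):
        raise IndexUnavailableError(
            f"sidecar mismatch for {index_path.name}: "
            f"index={actual[:12]} sha256-file={recorded[:12]} meta={str(meta_digest)[:12]}"
        )
```

Corrupting any one of the three files changes at least one side of the comparison, which is the behaviour the tamper test is meant to exercise.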
+
+## Out of scope
+
+- Driving the airborne replay pipeline against the populated cache (AZ-840 / C4).
+- Un-xfailing the existing AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
+- Updating `replay_protocol.md` (AZ-842 / C6).
+- Switching the C2 default backbone away from NetVLAD.
+- Multi-tlog aggregate caches (one route per fixture invocation).
+
+## Risks
+
+**Risk 1 — Docker named-volume lifecycle across pytest sessions.** A crash during the first invocation may leave a half-populated volume, so the cleanup-on-failure path in step 7 must be robust. Mitigation: AC-7 covers this explicitly, and a `try/finally` wraps the four wiring steps.
+
+**Risk 2 — Cold-start budget (AC-1, 5 min) is tight on the first Jetson run.** Google Maps round-trips for ~50-100 tiles may exceed the budget on slow networks. Mitigation: instrument `elapsed_seconds` on every step and surface it in the verdict report; if AC-1 fails, file a perf-tuning ticket rather than skipping the AC.
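
One way to picture the Risk 1 mitigation is a small context manager that removes partial filesystem artefacts when the wrapped block raises. This is a sketch under assumptions: `cleanup_on_failure` is a hypothetical helper, it uses `try/except` plus re-raise (rather than `try/finally`) so that a successful run keeps its cache, and it only handles files and directories. Orphaned postgres rows (AC-7) would need their own rollback, which is not shown.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def cleanup_on_failure(*partial_paths: Path) -> Iterator[None]:
    """Delete the given paths if the wrapped block raises, then re-raise."""
    try:
        yield
    except BaseException:
        for path in partial_paths:
            if path.is_dir():
                shutil.rmtree(path, ignore_errors=True)
            else:
                path.unlink(missing_ok=True)  # tolerate files never created
        raise
```

Catching `BaseException` means an interrupted run (for example `KeyboardInterrupt`) also triggers the cleanup before the exception continues upward.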
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Existing placeholder: `tests/e2e/replay/conftest.py` lines 293-310
+- C1: AZ-836 (`extract_route_from_tlog`) — https://denyspopov.atlassian.net/browse/AZ-836
+- C2: AZ-838 (`SatelliteProviderRouteClient`) — https://denyspopov.atlassian.net/browse/AZ-838
+- AZ-777 (Phase 3+ superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md`
+- C10 DescriptorBatcher: `src/gps_denied_onboard/components/c10_provisioning/descriptor_batcher.py`
+- C11 HttpTileDownloader: `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`
+- AZ-306 FAISS sidecar triple-consistency reference
@@ -0,0 +1,75 @@
+# E2E orchestrator test (AZ-835 C4)
+
+**Task**: AZ-840_e2e_orchestrator_test
+**Name**: E2E orchestrator test ingesting raw (tlog, video, calibration) and running steps 1-7 (AZ-835 C4)
+**Description**: Fourth building block of Epic AZ-835. A single pytest test that takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on the Jetson harness — without any operator hand-curation between steps. Extends or wraps the existing AZ-699 verdict test (`tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`) so the verdict-report-writing path is preserved. This is the test that closes the Epic's narrative: "give it a tlog + video, and the system does everything else."
+**Complexity**: 3 SP
+**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-836 (C1, RouteSpec — In Testing); AZ-838 (C2, SatelliteProviderRouteClient — In Testing); AZ-699 (real flight validation runner — done); AZ-405 (tlog/video auto-sync — done); AZ-702 (camera factory-sheet calibration — done); AZ-696 (≥ 80 % within 100 m threshold — done); AZ-835 (parent Epic)
+**Component**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (new) OR extend `test_derkachi_real_tlog.py`
+**Tracker**: AZ-840 (https://denyspopov.atlassian.net/browse/AZ-840)
+**Parent Epic**: AZ-835
+
+Jira AZ-840 is the authoritative spec; this file is the in-workspace mirror.
+
+## Inputs (test parameters)
+
+- `tlog_path: Path` — raw ArduPilot tlog binary (Derkachi as the reference fixture; parametrize for future tlogs).
+- `video_path: Path` — raw flight video.
+- `calibration_path: Path` — camera factory-sheet calibration (AZ-702).
+
+## Pipeline orchestration
+
+The 7 steps from the Epic:
+
+1. **Active flight cut + tlog/video sync** — call AZ-405's `tlog_video_adapter`. If active-segment detection needs a small extension, file as an in-scope sub-fix; if it needs a meaningful new feature, STOP and propose a sibling ticket.
+2. **On-fly frame + IMU extraction** — `VideoFileFrameSource` + `TlogReplayFcAdapter`. No change.
+3. **Auto-create route** — call AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)`. Assert the returned `RouteSpec` materially follows the tlog trajectory.
+4. **POST route to satellite-provider** — delegate to AZ-839 (C3) fixture `operator_pre_flight_setup` (which itself calls AZ-838's `SatelliteProviderRouteClient.seed_route`). The fixture's `PopulatedC6Cache` is the dependency boundary.
+5. **Build FAISS index** — driven by C3 fixture as part of populating the cache.
+6. **Run gps-denied airborne pipeline** — invoke the `gps-denied-replay` console-script or equivalent direct-call entry point against the populated cache + tlog/video/calibration. Reuse the airborne composition root path AZ-699 exercises today.
+7. **Get GPS fixes, check vs tlog GPS** — call `helpers/accuracy_report.py` + `helpers/gps_compare.py` to compute the horizontal-error distribution and emit the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
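
The metric behind step 7 and the AZ-696 threshold can be sketched independently of the project helpers. The code below is a standalone illustration, not the API of `helpers/gps_compare.py` or `helpers/accuracy_report.py`; `haversine_m` and `fraction_within` are hypothetical names, and a spherical Earth with a mean radius of 6 371 000 m is assumed.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def fraction_within(pairs: Iterable[Tuple[LatLon, LatLon]],
                    threshold_m: float = 100.0) -> float:
    """Share of (estimated, ground-truth) pairs whose horizontal error is <= threshold_m."""
    errors = [haversine_m(est, truth) for est, truth in pairs]
    if not errors:
        raise ValueError("no fixes to compare")
    return sum(e <= threshold_m for e in errors) / len(errors)
```

A verdict against the threshold would then compare `fraction_within(pairs)` with `0.80`; the report is written either way.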
+
+## Test gating
+
+- `@pytest.mark.tier2`.
+- Skip-unless-env(`RUN_REPLAY_E2E=1`) with an explicit skip reason that names the missing env var — no silent skip.
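
A minimal sketch of the "explicit skip reason" rule follows. The helper name `replay_e2e_skip_reason` and the exact wording of the reason are hypothetical; only the `RUN_REPLAY_E2E=1` convention comes from this spec.

```python
import os
from typing import Mapping, Optional


def replay_e2e_skip_reason(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return None when the gate is open, else a reason that names the env var."""
    if env.get("RUN_REPLAY_E2E") == "1":
        return None
    return "RUN_REPLAY_E2E=1 is not set; skipping Tier-2 replay e2e test"
```

In a test module the result could feed pytest's standard `skipif` marker, for example `pytest.mark.skipif(reason is not None, reason=reason or "")`, so a skipped run always reports which variable was missing.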
+
+## Verdict report
+
+Emit ALWAYS, even on FAIL. The success criterion for AC-2 is that the report exists and the distribution is honest — NOT that the verdict is PASS.
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | Test takes only `(tlog, video, calibration)` and runs steps 1-7 end-to-end on Tier-2 Jetson. No operator hand-curation between steps. |
+| AC-2 | Test produces the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest horizontal-error distribution, REGARDLESS of PASS/FAIL on the AZ-696 AC-3 threshold (≥ 80 % within 100 m). |
+| AC-3 | Test reuses the C3 fixture's `operator_pre_flight_setup` for steps 3-5; no duplicate seeding/downloading logic. |
+| AC-4 | Test runs to completion within 15 min wall time on the Derkachi clip (soft target for first delivery; hard NFR set after first measurement is recorded in the report). |
+| AC-5 | Mid-pipeline failure (e.g. step 4 satellite-provider rejection, step 5 FAISS sidecar mismatch) fails LOUD with a clear error pointing at the failing step. No silent skip past a failing step. |
+| AC-6 | Test is gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`; explicit skip reason names the missing env var. |
+| AC-7 | The existing AZ-699 verdict test continues to pass (this test does not break or supersede it; either it lives alongside, or AZ-699 is folded into this test with the verdict-writing path preserved). |
+| AC-8 | Unit tests cover the orchestration helper layer (parameter validation, error propagation between steps). The end-to-end happy path is the Jetson integration test. |
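
The "fail LOUD, pointing at the failing step" requirement (AC-5) and the error-propagation layer mentioned in AC-8 can be pictured with a thin wrapper. This is a sketch only: `PipelineStepError` and `run_step` are hypothetical names, not part of the existing test helpers.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class PipelineStepError(RuntimeError):
    """Raised when one numbered pipeline step fails; keeps the original cause."""

    def __init__(self, step_no: int, step_name: str) -> None:
        super().__init__(f"step {step_no} ({step_name}) failed")
        self.step_no = step_no
        self.step_name = step_name


def run_step(step_no: int, step_name: str, fn: Callable[[], T]) -> T:
    """Run one step; on failure, re-raise with the step identified."""
    try:
        return fn()
    except Exception as exc:
        # `from exc` preserves the underlying error as __cause__,
        # so the traceback shows both the step and the root failure.
        raise PipelineStepError(step_no, step_name) from exc
```

Because the wrapper never returns a fallback value, a failing step cannot be silently skipped: the test stops at the first `PipelineStepError`.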
+
+## Out of scope
+
+- Un-xfailing the AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
+- Documentation updates beyond the test file's own docstring (AZ-842 / C6).
+- Real-time tlog ingestion (one finished `.tlog` per test invocation).
+- Multi-flight aggregate validation.
+- Performance optimization beyond the AC-4 soft target.
+- Modifying the airborne composition root.
+
+## Risks
+
+**Risk 1 — Integration glue between AZ-405 tlog/video sync and the airborne pipeline's frame-source contract.** The auto-sync adapter and the airborne composition root were authored in different cycles; small impedance mismatches are likely. Mitigation: if the glue exceeds the 3 SP budget, STOP and propose a sub-ticket rather than expanding scope.
+
+**Risk 2 — Step 1 active-segment detection may need extension.** AZ-405 covered tlog↔video sync; take-off/landing boundary detection may not be implemented. Mitigation: file an in-scope sub-fix if small; STOP and propose a sibling ticket if not.
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Hard dep (C3 fixture): AZ-839 — https://denyspopov.atlassian.net/browse/AZ-839
+- Existing verdict test: `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`
+- Tlog/video adapter: `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` (AZ-405)
+- Helpers: `src/gps_denied_onboard/helpers/accuracy_report.py`, `src/gps_denied_onboard/helpers/gps_compare.py`
@@ -0,0 +1,71 @@
+# Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests (AZ-835 C5)
+
+> **Cycle-4 deferral (2026-05-26)**: moved to `backlog/` during cycle-4 Step 9
+> scope review. Blocking issues:
+> - **Conflict with AZ-895 AC-4**: AZ-895 (cycle-4 cleanup) explicitly states
+>   `test_derkachi_real_tlog.py` stays `@xfail` with the AZ-848-scoped reason
+>   in cycle 4. Un-xfailing this test here contradicts AZ-895 and will fail
+>   the Jetson run because AZ-848 (the underlying clock bug) is in backlog/.
+> - **Partial overlap with AZ-894 AC-3**: the other un-xfail target
+>   (`test_derkachi_1min.py::AC3`) is the same test AZ-894 (cycle-4 CSV
+>   adapter) covers under its own AC-3 — re-doing the un-xfail in a
+>   separate ticket duplicates effort.
+> - **Replay condition**: revisit when EITHER (a) AZ-848 is fixed and the
+>   tlog adapter path is restored, OR (b) cycle 4 lands and we rescope this
+>   ticket to only the CSV-path tests AZ-894 doesn't already cover.
+
+**Task**: AZ-841_unxfail_az777_tier2_tests
+**Name**: Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests once C3 fixture + C4 orchestrator land (AZ-835 C5)
+**Description**: Fifth building block of Epic AZ-835. Once C3 (AZ-839, `operator_pre_flight_setup` real fixture) and C4 (AZ-840, e2e orchestrator test) land, remove the `@pytest.mark.xfail` markers from the AZ-777 Tier-2 tests. The verdict — PASS or FAIL — becomes the honest signal. Both tests remain gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
+**Complexity**: 1 SP
+**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-840 (C4, e2e orchestrator test — HARD); AZ-777 (being closed/superseded by this Epic; tests live in same file tree); AZ-835 (parent Epic)
+**Component**: `tests/e2e/replay/test_derkachi_1min.py` (xfail removal) + `tests/e2e/replay/test_derkachi_real_tlog.py` (xfail removal)
+**Tracker**: AZ-841 (https://denyspopov.atlassian.net/browse/AZ-841)
+**Parent Epic**: AZ-835
+
+Jira AZ-841 is the authoritative spec; this file is the in-workspace mirror.
+
+## Targets
+
+1. `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` (AZ-777 AC-4) — remove `@pytest.mark.xfail`; verify `@pytest.mark.tier2` + `RUN_REPLAY_E2E` gating stays in place.
+2. `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` (AZ-777 AC-5) — remove `@pytest.mark.xfail`; verify gating stays in place.
+
+## Verification
+
+**On Tier-2 Jetson** (`RUN_REPLAY_E2E=1`):
+- `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % of ticks within 100 m of ground truth, log lines `replay.satellite_anchor_inserted` visible).
+- `test_az699_real_flight_validation_emits_verdict_and_report` runs to completion within 15 min and emits `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest distribution. PASS preferred but NOT required for AC-4 — emitting the honest report IS the success criterion.
+
+**Locally** (no env):
+- Both tests skip explicitly with a reason naming `RUN_REPLAY_E2E` — they MUST NOT pass as a side effect of being skipped.
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | `@pytest.mark.xfail` removed from both AZ-777 tests. |
+| AC-2 | Both tests still gated by `@pytest.mark.tier2` + skip-unless-env(`RUN_REPLAY_E2E=1`). Skip reason names the missing env. |
+| AC-3 | On Jetson with `RUN_REPLAY_E2E=1`, `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % within 100 m). |
+| AC-4 | On Jetson with `RUN_REPLAY_E2E=1`, `test_az699_real_flight_validation_emits_verdict_and_report` completes within 15 min and emits the verdict report. PASS preferred but not required for AC-4. |
+| AC-5 | If either test FAILS on the metric (e.g. only 60 % within 100 m), the test reports FAIL honestly — no fallback to xfail or skip. Failure mode is a feature, not a bug. |
+| AC-6 | Locally on a machine without `RUN_REPLAY_E2E`, both tests skip with an explicit reason. |
+
+## Out of scope
+
+- Modifying the airborne pipeline to improve metric performance (separate optimization tickets if AC-3 fails).
+- Adding new test cases (this ticket only removes xfail; new cases belong to other tickets).
+- Documentation updates (AZ-842 / C6).
+- Modifying the verdict thresholds (AZ-696).
+
+## Risks
+
+**Risk 1 — Un-xfailed tests may FAIL on the metric.** If horizontal-error distribution comes in worse than the 80 % @ 100 m gate, this test reports FAIL. That outcome is in-scope for AC-5 (report honestly) and out-of-scope for this ticket's fix (file a separate optimization ticket).
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Hard deps: AZ-839 (C3), AZ-840 (C4)
+- Tests: `tests/e2e/replay/test_derkachi_1min.py`, `tests/e2e/replay/test_derkachi_real_tlog.py`
+- AZ-777 spec: `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
+- Threshold spec: AZ-696 (≥ 80 % within 100 m)
+- Verdict writer: `src/gps_denied_onboard/helpers/accuracy_report.py`
@@ -0,0 +1,90 @@
+# Docs: replay_protocol.md + architecture.md + orchestrator-test README (AZ-835 C6)
+
+**Task**: AZ-842_replay_protocol_and_orchestrator_docs
+**Name**: Docs: replay_protocol.md Invariant 12 + AZ-777 Phase 3+ superseded note + orchestrator-test README (AZ-835 C6)
+**Description**: Sixth and final building block of Epic AZ-835. Capture the route-driven flow in the authoritative documents so future implementers, operators, and reviewers understand what changed and why.
+**Complexity**: 3 SP (cycle-4 rescope: was 2 SP)
+**Dependencies**: AZ-894 (CSV adapter — HARD; replay_protocol.md sub-section describes the new single-canonical-clock flow); AZ-895 (auto-sync deprecation — HARD; replay_protocol.md sub-section describes the tlog adapter's new audit-only role); AZ-896 (CSV format docs — SOFT; replay_protocol.md cross-links to the format spec); AZ-777 (closed/superseded by this Epic); AZ-835 (parent Epic)
+**Component**: `_docs/02_document/contracts/replay/replay_protocol.md` + `_docs/02_document/architecture.md` + `tests/e2e/replay/README*.md`
+**Tracker**: AZ-842 (https://denyspopov.atlassian.net/browse/AZ-842)
+**Parent Epic**: AZ-835
+
+Jira AZ-842 is the authoritative spec; this file is the in-workspace mirror.
+
+> **Cycle-4 rescope (2026-05-26)**: dropped the AZ-841 (un-xfail) soft
+> dependency — AZ-841 was deferred to backlog in cycle-4 Step 9 scope
+> review (see `_docs/02_tasks/backlog/AZ-841_unxfail_az777_tier2_tests.md`).
+> Expanded scope from "AZ-835 epic docs only" to also cover the cycle-4
+> replay-input redesign narrative: AZ-894 (CSV-driven single-canonical-clock
+> adapter), AZ-895 (tlog adapter → audit-only after auto-sync deprecation),
+> AZ-896 (CSV format spec). The replay_protocol.md edits now describe BOTH
+> the route-driven AZ-835 flow AND the cycle-4 CSV-driven replay path,
+> which together supersede the legacy tlog+auto-sync surface.
+> Complexity bumped 2 → 3 SP to cover the added cycle-4 narrative.
+
+## Modified files
+
+### 1. `_docs/02_document/contracts/replay/replay_protocol.md` — Invariant 12 extension + Invariant 13 (NEW, cycle-4)
+
+**1a. Invariant 12 — route-driven flow (AZ-835)**
+
+Extend **Invariant 12** with an AZ-835 sub-section describing:
+
+- The route-driven `operator_pre_flight_setup` fixture (AZ-839 / C3) flow: tlog → `RouteSpec` → `POST /api/satellite/route` → tile download → FAISS build → yield `PopulatedC6Cache`.
+- Why route-driven supersedes the AZ-777 bbox approach (efficiency: ~100× fewer tiles; honesty: pre-commits to where the operator did fly).
+- The C3 fixture's failure-handling contract (validation/terminal → re-raise; transient → retry up to 3 attempts using C11's existing backoff schedule).
+
+**1b. Invariant 13 — single canonical clock (cycle-4, AZ-894 / AZ-895 / AZ-896)**
+
+Add a new **Invariant 13** sub-section describing:
+
+- The single-clock model production uses (single edge device, single clock at receipt) and why two-clock surfaces (e.g. `VioOutput.emitted_at_ns` from Jetson monotonic vs. `ImuWindow.ts_end_ns` from FC-boot) produce ESKF out-of-order regressions like AZ-848.
+- The CSV-driven replay path (AZ-894) — `(video, CSV)` operator input, IMU + GPS-ground-truth on a single canonical clock derived from the CSV's `Time` column, no auto-sync.
+- The CSV schema (delegate to `_docs/02_document/contracts/replay/csv_replay_format.md` produced by AZ-896 for the field-level spec).
+- The tlog-replay adapter's new audit-only role (AZ-895): retained for FDR analysis and one-shot tlog→CSV export, removed from the test/demo critical path.
+- Auto-sync deprecation (AZ-895): `--time-offset-ms` / `--skip-auto-sync-validation` CLI flags removed or marked deprecated with one-cycle warning.
+
+### 2. `_docs/02_document/architecture.md` — satellite-provider entry extension
+
+Append a sub-section to the existing satellite-provider entry noting that Epic AZ-835 + its C1-C5 children landed the full e2e real-flight validation path on top of AZ-777 Phase 1's wire + C11 contract adaptation. Mark AZ-777 Phase 3+ as superseded by Epic AZ-835 (pointer-only — the AZ-777 spec itself is updated in C5's wake during the AZ-777 closure step).
+
+### 3. `tests/e2e/replay/README*.md` — orchestrator-test README
+
+Either extend `tests/e2e/replay/README.md` or create a dedicated `tests/e2e/replay/README_AZ835.md` (prefer dedicated file if the existing README is already long). Short operator-facing content:
+
+- How to run the new orchestrator test locally (env vars, Jetson SSH alias, expected runtime).
+- What `(tlog, video, calibration)` triple to provide and where the reference Derkachi fixture lives.
+- Where the verdict report is written and how to interpret it (PASS/FAIL on AZ-696 AC-3 threshold).
+- Imagery-source caveat: Google Maps satellite (dev/research use only; production needs CC-BY migration on the satellite-provider side).
+
+## Acceptance criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | `replay_protocol.md` Invariant 12 has a new AZ-835 sub-section covering the route-driven flow, the bbox-supersedure rationale, and the failure-handling contract. |
+| AC-1b | `replay_protocol.md` has a new Invariant 13 (cycle-4) sub-section covering the single-canonical-clock model, the CSV-driven replay path (AZ-894), the tlog adapter's audit-only role (AZ-895), and auto-sync deprecation. Links to `csv_replay_format.md` (AZ-896). |
+| AC-2 | `architecture.md` satellite-provider entry has a sub-section noting Epic AZ-835's contribution and pointing at AZ-777 Phase 3+ as superseded. |
+| AC-2b | `architecture.md` replay-input section explains the cycle-4 redesign: CSV adapter primary path, tlog adapter audit-only role, removal of auto-sync. References AZ-894 / AZ-895 / AZ-896 / AZ-897. |
+| AC-3 | `tests/e2e/replay/README*.md` exists and a new contributor can run the orchestrator test on Jetson using only the README's instructions (no out-of-band knowledge required). |
+| AC-4 | All three docs link to the Epic (AZ-835), its children (AZ-836 / AZ-838 / AZ-839 / AZ-840), and the cycle-4 redesign tickets (AZ-894 / AZ-895 / AZ-896 / AZ-897). AZ-841 reference omitted (deferred to backlog). |
+| AC-5 | License attribution string ("Imagery © Google") and the dev-only caveat are present in the test README. |
+| AC-6 | Cross-references in `_docs/02_tasks/_dependencies_table.md` and `_docs/02_tasks/done/AZ-777*.md` (once moved) point at this Epic / its children and at the cycle-4 redesign tickets. |
+
+## Out of scope
+
+- Updating consumer-facing API/contract docs in `../satellite-provider/` (parent-suite owns those).
+- Migrating imagery source to a CC-BY provider (parent-suite, out of scope for this Epic).
+- Writing additional tutorials beyond the orchestrator-test README.
+- ADR creation — no new architectural decision; this Epic implements existing decisions.
+
+## Risks
+
+**Risk 1 — Scope creep into reformatting unrelated doc sections.** Resist; this ticket only adds what AC-1..AC-6 (including AC-1b and AC-2b) require.
+
+## References
+
+- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
+- Replay protocol: `_docs/02_document/contracts/replay/replay_protocol.md` Invariant 12
+- Architecture: `_docs/02_document/architecture.md` (satellite-provider section)
+- Tests directory: `tests/e2e/replay/`
+- AZ-777 spec (being superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
@@ -0,0 +1,69 @@
+# Relocate RouteSpec DTO to _types/route.py (AZ-507 rule 9 fix)
+
+**Task**: AZ-845_refactor_relocate_routespec
+**Name**: Relocate `RouteSpec` from `replay_input/tlog_route.py` to `_types/route.py`
+**Description**: Resolve cycle-3 cumulative review F1 (High Architecture). Move the `RouteSpec` cross-component DTO to `_types/route.py` so the `c11_tile_manager.route_client` import becomes rule-9 compliant. Producer-side keeps backward-compat re-export so test imports do not break.
+**Complexity**: 2 SP
+**Dependencies**: None (anchor task of refactor run 02-az507-routespec-relocation)
+**Component**: `_types/` (new file `route.py`); `replay_input/` (`tlog_route.py`, `__init__.py` modify); `components/c11_tile_manager/` (`route_client.py` modify)
+**Tracker**: AZ-845 (https://denyspopov.atlassian.net/browse/AZ-845)
+**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
+
+Jira AZ-845 is the authoritative spec; this file is the in-workspace mirror.
+
+## Problem
+
+`src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` imports `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route`, violating `module-layout.md` rule 9 (AZ-507 cross-component contract surface). Per the rule, `components/<X>/*.py` may only import from `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only), and its own subpackage. `replay_input` is not in this allow-list. Every other cross-component DTO already lives under `_types/*` (`_types/geo.py`, `_types/tile.py`, `_types/inference.py`, etc.); `RouteSpec` is the asymmetric outlier.
+
+## Outcome
+
+- `RouteSpec` is defined in `src/gps_denied_onboard/_types/route.py` (frozen+slots dataclass; full docstring carried over verbatim).
+- `c11_tile_manager/route_client.py:56` imports `RouteSpec` from `gps_denied_onboard._types.route`.
+- `replay_input/tlog_route.py` continues to use `RouteSpec` internally (extractor return type) by importing from `_types.route`; keeps `RouteSpec` in `__all__` for backward-compat re-export.
+- `replay_input/__init__.py` re-exports `RouteSpec` from `_types.route` directly.
+- All existing tests pass at HEAD.
+- Rule-9 audit reports zero violations after the move.
+
+## Scope
+
+### Included
+
+- New file: `src/gps_denied_onboard/_types/route.py` with `RouteSpec` dataclass.
+- Modify `src/gps_denied_onboard/replay_input/tlog_route.py` (remove local definition, add import).
+- Modify `src/gps_denied_onboard/replay_input/__init__.py` (re-export from `_types.route`).
+- Modify `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` (the rule-9 fix) plus the docstring snippet at file-top that names the source module.
+- Optional hygiene: update 5 test files that import `RouteSpec` from `replay_input.tlog_route` directly (`tests/unit/replay_input/test_tlog_route.py`, `tests/unit/c11_tile_manager/test_route_client.py`, `tests/e2e/replay/_operator_pre_flight.py`, `tests/e2e/replay/test_e2e_orchestrator_unit.py`, `tests/e2e/replay/test_operator_pre_flight_driver.py`) to import from `_types.route` for symmetry.
+
+### Excluded
+
+- `RouteExtractionError` does NOT relocate — it is a `replay_input/`-specific error not imported by any `components/<X>/*.py` file.
+- `extract_route_from_tlog` does NOT relocate — extraction logic is a `replay_input/` concern; only the DTO moves.
+- No contract document at `_docs/02_document/contracts/shared_types/route.md`.
+- No behaviour, performance, or contract-shape changes.
+
+## Acceptance Criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | `_types/route.py` contains `RouteSpec` with `@dataclass(frozen=True, slots=True)`, identical fields to the original (`waypoints`, `suggested_region_size_meters`, `source_tlog`, `source_segment`, `total_distance_meters`), and the full original docstring. |
+| AC-2 | `route_client.py:56` reads `from gps_denied_onboard._types.route import RouteSpec`; rule-9 audit reports zero violations across `components/**/*.py`. |
+| AC-3 | `replay_input/tlog_route.py` imports `RouteSpec` from `_types.route`; `extract_route_from_tlog` returns `RouteSpec`; `RouteSpec` is in `__all__` so `from replay_input.tlog_route import RouteSpec` resolves via re-export. |
+| AC-4 | `from gps_denied_onboard.replay_input import RouteSpec` resolves to the same class object as `_types.route.RouteSpec` (verified by `is` identity check in test). |
+| AC-5 | `pytest tests/unit/replay_input/test_tlog_route.py tests/unit/c11_tile_manager/test_route_client.py` passes — no failures, no skipped tests beyond pre-existing skips. |
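
The identity check in AC-4 relies on a re-export binding the same class object under a second module path. The sketch below demonstrates that mechanism with two throwaway in-memory modules; the `demo_*` module names are hypothetical, and the dataclass is a reduced stand-in (the real `RouteSpec` has five fields and also sets `slots=True`).

```python
import sys
import types
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteSpec:
    """Reduced stand-in for the real DTO."""
    waypoints: tuple


# "New home" module: plays the role of _types/route.py.
demo_types_route = types.ModuleType("demo_types_route")
demo_types_route.RouteSpec = RouteSpec
sys.modules["demo_types_route"] = demo_types_route

# "Old home" module: plays the role of replay_input/tlog_route.py,
# which now imports the class and lists it in __all__.
demo_tlog_route = types.ModuleType("demo_tlog_route")
exec(
    "from demo_types_route import RouteSpec\n__all__ = ['RouteSpec']\n",
    demo_tlog_route.__dict__,
)
sys.modules["demo_tlog_route"] = demo_tlog_route

from demo_tlog_route import RouteSpec as via_old_path
from demo_types_route import RouteSpec as via_new_path

# Both import paths resolve to one class object, so isinstance checks
# and equality behave the same regardless of which path a caller used.
same_object = via_old_path is via_new_path
```

The same `is` comparison against the real modules is what the AC-4 test would assert.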
+
+## Constraints
+
+- `RouteSpec` MUST remain `frozen=True, slots=True` (AZ-355 AC-2).
+- `RouteSpec.__module__` MAY change to `gps_denied_onboard._types.route` (intended observable change; no test asserts on it).
+- `from gps_denied_onboard.replay_input import RouteSpec` MUST keep working.
+
+## Risks & Mitigation
+
+**Risk 1 — pickle / serialization break**: a grep confirms that no `pickle.dumps(route)` call exists in `src/` or `tests/`, so the risk does not materialize.
+
+**Risk 2 — hidden import that grep missed**: the producer side keeps `RouteSpec` in its namespace via re-import + `__all__`, so lazy importers using the old path still resolve correctly.
+
+## Implementation Notes
+
+- This is the anchor of refactor run 02-az507-routespec-relocation. AZ-846 (module-layout refresh) and AZ-847 (lint widening) are blocked by this task (Jira "Blocks" links recorded).
+- After this task lands, run the rule-9 audit script (the widened lint from AZ-847 once it lands) to confirm zero violations.
@@ -0,0 +1,60 @@
+# Refresh module-layout.md cycle-3 entries (c11 + replay_input + _types/route)
+
+**Task**: AZ-846_refactor_module_layout_cycle3
+**Name**: Refresh `module-layout.md` for cycle-3 file additions and the new `_types/route.py`
+**Description**: Resolve cycle-3 cumulative review F2 (Medium Architecture). Update the c11_tile_manager Internal list, the shared/replay_input file list, and the `_types/` section in `module-layout.md` so they match on-disk reality. Cycle-2 carry-overs OUTSIDE these three sections are explicitly out of scope (deferred to a separate doc task).
+**Complexity**: 2 SP
+**Dependencies**: AZ-845 (the new `_types/route.py` file must exist before this task can register it)
+**Component**: `_docs/02_document/module-layout.md` (single file)
+**Tracker**: AZ-846 (https://denyspopov.atlassian.net/browse/AZ-846)
+**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
+
+Jira AZ-846 is the authoritative spec; this file is the in-workspace mirror.
+
+## Problem
+
+`module-layout.md` is stale. Cycle-3 cumulative review F2 documents:
+
+- **c11_tile_manager Internal list** lists 2 files (`satellite_provider_downloader.py`, `satellite_provider_uploader.py`); on-disk has 8 internal files plus `route_client.py` (cycle-3 NEW from batch 107).
+- **shared/replay_input file list** is missing `errors.py` (cycle-2 carry), `tlog_ground_truth.py` (cycle-2 carry), `tlog_route.py` (cycle-3 NEW from batch 106).
+- **`_types/` file list** does not yet include `route.py` (added by AZ-845).
+
+`/implement` Step 4 (File Ownership) treats `module-layout.md` as authoritative; staleness BLOCKS any future task touching unregistered areas. F2 is currently Medium; severity escalates to High if a fourth consecutive cycle leaves it stale.
+
+## Outcome
+
+- c11_tile_manager Internal list registers all 8 internals + `route_client.py`.
+- shared/replay_input file list registers `errors.py`, `tlog_ground_truth.py`, `tlog_route.py`.
+- `_types/` section registers `route.py` with a one-line description matching the convention of other `_types/*.py` entries.
+- `git diff` shows additions only to those three sections — no other section, rule, or rule-text edit.
+
+## Scope
+
+### Included
+
+- Append cycle-3 + relevant cycle-2-carry entries to the c11_tile_manager Internal list, the shared/replay_input file list, and the `_types/` section.
+
+### Excluded
+
+- **Cycle-2 carry-overs OUTSIDE these sections**: `replay_api/` Per-Component Mapping entry, `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`. These are recorded in the cycle-3 retrospective and require a separate follow-up doc task with its own AZ ID.
+- No code changes.
+- No changes to `module-layout.md` rule numbering or rule text. Only the per-section file inventories are updated.
+
+## Acceptance Criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | c11_tile_manager Internal list contains all 8 existing internals (`_types.py`, `config.py`, `errors.py`, `idempotent_retry.py`, `signing_key.py`, `tile_downloader.py`, `tile_uploader.py`) plus `route_client.py`, alphabetised. |
+| AC-2 | shared/replay_input file list adds `errors.py`, `tlog_ground_truth.py`, `tlog_route.py` with one-line descriptions matching the existing convention. |
+| AC-3 | `_types/` section includes `route.py` with a one-line description of `RouteSpec` (waypoints + region size + source tlog provenance), identifying its producer (`replay_input/tlog_route.py`) and consumer (`c11/route_client.py`). |
+| AC-4 | Diff of `module-layout.md` shows edits to ONLY the three named sections; no edits to other sections, rule numbering, or rule text. |
+
+## Constraints
+
+- Single file modified: `_docs/02_document/module-layout.md`.
+- No tests required — documentation update.
+- Scope discipline: cycle-2 doc carry-overs outside the three sections remain deferred.
+
+## Risks & Mitigation
+
+**Risk 1 — scope creep into cycle-2 carry-overs**: the Excluded list is explicit; Phase-4 implementer reviews the diff against ACs and rejects entries outside the three named sections at review.
@@ -0,0 +1,61 @@
+# Widen test_az270_compose_root lint to enforce full rule-9 allow-list
+
+**Task**: AZ-847_refactor_az270_lint_widening
+**Name**: Widen `test_ac6_only_compose_root_imports_concrete_strategies` to enforce the full rule-9 allow-list
+**Description**: Resolve cycle-3 cumulative review F3 (Medium Maintainability). Replace the AZ-270 lint's narrow `components → components` check with a full rule-9 allow-list check, so any future cross-component drift is caught at lint time rather than at cumulative-review time. Strict superset of the existing AC-6 check.
+**Complexity**: 2 SP
+**Dependencies**: AZ-845 (the widened lint must see a clean codebase to pass; running it against pre-AZ-845 HEAD is what AC-4 demonstrates as a one-time verification)
+**Component**: `tests/unit/test_az270_compose_root.py` (single file)
+**Tracker**: AZ-847 (https://denyspopov.atlassian.net/browse/AZ-847)
+**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
+
+Jira AZ-847 is the authoritative spec; this file is the in-workspace mirror.
+
+## Problem
+
+`tests/unit/test_az270_compose_root.py:194-219` (`test_ac6_only_compose_root_imports_concrete_strategies`) walks `src/gps_denied_onboard/components/**/*.py` and flags only edges whose `node.module` starts with `gps_denied_onboard.components.` AND whose leaf-component is not the importer's component. The full rule-9 allow-list (8 prefixes plus `frame_source` interface-only restriction) is NOT enforced. Imports from `replay_input`, `replay_api`, `runtime_root`, `cli/*`, and `frame_source` non-interface modules pass silently. F1 of the cycle-3 cumulative review (the c11 → replay_input edge) is the concrete consequence.
+
+`module-layout.md` rule 9 documents this lint as the enforcement mechanism for the rule. Reviewers reasonably assume the lint covers the documented allow-list; in practice it covers only one of the eight prefixes. The asymmetry is the F3 finding.
+
+## Outcome
+
+- `test_ac6_only_compose_root_imports_concrete_strategies` enforces the full rule-9 allow-list: a `components/<X>/*.py` ImportFrom node is allowed iff the imported module matches one of: `gps_denied_onboard.components.<X>.*` (own component), `gps_denied_onboard._types.*`, `gps_denied_onboard._types.inference_errors`, `gps_denied_onboard.helpers.*`, `gps_denied_onboard.config`, `gps_denied_onboard.logging`, `gps_denied_onboard.fdr_client`, `gps_denied_onboard.clock`, `gps_denied_onboard.frame_source` (interface-only — see Constraints).
+- The widened lint is a strict superset of the existing AC-6 narrow check.
+- After AZ-845 lands, the widened lint reports zero violations.
+- The test docstring cites `module-layout.md` rule 9, not just AZ-270 AC-6.
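The allow-list in the Outcome above can be sketched as a small predicate plus an AST scan. This is a minimal illustration, not the lint itself: the names `ALLOWED_PREFIXES`, `ALLOWED_EXACT`, `is_allowed`, and `violations` are hypothetical, entries written without `.*` are read as exact-match modules, and `frame_source` is kept exact-match as a stand-in for the interface-only restriction.

```python
import ast

ROOT = "gps_denied_onboard"
# Entries listed with ".*" in the Outcome: the package and any submodule are allowed.
ALLOWED_PREFIXES = (f"{ROOT}._types", f"{ROOT}.helpers")
# Entries listed without ".*": treated as exact matches (assumption).
# frame_source is exact-match here as a stand-in for "interface-only".
ALLOWED_EXACT = frozenset({
    f"{ROOT}.config", f"{ROOT}.logging", f"{ROOT}.fdr_client",
    f"{ROOT}.clock", f"{ROOT}.frame_source",
})


def _under(module, prefix):
    """True if module is the prefix package itself or one of its submodules."""
    return module == prefix or module.startswith(prefix + ".")


def is_allowed(importer_component, module):
    if not _under(module, ROOT):
        return True  # stdlib / third-party imports are outside rule 9
    if _under(module, f"{ROOT}.components.{importer_component}"):
        return True  # own component
    if module in ALLOWED_EXACT:
        return True
    return any(_under(module, prefix) for prefix in ALLOWED_PREFIXES)


def violations(importer_component, source):
    """Return the modules of absolute ImportFrom nodes that break the allow-list."""
    return [
        node.module
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ImportFrom)
        and node.level == 0
        and node.module
        and not is_allowed(importer_component, node.module)
    ]
```

Keeping both constants at module scope gives the test and its docstring one place to read the allow-list from, which is the "single source of truth" the Scope section asks for.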
+
+## Scope
+
+### Included
+
+- Modify `tests/unit/test_az270_compose_root.py` — the `test_ac6_*` test and its docstring.
+- Add a small allow-list constant at module scope (single source of truth).
+- Verify by `pytest tests/unit/test_az270_compose_root.py` after AZ-845 lands.
+
+### Excluded
+
+- Changes to other tests in the same file.
+- Changes to production code.
+- Full AST-level `frame_source` interface-only enforcement. If disambiguating interface from non-interface modules within `frame_source/*` at AST level is not feasible, allow-list only the explicit interface module path, reject all other `frame_source.*` paths, and document the choice in the test docstring.
+
+## Acceptance Criteria
+
+| # | Criterion |
+|---|-----------|
+| AC-1 | The lint flags any ImportFrom in `components/**/*.py` whose `module` starts with `gps_denied_onboard.` and is NOT in the rule-9 allow-list. |
+| AC-2 | Strict superset of the existing AC-6 narrow check — every cross-component edge previously flagged is still flagged. |
+| AC-3 | After AZ-845 lands, the widened lint reports zero violations. |
+| AC-4 | Against the codebase BEFORE AZ-845 (verified during implementation by running the new lint on a temp checkout of pre-relocation HEAD), the lint produces a failure naming the c11 → replay_input edge and citing rule 9. |
+| AC-5 | The test docstring cites `module-layout.md` rule 9 (AZ-507 cross-component contract surface) and lists the allow-list. |
+
+## Constraints
+
+- `frame_source` interface-only requirement: if AST-level disambiguation is not feasible, allow-list only the explicit interface module path. Document the chosen disambiguation strategy in the test docstring. Surface to user if the documented intent and codebase reality disagree.
+- The existing test name MAY remain (preserves AZ-270 audit trail) or be renamed; if renamed, update `module-layout.md` rule 9's enforcement-citation.
+- Single file modified: `tests/unit/test_az270_compose_root.py`. No production source change.
+
+## Risks & Mitigation
+
+**Risk 1 — widening exposes another rule-9 violation**: STOP-and-surface protocol. The implement skill MUST stop and present the additional violation as a scope-decision Choose to the user, NOT auto-bundle into this task. Remediation of any newly-exposed violation is a separate AZ ticket.
+
+**Risk 2 — false positive on `gps_denied_onboard.frame_source` non-interface module**: documented disambiguation strategy in the test docstring. If wrong, the failure surfaces as a deterministic test failure, not silent drift; surface to user.
@@ -0,0 +1,53 @@
+# Replay: CSV-driven IMU+GPS adapter using single canonical clock
+
+**Task**: AZ-894_csv_driven_replay_adapter
+**Name**: Add a CSV-replay adapter that consumes the Derkachi-schema `data_imu.csv` (or any flight that ships with a paired CSV) and exposes IMU + GPS-ground-truth on a single canonical clock derived from the CSV's `Time` column
+**Description**: Cycle 3 surfaced AZ-848 (eskf_out_of_order on frame 3) because the current replay pipeline mixes two incompatible clocks: `VioOutput.emitted_at_ns` uses Jetson process-monotonic time, while `ImuWindow.ts_end_ns` uses FC-boot-relative time (parsed from MAVLink tlog messages). Production runs on a single clock (single edge device, timestamps taken at receipt); replay today does not. The Derkachi fixture's `data_imu.csv` already contains both IMU (`SCALED_IMU2.*`) and GPS ground truth (`GLOBAL_POSITION_INT.*`) on a single canonical clock (the `Time` column, 0..489.9 s at 10 Hz, aligned 3:1 with the 30 fps video). Using the CSV directly eliminates the clock-mismatch surface entirely for the test/demo path and matches the production single-clock model.
+
+**Complexity**: 3 SP
+**Dependencies**: AZ-896 (format docs land in the same cycle but can land in either order)
+**Blocks**: AZ-895 (auto-sync deprecation), AZ-897 (replay UI)
+**Component**: replay_input (new adapter), c8_fc_adapter (alternate ground-truth source), cli/replay
+**Tracker**: AZ-894 (https://denyspopov.atlassian.net/browse/AZ-894)
+**Parent Epic**: (none — cycle-4 replay-input redesign)
+
+## Schema
+
+The Derkachi CSV header (19 columns):
+
+```
+timestamp(ms), Time,
+SCALED_IMU2.xacc, SCALED_IMU2.yacc, SCALED_IMU2.zacc,
+SCALED_IMU2.xgyro, SCALED_IMU2.ygyro, SCALED_IMU2.zgyro,
+SCALED_IMU2.xmag, SCALED_IMU2.ymag, SCALED_IMU2.zmag,
+GLOBAL_POSITION_INT.lat, GLOBAL_POSITION_INT.lon, GLOBAL_POSITION_INT.alt,
+GLOBAL_POSITION_INT.relative_alt,
+GLOBAL_POSITION_INT.vx, GLOBAL_POSITION_INT.vy, GLOBAL_POSITION_INT.vz,
+GLOBAL_POSITION_INT.hdg
+```
+
+- `timestamp(ms)`: FC-boot-relative milliseconds (kept for traceability; not used by C5)
+- `Time`: flight-relative seconds (canonical clock — what C5 actually uses)
+- `SCALED_IMU2.*`: 10 Hz IMU stream (accel mg, gyro mrad/s, mag mGauss per ArduPilot convention)
+- `GLOBAL_POSITION_INT.*`: 10 Hz GPS ground truth (lat/lon in 1e-7 deg, alt in mm, vx/vy/vz in cm/s, hdg in cdeg)
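The unit notes above translate into a handful of scale factors. The sketch below is illustrative only: `to_si` and its output keys are hypothetical names, the input is assumed to be a mapping from column name to an already-parsed number, and standard gravity is used to convert mg to m/s².

```python
G = 9.80665  # standard gravity in m/s^2, used to convert mg to m/s^2


def to_si(row):
    """Convert one row of raw column values to SI-style units."""
    imu = "SCALED_IMU2."
    gps = "GLOBAL_POSITION_INT."
    return {
        "t_s": row["Time"],                                              # canonical clock, seconds
        "acc_mps2": tuple(row[f"{imu}{a}acc"] * 1e-3 * G for a in "xyz"),   # mg -> m/s^2
        "gyro_radps": tuple(row[f"{imu}{a}gyro"] * 1e-3 for a in "xyz"),    # mrad/s -> rad/s
        "lat_deg": row[f"{gps}lat"] * 1e-7,                              # 1e-7 deg -> deg
        "lon_deg": row[f"{gps}lon"] * 1e-7,
        "alt_m": row[f"{gps}alt"] * 1e-3,                                # mm -> m
        "vel_mps": tuple(row[f"{gps}v{a}"] * 1e-2 for a in "xyz"),          # cm/s -> m/s
        "hdg_deg": row[f"{gps}hdg"] * 1e-2,                              # cdeg -> deg
    }
```

The `timestamp(ms)` column is deliberately not read, matching the note that it is kept for traceability only.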
+
+## Acceptance Criteria
+
+- **AC-1**: Adapter parses the Derkachi `data_imu.csv` end-to-end and emits 4,899 IMU samples + 4,899 GPS-ground-truth samples on a single monotonic clock anchored at row 0.
+- **AC-2**: Wired into `cli/replay.py`; `gps-denied-replay --video flight_derkachi.mp4 --imu data_imu.csv` runs without invoking `tlog_replay_adapter.py`.
+- **AC-3**: `test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match` passes on the Jetson e2e harness using the new path. AZ-848 cascade no longer triggers (no two-clock surface in the new path).
+- **AC-4**: `VioOutput.emitted_at_ns` is populated from the CSV's `Time` column (or the frame-derived `t = N/fps`), not `time.monotonic_ns()`, when the new adapter is in use.
+- **AC-5**: Schema mismatch (missing required column, NaN in `Time`, non-monotonic `Time`) raises a clear `ReplayInputAdapterError` at startup, not deep in the loop.
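AC-1, AC-4, and AC-5 together describe a fail-fast parse onto one clock. The following is a minimal sketch under stated assumptions: `load_clock_ns` and `frame_ts_ns` are hypothetical helpers, `ReplayInputAdapterError` is a local stand-in for the project's error type, "monotonic" is read as strictly increasing, and the clock is anchored by subtracting row 0's `Time`.

```python
import csv
import io
import math


class ReplayInputAdapterError(Exception):
    """Local stand-in for the project's error type of the same name."""


def load_clock_ns(text, extra_required=()):
    """Validate the CSV up front and return per-row timestamps in ns, anchored at row 0."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader, [])]
    missing = [name for name in ("Time", *extra_required) if name not in header]
    if missing:
        raise ReplayInputAdapterError(f"missing required column(s): {missing}")
    time_idx = header.index("Time")
    stamps, t0 = [], None
    for line_no, row in enumerate(reader, start=2):
        try:
            t = float(row[time_idx])
        except (ValueError, IndexError):
            raise ReplayInputAdapterError(f"line {line_no}: unreadable Time value") from None
        if math.isnan(t):
            raise ReplayInputAdapterError(f"line {line_no}: NaN in Time")
        if t0 is None:
            t0 = t
        ts = round((t - t0) * 1e9)
        if stamps and ts <= stamps[-1]:
            raise ReplayInputAdapterError(f"line {line_no}: non-monotonic Time")
        stamps.append(ts)
    return stamps


def frame_ts_ns(frame_index, fps):
    """Frame-derived timestamp t = N / fps on the same clock."""
    return round(frame_index * 1e9 / fps)
```

With a 10 Hz CSV and 30 fps video, `frame_ts_ns(3, 30)` lands on the same nanosecond value as CSV row 1, which is the 3:1 alignment the Description relies on.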
+
+## Out of scope
+
+- The structural AZ-848 / AZ-883 fix in the tlog adapter — those stay open as backlog.
+- UI for picking the CSV — AZ-897.
+- Other CSV schemas (PX4, generic MAVLink dumps) — future enhancement if needed.
+
+## References
+
+- Cycle-3 retro: `_docs/06_metrics/retro_2026-05-26.md`
+- Bench-run evidence: `_docs/04_release/release_cycle3_jetson-bench_2026-05-26-1442.md`
+- Companion tickets: AZ-895 (deprecate auto-sync), AZ-896 (format docs + example CSV), AZ-897 (replay UI)
+- Supersedes (re bench-blocking): AZ-848 (VioOutput contract), AZ-883 (SCALED_IMU2 ts_ns=0)
@@ -0,0 +1,39 @@
+# Replay: deprecate auto_sync surface; tlog adapter → audit-only
+
+**Task**: AZ-895_deprecate_auto_sync_surface
+**Name**: Remove the tlog+video auto-sync infrastructure and reframe `tlog_replay_adapter.py` as audit-only, now that AZ-894 ships the CSV-driven primary path
+**Description**: User decision (2026-05-26): the test/demo replay path will accept a paired (video, CSV) input from the operator instead of auto-syncing a tlog and video. Auto-sync is unnecessary in production (single edge device, single clock by design) and over-engineered for test (the CSV already encodes the alignment).
+
+**Complexity**: 2 SP
+**Dependencies**: AZ-894 (must ship first — the CSV adapter is the replacement)
+**Component**: replay_input (auto_sync.py, tlog_video_adapter.py), cli/replay, runtime_root/_replay_branch
+**Tracker**: AZ-895 (https://denyspopov.atlassian.net/browse/AZ-895)
+**Parent Epic**: (none — cycle-4 replay-input redesign)
+
+## Touch list
+
+- `src/gps_denied_onboard/replay_input/auto_sync.py` — delete or convert to a clear no-op that raises `ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")`
+- `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` — strip auto-sync invocations
+- `src/gps_denied_onboard/cli/replay.py` — remove `--time-offset-ms` / `--skip-auto-sync-validation` flags (or mark deprecated with one-cycle warning)
+- `src/gps_denied_onboard/runtime_root/_replay_branch.py` — strip auto-sync wiring
+- `tests/unit/replay_input/test_az405_auto_sync.py` — pass against the new behaviour or delete with rationale recorded in the batch report
+- `tests/e2e/replay/test_derkachi_real_tlog.py` — continues to `@xfail` with the AZ-848-scoped reason; nothing in this ticket fixes the underlying tlog-clock bug
+- `tlog_replay_adapter.py` / `tlog_ground_truth.py` — module docstrings updated to call out the new audit-only / one-shot-export roles
+
+## Acceptance Criteria
+
+- **AC-1**: `auto_sync.py` is either deleted or made into a clear no-op that raises `ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")`.
+- **AC-2**: All references to `--time-offset-ms` / `--skip-auto-sync-validation` flags in the CLI are removed or marked deprecated with a one-cycle deprecation warning.
+- **AC-3**: `test_az405_auto_sync` tests either pass against the new behaviour or are deleted with rationale recorded in the batch report.
+- **AC-4**: `test_derkachi_real_tlog.py` continues to `@xfail` with the AZ-848-scoped reason; nothing in this ticket fixes the underlying tlog-clock bug.
+- **AC-5**: Module docstrings of `tlog_replay_adapter.py` and `tlog_ground_truth.py` are updated to call out their new audit-only / one-shot-export roles.
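AC-1 and AC-2 each allow a softer variant (a raising no-op, a deprecated flag). One way those variants might look is sketched below; the function name `auto_sync`, the `_DeprecatedFlag` action, and `build_parser` are hypothetical, and `ReplayInputAdapterError` is a local stand-in. Only `--time-offset-ms` is shown; the boolean `--skip-auto-sync-validation` flag would need a similar action.

```python
import argparse
import warnings


class ReplayInputAdapterError(Exception):
    """Local stand-in for the project's error type of the same name."""


def auto_sync(*_args, **_kwargs):
    """No-op replacement: any remaining caller fails loudly with the migration hint."""
    raise ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")


class _DeprecatedFlag(argparse.Action):
    """Accept the flag for one more cycle, but warn that it no longer has an effect."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and ignored; supply --imu CSV instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser():
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    parser.add_argument("--video", required=True)
    parser.add_argument("--imu", required=True)
    parser.add_argument("--time-offset-ms", type=int, action=_DeprecatedFlag)
    return parser
```

Because the warning is emitted from the action, it fires only when an operator actually passes the flag, so clean invocations stay quiet.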
+
+## Out of scope
+
+- AZ-848 / AZ-883 structural fix — they stay open as backlog (tlog path is still broken, just no longer the primary path).
+- New CSV export tooling for arbitrary tlogs — future ticket.
+
+## References
+
+- Cycle-3 retro: `_docs/06_metrics/retro_2026-05-26.md`
+- Companion: AZ-894 (CSV adapter — must land first), AZ-896 (docs), AZ-897 (UI)
@@ -0,0 +1,38 @@
+# Docs: replay-input format spec + downloadable example CSV
+
+**Task**: AZ-896_replay_format_docs_and_example_csv
+**Name**: Author the operator-facing format spec for the (video, CSV) replay input pair, plus a minimal downloadable example CSV
+**Description**: Operators using the replay/demo path need three things: the exact CSV schema the system accepts, the hard contract (video t=0 ≡ CSV row 0; video must be nadir; UAV must already be airborne at t=0), and a downloadable example to copy from. No operator-facing entry point documents any of this today.
+
+**Complexity**: 1 SP
+**Dependencies**: AZ-894 (the adapter that consumes the format — the doc describes what AZ-894 accepts)
+**Blocks**: AZ-897 (UI links to the docs page and serves the example CSV)
+**Component**: docs (_docs/04_release/)
+**Tracker**: AZ-896 (https://denyspopov.atlassian.net/browse/AZ-896)
+**Parent Epic**: (none — cycle-4 replay-input redesign)
+
+## What
+
+- Author a docs page at `_docs/04_release/replay_input_format.md` (or wherever the operator-facing docs land in cycle 4)
+- Schema table: column names, units, types, expected rates, required vs optional
+- Constraint statements up top, before the column table:
+  - Video: nadir camera; UAV already airborne at frame 0
+  - CSV: row 0 timestamp == video frame 0 timestamp; `Time` column starts at 0.0; rows monotonic and uniformly-spaced
+- Ship `_docs/04_release/example_data_imu.csv` — a minimal valid example (e.g., 20 rows = 2 seconds at 10 Hz)
+- Cross-link from the AZ-897 replay UI "Download example" button
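A minimal example file of the shape described above could be generated rather than hand-written. The sketch below assumes the 19-column header from AZ-894 and fills every sensor column with a zero placeholder; the values are not flight data, and `example_csv` and its parameters are hypothetical names.

```python
import csv
import io

# 19 columns: two time columns, nine SCALED_IMU2 fields, eight GLOBAL_POSITION_INT fields.
HEADER = (
    ["timestamp(ms)", "Time"]
    + [f"SCALED_IMU2.{axis}{kind}" for kind in ("acc", "gyro", "mag") for axis in "xyz"]
    + [
        f"GLOBAL_POSITION_INT.{name}"
        for name in ("lat", "lon", "alt", "relative_alt", "vx", "vy", "vz", "hdg")
    ]
)


def example_csv(rows=20, rate_hz=10, boot_ms=0):
    """Return CSV text with `rows` uniformly spaced samples and placeholder sensor values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    step_ms = 1000 // rate_hz
    for i in range(rows):
        placeholders = [0] * (len(HEADER) - 2)  # zeros stand in for real sensor readings
        writer.writerow([boot_ms + i * step_ms, f"{i / rate_hz:.1f}", *placeholders])
    return buf.getvalue()
```

With the defaults this yields 20 rows covering 0.0 s to 1.9 s, matching the "2 seconds at 10 Hz" suggestion.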
+
+## Acceptance Criteria
+
+- **AC-1**: Schema page documents all 19 columns of the Derkachi CSV with units and types.
+- **AC-2**: The three hard constraints (nadir / airborne / aligned-start) are stated up top, before the column table.
+- **AC-3**: The example CSV (≥10 rows) passes through the AZ-894 CSV adapter without errors.
+- **AC-4**: The page is reachable from the AZ-897 UI's "Download example" link.
+
+## Out of scope
+
+- Multi-schema support (PX4, generic MAVLink dumps).
+
+## References
+
+- Companion: AZ-894 (CSV adapter), AZ-897 (UI), AZ-895 (auto-sync deprecation)
+- Source fixture: `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`, README at `_docs/00_problem/input_data/flight_derkachi/README.md`
@@ -0,0 +1,78 @@
+# Land `architecture_compliance_baseline.md` (cycle-3 retro #3, third try)
+
+**Task**: AZ-899_architecture_compliance_baseline
+**Name**: Create `_docs/02_document/architecture_compliance_baseline.md` so cumulative reviews can emit `## Baseline Delta` rows
+**Description**: Cycle-1 retro Top-3 Improvement Action #3, repeated in cycle-3 retro Top-3 #3. The file was not created in cycle 2 or cycle 3, leaving cumulative reviews unable to quantify carried-over / resolved / newly-introduced architecture violations per cycle. Seed the baseline from `_docs/06_metrics/structure_2026-05-20.md` with `0` violations, freeze the snapshot semantics, and wire the existing-code flow's Step 2 to reference it.
+**Complexity**: 1 SP
+**Dependencies**: None (operates on existing artifact `_docs/06_metrics/structure_2026-05-20.md`)
+**Component**: documentation only — no source code change
+**Tracker**: AZ-899 (https://denyspopov.atlassian.net/browse/AZ-899)
+**Epic**: (none — cycle-4 process housekeeping)
+
+## Problem
+
+Cycle-3 retro § Structural Metrics:
+
+> `_docs/02_document/architecture_compliance_baseline.md` **still does not exist** — cycle-1 retro Top-3 Improvement Action #3 was NOT delivered in cycles 2 or 3.
+
+Without a baseline, cumulative reviews log "`_docs/02_document/architecture_compliance_baseline.md` does NOT exist → no Baseline Delta section emitted". Structural regressions (new import cycles, newly introduced violations) therefore cannot be quantified across development cycles; they can only be verified pairwise per batch.
+
+## Outcome
+
+- Cumulative-review reports starting from cycle-4 batch 1 emit a `## Baseline Delta` section that quantifies new vs. resolved vs. carried-over architecture violations.
+- Cycle-end retros can compare structural deltas across cycles using a single canonical baseline document instead of re-deriving from the previous cycle's snapshot.
+
+## Scope
+
+### Included
+
+- Create `_docs/02_document/architecture_compliance_baseline.md` seeded with **0** violations.
+- Reference `_docs/06_metrics/structure_2026-05-20.md` as the source-of-truth snapshot from which the baseline was derived.
+- Document the file's update protocol: a new violation found in a cumulative review is appended (with batch ID, severity, finding ID); a resolution is recorded by marking the row `RESOLVED in batch <ID>`.
+- Document the snapshot-refresh trigger: any cycle that materially changes structure (component count, cross-component edges, new contracts) re-snapshots via `python -m gps_denied_onboard.tools.structure_snapshot` (or equivalent existing script — verify before reference).
+
+### Excluded
+
+- Refactoring source code to fix violations — none currently exist.
+- Adding new component scaffolding — out of scope.
+- Modifying `code-review` or `retrospective` skills — they already reference the file; the only change needed is making the referenced file exist.
+
+## Acceptance Criteria
+
+**AC-1: Baseline file exists with 0 violations**
+Given a fresh repo checkout
+When `ls _docs/02_document/architecture_compliance_baseline.md` runs
+Then the file exists and its `## Violations` section is explicitly empty (or marked "None at baseline")
+
+**AC-2: Baseline references the structural snapshot**
+Given the baseline file
+When read
+Then it includes a `## Source` section pointing at `_docs/06_metrics/structure_2026-05-20.md` and lists the structural facts (15 components, 0 import cycles, 5 contract files) that establish the "0 violations" claim
+
+**AC-3: Update protocol documented**
+Given the baseline file
+When read
+Then it includes an `## Update Protocol` section describing append-on-violation, mark-resolved-on-fix, and the snapshot-refresh trigger
+
+**AC-4: Cumulative-review hook verified**
+Given the baseline file in place
+When the cycle-4 first cumulative-review report is generated
+Then the report emits a `## Baseline Delta` section (even if empty: "0 new, 0 resolved, 0 carried-over")
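The `## Baseline Delta` section in AC-4 reduces to set arithmetic over finding IDs. The sketch below is one possible reading, not the review skill's actual logic: `baseline_delta` and `render` are hypothetical names, and a baseline finding that no longer appears in the current review is assumed to count as resolved.

```python
def baseline_delta(baseline_open, review_findings):
    """Classify finding IDs as new, resolved, or carried over relative to the baseline."""
    baseline_open = set(baseline_open)
    review_findings = set(review_findings)
    return {
        "new": sorted(review_findings - baseline_open),
        "resolved": sorted(baseline_open - review_findings),
        "carried_over": sorted(baseline_open & review_findings),
    }


def render(delta):
    """Render the section, including the all-zero case AC-4 calls out."""
    labels = (("new", "new"), ("resolved", "resolved"), ("carried_over", "carried-over"))
    counts = ", ".join(f"{len(delta[key])} {label}" for key, label in labels)
    return f"## Baseline Delta\n\n{counts}\n"
```

Starting from the seeded baseline of zero violations, the first cycle-4 report would render as "0 new, 0 resolved, 0 carried-over" unless the review finds something.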
+
+## Constraints
+
+- File format: Markdown, matching the style and structure of `_docs/06_metrics/structure_2026-05-20.md`.
+- No source code change permitted under this ticket — strictly documentation.
+
+## Risks & Mitigation
+
+**Risk 1: Future violations slip past the baseline**
+- *Risk*: A cumulative review finds a violation but the reviewer forgets to append it to the baseline.
+- *Mitigation*: The `code-review` skill (referenced in cycle-3 retro Suggested Updates) should be updated separately to auto-append; this ticket only delivers the baseline file. The follow-up belongs in cycle 5 if needed.
+
+## References
+
+- Cycle-3 retro: `_docs/06_metrics/retro_2026-05-26.md` § Top 3 Improvement Actions #3
+- Cycle-1 retro: `_docs/06_metrics/retro_2026-05-20.md` § Top 3 Improvement Actions #3 (original)
+- Source snapshot: `_docs/06_metrics/structure_2026-05-20.md`
+- Existing-code flow Step 2: `.cursor/skills/autodev/flows/existing-code.md` § "Step 2 — Architecture Baseline Scan"
@@ -0,0 +1,82 @@
+# Autodev: gate Step-9 entry on previous-cycle retro existence
+
+**Task**: AZ-900_autodev_retro_existence_gate
+**Name**: Codify the LESSONS rule — autodev must block cycle-N+1 Step 9 entry if `retro_<YYYY-MM-DD>.md` for cycle N is absent
+**Description**: Cycle-3 retro Top-3 Improvement Action #2 and the 2026-05-26 LESSONS entry both call for codifying a Re-Entry After Completion gate that verifies the previous cycle's retro file exists before incrementing the cycle counter. The cycle-2 retro was never filed; the orchestrator silently advanced to cycle 3 and all cycle-1 retro Top-3 actions sat invisible. This ticket codifies the gate in `.cursor/skills/autodev/flows/existing-code.md` § Re-Entry After Completion.
+**Complexity**: 1 SP
+**Dependencies**: None
+**Component**: `.cursor/skills/autodev/flows/existing-code.md` (workflow doc only)
+**Tracker**: AZ-900 (https://denyspopov.atlassian.net/browse/AZ-900)
+**Epic**: (none — cycle-4 process housekeeping)
+
+## Problem
+
+LESSONS 2026-05-26 [process] entry:
+
+> Cycle-2 retro was never filed. The autodev orchestrator silently auto-chained from cycle-2 Step 17 (if it ran at all) straight into cycle-3 Step 9 without producing `retro_<cycle2-date>.md`. As a result, cycle-1 retro's Top-3 Improvement Actions sat invisible across cycle 2 and were re-discovered, all three still undelivered, only at cycle-3 close.
+
+Cycle-3 retro Top-3 #2 echoes the same recommendation.
+
+The fix is a one-line check in the flow file that BLOCKS Step 9 entry for cycle N+1 unless `_docs/06_metrics/retro_<YYYY-MM-DD>.md` for cycle N exists.
+
+## Outcome
+
+- Future cycle-N → cycle-(N+1) transitions are gated: the autodev orchestrator refuses to enter Step 9 of cycle N+1 if no retro file exists for cycle N.
+- Missing retros are surfaced at the session boundary, not 6 weeks later at the next cycle's close.
+
+## Scope
+
+### Included
+
+- Edit `.cursor/skills/autodev/flows/existing-code.md` § "Re-Entry After Completion" to add a gate: before incrementing `cycle`, glob `_docs/06_metrics/retro_*.md` and verify a file dated after the cycle-N start exists.
+- Define the BLOCK behavior: if absent, present a Choose A/B/C block:
+  - **A)** Author the missing retro now (invoke `.cursor/skills/retrospective/SKILL.md` in cycle-end mode)
+  - **B)** Stub a backfilled retro and proceed (with a leftover entry filed for proper backfill)
+  - **C)** Abort and ask the user
+- Add a corresponding bullet to `.cursor/skills/autodev/state.md` § "Session Boundaries" pointing at the new gate.
+
+### Excluded
+
+- Retroactively writing cycle-2 retro (separate ticket if user wants it; cycle-3 retro already covers cycle-2 trend deltas where data is on disk).
+- Adding similar gates to greenfield or meta-repo flows (only `existing-code` has the cycle counter).
+- Per-step retro check inside cycles (this gate fires only at the cycle boundary).
+
+## Acceptance Criteria
+
+**AC-1: Flow file gate exists**
+Given `.cursor/skills/autodev/flows/existing-code.md`
+When the "Re-Entry After Completion" section is read
+Then it contains a step `Verify previous cycle's retro exists` BEFORE the cycle increment
+
+**AC-2: Choose A/B/C block specified**
+Given the gate triggers (no retro file found)
+When the documented behavior is consulted
+Then it specifies the three options (A: author now, B: stub + leftover, C: abort) with the standard Choose format
+
+**AC-3: state.md cross-reference**
+Given `.cursor/skills/autodev/state.md`
+When the "Session Boundaries" section is read
+Then it mentions the new retro-existence gate or links to the flow file's gate
+
+**AC-4: Discovery rule**
+Given the gate
+When the file pattern is documented
+Then the glob is unambiguous: `_docs/06_metrics/retro_*.md` with a date matching cycle-N's date range; the date-range derivation is explicit (cycle N start = last `implementation_report_*_cycle{N-1}.md` date; cycle N end = today)
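Although the ticket is a workflow-doc change, the discovery rule in AC-4 is precise enough to express as code. The sketch below is an illustration only: `retro_exists` and `may_enter_step9` are hypothetical names, the date range is treated as inclusive, and `next_cycle` is read as the cycle about to be entered, so the gate applies only when it is greater than 1.

```python
import re
from datetime import date
from pathlib import Path

_RETRO_NAME = re.compile(r"^retro_(\d{4}-\d{2}-\d{2})\.md$")


def retro_exists(metrics_dir, cycle_start, today):
    """True if some retro_<YYYY-MM-DD>.md is dated within [cycle_start, today]."""
    for path in Path(metrics_dir).glob("retro_*.md"):
        match = _RETRO_NAME.match(path.name)
        if not match:
            continue
        try:
            filed = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if cycle_start <= filed <= today:
            return True
    return False


def may_enter_step9(next_cycle, metrics_dir, prev_cycle_start, today):
    """Gate for entering Step 9: cycle 1 has no previous cycle, so it always passes."""
    if next_cycle <= 1:
        return True
    return retro_exists(metrics_dir, prev_cycle_start, today)
```

A `False` result corresponds to the BLOCK behaviour, where the Choose A/B/C block would be presented.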
+
+## Constraints
+
+- Pure workflow doc change — no source code, no tests.
+- Must not break the existing greenfield-Done → existing-code Phase-B transition (greenfield → existing-code is a one-shot flow change with no retro requirement on first entry, since there is no previous cycle).
+
+## Risks & Mitigation
+
+**Risk 1: False positive on greenfield→existing-code transition**
+- *Risk*: First cycle of an existing-code flow shouldn't require a previous-cycle retro.
+- *Mitigation*: Gate condition includes `state.cycle > 1` — cycle 1 has no previous cycle.
+
+## References
+
+- LESSONS 2026-05-26 [process] entry: `_docs/LESSONS.md` § 2026-05-26 [process]
+- Cycle-3 retro Top-3 #2: `_docs/06_metrics/retro_2026-05-26.md`
+- Flow file: `.cursor/skills/autodev/flows/existing-code.md` § "Re-Entry After Completion"
+- State management: `.cursor/skills/autodev/state.md` § "Session Boundaries"
@@ -0,0 +1,85 @@
+# Fix `EVIDENCE_OUT` default path — workspace-relative, not container-only
+
+**Task**: AZ-901_evidence_out_default_path_fix
+**Name**: Change `e2e/runner/conftest.py:56` `EVIDENCE_OUT` default from `/e2e-results/evidence` to a workspace-relative path so Tier-1 host runs don't crash
+**Description**: Closes leftover `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md`. Cycle-3 Step 15 (Performance Test) surfaced this: the default path `/e2e-results/evidence` is the container mount inside the Tier-1 Docker harness; a developer Mac/Linux workstation invoking `python -m pytest e2e/tests/performance/` directly hits `OSError: [Errno 30] Read-only file system: '/e2e-results'` (macOS) or `PermissionError` (Linux). Workaround today: `EVIDENCE_OUT="$(pwd)/e2e-results/..." pytest ...`. Fix: resolve a workspace-relative default when neither `--evidence-out` nor `EVIDENCE_OUT` is set.
+**Complexity**: 1 SP
+**Dependencies**: None
+**Component**: `e2e/runner/conftest.py`
+**Tracker**: AZ-901 (https://denyspopov.atlassian.net/browse/AZ-901)
+**Epic**: (none — cycle-4 process housekeeping)
+
+## Problem
+
+`e2e/runner/conftest.py:56`:
+
+```python
+default=os.environ.get("EVIDENCE_OUT", "/e2e-results/evidence")
+```
+
+The default `/e2e-results/evidence` is a container-mount path. Tier-1 Docker harness and the Tier-2 Jetson runner pass `--evidence-out` explicitly, so they're fine. Host-direct `python -m pytest e2e/tests/performance/` invocations (developer machine, no Docker) hit `nfr_recorder.pytest_sessionfinish` which tries `mkdir(evidence_dir)` and crashes.
+
+## Outcome
+
+- Developer can run `python -m pytest e2e/tests/performance/` on a Mac/Linux workstation without setting `EVIDENCE_OUT` and without crashing.
+- Docker / Jetson runners continue to work unchanged (they pass `--evidence-out` explicitly).
+
+## Scope
+
+### Included
+
+- Modify `e2e/runner/conftest.py:56` to resolve a workspace-relative default when `EVIDENCE_OUT` is unset.
+  - Proposed: `default=os.environ.get("EVIDENCE_OUT", str(Path(__file__).resolve().parents[2] / "e2e-results" / "evidence"))`
+- Verify Docker compose files and Jetson scripts that pass `--evidence-out` still work (they should — they override the default).
+- Verify `.gitignore` ignores `e2e-results/` at repo root (probably already does — confirm before commit).
+- Delete the leftover file `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` once the fix lands and the verification AC passes.
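The proposed default can be factored into a small helper to make the path arithmetic easy to check. This is a sketch under the assumption that `conftest.py` stays at `e2e/runner/conftest.py`, two directories below the workspace root; `default_evidence_out` and its `environ` parameter are hypothetical and exist only so the behaviour can be exercised without touching the real environment.

```python
import os
from pathlib import Path


def default_evidence_out(conftest_file, environ=None):
    """Resolve the evidence directory: EVIDENCE_OUT wins, else a workspace-relative path."""
    env = os.environ if environ is None else environ
    # parents[0] is e2e/runner, parents[1] is e2e, parents[2] is the workspace root.
    workspace_root = Path(conftest_file).resolve().parents[2]
    fallback = str(workspace_root / "e2e-results" / "evidence")
    return env.get("EVIDENCE_OUT", fallback)
```

In `conftest.py` the call would be `default_evidence_out(__file__)`; an explicit `--evidence-out` still overrides it, so the Docker and Jetson runners see no change.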
+
+### Excluded
+
+- The "lazy fallback inside the recorder" alternative shape — staying with the workspace-relative-default shape for simplicity (Option 1 from the leftover file).
+- Refactoring `nfr_recorder.pytest_sessionfinish` — the writer code is fine; only the default path is wrong.
+- Adding new evidence-out related env vars or CLI flags.
+
+## Acceptance Criteria
+
+**AC-1: Host-direct pytest works without EVIDENCE_OUT**
+Given a clean workspace on macOS or Linux
+When `python -m pytest e2e/tests/performance/ -v --tb=short` runs (no `EVIDENCE_OUT` env var, no `--evidence-out` flag)
+Then pytest exits 0, evidence is written under `<workspace_root>/e2e-results/evidence/`, and no `OSError` / `PermissionError` is raised
+
+**AC-2: Docker harness unchanged**
+Given the Tier-1 Docker compose (`docker-compose.test.jetson.yml`)
+When the e2e suite runs inside the container
+Then `--evidence-out` is still passed and evidence lands at the container mount path `/e2e-results/evidence/` (no behavioral change)
+
+**AC-3: Jetson harness unchanged**
+Given `scripts/run-tests-jetson.sh`
+When invoked
+Then it still passes `--evidence-out` to pytest and evidence is collected per the existing protocol
+
+**AC-4: gitignore covers workspace-relative path**
+Given the fix in place
+When a host-direct run produces `<workspace_root>/e2e-results/`
+Then `git status` does NOT show `e2e-results/` as untracked (already covered by `.gitignore`, or `.gitignore` is updated as part of this ticket)
+
+**AC-5: Leftover deleted**
+Given the fix lands and ACs 1–4 pass
+When `ls _docs/_process_leftovers/2026-05-26_evidence_out_default_path.md`
+Then the file does not exist
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | Run `pytest e2e/tests/performance/` without env vars on host | Exit 0, evidence at `<workspace_root>/e2e-results/evidence/` |
+
+## Constraints
+
+- Backward-compatible — existing callers passing `--evidence-out` or setting `EVIDENCE_OUT` see no change.
+- No new dependencies; uses `pathlib.Path` which `conftest.py` already imports (verify before commit).
+
+## References
+
+- Leftover file: `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md`
+- Cycle-3 Step 15 perf report: `_docs/06_metrics/perf_2026-05-26_cycle3-tier1-probe.md` § "Findings worth tracking" item 3
+- Conftest: `e2e/runner/conftest.py:56`
@@ -0,0 +1,72 @@
+# replay_api: extend POST /replay to accept (video, csv) multipart for AZ-897 UI
+
+**Task**: AZ-959_replay_api_csv_path_endpoint
+**Name**: Extend `replay_api` `POST /replay` to accept (video, csv) multipart so the AZ-897 UI in `../ui` can drive the CSV-replay path
+**Description**: AZ-897 was relocated to the `../ui` repo (the single Azaion suite React 19 front-end). The UI uploads `(video, CSV)` per AZ-894's CSV path, but the existing `replay_api` `POST /replay` endpoint only accepts `(tlog, video, calibration)` — it predates AZ-894. This ticket extends the endpoint to accept either input pair (XOR), validates the CSV against the AZ-896 schema, dispatches the subprocess with the right CLI flag (`--imu` vs `--tlog`), and adds a `GET /static/example-csv` endpoint serving the AZ-896 reference CSV. Existing tlog-path callers continue to work unchanged for the cycle-4 demo + transitional clients; AZ-908 (cycle-5+ backlog) eventually removes the tlog branch.
+**Complexity**: 3 SP
+**Dependencies**: AZ-701 (existing `replay_api` package, done), AZ-894 (CSV adapter that the CLI consumes, done), AZ-896 (example CSV + format spec, done)
+**Blocks**: AZ-897 (UI in `../ui` — HARD BLOCKER; the UI cannot ship until this endpoint exists)
+**Component**: replay_api (existing FastAPI app)
+**Tracker**: AZ-959 (https://denyspopov.atlassian.net/browse/AZ-959)
+**Parent Epic**: (none — cycle-4 replay-input redesign, replacement for the original AZ-897 backend slice after the UI relocation)
+
+Jira AZ-959 is the authoritative spec; this file is the in-workspace mirror.
+
+## Goal
+
+Extend the AZ-701 `replay_api` `POST /replay` endpoint to accept the `(video, csv)` input pair that the AZ-894 CLI introduced. AZ-897 (relocated to `../ui`) calls this endpoint with `(video, csv, calibration)` multipart to drive the CSV-replay path; the UI does not upload pymavlink tlog files.
+
+## Scope
+
+1. **`src/gps_denied_onboard/replay_api/app.py`** (`post_replay` handler):
+   - Add `csv: Annotated[UploadFile | None, File()] = None` parameter alongside the existing `tlog`.
+   - Make `tlog` optional (currently required).
+   - Enforce XOR: exactly one of `tlog` or `csv` must be present; both or neither → 400 with clear error pointing at the XOR contract.
+   - Validate csv bytes via new `validate_csv_kind`.
+   - Persist the upload to `job_storage.csv_path` when the CSV route is taken.
+   - Pass through to `SubprocessReplayRunner.run` via the extended `ReplayInputs` shape.
+
+2. **`src/gps_denied_onboard/replay_api/handlers.py`**:
+   - New `validate_csv_kind(probe_bytes: bytes) -> None`: checks the CSV header line starts with `timestamp(ms),Time,SCALED_IMU2.xacc,...` matching the AZ-896 csv_replay_format.md schema. Raises `UnsupportedFileKindError` with a message pointing to the docs path.
+
+3. **`src/gps_denied_onboard/replay_api/interface.py`**:
+   - `ReplayInputs`: change `tlog_path: Path` to `tlog_path: Path | None` and add `csv_path: Path | None`. Invariant: exactly one of the two is set and the other is `None`; `__post_init__` raises if the invariant is violated.
+
+4. **`src/gps_denied_onboard/replay_api/storage.py`**:
+   - Per-job storage: add `csv_path` property pointing to `{job_dir}/input/data_imu.csv` (mirrors `tlog_path`).
+
+5. **`SubprocessReplayRunner.run` in `app.py`**:
+   - When `inputs.csv_path is not None`, shell out with `--imu` flag; when `inputs.tlog_path is not None`, shell out with `--tlog`.
+   - Ground-truth extraction (`_maybe_render_report`) currently calls `load_tlog_ground_truth(inputs.tlog_path)`. For the CSV path, ground truth must come from the CSV's `GLOBAL_POSITION_INT.*` columns. Default proposal: extend the ground-truth loader to dispatch on file extension via a thin helper next to `load_tlog_ground_truth` (avoids branching inside `_maybe_render_report`).
+
+6. **New `GET /static/example-csv` endpoint** in `app.py`:
+   - Serve `_docs/02_document/contracts/replay/example_data_imu.csv` with `Content-Type: text/csv; charset=utf-8`.
+   - 200 if the file exists, 503 if it is missing. The file lives in the source tree, so a missing file indicates a build/packaging or deployment misconfiguration.
+   - This is the target of the AZ-897 UI's "Download example CSV" link.
+
+7. **`tests/unit/replay_api/test_az701_replay_api.py`**:
+   - Update existing tests to cover the XOR validation (both rejected, neither rejected).
+   - Add tests: CSV happy path, malformed CSV schema rejected, example-csv endpoint serves correct content + content-type.
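
Scope items 3 and 5 can be pictured together with a minimal sketch. It assumes a stripped-down `ReplayInputs` (the real dataclass carries more fields than the two shown here), a `ValueError` for the invariant (the spec does not name the exception type), and a hypothetical helper `telemetry_args` that does not exist in the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ReplayInputs:
    """Illustrative stand-in: only the two telemetry fields are modelled."""

    tlog_path: Path | None = None
    csv_path: Path | None = None

    def __post_init__(self) -> None:
        # XOR contract: exactly one telemetry source must be set.
        # ValueError is an assumption; the spec only says "raised".
        if (self.tlog_path is None) == (self.csv_path is None):
            raise ValueError("exactly one of tlog_path or csv_path must be set")


def telemetry_args(inputs: ReplayInputs) -> list[str]:
    """Hypothetical helper: choose the CLI flag for whichever source is present."""
    if inputs.csv_path is not None:
        return ["--imu", str(inputs.csv_path)]
    # The invariant above guarantees tlog_path is set on this branch.
    return ["--tlog", str(inputs.tlog_path)]
```

Enforcing the invariant in the dataclass means the runner can branch on `csv_path is not None` without re-checking the other field.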
+
+## Acceptance Criteria
+
+- **AC-1**: `POST /replay` with `(csv, video, calibration)` multipart is accepted; subprocess invocation uses `--imu CSV_PATH`. Same response shape as the tlog-path call.
+- **AC-2**: `POST /replay` with both `tlog` AND `csv` returns 400 + clear error pointing at the XOR contract.
+- **AC-3**: `POST /replay` with neither `tlog` nor `csv` returns 400 + clear error.
+- **AC-4**: `POST /replay` with malformed CSV (missing required column, e.g. no `Time` column) returns 400 + error referencing the AZ-896 format docs.
+- **AC-5**: `GET /static/example-csv` returns 200 + `text/csv; charset=utf-8` content-type + exact file bytes from `_docs/02_document/contracts/replay/example_data_imu.csv`.
+- **AC-6**: Ground-truth extraction works for the CSV path — accuracy report renders against the CSV's `GLOBAL_POSITION_INT.*` columns when `csv_path` was used.
+- **AC-7**: All existing AZ-701 tlog-path tests in `test_az701_replay_api.py` still pass unchanged.
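
AC-5 and the 200/503 rule from Scope item 6 can be sketched without committing to FastAPI specifics. The function name `example_csv_response`, the tuple return shape, and the 503 body text are illustrative assumptions; a real handler would return the framework's own response objects.

```python
from __future__ import annotations

from pathlib import Path

# Path taken from the spec; resolved relative to the working directory here.
EXAMPLE_CSV = Path("_docs/02_document/contracts/replay/example_data_imu.csv")


def example_csv_response(path: Path = EXAMPLE_CSV) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for GET /static/example-csv."""
    if not path.is_file():
        # Missing file means a packaging problem, so report 503.
        # The plain-text body is an assumption, not part of the spec.
        return 503, {"Content-Type": "text/plain; charset=utf-8"}, b"example CSV is not deployed"
    # Serve the exact file bytes with the content type required by AC-5.
    return 200, {"Content-Type": "text/csv; charset=utf-8"}, path.read_bytes()
```

Reading the bytes unmodified is what lets a test compare the response body against the file on disk.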
+
+## Out of scope
+
+- Calibration handling changes — keep current behaviour (operator uploads `calibration` field; falls back to `REPLAY_API_DEFAULT_CALIBRATION` env if not provided).
+- Removing the tlog path — AZ-895 deprecated `--tlog` in the CLI but tlog API support stays for backwards compat for the cycle-4 demo + any existing programmatic clients. AZ-908 (cycle-5+ backlog) will remove tlog from both CLI and API.
+- The UI itself — that's AZ-897 in `../ui`.
+- Format docs page rendering / serving — keep markdown source in `_docs/02_document/contracts/replay/csv_replay_format.md`; UI links to the docs URL when published.
+
+## Notes
+
+- The `--imu` flag was added to the CLI by AZ-894; this ticket exposes that path through the HTTP API. No changes to the CLI itself.
+- `validate_csv_kind` should be schema-aware (checks the header line matches the AZ-896 format), not just content-type sniffing. Bad schema must fail fast at the API boundary, not deep in `gps-denied-replay`.
+- The `GET /static/example-csv` endpoint should not require auth (the example CSV is a public reference document, not a secret).
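
The schema-aware check described for `validate_csv_kind` might look roughly like the sketch below. It compares only the leading header columns quoted in this spec (the full column list lives in `csv_replay_format.md` and is not reproduced here), and it defines a local stand-in for `UnsupportedFileKindError`. The BOM and CRLF tolerance is an assumption, not a stated requirement.

```python
class UnsupportedFileKindError(ValueError):
    """Local stand-in for the error type named in the spec."""


# Only the header prefix quoted in the task spec; the remaining columns are elided there.
HEADER_PREFIX = "timestamp(ms),Time,SCALED_IMU2.xacc"
FORMAT_DOC = "_docs/02_document/contracts/replay/csv_replay_format.md"


def validate_csv_kind(probe_bytes: bytes) -> None:
    """Raise UnsupportedFileKindError unless the first line looks like an AZ-896 header."""
    first_line = probe_bytes.split(b"\n", 1)[0].decode("utf-8", errors="replace")
    # Tolerate a UTF-8 BOM and Windows line endings (assumption).
    header = first_line.lstrip("\ufeff").rstrip("\r")
    if not header.startswith(HEADER_PREFIX):
        raise UnsupportedFileKindError(
            f"CSV header does not match the AZ-896 schema; see {FORMAT_DOC}"
        )
```

Because the check needs only the first line, it can run on a small probe read from the upload before the file is persisted.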
@@ -0,0 +1,54 @@
+# render_map: dispatch --truth loader on extension to unblock CSV-path map render
+
+**Task**: AZ-960_render_map_csv_truth_dispatch
+**Name**: Extend `gps-denied-render-map` so `--truth` accepts AZ-896 CSV in addition to binary tlog; remove the AZ-959 workaround
+**Description**: AZ-959 landed CSV-path support in the `replay_api` `POST /replay` endpoint but `gps-denied-render-map` still only consumes binary tlog as ground truth. As a workaround AZ-959 made `SubprocessReplayRunner._maybe_render_map` short-circuit to `None` for CSV-path jobs — that means the AZ-897 UI (in `../ui`) currently shows no map for CSV uploads. This ticket closes the gap by dispatching on the `--truth` file extension and removing the workaround.
+**Complexity**: 2 SP
+**Dependencies**: AZ-700 (existing render-map CLI, done), AZ-894 (CSV ground-truth loader, done), AZ-959 (replay_api CSV path that carries the current workaround, done)
+**Blocks**: (none — UX completeness, not a hard blocker)
+**Component**: cli/render_map + replay_api/app (workaround removal)
+**Tracker**: AZ-960 (https://denyspopov.atlassian.net/browse/AZ-960)
+**Parent Epic**: (none — cycle-4 replay UX follow-up to AZ-959)
+
+Jira AZ-960 is the authoritative spec; this file is the in-workspace mirror.
+
+## Goal
+
+Make `gps-denied-render-map` source-agnostic for the `--truth` argument: tlog OR CSV. Both already produce row-aligned `(lat_deg, lon_deg)` series via `load_tlog_ground_truth` / `load_csv_ground_truth`, so the rest of the renderer is unchanged. After this lands, the AZ-959 short-circuit in `_maybe_render_map` goes away and CSV-path jobs ship with a map link.
+
+## Scope
+
+1. **`src/gps_denied_onboard/cli/render_map.py`** — `load_ground_truth_track(path)`:
+   - Dispatch on extension. `.csv` → `load_csv_ground_truth(path)`; otherwise (`.tlog`, `.bin`, no ext) → `load_tlog_ground_truth(path)`.
+   - Both return DTOs with row-aligned `records` carrying `lat_deg` / `lon_deg`; the existing list comprehension survives unchanged.
+   - Update `--truth` CLI help to call out CSV support.
+   - Update the renderer's `tooltip="Ground truth (tlog)"` → `tooltip="Ground truth"` (cosmetic; the dispatch hides the source).
+
+2. **`src/gps_denied_onboard/replay_api/app.py`** — `SubprocessReplayRunner._maybe_render_map`:
+   - Drop the `if inputs.tlog_path is None: return None` short-circuit added by AZ-959.
+   - Pass whichever of `tlog_path` / `csv_path` is set as `--truth`.
+
+3. **`tests/unit/test_az700_render_map.py`**:
+   - Add focused test: build a tiny CSV via the AZ-896 schema, call `load_ground_truth_track`, assert the returned `list[tuple[float, float]]` matches what `load_csv_ground_truth` would return.
+   - Add an integration test: run `main()` against a CSV `--truth` and assert the produced HTML contains a polyline.
+
+4. **`tests/unit/replay_api/test_az701_replay_api.py`**:
+   - Extend the AZ-959 CSV happy-path test (`test_post_replay_csv_path_returns_200_and_dispatches_imu_flag`) to also assert `map_html_url` is present in the response (no longer `None`).
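
The extension dispatch in Scope item 1 can be sketched as follows. The helper name `pick_truth_loader` is hypothetical, the two loaders are passed in as parameters so the example stays self-contained, and matching the suffix case-insensitively is an assumption the spec does not state.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def pick_truth_loader(
    path: Path,
    csv_loader: Callable[[Path], T],
    tlog_loader: Callable[[Path], T],
) -> Callable[[Path], T]:
    """Choose the ground-truth loader from the file extension of `path`."""
    # `.csv` goes to the CSV loader (case-insensitive match is an assumption).
    if path.suffix.lower() == ".csv":
        return csv_loader
    # Everything else (.tlog, .bin, no extension) keeps the existing tlog path.
    return tlog_loader
```

In the real module the arguments would presumably be `load_csv_ground_truth` and `load_tlog_ground_truth`, with `load_ground_truth_track` calling the chosen loader and building its `(lat_deg, lon_deg)` list from the returned `records` as it does today.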
+
+## Acceptance Criteria
+
+- **AC-1**: `gps-denied-render-map --truth foo.csv --estimated bar.jsonl --output baz.html` succeeds when `foo.csv` is a valid AZ-896 schema CSV.
+- **AC-2**: `gps-denied-render-map --truth foo.tlog ...` still works unchanged (no tlog regression).
+- **AC-3**: The replay_api `POST /replay` CSV path response now includes `map_html_url`; the corresponding `/jobs/{job_id}/map` returns 200 + valid HTML.
+- **AC-4**: A CSV with a malformed schema (missing required column) raises `ReplayInputAdapterError` from `load_csv_ground_truth` and the CLI exits non-zero; the renderer never sees a half-baked DTO.
+
+## Out of scope
+
+- Renaming `RenderInputs.truth_track` or the internal `_TRUTH_LINE_COLOR` constant — naming stays.
+- Schema validation specifics — those live in `csv_ground_truth.py` and are owned by AZ-896.
+- The cosmetic `ReportContext` field rename — that's AZ-961.
+
+## Notes
+
+- The `load_csv_ground_truth` loader already strict-validates the AZ-896 schema at entry; the CLI inherits that fail-fast behaviour for free.
+- After this lands, the existing AZ-959 `_maybe_render_map` log line ("skipping map render — CSV-path runs do not yet support ...") is dead code and goes with the short-circuit.