[autodev] handoff snapshot after batch 16 push

Co-authored-by: Cursor <cursoragent@cursor.com>
[autodev] handoff snapshot after batch 16 commit
2026-06-21 13:51:10 +00:00 · 2026-05-20 17:06:59 +03:00 · 2026-05-20 17:06:00 +03:00 · 2026-05-20 17:05:27 +03:00 · 2026-05-20 16:19:30 +03:00 · 2026-05-20 16:18:40 +03:00
498 changed files with 54114 additions and 15401 deletions
@@ -0,0 +1,11 @@
+[build]
+# Default build target is host arch; aarch64 cross-builds are driven via `cross` or `cargo zigbuild`
+# in CI (see .woodpecker.yml stage `build-arm64`).
+
+[target.aarch64-unknown-linux-gnu]
+# Cross-compilation linker is supplied by the `cross` / `zigbuild` toolchain in CI.
+# For local cross-builds, install `cross` (`cargo install cross`) and run
+# `cross build --release --target aarch64-unknown-linux-gnu`.
+
+[net]
+retry = 3
@@ -0,0 +1,60 @@
+---
+description: "Single Responsibility Principle applied to _docs/ artifacts. Each canonical file owns ONE concern and MUST NOT bleed into a sibling artifact's concern."
+alwaysApply: true
+---
+
+# Artifact Single Responsibility
+
+SRP is not only for code. Every canonical `_docs/` artifact owns exactly **one** concern. Mixing concerns across artifacts is a violation — fix the artifact, do not let the leak survive.
+
+## Canonical artifact responsibilities
+
+| Artifact | Owns ONLY | MUST NOT contain |
+|---|---|---|
+| `_docs/00_problem/problem.md` | What the system is for, the problem it solves, who uses it, the operational/environmental reality that defines the problem space. WHO + WHAT + WHY. | Technology choices, frameworks, languages, libraries, state-machine designs, component lists, internal data flows, IPC mechanisms, algorithms, schema names, "uses X library", "implements Y pattern". |
+| `_docs/00_problem/restrictions.md` | Externally imposed constraints the system MUST satisfy: hardware (the device that already exists), regulatory, operational (deployment environment, climate, link reliability), vendor-fixed protocols (a chosen camera or radio whose protocol cannot be changed), legal/budget/timeline. | Design choices framed as constraints. "We chose Rust for memory safety" is design, not restriction. "The Jetson Orin Nano has 8 GB RAM" is a restriction. |
+| `_docs/00_problem/acceptance_criteria.md` | Measurable, design-independent outcomes. What "done" looks like, expressed so a black-box test can verify it. | Implementation choices (libraries, params, algorithms, internal component names). AC is reverse-engineered FROM problem+restrictions, never FROM solution. |
+| `_docs/00_problem/input_data/` | Reference data the system consumes + the input→quantifiable-expected-output mapping consumed by `/test-spec`. | Solution design or AC restatement. |
+| `_docs/00_problem/security_approach.md` | Threat model + non-negotiable security principles + open security decisions. | Specific algorithms / libraries unless the AC truly mandates them (e.g. "must use AES-256" only if regulation forces it). |
+| `_docs/01_solution/solution.md` | The chosen solution shape: high-level approach, the component breakdown name list, the tech stack with one-line rationale, pointers to the architecture deep dive. | Detailed flows (those belong in system-flows.md). Per-component contracts (those belong in component specs). Re-statement of the problem (point to it, do not duplicate). |
+| `_docs/02_document/architecture.md` | System context, component layering, NFR targets, detailed design, MAVLink command surface, sync protocols, open architecture questions, scope boundary. The "how" at a system level. | Wholesale re-statement of problem.md, restrictions.md, AC, or solution overview. May briefly reference them; must not duplicate them. (If the project predates this rule and architecture.md has §Problem / §Restrictions / §AC sections, leave them but mark them as "MOVED to canonical location — keep this in sync or delete on next refactor".) |
+| `_docs/02_document/system-flows.md` | Per-flow narratives + sequence diagrams. Behaviour over time. | Component implementation details (those live in component specs). |
+| `_docs/02_document/data_model.md` | Canonical entity catalogue. | Component implementation details. |
+| `_docs/02_document/components/<name>/description.md` | Per-component: purpose, inputs, outputs, responsibilities, state, failure modes, NFR targets, dependencies. | Cross-component flows (those live in system-flows.md). |
+| `_docs/02_document/decision-rationale.md` | The "why" behind every load-bearing decision. Research evidence, reasoning chain, fact cards, fit matrix, validation log. | Authoritative architecture (point to architecture.md). |
+| `_docs/02_document/glossary.md` | Project-specific terms only. | Generic CS/industry terms (RTSP, gRPC, JSON, etc.). |
+
+## Litmus test (apply before writing or editing any of the above)
+
+Before you save a file, scan each sentence and ask: **does this sentence belong to this artifact's concern (per the table above)?** If it belongs to a sibling artifact, move it there. Do not "summarise the system architecture in problem.md so the reader has context" — that is exactly the violation this rule exists to prevent.
+
+Specific signals that you are leaking:
+
+- problem.md mentions a programming language, framework, library, IPC mechanism, state-machine pattern, container, file format, RPC framework, or algorithm → solution leakage. Remove.
+- restrictions.md says "we will use X because Y" or "the solution must be implemented with Z" → design choice masquerading as restriction. Move to solution.md (or architecture.md if it is a design non-negotiable).
+- acceptance_criteria.md names a specific library, model file, or component → implementation leakage. Re-express as observable behaviour ("system returns N detections within T ms"), not "library X must return N detections".
+- solution.md re-explains the problem in detail (more than a one-paragraph context-setter) → duplication. Point to problem.md instead.
+- architecture.md restates AC numerically instead of referencing acceptance_criteria.md → duplication that will drift.
+
+## When a fact is genuinely cross-cutting
+
+Sometimes a single fact touches multiple concerns. Pick the artifact whose concern is *primary* and reference from the others:
+
+- "ViewPro A40 is the camera." Hardware reality → **restrictions.md**. solution.md / architecture.md reference it.
+- "Tier-1 inference lives in `../detections`, not in autopilot." Architectural non-negotiable → **architecture.md §5**. solution.md mentions it; restrictions.md does NOT (it is not an external constraint, it is a chosen split).
+- "Operator commands must be authenticated, signed, replay-protected." This is a **principle / restriction** the threat model imposes → security_approach.md owns the principle; architecture.md owns the chosen scheme.
+- "≤5 POIs / minute" is a **product requirement** → acceptance_criteria.md owns it; architecture.md owns how scan_controller enforces it.
+
+## When you are tempted to skip the rule
+
+Common excuses and the answer:
+
+- "But the architecture document was authored before the canonical problem/solution split existed." → Then the architecture document over-reaches into other concerns. Mark the over-reaching sections "MOVED — see <canonical path>" and shrink them on the next refactor. Do not propagate the over-reach into newly authored artifacts.
+- "But the reader needs context to understand the problem statement." → Context for the *problem* means **operational + environmental + user reality** (e.g. "the UAV flies at 600–1000 m, must work in winter snow"). Context does NOT mean a tour of the solution design.
+- "But everyone will read both files anyway." → Then the duplication is harmless? No — duplication drifts. The two copies diverge silently and a reader cannot tell which one is authoritative.
+- "But the source paragraph in architecture.md said it this way." → architecture.md may itself be in violation (see the "If the project predates this rule" note in the `architecture.md` row of the table above). Do not propagate a pre-existing violation when authoring a new file.
+
+## Enforcement
+
+- Any `/autodev` or skill workflow that writes one of the canonical artifacts MUST self-check against the table above before saving.
+- When auditing an existing artifact, flag any sentence that violates the table. If the violation is in a file you are editing for another reason, fix it inline (per the "adjacent hygiene" allowance in coderule.mdc → "Scope discipline"). If it is in a file outside your current scope, record it in `_docs/_process_leftovers/` for later cleanup.
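The leak signals listed in this rule could be approximated with a keyword scan before saving an artifact. The following is a minimal sketch, not part of the rule: the `LEAK_PATTERNS` term lists and the `find_leaks` helper are hypothetical, and a match only flags a candidate sentence for human judgement.

```python
import re

# Hypothetical, deliberately tiny term lists; a real project would maintain its own.
LEAK_PATTERNS = {
    "problem.md": re.compile(
        r"\b(Rust|Python|gRPC|Docker|state machine|library)\b", re.IGNORECASE
    ),
    "restrictions.md": re.compile(r"\bwe (will use|chose)\b", re.IGNORECASE),
}


def find_leaks(artifact_name: str, text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that match the leak pattern for this artifact."""
    pattern = LEAK_PATTERNS.get(artifact_name)
    if pattern is None:
        return []  # no pattern defined for this artifact in the sketch
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# The second sentence is the design-as-restriction example quoted in the table.
sample = "The UAV flies at 600-1000 m.\nWe chose Rust for memory safety.\n"
leaks = find_leaks("restrictions.md", sample)
```

A scan like this catches only wording-level signals; whether a flagged sentence truly belongs to a sibling artifact is still decided against the responsibility table.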
@@ -39,6 +39,7 @@ alwaysApply: true
 - When you think you are done with changes, run the full test suite. Every failure in tests that cover code you modified or that depend on code you modified is a **blocking gate**. For pre-existing failures in unrelated areas, report them to the user but do not block on them. Never silently ignore or skip a failure without reporting it. On any blocking failure, stop and ask the user to choose one of:
  - **Investigate and fix** the failing test or source code
  - **Remove the test** if it is obsolete or no longer relevant
+- **Iterative-skill exception**: when an iterative loop skill is active (e.g. autodev / `implement/SKILL.md` batch loop, `refactor/SKILL.md` batch loop), the skill governs full-suite cadence — typically focused tests per task/batch and a single full-suite gate at the very end of the implementation phase, NOT after each batch. "Done with changes" means done with the entire implementation phase the skill is running, not done with one batch. Do not run the full suite per batch unless the skill explicitly says to.
 - Do not rename any databases or tables or table columns without confirmation. Avoid such renaming if possible.

 - Make sure we don't commit binaries, create and keep .gitignore up to date and delete binaries after you are done with the task
@@ -0,0 +1,41 @@
+---
+description: "Use chunked writes (Write + StrReplace marker pattern) for large generated files, especially after a monolithic Write fails"
+alwaysApply: true
+---
+# Large File Writes — Chunk on Failure
+
+When a `Write` call to a single file fails (timeout, payload limit, "Invalid arguments", or any tool error) and the intended content is large (>~500 lines or >~50 KB), do NOT retry the same monolithic Write. Switch to chunked writes:
+
+1. **First Write** — create the file with header + table of contents (if applicable) + an explicit append marker, e.g.
+
+   ```
+   <!-- INSERTION_POINT do-not-remove-until-final-chunk -->
+   ```
+
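+2. **Each subsequent chunk** — use `StrReplace` to replace the marker with `<new content>\n<marker>` so the marker stays at the end. A failed `StrReplace` leaves the file and the marker untouched, so the chunk can be retried without losing earlier chunks; if the failure was ambiguous, check that the chunk did not already land before retrying.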
+
+3. **Final chunk** — `StrReplace` removes the marker.
+
+## Why
+
+- Tool argument size limits and transient failures hit large monolithic writes hardest. Retrying the same large payload typically fails for the same reason.
+- Chunked writes are recoverable per chunk. The earlier chunks are durable on disk.
+- A unique marker is greppable, visible in diffs, and stops accidental insertion in the wrong place.
+
+## Triggers
+
+- Generated documentation that aggregates per-component content (epics, design docs, multi-section architecture summaries, traceability dumps).
+- Large fixture or test-data files written from a template.
+- Any single-file artifact you can pre-estimate at >~500 lines.
+
+## Do NOT chunk
+
+- Files under ~200 lines — a single `Write` is faster, clearer, and easier to review.
+- Source code files where appending breaks module structure (functions, classes, imports). Split into multiple files instead.
+- Files where ordering of sections is computed late and inserting in the middle is required — use a single `Write` once the full content is known.
+
+## Anti-patterns
+
+- Retrying the same failed monolithic `Write` more than once. Two attempts in total is the limit for any file; for large content (the thresholds above), switch to chunked writes after the first failure.
+- Using `Shell` with heredoc (`cat <<EOF`) or `echo >>` to append — these bypass the editor diff view and break the StrReplace contract for the next chunk.
+- Embedding the marker so deep inside structured content that a chunk's `StrReplace` becomes ambiguous. Place the marker on its own line at the very end of the file.
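The marker pattern in this rule can be illustrated outside the agent tooling with plain file operations. This is a sketch under assumptions: `start_file` and `append_chunk` are hypothetical helpers that stand in for the `Write` and `StrReplace` tools, and the file name is invented for the demo.

```python
import tempfile
from pathlib import Path

MARKER = "<!-- INSERTION_POINT do-not-remove-until-final-chunk -->"


def start_file(path: Path, header: str) -> None:
    """First write: header plus the marker on its own line at the very end."""
    path.write_text(f"{header}\n{MARKER}\n", encoding="utf-8")


def append_chunk(path: Path, chunk: str, final: bool = False) -> None:
    """Replace the marker with the chunk; keep the marker unless this is the final chunk."""
    text = path.read_text(encoding="utf-8")
    if text.count(MARKER) != 1:
        # An absent or duplicated marker makes the replacement ambiguous, so refuse.
        raise ValueError(f"expected exactly one marker, found {text.count(MARKER)}")
    replacement = chunk if final else f"{chunk}\n{MARKER}"
    path.write_text(text.replace(MARKER, replacement), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "epics.md"  # hypothetical file name
    start_file(doc, "# Epics")
    append_chunk(doc, "## Epic 1")
    append_chunk(doc, "## Epic 2", final=True)
    result = doc.read_text(encoding="utf-8")
```

The exactly-one-marker check mirrors the anti-pattern about ambiguous replacements: the sketch raises instead of guessing where a chunk belongs.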
@@ -13,6 +13,16 @@ alwaysApply: true
 ## Critical Thinking
 - Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.

+## Skill Discipline
+
+Do exactly what the skill says. Nothing more.
+
+- No `git log` / `git diff` / `git blame` unless the skill explicitly calls for it.
+- No extra searches to "verify" inputs the skill already names.
+- No reading files outside the skill's documented inputs.
+
+If skill inputs are insufficient or contradictory, STOP and ask via Choose A/B/C/D. Do not invent extra investigation steps.
+
 ## Self-Improvement
 When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):

@@ -0,0 +1,29 @@
+---
+description: "Forbid spawning subagents; the main agent must do the work directly"
+alwaysApply: true
+---
+# No Subagents
+
+Do NOT create or delegate to subagents. This includes:
+
+- The `Task` tool with any `subagent_type` (e.g. `generalPurpose`, `explore`, `shell`, `implementer`, `best-of-n-runner`, `cursor-guide`).
+- Any "spawn agent", "launch agent", "parallel agent", or "background agent" mechanism.
+- Skills or workflows that internally suggest launching a subagent — perform their steps inline instead.
+
+## Why
+
+- Subagent output is not visible to the user, which hides the reasoning and tool calls behind it.
+- Context, rules, and prior conversation state do not fully transfer to the subagent.
+- Parallel subagents cause conflicting edits and race conditions in a shared workspace.
+- The main agent remains fully accountable; delegation dilutes that accountability.
+
+## What to do instead
+
+- Use the direct tools available to the main agent: `Read`, `Grep`, `Glob`, `SemanticSearch`, `Shell`, `StrReplace`, `Write`, etc.
+- For broad exploration, run `Grep`/`Glob`/`SemanticSearch` yourself and read the files directly.
+- For multi-step work, use `TodoWrite` to track progress inline.
+- For isolated experiments the user explicitly asks for, use a git branch/worktree you manage directly — not a subagent runner.
+
+## Exception
+
+Only spawn a subagent if the user explicitly requests it in the current turn (e.g. "use a subagent to…", "launch an explore agent…"). Even then, confirm once before spawning.
@@ -0,0 +1,46 @@
+---
+description: "Explanation length and reasoning depth calibration"
+alwaysApply: true
+---
+# Response Calibration
+
+Default to concise. Expand only when the content demands it.
+
+## Length target
+
+- **Default**: a direct answer in ~3–10 lines. Short paragraphs or a tight bullet list.
+- **Expand when**: the question involves trade-offs across multiple options, a migration/architectural decision, a security/data-loss risk, or the user explicitly asks for depth ("explain in detail", "walk me through", "why").
+- **Shrink when**: the user asks for "shorter", "simpler", "TL;DR", "one line", or similar. Do not re-inflate in later turns unless they ask a new deeper question.
+
+## Completeness floor
+
+Short ≠ incomplete. Every response must still:
+
+- Answer the actual question asked (not a reframed version).
+- State the key constraint or reason *once*, not repeatedly.
+- Flag a real caveat if one exists (data loss, breaking change, wrong-OS, security). One sentence is enough.
+- Keep every step of an action sequence. If there are 5 steps, list all 5, without narration between them.
+
+If the honest answer truly needs more space (e.g. trade-off matrix, multi-option decision), write more — but lead with the recommendation or direct answer, then the detail.
+
+## Structure
+
+- One direct sentence first. Then supporting detail.
+- Prefer bullets over prose for enumerations, comparisons, or step lists.
+- Drop section headers for anything under ~15 lines.
+- No "Summary" / "Conclusion" sections unless the response is genuinely long.
+
+## Reasoning depth (internal)
+
+- Match thinking to the problem, not the length of the answer.
+  - Factual / "where is X used" / single-file edit → minimal thinking, go straight to tools.
+  - Trade-off / refactor / debugging 3+ hypotheses deep → full thinking budget.
+- Do not pad thinking to look thorough. Do not skip thinking on genuinely ambiguous problems to look fast.
+
+## Anti-patterns to avoid
+
+- Restating the question back to the user.
+- Multi-paragraph preambles before the answer.
+- Exhaustive "alternatives considered" sections when the user didn't ask for alternatives.
+- Recapping what was just done at the end of every tool-using turn ("Done. I have edited the file…") — a one-line confirmation is enough.
+- Speculative "you might also want to…" paragraphs. Offer follow-ups as a single short sentence, or not at all.
@@ -0,0 +1,38 @@
+---
+description: "Standards for creating and maintaining Cursor skills"
+globs: [".cursor/skills/**"]
+---
+
+# Skill Building
+
+## When To Create A Skill
+- Create a skill for repeatable, bounded workflows that benefit from a reusable process.
+- Do not create a skill for a one-off task, vague goal, or workflow that still needs product decisions.
+- Start small; evolve the skill when repeated use reveals clearer steps, constraints, or checks.
+
+## Skill Contract
+- `SKILL.md` must define a clear `name` and a proactive `description` that explains when the skill should be used.
+- State expected inputs, constraints, workflow steps, and final output shape.
+- Make trigger conditions explicit enough that the agent can recognize intent without an exact command.
+- Base instructions on observable project evidence; do not invite fabrication or unsupported assumptions.
+
+## Keep The Core Lean
+- Keep `SKILL.md` concise and under the repo's `.cursor/` size guidance.
+- Move detailed standards, examples, and background knowledge into `references/`.
+- Put reusable output shapes in `templates/` or other skill-local assets instead of embedding them in the main instructions.
+- Keep one primary responsibility per skill; use an orchestrator skill only when multiple existing skills must run in a defined order.
+
+## Deterministic Work
+- Use scripts for mechanical steps that are repeatable, parameterized, and safer outside the model's reasoning.
+- Scripts must expose explicit inputs, avoid hidden side effects, and fail loudly on errors.
+- Do not use scripts to bypass review, hide destructive behavior, or hardcode secrets.
+
+## Quality Proof
+- Include realistic examples, checklists, or eval-style scenarios that define what good output looks like.
+- Cover common failure cases such as missing sections, leftover placeholders, hallucinated facts, unsafe actions, or malformed output.
+- Review skill changes against those checks before treating the skill as ready.
+
+## Security Review
+- Treat third-party skills like untrusted code until reviewed.
+- Inspect scripts, dependencies, references, secret handling, network calls, and destructive commands before use.
+- Prefer local, project-scoped assets and dependencies; document any external dependency the skill requires.
@@ -14,11 +14,14 @@ alwaysApply: true
 - Issue types: Epic, Story, Task, Bug, Subtask

 ## Tracker Availability Gate
- If Jira MCP returns **Unauthorized**, **errored**, **connection refused**, or any non-success response: **STOP** tracker operations and notify the user via the Choose A/B/C/D format documented in `.cursor/skills/autodev/protocols.md`.
+- If Jira MCP returns **Unauthorized**, **errored**, **connection refused**, **timeout**, a non-2xx status code, an empty body, or any response shape that does not clearly confirm the requested change: **STOP IMMEDIATELY** — no automatic retry, no silent continuation. Surface the full raw error/response to the user verbatim and notify via the Choose A/B/C/D format documented in `.cursor/skills/autodev/protocols.md`.
+- A minimal `{"success": true}` body with no echoed issue state is NOT a confirmed transition. When a transition's success matters (status moves, ticket creation, blocking link), follow it with a read-back call (`getJiraIssue` or equivalent) and confirm the new state matches what you asked for. If the read-back disagrees → STOP and ASK.
+- Do NOT loop "retry up to N times before asking". One call, one verification. On failure, the user decides whether to retry.
 - The user may choose to:
-  - **Retry authentication** — preferred; the tracker remains the source of truth.
+  - **Retry the same operation** — once, after the user authorizes it. If it fails again, surface both responses.
+  - **Retry authentication** — preferred when the failure looks like an auth/credentials problem; the tracker remains the source of truth.
  - **Continue in `tracker: local` mode** — only when the user explicitly accepts this option. In that mode all tasks keep numeric prefixes and a `Tracker: pending` marker is written into each task header. The state file records `tracker: local`. The mode is NOT silent — the user has been asked and has acknowledged the trade-off.
- Do NOT auto-fall-back to `tracker: local` without a user decision. Do not pretend a write succeeded. If the user is unreachable (e.g., non-interactive run), stop and wait.
+- Do NOT auto-fall-back to `tracker: local` without a user decision. Do not pretend a write succeeded. Do not paper over an opaque response by moving on. If the user is unreachable (e.g., non-interactive run), stop and wait.
 - When the tracker becomes available again, any `Tracker: pending` tasks should be synced — this is done at the start of the next `/autodev` invocation via the Leftovers Mechanism below.
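The "one call, one verification" behaviour of the Tracker Availability Gate might look like the sketch below. Everything here is an assumption made for illustration: `do_transition` and `read_issue` are hypothetical callables standing in for the Jira MCP tools (the rule names `getJiraIssue` as one read-back option), and the `success` and `status` response fields are assumed shapes.

```python
from typing import Callable


class TrackerGateError(RuntimeError):
    """Raised so the caller stops and asks the user instead of retrying."""


def transition_with_readback(
    issue_key: str,
    target_status: str,
    do_transition: Callable[[str, str], dict],
    read_issue: Callable[[str], dict],
) -> dict:
    """Perform one transition and one read-back; never retry automatically."""
    response = do_transition(issue_key, target_status)
    if not response or response.get("success") is not True:
        raise TrackerGateError(f"transition not confirmed, raw response: {response!r}")
    issue = read_issue(issue_key)  # read-back: a bare success flag is not trusted
    if issue.get("status") != target_status:
        raise TrackerGateError(
            f"read-back mismatch: wanted {target_status!r}, "
            f"tracker reports {issue.get('status')!r}"
        )
    return issue


# Fake tracker that claims success but changes nothing (hypothetical issue key).
fake_store = {"PROJ-1": "To Do"}


def fake_transition(key: str, status: str) -> dict:
    return {"success": True}


def fake_read(key: str) -> dict:
    return {"status": fake_store[key]}


try:
    transition_with_readback("PROJ-1", "Done", fake_transition, fake_read)
    outcome = "confirmed"
except TrackerGateError:
    outcome = "stop-and-ask"
```

The error message carries the raw response so it can be surfaced to the user verbatim, as the gate requires.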

 ## Leftovers Mechanism (non-user-input blockers only)
@@ -3,7 +3,7 @@ name: autodev
 description: |
  Auto-chaining orchestrator that drives the full BUILD-SHIP workflow from problem gathering through deployment.
  Detects current project state from _docs/ folder, resumes from where it left off, and flows through
-  problem → research → plan → decompose → implement → deploy without manual skill invocation.
+  problem → research → plan → test specs → decompose → implement → tests → docs sync → deploy without manual skill invocation.
  Maximizes work per conversation by auto-transitioning between skills.
  Trigger phrases:
  - "autodev", "auto", "start", "continue"
@@ -52,7 +52,7 @@ Determine which flow to use (check in order — first match wins):

 After selecting the flow, apply its detection rules (first match wins) to determine the current step.

-**Note**: the meta-repo flow uses a different artifact layout — its source of truth is `_docs/_repo-config.yaml`, not `_docs/NN_*/` folders. Other detection rules assume the BUILD-SHIP artifact layout; they don't apply to meta-repos.
+**Note**: the meta-repo flow uses a different artifact layout — its source of truth is `_docs/_repo-config.yaml`, not `_docs/NN_*/` folders. After Step 2.5 it also produces `_docs/glossary.md` and a `## Architecture Vision` section in the cross-cutting architecture doc identified by `docs.cross_cutting`. Other detection rules assume the BUILD-SHIP artifact layout; they don't apply to meta-repos.

 ## Execution Loop

@@ -67,8 +67,9 @@ B3. Read state              — `_docs/_autodev_state.md` (if it exists).
 B4. Read File Index         — `state.md`, `protocols.md`, and the active flow file.

 ### Resolve (once per invocation, after Bootstrap)
-R1. Reconcile state         — verify state file against `_docs/` contents; on disagreement, trust the folders
-                               and update the state file (rules: `state.md` → "State File Rules" #4).
+R1. Reconcile state         — verify state file against `_docs/` contents; probe `<workspace-root>/../docs`
+                               (parent suite `docs/` — see `state.md` → "State File Rules" #4); on disagreement,
+                               trust the folders and update the state file (rules: `state.md` → "State File Rules" #4).
                               After this step, `state.step` / `state.status` are authoritative.
 R2. Resolve flow            — see §Flow Resolution above.
 R3. Resolve current step    — when a state file exists, `state.step` drives detection.
@@ -112,6 +113,15 @@ Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The aut

 The state file (`_docs/_autodev_state.md`) is a minimal pointer — only the current step. See `state.md` for the authoritative template, field semantics, update rules, and worked examples. Do not restate the schema here — `state.md` is the single source of truth.

+**Conciseness rule (authoritative).** The state file MUST stay short. Acceptable content per field:
+
+- `name` — the step title from the active flow's Step Reference Table. That's it.
+- `sub_step.name` — kebab-case identifier from the active sub-skill. That's it.
+- `sub_step.detail` — **leave empty (`""`) by default.** Add a one-line note ONLY when the next-session resumer cannot infer where to pick up from `phase` + `name` + on-disk artifacts alone (e.g. `"batch 2 of 4"`, `"blocked on D-PROJ-2 reply"`, `"variant 1b"`). NEVER use `detail` as a changelog, recap, or summary of completed work — those facts belong in the relevant `_docs/` artifact (glossary, traceability matrix, leftovers folder, retro report, etc.) and in git history.
+- **Total file size target: <30 lines.** If you're tempted to write more, you're using the wrong artifact — write in `_docs/` instead.
+
+Multi-line `detail` blobs that recap what was just completed are a smell. The state file is a *pointer*, not a logbook.
+
 ## Trigger Conditions

 This skill activates when the user wants to:
@@ -13,7 +13,7 @@ A first-time run executes Phase A then Phase B; every subsequent invocation re-e

 | Step | Name | Sub-Skill | Internal SubSteps |
 |------|------|-----------|-------------------|
-| 1 | Document | document/SKILL.md | Steps 1–8 |
+| 1 | Document | document/SKILL.md | Steps 0–7 incl. inline 2.5 (module-layout) and 4.5 (glossary + arch vision) |
 | 2 | Architecture Baseline Scan | code-review/SKILL.md (baseline mode) | Phase 1 + Phase 7 |
 | 3 | Test Spec | test-spec/SKILL.md | Phases 1–4 |
 | 4 | Code Testability Revision | refactor/SKILL.md (guided mode) | Phases 0–7 (conditional) |
@@ -53,6 +53,8 @@ Action: An existing codebase without documentation was detected. Read and execut

 The document skill's Step 2.5 produces `_docs/02_document/module-layout.md`, which is required by every downstream step that assigns file ownership (`/implement` Step 4, `/code-review` Phase 7, `/refactor` discovery). If this file is missing after Step 1 completes (e.g., a pre-existing `_docs/` dir predates the 2.5 addition), re-invoke `/document` in resume mode — it will pick up at Step 2.5.

+The document skill's Step 4.5 produces `_docs/02_document/glossary.md` and prepends a confirmed `## Architecture Vision` section to `architecture.md`. Both are user-confirmed artifacts; downstream skills (refactor, decompose, new-task) treat them as authoritative for terminology and structural intent. If `glossary.md` is missing after Step 1 (pre-existing `_docs/` dir from before the 4.5 addition), re-invoke `/document` in resume mode — it will pick up at Step 4.5 without redoing module/component analysis.
+
 ---

 **Step 2 — Architecture Baseline Scan**
@@ -150,15 +152,17 @@ If `_docs/02_tasks/` subfolders have some task files already (e.g., refactoring
 ---

 **Step 6 — Implement Tests**
-Condition (folder fallback): `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
+Condition (folder fallback): `_docs/02_tasks/todo/` contains test task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
 State-driven: reached by auto-chain from Step 5.

-Action: Read and execute `.cursor/skills/implement/SKILL.md`
+Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Test implementation**.

-The implement skill reads test tasks from `_docs/02_tasks/todo/` and implements them.
+The implement skill reads only test tasks from `_docs/02_tasks/todo/` and implements them.

 If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

+For folder fallback, **test task files** means `*_test_infrastructure.md` plus task specs whose `**Component**` or `**Epic**` identifies `Blackbox Tests`.
+
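The folder-fallback definition of a test task file can be sketched as a small classifier. This is an illustration only: the `is_test_task` helper, the `**Component**: value` field syntax, and the file names in the demo are assumptions, since the task spec template is not shown here.

```python
import re
from pathlib import Path

# Assumed field syntax: a line such as "**Component**: Blackbox Tests".
FIELD_RE = re.compile(r"^\*\*(Component|Epic)\*\*\s*:?\s*(.+)$", re.MULTILINE)


def is_test_task(path: Path, text: str) -> bool:
    """True for *_test_infrastructure.md, or specs whose Component/Epic names Blackbox Tests."""
    if path.name.endswith("_test_infrastructure.md"):
        return True
    return any("Blackbox Tests" in value for _, value in FIELD_RE.findall(text))


# Hypothetical task specs used only to exercise the classifier.
spec = "# Task\n**Component**: Blackbox Tests\n**Epic**: E-7\n"
candidates = {
    Path("03_test_infrastructure.md"): "",
    Path("12_login_flow.md"): spec,
    Path("04_scan_controller.md"): "**Component**: scan_controller\n",
}
selected = sorted(p.name for p, body in candidates.items() if is_test_task(p, body))
```

Under these assumptions `selected` holds the two test task files and leaves the product task out, which is the split Step 6 relies on.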
 ---

 **Step 7 — Run Tests**
@@ -1,6 +1,6 @@
 # Greenfield Workflow

-Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Decompose → Implement → Run Tests → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective.
+Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → UI Design (if applicable) → Test Spec → Decompose → Implement + Product Completeness Gate → Code Testability Revision → Decompose Tests → Implement Tests → Run Tests → Test-Spec Sync → Update Docs → Security Audit (optional) → Performance Test (optional) → Deploy → Retrospective.

 ## Step Reference Table

@@ -10,13 +10,19 @@ Workflow for new projects built from scratch. Flows linearly: Problem → Resear
 | 2 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
 | 3 | Plan | plan/SKILL.md | Step 1–6 + Final |
 | 4 | UI Design | ui-design/SKILL.md | Phase 0–8 (conditional — UI projects only) |
-| 5 | Decompose | decompose/SKILL.md | Step 1–4 |
-| 6 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
-| 7 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 8 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
-| 9 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
-| 10 | Deploy | deploy/SKILL.md | Step 1–7 |
-| 11 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |
+| 5 | Test Spec | test-spec/SKILL.md | Phases 1–4 |
+| 6 | Decompose | decompose/SKILL.md (implementation task decomposition) | Step 1 + Step 1.5 + Step 2 + Step 4 |
+| 7 | Implement | implement/SKILL.md | Batch loop + Product Implementation Completeness Gate |
+| 8 | Code Testability Revision | refactor/SKILL.md (guided mode) | Phases 0–7 (conditional) |
+| 9 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
+| 10 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
+| 11 | Run Tests | test-run/SKILL.md | Steps 1–4 |
+| 12 | Test-Spec Sync | test-spec/SKILL.md (cycle-update mode) | Phase 2 + Phase 3 (scoped) |
+| 13 | Update Docs | document/SKILL.md (task mode) | Task Steps 0–5 |
+| 14 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
+| 15 | Performance Test | test-run/SKILL.md (perf mode) | Steps 1–5 (optional) |
+| 16 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 17 | Retrospective | retrospective/SKILL.md (cycle-end mode) | Steps 1–4 |

 ## Detection Rules

@@ -80,12 +86,12 @@ If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FIN
 ---

 **Step 4 — UI Design (conditional)**
-Condition (folder fallback): `_docs/02_document/architecture.md` exists AND `_docs/02_tasks/todo/` does not exist or has no task files.
+Condition (folder fallback): `_docs/02_document/architecture.md` exists AND `_docs/02_document/tests/traceability-matrix.md` does not exist.
 State-driven: reached by auto-chain from Step 3.

 Action: Read and execute `.cursor/skills/ui-design/SKILL.md`. The skill runs its own **Applicability Check**, which handles UI project detection and the user's A/B choice. It returns one of:

- `outcome: completed` → mark Step 4 as `completed`, auto-chain to Step 5 (Decompose).
+- `outcome: completed` → mark Step 4 as `completed`, auto-chain to Step 5 (Test Spec).
 - `outcome: skipped, reason: not-a-ui-project` → mark Step 4 as `skipped`, auto-chain to Step 5.
 - `outcome: skipped, reason: user-declined` → mark Step 4 as `skipped`, auto-chain to Step 5.

@@ -93,34 +99,162 @@ The autodev no longer inlines UI detection heuristics — they live in `ui-desig

 ---

-**Step 5 — Decompose**
-Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/todo/` does not exist or has no task files
+**Step 5 — Test Spec**
+Condition (folder fallback): `_docs/02_document/FINAL_report.md` exists AND `_docs/02_document/architecture.md` exists AND `_docs/02_document/tests/traceability-matrix.md` does not exist.
+State-driven: reached by auto-chain from Step 4 (completed or skipped).

-Action: Read and execute `.cursor/skills/decompose/SKILL.md`
+Action: Read and execute `.cursor/skills/test-spec/SKILL.md`.
+
+This step converts the greenfield problem statement, acceptance criteria, solution, architecture, component docs, and UI design artifacts (if any) into test specifications before implementation begins. The test spec should cover unit, integration, blackbox, and e2e scenarios where those levels are applicable to the project.
+
+---
+
+**Step 6 — Decompose**
+Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_document/tests/traceability-matrix.md` exists AND `_docs/02_tasks/todo/` does not exist or has no implementation task files.
+
+Action: Invoke `.cursor/skills/decompose/SKILL.md` for **implementation task decomposition**. The greenfield flow selects the implementation entrypoint before handing off: Bootstrap Structure, Module Layout, Component Task Decomposition, and Cross-Task Verification.
+
+Do not invoke Blackbox Test Task Decomposition from Step 6. Test tasks are intentionally deferred to Step 9 (Decompose Tests) so the first implementation batch stays focused on product functionality and Step 8 can revise testability before test task files exist.

 If `_docs/02_tasks/` subfolders have some task files already, the decompose skill's resumability handles it.

 ---

-**Step 6 — Implement**
-Condition: `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/` does not contain any `implementation_report_*.md` file
+**Step 7 — Implement**
+Condition: `_docs/02_tasks/todo/` contains implementation task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/` does not contain a valid product implementation report.

-Action: Read and execute `.cursor/skills/implement/SKILL.md`
+Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Product implementation**.
+
+The implement skill must run its **Product Implementation Completeness Gate** before it writes any final product implementation report. This gate compares completed product task specs, architecture/component promises, and actual source code so scaffold-only implementations cannot advance to Step 8. A final product implementation report without `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` is incomplete and must not be treated as Step 7 completion.

 If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues. The FINAL report filename is context-dependent — see implement skill documentation for naming convention.

+For folder fallback, **implementation task files** means task specs that are not test-only specs: exclude `*_test_infrastructure.md` and task specs whose `**Component**` or `**Epic**` identifies `Blackbox Tests`.
+
+For folder fallback, a **product implementation report** is any `_docs/03_implementation/implementation_report_*.md` file except `_docs/03_implementation/implementation_report_tests.md` and refactor reports. It is valid for greenfield progression only when:
+- the matching `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists,
+- that completeness report does not contain unresolved `FAIL` classifications, and
+- `_docs/02_tasks/todo/` contains no pending implementation task files.
+
+If a product report exists but any of those validity checks fail, treat product implementation as incomplete and stay in Step 7.
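The folder-fallback validity check above can be sketched as a small predicate. This is a minimal illustration, not the implement skill's actual logic: the function names are hypothetical, refactor reports are assumed to carry `refactor` in their file name, an unresolved classification is assumed to appear as the bare word `FAIL`, and "matching" is simplified to "at least one completeness report exists".

```python
import re
from pathlib import Path


def product_reports(impl_dir: Path) -> list[Path]:
    """implementation_report_*.md, minus the tests report and refactor reports.

    Assumption: refactor reports are recognisable by 'refactor' in the name.
    """
    return sorted(
        p for p in impl_dir.glob("implementation_report_*.md")
        if p.name != "implementation_report_tests.md" and "refactor" not in p.name
    )


def completeness_reports(impl_dir: Path) -> list[Path]:
    """All implementation_completeness_cycle[N]_report.md files."""
    return sorted(impl_dir.glob("implementation_completeness_cycle*_report.md"))


def has_unresolved_fail(report: Path) -> bool:
    """Assumption: an unresolved classification shows up as the word FAIL."""
    return re.search(r"\bFAIL\b", report.read_text(encoding="utf-8")) is not None


def product_implementation_valid(impl_dir: Path, pending_impl_tasks: int) -> bool:
    """True only when all three validity conditions from the text hold."""
    if not impl_dir.is_dir() or not product_reports(impl_dir):
        return False
    completeness = completeness_reports(impl_dir)
    if not completeness:
        return False
    if any(has_unresolved_fail(r) for r in completeness):
        return False
    return pending_impl_tasks == 0
```

A caller would count the pending implementation task files in `_docs/02_tasks/todo/` separately and pass that count in; when the predicate returns `False`, the flow stays in Step 7.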
+
 ---

-**Step 7 — Run Tests**
-Condition (folder fallback): `_docs/03_implementation/` contains an `implementation_report_*.md` file.
-State-driven: reached by auto-chain from Step 6.
+**Step 8 — Code Testability Revision**
+Condition (folder fallback): `_docs/03_implementation/` contains a valid product implementation report AND `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists without unresolved `FAIL` classifications AND `_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` does not exist AND `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` does not exist AND `_docs/03_implementation/implementation_report_tests.md` does not exist AND `_docs/02_tasks/todo/` does not contain test task files.
+State-driven: reached by auto-chain from Step 7.
+
+**Purpose**: verify the newly built code can be exercised by the planned tests before writing the test suite. Greenfield code should be testable by design; this step catches accidental hardcoded paths, singletons, direct external service construction, or other implementation choices that would make meaningful tests impossible.
+
+**Scope — MINIMAL, SURGICAL fixes**: this is not a general refactor. It is the smallest set of changes required to make the implemented code runnable under tests.
+
+**Allowed changes** in this phase:
+- Replace hardcoded URLs / file paths / credentials / magic numbers with env vars or constructor arguments.
+- Extract narrow interfaces for components that need stubbing in tests.
+- Add optional constructor parameters for dependency injection; default to the existing behavior so callers do not break.
+- Wrap global singletons in thin accessors that tests can override.
+- Split a function ONLY when necessary to stub one of its collaborators — do not split for clarity alone.
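As an illustration of the allowed changes, the sketch below combines two of them: a hardcoded URL moves to an environment variable, and an optional constructor parameter lets a test inject a stub. Everything named here (`StatusClient`, `STATUS_BASE_URL`, the `/health` path, the default URL) is hypothetical and only stands in for whatever the implemented code actually contains.

```python
import os
from typing import Callable, Optional
from urllib.request import urlopen

# Hypothetical value that used to be hardcoded inside the class.
DEFAULT_BASE_URL = "http://localhost:8080"


def _http_get(url: str) -> bytes:
    """Existing behavior: a real HTTP GET."""
    with urlopen(url) as resp:
        return resp.read()


class StatusClient:
    def __init__(
        self,
        base_url: Optional[str] = None,
        fetch: Optional[Callable[[str], bytes]] = None,
    ) -> None:
        # Env var replaces the hardcoded URL; callers passing nothing
        # still get the previous default.
        self.base_url = base_url or os.environ.get("STATUS_BASE_URL", DEFAULT_BASE_URL)
        # Optional injection point; defaults to the real HTTP call.
        self._fetch = fetch or _http_get

    def is_healthy(self) -> bool:
        return self._fetch(f"{self.base_url}/health").strip() == b"ok"
```

Existing call sites such as `StatusClient()` keep working unchanged, while a test can pass `fetch=lambda url: b"ok"` and never touch the network. No public API is renamed and no logic changes, so the edit stays inside the scope of this phase.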
+
+**NOT allowed** in this phase (defer to a later refactor task):
+- Renaming public APIs.
+- Moving code between files unless strictly required for isolation.
+- Changing algorithms or business logic.
+- Restructuring module boundaries or rewriting layers.
+
+Action: Analyze the codebase against the test specs to determine whether the code can be tested as-is.
+
+1. Read `_docs/02_document/tests/traceability-matrix.md` and all test scenario files in `_docs/02_document/tests/`.
+2. For each test scenario, check whether the code under test can be exercised in isolation. Look for:
+   - Hardcoded file paths or directory references
+   - Hardcoded configuration values (URLs, credentials, magic numbers)
+   - Global mutable state that cannot be overridden
+   - Tight coupling to external services without abstraction
+   - Missing dependency injection or non-configurable parameters
+   - Direct file system operations without path configurability
+   - Inline construction of heavy dependencies (models, clients)
+3. If ALL scenarios are testable as-is:
+   - Create `_docs/04_refactoring/01-testability-refactoring/`
+   - Write `_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` with the scenarios reviewed and outcome "Code is testable — no changes needed"
+   - Mark Step 8 as `completed` with outcome "Code is testable — no changes needed"
+   - Auto-chain to Step 9 (Decompose Tests)
+4. If testability issues are found:
+   - Create `_docs/04_refactoring/01-testability-refactoring/`
+   - Write `list-of-changes.md` in that directory using the refactor skill template (`.cursor/skills/refactor/templates/list-of-changes.md`), with:
+     - **Mode**: `guided`
+     - **Source**: `autodev-greenfield-testability-analysis`
+     - One change entry per testability issue found (change ID, file paths, problem, proposed change, risk, dependencies). Each entry must fit the allowed-changes list above; reject entries that drift into full refactor territory and log them under "Deferred refactor candidates" instead.
+   - Invoke the refactor skill in **guided mode**: read and execute `.cursor/skills/refactor/SKILL.md` with the `list-of-changes.md` as input
+   - Phase 3 (Safety Net) is skipped for this testability run because the test suite has not been implemented yet
+   - After execution, surface `RUN_DIR/testability_changes_summary.md` to the user via the Choose format (accept / request follow-up) before auto-chaining
+   - Copy or save the accepted summary as `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` so folder fallback can detect Step 8 completion
+   - Mark Step 8 as `completed`
+   - Auto-chain to Step 9 (Decompose Tests)
+
+---
+
+**Step 9 — Decompose Tests**
+Condition (folder fallback): `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND `_docs/03_implementation/` contains a valid product implementation report AND `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` exists without unresolved `FAIL` classifications AND (`_docs/04_refactoring/01-testability-refactoring/testability_assessment.md` exists OR `_docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md` exists) AND (`_docs/02_tasks/todo/` does not exist or has no test task files) AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
+State-driven: reached by auto-chain from Step 8.
+
+Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
+1. Run Step 1t (test infrastructure bootstrap)
+2. Run Step 3 (blackbox/e2e-capable test task decomposition)
+3. Run Step 4 (cross-verification against test coverage)
+
+If `_docs/02_tasks/` subfolders have some task files already, the decompose skill's resumability handles it — it appends test tasks alongside existing completed implementation tasks.
+
+---
+
+**Step 10 — Implement Tests**
+Condition (folder fallback): `_docs/02_tasks/todo/` contains test task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/implementation_report_tests.md` does not exist.
+State-driven: reached by auto-chain from Step 9.
+
+Action: Invoke `.cursor/skills/implement/SKILL.md` with task selection context **Test implementation**.
+
+The implement skill reads only test tasks from `_docs/02_tasks/todo/` and implements them.
+
+If `_docs/03_implementation/` has batch reports, the implement skill detects completed test tasks and continues.
+
+For folder fallback, **test task files** means `*_test_infrastructure.md` plus task specs whose `**Component**` or `**Epic**` identifies `Blackbox Tests`.
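The split between test task files and implementation task files can be sketched as one classifier. This is an assumption-laden illustration: the function name is hypothetical, and the field layout (`**Component**: value` on a single line) is a guess at how task specs are written, not something the flow defines.

```python
import re

# Assumed layout: "**Component**: Blackbox Tests" or "**Epic**: ..." on one line.
_FIELD = re.compile(r"^\*\*(Component|Epic)\*\*[ \t]*:?[ \t]*(.*)$", re.MULTILINE)


def is_test_task(file_name: str, spec_text: str) -> bool:
    """True for *_test_infrastructure.md or specs tagged as Blackbox Tests."""
    if file_name.endswith("_test_infrastructure.md"):
        return True
    return any("Blackbox Tests" in m.group(2) for m in _FIELD.finditer(spec_text))


def is_implementation_task(file_name: str, spec_text: str) -> bool:
    """Implementation task files are the complement of test task files."""
    return not is_test_task(file_name, spec_text)
```

Defining one predicate and its complement keeps the Step 7 and Step 10 folder-fallback checks from drifting apart.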
+
+---
+
+**Step 11 — Run Tests**
+Condition (folder fallback): `_docs/03_implementation/implementation_report_tests.md` exists.
+State-driven: reached by auto-chain from Step 10.

 Action: Read and execute `.cursor/skills/test-run/SKILL.md`

+Verifies the implemented unit, integration, blackbox, and e2e tests pass before proceeding to spec and documentation sync. This is a hard product gate, not a harness-smoke gate: e2e/blackbox tests must exercise the actual implemented system through public runtime boundaries and compare actual outputs against `_docs/00_problem/input_data/expected_results/results_report.md` or referenced machine-readable expected-result files. Stubs are allowed only for external systems outside the product boundary; missing internal product implementation must fail or block the gate and send the flow back to Implement.
+
 ---

-**Step 8 — Security Audit (optional)**
-State-driven: reached by auto-chain from Step 7.
+**Step 12 — Test-Spec Sync**
+State-driven: reached by auto-chain from Step 11. Requires `_docs/02_document/tests/traceability-matrix.md` to exist — if missing, mark Step 12 `skipped` (see Action below).
+
+Action: Read and execute `.cursor/skills/test-spec/SKILL.md` in **cycle-update mode**. Pass the completed implementation task specs, completed test task specs, and implementation reports as inputs.
+
+The skill appends implementation-learned acceptance criteria, scenarios, and NFR updates to the existing test-spec files without rewriting unaffected sections. If `traceability-matrix.md` is missing, mark Step 12 as `skipped` — the next `/test-spec` full run will regenerate it.
+
+After completion, auto-chain to Step 13 (Update Docs).
+
+---
+
+**Step 13 — Update Docs**
+State-driven: reached by auto-chain from Step 12 (completed or skipped). Requires `_docs/02_document/` to contain existing documentation — if missing, mark Step 13 `skipped` (see Action below).
+
+Action: Read and execute `.cursor/skills/document/SKILL.md` in **Task mode**. Pass all completed implementation and test task spec files plus the implementation reports.
+
+The document skill in Task mode updates affected module docs, component docs, system-level docs, and test documentation without redoing full discovery, verification, or problem extraction.
+
+If `_docs/02_document/` does not contain existing docs, mark Step 13 as `skipped`.
+
+After completion, auto-chain to Step 14 (Security Audit).
+
+---
+
+**Step 14 — Security Audit (optional)**
+State-driven: reached by auto-chain from Step 13 (completed or skipped).

 Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Gate") with:
 - question:        `Run security audit before deploy?`
@@ -128,12 +262,12 @@ Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Ga
 - option-b-label:  `Skip — proceed directly to deploy`
 - recommendation:  `A — catches vulnerabilities before production`
 - target-skill:    `.cursor/skills/security/SKILL.md`
- next-step:       Step 9 (Performance Test)
+- next-step:       Step 15 (Performance Test)

 ---

-**Step 9 — Performance Test (optional)**
-State-driven: reached by auto-chain from Step 8.
+**Step 15 — Performance Test (optional)**
+State-driven: reached by auto-chain from Step 14 (completed or skipped).

 Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Gate") with:
 - question:        `Run performance/load tests before deploy?`
@@ -141,30 +275,30 @@ Action: Apply the **Optional Skill Gate** (`protocols.md` → "Optional Skill Ga
 - option-b-label:  `Skip — proceed directly to deploy`
 - recommendation:  `A or B — base on whether acceptance criteria include latency, throughput, or load requirements`
 - target-skill:    `.cursor/skills/test-run/SKILL.md` in **perf mode** (the skill handles runner detection, threshold comparison, and its own A/B/C gate on threshold failures)
- next-step:       Step 10 (Deploy)
+- next-step:       Step 16 (Deploy)

 ---

-**Step 10 — Deploy**
-State-driven: reached by auto-chain from Step 9 (after Step 9 is completed or skipped).
+**Step 16 — Deploy**
+State-driven: reached by auto-chain from Step 15 (after Step 15 is completed or skipped).

 Action: Read and execute `.cursor/skills/deploy/SKILL.md`.

-After the deploy skill completes successfully, mark Step 10 as `completed` and auto-chain to Step 11 (Retrospective).
+After the deploy skill completes successfully, mark Step 16 as `completed` and auto-chain to Step 17 (Retrospective).

 ---

-**Step 11 — Retrospective**
-State-driven: reached by auto-chain from Step 10.
+**Step 17 — Retrospective**
+State-driven: reached by auto-chain from Step 16.

 Action: Read and execute `.cursor/skills/retrospective/SKILL.md` in **cycle-end mode**. This closes the cycle's feedback loop by folding metrics into `_docs/06_metrics/retro_<date>.md` and appending the top-3 lessons to `_docs/LESSONS.md`.

-After retrospective completes, mark Step 11 as `completed` and enter "Done" evaluation.
+After retrospective completes, mark Step 17 as `completed` and enter "Done" evaluation.

 ---

 **Done**
-State-driven: reached by auto-chain from Step 11. (Sanity check: `_docs/04_deploy/` should contain all expected artifacts — containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md.)
+State-driven: reached by auto-chain from Step 17. (Sanity check: `_docs/04_deploy/` should contain all expected artifacts — containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md, deploy_scripts.md.)

 Action: Report project completion with summary. Then **rewrite the state file** so the next `/autodev` invocation enters the feature-cycle loop in the existing-code flow:

@@ -191,47 +325,65 @@ On the next invocation, Flow Resolution rule 1 reads `flow: existing-code` and r
 | Research (2) | Auto-chain → Research Decision (ask user: another round or proceed?) |
 | Research Decision → proceed | Auto-chain → Plan (3) |
 | Plan (3) | Auto-chain → UI Design detection (4) |
-| UI Design (4, done or skipped) | Auto-chain → Decompose (5) |
-| Decompose (5) | **Session boundary** — suggest new conversation before Implement |
-| Implement (6) | Auto-chain → Run Tests (7) |
-| Run Tests (7, all pass) | Auto-chain → Security Audit choice (8) |
-| Security Audit (8, done or skipped) | Auto-chain → Performance Test choice (9) |
-| Performance Test (9, done or skipped) | Auto-chain → Deploy (10) |
-| Deploy (10) | Auto-chain → Retrospective (11) |
-| Retrospective (11) | Report completion; rewrite state to existing-code flow, step 9 |
+| UI Design (4, done or skipped) | Auto-chain → Test Spec (5) |
+| Test Spec (5) | Auto-chain → Decompose (6) |
+| Decompose (6) | **Session boundary** — suggest new conversation before Implement |
+| Implement (7) | Auto-chain only after Product Implementation Completeness Gate passes → Code Testability Revision (8) |
+| Code Testability Revision (8) | Auto-chain → Decompose Tests (9) |
+| Decompose Tests (9) | **Session boundary** — suggest new conversation before Implement Tests |
+| Implement Tests (10) | Auto-chain → Run Tests (11) |
+| Run Tests (11, all pass) | Auto-chain → Test-Spec Sync (12) |
+| Test-Spec Sync (12, done or skipped) | Auto-chain → Update Docs (13) |
+| Update Docs (13, done or skipped) | Auto-chain → Security Audit choice (14) |
+| Security Audit (14, done or skipped) | Auto-chain → Performance Test choice (15) |
+| Performance Test (15, done or skipped) | Auto-chain → Deploy (16) |
+| Deploy (16) | Auto-chain → Retrospective (17) |
+| Retrospective (17) | Report completion; rewrite state to existing-code flow, step 9 |
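The new rows of the table can be read as a transition map plus a set of session boundaries. The sketch below models only Steps 4 to 17 as listed above; the names `AUTO_CHAIN`, `SESSION_BOUNDARY_AFTER`, and `next_action` are hypothetical, and the preconditions on Steps 7 and 11 are noted in comments rather than enforced.

```python
from typing import Optional

# Step N -> step that follows it. None means "report completion".
# Step 7 chains only after the Product Implementation Completeness Gate passes;
# Step 11 chains only when all tests pass. Neither check is modelled here.
AUTO_CHAIN: dict[int, Optional[int]] = {
    4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12,
    12: 13, 13: 14, 14: 15, 15: 16, 16: 17, 17: None,
}

# After these steps the flow suggests a new conversation instead of chaining.
SESSION_BOUNDARY_AFTER = {6, 9}


def next_action(step: int) -> tuple[Optional[int], bool]:
    """Return (next step, whether to suggest a new conversation first)."""
    return AUTO_CHAIN[step], step in SESSION_BOUNDARY_AFTER
```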

 ## Status Summary — Step List

 Flow name: `greenfield`. Render using the banner template in `protocols.md` → "Banner Template (authoritative)". No header-suffix, current-suffix, or footer-extras — all empty for this flow.

-| # | Step Name          | Extra state tokens (beyond the shared set) |
-|---|--------------------|--------------------------------------------|
-| 1 | Problem            | — |
-| 2 | Research           | `DONE (N drafts)` |
-| 3 | Plan               | — |
-| 4 | UI Design          | — |
-| 5 | Decompose          | `DONE (N tasks)` |
-| 6 | Implement          | `IN PROGRESS (batch M of ~N)` |
-| 7 | Run Tests          | `DONE (N passed, M failed)` |
-| 8 | Security Audit     | — |
-| 9 | Performance Test   | — |
-| 10 | Deploy            | — |
-| 11 | Retrospective     | — |
+| # | Step Name                   | Extra state tokens (beyond the shared set) |
+|---|-----------------------------|--------------------------------------------|
+| 1 | Problem                     | — |
+| 2 | Research                    | `DONE (N drafts)` |
+| 3 | Plan                        | — |
+| 4 | UI Design                   | — |
+| 5 | Test Spec                   | — |
+| 6 | Decompose                   | `DONE (N tasks)` |
+| 7 | Implement                   | `IN PROGRESS (batch M of ~N)` |
+| 8 | Code Testability Revision   | — |
+| 9 | Decompose Tests             | `DONE (N tasks)` |
+| 10 | Implement Tests            | `IN PROGRESS (batch M)` |
+| 11 | Run Tests                  | `DONE (N passed, M failed)` |
+| 12 | Test-Spec Sync             | — |
+| 13 | Update Docs                | — |
+| 14 | Security Audit             | — |
+| 15 | Performance Test           | — |
+| 16 | Deploy                     | — |
+| 17 | Retrospective              | — |

-All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 8, 9 additionally accept `SKIPPED`.
+All rows also accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4, 12, 13, 14, 15 additionally accept `SKIPPED`.

 Row rendering format (step-number column is right-padded to 2 characters for alignment):

 ```
- Step 1   Problem             [<state token>]
- Step 2   Research            [<state token>]
- Step 3   Plan                [<state token>]
- Step 4   UI Design           [<state token>]
- Step 5   Decompose           [<state token>]
- Step 6   Implement           [<state token>]
- Step 7   Run Tests           [<state token>]
- Step 8   Security Audit      [<state token>]
- Step 9   Performance Test    [<state token>]
- Step 10  Deploy              [<state token>]
- Step 11  Retrospective       [<state token>]
+ Step 1   Problem                   [<state token>]
+ Step 2   Research                  [<state token>]
+ Step 3   Plan                      [<state token>]
+ Step 4   UI Design                 [<state token>]
+ Step 5   Test Spec                 [<state token>]
+ Step 6   Decompose                 [<state token>]
+ Step 7   Implement                 [<state token>]
+ Step 8   Code Testability Rev.     [<state token>]
+ Step 9   Decompose Tests           [<state token>]
+ Step 10  Implement Tests           [<state token>]
+ Step 11  Run Tests                 [<state token>]
+ Step 12  Test-Spec Sync            [<state token>]
+ Step 13  Update Docs               [<state token>]
+ Step 14  Security Audit            [<state token>]
+ Step 15  Performance Test          [<state token>]
+ Step 16  Deploy                    [<state token>]
+ Step 17  Retrospective             [<state token>]
 ```
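A row in the format above can be produced with two left-justified fields. This is a minimal sketch: `render_row` is a hypothetical helper, and the name-column width of 26 is inferred from the sample rows rather than stated by the banner template.

```python
def render_row(step: int, name: str, state: str, name_width: int = 26) -> str:
    """Render one status row; the step number is padded to 2 characters."""
    # Names longer than name_width are not truncated here; the sample
    # abbreviates them by hand (e.g. "Code Testability Rev.").
    return f" Step {step:<2}  {name:<{name_width}}[{state}]"
```

For example, `render_row(10, "Implement Tests", "IN PROGRESS (batch 2)")` lines up with `render_row(1, "Problem", "DONE")` because both the number and the name are padded before the bracketed token.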
@@ -5,7 +5,8 @@ Workflow for **meta-repositories** — repos that aggregate multiple components
 This flow differs fundamentally from `greenfield` and `existing-code`:

 - **No problem/research/plan phases** — meta-repos don't build features, they coordinate existing ones
- **No test spec / implement / run tests** — the meta-repo has no code to test
+- **No test spec / run tests** — the meta-repo has no code to test
+- **`implement` is scoped to suite-level work only** — cross-repo concerns, repo/folder renames, suite-root infra additions (e.g., `.gitmodules`, `_infra/`, suite `e2e/`). Per-component implementation lives in each component's own workspace `/autodev` cycle. The meta-repo's implement step (Step 3.5) executes only when `_docs/tasks/todo/` is non-empty AND the user explicitly opts in; placement is **before** the sync skills so subsequent Doc/E2E/CICD sync propagates the post-implementation state.
 - **No `_docs/00_problem/` artifacts** — documentation target is `_docs/*.md` unified docs, not per-feature `_docs/NN_feature/` folders
 - **Primary artifact is `_docs/_repo-config.yaml`** — generated by `monorepo-discover`, read by every other step

@@ -15,8 +16,11 @@ This flow differs fundamentally from `greenfield` and `existing-code`:
 |------|------|-----------|-------------------|
 | 1 | Discover | monorepo-discover/SKILL.md | Phase 1–10 |
 | 2 | Config Review | (human checkpoint, no sub-skill) | — |
+| 2.5 | Glossary & Architecture Vision | (inline, no sub-skill) | Steps 1–5 |
 | 3 | Status | monorepo-status/SKILL.md | Sections 1–5 |
+| 3.5 | Suite Implement | implement/SKILL.md (suite-level invocation context) | Steps 1–14 + 16 (Step 14.5 + Step 15 skipped); conditional on `_docs/tasks/todo/` non-empty AND user opt-in |
 | 4 | Document Sync | monorepo-document/SKILL.md | Phase 1–7 (conditional on doc drift) |
+| 4.5 | Integration Test Sync | monorepo-e2e/SKILL.md | Phase 1–6 (conditional on suite-e2e drift; skipped if `suite_e2e:` block absent in config) |
 | 5 | CICD Sync | monorepo-cicd/SKILL.md | Phase 1–7 (conditional on CI drift) |
 | 6 | Loop | (auto-return to Step 3 on next invocation) | — |

@@ -58,17 +62,121 @@ Action: This is a **hard session boundary**. The skill cannot proceed until a hu
 ══════════════════════════════════════
 ```

- If user picks A → verify `confirmed_by_user: true` is now set in the config. If still `false`, re-ask. If true, auto-chain to **Step 3 (Status)**.
+- If user picks A → verify `confirmed_by_user: true` is now set in the config. If still `false`, re-ask. If true, auto-chain to **Step 2.5 (Glossary & Architecture Vision)**.
 - If user picks B → mark Step 2 as `in_progress`, update state file, end the session. Tell the user to invoke `/autodev` again after reviewing.

 **Do NOT auto-flip `confirmed_by_user`.** Only the human does that.

 ---

+**Step 2.5 — Glossary & Architecture Vision** (one-shot)
+
+Condition (folder fallback): `_docs/_repo-config.yaml` exists AND `confirmed_by_user: true` AND (`_docs/glossary.md` does NOT exist OR the cross-cutting architecture doc identified in `docs.cross_cutting` does NOT contain a `## Architecture Vision` section).
+State-driven: reached by auto-chain from Step 2 (user picked A).
+
+**Goal**: Capture meta-repo-wide terminology and the user's architecture vision **once**, after the config is confirmed but before any sync skill runs. Without this, `monorepo-document` will faithfully propagate per-component changes but never surface a unified mental model of the meta-repo to the user, and the AI will keep re-inferring the same project terminology on every invocation.
+
+**Why inline (no sub-skill)**: `monorepo-discover` is hard-guarded to write only `_repo-config.yaml`; `monorepo-document` only edits *existing* docs. Glossary and architecture-vision creation is a first-time, user-confirmed write that crosses both guarantees, so it lives directly in the flow.
+
+**Inputs**:
+- `_docs/_repo-config.yaml` (component list, doc map, conventions, assumptions log)
+- Cross-cutting docs listed under `docs.cross_cutting` (existing architecture doc, if any)
+- Each component's `primary_doc` (read-only, for terminology + responsibility extraction)
+- Root `README.md` if `repo.root_readme` is referenced
+
+**Outputs**:
+- `_docs/glossary.md` (or `<docs.root>/glossary.md` if `docs.root` ≠ `_docs/`) — NEW
+- The cross-cutting architecture doc updated in place: a `## Architecture Vision` section is prepended (or merged into an existing "Vision" / "Overview" heading)
+- One new entry appended to `_docs/_repo-config.yaml` under `assumptions_log:` recording the run
+- A new top-level config entry: `glossary_doc: <path>` so future `monorepo-status` and `monorepo-document` runs treat the glossary as a known cross-cutting doc
+
+**Procedure**:
+
+1. **Draft glossary** from `_repo-config.yaml` + each component's primary doc. Include:
+   - Component codenames as they appear in the config (`name` field) and any rename pairs the user noted in `unresolved:` resolutions
+   - Domain terms that recur across ≥2 component docs
+   - Acronyms / abbreviations
+   - Convention names from `conventions:` (e.g., commit prefix, deployment tier names)
+   - Stakeholder personas if cross-cutting docs reference them
+   Each entry: one-line definition + source (`source: components.<name>.primary_doc` or `source: _repo-config.yaml conventions`). Skip generic terms.
+
+2. **Draft architecture vision** from the meta-repo perspective:
+   - **One paragraph**: what the system as a whole is, what each component contributes, the runtime topology (one binary / N services / N clients + 1 server / hybrid), how components communicate (REST / gRPC / queue / DB-shared / file-shared)
+   - **Components & responsibilities** (one-line each), pulled directly from `_repo-config.yaml` `components:` list
+   - **Cross-cutting concerns ownership**: which doc owns which concern (auth, schema, deployment, etc.) — pulled from `docs.cross_cutting[].owns`
+   - **Architectural principles / non-negotiables** the user has implied across components (e.g., "all components share a single Postgres", "submodules own their own CI", "deployment is per-tier, not per-component")
+   - **Open questions / structural drift signals**: components missing from `docs.cross_cutting`, components in registry but not in config (registry mismatch), or contradictions between component primary docs
+
+3. **Present condensed view** to the user (NOT the full draft files):
+
+   ```
+   ══════════════════════════════════════
+    REVIEW: Meta-Repo Glossary + Architecture Vision
+   ══════════════════════════════════════
+    Glossary (N terms drafted from config + component docs):
+      - <Term>: <one-line definition>
+      - ...
+
+    Architecture Vision — meta-repo level:
+      <one-paragraph synopsis>
+
+      Components / responsibilities:
+        - <component>: <one-line>
+        - ...
+
+      Cross-cutting ownership:
+        - <concern> → <doc>
+        - ...
+
+      Principles / non-negotiables:
+        - <principle>
+        - ...
+
+      Open questions / drift signals:
+        - <q1>
+        - <q2>
+   ══════════════════════════════════════
+    A) Looks correct — write the files
+    B) Add / correct entries (provide diffs)
+    C) Resolve open questions / drift signals first
+   ══════════════════════════════════════
+    Recommendation: pick C if drift signals exist;
+                    otherwise B if components or principles
+                    don't match your intent; A only when
+                    the inferred vision is exactly right.
+   ══════════════════════════════════════
+   ```
+
+4. **Iterate**:
+   - On B → integrate the user's diffs/additions, re-present, loop until A.
+   - On C → ask the listed open questions in one batch, integrate answers, re-present.
+   - **Do NOT proceed to step 5 until the user picks A.**
+
+5. **Save**:
+   - Write `_docs/glossary.md` (alphabetical) with `**Status**: confirmed-by-user` + date.
+   - Update the cross-cutting architecture doc identified in `docs.cross_cutting` (or create one at `_docs/00_architecture.md` if none exists and the user's option-B input named one): prepend `## Architecture Vision` with the confirmed paragraph + components + ownership + principles. Preserve every existing H2 below verbatim.
+   - Append to `_docs/_repo-config.yaml`:
+     - Top-level `glossary_doc: <path-relative-to-repo-root>` (sibling of `docs.root`)
+     - New `assumptions_log:` entry: `{ date: <today>, skill: autodev-meta-repo Step 2.5, run_notes: "Captured glossary + architecture vision", assumptions: [...] }`
+   - Do NOT flip any `confirmed: false` → `confirmed: true` in the config; this step writes its own confirmed artifact, it does not retroactively confirm config inferences.
+
+**Self-verification**:
+- [ ] Every glossary entry traces to either the config or a component primary doc
+- [ ] Every component listed in the vision matches a `components:` entry in the config
+- [ ] All open questions are answered or explicitly deferred (with the user's acknowledgement)
+- [ ] The cross-cutting architecture doc still contains every H2 it had before this step
+- [ ] User picked option A on the latest condensed view
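The "prepend and preserve every H2" part of the Save step, together with the fourth self-verification item, can be sketched as follows. The helper names are hypothetical; the sketch assumes "prepend" means "insert before the first existing H2", it does not handle merging into an existing "Vision" / "Overview" heading, and it treats any line starting with `## ` as a heading, including lines inside fenced code.

```python
import re

VISION_HEADING = "## Architecture Vision"


def h2_headings(markdown: str) -> list[str]:
    """Every line that looks like an H2 heading, in document order."""
    return [line.rstrip() for line in markdown.splitlines() if line.startswith("## ")]


def prepend_vision(doc: str, vision_body: str) -> str:
    """Insert the vision section before the first H2; keep all other H2s."""
    before = h2_headings(doc)
    if VISION_HEADING in before:
        return doc  # idempotent: the section already exists

    match = re.search(r"^## ", doc, flags=re.MULTILINE)
    cut = match.start() if match else len(doc)
    head = doc[:cut]
    if head and not head.endswith("\n\n"):
        head = head.rstrip("\n") + "\n\n"  # keep a blank line before the heading

    updated = f"{head}{VISION_HEADING}\n\n{vision_body.strip()}\n\n{doc[cut:]}"

    # Self-verification: every H2 present before must still be present.
    after = h2_headings(updated)
    missing = [h for h in before if h not in after]
    if missing:
        raise ValueError(f"H2 headings lost during update: {missing}")
    return updated
```

Running the check inside the same function that performs the write means a lost heading is caught before the file is saved, not at review time.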
+
+**Idempotency**: if `_docs/glossary.md` exists AND the architecture doc already has a `## Architecture Vision` section, this step is **skipped on re-invocation**. To refresh, the user invokes `/autodev` after deleting `glossary.md` (or after running `monorepo-discover` with structural changes that justify a re-confirmation).
+
+After completion, auto-chain to **Step 3 (Status)**.
+
+---
+
 **Step 3 — Status**

-Condition (folder fallback): `_docs/_repo-config.yaml` exists AND `confirmed_by_user: true`.
-State-driven: reached by auto-chain from Step 2 (user picked A), or entered on any re-invocation after a completed cycle.
+Condition (folder fallback): `_docs/_repo-config.yaml` exists AND `confirmed_by_user: true` AND (`_docs/glossary.md` exists OR `glossary_doc:` is recorded in the config).
+State-driven: reached by auto-chain from Step 2.5, or entered on any re-invocation after a completed cycle.

 Action: Read and execute `.cursor/skills/monorepo-status/SKILL.md`.

@@ -78,11 +186,16 @@ The status report identifies:
 - Registry/config mismatches
 - Unresolved questions

-Based on the report, auto-chain branches:
+Based on the report, auto-chain branches in this evaluation order (first match wins):

- If **doc drift** found → auto-chain to **Step 4 (Document Sync)**
- Else if **CI drift** (only) found → auto-chain to **Step 5 (CICD Sync)**
- Else if **registry mismatch** found (new components not in config) → present Choose format:
+1. **Registry mismatch** (new components not in config, or config component not in registry) → present the Choose format below FIRST. After the user resolves it (A: refresh discover, B: onboard, C: continue with mismatch acknowledged), proceed to the next rule. This rule has priority because a stale config would mislead Step 3.5's ownership-envelope synthesis and any sync skill's component scope.
+2. **Pre-routing gate (Step 3.5 detection)**: check `_docs/tasks/todo/` for suite-level task files (`*.md` excluding files starting with `_`). If ≥1 task is present, auto-chain to **Step 3.5 (Suite Implement)**. After Step 3.5 returns, rules 3–6 below apply: on A, to the post-implementation state (silent re-status); on B, to the original pre-Step-3.5 Status report.
+3. If **doc drift** found → auto-chain to **Step 4 (Document Sync)**
+4. Else if **CI drift** (only) found → auto-chain to **Step 5 (CICD Sync)**
+5. Else if **suite-e2e drift** found (no doc drift; CI drift, if also flagged, is picked up after Step 4.5) → auto-chain to **Step 4.5 (Integration Test Sync)** (only when `suite_e2e:` block exists in config)
+6. Else → **workflow done for this cycle**.
+
+**Registry mismatch Choose format** (rule 1):

 ```
 ══════════════════════════════════════
@@ -99,7 +212,134 @@ Based on the report, auto-chain branches:
 ══════════════════════════════════════
 ```

- Else → **workflow done for this cycle**. Report "No drift. Meta-repo is in sync." Loop waits for next invocation.
+When rule 6 fires (no drift, no todo tasks), report "No drift. Meta-repo is in sync." and end the cycle. Loop waits for next invocation.
+
+---
+
+**Step 3.5 — Suite Implement**
+
+Condition (folder fallback): `_docs/tasks/todo/` exists AND contains ≥1 file matching `*.md` excluding files starting with `_` (e.g., `_dependencies_table.md` is excluded by convention).
+
+State-driven: reached by auto-chain from Step 3 when the pre-routing gate detected todo tasks. Inserted **before** the sync skills (Step 4 / 4.5 / 5) by deliberate design: implementing renames + cross-repo edits first means the subsequent sync skills propagate the actual landed state rather than the pre-change state, avoiding a second cycle to fix downstream drift.
+
+**Skip condition**: `_docs/tasks/todo/` is empty, missing, or contains only `_*` files. In that case Step 3.5 is skipped entirely and the cycle proceeds with Step 3's existing drift-based routing.
+
+**Goal**: Execute suite-level implementation tasks — cross-repo concerns (e.g., `autopilot` + `ui` + suite `e2e/` cutover in a coordinated change-set), folder renames (e.g., `git mv flights missions` + `.gitmodules` edit + `_infra/` path refs), and suite-root infrastructure additions (e.g., `_infra/dev/docker-compose.dev.yml`). Per-component implementation work stays in each component's own workspace `/autodev` cycle.
+
+**Why this exists**: the meta-repo's existing sync skills (`monorepo-document`, `monorepo-cicd`, `monorepo-e2e`) only **propagate** changes that already landed. They cannot **execute** a task spec. Without Step 3.5, suite-level tickets like AZ-543 (B4 repo rename) or AZ-506 (new dev compose) have no flow path forward — they require operator action outside autodev.
+
+**Inputs**:
+
+- `_docs/tasks/todo/*.md` (excluding `_*`) — task specs in the existing format (`Task` / `Component` / `Dependencies` / `Acceptance criteria` headers)
+- `_docs/_repo-config.yaml` — `components[].path` list, used to compute the suite-level OWNED envelope (workspace root EXCLUDING any path under a component's folder)
+- `_docs/tasks/_dependencies_table.md` — synthesized by this step if missing (see Procedure)
+- `_docs/tasks/_suite_module_layout.md` — synthesized by this step if missing (see Procedure)
+
+**Procedure**:
+
+1. **Detection (already done by Step 3 pre-routing gate)**. List task files in `_docs/tasks/todo/` (excluding `_*`). If 0 → skip Step 3.5. If ≥1 → continue.
+
+2. **Present Choose**:
+
+   ```
+   ══════════════════════════════════════
+    DECISION REQUIRED: <N> suite-level task(s) in _docs/tasks/todo/
+   ══════════════════════════════════════
+    Task(s) detected:
+      - AZ-XXX: <title>           (deps: <list or "—">)
+      - AZ-YYY: <title>           (deps: <list or "—">)
+      ...
+
+    A) Run implement skill on these task(s) now (then continue to Doc / E2E / CICD sync)
+    B) Skip implement this cycle — continue to Doc / E2E / CICD sync without executing tasks
+    C) Pause — review the tasks before deciding (end session, no state changes)
+   ══════════════════════════════════════
+    Recommendation: A — running implement BEFORE syncs means subsequent
+                    sync skills propagate the post-implementation state.
+                    B is appropriate when tasks are blocked on user input
+                    or external coordination. C when the tasks themselves
+                    need owner clarification before execution.
+   ══════════════════════════════════════
+   ```
+
+3. **On user A — Pre-flight**:
+
+   a. **Working tree clean check**. Run `git status --porcelain`. If non-empty, surface to the user with a Choose A/B/C identical to the implement skill's prerequisite gate (commit/stash manually; agent commits as `chore: WIP pre-implement`; abort).
+
+   b. **Synthesize `_docs/tasks/_dependencies_table.md`** if missing. Parse each in-scope task's `Dependencies:` field. Write a minimal table of the form:
+
+      ```markdown
+      # Suite-Level Task Dependencies
+
+      | Task ID | Depends on | Notes |
+      |---------|------------|-------|
+      | AZ-XXX  | (none)     | — |
+      | AZ-YYY  | AZ-XXX     | — |
+      ```
+
+      If a task lists a dependency that is neither in `todo/` nor `done/`, log a warning in the synthesized file but do not block — implement skill's Step 1 (Parse) will surface the issue if it actually blocks execution.
+
+   c. **Synthesize `_docs/tasks/_suite_module_layout.md`** if missing. Default content:
+
+      ```markdown
+      # Suite-Level Module Layout (synthetic)
+
+      Generated by autodev meta-repo Step 3.5. The suite root has no per-feature decomposition; ownership is defined at the component-boundary level only.
+
+      ## Per-Component Mapping
+
+      | Component | Owns                             | Imports from |
+      |-----------|----------------------------------|--------------|
+      | suite     | (workspace root) excluding any path listed under `_repo-config.yaml.components[].path` | (read-only) every component's primary doc + `_docs/*.md` |
+
+      Suite-level tasks operate on: `.gitmodules`, `_infra/**`, `_docs/**` (excluding `_docs/tasks/_*` regenerated files), root `README.md`, `e2e/**` (suite e2e harness only).
+
+      Forbidden paths for suite-level tasks: `<component>/**` for every component listed in `_repo-config.yaml.components[].path` — those edits live in the component's own workspace `/autodev` cycle.
+      ```
+
+   d. **Prepare invocation context**:
+
+      ```
+      suite_level: true
+      TASKS_DIR: _docs/tasks/
+      module_layout_path: _docs/tasks/_suite_module_layout.md
+      ```
+
+4. **Invoke implement skill**. Read and execute `.cursor/skills/implement/SKILL.md` with the prepared context. The skill's "Suite-level invocation context" subsection (added in tandem with this flow change) honors the three context keys above and skips:
+
+   - Step 14.5 (cumulative code review) — no `architecture_compliance_baseline.md` exists at the suite level; cross-task drift is captured by the next `monorepo-status` cycle instead.
+   - Step 15 (Product Implementation Completeness Gate) — the gate's inputs (`_docs/02_document/architecture.md`, `system-flows.md`, `components/*/description.md`) do not exist in the meta-repo artifact layout. Suite tasks are infrastructure / coordination work, not feature implementation.
+
+   All other implement skill steps (1–14, 16) execute unchanged. Tracker integration (Step 5: In Progress, Step 12: In Testing) runs normally.
+
+5. **Post-implement re-status**. After the implement skill completes (last batch committed, all originally-todo tasks moved to `_docs/tasks/done/`), silently re-run Step 3's drift detection logic — do NOT re-render the full Status report; just re-evaluate the drift signals against the post-implementation tree. Then auto-chain per the post-implementation drift findings:
+
+   - Doc drift → Step 4 (Document Sync)
+   - Suite-e2e drift, no doc drift → Step 4.5
+   - CI drift only → Step 5
+   - No drift → cycle complete
+
+   Note: the post-implement re-status is exactly why Step 3.5 is placed before sync. A repo rename will typically introduce doc + CI drift; the auto-chained Step 4 / Step 5 then catches it in the same cycle.
+
+6. **On user B (skip)** → mark Step 3.5 `skipped` in state file. Apply Step 3's original drift-based routing (compute from the pre-Step-3.5 Status report).
+
+7. **On user C (pause)** → end session. Update state to `step: 3.5, status: in_progress, sub_step: {phase: 0, name: awaiting-task-review, detail: "<N> tasks pending review"}`. Tell the user to invoke `/autodev` again after deciding. **Do NOT modify any files** — pre-flight has not run yet.
+
+**Self-verification** (executed before invoking implement):
+
+- [ ] Working tree is clean (or the user explicitly chose B, the agent's `chore: WIP pre-implement` commit, in the pre-flight working-tree Choose)
+- [ ] `_docs/tasks/_dependencies_table.md` exists (synthesized if it didn't)
+- [ ] `_docs/tasks/_suite_module_layout.md` exists (synthesized if it didn't)
+- [ ] All in-scope task files have a `Component:` field (skip + report any that don't — don't guess ownership)
+- [ ] Tracker availability gate satisfied per `protocols.md` (or `tracker: local` previously chosen)
+
+**Failure handling**:
+
+- If implement returns FAILED → standard Failure Handling (`protocols.md`): retry up to 3 times, then escalate.
+- If implement is interrupted mid-batch → next invocation re-detects via the implement skill's resumability protocol (read latest `_docs/03_implementation/suite_batch_*.md`). Step 3.5 itself is reentrant: on re-entry, if `todo/` still has tasks, it presents the Choose again with the remaining set.
+- **Half-applied state risk** (acknowledged): if implement is interrupted between commits, the working tree is clean at the last commit boundary but the in-flight batch is lost. The user is responsible for inspecting and re-invoking. This is intentional — automated rollback of suite-level renames + `.gitmodules` edits is more dangerous than a human-driven recovery.
+
+**Idempotency**: if `_docs/tasks/todo/` becomes empty after this step (all tasks moved to `done/`), the next `/autodev` invocation skips Step 3.5 entirely and proceeds with normal Status → sync flow.

 ---
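
As an illustration of Step 3.5's detection rule and the `_dependencies_table.md` synthesis described in this hunk, the following Python sketch lists the todo task files and renders the minimal table. It is not part of the skill. The function names, the plain `Dependencies:` line format, the `-` placeholder in the Notes column, and reading the task ID from the filename prefix are all assumptions made for the example.

```python
import re
from pathlib import Path

# Values treated as "no dependencies" (assumed spellings, including a dash placeholder).
NO_DEPS = {"", "-", "\u2014", "none", "(none)"}


def list_todo_tasks(todo_dir: Path) -> list[Path]:
    """Return suite-level task files: *.md, excluding names that start with '_'."""
    if not todo_dir.is_dir():
        return []
    return sorted(p for p in todo_dir.glob("*.md") if not p.name.startswith("_"))


def parse_dependencies(task_text: str) -> list[str]:
    """Read an assumed 'Dependencies: AZ-1, AZ-2' line; real task headers may differ."""
    match = re.search(r"^Dependencies:[ \t]*(.*)$", task_text, re.MULTILINE)
    if match is None:
        return []
    parts = [part.strip() for part in match.group(1).split(",")]
    return [part for part in parts if part.lower() not in NO_DEPS]


def render_dependencies_table(deps_by_task: dict[str, list[str]]) -> str:
    """Render the minimal Markdown table, one row per task."""
    lines = [
        "# Suite-Level Task Dependencies",
        "",
        "| Task ID | Depends on | Notes |",
        "|---------|------------|-------|",
    ]
    for task_id in sorted(deps_by_task):
        depends_on = ", ".join(deps_by_task[task_id]) or "(none)"
        lines.append(f"| {task_id} | {depends_on} | - |")
    return "\n".join(lines) + "\n"


def build_table(todo_dir: Path) -> str:
    """Collect dependencies for every in-scope task and render the table."""
    deps_by_task = {}
    for path in list_todo_tasks(todo_dir):
        # Assumes the '[TRACKER-ID]_[short_name].md' naming convention.
        task_id = path.stem.split("_", 1)[0]
        deps_by_task[task_id] = parse_dependencies(path.read_text(encoding="utf-8"))
    return render_dependencies_table(deps_by_task)
```

An empty result from `list_todo_tasks` corresponds to the skip condition. The warning for dependencies that appear in neither `todo/` nor `done/` is left out of the sketch.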

@@ -115,6 +355,28 @@ The skill:
 3. Applies doc edits
 4. Skips any component with unconfirmed mapping (M5), reports

+After completion:
+- If the status report ALSO flagged suite-e2e drift → auto-chain to **Step 4.5 (Integration Test Sync)**
+- Else if the status report ALSO flagged CI drift → auto-chain to **Step 5 (CICD Sync)**
+- Else → end cycle, report done
+
+---
+
+**Step 4.5 — Integration Test Sync**
+
+State-driven: reached by auto-chain from Step 3 (when status report flagged suite-e2e drift and no doc drift) or from Step 4 (when both doc and suite-e2e drift were flagged).
+
+**Skip condition**: if `_docs/_repo-config.yaml` has no `suite_e2e:` block, this step is skipped entirely — there's no harness to sync. The status report should not flag suite-e2e drift in that case; if it does, that's a status-skill bug.
+
+Action: Read and execute `.cursor/skills/monorepo-e2e/SKILL.md` with scope = components flagged by status.
+
+The skill:
+1. Verifies every path under `suite_e2e.*` exists (binary fixtures excepted — see the skill's Phase 1)
+2. Classifies each flagged change against the suite-e2e impact table
+3. Applies edits to `e2e/docker-compose.suite-e2e.yml`, `e2e/fixtures/init.sql`, `e2e/fixtures/expected_detections.json` metadata, and `e2e/runner/tests/*.spec.ts` selectors as needed
+4. Bumps baseline `fixture_version` with a `-stale` suffix and appends a `_docs/_process_leftovers/` entry whenever the detection model revision changes (binary fixture cannot be regenerated automatically)
+5. Reports synced files; does not run the suite e2e itself
+
 After completion:
 - If the status report ALSO flagged CI drift → auto-chain to **Step 5 (CICD Sync)**
 - Else → end cycle, report done
@@ -123,11 +385,11 @@ After completion:

 **Step 5 — CICD Sync**

-State-driven: reached by auto-chain from Step 3 (when status report flagged CI drift and no doc drift) or from Step 4 (when both doc and CI drift were flagged).
+State-driven: reached by auto-chain from Step 3 (when status report flagged CI drift and no doc/suite-e2e drift), Step 4, or Step 4.5.

 Action: Read and execute `.cursor/skills/monorepo-cicd/SKILL.md` with scope = components flagged by status.

-After completion, end cycle. Report files updated across both doc and CI sync.
+After completion, end cycle. Report files updated across doc, suite-e2e, and CI sync.

 ---

@@ -156,14 +418,24 @@ After onboarding completes, the config is updated. Auto-chain back to **Step 3 (
 | Completed Step | Next Action |
 |---------------|-------------|
 | Discover (1) | Auto-chain → Config Review (2) |
-| Config Review (2, user picked A, confirmed_by_user: true) | Auto-chain → Status (3) |
+| Config Review (2, user picked A, confirmed_by_user: true) | Auto-chain → Glossary & Architecture Vision (2.5) |
 | Config Review (2, user picked B) | **Session boundary** — end session, await re-invocation |
-| Status (3, doc drift) | Auto-chain → Document Sync (4) |
-| Status (3, CI drift only) | Auto-chain → CICD Sync (5) |
-| Status (3, no drift) | **Cycle complete** — end session, await re-invocation |
+| Glossary & Architecture Vision (2.5) | Auto-chain → Status (3) |
+| Status (3, todo tasks present) | Auto-chain → Suite Implement (3.5) — pre-routing gate fires before drift-based routing |
+| Status (3, no todo tasks, doc drift) | Auto-chain → Document Sync (4) |
+| Status (3, no todo tasks, suite-e2e drift, no doc drift) | Auto-chain → Integration Test Sync (4.5) |
+| Status (3, no todo tasks, CI drift only) | Auto-chain → CICD Sync (5) |
+| Status (3, no todo tasks, no drift) | **Cycle complete** — end session, await re-invocation |
 | Status (3, registry mismatch) | Ask user (A: discover, B: onboard, C: continue) |
-| Document Sync (4) + CI drift pending | Auto-chain → CICD Sync (5) |
-| Document Sync (4) + no CI drift | **Cycle complete** |
+| Suite Implement (3.5, user picked A, success) | Silent re-status; auto-chain per post-implementation drift (Step 4 / 4.5 / 5 / cycle complete) |
+| Suite Implement (3.5, user picked B) | Mark `skipped`; auto-chain per Step 3's original drift findings |
+| Suite Implement (3.5, user picked C) | **Session boundary** — end session, await re-invocation |
+| Suite Implement (3.5, FAILED ×3) | Standard Failure Handling escalation (`protocols.md`) |
+| Document Sync (4) + suite-e2e drift pending | Auto-chain → Integration Test Sync (4.5) |
+| Document Sync (4) + CI drift only pending | Auto-chain → CICD Sync (5) |
+| Document Sync (4) + no further drift | **Cycle complete** |
+| Integration Test Sync (4.5) + CI drift pending | Auto-chain → CICD Sync (5) |
+| Integration Test Sync (4.5) + no CI drift | **Cycle complete** |
 | CICD Sync (5) | **Cycle complete** |

 ## Status Summary — Step List
@@ -178,30 +450,40 @@ Flow-specific slot values:
   Config:  _docs/_repo-config.yaml [confirmed_by_user: <true|false>, last_updated: <date>]
  ```

-| # | Step Name        | Extra state tokens (beyond the shared set) |
-|---|------------------|--------------------------------------------|
-| 1 | Discover         | — |
-| 2 | Config Review    | `IN PROGRESS (awaiting human)` |
-| 3 | Status           | `DONE (no drift)`, `DONE (N drifts)` |
-| 4 | Document Sync    | `DONE (N docs)`, `SKIPPED (no doc drift)` |
-| 5 | CICD Sync        | `DONE (N files)`, `SKIPPED (no CI drift)` |
+| # | Step Name                          | Extra state tokens (beyond the shared set) |
+|---|------------------------------------|--------------------------------------------|
+| 1 | Discover                           | — |
+| 2 | Config Review                      | `IN PROGRESS (awaiting human)` |
+| 2.5 | Glossary & Architecture Vision   | `SKIPPED (already captured)` |
+| 3 | Status                             | `DONE (no drift)`, `DONE (N drifts)` |
+| 3.5 | Suite Implement                  | `DONE (N tasks)`, `SKIPPED (no todo tasks)`, `SKIPPED (user picked B)`, `IN PROGRESS (batch M of ~N)`, `IN PROGRESS (awaiting-task-review)` |
+| 4 | Document Sync                      | `DONE (N docs)`, `SKIPPED (no doc drift)` |
+| 4.5 | Integration Test Sync            | `DONE (N files)`, `SKIPPED (no suite-e2e drift)`, `SKIPPED (no suite_e2e config block)` |
+| 5 | CICD Sync                          | `DONE (N files)`, `SKIPPED (no CI drift)` |

-All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 4 and 5 additionally accept `SKIPPED`.
+All rows accept the shared state tokens (`DONE`, `IN PROGRESS`, `NOT STARTED`, `FAILED (retry N/3)`); rows 2.5, 3.5, 4, 4.5, and 5 additionally accept `SKIPPED`.

 Row rendering format:

 ```
- Step 1   Discover          [<state token>]
- Step 2   Config Review     [<state token>]
- Step 3   Status            [<state token>]
- Step 4   Document Sync     [<state token>]
- Step 5   CICD Sync         [<state token>]
+ Step 1     Discover                          [<state token>]
+ Step 2     Config Review                     [<state token>]
+ Step 2.5   Glossary & Architecture Vision    [<state token>]
+ Step 3     Status                            [<state token>]
+ Step 3.5   Suite Implement                   [<state token>]
+ Step 4     Document Sync                     [<state token>]
+ Step 4.5   Integration Test Sync             [<state token>]
+ Step 5     CICD Sync                         [<state token>]
 ```

 ## Notes for the meta-repo flow

- **No session boundary except Step 2**: unlike existing-code flow (which has boundaries around decompose), meta-repo flow only pauses at config review. Syncing is fast enough to complete in one session.
+- **Session boundaries**: Step 2 (Config Review pending), Step 2.5 (one-shot glossary/vision review), and Step 3.5 (when user picks C "Pause"). Step 3.5's A/B picks do NOT cross a session boundary — they auto-chain to syncs in the same session.
 - **Cyclical, not terminal**: no "done forever" state. Each invocation completes a drift cycle; next invocation starts fresh.
- **No tracker integration**: this flow does NOT create Jira/ADO tickets. Maintenance is not a feature — if a feature-level ticket spans the meta-repo's concerns, it lives in the per-component workspace.
+- **Tracker integration scope**: this flow does NOT create Jira/ADO tickets in its sync skills (Status / Document Sync / E2E / CICD). Step 3.5 (Suite Implement) IS tracker-integrated — it transitions existing tickets In Progress → In Testing per the implement skill's standard tracker handling. Suite-level tickets are authored manually by the operator (typically as children of an Epic that spans multiple components, like AZ-539); the flow doesn't auto-create them.
+- **Per-component vs. suite-level work**:
+  - Tickets that touch component source code (`<component>/src/**`) belong in that component's own workspace `/autodev` cycle. The meta-repo flow does NOT execute them.
+  - Tickets that touch suite-root paths only (`.gitmodules`, `_infra/**`, suite `e2e/**`, root `README.md`, suite `_docs/**` outside `tasks/_*`) are eligible for Step 3.5.
+  - Tickets that span both (e.g., AZ-550 B11 consumer cutover, which touches `autopilot/`, `ui/`, AND suite `e2e/`) are NOT executable from a single workspace by design — split the ticket so the suite-level slice can run in Step 3.5 and the component slices run in their owning workspaces.
 - **Onboarding is opt-in**: never auto-onboarded. User must explicitly request.
 - **Failure handling**: uses the same retry/escalation protocol as other flows (see `protocols.md`).
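
The Step 3 evaluation order (first match wins) can be pictured as a small routing function. This Python sketch is illustrative only: `StatusReport` and its field names are hypothetical, and the real status skill produces a prose report rather than a typed object. It also assumes that when suite-e2e and CI drift coexist without doc drift, Step 4.5 runs first and Step 5 follows, as the entry conditions for Steps 4.5 and 5 describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusReport:
    """Hypothetical summary of the signals the status report exposes."""
    registry_mismatch: bool = False
    todo_task_count: int = 0
    doc_drift: bool = False
    suite_e2e_drift: bool = False
    ci_drift: bool = False
    has_suite_e2e_config: bool = True


def route_by_drift(report: StatusReport) -> str:
    """Drift-based routing (rules 3 to 6); also reused after Step 3.5 returns."""
    if report.doc_drift:
        return "4"
    if report.suite_e2e_drift and report.has_suite_e2e_config:
        return "4.5"
    if report.ci_drift:
        return "5"
    return "done"


def next_step(report: StatusReport) -> str:
    """Full evaluation order: registry mismatch, then todo gate, then drift."""
    if report.registry_mismatch:
        # The user resolves the mismatch before any further routing happens.
        return "ask-user"
    if report.todo_task_count >= 1:
        return "3.5"
    return route_by_drift(report)
```

Keeping `route_by_drift` separate mirrors the flow: the same rules run once on the original report, or again on the post-implementation state after Step 3.5.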
@@ -110,9 +110,11 @@ Before entering a step from this table for the first time in a session, verify t
 | Flow | Step | Sub-Step | Tracker Action |
 |------|------|----------|----------------|
 | greenfield | Plan | Step 6 — Epics | Create epics for each component |
-| greenfield | Decompose | Step 1 + Step 2 + Step 3 — All tasks | Create ticket per task, link to epic |
+| greenfield | Decompose | Implementation decomposition Step 1 + Step 2 — Product tasks | Create ticket per product task, link to epic |
+| greenfield | Decompose Tests | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic |
 | existing-code | Decompose Tests | Step 1t + Step 3 — All test tasks | Create ticket per task, link to epic |
 | existing-code | New Task | Step 7 — Ticket | Create ticket per task, link to epic |
+| meta-repo | Suite Implement | Step 3.5 — implement skill Step 5 / Step 12 | Transition existing tickets In Progress → In Testing per implement skill (does NOT create new tickets — operator authors them) |

 ### State File Marker

@@ -138,7 +140,7 @@ One retry ladder covers all failure modes: explicit failure returned by a sub-sk

 Treat the sub-skill as **failed** when ANY of the following is observed:

- The sub-skill explicitly returns a failed result (including blocked subagents, auto-fix loop exhaustion, prerequisite violations).
+- The sub-skill explicitly returns a failed result (including blocked tasks, auto-fix loop exhaustion, prerequisite violations).
 - **Stuck signals**: the same artifact is rewritten 3+ times without meaningful change; the sub-skill re-asks a question that was already answered; no new artifact has been saved despite active execution.

 ### Retry ladder
@@ -291,7 +293,7 @@ For steps that produce `_docs/` artifacts (problem, research, plan, decompose, d

 ## Debug Protocol

-When the implement skill's auto-fix loop fails (code review FAIL after 2 auto-fix attempts) or an implementer subagent reports a blocker, the user is asked to intervene. This protocol guides the debugging process. (Retry budget and escalation are covered by Failure Handling above; this section is about *how* to diagnose once the user has been looped in.)
+When the implement skill's auto-fix loop fails (code review FAIL after 2 auto-fix attempts) or a task reports a blocker, the user is asked to intervene. This protocol guides the debugging process. (Retry budget and escalation are covered by Failure Handling above; this section is about *how* to diagnose once the user has been looped in.)

 ### Structured Debugging Workflow

@@ -387,7 +389,7 @@ The banner shell is defined here once. Each flow file contributes only its step-
  where `<state token>` comes from the state-token set defined per row in the flow's step-list table.
 - `<current-suffix>` — optional, flow-specific. The existing-code flow appends ` (cycle <N>)` when `state.cycle > 1`; other flows leave it empty.
 - `Retry:` row — omit entirely when `retry_count` is 0. Include it with `<N>/3` otherwise.
- `<footer-extras>` — optional, flow-specific. The meta-repo flow adds a `Config:` line with `_docs/_repo-config.yaml` state; other flows leave it empty.
+- `<footer-extras>`: optional, flow-specific. The meta-repo flow adds a `Config:` line with `_docs/_repo-config.yaml` state; other flows leave it empty unless **parent suite docs** apply: if `<workspace-root>/../docs` exists and is a directory, append `Suite docs (parent): <absolute path>` on its own line. When the directory is missing, omit the line entirely; a `Suite docs (parent): absent` line is not required. This line is orthogonal to flow-specific footer lines; both may appear.

 ### State token set (shared)

@@ -13,7 +13,7 @@ The autodev persists its position to `_docs/_autodev_state.md`. This is a lightw

 ## Current Step
 flow: [greenfield | existing-code | meta-repo]
-step: [1-11 for greenfield, 1-17 for existing-code, 1-6 for meta-repo, or "done"]
+step: [1-17 for greenfield, 1-17 for existing-code, 1-6 for meta-repo (incl. fractional 2.5, 3.5, and 4.5), or "done"]
 name: [step name from the active flow's Step Reference Table]
 status: [not_started / in_progress / completed / skipped / failed]
 sub_step:
@@ -82,6 +82,19 @@ retry_count: 0
 cycle: 1
 ```

+```
+flow: meta-repo
+step: 3.5
+name: Suite Implement
+status: in_progress
+sub_step:
+  phase: 7
+  name: batch-loop
+  detail: "AZ-543 batch 1 of 1; suite-level"
+retry_count: 0
+cycle: 1
+```
+
 ```
 flow: existing-code
 step: 10
@@ -100,7 +113,7 @@ cycle: 3
 1. **Create** on the first autodev invocation (after state detection determines Step 1)
 2. **Update** after every change — this includes: batch completion, sub-step progress, step completion, session boundary, failed retry, or any meaningful state transition. The state file must always reflect the current reality.
 3. **Read** as the first action on every invocation — before folder scanning
-4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file
+4. **Cross-check**: verify against actual `_docs/` folder contents. If they disagree, trust the folder structure and update the state file. **Parent suite `docs/`**: on every invocation, also probe `<workspace-root>/../docs` (the parent directory’s `docs` folder — typical suite-level shared documentation next to a component repo). If it exists, mention it in the Status Summary footer per `protocols.md`; use it only as supplemental reading context unless a flow step explicitly ties detection to it. It never replaces workspace `_docs/` for step detection by default.
 5. **Never delete** the state file
 6. **Retry tracking**: increment `retry_count` on each failed auto-retry; reset to `0` on success. If `retry_count` reaches 3, set `status: failed`
 7. **Failed state on re-entry**: if `status: failed` with `retry_count: 3`, do NOT auto-retry — present the issue to the user first
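
Rules 6 and 7 (retry tracking and failed-state re-entry) amount to a small counter with a cap. A minimal Python sketch follows, assuming a hypothetical in-memory `StepState` object in place of the actual `_docs/_autodev_state.md` file; the names `record_result` and `may_auto_retry` are invented for the example.

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # matches the "retry N/3" budget used in the state tokens


@dataclass
class StepState:
    """Hypothetical in-memory view of the status and retry_count fields."""
    status: str = "in_progress"
    retry_count: int = 0


def record_result(state: StepState, succeeded: bool) -> StepState:
    """Apply rule 6: increment on failure, reset on success, fail at the cap."""
    if succeeded:
        state.retry_count = 0
    else:
        state.retry_count += 1
        if state.retry_count >= MAX_RETRIES:
            state.status = "failed"
    return state


def may_auto_retry(state: StepState) -> bool:
    """Apply rule 7: a failed step with an exhausted budget needs the user first."""
    return not (state.status == "failed" and state.retry_count >= MAX_RETRIES)
```

The sketch leaves the status untouched on success, since marking a step completed is a separate state transition.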
@@ -209,7 +209,7 @@ Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope, Architectur

 The `/implement` skill invokes this skill after each batch completes:

-1. Collects changed files from all implementer agents in the batch
+1. Collects changed files from all tasks implemented in the batch
 2. Passes task spec paths + changed files to this skill
 3. If verdict is FAIL — presents findings to user (BLOCKING), user fixes or confirms
 4. If verdict is PASS or PASS_WITH_WARNINGS — proceeds automatically (findings shown as info)
@@ -221,7 +221,7 @@ The `/implement` skill invokes this skill after each batch completes:
 | Input | Type | Source | Required |
 |-------|------|--------|----------|
 | `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/todo/` for the current batch | Yes |
-| `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes |
+| `changed_files` | list of file paths | Files modified by the tasks in the batch (from `git diff`) | Yes |
 | `batch_number` | integer | Current batch number (for report naming) | Yes |
 | `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists |
 | `solution_overview` | file path | `_docs/01_solution/solution.md` | If exists |
@@ -2,8 +2,8 @@
 name: decompose
 description: |
  Decompose planned components into atomic implementable tasks with bootstrap structure plan.
-  4-step workflow: bootstrap structure plan, component task decomposition, blackbox test task decomposition, and cross-task verification.
-  Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
+  Workflow entrypoints: implementation task decomposition, single component decomposition, and tests-only decomposition.
+  The invoking flow decides which entrypoint to run; this skill executes that selected sequence.
  Trigger phrases:
  - "decompose", "decompose features", "feature decomposition"
  - "task decomposition", "break down components"
@@ -20,7 +20,7 @@ Decompose planned components into atomic, implementable task specs with a bootst

 ## Core Principles

- **Atomic tasks**: each task does one thing; if it exceeds 8 complexity points, split it
+- **Atomic tasks**: each task does one thing; if it exceeds 5 complexity points, split it
 - **Behavioral specs, not implementation plans**: describe what the system should do, not how to build it
 - **Flat structure**: all tasks are tracker-ID-prefixed files in TASKS_DIR — no component subdirectories
 - **Save immediately**: write artifacts to disk after each task; never accumulate unsaved work
@@ -30,14 +30,15 @@ Decompose planned components into atomic, implementable task specs with a bootst

 ## Context Resolution

-Determine the operating mode based on invocation before any other logic runs.
+Resolve the selected entrypoint from the invocation context before any other logic runs. The caller decides whether this is implementation, single component, or tests-only decomposition; this skill only executes the selected sequence.

-**Default** (no explicit input file provided):
+**Implementation task decomposition** (default; selected by flows before invoking this skill):

 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
 - TASKS_TODO: `_docs/02_tasks/todo/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
+- Produces only implementation tasks. Blackbox/e2e test task files are produced only when the invoking flow selects tests-only decomposition.

 **Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):

@@ -55,24 +56,24 @@ Determine the operating mode based on invocation before any other logic runs.
 - TESTS_DIR: `DOCUMENT_DIR/tests/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR

-Announce the detected mode and resolved paths to the user before proceeding.
+Announce the selected entrypoint and resolved paths to the user before proceeding.

 ### Step Applicability by Mode

-| Step | File | Default | Single | Tests-only |
-|------|------|:-------:|:------:|:----------:|
+| Step | File | Implementation | Single | Tests-only |
+|------|------|:--------------:|:------:|:----------:|
 | 1 Bootstrap Structure | `steps/01_bootstrap-structure.md` | ✓ | — | — |
 | 1t Test Infrastructure | `steps/01t_test-infrastructure.md` | — | — | ✓ |
 | 1.5 Module Layout | `steps/01-5_module-layout.md` | ✓ | — | — |
 | 2 Task Decomposition | `steps/02_task-decomposition.md` | ✓ | ✓ | — |
-| 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | ✓ | — | ✓ |
+| 3 Blackbox Test Tasks | `steps/03_blackbox-test-decomposition.md` | — | — | ✓ |
 | 4 Cross-Verification | `steps/04_cross-verification.md` | ✓ | — | ✓ |

 ## Input Specification

 ### Required Files

-**Default:**
+**Implementation task decomposition:**

 | File | Purpose |
 |------|---------|
@@ -80,10 +81,11 @@ Announce the detected mode and resolved paths to the user before proceeding.
 | `_docs/00_problem/restrictions.md` | Constraints and limitations |
 | `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
 | `_docs/01_solution/solution.md` | Finalized solution |
-| `DOCUMENT_DIR/architecture.md` | Architecture from plan skill |
+| `DOCUMENT_DIR/architecture.md` | Architecture from plan/document skill (must contain a `## Architecture Vision` H2 — confirmed user intent) |
+| `DOCUMENT_DIR/glossary.md` | Project terminology (confirmed by user in plan Phase 2a.0 or document Step 4.5). Use it to keep task names, component references, and AC wording consistent with the user's vocabulary |
 | `DOCUMENT_DIR/system-flows.md` | System flows from plan skill |
 | `DOCUMENT_DIR/components/[##]_[name]/description.md` | Component specs from plan skill |
-| `DOCUMENT_DIR/tests/` | Blackbox test specs from plan skill |
+| `DOCUMENT_DIR/tests/` | Optional product acceptance context from test-spec skill; do not create test task files from it in this entrypoint |

 **Single component mode:**

@@ -110,7 +112,7 @@ Announce the detected mode and resolved paths to the user before proceeding.

 ### Prerequisite Checks (BLOCKING)

-**Default:**
+**Implementation task decomposition:**

 1. DOCUMENT_DIR contains `architecture.md` and `components/` — **STOP if missing**
 2. Create TASKS_DIR and TASKS_TODO if they do not exist
@@ -144,6 +146,8 @@ TASKS_DIR/

 **Naming convention**: Each task file is initially saved in `TASKS_TODO/` with a temporary numeric prefix (`[##]_[short_name].md`). After creating the work item ticket, rename the file to use the work item ticket ID as prefix (`[TRACKER-ID]_[short_name].md`). For example: `todo/01_initial_structure.md` → `todo/AZ-42_initial_structure.md`.

+If tracker availability fails, follow `.cursor/rules/tracker.mdc` before continuing. Only when the user explicitly chooses `tracker: local` may the numeric prefix remain; in that mode set `Tracker: pending` and `Epic: pending` in the task header and keep the task eligible for later tracker sync.
+
 ### Save Timing

 | Step | Save immediately after | Filename |
@@ -165,11 +169,11 @@ If TASKS_DIR subfolders already contain task files:

 ## Progress Tracking

-At the start of execution, create a TodoWrite with all applicable steps for the detected mode (see Step Applicability table). Update status as each step/component completes.
+At the start of execution, create a TodoWrite with all applicable steps for the selected entrypoint (see Step Applicability table). Update status as each step/component completes.

 ## Workflow

-### Step 1: Bootstrap Structure Plan (default mode only)
+### Step 1: Bootstrap Structure Plan (implementation mode only)

 Read and follow `steps/01_bootstrap-structure.md`.

@@ -181,25 +185,25 @@ Read and follow `steps/01t_test-infrastructure.md`.

 ---

-### Step 1.5: Module Layout (default mode only)
+### Step 1.5: Module Layout (implementation mode only)

 Read and follow `steps/01-5_module-layout.md`.

 ---

-### Step 2: Task Decomposition (default and single component modes)
+### Step 2: Task Decomposition (implementation and single component modes)

 Read and follow `steps/02_task-decomposition.md`.

 ---

-### Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
+### Step 3: Blackbox Test Task Decomposition (tests-only mode only)

 Read and follow `steps/03_blackbox-test-decomposition.md`.

 ---

-### Step 4: Cross-Task Verification (default and tests-only modes)
+### Step 4: Cross-Task Verification (implementation and tests-only modes)

 Read and follow `steps/04_cross-verification.md`.
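
The revised step applicability can be read as a lookup from the selected entrypoint to an ordered list of step files. The sketch below is illustrative only: the step file names come from the Step Applicability table, while the entrypoint keys (`implementation`, `single`, `tests-only`) and the function name `steps_for` are assumptions, not part of the skill.

```python
# Hypothetical lookup: entrypoint key -> ordered step files,
# mirroring the revised Step Applicability table.
STEPS_BY_ENTRYPOINT = {
    "implementation": [
        "steps/01_bootstrap-structure.md",
        "steps/01-5_module-layout.md",
        "steps/02_task-decomposition.md",
        "steps/04_cross-verification.md",
    ],
    "single": ["steps/02_task-decomposition.md"],
    "tests-only": [
        "steps/01t_test-infrastructure.md",
        "steps/03_blackbox-test-decomposition.md",
        "steps/04_cross-verification.md",
    ],
}


def steps_for(entrypoint):
    """Return the step files to run, in order, for the selected entrypoint."""
    try:
        return STEPS_BY_ENTRYPOINT[entrypoint]
    except KeyError:
        raise ValueError(f"unknown entrypoint: {entrypoint!r}") from None
```

Note that Step 3 no longer appears under `implementation`; an unknown key such as the old `default` raises instead of silently falling back.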

@@ -207,7 +211,7 @@ Read and follow `steps/04_cross-verification.md`.

 - **Coding during decomposition**: this workflow produces specs, never code
 - **Over-splitting**: don't create many tasks if the component is simple — 1 task is fine
- **Tasks exceeding 8 points**: split them; no task should be too complex for a single implementer
+- **Tasks exceeding 5 points**: split them; no task should be too complex for a single implementer
 - **Cross-component tasks**: each task belongs to exactly one component
 - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
 - **Creating git branches**: branch creation is an implementation concern, not a decomposition one
@@ -220,7 +224,7 @@ Read and follow `steps/04_cross-verification.md`.
 | Situation | Action |
 |-----------|--------|
 | Ambiguous component boundaries | ASK user |
-| Task complexity exceeds 8 points after splitting | ASK user |
+| Task complexity exceeds 5 points after splitting | ASK user |
 | Missing component specs in DOCUMENT_DIR | ASK user |
 | Cross-component dependency conflict | ASK user |
 | Tracker epic not found for a component | ASK user for Epic ID |
@@ -232,15 +236,14 @@ Read and follow `steps/04_cross-verification.md`.
 ┌────────────────────────────────────────────────────────────────┐
 │          Task Decomposition (Multi-Mode)                        │
 ├────────────────────────────────────────────────────────────────┤
-│ CONTEXT: Resolve mode (default / single component / tests-only) │
+│ CONTEXT: Selected entrypoint (implementation/single/tests-only) │
 │                                                                 │
-│ DEFAULT MODE:                                                   │
+│ IMPLEMENTATION TASK DECOMPOSITION:                              │
 │  1.   Bootstrap Structure → steps/01_bootstrap-structure.md     │
 │       [BLOCKING: user confirms structure]                       │
 │  1.5  Module Layout       → steps/01-5_module-layout.md         │
 │       [BLOCKING: user confirms layout]                          │
 │  2.   Component Tasks     → steps/02_task-decomposition.md      │
-│  3.   Blackbox Tests      → steps/03_blackbox-test-decomposition.md │
 │  4.   Cross-Verification  → steps/04_cross-verification.md      │
 │       [BLOCKING: user confirms dependencies]                    │
 │                                                                 │
@@ -26,7 +26,7 @@ For each component (or the single provided component):
 4. Do not create tasks for other components — only tasks for the current component
 5. Each task should be atomic, containing 1 API or a list of semantically connected APIs
 6. Write each task spec using `templates/task.md`
-7. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
+7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
 8. Note task dependencies (referencing tracker IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
 9. **Cross-cutting rule**: if a concern spans ≥2 components (logging, config loading, auth/authZ, error envelope, telemetry, feature flags, i18n), create ONE shared task under the cross-cutting epic. Per-component tasks declare it as a dependency and consume it; they MUST NOT re-implement it locally. Duplicate local implementations are an `Architecture` finding (High) in code-review Phase 7 and a `Maintainability` finding in Phase 6.
 10. **Shared-models / shared-API rule**: classify the task as shared if ANY of the following is true:
@@ -43,16 +43,32 @@ For each component (or the single provided component):
    Consumers read the contract file, not the producer's task spec. This prevents interface drift when the producer's implementation detail leaks into consumers.
 11. **Immediately after writing each task file**: create a work item ticket, link it to the component's epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.

+## Runtime Completeness Decomposition Gate
+
+Before Step 2 is considered complete, scan `architecture.md`, `system-flows.md`, component descriptions, and the solution for named internal runtime capabilities and dependencies. Examples include BASALT/OpenVINS/Kimera, FAISS, DINOv2, ONNX/TensorRT, ALIKED/DISK, LightGlue, RANSAC, PostGIS, MAVLink emission, FDR rollover, and any end-to-end ("A-Z") user-visible pipeline.
+
+For every named internal capability:
+
+1. Ensure at least one implementation task explicitly owns the production integration or production algorithm.
+2. Do not treat "define protocol", "create adapter boundary", "add deterministic fallback", "create scaffold", or "prepare native bridge" as implementation of the capability unless the architecture explicitly says the real capability is out of scope.
+3. If a capability needs external hardware/data to verify, still create the production implementation task. Verification may be hardware-gated later; implementation must not be omitted.
+4. Add a `## Runtime Completeness` section to any affected task with:
+   - named capability/dependency,
+   - production code that must exist,
+   - allowed external stubs, if any,
+   - unacceptable substitutes such as fake/deterministic/internal stubs.
+
 ## Self-verification (per component)

 - [ ] Every task is atomic (single concern)
- [ ] No task exceeds 8 complexity points
+- [ ] No task exceeds 5 complexity points
 - [ ] Task dependencies reference correct tracker IDs
 - [ ] Tasks cover all interfaces defined in the component spec
 - [ ] No tasks duplicate work from other components
 - [ ] Every task has a work item ticket linked to the correct epic
 - [ ] Every shared-models / shared-API task has a contract file at `_docs/02_document/contracts/<component>/<name>.md` and a `## Contract` section linking to it
 - [ ] Every cross-cutting concern appears exactly once as a shared task, not N per-component copies
+- [ ] Every named internal runtime capability has a production implementation task, not only an interface/scaffold/fallback task

 ## Save action

@@ -1,4 +1,4 @@
-# Step 3: Blackbox Test Task Decomposition (default and tests-only modes)
+# Step 3: Blackbox Test Task Decomposition (tests-only mode only)

 **Role**: Professional Quality Assurance Engineer
 **Goal**: Decompose blackbox test specs into atomic, implementable task specs.
@@ -6,7 +6,6 @@

 ## Numbering

- In default mode: continue sequential numbering from where Step 2 left off.
 - In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).

 ## Steps
@@ -14,21 +13,26 @@
 1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`)
 2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
 3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
-4. Dependencies:
-   - In default mode: blackbox test tasks depend on the component implementation tasks they exercise
+4. Add a **System Under Test Boundary** section to every e2e/blackbox test task:
+   - The test must drive the product through public runtime boundaries and compare actual outputs to `_docs/00_problem/input_data/expected_results/results_report.md` and any referenced machine-readable expected-result files.
+   - Stubs are allowed only for external systems outside the product boundary: flight controller/SITL, QGC observer, satellite-provider/Suite service, physical Jetson hardware, physical camera, licensed public datasets, and network services.
+   - Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are not allowed for internal product modules that the scenario is meant to validate, such as VIO, safety/anchor wrapper, satellite retrieval, anchor verification, tile manager, MAVLink output adapter, or FDR.
+   - If an internal module is not implemented, the test must fail/block as missing product implementation; it must not pass by replacing that module with a test stub.
+5. Dependencies:
   - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
-5. Write each task spec using `templates/task.md`
-6. Estimate complexity per task (1, 2, 3, 5, 8 points); no task should exceed 8 points — split if it does
-7. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.
+6. Write each task spec using `templates/task.md`
+7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
+8. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
+9. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.

 ## Self-verification

 - [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
 - [ ] Every scenario from `tests/performance-tests.md`, `tests/resilience-tests.md`, `tests/security-tests.md`, and `tests/resource-limit-tests.md` is covered by a task
- [ ] No task exceeds 8 complexity points
- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
+- [ ] No task exceeds 5 complexity points
+- [ ] Dependencies correctly reference the test infrastructure task
 - [ ] Every task has a work item ticket linked to the "Blackbox Tests" epic
+- [ ] Every e2e/blackbox task forbids internal product stubs/fakes and requires comparison against expected-results artifacts

 ## Save action

@@ -1,4 +1,4 @@
-# Step 4: Cross-Task Verification (default and tests-only modes)
+# Step 4: Cross-Task Verification (implementation and tests-only modes)

 **Role**: Professional software architect and analyst
 **Goal**: Verify task consistency and produce `_dependencies_table.md`.
@@ -8,17 +8,20 @@

 1. Verify task dependencies across all tasks are consistent
 2. Check no gaps:
-   - In default mode: every interface in `architecture.md` has tasks covering it
+   - In implementation mode: every product interface in `architecture.md` has implementation task coverage
   - In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
+   - In implementation mode: every named internal runtime capability/dependency from architecture, solution, system flows, and component descriptions has a production implementation task, not only an interface/scaffold/fallback task
+   - In tests-only mode: every e2e/blackbox task has a System Under Test Boundary section that forbids stubbing internal product modules and requires comparison to expected-results artifacts
 3. Check no overlaps: tasks don't duplicate work
 4. Check no circular dependencies in the task graph
 5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`

 ## Self-verification

-### Default mode
+### Implementation mode

- [ ] Every architecture interface is covered by at least one task
+- [ ] Every product interface in `architecture.md` is covered by at least one implementation task
+- [ ] Every named internal runtime capability has a production implementation task
 - [ ] No circular dependencies in the task graph
 - [ ] Cross-component dependencies are explicitly noted in affected task specs
 - [ ] `_dependencies_table.md` contains every task with correct dependencies
@@ -26,6 +29,7 @@
 ### Tests-only mode

 - [ ] Every test scenario from `traceability-matrix.md` "Covered" entries has a corresponding task
+- [ ] Every e2e/blackbox task validates actual product behavior and allows stubs only for external systems
 - [ ] No circular dependencies in the task graph
 - [ ] Test task dependencies reference the test infrastructure bootstrap
 - [ ] `_dependencies_table.md` contains every task with correct dependencies
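
The "no circular dependencies" check in this step can be done with a depth-first search over the task graph. The following is a minimal sketch, assuming the dependencies are already loaded into a dict that maps each task ID to the IDs it depends on; the function name `find_cycle` and the dict shape are hypothetical, not something the skill prescribes.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task IDs, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current DFS path / fully explored
    color = {task: WHITE for task in deps}
    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GREY
        path.append(task)
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: slice out the loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[task] = BLACK
        return None

    for task in deps:
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None
```

Returning the offending path (rather than a boolean) makes it easier to tell the user which task specs need their `Dependencies` field corrected.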
@@ -28,4 +28,4 @@ Use this template after cross-task verification. Save as `TASKS_DIR/_dependencie
 - Dependencies column lists tracker IDs (e.g., "AZ-43, AZ-44") or "None"
 - No circular dependencies allowed
 - Tasks should be listed in recommended execution order
- The `/implement` skill reads this table to compute parallel batches
+- The `/implement` skill reads this table to compute dependency-aware batches; task execution remains sequential
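
One way to picture "dependency-aware batches" is a layered topological sort: each batch contains only tasks whose dependencies all sit in earlier batches. This is a sketch under assumptions, not the `/implement` skill's actual algorithm; the function name `dependency_batches` and the input dict (task ID mapped to its dependency IDs) are hypothetical. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def dependency_batches(deps):
    """Group tasks into batches; every task's dependencies sit in an earlier batch."""
    sorter = TopologicalSorter(deps)  # keys are tasks, values are their dependencies
    sorter.prepare()                  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep the output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Tasks inside one batch have no dependencies on each other, so running them one at a time in any order stays consistent with the table.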
@@ -1,6 +1,6 @@
 # Module Layout Template

-The module layout is the **authoritative file-ownership map** used by the `/implement` skill to assign OWNED / READ-ONLY / FORBIDDEN files to implementer subagents. It is derived from `_docs/02_document/architecture.md` and the component specs at `_docs/02_document/components/`, and it follows the target language's standard project-layout conventions.
+The module layout is the **authoritative file-ownership map** used by the `/implement` skill to assign OWNED / READ-ONLY / FORBIDDEN files to each task. It is derived from `_docs/02_document/architecture.md` and the component specs at `_docs/02_document/components/`, and it follows the target language's standard project-layout conventions.

 Save as `_docs/02_document/module-layout.md`. This file is produced by the decompose skill (Step 1.5 module layout) and consumed by the implement skill (Step 4 file ownership). Task specs remain purely behavioral — they do NOT carry file paths. The layout is the single place where component → filesystem mapping lives.

@@ -104,4 +104,4 @@ The implement skill's Step 4 (File Ownership) reads this file and, for each task
 3. Set READ-ONLY = the Public API files of every component listed in `Imports from`, plus `shared/*` Public API files.
 4. Set FORBIDDEN = every other component's Owns glob.

-If two tasks in the same batch map to the same component, the implement skill schedules them sequentially (one implementer at a time for that component) to avoid file conflicts on shared internal files.
+Execution inside a batch is already sequential (one task at a time). This mapping is still required because it enforces scope discipline per task — preventing a task from drifting into files that belong to another component.
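
The per-task mapping can be sketched as a glob check with a fixed precedence. This is an illustration under assumptions: the function name `classify`, the `UNMAPPED` fallback for paths that match no glob, and the example paths in the usage note are all hypothetical. READ-ONLY is tested before FORBIDDEN because a Public API file of an imported component also falls inside that component's Owns glob.

```python
from fnmatch import fnmatch


def classify(path, owned, read_only, forbidden):
    """Classify one file path for a task against three lists of glob patterns."""
    if any(fnmatch(path, pattern) for pattern in owned):
        return "OWNED"
    if any(fnmatch(path, pattern) for pattern in read_only):
        return "READ-ONLY"
    if any(fnmatch(path, pattern) for pattern in forbidden):
        return "FORBIDDEN"
    # Assumption: paths outside every component are reported, not silently allowed.
    return "UNMAPPED"
```

For example, with `owned=["src/flights/*"]`, `read_only=["src/auth/api.py"]`, and `forbidden=["src/auth/*"]`, the path `src/auth/api.py` is READ-ONLY while `src/auth/internal.py` is FORBIDDEN.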
@@ -11,7 +11,7 @@ Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[T
 **Task**: [TRACKER-ID]_[short_name]
 **Name**: [short human name]
 **Description**: [one-line description of what this task delivers]
-**Complexity**: [1|2|3|5|8] points
+**Complexity**: [1|2|3|5] points
 **Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
 **Component**: [component name for context]
 **Tracker**: [TASK-ID]
@@ -102,8 +102,7 @@ Consumers MUST read that file — not this task spec — to discover the interfa
 - 2 points: Non-trivial, low complexity, minimal coordination
 - 3 points: Multi-step, moderate complexity, potential alignment needed
 - 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: High difficulty, high ambiguity or coordination, multiple components
- 13 points: Too complex — split into smaller tasks
+- 8+ points: Too complex — split into smaller tasks

 ## Output Guidelines

@@ -26,7 +26,8 @@
   - Application components under test
   - Test runner container (black-box, no internal imports)
   - Isolated database with seed data
-   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
+   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner`
+   - See the Woodpecker two-workflow contract in [`../templates/ci_cd_pipeline.md`](../templates/ci_cd_pipeline.md) — the test runner entry point defined here becomes the first step of `.woodpecker/01-test.yml`.
 7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only

 ## Self-verification
@@ -85,3 +85,140 @@ Save as `_docs/04_deploy/ci_cd_pipeline.md`.
 | Deploy success | [Slack] | [team] |
 | Deploy failure | [Slack/email + PagerDuty] | [on-call] |
 ```
+
+---
+
+## Reference Implementation: Woodpecker CI two-workflow contract
+
+Use this when the project's CI is **Woodpecker** and the test layout follows the autodev e2e contract from [`../../decompose/templates/test-infrastructure-task.md`](../../decompose/templates/test-infrastructure-task.md) (an `e2e/` folder containing `Dockerfile`, `docker-compose.test.yml`, `conftest.py`, `requirements.txt`, `mocks/`, `fixtures/`, `tests/`).
+
+The contract is **two workflows in `.woodpecker/`**, scheduled on the same agent label, with the build workflow gated on a successful test run:
+
+- `.woodpecker/01-test.yml`: runs the e2e contract, publishes `results/report.csv` as an artifact, and fails the pipeline on any test failure.
+- `.woodpecker/02-build-push.yml`: declares `depends_on: [01-test]`, builds the image, tags it `${CI_COMMIT_BRANCH}-${TAG_SUFFIX}`, and pushes it to the registry. It is skipped automatically if `01-test` fails.
+
+The agent label is parameterized via `matrix:` so a single workflow file fans out across architectures: `labels: platform: ${PLATFORM}` routes each matrix entry to the matching agent. Both workflows for a repo must use the same matrix so test and build run on the same machine and share Docker layer cache. Add a new architecture as a new matrix entry, never as a new workflow file.
+
+### Multi-arch matrix conventions
+
+| Variable | Meaning | Typical values |
+|----------|---------|----------------|
+| `PLATFORM` | Woodpecker agent label — selects which physical machine runs the entry. | `arm64`, `amd64` |
+| `TAG_SUFFIX` | Image tag suffix appended after the branch name. | `arm`, `amd` |
+| `DOCKERFILE` *(only when arches need different Dockerfiles)* | Path to the Dockerfile for this entry. | `Dockerfile`, `Dockerfile.jetson` |
+
+Most repos use the same `Dockerfile` for both arches (multi-arch base images handle the rest), so `DOCKERFILE` can be omitted from the matrix and hardcoded in the build command. Repos with split per-arch Dockerfiles (e.g., `detections` uses `Dockerfile.jetson` on Jetson with TensorRT/CUDA-on-L4T) declare `DOCKERFILE` as a matrix var.
+
+When only one architecture is currently in use, keep the matrix block with a single entry and the second entry commented out — adding a new arch is then a one-line uncomment, not a structural change.
+
+### `.woodpecker/01-test.yml`
+
+```yaml
+when:
+  event: [push, pull_request, manual]
+  branch: [dev, stage, main]
+
+matrix:
+  include:
+    - PLATFORM: arm64
+      TAG_SUFFIX: arm
+    # - PLATFORM: amd64
+    #   TAG_SUFFIX: amd
+
+labels:
+  platform: ${PLATFORM}
+
+steps:
+  - name: e2e
+    image: docker
+    commands:
+      - cd e2e
+      - docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-runner --build
+      - docker compose -f docker-compose.test.yml down -v
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+
+  - name: report
+    image: docker
+    when:
+      status: [success, failure]
+    commands:
+      - test -f e2e/results/report.csv && cat e2e/results/report.csv || echo "no report"
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+```
+
+Notes:
+- `--abort-on-container-exit` shuts the whole compose down as soon as ANY service exits, so a crashed dependency surfaces immediately instead of hanging the runner.
+- `--exit-code-from e2e-runner` ensures the pipeline's exit code reflects the test runner's, not the SUT's.
+- The `report` step runs on `[success, failure]` so the report is always published; without this the CSV is lost on red builds.
+- `down -v` between runs drops mock state and DB volumes — every test run starts clean.
+
+### `.woodpecker/02-build-push.yml`
+
+```yaml
+when:
+  event: [push, manual]
+  branch: [dev, stage, main]
+
+depends_on:
+  - 01-test
+
+matrix:
+  include:
+    - PLATFORM: arm64
+      TAG_SUFFIX: arm
+    # - PLATFORM: amd64
+    #   TAG_SUFFIX: amd
+
+labels:
+  platform: ${PLATFORM}
+
+steps:
+  - name: build-push
+    image: docker
+    environment:
+      REGISTRY_HOST:
+        from_secret: registry_host
+      REGISTRY_USER:
+        from_secret: registry_user
+      REGISTRY_TOKEN:
+        from_secret: registry_token
+    commands:
+      - echo "$REGISTRY_TOKEN" | docker login "$REGISTRY_HOST" -u "$REGISTRY_USER" --password-stdin
+      - export TAG=${CI_COMMIT_BRANCH}-${TAG_SUFFIX}
+      - export BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+      - |
+        docker build -f Dockerfile \
+          --build-arg CI_COMMIT_SHA=$CI_COMMIT_SHA \
+          --label org.opencontainers.image.revision=$CI_COMMIT_SHA \
+          --label org.opencontainers.image.created=$BUILD_DATE \
+          --label org.opencontainers.image.source=$CI_REPO_URL \
+          -t $REGISTRY_HOST/azaion/<service>:$TAG .
+      - docker push $REGISTRY_HOST/azaion/<service>:$TAG
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+```
+
+Notes:
+- `depends_on: [01-test]` is enforced by Woodpecker — a failed `01-test` (any matrix entry) skips this workflow.
+- The build workflow does NOT trigger on `pull_request` events: PRs get test signal only; pushes to `dev`/`stage`/`main` produce images. This avoids polluting the registry with PR images.
+- Replace `<service>` with the actual service name (matches the registry namespace pattern `azaion/<service>`).
+- For repos with split per-arch Dockerfiles, add `DOCKERFILE: Dockerfile.jetson` (or similar) to the matrix entry and substitute `${DOCKERFILE}` for `Dockerfile` in the `docker build -f` line.
+
+### Variations by stack
+
+The contract is language-agnostic because the runner is `docker compose`. The Dockerfile inside `e2e/` selects the test framework:
+
+| Stack | `e2e/Dockerfile` runs |
+|-------|----------------------|
+| Python | `pytest --csv=/results/report.csv -v` (requires the `pytest-csv` plugin) |
+| .NET | `dotnet test --logger:"trx;LogFileName=/results/report.trx"` (convert to CSV in a final step if needed) |
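+| Node/UI | `JEST_JUNIT_OUTPUT_DIR=/results npm test -- --reporters=default --reporters=jest-junit` (jest-junit reads its output directory from the environment or its config, not from a Jest CLI flag) |
+| Rust | `cargo test --no-fail-fast -- -Z unstable-options --format json > /results/report.json` (nightly toolchain only; libtest JSON output is unstable) |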
+
+When the repo has **only unit tests** (no `e2e/docker-compose.test.yml`), drop the compose orchestration and run the native test command directly inside a stack-appropriate image. Keep the same two-workflow split — `01-test.yml` runs unit tests, `02-build-push.yml` is unchanged.
+
+### Manual-trigger override (test infrastructure not yet validated)
+
+If a repo ships a complete `e2e/` layout but the test fixtures are not yet validated end-to-end (e.g., expected-results data is still being authored), gate `01-test.yml` on `event: [manual]` only and add a TODO comment pointing to the unblocking task. The `02-build-push.yml` workflow drops its `depends_on` clause for the manual-only window — an explicit and reversible exception, not a permanent split.
@@ -31,6 +31,7 @@ _docs/
    │   ├── components.md
    │   └── flows/
    ├── 04_verification_log.md           # Step 4
+    ├── glossary.md                       # Step 4.5 (confirmed-by-user)
    ├── FINAL_report.md                  # Step 7
    └── state.json                       # Resumability
 ```
@@ -49,6 +50,7 @@ Maintained in `DOCUMENT_DIR/state.json` for resumability:
  "modules_remaining": ["services/auth", "api/endpoints"],
  "module_batch": 1,
  "components_written": [],
+  "step_4_5_glossary_vision": "not_started",
  "last_updated": "2026-03-21T14:00:00Z"
 }
 ```
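
A resume check against this state file might look like the sketch below. It assumes that a `step_4_5_glossary_vision` value of `confirmed` means Step 4.5 can be skipped, and that a missing field is treated like `not_started`; the function name `should_run_glossary_step` and the `refresh_vision` parameter are hypothetical.

```python
import json
from pathlib import Path


def should_run_glossary_step(state_path, refresh_vision=False):
    """Decide whether Step 4.5 still needs to run, based on state.json."""
    state = json.loads(Path(state_path).read_text(encoding="utf-8"))
    # Assumption: a state file written before this field existed counts as not started.
    status = state.get("step_4_5_glossary_vision", "not_started")
    return refresh_vision or status != "confirmed"
```

Passing `refresh_vision=True` models an explicit user request to redo the step even after it was confirmed.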
@@ -15,7 +15,7 @@ Covers three related modes that share the same 8-step pipeline:

 ## Progress Tracking

-Create a TodoWrite with all steps (0 through 7). Update status as each step completes.
+Create a TodoWrite with all steps (0 through 7, including the inline Step 2.5 Module Layout Derivation and Step 4.5 Glossary & Architecture Vision). Update status as each step completes.

 ## Steps

@@ -251,7 +251,107 @@ Apply corrections inline to the documents that need them.

 **BLOCKING**: Present verification summary to user. Do NOT proceed until user confirms corrections are acceptable or requests additional fixes.

-**Session boundary**: After verification is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:
+---
+
+### Step 4.5: Glossary & Architecture Vision (BLOCKING)
+
+**Role**: Software architect + business analyst
+**Goal**: Reconcile the AI's verified understanding of the codebase with the user's intended terminology and architecture vision. Existing-code projects often carry domain language and structural intent that is invisible from code alone (synonyms, deprecated names, modules that are "supposed to" be split, components the user thinks of as one logical unit even though they live in two folders). This step makes that intent explicit before any downstream skill (refactor, decompose, new-task) acts on the docs.
+
+**When this step runs**:
+- Always, after Step 4 (Verification Pass) — for Full and Resume modes.
+- **Skipped** in Focus Area mode (the glossary/vision is system-wide; running it on a partial scan would produce a partial glossary). Run this step once a full pass exists.
+
+**Inputs** (already on disk after Step 4):
+- `DOCUMENT_DIR/architecture.md`, `system-flows.md`, `data_model.md`, `deployment/*`
+- `DOCUMENT_DIR/components/*/description.md`
+- `DOCUMENT_DIR/modules/*.md`
+- `DOCUMENT_DIR/04_verification_log.md` (so the AI knows which doc parts are confirmed vs. flagged)
+
+**Outputs**:
+- `DOCUMENT_DIR/glossary.md` (NEW)
+- `DOCUMENT_DIR/architecture.md` updated in place: a new `## Architecture Vision` section is prepended (or merged into an existing "Overview" / "Vision" heading if already present); existing technical sections are preserved verbatim
+
+**Procedure**:
+
+1. **Draft glossary** from verified docs:
+   - Domain entities, processes, roles named in module/component docs
+   - Acronyms / abbreviations
+   - Internal codenames (project, service, model names) that recur in the codebase
+   - Synonym pairs the AI noticed (e.g., the codebase uses "flight" but module comments say "mission")
+   - Stakeholder personas if any docs reference them
+   Each entry: one-line definition + source reference (`source: components/03_flights/description.md`). Skip generic CS/industry terms.
+
+2. **Draft architecture vision** as the AI currently understands the codebase:
+   - **One paragraph**: what the system is, who runs it, the runtime topology shape (monolith / services / pipeline / library / hybrid), and the dominant pattern (e.g., "submodule-based meta-repo with REST + SSE between UI and backend").
+   - **Components & responsibilities** (one-line each), pulled from `components/*/description.md`.
+   - **Major data flows** (one or two sentences each), pulled from `system-flows.md`.
+   - **Architectural principles / non-negotiables** the AI inferred from the code (e.g., "DB-driven config", "all UI traffic via REST + SSE only", "no per-component shared state"). Mark each with `inferred-from: <source>`.
+   - **Open questions / drift signals**: places where the code disagrees with itself, or where the AI cannot tell intent from implementation (e.g., two components doing similar work — is that legacy duplication or deliberate?).
+
+3. **Present condensed view** to the user (NOT the full draft files — a synopsis only):
+
+   ```
+   ══════════════════════════════════════
+    REVIEW: Glossary + Architecture Vision (existing code)
+   ══════════════════════════════════════
+    Glossary (N terms drafted from verified docs):
+      - <Term>: <one-line definition>
+      - ...
+
+    Architecture Vision — as inferred from the codebase:
+      <one-paragraph synopsis>
+
+      Components / responsibilities:
+        - <component>: <one-line>
+        - ...
+
+      Principles / non-negotiables (inferred):
+        - <principle>  [inferred-from: <source>]
+        - ...
+
+      Open questions / drift signals:
+        - <q1>
+        - <q2>
+   ══════════════════════════════════════
+    A) Inferred vision matches my intent — write the files
+    B) Add / correct entries (provide diffs — terms, components,
+       principles, or rename pairs)
+    C) Resolve the open questions / drift signals first
+   ══════════════════════════════════════
+    Recommendation: pick C if any drift signals exist;
+                    otherwise B if the vision misses
+                    project-specific intent; A only when
+                    the inferred vision is exactly right.
+   ══════════════════════════════════════
+   ```
+
+4. **Iterate**:
+   - On B → integrate the user's diffs/additions, re-present, loop until A.
+   - On C → ask the listed open questions in one batch (M4-style), integrate answers, re-present.
+   - **Do NOT proceed to step 5 until the user picks A.**
+
+5. **Save**:
+   - Write `DOCUMENT_DIR/glossary.md`, alphabetical, with a top-line `**Status**: confirmed-by-user` and the date.
+   - Update `DOCUMENT_DIR/architecture.md`:
+     - If a `## Architecture Vision` (or `## Vision` / `## Overview`) section already exists at the top, replace its body with the confirmed paragraph + components + principles.
+     - Otherwise, insert `## Architecture Vision` as the first H2 after the title; preserve every existing H2 below.
+     - Do NOT delete or re-order existing technical sections (Tech Stack, Deployment Model, Data Model, NFRs, ADRs).
+
+6. **Update `state.json`**: mark `step_4_5_glossary_vision: confirmed`. Resume on rerun must skip this step unless the user explicitly invokes `/document --refresh-vision`.
+
+**Self-verification**:
+- [ ] Every glossary entry traces to at least one file under `DOCUMENT_DIR/`
+- [ ] Every component listed in the vision matches a folder under `DOCUMENT_DIR/components/`
+- [ ] All open questions are answered or explicitly deferred (with the user's acknowledgement)
+- [ ] `architecture.md` still contains all H2 sections it had before this step
+- [ ] User picked option A on the latest condensed view
+
+**BLOCKING**: Do NOT proceed to the session boundary / Step 5 until both files are saved and the user has picked A.
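
The "insert as the first H2, preserve every existing H2" rule from step 5 can be sketched as follows. This is a minimal illustration, not part of the skill: `upsert_vision` and `h2_headings` are hypothetical helper names, "at the top" is assumed to mean "the first H2 in the file", and `## ` lines inside fenced code blocks are not handled.

```python
VISION_HEADINGS = ("## Architecture Vision", "## Vision", "## Overview")


def h2_headings(markdown: str) -> list[str]:
    """Return every H2 heading line, in document order."""
    return [ln for ln in markdown.splitlines() if ln.startswith("## ")]


def upsert_vision(markdown: str, vision_body: str) -> str:
    """Insert or replace the vision section; all other H2 sections stay untouched."""
    lines = markdown.splitlines()
    h2_idx = [i for i, ln in enumerate(lines) if ln.startswith("## ")]
    section = ["## Architecture Vision", "", *vision_body.strip().splitlines(), ""]

    if h2_idx and lines[h2_idx[0]].strip() in VISION_HEADINGS:
        # An equivalent section already leads the file: keep its heading, swap its body.
        start = h2_idx[0]
        end = h2_idx[1] if len(h2_idx) > 1 else len(lines)
        section[0] = lines[start]
        return "\n".join(lines[:start] + section + lines[end:])

    # Otherwise place the new section before the first existing H2 (or at the end).
    at = h2_idx[0] if h2_idx else len(lines)
    return "\n".join(lines[:at] + section + lines[at:])
```

Comparing `h2_headings` before and after the call is one way to back the self-verification item that `architecture.md` still contains all H2 sections it had before this step.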
+
+---
+
+**Session boundary**: After Step 4.5 is confirmed, suggest a session break before proceeding to the synthesis steps (5–7). These steps produce different artifact types and benefit from fresh context:

 ```
 ══════════════════════════════════════
@@ -1,41 +1,59 @@
 ---
 name: implement
 description: |
-  Orchestrate task implementation with dependency-aware batching, parallel subagents, and integrated code review.
+  Implement tasks sequentially with dependency-aware batching and integrated code review.
  Reads flat task files and _dependencies_table.md from TASKS_DIR, computes execution batches via topological sort,
-  launches up to 4 implementer subagents in parallel, runs code-review skill after each batch, and loops until done.
+  implements tasks one at a time in dependency order, runs code-review skill after each batch, and loops until done.
  Use after /decompose has produced task files.
  Trigger phrases:
  - "implement", "start implementation", "implement tasks"
-  - "run implementers", "execute tasks"
+  - "execute tasks"
 category: build
-tags: [implementation, orchestration, batching, parallel, code-review]
+tags: [implementation, batching, code-review]
 disable-model-invocation: true
 ---

-# Implementation Orchestrator
+# Implementation Runner

-Orchestrate the implementation of all tasks produced by the `/decompose` skill. This skill is a **pure orchestrator** — it does NOT write implementation code itself. It reads task specs, computes execution order, delegates to `implementer` subagents, validates results via the `/code-review` skill, and escalates issues.
+Implement all tasks produced by the `/decompose` skill. This skill reads task specs, computes execution order, writes the code and tests for each task **sequentially** (no subagents, no parallel execution), validates results via the `/code-review` skill, and escalates issues.

-The `implementer` agent is the specialist that writes all the code — it receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria.
+For each task the main agent receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria — then moves on to the next task.

 ## Core Principles

- **Orchestrate, don't implement**: this skill delegates all coding to `implementer` subagents
- **Dependency-aware batching**: tasks run only when all their dependencies are satisfied
- **Max 4 parallel agents**: never launch more than 4 implementer subagents simultaneously
- **File isolation**: no two parallel agents may write to the same file
+- **Sequential execution**: implement one task at a time. Do NOT spawn subagents and do NOT run tasks in parallel. (See `.cursor/rules/no-subagents.mdc`.)
+- **Dependency-aware ordering**: tasks run only when all their dependencies are satisfied
+- **Batching for review, not parallelism**: tasks are grouped into batches so `/code-review` and commits operate on a coherent unit of work — all tasks inside a batch are still implemented one after the other
 - **Integrated review**: `/code-review` skill runs automatically after each batch
- **Auto-start**: batches launch immediately — no user confirmation before a batch
+- **Completeness before testing**: product implementation is not done until code is checked against task outcomes, included scope, architecture/component promises, named runtime dependencies, and unresolved scaffold/native placeholders — not just task AC tests
+- **Runtime dependency reality**: production code cannot satisfy a task by exposing only a protocol, fake runner, deterministic fallback, or "native bridge" placeholder when the task/architecture promises a concrete internal capability such as BASALT VIO, FAISS retrieval, LightGlue matching, or a full A-Z localization pipeline. Stubs are allowed only for external systems and tests.
+- **Auto-start**: batches start immediately — no user confirmation before a batch
 - **Gate on failure**: user confirmation is required only when code review returns FAIL
 - **Commit per batch**: after each batch is confirmed, commit. Ask the user whether to push to remote unless the user previously opted into auto-push for this session.

 ## Context Resolution

 - TASKS_DIR: `_docs/02_tasks/`
- Task files: all `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
+- Task files: selected `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
 - Dependency table: `TASKS_DIR/_dependencies_table.md`

+### Task Selection Context
+
+The invoking flow decides which task category this run should execute. The implement skill must honor that selected context instead of consuming every file in `todo/`.
+
+| Context | Selected task files |
+|---------|---------------------|
+| Product implementation | Task specs that are not test-only and not refactoring specs |
+| Test implementation | `*_test_infrastructure.md` plus task specs whose `Component` or `Epic` identifies `Blackbox Tests` |
+| Refactoring | Task specs whose filename or task ID includes `_refactor_` |
+
+If no explicit context is provided, infer it from the active autodev step:
+- greenfield Step 7 or existing-code Step 10 → Product implementation
+- greenfield Step 10 or existing-code Step 6 → Test implementation
+- refactor Phase 4 → Refactoring
+
+Unselected task files remain in `TASKS_DIR/todo/` for their later flow step.
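
A minimal sketch of the selection table above, assuming each task spec has already been parsed into a dict of header fields. The keys `Component`, `Epic`, and `ID`, and the function names, are hypothetical; the actual spec format is defined by the decompose templates, not here.

```python
from fnmatch import fnmatchcase


def classify(name: str, fields: dict[str, str]) -> str:
    """Map one task file to 'refactoring', 'test', or 'product'."""
    if "_refactor_" in name or "_refactor_" in fields.get("ID", ""):
        return "refactoring"
    is_test_infra = fnmatchcase(name, "*_test_infrastructure.md")
    is_blackbox = "Blackbox Tests" in (fields.get("Component", ""), fields.get("Epic", ""))
    if is_test_infra or is_blackbox:
        return "test"
    return "product"


def select_tasks(specs: dict[str, dict[str, str]], context: str) -> list[str]:
    """Return the task files for one context; files starting with '_' are never tasks."""
    return sorted(
        name
        for name, fields in specs.items()
        if not name.startswith("_") and classify(name, fields) == context
    )
```

Files that do not match the requested context are simply not returned, which mirrors the rule that unselected tasks stay in `todo/` for a later flow step.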
+
 ### Task Lifecycle Folders

 ```
@@ -46,9 +64,31 @@ TASKS_DIR/
 └── done/        ← completed tasks (moved here after implementation)
 ```

+### Suite-level invocation context (meta-repo flow)
+
+When invoked from `.cursor/skills/autodev/flows/meta-repo.md` Step 3.5 (or any caller that supplies the same context envelope), the skill receives:
+
+```
+suite_level: true
+TASKS_DIR: <override>          # e.g., _docs/tasks/  (vs. default _docs/02_tasks/)
+module_layout_path: <override>  # e.g., _docs/tasks/_suite_module_layout.md
+```
+
+When `suite_level: true` is present, the following adjustments apply, and ONLY these. Apart from the path overrides in items 1 and 2, Steps 1–14 and 16 execute unchanged:
+
+1. **TASKS_DIR override** is honored throughout the skill (Step 1 Parse and Step 13 Archive; Step 15 would also read from it, but that step is skipped at suite level per item 4). The default `_docs/02_tasks/` is replaced by the supplied path.
+2. **module_layout_path override** is read instead of the hardcoded `_docs/02_document/module-layout.md` in Step 4 (Assign File Ownership). The supplied file uses the same `Per-Component Mapping` schema. If both the override and the hardcoded path are missing, behavior is unchanged from default mode (STOP and instruct).
+3. **Step 14.5 (Cumulative Code Review) — SKIPPED**. The meta-repo has no `_docs/02_document/architecture_compliance_baseline.md`; cross-task drift is captured by the next `monorepo-status` cycle instead.
+4. **Step 15 (Product Implementation Completeness Gate) — SKIPPED**. The gate's hard inputs (`_docs/02_document/architecture.md`, `system-flows.md`, `components/*/description.md`) do not exist in the meta-repo artifact layout. Suite-level tasks are infrastructure / coordination work (renames, cross-repo edits, suite-root infra additions), not feature implementation; the equivalent completeness signal is the next `monorepo-status` drift report (which the meta-repo flow re-runs immediately after Step 3.5 returns).
+5. **Final report filename**: `_docs/03_implementation/suite_implementation_report_{run_name}.md` (in addition to the existing feature/test/refactor variants). Batch reports follow `_docs/03_implementation/suite_batch_{NN}_report.md`.
+6. **Tracker integration** (Step 5: In Progress, Step 12: In Testing) runs unchanged; suite-level tickets follow the same tracker rules as any other ticket.
+
+Without `suite_level: true`, none of these adjustments apply and the skill runs exactly as documented in default mode.
+
 ## Prerequisite Checks (BLOCKING)

-1. `TASKS_DIR/todo/` exists and contains at least one task file — **STOP if missing**
+1. `TASKS_DIR/todo/` exists and contains at least one task file for the selected context — **STOP if missing**
+   - Exception for Product implementation re-entry: if no selected product tasks remain in `todo/`, but the active autodev state is Step 7 or the latest product completeness report is missing, invalid, or contains `FAIL`, skip directly to Step 15 (Product Implementation Completeness Gate). That gate may create remediation tasks and return to Step 1. Do not write a final implementation report from this state.
 2. `_dependencies_table.md` exists — **STOP if missing**
 3. At least one task is not yet completed — **STOP if all done**
 4. **Working tree is clean** — run `git status --porcelain`; the output must be empty.
@@ -56,16 +96,16 @@ TASKS_DIR/
     - A) Commit or stash stray changes manually, then re-invoke `/implement`
     - B) Agent commits stray changes as a single `chore: WIP pre-implement` commit and proceeds
     - C) Abort
-   - Rationale: implementer subagents edit files in parallel and commit per batch. Unrelated uncommitted changes get silently folded into batch commits otherwise.
+   - Rationale: each batch ends with a commit. Unrelated uncommitted changes would get silently folded into batch commits otherwise.
   - This check is repeated at the start of each batch iteration (see step 6 / step 14 Loop).
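
The clean-tree prerequisite can be sketched like this. The helper names are hypothetical; the parsing assumes the short v1 porcelain format (two status characters, a space, then the path) and does not split rename entries of the form `old -> new`.

```python
import subprocess


def dirty_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` output; an empty list means clean."""
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def working_tree_is_clean(repo: str = ".") -> bool:
    """Run git in `repo` and report whether there is nothing to commit or stash."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit (for example, not a repository) raises
    )
    return not dirty_paths(result.stdout)
```

When the tree is dirty, the list returned by `dirty_paths` is what would be surfaced to the user alongside the A/B/C choice.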

 ## Algorithm

 ### 1. Parse

- Read all task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
+- Read selected task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
 - Read `_dependencies_table.md` — parse into a dependency graph (DAG)
- Validate: no circular dependencies, all referenced dependencies exist
+- Validate: the selected task graph has no circular dependencies, and every dependency referenced by a selected task either exists as a task file or is already completed in `TASKS_DIR/done/`

 ### 2. Detect Progress

@@ -78,22 +118,23 @@ TASKS_DIR/

 - Topological sort remaining tasks
 - Select tasks whose dependencies are ALL satisfied (completed)
- If a ready task depends on any task currently being worked on in this batch, it must wait for the next batch
- Cap the batch at 4 parallel agents
+- A batch is simply a coherent group of tasks for review + commit. Within the batch, tasks are implemented sequentially in topological order.
+- Cap the batch at a size that stays reviewable in one pass (default: 4 tasks)
 - If the batch would exceed 20 total complexity points, suggest splitting and let the user decide
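
A minimal sketch of the batch computation described above, using the standard-library `graphlib` for cycle detection. The function names, the alphabetical tie-break among ready tasks, and the `points` mapping are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(deps: dict[str, set[str]]) -> bool:
    """deps maps each task ID to the IDs it depends on."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return True
    return False


def next_batch(
    deps: dict[str, set[str]],
    done: set[str],
    points: dict[str, int],
    cap: int = 4,
    max_points: int = 20,
) -> tuple[list[str], bool]:
    """Return (batch, over_budget) for the tasks whose dependencies are all completed."""
    ready = sorted(t for t, pre in deps.items() if t not in done and pre <= done)
    batch = ready[:cap]
    over_budget = sum(points.get(t, 0) for t in batch) > max_points
    return batch, over_budget
```

The `over_budget` flag only signals that a split should be suggested; the decision stays with the user, as the step requires.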

 ### 4. Assign File Ownership

-The authoritative file-ownership map is `_docs/02_document/module-layout.md` (produced by the decompose skill's Step 1.5). Task specs are purely behavioral — they do NOT carry file paths. Derive ownership from the layout, not from the task spec's prose.
+The authoritative file-ownership map is `_docs/02_document/module-layout.md` (produced by the decompose skill's Step 1.5), unless `suite_level: true` was supplied in the invocation context — in which case the `module_layout_path` override is read instead (see "Suite-level invocation context" above). Task specs are purely behavioral — they do NOT carry file paths. Derive ownership from the layout, not from the task spec's prose.

 For each task in the batch:
 - Read the task spec's **Component** field.
 - Look up the component in `_docs/02_document/module-layout.md` → Per-Component Mapping.
- Set **OWNED** = the component's `Owns` glob (exclusive write for the duration of the batch).
+- Set **OWNED** = the component's `Owns` glob (the files this task is allowed to write).
 - Set **READ-ONLY** = Public API files of every component in the component's `Imports from` list, plus all `shared/*` Public API files.
 - Set **FORBIDDEN** = every other component's `Owns` glob, and every other component's internal (non-Public API) files.
 - If the task is a shared / cross-cutting task (lives under `shared/*`), OWNED = that shared directory; READ-ONLY = nothing; FORBIDDEN = every component directory.
- If two tasks in the same batch map to the same component or overlapping `Owns` globs, schedule them sequentially instead of in parallel.
+
+Since execution is sequential, there is no parallel-write conflict to resolve; ownership here is a **scope discipline** check that stops a task from drifting into unrelated components, even when it is the only task running.
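
The scope-discipline check might look like the sketch below. It assumes the `Owns` globs use shell-style patterns in which `*` also matches path separators (the behavior of `fnmatchcase`); the real glob dialect of `module-layout.md` is not stated here, and `classify_write` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def classify_write(path: str, owned: list[str], forbidden: list[str]) -> str:
    """Decide whether a task may write to `path` under its ownership envelope."""
    if any(fnmatchcase(path, glob) for glob in owned):
        return "OWNED"
    if any(fnmatchcase(path, glob) for glob in forbidden):
        return "FORBIDDEN"
    # Not covered by either list: treat as out of scope and ask before writing.
    return "UNLISTED"
```

Running this over the files a task changed gives the evidence needed when a task writes outside OWNED.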

 If `_docs/02_document/module-layout.md` is missing or the component is not found:
 - STOP the batch.
@@ -102,31 +143,30 @@ If `_docs/02_document/module-layout.md` is missing or the component is not found

 ### 5. Update Tracker Status → In Progress

-For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (see `protocols.md` for tracker detection) before launching the implementer. If `tracker: local`, skip this step.
+For each task in the batch, transition its ticket status to **In Progress** via the configured work item tracker (see `protocols.md` for tracker detection) before starting work. If `tracker: local`, skip this step. If a tracker operation fails unexpectedly, follow `.cursor/rules/tracker.mdc`.

-### 6. Launch Implementer Subagents
+### 6. Implement Tasks Sequentially

-**Per-batch dirty-tree re-check**: before launching subagents, run `git status --porcelain`. On the first batch this is guaranteed clean by the prerequisite check. On subsequent batches, the previous batch ended with a commit so the tree should still be clean. If the tree is dirty at this point, STOP and surface the dirty files to the user using the same A/B/C choice as the prerequisite check. The most likely causes are a failed commit in the previous batch, a user who edited files mid-loop, or a pre-commit hook that re-wrote files and was not captured.
+**Per-batch dirty-tree re-check**: before starting the batch, run `git status --porcelain`. On the first batch this is guaranteed clean by the prerequisite check. On subsequent batches, the previous batch ended with a commit so the tree should still be clean. If the tree is dirty at this point, STOP and surface the dirty files to the user using the same A/B/C choice as the prerequisite check. The most likely causes are a failed commit in the previous batch, a user who edited files mid-loop, or a pre-commit hook that re-wrote files and was not captured.

-For each task in the batch, launch an `implementer` subagent with:
- Path to the task spec file
- List of files OWNED (exclusive write access)
- List of files READ-ONLY
- List of files FORBIDDEN
- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.
+For each task in the batch **in topological order, one at a time**:
+1. Read the task spec file.
+2. Respect the file-ownership envelope computed in Step 4 (OWNED / READ-ONLY / FORBIDDEN).
+3. Implement the feature and write/update tests for every acceptance criterion in the spec. Tests for internal product behavior must exercise the production implementation path. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still exist and skip/block with a clear prerequisite reason, but that skip does not make missing production code complete.
+4. Run the relevant tests locally before moving on to the next task in the batch. If tests fail, fix in-place — do not defer.
+5. Capture a short per-task status line (files changed, tests pass/fail, any blockers) for the batch report.

-Launch all subagents immediately — no user confirmation.
+Do NOT spawn subagents and do NOT attempt to implement two tasks simultaneously, even if they touch disjoint files. See `.cursor/rules/no-subagents.mdc`.

-### 7. Monitor
+### 7. Collect Status

- Wait for all subagents to complete
- Collect structured status reports from each implementer
- If any implementer reports "Blocked", log the blocker and continue with others
+- After all tasks in the batch are finished, aggregate the per-task status lines into a structured batch status.
+- If any task reported "Blocked", log the blocker with the failing task's ID and continue — the batch report will surface it.

-**Stuck detection** — while monitoring, watch for these signals per subagent:
- Same file modified 3+ times without test pass rate improving → flag as stuck, stop the subagent, report as Blocked
- Subagent has not produced new output for an extended period → flag as potentially hung
- If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
+**Stuck detection** — while implementing a task, watch for these signals in your own progress:
+- The same file has been rewritten 3+ times without tests going green → stop, mark the task Blocked, and move to the next task in the batch (the user will be asked at the end of the batch).
+- You have tried 3+ distinct approaches without evidence-driven progress → stop, mark Blocked, move on.
+- Do NOT loop indefinitely on a single task. Record the blocker and proceed.
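
The two stuck signals can be tracked with a small counter, sketched below under stated assumptions: the class name, the reset-on-green behavior, and the idea of labeling each approach with a string are illustrative choices, not part of the skill.

```python
from collections import Counter


class StuckDetector:
    """Tracks rewrites without green tests and distinct approaches without progress."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.red_rewrites = Counter()
        self.approaches = set()

    def record_rewrite(self, path: str, tests_green: bool) -> bool:
        """Return True when the task should be marked Blocked."""
        if tests_green:
            self.red_rewrites.pop(path, None)  # progress resets the count for this file
            return False
        self.red_rewrites[path] += 1
        return self.red_rewrites[path] >= self.limit

    def record_approach(self, label: str, made_progress: bool) -> bool:
        """Return True after `limit` distinct approaches without evidence of progress."""
        if made_progress:
            self.approaches.clear()
            return False
        self.approaches.add(label)
        return len(self.approaches) >= self.limit
```

A `True` return is the point at which the task is recorded as Blocked and the next task in the batch starts.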

 ### 8. AC Test Coverage Verification

@@ -139,8 +179,8 @@ Before code review, verify that every acceptance criterion in each task spec has
   - **Not covered**: no test exists for this AC

 If any AC is **Not covered**:
- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
- Re-launch the implementer with the specific ACs that need tests
+- This is a **BLOCKING** failure — the missing test must be written before proceeding
+- Go back to the offending task, add tests for the specific ACs that lack coverage, then re-run this check
 - If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
 - A skipped test counts as **Covered** — the test exists and will run when the environment allows

@@ -189,18 +229,22 @@ Track `auto_fix_attempts` and `escalated_findings` in the batch report for retro

 ### 12. Update Tracker Status → In Testing

-After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.
+After the batch is committed (and pushed if the user approved pushing), transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step. If a tracker operation fails unexpectedly, follow `.cursor/rules/tracker.mdc`.

 ### 13. Archive Completed Tasks

 Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.

+For product implementation, this archive means "batch implementation accepted." The Product Implementation Completeness Gate can still require follow-up remediation tasks before the feature is complete; it does not move original task files back to `todo/`.
+
 ### 14. Loop

 - Go back to step 2 until all tasks in `todo/` are done

 ### 14.5. Cumulative Code Review (every K batches)

+**Skipped entirely when `suite_level: true`** (see "Suite-level invocation context" above) — the meta-repo has no `architecture_compliance_baseline.md` to evaluate against; cross-task drift is captured by the next `monorepo-status` cycle.
+
 - **Trigger**: every K completed batches (default `K = 3`; configurable per run via a `cumulative_review_interval` knob in the invocation context)
 - **Purpose**: per-batch review (Step 9) catches batch-local issues; cumulative review catches issues that only appear when tasks are combined — architecture drift, cross-task inconsistency, duplicate symbols introduced across different batches, contracts that drifted across producer/consumer batches
 - **Scope**: the union of files changed since the **last** cumulative review (or since the start of the run if this is the first)
@@ -216,22 +260,81 @@ Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
 - **Interaction with Auto-Fix Gate**: Architecture findings (new category from code-review Phase 7) always escalate per the implement auto-fix matrix; they cannot silently auto-fix
 - **Resumability**: if interrupted, the next invocation checks for the latest `cumulative_review_batches_*.md` and computes the changed-file set from batch reports produced after that review

-### 15. Final Test Run
+### 15. Product Implementation Completeness Gate

- After all batches are complete, run the full test suite once
- Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision
- When tests pass, report final summary
+Run this gate after all **product implementation** tasks are complete and before writing any final product implementation report or allowing autodev to proceed to testability/test decomposition. Skip this gate when (a) the remaining context is explicitly test implementation or refactoring (as determined by the task files and report filename rules), OR (b) `suite_level: true` was supplied in the invocation context (the gate's inputs do not exist in the meta-repo artifact layout — see "Suite-level invocation context" above).
+
+**Goal**: catch the failure mode where narrow tests validate scaffold behavior while the task's actual outcome, included scope, architecture promise, or named integration remains unimplemented.
+
+Inputs:
+
+- Completed product task specs from `_docs/02_tasks/done/` for the current cycle
+- `_docs/02_document/architecture.md`
+- `_docs/02_document/system-flows.md`
+- Relevant `_docs/02_document/components/*/description.md` files
+- Current source code under each completed task's ownership envelope
+- Batch reports and code-review reports for the current cycle
+
+For each completed product task:
+
+1. Read these sections from the task spec: `Description`, `Outcome`, `Scope / Included`, `Acceptance Criteria`, `Non-Functional Requirements`, `Constraints`, and explicit named technologies or integrations.
+2. Compare those promises against actual source code, not only tests or report prose.
+3. Search the task's owned component files for unresolved implementation markers: `placeholder`, `stub`, `reserved`, `TODO`, `NotImplemented`, `pass`, `deterministic`, `fake`, `mock`, `scaffold`, `native bridge`, and empty native/readme-only integration directories. Ignore test fixtures/mocks only when they are under test-owned paths and not used as production behavior.
+4. Verify that each named runtime dependency in the task promise is integrated as production behavior, not merely represented by an interface. Examples: if a task promises FAISS, DINOv2, BASALT, LightGlue, OpenCV, RANSAC, a database, cloud service, or hardware SDK, the production code must either call that dependency or contain an adapter that loads and executes the real dependency package. A deterministic fallback, fake runner, empty `native/` package, or "bridge to be supplied later" is **FAIL** unless the task itself explicitly scoped the dependency out before implementation started.
+5. Distinguish internal implementation from external prerequisites:
+   - Internal product capabilities (VIO, anchor verification, cache retrieval, safety wrapper, FDR, MAVLink emission) must be implemented in production code before the task can pass.
+   - External systems/hardware/data (Jetson device, physical camera, ArduPilot process, QGC, third-party service credentials, unavailable licensed dataset) may be `BLOCKED` only when production code exists and the missing prerequisite is outside the product boundary.
+6. Verify tests exercise the real implementation path where local prerequisites exist. Environment-gated tests may skip only with an explicit prerequisite reason; they do not make missing production code complete.
+7. For any architecture promise that describes an end-to-end user outcome, verify there is an executable production pipeline connecting the relevant components. Isolated component contracts and test-only harness orchestration are not enough.
+8. Classify each task:
+   - **PASS**: task promises are implemented or explicitly out of scope in the task itself.
+   - **BLOCKED**: production code exists but cannot be fully verified due to external hardware/data/license/runtime prerequisites; the blocker is explicit and tests report blocked/skipped with reason.
+   - **FAIL**: promised production behavior is missing, only scaffolded, or only represented in tests/reports.
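
The marker search in item 3 can be approximated with a word-boundary scan, sketched below. The regular expression and helper names are assumptions; hits are only candidates for review (a legitimate `pass` in an abstract method will match), and the rule about ignoring test-owned paths is left to the caller.

```python
import re
from pathlib import Path

# Marker list taken from item 3; each entry is a regex fragment.
MARKERS = [
    "placeholder", "stub", "reserved", "TODO", r"NotImplemented\w*", "pass",
    "deterministic", "fake", "mock", "scaffold", r"native\s+bridge",
]
PATTERN = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_marker) pairs for every candidate hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits.extend((lineno, m.group(1)) for m in PATTERN.finditer(line))
    return hits


def scan_files(paths: list[Path]) -> dict[str, list[tuple[int, str]]]:
    """Scan owned files and keep only those with at least one hit."""
    report = {}
    for path in paths:
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

The word boundaries keep `pass` from matching identifiers such as `password`, while `NotImplemented\w*` still catches `NotImplementedError`.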
+
+Save the audit to `_docs/03_implementation/implementation_completeness_cycle[N]_report.md` with:
+
+- Per-task classification
+- Evidence files/symbols checked
+- Any unresolved scaffold/native placeholders
+- Any named promised technologies not integrated
+- Required remediation task suggestions, each sized to 5 points or less
+
+Gate:
+
+- If every product task is `PASS` or `BLOCKED` with explicit prerequisite evidence, continue to Final Test Run.
+- If any product task is `FAIL`, STOP. Do not write the final product implementation report and do not proceed to any downstream autodev step. Completed original task files remain in `done/`; the missing work is represented by remediation tasks. Present a Choose block:
+  - A) Create remediation tasks now and return to implementation
+  - B) Mark the missing behavior explicitly out of scope in task/docs, then re-run this gate
+  - C) Abort for manual correction
+- Recommend A unless the user deliberately accepts reduced scope.
+
+Remediation task creation:
+
+1. For each `FAIL`, create one or more task specs using `.cursor/skills/decompose/templates/task.md`; each remediation task must be sized at 5 points or less.
+2. Save each task to `_docs/02_tasks/todo/` with a short name prefixed by `remediate_`.
+3. Set **Component** to the failed task's component and set **Dependencies** to the failed task ID plus any remediation prerequisites.
+4. Create or defer tracker tickets using the same tracker rules as decompose/new-task: if tracker is available, create tickets immediately; if the user explicitly chose `tracker: local`, keep numeric prefixes with `Tracker: pending` / `Epic: pending`.
+5. Append the remediation tasks to `_docs/02_tasks/_dependencies_table.md`.
+6. Return to Step 1 (Parse) in **Product implementation** context. The final product implementation report can be written only after remediation tasks complete and this gate reruns without `FAIL`.
+
+### 16. Final Test Run
+
+- After all batches are complete, run the full test suite once unless the invoking flow's immediate next step is `Run Tests`.
+- If the next flow step is `Run Tests`, record a handoff in the final implementation report and let `.cursor/skills/test-run/SKILL.md` own the full-suite gate to avoid duplicate full runs.
+- When this step does run, read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices).
+- Test failures are a **blocking gate** — do not proceed until the test-run skill completes with a user decision.
+- When tests pass, report final summary.

 ## Batch Report Persistence

-After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_cycle[N]_report.md` for feature implementation (or `batch_[NN]_report.md` for test/refactor runs). Create the directory if it doesn't exist. When all tasks are complete, produce a FINAL implementation report with a summary of all batches. The filename depends on context:
+After each batch completes, save the batch report to `_docs/03_implementation/batch_[NN]_cycle[N]_report.md` for feature implementation (or `batch_[NN]_report.md` for test/refactor runs). Create the directory if it doesn't exist. For product implementation, produce the FINAL implementation report only after the Product Implementation Completeness Gate passes. For test and refactor implementation, produce the FINAL report after all selected tasks complete and the full-suite gate is either run or handed off per Step 16. The filename depends on context:

 - **Test implementation** (tasks from test decomposition): `_docs/03_implementation/implementation_report_tests.md`
 - **Feature implementation**: `_docs/03_implementation/implementation_report_{feature_slug}_cycle{N}.md` where `{feature_slug}` is derived from the batch task names (e.g., `implementation_report_core_api_cycle2.md`) and `{N}` is the current `state.cycle` from `_docs/_autodev_state.md`. If `state.cycle` is absent (pre-migration), default to `cycle1`.
 - **Refactoring**: `_docs/03_implementation/implementation_report_refactor_{run_name}.md`
+- **Suite-level** (when `suite_level: true` was supplied — see "Suite-level invocation context" above): `_docs/03_implementation/suite_implementation_report_{run_name}.md`. Batch reports use `_docs/03_implementation/suite_batch_{NN}_report.md`. `{run_name}` is derived from the batch task IDs (e.g., `suite_implementation_report_az543_az549_az550.md`).

-Determine the context from the task files being implemented: if all tasks have test-related names or belong to a test epic, use the tests filename; otherwise derive the feature slug from the component names and append the cycle suffix.
+Determine the context from the task files being implemented: if all tasks have test-related names or belong to a test epic, use the tests filename; if `suite_level: true` was supplied, use the suite filename; otherwise derive the feature slug from the component names and append the cycle suffix.

 Batch report filenames must also include the cycle counter when running feature implementation: `_docs/03_implementation/batch_{NN}_cycle{N}_report.md` (test and refactor runs may use the plain `batch_{NN}_report.md` form since they are not cycle-scoped).
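
The naming rules above can be sketched as a pair of small helpers. This is only an illustration: the function names, the `context` strings, and the two-digit zero padding of `{NN}` are assumptions, and applying the `cycle1` fallback to batch reports (the text states it only for the final report) is also assumed.

```python
from typing import Optional

IMPL_DIR = "_docs/03_implementation"


def final_report_path(context: str, *, feature_slug: str = "",
                      run_name: str = "", cycle: Optional[int] = None) -> str:
    """Return the FINAL report path for an already-determined context.

    `context` is a hypothetical label: "tests", "suite", "refactor" or "feature".
    """
    if context == "tests":
        return f"{IMPL_DIR}/implementation_report_tests.md"
    if context == "suite":
        return f"{IMPL_DIR}/suite_implementation_report_{run_name}.md"
    if context == "refactor":
        return f"{IMPL_DIR}/implementation_report_refactor_{run_name}.md"
    if context == "feature":
        # A missing state.cycle (pre-migration) defaults to cycle1.
        n = 1 if cycle is None else cycle
        return f"{IMPL_DIR}/implementation_report_{feature_slug}_cycle{n}.md"
    raise ValueError(f"unknown context: {context}")


def batch_report_path(context: str, batch: int,
                      cycle: Optional[int] = None) -> str:
    """Return the per-batch report path; only feature runs are cycle-scoped."""
    nn = f"{batch:02d}"  # assumed padding for {NN}
    if context == "suite":
        return f"{IMPL_DIR}/suite_batch_{nn}_report.md"
    if context == "feature":
        n = 1 if cycle is None else cycle  # assumed: same fallback as above
        return f"{IMPL_DIR}/batch_{nn}_cycle{n}_report.md"
    return f"{IMPL_DIR}/batch_{nn}_report.md"
```

The helpers take the context as an input on purpose: deciding between tests, suite, refactor, and feature is the judgment step described above and is not something the filename logic should guess.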

@@ -264,9 +367,10 @@ After each batch, produce a structured report:

 | Situation | Action |
 |-----------|--------|
-| Implementer fails same approach 3+ times | Stop it, escalate to user |
+| Same task rewritten 3+ times without green tests | Mark Blocked, continue batch, escalate at batch end |
 | Task blocked on external dependency (not in task list) | Report and skip |
-| File ownership conflict unresolvable | ASK user |
+| File ownership violated (task wrote outside OWNED) | ASK user |
+| Product completeness gate finds missing promised implementation | STOP — create remediation tasks or get explicit user scope reduction |
 | Test failure after final test run | Delegate to test-run skill — blocking gate |
 | All tasks complete | Report final summary, suggest final commit |
 | `_dependencies_table.md` missing | STOP — run `/decompose` first |
@@ -281,7 +385,8 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:

 ## Safety Rules

-- Never launch tasks whose dependencies are not yet completed
-- Never allow two parallel agents to write to the same file
-- If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
-- Always run the full test suite after all batches complete (step 15)
+- Never start a task whose dependencies are not yet completed
+- Never run tasks in parallel and never spawn subagents — see `.cursor/rules/no-subagents.mdc`
+- If a task is flagged as stuck, stop working on it and report — do not let it loop indefinitely
+- Always run the Product Implementation Completeness Gate before final product reports
+- Always run or hand off the full test suite after all batches complete (step 16)
@@ -3,29 +3,31 @@
 ## Topological Sort with Batch Grouping

 The `/implement` skill uses a topological sort to determine execution order,
-then groups tasks into batches for parallel execution.
+then groups tasks into batches for code review and commit. Execution within a
+batch is **sequential** — see `.cursor/rules/no-subagents.mdc`.

 ## Algorithm

 1. Build adjacency list from `_dependencies_table.md`
 2. Compute in-degree for each task node
-3. Initialize batch 0 with all nodes that have in-degree 0
+3. Initialize the ready set with all nodes that have in-degree 0
 4. For each batch:
-   a. Select up to 4 tasks from the ready set
-   b. Check file ownership — if two tasks would write the same file, defer one to the next batch
-   c. Launch selected tasks as parallel implementer subagents
-   d. When all complete, remove them from the graph and decrement in-degrees of dependents
-   e. Add newly zero-in-degree nodes to the next batch's ready set
+   a. Select up to 4 tasks from the ready set (default batch size cap)
+   b. Implement the selected tasks one at a time in topological order
+   c. When all tasks in the batch complete, remove them from the graph and
+      decrement in-degrees of dependents
+   d. Add newly zero-in-degree nodes to the ready set
 5. Repeat until the graph is empty

-## File Ownership Conflict Resolution
+## Ordering Inside a Batch

-When two tasks in the same batch map to overlapping files:
-- Prefer to run the lower-numbered task first (it's more foundational)
-- Defer the higher-numbered task to the next batch
-- If both have equal priority, ask the user
+Tasks inside a batch are executed in topological order — a task is only
+started after every task it depends on (inside the batch or in a previous
+batch) is done. When two tasks have the same topological rank, prefer the
+lower-numbered (more foundational) task first.

 ## Complexity Budget

 Each batch should not exceed 20 total complexity points.
 If it does, split the batch and let the user choose which tasks to include.
+The budget exists to keep the per-batch code review scope reviewable.
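
A minimal sketch of the revised algorithm (Kahn-style topological sort with a batch cap and a complexity budget) might look like the following. The function name, the input shapes, and the task IDs are hypothetical. The sketch also assumes that task IDs sort lexicographically in "lower-numbered first" order, and it defers over-budget tasks automatically, whereas the rule above asks the user which tasks to include.

```python
from collections import defaultdict
from typing import Dict, List, Set


def plan_batches(deps: Dict[str, Set[str]], complexity: Dict[str, int],
                 max_tasks: int = 4, budget: int = 20) -> List[List[str]]:
    """Group tasks into sequential batches.

    deps maps every task ID to the set of task IDs it depends on.
    """
    indegree = {task: len(prereqs) for task, prereqs in deps.items()}
    dependents = defaultdict(list)
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    # Ready set: all nodes with in-degree 0, lower-numbered first.
    ready = sorted(t for t, n in indegree.items() if n == 0)
    batches: List[List[str]] = []
    while ready:
        batch: List[str] = []
        points = 0
        for task in list(ready):
            if len(batch) == max_tasks:
                break
            # Assumption: defer a task that would push the batch over budget
            # (a single oversized task is still allowed to run alone).
            if batch and points + complexity[task] > budget:
                continue
            batch.append(task)
            points += complexity[task]
        # The batch is implemented one task at a time; afterwards, release
        # the dependents whose in-degree has dropped to zero.
        for task in batch:
            ready.remove(task)
            for dependent in dependents[task]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    ready.append(dependent)
        ready.sort()
        batches.append(batch)

    if any(n > 0 for n in indegree.values()):
        raise ValueError("dependency cycle or unknown dependency")
    return batches
```

Because every batch is drawn from the ready set, no task in a batch depends on another task in the same batch, so executing the batch in list order satisfies the ordering rule.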
@@ -129,7 +129,8 @@ If `_docs/_repo-config.yaml` already exists:
   - Entries removed (component removed from registry)
 4. **Ask the user** whether to apply the diff.
 5. If applied, **preserve `confirmed: true` flags** for entries that still match — don't reset human-approved mappings.
-6. If user declines, stop — leave config untouched.
+6. **Preserve user-owned top-level keys verbatim**: `glossary_doc:` (written by autodev meta-repo Step 2.5) and any `assumptions_log:` entries are NEVER edited or removed by this skill. Carry them through unchanged. If the file referenced by `glossary_doc:` no longer exists on disk, surface as an `unresolved:` question — do not auto-clear the field.
+7. If user declines, stop — leave config untouched.

 ### Phase 8: Batch question checkpoint (M4)

@@ -15,6 +15,8 @@ Propagates component changes into the unified documentation set. Strictly scoped
 | Root `README.md` **only** if `_repo-config.yaml` lists it as a doc target (e.g., services table) | Install scripts (`ci-*.sh`) → use `monorepo-cicd` |
 | Docs index (`_docs/README.md` or similar) cross-reference tables | Component-internal docs (`<component>/README.md`, `<component>/docs/*`) |
 | Cross-cutting docs listed in `docs.cross_cutting` | `_docs/_repo-config.yaml` itself (only `monorepo-discover` and `monorepo-onboard` write it) |
+| Body of cross-cutting docs **except** the `## Architecture Vision` section (preserved verbatim — owned by autodev meta-repo Step 2.5) | The file at `glossary_doc:` (user-confirmed; only autodev meta-repo Step 2.5 rewrites it). New project terms surfaced during sync are reported back to the user, not silently appended |
+| `## Architecture Vision` body — read-only, may be referenced for terminology consistency but never edited | — |

 If a component change requires CI/env updates too, tell the user to also run `monorepo-cicd`. This skill does NOT cross domains.

@@ -166,6 +168,8 @@ Append to `_docs/_repo-config.yaml` under `assumptions_log:`:
 - Change `confirmed_by_user` or any `confirmed: <bool>` flag
 - Auto-commit or push
 - Guess a mapping not in the config
+- Edit `glossary_doc:` (the file recorded under the config's `glossary_doc:` key)
+- Edit the `## Architecture Vision` section of any cross-cutting doc; if a sync would conflict with that section, surface the conflict to the user and skip — do not silently rewrite user-confirmed content

 ## Edge cases

@@ -0,0 +1,152 @@
+---
+name: monorepo-e2e
+description: Syncs the suite-level integration e2e harness (`e2e/docker-compose.suite-e2e.yml`, fixtures, Playwright runner) when component contracts drift in ways that affect the cross-service scenario. Reads `_docs/_repo-config.yaml` to know which suite-e2e artifacts are in play. Touches ONLY suite-e2e files — never per-component CI, docs, or component internals. Use when a component changes a port, env var, public API endpoint, DB schema column, or detection model that the suite e2e exercises.
+---
+
+# Monorepo Suite-E2E
+
+Propagates component changes into the suite-level integration e2e harness. Strictly scoped — never edits docs, component internals, per-component CI configs, or the production deploy compose.
+
+## Scope — explicit
+
+| In scope | Out of scope |
+| -------- | ------------ |
+| `e2e/docker-compose.suite-e2e.yml` (overlay, healthchecks, seed services) | Production `_infra/deploy/<target>/docker-compose.yml` — `monorepo-cicd` owns it |
+| `e2e/fixtures/init.sql` (seeded rows that the spec depends on) | Component DB migrations — owned by each component |
+| `e2e/fixtures/expected_detections.json` (detection baseline) | Detection model itself — owned by `detections/` |
+| `e2e/runner/tests/*.spec.ts` selector / contract-driven edits | New scenarios (user-driven, not drift-driven) |
+| `e2e/runner/Dockerfile` / `package.json` Playwright version bumps | Net-new e2e infrastructure (use `monorepo-onboard` or initial scaffolding) |
+| `.woodpecker/suite-e2e.yml` (suite-level pipeline) | Per-component `.woodpecker/01-test.yml` / `02-build-push.yml` — `monorepo-cicd` owns those |
+| Suite-e2e leftover entries under `_docs/_process_leftovers/` | Per-component leftovers — owned by each component |
+
+If a component change needs doc updates too, tell the user to also run `monorepo-document`. If it needs production-deploy or per-component CI updates, run `monorepo-cicd`. This skill **only** updates the suite-e2e surface.
+
+## Preconditions (hard gates)
+
+1. `_docs/_repo-config.yaml` exists.
+2. Top-level `confirmed_by_user: true`.
+3. `suite_e2e.*` section is populated in config (see "Required config block" below). If absent, abort and ask the user to extend the config via `monorepo-discover`.
+4. Components-in-scope have confirmed contract mappings (port, public API path, DB tables touched), OR user explicitly approves inferred ones.
+
+## Required config block
+
+This skill expects `_docs/_repo-config.yaml` to carry:
+
+```yaml
+suite_e2e:
+  overlay: e2e/docker-compose.suite-e2e.yml
+  fixtures:
+    init_sql: e2e/fixtures/init.sql
+    baseline_json: e2e/fixtures/expected_detections.json
+    binary_fixtures:
+      - e2e/fixtures/sample.mp4
+      - e2e/fixtures/model.tar.gz
+  runner:
+    dockerfile: e2e/runner/Dockerfile
+    package_json: e2e/runner/package.json
+    spec_dir: e2e/runner/tests
+  pipeline: .woodpecker/suite-e2e.yml
+  scenario:
+    description: "Upload video → detect → overlays → dataset → DB persistence"
+    components_exercised:
+      - ui
+      - annotations
+      - detections
+      - postgres-local
+    api_contracts:
+      - component: ui
+        path: /api/admin/auth/login
+      - component: annotations
+        path: /api/annotations/media/batch
+      - component: annotations
+        path: /api/annotations/media/{id}/annotations
+    db_tables:
+      - media
+      - annotations
+      - detection
+      - detection_classes
+    model_pin:
+      detections_repo_path: <path-to-model-config-or-classes-source>
+      classes_source: annotations/src/Database/DatabaseMigrator.cs
+```
+
+If `suite_e2e:` is missing the skill **stops** — it does not invent a default mapping.
+
+## Mitigations (M1–M7)
+
+- **M1** Separation: this skill only touches suite-e2e files; no production deploy compose, no per-component CI, no docs, no component internals.
+- **M3** Factual vs. interpretive: port, env var, API path, DB column — FACTUAL, read from the components' code. Whether a baseline still matches the model — DEFERRED to the user (the skill flags drift, never silently re-records).
+- **M4** Batch questions at checkpoints.
+- **M5** Skip over guess: a component change that doesn't map cleanly to one of the in-scope artifacts → skip and report.
+- **M6** Assumptions footer + append to `_repo-config.yaml` `assumptions_log`.
+- **M7** Drift detection: verify every path under `suite_e2e.*` exists on disk; stop if not.
+
+## Workflow
+
+### Phase 1: Drift check (M7)
+
+Verify every file listed under `suite_e2e.*` (excluding `binary_fixtures`, which are gitignored) exists on disk. Missing file → stop and ask:
+- Run `monorepo-discover` to refresh, OR
+- Skip the missing artifact (recorded in report)
+
+For `binary_fixtures` paths that are absent (expected — they live in S3/LFS), check whether `expected_detections.json._meta.video_sha256` is still a `TBD-...` placeholder. If yes, surface this as a known leftover (`_docs/_process_leftovers/2026-04-22_suite-e2e-binary-fixtures.md`) and continue.
+
+### Phase 2: Determine scope
+
+Same as `monorepo-cicd` Phase 2 — ask the user, or auto-detect. For **auto-detect**, flag commits that touch suite-e2e-relevant concerns:
+
+| Commit pattern | Suite-e2e impact |
+| -------------- | ---------------- |
+| New port exposed by `<component>` | Healthcheck override may change in `e2e/docker-compose.suite-e2e.yml` |
+| New required env var on `<component>` | `e2e/docker-compose.suite-e2e.yml` `e2e-runner` env block + `init.sql` seed |
+| Public API path renamed / removed | Spec selector / API call path in `e2e/runner/tests/*.spec.ts` |
+| DB schema column renamed in a `db_tables` entry | `init.sql` column reference + spec `pg.query` text |
+| New required DB table referenced by spec | `init.sql` insert block (skip if owned by component migration) |
+| Detection model rev change in `detections/` | `expected_detections.json` `_meta.model.revision` + flag baseline as stale |
+| New canonical detection class added | `expected_detections.json._meta` annotation |
+
+Present the flagged list; confirm.
+
+### Phase 3: Classify changes per component
+
+| Change type | Target suite-e2e files |
+| ----------- | ---------------------- |
+| Port / env var change | `e2e/docker-compose.suite-e2e.yml` |
+| API path / contract change | `e2e/runner/tests/*.spec.ts` |
+| DB schema reference change | `e2e/fixtures/init.sql` and spec SQL queries |
+| Model / class catalog change | `e2e/fixtures/expected_detections.json` (mark `_meta.fixture_version` bump + leftover entry for binary refresh) |
+| Playwright dependency drift | `e2e/runner/package.json` + `e2e/runner/Dockerfile` |
+| Suite scenario steps gone stale | **Stop and ask** — scenario edits are user-driven, not drift-driven |
+
+### Phase 4: Apply edits
+
+Edit each in-scope file. After each batch, run `ReadLints` on touched files. Do NOT run the suite e2e itself — that's a downstream pipeline operation, not a sync-skill responsibility.
+
+For `expected_detections.json`: when the model revision changes, the skill **does not** re-record the baseline — the binary fixture cannot be regenerated from the dev environment. Instead:
+1. Set `_meta.model.revision` to the new revision.
+2. Set `_meta.fixture_version` to a new bumped version with a `-stale` suffix (e.g., `0.2.0-stale`).
+3. Append a new entry to `_docs/_process_leftovers/` describing the required re-record.
+4. Leave `expected.by_class` untouched — the spec's tolerance check will fail loudly until the binary refresh lands.
+
+### Phase 5: Update assumptions log
+
+Append a new `assumptions_log:` entry to `_docs/_repo-config.yaml` recording:
+- Date, components in scope, which suite-e2e files were touched
+- Any inferred contract mappings still tagged `confirmed: false`
+- Any leftover entries created
+
+### Phase 6: Report
+
+Render a Choose-format summary of the synced files, surface any `_process_leftovers/` entries created, and end. Do NOT auto-commit.
+
+## Self-verification
+
+- [ ] No file outside `e2e/`, `.woodpecker/suite-e2e.yml`, or `_docs/_process_leftovers/` was edited
+- [ ] `_docs/_repo-config.yaml` `suite_e2e:` block was not silently mutated except for `assumptions_log` append
+- [ ] `expected_detections.json` was not re-recorded (only metadata bumped + leftover added)
+- [ ] Every spec edit traces to a flagged commit pattern in Phase 2
+- [ ] `ReadLints` clean on every touched file
+
+## Failure handling
+
+Same retry / escalation protocol as `monorepo-cicd` — see `protocols.md`. The most common failure mode is the binary-fixture leftover (sample.mp4 missing or SHA-mismatched); this skill does not attempt to resolve it, only surfaces it.
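
The Phase 1 drift check (M7) could be sketched roughly as below. The function name is hypothetical and the config is assumed to be already parsed into a dict. Which `suite_e2e` values count as file paths is also an assumption here: the sketch checks only the overlay, the non-binary fixtures, the runner entries, and the pipeline, and leaves the `scenario` block alone.

```python
import os
from typing import Any, Dict, List


def missing_artifacts(suite_e2e: Dict[str, Any], root: str = ".") -> List[str]:
    """Return configured suite-e2e paths that do not exist under `root`.

    `binary_fixtures` are skipped on purpose: they are gitignored and are
    expected to be absent from a fresh checkout.
    """
    fixtures = suite_e2e.get("fixtures", {})
    runner = suite_e2e.get("runner", {})
    candidates = [
        suite_e2e.get("overlay"),
        fixtures.get("init_sql"),
        fixtures.get("baseline_json"),
        runner.get("dockerfile"),
        runner.get("package_json"),
        runner.get("spec_dir"),
        suite_e2e.get("pipeline"),
    ]
    # Keys that are not set are ignored here; the "Required config block"
    # precondition is what guards against an unpopulated section.
    return [p for p in candidates
            if p and not os.path.exists(os.path.join(root, p))]
```

A non-empty result corresponds to the "stop and ask" branch of Phase 1: either refresh via `monorepo-discover` or skip the missing artifact and record it in the report.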
@@ -59,6 +59,8 @@ Mark each as `complete` / `partial` / `missing` and explain.
 - Every component in `components:` appears in the registry — flag mismatches
 - Every `docs.root` file cross-referenced in config exists on disk — flag missing
 - Every `ci.orchestration_files` and `ci.install_scripts` exists — flag missing
+- `glossary_doc:` (if recorded in config) points to a file that exists on disk — flag missing
+- The cross-cutting architecture doc identified by `docs.cross_cutting` contains a `## Architecture Vision` section — flag missing (signals the meta-repo flow's Step 2.5 was skipped or the section was removed)

 ### Section 5: Unresolved questions

@@ -113,6 +115,8 @@ In registry, not in config:    [list or "(none)"]
 In config, not in registry:    [list or "(none)"]
 Config-referenced docs missing: [list or "(none)"]
 Config-referenced CI files missing: [list or "(none)"]
+glossary_doc:                  [path or "not recorded — run /autodev to capture"]
+Architecture Vision section:   [present | missing in <doc>]

 ═══════════════════════════════════════════════════
 Unresolved questions
@@ -75,7 +75,7 @@ Record the description verbatim for use in subsequent steps.
 **Role**: Technical analyst
 **Goal**: Determine whether deep research is needed.

-Read the user's description and the existing codebase documentation from DOCUMENT_DIR (architecture.md, components/, system-flows.md).
+Read the user's description and the existing codebase documentation from DOCUMENT_DIR (architecture.md including its `## Architecture Vision` section, glossary.md, components/, system-flows.md). Use `glossary.md` to keep the new task's name, acceptance-criteria wording, and component references aligned with the user's confirmed vocabulary. If the request appears to violate an Architecture Vision principle, flag the task to the user; do not silently allow it.

 **Consult LESSONS.md**: if `_docs/LESSONS.md` exists, read it and look for entries in categories `estimation`, `architecture`, `dependencies` that might apply to the task under consideration. If a relevant lesson exists (e.g., "estimation: auth-related changes historically take 2x estimate"), bias the classification and recommendation accordingly. Note in the output which lessons (if any) were applied.

@@ -134,7 +134,8 @@ The `<task_slug>` is a short kebab-case name derived from the feature descriptio
 **Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.

 1. Read the codebase documentation from DOCUMENT_DIR:
-   - `architecture.md` — overall structure
+   - `architecture.md` — overall structure (the `## Architecture Vision` H2 is user-confirmed intent and must not be violated by the new task without explicit approval)
+   - `glossary.md` — project terminology; reuse the user's vocabulary in task names, AC, and component references
   - `components/` — component specs
   - `system-flows.md` — data flows (if exists)
   - `data_model.md` — data model (if exists)
@@ -281,7 +282,7 @@ Present using the Choose format for each decision that has meaningful alternativ
   - Update **Epic** field: `[EPIC-ID]`
 3. Rename the file from `[##]_[short_name].md` to `[TICKET-ID]_[short_name].md`

-If the work item tracker is not authenticated or unavailable (`tracker: local`):
+If the work item tracker is not authenticated or unavailable, follow `.cursor/rules/tracker.mdc` before continuing. Only if the user explicitly chooses `tracker: local`:
 - Keep the numeric prefix
 - Set **Tracker** to `pending`
 - Set **Epic** to `pending`
@@ -336,7 +337,7 @@ After the user chooses **Done**:
 | Research skill hits a blocker | Follow research skill's own escalation rules |
 | Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow |
 | Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks |
-| Work item tracker MCP unavailable | **WARN**, continue with local-only task files |
+| Work item tracker MCP unavailable | Follow `.cursor/rules/tracker.mdc`; do not continue in local mode unless the user explicitly chooses it |

 ## Trigger Conditions

@@ -69,7 +69,7 @@ Capture any new questions, findings, or insights that arise during test specific

 ### Step 2: Solution Analysis

-Read and follow `steps/02_solution-analysis.md`.
+Read and follow `steps/02_solution-analysis.md`. The step opens with **Phase 2a.0: Glossary & Architecture Vision** (BLOCKING) — drafts `_docs/02_document/glossary.md` and a one-paragraph architecture vision, presents the condensed view to the user, iterates until confirmed, then proceeds into the architecture, data-model, and deployment phases. The confirmed vision becomes the first `## Architecture Vision` H2 of `architecture.md`.

 ---

@@ -107,6 +107,7 @@ Read and follow `steps/07_quality-checklist.md`.
 - **Coding during planning**: this workflow produces documents, never code
 - **Multi-responsibility components**: if a component does two things, split it
 - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
+- **Skipping the glossary/vision gate (Phase 2a.0)**: drafting `architecture.md` from raw `solution.md` without confirming terminology and vision means the AI's mental model is not aligned with the user's; every downstream artifact will inherit that drift
 - **Diagrams without data**: generate diagrams only after the underlying structure is documented
 - **Copy-pasting problem.md**: the architecture doc should analyze and transform, not repeat the input
 - **Vague interfaces**: "component A talks to component B" is not enough; define the method, input, output
@@ -137,8 +138,10 @@ Read and follow `steps/07_quality-checklist.md`.
 │                                                                │
 │ 1. Blackbox Tests      → test-spec/SKILL.md                     │
 │    [BLOCKING: user confirms test coverage]                     │
-│ 2. Solution Analysis   → architecture, data model, deployment   │
-│    [BLOCKING: user confirms architecture]                      │
+│ 2. Solution Analysis   → glossary + vision, architecture,       │
+│                          data model, deployment                 │
+│    [BLOCKING 2a.0: user confirms glossary + vision]            │
+│    [BLOCKING 2a:   user confirms architecture]                  │
 │ 3. Component Decomp    → component specs + interfaces           │
 │    [BLOCKING: user confirms components]                        │
 │ 4. Review & Risk       → risk register, iterations              │
@@ -4,20 +4,105 @@
 **Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
 **Constraints**: No code, no component-level detail yet; focus on system-level view

+### Phase 2a.0: Glossary & Architecture Vision (BLOCKING)
+
+**Role**: Software architect + business analyst
+**Goal**: Align the AI's mental model of the project with the user's intent BEFORE drafting `architecture.md`. Capture domain terminology and the user's high-level architecture vision so every downstream artifact (architecture, components, flows, tests, epics) is grounded in confirmed user intent — not in AI inference.
+
+**Inputs**:
+- `_docs/00_problem/problem.md`, `acceptance_criteria.md`, `restrictions.md`
+- `_docs/00_problem/input_data/*`
+- `_docs/01_solution/solution.md` (and any earlier `solution_draft*.md` siblings)
+- Any blackbox-test findings produced in Step 1
+
+**Outputs**:
+- `_docs/02_document/glossary.md` (NEW)
+- A confirmed "Architecture Vision" paragraph + bullet list held in working memory and used as the spine of Phase 2a's `architecture.md`
+
+**Procedure**:
+
+1. **Draft glossary** — extract project-specific terminology from inputs (NOT generic software terms). Include:
+   - Domain entities, processes, and roles
+   - Acronyms / abbreviations
+   - Internal codenames or product names
+   - Synonym pairs in active use (e.g., "flight" vs. "mission")
+   - Stakeholder personas referenced in problem.md
+   Each entry: one-line definition, plus a parenthetical source (`source: problem.md`, `source: solution.md §3`).
+   Skip terms that have a single well-known industry meaning (REST, JSON, etc.).
+
+2. **Draft architecture vision** — synthesize from inputs:
+   - **One paragraph**: what the system is, who uses it, the shape of the runtime topology (monolith / services / pipeline / library / hybrid).
+   - **Components & responsibilities** (one-line each). At this stage these are *intent-level*, not the formal decomposition that Step 3 produces.
+   - **Major data flows** (one or two sentences each).
+   - **Architectural principles / non-negotiables** the user has implied (e.g., "DB-driven config", "no per-component state outside Redis", "all UI traffic via REST + SSE only").
+   - **Open architectural questions** the AI cannot resolve from inputs alone.
+
+3. **Present condensed view** to the user (NOT the full draft files — a synopsis only):
+
+   ```
+   ══════════════════════════════════════
+    REVIEW: Glossary + Architecture Vision
+   ══════════════════════════════════════
+    Glossary (N terms drafted):
+      - <Term>: <one-line definition>
+      - ...
+    Architecture Vision:
+      <one-paragraph synopsis>
+
+      Components / responsibilities:
+        - <component>: <one-line>
+        - ...
+
+      Principles / non-negotiables:
+        - <principle>
+        - ...
+
+      Open questions (AI could not resolve):
+        - <q1>
+        - <q2>
+   ══════════════════════════════════════
+    A) Looks correct — write glossary.md, use vision for Phase 2a
+    B) I want to add / correct entries (provide diffs)
+    C) Answer the open questions first, then re-present
+   ══════════════════════════════════════
+    Recommendation: pick C if open questions exist, otherwise A
+   ══════════════════════════════════════
+   ```
+
+4. **Iterate**:
+   - On B → integrate the user's diffs/additions, re-present the condensed view, loop until A.
+   - On C → ask the listed open questions in one round (M4-style batch), integrate answers, re-present.
+   - **Do NOT proceed to step 5 until the user picks A.**
+
+5. **Save**:
+   - Write `_docs/02_document/glossary.md` with terms in alphabetical order. Include a top-line `**Status**: confirmed-by-user` and the date.
+   - Hold the confirmed vision (paragraph + components + principles) in working memory; Phase 2a will materialize it into `architecture.md` and **must** preserve every confirmed principle and component intent verbatim.
+
+**Self-verification**:
+- [ ] Every glossary entry traces to at least one input file (no invented terms)
+- [ ] Every component listed in the vision is one the inputs reference
+- [ ] All open questions are either answered or explicitly deferred (with the user's acknowledgement)
+- [ ] User picked option A on the latest condensed view
+
+**BLOCKING**: Do NOT proceed to Phase 2a until `glossary.md` is saved and the user has confirmed the architecture vision.
+
 ### Phase 2a: Architecture & Flows

 1. Read all input files thoroughly
 2. Incorporate findings, questions, and insights discovered during Step 1 (blackbox tests)
-3. Research unknown or questionable topics via internet; ask user about ambiguities
-4. Document architecture using `templates/architecture.md` as structure
-5. Document system flows using `templates/system-flows.md` as structure
+3. **Apply confirmed vision from Phase 2a.0**: the architecture document must include a top-level `## Architecture Vision` section that contains the user-confirmed paragraph, components, and principles verbatim. The rest of `architecture.md` (tech stack, deployment model, NFRs, ADRs) builds on top of that section, never contradicts it
+4. Research unknown or questionable topics via internet; ask user about ambiguities
+5. Document architecture using `templates/architecture.md` as structure
+6. Document system flows using `templates/system-flows.md` as structure

 **Self-verification**:
+- [ ] `architecture.md` opens with a `## Architecture Vision` section matching Phase 2a.0
 - [ ] Architecture covers all capabilities mentioned in solution.md
 - [ ] System flows cover all main user/system interactions
-- [ ] No contradictions with problem.md or restrictions.md
+- [ ] No contradictions with problem.md, restrictions.md, or the confirmed vision
 - [ ] Technology choices are justified
 - [ ] Blackbox test findings are reflected in architecture decisions
+- [ ] Every term used in `architecture.md` that is project-specific appears in `glossary.md`

 **Save action**: Write `architecture.md` and `system-flows.md`

@@ -58,4 +58,4 @@ Do NOT create minimal epics with just a summary and short description. The epic

 8. **Create "Blackbox Tests" epic** — this epic will parent the blackbox test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `tests/`.

-**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If `tracker: local`, save locally only.
+**Save action**: Epics created via the configured tracker MCP. Also saved locally in `epics.md` with ticket IDs. If tracker availability fails, follow `.cursor/rules/tracker.mdc`; only if the user explicitly chooses `tracker: local`, save locally only with pending tracker markers.
@@ -133,4 +133,4 @@ Link to architecture.md and relevant component spec.]
  - `component` — a normal per-component epic
  - `cross-cutting` — a shared concern that spans ≥2 components
  - `tests` — the blackbox-tests epic (always exactly one)
-- Complexity points for child issues follow the project standard: 1, 2, 3, 5, 8. Do not create issues above 5 points — split them.
+- Complexity points for child issues follow the project standard: 1, 2, 3, 5. Do not create issues above 5 points — split them.
@@ -181,6 +181,8 @@ Categorized measurable criteria with markdown headers and bullet points:

 Every criterion must have a measurable value. Vague criteria like "should be fast" are not acceptable — push for "less than 400ms end-to-end".

+**AC must be design-independent**: describe testable outcomes only — no libraries, algorithms, parameters, or design choices. Implementation follows AC, never the reverse. (IEEE 830 / Atlassian / GitScrum)
+
 ### input_data/

 At least one file. Options:
@@ -24,6 +24,8 @@ Phase details live in `phases/` — read the relevant file before executing each
 - **Save immediately**: write artifacts to disk after each phase
 - **Delegate execution**: all code changes go through the implement skill via task files
 - **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
+- **Exact-fit recommendations**: do not recommend a replacement pattern, library, service, architecture, algorithm, or "modern approach" merely because it improves structure or solves a similar class of problem. It must fit confirmed product constraints, acceptance criteria, operating context, integration boundaries, and current code realities. Otherwise reject it, mark it experimental, or ask the user before adding it to the roadmap.
+- **Per-mode API capability verification on replacements**: when a refactor proposes replacing or adding a library/SDK/framework/service that exposes multiple modes or configurations, pin the exact mode the refactored code will use (inputs, outputs, runtime) and verify *that mode* via mandatory `context7` lookup plus a saved Minimum Viable Example before promoting the recommendation to `Selected`. Capability claims at the category level ("supports A, B, C modes") must be cross-checked against the literal mode enumeration — `A, B → A+B` style conflations are the recurring silent-failure path.

 ## Context Resolution

@@ -57,7 +59,7 @@ Create REFACTOR_DIR and RUN_DIR if missing. If a RUN_DIR with the same name alre

 Both modes produce `RUN_DIR/list-of-changes.md` (template: `templates/list-of-changes.md`). Both modes then convert that file into task files in TASKS_DIR during Phase 2.

-**Guided mode cleanup**: after `RUN_DIR/list-of-changes.md` is created from the input file, delete the original input file to avoid duplication.
+**Guided mode cleanup**: after `RUN_DIR/list-of-changes.md` is created from the input file, delete the original input file only if it lives outside `RUN_DIR`. If the provided file is already the canonical `RUN_DIR/list-of-changes.md`, keep it as the audit record.

 ## Workflow

@@ -79,10 +81,10 @@ Both modes produce `RUN_DIR/list-of-changes.md` (template: `templates/list-of-ch
 - "refactor [specific target]" → skip phase 1 if docs exist
 - Default → all phases

-**Testability-run specifics** (guided mode invoked by autodev existing-code flow Step 4):
+**Testability-run specifics** (guided mode invoked by autodev existing-code Step 4 or greenfield Step 8):
 - Run name is `01-testability-refactoring`.
 - Phase 3 (Safety Net) is skipped by design — no tests exist yet. Compensating control: the `list-of-changes.md` gate in Phase 1 must be reviewed and approved by the user before Phase 4 runs.
-- Scope is MINIMAL and surgical; reject change entries that drift into full refactor territory (see existing-code flow Step 4 for allowed/disallowed lists). Flagged entries go to `RUN_DIR/deferred_to_refactor.md` for Step 8 (optional full refactor) consideration.
+- Scope is MINIMAL and surgical; reject change entries that drift into full refactor territory (see the invoking flow's testability step for allowed/disallowed lists). Flagged entries go to `RUN_DIR/deferred_to_refactor.md` for the next optional full-refactor step or backlog consideration.
 - After Phase 4 (Execution) completes, write `RUN_DIR/testability_changes_summary.md` as Phase 4.5. Format: one bullet per applied change.
  ```markdown
  # Testability Changes Summary ({{run_name}})
@@ -95,7 +95,7 @@ Also copy to project standard locations:

 **Critical step — do not skip.** Before producing the change list, cross-reference documented business flows against actual implementation. This catches issues that static code inspection alone misses.

-1. **Read documented flows**: Load `DOCUMENT_DIR/system-flows.md`, `DOCUMENT_DIR/architecture.md`, `DOCUMENT_DIR/module-layout.md`, every file under `DOCUMENT_DIR/contracts/`, and `SOLUTION_DIR/solution.md` (whichever exist). Extract every documented business flow, data path, architectural decision, module ownership boundary, and contract shape.
+1. **Read documented flows**: Load `DOCUMENT_DIR/system-flows.md`, `DOCUMENT_DIR/architecture.md` (paying special attention to its `## Architecture Vision` section — that's the user-confirmed structural intent), `DOCUMENT_DIR/glossary.md`, `DOCUMENT_DIR/module-layout.md`, every file under `DOCUMENT_DIR/contracts/`, and `SOLUTION_DIR/solution.md` (whichever exist). Extract every documented business flow, data path, architectural decision, module ownership boundary, and contract shape. Any refactor change that contradicts a confirmed Architecture Vision principle must either be rejected or surfaced to the user before being added to `list-of-changes.md` — those principles are not refactor targets without explicit user approval.

 2. **Trace each flow through code**: For every documented flow (e.g., "video batch processing", "image tiling", "engine initialization"), walk the actual code path line by line. At each decision point ask:
   - Does the code match the documented/intended behavior?
@@ -7,14 +7,29 @@
 ## 2a. Deep Research

 1. Analyze current implementation patterns
-2. Research modern approaches for similar systems
-3. Identify what could be done differently
-4. Suggest improvements based on state-of-the-art practices
+2. Extract the **Project Constraint Matrix** from `problem.md`, `restrictions.md`, `acceptance_criteria.md`, current architecture/docs, and actual code constraints. Include required inputs/outputs, operating context, lifecycle assumptions, integration boundaries, non-functional targets, and hard disqualifiers.
+3. Research modern approaches for similar systems
+4. For each alternative pattern/library/service/architecture/algorithm, research intrinsic implementation constraints: required inputs/outputs, runtime assumptions, supported deployment modes, resource needs, operational limits, licensing/security constraints, and known failure reports.
+
+   **API Capability Verification — Per-Mode (MANDATORY, BLOCKING for proposed replacements)**
+
+   When a refactor recommendation replaces (or adds) a library/SDK/framework/service, the same per-mode verification used by `/research` Step 2 applies — selecting a replacement on category fit alone is the same silent-failure path. For every replacement candidate that has multiple modes or configurations:
+
+   1. **Pin the exact mode/configuration** the refactored code will use, in one explicit sentence. Inputs (data shapes, sensor counts, payloads, rates), outputs (per `acceptance_criteria.md` and contract files), runtime (matching the project's deployment).
+   2. **Run `context7` (or equivalent docs lookup)** for the candidate. **Mandatory for every replacement library/SDK/framework candidate**, not optional. Minimum three queries per candidate: mode enumeration, project's exact mode (with input/output shapes), disqualifier probe ("does this mode produce the required output? are there published limitations on this runtime?"). Append URLs to `RUN_DIR/analysis/research_findings.md` references section.
+   3. **Save a Minimum Viable Example (MVE)** for the pinned mode under `RUN_DIR/analysis/mve_evidence.md` with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌. If no official example covers the project's exact configuration, the recommendation cannot be `Selected` based on category fit alone — it must be `Experimental only` (with required-evidence note) or `Rejected`.
+   4. **Treat "the same library in a different mode" as a different recommendation.** If the project's pinned mode is `<X>` but the only documented evidence covers `<Y>`, do not silently soften the description. Open a separate recommendation row, with its own MVE, fit assessment, and disqualifiers.
+   5. **Common silent-failure pattern**: a fact summary paraphrases docs as "supports A, B, C, D modes" when the docs actually mean "supports A; B; C and D as separate orthogonal modes" — no `A+B` combination exists. Cross-check paraphrased capability claims against the literal mode enumeration.
+
+5. Identify what could be done differently
+6. Suggest improvements only when they fit the Project Constraint Matrix. A cleaner or more modern approach that violates product constraints must be marked `Rejected` or `Experimental only`, not added as a roadmap recommendation.
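The cross-check in step 4.5 can be illustrated with a literal comparison between the modes a summary claims and the modes the docs actually enumerate. This is a minimal sketch; `undocumented_modes` and the mode names `A` to `D` are placeholders taken from the wording above, not from any real library.

```python
def undocumented_modes(claimed_modes, documented_modes):
    """Return the claimed modes that do not appear in the literal mode enumeration."""
    documented = {mode.strip().lower() for mode in documented_modes}
    return [mode for mode in claimed_modes if mode.strip().lower() not in documented]


# Hypothetical example: the docs list four separate modes,
# but a paraphrased summary claims a combined "A+B" mode.
documented = ["A", "B", "C", "D"]
claimed = ["A", "A+B"]
print(undocumented_modes(claimed, documented))  # ['A+B']
```

Any non-empty result means the paraphrase claims a mode the enumeration does not contain, so that recommendation needs its own row and its own MVE.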

 Write `RUN_DIR/analysis/research_findings.md`:
 - Current state analysis: patterns used, strengths, weaknesses
 - Alternative approaches per component: current vs alternative, pros/cons, migration effort
 - Prioritized recommendations: quick wins + strategic improvements
+- Constraint-fit table: recommendation, **pinned mode/config**, constraints checked, **API capability evidence (MVE link)**, other supporting evidence, mismatches/disqualifiers, status (`Selected` / `Rejected` / `Experimental only` / `Needs user decision`)
+- For every recommendation that replaces or adds a library/SDK/framework, append a **Restrictions × Candidate-Mode sub-matrix** that walks every numbered line of `restrictions.md` and `acceptance_criteria.md` against the candidate's pinned mode, marking each cell ✅ Pass / ❌ Fail / ❓ Verify / N/A with cited evidence. A recommendation cannot be `Selected` while any cell is ❌ or ❓.
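The sub-matrix rule above reduces to a simple predicate. The sketch below assumes the sub-matrix has been loaded into a dictionary keyed by restriction or acceptance-criterion ID; the IDs (`R1`, `AC2`) and the function name are hypothetical.

```python
ALLOWED_CELLS = {"✅", "❌", "❓", "N/A"}
BLOCKING_CELLS = {"❌", "❓"}


def may_be_selected(sub_matrix: dict) -> bool:
    """Return True only if no cell in the sub-matrix is a fail or an open question."""
    for row_id, cell in sub_matrix.items():
        # Free-form prose in a cell is treated as an error, not as a pass.
        if cell not in ALLOWED_CELLS:
            raise ValueError(f"{row_id}: unexpected cell value {cell!r}")
    return not any(cell in BLOCKING_CELLS for cell in sub_matrix.values())
```

For example, `may_be_selected({"R1": "✅", "AC2": "❓"})` returns `False`, because an unverified cell blocks `Selected` just as a failed one does.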

 ## 2b. Solution Assessment & Hardening Tracks

@@ -22,6 +37,7 @@ Write `RUN_DIR/analysis/research_findings.md`:
 2. Identify weak points in codebase, map to specific code areas
 3. Perform gap analysis: acceptance criteria vs current state
 4. Prioritize changes by impact and effort
+5. Reject or escalate any proposed refactor that improves code structure while weakening required behavior, integration contracts, runtime constraints, safety/security posture, or acceptance criteria

 Present optional hardening tracks for user to include in the roadmap:

@@ -47,6 +63,9 @@ Write `RUN_DIR/analysis/refactoring_roadmap.md`:
 - Gap analysis: what's missing, what needs improvement
 - Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
 - Selected hardening tracks and their items
+- Applicability gate: each roadmap item must state constraint fit, mismatches, required evidence, and status (`Selected` / `Rejected` / `Experimental only` / `Needs user decision`)
+
+**BLOCKING applicability gate**: Before 2c and 2d, every recommendation that proceeds to task creation must be `Selected`. Items marked `Rejected` are excluded. Items marked `Experimental only` or `Needs user decision` require a user decision before task creation.
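One way to picture the gate is as a partition of roadmap items by status. This is a sketch only: `gate_roadmap` and the `(name, status)` tuple shape are assumptions made for illustration, not something the skill defines.

```python
def gate_roadmap(items):
    """Split (name, status) pairs into task-ready items and items awaiting the user."""
    ready, awaiting_user = [], []
    for name, status in items:
        if status == "Selected":
            ready.append(name)
        elif status in ("Experimental only", "Needs user decision"):
            awaiting_user.append(name)
        elif status == "Rejected":
            continue  # excluded: documented in analysis, never turned into tasks
        else:
            raise ValueError(f"{name}: unknown status {status!r}")
    return ready, awaiting_user
```

Under this reading, 2c and 2d start only once `awaiting_user` is empty, either because the user promoted those items to `Selected` or because they were rejected.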

 ## 2c. Create Epic

@@ -55,7 +74,7 @@ Create a work item tracker epic for this refactoring run:
 1. Epic name: the RUN_DIR name (e.g., `01-testability-refactoring`)
 2. Create the epic via configured tracker MCP
 3. Record the Epic ID — all tasks in 2d will be linked under this epic
-4. If tracker unavailable, use `PENDING` placeholder and note for later
+4. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; only use `PENDING` placeholders if the user explicitly chooses `tracker: local`

 ## 2d. Task Decomposition

@@ -79,6 +98,12 @@ Convert the finalized `RUN_DIR/list-of-changes.md` into implementable task files
 **Self-verification**:
 - [ ] All acceptance criteria are addressed in gap analysis
 - [ ] Recommendations are grounded in actual code, not abstract
+- [ ] Every recommendation has been checked against the Project Constraint Matrix
+- [ ] No recommendation violates product restrictions, acceptance criteria, documented architecture decisions, or actual code integration boundaries
+- [ ] Every replacement library/SDK/framework recommendation has a pinned mode/config, a saved MVE in `mve_evidence.md`, and a Restrictions × Candidate-Mode sub-matrix with no ❌ or ❓ cells
+- [ ] `context7` (or equivalent) was consulted for every replacement library/SDK/framework recommendation
+- [ ] Paraphrased capability claims have been cross-checked against the literal mode-enumeration evidence (no `A, B → A+B` style conflation)
+- [ ] Rejected and experimental approaches are documented but not converted into implementation tasks without user approval
 - [ ] Roadmap phases are prioritized by impact
 - [ ] Epic created and all tasks linked to it
 - [ ] Every entry in list-of-changes.md has a corresponding task file in TASKS_DIR
@@ -10,7 +10,7 @@
   - All `[TRACKER-ID]_refactor_*.md` files are present
   - Each task file has valid header fields (Task, Name, Description, Complexity, Dependencies)
 2. Verify `TASKS_DIR/_dependencies_table.md` includes the refactoring tasks
-3. Verify all tests pass (safety net from Phase 3 is green)
+3. Verify all tests pass (safety net from Phase 3 is green), unless this is a testability run where Phase 3 was intentionally skipped
 4. If any check fails, go back to the relevant phase to fix

 ## 4b. Delegate to Implement Skill
@@ -21,9 +21,9 @@ The implement skill will:
 1. Parse task files and dependency graph from TASKS_DIR
 2. Detect already-completed tasks (skip non-refactoring tasks from prior workflow steps)
 3. Compute execution batches for the refactoring tasks
-4. Launch implementer subagents (up to 4 in parallel)
+4. Implement tasks sequentially in topological order (no subagents, no parallelism)
 5. Run code review after each batch
-6. Commit and push per batch
+6. Commit per batch and push only when the user approved pushing
 7. Update work item ticket status

 Do NOT modify, skip, or abbreviate any part of the implement skill's workflow. The refactor skill is delegating execution, not optimizing it.
@@ -47,7 +47,7 @@ After the implement skill completes:
 For each successfully completed refactoring task:

 1. Transition the work item ticket status to **Done** via the configured tracker MCP
-2. If tracker unavailable, note the pending status transitions in `RUN_DIR/execution_log.md`
+2. If tracker is unavailable, follow `.cursor/rules/tracker.mdc`; if the user explicitly chose `tracker: local`, note the pending status transitions in `RUN_DIR/execution_log.md`

 For any failed or blocked tasks, leave their status as-is (the implement skill already set them to In Testing or blocked).

@@ -32,7 +32,7 @@ For each component doc affected:
 ## 7d. Update System-Level Documentation

 If structural changes were made (new modules, removed modules, changed interfaces):
-1. Update `_docs/02_document/architecture.md` if architecture changed
+1. Update `_docs/02_document/architecture.md` if architecture changed — but **never edit the `## Architecture Vision` section**. That section is user-confirmed (plan Phase 2a.0 / document Step 4.5); if a refactor invalidates a vision principle, surface it to the user and let them update the vision themselves before continuing. Update only the technical sections below the Vision H2.
 2. Update `_docs/02_document/system-flows.md` if flow sequences changed
 3. Update `_docs/02_document/diagrams/components.md` if component relationships changed

@@ -23,6 +23,7 @@ Save as `RUN_DIR/list-of-changes.md`. Produced during Phase 1 (Discovery).
 - **Problem**: [what makes this problematic / untestable / coupled]
 - **Change**: [what to do — behavioral description, not implementation steps]
 - **Rationale**: [why this change is needed]
+- **Constraint Fit**: [which product constraints / acceptance criteria / integration boundaries this preserves; or "Rejected — violates ..."]
 - **Risk**: [low | medium | high]
 - **Dependencies**: [other change IDs this depends on, or "None"]

@@ -31,6 +32,7 @@ Save as `RUN_DIR/list-of-changes.md`. Produced during Phase 1 (Discovery).
 - **Problem**: [description]
 - **Change**: [description]
 - **Rationale**: [description]
+- **Constraint Fit**: [description]
 - **Risk**: [low | medium | high]
 - **Dependencies**: [C01, or "None"]
 ```
@@ -44,6 +46,8 @@ Save as `RUN_DIR/list-of-changes.md`. Produced during Phase 1 (Discovery).
 - **File(s)** must reference actual files verified to exist in the codebase
 - **Problem** describes the current state, not the desired state
 - **Change** describes what the system should do differently — behavioral, not prescriptive
+- **Constraint Fit** proves the change preserves confirmed product requirements, restrictions, acceptance criteria, architecture decisions, and integration contracts
+- Do not include changes whose only benefit is structural cleanliness if they weaken required behavior or violate constraints; record those as rejected in analysis instead
 - **Dependencies** reference other change IDs within this list; cross-run dependencies use tracker IDs
 - In guided mode, the input file entries are validated against actual code and enriched with file paths, risk, and dependencies before writing
 - In automatic mode, entries are derived from Phase 1 component analysis and Phase 2 research findings
@@ -30,6 +30,27 @@ Transform vague topics raised by users into high-quality, deliverable research r
 - **Internet-first investigation** — do not rely on training data for factual claims; search the web extensively for every sub-question, rephrase queries when results are thin, and keep searching until you have converging evidence from multiple independent sources
 - **Multi-perspective analysis** — examine every problem from at least 3 different viewpoints (e.g., end-user, implementer, business decision-maker, contrarian, domain expert, field practitioner); each perspective should generate its own search queries
 - **Question multiplication** — for each sub-question, generate multiple reformulated search queries (synonyms, related terms, negations, "what can go wrong" variants, practitioner-focused variants) to maximize coverage and uncover blind spots
+- **Component option breadth** — for every component area, build a broad option landscape before selecting. Search direct candidates, adjacent-domain alternatives, commercial/open-source variants, classical/simple baselines, current SOTA, and "do not use" failure cases. A component may not be narrowed to one candidate until alternatives have been searched and rejected with evidence.
+- **Component research depth** — for every serious component candidate, go beyond discovery pages. Read official docs, repository/license files, issue discussions, benchmarks, deployment guides, version/platform requirements, security notes, maintenance signals, and real-world failure reports. Extract evidence for inputs/outputs, lifecycle assumptions, runtime/storage/latency fit, integration boundaries, licensing, operational risks, and unsupported scenarios before assigning any selection status.
+- **Exact-fit component selection** — never select a component, tool, library, service, architecture pattern, or algorithm merely because it solves a similar class of problem. It must be proven compatible with the project's explicit operating context, constraints, required inputs/outputs, non-functional requirements, lifecycle assumptions, and acceptance criteria. If fit is unproven or mismatched, mark it `Rejected`, `Experimental only`, or escalate for user decision before it can shape the solution.
+- **Per-mode API capability verification** *(applies only to technical-component selection — see Research Output Class below)* — when a candidate library/SDK/framework/service exposes multiple modes or configurations, *the candidate is not a single thing*. Pin the exact mode the project will use (one explicit sentence: inputs, outputs, runtime), and verify *that mode* against the project's required inputs/outputs via official docs (mandatory `context7` lookup) plus a saved Minimum Viable Example. Capability claims at the category level ("supports X, Y, Z modes") must be cross-checked against the literal mode enumeration before being treated as project-applicable. Two modes of one library are two distinct candidates for the purposes of the Component Applicability Gate. Does not apply to non-technical research (concept comparison, market/policy investigation, knowledge organization, etc.).
+
+## Research Output Class (BLOCKING — set in Step 1)
+
+Before applying any of the technical-component gates (per-mode API capability verification, Component Applicability Gate, Restrictions × Candidate-Mode sub-matrix, MVE evidence, mandatory `context7` lookup), classify the research output into one of two classes. Record the decision in `00_question_decomposition.md` once, near the top, so every downstream step honors it.
+
+| Class | What the output recommends or selects | Examples | Technical-component gates apply? |
+|-------|---------------------------------------|----------|----------------------------------|
+| **Technical-component selection** | One or more libraries, SDKs, frameworks, services, protocols, data formats, infrastructure patterns, algorithms, or APIs that will be implemented or operated against | "Pick a vector database", "Compare auth-token strategies for our API", "Should we use Kafka or RabbitMQ?", architecture / tech-stack / migration drafts (Mode A, Mode B) | **Yes — all gates active** |
+| **Non-technical investigation** | Concept comparisons, knowledge organization, root-cause investigation of an event, market/policy/regulatory/social analysis, literature review, decision support without committing to specific tooling | "Why did adoption stall in Q3?", "Compare phenomenology vs constructivism", "Map regulatory landscape for X", "What do practitioners say about onboarding under remote-first orgs?" | **No — skip API/MVE/sub-matrix gates; the rest of the 8-step engine still applies** |
+
+How to decide:
+1. Inspect the question and the input files (`problem.md`, `restrictions.md`, `acceptance_criteria.md`, or the standalone input file).
+2. If the deliverable will name specific software/services/protocols that someone will then build with or operate, it is **Technical-component selection**.
+3. If the deliverable is a report, comparison, or recommendation that does not commit to specific tooling, it is **Non-technical investigation**.
+4. **Mixed runs are valid.** Some research questions have a non-technical core but include one technical sub-question (or vice versa). In that case classify per component area within the run, not the run as a whole, and note in `00_question_decomposition.md` which component areas trigger the technical-component gates.
+
+When the run is purely **Non-technical investigation**, the rest of the research engine — question decomposition, perspective rotation, exhaustive web search, fact extraction, comparison framework, reasoning chain, validation, deliverable formatting — still applies in full. The sections that get skipped are explicitly the technical gates listed in the table above.

 ## Context Resolution

@@ -27,13 +27,26 @@
 - [ ] Iterative deepening completed: follow-up questions from initial findings were searched
 - [ ] No sub-question relies solely on training data without web verification

+## Component Option Breadth
+
+- [ ] `00_question_decomposition.md` contains a Component Option Search Plan
+- [ ] Every component area was searched across simple baseline, established production, open-source, commercial/vendor, current SOTA, adjacent-domain, no-build/defer, and known-bad options where applicable
+- [ ] Every component area has at least 3 realistic candidates, or a documented explanation of why broad searches found fewer
+- [ ] Each lead candidate has official/source-of-truth evidence plus independent validation when available
+- [ ] Each component area includes at least one baseline/fallback option and at least one rejected or experimental option when possible
+- [ ] Alternative names, synonyms, and neighboring-domain terms were searched before declaring the option landscape complete
+- [ ] Licensing, runtime, platform, maintenance, and unsupported-scenario searches were performed for every lead, fallback, and rejected candidate
+
 ## Mode A Specific

 - [ ] Phase 1 completed: AC assessment was presented to and confirmed by user
 - [ ] AC assessment consistent: Solution draft respects the (possibly adjusted) acceptance criteria and restrictions
 - [ ] Competitor analysis included: Existing solutions were researched
 - [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost
+- [ ] Component options are broad: component tables include baseline, production, open-source, commercial/vendor, SOTA/research, adjacent-domain, defer/no-build, and disqualified options where applicable
 - [ ] Tools/libraries verified: Suggested tools actually exist and work as described
+- [ ] Component fit matrix completed: `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) exists and every selected component/tool/pattern is marked `Selected`
+- [ ] No field-adjacent substitution: no selected candidate is chosen only because it solves a similar class of problem while failing the project's explicit constraints
 - [ ] Testing strategy covers AC: Tests map to acceptance criteria
 - [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements
 - [ ] Security analysis documented (if Phase 4 ran): `security_analysis.md` has threat model and per-component controls
@@ -45,6 +58,9 @@
 - [ ] New draft is self-contained: Written as if from scratch, no "updated" markers
 - [ ] Performance column included: Mode B comparison tables include performance characteristics
 - [ ] Previous draft issues addressed: Every finding in the table is resolved in the new draft
+- [ ] Existing selected components were challenged against a broad alternative landscape before being kept
+- [ ] Existing component fit audited: every old and new component/tool/pattern was checked against `restrictions.md`, `acceptance_criteria.md`, and the Project Constraint Matrix
+- [ ] Rejected/experimental candidates are not lead recommendations unless the user explicitly accepted the risk

 ## Timeliness Check (High-Sensitivity Domain BLOCKING)

@@ -64,7 +80,7 @@ When the research topic has Critical or High sensitivity level:
 ## Target Audience Consistency Check (BLOCKING)

 - [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries
-- [ ] Every source has target audience annotated in `01_source_registry.md`
+- [ ] Every source has target audience annotated in `01_source_registry.md` (or category files under `01_source_registry/` if split)
 - [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only)
 - [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary
 - [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences
@@ -76,3 +92,33 @@ When the research topic has Critical or High sensitivity level:
 - [ ] Cited facts have corresponding statements in the original text (no over-interpretation)
 - [ ] Source publication/update dates annotated; technical docs include version numbers
 - [ ] Unverifiable information annotated `[limited source]` and not sole support for core conclusions
+
+## Exact-Fit Validation (BLOCKING)
+
+- [ ] Project Constraint Matrix extracted from problem context before component selection
+- [ ] Component fit matrix includes `Component Area`, `Option Family`, and `Pinned Mode/Config` columns
+- [ ] Every selected component/tool/library/service/pattern/algorithm has evidence for required inputs/outputs and integration boundaries
+- [ ] Every selected candidate has evidence for the operating context and lifecycle assumptions it must support
+- [ ] Every selected candidate has evidence for non-functional targets that are binding for the project
+- [ ] Known unsupported scenarios and failure reports were searched for every selected candidate
+- [ ] Mismatches are recorded as disqualifiers, not softened into generic limitations
+- [ ] Any candidate with unproven fit is marked `Experimental only` or escalated for user decision
+- [ ] Any candidate with documented constraint conflict is marked `Rejected`
+
+## API Capability Verification (BLOCKING)
+
+**Applicability**: this checklist applies only when the run is classified as **Technical-component selection** (see SKILL.md → Research Output Class). For non-technical research (concept comparison, market/policy investigation, root-cause analysis, knowledge organization), skip this checklist entirely and note the skip in `05_validation_log.md`. For mixed runs, apply only to technical component areas.
+
+For every lead candidate that is a library/SDK/framework/service:
+
+- [ ] The exact mode/configuration the project will use is pinned in one explicit sentence (inputs, outputs, runtime); no vague "supports X" language
+- [ ] `context7` (or equivalent docs lookup) was run for the candidate, with at least 3 queries: mode enumeration, project's exact mode, disqualifier probe
+- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md` (or files under `01_source_registry/` if split)
+- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` / `02_fact_cards/` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌
+- [ ] When the MVE inputs or outputs do not exactly match the project's, the mismatch is cited from the official docs (not inferred), and the candidate is `Experimental only` or `Rejected`
+- [ ] When a library has multiple modes, each project-relevant mode appears as its own candidate row (not a single library row that softens across modes)
+- [ ] Restrictions × Candidate-Mode sub-matrix in `06_component_fit_matrix.md` (or files under `06_component_fit_matrix/` if split) is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion
+- [ ] Sub-matrix uses ✅ / ❌ / ❓ / N/A only — no free-form prose substitutes
+- [ ] No `Selected` candidate has any ❌ or ❓ cell in its sub-matrix
+- [ ] "Validation gate required" footnotes are explicitly classified as either *API capability* (must be resolved here) or *runtime quality* (may be carried forward)
+- [ ] Paraphrased capability claims in fact cards have been cross-checked against the literal mode-enumeration evidence (no `mono, inertial → mono-inertial` style conflation)
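The MVE fields listed in the checklist above could be captured in a small record, with the match assessment capping the status a candidate may hold. This is a hedged sketch: the class, the field names, and the mapping of ⚠️ to `Experimental only` and ❌ to `Rejected` are assumptions; the checklist only says a mismatched candidate is one of those two.

```python
from dataclasses import dataclass
from typing import Optional

MATCH, PARTIAL, MISMATCH = "✅", "⚠️", "❌"


@dataclass
class MveRecord:
    source: str
    example_inputs: str
    example_outputs: str
    project_inputs: str
    project_outputs_required: str
    match: str  # one of MATCH, PARTIAL, MISMATCH


def strongest_allowed_status(mve: Optional[MveRecord]) -> str:
    """Return the strongest status the MVE evidence alone permits."""
    if mve is None:
        # No official example covers the pinned mode: category fit is not enough.
        return "Experimental only"
    if mve.match == MATCH:
        return "Selected"  # still subject to the sub-matrix and other gates
    if mve.match == PARTIAL:
        return "Experimental only"  # assumed mapping for a partial match
    if mve.match == MISMATCH:
        return "Rejected"  # assumed mapping for a documented mismatch
    raise ValueError(f"unknown match assessment {mve.match!r}")
```

The function returns an upper bound rather than a decision; a ✅ match still has to pass the Restrictions × Candidate-Mode sub-matrix before the candidate is marked `Selected`.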
@@ -89,7 +89,7 @@ Value Translation:

 ## Source Registry Entry Template

-For each source consulted, immediately append to `01_source_registry.md`:
+For each source consulted, immediately append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if the artifact has been split — see splittable-artifacts convention in `steps/00_project-integration.md`):
 ```markdown
 ## Source #[number]
 - **Title**: [source title]
@@ -57,22 +57,49 @@ RESEARCH_DIR/
 ├── 03_comparison_framework.md     # Step 4 output: selected framework and populated data
 ├── 04_reasoning_chain.md          # Step 6 output: fact → conclusion reasoning
 ├── 05_validation_log.md           # Step 7 output: use-case validation results
+├── 06_component_fit_matrix.md     # Step 7.5 output: component exact-fit gate
 └── raw/                           # Raw source archive (optional)
    ├── source_1.md
    └── source_2.md
 ```

+#### Splittable artifacts — Layout convention
+
+The following three artifacts MAY equivalently be a **folder** of the same base name when the single-file form has grown unwieldy (typically ≳ 1000 lines or ≳ 200 KB):
+
+- `01_source_registry.md` ↔ `01_source_registry/`
+- `02_fact_cards.md` ↔ `02_fact_cards/`
+- `06_component_fit_matrix.md` ↔ `06_component_fit_matrix/`
+
+When using the folder form:
+
+- Place a `00_summary.md` index file at the folder root with a short common summary table and the cross-cutting status the single-file form would have carried in its preamble.
+- Split per-entry content into category files (e.g. one file per sub-question or per component): `SQ1_*.md`, `C1_*.md`, etc. Keep entry numbering global across the folder so cross-references like "Source #42" still resolve to exactly one place.
+- Cross-references from outside the folder may point at either `01_source_registry/00_summary.md` (for the index) or directly at the relevant category file.
+
+```
+RESEARCH_DIR/01_source_registry/        # split form (when single-file is too large)
+├── 00_summary.md                       # index + investigation status + compact source table
+├── SQ1_existing_systems.md             # category file
+├── SQ2_canonical_pipeline.md           # category file
+├── C1_vio.md                           # per-component file
+└── ...
+```
+
+Throughout the rest of this skill (other steps, references, templates), the singular `XX.md` form is used as a logical name; treat each occurrence as applying equally to the folder form when the artifact has been split.
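A reader or script that consumes these artifacts has to handle both layouts. The sketch below shows one possible resolution of a logical name to its backing file(s); `resolve_artifact` is a hypothetical helper, and preferring the folder when both forms exist is an assumption the convention does not state.

```python
from pathlib import Path


def resolve_artifact(research_dir: Path, logical_name: str) -> list:
    """Return the file(s) backing a logical artifact name such as '01_source_registry.md'."""
    folder = research_dir / Path(logical_name).stem
    single_file = research_dir / logical_name
    if folder.is_dir():
        # Split form: 00_summary.md sorts first, followed by the category files.
        return sorted(folder.glob("*.md"))
    if single_file.is_file():
        return [single_file]
    return []  # artifact not created yet
```

With the example layout above, `resolve_artifact(research_dir, "01_source_registry.md")` would return `00_summary.md` first and then the `C1_*` and `SQ*` category files.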
+
 ### Save Timing & Content

 | Step | Save immediately after completion | Filename |
 |------|-----------------------------------|----------|
 | Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
 | Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
-| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
-| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
+| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` *(splittable, see convention)* |
+| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` *(splittable, see convention)* |
 | Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
 | Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
 | Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
+| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` *(splittable, see convention)* |
 | Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |

 ### Save Principles
@@ -90,11 +117,12 @@ RESEARCH_DIR/
 |------|---------|----------------|
 | `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
 | `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
-| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
-| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
+| `01_source_registry.md` *(splittable)* | All source links and summaries | Continuously updated during Step 2 |
+| `02_fact_cards.md` *(splittable)* | Extracted facts and sources | Continuously updated during Step 3 |
 | `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
 | `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
 | `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
+| `06_component_fit_matrix.md` *(splittable)* | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting |
 | `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
 | `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
 | `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |
@@ -6,7 +6,9 @@ Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the us

 **Role**: Professional software architect

-A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.
+> **AC must be design-independent**: describe testable outcomes only, with no libraries, algorithms, parameters, or design choices. Implementation follows from AC, never the reverse. (IEEE 830 / Atlassian / GitScrum)
+
+A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them. Any revision proposed in this phase must respect the design-independence rule above — propose AC changes as outcome/budget edits, not as implementation prescriptions.

 **Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)

@@ -73,16 +75,18 @@ Full 8-step research methodology. Produces the first solution draft.
 **Task** (drives the 8-step engine):
 1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
 2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
-3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
-4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
-5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
-6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
-7. Include security considerations in each component analysis
-8. Provide rough cost estimates for proposed solutions
+3. Derive a **Project Constraint Matrix** before evaluating component options. Extract exact constraints from `problem.md`, `restrictions.md`, `acceptance_criteria.md`, input data notes, and the Phase 1 AC assessment. Include required inputs/outputs, operating context, runtime envelope, data availability, lifecycle boundaries, non-functional targets, integration boundaries, security constraints, and explicit out-of-scope decisions.
+4. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
+5. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
+6. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
+7. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
+8. For every candidate component/tool/library/service/pattern/algorithm, prove exact fit against the Project Constraint Matrix. A field-adjacent solution is not selectable unless its documented implementation assumptions match the project's constraints. Mismatches must be recorded as disqualifiers and the candidate marked `Rejected`, `Experimental only`, or `Needs user decision`.
+9. Include security considerations in each component analysis
+10. Provide rough cost estimates for proposed solutions

 Be concise in formulating. The fewer words, the better, but do not miss any important details.

-**Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
+**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`

 ---

@@ -10,18 +10,25 @@ Full 8-step research methodology applied to assessing and improving an existing

 **Task** (drives the 8-step engine):
 1. Read the existing solution draft thoroughly
-2. Research in internet extensively — for each component/decision in the draft, search for:
+2. Derive or refresh the **Project Constraint Matrix** from all files in INPUT_DIR. Include required inputs/outputs, operating context, runtime envelope, data availability, lifecycle boundaries, non-functional targets, integration boundaries, security constraints, and explicit out-of-scope decisions.
+3. Audit every component/decision in the existing draft against the Project Constraint Matrix before researching alternatives:
+   - If a component's documented implementation assumptions match the project constraints, keep it eligible and record evidence.
+   - If fit is unproven, mark it `Experimental only` until evidence is found.
+   - If constraints conflict, mark it `Rejected` and search for alternatives.
+   - If rejecting it changes product behavior or risk materially, escalate for user decision.
+4. Search the internet extensively. For each component/decision in the draft, look for:
   - Known problems and limitations of the chosen approach
   - What practitioners say about using it in production
   - Better alternatives that may have emerged recently
   - Common failure modes and edge cases
   - How competitors/similar projects solve the same problem differently
-3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
-4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
-5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
-6. For each identified weak point, search for multiple solution approaches and compare them
-7. Based on findings, form a new solution draft in the same format
+5. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
+6. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
+7. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
+8. For each identified weak point, search for multiple solution approaches and compare them
+9. For every revised candidate, prove exact fit against the Project Constraint Matrix. Do not select field-adjacent or "similar problem" options unless their intrinsic implementation constraints match the project.
+10. Based on findings, form a new solution draft in the same format

-**Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
+**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`

 **Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.
@@ -40,6 +40,7 @@ Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources
 - "What existing/competitor solutions address this problem?"
 - "What are the component parts of this problem?"
 - "For each component, what are the state-of-the-art solutions?"
+- "For each component, what are the practical alternatives across simple baseline, established production option, open-source option, commercial option, current SOTA, adjacent-domain option, and no-build/defer option?"
 - "What are the security considerations per component?"
 - "What are the cost implications of each approach?"

@@ -48,6 +49,7 @@ Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources
 - "What are the security vulnerabilities in the proposed architecture?"
 - "Where are the performance bottlenecks?"
 - "What solutions exist for each identified issue?"
+- "For each component already selected in the draft, what alternatives should be considered before keeping, replacing, or rejecting it?"

 **General sub-question patterns** (use when applicable):
 - **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
@@ -84,6 +86,27 @@ For **each sub-question**, generate **at least 3-5 search query variants** befor

 Record all planned queries in `00_question_decomposition.md` alongside each sub-question.

+#### Component Option Breadth (MANDATORY)
+
+Before Step 2, identify the component areas implied by the problem and create a search plan for options in each area. A component area is any replaceable tool, library, model, service, algorithm, data format, protocol, infrastructure pattern, or validation approach that could materially affect the solution.
+
+For every component area, generate search queries for these option families unless clearly not applicable:
+- **Simple baseline**: low-complexity classical or manual approach that can serve as a fallback or regression baseline.
+- **Established production option**: mature library/service/pattern with field usage.
+- **Open-source candidate**: permissive-license option with inspectable implementation and community history.
+- **Commercial/vendor option**: paid or vendor-supported option, including SDK/platform constraints.
+- **Current SOTA / research option**: recent model, paper, or benchmark leader that may be promising but immature.
+- **Adjacent-domain option**: solution from a neighboring domain with similar constraints.
+- **No-build / defer option**: whether the component can be avoided, simplified, or moved out of scope.
+- **Known bad option**: candidate or family that appears attractive but has documented failure modes or disqualifiers.
+
+For each component area, record:
+- Candidate names and option families to search.
+- At least 5 query variants covering alternatives, comparisons, limitations, licensing, runtime/scale, and exact project constraints.
+- The minimum evidence needed to mark a candidate `Selected`, `Rejected`, `Experimental only`, or `Needs user decision`.
+
+Add this as a "Component Option Search Plan" section in `00_question_decomposition.md`.
+
 **Research Subject Boundary Definition (BLOCKING - must be explicit)**:

 When decomposing questions, you must explicitly define the **boundaries of the research subject**:
@@ -94,6 +117,9 @@ When decomposing questions, you must explicitly define the **boundaries of the r
 | **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
 | **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
 | **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |
+| **Operating context** | What exact environment, lifecycle phase, and runtime conditions must the solution support? | In-flight embedded runtime vs offline post-processing; production web traffic vs admin batch job |
+| **Required interfaces** | What inputs, outputs, protocols, data shapes, and ownership boundaries are fixed? | One camera vs stereo rig; REST API vs message queue; local file boundary vs service API |
+| **Non-functional envelope** | What latency, throughput, storage, memory, availability, safety, security, cost, and maintainability targets are binding? | <400 ms p95, 8 GB RAM, 99.9% availability, reversible migrations |

 **Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.

@@ -116,9 +142,11 @@ Record the audit result in `00_question_decomposition.md` as a "Completeness Aud
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
+   - **Project Constraint Matrix summary** (operating context, required interfaces, non-functional envelope, lifecycle assumptions, and hard disqualifiers extracted from input files)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
+   - **Component Option Search Plan** (component areas, option families, candidate names, query variants, required evidence)
   - **Completeness audit** (taxonomy cross-reference + domain discovery results)
 4. Use TodoWrite to track progress

@@ -132,7 +160,7 @@ Tier sources by authority, **prioritize primary sources** (L1 > L2 > L3 > L4). C

 **Tool Usage**:
 - Use `WebSearch` for broad searches; `WebFetch` to read specific pages
-- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
+- Use the `context7` MCP server (`resolve-library-id` then `query-docs` / `get-library-docs`) for up-to-date library/framework documentation. **Mandatory per lead candidate** — see "API Capability Verification" below.
 - Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
 - When citing web sources, include the URL and date accessed

@@ -145,17 +173,77 @@ Do not stop at the first few results. The goal is to build a comprehensive evide
 - Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
 - If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

+**Minimum search effort per component area**:
+- Search every option family from the "Component Option Search Plan" before choosing a lead candidate.
+- For each lead, fallback, or rejected candidate, search at least one official/source-of-truth page and at least one independent validation source when available.
+- Search `"[component] alternatives"`, `"[candidate] vs [alternative]"`, `"[candidate] limitations"`, `"[candidate] license"`, `"[candidate] production"`, and `"[candidate] [binding project constraint]"`.
+- If fewer than 3 realistic candidates are found for a component area, explicitly document why the landscape is narrow and search adjacent domains before accepting that result.
+- Include at least one simple baseline and one "do not use" or disqualified candidate per component area when possible; these prevent false confidence in the selected option.
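
A minimal sketch of the query patterns listed above, assuming a hypothetical helper `candidate_queries` and placeholder candidate names (`libfoo`, `libbar` are not real libraries). It only expands the quoted templates; it does not run any search.

```python
def candidate_queries(component, candidate, alternatives, constraints):
    """Build the minimum query set for one candidate (hypothetical helper)."""
    queries = [
        f"{component} alternatives",
        f"{candidate} limitations",
        f"{candidate} license",
        f"{candidate} production",
    ]
    # One comparison query per known alternative.
    queries += [f"{candidate} vs {alt}" for alt in alternatives]
    # One query per binding project constraint, quoted from restrictions/AC.
    queries += [f"{candidate} {constraint}" for constraint in constraints]
    return queries


plan = candidate_queries(
    component="visual odometry",
    candidate="libfoo",        # placeholder name, not a real library
    alternatives=["libbar"],   # placeholder name, not a real library
    constraints=["8 GB RAM"],
)
for query in plan:
    print(query)
```

Generating the list up front makes it easy to record the planned queries in `00_question_decomposition.md` before any search is run.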
+
+**Candidate implementation-limit searches (MANDATORY)**:
+For every component/tool/library/service/pattern/algorithm that may be selected or recommended, search for its intrinsic implementation constraints. Do not rely on product category labels, marketing summaries, or examples from a different operating context. Include query variants for:
+- Official supported inputs/outputs, protocols, data formats, and deployment modes
+- Required hardware/runtime/platform/version constraints
+- Timing, throughput, memory, storage, synchronization, and scaling assumptions
+- Lifecycle assumptions: offline vs online, batch vs real time, development vs production, single tenant vs multi tenant, local vs networked
+- Known unsupported scenarios, limitations, issue reports, production failures, and workarounds
+- Licensing, security, maintenance, and community-health constraints
+- Exact phrases from the project's restrictions and acceptance criteria combined with the candidate name
+
+**API Capability Verification — Per-Mode (MANDATORY, BLOCKING for lead candidates)**:
+
+**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. State the skip explicitly once in `02_fact_cards.md` (or in `02_fact_cards/00_summary.md` if split): `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`.
+
+Most libraries/SDKs/services expose **multiple modes or configurations** (e.g., monocular vs stereo VO, sync vs async API, batch vs streaming inference, write-through vs write-behind cache). Selecting a candidate "because it supports X" without pinning *which mode* the project will use, and *whether that exact mode produces the required outputs from the required inputs*, is the most common silent-failure path in research. A library can support a class of problem in mode A while being unusable for the project's specific configuration in mode B.
+
+For every lead candidate that is a library/SDK/framework/service with multiple modes or configurations, do the following — in this order, before marking the candidate `Selected`:
+
+1. **Pin the exact mode/configuration the project will use.**
+   Derived from the Project Constraint Matrix: which inputs are available (sensor count, sensor types, data shapes, rates), which outputs are required (per `acceptance_criteria.md` and contract files), which hardware/runtime is fixed (per `restrictions.md`). Write this as a single sentence: "We will use `<library>` in `<mode/config>` with inputs `<list>` and expect outputs `<list>` on `<runtime>`." Do not progress past this step on a vague mode description.
+
+2. **Run `context7` (or equivalent docs lookup) for the candidate** — this is **mandatory for every lead library/SDK/framework candidate**, not optional. Minimum three queries per candidate:
+   1. *Mode enumeration*: "What modes/configurations does `<library>` support? List every value of the mode/config enum and what each requires as input."
+   2. *Project's exact mode*: "Show a minimum runnable example of `<library>` in `<the pinned mode>` with `<the project's input shape>`. What does it produce?"
+   3. *Disqualifier probe*: "Does `<library>` `<the pinned mode>` produce `<the required output>`? Are there published limitations of `<the pinned mode>` for `<the project's runtime/hardware>`?"
+
+   For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split — see splittable-artifacts convention in `00_project-integration.md`).
+
+3. **Save a Minimum Viable Example (MVE) for the pinned mode.**
+   Append to `02_fact_cards.md` / `02_fact_cards/` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with:
+
+   ```markdown
+   ## MVE — <library> in <pinned mode>
+   - **Source**: <official URL or context7 reference, with date>
+   - **Inputs in the example**: <e.g., 2 calibrated cameras + IMU at 200 Hz>
+   - **Outputs in the example**: <e.g., 6-DoF pose with covariance>
+   - **Project inputs**: <e.g., 1 camera + IMU at 200 Hz>
+   - **Project outputs required**: <e.g., 6-DoF pose with metric translation>
+   - **Match assessment**: ✅ exact match / ⚠️ partial (specify dimension) / ❌ mismatch (specify dimension)
+   - **If ⚠️ or ❌**: cite the official-docs sentence that establishes the mismatch.
+   ```
+
+   If no official example covers the project's exact configuration → the candidate cannot be marked `Selected` based on category fit alone. Status must be `Experimental only` (with required-evidence note) or `Rejected` (when the docs explicitly disqualify the configuration).
+
+4. **Bind every numbered Restriction and Acceptance Criterion to the candidate's pinned mode.**
+   For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` (or the candidate's per-component file under `02_fact_cards/` if split) under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings.
+
+5. **Treat "the same library in a different mode" as a different candidate.**
+   If the project's pinned mode is `Monocular` but the only documented evidence covers `Stereo`, do not silently soften "rotation only" into "rotation + translation". Open a separate candidate row for the Monocular mode, with its own MVE, fit assessment, and disqualifiers. Two modes of one library are two distinct candidates for the purposes of this gate.
+
+**Common silent-failure pattern this guards against**: a fact card paraphrases the docs as "supports A, B, C, D modes" when the docs actually mean "supports A; B; C and D as separate orthogonal modes". A category-level "Selected" decision then carries through every downstream artifact, masking that the project's required A+B combination does not exist as a single mode.
+
 **Search broadening strategies** (use when results are thin):
 - Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
 - Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
 - Try different geographies: search in English + search for European/Asian approaches if relevant
 - Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
 - Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"
+- Try disqualifier probes: "X unsupported", "X limitations", "X requirements", "X with [project constraint]", "X without [required input]", "X real-time [target]", "X production failure"

 **Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
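
The saturation rule can be sketched as a simple set check. This is an illustration under stated assumptions: each search is reduced to a set of fact identifiers (the hypothetical `facts_per_search` input), and "substantially new" is approximated as "any identifier not seen before", which is stricter than a human judgment would be.

```python
def is_saturated(facts_per_search, window=3):
    """Return True when the last `window` searches added no unseen facts.

    `facts_per_search` is a list of sets, one set of fact identifiers per
    search, in the order the searches were run (assumed representation).
    """
    if len(facts_per_search) <= window:
        # Not enough history to compare the recent window against.
        return False
    earlier = set().union(*facts_per_search[:-window])
    recent = set().union(*facts_per_search[-window:])
    # Saturated when every recent fact was already known before the window.
    return recent <= earlier


history = [{"a", "b"}, {"c"}, {"a"}, {"b", "c"}, {"a"}]
print(is_saturated(history))
```

In practice the reviewer still decides whether a rephrased fact counts as new; the check only flags when it is reasonable to stop.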

 **Save action**:
-For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
+For each source consulted, **immediately** append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split) using the entry template from `references/source-tiering.md`.

 ---

@@ -185,7 +273,7 @@ Transform sources into **verifiable fact cards**:
  - ❓ Low: Inference or from unofficial sources

 **Save action**:
-For each extracted fact, **immediately** append to `02_fact_cards.md`:
+For each extracted fact, **immediately** append to `02_fact_cards.md` (or the appropriate category file under `02_fact_cards/` if split):
 ```markdown
 ## Fact #[number]
 - **Statement**: [specific fact description]
@@ -194,6 +282,7 @@ For each extracted fact, **immediately** append to `02_fact_cards.md`:
 - **Target Audience**: [which group this fact applies to, inherited from source or further refined]
 - **Confidence**: ✅/⚠️/❓
 - **Related Dimension**: [corresponding comparison dimension]
+- **Fit Impact**: [supports selection / disqualifies / makes experimental / needs user decision]
 ```

 **Target audience in fact statements**:
@@ -229,7 +318,7 @@ After initial fact extraction, review what you have found and identify **knowled
   - Failure cases and edge conditions
   - Recent developments that may change the picture

-4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md`
+4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` (use the appropriate category files under `01_source_registry/` and `02_fact_cards/` if split)

 **Exit criteria**: Proceed to Step 4 when:
 - Every sub-question has at least 3 facts with at least one from L1/L2
@@ -24,6 +24,18 @@ Write to `03_comparison_framework.md`:
 | ... | | | |
 ```

+**Required exact-fit dimensions for component/tool decisions**:
+When the output selects or recommends a component, tool, library, service, architecture pattern, or algorithm, the framework MUST include these dimensions unless explicitly not applicable:
+- Option family (`Simple baseline`, `Established production`, `Open-source`, `Commercial/vendor`, `Current SOTA`, `Adjacent-domain`, `No-build/defer`, `Known bad`)
+- Required inputs/outputs and ownership boundaries
+- Operating context and lifecycle fit
+- Non-functional envelope fit
+- Implementation assumptions and hard disqualifiers
+- Evidence quality and source tier
+- Selection status (`Selected`, `Rejected`, `Experimental only`, `Needs user decision`)
+
+For each component area, include multiple candidates in the initial population. Do not present only the preferred option unless the investigation found no realistic alternatives; if so, state the searches that proved the narrow landscape.
+
 ---

 ### Step 5: Reference Point Baseline Alignment
@@ -97,6 +109,8 @@ Validate conclusions against a typical scenario:
 - [ ] Are there any important dimensions missed?
 - [ ] Is there any over-extrapolation?
 - [ ] Are conclusions actionable/verifiable?
+- [ ] Does every selected component/tool/pattern match the Project Constraint Matrix?
+- [ ] Are mismatches marked as disqualifiers instead of hidden as generic "limitations"?

 **Save action**:
 Write to `05_validation_log.md`:
@@ -128,6 +142,66 @@ If using Y: [expected behavior]

 ---

+### Step 7.5: Component Applicability Gate (BLOCKING)
+
+**Applicability**: this gate applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section. For non-technical research (concept comparison, market/policy investigation, root-cause analysis without tooling, knowledge organization), skip this entire step and proceed to Step 8 — there are no components to gate. State the skip once in `05_validation_log.md`: `Step 7.5 (Component Applicability Gate): not applicable — Non-technical investigation`. For mixed runs (some component areas technical, some not), apply this gate only to the technical component areas; the non-technical ones do not produce 7.5 rows.
+
+Before finalizing the solution draft, build an exact-fit matrix for every component/tool/library/service/pattern/algorithm that is selected, recommended, rejected, or treated as a fallback. Free-form prose in a "Project Constraints Checked" column is **not sufficient** — mismatches hide inside rationale text. The matrix must be structured per restriction and per acceptance criterion.
+
+#### 7.5.1 Top-level Component Fit Matrix
+
+```markdown
+# Component Fit Matrix
+
+| Component Area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale |
+|----------------|-----------|--------------------|---------------|---------------|-------------------------|----------------------------|--------|--------------------|
+| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` / `02_fact_cards/` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] |
+```
+
+The new **Pinned Mode/Config** column is mandatory. A row without a pinned mode is incomplete. The new **API Capability Evidence** column links to the Minimum Viable Example saved during Step 2's API Capability Verification — without an MVE link the candidate cannot be `Selected`.
+
+#### 7.5.2 Restrictions × Candidate-Modes Sub-Matrix (MANDATORY)
+
+For each lead candidate row in the top-level matrix, append a structured cross-check that walks every numbered line of `restrictions.md` and `acceptance_criteria.md` against the candidate's **pinned mode/config**.
+
+```markdown
+## Sub-Matrix — <Candidate Name> in <Pinned Mode>
+
+| Restriction / AC | Candidate-mode behavior | Result | Evidence |
+|------------------|-------------------------|--------|----------|
+| R1: <verbatim line from restrictions.md> | <how the pinned mode behaves under this restriction> | ✅ Pass / ❌ Fail / ❓ Verify / N/A | [Fact # / Source # / MVE link] |
+| R2: ... | ... | ... | ... |
+| ... | ... | ... | ... |
+| AC-1.1: <verbatim line from acceptance_criteria.md> | <how the pinned mode satisfies (or contradicts) this AC's measurable target> | ✅ / ❌ / ❓ / N/A | [Fact # / Source # / MVE link] |
+| AC-1.2: ... | ... | ... | ... |
+| ... | ... | ... | ... |
+```
+
+Cell semantics:
+- ✅ **Pass** — the candidate's pinned mode satisfies this line, with cited official-doc or MVE evidence.
+- ❌ **Fail** — the candidate's pinned mode contradicts this line, with cited evidence. Even one ❌ disqualifies the candidate from `Selected` status.
+- ❓ **Verify** — no evidence yet either way; further investigation required (loops back to Step 2 / Step 3.5). A row left ❓ at the end of analysis blocks the candidate.
+- **N/A** — the line is irrelevant to this component area (state why in one phrase).
+
+A candidate row may not be marked `Selected` while any cell is ❌ or ❓.
+
+#### 7.5.3 Decision Rules
+
+- `Selected` is allowed only when (a) the top-level row has an MVE link, (b) the sub-matrix has zero ❌, (c) the sub-matrix has zero ❓, and (d) the candidate's documented implementation assumptions match the project's explicit constraints and acceptance criteria.
+- `Experimental only` is required when a candidate might work but lacks proof for the exact operating context (e.g., MVE exists for a similar configuration but not the exact one).
+- `Rejected` is required when documented assumptions conflict with project constraints (any sub-matrix row is ❌ with cited evidence).
+- `Needs user decision` is required when a mismatch changes scope, cost, safety, product behavior, or acceptance criteria — and the user has not yet been consulted.
+- Each component area must include at least one selected or fallback-safe option, plus the most credible rejected/experimental alternatives discovered during web research.
+- A component area with only one candidate is incomplete unless `00_question_decomposition.md` documents the broader searches and why they yielded no realistic alternatives.
+- A candidate may not appear as the lead solution in Step 8 unless this gate marks it `Selected`.
+- "Validation gate required" footnotes are not equivalent to `Selected`. If the validation gate concerns API capability (does the mode produce the required output?), that is a Step-2 / Step-7.5 question and must be resolved here, not deferred to runtime. Only validation gates concerning *runtime quality* (e.g., "does this VO converge on this terrain class?") may be carried forward as `Selected with runtime gate`.
+
+**Save action**: Write `06_component_fit_matrix.md` (or, when split, the equivalent files under `06_component_fit_matrix/` — typically `00_summary.md` for the top-level matrix plus per-component sub-matrix files) containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices).
+
+**BLOCKING**: If any lead candidate has ❌, ❓, `Experimental only`, `Rejected`, or `Needs user decision` status, do not silently proceed. Ask the user or choose a different selected candidate.
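
The decision rules in 7.5.3 can be read as a small classification function. The sketch below is illustrative only, not part of the skill: the names `Cell` and `candidate_status` are hypothetical, and both the precedence between statuses and the mapping of an unresolved ❓ to `Experimental only` are assumptions that the rules above do not spell out.

```python
from enum import Enum


class Cell(Enum):
    """One sub-matrix cell result (hypothetical encoding of the four markers)."""
    PASS = "pass"      # ✅
    FAIL = "fail"      # ❌
    VERIFY = "verify"  # ❓
    NA = "n/a"


def candidate_status(has_mve, cells, assumptions_match, needs_user_decision=False):
    """Return the fit status for one candidate row.

    has_mve: the top-level row carries an MVE link.
    cells: iterable of Cell values from the candidate's sub-matrix.
    assumptions_match: documented assumptions agree with project constraints/AC.
    needs_user_decision: a mismatch changes scope/cost/safety and the user
        has not been consulted yet.
    """
    cells = list(cells)
    # Assumption: an open user decision takes precedence over every other status.
    if needs_user_decision:
        return "Needs user decision"
    # Any ❌ cell, or assumptions that conflict with constraints, rejects the candidate.
    if Cell.FAIL in cells or not assumptions_match:
        return "Rejected"
    # Selected requires an MVE link and zero ❓ cells (zero ❌ is already guaranteed here).
    if has_mve and Cell.VERIFY not in cells:
        return "Selected"
    # Assumption: everything else (no MVE, or an unresolved ❓) is Experimental only.
    return "Experimental only"
```

A row with an MVE link and only ✅ / N/A cells comes out as `Selected`; leaving a single ❓ in place downgrades the same row, which matches the rule that an unresolved ❓ blocks the candidate.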
+
+---
+
 ### Step 8: Deliverable Formatting

 Make the output **readable, traceable, and actionable**.
@@ -139,8 +213,8 @@ Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md`

 Sources to integrate:
 - Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
+- Reference key facts from `02_fact_cards.md` (or files under `02_fact_cards/` if split)
 - Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
+- Generate references from `01_source_registry.md` (or files under `01_source_registry/` if split)
 - Supplement with use cases from `05_validation_log.md`
 - For Mode A: include AC assessment from `00_ac_assessment.md`
@@ -10,12 +10,21 @@

 [Architecture solution that meets restrictions and acceptance criteria.]

+> **Applicability** — the table columns `Pinned Mode/Config` and `API Capability Evidence` apply only to technical-component runs (per SKILL.md → Research Output Class). For non-technical research outputs (concept comparison, market/policy report, investigation answer), this Architecture section may be replaced with a comparison/analysis section that does not use these columns; or the columns may be marked `N/A` per row when the row describes a non-technical "component" (a process, a policy, an organizational construct). For mixed runs, fill the columns only on rows that describe libraries/SDKs/frameworks/services/protocols/data formats/algorithms.
+
 ### Component: [Component Name]

-| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
-|----------|-------|-----------|-------------|-------------|----------|------|-----|
-| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
-| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [cost] | [fit assessment] |
+| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
+|----------|-------|--------------------|-----------|-------------|-------------|----------|------|-------------------------|-----|
+| [Option 1] | [lib/platform] | [exact mode/config used: inputs, outputs, runtime] | [pros] | [cons] | [intrinsic requirements] | [security] | [cost] | MVE: [link to MVE block]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision — cite exact-fit evidence and disqualifiers] |
+| [Option 2] | [lib/platform] | [exact mode/config used] | [pros] | [cons] | [intrinsic requirements] | [security] | [cost] | MVE: [link]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision] |
+
+**Exact-fit evidence**:
+- Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
+- Evidence: [Fact # / Source #]
+- Disqualifiers: [none or list]
+- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
+- API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected

 [Repeat per component]

@@ -13,12 +13,21 @@

 [Architecture solution that meets restrictions and acceptance criteria.]

+> **Applicability** — the table columns `Pinned Mode/Config` and `API Capability Evidence` apply only to technical-component runs (per SKILL.md → Research Output Class). For non-technical assessment outputs (e.g., reassessing a policy approach, comparing organizational designs), this Architecture section may be replaced with the assessment content that does not use these columns; or the columns may be marked `N/A` per row for non-technical "components". For mixed runs, fill the columns only on rows that describe libraries/SDKs/frameworks/services/protocols/data formats/algorithms.
+
 ### Component: [Component Name]

-| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
-|----------|-------|-----------|-------------|-------------|----------|------------|-----|
-| [Option 1] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
-| [Option 2] | [lib/platform] | [pros] | [cons] | [reqs] | [security] | [perf] | [fit assessment] |
+| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
+|----------|-------|--------------------|-----------|-------------|-------------|----------|------------|-------------------------|-----|
+| [Option 1] | [lib/platform] | [exact mode/config used: inputs, outputs, runtime] | [pros] | [cons] | [intrinsic requirements] | [security] | [perf] | MVE: [link to MVE block]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision — cite exact-fit evidence and disqualifiers] |
+| [Option 2] | [lib/platform] | [exact mode/config used] | [pros] | [cons] | [intrinsic requirements] | [security] | [perf] | MVE: [link]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision] |
+
+**Exact-fit evidence**:
+- Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
+- Evidence: [Fact # / Source #]
+- Disqualifiers: [none or list]
+- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
+- API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected

 [Repeat per component]

@@ -22,7 +22,7 @@ test-run has two modes. The caller passes the mode explicitly; if missing, defau
 | Mode | Scope | Typical caller | Input artifacts |
 |------|-------|---------------|-----------------|
 | `functional` (default) | Unit / integration / blackbox tests — correctness | autodev Steps that verify after Implement Tests or Implement | `scripts/run-tests.sh`, `_docs/02_document/tests/environment.md`, `_docs/02_document/tests/blackbox-tests.md` |
-| `perf` | Performance / load / stress / soak tests — latency, throughput, error-rate thresholds | autodev greenfield Step 9, existing-code Step 15 (pre-deploy) | `scripts/run-performance-tests.sh`, `_docs/02_document/tests/performance-tests.md`, AC thresholds in `_docs/00_problem/acceptance_criteria.md` |
+| `perf` | Performance / load / stress / soak tests — latency, throughput, error-rate thresholds | autodev greenfield Step 15, existing-code Step 15 (pre-deploy) | `scripts/run-performance-tests.sh`, `_docs/02_document/tests/performance-tests.md`, AC thresholds in `_docs/00_problem/acceptance_criteria.md` |

 Direct user invocation (`/test-run`) defaults to `functional`. If the user says "perf tests", "load test", "performance", or passes a performance scenarios file, run `perf` mode.

@@ -32,6 +32,17 @@ After selecting a mode, read its corresponding workflow below; do not mix them.

 ## Functional Mode

+### 0. System-Under-Test Reality Gate
+
+Before accepting any functional, blackbox, or e2e result as a pass, verify what the tests actually exercised.
+
+1. If `_docs/00_problem/input_data/expected_results/results_report.md` exists, at least one e2e/blackbox run must compare actual product outputs against that mapping or the machine-readable files it references.
+2. Stubs are allowed only for external systems outside the product boundary: flight controller/SITL, QGC observer, satellite-provider/Suite service, physical Jetson hardware, physical camera, unavailable licensed datasets, and network services.
+3. Stubs, fakes, deterministic fallbacks, monkeypatches, or direct replacement of internal product modules are not allowed for the behavior under test. Internal examples include VIO, safety/anchor wrapper, satellite retrieval, anchor verification, tile manager, MAVLink output adapter, FDR, and the A-Z localization pipeline.
+4. If tests pass only because an internal module is fake/scaffolded, classify the run as **failed** with category `missing product implementation`.
+5. If a scenario is blocked because external hardware/data is absent, verify the production code path exists before accepting the block as legitimate. Missing internal production code is not an environment block.
+6. If the test runner writes CSV/Markdown reports, inspect them. A zero exit code is not enough; blocked/internal-stubbed scenarios still require classification.
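
As a rough illustration of items 2 to 4 and 6, the gate can be thought of as a check that runs on top of the runner's exit code. This is a minimal sketch under assumptions: `classify_run`, its arguments, and the `"test failure"` category are hypothetical names chosen for the example; only the `missing product implementation` category comes from the text above.

```python
def classify_run(exit_code, stubbed_modules, external_systems):
    """Classify a functional run after inspecting what was stubbed.

    exit_code: exit status of the test runner.
    stubbed_modules: names of everything that was stubbed/faked in the run.
    external_systems: names outside the product boundary where stubs are allowed.
    Returns (result, category, offending_internal_modules).
    """
    # Anything stubbed that is not an allowed external system is an internal module.
    internal_stubs = sorted(set(stubbed_modules) - set(external_systems))
    if internal_stubs:
        # A zero exit code does not rescue a run that faked internal behavior.
        return ("failed", "missing product implementation", internal_stubs)
    if exit_code != 0:
        # Hypothetical category label for ordinary failures.
        return ("failed", "test failure", [])
    return ("passed", None, [])
```

The point of the ordering is that the stub check runs before the exit code is trusted, so a green run that replaced an internal module is still reported as failed.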
+
 ### 1. Detect Test Runner

 Check in order — first match wins:
@@ -94,7 +105,7 @@ Categorize skips as: **explicit skip (dead code)**, **runtime skip (unreachable)

 ### 5. Handle Outcome

-**All tests pass, zero skipped** → return success to the autodev for auto-chain.
+**All tests pass, zero skipped, and the System-Under-Test Reality Gate passes** → return success to the autodev for auto-chain.

 **Any test fails or errors** → this is a **blocking gate**. Never silently ignore failures. **Always investigate the root cause before deciding on an action.** Read the failing test code, read the error output, check service logs if applicable, and determine whether the bug is in the test or in the production code.

@@ -95,7 +95,7 @@ Examples:

 File: `expected_results/image_01_detections.json`

-```json
+```json
 {
  "input": "image_01.jpg",
  "expected": {
@@ -119,7 +119,7 @@ File: `expected_results/image_01_detections.json`
    ]
  }
 }
-```
+```
 ```

 ---
@@ -0,0 +1,16 @@
+target/
+.git/
+.gitignore
+.cargo/
+*.md
+!README.md
+_docs/
+.woodpecker/
+.woodpecker.yml
+.cursor/
+.idea/
+.vscode/
+.DS_Store
+MAVSDK/
+ardupilot/
+build/
@@ -0,0 +1,24 @@
+# autopilot — example environment variables.
+# Copy to `.env` for local dev. `.env` is git-ignored.
+#
+# Non-secret config lives in TOML under config/; this file is for runtime overrides
+# and secrets only (see _docs/02_document/deployment/containerization.md §6).
+
+# Path to the active TOML config. Dev/staging/prod all read this single variable.
+AUTOPILOT_CONFIG=./config/autopilot.dev.toml
+
+# tracing-subscriber filter (see observability.md §2).
+RUST_LOG=info,autopilot=debug
+
+# Health server bind address (matches config.toml default).
+AUTOPILOT_HEALTH_BIND=127.0.0.1:8080
+
+# Runtime VLM flag. The binary must ALSO be built with `--features vlm`
+# for this flag to enable the VLM path.
+AUTOPILOT_VLM_ENABLED=false
+
+# Secrets (must be supplied per environment; never commit real values)
+# In production these come from systemd `EnvironmentFile=` pointing at a
+# permission-restricted file (see containerization.md §3).
+MISSIONS_API_TOKEN=
+GROUND_STATION_TOKEN=
@@ -1,5 +1,21 @@
-MAVSDK/
-ardupilot/
-build/
+/target
+/MAVSDK
+/ardupilot
+/build
 .idea
-.DS_Store
+.DS_Store
+*.swp
+*.swo
+
+# Local environment overrides
+.env
+.env.local
+
+# Editor scratch
+.vscode/
+*.iml
+
+# Coverage / profiling
+*.profraw
+tarpaulin-report.html
+coverage/
@@ -0,0 +1,90 @@
+# Woodpecker CI pipeline.
+# Stages run sequentially per _docs/02_document/deployment/ci_cd_pipeline.md §2.
+# A failed stage stops the pipeline.
+
+clone:
+  git:
+    image: woodpeckerci/plugin-git
+
+steps:
+  fetch:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo fetch --locked
+
+  lint:
+    image: rust:1.82-bookworm
+    commands:
+      - rustup component add rustfmt clippy
+      - cargo fmt --all -- --check
+      - cargo clippy --all-targets --all-features -- -D warnings
+
+  unit-test:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo test --workspace --all-features --locked
+
+  build-arm64:
+    image: rust:1.82-bookworm
+    commands:
+      - rustup target add aarch64-unknown-linux-gnu
+      - cargo install --locked cargo-zigbuild
+      - apt-get update && apt-get install -y --no-install-recommends zig
+      - cargo zigbuild --release --target aarch64-unknown-linux-gnu --workspace --locked
+
+  build-no-vlm:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo build --workspace --no-default-features --locked
+      - cargo test --workspace --no-default-features --locked
+
+  integration-test:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo test --workspace --all-features --locked -- --test-threads=1
+    when:
+      event: [push, pull_request]
+
+  sitl-conformance:
+    image: docker:24-cli
+    commands:
+      - docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from autopilot
+    when:
+      event: [push, pull_request]
+
+  security-scan:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo install --locked cargo-audit cargo-deny
+      - cargo audit
+      - cargo deny check
+
+  package:
+    image: docker:24-cli
+    commands:
+      - docker build -t azaion/autopilot:$${CI_COMMIT_BRANCH}-arm64 .
+    when:
+      branch: [dev, main]
+      event: push
+
+  sign:
+    image: cosign:latest
+    commands:
+      - cosign sign --yes azaion/autopilot:$${CI_COMMIT_TAG}-arm64
+    when:
+      event: tag
+
+  publish:
+    image: docker:24-cli
+    commands:
+      - docker push azaion/autopilot:$${CI_COMMIT_TAG}-arm64
+    when:
+      event: tag
+
+  # Benchmark gate is opt-in (manual / nightly) per ci_cd_pipeline.md §6.
+  benchmark-gate:
+    image: rust:1.82-bookworm
+    commands:
+      - cargo bench --workspace -- --save-baseline ci
+    when:
+      event: cron
@@ -0,0 +1,123 @@
+[workspace]
+resolver = "2"
+members = [
+    "crates/shared",
+    "crates/autopilot",
+    "crates/mavlink_layer",
+    "crates/mission_client",
+    "crates/frame_ingest",
+    "crates/detection_client",
+    "crates/movement_detector",
+    "crates/semantic_analyzer",
+    "crates/vlm_client",
+    "crates/scan_controller",
+    "crates/mapobjects_store",
+    "crates/gimbal_controller",
+    "crates/operator_bridge",
+    "crates/mission_executor",
+    "crates/telemetry_stream",
+]
+
+[workspace.package]
+edition = "2021"
+rust-version = "1.82"
+license = "Proprietary"
+publish = false
+authors = ["AZAION autopilot team"]
+
+[workspace.dependencies]
+# Async runtime
+tokio = { version = "1", features = ["rt-multi-thread", "macros", "sync", "time", "io-util", "net", "signal"] }
+
+# Foundational
+bytes = "1"
+anyhow = "1"
+thiserror = "1"
+async-trait = "0.1"
+once_cell = "1"
+
+# Serialisation
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+toml = "0.8"
+
+# IDs and time
+uuid = { version = "1", features = ["v4", "serde"] }
+chrono = { version = "0.4", default-features = false, features = ["clock", "serde"] }
+
+# Observability
+tracing = "0.1"
+tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "fmt"] }
+
+# CLI
+clap = { version = "4", features = ["derive", "env"] }
+
+# Health server
+axum = { version = "0.7", default-features = false, features = ["http1", "json", "tokio"] }
+tower = "0.5"
+hyper = { version = "1", features = ["server", "http1"] }
+
+# Networking / transports / schema
+reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls", "gzip"] }
+jsonschema = { version = "0.18", default-features = false }
+tokio-serial = "5"
+
+# gRPC (operator-link transport — see telemetry_stream / detection_client)
+tonic = "0.14"
+tonic-prost = "0.14"
+prost = "0.14"
+prost-types = "0.14"
+tonic-prost-build = "0.14"
+protoc-bin-vendored = "3"
+tokio-stream = { version = "0.1", features = ["sync", "net"] }
+
+# Lock-free / sync helpers
+parking_lot = "0.12"
+
+# Crypto / hashing
+sha2 = "0.10"
+hmac = "0.12"
+
+# Wire encoding (VLM IPC)
+base64 = "0.22"
+
+# OS bindings (SO_PEERCRED on Linux)
+libc = "0.2"
+
+# Geospatial
+h3o = "0.7"
+
+# Multimedia (RTSP + H.264/265 decode for frame_ingest — see AZ-658).
+# Linked dynamically against the host FFmpeg 8.x install (libavcodec /
+# libavformat / libavutil / libswscale / libswresample) via pkg-config.
+ffmpeg-next = "8.1"
+
+# Test scaffolding
+wiremock = "0.6"
+tempfile = "3"
+
+# Workspace-internal
+shared = { path = "crates/shared" }
+mavlink_layer = { path = "crates/mavlink_layer" }
+mission_client = { path = "crates/mission_client" }
+frame_ingest = { path = "crates/frame_ingest" }
+detection_client = { path = "crates/detection_client" }
+movement_detector = { path = "crates/movement_detector" }
+semantic_analyzer = { path = "crates/semantic_analyzer" }
+vlm_client = { path = "crates/vlm_client" }
+scan_controller = { path = "crates/scan_controller" }
+mapobjects_store = { path = "crates/mapobjects_store" }
+gimbal_controller = { path = "crates/gimbal_controller" }
+operator_bridge = { path = "crates/operator_bridge" }
+mission_executor = { path = "crates/mission_executor" }
+telemetry_stream = { path = "crates/telemetry_stream" }
+
+[profile.release]
+lto = "thin"
+codegen-units = 1
+strip = "symbols"
+opt-level = 3
+
+[profile.dev]
+opt-level = 0
+debug = true
@@ -1,10 +1,52 @@
-FROM python:3.11-slim
-ARG CI_COMMIT_SHA=unknown
-ENV AZAION_REVISION=$CI_COMMIT_SHA
-RUN apt-get update && apt-get install -y libxml2-dev libxslt1-dev && rm -rf /var/lib/apt/lists/*
-WORKDIR /app
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-COPY . .
+# Multi-stage build for the autopilot binary.
+# This container image is intended for development / CI / emulation (Option B in
+# _docs/02_document/deployment/containerization.md §4); on-airframe deployment
+# uses the native systemd unit (Option A — see deploy/systemd/).
+
+# -----------------------------------------------------------------------------
+# Stage 1: build
+# -----------------------------------------------------------------------------
+ARG RUST_VERSION=1.82
+FROM rust:${RUST_VERSION}-bookworm AS build
+
+WORKDIR /workspace
+
+# Cache dependency compilation by copying manifests first, then source.
+COPY Cargo.toml Cargo.lock* rust-toolchain.toml ./
+COPY .cargo ./.cargo
+COPY crates ./crates
+
+# Default feature set. Override with `--build-arg CARGO_FEATURES=vlm` to enable VLM.
+ARG CARGO_FEATURES=
+RUN if [ -n "$CARGO_FEATURES" ]; then \
+        cargo build --release --features "$CARGO_FEATURES"; \
+    else \
+        cargo build --release; \
+    fi
+
+# -----------------------------------------------------------------------------
+# Stage 2: runtime — production-equivalent NVDEC/TensorRT plumbing (Jetson)
+# -----------------------------------------------------------------------------
+# For emulation environments without GPU we use ubuntu:22.04 (see compose).
+FROM ubuntu:22.04 AS runtime
+
+# Runtime deps (ca-certificates for HTTPS to missions API; libssl for TLS).
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates libssl3 \
+    && rm -rf /var/lib/apt/lists/*
+
+# Non-root user per containerization.md §4.
+RUN groupadd --system --gid 10001 autopilot \
+    && useradd --system --uid 10001 --gid autopilot --shell /usr/sbin/nologin autopilot \
+    && mkdir -p /etc/azaion/autopilot /var/lib/autopilot \
+    && chown -R autopilot:autopilot /var/lib/autopilot
+
+COPY --from=build /workspace/target/release/autopilot /usr/local/bin/autopilot
+
+USER autopilot:autopilot
+ENV AUTOPILOT_CONFIG=/etc/azaion/autopilot/config.toml \
+    RUST_LOG=info \
+    AUTOPILOT_HEALTH_BIND=0.0.0.0:8080
+
 EXPOSE 8080
-CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
+ENTRYPOINT ["/usr/local/bin/autopilot"]
@@ -1,3 +1,77 @@
-# Azaion.Autopilot
+# autopilot

-Python service for autonomous UAV control via MAVLink with behaviour tree execution.
+Onboard mission executor for the AZAION reconnaissance UAV. Single Rust binary; runs on
+NVIDIA Jetson Orin Nano Super (aarch64). See `_docs/02_document/architecture.md` for the
+authoritative system design.
+
+## Layout
+
+```text
+crates/
+  shared/                # canonical DTOs, config, error, health, observability, clock, contracts
+  autopilot/             # binary crate — runtime composition root + /health endpoint
+  mavlink_layer/         # hand-rolled MAVLink v2 transport
+  mission_client/        # missions API REST client + MapObjects sync
+  frame_ingest/          # RTSP pull + decode
+  detection_client/      # bi-directional gRPC to ../detections
+  movement_detector/     # ego-motion-compensated residual-motion clustering
+  semantic_analyzer/     # Tier 2 — primitive graph + ROI CNN
+  vlm_client/            # Tier 3 — optional NanoLLM/VILA local IPC
+  mapobjects_store/      # H3-indexed on-device map + ignored items
+  gimbal_controller/     # ViewPro A40 UDP control
+  scan_controller/       # central typed state machine (ZoomedOut/ZoomedIn/TargetFollow)
+  operator_bridge/       # POI surface + operator command authentication
+  mission_executor/      # multirotor + fixed-wing FSMs + geofence + failsafe
+  telemetry_stream/      # always-on uplink to Ground Station
+
+config/                  # TOML config per environment (dev / staging / prod)
+deploy/systemd/          # on-airframe native systemd unit (Option A)
+fixtures/                # replay clips (RTSP, MAVLink, missions, detections)
+tests/e2e/               # workspace-level blackbox scenarios
+benches/                 # NFR benchmark-gate harness
+```
+
+## Build
+
+```bash
+# Host-arch build + tests
+cargo build --workspace
+cargo test  --workspace --locked
+
+# Optional VLM feature path
+cargo build --workspace --features vlm
+
+# No-default-features path (enforces the VLM optionality contract)
+cargo build --workspace --no-default-features
+cargo test  --workspace --no-default-features
+
+# aarch64 cross-build (CI uses cargo-zigbuild; locally `cross` also works)
+cargo install --locked cargo-zigbuild
+rustup target add aarch64-unknown-linux-gnu
+cargo zigbuild --release --target aarch64-unknown-linux-gnu --workspace
+```
+
+## Run (dev)
+
+```bash
+cp .env.example .env
+docker compose up -d
+# Then inspect:
+curl -s http://127.0.0.1:8080/health | jq
+```
+
+## Documentation
+
+The full document tree lives under `_docs/`. Start with:
+
+- `_docs/00_problem/problem.md` — the problem statement
+- `_docs/02_document/architecture.md` — system architecture
+- `_docs/02_document/system-flows.md` — sequence diagrams
+- `_docs/02_document/components/<name>/description.md` — per-component specs
+- `_docs/02_document/deployment/{containerization,ci_cd_pipeline,observability}.md`
+
+## CI
+
+`.woodpecker.yml` drives the pipeline. Stages: `fetch → lint → unit-test →
+build-arm64 → build-no-vlm → integration-test → sitl-conformance → security-scan
+→ package → sign → publish → benchmark-gate (opt-in)`.
@@ -0,0 +1,93 @@
+# Acceptance Criteria
+
+Measurable, design-independent success criteria. Implementation choices (specific models, libraries, components, algorithms) belong in `_docs/01_solution/` and `_docs/02_document/`, NOT here. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+Every criterion below is observable through the system's external behaviour and can be evaluated by a black-box test.
+
+## Latency
+
+- Primitive (Tier 1) object detection — per-frame end-to-end on the deployed compute device: **≤100 ms** at 1280 px input.
+- Semantic confirmation (Tier 2) over a single ROI: **≤200 ms**.
+- Deep semantic confirmation (Tier 3 / VLM, when enabled): **≤5 s** per ROI.
+- Camera zoom transition (medium → high): **≤2 s** wall-clock, including the physical zoom traversal.
+- Decision-to-movement latency (internal scan-control decision → camera physically moving): **≤500 ms**.
+- Movement candidate enqueue: **≤1 s** during the wide-area sweep; **≤1.5 s** during the zoomed-in inspection (accommodating gimbal slew).
+- Zoom-out → zoom-in transition (POI detected → ROI fully zoomed): **≤2 s** wall-clock.
+- Operator command → action: **≤500 ms** from operator click to outbound command (modem RTT excluded).
+
+## Throughput / Rate
+
+- POI rate surfaced to the operator: **≤5 POIs / minute** (hard cap; frozen 2026-05-06).
+- Position telemetry rate: **≥1 Hz**, target **10 Hz**.
+- Sustained camera frame-rate floor: **≥10 fps**. Below this, zoom-in transitions MUST be suppressed and overall health MUST surface yellow.
+
+## Detection Quality
+
+(Behaviour as observed at the system boundary. Model identity, training data, and label catalogue live in `_docs/02_document/architecture.md` and the `../ai-training` repo.)
+
+- New target classes (black entrances, branch piles, footpaths, roads, trees, tree blocks): per-class **precision ≥80%** AND **recall ≥80%**.
+- Existing-class regression: per-class precision and recall MUST NOT degrade by more than 2 percentage points against the documented baseline.
+- Concealed-position recall (initial gate, accepting high false-positive rate): **≥60%**.
+- Concealed-position precision (initial gate, operators filter): **≥20%**.
+- Footpath recall: **≥70%**.
+
+## Movement Detection Behaviour
+
+- Small moving point/cluster candidates that are not yet classifiable MUST be detected during the wide-area sweep and enqueued for zoomed inspection within **≤1 s**.
+- Movement detection MUST continue during the zoomed-in inspection (a moving target that appears inside a held POI must not be lost), with enqueue within **≤1.5 s**.
+- Stable objects (trees, houses, roads, terrain) MUST NOT be treated as moving solely because the camera platform itself moves.
+- A configurable per-zoom-band false-positive budget MUST be honoured (the system must not flood the operator with false candidates by ignoring its own threshold).
+
+## Scan & Camera Control Behaviour
+
+- The wide-area sweep MUST cover the planned route with a left-right gimbal pattern at wide or light/medium zoom.
+- Transition from sweep to detailed inspection MUST complete within **≤2 s** of POI detection (including physical zoom).
+- During detailed inspection the system MUST keep the target locked while the airframe flies, pan to keep features visible, hold endpoints up to **2 s** for deep analysis, and return to the sweep after analysis or a configurable per-POI timeout (default **5 s/POI**).
+- After operator confirmation, target-follow mode MUST keep the target within the **centre 25%** of the frame while visible.
+- Gimbal commands MUST achieve **≤500 ms** decision-to-movement latency with visibly smooth transitions.
+- The POI queue MUST be ordered by confidence × proximity to current camera × age factor (relative ranking, not absolute formula).
+
+## Operator Workflow
+
+- The decision window surfaced to the operator MUST scale linearly with confidence: **40% confidence → 30 s; 100% confidence → 120 s**. Below 40% confidence, the POI MUST NOT be surfaced at all.
+- Operator-decline MUST result in a persistent ignored-item entry for the matching `(MGRS cell, class group)` so the same target is not re-surfaced.
+- Timeout (no operator response within the window) MUST NOT create an ignored-item entry (forget, do not blacklist).
+- A new detection whose `(MGRS cell, class group)` matches an existing ignored-item MUST NOT be surfaced.
+- Operator confirmation MUST result in (a) a middle waypoint inserted into the mission and (b) a transition to target-follow mode.
+- A replayed or unsigned operator command MUST be rejected with a logged security warning; system state MUST NOT change.
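
For reference, the linear scaling in the first criterion works out as a straight-line interpolation between the two stated endpoints. The sketch below is illustrative and not a prescribed implementation: the function name `decision_window_s`, the 0 to 1 confidence scale, and the clamping of values above 1.0 are assumptions.

```python
def decision_window_s(confidence):
    """Operator decision window in seconds, or None if the POI is not surfaced.

    confidence is expressed on a 0.0-1.0 scale (assumption for this sketch).
    """
    low_c, high_c = 0.40, 1.00      # confidence endpoints from the criterion
    low_s, high_s = 30.0, 120.0     # window endpoints from the criterion
    if confidence < low_c:
        return None                 # below 40%: do not surface the POI
    c = min(confidence, high_c)     # assumption: clamp anything above 100%
    # Straight line through (0.40, 30 s) and (1.00, 120 s).
    return low_s + (c - low_c) / (high_c - low_c) * (high_s - low_s)
```

Under this reading a POI at 70% confidence sits halfway between the endpoints and gets a window of about 75 s.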
+
+## Reliability & Safety
+
+- Pre-flight self-test MUST pass (every dependency healthy OR explicit operator acknowledgement of a known degraded state) before takeoff is permitted.
+- Loss of operator/Ground-Station radio link MUST trigger a known mission-safe outcome within a deterministic, configurable grace window (default **30 s grace → RTL**).
+- Loss of airframe command link MUST surface health red immediately and defer to the airframe autopilot's own failsafe.
+- Battery at or below the configured **RTL floor** (e.g. 25%) MUST trigger RTL automatically; battery at or below the **hard floor** (e.g. 15%) MUST trigger land-now. Only an authenticated operator command may override.
+- MAVLink command exhaustion (bounded retry with exponential backoff fails through max-retry) MUST flip the airframe-link health to red.
+- Wall-clock drift greater than **200 ms** versus GPS or NTP source MUST surface health yellow.
+- Geofence INCLUSION and EXCLUSION violations MUST both result in waypoint refusal + RTL.
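
The battery criterion can be pictured as a two-threshold check. This is a sketch under stated assumptions, not the mission executor's logic: the name `battery_action`, the returned labels, and the choice that an authenticated override means "continue" are hypothetical; the 25% and 15% defaults are the example values quoted above.

```python
def battery_action(percent, rtl_floor=25.0, hard_floor=15.0, operator_override=False):
    """Return the action implied by the current battery level.

    operator_override stands for an already-authenticated operator command
    (authentication itself is out of scope for this sketch).
    """
    # Assumption: an authenticated override suppresses both automatic actions.
    if operator_override:
        return "continue"
    # Check the hard floor first so it wins when both thresholds are crossed.
    if percent <= hard_floor:
        return "land-now"
    if percent <= rtl_floor:
        return "rtl"
    return "continue"
```

Checking the hard floor before the RTL floor matters: at 10% both conditions hold, and the stricter outcome has to win.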
+
+## Resources & Data
+
+- Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within **≤6 GB**.
+- Tier 1 per-frame latency MUST NOT degrade by more than **5 ms** when autopilot's own onboard workload is running concurrently.
+
+## Map Reconciliation (with the central area-level map)
+
+- Pre-flight map pull for a **30 km × 30 km** mission area: **≤30 s** wall-clock. Cache-fallback on timeout is acceptable only with explicit operator acknowledgement.
+- Post-flight pass diff push for a **60-minute** mission: **≤2 min** wall-clock. Failure MUST persist the pending diff to durable on-device storage with bounded retry.
+
+## Acceptance Gates (project-level)
+
+- A hardware/replay benchmark suite MUST pass before product implementation begins. Specifically: every latency criterion above MUST be measured on the deployed compute device, not on a developer workstation.
+- Per-season dataset coverage MUST be demonstrated before MVP sign-off (winter, spring, summer, autumn).
+- MAVLink command surface MUST pass SITL conformance against ArduPilot.
+
+## Q-tagged criteria (depend on open architecture decisions)
+
+These criteria are real and measurable; their tolerance ranges may sharpen once the linked open question resolves. The questions are tracked in `_docs/02_document/architecture.md §8`.
+
+- Movement detection false-positive rate at zoomed-in inspection — depends on **Q14** (classical-CV adequacy vs learned-CV fallback).
+- MapObjects conflict resolution behaviour — depends on **Q8** (append-only log + projection rules).
+- Operator-command authentication conformance — depends on **Q9** (signing scheme).
+- Airframe MAVLink-2 message signing — depends on **Q6**.
+- Per-season flight-test gates — depend on **Q13**.
@@ -0,0 +1,58 @@
+# Input Data
+
+Runtime inputs the autopilot consumes when flying, plus reference fixtures + expected-output assertions for tests. **All fixtures live inside this workspace** (`fixtures/`) — never reach into sibling repos at `../` for inputs. The autopilot repo is self-sufficient.
+
+## Layout
+
+| Path | Owns |
+|---|---|
+| `data_parameters.md` | Description of runtime input shapes (camera, telemetry, gRPC, mission JSON, operator commands, VLM IPC) + the categories of reference data tests need + Tier-1/Tier-2 class catalogue. |
+| `services.md` | Per-external-service test-mock requirements: what shape of mock/fixture each of the 7 external systems needs and the acquisition status of each. |
+| `fixtures/README.md` | File-by-file manifest of every fixture in this directory: SHA-256, size, upstream provenance, which `expected_results/results_report.md` rows consume it. |
+| `fixtures/images/` | Real aerial frames (5 images, ~9 MB total) — Tier-1 inputs for detection-quality assertions (L1, D2, D6). |
+| `fixtures/videos/` | Real reconnaissance video (1 clip, 12 MB) for frame-rate floor + sequence tests (T3). |
+| `fixtures/movement/` | Wide-area movement-detection visual reference clips (4 clips, ~23 MB total). **No paired `gimbal.csv` / `telemetry.csv`** — ego-motion compensation (M1–M4) cannot run against these alone. |
+| `fixtures/semantic/` | Concealed-position semantic reference frames (4 PNGs, ~11 MB total) + `data_parameters.md` describing the new YOLO primitive classes the examples motivate. **Starter set only**, not a graded eval set. |
+| `fixtures/schemas/` | Detection-result contract schemas (JSON + JSON-schema) for D6. |
+| `fixtures/sql/` | Database init script — reference only; not directly asserted by an autopilot AC. |
+| `expected_results/results_report.md` | The input → quantifiable-expected-output mapping consumed by `/test-spec` Phase 1. Every row keys off an AC in `../acceptance_criteria.md`; deferred rows carry a structured `<DEFERRED: <shape>; ref <pointer>>` tag. |
+
+## Why fixtures are local
+
+The autopilot repo MUST be self-sufficient — a developer with only the autopilot clone (no parent suite checked out) MUST be able to run the test specifications. Cross-repo `../` paths are forbidden in `results_report.md` and in any test runner script. When a sibling repo (`../detections/`, `../e2e/`, `../missions/`, etc.) is the upstream source of a fixture, we **copy** it in and SHA-pin it in `fixtures/README.md` so upstream drift is detectable.
+
+## Suite-level coupling that still matters
+
+Even though fixtures are local, the underlying contracts the fixtures embody come from suite-level decisions. When those decisions change, the fixtures here go stale:
+
+- **Tier-1 detection model / classes** — when `../detections` ships a new model the `expected_detections.json` baseline goes stale; D1, D2, D6 rows in `results_report.md` must be re-recorded.
+- **`mission-schema`** — shared between autopilot and the `missions` repo. Schema changes break the mission JSON contract; the mock fixtures for Mp1–Mp5 (when authored) must re-pin.
+- **Detection classes catalogue** — class IDs 0..18 are governed at the suite level. Autopilot's normalised-box output uses the same IDs. The 5 new Tier-1 classes documented in `data_parameters.md → "Class catalogue"` must land in the suite catalogue before D1 can be measured.
+
+Today these couplings are tracked manually. The `monorepo-e2e` skill at the suite root will eventually own the drift detection.
+
+## Fixture gaps and the project policy on `/test-spec` Phase 3
+
+`/test-spec` Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results. Today's coverage is well below that gate (see `expected_results/results_report.md → "Coverage Status"`). **Project policy as of 2026-05-19**: rather than block the autodev flow at the gate, each deferred row is registered with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `results_report.md`, pointing at the per-service acquisition path in `services.md` or at an open architecture question (Q-tag). Deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement remains a hard release blocker.
+
+Summary of open gaps (authoritative list lives in `services.md` and `fixtures/README.md`):
+
+1. **Paired `gimbal.csv` + `telemetry.csv` for the 4 movement clips** — highest priority (blocks M1–M4 + tightens L6/L7). **User-confirmed unavailable today (2026-05-19).**
+2. Annotated multi-season eval set (concealed positions + footpaths).
+3. Mock `missions` API exchanges + mock `/mapobjects` round-trip.
+4. Mock Ground Station session traces.
+5. ArduPilot SITL traces.
+6. Operator-command envelopes (blocked on Q9).
+7. VLM I/O pairs.
+8. GPS / NTP drift scripts.
+
+Closing each gap is its own workstream tracked in Jira; the autodev flow does not block on them.
+
+## Adding new fixtures
+
+1. Drop the file under `fixtures/<images|videos|movement|semantic|schemas|sql|gimbal|telemetry|mavlink|vlm|operator|mapobjects>/<descriptive-name>.<ext>` — create the subdirectory if it does not exist.
+2. Compute SHA-256 (`shasum -a 256 <file>`, or `sha256sum <file>` where `shasum` is unavailable).
+3. Add a row to the matching subsection in `fixtures/README.md` (file path, size, SHA, upstream provenance, which `results_report.md` rows consume it).
+4. Replace the matching `<DEFERRED: ...>` placeholder(s) in `expected_results/results_report.md` with the local path `fixtures/<...>`.
+5. If the fixture replaces a service mock, also update `services.md → "Coverage summary by service"` to reflect the new acquisition status.
+6. If the fixture is binary and large (> 50 MB), consider gitignoring it and adding an acquisition script per the e2e pattern; for everything in the current set, direct commit is fine.
@@ -0,0 +1,101 @@
+# Input Data Parameters
+
+Describes the **categories of input data** the system consumes at runtime, and the **categories of reference data** tests need. Internal component names, programming languages, IPC mechanisms, schema class names, and specific model choices are design and live in `_docs/02_document/architecture.md` — they do not belong in this file (per `.cursor/rules/artifact-srp.mdc`).
+
+Local fixtures live in `fixtures/`; see `fixtures/README.md` for the manifest. External-service test-mock requirements live in `services.md`; the per-row binding to AC criteria lives in `expected_results/results_report.md`.
+
+## Runtime inputs (what the system consumes when flying)
+
+| Input | Source | Format | Cadence | Notes |
+|---|---|---|---|---|
+| Camera frames | ViewPro A40 (or alternative ViewPro Z40K) | H.264 / H.265 over RTSP, 1080p (1920×1080) | 30 / 60 fps | Frame timestamps are mandatory. |
+| Primitive (Tier 1) detection responses | `../detections` service over a bi-directional streaming RPC contract | Bounding boxes with class id, confidence, normalised coordinates | Per frame | Same boxes feed Tier-2 ROI selection and the operator overlay. |
+| UAV telemetry | Airframe via MAVLink v2 (UDP or serial) | MAVLink messages: position, attitude, velocity, battery, link health, GPS fix | ≥1 Hz (10 Hz target) | Source-of-truth for ego-motion compensation. |
+| Gimbal feedback | ViewPro A40 vendor protocol over UDP | Yaw / pitch / zoom angle telemetry | per-tick | Source-of-truth for camera-pose compensation. |
+| Mission JSON | `missions` service via HTTPS REST | Shared `mission-schema` JSON | Once at mission start + middle-waypoint updates | Validated against the shared schema. |
+| Area-level map state | `missions` service extension `/missions/{id}/mapobjects` (GET) | Map-object records keyed by spatial cell | Once at mission start | Hydrates the system's local copy of the area map; cache-fallback on timeout. |
+| Operator commands | Ground Station via modem (return path of the outbound telemetry stream) | Authenticated + signed + replay-protected command envelope (scheme open per Q9) | Event-driven | confirm / decline / target-follow start / target-follow release / abort. |
+| Deep-analysis responses (optional) | Local-onboard model accessed via local IPC | Structured assessment schema (validated) | Per zoomed-in endpoint hold (when deep-analysis is enabled) | Schema-violation fails closed. |
+
+## Class catalogue (Tier-1 + Tier-2)
+
+Detection-quality acceptance criteria (`acceptance_criteria.md → Detection Quality`) are evaluated against a class catalogue that combines pre-existing suite-level classes with new autopilot-driven additions. Class IDs are governed at the suite level (`../detections` owns the catalogue); autopilot only consumes the IDs.
+
+### New Tier-1 (YOLO primitive) classes — to be added to the suite catalogue
+
+| # | Class name | Annotation hint | Motivated by |
+|---|---|---|---|
+| 1 | Black entrances | Bounding box; various sizes (small hideout openings to dugout entrances) | Concealed-position detection (D3, D4) |
+| 2 | Branch piles | Bounding box | Concealment material around hideouts (D3, D4) |
+| 3 | Footpaths | **Polyline / segmentation preferred over bbox** for linear features | Footpath recall gate (D5) |
+| 4 | Roads | Polyline / segmentation | Distinguishing roads from footpaths in the same scene |
+| 5 | Trees / tree blocks | Bounding box; tree-block annotation may use larger box for clusters | Concealment-context anchor; reduces false positives around tree-rows in movement detection (M1) |
+
+### Tier-2 semantic attributes — composed by `semantic_analyzer`, NOT added to YOLO catalogue
+
+| # | Attribute | Composed from | Used by |
+|---|---|---|---|
+| 1 | Footpath freshness (fresh / stale) | Footpath bbox + texture/edge analysis + seasonal context | Decision-window scoring, D5 partial coverage |
+| 2 | Concealed-structure inference | Black-entrance + branch-piles + footpath-approach proximity | POI surfacing for D3/D4 (the structure itself is composed, not directly labelled) |
+| 3 | Open clearing connected to path | Cleared-terrain texture + footpath endpoint | FPV-launch-point flagging |
+
+### Existing classes (already in the suite catalogue)
+
+The existing-class baseline (P=0.816, R=0.852 per the AC) covers the suite's pre-autopilot class set (vehicles, military equipment, etc.). Autopilot must not degrade these — see D2.
+
+### Reference for IDs
+
+The 19-id catalogue (0..18) is owned by `../detections`. Autopilot's normalised-box output uses the same IDs. When `../detections` ships a new model or renumbers IDs, the `expected_detections.json` baseline goes stale and D1, D2, D6 rows must be re-recorded.
+
+## Reference data needed for testing
+
+### Local fixtures already on disk
+
+See `fixtures/README.md` for the SHA-pinned manifest. Categorised summary:
+
+| Local fixture category | Files | Purpose | Bound to AC rows |
+|---|---|---|---|
+| `fixtures/images/*.jpg` | 5 aerial frames | Tier-1 detection contract; existing-class regression; normalised-box conformance | L1, D2, D6 |
+| `fixtures/videos/94d42580bd1ad6ff.mp4` | 1 reconnaissance clip | Frame-rate floor scenario, reserved for future movement-sequence tests | T3 |
+| `fixtures/schemas/expected_detections.{json,schema.json}` | 2 schema files | Detection-result contract shape reference | D6 |
+| `fixtures/sql/init.sql` | 1 SQL file | Suite-e2e DB seed reference | (suite-only; no autopilot AC) |
+| `fixtures/movement/video0[1-4].mp4` | 4 wide-area clips | Visual reference for movement-detection scenarios — **no paired telemetry CSVs**, ego-motion assertions unfalsifiable until those land | M1–M4 (visual reference only) |
+| `fixtures/semantic/semantic0[1-4].png` | 4 reference frames | Visual reference for concealed-position semantic targets — **starter set only, not a graded eval set** | D3, D4, D5 (starter only) |
+
+### Reference shapes still needed but not yet on disk
+
+The per-service mock catalogue is in `services.md` (authoritative). Summary of categories tests need:
+
+| Reference shape | Why it's needed | See |
+|---|---|---|
+| Frame sequences with synchronised `gimbal.csv` + `telemetry.csv` | Ego-motion compensation at zoom-out AND zoomed-in inspection | `services.md §6 Gimbal telemetry CSV` |
+| Concealed-position image set across all four seasons (annotated) | Concealed-position recall ≥60% and precision ≥20% | `services.md §5 Camera frame sequences` |
+| Footpath sequences (fresh, stale, all four seasons, polyline-annotated) | Footpath recall ≥70% | `services.md §5` |
+| New-class evaluation set (5 new classes above) | New-class per-class P/R ≥80% without degrading existing-class performance | `services.md §1 Tier-1 detection replay` (plus annotation campaign owned by `../ai-training` repo) |
+| Mock Tier-1 streaming-RPC replays | Detection-consumer isolation tests | `services.md §1` |
+| Mock Ground Station session traces | Lost-link failsafe ladder + operator-link reconnect | `services.md §3` |
+| MAVLink SITL traces | MAVLink conformance + waypoint insertion + geofence enforcement | `services.md §4` |
+| Mock central area-map service responses | Pre-flight pull / post-flight push round-trip; conflict cases (Q8) | `services.md §2` |
+| Operator-command envelopes | Signature + replay-protection tests (once Q9 resolves) | `services.md §8` |
+| VLM I/O pairs | Bounded ROI inputs + structured assessment outputs + schema-violation cases | `services.md §7` |
+| GPS / NTP drift scenarios | Wall-clock drift health-yellow gate | `services.md §9` |
+
+## Data volume targets
+
+- Training data: hundreds to thousands of annotated images/sequences total.
+- Seasonal coverage: winter (snow), spring (mud), summer (vegetation), autumn (mixed leaf + partial snow).
+- Available assembly effort: 1.5 months at 5 hours/day.
+- Movement detection requires **frame sequences** (not still images only) with synchronised camera + gimbal + UAV telemetry.
+- Footpaths require polyline or segmentation annotation rather than bounding boxes (see "Class catalogue" above).
+
+## Gaps that block `/test-spec` downstream
+
+`/test-spec` Phase 1 will pass on prerequisite existence (`expected_results/results_report.md` is non-empty). Phase 3 has a **hard 75% coverage gate** on rows with real input fixtures + real expected results.
+
+**Current coverage state** (re-computed 2026-05-19 after fixture restoration):
+
+- Rows bound to real local fixtures: L1, D2, D6, T3 (4 rows). These are also the rows whose fixtures were restored on 2026-05-19 from sibling repos.
+- Rows bound to **starter-only** fixtures (insufficient on their own): D3, D4, D5 (semantic PNGs), M1–M4 (movement videos without CSV).
+- Rows still deferred for fixture acquisition: see `fixtures/README.md → "Gaps still pending fixture acquisition"` and `services.md` for the authoritative list.
+
+**Project policy on the Phase 3 gate**: rather than block `/test-spec` at the 75% gate, the autodev flow registers each deferred row with a structured `<DEFERRED: <shape>; ref <pointer>>` tag in `expected_results/results_report.md`. Test-spec authoring proceeds; deferred rows become release-gate items, not development-gate items. The `acceptance_criteria.md` project-level gate ("MUST pass before product implementation begins") still applies to the hardware/replay benchmark: it remains a hard release blocker and is not deferred.
@@ -0,0 +1,153 @@
+# Expected Results
+
+Maps every quantifiable acceptance criterion from `_docs/00_problem/acceptance_criteria.md` to an input fixture + a measurable expected result. Consumed by `/test-spec` Phase 1.
+
+Per `.cursor/rules/artifact-srp.mdc`, this file uses **role / observable-behaviour language**, not internal component slugs. The system's externally observable behaviour is what's tested. Implementation names (component slugs, libraries, model names) live in `_docs/02_document/`.
+
+**Fixture sourcing**: all fixtures live in `fixtures/` (sibling-repo `../` paths are forbidden). Where no fixture exists yet, the `Input` cell carries a structured `<DEFERRED: <shape>; ref services.md §N>` tag. Phase 3 has a hard 75% coverage gate — the autodev flow registers deferred rows as release-gate items rather than blocking on the gate; see `data_parameters.md → "Gaps that block /test-spec downstream"`.
+
+**Comparison vocabulary**: see `.cursor/skills/test-spec/templates/expected-results.md` for canonical methods (`exact`, `numeric_tolerance`, `threshold_min`, `threshold_max`, `range`, `regex`, `substring`, `set_contains`, `json_diff`, `file_reference`).
+
+**Deferred-tag legend**: `<DEFERRED: <shape>; ref <pointer>>` where `<pointer>` is a section in `../services.md` (per-service mock requirements), an open architecture question (e.g. `Q9`), or `inline-authorable` (no external dependency — just not yet written).
+
+---
+
+## Latency
+
+Source ACs: `acceptance_criteria.md → Latency`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| L1 | `fixtures/images/4d6e1830d211ad50.jpg` | Single 1280 px aerial frame consumed through the Tier-1 contract; measure end-to-end | per-frame end-to-end latency | threshold_max | ≤ 100 ms | N/A |
+| L2 | derived ROI ~640×640 from `fixtures/images/4d6e1830d211ad50.jpg` (inline-cropped by the test runner) | Tier-2 semantic confirmation over a single ROI | per-ROI latency | threshold_max | ≤ 200 ms | N/A |
+| L3 | `<DEFERRED: bounded ROI crop matching the deep-analysis input contract; ref services.md §7>` | Tier-3 deep-analysis (when enabled) local-IPC call | per-ROI call latency | threshold_max | ≤ 5000 ms | N/A |
+| L4 | `<DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command (medium→high); ref services.md §5>` | A40 physical zoom transition | wall-clock transition duration | threshold_max | ≤ 2000 ms | N/A |
+| L5 | `<DEFERRED: scripted scan decision event followed by camera physical motion; ref services.md §3, §5>` | Decision-to-movement latency end-to-end | wall-clock decision→motion duration | threshold_max | ≤ 500 ms | N/A |
+| L6 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; ref services.md §6>` | Movement candidate enqueue at the wide-area sweep | detection→enqueue duration | threshold_max | ≤ 1000 ms | N/A |
+| L7 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; ref services.md §6>` | Movement candidate enqueue during zoomed inspection | detection→enqueue duration | threshold_max | ≤ 1500 ms | N/A |
+| L8 | `<DEFERRED: full sweep → zoomed-inspection transition (POI detected → ROI fully zoomed); ref services.md §3, §5>` | Scan-mode transition including physical zoom | wall-clock transition | threshold_max | ≤ 2000 ms | N/A |
+| L9 | `<DEFERRED: scripted operator-click → outbound command emitted by the system (modem RTT excluded); ref services.md §3>` | Operator command → action latency | wall-clock click→outbound | threshold_max | ≤ 500 ms | N/A |
+
+## Throughput / Rate
+
+Source ACs: `acceptance_criteria.md → Throughput / Rate`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| T1 | `<DEFERRED: long synthetic POI feed sustained above the cap (e.g. 20 POIs/min); inline-authorable>` | Cap enforcement on POIs surfaced to operator | POI rate surfaced | threshold_max | ≤ 5 / min | N/A |
+| T2 | `<DEFERRED: airframe MAVLink telemetry replay over a 60 s window; ref services.md §4>` | Position telemetry consumed from the airframe link | reported position rate | range | 1 Hz ≤ rate ≤ 10 Hz (10 Hz target) | N/A |
+| T3 | `fixtures/videos/94d42580bd1ad6ff.mp4` replayed with throttled-decode + frame-drop injection to drop below 10 fps for ≥5 s | Frame-rate floor trigger | zoom-in transitions suppressed AND overall health surfaces yellow | exact (suppression bool) + exact (health = yellow) | N/A | N/A |
+
+## Detection Quality
+
+Source ACs: `acceptance_criteria.md → Detection Quality`. Evaluation runs against the Tier-1 detection pipeline that the system consumes; autopilot's role is correct consumption + re-emission of the normalised-box contract. Class catalogue (5 new Tier-1 classes + 3 Tier-2 attributes) is defined in `../data_parameters.md → "Class catalogue"`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| D1 | `<DEFERRED: new-class eval set across all four seasons (black entrances, branch piles, footpaths, roads, trees / tree blocks); ref services.md §1, annotation campaign in ../ai-training>` | Per-class precision/recall for added classes | per-class precision ≥ 0.80 AND recall ≥ 0.80 | threshold_min (both) | N/A | `<DEFERRED: expected_results/new_classes_pr.json>` |
+| D2 | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` (5 frames) | Existing-class regression — must not degrade vs documented baseline P=0.816, R=0.852 | per-class precision + recall delta vs baseline | numeric_tolerance | ± 0.02 absolute | `<DEFERRED: expected_results/existing_classes_baseline.json — to be recorded against the pinned ../detections model>` |
+| D3 | `fixtures/semantic/semantic0[1-4].png` (4 starter frames — 1 winter, 3 unmarked season) + `<DEFERRED: full multi-season annotated concealed-position set; ref services.md §5>` | Concealed-position recall (initial gate, accepting high FP) | recall | threshold_min | ≥ 0.60 | `<DEFERRED: expected_results/concealed_positions.json>` |
+| D4 | Same as D3 | Concealed-position precision (operators filter) | precision | threshold_min | ≥ 0.20 | same as D3 |
+| D5 | `fixtures/semantic/semantic0[1-4].png` (all 4 feature footpaths leading to concealment — starter set) + `<DEFERRED: footpath sequences (fresh + stale, all four seasons), polyline-annotated; ref services.md §5>` | Footpath recall | recall | threshold_min | ≥ 0.70 | `<DEFERRED: expected_results/footpaths.json>` |
+| D6 | `fixtures/images/4d6e1830d211ad50.jpg` | Single-frame Tier-1 contract — system must consume the bbox stream and re-emit normalised-box format | output box stream conforms to the suite-level class catalogue (ids 0..18) + normalised coordinates ∈ [0,1] | schema_match + range | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` |
+
+## Movement Detection Behaviour
+
+Source ACs: `acceptance_criteria.md → Movement Detection`. Latency aspects (L6, L7) live under Latency.
+
+**Note**: M1–M4 each have a visual-reference video on disk but NO paired `gimbal.csv` / `telemetry.csv`. Ego-motion compensation cannot be verified against these videos — the visual binding is provided so a smoke harness can run, but the assertions in this section require the deferred CSVs to be meaningful. User confirmed 2026-05-19: paired CSVs do not exist today.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| M1 | `fixtures/movement/video01.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv; scene must contain 1 stable tree row + 1 moving vehicle; ref services.md §6>` | Ego-motion compensation: stable objects rejected | system emits exactly 1 movement candidate (the vehicle); does NOT emit a candidate for the tree row | set_contains | candidate set == {vehicle}; tree row ∉ candidate set | N/A |
+| M2 | `fixtures/movement/video02.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band; 1 small mover; ref services.md §6>` | Movement detection continues during zoomed-in hold | system enqueues 1 candidate while the camera is in the zoomed-in hold; current ROI is not preempted unless the candidate's priority exceeds it | exact | 1 candidate enqueued; ROI preempt decision matches priority rule | N/A |
+| M3 | `fixtures/movement/video03.mp4` (visual reference) + `<DEFERRED: paired gimbal.csv + telemetry.csv simulating per-zoom-band threshold edge (cluster persistence one frame below threshold); ref services.md §6>` | Per-zoom-band threshold honoured (no false candidate) | no candidate emitted | exact | count == 0 | N/A |
+| M4 | `fixtures/movement/video04.mp4` (visual reference) + `<DEFERRED: zoom-out + zoomed-in benchmark suite measuring false-positive rate at each band; ref services.md §6, Q14>` | Movement zoomed-in benchmark gate (Q14 fallback trigger) | false-positive rate per zoom band | threshold_max | ≤ per-zoom-band budget (configurable; default ≤ 0.5 / minute at zoomed-in) | `<DEFERRED: expected_results/movement_benchmark_caps.json>` |
+
+## Scan & Camera Control Behaviour
+
+Source ACs: `acceptance_criteria.md → Scan and Camera Control`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| S1 | `<DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep; ref services.md §3, §4>` | Sweep → zoomed-inspection transition within 2 s (L8) AND POI properly enqueued | transition completes; ROI matches POI bbox; queue length increments | exact (multiple) | N/A | N/A |
+| S2 | `<DEFERRED: zoomed-inspection hold scenario with footpath polyline overlapping the ROI; ref services.md §5, §6>` | Camera lock + pan along footpath while airframe flies | camera commands keep the footpath in the centre 50% of frame for the duration of the hold | numeric_tolerance | centre offset ≤ 25% per frame | N/A |
+| S3 | `<DEFERRED: operator-confirmed target + 60 s follow window; ref services.md §3>` | Target-follow centre-window | target inside centre 25% of frame while visible | threshold_max | per-frame \|dx\|, \|dy\| ≤ 0.125 × frame_size | N/A |
+| S4 | `<DEFERRED: queue with 3 POIs at varied confidence × proximity scores; inline-authorable>` | POI queue ordering | system pops POIs in order of `confidence × proximity × age_factor` (relative order matches) | exact (order) | N/A | N/A |
+| S5 | `<DEFERRED: hold endpoint with deep-analysis enabled — assessment returns within 2 s; ref services.md §7>` | Zoomed-in hold timeout default 5 s/POI; deep-analysis hold capped at 2 s | hold ends at min(5 s, deep_analysis_complete) | exact | N/A | N/A |
+
+## Operator Workflow
+
+Source ACs: `acceptance_criteria.md → Operator Workflow`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| O1 | `<DEFERRED: synthetic POI at confidence = 0.40; inline-authorable>` | Confidence-scaled decision window lower bound | window duration | exact | 30 s | N/A |
+| O2 | `<DEFERRED: synthetic POI at confidence = 1.00; inline-authorable>` | Confidence-scaled decision window upper bound | window duration | exact | 120 s | N/A |
+| O3 | `<DEFERRED: synthetic POI at confidence = 0.70; inline-authorable>` | Linear interpolation (40% → 30 s, 100% → 120 s) | window duration ≈ 30 + (0.70-0.40)/(1.00-0.40) × (120-30) = 75 s | numeric_tolerance | ± 0.5 s | N/A |
+| O4 | `<DEFERRED: synthetic POI at confidence = 0.39; inline-authorable>` | Below-threshold suppression | POI NOT surfaced to operator | exact | count surfaced == 0 | N/A |
+| O5 | `<DEFERRED: surfaced POI followed by operator decline event; inline-authorable>` | Decline → ignored-item entry persisted | ignored-item appended with `(MGRS, class_group)` matching the declined POI | exact (count delta +1) + schema_match | N/A | N/A |
+| O6 | `<DEFERRED: new detection whose (MGRS, class_group) matches an existing ignored-item; inline-authorable>` | Ignored-item suppression | POI NOT surfaced | exact | count surfaced == 0 | N/A |
+| O7 | `<DEFERRED: surfaced POI + no operator response, > decision-window; inline-authorable>` | Timeout = forget (NOT blacklisted) | POI removed from queue; no ignored-item written | exact (queue −1) + exact (ignored-item count unchanged) | N/A | N/A |
+| O8 | `<DEFERRED: operator confirm command — valid + signed + within sequence; ref services.md §3, §8 (Q9)>` | Confirm → middle waypoint inserted; mode transitions to target-follow | mission update POSTed; scan-mode reports target-follow | exact (HTTP 200) + exact (mode) | N/A | N/A |
+| O9 | `<DEFERRED: replayed operator command — same envelope a second time; ref services.md §8 (blocked on Q9)>` | Replay protection | command rejected; security WARN logged; no state change | exact (state unchanged) + substring (log contains "replay") | N/A | N/A |
+| O10 | `<DEFERRED: malformed / unsigned operator command; ref services.md §8 (blocked on Q9)>` | Signature validation | command rejected; security WARN logged | exact (state unchanged) + substring (log contains "invalid") | N/A | N/A |
+
+## Reliability & Safety
+
+Source ACs: `acceptance_criteria.md → Reliability & Safety` + lost-link failsafe ladder.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| R1 | `<DEFERRED: BIT scenario — every dependency healthy; inline-authorable>` | Pre-flight self-test passes | health endpoint returns all green; takeoff permitted | exact (state) + exact (health.all == "green") | N/A | N/A |
+| R2 | `<DEFERRED: BIT scenario — Tier-1 detection unreachable; inline-authorable>` | BIT fails the takeoff gate | takeoff NOT permitted; detection dependency reports red | exact (takeoff inhibited) | N/A | N/A |
+| R3 | `<DEFERRED: BIT scenario — persistent-store ≥95% full; inline-authorable>` | Storage floor BIT failure | takeoff NOT permitted; storage dependency reports red | exact (takeoff inhibited) | N/A | N/A |
+| R4 | `<DEFERRED: in-flight operator/Ground-Station modem-link loss + 30 s elapsed; ref services.md §3, §4>` | Lost-link failsafe ladder (default 30 s grace → RTL) | system issues RTL 30 s after link loss; operator-link dependency reports red | numeric_tolerance (RTL command at 30 s) | ± 1 s | N/A |
+| R5 | `<DEFERRED: mid-flight battery sample at RTL-floor (e.g. 25%); ref services.md §4>` | RTL trigger | system issues RTL; health → yellow | exact (RTL command) + exact (health == yellow) | N/A | N/A |
+| R6 | `<DEFERRED: mid-flight battery sample at hard-floor (e.g. 15%); ref services.md §4>` | Land-now trigger (only operator-overridable) | system issues land-now | exact (land_now command) | N/A | N/A |
+| R7 | `<DEFERRED: airframe link command + simulated bounded retry/backoff with peer not responding through max-retries; ref services.md §4>` | Watchdog flips health red on exhaustion | airframe-link dependency reports red after configured max-retry | exact (health == red) | N/A | N/A |
+| R8 | `<DEFERRED: wall-clock drift > 200 ms simulation (GPS lock present, NTP disabled); ref services.md §9>` | Drift alarm | time-source dependency reports yellow; `clock_source` + `last_sync_at` reflect the drift | exact (health == yellow) | N/A | N/A |
+| R9 | `<DEFERRED: geofence EXCLUSION polygon crossed by simulated waypoint; ref services.md §4>` | Symmetric geofence enforcement | waypoint refused; RTL triggered | exact (waypoint rejected) + exact (RTL) | N/A | N/A |
+
+## Resources & Data
+
+Source ACs: `acceptance_criteria.md → Resources & Data`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| Re1 | `<DEFERRED: long-running scenario — system's full onboard workload active for 5 min, monitored via process RSS; inline-authorable harness>` | Onboard memory budget (everything autopilot owns, excluding Tier 1) | combined RSS on the deployed compute device | threshold_max | ≤ 6 GB | N/A |
+| Re2 | Same as Re1 with concurrent Tier-1 traffic | Tier-1 non-degradation | Tier-1 ms/frame delta vs baseline (L1) | numeric_tolerance | ± 5 ms | N/A |
+
+## Map Reconciliation
+
+Source ACs: `acceptance_criteria.md → Map Reconciliation`.
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|---|---|---|---|---|---|
+| Mp1 | `<DEFERRED: mock central area-map service — 30 km × 30 km region, ~10000 map objects; ref services.md §2>` | Pre-flight pull | wall-clock GET → local copy hydrated | threshold_max | ≤ 30 s | N/A |
+| Mp2 | `<DEFERRED: same mock but unreachable (timeout); ref services.md §2>` | Cache-fallback path | system falls back to last-known cached state; reports `map_sync == "cached_fallback"`; operator MUST acknowledge before takeoff | exact (state) + exact (BIT requires explicit ack) | N/A | N/A |
+| Mp3 | `<DEFERRED: simulated 60-minute mission pass diff (~5000 NEW + ~2000 MOVED + ~500 REMOVED + ~10000 CONFIRMED-EXISTING); ref services.md §2>` | Post-flight push | wall-clock POST → 200 OK | threshold_max | ≤ 120 s | N/A |
+| Mp4 | `<DEFERRED: same as Mp3 but POST returns 5xx; ref services.md §2>` | Persist-on-disk + bounded retry | pending diff written to on-device storage; operator-visible warning surfaced; retry attempts logged | exact (file exists) + exact (warning surfaced) + threshold_max (retries ≤ configured cap) | N/A | N/A |
+| Mp5 | `<DEFERRED: two map updates with conflicting state for same (spatial-cell, class_group) — append-only log scenario; ref services.md §2, Q8>` | Conflict-resolution rule (Q8 placeholder) | append-only observation log + computed current view; conflict resolution per documented rule | json_diff | N/A | `<DEFERRED: expected_results/mapobjects_conflict_resolution.json — pending Q8>` |
+
+---
+
+## Coverage Status (auto-recomputed 2026-05-19)
+
+- **Total rows**: 56 (L1–L9, T1–T3, D1–D6, M1–M4, S1–S5, O1–O10, R1–R9, Re1–Re2, Mp1–Mp5).
+- **Fully bound to real fixtures**: L1, T3, D2, D6 = **4 rows (~7%)**.
+- **Bound to derived inline fixture** (no external acquisition needed): L2 = **+1 row (5 total, ~9%)**.
+- **Bound to starter/partial fixtures** (visual reference only — assertions need additional deferred inputs to be meaningful): D3, D4, D5, M1, M2, M3, M4 = **+7 rows (12 total partial, ~21%)**.
+- **Inline-authorable but not yet authored** (no external dependency — can be unblocked anytime by writing the fixture): T1, S4, O1–O7, R1–R3, R8, Re1, Re2 = **15 rows (~27%)**. Lifting these alone would bring effective coverage to ~48%.
+- **Blocked on external acquisition** (real recordings, SITL, annotated eval sets, mock services): L3–L9 (minus L6/L7 partial), T2, D1, M1–M4 (CSV pairs), S1, S2, S3, S5, R4–R7, R9, Mp1–Mp5 = **~24 rows (~43%)**.
+- **Blocked on architecture questions**: O8 (partially depends on Q9), O9, O10 (Q9), M4 (Q14), Mp5 (Q8) = **5 rows**.
+
+**Decision (project policy)**: rather than block on the Phase 3 75% gate, each deferred row is now registered with a structured `<DEFERRED:>` tag and surfaces in `data_parameters.md → "Gaps that block /test-spec downstream"`. `/test-spec` Phase 2 can author scenarios for all 56 rows; deferred rows become **release-gate items**, not development-gate items. The `acceptance_criteria.md → "Acceptance Gates (project-level)"` hardware/replay benchmark requirement is preserved as the hard release gate — that one is NOT being deferred.
+
+## Notes on this spec
+
+- Every row carries a quantifiable comparison, with a tolerance wherever the comparison is numeric; no row is "should work".
+- Where the AC depends on hardware (the deployed compute device, ViewPro A40), the test must run on representative hardware OR a benchmarked replay; pure-emulator runs are NOT acceptable for L1–L9, T1–T3, Re1–Re2.
+- Where the AC depends on an external service (`../detections`, `missions`, Ground Station), the test runs against either (a) the real service in the suite e2e (`../e2e/docker-compose.suite-e2e.yml`), or (b) a recorded replay fixture for isolation tests. Both modes are valid; the test scenario states which.
+- Q-tagged rows (M4 → Q14, Mp5 → Q8, O8–O10 → Q9) depend on open architecture questions. Their tolerance ranges may sharpen once those questions resolve; the existence of each row is non-negotiable.
+- M1–M4 visual-reference bindings (`fixtures/movement/video0[1-4].mp4`) are usable for harness smoke testing but DO NOT satisfy the assertion semantics — paired `gimbal.csv` + `telemetry.csv` are required for ego-motion compensation to be verifiable. This is the single highest-priority fixture gap.
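The Comparison and Tolerance columns in the tables above lend themselves to a small evaluator. The following is a minimal sketch, not part of this snapshot: the function name `check` and the exact semantics of each comparison kind (inclusive bounds, structural equality for `json_diff`) are assumptions inferred from the column values, and unit handling is left to the caller.

```python
import json


def check(comparison: str, actual, expected=None, tolerance=None) -> bool:
    """Evaluate one results_report row; the semantics here are assumed, not specified."""
    if comparison == "threshold_max":
        # Tolerance column holds the ceiling, e.g. "<= 6 GB" or "<= 30 s".
        return actual <= tolerance
    if comparison == "numeric_tolerance":
        # Tolerance column holds a symmetric band around a baseline, e.g. "+/- 5 ms".
        return abs(actual - expected) <= tolerance
    if comparison == "exact":
        return actual == expected
    if comparison == "json_diff":
        # Compare parsed structures so key order and whitespace do not matter.
        return json.loads(actual) == json.loads(expected)
    raise ValueError(f"unknown comparison: {comparison}")
```

For a row such as Re1 this would be called as `check("threshold_max", measured_rss_gb, tolerance=6.0)`; compound rows like Mp2 ("exact + exact") would call it once per clause and require every clause to pass.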
@@ -0,0 +1,90 @@
+# Fixture manifest
+
+All fixtures live **inside this workspace** so the autopilot repo is self-sufficient — downstream test runners must never reach into a sibling repo at `../`. When you add or refresh a fixture, update the matching SHA-256 in this manifest AND the rows in `../expected_results/results_report.md` that consume it.
+
+Total on-disk size: ~57 MB.
+
+## Files
+
+### Still-image aerial frames — `images/`
+
+Used as Tier-1 input frames for detection-quality assertions.
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `images/4d6e1830d211ad50.jpg` | 152 KB | `4c396495af64aaf9aac5ecb92431bf0c75db42b0bdb8e4eec1937f9995acee42` | `../detections/data/images/` (re-copied 2026-05-19) | L1, D6 |
+| `images/54f6459dbddb93d8.jpg` | 6.7 MB | `cd65c76a080ef72ce3528031f003f067fca6091c067a86d527a1ae91cd78be59` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/6dd601b7d2dc1b30.jpg` | 1.4 MB | `45edd83a357a9f852e14e5845265cd09c20b4b99b1828c160cb3298f0e160181` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/805bcf1e9f271a58.jpg` | 176 KB | `fe696899225fc04f2335e87acf6a3ad8a00cd3950c5940d5e73e5ce438f36257` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+| `images/f997d0934726b555.jpg` | 232 KB | `5d1c9c551c0680e5b3d0aab261bca71e724c78f6db3580da598c680b4f7d4d79` | `../detections/data/images/` (re-copied 2026-05-19) | D2 |
+
+### Reconnaissance video — `videos/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `videos/94d42580bd1ad6ff.mp4` | 12 MB | `602b22a42515a754313551847caa6d6a6d7b3cde1d857cbd08ebc5543fb8cf7c` | `../detections/data/videos/` (re-copied 2026-05-19) | T3 (frame-rate floor scenario) |
+
+### Movement-detection clips — `movement/`
+
+Wide-area reconnaissance clips intended for movement-detection visual baselines. **Important**: these clips DO NOT have paired `gimbal.csv` / `telemetry.csv` files, so ego-motion compensation assertions (M1–M4) cannot run against them. They remain useful for visual harness work, for frame-count assertions, and as a visual reference for the movement-detection scenarios.
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `movement/video01.mp4` | 5.3 MB | `6f37186f5e9be97109db8d0d220df96d21cac9ce5b50b576234c6f7ee369d2bb` | local; provenance pre-existing in workspace | M1 (visual reference only — no telemetry) |
+| `movement/video02.mp4` | 5.9 MB | `7de7981e511e21e1e72f506d44541b44a4c27a995c9505ef8e3b48e69b416367` | local; provenance pre-existing in workspace | M2 (visual reference only — no telemetry) |
+| `movement/video03.mp4` | 6.1 MB | `df441164da7f37d715968212b95e9bf53c8e37384f20ddfab61cd6d0d18b4f3a` | local; provenance pre-existing in workspace | M3 (visual reference only — no telemetry) |
+| `movement/video04.mp4` | 5.8 MB | `36445bf1c86c5afa524000b5b2da7fc9cb3d39c745f9ad830b3d60f6868948e7` | local; provenance pre-existing in workspace | M4 (visual reference only — no telemetry) |
+
+### Semantic reference frames — `semantic/`
+
+Annotated reference examples for concealed-position semantic targets. **Not a graded eval set** — these are 4 hand-picked examples of footpath-to-concealment patterns, intended as visual reference for what the system should recognise. Detection-quality gates (D1, D3, D4, D5) need a full annotated multi-season eval set; these 4 PNGs are insufficient for those gates and serve as starter reference only.
+
+| File | Size | SHA-256 | Description | `results_report.md` rows |
+|---|---|---|---|---|
+| `semantic/semantic01.png` | 3.1 MB | `339ad4d35ab36052828f05652ab7249801bcd5d7bb04522f0ab9cbf6f0ca008a` | Footpath leading to branch-pile hideout in winter forest | D3, D4, D5 (starter only — full multi-season set still required) |
+| `semantic/semantic02.png` | 5.1 MB | `ffe3c49f5f1833724ce46083d212e714422e664b635cdd48b63311adefcd7b1f` | Footpath to FPV launch clearing, branch mass at forest edge | D3, D4, D5 (starter only) |
+| `semantic/semantic03.png` | 1.0 MB | `ce89c139815e9a80679237008f7cfc3039bbd53f162d48017e840ff91e57b109` | Footpath to squared hideout structure | D3, D4, D5 (starter only) |
+| `semantic/semantic04.png` | 1.3 MB | `b25c689b7aa543ec15858e4b5edfa32387ced4930130eb280d952c555f104e69` | Footpath terminating at tree-branch concealment | D3, D4, D5 (starter only) |
+| `semantic/data_parameters.md` | 2 KB | n/a (text) | Description of the four reference examples + the new YOLO primitive classes that motivate them | reference only |
+
+### Detection contract schemas — `schemas/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `schemas/expected_detections.json` | 1.4 KB | `ce60c105d697efe0359d2e6b1b46fc63e53d3789b067d53501f9c76aad9bd1ae` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (sample Tier-1 response) |
+| `schemas/expected_detections.schema.json` | 2.4 KB | `a7174e0b083dcbf42fa8672acd3e1807d11ea0629cc636ff958a4d77168733b9` | `../e2e/fixtures/` (re-copied 2026-05-19) | D6 (JSON-schema for the Tier-1 contract) |
+
+### Database init script — `sql/`
+
+| File | Size | SHA-256 | Upstream source | `results_report.md` rows |
+|---|---|---|---|---|
+| `sql/init.sql` | 3.7 KB | `b61e452c549f7b006db88d265f4346837e0a33d1abd4d977ebf3d48d8c943439` | `../e2e/fixtures/` (re-copied 2026-05-19) | suite-only reference; no autopilot AC row asserts against this |
+
+## Copy vs reference
+
+Fixtures were COPIED (not moved). The sibling repos still own the originals, so keeping autopilot's copy in sync when an upstream fixture changes is a manual chore today (the `monorepo-e2e` skill at the suite root will eventually own this drift; see `_docs/_process_leftovers/` if a sync is pending).
+
+When an upstream fixture changes:
+
+1. Recompute the SHA-256 in the source repo.
+2. Re-copy into the matching `fixtures/` subdirectory here.
+3. Update this manifest's SHA-256 column.
+4. If the change invalidates an assertion in `../expected_results/results_report.md`, fix the row's expected result too — do not let assertions drift silently against new data.
+
+## Gaps still pending fixture acquisition
+
+The authoritative per-service acquisition catalogue lives in `../services.md`. Summary of the still-open gaps (each is also tagged on its row in `../expected_results/results_report.md` with a structured `<DEFERRED: ...>` marker, and a `_docs/_process_leftovers/` entry records the replay obligation):
+
+| Gap | What's missing | Blocks AC rows | Acquisition status |
+|---|---|---|---|
+| Paired gimbal+telemetry CSVs for the 4 movement clips | `gimbal.csv` + `telemetry.csv` aligned to each video's frame timestamps | M1–M4, tightens L6/L7 | **Confirmed unavailable today** (user 2026-05-19); requires a re-flight or a new recording with the gimbal-feedback channel captured |
+| Annotated eval set across all four seasons | Hundreds–thousands of labelled images per season for concealed-position + footpath gates | D1, D3, D4, D5 | needs annotation campaign (1.5 months at 5 hrs/day target per `semantic/data_parameters.md`) |
+| Per-zoom-band frame sequences | Same kind of clip as `movement/` but recorded at light, medium, and high zoom bands | tightens M2, L7, S2 | needs flight time + zoom-band metadata in the recorder |
+| Mock `missions` HTTPS exchanges | Recorded JSON request/response pairs for mission GET/POST + mapobjects GET/POST | Mp1–Mp5 | inline-authorable against the `mission-schema`; not yet authored |
+| Mock Ground Station session traces | Scripted timing trace (connect / push / drop / reconnect / lost-link) | R4, O8 | inline-authorable; not yet authored |
+| ArduPilot SITL traces | Recorded MAVLink streams for waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | R4, R5, R6, R7, R9 + project SITL conformance gate | needs SITL run |
+| Operator-command envelopes | Valid / expired / replayed / malformed envelopes under the chosen Q9 auth scheme | O9, O10 | **blocked on Q9** (`_docs/02_document/architecture.md §8`) |
+| VLM I/O pairs | Bounded ROI in → structured `VlmAssessment` out + schema-violation cases | L3, S5 | inline-authorable against the assessment schema once the local model is pinned |
+| GPS / NTP drift scenarios | Scripted offset / lock-loss traces | R8 | inline-authorable |
+
+When a fixture from this list lands, copy it under `fixtures/<category>/`, add a row to the relevant subsection above, and bind the matching `<DEFERRED>` row in `../expected_results/results_report.md` to its new local path.
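The "update this manifest's SHA-256 column" step above could be backed by an automated check. The following is a minimal sketch, not part of this snapshot: the helper names (`parse_manifest`, `sha256_of`, `verify`) are hypothetical, and it assumes each table row keeps the fixture path in the first backticked cell and the digest in the third cell, as in the tables above. Rows without a hex digest (such as the `n/a (text)` row) are skipped.

```python
import hashlib
import re
from pathlib import Path

# Matches rows of the form: | `path` | size | `<64 hex chars>` | ...
# An optional leading "+" is tolerated so diff-rendered text can be fed in directly.
ROW = re.compile(r"^\+?\|\s*`([^`]+)`\s*\|[^|]*\|\s*`([0-9a-f]{64})`\s*\|")


def parse_manifest(text: str) -> dict[str, str]:
    """Return {relative_path: expected_sha256} for every row that carries a digest."""
    entries = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest_text: str, fixtures_root: Path) -> list[str]:
    """Return a list of problems; an empty list means every listed fixture matches."""
    problems = []
    for rel_path, expected in parse_manifest(manifest_text).items():
        target = fixtures_root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"sha256 mismatch: {rel_path}")
    return problems
```

Run against the manifest text and the `fixtures/` directory, a non-empty result would flag a fixture that was re-copied without refreshing its digest, or one that was never copied at all.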
@@ -0,0 +1,32 @@
+{
+  "$schema": "./expected_detections.schema.json",
+  "_meta": {
+    "fixture_version": "0.1.0-placeholder",
+    "video": "sample.mp4",
+    "video_sha256": "TBD-after-fixture-recording",
+    "model": {
+      "_comment": "Pinned model + classes that detections must run when this baseline applies. Refresh this block (and counts/bboxes below) whenever detections ships a new model.",
+      "name": "TBD",
+      "revision": "TBD",
+      "classes_source": "annotations/src/Database/DatabaseMigrator.cs (ids 0..18)"
+    },
+    "tolerance": {
+      "_comment": "Spec asserts ranges, not exact values. INT8 calibration drift can move pixel positions by a few units; absolute count can drift by ±1 across re-runs of the same engine on the same Jetson.",
+      "count_delta": 1,
+      "bbox_iou_min": 0.8,
+      "confidence_delta": 0.1
+    }
+  },
+  "expected": {
+    "total_annotations": 0,
+    "by_class": [
+      {
+        "class_id": 0,
+        "class_name": "ArmorVehicle",
+        "count": 0,
+        "bbox_samples": []
+      }
+    ],
+    "_placeholder_note": "Replace this block with the real baseline once sample.mp4 is recorded. Each entry under `by_class` carries: class_id, class_name (must match detection_classes.name), count, and bbox_samples (an array of {time_sec, center_x, center_y, width, height, confidence} entries the spec uses for IoU comparison)."
+  }
+}
@@ -0,0 +1,66 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "Suite e2e expected detections baseline",
+  "type": "object",
+  "required": ["_meta", "expected"],
+  "properties": {
+    "$schema": { "type": "string" },
+    "_meta": {
+      "type": "object",
+      "required": ["fixture_version", "video", "video_sha256", "model", "tolerance"],
+      "properties": {
+        "fixture_version": { "type": "string" },
+        "video": { "type": "string" },
+        "video_sha256": { "type": "string" },
+        "model": {
+          "type": "object",
+          "required": ["name", "revision", "classes_source"],
+          "additionalProperties": true
+        },
+        "tolerance": {
+          "type": "object",
+          "required": ["count_delta", "bbox_iou_min", "confidence_delta"],
+          "properties": {
+            "count_delta": { "type": "integer", "minimum": 0 },
+            "bbox_iou_min": { "type": "number", "minimum": 0, "maximum": 1 },
+            "confidence_delta": { "type": "number", "minimum": 0, "maximum": 1 }
+          }
+        }
+      }
+    },
+    "expected": {
+      "type": "object",
+      "required": ["total_annotations", "by_class"],
+      "properties": {
+        "total_annotations": { "type": "integer", "minimum": 0 },
+        "by_class": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "required": ["class_id", "class_name", "count"],
+            "properties": {
+              "class_id": { "type": "integer", "minimum": 0 },
+              "class_name": { "type": "string" },
+              "count": { "type": "integer", "minimum": 0 },
+              "bbox_samples": {
+                "type": "array",
+                "items": {
+                  "type": "object",
+                  "required": ["time_sec", "center_x", "center_y", "width", "height"],
+                  "properties": {
+                    "time_sec": { "type": "number", "minimum": 0 },
+                    "center_x": { "type": "number" },
+                    "center_y": { "type": "number" },
+                    "width": { "type": "number", "minimum": 0 },
+                    "height": { "type": "number", "minimum": 0 },
+                    "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
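The `tolerance` block and the `bbox_samples` fields in the two files above imply an IoU-based comparison between a baseline sample and an observed detection. The following is a minimal sketch of that comparison, not part of this snapshot: the `Box` type and the helper names are hypothetical, and treating the bounds as inclusive is an assumption, since the placeholder baseline does not say how the tolerances are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Centre-format box mirroring the bbox_samples fields in the schema."""
    center_x: float
    center_y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two centre-format boxes (0.0 when disjoint)."""
    ax0, ax1 = a.center_x - a.width / 2, a.center_x + a.width / 2
    ay0, ay1 = a.center_y - a.height / 2, a.center_y + a.height / 2
    bx0, bx1 = b.center_x - b.width / 2, b.center_x + b.width / 2
    by0, by1 = b.center_y - b.height / 2, b.center_y + b.height / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def count_ok(expected: int, actual: int, count_delta: int) -> bool:
    """Assumed reading of count_delta: absolute difference, inclusive."""
    return abs(expected - actual) <= count_delta


def sample_ok(expected: Box, actual: Box, expected_conf: float,
              actual_conf: float, bbox_iou_min: float,
              confidence_delta: float) -> bool:
    """Assumed reading: IoU at or above the floor and confidence within the band."""
    return (iou(expected, actual) >= bbox_iou_min
            and abs(expected_conf - actual_conf) <= confidence_delta)
```

With the placeholder values (`count_delta` 1, `bbox_iou_min` 0.8, `confidence_delta` 0.1), a detection shifted by a few pixels against a large box would still pass, which matches the stated intent of asserting ranges rather than exact values.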
@@ -0,0 +1,45 @@
+# Semantic And Movement Detection Training Data
+
+# Source
+- Aerial imagery from reconnaissance winged UAVs at 600–1000 m altitude
+- ViewPro A40 camera, 1080p resolution, various zoom levels
+- Extracted from video frames and still images
+- Movement detection requires frame sequences, not still images only; include camera/gimbal telemetry where available to separate target motion from UAV motion.
+
+# Target Classes
+- Footpaths / trails (linear features on snow, mud, forest floor)
+- Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks)
+- Stale footpaths (partially covered by snow/vegetation, faded edges)
+- Concealed structures: branch pile hideouts, dugout entrances, squared/circular openings
+- Tree rows (potential concealment lines)
+- Open clearings connected to paths (FPV launch points)
+- Moving point/cluster candidates at wide or light/medium zoom
+
+# YOLO Primitive Classes (new)
+- Black entrances to hideouts (various sizes)
+- Piles of tree branches
+- Footpaths
+- Roads
+- Trees, tree blocks
+
+# Annotation Format
+- Managed by existing annotation tooling in separate repository
+- Expected: bounding boxes and/or segmentation masks depending on model architecture
+- Footpaths may require polyline or segmentation annotation rather than bounding boxes
+
+# Seasonal Coverage Required
+- Winter: snow-covered terrain (footpaths as dark lines on white)
+- Spring: mud season (footpaths as compressed/disturbed soil)
+- Summer: full vegetation (paths through grass/undergrowth)
+- Autumn: mixed leaf cover, partial snow
+
+# Volume
+- Target: hundreds to thousands of annotated images/sequences
+- Available effort: 1.5 months, 5 hours/day
+- Potential for annotation process automation
+
+# Reference Examples
+- semantic01.png — footpath leading to branch-pile hideout in winter forest
+- semantic02.png — footpath to FPV launch clearing, branch mass at forest edge
+- semantic03.png — footpath to squared hideout structure
+- semantic04.png — footpath terminating at tree-branch concealment
@@ -0,0 +1,104 @@
+-- Suite e2e database seed.
+--
+-- Loaded by the `db-seed` service in docker-compose.suite-e2e.yml after
+-- annotations has run its own DatabaseMigrator (which creates the schema +
+-- inserts the canonical detection_classes 0..18). This file therefore only
+-- adds rows that the e2e scenario depends on but the production runtime does
+-- NOT seed automatically.
+--
+-- Idempotency: every INSERT uses ON CONFLICT (id) DO NOTHING, so re-running
+-- the seed against an already-seeded database lands the same final state.
+--
+-- Schema reference: annotations/src/Database/DatabaseMigrator.cs.
+
+\set ON_ERROR_STOP on
+
+-- Wait until annotations has populated its schema. The db-seed container starts
+-- only after postgres-local is healthy, but annotations may still be spinning
+-- up its tables. A bounded poll keeps the seed deterministic.
+DO $$
+DECLARE
+  attempt int := 0;
+BEGIN
+  WHILE attempt < 60 LOOP
+    PERFORM 1
+    FROM information_schema.tables
+    WHERE table_schema = 'public' AND table_name = 'detection_classes';
+    IF FOUND THEN
+      EXIT;
+    END IF;
+    PERFORM pg_sleep(1);
+    attempt := attempt + 1;
+  END LOOP;
+
+  IF attempt >= 60 THEN
+    RAISE EXCEPTION 'detection_classes table not found after 60s — annotations migration did not complete';
+  END IF;
+END $$;
+
+-- Default system_settings row. Annotations starts without one, but several
+-- spec assertions rely on `silent_detection = false` and known thumbnail dims
+-- so overlay rendering is reproducible.
+INSERT INTO system_settings (
+  id, name, military_unit,
+  default_camera_width, default_camera_fov,
+  thumbnail_width, thumbnail_height, thumbnail_border,
+  generate_annotated_image, silent_detection
+) VALUES (
+  '00000000-0000-0000-0000-00000000aaaa',
+  'azaion-suite-e2e',
+  'e2e-unit',
+  3840, 70,
+  240, 135, 10,
+  true, false
+) ON CONFLICT (id) DO NOTHING;
+
+-- Default directory_settings row. Annotations writes media files under the
+-- paths defined here; the e2e-runner doesn't read these directly but the
+-- service requires the row to exist on first hit.
+INSERT INTO directory_settings (
+  id, videos_dir, images_dir, labels_dir, results_dir,
+  thumbnails_dir, gps_sat_dir, gps_route_dir
+) VALUES (
+  '00000000-0000-0000-0000-00000000bbbb',
+  '/data/videos', '/data/images', '/data/labels', '/data/results',
+  '/data/thumbnails', '/data/gps_sat', '/data/gps_route'
+) ON CONFLICT (id) DO NOTHING;
+
+-- Default camera_settings row used by detections to size bbox-to-meters.
+INSERT INTO camera_settings (
+  id, altitude, focal_length, sensor_width
+) VALUES (
+  '00000000-0000-0000-0000-00000000cccc',
+  100, 50, 36
+) ON CONFLICT (id) DO NOTHING;
+
+-- Stable e2e user. The UUID is referenced by the spec when asserting
+-- annotation rows. Annotations does not own a `users` table — user identity
+-- is carried in JWTs minted with JWT_SECRET; the user_id here just needs to
+-- be deterministic and stable across runs.
+-- Stored in user_settings so the spec can `SELECT user_id` to confirm the
+-- seed ran.
+INSERT INTO user_settings (
+  id, user_id,
+  annotations_left_panel_width, annotations_right_panel_width,
+  dataset_left_panel_width,    dataset_right_panel_width
+) VALUES (
+  '00000000-0000-0000-0000-00000000dddd',
+  '00000000-0000-0000-0000-0000e2e2e2e2',
+  300, 400, 320, 320
+) ON CONFLICT (id) DO NOTHING;
+
+-- Sanity check — fail loudly if the canonical detection_classes are missing.
+-- annotations/src/Database/DatabaseMigrator.cs inserts ids 0..18 unconditionally.
+DO $$
+DECLARE
+  cnt int;
+BEGIN
+  SELECT COUNT(*) INTO cnt FROM detection_classes WHERE id BETWEEN 0 AND 18;
+  IF cnt < 19 THEN
+    RAISE EXCEPTION 'expected canonical detection_classes 0..18 (count=19), got %', cnt;
+  END IF;
+END $$;
+
+\echo 'suite-e2e seed complete'
@@ -0,0 +1,113 @@
+# External Services — Test-Mock Requirements
+
+Black-box catalogue of every external system autopilot depends on at runtime, with the **test-fixture / mock shape required for each**. Service-side design (protocols, component contracts, ownership boundaries) lives in `_docs/02_document/architecture.md` — this file owns ONLY the test-data dependency view (per `.cursor/rules/artifact-srp.mdc`, `_docs/00_problem/input_data/` is a test-data concern).
+
+Runtime input shapes (frame rates, message types) are described in `data_parameters.md`. This file extends them with the **acquisition status of the corresponding test fixture**.
+
+## Index
+
+| # | External system | Production role | Test-mock shape needed | Acquisition status |
+|---|---|---|---|---|
+| 1 | Tier-1 detection (`../detections`) | Primitive YOLO inference on every frame; returns class + bbox + confidence | Recorded bi-stream replay file (`request frame` → `response detections`) | **MISSING** — no replay recorded yet |
+| 2 | Mission planner (`missions` API) | Mission JSON pull at start; middle-waypoint POST on operator-confirm; pre-flight area-map pull + post-flight diff push | Mock HTTPS exchanges for GET/POST + sample mission + sample mapobjects state | **MISSING** — schema known (mission-schema), no fixture recorded |
+| 3 | Ground Station (modem) | Continuous push of camera + telemetry + bbox overlay; return path carries operator commands (confirm / decline / target-follow / abort) | Scripted session traces: nominal session, modem drop at T=N, reconnect at T=M, lost-link sustained ≥30 s | **MISSING** — authorable inline (no external dependency) |
+| 4 | Airframe autopilot (ArduPilot / PX4) | MAVLink v2 transport for the ~10–15 commands in `architecture.md §7.7`; battery + position telemetry; geofence enforcement | ArduPilot SITL traces: waypoint upload, geofence INCLUSION + EXCLUSION, RTL on lost-link, RTL on battery floor | **MISSING** — needs SITL run with scripted scenarios |
+| 5 | ViewPro A40 camera (frames) | H.264/265 1080p RTSP video feed at 30/60 fps | Recorded frame sequences (`.mp4`) — wide-zoom, light-zoom, medium-zoom, high-zoom variants | **PARTIAL** — 4 wide-zoom clips on disk (`fixtures/movement/video0[1-4].mp4`); zoom-band variants missing |
+| 6 | ViewPro A40 gimbal (control) | Vendor UDP control protocol; yaw / pitch / zoom telemetry per tick | Per-frame-sequence `gimbal.csv` paired with the matching video; per-tick yaw/pitch/zoom + timestamp | **MISSING** — no `gimbal.csv` paired with the 4 movement videos; ego-motion compensation (M1–M4) is unfalsifiable without this |
+| 7 | Deep-analysis VLM (local IPC) | Optional Tier-3 confirmation over bounded ROI; structured `VlmAssessment` response | Recorded I/O pairs (ROI in → `VlmAssessment` out) + schema-violation cases for fail-closed tests | **MISSING** — depends on the local model choice; can be authored against the assessment schema once the model is pinned |
+| 8 | Time source (GPS / NTP) | Wall-clock; drift triggers the R8 health-yellow gate | Scripted drift scenarios (no real GPS/NTP hardware needed) — clock offset, jump, source loss | **MISSING** — authorable inline |
+
+## Per-service detail — what acquisition would look like
+
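+The table above is the index; the sections below explain the shape and acquisition path so the gaps can be planned one at a time. Sections 1–7 follow the index numbering. Section 8 splits operator-command envelopes out of the Ground Station return path (index row 3), so the time source (index row 8) is covered in section 9.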
+
+### 1. Tier-1 detection replay (`../detections`)
+
+- Production transport: bi-directional gRPC. The autopilot streams frames out; `../detections` streams `Detections` messages back.
+- Mock shape: a `.replay` file (one per scenario) recording timestamped frames + the exact `Detections` responses the model emitted. Used by `detection_client` integration tests in isolation — no need to boot the real Tier-1 service.
+- Acquisition path: record one replay against the currently pinned `../detections` model. Re-record when the upstream model changes (the `monorepo-e2e` skill should eventually own this drift; see the suite's leftovers).
+- Blocks AC rows: every row that needs a deterministic detection stream — practically L1, L2, D1, D2, D6 in isolation; in suite-e2e mode these run live against the real `../detections`.
+
+### 2. Mission + MapObjects mock (`missions` API)
+
+- Production transport: HTTPS REST.
+- Mock shape: JSON fixtures per endpoint + a small mock HTTP server (or replay-style fixtures consumed by a test double). Endpoints in scope:
+  - `GET /missions/{id}` — mission JSON, validated against `mission-schema`.
+  - `POST /missions/{id}` — middle-waypoint insertion (200 OK + updated mission).
+  - `GET /missions/{id}/mapobjects` — pre-flight area-map pull (response shape: map-object records keyed by spatial cell; volume target ~10000 objects for the 30×30 km gate Mp1).
+  - `POST /missions/{id}/mapobjects` — post-flight diff push (NEW / MOVED / REMOVED / CONFIRMED-EXISTING; volume target per Mp3 ~17500 records).
+- Acquisition path: author JSON fixtures against the known schema; record real exchanges once `missions` is reachable from the test bench.
+- Blocks AC rows: Mp1–Mp5 (all 5 map-reconciliation rows).
+
+### 3. Ground Station session trace
+
+- Production transport: continuous push over modem (suite-level protocol).
+- Mock shape: scripted timing trace per scenario. Each scenario is a list of `(t, event)` pairs: connect, push frame, push telemetry, operator-click, modem-drop, reconnect, lost-link.
+- Acquisition path: authorable inline from `architecture.md §7` and `acceptance_criteria.md §Reliability & Safety`. No external dependency — just a fixture generator.
+- Blocks AC rows: R4 (lost-link → RTL at 30 s); O8, O9, O10 (operator command lifecycle on the return path, **but** O9/O10 also depend on Q9 for the auth scheme).
+
+### 4. MAVLink SITL trace
+
+- Production transport: MAVLink v2 over UDP or serial.
+- Mock shape: ArduPilot SITL recording capturing the autopilot's command stream + the airframe's response stream. One trace per scenario: waypoint upload, geofence INCLUSION violation, geofence EXCLUSION violation, lost-link RTL, battery RTL-floor RTL, battery hard-floor land-now.
+- Acquisition path: run ArduPilot SITL with a scripted mission; capture the full MAVLink stream with mavlink-router or equivalent.
+- Blocks AC rows: R4 (RTL exact timing), R5, R6, R7, R9; plus the project-level "MAVLink command surface MUST pass SITL conformance" gate.
+
+### 5. Camera frame sequences (ViewPro A40)
+
+- Production transport: RTSP/RTP over TCP/UDP, 1080p H.264/265 at 30/60 fps.
+- Current local fixtures: `fixtures/movement/video0[1-4].mp4` (4 clips, ~5–6 MB each), `fixtures/videos/94d42580bd1ad6ff.mp4` (one reconnaissance clip used for T3 frame-rate floor).
+- Mock-shape gap: zoom-band coverage. Each AC scenario that names a zoom level (wide, light, medium, high) needs a representative clip at that zoom band. The 4 movement clips are not labelled with the zoom band each represents; either document the band per clip or re-record with zoom-band labels.
+- Acquisition path: existing clips usable for movement-detection visual baselines; new recordings at each zoom band require flight time.
+
+### 6. Gimbal telemetry CSV (paired with frames)
+
+- Production transport: ViewPro A40 vendor protocol over UDP; per-tick yaw/pitch/zoom updates.
+- Mock shape: `gimbal.csv` with columns `(t, yaw_deg, pitch_deg, zoom_band, focal_mm)`, one CSV per video file, timestamps aligned to frame timestamps within ≤ 1 frame.
+- Acquisition path: requires re-flying the recording with the gimbal-feedback channel captured alongside. CANNOT be back-fitted to existing videos.
+- Blocks AC rows: M1, M2, M3, M4 (movement-detection ego-motion compensation); also tightens L6, L7 (movement candidate enqueue latency).
+- **Confirmed not available today (user-stated 2026-05-19).**
+
+### 7. VLM I/O pairs
+
+- Production transport: Unix-domain socket IPC to local-onboard VLM (NanoLLM / VILA1.5-3B per architecture §1).
+- Mock shape: paired `(roi.png, prompt.txt, vlm_response.json)` per scenario + a small set of schema-violation cases (truncated JSON, wrong field types, missing required fields) for fail-closed tests.
+- Acquisition path: depends on the local VLM model choice. Once pinned, capture real I/O during a flight or scripted run; schema-violation cases authored inline.
+- Blocks AC rows: L3 (Tier-3 ≤5 s latency on bounded ROI), S5 (deep-analysis hold-cap interaction).
+
+### 8. Operator-command envelopes
+
+- Production transport: envelopes reach autopilot over the Ground Station modem return path.
+- Mock shape: per envelope, a `(scheme, payload, signature, sequence_id)` tuple. One fixture per case: valid, expired, replayed (same envelope sent twice), malformed (signature mismatch), unsigned.
+- Acquisition path: **blocked on Q9** (operator-command auth scheme — open in `_docs/02_document/architecture.md §8`). Once the scheme is chosen, envelopes are authorable inline.
+- Blocks AC rows: O9 (replay protection), O10 (signature validation); strengthens O8 (confirm pathway).
+
+### 9. GPS / NTP drift scripts
+
+- Production transport: kernel-level wall clock + GPS lock state.
+- Mock shape: scripted offset injection — bump the clock by N ms, drop GPS lock, change time source.
+- Acquisition path: authorable inline; no external dependency.
+- Blocks AC rows: R8.
+
+## Coverage summary by service
+
+| Service | Rows covered (real fixture) | Rows blocked on this service | Acquisition priority |
+|---|---|---|---|
+| Tier-1 replay | L1, D2, D6 (live; replay desirable for isolation) | none independently blocked | low (can use live `../detections` in suite-e2e) |
+| `missions` mock | none | Mp1–Mp5 (5 rows) | medium |
+| Ground Station trace | none | R4, O8 (2 rows) | low (inline-authorable) |
+| MAVLink SITL | none | R4, R5, R6, R7, R9 (5 rows) + project conformance gate | high |
+| Frame sequences | L1 (with image), T3 (with video) | enriches L6/L7 with telemetry | medium |
+| Gimbal CSV | none | M1–M4 (4 rows) + L6, L7 | **high — explicit user gap** |
+| VLM I/O pairs | none | L3, S5 (2 rows) | low (model-choice gated) |
+| Operator envelopes | none | O9, O10 (2 rows) | blocked on Q9 |
+| GPS/NTP drift | none | R8 | low (inline-authorable) |
+
+Per-row binding lives in `expected_results/results_report.md`. The status of each gap is mirrored in `_docs/_process_leftovers/` so the next `/autodev` run can replay the missing-fixture decision.
+
+## What this file does NOT own
+
+- Component design (how `detection_client` talks to Tier-1, how `mission_client` retries, etc.) — `_docs/02_document/architecture.md` and `_docs/02_document/components/*/description.md`.
+- Production data shapes (frame rate, MAVLink message types) — `data_parameters.md` already has these.
+- AC text — `_docs/00_problem/acceptance_criteria.md`.
+- The choice of which mocks to use during a given test run (live vs replay vs scripted) — `_docs/02_document/tests/` (test strategy doc, authored by `/test-spec` Phase 2).
@@ -0,0 +1,55 @@
+# Problem
+
+## What is being built
+
+`autopilot` is the onboard mission executor for a reconnaissance winged UAV. It runs on the airframe's edge compute device. It receives a mission from outside, controls the airframe, drives the camera + gimbal to inspect terrain, and feeds a remote human operator with everything the operator needs to confirm or decline each candidate target.
+
+## What problem it solves
+
+The reconnaissance UAV detects vehicles and military equipment well enough today, but the current high-value targets are **camouflaged positions** — FPV-operator hideouts, hidden artillery emplacements, dugouts masked by branches. These cannot be found by visual similarity to known object classes alone.
+
+Three observation gaps must be closed:
+
+- **Visual sweep coverage** — the camera must follow the planned route and keep eyes on the terrain it overflies, not only on already-known targets.
+- **Movement detection on a moving camera platform** — small movers must be surfaced as they appear, even while the airframe and gimbal are themselves moving and even at higher zoom levels.
+- **Context-aware target recognition** — a candidate position has to be assessed against scene context (footpaths arriving at it, fresh-vs-stale tracks, concealment patterns), not just shape.
+
+For every candidate it does surface, the system must reach a human operator quickly enough for the operator to act, without overwhelming the operator with too many candidates at once, and with confidence-scaled urgency so high-confidence targets are not lost in a queue of low-confidence noise.
+
+## Who uses it
+
+- **Operators** — single primary, optional remote secondary. They see camera feed + telemetry + candidate overlays in a browser at a Ground Station and respond with confirm / decline / target-follow / abort. Their decisions must be authenticated, signed, and replay-protected because the radio link is hostile territory.
+- **Mission planners** — define the mission region and consume the post-mission diff of what was found.
+- **Airframe / Ground-Station crews** — depend on the system to safely abort or RTL when the operator link is lost, and to refuse takeoff if the system is not in a flight-ready state.
+- **Suite operations** — need to know when the airframe is in flight so that other ground-side housekeeping (model updates, OTA) does not interfere.
+
+## The operational reality this problem lives in
+
+Stated as fact, not as a design choice. (Design lives in `_docs/01_solution/solution.md` and `_docs/02_document/architecture.md`.)
+
+- The airframe is a reconnaissance winged UAV flying at 600–1000 m altitude.
+- Missions cover all four seasons and all common terrain types (winter snow, spring mud, summer vegetation, autumn; forest, open field, urban edges, mixed terrain).
+- The link between the airframe and the Ground Station is a modem radio that can degrade or drop entirely mid-flight; the system has to keep flying safely when this happens.
+- The operator is remote, watches a browser UI on the Ground Station, and is not co-located with the airframe.
+- Primitive (Tier 1) object detection is the responsibility of a separate service running alongside the autopilot on the same compute, accessible over a local interface — this split is fixed at the suite level, not something autopilot can choose.
+- Mission state and the area-level map of previously-seen objects come from a separate `missions` service over the network and are reconciled before takeoff and after landing.
+
+## What this system is NOT for
+
+(Scope-clarifying so the reader does not project unrelated concerns onto autopilot.)
+
+- Multi-airframe coordination, fleet management, swarm logic.
+- Mission planning across regions.
+- GPS-denied navigation algorithms (a separate suite service provides corrected GPS).
+- Annotation tooling, model training, dataset curation.
+- The operator browser UI itself (the Ground Station hosts it; autopilot feeds it).
+- Cloud-hosted inference of any kind.
+
+## Where to read further
+
+- `_docs/00_problem/restrictions.md` — the hard constraints (hardware, environment, regulatory).
+- `_docs/00_problem/acceptance_criteria.md` — measurable success criteria.
+- `_docs/00_problem/security_approach.md` — threat model + security non-negotiables.
+- `_docs/00_problem/input_data/` — runtime inputs + test fixture references.
+- `_docs/01_solution/solution.md` — the chosen solution shape (component breakdown, tech stack rationale).
+- `_docs/02_document/architecture.md` — the full architectural design.
@@ -0,0 +1,54 @@
+# Restrictions
+
+Externally imposed constraints the system MUST satisfy. Design choices — even frozen ones — live in `_docs/02_document/architecture.md`, not here. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+## Hardware (fixed at the suite level — autopilot does not choose)
+
+- Compute device: **Jetson Orin Nano Super** (aarch64), 67 TOPS INT8, **8 GB shared LPDDR5**. Tier 1 detection consumes ~2 GB of that, leaving ~6 GB for everything autopilot owns.
+- Primary camera: **ViewPro A40**. 1080p (1920×1080), 40× optical zoom, f=4.25–170 mm, Sony 1/2.8" CMOS (IMX462LQR), HDMI or IP output at 1080p 30/60 fps. The A40's vendor control protocol is the only way to drive its pan/tilt/zoom — autopilot must speak it.
+- Alternative camera: **ViewPro Z40K** (higher cost; the system must remain compatible).
+- Thermal sensor (640×512, NETD ≤50 mK) may be added later; the system must not assume it is present today.
+- 40× optical zoom traversal takes 1–2 s wall-clock. Any sub-2-second zoom-out → zoom-in product behaviour must account for this physical floor.
+
+## Operational
+
+- Flight altitude: 600–1000 m.
+- All seasons in scope: winter snow, spring mud, summer vegetation, autumn. Winter-first-only is rejected (frozen 2026-05-06).
+- All terrain types in scope: forest, open field, urban edges, mixed terrain.
+- The operator/Ground-Station radio link is a modem with intermittent reliability — the system must tolerate degradation and full loss mid-flight.
+
+## Software environment (externally imposed)
+
+- The chosen onboard inference path must run on Jetson Orin Nano Super within the 6 GB residual RAM budget (after Tier 1).
+- **Models use FP16 precision** (frozen 2026-05-06; INT8 is rejected for MVP). Applies to every model loaded onto Jetson.
+- **No cloud egress for inference.** Any model larger than the in-binary footprint must run locally on the same Jetson, not in the cloud. Network calls for inference are forbidden.
+- Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant. (This is a hardware-resource fact; how the system serialises them is design.)
+- The mission file format is the shared `mission-schema` artefact owned jointly by autopilot and the `missions` service. Autopilot MUST consume that schema; it cannot fork it.
+
+## Suite-level architectural splits (autopilot does not own these decisions)
+
+- Tier 1 primitive object detection runs in the sibling **`../detections`** service. Autopilot consumes its output; autopilot does NOT host Tier 1.
+- Mission state (waypoints, region, etc.) comes from the **`missions`** service. Autopilot does not author missions.
+- Central map of previously-detected objects lives in **`missions`** (extension `/missions/{id}/mapobjects`). Autopilot reconciles with it pre-flight and post-flight; in-flight, autopilot is authoritative for its mission's area.
+- GPS coordinates come from a separate **GPS-denied service** (`../gps-denied-onboard` / `../gps-denied-desktop`). Autopilot does NOT implement GPS-denied algorithms.
+- Operator browser UI is owned by the **Ground Station**. Autopilot pushes the data; it does NOT render the UI.
+- Annotation tooling + model training live in **separate repos** (`../annotations`, `../ai-training`). Autopilot does NOT own them.
+
+## Reliability & Safety obligations (mandatory)
+
+These are existence-of-the-rule constraints. The specific numeric thresholds (RTL grace, drift bound, retry count) are measured success criteria and live in `acceptance_criteria.md`.
+
+- **Pre-flight self-test (BIT) MUST gate takeoff.** The airframe must not take off until every dependency the mission needs is verifiably healthy or the operator has explicitly accepted a known degraded state (e.g. cached MapObjects fallback).
+- **Lost operator-link failsafe MUST be deterministic and bounded.** Loss of the operator/Ground-Station radio link cannot result in undefined behaviour. The eventual outcome must be a known mission-safe state (RTL by default, configurable per mission).
+- **Airframe MAVLink link loss MUST surface health-red immediately** and defer behaviour to the autopilot stack on the airframe (ArduPilot / PX4).
+- **Battery / fuel thresholds MUST trigger pre-defined safety behaviour** (RTL once the level falls below a soft floor; land-now once it falls below a hard floor). Only an operator override may bypass them.
+- **Geofence enforcement MUST be symmetric** — both INCLUSION and EXCLUSION polygons honoured.
+- **Operator commands MUST be authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient. (Threat model + open scheme choice live in `security_approach.md`.)
+- **On-device storage MUST be bounded.** Persistent-store full is a takeoff-blocker; mid-flight eviction policy is mandatory.
+- **No silent error swallowing.** Every dependency state MUST surface through a health endpoint.
+- **Wall-clock MUST be bound to GPS time once GPS is locked, or NTP at boot.** Forensic timestamping of operator commands depends on this.
+- **MAVLink command surface MUST conform** to whatever ArduPilot/PX4 actually accepts (SITL is the conformance reference). Inventing MAVLink semantics is not permitted.
+
+## Out of scope — see `problem.md → "What this system is NOT for"`
+
+Scope-exclusion statements are owned by `problem.md`. Not duplicated here.
@@ -0,0 +1,52 @@
+# Security Approach
+
+Threat model + non-negotiable security principles. Specific schemes / libraries / algorithms (HMAC vs ed25519, Unix-domain socket peer-cred mechanism, etc.) are design choices and live in `_docs/02_document/architecture.md` + per-component specs. (Audited against `.cursor/rules/artifact-srp.mdc`.)
+
+## Threat model
+
+The autopilot runs onboard a flying UAV. The threats it must defend against on the MVP timeline:
+
+1. **Hijack of operator commands over the radio link.** Even with modem-level link encryption, an attacker who acquires session state could replay a confirm / decline / target-follow / abort command and seize the system's behaviour. The radio link is hostile territory; link encryption alone cannot be the entire defence.
+2. **Crafted input payloads** (image / video crops sent to onboard models, malformed messages on the airframe link, oversize attachments to any onboard service) exploiting decoders, memory bugs, or causing resource exhaustion.
+3. **Unstructured model output** corrupting downstream decisions and producing false operator-facing confidence (e.g. a free-form VLM text response treated as a trusted downstream API).
+4. **Mid-flight peer spoofing** — a fake sibling service (Tier 1 detection, mission service, or any local IPC peer) impersonating a trusted dependency.
+5. **Forensic / audit gaps** — wall-clock drift breaking operator-command timestamping, post-mission diff attribution, or replay-protection windowing.
+
+**Out of scope** (lives elsewhere in the suite or is not relevant to the airborne payload):
+
+- Cloud-hosted secret management — autopilot does not call cloud services.
+- Multi-tenancy — single mission per flight; single operator-or-paired-operator session per flight.
+- Web-attack surface — the operator browser UI lives in the Ground Station, not in autopilot.
+- OTA update signing — Watchtower at the suite level owns it; autopilot only consumes signed images.
+
+## Non-negotiable security principles
+
+These are existence-of-the-rule constraints. The chosen mechanism for each is a design decision and lives in `_docs/02_document/architecture.md`.
+
+- **Operator commands MUST be authenticated, signed, and replay-protected.** Every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that is validated before any state change. A command that fails validation is logged at WARN+ and discarded before it reaches the system's state machine; it is never permitted to take effect.
+- **No cloud egress for inference.** Tier 2 + Tier 3 (if enabled) MUST run on the same compute as the rest of autopilot. No HTTP / external network call originating from autopilot for inference is permitted.
+- **No silent error swallowing for security-relevant failures.** Signature invalid, peer-credential mismatch, schema violation, oversize payload rejected — each MUST surface through the health endpoint and the structured log.
+- **Bounded input for any model call.** Crop size + format allow-list + patched image decoders. Crafted-input and resource-exhaustion mitigation is mandatory; "accept anything and hope the decoder handles it" is not acceptable.
+- **Schema validation for any non-deterministic model output.** Free-form generative output (e.g. VLM text) MUST be projected onto a fixed structured schema before it crosses any decision boundary inside autopilot. Schema violation MUST fail closed.
+- **Local IPC peer authorisation.** Any onboard IPC peer that autopilot trusts MUST be identifiable as the expected local process (not just "anyone who can reach the socket"). The mechanism is a design choice.
+- **Health endpoint MUST reflect security state.** Pre-flight BIT covers reachability + warm-up of every external dependency; the same endpoint surfaces in-flight security signals (repeated signature failures, peer-credential mismatch, schema-violation rate).
+- **Wall-clock binding requirement.** Operator-command timestamping requires a trusted clock source. Wall-clock MUST be bound to GPS time once GPS is locked, or NTP at boot. Both sources MUST be recorded with `clock_source` + `last_sync_at`. Drift > 200 ms surfaces health yellow (the AC enforces the threshold; this rule mandates the binding).
+- **Airframe MAVLink integrity.** Whether the airframe link MUST use MAVLink-2 message signing depends on whether the link is physically isolated. If it is not physically isolated, message signing MUST be enabled. (The decision and the mechanism are tracked as Q6 in `architecture.md §8`.)
+
+## What this system does NOT own
+
+- Modem-link encryption setup — handled at the radio layer below autopilot.
+- Suite-wide TLS / certificate provisioning — delegated to suite-level deployment (`../_infra/`).
+- OTA update signing — Watchtower; autopilot consumes already-signed images. Boot-time self-check + rollback policy is an open suite-level question (Q10 in `architecture.md §8`).
+- Annotation / training-data security — lives in the `ai-training` repo.
+- Operator browser UI auth — Ground Station owns it; the modem-side handshake is jointly specified per the operator-command auth scheme (Q9).
+
+## Open security decisions (tracked in `_docs/02_document/architecture.md §8`)
+
+- **Q6** — MAVLink-2 message signing on the airframe link.
+- **Q9** — Operator-command authentication scheme (HMAC / ed25519 / MAVLink-2-extension / separate envelope).
+- **Q10** — Software rollback policy on the airframe (boot-time self-check, A/B partition, watchdog rollback).
+- **Q11** — Multi-operator session policy (single active operator vs quorum).
+- **Q12** — Comms blackout during banking turns (tolerate vs suppress lost-link failsafe during known turn arcs).
+
+None of these block the rest of the design. Each affected component spec calls out the question it depends on and the temporary contract used until the question resolves.
@@ -0,0 +1,50 @@
+# Solution
+
+The solution for `autopilot` is captured **in full** in `_docs/02_document/architecture.md`, `_docs/02_document/system-flows.md`, `_docs/02_document/data_model.md`, `_docs/02_document/decision-rationale.md`, the 13 per-component specs under `_docs/02_document/components/`, and `_docs/02_document/glossary.md`. These were produced before the canonical greenfield Problem step and were confirmed by the user on 2026-05-17.
+
+This file is the **canonical greenfield Solution pointer** — it exists so downstream skills that expect `_docs/01_solution/solution.md` (test-spec, decompose, plan-resume) have a single entry point, and it summarises the decision shape; it does not duplicate the architecture.
+
+## What is the solution
+
+A single Rust binary on Jetson Orin Nano Super (aarch64) that runs the mission, drives the gimbal in a two-level scan loop, ingests RTSP, delegates Tier 1 detection to `../detections` over bi-directional gRPC, runs Tier 2 + optional Tier 3 (VLM) locally, talks to a remote operator over modem via an always-on telemetry stream, and bracket-synchronises a local H3-indexed MapObjects store with the central `missions` API. The dominant pattern is a deterministic typed state machine — `ZoomedOut`, `ZoomedIn { roi, hold_started_at }`, `TargetFollow { target_id, started_at }` — coordinating a small set of Tokio actor components.
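+
+The three states named above could be typed roughly as follows. This is a sketch under stated assumptions, not the architecture's actual definition: the `Roi` layout, the `u64` target id, the `Event` set, and the transition table in `step` are all hypothetical and exist only to show how a typed enum keeps transitions deterministic.
+
+```rust
+use std::time::Instant;
+
+/// Region of interest in image coordinates (assumed representation).
+#[derive(Clone, Copy, Debug, PartialEq)]
+struct Roi { x: u32, y: u32, w: u32, h: u32 }
+
+/// The three scan states named in the summary above.
+#[derive(Clone, Copy, Debug, PartialEq)]
+enum ScanState {
+    ZoomedOut,
+    ZoomedIn { roi: Roi, hold_started_at: Instant },
+    TargetFollow { target_id: u64, started_at: Instant },
+}
+
+/// Hypothetical events; the real trigger set is defined in the architecture doc.
+#[derive(Clone, Copy, Debug)]
+enum Event {
+    PoiSelected(Roi),
+    OperatorConfirmed { target_id: u64 },
+    OperatorDeclined,
+    HoldExpired,
+    FollowEnded,
+}
+
+/// Pure transition function: same state, event, and timestamp always give the same result.
+fn step(state: ScanState, event: Event, now: Instant) -> ScanState {
+    match (state, event) {
+        (ScanState::ZoomedOut, Event::PoiSelected(roi)) => {
+            ScanState::ZoomedIn { roi, hold_started_at: now }
+        }
+        (ScanState::ZoomedIn { .. }, Event::OperatorConfirmed { target_id }) => {
+            ScanState::TargetFollow { target_id, started_at: now }
+        }
+        (ScanState::ZoomedIn { .. }, Event::OperatorDeclined | Event::HoldExpired) => {
+            ScanState::ZoomedOut
+        }
+        (ScanState::TargetFollow { .. }, Event::FollowEnded) => ScanState::ZoomedOut,
+        // Any other pairing is ignored: the state is returned unchanged.
+        (s, _) => s,
+    }
+}
+```
+
+Taking `now` as a parameter instead of reading the clock inside `step` keeps the function pure, which makes the transitions replayable in tests.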
+
+## Component breakdown
+
+13 components organised into four planes (see `architecture.md §2`, §3 and per-component specs):
+
+- **Perception (data plane in)**: `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, `vlm_client` (optional).
+- **Decision + Memory**: `scan_controller`, `mapobjects_store`.
+- **Action (data plane out)**: `gimbal_controller`, `operator_bridge`, `mission_executor`, `mavlink_layer`, `mission_client`.
+- **Telemetry plane (always-on, parallel)**: `telemetry_stream`.
+
+Per-component design contracts (inputs, outputs, state, failure modes, NFRs) live in `_docs/02_document/components/<name>/description.md`.
+
+## Tech stack rationale (one-line summary per choice; full rationale in `decision-rationale.md`)
+
+| Layer | Selection | Rationale |
+|---|---|---|
+| Language | Rust | Memory safety, performance, single-binary deployment, strong typing for the deterministic state machine. |
+| Tier 1 detector | YOLO26 + YOLOE-26 FP16 TensorRT (in `../detections`) | Best fit with acceptance criteria + existing export pipeline. Not owned by autopilot. |
+| Tier 2 analyzer | Primitive graph + lightweight ROI CNN | Fast, explainable, data-efficient. |
+| Movement detection | OpenCV optical flow + telemetry; learned-CV fallback per Q14 | Addresses moving-camera constraint directly; benchmark-gated. |
+| VLM runtime | NanoLLM / VILA1.5-3B (optional, local IPC) | Local multimodal path that matches the no-cloud requirement. |
+| MAVLink transport | Hand-rolled (Rust) | Eliminates the largest current dependency-risk item; command surface is small (`architecture.md §7.7`). |
+| Gimbal protocol | ViewPro A40 vendor protocol over UDP | Matches the deployed camera. |
+| Inter-component IPC | Tokio channels / actors | Idiomatic Rust async. |
+| External IPC (VLM) | Unix-domain socket + peer-credential check | Local-only authorisation. |
+| MapObjects engine | TBD (SQLite + H3 / KV / in-memory + snapshot) | Open question Q3; does not block decomposition of the rest of the system. |
+| Observability | `tracing` + JSON logs to stdout | Scraped by the deployment's log-shipping stack. |
+| Build | `cargo` cross-compile for `aarch64-unknown-linux-gnu` | See `_docs/02_document/deployment/ci_cd_pipeline.md`. |
+
+## Reading order for downstream skills
+
+1. `_docs/02_document/architecture.md` — start with §0 Synopsis, then §3 Components, §5 Architectural Principles, §6 NFR Targets, §7 Detailed Design (in section order).
+2. `_docs/02_document/system-flows.md` — flow-by-flow walkthroughs; cross-referenced from the architecture sections.
+3. `_docs/02_document/data_model.md` — canonical entities (Frame, Detection, POI, VlmAssessment, MapObject, IgnoredItem, MissionItem, ...).
+4. `_docs/02_document/components/<name>/description.md` — one per component; consumed by `/decompose` to map tasks to components.
+5. `_docs/02_document/glossary.md` — project-specific terms (also user-confirmed 2026-05-17).
+6. `_docs/02_document/decision-rationale.md` — load-bearing research and decision evidence (the equivalent of `research/` Mode A + Mode B outputs).
+
+## Open questions / open decisions
+
+Tracked in `_docs/02_document/architecture.md §8 Open Questions` (Q1–Q14). None of them block initial implementation decomposition; each component spec calls out the questions it depends on and what the temporary contract is until the question resolves.
@@ -0,0 +1,124 @@
+# autopilot — Documentation Index
+
+**Status**: forward-looking design (Rust). The implementation is in flight. This page is the entry point into the doc set; it does not duplicate content.
+
+If you are new to autopilot, read in this order: `architecture.md` → `system-flows.md` → the component spec(s) you care about → `data_model.md` for entity-level detail → `decision-rationale.md` for *why* the design looks the way it does.
+
+---
+
+## 1. Doc set at a glance
+
+| File | Purpose |
+|---|---|
+| `architecture.md` | The system. System context, component layering, NFRs, detailed design (problem, restrictions, AC, training data, solution architecture, MAVLink and piloting, MapObjects/H3, MGRS sync, target relocation, MapObjects sync with central DB, tech stack), open questions, scope boundary. |
+| `system-flows.md` | Per-flow narratives + sequence diagrams. Frame pipeline, movement detection (zoom-out + zoom-in), VLM confirmation, scan-controller behaviour tree, operator round trip, mission lifecycle, MapObjects + ignored items, MapObjects sync, pre-flight BIT, lost-link failsafe ladder. |
+| `data_model.md` | Canonical entity catalogue. Frames, detections, POIs, VlmAssessment, MapObject + observation log + bundle, IgnoredItem, OperatorCommand envelope, MissionItem vs MissionWaypoint, MGRS wire format, persistence + versioning. |
+| `decision-rationale.md` | Load-bearing research and decision evidence (per-dimension reasoning chain, fact cards, fit matrix, validation log, source registry, weak-point→fix table, historical seed narrative). |
+| `glossary.md` | Project-specific terms. |
+| `components/<name>/description.md` | One per autopilot component (13 total): purpose, inputs, outputs, responsibilities, state, failure modes, dependencies, NFR targets, references. |
+| `deployment/containerization.md` | Single-binary deployment options (native systemd vs container), target hardware, configuration surface, health endpoint. |
+| `deployment/ci_cd_pipeline.md` | Build, test, SITL conformance, benchmark gate, sign + publish. |
+| `deployment/observability.md` | Logs (`tracing` + JSON), metrics, traces, health aggregation, replay-driven debugging. |
+| `FINAL_report.md` | This file. |
+
+---
+
+## 2. The system in two minutes
+
+`autopilot` is the onboard mission executor for a reconnaissance winged UAV. It runs as a single Rust process on an aarch64 Jetson Orin Nano Super. It pulls a mission from the external `missions` API (and the mission area's last-known MapObjects state), controls the UAV through a hand-rolled MAVLink layer, drives a ViewPro A40 gimbal in a two-level scan-and-zoom loop (zoom-out wide sweep + zoom-in on POI), streams camera frames + telemetry continuously over modem to an external Ground Station so the operator watches in a browser, and uses bi-directional gRPC to delegate primitive object detection to the external `../detections` API. Movement detection runs at both zoom levels with mandatory ego-motion compensation. Semantic-vision reasoning (Tier 2 + an optional local VLM), a POI scheduler with a ≤5 POIs/min operator-review cap, and a target-follow mode after operator confirmation all run inside autopilot. Pre-flight self-test gates takeoff; the mission's full pass diff is pushed back to the central MapObjects store at mission end. Operator commands are authenticated, signed, and replay-protected.
+
+Full synopsis: `architecture.md §Synopsis`.
+
+---
+
+## 3. Components
+
+The system is 13 components organised into 4 planes:
+
+| Plane | Components |
+|---|---|
+| Perception (data plane in) | `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, `vlm_client` (optional) |
+| Decision + Memory | `scan_controller`, `mapobjects_store` |
+| Action (data plane out) | `gimbal_controller`, `operator_bridge`, `mission_executor`, `mavlink_layer`, `mission_client` |
+| Telemetry plane (always-on, parallel) | `telemetry_stream` |
+
+Per-component design specs: `components/<name>/description.md`.
+
+---
+
+## 4. Architectural non-negotiables
+
+These are stated once in `architecture.md §5` and referenced everywhere:
+
+- Detection-as-a-service (Tier 1 lives in `../detections`).
+- Hand-rolled MAVLink (no third-party SDK).
+- Deterministic typed state machine for scan control: `ZoomedOut`, `ZoomedIn`, `TargetFollow`.
+- Ego-motion compensation is mandatory for movement detection. Movement detection runs at **both** zoom-out and zoom-in (per-zoom-band thresholds; classical-CV adequacy at zoom-in is benchmark-gated).
+- Operator workload cap of ≤5 POIs/minute is hard.
+- Operator timeout scales with confidence.
+- **Operator commands are authenticated, signed, and replay-protected** (modem encryption alone is not sufficient).
+- Local VLM with structured `VlmAssessment` schema; no cloud egress.
+- Always-on camera + telemetry stream to Ground Station.
+- **Lost-link failsafe is explicit** (`mission_executor` runs a typed ladder; default RTL after 30 s grace).
+- **Pre-flight self-test (BIT) gates takeoff** including MapObjects pre-flight pull.
+- **MapObjects are mission-bracketed and centrally synchronised** via the `missions` API extension `/missions/{id}/mapobjects`.
+- `autopilot` and `missions` are separate repos with a shared `mission-schema` artefact.
+- No silent error swallowing; health endpoint reflects every dependency including `mapobjects_sync`.
+- Geofence enforcement is symmetric: both INCLUSION and EXCLUSION are honoured.
+
+---
+
+## 5. Open questions
+
+Surfaced explicitly in `architecture.md §8`:
+
+| # | Question | Blocks |
+|---|---|---|
+| Q1 | Sweep pattern (pendulum / raster / lawn-mower), FOV per zoom tier, dwell time. | `scan_controller` zoom-out implementation. |
+| Q2 | Ground Station API contract (stream protocol, auth, bbox-overlay rendering). | `telemetry_stream` + `operator_bridge` design. |
+| Q3 | `mapobjects_store` engine (SQLite + H3 / KV / in-memory + snapshot). | Persistent-state design. |
+| Q4 | Tier 1 contract evolution / `detection_client` versioning. | gRPC contract definition. |
+| Q5 | `mission-schema` extraction location. | Schema sharing between `autopilot` and `missions`. |
+| Q6 | MAVLink-2 message signing. | `mavlink_layer` startup handshake. |
+| Q7 | Central MapObjects API contract details (paging, photo-ref upload, retention). | `missions` repo work + `mission_client` MapObjects sync code. |
+| Q8 | MapObjects conflict resolution (projection rules, REMOVED-claim expiry, multi-class disambiguation). | Central `map_objects_current` view definition. |
+| Q9 | Operator-command authentication scheme (HMAC vs ed25519 vs MAVLink-2 sig vs separate envelope). | `operator_bridge` validation logic + Ground Station integration. |
+| Q10 | Software rollback policy on the airframe (boot-time check, A/B partition, watchdog rollback). | Deployment design + on-airframe service supervision. |
+| Q11 | Multi-operator session policy (single active vs quorum). | `operator_bridge` session model. |
+| Q12 | Comms blackout during banking turns (tolerate as `LinkDegraded` vs suppress lost-link during turns). | Lost-link ladder timing constants. |
+| Q13 | All-season acceptance flight gates (minimum flights per season, per-season acceptance criteria). | MVP sign-off scope. |
+| Q14 | Movement-detector zoom-in fallback selection (learned optical flow vs CNN motion-segmentation vs IMU-tighter classical CV) if classical CV fails the per-zoom-band FP cap. | `movement_detector` zoom-in scope. |
+
+---
+
+## 6. Suite-level docs autopilot consumes
+
+These live in `../_docs/` (parent suite repo):
+
+| Path | Used for |
+|---|---|
+| `../_docs/00_top_level_architecture.md` | Suite topology, edge tier, flight-gate convention. |
+| `../_docs/02_missions.md` | Mission / Waypoint / Vehicle schemas (consumed by `mission_client`). |
+| `../_docs/03_detections.md` | Detections gRPC API (consumed by `detection_client`). |
+| `../_docs/04_system_design_clarifications.md` | REST patterns, stream-detection protocol, edge-device connection semantics. |
+| `../_docs/11_gps_denied.md` | GPS-Denied service architecture (out of autopilot scope). |
+| `../_docs/12_ai_training.md` | AI training pipeline (autopilot consumes the resulting models via the suite-wide model-sync timer). |
+
+Full table with ownership: `architecture.md §10`.
+
+---
+
+## 7. Where to put new content
+
+| You want to document… | Put it in… |
+|---|---|
+| A new flow between components | `system-flows.md` (and add a sequence diagram). |
+| A new entity / schema | `data_model.md`. |
+| A change in NFR target | `architecture.md §6`. |
+| A change in a single component's responsibilities | `components/<name>/description.md`. |
+| A change in the MAVLink command surface | `architecture.md §7.7`. |
+| A new architectural principle | `architecture.md §5`. |
+| A new design decision with research backing | `decision-rationale.md`. |
+| A new term | `glossary.md`. |
+| A change in deployment shape | `deployment/<file>.md`. |
+| Ad-hoc internal team note | not in `_docs/`. |
@@ -0,0 +1,847 @@
+# autopilot — Architecture
+
+**Status**: forward-looking design (Rust). The implementation is in flight; the system described here is the target architecture, not what runs today. Confirmed by user 2026-05-17.
+
+## Synopsis
+
+`autopilot` is the onboard mission executor for a reconnaissance winged UAV. It runs as a single Rust process on an aarch64 Jetson Orin Nano Super edge device. It pulls a mission from the external `missions` API, controls the UAV through a hand-rolled MAVLink layer (~10–15 commands; no third-party SDK), drives a ViewPro A40 gimbal in a two-level scan-and-zoom loop (zoom-out wide sweep + zoom-in on POI), streams camera frames + telemetry continuously over modem to an external Ground Station API so the operator watches in a browser, and uses bi-directional gRPC to delegate primitive object detection to the external `../detections` API. Semantic-vision reasoning (Tier 2 ROI analysis + an optional local VLM), a POI scheduler with an operator-review rate cap, and a target-follow mode after operator confirmation all run inside autopilot. The dominant pattern is a deterministic typed state machine (zoom-out / zoom-in / target-follow) coordinating a small set of async actors.
+
+---
+
+## 1. System Context
+
+Autopilot integrates with six external systems. The local VLM is optional (benchmark-gated); everything else is mandatory.
+
+```mermaid
+flowchart LR
+    cam["ViewPro A40<br/>RTSP camera + gimbal"]
+    det["../detections<br/>Tier 1 YOLO service"]
+    vlm["NanoLLM VILA1.5-3B<br/>(optional, local IPC)"]
+    miss["missions API"]
+    gs["Ground Station<br/>operator UI"]
+    ap["ArduPilot / PX4"]
+    autopilot["autopilot<br/>onboard mission + scan + perception"]
+    cam <-->|RTSP frames / UDP gimbal control| autopilot
+    autopilot <-->|bidir gRPC| det
+    autopilot <-.->|Unix-domain socket IPC| vlm
+    autopilot <-->|REST GET / POST| miss
+    autopilot <-->|stream over modem| gs
+    autopilot <-->|MAVLink v2| ap
+```
+
+Per-edge protocol details:
+
+| Edge | Protocol | Direction | Purpose |
+|---|---|---|---|
+| ViewPro A40 (camera) | RTSP/RTP over TCP/UDP | inbound | live H.264/265 1080p video to `frame_ingest`. |
+| ViewPro A40 (gimbal) | UDP, vendor control protocol | bidirectional | yaw / pitch / zoom commands + status; driven by `gimbal_controller`. |
+| `../detections` | bi-directional gRPC | bidirectional | frames out, bounding boxes back; driven by `detection_client`. |
+| NanoLLM VILA1.5-3B | Unix-domain socket IPC (peer-cred check) | bidirectional | bounded ROI + short prompt → structured `VlmAssessment`; optional. |
+| `missions` API | HTTPS REST (GET / POST) | bidirectional | mission pull on start; middle-waypoint POST on operator confirmation; **MapObjects** pre-flight pull + post-flight push (`/missions/{id}/mapobjects`, see §7.13). |
+| Ground Station API | continuous push over modem (protocol per `../_docs/04_system_design_clarifications.md`) | bidirectional | always-on camera feed + telemetry + bbox overlay; operator confirm / decline / target-follow. |
+| ArduPilot / PX4 | MAVLink v2 over UDP or serial | bidirectional | the small command surface in §7.7. |
+
+---
+
+## 2. Component Layering
+
+Three internal layers (Perception → Decision + Memory → Action) plus an always-on Telemetry plane that runs parallel to the decision loop.
+
+```mermaid
+flowchart TB
+    subgraph autopilot ["autopilot"]
+        subgraph perception ["Perception (data plane in)"]
+            fi[frame_ingest]
+            dc[detection_client]
+            md[movement_detector]
+            sa[semantic_analyzer]
+            vc["vlm_client (opt)"]
+        end
+        subgraph brain ["Decision + Memory"]
+            sc[scan_controller]
+            mo[mapobjects_store]
+        end
+        subgraph action ["Action (data plane out)"]
+            gc[gimbal_controller]
+            ob[operator_bridge]
+            me[mission_executor]
+            ml[mavlink_layer]
+            msc[mission_client]
+        end
+        subgraph tplane ["Telemetry plane (always-on, parallel)"]
+            ts[telemetry_stream]
+        end
+    end
+    perception ==>|"inputs (bboxes, motion, Tier 2, VlmAssessment)"| brain
+    brain ==>|"commands + POI updates + middle-waypoint hints"| action
+    perception -.->|"frames + bboxes"| tplane
+    action -.->|"telemetry"| tplane
+```
+
+Per-flow component-to-component sequence diagrams live in `system-flows.md`.
+
+---
+
+## 3. Components
+
+| Component | Layer | Responsibility |
+|---|---|---|
+| `frame_ingest` | Perception | Pull RTSP from ViewPro A40; decode; timestamp; hand frames to `detection_client`, `movement_detector`, and `telemetry_stream` (zero-copy where possible). |
+| `detection_client` | Perception | Bi-directional gRPC to `../detections`; streams frames out, receives bounding boxes back; same bboxes are reused for Tier 2 ROI selection and for operator overlay. Versioned against the `../_docs/03_detections.md` contract. |
+| `movement_detector` | Perception | Active in **both** zoom-out and zoom-in levels (skipped only during target-follow). OpenCV optical-flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Emits residual-motion clusters as POI candidates. Ego-motion compensation is mandatory; naive frame-differencing is rejected. Zoom-in adequacy of classical CV is benchmark-gated — see §7.6 Movement detector and Open Question Q14. |
+| `semantic_analyzer` | Perception | Tier 2. Primitive graph + lightweight ROI CNN over zoom-in crops. Owns path-freshness scoring, endpoint scoring, branch choice at intersections, and concealment-POI scoring. |
+| `vlm_client` | Perception (optional) | Local-IPC client to a NanoLLM/VILA1.5-3B process. Validates ROI payload size/format, calls the VLM with a bounded crop and short prompt, validates the response against a structured `VlmAssessment` schema. No cloud egress. Optional behind a `vlm_enabled` flag and a feature module (see §7.6 Local VLM Confirmation). |
+| `scan_controller` | Decision + Memory | Central deterministic typed state machine — `ZoomedOut`, `ZoomedIn`, `TargetFollow`. Owns the POI queue, timeouts, ≤5 POIs/min cap, confidence-scaled operator-decision window, and gimbal-command issuance. Full behaviour-tree spec in `system-flows.md §F4`. |
+| `mapobjects_store` | Decision + Memory | On-device H3-indexed map of detected objects + ignored-items list. Pre-flight pull of the mission-area map from the central `missions` API; in-flight on-device authoritative; post-flight push of the mission diff back to central. Computes new / moved / existing / removed diffs across passes (§7.10, §7.11, §7.12). Read/written directly by `scan_controller`; sync pulls/pushes are handled via `mission_client`. |
+| `gimbal_controller` | Action | ViewPro A40 control protocol (yaw / pitch / zoom). Honours ≤2 s zoom transition budget and ≤500 ms decision-to-movement latency. Owns the smooth-pan path-tracking primitive used in zoom-in level. |
+| `operator_bridge` | Action | Surfaces POIs and target-follow lifecycle events through `telemetry_stream` to the Ground Station; receives confirm / decline / target-follow start-release back. On decline, appends an `IgnoredItem` via `mapobjects_store`. On confirm, hands a middle-waypoint hint to `mission_executor`. |
+| `mission_executor` | Action | Multirotor and fixed-wing variants of the platform state machine: takeoff / climb / cruise / land for multirotor; upload-and-await-AUTO for fixed-wing. Owns geofence enforcement (both INCLUSION and EXCLUSION). Issues MAVLink commands through `mavlink_layer`; consumes `mission_client` mission state. Inserts middle waypoints on operator-confirmed targets. |
+| `mavlink_layer` | Action | Hand-rolled MAVLink v2 transport (UDP or serial) implementing only the ~10–15 commands this codebase needs. See §7.7 for the command surface. No third-party SDK. |
+| `mission_client` | Action | Pulls mission JSON from the `missions` API on start; validates against `mission-schema`; handles mid-flight middle-waypoint inserts (POST). Survives transient connection loss with bounded retry. |
+| `telemetry_stream` | Telemetry plane | Continuous push of camera frames + flight telemetry + bbox overlay to the Ground Station API over modem. Always-on; not detection-gated. Carries operator commands (confirm / decline / target-follow start-release) on the return path. |
+
+The system is intentionally a small set of well-named components rather than 30+ files. Everything in `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, and `vlm_client` runs on the **input data plane** — no UAV control, no operator surface. Everything in `gimbal_controller`, `mission_executor`, `mavlink_layer`, `mission_client`, and `operator_bridge` runs on the **output control plane** — UAV motion + operator interaction. `scan_controller` and `mapobjects_store` are the **brain** in between. `telemetry_stream` is parallel; it never sits in the decision path.
+
+Per-component design specs (purpose, inputs, outputs, state, failure modes, NFRs) live in `components/<name>/description.md`.
+
+---
+
+## 4. Major Data Flows
+
+1. **Frame pipeline**. ViewPro A40 RTSP → `frame_ingest` → `detection_client` (bi-dir gRPC to `../detections`) → bboxes back → `movement_detector` (active at both zoom-out and zoom-in; residual-motion clusters) → `scan_controller` POI queue. The same bboxes also flow into `telemetry_stream` for operator overlay. (`system-flows.md §F1`)
+2. **Zoom-in + confirmation**. `scan_controller` pops a POI → `gimbal_controller` zooms ViewPro A40 → `semantic_analyzer` runs Tier 2 over the ROI → optionally `vlm_client` runs Tier 3 → `scan_controller` decides. Movement candidates emerging during the zoom-in hold are still consumed (subject to telemetry-skew tolerance and the per-zoom-band thresholds). (`system-flows.md §F2`, `§F3`)
+3. **Operator round trip**. `telemetry_stream` pushes camera + telemetry + bbox overlay → Ground Station → operator browser → confirm / decline / target-follow start-release → modem → `operator_bridge` → `mapobjects_store` (decline) or `mission_executor` (confirm) or `scan_controller` (target-follow). Always-on, not detection-gated. Operator commands are authenticated, signed, and replay-protected (§5; scheme TBD per Q9). (`system-flows.md §F5`)
+4. **Mission lifecycle**. `mission_client` pulls from `missions` API → `mission_executor` issues MAVLink waypoints via `mavlink_layer` → `gimbal_controller` runs the zoom-out sweep along the route. On operator confirmation, `mission_executor` inserts a middle waypoint and resumes after target-follow ends. (`system-flows.md §F6`)
+5. **MapObjects + ignored items**. New detections compute an H3 cell, query the k-ring of neighbours, classify as new / moved / existing / removed (§7.12), and check for an `IgnoredItem` match before surfacing to the operator. (`system-flows.md §F7`)
+6. **MapObjects sync** (mission-bracketing). Pre-flight: `mission_client` pulls the last-known map state for the mission area from the `missions` API and hydrates `mapobjects_store`. Post-flight: `mission_client` pushes the mission's full pass diff (NEW / MOVED / REMOVED / CONFIRMED-EXISTING) back. In-flight sync is **batched only** for MVP — no streaming over modem (§7.13; `system-flows.md §F8`).
+
+---
+
+## 5. Architectural Principles / Non-Negotiables
+
+- **Detection-as-a-service.** Primitive (Tier 1) detection lives in `../detections`, not in autopilot. Autopilot owns Tier 2 (semantic) and Tier 3 (VLM, optional) only.
+- **Hand-rolled MAVLink.** No third-party SDK. The MAVLink command surface is small enough to hand-implement; eliminates the largest current dependency-risk item.
+- **Deterministic typed state machine** for scan control. States are `ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }`. No ad-hoc booleans, no shared mutable flags. The full behaviour-tree spec lives in `system-flows.md §F4`.
+- **Ego-motion compensation is mandatory** for movement detection. Naive frame-differencing is rejected outright. Movement detection runs at **both** zoom-out and zoom-in (skipped only during target-follow); zoom-in adequacy of classical CV is benchmark-gated (§7.6, Q14).
+- **Operator workload cap of ≤5 POIs/minute** is hard, not soft. `scan_controller` enforces it.
+- **Operator timeout scales with confidence** — 40 % → 30 s, 100 % → 120 s, linear; below 40 % the target is not surfaced. Timeout = forget; decline = `IgnoredItem` entry.
+- **Operator commands are authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient — every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that `operator_bridge` validates before dispatch. Exact scheme TBD (§8 Q9).
+- **Local VLM with structured `VlmAssessment` schema.** Free-form VLM text is not a downstream API. No cloud egress.
+- **Always-on camera + telemetry stream** to Ground Station is part of the mission contract — operator always sees the live feed, not just on detection.
+- **Lost-link failsafe is explicit.** Loss of the operator/Ground-Station modem link triggers a typed failsafe ladder in `mission_executor` (§7.7). The ladder is deterministic; default action is RTL after a configured grace window.
+- **Pre-flight self-test (BIT) gates takeoff.** Every dependency in the health-endpoint list later in this section, plus mission load and the MapObjects pre-flight pull (cached fallback acknowledged), must pass before `mission_executor` enters `ARMED` (multirotor) or `WAIT_AUTO` (fixed-wing). The health endpoint distinguishes pre-flight from in-flight readiness.
+- **`autopilot` and `missions` are separate repos** with a shared `mission-schema` artefact. The same `missions` API also hosts the central MapObjects endpoints (§7.13).
+- **MapObjects are mission-bracketed and centrally synchronised.** Pre-flight pull on start; on-device authoritative in-flight; full pass diff pushed at mission end. The on-device store is a working copy of the central state for the mission's bounding box, not a private database.
+- **No silent error swallowing** anywhere in the pipeline. Health endpoint reflects every dependency: `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, `vlm_client` (if enabled), `scan_controller`, `gimbal_controller`, `mavlink_layer`, `mission_client`, `mission_executor`, `operator_bridge`, `telemetry_stream`, `mapobjects_store`, plus `mapobjects_sync` (pre-flight pull / post-flight push status).
+- **Geofence enforcement is symmetric.** Both INCLUSION and EXCLUSION polygons are honoured. (Earlier C++ behaviour silently ignored EXCLUSION; the rewrite explicitly enforces both.)
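+
+The typed state machine and the confidence-scaled operator window from the list above can be sketched in a few lines of Rust. This is a minimal illustration, not the project's code: the `Roi` shape, the `u64` target id, the `ScanEvent` set, and the exact transitions are assumptions (the authoritative behaviour tree is in `system-flows.md §F4`). Only the three state names, their fields, and the 40 % → 30 s / 100 % → 120 s mapping come from the text.
+
+```rust
+use std::time::{Duration, Instant};
+
+/// Normalised region of interest (hypothetical shape: fractions of the frame).
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub struct Roi {
+    pub cx: f32,
+    pub cy: f32,
+    pub w: f32,
+    pub h: f32,
+}
+
+/// The three scan states. Per-state data lives inside the variant, so a hold
+/// timer cannot exist outside `ZoomedIn` and no loose boolean flags are needed.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum ScanState {
+    ZoomedOut,
+    ZoomedIn { roi: Roi, hold_started_at: Instant },
+    TargetFollow { target_id: u64, started_at: Instant },
+}
+
+/// Hypothetical event set, for illustration only.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum ScanEvent {
+    PoiPopped { roi: Roi },
+    HoldFinished,
+    OperatorConfirmed { target_id: u64 },
+    FollowEnded,
+}
+
+/// Pure transition function. (state, event) pairs that are not listed leave
+/// the state unchanged. Assumption: confirmation arrives while zoomed in.
+pub fn step(state: ScanState, event: ScanEvent, now: Instant) -> ScanState {
+    match (state, event) {
+        (ScanState::ZoomedOut, ScanEvent::PoiPopped { roi }) => {
+            ScanState::ZoomedIn { roi, hold_started_at: now }
+        }
+        (ScanState::ZoomedIn { .. }, ScanEvent::HoldFinished) => ScanState::ZoomedOut,
+        (ScanState::ZoomedIn { .. }, ScanEvent::OperatorConfirmed { target_id }) => {
+            ScanState::TargetFollow { target_id, started_at: now }
+        }
+        (ScanState::TargetFollow { .. }, ScanEvent::FollowEnded) => ScanState::ZoomedOut,
+        (unchanged, _) => unchanged,
+    }
+}
+
+/// Operator-decision window: 0.40 -> 30 s, 1.00 -> 120 s, linear in between.
+/// Returns `None` below 0.40 (or for NaN), where the target is not surfaced.
+/// Values above 1.0 are clamped to 1.0.
+pub fn operator_window(confidence: f64) -> Option<Duration> {
+    if confidence.is_nan() || confidence < 0.40 {
+        return None;
+    }
+    let c = confidence.min(1.0);
+    let secs = 30.0 + (c - 0.40) / 0.60 * 90.0;
+    Some(Duration::from_secs_f64(secs))
+}
+```
+
+Because `step` is a pure function of `(state, event, now)`, it can be replay-tested without a gimbal or a clock; passing `now` in explicitly is a design choice of this sketch, not something the source prescribes.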
+
+---
+
+## 6. Non-Functional Targets
+
+| Concern | Target | Owner |
+|---|---|---|
+| Tier 1 latency | ≤100 ms / frame (end-to-end at 1280 px, FP16, batch 1) | `../detections` (autopilot's call budget respects it) |
+| Tier 2 latency | ≤200 ms / ROI | `semantic_analyzer` |
+| Tier 3 (VLM) latency | ≤5 s / ROI | `vlm_client` |
+| ViewPro A40 zoom transition | ≤2 s (medium → high) | `gimbal_controller` |
+| Decision-to-movement latency | ≤500 ms | `gimbal_controller` |
+| POI rate to operator | ≤5 POIs / min (hard cap) | `scan_controller` |
+| Concealed-position recall | ≥60 % | `semantic_analyzer` |
+| Concealed-position precision | ≥20 % (operators filter) | `semantic_analyzer` |
+| New per-class P / R | ≥80 % | `../detections` |
+| Footpath detection recall | ≥70 % | `semantic_analyzer` |
+| Movement-candidate enqueue latency | ≤1 s from detection (zoom-out); ≤1.5 s (zoom-in, accommodating gimbal slew) | `movement_detector` |
+| Zoom-out → zoom-in transition | ≤2 s including physical zoom | `scan_controller` + `gimbal_controller` |
+| Telemetry rate (position) | 1 Hz min, 10 Hz target | `mavlink_layer` |
+| Memory budget (semantic + movement + VLM) | ≤6 GB on Jetson Orin Nano (8 GB total, ~2 GB reserved for YOLO) | system-wide |
+| Watchdog / retry on MAVLink failures | bounded retry with exponential backoff; explicit max-retry; health flips to red | `mission_executor` |
+| Operator command → action latency | ≤500 ms operator-click → outbound MAVLink / gimbal command (excludes modem RTT) | `operator_bridge` + downstream |
+| Sustained frame-rate floor | ≥10 fps; below this `scan_controller` suppresses zoom-in transitions and surfaces health → yellow | `frame_ingest` + `scan_controller` |
+| MapObjects pre-flight pull | ≤30 s for a 30 km × 30 km mission area; cache-fallback acceptable on timeout | `mission_client` + `mapobjects_store` |
+| MapObjects post-flight push | ≤2 min for a 60 min mission's pass diff; bounded retry; persisted on disk if push fails | `mission_client` + `mapobjects_store` |
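+
+The "bounded retry with exponential backoff; explicit max-retry" row can be illustrated with a small delay schedule. This is a sketch under stated assumptions: `RetryPolicy`, its field names, and the doubling factor are hypothetical, and the table does not give concrete base, cap, or retry-count values.
+
+```rust
+use std::time::Duration;
+
+/// Hypothetical retry policy; names and values are illustrative only.
+#[derive(Debug, Clone, Copy)]
+pub struct RetryPolicy {
+    pub base: Duration,
+    pub cap: Duration,
+    pub max_retries: u32,
+}
+
+impl RetryPolicy {
+    /// Delay before retry number `attempt` (0-based), or `None` once the
+    /// retry budget is spent. On `None` the caller stops retrying and
+    /// flips health to red instead of looping forever.
+    pub fn delay(&self, attempt: u32) -> Option<Duration> {
+        if attempt >= self.max_retries {
+            return None;
+        }
+        // base * 2^attempt, saturating instead of overflowing, then capped.
+        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
+        Some(self.base.saturating_mul(factor).min(self.cap))
+    }
+}
+```
+
+With `base = 100 ms`, `cap = 2 s`, and `max_retries = 5`, the schedule is 100, 200, 400, 800, 1600 ms, after which `delay` returns `None`.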
+
+---
+
+## 7. Detailed Design
+
+This section covers the rewrite-time problem narrative, suite-level concerns (mission regions, MapObjects, MGRS sync, new-vs-existing object detection), constraints, acceptance criteria, the chosen solution architecture, the MAVLink command surface, and the tech stack.
+
+### 7.1 Problem
+
+The reconnaissance winged UAV detects vehicles and military equipment with YOLO, but current high-value targets are camouflaged positions: FPV operator hideouts, hidden artillery emplacements, and dugouts masked by branches. These cannot be found by visual similarity to known object classes alone.
+
+The new approach has three cooperating search engines:
+
+- **Camera sweep** — follow the UAV route at wide or light/medium zoom with left-right gimbal movement to cover terrain and queue POIs.
+- **Movement detection** — runs in **both** zoom-out and zoom-in levels (skipped only during target-follow). Per-zoom-band thresholds keep false-positive rate below the operator-review cap; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
+- **Semantic zoom search** — detect primitives such as black entrances, branch piles, footpaths, roads, trees, and tree blocks, then reason over scene context to find concealed positions.
+
+The system controls a two-level scan:
+
+- **Zoom-out level (wide-area sweep)** — the camera follows the UAV route at wide or light/medium zoom, sweeping left-right across the flight path while detecting primitives, buildings, vehicles, and small motion candidates. Footpath starts, suspicious branch piles, tree rows, movement candidates, and similar POIs are marked with coordinates supplied by the GPS-denied service and queued.
+- **Zoom-in level (detailed scan)** — the camera zooms into each queued POI or movement candidate for confirmation. It follows detected footpaths from origin to endpoint, keeps paths centered while the UAV moves, follows the freshest or most promising branch at intersections, holds on endpoints for VLM analysis of branch piles, dark entrances, dugouts, vehicles, or people, and slowly pans broader POIs such as tree rows or clearings. Movement detection continues, scaled for the higher pixel-to-metre ratio. After analysis or timeout, it returns to zoom-out and continues the queue or route.
+
+When an operator confirms a target, the system switches to **target-follow mode**: keep the target centered with gimbal control while the UAV moves, until the operator releases it or tracking is lost.
+
+### 7.2 Mission Regions and Reconnaissance Flow
+
+Mission directions can be vague. Waypoints define a route that passes through multiple regions:
+
+```text
+Start → Point1 → Point2 → Point3 → Point4 → Point5 → Point6 → Finish
+                    ╔═══════════════╗
+                    ║   Region 1    ║
+                    ╚═══════════════╝
+         ╔══════════════════╗
+         ║    Region 2      ║
+         ╚══════════════════╝
+  ╔══════════════╗
+  ║   Region 3   ║
+  ╚══════════════╝
+```
+
+The autopilot decides the route within each region (1, 2, and 3).
+
+**Alternative scenario — region-only search.** The user selects only a region for the search (no explicit waypoints inside). The autopilot plans its own route within the region.
+
+```text
+Start ──┐
+        │    ╔═══════════════╗
+        ├───►║    Region     ║  (contains Points)
+        │    ╚═══════════════╝
+Finish◄─┘
+```
+
+**Reconnaissance flow.**
+
+1. The reconnaissance UAV searches within the region and finds potential targets.
+2. It sends images to the relay (retranslation) UAV.
+3. The relay UAV forwards them to the human operator.
+4. The human operator makes a decision regarding the target; the behaviour-tree-driven `scan_controller` logic then acts on that decision (`system-flows.md §F4`).
+
+**Scanning strategy.**
+
+- **Zoom-out level — wide-area scan.** Camera points along the UAV route with left-right swing. The detections service continuously recognises specific patterns as POIs. This initial scan runs at medium zoom while moving between targets. POI types: tree rows (potential caponiers, entrances concealed by tree rows); polygons (areas where military vehicles could be hidden); houses with vehicles or traces; roads and routes on snow or terrain, inside the forest, or near houses.
+- **Zoom-in level — detailed scan.** When the camera finds a POI or movement candidate, it zooms in and performs a detailed scan. During detailed scan it searches for trees, caponiers, military vehicles, and so on. Movement detection continues during the zoom-in hold (subject to the per-zoom-band thresholds) so a moving small target found mid-detail-scan is not lost.
+
+### 7.3 Restrictions
+
+**Hardware and camera.**
+
+- Jetson Orin Nano Super: 67 TOPS INT8, 8 GB shared LPDDR5; YOLO uses ~2 GB RAM, leaving ~6 GB for semantic detection, movement detection, and VLM.
+- All models use FP16 precision (frozen choice: keep FP16-only for all models).
+- Primary camera: ViewPro A40, 1080p (1920×1080), 40× optical zoom, f=4.25–170 mm, Sony 1/2.8" CMOS (IMX462LQR), HDMI or IP output at 1080p 30/60 fps.
+- Alternative camera: ViewPro Z40K at higher cost.
+- Thermal sensor (640×512, NETD ≤50 mK) is available only as a future enhancement, not a core requirement.
+
+**Operational.**
+
+- Flight altitude: 600–1000 m.
+- Support all seasons and terrain types: winter snow, spring mud, summer vegetation, autumn; forest, open field, urban edges, and mixed terrain. (Frozen choice: MVP must cover **all** seasons, not winter-first only.)
+- ViewPro A40 40× optical zoom traversal takes 1–2 s; the zoom-out → zoom-in transition must complete within 2 s, including physical zoom.
+- Movement detection runs at **both** zoom-out and zoom-in levels, compensates for UAV/gimbal motion, and queues candidates for zoom confirmation; target following starts only after operator confirmation. Per-zoom-band thresholds (cluster persistence, residual-velocity floor, telemetry-skew tolerance) are configurable.
+
+**Software.**
+
+- Inference: TensorRT on Jetson, ONNX Runtime fallback, 1280 px model input, tile splitting for large images.
+- VLM must run locally on Jetson with no cloud dependency, as a separate IPC process — not compiled into the autopilot binary.
+- YOLO and VLM inference run sequentially because they share GPU memory; no concurrent execution.
+
+**Reliability and safety.**
+
+- **Lost-link failsafe is mandatory.** Loss of the operator/Ground-Station modem link triggers a deterministic ladder in `mission_executor` (default RTL after a 30 s grace; configurable per mission). Loss of the airframe MAVLink link itself triggers immediate health → red and degrades to whatever ArduPilot/PX4's own failsafe dictates.
+- **Pre-flight self-test (BIT) gates takeoff.** GPS lock, camera RTSP healthy, gimbal homed (yaw/pitch/zoom feedback within tolerance), `../detections` reachable + warmed, mission loaded + validated, MapObjects pre-flight pull complete (or cached fallback acknowledged with operator confirm), VLM warm (if `vlm_enabled`), persistent-store space ≥ configured floor.
+- **Battery / fuel thresholds enforced.** `mission_executor` triggers RTL at battery ≤ configured RTL-floor (e.g. 25 %); land-now at hard-floor (e.g. 15 %); ignored only on operator override. Surfaces health → yellow / red accordingly. Threshold values are mission-configurable.
+- **Sustained frame-rate floor.** When the sustained frame rate drops below 10 fps, `scan_controller` suppresses zoom-in transitions (only Tier 1 + operator overlay continue) and surfaces health → yellow.
+- **Wall-clock time source.** Monotonic clock is authoritative for telemetry-skew compensation and tick budgets. Wall-clock is bound to GPS time once GPS is locked (preferred) or NTP-set at boot if reachable; both are recorded with `clock_source` and `last_sync_at`. Drift > 200 ms surfaces health → yellow.
+- **On-device storage is bounded.** `mapobjects_store` retention + log buffer have configured caps; on cap-hit, oldest pre-current-mission data is evicted; persistent-store-full pre-flight is a BIT failure.
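+
+The lost-link and battery rules above reduce to a small, deterministic decision function. The sketch below is illustrative only: the type and field names are hypothetical, the defaults simply reuse the example values from the text (30 s grace, 25 % RTL floor, 15 % hard floor), and two points are assumptions the text leaves open: that battery floors are checked before link loss, and that operator override suppresses both battery floors.
+
+```rust
+use std::time::Duration;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum FailsafeAction {
+    Continue,
+    ReturnToLaunch,
+    LandNow,
+}
+
+/// Mission-configurable thresholds; defaults mirror the examples in the text.
+#[derive(Debug, Clone, Copy)]
+pub struct FailsafeConfig {
+    pub link_grace: Duration,
+    pub battery_rtl_floor_pct: f32,
+    pub battery_hard_floor_pct: f32,
+}
+
+impl Default for FailsafeConfig {
+    fn default() -> Self {
+        Self {
+            link_grace: Duration::from_secs(30),
+            battery_rtl_floor_pct: 25.0,
+            battery_hard_floor_pct: 15.0,
+        }
+    }
+}
+
+/// Snapshot of the inputs (hypothetical field names).
+#[derive(Debug, Clone, Copy)]
+pub struct FailsafeInputs {
+    /// `None` while the operator/Ground-Station modem link is up.
+    pub link_lost_for: Option<Duration>,
+    pub battery_pct: f32,
+    pub operator_override: bool,
+}
+
+pub fn evaluate(cfg: &FailsafeConfig, inp: &FailsafeInputs) -> FailsafeAction {
+    // Assumption: battery floors apply unless the operator has overridden them.
+    if !inp.operator_override {
+        if inp.battery_pct <= cfg.battery_hard_floor_pct {
+            return FailsafeAction::LandNow;
+        }
+        if inp.battery_pct <= cfg.battery_rtl_floor_pct {
+            return FailsafeAction::ReturnToLaunch;
+        }
+    }
+    // Lost link: RTL once the grace window has elapsed.
+    match inp.link_lost_for {
+        Some(lost) if lost >= cfg.link_grace => FailsafeAction::ReturnToLaunch,
+        _ => FailsafeAction::Continue,
+    }
+}
+```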
+
+**Integration and scope.**
+
+- The `../detections` service is FastAPI + Cython + TensorRT in a Docker container on Jetson; consumed via bi-directional gRPC.
+- Consume YOLO boxes with class, confidence, and normalised coordinates; output boxes in the same format for operator display.
+- Movement candidates and confirmed followed targets use the same normalised box format for operator display.
+- GPS coordinates come from the GPS-denied service (`../_docs/11_gps_denied.md`) and are out of scope for autopilot's own implementation.
+- **MapObjects sync** uses the central `missions` API extension `/missions/{id}/mapobjects` (pre-flight GET, post-flight POST). Schema in §7.13.
+- Annotation tooling, training pipeline, and data-collection automation are separate repositories and out of scope.
+- GPS-denied navigation is a separate project; mission planning and route selection inside a region remain in autopilot.
+
+**Frozen choices (2026-05-06, updated 2026-05-18).** Gating decisions for downstream design:
+
+1. **Tier 1 remains FP16-only** for all models. INT8 is rejected for MVP.
+2. **MVP acceptance requires all seasons**, not winter-first only.
+3. **Operator-review cap is ≤5 POIs/minute** (moderate cap chosen).
+4. **Movement detection assumes timestamped video, gimbal angle/zoom, and UAV motion telemetry** for MVP. Naive frame-differencing is rejected. Movement detection runs at both zoom-out and zoom-in; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
+5. **Local VLM is required for MVP** if and only if the exact model satisfies ≤5 s/ROI and the memory budget; otherwise VLM is disabled for MVP and `scan_controller` operates without it.
+6. **MapObjects are mission-bracketed and centrally synchronised** via the `missions` API. In-flight sync is **batched only** for MVP (no streaming over modem).
+7. **Operator commands are authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient.
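+
+One way to enforce the ≤5 POIs/minute cap from frozen choice 3 is a sliding-window limiter. This is a sketch under assumptions: the text does not say whether the cap is measured over a sliding or a fixed window, and `PoiRateCap` and its method names are hypothetical.
+
+```rust
+use std::collections::VecDeque;
+use std::time::{Duration, Instant};
+
+/// Sliding-window limiter: at most `max_per_window` POIs per `window`.
+pub struct PoiRateCap {
+    window: Duration,
+    max_per_window: usize,
+    surfaced_at: VecDeque<Instant>,
+}
+
+impl PoiRateCap {
+    pub fn new(max_per_window: usize, window: Duration) -> Self {
+        Self { window, max_per_window, surfaced_at: VecDeque::new() }
+    }
+
+    /// Returns `true` and records the event if a POI may be surfaced at `now`;
+    /// returns `false` without recording anything if the cap is reached.
+    pub fn try_surface(&mut self, now: Instant) -> bool {
+        // Drop timestamps that have aged out of the window.
+        while let Some(&oldest) = self.surfaced_at.front() {
+            if now.saturating_duration_since(oldest) >= self.window {
+                self.surfaced_at.pop_front();
+            } else {
+                break;
+            }
+        }
+        if self.surfaced_at.len() < self.max_per_window {
+            self.surfaced_at.push_back(now);
+            true
+        } else {
+            false
+        }
+    }
+}
+```
+
+A caller would construct it as `PoiRateCap::new(5, Duration::from_secs(60))` and keep a rejected POI in the queue rather than dropping it; what happens to rejected POIs is not specified here.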
+
+### 7.4 Acceptance Criteria
+
+**Latency.**
+
+| Tier | Target | Hardware |
+|---|---|---|
+| Tier 1 fast probe (YOLO26 + YOLOE-26) | ≤100 ms/frame | Jetson Orin Nano Super |
+| Tier 2 fast confirmation (custom CNN) | ≤200 ms/ROI | Jetson Orin Nano Super |
+| Tier 3 optional deep analysis (VLM) | ≤5 s/ROI | Jetson Orin Nano Super |
+
+**YOLO object detection.**
+
+- Add classes: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.
+- New classes target: P ≥80 %, R ≥80 %; existing class performance must not degrade.
+- Baseline reference: current YOLO achieves P=81.6 %, R=85.2 % on non-masked objects.
+
+**Semantic detection.**
+
+- Initial concealed-position recall: ≥60 %, accepting high false positives for later reduction.
+- Initial concealed-position precision: ≥20 %, with operators filtering candidates.
+- Footpath detection recall: ≥70 %.
+- Pipeline consumes YOLO primitives (footpaths, roads, branch piles, entrances, trees), assesses path freshness, traces paths to endpoints, identifies concealed structures, and follows the freshest or most promising branch at intersections.
+
+**Movement detection.**
+
+- During the zoom-out sweep, detect small moving point/cluster candidates that are not yet classifiable and enqueue them for zoom confirmation within 1 s.
+- During the zoom-in hold, continue movement detection (independent residual-motion clustering, scaled for the zoomed pixel-to-metre ratio) so a moving small target appearing inside a held POI is not lost; enqueue within 1.5 s.
+- Account for UAV and gimbal motion: stable objects (trees, houses, roads, terrain) must not be treated as moving only because the camera platform moves.
+- Movement candidates become zoom-in POIs; after zoom, the system attempts semantic / YOLO confirmation as a vehicle, a person, or another relevant target.
+- Zoom-in adequacy of classical OpenCV optical-flow / global-motion estimation is benchmark-gated. If the false-positive rate at zoom-in exceeds the per-zoom-band budget, fall back to a learned optical-flow / CNN-based motion module behind a feature flag (Q14).
+
+**Scan and camera control.**
+
+- Zoom-out level covers the planned route with a wide or light/medium-zoom left-right sweep; POIs include footpaths, tree rows, branch piles, black entrances, movement candidates, houses with vehicles or traces, and roads on snow / terrain / forest.
+- Transition zoom-out → zoom-in within 2 s of POI detection, including physical zoom from medium to high.
+- Zoom-in level keeps camera lock while the UAV flies, compensates for aircraft motion, pans along footpaths or movement candidates so they stay visible and centered, holds endpoints for VLM analysis up to 2 s, and returns to zoom-out after analysis or configurable timeout (default 5 s/POI).
+- After operator confirmation, target-follow mode keeps the target in the centre 25 % of frame while visible, until operator release, target loss, or timeout.
+- Gimbal module commands ViewPro A40 pan/tilt/zoom with ≤500 ms decision-to-movement latency, smooth transitions, and footpaths/moving targets kept centered during pan.
+- Maintain an ordered POI queue prioritised by confidence and proximity to current camera position.
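+
+The last requirement, a queue prioritised by confidence and proximity to the current camera position, could look like the sketch below. Everything specific in it is an assumption: the `Poi` fields, the use of gimbal yaw/pitch degrees as the proximity measure, the weights, and the choice to score at pop time. Scoring at pop time is used because proximity changes as the camera moves, so a key fixed at insertion would go stale. Yaw wrap-around and aging are ignored for brevity.
+
+```rust
+/// Queued point of interest; angles are gimbal yaw/pitch in degrees (assumed).
+#[derive(Debug, Clone, PartialEq)]
+pub struct Poi {
+    pub id: u64,
+    pub confidence: f32, // 0.0..=1.0
+    pub yaw_deg: f32,
+    pub pitch_deg: f32,
+}
+
+// Illustrative weights, not values from the design.
+const W_CONFIDENCE: f32 = 1.0;
+const W_PROXIMITY: f32 = 0.5;
+const MAX_SLEW_DEG: f32 = 180.0;
+
+/// Higher is better: confidence plus a bonus for being close to the camera pose.
+fn score(poi: &Poi, cam_yaw: f32, cam_pitch: f32) -> f32 {
+    let d_yaw = poi.yaw_deg - cam_yaw;
+    let d_pitch = poi.pitch_deg - cam_pitch;
+    let slew = (d_yaw * d_yaw + d_pitch * d_pitch).sqrt();
+    let proximity = 1.0 - (slew / MAX_SLEW_DEG).min(1.0);
+    W_CONFIDENCE * poi.confidence + W_PROXIMITY * proximity
+}
+
+#[derive(Default)]
+pub struct PoiQueue {
+    items: Vec<Poi>,
+}
+
+impl PoiQueue {
+    pub fn push(&mut self, poi: Poi) {
+        self.items.push(poi);
+    }
+
+    /// Removes and returns the best-scoring POI for the current camera pose,
+    /// or `None` if the queue is empty. Linear scan; queues are assumed small.
+    pub fn pop_best(&mut self, cam_yaw: f32, cam_pitch: f32) -> Option<Poi> {
+        let best = self
+            .items
+            .iter()
+            .enumerate()
+            .max_by(|(_, a), (_, b)| {
+                score(a, cam_yaw, cam_pitch).total_cmp(&score(b, cam_yaw, cam_pitch))
+            })
+            .map(|(i, _)| i)?;
+        Some(self.items.swap_remove(best))
+    }
+}
+```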
+
+**Resources and data.**
+
+- Semantic module + movement module + VLM RAM: ≤6 GB on Jetson Orin Nano Super.
+- Must coexist with the running YOLO pipeline without degrading YOLO performance.
+- Training data: hundreds to thousands of annotated images/sequences across all seasons and terrain types.
+- Dedicated annotation needed for black entrances, branch piles, footpaths, roads, trees, and tree blocks; available dataset assembly effort is 1.5 months at 5 hours/day.
+
+### 7.5 Training Data
+
+**Source.**
+
+- Aerial imagery from reconnaissance winged UAVs at 600–1000 m altitude.
+- ViewPro A40 camera, 1080p resolution, various zoom levels.
+- Extracted from video frames and still images.
+- Movement detection requires frame sequences, not still images only; include camera/gimbal telemetry where available to separate target motion from UAV motion.
+
+**Target classes.**
+
+- Footpaths / trails (linear features on snow, mud, forest floor).
+- Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks).
+- Stale footpaths (partially covered by snow / vegetation, faded edges).
+- Concealed structures: branch-pile hideouts, dugout entrances, squared / circular openings.
+- Tree rows (potential concealment lines).
+- Open clearings connected to paths (FPV launch points).
+- Moving point/cluster candidates across the full zoom range (wide, light/medium, full zoom-in) — sequences must include both zoom-out and zoom-in examples to support per-zoom-band threshold tuning.
+
+**YOLO primitive classes (new).**
+
+- Black entrances to hideouts (various sizes).
+- Piles of tree branches.
+- Footpaths.
+- Roads.
+- Trees, tree blocks.
+
+**Annotation format.**
+
+- Managed by existing annotation tooling in a separate repository.
+- Expected: bounding boxes and/or segmentation masks depending on model architecture.
+- Footpaths may require polyline or segmentation annotation rather than bounding boxes.
+
+**Seasonal coverage.**
+
+- Winter: snow-covered terrain (footpaths as dark lines on white).
+- Spring: mud season (footpaths as compressed/disturbed soil).
+- Summer: full vegetation (paths through grass/undergrowth).
+- Autumn: mixed leaf cover, partial snow.
+
+**Volume.**
+
+- Target: hundreds to thousands of annotated images/sequences.
+- Available effort: 1.5 months at 5 hours/day.
+- Potential for annotation-process automation.
+
+### 7.6 Solution Architecture
+
+A two-level onboard scan system (zoom-out wide sweep + zoom-in confirmation). The system delegates Tier 1 detection to the existing FastAPI / Cython / TensorRT YOLO service (`../detections`), adds a central scan/perception scheduler (`scan_controller`), compensates motion using synchronised video / gimbal / UAV telemetry (movement detection runs at both zoom levels), controls the ViewPro A40 through a deterministic state machine, and invokes a secured local VLM process only for bounded zoom-in confirmation.
+
+Before implementation decomposition, the project must pass a **benchmark gate** on target hardware: Tier 1 latency, Tier 2 ROI latency, VLM latency / memory, A40 zoom timing, movement-replay false-positive rate, and all-season dataset readiness.
+
+```text
+Video frames + timestamped gimbal/zoom/UAV telemetry
+        |
+        v
+Input validation + telemetry synchronisation
+        |
+        v
+Central scan/perception scheduler (scan_controller)
+        |
+        +---> Existing FastAPI/Cython TensorRT service (../detections)
+        |       YOLO26 + YOLOE-26 fixed-class FP16 engines
+        |
+        +---> Movement detector (active in ZoomedOut and ZoomedIn)
+        |       OpenCV ego-motion compensation + residual clusters,
+        |       per-zoom-band thresholds; learned-CV fallback Q14
+        |
+        +---> Tier 2 semantic analyzer
+        |       primitive graph + lightweight ROI CNN (zoom-in only)
+        |
+        v
+POI queue (confidence + proximity + aging + <=5 POIs/min cap)
+        |
+        +---> ViewPro A40 state-machine controller
+        |
+        +---> Secured local VLM IPC (optional, benchmark-gated)
+                NanoLLM VILA1.5-3B, structured VlmAssessment output
+```
+
+#### Benchmark gate
+
+The first implementation milestone is a proof suite, not product code. It validates:
+
+- YOLO26 + YOLOE-26 FP16 TensorRT, fixed 1280 px, batch 1, end-to-end ≤100 ms/frame.
+- Tier 2 primitive graph + lightweight CNN ≤200 ms/ROI.
+- NanoLLM VILA1.5-3B local VLM ≤5 s/ROI and within remaining memory budget while the YOLO container is present.
+- ViewPro A40 medium-to-high zoom transition and command-to-movement latency.
+- Movement replay false-positive rate **measured independently** at zoom-out and zoom-in, under the ≤5 POIs/minute operator-review cap. If zoom-in exceeds the per-zoom-band cap with classical CV, the learned-CV fallback (Q14) becomes a benchmark-gate prerequisite for the zoom-in scope.
+- All-season dataset readiness and hard-negative coverage.
+
+#### Tier 1 primitive detector
+
+Use custom-trained fixed-class YOLO26 and YOLOE-26 TensorRT FP16 engines, owned by `../detections`. Runtime open-vocabulary prompt mutation is **not** part of MVP; fixed project classes or pre-baked embeddings are required. Outputs remain normalised boxes for operator display, with optional masks or path geometry passed as POI metadata.
+
+#### Tier 2 semantic analyzer
+
+Use a primitive graph plus a lightweight ROI CNN to reason over paths, branch piles, dark entrances, roads, trees, tree blocks, clearings, vehicles, people, and endpoint context. This layer owns path freshness, endpoint scoring, branch choice at intersections, and concealment-POI scoring. Active in the zoom-in level only.
+
+#### Movement detector
+
+Active at **both** zoom-out and zoom-in (skipped only during target-follow). Use OpenCV optical flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Naive frame differencing is rejected because it cannot distinguish target motion from platform motion. A telemetry synchronisation contract specifies maximum tolerated frame ↔ gimbal ↔ zoom ↔ UAV timestamp skew before motion compensation; out-of-tolerance samples must be rejected or downgraded.
+
+**Per-zoom-band tuning.** Cluster persistence threshold, residual-velocity floor, and telemetry-skew tolerance are configured per zoom band (zoom-out, zoom-in). The pixel-to-metre ratio differs by ~10× between bands, so identical residual pixel motion implies very different physical motion; thresholds must scale.
+
+**Adequacy at zoom-in (research item, Q14).** Classical optical flow / global-motion estimation is well-validated at zoom-out: the UAV is cruising, the gimbal is sweeping, the FOV is large, and ego-motion is the dominant signal and easily fitted. At zoom-in the gimbal is actively path-following, the FOV is narrow, motion blur from any small command is large, and the homography model degrades. The benchmark gate (above) MUST measure the false-positive rate at zoom-in independently from zoom-out; if it exceeds the per-zoom-band cap, the implementation falls back to a learned optical-flow module (e.g. RAFT-derived) or a CNN-based motion-segmentation module behind a feature flag, while keeping the same input/output contract.
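
The per-zoom-band tuning and the skew contract described above could be sketched as follows. This is only an illustration under stated assumptions: the type names, the `metres_per_pixel` field, and every numeric threshold are hypothetical placeholders, not values from the benchmark gate.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ZoomBand {
    ZoomOut,
    ZoomIn,
}

/// Hypothetical per-band configuration; all numbers below are placeholders.
#[derive(Clone, Copy, Debug)]
struct BandThresholds {
    max_skew_ms: i64,            // telemetry-skew tolerance
    min_residual_m_per_s: f64,   // residual-velocity floor, in physical units
    min_persistence_frames: u32, // cluster persistence threshold
    metres_per_pixel: f64,       // differs by roughly 10x between bands
}

fn thresholds_for(band: ZoomBand) -> BandThresholds {
    match band {
        ZoomBand::ZoomOut => BandThresholds {
            max_skew_ms: 40,
            min_residual_m_per_s: 1.0,
            min_persistence_frames: 5,
            metres_per_pixel: 0.50,
        },
        ZoomBand::ZoomIn => BandThresholds {
            max_skew_ms: 15,
            min_residual_m_per_s: 0.5,
            min_persistence_frames: 8,
            metres_per_pixel: 0.05,
        },
    }
}

/// Timestamps (milliseconds, common clock) of the inputs fused for one frame.
struct SampleTimestamps {
    frame_ms: i64,
    gimbal_ms: i64,
    zoom_ms: i64,
    uav_ms: i64,
}

/// Largest absolute offset between the frame and any telemetry source.
fn worst_skew_ms(ts: &SampleTimestamps) -> i64 {
    [ts.gimbal_ms, ts.zoom_ms, ts.uav_ms]
        .iter()
        .map(|t| (t - ts.frame_ms).abs())
        .max()
        .unwrap_or(0)
}

/// Rejects out-of-tolerance samples, then applies the band's physical thresholds.
fn is_motion_candidate(
    band: ZoomBand,
    ts: &SampleTimestamps,
    residual_px_per_s: f64,
    persisted_frames: u32,
) -> bool {
    let th = thresholds_for(band);
    if worst_skew_ms(ts) > th.max_skew_ms {
        return false; // unsynchronised sample: reject before compensation
    }
    let speed_m_per_s = residual_px_per_s * th.metres_per_pixel;
    speed_m_per_s >= th.min_residual_m_per_s && persisted_frames >= th.min_persistence_frames
}
```

Converting residual pixel motion to metres per second before thresholding is one way to make the thresholds scale with the band; a real implementation might instead downgrade a skewed sample rather than reject it.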
+
+#### Scan controller and POI queue
+
+Use a deterministic typed state machine with **`ZoomedOut`**, **`ZoomedIn { roi, hold_started_at }`**, and **`TargetFollow { target_id, started_at }`** states. The queue is ordered by confidence, proximity, and aging while enforcing the ≤5 POIs/minute operator-review cap. The controller handles timeouts, target loss, VLM waits, return-to-zoom-out, and target-follow centre-window behaviour. The full behaviour-tree spec — including tick scenarios and the 15 fixed-wing rules — lives in `system-flows.md §F4`.
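
A minimal sketch of the typed state machine, assuming a small hypothetical event set and two timeout values. The state names and their fields come from the text above; `Roi`, `Event`, `Timeouts`, and `step` are illustrative names, and the authoritative transition rules remain those in `system-flows.md §F4`.

```rust
use std::time::{Duration, Instant};

/// Normalised region of interest (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Roi {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

#[derive(Debug, Clone, PartialEq)]
enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Hypothetical inputs to the controller; the real set is larger.
enum Event {
    PoiDequeued(Roi),
    OperatorConfirmed { target_id: u64 },
    OperatorDeclined,
    OperatorReleased,
    TargetLost,
    Tick,
}

/// Placeholder timeouts; the document does not fix these values here.
struct Timeouts {
    hold: Duration,
    follow: Duration,
}

/// Pure transition function: same inputs always give the same next state.
fn step(state: ScanState, event: Event, now: Instant, cfg: &Timeouts) -> ScanState {
    match (state, event) {
        (ScanState::ZoomedOut, Event::PoiDequeued(roi)) => {
            ScanState::ZoomedIn { roi, hold_started_at: now }
        }
        (ScanState::ZoomedIn { .. }, Event::OperatorConfirmed { target_id }) => {
            ScanState::TargetFollow { target_id, started_at: now }
        }
        (ScanState::ZoomedIn { .. }, Event::OperatorDeclined) => ScanState::ZoomedOut,
        // Hold timeout: return to zoom-out without operator input.
        (ScanState::ZoomedIn { hold_started_at, .. }, Event::Tick)
            if now.duration_since(hold_started_at) >= cfg.hold =>
        {
            ScanState::ZoomedOut
        }
        (ScanState::TargetFollow { .. }, Event::TargetLost | Event::OperatorReleased) => {
            ScanState::ZoomedOut
        }
        (ScanState::TargetFollow { started_at, .. }, Event::Tick)
            if now.duration_since(started_at) >= cfg.follow =>
        {
            ScanState::ZoomedOut
        }
        // Any other combination leaves the state unchanged.
        (state, _) => state,
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside `step`, keeps the function deterministic and makes tick scenarios replayable in tests.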
+
+#### Local VLM confirmation
+
+Run NanoLLM with VILA1.5-3B through a separate local IPC process **if** the benchmark gate passes. Use one bounded ROI crop, short prompt, short answer, and a validated `VlmAssessment` schema. Free-form VLM text is not a downstream API. The IPC channel uses Unix-domain socket permissions and peer-credential checks where available.
+
+**Optionality model.** The VLM is the only optional tier in the system. Two complementary mechanisms model this:
+
+1. **Runtime configuration flag (`vlm_enabled`)**, gated by the benchmark-gate result. When the flag is `false`, `scan_controller` skips the VLM-confirmation step and proceeds with Tier 2 evidence alone for the zoom-in hold; the operator timeout still applies.
+2. **Build-time feature module.** The `vlm_client` component is a separate module behind a feature flag; the binary must build, link, and run identically when the module is absent. `scan_controller` MUST NOT contain a hard dependency on `vlm_client`'s presence — it depends only on a `VlmAssessment` provider trait whose default implementation returns `status: vlm_disabled`.
+
+The implementation chooses one of these (or both); both must yield the same observable behaviour: the system functions correctly with VLM absent, only losing the zoom-in confirmation step.
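
One way the provider-trait mechanism could look is sketched below. Only the `VlmAssessment` name and the `vlm_disabled` status come from the text; the other statuses, the `confidence` field, the `Evidence` type, and the function names are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VlmStatus {
    Assessed,
    VlmDisabled,
}

#[derive(Debug, Clone, PartialEq)]
struct VlmAssessment {
    status: VlmStatus,
    confidence: Option<f32>, // hypothetical field
}

/// `scan_controller` depends only on this trait, never on a concrete client.
trait VlmAssessmentProvider {
    /// Default implementation: behave as if the VLM were absent.
    fn assess(&self, _roi_jpeg: &[u8]) -> VlmAssessment {
        VlmAssessment { status: VlmStatus::VlmDisabled, confidence: None }
    }
}

/// Used when the feature module is not built or `vlm_enabled` is false.
struct DisabledVlm;
impl VlmAssessmentProvider for DisabledVlm {}

/// Evidence available to the zoom-in hold (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Evidence {
    Tier2Only { tier2_score: f32 },
    Tier2PlusVlm { tier2_score: f32, vlm_confidence: f32 },
}

fn gather_evidence(
    provider: &dyn VlmAssessmentProvider,
    roi_jpeg: &[u8],
    tier2_score: f32,
) -> Evidence {
    match provider.assess(roi_jpeg) {
        VlmAssessment { status: VlmStatus::Assessed, confidence: Some(c) } => {
            Evidence::Tier2PlusVlm { tier2_score, vlm_confidence: c }
        }
        // Disabled, or a malformed answer: proceed with Tier 2 evidence alone.
        _ => Evidence::Tier2Only { tier2_score },
    }
}
```

With this shape, the runtime flag and the build-time module both reduce to which provider is handed to the controller, so the observable behaviour is the same in either case.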
+
+#### Integration and reliability
+
+Preserve the normalised-box contract while adding POI metadata. A central scheduler (`scan_controller`) owns GPU-heavy work and enforces no concurrent YOLO/VLM execution. No silent exception swallowing; health must reflect every dependency listed in §5.
+
+#### Security and operational controls
+
+- Validate image / ROI payload size and format before decoding or inference.
+- Use patched OpenCV versions and an image-format allow-list.
+- Enforce local IPC authorisation and payload limits for the VLM process (Unix-domain socket permissions plus peer-credential checks).
+- Log POI creation reasons, source detections, queue decisions, gimbal commands, VLM requests, operator confirmations, and failure states.
+- Keep VLM local with no cloud egress.
+
+### 7.7 MAVLink and Piloting
+
+`mavlink_layer` is a hand-rolled MAVLink v2 transport. There is no third-party SDK dependency. The layer owns serialisation / deserialisation, heartbeat, sequence numbers, retry, and a single connection abstraction (UDP or serial, picked at startup from CLI / env).
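
To illustrate what the serialisation side of such a layer involves, here is a minimal sketch that encodes an unsigned MAVLink v2 frame carrying a `HEARTBEAT`. The wire constants (start byte `0xFD`, 10-byte header, CRC-16/MCRF4XX seeded with `0xFFFF`, `CRC_EXTRA` 50 for `HEARTBEAT`) follow the public MAVLink specification; the function names and the identity values chosen in the payload are assumptions, not the actual `mavlink_layer` API.

```rust
/// One accumulate step of the checksum MAVLink uses (CRC-16/MCRF4XX).
fn crc_accumulate(byte: u8, crc: u16) -> u16 {
    let mut tmp = byte ^ (crc & 0xFF) as u8;
    tmp ^= tmp << 4;
    (crc >> 8) ^ ((tmp as u16) << 8) ^ ((tmp as u16) << 3) ^ ((tmp as u16) >> 4)
}

fn crc16(bytes: &[u8]) -> u16 {
    bytes.iter().fold(0xFFFF, |crc, &b| crc_accumulate(b, crc))
}

const STX_V2: u8 = 0xFD;
const HEARTBEAT_MSG_ID: u32 = 0;
const HEARTBEAT_CRC_EXTRA: u8 = 50;

/// Encodes an unsigned v2 frame (no signature block, both flag bytes zero).
fn encode_v2(
    seq: u8,
    sys_id: u8,
    comp_id: u8,
    msg_id: u32,
    crc_extra: u8,
    payload: &[u8],
) -> Vec<u8> {
    // v2 drops trailing zero bytes but never the first payload byte.
    let mut len = payload.len();
    while len > 1 && payload[len - 1] == 0 {
        len -= 1;
    }
    let payload = &payload[..len];

    let mut frame = Vec::with_capacity(12 + len);
    frame.push(STX_V2);
    frame.push(len as u8);
    frame.push(0); // incompat_flags
    frame.push(0); // compat_flags
    frame.push(seq);
    frame.push(sys_id);
    frame.push(comp_id);
    frame.extend_from_slice(&msg_id.to_le_bytes()[..3]); // 24-bit message id
    frame.extend_from_slice(payload);

    // Checksum covers everything after STX, then the per-message CRC_EXTRA.
    let crc = crc_accumulate(crc_extra, crc16(&frame[1..]));
    frame.extend_from_slice(&crc.to_le_bytes());
    frame
}

/// HEARTBEAT payload in wire order (largest fields first).
fn heartbeat_payload(custom_mode: u32, base_mode: u8) -> [u8; 9] {
    let mut p = [0u8; 9];
    p[..4].copy_from_slice(&custom_mode.to_le_bytes());
    p[4] = 18; // MAV_TYPE_ONBOARD_CONTROLLER (assumed identity)
    p[5] = 8; // MAV_AUTOPILOT_INVALID
    p[6] = base_mode;
    p[7] = 4; // MAV_STATE_ACTIVE
    p[8] = 3; // mavlink_version
    p
}
```

A call such as `encode_v2(seq, 1, 191, HEARTBEAT_MSG_ID, HEARTBEAT_CRC_EXTRA, &heartbeat_payload(0, 0))` yields a 21-byte frame. Message signing, if enabled, would append a 13-byte signature and set the corresponding incompatibility flag, which this sketch omits.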
+
+**Command surface (~10–15 commands).** Only what the system actually needs:
+
+| MAVLink message | Direction | Used by | Purpose |
+|---|---|---|---|
+| `HEARTBEAT` | bidirectional | `mavlink_layer` | liveness + GCS-vs-companion identification |
+| `COMMAND_LONG` (subset) | out | `mission_executor` | arm / disarm, takeoff, set-mode, change-speed, change-alt, land, RTL |
+| `COMMAND_ACK` | in | `mavlink_layer` | command-result demux, retry trigger |
+| `MISSION_COUNT` | out | `mission_executor` | pre-upload count |
+| `MISSION_REQUEST_INT` | in | `mission_executor` | pull-side mission upload |
+| `MISSION_ITEM_INT` | out | `mission_executor` | per-waypoint upload |
+| `MISSION_ACK` | in | `mission_executor` | upload completion |
+| `MISSION_SET_CURRENT` | out | `mission_executor` | start at item 0 |
+| `MISSION_CURRENT` | in | `mission_executor` | progress |
+| `MISSION_ITEM_REACHED` | in | `mission_executor` | progress |
+| `MISSION_CLEAR_ALL` | out | `mission_executor` | reset before re-upload (e.g., middle waypoint) |
+| `GLOBAL_POSITION_INT` | in | `telemetry_stream`, `mission_executor` | live position |
+| `ATTITUDE` | in | `telemetry_stream` | attitude for operator overlay |
+| `SYS_STATUS` / `EXTENDED_SYS_STATE` | in | health aggregator | mode, battery, sensor health |
+| `STATUSTEXT` | in | logger | autopilot diagnostic lines |
+| `SET_MODE` (or `COMMAND_LONG MAV_CMD_DO_SET_MODE`) | out | `mission_executor` | flight-mode transitions for fixed-wing |
+
+If the autopilot link supports MAVLink-2 message signing it is enabled; otherwise the link is treated as trusted (it is point-to-point on a closed serial / UDP path on the airframe).
+
+**Piloting variants.** `mission_executor` runs one of two state machines depending on the airframe declared at startup:
+
+- **Multirotor variant**: `DISCONNECTED → CONNECTED → HEALTH_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND`. The executor arms, takes off to a configured altitude, and only then uploads + starts the mission. Bounded retry with exponential backoff at every transition; explicit max-retry; on exceeding it, health flips to red and the executor surfaces the failure via the operator bridge.
+- **Fixed-wing variant**: `DISCONNECTED → CONNECTED → HEALTH_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND`. The executor skips arm/takeoff (the airframe is assumed already airborne under RC control), uploads the mission, and waits for the operator to switch the airframe into AUTO mode via RC. Same retry policy.
+
+**Geofence enforcement.** `mission_executor` honours both INCLUSION and EXCLUSION polygons declared in the mission. An INCLUSION violation (leaving the polygon) and an EXCLUSION violation (entering it) are handled identically: both halt forward progress and trigger return-to-launch (RTL). The earlier C++ implementation parsed but silently ignored EXCLUSION; the new design rejects that behaviour explicitly.
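
The violation check itself can be sketched with a standard ray-casting point-in-polygon test. This is an illustration, not the executor's code: it treats latitude/longitude as planar coordinates (an assumption that only holds for small mission areas away from the poles and the antimeridian), it assumes that being outside any INCLUSION polygon counts as a violation, and all names are hypothetical.

```rust
#[derive(Clone, Copy, Debug)]
struct LatLon {
    lat: f64,
    lon: f64,
}

enum FenceKind {
    Inclusion,
    Exclusion,
}

struct Fence {
    kind: FenceKind,
    vertices: Vec<LatLon>,
}

/// Ray casting: count polygon edges crossed by a ray cast from `p`
/// towards increasing longitude; an odd count means `p` is inside.
fn contains(poly: &[LatLon], p: LatLon) -> bool {
    let n = poly.len();
    if n < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (poly[i], poly[j]);
        // Does edge a-b straddle the point's latitude?
        if (a.lat > p.lat) != (b.lat > p.lat) {
            let lon_at = a.lon + (p.lat - a.lat) / (b.lat - a.lat) * (b.lon - a.lon);
            if p.lon < lon_at {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// True if the position breaks any fence: outside an INCLUSION polygon
/// or inside an EXCLUSION polygon. Either case leads to the same action.
fn violates(fences: &[Fence], p: LatLon) -> bool {
    fences.iter().any(|f| match f.kind {
        FenceKind::Inclusion => !contains(&f.vertices, p),
        FenceKind::Exclusion => contains(&f.vertices, p),
    })
}
```

Evaluating both fence kinds in one function makes it hard to reintroduce the earlier behaviour of parsing EXCLUSION polygons without enforcing them.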
+
+**Mission uploads and middle-waypoint inserts.** When the operator confirms a target, `operator_bridge` hands a middle-waypoint hint to `mission_executor`. The executor recomputes the mission (current-position → middle-waypoint → resume original route), clears the existing autopilot mission via `MISSION_CLEAR_ALL`, re-uploads the new mission via the standard `MISSION_COUNT` / `MISSION_ITEM_INT` / `MISSION_ACK` sequence, and resumes flight. After target-follow ends (operator release, target loss, or timeout), the same sequence reverts to the original mission.
+
+**Lost-link failsafe (operator/Ground-Station modem link).** A typed failsafe ladder runs in `mission_executor`, evaluated each tick:
+
+| Stage | Trigger | Action |
+|---|---|---|
+| `LinkOk` | last operator heartbeat ≤ 5 s | continue mission; no behavioural change |
+| `LinkDegraded` | 5 s < last heartbeat ≤ 30 s | continue mission; surface health → yellow; queue all POI surface-events for replay-on-recovery |
+| `LinkLost` | last heartbeat > 30 s **and** target-follow inactive | trigger RTL via `MAV_CMD_NAV_RETURN_TO_LAUNCH`; log mission abort with reason; continue logging the mission diff for post-flight upload via `mapobjects_store` |
+| `LinkLostInFollow` | last heartbeat > 30 s **and** in target-follow | hold target-follow for an additional 30 s grace (operator may have momentarily lost link); thereafter fall through to `LinkLost` |
+
+The grace windows (5 s, 30 s, 30 s) are mission-configurable. **MAVLink-link loss to ArduPilot/PX4 itself** is not the same event — it triggers immediate health → red and falls through to whatever the airframe autopilot's own failsafe does (we do NOT override it).
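
The ladder above maps naturally onto a pure classification function evaluated each tick. The sketch below uses the stage names and default windows from the table; `LadderConfig`, its field names, and `classify` are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkStage {
    LinkOk,
    LinkDegraded,
    LinkLost,
    LinkLostInFollow,
}

/// Mission-configurable grace windows (defaults: 5 s, 30 s, 30 s).
struct LadderConfig {
    ok_window: Duration,
    lost_after: Duration,
    follow_grace: Duration,
}

impl Default for LadderConfig {
    fn default() -> Self {
        LadderConfig {
            ok_window: Duration::from_secs(5),
            lost_after: Duration::from_secs(30),
            follow_grace: Duration::from_secs(30),
        }
    }
}

/// Classifies the operator link from the age of the last heartbeat.
fn classify(since_heartbeat: Duration, in_target_follow: bool, cfg: &LadderConfig) -> LinkStage {
    if since_heartbeat <= cfg.ok_window {
        LinkStage::LinkOk
    } else if since_heartbeat <= cfg.lost_after {
        LinkStage::LinkDegraded
    } else if in_target_follow && since_heartbeat <= cfg.lost_after + cfg.follow_grace {
        // Extra grace while following; afterwards falls through to LinkLost.
        LinkStage::LinkLostInFollow
    } else {
        LinkStage::LinkLost
    }
}
```

With the defaults, a heartbeat 45 s old yields `LinkLostInFollow` during target-follow and `LinkLost` otherwise; at 61 s both cases yield `LinkLost`.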
+
+**Battery / fuel thresholds.** `mission_executor` reads `SYS_STATUS` / `EXTENDED_SYS_STATE` and enforces:
+
+- `battery ≤ rtl_threshold` (default 25 %) → trigger RTL, log reason, continue post-mission upload.
+- `battery ≤ hard_floor` (default 15 %) → land-now via `MAV_CMD_NAV_LAND` at safest reachable point; surface health → red.
+
+Operator override is permitted via a signed command (per Q9); without it, the thresholds are hard.
+
+**Connection configuration.** A single connection URI at startup: `udp://...` or `serial:///dev/...`. No runtime URI swap.
+
+**Frames and altitudes.** All waypoints in the mission API use `MAV_FRAME_GLOBAL_RELATIVE_ALT`. Terrain-following frames are not used (no SRTM database on the airframe).
+
+### 7.8 Detection Classes
+
+These classes extend the default seed set used by the detections service.
+
+| Class           | Local Name (UA) | Notes                      |
+|-----------------|-----------------|----------------------------|
+| Rows of trees   | Посадка         | Linear vegetation cover    |
+| Trenches/Ditches| Рів             | Linear earthwork features  |
+| Trash piles     | Сміття          | Indicators of activity     |
+| Tire tracks     | Сліди від шин   | Signs of movement          |
+
+Plus the new YOLO primitive classes from §7.5 Training Data: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.
+
+### 7.9 MapObjects (H3 spatial index)
+
+`MapObjects` are created and managed internally by autopilot. There are **no** REST API endpoints for MapObjects — autopilot reads/writes them directly in the on-device store (`mapobjects_store`). The only external reference is the delete cascade in `DELETE /missions/{id}` (per the suite-level missions API).
+
+Autopilot needs to store objects on a 2D map efficiently in order to find differences fast:
+
+- New objects (new pile of trash, new tire tracks).
+- Changed objects.
+- Removed objects.
+
+Each object on the map is described by:
+
+- `gps(lat, lon)` — geographic position.
+- `size(width, height)` — bounding area.
+
+**Spatial indexing.** Use a hexagonal spatial index to efficiently store and query objects by location.
+
+**Approach:** H3 library (by Uber) — hierarchical hexagonal geospatial indexing system.
+
+| Aspect              | Detail                                     |
+|---------------------|--------------------------------------------|
+| Library             | H3 (`h3rs` crate for Rust)                 |
+| Algorithm basis     | 3D icosahedron → 2D hexagonal tessellation |
+| Key advantage       | Near-uniform cell area, good neighbour queries |
+| Open question       | Optimal tile/resolution size               |
+| Known issue         | Discontinuity problem at cell boundaries   |
+
+The hexagonal grid avoids the distortion problems of square grids and provides consistent neighbour relationships, making it suitable for fast spatial diff operations (detecting new, changed, and removed objects).
+
+### 7.10 Drone ⇄ Operator Sync Message Format
+
+Detection data is synced between drone and operator using a compact message format. MGRS (Military Grid Reference System) is used as the primary coordinate encoding — compact, standardised, and directly usable on military maps.
+
+**Drone → Operator (detection report):**
+
+```text
+missionId :: MGRS(encoded) :: class :: confidence :: size_width_m :: size_length_m :: photo_metadata :: flags
+```
+
+**Operator → Drone (command/acknowledgment):**
+
+```text
+missionId :: Encoded(GroundMGRS :: Time) :: ... :: missionId2
+```
+
+Wire-level field semantics live in `data_model.md §MGRS sync message`.
+
+### 7.11 Target Relocation / Movement Analysis
+
+The system maintains a live **map of objects** and detects changes between survey passes.
+
+**Map update types.**
+
+| Type    | Meaning                                      |
+|---------|----------------------------------------------|
+| New     | Object not seen before in this area          |
+| Moved   | Object of same class appeared nearby         |
+| Removed | Previously recorded object no longer present |
+
+**Map hashtable.** Objects are stored in a hashtable keyed by MGRS grid reference:
+
+```text
+MGRS1  -> Object1
+MGRS2  -> Object5
+MGRS12 -> Object2
+MGRSN  -> ObjectM
+```
+
+### 7.12 New vs Existing / Moved / Removed Object Detection
+
+When a detection occurs, the system must determine whether the object is **new**, **moved**, or **already known**. This must be done efficiently in real time. This is the implementation of `scan_controller`'s map-diff responsibilities; it lives in `mapobjects_store`.
+
+**Algorithm.**
+
+```text
+On each detection(gps, class, confidence, size):
+
+1. Compute H3 cell index at chosen resolution (e.g. res 10, ~66-76 m average edge).
+2. Build composite key = H3_cell + class.
+3. Query k-ring(H3_cell, k=2) -> get all neighbouring cells.
+4. For each neighbouring cell, lookup objects with same or similar class:
+     similar_classes = {military_vehicle, tank, artillery}  (configurable groups)
+5. Compare:
+     - If matching object found within distance_threshold (config, e.g. 50m)
+       AND same class group -> EXISTING (or MOVED if position delta > move_threshold).
+     - If no match -> NEW -> insert into map with H3 hash key.
+6. After full sweep: objects in the region that were NOT re-observed -> REMOVED candidates.
+```
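
Step 5 of the algorithm (the comparison against candidates) can be sketched in isolation. The snippet assumes the candidates have already been gathered from the k-ring lookup, which is not shown because it depends on the H3 binding eventually chosen; the type names, the string-valued class group, and the use of haversine distance are assumptions for illustration.

```rust
/// A previously stored object returned by the neighbouring-cell lookup.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Known {
    lat: f64,
    lon: f64,
    class_group: &'static str,
}

/// Result of matching one detection; the index refers to `candidates`.
#[derive(Debug, PartialEq)]
enum DiffKind {
    New,
    Existing(usize),
    Moved(usize),
}

/// Great-circle distance in metres on a spherical Earth.
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const EARTH_RADIUS_M: f64 = 6_371_000.0;
    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
    let dp = p2 - p1;
    let dl = (lon2 - lon1).to_radians();
    let a = (dp / 2.0).sin().powi(2) + p1.cos() * p2.cos() * (dl / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * a.sqrt().asin()
}

/// Picks the nearest same-group candidate within `distance_threshold_m`,
/// then splits EXISTING from MOVED by `move_threshold_m`.
fn classify(
    candidates: &[Known],
    lat: f64,
    lon: f64,
    class_group: &str,
    distance_threshold_m: f64,
    move_threshold_m: f64,
) -> DiffKind {
    let nearest = candidates
        .iter()
        .enumerate()
        .filter(|(_, k)| k.class_group == class_group)
        .map(|(i, k)| (i, haversine_m(k.lat, k.lon, lat, lon)))
        .filter(|(_, d)| *d <= distance_threshold_m)
        .min_by(|a, b| a.1.total_cmp(&b.1));

    match nearest {
        None => DiffKind::New,
        Some((i, d)) if d > move_threshold_m => DiffKind::Moved(i),
        Some((i, _)) => DiffKind::Existing(i),
    }
}
```

REMOVED candidates (step 6) would then be the stored objects in the swept region whose index was never returned as `Existing` or `Moved` during the pass.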
+
+**Why H3 + MGRS.**
+
+| Step                     | Mechanism                  | Complexity |
+|--------------------------|----------------------------|------------|
+| Spatial cell lookup      | H3 `latlng_to_cell`        | O(1)       |
+| Neighbour query          | H3 `grid_disk(k=2)`        | O(k²) cells (constant for fixed k) |
+| Object lookup per cell   | Hashtable by `MGRS+class`  | O(1)       |
+| Total per detection      | ~constant time             | O(k²)      |
+
+**Configurable parameters.**
+
+| Parameter            | Example Value | Purpose                                              |
+|----------------------|---------------|------------------------------------------------------|
+| search_radius_km     | 30            | Max radius to search for previously known objects    |
+| distance_threshold_m | 50            | Max distance to consider same object                 |
+| move_threshold_m     | 10            | Min displacement to flag as "moved"                  |
+| h3_resolution        | 10            | ~66-76 m average edge length (~15,000 m² per cell); used for matching vehicle-sized objects |
+| similar_classes      | per config    | Class groups treated as equivalent for matching      |
+
+**Notes.**
+
+- The 30 km radius is for the broad initial query ("get all previously stored objects within 30 km"). H3 `grid_disk` at resolution 10 with k=2 returns 19 cells (the centre cell plus two rings) and reaches roughly 200-300 m from the centre cell, comfortably beyond `distance_threshold_m`; this handles fine-grained matching. For the broad query, use a coarser H3 resolution (e.g. res 4 ~22 km edge) as a pre-filter.
+- `MGRS+class` is the composite key for the hashtable so that lookups are partitioned by both location and object type.
+- The discontinuity problem at H3 cell boundaries is solved by always querying the k-ring (centre cell + neighbours), ensuring objects near an edge are still matched.
+
+### 7.13 MapObjects Sync (central DB)
+
+`mapobjects_store` is **not** a private on-device database. It is the working copy of a centrally maintained map of detected objects, scoped to the mission's bounding box, synchronised on a per-mission basis.
+
+**Mirror of the GPS-Denied satellite-tile pattern.** Pre-flight, autopilot pulls the relevant central state into the on-device store; in-flight the on-device store is authoritative; post-flight, autopilot pushes the mission's full pass diff back to the central store. The central store is the source of truth across missions and across UAVs; the on-device store is the source of truth during the active mission.
+
+**Endpoint hosting (frozen 2026-05-18).** The endpoints are an extension of the existing `missions` API. There is no separate `mapobjects` service.
+
+| Endpoint | Method | Purpose |
+|---|---|---|
+| `/missions/{id}/mapobjects` | `GET` | Pre-flight: returns the central map state for the mission's bounding box (last-known objects + ignored items). |
+| `/missions/{id}/mapobjects` | `POST` | Post-flight: uploads the mission's full pass diff (NEW / MOVED / REMOVED-CANDIDATE / CONFIRMED-EXISTING) for central merge. |
+| `/missions/{id}/mapobjects/ignored` | `GET` | Pre-flight: returns the central ignored-items list scoped to the mission area. |
+| `/missions/{id}/mapobjects/ignored` | `POST` | Post-flight: uploads ignored-items appended during the mission. |
+| `DELETE /missions/{id}` | (existing) | Cascade: drops mission-scoped MapObjects and IgnoredItems centrally as well as on-device. |
+
+In-flight sync is **batched only** for MVP — no streaming over modem. Cross-UAV awareness lags by mission length; this is an explicit MVP trade-off (Frozen choice 6 in §7.3).
+
+**Sync lifecycle (per mission).**
+
+1. **Pre-flight pull** — `mission_client` calls `GET /missions/{id}/mapobjects` after fetching the mission itself. Response hydrates `mapobjects_store`. Failure modes:
+   - **Reachable + 200**: hydrate; record `pull_completed_at`. Sync state = `synced`.
+   - **Reachable + 4xx**: fail BIT; surface error; operator must investigate (likely mission-id mismatch or unauthorised UAV).
+   - **Unreachable / timeout**: BIT degrades. Operator may acknowledge to continue with **last-cached** state for this mission area (`sync state = cached_fallback`); the BIT failure is recorded for post-mission audit.
+   - **Empty response**: `sync state = synced`, store empty (legitimate first-flight in this area).
+2. **In-flight** — store is authoritative. All NEW / MOVED / EXISTING / IgnoredItem appends accumulate in the on-device store with `pending_upload = true`. No central writes.
+3. **Post-flight push** — `mission_client` calls `POST /missions/{id}/mapobjects` with the mission's full pass diff after landing or RTL. Conflict resolution is server-side per §7.13 conflict rules. Failure modes:
+   - **Reachable + 200**: clear `pending_upload`; record `push_completed_at`. Sync state = `synced`.
+   - **Unreachable / timeout / 5xx**: persist the pending diff on disk, retry with backoff. Once the retry window is exhausted (configurable, default 24 h), surface as a warning; operator may manually trigger replay or accept loss.
+   - **4xx (rejected)**: log full payload, surface to operator; do not silently discard — the mission's results are at risk.
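
The pre-flight pull failure modes can be summarised as a small decision function. This is a sketch under stated assumptions: the sync-state names `synced` and `cached_fallback` come from the list above, while the type names and the `may_proceed` flag (whether the mission can continue without further operator action) are hypothetical.

```rust
/// Outcome of `GET /missions/{id}/mapobjects` as seen by `mission_client`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PullResult {
    /// HTTP 200; an empty list is a legitimate first flight in the area.
    Ok { object_count: usize },
    /// HTTP 4xx, e.g. mission-id mismatch or unauthorised UAV.
    Rejected { status: u16 },
    /// No response or timeout.
    Unreachable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncState {
    Synced,
    CachedFallback,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bit {
    Pass,
    Degraded,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PullDecision {
    bit: Bit,
    sync_state: Option<SyncState>,
    may_proceed: bool, // hypothetical convenience flag
}

fn decide_pull(result: PullResult, operator_ack: bool) -> PullDecision {
    match result {
        PullResult::Ok { .. } => PullDecision {
            bit: Bit::Pass,
            sync_state: Some(SyncState::Synced),
            may_proceed: true,
        },
        PullResult::Rejected { .. } => PullDecision {
            bit: Bit::Fail,
            sync_state: None,
            may_proceed: false,
        },
        // Unreachable: continue on cached state only after explicit acknowledgement.
        PullResult::Unreachable if operator_ack => PullDecision {
            bit: Bit::Degraded,
            sync_state: Some(SyncState::CachedFallback),
            may_proceed: true,
        },
        PullResult::Unreachable => PullDecision {
            bit: Bit::Degraded,
            sync_state: None,
            may_proceed: false,
        },
    }
}
```

Keeping the BIT result separate from the sync state preserves the degraded BIT record for post-mission audit even when the operator chooses to fly on cached data.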
+
+**Conflict resolution at the central store (open question Q8 — proposed).** When two missions report contradicting state for the same `(h3_cell, class_group)`:
+
+- Both observations are **appended** to the per-`(h3_cell, class_group)` observation log (no destructive overwrite).
+- The "current view" surfaced to operator UI is computed from the observation log: most recent confirmed-existing observation wins; older REMOVED claims expire after a configurable age; class-group ambiguities surface as multi-class candidates.
+- IgnoredItems are union-merged (any operator-decline at any UAV propagates to all future missions in the same area, until explicit clear).
+
+**Central-side schema (SQL, indicative).**
+
+```sql
+-- Observations: every detection ever reported by any UAV/mission, never overwritten.
+CREATE TABLE map_object_observations (
+    id              UUID PRIMARY KEY,
+    h3_cell         BIGINT NOT NULL,
+    class           TEXT NOT NULL,
+    class_group     TEXT NOT NULL,
+    mission_id      UUID NOT NULL REFERENCES missions(id) ON DELETE CASCADE,
+    uav_id          UUID NOT NULL,
+    observed_at     TIMESTAMPTZ NOT NULL,
+    gps_lat         DOUBLE PRECISION NOT NULL,
+    gps_lon         DOUBLE PRECISION NOT NULL,
+    mgrs            TEXT NOT NULL,
+    size_width_m    REAL,
+    size_length_m   REAL,
+    confidence      REAL NOT NULL,
+    diff_kind       TEXT NOT NULL CHECK (diff_kind IN ('NEW','MOVED','EXISTING','REMOVED_CANDIDATE')),
+    photo_ref       TEXT,
+    raw_evidence    JSONB
+);
+CREATE INDEX ON map_object_observations (h3_cell, class_group);
+CREATE INDEX ON map_object_observations (mission_id);
+CREATE INDEX ON map_object_observations (observed_at DESC);
+
+-- IgnoredItems: per-area operator declines, union-merged across missions.
+CREATE TABLE map_object_ignored (
+    id              UUID PRIMARY KEY,
+    h3_cell         BIGINT NOT NULL,
+    mgrs            TEXT NOT NULL,
+    class_group     TEXT NOT NULL,
+    declined_at     TIMESTAMPTZ NOT NULL,
+    operator_id     UUID,
+    mission_id      UUID REFERENCES missions(id) ON DELETE SET NULL,
+    retention_scope TEXT NOT NULL CHECK (retention_scope IN ('mission','session','until_expiry')),
+    expires_at      TIMESTAMPTZ
+);
+CREATE INDEX ON map_object_ignored (h3_cell, class_group);
+CREATE INDEX ON map_object_ignored (expires_at) WHERE retention_scope = 'until_expiry';
+
+-- Materialised "current view" derived from observations + ignored.
+-- Recomputed nightly or on POST. Exact projection rules per §7.13 conflict resolution.
+CREATE MATERIALIZED VIEW map_objects_current AS ...;
+```
+
+**On-device-side schema (engine TBD per §8 Q3 — indicative shape).**
+
+```text
+mapobjects_store/
+  current_state            -- key = (h3_cell, class_group); value = MapObject record
+  pending_observations     -- ordered log of unflushed observations for post-flight POST
+  pending_ignored          -- unflushed IgnoredItem appends
+  sync_state               -- {pull_completed_at, push_completed_at, last_error, kind}
+```
+
+The on-device shape is intentionally narrower than the central schema — the on-device store does not need full observation history beyond the active mission; older history is only ever consulted via the central pull.
+
+**Bounding-box pull strategy.** The central API uses the mission's geofence INCLUSION polygon (or a generous AABB if no INCLUSION is set) to scope the response. Pulled records are filtered by retention age (default ≤30 days); operator can override to "all". The 30 km / k-ring numbers in §7.12 apply to **on-device** spatial queries; the pull radius is mission-defined.
+
+### 7.14 Tech Stack
+
+**Requirements.**
+
+| Area | Requirement |
+|---|---|
+| Runtime hardware | Jetson Orin Nano Super 8 GB, locked JetPack/power mode, ViewPro A40. |
+| Inference (Tier 1) | FP16 only, TensorRT primary, ONNX Runtime fallback, 1280 px model input. Lives in `../detections`. |
+| Service integration | Bi-directional gRPC client to the existing FastAPI + Cython + TensorRT detections service. |
+| VLM | Local-only, separate IPC process, sequential with YOLO, ≤5 s/ROI if used for MVP. |
+| Movement | Active at zoom-out and zoom-in, moving-camera compensation with timestamped video / gimbal / UAV telemetry; per-zoom-band thresholds; learned-CV fallback per Q14. |
+| MapObjects sync | Mission-bracketed: pre-flight `GET` + post-flight `POST` against `/missions/{id}/mapobjects`. Batched only for MVP. |
+| Output | Existing normalised-box format plus POI metadata for queue / reasoning. |
+| Proof gates | Hardware/replay benchmark suite before implementation decomposition; movement zoom-in benchmark independent of zoom-out. |
+
+**Selected stack.**
+
+| Layer | Selection | Rationale |
+|---|---|---|
+| Language (autopilot) | Rust | Memory safety, performance, single-binary deployment, strong type system for the deterministic state machine. |
+| Language (`../detections`) | Python + Cython | Existing service; we consume it, not rewrite it. |
+| Tier 1 detector | YOLO26 + YOLOE-26 fixed-class FP16 TensorRT | Best fit with acceptance criteria and export docs. Owned by `../detections`. |
+| Tier 2 analyzer | Primitive graph + lightweight CNN | Fast, explainable, data-efficient. |
+| Movement | OpenCV optical flow + telemetry | Directly addresses moving-camera constraint. |
+| VLM runtime | NanoLLM / VILA1.5-3B (with fallback benchmark path) | Documented local-multimodal path; matches no-cloud requirement. |
+| Scan controller | Deterministic typed state machine (Rust) | Simpler and easier to test for a fixed `ZoomedOut` / `ZoomedIn` / `TargetFollow` lifecycle. |
+| MAVLink transport | Hand-rolled in autopilot (Rust) | Eliminates the largest current dependency-risk item; small command surface (§7.7). |
+| Gimbal protocol | ViewPro A40 vendor protocol over UDP | Matches the deployed camera. |
+| `mapobjects_store` engine | TBD (SQLite + H3 extension / KV / in-memory + snapshot) | Open question; see §8. |
+| Inter-component IPC (in-process) | Tokio channels / actors | Idiomatic Rust async. |
+| External IPC (VLM) | Unix-domain socket with peer-credential check | Local-only authorisation. |
+| VLM output | Validated structured `VlmAssessment` schema | Makes VLM output a stable API contract. |
+| Input security | Content / size allow-list + patched OpenCV | Reduces crafted-input and resource-exhaustion risk. |
+| Observability | `tracing` + JSON logs to stdout, scraped by the deployment's log-shipping stack | See `deployment/observability.md`. |
+| Build | `cargo` cross-compile for `aarch64-unknown-linux-gnu` | See `deployment/ci_cd_pipeline.md`. |
+
+**Risk register.**
+
+| Risk | Impact | Mitigation |
+|---|---|---|
+| Tier 1 misses ≤100 ms/frame | Blocks acceptance | Fixed-shape FP16 engines, batch 1, benchmark before implementation decomposition. |
+| VLM misses ≤5 s/ROI or memory budget | Blocks VLM-required MVP policy | Benchmark NanoLLM / VILA first; fall back to smaller VLM only if it passes the same gates; otherwise disable VLM via `vlm_enabled=false`. |
+| All-season MVP data is insufficient | Blocks detection-quality targets | Per-season dataset gates and hard-negative mining. |
+| Movement false positives exceed ≤5 POIs/min | Operator overload | Telemetry-aided compensation, replay tests, queue cap, per-zoom-band thresholds. |
+| Classical OpenCV optical flow inadequate at zoom-in | Loss of zoom-in movement detection | Benchmark gate measures zoom-in independently; fallback to learned-CV / CNN motion module behind feature flag (Q14). |
+| Operator/Ground-Station modem link lost mid-flight | Uncontrolled UAV | Typed lost-link failsafe ladder in `mission_executor` (§7.7); RTL after 30 s grace; configurable. |
+| Battery / fuel below threshold mid-mission | Forced landing or crash | Hard RTL + land-now thresholds with configurable defaults (§7.7); operator override only via signed command. |
+| Operator command spoofing / replay over modem RF | Hostile hijack of operator commands | Authenticated, signed, replay-protected command envelope (§5; scheme TBD per Q9). |
+| Pre-flight self-test (BIT) misses a degraded dependency | Mid-flight component failure | BIT covers every dependency in §5 plus mission load + MapObjects pre-flight pull; cached-fallback acknowledgement is explicit. |
+| Wall-clock drift breaks operator-command timestamping | Forensic + audit failures | GPS-time-bound when GPS locked; NTP at boot; drift > 200 ms surfaces health → yellow. |
+| MapObjects post-flight push fails | Loss of mission-diff data centrally | Persist pending diff on disk; bounded retry; operator-visible warning; manual replay supported. |
+| A40 zoom transition exceeds ≤2 s | Breaks scan timing | Hardware-in-loop timing test; revise scan timeout / zoom range if needed. |
+| Hand-rolled MAVLink misses an edge case | Mission failure or hard-to-debug protocol behaviour | Conformance test against ArduPilot SITL; replay-based regression tests. |
+| Unstructured VLM output corrupts downstream decisions | Operator-facing false confidence | Schema validation, confidence enum, timeout / error state, fail-closed behaviour. |
+| Telemetry skew breaks movement compensation | False motion candidates | Define maximum frame / gimbal / UAV timestamp skew; reject / degrade unsynchronised samples. |
+| Untrusted image / ROI payloads exploit decoders or memory | Security and availability risk | Pin patched OpenCV, restrict formats, enforce size caps before decode. |
+
+---
+
+## 8. Open Questions
+
+| # | Question | Impact |
+|---|---|---|
+| Q1 | **Sweep pattern specification.** Pattern shape (pendulum / raster / lawn-mower), FOV per zoom tier, dwell time per direction, and whether sweep runs continuously or only between specific mission waypoints. | Blocks `scan_controller` zoom-out implementation. |
+| Q2 | **Ground Station API contract.** Stream protocol (WebRTC / WebSocket-H.264 / gRPC server-streaming?), session/auth model, and bbox-overlay rendering (server-side burn-in vs client-side render). | Blocks `telemetry_stream` + `operator_bridge` design. |
+| Q3 | **`mapobjects_store` engine.** SQLite + H3 extension / KV / in-memory + snapshot. | Blocks persistent-state design for ignored items + MapObjects. |
+| Q4 | **Tier 1 contract evolution.** How `detection_client` is versioned against an evolving `../detections` schema. | Blocks the gRPC contract definition. |
+| Q5 | **`mission-schema` extraction location.** `_infra/` at suite root, or a small third repo. | Blocks the `mission_client` / `missions` API contract sharing. |
+| Q6 | **MAVLink-2 message signing.** Whether the airframe link enables MAVLink-2 signing or treats the link as trusted. | Affects `mavlink_layer` startup handshake. |
+| Q7 | **Central MapObjects API contract.** Endpoint hosting is frozen as an extension of the `missions` API (§7.13). The remaining contract concerns are: schema versioning, paging strategy for large mission areas, photo-reference upload mechanism (URL handoff vs inline), and observation-history retention policy. | Blocks `missions` repo work + `mission_client` MapObjects sync code. |
+| Q8 | **MapObjects conflict resolution.** When two missions report contradicting state for the same `(h3_cell, class_group)`, the proposed rule is "append-only observation log + computed current view" (§7.13). Open: exact projection rules, REMOVED-claim expiry window, multi-class disambiguation. | Blocks central `map_objects_current` view definition. |
+| Q9 | **Operator-command authentication scheme.** The principle is committed (§5: signed, replay-protected). Scheme open: HMAC over (session_token, sequence_number, payload) vs JWT-style ed25519 vs MAVLink-2 signing extended to operator commands vs separate envelope. | Blocks `operator_bridge` validation logic + Ground Station integration. |
+| Q10 | **Software rollback policy on the airframe.** Watchtower OTA is mentioned in `../_docs/00_top_level_architecture.md`. Policy open: how a bad autopilot update is detected on the airframe (boot-time self-check, A/B partition, watchdog rollback) and rolled back without crew intervention. | Affects deployment design + on-airframe service supervision. |
+| Q11 | **Multi-operator session policy.** When two operators connect (one in primary station, one remote), which is authoritative for confirm/decline? Single active operator at a time, or quorum? How is `operator_id` recorded in `IgnoredItem`? | Blocks `operator_bridge` session model. |
+| Q12 | **Comms blackout during banking turns.** Winged UAV banking can lose modem LOS to Ground Station. Policy: tolerate brief blackouts as `LinkDegraded`, or suppress lost-link failsafe during known turn arcs (computed from mission shape)? | Affects lost-link failsafe ladder timing constants (§7.7). |
+| Q13 | **All-season acceptance flight gates.** Dataset gates (§7.4) are committed; flight-test gates are not. Open: minimum number of real flights per season before MVP acceptance, per-season acceptance pass criteria. | Affects MVP sign-off scope. |
+| Q14 | **Movement detection at zoom-in — fallback selection.** If classical OpenCV optical flow / global-motion estimation does not meet the per-zoom-band false-positive cap at zoom-in, the fallback module choice is open: learned optical flow (RAFT / FlowNet derivative) vs CNN motion segmentation vs IMU-tighter-coupled classical CV. The interface contract (`Frame + telemetry → Vec<MovementCandidate>`) is fixed; the implementation is replaceable. | Blocks `movement_detector` zoom-in scope if classical CV fails benchmark gate. |
+
+---
+
+## 9. Out of Scope
+
+- Multi-airframe coordination, fleet management, swarm logic.
+- Mission re-planning beyond middle-waypoint inserts.
+- Mission planning / route selection for arbitrary mission shapes (only intra-region routing).
+- GPS-denied navigation algorithms (delegated to the GPS-denied service, `../_docs/11_gps_denied.md`).
+- Cloud-hosted VLM or any external inference dependency.
+- Encrypted transport beyond what MAVLink-2 message signing and modem-level link encryption already provide.
+- Annotation tooling, model training, dataset curation (separate `ai-training` repo).
+- Operator browser UI (Ground Station hosts it; autopilot only feeds it).
+
+---
+
+## 10. External Suite Documents
+
+These suite-level documents live in the parent suite repo (`../_docs/`) and are consumed by autopilot but **not owned** by autopilot.
+
+| Suite-level path | Owner / primary-for | What autopilot uses it for |
+|---|---|---|
+| `../_docs/00_top_level_architecture.md` | suite (cross-cutting) | Suite topology, deployment tiers (`edge`), the **flight-gate convention** (`/run/azaion/in-flight` — written by autopilot, read by `model-sync.service`), Watchtower OTA model. Defines autopilot's place in the 11-component system. |
+| `../_docs/02_missions.md` | `missions` repo (.NET service) | Mission / Waypoint / Vehicle schemas. Autopilot consumes the missions API via `mission_client`. |
+| `../_docs/03_detections.md` | `detections` repo (Cython service) | Detections API spec. Autopilot consumes via bi-directional gRPC in `detection_client`. |
+| `../_docs/04_system_design_clarifications.md` | suite (cross-cutting) | REST patterns, stream-detection protocol, edge-device connection semantics. Defines the Ground Station push contract used by `telemetry_stream`. |
+| `../_docs/11_gps_denied.md` | `gps-denied-onboard` / `gps-denied-desktop` (shared primary) | GPS-Denied service architecture. Autopilot does NOT host any GPS-denied code; it consumes corrected GPS through the shared edge data path. |
+| `../_docs/12_ai_training.md` | `ai-training` repo | AI training pipeline. Autopilot consumes the resulting ONNX/TensorRT models via the rclone model-sync timer (flight-gate-aware). |
@@ -0,0 +1,76 @@
+# Component — `detection_client`
+
+**Layer**: Perception (data plane in)
+**Status**: forward-looking design (Rust)
+
+## 1. Purpose
+
+Bi-directional gRPC client to the external `../detections` service. Streams frames out and receives bounding-box detections back. The same bounding boxes are reused by `semantic_analyzer` (Tier 2 ROI selection) and by `telemetry_stream` (operator overlay). This is the only component in autopilot that talks to `../detections`.
+
+## 2. Inputs
+
+| Input | Source | Cadence | Notes |
+|---|---|---|---|
+| `Frame` | `frame_ingest` | up to 30 fps | Skipped when `ai_locked` is set. |
+| Tier-1 service config | startup config | once | gRPC endpoint, TLS settings, request budget, max concurrent streams. |
+
+## 3. Outputs
+
+| Output | Consumer | Shape |
+|---|---|---|
+| `DetectionBatch` | `scan_controller`, `semantic_analyzer`, `telemetry_stream` | `{ frame_seq: u64, detections: Vec<Detection>, latency_ms, model_version }` |
+| Health metric | health aggregator | gRPC connection state, `requests_in_flight`, `latency_p50/p99`, `errors_by_kind`, `model_version`. |
+
+`Detection` mirrors the `../detections` contract: `{ class_id, class_name, confidence, bbox_normalized, optional_mask_or_polyline, source_frame_seq }`.
+
+## 4. Key Responsibilities
+
+- Maintain a single bi-directional gRPC stream to `../detections`. Reconnect on stream loss with bounded exponential backoff.
+- Frame budgeting: respect the Tier-1 ≤100 ms/frame target by dropping older in-flight frames if a new frame arrives before the previous response (configurable).
+- Validate the response payload against the schema version the client was built against. Surface a hard error on schema mismatch; do not silently downcast.
+- Tag each `DetectionBatch` with the source frame's monotonic timestamp so downstream consumers can compute end-to-end latency.
+
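The frame-budgeting rule above (drop older in-flight frames rather than queue them) could be sketched as follows. This is a minimal illustration, not the component's actual implementation: `InFlightWindow`, `on_send`, and `on_response` are hypothetical names, and the window capacity stands in for the "configurable" knob the text mentions.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding window of in-flight frame sequence numbers.
/// When the window is full, the oldest entry is evicted (drop-oldest).
pub struct InFlightWindow {
    max_in_flight: usize,
    seqs: VecDeque<u64>,
    dropped_total: u64,
}

impl InFlightWindow {
    pub fn new(max_in_flight: usize) -> Self {
        assert!(max_in_flight > 0, "window must hold at least one frame");
        Self { max_in_flight, seqs: VecDeque::new(), dropped_total: 0 }
    }

    /// Registers a frame that was just sent. Returns the sequence numbers
    /// evicted to make room, oldest first.
    pub fn on_send(&mut self, seq: u64) -> Vec<u64> {
        let mut evicted = Vec::new();
        while self.seqs.len() >= self.max_in_flight {
            if let Some(old) = self.seqs.pop_front() {
                evicted.push(old);
                self.dropped_total += 1;
            }
        }
        self.seqs.push_back(seq);
        evicted
    }

    /// Returns true if the response belongs to a frame that is still tracked.
    /// A false result means the frame was already evicted and the response is stale.
    pub fn on_response(&mut self, seq: u64) -> bool {
        match self.seqs.iter().position(|&s| s == seq) {
            Some(pos) => {
                self.seqs.remove(pos);
                true
            }
            None => false,
        }
    }

    pub fn in_flight(&self) -> usize {
        self.seqs.len()
    }

    pub fn dropped_total(&self) -> u64 {
        self.dropped_total
    }
}
```

With a capacity of 1 this degenerates to "only the newest frame is ever awaited", which is one plausible reading of the budget rule; larger capacities trade latency for fewer drops.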
+## 5. Internal State
+
+- gRPC channel, stream handle, reconnect state.
+- Sliding window of in-flight frame sequence numbers.
+- Last-known model version (echoed by `../detections` on each response or on stream init).
+
+State is in-process only.
+
+## 6. Failure Modes
+
+| Failure | Detection | Behaviour |
+|---|---|---|
+| `../detections` unreachable | gRPC connect error | Bounded exponential backoff; health → red after threshold; `scan_controller` continues but the `detection_client` health flag is red. |
+| Mid-stream cancellation by server | stream error | Reopen the stream and retry the latest in-flight frame on a best-effort basis; older in-flight frames are not retried. |
+| Schema mismatch | response decode error | Hard error to the health aggregator; reject the response; alert. |
+| Model version change at runtime | new `model_version` on the stream | Log it; if the change implies new classes, surface to `scan_controller` so per-class thresholds can be reloaded. |
+| Consistent latency above budget | `latency_p99 > 100 ms` over a sliding window | Health → yellow; `scan_controller` may degrade to alternate-frame inference. |
+
+## 7. Dependencies
+
+**In-process**: `frame_ingest` (input), `scan_controller` / `semantic_analyzer` / `telemetry_stream` (output).
+
+**External**:
+- `../detections` gRPC service. Contract owner: `../_docs/03_detections.md`. Bi-directional streaming.
+
+## 8. Non-Functional Targets
+
+| Concern | Target |
+|---|---|
+| Per-frame round-trip latency | ≤100 ms (Tier-1 NFR; mostly owned by `../detections`, autopilot's call budget respects it) |
+| Reconnect latency | ≤2 s after `../detections` returns |
+| Throughput | up to 30 fps at 1080p |
+| Backpressure | drop oldest in-flight rather than queue indefinitely |
+
+## 9. Open Questions
+
+- Versioning strategy of the gRPC contract (covered in `architecture.md §8 Q4`).
+
+## 10. References
+
+- `architecture.md §1`, `§3`, `§7.6`.
+- `system-flows.md §F1`.
+- `../_docs/03_detections.md`.
+- `data_model.md §Detection`, `§DetectionBatch`.
@@ -0,0 +1,74 @@
+# Component — `frame_ingest`
+
+**Layer**: Perception (data plane in)
+**Status**: forward-looking design (Rust)
+
+## 1. Purpose
+
+Pull RTSP from the ViewPro A40 camera, decode H.264 / H.265 to raw frames, attach a monotonic timestamp + sequence number, and hand each frame to the downstream consumers (`detection_client`, `movement_detector`, `telemetry_stream`) without copying frame buffers more than once.
+
+Frames are the system's primary input. Everything downstream of `frame_ingest` is rate-limited by it.
+
+## 2. Inputs
+
+| Input | Source | Cadence | Notes |
+|---|---|---|---|
+| RTSP video stream | ViewPro A40 (via airframe IP/port) | 30 fps at 1080p (60 fps capable) | TCP or UDP transport per camera config. Re-opens on failure with bounded backoff. |
+| Camera startup config | Static config (env or CLI) | once at process start | Stream URL, transport, decode codec preference. |
+| `bringCameraDown` / `bringCameraUp` health signal | local supervisor (if present) | event | Optional. Used by deployments that gate AI access to the camera (e.g., during RC takeover). When `down` is asserted, `frame_ingest` continues decoding for `telemetry_stream` but flags frames as "AI-locked" so downstream consumers skip detection. |
+
+## 3. Outputs
+
+| Output | Consumer | Shape |
+|---|---|---|
+| `Frame` | `detection_client`, `movement_detector`, `telemetry_stream` | `{ seq: u64, capture_ts_monotonic: ns, decode_ts_monotonic: ns, pixels: Arc<Bytes>, width, height, pix_fmt, ai_locked: bool }` |
+| Health metric | health aggregator | `frames/s`, `decode_ms_p50/p99`, `last_frame_age_ms`, `reopens_total`, `decode_errors_total` |
+
+## 4. Key Responsibilities
+
+- Open the RTSP session and recover from transient connection loss with bounded exponential backoff.
+- Decode frames using a hardware decoder where available (NVDEC on Jetson) with software fallback.
+- Stamp each frame with a monotonic capture timestamp at the earliest practical point in the pipeline; this is what `movement_detector` uses for telemetry-skew checks.
+- Publish frames through a single multi-consumer channel (Tokio broadcast or equivalent) using `Arc<Bytes>` for pixel data so consumers do not copy.
+- Drop frames if downstream consumers fall behind beyond a configured queue depth; record the drop with a reason (`detection_client_slow`, `movement_detector_slow`, or `telemetry_slow`) and surface it through the health endpoint.
+
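The zero-copy fan-out and drop-oldest behaviour described above might look roughly like this. It is a sketch under stated assumptions: `Arc<[u8]>` stands in for the `Arc<Bytes>` named in the design so the example needs only the standard library, a plain `VecDeque` per consumer stands in for the broadcast channel, and `ConsumerQueue` / `publish` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Reduced stand-in for the `Frame` shape; cloning copies the Arc pointer,
/// not the pixel buffer.
#[derive(Clone)]
pub struct Frame {
    pub seq: u64,
    pub pixels: Arc<[u8]>,
    pub ai_locked: bool,
}

/// Hypothetical bounded per-consumer queue with a drop counter.
pub struct ConsumerQueue {
    pub name: &'static str,
    depth: usize,
    queue: VecDeque<Frame>,
    pub drops: u64,
}

impl ConsumerQueue {
    pub fn new(name: &'static str, depth: usize) -> Self {
        assert!(depth > 0, "queue depth must be at least 1");
        Self { name, depth, queue: VecDeque::new(), drops: 0 }
    }

    /// Enqueues a frame; if the consumer is behind, the oldest frame is dropped.
    pub fn push(&mut self, frame: Frame) {
        if self.queue.len() == self.depth {
            self.queue.pop_front();
            self.drops += 1;
        }
        self.queue.push_back(frame);
    }

    pub fn pop(&mut self) -> Option<Frame> {
        self.queue.pop_front()
    }
}

/// Hands the same frame to every consumer without duplicating pixel data.
pub fn publish(frame: &Frame, consumers: &mut [ConsumerQueue]) {
    for consumer in consumers.iter_mut() {
        consumer.push(frame.clone());
    }
}
```

Keeping a separate counter per consumer is what lets the health endpoint attribute a drop to the slow consumer rather than to the pipeline as a whole.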
+## 5. Internal State
+
+- RTSP session handle and reconnect state (closed / connecting / streaming / failing).
+- Last-frame timestamp and sequence number.
+- Per-consumer drop counters.
+
+State is in-process only; nothing persists across restarts.
+
+## 6. Failure Modes
+
+| Failure | Detection | Behaviour |
+|---|---|---|
+| RTSP connection refused / lost | TCP connect error / read timeout | Bounded exponential backoff (1 s → 30 s cap); health flips to yellow after first failure, red after `last_frame_age_ms` exceeds a configured threshold. |
+| Decode error on a single frame | decoder returns error | Drop the frame; increment `decode_errors_total`; do not abort the stream. |
+| Decoder cold-start latency | first-frame timestamp far from session-open | Surface `decode_ms_first_frame` once; not an alert by itself. |
+| Downstream consumer slow | broadcast channel back-pressure | Drop the oldest frame for that consumer; counter-tagged drop; warning on sustained drops. |
+| Camera output format mismatch | unexpected SPS/PPS | Hard-fail at session open with an explicit error; do not silently pick a wrong decode path. |
+
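The "1 s → 30 s cap" reconnect policy in the table above can be illustrated with a small helper. The doubling factor is an assumption (the text only fixes the initial delay and the cap), jitter is omitted for brevity, and `Backoff` is a hypothetical name.

```rust
use std::time::Duration;

/// Hypothetical bounded exponential backoff: the delay doubles after each
/// failed attempt (assumed factor) and never exceeds `cap`.
pub struct Backoff {
    initial: Duration,
    cap: Duration,
    next: Duration,
}

impl Backoff {
    pub fn new(initial: Duration, cap: Duration) -> Self {
        let initial = initial.min(cap);
        Self { initial, cap, next: initial }
    }

    /// Returns the delay to wait before the next reconnect attempt.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.next;
        self.next = (self.next * 2).min(self.cap);
        delay
    }

    /// Call after a successful reconnect so the next failure starts from `initial`.
    pub fn reset(&mut self) {
        self.next = self.initial;
    }
}
```

With `Backoff::new(Duration::from_secs(1), Duration::from_secs(30))`, successive failures wait 1, 2, 4, 8, 16, 30, 30 seconds. The same helper would fit the other components in this document that call for bounded exponential backoff.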
+## 7. Dependencies
+
+**In-process**: none upstream; downstream consumers are `detection_client`, `movement_detector`, `telemetry_stream`.
+
+**External**:
+- ViewPro A40 RTSP (live).
+- Hardware video decoder (NVDEC on Jetson) via FFmpeg / GStreamer or a Rust binding.
+
+## 8. Non-Functional Targets
+
+| Concern | Target |
+|---|---|
+| End-to-end frame latency (RTSP rx → publish to consumers) | ≤30 ms p99 on Jetson Orin Nano. |
+| Frame drop rate | ≤0.1 % under normal conditions. |
+| Reconnect latency after camera reboot | ≤5 s from camera availability. |
+| Memory | one decoded-frame buffer pool with bounded size; no unbounded growth on slow consumers. |
+
+## 9. References
+
+- `architecture.md §1 System Context`, `§3 Components`, `§7.6 Solution Architecture`.
+- `system-flows.md §F1 Frame pipeline`.
+- `data_model.md §Frame`.
@@ -0,0 +1,78 @@
+# Component — `gimbal_controller`
+
+**Layer**: Action (data plane out)
+**Status**: forward-looking design (Rust); ViewPro A40 vendor protocol
+
+## 1. Purpose
+
+Drives the ViewPro A40 gimbal: pan (yaw), tilt (pitch), and zoom. Honours the ≤2 s zoom-transition budget and ≤500 ms decision-to-movement latency. Owns the zoom-out sweep, the smooth-pan path-tracking primitive used during the zoom-in level (follow-the-footpath behaviour), and the centre-window primitive used during target-follow.
+
+## 2. Inputs
+
+| Input | Source | Cadence | Notes |
+|---|---|---|---|
+| `GimbalCommand` | `scan_controller` | per state-machine tick or per zoom-in plan step | yaw / pitch / zoom goal; or pan plan; or centre-on-target. |
+| Sweep config | startup config | once | Zoom-out sweep pattern (pendulum / raster / lawn-mower — see `architecture.md §8 Q1`). |
+| Live gimbal status | ViewPro A40 (vendor protocol) | as emitted by camera | yaw / pitch / zoom feedback + faults. |
+
+## 3. Outputs
+
+| Output | Consumer | Shape |
+|---|---|---|
+| Vendor-protocol commands | ViewPro A40 (UDP) | yaw / pitch / zoom commands |
+| `GimbalState` | `frame_ingest` (for telemetry tagging), `movement_detector` (for ego-motion compensation) | `{ yaw, pitch, zoom, ts_monotonic, command_in_flight: bool }` |
+| Health metric | health aggregator | `commands_per_min`, `decision_to_movement_p99_ms`, `zoom_transition_p99_ms`, `vendor_faults_total`. |
+
+## 4. Key Responsibilities
+
+- Send vendor-protocol commands to the ViewPro A40 over UDP. Re-issue on timeout with bounded retry.
+- Run the zoom-out sweep pattern when `scan_controller` is in `ZoomedOut` (pattern itself depends on `architecture.md §8 Q1` resolution).
+- For the zoom-in path-follow, accept a pan plan (sequence of yaw / pitch / zoom goals with timing) from `scan_controller` / `semantic_analyzer` and execute it smoothly.
+- For target-follow, accept a centre-on-target stream (normalized target bbox) from `scan_controller` and command the gimbal to keep the target inside the centre 25 % of frame while visible.
+- Stamp every emitted command with a monotonic timestamp so `movement_detector` can synchronise it with frames.
+- Surface vendor-protocol faults to health and to `scan_controller`.
+
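The centre-window check behind target-follow could be sketched like this. Two assumptions are made loudly here: "centre 25 %" is read as a window spanning 25 % of each frame dimension (the text does not say whether it means dimension or area), and the bbox centre is taken as the target position. `BBox` and `centre_error` are hypothetical names; the window fraction is a parameter so the other reading is easy to plug in.

```rust
/// Normalized bounding box, coordinates in [0, 1] with the origin at top-left.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x_min: f32,
    pub y_min: f32,
    pub x_max: f32,
    pub y_max: f32,
}

impl BBox {
    pub fn centre(&self) -> (f32, f32) {
        ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)
    }
}

/// Returns `None` when the target centre already lies inside the centre window,
/// otherwise the normalized (dx, dy) offset from the frame centre that a
/// gimbal correction would need to cancel.
pub fn centre_error(bbox: &BBox, window_fraction: f32) -> Option<(f32, f32)> {
    let (cx, cy) = bbox.centre();
    let half = window_fraction / 2.0;
    let (dx, dy) = (cx - 0.5, cy - 0.5);
    if dx.abs() <= half && dy.abs() <= half {
        None
    } else {
        Some((dx, dy))
    }
}
```

Mapping the returned offset to yaw / pitch rates depends on the current zoom and field of view, which the vendor protocol and `architecture.md §8 Q1` leave open, so that step is deliberately not shown.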
+## 5. Internal State
+
+- Last-known commanded yaw / pitch / zoom.
+- Last-known reported yaw / pitch / zoom (from gimbal feedback).
+- Sweep pattern state (current direction, dwell counter).
+- Current execution mode: `Sweep | PanPlan | CentreOnTarget | Idle`.
+
+State is in-process only.
+
+## 6. Failure Modes
+
+| Failure | Detection | Behaviour |
+|---|---|---|
+| ViewPro A40 not responding | command timeout | Bounded exponential backoff; health → yellow then red; `scan_controller` is informed and may pause zoom-in. |
+| Decision-to-movement above budget | self-instrumented | Health → yellow; investigate (likely UDP loss or vendor firmware issue). |
+| Zoom transition stalls | feedback shows no zoom progress | Re-issue command; health → yellow; report to `scan_controller`. |
+| Target lost during target-follow | feedback + tracker | Surface `target_lost` to `scan_controller`; controller decides to release follow. |
+| Conflicting commands | execution-mode mismatch | Reject the lower-priority command; log a hard error; never silently merge. |
+
+## 7. Dependencies
+
+**In-process** (input): `scan_controller`.
+**In-process** (output): `frame_ingest`, `movement_detector` (timestamped state).
+
+**External**: ViewPro A40 over UDP (vendor protocol).
+
+## 8. Non-Functional Targets
+
+| Concern | Target |
+|---|---|
+| Decision-to-movement latency | ≤500 ms |
+| Zoom transition (medium → high) | ≤2 s |
+| Sweep pattern stability | bounded jitter; no overshoot beyond configured FOV bounds |
+| Target-follow centre-window | target inside centre 25 % of frame while visible |
+
+## 9. Open Questions
+
+- Sweep pattern specification (`architecture.md §8 Q1`): pendulum / raster / lawn-mower; FOV per zoom tier; dwell time per direction.
+
+## 10. References
+
+- `architecture.md §3`, `§6 NFR`, `§7.6 Solution Architecture`.
+- `system-flows.md §F2 Movement detection (zoom-out + zoom-in)`.
+- `data_model.md §GimbalState`.
@@ -0,0 +1,124 @@
+# Component — `mapobjects_store`
+
+**Layer**: Decision + Memory
+**Status**: forward-looking design (Rust); on-device working copy of the central MapObjects state, mission-bracketed
+
+## 1. Purpose
+
+On-device, H3-indexed working copy of the centrally maintained MapObjects state plus the IgnoredItems list, scoped to the active mission's bounding box. Computes new / moved / existing / removed diffs across survey passes and is the source of truth for the operator-decline suppression rule **for the duration of the active mission**.
+
+This is **not** a private database. It is hydrated pre-flight from the central `missions` API (`/missions/{id}/mapobjects`) and the mission's full pass diff is pushed back post-flight. The central observation log + computed current view are authoritative across missions and across UAVs (`architecture.md §7.13`).
+
+## 2. Inputs
+
+| Input | Source | Cadence | Notes |
+|---|---|---|---|
+| Pre-flight pull payload | `mission_client` (from `missions` API) | once per mission | Hydrates `current_state` + `pending_ignored`. |
+| New detection / movement candidate (with MGRS + class + size) | `scan_controller` | per detection | Each is classified as new / moved / existing. |
+| `IgnoredItem` append | `scan_controller` (on operator decline) | event | `(MGRS, class_group)` plus operator metadata. |
+| End-of-pass marker | `scan_controller` / `mission_executor` | event per pass over a region | Triggers the removed-candidate sweep. |
+| Mission delete cascade | suite-level missions API hook (process-level config; not a network call) | event | Drops mission-scoped objects on mission deletion. |
+| Post-flight push trigger | `mission_executor` | once per mission, on terminal state | Causes `mission_client` to drain `pending_observations` + `pending_ignored` to the central API. |
+
+## 3. Outputs
+
+| Output | Consumer | Shape |
+|---|---|---|
+| `MapObjectClassification` | `scan_controller` | `new \| moved \| existing \| removed_candidate` per detection |
+| `IgnoredItem` match | `scan_controller` | suppression flag for (MGRS, class_group) |
+| Pass diff | `mission_client` (post-flight upload) + `operator_bridge` (optionally surfaced in-flight) | new / moved / removed lists per pass |
+| Sync state | `scan_controller`, health aggregator | `synced \| cached_fallback \| degraded`; `pending_observations_count`, `pending_ignored_count` |
+
+## 4. Key Responsibilities
+
+- **Pre-flight hydrate** from `mission_client` pull. Establish `current_state` and `pending_ignored`. Surface `sync_state` (`synced` or `cached_fallback` or `degraded`).
+- Compute the H3 cell for each detection at the configured resolution (default res 10, roughly 65–75 m average edge length, ~15,000 m² cell area).
+- Build the composite key `H3_cell + class`. Maintain an in-memory hashmap; persist asynchronously to disk for crash recovery.
+- Answer queries: `classify(detection) → new | moved | existing` using k-ring lookup and `(distance_threshold_m, move_threshold_m, similar_classes)` configuration.
+- After a region's scan-pass ends, return objects in the region that were not re-observed as `removed_candidate`s (the operator decides on actual removal).
+- Maintain the `IgnoredItem` set; answer suppression queries (`is_ignored(MGRS, class_group)`).
+- Append every NEW / MOVED / EXISTING / REMOVED-CANDIDATE / IgnoredItem event to `pending_observations` / `pending_ignored` for the post-flight push (in-flight central writes are forbidden — Frozen choice 6 in `architecture.md §7.3`).
+- **Post-flight push**: hand the contents of `pending_observations` + `pending_ignored` to `mission_client` for `POST /missions/{id}/mapobjects` and `POST /missions/{id}/mapobjects/ignored`. On ack, clear pending; on failure, persist for retry.
+- On `DELETE /missions/{id}` cascade signal (received via `mission_client`), drop all objects scoped to that mission. The central side cascades as well.
+
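The removed-candidate sweep described above (objects in a region that were not re-observed during the pass) might be tracked along these lines. This is a sketch, not the store's real data model: the H3 cell is represented as a bare `u64`, the composite key is `(cell, class)`, and `PassTracker` with its methods is a hypothetical name.

```rust
use std::collections::HashSet;

/// Composite key: (H3 cell index as u64, class name).
pub type ObjectKey = (u64, String);

/// Hypothetical per-pass tracker for removed-candidate detection.
#[derive(Default)]
pub struct PassTracker {
    known: HashSet<ObjectKey>,
    seen_this_pass: HashSet<ObjectKey>,
}

impl PassTracker {
    /// Loads objects from the pre-flight pull.
    pub fn hydrate(&mut self, keys: impl IntoIterator<Item = ObjectKey>) {
        self.known.extend(keys);
    }

    /// Records an observation; returns true if the key was not known before.
    pub fn observe(&mut self, key: ObjectKey) -> bool {
        self.seen_this_pass.insert(key.clone());
        self.known.insert(key)
    }

    /// On the end-of-pass marker, returns known objects inside the region that
    /// were not re-observed. They stay in `known`: the operator decides on
    /// actual removal.
    pub fn end_of_pass(&mut self, region_cells: &HashSet<u64>) -> Vec<ObjectKey> {
        let mut removed: Vec<ObjectKey> = self
            .known
            .iter()
            .filter(|k| region_cells.contains(&k.0) && !self.seen_this_pass.contains(*k))
            .cloned()
            .collect();
        removed.sort();
        // Forget observations for this region only; other regions may still be mid-pass.
        self.seen_this_pass.retain(|k| !region_cells.contains(&k.0));
        removed
    }
}
```

The k-ring lookup and the new / moved / existing thresholds are intentionally left out, since their exact rules depend on configuration the text only names.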
+## 5. Sync state machine
+
+```text
+fresh_boot
+   │
+   ├──> pre-flight pull
+   │       │
+   │       ├── 200 OK ────────────> synced
+   │       ├── unreachable ────────> [operator ack required]
+   │       │                           │
+   │       │                           ├── ack on cache ──> cached_fallback
+   │       │                           └── abort ─────────> BIT fail
+   │       └── 4xx ─────────────────> BIT fail
+   │
+   ├── (during flight; in-process writes only)
+   │       │
+   │       ├── pending_observations grow
+   │       └── pending_ignored grow
+   │
+   └── post-flight push
+           │
+           ├── 200 OK on both endpoints ──> synced (pending cleared)
+           ├── partial ────────────────────> retry per-endpoint
+           └── persistent failure ─────────> degraded (operator warning, manual replay)
+```
+
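The diagram above can be encoded as an explicit transition function, which makes illegal transitions visible at the call site. This is a sketch: the enum and event names are hypothetical, the intermediate `AwaitingOperatorAck` state is introduced here to model the "operator ack required" branch, and the `Degraded → Synced` arm assumes that a successful manual replay counts as a completed push.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncState {
    FreshBoot,
    AwaitingOperatorAck,
    Synced,
    CachedFallback,
    Degraded,
    BitFail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncEvent {
    PullOk,
    PullUnreachable,
    PullClientError, // 4xx on the pre-flight pull
    OperatorAckCache,
    OperatorAbort,
    PushOk, // 200 OK on both endpoints
    PushPersistentFailure,
}

/// Returns the next state, or `None` if the event is not valid in this state.
pub fn transition(state: SyncState, event: SyncEvent) -> Option<SyncState> {
    use SyncEvent::*;
    use SyncState::*;
    match (state, event) {
        (FreshBoot, PullOk) => Some(Synced),
        (FreshBoot, PullUnreachable) => Some(AwaitingOperatorAck),
        (FreshBoot, PullClientError) => Some(BitFail),
        (AwaitingOperatorAck, OperatorAckCache) => Some(CachedFallback),
        (AwaitingOperatorAck, OperatorAbort) => Some(BitFail),
        (Synced | CachedFallback, PushOk) => Some(Synced),
        (Synced | CachedFallback, PushPersistentFailure) => Some(Degraded),
        // Assumption: a successful manual replay clears the degraded state.
        (Degraded, PushOk) => Some(Synced),
        _ => None,
    }
}
```

A partial push does not appear as an event because, per the diagram, it only triggers a per-endpoint retry and leaves the state unchanged until both endpoints succeed or retries are exhausted.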
+## 6. Internal State
+
+- In-memory hashmap of `(H3_cell + class) → MapObject`.
+- `IgnoredItem` set keyed by `(MGRS, class_group)`.
+- Per-region pass tracker for removed-candidate detection.
+- `pending_observations`: ordered log of NEW / MOVED / REMOVED-CANDIDATE / EXISTING events not yet pushed centrally.
+- `pending_ignored`: ordered log of IgnoredItem appends not yet pushed centrally.
+- `sync_state`: enum + last-pull timestamp + last-push timestamp + last error.
+- Persistence layer (engine TBD — see Open Questions) for crash recovery and post-flight upload durability.
+
+## 7. Failure Modes
+
+| Failure | Detection | Behaviour |
+|---|---|---|
+| Pre-flight pull unreachable | network | Surface BIT degradation; operator must acknowledge cached fallback or abort. Never silent. |
+| Pre-flight pull stale beyond freshness window | last-fetch-at compared to configured staleness | `sync_state = degraded`; operator must acknowledge or abort. |
+| Persistence write failure | engine error | Log + retry; in-memory state continues authoritative for this mission; health → yellow. |
+| Persistence corruption on startup | checksum / open failure | Refuse to start with stale state; require explicit recovery (engine-specific); surface to operator at startup. |
+| H3 query inconsistency near cell boundaries | algorithmic | Always query the k-ring (k=2 default) so boundary objects are matched anyway. |
+| Mission cascade signal lost | absent signal | `DELETE /missions/{id}` is the only cleanup trigger; on lost signal, mission-scoped objects accumulate. Operator-driven manual purge is acceptable. |
+| Post-flight push partial success | per-endpoint status | Independent retry per endpoint; do not roll back the successful one. |
+| Post-flight push persistent failure | bounded retries exhausted | `sync_state = degraded`; pending diff persisted on disk; operator-visible warning; manual replay supported. Mission's central data integrity at risk until replayed. |
+| In-flight crash | startup detects non-empty `pending_*` for a terminated mission | `mission_client` runs the post-flight push at startup before BIT completes for any new mission. |
+
+## 8. Dependencies
+
+**In-process**: `scan_controller`, `mission_client` (for pull/push round-trips), `mission_executor` (for post-flight trigger).
+
+**External**: H3 spatial-index library (Rust crate). Persistent store engine — TBD (SQLite + H3 extension / KV / in-memory + snapshot — see Open Questions). Central API contract via `mission_client`'s extension of the `missions` API (per `architecture.md §7.13`).
+
+## 9. Non-Functional Targets
+
+| Concern | Target |
+|---|---|
+| Per-detection classify latency | O(1); p99 ≤1 ms |
+| Pre-flight pull time | ≤30 s for a 30 km × 30 km mission area (per `architecture.md §6 NFR`) |
+| Post-flight push time | ≤2 min for a 60 min mission's pass diff (per `architecture.md §6 NFR`) |
+| Persistent-store size (single mission) | bounded; configurable retention |
+| Crash recovery time | ≤2 s to a usable state; in-flight crash → next-boot push of pending |
+| Boundary correctness | guaranteed by k-ring query |
+
+## 10. Open Questions
+
+- **Engine choice** (`architecture.md §8 Q3`): SQLite + H3 extension / KV / in-memory + snapshot.
+- **Central API schema details** (`architecture.md §8 Q7`): paging strategy, photo-reference upload mechanism, observation-history retention policy.
+- **Conflict resolution rules** (`architecture.md §8 Q8`): exact projection from observation log to current view; REMOVED-claim expiry window; multi-class disambiguation.
+- Optimal H3 resolution per terrain class.
+- Class-group definitions (`military_vehicle_group` vs `concealed_position_group` vs `movement_candidate`) — currently in `scan_controller` config.
+
+## 11. References
+
+- `architecture.md §3`, `§5 Architectural Principles` (MapObjects are mission-bracketed and centrally synchronised), `§6 NFR`, `§7.9 MapObjects (H3 spatial index)`, `§7.10 Sync Message Format`, `§7.11 Target Relocation`, `§7.12 New vs Existing object detection`, `§7.13 MapObjects Sync`.
+- `system-flows.md §F7 MapObjects + ignored-items` (in-flight diff), `§F8 MapObjects sync (central DB, mission-bracketing)`.
+- `data_model.md §MapObject`, `§IgnoredItem`, `§MapObjectObservation`, `§MapObjectsBundle`.
+- `../_docs/02_missions.md` (mission cascade contract; new MapObjects endpoints).
@@ -0,0 +1,87 @@
+# Component — `mavlink_layer`
+
+**Layer**: Action (data plane out)
+**Status**: forward-looking design (Rust); hand-rolled (no third-party SDK)
+
+## 1. Purpose
+
+Hand-rolled MAVLink v2 transport. Implements only the ~10–15 commands this codebase needs (full list in `architecture.md §7.7`). Owns serialisation / deserialisation, heartbeat, sequence numbers, retry, and a single connection abstraction (UDP or serial, picked at startup from CLI / env). No third-party SDK — eliminating the largest current dependency-risk item.
+
+## 2. Inputs
+
+| Input | Source | Cadence | Notes |
+|---|---|---|---|
+| Outgoing `COMMAND_LONG`, `MISSION_*`, `SET_MODE` | `mission_executor` | per state transition | Hand-rolled message constructors per command. |
+| Outgoing heartbeat | self (timer) | 1 Hz | `HEARTBEAT` to keep the autopilot's GCS-link alive. |
+| Connection URI | startup config | once | `udp://...` or `serial:///dev/...`. |
+| MAVLink-2 signing config | startup config | once | If supported by the link, signing is enabled; otherwise the link is treated as trusted. |
+
+## 3. Outputs
+
+| Output | Consumer | Shape |
+|---|---|---|
+| Decoded MAVLink messages | `mission_executor`, `telemetry_stream`, `movement_detector` (for UAV motion telemetry) | typed enum per message kind |
+| Connection state | health aggregator | `connected`, `last_heartbeat_age_ms`, `tx_seq`, `rx_seq`, `parse_errors_total`, `signing_enabled`. |
+
+The supported message surface (concise list; full table in `architecture.md §7.7`):
+
+- `HEARTBEAT` (bidir)
+- `COMMAND_LONG` subset (out): arm/disarm, takeoff, set-mode, change-speed, change-alt, land, RTL
+- `COMMAND_ACK` (in)
+- `MISSION_COUNT`, `MISSION_REQUEST_INT`, `MISSION_ITEM_INT`, `MISSION_ACK`, `MISSION_SET_CURRENT`, `MISSION_CURRENT`, `MISSION_ITEM_REACHED`, `MISSION_CLEAR_ALL`
+- `GLOBAL_POSITION_INT`, `ATTITUDE`, `SYS_STATUS`, `EXTENDED_SYS_STATE`, `STATUSTEXT`
+- `SET_MODE` (out, fixed-wing)
+
+## 4. Key Responsibilities
+
+- Open and maintain the MAVLink connection (UDP or serial). Reconnect on transport loss with bounded backoff.
+- Encode outgoing messages with correct sequence numbers, system / component IDs, and (when enabled) MAVLink-2 signing.
+- Decode incoming messages with strict validation: reject malformed frames, unknown message IDs, and signing failures.
+- Emit a 1 Hz heartbeat. Detect autopilot heartbeat timeouts and surface to health.
+- Demux `COMMAND_ACK` to the originating caller (per `command_id`); enforce a wall-clock ack timeout.
+
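As an illustration of the encoding responsibility above, the following sketch serialises an unsigned MAVLink v2 `HEARTBEAT` frame. The wire layout (`0xFD` start byte, 10-byte header, little-endian payload, CRC-16/MCRF4XX seeded with the per-message `CRC_EXTRA`, which is 50 for `HEARTBEAT`) follows the public MAVLink specification; the `Heartbeat` struct and function names are hypothetical and not taken from this codebase. Signing is out of scope here, so `incompat_flags` is left at 0.

```rust
pub const MAVLINK_V2_STX: u8 = 0xFD;
pub const HEARTBEAT_MSG_ID: u32 = 0;
pub const HEARTBEAT_CRC_EXTRA: u8 = 50;

/// One step of the MAVLink checksum (CRC-16/MCRF4XX).
pub fn crc_accumulate(crc: u16, byte: u8) -> u16 {
    let mut tmp = byte ^ (crc as u8);
    tmp ^= tmp << 4;
    (crc >> 8) ^ ((tmp as u16) << 8) ^ ((tmp as u16) << 3) ^ ((tmp as u16) >> 4)
}

pub fn crc16_mcrf4xx(data: &[u8]) -> u16 {
    data.iter().fold(0xFFFF, |crc, &b| crc_accumulate(crc, b))
}

/// Hypothetical typed constructor for the HEARTBEAT payload.
pub struct Heartbeat {
    pub custom_mode: u32,
    pub mav_type: u8,
    pub autopilot: u8,
    pub base_mode: u8,
    pub system_status: u8,
}

impl Heartbeat {
    /// Encodes an unsigned MAVLink v2 frame: header, payload, CRC.
    pub fn encode(&self, seq: u8, sys_id: u8, comp_id: u8) -> Vec<u8> {
        // Wire order places the widest field first; the last byte is mavlink_version = 3,
        // so v2 trailing-zero truncation never shortens this payload.
        let mut payload = Vec::with_capacity(9);
        payload.extend_from_slice(&self.custom_mode.to_le_bytes());
        payload.extend_from_slice(&[
            self.mav_type,
            self.autopilot,
            self.base_mode,
            self.system_status,
            3,
        ]);

        // STX, len, incompat_flags, compat_flags, seq, sysid, compid, 24-bit msgid.
        let mut frame = vec![MAVLINK_V2_STX, payload.len() as u8, 0, 0, seq, sys_id, comp_id];
        frame.extend_from_slice(&HEARTBEAT_MSG_ID.to_le_bytes()[..3]);
        frame.extend_from_slice(&payload);

        // The CRC covers everything after STX, then the message-specific CRC_EXTRA byte.
        let crc = crc_accumulate(crc16_mcrf4xx(&frame[1..]), HEARTBEAT_CRC_EXTRA);
        frame.extend_from_slice(&crc.to_le_bytes());
        frame
    }
}
```

The resulting frame is 21 bytes: a 10-byte header, a 9-byte payload, and a 2-byte checksum. A conformance test against ArduPilot SITL, as the risk table suggests, remains the real check for this kind of hand-rolled encoder.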
+## 5. Internal State
+
+- Connection handle (UDP socket or serial port).
+- Outgoing sequence number.
+- In-flight command map (`command_id → (caller, deadline)`).
+- Per-message-kind parse error counters.
+
+State is in-process only.
+
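The in-flight command map (`command_id → (caller, deadline)`) could be sketched as below. It assumes at most one outstanding command per `command_id` and uses a `u32` as a stand-in caller handle; `AckTracker` and `AckOutcome` are hypothetical names. `std::time::Instant` is used for deadlines in this sketch, and the current time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum AckOutcome {
    /// The ack matched a pending command and is routed to its caller.
    Delivered { caller: u32 },
    /// No pending command with this id (late or unsolicited ack).
    Unsolicited,
}

/// Hypothetical demultiplexer for `COMMAND_ACK` messages.
pub struct AckTracker {
    timeout: Duration,
    in_flight: HashMap<u16, (u32, Instant)>,
}

impl AckTracker {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, in_flight: HashMap::new() }
    }

    /// Records an outgoing command and its ack deadline.
    pub fn register(&mut self, command_id: u16, caller: u32, now: Instant) {
        self.in_flight.insert(command_id, (caller, now + self.timeout));
    }

    /// Routes an incoming ack to the caller that issued the command.
    pub fn on_ack(&mut self, command_id: u16) -> AckOutcome {
        match self.in_flight.remove(&command_id) {
            Some((caller, _)) => AckOutcome::Delivered { caller },
            None => AckOutcome::Unsolicited,
        }
    }

    /// Removes and returns `(command_id, caller)` pairs whose deadline has passed,
    /// so the timeout can be bubbled up to the caller.
    pub fn expire(&mut self, now: Instant) -> Vec<(u16, u32)> {
        let mut expired: Vec<(u16, u32)> = self
            .in_flight
            .iter()
            .filter(|(_, (_, deadline))| *deadline <= now)
            .map(|(id, (caller, _))| (*id, *caller))
            .collect();
        for (id, _) in &expired {
            self.in_flight.remove(id);
        }
        expired.sort();
        expired
    }
}
```

Retry is deliberately absent: per the failure-mode table, the timeout is reported to `mission_executor`, which decides between retry and failure.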
+## 6. Failure Modes
+
+| Failure | Detection | Behaviour |
+|---|---|---|
+| Transport open failure | OS error | Bounded backoff; surface to health → red. |
+| Heartbeat from autopilot missing | wall-clock timeout | Surface `link_lost` to health and to `mission_executor`; do not silently fail. |
+| Command-ack timeout | wall-clock | Bubble timeout to `mission_executor`; the executor decides retry vs failure. |
+| Malformed inbound frame | parser error | Drop the frame; increment counter; do not abort the link. |
+| MAVLink-2 signing mismatch (if enabled) | signature check | Reject the frame; alert; do not silently accept. |
+| Sequence-number gap | rx_seq vs expected | Log; not a hard failure on its own. |
+
+## 7. Dependencies
+
+**In-process** (input): `mission_executor`.
+**In-process** (output): `mission_executor`, `telemetry_stream`, `movement_detector`.
+
+**External**: ArduPilot / PX4 over MAVLink v2 (UDP or serial).
+
+## 8. Non-Functional Targets
+
+| Concern | Target |
+|---|---|
+| Per-message round-trip on a healthy link | ≤50 ms p99 |
+| Heartbeat cadence | 1 Hz out |
+| Command-ack timeout | configurable; default 1 s, with retry handled by `mission_executor` |
+| Reconnect after transport loss | ≤2 s on serial / ≤5 s on UDP |
+| Message subset | ~10–15 commands only — adding more requires explicit design review |
+
+## 9. Open Questions
+
+- **MAVLink-2 message signing** (`architecture.md §8 Q6`): whether the airframe link enables signing or treats the link as trusted.
+
+## 10. References
+
+- `architecture.md §3`, `§5 Architectural Principles` (no MAVSDK, no silent error swallowing), `§7.7 MAVLink and Piloting`.
+- `system-flows.md §F6 Mission lifecycle`.