chore: sync .cursor from suite

2026-06-21 10:21:07 +00:00 · 2026-05-09 05:18:08 +03:00
parent 6ca4076aa3
commit 49fa340e9c
18 changed files with 131 additions and 40 deletions
@@ -13,6 +13,16 @@ alwaysApply: true
 ## Critical Thinking
 - Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.
 ## Skill Discipline
 Do exactly what the skill says. Nothing more.
 - No `git log` / `git diff` / `git blame` unless the skill explicitly calls for it.
 - No extra searches to "verify" inputs the skill already names.
 - No reading files outside the skill's documented inputs.
 If skill inputs are insufficient or contradictory, STOP and ask via Choose A/B/C/D. Do not invent extra investigation steps.
 ## Self-Improvement
 When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):
@@ -112,6 +112,15 @@ Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The aut
 The state file (`_docs/_autodev_state.md`) is a minimal pointer — only the current step. See `state.md` for the authoritative template, field semantics, update rules, and worked examples. Do not restate the schema here — `state.md` is the single source of truth.
 **Conciseness rule (authoritative).** The state file MUST stay short. Acceptable content per field:
 - `name` — the step title from the active flow's Step Reference Table. That's it.
 - `sub_step.name` — kebab-case identifier from the active sub-skill. That's it.
 - `sub_step.detail` — **leave empty (`""`) by default.** Add a one-line note ONLY when the next-session resumer cannot infer where to pick up from `phase` + `name` + on-disk artifacts alone (e.g. `"batch 2 of 4"`, `"blocked on D-PROJ-2 reply"`, `"variant 1b"`). NEVER use `detail` as a changelog, recap, or summary of completed work — those facts belong in the relevant `_docs/` artifact (glossary, traceability matrix, leftovers folder, retro report, etc.) and in git history.
 - **Total file size target: <30 lines.** If you're tempted to write more, you're using the wrong artifact — write in `_docs/` instead.
 Multi-line `detail` blobs that recap what was just completed are a smell. The state file is a *pointer*, not a logbook.
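
A minimal sketch of how the size target above could be checked mechanically. The function name and the idea of running it as a lint step are assumptions for illustration; only the "<30 lines" threshold comes from the rule itself.

```python
def state_file_problems(text: str, max_lines: int = 30) -> list[str]:
    """Return a list of problems; an empty list means the state file is within budget.

    Hypothetical helper: only the '<30 lines' target is taken from the rule above.
    """
    problems = []
    line_count = len(text.splitlines())
    # The rule says "<30 lines", so exactly 30 lines is already over budget.
    if line_count >= max_lines:
        problems.append(
            f"state file has {line_count} lines; target is fewer than {max_lines}"
        )
    return problems
```

A caller would read `_docs/_autodev_state.md`, pass its text in, and treat a non-empty result as a prompt to move the extra content into `_docs/`.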
 ## Trigger Conditions
 This skill activates when the user wants to:
@@ -225,7 +225,7 @@ State-driven: reached by auto-chain from Step 10.
 Action: Read and execute `.cursor/skills/test-run/SKILL.md`
-Verifies the implemented unit, integration, blackbox, and e2e tests pass before proceeding to spec and documentation sync.
+Verifies the implemented unit, integration, blackbox, and e2e tests pass before proceeding to spec and documentation sync. This is a hard product gate, not a harness-smoke gate: e2e/blackbox tests must exercise the actual implemented system through public runtime boundaries and compare actual outputs against `_docs/00_problem/input_data/expected_results/results_report.md` or referenced machine-readable expected-result files. Stubs are allowed only for external systems outside the product boundary; missing internal product implementation must fail or block the gate and send the flow back to Implement.
 ---
@@ -43,6 +43,21 @@ For each component (or the single provided component):
    Consumers read the contract file, not the producer's task spec. This prevents interface drift when the producer's implementation detail leaks into consumers.
 11. **Immediately after writing each task file**: create a work item ticket, link it to the component's epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.
 ## Runtime Completeness Decomposition Gate
 Before Step 2 is considered complete, scan `architecture.md`, `system-flows.md`, component descriptions, and the solution for named internal runtime capabilities and dependencies. Examples include BASALT/OpenVINS/Kimera, FAISS, DINOv2, ONNX/TensorRT, ALIKED/DISK, LightGlue, RANSAC, PostGIS, MAVLink emission, FDR rollover, and any "A-Z" user-visible pipeline.
 For every named internal capability:
 1. Ensure at least one implementation task explicitly owns the production integration or production algorithm.
 2. Do not treat "define protocol", "create adapter boundary", "add deterministic fallback", "create scaffold", or "prepare native bridge" as implementation of the capability unless the architecture explicitly says the real capability is out of scope.
 3. If a capability needs external hardware/data to verify, still create the production implementation task. Verification may be hardware-gated later; implementation must not be omitted.
 4. Add a `## Runtime Completeness` section to any affected task with:
   - named capability/dependency,
   - production code that must exist,
   - allowed external stubs, if any,
   - unacceptable substitutes such as fake/deterministic/internal stubs.
 ## Self-verification (per component)
 - [ ] Every task is atomic (single concern)
@@ -53,6 +68,7 @@ For each component (or the single provided component):
 - [ ] Every task has a work item ticket linked to the correct epic
 - [ ] Every shared-models / shared-API task has a contract file at `_docs/02_document/contracts/<component>/<name>.md` and a `## Contract` section linking to it
 - [ ] Every cross-cutting concern appears exactly once as a shared task, not N per-component copies
 - [ ] Every named internal runtime capability has a production implementation task, not only an interface/scaffold/fallback task
 ## Save action
@@ -13,12 +13,17 @@
 1. Read all test specs from `DOCUMENT_DIR/tests/` (`blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md`)
 2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
 3. Each task should reference the specific test scenarios it implements and the environment/test-data specs
-4. Dependencies:
+4. Add a **System Under Test Boundary** section to every e2e/blackbox test task:
   - The test must drive the product through public runtime boundaries and compare actual outputs to `_docs/00_problem/input_data/expected_results/results_report.md` and any referenced machine-readable expected-result files.
   - Stubs are allowed only for external systems outside the product boundary: flight controller/SITL, QGC observer, satellite-provider/Suite service, physical Jetson hardware, physical camera, licensed public datasets, and network services.
   - Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are not allowed for internal product modules that the scenario is meant to validate, such as VIO, safety/anchor wrapper, satellite retrieval, anchor verification, tile manager, MAVLink output adapter, or FDR.
   - If an internal module is not implemented, the test must fail/block as missing product implementation; it must not pass by replacing that module with a test stub.
 5. Dependencies:
   - In tests-only mode: blackbox test tasks depend on the test infrastructure bootstrap task (Step 1t)
-5. Write each task spec using `templates/task.md`
+6. Write each task spec using `templates/task.md`
-6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
+7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
-7. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
+8. Note task dependencies (referencing tracker IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.
+9. **Immediately after writing each task file**: create a work item ticket under the "Blackbox Tests" epic, write the work item ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[TRACKER-ID]_[short_name].md`.
 ## Self-verification
@@ -27,6 +32,7 @@
 - [ ] No task exceeds 5 complexity points
 - [ ] Dependencies correctly reference the test infrastructure task
 - [ ] Every task has a work item ticket linked to the "Blackbox Tests" epic
 - [ ] Every e2e/blackbox task forbids internal product stubs/fakes and requires comparison against expected-results artifacts
 ## Save action
@@ -10,6 +10,8 @@
 2. Check no gaps:
   - In implementation mode: every product interface in `architecture.md` has implementation task coverage
   - In tests-only mode: every test scenario in `traceability-matrix.md` is covered by a task
   - In implementation mode: every named internal runtime capability/dependency from architecture, solution, system flows, and component descriptions has a production implementation task, not only an interface/scaffold/fallback task
   - In tests-only mode: every e2e/blackbox task has a System Under Test Boundary section that forbids stubbing internal product modules and requires comparison to expected-results artifacts
 3. Check no overlaps: tasks don't duplicate work
 4. Check no circular dependencies in the task graph
 5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`
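
The circular-dependency check in item 4 can be sketched with Python's standard `graphlib` module. The mapping shape (task ID to the set of task IDs it depends on) and the function name are assumptions; the skill does not prescribe how the task graph is represented.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def find_cycle(dependencies: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task IDs, or None if the graph is acyclic.

    `dependencies` maps each task ID to the task IDs it depends on
    (a hypothetical in-memory form of `_dependencies_table.md`).
    """
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError as exc:
        # The second element of args holds the cycle; first and last node are the same.
        return list(exc.args[1])
    return None
```

When a cycle is returned, the listed task IDs are the ones whose dependency notes need to be revisited before the table is produced.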
@@ -19,6 +21,7 @@
 ### Implementation mode
 - [ ] Every product interface in `architecture.md` is covered by at least one implementation task
 - [ ] Every named internal runtime capability has a production implementation task
 - [ ] No circular dependencies in the task graph
 - [ ] Cross-component dependencies are explicitly noted in affected task specs
 - [ ] `_dependencies_table.md` contains every task with correct dependencies
@@ -26,6 +29,7 @@
 ### Tests-only mode
 - [ ] Every test scenario from `traceability-matrix.md` "Covered" entries has a corresponding task
 - [ ] Every e2e/blackbox task validates actual product behavior and allows stubs only for external systems
 - [ ] No circular dependencies in the task graph
 - [ ] Test task dependencies reference the test infrastructure bootstrap
 - [ ] `_dependencies_table.md` contains every task with correct dependencies
@@ -25,7 +25,8 @@ For each task the main agent receives a task spec, analyzes the codebase, implem
 - **Dependency-aware ordering**: tasks run only when all their dependencies are satisfied
 - **Batching for review, not parallelism**: tasks are grouped into batches so `/code-review` and commits operate on a coherent unit of work — all tasks inside a batch are still implemented one after the other
 - **Integrated review**: `/code-review` skill runs automatically after each batch
- **Completeness before testing**: product implementation is not done until code is checked against task outcomes, included scope, architecture/component promises, and unresolved scaffold/native placeholders — not just task AC tests
+- **Completeness before testing**: product implementation is not done until code is checked against task outcomes, included scope, architecture/component promises, named runtime dependencies, and unresolved scaffold/native placeholders — not just task AC tests
 - **Runtime dependency reality**: production code cannot satisfy a task by exposing only a protocol, fake runner, deterministic fallback, or "native bridge" placeholder when the task/architecture promises a concrete internal capability such as BASALT VIO, FAISS retrieval, LightGlue matching, or a full A-Z localization pipeline. Stubs are allowed only for external systems and tests.
 - **Auto-start**: batches start immediately — no user confirmation before a batch
 - **Gate on failure**: user confirmation is required only when code review returns FAIL
 - **Commit per batch**: after each batch is confirmed, commit. Ask the user whether to push to remote unless the user previously opted into auto-push for this session.
@@ -66,6 +67,7 @@ TASKS_DIR/
 ## Prerequisite Checks (BLOCKING)
 1. `TASKS_DIR/todo/` exists and contains at least one task file for the selected context — **STOP if missing**
   - Exception for Product implementation re-entry: if no selected product tasks remain in `todo/`, but the active autodev state is Step 7 or the latest product completeness report is missing/invalid/contains `FAIL`, skip directly to Step 15 (Product Implementation Completeness Gate). This gate may create remediation tasks and return to Step 1. Do not write a final implementation report from this state.
 2. `_dependencies_table.md` exists — **STOP if missing**
 3. At least one task is not yet completed — **STOP if all done**
 4. **Working tree is clean** — run `git status --porcelain`; the output must be empty.
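
The clean-working-tree prerequisite can be sketched as below. The function names are illustrative assumptions; the only behavior taken from the check above is that `git status --porcelain` must print nothing.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """An empty `git status --porcelain` listing means no modified, staged, or untracked files."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Run `git status --porcelain` in repo_dir and report whether the output is empty."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit (e.g. not a git repository) raises CalledProcessError
    )
    return is_clean(result.stdout)
```

Splitting the parsing from the subprocess call keeps the decision logic testable without a repository on disk.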
@@ -129,7 +131,7 @@ For each task in the batch, transition its ticket status to **In Progress** via
 For each task in the batch **in topological order, one at a time**:
 1. Read the task spec file.
 2. Respect the file-ownership envelope computed in Step 4 (OWNED / READ-ONLY / FORBIDDEN).
-3. Implement the feature and write/update tests for every acceptance criterion in the spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.
+3. Implement the feature and write/update tests for every acceptance criterion in the spec. Tests for internal product behavior must exercise the production implementation path. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still exist and skip/block with a clear prerequisite reason, but that skip does not make missing production code complete.
 4. Run the relevant tests locally before moving on to the next task in the batch. If tests fail, fix in-place — do not defer.
 5. Capture a short per-task status line (files changed, tests pass/fail, any blockers) for the batch report.
@@ -255,9 +257,13 @@ For each completed product task:
 1. Read these sections from the task spec: `Description`, `Outcome`, `Scope / Included`, `Acceptance Criteria`, `Non-Functional Requirements`, `Constraints`, and explicit named technologies or integrations.
 2. Compare those promises against actual source code, not only tests or report prose.
 3. Search the task's owned component files for unresolved implementation markers: `placeholder`, `stub`, `reserved`, `TODO`, `NotImplemented`, `pass`, `deterministic`, `fake`, `mock`, `scaffold`, `native bridge`, and empty native/readme-only integration directories. Ignore test fixtures/mocks only when they are under test-owned paths and not used as production behavior.
-4. Verify that each named runtime dependency in the task promise is either integrated behind the approved boundary or explicitly documented as a blocked prerequisite in the task/report. Examples: if a task promises FAISS, DINOv2, BASALT, LightGlue, OpenCV, RANSAC, a database, cloud service, or hardware SDK, the production code must contain that integration boundary; a deterministic fallback alone is not complete.
+4. Verify that each named runtime dependency in the task promise is integrated as production behavior, not merely represented by an interface. Examples: if a task promises FAISS, DINOv2, BASALT, LightGlue, OpenCV, RANSAC, a database, cloud service, or hardware SDK, the production code must either call that dependency or contain an adapter that loads and executes the real dependency package. A deterministic fallback, fake runner, empty `native/` package, or "bridge to be supplied later" is **FAIL** unless the task itself explicitly scoped the dependency out before implementation started.
-5. Verify tests exercise the real implementation path where local prerequisites exist. Environment-gated tests may skip only with an explicit prerequisite reason; they do not make missing production code complete.
+5. Distinguish internal implementation from external prerequisites:
-6. Classify each task:
+   - Internal product capabilities (VIO, anchor verification, cache retrieval, safety wrapper, FDR, MAVLink emission) must be implemented in production code before the task can pass.
   - External systems/hardware/data (Jetson device, physical camera, ArduPilot process, QGC, third-party service credentials, unavailable licensed dataset) may be `BLOCKED` only when production code exists and the missing prerequisite is outside the product boundary.
 6. Verify tests exercise the real implementation path where local prerequisites exist. Environment-gated tests may skip only with an explicit prerequisite reason; they do not make missing production code complete.
 7. For any architecture promise that describes an end-to-end user outcome, verify there is an executable production pipeline connecting the relevant components. Isolated component contracts and test-only harness orchestration are not enough.
 8. Classify each task:
   - **PASS**: task promises are implemented or explicitly out of scope in the task itself.
   - **BLOCKED**: production code exists but cannot be fully verified due to external hardware/data/license/runtime prerequisites; the blocker is explicit and tests report blocked/skipped with reason.
   - **FAIL**: promised production behavior is missing, only scaffolded, or only represented in tests/reports.
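
The marker search in item 3 could be approximated with a small scanner like the one below. This is a sketch under assumptions: the marker list is copied from item 3, but the function name, the case-insensitive whole-word matching, and the return shape are illustrative choices. Hits are only candidates for review, because words such as `pass` or `reserved` also appear in legitimate code, and derived forms such as `NotImplementedError` or plurals would need extra patterns.

```python
import re
from typing import List, Tuple

# Marker words taken from the completeness gate above.
MARKERS = [
    "placeholder", "stub", "reserved", "TODO", "NotImplemented", "pass",
    "deterministic", "fake", "mock", "scaffold", "native bridge",
]

# Whole-word, case-insensitive match so that e.g. "password" is not reported for "pass".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(marker) for marker in MARKERS) + r")\b",
    re.IGNORECASE,
)


def find_markers(source: str) -> List[Tuple[int, str]]:
    """Return (line number, lower-cased marker) pairs for every marker hit in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits
```

A reviewer would run this over the task's owned component files only, and skip test-owned paths as the gate describes.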
@@ -181,6 +181,8 @@ Categorized measurable criteria with markdown headers and bullet points:
 Every criterion must have a measurable value. Vague criteria like "should be fast" are not acceptable — push for "less than 400ms end-to-end".
 **AC must be design-independent**: describe testable outcomes only, with no libraries, algorithms, parameters, or design choices. Implementation follows AC, never the reverse. (IEEE 830 / Atlassian / GitScrum)
 ### input_data/
 At least one file. Options:
@@ -45,7 +45,7 @@
 - [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost
 - [ ] Component options are broad: component tables include baseline, production, open-source, commercial/vendor, SOTA/research, adjacent-domain, defer/no-build, and disqualified options where applicable
 - [ ] Tools/libraries verified: Suggested tools actually exist and work as described
- [ ] Component fit matrix completed: `06_component_fit_matrix.md` exists and every selected component/tool/pattern is marked `Selected`
+- [ ] Component fit matrix completed: `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) exists and every selected component/tool/pattern is marked `Selected`
 - [ ] No field-adjacent substitution: no selected candidate is chosen only because it solves a similar class of problem while failing the project's explicit constraints
 - [ ] Testing strategy covers AC: Tests map to acceptance criteria
 - [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements
@@ -80,7 +80,7 @@ When the research topic has Critical or High sensitivity level:
 ## Target Audience Consistency Check (BLOCKING)
 - [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries
- [ ] Every source has target audience annotated in `01_source_registry.md`
+- [ ] Every source has target audience annotated in `01_source_registry.md` (or category files under `01_source_registry/` if split)
 - [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only)
 - [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary
 - [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences
@@ -113,11 +113,11 @@ For every lead candidate that is a library/SDK/framework/service:
 - [ ] The exact mode/configuration the project will use is pinned in one explicit sentence (inputs, outputs, runtime); no vague "supports X" language
 - [ ] `context7` (or equivalent docs lookup) was run for the candidate, with at least 3 queries: mode enumeration, project's exact mode, disqualifier probe
- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md`
+- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md` (or files under `01_source_registry/` if split)
- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌
+- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` / `02_fact_cards/` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌
 - [ ] When the MVE inputs or outputs do not exactly match the project's, the mismatch is cited from the official docs (not inferred), and the candidate is `Experimental only` or `Rejected`
 - [ ] When a library has multiple modes, each project-relevant mode appears as its own candidate row (not a single library row that softens across modes)
- [ ] Restrictions × Candidate-Modes sub-matrix in `06_component_fit_matrix.md` is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion
+- [ ] Restrictions × Candidate-Modes sub-matrix in `06_component_fit_matrix.md` (or files under `06_component_fit_matrix/` if split) is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion
 - [ ] Sub-matrix uses ✅ / ❌ / ❓ / N/A only — no free-form prose substitutes
 - [ ] No `Selected` candidate has any ❌ or ❓ cell in its sub-matrix
 - [ ] "Validation gate required" footnotes are explicitly classified as either *API capability* (must be resolved here) or *runtime quality* (may be carried forward)
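
The two sub-matrix rules above (allowed cell values only, and no ❌ or ❓ for a `Selected` candidate) can be sketched as a validator. The function name and the flat list-of-cells input are assumptions; how the matrix is actually parsed out of `06_component_fit_matrix.md` is not specified here.

```python
from typing import List

ALLOWED_CELLS = {"✅", "❌", "❓", "N/A"}
BLOCKING_CELLS = {"❌", "❓"}


def check_candidate(status: str, cells: List[str]) -> List[str]:
    """Return human-readable problems for one candidate row set; empty means it passes.

    `cells` holds one value per numbered restriction / acceptance criterion
    (a hypothetical, already-parsed form of the sub-matrix).
    """
    problems = []
    for row, cell in enumerate(cells, start=1):
        if cell not in ALLOWED_CELLS:
            problems.append(f"row {row}: free-form value {cell!r} is not allowed")
        elif status == "Selected" and cell in BLOCKING_CELLS:
            problems.append(f"row {row}: {cell} is not allowed for a Selected candidate")
    return problems
```

For example, a candidate marked `Experimental only` may keep a ❓ cell, while the same cell on a `Selected` candidate is reported.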
@@ -89,7 +89,7 @@ Value Translation:
 ## Source Registry Entry Template
-For each source consulted, immediately append to `01_source_registry.md`:
+For each source consulted, immediately append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if the artifact has been split — see splittable-artifacts convention in `steps/00_project-integration.md`):
 ```markdown
 ## Source #[number]
 - **Title**: [source title]
@@ -63,18 +63,43 @@ RESEARCH_DIR/
    └── source_2.md
 ```
 #### Splittable artifacts — Layout convention
 The following three artifacts MAY equivalently be a **folder** of the same base name when the single-file form has grown unwieldy (typically ≳ 1000 lines or ≳ 200 KB):
 - `01_source_registry.md` ↔ `01_source_registry/`
 - `02_fact_cards.md` ↔ `02_fact_cards/`
 - `06_component_fit_matrix.md` ↔ `06_component_fit_matrix/`
 When using the folder form:
 - Place a `00_summary.md` index file at the folder root with a short common summary table and the cross-cutting status the single-file form would have carried in its preamble.
 - Split per-entry content into category files (e.g. one file per sub-question or per component): `SQ1_*.md`, `C1_*.md`, etc. Keep entry numbering global across the folder so cross-references like "Source #42" still resolve to exactly one place.
 - Cross-references from outside the folder may point at either `01_source_registry/00_summary.md` (for the index) or directly at the relevant category file.
 ```
 RESEARCH_DIR/01_source_registry/        # split form (when single-file is too large)
 ├── 00_summary.md                       # index + investigation status + compact source table
 ├── SQ1_existing_systems.md             # category file
 ├── SQ2_canonical_pipeline.md           # category file
 ├── C1_vio.md                           # per-component file
 └── ...
 ```
 Throughout the rest of this skill (other steps, references, templates), the singular `XX.md` form is used as a logical name; treat each occurrence as applying equally to the folder form when the artifact has been split.
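
A minimal sketch of how a reader of these artifacts might resolve the logical `XX.md` name to whichever form exists on disk. The function name is hypothetical, and preferring the folder form when it exists is an assumption; the convention above does not say what happens if both forms are present.

```python
from pathlib import Path


def artifact_entry_point(research_dir: Path, logical_name: str) -> Path:
    """Map a logical artifact name such as '01_source_registry.md' to the file to read first.

    Assumption: if the split folder exists, its `00_summary.md` index is the entry point;
    otherwise the single-file form is used.
    """
    folder = research_dir / Path(logical_name).stem  # e.g. RESEARCH_DIR/01_source_registry
    if folder.is_dir():
        return folder / "00_summary.md"
    return research_dir / logical_name
```

Usage: `artifact_entry_point(Path("RESEARCH_DIR"), "02_fact_cards.md")` returns the single file until a `02_fact_cards/` folder is created, after which it returns the folder's index.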
 ### Save Timing & Content
 | Step | Save immediately after completion | Filename |
 |------|-----------------------------------|----------|
 | Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
 | Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
-| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
+| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` *(splittable, see convention)* |
-| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
+| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` *(splittable, see convention)* |
 | Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
 | Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
 | Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
-| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` |
+| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` *(splittable, see convention)* |
 | Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |
 ### Save Principles
@@ -92,12 +117,12 @@ RESEARCH_DIR/
 |------|---------|----------------|
 | `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
 | `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
-| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
+| `01_source_registry.md` *(splittable)* | All source links and summaries | Continuously updated during Step 2 |
-| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
+| `02_fact_cards.md` *(splittable)* | Extracted facts and sources | Continuously updated during Step 3 |
 | `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
 | `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
 | `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
-| `06_component_fit_matrix.md` | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting |
+| `06_component_fit_matrix.md` *(splittable)* | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting |
 | `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
 | `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
 | `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |
@@ -6,7 +6,9 @@ Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the us
 **Role**: Professional software architect
-A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.
+> **AC must be design-independent**: describe testable outcomes only, with no libraries, algorithms, parameters, or design choices. Implementation follows AC, never the reverse. (IEEE 830 / Atlassian / GitScrum)
 A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them. Any revision proposed in this phase must respect the design-independence rule above — propose AC changes as outcome/budget edits, not as implementation prescriptions.
 **Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)
@@ -84,7 +86,7 @@ Full 8-step research methodology. Produces the first solution draft.
 Be concise: use as few words as possible without omitting any important detail.
-**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
+**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
 ---
@@ -29,6 +29,6 @@ Full 8-step research methodology applied to assessing and improving an existing
 9. For every revised candidate, prove exact fit against the Project Constraint Matrix. Do not select field-adjacent or "similar problem" options unless their intrinsic implementation constraints match the project.
 10. Based on findings, form a new solution draft in the same format
-**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
+**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
 **Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.
@@ -192,7 +192,7 @@ For every component/tool/library/service/pattern/algorithm that may be selected
 **API Capability Verification — Per-Mode (MANDATORY, BLOCKING for lead candidates)**:
-**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. State the skip explicitly once in `02_fact_cards.md`: `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`.
+**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. State the skip explicitly once in `02_fact_cards.md` (or in `02_fact_cards/00_summary.md` if split): `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`.
 Most libraries/SDKs/services expose **multiple modes or configurations** (e.g., monocular vs stereo VO, sync vs async API, batch vs streaming inference, write-through vs write-behind cache). Selecting a candidate "because it supports X" without pinning *which mode* the project will use, and *whether that exact mode produces the required outputs from the required inputs*, is the most common silent-failure path in research. A library can support a class of problem in mode A while being unusable for the project's specific configuration in mode B.
@@ -206,10 +206,10 @@ For every lead candidate that is a library/SDK/framework/service with multiple m
   2. *Project's exact mode*: "Show a minimum runnable example of `<library>` in `<the pinned mode>` with `<the project's input shape>`. What does it produce?"
   3. *Disqualifier probe*: "Does `<library>` `<the pinned mode>` produce `<the required output>`? Are there published limitations of `<the pinned mode>` for `<the project's runtime/hardware>`?"
-   For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md`.
+   For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split — see splittable-artifacts convention in `00_project-integration.md`).
 3. **Save a Minimum Viable Example (MVE) for the pinned mode.**
-   Append to `02_fact_cards.md` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with:
+   Append to `02_fact_cards.md` / `02_fact_cards/` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with:
   ```markdown
   ## MVE — <library> in <pinned mode>
@@ -225,7 +225,7 @@ For every lead candidate that is a library/SDK/framework/service with multiple m
   If no official example covers the project's exact configuration → the candidate cannot be marked `Selected` based on category fit alone. Status must be `Experimental only` (with required-evidence note) or `Rejected` (when the docs explicitly disqualify the configuration).
 4. **Bind every numbered Restriction and Acceptance Criterion to the candidate's pinned mode.**
-   For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings.
+   For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` (or the candidate's per-component file under `02_fact_cards/` if split) under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings.
 5. **Treat "the same library in a different mode" as a different candidate.**
   If the project's pinned mode is `Monocular` but the only documented evidence covers `Stereo`, do not silently soften "rotation only" into "rotation + translation". Open a separate candidate row for the Monocular mode, with its own MVE, fit assessment, and disqualifiers. Two modes of one library are two distinct candidates for the purposes of this gate.
@@ -243,7 +243,7 @@ For every lead candidate that is a library/SDK/framework/service with multiple m
 **Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
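The saturation rule can be sketched as a small check. This is a minimal illustration, not part of the skill: `is_saturated` and its `window` parameter are hypothetical names, and "substantially new information" is simplified here to exact-match fact strings.

```python
def is_saturated(searches, window=3):
    """Return True when the last `window` searches added no new facts.

    `searches` is a list of sets of fact strings, in the order the
    searches were run (a hypothetical representation for this sketch).
    """
    if len(searches) < window:
        return False  # too few searches to judge saturation

    # Facts known before the trailing window started.
    known = set()
    for facts in searches[:-window]:
        known |= facts

    # Each search in the window must only repeat already-known facts.
    for facts in searches[-window:]:
        if facts - known:
            return False
        known |= facts
    return True
```

In practice a reviewer would judge "substantially new" semantically; the exact-match comparison only shows the shape of the rule.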
 **Save action**:
-For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
+For each source consulted, **immediately** append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split) using the entry template from `references/source-tiering.md`.
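A rough sketch of how the "single file or split directory" choice might be resolved. It assumes that a split artifact is a directory named after the file's stem (as in `02_fact_cards/00_summary.md`). The helper name `resolve_artifact` and the `category_file` default are hypothetical, and the authoritative convention lives in `00_project-integration.md`.

```python
from pathlib import Path


def resolve_artifact(base_dir, stem, category_file="00_summary.md"):
    """Pick the split directory's file if it exists, else the single .md file.

    Assumption: a split artifact is a directory named `stem` next to
    where `stem.md` would otherwise live.
    """
    split_dir = Path(base_dir) / stem
    if split_dir.is_dir():
        return split_dir / category_file
    return Path(base_dir) / f"{stem}.md"
```

For example, `resolve_artifact(out_dir, "01_source_registry", "official_docs.md")` would target a category file only when the registry has been split; the category file name here is made up for illustration.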
 ---
@@ -273,7 +273,7 @@ Transform sources into **verifiable fact cards**:
  - ❓ Low: Inference or from unofficial sources
 **Save action**:
-For each extracted fact, **immediately** append to `02_fact_cards.md`:
+For each extracted fact, **immediately** append to `02_fact_cards.md` (or the appropriate category file under `02_fact_cards/` if split):
 ```markdown
 ## Fact #[number]
 - **Statement**: [specific fact description]
@@ -318,7 +318,7 @@ After initial fact extraction, review what you have found and identify **knowled
   - Failure cases and edge conditions
   - Recent developments that may change the picture
-4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md`
+4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` (use the appropriate category files under `01_source_registry/` and `02_fact_cards/` if split)
 **Exit criteria**: Proceed to Step 4 when:
 - Every sub-question has at least 3 facts with at least one from L1/L2
@@ -155,7 +155,7 @@ Before finalizing the solution draft, build an exact-fit matrix for every compon
 | Component Area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale |
 |----------------|-----------|--------------------|---------------|---------------|-------------------------|----------------------------|--------|--------------------|
-| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] |
+| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` / `02_fact_cards/` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] |
 ```
 The new **Pinned Mode/Config** column is mandatory. A row without a pinned mode is incomplete. The new **API Capability Evidence** column links to the Minimum Viable Example saved during Step 2's API Capability Verification — without an MVE link the candidate cannot be `Selected`.
@@ -196,7 +196,7 @@ A candidate row may not be marked `Selected` while any cell is ❌ or ❓.
 - A candidate may not appear as the lead solution in Step 8 unless this gate marks it `Selected`.
 - "Validation gate required" footnotes are not equivalent to `Selected`. If the validation gate concerns API capability (does the mode produce the required output?), that is a Step-2 / Step-7.5 question and must be resolved here, not deferred to runtime. Only validation gates concerning *runtime quality* (e.g., "does this VO converge on this terrain class?") may be carried forward as `Selected with runtime gate`.
-**Save action**: Write `06_component_fit_matrix.md` containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices).
+**Save action**: Write `06_component_fit_matrix.md` (or, when split, the equivalent files under `06_component_fit_matrix/` — typically `00_summary.md` for the top-level matrix plus per-component sub-matrix files) containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices).
 **BLOCKING**: If any lead candidate has ❌, ❓, `Experimental only`, `Rejected`, or `Needs user decision` status, do not silently proceed. Ask the user or choose a different selected candidate.
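The blocking conditions of this gate can be expressed as a simple check. This is a sketch under assumptions: the dictionary layout (`status`, `cells`, `mve_link`) and the function name `blocking_reasons` are hypothetical, and only the status names and the ❌ / ❓ markers come from the text above.

```python
BLOCKING_STATUSES = {"Experimental only", "Rejected", "Needs user decision"}
BLOCKING_CELLS = {"❌", "❓"}


def blocking_reasons(candidate):
    """List the reasons a lead candidate must not proceed silently.

    `candidate` is a hypothetical dict with keys:
      status   - the Status column value
      cells    - mapping of restriction/AC id to its sub-matrix marker
      mve_link - link to the saved MVE block (empty if none)
    """
    reasons = []
    if candidate["status"] in BLOCKING_STATUSES:
        reasons.append(f"status is {candidate['status']}")
    bad = sorted(k for k, v in candidate["cells"].items() if v in BLOCKING_CELLS)
    if bad:
        reasons.append("unresolved cells: " + ", ".join(bad))
    if not candidate.get("mve_link"):
        reasons.append("no MVE link")
    return reasons
```

An empty list means the candidate can stay `Selected`; any entry means asking the user or choosing a different candidate.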
@@ -213,8 +213,8 @@ Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md`
 Sources to integrate:
 - Extract background from `00_question_decomposition.md`
-- Reference key facts from `02_fact_cards.md`
+- Reference key facts from `02_fact_cards.md` (or files under `02_fact_cards/` if split)
 - Organize conclusions from `04_reasoning_chain.md`
-- Generate references from `01_source_registry.md`
+- Generate references from `01_source_registry.md` (or files under `01_source_registry/` if split)
 - Supplement with use cases from `05_validation_log.md`
 - For Mode A: include AC assessment from `00_ac_assessment.md`
@@ -23,7 +23,7 @@
 - Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
 - Evidence: [Fact # / Source #]
 - Disqualifiers: [none or list]
-- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` § <Candidate Name>
+- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
 - API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected
 [Repeat per component]
@@ -26,7 +26,7 @@
 - Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
 - Evidence: [Fact # / Source #]
 - Disqualifiers: [none or list]
-- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` § <Candidate Name>
+- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
 - API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected
 [Repeat per component]
@@ -32,6 +32,17 @@ After selecting a mode, read its corresponding workflow below; do not mix them.
 ## Functional Mode
 ### 0. System-Under-Test Reality Gate
 Before accepting any functional, blackbox, or e2e result as a pass, verify what the tests actually exercised.
 1. If `_docs/00_problem/input_data/expected_results/results_report.md` exists, at least one e2e/blackbox run must compare actual product outputs against that mapping or the machine-readable files it references.
 2. Stubs are allowed only for external systems outside the product boundary: flight controller/SITL, QGC observer, satellite-provider/Suite service, physical Jetson hardware, physical camera, unavailable licensed datasets, and network services.
 3. Stubs, fakes, deterministic fallbacks, monkeypatches, or direct replacement of internal product modules are not allowed for the behavior under test. Internal examples include VIO, safety/anchor wrapper, satellite retrieval, anchor verification, tile manager, MAVLink output adapter, FDR, and the A-Z localization pipeline.
 4. If tests pass only because an internal module is fake/scaffolded, classify the run as **failed** with category `missing product implementation`.
 5. If a scenario is blocked because external hardware/data is absent, verify the production code path exists before accepting the block as legitimate. Missing internal production code is not an environment block.
 6. If the test runner writes CSV/Markdown reports, inspect them. A zero exit code is not enough; blocked/internal-stubbed scenarios still require classification.
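The classification logic in rules 4 to 6 can be sketched as follows. The function name, its parameters, and the `"test failure"` / `"environment"` labels are hypothetical; only the `missing product implementation` category comes from the rules above.

```python
def classify_run(exit_code, internal_stubs,
                 blocked_on_external=False, production_path_exists=True):
    """Classify a functional/e2e run as (outcome, category).

    internal_stubs         - names of internal product modules that were
                             stubbed, faked, or monkeypatched in the run
    blocked_on_external    - scenario blocked by absent external hardware/data
    production_path_exists - whether the production code path was verified
    """
    if exit_code != 0:
        return ("failed", "test failure")
    # A zero exit code is not enough: check what was actually exercised.
    if internal_stubs:
        return ("failed", "missing product implementation")
    if blocked_on_external:
        if production_path_exists:
            return ("blocked", "environment")
        # Missing internal production code is not an environment block.
        return ("failed", "missing product implementation")
    return ("passed", None)
```

For instance, a green run in which VIO was replaced by a fake would come back as `("failed", "missing product implementation")`.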
 ### 1. Detect Test Runner
 Check in order — first match wins:
@@ -94,7 +105,7 @@ Categorize skips as: **explicit skip (dead code)**, **runtime skip (unreachable)
 ### 5. Handle Outcome
-**All tests pass, zero skipped** → return success to the autodev for auto-chain.
+**All tests pass, zero skipped, and the System-Under-Test Reality Gate passes** → return success to the autodev for auto-chain.
 **Any test fails or errors** → this is a **blocking gate**. Never silently ignore failures. **Always investigate the root cause before deciding on an action.** Read the failing test code, read the error output, check service logs if applicable, and determine whether the bug is in the test or in the production code.