Update demo replay validation and testing documentation
ci/woodpecker/push/02-build-push Pipeline failed

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-06-20 11:24:43 +03:00
parent 12d0008763
commit 1f634c2604
175 changed files with 20701 additions and 41 deletions
+178
View File
@@ -0,0 +1,178 @@
---
name: research
description: |
Deep Research Methodology (8-Step Method) with two execution modes:
- Mode A (Initial Research): Assess acceptance criteria, then research problem and produce solution draft
- Mode B (Solution Assessment): Assess existing solution draft for weak points and produce revised draft
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Auto-detects research mode based on existing solution_draft files.
Trigger phrases:
- "research", "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review solution draft"
- "comparative analysis", "concept comparison", "technical comparison"
category: build
tags: [research, analysis, solution-design, comparison, decision-support]
disable-model-invocation: true
---
# Deep Research (8-Step Method)
Transform vague topics raised by users into high-quality, deliverable research reports through a systematic methodology. Operates in two modes: **Initial Research** (produce new solution draft) and **Solution Assessment** (assess and revise existing draft).
## Core Principles
- **Conclusions come from mechanism comparison, not "gut feelings"**
- **Pin down the facts first, then reason**
- **Prioritize authoritative sources: L1 > L2 > L3 > L4**
- **Intermediate results must be saved for traceability and reuse**
- **Ask, don't assume** — when any aspect of the problem, criteria, or restrictions is unclear, STOP and ask the user before proceeding
- **Internet-first investigation** — do not rely on training data for factual claims; search the web extensively for every sub-question, rephrase queries when results are thin, and keep searching until you have converging evidence from multiple independent sources
- **Multi-perspective analysis** — examine every problem from at least 3 different viewpoints (e.g., end-user, implementer, business decision-maker, contrarian, domain expert, field practitioner); each perspective should generate its own search queries
- **Question multiplication** — for each sub-question, generate multiple reformulated search queries (synonyms, related terms, negations, "what can go wrong" variants, practitioner-focused variants) to maximize coverage and uncover blind spots
- **Component option breadth** — for every component area, build a broad option landscape before selecting. Search direct candidates, adjacent-domain alternatives, commercial/open-source variants, classical/simple baselines, current SOTA, and "do not use" failure cases. A component may not be narrowed to one candidate until alternatives have been searched and rejected with evidence.
- **Component research depth** — for every serious component candidate, go beyond discovery pages. Read official docs, repository/license files, issue discussions, benchmarks, deployment guides, version/platform requirements, security notes, maintenance signals, and real-world failure reports. Extract evidence for inputs/outputs, lifecycle assumptions, runtime/storage/latency fit, integration boundaries, licensing, operational risks, and unsupported scenarios before assigning any selection status.
- **Exact-fit component selection** — never select a component, tool, library, service, architecture pattern, or algorithm merely because it solves a similar class of problem. It must be proven compatible with the project's explicit operating context, constraints, required inputs/outputs, non-functional requirements, lifecycle assumptions, and acceptance criteria. If fit is unproven or mismatched, mark it `Rejected`, `Experimental only`, or escalate for user decision before it can shape the solution.
- **Per-mode API capability verification** *(applies only to technical-component selection — see Research Output Class below)* — when a candidate library/SDK/framework/service exposes multiple modes or configurations, *the candidate is not a single thing*. Pin the exact mode the project will use (one explicit sentence: inputs, outputs, runtime), and verify *that mode* against the project's required inputs/outputs via official docs (mandatory `context7` lookup) plus a saved Minimum Viable Example. Capability claims at the category level ("supports X, Y, Z modes") must be cross-checked against the literal mode enumeration before being treated as project-applicable. Two modes of one library are two distinct candidates for the purposes of the Component Applicability Gate. Does not apply to non-technical research (concept comparison, market/policy investigation, knowledge organization, etc.).
## Research Output Class (BLOCKING — set in Step 1)
Before applying any of the technical-component gates (per-mode API capability verification, Component Applicability Gate, Restrictions × Candidate-Mode sub-matrix, MVE evidence, mandatory `context7` lookup), classify the research output into one of two classes. Record the decision in `00_question_decomposition.md` once, near the top, so every downstream step honors it.
| Class | What the output recommends or selects | Examples | Technical-component gates apply? |
|-------|---------------------------------------|----------|----------------------------------|
| **Technical-component selection** | One or more libraries, SDKs, frameworks, services, protocols, data formats, infrastructure patterns, algorithms, or APIs that will be implemented or operated against | "Pick a vector database", "Compare auth-token strategies for our API", "Should we use Kafka or RabbitMQ?", architecture / tech-stack / migration drafts (Mode A, Mode B) | **Yes — all gates active** |
| **Non-technical investigation** | Concept comparisons, knowledge organization, root-cause investigation of an event, market/policy/regulatory/social analysis, literature review, decision support without committing to specific tooling | "Why did adoption stall in Q3?", "Compare phenomenology vs constructivism", "Map regulatory landscape for X", "What do practitioners say about onboarding under remote-first orgs?" | **No — skip API/MVE/sub-matrix gates; the rest of the 8-step engine still applies** |
How to decide:
1. Inspect the question and the input files (`problem.md`, `restrictions.md`, `acceptance_criteria.md`, or the standalone input file).
2. If the deliverable will name specific software/services/protocols that someone will then build with or operate, it is **Technical-component selection**.
3. If the deliverable is a report, comparison, or recommendation that does not commit to specific tooling, it is **Non-technical investigation**.
4. **Mixed runs are valid.** Some research questions have a non-technical core but include one technical sub-question (or vice versa). In that case classify per component area within the run, not the run as a whole, and note in `00_question_decomposition.md` which component areas trigger the technical-component gates.
When the run is purely **Non-technical investigation**, the rest of the research engine — question decomposition, perspective rotation, exhaustive web search, fact extraction, comparison framework, reasoning chain, validation, deliverable formatting — still applies in full. The sections that get skipped are explicitly the technical gates listed in the table above.
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
**Project mode** (no explicit input file provided):
- INPUT_DIR: `_docs/00_problem/`
- OUTPUT_DIR: `_docs/01_solution/`
- RESEARCH_DIR: `_docs/00_research/`
- All existing guardrails, mode detection, and draft numbering apply as-is.
**Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`):
- INPUT_FILE: the provided file (treated as problem description)
- BASE_DIR: if specified by the caller, use it; otherwise default to `_standalone/`
- OUTPUT_DIR: `BASE_DIR/01_solution/`
- RESEARCH_DIR: `BASE_DIR/00_research/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms
- Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning
- Draft numbering works the same, scoped to OUTPUT_DIR
- **Final step**: after all research is complete, move INPUT_FILE into BASE_DIR
Announce the detected mode and resolved paths to the user before proceeding.
## Project Integration
Read and follow `steps/00_project-integration.md` for prerequisite guardrails, mode detection, draft numbering, working directory setup, save timing, and output file inventory.
## Execution Flow
### Mode A: Initial Research
Read and follow `steps/01_mode-a-initial-research.md`.
Phases: AC Assessment (BLOCKING) → Problem Research → Tech Stack (optional) → Security (optional).
---
### Mode B: Solution Assessment
Read and follow `steps/02_mode-b-solution-assessment.md`.
---
## Research Engine (8-Step Method)
The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.
**Investigation phase** (Steps 03.5): Read and follow `steps/03_engine-investigation.md`.
Covers: question classification, novelty sensitivity, question decomposition, perspective rotation, exhaustive web search, fact extraction, iterative deepening.
**Analysis phase** (Steps 48): Read and follow `steps/04_engine-analysis.md`.
Covers: comparison framework, baseline alignment, reasoning chain, use-case validation, deliverable formatting.
## Solution Draft Output Templates
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear problem boundaries | **ASK user** |
| Ambiguous acceptance criteria values | **ASK user** |
| Missing context files (`security_approach.md`, `input_data/`) | **ASK user** what they have |
| Conflicting restrictions | **ASK user** which takes priority |
| Technology choice with multiple valid options | **ASK user** |
| Contradictions between input files | **ASK user** |
| Missing acceptance criteria or restrictions files | **WARN user**, ask whether to proceed |
| File naming within research artifacts | PROCEED |
| Source tier classification | PROCEED |
## Trigger Conditions
When the user wants to:
- Deeply understand a concept/technology/phenomenon
- Compare similarities and differences between two or more things
- Gather information and evidence for a decision
- Assess or improve an existing solution draft
**Differentiation from other Skills**:
- Needs **research + solution draft** → use this Skill
## Stakeholder Perspectives
Adjust content depth based on audience:
| Audience | Focus | Detail Level |
|----------|-------|--------------|
| **Decision-makers** | Conclusions, risks, recommendations | Concise, emphasize actionability |
| **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it |
| **Technical experts** | Details, boundary conditions, limitations | In-depth, emphasize accuracy |
## Source Verifiability Requirements
Every cited piece of external information must be directly verifiable by the user. All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`.
## Quality Checklist
Before completing the solution draft, run through the checklists in `references/quality-checklists.md`. This covers:
- General quality (L1/L2 support, verifiability, actionability)
- Mode A specific (AC assessment, competitor analysis, component tables, tech stack)
- Mode B specific (findings table, self-contained draft, performance column)
- Timeliness check for high-sensitivity domains (version annotations, cross-validation, community mining)
- Target audience consistency (boundary definition, source matching, fact card audience)
## Final Reply Guidelines
When replying to the user after research is complete:
**Should include**:
- Active mode used (A or B) and which optional phases were executed
- One-sentence core conclusion
- Key findings summary (3-5 points)
- Path to the solution draft: `OUTPUT_DIR/solution_draft##.md`
- Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md`
- If there are significant uncertainties, annotate points requiring further verification
**Must not include**:
- Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.)
- Detailed research step descriptions
- Working directory structure display
**Reason**: Process files are for retrospective review, not for the user. The user cares about conclusions, not the process.
@@ -0,0 +1,48 @@
# Comparison & Analysis Frameworks — Reference
## General Dimensions (select as needed)
1. Goal / What problem does it solve
2. Working mechanism / Process
3. Input / Output / Boundaries
4. Advantages / Disadvantages / Trade-offs
5. Applicable scenarios / Boundary conditions
6. Cost / Benefit / Risk
7. Historical evolution / Future trends
8. Security / Permissions / Controllability
## Concept Comparison Specific Dimensions
1. Definition & essence
2. Trigger / invocation method
3. Execution agent
4. Input/output & type constraints
5. Determinism & repeatability
6. Resource & context management
7. Composition & reuse patterns
8. Security boundaries & permission control
## Decision Support Specific Dimensions
1. Solution overview
2. Implementation cost
3. Maintenance cost
4. Risk assessment
5. Expected benefit
6. Applicable scenarios
7. Team capability requirements
8. Migration difficulty
## Decomposition Completeness Probes (Completeness Audit Reference)
Used during Step 1's Decomposition Completeness Audit. After generating sub-questions, ask each probe against the current decomposition. If a probe reveals an uncovered area, add a sub-question for it.
| Probe | What it catches |
|-------|-----------------|
| **What does this cost — in money, time, resources, or trade-offs?** | Budget, pricing, licensing, tax, opportunity cost, maintenance burden |
| **What are the hard constraints — physical, legal, regulatory, environmental?** | Regulations, certifications, spectrum/frequency rules, export controls, physics limits, IP restrictions |
| **What are the dependencies and assumptions that could break?** | Supply chain, vendor lock-in, API stability, single points of failure, standards evolution |
| **What does the operating environment actually look like?** | Terrain, weather, connectivity, infrastructure, power, latency, user skill level |
| **What failure modes exist and what happens when they trigger?** | Degraded operation, fallback, safety margins, blast radius, recovery time |
| **What do practitioners who solved similar problems say matters most?** | Field-tested priorities that don't appear in specs or papers |
| **What changes over time — and what looks stable now but isn't?** | Technology roadmaps, regulatory shifts, deprecation risk, scaling effects |
@@ -0,0 +1,75 @@
# Novelty Sensitivity Assessment — Reference
## Novelty Sensitivity Classification
| Sensitivity Level | Typical Domains | Source Time Window | Description |
|-------------------|-----------------|-------------------|-------------|
| **Critical** | AI/LLMs, blockchain, cryptocurrency | 3-6 months | Technology iterates extremely fast; info from months ago may be completely outdated |
| **High** | Cloud services, frontend frameworks, API interfaces | 6-12 months | Frequent version updates; must confirm current version |
| **Medium** | Programming languages, databases, operating systems | 1-2 years | Relatively stable but still evolving |
| **Low** | Algorithm fundamentals, design patterns, theoretical concepts | No limit | Core principles change slowly |
## Critical Sensitivity Domain Special Rules
When the research topic involves the following domains, special rules must be enforced:
**Trigger word identification**:
- AI-related: LLM, GPT, Claude, Gemini, AI Agent, RAG, vector database, prompt engineering
- Cloud-native: Kubernetes new versions, Serverless, container runtimes
- Cutting-edge tech: Web3, quantum computing, AR/VR
**Mandatory rules**:
1. **Search with time constraints**:
- Use `time_range: "month"` or `time_range: "week"` to limit search results
- Prefer `start_date: "YYYY-MM-DD"` set to within the last 3 months
2. **Elevate official source priority**:
- Must first consult official documentation, official blogs, official Changelogs
- GitHub Release Notes, official X/Twitter announcements
- Academic papers (arXiv and other preprint platforms)
3. **Mandatory version number annotation**:
- Any technical description must annotate the current version number
- Example: "Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) supports..."
- Prohibit vague statements like "the latest version supports..."
4. **Outdated information handling**:
- Technical blogs/tutorials older than 6 months -> historical reference only, cannot serve as factual evidence
- Version inconsistency found -> must verify current version before using
- Obviously outdated descriptions (e.g., "will support in the future" but now already supported) -> discard directly
5. **Cross-validation**:
- Highly sensitive information must be confirmed from at least 2 independent sources
- Priority: Official docs > Official blogs > Authoritative tech media > Personal blogs
6. **Official download/release page direct verification (BLOCKING)**:
- Must directly visit official download pages to verify platform support (don't rely on search engine caches)
- Use `WebFetch` to directly extract download page content
- Search results about "coming soon" or "planned support" may be outdated; must verify in real time
- Platform support is frequently changing information; cannot infer from old sources
7. **Product-specific protocol/feature name search (BLOCKING)**:
- Beyond searching the product name, must additionally search protocol/standard names the product supports
- Common protocols/standards to search:
- AI tools: MCP, ACP (Agent Client Protocol), LSP, DAP
- Cloud services: OAuth, OIDC, SAML
- Data exchange: GraphQL, gRPC, REST
- Search format: `"<product_name> <protocol_name> support"` or `"<product_name> <protocol_name> integration"`
## Timeliness Assessment Output Template
```markdown
## Timeliness Sensitivity Assessment
- **Research Topic**: [topic]
- **Sensitivity Level**: Critical / High / Medium / Low
- **Rationale**: [why this level]
- **Source Time Window**: [X months/years]
- **Priority official sources to consult**:
1. [Official source 1]
2. [Official source 2]
- **Key version information to verify**:
- [Product/technology 1]: Current version ____
- [Product/technology 2]: Current version ____
```
@@ -0,0 +1,124 @@
# Quality Checklists — Reference
## General Quality
- [ ] All core conclusions have L1/L2 tier factual support
- [ ] No use of vague words like "possibly", "probably" without annotating uncertainty
- [ ] Comparison dimensions are complete with no key differences missed
- [ ] At least one real use case validates conclusions
- [ ] References are complete with accessible links
- [ ] Every citation can be directly verified by the user (source verifiability)
- [ ] Structure hierarchy is clear; executives can quickly locate information
## Decomposition Completeness
- [ ] Domain discovery search executed: searched "key factors when [problem domain]" before starting research
- [ ] Completeness probes applied: every probe from `references/comparison-frameworks.md` checked against sub-questions
- [ ] No uncovered areas remain: all gaps filled with sub-questions or justified as not applicable
## Internet Search Depth
- [ ] Every sub-question was searched with at least 3-5 different query variants
- [ ] At least 3 perspectives from the Perspective Rotation were applied and searched
- [ ] Search saturation reached: last searches stopped producing new substantive information
- [ ] Adjacent fields and analogous problems were searched, not just direct matches
- [ ] Contrarian viewpoints were actively sought ("why not X", "X criticism", "X failure")
- [ ] Practitioner experience was searched (production use, real-world results, lessons learned)
- [ ] Iterative deepening completed: follow-up questions from initial findings were searched
- [ ] No sub-question relies solely on training data without web verification
## Component Option Breadth
- [ ] `00_question_decomposition.md` contains a Component Option Search Plan
- [ ] Every component area was searched across simple baseline, established production, open-source, commercial/vendor, current SOTA, adjacent-domain, no-build/defer, and known-bad options where applicable
- [ ] Every component area has at least 3 realistic candidates, or a documented explanation of why broad searches found fewer
- [ ] Each lead candidate has official/source-of-truth evidence plus independent validation when available
- [ ] Each component area includes at least one baseline/fallback option and at least one rejected or experimental option when possible
- [ ] Alternative names, synonyms, and neighboring-domain terms were searched before declaring the option landscape complete
- [ ] Licensing, runtime, platform, maintenance, and unsupported-scenario searches were performed for every lead, fallback, and rejected candidate
## Mode A Specific
- [ ] Phase 1 completed: AC assessment was presented to and confirmed by user
- [ ] AC assessment consistent: Solution draft respects the (possibly adjusted) acceptance criteria and restrictions
- [ ] Competitor analysis included: Existing solutions were researched
- [ ] All components have comparison tables: Each component lists alternatives with tools, advantages, limitations, security, cost
- [ ] Component options are broad: component tables include baseline, production, open-source, commercial/vendor, SOTA/research, adjacent-domain, defer/no-build, and disqualified options where applicable
- [ ] Tools/libraries verified: Suggested tools actually exist and work as described
- [ ] Component fit matrix completed: `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) exists and every selected component/tool/pattern is marked `Selected`
- [ ] No field-adjacent substitution: no selected candidate is chosen only because it solves a similar class of problem while failing the project's explicit constraints
- [ ] Testing strategy covers AC: Tests map to acceptance criteria
- [ ] Tech stack documented (if Phase 3 ran): `tech_stack.md` has evaluation tables, risk assessment, and learning requirements
- [ ] Security analysis documented (if Phase 4 ran): `security_analysis.md` has threat model and per-component controls
## Mode B Specific
- [ ] Findings table complete: All identified weak points documented with solutions
- [ ] Weak point categories covered: Functional, security, and performance assessed
- [ ] New draft is self-contained: Written as if from scratch, no "updated" markers
- [ ] Performance column included: Mode B comparison tables include performance characteristics
- [ ] Previous draft issues addressed: Every finding in the table is resolved in the new draft
- [ ] Existing selected components were challenged against a broad alternative landscape before being kept
- [ ] Existing component fit audited: every old and new component/tool/pattern was checked against `restrictions.md`, `acceptance_criteria.md`, and the Project Constraint Matrix
- [ ] Rejected/experimental candidates are not lead recommendations unless the user explicitly accepted the risk
## Timeliness Check (High-Sensitivity Domain BLOCKING)
When the research topic has Critical or High sensitivity level:
- [ ] Timeliness sensitivity assessment completed: `00_question_decomposition.md` contains a timeliness assessment section
- [ ] Source timeliness annotated: Every source has publication date, timeliness status, version info
- [ ] No outdated sources used as factual evidence (Critical: within 6 months; High: within 1 year)
- [ ] Version numbers explicitly annotated for all technical products/APIs/SDKs
- [ ] Official sources prioritized: Core conclusions have support from official documentation/blogs
- [ ] Cross-validation completed: Key technical information confirmed from at least 2 independent sources
- [ ] Download page directly verified: Platform support info comes from real-time extraction of official download pages
- [ ] Protocol/feature names searched: Searched for product-supported protocol names (MCP, ACP, etc.)
- [ ] GitHub Issues mined: Reviewed product's GitHub Issues popular discussions
- [ ] Community hotspots identified: Identified and recorded feature points users care most about
## Target Audience Consistency Check (BLOCKING)
- [ ] Research boundary clearly defined: `00_question_decomposition.md` has clear population/geography/timeframe/level boundaries
- [ ] Every source has target audience annotated in `01_source_registry.md` (or category files under `01_source_registry/` if split)
- [ ] Mismatched sources properly handled (excluded, annotated, or marked reference-only)
- [ ] No audience confusion in fact cards: Every fact has target audience consistent with research boundary
- [ ] No audience confusion in the report: Policies/research/data cited have consistent target audiences
## Source Verifiability
- [ ] All cited links are publicly accessible (annotate `[login required]` if not)
- [ ] Citations include exact section/page/timestamp for long documents
- [ ] Cited facts have corresponding statements in the original text (no over-interpretation)
- [ ] Source publication/update dates annotated; technical docs include version numbers
- [ ] Unverifiable information annotated `[limited source]` and not sole support for core conclusions
## Exact-Fit Validation (BLOCKING)
- [ ] Project Constraint Matrix extracted from problem context before component selection
- [ ] Component fit matrix includes `Component Area`, `Option Family`, and `Pinned Mode/Config` columns
- [ ] Every selected component/tool/library/service/pattern/algorithm has evidence for required inputs/outputs and integration boundaries
- [ ] Every selected candidate has evidence for the operating context and lifecycle assumptions it must support
- [ ] Every selected candidate has evidence for non-functional targets that are binding for the project
- [ ] Known unsupported scenarios and failure reports were searched for every selected candidate
- [ ] Mismatches are recorded as disqualifiers, not softened into generic limitations
- [ ] Any candidate with unproven fit is marked `Experimental only` or escalated for user decision
- [ ] Any candidate with documented constraint conflict is marked `Rejected`
## API Capability Verification (BLOCKING)
**Applicability**: this checklist applies only when the run is classified as **Technical-component selection** (see SKILL.md → Research Output Class). For non-technical research (concept comparison, market/policy investigation, root-cause analysis, knowledge organization), skip this checklist entirely and note the skip in `05_validation_log.md`. For mixed runs, apply only to technical component areas.
For every lead candidate that is a library/SDK/framework/service:
- [ ] The exact mode/configuration the project will use is pinned in one explicit sentence (inputs, outputs, runtime); no vague "supports X" language
- [ ] `context7` (or equivalent docs lookup) was run for the candidate, with at least 3 queries: mode enumeration, project's exact mode, disqualifier probe
- [ ] All consulted URLs from context7 / official docs are appended to `01_source_registry.md` (or files under `01_source_registry/` if split)
- [ ] A Minimum Viable Example (MVE) was saved for the pinned mode in `02_fact_cards.md` / `02_fact_cards/` (or `02_mve_evidence.md`) with: source, inputs in example, outputs in example, project inputs, project outputs required, match assessment ✅/⚠️/❌
- [ ] When the MVE inputs or outputs do not exactly match the project's, the mismatch is cited from the official docs (not inferred), and the candidate is `Experimental only` or `Rejected`
- [ ] When a library has multiple modes, each project-relevant mode appears as its own candidate row (not a single library row that softens across modes)
- [ ] Restrictions × Candidate-Modes sub-matrix in `06_component_fit_matrix.md` (or files under `06_component_fit_matrix/` if split) is filled for every lead candidate, with one row per numbered restriction and per numbered acceptance criterion
- [ ] Sub-matrix uses ✅ / ❌ / ❓ / N/A only — no free-form prose substitutes
- [ ] No `Selected` candidate has any ❌ or ❓ cell in its sub-matrix
- [ ] "Validation gate required" footnotes are explicitly classified as either *API capability* (must be resolved here) or *runtime quality* (may be carried forward)
- [ ] Paraphrased capability claims in fact cards have been cross-checked against the literal mode-enumeration evidence (no `mono, inertial → mono-inertial` style conflation)
@@ -0,0 +1,121 @@
# Source Tiering & Authority Anchoring — Reference
## Source Tiers
| Tier | Source Type | Purpose | Credibility |
|------|------------|---------|-------------|
| **L1** | Official docs, papers, specs, RFCs | Definitions, mechanisms, verifiable facts | High |
| **L2** | Official blogs, tech talks, white papers | Design intent, architectural thinking | High |
| **L3** | Authoritative media, expert commentary, tutorials | Supplementary intuition, case studies | Medium |
| **L4** | Community discussions, personal blogs, forums | Discover blind spots, validate understanding | Low |
## L4 Community Source Specifics (mandatory for product comparison research)
| Source Type | Access Method | Value |
|------------|---------------|-------|
| **GitHub Issues** | Visit `github.com/<org>/<repo>/issues` | Real user pain points, feature requests, bug reports |
| **GitHub Discussions** | Visit `github.com/<org>/<repo>/discussions` | Feature discussions, usage insights, community consensus |
| **Reddit** | Search `site:reddit.com "<product_name>"` | Authentic user reviews, comparison discussions |
| **Hacker News** | Search `site:news.ycombinator.com "<product_name>"` | In-depth technical community discussions |
| **Discord/Telegram** | Product's official community channels | Active user feedback (must annotate [limited source]) |
## Principles
- Conclusions must be traceable to L1/L2
- L3/L4 serve only as supplementary and validation
- L4 community discussions are used to discover "what users truly care about"
- Record all information sources
- **Search broadly before searching deeply** — cast a wide net with multiple query variants before diving deep into any single source
- **Cross-domain search** — when direct results are sparse, search adjacent fields, analogous problems, and related industries
- **Never rely on a single search** — each sub-question requires multiple searches from different angles (synonyms, negations, practitioner language, academic language)
## Timeliness Filtering Rules (execute based on Step 0.5 sensitivity level)
| Sensitivity Level | Source Filtering Rule | Suggested Search Parameters |
|-------------------|----------------------|-----------------------------|
| Critical | Only accept sources within 6 months as factual evidence | `time_range: "month"` or `start_date` set to last 3 months |
| High | Prefer sources within 1 year; annotate if older than 1 year | `time_range: "year"` |
| Medium | Sources within 2 years used normally; older ones need validity check | Default search |
| Low | No time limit | Default search |
## High-Sensitivity Domain Search Strategy
```
1. Round 1: Targeted official source search
- Use include_domains to restrict to official domains
- Example: include_domains: ["anthropic.com", "openai.com", "docs.xxx.com"]
2. Round 2: Official download/release page direct verification (BLOCKING)
- Directly visit official download pages; don't rely on search caches
- Use tavily-extract or WebFetch to extract page content
- Verify: platform support, current version number, release date
3. Round 3: Product-specific protocol/feature search (BLOCKING)
- Search protocol names the product supports (MCP, ACP, LSP, etc.)
- Format: "<product_name> <protocol_name>" site:official_domain
4. Round 4: Time-limited broad search
- time_range: "month" or start_date set to recent
- Exclude obviously outdated sources
5. Round 5: Version verification
- Cross-validate version numbers from search results
- If inconsistency found, immediately consult official Changelog
6. Round 6: Community voice mining (BLOCKING - mandatory for product comparison research)
- Visit the product's GitHub Issues page, review popular/pinned issues
- Search Issues for key feature terms (e.g., "MCP", "plugin", "integration")
- Review discussion trends from the last 3-6 months
- Identify the feature points and differentiating characteristics users care most about
```
## Community Voice Mining Detailed Steps
```
GitHub Issues Mining Steps:
1. Visit github.com/<org>/<repo>/issues
2. Sort by "Most commented" to view popular discussions
3. Search keywords:
- Feature-related: feature request, enhancement, MCP, plugin, API
- Comparison-related: vs, compared to, alternative, migrate from
4. Review issue labels: enhancement, feature, discussion
5. Record frequently occurring feature demands and user pain points
Value Translation:
- Frequently discussed features -> likely differentiating highlights
- User complaints/requests -> likely product weaknesses
- Comparison discussions -> directly obtain user-perspective difference analysis
```
## Source Registry Entry Template
For each source consulted, immediately append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if the artifact has been split — see splittable-artifacts convention in `steps/00_project-integration.md`):
```markdown
## Source #[number]
- **Title**: [source title]
- **Link**: [URL]
- **Tier**: L1/L2/L3/L4
- **Publication Date**: [YYYY-MM-DD]
- **Timeliness Status**: Currently valid / Needs verification / Outdated (reference only)
- **Version Info**: [If involving a specific version, must annotate]
- **Target Audience**: [Explicitly annotate the group/geography/level this source targets]
- **Research Boundary Match**: Full match / Partial overlap / Reference only
- **Summary**: [1-2 sentence key content]
- **Related Sub-question**: [which sub-question this corresponds to]
```
## Target Audience Verification (BLOCKING)
Before including each source, verify that its target audience matches the research boundary:
| Source Type | Target audience to verify | Verification method |
|------------|---------------------------|---------------------|
| **Policy/Regulation** | Who is it for? (K-12/university/all) | Check document title, scope clauses |
| **Academic Research** | Who are the subjects? (vocational/undergraduate/graduate) | Check methodology/sample description sections |
| **Statistical Data** | Which population is measured? | Check data source description |
| **Case Reports** | What type of institution is involved? | Confirm institution type |
Handling mismatched sources:
- Target audience completely mismatched -> do not include
- Partially overlapping -> include but annotate applicable scope
- Usable as analogous reference -> include but explicitly annotate "reference only"
@@ -0,0 +1,56 @@
# Usage Examples — Reference
## Example 1: Initial Research (Mode A)
```
User: Research this problem and find the best solution
```
Execution flow:
1. Context resolution: no explicit file -> project mode (INPUT_DIR=`_docs/00_problem/`, OUTPUT_DIR=`_docs/01_solution/`)
2. Guardrails: verify INPUT_DIR exists with required files
3. Mode detection: no `solution_draft*.md` -> Mode A
4. Phase 1: Assess acceptance criteria and restrictions, ask user about unclear parts
5. BLOCKING: present AC assessment, wait for user confirmation
6. Phase 2: Full 8-step research — competitors, components, state-of-the-art solutions
7. Output: `OUTPUT_DIR/solution_draft01.md`
8. (Optional) Phase 3: Tech stack consolidation -> `tech_stack.md`
9. (Optional) Phase 4: Security deep dive -> `security_analysis.md`
## Example 2: Solution Assessment (Mode B)
```
User: Assess the current solution draft
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Guardrails: verify INPUT_DIR exists
3. Mode detection: `solution_draft03.md` found in OUTPUT_DIR -> Mode B, read it as input
4. Full 8-step research — weak points, security, performance, solutions
5. Output: `OUTPUT_DIR/solution_draft04.md` with findings table + revised draft
## Example 3: Standalone Research
```
User: /research @my_problem.md
```
Execution flow:
1. Context resolution: explicit file -> standalone mode (INPUT_FILE=`my_problem.md`, OUTPUT_DIR=`_standalone/my_problem/01_solution/`)
2. Guardrails: verify INPUT_FILE exists and is non-empty, warn about missing restrictions/AC
3. Mode detection + full research flow as in Example 1, scoped to standalone paths
4. Output: `_standalone/my_problem/01_solution/solution_draft01.md`
5. Move `my_problem.md` into `_standalone/my_problem/`
## Example 4: Force Initial Research (Override)
```
User: Research from scratch, ignore existing drafts
```
Execution flow:
1. Context resolution: no explicit file -> project mode
2. Mode detection: drafts exist, but user explicitly requested initial research -> Mode A
3. Phase 1 + Phase 2 as in Example 1
4. Output: `OUTPUT_DIR/solution_draft##.md` (incremented from highest existing)
@@ -0,0 +1,131 @@
## Project Integration
### Prerequisite Guardrails (BLOCKING)
Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**
**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist
**Standalone mode:**
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Resolve BASE_DIR: use the caller-specified directory if provided; otherwise default to `_standalone/`
3. Resolve OUTPUT_DIR (`BASE_DIR/01_solution/`) and RESEARCH_DIR (`BASE_DIR/00_research/`)
4. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
5. Create BASE_DIR, OUTPUT_DIR, and RESEARCH_DIR if they don't exist
### Mode Detection
After guardrails pass, determine the execution mode:
1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found****Mode A: Initial Research**
3. **Matches found****Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts
Inform the user which mode was detected and confirm before proceeding.
### Solution Draft Numbering
All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:
1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)
Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
### Working Directory & Intermediate Artifact Management
#### Directory Structure
At the start of research, **must** create a working directory under RESEARCH_DIR:
```
RESEARCH_DIR/
├── 00_ac_assessment.md # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md # Step 0-1 output
├── 01_source_registry.md # Step 2 output: all consulted source links
├── 02_fact_cards.md # Step 3 output: extracted facts
├── 03_comparison_framework.md # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md # Step 7 output: use-case validation results
├── 06_component_fit_matrix.md # Step 7.5 output: component exact-fit gate
└── raw/ # Raw source archive (optional)
├── source_1.md
└── source_2.md
```
#### Splittable artifacts — Layout convention
The following three artifacts MAY equivalently be a **folder** of the same base name when the single-file form has grown unwieldy (typically ≳ 1000 lines or ≳ 200 KB):
- `01_source_registry.md``01_source_registry/`
- `02_fact_cards.md``02_fact_cards/`
- `06_component_fit_matrix.md``06_component_fit_matrix/`
When using the folder form:
- Place a `00_summary.md` index file at the folder root with a short common summary table and the cross-cutting status the single-file form would have carried in its preamble.
- Split per-entry content into category files (e.g. one file per sub-question or per component): `SQ1_*.md`, `C1_*.md`, etc. Keep entry numbering global across the folder so cross-references like "Source #42" still resolve to exactly one place.
- Cross-references from outside the folder may point at either `01_source_registry/00_summary.md` (for the index) or directly at the relevant category file.
```
RESEARCH_DIR/01_source_registry/ # split form (when single-file is too large)
├── 00_summary.md # index + investigation status + compact source table
├── SQ1_existing_systems.md # category file
├── SQ2_canonical_pipeline.md # category file
├── C1_vio.md # per-component file
└── ...
```
Throughout the rest of this skill (other steps, references, templates), the singular `XX.md` form is used as a logical name; treat each occurrence as applying equally to the folder form when the artifact has been split.
### Save Timing & Content
| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` *(splittable, see convention)* |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` *(splittable, see convention)* |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 7.5 | Component exact-fit gate and selection status | `06_component_fit_matrix.md` *(splittable, see convention)* |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |
### Save Principles
1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: Same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files
### Output Files
**Required files** (automatically generated through the process):
| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` *(splittable)* | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` *(splittable)* | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `06_component_fit_matrix.md` *(splittable)* | Exact-fit matrix for every proposed component/tool/pattern with status `Selected` / `Rejected` / `Experimental only` / `Needs user decision` | Before Step 8 deliverable formatting |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |
**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)
@@ -0,0 +1,131 @@
## Mode A: Initial Research
Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.
### Phase 1: AC & Restrictions Assessment (BLOCKING)
**Role**: Professional software architect
> **AC must be design-independent**: describe testable outcomes only — no libraries, algorithms, params, or design choices. Implementation follows AC, never reverse. (IEEE 830 / Atlassian / GitScrum)
A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them. Any revision proposed in this phase must respect the design-independence rule above — propose AC changes as outcome/budget edits, not as implementation prescriptions.
**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)
**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
- Unclear problem boundaries → ask
- Ambiguous acceptance criteria values → ask
- Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
- Conflicting restrictions → ask which takes priority
3. Research in internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
- How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
- How critical is each criterion? Search for case studies where criteria were relaxed or tightened
- What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
- Impact of each criterion value on the whole system quality — search for research papers and engineering reports
- Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
- Timeline implications — search for project timelines, development velocity reports, and comparable implementations
- What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
4. Research restrictions from multiple perspectives:
- Are the restrictions realistic? Search for comparable projects that operated under similar constraints
- Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with
- Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain
- What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned
5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources
**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.
**Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:
```markdown
# Acceptance Criteria Assessment
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
## Key Findings
[Summary of critical findings]
## Sources
[Key references used]
```
**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.
---
### Phase 2: Problem Research & Solution Draft
**Role**: Professional researcher and software architect
Full 8-step research methodology. Produces the first solution draft.
**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts
**Task** (drives the 8-step engine):
1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
3. Derive a **Project Constraint Matrix** before evaluating component options. Extract exact constraints from `problem.md`, `restrictions.md`, `acceptance_criteria.md`, input data notes, and the Phase 1 AC assessment. Include required inputs/outputs, operating context, runtime envelope, data availability, lifecycle boundaries, non-functional targets, integration boundaries, security constraints, and explicit out-of-scope decisions.
4. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
5. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
6. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
7. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
8. For every candidate component/tool/library/service/pattern/algorithm, prove exact fit against the Project Constraint Matrix. A field-adjacent solution is not selectable unless its documented implementation assumptions match the project's constraints. Mismatches must be recorded as disqualifiers and the candidate marked `Rejected`, `Experimental only`, or `Needs user decision`.
9. Include security considerations in each component analysis
10. Provide rough cost estimates for proposed solutions
Be concise in formulating. The fewer words, the better, but do not miss any important details.
**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
---
### Phase 3: Tech Stack Consolidation (OPTIONAL)
**Role**: Software architect evaluating technology choices
Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR
**Task**:
1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice
**Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables
---
### Phase 4: Security Deep Dive (OPTIONAL)
**Role**: Security architect
Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context
**Task**:
1. Build threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach
**Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary
@@ -0,0 +1,34 @@
## Mode B: Solution Assessment
Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.
**Role**: Professional software architect
Full 8-step research methodology applied to assessing and improving an existing solution draft.
**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR
**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Derive or refresh the **Project Constraint Matrix** from all files in INPUT_DIR. Include required inputs/outputs, operating context, runtime envelope, data availability, lifecycle boundaries, non-functional targets, integration boundaries, security constraints, and explicit out-of-scope decisions.
3. Audit every component/decision in the existing draft against the Project Constraint Matrix before researching alternatives:
- If a component's documented implementation assumptions match the project constraints, keep it eligible and record evidence.
- If fit is unproven, mark it `Experimental only` until evidence is found.
- If constraints conflict, mark it `Rejected` and search for alternatives.
- If rejecting it changes product behavior or risk materially, escalate for user decision.
4. Research in internet extensively — for each component/decision in the draft, search for:
- Known problems and limitations of the chosen approach
- What practitioners say about using it in production
- Better alternatives that may have emerged recently
- Common failure modes and edge cases
- How competitors/similar projects solve the same problem differently
5. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
6. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
7. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
8. For each identified weak point, search for multiple solution approaches and compare them
9. For every revised candidate, prove exact fit against the Project Constraint Matrix. Do not select field-adjacent or "similar problem" options unless their intrinsic implementation constraints match the project.
10. Based on findings, form a new solution draft in the same format
**Save action**: Write `RESEARCH_DIR/06_component_fit_matrix.md` (or its split-folder equivalent under `RESEARCH_DIR/06_component_fit_matrix/`, per the splittable-artifacts convention in `00_project-integration.md`) before the final draft, then write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`
**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.
@@ -0,0 +1,327 @@
## Research Engine — Investigation Phase (Steps 03.5)
### Step 0: Question Type Classification
First, classify the research question type and select the corresponding strategy:
| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |
**Mode-specific classification**:
| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |
### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)
Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.
**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`
Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.
**Save action**: Append timeliness assessment to the end of `00_question_decomposition.md`
---
### Step 1: Question Decomposition & Boundary Definition
**Mode-specific sub-questions**:
**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "For each component, what are the practical alternatives across simple baseline, established production option, open-source option, commercial option, current SOTA, adjacent-domain option, and no-build/defer option?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"
**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"
- "For each component already selected in the draft, what alternatives should be considered before keeping, replacing, or rejecting it?"
**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)
#### Perspective Rotation (MANDATORY)
For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.
| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |
Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.
#### Question Explosion (MANDATORY)
For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.
**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"
Record all planned queries in `00_question_decomposition.md` alongside each sub-question.
#### Component Option Breadth (MANDATORY)
Before Step 2, identify the component areas implied by the problem and create a search plan for options in each area. A component area is any replaceable tool, library, model, service, algorithm, data format, protocol, infrastructure pattern, or validation approach that could materially affect the solution.
For every component area, generate search queries for these option families unless clearly not applicable:
- **Simple baseline**: low-complexity classical or manual approach that can serve as a fallback or regression baseline.
- **Established production option**: mature library/service/pattern with field usage.
- **Open-source candidate**: permissive-license option with inspectable implementation and community history.
- **Commercial/vendor option**: paid or vendor-supported option, including SDK/platform constraints.
- **Current SOTA / research option**: recent model, paper, or benchmark leader that may be promising but immature.
- **Adjacent-domain option**: solution from a neighboring domain with similar constraints.
- **No-build / defer option**: whether the component can be avoided, simplified, or moved out of scope.
- **Known bad option**: candidate or family that appears attractive but has documented failure modes or disqualifiers.
For each component area, record:
- Candidate names and option families to search.
- At least 5 query variants covering alternatives, comparisons, limitations, licensing, runtime/scale, and exact project constraints.
- The minimum evidence needed to mark a candidate `Selected`, `Rejected`, `Experimental only`, or `Needs user decision`.
Add this as a "Component Option Search Plan" section in `00_question_decomposition.md`.
**Research Subject Boundary Definition (BLOCKING - must be explicit)**:
When decomposing questions, you must explicitly define the **boundaries of the research subject**:
| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |
| **Operating context** | What exact environment, lifecycle phase, and runtime conditions must the solution support? | In-flight embedded runtime vs offline post-processing; production web traffic vs admin batch job |
| **Required interfaces** | What inputs, outputs, protocols, data shapes, and ownership boundaries are fixed? | One camera vs stereo rig; REST API vs message queue; local file boundary vs service API |
| **Non-functional envelope** | What latency, throughput, storage, memory, availability, safety, security, cost, and maintainability targets are binding? | <400 ms p95, 8 GB RAM, 99.9% availability, reversible migrations |
**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
#### Decomposition Completeness Audit (MANDATORY)
After generating sub-questions, verify the decomposition covers all major dimensions of the problem — not just the ones that came to mind first.
1. **Domain discovery search**: Search the web for "key factors when [problem domain]" / "what to consider when [problem domain]" (e.g., "key factors GPS-denied navigation", "what to consider when choosing an edge deployment strategy"). Extract dimensions that practitioners and domain experts consider important but are absent from the current sub-questions.
2. **Run completeness probes**: Walk through each probe in `references/comparison-frameworks.md` → "Decomposition Completeness Probes" against the current sub-question list. For each probe, note whether it is covered, not applicable (state why), or missing.
3. **Fill gaps**: Add sub-questions (with search query variants) for any uncovered area. Do this before proceeding to Step 2.
Record the audit result in `00_question_decomposition.md` as a "Completeness Audit" section.
**Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
- Original question
- Active mode (A Phase 2 or B) and rationale
- Summary of relevant problem context from INPUT_DIR
- Classified question type and rationale
- **Research subject boundary definition** (population, geography, timeframe, level)
- **Project Constraint Matrix summary** (operating context, required interfaces, non-functional envelope, lifecycle assumptions, and hard disqualifiers extracted from input files)
- List of decomposed sub-questions
- **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
- **Search query variants** for each sub-question (at least 3-5 per sub-question)
- **Component Option Search Plan** (component areas, option families, candidate names, query variants, required evidence)
- **Completeness audit** (taxonomy cross-reference + domain discovery results)
4. Write TodoWrite to track progress
---
### Step 2: Source Tiering & Exhaustive Web Investigation
Tier sources by authority, **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation.
**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`
**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `query-docs` / `get-library-docs`) for up-to-date library/framework documentation. **Mandatory per lead candidate** — see "API Capability Verification" below.
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed
#### Exhaustive Search Requirements (MANDATORY)
Do not stop at the first few results. The goal is to build a comprehensive evidence base.
**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems
**Minimum search effort per component area**:
- Search every option family from the "Component Option Search Plan" before choosing a lead candidate.
- For each lead, fallback, or rejected candidate, search at least one official/source-of-truth page and at least one independent validation source when available.
- Search `"[component] alternatives"`, `"[candidate] vs [alternative]"`, `"[candidate] limitations"`, `"[candidate] license"`, `"[candidate] production"`, and `"[candidate] [binding project constraint]"`.
- If fewer than 3 realistic candidates are found for a component area, explicitly document why the landscape is narrow and search adjacent domains before accepting that result.
- Include at least one simple baseline and one "do not use" or disqualified candidate per component area when possible; these prevent false confidence in the selected option.
**Candidate implementation-limit searches (MANDATORY)**:
For every component/tool/library/service/pattern/algorithm that may be selected or recommended, search for its intrinsic implementation constraints. Do not rely on product category labels, marketing summaries, or examples from a different operating context. Include query variants for:
- Official supported inputs/outputs, protocols, data formats, and deployment modes
- Required hardware/runtime/platform/version constraints
- Timing, throughput, memory, storage, synchronization, and scaling assumptions
- Lifecycle assumptions: offline vs online, batch vs real time, development vs production, single tenant vs multi tenant, local vs networked
- Known unsupported scenarios, limitations, issue reports, production failures, and workarounds
- Licensing, security, maintenance, and community-health constraints
- Exact phrases from the project's restrictions and acceptance criteria combined with the candidate name
**API Capability Verification — Per-Mode (MANDATORY, BLOCKING for lead candidates)**:
**Applicability**: this section applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section, and only to lead candidates that are libraries/SDKs/frameworks/services/protocols/data formats with multiple modes or configurations. For non-technical research (concept comparison, market/policy investigation, knowledge organization, root-cause analysis without tooling commitments), skip this entire sub-section and continue with the rest of Step 2 — the broader candidate implementation-limit search above is sufficient. State the skip explicitly once in `02_fact_cards.md` (or in `02_fact_cards/00_summary.md` if split): `API Capability Verification: not applicable — this run is a Non-technical investigation, no library/SDK/service candidates`.
Most libraries/SDKs/services expose **multiple modes or configurations** (e.g., monocular vs stereo VO, sync vs async API, batch vs streaming inference, write-through vs write-behind cache). Selecting a candidate "because it supports X" without pinning *which mode* the project will use, and *whether that exact mode produces the required outputs from the required inputs*, is the most common silent-failure path in research. A library can support a class of problem in mode A while being unusable for the project's specific configuration in mode B.
For every lead candidate that is a library/SDK/framework/service with multiple modes or configurations, do the following — in this order, before marking the candidate `Selected`:
1. **Pin the exact mode/configuration the project will use.**
Derived from the Project Constraint Matrix: which inputs are available (sensor count, sensor types, data shapes, rates), which outputs are required (per `acceptance_criteria.md` and contract files), which hardware/runtime is fixed (per `restrictions.md`). Write this as a single sentence: "We will use `<library>` in `<mode/config>` with inputs `<list>` and expect outputs `<list>` on `<runtime>`." Do not progress past this step on a vague mode description.
2. **Run `context7` (or equivalent docs lookup) for the candidate** — this is **mandatory for every lead library/SDK/framework candidate**, not optional. Minimum three queries per candidate:
1. *Mode enumeration*: "What modes/configurations does `<library>` support? List every value of the mode/config enum and what each requires as input."
2. *Project's exact mode*: "Show a minimum runnable example of `<library>` in `<the pinned mode>` with `<the project's input shape>`. What does it produce?"
3. *Disqualifier probe*: "Does `<library>` `<the pinned mode>` produce `<the required output>`? Are there published limitations of `<the pinned mode>` for `<the project's runtime/hardware>`?"
For services without context7 coverage, use official docs site + WebFetch on the API reference page + the project's example/tutorial directory in the source repo. Append every consulted URL to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split — see splittable-artifacts convention in `00_project-integration.md`).
3. **Save a Minimum Viable Example (MVE) for the pinned mode.**
Append to `02_fact_cards.md` / `02_fact_cards/` (or a sibling `02_mve_evidence.md`) at least one block per lead library candidate with:
```markdown
## MVE — <library> in <pinned mode>
- **Source**: <official URL or context7 reference, with date>
- **Inputs in the example**: <e.g., 2 calibrated cameras + IMU at 200 Hz>
- **Outputs in the example**: <e.g., 6-DoF pose with covariance>
- **Project inputs**: <e.g., 1 camera + IMU at 200 Hz>
- **Project outputs required**: <e.g., 6-DoF pose with metric translation>
- **Match assessment**: ✅ exact match / ⚠️ partial (specify dimension) / ❌ mismatch (specify dimension)
- **If ⚠️ or ❌**: cite the official-docs sentence that establishes the mismatch.
```
If no official example covers the project's exact configuration → the candidate cannot be marked `Selected` based on category fit alone. Status must be `Experimental only` (with required-evidence note) or `Rejected` (when the docs explicitly disqualify the configuration).
4. **Bind every numbered Restriction and Acceptance Criterion to the candidate's pinned mode.**
For each numbered line in `restrictions.md` and `acceptance_criteria.md`, decide one of: `Pass` (the pinned mode satisfies it with cited evidence), `Fail` (the pinned mode contradicts it with cited evidence), `Verify` (no evidence either way; deeper investigation required), `N/A` (the line is irrelevant to this component area). Record this in `02_fact_cards.md` (or the candidate's per-component file under `02_fact_cards/` if split) under the candidate's MVE block. The structural matrix in Step 7.5 reads from these bindings.
5. **Treat "the same library in a different mode" as a different candidate.**
If the project's pinned mode is `Monocular` but the only documented evidence covers `Stereo`, do not silently soften "rotation only" into "rotation + translation". Open a separate candidate row for the Monocular mode, with its own MVE, fit assessment, and disqualifiers. Two modes of one library are two distinct candidates for the purposes of this gate.
**Common silent-failure pattern this guards against**: a fact card paraphrases the docs as "supports A, B, C, D modes" when the docs actually mean "supports A; B; C and D as separate orthogonal modes". A category-level "Selected" decision then carries through every downstream artifact, masking that the project's required A+B combination does not exist as a single mode.
**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"
- Try disqualifier probes: "X unsupported", "X limitations", "X requirements", "X with [project constraint]", "X without [required input]", "X real-time [target]", "X production failure"
**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
**Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` (or the appropriate category file under `01_source_registry/` if split) using the entry template from `references/source-tiering.md`.
---
### Step 3: Fact Extraction & Evidence Cards
Transform sources into **verifiable fact cards**:
```markdown
## Fact Cards
### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low
### Fact 2
...
```
**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
- ✅ High: Explicitly stated in official documentation
- ⚠️ Medium: Mentioned in official blog but not formally documented
- ❓ Low: Inference or from unofficial sources
**Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md` (or the appropriate category file under `02_fact_cards/` if split):
```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
- **Fit Impact**: [supports selection / disqualifies / makes experimental / needs user decision]
```
**Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
---
### Step 3.5: Iterative Deepening — Follow-Up Investigation
After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.
**Process**:
1. **Gap analysis**: Review fact cards and identify:
- Sub-questions with fewer than 3 high-confidence facts → need more searching
- Contradictions between sources → need tie-breaking evidence
- Perspectives (from Step 1) that have no or weak coverage → need targeted search
- Claims that rely only on L3/L4 sources → need L1/L2 verification
2. **Follow-up question generation**: Based on initial findings, generate new questions:
- "Source X claims [fact] — is this consistent with other evidence?"
- "If [approach A] has [limitation], how do practitioners work around it?"
- "What are the second-order effects of [finding]?"
- "Who disagrees with [common finding] and why?"
- "What happened when [solution] was deployed at scale?"
3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
- Specific claims that need verification
- Alternative viewpoints not yet represented
- Real-world case studies and experience reports
- Failure cases and edge conditions
- Recent developments that may change the picture
4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md` (use the appropriate category files under `01_source_registry/` and `02_fact_cards/` if split)
**Exit criteria**: Proceed to Step 4 when:
- Every sub-question has at least 3 facts with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information
@@ -0,0 +1,220 @@
## Research Engine — Analysis Phase (Steps 48)
### Step 4: Build Comparison/Analysis Framework
Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`
**Save action**:
Write to `03_comparison_framework.md`:
```markdown
# Comparison Framework
## Selected Framework Type
[Concept Comparison / Decision Support / ...]
## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...
## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```
**Required exact-fit dimensions for component/tool decisions**:
When the output selects or recommends a component, tool, library, service, architecture pattern, or algorithm, the framework MUST include these dimensions unless explicitly not applicable:
- Option family (`Simple baseline`, `Established production`, `Open-source`, `Commercial/vendor`, `Current SOTA`, `Adjacent-domain`, `No-build/defer`, `Known bad`)
- Required inputs/outputs and ownership boundaries
- Operating context and lifecycle fit
- Non-functional envelope fit
- Implementation assumptions and hard disqualifiers
- Evidence quality and source tier
- Selection status (`Selected`, `Rejected`, `Experimental only`, `Needs user decision`)
For each component area, include multiple candidates in the initial population. Do not present only the preferred option unless the investigation found no realistic alternatives; if so, state the searches that proved the narrow landscape.
---
### Step 5: Reference Point Baseline Alignment
Ensure all compared parties have clear, consistent definitions:
**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?
---
### Step 6: Fact-to-Conclusion Reasoning Chain
Explicitly write out the "fact → comparison → conclusion" reasoning process:
```markdown
## Reasoning Process
### Regarding [Dimension Name]
1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```
**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated
**Save action**:
Write to `04_reasoning_chain.md`:
```markdown
# Reasoning Chain
## Dimension 1: [Dimension Name]
### Fact Confirmation
According to [Fact #X], X's mechanism is...
### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])
### Conclusion
Therefore, the difference between X and Y on this dimension is...
### Confidence
✅/⚠️/❓ + rationale
---
## Dimension 2: [Dimension Name]
...
```
---
### Step 7: Use-Case Validation (Sanity Check)
Validate conclusions against a typical scenario:
**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?
**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?
- [ ] Does every selected component/tool/pattern match the Project Constraint Matrix?
- [ ] Are mismatches marked as disqualifiers instead of hidden as generic "limitations"?
**Save action**:
Write to `05_validation_log.md`:
```markdown
# Validation Log
## Validation Scenario
[Scenario description]
## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]
## Actual Validation Results
[actual situation]
## Counterexamples
[yes/no, describe if yes]
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]
## Conclusions Requiring Revision
[if any]
```
---
### Step 7.5: Component Applicability Gate (BLOCKING)
**Applicability**: this gate applies only when the run is classified as **Technical-component selection** in the SKILL's Research Output Class section. For non-technical research (concept comparison, market/policy investigation, root-cause analysis without tooling, knowledge organization), skip this entire step and proceed to Step 8 — there are no components to gate. State the skip once in `05_validation_log.md`: `Step 7.5 (Component Applicability Gate): not applicable — Non-technical investigation`. For mixed runs (some component areas technical, some not), apply this gate only to the technical component areas; the non-technical ones do not produce 7.5 rows.
Before finalizing the solution draft, build an exact-fit matrix for every component/tool/library/service/pattern/algorithm that is selected, recommended, rejected, or treated as a fallback. Free-form prose in a "Project Constraints Checked" column is **not sufficient** — mismatches hide inside rationale text. The matrix must be structured per restriction and per acceptance criterion.
#### 7.5.1 Top-level Component Fit Matrix
```markdown
# Component Fit Matrix
| Component Area | Candidate | Pinned Mode/Config | Option Family | Intended Role | API Capability Evidence | Mismatches / Disqualifiers | Status | Decision Rationale |
|----------------|-----------|--------------------|---------------|---------------|-------------------------|----------------------------|--------|--------------------|
| [area] | [name] | [exact mode/config the project will use, copied verbatim from the MVE block in Step 2] | [family] | [role] | MVE: [link to MVE block in `02_fact_cards.md` / `02_fact_cards/` or `02_mve_evidence.md`]; docs: [Source #] | [none / list] | Selected / Rejected / Experimental only / Needs user decision | [why] |
```
The new **Pinned Mode/Config** column is mandatory. A row without a pinned mode is incomplete. The new **API Capability Evidence** column links to the Minimum Viable Example saved during Step 2's API Capability Verification — without an MVE link the candidate cannot be `Selected`.
#### 7.5.2 Restrictions × Candidate-Modes Sub-Matrix (MANDATORY)
For each lead candidate row in the top-level matrix, append a structured cross-check that walks every numbered line of `restrictions.md` and `acceptance_criteria.md` against the candidate's **pinned mode/config**.
```markdown
## Sub-Matrix — <Candidate Name> in <Pinned Mode>
| Restriction / AC | Candidate-mode behavior | Result | Evidence |
|------------------|-------------------------|--------|----------|
| R1: <verbatim line from restrictions.md> | <how the pinned mode behaves under this restriction> | ✅ Pass / ❌ Fail / ❓ Verify / N/A | [Fact # / Source # / MVE link] |
| R2: ... | ... | ... | ... |
| ... | ... | ... | ... |
| AC-1.1: <verbatim line from acceptance_criteria.md> | <how the pinned mode satisfies (or contradicts) this AC's measurable target> | ✅ / ❌ / ❓ / N/A | [Fact # / Source # / MVE link] |
| AC-1.2: ... | ... | ... | ... |
| ... | ... | ... | ... |
```
Cell semantics:
-**Pass** — the candidate's pinned mode satisfies this line, with cited official-doc or MVE evidence.
-**Fail** — the candidate's pinned mode contradicts this line, with cited evidence. Even one ❌ disqualifies the candidate from `Selected` status.
-**Verify** — no evidence yet either way; further investigation required (loops back to Step 2 / Step 3.5). A row left ❓ at the end of analysis blocks the candidate.
- **N/A** — the line is irrelevant to this component area (state why in one phrase).
A candidate row may not be marked `Selected` while any cell is ❌ or ❓.
#### 7.5.3 Decision Rules
- `Selected` is allowed only when (a) the top-level row has an MVE link, (b) the sub-matrix has zero ❌, (c) the sub-matrix has zero ❓, and (d) the candidate's documented implementation assumptions match the project's explicit constraints and acceptance criteria.
- `Experimental only` is required when a candidate might work but lacks proof for the exact operating context (e.g., MVE exists for a similar configuration but not the exact one).
- `Rejected` is required when documented assumptions conflict with project constraints (any sub-matrix row is ❌ with cited evidence).
- `Needs user decision` is required when a mismatch changes scope, cost, safety, product behavior, or acceptance criteria — and the user has not yet been consulted.
- Each component area must include at least one selected or fallback-safe option, plus the most credible rejected/experimental alternatives discovered during web research.
- A component area with only one candidate is incomplete unless `00_question_decomposition.md` documents the broader searches and why they yielded no realistic alternatives.
- A candidate may not appear as the lead solution in Step 8 unless this gate marks it `Selected`.
- "Validation gate required" footnotes are not equivalent to `Selected`. If the validation gate concerns API capability (does the mode produce the required output?), that is a Step-2 / Step-7.5 question and must be resolved here, not deferred to runtime. Only validation gates concerning *runtime quality* (e.g., "does this VO converge on this terrain class?") may be carried forward as `Selected with runtime gate`.
**Save action**: Write `06_component_fit_matrix.md` (or, when split, the equivalent files under `06_component_fit_matrix/` — typically `00_summary.md` for the top-level matrix plus per-component sub-matrix files) containing both 7.5.1 (top-level) and 7.5.2 (per-candidate sub-matrices).
**BLOCKING**: If any lead candidate has ❌, ❓, `Experimental only`, `Rejected`, or `Needs user decision` status, do not silently proceed. Ask the user or choose a different selected candidate.
---
### Step 8: Deliverable Formatting
Make the output **readable, traceable, and actionable**.
**Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`
Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md` (or files under `02_fact_cards/` if split)
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md` (or files under `01_source_registry/` if split)
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
@@ -0,0 +1,46 @@
# Solution Draft
## Product Solution Description
[Short description of the proposed solution. Brief component interaction diagram.]
## Existing/Competitor Solutions Analysis
[Analysis of existing solutions for similar problems, if any.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
> **Applicability** — the table columns `Pinned Mode/Config` and `API Capability Evidence` apply only to technical-component runs (per SKILL.md → Research Output Class). For non-technical research outputs (concept comparison, market/policy report, investigation answer), this Architecture section may be replaced with a comparison/analysis section that does not use these columns; or the columns may be marked `N/A` per row when the row describes a non-technical "component" (a process, a policy, an organizational construct). For mixed runs, fill the columns only on rows that describe libraries/SDKs/frameworks/services/protocols/data formats/algorithms.
### Component: [Component Name]
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|-----------|-------------|-------------|----------|------|-------------------------|-----|
| [Option 1] | [lib/platform] | [exact mode/config used: inputs, outputs, runtime] | [pros] | [cons] | [intrinsic requirements] | [security] | [cost] | MVE: [link to MVE block]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision — cite exact-fit evidence and disqualifiers] |
| [Option 2] | [lib/platform] | [exact mode/config used] | [pros] | [cons] | [intrinsic requirements] | [security] | [cost] | MVE: [link]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision] |
**Exact-fit evidence**:
- Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
- Evidence: [Fact # / Source #]
- Disqualifiers: [none or list]
- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
- API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)
@@ -0,0 +1,49 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| [old] | [weak point] | [new] |
## Product Solution Description
[Short description. Brief component interaction diagram. Written as if from scratch — no "updated" markers.]
## Architecture
[Architecture solution that meets restrictions and acceptance criteria.]
> **Applicability** — the table columns `Pinned Mode/Config` and `API Capability Evidence` apply only to technical-component runs (per SKILL.md → Research Output Class). For non-technical assessment outputs (e.g., reassessing a policy approach, comparing organizational designs), this Architecture section may be replaced with the assessment content that does not use these columns; or the columns may be marked `N/A` per row for non-technical "components". For mixed runs, fill the columns only on rows that describe libraries/SDKs/frameworks/services/protocols/data formats/algorithms.
### Component: [Component Name]
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|-----------|-------------|-------------|----------|------------|-------------------------|-----|
| [Option 1] | [lib/platform] | [exact mode/config used: inputs, outputs, runtime] | [pros] | [cons] | [intrinsic requirements] | [security] | [perf] | MVE: [link to MVE block]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision — cite exact-fit evidence and disqualifiers] |
| [Option 2] | [lib/platform] | [exact mode/config used] | [pros] | [cons] | [intrinsic requirements] | [security] | [perf] | MVE: [link]; docs: [Source #] | [Selected / Rejected / Experimental only / Needs user decision] |
**Exact-fit evidence**:
- Project constraints checked: [inputs/outputs, operating context, lifecycle, NFRs, acceptance criteria]
- Evidence: [Fact # / Source #]
- Disqualifiers: [none or list]
- Restrictions × Candidate-Modes sub-matrix: see `06_component_fit_matrix.md` (or `06_component_fit_matrix/` if split) § <Candidate Name>
- API capability gates: ✅ MVE saved / ⚠️ partial — see disqualifiers / ❌ no MVE — candidate is Experimental only or Rejected
[Repeat per component]
## Testing Strategy
### Integration / Functional Tests
- [Test 1]
- [Test 2]
### Non-Functional Tests
- [Performance test 1]
- [Security test 1]
## References
[All cited source links]
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)