mirror of
https://github.com/azaion/loader.git
synced 2026-04-22 22:36:33 +00:00
b0a03d36d6
Made-with: Cursor
## Research Engine — Investigation Phase (Steps 0–3.5)

### Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |

**Mode-specific classification**:

| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |
### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

**Save action**: Append the novelty sensitivity assessment to the end of `00_question_decomposition.md`

---
### Step 1: Question Decomposition & Boundary Definition

**Mode-specific sub-questions**:

**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"

**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"

**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)
#### Perspective Rotation (MANDATORY)

For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.
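The rotation can be sketched as a lookup from perspective to query templates. This is a minimal illustration, not part of the documented workflow: the function name and the exact template strings (trimmed from the table above) are assumptions.

```python
# Illustrative sketch: instantiate per-perspective search queries for a topic.
# Templates are abbreviated examples drawn from the table above.
PERSPECTIVES = {
    "end_user": ["{x} problems", "{x} frustrations reddit", "{x} user complaints"],
    "implementer": ["{x} implementation challenges", "{x} pitfalls", "{x} lessons learned"],
    "business": ["{x} total cost of ownership", "{x} ROI case study"],
    "contrarian": ["{x} criticism", "why not {x}", "{x} disadvantages real world"],
    "academic": ["{x} research paper", "{x} systematic review"],
    "practitioner": ["{x} in production", "{x} experience report"],
}

def perspective_queries(topic: str, chosen: list[str]) -> dict[str, list[str]]:
    """Return search queries for each chosen perspective; enforce the 3-perspective minimum."""
    if len(chosen) < 3:
        raise ValueError("select at least 3 perspectives")
    return {p: [t.format(x=topic) for t in PERSPECTIVES[p]] for p in chosen}
```

Each perspective's query list then feeds directly into Step 2's search phase.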
#### Question Explosion (MANDATORY)

For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in `00_question_decomposition.md` alongside each sub-question.
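A minimal sketch of the explosion step, applying one template per strategy to a sub-question topic. The function and its templates are illustrative assumptions; real variants should be tailored per sub-question.

```python
# Illustrative sketch: expand one sub-question topic into query variants,
# one per strategy from the list above.
def explode_queries(topic: str, year: str = "2025") -> list[str]:
    variants = [
        topic,                                # broad end of the specificity ladder
        f"{topic} limitations",               # negation/failure
        f"{topic} vs alternatives",           # comparison framing
        f"{topic} in production experience",  # practitioner voice
        f"{topic} {year}",                    # temporal
    ]
    assert len(variants) >= 3  # minimum 3-5 variants per sub-question
    return variants
```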
**Research Subject Boundary Definition (BLOCKING - must be explicit)**:

When decomposing questions, you must explicitly define the **boundaries of the research subject**:

| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |

**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.
**Save action**:

1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
   - Original question
   - Active mode (A Phase 2 or B) and rationale
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
4. Use TodoWrite to track progress
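The required sections of `00_question_decomposition.md` can be scaffolded programmatically. This is only a sketch under assumptions: the function name and heading strings are illustrative, not a prescribed file format.

```python
# Illustrative sketch: scaffold 00_question_decomposition.md with the
# sections the save action above requires (headings are assumptions).
def decomposition_scaffold(question: str, mode: str) -> str:
    sections = [
        f"# Question Decomposition\n\n**Original question**: {question}",
        f"**Active mode**: {mode} (rationale: ...)",
        "## Problem Context (from INPUT_DIR)",
        "## Question Type & Rationale",
        "## Research Subject Boundaries\n- Population:\n- Geography:\n- Timeframe:\n- Level:",
        "## Sub-Questions",
        "## Chosen Perspectives (at least 3)",
        "## Search Query Variants (3-5 per sub-question)",
    ]
    return "\n\n".join(sections)
```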
---
### Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation sources.

**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`

**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training-data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed
#### Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"
**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
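The saturation rule can be expressed as a simple check over per-search result sets. A minimal sketch, assuming each search is reduced to a set of fact identifiers; the function name and data shape are illustrative.

```python
# Illustrative sketch of the saturation rule: a sub-question is saturated
# once the last `window` searches yield no facts not already seen.
def is_saturated(search_results: list[set[str]], window: int = 3) -> bool:
    """search_results: per-search sets of fact identifiers, in chronological order."""
    if len(search_results) < window:
        return False  # not enough searches yet to judge saturation
    seen: set[str] = set()
    for facts in search_results[:-window]:
        seen |= facts
    # Saturated iff every fact in the last `window` searches was already known.
    return all(facts <= seen for facts in search_results[-window:])
```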

**Save action**:

For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
---
### Step 3: Fact Extraction & Evidence Cards

Transform sources into **verifiable fact cards**:

```markdown
## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...
```
**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
  - ✅ High: Explicitly stated in official documentation
  - ⚠️ Medium: Mentioned in official blog but not formally documented
  - ❓ Low: Inference or from unofficial sources
**Save action**:

For each extracted fact, **immediately** append to `02_fact_cards.md`:

```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```

**Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
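The fact-card template above maps naturally onto a small data structure. A minimal sketch with field names mirroring the template; the class itself is illustrative, not part of the documented workflow.

```python
# Illustrative in-memory shape for a fact card; fields mirror the
# 02_fact_cards.md template above.
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    HIGH = "✅"    # explicitly stated in official documentation
    MEDIUM = "⚠️"  # official blog, not formally documented
    LOW = "❓"     # inference or unofficial source

@dataclass
class FactCard:
    number: int
    statement: str        # must annotate applicable scope for non-matching sources
    source: str           # "[Source #n] link"
    phase: str            # Phase 1 / Phase 2 / Assessment
    target_audience: str
    confidence: Confidence
    dimension: str        # corresponding comparison dimension

    def to_markdown(self) -> str:
        """Render the entry exactly as it is appended to 02_fact_cards.md."""
        return (
            f"## Fact #{self.number}\n"
            f"- **Statement**: {self.statement}\n"
            f"- **Source**: {self.source}\n"
            f"- **Phase**: {self.phase}\n"
            f"- **Target Audience**: {self.target_audience}\n"
            f"- **Confidence**: {self.confidence.value}\n"
            f"- **Related Dimension**: {self.dimension}\n"
        )
```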
---
### Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

**Process**:

1. **Gap analysis**: Review fact cards and identify:
   - Sub-questions with fewer than 3 high-confidence facts → need more searching
   - Contradictions between sources → need tie-breaking evidence
   - Perspectives (from Step 1) that have no or weak coverage → need targeted search
   - Claims that rely only on L3/L4 sources → need L1/L2 verification

2. **Follow-up question generation**: Based on initial findings, generate new questions:
   - "Source X claims [fact] — is this consistent with other evidence?"
   - "If [approach A] has [limitation], how do practitioners work around it?"
   - "What are the second-order effects of [finding]?"
   - "Who disagrees with [common finding] and why?"
   - "What happened when [solution] was deployed at scale?"

3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
   - Specific claims that need verification
   - Alternative viewpoints not yet represented
   - Real-world case studies and experience reports
   - Failure cases and edge conditions
   - Recent developments that may change the picture

4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md`
**Exit criteria**: Proceed to Step 4 when:

- Every sub-question has at least 3 facts with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information