Research Engine — Investigation Phase (Steps 0-3.5)

Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---|---|---|
| Concept Comparison | Build comparison framework | Mechanism differences, applicability boundaries |
| Decision Support | Weigh trade-offs | Cost, risk, benefit |
| Trend Analysis | Map evolution trajectory | History, driving factors, predictions |
| Problem Diagnosis | Root cause analysis | Symptoms, causes, evidence chain |
| Knowledge Organization | Systematic structuring | Definitions, classifications, relationships |

Mode-specific classification:

| Mode / Phase | Typical Question Type |
|---|---|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |

Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

For full classification table, critical-domain rules, trigger words, and assessment template: Read references/novelty-sensitivity.md

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources published within the last 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

Save action: Append timeliness assessment to the end of 00_question_decomposition.md
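The full assessment template lives in references/novelty-sensitivity.md; the sketch below is only an illustration of what the appended note might look like (the field names are assumptions, not the canonical template):

## Timeliness Assessment
- **Novelty sensitivity**: Critical
- **Rationale**: the topic concerns LLM tooling, a critical-sensitivity domain
- **Source policy**: sources no older than 6 months, mandatory version annotations, cross-validation from 2+ sources, official download pages verified directly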


Step 1: Question Decomposition & Boundary Definition

Mode-specific sub-questions:

Mode A Phase 2 (Initial Research — Problem & Solution):

  • "What existing/competitor solutions address this problem?"
  • "What are the component parts of this problem?"
  • "For each component, what are the state-of-the-art solutions?"
  • "What are the security considerations per component?"
  • "What are the cost implications of each approach?"

Mode B (Solution Assessment):

  • "What are the weak points and potential problems in the existing draft?"
  • "What are the security vulnerabilities in the proposed architecture?"
  • "Where are the performance bottlenecks?"
  • "What solutions exist for each identified issue?"

General sub-question patterns (use when applicable):

  • Sub-question A: "What is X and how does it work?" (Definition & mechanism)
  • Sub-question B: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
  • Sub-question C: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
  • Sub-question D: "What are X's development trends/best practices?" (Extended analysis)

Perspective Rotation (MANDATORY)

For each research problem, examine it from at least 3 different perspectives. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|---|---|---|
| End-user / Consumer | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| Implementer / Engineer | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| Business / Decision-maker | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| Contrarian / Devil's advocate | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| Domain expert / Academic | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| Practitioner / Field | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in 00_question_decomposition.md.
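For illustration, a perspective selection recorded in 00_question_decomposition.md might look like the sketch below (the topic and rationales are hypothetical, reusing the indoor-navigation example from the query strategies later in this step):

## Chosen Perspectives
- **Implementer / Engineer**: the core uncertainty is technical feasibility, so implementation pitfalls and hidden complexity matter most
- **Contrarian / Devil's advocate**: vendor accuracy claims need to be stress-tested against failure reports
- **Practitioner / Field**: production experience reports reveal the gap between datasheets and real deployments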

Question Explosion (MANDATORY)

For each sub-question, generate at least 3-5 search query variants before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

Query variant strategies:

  • Specificity ladder: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
  • Negation/failure: "X limitations", "X failure modes", "when X doesn't work"
  • Comparison framing: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
  • Practitioner voice: "X in production experience", "X real-world results", "X lessons learned"
  • Temporal: "X 2025", "X latest developments", "X roadmap"
  • Geographic/domain: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in 00_question_decomposition.md alongside each sub-question.
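For example, a planned-query entry for one sub-question might look like this (the sub-question and queries are hypothetical, built from the variant strategies above using the indoor-navigation example):

### Sub-question 2: How accurate is UWB-based indoor drone navigation in practice?
- "indoor navigation systems" (specificity ladder: broad)
- "UWB-based indoor drone navigation accuracy" (specificity ladder: narrow)
- "UWB indoor positioning limitations" (negation/failure)
- "UWB vs visual SLAM for indoor drones" (comparison framing)
- "UWB indoor navigation in production experience" (practitioner voice)
- "UWB indoor positioning 2025" (temporal)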

Research Subject Boundary Definition (BLOCKING - must be explicit):

When decomposing questions, you must explicitly define the boundaries of the research subject:

| Dimension | Boundary to define | Example |
|---|---|---|
| Population | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| Geography | Which region is being studied? | Chinese universities vs US universities vs global |
| Timeframe | Which period is being studied? | Post-2020 vs full historical picture |
| Level | Which level is being studied? | Undergraduate vs graduate vs vocational |

Common mistake: The user asks about "university classroom issues" but the sources cite policies targeting "K-12 students" — mismatched target populations can invalidate the entire research effort.

Save action:

  1. Read all files from INPUT_DIR to ground the research in the project context
  2. Create working directory RESEARCH_DIR/
  3. Write 00_question_decomposition.md (a minimal skeleton is sketched after this list), including:
    • Original question
    • Active mode (A Phase 2 or B) and rationale
    • Summary of relevant problem context from INPUT_DIR
    • Classified question type and rationale
    • Research subject boundary definition (population, geography, timeframe, level)
    • List of decomposed sub-questions
    • Chosen perspectives (at least 3 from the Perspective Rotation table) with rationale
    • Search query variants for each sub-question (at least 3-5 per sub-question)
  4. Write TodoWrite to track progress
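
A minimal skeleton of 00_question_decomposition.md, assuming the contents listed in item 3 above (the section names are suggestions, not mandated by this skill):

# Question Decomposition
## Original Question
## Active Mode & Rationale
## Problem Context (from INPUT_DIR)
## Question Type & Rationale
## Research Subject Boundaries
- Population / Geography / Timeframe / Level
## Sub-questions
## Chosen Perspectives (at least 3) & Rationale
## Search Query Variants (at least 3-5 per sub-question)
## Timeliness Assessment (appended per Step 0.5)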

Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority and prioritize primary sources (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary evidence and validation.

For full tier definitions, search strategies, community mining steps, and source registry templates: Read references/source-tiering.md

Tool Usage:

  • Use WebSearch for broad searches; WebFetch to read specific pages
  • Use the context7 MCP server (resolve-library-id then get-library-docs) for up-to-date library/framework documentation
  • Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
  • When citing web sources, include the URL and date accessed
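
For example, a cited web source might be recorded as follows (the exact citation format is not prescribed here; this is just one workable shape):

https://example.com/docs/release-notes (accessed 2025-06-12)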

Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

Minimum search effort per sub-question:

  • Execute all query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
  • Consult at least 2 different source tiers per sub-question (e.g., L1 official docs + L4 community discussion)
  • If initial searches yield fewer than 3 relevant sources for a sub-question, broaden the search with alternative terms, related domains, or analogous problems

Search broadening strategies (use when results are thin):

  • Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
  • Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
  • Try different geographies: search in English + search for European/Asian approaches if relevant
  • Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
  • Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"

Search saturation rule: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.

Save action: For each source consulted, immediately append to 01_source_registry.md using the entry template from references/source-tiering.md.
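
The canonical entry template is defined in references/source-tiering.md; purely as an illustrative placeholder (the field names below are assumptions, not that template), a registry entry might capture:

## Source #3
- **Title / URL**: [page title] ([link])
- **Tier**: L1 (official documentation)
- **Date accessed**: 2025-06-12
- **Relevant sub-questions**: [which sub-questions this source informs]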


Step 3: Fact Extraction & Evidence Cards

Transform sources into verifiable fact cards:

## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...

Key discipline:

  • Pin down facts first, then reason
  • Distinguish "what officials said" from "what I infer"
  • When conflicting information is found, annotate and preserve both sides
  • Annotate confidence level:
    • ✅ High: Explicitly stated in official documentation
    • ⚠️ Medium: Mentioned in an official blog post but not formally documented
    • ❓ Low: Inference or from unofficial sources

Save action: For each extracted fact, immediately append to 02_fact_cards.md:

## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]

Target audience in fact statements:

  • If a fact comes from a "partially overlapping" or "reference only" source, the statement must explicitly annotate the applicable scope
  • Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
  • Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"
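
Putting the pieces together, a filled-in card for the phone-ban example above might look like this (the numbering and related dimension are hypothetical):

## Fact #7
- **Statement**: The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)
- **Source**: [Source #3] [link to the official policy notice]
- **Phase**: Phase 1
- **Target Audience**: K-12 students (reference only for a university-focused study)
- **Confidence**: ✅
- **Related Dimension**: Policy environment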

Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify knowledge gaps and new questions that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

Process:

  1. Gap analysis: Review fact cards and identify:

    • Sub-questions with fewer than 3 high-confidence facts → need more searching
    • Contradictions between sources → need tie-breaking evidence
    • Perspectives (from Step 1) that have no or weak coverage → need targeted search
    • Claims that rely only on L3/L4 sources → need L1/L2 verification
  2. Follow-up question generation: Based on initial findings, generate new questions:

    • "Source X claims [fact] — is this consistent with other evidence?"
    • "If [approach A] has [limitation], how do practitioners work around it?"
    • "What are the second-order effects of [finding]?"
    • "Who disagrees with [common finding] and why?"
    • "What happened when [solution] was deployed at scale?"
  3. Targeted deep-dive searches: Execute follow-up searches focusing on:

    • Specific claims that need verification
    • Alternative viewpoints not yet represented
    • Real-world case studies and experience reports
    • Failure cases and edge conditions
    • Recent developments that may change the picture
  4. Update artifacts: Append new sources to 01_source_registry.md, new facts to 02_fact_cards.md

Exit criteria: Proceed to Step 4 when:

  • Every sub-question has at least 3 facts with at least one from L1/L2
  • At least 3 perspectives from Step 1 have supporting evidence
  • No unresolved contradictions remain (or they are explicitly documented as open questions)
  • Follow-up searches are no longer producing new substantive information