mirror of
https://github.com/azaion/autopilot.git
synced 2026-04-22 22:06:34 +00:00
Refine coding standards and testing rules
- Updated coding rules to emphasize readability, meaningful comments, and maintainability.
- Adjusted test coverage thresholds to 75% for business logic and clarified expectations for test scenarios.
- Enhanced guidelines for handling skipped tests, emphasizing the need for investigation and resolution.
- Introduced a completeness audit for decomposition in research steps to ensure thoroughness in addressing problem dimensions.

Made-with: Cursor
# Agent Meta Rules
## Execution Safety
- Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so.
## User Interaction
- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
- Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns.
## Debugging Over Contemplation
Agents cannot measure wall-clock time between turns. Use observable counts from your own transcript instead.
**Trigger: stop speculating and instrument.** When you've formed **3 or more distinct hypotheses** about a bug without confirming any against runtime evidence (logs, stderr, debugger state, actual test failure messages) — stop and add debugging output. Re-reading the same code hoping to "spot it this time" counts as a new hypothesis that still has zero evidence.
Steps:
1. Identify the last known-good boundary (e.g., "request enters handler") and the known-bad result (e.g., "callback never fires").
2. Add targeted `print(..., flush=True)`, `console.error`, or logger statements at each intermediate step to narrow the gap.
3. Run the instrumented code. Read the output. Let evidence drive the next hypothesis — not inference chains.
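The three steps above can be sketched as follows. The pipeline (`parse` → `transform` → `emit`) and its stage names are hypothetical, invented purely for illustration; only the instrumentation pattern between the known-good boundary and the known-bad result is the point:

```python
# A minimal sketch of narrowing the gap with targeted debug prints.
# parse / transform / emit are hypothetical stages, not a real API.

def parse(raw):
    # Last known-good boundary: we know data arrives here.
    print(f"[debug] parse: raw={raw!r}", flush=True)
    return raw.split(",")

def transform(fields):
    # Intermediate step: print what actually flows through,
    # rather than assuming what it should be.
    print(f"[debug] transform: fields={fields}", flush=True)
    return [f.strip() for f in fields if f.strip()]

def emit(items):
    # Known-bad result is observed at this end of the pipeline.
    print(f"[debug] emit: items={items}", flush=True)
    return ";".join(items)

if __name__ == "__main__":
    # Run the instrumented code and read the real output,
    # instead of re-reading the source hoping to spot the bug.
    print(emit(transform(parse("a, b, , c"))), flush=True)
```

Each debug line either confirms or kills a hypothesis; the first stage whose printed output diverges from expectation is where the bug lives.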
An instrumented run producing real output beats any amount of "could it be X? but then Y..." reasoning.
## Long Investigation Retrospective
Trigger a post-mortem when ANY of the following is true (all are observable in your own transcript):
- **10+ tool calls** were used to diagnose a single issue
- **Same file modified 3+ times** without tests going green
- **3+ distinct approaches** attempted before arriving at the fix
- Any phrase like "let me try X instead" appeared **more than twice**
- A fix was eventually found by reading docs/source the agent had dismissed earlier
Post-mortem steps:
1. **Identify the bottleneck**: wrong assumption? missing runtime visibility? incorrect mental model of a framework/language boundary? ignored evidence?
2. **Extract the general lesson**: what category of mistake was this? (e.g., "Python cannot call Cython `cdef` methods", "engine errors silently swallowed", "wrong layer to fix the problem")
3. **Propose a preventive rule**: short, actionable. Present to user for approval.
4. **Write it down**: add approved rule to the appropriate `.mdc` so it applies to future sessions.
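As a purely hypothetical illustration of step 4 (the filename, section title, and rule text below are invented, reusing the Cython example from step 2), an approved preventive rule written into an `.mdc` file might look like:

```markdown
---
description: Lessons learned from long investigations
alwaysApply: true
---

## Runtime Boundaries

- Python code cannot call Cython `cdef` methods directly; go through a
  `def`/`cpdef` wrapper instead of assuming the attribute is reachable.
```

Because the file carries `alwaysApply: true` frontmatter (the same flag visible in this document's own rules file), the lesson is loaded into every future session rather than staying buried in one transcript.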