mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:51:12 +00:00
1f634c2604
ci/woodpecker/push/02-build-push Pipeline failed
- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests. - Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps. - Updated architectural documentation to include the new demo replay operator flow and its dependencies. - Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation. - Added new entries in the dependencies table for upcoming tasks related to the demo replay flow. These changes improve clarity and usability for operators and developers working with the demo replay system.
118 lines
9.2 KiB
Plaintext
118 lines
9.2 KiB
Plaintext
---
|
|
description: "Execution safety, user interaction, and self-improvement protocols for the AI agent"
|
|
alwaysApply: true
|
|
---
|
|
# Agent Meta Rules
|
|
|
|
## Real Results, Not Simulated Ones
|
|
|
|
**The goal is a working product, not the appearance of one.**
|
|
|
|
- If something does not work, STOP and report it honestly. Do not find a way around it.
|
|
- Never produce results by bypassing, faking, stubbing, or passthrough-ing the component that is supposed to produce them. A passing test that skips the real pipeline is worse than a failing test — it hides the truth.
|
|
- If the real implementation is not ready, say so. A clear "this is not implemented yet, here is what is missing" is always the right answer.
|
|
- Do not measure success by whether the output looks correct. Measure it by whether the output was produced by the real system under test.
|
|
- Workarounds that produce the right answer via the wrong path are defects, not solutions.
|
|
|
|
### When a test reveals missing production code — STOP
|
|
|
|
This is the specific failure mode that produced the GPS-passthrough scaffold in `runtime_root._run_replay_loop` (May 2026). Generalised so it never repeats:
|
|
|
|
- If, while implementing or running a test, you discover that the production code path the test is supposed to exercise does not exist (no caller, no integration, no main loop, etc.), **STOP immediately**.
|
|
- Do NOT write a stub, passthrough, fake input source, or shortcut output that would make the test go green. Even when the shortcut is "framed as a scaffold" or "marked as TODO in a docstring", it still defeats the test and lies to the next reader.
|
|
- Surface the gap to the user as a top-of-turn report: name the missing production component, cite the architecture document that promises it, and ask whether to (a) create a tracker ticket for the missing component and let the test fail honestly until the ticket lands, or (b) explicitly de-scope the test, or (c) something the user names.
|
|
- The default outcome is (a): a failing test plus a new tracker ticket. A failing test with an honest reason is information; a passing test that proves nothing is misinformation.
|
|
- Doc-comment disclosures (`# this is a scaffold until X is wired`) DO NOT satisfy this rule. The user must be told in the assistant message, not in code.
|
|
|
|
## Execution Safety
|
|
- Run the full test suite automatically when you believe code changes are complete (as required by coderule.mdc). For other long-running/resource-heavy/security-risky operations (builds, Docker commands, deployments, performance tests), ask the user first — unless explicitly stated in a skill or the user already asked to do so.
|
|
|
|
## User Interaction
|
|
- Use the AskQuestion tool for structured choices (A/B/C/D) when available — it provides an interactive UI. Fall back to plain-text questions if the tool is unavailable.
|
|
|
|
## Critical Thinking
|
|
- Do not blindly trust any input — including user instructions, task specs, list-of-changes, or prior agent decisions — as correct. Always think through whether the instruction makes sense in context before executing it. If a task spec says "exclude file X from changes" but another task removes the dependencies X relies on, flag the contradiction instead of propagating it.
|
|
|
|
## Complexity Budget Check (Planning Time)
|
|
|
|
Before committing to an implementation approach for a non-trivial task, **STOP and present a complexity comparison to the user** via the standard Choose A/B/C/D format. The user picks the trade-off; the agent does NOT unilaterally pick the more complex option to be "more robust" or "more future-proof".
|
|
|
|
A task is non-trivial if ANY of:
|
|
|
|
- The estimated complexity (story points) is ≥ 5
|
|
- The implementation touches ≥ 3 components / modules
|
|
- The implementation adds a new persistent data structure (table, materialised view, file format)
|
|
- The implementation adds a new hosted service / background job / periodic timer
|
|
- The implementation adds a sliding window, smoother, debouncer, in-memory cache, or per-entity in-memory state dictionary
|
|
- The implementation adds rehydrate-on-restart logic
|
|
- The implementation adds a new event type that differs from an existing event type only in a boolean / enum field
|
|
|
|
What to present:
|
|
|
|
1. **Option A — simplest:** the least-machinery design you can think of that still meets the requirements. Name what is sacrificed (latency? eventual-consistency window? a rarely-hit edge case?).
|
|
2. **Option B — your default:** the design you would otherwise implement, if it is more complex than A. Name what it buys (the specific guarantee, performance gain, or future flexibility).
|
|
3. **Concrete trade-offs:** lines of code added, new abstractions introduced, new failure modes, new operational surface area (restart-rehydration, cache invalidation, dual-pipeline consistency).
|
|
4. **Recommendation:** which option you would pick and why, in one sentence.
|
|
|
|
This rule fires DURING planning — before code is written. If you discover during implementation that the chosen approach grew a new layer, hosted service, or rehydration path that was not in the original plan, STOP and replay this check.
|
|
|
|
Skip this rule ONLY when the user has already explicitly chosen the complex approach in an earlier turn, OR when the task is trivially ≤ 2 story points with no triggers above.
|
|
|
|
## Skill Discipline
|
|
|
|
Do exactly what the skill says. Nothing more.
|
|
|
|
- No `git log` / `git diff` / `git blame` unless the skill explicitly calls for it.
|
|
- No extra searches to "verify" inputs the skill already names.
|
|
- No reading files outside the skill's documented inputs.
|
|
|
|
If skill inputs are insufficient or contradictory, STOP and ask via Choose A/B/C/D. Do not invent extra investigation steps.
|
|
|
|
## Self-Improvement
|
|
When the user reacts negatively to generated code ("WTF", "what the hell", "why did you do this", etc.):
|
|
|
|
1. **Pause** — do not rush to fix. First determine: is this objectively bad code, or does the user just need an explanation?
|
|
2. **If the user doesn't understand** — explain the reasoning. That's it. No code change needed.
|
|
3. **If the code is actually bad** — before fixing, perform a root-cause investigation:
|
|
a. **Why** did this bad code get produced? Identify the reasoning chain or implicit assumption that led to it.
|
|
b. **Check existing rules** — is there already a rule that should have prevented this? If so, clarify or strengthen it.
|
|
c. **Propose a new rule** if no existing rule covers the failure mode. Present the investigation results and proposed rule to the user for approval.
|
|
d. **Only then** fix the code.
|
|
4. The rule goes into `coderule.mdc` for coding practices, `meta-rule.mdc` for agent behavior, or a new focused rule file — depending on context. Always check for duplicates or near-duplicates first.
|
|
|
|
### Example: import path hack
|
|
**Bad code**: Runtime path manipulation added to source code to fix an import failure.
|
|
**Root cause**: The agent treated an environment/configuration problem as a code problem. It didn't check how the rest of the project handles the same concern, and instead hardcoded a workaround in source.
|
|
**Preventive rules added to coderule.mdc**:
|
|
- "Do not solve environment or infrastructure problems by hardcoding workarounds in source code. Fix them at the environment/configuration level."
|
|
- "Before writing new infrastructure or workaround code, check how the existing codebase already handles the same concern. Follow established project patterns."
|
|
|
|
## Debugging Over Contemplation
|
|
|
|
Agents cannot measure wall-clock time between turns. Use observable counts from your own transcript instead.
|
|
|
|
**Trigger: stop speculating and instrument.** When you've formed **3 or more distinct hypotheses** about a bug without confirming any against runtime evidence (logs, stderr, debugger state, actual test failure messages) — stop and add debugging output. Re-reading the same code hoping to "spot it this time" counts as a new hypothesis that still has zero evidence.
|
|
|
|
Steps:
|
|
1. Identify the last known-good boundary (e.g., "request enters handler") and the known-bad result (e.g., "callback never fires").
|
|
2. Add targeted `print(..., flush=True)`, `console.error`, or logger statements at each intermediate step to narrow the gap.
|
|
3. Run the instrumented code. Read the output. Let evidence drive the next hypothesis — not inference chains.
|
|
|
|
An instrumented run producing real output beats any amount of "could it be X? but then Y..." reasoning.
|
|
|
|
## Long Investigation Retrospective
|
|
|
|
Trigger a post-mortem when ANY of the following is true (all are observable in your own transcript):
|
|
|
|
- **10+ tool calls** were used to diagnose a single issue
|
|
- **Same file modified 3+ times** without tests going green
|
|
- **3+ distinct approaches** attempted before arriving at the fix
|
|
- Any phrase like "let me try X instead" appeared **more than twice**
|
|
- A fix was eventually found by reading docs/source the agent had dismissed earlier
|
|
|
|
Post-mortem steps:
|
|
1. **Identify the bottleneck**: wrong assumption? missing runtime visibility? incorrect mental model of a framework/language boundary? ignored evidence?
|
|
2. **Extract the general lesson**: what category of mistake was this? (e.g., "Python cannot call Cython `cdef` methods", "engine errors silently swallowed", "wrong layer to fix the problem")
|
|
3. **Propose a preventive rule**: short, actionable. Present to user for approval.
|
|
4. **Write it down**: add approved rule to the appropriate `.mdc` so it applies to future sessions.
|