Update demo replay validation and testing documentation
ci/woodpecker/push/02-build-push Pipeline failed

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-06-20 11:24:43 +03:00
parent 12d0008763
commit 1f634c2604
175 changed files with 20701 additions and 41 deletions
@@ -0,0 +1,60 @@
# Phase 4: Test Runner Script Generation
**Skip condition**: If this skill was invoked from the `/plan` skill (planning context, no code exists yet), skip Phase 4 entirely. Script creation should instead be planned as a task during decompose — the decomposer creates a task for creating these scripts. Phase 4 only runs when invoked from the existing-code flow (where source code already exists) or standalone.
**Role**: DevOps engineer
**Goal**: Generate executable shell scripts that run the specified tests, so autodev and CI can invoke them consistently.
**Constraints**: Scripts must be idempotent, portable across dev/CI, and exit with non-zero on failure. Respect the Hardware-Dependency Assessment decision recorded in `environment.md`.
**Prerequisite**: `phases/hardware-assessment.md` must have completed and written the "Test Execution" section to `TESTS_OUTPUT_DIR/environment.md`.
## Step 1 — Detect test infrastructure
1. Identify the project's test runner from manifests and config files:
- Python: `pytest` (`pyproject.toml`, `setup.cfg`, `pytest.ini`)
- .NET: `dotnet test` (`*.csproj`, `*.sln`)
- Rust: `cargo test` (`Cargo.toml`)
- Node: `npm test` or `vitest` / `jest` (`package.json`)
2. Check the Hardware-Dependency Assessment result recorded in `environment.md`:
- If **local execution** was chosen → do NOT generate docker-compose test files; scripts run directly on host
- If **Docker execution** was chosen → identify/generate docker-compose files for integration/blackbox tests
- If **both** was chosen → generate both
3. Identify performance/load testing tools from dependencies (`k6`, `locust`, `artillery`, `wrk`, or built-in benchmarks)
4. Read `TESTS_OUTPUT_DIR/environment.md` for infrastructure requirements
## Step 2 — Generate test runner
**Docker is the default.** Only generate a local `scripts/run-tests.sh` if the Hardware-Dependency Assessment determined **local** or **both** execution (i.e., the project requires real hardware like GPU/CoreML/TPU/sensors). For all other projects, use `docker-compose.test.yml` — it provides reproducibility, isolation, and CI parity without a custom shell script.
**If local script is needed** — create `scripts/run-tests.sh` at the project root using `.cursor/skills/test-spec/templates/run-tests-script.md` as structural guidance. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. **Install all project and test dependencies** (e.g. `pip install -q -r requirements.txt -r e2e/requirements.txt`, `dotnet restore`, `npm ci`). This prevents collection-time import errors on fresh environments.
3. Optionally accept a `--unit-only` flag to skip blackbox tests
4. Run unit/blackbox tests using the detected test runner (activate virtualenv if present, run test runner directly on host)
5. Print a summary of passed/failed/skipped tests
6. Exit 0 on all pass, exit 1 on any failure
**If Docker** — generate or update `docker-compose.test.yml` that builds the test image, installs all dependencies inside the container, runs the test suite, and exits with the test runner's exit code.
## Step 3 — Generate `scripts/run-performance-tests.sh`
Create `scripts/run-performance-tests.sh` at the project root. The script must:
1. Set `set -euo pipefail` and trap cleanup on EXIT
2. Read thresholds from `_docs/02_document/tests/performance-tests.md` (or accept as CLI args)
3. Start the system under test (local or docker-compose, matching the Hardware-Dependency Assessment decision)
4. Run load/performance scenarios using the detected tool
5. Compare results against threshold values from the test spec
6. Print a pass/fail summary per scenario
7. Exit 0 if all thresholds met, exit 1 otherwise
## Step 4 — Verify scripts
1. Verify both scripts are syntactically valid (`bash -n scripts/run-tests.sh`)
2. Mark both scripts as executable (`chmod +x`)
3. Present a summary of what each script does to the user
## Save action
Write `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` to the project root.