gps-denied-onboard/.cursor/skills/test-spec/phases/03-data-validation-gate.md
Oleksandr Bezdieniezhnykh 1f634c2604
2026-06-20 11:24:43 +03:00

Phase 3: Test Data & Expected Results Validation Gate (HARD GATE)

Role: Professional Quality Assurance Engineer

Goal: Ensure every test scenario produced in Phase 2 has concrete, sufficient test data. Remove tests that lack data. Verify final coverage stays at or above the canonical threshold (currently 75%; see .cursor/rules/cursor-meta.mdc Quality Thresholds; never hardcode a different number in any phase).

Constraints: This phase is MANDATORY and cannot be skipped.

Step 1 — Build the requirements checklist

Scan blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, and resource-limit-tests.md. For every test scenario, classify its shape (input/output or behavioral) and extract:

Input/output tests:

| # | Test Scenario ID | Test Name | Required Input Data | Required Expected Result | Result Quantifiable? | Comparison Method | Input Provided? | Expected Result Provided? |
|---|---|---|---|---|---|---|---|---|
| 1 | [ID] | [name] | [data description] | [what system should output] | [Yes/No] | [exact/tolerance/pattern/threshold] | [Yes/No] | [Yes/No] |

Behavioral tests:

| # | Test Scenario ID | Test Name | Trigger Condition | Observable Behavior | Pass/Fail Criterion | Quantifiable? |
|---|---|---|---|---|---|---|
| 1 | [ID] | [name] | [e.g., service receives SIGTERM] | [e.g., drain logs emitted, port closed] | [e.g., drain completes ≤30s] | [Yes/No] |

Present both tables to the user.
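As a rough illustration, the input/output checklist can be held as simple records so that rows with a `No` in either "Provided?" column are easy to pick out. The class name, field names, and scenario IDs below are hypothetical and only mirror the table columns; this is a sketch, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class IOChecklistRow:
    # Hypothetical record mirroring a subset of the input/output table columns.
    scenario_id: str
    name: str
    input_provided: bool
    expected_result_provided: bool


def missing_items(row):
    """List what the user still has to supply for this row."""
    missing = []
    if not row.input_provided:
        missing.append("input data")
    if not row.expected_result_provided:
        missing.append("expected result")
    return missing


# Illustrative rows; the IDs and names are made up for the example.
rows = [
    IOChecklistRow("BB-01", "detect objects", True, False),
    IOChecklistRow("BB-02", "health endpoint", True, True),
    IOChecklistRow("PERF-01", "frame latency", False, False),
]

# Only rows with something missing need a follow-up question.
needs_follow_up = {r.scenario_id: missing_items(r) for r in rows if missing_items(r)}
```

Rows that do not appear in `needs_follow_up` already have both input data and an expected result.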

Step 2 — Ask user to provide missing test data AND expected results

For each row where Input Provided? is No OR Expected Result Provided? is No, ask the user:

Option A — Provide the missing items. Supply whichever of the following is missing:

  • Missing input data: Place test data files in _docs/00_problem/input_data/ or indicate the location.
  • Missing expected result: Provide the quantifiable expected result for this input. Update _docs/00_problem/input_data/expected_results/results_report.md with a row mapping the input to its expected output. If the expected result is complex, provide a reference CSV file in _docs/00_problem/input_data/expected_results/. Use .cursor/skills/test-spec/templates/expected-results.md for format guidance.

Expected results MUST be quantifiable — the test must be able to programmatically compare actual vs expected. Examples:

  • "3 detections with bounding boxes [(x1,y1,x2,y2), ...] ± 10px"
  • "HTTP 200 with JSON body matching expected_response_01.json"
  • "Processing time < 500ms"
  • "0 false positives in the output set"
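To make "programmatically compare" concrete, the sketch below shows one possible way to evaluate the four comparison methods named in the checklist (exact, tolerance, pattern, threshold). The function names are hypothetical, the threshold is treated as a strict upper bound, and bounding boxes are assumed to arrive in the same order as the expected ones.

```python
import re


def compare(method, actual, expected, tolerance=0.0):
    """Evaluate one actual value against its expected result."""
    if method == "exact":
        return actual == expected
    if method == "tolerance":
        return abs(actual - expected) <= tolerance
    if method == "pattern":
        # `expected` is a regular expression, `actual` is the output string.
        return re.search(expected, actual) is not None
    if method == "threshold":
        # Assumption: thresholds are strict upper bounds, as in "< 500ms".
        return actual < expected
    raise ValueError(f"unknown comparison method: {method}")


def boxes_match(actual_boxes, expected_boxes, tol_px=10):
    """Compare (x1, y1, x2, y2) boxes coordinate by coordinate within tol_px."""
    if len(actual_boxes) != len(expected_boxes):
        return False
    return all(
        compare("tolerance", a, e, tol_px)
        for actual, expected in zip(actual_boxes, expected_boxes)
        for a, e in zip(actual, expected)
    )


status_ok = compare("exact", 200, 200)
fast_enough = compare("threshold", 420, 500)
boxes_ok = boxes_match([(10, 10, 50, 50)], [(12, 8, 55, 49)])
```

An expected result that cannot be expressed through a check of this kind is a sign that it is not yet quantifiable.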

Option B — Skip this test: If you cannot provide the data or expected result, this test scenario will be removed from the specification.

BLOCKING: Wait for the user's response for every missing item.

Step 3 — Validate provided data and expected results

For each item where the user chose Option A:

Input data validation:

  1. Verify the data file(s) exist at the indicated location
  2. Verify quality: data matches the format, schema, and constraints described in the test scenario (e.g., correct image resolution, valid JSON structure, expected value ranges)
  3. Verify quantity: enough data samples to cover the scenario (e.g., at least N images for a batch test, multiple edge-case variants)
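The three input checks above could be sketched as follows for a scenario whose inputs are JSON files. The function name, file names, and the minimum sample count are assumptions made for the example, and the quality check covers only whether each file parses as JSON; schema and value-range checks would depend on the scenario.

```python
import json
import tempfile
from pathlib import Path


def validate_json_inputs(directory, pattern, min_samples):
    """Return a list of issues; an empty list means all checks passed."""
    # 1. Existence: at least one file matches at the indicated location.
    files = sorted(Path(directory).glob(pattern))
    if not files:
        return [f"no files matching {pattern!r} in {directory}"]

    issues = []
    # 3. Quantity: enough samples to cover the scenario.
    if len(files) < min_samples:
        issues.append(
            f"found {len(files)} sample(s), scenario needs at least {min_samples}"
        )
    # 2. Quality (structure only): every file must parse as JSON.
    for path in files:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            issues.append(f"{path.name}: invalid JSON ({exc.msg})")
    return issues


# Demonstration against a throwaway directory with one valid and one broken file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample_01.json").write_text('{"frame": 1}', encoding="utf-8")
    Path(tmp, "sample_02.json").write_text('{"frame": ', encoding="utf-8")
    issues = validate_json_inputs(tmp, "*.json", min_samples=3)
```

Each entry in `issues` is specific enough to report back to the user before returning to Step 2 for that item.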

Expected result validation:

  1. Verify the expected result exists in input_data/expected_results/results_report.md or as a referenced file in input_data/expected_results/
  2. Verify quantifiability: the expected result can be evaluated programmatically — it must contain at least one of:
    • Exact values (counts, strings, status codes)
    • Numeric values with tolerance (e.g., ± 10px, ≥ 0.85)
    • Pattern matches (regex, substring, JSON schema)
    • Thresholds (e.g., < 500ms, ≤ 5% error rate)
    • Reference file for structural comparison (JSON diff, CSV diff)
  3. Verify completeness: the expected result covers all outputs the test checks (not just one field when the test validates multiple)
  4. Verify consistency: the expected result is consistent with the acceptance criteria it traces to
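For the "reference file for structural comparison" case, a minimal JSON diff might look like the sketch below. It is an illustration under simple assumptions (lists are compared by position, all values by equality), and `json_diff` is a hypothetical helper rather than an existing tool in this repository.

```python
def json_diff(actual, expected, path="$"):
    """Return a list of differences between two parsed JSON values."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}"
            if key not in actual:
                diffs.append(f"{child}: missing in actual")
            elif key not in expected:
                diffs.append(f"{child}: unexpected in actual")
            else:
                diffs.extend(json_diff(actual[key], expected[key], child))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(actual) != len(expected):
            return [f"{path}: length {len(actual)} != {len(expected)}"]
        diffs = []
        for index, (a, e) in enumerate(zip(actual, expected)):
            diffs.extend(json_diff(a, e, f"{path}[{index}]"))
        return diffs
    # Scalars (and mismatched container types) are compared by equality.
    return [] if actual == expected else [f"{path}: {actual!r} != {expected!r}"]


expected = {"status": "ok", "detections": [{"label": "car"}, {"label": "truck"}]}
actual = {"status": "ok", "detections": [{"label": "car"}, {"label": "bus"}]}
differences = json_diff(actual, expected)
```

An empty result means the actual output matches the reference; any entries give the path of each mismatch, which also helps with the completeness check, since fields absent from the reference are reported as unexpected.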

If any validation fails, report the specific issue and loop back to Step 2 for that item.

Step 4 — Remove tests without data or expected results

For each item where the user chose Option B:

  1. Warn the user: ⚠️ Test scenario [ID] "[Name]" will be REMOVED from the specification due to missing test data or expected result.
  2. Remove the test scenario from the respective test file
  3. Remove corresponding rows from traceability-matrix.md
  4. Update test-data.md to reflect the removal

Save action: Write updated files under TESTS_OUTPUT_DIR:

  • test-data.md
  • Affected test files (if tests removed)
  • traceability-matrix.md (if tests removed)

Step 5 — Final coverage check

After all removals, recalculate coverage:

  1. Count the acceptance criteria and restrictions that at least one remaining test scenario traces to (covered_items)
  2. Count total acceptance criteria + restrictions (total_items)
  3. Calculate coverage percentage: covered_items / total_items * 100

| Metric | Value |
|---|---|
| Total AC + Restrictions | ? |
| Covered by remaining tests | ? |
| Coverage % | ?% |
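The recalculation can be sketched as below, assuming the traceability matrix has been loaded into a mapping from each remaining test scenario to the acceptance criteria and restrictions it traces to. The function name, item IDs, and data layout are hypothetical. The threshold is passed in by the caller so that it can come from the canonical Quality Thresholds source; 75.0 appears here only because it is the current value.

```python
def coverage_report(all_items, traceability, threshold):
    """Compute coverage of AC + restrictions by the remaining test scenarios."""
    total = set(all_items)
    # An item is covered when at least one remaining scenario traces to it.
    covered = {item for items in traceability.values() for item in items} & total
    pct = 100.0 * len(covered) / len(total) if total else 0.0
    return {
        "total": len(total),
        "covered": len(covered),
        "coverage_pct": pct,
        "uncovered": sorted(total - covered),
        "passed": pct >= threshold,
    }


# Illustrative data: four items, one of them left uncovered after removals.
all_items = ["AC-1", "AC-2", "AC-3", "R-1"]
traceability = {
    "BB-01": ["AC-1", "AC-2"],
    "RES-01": ["R-1"],
}
report = coverage_report(all_items, traceability, threshold=75.0)
```

The `uncovered` list is what a failed gate would report back to the user as the items still needing test data.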

Decision:

  • Coverage ≥ 75% → Phase 3 PASSED. Present final summary to user.

  • Coverage < 75% → Phase 3 FAILED. Report:

    Test coverage dropped to X% (minimum 75% required). The removed test scenarios left gaps in the following acceptance criteria / restrictions:

    | Uncovered Item | Type (AC/Restriction) | Missing Test Data Needed |
    |---|---|---|

    Action required: Provide the missing test data for the items above, or add alternative test scenarios that cover these items with data you can supply.

    BLOCKING: Loop back to Step 2 with the uncovered items. Do NOT finalize until coverage ≥ 75%.

Phase 3 Completion

When coverage ≥ 75% and all remaining tests have validated data AND quantifiable expected results:

  1. Present the final coverage report
  2. List all removed tests (if any) with reasons
  3. Confirm every remaining test has: input data + quantifiable expected result + comparison method
  4. Confirm all artifacts are saved and consistent

After Phase 3 completion, run phases/hardware-assessment.md before Phase 4.