mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 22:01:14 +00:00
chore: WIP pre-implement
Co-authored-by: Cursor <cursoragent@cursor.com>
# Release Report: Cycle 3 → Jetson (bench test)

- **Date**: 2026-05-26 14:42 EEST (UTC+3)
- **Operator**: obezdienie001 (single-operator project; agent-assisted via `/autodev`)
- **Strategy**: manual / bench-test
- **Target version**: `be743a7` (dev HEAD; commit `[AZ-844] Close Step 11 cycle-3: unit pass, jetson regression AZ-848`)
- **Target environment**: lab Jetson Orin Nano Super at SSH alias `jetson-e2e` (uptime 15d, 42 GB free on `/var/lib/docker`)
- **Compose file**: `docker-compose.test.jetson.yml` (the TEST compose, NOT the parent-suite airborne deploy compose)
- **Verdict**: **Released**
- **Verdict reason**: The bench run produced a failure profile identical to Step 11 (`4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed in 335.41s`): the same four AZ-848 test IDs failed, and `fd52cc9` introduced no NEW cycle-3-scope regressions. AZ-848 and AZ-883 carry forward to Cycle 4 as planned.

## Pre-Release Gate (Phase 1)

### Scope of this release

This is **not** an airborne production deploy. It is a **bench-test verification** that the cycle-3 source tree builds and runs on real Tier-2 hardware (the lab Jetson Orin Nano Super), using the same `docker-compose.test.jetson.yml` harness that drove the cycle-3 closeout in Step 11. The user explicitly chose this path over a true airborne deploy because two open Jetson blockers (AZ-848, AZ-883) had just been diagnosed and deferred to Cycle 4.

A true airborne release is Cycle 4's job, once AZ-848 (`VioOutput.emitted_at_ns` contract repair) and AZ-883 (`SCALED_IMU2` ts_ns=0 latent bug) are fixed.

### Acceptance Criteria

The system-level ACs in `_docs/00_problem/acceptance_criteria.md` (AC-1.x position accuracy, AC-4.x latency/memory, AC-NEW-1 TTFF, AC-NEW-2 spoof promotion, AC-NEW-4 false-position safety, AC-NEW-5 thermal envelope) all require **live-flight data + Tier-2 hardware** and are not in scope for this bench test. They remain "Unverified", the same status as recorded in `_docs/06_metrics/perf_2026-05-19_workstation-tier1-probe.md` and `_docs/06_metrics/perf_2026-05-26_cycle3-tier1-probe.md`.

What IS in scope and verifiable here:

| Scope item | Verification | Verified in |
|------------|--------------|-------------|
| Cycle-3 source builds on arm64 (Jetson Orin Nano Super) | `docker compose build` against `tests/e2e/Dockerfile.jetson` succeeds | Phase 3 |
| Cycle-3 source runs on real Jetson hardware end-to-end | `pytest tests/unit/ + tests/e2e/replay/` exits with the same failure profile as the Step 11 closeout (`4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed`) | Phase 4 |
| No new Cycle-3-scope regressions vs. Step 11 (2026-05-24) | Failure profile matches Step 11: only the known AZ-848 4-tuple fails; no new failures introduced by `fd52cc9` | Phase 4 |
| Working tree on Jetson reflects the cycle-3 closeout commit | `rsync` mirrors local `be743a7` to remote `~/gps-denied-onboard/` | Phase 3 |

### Test Status

| Suite | Pass | Fail | Skip | Source |
|-------|-----:|-----:|-----:|--------|
| Tier-1 unit (local Mac) | 2303 | 0 | 86 | `_docs/03_implementation/run_tests_step11_report.md` § Cycle-3 closeout → Local unit suite |
| Tier-1 perf (this cycle, Mac) | n/a | n/a | n/a | `_docs/06_metrics/perf_2026-05-26_cycle3-tier1-probe.md`: 4/4 NFRs **Unverified** on Tier-1 (NFR-PERF-* require Tier-2 + AZ-595 fixture, both still pending) |
| Tier-2 Jetson e2e (Step 11, 2026-05-24) | 48 | 4 (AZ-848) | 3 | `_docs/03_implementation/run_tests_step11_report.md` § Cycle 3 closeout → Jetson e2e |
| Tier-2 Jetson e2e (this release; bench rerun) | pending | pending | pending | This release report, Phase 4 below |

### Change Summary

Cycle-3 src delta (single commit `fd52cc9 [AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint`):

```
src/gps_denied_onboard/_types/route.py                             | +43
src/gps_denied_onboard/components/c11_tile_manager/route_client.py | -4
src/gps_denied_onboard/replay_input/__init__.py                    | -2
src/gps_denied_onboard/replay_input/tlog_route.py                  | -30
```

Net effect: the `RouteSpec` dataclass moves from a private helper into the shared `_types/` package, and the ruff lint rules widen to cover the new module. No behavioural change. `c1_vio`, `c5_state`, `c8_fc_adapter`, and `runtime_root` are untouched.

Cycle-3 ticket scope (closed in this cycle, present at HEAD):

| Ticket | Type | Component | Notes |
|--------|------|-----------|-------|
| AZ-835 (epic) | feature | C1–C6 | "GPS-denied tile provisioning + route spec" epic; decomposed into C1–C6 sub-tasks |
| AZ-836 | tooling | autodev | State-file trim; defer In Testing transition (MCP unavailable workaround) |
| AZ-838 | feature | C2 (route client) | `SatelliteProviderRouteClient` + `seed_route.py` CLI |
| AZ-839 | feature / fixture | C3 (matcher) + E-AZ-835 C3 | `operator_pre_flight_setup` real-fixture wiring |
| AZ-840 | feature / test | E-AZ-835 C4 | e2e orchestrator test |
| AZ-844 | infra / fix | C12 cold-start NFR + Jetson harness | Threshold relax 500 → 1000 ms; rsync exclude `tiles/` `ready/`; Step 11 closeout |
| AZ-845, AZ-846, AZ-847 | refactor / lint | `_types/`, `c11_tile_manager`, `replay_input`, lint | Refactor 02 (this is the only `src/` delta) |
| AZ-848 | bug (deferred) | C1 contract (`VioOutput.emitted_at_ns`) | **Deferred to Cycle 4.** Surfaced during this cycle's release flow when the release was initially routed to the operator-workstation target; root cause re-diagnosed via tlog probe; 5 SP. |
| AZ-883 | bug (deferred) | C8 adapter (`_handle_imu` SCALED_IMU2) | **Deferred to Cycle 4.** Latent ts_ns=0 bug surfaced during the AZ-848 investigation; 2 SP. |

### Rollback Plan

- **Previous version**: NONE. This is the first-ever release for this project.
  - `_docs/04_release/` was empty before this report.
  - No `release/*` git tag in the repo.
  - No `.previous-tags.env` produced by a prior `stop-services.sh` run.
- **Rollback script**: `scripts/deploy.sh --rollback` is **unavailable** for this bench test (exit 70: `.previous-tags.env` not found). This is acceptable: the test compose's "rollback" is `docker compose down` against `docker-compose.test.jetson.yml`, which leaves the Jetson in its pre-test state.
- **Rollback target verified pullable**: n/a (no previous version exists).
- **Rollback target verified bootable in target env**: n/a.

For Cycle 4's true airborne release, a real rollback target will exist (the image produced by this bench-test cycle, once an arm64 image is built + tagged in CI).

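
The rollback gate described above reduces to a single file check. The following is a minimal Python sketch of that check, not the logic of `scripts/deploy.sh`: the function name `rollback_precheck` and the `first_release` flag are hypothetical, and only the `.previous-tags.env` file name and exit code 70 come from this report.

```python
from pathlib import Path

# Exit code this report cites for a missing .previous-tags.env.
EXIT_NO_ROLLBACK_TARGET = 70


def rollback_precheck(workdir: Path, first_release: bool = False) -> int:
    """Return 0 if a rollback target exists or is knowingly waived, else 70.

    Hypothetical helper: `first_release` models an explicit operator waiver
    for a project that has never been released before.
    """
    if (workdir / ".previous-tags.env").is_file():
        return 0  # a prior stop-services.sh run recorded the previous tags
    if first_release:
        return 0  # no previous version can exist; record the gap and proceed
    return EXIT_NO_ROLLBACK_TARGET
```

With a flag like this, a first-ever release would pass the gate explicitly instead of relying on the operator to interpret exit 70 by hand.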
### Restrictions / Approvals

- Change-window restrictions: none for bench testing on the lab Jetson (NFT-SEC-05 in-flight egress lockdown and the ground-only gate apply only to airborne).
- Manual approvals required: none (single-operator project).
- Restriction `_docs/00_problem/restrictions.md` § "Failsafe & Safety" applies only to live flight; it is not exercised by the bench test.

### Tracker State at Gate

- **Tickets in scope** (CLOSED at HEAD): 9 tickets (AZ-835, AZ-836, AZ-838, AZ-839, AZ-840, AZ-844, AZ-845, AZ-846, AZ-847; see Change Summary above).
- **Tickets deferred to Cycle 4** (NOT blocking this bench release; explicitly off the operator-orchestrator + bench-test paths): AZ-848, AZ-883.
- **Tickets blocking release**: 0. AZ-848 / AZ-883 affect only the live-flight tlog-replay path on the airborne Jetson; they are deliberately NOT a bench-test blocker because the bench test re-confirms the SAME failure profile as Step 11 (no NEW regressions in cycle-3 scope).

### Gate Decision

The user picked **A) Bench testing on jetson-e2e** at the Pre-Release Gate. The contradiction with the user's prior turn (operator-workstation target) was flagged and resolved in favour of the bench test on the Jetson. Three issues from the gate that influence verdict interpretation are recorded under "Rollback Plan" (no rollback target) and "Acceptance Criteria" (system-level ACs unverifiable from Tier-1 / bench).

## Strategy Select (Phase 2)

- **Recommended by skill table** for this target capability: `manual` (per the `release/SKILL.md` Phase 2 table: "Non-automatable env (one-off VMs, regulated infrastructure, non-Docker host) — the whole release becomes a runbook"). Although Docker IS in play here, this is a bench rig with no load balancer, no traffic-tier routing, and no automated rollout, so `manual` is the closest semantic match in the skill's table.
- **Chosen**: `manual` / bench-test.
- **Reasoning**: blue-green / canary / all-at-once all imply a service taking real traffic. The bench-test Jetson takes no traffic; it runs an internally scripted test compose. The release is recorded, but nothing is "deployed" in the production sense: the parent-suite Watchtower flow is bypassed, and only the ability of the cycle-3 image to build and run on hardware is verified.

## Execute (Phase 3)

- **Start**: 2026-05-26 14:42:41 UTC (shell job PID 84808)
- **Command**: `bash scripts/run-tests-jetson.sh` (no flags; defaults to `JETSON_SSH_ALIAS=jetson-e2e`, `JETSON_REMOTE_DIR=~/gps-denied-onboard`, `COMPOSE_FILE=docker-compose.test.jetson.yml`)
- **Stream sink**: `_docs/04_release/.jetson_bench_run_2026-05-26.log` (preserved for audit; NOT committed; `.jetson_bench_run_*.log` should land in `.gitignore` post-release).
- **End**: 2026-05-26 14:50:17 UTC (wall clock 7m 35s; includes rsync + docker compose pull + e2e-runner image build + pytest)
- **Exit code**: 1, propagated from `pytest` (4 failures inside `e2e-runner`). **Expected**: AZ-848 deterministically fails the same 4 cases. The bench-test verdict is NOT "exit 0"; it is "failure profile matches Step 11".

Pytest session header and summary line (from `_docs/04_release/.jetson_bench_run_2026-05-26.log`, e2e-runner-1 container):

```
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-9.0.3, pluggy-1.6.0
collected 57 items
... (57 tests; see Phase 4 table below for the test-ID summary)
= 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 335.41s (0:05:35) =
```

AZ-848 root-cause log lines from THIS run (they match the Step 11 root cause and confirm determinism):

```
c5.state.eskf_out_of_order ts_ns=187,370,418,000 last_added_ts_ns=1,362,268,944,997,999
c5.state.eskf_filter_divergence source=vio mahalanobis_sq=109.76467866548009 threshold_sq=100.0
replay_loop.state_add_vio_fatal frame=3 EstimatorFatalError('eskf filter divergence on vio: mahalanobis²=109.765 > 100.0')
```

(`last_added_ts_ns` differs from Step 11's value because Jetson uptime grew by 2 days: per the AZ-848 root cause, the gap between `monotonic_ns` and FC-boot-relative timestamps scales with uptime. The IMU `ts_ns` is FC-boot-relative and therefore byte-identical between the two runs. Both observations confirm AZ-848's mechanism.)

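
The three log lines above can be read as two checks failing in sequence. The sketch below replays them with the logged values. It assumes, following this report's description of AZ-848, that `ts_ns` is FC-boot-relative while `last_added_ts_ns` comes from the Jetson's `monotonic_ns` clock. The names `is_out_of_order` and `exceeds_gate` are hypothetical and the strict comparisons are an assumption; this is not the C5 implementation.

```python
NS_PER_S = 1_000_000_000
NS_PER_DAY = 86_400 * NS_PER_S


def is_out_of_order(ts_ns: int, last_added_ts_ns: int) -> bool:
    """Assumed rule: a sample older than the newest accepted one is out of order."""
    return ts_ns < last_added_ts_ns


def exceeds_gate(mahalanobis_sq: float, threshold_sq: float) -> bool:
    """Assumed rule: an innovation beyond the squared-distance gate is a divergence."""
    return mahalanobis_sq > threshold_sq


# Values copied from the log lines of this bench run.
TS_NS = 187_370_418_000
LAST_ADDED_TS_NS = 1_362_268_944_997_999

fc_seconds = TS_NS / NS_PER_S               # roughly 187 s on the FC-boot timebase
host_days = LAST_ADDED_TS_NS / NS_PER_DAY   # roughly 15.8 days, in line with "uptime 15d"

print(f"ts_ns = {fc_seconds:.2f} s, last_added_ts_ns = {host_days:.2f} days")
print("out of order:", is_out_of_order(TS_NS, LAST_ADDED_TS_NS))
print("gate exceeded:", exceeds_gate(109.76467866548009, 100.0))
```

Converting both timestamps to human units makes the mismatch visible at a glance: one value is minutes old, the other is weeks old, so they cannot share a clock.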
## Smoke Test (Phase 4)

The bench-test compose IS the smoke set (per Phase 2, the bench-test strategy collapses Execute and Smoke into one harness invocation). The pass criterion below is **not** "0 failures"; it is "failure profile matches Step 11's evidence, i.e. only the known AZ-848 4-tuple fails, with no new failures introduced by the cycle-3 src delta".

- **Mode**: same harness as the Step 11 closeout (rsync + `docker compose up --abort-on-container-exit --exit-code-from e2e-runner`)
- **Start**: 2026-05-26 14:44:31 UTC (e2e-runner container started; `test session starts` line)
- **End**: 2026-05-26 14:50:06 UTC (5m 35s pytest wall clock)

| Test | Step 11 (2026-05-24) | This run (2026-05-26) | Verdict |
|------|----------------------|----------------------|---------|
| `tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match` | FAIL (AZ-848 frame-3 ESKF divergence) | FAIL (same root cause; same frame; same mahalanobis²=109.765) | **Match: AZ-848 carries forward** |
| `tests/e2e/replay/test_derkachi_1min.py::test_ac5_determinism_two_runs_diff` | FAIL (same root cause) | FAIL (same root cause) | **Match** |
| `tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_realtime_60s_within_5pct` | FAIL (same root cause) | FAIL (same root cause) | **Match** |
| `tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_asap_under_30s` | FAIL (same root cause) | FAIL (same root cause) | **Match** |
| `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` | XPASS (vacuous: binary exits 1 before emissions) | XPASS (same vacuous pass; same explanation in short-summary) | **Match** |
| Remaining 48 cases | PASS | PASS (all 48) | **Match: no new regressions** |
| Skipped (3) | env-gated (legitimate) | SKIPPED, same three (AZ-839 operator_pre_flight_setup × 2; AC-8 mock-suite-sat-service incomplete) | **Match** |
| xfailed (1) | known xfail (AZ-699 / AZ-776+AZ-777) | XFAIL, same test, same upstream-gap explanation | **Match** |

**Smoke verdict pass condition**: ✅ met. Totals = `4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed` and the 4 failure IDs are byte-identical to Step 11's IDs.

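
The pass condition above compares two things: the outcome counts on the pytest summary line and the set of failing test IDs. A minimal Python sketch of that comparison follows. `parse_summary` and `profile_matches` are hypothetical helper names, not part of `scripts/run-tests-jetson.sh`; the inputs are the summary strings and test IDs quoted in this report.

```python
import re

# Hypothetical helpers for illustration; the real harness does not ship these.
_OUTCOME_RE = re.compile(r"(\d+) (failed|passed|skipped|xfailed|xpassed)\b")


def parse_summary(line: str) -> dict[str, int]:
    """Extract outcome counts from a pytest summary line (warnings are ignored)."""
    return {outcome: int(count) for count, outcome in _OUTCOME_RE.findall(line)}


def profile_matches(baseline_line, current_line, baseline_failed, current_failed) -> bool:
    """True only if the counts and the failing test IDs are both identical."""
    same_counts = parse_summary(baseline_line) == parse_summary(current_line)
    same_ids = set(baseline_failed) == set(current_failed)
    return same_counts and same_ids


STEP11 = "4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed"
BENCH = "= 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 335.41s (0:05:35) ="
AZ_848_IDS = [
    "tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match",
    "tests/e2e/replay/test_derkachi_1min.py::test_ac5_determinism_two_runs_diff",
    "tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_realtime_60s_within_5pct",
    "tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_asap_under_30s",
]

print(profile_matches(STEP11, BENCH, AZ_848_IDS, AZ_848_IDS))  # prints True
```

Comparing IDs as a set, and not only the counts, is what separates "the same four tests failed" from "four tests failed": a new regression that happened to replace one known failure would keep the totals unchanged.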
## Watch Window (Phase 5)

- **Duration**: not applicable (bench test, no live traffic, no observability backend in scope).
- **Substitute**: the test compose's `--abort-on-container-exit --exit-code-from e2e-runner` IS the watch. If any service crashes mid-test, the run aborts and the exit code propagates back. The duration of the bench run (~5–6 min) acts as the de facto watch.
- This is explicitly recorded per `release/SKILL.md` Phase 5: "If the user explicitly demands skipping (e.g., emergency rollforward), record the override reason in the release report and continue, but mark the verdict as `Released-with-override`." Adapted for bench testing: no live traffic ⇒ no observability ⇒ Phase 5 is honestly N/A, not "skipped". The verdict will be `Released` (or `Aborted`), not `Released-with-override`.

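
The substitute watch relies on two real `docker compose up` options. The sketch below shows, under assumptions, how such an invocation could be assembled and its exit code captured from Python. The helper names `compose_up_command` and `run_bench` are hypothetical, and the sketch runs compose locally, whereas `scripts/run-tests-jetson.sh` drives the Jetson over SSH.

```python
import subprocess


def compose_up_command(compose_file: str, service: str) -> list[str]:
    """Build the `docker compose up` argv used as the de facto watch."""
    return [
        "docker", "compose", "-f", compose_file, "up",
        "--abort-on-container-exit",      # stop every container once any one exits
        "--exit-code-from", service,      # report this service's exit code
    ]


def run_bench(compose_file: str = "docker-compose.test.jetson.yml",
              service: str = "e2e-runner") -> int:
    """Run the stack and return the selected service's exit code (hypothetical wrapper)."""
    result = subprocess.run(compose_up_command(compose_file, service), check=False)
    return result.returncode
```

`check=False` is deliberate here: a non-zero exit is an expected outcome for this bench run, so the caller inspects the code instead of treating it as an exception.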
## Commit or Rollback (Phase 6)

### Released

- Tracker tickets in scope **stay as they are**: they were moved to Done during prior cycle-3 steps (Steps 12-15). No new tracker movement is triggered by this bench-test release.
- Git tag: deliberately NOT pushed. `release/cycle3-bench` would mislabel a bench-test milestone as a production release; the next true airborne release in Cycle 4 will carry the first `release/*` tag.
- AZ-848 and AZ-883 are **explicit known-regression carry-forwards** into Cycle 4. Both have updated specs and Jira state set during this autodev session.
- Cycle-3 source is hardware-bench-verified on the lab Jetson at SHA `be743a7`. The same source can be re-run reproducibly via `bash scripts/run-tests-jetson.sh` against `jetson-e2e`.
- Retrospective scheduled: `/retrospective --cycle-end` auto-chains after this report. Output expected at `_docs/06_metrics/retro_cycle3_<timestamp>.md`.

## Open Risks Carried Into Cycle 4

| Risk | Owner ticket | Severity |
|------|--------------|----------|
| AZ-848: `VioOutput.emitted_at_ns` contract clashes with FC-IMU timebase; blocks live-flight ESKF on long-uptime Jetson | AZ-848 (5 SP) | High: real airborne release blocked until fixed |
| AZ-883: `_handle_imu` produces ts_ns=0 for every SCALED_IMU2 message; latent IMU monotonicity violation | AZ-883 (2 SP) | Medium: latent; fix lands before C13 FDR replay tools assume per-sample monotonicity |
| `EVIDENCE_OUT` default points at container-only path (`/e2e-results/evidence`), which breaks Tier-1 perf tests on the host | `_docs/_process_leftovers/2026-05-26_evidence_out_default_path.md` | Low: workaround exists (`EVIDENCE_OUT="$(pwd)/e2e-results/..."`) |

## Lessons (one-liners)

- **First-release rollback gap is structural, not procedural**: the `scripts/deploy.sh --rollback` path requires `.previous-tags.env`, which only exists after a successful `stop-services.sh` run. First-ever deploys have no rollback target by construction; the release skill's Phase 1 rollback check should treat first-release as a recognized first-time path, not a blocking gate.
- **Bench-test "release" is a legitimate milestone but not a production release**: the release skill's six-phase pipeline (deploy → smoke → watch → commit) compresses to three phases for bench testing (rsync+build → harness-as-smoke → commit). The skill could grow an explicit `strategy: bench-test` row in its Phase 2 table so future releases don't have to improvise.
- **Long-uptime Jetson + freshly-booted FC is the AZ-848 sensitiser**: the gap between `monotonic_ns` and FC-boot-relative timestamps grew by ~175 trillion ns over 2 days (1.187·10¹⁵ → 1.362·10¹⁵). This is consistent with a mechanism that is purely additive in uptime and gives Cycle 4 a clean reproduction protocol: `uptime -p` ≥ 1d on the Jetson + a tlog from a session ≤ 15 min after FC boot.
- **Cycle-3 src delta size vs. release scope tension**: `fd52cc9` is a roughly 80-line refactor (43 lines added, 36 removed per the Change Summary diffstat); the release machinery exercises full deploy + smoke against it. The bench-test path balances "release discipline" against "tiny delta does not warrant prod-deploy theatre", and it should stay the default for refactor-only cycles in this project.

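
The uptime arithmetic in the AZ-848 lesson can be checked in a few lines. The sketch below uses the two rounded gap values quoted above; `meets_repro_protocol` is a hypothetical helper that only encodes the two thresholds named in the lesson and is not an existing project tool.

```python
NS_PER_DAY = 86_400 * 10**9

# Rounded gap values quoted in the lesson (Step 11 run, then this bench run).
STEP11_GAP_NS = 1.187e15
BENCH_GAP_NS = 1.362e15

growth_ns = BENCH_GAP_NS - STEP11_GAP_NS   # about 1.75e14 ns, i.e. ~175 trillion ns
growth_days = growth_ns / NS_PER_DAY       # about 2 days, matching the gap between runs


def meets_repro_protocol(jetson_uptime_days: float, minutes_since_fc_boot: float) -> bool:
    """Hypothetical check: long-uptime Jetson plus a tlog taken soon after FC boot."""
    return jetson_uptime_days >= 1.0 and minutes_since_fc_boot <= 15.0


print(f"gap growth: {growth_ns:.3e} ns = {growth_days:.2f} days")
```

The growth of about two days lines up with the two calendar days between the Step 11 run (2026-05-24) and this bench run (2026-05-26), which is what the "additive in uptime" reading predicts.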