Mirror of https://github.com/azaion/satellite-provider.git
synced 2026-06-27 09:51:14 +00:00
[AZ-1124] Add PT-10 gRPC stream perf scenario
Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -262,6 +262,13 @@ Step 9 cycle 8b: folded into cycle 8 as step 1 (AZ-812). Section retained in dep
Step 9 cycle 9: 2 tasks created (AZ-1074 = 5 pts, AZ-1075 = 3 pts), 8 pts total. gRPC TileStream for route-based progressive tile delivery.
Step 9 cycle 10: 1 task created (AZ-1113 = 2 pts): REST 400 error message sanitization (F-AZ795-1/2, F-AZ810-1). Child of AZ-795.
Step 9 cycle 11: 1 task created (AZ-1123 = 1 pt): document the `docker-compose.perf.yml` host-port conflict playbook (cycle 10 retro action).
Step 9 cycle 12: 1 task created (AZ-1124 = 3 pts): PT-10 gRPC `DeliverRouteTiles` stream perf scenario (cycle 9–11 retro carry-over).

### Step 9 cycle 12 (PT-10 gRPC stream perf, AZ-1124)

| Task | Depends On | Points | Status |
|------|-----------|--------|--------|
| AZ-1124 PT-10 gRPC stream perf scenario | AZ-1074, AZ-1075, AZ-492 | 3 | Todo |

## Coverage Verification
@@ -0,0 +1,142 @@
# PT-10: gRPC DeliverRouteTiles stream performance scenario

**Task**: AZ-1124_pt10_grpc_stream_perf
**Name**: PT-10 gRPC stream perf scenario
**Description**: Add a runnable PT-10 performance scenario that measures latency of the existing `DeliverRouteTiles` server-streaming RPC against a live API.
**Complexity**: 3 points
**Dependencies**: AZ-1074, AZ-1075, AZ-492
**Component**: Test infrastructure (`scripts/run-performance-tests.sh`, `SatelliteProvider.IntegrationTests`) + perf NFR coverage (`_docs/02_document/tests/performance-tests.md`)
**Tracker**: AZ-1124
**Epic**: AZ-115

### Document Dependencies

- `_docs/02_document/contracts/c11_tilemanager/tile_provision_grpc.md` (consumer: wire contract for `DeliverRouteTiles`)
- `_docs/02_document/tests/blackbox-tests.md` § BT-32 (functional baseline; PT-10 adds NFR evidence)

### ADR Compliance

> Implements ADR-013: gRPC `RouteTileDelivery` transport alongside REST for progressive tile delivery.

## Problem

Cycle 9 shipped `RouteTileDelivery.DeliverRouteTiles` (AZ-1074) with functional integration coverage (AZ-1075 / BT-32). The performance gate, however, still runs only PT-01..PT-08, all of which are REST-only. Step 15 can therefore pass while the gRPC streaming surface remains **Unverified** for latency. The cycle 9–11 retrospectives and LESSONS.md (`[testing]` entry 2026-06-25) both called out this gap.

Operators and autodev Step 15 need measurable evidence for:

- **Time-to-first tile batch** on a cold path (which may include a Google Maps download)
- **Total stream duration** until `DeliveryComplete`
- **Slow-consumer smoke**: the stream completes without corruption when the client deliberately delays between events (backpressure sanity check per AZ-1074 AC-4)

## Outcome

- `scripts/run-performance-tests.sh` runs PT-10 against the **real** gRPC `DeliverRouteTiles` endpoint (TLS + JWT metadata), not a mock.
- PT-10 reports `first_batch_ms` and `total_stream_ms` over `PERF_REPEAT_COUNT` iterations (default 20), plus an optional slow-consumer pass/fail from a single slow-consumer iteration.
- `_docs/02_document/tests/performance-tests.md` documents PT-10 triggers, thresholds, and pass criteria.
- `_docs/02_document/tests/traceability-matrix.md` marks gRPC stream perf as covered (no longer Unverified).
- The gRPC perf gap from the cycle 9–11 retros is closed.

## Scope

### Included

- Add a `SatelliteProvider.IntegrationTests` perf bootstrap subcommand (e.g. `--run-pt10`), invoked from `scripts/run-performance-tests.sh` and following the AZ-492 pattern (`--mint-only`, `--gen-uav-fixture`). The subcommand:
  - Mints or accepts a JWT via the existing `PerfBootstrap` / `JwtTokenFactory` surface
  - Opens a gRPC channel to `API_URL` (same TLS trust as the REST perf probes)
  - Calls `DeliverRouteTiles` with the standard 2-waypoint / 500m / zoom-18 fixture (`GrpcTestHelpers.BuildValidRequest` coordinates)
  - Records the wall-clock time to the **first `RouteTileEvent` with a `Batch` payload** and to **stream close** (`DeliveryComplete` or `DeliveryError`)
  - Supports `PERF_REPEAT_COUNT` cold iterations and prints a p50/p95 summary to stdout for the shell script to gate
  - Optionally runs a slow-consumer mode (`PERF_PT10_SLOW_MS` delay between stream events, default 50) on a single iteration, asserting that the stream completes with ≥1 tile and no `DeliveryError`
- Add a PT-10 section to `scripts/run-performance-tests.sh` that exits non-zero on harness failure (same contract as PT-07/PT-08).
- Add a PT-10 spec block to `_docs/02_document/tests/performance-tests.md` with documented thresholds:
  - Cold `p95(first_batch_ms) ≤ 30000` (aligns with the PT-01 cold tile budget, which includes the GM round-trip)
  - Cold `p95(total_stream_ms) ≤ 120000` (2-point 500m corridor at zoom 18 on dev hardware)
  - Slow-consumer sub-check: completes without error (no fixed ms threshold)
- Update `traceability-matrix.md` so that the gRPC streaming NFR row references PT-10.
- Update the script header comment from "PT-01..PT-08" to include PT-10.

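The per-iteration timing in the subcommand bullets can be sketched as follows. The real probe is C# in `SatelliteProvider.IntegrationTests`, so this Python version only illustrates the timing logic under the following assumptions:

- Stream events are modelled as hypothetical `(kind, tile_count)` tuples that stand in for the `Batch`, `DeliveryComplete`, and `DeliveryError` payloads.
- `measure_stream` and `IterationResult` are illustrative names.
- The start timestamp is taken just before the call is opened, so that `first_batch_ms` covers any cold Google Maps download.

The clock is injectable so that the logic can be checked without a live API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical event shape: (kind, tile_count), where kind is "batch", "complete"
# or "error", standing in for Batch / DeliveryComplete / DeliveryError payloads.
Event = Tuple[str, int]


@dataclass
class IterationResult:
    first_batch_ms: Optional[float]  # None if no Batch event arrived
    total_stream_ms: float           # call start -> terminal event or stream close
    terminal: Optional[str]          # "complete", "error", or None if the stream just ended
    tiles: int                       # tiles summed across all Batch events


def measure_stream(open_stream: Callable[[], Iterable[Event]],
                   clock: Callable[[], float] = time.monotonic) -> IterationResult:
    """Time one DeliverRouteTiles-style stream from call start to close."""
    start = clock()          # taken before the call so cold-path setup is included
    events = open_stream()
    first_batch_ms: Optional[float] = None
    tiles = 0
    terminal: Optional[str] = None
    for kind, tile_count in events:
        if kind == "batch":
            if first_batch_ms is None:
                first_batch_ms = (clock() - start) * 1000.0
            tiles += tile_count
        elif kind in ("complete", "error"):
            terminal = kind
            break            # terminal event: stop timing here
    total_stream_ms = (clock() - start) * 1000.0
    return IterationResult(first_batch_ms, total_stream_ms, terminal, tiles)
```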
### Excluded

- Production code changes to `RouteTileDeliveryGrpcService`, the orchestrator, or the proto (the endpoint already exists).
- New gRPC RPCs or contract version bumps.
- Load testing with concurrent streams (k6 / multiple clients); PT-10 uses single-client sequential repeats only.
- Promoting the PT-09 inventory inline test into the shell harness (separate follow-up).
- Setting production SLOs: the thresholds are dev-harness budgets, and Step 15 retains its A/B/C gate on threshold failures.

## Acceptance Criteria

**AC-1: PT-10 exercises real gRPC stream**
Given a running API with TLS and a valid `JWT_SECRET`
When `scripts/run-performance-tests.sh` runs PT-10
Then the probe calls `RouteTileDelivery.DeliverRouteTiles` over gRPC with `authorization: Bearer` metadata, AND receives at least one `Batch` event before `DeliveryComplete` on every successful iteration.

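AC-1 requires the JWT to travel as `authorization: Bearer` call metadata. The PT-10 probe gets this from the C# `GrpcTestHelpers`. The Python sketch below only shows the metadata shape that a grpcio client would pass through the `metadata=` argument of a stub call. `bearer_metadata` is a hypothetical helper, and the stub and request in the usage comment stand in for generated code.

```python
from typing import Optional, Sequence, Tuple

Metadata = Sequence[Tuple[str, str]]


def bearer_metadata(token: Optional[str]) -> Metadata:
    """Build gRPC call metadata that carries a JWT as a Bearer credential.

    gRPC metadata keys must be lowercase, hence "authorization".
    A missing token raises ValueError so the harness can exit non-zero
    instead of sending an unauthenticated call.
    """
    if token is None or not token.strip():
        raise ValueError("JWT is missing: mint one via the perf bootstrap first")
    return (("authorization", f"Bearer {token.strip()}"),)


# Usage with grpcio (stub and request are hypothetical generated types):
#   for event in stub.DeliverRouteTiles(request, metadata=bearer_metadata(jwt)):
#       ...
```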
**AC-2: Latency metrics reported**
Given `PERF_REPEAT_COUNT` cold iterations (default 20)
When PT-10 completes
Then the harness prints `first_batch_ms` and `total_stream_ms` per iteration, plus aggregate p50/p95 for both metrics.

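The aggregate p50/p95 in AC-2 needs a percentile definition, and the spec does not fix one. This Python sketch assumes the nearest-rank method, which always reports an observed sample. Interpolating definitions can give slightly different values at `PERF_REPEAT_COUNT` = 20. The function names are hypothetical, not the harness's API.

```python
from typing import Dict, Sequence


def nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values <= it."""
    if not samples:
        raise ValueError("no samples: zero successful iterations")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceil(pct * n / 100), no float rounding
    return ordered[rank - 1]


def summarize(per_iteration: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Return {metric: {"p50": ..., "p95": ...}}, e.g. for first_batch_ms and total_stream_ms."""
    return {
        metric: {"p50": nearest_rank(values, 50), "p95": nearest_rank(values, 95)}
        for metric, values in per_iteration.items()
    }
```

With 20 samples, p95 is the 19th-smallest value and p50 is the 10th-smallest.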
**AC-3: Threshold gate**
Given the documented dev-hardware budgets in `performance-tests.md`
When the PT-10 p95 values are compared against them
Then `p95(first_batch_ms) ≤ 30000` AND `p95(total_stream_ms) ≤ 120000`; otherwise the script exits non-zero with an explicit threshold-failure message (Step 15 may still offer an override).

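The AC-3 gate reduces to comparing each p95 against its budget. A value equal to the budget passes (`≤`). This hedged Python sketch assumes the p95 values were already computed from the per-iteration samples. The function name, message wording, and exit code 1 are illustrative rather than the script's actual output.

```python
from typing import Dict, List, Tuple

# Budgets from the PT-10 spec (dev hardware, cold path, milliseconds).
PT10_P95_BUDGETS_MS: Dict[str, float] = {
    "first_batch_ms": 30000.0,
    "total_stream_ms": 120000.0,
}


def gate(p95_by_metric: Dict[str, float],
         budgets: Dict[str, float] = PT10_P95_BUDGETS_MS) -> Tuple[int, List[str]]:
    """Compare p95 values with budgets; return (exit_code, failure messages)."""
    failures: List[str] = []
    for metric, budget in budgets.items():
        observed = p95_by_metric.get(metric)
        if observed is None:
            # A missing metric is a failure, never a silent pass.
            failures.append(f"PT-10 FAIL: no p95 value for {metric}")
        elif observed > budget:  # exactly at budget passes (<=)
            failures.append(
                f"PT-10 FAIL: p95({metric}) = {observed:.0f} ms exceeds {budget:.0f} ms")
    return (1 if failures else 0), failures
```

Treating a missing metric as a failure keeps the gate from passing silently, which matches the constraint that gates are never weakened without an operator decision.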
**AC-4: Slow-consumer smoke**
Given `PERF_PT10_SLOW_MS` > 0 (default 50)
When PT-10 runs the slow-consumer sub-check once per script invocation
Then the stream completes with `DeliveryComplete`, the collected batches contain at least one tile, AND no `DeliveryError` is received.

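The AC-4 smoke check amounts to a deliberately slow reader followed by three pass conditions. This Python sketch makes the following assumptions:

- The stream is a hypothetical iterable of `(kind, tile_count)` tuples.
- `slow_consumer_check` is an illustrative name.
- The sleep function is injectable so the check can run without real delays.

Against a live gRPC stream, the pause between reads is intended to exercise flow control between client and server.

```python
import time
from typing import Callable, Iterable, Tuple

Event = Tuple[str, int]  # hypothetical (kind, tile_count); kind in {"batch", "complete", "error"}


def slow_consumer_check(events: Iterable[Event], slow_ms: int = 50,
                        sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, str]:
    """Consume a stream with an artificial delay between events; return (passed, reason)."""
    if slow_ms <= 0:
        raise ValueError("PERF_PT10_SLOW_MS must be > 0 for the slow-consumer check")
    tiles = 0
    for kind, tile_count in events:
        if kind == "error":
            return False, "DeliveryError received"
        if kind == "complete":
            if tiles < 1:
                return False, "DeliveryComplete without any tiles"
            return True, f"completed with {tiles} tiles"
        if kind == "batch":
            tiles += tile_count
        sleep(slow_ms / 1000.0)  # slow reader: pause before pulling the next event
    return False, "stream closed without DeliveryComplete"
```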
**AC-5: Spec and traceability updated**
Given the post-task repository state
When `performance-tests.md` and `traceability-matrix.md` are read
Then PT-10 is documented as **Implemented** with scenario ID PT-10, AND the gRPC stream perf row is no longer `Unverified`.

**AC-6: No duplicated JWT / gRPC bootstrap**
Given the existing `PerfBootstrap`, `GrpcTestHelpers`, and `JwtTokenFactory`
When PT-10 mints tokens and opens channels
Then it reuses those surfaces: no third JWT mint implementation and no parallel gRPC client factory in the shell script.

## Non-Functional Requirements

**Performance**
- The thresholds above are dev-compose budgets on an 8-core x86 baseline; document the revision path if the hardware changes.

**Reliability**
- The script exits non-zero if the gRPC channel cannot be established, the JWT is missing, or zero iterations complete successfully.
- Fail loudly with the RPC status code and detail on `DeliveryError` or on a non-OK `RpcException`.

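The reliability rules can be sketched as a verdict over iteration outcomes. The spec fixes two behaviors:

- Always surface the RPC status code and detail for a failure.
- Exit non-zero when zero iterations succeed.

It does not say whether one failed iteration should fail the whole run, so this Python sketch exposes that choice as a hypothetical `fail_on_any_error` flag. Channel-setup and missing-JWT failures happen before any iteration runs and are not modelled here. `Outcome` and `harness_verdict` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outcome:
    """Result of one PT-10 iteration (hypothetical shape for this sketch)."""
    ok: bool
    status_code: Optional[str] = None  # gRPC status name such as "UNAVAILABLE", if the RPC failed
    detail: str = ""                   # RpcException / DeliveryError detail text


def harness_verdict(outcomes: List[Outcome],
                    fail_on_any_error: bool = False) -> Tuple[int, List[str]]:
    """Return (exit_code, messages); the caller prints messages to stderr."""
    messages: List[str] = []
    for index, outcome in enumerate(outcomes, start=1):
        if not outcome.ok:
            # Fail loudly: always surface the status code and detail.
            messages.append(
                f"PT-10 iteration {index} failed: status={outcome.status_code or 'UNKNOWN'} "
                f"detail={outcome.detail}")
    successes = sum(1 for outcome in outcomes if outcome.ok)
    if successes == 0:
        messages.append("PT-10 FAIL: zero successful iterations")
        return 1, messages
    if fail_on_any_error and successes < len(outcomes):
        return 1, messages
    return 0, messages
```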
**Compatibility**
- Bash 4+; the gRPC probe runs via the `dotnet` IntegrationTests DLL (already built for the perf bootstrap).
- Works with the `docker-compose.perf.yml` overlay and the `API_URL=https://localhost:18980` default.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-6 | Grep / static review | No new `JwtSecurityToken` constructors in the perf script; PT-10 logic lives in IntegrationTests |

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1..AC-4 | API + DB running via compose; valid JWT | Run full `scripts/run-performance-tests.sh` including PT-10 | Real gRPC stream; metrics printed; thresholds evaluated | PT-10 NFR |
| AC-1 | Same stack | Run IntegrationTests `--run-pt10` in isolation | Exit 0; JSON or tabular metrics on stdout | n/a |

## Constraints

- Must not regress PT-01..PT-08 or `scripts/run-tests.sh`.
- Must not commit minted tokens or disable TLS verification in tracked files.
- Threshold failures are reported by the script, and autodev Step 15 handles any operator override; this task does not silently weaken gates.

## Risks & Mitigation

**Risk 1: Cold-path variance from Google Maps**
- *Risk*: First-batch p95 spikes caused by network blips look like regressions.
- *Mitigation*: Document that the PT-10 cold path includes the GM download; the Step 15 override path remains available; record actuals in `_docs/06_metrics/perf_<date>.md`.

**Risk 2: TLS / HTTP2 setup differs from integration tests**
- *Risk*: The perf script's `API_URL` may not match the in-compose `https://api:8080` used by the integration tests.
- *Mitigation*: Reuse the `GrpcTestHelpers.CreateChannel` cert handling; document the required `API_URL` and `certs/api.crt` in the perf script header (same as REST perf).

**Risk 3: Threshold too tight on laptop hardware**
- *Risk*: p95 gates fail on underpowered dev machines.
- *Mitigation*: Thresholds match the PT-01/PT-03 family (generous dev budgets); tune them in `_docs/06_metrics` after the first green run if needed.