[AZ-585] [AZ-586] ResLim+Perf NFT tests; close test cycle 1

Batch 4 of test implementation cycle 1 (existing-code Step 6, final batch).

- AZ-585 SteadyStateLoadTests + ColdStartRssTests: NFT-RES-LIM-01..04. SteadyStateLoadFixture runs one 5-min sustained-load window and samples RSS (docker stats), Npgsql conns (pg_stat_activity), and FDs (/proc/1/fd) every 5s; three test methods assert independently. All are SkippableFact-gated on docker primitives.
- AZ-586 PerformanceTests: NFT-PERF-01..04. Sequential single-client, 5 warm-ups + N measured calls, P50+P95 via LatencyPercentiles, recorded to PERF_RESULTS_FILE. Tagged Category=Perf so the default gate excludes them.

Infrastructure:

- entrypoint.sh now applies --filter "${TEST_FILTER:-Category!=Perf}" per AZ-586 (the default CI gate excludes performance tests).
- MetricCsvRecorder: idempotent CSV appender keyed on an env var, used by both the Perf and ResLim categories.

Step 6 (Implement Tests) is complete. The final report at _docs/03_implementation/implementation_report_tests.md hands off the full-suite gate to test-run/SKILL.md (Step 7).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 12:31:07 +00:00 · 2026-05-15 09:11:53 +03:00
parent 26126e6216
commit 001e80fe96
14 changed files with 1181 additions and 52 deletions
@@ -0,0 +1,116 @@
+# Resource Limit Tests
+
+**Task**: AZ-585_test_resource_limits
+**Name**: Resource limit tests (NFT-RES-LIM-01..04)
+**Description**: Implement xUnit blackbox tests for the 4 resource-limit observation scenarios — steady-state RSS memory under 5-min sustained load (P95 ≤ 250 MiB; no monotonic climb), Npgsql connection pool ≤ 100 with no unbounded growth, file-descriptor count ≤ 1024 with no leak, and cold-start RSS ≤ 200 MiB at `t=30s` after health-ok. Provisional gates documented per `restrictions.md` H6 — locked in after first green run.
+**Complexity**: 3 points
+**Dependencies**: AZ-576_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-585
+**Epic**: AZ-575
+
+## Problem
+
+Per H6, container-level resource limits are NOT enforced inside the container — they will be set at the suite level (`_infra/_compose/`) per device type once locked. These tests establish baseline observations so the suite can size the cgroup limits correctly AND provide an upper-bound regression gate so future changes do not silently 10× the memory or FD footprint. The 8 GB Jetson Orin must accommodate ~6 .NET edge services + Postgres + UI; `missions`'s budget is ~200 MiB cold + ~250 MiB hot. Without these observation tests, a leak or library bloat could ship to the device and force a re-sizing decision late in deployment.
+
+## Outcome
+
+- All four NFT-RES-LIM-01..04 scenarios run and pass against the dockerised `missions` service.
+- Each test produces a CSV row with `Category=ResLim`, `Traces=H1|H3|H6|O10`, `Result=pass`, AND records the measured value (e.g., `P95_RSS_MiB=187`) in the `Traces` column so suite-level deployment planning can read it.
+- NFT-RES-LIM-01 measures P95 RSS over 5 minutes of mixed sustained load AND asserts `final_RSS - P95_RSS ≤ 20% * P95_RSS` (no monotonic climb).
+- NFT-RES-LIM-02 measures Npgsql connection count via `pg_stat_activity` every 5s AND asserts both `max ≤ 100` AND `final ≤ 1.3 * first_minute_steady_state`.
+- NFT-RES-LIM-03 measures `ls /proc/<pid>/fd | wc -l` inside the container every 5s AND asserts both `max ≤ 1024` AND `final ≤ 1.3 * minute_one_count`.
+- NFT-RES-LIM-04 measures cold-start RSS exactly 30s after `GET /health` first returns 200 (no requests issued yet) AND asserts `RSS ≤ 200 MiB`.
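A minimal sketch of how one of these rows might be appended, assuming a hypothetical `append_metric_row` helper and a `Test,Category,Traces,Result,ErrorMessage` column order (the real `MetricCsvRecorder` schema is not shown in this task). It folds each measured value into `Traces` as a `KEY=VALUE` token after the trace IDs, and writes the header only when the file is new or empty, so repeated appends never duplicate it.

```python
import csv
from pathlib import Path

# Assumed column order; the real MetricCsvRecorder schema may differ.
COLUMNS = ["Test", "Category", "Traces", "Result", "ErrorMessage"]


def append_metric_row(path, test, category, traces, metrics, result, error=""):
    """Append one result row; write the header only if the file is missing or empty."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    # Measured values ride along in Traces as KEY=VALUE tokens after the trace IDs.
    tokens = list(traces) + [f"{key}={value}" for key, value in metrics.items()]
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(COLUMNS)
        writer.writerow([test, category, "|".join(tokens), result, error])
```

For example, `append_metric_row(path, "NFT-RES-LIM-01", "ResLim", ["H1", "H3", "H6", "O10"], {"P95_RSS_MiB": 187}, "pass")` yields a `Traces` cell of `H1|H3|H6|O10|P95_RSS_MiB=187`.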
+
+## Scope
+
+### Included
+
+- NFT-RES-LIM-01 Steady-state memory under 5-min sustained load.
+- NFT-RES-LIM-02 Connection pool steady-state.
+- NFT-RES-LIM-03 File-descriptor steady-state.
+- NFT-RES-LIM-04 Cold-start RSS budget.
+- Each test records the measured value to the CSV `Traces` field so deployment planning can pick it up.
+- Provisional gates: 250 MiB hot, 200 MiB cold, 100 connections, 1024 FDs. On first green run, replace provisional gates with `measured + 50%` and open a Refactor Backlog ticket if the provisional gate was exceeded.
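The lock-in rule is plain arithmetic. A sketch using hypothetical `lock_in_gate` / `exceeds_provisional` helpers, assuming the locked gate is rounded up to a whole unit (this task does not say how to round):

```python
import math

# Provisional gates from this task (MiB, MiB, connections, file descriptors).
PROVISIONAL_GATES = {
    "P95_RSS_MiB": 250,
    "COLD_RSS_MiB": 200,
    "MAX_NPGSQL_CONNS": 100,
    "MAX_FD": 1024,
}


def lock_in_gate(measured):
    """Locked gate = measured + 50%, rounded up (the rounding rule is an assumption)."""
    return math.ceil(measured * 1.5)


def lock_in_all(measurements):
    """Map each metric recorded on the first green run to its locked gate."""
    return {name: lock_in_gate(value) for name, value in measurements.items()}


def exceeds_provisional(name, measured):
    """True if the locked gate would be looser than the provisional gate."""
    return lock_in_gate(measured) > PROVISIONAL_GATES[name]
```

A first green run recording `P95_RSS_MiB=187` would lock that gate at 281 MiB, which is looser than the 250 MiB provisional gate. Any measurement above two-thirds of a provisional gate has this effect, so it is worth stating explicitly whether that case is the one that opens the Refactor Backlog ticket.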
+
+### Excluded
+
+- Performance (latency / throughput) tests live in Task 19.
+- GPU / temperature / disk-I/O monitoring (per `restrictions.md` H8 — no specialised hardware on a CRUD service).
+- Long-soak / endurance tests (> 5 min) — explicitly deferred per `restrictions.md` H8.
+
+## Acceptance Criteria
+
+**AC-1: NFT-RES-LIM-01 steady-state RSS ≤ provisional 250 MiB with no monotonic climb**
+Given `missions` running with `seed_25_missions` + `seed_3_vehicles_2_default` and no host-side memory limit
+When the test orchestrator drives ~50 RPS of mixed `GET /vehicles`, `GET /missions`, `GET /missions/{id}/waypoints` for 5 minutes from a single concurrent client, while polling `docker stats --no-stream missions-sut` every 5s
+Then the P95 of the 60 RSS samples is `≤ 250 MiB` (provisional gate)
+And the final-sample RSS is at most 20% above the P95 RSS, i.e. `final_RSS ≤ 1.2 × P95_RSS` (no sustained leak: RSS does not climb monotonically)
+And the measured P95 is recorded to the CSV `Traces` column as `P95_RSS_MiB=<n>`
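A sketch of the AC-1 arithmetic, assuming samples come from `docker stats --no-stream --format '{{.MemUsage}}' missions-sut`, whose output has the shape `187.3MiB / 7.629GiB`, and that P95 uses the nearest-rank method; the suite's own percentile helper may interpolate instead. All function names are hypothetical.

```python
import re

# docker stats reports memory in binary units.
UNIT_TO_MIB = {"B": 1 / 1024**2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_mem_usage_mib(mem_usage):
    """Parse the usage half of a MemUsage value such as '187.3MiB / 7.629GiB'."""
    usage = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", usage)
    if match is None:
        raise ValueError(f"unrecognised memory value: {mem_usage!r}")
    return float(match.group(1)) * UNIT_TO_MIB[match.group(2)]


def p95_nearest_rank(samples):
    """Nearest-rank P95: the ceil(0.95 * n)-th smallest sample."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_steady_state_rss(samples_mib, gate_mib=250.0, max_climb=0.20):
    """Return (passed, p95) for NFT-RES-LIM-01 over the five-second samples."""
    p95 = p95_nearest_rank(samples_mib)
    # final - P95 <= 20% * P95  is the same as  final <= 1.2 * P95
    no_climb = samples_mib[-1] <= (1 + max_climb) * p95
    return p95 <= gate_mib and no_climb, p95
```

With 60 samples, the nearest-rank P95 is the 57th smallest value, so the three highest samples do not affect the gate; only the final-sample check can catch a late spike.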
+
+**AC-2: NFT-RES-LIM-02 connection pool ≤ 100 with no unbounded growth**
+Given the same setup as NFT-RES-LIM-01
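+When the test orchestrator polls the side-channel query `SELECT count(*) FROM pg_stat_activity WHERE (application_name LIKE 'Npgsql%' OR (usename='postgres' AND backend_type='client backend')) AND pid <> pg_backend_pid()` every 5s for 5 minutes (the `pid <> pg_backend_pid()` clause keeps the polling connection itself out of the count)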
+Then the max sampled connection count is `≤ 100`
+And the final-sample count is `≤ 1.3 × (mean of samples in the first minute)`
+And the measured max is recorded as `MAX_NPGSQL_CONNS=<n>`
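A sketch of the AC-2 check, assuming `samples[i]` holds the `pg_stat_activity` count taken at `t = i × 5s`, so the first minute is samples 0 through 11; `check_pool` is a hypothetical helper name.

```python
def check_pool(samples, gate=100, growth=1.3, interval_s=5):
    """Return (passed, max_conns) for NFT-RES-LIM-02.

    samples[i] is the connection count observed at t = i * interval_s seconds.
    """
    first_minute = [c for i, c in enumerate(samples) if i * interval_s < 60]
    baseline = sum(first_minute) / len(first_minute)  # mean of the first minute
    max_conns = max(samples)
    passed = max_conns <= gate and samples[-1] <= growth * baseline
    return passed, max_conns
```

Averaging the first minute, rather than taking a single sample, keeps one noisy reading from setting the growth baseline.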
+
+**AC-3: NFT-RES-LIM-03 file descriptors ≤ 1024 with no leak**
+Given the same setup as NFT-RES-LIM-01
+When the test orchestrator executes `docker exec missions-sut sh -c 'ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l'` every 5s for 5 minutes
+Then the max sampled FD count is `≤ 1024`
+And the final-sample count is `≤ 1.3 × (count at t=1min)`
+And the measured max is recorded as `MAX_FD=<n>`
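The AC-3 check differs from AC-2 mainly in its baseline: a single sample at `t=1min`, which is index 12 at a 5s interval. A sketch with hypothetical helper names, assuming the `wc -l` output arrives as a string from `docker exec`:

```python
def parse_fd_count(wc_output):
    """`wc -l` output may carry leading spaces and a trailing newline."""
    return int(wc_output.strip())


def check_fds(samples, gate=1024, growth=1.3, interval_s=5):
    """Return (passed, max_fd) for NFT-RES-LIM-03.

    samples[i] is the FD count at t = i * interval_s; the baseline is t = 60s.
    """
    minute_one = samples[60 // interval_s]
    max_fd = max(samples)
    passed = max_fd <= gate and samples[-1] <= growth * minute_one
    return passed, max_fd
```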
+
+**AC-4: NFT-RES-LIM-04 cold-start RSS ≤ 200 MiB**
+Given `missions` has been started fresh (via `docker compose up -d missions` after `down -v`), no requests issued yet
+When `GET /health` first returns `200` AND 30s have elapsed since that first `200`
+Then `docker stats --no-stream missions-sut` reports `MEM USAGE` ≤ 200 MiB
+And the measured cold-start RSS is recorded as `COLD_RSS_MiB=<n>`
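The AC-4 timing can be sketched as "poll until healthy, idle, sample once". This sketch assumes injected `health_ok` and `read_rss_mib` callables (hypothetical names) and a 30s health-poll timeout, chosen so the whole test stays inside the ≤ 60s budget; the clock and sleep are injectable so the sequence can be checked without a container.

```python
import time


def measure_cold_start_rss(health_ok, read_rss_mib, settle_s=30.0, poll_s=1.0,
                           health_timeout_s=30.0, clock=time.monotonic,
                           sleep=time.sleep):
    """Wait for the first healthy probe, idle for settle_s, then read RSS once.

    health_ok() returns True when GET /health answers 200; read_rss_mib()
    returns the container's memory usage in MiB. Both are supplied by the caller.
    """
    deadline = clock() + health_timeout_s
    while not health_ok():
        if clock() >= deadline:
            raise TimeoutError("GET /health never returned 200")
        sleep(poll_s)
    sleep(settle_s)  # no requests are sent to the SUT during this window
    return read_rss_mib()
```

Because health is polled every `poll_s`, the measurement lands up to about one poll interval later than exactly 30s after the service actually became healthy.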
+
+## Non-Functional Requirements
+
+**Performance**
+- NFT-RES-LIM-01..03: one shared 5-minute sampling window; with Arrange/teardown, ≤ 6 minutes wall-clock.
+- NFT-RES-LIM-04: ≤ 60s wall-clock (fresh start + health-poll + 30s wait + measurement).
+- Run back-to-back, the four tests would take up to ~19 minutes (3 × 6 min + 1 min), which exceeds the documented 15-min suite CI gate per `environment.md`. NFT-RES-LIM-01..03 therefore share the same 5-minute window against a single dockerised `missions`, and NFT-RES-LIM-04 runs separately because it requires a fresh start, bringing the expected total to about 7 minutes (≤ 6 min + ≤ 60s).
+
+**Reliability**
+- The load generator is a single-threaded `HttpClient` loop driving requests back-to-back, targeting approximately 50 RPS on the in-suite test runner. If the runner cannot sustain 50 RPS (CI infrastructure too slow), NFT-RES-LIM-01..03 are SKIPPED with `Result=skip` and `ErrorMessage=runner cannot sustain target load`; CI then reruns them on a beefier worker.
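How "cannot sustain" is measured is not specified in this task. One sketch, assuming a hypothetical `load_verdict` that compares achieved throughput against the 50 RPS target with an assumed 10% tolerance:

```python
def load_verdict(request_count, elapsed_s, target_rps=50.0, tolerance=0.9):
    """Decide whether the runner sustained the target load.

    Returns ("run", "") or ("skip", <ErrorMessage>). The 90% tolerance is an
    assumed threshold, not a documented one.
    """
    achieved_rps = request_count / elapsed_s
    if achieved_rps < tolerance * target_rps:
        return "skip", "runner cannot sustain target load"
    return "run", ""
```

For scale, 15 000 requests over the 300s window is exactly 50 RPS.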
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | `seed_25_missions` + 50 RPS for 5 min | P95 RSS sampling | P95 ≤ 250 MiB + no monotonic climb | H1, H6, O10 |
+| AC-2 | same | `pg_stat_activity` polling | max ≤ 100 + final ≤ 1.3×steady | O10 |
+| AC-3 | same | `/proc/<pid>/fd` polling | max ≤ 1024 + final ≤ 1.3×minute-one | H6, O10 |
+| AC-4 | fresh `docker compose up -d` | cold-start RSS at t=30s | RSS ≤ 200 MiB | H1, H3 |
+
+## Constraints
+
+- `docker stats` and `docker exec` from inside the runner: requires Docker socket access; AZ-576 covers this.
+- NFT-RES-LIM-03 requires `pgrep` inside the `missions` image; the test FAILS in Arrange (not Assert) if `pgrep` is unavailable. Alternative (preferred for the small Dockerfile): if PID 1 is the .NET process, confirm it via `/proc/1/comm` and count `/proc/1/fd` directly, with no `pgrep` dependency.
+- All measurements are recorded to the CSV report's `Traces` field so deployment planning can pick them up; this is more important than the pass/fail gate.
+- Provisional gates are documented per `restrictions.md` H6 — locked in based on first measured run.
+- AAA pattern with `// Arrange` / `// Act` / `// Assert` per test.
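The PID-1 shortcut in the `pgrep` constraint can be sketched as choosing between two `docker exec` command lines. This assumes the caller has already read `/proc/1/comm` (for example with `docker exec missions-sut cat /proc/1/comm`) and that the .NET process shows up there as either `dotnet` or the apphost name `Azaion.Missions`; `fd_count_argv` is a hypothetical helper.

```python
# Names PID 1 might report in /proc/1/comm for the .NET service (assumed).
DOTNET_COMM_NAMES = {"dotnet", "Azaion.Missions"}


def fd_count_argv(container, pid1_comm):
    """Build the argv that counts open FDs of the missions process.

    Uses /proc/1/fd when PID 1 is the .NET process; otherwise falls back to the
    pgrep lookup from AC-3, which needs pgrep inside the image.
    """
    if pid1_comm.strip() in DOTNET_COMM_NAMES:
        script = "ls /proc/1/fd | wc -l"
    else:
        script = "ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l"
    return ["docker", "exec", container, "sh", "-c", script]
```

Passing the argv as a list (e.g. to `subprocess.run`) avoids host-side shell quoting, so `$(...)` is expanded by `sh` inside the container rather than on the runner.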
+
+## Risks & Mitigation
+
+**Risk 1: Measurement variance on shared CI runners**
+- *Risk*: A runner under noisy-neighbour load reports inflated RSS, flaking the gate.
+- *Mitigation*: Gates are provisional and generous (250 MiB vs. typical .NET service of ~150 MiB; 100 connections vs. typical idle pool of ~5–10). After the first green run, the gate is locked at `measured + 50%`.
+
+**Risk 2: NFT-RES-LIM-01..03 share a 5-minute window — flake correlation**
+- *Risk*: A CI hiccup that kills the SUT mid-window flakes all three at once.
+- *Mitigation*: Each test asserts its own metric; on `missions-sut` exit during the window, the test FAILS with a `"SUT exited during measurement window"` ErrorMessage rather than reporting a misleading metric value.
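A sketch of that fail-fast behaviour, assuming liveness is checked with `docker inspect --format '{{.State.Running}}'` before each sample; `sample_window`, `container_running`, and `SutExited` are hypothetical names, and the sampler and liveness probe are injected so the loop can be exercised without Docker.

```python
import subprocess
import time


class SutExited(Exception):
    """Raised when the SUT container stops during the measurement window."""


def container_running(name):
    """True if `docker inspect` reports the container as running."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Running}}", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def sample_window(take_sample, is_running, n_samples=60, interval_s=5.0,
                  sleep=time.sleep):
    """Collect n_samples at interval_s, checking the SUT is alive before each one."""
    samples = []
    for _ in range(n_samples):
        if not is_running():
            raise SutExited("SUT exited during measurement window")
        samples.append(take_sample())
        sleep(interval_s)
    return samples
```

In the fixture this would be called as `sample_window(read_sample, lambda: container_running("missions-sut"))`, so a dead SUT surfaces as the documented ErrorMessage instead of a truncated metric series.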
+
+**Risk 3: Provisional gates silently accepted as the locked gate**
+- *Risk*: If the first green run measures 200 MiB and the test passes, a future engineer may treat 250 MiB as the permanent gate even though the actual headroom is only 50 MiB.
+- *Mitigation*: The test logs the `measured / gate` ratio; CI dashboards flag ratios > 0.8 for re-tuning. The lock-in workflow is documented in `restrictions.md` H6.
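A sketch of the ratio logging, with a hypothetical `headroom` helper and an assumed log-line format:

```python
def headroom(metric, measured, gate, flag_above=0.8):
    """Return (ratio, flagged, log_line); flagged means ratio > flag_above."""
    ratio = measured / gate
    flagged = ratio > flag_above
    log_line = f"{metric} measured/gate={ratio:.2f}" + (" [re-tune]" if flagged else "")
    return ratio, flagged, log_line
```

With the 200 MiB / 250 MiB example from Risk 3, the ratio is exactly 0.80, so a strict `> 0.8` rule does not flag it; use `>=` if that boundary case should be flagged.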
+
+## System Under Test Boundary
+
+- Tests drive the product through the public HTTP surface for load generation; `docker stats`, `docker exec`, and side-channel `pg_stat_activity` for measurement. Expected outputs are the documented gates from `_docs/02_document/tests/resource-limit-tests.md` (provisional) and the corresponding entries in `_docs/00_problem/input_data/expected_results/results_report.md` (when locked).
+- Stubs are allowed ONLY for: the external `admin` JWT issuer (`jwks-mock` container) and the DB-only stub tables for `media`, `annotations`, `detection`, `map_objects`.
+- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module — including the Npgsql connection pool, the `AppDataConnection` lifetime, or the `Program.cs` startup path. If any of these is not implemented, the test MUST fail/block as missing product implementation — it must not pass by replacing the module with a test stub.