[AZ-585] [AZ-586] ResLim+Perf NFT tests; close test cycle 1

Batch 4 of test implementation cycle 1 (existing-code Step 6, final batch).

- AZ-585 SteadyStateLoadTests + ColdStartRssTests: NFT-RES-LIM-01..04.
  SteadyStateLoadFixture runs one 5-min sustained-load window and samples
  RSS (docker stats), Npgsql conns (pg_stat_activity), and FDs
  (/proc/1/fd) every 5s; three test methods assert independently. All
  SkippableFact-gated on docker primitives.
- AZ-586 PerformanceTests: NFT-PERF-01..04. Sequential single-client,
  5 warm-ups + N measured calls, P50+P95 via LatencyPercentiles, recorded
  to PERF_RESULTS_FILE. Tagged Category=Perf so default gate excludes them.

Infrastructure:
- entrypoint.sh now applies --filter "${TEST_FILTER:-Category!=Perf}"
  per AZ-586 (default CI gate excludes performance).
- MetricCsvRecorder: idempotent CSV appender keyed on env var, used by
  both Perf and ResLim categories.

Step 6 (Implement Tests) is complete. Final report at
_docs/03_implementation/implementation_report_tests.md hands off the
full-suite gate to test-run/SKILL.md (Step 7).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-15 09:11:53 +03:00
parent 26126e6216
commit 001e80fe96
14 changed files with 1181 additions and 52 deletions
@@ -0,0 +1,116 @@
# Resource Limit Tests
**Task**: AZ-585_test_resource_limits
**Name**: Resource limit tests (NFT-RES-LIM-01..04)
**Description**: Implement xUnit blackbox tests for the 4 resource-limit observation scenarios — steady-state RSS memory under 5-min sustained load (P95 ≤ 250 MiB; no monotonic climb), Npgsql connection pool ≤ 100 with no unbounded growth, file-descriptor count ≤ 1024 with no leak, and cold-start RSS ≤ 200 MiB at `t=30s` after health-ok. Provisional gates documented per `restrictions.md` H6 — locked in after first green run.
**Complexity**: 3 points
**Dependencies**: AZ-576_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-585
**Epic**: AZ-575
## Problem
Per H6, container-level resource limits are NOT enforced inside the container — they will be set at the suite level (`_infra/_compose/`) per device type once locked. These tests establish baseline observations so the suite can size the cgroup limits correctly AND provide an upper-bound regression gate so future changes do not silently 10× the memory or FD footprint. The 8 GB Jetson Orin must accommodate ~6 .NET edge services + Postgres + UI; `missions`'s budget is ~200 MiB cold + ~250 MiB hot. Without these observation tests, a leak or library bloat could ship to the device and force a re-sizing decision late in deployment.
## Outcome
- All four NFT-RES-LIM-01..04 scenarios run and pass against the dockerised `missions` service.
- Each test produces a CSV row with `Category=ResLim`, `Traces=H1|H3|H6|O10`, `Result=pass`, AND records the measured value (e.g., `P95_RSS_MiB=187`) in the `Traces` column so suite-level deployment planning can read it.
- NFT-RES-LIM-01 measures P95 RSS over 5 minutes of mixed sustained load AND asserts `final_RSS - P95_RSS ≤ 20% * P95_RSS` (no monotonic climb).
- NFT-RES-LIM-02 measures Npgsql connection count via `pg_stat_activity` every 5s AND asserts both `max ≤ 100` AND `final ≤ 1.3 * first_minute_steady_state`.
- NFT-RES-LIM-03 measures `ls /proc/<pid>/fd | wc -l` inside the container every 5s AND asserts both `max ≤ 1024` AND `final ≤ 1.3 * minute_one_count`.
- NFT-RES-LIM-04 measures cold-start RSS exactly 30s after `GET /health` first returns 200 (no requests issued yet) AND asserts `RSS ≤ 200 MiB`.
## Scope
### Included
- NFT-RES-LIM-01 Steady-state memory under 5-min sustained load.
- NFT-RES-LIM-02 Connection pool steady-state.
- NFT-RES-LIM-03 File-descriptor steady-state.
- NFT-RES-LIM-04 Cold-start RSS budget.
- Each test records the measured value to the CSV `Traces` field so deployment planning can pick it up.
- Provisional gates: 250 MiB hot, 200 MiB cold, 100 connections, 1024 FDs. On first green run, replace provisional gates with `measured + 50%` and open a Refactor Backlog ticket if the provisional gate was exceeded.
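The lock-in rule above can be sketched as a small calculation. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and it assumes `measured + 50%` means the measured value multiplied by 1.5.

```python
def locked_gate(measured, headroom=0.50):
    """Gate to lock in after the first green run.

    Assumption: "measured + 50%" is read as measured * 1.5.
    """
    return measured * (1.0 + headroom)


def needs_backlog_ticket(measured, provisional_gate):
    """True when the measurement exceeded the provisional gate."""
    return measured > provisional_gate


# Example with the illustrative value P95_RSS_MiB=187 against the 250 MiB hot gate.
new_gate = locked_gate(187.0)
ticket = needs_backlog_ticket(187.0, 250.0)
```

With a measured P95 of 187 MiB, the locked gate under this reading would be 280.5 MiB and no backlog ticket would be opened.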
### Excluded
- Performance (latency / throughput) tests live in Task 19.
- GPU / temperature / disk-I/O monitoring (per `restrictions.md` H8 — no specialised hardware on a CRUD service).
- Long-soak / endurance tests (> 5 min) — explicitly deferred per `restrictions.md` H8.
## Acceptance Criteria
**AC-1: NFT-RES-LIM-01 steady-state RSS ≤ provisional 250 MiB with no monotonic climb**
Given `missions` running with `seed_25_missions` + `seed_3_vehicles_2_default` and no host-side memory limit
When the test orchestrator drives ~50 RPS of mixed `GET /vehicles`, `GET /missions`, `GET /missions/{id}/waypoints` for 5 minutes from a single sequential client, while polling `docker stats --no-stream missions-sut` every 5s
Then the P95 of the 60 RSS samples is `≤ 250 MiB` (provisional gate)
And the final-sample RSS is within ± 20% of the P95 RSS (no sustained leak — RSS does not climb monotonically)
And the measured P95 is recorded to the CSV `Traces` column as `P95_RSS_MiB=<n>`
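The two AC-1 assertions can be sketched as follows. This is a hedged illustration only: the nearest-rank percentile method and both function names are assumptions (the spec does not say how P95 is computed), and the climb check follows the ± 20% wording of this criterion.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_steady_state_rss(rss_mib, gate_mib=250.0, tolerance=0.20):
    """Return (p95, within_gate, no_climb) for a series of RSS samples in MiB."""
    p95 = nearest_rank_percentile(rss_mib, 95)
    final = rss_mib[-1]
    within_gate = p95 <= gate_mib
    # Final sample must sit within +/- tolerance of the P95.
    no_climb = abs(final - p95) <= tolerance * p95
    return p95, within_gate, no_climb


# 60 samples (one every 5s for 5 minutes); the last three climb sharply.
climbing = [150.0] * 57 + [180.0, 185.0, 187.0]
p95, within_gate, no_climb = check_steady_state_rss(climbing)
trace = f"P95_RSS_MiB={p95:.0f}"
```

In this example the P95 stays at 150 MiB, inside the gate, but the final sample of 187 MiB is more than 20% above it, so the climb check fails even though the absolute gate passes.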
**AC-2: NFT-RES-LIM-02 connection pool ≤ 100 with no unbounded growth**
Given the same setup as NFT-RES-LIM-01
When the test orchestrator polls side-channel `SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'Npgsql%' OR (usename='postgres' AND backend_type='client backend')` every 5s for 5 minutes
Then the max sampled connection count is `≤ 100`
And the final-sample count is `≤ 1.3 × (mean of samples in the first minute)`
And the measured max is recorded as `MAX_NPGSQL_CONNS=<n>`
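The AC-2 growth check compares the final sample with the mean of the first minute. A minimal sketch, assuming the first sample is taken at the start of the window so that the first minute holds `60 / 5 = 12` samples; the function name and return shape are hypothetical.

```python
def check_connection_pool(counts, interval_s=5, max_gate=100, growth_factor=1.3):
    """Evaluate sampled connection counts against the AC-2 rules."""
    if not counts:
        raise ValueError("no samples")
    first_minute = counts[: 60 // interval_s]  # assumption: sampling starts at t=0
    baseline = sum(first_minute) / len(first_minute)
    peak = max(counts)
    return {
        "within_gate": peak <= max_gate,
        "no_growth": counts[-1] <= growth_factor * baseline,
        "trace": f"MAX_NPGSQL_CONNS={peak}",
    }


# Pool settles at 10 connections, then holds at 11 for the rest of the window.
stable = check_connection_pool([10] * 12 + [11] * 48)
# Pool doubles after the first minute and never shrinks back.
growing = check_connection_pool([10] * 12 + [20] * 48)
```

The second series stays far below the absolute gate of 100 yet fails the growth rule, which is the case the relative check is meant to catch.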
**AC-3: NFT-RES-LIM-03 file descriptors ≤ 1024 with no leak**
Given the same setup as NFT-RES-LIM-01
When the test orchestrator executes `docker exec missions-sut sh -c 'ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l'` every 5s for 5 minutes
Then the max sampled FD count is `≤ 1024`
And the final-sample count is `≤ 1.3 × (count at t=1min)`
And the measured max is recorded as `MAX_FD=<n>`
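One detail of the AC-3 command is worth guarding against: if `pgrep` finds nothing, the command substitution is empty, `ls` fails, and `wc -l` would likely still print `0` with a zero exit status. The sketch below builds the argument list and rejects a zero count; the helper names are hypothetical and actually running the command (for example via `subprocess.run`) is left out.

```python
def fd_count_argv(container="missions-sut", pattern="Azaion.Missions.dll"):
    """Argument list for the docker exec call quoted in AC-3."""
    script = f"ls /proc/$(pgrep -f {pattern} | head -1)/fd | wc -l"
    return ["docker", "exec", container, "sh", "-c", script]


def parse_fd_count(stdout):
    """Parse the `wc -l` output; a zero count is treated as a setup failure."""
    count = int(stdout.strip())  # raises ValueError on non-numeric output
    if count <= 0:
        # Assumption: a live process always holds at least one descriptor,
        # so zero means the process was not found or pgrep is unavailable.
        raise RuntimeError("FD count is 0: process not found or pgrep unavailable")
    return count


argv = fd_count_argv()
sample = parse_fd_count("  142\n")
```

Raising here, before any assertion on the gate, matches the intent that a missing `pgrep` fails the test in Arrange rather than producing a misleading metric.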
**AC-4: NFT-RES-LIM-04 cold-start RSS ≤ 200 MiB**
Given `missions` has been started fresh (via `docker compose up -d missions` after `down -v`), no requests issued yet
When `GET /health` first returns `200` AND 30s have elapsed
Then `docker stats --no-stream missions-sut` reports `MEM USAGE` ≤ 200 MiB
And the measured cold-start RSS is recorded as `COLD_RSS_MiB=<n>`
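`docker stats` prints memory as a human-readable pair such as `187.4MiB / 7.654GiB`, so the value has to be converted before it can be compared with the gate. A minimal parsing sketch, assuming binary unit suffixes (`B`, `KiB`, `MiB`, `GiB`, `TiB`); the function name is hypothetical.

```python
import re

# Conversion factors from each unit to MiB.
_UNIT_TO_MIB = {
    "b": 1 / (1024 * 1024),
    "kib": 1 / 1024,
    "mib": 1.0,
    "gib": 1024.0,
    "tib": 1024.0 * 1024.0,
}


def parse_mem_usage_mib(mem_usage):
    """Convert the 'used' half of a 'used / limit' string to MiB."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*([A-Za-z]+)", used)
    if match is None:
        raise ValueError(f"unrecognised memory value: {used!r}")
    value, unit = match.groups()
    factor = _UNIT_TO_MIB.get(unit.lower())
    if factor is None:
        raise ValueError(f"unrecognised unit: {unit!r}")
    return float(value) * factor


cold_rss = parse_mem_usage_mib("187.4MiB / 7.654GiB")
within_cold_gate = cold_rss <= 200.0
trace = f"COLD_RSS_MiB={cold_rss:.0f}"
```

Failing loudly on an unknown unit avoids silently comparing a GiB figure against a MiB gate.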
## Non-Functional Requirements
**Performance**
- NFT-RES-LIM-01..03: each takes exactly 5 minutes (sampling window). With Arrange/teardown, ≤ 6 minutes wall-clock.
- NFT-RES-LIM-04: ≤ 60s wall-clock (fresh start + health-poll + 30s wait + measurement).
- The total task runtime budget is ≤ 7 minutes (≤ 6 minutes for the shared window plus ≤ 60s for the cold start), fitting inside the documented 15-min suite CI gate per `environment.md`. NFT-RES-LIM-01..03 share the same 5-minute window and run concurrently against a single dockerised `missions`; NFT-RES-LIM-04 runs separately because it requires a fresh start.
**Reliability**
- The load generator is a single-threaded `HttpClient` driving requests in a tight loop; this is documented to reach approximately 50 RPS on the in-suite test runner. If the runner cannot sustain 50 RPS (CI infrastructure too slow), the test SKIPS NFT-RES-LIM-01..03 with `Result=skip` and a clear `ErrorMessage=runner cannot sustain target load`. CI then reruns these on a more powerful worker.
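The skip decision can be sketched as a comparison between achieved and target throughput. The spec does not state how far below 50 RPS counts as "unable to sustain", so the 90% floor used here is purely an assumption, as are the function name and the tuple it returns.

```python
def evaluate_load(requests_completed, window_s=300.0, target_rps=50.0, min_fraction=0.9):
    """Return (should_skip, error_message, achieved_rps) for one load window.

    Assumption: the load counts as sustained when the achieved rate is at
    least min_fraction of the target; the spec does not define this floor.
    """
    if window_s <= 0:
        raise ValueError("window must be positive")
    achieved = requests_completed / window_s
    if achieved < min_fraction * target_rps:
        return True, "runner cannot sustain target load", achieved
    return False, "", achieved


# 15000 requests in 5 minutes is exactly 50 RPS; 9000 is only 30 RPS.
on_target = evaluate_load(15000)
too_slow = evaluate_load(9000)
```

Deciding this once for the shared window keeps the three dependent tests consistent: they either all measure or all skip.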
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | `seed_25_missions` + 50 RPS for 5 min | P95 RSS sampling | P95 ≤ 250 MiB + no monotonic climb | H1, H6, O10 |
| AC-2 | same | `pg_stat_activity` polling | max ≤ 100 + final ≤ 1.3×steady | O10 |
| AC-3 | same | `/proc/<pid>/fd` polling | max ≤ 1024 + final ≤ 1.3×minute-one | H6, O10 |
| AC-4 | fresh `docker compose up -d` | cold-start RSS at t=30s | RSS ≤ 200 MiB | H1, H3 |
## Constraints
- `docker stats` and `docker exec` from inside the runner: requires Docker socket access; AZ-576 covers this.
- NFT-RES-LIM-03 requires `pgrep` inside the `missions` image; the test FAILS in Arrange (not Assert) if `pgrep` is unavailable. Alternative: if `/proc/1/comm` shows that PID 1 is the .NET process, read `/proc/1/fd` directly (preferred for the small Dockerfile).
- All measurements are recorded to the CSV report's `Traces` field so deployment planning can pick them up; this is more important than the pass/fail gate.
- Provisional gates are documented per `restrictions.md` H6 — locked in based on first measured run.
- AAA pattern with `// Arrange` / `// Act` / `// Assert` per test.
## Risks & Mitigation
**Risk 1: Measurement variance on shared CI runners**
- *Risk*: A runner under noisy-neighbour load reports inflated RSS, flaking the gate.
- *Mitigation*: Gates are provisional and generous (250 MiB vs. typical .NET service of ~150 MiB; 100 connections vs. typical idle pool of ~5-10). After the first green run, the gate is locked at `measured + 50%`.
**Risk 2: NFT-RES-LIM-01..03 share a 5-minute window — flake correlation**
- *Risk*: A CI hiccup that kills the SUT mid-window flakes all three at once.
- *Mitigation*: Each test asserts its own metric; on `missions-sut` exit during the window, the test FAILS with a `"SUT exited during measurement window"` ErrorMessage rather than reporting a misleading metric value.
**Risk 3: Provisional gates silently accepted as the locked gate**
- *Risk*: If the first green run measures 200 MiB and the test passes, a future engineer treats 250 MiB as the gate forever — but actual headroom is only 50 MiB.
- *Mitigation*: The test logs `(measured / gate) ratio`; CI dashboards flag ratios > 0.8 for re-tuning consideration. The lock-in workflow is documented in `restrictions.md` H6.
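The ratio logged by the test and the dashboard rule can be sketched as below. The function names are hypothetical; only the `measured / gate` ratio and the `> 0.8` threshold come from the text above.

```python
def gate_ratio(measured, gate):
    """Fraction of the gate consumed by the measurement."""
    if gate <= 0:
        raise ValueError("gate must be positive")
    return measured / gate


def needs_retuning(measured, gate, threshold=0.8):
    """True when the ratio is strictly above the threshold."""
    return gate_ratio(measured, gate) > threshold


# The scenario from Risk 3: 200 MiB measured against the 250 MiB provisional gate.
risk3_ratio = gate_ratio(200.0, 250.0)
risk3_flagged = needs_retuning(200.0, 250.0)
```

Note that the Risk 3 scenario lands exactly on 0.8, so a strict `> 0.8` rule would not flag it; a measurement of 210 MiB (ratio 0.84) would.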
## System Under Test Boundary
- Tests drive the product through the public HTTP surface for load generation and use `docker stats`, `docker exec`, and side-channel `pg_stat_activity` queries for measurement. Expected outputs are the documented gates from `_docs/02_document/tests/resource-limit-tests.md` (provisional) and the corresponding entries in `_docs/00_problem/input_data/expected_results/results_report.md` (when locked).
- Stubs are allowed ONLY for: the external `admin` JWT issuer (`jwks-mock` container) and the DB-only stub tables for `media`, `annotations`, `detection`, `map_objects`.
- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module — including the Npgsql connection pool, the `AppDataConnection` lifetime, or the `Program.cs` startup path. If any of these is not implemented, the test MUST fail/block as missing product implementation — it must not pass by replacing the module with a test stub.