Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,107 @@
# Contract: fdr_client_protocol
**Component**: shared_fdr_client (cross-cutting concern owned by E-CC-FDR-CLIENT / AZ-247)
**Producer task**: AZ-273 — `_docs/02_tasks/todo/AZ-273_fdr_client_ringbuf.md`
**Consumer tasks**: every onboard component that emits FDR records (C1-C13), the C13 writer thread (AZ-248 / E-C13), the overrun-policy task (AZ-XX / E-CC-FDR-CLIENT #3), `FakeFdrSink` (AZ-XX / E-CC-FDR-CLIENT #4), and the composition root (`runtime_root.py`)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen public surface for the producer-side FDR queue. Every onboard producer holds exactly one `FdrClient(producer_id)`, calls `enqueue(record)`, and never blocks. The C13 writer thread is the sole consumer via `pop_one` / `drain`. The `on_overrun` hook is the documented extension point through which the overrun-policy PBI (next task in this epic) implements drop-oldest + `kind="overrun"` emission — without this hook, overrun behaviour would be hard-coded into the queue and AC-NEW-3 ("no silent drops") would be unobservable from outside.
## Shape
### Function and method APIs
```python
from typing import Callable

from .fdr_record_schema import FdrRecord  # owned by AZ-272


class EnqueueResult:
    OK = "ok"
    OVERRUN = "overrun"


class FdrSpscViolationError(RuntimeError):
    """Raised when the SPSC contract is violated (concurrent dequeue, multi-producer enqueue)."""


class FdrClient:
    def __init__(self, producer_id: str, capacity: int) -> None: ...

    @property
    def producer_id(self) -> str: ...

    @property
    def on_overrun(self) -> Callable[[FdrRecord], None] | None: ...

    @on_overrun.setter
    def on_overrun(self, hook: Callable[[FdrRecord], None] | None) -> None: ...

    # Producer-side (single-threaded per FdrClient; lock-free; never blocks).
    def enqueue(self, record: FdrRecord) -> EnqueueResult: ...

    # Consumer-side (C13 writer; single-threaded per FdrClient; SPSC contract).
    def pop_one(self) -> FdrRecord | None: ...
    def drain(self, max_records: int) -> list[FdrRecord]: ...

    # Test-only.
    def flush(self) -> None: ...


# Module-level factory; preferred entrypoint for production code.
def make_fdr_client(producer_id: str, config: Config) -> FdrClient: ...
```
| Symbol | Required | Description | Constraints |
|--------|----------|-------------|-------------|
| `FdrClient(producer_id, capacity)` | yes | Construct a per-producer client; `capacity` MUST be `>= 16` and a power of two (ring-buffer-friendly) | `producer_id` non-empty; raises `ValueError` otherwise |
| `enqueue(record)` | yes | Non-blocking single-producer enqueue | Returns `OK` on success or `OVERRUN` when buffer is full; never raises into the caller; allocation-free on steady state |
| `on_overrun` (property) | yes | Hook invoked exactly once per overrun event with the would-be-enqueued record | Set by the overrun-policy PBI; default is `None` (silently dropping records is NOT acceptable in production; AC-NEW-3 requires the hook to be wired in `compose_root`) |
| `pop_one()` | yes | Single-consumer dequeue; returns the next record or `None` if empty | SPSC: only ONE thread may call `pop_one` / `drain` |
| `drain(max_records)` | yes | Pop up to `max_records` records in a single call | Same SPSC constraint as `pop_one` |
| `flush()` | yes | Test-only: blocks the calling thread until the buffer is empty | Production code MUST NOT call this on the hot path |
| `make_fdr_client(producer_id, config)` | yes | Factory; reads capacity from `config.fdr_client.<producer_id>.capacity` with documented default; caches one instance per `producer_id` | Two calls with the same `producer_id` return the same instance |
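
The constructor and factory constraints in the table above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StubFdrClient`, `make_stub_client`, `validate_capacity`, and `DEFAULT_CAPACITY` are hypothetical names, the default value is invented, and a plain nested `dict` stands in for the project's `Config` type, whose real shape is not defined in this contract.

```python
from __future__ import annotations

DEFAULT_CAPACITY = 1024  # hypothetical; the contract only says "documented default"


def validate_capacity(capacity: int) -> int:
    """Accept only capacities that are >= 16 and a power of two."""
    if capacity < 16 or capacity & (capacity - 1) != 0:
        raise ValueError(f"capacity must be a power of two >= 16, got {capacity}")
    return capacity


class StubFdrClient:
    """Stand-in for FdrClient that carries only the constructor checks."""

    def __init__(self, producer_id: str, capacity: int) -> None:
        if not producer_id:
            raise ValueError("producer_id must be non-empty")
        self.producer_id = producer_id
        self.capacity = validate_capacity(capacity)


_cache: dict[str, StubFdrClient] = {}


def make_stub_client(producer_id: str, config: dict) -> StubFdrClient:
    """Return the cached client for producer_id, creating it on first use."""
    client = _cache.get(producer_id)
    if client is None:
        # Assumed lookup path: config["fdr_client"][producer_id]["capacity"].
        per_producer = config.get("fdr_client", {}).get(producer_id, {})
        capacity = per_producer.get("capacity", DEFAULT_CAPACITY)
        client = _cache[producer_id] = StubFdrClient(producer_id, capacity)
    return client
```

The bit trick `capacity & (capacity - 1) == 0` holds exactly for powers of two, which is what lets a ring buffer replace a modulo with a mask.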
## Invariants
- **Lock-free**: `enqueue` and `pop_one` MUST NOT acquire a lock that any other thread can hold. They MAY use atomic primitives (CAS, single-word reads/writes, memory barriers) — these are not "locks" in the queue's sense.
- **Non-blocking enqueue**: `enqueue` completes in O(1) time and never transitions the calling thread to BLOCKED state. When the buffer is full, it returns `OVERRUN` synchronously and invokes `on_overrun(record)` exactly once if the hook is set.
- **Allocation-free steady state**: `enqueue` for an in-buffer record (slot is free) MUST NOT allocate any heap object. The contract test verifies this with a `tracemalloc` snapshot diff (0 new objects).
- **SPSC**: each `FdrClient` instance has at most ONE producer thread (calls `enqueue`) and at most ONE consumer thread (calls `pop_one` / `drain`). Multi-producer or multi-consumer use is undefined behaviour. The instance includes an opt-in guard that raises `FdrSpscViolationError` on concurrent entry — used by the contract test.
- **One client per producer_id**: `make_fdr_client(producer_id, config)` returns the same cached instance on repeat calls. Tests use `_reset_for_tests()` (private, documented in Non-Goals) to clear the cache.
- **Producer_id preserved on every record**: `enqueue` does NOT mutate `record.producer_id`; the caller is responsible for setting it. The contract test verifies that `enqueue(FdrRecord(..., producer_id="c1_vio"))` lands on the consumer side with `producer_id == "c1_vio"`.
- **Cold-start budget**: constructing all FdrClient instances during `compose_root` is a one-time cost; the contract requires per-instance construction p99 ≤ 1 ms on Tier-2 (so ≤ 13 producers × 1 ms = 13 ms within the 1 s `compose_root` budget from the composition_root_protocol contract).
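
To make the non-blocking and exactly-once invariants concrete, here is a minimal single-producer/single-consumer ring sketch. It is not the AZ-273 implementation: `SketchRing` is a hypothetical name, records are untyped, string results stand in for `EnqueueResult`, and the sketch leans on CPython's GIL instead of explicit atomics. It also makes no allocation-free claim, since CPython's growing integer counters allocate.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class SketchRing:
    """Fixed-capacity SPSC ring: one thread calls enqueue, one calls pop_one."""

    def __init__(self, capacity: int) -> None:
        self._mask = capacity - 1  # capacity is assumed to be a power of two
        self._slots: list[Any] = [None] * capacity
        self._head = 0  # advanced only by the consumer
        self._tail = 0  # advanced only by the producer
        self.on_overrun: Optional[Callable[[Any], None]] = None

    def enqueue(self, record: Any) -> str:
        if self._tail - self._head > self._mask:  # all slots occupied
            if self.on_overrun is not None:
                self.on_overrun(record)  # invoked once for this overrun event
            return "overrun"
        self._slots[self._tail & self._mask] = record
        self._tail += 1  # publish only after the slot has been written
        return "ok"

    def pop_one(self) -> Any:
        if self._head == self._tail:  # empty
            return None
        index = self._head & self._mask
        record = self._slots[index]
        self._slots[index] = None  # drop the reference so the slot can be reused
        self._head += 1
        return record
```

Because each index has exactly one writer, neither side ever waits on the other: a full buffer is reported to the producer immediately rather than by blocking.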
## Non-Goals
- This contract does NOT define the drop-oldest behaviour or what `on_overrun` does — that is the next PBI (AZ-XX) in this epic. The contract only defines the hook signature and the "exactly-once" invariant.
- This contract does NOT define the C13 writer thread, segment files, segment rotation, or 64 GB cap — owned by E-C13 (AZ-248).
- This contract does NOT define the `FdrRecord` schema or its serialisation — owned by AZ-272.
- This contract does NOT define `FakeFdrSink` — owned by the fourth PBI in this epic. `FakeFdrSink` SHOULD conform to `FdrClient`'s public surface so it is a drop-in replacement for component tests.
- `_reset_for_tests()` is intentionally private and test-only. Production code calling it is a contract violation.
## Versioning Rules
- **Breaking changes** (renaming a public method, changing return types, removing a method, weakening an invariant) → new major version + a deprecation pass through every consumer.
- **Non-breaking additions** (adding a new method, adding an optional kwarg with a default, strengthening an invariant) → minor version bump.
- **Patch changes** (doc clarification, performance budget tightening within tested limits) → patch bump.
- The contract test (`tests/contract/fdr_client_protocol.py`) MUST be updated alongside any version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-enqueue-pop-roundtrip | One `enqueue(record)` followed by one `pop_one()` | record returned; second `pop_one()` returns None | basic happy path |
| nonblocking-stalled-consumer | Consumer never calls `pop_one`; producer calls `enqueue` 1025 times into 1024-cap client | every call returns within 50 µs; #1025 returns `OVERRUN` | covers AC-1 |
| allocation-free-steady-state | Warmup, then `tracemalloc` snapshot diff across one `enqueue` | 0 new heap objects | covers AC-2 |
| capacity-from-config | `make_fdr_client("c1_vio", config_with_capacity_4096)` | `client._capacity() == 4096` | covers AC-3 |
| spsc-guard-rejects-multi-consumer | Two threads call `pop_one()` concurrently with guard enabled | `FdrSpscViolationError` raised | covers AC-4 |
| on-overrun-fires-once | Recording closure on `on_overrun`; force one overrun | closure called exactly once with the offending record | covers AC-5 |
| flush-drains | N records buffered, draining consumer thread, call `flush()` | returns only after buffer empty | covers AC-6 |
| empty-producer-id-rejected | `FdrClient(producer_id="")` | `ValueError` mentioning `producer_id` | covers AC-7 |
| invariant-cached-instance | Two `make_fdr_client("c1_vio", config)` calls | same instance | NFR-reliability |
| spsc-guard-rejects-multi-producer | Two threads call `enqueue` concurrently on same client with guard enabled | `FdrSpscViolationError` raised | strengthens AC-4 |
| no-mutation-of-producer-id | `enqueue(FdrRecord(producer_id="c1_vio"))` then `pop_one()` | popped record has `producer_id == "c1_vio"` | invariant test |
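
The `allocation-free-steady-state` case relies on a `tracemalloc` snapshot diff. A rough sketch of such a measurement helper follows; `count_new_blocks` is a hypothetical name, and the warmup count and the choice to filter out `tracemalloc`'s own bookkeeping are assumptions, not something this contract prescribes.

```python
import tracemalloc
from typing import Callable


def count_new_blocks(action: Callable[[], None], warmup: int = 100) -> int:
    """Run `action` once under tracemalloc and count blocks that are newly live."""
    for _ in range(warmup):  # let caches and lazily created objects settle
        action()
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        action()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Ignore allocations made by tracemalloc itself while taking snapshots.
    own = [tracemalloc.Filter(False, tracemalloc.__file__)]
    diffs = after.filter_traces(own).compare_to(before.filter_traces(own), "lineno")
    return sum(d.count_diff for d in diffs if d.count_diff > 0)
```

A contract test would presumably call this with a closure around `client.enqueue(record)` and compare the result against the zero-new-objects expectation in the table.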
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-FDR-CLIENT epic (AZ-247) | autodev decompose Step 2 |
@@ -0,0 +1,107 @@
# Contract: fdr_record_schema
**Component**: shared_fdr_client (cross-cutting concern owned by E-CC-FDR-CLIENT / AZ-247)
**Producer task**: AZ-272 — `_docs/02_tasks/todo/AZ-272_fdr_record_schema.md`
**Consumer tasks**: every onboard component that emits FDR records (C1-C13), the C13 writer (AZ-248 / E-C13), post-flight tooling (E-C12 operator side), the FdrClient ring buffer (AZ-XX / E-CC-FDR-CLIENT next task), and `FakeFdrSink` (AZ-XX / E-CC-FDR-CLIENT fourth task)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen, versioned wire format for every record written to the Flight Data Recorder. Every onboard producer (logs, VIO ticks, state ticks, tile matches, overruns, rollovers, failed-tile thumbnails, mid-flight tile snapshots, flight headers/footers) MUST round-trip through this schema, and the C13 writer + post-flight tooling MUST be the only readers. The schema enforces forward-compatibility so post-flight tooling pinned at version N keeps working when producers move to N+1.
## Shape
### Outer envelope (one of these per record on the wire)
```python
# Conceptual dataclass; the actual implementation may emit via an orjson- or msgpack-backed serialiser pinned at E-BOOT.
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FdrRecord:
    schema_version: int      # MUST be >= 1; reader uses this to pick the right parser branch
    ts: str                  # ISO 8601 UTC, microsecond precision, e.g. "2026-05-10T03:14:15.123456Z"
    producer_id: str         # non-empty; component slug from module-layout.md (e.g. "c2_vpr") or "shared.<name>" for cross-cutting producers
    kind: str                # one of the v1.0.0 kinds (closed enum below) OR an unknown future tag (preserved opaquely)
    payload: dict[str, Any]  # kind-specific shape; well-known shapes documented per kind below
    # Forward-compat bucket: populated by parser when the wire bytes carry fields the local schema does not know.
    # NEVER set by producers; producers leave it empty.
    extra: dict[str, Any] = field(default_factory=dict)
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `schema_version` | integer | yes | Schema major version as an integer (1 for 1.x, 2 for 2.x) | `>= 1` |
| `ts` | string (ISO 8601 UTC, µs) | yes | Emit timestamp | RFC 3339 with `Z` suffix |
| `producer_id` | string | yes | Origin producer slug | non-empty; matches a module-layout component slug or `shared.<name>` |
| `kind` | string | yes | Record category | dotted snake_case, max 64 chars; v1.0.0 closed enum below |
| `payload` | object | yes (may be `{}` for kinds whose payload is empty) | Kind-specific data | JSON-safe / msgpack-safe scalars, nested dicts/arrays, no binary blobs >4 KiB |
| `extra` | object | parser-only | Forward-compat bucket for unknown future fields | populated by parser; producers MUST leave empty |
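
As an illustration of the `ts` constraint (UTC, microsecond precision, `Z` suffix), a producer-side helper might look like the sketch below. `fdr_timestamp` is a hypothetical name, and rejecting naive datetimes is an assumption made here for safety rather than a rule stated by the contract.

```python
from __future__ import annotations

from datetime import datetime, timezone


def fdr_timestamp(moment: datetime | None = None) -> str:
    """Format an aware datetime as ISO 8601 UTC with microseconds and a Z suffix."""
    moment = moment if moment is not None else datetime.now(timezone.utc)
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; pass a timezone-aware value")
    # %f always emits six digits, so the output has a fixed width.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```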
### v1.0.0 closed enum of `kind` values
| `kind` | Producer | Payload shape (required keys) | Notes |
|--------|----------|-------------------------------|-------|
| `log` | every component (via E-CC-LOG bridge) | `{level, component, frame_id?, kind, msg, kv, exc?}` (matches `log_record_schema` v1.0.0) | Forwarded WARN/ERROR records (per AZ-267 fdr_log_bridge) |
| `vio.tick` | C1 | `{frame_id, R, t, P, last_anchor_age_ms, mre_px?, imu_bias_norm?}` | Per-frame VIO output |
| `state.tick` | C5 | `{frame_id, fused_pose, covariance_2x2, estimator_label}` | Smoothed fused-pose tick |
| `tile_match` | C2.5 / C3 | `{frame_id, tile_id, score, match_count, ransac_inliers}` | Tile-matching diagnostics |
| `overrun` | E-CC-FDR-CLIENT itself | `{producer_id, dropped_count}` (`dropped_count > 0`) | AC-NEW-3: never silent. Emitted by drop-oldest hook |
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at}` | AC-8.4 mid-flight snapshot pointer |
| `flight_header` | C13 (writer) | `{flight_id, started_at, schema_version, build_info}` | Single record at flight open |
| `flight_footer` | C13 (writer) | `{flight_id, ended_at, records_written, records_dropped}` | Single record at flight close |
### Wire bytes
- `serialise(record: FdrRecord) -> bytes` returns a single self-delimited byte string (length-prefixed if msgpack, single-line UTF-8 if orjson — pinned at E-BOOT in `pyproject.toml`).
- `parse(buf: bytes) -> FdrRecord` is the inverse for a single record. Streaming parser (multi-record) is not part of this contract — C13 writer/reader own that.
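
A minimal round-trip sketch of the single-line UTF-8 variant follows. It uses the standard-library `json` module purely as a stand-in for the orjson or msgpack backend that the contract says is pinned at E-BOOT; `SketchRecord` is a hypothetical mirror of `FdrRecord`, and omitting `extra` from the wire bytes is an assumption based on the rule that producers leave it empty.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class SketchRecord:
    schema_version: int
    ts: str
    producer_id: str
    kind: str
    payload: dict[str, Any]
    extra: dict[str, Any] = field(default_factory=dict)


def serialise(record: SketchRecord) -> bytes:
    """Encode one record as a single UTF-8 JSON line with a stable key order."""
    body = asdict(record)
    body.pop("extra")  # assumption: the parser-only bucket is never written
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") + b"\n"


def parse(buf: bytes) -> SketchRecord:
    """Inverse of serialise for exactly one record (no validation in this sketch)."""
    return SketchRecord(**json.loads(buf.decode("utf-8")))
```

Sorting keys and fixing the separators is one simple way to get byte-identical output for equal records, which the purity invariant asks for.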
## Invariants
- `schema_version >= 1` on every record; missing or non-integer values are rejected by `parse` with `FdrSchemaError`.
- `producer_id` is non-empty on every record. Anonymous records on the wire are a contract violation — `serialise` rejects them with `FdrSchemaError`.
- For `kind="overrun"`: `payload.producer_id` MUST equal the originating producer's slug, and `payload.dropped_count` MUST be `> 0`. (The OUTER envelope's `producer_id` is `"shared.fdr_client"` because the overrun record is emitted by the FdrClient itself, not by the producer whose enqueue overran.)
- Forward-compatible parser: a record at minor version N+1 carrying fields unknown at version N parses without exception; unknown payload fields land in `payload.extra`; unknown top-level fields land in record-level `extra`. Tooling MAY then choose to skip the record.
- Unknown future `kind` values do NOT raise — `parse` returns an `FdrRecord` with `kind` set to the raw string and `payload` set to whatever decoded; tooling MAY skip.
- Renaming a field, changing a field type, or removing a required field requires a major version bump (schema_version 2.x).
- Embedded binary blobs ≤ 4 KiB only. Bigger payloads (e.g. mid-flight tile JPEGs, ML inference inputs) MUST be referenced by sidecar path on disk; the contract test rejects oversized inline blobs.
- `serialise` and `parse` are pure: same input → byte-identical output (or deep-equal record).
- `FdrSchemaError` is the ONLY exception type either function raises on schema violation; library-specific exceptions (`orjson.JSONDecodeError`, `msgpack.UnpackException`, etc.) MUST be wrapped before crossing the public API.
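
Several of the invariants above (version check, non-empty `producer_id`, the overrun rule, the record-level `extra` bucket, and exception wrapping) can be sketched together. This is a hedged illustration only: it decodes JSON with the standard library, returns a plain `dict` instead of an `FdrRecord`, `parse_tolerant` and `KNOWN_TOP_LEVEL` are hypothetical names, the `ValueError` base class for `FdrSchemaError` is an assumption, and the per-kind `payload.extra` handling is left out.

```python
from __future__ import annotations

import json
from typing import Any

KNOWN_TOP_LEVEL = {"schema_version", "ts", "producer_id", "kind", "payload"}


class FdrSchemaError(ValueError):
    """Single exception type surfaced for every schema violation."""


def parse_tolerant(buf: bytes) -> dict[str, Any]:
    """Decode one record, enforce the invariants, and bucket unknown top-level fields."""
    try:
        raw = json.loads(buf)
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        raise FdrSchemaError(f"undecodable record: {exc}") from exc
    if not isinstance(raw, dict):
        raise FdrSchemaError("record must be an object")
    version = raw.get("schema_version")
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        raise FdrSchemaError("schema_version must be an integer >= 1")
    if not raw.get("producer_id"):
        raise FdrSchemaError("producer_id must be non-empty")
    payload = raw.get("payload", {})
    if not isinstance(payload, dict):
        raise FdrSchemaError("payload must be an object")
    if raw.get("kind") == "overrun":
        dropped = payload.get("dropped_count")
        if isinstance(dropped, bool) or not isinstance(dropped, int) or dropped <= 0:
            raise FdrSchemaError("overrun payload needs dropped_count > 0")
    # Unknown kinds pass through untouched; unknown top-level fields go to `extra`.
    record = {key: raw[key] for key in KNOWN_TOP_LEVEL if key in raw}
    record["extra"] = {k: v for k, v in raw.items() if k not in KNOWN_TOP_LEVEL}
    return record
```

Catching the decoder's exception at the boundary and re-raising as `FdrSchemaError` keeps library-specific error types from leaking to callers, whichever backend ends up pinned.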
## Non-Goals
- This contract does NOT define the lock-free SPSC ring buffer (`FdrClient`) — owned by the next task in E-CC-FDR-CLIENT.
- This contract does NOT define the writer thread, segment files, or 64 GB cap — owned by E-C13 (AZ-248).
- This contract does NOT define what triggers a record (per-component § 9 logging policies, VIO tick rate, etc. are owned by component epics).
- This contract does NOT define multi-record framing on disk — that is C13's segment file format, owned separately.
## Versioning Rules
- **Breaking changes** (field renamed/removed, type changed, ordering changed for length-prefixed wire format, library choice changed) → new major version (e.g. 2.0.0) + a deprecation pass through every consumer + a paired major bump on this contract.
- **Non-breaking additions** (new optional payload field appended, new `kind` value, new top-level optional field) → minor version bump. Forward-compat parser tolerates these by design.
- **Patch changes** (clarification, doc-only, no wire change) → patch bump.
- The contract test (`tests/contract/fdr_record_schema.py`) MUST be updated alongside any version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-roundtrip-log | `kind="log", payload={"level":"INFO","component":"c2_vpr","kind":"vpr.warmup","msg":"loaded","kv":{"model":"salad"}}` | `parse(serialise(r)) == r` | covers AC-1 |
| valid-roundtrip-overrun | `kind="overrun", producer_id="shared.fdr_client", payload={"producer_id":"c1_vio","dropped_count":42}` | round-trips; both producer_ids preserved | covers AC-1 + AC-5 |
| forward-compat-future-field | wire bytes carry `payload.new_field="x"` (hypothetical v1.1) parsed at v1.0 | record parses; `payload.extra["new_field"] == "x"` | covers AC-2 |
| forward-compat-unknown-kind | `kind="future.kind", payload={"foo":1}` | record parses opaquely; no exception | covers AC-3 |
| invalid-missing-version | bytes missing `schema_version` field | `FdrSchemaError`; message names `schema_version` | covers AC-4 |
| invalid-overrun-missing-dropped-count | `kind="overrun", payload={"producer_id":"c1_vio"}` | `FdrSchemaError`; message names `dropped_count` | covers AC-5 |
| invalid-overrun-zero-dropped-count | `kind="overrun", payload={"producer_id":"c1_vio","dropped_count":0}` | `FdrSchemaError`; message names `dropped_count` | covers AC-5 (must be `> 0`) |
| invalid-empty-producer-id | `producer_id=""` on serialise | `FdrSchemaError`; message names `producer_id` | covers AC-6 |
| invalid-oversized-blob | `payload={"jpeg":<8 KiB bytes>}` | `FdrSchemaError`; message says "use sidecar path" | invariant: ≤ 4 KiB inline |
| pure-determinism | call `serialise(r)` twice | byte-identical outputs | NFR-reliability |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-FDR-CLIENT epic (AZ-247) | autodev decompose Step 2 |