mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 07:01:13 +00:00
[AZ-326] [AZ-327] C12 operator-tool CLI + companion SSH bringup
AZ-326 (3pt): operator-tool Click CLI shell at src/gps_denied_onboard/components/c12_operator_tooling/cli.py with six subcommands (download, build-cache, upload-pending, reloc-confirm, verify-ready, set-sector); SectorClassificationStore (atomic-write JSON under ~/.azaion/onboard/sector-classifications.json); freshness-table lookup driving AC-NEW-6; EXIT_* constants; AZ-266 structured-JSON log wiring to a rotating ~/.azaion/onboard/c12-tooling.log handler; operator-tool console-script entry in pyproject.toml.

AZ-327 (3pt): CompanionBringup orchestrator at src/gps_denied_onboard/components/c12_operator_tooling/companion_bringup.py that opens an SSH session against the companion (paramiko per project pin), checks the four pre-flight artifacts (Manifest, expected engines, sha256 sidecars, calibration), and returns a ReadinessReport per description.md S2; CompanionUnreachableError + ContentHashMismatchError with operator-friendly remediation hints; ParamikoSshSessionFactory + RemoteSidecarVerifier (sha256sum + cat over SSH, no bytes pulled to the workstation); paramiko>=3.4,<4.0 dep added.

NFR-perf-cold-start fix: PEP 562 lazy __getattr__ in c12_operator_tooling/__init__.py and flights_api/__init__.py defers HttpxFlightsApiClient (httpx), ParamikoSshSession[Factory] (paramiko + cryptography), bbox_from_waypoints / takeoff_origin_from_flight (numpy + pyproj). cli.py imports from leaf flights_api modules. operator-tool --help cold start: ~870ms -> <200ms typical, <500ms p99.

Includes 73 unit tests (incl. paramiko-version-drift smoke per AZ-327 Risk 1) + console-script integration test. All 1494 repo-wide unit tests pass; 80 skips are pre-existing environment gates.

Batch report: _docs/03_implementation/batch_42_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,228 @@
# C12 CLI App — Typer Entry Point + Subcommand Routing + Operator Helpers
**Task**: AZ-326_c12_cli_app
**Name**: C12 CLI App
**Description**: Implement the operator-tooling CLI shell that operators run on the workstation. Wires Typer (per the Click/Typer project pin) into `operator_tool/__main__.py`, registers six subcommands (`download`, `build-cache`, `upload-pending`, `reloc-confirm`, `verify-ready`, `set-sector`), wires the E-CC-LOG (AZ-266) logger to a workstation-side structured-JSON log file (`~/.azaion/onboard/c12-tooling.log`), and ships the two trivial operator-side helpers from description.md § 2 — `set_sector_classification(area, sector_class)` (persists per-area classification to a local JSON file under the operator workstation's home directory) and `apply_freshness_threshold(sector_class) -> int (months)` (a pure-data lookup that maps the sector classification enum to the AC-NEW-6 months freshness budget). Each subcommand is a thin shell that resolves its service collaborator (`flights_api_client`, `build_cache`, `companion_bringup`, `post_landing_upload`, `operator_reloc_service` — all owned by sibling tasks AZ-489 / AZ-NNN T2..T5) from the composition root and delegates to it; on success returns 0; on a known error type maps to a documented non-zero exit code with a one-line operator-friendly message + remediation hint pulled from the underlying error's `remediation` attribute. The CLI app does NOT own any workflow logic itself — only command registration, argument parsing, logger wiring, exit-code mapping, and the two simple operator helpers. **ADR-010 amendment**: the `build-cache` subcommand accepts a mutually-exclusive pair `--flight-id <Guid> | --flight-file <Path>` and forwards the resolved `FlightDto` (via AZ-489 `FlightsApiClient`) to the orchestrator (AZ-328), which derives the bbox + takeoff origin from it. The legacy `--bbox` flag is dropped because the bbox is now derived; passing it is an error.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-489_c12_flights_api_client (for the `FlightsApiClient` service collaborator + DTO definitions surfaced via `--flight-id` / `--flight-file`)
**Component**: c12_operator_tooling (epic AZ-253 / E-C12)
**Tracker**: AZ-326
**Epic**: AZ-253 (E-C12)
### Document Dependencies
- `_docs/02_document/components/13_c12_operator_tooling/description.md` — § 2 (`set_sector_classification`, `apply_freshness_threshold` from `CacheBuildWorkflow`), § 5 (logging strategy table), § 7 (CLI-only this cycle, GUI deferred).
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — INFO/WARN/ERROR log shapes for operator events.
## Problem
Without a real CLI shell:
- F1 (pre-flight cache build) and F10 (post-landing upload) have no operator entry point — every workflow function in this epic is unreachable from the workstation.
- AC-NEW-6 (freshness pipeline) collapses partially — sibling tasks have no canonical place to call `apply_freshness_threshold(sector_class)` so each invents its own table or hard-codes months.
- Sector classification (active-conflict vs stable-rear) per description.md § 1 has no persistent surface; operator restarts lose all classifications.
- Logging from C12 is silent — without the wiring of E-CC-LOG to the workstation-side log file, every operator action is invisible during incident review.
- Sibling tasks T2..T5 have no consumer; their service classes ship but no end-to-end CLI flow exercises them.
- Exit codes are inconsistent across subcommands — operators script `operator-tool` runs and need `$?` to mean something specific per failure category.
This task delivers the CLI shell + the two trivial operator helpers. It does NOT own `build_cache`, `verify_companion_ready`, `trigger_post_landing_upload`, or `OperatorReLocService` — those are sibling tasks invoked through the CLI.
## Outcome
- A Typer-based CLI app at `src/operator_tool/`:
- `src/operator_tool/__main__.py` — module entry point: `from operator_tool.cli import app; app()`.
- `src/operator_tool/cli.py` — Typer `app = typer.Typer(name="operator-tool", help="GPS-denied onboard pre-flight tooling (operator workstation)")`. Registers six subcommands via `@app.command(...)`. Each subcommand opens a logging context, calls into its service collaborator, catches the documented exception family for that command, maps to the documented exit code, and `raise typer.Exit(code=N)`.
- `src/operator_tool/sector_classification_store.py` — `SectorClassificationStore` class:
- Constructor: `__init__(self, *, store_path: Path, logger: Logger)`.
- `set_classification(area: AreaIdentifier, sector_class: SectorClassification) -> None` — persists `{area_id: sector_class}` mapping to `store_path` (default: `~/.azaion/onboard/sector-classifications.json`) using atomic write (`tempfile + os.replace`).
- `get_classification(area: AreaIdentifier) -> SectorClassification | None` — reads the JSON file; returns the classification for the given area or `None` if not set.
- `list_classifications() -> dict[AreaIdentifier, SectorClassification]` — returns all current classifications.
- File format: `{"area_id": "active_conflict" | "stable_rear", ...}`.
- INFO log on every `set_classification` call (`kind="c12.sector.classification.set"`).
- `src/operator_tool/freshness_table.py` — `freshness_threshold_months(sector_class: SectorClassification) -> int`:
- Pure data: `active_conflict → 1 month`; `stable_rear → 12 months`. Documented inline as the AC-NEW-6 freshness budget per description.md § 1 + Plan-phase intent.
- Module-level constant: `FRESHNESS_TABLE: dict[SectorClassification, int]`.
- `src/operator_tool/exit_codes.py` — module-level constants: `EXIT_OK = 0`, `EXIT_GENERIC_ERROR = 1`, `EXIT_USAGE = 2`, `EXIT_COMPANION_UNREACHABLE = 10`, `EXIT_CONTENT_HASH_MISMATCH = 11`, `EXIT_DOWNLOAD_FAILURE = 20`, `EXIT_BUILD_FAILURE = 21`, `EXIT_FLIGHT_STATE_NOT_CONFIRMED = 30`, `EXIT_UPLOAD_FAILURE = 31`, `EXIT_GCS_LINK_ERROR = 40`, `EXIT_LOCK_HELD = 50`, `EXIT_FLIGHTS_API_UNREACHABLE = 60`, `EXIT_FLIGHTS_API_AUTH = 61`, `EXIT_FLIGHT_NOT_FOUND = 62`, `EXIT_FLIGHT_SCHEMA = 63`, `EXIT_EMPTY_WAYPOINTS = 64`. Sibling tasks may extend with documented additions.
- A composition root entry at `src/gps_denied_onboard/runtime_root/c12_factory.py`:
- `build_operator_tool(config: Config) -> OperatorToolServices` — pure factory that constructs the `SectorClassificationStore` + a logger configured to write to `~/.azaion/onboard/c12-tooling.log`. Returns a frozen dataclass aggregating the operator-tool service handles. Sibling tasks T2..T5 each add their service to this dataclass without renaming or moving it.
- Subcommand surface (each subcommand body lives in `cli.py`; service implementations live in sibling task files):
- `download` — delegates to `tile_downloader.fetch(...)` (AZ-316). Maps `SatelliteProviderError → EXIT_DOWNLOAD_FAILURE`.
- `build-cache` — accepts a mutually-exclusive pair `--flight-id <Guid> | --flight-file <Path>` (Typer-enforced via a callback that rejects both-set / neither-set with `EXIT_USAGE`), plus `--sector-class`, `--calibration-path`. Delegates to `build_cache_orchestrator.build_cache(...)` (sibling AZ-328) passing the resolved `FlightDto` (the orchestrator computes bbox + takeoff origin from it via AZ-489 helpers). Maps `CacheBuildError → EXIT_DOWNLOAD_FAILURE | EXIT_BUILD_FAILURE` (per `failure_phase`); `BuildLockHeldError → EXIT_LOCK_HELD`; `FlightsApiUnreachableError → EXIT_FLIGHTS_API_UNREACHABLE`; `FlightsApiAuthError → EXIT_FLIGHTS_API_AUTH`; `FlightNotFoundError → EXIT_FLIGHT_NOT_FOUND`; `FlightsApiSchemaError | FlightFileNotFoundError | WaypointSchemaError → EXIT_FLIGHT_SCHEMA`; `EmptyWaypointsError → EXIT_EMPTY_WAYPOINTS`.
- `upload-pending` — delegates to `post_landing_upload.trigger_post_landing_upload(...)` (sibling T4). Maps `FlightStateNotConfirmedError → EXIT_FLIGHT_STATE_NOT_CONFIRMED`; `UploadGateBlockedError → EXIT_UPLOAD_FAILURE`.
- `reloc-confirm` — delegates to `operator_reloc_service.request_reloc(...)` (sibling T5). Maps `GcsLinkError → EXIT_GCS_LINK_ERROR`.
- `verify-ready` — delegates to `companion_bringup.verify_companion_ready(...)` (sibling T2). Maps `CompanionUnreachableError → EXIT_COMPANION_UNREACHABLE`; `ContentHashMismatchError → EXIT_CONTENT_HASH_MISMATCH`.
- `set-sector` — delegates to `SectorClassificationStore.set_classification(...)`.
- Each subcommand's `--help` includes a one-line summary + the AC IDs it supports (e.g. `build-cache: orchestrate F1 (AC-8.3, AC-NEW-1)`).
- Logging is wired at app startup: a single rotating file handler at `~/.azaion/onboard/c12-tooling.log`, structured JSON formatter from E-CC-LOG (AZ-266). Console (stderr) handler at WARN level for operator visibility.
- `pyproject.toml` registers `operator-tool` as a console script entry point pointing at `operator_tool.__main__:main`. The `main` function in `__main__.py` calls `app()`.
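
The exception-to-exit-code mapping described in the subcommand surface can be held as data instead of as a separate `try/except` ladder in each subcommand. The following is a minimal sketch under stated assumptions, not the project's implementation: the exception classes are local stand-ins for the sibling tasks' real types, `OperatorToolError`, `exit_code_for`, and `operator_message` are hypothetical names, most remediation strings are illustrative, and only a few of the documented exit codes are shown.

```python
EXIT_OK = 0
EXIT_GENERIC_ERROR = 1
EXIT_COMPANION_UNREACHABLE = 10
EXIT_CONTENT_HASH_MISMATCH = 11
EXIT_LOCK_HELD = 50


class OperatorToolError(Exception):
    """Hypothetical base class; the real error types are owned by sibling tasks."""

    remediation = "See c12-tooling.log for details."  # illustrative default


class CompanionUnreachableError(OperatorToolError):
    remediation = "Check that the companion is powered and reachable over SSH."  # illustrative


class ContentHashMismatchError(OperatorToolError):
    remediation = "Re-run the cache build to repopulate the affected engine."


class BuildLockHeldError(OperatorToolError):
    remediation = "Wait for the running build to finish, then retry."  # illustrative


# Ordered pairs: the first matching exception type wins.
EXIT_CODE_BY_ERROR = (
    (CompanionUnreachableError, EXIT_COMPANION_UNREACHABLE),
    (ContentHashMismatchError, EXIT_CONTENT_HASH_MISMATCH),
    (BuildLockHeldError, EXIT_LOCK_HELD),
)


def exit_code_for(exc: BaseException) -> int:
    """Map a raised exception to its exit code; unknown errors are generic."""
    for error_type, code in EXIT_CODE_BY_ERROR:
        if isinstance(exc, error_type):
            return code
    return EXIT_GENERIC_ERROR


def operator_message(exc: BaseException) -> str:
    """Build the one-line stderr message: error type, text, remediation hint."""
    hint = getattr(exc, "remediation", "")
    return f"{type(exc).__name__}: {exc}. {hint}".strip()
```

A subcommand body would then catch the exception, print `operator_message(exc)` to stderr, log it at ERROR level, and finish with `raise typer.Exit(code=exit_code_for(exc))`.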
## Scope
### Included
- `operator_tool` package layout (`__init__.py`, `__main__.py`, `cli.py`, `sector_classification_store.py`, `freshness_table.py`, `exit_codes.py`).
- The composition-root factory `build_operator_tool`.
- Six subcommand registrations + per-subcommand `--help` text + per-subcommand exception → exit-code mapping.
- `SectorClassificationStore` with atomic-write JSON persistence.
- `freshness_threshold_months` pure-data lookup.
- The exit-code constants module.
- Logger wiring for the workstation-side log file (rotating file handler + structured JSON via AZ-266).
- Console-script entry-point declaration in `pyproject.toml`.
- Unit tests covering: subcommand registration, exception → exit-code mapping (using fakes for service collaborators), `SectorClassificationStore` round-trip (set, get, atomic write resilience), `freshness_threshold_months` for both enum values, console script invocability via `subprocess.run`.
### Excluded
- The actual workflows for `build-cache`, `upload-pending`, `reloc-confirm`, `verify-ready` — owned by sibling tasks T2..T5.
- The download workflow body — owned by AZ-316.
- The MAVLink encoding for `reloc-confirm` — owned by sibling T5.
- A GUI surface — Plan-phase carryforward, deferred per description.md § 7.
- Anything that runs on the airborne companion (this entire package is operator-workstation-only per ADR-004).
- Per-subcommand integration tests against real `satellite-provider` — those live in C12-AT-01 (test decompose).
## Acceptance Criteria
**AC-1: All six subcommands register and appear in `--help`**
Given the `operator-tool` console script is installed
When the operator runs `operator-tool --help`
Then the listed subcommands include exactly `download`, `build-cache`, `upload-pending`, `reloc-confirm`, `verify-ready`, `set-sector`; no extras
**AC-2: Successful subcommand exits 0**
Given a subcommand whose service collaborator returns successfully
When the subcommand is invoked through the CLI
Then the process exit code is 0; no error message is printed to stderr; an INFO log entry is written
**AC-3: Each documented exception maps to its documented exit code**
Given a service collaborator raises one of the documented exception types in this task's outcome list
When the subcommand is invoked
Then the process exit code matches the constant in `exit_codes.py`; a one-line operator-friendly message is printed to stderr; an ERROR log entry is written with the exception type and the remediation hint
**AC-4: `SectorClassificationStore` round-trips via atomic write**
Given an empty store
When `set_classification(area="Derkachi", sector_class=SectorClassification.active_conflict)` is called, then a fresh `SectorClassificationStore` is constructed pointing at the same path
Then `get_classification("Derkachi")` returns `SectorClassification.active_conflict`; the on-disk JSON file matches the expected shape; the file's parent directory was created if missing
**AC-5: `SectorClassificationStore` set is atomic under crash**
Given an existing JSON file with one classification
When the process is killed (`SIGKILL`) mid-write of a second classification (simulated by patching `os.replace` to raise after the tempfile has been written)
Then the original JSON file remains intact and parseable; no `*.tmp` lingers
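
The atomic-write behaviour that AC-4 and AC-5 exercise can be sketched with the stdlib pattern named in the Constraints section (`tempfile.NamedTemporaryFile` in the target directory, then `os.replace`). This is a minimal sketch, not the shipped `SectorClassificationStore`: `atomic_write_json` is a hypothetical helper name, and the `.tmp` suffix, `fsync` call, and `sort_keys=True` serialization are assumptions chosen so that a failed write leaves no stray file and repeated identical writes produce identical bytes.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(store_path: Path, data: dict) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    store_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Deterministic serialization: the same input always yields the same bytes.
    payload = json.dumps(data, sort_keys=True, indent=2)
    tmp = tempfile.NamedTemporaryFile(
        "w", dir=store_path.parent, suffix=".tmp", delete=False, encoding="utf-8"
    )
    tmp_path = Path(tmp.name)
    try:
        with tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename within the same directory; the target is swapped in one step.
        os.replace(tmp_path, store_path)
    finally:
        # After a successful replace the temp path is gone; otherwise remove it.
        tmp_path.unlink(missing_ok=True)
```

A real `SIGKILL` cannot run the `finally` block, so a killed process may still leave a temp file behind; the sketch only guarantees cleanup when the failure surfaces as an exception, which is the case the patched-`os.replace` test simulates.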
**AC-6: `freshness_threshold_months` returns the documented values**
Given the two enum values of `SectorClassification`
When `freshness_threshold_months(...)` is called for each
Then `active_conflict → 1`, `stable_rear → 12`
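
The lookup behind AC-6 is small enough to show whole. This is a sketch rather than the shipped module: the `str`-based `Enum` definition of `SectorClassification` is an assumption inferred from the JSON file format, while the two month values and the `FRESHNESS_TABLE` name come from the Outcome section.

```python
from enum import Enum


class SectorClassification(str, Enum):
    # Assumed definition; the member values mirror the strings in the JSON store.
    active_conflict = "active_conflict"
    stable_rear = "stable_rear"


# AC-NEW-6 freshness budget in months, per sector classification.
FRESHNESS_TABLE: dict[SectorClassification, int] = {
    SectorClassification.active_conflict: 1,
    SectorClassification.stable_rear: 12,
}


def freshness_threshold_months(sector_class: SectorClassification) -> int:
    """Pure-data lookup; raises KeyError for a value missing from the table."""
    return FRESHNESS_TABLE[sector_class]
```

Keeping the table as a module-level constant gives sibling tasks one place to read the budget from instead of hard-coding months.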
**AC-7: Logging writes structured JSON to the workstation log file**
Given a fresh CLI invocation with `~/.azaion/onboard/` empty
When any subcommand runs to completion
Then a `c12-tooling.log` file exists at `~/.azaion/onboard/`; its lines parse as JSON; each line carries `timestamp`, `level`, `kind`, plus subcommand-specific fields per AZ-266's record schema
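
The logger wiring that AC-7 checks can be approximated with the standard library alone. The sketch below is an assumption-heavy stand-in: `JsonLineFormatter` substitutes for the real AZ-266 formatter (whose exact field set is defined in `log_record_schema.md`, not here), the `message` field, the rotation size, and the backup count are illustrative, and `configure_logging` is a hypothetical function name.

```python
import json
import logging
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Stand-in for the AZ-266 formatter: one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            # `kind` is supplied by callers through logging's `extra` mechanism.
            "kind": getattr(record, "kind", "c12.unspecified"),
            "message": record.getMessage(),
        }
        return json.dumps(entry, sort_keys=True)


def configure_logging(log_path: Path) -> logging.Logger:
    """Attach a rotating JSON file handler plus a WARN-level console handler."""
    log_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    logger = logging.getLogger("c12_operator_tooling")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers when called twice

    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    file_handler.setFormatter(JsonLineFormatter())
    logger.addHandler(file_handler)

    console = logging.StreamHandler()  # StreamHandler writes to stderr by default
    console.setLevel(logging.WARNING)
    logger.addHandler(console)
    return logger
```

A caller would emit an operator event with, for example, `logger.info("sector set", extra={"kind": "c12.sector.classification.set"})`.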
**AC-8: Console-script entry point is installed and runnable**
Given the package is installed via `pip install -e .`
When the shell runs `operator-tool --help`
Then the help text is printed; the exit code is 0; the binary resolves through the entry-point declared in `pyproject.toml`
**AC-9: Subcommand `--help` references the relevant AC IDs**
Given any subcommand
When `operator-tool <subcommand> --help` is run
Then the help text body includes the AC IDs the subcommand supports (e.g. `build-cache` mentions `AC-8.3, AC-NEW-1`); operators reading `--help` can cross-reference to `acceptance_criteria.md`
**AC-10: `set-sector` is idempotent for the same input**
Given `set-sector --area Derkachi --class active_conflict` was just run
When the same command is run again
Then the on-disk JSON data file is byte-identical to the result of the first run (only the log file may differ, by timestamp); the operator sees the same exit code 0 and the same INFO log line
**AC-11: `build-cache --flight-id` happy path delegates to orchestrator with `FlightDto` (ADR-010)**
Given a fake `FlightsApiClient.fetch_flight` returns a 3-waypoint `FlightDto`
When `operator-tool build-cache --flight-id 00000000-0000-0000-0000-000000000001 --sector-class stable_rear --calibration-path /tmp/cal.json` runs
Then `build_cache_orchestrator.build_cache(...)` is called once with the resolved `FlightDto` (or its `(flight_id, bbox, takeoff_origin)` projection per AZ-328 signature); ZERO calls to `--bbox` legacy parsing
**AC-12: `build-cache --flight-file` happy path uses offline loader**
Given a local JSON file in the documented schema is on disk
When `operator-tool build-cache --flight-file /tmp/flight.json --sector-class stable_rear --calibration-path /tmp/cal.json` runs
Then `FlightsApiClient.load_flight_file(/tmp/flight.json)` is called once; `fetch_flight` is NOT called; the orchestrator receives the same DTO shape
**AC-13: `build-cache` with both `--flight-id` and `--flight-file` errors out**
When `operator-tool build-cache --flight-id 00000000-0000-0000-0000-000000000001 --flight-file /tmp/flight.json ...` runs
Then exit code is `EXIT_USAGE = 2`; stderr names the conflict; ZERO calls to either client method
**AC-14: `build-cache` with neither `--flight-id` nor `--flight-file` errors out**
When `operator-tool build-cache --sector-class stable_rear --calibration-path /tmp/cal.json` runs (no flight source)
Then exit code is `EXIT_USAGE = 2`; stderr lists which flag must be supplied
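
The both-set and neither-set checks from AC-13 and AC-14 reduce to a small pure function that the CLI callback could call before touching any client. This is a sketch under assumptions: `resolve_flight_source` and `FlightSourceUsageError` are hypothetical names, and the error messages are illustrative wording rather than the tool's actual output.

```python
from pathlib import Path
from typing import Optional, Union
from uuid import UUID

EXIT_USAGE = 2


class FlightSourceUsageError(Exception):
    """Hypothetical usage error carrying the exit code the CLI should return."""

    exit_code = EXIT_USAGE


def resolve_flight_source(
    flight_id: Optional[str], flight_file: Optional[str]
) -> Union[UUID, Path]:
    """Return the single flight source, or raise if zero or two were given."""
    if flight_id is not None and flight_file is not None:
        raise FlightSourceUsageError(
            "--flight-id and --flight-file are mutually exclusive; pass only one"
        )
    if flight_id is None and flight_file is None:
        raise FlightSourceUsageError(
            "one of --flight-id or --flight-file must be supplied"
        )
    if flight_id is not None:
        return UUID(flight_id)  # raises ValueError on a malformed GUID
    return Path(flight_file)
```

Because the check runs before either `fetch_flight` or `load_flight_file` is reached, the "ZERO calls to either client method" condition in AC-13 follows from the ordering alone.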
**AC-15: `FlightNotFoundError` maps to `EXIT_FLIGHT_NOT_FOUND`**
Given `fetch_flight` raises `FlightNotFoundError`
When `build-cache --flight-id <unknown>` runs
Then exit code is `62`; ERROR log carries the offending flight_id; ZERO calls to C11/C10
**AC-16: `FlightsApiAuthError` maps to `EXIT_FLIGHTS_API_AUTH`** (and never logs the auth token)
Given `fetch_flight` raises `FlightsApiAuthError`
When `build-cache --flight-id <id>` runs
Then exit code is `61`; the structured log entry does NOT contain the `auth_token` value
**AC-17: `EmptyWaypointsError` maps to `EXIT_EMPTY_WAYPOINTS`**
Given the fetched `FlightDto` has zero waypoints
When `build-cache --flight-id <id>` runs (and the orchestrator calls `bbox_from_waypoints` → raises)
Then exit code is `64`; the stderr message instructs the operator to re-plan in the Mission Planner UI
## Non-Functional Requirements
**Performance**
- CLI cold start (`operator-tool --help`) ≤ 500 ms on a developer laptop. The Typer app must avoid eager-importing heavy dependencies (httpx, pymavlink, paramiko) — sibling tasks expose lazy-import accessors used by their respective subcommands, not at module load time.
**Compatibility**
- Click/Typer per the project pin (no version override).
- The structured JSON log format matches AZ-266's record schema exactly; this task adds no new top-level field.
**Reliability**
- The `SectorClassificationStore` write path is atomic across process kill (per AC-5).
- `~/.azaion/onboard/` is created with mode `0o700` if it does not exist.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `operator-tool --help` output | All 6 subcommands listed |
| AC-2 | Subcommand with success-returning fake service | Exit 0, INFO log, no stderr |
| AC-3 | Subcommand with raising fake (each documented exception family) | Exit code matches `exit_codes.py`; ERROR log; one-line stderr |
| AC-4 | Round-trip `SectorClassificationStore` set → read | Matches input |
| AC-5 | Patched `os.replace` to raise mid-write | Original file intact, no `*.tmp` lingers |
| AC-6 | `freshness_threshold_months` for both enums | `active_conflict → 1`, `stable_rear → 12` |
| AC-7 | Subcommand run, then read log file | Each line parses as JSON; required fields present |
| AC-8 | `subprocess.run(["operator-tool", "--help"])` after `pip install -e .` | Exit 0, help text printed |
| AC-9 | Per-subcommand `--help` text | Includes documented AC IDs |
| AC-10 | Repeated `set-sector` for same area/class | On-disk JSON byte-identical |
| AC-11 | `build-cache --flight-id` happy path | Orchestrator called once with resolved DTO |
| AC-12 | `build-cache --flight-file` happy path | `load_flight_file` called; `fetch_flight` NOT called |
| AC-13 | Both `--flight-id` and `--flight-file` | Exit 2; conflict message |
| AC-14 | Neither flight source supplied | Exit 2; usage hint |
| AC-15 | `FlightNotFoundError` | Exit 62; flight_id in log |
| AC-16 | `FlightsApiAuthError` | Exit 61; auth_token NOT in log |
| AC-17 | `EmptyWaypointsError` | Exit 64; Mission Planner UI hint |
| NFR-perf-cold-start | Microbench `operator-tool --help` × 10 | p99 ≤ 500 ms |
## Constraints
- This task introduces NO new third-party dependencies — Click/Typer is already pinned by the project per description.md § 5.
- Heavy dependencies (httpx, pymavlink, paramiko) MUST NOT be eager-imported in `cli.py` or `__main__.py`; they live behind the sibling tasks' service classes that are lazy-resolved.
- The CLI is operator-workstation-only — `operator_tool` MUST NOT be importable from any airborne entry point. Verified at the SBOM-diff level by E-BOOT (the CI gate already enforces no `operator_tool` symbol in `production-binary`).
- Atomic writes use `tempfile.NamedTemporaryFile(dir=store_path.parent) + os.replace`. Naked `Path.write_text()` is NOT acceptable per `coderule.mdc` "follow established project patterns" (see AZ-280's atomic-write pattern for the established convention; this task uses the simpler stdlib version since there is no SHA-256 sidecar requirement here).
- Log file location is fixed at `~/.azaion/onboard/c12-tooling.log` per description.md § 9 — config-overrideable via `config.c12.log_path` for tests but the default MUST match the spec.
- Subcommand naming is the source of truth for operators; renaming a subcommand requires a Plan-cycle change.
## Risks & Mitigation
**Risk 1: Heavy imports leak into CLI startup**
- *Risk*: A future sibling task imports a heavy dependency at the wrong scope (module level instead of function level), violating NFR-perf-cold-start.
- *Mitigation*: The NFR-perf-cold-start microbenchmark measures startup time and runs in CI. If a regression appears, `python -X importtime` identifies the offending import.
**Risk 2: Operator runs `set-sector` against a stale store path after upgrade**
- *Risk*: An operator upgrades the operator-tool tarball; the new version changes the default `store_path`; classifications appear lost.
- *Mitigation*: The default path is fixed at `~/.azaion/onboard/sector-classifications.json` and treated as a stable contract. A future cycle that needs to migrate runs an explicit migration; this cycle does NOT change the path.
**Risk 3: Console script collides with another tool**
- *Risk*: The name `operator-tool` is generic; another package on the operator's workstation could shadow it.
- *Mitigation*: The package is shipped as part of the operator-tooling tarball with its own venv; no global install. README documents the tarball install procedure.
**Risk 4: Atomic-write corner case — disk full mid-tempfile**
- *Risk*: `tempfile.NamedTemporaryFile.write` could raise `OSError` mid-call; partial tempfile lingers.
- *Mitigation*: `try/finally` deletes the tempfile path on any exception in the write path; AC-5 covers the kill-mid-replace case; the disk-full case surfaces as `OSError` to the caller and the original file remains intact.
@@ -0,0 +1,209 @@
# C12 Companion Bringup — SSH `verify_companion_ready` + `ReadinessReport`
**Task**: AZ-327_c12_companion_bringup
**Name**: C12 Companion Bringup
**Description**: Implement `CompanionBringup`, the C12-internal helper that opens an SSH session against the companion (paramiko per project pin), inspects the companion-side filesystem for the four required pre-flight artifacts (Manifest.json, .engine files + AZ-280 sidecars, calibration JSON), runs sidecar verification on the engines via a remote `sha256sum` over the engine path (compared against the sidecar's hex digest), and returns a `ReadinessReport` per description.md § 2 (`manifest_present`, `content_hashes_pass`, `engines_present`, `calibration_present`, `outcome ∈ {ready, not_ready}`, `not_ready_reasons: list[str]`). It owns the two error families: `CompanionUnreachableError` (SSH session-open failure: TCP refused, auth failed, host key mismatch, socket timeout) and `ContentHashMismatchError` (sidecar verification fails on at least one engine; this is distinct from "engine missing", which is a not-ready signal, not an exception). The public surface is one method, `verify_companion_ready(companion_address: CompanionAddress) -> ReadinessReport`. SSH user, key file, host-key policy, connect-timeout, and the canonical companion-side cache root come from config (`config.c12.companion_ssh_user`, `config.c12.companion_ssh_keyfile`, `config.c12.companion_host_key_policy`, `config.c12.companion_connect_timeout_s`, `config.c12.companion_cache_root`) per AZ-269. The session is opened in a `try/finally` block; the connection is always closed even if the four checks raise. INFO log on every successful call (with the four boolean flags + outcome); WARN on degraded readiness (one or more of the four checks failed without raising); ERROR on the two error families.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module
**Component**: c12_operator_tooling (epic AZ-253 / E-C12)
**Tracker**: AZ-327
**Epic**: AZ-253 (E-C12)
### Document Dependencies
- `_docs/02_document/components/13_c12_operator_tooling/description.md` — § 2 (`verify_companion_ready` interface + `ReadinessReport` DTO shape), § 5 (`CompanionUnreachableError`, `ContentHashMismatchError`), § 7 (filesystem lockfile note — relevant for orchestrator T3 not this task).
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — sidecar file format (this task verifies remotely; does not import the helper but reuses the schema).
- `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md` — engine filename layout used to enumerate the expected engines list.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — INFO/WARN/ERROR log shapes.
## Problem
Without a real `CompanionBringup`:
- `build_cache` (sibling T3) cannot run safely — the orchestrator would invoke C10 on the companion without any pre-flight visibility into the companion's state. A half-provisioned companion would either silently miscompile (manifest stale) or corrupt the cache.
- The `verify-ready` CLI subcommand has no implementation — operators cannot diagnose "is my companion in a usable state?" without SSHing in manually.
- Pre-flight content-hash verification per AC-NEW-1's takeoff gate (AZ-324 covers the airborne side) has no operator-side counterpart — sidecar mismatches that occur during the SSH transfer would only surface at takeoff, too late.
- `CompanionUnreachableError` and `ContentHashMismatchError` exist as concept-only types in description.md § 5 with no producer.
- Configuration knobs for SSH credentials, host-key policy, and the canonical cache root have no consumer; AZ-269's loader cannot validate them against a concrete usage.
This task delivers the bring-up + verification layer. It does NOT orchestrate the `build_cache` flow (sibling T3 does), does NOT invoke C10 (T3 does via SSH after this task confirms readiness), and does NOT perform the takeoff-time content-hash verification (AZ-324 owns the airborne side).
## Outcome
- A `CompanionBringup` class at `src/operator_tool/companion_bringup.py`:
- Constructor: `__init__(self, *, ssh_factory: SshSessionFactory, sidecar_verifier: RemoteSidecarVerifier, logger: Logger, config: C12CompanionConfig)`.
- `C12CompanionConfig` (`@dataclass(frozen=True)`): `ssh_user: str`, `ssh_keyfile: Path`, `host_key_policy: enum {strict, known_hosts, reject_new}`, `connect_timeout_s: float = 10.0`, `companion_cache_root: PurePosixPath = PurePosixPath("/var/lib/azaion/c10/cache")`, `manifest_filename: str = "Manifest.json"`, `calibration_filename: str = "camera_calibration.json"`, `expected_engines: tuple[str, ...] = ()` (the orchestrator passes the list per the request; default empty fails AC-2 cleanly).
- Public method: `verify_companion_ready(companion_address: CompanionAddress) -> ReadinessReport`.
- DTOs at `src/operator_tool/_types.py`:
- `CompanionAddress` (`@dataclass(frozen=True)`): `host: str`, `port: int = 22`.
- `ReadinessReport` (`@dataclass(frozen=True)`): `manifest_present: bool`, `content_hashes_pass: bool`, `engines_present: bool`, `calibration_present: bool`, `outcome: enum {ready, not_ready}`, `not_ready_reasons: tuple[str, ...]`, `companion_cache_root: str`, `engines_inspected_count: int`.
- Errors at `src/operator_tool/errors.py`:
- `CompanionUnreachableError(Exception)`: attributes `host: str`, `port: int`, `reason: enum {connect_refused, auth_failed, host_key_mismatch, timeout, other}`, `underlying_exception_repr: str`. `remediation` attribute returns a one-line operator-friendly hint per `reason`.
- `ContentHashMismatchError(Exception)`: attributes `engine_path: str`, `expected_sha256_hex: str`, `actual_sha256_hex: str`. `remediation` attribute returns "Re-run the cache build (`operator-tool build-cache --flight-id ...`) to repopulate the affected engine."
- `SshSession` and `SshSessionFactory` Protocols at `src/operator_tool/ssh_session.py`:
```python
# Deferred annotation evaluation: RemoteCommandResult needs no runtime import here.
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Protocol, runtime_checkable

from operator_tool._types import CompanionAddress


@runtime_checkable
class SshSession(Protocol):
    def run(self, command: str, *, timeout_s: float) -> RemoteCommandResult: ...
    def file_exists(self, remote_path: PurePosixPath) -> bool: ...
    def list_dir(self, remote_path: PurePosixPath) -> list[str]: ...
    def close(self) -> None: ...


@runtime_checkable
class SshSessionFactory(Protocol):
    def open(self, address: CompanionAddress, *, timeout_s: float) -> SshSession: ...
```
Concrete implementation `ParamikoSshSessionFactory` wraps `paramiko.SSHClient` with the documented host-key policy mapping (`strict` → `RejectPolicy`; `known_hosts` → `AutoAddPolicy`, gated on the presence of `~/.ssh/known_hosts`; `reject_new` → `RejectPolicy` with an explicit allowlist).
- A `RemoteSidecarVerifier` helper at `src/operator_tool/remote_sidecar_verifier.py`:
- `verify(session: SshSession, engine_path: PurePosixPath) -> RemoteSidecarResult` — runs `sha256sum <engine_path>` over the SSH session, parses the first 64 hex chars, reads the sidecar file at `<engine_path>.sha256` via `session.run("cat ...")`, parses its 64 hex chars, compares case-insensitively. Returns `RemoteSidecarResult(matches: bool, expected_hex: str, actual_hex: str)`.
|
||||
- Method flow for `verify_companion_ready`:
  1. Open SSH session via `ssh_factory.open(companion_address, timeout_s=config.connect_timeout_s)`. On any paramiko/socket exception → catch and raise `CompanionUnreachableError`, mapping the underlying type to a `reason` enum value. Always wrap subsequent steps in a `try/finally` that closes the session.
  2. Check 1 (`manifest_present`): `session.file_exists(companion_cache_root / manifest_filename)`.
  3. Check 2 (`engines_present`): `session.list_dir(companion_cache_root / "engines")` → set of filenames; compare against `config.expected_engines`. If `config.expected_engines` is empty → `engines_present = False`, `not_ready_reasons += ["expected_engines list empty in caller-supplied config"]`. Else `engines_present = expected_engines.issubset(listed_engines)`; if not, append `"engines_missing: <comma-list>"`.
  4. Check 3 (`content_hashes_pass`): for each engine in the intersection of `expected_engines` and `listed_engines`, call `sidecar_verifier.verify(session, companion_cache_root / "engines" / engine)`. If ANY result has `matches == False` → raise `ContentHashMismatchError` with the first failing path. If all match → `content_hashes_pass = True`. Record `engines_inspected_count` regardless.
  5. Check 4 (`calibration_present`): `session.file_exists(companion_cache_root / calibration_filename)`.
  6. Compute `outcome`: `ready` iff all four booleans are `True`; `not_ready` otherwise.
  7. Emit log: INFO `kind="c12.companion.ready"` with the four flags + outcome on success; WARN `kind="c12.companion.degraded"` if any check failed without raising (i.e. `outcome=not_ready` due to a missing artifact, not a hash mismatch).
  8. Return the `ReadinessReport`.
- Composition-root factory at `src/gps_denied_onboard/runtime_root/c12_factory.py` extends T1's `OperatorToolServices` dataclass with a `companion_bringup: CompanionBringup` field. The factory `build_companion_bringup(config) -> CompanionBringup` constructs the paramiko-backed session factory + remote sidecar verifier + logger.
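
The method flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped `CompanionBringup`: the session is assumed to be already open (step 1 and its error mapping are left out), logging (step 7) is omitted, the report is trimmed, and `FakeSession`, `FakeVerifier`, `SidecarResult`, the default `calibration.json` filename, the `manifest_missing` / `calibration_missing` reason strings, and the comma separator are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


class ContentHashMismatchError(Exception):
    def __init__(self, engine_path: str, expected_sha256_hex: str, actual_sha256_hex: str) -> None:
        super().__init__(f"sidecar mismatch for {engine_path}")
        self.engine_path = engine_path
        self.expected_sha256_hex = expected_sha256_hex
        self.actual_sha256_hex = actual_sha256_hex


@dataclass(frozen=True)
class SidecarResult:  # hypothetical stand-in for RemoteSidecarResult
    matches: bool
    expected_hex: str
    actual_hex: str


@dataclass(frozen=True)
class ReadinessReport:  # trimmed: companion_cache_root omitted, outcome as plain str
    manifest_present: bool
    content_hashes_pass: bool
    engines_present: bool
    calibration_present: bool
    outcome: str
    not_ready_reasons: tuple[str, ...]
    engines_inspected_count: int


def verify_companion_ready(session, verifier, root, expected_engines,
                           manifest_filename="Manifest.json",
                           calibration_filename="calibration.json"):
    """Steps 2-6 and 8 of the flow; `session` is assumed to be open already."""
    try:
        reasons = []
        manifest_present = session.file_exists(root / manifest_filename)
        if not manifest_present:
            reasons.append("manifest_missing")  # hypothetical reason string
        listed = set(session.list_dir(root / "engines"))
        expected = set(expected_engines)
        if not expected:
            engines_present = False
            reasons.append("expected_engines list empty in caller-supplied config")
        else:
            missing = sorted(expected - listed)
            engines_present = not missing
            if missing:
                reasons.append("engines_missing: " + ",".join(missing))
        inspected = 0
        for name in sorted(expected & listed):  # missing engines are never hashed
            path = root / "engines" / name
            result = verifier.verify(session, path)
            inspected += 1
            if not result.matches:
                raise ContentHashMismatchError(str(path), result.expected_hex, result.actual_hex)
        calibration_present = session.file_exists(root / calibration_filename)
        if not calibration_present:
            reasons.append("calibration_missing")  # hypothetical reason string
        ready = manifest_present and engines_present and calibration_present
        return ReadinessReport(manifest_present, True, engines_present, calibration_present,
                               "ready" if ready else "not_ready", tuple(reasons), inspected)
    finally:
        session.close()  # runs on return, on hash mismatch, and on unexpected errors


class FakeSession:  # scripted fake, no paramiko involved
    def __init__(self, files, engines):
        self.files, self.engines, self.close_calls = set(files), list(engines), 0

    def file_exists(self, remote_path):
        return str(remote_path) in self.files

    def list_dir(self, remote_path):
        return list(self.engines)

    def close(self):
        self.close_calls += 1


class FakeVerifier:  # records which engines were hashed
    def __init__(self, ok=True):
        self.ok, self.verified = ok, []

    def verify(self, session, engine_path):
        self.verified.append(engine_path.name)
        return SidecarResult(self.ok, "aa" * 32, ("aa" if self.ok else "bb") * 32)
```

Because the close sits in `finally`, the same spy (`close_calls`) serves the AC-3, AC-7, and AC-9 style checks without touching a real SSH stack.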
## Scope

### Included

- `CompanionBringup` class with the single public method.
- The 2 DTOs (`CompanionAddress`, `ReadinessReport`) plus the `outcome` and `reason` enum types.
- The 2 error types (`CompanionUnreachableError`, `ContentHashMismatchError`) with `remediation` attributes.
- `SshSessionFactory` + `SshSession` Protocols.
- `ParamikoSshSessionFactory` + `ParamikoSshSession` concrete implementations.
- `RemoteSidecarVerifier` helper.
- Composition-root factory.
- Config schema extension on AZ-269's loader (`config.c12.companion_*` block).
- `verify-ready` subcommand wiring delegated to T1's CLI shell: this task ships the service class; T1's `cli.py` resolves it from the composition root.
- Conformance unit tests using a fake `SshSessionFactory` (no paramiko in unit tests) covering all 10 acceptance criteria (AC-1 through AC-10).
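
As an illustration of the `config.c12.companion_*` block, the sketch below validates the four keys that the acceptance criteria name (`companion_address`, `companion_ssh_keyfile`, `companion_host_key_policy`, `companion_connect_timeout_s`). The `CompanionConfig` dataclass and `parse_companion_config` function are hypothetical; AZ-269's actual loader API is not shown in this document and may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class HostKeyPolicy(str, Enum):
    # Supported set per the Constraints section; `auto_add_unknown` is deliberately absent.
    STRICT = "strict"
    KNOWN_HOSTS = "known_hosts"
    REJECT_NEW = "reject_new"


@dataclass(frozen=True)
class CompanionConfig:  # hypothetical shape of the companion_* block
    companion_address: str
    companion_ssh_keyfile: str
    companion_host_key_policy: HostKeyPolicy
    companion_connect_timeout_s: float


def parse_companion_config(raw: dict) -> CompanionConfig:
    """Validate a raw mapping and reject unsupported host-key policies early."""
    policy_raw = raw["companion_host_key_policy"]
    try:
        policy = HostKeyPolicy(policy_raw)
    except ValueError:
        allowed = " | ".join(p.value for p in HostKeyPolicy)
        raise ValueError(
            f"unsupported companion_host_key_policy {policy_raw!r}; expected {allowed}"
        ) from None
    timeout_s = float(raw["companion_connect_timeout_s"])
    if timeout_s <= 0:
        raise ValueError("companion_connect_timeout_s must be positive")
    return CompanionConfig(
        companion_address=str(raw["companion_address"]),
        companion_ssh_keyfile=str(raw["companion_ssh_keyfile"]),
        companion_host_key_policy=policy,
        companion_connect_timeout_s=timeout_s,
    )
```

Rejecting the policy at parse time keeps the unsupported value from ever reaching the session factory.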
### Excluded

- The `build_cache` orchestration that consumes `verify_companion_ready` (sibling T3).
- The actual SSH-invocation of C10 on the companion (sibling T3).
- The takeoff-time content-hash verification on the airborne side (AZ-324).
- Engine compilation (AZ-321), descriptor generation (AZ-322), Manifest writing (AZ-323): C10 owns all of these, and they ran prior to this task being invoked.
- A SOCKS proxy or jump-host SSH path: direct SSH only this cycle.
- Telemetry exfiltration of operator workstation key material: host key + private key never appear in log output (only fingerprint hash if at all).

## Acceptance Criteria
**AC-1: All four artifacts present + sidecars verify → `outcome=ready`**
Given the companion's SSH is reachable, `Manifest.json` exists, all `expected_engines` exist, all sidecars verify, and the calibration file exists
When `verify_companion_ready(address)` is called
Then `ReadinessReport(manifest_present=True, content_hashes_pass=True, engines_present=True, calibration_present=True, outcome=ready, not_ready_reasons=())` is returned; ONE INFO log `kind="c12.companion.ready"` is emitted

**AC-2: Missing engine → `outcome=not_ready`**
Given `expected_engines=("dinov2_vpr_sm87_jp62_trt103_fp16.engine", "lightglue_sm87_jp62_trt103_fp16.engine")` and only the first exists on the companion
When `verify_companion_ready(address)` is called
Then `engines_present=False`; `not_ready_reasons` contains `"engines_missing: lightglue_sm87_jp62_trt103_fp16.engine"`; `outcome=not_ready`; ONE WARN log `kind="c12.companion.degraded"`; NO `ContentHashMismatchError` is raised

**AC-3: Sidecar mismatch → `ContentHashMismatchError`**
Given an engine file is present but its sidecar's hex digest does not match the engine's actual SHA-256
When `verify_companion_ready(address)` is called
Then `ContentHashMismatchError` is raised with `engine_path`, `expected_sha256_hex`, `actual_sha256_hex` populated; the SSH session is closed (`session.close()` is called in `finally`); ONE ERROR log `kind="c12.companion.hash.mismatch"` is emitted
**AC-4: SSH connection refused → `CompanionUnreachableError(reason=connect_refused)`**
Given the companion address is unreachable (TCP RST or no listener)
When `verify_companion_ready(address)` is called
Then `CompanionUnreachableError(reason=connect_refused, underlying_exception_repr="...")` is raised; the underlying paramiko/socket exception's repr is captured; ONE ERROR log `kind="c12.companion.unreachable"`; `remediation` attribute returns "Check companion power, USB/Ethernet cable, and `config.c12.companion_address`."

**AC-5: SSH auth failure → `CompanionUnreachableError(reason=auth_failed)`**
Given the companion is reachable but the SSH key is wrong or revoked
When `verify_companion_ready(address)` is called
Then `CompanionUnreachableError(reason=auth_failed, ...)` is raised; ERROR log `kind="c12.companion.unreachable"` with `reason="auth_failed"`; `remediation` attribute returns "Verify `config.c12.companion_ssh_keyfile` matches the public key in `~/.ssh/authorized_keys` on the companion."

**AC-6: Host key mismatch with `host_key_policy=strict` → `CompanionUnreachableError(reason=host_key_mismatch)`**
Given the companion's host key has changed and `config.c12.companion_host_key_policy = strict`
When `verify_companion_ready(address)` is called
Then `CompanionUnreachableError(reason=host_key_mismatch, ...)` is raised; ERROR log; `remediation` returns "Inspect `~/.ssh/known_hosts`; if the companion was reflashed, remove its old entry; otherwise treat as a security incident."

**AC-7: SSH session is always closed**
Given any of the four checks raises an unexpected exception (e.g. SFTP raises `OSError`)
When `verify_companion_ready(address)` is called
Then the exception propagates to the caller; `session.close()` was called exactly once before propagation (verifiable via spy on the fake `SshSession`); no socket descriptor leaks
**AC-8: Connect timeout → `CompanionUnreachableError(reason=timeout)`**
Given the companion address routes but never responds to TCP SYN within `config.c12.companion_connect_timeout_s`
When `verify_companion_ready(address)` is called
Then `CompanionUnreachableError(reason=timeout, ...)` is raised within `connect_timeout_s + 1.0 s` (allowing test jitter); ERROR log includes the configured timeout value

**AC-9: `engines_inspected_count` reflects what was actually checked**
Given a mix of present + missing engines (2 of 3 expected exist)
When `verify_companion_ready(address)` is called
Then `engines_inspected_count == 2`; the missing engine appears in `not_ready_reasons` but does NOT trigger a sidecar verify call (verifiable via spy)

**AC-10: `host_key_policy=reject_new` blocks first connection to a previously unseen host**
Given `config.c12.companion_host_key_policy = reject_new` and the companion is not in `~/.ssh/known_hosts`
When `verify_companion_ready(address)` is called
Then `CompanionUnreachableError(reason=host_key_mismatch, ...)` is raised; ERROR log; `remediation` returns "Add the companion to `~/.ssh/known_hosts` first via a manual `ssh-keyscan`, then retry."
## Non-Functional Requirements

**Performance**

- A successful `verify_companion_ready` call against a local-network companion (≤ 1 ms RTT) with 5 engines completes in ≤ 5 s wall-clock (dominated by 5 × `sha256sum` over engines totaling ~1 GB on the companion's NVMe).
- Connection-open phase ≤ 2 s p99 in normal conditions; the `connect_timeout_s` config caps the worst case at the configured value.

**Compatibility**

- paramiko per the project pin; no version override.
- Host-key policies map to paramiko's `MissingHostKeyPolicy` subclasses; if paramiko changes the API in a future minor version, this task's policy mapping is the only place to update.

**Reliability**

- The session is closed in `finally` on every code path (covered by AC-7).
- `sha256sum` invocation has a per-engine timeout (default 60 s, config-overrideable) so a hung companion does not hold the operator's CLI indefinitely.
- The four checks are sequential, not parallel, to keep the SSH session simple and ordering deterministic for log correlation.
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Fake `SshSessionFactory` returning a fake session where all four checks succeed | `ReadinessReport(outcome=ready)` + INFO log |
| AC-2 | Fake session with one missing engine | `outcome=not_ready`, `not_ready_reasons` lists the missing engine, no hash check on the missing one |
| AC-3 | Fake session where sidecar verifier returns `matches=False` | `ContentHashMismatchError` with populated attributes, session closed, ERROR log |
| AC-4 | `SshSessionFactory.open` raises `ConnectionRefusedError` | `CompanionUnreachableError(reason=connect_refused)`, ERROR log |
| AC-5 | `SshSessionFactory.open` raises `paramiko.AuthenticationException` | `CompanionUnreachableError(reason=auth_failed)`, ERROR log |
| AC-6 | `SshSessionFactory.open` raises `paramiko.BadHostKeyException` with `policy=strict` | `CompanionUnreachableError(reason=host_key_mismatch)`, ERROR log |
| AC-7 | Fake session whose `file_exists` raises `OSError` mid-flow | `OSError` propagates; `session.close()` called exactly once |
| AC-8 | `SshSessionFactory.open` raises `socket.timeout` after `connect_timeout_s` | `CompanionUnreachableError(reason=timeout)`, log includes timeout value |
| AC-9 | Fake session with mixed-presence engines, sidecar-verifier spy | `engines_inspected_count == count_of_present_expected`, sidecar verifier not called for missing engines |
| AC-10 | `host_key_policy=reject_new` + unknown host | `CompanionUnreachableError(reason=host_key_mismatch)` with `reject_new`-specific remediation text |
| NFR-perf-cold-call | Microbench against in-process fake session × 100 | p99 ≤ 50 ms for the orchestration overhead (excludes real SSH) |
## Constraints

- paramiko is the only allowed SSH library; no `subprocess.run("ssh ...")` shell-out (security: shell injection surface; reliability: no parsed output).
- `SshSessionFactory` is a Protocol, NOT a concrete class: `ParamikoSshSessionFactory` is one implementation, which lets tests inject fakes without monkey-patching paramiko.
- The `RemoteSidecarVerifier` does NOT pull the engine bytes back to the operator workstation; it runs `sha256sum` on the companion and parses the output. This avoids a multi-GB transfer per readiness check.
- The error families (`CompanionUnreachableError`, `ContentHashMismatchError`) are the canonical types; sibling tasks (T3 build_cache) MUST consume these and not redefine them.
- The host-key policy `auto_add_unknown` is intentionally NOT a supported value, because silently accepting new host keys defeats the security model. The supported set is `strict | known_hosts | reject_new`; `known_hosts` requires the entry to already exist; `reject_new` is functionally identical to `strict` but with a clearer error message.
- This task does NOT cache SSH sessions: every `verify_companion_ready` call opens and closes a fresh session. Caching would complicate the failure model for marginal performance gain (the bottleneck is the per-engine `sha256sum` runs, not session establishment).
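
To illustrate why the Protocol constraint makes fakes cheap, the sketch below shows a scripted class satisfying a `runtime_checkable` Protocol purely by shape, with no inheritance and no paramiko import. `ScriptedSession` and `OnlyClose` are hypothetical test doubles, and the `run` return type is simplified to `str` rather than the spec's `RemoteCommandResult`. Note that `isinstance` against a runtime-checkable Protocol only checks that the methods exist, not their signatures.

```python
from pathlib import PurePosixPath
from typing import Protocol, runtime_checkable


@runtime_checkable
class SshSession(Protocol):
    def run(self, command: str, *, timeout_s: float) -> str: ...
    def file_exists(self, remote_path: PurePosixPath) -> bool: ...
    def list_dir(self, remote_path: PurePosixPath) -> list: ...
    def close(self) -> None: ...


class ScriptedSession:
    """Fake session driven by a command -> stdout table; no base class needed."""

    def __init__(self, outputs: dict) -> None:
        self.outputs = outputs
        self.closed = False

    def run(self, command: str, *, timeout_s: float) -> str:
        return self.outputs[command]

    def file_exists(self, remote_path: PurePosixPath) -> bool:
        return False

    def list_dir(self, remote_path: PurePosixPath) -> list:
        return []

    def close(self) -> None:
        self.closed = True


class OnlyClose:
    """Lacks run/file_exists/list_dir, so it does not satisfy the Protocol."""

    def close(self) -> None:
        pass
```

A unit test can then pass `ScriptedSession` wherever the orchestrator expects an `SshSession`; static type checkers accept it structurally, and `isinstance(ScriptedSession({}), SshSession)` holds at runtime while `OnlyClose` is rejected.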
## Risks & Mitigation

**Risk 1: paramiko version drift breaks the host-key-policy mapping**

- *Risk*: A future paramiko minor release renames or removes `MissingHostKeyPolicy` subclasses; this task's mapping breaks silently in tests that don't exercise paramiko itself.
- *Mitigation*: A single integration test (marked `@pytest.mark.requires_paramiko`) constructs `ParamikoSshSessionFactory` with each policy value and asserts the resulting paramiko policy class name. Catches version drift on dependency upgrades.

**Risk 2: `sha256sum` is missing or behaves differently on the companion image**

- *Risk*: The companion is JetPack-based; if it ships without `coreutils`'s `sha256sum`, this task's verifier breaks at runtime.
- *Mitigation*: A composition-root health check at startup runs `sha256sum --version` over the SSH session and surfaces a clear `CompanionUnreachableError(reason=other, underlying_exception_repr="sha256sum not found")` if absent. JetPack base images include `coreutils` per ADR-005.

**Risk 3: Operator's `~/.ssh/known_hosts` has stale entries from prior bench runs**

- *Risk*: A reflashed companion legitimately triggers AC-6 / AC-10 failures, but operators see a cryptic paramiko traceback if remediation hints are unclear.
- *Mitigation*: AC-6 / AC-10 require the `remediation` attribute on `CompanionUnreachableError` to mention `~/.ssh/known_hosts` explicitly. The CLI subcommand `verify-ready` (in T1) prints the remediation hint to stderr.

**Risk 4: Long-running `sha256sum` hangs the operator's CLI**

- *Risk*: A degraded companion NVMe causes `sha256sum` on a 200 MB engine to take minutes; the operator sees a hung command.
- *Mitigation*: `RemoteSidecarVerifier` enforces a per-engine timeout (default 60 s, config-overrideable). On timeout, the verifier raises `ContentHashMismatchError(actual_sha256_hex="<timeout>")` so the operator sees a clear failure and can investigate the disk.
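
The parsing side of `RemoteSidecarVerifier` (relevant to Risks 2 and 4) could be sketched as below: build a quoted `sha256sum` command, take the leading 64 hex characters from each output, and compare case-insensitively. The function names are hypothetical, the `--` end-of-options marker assumes GNU coreutils on the companion, and the per-engine timeout is assumed to be enforced by `session.run(..., timeout_s=...)` rather than here.

```python
import re
import shlex
from pathlib import PurePosixPath

# Exactly 64 hex chars at the start of the text (after optional whitespace).
_DIGEST = re.compile(r"\s*([0-9a-fA-F]{64})(?![0-9a-fA-F])")


def build_sha256sum_command(engine_path: PurePosixPath) -> str:
    """Quote the remote path so spaces or shell metacharacters cannot inject commands."""
    return f"sha256sum -- {shlex.quote(str(engine_path))}"


def parse_sha256_hex(text: str) -> str:
    """Return the leading digest in lowercase, or raise if the output is not a digest."""
    match = _DIGEST.match(text)
    if match is None:
        raise ValueError("no 64-char hex digest at start of output")
    return match.group(1).lower()


def compare_digests(sha256sum_stdout: str, sidecar_text: str):
    """Return (matches, expected_hex, actual_hex), mirroring RemoteSidecarResult's fields."""
    actual = parse_sha256_hex(sha256sum_stdout)
    expected = parse_sha256_hex(sidecar_text)
    return actual == expected, expected, actual
```

Raising `ValueError` on unparseable output gives the caller a place to surface a missing `sha256sum` binary (for example a shell "not found" message) separately from a genuine digest mismatch.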
## Runtime Completeness

- **Named capability**: pre-flight companion artifact verification per AC-NEW-1 + description.md § 2 `verify_companion_ready`.
- **Production code that must exist**: real `CompanionBringup` orchestrating real `ParamikoSshSessionFactory` + real `RemoteSidecarVerifier` (with real `sha256sum` over SSH); real config-driven SSH credentials + host-key policy + cache root.
- **Allowed external stubs**: tests MAY use a fake `SshSessionFactory` returning a fake `SshSession` whose `run`, `file_exists`, `list_dir` are scripted; production wiring uses paramiko + the real companion.
- **Unacceptable substitutes**: shelling out to `ssh ...` via `subprocess.run` (security + reliability); reading sidecars by pulling engine bytes back to the workstation (multi-GB per readiness check); `auto_add_unknown` host-key policy (security defeat); a "skip-verify" config flag (defeats AC-NEW-1).