---
name: monorepo-discover
description: Scans a monorepo or meta-repo (git-submodule aggregators, npm/cargo workspaces, etc.) and generates a human-reviewable `_docs/_repo-config.yaml` that other `monorepo-*` skills (document, cicd, onboard, status) read. Produces inferred mappings tagged with evidence; never writes to the config's `confirmed_by_user` flag (the human does that). Use on first setup in a new monorepo, or to refresh the config after structural changes.
---

# Monorepo Discover

Writes or refreshes `_docs/_repo-config.yaml`, the shared config file that every other `monorepo-*` skill depends on. Does NOT modify any other files.

## Core principle

**Discovery is a suggestion, not a commitment.** The skill infers repo structure, but every inferred entry is tagged with `confirmed: false` + evidence. Action skills (`monorepo-document`, `monorepo-cicd`, `monorepo-onboard`) refuse to run until the human reviews the config and sets `confirmed_by_user: true`.

## Mitigations against LLM inference errors (applies throughout)

| Rule | What it means |
| ---- | ------------- |
| **M1** Separation | This skill never triggers other skills. It stops after writing config. |
| **M2** Evidence thresholds | No mapping gets recorded without at least one signal (name match, textual reference, directory convention, explicit statement). Zero-signal candidates go under `unresolved:` with a question. |
| **M3** Factual vs. interpretive | Resolve factual questions alone (file exists? line says what?). Ask for interpretive ones (does A feed into B?) unless M2 evidence is present. Ask for conventional ones always (commit prefix? target branch?). |
| **M4** Batch questions | Accumulate all `unresolved:` questions. Present at end of discovery, not drip-wise. |
| **M5** Skip over guess | Never record a zero-evidence mapping under `components:` or `docs:`; always put it in `unresolved:` with a question. |
| **M6** Assumptions footer | Every run ends with an explicit list of assumptions used. Also append to `assumptions_log:` in the config. |
| **M7** Structural drift | If the config already exists, produce a diff of what would change and ask for approval before overwriting. Never silently regenerate. |

## Guardrail

**This skill writes ONLY `_docs/_repo-config.yaml`.** It never edits unified docs, CI files, or component directories. If the workflow ever pushes you to modify anything else, stop.

## Workflow

### Phase 1: Detect repo type

Check which of these exist (first match wins):

1. `.gitmodules` → **git-submodules meta-repo**
2. `package.json` with `workspaces` field → **npm/yarn/pnpm workspace**
3. `pnpm-workspace.yaml` → **pnpm workspace**
4. `Cargo.toml` with `[workspace]` section → **cargo workspace**
5. `go.work` → **go workspace**
6. Multiple top-level subfolders each with their own `package.json` / `Cargo.toml` / `pyproject.toml` / `*.csproj` → **ad-hoc monorepo**

If none match → **ask the user** what kind of monorepo this is. Don't guess.

Record in `repo.type` and `repo.component_registry`.

### Phase 2: Enumerate components

Based on repo type, parse the registry and list components. For each, collect:

- `name`, `path`
- `stack`: infer from files present (`.csproj` → .NET, `pyproject.toml` → Python, `Cargo.toml` → Rust, `package.json` → Node/TS, `go.mod` → Go). Multiple signals → pick the dominant one. No signals → `stack: unknown` and add to `unresolved:`.
- `evidence`: list of signals used (e.g., `[gitmodules_entry, csproj_present]`)

Do NOT yet populate `primary_doc`, `secondary_docs`, `ci_config`, or `deployment_tier`; those come in Phases 4 and 5.

### Phase 3: Locate docs root

Probe in order: `_docs/`, `docs/`, `documentation/`, or a root-level README with links to sub-docs.
- Multiple candidates → ask user which is canonical
- None → `docs.root: null` + flag under `unresolved:`

Once located, classify each `*.md`:

- **Primary doc**: filename or H1 names a component/feature
- **Cross-cutting doc**: describes repo-wide concerns (architecture, schema, auth, index)
- **Index**: `README.md`, `index.md`, or `_index.md`

Detect filename convention (e.g., `NN_.md`) and next unused prefix.

### Phase 4: Map components to docs (inference, M2-gated)

For each component, attempt to find its **primary doc** using the evidence rules. A mapping qualifies for `components:` (with `confirmed: false`) if at least ONE of these holds:

- **Name match**: component name appears in the doc filename OR H1
- **Textual reference**: doc body explicitly names the component path or git URL
- **Directory convention**: doc lives inside the component's folder
- **Explicit statement**: README, index, or comment asserts the mapping

No signal → entry goes under `unresolved:` with an A/B/C question, NOT under `components:` as a guess.

Cross-cutting docs go in `docs.cross_cutting:` with an `owns:` list describing what triggers updates to them. If you can't classify a doc, add an `unresolved:` entry asking the user.

### Phase 5: Detect CI tooling

Probe at repo root AND per-component for CI configs:

- `.github/workflows/*.yml` → GitHub Actions
- `.gitlab-ci.yml` → GitLab CI
- `.woodpecker/` or `.woodpecker.yml` → Woodpecker
- `.drone.yml` → Drone
- `Jenkinsfile` → Jenkins
- `bitbucket-pipelines.yml` → Bitbucket
- `azure-pipelines.yml` → Azure Pipelines
- `.circleci/config.yml` → CircleCI

Probe for orchestration/infra at root:

- `docker-compose*.yml`
- `kustomization.yaml`, `helm/`
- `Makefile` with build/deploy targets
- `*-install.sh`, `*-setup.sh`
- `.env.example`, `.env.template`

Record under `ci:`. For image tag formats, grep compose files for `image:` lines and record the pattern (e.g., `${REGISTRY}/${NAME}:${BRANCH}-${ARCH}`).
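
The `image:` grep described above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not part of the skill: the function names `extract_images` and `generalize` are hypothetical, and replacing the literal image name with `${NAME}` is one possible way to arrive at a shared pattern.

```python
import re

# Matches a compose `image:` key and captures its value, with optional quotes.
# Commented-out lines do not match because they start with `#`, not `image:`.
IMAGE_LINE = re.compile(r'^\s*image:\s*["\']?([^"\'\s#]+)["\']?')


def extract_images(compose_text: str) -> list[str]:
    """Return the distinct `image:` values in first-seen order."""
    seen: dict[str, None] = {}
    for line in compose_text.splitlines():
        match = IMAGE_LINE.match(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


def generalize(image: str) -> str:
    """Replace the literal image name with ${NAME}, keeping registry and tag.

    Assumption: the name is the last path segment, up to the first colon.
    """
    prefix, slash, last = image.rpartition("/")
    _name, colon, tag = last.partition(":")
    return f"{prefix}{slash}${{NAME}}{colon}{tag}"


def tag_patterns(compose_text: str) -> list[str]:
    """Distinct generalized patterns found in one compose file."""
    patterns: dict[str, None] = {}
    for image in extract_images(compose_text):
        patterns.setdefault(generalize(image), None)
    return list(patterns)
```

A single returned pattern can be recorded as an inferred convention; several distinct patterns are a sign of ambiguity and are better raised as a question than resolved by guessing.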
Anything ambiguous → `unresolved:` entry.

### Phase 6: Detect conventions

- **Commit prefix**: `git log --format=%s -50` → look for `[PREFIX]` consistency
- **Target/work branch**: check CI config trigger branches; fall back to `git remote show origin`
- **Ticket ID pattern**: grep commits and docs for regex like `[A-Z]+-\d+`
- **Image tag format**: see Phase 5
- **Deployment tiers**: scan root README and architecture docs for named tiers/environments

Record inferred conventions with `confirmed: false`.

### Phase 7: Read existing config (if any) and produce diff

If `_docs/_repo-config.yaml` already exists:

1. Parse it.
2. Compare against what Phases 1–6 discovered.
3. Produce a **diff report**:
   - Entries added (new components, new docs)
   - Entries changed (e.g., `primary_doc` changed due to doc renaming)
   - Entries removed (component removed from registry)
4. **Ask the user** whether to apply the diff.
5. If applied, **preserve `confirmed: true` flags** for entries that still match; don't reset human-approved mappings.
6. If the user declines, stop and leave the config untouched.

### Phase 8: Batch question checkpoint (M4)

Present ALL accumulated `unresolved:` questions in one round. For each, offer options when possible (A/B/C); ask open-ended questions only when no options exist.

After answers, update the draft config with the resolutions.

### Phase 9: Write config file

Write `_docs/_repo-config.yaml` using the schema in [templates/repo-config.example.yaml](templates/repo-config.example.yaml).
- Top-level `confirmed_by_user: false` ALWAYS; only the human flips this
- Every entry has `confirmed: ` and (when `false`) `evidence: [...]`
- Append to `assumptions_log:` a new entry for this run

### Phase 10: Review handoff + assumptions footer (M6)

Output:

```
Generated/refreshed _docs/_repo-config.yaml:
  - N components discovered (X confirmed, Y inferred, Z unresolved)
  - M docs located (K primary, L cross-cutting)
  - CI tooling:
  - P unresolved questions resolved this run; Q still open (see config)
  - Assumptions made during discovery:
      - Treated as unified-docs root (only candidate found)
      - Inferred `` primary doc = `` (name match)
      - Commit prefix `` seen in N of last 50 commits

Next step: please review _docs/_repo-config.yaml, correct any wrong
inferences, and set `confirmed_by_user: true` at the top. After that,
monorepo-document, monorepo-cicd, monorepo-status, and monorepo-onboard
will run.
```

Then stop.

## What this skill will NEVER do

- Modify any file other than `_docs/_repo-config.yaml`
- Set `confirmed_by_user: true`
- Record a mapping with zero evidence
- Chain to another skill automatically
- Commit the generated config

## Failure / ambiguity handling

- Internal contradictions in a component (README references files not in code) → surface to user, stop, do NOT silently reconcile
- Docs root cannot be located → record `docs.root: null` and list unresolved question; do not create a new `_docs/` folder
- Parsing fails on `_docs/_repo-config.yaml` (existing file is corrupt) → surface to user, stop; never overwrite silently
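
As a closing illustration, the Phase 7 diff-and-preserve step might look roughly like the Python sketch below. It assumes components are held as a dict keyed by component name, which is a simplification made here rather than the schema in the template file; the helper names `diff_components`, `apply_diff`, and `_facts` are hypothetical. Only the field names `path`, `primary_doc`, `confirmed`, and `evidence` come from the text above.

```python
def _facts(entry: dict) -> dict:
    """The part of an entry that discovery compares (review metadata excluded)."""
    return {k: v for k, v in entry.items() if k not in ("confirmed", "evidence")}


def diff_components(existing: dict, discovered: dict) -> dict:
    """Report which component names were added, changed, or removed."""
    shared = existing.keys() & discovered.keys()
    return {
        "added": sorted(discovered.keys() - existing.keys()),
        "changed": sorted(
            name for name in shared
            if _facts(existing[name]) != _facts(discovered[name])
        ),
        "removed": sorted(existing.keys() - discovered.keys()),
    }


def apply_diff(existing: dict, discovered: dict) -> dict:
    """Build the new component map, to be called only after the user approves.

    An entry keeps `confirmed: true` only if it was confirmed before AND its
    facts still match; everything else is written as unconfirmed.
    """
    merged = {}
    for name, new in discovered.items():
        old = existing.get(name)
        still_matches = old is not None and _facts(old) == _facts(new)
        if still_matches and old.get("confirmed") is True:
            merged[name] = dict(old)  # human-approved mapping is preserved
        else:
            merged[name] = {**new, "confirmed": False}
    return merged
```

Keeping the comparison separate from the merge mirrors the workflow: `diff_components` has no side effects and can be shown to the user, while `apply_diff` runs only once the diff is approved.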