feat(codebase): deep-enrich + graph-aware recall + team-wiki-codebase skill #56
Open
m0Nst3r873 wants to merge 10 commits into
Vendored from team-wiki by @lurkacai (git.woa.com/lurkacai/team-wiki). Import paths adjusted for teamai-cli project structure.

Files copied (all pure deterministic, no AI dependency):
- core/graph-index.schema.ts: graph node/edge types, merge, save/load
- core/wiki-protocol.ts: wiki category/confidence types, slugify
- code-knowledge/code-collector.ts: file collection with git-aware filtering
- code-knowledge/code-extractors.ts: multi-language fact extraction dispatch
- code-knowledge/code-graph.ts: build CodeGraphIndex from facts
- code-knowledge/code-incremental.ts: detect changed files via manifest
- code-knowledge/extractors/*: TS/Python/Go/Java/Rust/Config extractors
- interface-scanner.ts: HTTP/MQ/RPC endpoint detection (5 languages)
- call-chain-tracer.ts: 4-layer call chain tracing
- code-graph-overlay.ts: directory-level architecture nodes
- doc-graph-extractor.ts: extract API/config/error nodes from docs
- manifest-schema.ts: V2 manifest types (entrypoints, responsibilities)
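The "merge" operation listed for core/graph-index.schema.ts can be pictured roughly as below. This is a minimal sketch, not the vendored code: the `GraphNode` fields (slug/title/type) follow the field names mentioned in a later commit of this PR, while `GraphEdge`, the `GraphIndex` shape, and `mergeGraphIndex` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes; the real schema in core/graph-index.schema.ts may differ.
interface GraphNode { slug: string; title: string; type: string; }
interface GraphEdge { from: string; to: string; relation: string; }
interface GraphIndex { nodes: GraphNode[]; edges: GraphEdge[]; }

// Merge two indices. On a slug collision the overlay node replaces the base
// node; edges are de-duplicated by (from, relation, to).
function mergeGraphIndex(base: GraphIndex, overlay: GraphIndex): GraphIndex {
  const nodes = new Map<string, GraphNode>();
  for (const n of base.nodes.concat(overlay.nodes)) {
    nodes.set(n.slug, n);
  }

  const edges = new Map<string, GraphEdge>();
  for (const e of base.edges.concat(overlay.edges)) {
    edges.set(`${e.from}|${e.relation}|${e.to}`, e);
  }

  return {
    nodes: Array.from(nodes.values()),
    edges: Array.from(edges.values()),
  };
}
```

Keying nodes by slug keeps the merge deterministic: the same two inputs always produce the same node and edge order, which matters when the result is written to a file that is diffed in review.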
Wire up vendored modules into the teamai extraction flow:
- adapters/index.ts: unified export layer for all wiki-engine modules
- adapters/templates.ts: router.md + index.md generation templates
- codebase-extract.ts: full extraction pipeline
  collectCode → extractCodeFacts → scanInterfaces → traceCallChains → buildEvidencePages (interfaces.md + call-chains.md) → buildIndexHubOverlay → mergedGraph → graph-index.json → buildModuleSummaries → detectKnowledgeGaps → router/index/hot/gaps
- utils/hook-output.ts: multi-tool Stop hook output formatting
- interface-scanner: HTTP/MQ/RPC detection across languages (12 tests)
- call-chain-tracer: entry detection, layer classification (8 tests)
- code-graph-overlay: buildIndexHubOverlay node/edge generation (5 tests)
- doc-graph-extractor: structure + entity extraction (8 tests)
- hook-output: formatStopHookOutput multi-tool format (6 tests)

All tests use in-memory data, no filesystem/network dependencies.
Bug fixes applied:
- B1: unify graph-index path to .indices/ (was .teamwiki/.indices/)
- B2: fix router.md links (evidence/code/ prefix)
- B3: add teamwiki to safeIgnore
- B4: remove stale .teamwiki/evidence check
- B5: use saveGraphIndex() instead of manual writeFile
- B9: unify graph schema to GraphIndex (remove CodeGraphIndex)
- B13: filter third-party npm imports from relation facts
- B15: priority sort: key files first, then shallow dirs
- B16: generate deterministic overview.md
- B17: rename call-chains to dependency-paths (not runtime calls)
- B18: Python extractor: only service-pattern functions as components
- B19: facts deduplication by kind:name
- B21: doc-graph config pattern restricted to SCREAMING_SNAKE_CASE
- B22: API path pattern no longer requires /v\d*/ prefix

CLI integration:
- Add --extract, --incremental, --project, --max-files to codebase command
- Add extract branch to codebase-cmd.ts
- Add teamwiki/ to .gitignore
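Two of the fixes above (B19 and B21) are easy to picture in code. The sketch below is illustrative only: the `Fact` shape, the function names, and the exact regular expression are assumptions, and only the `kind:name` key and the SCREAMING_SNAKE_CASE restriction come from the commit message. The pattern here additionally assumes at least one underscore, so that single all-caps words such as `API` are not treated as config keys.

```typescript
// Hypothetical fact shape; only `kind` and `name` are taken from the commit message.
interface Fact { kind: string; name: string; file: string; }

// B19 sketch: keep the first fact seen for each `kind:name` key.
function dedupeFacts(facts: Fact[]): Fact[] {
  const seen = new Set<string>();
  const out: Fact[] = [];
  for (const f of facts) {
    const key = `${f.kind}:${f.name}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(f);
  }
  return out;
}

// B21 sketch: an assumed SCREAMING_SNAKE_CASE pattern (uppercase letters and
// digits, at least one underscore, starting with a letter).
const SCREAMING_SNAKE_CASE = /^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+$/;

function looksLikeConfigKey(token: string): boolean {
  return SCREAMING_SNAKE_CASE.test(token);
}
```

With this pattern, `MAX_RETRIES` would be picked up as a config node while `maxRetries` and `Max_Retries` would be ignored.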
New modules (vendored/adapted from team-wiki by @lurkacai):
- knowledge-reconciler.ts: 9-phase product↔code reconciliation
- reconciler-v2-types.ts: NumericConfidence scoring types
- manifest-compiler.ts: consume ManifestV2 → wiki pages

New teamai modules:
- enrich-with-ai.ts: per-module AI responsibility inference + repo-level domain classification via callClaudeParallel
- rebuild-wiki-index.ts: generate table-based router.md + stats index.md from _manifest.json + _domains.json + overview.md
- utils/git.ts: add autoPushTeamRepo for auto-push after import

Updated:
- wiki-engine/adapters/index.ts: export reconciler + confidence types
- wiki-engine/adapters/templates.ts: DomainGroup router + IndexStats
…ication
- import-repo.ts: add reconcile call after extraction, remove entire legacy AI domain classification flow (recommendDomain → domains.yaml)
- import-org.ts: add rebuildWikiIndex + autoPush after batch import
- codebase-extract.ts: integrate AI enrichment (enrichWithAI + writeManifest + _domains.json), domain-grouped router/index
- Tests updated to match new import flow
1. deep-enrich.ts: background THPC-quality knowledge generation
   - Phase 1: Component design docs per module (parallel AI calls)
   - Phase 2: Architecture overview document
   - Phase 3: Graph documents G1-G3 (deterministic)
   - Progress tracking with _review/progress.json resume support
2. skills/team-wiki-codebase/: bundled deep generation skill (by @lurkacai)
   - 909-line SKILL.md methodology (K0-K4 phases)
   - Sub-agents: kb-doc-generator, graph-rag-agent
   - Registered in builtin-skills.ts for auto-deploy on pull
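The resume behaviour described for Phase 1 can be sketched as pure functions over a progress record. Everything here is an assumption made for illustration: the actual format of _review/progress.json is not shown in this PR, and `Progress`, `pendingModules`, `toBatches`, and `markDone` are hypothetical names.

```typescript
// Hypothetical progress shape; the real _review/progress.json format is not shown in this PR.
interface Progress { completed: string[]; }

// Return the modules that still need a design doc, preserving input order.
function pendingModules(all: string[], progress: Progress): string[] {
  const done = new Set(progress.completed);
  return all.filter((m) => !done.has(m));
}

// Split pending work into fixed-size batches for parallel AI calls.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Record a finished module so a later run can skip it. Returns a new object
// rather than mutating, so the caller decides when to persist it.
function markDone(progress: Progress, moduleName: string): Progress {
  if (progress.completed.indexOf(moduleName) !== -1) return progress;
  return { completed: progress.completed.concat([moduleName]) };
}
```

A run that is interrupted after some modules finish would, under this scheme, reload the record, call `pendingModules`, and continue with the remaining batches instead of regenerating every document.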
Security (M1/M2/M4):
- enrich-with-ai.ts: sanitizeForPrompt() for prompt injection defense
- import-repo.ts: independent JSON.parse try/catch with warn logging
- knowledge-reconciler.ts: reject '../' and absolute paths

Integration from main:
- import-repo.ts: deep-enrich trigger + reconcile call
- index.ts: hidden deep-enrich command + recall depth option
- recall.ts + code-knowledge-recall.ts: codebase graph recall
- contribute-check.ts: scoring adjustments + hook output fix
- hook-handlers.ts: formatStopHookOutput multi-tool compat
- clone.ts: HTTPS upgrade + SSH conversion
- pull.ts: MCP registration + teamwiki sync
- ci/extract-mr.ts: graph change detection in MR pipeline
- README: teamwiki docs + CLI command table simplification
- Various test updates to match new behavior
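The path check added to knowledge-reconciler.ts ("reject '../' and absolute paths") could look something like the following. This is a sketch under stated assumptions, not the code in the PR: `isSafeRelativePath` is a hypothetical name, and treating Windows drive letters and backslash separators as unsafe is an extra precaution that the commit message does not mention.

```typescript
// Returns true when `p` is a relative path with no parent-directory segment.
// Rejects: empty strings, POSIX absolute paths, paths starting with a
// backslash (UNC or rooted Windows paths), Windows drive letters, and any
// '..' segment under either separator.
function isSafeRelativePath(p: string): boolean {
  if (p.length === 0) return false;
  if (p.charAt(0) === "/" || p.charAt(0) === "\\") return false;
  if (/^[A-Za-z]:/.test(p)) return false;
  const segments = p.split(/[\\/]+/);
  return segments.indexOf("..") === -1;
}
```

Checking whole segments rather than searching for the substring `..` avoids rejecting legitimate names such as `notes..draft.md` while still catching `a/../../b`.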
1. Recall agent model_hint (builtin-rules.ts + agents/teamai-recall.md):
   - Guide main agent to use mid-tier model for recall subagent
   - Balances cost/latency with tool-calling capability requirement
2. Auto-recall test fix (auto-recall.test.ts):
   - Add missing version/type/domain/df fields to test search index
   - Prevents isLegacyIndex() from triggering rebuild in isolated tests
Critical recall fixes:
- B7: use protocol loadGraphIndex instead of local hardcoded version
- B8: fix graph boost path resolution: match by slug/title, not raw file paths (previously graph boost was dead code due to path format mismatch)
- B10: BM25 document length uses raw token count, not deduplicated count
- B11: normalize BM25 scores to 0-10 range before merging with learnings
- B14: add CJK bigram tokenization for Chinese query matching
- B24: extend graph boost to 2-hop neighbors (halved weight)
- B25: deep-enrich BATCH increased 2→5

Also: camelCase splitting in tokenizer, GraphNode field migration (slug/title/type instead of id/label/kind/file) in extract-mr, wiki-lint
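The tokenizer and scoring fixes above (B10, B11, B14, and camelCase splitting) can be illustrated with a small self-contained sketch. None of this is the code from code-knowledge-recall.ts: the function names, the CJK range (the basic CJK Unified Ideographs block only), the BM25 parameters `k1 = 1.2` and `b = 0.75`, and scaling by the maximum score to reach the 0-10 range are all assumptions made for the example.

```typescript
// Split an identifier such as getUserById into lowercase parts.
function splitCamelCase(word: string): string[] {
  return word
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());
}

// Overlapping two-character windows over a run of CJK ideographs.
function cjkBigrams(run: string): string[] {
  if (run.length < 2) return [run];
  const out: string[] = [];
  for (let i = 0; i + 1 < run.length; i++) out.push(run.slice(i, i + 2));
  return out;
}

// Tokenize mixed text. Duplicates are kept, so tokens.length is the raw
// document length that BM25 expects for `dl`.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  const re = /[\u4e00-\u9fff]+|[A-Za-z0-9]+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const chunk = m[0];
    const parts = /[\u4e00-\u9fff]/.test(chunk) ? cjkBigrams(chunk) : splitCamelCase(chunk);
    for (const p of parts) tokens.push(p);
  }
  return tokens;
}

// BM25 contribution of one query term. `dl` is the raw token count of the
// document and `avgdl` the average raw token count across the corpus.
function bm25Term(tf: number, df: number, n: number, dl: number, avgdl: number,
                  k1 = 1.2, b = 0.75): number {
  const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
  return (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * dl) / avgdl));
}

// Scale scores into the 0-10 range (assumed scheme: divide by the maximum).
function normalizeTo10(scores: number[]): number[] {
  let max = 0;
  for (const s of scores) if (s > max) max = s;
  return scores.map((s) => (max > 0 ? (s / max) * 10 : 0));
}
```

If `dl` were computed from a de-duplicated token set, a long document that repeats a few terms would look as short as a brief one, and the length penalty in the denominator would stop working. Counting every token keeps that penalty meaningful.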
Summary
Part 3 of 3. Depends on #55 (`feat/import-enrichment-v2`).

- `deep-enrich.ts`: Background AI knowledge generation (component design docs + architecture overview + graph docs G1-G3) with resume support via `_review/progress.json`
- `code-knowledge-recall.ts`: BM25 + graph-boost retrieval engine for the teamwiki/ knowledge graph
- `codebase-upgrade-wiki.ts`: Migration from `docs/team-codebase/` to `teamwiki/` format
- `codebase-wiki-lint.ts`: Graph health diagnostics (connectivity, orphans, staleness)
- `team-wiki-codebase` skill bundle (909-line methodology)
- `--depth` option (route/context/lookup), graph-aware agent instructions, model_hint

Critical recall pipeline fixes

- BM25 `dl` used the deduplicated token count, which broke length normalization. Fixed.
- camelCase splitting in the tokenizer (`getUserById` → get, user, by, id).

How it's reachable

- `teamai deep-enrich --project <slug>` → generate docs/*.md
- `teamai codebase --lint` (when teamwiki/ exists) → wiki-lint health check
- `teamai codebase --upgrade-wiki` → migrate legacy format
- `teamai recall --depth lookup <query>` → graph-boosted retrieval

Test plan

- `npx tsc --noEmit`: zero errors
- `npx vitest run`: 1505 tests passed

Dependency chain

Replaces #52 + #53.