feat+fix: C# support, investigation-grade trace, BM25 search, execution flows, channels, IMPORTS resolution #162
Open
Koolerx wants to merge 29 commits into DeusData:main
added 7 commits
March 27, 2026 12:57
When trace_call_path targets a Class or Interface node, the BFS now resolves through DEFINES_METHOD edges to find the actual callable methods, then runs BFS from each method and merges results. Previously, tracing a class name returned 0 results because Class nodes have no direct CALLS edges — only their Method children do. Also expands edge types to include HTTP_CALLS and ASYNC_CALLS alongside CALLS for broader cross-service coverage. Node selection improved: when multiple nodes share the same name (e.g. a Class and its constructor Method), prefer the Class for resolution since constructors rarely have interesting outbound CALLS. Tested: C# class tracing went from 0 to 87 callees and 8 callers. TS repos unchanged at 50 callers.
…free detect_changes was using yyjson_mut_arr_add_str / yyjson_mut_obj_add_str which borrow pointers. The file name came from a stack buffer reused each fgets() iteration, and node names were freed by cbm_store_free_nodes before serialization. This caused corrupted output with null bytes embedded in filenames (e.g. 'CLAUDE.md\0\0\0ings.json'). Switch to yyjson_mut_arr_add_strcpy / yyjson_mut_obj_add_strcpy which copy the strings into yyjson's internal allocator, making them safe across the buffer reuse and free boundaries.
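The borrow-versus-copy distinction behind this fix can be shown without yyjson. The sketch below is a minimal stand-in, not the project's code: `name_list_t`, `name_list_add`, and `name_list_free` are hypothetical names, and the list stores both a borrowed pointer and an owned copy so the two behaviours can be compared side by side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 8 /* hypothetical fixed capacity for the sketch */

typedef struct {
    const char *borrowed[MAX_ITEMS]; /* points into caller-owned memory */
    char *copied[MAX_ITEMS];         /* heap duplicates owned by the list */
    int count;
} name_list_t;

/* Records `name` twice: once as a borrowed pointer, once as a copy. */
void name_list_add(name_list_t *list, const char *name) {
    assert(list != NULL && name != NULL && list->count < MAX_ITEMS);
    size_t n = strlen(name) + 1;
    char *dup = malloc(n);
    assert(dup != NULL);
    memcpy(dup, name, n);
    list->borrowed[list->count] = name; /* aliases the caller's buffer */
    list->copied[list->count] = dup;    /* survives buffer reuse */
    list->count++;
}

/* Releases the owned copies; borrowed pointers are not ours to free. */
void name_list_free(name_list_t *list) {
    assert(list != NULL);
    for (int i = 0; i < list->count; i++) free(list->copied[i]);
    list->count = 0;
}
```

When the caller reuses one line buffer for every entry, each borrowed slot ends up pointing at whatever the buffer held last, while the copied slots keep the values seen at insertion time. That is the same failure shape as the corrupted filenames described above.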
Vendored/minified JS files (tsc.js, typescript.js) inside non-JS repos produce false positive routes when the Express route extractor matches JS operators and keywords as route paths.
Add a validation filter that rejects:
- JS/TS operators: !, +, ++, -, --, :, ~
- JS/TS keywords: void, null, true, false, throw, this, typeof, etc.
- Single-character non-slash paths (*, ?, #)
- Paths with no alphanumeric or slash characters
Also trims leading/trailing whitespace before comparison to catch 'void ' and 'throw ' variants from minified source.
Tested: Routes went from 42 (20 garbage) to 22 real routes in test C# repo.
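A filter with the rules listed in this commit might look like the sketch below. The function name `route_path_is_plausible` and the `kRejected` table are assumptions for illustration; the table contains only the operators and keywords the commit message names, so the real blocklist (the "etc.") may be longer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical blocklist: only the entries named in the commit message. */
static const char *const kRejected[] = {
    "!", "+", "++", "-", "--", ":", "~",
    "void", "null", "true", "false", "throw", "this", "typeof",
};

/* Returns true when `raw` looks like a real route path. */
bool route_path_is_plausible(const char *raw) {
    assert(raw != NULL);

    /* Trim leading and trailing whitespace so "void " is caught too. */
    while (*raw != '\0' && isspace((unsigned char)*raw)) raw++;
    size_t len = strlen(raw);
    while (len > 0 && isspace((unsigned char)raw[len - 1])) len--;
    if (len == 0) return false;

    /* A single-character path is only acceptable when it is "/". */
    if (len == 1 && raw[0] != '/') return false;

    /* Exact match against the operator/keyword blocklist. */
    for (size_t i = 0; i < sizeof kRejected / sizeof kRejected[0]; i++) {
        if (strlen(kRejected[i]) == len && strncmp(raw, kRejected[i], len) == 0)
            return false;
    }

    /* Require at least one alphanumeric or slash character. */
    for (size_t i = 0; i < len; i++) {
        if (isalnum((unsigned char)raw[i]) || raw[i] == '/') return true;
    }
    return false;
}
```

The checks run cheapest-first, and the blocklist comparison uses the trimmed length so that trailing whitespace from minified source does not defeat the match.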
The tree-sitter C# grammar represents class inheritance via 'base_list' child nodes (e.g. 'class Foo : Bar, IBaz'). The extract_base_classes function didn't handle this node type, causing most C# inheritance to be missed. Add explicit traversal of base_list children, extracting type identifiers from both direct identifier nodes and wrapper nodes (simple_base_type, primary_constructor_base_type). Generic type arguments are stripped for resolution (List<int> → List). Tested: INHERITS edges went from 210 to 1,588 in test C# repo (7.5x improvement). Verified results include real C# domain classes (e.g. ClassA→BaseClassB, TestSuite→TestsBase, etc.).
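The generic-argument stripping step (List<int> → List) can be sketched in isolation. `strip_generic_args` is a hypothetical helper, not the function used in extract_base_classes, and it assumes the type name is already available as a plain string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `type_name` into `out` up to, but not including, the first '<'.
 * Returns 0 on success, -1 if `out` is too small. */
int strip_generic_args(const char *type_name, char *out, size_t out_size) {
    assert(type_name != NULL && out != NULL);
    size_t len = strcspn(type_name, "<"); /* full length if no '<' present */
    if (len + 1 > out_size) return -1;
    memcpy(out, type_name, len);
    out[len] = '\0';
    return 0;
}
```

Cutting at the first '<' also handles nested arguments such as Dictionary<string, List<int>>, since everything after the outermost opening bracket is dropped.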
The get_architecture MCP handler was only returning node/edge label counts (identical to get_graph_schema). The store has a full architecture analysis function cbm_store_get_architecture() that computes languages, hotspots, routes, entry points, packages, clusters, and layers, but it was never called from the MCP handler.
Wire all architecture aspects into the response:
- languages: file counts per language
- hotspots: highest fan-in functions
- routes: HTTP route definitions
- entry_points: main/handler functions
- packages: top-level module groupings
- clusters: Louvain community detection results
Use strcpy variants for all architecture strings since they're freed by cbm_store_architecture_free before any potential reuse.
Tested: get_architecture went from 0 for all fields to 10 languages, 10 hotspots, 13 routes, 20 entry points, 15 packages.
The cbm_louvain() function was fully implemented but never called. Add arch_clusters() that loads all callable nodes and CALLS edges, runs Louvain community detection, groups results by community ID, and populates cbm_cluster_info_t with member counts and top-5 nodes per cluster sorted by largest communities first. Wire into cbm_store_get_architecture() dispatch for the 'clusters' aspect. Cap output at 20 clusters. Top nodes per cluster are selected by iterating community members (degree-based sorting can be added later). Tested: Test C# repo went from 0 to 20 clusters. Largest cluster has 3,205 members (test code), second has 1,881 (core API functions).
Add cbm_extract_hapi_routes() that handles the Hapi.js route registration
pattern: { method: 'GET', path: '/api/...', handler: ... }. Uses a
mini-parser that finds method:/path: property pairs within the same object
literal by tracking enclosing brace scope. Also extracts handler references.
Wired into both the prescan (parallel) path in pass_parallel.c and the
disk fallback path in pass_httplinks.c for both per-function and
module-level source scanning.
Tested: Test TS/Hapi repo went from 0 to 1,665 routes.
CBM now finds every route definition AND API call site, compared to
only 12 from external service proxy routes with the previous tool.
33a7d1d to
58fff9e
Compare
added 10 commits
March 27, 2026 14:31
Add a nodes_fts FTS5 virtual table synced via triggers for INSERT/UPDATE/DELETE. Enable SQLITE_ENABLE_FTS5 in both production and test Makefile flags.
New 'query' parameter on search_graph: when set, uses FTS5 MATCH with bm25() ranking instead of regex matching. Multi-word queries are tokenized into OR terms for broad matching (e.g. 'authentication middleware' matches nodes containing either word, ranked by relevance).
The direct B-tree dump pipeline bypasses SQLite triggers, so add a bulk FTS5 backfill step after indexing:
  INSERT INTO nodes_fts SELECT id, name, qualified_name, label, file_path FROM nodes
Add cbm_store_exec() public API for raw SQL execution. Falls back gracefully to regex path if FTS5 is unavailable.
Tested: 'authentication middleware' query returns 242 ranked results (was 0). 'session recording upload' returns 4,722 ranked results with relevant routes, controllers, and constants at the top.
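The multi-word tokenization described in this commit amounts to joining whitespace-separated words with OR. The sketch below shows one way to do that; `fts_build_or_query` is a hypothetical name, and the sketch does not quote or escape characters that have special meaning in FTS5 query syntax, which a production version would likely need.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Joins whitespace-separated words of `input` with " OR " into `out`.
 * Returns the number of characters written, or -1 if `out` is too small. */
int fts_build_or_query(const char *input, char *out, size_t out_size) {
    assert(input != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    int words = 0;
    out[0] = '\0';

    const char *p = input;
    while (*p != '\0') {
        while (*p != '\0' && isspace((unsigned char)*p)) p++;  /* skip gaps */
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) p++; /* scan word */
        size_t word_len = (size_t)(p - start);
        if (word_len == 0) break; /* only trailing whitespace remained */

        size_t sep_len = words > 0 ? 4 : 0; /* length of " OR " */
        if (used + sep_len + word_len + 1 > out_size) return -1;
        if (sep_len > 0) {
            memcpy(out + used, " OR ", sep_len);
            used += sep_len;
        }
        memcpy(out + used, start, word_len);
        used += word_len;
        out[used] = '\0';
        words++;
    }
    return (int)used;
}
```

The resulting string would then be bound as the MATCH argument of a query ordered by bm25(nodes_fts), so that nodes containing either word are returned and ranked by relevance.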
… + Louvain
Add process detection as a post-indexing pass that discovers cross-community execution flows:
1. Find all entry point nodes (is_entry_point=true or Route label)
2. Load CALLS edges and run Louvain community detection
3. BFS from each entry point to depth 8, max 200 visited nodes
4. Identify the deepest node that crosses a Louvain community boundary
5. Name the flow 'EntryPoint → Terminal' with process_type=cross_community
6. Store to new processes + process_steps tables
New schema: 'processes' table (id, project, label, process_type, step_count, entry_point_id, terminal_id) and 'process_steps' table (process_id, node_id, step).
New store API: cbm_store_detect_processes(), cbm_store_list_processes(), cbm_store_get_process_steps() with corresponding free functions.
New MCP tool: list_processes returns up to 300 processes ordered by step count.
Tested: TS/Hapi monorepo detects 300 cross-community processes, matching the flow count from competing tools. Examples: 'ssoCallbackHandler → catchUnexpectedResponse', 'exportCourse → sendSQSMessage'.
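Steps 3 and 4 can be sketched as a bounded BFS over an edge list. This is an illustration under stated assumptions, not the code in cbm_store_detect_processes(): the names `edge_t` and `find_cross_community_terminal` are hypothetical, community IDs are taken as precomputed, and "crosses a community boundary" is interpreted as "belongs to a different community than the entry point".

```c
#include <assert.h>

#define MAX_NODES 256   /* hypothetical capacity for this sketch */
#define MAX_DEPTH 8     /* depth limit from the commit message */
#define MAX_VISITED 200 /* visited-node cap from the commit message */

typedef struct { int src, dst; } edge_t;

/* Returns the deepest node reachable from `entry` whose community differs
 * from the entry's community, or -1 if the BFS never leaves it. */
int find_cross_community_terminal(const edge_t *edges, int edge_count,
                                  const int *community, int node_count,
                                  int entry) {
    assert(node_count > 0 && node_count <= MAX_NODES);
    assert(entry >= 0 && entry < node_count);

    int depth[MAX_NODES];
    int queue[MAX_NODES]; /* each node is enqueued at most once */
    for (int i = 0; i < node_count; i++) depth[i] = -1;

    int head = 0, tail = 0, visited = 1;
    int best = -1, best_depth = 0;
    depth[entry] = 0;
    queue[tail++] = entry;

    while (head < tail) {
        int cur = queue[head++];
        if (community[cur] != community[entry] && depth[cur] > best_depth) {
            best = cur;
            best_depth = depth[cur];
        }
        if (depth[cur] == MAX_DEPTH) continue; /* do not expand further */

        for (int e = 0; e < edge_count && visited < MAX_VISITED; e++) {
            if (edges[e].src != cur) continue;
            int next = edges[e].dst;
            if (next < 0 || next >= node_count || depth[next] != -1) continue;
            depth[next] = depth[cur] + 1;
            queue[tail++] = next;
            visited++;
        }
    }
    return best;
}
```

Scanning the whole edge list per node keeps the sketch short; an adjacency index would be the obvious replacement on a graph with many edges.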
Detect emit/listen channel patterns in JS/TS/Python source files during indexing. Extracts socket.emit/on, io.emit/on, emitter.emit/on patterns with a regex scanner that identifies receiver names against a whitelist of known channel communicators (socket, io, emitter, eventBus, etc.). Filters out generic Node.js stream events (error, close, data, etc.) and classifies transport as 'socketio' or 'eventemitter' based on receiver name.
New schema: 'channels' table (project, channel_name, direction, transport, node_id, file_path, function_name) with indexes on channel_name and project.
New store API: cbm_store_detect_channels() scans source from disk for all indexed Function/Method/Module nodes in JS/TS/Python files. cbm_store_find_channels() queries by project and/or channel name with partial matching. Automatic cross-repo matching at query time (no link step).
New MCP tool: get_channels returns matched channels with emitter/listener info, filterable by channel name and project.
Tested: TS monorepo detects 210 channel references including Socket.IO subscribe/unsubscribe flows between UI and server.
node_prop() previously returned empty string for any property not in the hardcoded column list (name, qualified_name, label, file_path, start_line, end_line). Now falls through to json_extract_prop() on the node's properties_json field for unknown properties.
Enables Cypher queries like:
  WHERE n.is_entry_point = 'true'
  WHERE n.is_test = '1'
  WHERE n.confidence > '0.5'
Also adds 'file' as an alias for 'file_path' and 'id' for the node ID.
Tested: 'MATCH (n:Function) WHERE n.is_entry_point = true' returns 10 controller handlers (previously 0).
…results
QFix 1: trace_call_path disambiguation + file paths:
- When multiple callable symbols match, includes a 'candidates' array with name, label, file_path, line for each (like IDE go-to-definition)
- Every BFS result node now includes file_path, label, start_line
- Adds matched_file, matched_label, matched_line to the root response
QFix 2: domain-weighted flow terminal naming:
- Reduced BFS max_results from 200 to 50 to prevent generic utility functions from becoming terminals
- Terminal candidates scored by: name length (domain names are longer), CamelCase bonus, domain verb bonus (Handler, Controller, Service, etc.), penalty for generic names (update, get, set, findOne, push, etc.)
- Result: 2/300 flows end in generic names (was ~280/300)
- Step count range: 3-51 (was 3-201)
QFix 3: FTS5 search structural filtering:
- Exclude File/Module/Folder/Section/Variable/Project nodes from results
- Structural boost: Function/Method +10, Class/Interface/Type +5, Route +8
- High fan-in bonus: nodes with >5 CALLS in-degree get +3
- Result: 'authentication middleware' returns verifyJwt, apiMiddleware, createAuthRequestConfig (was returning Folder/Module/Section noise)
Gap 1: Semantic cluster labels:
Replace auto-numbered 'Cluster_N' with directory-derived semantic labels. For each cluster, sample up to 50 member file paths, extract the most common non-generic directory segment (skip src/lib/dist/test/node_modules/shared), and convert the result to TitleCase. Falls back to 'Cluster_N' when no directory has >= 3 occurrences.
Result: 'Services', 'Components', 'Controllers', 'Storage', 'Models', 'Stores', 'Scenarios', 'Courses', matching competing tool quality.
Gap 2: Process participation in trace_call_path:
After BFS traversal, query the processes table to find all execution flows the traced function participates in (as entry point, terminal, or by name substring match in the flow label). Includes up to 20 flows with label, process_type, and step_count directly in the trace response, so no separate tool call is needed.
…ss steps
Major rewrite of trace_call_path output for investigation-grade quality:
Categorized edges (Fixes A+D):
- incoming: { calls: [...], imports: [...], extends: [...] }
- outgoing: { calls: [...], has_method: [...], extends: [...] }
- Separate transitive_callers for depth > 1 (avoids noise in main results)
Each category queried independently via single-hop BFS on specific edge types.
Broader caller coverage (Fix A):
- Include USAGE and RAISES edges alongside CALLS for incoming queries
- Query both the Class node and its methods as BFS roots
- Result: MeteorError upstream goes from 9 to 39 callers
Noise elimination (Fix C):
- Default depth 1 for categorized results (direct only)
- Transitive callers isolated in separate field, capped at 50
- No more 106 render() methods polluting results
New get_impact tool (Fix F):
- BFS upstream/downstream with depth-grouped results
- d1_will_break / d2_likely_affected / d3_may_need_testing
- Risk assessment: LOW / MEDIUM / HIGH / CRITICAL based on d1 count
- Affected processes cross-referenced by name
- Tested: protectedUpdate returns CRITICAL (38 direct, 162 transitive)
New get_process_steps tool (Fix E):
- Returns ordered step list for a specific process ID
- Each step includes name, qualified_name, file_path
- Enables step-by-step flow debugging
Fix crash (double-free) when tracing nodes with 0 in-degree and 0 out-degree (e.g. Type nodes, empty Class stubs). Detect early via cbm_store_node_degree and return basic match info without attempting BFS traversal. Also move the traversal result array from stack to heap to prevent stack smashing with many start IDs. Add fuzzy name fallback: when exact name match returns 0 results, run a regex search with '.*name.*' pattern and return up to 10 suggestions with name, label, file_path, line. This handles cases like searching for 'RecordingSession' when only 'ContinuousRecordingSessionDataGen' exists.
Three fixes for C# delegate and event subscription patterns that were
invisible to the call graph:
Fix 1 — Bare method reference subscription:
event += MethodName creates a CALLS edge from the subscribing method
to the handler. Detects assignment_expression with += operator where
the RHS is an identifier or member_access_expression.
e.g. socket.OnConnected += SocketOnConnected
Fix 2 — Delegate .Invoke() resolution:
delegate?.Invoke(args) resolved to 'Invoke' which matches nothing.
Now detects conditional_access_expression and member_access_expression
where the method is 'Invoke', extracts the receiver (delegate property)
name as the call target instead.
e.g. OnConnected?.Invoke(this, e) → CALLS edge to 'OnConnected'
Fix 3 — Lambda event body scope attribution:
Lambda expressions inside += assignments no longer create a new scope
boundary. Calls inside the lambda body are attributed to the enclosing
method that subscribes the event, not to an anonymous lambda scope.
This means all handler logic is correctly attributed to the method
that registers the event subscription.
e.g. socket.OnError += (s, e) => { ErrorOnce(...); } attributes
the ErrorOnce call to the method containing the += statement.
Tested on C# codebase: SocketOnConnected gained 1 incoming caller
(from += subscription) and 1 outgoing call (from ?.Invoke resolution).
InitializeExternalClient gained 10 additional outgoing calls from
lambda body attribution (30 total, up from 20).
Fix A — Class node 0-degree early exit:
The crash guard that returns early for nodes with 0 CALLS edges was
incorrectly catching Class/Interface nodes that have DEFINES_METHOD and
INHERITS edges (cbm_store_node_degree only counts CALLS). Re-add the
is_class_like exemption so Class nodes always proceed to DEFINES_METHOD
resolution. Cap method resolution to 5 methods to prevent excessive BFS.
Fix A2 — has_method uses Class node ID:
The DEFINES_METHOD BFS was using method start_ids (from class resolution)
as the BFS root, but DEFINES_METHOD edges go FROM the Class TO Methods.
Use the original Class node ID for the has_method query.
Result: 30 methods found (GitNexus: 29), extends chain shown.
Fix B1 — Add .cs to channel detection file filter:
Channel detection SQL now includes .cs files alongside JS/TS/Python.
Fix B2 — C# channel extraction with constant resolution:
New cbm_extract_csharp_channels() in httplink.c that handles:
- const string CONSTANT = "value" → builds name-to-value map
- .Emit(CONSTANT, ...) → resolves to string value, marks as emit
- .OnRequest<T>(CONSTANT, ...) → resolves to string value, marks as listen
- .Emit("literal", ...) → direct string literal matching
Result: 73 channel references, 35 unique channels in C# repo (was 0).
added 7 commits
March 29, 2026 12:49
…st radius
When a Class and its Constructor share the same name (common in C#/Java), get_impact previously picked the Constructor (which has 0 incoming CALLS), yielding empty blast radius results for any class query.
Now mirrors trace_call_path's disambiguation logic:
- Prefers Class node over same-named Constructor/Method
- Expands through DEFINES_METHOD edges to get all method node IDs
- Runs BFS from each method and merges results (dedup by closest hop)
- Caps at 30 methods per class (vs trace's 5) for comprehensive coverage
- Improved affected_processes matching: checks d=1 caller names too
Tested on a 26K-node C# monolith: 'UserService' went from 0 callers to 16 direct callers, 19 total affected, HIGH risk, 20 affected processes.
Previously only JS/TS exports and lowercase 'main' were recognized as entry points, causing 0 execution flows for C#/Java repos.
Changes:
- Case-insensitive main detection (strcasecmp): fixes C# 'Main' and Java 'main' in both extract_func_def and push_method_def paths
- C# Windows Service lifecycle: OnStart, OnStartImpl, Run, Execute, Configure, ConfigureServices
- C# ASP.NET decorators: [HttpGet], [HttpPost], [Route], [ApiController]
- C# test decorators: [TestMethod], [Fact], [Test]
- Java patterns: start, configure, init, run, handle
- Java Spring/JAX-RS: @RequestMapping, @GetMapping, @PostMapping, etc.
- Java JUnit/lifecycle: @Override, @Test, @Scheduled, @Bean
Critical fix: push_method_def() (class methods) was missing entry point detection entirely; only extract_func_def() (standalone functions) had it.
Tested: C# monolith 1→69 flows, Java/Vert.x repo 0→300 flows, C# desktop app 2→280 flows + 33 routes discovered.
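The name-based part of this detection can be sketched as follows. `is_entry_point_name` and the `kCsharpLifecycle` table are hypothetical; the sketch covers only the case-insensitive main check and the C# lifecycle names listed above, and it assumes lifecycle names are matched case-sensitively, which the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical table: the C# lifecycle names from the commit message. */
static const char *const kCsharpLifecycle[] = {
    "OnStart", "OnStartImpl", "Run", "Execute",
    "Configure", "ConfigureServices",
};

/* Returns true when a function or method name marks an entry point. */
bool is_entry_point_name(const char *name) {
    assert(name != NULL);

    /* Case-insensitive so that C# 'Main' and Java 'main' both match. */
    if (strcasecmp(name, "main") == 0) return true;

    /* Assumption: lifecycle names are compared exactly. */
    size_t n = sizeof kCsharpLifecycle / sizeof kCsharpLifecycle[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, kCsharpLifecycle[i]) == 0) return true;
    }
    return false;
}
```

Calling one shared predicate from both the standalone-function path and the class-method path is one way to avoid the gap this commit fixes, where only one of the two paths performed the check.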
Channel deduplication:
- Added UNIQUE index on channels(project, channel_name, direction, file_path, function_name) to prevent duplicate rows at insert time
- Changed INSERT to INSERT OR IGNORE
- Added DISTINCT to all channel SELECT queries
- Fixed SQL injection in channel DELETE (was snprintf, now parameterized)
Cypher count(DISTINCT ...):
- Parser now accepts DISTINCT keyword inside aggregate functions: count(DISTINCT n.name), count(DISTINCT n.file_path), etc.
- Added distinct_arg flag to cbm_return_item_t
- Executor tracks seen values per-column and only increments count for unique values when distinct_arg is set
- Proper cleanup of distinct_seen arrays in both WITH and RETURN paths
Enables queries like:
  MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS'
  RETURN count(DISTINCT n.name) as unique_callees
Adds WHERE NOT EXISTS { MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS' }
support for anti-join queries like dead-code detection.
Parser: extends parse_not_expr to recognize NOT EXISTS { MATCH ... WHERE ... }
as a correlated subquery. Creates EXPR_NOT_EXISTS expression node with
sub_pattern and sub_where fields.
Executor: two evaluation paths for performance:
- Fast path (O(1) per node): when inner pattern has exactly 1 hop and one
endpoint is bound from outer scope, directly queries edges by source/target
ID. No full node scan needed.
- Slow path: full subquery expansion for complex/multi-hop patterns.
Threading: eval_expr and eval_where now accept (store, project, max_rows)
parameters to support correlated subquery expansion. All 5 call sites updated.
Enables queries like:
MATCH (n:Function) WHERE NOT EXISTS { MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS' }
RETURN n.name, n.file_path LIMIT 20
Tested: finds 10 dead functions in a 216-function JS codebase in <1 second.
Cross-repo channels: when get_channels is called without a project parameter, iterates ALL indexed project .db files in the cache directory, queries each for matching channels, and merges results. Enables cross-service message flow tracing (e.g., find all repos that emit/listen on 'UserCreated').
has_property in trace: trace_call_path now includes outgoing.has_property section for Class/Interface nodes, showing all property nodes linked via HAS_PROPERTY edges, with property name, file path, and line number.
Extracts property_declaration, indexer_declaration, event_declaration, and event_field_declaration from C# class bodies as 'Property' label nodes. Previously these were completely invisible to the knowledge graph. Creates HAS_PROPERTY edges from Class → Property in both parallel and serial indexing paths (pass_parallel.c, pass_definitions.c). Extracted metadata: property name, qualified name, file path, line range, declared type (from type field), decorators, export status. Tested: C# monolith (26K nodes) gained 3,470 Property nodes and 6,943 new edges including HAS_PROPERTY. trace_call_path now shows 5 properties for a typical service class.
…cope
C/C++ function_definition nodes have no 'name' field — the name is buried
in a declarator chain (function_definition → declarator → function_declarator
→ declarator → identifier). Both compute_func_qn() in extract_unified.c
and func_node_name() in helpers.c used ts_node_child_by_field_name('name')
which returns NULL for C/C++, causing all CALLS edges to be attributed to
the File node instead of the containing Function.
Fix: walk the C/C++ declarator chain (up to 8 levels) to find the identifier.
Handles: identifier, field_identifier, qualified_identifier, scoped_identifier.
Also unwraps template_declaration → function_definition for C++ templates.
Fixes C, C++, CUDA, and GLSL function scope resolution.
Tested: C++ desktop app went from 0 Function→Function CALLS edges to 10,
enabling process detection from entry points for the first time.
added 5 commits
March 29, 2026 14:50
Adds entry point detection for C/C++ patterns in both extract_func_def and push_method_def paths:
- WinMain, wWinMain, wmain, _tmain (Win32 console/GUI apps)
- DllMain (DLL entry points)
- InitInstance, OnInitDialog (MFC framework entry points)
These join the existing case-insensitive main() detection to cover the full spectrum of C/C++ application architectures.
Process detection now follows HANDLES, HTTP_CALLS, and ASYNC_CALLS edges
in addition to CALLS when building Louvain communities and running BFS
from entry points. Previously only CALLS edges were traversed, making
Express/Hapi route→handler flows invisible to process detection.
Changes:
- Louvain edge loading query: type IN ('CALLS','HANDLES','HTTP_CALLS','ASYNC_CALLS')
- BFS from entry points: 4 edge types instead of 1
Tested: Express monorepo with 158 routes went from 3 to 4 detected flows,
with routes now participating in community detection.
Two fixes to dramatically increase detected execution flows:
1. Route→Function resolution (step 1b): Route nodes have 0 outgoing edges (only incoming HANDLES from Module nodes), so BFS from Routes went nowhere. Now resolves each Route entry point through the HANDLES edge to find the Module, then looks up Functions in the same file; those become the real BFS starting points. This connects HTTP API routes to their handler logic.
2. Relaxed cross-community requirement: previously, flows were only created when BFS crossed a Louvain community boundary. Now flows with ≥3 steps are kept even within a single community, picking the deepest non-generic node as terminal. This catches Express-style flat patterns (route → controller → storage → db) that stay within one community.
Results:
- Express monorepo: 4 → 61 flows (route handlers now visible)
- C# service: 69 → 78 flows (+9 intra-community flows)
- JS service: 65 → 70 flows (+5 intra-community flows)
- TS monolith: 300 (capped, no change)
Root cause: cbm_pipeline_fqn_module() received raw import paths like './utils/trace' or '../controllers/auth' and converted them directly to QNs without resolving against the importing file's directory. The resulting QN never matched any Module node, so IMPORTS edges were silently dropped.
New function cbm_pipeline_resolve_import_path() in fqn.c:
- Resolves ./ and ../ segments against the importer's directory
- Normalizes path (collapses a/b/../c → a/c)
- Bare module specifiers (no ./ prefix) pass through unchanged
Extension probing in pass_parallel.c and pass_definitions.c:
- After resolving the path, tries exact match first
- Then probes: .js, .ts, .tsx, .jsx, .mjs, .mts, .css, .scss, .json
- Then probes /index variants: /index.js, /index.ts, /index.tsx, etc.
- Then probes C/C++ headers: .h, .hpp, .hh
Results:
- JS service: 0 → 335 IMPORTS edges
- TS monolith: 153 → 11,770 IMPORTS edges (77x increase)
- TS/React monorepo: 0 → 344 IMPORTS edges
- TS/Electron app: 1 → 161 IMPORTS edges
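The resolution and normalization rules above can be sketched as a standalone function. This is not the body of cbm_pipeline_resolve_import_path(); `resolve_import_path` and its signature are assumptions, paths are treated as repo-relative with '/' separators, and a '..' that would climb above the root is simply dropped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolves `spec` against `importer_dir`, collapsing "." and ".." segments.
 * Returns 0 on success, -1 if the result does not fit. */
int resolve_import_path(const char *importer_dir, const char *spec,
                        char *out, size_t out_size) {
    assert(importer_dir != NULL && spec != NULL && out != NULL && out_size > 0);

    /* Bare module specifiers (no ./ or ../ prefix) pass through unchanged. */
    if (strncmp(spec, "./", 2) != 0 && strncmp(spec, "../", 3) != 0) {
        int n = snprintf(out, out_size, "%s", spec);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }

    char joined[1024];
    int n = snprintf(joined, sizeof joined, "%s/%s", importer_dir, spec);
    if (n < 0 || (size_t)n >= sizeof joined) return -1;

    size_t starts[128]; /* offset in `out` where each kept segment begins */
    size_t depth = 0, used = 0;
    out[0] = '\0';

    const char *p = joined;
    while (*p != '\0') {
        while (*p == '/') p++;                 /* skip separators */
        const char *seg = p;
        while (*p != '\0' && *p != '/') p++;   /* scan one segment */
        size_t len = (size_t)(p - seg);
        if (len == 0) break;
        if (len == 1 && seg[0] == '.') continue;
        if (len == 2 && seg[0] == '.' && seg[1] == '.') {
            if (depth > 0) {                   /* pop the previous segment */
                used = starts[--depth];
                out[used] = '\0';
            }
            continue;
        }
        size_t sep = used > 0 ? 1 : 0;
        if (depth == 128 || used + sep + len + 1 > out_size) return -1;
        starts[depth++] = used;
        if (sep) out[used++] = '/';
        memcpy(out + used, seg, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Extension probing would then run on the resolved path: try it as-is, then with each candidate suffix appended, stopping at the first one that matches a known Module node.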
The ES module import walker (walk_es_imports) only handled 'import' statements
but not CommonJS 'require()' calls. JS codebases using require() had zero
imports extracted.
Adds require() detection in walk_es_imports:
- Detects variable_declarator/assignment_expression with require() call value
- Handles: const X = require('Y') (default import)
- Handles: const { A, B } = require('Y') (destructured import via object_pattern)
- Handles: const [A, B] = require('Y') (array destructured)
- Supports shorthand_property_identifier_pattern and pair_pattern variants
- Falls back to path_last() for unnamed requires
Also adds variable_declaration and expression_statement to js_import_types
in lang_specs.c, catching 'var X = require()' patterns (older JS codebases).
Tested: JS service went from 0 to 335 IMPORTS with both ESM and CJS detected.
Summary
29 commits adding major features and fixing critical bugs across the MCP handler, extraction layer, pipeline, and Cypher engine. Developed while stress-testing against large enterprise codebases (C# monolith ~109K nodes, Node.js/TS Hapi.js monorepo ~144K nodes, React/TS monorepo ~9K nodes, C++ library, Java/Vert.x service) and running real investigation scenarios.
Highlights: C# blast radius analysis (0 → 16 callers), execution flow detection for C#/Java/C++ (1 → 300 flows), Cypher NOT EXISTS + count(DISTINCT), IMPORTS edge resolution (0 → 11,770 edges), CommonJS require() extraction, C# property extraction (19K+ properties), cross-repo channel query, C++ CALLS edge attribution fix.
Bug Fixes (Commits 1-6)
1. trace_call_path Class → Method resolution
File: src/mcp/mcp.c
When targeting a Class/Interface node, BFS now resolves through DEFINES_METHOD edges to find callable methods, then runs BFS from each. Previously returned 0 results for class names. Also expands edge types to include HTTP_CALLS and ASYNC_CALLS.
2. detect_changes use-after-free
File: src/mcp/mcp.c
Filenames came from a stack buffer reused each fgets() iteration, and node names were freed before serialization. Switched to strcpy variants.
3. Route path validation
File: src/pipeline/pass_httplinks.c
Vendored/minified JS files produced false positive routes (JS operators and keywords matched as route paths). Adds blocklist filter for keywords and operators.
4. C# inheritance via base_list
File: internal/cbm/extract_defs.c
Tree-sitter C# base_list nodes weren't handled. Adds explicit traversal with generic type argument stripping. INHERITS edges: 210 → 1,588 (7.5x).
5. Crash on 0-edge nodes + fuzzy name fallback
File: src/mcp/mcp.c
Fixes double-free when tracing nodes with 0 edges. Moves traversal array to heap. Adds fuzzy substring fallback when exact name match returns 0 results.
6. Class has_method uses correct node ID
File: src/mcp/mcp.c
DEFINES_METHOD BFS was using method IDs (from class resolution) instead of the Class node ID. Fixed to query from the Class node directly.
New Features (Commits 7-17)
7. get_architecture returns full analysis
File: src/mcp/mcp.c
Wired cbm_store_get_architecture().
8. Louvain clustering with semantic labels
File: src/store/store.c
Directory-derived cluster labels (Controllers, Services, etc.).
9. Hapi.js route extraction
Files: src/pipeline/httplink.c, pass_httplinks.c
{ method: 'GET', path: '/api/...', handler: ... } patterns. 0 → 1,665 routes.
10. BM25 full-text search via SQLite FTS5
Files: src/store/store.c, Makefile.cbm
FTS5 with structural boost. Excludes File/Module/Folder noise.
11. Execution flow detection
Files: src/store/store.c, src/pipeline/pipeline.c, src/mcp/mcp.c
BFS + Louvain community detection. Domain-weighted terminal naming. 300 flows detected.
12. Socket.IO + EventEmitter channel detection
Files: src/pipeline/httplink.c, src/store/store.c
JS/TS/Python/C# with constant resolution. get_channels MCP tool.
13. get_impact blast radius tool
File: src/mcp/mcp.c
Depth-grouped results with risk assessment and affected processes.
14-17. Cypher JSON properties, investigation-grade trace output, C# delegate/event resolution, C# channel constant resolution
See individual commit messages for details.
Phase 2: Gap Closure (Commits 18-29)
After extensive comparison testing against a production-grade code intelligence tool, 12 additional commits closing every identified quality gap:
18. get_impact resolves Class over Constructor for C#
File: src/mcp/mcp.c
When a Class and its Constructor share the same name (common in C#/Java), get_impact previously picked the Constructor (0 incoming CALLS). Now mirrors trace_call_path's disambiguation: prefers Class node, expands through DEFINES_METHOD to all methods, runs BFS from each and merges results. Caps at 30 methods per class.
Result: C# class blast radius: 0 → 16 callers, 0 → 20 affected processes, LOW → HIGH risk.
19. Entry point detection for C#/Java class methods
File: internal/cbm/extract_defs.c
Previously only JS/TS exports and lowercase main were recognized as entry points. Adds:
- Case-insensitive main detection (fixes C# Main, Java main)
- C# Windows Service lifecycle: OnStart, OnStartImpl, Run, Execute, Configure
- C# ASP.NET decorators: [HttpGet], [HttpPost], [Route], [ApiController]
- C# test decorators: [TestMethod], [Fact], [Test]
- Java patterns: start, configure, init, run, handle
- Java Spring/JAX-RS: @RequestMapping, @GetMapping, @PostMapping
Critical fix: push_method_def() (class methods) was missing entry point detection entirely; only extract_func_def() (standalone functions) had it.
Result: C# monolith 1 → 69 flows, Java/Vert.x 0 → 300 flows, C# desktop 2 → 280 flows.
20. Channel dedup + count(DISTINCT) + SQL injection fix
Files: src/store/store.c, src/cypher/cypher.c, src/cypher/cypher.h
- Channel dedup: UNIQUE index, INSERT OR IGNORE + DISTINCT SELECT
- Channel DELETE is now parameterized (was snprintf, a SQL injection vulnerability)
- Cypher count(DISTINCT n.name) with per-column dedup in executor
21. Cypher NOT EXISTS subquery
Files: src/cypher/cypher.c, src/cypher/cypher.h
Full WHERE NOT EXISTS { MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS' } support. Two evaluation paths:
- Fast path (O(1) per node): when the inner pattern has exactly 1 hop and one endpoint is bound from the outer scope, queries edges directly by source/target ID
- Slow path: full subquery expansion for complex/multi-hop patterns
Enables dead-code detection: finds 10 uncalled functions in <1 second.
22. Cross-repo channel query + has_property in trace output
File: src/mcp/mcp.c
get_channels without a project parameter iterates ALL indexed project databases. trace_call_path now includes outgoing.has_property section.
23. C# property extraction with HAS_PROPERTY edges
Files: internal/cbm/extract_defs.c, src/pipeline/pass_parallel.c, src/pipeline/pass_definitions.c
Extracts property_declaration, indexer_declaration, event_declaration from C# class bodies as Property label nodes. Creates HAS_PROPERTY edges from Class → Property.
Result: C# monolith gained 3,470 Property nodes. C# desktop app gained 19,124 Property nodes.
24. C/C++ CALLS edge attribution to enclosing function scope
Files: internal/cbm/extract_unified.c, internal/cbm/helpers.c
C/C++ function_definition nodes have no name field; the name is buried in a declarator chain. Both compute_func_qn() and func_node_name() returned NULL, causing ALL CALLS edges to be attributed to File nodes. Adds declarator-chain walk (up to 8 levels) for C, C++, CUDA, GLSL. Also handles template_declaration unwrapping.
Result: C++ library went from 0 Function→Function CALLS edges to 74.
25. C++ entry point heuristics
File: internal/cbm/extract_defs.c
Adds: WinMain, wWinMain, DllMain, wmain, _tmain, InitInstance, OnInitDialog.
26. HANDLES + HTTP_CALLS in process detection BFS
File: src/store/store.c
Process detection and Louvain community building now traverse HANDLES, HTTP_CALLS, and ASYNC_CALLS edges alongside CALLS. Previously Express/Hapi route→handler flows were invisible.
27. Route→Function resolution + relaxed process detection
File: src/store/store.c
Route nodes have 0 outgoing edges (only incoming HANDLES from Modules). Adds resolution: Route → HANDLES → Module → Functions in same file. Also relaxes cross-community requirement: flows ≥3 steps are kept even within a single community.
Result: Express monorepo: 4 → 61 flows. C# service: 69 → 78. JS service: 65 → 70.
28. Resolve relative import paths for IMPORTS edge creation
Files: src/pipeline/fqn.c, src/pipeline/pipeline.h, src/pipeline/pass_parallel.c, src/pipeline/pass_definitions.c
New cbm_pipeline_resolve_import_path() resolves ./ and ../ segments against the importing file's directory. Extension probing: .js, .ts, .tsx, .jsx, .mjs, .mts, /index.js, /index.ts, .h, .hpp.
Result: TS monolith: 153 → 11,770 IMPORTS (77x). JS service: 0 → 335. TS/React: 0 → 344.
29. CommonJS require() import extraction
Files: internal/cbm/extract_imports.c, internal/cbm/lang_specs.c
walk_es_imports() only handled ESM import statements. Adds require() detection: const X = require('Y'), const { A, B } = require('Y'), var X = require('Y'), array destructured patterns.
Testing
All 29 commits compile clean with -Wall -Wextra -Werror. 2,586 existing tests pass. Stress-tested against: