Use provider C# reference map with pre-write internalization #10976

Draft
live1206 wants to merge 89 commits into microsoft:main from live1206:mtg-hybrid-reference-map

live1206 commented Jun 12, 2026

Contributor

Summary

Replaces the C# generator's Roslyn reference-map/internalization path with a provider/expression-based reference map and pre-write provider accessibility updates while preserving generated-output correctness.

Latest changes

  • Builds generated-code reachability from TypeProvider metadata and provider expression/body dependency traversal instead of constructing the generated reference map through Roslyn.
  • Removes the Roslyn reference-map fallback for generated source entirely; generated dependencies now come from provider metadata, expression trees, helper/body/signature dependency lists, and explicit provider relationships.
  • Applies internal/public accessibility before writing generated files by updating provider modifiers directly.
  • Skips the later Roslyn InternalizeAsync pass when pre-write accessibility has already been applied; removal and reduce still run through the existing post-processing pipeline.
  • Preserves XML docs for providers that are made internal before writing.
  • Removes model-factory methods for models that are internalized pre-write.
  • Handles custom-code roots through provider custom views plus customization-wide NamedTypeSymbolProviders, using lightweight metadata identity and syntax dependency extraction so arbitrary custom Roslyn symbols do not need full CSharpType/member materialization.
  • Keeps request-header extension helpers reachable from both generated expressions and custom-code syntax references.

How provider reference-map construction works

This PR no longer uses Roslyn to construct the generated-code reference map. Instead, generated type reachability is derived from the in-memory provider model before files are written.

  1. Build all TypeProviders first, then run visitors, customized-member filtering, and back-compat processing so the analyzer sees the final provider shape.
  2. Flatten generated providers into graph nodes, including top-level providers, nested providers, and serialization providers. Nodes are keyed by fully-qualified generated type name.
  3. Add graph edges from provider metadata: provider type, base type, declaring type, implemented interfaces, nested/serialization providers, properties, fields, constructors, method signatures, attributes, generic arguments, arrays, collection definitions, helper/body/signature dependencies, and structured provider expression/body references.
  4. Build both a public-signature graph and a full generated graph. The public graph decides what must remain public; the full graph decides what generated files/helpers are reachable for removal.
  5. Seed reachability from known roots: public clients, additional configured roots, API-baseline generated types, custom-code references, generated public declarations, discriminator/derived model relationships, union variants when needed, and helper roots discovered after an initial reachability pass.
  6. Compute candidates from reachability: unreachable public declarations become internalize candidates, internal/generated declarations needed by public API become publicize candidates, and unreachable full-graph declarations become remove candidates.
  7. Apply accessibility before writing by updating provider modifiers directly. Model-factory methods for internalized/removed models are filtered for output, XML docs are preserved for types made internal, and Roslyn InternalizeAsync is skipped because the generated files already have the final accessibility.

Roslyn is still used by the existing post-processing pipeline for removal/reduce work and for reading custom-code symbols/syntax, but not for generated reference-map construction or applying internalization.
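
Steps 5 and 6 above amount to a graph traversal followed by a set difference. The sketch below is only an illustration under stated assumptions: the type names, the edges, and the GetReachable helper are hypothetical and are not taken from ProviderReferenceMapAnalyzer, and it uses a single graph rather than the separate public-signature and full graphs described in step 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical graph: fully-qualified type name -> names it references.
// The type names and edges are made up for illustration.
var references = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal)
{
    ["Sample.WidgetClient"] = new HashSet<string>(StringComparer.Ordinal) { "Sample.Widget" },
    ["Sample.Widget"] = new HashSet<string>(StringComparer.Ordinal) { "Sample.WidgetKind" },
    ["Sample.WidgetKind"] = new HashSet<string>(StringComparer.Ordinal),
    ["Sample.OrphanModel"] = new HashSet<string>(StringComparer.Ordinal) { "Sample.WidgetKind" },
};

// Declarations assumed to be public before the analysis runs.
var publicDeclarations = new HashSet<string>(StringComparer.Ordinal)
{
    "Sample.WidgetClient", "Sample.Widget", "Sample.WidgetKind", "Sample.OrphanModel",
};

// Roots: in this sketch, only the public client.
var roots = new[] { "Sample.WidgetClient" };

// Breadth-first traversal from the seeds over the reference edges.
HashSet<string> GetReachable(IEnumerable<string> seeds, IReadOnlyDictionary<string, HashSet<string>> edges)
{
    var visited = new HashSet<string>(StringComparer.Ordinal);
    var queue = new Queue<string>(seeds);
    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (!visited.Add(current))
        {
            continue; // already processed
        }
        if (edges.TryGetValue(current, out var targets))
        {
            foreach (var target in targets)
            {
                queue.Enqueue(target);
            }
        }
    }
    return visited;
}

var reachable = GetReachable(roots, references);

// Public declarations that no root reaches become internalize candidates.
var internalizeCandidates = publicDeclarations
    .Where(name => !reachable.Contains(name))
    .OrderBy(name => name, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(", ", internalizeCandidates)); // prints "Sample.OrphanModel"
```

Sample.WidgetKind stays public here even though the orphan also references it, because the client reaches it through Sample.Widget; only the declaration with no path from a root is flagged.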

Latest benchmark data

Latest local full-generation benchmark compares Roslyn as the baseline against the current provider implementation with pre-write internalization and the latest custom-code/state-cleanup fixes. The values below are aggregated means across 3 BenchmarkDotNet runs with profiling disabled.

| Benchmark | Roslyn baseline | Provider reference map | Improvement vs Roslyn |
| --- | --- | --- | --- |
| Full generation mean | 855.0 ms / 68.43 MB | 558.9 ms / 44.57 MB | 34.6% faster, 34.9% less allocation |

Per-run results:

| Run | Roslyn baseline | Provider reference map | Improvement |
| --- | --- | --- | --- |
| Run 1 | 835.5 ms / 68.52 MB | 582.3 ms / 44.56 MB | 30.3% faster, 35.0% less allocation |
| Run 2 | 883.2 ms / 68.16 MB | 542.4 ms / 44.49 MB | 38.6% faster, 34.7% less allocation |
| Run 3 | 846.4 ms / 68.60 MB | 552.0 ms / 44.67 MB | 34.8% faster, 34.9% less allocation |

Benchmark artifacts:

| Run | Artifact |
| --- | --- |
| Latest no-profile full-generation run 1 | /tmp/typespec-provider-map-latest-20260701-0828-run1 |
| Latest no-profile full-generation run 2 | /tmp/typespec-provider-map-latest-20260701-0828-run2 |
| Latest no-profile full-generation run 3 | /tmp/typespec-provider-map-latest-20260701-0828-run3 |
| Previous no-profile full-generation run 1 | /tmp/typespec-provider-map-no-profile-20260630-1547-run1 |
| Previous no-profile full-generation run 2 | /tmp/typespec-provider-map-no-profile-20260630-1547-run2 |
| Previous no-profile full-generation run 3 | /tmp/typespec-provider-map-no-profile-20260630-1547-run3 |
| Previous provider reference-map reruns | /tmp/typespec-hybrid-benchmark-reruns-20260629-065344 |
| Pre-write internalization | /tmp/typespec-prewrite-internalize-benchmark-20260629-073203 |

Azure SDK local project-reference benchmark

Measured Azure SDK for .NET management-plane regeneration using the same local project-reference setup in both runs: /workspaces/azure-sdk-for-net was wired to the sibling /workspaces/typespec checkout via .NET project references, and only the TypeSpec checkout changed between main and this PR branch.

| SDK package | TypeSpec main baseline | This PR branch | Improvement |
| --- | --- | --- | --- |
| Azure.ResourceManager.Network | 00:37:01.9 | 00:07:12.1 | 80.6% faster / 5.14x speedup |
| Azure.ResourceManager.DataFactory | 00:13:43.4 | 00:03:33.9 | 74.0% faster / 3.85x speedup |
| Azure.ResourceManager.AppService | 00:10:46.9 | 00:03:06.5 | 71.2% faster / 3.47x speedup |
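
For readers checking the Improvement column, the two figures are related as speedup = baseline / branch and percent faster = (1 - branch / baseline) × 100. A minimal sketch that recomputes the Azure.ResourceManager.Network row from its two durations (the variable names are illustrative only):

```csharp
using System;
using System.Globalization;

// Durations copied from the Azure.ResourceManager.Network row.
var baseline = TimeSpan.Parse("00:37:01.9", CultureInfo.InvariantCulture);
var branch = TimeSpan.Parse("00:07:12.1", CultureInfo.InvariantCulture);

// Speedup is the ratio of the two wall-clock times.
double speedup = baseline.TotalSeconds / branch.TotalSeconds;

// "Percent faster" is the fraction of the baseline time that was saved.
double percentFaster = (1 - branch.TotalSeconds / baseline.TotalSeconds) * 100;

Console.WriteLine(string.Format(
    CultureInfo.InvariantCulture,
    "{0:F1}% faster / {1:F2}x speedup",
    percentFaster,
    speedup)); // prints "80.6% faster / 5.14x speedup"
```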

Validation

  • Latest Azure SDK full regen PR: [DO NOT MERGE] Preview Generator Version 1.0.0-alpha.20260625.1 (Azure data plane) Azure/azure-sdk-for-net#60254.
  • Local full Azure data-plane regen with the MTG RegenPreview.ps1 -Azure script regenerated 39 Azure-branded data-plane libraries with no SDK code diffs and no public API diffs.
    • The full parallel run passed 38/39 libraries and hit a transient NuGet restore race in Azure.Data.AppConfiguration (*.nuget.g.props already existed).
    • Rerunning Azure.Data.AppConfiguration alone with the same local generator passed, leaving no SDK/API diffs.
  • Targeted local Azure data-plane regen for Azure.AI.Vision.ImageAnalysis passed after the custom-code stack-overflow fix.
    • The remaining SDK changes are expected generated cleanup/reachability diffs: unused System/Azure usings are removed from the model factory, and internal error models are generated/marked buildable in the MRW context.
  • C# emitter npm run format passes.
  • Repo-root pnpm format passes; unrelated formatter changes outside http-client-csharp were discarded.
  • Full Microsoft.TypeSpec.Generator.ClientModel.Tests pass (1451 tests).
  • C# emitter npm run build -- --no-restore passes.
  • Focused Microsoft.TypeSpec.Generator.Tests customization/post-processing tests pass (ClientCustomizationTests, PostProcessorTests.RemovesInvalidUsings).
  • C# emitter npm run cop passes.

microsoft-github-policy-service bot added the emitter:client:csharp (Issue for the C# client emitter: @typespec/http-client-csharp) label Jun 12, 2026
pkg-pr-new Bot commented Jun 12, 2026

npm i https://pkg.pr.new/@typespec/http-client-csharp@10976

commit: 9951f4b

github-actions Bot commented Jun 12, 2026

Contributor

All changed packages have been documented.

  • @typespec/http-client-csharp

@typespec/http-client-csharp - internal ✏️

Improve generated C# reference-map analysis by deriving generated body references from provider expression trees.

live1206 commented Jul 1, 2026

Contributor Author

Follow-up on the remaining samples question from the latest review: the provider remove path only computes removal candidates from generated provider graph nodes, not from arbitrary project symbols. Sample files are generated documents but they are not provider nodes, so sample classes themselves are not selected as remove candidates by the provider graph. For generated types referenced only by samples, the full DPG regen/no-diff validation is the practical coverage; I do not see an additional code change needed here. The remaining static-state concern is still a known low-priority follow-up rather than a blocker for this PR.

@JoshLove-msft

Contributor

Design suggestion: derive body references from expression trees rather than declared deps + source re-parsing

Following up on the body-reference discussion — the current approach builds graph edges from provider metadata (signatures, properties, fields, attributes, etc.) and then compensates for body-level references three ways: manually-declared HelperDependencyNames / BodyDependencyTypes / IncludeGeneratedBodyReferences, AddGeneratedBodyReferences re-parsing generated source with Roslyn for a curated set of "candidate" providers, and hardcoded fallbacks (the SetDelimited scan, ChangeTracking helper roots, etc.).

The concern is soundness: a pure-metadata graph is an under-approximation of reachability, because a C# type reference can appear anywhere a method body can (new Helper(), SomeStatic.Call(), extension calls like request.Headers.SetDelimited(...), casts, typeof). For a removal/internalization pass, under-approximating is the dangerous direction — you conclude "nothing references X," internalize/remove it, and the break surfaces later as a downstream compile error or a silent public-API regression. Correctness then rests on provider authors remembering to declare every body dependency (a manually-maintained parallel list that will drift) plus the hand-curated candidate list.

Suggestion: walk the body expression trees instead. The bodies already exist as structured MethodBodyStatement / ValueExpression trees before they're rendered to source. Traversing those to collect CSharpType references (constructor targets, static member owners, type literals, generic args, cast targets) would be:

  • sound by construction for anything the generator itself emits — it closes the gap generically instead of per-provider declaration, and would subsume the SetDelimited special-case and most BodyDependencyTypes/HelperDependencyNames overrides;
  • likely cheaper than AddGeneratedBodyReferences re-parsing generated C# through a Roslyn semantic model.
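
To make the shape of such a walk concrete, here is a minimal sketch that uses System.Linq.Expressions as a stand-in. This is an analogy only: the generator's MethodBodyStatement, ValueExpression, and CSharpType are different types with their own node kinds, and the CollectTypes helper below is hypothetical. It visits a few node kinds (constructor calls, method calls, casts, typeof literals) and records every type it sees; a real walker would need to cover every node kind to keep the soundness argument.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text;

// Records the types referenced by an expression tree. Node kinds not listed
// here are ignored in this sketch.
void CollectTypes(Expression? node, ISet<Type> found)
{
    switch (node)
    {
        case null:
            return;
        case NewExpression created:
            found.Add(created.Type); // constructor target
            foreach (var argument in created.Arguments)
            {
                CollectTypes(argument, found);
            }
            break;
        case MethodCallExpression call:
            if (call.Method.DeclaringType is { } owner)
            {
                found.Add(owner); // type that declares the called method
            }
            CollectTypes(call.Object, found);
            foreach (var argument in call.Arguments)
            {
                CollectTypes(argument, found);
            }
            break;
        case UnaryExpression { NodeType: ExpressionType.Convert } cast:
            found.Add(cast.Type); // cast target
            CollectTypes(cast.Operand, found);
            break;
        case ConstantExpression { Value: Type literal }:
            found.Add(literal); // typeof(...) literal
            break;
    }
}

var statements = new Expression[]
{
    // (Guid)Activator.CreateInstance(typeof(Guid))
    Expression.Convert(
        Expression.Call(
            typeof(Activator).GetMethod(nameof(Activator.CreateInstance), new[] { typeof(Type) })!,
            Expression.Constant(typeof(Guid), typeof(Type))),
        typeof(Guid)),
    // new StringBuilder().ToString()
    Expression.Call(
        Expression.New(typeof(StringBuilder)),
        typeof(StringBuilder).GetMethod(nameof(StringBuilder.ToString), Type.EmptyTypes)!),
};

var referenced = new HashSet<Type>();
foreach (var statement in statements)
{
    CollectTypes(statement, referenced);
}

Console.WriteLine(string.Join(", ", referenced.Select(t => t.Name).OrderBy(n => n, StringComparer.Ordinal)));
// prints "Activator, Guid, StringBuilder"
```

Nothing here is declared by hand: Activator and StringBuilder are picked up only because they appear inside the bodies.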

The fact that the PR re-parses generated source for the client/serialization candidates (rather than walking expression trees) suggests the tree walk wasn't feasible for all bodies — presumably where bodies are raw TypeProvider source or interpolated string literals whose contents aren't structured ValueExpressions. If so, that's the honest residual limitation to document: expression-tree walking would handle the structured majority soundly, with a narrow, explicit fallback (or a CI superset-check against the existing full Roslyn map) for the unstructured remainder — instead of today's broader reliance on manual declarations + name-based special-cases.

Not a merge blocker, but I think it's the change that would turn this from "correct-by-testing" into "correct-by-construction." Would also be worth attributing how much of the ~21% comes from skipping body analysis vs. from pre-write internalization (skipping the Roslyn InternalizeAsync/reduce passes) — if it's mostly the latter, a sound full-body map may retain most of the win.

--generated by Copilot

live1206 and others added 13 commits July 1, 2026 02:47
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… mtg-hybrid-reference-map

# Conflicts:
#	packages/http-client-csharp/generator/Microsoft.TypeSpec.Generator.ClientModel/test/PostProcessing/ClientBodyDependencyPostProcessingTests.cs
#	packages/http-client-csharp/generator/Microsoft.TypeSpec.Generator/src/PostProcessing/ProviderReferenceMapAnalyzer.cs
#	packages/http-client-csharp/generator/Microsoft.TypeSpec.Generator/src/Providers/TypeProvider.cs
live1206 and others added 7 commits July 1, 2026 07:01
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@live1206 live1206 marked this pull request as draft July 1, 2026 12:29
@@ -0,0 +1,7 @@
---

Contributor

we should remove this change

Comment thread .chronus/config.yaml
changelog: ["@chronus/github/changelog", { repo: "microsoft/typespec" }]

additionalPackages:
- packages/http-client-csharp

Contributor

was this intentional?

get
{
    var ns = _namedTypeSymbol.ContainingNamespace.GetFullyQualifiedNameFromDisplayString();
    return string.IsNullOrEmpty(ns) ? _namedTypeSymbol.Name : $"{ns}.{_namedTypeSymbol.Name}";

Contributor

can we also cache this value please so it's not calculated each time?


foreach (var outputType in output.TypeProviders)
{
    if (outputType is ModelFactoryProvider && outputType.Methods.Count == 0)

Contributor

why is this only needed for model factories?

CodeModelGenerator.Instance.AdditionalRootTypes
.Concat(CodeModelGenerator.Instance.NonRootTypes)
.Select(GetFileNameForType))
.Distinct(StringComparer.Ordinal)

Contributor

hmm, is the distinct necessary? The root types are a hashset, is it possible for there to be duplicates once we get the filename?
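
Whether duplicates can appear depends on whether the name-to-file mapping is one-to-one. The body of GetFileNameForType is not shown in this excerpt, so the sketch below uses a hypothetical stand-in (ToFileName, which keeps only the simple type name) purely to show how unique set members can collide after a projection that drops information:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GetFileNameForType: keeps only the simple type name.
// The real helper may map names differently; this mapping is an assumption.
string ToFileName(string typeName) => typeName[(typeName.LastIndexOf('.') + 1)..] + ".cs";

// A HashSet guarantees that the type names themselves are unique...
var rootTypes = new HashSet<string>(StringComparer.Ordinal)
{
    "Sample.Models.Widget",
    "Sample.Internal.Widget",
    "Sample.Models.Gadget",
};

// ...but two different names can still project to the same file name.
var fileNames = rootTypes.Select(ToFileName).ToArray();
var distinctFileNames = fileNames.Distinct(StringComparer.Ordinal).ToArray();

Console.WriteLine($"{fileNames.Length} names before Distinct, {distinctFileNames.Length} after");
// prints "3 names before Distinct, 2 after"
```

If the real mapping keeps the full name, the Distinct call would be redundant.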

.ToArray();
}

private static string GetFileNameForType(string typeName)

Contributor

Looking at this further, why do we need to do this explicitly now? The types to keep were already being checked in the post-process step. If we must do this here, should we try to find the corresponding type in the output library by name and use its RelativePath to get the file name instead?

Comment on lines +101 to +122
var generatedPublicReachable = GetReachableTypes(internalizeRoots, internalizeReferences);
AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, generatedPublicReachable, generatedDiscriminatorBaseNames);
internalizeRoots.UnionWith(customPublicRoots);
var internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences);
AddDerivedModelReferences(providers, publicGraph.Nodes, internalizeReferences, internalizeReachableWithoutHelpers, generatedDiscriminatorBaseNames);
internalizeReachableWithoutHelpers = GetReachableTypes(internalizeRoots, internalizeReferences);
var publicizeRoots = internalizeRoots.ToHashSet(StringComparer.Ordinal);
var internalizeHelperRoots = GetHelperRootNames(generatedProviders, graph.Nodes, internalizeReachableWithoutHelpers);
internalizeRoots.UnionWith(internalizeHelperRoots);
var internalizeReachable = GetReachableTypes(internalizeRoots, internalizeReferences);
var internalizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: true);
var customInternalBoundaryNodes = graph.Nodes
    .Where(name => publicGraph.References.TryGetValue(name, out var references) && references.Overlaps(customInternalDeclarations))
    .ToHashSet(StringComparer.Ordinal);
var publicizeDeclaredNodes = GetPostProcessorDeclaredNodes(generatedProviders, graph.Nodes, publicOnly: false)
    .Except(internalizeDeclaredNodes, StringComparer.Ordinal);
var generatedImplementationInternalDeclarations = GetGeneratedImplementationInternalTypeDeclarations(generatedInternalDeclarations).ToHashSet(StringComparer.Ordinal);
var publicApiTraversalNodes = internalizeDeclaredNodes
    .Except(generatedInternalDeclarations, StringComparer.Ordinal)
    .Concat(publicizeDeclaredNodes)
    .Except(generatedImplementationInternalDeclarations, StringComparer.Ordinal)
    .ToHashSet(StringComparer.Ordinal);

Contributor

I will be honest and say this is quite difficult to follow ☹️
