feat: Token count for agents #860
Open
Ayaz-Microsoft wants to merge 18 commits into
Pull request overview
Adds end-to-end LLM token usage telemetry for agent/workflow executions in the backend, plus Azure Monitor artifacts (workbook + KQL) to analyze usage by request, agent, model, and stage.
Changes:
- Added `TokenUsageAccumulator` + extraction helpers to capture token usage from Agent Framework responses/stream updates and emit `LLM_*_Token_Usage` App Insights custom events.
- Threaded `user_id` through orchestrator entrypoints and API handlers; added per-request ContextVar propagation to tag telemetry emitted from deeper helpers (e.g., image generation).
- Added standalone Bicep deployments for monitoring add-on resources and a “Token Usage” workbook, plus workbook JSON, KQL query pack, and docs.
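The first two changes can be pictured with a minimal sketch. It is not the code in `src/backend/token_usage.py`: the class name `TokenUsageSketch`, the context variable `current_user_id`, the event name `LLM_Agent_Token_Usage`, and the property keys are all assumptions chosen for illustration. It shows one plausible way to accumulate counts per agent and model, and to tag the resulting events with a `user_id` read from a per-request ContextVar.

```python
from collections import defaultdict
from contextvars import ContextVar
from dataclasses import dataclass, field

# Hypothetical per-request context variable; the PR's real name is not shown.
current_user_id: ContextVar[str] = ContextVar("current_user_id", default="anonymous")


@dataclass
class UsageTotals:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class TokenUsageSketch:
    """Illustrative accumulator keyed by (agent, model)."""

    totals: dict = field(default_factory=lambda: defaultdict(UsageTotals))

    def record(self, agent: str, model: str, input_tokens: int, output_tokens: int) -> None:
        # Add one response's usage to the running totals for this agent/model pair.
        bucket = self.totals[(agent, model)]
        bucket.input_tokens += input_tokens
        bucket.output_tokens += output_tokens

    def to_events(self) -> list:
        # Build one event payload per (agent, model). The user_id comes from the
        # ContextVar, so deeper helpers do not need it passed as an argument.
        user_id = current_user_id.get()
        return [
            {
                # Illustrative name following the LLM_*_Token_Usage pattern.
                "name": "LLM_Agent_Token_Usage",
                "properties": {
                    "user_id": user_id,
                    "agent": agent,
                    "model": model,
                    "input_tokens": t.input_tokens,
                    "output_tokens": t.output_tokens,
                    "total_tokens": t.input_tokens + t.output_tokens,
                },
            }
            for (agent, model), t in self.totals.items()
        ]
```

In this sketch a request handler would call `current_user_id.set(...)` once at the entrypoint, record usage as responses arrive, and hand the payloads from `to_events()` to whichever telemetry client the backend uses.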
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/backend/token_usage.py | New module to extract/accumulate token counts and emit App Insights custom events. |
| src/backend/orchestrator.py | Creates/records/flushes token usage across workflow streaming, brief parsing, generation, and image paths; propagates user_id. |
| src/backend/app.py | Passes user_id into orchestrator calls for telemetry correlation. |
| infra/workbook/workbook.bicep | Standalone deployment of the Token Usage workbook targeting an App Insights resource (optional binding). |
| infra/workbook/README.md | Deployment instructions for the standalone workbook template. |
| infra/monitoring/monitoring.bicep | Standalone “add monitoring later” deployment (Log Analytics + App Insights). |
| infra/monitoring/README.md | Instructions for post-deploy monitoring enablement and wiring. |
| infra/dashboards/token-usage-workbook.json | Serialized workbook definition with tiles/queries for token usage analysis. |
| infra/dashboards/token-usage-queries.kql | KQL query pack for App Insights / Log Analytics. |
| docs/TokenUsageTelemetry.md | Documentation for emitted events, enabling telemetry, and querying/visualizing usage. |
| infra/main.bicep | Notes workbook is deployed separately; adds ACI tag hashing to force restart on monitoring config change. |
| infra/main_custom.bicep | Notes workbook deployed separately; adds ACI tag hashing; changes default gptModelCapacity. |
| infra/main.json | Recompiled ARM output with additional infra deltas beyond token telemetry. |
| .gitignore | Fixes rai_results ignore entry and adds Python coverage artifacts. |
- Implemented TokenUsageAccumulator to track per-request, per-agent, and per-model token usage.
- Emitted custom events to Azure Application Insights for monitoring.
- Created KQL queries for visualizing token usage metrics in Application Insights.
- Developed a workbook for easy access to token usage insights.
- Updated orchestrator to integrate token usage tracking during message processing and response handling.
…nto its own template
# Conflicts: # src/backend/orchestrator.py
- Drop redundant extract_usage fallback in _RequestTokenTracker.record_event
- Mark agent model as 'multiple' for mixed-model agents instead of locking first-seen
- Thread user_id/conversation_id into select_products token telemetry
- Fall back to gpt_model for Foundry title deployment to avoid empty model dimension

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
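The "mark agent model as 'multiple'" change in the commit above can be illustrated with a small sketch. The helper name `merge_agent_model` is hypothetical and the PR's actual logic may differ; only the `'multiple'` label comes from the commit message. The idea is that once an agent has been seen with two different models, its model dimension collapses to a sentinel instead of staying locked to the first model observed.

```python
from typing import Optional


def merge_agent_model(seen: Optional[str], new_model: Optional[str]) -> Optional[str]:
    """Return the model label for an agent after observing new_model.

    Hypothetical helper: keeps a single model name while every observation
    agrees, and switches to 'multiple' as soon as a different one appears.
    """
    if not new_model:
        # Nothing new to learn from an empty model name.
        return seen
    if seen is None:
        # First observation for this agent.
        return new_model
    if seen == new_model:
        return seen
    # Differing model, or already 'multiple': the label stays 'multiple'.
    return "multiple"
```

Because the sentinel never equals a real deployment name in this sketch, the label cannot revert to a single model once it has become `'multiple'`.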
Purpose
Count the total input and output tokens used by each agent at each stage, and show the results in a workbook for analysis.
Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information