fix: Token count Monitor #945
Open
Ashwal-Microsoft wants to merge 10 commits into
…, teams, and models
- Add token_usage_utils.py with extraction and emission utilities
- Integrate token tracking into chat_service.py streaming flow
- Add KQL queries and Azure Monitor workbook for dashboards
- Add unit tests (27 tests) for token usage utilities
- Add AZURE_OPENAI_MODEL_DEPLOYMENT and TEAM_NAME env vars

Tracks per-agent, per-user, per-team, and per-model token consumption to Application Insights for monitoring, cost estimation, and optimization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This pull request introduces a cross-accelerator token-usage telemetry module and wires it into the key backend LLM call sites, so that token usage is emitted as standardized Application Insights custom events. It also adds supporting sample, config, and dashboard assets.
Changes:
- Added `common.logging.llm_token_telemetry` with token extraction helpers, an emitter, and a scope/decorator for consistent event emission.
- Introduced a process-wide `token_emitter` singleton (`src/api/telemetry.py`) and integrated token tracking into chat streaming and title generation.
- Added supporting artifacts for monitoring and sample data (KQL queries, infra parameter, sample transcripts/SQL inserts) and corresponding tests.
Reviewed changes
Copilot reviewed 15 out of 23 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/tests/api/common/logging/test_llm_token_telemetry.py | New unit tests for token telemetry helpers/emitter/scope. |
| src/api/telemetry.py | Adds a process-wide TokenUsageEmitter singleton configured via env vars. |
| src/api/services/history_service.py | Emits token usage for the title-generation agent run. |
| src/api/services/chat_service.py | Tracks/accumulates token usage across streaming agent chunks and emits telemetry. |
| src/api/common/logging/llm_token_telemetry.py | New core telemetry implementation: extraction, event emission, scope/decorator. |
| src/api/.env.sample | Adds env placeholders for token-tracking related settings. |
| infra/scripts/index_scripts/sql_files/processed_new_key_phrases.sql | Adds SQL insert script content for processed key phrases (sample data). |
| infra/scripts/index_scripts/sql_files/processed_data_batch_insert.sql | Adds batch insert SQL for processed conversation records (sample data). |
| infra/main.parameters.json | Adds enableMonitoring parameter substitution for deployments. |
| infra/dashboards/token-usage-queries.kql | Adds ready-to-run App Insights KQL queries for token-usage monitoring/cost estimation. |
| call_transcripts/convo_*.json | Adds sample call transcript JSON files used by data processing flows. |
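The `chat_service.py` row above describes accumulating usage across streaming chunks before emitting telemetry. A minimal sketch of that idea follows. It assumes each chunk is a plain dict that may carry a `usage` mapping and that per-chunk counts should be summed; the real chunk type and the PR's accumulator are not shown on this page, so `UsageTotals` and `accumulate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class UsageTotals:
    """Running totals (hypothetical stand-in for the PR's accumulator)."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def accumulate(chunks: Iterable[Dict[str, Any]]) -> UsageTotals:
    """Sum the usage reported by any chunk that carries a 'usage' mapping."""
    totals = UsageTotals()
    for chunk in chunks:
        usage = chunk.get("usage")
        if not usage:
            continue  # assumption: most streaming chunks carry text only
        # Missing or None counts contribute nothing to the running totals.
        totals.input_tokens += int(usage.get("input_tokens") or 0)
        totals.output_tokens += int(usage.get("output_tokens") or 0)
    return totals
```

If the upstream API reports cumulative usage only on the final chunk, taking the last non-empty `usage` instead of summing would be the appropriate variant.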
Comment on lines +295 to +307

    in_details = _get(usage, "input_token_details") or {}
    out_details = _get(usage, "output_token_details") or {}

    record = TokenUsage(
        input_tokens=inp,
        output_tokens=out,
        total_tokens=tot,
        input_audio_tokens=_to_int(_get(in_details, "audio_tokens")),
        input_text_tokens=_to_int(_get(in_details, "text_tokens")),
        input_cached_tokens=_to_int(_get(in_details, "cached_tokens")),
        output_audio_tokens=_to_int(_get(out_details, "audio_tokens")),
        output_text_tokens=_to_int(_get(out_details, "text_tokens")),
    )
Comment on lines +34 to +36

    # Token usage tracking configuration
    AZURE_OPENAI_MODEL_DEPLOYMENT=
    TEAM_NAME=
Comment on lines +713 to +721

    self._log.info(
        "[TOKEN USAGE] agent=%s model=%s input=%d output=%d total=%d %s",
        agent_name,
        model_deployment_name,
        usage.input_tokens,
        usage.output_tokens,
        usage.total_tokens,
        " ".join(f"{k}={v}" for k, v in dimensions.items() if v),
    )
- Use TokenUsageScope as a context manager (with statement) instead of a manual __exit__ call to guarantee emission on all exit paths
- Fix extract_realtime_usage to preserve None for missing optional token detail fields instead of coercing to 0
- Remove redundant double extraction in TokenUsageScope.add() since extract_usage_from_stream_chunk already calls extract_usage internally
- Hash user_id in emit_all() log statement to prevent leaking raw IDs
- Remove unused 'patch' import from test module
- Add missing LLM_TOKEN_SAMPLE_RATE, LLM_TOKEN_USER_ID_HMAC_KEY, and LLM_TOKEN_PRICING to .env.sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
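The first item in this commit relies on the context-manager protocol: `__exit__` runs whether the `with` body finishes normally or raises. A rough sketch of that guarantee follows. `UsageScope` is a hypothetical stand-in for `TokenUsageScope`, whose real constructor and `add()` signature are not shown on this page.

```python
from typing import Callable, Dict, List


class UsageScope:
    """Hypothetical scope that emits accumulated usage when the block exits."""

    def __init__(self, emit: Callable[[Dict[str, int]], None]) -> None:
        self._emit = emit
        self._totals = {"input_tokens": 0, "output_tokens": 0}

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self._totals["input_tokens"] += input_tokens
        self._totals["output_tokens"] += output_tokens

    def __enter__(self) -> "UsageScope":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal completion and on exceptions alike.
        self._emit(dict(self._totals))
        return False  # do not swallow the caller's exception


emitted: List[Dict[str, int]] = []


def run(fail: bool) -> None:
    with UsageScope(emitted.append) as scope:
        scope.add(10, 4)
        if fail:
            raise RuntimeError("stream interrupted")
        scope.add(0, 6)
```

With a manual `__exit__` call placed after the streaming loop, the failing path would skip emission entirely; the `with` form still records the tokens consumed before the error.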
Comment on lines +59 to 61

    "enableMonitoring": {
      "value": "${enableMonitoring}"
    }
Comment on lines +719 to +722

    safe_dims = dict(dimensions)
    if "user_id" in safe_dims:
        safe_dims["user_id"] = self._apply_user_id_hash(safe_dims["user_id"])
- Fix duplicate/conflicting imports in history_service.py (consolidated to single import line with get_azure_credential_async and build_async_azure_credential, removed unused get_azure_credential)
- Fix enableMonitoring parameter to use azd-compatible env var pattern with default value (AZURE_ENV_ENABLE_MONITORING=false)
- Strip user_id from logs entirely when HMAC hasher is not configured to prevent PII leakage in application logs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
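The last item in this commit (hash `user_id` when a key is configured, drop it otherwise) can be sketched with the standard-library `hmac` module. This is an illustration under assumptions: the PR's `_apply_user_id_hash` body is not shown, so the SHA-256 digest choice and the function names below are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_user_id(user_id: str, key: Optional[bytes]) -> Optional[str]:
    """Return a keyed SHA-256 digest, or None when no key is configured."""
    if not key:
        return None
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def safe_log_dimensions(dimensions: Dict[str, str], key: Optional[bytes]) -> Dict[str, str]:
    """Copy the dimensions, replacing or removing user_id before logging."""
    safe = dict(dimensions)  # never mutate the caller's dict
    if "user_id" in safe:
        hashed = hash_user_id(safe["user_id"], key)
        if hashed is None:
            del safe["user_id"]  # no key: omit the raw ID entirely
        else:
            safe["user_id"] = hashed
    return safe
```

A keyed hash keeps per-user aggregation possible in dashboards while preventing anyone without the key from confirming a guessed ID by hashing it.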
…iles
These were accidentally committed alongside the token telemetry feature. They are not part of the token monitoring fix scope.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

These were accidentally included in commit caabe82 and are not part of the token monitoring fix scope.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The fallback hasattr(__iter__) check does accept arbitrary iterables (excluding str/bytes/Mapping), so update the docstring accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
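The docstring fix in the last commit concerns a fallback that treats any iterable as a collection except `str`, `bytes`, and `Mapping`. A small sketch of such a check is below; `is_item_collection` is a hypothetical name, and the function in the PR that contains this check is not shown on this page.

```python
from collections.abc import Mapping
from typing import Any


def is_item_collection(value: Any) -> bool:
    """True for arbitrary iterables, excluding str, bytes, and mappings."""
    if isinstance(value, (str, bytes, Mapping)):
        # These are iterable, but iterating them yields characters,
        # byte values, or keys rather than usage items.
        return False
    return hasattr(value, "__iter__")
```

Because the check accepts generators and other one-shot iterables, a caller that iterates the value will consume it, which is the kind of behavior a docstring should state explicitly.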