fix: Token fix acount Token count Monitor #945

Open
Ashwal-Microsoft wants to merge 10 commits into dev from Token-fix-Acount

Ashwal-Microsoft commented Jun 2, 2026

Purpose

  • Implement comprehensive token usage tracking across all LLM call sites

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

Ashwal-Microsoft and others added 4 commits May 14, 2026 15:09
…, teams, and models

- Add token_usage_utils.py with extraction and emission utilities
- Integrate token tracking into chat_service.py streaming flow
- Add KQL queries and Azure Monitor workbook for dashboards
- Add unit tests (27 tests) for token usage utilities
- Add AZURE_OPENAI_MODEL_DEPLOYMENT and TEAM_NAME env vars

Tracks per-agent, per-user, per-team, and per-model token consumption
to Application Insights for monitoring, cost estimation, and optimization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
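
As a rough illustration of the per-agent, per-user, per-team, and per-model breakdown described in the commit message above, a token event might be flattened into string properties like this. This is only a sketch: the `TokenEvent` class, its field names, and the example values are hypothetical and are not taken from the PR, and no Application Insights call is made here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenEvent:
    """Hypothetical event shape; field names are illustrative only."""

    agent: str
    model: str
    team: Optional[str]
    user_id: Optional[str]
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def to_properties(self) -> dict:
        """Flatten to a str -> str dict, dropping dimensions that are unset."""
        props = {k: str(v) for k, v in asdict(self).items() if v is not None}
        props["total_tokens"] = str(self.total_tokens)
        return props


# Example values are placeholders, not real deployment or team names.
event = TokenEvent("chat", "example-deployment", "support", None, 120, 30)
properties = event.to_properties()
```

Dropping unset dimensions, rather than emitting empty strings, keeps downstream group-by queries from producing a spurious empty bucket. Whether the PR does the same is not shown in this conversation.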

Copilot AI left a comment

Pull request overview

This pull request introduces a cross-accelerator token-usage telemetry module and wires it into key backend LLM call sites, so that token usage is emitted as standardized Application Insights custom events. It also adds supporting sample, config, and dashboard assets.

Changes:

  • Added common.logging.llm_token_telemetry with token extraction helpers, an emitter, and a scope/decorator for consistent event emission.
  • Introduced a process-wide token_emitter singleton (src/api/telemetry.py) and integrated token tracking into chat streaming and title generation.
  • Added supporting artifacts for monitoring and sample data (KQL queries, infra parameter, sample transcripts/SQL inserts) and corresponding tests.
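
For readers unfamiliar with tracking usage across a streaming response, the idea of accumulating counts over chunks can be sketched as below. This assumes each chunk is a mapping that may carry an incremental `usage` dict; the real chunk shape and whether its counts are incremental or cumulative are not shown in this PR conversation, and `accumulate_usage` is a hypothetical name.

```python
from typing import Iterable, Mapping


def accumulate_usage(chunks: Iterable[Mapping]) -> dict:
    """Sum per-chunk usage counts; chunks without usage are skipped.

    Assumption: usage values are increments. If a provider reports
    cumulative totals instead, the last value should be kept, not summed.
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for chunk in chunks:
        usage = chunk.get("usage")
        if not usage:
            continue
        for key in totals:
            totals[key] += int(usage.get(key) or 0)
    totals["total_tokens"] = totals["input_tokens"] + totals["output_tokens"]
    return totals


stream = [
    {"text": "Hel"},
    {"text": "lo", "usage": {"input_tokens": 10, "output_tokens": 2}},
    {"usage": {"output_tokens": 3}},
]
totals = accumulate_usage(stream)
```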

Reviewed changes

Copilot reviewed 15 out of 23 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/tests/api/common/logging/test_llm_token_telemetry.py | New unit tests for token telemetry helpers/emitter/scope. |
| src/api/telemetry.py | Adds a process-wide TokenUsageEmitter singleton configured via env vars. |
| src/api/services/history_service.py | Emits token usage for the title-generation agent run. |
| src/api/services/chat_service.py | Tracks/accumulates token usage across streaming agent chunks and emits telemetry. |
| src/api/common/logging/llm_token_telemetry.py | New core telemetry implementation: extraction, event emission, scope/decorator. |
| src/api/.env.sample | Adds env placeholders for token-tracking related settings. |
| infra/scripts/index_scripts/sql_files/processed_new_key_phrases.sql | Adds SQL insert script content for processed key phrases (sample data). |
| infra/scripts/index_scripts/sql_files/processed_data_batch_insert.sql | Adds batch insert SQL for processed conversation records (sample data). |
| infra/main.parameters.json | Adds enableMonitoring parameter substitution for deployments. |
| infra/dashboards/token-usage-queries.kql | Adds ready-to-run App Insights KQL queries for token-usage monitoring/cost estimation. |
| call_transcripts/convo_*.json | Adds sample call transcript JSON files used by data processing flows. |

Comment thread src/api/services/chat_service.py Outdated
Comment on lines +295 to +307
in_details = _get(usage, "input_token_details") or {}
out_details = _get(usage, "output_token_details") or {}

record = TokenUsage(
    input_tokens=inp,
    output_tokens=out,
    total_tokens=tot,
    input_audio_tokens=_to_int(_get(in_details, "audio_tokens")),
    input_text_tokens=_to_int(_get(in_details, "text_tokens")),
    input_cached_tokens=_to_int(_get(in_details, "cached_tokens")),
    output_audio_tokens=_to_int(_get(out_details, "audio_tokens")),
    output_text_tokens=_to_int(_get(out_details, "text_tokens")),
)
Comment thread src/api/common/logging/llm_token_telemetry.py Outdated
Comment thread src/tests/api/common/logging/test_llm_token_telemetry.py Outdated
Comment thread src/api/.env.sample
Comment on lines +34 to +36
# Token usage tracking configuration
AZURE_OPENAI_MODEL_DEPLOYMENT=
TEAM_NAME=
Comment on lines +713 to +721
self._log.info(
    "[TOKEN USAGE] agent=%s model=%s input=%d output=%d total=%d %s",
    agent_name,
    model_deployment_name,
    usage.input_tokens,
    usage.output_tokens,
    usage.total_tokens,
    " ".join(f"{k}={v}" for k, v in dimensions.items() if v),
)
Ashwal-Microsoft and others added 2 commits June 2, 2026 22:24
- Use TokenUsageScope as context manager (with statement) instead of
  manual __exit__ call to guarantee emission on all exit paths
- Fix extract_realtime_usage to preserve None for missing optional
  token detail fields instead of coercing to 0
- Remove redundant double extraction in TokenUsageScope.add() since
  extract_usage_from_stream_chunk already calls extract_usage internally
- Hash user_id in emit_all() log statement to prevent leaking raw IDs
- Remove unused 'patch' import from test module
- Add missing LLM_TOKEN_SAMPLE_RATE, LLM_TOKEN_USER_ID_HMAC_KEY, and
  LLM_TOKEN_PRICING to .env.sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
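
To illustrate why a `with` statement guarantees emission on every exit path, here is a minimal sketch of a scope object. `UsageScope`, its constructor arguments, and the emitted dict are hypothetical and only approximate the idea behind `TokenUsageScope`; the PR's real class is not shown here.

```python
from typing import Callable


class UsageScope:
    """Hypothetical scope: accumulates counts and emits once on exit."""

    def __init__(self, emit: Callable[[dict], None], agent: str) -> None:
        self._emit = emit
        self._agent = agent
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, input_tokens: int = 0, output_tokens: int = 0) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def __enter__(self) -> "UsageScope":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit, early return, and exceptions alike.
        self._emit({
            "agent": self._agent,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "failed": exc_type is not None,
        })
        return False  # never swallow the caller's exception


emitted = []
try:
    with UsageScope(emitted.append, agent="chat") as scope:
        scope.add(input_tokens=12, output_tokens=4)
        raise RuntimeError("stream interrupted")
except RuntimeError:
    pass
```

Even though the body raises, `emitted` ends up with one record, which is the property a manual `__exit__` call cannot guarantee.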
Copilot AI review requested due to automatic review settings June 3, 2026 10:06
github-actions Bot commented Jun 3, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| src/api/telemetry.py | 46 | 24 | 47% | 39–43, 50, 52–54, 56, 63–76 |
| src/api/common/logging/llm_token_telemetry.py | 422 | 144 | 65% | 103, 108, 137, 165, 176–182, 210, 229, 248–261, 276, 285–287, 289–293, 295–296, 298, 300–302, 304, 314, 323–324, 332–348, 362–363, 413–414, 419–425, 429, 445, 450–454, 459–461, 468–473, 484, 486–488, 498, 513–514, 523, 530–532, 540–542, 554, 572, 589, 605, 625, 648–650, 676, 722, 790–792, 800, 803–804, 830–831, 857, 859–860, 862–869, 871–877, 879–880, 885, 895 |
| src/api/services/chat_service.py | 191 | 25 | 86% | 64–65, 220–222, 225–234, 261–264, 268, 271, 282–283, 303–304 |
| src/api/services/history_service.py | 213 | 24 | 88% | 110, 241–242, 244, 281–283, 299, 305–307, 324, 340–341, 343, 359, 385–386, 388, 404, 424–425, 427, 446 |
| TOTAL | 1808 | 333 | 81% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 191 | 0 💤 | 0 ❌ | 0 🔥 | 7.071s ⏱️ |

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 23 changed files in this pull request and generated 9 comments.

Comment thread src/api/services/history_service.py Outdated
Comment on lines +59 to 61
"enableMonitoring": {
  "value": "${enableMonitoring}"
}
Comment on lines +719 to +722
safe_dims = dict(dimensions)
if "user_id" in safe_dims:
    safe_dims["user_id"] = self._apply_user_id_hash(safe_dims["user_id"])

Comment thread infra/scripts/index_scripts/sql_files/processed_data_batch_insert.sql Outdated
- Fix duplicate/conflicting imports in history_service.py (consolidated
  to single import line with get_azure_credential_async and
  build_async_azure_credential, removed unused get_azure_credential)
- Fix enableMonitoring parameter to use azd-compatible env var pattern
  with default value (AZURE_ENV_ENABLE_MONITORING=false)
- Strip user_id from logs entirely when HMAC hasher is not configured
  to prevent PII leakage in application logs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
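
The "hash when a key is configured, otherwise strip" behaviour described in the commit message above could look roughly like the following. This is a sketch under assumptions: `safe_dimensions` is a hypothetical helper, HMAC-SHA256 is assumed as the hash, and the key is passed in directly rather than read from `LLM_TOKEN_USER_ID_HMAC_KEY`.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def safe_dimensions(dimensions: Mapping[str, str], hmac_key: Optional[bytes]) -> dict:
    """Return a copy that is safe to log.

    With a key, user_id is replaced by its HMAC-SHA256 hex digest.
    Without a key, user_id is removed entirely so the raw ID is never logged.
    """
    safe = dict(dimensions)
    raw = safe.pop("user_id", None)
    if raw is not None and hmac_key:
        safe["user_id"] = hmac.new(
            hmac_key, raw.encode("utf-8"), hashlib.sha256
        ).hexdigest()
    return safe


no_key = safe_dimensions({"user_id": "alice", "team": "support"}, None)
with_key = safe_dimensions({"user_id": "alice", "team": "support"}, b"example-key")
```

A keyed hash is used here instead of a plain SHA-256 so that IDs cannot be recovered by hashing a list of known user names without the key.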
…iles

These were accidentally committed alongside the token telemetry feature.
They are not part of the token monitoring fix scope.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 3, 2026 12:28
These were accidentally included in commit caabe82 and are not
part of the token monitoring fix scope.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread src/api/common/logging/llm_token_telemetry.py Outdated
The fallback hasattr(__iter__) check does accept arbitrary iterables
(excluding str/bytes/Mapping), so update the docstring accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
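
The fallback check mentioned in this commit message can be sketched as below. The function name `accepts_as_iterable` is hypothetical, and the surrounding logic in `llm_token_telemetry.py` is not shown in this conversation; only the rule itself (any iterable except `str`, `bytes`, and `Mapping`) comes from the commit message.

```python
from collections.abc import Mapping
from typing import Any


def accepts_as_iterable(value: Any) -> bool:
    """True for arbitrary iterables, excluding str, bytes, and mappings."""
    if isinstance(value, (str, bytes, Mapping)):
        return False
    return hasattr(value, "__iter__")


checks = {
    "list": accepts_as_iterable([1, 2]),
    "generator": accepts_as_iterable(x for x in range(2)),
    "str": accepts_as_iterable("abc"),
    "bytes": accepts_as_iterable(b"abc"),
    "dict": accepts_as_iterable({"a": 1}),
    "int": accepts_as_iterable(5),
}
```

Strings, bytes, and mappings are iterable in Python, so they have to be excluded explicitly. Otherwise a single usage dict or a raw string would be treated as a sequence of items.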
3 participants