
feat(copaw): propagate traces to child agents and suppress duplicate entry spans#164

Open
Cirilla-zmh wants to merge 2 commits into alibaba:main from Cirilla-zmh:feat/copaw_multi_agents

Cirilla-zmh (Collaborator) commented Apr 14, 2026

Description

What changed

Shell subprocess: trace propagation (multi_agent_collaboration)

When AgentScope’s execute_shell_command runs a command that looks like a CoPaw sub-agent chat (the command string contains copaw, agents, and chat), the instrumentation merges the current trace context into the subprocess environment as W3C TRACEPARENT / TRACESTATE (plus any other configured propagators), following OpenTelemetry’s environment-variable carrier semantics. It also sets COPAW_OTEL_CHILD_AGENT=1 so the child process knows it is running as a sub-agent. An opt-in flag, COPAW_OTEL_INJECT_SHELL_TRACE, forces injection for every shell invocation; it is documented as advanced and risky for non-CoPaw children.
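A minimal sketch of the parent-side logic described above, assuming a plain substring check for the command match and a caller-supplied inject callable that fills a dict carrier (in real code this could be opentelemetry.propagate.inject, whose default setter writes lower-case keys such as traceparent). The helper names looks_like_copaw_subagent, should_inject and build_subprocess_env are hypothetical, not the PR's actual functions.

```python
import os
from typing import Callable, Dict, Mapping, MutableMapping, Optional

# Env var names taken from the PR description; helper names are hypothetical.
CHILD_FLAG = "COPAW_OTEL_CHILD_AGENT"
FORCE_FLAG = "COPAW_OTEL_INJECT_SHELL_TRACE"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted values for the opt-in flag


def looks_like_copaw_subagent(command: str) -> bool:
    # Assumed heuristic: all three words appear somewhere in the command string.
    return all(word in command for word in ("copaw", "agents", "chat"))


def should_inject(command: str, environ: Mapping[str, str]) -> bool:
    # The opt-in flag forces injection for every shell command.
    forced = environ.get(FORCE_FLAG, "").strip().lower() in _TRUTHY
    return forced or looks_like_copaw_subagent(command)


def build_subprocess_env(
    command: str,
    inject: Callable[[MutableMapping[str, str]], None],
    base_env: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Return an env for the child, or None to leave the shell call untouched."""
    merged = dict(os.environ if base_env is None else base_env)
    if not should_inject(command, merged):
        return None
    headers: Dict[str, str] = {}
    try:
        inject(headers)  # e.g. fills {"traceparent": ..., "tracestate": ...}
    except Exception:
        return merged  # fail safe: no trace fields and no child marker
    if "traceparent" not in {key.lower() for key in headers}:
        return merged  # no active span: do not tell the child to suppress its entry
    for key, value in headers.items():
        # Environment-carrier style: upper-case names, "-" mapped to "_".
        merged[key.upper().replace("-", "_")] = value
    merged[CHILD_FLAG] = "1"
    return merged
```

Returning None for non-matching commands keeps unrelated shell calls on their original code path. The "no traceparent, no child marker" check is an extra assumption of this sketch, so that a child never suppresses its entry span without a parent context to attach to.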

Entry span behavior in child processes

AgentRunner.query_handler is updated so that, when COPAW_OTEL_CHILD_AGENT indicates a child agent process, it does not create a new enter_ai_application_system span. Instead, it attaches the context extracted from the environment, so AgentScope and other spans continue the parent's trace.
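The child-side decision can be sketched as follows. This is a simplified, synchronous illustration with hypothetical names (is_child_agent_process, run_query); attach, detach and extract_context are passed in so the sketch runs without OpenTelemetry installed. In real code they would map to opentelemetry.context.attach, opentelemetry.context.detach and opentelemetry.propagate.extract with an environment getter, and start_entry_span to whatever opens enter_ai_application_system.

```python
import os
from typing import Any, Callable, ContextManager, Mapping, Optional

CHILD_FLAG = "COPAW_OTEL_CHILD_AGENT"


def is_child_agent_process(environ: Optional[Mapping[str, str]] = None) -> bool:
    # Hypothetical helper: the parent sets the flag to "1" when it injects context.
    env = os.environ if environ is None else environ
    return env.get(CHILD_FLAG, "").strip() == "1"


def run_query(
    handler: Callable[[], Any],
    *,
    start_entry_span: Callable[[], ContextManager[Any]],
    extract_context: Callable[[Mapping[str, str]], Any],
    attach: Callable[[Any], Any],
    detach: Callable[[Any], None],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    env = os.environ if environ is None else environ
    if is_child_agent_process(env):
        # Child: no new entry span; make the parent's context current instead.
        token = attach(extract_context(env))
        try:
            return handler()
        finally:
            detach(token)  # always restore the previous context
    # Parent or standalone process: open the entry span as before.
    with start_entry_span():
        return handler()
```

Detaching in finally mirrors what the review later notes about patch.py: the attached context is restored even if the handler raises.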

Supporting modules

  • _env_carrier.py — local environment getter/setter for propagators (mirrors upstream _envcarrier where the published wheel may not expose it).
  • _constants.py — names and helpers for COPAW_OTEL_CHILD_AGENT, inject flags, and child-process detection.
  • _shell_patch.py — wraps execute_shell_command to merge env and delegate to the same async shell behavior with explicit env=.
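For reference, an environment carrier of the kind _env_carrier.py provides can be as small as the sketch below. It assumes the upper-case, underscore-separated naming used for TRACEPARENT / TRACESTATE; the class and function names are illustrative. The classes only implement the get/keys and set methods that OpenTelemetry's text-map propagators call, so the sketch runs without the SDK; in real code they would subclass opentelemetry.propagators.textmap.Getter and Setter.

```python
from typing import List, Mapping, MutableMapping, Optional


def env_key(key: str) -> str:
    # Assumed normalisation: "traceparent" -> "TRACEPARENT", "x-foo" -> "X_FOO".
    return key.upper().replace("-", "_")


class EnvironmentGetter:
    """Reads propagation fields from an os.environ-style mapping (illustrative)."""

    def get(self, carrier: Mapping[str, str], key: str) -> Optional[List[str]]:
        value = carrier.get(env_key(key))
        return None if value is None else [value]

    def keys(self, carrier: Mapping[str, str]) -> List[str]:
        # Expose names in header style (lower-case).
        return [name.lower() for name in carrier]


class EnvironmentSetter:
    """Writes propagation fields into a dict that becomes a child's env (illustrative)."""

    def set(self, carrier: MutableMapping[str, str], key: str, value: str) -> None:
        carrier[env_key(key)] = value
```

With opentelemetry-api installed, the parent would call propagate.inject(child_env, setter=EnvironmentSetter()) and the child propagate.extract(os.environ, getter=EnvironmentGetter()).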

Tests

  • instrumentation-loongsuite/loongsuite-instrumentation-copaw/tests/test_shell_propagate.py — asserts trace env injection and COPAW_OTEL_CHILD_AGENT for matching shell commands (and related behavior).
  • instrumentation-loongsuite/loongsuite-instrumentation-copaw/tests/test_child_entry_suppression.py — asserts child mode suppresses entry creation while attaching propagated context.

Documentation

  • instrumentation-loongsuite/loongsuite-instrumentation-copaw/README.md — new section Sub-agent CLI and trace continuity (multi_agent_collaboration), including baggage / OTEL_PROPAGATORS notes and advanced COPAW_OTEL_INJECT_SHELL_TRACE warning.

Type of change


  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?


  • Unit tests

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Change-Id: Iadc503014273de1c9455ff6f7518474083dbf1a8
Co-developed-by: Cursor <noreply@cursor.com>
Cirilla-zmh added the enhancement, instrumentaion, and genai labels Apr 14, 2026
Change-Id: Ibfc5a4ad46670bd1bb18bcefcad95f876ebf389d
Co-developed-by: Cursor <noreply@cursor.com>

Copilot AI (Contributor) left a comment


Pull request overview

This PR enhances the CoPaw OpenTelemetry instrumentation to keep a single trace across parent/child CoPaw agent processes invoked via AgentScope shell tooling, while avoiding duplicate “entry” spans in child processes.

Changes:

  • Injects current trace context into execute_shell_command subprocess environments for copaw agents chat (or via opt-in env flag).
  • Suppresses enter_ai_application_system entry span creation in child CoPaw processes and attaches extracted parent context instead.
  • Adds supporting modules, tests validating propagation/suppression behavior, and updates documentation + changelog.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
instrumentation-loongsuite/loongsuite-instrumentation-copaw/src/opentelemetry/instrumentation/copaw/_shell_patch.py | Wraps AgentScope shell execution to inject trace env and mark child processes.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/src/opentelemetry/instrumentation/copaw/patch.py | Updates query_handler wrapper to suppress entry in child mode and attach extracted context.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/src/opentelemetry/instrumentation/copaw/_env_carrier.py | Adds env getter/setter carrier for propagator injection/extraction.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/src/opentelemetry/instrumentation/copaw/_constants.py | Centralizes env var names and child-process detection helper.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/src/opentelemetry/instrumentation/copaw/__init__.py | Instruments/uninstruments AgentScope execute_shell_command in addition to query_handler.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/tests/test_shell_propagate.py | Tests shell command matching and env injection of TRACEPARENT + child marker.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/tests/test_child_entry_suppression.py | Tests child mode suppresses entry span emission.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/README.md | Documents sub-agent CLI trace continuity and the opt-in forced injection flag.
instrumentation-loongsuite/loongsuite-instrumentation-copaw/CHANGELOG.md | Notes the multi-agent trace propagation and child entry suppression feature.


Comment on lines +68 to +69
await asyncio.wait_for(proc.wait(), timeout=timeout)
stdout, stderr = await proc.communicate()
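The two lines above wait for the process to exit before reading its pipes. Python's asyncio documentation warns that Process.wait() can deadlock when stdout/stderr are pipes and the child writes more than the OS pipe buffer holds: the child blocks on write, so wait() never returns and the call only ends at the timeout. A sketch of the usual alternative drains the pipes with communicate() under the timeout. The function name is hypothetical; asyncio.shield keeps the read task alive so output written before a timeout is not lost.

```python
import asyncio
from typing import Mapping, Optional, Tuple


async def run_shell_with_timeout(
    command: str,
    timeout: float,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[int], str, str, bool]:
    """Return (returncode, stdout, stderr, timed_out). Hypothetical helper."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env=None if env is None else dict(env),
    )
    # communicate() reads both pipes while waiting, so a chatty child never blocks.
    reader = asyncio.ensure_future(proc.communicate())
    timed_out = False
    try:
        # shield(): on timeout only the outer wait is cancelled, not the reader.
        stdout, stderr = await asyncio.wait_for(asyncio.shield(reader), timeout)
    except asyncio.TimeoutError:
        timed_out = True
        try:
            proc.terminate()
        except ProcessLookupError:
            pass  # the process exited just as the timeout fired
        stdout, stderr = await reader  # collect whatever was written
    return (
        proc.returncode,
        stdout.decode("utf-8", errors="replace"),
        stderr.decode("utf-8", errors="replace"),
        timed_out,
    )
```

terminate() only signals the shell itself, so a grandchild that keeps the pipes open can still delay the final read; the original snippet has the same property.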
Comment on lines +147 to +157
env = _build_subprocess_env()
try:
    return await _run_shell_command_with_env(command, timeout, env)
except Exception:
    logger.debug(
        "%s.%s inject path failed; falling back to original",
        _MODULE_SHELL,
        _PATCH_TARGET,
        exc_info=True,
    )
    return await wrapped(*args, **kwargs)
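In the snippet above the try block covers the whole injected run, so, assuming the re-implemented helper can raise after the command has started (for example a UnicodeDecodeError if output is decoded strictly as UTF-8), control lands in the fallback and the command executes a second time. One way to avoid double execution, sketched with hypothetical names (call_shell_tool, build_env, run_with_env), is to allow the fallback only while preparing the environment and let errors from the run itself propagate. It also forwards **kwargs on the injected path.

```python
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)


async def call_shell_tool(
    wrapped: Callable[..., Awaitable[Any]],
    build_env: Callable[[str], Optional[Dict[str, str]]],
    run_with_env: Callable[..., Awaitable[Any]],
    command: str,
    timeout: float,
    **kwargs: Any,
) -> Any:
    # Hypothetical wrapper: only the preparation phase may fall back.
    try:
        env = build_env(command)
    except Exception:
        logger.debug("building trace env failed; using original", exc_info=True)
        env = None
    if env is None:
        # Nothing to inject (or preparation failed): run the original once.
        return await wrapped(command, timeout, **kwargs)
    # The command may already be running from here on, so errors propagate
    # rather than triggering a second execution through the fallback.
    return await run_with_env(command, timeout, env, **kwargs)
```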
Comment on lines +80 to +91
try:
    proc.terminate()
    stdout, stderr = await proc.communicate()
    stdout_str = stdout.decode("utf-8")
    stderr_str = stderr.decode("utf-8")
    if stderr_str:
        stderr_str += f"\n{stderr_suffix}"
    else:
        stderr_str = stderr_suffix
except ProcessLookupError:
    stdout_str = ""
    stderr_str = stderr_suffix
    )
except Exception:
    logger.debug("Failed to inject trace into env", exc_info=True)
return merged
github-actions Bot commented May 1, 2026

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

github-actions Bot added the Stale label May 1, 2026
ralf0131 (Collaborator) commented May 5, 2026

We need to make progress on this so that it isn't closed.

github-actions Bot removed the Stale label May 6, 2026
github-actions Bot commented

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

github-actions Bot added the Stale label May 20, 2026

ralf0131 (Collaborator) left a comment

Review by github-manager-bot

Summary

Propagates W3C trace context across CoPaw parent→child processes by wrapping AgentScope's execute_shell_command (injects TRACEPARENT/TRACESTATE + COPAW_OTEL_CHILD_AGENT=1) and, in the child, suppresses the duplicate enter_ai_application_system entry span while attaching the parent context.

Findings

  • [Warning] _shell_patch.py:_run_shell_command_with_env — when the trace-inject path is taken, only command and timeout are forwarded to the re-implemented subprocess call. Any additional **kwargs the upstream execute_shell_command accepts (e.g. cwd, env, shell) are silently dropped on the inject path (the except fallback calls the original wrapped(*args, **kwargs), but the success path does not). If upstream grows new parameters, the instrumented invocation will quietly lose them. Consider forwarding **kwargs (or at least documenting the supported subset) so the wrapper does not diverge.
  • [Warning] _shell_patch.py:_run_shell_command_with_env duplicates the upstream subprocess behavior. This is a maintenance risk: if upstream execute_shell_command changes its return contract, error handling, or working-directory logic, the duplicate won't track it. It is worth adding a comment that links to the upstream source version it mirrors.
  • [Info] _shell_patch.py:_build_subprocess_env — the fail-safe behavior on inject error is good (returns merged without COPAW_OTEL_CHILD_AGENT, so the child neither suppresses entry nor gets partial context).
  • [Info] patch.py — child mode correctly gates both start_entry and the GeneratorExit entry-cleanup, and the finally detaches the context token. Solid.

Suggestions

For the kwargs concern, a minimal safe change:

async def _run_shell_command_with_env(command, timeout, env, **_ignored):
    ...

…or, better, pass through the remaining kwargs to avoid silently dropping them.
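A sketch of the pass-through option, assuming the helper spawns the command with asyncio.create_subprocess_shell (which forwards extra keyword arguments such as cwd to subprocess.Popen). It treats a caller-supplied env as the base onto which the trace fields are layered, since passing env both explicitly and through **kwargs would raise a TypeError. The function name and the trace_env parameter are hypothetical.

```python
import asyncio
import os
from typing import Any, Mapping, Optional, Tuple


async def run_shell_command_with_env(
    command: str,
    timeout: float,
    trace_env: Mapping[str, str],
    **kwargs: Any,
) -> Tuple[Optional[int], str, str]:
    # Hypothetical helper. Respect a caller-supplied env, then layer trace fields on top.
    base_env = kwargs.pop("env", None)
    env = dict(os.environ if base_env is None else base_env)
    env.update(trace_env)
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env=env,
        **kwargs,  # e.g. cwd= is forwarded instead of silently dropped
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # already exited
        await proc.wait()
        raise
    return (
        proc.returncode,
        stdout.decode("utf-8", errors="replace"),
        stderr.decode("utf-8", errors="replace"),
    )
```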

Cross-repo Note

None — instrumentation is self-contained. (Related: alibaba/loongsuite-pilot does cross-process collection but is unaffected.)


Automated review by github-manager-bot


Labels

enhancement, genai, instrumentaion, Stale


4 participants