[copilot-cli-research] Copilot CLI Deep Research - 2026-05-15 #32287

2026-05-15T05:03:38Z

github-actions[bot]
Bot May 15, 2026

Analysis Date: 2026-05-15
Repository: github/gh-aw
Scope: 99 Copilot workflows out of 494 total (20%)

📊 Executive Summary

Research Topic: Copilot CLI Optimization Opportunities
Key Findings: 8 missed opportunities identified across security, performance, and developer experience
Primary Recommendation: Enable max-continuations for complex multi-step workflows — currently 0% adoption despite being a Copilot-exclusive feature

This analysis examined 99 workflows using engine: copilot against all available Copilot CLI capabilities. The workflows demonstrate strong adoption of safe-outputs (86%), GitHub MCP tools (62%), and cache-memory (32%). However, several powerful features remain entirely unused: max-continuations has 0% adoption, engine.bare mode is never used, and only 2 workflows leverage custom agent files. Model selection is also largely unconfigured (84% use defaults), missing cost/performance optimization opportunities.

The most pressing gap is network security: 52 of 99 Copilot workflows (53%) have no network.allowed configuration, and only 12 (12%) use the AWF sandbox. This represents a significant security hardening opportunity across the repository.

Critical Findings

🔴 High Priority Issues

1. Missing Network Restrictions (53% of workflows)
52 out of 99 Copilot workflows have no network.allowed configuration. Without network restrictions, the agent can reach arbitrary external hosts during execution.

Affected examples: most daily-* report workflows, stale-* cleanup workflows, sub-issue-closer.md

Fix:

network:
  allowed:
    - defaults  # github.com + copilot endpoints

2. AWF Sandbox Only 12% Adoption
Only 12 workflows set sandbox.agent: awf, even though the firewall provides meaningful isolation against prompt injection and data exfiltration.

🟡 Medium Priority Opportunities

1. max-continuations — 0% usage
This is a Copilot-exclusive feature (no other engine supports it). It enables autopilot mode for long multi-step tasks with --max-autopilot-continues. Not a single workflow uses it.

2. Model not specified — 84% of workflows
83 of 99 workflows use the default model. Workflows like daily reports, summaries, and simple triage could use model: small for significant cost reduction.

3. engine.bare — 0% usage
No workflow uses bare mode, which disables automatic context loading (memory files, copilot-instructions.md). Read-only analysis workflows loading unnecessary context waste tokens every run.

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Feature	Config Key	Notes
Custom agent file	`engine.agent`	`.github/agents/*.agent.md`
Version pinning	`engine.version`	Defaults to `latest`
Model selection	`engine.model`	`small`, `large`, or specific model
Custom CLI args	`engine.args`	Appended to copilot CLI invocation
Custom env vars	`engine.env`	Injected into agent step
Enterprise endpoint	`engine.api-target`	GHEC/GHES hostname
Bare mode	`engine.bare` / `bare: true`	Disables context loading
Autopilot continuations	`max-continuations`	Copilot-only feature
AWF Firewall sandbox	`sandbox.agent: awf`	Network + filesystem isolation
Network allowlist	`network.allowed`	Restrict outbound connections
GitHub MCP toolsets	`tools.github.toolsets`	Granular GitHub API access
Cache memory	`tools.cache-memory`	File-based persistence
Repo memory	`tools.repo-memory`	Git-branch-based persistence
BYOK mode	`engine.env.COPILOT_PROVIDER_*`	Bring your own LLM key
Strict mode	`strict: true/false`	Security gate

Usage Statistics (99 Copilot workflows)

Feature	Count	Adoption %
`safe-outputs`	85	86%
`timeout-minutes` set	97	98%
`tools.github` MCP	61	62%
`network.allowed`	47	47%
`tools.cache-memory`	32	32%
`engine.model` specified	16	16%
`tools.repo-memory`	19	19%
`sandbox.agent: awf`	12	12%
`engine.env`	10	10%
`engine.args`	4	4%
`strict: false`	4	4%
`engine.agent`	2	2%
`engine.version` pinned	0	0%
`max-continuations`	0	0%
`engine.bare`	0	0%

2️⃣ Feature Usage Matrix

Feature Category	Available	Used	Not Used	Usage Rate
CLI Execution	max-continuations, bare, agent, version, args	args(4), agent(2)	max-continuations, bare, version	6%
Model Config	small, large, specific model names	small(13), large(3)	default for 83 workflows	16%
Security	AWF sandbox, network allowlist, strict mode	network(47), sandbox(12)	52 without network, 87 without sandbox	12% (sandbox) to 47% (network)
MCP/Tools	github, cache-memory, repo-memory, playwright, mcp-scripts	github(61), cache-memory(32), repo-memory(19)	rarely used: playwright(8), mcp-scripts(1)	varies
Engine Env	BYOK providers, custom vars	10 workflows	89 workflows	10%

3️⃣ Missed Opportunities

🔴 High Priority

Opportunity 1: Network Restriction (53% of workflows unprotected)

What: 52 Copilot workflows have no network.allowed block
Why It Matters: Unrestricted network access lets the agent reach arbitrary hosts, creating a risk of data exfiltration or SSRF
Where: daily-issues-report.md, stale-pr-cleanup.md, sub-issue-closer.md, draft-pr-cleanup.md, and 48 more
How to Implement: Add network: allowed: [defaults] as a minimum baseline

network:
  allowed:
    - defaults

Opportunity 2: AWF Sandbox Adoption (12% → target 40%+)

What: Only 12 workflows enable the AWF firewall sandbox
Why It Matters: The sandbox prevents file system tampering and limits the blast radius of prompt injection
Where: All workflows that process untrusted input (issue bodies, PR descriptions, external data)
How to Implement:

sandbox:
  agent: awf

🟡 Medium Priority

Opportunity 3: `max-continuations` — 0% Usage (Copilot-Exclusive Feature)

What: max-continuations enables autopilot mode, letting Copilot CLI chain multiple runs automatically for long tasks
Why It Matters: Complex workflows like daily-compiler-quality.md, dead-code-remover.md, code-scanning-fixer.md could benefit from chained execution without manual re-triggering
Where: Workflows with complex multi-step tasks or where single runs may time out
How to Implement:

engine: copilot
max-continuations: 3
timeout-minutes: 60

Opportunity 4: Model Selection — 84% Using Defaults

What: 83 workflows rely on default model selection; model: small costs less for simple tasks
Why It Matters: Significant token cost reduction for report/summary/triage workflows
Where: daily-issues-report.md, stale-pr-cleanup.md, weekly-issue-summary.md, auto-triage-issues.md, and 79 more
How to Implement: Use model: small for read-only analysis and reporting; keep model: large for code changes

model: small  # For read-only reports, summaries, triage

Opportunity 5: `engine.bare` Mode — 0% Usage

What: Bare mode (bare: true) disables automatic loading of AGENTS.md, copilot-instructions.md, and memory files
Why It Matters: Read-only analysis workflows waste tokens loading unnecessary context on every run
Where: daily-issues-report.md, api-consumption-report.md, daily-performance-summary.md, all pure-read report workflows
How to Implement:

engine:
  id: copilot
  bare: true

Opportunity 6: `engine.agent` Custom Agents — 2% Usage

What: Custom agent files in .github/agents/ provide workflow-specific system prompts
Why It Matters: Only 2 workflows (daily-agent-of-the-day-blog-writer.md, weekly-blog-post-writer.md) set engine.agent; specialized agents could improve output quality for domain-specific workflows
Where: Workflows like pr-code-quality-reviewer.md, architecture-guardian.md, security-compliance.md would benefit from expert persona agents
How to Implement: Create .github/agents/security-reviewer.agent.md with specialized persona, then reference:

engine:
  id: copilot
  agent: security-reviewer

🟢 Low Priority

Opportunity 7: Version Pinning — 0% Usage

What: No workflow pins a specific Copilot CLI version (engine.version)
Why It Matters: Workflow behavior can change unexpectedly when a new CLI version is released
Where: Smoke tests, critical production workflows
How to Implement: Set engine.id: copilot and pin engine.version, for example version: "0.0.422"
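
Written out as frontmatter, the pin described above might look like the sketch below. It assumes `version` nests under `engine` next to `id`, as the `engine.version` key in the capabilities inventory suggests. The version string is the example value quoted in this report, not a recommendation.

```yaml
engine:
  id: copilot
  # Assumed placement: `version` under `engine`, per the `engine.version` key.
  # "0.0.422" is the example value from this report; pin the release you have tested.
  version: "0.0.422"
```

Pinning trades automatic upgrades for reproducibility, so it probably fits smoke tests better than workflows that should track the latest CLI.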

Opportunity 8: Over-broad GitHub MCP Toolsets

What: 14 workflows use toolsets: [default] which includes all GitHub API tools; many could use narrower sets
Why It Matters: Principle of least privilege; reduces attack surface
Where: Workflows that only read issues
How to Implement: Replace toolsets: [default] with toolsets: [issues] in those workflows
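
A narrowed toolset could be declared roughly as follows. The nesting is an assumption inferred from the `tools.github.toolsets` key in the capabilities inventory, and `issues` is the only narrower toolset name this report mentions.

```yaml
tools:
  github:
    # Assumed nesting, based on the `tools.github.toolsets` inventory key.
    # Previously [default]; narrowed for a workflow that only reads issues.
    toolsets: [issues]
```

A workflow that also needs other GitHub data would list the additional toolsets it requires instead of falling back to `default`.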

4️⃣ Specific Workflow Recommendations

`daily-issues-report.md`

Current State: No network config, no model specified, no sandbox
Recommendations: Add model: small, add network: allowed: [defaults], consider bare: true
Expected Benefit: Reduced token costs, improved security posture
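
Taken together, the three recommendations for this workflow might combine into frontmatter like the sketch below. Placing `model` and `bare` under `engine` is an assumption based on the `engine.model` and `engine.bare` keys in the inventory; the report's own snippets also show `model: small` on its own line, so check the engine reference before copying this.

```yaml
engine:
  id: copilot
  model: small   # assumed to nest under `engine`, per the `engine.model` key
  bare: true     # optional: only if the report needs no repo-specific instructions
network:
  allowed:
    - defaults   # baseline allowlist recommended in Opportunity 1
```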

`dead-code-remover.md`

Current State: Complex code analysis task, no max-continuations
Recommendations: Add max-continuations: 2 to allow multi-pass analysis
Expected Benefit: More complete dead code removal in a single workflow run

`pr-code-quality-reviewer.md`

Current State: Generic Copilot agent for code review
Recommendations: Create .github/agents/code-reviewer.agent.md with review-focused persona; add engine.agent: code-reviewer
Expected Benefit: More consistent, specialized code review output

`auto-triage-issues.md`

Current State: No model specified, no sandbox
Recommendations: Add model: small (the workflow reads issues and performs simple classification) and a network restriction
Expected Benefit: Lower cost, better security for untrusted input processing
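
Because this workflow reads untrusted issue text, a hardened configuration might look like the sketch below. The `sandbox` block is not in this workflow's recommendation list; it is carried over from Opportunity 2, which targets workflows that process untrusted input. The nesting of `model` under `engine` is assumed from the `engine.model` inventory key.

```yaml
engine:
  id: copilot
  model: small   # simple classification; assumed to nest under `engine`
network:
  allowed:
    - defaults   # restrict outbound connections to the default allowlist
sandbox:
  agent: awf     # carried over from Opportunity 2, not from the list above
```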

5️⃣ Trends & Insights

This is the first comprehensive analysis. Key baseline metrics established for future comparison:

max-continuations: 0% → watch for adoption as teams learn about autopilot mode
engine.bare: 0% → new feature, expect gradual adoption
Network config: 47% → target 90%+ as security awareness grows
AWF sandbox: 12% → target 40%+ for workflows handling untrusted input
Model optimization: 16% → target 60%+ to reduce costs

Future research should track whether recommendations from this analysis are implemented and measure trend lines for each metric.

6️⃣ Best Practice Guidelines

Based on this research, here are recommended best practices:

Always set network.allowed: Even allowed: [defaults] is significantly better than open access. Make it a required field in workflow reviews.
Use model: small for non-coding tasks: Daily reports, issue summaries, triage workflows, and read-only analysis rarely need the full model. This alone could cut token costs for 50 or more workflows.
Adopt max-continuations for long-running tasks: Complex workflows that currently time out or need re-triggering are ideal candidates. Start with max-continuations: 2 and monitor.
Create domain-specific agent files: The .github/agents/ directory is underutilized. Creating specialized agents (security-reviewer, code-quality, documentation-writer) could significantly improve output quality.
Use engine.bare for pure analysis workflows: Any workflow that reads data and reports without needing repo-specific instructions should set bare: true to skip context loading.

7️⃣ Action Items

Immediate Actions (this week):

Add network: allowed: [defaults] to the 52 workflows without any network config
Identify top 10 report workflows and add model: small

Short-term (this month):

Enable max-continuations: 2 on dead-code-remover.md, code-scanning-fixer.md, repository-quality-improver.md
Add bare: true to 10+ pure read/report workflows
Create 2-3 specialized agent files for common workflow types (code-reviewer, report-writer, security-analyst)

Long-term (this quarter):

Expand AWF sandbox adoption from 12% to 40%+
Audit all toolsets: [default] usages and narrow to minimum required toolsets
Add version pinning to smoke test workflows for reproducibility

📚 References

Copilot Engine Documentation: docs/src/content/docs/reference/engines.md
Engine Execution Code: pkg/workflow/copilot_engine_execution.go
Engine Tools Code: pkg/workflow/copilot_engine_tools.go
MCP Config Code: pkg/workflow/copilot_mcp.go

Research Methodology

Scanned all 494 workflow .md files in .github/workflows/
Identified 99 workflows using engine: copilot
Used grep pattern analysis to measure feature adoption rates
Cross-referenced available features from Go source code and documentation
Compared available features vs actual usage to identify gaps
Workflows examined: all 99 Copilot workflows for presence/absence of 15 feature categories

Generated by Copilot CLI Deep Research (Run: §25901095924)

expires on May 16, 2026, 5:03 AM UTC

2026-05-16T04:53:58Z

github-actions[bot]
Bot May 16, 2026
Author

This discussion has been marked as outdated by Copilot CLI Deep Research Agent.

A newer discussion is available at Discussion #32544.
