AIDC-AI/complex-mcp

ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

ComplexMCP is a benchmark for evaluating model performance in complex software workflows and large API tool ecosystems.

1) Build Environment Via Docker

docker build -t complexmcp:latest .
docker run -d --name complexmcp \
  -p 8000-8007:8000-8007 \
  -p 9000-9006:9000-9006 \
  complexmcp:latest
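
Once the container is up, it can help to confirm that the published ports accept connections before starting a run. The following is a minimal sketch, not part of the repository: it assumes the container runs on the local machine, and the names `PUBLISHED_PORTS` and `check_ports` are hypothetical. An open port only shows that the mapping is live; it does not prove the service behind it is healthy.

```python
import socket

# Port ranges taken from the -p flags of the `docker run` command above.
PUBLISHED_PORTS = list(range(8000, 8008)) + list(range(9000, 9007))


def check_ports(host="127.0.0.1", ports=PUBLISHED_PORTS, timeout=1.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    status = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            status[port] = sock.connect_ex((host, port)) == 0
    return status
```

Calling `check_ports()` and listing the ports whose value is `False` gives a quick view of which mappings are not reachable yet.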

2) Create .env

Create a .env file in the project root, following the format of .env.example.

cp .env.example .env

Then fill in the values in .env as needed.
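
To catch a variable that was dropped while editing, the two files can be compared by key. This is a rough sketch under the assumption that both files use plain `KEY=VALUE` lines with `#` comments; the helper names `env_keys` and `missing_keys` are hypothetical and not part of the repository. It does not check whether a value is empty or valid.

```python
from pathlib import Path


def env_keys(path):
    """Return the variable names defined on KEY=VALUE lines of a dotenv-style file."""
    keys = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.append(key)
    return keys


def missing_keys(example_path=".env.example", env_path=".env"):
    """Return keys present in the example file but absent from the .env file."""
    present = set(env_keys(env_path))
    return [key for key in env_keys(example_path) if key not in present]
```

Running `missing_keys()` from the project root returns an empty list when every key from .env.example also appears in .env.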

3) Run Benchmark

python run_benchmark.py --tool-config config/general.yaml \
  --model [model_name]
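
To evaluate several models in sequence, the command above can be wrapped in a small driver. The sketch below is illustrative only: it uses just the two flags shown above, assumes it is launched from the project root, and the names `build_command` and `run_models` are hypothetical. It uses `sys.executable` in place of `python` so the benchmark runs under the same interpreter as the driver.

```python
import subprocess
import sys


def build_command(model, tool_config="config/general.yaml"):
    """Build the argument list for one benchmark run of the given model."""
    return [
        sys.executable, "run_benchmark.py",
        "--tool-config", tool_config,
        "--model", model,
    ]


def run_models(models, runner=subprocess.run):
    """Run the benchmark once per model and return {model: exit code}."""
    exit_codes = {}
    for model in models:
        # check=False keeps going after a failed run so every model is attempted.
        result = runner(build_command(model), check=False)
        exit_codes[model] = result.returncode
    return exit_codes
```

Passing a list of model names to `run_models` returns a mapping of exit codes, where a nonzero value marks a run that did not finish cleanly.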

If you find this work helpful, please cite our paper:

@misc{li2026complexmcpevaluationllmagents,
      title={ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox}, 
      author={Yuanyang Li and Xue Yang and Longyue Wang and Weihua Luo and Hongyang Chen},
      year={2026},
      eprint={2605.10787},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.10787}, 
}

About

[ICML 2026] ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
