📣News | 🏆Leaderboard | 📊Comparison | 🚀Setup | ⚙️Artifacts | 📜Attribution | 🙏Acknowledgements
Live-SWE-agent is the first live, runtime self-evolving software engineering agent that expands and revises its own capabilities on the fly while working on a real-world issue. Our key insight is that software agents are themselves software systems, and modern LLM-based agents already possess the intrinsic capability to extend or modify their own behavior at runtime.
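To make the idea of runtime self-extension concrete, here is a minimal, hypothetical sketch. It is not Live-SWE-agent's actual implementation. It assumes an agent whose only primitive is running shell commands. Such an agent can extend its own capabilities mid-task by writing a new helper tool to disk and then invoking it in later steps like any other command. The names `create_tool`, `run_tool`, `find_in_repo`, and `TOOL_SOURCE` are illustrative only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical tool source an agent might decide to write mid-task:
# a tiny search helper that prints "file:line: text" for lines containing a pattern.
TOOL_SOURCE = '''\
import sys
from pathlib import Path

pattern, root = sys.argv[1], sys.argv[2]
for path in sorted(Path(root).rglob("*.py")):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if pattern in line:
            print(f"{path.name}:{lineno}: {line.strip()}")
'''


def create_tool(workdir: Path, name: str, source: str) -> Path:
    """Persist a new tool to disk so later steps can call it (the self-extension step)."""
    tool_path = workdir / f"{name}.py"
    tool_path.write_text(source)
    return tool_path


def run_tool(tool_path: Path, *args: str) -> str:
    """Invoke a previously created tool as a subprocess, like any other shell command."""
    result = subprocess.run(
        [sys.executable, str(tool_path), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # A toy "repository" for the newly created tool to search.
        (workdir / "repo").mkdir()
        (workdir / "repo" / "app.py").write_text("def buggy():\n    return 1 / 0\n")
        tool = create_tool(workdir, "find_in_repo", TOOL_SOURCE)
        print(run_tool(tool, "1 / 0", str(workdir / "repo")), end="")
```

The tool is stored outside the searched directory, so it never matches itself. In this sketch, the "revise" half of self-evolution would amount to calling `create_tool` again with updated source under the same name.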
- [Nov 24th, 2025]: Claude Opus 4.5 + Live-SWE-agent scores 79.2% on SWE-bench Verified, leading all current open-source scaffolds and coming very close to Anthropic’s internal, manually engineered scaffold for Opus 4.5!
- [Nov 20th, 2025]: Gemini 3 Pro + Live-SWE-agent scores 77.4% on SWE-bench Verified, outperforming all available models (including Claude 4.5)!
- [Nov 17th, 2025]: Live-SWE-agent achieves a new state-of-the-art solve rate of 45.8% on SWE-Bench Pro!
- [Nov 17th, 2025]: We've released Live-SWE-agent 1.0.0!
For software tasks, recent LLMs are often benchmarked using manually engineered, proprietary agent scaffolds, which makes it difficult to compare the true capabilities of different models fairly.
Live-SWE-agent not only demonstrates that a minimal, open, and live scaffold already has the ability to outperform proprietary scaffolds, but also offers a unified and powerful platform that enables genuinely fair, apples-to-apples comparisons for future model releases.
As shown below, on our leaderboard of recent models (all evaluated with Live-SWE-agent), Claude Opus 4.5 holds the #1 spot by a large margin, scoring 79.2% on SWE-bench Verified.
More model scores are coming soon! For more details, please visit our leaderboard. Feel free to submit your model's evaluation results to help build a more comprehensive and fair benchmarking platform!
The graph below compares Live-SWE-agent with state-of-the-art open-source solutions and proprietary commercial agent scaffolds on SWE-bench Verified and SWE-Bench Pro.
We built Live-SWE-agent on top of the popular mini-swe-agent framework with only minimal modifications.
To use Live-SWE-agent, first install mini-swe-agent by following this guide, then run it with the custom Live-SWE-agent config:
```bash
mini --config config/livesweagent.yaml  # using custom Live-SWE-agent config
```

See the `config` folder for more details.
You can download the complete trajectories, patches, and results of Live-SWE-agent in our v1.0.0 release:
- `swebench_verified`: complete runs on SWE-bench Verified
- `swebench_pro`: complete runs on SWE-Bench Pro
You can also obtain them from our 🤗 Hugging Face datasets.
```bibtex
@article{livesweagent,
  author  = {Xia, Chunqiu Steven and Wang, Zhe and Yang, Yan and Wei, Yuxiang and Zhang, Lingming},
  title   = {Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?},
  year    = {2025},
  journal = {arXiv preprint},
}
```

