
Live-SWE-agent | The First Live AI Software Agent

📣 News | 🏆 Leaderboard | 📊 Comparison | 🚀 Setup | ⚙️ Artifacts | 📜 Attribution | 🙏 Acknowledgements

Live-SWE-agent is the first live, runtime self-evolving software engineering agent that expands and revises its own capabilities on the fly while working on a real-world issue. Our key insight is that software agents are themselves software systems, and modern LLM-based agents already possess the intrinsic capability to extend or modify their own behavior at runtime.

📣 News

  • [Nov 24th, 2025]: Claude Opus 4.5 + Live-SWE-agent scores 79.2% on SWE-bench Verified, leading all current open-source scaffolds and coming very close to Anthropic's internal, manually engineered scaffold for Opus 4.5!
  • [Nov 20th, 2025]: Gemini 3 Pro + Live-SWE-agent scores 77.4% on SWE-bench Verified, outperforming all available models (including Claude 4.5)!
  • [Nov 17th, 2025]: Live-SWE-agent achieves the new state-of-the-art solve rate of 45.8% on SWE-Bench Pro!
  • [Nov 17th, 2025]: We've released Live-SWE-agent 1.0.0!

🏆 Leaderboard

For software tasks, recent LLMs are often benchmarked using manually engineered, proprietary agent scaffolds, which makes it difficult to compare the true capabilities of different models fairly.

Live-SWE-agent not only demonstrates that a minimal, open, and live scaffold can already outperform proprietary scaffolds, but also offers a unified and powerful platform that enables genuinely fair, apples-to-apples comparisons of future model releases.

As shown below, on our leaderboard of recent models (all evaluated with Live-SWE-agent), Claude Opus 4.5 holds the #1 spot by a large margin, with a score of 79.2% on SWE-bench Verified.

More model scores are coming soon! For more details, please visit our leaderboard. Feel free to submit your model's evaluation results to help build a more comprehensive and fair benchmarking platform!

📊 Comparison

Below is a comparison of Live-SWE-agent against state-of-the-art open-source solutions and proprietary commercial agent scaffolds on SWE-bench Verified and SWE-Bench Pro.

🚀 Setup

We built Live-SWE-agent on top of the popular mini-swe-agent framework with minimal modifications.

To use Live-SWE-agent, first install mini-swe-agent following this guide, then run it with the custom Live-SWE-agent config:

mini --config config/livesweagent.yaml # using custom Live-SWE-agent config

See the config folder for more details.
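
For an end-to-end sketch, the commands below install mini-swe-agent, clone this repository to get the config, and launch the agent. This assumes the standard pip installation of mini-swe-agent (which provides the mini CLI); see its guide for alternative setups.

pip install mini-swe-agent # install mini-swe-agent, which provides the `mini` CLI
git clone https://github.com/OpenAutoCoder/live-swe-agent.git # get the Live-SWE-agent config
cd live-swe-agent
mini --config config/livesweagent.yaml # run with the custom Live-SWE-agent config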

⚙️ Artifacts

You can download the complete trajectories, patches, and results of Live-SWE-agent from our v1.0.0 release:

  • swebench_verified: complete runs on SWE-bench Verified
  • swebench_pro: complete runs on SWE-Bench Pro

You can also obtain them from our 🤗 Hugging Face datasets.
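
As a command-line sketch of fetching these artifacts: the v1.0.0 tag matches the release above, while the Hugging Face dataset ID below is hypothetical, so substitute the actual ID from our datasets page.

gh release download v1.0.0 --repo OpenAutoCoder/live-swe-agent # download release artifacts (trajectories, patches, results)
huggingface-cli download OpenAutoCoder/live-swe-agent --repo-type dataset --local-dir ./artifacts # hypothetical dataset ID; replace with the real one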

📜 Attribution

@article{livesweagent,
  author    = {Xia, Chunqiu Steven and Wang, Zhe and Yang, Yan and Wei, Yuxiang and Zhang, Lingming},
  title     = {Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?},
  year      = {2025},
  journal   = {arXiv preprint},
}

🙏 Acknowledgements