A benchmark to evaluate an LLM as an optimizer.
Currently, we are adding problems/domains one folder at a time.
The instructions to run each task are located inside that task's folder.
- Simple QA Problem
- A problem set that uses a ReAct agent (see the sketch after this list)
- A problem set that uses a tool-calling agent
- Code writing/generation
- Math proof generation
- A reasoning problem set that uses a multi-agent setup (Learning to Reason)
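Several of these problem sets wrap an agent loop around the LLM. Below is a minimal sketch of a ReAct-style loop, assuming a hypothetical `call_llm` helper and a toy calculator tool; it is illustrative only and not the benchmark's actual agent implementation.

```python
# Minimal ReAct-style loop (illustrative sketch, not the benchmark's code).
# `call_llm` and the tool registry below are hypothetical stand-ins.
import re

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM call. Here we return a canned
    # trace so the example runs end to end.
    if "Observation: 8" in prompt:
        return "Thought: I have the result.\nFinal Answer: 8"
    return "Thought: I should compute 3 + 5.\nAction: calculator[3 + 5]"

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
}

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(prompt)
        prompt += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.+)\]", reply)
        if match:
            tool, arg = match.group(1), match.group(2)
            observation = TOOLS[tool](arg)
            prompt += f"Observation: {observation}\n"
    return "no answer"

if __name__ == "__main__":
    print(react("What is 3 + 5?"))  # -> 8
```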
A comprehensive collection of 60 benchmark tasks derived from LLM4AD (Large Language Models for Algorithm Design). The current graph implementation is a single node.
- Optimization - Basic (18 tasks): circle_packing, online_bin_packing_local, etc.
- Optimization - Constructive (15 tasks): optimization_tsp_construct, optimization_knapsack_construct, optimization_set_cover_construct, etc.
- Optimization - CO-Bench (21 tasks): optimization_travelling_salesman_problem, optimization_job_shop_scheduling, optimization_container_loading, etc.
- Machine Learning (5 tasks): machine_learning_acrobot, machine_learning_pendulum, machine_learning_moon_lander, etc.
- Scientific Discovery (1 task): science_discovery_ode_1d
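To make the task format concrete, here is a minimal sketch of how an online-bin-packing-style task could score a candidate heuristic proposed by the LLM. The instance format, the function names, and the candidate priority function are illustrative assumptions, not the benchmark's actual interface.

```python
# Illustrative sketch of scoring a candidate heuristic for an
# online-bin-packing-style task. The instance format, scoring rule, and
# candidate function below are assumptions, not the benchmark's API.
from typing import Callable, List

def pack(items: List[float], capacity: float,
         priority: Callable[[float, List[float]], List[float]]) -> int:
    """Online packing guided by a candidate priority function."""
    bins: List[float] = []  # remaining capacity of each open bin
    for item in items:
        scores = priority(item, bins)
        # Choose the feasible bin with the highest score, else open a new bin.
        best = max(
            (i for i in range(len(bins)) if bins[i] >= item),
            key=lambda i: scores[i],
            default=None,
        )
        if best is None:
            bins.append(capacity - item)
        else:
            bins[best] -= item
    return len(bins)  # fewer bins is better

def candidate_priority(item: float, bins: List[float]) -> List[float]:
    # Example candidate an LLM might propose: prefer the tightest fit.
    return [-(cap - item) for cap in bins]

if __name__ == "__main__":
    items = [0.4, 0.7, 0.2, 0.5, 0.9, 0.3]
    print("bins used:", pack(items, capacity=1.0, priority=candidate_priority))
```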
Supported Algorithms: PrioritySearch, GEPA-Base, GEPA-UCB, GEPA-Beam
- ReAct agent
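The search algorithms above share a common outer loop: maintain a pool of scored candidates, ask the LLM to propose a variant of a promising one, and keep the best. Below is a minimal, hypothetical sketch of such a loop; `evaluate` and `propose_improvement` are stand-ins for the real task scorer and LLM call, and this is not the repository's actual PrioritySearch or GEPA implementation.

```python
# Illustrative sketch of a priority-search-style optimization loop: keep a
# small pool of scored candidates and repeatedly ask the LLM to improve the
# best one. `evaluate` and `propose_improvement` are hypothetical stand-ins.
import heapq
import random
from typing import List, Tuple

def evaluate(candidate: str) -> float:
    # Placeholder task score; a real task would run the candidate program/prompt.
    return random.random()

def propose_improvement(parent: str) -> str:
    # Placeholder for an LLM call that rewrites the parent candidate.
    return parent + "'"

def priority_search(seed: str, budget: int = 20, pool_size: int = 4) -> Tuple[float, str]:
    pool: List[Tuple[float, str]] = [(evaluate(seed), seed)]
    for _ in range(budget):
        _, parent = max(pool)                # pick the current best candidate
        child = propose_improvement(parent)  # LLM-proposed variant
        heapq.heappush(pool, (evaluate(child), child))
        if len(pool) > pool_size:
            heapq.heappop(pool)              # drop the lowest-scoring candidate
    return max(pool)

if __name__ == "__main__":
    best_score, best_candidate = priority_search("initial prompt")
    print(best_score, best_candidate)
```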
All libraries from other repositories are stored and managed in the `external` folder, which is created when one of the `install.sh` scripts inside a task folder is run.