This project benchmarks agents with memory capabilities. Follow the steps below to set up your environment and install dependencies.
- (July 7th, 2025) We released the code for reproducing the main experiment.
🌟 More details (such as the dataset collection process) coming soon! 🌟
It’s recommended to use a dedicated conda environment for reproducibility:
```bash
# Create and activate the environment, then install the dependencies
conda create --name MABench python=3.10.16
conda activate MABench
pip install torch
pip install -r requirements.txt
pip install "numpy<2"
```
We did not include hipporag in requirements.txt, since the current version of hipporag causes package-version conflicts. You can create a separate environment with hipporag instead.
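For example, a minimal sketch of such a separate environment (the environment name here is only illustrative):

```bash
# Illustrative name; any dedicated environment for hipporag works
conda create --name MABench-hipporag python=3.10.16
conda activate MABench-hipporag
pip install hipporag
```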
If you encounter package-related errors after installing requirements.txt, you can install and then uninstall letta and cognee; pip uninstall removes only the named package, so their dependencies stay installed and fill in what was missing:
```bash
pip install letta
pip uninstall letta
pip install cognee
pip uninstall cognee
```
To use this project, you need to download the processed data files and place them in the correct directory.
- HuggingFace dataset link: the data can be downloaded automatically if you run the code directly, or fetched manually as sketched below.
- Do not forget the `entity2id.json` for the Movie Recommendation task.
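If you prefer to fetch the data manually, it can be pulled with the Hugging Face CLI. The repository id and target directory below are placeholders; substitute the values from the dataset link above:

```bash
# Placeholder repo id and local directory; use the values from the dataset link above
huggingface-cli download <dataset_repo_id> --repo-type dataset --local-dir data/
```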
To run this project, you need to configure your API keys and model settings in a .env file at the project root.
Create a .env file and add the following content, replacing the placeholder values with your actual API keys:
```
OPENAI_API_KEY=your_openai_api_key
LLM_MODEL=gpt-4o-mini
LLM_API_KEY=your_api_key
Anthropic_API_KEY=your_anthropic_api_key
Google_API_KEY=your_google_api_key
```
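To check that the keys are picked up, a quick one-liner can help (this assumes the project reads the file with python-dotenv; adjust if it loads the variables differently):

```bash
# Prints True if OPENAI_API_KEY is visible after loading .env (assumes python-dotenv)
python -c "from dotenv import load_dotenv; import os; load_dotenv(); print(bool(os.getenv('OPENAI_API_KEY')))"
```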
Follow these steps to evaluate the agents on the benchmark:
You can run an evaluation using the following example command:
```bash
bash bash_files/eniac/run_memagent_longcontext.sh
```
- `--agent_config`: Path to the agent/model configuration file.
- `--dataset_config`: Path to the dataset configuration file.
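The bash scripts ultimately pass these flags to the Python entry point. A hypothetical direct invocation would look like the following; the script name and config paths are placeholders, so check the bash file for the real ones:

```bash
# Hypothetical entry point and config paths; see the bash scripts for the actual values
python <entry_script>.py \
    --agent_config <path/to/agent_config> \
    --dataset_config <path/to/dataset_config>
```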
Additional example scripts (for the RAG agents and the chunk-size variants) follow the same pattern:

```bash
bash bash_files/eniac/run_memagent_rag_agents.sh
bash bash_files/eniac/run_memagent_rag_agents_chunksize.sh
```
Remember that hipporag (2.0.0a3) requires openai==1.58.1, which means some of the latest OpenAI models cannot be used in the same environment.
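If you need both hipporag and a recent openai client, keep the pin confined to the dedicated hipporag environment described above, e.g.:

```bash
# Inside the separate hipporag environment: pin the compatible openai release
pip install "hipporag==2.0.0a3" "openai==1.58.1"
```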