Dingkang Liang1*, Cheng Zhang1*, Xiaopeng Xu1, Jianzhong Ju2, Zhenbo Luo2, Xiang Bai1
1 Huazhong University of Science & Technology, 2 MiLM Plus, Xiaomi Inc.
(*) Equal contribution.
- [2025.11.24] The code and dataset are released.
- [2025.11.08] 🎉🎉🎉 This work has been accepted by AAAI 2026 as an Oral presentation (acceptance rate ~4.5%)!
Task scheduling is critical for embodied AI, enabling agents to follow natural language instructions and execute actions efficiently in 3D physical worlds. However, existing datasets often simplify task planning by ignoring operations research (OR) knowledge and 3D spatial grounding.
In this work, we propose Operations Research knowledge-based 3D Grounded Task Scheduling (ORS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization. Unlike prior settings, ORS3D demands that agents minimize total completion time by leveraging parallelizable subtasks, e.g., cleaning the sink while the microwave operates.
To facilitate research on ORS3D, we construct ORS3D-60K, a large-scale dataset comprising 60K composite tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a simple yet effective scheduling token mechanism to generate efficient task schedules and grounded actions. Extensive experiments on ORS3D-60K validate the effectiveness of GRANT across language understanding, 3D grounding, and scheduling efficiency.
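As a toy illustration of the efficiency objective (hypothetical durations; nothing here is taken from ORS3D-60K): the microwave runs unattended while the agent cleans the sink, so the two subtasks overlap and the total completion time drops from their sum to their maximum.

# toy_schedule.py -- illustrative only; durations and tasks are made up, not from ORS3D-60K
# A "passive" subtask (the microwave running) occupies a device but not the agent,
# so the agent can interleave an "active" subtask (cleaning the sink) with it.
microwave_s = 120   # seconds the microwave runs unattended
clean_sink_s = 90   # seconds of hands-on cleaning
sequential = microwave_s + clean_sink_s    # wait for the microwave, then clean: 210 s
parallel = max(microwave_s, clean_sink_s)  # start the microwave, clean while it runs: 120 s
print(f"sequential: {sequential} s, parallel: {parallel} s")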
This project is built upon Grounded 3D-LLM, and the preparation steps roughly follow those of Grounded 3D-LLM.
Python: 3.10.16
Pytorch: 1.12.1+cu116
CUDA: 11.6
conda create -n GRANT python=3.10.16
conda activate GRANT
conda install openblas-devel -c anaconda
conda install openjdk=11
pip install -r requirements.txt
export LD_LIBRARY_PATH=your/custom/lib/path
# Please update LD_LIBRARY_PATH according to your system configuration.
pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu116.html
pip install peft==0.8.2 --no-deps # ignore the pytorch version error
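Before compiling the CUDA extensions below, it can help to confirm that the installed PyTorch build really targets CUDA 11.6. A minimal check using only standard PyTorch attributes:

# check_torch.py -- optional sanity check (standard PyTorch attributes only)
import torch
print(torch.__version__)          # expected: 1.12.1+cu116
print(torch.version.cuda)         # expected: 11.6
print(torch.cuda.is_available())  # should be True on a machine with a visible GPU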
mkdir -p third_party
cd third_party
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
cd ../pointnet2
python setup.py install
Note: If you encounter version issues, please refer to the complete dependency list in requirements.txt.
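After the builds finish, a quick import check (standard package imports only, nothing repo-specific) confirms the extensions were compiled against the active environment. Run it from outside third_party/ so the installed packages, not the source trees, are picked up:

# check_extensions.py -- optional import check for the compiled extensions
import MinkowskiEngine as ME
import torch_scatter
print("MinkowskiEngine:", ME.__version__)
print("torch_scatter:", torch_scatter.__version__)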
Download the ORS3D-60K dataset and the dataset splits from HuggingFace.
Download the 3D scenes from SceneVerse.
GRANT
├── data
│   ├── langdata
│   │   └── ORS3D.json          # ORS3D-60K dataset
│   └── SceneVerse
│       ├── 3RScan
│       ├── ARKitScenes
│       ├── HM3D
│       ├── MultiScan
│       ├── ScanNet
│       └── splits              # ORS3D-60K dataset splits
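After downloading, a short path check (illustrative only; the paths simply mirror the tree above) can catch a misplaced folder before training:

# verify_data_layout.py -- illustrative sketch; paths mirror the directory tree above
from pathlib import Path

root = Path("data")
expected = [
    root / "langdata" / "ORS3D.json",
    root / "SceneVerse" / "3RScan",
    root / "SceneVerse" / "ARKitScenes",
    root / "SceneVerse" / "HM3D",
    root / "SceneVerse" / "MultiScan",
    root / "SceneVerse" / "ScanNet",
    root / "SceneVerse" / "splits",
]
for p in expected:
    print("OK     " if p.exists() else "MISSING", p)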
Please download the pretrained LLM weights (Tiny-Vicuna-1B) and store them in $ROOT_PATH/pretrained/llm_weight/Tiny-Vicuna-1B/
Download the point cloud encoder weights and pretrained GRANT weights from HuggingFace.
Step 1: Put the pretrained weights of the 3D encoder and the LLM in the proper directories:
GRANT
├── pretrained
│   ├── bert-base-uncased
│   ├── label_clip_features.pth
│   ├── pointcloud_encoder.ckpt
│   ├── GRANT.ckpt
│   └── llm_weight
│       └── Tiny-Vicuna-1B
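Before launching training, a small sanity check can confirm the weights are readable. This sketch assumes the .ckpt files are standard PyTorch checkpoint dictionaries and that Tiny-Vicuna-1B is stored in Hugging Face format, as the layout above suggests:

# check_pretrained.py -- illustrative sketch, not part of the official pipeline
import torch
from transformers import AutoTokenizer

# Assuming standard PyTorch checkpoints: confirm they deserialize and peek at their top-level keys.
for path in ["pretrained/pointcloud_encoder.ckpt", "pretrained/GRANT.ckpt"]:
    ckpt = torch.load(path, map_location="cpu")
    print(path, "->", list(ckpt.keys())[:5])

# Assuming Tiny-Vicuna-1B is in Hugging Face format, the tokenizer should load directly.
tok = AutoTokenizer.from_pretrained("pretrained/llm_weight/Tiny-Vicuna-1B")
print("tokenizer vocab size:", tok.vocab_size)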
Step 2: Verify that all required environment variables are correctly defined in .env.example, then create your actual environment file by running:
cp .env.example .env
Step 3: Run the training command: bash scripts/train.sh
Run the model evaluation command: bash scripts/eval.sh
This project is based on Grounded 3D-LLM (paper, code, page), SG3D (paper, code, page), and LEO (paper, code, page). We thank the authors for their wonderful work.
If you find this repository useful in your research, please consider giving a star ⭐ and a citation.
@inproceedings{liang2026cook,
title={Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution},
author={Liang, Dingkang and Zhang, Cheng and Xu, Xiaopeng and Ju, Jianzhong and Luo, Zhenbo and Bai, Xiang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}


