Authors: Jerry Yan, Chinmay Talegaonkar, Nicholas Antipa, Eric Terrill, Sophia Merrifield
PoseIDON: Pose Identification for Depth of Objects via foundation model Networks.
This repository contains the code associated with my thesis and the paper of the same name.
The following are links to the thesis and preprint.
Environment setup is quite involved and annoying, so brace yourself.
git clone https://github.com/jerukan/barrels.git
cd barrels
git submodule update --init --recursive
conda env create --name barrels --file environment.yml
conda activate barrels
### if your CUDA setup isn't completely messed up, this can be skipped ###
conda install -c nvidia cuda
export CUDA_HOME=$CONDA_PREFIX
### cuda shenanigans end ###

Next, FoundPose dependencies must be set up in a separate environment, since it depends on faiss, which is incompatible with numpy 2.x, and forcing numpy 1.x in the main environment would probably cause problems.
cd foundpose
conda env create --name foundpose_gpu_311 --file environment.yml
cd ..

Afterwards, go into burybarrel/config.py and change the path to the FoundPose Python environment:
FOUNDPOSE_PYTHON_BIN_PATH = Path("/path/to/conda/environment/foundpose_gpu_311/bin/python")

For densifying point clouds and generating 3D reconstruction meshes from COLMAP, we use OpenMVS, since it can run on a CPU. Getting densification working in either COLMAP or OpenMVS requires building from source, so either way we'll have to suffer through CMake.
Note that installing requires root access (I couldn't find a way to install OpenMVS without it). If you're running this repository on a server without root access, you'll have to run the reconstructions locally, then copy the results onto the server.

On macOS:
git clone https://github.com/cdcseacave/openMVS.git
git clone https://github.com/microsoft/vcpkg.git
brew install vcpkg
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=path/to/barrels/vcpkg
brew install autoconf automake autoconf-archive
cd ../openMVS
mkdir make
cd make
cmake .. -DCMAKE_MAKE_PROGRAM=/usr/bin/make -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm/bin/clang++
cmake --build . -j4
# this will install OpenMVS in /usr/local/bin/OpenMVS
# this is optional if you manually set the path to the binaries inside here
cmake --install .
cd ../..

On Linux (Debian/Ubuntu):

git clone https://github.com/cdcseacave/openMVS.git
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
export VCPKG_ROOT=path/to/barrels/vcpkg
sudo apt-get install autoconf automake
cd ../openMVS
mkdir make
cd make
cmake .. -DCMAKE_MAKE_PROGRAM=/usr/bin/make -DCMAKE_CXX_COMPILER=/usr/bin/clang++
cmake --build . -j4
# this will install OpenMVS in /usr/local/bin/OpenMVS
# this is optional if you manually set the path to the binaries inside here
cmake --install .
cd ../..

If you want to run the pipeline with recent deep-learning models for 3D reconstruction instead of classical photogrammetry, we have code to test the following models: Fast3r and VGGT.
Fast3r setup:
git clone https://github.com/facebookresearch/fast3r.git
cd fast3r
pip install -r requirements.txt
pip install -e .
cd ..

VGGT setup is similar:
git clone https://github.com/facebookresearch/vggt.git
cd vggt
pip install -r requirements.txt
pip install -e .
cd ..

Display the list of available commands:
python -m burybarrel --help

Run a command with:
python -m burybarrel script-name [ARGS]

Since some scripts run for a long time, it's suggested to run them inside tmux.
Otherwise, you can use the nohup command as follows:
nohup python -m burybarrel script-name [ARGS] &

Output will go to nohup.out.
There are three important directories:

- input data directory
- results directory
- CAD model directory

The general contents of each are listed below.
- datasets-folder/: contains all input image data
  - dataset-name/
    - rgb/: all original RGB images
    - mask/: (if available) ground truth masks of the object, with file names matching those in rgb/
    - gt-overlays/: (if available) ground truth overlays of the object over the original RGB images
    - camera.json: camera intrinsics
    - gt_obj2cam.json: the ground truth 6DoF translation and rotation relative to the camera for each image
    - frame-time-nav.csv: the ROV readings for lat/lon, depth, and orientation for each image
    - info.json: miscellaneous information like GT burial depth, lat/lon position, or a textual description of the scene
- results-folder/: contains model outputs
  - dataset-name/
    - colmap-out/: photogrammetry output from COLMAP
      - cam_poses.json: world camera poses predicted by COLMAP
      - sparse.ply: sparse point cloud predicted by COLMAP
      - other COLMAP output
    - openmvs-out/: dense reconstruction of the scene with OpenMVS
      - scene_dense.ply: dense point cloud
      - scene_dense_mesh_refine_texture.obj: textured mesh reconstruction for visualization purposes
      - other OpenMVS output
    - sam-masks/: masks predicted by Grounding DINO + SAM 2
      - masksinfo.json: information on bounding boxes, masks, scores, etc. predicted for each image
      - the masks are binary masks with the same names as the corresponding original images
    - foundpose-output/: output generated by FoundPose
      - templates/: template renders of the CAD model
      - object_repre/: KNN model of DINOv2 features from the templates
      - inference/: outputs from FoundPose on the input RGB images
      - estimated-poses.json: estimated 6DoF poses from FoundPose for each image, which may contain multiple hypotheses per image
      - note: the image names for the output visualizations will not match the original images, but will be in the same order alphabetically
    - fit-output/: final output from 3D aggregation of FoundPose poses
      - estimation-name-1/: a prediction with specific settings
        - fit-overlays/: visualizations of outputs
        - reconstruction-info.json: information on depth, scale correction factor, and prediction settings
        - estimated-poses.json: 6DoF poses predicted for each image, in the same format as the FoundPose inference output
- models3d/: contains information on 3D CAD models
  - model_info.json: symmetry information and descriptors for each CAD model
  - CAD model files are located here, preferably in .ply format
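Given the layout above, a quick way to catch a malformed dataset before running the pipeline is to check for the expected entries. A minimal sketch (the `missing_entries` helper is hypothetical, not part of the repo):

```python
from pathlib import Path

# Entries the pipeline expects inside a dataset directory (from the layout above)
REQUIRED = ["rgb", "camera.json", "frame-time-nav.csv", "info.json"]

def missing_entries(dataset_dir: Path) -> list[str]:
    """Return the expected files/directories that are absent."""
    return [name for name in REQUIRED if not (dataset_dir / name).exists()]
```

Optional entries like mask/ and gt-overlays/ are deliberately excluded, since they only exist when ground truth is available.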
This process is also annoying, so buckle up.
May or may not be coming.
Go to burybarrel/config.py and set the following variables to your own paths:
DEFAULT_DATA_DIR = Path("path/to/input/data/dir")
DEFAULT_RESULTS_DIR = Path("path/to/output/results/dir")
DEFAULT_MODEL_DIR = Path("path/to/CAD/models/dir")
ONE_MACHINE = True

Most scripts also have options to specify these paths, so setting them here isn't strictly necessary; it just saves you from retyping the paths every time you run a script.
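For illustration, here is how the configured directories combine with a dataset name to form the per-dataset paths described in the directory listing above (placeholder paths; substitute your own):

```python
from pathlib import Path

# Placeholder values mirroring burybarrel/config.py; substitute your own paths.
DEFAULT_DATA_DIR = Path("path/to/input/data/dir")
DEFAULT_RESULTS_DIR = Path("path/to/output/results/dir")

# Inputs for a dataset called "dataset-name" are read from its rgb/ subdirectory,
# and COLMAP outputs land under the matching results subdirectory.
rgb_dir = DEFAULT_DATA_DIR / "dataset-name" / "rgb"
colmap_dir = DEFAULT_RESULTS_DIR / "dataset-name" / "colmap-out"
```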
Provide video information in configs/footage.yaml.
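In the template below, fields set to `~` are YAML null, which parses to Python None. A minimal sketch of how the config reads back (illustrative snippet, assuming PyYAML is installed):

```python
import yaml  # PyYAML (third-party)

snippet = """
dataset-name:
  input_path: path/to/video.mp4
  start_time: ~
  fps: 25
"""
cfg = yaml.safe_load(snippet)["dataset-name"]
# `~` is YAML's null scalar, so unset fields come back as None
assert cfg["start_time"] is None
assert cfg["fps"] == 25
```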
dataset-name:
input_path: path/to/video.mp4
output_dir: path/to/output/results/dir
start_time: ~
timezone: US/Pacific
step: ~
navpath: ~
# crop: [0, 120, 1920, 875]
crop: ~
maskpaths: [data/dive-data/footage-mask-hud.png]
fps: 25
increase_contrast: False
denoise_depth: True
object_name: ~
description: ~

Get frames from a video:
python -m burybarrel get-footage-keyframes -n dataset-name

Perform 3D reconstruction:
python -m burybarrel reconstruct-colmap --sparse --dense -n dataset-name

Perform segmentation, FoundPose monocular pose estimates, and multiview pose aggregation:
python -m burybarrel run-full-pipelines --step-all -n dataset-name

To generate a spreadsheet of BOP metric results and burial depth errors, run the following:
python -m burybarrel get-metrics

This assumes you have ground truth, obviously.
Runtime logs should be located in the logs/ directory.