Skip to content

eightmm/RMSD-Pred

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RMSD-Pred

Python PyTorch DGL License

Advanced protein-ligand binding pose RMSD prediction using Graph Neural Networks

RMSD-Pred is a tool for predicting the Root Mean Square Deviation (RMSD) of protein-ligand binding poses using Graph Neural Networks (GNNs). The model provides both RMSD values and confidence scores for binding pose evaluation.

Features

  • Accurate RMSD Prediction: GNN models for precise RMSD estimation
  • Confidence Scoring: Probability estimation for pose correctness assessment
  • Multiple Input Formats: Support for SDF, MOL2, DLG, PDBQT, and batch processing

Quick Start

Installation

# Clone the repository
git clone https://github.com/eightmm/RMSD-Pred.git
cd RMSD-Pred

# Create and activate conda environment
conda create -n bindingrmsd python=3.11
conda activate bindingrmsd

# Install the package
pip install -e .

# Or install dependencies manually
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html
pip install torch rdkit meeko pandas tqdm

Basic Usage

Command Line Interface

# Using the installed command
bindingrmsd-inference \
    -r example/prot.pdb \
    -l example/ligs.sdf \
    -o results.tsv \
    --model_path save \
    --device cuda

# Or using the module directly
python -m bindingrmsd.inference \
    -r example/prot.pdb \
    -l example/ligs.sdf \
    -o results.tsv \
    --model_path save \
    --device cuda

Python API

from bindingrmsd.inference import inference

# Run inference
inference(
    protein_pdb="example/prot.pdb",
    ligand_file="example/ligs.sdf", 
    output="results.tsv",
    batch_size=128,
    model_path="save",
    device="cuda"
)

Usage Guide

Input Parameters

Parameter Description Default Required
-r, --protein_pdb Receptor protein PDB file - Yes
-l, --ligand_file Ligand file or file list - Yes
-o, --output Output results file result.csv No
--model_path Directory with model weights ./save No
--batch_size Batch size for inference 128 No
--device Compute device (cuda/cpu) cuda No
--ncpu Number of CPU workers 4 No

Supported Input Formats

Format Extension Description
SDF .sdf Structure Data File
MOL2 .mol2 Tripos MOL2 format
DLG .dlg AutoDock-GPU results
PDBQT .pdbqt AutoDock Vina results
List .txt Text file with file paths

Output Format

The results are saved as a tab-separated file with the following columns:

  • Name: Ligand pose identifier
  • pRMSD: Predicted RMSD value (Å)
  • Is_Above_2A: Confidence score (0-1, probability of being a good pose, 0 is better)
  • ADG_Score: AutoDock score (when available, NaN otherwise, 0 is better)

Architecture

Model Components

  • Gated Graph Neural Network: Advanced GNN architecture for molecular representation
  • Protein-Ligand Interaction: Comprehensive modeling of binding interactions
  • Dual Prediction: Simultaneous RMSD and confidence prediction
  • Efficient Processing: Optimized for batch inference

File Structure

RMSD-Pred/
├── bindingrmsd/          # Main package
│   ├── data/             # Data processing modules
│   │   ├── data.py          # Dataset classes
│   │   ├── ligand_atom_feature.py   # Ligand featurization
│   │   ├── protein_atom_feature.py  # Protein featurization
│   │   └── utils.py         # Utility functions
│   ├── model/            # Model architecture
│   │   ├── GatedGCNLSPE.py  # GNN implementation
│   │   └── model.py         # Prediction models
│   └── inference.py         # Inference script
├── example/              # Example data
│   ├── prot.pdb            # Example protein
│   ├── ligs.sdf            # Example ligands
│   └── run.sh              # Example script
├── save/                 # Pre-trained models
│   ├── reg.pth             # RMSD model weights
│   └── bce.pth             # Confidence model weights
├── setup.py                # Package configuration
├── env.yaml                # Conda environment
└── README.md               # This file

Example

Complete Workflow

# Navigate to example directory
cd example

# Run prediction
bindingrmsd-inference \
    -r prot.pdb \
    -l ligs.sdf \
    -o binding_results.tsv \
    --batch_size 64 \
    --device cuda

# View results
head binding_results.tsv

Expected Output

Name        pRMSD   Is_Above_2A ADG_Score
ligand_1    1.23    0.89        -8.5
ligand_2    3.45    0.12        -6.2
ligand_3    0.87    0.95        -9.1
...

Citation

If you use RMSD-Pred in your research, please cite:

@article{rmsdpred2024,
  title={RMSD-Pred: Accurate Prediction of Protein-Ligand Binding Pose RMSD using Graph Neural Networks},
  author={Jaemin Sim},
  year={2024}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Authors

  • Jaemin Sim - Lead Developer - eightmm

About

Protein-ligand Binding RMSD prediction method

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •