Co-evolution-based Metal-binding Residue Prediction with Graph Neural Networks

Introduction

In this repository, we provide the code for the Metal-Binding Graph Neural Network (MBGNN) method, as described in our paper "Co-evolution-based Metal-binding Residue Prediction with Graph Neural Networks". MBGNN is a novel method that utilizes co-evolved residue networks and effectively captures dependencies within protein structures using graph neural networks, enhancing the prediction of co-evolved metal-binding residues and their associated metal types.

Method Overview

Important Files

The structure and description of the main files and directories are given below:

├── dataset # Raw data required for training and testing the models and extracted co-evolved pairs
│   ├── train_chains.fasta
│   ├── test_chains.fasta
│   ├── README.md # Description of the dataset
│   ├── test_coevolved_pairs.csv
│   ├── test_residues.tsv
│   ├── train_coevolved_pairs.csv
│   └── train_residues.tsv
├── compare_results  # Results of the comparison between MBGNN and other methods
│   ├── compare_metal_binding_prediction_results.ipynb 
│   ├── compare_metal_type_prediciton_results.ipynb 
│   ├── LMetalSite  # Results of LMetalSite method
│   ├── MBGNN_metal-binding_preds.tsv # Predicted metal-binding residues by MBGNN
│   ├── MBGNN_metal_type_preds.tsv  # Predicted metal types by MBGNN
│   ├── MetalNet # Results of MetalNet and MetalNet2 methods
│   └── M_Ionic # Results of M_Ionic method
├── example.fasta # Example fasta file containing arbitrary protein sequences from the test set
├── model_weights # Trained models weights
│   ├── metal_binding_predictor
│   └── metal_type_predictor
├── scripts # Scripts required for the prediction; each script can be run independently
│   ├── construct_graphs.py
│   ├── esm2.py
│   ├── extract_co_evovled_pairs.py
│   ├── gnn_model.py
│   ├── metal_binding_predictor.py
│   ├── metal_type_predictor.py
│   └── msa.py
├── main.py # Script to run the prediction for arbitrary protein sequences
└── training # Directory containing the Jupyter notebooks for training the models
    ├── construct_training_graphs.py
    ├── train_metal_binding_predictors.ipynb
    └── train_metal_type_predictors.ipynb

Requirements

The source code is implemented using Python 3.11, PyTorch 2.3.0, and PyTorch Geometric 2.6.1. All the required packages are given below.

torch=2.3.0
torch-geometric=2.6.1
networkx>=3.3
biopython>=1.85
esm=2.0.1
numpy>=1.26.2
scikit-learn>=1.4.2
pandas>=2.2
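
To check an existing environment against the list above, a short script along the following lines may help. It is a minimal sketch rather than part of the repository: it only reports installed versions and does not enforce the pins, and it looks up ESM under the name fair-esm because that is the name used in the pip command in the setup section.

```python
from importlib import metadata

# Distribution names as they would appear to pip. "fair-esm" is an assumption
# taken from the install command in the setup section, not from the list above.
REQUIRED = [
    "torch", "torch-geometric", "networkx", "biopython",
    "fair-esm", "numpy", "scikit-learn", "pandas",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Compare the printed versions with the requirements by eye; anything reported as NOT INSTALLED still needs to be installed.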

Usage

Setup the environment

To use the provided code, you need to install the required packages first. You can create a conda environment and install the required packages using the following commands:

# Create a conda environment
$ conda create -n mbgnn python=3.11
$ conda activate mbgnn

# Install the required packages
$ pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
$ pip install torch_geometric==2.6.1
$ pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
$ pip install "networkx>=3.3" "biopython>=1.85" "numpy>=1.26.2" "scikit-learn>=1.4.2" "pandas>=2.2"
$ pip install fair-esm

# Clone the source code of MBGNN
$ git clone https://github.com/SRastegari/MBGNN.git
$ cd MBGNN

Perform prediction for a .fasta file containing arbitrary protein sequences

To predict metal-binding residues and their associated metal types for arbitrary protein sequences in a .fasta file, run the main.py script. It extracts the protein sequences from the provided .fasta file, constructs the co-evolved residue network, and predicts the metal-binding residues and their associated metal types using the trained models. All you have to do is pass the path to the .fasta file to the main.py script as follows:

$ python main.py path/to/file_name.fasta

You can run this command on the provided example file, example.fasta, as follows:

$ python main.py example.fasta

All the predicted metal-binding residues and their associated metal types, as well as intermediate results, will be saved in the run_{file_name} directory.
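
Before running main.py on your own file, it can be useful to confirm that the file parses as FASTA and to see which records contain unusual residue letters. The sketch below uses only the standard library and is not part of the repository; the function names are hypothetical, and whether main.py accepts non-standard residue letters is not stated here, so the check only reports them.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_records(records):
    """Return a list of human-readable problems found in (header, sequence) pairs."""
    problems = []
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        unknown = sorted(set(seq.upper()) - STANDARD_RESIDUES)
        if unknown:
            problems.append(f"{header}: non-standard residues {''.join(unknown)}")
    return problems


# Example usage (the path is a placeholder):
# with open("path/to/file_name.fasta") as handle:
#     print(check_records(read_fasta(handle)))
```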

Alternatively, you can use the provided Colab notebook to upload arbitrary protein sequences and run the prediction. The Colab notebook is available here.

Training the models

To train the models from scratch, follow these steps:

1. Prepare the environment:

Ensure you have set up the environment as described in the "Setup the environment" section.

2. Prepare the data:

Use the training protein sequences and metal-binding residue annotations provided in the dataset directory. Training sequences are provided in the train_chains.fasta file, and the metal-binding residues are provided in the train_residues.tsv file. The extracted co-evolved residue pairs for training and testing are provided in the train_coevolved_pairs.csv and test_coevolved_pairs.csv files, respectively.

2.1. Deriving ESM2 Embeddings

To derive ESM2 embeddings for the protein sequences, use the esm2.py script provided in the scripts directory. This script generates embeddings for each sequence in the train_chains.fasta file. You can run the script as follows:

$ python scripts/esm2.py dataset/train_chains.fasta /path/to/save/embeddings
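
For readers who want to see what per-residue ESM2 embedding extraction looks like with the fair-esm package, here is a minimal sketch. It is not the code in scripts/esm2.py: the checkpoint name esm2_t33_650M_UR50D, the choice of the last layer, and the function names are assumptions made for illustration, and the checkpoint and output format actually used by MBGNN may differ.

```python
def residue_slice(seq_len):
    """Token positions of the residues: ESM2 prepends one BOS token per sequence."""
    return slice(1, seq_len + 1)


def embed_sequences(records, model_name="esm2_t33_650M_UR50D"):
    """Return {name: tensor of shape (len(seq), hidden_dim)} for (name, seq) pairs.

    The default checkpoint is an assumption, not necessarily the one used by MBGNN.
    """
    # Imported lazily so that residue_slice can be used without the heavy dependencies.
    import torch
    import esm

    model, alphabet = esm.pretrained.load_model_and_alphabet(model_name)
    model.eval()
    batch_converter = alphabet.get_batch_converter()
    last_layer = model.num_layers

    embeddings = {}
    with torch.no_grad():
        # One sequence per forward pass keeps memory use predictable.
        for name, seq in records:
            _, _, tokens = batch_converter([(name, seq)])
            out = model(tokens, repr_layers=[last_layer])
            reps = out["representations"][last_layer][0]
            # Drop the BOS/EOS positions so that row k corresponds to residue k.
            embeddings[name] = reps[residue_slice(len(seq))]
    return embeddings


# Example usage (downloads the checkpoint on first call):
# emb = embed_sequences([("demo", "MKCHEDLLA")])
# emb["demo"].shape  # (9, hidden_dim)
```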

2.2. Creating Co-evolved Residue Network for Training Metal-binding Predictors

To prepare the PyG graphs that correspond to the co-evolved residue networks for training metal-binding predictors, use the training/construct_training_graphs.py script as follows:

$ python training/construct_training_graphs.py dataset/train_coevolved_pairs.csv /path/to/embeddings /path/to/save/graphs_list /path/to/train_residues --mode=metal_binding
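
To illustrate the idea of a co-evolved residue network, the sketch below turns a list of co-evolved residue pairs into nodes and an edge list in the [sources, targets] layout that PyG uses for edge_index. It is an assumption-laden illustration, not the logic of construct_training_graphs.py: the column names residue_i and residue_j are hypothetical, the graph is treated as undirected, and the actual layout of train_coevolved_pairs.csv is documented in dataset/README.md.

```python
import csv
import io


def read_pairs(handle, col_i="residue_i", col_j="residue_j"):
    """Read residue-position pairs from a CSV file object (column names are hypothetical)."""
    reader = csv.DictReader(handle)
    return [(int(row[col_i]), int(row[col_j])) for row in reader]


def build_edge_index(pairs):
    """Return (nodes, edge_index) for an undirected graph over the paired residues.

    nodes is the sorted list of residue positions; edge_index is [sources, targets]
    over node indices, with each pair stored in both directions.
    """
    unique = {tuple(sorted(p)) for p in pairs if p[0] != p[1]}
    nodes = sorted({res for pair in unique for res in pair})
    index = {res: k for k, res in enumerate(nodes)}
    sources, targets = [], []
    for a, b in sorted(unique):
        sources += [index[a], index[b]]
        targets += [index[b], index[a]]
    return nodes, [sources, targets]


demo_csv = io.StringIO("residue_i,residue_j\n10,42\n42,57\n42,10\n")
nodes, edge_index = build_edge_index(read_pairs(demo_csv))
print(nodes)       # residue positions that become graph nodes
print(edge_index)  # node-index pairs, both directions

# With PyTorch installed, torch.tensor(edge_index, dtype=torch.long) gives the
# shape [2, num_edges] tensor that torch_geometric.data.Data expects, and the
# node features would be the ESM2 embedding rows of the residues in `nodes`.
```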

2.3. Creating Co-evolved Residue Network for Training Metal-type Predictors

Similarly, to create the co-evolved residue network for training metal-type predictors, use the same training/construct_training_graphs.py script but with the appropriate parameters for metal types as follows:

$ python training/construct_training_graphs.py dataset/train_coevolved_pairs.csv /path/to/embeddings /path/to/save/graphs_list /path/to/train_residues --mode=metal_type

3. Train the metal-binding residue predictor:

Use the train_metal_binding_predictors.ipynb notebook provided in the training directory to train the metal-binding residue prediction model using the PyG graphs created in step 2.2.

4. Train the metal type predictor:

Use the train_metal_type_predictors.ipynb notebook provided in the training directory to train the metal type prediction model using the PyG graphs created in step 2.3.

Dataset

We used the dataset provided by MetalNet2, which consisted of 4,449 metal-binding protein chains collected from the Protein Data Bank (PDB) as of May 2023. The dataset contained a training set and a fixed hold-out test set, which respectively included 18,230 and 1,981 metal-binding CHED residues. Furthermore, 11 metal types were considered as labels for each metal-binding residue, including Zn, Ca, Mg, Mn, Fe, SF4, Ni, Cu, Co, FeS, and Fe3S.

Annotated train and test residues are provided in the dataset/train_residues.tsv and dataset/test_residues.tsv files, respectively. The protein sequences corresponding to the train and test residues are provided in the dataset/train_chains.fasta and dataset/test_chains.fasta files, respectively.

Finally, the co-evolved residue pairs extracted from the train and test sequences are provided in the dataset/train_coevolved_pairs.csv and dataset/test_coevolved_pairs.csv files, respectively. They can also be regenerated by running scripts/msa.py to perform multiple sequence alignment and then scripts/extract_co_evovled_pairs.py to extract the co-evolved residue pairs, as follows:

$ python scripts/msa.py dataset/train_chains.fasta /path/to/save/msa
$ python scripts/extract_co_evovled_pairs.py /path/to/msa /path/to/save/coevolved_pairs
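
The dataset labels only CHED residues, that is cysteine (C), histidine (H), glutamate (E), and aspartate (D). As a small illustration of what that restriction means for a sequence, the following sketch lists the candidate positions. The function name is hypothetical, and the 1-based numbering is an assumption that may not match the indexing used in train_residues.tsv.

```python
CHED = frozenset("CHED")


def ched_positions(sequence):
    """Return (position, residue) for every C, H, E, or D, using 1-based positions."""
    return [
        (pos, aa)
        for pos, aa in enumerate(sequence.upper(), start=1)
        if aa in CHED
    ]


print(ched_positions("MCKHAED"))
```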

Acknowledgements

Some parts of the code in this repository are adapted from the MetalNet2 repository. We would like to thank the authors for their valuable work.

Citation

If you find this repository useful in your research, please cite the following paper:

@article{rastegari2025co,
  title={Co-evolution-based Metal-binding Residue Prediction with Graph Neural Networks},
  author={Rastegari, Sayedmohammadreza and Tabakhi, Sina and Liu, Xianyuan and Sang, Wei and Lu, Haiping},
  journal={arXiv preprint arXiv:2502.16189},
  year={2025}
}
