This Python script provides a simple, automated way to clean and preprocess PDB files using Biopython. It can remove water molecules, specific heteroatoms, and non-standard residues, so that the resulting structures are clean and standardized for downstream computational analyses such as molecular dynamics (MD) simulations, docking, or structural analysis.
- Remove Water Molecules: Option to remove all water molecules (`HOH`, `WAT`) from the PDB structure.
- Selective Heteroatom Removal: Remove specified heteroatoms (e.g., `PO4`, `SO4`) by providing their residue names.
- Keep Only Protein Residues: Retain only standard amino acids and common protonation variants, removing all non-protein residues.
- Flexible Usage: Users can apply different cleaning operations based on their specific requirements.
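Each of these cleaning rules comes down to a per-record decision based on the residue name. The script itself does this with Biopython; the sketch below is a simplified, standard-library-only illustration that reads the residue name from columns 18-20 of `ATOM`/`HETATM` records. The names `keep_record` and `clean_lines`, and the exact set of protonation variants in `PROTEIN_RESIDUES`, are assumptions for illustration, not the script's actual code or lists.

```python
# Simplified, stdlib-only illustration of the cleaning rules.
# This is NOT the script's Biopython implementation; the residue sets are assumptions.
WATER_NAMES = {"HOH", "WAT"}

# 20 standard amino acids plus a few common protonation variants
# (assumed list; the real script may recognize a different set).
PROTEIN_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HID", "HIE", "HIP", "HSD", "HSE", "HSP", "CYX", "ASH", "GLH", "LYN",
}


def residue_name(line: str) -> str:
    """Return the residue name stored in columns 18-20 of a PDB coordinate record."""
    return line[17:20].strip()


def keep_record(line, remove_water=False, hetatm_to_remove=(), protein_only=False):
    """Decide whether a single PDB line survives the chosen cleaning options."""
    if not line.startswith(("ATOM", "HETATM")):
        return True  # headers, TER, END, ... pass through unchanged in this sketch
    name = residue_name(line)
    if protein_only:
        return name in PROTEIN_RESIDUES  # drops water, ligands, ions, ...
    if remove_water and name in WATER_NAMES:
        return False
    if name in {h.upper() for h in hetatm_to_remove}:
        return False
    return True


def clean_lines(lines, **options):
    """Filter an iterable of PDB lines with keep_record."""
    return [line for line in lines if keep_record(line, **options)]
```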
Clone this repository and navigate to the directory containing the Dockerfile. Build the Docker image with the following command:
`docker build -t pdb_cleaner .`

Navigate to the directory containing your input PDB files and run the script with the following command:

`docker run --rm -v "$(pwd):/data" -t pdb_cleaner -i /data/input.pdb -o /data/output.pdb`

- Replace `input.pdb` and `output.pdb` with the names of your input and output PDB files, respectively.
- The `--rm` flag ensures the container is removed after execution.
- The `-v "$(pwd):/data"` option mounts the current directory to `/data` inside the container.
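Because only the mounted directory is visible inside the container, every path passed to the script must start with `/data`. If you want to drive the container from Python (for example, as one step in a larger pipeline), the hypothetical helper `docker_command` below builds the same `docker run` invocation as an argument list for `subprocess.run`. It assumes the image was tagged `pdb_cleaner` as above and that both files live in `host_dir` (the current directory by default).

```python
import os
import posixpath


def docker_command(input_name, output_name, host_dir=None, extra_args=(), image="pdb_cleaner"):
    """Build the `docker run` argument list shown above (hypothetical helper).

    host_dir is mounted at /data, so file names are translated to /data/<name>.
    """
    host_dir = host_dir if host_dir is not None else os.getcwd()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data",  # equivalent of -v "$(pwd):/data"
        "-t", image,
        "-i", posixpath.join("/data", input_name),
        "-o", posixpath.join("/data", output_name),
        *extra_args,  # e.g. ["-w"] or ["-r", "PO4"]
    ]
```

For example, `subprocess.run(docker_command("input.pdb", "output.pdb", extra_args=["-w"]), check=True)` would run the container and raise `subprocess.CalledProcessError` if the script exits with a non-zero status. Passing a list avoids shell quoting issues, which is also why the current directory is taken from `os.getcwd()` instead of `$(pwd)`.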
- Python 3.6 or newer
- Biopython
Install Biopython using one of the following methods:

- Using `pip`: `pip install biopython`
- Using `conda`: `conda install -c conda-forge biopython`
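To confirm that the requirements above are met before running the script, a small check like the following may help. It is a sketch using only the standard library: `sys.version_info` for the Python version and `importlib.util.find_spec` to test whether the `Bio` package (Biopython's import name) can be found. The function name `missing_requirements` is hypothetical.

```python
import importlib.util
import sys


def missing_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required, found %d.%d" % sys.version_info[:2])
    # Biopython is imported as the top-level package "Bio"
    if importlib.util.find_spec("Bio") is None:
        problems.append("Biopython is not installed (pip install biopython)")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    print("\n".join(issues) if issues else "All requirements satisfied.")
```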
Run the script directly with Python:

`python pdb_processor.py -i <input_pdb_file> -o <output_pdb_file> [options]`

- Replace `<input_pdb_file>` and `<output_pdb_file>` with the names of your input and output PDB files, respectively.
| Argument | Description |
|---|---|
| `-i`, `--input` | Path to the input PDB file (required). |
| `-o`, `--output` | Path to save the cleaned PDB file (required). |
| `-r`, `--hetatm` | List of heteroatoms to remove (e.g., `PO4`, `SO4`). |
| `-p`, `--keep-protein-only` | Remove all non-protein residues, keeping only protein residues (cannot be used with `--hetatm`). |
| `-w`, `--remove-water` | Remove all water molecules from the structure. |
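The option rules in this table, in particular that `--hetatm` and `--keep-protein-only` cannot be combined, map naturally onto Python's `argparse` with a mutually exclusive group. The following is a minimal sketch of how such a parser could be declared; it is not copied from `pdb_processor.py`, and the choice of `nargs="+"` (space-separated residue names, e.g. `-r PO4 SO4`) is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Clean and preprocess a PDB file.")
    parser.add_argument("-i", "--input", required=True, help="Path to the input PDB file.")
    parser.add_argument("-o", "--output", required=True, help="Path to save the cleaned PDB file.")
    parser.add_argument("-w", "--remove-water", action="store_true",
                        help="Remove all water molecules.")
    # argparse itself rejects -r together with -p and exits with status 2
    exclusive = parser.add_mutually_exclusive_group()
    exclusive.add_argument("-r", "--hetatm", nargs="+", default=[], metavar="RESNAME",
                           help="Residue names of heteroatoms to remove, e.g. PO4 SO4.")
    exclusive.add_argument("-p", "--keep-protein-only", action="store_true",
                           help="Keep only protein residues.")
    return parser
```

With this design the incompatibility check needs no hand-written code: `parser.parse_args(["-i", "in.pdb", "-o", "out.pdb", "-r", "PO4", "-p"])` prints a usage message naming the conflicting options and raises `SystemExit` with code 2.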
- Keep only protein residues (this also removes water, since water is a non-protein residue):

  `python pdb_processor.py -i input.pdb -o output_cleaned.pdb -p`
- Remove specific heteroatoms (e.g., `PO4`) and water molecules:

  `python pdb_processor.py -i input.pdb -o output_no_hetatm.pdb -r PO4 -w`
- Error example: the following command raises an error because `--hetatm` and `--keep-protein-only` are incompatible:

  `python pdb_processor.py -i input.pdb -o output_error.pdb -r PO4 -p`
Error Handling: The script checks that incompatible options, such as `--hetatm` and `--keep-protein-only`, are not used together.
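After running one of the example commands, it can be useful to confirm what was actually removed. The standard-library sketch below counts distinct residues per residue name in `ATOM`/`HETATM` records, using the fixed PDB columns for residue name, chain ID, residue number, and insertion code. The function name `residue_inventory` is hypothetical and not part of the script.

```python
from collections import Counter


def residue_inventory(lines):
    """Count distinct residues per residue name in ATOM/HETATM records."""
    seen = set()
    counts = Counter()
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()  # columns 18-20
        chain_id = line[21:22]          # column 22
        res_seq = line[22:26].strip()   # columns 23-26
        i_code = line[26:27]            # column 27 (insertion code)
        key = (chain_id, res_seq, i_code, res_name)
        if key not in seen:  # count each residue once, not once per atom
            seen.add(key)
            counts[res_name] += 1
    return counts
```

Pass it an open file handle, for example from `open("output_no_hetatm.pdb")`; after the second example above, neither `"HOH"` nor `"PO4"` should appear among the returned keys.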
If you encounter any issues, discover any bugs, or would like to contribute to the project, feel free to:
- Report an Issue: Open an issue describing the problem in detail, including error messages, input files, and the command used.
- Request a Feature: If you have a feature in mind that would enhance the script, you can create a feature request.
- Contribute: Fork the repository, make your changes, and submit a pull request. Contributions are always welcome, and your support will help improve the project!
Aaryesh Deshpande Email: aaryeshad@gmail.com