GROdecoder 🔬

GROdecoder is a Python toolkit that extracts and identifies molecular components in structure files (PDB, GRO, CRD, COOR) produced by molecular dynamics simulations. It detects and classifies proteins, nucleic acids, lipids, ions, solvents, and other small molecules, and reports the resulting molecular inventory in JSON format.

🌟 Features

Automatic Molecule Detection: Identifies proteins, nucleic acids, lipids, ions, and solvents, and flags unrecognized molecules as unknown
Chain Segmentation: Detects individual protein/nucleic acid chains using distance-based connectivity analysis
Resolution Detection: Automatically determines if structures are all-atom or coarse-grained
Comprehensive Database: Built-in databases from CHARMM-GUI CSML and MAD for accurate molecular identification
Multiple Output Formats: Full or compact JSON serialization with optional atom indices
Flexible Input: Supports PDB, GRO, CRD, and COOR file formats
Web Interface: User-friendly Streamlit web application
Command Line Tool: Batch processing capabilities for high-throughput analysis

🚀 Quick Start

Installation

GROdecoder uses uv for dependency management:

git clone https://github.com/pierrepo/grodecoder.git
cd grodecoder
uv sync

Basic Usage

Command Line:

# Analyze a structure file
uv run grodecoder path/to/structure.gro

# Analyze a topology file paired with a coordinates file
uv run grodecoder path/to/topology.psf path/to/coordinates.coor

# Output to stdout with compact format
uv run grodecoder structure.pdb --compact --stdout

# Custom bond threshold for chain detection
uv run grodecoder structure.gro --bond-threshold 3.5
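
For batch runs, the commands above can be scripted. The following is a minimal sketch, assuming it is launched from the repository root so that uv run grodecoder resolves; the helper names build_command and run_batch are hypothetical, and only the flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_command(structure, compact=False, bond_threshold=None):
    """Build the argument list for one grodecoder run (hypothetical helper)."""
    cmd = ["uv", "run", "grodecoder", str(structure)]
    if compact:
        cmd.append("--compact")
    if bond_threshold is not None:
        cmd += ["--bond-threshold", str(bond_threshold)]
    return cmd


def run_batch(directory, pattern="*.gro", **options):
    """Run grodecoder on every file in `directory` matching `pattern`."""
    for structure in sorted(Path(directory).glob(pattern)):
        # check=True raises CalledProcessError as soon as one run fails
        subprocess.run(build_command(structure, **options), check=True)
```

Calling run_batch("simulations", compact=True) would then process every GRO file found in a (hypothetical) simulations directory, one after the other.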

Python API:

import grodecoder as gd

# Decode a structure file
decoded = gd.decode_structure("structure.gro")

# Access the molecular inventory
inventory = decoded.inventory
print(f"Found {len(inventory.segments)} protein/nucleic chains")
print(f"Found {len(inventory.small_molecules)} small molecules")

# Get specific molecule types
proteins = [seg for seg in inventory.segments
           if seg.molecular_type == gd.MolecularType.PROTEIN]
lipids = [mol for mol in inventory.small_molecules
          if mol.molecular_type == gd.MolecularType.LIPID]

Web Interface:

uv run streamlit run scripts/streamlit_app.py

Then open your browser at http://localhost:8501

📊 Output Format

GROdecoder produces detailed JSON inventories with the following structure:

{
  "inventory": {
    "segments": [
      {
        "atoms": [0, 1, 2, ...],
        "sequence": "MKALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIV...",
        "molecular_type": "protein",
        "number_of_atoms": 1434,
        "number_of_residues": 89
      }
    ],
    "small_molecules": [
      {
        "atoms": [8080, 8081, 8082, ...],
        "name": "SOL",
        "description": "Water (TIP3P model)",
        "molecular_type": "solvent",
        "number_of_atoms": 17259,
        "number_of_residues": 5753
      }
    ]
  },
  "resolution": "all-atom",
  "database_version": "1.0.0"
}
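
Because the output is plain JSON, it can also be inspected with the standard library alone. The sketch below assumes the full (non-compact) layout with the keys shown above; the function name atoms_per_type and the trimmed-down example values are illustrative only.

```python
import json
from collections import Counter


def atoms_per_type(data):
    """Sum number_of_atoms per molecular type from a decoded JSON dict."""
    inventory = data["inventory"]
    totals = Counter()
    for entry in inventory.get("segments", []) + inventory.get("small_molecules", []):
        totals[entry["molecular_type"]] += entry["number_of_atoms"]
    return dict(totals)


# Trimmed-down document following the layout shown above (values are illustrative).
example = json.loads("""
{
  "inventory": {
    "segments": [
      {"atoms": [0, 1, 2], "sequence": "MKA", "molecular_type": "protein",
       "number_of_atoms": 3, "number_of_residues": 3}
    ],
    "small_molecules": [
      {"atoms": [3, 4, 5], "name": "SOL", "description": "Water",
       "molecular_type": "solvent", "number_of_atoms": 3, "number_of_residues": 1}
    ]
  },
  "resolution": "all-atom",
  "database_version": "1.0.0"
}
""")

print(atoms_per_type(example))
```

For a real file, replace the inline string with json.load() on the opened output file.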

🔧 Advanced Features

Read back a GROdecoder inventory file

Reading a GROdecoder inventory file back gives access to the different parts of a system without having to identify them again:

from grodecoder import read_grodecoder_output

gro_results = read_grodecoder_output("1BRS_grodecoder.json")

# Print the sequence of each protein segment.
for segment in gro_results.decoded.inventory.segments:
    if segment.is_protein():
        print(segment.sequence)

Combined with the structure file, the GROdecoder output file gives direct access to the different parts of the system, as identified by GROdecoder:

import MDAnalysis
from grodecoder import read_grodecoder_output


universe = MDAnalysis.Universe("tests/data/1BRS.pdb")
gro_results = read_grodecoder_output("1BRS_grodecoder.json")

# Print the center of mass of each protein segment.
for segment in gro_results.decoded.inventory.segments:
    if segment.is_protein():
        seg: MDAnalysis.AtomGroup = universe.atoms[segment.atoms]
        print(seg.center_of_mass())

Chain Detection

GROdecoder detects protein and nucleic acid chains using distance-based connectivity analysis; the bond_threshold argument sets the distance cutoff:

import grodecoder as gd

# Detect chains with a custom distance cutoff
decoded = gd.decode_structure("multi_chain.pdb", bond_threshold=4.0)

# Access individual chains
for i, chain in enumerate(decoded.inventory.segments):
    print(f"Chain {i+1}: {len(chain.sequence)} residues")
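
The idea behind distance-based chain detection can be illustrated with a minimal sketch. It is not GROdecoder's actual implementation: it assumes one representative coordinate per residue (in the same length unit as the threshold) and starts a new chain whenever two consecutive residues are farther apart than the threshold. The function name split_chains is hypothetical.

```python
import math


def split_chains(coords, bond_threshold):
    """Group consecutive residue indices into chains.

    coords: list of (x, y, z) tuples, one per residue, in sequence order.
    A new chain starts when the distance to the previous residue
    exceeds bond_threshold.
    """
    chains = []
    for index, position in enumerate(coords):
        if index > 0 and math.dist(coords[index - 1], position) <= bond_threshold:
            chains[-1].append(index)
        else:
            chains.append([index])
    return chains


# Two residues 3.8 apart, then a 20-unit gap, then two more residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (23.8, 0.0, 0.0), (27.6, 0.0, 0.0)]
print(split_chains(coords, bond_threshold=4.0))
```

With a larger threshold the gap is bridged and fewer chains are reported, which is why the cutoff is worth tuning for unusual systems.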

Molecular Type Classification

The toolkit categorizes molecules into six types:

Proteins: Amino acid sequences
Nucleic Acids: DNA/RNA sequences
Lipids: Membrane components from CHARMM-GUI and MAD databases
Ions: Inorganic ions (Na+, Cl-, Ca2+, etc.)
Solvents: Water models and organic solvents
Unknown: Unidentified small molecules
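
Conceptually, this classification amounts to looking up residue names in a database and falling back to "unknown". The sketch below illustrates that idea only: MolType is a stand-in rather than the library's own gd.MolecularType, the string values other than "protein" and "solvent" are assumptions, and the tiny lookup table is hypothetical, not GROdecoder's database.

```python
from enum import Enum


class MolType(Enum):
    """Illustrative stand-in for the six categories listed above."""
    PROTEIN = "protein"
    NUCLEIC_ACID = "nucleic_acid"
    LIPID = "lipid"
    ION = "ion"
    SOLVENT = "solvent"
    UNKNOWN = "unknown"


# Hypothetical, deliberately tiny residue-name table.
RESIDUE_TABLE = {
    "ALA": MolType.PROTEIN,
    "GLY": MolType.PROTEIN,
    "DA": MolType.NUCLEIC_ACID,
    "POPC": MolType.LIPID,
    "NA": MolType.ION,
    "SOL": MolType.SOLVENT,
}


def classify(resname):
    """Return the category of a residue name, or UNKNOWN if it is not listed."""
    return RESIDUE_TABLE.get(resname.strip().upper(), MolType.UNKNOWN)


for name in ("ALA", "sol", "XYZ"):
    print(name, classify(name).value)
```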

Resolution Detection

Automatically distinguishes between:

All-atom: High-resolution structures with complete atomic detail
Coarse-grained: Simplified representations with grouped atoms
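
How GROdecoder makes this decision is not described here. One plausible heuristic, shown purely as an illustration, relies on the fact that bonded atoms in all-atom models sit roughly 1 to 1.5 Å apart, whereas coarse-grained beads are typically separated by several ångströms. The function name guess_resolution and the 2.0 Å cutoff are assumptions, and the quadratic pair loop is only suitable for a small sample of particles.

```python
import math
from itertools import combinations


def guess_resolution(coords, cutoff=2.0):
    """Return "all-atom" if any two particles are closer than cutoff (in Å)."""
    if len(coords) < 2:
        raise ValueError("need at least two particles")
    shortest = min(math.dist(a, b) for a, b in combinations(coords, 2))
    return "all-atom" if shortest < cutoff else "coarse-grained"


# A water-like triplet (O-H distances near 0.96 Å) versus beads spaced 4.7 Å apart.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
beads = [(0.0, 0.0, 0.0), (4.7, 0.0, 0.0), (9.4, 0.0, 0.0)]
print(guess_resolution(water))
print(guess_resolution(beads))
```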

🗃️ Database System

GROdecoder includes comprehensive molecular databases:

Built-in Databases

Amino Acids: Standard and modified residues
Nucleotides: DNA/RNA bases and modifications
Ions: Common inorganic ions in various charge states
Solvents: Water models (TIP3P, SPC/E, etc.) and organic solvents

External Database Integration

CHARMM-GUI CSML: All-atom lipid and small molecule definitions
MAD Database: Coarse-grained molecular definitions

Database Updates

# Update CHARMM-GUI CSML database
uv run scripts/build_lipid_database.py

# Update MAD database
uv run scripts/scrap_MAD.py

🧪 Testing

Run the comprehensive test suite:

# Run all tests
uv run pytest

# Run specific test categories
uv run pytest tests/test_identifier.py  # Core identification tests
uv run pytest tests/test_regression.py  # Regression tests
uv run pytest tests/test_toputils/      # Topology utility tests

🤝 Contributing

Contributions are welcome! Please feel free to:

Report Issues: Bug reports and feature requests via GitHub Issues
Submit Pull Requests: Code improvements and new features
Add Molecules: Extend the molecular databases
Improve Documentation: Help make GROdecoder more accessible

📄 License

GROdecoder is released under the BSD 3-Clause License. See LICENSE for details.

👨‍💻 Authors & Acknowledgments

Created by Pierre Poulain

Special thanks to:

Contributors to the CHARMM-GUI CSML database
MAD (Martini Database) maintainers
MDAnalysis community
Beta testers and early adopters

🔗 Links

Documentation: GitHub Repository
Web Demo: Streamlit App
Issues & Support: GitHub Issues

GROdecoder: Making molecular simulation analysis simple, accurate, and reproducible.
