This repository contains an mmCIF investigation dictionary, a data representation that captures the relationships between macromolecular structures deposited in the worldwide Protein Data Bank (wwPDB) and data from other databases and databanks. It adds the metadata needed to describe an investigation, that is, a series of related structures that were collected for a project and together provide insight.
This dictionary is an extension of the PDBx/mmCIF dictionary and provides the additional definitions required for investigation files. Investigation files are umbrella files for a set of coordinates / models and their corresponding experimental data files.
The primary example showcased here is for fragment screening investigations, where multiple atomic-level models are determined to analyze how small molecule fragments interact with protein targets, facilitating drug discovery efforts.
Traditional PDB entries represent individual structures, but many research projects generate collections of related structures. InvestigationCIF solves this problem by:
- Creating umbrella files that link multiple coordinate files and their experimental data
- Adding contextual metadata about the overall investigation goals and methods
- Enabling better discoverability and analysis of related structural data
- Supporting reproducible research through standardized metadata capture
Fragment Screening Investigation mmCIF files created from PDB group depositions are available at: https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/
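To get a quick overview of what an mmCIF file contains, it can help to list the categories and items it declares. The following is a minimal sketch using only the Python standard library. It recognises plain `_category.item` tags and skips multi-line text values, and it is not a full mmCIF parser. The function name `list_categories` and the sample text are hypothetical; the sample uses core PDBx/mmCIF items for illustration, not items from the investigation dictionary.

```python
import re

# Matches a data-item tag such as "_entity.id" at the start of a line.
TAG_RE = re.compile(r"^(_[^\s.]+)\.(\S+)")

# Illustrative text only: these are core PDBx/mmCIF items, not items from
# the investigation dictionary.
SAMPLE = """\
data_EXAMPLE
_entry.id EXAMPLE
_struct.title 'Illustrative title'
loop_
_entity.id
_entity.type
1 polymer
2 non-polymer
"""


def list_categories(cif_text):
    """Return {category: [item, ...]} in order of first appearance."""
    categories = {}
    in_text_field = False
    for line in cif_text.splitlines():
        # A ';' in the first column opens or closes a multi-line text value.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        match = TAG_RE.match(line.strip())
        if match:
            categories.setdefault(match.group(1), []).append(match.group(2))
    return categories


if __name__ == "__main__":
    for category, items in list_categories(SAMPLE).items():
        print(category, items)
```

For a downloaded file, the same function can be applied to the text returned by `open(path).read()`. For anything beyond a quick inspection, a dedicated mmCIF parsing library is the safer choice.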
An investigation mmCIF file can be created through mmcif-gen, which is a Python tool for generating mmCIF files.
mmcif-gen can be used to create an investigation mmCIF file from internal databases at research facilities, such as a synchrotron, for example:
# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json maxiv
# Specify custom output directory
mmcif-gen fetch-facility-json maxiv -o ./mapping_operations
Each facility stores its data internally in a different format, so each facility has its own facility-json.
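When the fetch step needs to run from a script rather than a terminal, the same command can be assembled in Python. This is a sketch under stated assumptions: it uses only the `fetch-facility-json` subcommand and the `-o` option shown above, it assumes `mmcif-gen` is installed and on `PATH`, and the helper names `build_fetch_command` and `fetch_facility_json` are hypothetical, not part of mmcif-gen.

```python
import shlex
import shutil
import subprocess


def build_fetch_command(facility, output_dir=None):
    """Build the argument list for `mmcif-gen fetch-facility-json`."""
    command = ["mmcif-gen", "fetch-facility-json", facility]
    if output_dir is not None:
        # "-o" is the output-directory option shown in the commands above.
        command += ["-o", str(output_dir)]
    return command


def fetch_facility_json(facility, output_dir=None):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    if shutil.which("mmcif-gen") is None:
        raise FileNotFoundError("mmcif-gen is not on PATH")
    subprocess.run(build_fetch_command(facility, output_dir), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(build_fetch_command("maxiv", "./mapping_operations")))
```

Keeping command construction separate from execution makes it possible to check the arguments without the tool installed.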
For more extensive documentation on using mmcif-gen, see the mmcif-gen PyPI page or the mmcif-gen GitHub repository.
- README.md - this file
- MMCIF investigation extension - Investigation dictionary extension
- Examples - directory with examples of investigation mmCIF file(s) compliant with the mmCIF investigation dictionary
Fragment-based screening (FBS) is a complex and data-rich endeavour in which each stage of the process can generate different file types of complex data, in both raw and processed forms. The popularity of fragment screening in academic scientific research and the pharmaceutical industry is reflected by the increasing number of facilities, such as synchrotrons, that support fragment screening experiments.
Synchrotrons are central service centres that support experimental data generation with multiple options related to structural biology using X-ray crystallography.
Individuals from synchrotrons across Europe were involved in developing the data model for fragment screening in this repository. The Protein Data Bank in Europe, in collaboration with other organizations from the worldwide Protein Data Bank, has led the project.
Synchrotrons and associated facilities involved in developing this data model:
- The Crystallisation Facility at the European Molecular Biology Laboratory (EMBL) Grenoble and European Synchrotron Radiation Facility (ESRF) in France
- XChem: Diamond Fragment Screening at Diamond Light Source (DLS) in the United Kingdom
- Fragment Screening Facility at Berlin synchrotron BESSY-MX and Helmholtz-Zentrum Berlin/HZB in Germany
- FragMAX at Swedish synchrotron MAX IV in Sweden
- iNEXT-Discovery - a European Union funded project via Horizon 2020 (Grant agreement ID: 871037)
- FragmentScreen - a European Union funded project via Horizon Europe (Grant agreement ID: 101094131)
Available to all in accordance with the Creative Commons Zero (CC0) designation.
We welcome contributions to improve the InvestigationCIF dictionary. For changes, please open an issue first to discuss what you would like to change.
For any feedback or suggestions, email us at pdbehelp@ebi.ac.uk. Please include 'InvestigationCIF' in your subject line.
