InvestigationCIF

Overview

This repository contains an mmCIF investigation dictionary, a data representation that captures the relationships between macromolecular structures deposited in the worldwide Protein Data Bank (wwPDB) and data from other databases and databanks. It also adds metadata describing an investigation: a series of related structures that were collected for a project and together provide insight.

This dictionary is an extension of the PDBx/mmCIF dictionary and provides the additional definitions required for investigation files. Investigation files are umbrella files for a set of coordinates / models and their corresponding experimental data files.

The primary example showcased here is fragment screening investigations, in which multiple atomic-level models are determined to analyze how small-molecule fragments interact with protein targets, in support of drug discovery.

Why InvestigationCIF?

Traditional PDB entries represent individual structures, but many research projects generate collections of related structures. InvestigationCIF addresses this gap by:

  • Creating umbrella files that link multiple coordinate files and their experimental data
  • Adding contextual metadata about the overall investigation goals and methods
  • Enabling better discoverability and analysis of related structural data
  • Supporting reproducible research through standardized metadata capture

Investigation Files

Fragment Screening Investigation mmCIF files created from PDB group depositions are available at: https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/
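
Once an investigation file has been downloaded, a quick way to see what it contains is to list the mmCIF categories it uses. The sketch below is a minimal illustration, not a full CIF parser. It assumes data names are written as `_category.item` at the start of a line and skips multi-line text fields delimited by `;`. The function name `summarize_categories` is hypothetical, and the sample content is made up from standard PDBx/mmCIF data names rather than taken from a real investigation file. For real work, a dedicated mmCIF library is likely a better choice.

```python
from collections import defaultdict


def summarize_categories(cif_text):
    """Return {category: sorted item names} for the data names found in cif_text."""
    items = defaultdict(set)
    in_text_field = False
    for line in cif_text.splitlines():
        # A ';' in the first column opens or closes a multi-line text field.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        stripped = line.strip()
        # Only tokens of the form _category.item count as data names here.
        if not stripped.startswith("_") or "." not in stripped.split()[0]:
            continue
        category, item = stripped.split()[0][1:].split(".", 1)
        items[category].add(item)
    return {category: sorted(names) for category, names in items.items()}


# Illustrative content only (not a real investigation file).
SAMPLE = """\
data_example
_entry.id   EXAMPLE
_struct.title
;A multi-line title
_not.a_data_name
;
loop_
_citation.id
_citation.title
primary 'Illustrative citation'
"""

if __name__ == "__main__":
    for category, names in summarize_categories(SAMPLE).items():
        print(f"{category}: {len(names)} item(s): {', '.join(names)}")
```

To inspect a downloaded file, read it with `open(path).read()` and pass the text to the same function.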

Creating an investigation mmCIF file

An investigation mmCIF file can be created through mmcif-gen, which is a Python tool for generating mmCIF files.

mmcif-gen can create an investigation mmCIF file from the internal databases of research facilities such as synchrotrons. For example, to fetch the configuration for a facility:

# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json maxiv

# Specify custom output directory
mmcif-gen fetch-facility-json maxiv -o ./mapping_operations

Each facility stores its data internally in a different format, so each facility has its own facility-json.
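
When facility configurations need to be fetched from a script rather than by hand, the commands above can be wrapped in Python. This is a sketch only: the helper names `build_fetch_command` and `fetch_facility_json` are hypothetical, it assumes mmcif-gen is installed and available on the PATH, and it relies solely on the `fetch-facility-json` subcommand and `-o` option shown above.

```python
import subprocess


def build_fetch_command(facility, output_dir=None):
    """Build the mmcif-gen argument list for fetching one facility's JSON."""
    command = ["mmcif-gen", "fetch-facility-json", facility]
    if output_dir is not None:
        # Optional custom output directory, as in the second command above.
        command += ["-o", output_dir]
    return command


def fetch_facility_json(facility, output_dir=None):
    """Run mmcif-gen; check=True raises CalledProcessError if it fails."""
    subprocess.run(build_fetch_command(facility, output_dir), check=True)


if __name__ == "__main__":
    # Only prints the command; call fetch_facility_json() to actually run it.
    print(" ".join(build_fetch_command("maxiv", "./mapping_operations")))
```

Keeping command construction separate from execution makes it possible to check the arguments without mmcif-gen installed.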

For more extensive documentation on using mmcif-gen, see its PyPI page or its GitHub repository.

Organization of the repository

README.md - this file

MMCIF investigation extension - Investigation dictionary extension

Examples - directory with examples of investigation mmCIF file(s) compliant with the mmCIF investigation dictionary

Contributions / collaborations

Fragment-based screening (FBS) is a complex and data-rich endeavour in which each stage of the process can generate different types of complex data files, in both raw and processed forms. The popularity of fragment screening in academic research and the pharmaceutical industry is reflected in the increasing number of facilities, such as synchrotrons, that support fragment screening experiments.

Synchrotrons are central service facilities that support experimental data generation and offer multiple options for structural biology using X-ray crystallography.

Individuals from synchrotrons across Europe were involved in developing the data model for fragment screening in this repository. The Protein Data Bank in Europe, in collaboration with other organizations from the worldwide Protein Data Bank, has led the project.

Synchrotrons and associated facilities involved in developing this data model:

  • European Synchrotron Radiation Facility
  • European Molecular Biology Laboratory
  • Diamond Light Source
  • Helmholtz-Zentrum Berlin
  • MAX IV

Protein Data Bank in Europe

Funded by:

  • iNEXT-Discovery - a European Union funded project via Horizon 2020 (Grant agreement ID: 871037)
  • FragmentScreen - a European Union funded project via Horizon Europe (Grant agreement ID: 101094131)

Funded by the European Union

License

Available to all in accordance with the Creative Commons Zero (CC0) designation.

Contributing

We welcome contributions to improve the InvestigationCIF dictionary. For changes, please open an issue first to discuss what you would like to change.

Feedback

For any feedback or suggestions, email us at pdbehelp@ebi.ac.uk. Please include 'InvestigationCIF' in your subject line.
