We use uv to manage dependencies and the project environment.
Clone the GitHub repository:
git clone git@github.com:MDverse/mdverse_data_schema.git
cd mdverse_data_schema
Sync dependencies:
uv sync
Download parquet files from Zenodo to build the database:
uv run src/download_data.py
Files will be downloaded to data/parquet_files:
data
└── parquet_files
    ├── datasets.parquet
    ├── files.parquet
    ├── gromacs_gro_files.parquet
    ├── gromacs_mdp_files.parquet
    └── gromacs_xtc_files.parquet
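Before creating the database, it can help to confirm that the download produced the files listed above. The following is a minimal sketch, assuming the five file names in the listing are the complete set and that it is run from the repository root. `missing_parquet_files` is a hypothetical helper and not part of the repository:

```python
from pathlib import Path

# File names taken from the directory listing above; assumed to be the full set.
EXPECTED_FILES = [
    "datasets.parquet",
    "files.parquet",
    "gromacs_gro_files.parquet",
    "gromacs_mdp_files.parquet",
    "gromacs_xtc_files.parquet",
]


def missing_parquet_files(directory="data/parquet_files"):
    """Return the expected file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_parquet_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected parquet files are present.")
```

An empty result suggests the download step completed; it does not validate the contents of the files.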
Create the empty database:
uv run src/create_database.py
Populate the tables with the data from the parquet files:
uv run src/ingest_data.py
Report the number of rows and columns of each table in the database:
uv run report.py
This creates the file report.log containing that information.
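The steps from syncing dependencies to generating the report can also be chained in a small wrapper. This is only a sketch, not a script shipped with the repository: it assumes it is launched from the repository root after cloning, and that each command exits with a non-zero status on failure. The name `run_pipeline` and its `runner` parameter are hypothetical:

```python
import subprocess

# Commands copied from the steps above, in the order they are listed.
PIPELINE = [
    ["uv", "sync"],
    ["uv", "run", "src/download_data.py"],
    ["uv", "run", "src/create_database.py"],
    ["uv", "run", "src/ingest_data.py"],
    ["uv", "run", "report.py"],
]


def run_pipeline(commands=PIPELINE, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    `runner` is injectable so the sequence can be inspected without
    executing anything (hypothetical convenience, not a project API).
    """
    for command in commands:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(command, check=True)
```

Calling `run_pipeline()` executes the five commands in sequence. Because of `check=True`, a failed download prevents the later database steps from running against incomplete data.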
If you wish to re-ingest data for any of the following tables:
- TopologyFile
- ParameterFile
- TrajectoryFile
you can run the corresponding command:
uv run src/ingest_topol_files.py
or
uv run src/ingest_param_files.py
or
uv run src/ingest_traj_files.py
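The relationship between tables and re-ingestion scripts can be written down as a lookup. The pairing below is inferred from the order and naming of the tables and commands above, so treat it as an assumption. `reingest_command` is a hypothetical helper that only builds the command without running it:

```python
# Table-to-script pairing inferred from the names above (an assumption).
REINGEST_SCRIPTS = {
    "TopologyFile": "src/ingest_topol_files.py",
    "ParameterFile": "src/ingest_param_files.py",
    "TrajectoryFile": "src/ingest_traj_files.py",
}


def reingest_command(table):
    """Build the `uv run` command for a table; raise ValueError if unknown."""
    try:
        script = REINGEST_SCRIPTS[table]
    except KeyError:
        known = ", ".join(REINGEST_SCRIPTS)
        raise ValueError(
            f"Unknown table {table!r}; expected one of: {known}"
        ) from None
    return ["uv", "run", script]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, for example with `reingest_command("TopologyFile")`.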