An Open Nextflow pipeline for analyzing DIA-MS (Data-Independent Acquisition Mass Spectrometry) data.
oDIAFlow is a modular Nextflow pipeline for processing DIA proteomics data. Its architecture mirrors the structure of quantms so that workflows developed here can later be integrated into quantms with minimal changes. This repository is a sandbox for developing and testing DIA analysis workflows until they are ready for inclusion in quantms.
oDIAFlow currently implements two main workflows:
Empirical workflow (--workflow empirical): generates a spectral library from DDA data, then uses it to analyze DIA data.
DDA mzML files
↓
SAGE Search (database search)
↓
EasyPQP ConvertSage (convert to pickle format)
↓
EasyPQP Library (build spectral library)
↓
OpenSwathAssayGenerator (generate transitions)
↓
OpenSwathDecoyGenerator (add decoys)
↓
┌─────────────────────────────────────┐
│        DIA Analysis Pipeline        │
└─────────────────────────────────────┘
↓
DIA mzML files → OpenSwathWorkflow (extract features + XICs)
↓
Merge OSW/OSWPQ (combine runs)
↓
Arycal (XIC-based alignment)
↓
PyProphet Alignment Scoring
↓
PyProphet (score → peptide inference → protein inference)
↓
Final Results (TSV)
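The OpenSwathDecoyGenerator step adds decoy assays that later let PyProphet estimate error rates. The tool offers several decoy-generation methods through its `-method` option. One of them is pseudo-reversal: each target sequence is reversed while the C-terminal residue (usually K or R after tryptic digestion) stays in place, so the decoy keeps the target's amino acid composition. The Python sketch below illustrates only that idea on bare sequences. It does not reproduce OpenSwathDecoyGenerator's handling of modifications or fragment ions, and it makes no claim about which method oDIAFlow uses. The helper names and the `DECOY_` tag are assumptions chosen for illustration.

```python
from typing import Dict

DECOY_TAG = "DECOY_"  # assumed decoy label, chosen for illustration


def pseudo_reverse(sequence: str) -> str:
    """Reverse every residue except the C-terminal one (hypothetical helper).

    Keeping the last residue preserves a tryptic C terminus, and because only
    the order changes, the unmodified precursor mass equals the target's.
    """
    if len(sequence) < 2:
        return sequence
    return sequence[-2::-1] + sequence[-1]


def make_decoys(targets: Dict[str, str]) -> Dict[str, str]:
    """Map tagged ids to pseudo-reversed sequences (hypothetical helper).

    Targets whose pseudo-reversed form equals the original (e.g. "AGAK") are
    skipped, because such a decoy would be indistinguishable from its target.
    """
    decoys: Dict[str, str] = {}
    for peptide_id, seq in targets.items():
        decoy_seq = pseudo_reverse(seq)
        if decoy_seq != seq:
            decoys[DECOY_TAG + peptide_id] = decoy_seq
    return decoys


if __name__ == "__main__":
    library = {"pep1": "PEPTIDEK", "pep2": "LGGNEQVTR"}
    for decoy_id, decoy_seq in make_decoys(library).items():
        print(decoy_id, decoy_seq)
```

Skipping self-identical decoys is one simple policy. Real decoy generators may instead fall back to another method for such sequences.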
In-silico workflow (--workflow insilico): uses a predicted spectral library (e.g., from AlphaPeptDeep or DIA-NN) to analyze DIA data.
Transition TSV (predicted library)
↓
OpenSwathAssayGenerator (generate transitions)
↓
OpenSwathDecoyGenerator (add decoys)
↓
┌─────────────────────────────────────┐
│        DIA Analysis Pipeline        │
└─────────────────────────────────────┘
↓
DIA mzML files → OpenSwathWorkflow (extract features + XICs)
↓
Merge OSW/OSWPQ (combine runs)
↓
Arycal (XIC-based alignment)
↓
PyProphet Alignment Scoring
↓
PyProphet (score → peptide inference → protein inference)
↓
Final Results (TSV)
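A predicted library usually lists more fragment ions per precursor than are useful for extraction. OpenSwathAssayGenerator can limit each precursor to a bounded number of transitions (options such as `-min_transitions` and `-max_transitions`). The Python sketch below shows the underlying top-N idea on in-memory records. Precursors with too few transitions are dropped, and the rest keep their most intense fragments. The record layout (`precursor`, `product_mz`, `intensity`), the function name, and the value N = 6 are hypothetical simplifications. They are not the transition TSV schema or the tool's exact selection rules.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class Transition(TypedDict):
    precursor: str      # hypothetical precursor id, e.g. "PEPTIDEK/2"
    product_mz: float   # fragment ion m/z
    intensity: float    # predicted relative intensity


def select_top_transitions(
    transitions: List[Transition], min_n: int = 6, max_n: int = 6
) -> Dict[str, List[Transition]]:
    """Keep the max_n most intense transitions per precursor.

    Precursors with fewer than min_n transitions are dropped entirely.
    """
    grouped: Dict[str, List[Transition]] = defaultdict(list)
    for t in transitions:
        grouped[t["precursor"]].append(t)

    selected: Dict[str, List[Transition]] = {}
    for precursor, group in grouped.items():
        if len(group) < min_n:
            continue  # too few fragments for reliable peak-group scoring
        ranked = sorted(group, key=lambda t: t["intensity"], reverse=True)
        selected[precursor] = ranked[:max_n]
    return selected


if __name__ == "__main__":
    demo: List[Transition] = [
        {"precursor": "PEPTIDEK/2", "product_mz": 100.0 + i, "intensity": float(i)}
        for i in range(8)
    ] + [{"precursor": "SHORTK/2", "product_mz": 200.0, "intensity": 1.0}]
    for precursor, group in select_top_transitions(demo).items():
        print(precursor, [t["intensity"] for t in group])
```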
Run the empirical workflow:
nextflow run main.nf \
  --workflow empirical \
  --dda_glob "data/dda/*.mzML" \
  --dia_glob "data/dia/*.mzML" \
  --fasta "db/uniprot.fasta" \
  --irt_traml "lib/iRTassays.TraML" \
  --outdir results

Run the in-silico workflow:
nextflow run main.nf \
  --workflow insilico \
  --dia_glob "data/dia/*.mzML" \
  --transition_tsv "lib/predicted_library.tsv" \
  --irt_traml "lib/iRTassays.TraML" \
  --outdir results

Parameters:

Empirical workflow:
- --dda_glob: Path pattern to DDA mzML files (e.g., "data/dda/*.mzML")
- --dia_glob: Path pattern to DIA mzML files (e.g., "data/dia/*.mzML")
- --fasta: Path to FASTA database file
In-silico workflow:
- --dia_glob: Path pattern to DIA mzML files
- --transition_tsv: Path to predicted transition TSV file
Other parameters:
- --workflow: Workflow type, 'empirical' (default) or 'insilico'
- --irt_traml: Path to iRT peptide TraML file
- --swath_windows: Path to SWATH window definition file
- --use_parquet: Use Parquet format for PyProphet (default: false)
- --outdir: Output directory (default: "results")
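The workflow-specific parameters can be checked before a run is launched. The Python sketch below is a hypothetical pre-flight helper and is not part of oDIAFlow. It assumes that the inputs listed for each workflow are required, which is inferred from the example commands rather than stated in the pipeline code. It applies the documented defaults for --workflow ('empirical') and --outdir ('results'), and it reports missing values, glob patterns that match no files, and file paths that do not exist.

```python
import glob
import os
from typing import Dict, List, Optional

# Assumption: the inputs this README lists per workflow are treated as required.
REQUIRED: Dict[str, List[str]] = {
    "empirical": ["dda_glob", "dia_glob", "fasta"],
    "insilico": ["dia_glob", "transition_tsv"],
}
# Defaults documented in this README.
DEFAULTS: Dict[str, str] = {"workflow": "empirical", "outdir": "results"}


def check_params(params: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    # User-supplied values override defaults; None means "not set".
    merged = {**DEFAULTS, **{k: v for k, v in params.items() if v is not None}}
    workflow = merged["workflow"]
    if workflow not in REQUIRED:
        return [f"unknown --workflow '{workflow}' (expected 'empirical' or 'insilico')"]

    problems: List[str] = []
    for name in REQUIRED[workflow]:
        value = merged.get(name)
        if not value:
            problems.append(f"--{name} is required for the {workflow} workflow")
        elif name.endswith("_glob"):
            # A glob pattern must match at least one input file.
            if not glob.glob(value):
                problems.append(f"--{name} '{value}' matches no files")
        elif not os.path.isfile(value):
            problems.append(f"--{name} '{value}' is not a file")
    return problems


if __name__ == "__main__":
    example = {"workflow": "insilico", "dia_glob": "data/dia/*.mzML"}
    for problem in check_params(example):
        print(problem)
```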
Tools used by the pipeline:
- Sage: Fast proteomics database search engine
- EasyPQP: Spectral library generation from search results
- OpenMS/OpenSWATH: Feature extraction and scoring
- Arycal: Chromatogram alignment
- PyProphet: Semi-supervised learning and statistical validation
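PyProphet's statistical validation is based on the target-decoy approach. Decoy hits model the score distribution of false identifications, and q-values are estimated from them. The Python sketch below is a deliberately simplified version of that idea and is not PyProphet's method. At each score threshold it estimates the FDR as decoys divided by targets (equivalent to assuming pi0 = 1), then takes the running minimum from the worst score upward to obtain monotone q-values. PyProphet itself learns a combined discriminant score with semi-supervised learning and estimates the proportion of true nulls (pi0); both steps are omitted here. The function name and the (score, is_decoy) input format are hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    hits: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Assign q-values to (score, is_decoy) pairs; higher score = better.

    Simplifications: pi0 is taken as 1 and tied scores are not grouped.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(decoys / targets if targets else 1.0)

    # q-value = smallest FDR among all thresholds that still accept the hit,
    # so walk from the worst score upward carrying the running minimum.
    qvalues = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    demo = [(5.0, False), (4.0, False), (3.5, True), (3.0, False), (1.0, True)]
    for score, is_decoy, q in target_decoy_qvalues(demo):
        label = "decoy" if is_decoy else "target"
        print(f"{score:4.1f} {label:6s} q={q:.3f}")
```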
Requirements:
- Nextflow (version 25.10 or later)
- Docker or Singularity/Apptainer
- Container image: ghcr.io/openswath/openswath:dev
Edit nextflow.config to adjust:
- Tool-specific parameters (Sage, PyProphet, OpenSWATH, etc.)
- Computational resources
- Container settings
The pipeline follows a modular structure compatible with quantms:
oDIAFlow/
├── main.nf # Entry point
├── nextflow.config # Configuration
├── workflows/ # High-level workflows
│ ├── dia_empirical_library.nf
│ └── dia_insilico_library.nf
├── subworkflows/ # Reusable sub-workflows
│ └── local/
│ ├── pyprophet_osw/
│ └── pyprophet_parquet/
└── modules/ # Individual process modules
└── local/
├── sage/
├── easypqp/
├── openms/
├── pyprophet/
└── arycal/