Opsin Phenotype Tool for Inference of Color Sensitivity (OPTICS) [v1.3]



Example Box Plot Output for Bootstrap Predictions of Opsin λmax by OPTICS

Description

OPTICS is an open-source tool that predicts the Opsin Phenotype (λmax) from unaligned opsin amino-acid sequences.
OPTICS leverages machine learning models trained on the Visual Physiology Opsin Database (VPOD).
OPTICS can be downloaded and used as a command-line or GUI tool.
OPTICS is also available as an online tool here, hosted on our Galaxy Project server.
Check out our pre-print Accessible and Robust Machine Learning Approaches to Improve the Opsin Genotype-Phenotype Map to read more about it!

Key Features

λmax Prediction: Predicts the peak light absorption wavelength (λmax) for opsin proteins.
Model Selection: Choose from different pre-trained models for prediction.
Encoding Methods: Select between one-hot encoding or amino-acid property encoding for model training and prediction.
BLAST Analysis: Optionally perform BLASTp analysis to compare query sequences against reference datasets.
Bootstrap Predictions: Optionally enable bootstrap predictions for enhanced accuracy assessment (we suggest limiting bootstrap visualizations to 10 sequences).
Prediction Explanation: Utilizes SHAP to explain the key features driving the λmax difference between any two sequences.
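
To make the one-hot encoding option above concrete, here is a minimal sketch of how an aligned amino-acid sequence can be turned into a 0/1 feature vector. This is an illustration only, not the OPTICS implementation: the function name `one_hot_encode`, the 21-symbol alphabet (20 standard residues plus a gap), and the toy input are all assumptions. The amino-acid property encoding represents residues by their properties instead and is not shown here.

```python
# Illustrative one-hot encoder; NOT the actual OPTICS implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(aligned_seq, alphabet=AMINO_ACIDS + "-"):
    """Flatten an aligned sequence into a 0/1 vector, one block per site."""
    index = {residue: i for i, residue in enumerate(alphabet)}
    vector = []
    for residue in aligned_seq.upper():
        block = [0] * len(alphabet)
        block[index[residue]] = 1  # raises KeyError for symbols outside the alphabet
        vector.extend(block)
    return vector


# Hypothetical 4-site aligned fragment with one gap.
encoded = one_hot_encode("MN-G")
print(len(encoded))  # number of sites times alphabet size
print(sum(encoded))  # exactly one active position per site
```

Each site contributes one block of 21 positions, so the feature count grows with alignment length rather than with the number of sequences.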

Installation

Clone the repository:

 git clone https://github.com/VisualPhysiologyDB/optics.git

Install dependencies: [Make sure you are working in the repository directory from here on]

A. Create a Conda environment for OPTICS (make sure you have Conda installed)
```
conda create --name optics_env python=3.11 
```
THEN
```
conda activate optics_env
```
B. Use the 'requirements.txt' file to download base package dependencies for OPTICS
```
pip install -r requirements.txt
```
C. Download MAFFT and BLAST

IF working on MAC or LINUX device:
- Install BLAST and MAFFT directly from the bioconda channel
```
conda install bioconda::blast bioconda::mafft
```
IF working on WINDOWS device:
- Manually install the Windows-compatible BLAST executable and make sure it is on your system PATH; the download list is here
  - We suggest downloading 'ncbi-blast-2.16.0+-win64.exe'
- You DO NOT need to download MAFFT; OPTICS should be able to run MAFFT from the files provided in this GitHub repository.
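
After installing, it can help to confirm that the external tools are actually visible on your PATH. The following is a small sketch using only the Python standard library; the helper name `find_tools` is hypothetical and not part of OPTICS. The executable names `blastp` and `mafft` are the usual command names for these tools, which is an assumption about your install.

```python
import shutil


def find_tools(names=("blastp", "mafft")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

On Windows, a missing `mafft` is expected here, since OPTICS runs MAFFT from the files bundled with the repository.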

Usage

MAKE SURE YOU HAVE ALL DEPENDENCIES DOWNLOADED AND THAT YOU ARE IN THE FOLDER DIRECTORY FOR OPTICS (or have loaded it as a module) BEFORE RUNNING ANY SCRIPTS!

Main prediction script (`optics_predictions.py`)

Required Args:

-i, --input: Either a single sequence or a path to a FASTA file.

General Optional Args:

-o, --output_dir: Desired directory to save output folder/files (optional). Default: './prediction_outputs'

-p, --prediction_prefix: Base filename for prediction outputs. Default: 'unnamed'

-v, --model_version: Version of models to use (optional). Based on the version of VPOD used to train models. Options/Default: vpod_1.3 (more versions coming later)

-m, --model: Prediction model to use. Options: whole-dataset, wildtype, vertebrate, invertebrate, wildtype-vert, type-one, whole-dataset-mnm, wildtype-mnm, vertebrate-mnm, invertebrate-mnm, wildtype-vert-mnm. **Default: whole-dataset** 

-e, --encoding: Encoding method to use (optional). Options: one_hot, aa_prop. Default: aa_prop

--tolerate_non_standard_aa: Allows OPTICS to run predictions on sequences with 'non-standard' amino acids (e.g., 'X', 'O', 'B') (optional). Default: False

--n_jobs: Number of parallel processes to run (optional). Default: -1, which uses all available processors.


BLASTp Analysis Args (optional):

--blastp: Enable BLASTp analysis.

--blastp_report: Filename for BLASTp report. Default: blastp_report.txt

--refseq: Reference sequence used for blastp analysis. Options: bovine, squid, microbe, custom. Default: bovine

--custom_ref_file: Path to a custom reference sequence file for BLASTp.  Required if --refseq custom is selected.

Bootstrap Analysis Args (optional):

--bootstrap: Enable bootstrap predictions.

--visualize_bootstrap: Enable visualization of bootstrap predictions.

--bootstrap_num: Number of bootstrap models to load for prediction replicates. Default // Maximum: 100

--bootstrap_viz_file: Filename prefix for bootstrap visualization. Default: bootstrap_viz

--save_viz_as: File type for bootstrap visualizations. Options: SVG, PNG, or PDF. Default: SVG

--full_spectrum_xaxis: Enables visualization of predictions on a full spectrum x-axis (300-650nm). Otherwise, x-axis is scaled with predictions.
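
As a rough illustration of what bootstrap replicates offer, the sketch below summarizes a set of replicate λmax predictions for one sequence as a mean, a median, and a central 95% interval. The function name `summarize_replicates` and the numbers are hypothetical, and the statistics that OPTICS itself reports may differ.

```python
import statistics


def summarize_replicates(predictions_nm):
    """Return mean, median, and an approximate central 95% interval."""
    # n=40 yields 39 cut points at 2.5% steps; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(predictions_nm, n=40, method="inclusive")
    return {
        "mean": statistics.mean(predictions_nm),
        "median": statistics.median(predictions_nm),
        "ci95": (cuts[0], cuts[-1]),
    }


# Hypothetical replicate predictions (nm) for a single sequence.
replicates = [498.2, 500.1, 499.5, 501.3, 497.8, 500.6, 499.9, 502.0, 498.9, 500.4]
print(summarize_replicates(replicates))
```

A narrow interval suggests the bootstrap models agree on the prediction; a wide one suggests the sequence is poorly supported by the training data.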

Example Command Line Usage vvv

python optics_predictions.py -i ./examples/optics_ex_short.txt -o ex_test_of_optics -p ex_predictions -m wildtype -e aa_prop --blastp --blastp_report blastp_report.txt --refseq squid --bootstrap --visualize_bootstrap --bootstrap_viz_file bootstrap_viz --save_viz_as SVG
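
If you would rather launch predictions from a Python script (for example, to loop over several input files), the command can be assembled as an argument list. This is a sketch only: the helper `build_optics_command` is hypothetical, it uses just the flags documented above, and it assumes you run it from the OPTICS repository directory.

```python
import shlex
import sys


def build_optics_command(input_path, output_dir, prefix, model="whole-dataset",
                         encoding="aa_prop", blastp_refseq=None, bootstrap=False):
    """Assemble an argv list for optics_predictions.py from documented flags."""
    cmd = [sys.executable, "optics_predictions.py",
           "-i", input_path, "-o", output_dir, "-p", prefix,
           "-m", model, "-e", encoding]
    if blastp_refseq is not None:
        cmd += ["--blastp", "--refseq", blastp_refseq]
    if bootstrap:
        cmd += ["--bootstrap", "--visualize_bootstrap"]
    return cmd


cmd = build_optics_command("./examples/optics_ex_short.txt", "ex_test_of_optics",
                           "ex_predictions", model="wildtype",
                           blastp_refseq="squid", bootstrap=True)
print(shlex.join(cmd))
# To actually run it from the OPTICS repository directory:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.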

Input

Unaligned FASTA file containing opsin amino-acid sequences.

Example FASTA Entry:

  >NP_001014890.1_rhodopsin_Bos_taurus
  MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRT 
  PLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVC 
  KPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVV 
  HFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQG 
  SDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
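
Before running predictions, you may want to check your FASTA file for residues outside the 20 standard amino acids, since these require `--tolerate_non_standard_aa`. The sketch below is a stand-alone pre-check and not OPTICS code; `parse_fasta`, `non_standard_residues`, and the toy sequences are hypothetical.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span multiple lines
    if header is not None:
        yield header, "".join(chunks)


def non_standard_residues(sequence):
    """Return the sorted symbols in a sequence that are not standard residues."""
    return sorted(set(sequence.upper()) - STANDARD_AA)


# Toy input: the second record contains the ambiguous symbols X and B.
example = """>seq_ok
MNGTEGPNFYVPFSNKTGVV
RSPFEAPQYY
>seq_with_unknown
MNGTXGPNFB
"""
for name, seq in parse_fasta(example):
    print(name, len(seq), non_standard_residues(seq))
```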

Output

Predictions (TSV): λmax values, model used, and encoding method.
BLAST Results (TXT, optional): Comparison of query sequences to reference datasets.
Bootstrap Graphs (SVG/PNG/PDF, optional): Visualization of bootstrap prediction results.
Job Log (TXT): Log file containing input command to OPTICS, including encoding method and model used.

Note - All outputs are written into subfolders named after your 'prediction_prefix' under your specified output directory, and are marked by time and date.

Using the OPTICS GUI (`run_optics_gui.py`) - A more user-friendly alternative to the command line

That's right! No need for the command line; OPTICS can also be used as a GUI! The usage is quite simple: just use the command below (with your OPTICS conda environment activated) and get to predicting. ;)

Example GUI Usage vvv

python run_optics_gui.py

Example of the OPTICS GUI interface

Understanding the λmax Prediction Models

The --model flag allows you to select a specific pre-trained model for wavelength prediction. Each available model is named after the data-subset it was trained on, allowing you to choose the one best suited for your research question. This was originally done to test how factors like taxonomic group or gene family inclusivity impact prediction performance.

Base Model Datasets

The primary models include:

whole-dataset: Trained on the entire VPOD dataset, including all taxonomic groups and both wild-type and mutant sequences. In most cases, this is the recommended model as it leverages the most data.
- Generally, more data = better models (assuming that data is good data)
wildtype: Trained exclusively on wild-type opsin sequences, with all mutant sequences removed.
vertebrate: Trained only on sequences from the phylum Chordata.
invertebrate: Trained only on sequences from species not in the phylum Chordata.
wildtype-vert: A more specific subset containing only wild-type sequences from vertebrates.

The `-mnm` Suffix: Dataset Augmentation with Mine-n-Match (MNM)

The key difference between models with and without the -mnm suffix lies in the source of the phenotype data (the λmax values).

Standard models (e.g., wildtype): These are trained exclusively on data where the sequence-to-phenotype relationship was validated experimentally through heterologous expression. This represents a controlled, in-vitro dataset.
-mnm models (e.g., wildtype-mnm): These are trained on an augmented dataset that includes the standard heterologous expression data plus additional data from our "Mine-n-Match" (MNM) procedure. This process systematically infers connections between sequences and in-vivo measurements, providing a broader and more biologically contextualized training set.
- Note: the methodology behind MNM and the implementation of that data into VPOD/OPTICS is elaborated upon in our publication introducing OPTICS (Frazer et al. 2025).
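
The naming scheme above (a base dataset name, optionally followed by `-mnm`) can be captured in a few lines when scripting runs across several models. The helper `model_name` below is hypothetical; the names it accepts are taken from the `--model` options listed earlier, where `type-one` appears without an `-mnm` variant.

```python
BASE_MODELS = ("whole-dataset", "wildtype", "vertebrate", "invertebrate", "wildtype-vert")
OTHER_MODELS = ("type-one",)  # listed under --model without an -mnm variant


def model_name(base, use_mnm=False):
    """Return the --model value for a base dataset, optionally MNM-augmented."""
    if base in OTHER_MODELS:
        if use_mnm:
            raise ValueError(f"no -mnm variant is listed for {base!r}")
        return base
    if base not in BASE_MODELS:
        raise ValueError(f"unknown base model: {base!r}")
    return f"{base}-mnm" if use_mnm else base


print(model_name("wildtype"))
print(model_name("wildtype", use_mnm=True))
```

Validating the name up front gives a clearer error than a failed run after sequences have already been processed.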

Explaining Prediction Differences with SHAP (`optics_shap.py`)

For users interested in the "nitty-gritty" of why sequences have different predicted λmax values, we provide a specialized script that uses SHAP (SHapley Additive exPlanations). This tool generates a plot and detailed data files that attribute the difference in prediction to specific features (i.e., amino acid sites and their properties).

Example SHAP plot for explaining individual predictions of opsin λmax by OPTICS

Example SHAP comparison plot for explaining pair-wise differences in predictions of opsin λmax by OPTICS
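
The idea behind a pairwise SHAP comparison rests on SHAP's additivity: each prediction equals a base value plus the sum of its per-feature attributions, so the difference between two predictions splits into per-feature differences. The toy below shows this for a hand-written linear model, where the attributions have a simple closed form. Everything in it (weights, feature names, background means) is made up for illustration, and OPTICS's models are not assumed to be linear.

```python
import math

# Hypothetical linear model: lambda_max = INTERCEPT + sum(weight * feature).
WEIGHTS = {"site_1": -6.0, "site_2": 12.0, "site_3": 9.0}
INTERCEPT = 480.0
BACKGROUND_MEAN = {"site_1": 0.5, "site_2": 0.2, "site_3": 0.4}


def predict(features):
    return INTERCEPT + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)


def linear_shap(features):
    """For a linear model with independent features: weight * (x - mean)."""
    return {k: WEIGHTS[k] * (features[k] - BACKGROUND_MEAN[k]) for k in WEIGHTS}


seq_a = {"site_1": 1, "site_2": 0, "site_3": 1}
seq_b = {"site_1": 0, "site_2": 1, "site_3": 1}

shap_a, shap_b = linear_shap(seq_a), linear_shap(seq_b)
diff = {k: shap_a[k] - shap_b[k] for k in WEIGHTS}

# Per-feature differences add up to the difference in predictions.
print(diff)
print(math.isclose(sum(diff.values()), predict(seq_a) - predict(seq_b)))
```

Features shared by both sequences (here `site_3`) contribute nothing to the difference, which is why a comparison plot highlights only the sites where the two sequences diverge.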

This script requires a FASTA file.

The file must contain at least two sequences if you are running a SHAP comparison.
Only a single sequence is needed for an individual SHAP explanation.

SHAP Script Parameters

Most parameters are identical to the main prediction script. Below are the key arguments:

Required Args:
  -i, --input: Path to a FASTA file containing the sequences to explain (at least two for 'comparison' mode; one is enough for 'single' mode).

Optional Args:
  -o, --output_dir: Directory to save the SHAP analysis output folder.
  -p, --prediction_prefix: Base filename for the SHAP plot and data files.
  --mode: Analysis mode. Select 'comparison' for pairwise SHAP comparison of all sequence predictions, 'single' for individual SHAP explanations of all sequences, or 'both' for both outputs.
  -m, --model: Prediction model to use for the comparison.
  -e, --encoding: Encoding method to use.
  --save_viz_as: File type for the SHAP visualization (svg, png, or pdf).
  --use_reference_sites: Enable to use reference site numbering (i.e., Bovine or Squid Rhodopsin) instead of feature names.

Example Command Line Usage vvv

python optics_shap.py -i ./examples/optics_ex_short.fasta -o ./examples -p short_ex_test_aa_prop --mode both --use_reference_sites

Input

Unaligned FASTA file containing any number of opsin amino-acid sequences for SHAP comparison.
Please note: comparison mode (or both) is combinatorial, so all sequences are compared in pairwise fashion, which can become computationally expensive.
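
To get a feel for how quickly pairwise comparison grows, the sketch below counts the comparisons for a given number of sequences, assuming each unordered pair is compared once (an assumption about how OPTICS enumerates pairs). The function name `count_pairwise_comparisons` and the sequence names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_pairwise_comparisons(n_sequences):
    """Number of unordered sequence pairs: n * (n - 1) / 2."""
    return comb(n_sequences, 2)


# Enumerating pairs explicitly for a small, hypothetical set of sequences.
names = ["seq1", "seq2", "seq3", "seq4"]
pairs = list(combinations(names, 2))
print(len(pairs))  # 6

for n in (2, 10, 50, 100):
    print(n, count_pairwise_comparisons(n))
```

Under that assumption, 10 sequences produce 45 comparisons and 100 sequences produce 4950, so trimming the input file to the sequences of interest pays off quickly.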

Output

SHAP Plot (SVG/PNG/PDF): Visual explanation for the top 10 sites contributing to prediction differences.
SHAP Data (CSV): Detailed feature attribution values.
Run Log (TXT): A record of the commands used and other information pertaining to the SHAP prediction.

Note - Once again, all outputs are written into subfolders named after your 'prediction_prefix' under your specified output directory, and are marked by time and date.

License

All data and code are covered under the GNU General Public License (GPL), Version 3, in accordance with Open Source Initiative (OSI) policies.

Citation

IF citing this GitHub repository and its contents, use the following DOI provided by Zenodo:
```
10.5281/zenodo.10667840
```

IF you use OPTICS in your research, please cite the following paper(s):

Our more recent publication directly on the making/utility of OPTICS.

Seth A. Frazer, Todd H. Oakley. Accessible and Robust Machine Learning Approaches to Improve the Opsin Genotype-Phenotype Map. bioRxiv, 2025.08.22.671864. https://doi.org/10.1101/2025.08.22.671864

Our original paper on the development of VPOD; the opsin genotype-phenotype database backbone for training the ML models used in OPTICS.

Seth A. Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A. Crandall, & Todd H Oakley. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). GigaScience, 2024.09.01. https://doi.org/10.1093/gigascience/giae073

Contact

Contact information for author questions or feedback.

Todd H. Oakley - ORCID ID

oakley@ucsb.edu

Seth A. Frazer - ORCID ID

sethfrazer@ucsb.edu

Additional Notes/Resources

Want to use OPTICS without the hassle of the setup? -> CLICK HERE to visit our Galaxy Project server and use our tool!
OPTICS v1.3 uses VPOD_v1.3 for training.
Here is a link to a bibliography of the publications used to build VPOD_v1.2 (VPOD_v1.3 version not yet released)
If you know of publications for training opsin ML models not included in the VPOD_v1.2 database, please send them to us through this form
Check out the VPOD GitHub repository to learn more about our database and ML models!
