diff --git a/README.md b/README.md new file mode 100644 index 0000000..c01f2b4 --- /dev/null +++ b/README.md @@ -0,0 +1,231 @@ +# Genomic Data Platform Template + +This repository serves as a template for creating integrated genomic data platforms with built-in genome browsers. It was originally developed for *Ostrea chilensis* (Chilean flat oyster) research but has been generalized for use with any species. + +## Features + +- 🧬 **Integrated Genome Browser**: Built-in JBrowse genome browser for interactive genomic data exploration +- 📊 **Interactive Data Tables**: R-powered tables for annotations, BLAST results, and pathway data +- 🎨 **Responsive Web Interface**: Clean, modern website built with Quarto +- 🔧 **Easy Customization**: Template-driven configuration system +- 📱 **Mobile Friendly**: Works on desktop and mobile devices +- 🌐 **GitHub Pages Ready**: Easy deployment to GitHub Pages + +## Technology Stack + +- **[Quarto](https://quarto.org/)**: Website generation and R integration +- **[JBrowse](https://jbrowse.org/)**: Genome browser component +- **R**: Data analysis and interactive tables +- **YAML**: Configuration management + +## Quick Start + +### 1. Prerequisites + +- [Quarto](https://quarto.org/docs/get-started/) (>= 1.0) +- [R](https://www.r-project.org/) (>= 4.0) with packages: + - `tidyverse` + - `DT` + - `readxl` + - `plotly` +- Python (>= 3.6) with `pyyaml` and `jinja2` + +### 2. Setup Your Project + +1. **Fork or clone this repository** + ```bash + git clone https://github.com/RobertsLab/OCEAN.git + cd OCEAN + ``` + +2. **Install Python dependencies** + ```bash + pip install pyyaml jinja2 + ``` + +3. 
**Configure your project** + + Copy and edit the configuration file: + ```bash + cp template-config.yml my-project-config.yml + ``` + + Edit `my-project-config.yml` with your species and project information: + ```yaml + # Project Information + project: + name: "MyGenome" + full_name: "My Species Genome Project" + description: "An integrated platform for my species research" + + # Species Information + species: + scientific_name: "Species scientificus" + common_name: "common species name" + description: "the common species" + emoji: "🧬" + + # Update data sources with your URLs + data_sources: + nr_blast: "https://example.com/path/to/nr.csv" + uniprot: "https://example.com/path/to/uniprot.txt" + # ... etc + ``` + +4. **Generate your project files** + ```bash + python setup-template.py my-project-config.yml + ``` + +5. **Add your genomic data** + + Place your genome files in the appropriate directories: + ``` + docs/jbrowse/data/ + ├── v1/ + │ ├── your-genome.fasta + │ ├── your-genome.fasta.fai + │ ├── genes.gff3 + │ └── annotations/ + └── tracks/ + ``` + +6. **Build and preview** + ```bash + cd quarto + quarto preview + ``` + +### 3. 
Customize Further + +#### Website Styling +- Edit `quarto/styles.css` for custom styling +- Modify `quarto/_quarto.yml` for navigation and theme changes + +#### Genome Browser +- Update JBrowse configuration in `docs/jbrowse/config.json` +- Add track data to `docs/jbrowse/data/` +- Configure assemblies, annotations, and quantitative tracks + +#### Data Sources +- Update URLs in your config file to point to your data +- Modify R code in `explore.qmd` template for custom analysis + +## Configuration Reference + +### Project Settings +```yaml +project: + name: "ProjectName" # Short identifier + full_name: "Full Project Name" # Display name + description: "Project description" +``` + +### Species Information +```yaml +species: + scientific_name: "Genus species" + common_name: "common name" + description: "descriptive text" + emoji: "🧬" # Optional branding emoji +``` + +### Data Sources +All data source URLs can be customized: +```yaml +data_sources: + nr_blast: "URL to BLAST results CSV" + uniprot: "URL to UniProt annotations" + pathways: "URL to pathway data" + gene_gff: "URL to gene annotations GFF" +``` + +### Population/Study Data +The template includes optional population study components: +```yaml +population_info: + enabled: true # Set to false to hide this section + title: "Population Information" + description: "Study description" + map_embed: "Google Maps embed URL" + locations: + - name: "Site 1" + salinity: "High" + # ... 
other characteristics +``` + +### JBrowse Configuration +Define genome assemblies and basic tracks: +```yaml +jbrowse: + version: "v3.6.4" + assemblies: + - name: "Assembly Name" + fasta_url: "URL to FASTA file" + fai_url: "URL to FASTA index" +``` + +## Directory Structure + +``` +your-project/ +├── template-config.yml # Main configuration +├── setup-template.py # Template processor +├── templates/ # Template files +│ ├── index.qmd.template +│ ├── about.qmd.template +│ ├── explore.qmd.template +│ ├── _quarto.yml.template +│ └── jbrowse-config.json.template +├── quarto/ # Generated Quarto source +│ ├── index.qmd +│ ├── about.qmd +│ ├── explore.qmd +│ ├── _quarto.yml +│ ├── img/ +│ └── styles.css +└── docs/ # Generated website + ├── index.html + ├── about.html + ├── jbrowse/ + │ ├── config.json + │ ├── data/ + │ └── index.html + └── ... +``` + +## Deployment + +### GitHub Pages +1. Push your repository to GitHub +2. Enable GitHub Pages in repository settings +3. Set source to `/docs` folder +4. Your site will be available at `https://username.github.io/repository-name` + +### Manual Deployment +Build the site and deploy the `docs/` folder to any web server: +```bash +cd quarto +quarto render +# Upload docs/ folder to your web server +``` + +## Examples + +See the original OCEAN project configuration for a complete example of a working genomic data platform. + +## Contributing + +Feel free to submit issues and enhancement requests! This template is designed to be flexible and extensible. + +## License + +This template is provided under the same license as the original OCEAN project. + +## Citation + +If you use this template for your research, please cite the original OCEAN project and any relevant publications. 
+ +--- + +**Need help?** Check the [Issues](https://github.com/RobertsLab/OCEAN/issues) page or submit a new issue with the `template` label. \ No newline at end of file diff --git a/TEMPLATE-GUIDE.md b/TEMPLATE-GUIDE.md new file mode 100644 index 0000000..09bc8a8 --- /dev/null +++ b/TEMPLATE-GUIDE.md @@ -0,0 +1,184 @@ +# Using OCEAN as a Template + +This directory contains the template system for creating new genomic data platforms based on the OCEAN architecture. + +## Quick Start for New Projects + +### 1. Copy and Configure + +```bash +# Copy the template configuration +cp template-config.yml my-species-config.yml + +# Edit the configuration for your species +vim my-species-config.yml +``` + +### 2. Key Configuration Sections + +#### Project Information +```yaml +project: + name: "YourProject" # Short name, used in code + full_name: "Your Species Genomics Platform" # Display name + description: "Brief description of your platform" +``` + +#### Species Information +```yaml +species: + scientific_name: "Genus species" + common_name: "common name" + description: "the [adjective] [common name]" # Used in sentences + emoji: "🧬" # Optional branding emoji +``` + +#### Data Sources +Update all URLs to point to your data: +```yaml +data_sources: + nr_blast: "https://your-server.com/blast-results.csv" + uniprot: "https://your-server.com/uniprot-annotations.txt" + pathways: "https://your-server.com/pathway-data.csv" + gene_gff: "https://your-server.com/gene-annotations.gff3" +``` + +#### JBrowse Configuration +```yaml +jbrowse: + assemblies: + - name: "Your Assembly v1.0" + fasta_url: "https://your-server.com/genome.fasta" + fai_url: "https://your-server.com/genome.fasta.fai" +``` + +### 3. 
Generate Your Site + +```bash +# Install Python dependencies +pip install pyyaml jinja2 + +# Generate your project files +python setup-template.py my-species-config.yml + +# Copy your genome data files +mkdir -p docs/jbrowse/data/v1/ +cp your-genome-files/* docs/jbrowse/data/v1/ + +# Copy your images +cp your-logo.png quarto/img/ +``` + +### 4. Build and Test + +```bash +cd quarto +quarto preview # Test locally +quarto render # Build for deployment +``` + +## Template Customization + +### Adding New Data Sections + +1. Add data sources to your config: +```yaml +data_sources: + new_data_type: "https://your-server.com/new-data.csv" +``` + +2. Modify `templates/explore.qmd.template` to add R code for your new data. + +3. Regenerate with `python setup-template.py your-config.yml` + +### Customizing Population/Study Information + +The template includes a flexible population information section: + +```yaml +population_info: + enabled: true # Set to false to hide completely + title: "Population Information" # Customize section title + description: "Your study description" + + # Optional map + map_embed: "Google Maps embed URL" + + # Customizable location table + locations: + - name: "Site 1" + salinity: "High" + freshwater_input: "Low" + tidal_exchange: "High" + human_influence: "Low" +``` + +You can: +- Change column names by editing the template +- Add/remove columns as needed +- Disable the entire section with `enabled: false` + +### JBrowse Tracks + +The template creates basic gene, ncRNA, and repeat tracks. To add more: + +1. Add track data sources to your config +2. Edit `templates/jbrowse-config.json.template` +3. 
Add new track definitions following JBrowse v3 syntax + +### Styling and Branding + +- Update `quarto/styles.css` for custom CSS +- Replace `quarto/img/` files with your logos/images +- Modify the celebration message and platform features in your config + +## Example Configurations + +See the `examples/` directory for: +- `arabidopsis-config.yml` - Plant genomics example +- Additional species examples (add your own!) + +## File Structure After Setup + +``` +your-project/ +├── template-config.yml # Original template config +├── my-species-config.yml # Your customized config +├── setup-template.py # Template processor +├── templates/ # Template source files +├── quarto/ # Generated Quarto source +│ ├── index.qmd # Homepage +│ ├── about.qmd # About page with species info +│ ├── explore.qmd # Data exploration page +│ ├── browse.qmd # Genome browser page +│ └── _quarto.yml # Quarto configuration +└── docs/ # Generated website + ├── index.html + ├── jbrowse/ + │ ├── config.json # JBrowse configuration + │ └── data/ # Your genome data files + └── ... +``` + +## Tips for Success + +1. **Start Small**: Begin with basic gene and assembly data, add complexity later +2. **Test Early**: Use `quarto preview` to test changes immediately +3. **Version Control**: Commit your config files and generated templates +4. **Document Changes**: Keep notes on customizations for future reference +5. **Community**: Share your configurations as examples for others + +## Troubleshooting + +### Common Issues + +1. **Missing Dependencies**: Install Quarto, R packages, and Python modules +2. **Data URLs**: Ensure all data sources are publicly accessible +3. **File Paths**: Use absolute URLs for external data, relative paths for local files +4. 
**JBrowse Data**: Ensure FASTA files have corresponding .fai index files + +### Getting Help + +- Check the main README.md for general setup instructions +- Review example configurations in `examples/` +- Submit issues with the `template` label for template-specific problems \ No newline at end of file diff --git a/docs/jbrowse/config.json b/docs/jbrowse/config.json index 8590f7e..667a9d1 100644 --- a/docs/jbrowse/config.json +++ b/docs/jbrowse/config.json @@ -1,11 +1,12 @@ { "baseUrl": "jbrowse/", "assemblies": [ + { "name": "Assembly 1.0", "sequence": { "type": "ReferenceSequenceTrack", - "trackId": "custom_refseq", + "trackId": "assembly_1.0_refseq", "adapter": { "type": "IndexedFastaAdapter", "fastaLocation": { @@ -17,11 +18,12 @@ } } }, + { "name": "Och_HapA", "sequence": { "type": "ReferenceSequenceTrack", - "trackId": "hapa_refseq", + "trackId": "och_hapa_refseq", "adapter": { "type": "IndexedFastaAdapter", "fastaLocation": { @@ -32,29 +34,14 @@ } } } - }, - { - "name": "Och_HapB", - "sequence": { - "type": "ReferenceSequenceTrack", - "trackId": "hapb_refseq", - "adapter": { - "type": "IndexedFastaAdapter", - "fastaLocation": { - "uri": "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/HapB/Och_HapB_assembly.fa" - }, - "faiLocation": { - "uri": "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/HapB/Och_HapB_assembly.fa.fai" - } - } - } } + ], "tracks": [ { "type": "FeatureTrack", "trackId": "gene_annotations", - "name": "Gene Annotations", + "name": "Genes", "assemblyNames": ["Assembly 1.0"], "adapter": { "type": "Gff3Adapter", @@ -82,159 +69,12 @@ "uri": "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/v1/repeat.gff3" } } - }, - { - "type": "QuantitativeTrack", - "trackId": "Pum_barcode02", - "name": "Pum_02", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Pum_barcode02.bw" 
} - }, - "color": "blue" - }, - { - "type": "QuantitativeTrack", - "trackId": "Pum_barcode12_1", - "name": "Pum_12_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Pum_barcode12_1.bw" } - }, - "color": "blue" - }, - { - "type": "QuantitativeTrack", - "trackId": "Pum_barcode13", - "name": "Pum_13", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Pum_barcode13.bw" } - }, - "color": "blue" - }, - { - "type": "QuantitativeTrack", - "trackId": "Pum_barcode13_1", - "name": "Pum_13_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Pum_barcode13_1.bw" } - }, - "color": "blue" - }, - { - "type": "QuantitativeTrack", - "trackId": "Qui_barcode01", - "name": "Qui_01", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Qui_barcode01.bw" } - }, - "color": "green" - }, - { - "type": "QuantitativeTrack", - "trackId": "Qui_barcode01_1", - "name": "Qui_01_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Qui_barcode01_1.bw" } - }, - "color": "green" - }, - { - "type": "QuantitativeTrack", - "trackId": "Qui_barcode02_1", - "name": "Qui_02_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Qui_barcode02_1.bw" } - }, - "color": "green" - }, - { - "type": "QuantitativeTrack", - "trackId": "Qui_barcode12", - "name": "Qui_12", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Qui_barcode12.bw" } - }, - "color": "green" - }, - { - "type": "QuantitativeTrack", - "trackId": "Rio_barcode03", - "name": "Rio_03", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - 
"type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Rio_barcode03.bw" } - }, - "color": "red" - }, - { - "type": "QuantitativeTrack", - "trackId": "Rio_barcode03_1", - "name": "Rio_03_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Rio_barcode03_1.bw" } - }, - "color": "red" - }, - { - "type": "QuantitativeTrack", - "trackId": "Rio_barcode14", - "name": "Rio_14", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Rio_barcode14.bw" } - }, - "color": "red" - }, - { - "type": "QuantitativeTrack", - "trackId": "Rio_barcode14_1", - "name": "Rio_14_1", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "data/v1/exp_01/Rio_barcode14_1.bw" } - }, - "color": "red" - }, - { - "type": "QuantitativeTrack", - "trackId": "SRR30335149", - "name": "SRR30335149", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "BigWigAdapter", - "bigWigLocation": { "uri": "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/v1/SRR30335149.sorted.sp.bedGraph.bw" } - }, - "color": "purple" - }, + } + ], + "plugins": [ { - "type": "FeatureTrack", - "trackId": "deg_annotations", - "name": "DEG", - "assemblyNames": ["Assembly 1.0"], - "adapter": { - "type": "Gff3Adapter", - "gffLocation": { "uri": "data/v1/exp_01/DEG.gtf" } - } + "name": "UMDUrlPlugin", + "url": "umd_plugin.js" } ] } \ No newline at end of file diff --git a/examples/arabidopsis-config.yml b/examples/arabidopsis-config.yml new file mode 100644 index 0000000..b49483f --- /dev/null +++ b/examples/arabidopsis-config.yml @@ -0,0 +1,91 @@ +# Example Template Configuration for Arabidopsis thaliana +# This demonstrates how to adapt the platform for a different species + +# Project Information +project: + name: "ArabGenome" + full_name: "Arabidopsis thaliana Genomic Analysis 
Platform" + description: "An integrated web platform for plant genomics research" + +# Species Information +species: + scientific_name: "Arabidopsis thaliana" + common_name: "thale cress" + description: "the model plant organism" + emoji: "🌱" + +# Website Configuration +website: + title: "ArabGenome" + theme: "flatly" + github_repo: "https://github.com/yourlab/arabgenome" + contact_message: "For questions or feedback, please submit an issue" + +# Genome Browser Configuration +jbrowse: + version: "v3.6.4" + assemblies: + - name: "TAIR10" + fasta_url: "https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas" + fai_url: "https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.fai" + +# Data Sources - Update these URLs to your actual data +data_sources: + nr_blast: "https://example.com/data/arabidopsis_nr_blast.csv" + uniprot: "https://example.com/data/arabidopsis_uniprot.txt" + pathways: "https://example.com/data/arabidopsis_pathways.csv" + gene_gff: "https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff" + + # Genome browser specific + gene_annotations: "data/TAIR10_genes.gff3" + ncrna_annotations: "data/TAIR10_ncrna.gff3" + repeat_annotations: "data/TAIR10_repeats.gff3" + +# Population/Study Information - Customize for your research +population_info: + enabled: true + title: "Ecotype Information" + description: "Analysis includes multiple Arabidopsis ecotypes from different geographic regions" + + # No map needed for this example + map_embed: "" + + locations: + - name: "Columbia (Col-0)" + salinity: "N/A" + freshwater_input: "Temperate climate" + tidal_exchange: "N/A" + human_influence: "Laboratory strain" + - name: "Landsberg erecta (Ler)" + salinity: "N/A" + freshwater_input: "European origin" + tidal_exchange: "N/A" + human_influence: "Wild collected" + - name: "Wassilewskija (Ws)" + salinity: 
"N/A" + freshwater_input: "European origin" + tidal_exchange: "N/A" + human_influence: "Wild collected" + + # Genetic relatedness + genetic_relatedness: + enabled: true + image_url: "https://example.com/arabidopsis_phylogeny.png" + +# Featured Views +featured_views: + - title: "Flowering Time Genes" + url: "jbrowse/?loc=Chr5:25000000-26000000" + - title: "Disease Resistance Cluster" + url: "jbrowse/?loc=Chr1:21000000-22000000" + +# Custom branding +branding: + logo_image: "img/arabidopsis.png" + celebration_message: "Explore the plant genome!" + platform_features: + - "🧬 Genome browser for *Arabidopsis thaliana*" + - "πŸ” Transcriptomic data" + - "πŸ“Š Interactive tools for gene expression and pathway analysis" + - "🧠 Curated annotations including GO terms and metabolic pathways" + - "🌐 Data sharing & collaboration hub for the plant research community" \ No newline at end of file diff --git a/quarto/_quarto.yml b/quarto/_quarto.yml index ae2b256..f86da4e 100644 --- a/quarto/_quarto.yml +++ b/quarto/_quarto.yml @@ -21,5 +21,4 @@ format: html: theme: simplex css: styles.css - toc: true \ No newline at end of file diff --git a/quarto/about.qmd b/quarto/about.qmd index a4baa36..e3e03b7 100644 --- a/quarto/about.qmd +++ b/quarto/about.qmd @@ -10,12 +10,17 @@ OCEAN is an integrated web platform dedicated to advancing research on *Ostrea chilensis*, the Chilean flat oyster. The platform offers:\ -β€’ 🧬 Genome browser for *O. chilensis*\ + +β€’ 🧬 Genome browser for {species.scientific_name}\ + β€’ πŸ” Epigenetic data\ -β€’ πŸ“Š Interactive tools for gene expression and regulatory region -analysis\ + +β€’ πŸ“Š Interactive tools for gene expression and regulatory region analysis\ + β€’ 🧠 Curated annotations including GO terms and pathway associations\ -β€’ 🌐 Data sharing & collaboration hub for the oyster research community + +β€’ 🌐 Data sharing & collaboration hub for the research community\ + ::: badges

@@ -29,26 +34,32 @@ analysis\ # Featured Genome Browser Views 👀 + - ### [Gene Expression following Heat Exposure](https://robertslab.github.io/OCEAN/jbrowse/?session=share-bS-rRTROpE&password=r9vWX) - -- ### [DEG MSTRG.1743.1 zoom](https://oystergen.es/jbrowse/?session=share-fpBeSAviSo&password=scJyF) + +- ### [DEG MSTRG.1743.1 zoom](https://oystergen.es/jbrowse/?session=share-fpBeSAviSo&password=scJyF) + ------------------------------------------------------------------------ + # Population Information ## Site Location Information Our focus to date is from oysters from three locales - | Location | Salinity | Freshwater Input | Tidal Exchange | Human Influence | |--------------|--------------|----------------|--------------|--------------| + | Río Pudeto | Moderate (brackish) | Moderate river inflow | Moderate | Moderate | + | Pumalín | Low (oligohaline) | High (glacial & river) | Low | Low (pristine) | + | Isla Quihua | High (marine) | Low to moderate | High | Moderate to high | @@ -56,3 +67,5 @@ Our focus to date is from oysters from three locales ## Genetic Relatedness ![](http://gannet.fish.washington.edu/seashell/snaps/Monosnap_Monosnap_2025-08-03_15-06-38.png) + + diff --git a/quarto/explore.qmd b/quarto/explore.qmd index 103ff43..fe0ce76 100644 --- a/quarto/explore.qmd +++ b/quarto/explore.qmd @@ -15,14 +15,11 @@ library(readxl) library(plotly) library(readr) library(dplyr) - ``` - ## NCBI NR BLAST ```{r, echo=FALSE} - # Read the CSV file nr_data <- read_csv("http://gannet.fish.washington.edu/seashell/snaps/nr.csv") @@ -42,6 +39,7 @@ datatable( rownames = FALSE ) ``` + ## UniProt ```{r, echo=FALSE} @@ -62,11 +60,9 @@ datatable( ) ``` - ## Pathways ```{r, echo=FALSE} - # Read your Pathway CSV # Read the CSV df <- read_csv("http://gannet.fish.washington.edu/seashell/snaps/pathway_table.csv") @@ -94,7 +90,6 @@ datatable( ) ``` - ## Gene GFF ```{r, echo=FALSE} @@ -105,8 +100,6 @@ gff <- read_tsv( col_names = c("seqid", "source", "type", "start", "end", 
"score", "strand", "phase", "attributes") ) - - datatable( gff, extensions = 'Buttons', @@ -117,8 +110,4 @@ datatable( ), filter = "top" ) - - - -``` - +``` \ No newline at end of file diff --git a/quarto/index.qmd b/quarto/index.qmd index 73ab195..991c539 100644 --- a/quarto/index.qmd +++ b/quarto/index.qmd @@ -6,9 +6,8 @@ editor: wrap: 72 --- -OCEAN (***O**stra **c**hilensis*: **E**pigΓ©noma **A**nΓ‘lisis en **N**et) -is an integrated web platform dedicated to advancing research on *Ostrea -chilensis*, the Chilean flat oyster. By combining high-resolution +Ostrea chilensis: EpigΓ©noma AnΓ‘lisis en Net +is an integrated web platform dedicated to advancing research on *Ostrea chilensis*, the Chilean flat oyster. By combining high-resolution genomic data with cutting-edge epigenomic insights, OCEAN supports researchers exploring gene regulation, environmental plasticity, and resilience in this ecologically important marine species. @@ -29,5 +28,5 @@ Browser]{style="font-size: 1.1em;"} ::: -For questions or feedback, [please submit an -issue](https://github.com/RobertsLab/OCEAN/issues/new/choose) 🧬 +For questions or feedback, please submit an issue, [please submit an +issue](https://github.com/RobertsLab/OCEAN/issues/new/choose) 🧬 \ No newline at end of file diff --git a/setup-template.py b/setup-template.py new file mode 100755 index 0000000..be0bdfc --- /dev/null +++ b/setup-template.py @@ -0,0 +1,84 @@ +#!/usr/bin/env python3 +""" +Template processor for genomic data platform +Generates project files from templates and configuration +""" + +import yaml +import json +import os +import sys +from pathlib import Path +from jinja2 import Environment, FileSystemLoader, Template + +def load_config(config_path): + """Load YAML configuration file""" + with open(config_path, 'r') as f: + return yaml.safe_load(f) + +def process_template(template_path, output_path, config): + """Process a single template file""" + env = 
Environment(loader=FileSystemLoader(Path(template_path).parent)) + template = env.get_template(Path(template_path).name) + + # Render the template + rendered = template.render(**config) + + # Write to output file + with open(output_path, 'w') as f: + f.write(rendered) + + print(f"Generated: {output_path}") + +def setup_project(config_path): + """Set up a new project from templates""" + # Load configuration + config = load_config(config_path) + + # Create directories + quarto_dir = Path("quarto") + docs_dir = Path("docs") + jbrowse_dir = docs_dir / "jbrowse" + + quarto_dir.mkdir(exist_ok=True) + jbrowse_dir.mkdir(parents=True, exist_ok=True) + + # Template mappings: (template_file, output_file) + templates = [ + ("templates/index.qmd.template", "quarto/index.qmd"), + ("templates/about.qmd.template", "quarto/about.qmd"), + ("templates/explore.qmd.template", "quarto/explore.qmd"), + ("templates/browse.qmd.template", "quarto/browse.qmd"), + ("templates/_quarto.yml.template", "quarto/_quarto.yml"), + ("templates/jbrowse-config.json.template", "docs/jbrowse/config.json") + ] + + # Process each template + for template_file, output_file in templates: + if Path(template_file).exists(): + process_template(template_file, output_file, config) + else: + print(f"Warning: Template {template_file} not found") + + print(f"\nProject setup complete! Website title: {config['website']['title']}") + print(f"Species: {config['species']['scientific_name']}") + print("\nNext steps:") + print("1. Update your data sources in template-config.yml") + print("2. Add your genome data files to docs/jbrowse/data/") + print("3. 
Build the website with: quarto render quarto/") + +def main(): + if len(sys.argv) != 2: + print("Usage: python setup-template.py ") + print("Example: python setup-template.py template-config.yml") + sys.exit(1) + + config_file = sys.argv[1] + if not Path(config_file).exists(): + print(f"Error: Config file {config_file} not found") + sys.exit(1) + + setup_project(config_file) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/template-config.yml b/template-config.yml new file mode 100644 index 0000000..f54efbe --- /dev/null +++ b/template-config.yml @@ -0,0 +1,95 @@ +# Template Configuration for Genomic Data Platform +# Copy this file and customize for your species/project + +# Project Information +project: + name: "OCEAN" # Short name for the project + full_name: "Ostrea chilensis: Epigénoma Análisis en Net" # Full project name + description: "An integrated web platform dedicated to advancing research" + +# Species Information +species: + scientific_name: "Ostrea chilensis" + common_name: "Chilean flat oyster" + description: "the Chilean flat oyster" + emoji: "🦪" # Optional emoji for branding + +# Website Configuration +website: + title: "OCEAN" + theme: "simplex" + github_repo: "https://github.com/RobertsLab/OCEAN" + contact_message: "For questions or feedback, please submit an issue" + +# Genome Browser Configuration +jbrowse: + version: "v3.6.4" + assemblies: + - name: "Assembly 1.0" + fasta_url: "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/v1/merged_out.fasta.TBtools.fa" + fai_url: "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/v1/merged_out.fasta.TBtools.fa.fai" + - name: "Och_HapA" + fasta_url: "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/HapA/Och_HapA_assembly.fa" + fai_url: "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/HapA/Och_HapA_assembly.fa.fai" + +# Data 
Sources (URLs to annotation and other data files) +data_sources: + # Annotation data + nr_blast: "http://gannet.fish.washington.edu/seashell/snaps/nr.csv" + uniprot: "http://gannet.fish.washington.edu/seashell/snaps/uniprot.txt" + pathways: "http://gannet.fish.washington.edu/seashell/snaps/pathway_table.csv" + gene_gff: "http://gannet.fish.washington.edu/seashell/snaps/gene.gff3" + + # Genome browser specific + gene_annotations: "data/v1/GN.gene.gff3" + ncrna_annotations: "data/v1/ncRNA.gff3" + repeat_annotations: "https://gannet.fish.washington.edu/v1_web/owlshell/bu-github/OCEAN/docs/jbrowse/data/v1/repeat.gff3" + +# Population/Study Information (customize or remove sections as needed) +population_info: + enabled: true + title: "Population Information" + description: "Our focus to date is from oysters from three locales" + + # Location data + map_embed: "https://www.google.com/maps/d/u/0/embed?mid=19nWyrDGOjq34Fy3nmXB-walLF1yjNdY&ehbc=2E312F&noprof=1" + + locations: + - name: "Río Pudeto" + salinity: "Moderate (brackish)" + freshwater_input: "Moderate river inflow" + tidal_exchange: "Moderate" + human_influence: "Moderate" + - name: "Pumalín" + salinity: "Low (oligohaline)" + freshwater_input: "High (glacial & river)" + tidal_exchange: "Low" + human_influence: "Low (pristine)" + - name: "Isla Quihua" + salinity: "High (marine)" + freshwater_input: "Low to moderate" + tidal_exchange: "High" + human_influence: "Moderate to high" + + # Genetic relatedness + genetic_relatedness: + enabled: true + image_url: "http://gannet.fish.washington.edu/seashell/snaps/Monosnap_Monosnap_2025-08-03_15-06-38.png" + +# Featured Views (for genome browser) +featured_views: + - title: "Gene Expression following Heat Exposure" + url: "https://robertslab.github.io/OCEAN/jbrowse/?session=share-bS-rRTROpE&password=r9vWX" + - title: "DEG MSTRG.1743.1 zoom" + url: "https://oystergen.es/jbrowse/?session=share-fpBeSAviSo&password=scJyF" + +# Custom branding +branding: + logo_image: 
"img/ocean.png" + celebration_message: "Β‘DiviΓ©rtete explorando el ocΓ©ano!" # Optional custom message + platform_features: + - "🧬 Genome browser for {species.scientific_name}" + - "πŸ” Epigenetic data" + - "πŸ“Š Interactive tools for gene expression and regulatory region analysis" + - "🧠 Curated annotations including GO terms and pathway associations" + - "🌐 Data sharing & collaboration hub for the research community" \ No newline at end of file diff --git a/templates/_quarto.yml.template b/templates/_quarto.yml.template new file mode 100644 index 0000000..07b46cb --- /dev/null +++ b/templates/_quarto.yml.template @@ -0,0 +1,24 @@ +project: + type: website + output-dir: ../docs # Goes to root-level docs/ folder + +website: + title: "{{ website.title }}" + navbar: + left: + - text: About + file: about.qmd + - text: Annotations + href: explore.qmd + - text: Genome Browser + href: jbrowse/index.html + right: + - icon: github + href: {{ website.github_repo }} + aria-label: GitHub + +format: + html: + theme: {{ website.theme }} + css: styles.css + toc: true \ No newline at end of file diff --git a/templates/about.qmd.template b/templates/about.qmd.template new file mode 100644 index 0000000..97ba150 --- /dev/null +++ b/templates/about.qmd.template @@ -0,0 +1,58 @@ +--- +title: "About" +format: html +editor: + markdown: + wrap: 72 +--- + +{{project.name}} is an integrated web platform dedicated to advancing research on +*{{species.scientific_name}}*, {{species.description}}. + +The platform offers:\ +{% for feature in branding.platform_features %} +β€’ {{feature}}\ +{% endfor %} + +::: badges +

+ + + +

+::: + +------------------------------------------------------------------------ + +# Featured Genome Browser Views 👀 + +{% for view in featured_views %} +- ### [{{ view.title }}]({{ view.url }}) +{% endfor %} + +------------------------------------------------------------------------ + +{% if population_info.enabled %} +# {{ population_info.title }} + +## Site Location Information + +{{ population_info.description }} + + + +| Location | Salinity | Freshwater Input | Tidal Exchange | Human Influence | +|--------------|--------------|----------------|--------------|--------------| +{% for location in population_info.locations %} +| {{ location.name }} | {{ location.salinity }} | {{ location.freshwater_input }} | {{ location.tidal_exchange }} | {{ location.human_influence }} | +{% endfor %} + +{% if population_info.genetic_relatedness.enabled %} +## Genetic Relatedness + +![]({{ population_info.genetic_relatedness.image_url }}) +{% endif %} + +{% endif %} \ No newline at end of file diff --git a/templates/browse.qmd.template b/templates/browse.qmd.template new file mode 100644 index 0000000..77c6e23 --- /dev/null +++ b/templates/browse.qmd.template @@ -0,0 +1,7 @@ +## Explore the Genome + +::: {.iframe-container} + + + +::: \ No newline at end of file diff --git a/templates/explore.qmd.template b/templates/explore.qmd.template new file mode 100644 index 0000000..b1a94b2 --- /dev/null +++ b/templates/explore.qmd.template @@ -0,0 +1,113 @@ +--- +title: "Genome Annotation" +format: html +editor: visual +execute: + echo: true + warning: false + message: false +--- + +```{r setup, include=FALSE} +library(tidyverse) +library(DT) +library(readxl) +library(plotly) +library(readr) +library(dplyr) +``` + +## NCBI NR BLAST + +```{r, echo=FALSE} +# Read the CSV file +nr_data <- read_csv("{{ data_sources.nr_blast }}") + +# Subset the relevant columns +nr_subset <- nr_data[, c("query_name", "nr.hit_name", "description", "E value")] + +# Create interactive datatable +datatable( 
+ nr_subset, + extensions = 'Buttons', + options = list( + dom = 'Bfrtip', + buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), + pageLength = 4 + ), + filter = "top", + rownames = FALSE +) +``` + +## UniProt + +```{r, echo=FALSE} +# Read the tab-delimited file from the URL +uniprot_data <- read_tsv("{{ data_sources.uniprot }}") + +# Display as an interactive datatable +datatable( + uniprot_data, + extensions = 'Buttons', + options = list( + dom = 'Bfrtip', + buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), + pageLength = 4 + ), + filter = "top", + rownames = FALSE +) +``` + +## Pathways + +```{r, echo=FALSE} +# Read your Pathway CSV +# Read the CSV +df <- read_csv("{{ data_sources.pathways }}") + +# Create clickable link column +df <- df %>% + mutate(Link = paste0( + "", PathwayId, "" + )) + +# Move Link column to position 3 +df <- df %>% + select(1:2, Link, everything()) + +# Display interactive table +datatable( + df, + escape = FALSE, + extensions = 'Buttons', + options = list( + dom = 'Bfrtip', + buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), + pageLength = 6 + ) +) +``` + +## Gene GFF + +```{r, echo=FALSE} +# Read the GFF3 file (skip comment lines starting with "#") +gff <- read_tsv( + "{{ data_sources.gene_gff }}", + comment = "#", + col_names = c("seqid", "source", "type", "start", "end", "score", "strand", "phase", "attributes") +) + +datatable( + gff, + extensions = 'Buttons', + options = list( + dom = 'Bfrtip', + buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), + pageLength = 6 + ), + filter = "top" +) +``` \ No newline at end of file diff --git a/templates/index.qmd.template b/templates/index.qmd.template new file mode 100644 index 0000000..620894a --- /dev/null +++ b/templates/index.qmd.template @@ -0,0 +1,32 @@ +--- +title: "Explore the Ocean" +format: html +editor: + markdown: + wrap: 72 +--- + +{{ project.full_name }} +is an integrated web platform dedicated to advancing research on *{{ species.scientific_name }}*, {{ 
species.description }}. By combining high-resolution +genomic data with cutting-edge epigenomic insights, {{ project.name }} supports +researchers exploring gene regulation, environmental plasticity, and +resilience in this ecologically important marine species. + +::: {style="text-align: center; font-size: 1.5em; margin-top: 2em;"} +{{ branding.celebration_message }} +::: + +![]({{ branding.logo_image }}){fig-align="center" +width="450px"} + +::: {style="display: flex; align-items: center; gap: 1em;"} +[👉 Explore JBrowse Genome +Browser]{style="font-size: 1.1em;"} + + + + +::: + +{{ website.contact_message }}, [please submit an +issue]({{ website.github_repo }}/issues/new/choose) 🧬 \ No newline at end of file diff --git a/templates/jbrowse-config.json.template b/templates/jbrowse-config.json.template new file mode 100644 index 0000000..b42ec96 --- /dev/null +++ b/templates/jbrowse-config.json.template @@ -0,0 +1,63 @@ +{ + "baseUrl": "jbrowse/", + "assemblies": [ + {% for assembly in jbrowse.assemblies %} + { + "name": "{{ assembly.name }}", + "sequence": { + "type": "ReferenceSequenceTrack", + "trackId": "{{ assembly.name | lower | replace(' ', '_') }}_refseq", + "adapter": { + "type": "IndexedFastaAdapter", + "fastaLocation": { + "uri": "{{ assembly.fasta_url }}" + }, + "faiLocation": { + "uri": "{{ assembly.fai_url }}" + } + } + } + }{% if not loop.last %},{% endif %} + {% endfor %} + ], + "tracks": [ + { + "type": "FeatureTrack", + "trackId": 
"repeat_annotations", + "name": "Repeats", + "assemblyNames": ["{{ jbrowse.assemblies[0].name }}"], + "adapter": { + "type": "Gff3Adapter", + "gffLocation": { + "uri": "{{ data_sources.repeat_annotations }}" + } + } + } + ], + "plugins": [ + { + "name": "UMDUrlPlugin", + "url": "umd_plugin.js" + } + ] +} \ No newline at end of file