Transcripts not found when building database for custom genome #610

@mrbertp

Describe the bug
I am trying to build a SnpEff database for my custom genome (S. japonicus, fission yeast, assembly SJ5). I have the genome.fa, cds.fa and protein.fa files ready, have checked that their IDs match the transcript IDs, and have added the genome to the snpEff.config file. When I build the database (java command below), many warnings about transcripts not being found are displayed (see the attached output below). The sanity checks for CDS and protein sequences also fail, with error rates of 35-37%.
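
For reference, the ID check described above could be sketched roughly as follows. This is a minimal illustration, not SnpEff's own matching logic: it assumes the sequence ID is the first whitespace-delimited token of each FASTA header and that the GTF carries a `transcript_id "..."` attribute in column 9. The function names (`fasta_ids`, `gtf_transcript_ids`, `compare_ids`) are hypothetical.

```python
import re

# Assumption: transcript IDs appear in GTF column 9 as: transcript_id "SOME_ID";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_transcript_ids(lines):
    """Collect transcript_id values from the attribute column of a GTF."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed or empty lines
        match = _TRANSCRIPT_ID.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Return (IDs only in the FASTA, IDs only in the GTF), both sorted."""
    fa = fasta_ids(fasta_lines)
    gtf = gtf_transcript_ids(gtf_lines)
    return sorted(fa - gtf), sorted(gtf - fa)
```

To run it on real files, pass open file handles, for example `compare_ids(open("cds.fa"), open("genes.gtf"))`, and repeat with protein.fa. Two empty lists mean the IDs agree under the assumptions above; it does not check anything else, such as chromosome names or coordinates.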

I also tried to reproduce the documentation's example of building the human genome database (https://pcingola.github.io/SnpEff/snpeff/build_db/#example-building-the-human-genome-database) and obtained a similar result (transcripts not found).

To Reproduce

  1. SnpEff version: 5.3a (build 2025-09-02 10:24)
  2. Genome version: SJ5
  3. SnpEff full command line: sudo java -jar snpEff.jar build -gtf22 -v SJ5 2>&1 | tee SJ5.build
  4. Output / Error message:
    SJ5.build.txt

Expected behavior
I should be able to build a database with fewer than 3% errors.

Data
Annotation files (cds.fa, genes.gtf, protein.fa):
annotation.zip
Genome:
SJ5.zip

Additional context
