
Tutorial

chrismit edited this page Oct 21, 2012 · 4 revisions

Basic Usage

File iterators:

X! Tandem output files are encoded as XML. An iterator over an X! Tandem file can be built like so:

from Bio.Proteomics import ProteomicIterators as PI

# Raw string, so the backslashes in the Windows path are not read as escapes
f = open(r"C:\Users\Chris\SpectraViewer\A1.2012_06_07_12_20_00.t.xml")
for scan in PI.XTandemXMLIterator(f):
    print scan

There is also a "guesser" class, AnyIterator, which picks the appropriate iterator for the file it is given:

from Bio.Proteomics import ProteomicIterators as PI

# Raw string, so the backslashes in the Windows path are not read as escapes
f = open(r"C:\Users\Chris\SpectraViewer\A1.2012_06_07_12_20_00.t.xml")
for scan in PI.AnyIterator(f):
    print scan

There are three available iterators: XTandemXMLIterator, MGFIterator, and ThermoMSFIterator. Each one takes a filename or a file object as its argument. ThermoMSFIterator also accepts the following optional keywords:

full=True/False (default False): parse all information; much slower
confidence=1 (default 1): minimum confidence level
rank=1 (default 1): minimum search engine rank
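
Since every iterator accepts either a filename or a file object, and AnyIterator guesses the format, the following minimal sketch shows one way such a dispatch could work. The `guess_format` helper and the extension table are hypothetical and are not taken from the library, which may well inspect the file contents instead.

```python
import os

# Hypothetical mapping from file extension to iterator name; the real
# AnyIterator may use different rules.
EXTENSION_TO_ITERATOR = {
    ".xml": "XTandemXMLIterator",
    ".mgf": "MGFIterator",
    ".msf": "ThermoMSFIterator",
}


def guess_format(source):
    """Return the iterator name for a filename or an open file object."""
    # File objects usually expose the path they were opened with as .name.
    name = source if isinstance(source, str) else getattr(source, "name", "")
    ext = os.path.splitext(str(name))[1].lower()
    try:
        return EXTENSION_TO_ITERATOR[ext]
    except KeyError:
        raise ValueError("cannot guess format of %r" % (name,))


print(guess_format("A1.2012_06_07_12_20_00.t.xml"))
print(guess_format("L484_Bart_081022A_Reverse_49.msf"))
```

Guessing from the extension is cheap but fragile; a guesser that sniffs the first bytes of the file would be more robust at the cost of some I/O.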

Parsing an msf file

There are two ways to parse msf files, and they differ dramatically in speed. The lightweight parser retains only the information that can be obtained with SQL operations on the msf file. It is useful for finding titles, modifications, and peptides when you do not need the m/z values. Retrieving the m/z values requires unzipping strings stored in the SQL database, which adds a lot of time.
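
To illustrate why the two modes differ in cost, here is a toy sketch that uses an in-memory SQLite database as a stand-in for an msf file. The table layout, the column names, and the use of zlib are all assumptions made for illustration; the real msf schema and compression format are different.

```python
import sqlite3
import zlib

# Toy stand-in for an msf file: an in-memory SQLite database with a
# hypothetical schema (the real msf layout is different).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spectra (title TEXT, peptide TEXT, peaks BLOB)")
peaks_text = "100.1 25.0\n200.2 50.0"
db.execute(
    "INSERT INTO spectra VALUES (?, ?, ?)",
    ("scan_1", "PEPTIDEK", zlib.compress(peaks_text.encode("ascii"))),
)


def lightweight(conn):
    """Plain SQL only: cheap, but returns no m/z values."""
    return conn.execute("SELECT title, peptide FROM spectra").fetchall()


def full(conn):
    """Also decompress and parse each peak list: the slow part."""
    rows = []
    query = "SELECT title, peptide, peaks FROM spectra"
    for title, peptide, blob in conn.execute(query):
        lines = zlib.decompress(blob).decode("ascii").splitlines()
        peaks = [tuple(float(x) for x in line.split()) for line in lines]
        rows.append((title, peptide, peaks))
    return rows


print(lightweight(db))
print(full(db))
```

The lightweight path never touches the compressed column, so its cost is a single query; the full path pays for decompression and parsing once per spectrum.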

Here is a sample usage of the lightweight variant:

from Bio.Proteomics import ProteomicIterators as PI

# Raw string, so the backslashes in the Windows path are not read as escapes
f = r"C:\Users\Chris\SpectraViewer\L484_Bart_081022A_Reverse_49.msf"
for scan in PI.AnyIterator(f):
    print scan.title, scan.peptide, scan.accession

Here is an example using the full parser settings:

from Bio.Proteomics import ProteomicIterators as PI

# Raw string, so the backslashes in the Windows path are not read as escapes
f = r"C:\Users\Chris\SpectraViewer\L484_Bart_081022A_Reverse_49.msf"
for scan in PI.AnyIterator(f, full=True):
    print scan.title, scan.peptide, scan.accession
    print scan.scans

By default, the msf iterator considers only peptides that have a confidence level and rank of 1. You can change these thresholds with the confidence and rank keywords.

Scan/Peptide Object

Scan Object

A scan object usually has the following attributes:

title - string giving scan title
scans - list of (mz, intensity) tuples
mass - precursor mass
charge - precursor charge
rt - retention time (if provided)

Any other attributes found in the file, beyond the ones listed above, are added to the object as additional attributes.
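
As a rough illustration of the attributes above, the following sketch defines a stand-in `Scan` class. It is hypothetical and is not the library's own class; the `instrument` field is a made-up example of an unknown attribute being attached to the object.

```python
class Scan(object):
    """Hypothetical stand-in for the library's scan object."""

    def __init__(self, title, scans, mass, charge, rt=None, **extra):
        self.title = title
        self.scans = scans      # list of (mz, intensity) tuples
        self.mass = mass        # precursor mass
        self.charge = charge    # precursor charge
        self.rt = rt            # retention time, None if not provided
        # Fields that are not among the known ones become plain attributes.
        for key, value in extra.items():
            setattr(self, key, value)


scan = Scan("scan_1", [(100.1, 25.0), (200.2, 50.0)], 842.5, 2, instrument="LTQ")

# Base peak: the (mz, intensity) tuple with the highest intensity.
base_mz, base_intensity = max(scan.scans, key=lambda peak: peak[1])
print("%s %s %s" % (base_mz, base_intensity, scan.instrument))
```

Because rt and any extra attributes may be missing on a given scan, code that consumes scans should check for them (for example with `getattr(scan, "rt", None)`) rather than assume they exist.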

Peptide Object

A peptide object inherits from the scan object, so all of the scan attributes apply here as well. In addition, it usually has the following attributes:

mods - list of modifications, each a tuple of (amino acid modified, amino acid position, modification mass, short name for the modification if it is in our table)
peptide - matched peptide
accession - protein accession number assigned
expect - expectation value of peptide match
id - title of scan in most cases
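
The mods tuples can be processed with ordinary tuple unpacking. The sketch below uses made-up values, assumes positions are 1-based (the source does not say), and uses an empty string for a modification with no short name; the `annotate` helper is hypothetical.

```python
# Made-up example values; positions are assumed to be 1-based here.
peptide = "PEPTMIDEK"
mods = [("M", 5, 15.9949, "Oxidation"), ("K", 9, 8.0142, "")]


def annotate(peptide, mods):
    """Insert each modification mass after the residue it modifies."""
    by_position = {position: mass for _, position, mass, _ in mods}
    parts = []
    for index, residue in enumerate(peptide, start=1):
        parts.append(residue)
        if index in by_position:
            parts.append("[%+.4f]" % by_position[index])
    return "".join(parts)


# Total mass shift contributed by all modifications on the peptide.
total_shift = sum(mass for _, _, mass, _ in mods)
print(annotate(peptide, mods))
print(round(total_shift, 4))
```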
