Tutorial
X!Tandem files are encoded as XML. An iterator over an X!Tandem file can be built like so:
from Bio.Proteomics import ProteomicIterators as PI
# Raw string keeps the backslashes in the Windows path literal
f = open(r"C:\Users\Chris\SpectraViewer\A1.2012_06_07_12_20_00.t.xml")
for scan in PI.XTandemXMLIterator(f):
    print(scan)
There is also a "guesser" class:
from Bio.Proteomics import ProteomicIterators as PI
# Raw string keeps the backslashes in the Windows path literal
f = open(r"C:\Users\Chris\SpectraViewer\A1.2012_06_07_12_20_00.t.xml")
for scan in PI.AnyIterator(f):
    print(scan)
There are three available iterators: XTandemXMLIterator, MGFIterator, and ThermoMSFIterator. Each one takes a filename or a file object as its argument. ThermoMSFIterator also accepts these optional keywords:
full=True/False (default False): parse all information; much slower
confidence=1 (default 1): minimum confidence level
rank=1 (default 1): minimum search engine rank
There are two ways to parse MSF files, because their speeds differ dramatically. The lightweight parser retains only the information that can be obtained through SQL operations on the MSF file. This is useful for finding titles, modifications, and peptides when you do not need the m/z values. Retrieving the m/z values requires unzipping strings stored in the SQL database, which adds a lot of time; the full parser (full=True) does this.
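The speed difference can be illustrated with a small, self-contained sketch. It assumes only what the paragraph above states: peptide-level fields are reachable with plain SQL, while peak lists are stored compressed. The table and column names (`spectra`, `title`, `peptide`, `peaks_zipped`), the text layout of the peaks, and the use of zlib are hypothetical stand-ins, not the real MSF schema or the library's implementation.

```python
import sqlite3
import zlib

# Tiny in-memory database that mimics the idea, NOT the real MSF schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectra (title TEXT, peptide TEXT, peaks_zipped BLOB)")
peaks_text = "100.5 2000.0\n250.25 1500.0"
conn.execute(
    "INSERT INTO spectra VALUES (?, ?, ?)",
    ("scan_1", "PEPTIDEK", sqlite3.Binary(zlib.compress(peaks_text.encode("ascii")))),
)


def lightweight(connection):
    """SQL only: titles and peptides, no decompression."""
    return list(connection.execute("SELECT title, peptide FROM spectra"))


def full(connection):
    """Also decompress every peak list into (mz, intensity) tuples."""
    rows = []
    query = "SELECT title, peptide, peaks_zipped FROM spectra"
    for title, peptide, blob in connection.execute(query):
        text = zlib.decompress(bytes(blob)).decode("ascii")
        peaks = [tuple(float(x) for x in line.split()) for line in text.splitlines()]
        rows.append((title, peptide, peaks))
    return rows
```

The lightweight path never touches the compressed column, so its cost is a single query; the full path pays a decompress-and-parse step for every spectrum, which is where the extra time goes.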
Here is a sample usage of the lightweight variant:
from Bio.Proteomics import ProteomicIterators as PI
# Raw string keeps the backslashes in the Windows path literal
f = r"C:\Users\Chris\SpectraViewer\L484_Bart_081022A_Reverse_49.msf"
for scan in PI.AnyIterator(f):
    print("%s %s %s" % (scan.title, scan.peptide, scan.accession))
Here is an example using the full parser settings:
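from Bio.Proteomics import ProteomicIterators as PI
# Raw string keeps the backslashes in the Windows path literal
f = r"C:\Users\Chris\SpectraViewer\L484_Bart_081022A_Reverse_49.msf"
# full=True also loads the (mz, intensity) values; much slower
for scan in PI.AnyIterator(f, full=True):
    print("%s %s %s" % (scan.title, scan.peptide, scan.accession))
    print(scan.scans)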
By default, the MSF iterator returns only peptides that have a confidence level of 1 and a rank of 1. You can change these thresholds with the confidence and rank keywords.
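As a minimal sketch of passing these keywords, the wrapper below forwards them to ThermoMSFIterator using the keyword names listed in this tutorial. The wrapper name `iter_msf_peptides` is hypothetical, and the import is done lazily inside the function so the snippet can be loaded on a machine where the library is not installed.

```python
def iter_msf_peptides(path, confidence=1, rank=1, full=False):
    """Yield peptide hits from an MSF file with explicit thresholds.

    The defaults mirror the ones documented in this tutorial.
    """
    # Lazy import: only needed once iteration actually starts.
    from Bio.Proteomics import ProteomicIterators as PI

    for hit in PI.ThermoMSFIterator(path, full=full, confidence=confidence, rank=rank):
        yield hit
```

Calling `iter_msf_peptides(path, confidence=2, rank=2)` would pass different thresholds; the value 2 is purely illustrative, since what a given confidence level or rank means depends on the search results stored in the file.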
A scan object usually has the following attributes:
title - string giving scan title
scans - list of tuples of (mz,intensity) values
mass - precursor mass
charge - precursor charge
rt - retention time (if provided)
Attributes found in the file that are not in this list are added to the object as additional attributes.
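The attribute layout above can be pictured with a small stand-in class. `ScanSketch` is hypothetical and is not the library's scan class; it only shows the documented attributes plus one way unknown fields could end up as extra attributes. The `instrument` field and all values are made up for illustration.

```python
class ScanSketch(object):
    """Hypothetical stand-in for a scan object; not the library's class."""

    def __init__(self, title, scans=None, mass=None, charge=None, rt=None, **extra):
        self.title = title
        self.scans = scans if scans is not None else []  # [(mz, intensity), ...]
        self.mass = mass      # precursor mass
        self.charge = charge  # precursor charge
        self.rt = rt          # retention time, None if not provided
        # Anything outside the known list is kept as an additional attribute.
        for name, value in extra.items():
            setattr(self, name, value)


scan = ScanSketch(
    "scan_1",
    scans=[(100.5, 2000.0), (250.25, 1500.0)],
    mass=1234.56,
    charge=2,
    instrument="hypothetical",  # unknown field, stored as scan.instrument
)

# Because scans is a list of (mz, intensity) tuples, the most intense
# peak can be picked out by comparing the second element.
base_peak = max(scan.scans, key=lambda peak: peak[1])
```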
A peptide object usually has the following attributes. It inherits from the scan object, so all of the attributes above apply to it as well:
mods - modifications, as a list of tuples of the form (amino acid modified, amino acid position, modification mass, short name for the modification (if it is in our table))
peptide - matched peptide
accession - protein accession number assigned
expect - expectation value of peptide match
id - title of scan in most cases
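To show how the mods tuples might be consumed, here is a short sketch that totals the modification masses and builds readable labels. The field names in `Mod`, the helper `summarize_mods`, and the example values are illustrative; the tutorial only specifies the tuple order. The sketch also assumes that the position is an integer and that a modification missing from the table has `None` as its short name, neither of which is confirmed here.

```python
from collections import namedtuple

# Field names are illustrative; only the tuple order comes from the tutorial.
Mod = namedtuple("Mod", ["residue", "position", "mass", "name"])


def summarize_mods(mods):
    """Return (total modification mass, list of readable labels)."""
    parsed = [Mod(*m) for m in mods]
    total = sum(m.mass for m in parsed)
    labels = []
    for m in parsed:
        # Fall back to the signed mass when no short name is available.
        shown = m.name or "%+.4f" % m.mass
        labels.append("%s%d(%s)" % (m.residue, m.position, shown))
    return total, labels


# Made-up example in the documented tuple order.
example_mods = [("M", 3, 15.9949, "Oxidation"), ("C", 7, 57.0215, None)]
total, labels = summarize_mods(example_mods)
```

With real data, the same function could be called as `summarize_mods(peptide.mods)` on each object yielded by an iterator.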