Peptide Identification Statistics: Pin the tail on the donkey? US HUPO: Bioinformatics for Proteomics. Nathan Edwards – March 12, 2005
Slide 2: Peptide Identification. Peptide fragmentation by CID is poorly understood. MS/MS spectra represent incomplete information about amino-acid sequence: I/L, K/Q, GG/N, … Correct identifications don't come with a certificate!
Slide 3: Peptide Identification. High-throughput workflows demand that we analyze spectra continuously. Spectra may not contain enough information to be interpreted correctly … bad static on a cell phone. Peptides may not match our assumptions … it's all Greek to me. "Don't know" is an acceptable answer!
Slide 4: Peptide Identification. We can't prove we are right… so can we prove we aren't wrong?
Slide 5: Peptide Identification. We can't prove we are right… so can we prove we aren't wrong? NO!
Slide 6: Peptide Identification. We can't prove we are right… so can we prove we aren't wrong? NO! The best we can do is to show that our answer is better than guessing!
Slide 7: Better than guessing… "Better" implies comparison: a score or measure of the degree of success. "Guessing" implies randomness: probability and statistics.
Slide 8: Pin the tail on the donkey…
Slide 9: Probability Concepts. Throwing darts: one at a time, blindfolded. Identically distributed? Uniform distribution? Mutually exclusive? Independent? Pr [ Dart hits x ] = 0.05 (1/20 for each of the 20 numbers on a uniform board).
Slide 10: Probability Concepts. Throwing darts: one at a time, blindfolded, three darts. Pr [ Hitting 20 3 times ] = 0.05 * 0.05 * 0.05 = 0.000125. Pr [ Hitting 20 at least twice ] = 0.007125 + 0.000125 = 0.00725 (exactly twice, 3 * 0.05 * 0.05 * 0.95, plus exactly three times).
Slide 11: Probability Concepts
Slide 12: Probability Concepts. Throwing darts: one at a time, blindfolded, three darts. Pr [ Hitting evens 3 times ] = Pr [ Hitting 1-10 3 times ] = 0.5 * 0.5 * 0.5 = 0.125. Pr [ Evens at least twice ] = 0.5
Slide 13: Probability Concepts
Slide 14: Probability Concepts. Throwing darts: one at a time, blindfolded, 100 darts. Pr [ Hitting 20 exactly 3 times ] = 0.139575. Pr [ Hitting 20 at least twice ] = 0.9629188
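The dart arithmetic on Slides 10, 12 and 14 is the binomial distribution. It covers n independent throws, and each throw hits the target with the same probability p. Below is a minimal sketch that reproduces those numbers. It assumes a uniform 20-segment board, so p = 0.05 for "20" and p = 0.5 for "evens". The function names are illustrative only.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr[exactly k hits in n independent throws], each hitting with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_at_least(k: int, n: int, p: float) -> float:
    """Pr[k or more hits in n independent throws]."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

print(binom_pmf(3, 3, 0.05))         # three darts, three 20's (about 0.000125)
print(binom_at_least(2, 3, 0.05))    # three darts, at least two 20's (about 0.00725)
print(binom_at_least(2, 3, 0.5))     # three darts, at least two evens (about 0.5)
print(binom_pmf(3, 100, 0.05))       # 100 darts, exactly three 20's (about 0.1396)
print(binom_at_least(2, 100, 0.05))  # 100 darts, at least two 20's (about 0.9629)
```

With 100 darts, even a blindfolded thrower almost surely hits 20 at least twice. This is why the number of throws matters as much as the per-throw probability.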
Slide 15: Probability Concepts
Slide 16: Match Score. The dartboard is the set of peaks in a spectrum; each dart is a peptide fragment. Pr [ Match ≥ s peaks ] is the upper tail of Binomial( p , n ) ≈ Poisson( p n ), for small p and large n, where p is the probability of a fragment/peak match and n is the number of fragments.
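The match-score tail on Slide 16 can be computed exactly from the binomial and compared with its Poisson(p n) approximation. This is a sketch under the slide's IID assumption. The values n = 40 fragments and p = 0.02 are hypothetical, chosen only to illustrate "small p, large n".

```python
from math import comb, exp, factorial

def binomial_tail(s: int, n: int, p: float) -> float:
    """Pr[at least s of n fragments match a peak], each matching independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))

def poisson_tail(s: int, lam: float) -> float:
    """Pr[X >= s] for X ~ Poisson(lam), via 1 - Pr[X < s] (loses precision for very tiny tails)."""
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(s))

n, p, s = 40, 0.02, 5          # hypothetical: 40 fragments, 2% chance each matches a peak
print(binomial_tail(s, n, p))  # exact binomial tail
print(poisson_tail(s, n * p))  # Poisson(n p) approximation
```

The two tails agree to within about a tenth of a percentage point here. The approximation improves as p shrinks and n grows.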
Slide 17: Match Score. Theoretical distribution: used by OMSSA; proposed, in various forms, by many. The probability of a fragment/peak match is IID (independent, identically distributed) and based on the match tolerance. Can use fragments or peaks as darts!
Slide 18: Match Score. Theoretical distribution assumptions: each dart is independent (peaks are not "correlated"); each dart is identically distributed (the chance of a fragment/peak match is the same for all peaks and fragments).
Slides 19-20: Tournament Size. [Figure: 100 darts, number of 20's; tournaments of 100, 1,000, 10,000 and 100,000 people]
Slide 21: Number of Trials. Tournament size == number of trials: the number of peptides tried, which is related to the sequence database size. Probability that at least one random match score is ≥ s = 1 – Pr [ all match scores < s ] = 1 – Pr [ match score < s ]^Trials (*), which assumes IID! Expected value E = Trials * Pr [ match ≥ s ]; this corresponds to the Bonferroni bound on (*).
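The two quantities on Slide 21 can be set side by side. One is the IID probability that the best of Trials random peptides reaches s. The other is the expected count E, which is also the Bonferroni (union) upper bound on that probability. This is a sketch: the per-peptide tail probability q = 1e-6 is hypothetical, and so are the function names.

```python
import math

def prob_best_random_at_least(q: float, trials: int) -> float:
    """1 - (1 - q)**trials: chance that at least one of `trials` IID random peptides
    scores >= s, where q = Pr[a single random match scores >= s]."""
    # log1p/expm1 keep precision when q is tiny and trials is large
    return -math.expm1(trials * math.log1p(-q))

def expected_random_hits(q: float, trials: int) -> float:
    """E = trials * q; also the Bonferroni (union) upper bound on the probability above."""
    return trials * q

q = 1e-6  # hypothetical per-peptide tail probability
for trials in (10**3, 10**5, 10**7):
    print(trials, prob_best_random_at_least(q, trials), expected_random_hits(q, trials))
```

When E is small, the two agree closely. When E is large, E keeps growing while the probability saturates at 1. A bigger database (more trials) therefore makes any fixed score s less convincing.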
Slide 22: Better Dart Throwers
Slide 23: Better Random Models. Comparison with a completely random model isn't always fair. Match scores for real spectra with real peptides obey rules. Even incorrect peptides match with non-random structure!
Slide 24: Better Random Models. We want to generate random fragment masses (darts) that behave more like the real thing: some fragments are more likely than others; some fragments depend on others. Theoretical models can only incorporate this structure to a limited extent. They can't model the properties of a particular peptide! They must capture the behavior of fragments in general.
Slide 25: Better Random Models. Generate random peptides with real-looking fragment masses. No theoretical model! We must use the empirical distribution. Usually we require that they have the right precursor mass. The score function can model anything we like!
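Below is a minimal sketch of the empirical approach on Slide 25, under assumptions that should be read carefully:

- **Random peptides.** They are generated by shuffling the candidate's residues. This is one simple way to guarantee the right precursor mass, swapped in here for illustration; it is not necessarily how the cited work generated peptides.
- **Score function.** `toy_score` is a hypothetical shared-peak count standing in for whatever score function is used.
- **Masses.** These are standard monoisotopic residue masses.

```python
import random

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.00728, 18.01056

def fragment_masses(peptide):
    """Singly charged b- and y-ion masses for every backbone cleavage."""
    masses = []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE[a] for a in peptide[:i])
        suffix = sum(RESIDUE[a] for a in peptide[i:])
        masses.append(prefix + PROTON)          # b ion
        masses.append(suffix + WATER + PROTON)  # y ion
    return masses

def toy_score(peptide, peaks, tol=0.02):
    """Hypothetical score: number of fragment masses within tol Da of some observed peak."""
    return sum(any(abs(m - pk) <= tol for pk in peaks) for m in fragment_masses(peptide))

def empirical_p_value(peptide, peaks, n_random=999, seed=0):
    """Fraction of random peptides scoring at least as well as `peptide`.
    Random peptides are shuffles of `peptide`: same composition, hence same precursor mass."""
    rng = random.Random(seed)
    observed = toy_score(peptide, peaks)
    residues = list(peptide)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(residues)
        if toy_score("".join(residues), peaks) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_random)  # +1 counts the observed peptide itself

peaks = fragment_masses("PEPTIDEK")  # idealised spectrum: every b/y ion present
print(empirical_p_value("PEPTIDEK", peaks))
```

The null distribution comes from sampling rather than a formula, so `toy_score` could be replaced by any score function without changing the rest of the code.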
Slide 26: Better Random Models. Fenyo & Beavis, Anal. Chem., 2003
Slide 27: Better Random Models. Fenyo & Beavis, Anal. Chem., 2003
Slide 28: Better Random Models. Truly random peptides don't look much like real peptides. Just use peptides from the sequence database! Caveats: the correct (non-random) peptide may be included; peptides are not independent; reversed sequences avoid only the first problem.
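The reversed-sequence idea on Slide 28 can be sketched in a few lines. Each protein is reversed, then digested with the usual simplified trypsin rule: cleave after K or R unless the next residue is P, with no missed cleavages. The helper names and the toy sequence are hypothetical. As the slide warns, the resulting decoy peptides are still not independent of one another.

```python
def tryptic_peptides(protein, min_len=6):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal piece (may be empty)
    return [pep for pep in peptides if len(pep) >= min_len]

def decoy_peptides(proteins, min_len=6):
    """Peptides from the reversed protein sequences: this largely avoids including the
    correct peptide, but the decoys remain as non-independent as any database peptides."""
    return {pep for prot in proteins for pep in tryptic_peptides(prot[::-1], min_len)}

proteins = ["GGGGGGKPAAAAAARSSSSSS"]                # toy sequence, for illustration only
print(tryptic_peptides(proteins[0], min_len=1))     # ['GGGGGGKPAAAAAAR', 'SSSSSS']
print(sorted(decoy_peptides(proteins, min_len=1)))  # ['AAAAAAPK', 'GGGGGG', 'SSSSSSR']
```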
Slide 29: Extrapolating from the Empirical Distribution. Fenyo & Beavis, Anal. Chem., 2003
Slide 30: Extrapolating from the Empirical Distribution. Often, the empirical shape is consistent with a theoretical model. Fenyo & Beavis, Anal. Chem., 2003; Geer et al., J. Proteome Research, 2004
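Suppose the upper tail of the random-score distribution is roughly exponential, which shows up as a straight line on a log-survival plot. Then one simple way to extrapolate beyond the sampled scores is a least-squares line through the top of the empirical log10 survival curve. This is a sketch with hypothetical names and synthetic data. It is not the published procedure of either cited paper.

```python
import math
import random

def fit_log_survival_tail(scores, tail_fraction=0.1):
    """Least-squares line through (score, log10 Pr[score >= x]) for the top
    `tail_fraction` of the empirical scores. Returns (slope, intercept)."""
    xs = sorted(scores)
    n = len(xs)
    start = int(n * (1 - tail_fraction))
    pts = [(xs[i], math.log10((n - i) / n)) for i in range(start, n)]  # (n - i)/n scores are >= xs[i]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return slope, my - slope * mx

def extrapolated_tail_prob(s, slope, intercept):
    """Pr[random score >= s] read off the fitted line, possibly far beyond the sampled scores."""
    return 10 ** (slope * s + intercept)

# Hypothetical null scores with an exponential tail: true Pr[X >= s] = exp(-s).
rng = random.Random(1)
null_scores = [rng.expovariate(1.0) for _ in range(20000)]
slope, intercept = fit_log_survival_tail(null_scores)
print(extrapolated_tail_prob(12.0, slope, intercept), math.exp(-12.0))
```

None of the 20,000 sampled scores is likely to reach 12, yet the fitted line still gives a tail probability of the right order. That is only trustworthy if the assumed tail shape really holds.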
Slide 31: Peptide Prophet. From the Institute for Systems Biology; Keller et al., Anal. Chem. 2002. Re-analysis of SEQUEST results. Spectra are the trials (NOT peptides!). Assumes that many of the spectra are not correctly identified.
Slide 32: Peptide Prophet. Keller et al., Anal. Chem. 2002. Distribution of spectral scores in the results
Slide 33: Peptide Prophet. Assumes a bimodal distribution of scores, with a particular shape. Ignores database size … but it is included implicitly. Like the empirical distribution from peptide sampling, it can be applied to any score function, and to any search engine's results.
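The bimodal assumption on Slide 33 can be illustrated with a two-component mixture fitted by EM to the scores of all spectra. The low component stands for incorrect identifications and the high component for correct ones. Each spectrum then gets a posterior probability of being correct. The published model uses its own component shapes and score. This sketch uses two Gaussians purely for brevity, and the function names and synthetic data are hypothetical. Database size never appears as a parameter, which echoes the slide's point that it is only included implicitly.

```python
import math
import random

def log_normal_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def posterior_correct(x, pi1, incorrect, correct):
    """Pr[correct | score x] for a two-component mixture; components are (mean, sd)."""
    la = math.log(pi1) + log_normal_pdf(x, *correct)
    lb = math.log(1 - pi1) + log_normal_pdf(x, *incorrect)
    d = lb - la
    return 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))  # guard against exp overflow

def fit_mixture(scores, iters=200):
    """EM fit of a two-Gaussian mixture: low component = incorrect, high = correct."""
    xs = sorted(scores)
    n = len(xs)
    spread = (xs[-1] - xs[0]) / 4
    incorrect, correct = (xs[n // 4], spread), (xs[3 * n // 4], spread)  # crude start
    pi1 = 0.5
    for _ in range(iters):
        post = [posterior_correct(x, pi1, incorrect, correct) for x in scores]  # E-step
        w1 = sum(post)
        w0 = n - w1
        mu1 = sum(p * x for p, x in zip(post, scores)) / w1                    # M-step
        mu0 = sum((1 - p) * x for p, x in zip(post, scores)) / w0
        sd1 = math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(post, scores)) / w1)
        sd0 = math.sqrt(sum((1 - p) * (x - mu0) ** 2 for p, x in zip(post, scores)) / w0)
        pi1, incorrect, correct = w1 / n, (mu0, sd0), (mu1, sd1)
    return pi1, incorrect, correct

# Hypothetical spectrum scores: 700 incorrect (around 0) and 300 correct (around 5).
rng = random.Random(7)
scores = [rng.gauss(0, 1) for _ in range(700)] + [rng.gauss(5, 1) for _ in range(300)]
pi1, incorrect, correct = fit_mixture(scores)
print(round(pi1, 2), posterior_correct(6.0, pi1, incorrect, correct))
```

The caveats on the next slide map directly onto this code. If too few spectra are correct, the high component has little data to fit. If the real score distributions are not shaped like the chosen components, the posteriors inherit that error.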
Slide 34: Peptide Prophet Caveats. Are spectrum scores sampled from the same distribution? Are there enough correct identifications for the second peak? Are spectra independent observations? Are the distributions properly shaped? A big improvement over raw SEQUEST results.
Slide 35: Peptides to Proteins. Nesvizhskii et al., Anal. Chem. 2003
Slide 36: Peptides to Proteins
Slide 37: Peptides to Proteins. A peptide sequence may occur in many protein sequences: variants, paralogues, protein families. Separation, digestion and ionization are not well understood. Proteins in the sequence database are extremely non-random, and very dependent.
Slide 38: Peptides to Proteins
Slide 39: Peptides to Proteins. Mascot: the protein score is the sum of peptide scores, which assumes peptide identifications are independent! SEQUEST: keeps only one of the proteins for each peptide?
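The following shows why summing peptide scores assumes independence. Suppose each peptide score has the form -10 * log10(P), where P is the probability that the match is random. Then adding scores is the same as multiplying the P values, which treats them as independent events. This is a sketch with hypothetical names and values; it does not reproduce Mascot's full protein-scoring logic.

```python
import math

def peptide_score(p):
    """Score of the form -10 * log10(P), P = probability the match is random."""
    return -10 * math.log10(p)

def protein_score(peptide_ps):
    """Protein score as the sum of its peptide scores."""
    return sum(peptide_score(p) for p in peptide_ps)

ps = [1e-3, 1e-2, 1e-4]          # hypothetical peptide P values
total = protein_score(ps)        # 30 + 20 + 40 = 90
implied_p = 10 ** (-total / 10)  # equals the product of the P values
print(total, implied_p, math.prod(ps))
```

Summing also means that the same peptide counted twice doubles its contribution. That is one reason each peptide sequence should count only once.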
Slide 40: Peptides to Proteins. Protein Prophet (Nesvizhskii et al., Anal. Chem. 2003) models the probability that a protein is correct based on the probability that its peptides are correct. It also models the probability that a peptide is correct based on the probability that its proteins are correct. Proteins with one high-probability peptide are not eliminated … but are down-weighted. Assumes identification probabilities from the same protein are independent (like Mascot).
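The independence assumption on Slide 40 leads to a natural combination rule: a protein is wrong only if every one of its peptide identifications is wrong. Below is a simplified sketch of that kind of rule. It includes optional weights that share a peptide among the proteins containing it. It omits the iterative peptide/protein reweighting and the specific down-weighting of single-peptide proteins. The function name and inputs are hypothetical.

```python
import math

def protein_probability(peptide_probs, weights=None):
    """1 - prod(1 - w_i * p_i): the protein is correct unless every one of its
    peptide identifications is wrong, treating those identifications as independent.
    Optional weights in [0, 1] share a degenerate peptide among its proteins."""
    if weights is None:
        weights = [1.0] * len(peptide_probs)
    return 1.0 - math.prod(1.0 - w * p for w, p in zip(weights, peptide_probs))

print(protein_probability([0.95]))                # one strong peptide
print(protein_probability([0.9, 0.8]))            # two peptides: 1 - 0.1 * 0.2
print(protein_probability([0.9], weights=[0.5]))  # peptide shared with one other protein
```

If the peptide identifications are in fact correlated, this product overstates the protein probability. This is the same weakness the slide attributes to Mascot's summed scores.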
Slide 41: Peptides to Proteins. The best available method, to date, is Protein Prophet. The problem will only get worse as we search variant and isoform sequences. Proteins don't have a single sequence! Peptide identification is not protein identification!
Slide 42: Publication Guidelines
Slide 43: Publication Guidelines. Computational parameters: spectral processing, sequence database, search program, statistical analysis. Number of peptides per protein: each peptide sequence counts once! Multiple forms of the same peptide count once!
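The counting rule on Slide 43 can be sketched as follows. Peptide identifications are collapsed to distinct unmodified sequences per protein, so charge states and modified forms of one peptide count once. Single-peptide proteins, which Slide 44 says need explicit justification, fall out of the same count. The data layout and the bracketed modification notation are hypothetical.

```python
import re
from collections import defaultdict

def strip_mods(peptide):
    """Drop bracketed modification tags, e.g. 'PEPTM[+16]IDEK' -> 'PEPTMIDEK' (hypothetical notation)."""
    return re.sub(r"\[[^\]]*\]", "", peptide)

def peptides_per_protein(psms):
    """psms: iterable of (protein, annotated_peptide, charge) identifications.
    Each distinct peptide sequence counts once, whatever its charge or modifications."""
    distinct = defaultdict(set)
    for protein, peptide, _charge in psms:
        distinct[protein].add(strip_mods(peptide))
    return {protein: len(seqs) for protein, seqs in distinct.items()}

psms = [
    ("P1", "PEPTMIDEK", 2),
    ("P1", "PEPTMIDEK", 3),       # same peptide, different charge
    ("P1", "PEPTM[+16]IDEK", 2),  # same peptide, modified form (hypothetical +16 tag)
    ("P1", "LVNELTEFAK", 2),
    ("P2", "SINGLEK", 2),
]
counts = peptides_per_protein(psms)
print(counts)                                    # {'P1': 2, 'P2': 1}
print([p for p, c in counts.items() if c == 1])  # single-peptide proteins: ['P2']
```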
Slide 44: Publication Guidelines. Single-peptide proteins must be explicitly justified by: peptide sequence; N- and C-terminal amino acids; precursor mass and charge; peptide scores. Multiple forms of the peptide are counted once! Biological conclusions based on single-peptide proteins must show the spectrum.
Slide 45: Publication Guidelines. More stringent requirements for PMF data analysis, similar to those for tandem mass spectra. Management of protein redundancy. Peptides identified from a different species? Spectra submission encouraged.
Slide 46: Summary. Could guessing be as effective as a search? More guesses improve the best guess. Better guessers help us be more discriminating. Independent observations only count if they are