Core Facility — Proteomic Mass Spectrometry
Data Analysis
Here in the core facility we tailor-make analytical pathways according to your needs as a researcher. We have extensive experience with major platforms such as PLGS, Mascot, PEAKS and Proteome Discoverer, as well as lesser-known engines such as MS-GF+, Comet, InsPecT and MS Amanda. We are also familiar with modification-centric engines such as Byonic and MSFragger, and quantification-centric platforms such as MaxQuant and Skyline. We have significant experience and qualifications in statistics and can readily integrate R/Bioconductor packages such as RforProteomics into pathways in KNIME, Proteome Discoverer (PD) or Skyline, and we use Perseus, the MaxQuant companion software, to complete a thorough analysis of your data.
For most gel band analyses, we would typically apply a simple Mascot search; a qualitative analysis of the modification state of a proteome, however, would require a multi-engine workflow with Scaffold data integration.
Below you will find brief descriptions of a few of the analytical tools at our disposal.
Data resulting from mass spectrometric analyses are often varied in nature, yet they frequently need to be pooled into a larger set of results, which substantially improves quality and robustness. For example, using multiple search engines commonly reduces false-positive results and improves search performance statistically. Likewise, comparing multiple replicates often strengthens the identification of proteins, modifications, and modification sites. In addition, the ability to combine and integrate large-scale quantitative experiments into a single comprehensive format is essential for all biologists who investigate variation within experimental proteomes. Scaffold is an incredibly powerful framework for mass spectrometry-based proteomic analyses. It can identify regulated isoforms, compare samples to identify biological relevance, create lists of target proteins for further downstream processing, and integrate data from all known search engines. In addition, it can classify proteins by molecular function or organelle, and compare results with previously published data.
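The multi-engine point above can be illustrated with a deliberately simplified voting scheme: a spectrum-to-peptide assignment is kept only when at least two engines agree on it. This is a minimal sketch, not Scaffold's own algorithm; the engine names, scan identifiers and peptide sequences are hypothetical toy data.

```python
from collections import defaultdict


def consensus_psms(results_by_engine, min_engines=2):
    """Keep (spectrum, peptide) assignments reported by at least `min_engines` engines.

    `results_by_engine` maps an engine name to a dict of spectrum id -> peptide.
    Returns a dict mapping (spectrum id, peptide) to the sorted list of
    engines that reported that assignment.
    """
    votes = defaultdict(set)
    for engine, psms in results_by_engine.items():
        for spectrum_id, peptide in psms.items():
            votes[(spectrum_id, peptide)].add(engine)
    return {
        key: sorted(engines)
        for key, engines in votes.items()
        if len(engines) >= min_engines
    }


# Hypothetical results from three engines searching the same three scans.
results = {
    "engine_a": {"scan_1": "PEPTIDEK", "scan_2": "SAMPLER"},
    "engine_b": {"scan_1": "PEPTIDEK", "scan_2": "SIMPLER"},
    "engine_c": {"scan_1": "PEPTIDEK", "scan_3": "ANOTHERK"},
}

agreed = consensus_psms(results)
print(agreed)
```

In this toy data only the assignment for scan_1 survives, because the engines disagree on scan_2 and only one engine reports scan_3. Real integration tools weigh engine scores probabilistically rather than counting votes.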
Byonic is a powerhouse of modification identification. It has built-in algorithms for disulfide bond and crosslink identification and for intact glycopeptide identification, and it can search for sequence variants and localize modification sites. Crucially, the engine can search the data for any modification, at any residue on any peptide, using its trademarked Wildcard Search. Additionally, the software includes patented algorithms such as Modification Fine Control, which allows dozens of variable modifications to be included in a search without the usual exponential increase in processing time. Byonic is sold alongside Preview, a separate program that performs ‘first-pass search sampling’: a fraction of the data is searched initially to estimate MS1 and MS2 tolerances, to recalibrate the raw data where necessary, and to check a wide range of PTMs for highly abundant modifications. Taken together, these features make Byonic a world leader in peptide and protein identification.
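To see why variable modifications normally cause an exponential increase in search effort, it helps to count the modified forms of a single peptide. The sketch below is a naive count under stated assumptions (each site is either unmodified or carries exactly one of its allowed modifications); it does not describe how Modification Fine Control works, and the peptide and modification counts are hypothetical.

```python
from math import prod


def count_modified_forms(peptide, variable_mods):
    """Count the distinct modification states of a peptide.

    `variable_mods` maps a residue letter to the number of variable
    modifications allowed on that residue. Each occurrence of the residue
    contributes (1 + that number) states: unmodified, or one of its mods.
    """
    return prod(1 + variable_mods.get(residue, 0) for residue in peptide)


peptide = "MSTYK"  # hypothetical peptide

one_mod = count_modified_forms(peptide, {"M": 1})
four_mods = count_modified_forms(peptide, {"M": 1, "S": 1, "T": 1, "Y": 1})
many_mods = count_modified_forms(peptide, {"M": 1, "S": 2, "T": 2, "Y": 2, "K": 3})

print(one_mod, four_mods, many_mods)
```

For this five-residue peptide the count rises from 2 to 16 to 216 as more modifications are allowed, and every one of those forms would have to be scored against each candidate spectrum in an unrestricted search.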
Reliable, reproducible and accurate identification of proteins is fundamental to a mass spectrometry-based proteomic core facility. The search engine Mascot is the gold standard in the world of protein identification. The software is server-based, so multiple proteomic searches can be run simultaneously from any user’s PC, anywhere in the world, using only a web browser. Our 24-core license allows core facility staff to perform the principal identifications, and also allows collaborators and users from other laboratories in the University, or indeed from anywhere in the world, to re-examine their data at leisure. Mascot also leads the world in giving developers access to tools for building powerful and flexible applications that can be custom-made for very specific workflows. It also allows databases to be generated from nucleotide sequences and environmental sequence tags. Note: Mascot is online, but it is available only to core facility users because it requires a log-in. Please contact us to obtain log-in details.
Additionally, Matrix Science have superb FAQs and instructions on their website, which give excellent introductions to understanding fragmentation, interpreting MS/MS spectra,…
In their own words: “PEAKS Studio is a software platform with complete solutions for discovery proteomics, including protein identification and quantification, analysis of post-translational modifications (PTMs) and sequence variants (mutations), and peptide/protein de novo sequencing.” It is an extensive package that modularizes the analytical process, letting users decide whether algorithms that discover PTMs or sequence variants are included in the pathway. It is another powerful implementation of a so-called ‘hybrid’ search strategy, which uses sequence tags alongside conventional first-stage MS filter database searching.
There is a fundamental flaw in the analysis of raw mass spectrometry data: in order to find objects in the data, you need prior knowledge of their existence. This affects an analysis in two ways: 1) you need to know what you are looking for before you start looking, and 2) you can only look for an object that is already known to science. However, nature is not so accommodating, and new forms of modification are regularly discovered, hinting at a ‘dark world’ of biological processes as yet unknown to us. This directly affects mass spectrometry database searching because, in general (barring Byonic’s Wildcard Search and Mascot’s error-tolerant search), any modification to a peptide must first be specified in a separate PTM database of previously known modifications. Spectra with modifications that fall outside this database are discarded, resulting in a substantial loss of information for the researcher. Hence we arrive at what are referred to as ‘blind’ database search engines, of which MODa and MSFragger are the latest instalments. A blind search engine looks for any mass shift on any residue of a peptide sequence (in the case of MSFragger, the mass shift is determined from the precursor mass and is not localized to a residue, which results in substantially faster search times). However, these searches have a significant drawback that should not be dismissed lightly: the search space of potential target matches grows in a combinatorial explosion, which dramatically increases the time needed to perform a search. To counter this, we, in partnership with ITZ, have installed both MODa and MSFragger on the University’s High Performance Cluster (HPC) supercomputer, which consists of over 250 virtual cores and operates at ~260 teraflops.
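The precursor-level mass shift described above can be sketched in a few lines: compute the theoretical mass of the unmodified peptide, subtract it from the observed precursor mass, and see whether the difference matches a known modification. This is an illustrative sketch only, not the algorithm used by MODa or MSFragger; the tolerance, the short list of known shifts and the example peptide are assumptions chosen for the demonstration. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565

# A deliberately tiny lookup table of known shifts (an assumption for
# this sketch; real tools consult far larger modification databases).
KNOWN_SHIFTS = {
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def annotate_mass_shift(observed_mass, sequence, tolerance=0.01):
    """Return (delta, label) for an observed neutral precursor mass.

    The delta is not localized to any residue; it is only compared with
    the known shifts, and labelled 'unexplained' if nothing matches.
    """
    delta = observed_mass - peptide_mass(sequence)
    if abs(delta) <= tolerance:
        return delta, "unmodified"
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tolerance:
            return delta, name
    return delta, "unexplained"


candidate = "PEPTIDEK"  # hypothetical peptide
observed = peptide_mass(candidate) + 79.96633  # simulated precursor mass
delta, label = annotate_mass_shift(observed, candidate)
print(f"{delta:.4f} Da -> {label}")
```

An 'unexplained' delta is exactly the information a conventional search would discard, and it is what a blind search reports for further investigation.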
Mass spectrometry of proteins comes in two principal flavours, typically called ‘bottom-up’ and ‘top-down’ workflows. Bottom-up analyses attempt to characterize a protein by measuring smaller pieces of it, known as ‘peptides’. These processes are covered by software packages such as Mascot and Byonic. Top-down analytical pathways, by contrast, provide critical information about the protein as a whole. Its precise mass can tell a researcher whether the protein is indeed ‘intact’, the number and types of modifications it possesses, whether it is chemically bonded to other proteins or macromolecules and, when combined with chromatography over a time series, how these characteristics change over time. The software package ‘Intact Mass’ is the only manufacturer-independent software application which can provide very high-resolution analytics for the overwhelmingly complex raw data produced by top-down experiments. It is a crucial piece of infrastructure for a proteomic core facility.
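The idea of recovering a protein's mass from top-down data can be sketched with the basic charge-state arithmetic: a protein of neutral mass M carrying z protons appears at m/z = M/z + m_p, so two adjacent charge states are enough to infer z and hence M. This is a minimal sketch under simplifying assumptions (positive-mode protonation, a single protein species, adjacent charge states all present, no noise); it is not the algorithm used by the Intact Mass software, and the example mass is hypothetical.

```python
from statistics import mean

PROTON_MASS = 1.007276  # Da


def mz_of(neutral_mass, charge):
    """m/z of a protein of given neutral mass carrying `charge` protons."""
    return neutral_mass / charge + PROTON_MASS


def infer_charge(mz_low, mz_high):
    """Charge of the peak at `mz_high`, given the adjacent peak at `mz_low`.

    If mz_high carries z charges and mz_low carries z + 1, then
    (mz_low - m_p) / (mz_high - mz_low) = z.
    """
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def deconvolve(peaks):
    """Average neutral-mass estimate from adjacent charge-state peaks."""
    ordered = sorted(peaks)
    if len(ordered) < 2:
        raise ValueError("need at least two adjacent charge-state peaks")
    estimates = []
    for mz_low, mz_high in zip(ordered, ordered[1:]):
        charge = infer_charge(mz_low, mz_high)
        estimates.append(charge * (mz_high - PROTON_MASS))
        estimates.append((charge + 1) * (mz_low - PROTON_MASS))
    return mean(estimates)


# Simulated charge-state envelope (z = 10 to 14) for a hypothetical
# protein of neutral mass 16951.5 Da.
simulated_peaks = [mz_of(16951.5, z) for z in range(10, 15)]
estimated_mass = deconvolve(simulated_peaks)
print(round(estimated_mass, 3))
```

Real top-down spectra contain overlapping envelopes from several proteoforms, isotopic distributions and noise, which is why dedicated deconvolution software is needed in practice.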