Here you can find the sample data used as example input for the PIQMIe service, and the pre-computed results including the database obtained by executing this service.

Description

The data set was described in the published study on bone formation by Alves et al., 2013. The raw MS data were made available by the authors and then re-analyzed using the MaxQuant software (version 1.3.0.5) with the same parameter settings. However, for the MaxQuant/Andromeda search we used a more recent human FASTA sequence library from the UniProtKB database (release 2013_11) instead of the discontinued IPI database.

In this study, the effects of activin A on the differentiation and mineralization of osteoblasts derived from human mesenchymal stem cells (hMSCs) were investigated using semi-quantitative mass spectrometry (MS)-based proteomics with duplex SILAC metabolic labeling. Specifically, it used a reciprocal (inverse) labeling strategy, in which both the activin A-treated and control samples were cultured in light and heavy isotope-enriched culture media, to obtain more reliable quantitative data than a single-experiment approach. The analysis focused on the composition of, and changes in, the extracellular compartments, namely the extracellular matrix (ECM) and matrix vesicles (MVs).


Table. List of SILAC experiments and conditions.
Experiment       Light        Heavy
ECM_LACT_HCTR    activin A    control
ECM_LCTR_HACT    control      activin A
MV_LACT_HCTR     activin A    control
MV_LCTR_HACT     control      activin A

Content:


Service description

PIQMIe (Proteomics Identifications & Quantitations Data Management & Integration Service) is a web-based tool that aids in the data management, analysis and visualization of semi-quantitative mass spectrometry (MS)-based proteomics experiments, in particular those using stable isotope labeling with amino acids in cell culture (SILAC). In a typical SILAC proteome experiment, tens of thousands of peptides and thousands of (non-redundant) proteins are reliably identified and quantified from raw MS data, for example, using the MaxQuant software. Further downstream analyses of the resulting files are commonly carried out using stand-alone spreadsheet tools. Although such tools are useful, they do not provide the same control over the data as a data management or information retrieval system. Moreover, they are prone to data-manipulation errors and lack interoperability and scalability. To remedy this, we have developed PIQMIe and deployed it on a scalable Cloud computing infrastructure operated by the Dutch national HPC and e-Science support center.

Prerequisites

To make full use of the PIQMIe service through the web interface, your web browser must have JavaScript and pop-up windows enabled and must support Scalable Vector Graphics (SVG). The web interface has been tested and works on all major browsers, namely Firefox (v26), Safari (v6.1), Opera (v12.16), Chrome (v31) and IE (v11). However, we recommend using Firefox in combination with the SQLite Manager extension (add-on), which enables post-processed data set(s) to be queried locally.
Note for Opera users: the charts are not shown by default; to enable these you need to change your browser settings: go to Preferences → Advanced → Downloads → untick 'Hide file types opened with Opera' → delete the entry 'application/json'.

Workflow

Following the 'PIQMIe' link in the upper left panel (tab), you will find a data submission form with four mandatory input fields. PIQMIe requires the result files of a MaxQuant execution, namely evidence.txt and proteinGroups.txt (corresponding to peptide- and protein-level identifications/quantitations, respectively), and protein sequences in a FASTA-formatted library from the UniProtKB database. The data files are then transformed into a single-file, cross-platform Integrated Proteomics database (Figure 1).

PIQMIe workflow
Figure 1. Computational proteomics workflow including the PIQMIe service.

In this relational database peptide and protein (group) identifications/quantitations from multiple experiments are readily stored and integrated with additional biological information such as official gene symbols, protein evidence, molecular functions, post-translational modifications etc. (Figure 2).

ER diagram of the database
Figure 2. Entity-relationship diagram of the proteomics database. Note: Only the core tables are shown here.
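
Because the result is an ordinary SQLite file, its actual schema can be inspected locally before writing any queries. The following is a minimal sketch using Python's standard sqlite3 module; it assumes nothing about table names (only the core tables are shown in Figure 2), and the file name piqmie.sqlite is a hypothetical placeholder for the downloaded database.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = {}
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in rows:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        tables[name] = [column[1] for column in columns]
    return tables


def print_schema(path="piqmie.sqlite"):
    """Print each table and its columns (path is a hypothetical file name)."""
    conn = sqlite3.connect(path)
    try:
        for table, columns in list_tables(conn).items():
            print(table, columns)
    finally:
        conn.close()
```

Calling print_schema() with the path of the downloaded file lists the tables and columns that are available for SQL queries.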

In principle, other software could also be used to prepare the peptide and protein (group) lists; however, their formats must comply (at least partially) with those of the MaxQuant result files. Each list must be a tab-delimited text file with a header line containing the following columns (case-insensitive and order-independent):

Table 1. List of fields parsed from the evidence.txt file (peptide list).
Column name                               Mandatory
Proteins [1]                              Yes
Leading proteins                          Yes
Modified sequence                         Yes
Charge                                    Yes
Mass                                      Yes
Calibrated retention time                 Yes
Raw file                                  Yes
PEP                                       Yes
Reverse                                   Yes
Contaminant                               Yes
Resolution                                No
Intensity L <experiment name>             No
Intensity M <experiment name>             No
Intensity H <experiment name>             No
Ratio H/L <experiment name>               No
Ratio H/M <experiment name>               No
Ratio M/L <experiment name>               No
Ratio H/L normalized <experiment name>    Yes
Ratio H/M normalized <experiment name>    Yes
Ratio M/L normalized <experiment name>    Yes
[1] The column must contain exclusively UniProtKB accession(s) in the form acc1;acc2;...

Table 2. List of fields parsed from the proteinGroups.txt file (protein list).
Column name                                                 Mandatory
Protein IDs [1]                                             Yes
PEP                                                         Yes
Reverse                                                     Yes
Contaminant                                                 Yes
Intensity L <experiment name>                               No
Intensity M <experiment name>                               No
Intensity H <experiment name>                               No
Ratio H/L <experiment name>                                 No
Ratio H/M <experiment name>                                 No
Ratio M/L <experiment name>                                 No
Ratio H/L normalized <experiment name>                      Yes
Ratio H/M normalized <experiment name>                      Yes
Ratio M/L normalized <experiment name>                      Yes
Ratio H/L variability [%] <experiment name>                 Yes
Ratio H/M variability [%] <experiment name>                 Yes
Ratio M/L variability [%] <experiment name>                 Yes
Ratio H/L count <experiment name>                           Yes
Ratio H/M count <experiment name>                           Yes
Ratio M/L count <experiment name>                           Yes
Ratio H/L normalized <experiment name> significance B [2]   No
Ratio H/M normalized <experiment name> significance B [2]   No
Ratio M/L normalized <experiment name> significance B [2]   No
[1] The column must contain exclusively UniProtKB accession(s) in the form acc1;acc2;...
[2] The column can be added, e.g., using the Perseus software.
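
Before uploading a peptide list prepared with other software, it may help to check its header against the mandatory columns listed above. The following is a minimal sketch of such a check, not the validation that PIQMIe itself performs: it compares names case-insensitively and ignores column order, and it covers only the fixed mandatory columns of Table 1 (the per-experiment "Ratio ... normalized <experiment name>" columns are left out). The function names are hypothetical.

```python
import csv

# Fixed mandatory columns of the peptide list (Table 1); per-experiment
# columns such as "Ratio H/L normalized <experiment name>" are not checked.
EVIDENCE_MANDATORY = [
    "Proteins", "Leading proteins", "Modified sequence", "Charge", "Mass",
    "Calibrated retention time", "Raw file", "PEP", "Reverse", "Contaminant",
]


def missing_columns(header, mandatory):
    """Return the mandatory names absent from header (case-insensitive)."""
    present = {name.strip().lower() for name in header}
    return [name for name in mandatory if name.lower() not in present]


def read_header(path):
    """Read the first line of a tab-delimited text file as column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"))
```

For example, missing_columns(read_header("evidence.txt"), EVIDENCE_MANDATORY) returns an empty list when all fixed mandatory columns are present.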

Data processing

After the form is submitted and the input passes client-side verification (i.e., file extension, file name and MIME type), PIQMIe uploads the user's data to the server. Progress is indicated by a progress bar and accompanying messages:
  • Uploading the data...X%
  • Uploading completed.
  • Processing the data. This may take several minutes...
  • Processing completed.
  • Done! Click here to open the results page.
    - After successful processing, a link to the results page with a unique jobID (40 alphanumeric characters) is provided.
  • Failed! Show the error message.
    - A server-side error is raised if, for example, an input file exceeds the per-file size limit or has an incorrect format (for details refer to the submission page or to the Workflow section).

Results

A user is presented with a concise summary of the proteomics experiment(s) in numerical and graphical forms, as well as with a searchable grid of identified/quantified proteins and interactive visualization tools, namely the peptide coverage map and 2D scatterplot of protein quantitations, to aid the identification of potentially regulated proteins or interacting partners. The results page is organized in the following links (tabs) in the upper panel:
  • Download - this is a download area including the indexed data in the form of an SQLite database
  • Peptides - provides a summary of peptide-level identifications/quantitations with the following quality-control (QC) measures:
    • n_pep_ids - number of redundant peptide identifications, filtered for decoys and contaminants
    • n_pep_qts - number of redundant peptide quantitations
    • n_unq_pep_seq+mod_ids - number of non-redundant peptide identifications unique by sequence and modifications
    • n_unq_pep_seq+mod_qts - number of non-redundant peptide quantitations unique by sequence and modifications
    • n_unq_pep_seq_ids - number of non-redundant peptide identifications unique by sequence
    • n_unq_pep_seq_qts - number of non-redundant peptide quantitations unique by sequence
    • n_pep_ids_decoys - number of redundant peptides detected as decoys (false positives)
    • n_pep_ids_conts - number of redundant peptides detected as contaminants
    • n_unq_pep_seq_decoys - number of non-redundant peptide decoys unique by sequence
    • n_unq_pep_seq_conts - number of non-redundant peptide contaminants unique by sequence
  • Proteins - provides a summary of database-dependent protein identifications (e.g., given the human proteome) with the following QC measures:
    • db - source database/section (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL)
    • n_prot_acc - number of protein accessions including isoforms in the source database (or FASTA sequence library)
    • n_prot_ids - number of MS-based protein identifications (including isoforms and excluding decoys and contaminants)
    • n_prot_acc_evid_protein - number of protein accessions with protein-level evidence
    • n_prot_acc_evid_transcript - number of protein accessions with transcript-level evidence
    • n_prot_acc_evid_homology - number of protein accessions with homology-based evidence
    • n_prot_acc_evid_predicted - number of in silico predicted proteins
    • n_prot_acc_evid_uncertain - number of protein accessions with uncertain evidence
  • Protein Groups - provides a summary of protein (group)-level identifications/quantitations with the following QC measures:
    • exp_name - experiment name
    • n_pgrp_ids - number of non-redundant protein identifications, filtered for decoys and contaminants
    • n_pgrp_qts - number of non-redundant protein quantitations
    • n_pgrp_ids_by_site - number of non-redundant proteins identified by modification site
    • n_pgrp_decoys - number of non-redundant proteins detected as decoys (false positives)
    • n_pgrp_conts - number of non-redundant proteins detected as contaminants
  • Regulated Proteins - provides a summary of potentially regulated non-redundant proteins using permissive cutoffs, i.e., protein fold-change based on normalized ratio (FC≥1.5) and peak intensity-based ratio significance B (P<0.05), with the following QC measures:
    • n_pgrp_ids - union of differentially regulated proteins identified in all conditions, filtered for decoys and contaminants
    • n_pgrp_ids_H/L+L/H - number of up- AND down-regulated proteins identified in both conditions H/L and L/H
    • n_pgrp_ids_H/L - number of up- OR down-regulated proteins identified in the H/L condition
    • n_pgrp_ids_L/H - number of up- OR down-regulated proteins identified in the L/H condition
    • n_pgrp_ids_H/M+M/H - number of up- AND down-regulated proteins identified in both conditions H/M and M/H
    • n_pgrp_ids_H/M - number of up- OR down-regulated proteins identified in the H/M condition
    • n_pgrp_ids_M/H - number of up- OR down-regulated proteins identified in the M/H condition
    • n_pgrp_ids_M/L+L/M - number of up- AND down-regulated proteins identified in both conditions M/L and L/M
    • n_pgrp_ids_M/L - number of up- OR down-regulated proteins identified in the M/L condition
    • n_pgrp_ids_L/M - number of up- OR down-regulated proteins identified in the L/M condition
  • Search Grid - lets you view, sort and filter the identified/quantified proteins using one or more criteria (columns) with Boolean and/or relational operators (Figure 3); it also links the protein accessions to UniProtKB for additional biological information and the protein groups to the Peptide coverage map (Figure 4).

    Example search
    Figure 3. An example query made in the Query Builder to select metalloproteinases of the ECM down-regulated by activin A signaling.

    Peptide coverage map of the ALPL proteins
    Figure 4. Identified peptides (in red) are shown on parent proteins from the curated Swiss-Prot (in blue) and uncurated TrEMBL (in gray) sections of the UniProtKB database. One of the protein accessions (UniProtKB: P05186; in green) is selected as the best scoring or leading protein of the group.

  • Scatterplot - lets you view quantitative data (e.g., SILAC protein ratios) for two experiments simultaneously, in order to identify differentially regulated proteins with higher confidence than from a single experiment (Figure 5).

    Scatterplot
    Figure 5. Scatterplot of quantitative data for duplex SILAC reciprocal (label-swap) experiments. For example, the protein group (ID: 258) annotated as UTP-glucose-1-phosphate uridyltransferase (UGP2) is consistently up-regulated by activin A signaling in both experiments.
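
The permissive cutoffs of the Regulated Proteins tab and the label-swap comparison shown in the Scatterplot can be sketched as follows. This is an illustration under stated assumptions, not the PIQMIe implementation: it assumes the fold-change cutoff is applied symmetrically (ratio ≥ 1.5 for up-regulation, ratio ≤ 1/1.5 for down-regulation) and that the ratio of the swapped experiment is inverted (L/H = 1 / (H/L)) so that both ratios express treatment over control. The function names are hypothetical.

```python
def classify(ratio, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Classify one protein group as 'up', 'down' or None (not regulated).

    ratio is the normalized SILAC ratio expressed as treatment/control;
    p_value is the corresponding significance B value.
    """
    if ratio is None or p_value is None or p_value >= p_cutoff:
        return None
    if ratio >= fc_cutoff:
        return "up"
    if ratio <= 1.0 / fc_cutoff:
        return "down"
    return None


def consistent_regulation(hl_forward, p_forward, hl_swapped, p_swapped):
    """Return 'up' or 'down' if both reciprocal experiments agree, else None.

    hl_forward: H/L ratio where heavy = treatment and light = control.
    hl_swapped: H/L ratio where heavy = control and light = treatment;
                it is inverted to L/H to express treatment/control.
    """
    if not hl_swapped:  # missing or zero ratio cannot be inverted
        return None
    first = classify(hl_forward, p_forward)
    second = classify(1.0 / hl_swapped, p_swapped)
    return first if first is not None and first == second else None
```

With the sample data, ECM_LCTR_HACT would play the role of the forward experiment (heavy = activin A) and ECM_LACT_HCTR that of the swapped one (heavy = control).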


Data access and privacy

Each user can access only their own results, using the unique jobID in the URL:
  • remotely via the website

    http://piqmie.semiqprot-emc.cloudlet.sara.nl/results/{your jobID}

  • remotely via the programmatic RESTful web service (results are returned in JSON format):
    • ../rest/{your jobID}/statpep same data used in the Peptides tab
    • ../rest/{your jobID}/statprot same data used in the Proteins tab
    • ../rest/{your jobID}/statgrp same data used in the Protein Groups tab
    • ../rest/{your jobID}/statregrp same data used in the Regulated Proteins tab
    • ../rest/{your jobID}/grp same data used in the Search Grid tab
    • ../rest/{your jobID}/pepmap?grp_id={group ID}&exp={experiment name} same data used in the Peptide coverage map
    • ../rest/{your jobID}/cmpquant?exp1={experiment name}&exp2={experiment name}&ratio1={H/L, L/H,...}&ratio2={H/L, L/H,...}&cfratio=[1-10]&cfpvalue=[0-1] same data used in the Scatterplot tab
  • locally, by querying the downloaded database via the Structured Query Language (SQL).
We ensure that your data are kept confidential and are deleted one week after the upload date.
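
Programmatic access to the REST resources listed above might look like the following minimal sketch, which uses only the Python standard library. The host name is taken from the results URL; resolving the relative "../rest/" paths to "/rest/" on the same host is an assumption, as are the function names. Note that urlencode percent-encodes values such as "H/L"; whether the service expects them encoded is not stated here.

```python
import json
import re
import urllib.parse
import urllib.request

# Host taken from the results URL above; the "/rest" prefix is an assumption.
BASE_URL = "http://piqmie.semiqprot-emc.cloudlet.sara.nl"


def rest_url(job_id, resource, **params):
    """Build the URL of a REST resource such as 'statpep' or 'pepmap'."""
    # The jobID is described as 40 alphanumeric characters.
    if not re.fullmatch(r"[0-9A-Za-z]{40}", job_id):
        raise ValueError("jobID must be 40 alphanumeric characters")
    url = f"{BASE_URL}/rest/{job_id}/{resource}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """Download a REST resource and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_json(rest_url(job_id, "pepmap", grp_id=258, exp="ECM_LACT_HCTR")) would request the peptide coverage data for protein group 258, given a valid jobID.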
Briefly describe your input data set.
Select the MaxQuant evidence.txt file (max. size 500 MB).
Select the MaxQuant proteinGroups.txt file (max. size 50 MB).
Select the same FASTA sequence library from the UniProtKB database as used for the MaxQuant/Andromeda search (max. size 100 MB).
The latest proteomes can be found here.
Important: The sequence library must contain exclusively UniProtKB headers. MaxQuant result files obtained by Andromeda searches against a FASTA library from the discontinued IPI database are not supported.
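
Whether a sequence library contains only UniProtKB headers can be checked before submission. The following is a minimal sketch, not the check that PIQMIe applies: it relies on the standard UniProtKB FASTA header layout, in which a header starts with ">sp|" (Swiss-Prot) or ">tr|" (TrEMBL), followed by the accession and the entry name. The function name is hypothetical.

```python
import re

# UniProtKB FASTA headers look like ">sp|P05186|PPBT_HUMAN ..." or
# ">tr|<accession>|<entry name> ..."; isoform accessions carry a "-N" suffix.
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|([A-Z0-9]+(?:-\d+)?)\|(\S+)")


def non_uniprot_headers(lines):
    """Return the FASTA header lines that do not look like UniProtKB headers."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith(">") and not UNIPROT_HEADER.match(line)
    ]
```

Passing an open FASTA file to non_uniprot_headers() returns an empty list when every header follows the UniProtKB layout; headers from an IPI library would be reported.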