A Phosphoproteomics Data Resource for Systems-level Modeling of Kinase Signaling Networks

Song Feng, James A. Sanford, Thomas Weber, Chelsea M. Hutchinson-Bunch, Panshak P. Dakup, Vanessa L. Paurus, Kwame Attah, Herbert M. Sauro, Wei-Jun Qian, H. Steven Wiley

Preprint posted on 3 August 2023

Discovering the dynamic world of the EGFR-MAPK phosphoproteome: Feng, Sanford and colleagues developed a multiplexed deep phosphoproteome profiling workflow unveiling 4500 protein sites exhibiting increased phosphorylation upon EGF stimulation.

Selected by Benjamin Dominik Maier


EGFR-MAPK signalling pathway

EGFR/MAPK signalling is one of the most studied signalling pathways and regulates various cellular processes such as cell growth, proliferation, and differentiation (Oda et al., 2005; Wee & Wang, 2017). Upon activation of the signalling pathway by epidermal growth factor (EGF) binding to epidermal growth factor receptors (EGFR) on the cell surface, the extracellular signal is transmitted and amplified within the cell through second messengers and cascades of phosphorylation events. Governed by kinases (adding phosphate groups to specific amino acid residues) and phosphatases (removing them), protein phosphorylation serves as a molecular on/off-switch regulating the activity, localization, and interaction of proteins through conformational changes (Ardito et al., 2017). Ultimately, this activates transcription factors and effector proteins, triggering alterations in gene expression, enzyme activity, or induction of cell death.

Fig. 1 Representation of the core EGFR-MAPK signalling pathway. Figure taken from Feng, Sanford et al. (2023), bioRxiv, published under the CC-BY-NC-ND 4.0 International licence.

In healthy tissue, feedback mechanisms tightly regulate signalling to prevent excessive reactions and maintain cellular balance (Lemmon et al., 2016). If dysregulated, EGFR signalling contributes to various diseases, including cancer, inflammation, vascular diseases, and Alzheimer’s (Wieduwilt and Moasser, 2008). Consequently, gaining a comprehensive understanding of how genetic alterations contribute to dysregulation and discovering means to restore normal function becomes paramount for developing targeted therapies.

Phosphoproteome Profiling and High-throughput perturbational datasets

Protein phosphorylations are usually quantified through mass spectrometry-based methods (Yu & Veenstra, 2021). Common workflows involve ionising phosphopeptides, separating them based on their mass-to-charge ratio, and measuring their abundance in a sample. Recently, multiplexing methods using isobaric tags have been developed (e.g. Mertins et al., 2018), which enable simultaneous analysis of multiple samples over time and various doses in a single assay, minimising variability and noise between samples.

The effects of chemical or genetic perturbations on cellular signalling can be studied using automated, cost-effective transcriptomics and image-based profiling technologies, which resulted in the extensive public JUMP Cell Painting (Chandrasekaran et al., 2023) and L1000 Connectivity Map (Subramanian et al., 2017) data sets. For more details, please check out my recent preLights posts on “Phospho-seq” and “Similarity metric learning on perturbational datasets”.

Mathematical Models

Because experimental studies are impossible for all possible biological contexts, researchers employ mechanistic mathematical models to investigate cell signalling. By combining literature with time- and dosage-resolved experimental data, computational representations of molecular interactions and regulatory mechanisms can be created. Through simulations of a very large number of conditions and mathematical analysis, these models can help uncover the underlying principles governing cellular responses, prioritise which conditions are worth following up experimentally and aid in the design of targeted therapies. Yet, models inherently simplify reality and thus are not entirely accurate, often overlooking complex properties, lacking rigorous experimental validation, and/or being uninterpretable black-box models. Or in George Box’s words: “All models are wrong – but some are useful”.
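To make the idea of a mechanistic model concrete, here is a minimal sketch of a single phosphorylation cycle, in which a kinase (driven by the stimulus) phosphorylates a protein and a phosphatase reverses it. It is not taken from the preprint: the equation dP/dt = k_kin * stimulus * (1 - P) - k_pho * P, the rate constants and the function names are all illustrative assumptions, and real EGFR-MAPK models contain many coupled equations of this kind.

```python
def simulate_phospho_fraction(stimulus, k_kin=1.0, k_pho=0.5, t_end=20.0, dt=0.001):
    """Forward-Euler integration of dP/dt = k_kin*stimulus*(1 - P) - k_pho*P.

    P is the phosphorylated fraction of a protein (0..1). All parameter
    values are hypothetical and chosen only for illustration.
    """
    p = 0.0  # start fully dephosphorylated
    trajectory = [p]
    for _ in range(int(round(t_end / dt))):
        dp_dt = k_kin * stimulus * (1.0 - p) - k_pho * p
        p += dt * dp_dt
        trajectory.append(p)
    return trajectory


def steady_state(stimulus, k_kin=1.0, k_pho=0.5):
    """Analytical steady state obtained by setting dP/dt = 0."""
    return k_kin * stimulus / (k_kin * stimulus + k_pho)


if __name__ == "__main__":
    # Dose series: compare the simulated end point with the analytical value.
    for dose in (0.1, 1.0, 10.0):
        simulated = simulate_phospho_fraction(dose)[-1]
        print(f"dose={dose:5.1f}  simulated={simulated:.4f}  analytical={steady_state(dose):.4f}")
```

Even this toy model produces a time course and a dose-response curve, the two kinds of measurement that the dataset discussed below provides for thousands of sites.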

Key Findings


Feng, Sanford and colleagues introduced a multiplexed phosphoproteome profiling technique to create an extensive dataset focused on the EGFR-MAPK pathway in non-transformed cells under physiological conditions. By integrating their data with various protein databases, the authors validated and expanded our understanding of EGFR-ERK pathway activation and feedback regulation. Their findings highlight biphasic signalling behaviours and unveil key regulatory components. In future, their data might be used to improve existing and construct novel mathematical models of EGFR-MAPK signalling and their downstream effects.

Comprehensive Phosphoproteomics Dataset

First, the authors determined optimal treatment conditions across time, EGF dosage and inhibitor dosage using enzyme-linked immunoassays (ELISA) to measure RAS and MAPK activity. Unlike earlier research, they used a non-cancerous cell line and EGF dosages that mimic human physiology. Their study focused solely on the EGFR-MAPK signalling cascade, ignoring delayed cellular responses like EGF-induced gene expression or protein turnover. Their initial results were found to be in line with previous studies and could be reproduced in computational simulations.

As a follow-up, the authors used tandem mass spectrometry to create three detailed datasets on phosphoproteomics, covering time-series, dose-series, and inhibitor effects in response to EGF. The goal was to understand how specific sites on proteins undergo phosphorylation changes in response to EGF. However, tandem mass spectrometry sometimes provided uncertain results about the exact phosphorylation site. Hence, they developed a computational method that uses confidence values from a phosphosite predictor and information from the PhosphoSitePlus database to identify the most likely phosphorylation sites. This method improved the accuracy of mapping phosphosites and ensured better alignment of their data with existing scientific literature. Next, they recalculated the intensity values of multi-phosphorylated phosphopeptides to obtain individual values for each phosphosite, simplifying comparisons with prior literature and reducing complexity.
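The two post-processing steps described above can be sketched roughly as follows. This is not the authors' algorithm: the confidence threshold, the tie-breaking rule, the choice to sum peptide intensities per site, and the site names are all assumptions made for illustration.

```python
from collections import defaultdict


def resolve_site(candidates, known_sites, confident=0.75):
    """Pick one phosphosite for a peptide with ambiguous localisation.

    candidates  : dict mapping candidate site (e.g. "S10") to a localisation
                  confidence between 0 and 1
    known_sites : set of sites already annotated in a reference database
    confident   : hypothetical threshold above which the spectrum alone decides
    """
    best = max(candidates, key=candidates.get)
    if candidates[best] >= confident:
        return best
    # Low confidence: fall back on prior knowledge if any candidate is known.
    known = [site for site in candidates if site in known_sites]
    if known:
        return max(known, key=candidates.get)
    return best


def per_site_intensity(peptides):
    """Collapse peptide-level intensities into one value per phosphosite.

    peptides: list of (tuple_of_sites, intensity). Each site receives the
    summed intensity of every peptide that carries it (an assumed rule).
    """
    totals = defaultdict(float)
    for sites, intensity in peptides:
        for site in sites:
            totals[site] += intensity
    return dict(totals)


if __name__ == "__main__":
    print(resolve_site({"S10": 0.55, "T12": 0.45}, known_sites={"T12"}))
    print(per_site_intensity([(("S10",), 2.0), (("S10", "T12"), 3.0)]))
```

The point of the sketch is only the design idea: prior knowledge is consulted when the spectrum cannot localise the site on its own, and a confident localisation is never overridden.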

Fig. 2 Experimental Setup. Figure taken from Feng, Sanford et al. (2023), bioRxiv, published under the CC-BY-NC-ND 4.0 International licence.

Simultaneously, the global protein abundance was measured and normalised across samples to rule out that observed changes in enzyme activity were caused by changes in protein expression instead of changes in phosphorylation.
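One common way to separate phosphorylation changes from expression changes is to subtract the protein-level log2 fold change from the phosphosite-level one. The sketch below shows that arithmetic under the assumption that both quantities were measured in the same treated and control samples; the function name and the numbers are hypothetical, and the preprint may apply a different normalisation.

```python
import math


def abundance_corrected_log2fc(phospho_treated, phospho_control,
                               protein_treated, protein_control):
    """Phosphosite log2 fold change minus protein log2 fold change.

    A value near zero means the phosphopeptide signal merely followed the
    protein level; a positive value indicates increased site occupancy.
    """
    phospho_fc = math.log2(phospho_treated / phospho_control)
    protein_fc = math.log2(protein_treated / protein_control)
    return phospho_fc - protein_fc


if __name__ == "__main__":
    # Phosphosite signal rises 4-fold while the protein itself doubles:
    # only a 2-fold change (log2 = 1) is attributable to phosphorylation.
    print(abundance_corrected_log2fc(400.0, 100.0, 200.0, 100.0))
```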

To enhance the usability of the phosphoproteomics data, the measured data was combined with publicly available protein-specific information known to be relevant for modelling signalling pathways. This included abundance, interactions, localization, and functional roles. This resulted in an open-access resource called Phosphoprotein Explorer, containing 46,000+ phosphorylation sites on 6,600 proteins. Notably, approximately 4,500 sites from 2,110 proteins were found to be significantly enriched in response to EGF stimulation.

EGFR-MAPK Pathway Complexity

To assess the quality and resolution of the phosphoproteomics data, the authors constructed a literature-derived model of the EGF-induced MAPK pathway with known phosphorylation effects. They found that their phosphoproteomics data aligns well with known phosphorylation dynamics both for positive and negative phosphorylation events at relevant timescales and even for low abundance species.

In a more detailed analysis, they constructed systems-level maps for the EGFR-activated phosphorylation network using their extensive experimental dataset along with external protein data. Incorporating inhibitor data and PhosphoSitePlus references, they unravelled the network’s topology, including positive and negative feedback phosphorylation among crucial EGFR-MAPK pathway proteins. 18 proteins with 41 notable phosphorylation changes were identified in response to EGF, with 29 linked to activation and 12 to inhibition.

Next, the analysis was extended to include downstream proteins of RAS and MAPK, focusing on those with a minimum 2-fold increase in phosphorylation across experiments. As earlier studies concluded that key regulatory proteins are usually of low abundance and display a high number of phosphorylation sites, the list was further filtered accordingly yielding 29 proteins. While half of these proteins were already recognized as significant, the previously unidentified ones exhibited high functional scores, implying their newfound importance.
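The filtering logic described above can be pictured as three successive criteria. In the sketch below only the 2-fold threshold comes from the text; the abundance and site-count cut-offs, the class name and the toy proteins are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ProteinSummary:
    name: str               # hypothetical identifier
    max_fold_change: float  # largest EGF-induced phosphorylation fold change
    copies_per_cell: float  # estimated protein abundance
    n_phosphosites: int     # number of detected phosphorylation sites


def candidate_regulators(proteins, min_fold=2.0, max_copies=50_000, min_sites=5):
    """Keep responsive, low-abundance, multi-site proteins.

    min_fold follows the 2-fold criterion in the text; max_copies and
    min_sites are illustrative values, not those used in the preprint.
    """
    return [
        p.name
        for p in proteins
        if p.max_fold_change >= min_fold
        and p.copies_per_cell <= max_copies
        and p.n_phosphosites >= min_sites
    ]


if __name__ == "__main__":
    toy = [
        ProteinSummary("A", 3.0, 7_000, 8),      # passes all three criteria
        ProteinSummary("B", 1.5, 7_000, 8),      # weak response
        ProteinSummary("C", 4.0, 2_000_000, 8),  # too abundant
    ]
    print(candidate_regulators(toy))
```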

Conclusion and Perspective

While it feels challenging to keep track of relevant literature in the phosphoproteomics world, it is exciting to do research with all these new experimental and computational methods as well as access to new extensive datasets. For instance, in the same week as this article, a new article on the detection of post-translational modifications within long polypeptides by nanopore technology was published, as well as a new preprint on a machine learning method to build time-resolved, functional phosphosignaling networks.

What I really like about the manuscript I feature in this preLights post is that the work is both experimental and computational and that it is only possible due to recent advances in both domains. Moreover, I believe that the created resource will be of great use for the community as it is a comprehensive dataset at relevant time scales and under physiological conditions.

While writing the preLights post, Steven Wiley (senior author) guided me through the new database. The database is divided into gene/protein information and specific phosphorylation sites, with external proteomics database links for comprehensive protein details. Steve demonstrated the remarkable exploration potential of this tool by utilizing its highly adaptable search function, allowing the construction of intricate queries across all data fields. Currently, the tool contains the response data of MCF10A cells to EGF (this study), but the authors plan to consistently integrate new information and links as they emerge.

Personally, I really look forward to potentially using the dataset to either validate or refine my current mechanistic EGFR model.


Ardito, F., Giuliani, M., Perrone, D., Troiano, G., & Lo Muzio, L. (2017). The crucial role of protein phosphorylation in cell signaling and its use as targeted therapy (Review). International journal of molecular medicine, 40(2), 271–280.

Chandrasekaran, S. N., Ackerman, J., Alix, E., Ando, D. M., Arevalo, J., Bennion, M., Boisseau, N., Borowa, A., Boyd, J. D., Brino, L., Byrne, P. J., Ceulemans, H., Ch’ng, C., Cimini, B. A., Clevert, D.-A., Deflaux, N., Doench, J. G., Dorval, T., Doyonnas, R., … Carpenter, A. E. (2023). JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations. Cold Spring Harbor Laboratory.

Kalyuzhnyy, A., Eyers, P. A., Eyers, C. E., Bowler-Barnett, E., Martin, M. J., Sun, Z., Deutsch, E. W., & Jones, A. R. (2022). Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation. Journal of proteome research, 21(6), 1510–1524.

Lemmon, M. A., Freed, D. M., Schlessinger, J., & Kiyatkin, A. (2016). The Dark Side of Cell Signaling: Positive Roles for Negative Regulators. Cell, 164(6), 1172–1184.

Mertins, P., Tang, L. C., Krug, K., et al. (2018). Reproducible workflow for multiplexed deep-scale proteome and phosphoproteome analysis of tumor tissues by liquid chromatography–mass spectrometry. Nature protocols, 13, 1632–1661.

Ochoa, D., Jarnuczak, A. F., Vieitez, C., Gehre, M., Soucheray, M., Mateus, A., Kleefeldt, A. A., Hill, A., Garcia-Alonso, L., Stein, F., Krogan, N. J., Savitski, M. M., Swaney, D. L., Vizcaíno, J. A., Noh, K. M., & Beltrao, P. (2020). The functional landscape of the human phosphoproteome. Nature biotechnology, 38(3), 365–373.

Oda, K., Matsuoka, Y., Funahashi, A., & Kitano, H. (2005). A comprehensive pathway map of epidermal growth factor receptor signaling. Molecular systems biology, 1, 2005.0010.

Subramanian, A., Narayan, R., Corsello, S. M., Peck, D. D., Natoli, T. E., Lu, X., Gould, J., Davis, J. F., Tubelli, A. A., Asiedu, J. K., Lahr, D. L., Hirschman, J. E., Liu, Z., Donahue, M., Julian, B., Khan, M., Wadden, D., Smith, I. C., Lam, D., Liberzon, A., … Golub, T. R. (2017). A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell, 171(6), 1437–1452.e17.

Wee, P., & Wang, Z. (2017). Epidermal Growth Factor Receptor Cell Proliferation Signaling Pathways. Cancers, 9(5), 52.

Wieduwilt, M. J., & Moasser, M. M. (2008). The epidermal growth factor receptor family: biology driving targeted therapeutics. Cellular and molecular life sciences : CMLS, 65(10), 1566–1584.

Yu, L. R., & Veenstra, T. D. (2021). Characterization of Phosphorylated Proteins Using Mass Spectrometry. Current protein & peptide science, 22(2), 148–157.

Tags: egfr, mass spectroscopy, mathematical modelling, phosphoproteomics, signalling

Posted on: 30 August 2023, updated on: 31 August 2023



Author's response

The author team shared

Q1: Would it be possible to adapt your methodology to obtain spatio-temporal measurements of protein phosphorylation to account for the crucial role of protein localization in signalling and its regulation?

The workflow and sensitivity of mass spectrometry-based phosphoproteomics is currently not suitable for routine spatial measurements, although new technologies might make it more practical. At the moment, phospho-specific antibodies are the best way to spatially identify abundance changes in specific protein phosphorylation sites. Because our measurements are both broad and unbiased, they can be used to identify sites that are correlated with specific events and are thus good candidates for follow-up studies, which could include spatial measurements. We have also linked each phosphorylated protein to its known intracellular location established using multiple technologies, allowing one to compare protein localization and phosphorylation dynamics.

Q2: Considering the considerable variation in the molecule numbers of signalling species, what is the sensitivity or lower/upper detection limit of your multiplexed phosphoproteomics profiling method?

Good question. It actually varies depending on the instrument and run. At the low end, we can usually see proteins at several thousand copies per cell. This is sufficient to see low abundance regulatory proteins such as SOS1 (~7,000 copies) and BRAF (~3,000 copies), but not transcriptionally controlled regulators, such as DUSP4 and DUSP6. As instruments become faster and more sensitive, this limit should drop. For example, the new Orbitrap Astral instrument is reported to accurately quantify a few hundred copies in bulk samples. There is essentially no limit at the high range, and we can quantify millions of copies per cell. This is one of the strengths of mass-spectrometry phosphoproteomics – great dynamic range.

Q3: How does your method for reducing the ambiguity of phosphorylation site mapping differ from the solutions proposed by Ochoa et al. (2020) and Kalyuzhnyy et al. (2022)? How well do the results align between these methods?

Ochoa et al. (2020) reanalyzed many previous phosphoproteomics datasets with a stringent False Discovery Rate (FDR) setting to generate a list of high-confidence sites to evaluate by machine learning approaches. They discarded any low-confidence sites to basically clean up their starting data set, but at the expense of coverage. They suggest using consistency with previously published data as a test for reliability, as do we for ambiguous sites. Kalyuzhnyy et al. (2022) also recognized the problem of high false-positive rates in mass spectrometry-based phosphorylation site identification and surveyed multiple phosphosite databases to determine how many of the sites were probably not real. They did not actually develop a method for phosphosite identification. However, we independently came to a similar solution of using prior knowledge in databases to help assign low-confidence sites. We find that only ~70% of our sites overlap with those identified by Ochoa, which seems a reasonable match considering the acknowledged bias against low-abundance sites in that study. Remember that the datasets described by Ochoa et al. (2020) and Kalyuzhnyy et al. (2022) were intended to be very stringent in their identification. Their overlap with each other (~40%) is even lower than their overlap with ours (~70%). Because there is not a single “right” way to identify specific phosphorylation sites, we include the confidence values developed by both Ochoa et al. (2020) and Kalyuzhnyy et al. (2022) for every identified site in our Phosphoprotein Explorer application, together with the identified peptide and its confidence values. This allows users to use their own judgement on how confident they are willing to be for the identification of a given site.

Q4: Despite extensive research, our current knowledge of cell signalling regulation is summarised by annotated, static canonical signalling pathways, whereas knowledge of how signalling is influenced and modulated by biological context and disease remains limited. How can this approach and/or dataset help to understand signal transduction rewiring in different conditions and shed light in the large unexplored dark space of understudied kinases?

Excellent question! Current databases represent an “average cell” that contains all known pathways in all configurations. However, there is no such thing as an average cell and so these databases don’t describe any real cell type. Our database includes the response and dose-sensitivity of a single cell type (MCF10A) and thus describes what is actually happening, at least under our experimental conditions. Our previous work suggests that cells only express a subset of all signaling proteins, but if they are expressed, their abundance levels are very similar between different cell types (Shi et al., 2016). In the current paper we found that some signaling proteins are always phosphorylated in response to a stimulus, but others only occasionally change, and seem to depend on cell context (e.g., cell density). We plan to repeat our study using different cell types and look for any consistent patterns. However, I doubt that there will be many differences in the signaling pathways between cells. I suspect that most cell type differences are because of differences in the downstream mediators of signaling. We identified a class of highly phosphorylated proteins that responded in synchrony with ERK pathway activation and seemed to control cell effector functions (e.g., cell migration, gene expression). There seems to be crosstalk between different pathways at the level of these proteins, which we call HPER (highly phosphorylated EGF responsive) proteins. Some of these proteins control feedback to the ERK pathway, thus regulating signaling. I suspect that the variable expression of HPER proteins together with the activity of the kinases that target them is a major determinant of cell responses. Most of the HPER proteins are poorly studied. A focus on their function and regulation should be productive and indicate whether they are functional targets of understudied kinases as well.

Q5: Recently, numerous experimental high-throughput methodologies have emerged which simultaneously measure multiple omics modalities at single-cell resolutions. At the same time, computational approaches have been developed to map/bridge/harmonise outputs from several independent single-omics measurements onto each other (Hao et al., 2023) thereby creating multi-omics data. Where do you see the future of omics measurement? Detailed single-omics measurements with computational alignment or simultaneous multi-omics measurements?

If one could do high-quality, simultaneous multi-omics measurements, that would be ideal, but that is extremely difficult at the moment. Any quantitative single-cell measurement is at the limit of what current technology can achieve and so protocols are optimized for a particular experimental modality. Unless you are lucky, any multi-omic measurement will involve compromising on conditions and thus produce less than optimal data. However, low-resolution multi-omics data can be a way to map (bridge) high-resolution, single-omics measurements. This is where I think the spatial-omics field is going – using multi-omics to create a map for data from high-resolution experiments. Of course, as technologies improve, the sensitivity, coverage and resolution of multi-omics measurements are likely to improve as well, essentially combining the two approaches.
