Senior Research Associate
My name is Soumya Banerjee (first name pronounced as show-mo) and I am a senior research fellow at the University of Cambridge.
In a previous life I was a researcher at the University of Oxford and Harvard University, and I have had the great fortune of working in India, the USA, Australia and Germany.
I analyze complex problems and implement new statistical and machine learning techniques for deriving insights from large amounts of data.
My research is in the field of complex systems. I apply explainable Artificial Intelligence (xAI) and mathematical modelling to understand complex systems. I also apply machine learning to the field of healthcare.
You can get in touch with me at
Office: FC01 (first floor) in the computer science department
Biography
Soumya Banerjee is a Senior Research Fellow at the University of Cambridge. He worked in industry for many years before completing a PhD in applying computational techniques to interdisciplinary topics. Over the last 18 years, he has worked closely with domain experts in finance, healthcare, immunology, virology, and cell biology. He has recently worked closely with clinicians and patients on using patient and public involvement to build trust in AI algorithms.
Soumya Banerjee has a PhD in Computer Science. He worked at Los Alamos National Laboratory, USA, while he was in graduate school. Prior to graduate school, he was a software engineer working in the financial services sector for Fortune 500 clients.
His work is at the intersection of computer science and biological systems: he uses tools from computer science to study biological systems and takes inspiration from biological systems to design more efficient human-engineered systems. He is skilled in machine learning techniques and mathematical modelling using spatially explicit agent-based models and computationally tractable differential equation models.
He works closely with people from other domains, especially experimentalists and clinicians. His work has been recognized with a University of New Mexico Student Award for Innovation in Informatics in 2010.
He takes pride in writing industrial-strength software, which he attributes to years of working in industry and skills honed in academia. As of January 2015, he was ranked within the top 200 worldwide on MATLAB Central (an online repository for MATLAB code contributed by users all over the world).
Soumya Banerjee is a researcher in ethical artificial intelligence applied to complex systems. He applies artificial intelligence techniques to solve human problems. He is also very passionate about outreach, science communication and science policy for social good. His research uses data science for social good and to answer questions about complex systems. Complex systems are all around us, from social networks to transportation systems, cities, economies and financial markets.
Research
I analyze complex problems and implement new statistical and machine learning techniques for deriving insights from large amounts of data. I work closely with people from other domains, especially experimentalists and clinicians.
I worked in industry before completing a PhD in applying computational techniques to interdisciplinary topics. I have worked closely with domain experts in finance, healthcare, immunology, virology, and cell biology. Recently I have worked closely with clinicians and patients on using patient and public involvement to build trust in AI algorithms.
My research uses data science for social good and to answer questions about complex systems. Complex systems are all around us, from social networks to transportation systems, cities, economies and financial markets. I am also very passionate about outreach and science communication.
Here is a video where I explain my research
Publications
PhD Thesis
Scaling in the Immune System, PhD Thesis, University of New Mexico, USA, 2013 (pdf) (computational immunology talk part 1) (computational immunology talk part 2) (zenodo link) (OSF project link)
Selected peer-reviewed conference and journal publications (ordered by relevance)
NOTE: In my field (Computer Science), researchers routinely publish at conferences, and conference papers are peer-reviewed.
Here is a link to preprints of my most significant publications (a more complete set of preprints can be found here, here, and here)
21) Patient and public involvement to build trust in artificial intelligence: a framework, tools and case studies, Soumya Banerjee, Phil Alsop, Linda Jones, Rudolf Cardinal, Patterns 3(6):100506, 2022 (journal link) (resource) (resource) (blog post for general audience)
(Cell Press publishing group)
(highlights: This includes a framework, case studies and tools to involve patients in AI research. Involving patients and carers in research will help build trust in AI.)
20) Software Application Profile: ShinyDataSHIELD—an R Shiny application to perform federated non-disclosive data analysis in multicohort studies, Xavier Escribà-Montagut, Yannick Marcon, Demetris Avraam, Soumya Banerjee, Tom R P Bishop, Paul Burton, Juan R González, International Journal of Epidemiology, dyac201, 2022
(link) (code) (demo) (tutorial)
(Oxford University Press, Impact Factor = 9.8)
(highlights: This is an accessible user interface for statistical analysis and machine learning. The interface allows non-technical users to analyze data on a federated, privacy-preserving platform.)
19) dsSurvival 2.0: Privacy enhancing survival curves for survival models in the federated DataSHIELD analysis system, Soumya Banerjee, Tom Bishop, BMC Research Notes 16, 98, 2023
(link) (preprint) (lay summary) (code) (code) (code) (code)
(highlights: a package and tools for privacy-preserving survival curve visualization in clinical informatics)
18) dsSynthetic: Synthetic data generation for the DataSHIELD federated analysis system, Soumya Banerjee, Tom Bishop, BMC Research Notes 15(1):230, 2022
(link) (code, code, code) (preprint) (supplementary material)
(Springer Nature publishing group)
(highlights: a package for generating synthetic data for privacy-preserving analysis in healthcare)
17) dsSurvival: Privacy preserving survival models for federated individual patient meta-analysis in DataSHIELD, Soumya Banerjee, Ghislain Sofack, Thodoris Papakonstantinou, Demetris Avraam, Paul Burton et al., BMC Research Notes, 15(1):197, 2022 (paper) (code) (code) (code) (tutorial) (preprint)
(Springer Nature publishing group)
(highlights: a package and tools for privacy-preserving survival analysis in clinical informatics)
16) A class-contrastive human-interpretable machine learning approach to predict mortality in severe mental illness, Soumya Banerjee, Pietro Lio, Peter Jones, Rudolf Cardinal, Nature Partner Journal Schizophrenia, 7, 60, 2021
(Nature Partner Journal, Nature Publishing Group, Impact Factor = 6.3) (link) (journal home)
15) Simulating a community mental health service during the COVID-19 pandemic: effects of clinician-clinician encounters, clinician-patient-family encounters, symptom-triggered protective behaviour, and household clustering, Frontiers in Psychiatry, 12, 196, 2021
(Impact Factor = 3.5) (link) (preprint) (code)
14) Optogenetic tuning reveals Rho amplification-dependent dynamics of a cell contraction signal network, Dominic Kamps, Johannes Koch, Victor O. Juma, Eduard Campillo-Funollet, Melanie Graessl, Soumya Banerjee, Tomáš Mazel, Xi Chen, Yao-Wen Wu, Stephanie Portet, Anotida Madzvamuse, Perihan Nalbant, Leif Dehmelt, Cell Reports, 33(9):108467, 2020
(Cell Press publishing group, Impact Factor = 8.1) (link) (general summary) (archived general summary) (general summary OSF) (code) (code)
13) Deconvolution of monocyte responses in inflammatory bowel disease reveals an IL-1 cytokine network that regulates IL-23 in genetic and acquired IL-10 resistance, Gut, 2020
(British Medical Journal publishing group, Impact Factor = 19.8) (link) (preprint)
12) Hydroxychloroquine: balancing the needs of LMICs during the COVID-19 pandemic, Soumya Banerjee, Lancet Rheumatology, 2(7):385-386, 2020 (link) (link)
(Lancet Publishing Group)
11) Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences, Himel Mallick, Eric A. Franzosa, Lauren J. McIver, Soumya Banerjee, et al., Nature Communications, 10(1):3136, 2019
(Impact factor = 12.2) (journal link) (code)
10) The early impact of COVID-19 on mental health and community physical health services and their patients’ mortality in Cambridgeshire and Peterborough, UK, Journal of Psychiatric Research, 131, 244-254, 2020
(Impact factor = 4.4) (link)
9) Influence of correlated antigen presentation on T cell negative selection in the thymus, Soumya Banerjee, SJ Chapman, Journal of the Royal Society Interface, 15(148), 20180311, 2018
(Impact Factor = 4.3) (link) (link) (journal link) (main and supplementary material combined) (media summary)
8) Modelling the effects of phylogeny and body size on within-host pathogen replication and immune response, Soumya Banerjee, Alan Perelson, Melanie Moses, Journal of the Royal Society Interface 14(136), 20170479, 2017
(Impact Factor = 4.3) (preprint PDF) (link) (supplementary materials) (talk part 1) (talk part 2)
(media summary) (media coverage) (coverage)
(highlights: We combine data on infectious diseases from different species using machine learning. We propose a new competency metric that allows us to link within-host viral dynamics to between-host spread of diseases. Our technique can be applied to other emerging diseases, such as those caused by Zika and Ebola viruses, that infect multiple species.)
7) An excitable Rho GTPase signaling network generates dynamic subcellular contraction patterns, Melanie Graessl, Johannes Koch, Abram Calderon, Dominic Kamps, Soumya Banerjee, Tomáš Mazel, Nina Schulze, Jana Kathrin Jungkurth, Rutuja Patwardhan, Djamschid Solouk, Nico Hampe, Bernd Hoffmann, Leif Dehmelt, Perihan Nalbant, Journal of Cell Biology 216(12), 4271-4285, 2017
(Impact Factor = 9.8) (link) (preprint) (work reviewed and discussed here) (link to a movie) (code and model file for intracellular network: link, link)
(highlighted in Journal of Cell Biology spotlight) (summary for general audience)
6) Estimating biologically relevant parameters under uncertainty for experimental within-host murine West Nile virus infection, Soumya Banerjee, Jeremie Guedj, Ruy Ribeiro, Melanie Moses, Alan Perelson, Journal of the Royal Society Interface, 13(117), 20160130, 2016
(Impact Factor = 4.3) (preprint) (link) (supplementary material) (summary for general audience) (summary) (coverage)
(highlights: West Nile virus (WNV) causes viral encephalitis in humans, and is related to viruses such as Dengue and Zika that are also of significant public health concern. We have developed a computational method to determine characteristics of WNV infection even in the face of limited experimental data. This could be applicable to other emerging diseases like Zika virus for which there is little data. It may be particularly useful to estimate the potential rate of within-host viral reproduction early in an outbreak in order to assess the epidemic potential of emerging pathogens.)
5) Competitive dynamics between criminals and law enforcement explains the super-linear scaling of crime in cities, Soumya Banerjee, Manuel Cebrian, Pascal van Hentenryck, Palgrave Communications 1, 15022, 2015
(Nature Publishing Group) (paper) (preprint) (data) (supplementary information) (coverage) (code)
(top 5% of all papers covered by Altmetric as of February 2016)
(highlights: Larger cities have disproportionately more crime per capita compared to smaller cities [super-linear scaling of crime]. We used techniques from dynamical systems and complex systems to explain the super-linear scaling of crime in cities and other socio-technological systems.)
4) A bioorthogonal small-molecule switch system for controlling protein function in cells, Peng Liu, Abram Calderon, Georgios Konstantinidis, Jian Hou, Stephanie Voss, Xi Chen, Fu Li, Soumya Banerjee, Jan‐Erik Hoffmann, Christiane Theiss, Leif Dehmelt, Yao‐Wen Wu, Angewandte Chemie, 53(38), 10049-10055, 2014
(Impact Factor = 13.7) (preprint) (link)
(highlights: a patented technique [International patent application PCT/EP2013/060890] and the first reversible small-molecule system for controlled protein interaction in live cells. I used statistical techniques to analyze the data and performed automated cell tracking using ImageJ and CellProfiler)
3) Science and technology consortia in US biomedical research: A paradigm shift in response to unsustainable academic growth, Curt Balch, Hugo Arias-Pulido, Soumya Banerjee, Alex K. Lancaster, Kevin B. Clark, Michael Perilstein, Brian Hawkins, John Rhodes, Piotr Sliz, Jon Wilkins and Thomas W. Chittenden, BioEssays, 2014
(Impact Factor = 5.4) (link) (pdf)
2) A spatial model of the efficiency of T cell search in the influenza-infected lung, Drew Levin, Stephanie Forrest, Soumya Banerjee, Candice Clay, Judy Cannon, Melanie Moses and Frederik Koster, Journal of Theoretical Biology, 398(7), 52-63, 2016
(Impact Factor = 2.3) (journal link) (pdf) (supplementary section) (Supplementary videos of agent based model) (code)
1) Scale Invariance of Immune System Response Rates and Times: Perspectives on Immune System Architecture and Implications for Artificial Immune Systems, Soumya Banerjee and Melanie Moses, Swarm Intelligence, 4(4), 301-318, DOI: 10.1007/s11721-010-0048-2, 2010
(Impact Factor = 2.2) (journal link) (preprint) (pdf) (arXiv) (bibTeX) (code) (code)
Teaching and Supervisions
2024/2025:
- Introduction to R for Biologists - Trainer
- Linear mixed effects models - Trainer
- Reproducible Research with R - Trainer
2023/2024:
- Reproducible Research with R - Trainer
2020/2021:
- An Introduction to Machine Learning - Training Lead, Trainer
2019/2020:
- An Introduction to Machine Learning - Trainer