
Illuminating Insights from Healthcare Data


There's a revolution happening in the world of healthcare. The convergence of electronic health records (EHR) data with new data science techniques has created a landmark opportunity to understand disease biology and advance the practice of medicine like never before.

The goals of our research group are to 1) identify important clinical questions inspired by real-world practice, and 2) produce interpretable and reproducible answers to these questions using data science.


In the service of these objectives, we pre-register detailed research protocols, publish computable research documents (e.g. R Markdown, Jupyter Notebooks), and even de-identify raw EHR data for the ultimate in end-to-end reproducible data science. When the research needs exist but the tools do not, we roll up our sleeves, write code, and open-source our software.


Making reliable inferences from the unruly world of observational, real-world data isn't easy. To have any hope at all of getting "the right answer," we must integrate expertise across different domains: epidemiology and study design, biostatistics and causal inference, clinical informatics, machine learning, natural language processing, scientific computing, and the clinical domain.


To meet the challenges of this field, we have assembled a diverse team of expert advisors and collected tools and data from across the University of California, San Francisco (UCSF) and beyond. Our group is embedded in the Bakar Computational Health Sciences Institute, home to the de-identified EHR data at UCSF (with 80M+ notes) and a UC-wide database covering 5M+ patients in California. Our group is adjacent to the Dept. of Epidemiology and Biostatistics, which provides our group with essential mentorship. We are affiliated with the UCSF Dept. of Medicine, where we interact with expert clinicians at our world-class medical center. In late 2020, we will be moving to a brand-new facility in Mission Bay featuring on-site, PHI-compliant computing servers and views of the bay.

Our work is funded by the FDA as well as multiple biopharmaceutical sponsors who support our efforts to study real-world treatment effects.



Vivek Rudrapatna
Principal Investigator
This could be you! Join the team!
And you too! See the openings section below!

Vivek Rudrapatna, MD, PhD, is an Assistant Adjunct Professor in the Division of Gastroenterology at UCSF, and the Director of Real-World Evidence Projects at the Bakar Computational Health Sciences Institute. Vivek trained in clinical data science under the direction of Dr. Atul Butte, UCSF Professor and Chief Data Scientist at University of California Health. He also completed a fellowship in Gastroenterology and maintains a practice focused on the treatment of Inflammatory Bowel Disease. Vivek is passionate about the potential for data science and artificial intelligence technologies, such as natural language processing, to support research into human diseases and to augment clinical decision making.


UCSF Bakar Institute of Computational Health Sciences

  • Atul Butte, MD, PhD

UCSF Division of Gastroenterology 

  • Uma Mahadevan, MD

  • Michael Kattah, MD, PhD

  • Bruce Wang, MD


UCSF Department of Epidemiology and Biostatistics

  • Chiung-Yu Huang, PhD

  • Isabel Elaine Allen, PhD

  • Michael Kohn, MD

Other UCSF Departments

  • Andrew Krystal, MD

  • Pelin Cinar, MD, MS

  • Stuart Gansky, DrPH


University of California (Los Angeles, San Diego) 

  • Brigid Boland, MD

  • Wael El-Nachef, MD, PhD


Mount Sinai School of Medicine

  • Benjamin Glicksberg, PhD



Real-World Evidence (RWE)

  • Rudrapatna VA, Butte AJ (2020). Opportunities and challenges in using real-world data for healthcare. J Clin Invest. 130(2):565-574.

Inflammatory Bowel Disease

  • Rudrapatna VA, Glicksberg BS, Butte AJ (2019) A Comparison of the Randomized Clinical Trial Efficacy and Real-World Effectiveness of Tofacitinib for the Treatment of Inflammatory Bowel Disease: A Cohort Study. medRxiv

  • Ko MS*, Rudrapatna VA*, Avila P, Mahadevan U (2020) Safety of Flexible Sigmoidoscopy in Pregnant Patients with Known or Suspected Inflammatory Bowel Disease. Dig Dis Sci. PMID: 32034603

  • Rudrapatna VA, Butte AJ (2018) Open data informatics and data repurposing for IBD. Nat Rev Gastroenterol Hepatol PMID: 30061595.

Accessibility and Reproducibility in Clinical Informatics

  • Rudrapatna VA*, Glicksberg BS*, Avila P, Harding-Theobald E, Wang C, Butte AJ (2020) Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates. BMJ Open Qual 9(1). PMID: 32209595

  • Glicksberg BS, Oskotsky B, Thangaraj PM, Giangreco N, Badgeley MA, Johnson KW, Datta D, Rudrapatna VA, Rappoport N, Shervey MM, Miotto R, Goldstein TC, Rutenberg E, Frazier R, Lee N, Israni S, Larsen R, Percha B, Li L, Dudley JT, Tatonetti NP, Butte AJ (2019) PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model. Bioinformatics 35(21):4515-4518 PMID: 31214700.

  • Glicksberg BS, Oskotsky B, Giangreco N, Thangaraj PM, Rudrapatna V, Datta D, Frazier R, Lee N, Larsen R, Tatonetti NP, Butte AJ (2019) ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data. JAMIA Open. 2(1):10-14.


  • Arneson D, Elliott M, Mosenia A, Oskotsky B, Vashisht R, Zack T, Bleicher P, Butte AJ, Rudrapatna VA (2020) CovidCounties - an interactive, real-time tracker of the COVID-19 pandemic at the level of US counties. medRxiv 

Coming Soon

EHR Interventions for Higher-Value Care

  • Rudrapatna VA*, Ko M*, Mosenia A, Radhakrishnan K, Butte AJ, Kathpalia P (2020) An Electronic Health Records-Based Intervention Improves Appropriate Referrals for Upper Endoscopy: A Prospective, Single Center Study. Accepted to ACG 2020.

Towards Real-World Knowledge Bases

  • Rudrapatna VA, Gupta S, Mardirossian T, Narain R, Mosenia A, Butte AJ (2020) Accurate Machine Classification of Ulcerative Colitis Mayo Subscores from Electronic Health Record Procedure Reports. Accepted to ACG 2020. 



Postdoctoral Fellow in Clinical Data Science

We are seeking highly motivated researchers to develop and apply novel clinical data science approaches in order to address questions of diagnosis, treatment, and prevention using routinely collected clinical data from the Electronic Health Records at UCSF Medical Center. The successful candidate will develop an independent research program and act as the primary driver of two research grants in the lab: the first aiming to measure the real-world effectiveness of Ustekinumab for the treatment of Crohn's Disease and compare its performance to that of prospective clinical trials, and the second aiming to identify patients suffering from a rare but treatable cause of recurrent abdominal pain even before they have been clinically diagnosed.

This position is available for one year, with a possible extension of up to two years based on performance evaluation.

Essential Qualifications:

  • A PhD (or equivalent) in one of the following fields: computational biology, biostatistics, epidemiology, bioinformatics, data science, computational linguistics, computer science, or machine learning. MDs are welcome to apply if they have a strong background in one of the above fields.

  • A strong interest or background in clinical research and epidemiology

  • Experience in Python (preferred), R, or Julia. SQL is strongly recommended.

  • Excellent communication skills and a track record of peer-reviewed first-authored publications

  • A high degree of motivation and ability to operate independently


Desired Qualifications:

  • A background in natural language processing (NLP) and associated tasks, including text classification, information and relation extraction, and knowledge representation. A specific background in clinical NLP would be extremely valuable.

  • A background in clinical informatics, including knowledge of the OMOP common data model. Experience using clinical databases, especially the Epic EHR database backends (Clarity, Caboodle), would be highly valuable.

  • A background in causal inference

  • A background in deep learning


Candidates may email a statement of research interests, curriculum vitae, and list of three references as a single PDF to Dr. Vivek Rudrapatna at with “Postdoc application” in the subject line.
