[Mobilizeplans-starstudents] TODAY! Mobilize Center Seminar

Diane Bush dbush1 at stanford.edu
Thu Nov 17 06:40:14 PST 2016

The next Mobilize Center Seminar will be TODAY, Thursday, November 17, and will feature Mobilize Center postdoctoral fellow Jason Fries. We look forward to seeing you!

Weak Supervision: Building Machine Learning Systems without Labeled Data

Thursday, November 17
noon - 1 pm

Y2E2 300, Stanford University

While advances in deep learning are reducing the need for manual feature engineering in many machine learning tasks, current approaches require large labeled datasets to fit models properly. Unfortunately, in many domains, creating the training set is itself the primary bottleneck to building scalable machine learning systems. This motivates recent research into new ways to generate labeled data programmatically, leveraging radically weaker and more diverse forms of supervision to train models at the scales required for automatic featurization.

Data Programming is a newly proposed paradigm for training machine learning systems with unlabeled data. In this approach, users encode domain expertise and other forms of weak supervision as labeling functions: noisy, potentially conflicting heuristics for labeling data. The outputs of these functions are used to train a generative model of the underlying labeling process, which learns the accuracy of each function and produces a "denoised" set of training labels for use in downstream discriminative models.
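To make the idea concrete, here is a minimal sketch in Python. It assumes binary labels and the simplest possible generative model: each labeling function votes +1, -1, or abstains, is conditionally independent of the others given the true label, and has a single accuracy parameter. The accuracies are fit with expectation-maximization (EM) on unlabeled votes alone. This is not Snorkel's API or its actual model; the function name fit_label_model and the synthetic labeling functions are hypothetical, for illustration only.

```python
import math
import random

ABSTAIN = 0  # assumption: each labeling function votes +1, -1, or abstains (0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_label_model(votes, n_iter=100, init_acc=0.7):
    """Hypothetical helper: estimate labeling-function accuracies with EM
    under a simple 'one-coin' generative model. The true label y is +1 or -1,
    and each non-abstaining function j independently reports y with
    probability acc[j]. No ground-truth labels are used.

    votes: list of rows, one per unlabeled example, each a list of +1/-1/0.
    Returns (acc, probs), where probs[i] = P(y_i = +1 | votes_i).
    """
    m = len(votes[0])
    acc = [init_acc] * m  # start above 0.5 to break the label-flip symmetry
    prior = 0.5
    probs = []
    for _ in range(n_iter):
        # E-step: posterior log-odds that each example is positive.
        probs = []
        for row in votes:
            z = math.log(prior / (1 - prior))
            for a, v in zip(acc, row):
                if v != ABSTAIN:
                    z += v * math.log(a / (1 - a))
            probs.append(sigmoid(z))
        # M-step: each accuracy is the expected fraction of correct votes.
        for j in range(m):
            agree, total = 0.0, 0
            for p, row in zip(probs, votes):
                v = row[j]
                if v != ABSTAIN:
                    agree += p if v == 1 else 1 - p
                    total += 1
            if total:
                acc[j] = min(max(agree / total, 0.01), 0.99)  # avoid log(0)
        prior = min(max(sum(probs) / len(probs), 0.01), 0.99)
    return acc, probs


# Synthetic demo: three hypothetical labeling functions with hidden accuracies.
random.seed(0)
true_acc = [0.9, 0.75, 0.6]
coverage = 0.8  # chance that a function votes rather than abstains
labels = [random.choice([1, -1]) for _ in range(2000)]
votes = []
for y in labels:
    row = []
    for a in true_acc:
        if random.random() > coverage:
            row.append(ABSTAIN)
        else:
            row.append(y if random.random() < a else -y)
    votes.append(row)

acc, probs = fit_label_model(votes)  # labels are never shown to the model
```

The estimated accuracies in acc should land near the hidden values in true_acc, and the posterior probabilities in probs play the role of the "denoised" training labels that a downstream discriminative model would then be trained on. Real systems relax the independence assumption and model correlations between labeling functions.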

We present work on two learning systems developed using Snorkel, Stanford's open-source data programming framework. First, we show a weakly supervised named entity recognition system for tagging disease and chemical names in PubMed abstracts. Evaluating this approach on three public biomedical datasets, we show that it can come close to the performance of traditional supervised learning without training on hand-labeled data, relying instead only on ontologies and other domain heuristics.

Second, we describe ongoing work analyzing text from the Stanford Hospital and Clinics electronic health record (EHR). Patient notes offer exciting opportunities for analysis, as they contain information that is poorly captured elsewhere in the EHR. We examine a cohort of pre- and post-operative joint replacement patients, using half a million patient notes to extract relationships between anatomical locations and assessments of pain. We describe system performance and future applications in medical device surveillance and other spontaneous reporting systems.

This work was done in collaboration with Alison Callahan, Alex Ratner, Sen Wu, and Christopher Ré.

Jason Fries is a Postdoctoral Fellow in Computer Science at Stanford University. He works with Professors Chris Ré and Scott Delp as part of Stanford's Mobilize Center, an NIH Big Data to Knowledge (BD2K) site of excellence that explores data science approaches to understanding diseases of human mobility. His research focuses on information extraction and predictive modeling using unstructured text and time series data from the scientific literature and the electronic medical record. His most recent projects include extracting named entities and relations from text without using labeled data, and analyzing postoperative trajectories of pain and function after joint replacement surgery. Jason received his PhD from the University of Iowa in 2015, co-advised by Alberto Segre and Dr. Phil Polgreen, working as part of Iowa's Computational Epidemiology Research Group. His thesis explored large-scale information extraction from electronic medical record text, as well as machine learning approaches to public health surveillance using social media.

Please see Mobilize Events <http://mobilize.stanford.edu/events/> for a list of upcoming speakers.

Diane Bush
Assistant to Professor Scott Delp
NMBL, Mobilize Center, OpenSim
Stanford University
dbush1 at stanford.edu
