Levin Lecture Series: Spring 2020 Colloquium Seminars

January 21, 2020 

***SPECIAL SEMINAR***

Topic: "Bayesian sparse regression for large-scale observational healthcare analytics"

11:30am-12:30pm
AR Building, Hess Commons
Hosted by: Department of Biostatistics

Speaker: Akihiko Nishimura 
Postdoctoral Researcher, Department of Biomathematics, University of California, Los Angeles
Email: akihiko4@g.ucla.edu

Abstract: The growing availability of large healthcare databases presents opportunities to investigate how patients' responses to treatments vary across subgroups. Even with the large cohort sizes found in these databases, however, low incidence rates make it difficult to identify causes of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where the signals are weak and heterogeneity across databases is substantial. Applications of Bayesian sparse regression to large-scale data sets, however, have been hampered by the lack of scalable computational techniques. We adapt ideas from numerical linear algebra and computational physics to tackle the critical bottleneck in computing posteriors under Bayesian sparse regression. For linear and logistic models, we develop the conjugate gradient sampler for high-dimensional Gaussians along with the theory of prior-preconditioning. For more general regression and survival models, we develop the curvature-adaptive Hamiltonian Monte Carlo to efficiently sample from high-dimensional log-concave distributions. We demonstrate the scalability of our method on an observational study involving n = 1,065,745 patients and p = 15,779 clinical covariates, designed to compare the effectiveness of the most common first-line hypertension treatments. The large cohort size allows us to detect evidence of treatment effect heterogeneity previously unreported by clinical trials.


January 22, 2020

***SPECIAL SEMINAR***

Topic: "Graphical Causal Model Selection for Applications in Health and Policy"

11:30am-12:30pm
AR Building, Hess Commons
Hosted by: Department of Biostatistics

Speaker: Dr. Daniel Malinsky
Postdoctoral Fellow, Department of Computer Science, Johns Hopkins University

Abstract: To draw reliable conclusions about treatment and policy decisions we rely on assumptions about the background causal processes which generated our data. In highly controlled settings, such as clinical trials, the requisite causal assumptions are typically satisfied by design. However, when dealing with complex but less structured data -- such as that from electronic health records, biological processes that are not well understood, and some observational studies -- we may have little reason to trust convenient causal assumptions a priori. An important challenge in these settings is to select causal models on the basis of the data itself as much as possible. By representing causal models with graphs (DAGs or more complicated structures), a growing body of research has produced algorithms that exploit the statistical implications of candidate graphical structures to perform principled causal model selection: distinct graphs will imply (sometimes) distinct patterns of conditional independence and dependence, and under some assumptions we can perform a kind of "pattern matching" to select a set of candidate graphs consistent with observations. Subsequently, the selected models can be used to support identification of causal effects and (if they are identified) efficient estimation. This talk will include a brief introduction to graphical causal model selection and focus on a novel method which has been developed to make model selection possible from nonstationary time series data. Time permitting, I will also discuss related work on respecting (causal) fairness constraints in automated decision-making systems and efficient inference in settings with data missing not at random.


January 23, 2020

Topic: "Treatment-Free Survival, With and Without Toxicity, a Novel Outcome Measure and Integrative Analysis for Immuno-Oncology"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Shing Lee

Speaker: Dr. Meredith Regan
Associate Professor of Medicine, Harvard Medical School 
Email: mregan@jimmy.harvard.edu

Abstract: Conventional measures such as median progression-free survival may suboptimally characterize the full impact of immuno-oncology (I-O) agents, as compared with one another or with other systemic anticancer therapies. Patients discontinuing I-O agents may experience periods of disease control without needing subsequent systemic anticancer therapy but may still experience toxicity. Treatment-free survival (TFS), with and without toxicity, simultaneously characterizes disease control and toxicity for this off-treatment period. Using three phase 3 randomized clinical trials testing immune checkpoint inhibitors in patients with advanced melanoma and renal cell carcinoma, we defined TFS within an integrative analysis of how patients spend their overall survival time, which has been improved with the use of these I-O agents.


January 29, 2020

***SPECIAL SEMINAR***

Topic: "A knockoff filter for probit and logistic regression models with Bayesian variable selection statistics"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Department of Biostatistics

Speaker: Dr. Linxi Liu
Term-Assistant Professor, Department of Statistics, Columbia University
Email: ll3098@columbia.edu

Abstract: Controlled variable selection for high-dimensional generalized linear models (GLM) is a topic that has drawn a lot of attention in recent years. It has broad applications in many research areas, ranging from social sciences to biological sciences. One frequently encountered example is the genome-wide association study in genetics. In this talk, I will focus on the variable selection problem for probit and logistic regression models. We are particularly interested in the case when signals/associations are relatively weak, so that exact recovery of signal locations cannot be achieved. Instead, we ask whether we can select a subset of response-associated variables with controlled false discovery rate (FDR) when both the sample size n and the dimension p go to infinity and p/n converges to a constant. Our approach is built on top of the popular knockoffs framework (Barber and Candes, 2015). The procedure starts by constructing a group of knockoff variables geometrically and then calculates the test statistics based on a Bayesian model. As opposed to model-X knockoffs, we treat the regression matrix as deterministic, and identify a set of sufficient conditions under which the proposed method achieves asymptotic FDR control. Numerical results suggest power gains of the new method when the regression matrix is sparse. We also apply the proposed method to the Swedish exome-sequencing study of schizophrenia. A main challenge of the data analysis is the large number of rare mutations, which implies a sparse regression matrix. Simulation results show that our method can successfully control the FDR in this example. With FDR as a relaxed measure of Type-I error, we are able to claim gene-level discoveries that have not been reported in previous studies.


January 30, 2020

Topic: "Fairness By Causal Mediation Analysis: Criteria, Algorithms, and Open Problems"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Caleb Miles

Speaker: Dr. Ilya Shpitser
John C. Malone Assistant Professor, Department of Computer Science, Whiting School of Engineering, Johns Hopkins University
Email: ilyas@cs.jhu.edu

Abstract: Systematic discriminatory biases present in our society influence the way data is collected and stored, the way variables are defined, and the way scientific findings are put into practice as policy. Automated decision procedures and learning algorithms applied to such data may serve to perpetuate existing injustice or unfairness in our society. We consider how to solve prediction and policy learning problems in a way which "breaks the cycle of injustice" by correcting for the unfair dependence of outcomes, decisions, or both, on sensitive features (e.g., variables that correspond to gender, race, disability, or other protected attributes). To solve the problem, we proceed as follows. First, we generalize Pearl's calculus of interventions to yield a calculus on arbitrary potential outcomes, which allows us to give a complete identification theory for conditional path-specific effects and allows complex fairness criteria to be expressed as functionals of the observed data. Second, we use this theory to learn outcome predictors and optimal policies in a way that adjusts for an inappropriate dependence of a sensitive feature on other variables, adapting methods from semi-parametric statistics, empirical likelihood, and constrained optimization.

We illustrate our approach with both synthetic data and criminal justice data, and discuss ongoing work.

This is joint work with Daniel Malinsky, Razieh Nabi, and Thomas S. Richardson.


February 6, 2020

Topic: "Multiway clustering via tensor block models"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Gen Li

Speaker: Dr. Miaoyan Wang
Assistant Professor, Department of Statistics, University of Wisconsin-Madison
Email: miaoyan.wang@wisc.edu

Abstract: Higher-order tensors arise frequently in applications such as genomics, recommendation systems, topic modeling, and social network analysis. We consider the problem of identifying multiway block structure from a large noisy tensor. We propose a tensor block model, develop a unified least-squares estimation procedure, and obtain theoretical accuracy guarantees for multiway clustering. The statistical convergence of the estimator is established, and we show that the associated clustering procedure achieves partition consistency. A sparse regularization is further developed for identifying important blocks with elevated means. The proposal handles a broad range of data types, including binary, continuous, and hybrid observations. We demonstrate that our approach outperforms previous methods through applications to multi-tissue gene expression and contextual network analysis. This is joint work with my student, Yuchen Zeng.


February 13, 2020

Topic: "Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Min Qian

Speaker: Dr. Susan Murphy
Professor, Department of Statistics, Harvard University 
Email: samurphy@fas.harvard.edu

Abstract: There are a variety of formidable challenges to reinforcement learning and control for use in designing mobile health interventions for individuals with chronic disorders. Challenges include settings in which most treatments delivered by a mobile device have immediate nonnegative (hopefully positive) effects but longer-term effects tend to be negative due to user burden. Furthermore, the resulting data must be amenable to conducting a variety of statistical analyses, including causal inference as well as monitoring analyses. Other challenges include an immature domain science concerning the system dynamics, combined with the need to incorporate some domain science due to the low signal-to-noise ratio and the non-stationary and limited data on individuals with chronic disorders. Here we describe how we confront these challenges, including our use of low-variance proxies for the delayed effects on the reward (e.g., immediate response) in the learning algorithm.


February 18, 2020 

***SPECIAL SEMINAR***

Topic: "Leveraging Digital Data for Clinical Research"

11:30am-12:30pm
AR Building, Hess Commons
Hosted by: Department of Biostatistics

Speaker: Dr. Rui Duan
Data Scientist, Alphabet’s Verily Life Sciences
Email: jgronsbell@google.com

Abstract: The widespread adoption of electronic health records (EHR) and their subsequent linkage to specimen biorepositories has generated massive amounts of routinely collected medical data for use in translational research. These integrated data sets enable real-world predictive modeling of disease risk and progression. However, data heterogeneity and quality issues impose unique analytical challenges to the development of EHR-based prediction models. For example, ascertainment of validated outcome information, such as presence of a disease condition or treatment response, is particularly challenging as it requires manual chart review. Outcome information is therefore only available for a small number of patients in the cohort of interest, unlike the standard setting where this information is available for all patients. In this talk I will discuss semi-supervised and weakly-supervised learning methods for predictive modeling in such constrained settings where the proportion of labeled data is very small. I demonstrate that leveraging unlabeled examples can improve the efficiency of model estimation and evaluation and in turn substantially reduce the amount of labeled data required for developing prediction models.


February 19, 2020 

***SPECIAL SEMINAR***

Topic: "Leveraging Digital Data for Clinical Research"

11:30am-12:30pm
AR Building, Hess Commons
Hosted by: Department of Biostatistics

Speaker: Dr. Jessica Gronsbell
Data Scientist, Alphabet’s Verily Life Sciences
Email: jgronsbell@google.com

Abstract: The growing availability and variety of healthcare data sources has provided unique opportunities for healthcare data integration and evidence synthesis, which can potentially accelerate knowledge discovery and enable better clinical decision making. However, many practical and technical challenges, such as data privacy, high dimensionality, and heterogeneity across different datasets, remain to be addressed. In this talk, I present several methods for effective integration of electronic health records and other healthcare datasets. Specifically, we develop communication-efficient distributed algorithms for joint analyses of multiple datasets without the need to share patient-level data. Our algorithms do not require iterative communication across sites, and are able to account for heterogeneity across different datasets. We provide theoretical guarantees for the performance of our algorithms, and examples of applying the algorithms in real-world clinical research networks, including the Observational Health Data Sciences and Informatics (OHDSI) network and the National Patient-Centered Clinical Research Network (PCORnet).


February 21, 2020 

Topic: "Penalized Empirical Likelihood for the Sparse Cox Model"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Ian McKeague 

Speaker: Dr. Yichuan Zhao
Professor, Department of Mathematics and Statistics, Georgia State University
Email: yichuan@gsu.edu

Abstract: Current penalized regression methods for selecting predictor variables and estimating the associated regression coefficients in the Cox model are mainly based on partial likelihood. In this paper, an empirical likelihood method is proposed for the Cox model in conjunction with appropriate penalty functions when the dimensionality of the data is high. Large-sample theoretical properties of the resulting estimator are established. Simulation studies suggest that empirical likelihood works better than partial likelihood in terms of selecting correct predictors without introducing more model errors. The well-known primary biliary cirrhosis data set is used to illustrate the proposed empirical likelihood method.

This is joint work with Dongliang Wang and Tong Tong Wu.


February 27, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, Hess Commons
Hosted by: Prakash Gorroochurn 

Speaker: Dr. Harry Crane
Associate Professor, Department of Statistics, Rutgers University
Email: hcrane@stat.rutgers.edu

Abstract: "TBA"


March 5, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Jianhua Hu 

Speaker: Dr. Jing Ning
Associate Professor, Department of Biostatistics, MD Anderson Cancer Center
Email: jning@mdanderson.org

Abstract: "TBA"


March 6, 2020

Topic: "Phylogenetically Informed Methods for Microbiome Data Analysis -- An Optimal Transport Perspective"

11:30am-12:30pm
AR Building, Main Floor, Hess Commons
Hosted by: Department of Biostatistics

Speaker: Dr. Shulei Wang
Postdoctoral Researcher, University of Pennsylvania
Email: Shulei.Wang@pennmedicine.upenn.edu

Abstract: It is increasingly recognized that human host-microbiome interactions play important roles in human health, from metabolism to the immune system. However, analysis of human microbiome data poses great challenges in robust quantification and interpretation, due to the unique characteristics of the data, including high-dimensional compositional data, stochastic noise in composition, and a phylogenetic tree structure among the variables. In the talk, I will discuss analysis strategies for such noisy tree-structured compositional data and share some of our recent efforts to ease such challenges from an optimal transport perspective. Specifically, I will present a new minimax optimal estimator for the Wasserstein distance on a tree and introduce a novel interpretable two-sample test that leverages the tree structure. The practical merit of the proposed methods is demonstrated by an application to a human intestinal biopsy microbiome data set for patients with inflammatory bowel disease.


March 12, 2020

Topic: "TBA"

11:30am-12:30pm
Hammer Building, LL107
Hosted by: Ian McKeague

Speaker: Dr. Rebecca Betensky
Professor and Chair, Department of Biostatistics, College of Global Public Health, NYU
Email: rebecca.betensky@nyu.edu

Abstract: "TBA" 


March 26, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Iuliana Ionita-Laza

Speaker: Dr. Xin He
Assistant Professor, Department of Human Genetics, The University of Chicago
Email: xinhe@uchicago.edu

Abstract: "TBA" 


March 30, 2020

Topic: "TBA"

11:30am-12:30pm
Hammer Building, LL107
Hosted by: Zhezhen Jin

Speaker: Dr. Joan Hu
Professor, Department of Statistics and Actuarial Science, Simon Fraser University
Email: joanh@stat.sfu.ca

Abstract: "TBA"


April 2, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Bin Cheng

Speaker: Dr. Ashkan Ertefaie
Assistant Professor, Department of Biostatistics and Computational Biology, University of Rochester
Email: Ashkan_Ertefaie@urmc.rochester.edu

Abstract: "TBA"


April 9, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Linda Valeri

Speaker: Dr. Jukka-Pekka Onnela
Associate Professor, Department of Biostatistics, Harvard University 
Email: onnela@hsph.harvard.edu

Abstract: "TBA"


April 16, 2020   

Topic: "TBA"

11:30am-12:30pm
Hammer Building, LL107
Hosted by: Yifei Sun

Speaker: Dr. Gary Chan
Associate Professor, Department of Biostatistics, University of Washington
Email: kcgchan@uw.edu

Abstract: "TBA"


April 23, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Christine Mauro

Speaker: Dr. Laura Hatfield
Associate Professor, Department of Health Care Policy, Harvard Medical School 
Email: Hatfield@hcp.med.harvard.edu

Abstract: "TBA"


May 1, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Ying Wei

Speaker: Dr. Tianxi Cai
John Rock Professor of Population and Translational Data Sciences, Department of Biostatistics, Harvard University 
Email: tcai@hsph.harvard.edu

Abstract: "TBA"


May 7, 2020

Topic: "TBA"

11:30am-12:30pm
AR Building, 8th Floor Auditorium
Hosted by: Seonjoo Lee

Speaker: Dr. Chaeryon Kang
Assistant Professor, Department of Biostatistics, University of Pittsburgh 
Email: crkang@pitt.edu

Abstract: "TBA"