Levin Lecture Series: Spring 2019 Colloquium Seminars

January 24, 2019

Topic: "Towards a Biology of Resilience and Health: A Public Health Paradigm"

Speaker: Dr. Karen Bandeen-Roche, Professor and Chair, Department of Biostatistics, Johns Hopkins University

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Linda P. Fried, MD, MPH

Abstract: Recent advances in low-cost metagenomic and amplicon sequencing techniques enable routine sampling of environmental and host-associated microbial communities across different habitats. The data produced by these large-scale surveys typically comprise relative abundances (or compositions) of microbial taxa at different taxonomic levels. To investigate the dependency of additional covariate measurements such as metabolites or host phenotypes on the microbial compositions we introduce a general robust regression framework for compositional data. We propose a novel log-contrast regression model with mean shift parameters that allows the identification of sample outliers and maintains sub-compositional coherence with respect to the associated phylogenetic tree. The model is estimated using a sparse penalized regression approach that simultaneously enforces sparsity in mean shift and covariate parameters. We demonstrate the superiority of our approach using a wide range of synthetic simulation scenarios and infer novel associations between body mass index measurements and human gut microbes on a large public collection of human gut microbiome data.


January 31, 2019

Topic: "Let them eat cake (first)!"

Speaker: Dr. Mine Cetinkaya-Rundel, Director of Undergraduate Studies and Associate Professor, Department of Statistical Science, Duke University

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Christine Mauro

Abstract: Backwards design, designing educational curricula by setting goals before choosing instructional methods and forms of assessment, is a widely accepted approach to course development. In this talk we introduce a course design approach inspired by backwards design, where students are exposed to the results and findings of a data analysis first and then learn about the building blocks of the methods and techniques used to arrive at these results. We present this approach in the context of an introductory data science course that focuses on exploratory data analysis, modeling, and effective communication, while requiring reproducibility and collaboration. The talk is organized in three parts (visualization, data acquisition, and modeling) and features examples of in-class activities, details of the course curriculum, and sample student work.


February 7, 2019

Topic: "CANCELLED"

Speaker: Max Kuhn, Software Engineer, RStudio, Inc.

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Jeff Goldsmith

Abstract: "CANCELLED"


February 14, 2019

Topic: "Disease progression model with real-world observational data"

Speaker: Zhaonan Sun, Research Scientist, IBM Research

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Yuanjia Wang

Abstract: Chronic diseases such as COPD and Huntington's disease progress slowly over extended periods. The knowledge of how progression is affected by differing patient characteristics is crucial for informing clinical decisions. However, understanding disease progression from real-world data is challenging. Not only are observations noisy and irregular in time, but the rate of progression may also vary significantly across patients. In addition, different stages of the target disease may have unbalanced coverage in the observational data. We propose a disease progression model with covariates to tackle two difficulties when modeling disease progression with observational data. First, the proposed model explicitly accounts for patient-level heterogeneity in progression by conditioning on patient characteristics. Second, the model mitigates the difficulties caused by unbalanced samples by leveraging multi-task learning structures. We demonstrate the capabilities of the proposed model through both simulation studies and an application to real-world disease registry data.


February 21, 2019

Topic: "An Iterative Penalized Least Squares Approach to Sparse Canonical Correlation Analysis"

Speaker: Xin Zhang, Assistant Professor, Department of Statistics, Florida State University

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Gen Li

Abstract: It is increasingly interesting to model the relationship between two sets of high-dimensional measurements with potentially high correlations. Canonical correlation analysis (CCA) is a classical tool that explores the dependency of two multivariate random variables and extracts canonical pairs of highly correlated linear combinations. Driven by applications in genomics, text mining, and imaging research, among others, many recent studies generalize CCA to high-dimensional settings. However, most of them either rely on strong assumptions on covariance matrices or do not produce nested solutions. We propose a new sparse CCA (SCCA) method that recasts high-dimensional CCA as an iterative penalized least squares problem. Thanks to the new iterative penalized least squares formulation, our method directly penalizes and estimates the sparse CCA directions with efficient algorithms. Therefore, in contrast to some existing methods, the new SCCA does not impose any sparsity assumptions on the covariance matrices. The proposed SCCA is also very flexible in the sense that it can be easily combined with properly chosen penalty functions to perform structured variable selection and incorporate prior information. Moreover, our proposed SCCA produces nested solutions and thus provides great convenience in practice. Theoretical results show that SCCA can consistently estimate the true canonical pairs with an overwhelming probability in ultra-high dimensions. Numerical results also demonstrate the competitive performance of SCCA.


February 28, 2019

Topic: "A Parsimonious Personalized Dose Finding Model via Dimension Reduction"

Speaker: Ruoqing Zhu, Assistant Professor, Department of Statistics, University of Illinois at Urbana-Champaign

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Yifei Sun

Abstract: Learning an individualized dose rule (IDR) in personalized medicine is a challenging statistical problem. Existing methods for estimating the optimal IDR often suffer from the curse of dimensionality, especially when the IDR is learned nonparametrically using machine learning approaches. To tackle this problem, we propose a dimension reduction framework. The proposed framework exploits the fact that the IDR can be reduced to a nonparametric function that relies only on a few linear combinations of the original covariates, hence leading to a more parsimonious model. To achieve this, we propose two approaches: a direct learning approach that yields the IDR as commonly desired in personalized medicine, and a pseudo-direct learning approach that focuses more on learning the dimension reduction space. Under regularity assumptions, we provide the convergence rate for the semiparametric estimators and Fisher consistency properties for the corresponding value function. For the pseudo-direct learning estimator, we use an orthogonality-constrained optimization approach on the Stiefel manifold to update the dimension reduction space. For the direct learning approach, we use an alternative updating scheme that iteratively updates the dimension reduction space and the nonparametric optimal dose rule function. The performance of the proposed methods is evaluated through simulation studies and a warfarin pharmacogenetic dataset.


March 6, 2019

Topic: "Object Oriented Data Analysis"

Speaker: Steve Marron, Amos Hawley Distinguished Professor, Department of Statistics, University of North Carolina at Chapel Hill

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Gen Li

Abstract: Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. In non-Euclidean analysis, the approach of Backwards PCA is seen to be quite useful. An overview of insightful mathematical statistics for object data is given, based on High Dimension Low Sample Size asymptotics, where the dimension grows, but the sample size is fixed.


March 7, 2019

Topic: "A modular framework for seamless oncology trials"

Speaker: Phil Boonstra, Assistant Professor, Department of Biostatistics, University of Michigan

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Ken Cheung

Abstract: With a more sophisticated understanding of the etiology and mechanisms of cancer, and more therapeutic options than ever in this domain, phase I oncology trials today are frequently tasked with multiple primary objectives. More specifically, many such designs are now ‘seamless’ in nature, meaning that, in addition to estimating a maximum tolerated dose, initial efficacy estimates at this dose level in one or more subpopulations are sought. Trial sponsors are often disinclined to proceed with further study in the absence of this additional efficacy evidence. However, with this growing complexity in trial design, it becomes challenging to analytically calculate even fundamental operating characteristics of these trials, such as (i) what is the probability that the design will identify an acceptable, i.e., safe and efficacious, dose level? or (ii) how many patients will be assigned to an acceptable dose level on average? To that end, in this talk, I propose a new modular framework for thinking about and designing seamless oncology trials. Each module is either a dose assignment step or a dose-efficacy evaluation, and multiple such modules can be implemented sequentially. Just as important, I also present a trial simulator to numerically estimate the operating characteristics of such modular trials. Together, this design framework and its accompanying simulator allow the clinical trialist to compare multiple candidate designs, more rigorously assess performance, better justify sample sizes, and ultimately select a higher-quality design.


March 14, 2019

Topic: "Statistical Inference as Severe Testing (How it Gets You Beyond the Statistics Wars)"

Speaker: Deborah Mayo, Professor of Philosophy, Department of Philosophy, Virginia Tech

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Prakash Gorroochurn

Abstract: High-profile failures of replication in the social and biological sciences underwrite a minimal requirement of evidence: If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. This minimal severe-testing requirement leads to reformulating significance tests (and related methods) to avoid familiar criticisms and abuses. Viewing statistical inference as severe testing, whether or not you accept it, offers a key to understanding and getting beyond the statistics wars.


March 28, 2019

Topic: "A Tuning-free Approach to High-dimensional Regression"

Speaker: Lan Wang, Professor, School of Statistics, University of Minnesota

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Min Qian

Abstract: We introduce a new tuning-free approach for high-dimensional regression with theoretical guarantees. The new procedure possesses several appealing properties simultaneously. Computationally, it can be efficiently solved via linear programming with an easily simulated tuning parameter, which automatically adapts to both the unknown random error distribution and the correlation structure of the design matrix. It is robust, with substantial efficiency gain for heavy-tailed random errors, while maintaining high efficiency for normal random errors. It enjoys an essential scale-equivariance property that permits coherent interpretation when the response variable undergoes a scale transformation, a desirable property possessed by the classical least squares estimator but lost by Lasso and its variants. Under weak conditions for the random error distribution, we establish a finite-sample error bound with a near-oracle rate for the new estimator with the simulated tuning parameter. (Joint work with Bo Peng, Jelena Bradic, Runze Li, and Yunan Wu.)


April 4, 2019

Topic: "Spectral and heritability analysis of EEG time series data using a nested Dirichlet process"

Speaker: Mark Fiecas, Assistant Professor, School of Public Health, Division of Biostatistics, University of Minnesota

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Todd Ogden

Abstract: In this talk, we will analyze the spectral features of resting-state EEG time series data collected from twins enrolled in the Minnesota Twin Family Study (MTFS). Our goal is to calculate the heritability of the spectral features of the resting EEG data. Due to the twin design of the MTFS, the time series will have similar underlying characteristics across individuals. To account for this, we develop a Bayesian nonparametric modeling approach for estimating the spectral densities of the EEG data. In our methodology, we use Bernstein polynomials and a Dirichlet process (DP) to estimate each subject-specific spectral density. In order to estimate the spectral densities for the entire sample, we nest this model using a nested DP. Thus, the top-level DP clusters individuals with similar spectral densities and the bottom-level dependent DP fits a functional curve to the individuals within that cluster. We then extract relevant spectral features from the estimates of the spectral densities and estimate their heritability. This is joint work with PhD student Brian Hart (Univ. of Minnesota), Dr. Michele Guindani (UC Irvine), and Dr. Stephen Malone (Univ. of Minnesota).


April 11, 2019

Topic: "Instrumental Variable Learning of Marginal Structural Models"

Speaker: Eric Tchetgen Tchetgen, Luddy Family President's Distinguished Professor, The Wharton School, Statistics Department, University of Pennsylvania

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Caleb Miles

Abstract: In a seminal paper, Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time. We extend Robins' MSM theory by considering identification of MSM parameters with the aid of a time-varying instrumental variable, when sequential randomization fails to hold due to unmeasured confounding. Our identification conditions essentially require that no unobserved confounder predicts compliance type at each follow-up time. Under this assumption, we obtain a large class of semiparametric estimators that extends standard inverse-probability weighting (IPW) and includes multiply robust estimators, including a locally semiparametric efficient estimator. The approach provides a unified solution to IV inference from point exposure to time-varying exposure settings, including mean models with possibly nonlinear link functions, quantile MSMs and time to event models such as Cox MSMs. Finally, we briefly discuss recent robust IV methods that further allow for violation of the core IV identifying condition, the exclusion restriction assumption, without compromising inference.


April 18, 2019

Topic: "The unreasonable effectiveness of public work"

Speaker: David Robinson, Chief Data Scientist, DataCamp

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Jeff Goldsmith

Abstract: In this talk, I'll lay out the reasons that blogging, open source contribution, and other forms of public work are a critical part of a data science career. For beginners, a blog is a great accompaniment to data science coursework and tutorials, since it gives you experience applying practical data science skills to real problems. For data scientists at any stage of their careers, open source development offers practice in collaboration, documentation, and interface design that complement other kinds of software development. And for data scientists more advanced in their careers, writing a book is a great way to crystallize your expertise and ensure others can build on it. All of these practices build skills in communication and collaboration that form an essential component of data science work. Each also lets you build a public portfolio of your skills, get feedback from your peers, and network with the larger data science community.

* David Robinson is the Chief Data Scientist at DataCamp. He has previously worked as a data scientist at Stack Overflow and received his PhD from Princeton University. He is the co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R, as well as the author of the broom and fuzzyjoin R packages and of the e-book Introduction to Empirical Bayes. He writes about R, statistics and education on his blog Variance Explained, on Twitter as @drob, and in weekly coding screencasts on YouTube.


April 25, 2019

Topic: "CANCELLED"

Speaker: Jing Ning, Associate Professor, Department of Biostatistics, The University of Texas MD Anderson Cancer Center

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Jianhua Hu

Abstract: "CANCELLED"


May 2, 2019

Topic: "Sensitivity analyses for unobserved effect moderation when generalizing from trial to population"

Speaker: Elizabeth Stuart, Professor, Department of Mental Health & Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Christine Mauro

Abstract: In the presence of treatment effect heterogeneity, the average treatment effect (ATE) in a randomized controlled trial (RCT) may differ from the average effect of the same treatment if applied to a target population of interest. But for policy purposes we may desire an estimate of the target population ATE. If all treatment effect moderators are observed in the RCT and in a dataset representing the target population, then we can obtain an estimate for the target population ATE by adjusting for the difference in the distribution of the moderators between the two samples. However, that is often an unrealistic assumption in practice. This talk will discuss methods for generalizing treatment effects under that assumption, as well as sensitivity analyses for two situations: (1) where we cannot adjust for a specific moderator observed in the RCT because we do not observe it in the target population; and (2) where we are concerned that the treatment effect may be moderated by factors not observed even in the RCT. Outcome-model and weighting-based sensitivity analysis methods are presented. The methods are applied to examples in drug abuse treatment. Implications for study design and analysis when interest is in a target population ATE are also discussed.


May 9, 2019

Topic: "TBA"

Speaker: Stacia DeSantis, Associate Professor, Department of Biostatistics, The University of Texas Health Science Center at Houston

11:45am-12:45pm

AR Building, 8th Floor Auditorium

Hosted by: Dr. Cody Chiuzan

Abstract: "TBA"