Levin Lecture Series: Fall 2018 Colloquium Seminars

September 13, 2018

Topic: "Robust regression with compositional covariates"

Speaker: Aditya Mishra, Research Fellow, Flatiron Institute's Center for Computational Biology


AR Building, 8th Floor Auditorium

Hosted by: Dr. Ian McKeague

Abstract: Recent advances in low-cost metagenomic and amplicon sequencing techniques enable routine sampling of environmental and host-associated microbial communities across different habitats. The data produced by these large-scale surveys typically comprise relative abundances (or compositions) of microbial taxa at different taxonomic levels. To investigate the dependency of additional covariate measurements such as metabolites or host phenotypes on the microbial compositions, we introduce a general robust regression framework for compositional data. We propose a novel log-contrast regression model with mean shift parameters that allows the identification of sample outliers and maintains sub-compositional coherence with respect to the associated phylogenetic tree. The model is estimated using a sparse penalized regression approach that simultaneously enforces sparsity in mean shift and covariate parameters. We demonstrate the superiority of our approach using a wide range of synthetic simulation scenarios and infer novel associations between body mass index measurements and human gut microbes on a large public collection of human gut microbiome data.


September 20, 2018

Topic: "Quantitation in Colocalization Analysis: Beyond 'Red + Yellow = Green'"

Speaker: Ming Yuan, Professor, Department of Statistics, Columbia University


AR Building, 8th Floor Auditorium

Hosted by: Dr. Bin Cheng

Abstract: "I see yellow; therefore, there is colocalization." Is it really so simple when it comes to colocalization studies? Unfortunately, and fortunately, no. Colocalization is in fact a supremely powerful technique for scientists who want to take full advantage of what optical microscopy has to offer: quantitative, correlative information together with spatial resolution. Yet, methods for colocalization have been put into doubt now that images are no longer considered simple visual representations. Colocalization studies have notoriously been subject to misinterpretation due to difficulties in robust quantification and, more importantly, reproducibility, which results in a constant source of confusion, frustration, and error. In this talk, I will share some of our efforts and progress toward easing these challenges using novel statistical and computational tools.


September 27, 2018 

Topic: "Integrative Directed Cyclic Graphical Models with Heterogeneous Samples"

Speaker: Alex Dmitrienko, Founder and President, Mediana Inc.


AR Building, 8th Floor Auditorium

Hosted by: Dr. Naitee Ting

Abstract: This presentation will provide an overview of multiplicity issues arising in confirmatory clinical trials and introduce a general framework for constructing multiple testing procedures (MTPs) for trials with hierarchical objectives (also known as gatekeeping procedures). Multiplicity problems with hierarchical objectives frequently exhibit a complex structure, including multiple families of hypotheses (based on several clinical endpoints, dose-control comparisons, patient populations, etc.) and logical restrictions. The proposed mixture-based approach relies on combining multiple tests across families and supports powerful and flexible MTPs, e.g., multi-stage procedures based on Hochberg-type tests that account for logical restrictions among the hypotheses of interest. Clinical trial examples with traditional and adaptive designs will be used to illustrate the mixture-based approach, and software implementation of mixture-based gatekeeping procedures will be discussed.


October 4, 2018

Topic: "Interval Dose-Finding Designs: From Model-Based to Model-Free?"

Speaker: Yuan Ji, Professor, Department of Public Health Sciences, The University of Chicago


AR Building, 8th Floor Auditorium

Hosted by: Dr. Codruta Chiuzan

Abstract: In this talk, I will introduce a set of interval dose-finding designs, including the mTPI, mTPI-2, BOIN, and Keyboard designs. I will provide a theoretical and practical overview of these methods, their frequentist performance, and their pros and cons when compared to standard designs such as 3+3. Model-based designs such as CRM and BLRM will also be discussed. Finally, a new design called the i3+3 design, which is completely model-free, will be introduced. The i3+3 design stands for the interval 3+3 design, which uses a set of simple rules for dose escalation. The performance of the i3+3 design is compared with the interval designs, the 3+3 design, and the CRM and BLRM designs. A new metric that summarizes the operating characteristics into a single value is used to evaluate all the designs.


October 11, 2018

Topic: "Geometric methods for image-based statistical analysis of brain tumors"

Speaker: Sebastian Kurtek, Associate Professor, Department of Statistics, Ohio State University


AR Building, 8th Floor Auditorium

Hosted by: Dr. Todd Ogden

Abstract: Biomedical studies are a common source of rich and complex imaging data. The statistical analysis of such datasets requires novel methodological developments due to two main challenges: (1) the functional nature of the data objects under study, and (2) the nonlinearity of their representation spaces. In this work, we consider the task of quantifying and analyzing two different types of tumor heterogeneity. The first type, which is represented by a probability density function, summarizes the tumor’s texture information. We use the nonparametric Fisher-Rao Riemannian framework to develop intrinsic statistical methods on the space of probability density functions for summarization and inference. The second type, which is represented by a parameterized, planar closed curve, captures the tumor’s shape information. A key component of analyzing tumor shapes is a suitable metric that enables efficient comparisons, provides tools for computing descriptive statistics and implementing principal component analysis on the tumor shape space, and allows for a rich class of continuous deformations of tumor shape. We demonstrate the utility of our framework on a dataset of Magnetic Resonance Images of patients diagnosed with Glioblastoma Multiforme, a malignant brain tumor with poor prognosis. This work was done in collaboration with my PhD student Abhijoy Saha, as well as colleagues Karthik Bharath (University of Nottingham), Veera Baladandayuthapani (University of Michigan) and Arvind Rao (University of Michigan).


October 18, 2018

Topic: "Scalable and Model-free Methods for Multiclass Probability Estimation"

Speaker: Helen Zhang, Professor, Department of Mathematics, University of Arizona


AR Building, 8th Floor Auditorium

Hosted by: Dr. Seonjoo Lee

Abstract: Classical approaches for multiclass probability estimation are mostly model-based, such as logistic regression or LDA, and make certain assumptions about the underlying data distribution. We propose a new class of model-free methods to estimate class probabilities based on large-margin classifiers. The method is scalable for high-dimensional data by employing the divide-and-conquer technique, which solves multiple weighted large-margin classifiers and then constructs probability estimates by aggregating multiple classification rules. Without relying on any parametric assumption, the estimates are shown to be consistent asymptotically. Both simulated and real data examples are presented to illustrate the performance of the new procedure.

This is joint work with Xin Wang and Yichao Wu.


October 25, 2018

Topic: "Data Science in the Precision Medicine Era:  Will Statisticians Lead or Follow?"

Speaker: Yu Shyr, Professor and Chair, Department of Biostatistics, Vanderbilt University


AR Building, 8th Floor Auditorium

Hosted by: Dr. Shing Lee

Abstract: The key concepts of precision medicine are prevention and treatment strategies that take individual molecular profiles and clinical information into account. Single-cell next-generation sequencing technologies (NGS), liquid biopsy for circulating tumor DNA (ctDNA) analysis, microbiomics, radiomics, and other types of high-throughput assays have exploded in popularity in recent years, thanks to their ability to produce an enormous volume of data quickly and at relatively low cost. The emergence of these big data has advanced the goals of precision medicine; however, across the entire continuum of big data capture and utilization, many more challenges lie ahead, from analysis of high-throughput biomarkers, to maximum exploitation of the electronic health record (EHR), to the ultimate goal of clinical guidance based on a patient’s genome. Because of these challenges, the statistics profession is in a period of disruptive change. This change has been long in coming: John Tukey called for a reformation of academic statistics almost 60 years ago. He pointed to the existence of an as-yet unrecognized science in his book, The Future of Data Analysis. More than ten years ago, John Chambers, Bill Cleveland, and Leo Breiman independently urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics. Cleveland even suggested the catchy name, “Data Science,” for his envisioned field. Today, the statistical community faces a crucial moment; if we do not participate in the data revolution, we will be marginalized; if we do not adapt our mindset, we will find ourselves relegated to a supporting role on the data science stage; if we do not educate our students on new concepts in statistics, we will be less and less successful in passing the statistical torch.

In this presentation, I will offer some perspectives on the changing landscape for statistical science, including the concept of treating unstructured text as quantitative data; the need for statisticians to adjust their mindset around the explosive growth in information technology; machine learning; and the AI revolution. These areas present great opportunities for our profession to strengthen our role in the data science arena. I will finish up with some thoughts about future developments.


November 1, 2018

Topic: "Double Deep Learning for Adjusting Complex Confounding Structures In Observational Data"

Speaker: Fei Zou, Professor, Department of Biostatistics, University of North Carolina at Chapel Hill


AR Building, 8th Floor Auditorium

Hosted by: Dr. Jianhua Hu

Abstract: Complex confounding structures are often embedded in observational data, including electronic medical record (EMR) data. A robust yet efficient double deep learning approach is proposed to adjust for the complex confounding structures in comparative effectiveness analysis of EMR data. Specifically, deep neural networks are employed to estimate the conditional expectations of the outcome and the treatment allocation given observed baseline covariates under a semiparametric framework. An improved estimation scheme is further developed to enhance the finite sample performance of the proposed method. Comprehensive numerical studies have shown superior performance of the proposed method, as compared with other existing methods, in terms of reduced bias and mean squared error of the treatment effect estimate.


November 15, 2018

Topic: "Sufficient Dimension Reduction and Covariate Overlap in Causal Inference"

Speaker: Debashis Ghosh, Professor and Chair, Department of Biostatistics and Informatics, Colorado School of Public Health


Hammer Building, Room LL205

Hosted by: Dr. Gen Li

Abstract: There has been major interest recently in estimating causal effects in observational settings. This is driven by interest in performing comparative effectiveness research studies and by the availability of electronic data warehouses in academic health centers. In this talk, we explore the use of sufficient dimension reduction methodology to estimate causal effects. We make two surprising findings: (1) the methodology allows for relaxation of the standard covariate overlap assumptions; (2) doing so leads to super-efficient estimators. This leads to the second part of the talk, which seeks to better characterize strict overlap with high-dimensional confounders. This work is joint with Wei Luo (CUNY Baruch College), Yeying Zhu (University of Waterloo) and Efrén Cruz-Cortés (Colorado).


November 29, 2018

Topic: "Statistical methods for correlating microbiome and other –omics data"

Speaker: Michael Wu, Associate Member, Public Health Sciences Division, Fred Hutch Cancer Research Center


AR Building, 8th Floor Auditorium

Hosted by: Dr. Iuliana Ionita-Laza

Abstract: Understanding the relationship between microbiome and other omics data types is important both for obtaining a more comprehensive view of biological systems and for elucidating mechanisms underlying outcomes and response to exposures. However, the key features of microbiome data, including high-dimensionality, compositionality, sparsity, phylogenetic constraints, and complexity of relationships among taxa, pose a grand challenge for statistical analysis. This is compounded by the inherent complexity of the other omics data types. Recognizing these difficulties, we propose new methods both for studying community-level correlations between microbiome and other data types and for correlating individual omic features with community composition. In particular, we use a generalized measure of multivariate dependence called the kernel RV coefficient, which can efficiently measure dependence between microbiome and other omics data while accommodating important structure in the data. Simulation studies show that our approach can often accurately identify true associations while correctly controlling the false positive rate. We illustrate our approach on a study examining the association between host genomics and microbiome composition in IBD patients.


December 6, 2018

Topic: "Assessing health insurance coverage in Florida – opportunities for research about combining results from different sources, visualization and informative sampling"

Speaker: Joseph Sedransk, Professor, Joint Program in Survey Methodology, University of Maryland


AR Building, 8th Floor Auditorium

Hosted by: Dr. Qixuan Chen

Abstract: Inference about county-level health insurance coverage in Florida presents opportunities for research to fill gaps in the survey sampling literature. These opportunities include combining results from different sources, taking proper account of selection effects, and visualizing uncertainty in maps.


December 13, 2018

Topic: "Multiply Robust Quantile Estimation with Missing Data"

Speaker: Peisong Han, Assistant Professor, Department of Biostatistics, University of Michigan


AR Building, 8th Floor Auditorium

Hosted by: Dr. Zhezhen Jin

Abstract: Quantiles provide a more complete picture of a data distribution compared to the mean and are of major interest in many cases. Quantile estimation is often complicated by the presence of missing data, and there has been only a limited literature dealing with this problem. We propose a general framework that combines the two widely adopted approaches for missing data analysis, the imputation approach and the inverse probability weighting approach. The proposed method allows multiple working models for both the missingness probability and the data distribution. The resulting estimators are multiply robust in the sense that they are consistent if any one of these models is correctly specified. Our proposed method is capable of dealing with many different missingness settings, including the estimation of both the marginal quantiles and the conditional quantiles for quantile regression with missing responses and/or covariates, with or without extra auxiliary variables. As an illustration, we will reanalyze the data collected from the AIDS Clinical Trials Group Protocol 175 (ACTG 175).