Levin Lecture Series: Spring 2018 Colloquium Seminars

January 25, 2018

Topic: "Causal Inference for a Single Group of Causally-Connected Units Under Stratified Interference"

Speaker: Caleb Miles, Postdoctoral Fellow, Department of Biostatistics, Epidemiology and Informatics, University of California-Berkeley

11:30am-12:30pm

 

AR Building, 8th Floor Auditorium

 

Hosted by: Dr. Jeff Goldsmith

 

Abstract: The assumption that no subject's exposure affects another subject's outcome, known as the assumption of no interference, has long held a foundational position in the study of causal inference. However, this assumption may be violated in many settings, and in recent years has been relaxed considerably. Often this has been achieved with either the aid of knowledge of an underlying network, or the assumption that the population can be partitioned into separate groups, between which there is no interference, and within which each subject's outcome may be affected by all the other subjects in the group, but only as a function of the total number of subjects exposed (the stratified interference assumption). In this paper, we consider a setting in which we can rely on neither of these aids, as each subject affects every other subject's outcome. In particular, we consider settings in which the stratified interference assumption is reasonable for a single group consisting of the entire sample, i.e., a subject's outcome is affected by all other subjects' exposures, but only via the total number of subjects exposed. This can occur when the exposure is a shared resource whose efficacy is modified by the number of subjects among whom it is shared. We present a doubly-robust estimator that allows for incorporation of machine learning, and tools for inference for a class of causal parameters that includes direct effects and overall effects under certain interventions. We conduct a simulation study, and present results from a data application where we study the effect of a nurse-based triage system on the outcomes of patients receiving HIV care in Kenyan health clinics.

 

FEBRUARY 1, 2018

Topic: "Association tests for neuroimaging studies of development and disorders"

Speaker: Simon Vandekar, PhD Candidate, Department of Statistics and Data, University of Pennsylvania

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. JEFF GOLDSMITH

 

Abstract: In development and adulthood, cognitive symptoms of disorders are preceded by neuroanatomical abnormalities that are indicative of disease vulnerability. There are growing bodies of literature identifying neuroanatomical markers of psychosis in adolescence and dementia in adulthood. Prevention is an increasing target of intervention for reducing the burden of these disorders. Establishing early biomarkers for disease is critical for identifying individuals who are at risk and may benefit from early therapy. In this talk, we will briefly discuss tools developed in collaborative work to characterize healthy developmental patterns of neuroanatomy and cerebral blood flow through adolescence. We will then focus in depth on identifying neuroanatomical features associated with Alzheimer’s disease risk.  We propose and develop a framework for testing the association of a high-dimensional imaging measurement with a diagnostic outcome, and for localizing signal to identify regions of the brain that are associated with disease. Our procedure is based on a modification of the score test that projects the imaging data to a lower dimensional subspace. Local regional inference can then be performed using the score statistics that are projected into the lower dimensional space, which have smaller variance and degrees of freedom.

 

FEBRUARY 5, 2018 (MONDAY SEMINAR)

Topic: "Integrative Directed Cyclic Graphical Models with Heterogeneous Samples"

Speaker: Yang Ni, Postdoctoral Fellow, Department of Statistics and Data Sciences, University of Texas-Austin

11:30AM-12:30PM

 

AR BUILDING, HESS COMMONS

 

HOSTED BY: DR. JEFF GOLDSMITH

 

Abstract: In this talk, I will introduce hierarchical directed cyclic graphical models to infer gene networks by integrating genomic data across platforms and across diseases. The proposed model takes into account tumor heterogeneity. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. A novel thresholding prior is applied to induce sparsity in the estimated networks, and its connection to spike-and-slab and non-local priors will also be discussed. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. Two applications with multiplatform genomic data for multiple cancers will be presented to illustrate the utility of our model.

 

FEBRUARY 8, 2018

Topic: "Causal Inference with Unmeasured Confounding: an Instrumental Variable Approach"

Speaker: Linbo Wang, Postdoctoral Fellow, Department of Biostatistics, Harvard University

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. JEFF GOLDSMITH

 

Abstract: Causal inference is a challenging problem because causation cannot be established from observational data alone. Researchers typically rely on additional sources of information to infer causation from association. Such information may come from powerful designs such as randomization, or background knowledge such as information on all confounders. However, perfect designs or background knowledge required for establishing causality may not always be available in practice. In this talk, I use novel causal identification results to show that the instrumental variable approach can be used to combine the power of design and background knowledge to draw causal conclusions. I also introduce novel estimation tools to construct estimators that are robust, efficient and enjoy good finite sample properties. These methods will be discussed in the context of a randomized encouragement design for a flu vaccine.

 

FEBRUARY 12, 2018 (Monday)

Topic: "Inference for statistical interactions under misspecified or high-dimensional main effects"

Speaker: Zihuai He, Postdoctoral Fellow, Department of Biostatistics, Columbia University

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. JEFF GOLDSMITH

 

Abstract: An increasing number of multi-omic studies (e.g., genetics, genomics, epigenetics) have generated complex high-dimensional data. A primary focus of these studies is to determine whether exposures interact in the effect that they produce on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product between those exposures. When the main interest lies in interactions, the standard approach is not satisfactory because it is prone to (possibly severe) Type I error inflation when the main exposure effects are misspecified or high-dimensional. I will propose generalized score type tests for high-dimensional interaction effects on correlated outcomes. I will also discuss the theoretical justification of some empirical observations regarding Type I error control, and introduce solutions to achieve robust inference for statistical interactions. The proposed methods will be illustrated using an example from the Multi-Ethnic Study of Atherosclerosis (MESA), investigating interaction between measures of neighborhood environment and genetic regions on longitudinal measures of blood pressure over a study period of about seven years with four exams.

 

FEBRUARY 15, 2018 

Topic: "Is most published research really false?"

Speaker: Jeff Leek, Associate Professor, Department of Biostatistics, Johns Hopkins University

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. GEN LI

 

Abstract: There has been increasing concern in both the scientific and popular press that most published research is false. In this talk I will discuss a framework for defining false discoveries in the medical literature and present estimates of the science-wise false discovery rate across science using a new regression modeling approach and data from 2.5 million published p-values. I will discuss how data analyst choices are a major contributor to the science-wise false discovery rate and how we are performing human-data interaction experiments to address these problems.

 

FEBRUARY 22, 2018 

Topic: "Fine Mapping and Allelic Heterogeneity"

Speaker: Eleazar Eskin, Professor, Departments of Computer Science and Human Genetics, UCLA

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. IULIANA IONITA-LAZA

 

Abstract: Genome Wide Association Studies (GWAS) have identified many genomic regions or loci which harbor genetic variants that affect traits. However, within each of these regions, there are many genetic variants which are associated with the trait, yet most of these variants do not have a direct effect on the trait. The process of identifying the actual variant in the region which has an effect on the disease is referred to as “fine mapping.” In addition to finding the actual variants affecting a disease, fine mapping also seeks to address questions that are related to the genetic basis of disease. First, how many causal variants does a locus contain? A disease could be caused by a single variant or by multiple variants that independently affect disease status. We refer to the latter phenomenon as allelic heterogeneity (AH). Second, when analyzing results from multiple GWASes, do the same causal variants affect both traits, or do different variants affect each trait? Differentiating between shared and distinct causal variants is referred to as colocalization. In this talk, I present recent work from our group on fine mapping methods which provide a framework for identifying causal variants and can be applied to discover and quantify allelic heterogeneity and colocalization.

 

MARCH 1, 2018

Topic: "Statistical Design and Methods for a Scientific Breakthrough Study in HIV/AIDS Research"

Speaker: Ying Qing Chen, Full Member Board of Governors, Program in Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. ZHEZHEN JIN

 

 

Abstract: The HIV Prevention Trials Network (HPTN) 052 Study is a Phase III, controlled, randomized clinical trial to assess the effectiveness of immediate versus delayed antiretroviral therapy strategies on sexual transmission of HIV-1 (Cohen, et al., 2011). It was hailed by Science magazine as the Scientific Breakthrough of the Year for 2011 (Alberts, 2011). In this talk, we will focus on the design and methods that underlie this successful study in HIV Treatment-as-Prevention, and discuss the lessons that we have learned for future research.

References:
Alberts, B (2011) Science breakthroughs, Science, 334: 1604
Cohen, MS, Chen, YQ, McCauley, M, et al. (2011) Prevention of HIV-1 infection with early antiretroviral therapy, New England Journal of Medicine, 365: 493-505

 

MARCH 8, 2018 

Topic: "Hypothesis testing as a game"

Speaker: Glenn Shafer, Professor, Rutgers Business School – Newark and New Brunswick

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. PRAKASH GORROOCHURN

 

 

 

Abstract: The correspondence between Pascal and Fermat is one of the most famous milestones in the history of probability.  They differed on how to solve the problem of dividing stakes, but they arrived at the same answer.  Their different approaches were both rooted in a long history of reasoning about games, but Pascal’s was more purely game-theoretic and has something to tell us about our current troubles with p-values.

 

 

MARCH 22, 2018 

Topic: "On the interval-based dose-finding designs"

Speaker: Yuan Ji, Director, Program of Computational Genomics & Medicine, NorthShore University HealthSystem

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. CODRUTA "CODY" CHIUZAN

 

Abstract: The landscape of dose-finding designs for phase I clinical trials has shifted rapidly in recent years, noticeably marked by the emergence of interval-based designs. We categorize them as the iDesigns and the IB-Designs. The iDesigns originated with the toxicity probability interval (TPI) design and its two modifications, the mTPI and mTPI-2 designs. The IB-Designs started with the cumulative cohort design (CCD) and were recently extended by the BOIN design. We discuss the differences and similarities between these two classes of interval-based designs, and compare their simulation performance with popular non-interval designs, such as the CRM and 3+3 designs. We also show that in addition to the population-level operating characteristics from simulated trials, investigators should assess the dose-finding decision tables from the implemented designs to better understand the per-trial and per-patient behavior. This is particularly important for nonstatisticians to assess the designs with transparency. We provide a comprehensive simulation-based comparative study on various interval-based dose-finding designs.

 

MARCH 29, 2018 

Topic: "Recent Progress in Machine Learning and Precision Medicine"

Speaker: Michael Kosorok, W.R. Kenan, Jr. Distinguished Professor and Chair, Department of Biostatistics, UNC-Chapel Hill

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. MIN QIAN

 

Abstract: Precision medicine, the paradigm of improving clinical care through data-driven approaches to tailoring treatment to the individual, is an important area of statistical and biomedical research. Individualized treatment rules (ITRs) formalize precision medicine as mappings from the space of patient covariates to the set of available treatments or, equivalently, as mappings which identify covariate-defined subgroups for which different treatments should be applied. ITRs are thus an important tool to improve patient outcomes through utilizing biomarkers to target treatment. Machine learning has become an increasingly utilized and evolving methodology for ITR discovery, and we discuss recent progress in this area and present examples in type 1 diabetes and bipolar disorder.

 

 

APRIL 5, 2018

Topic: "Bayesian Semi-parametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data"

Speaker: Jeff Morris, Professor and Interim Deputy Chair, Department of Biostatistics, The University of Texas MD Anderson Cancer Center

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. GEN LI

 

Abstract: Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data consisting of correlated functions on the spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data, and a general Matlab/R package is currently being assembled to fit a broad set of Bayesian Functional Mixed Models from the literature. This work is to appear in JASA-ACS.

 

APRIL 12, 2018 

Topic: "Clustering Mixed-Type Data"

Speaker: Marianthi Markatou, Professor, Department of Biostatistics, University at Buffalo

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. YING WEI

 

 

Abstract: Despite the existence of a large number of clustering algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. We show that current clustering methods for mixed-scale data suffer from at least one of two central challenges: 1) they are unable to equitably balance the contribution of continuous and categorical scale variables without strong parametric assumptions; 2) they are unable to properly handle data sets in which only a subset of variables are related to the underlying cluster structure of interest. We first develop KAMILA (KAY-means for Mixed Large data), a clustering method that addresses (1) and in many situations (2) without requiring strong assumptions. We next develop MEDEA (Multivariate Eigenvalue Decomposition Error Adjustment), a weighting scheme that addresses (2) even in the face of a large number of uninformative variables. We study theoretical aspects of our methods and demonstrate their performance using Monte Carlo simulations and real data sets.  

 

APRIL 19, 2018 

Topic: "Hybrid Principal Components Analysis For Region-Referenced Longitudinal Functional EEG Data"

Speaker: Damla Senturk, Associate Professor, Department of Biostatistics, UCLA

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. JEFF GOLDSMITH

 

Abstract: Electroencephalography (EEG) data possess a complex structure that includes regional, functional, and longitudinal dimensions. Our motivating example is a word segmentation paradigm in which typically developing (TD) children and children with Autism Spectrum Disorder (ASD) were exposed to a continuous speech stream. For each subject, continuous EEG signals recorded at each electrode were divided into one-second segments and projected into the frequency domain via Fast Fourier Transform. Following a spectral principal components analysis, the resulting data consist of region-referenced principal power indexed regionally by scalp location, functionally across frequencies and longitudinally by one-second segments. Standard EEG power analyses often collapse information across the longitudinal and functional dimensions by averaging power across segments and concentrating on specific frequency bands. We propose a hybrid principal components analysis (HPCA) for region-referenced longitudinal functional EEG data which utilizes both vector and functional principal components analyses and does not collapse information along any of the three dimensions of the data. The proposed decomposition only assumes weak separability of the higher-dimensional covariance process and utilizes a product of one dimensional eigenvectors and eigenfunctions, obtained from the regional, functional, and longitudinal marginal covariances, to represent the observed data, providing a computationally feasible nonparametric approach. A mixed effects framework is proposed to estimate the model components coupled with a bootstrap test for group level inference, both geared towards sparse data applications. Analysis of the data from the word segmentation paradigm leads to valuable insights about group-region differences among the TD and verbal and minimally verbal children with ASD. Finite sample properties of the proposed estimation framework and bootstrap inference procedure are further studied via extensive simulations.

 

 

APRIL 26, 2018 

Topic: "Statistical Inference for High-Dimensional Longitudinal Data"

Speaker: Runze Li, Verne M. Willaman Professor, Department of Statistics, Penn State University

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. MIN QIAN

 

Abstract: This paper is concerned with statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The major challenge is how to construct an optimal test statistic in the presence of high dimensional nuisance parameters and the sophisticated dependence among measurements. To deal with the challenge, we propose a quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. We prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal test statistic for the parameter of interest. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results imply that the newly proposed procedure can control the Type I error when testing a low dimensional parameter of interest. We illustrate the proposed procedure by a real data example. This is joint work with Ethan Fang and Yang Ning.

 

MAY 3, 2018

Topic: "Recent Advances in Elastic Functional Data Analysis"

Speaker: Anuj Srivastava, Distinguished Research Professor, Department of Statistics, Florida State University

11:30AM-12:30PM

 

AR BUILDING, 8TH FLOOR AUDITORIUM

 

HOSTED BY: DR. TODD OGDEN

 

Abstract: Functional data analysis (FDA) is fast becoming an important research area, due to its broad applications in many branches of science, including biostatistics and bioinformatics. An essential component of FDA is registration of points across functional objects. Without proper registration, the results are often inferior and difficult to interpret. The current practice in the FDA community is to treat registration as a pre-processing step, using off-the-shelf alignment procedures, and follow it up with statistical analysis of the resulting data. In contrast, Elastic FDA is a more comprehensive approach, where one solves for the registration and statistical inferences in a simultaneous fashion. The key idea here is to use Riemannian metrics with appropriate invariance properties, to form objective functions for alignment and to develop statistical models involving functional data. While these elastic metrics are complicated in general, we have developed a family of square-root transformations that map these metrics into simpler Euclidean representations, thus enabling more standard statistical procedures. Specifically, we have developed techniques for elastic functional PCA and elastic regression models involving functional variables. I will demonstrate these ideas using imaging data in neuroscience and bioinformatics, where biological structures can often be represented as functions (curves or surfaces) on intervals or spheres. Examples of curves include DTI fiber tracts and chromosomes, while examples of surfaces include subcortical structures (hippocampus, thalamus, putamen, etc.). Statistical goals here include shape analysis and modeling of these structures and the use of their shapes in medical diagnosis.