Big Data

The rapidly growing availability of massive amounts of health information is accelerating public health research. Researchers at the Mailman School of Public Health are using big data, advanced analytic techniques, and state-of-the-art high-performance computing to detect the slightest changes in population-scale electronic health records, pharmaceutical assays, or clinical imaging. With access to the C2B2 supercomputer, a powerful cluster housed at the Medical Center and one of the fastest in the world, Mailman School researchers can perform upwards of 200 trillion calculations per second and glean insights from data to help guide decisions that affect health. Biostatistics Chair DuBois Bowman is at the vanguard of this effort, charting the future course for big data analytics at the Mailman School.

The appeal of collaborations to harness big data is growing rapidly. Medical device, pharmaceutical, and technology companies seek to leverage the quantitative revolution, setting the stage for innovative partnerships with large academic institutions such as Columbia. Today, the Mailman School is engaging with some of the largest global corporations on big data projects.

With faculty interests so broad, applications of big data can be found in every corner of the School. Jeff Goldsmith, assistant professor of Biostatistics, and Andrew Rundle, associate professor of Epidemiology, use high-performance computing to examine the relationship between body mass index and physical activity in New York City youth. Tal Gross, assistant professor of Health Policy and Management, can search a hospital database to learn whether insurance makes people more or less likely to visit an emergency room. Charles DiMaggio, associate professor of Epidemiology, relies on the complex analyses that only the C2B2 supercomputer can offer to measure the effect of the 9/11 terrorist attacks on mental health, working with a database of more than one billion records. Bowman is the principal investigator for a multicenter study of Parkinson’s disease, compiling data from 1,600 patients to reveal changes in brain function and other precursors to the disease.

Environmental Health Sciences professor Jeff Shaman is using the computational power of C2B2 to simulate the spread of West Nile virus. For Sally Findley, professor of Population and Family Health and professor of Sociomedical Sciences, having access to the supercomputer is opening up new ways to analyze nearly half a million records on how New York City school-based obesity interventions are working.

In addition to leading faculty scholars who are developing novel methods for using big data, the Mailman School has a high-performance computing expert available to work individually with researchers and provide complete operational support. Trainings, workshops, and information sessions give an overview of the supercomputer’s capabilities and available resources.