Mailman Takes the Prize at Columbia Datathon

In their datathon debut, two doctoral students in Biostatistics beat the clock and impress the judges with their insights into the sharing economy.

October 16, 2017

As runners ready themselves for the New York City Marathon, data scientists square off in their own version of going the extra mile: the datathon. Similar to better-known hackathons, datathons bring together computer scientists, engineers, and statisticians, challenging them to untangle complex datasets and solve problems for the thrill of victory, and sometimes, a substantial cash prize.

Wodan Ling and Shanghong Xie, both PhD students in Biostatistics, competed in a six-hour datathon on Columbia’s Morningside campus in September, sponsored by the financial services firm Citadel. From a pool of over 1,000 applicants, the Mailman duo were chosen to face off against 180 entrants challenged to uncover insights from real-world datasets supplied by participating companies.

Joined by another Columbia student and a fourth from NYU, Wodan and Shanghong’s team received their dataset from a leading company in the sharing economy shortly after breakfast and got to work. They began by analyzing the structure of the data, scrubbing it to remove false information, and organizing the figures, all critical and time-consuming steps leading up to the question they proposed to answer.

“Our first task was to pick an interesting question to answer,” Wodan said.

With time quickly passing, the team decided to ask how to select the features most crucial to the company in different scenarios, including season and geography.

“It’s a simple but useful question,” Wodan explained.

They may have found the question to be simple, but Wodan and Shanghong still felt the pressure of the clock as they tested and re-tested computer models, machine-learning and data-mining techniques to “train” the dataset and gather results for interpretation. At 3pm, it was laptops down, and each team’s results were submitted to a panel of three judges from Columbia’s Department of Electrical Engineering, Yale University’s Department of Statistics and Data Science, and AIG, the multinational insurance corporation. While it was their first datathon, the Mailman competitors felt confident in their work, particularly the strength of the question they posed with the dataset they were given.

“We knew our question would be interesting to the company,” Shanghong said.

Even so, when the judges concluded their assessment, the pair were caught off guard. Shanghong and Wodan’s team was victorious, earning a $20,000 cash prize.

“We didn’t expect it,” Wodan said. “We were so surprised.”

While the data question they solved wasn’t directly related to public health, both women saw the competition as a way to strengthen their skills as data scientists who have worked on developing statistical methodologies to solve health-related challenges in epidemiology, neuroimaging, and genetics.

Likewise, they say, what they learned at the Mailman School helped them excel in the competition.

On November 27, their skills will be tested again as the team returns for the Citadel datathon finals, facing off against competitors from a wide range of schools including Carnegie Mellon, Duke, and MIT for the chance at another cash prize.

F. DuBois Bowman, chair of Biostatistics, was inspired to see two of his students take home honors. “It is encouraging that we attract such talented students at Mailman and provide advanced training on analytic skills with applications to real-world societal matters at a level where our students shine brightly in an open competition that cuts across all sectors.”