Machine Learning Boot Camp: Analyzing Biomedical and Health Data

Most recent Machine Learning Boot Camp in NYC: June 6-7, 2019. 


The Machine Learning Boot Camp is a two-day intensive course that combines seminars with hands-on R sessions to provide an overview of concepts, techniques, and data analysis methods with applications in biomedical research.



Subscribe here for registration deadlines and course updates.



Summer 2019 dates: June 6-7; 8:30am - 5:00pm

This two-day intensive training will provide a broad introduction to machine learning methodology with applications in biomedical research. Taught by a team of biostatisticians, the Boot Camp will integrate seminar lectures with hands-on R lab sessions to put concepts into practice. Emphasis will be given to supervised methods (e.g., penalized methods, classification and decision trees, survival forests) and unsupervised methods (e.g., clustering algorithms, principal components), with numerous case studies and biomedical applications. The workshop will conclude with a brief overview of ‘deep learning’ approaches, including DOs and DON’Ts.

By the end of the boot camp, participants will be familiar with the following topics:

  • Penalized Regression Methods (Ridge and Lasso)
  • Support Vector Machines
  • Decision Trees (Random Forest)
  • Predicting Survival Outcomes (Cox Regression/Lasso, Survival Forests)
  • Clustering Algorithms
  • Principal Component Analysis (PCA)
  • Deep Learning – An Overview

Investigators at all career stages are welcome to attend, and we particularly encourage trainees and early-stage investigators to participate.


Noah Simon, PhD, Department of Biostatistics, School of Public Health, University of Washington. Dr. Simon’s methodological interests include computationally efficient methods for predictive modeling with high-dimensional, complex data, and the design of adaptive clinical trials.

Yifei Sun, PhD, Department of Biostatistics, Mailman School of Public Health, Columbia University. Dr. Sun’s research interests include survival and longitudinal data analysis and statistical machine learning for time-to-event data.

Cody Chiuzan, PhD, Department of Biostatistics, Mailman School of Public Health, Columbia University. Dr. Chiuzan’s research interests concern the development of adaptive early-phase designs for oncology trials, including questions on the optimal study designs and endpoints for early-phase immune- and targeted-oncology agents. Dr. Chiuzan is the Director for Educational Initiatives of the CTSA Biostatistics, Epidemiology, and Research Design (BERD) Resource.


There are three prerequisites/requirements to attend:

  1. Each participant must have an introductory background in statistics.
  2. Each participant is encouraged to be familiar with R. The main software used for the workshop will be R/RStudio, so we strongly recommend that participants have a basic understanding of this software before attending. If you have never used R before, that is fine: a brief tutorial/refresher will be provided on the first day of the training.
  3. Each participant is required to bring a personal laptop with R/RStudio installed prior to the first day of the workshop, as all lab sessions will be done on your personal laptop. R is available for free download and installation on Mac, PC, and Linux devices. Please review the R Installation Guide below.


Basic R knowledge is highly recommended for the boot camp as noted in prerequisites above. If you are new to R or need a refresher, a brief tutorial will be offered on the first day of the boot camp.

  • R Installation Guide: R is the free programming language we will use in the boot camp. Use this installation guide to choose the correct version for your laptop (Mac/Windows) and install it prior to the first day of the boot camp.

  • Introduction to R: A free edX class on R fundamentals using the DataCamp platform.

If you have any specific questions about R and RStudio in the context of the Machine Learning Boot Camp, please email us.


The Machine Learning Boot Camp will take place on the Columbia University Medical Center (CUMC) Campus in New York City, specifically at Columbia Mailman School of Public Health, 722 W. 168th Street, Allan Rosenfield Building 5th Floor, Room 532 A/B. Please note that the entrance to the building is on the 10th floor (the training is located five floors below the entrance).

General transportation and lodging information can be found in the Getting Around section. A PDF map of the 2019 Boot Camp location on the CUMC campus will be available closer to training dates.


Training scholarships are available for the Machine Learning Boot Camp. The scholarship submission period is now closed.


This boot camp was excellent in providing an introduction to machine learning. The quality of instruction was outstanding. - Victoria C., Research Biostatistician at Weill Cornell Medicine, 2019

This is a great graduate-level workshop to understand the similarities and differences between traditional statistical modeling and machine learning. The level and pace are good, as are the Rmarkdown examples. - Anonymous Faculty member, 2019

I enjoyed the ML boot camp. The instructors are highly knowledgeable of statistics and ML as well as helpful. The intro to R session (1 hour) was not long enough and it was very rushed due to the fact that we were scheduled to go right into the actual boot camp workshop immediately afterwards. The days were long but not overly taxing. Overall, for someone with no background in using R or ML, I feel that I learned a tremendous amount! Thanks! - Greg D, Faculty member at University of Delaware, 2019

Instructors are super dedicated and training materials are well prepared. I definitely feel more confident in applying statistical learning methods in my work. - Xian W, Research Biostatistician at Weill Cornell Medicine, 2019 

An excellent bootcamp that gives a good overview of machine learning as a concept as well as specific approaches. - Haotian W., Postdoc at Columbia Mailman School of Public Health, 2019

This was a great boot camp for people with a firm understanding of principles of statistics and machine learning, who are looking to deepen their knowledge, understanding, and application of machine learning in their research projects. - Marta J., Assistant Research Scientist at UCSD, 2019

It was a great introduction to ML and it provided me with the right tools to apply these techniques in my own research. - Sujith R., Faculty member at University of Mississippi, 2019


Registration Category                            Early-Bird Rate (through 4/15/19)  Regular Rate (4/16/19 - 5/21/19)  Columbia Discount*
Student/Postdoc/Trainee                          $1,150                             $1,350                            10%
Faculty/Academic Staff/Non-Profit Organizations  $1,350                             $1,550                            10%
Corporate/For-Profit Organizations               $1,550                             $1,750                            NA

*Columbia Discount: This discount is valid for any active student, postdoc, staff, or faculty member at Columbia University. To access the Columbia discount, email us for instructions.

Registration Fee: The fee includes course material, breakfast, lunch, and refreshment breaks. Course material will be available to all students after the workshop. Lodging and transportation are not included.

Cancellations: Cancellation notices must be received via email at least 30 days prior to the workshop start date in order to receive a full refund, minus a $50 administrative fee. Cancellations received via email 14-29 days prior to the workshop are eligible for a 50% refund, minus a $50 administrative fee. Please email your cancellation notice to the Boot Camp team. Due to workshop capacity, we regret that we are unable to refund registration fees for cancellations after these dates.

If you are unable to attend the training, we encourage you to send a substitute within the same registration category. Please inform us of the substitute via email at least one week prior to the training so that we can include them on attendee communications, updated registration forms, and materials. Should the substitute fall within a different registration category, your credit card will be credited or charged accordingly. Please email substitute inquiries to the Boot Camp team.




Want updates on new Boot Camp details or registration deadlines? Subscribe here.

Questions? Email the Boot Camp team here.

The Machine Learning Boot Camp is hosted by Columbia University's Department of Environmental Health Sciences and Department of Biostatistics in the Mailman School of Public Health, and the Irving Institute for Clinical and Translational Research: Biostatistics, Epidemiology, and Research Design (BERD) Educational Resource.