Storage Options

Which HPC storage option should I choose?

MSPH offers access to three levels of data storage through Systems Biology and to a variety of cloud-based storage services through vendors like Amazon Web Services (AWS). 

When choosing a storage option, consider:

  • Latency - How quickly can the data be accessed?
  • Throughput - How much data can be transferred in a given amount of time?
  • Simplicity - How difficult is it to configure storage and access data?
  • Support - How much help will you need?
  • Scalability - How often do you need to increase or decrease storage?
  • Cost - How much will it cost to store your data based on your requirements?

Most storage solutions are designed around three access tiers:

  • Hot - Immediate and regular access at optimal performance levels
  • Cold - Semi-regular access with a balance of cost and performance 
  • Glacier - Archival (long-term) storage at the lowest possible cost
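
The three tiers above can be read as a rough decision rule. The sketch below is illustrative only: the function name suggest_tier and both thresholds (accesses per month and tolerable retrieval delay) are assumptions made for this example, not MSPH or AWS policy.

```python
def suggest_tier(accesses_per_month: float, max_wait_hours: float) -> str:
    """Suggest a storage tier from rough access needs.

    The thresholds are hypothetical placeholders chosen for illustration;
    tune them to your own workload and budget.
    """
    if accesses_per_month < 0 or max_wait_hours < 0:
        raise ValueError("inputs must be non-negative")
    # Rarely touched data that can wait hours for retrieval is a
    # candidate for archival (glacier) storage.
    if accesses_per_month < 1 and max_wait_hours >= 12:
        return "glacier"
    # Data read roughly daily or more often belongs on the fastest tier.
    if accesses_per_month >= 30:
        return "hot"
    # Everything else trades some speed for lower cost.
    return "cold"


print(suggest_tier(200, 0))
print(suggest_tier(4, 1))
print(suggest_tier(0.1, 48))
```

Note that rarely accessed data that still needs quick retrieval falls into the cold tier under this rule, because glacier storage cannot meet a short wait requirement.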

Hot

  • Low Latency, High Throughput - Some jobs require immediate and regular access to large amounts of data at the fastest speeds and highest bandwidth available. Prioritizing these qualities can raise costs substantially.
  • Systems Biology - TBD
  • AWS - Amazon’s Simple Storage Service (S3) is designed to integrate seamlessly with AWS products like EC2, with an emphasis on durability, security, simplicity, and speed. AWS S3 is an object storage system: instead of a traditional hierarchical file system, it stores data as objects, each identified by a unique key and described by assigned metadata. This flat structure offers improved scalability and durability. Learn More About AWS S3
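
To make the flat structure of object storage concrete, here is a toy in-memory model. It is a sketch for illustration, not the S3 API: the class names FlatObjectStore and StoredObject, the keys, and the metadata values are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy model of a flat key -> object namespace (not the real S3 API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Every object lives at the top level, addressed by its full key.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("projects/rna-seq/sample1.fastq", b"ACGT", owner="lab-a")
store.put("projects/rna-seq/sample2.fastq", b"TTGA", owner="lab-a")
store.put("projects/imaging/scan.tiff", b"\x00\x01", owner="lab-b")

print(store.list_keys("projects/rna-seq/"))
print(store.get("projects/imaging/scan.tiff").metadata)
```

There is no directory tree to traverse or rebalance here; a lookup is a single key match, which is one reason flat namespaces tend to scale well.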

Cold

  • Semi-regular access - For data that is accessed only occasionally, or that can tolerate slower access speeds, cold storage is the more cost-effective choice while still keeping the data practical to use.
  • Systems Biology - TBD
  • AWS - Elastic Block Store (EBS) is a flexible, scalable block storage service that attaches to EC2 instances to deliver affordable, reliable data storage. Its Cold HDD (sc1) volume type is intended for infrequently accessed data. Learn More About EBS

Glacier 

  • Archival access only - For completed projects and long-term data preservation without urgent access requirements, glacier storage offers the lowest storage rates. Retrieval times for this data are typically measured in hours or even days.
  • Systems Biology - TBD
  • AWS - Amazon’s S3 Glacier provides low-cost archival storage and data backup. S3 Glacier enables users to store an unlimited amount of data without having to account for capacity planning, hardware failure, or data migrations, while offering easy transfer of data from AWS S3. Learn More About S3 Glacier
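
One common way to move data from S3 into Glacier is an S3 lifecycle rule that transitions objects after a set number of days. The sketch below only builds the rule as a Python dictionary; the helper name glacier_transition_rule, the rule ID, the prefix, and the 90-day threshold are assumptions chosen for illustration.

```python
import json


def glacier_transition_rule(prefix: str, days: int, rule_id: str = "archive-rule") -> dict:
    """Build an S3 lifecycle configuration that moves objects to Glacier.

    The prefix, day count, and rule ID are illustrative values supplied
    by the caller, not recommended settings.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("completed-projects/", days=90)
print(json.dumps(config, indent=2))
```

Assuming boto3 is installed and credentials are configured, a dictionary of this shape can be passed to the S3 client's put_bucket_lifecycle_configuration call as the LifecycleConfiguration argument, together with the name of your own bucket.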