Heat Maps and Quilt Plots

Overview

A heat map is a data visualization tool that represents a data matrix graphically. At the most basic level, a heat map can be thought of as a table or spreadsheet that shows colors instead of numbers (a color-shaded matrix display). The color of each cell or rectangle corresponds to the magnitude of the value in that cell. Heat maps are useful because they allow a data set to be summarized and understood at a glance. The cluster heat map goes a step beyond basic matrix shading by permuting the rows and columns of the matrix to reveal structure. Note: In the literature, the term “heat map” is sometimes used for geographical heat maps that visualize geographic data; on this page we focus on graphical representations of two-dimensional matrices.

Description

History

The color-shaded matrix display that forms the basis of the heat map is more than a century old. Workers in many different professions, including engineering, graphic design, and accounting, have used color to shade the rows, columns, or cells of a spreadsheet in order to highlight or summarize data. For example, Toussaint Loua, in his statistical atlas of the population of Paris (1873), used a shaded matrix to display and summarize the characteristics of 20 districts in Paris. There are also early examples of “clustering” or ordering of the data. In 1914, Brinton, in his book on graphic methods for presenting data, sorted a matrix to rank US states by various educational features. Today, heat maps are widely used in the biological sciences to bring out patterns in the data. In gene expression data, for example, the amount (level) of gene expression determines the color in each cell of the matrix.

What is a quilt plot?

The quilt plot (sometimes called an image plot) is essentially a heat map that does not incorporate a clustering structure. It can be used to present large frequency tables in a way that is easy to comprehend visually. Whether a quilt plot is distinct from a heat map was debated in the epidemiology literature after PLOS ONE published a paper by Wand et al. in January 2014 called “Quilt Plots: A Simple Tool for the Visualisation of Large Epidemiological Data”. The authors state that quilt plots “can be considered as a simple formulation of ‘heat maps’” and that they produce a graphical display similar to heat maps when the “clustering” and “dendrogram” options are not used (more information on these options below). Critics of the study argued that there was no need to distinguish a quilt plot from a heat map.

Using Heat Maps

Clustering

Cluster analysis is an exploratory statistical technique used to identify patterns or groups in a data set. The analytic goal is to find clusters of samples or observations (often genes) such that observations in a cluster are more similar to each other than they are to observations in different clusters. The cluster heat map can be used to identify row and column clustering structures in a data matrix at the same time. Hierarchical clustering is the most commonly used clustering approach in heat maps. It allows the user to discover sub-structures that are inherent in a given data set. The key idea in hierarchical clustering is to recursively merge the “closest” pair. In other words, the algorithm typically starts with n observations, each in its own cluster, and joins them hierarchically, so that the two clusters separated by the smallest distance join first. The distance matrix is then re-computed according to a linkage function, and the process repeats until a single cluster remains.

When creating a cluster heat map, several choices must be made. The linkage method (e.g. average, complete, centroid) determines how the distance between two clusters is computed and therefore how the data will be grouped. The distance metric (e.g. Euclidean, correlation) defines what is meant by similarity of genes or samples to each other. These choices can determine the type and meaning of the pattern that emerges.
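As a rough illustration of the merging procedure described above, the following Python sketch implements agglomerative clustering with Euclidean distance and average linkage using only the standard library. The function names and the four-row data set are hypothetical and chosen for this example. For simplicity it recomputes the linkage from the raw observations at every step instead of updating a distance matrix, so it is suitable only for very small inputs.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows, cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(rows[i], rows[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(rows):
    """Merge the closest pair of clusters until only one remains.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = [(i,) for i in range(len(rows))]  # start: one cluster per row
    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(rows, pair[0], pair[1]),
        )
        merges.append((a, b, average_linkage(rows, a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical data: 4 samples (rows) measured on 3 variables (columns).
data = [
    [1.0, 2.0, 1.5],
    [1.1, 2.1, 1.4],
    [8.0, 9.0, 8.5],
    [7.9, 9.2, 8.4],
]

merges = agglomerate(data)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

Swapping `average_linkage` for a function that returns the maximum (complete linkage) or minimum (single linkage) pairwise distance changes the grouping rule without touching the rest of the loop, which is one way to see how much the linkage choice can matter.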

Dendrograms

The result of a hierarchical clustering calculation is displayed in a heat map as a dendrogram, which can be thought of as a tree. The lowest points of the tree are the individual observations, and the lines are branches connecting them. Where the lines meet they form nodes, each of which corresponds to a group. Row dendrograms show the distance (or similarity) between rows, and column dendrograms show the distance (or similarity) between columns. It is important to recognize that hierarchical clustering methods impose a tree structure on the data regardless of whether a tree structure actually exists.
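To make the tree picture concrete, here is a minimal Python sketch of how a dendrogram can determine the row order of a cluster heat map. The nested-tuple representation, the function names, and the small matrix are assumptions made for this example, not the internal format of any particular package.

```python
def leaf_order(node):
    """Left-to-right leaf order of a dendrogram.

    A leaf is a row index (int); an internal node is a tuple
    (left_subtree, right_subtree, merge_height).
    """
    if isinstance(node, int):
        return [node]
    left, right, _height = node
    return leaf_order(left) + leaf_order(right)


def cut(node, height):
    """Groups obtained by cutting the tree at the given height."""
    if isinstance(node, int) or node[2] <= height:
        return [leaf_order(node)]
    left, right, _height = node
    return cut(left, height) + cut(right, height)


# Hypothetical tree for 4 rows: rows 0 and 2 join at height 0.2,
# rows 1 and 3 join at height 0.3, and the two groups join at 5.0.
tree = ((0, 2, 0.2), (1, 3, 0.3), 5.0)

matrix = [
    [1.0, 1.2],  # row 0
    [8.0, 8.1],  # row 1
    [1.1, 1.0],  # row 2
    [7.9, 8.3],  # row 3
]

order = leaf_order(tree)                # row order implied by the tree
reordered = [matrix[i] for i in order]  # similar rows are now adjacent
groups = cut(tree, 1.0)                 # nodes below the cut define groups
print(order, groups)
```

The same reordering applied to the columns, using a column dendrogram, gives the familiar block pattern of a cluster heat map. Note that the code produces an ordering for any input tree, which mirrors the caution above: the tree exists whether or not the data are genuinely hierarchical.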

Color Schemes

When creating heat maps, it is important to consider the color scheme that is used. This is critical because the colors in the heat map must distinguish between different groups or between different levels of a variable. It is also important to consider colorblind-friendly palettes. To avoid any issues, it is recommended to use the default color schemes provided by the program. Note that since the quilt plot does not involve clustering, it is possible for the data to be adequately represented in black and white.
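The mapping from cell values to colors can be sketched in a few lines of Python. This is an illustrative linear ramp between two endpoint colors, with a white-to-black default to mirror the black-and-white case mentioned above; the function names and the small table of counts are hypothetical, and real plotting software typically offers more carefully designed palettes.

```python
def scale(value, lo, hi):
    """Rescale value to [0, 1]; a constant matrix maps to 0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def interpolate(t, start, end):
    """Linear blend of two RGB triples (0-255) for t in [0, 1]."""
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


def shade_matrix(matrix, start=(255, 255, 255), end=(0, 0, 0)):
    """Map each cell to an RGB color; the default is a white-to-black ramp."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    return [
        [interpolate(scale(v, lo, hi), start, end) for v in row]
        for row in matrix
    ]


# Hypothetical frequency table: larger counts get darker cells.
counts = [[0, 5], [10, 20]]
shades = shade_matrix(counts)
for row in shades:
    print(row)
```

Passing different `start` and `end` colors gives a two-color sequential scheme. A single linear ramp like this one is sensitive to outliers, because one extreme value compresses the shading of every other cell.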

Related Techniques

Cluster heat maps are often thought of as a data reduction method in epidemiological studies because they can identify clusters of interest. However, hierarchical cluster analysis methods put constraints on the data and require the data to exist in a tree space. Related techniques include Principal Component Analysis and Multi-Dimensional Scaling. These techniques put weaker constraints on the data.

Readings

Textbooks & Chapters

Michels K. Epigenetic Epidemiology (2012)

  • In Section 5.6 (Special Considerations for the Analysis of Microarray Data) of Chapter 5, the authors discuss the use of a cluster heat map.

  • Figure 5.2 shows an example of a heat map constructed from DNA methylation data obtained from different types of non-pathological tissues.

Methodological Articles
 

  • Wilkinson L. and Friendly M. The history of the cluster heat map. The American Statistician. 2009; 63: 179-184.
    This article provides an informative and interesting history of the cluster heat map, highlighting its use over time in different fields.

  • Shannon et al. Analyzing microarray data using cluster analysis. Pharmacogenomics. 2003; 4(1): 41-52.
    This is a review of how to analyze microarray data using cluster analysis. It provides a good summary of heat maps as well as clustering techniques.

  • Wand H, et al. Quilt plots: a simple tool for the visualisation of large epidemiological data. PLoS ONE 9(3): e93201.
    This article discusses the use of a quilt plot in order to display epidemiological data. It spurred a debate about whether or not quilt plots should be distinguished as something different than heat maps.

  • Weinstein JN. A postgenomic visual icon. Science. 2008; 319: 1772-1773.
    This commentary is easy to read and provides a quick overview of the use of cluster heat maps as a graphical visualization for genomic data.

  • Zeileis et al. Escaping RGBland: selecting colors for statistical graphics. Computational Statistics & Data Analysis. 2009; 53(9): 3259–3270.
    This article discusses how to choose colors for statistical graphics so that they work well together and are visually appealing.

Application Articles
 

  • Eisen et al. Cluster analysis and the display of genome-wide expression patterns. PNAS. 1998; 95(25): 14863-14868.
    A landmark paper that describes a cluster heat map program for genome wide expression data. It was the third most cited article in PNAS as of July 1, 2008.

  • Gu J et al. Selection of key ambient particulate variables for epidemiological studies – Applying cluster and heatmap analyses as tools for data reduction. Sci Total Environ. 2012; 435-436: 541-550.
    This study applied cluster and heat map analyses to a large dataset of 96 ambient particulate variables in order to identify key variables to use for analysis.

  • Hamid et al. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis. BMC Infectious Diseases. 2010; 10:364.
    This study uses cluster heat maps with hierarchical cluster analysis to identify sub-groups of encephalitis patients that may share similar characteristics.

  • Lindsey et al. Using cluster heat maps to investigate relationships between body composition and laboratory measurements in HIV-infected and HIV-uninfected children and young adults. J Acquir Immune Defic Syndr. 2012; 59(3): 325–328
    This study used cluster heat maps to identify clusters of HIV-infected children with similar patterns of body composition and laboratory measures.

  • Pleil et al. Heat map visualization of complex environmental and biomarker measurements. Chemosphere. 2011; 716–723
    This study applies the heat map approach to a dataset of environmental data in order to develop further hypotheses.

Software

Basic Data Structure

  • Independent variable (e.g. human subjects, environmental samples, days)

  • Array of dependent variables (e.g. biomarker chemicals, biological parameters)

R:

  • For Heat Maps: package stats: heatmap & package gplots: heatmap.2 (can be found on page 21 of reference manual)

    • Be sure to put the data in matrix format!

    • choose a distance measure: dist() function: euclidean (default), maximum, canberra, binary, minkowski, manhattan

    • choose a hierarchical clustering linkage method: hclust() method argument: complete (default), single, average, mcquitty, median, centroid, ward (named ward.D in R 3.1.0 and later, which also adds ward.D2)

    • add dendrogram: dendrogram = c("both","row","column","none")

    • change colors: many color schemes, use col =, explore color schemes using ?cm.colors

  • For Quilt Plots: package fields: quilt.plot (can be found on page 116 of reference manual)

Microsoft Excel

  • Crude heat maps can also be created in Microsoft Excel to explore patterns in data. However, R is recommended.

  • To implement, select conditional formatting under the Home menu ribbon -> color scales -> more rules, and create formatting rules to shade cells. You can format all cells based on their values and change the color scale.

Other Software: MATLAB

Websites
 

Courses
 

  • We did not find courses or workshops that focus specifically on heat maps. However, heat maps are often discussed briefly in courses on statistical genetics and microarrays, as well as in data visualization courses.
