Big-data Clinical Trial Column


Hierarchical cluster analysis in clinical research with heterogeneous study population: highlighting its visualization with R

Zhongheng Zhang, Fionn Murtagh, Sven Van Poucke, Su Lin, Peng Lan

Abstract

Big-data clinical research typically involves thousands of patients and numerous variables. Conventionally, these variables are handled with multivariable regression modeling. This article introduces hierarchical cluster analysis (HCA), a method for exploring similarity between observations and/or clusters. The results can be visualized using heat maps and dendrograms. It is sometimes useful to add scatter plots and smoothed lines to the panels of a heat map, which the built-in R heatmap() function does not support. A series of scatter plots can instead be created with the lattice package, with the background color of each panel mapped to the regression coefficient through custom panel functions; this flexibility is a distinctive feature of the lattice package. Dendrograms and color keys can be added as legend elements of the lattice system, and the latticeExtra package provides useful functions for this work.
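
As a rough illustration of the clustering step described above, the following is a minimal sketch using only base R (the stats package). The simulated data, the variable names (age, lactate), the choice of Euclidean distance, Ward linkage, and the cut at two clusters are all assumptions made for the example; they are not taken from the article.

```r
# Minimal HCA sketch on simulated data (hypothetical variables, not from the article)
set.seed(42)
n <- 20
patients <- data.frame(
  age     = c(rnorm(n, mean = 45, sd = 5),    rnorm(n, mean = 75, sd = 5)),
  lactate = c(rnorm(n, mean = 1.5, sd = 0.3), rnorm(n, mean = 4.0, sd = 0.5))
)

# Standardize the columns so that no variable dominates the distance
# simply because of its measurement scale
x <- scale(patients)

# Pairwise Euclidean distances between observations
d <- dist(x, method = "euclidean")

# Agglomerative hierarchical clustering; Ward's criterion is one common choice
hc <- hclust(d, method = "ward.D2")

# Cut the tree into two clusters and tabulate cluster sizes
groups <- cutree(hc, k = 2)
table(groups)
```

Calling plot(hc) would draw the dendrogram, and heatmap(x) would draw a basic heat map with dendrograms on the margins. Adding scatter plots inside the panels, as the abstract describes, would require the lattice-based approach instead.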
