Heatmap cannot cluster because of too many NAs: causes and solutions

Data that contains NAs can sometimes still be clustered, but sometimes the heatmap function fails with an error like this: "Error in hclustfun(distfun(x)): NA/NaN/Inf in foreign function call (arg 11)"

To understand this error, start with the two functions that the heatmap function calls: dist(), which calculates the distances, and hclust(), which does the clustering.

First create a data set with NA, and make a heatmap:

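library(gplots)        # provides heatmap.2()
library(RColorBrewer)  # provides brewer.pal()
mat = matrix( rnorm(25), 5, 5)
# Set 8 of the 25 cells to missing (positions are column-major)
mat[c(1,6,8,11,15,20,22,24)] = NaN
Colors=rev(brewer.pal(11,"Spectral"))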
heatmap.2( mat, col = Colors,
           trace = "none", 
           xlab = "Comparison",
           scale = c("none"),
           na.color="gray", 
           dendrogram = "row", 
           Colv = FALSE)

The heatmap is drawn without error; the gray cells are the NAs:

[Figure: heatmap with NAs]

This data set looks like this:

[Figure: the data matrix]

By default, heatmap.2 calls dist() to calculate the distances (most other heatmap packages default to the same function):

[Figure: output of dist(mat)]

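Although this data set contains NAs, the heatmap can still be drawn. The reason is that dist() skips the columns in which either of the two rows is missing and scales the result up accordingly. As long as every pair of rows shares at least one complete column, the calculated distances contain no NA and hclust() can still cluster.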
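The way dist() treats missing values can be checked on two tiny pairs of rows. This is only an illustrative sketch: the vectors x, y, a and b are made up for this example and are not part of the data above. It relies on the documented behaviour of dist(): columns where either value is missing are excluded, the sum is scaled up in proportion to the number of columns used, and the distance is NA when no column can be used.

```r
# Hypothetical toy rows, not taken from the article's data
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, NA)

# Only columns 1 and 2 are complete for both rows:
# squared differences 1 + 4 = 5, scaled up by 4/2 columns, then square root
d_xy <- dist(rbind(x, y))
all.equal(as.numeric(d_xy), sqrt(5 * 4 / 2))

# No column is complete for both rows, so the distance is NA
a <- c(1, NA)
b <- c(NA, 2)
d_ab <- dist(rbind(a, b))
is.na(d_ab)
```

The second case is the one that later breaks hclust(): a single pair of rows with no shared column is enough to put an NA into the distance matrix.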

Now consider data that contains many more NAs, for example:

mat = matrix(rnorm(49), 7, 7)
mat = rbind(mat[1:4, ], c(rep(NA,6), 1.2416), mat[5:6, ])
mat[1:2,3:7] <- rep(NA, 10)

[Figure: the second data matrix]

Calculate dist():

dist(mat)

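This returns a distance matrix that contains NA. Rows 1 and 2 only have values in columns 1 and 2, while row 5 only has a value in column 7, so these pairs of rows share no complete column and their distances cannot be calculated:

[Figure: output of dist(mat), with NA entries]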

Drawing the heatmap now fails, because hclust() cannot cluster distances that contain NA: Error in hclustfun(distr): NA/NaN/Inf in foreign function call (arg 11)
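To find out which rows are responsible without drawing the heatmap, the NA entries of the distance matrix can be located directly. A minimal sketch, assuming the matrix is built as above; set.seed() is added here only for reproducibility, and the names d, dm, na_pairs and res are illustrative:

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible
mat <- matrix(rnorm(49), 7, 7)
mat <- rbind(mat[1:4, ], c(rep(NA, 6), 1.2416), mat[5:6, ])
mat[1:2, 3:7] <- NA

d <- dist(mat)

# Row pairs whose distance is NA (upper triangle, so each pair appears once)
dm <- as.matrix(d)
na_pairs <- which(is.na(dm) & upper.tri(dm), arr.ind = TRUE)
print(na_pairs)

# hclust() refuses a distance object that contains NA;
# capture the error instead of stopping the script
res <- tryCatch(hclust(d), error = function(e) e)
inherits(res, "error")
```

Each row of na_pairs names two rows of mat that have no column in common, which tells you which rows to inspect, drop, or handle specially.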

This can be solved through the distfun parameter: replace the default dist() with a distance function of our own that substitutes a value for each NA distance, for example:

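# Euclidean distance in which NA distances are replaced by a value
# slightly larger than the largest distance that could be calculated
dist_no_na <- function(mat) {
    edist <- dist(mat)
    edist[which(is.na(edist))] <- max(edist, na.rm=TRUE) * 1.1 
    return(edist)
}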
heatmap.2( mat, col = Colors,
           trace = "none", 
           xlab = "Comparison",
           scale = c("none"),
           na.color="gray",
           dendrogram = "row", 
           Colv = FALSE,
           distfun=dist_no_na)

[Figure: heatmap with the NA distances replaced]
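The replacement can also be checked on its own, without drawing the heatmap. This is a sketch under the same assumptions as the data above (a fixed seed is added for reproducibility; d_raw, d_fixed and hc are illustrative names). It confirms that the patched distances contain no NA, that the filled-in entries are the largest distances, and that hclust() now runs.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible
mat <- matrix(rnorm(49), 7, 7)
mat <- rbind(mat[1:4, ], c(rep(NA, 6), 1.2416), mat[5:6, ])
mat[1:2, 3:7] <- NA

dist_no_na <- function(mat) {
    edist <- dist(mat)
    edist[which(is.na(edist))] <- max(edist, na.rm = TRUE) * 1.1
    return(edist)
}

d_raw   <- dist(mat)        # contains NA
d_fixed <- dist_no_na(mat)  # NA replaced

# No NA is left after the replacement
any(is.na(d_fixed))

# The filled-in entries are the largest distances in the result
all(as.numeric(d_fixed)[is.na(as.numeric(d_raw))] == max(d_fixed))

# Clustering now succeeds
hc <- hclust(d_fixed)
hc$order
```

The factor 1.1 is an arbitrary choice: it treats rows with no shared data as farther apart than any pair that could be measured, so those rows tend to be merged last. A different fill value would give a different dendrogram.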

Note that some heatmap functions do not allow the distance or clustering method to be changed.

Reference: "Heatmap cannot cluster due to too many NA: reasons and solutions", Tencent Cloud Community, https://cloud.tencent.com/developer/article/1607867