Python: How to plot heat map of 2D matrix by ignoring zeros?

I have a matrix of size 500 x 280000 which contains a lot of zeros. Let us consider a working example with the matrix A:

A = [[0, 0, 0, 1, 0],
    [1, 0, 0, 2, 3],
    [5, 3, 0, 0, 0],
    [5, 0, 1, 0, 3],
    [6, 0, 0, 9, 0]]

I would like to plot a heatmap of the above matrix, but since it contains a lot of zeros, the heatmap is almost entirely white space.

How can I ignore the zeros in the matrix and plot the heatmap?

Here is the minimal working example that I tried:

im = plt.matshow(A, cmap=pl.cm.hot, norm=LogNorm(vmin=0.01, vmax=64), aspect='auto') # pl is pylab, imported as pl
plt.colorbar(im)
plt.show()

This produces a heatmap that is almost entirely white: the white areas appear wherever the matrix contains zeros.

My original matrix of size 500 x 280000 contains so many zeros that the heatmap comes out almost entirely white.



Solution 1:[1]

This answer goes in the same direction as the 'Edit 2' section of Luis' answer; in fact, it is a simplified version of it. I am posting it only to correct some misleading statements I made in my comments. I saw a warning that we should not hold discussions in the comment area, so I am using the answer area instead.

Anyway, first let me post my code. Please note that I used a larger matrix randomly generated inside the script, instead of your sample matrix A.

#!/usr/bin/python
#
# This script was written by norio 2016-8-5.

import os, re, sys, random
import numpy as np

#from matplotlib.patches import Ellipse
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.image as img

mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.markeredgewidth'] = 1.0
mpl.rcParams['axes.formatter.limits'] = (-4,4)
#mpl.rcParams['axes.formatter.limits'] = (-2,2)
mpl.rcParams['axes.labelsize'] = 'large'
mpl.rcParams['xtick.labelsize'] = 'large'
mpl.rcParams['ytick.labelsize'] = 'large'
mpl.rcParams['xtick.direction'] = 'out'
mpl.rcParams['ytick.direction'] = 'out'


############################################
#numrow=500
#numcol=280000
numrow=50
numcol=28000
# .. for testing
numelm=numrow*numcol
eps=1.0e-9
#
#numnz=int(1.0e-7*numelm)
numnz=int(1.0e-5*numelm)
# .. for testing
vmin=1.0e-6
vmax=1.0
outfigname='stackoverflow38790536.png'
############################################

### data matrix
# I am generating a data matrix here artificially.
print('generating pseudo-data..')
random.seed('20160805')
matA=np.zeros((numrow, numcol))
for je in range(numnz):
    jr = random.randrange(numrow)   # integer row index (array indices must be integers)
    jc = random.randrange(numcol)   # integer column index
    matA[jr,jc] = random.uniform(vmin,vmax)


### Actual processing for a given data will start from here
print('processing..')

idxrow=[]
idxcol=[]
val=[]
for ii in range(numrow):
    for jj in range(numcol):
        if np.abs(matA[ii,jj])>eps:
            idxrow.append(ii)
            idxcol.append(jj)
            val.append( np.abs(matA[ii,jj]) )

print('len(idxrow)=', len(idxrow))
print('len(idxcol)=', len(idxcol))
print('len(val)=',    len(val))


############################################
# canvas setting for line plots 
############################################

f_size   = (8,5)

a1_left   = 0.15
a1_bottom  = 0.15
a1_width  = 0.65
a1_height = 0.80
#
hspace=0.02
#
ac_left   = a1_left+a1_width+hspace
ac_bottom = a1_bottom
ac_width  = 0.03
ac_height = a1_height

############################################
# plot 
############################################
print('plotting..')

fig1=plt.figure(figsize=f_size)
ax1 =plt.axes([a1_left, a1_bottom, a1_width, a1_height], facecolor='w')

pc1=plt.scatter(idxcol, idxrow, s=20, c=val, cmap=mpl.cm.gist_heat_r)
# cf.
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
plt.xlabel('Column Index', fontsize=18)
plt.ylabel('Row Index', fontsize=18)
ax1.set_xlim([0, numcol-1])
ax1.set_ylim([0, numrow-1])

axc =plt.axes([ac_left, ac_bottom, ac_width, ac_height], facecolor='w')
fig1.colorbar(pc1, cax=axc, ticks=np.arange(0.0, 1.5, 0.1))

plt.savefig(outfigname)
plt.close()

This script outputs a figure, 'stackoverflow38790536.png', which looks like the following.

[Figure: scatter plot of non-zero elements]

As you can see in my code, I used scatter instead of plot. I realized that the plot command is not well suited to the task here.

Another statement of mine that I need to correct is that row_index does not need to have as many as 140,000,000 (=500*280000) elements. It only needs to hold the row indices of the non-zero elements. More precisely, the lists idxrow, idxcol, and val, which are passed to the scatter command in the code above, have lengths equal to the number of non-zero elements.

Please note that both of these points have been correctly taken care of in Luis' answer.
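
The double loop that collects idxrow, idxcol, and val can also be written with NumPy indexing, which avoids visiting every element in Python. The following is only a minimal sketch: it reuses the names matA and eps from the script above, and the small matrix is an illustrative stand-in for the real data.

```python
import numpy as np

# Small stand-in for matA; the real matrix would be much larger.
matA = np.array([[0.0, 0.0, 0.5],
                 [0.2, 0.0, 0.0]])
eps = 1.0e-9

# Boolean mask of the elements whose magnitude exceeds eps
mask = np.abs(matA) > eps

# Row and column indices of those elements, in row-major order
idxrow, idxcol = np.nonzero(mask)

# Their absolute values, in the same order as the indices
val = np.abs(matA[idxrow, idxcol])

print(len(idxrow), len(idxcol), len(val))
```

The three results are arrays rather than lists, but they can be passed to scatter in the same way.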

Solution 2:[2]

If you remove the LogNorm, you get black squares instead of white:

im = plt.matshow(A, cmap=plt.cm.hot, aspect='auto')

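
One way to see where the white cells come from is to apply the norm from the question directly to a few values. This is a minimal sketch, assuming a reasonably recent matplotlib; the sample values are illustrative only. A logarithmic norm cannot map zero, so zero should come back masked, and masked cells are drawn in the colormap's 'bad' colour, which is typically transparent, so the white background shows through.

```python
import numpy as np
from matplotlib.colors import LogNorm

# Same norm as in the question
norm = LogNorm(vmin=0.01, vmax=64)

# Illustrative sample: one zero and two positive values
values = np.array([0.0, 1.0, 9.0])
normed = norm(values)

# Per-element mask of the normalised result; masked entries are not coloured
mask = np.ma.getmaskarray(normed)
zero_is_masked = bool(mask[0])
positives_are_masked = bool(mask[1:].any())

print(zero_is_masked, positives_are_masked)
```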


Edit

A colormap plot always fills the complete grid with values. That is why you create the grid in the first place: you account for (say, interpolate) all the points that are not exactly on the grid. Your data has many zeros, and the graph correctly reflects that by looking white (or black). If you ignore those values without a clear reason to do so, you create a misleading graph.

If the non-zero values are the ones of interest to you, then you need another type of diagram, as pointed out in norio's comment. For that, you may want to have a look at this answer.


Edit 2

Adapted from this answer

You can treat the values as 1D arrays and plot the points independently, instead of filling a mesh with non-desired values.

import numpy as np
import matplotlib.pyplot as plt

A = [[0, 0, 0, 1, 0],
    [1, 0, 0, 2, 3],
    [5, 3, 0, 0, 0],
    [5, 0, 1, 0, 3],
    [6, 0, 0, 9, 0]]
A = np.array(A)
nrows, ncols = A.shape

# Convert the 2D grid to three 1D arrays: x = column index, y = row index, z = value
xx = np.array( [ c for r in range(nrows) for c in range(ncols) ] )
yy = np.array( [ r for r in range(nrows) for c in range(ncols) ] )
zz = np.array( [ A[y][x] for x,y in zip(xx,yy) ] )
#---
xx = xx[zz!=0]    # Drop zeroes
yy = yy[zz!=0]
zz = zz[zz!=0]
#---
zi, yi, xi = np.histogram2d(yy, xx, bins=(10,10), weights=zz)
zi = np.ma.masked_equal(zi, 0)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(xx, yy, c=zz, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()


Solution 3:[3]

Although norio's answer is correct, I think one can give a much quicker, more to-the-point answer with only a few lines of code:

import numpy as np
import matplotlib.pyplot as plt
A = np.asarray(A)
x,y = A.nonzero() #get the notzero indices
plt.scatter(x,y,c=A[x,y],s=100,cmap='hot',marker='s') #adjust the size to your needs
plt.colorbar()
plt.show()


Note that the axes are inverted. You can invert them with:

ax=plt.gca()
ax.invert_xaxis()
ax.invert_yaxis()

Also note that you have much more flexibility now:

  • You can optionally set the marker size, the marker type, and the transparency.
  • This procedure is faster, as the zeros are not passed to matplotlib.
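
The points above can be sketched as follows. This is only an illustration under a few assumptions: the matrix is the sample A from the question, the column index is put on the x axis and the y axis is flipped so that row 0 is at the top, and the alpha value, the Agg backend, and the output file name are arbitrary choices made here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]])

# Indices of the non-zero elements only; the zeros never reach matplotlib
rows, cols = A.nonzero()

fig, ax = plt.subplots()
# Square, slightly transparent markers coloured by the matrix value
sc = ax.scatter(cols, rows, c=A[rows, cols], s=100,
                cmap='hot', marker='s', alpha=0.8)
ax.invert_yaxis()  # put row 0 at the top
ax.set_xlabel('Column Index')
ax.set_ylabel('Row Index')
fig.colorbar(sc, ax=ax)
fig.savefig('nonzero_scatter.png')  # illustrative file name
```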

Solution 4:[4]

You can set the zeros to float("nan") before plotting; this works for me.
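
A minimal sketch of this idea, assuming the sample matrix A from the question and plain matshow without LogNorm; the Agg backend and the output file name are arbitrary choices made here. The matrix has to be of float type, because an integer array cannot hold NaN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Float dtype is required so that the array can store NaN
A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]], dtype=float)

# Work on a copy and replace every zero with NaN
A_nan = A.copy()
A_nan[A_nan == 0] = np.nan

fig, ax = plt.subplots()
# NaN cells are treated as missing data and left uncoloured
im = ax.matshow(A_nan, cmap='hot', aspect='auto')
fig.colorbar(im, ax=ax)
fig.savefig('heatmap_nan.png')  # illustrative file name
```

The colour scale is then computed from the remaining values only, so the zeros no longer influence it.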

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    norio
Solution 2    Community
Solution 3    JLT
Solution 4    abe