Histogram with percentage bins in Python/numpy?

I need to create a histogram with percentage bins from a 2D set of data, like this (this is basically a set of reports from various devices, each line is a device reporting its status for a given hour):

# hour # parameter (in percents)
00     10
00     20
00     30
01     40
01     50
...

so that there would be a stacked histogram summary of the devices' reports binned by hour and by percentage range, just like the gnuplot example below, with bins representing the percentage ranges the reports fall into (say 0 < r < 10%, 10% < r < 20% and so on).

[Image: example stacked histogram produced with gnuplot]

Right now I've only thought about creating a 2D array and feeding it all to gnuplot like this:

#!/usr/bin/env python3

import numpy as np

data = np.loadtxt('mac-quality.csv')
# out[hour][bin]: number of reports per hour (0-23) and 10% bin (0-9)
out = [[0 for k in range(10)] for i in range(24)]

for row in data:
    hour = int(row[0])
    quality = int(row[1])
    for bin in range(10):
        pct = bin * 10
        if quality > pct and quality < (pct + 10):
            print('Data: %s, H: %s Percentile: %s:' % (row, hour, pct))
            out[hour][bin] += 1
# print(out)

What would be the correct way of generating these histograms from within python?



Solution 1:[1]

This builds on your binning loop and adds Matplotlib, the library most commonly used for plotting in Python, in place of gnuplot. Each value is assigned to its bin by integer division, so readings that sit exactly on a bin edge (10, 20, ...) are counted as well.

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('mac-quality.csv')

# Number of bins you have
nBins = 10
out = np.zeros((24, nBins), dtype=int)

for row in data:
    hour = int(row[0])
    quality = int(row[1])
    # 0-9 -> bin 0, 10-19 -> bin 1, ...; 100 is clamped into the last bin
    bin_index = min(quality // 10, nBins - 1)
    out[hour][bin_index] += 1

# One stacked bar per hour, one segment per percentage bin
hours = np.arange(24)
bottom = np.zeros(24)
for b in range(nBins):
    plt.bar(hours, out[:, b], bottom=bottom, label='%d-%d%%' % (b * 10, b * 10 + 10))
    bottom += out[:, b]
plt.title('Some Title')
plt.legend()
plt.show()
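
As an alternative to the explicit loop, the hour-by-bin count table can probably be built in one call with np.histogram2d. The following is only a sketch under stated assumptions: the five sample rows are the ones shown in the question, the bin edges (24 hourly bins, ten 10%-wide bins) are my reading of what is being asked for, and the variable names are illustrative.

```python
import numpy as np

# Sample rows taken from the question: (hour, parameter in percent)
data = np.array([
    [0, 10],
    [0, 20],
    [0, 30],
    [1, 40],
    [1, 50],
])

# Assumed bin layout: 24 hourly bins and ten 10%-wide bins
hour_edges = np.arange(25)          # 0, 1, ..., 24
pct_edges = np.arange(0, 101, 10)   # 0, 10, ..., 100

# counts[h, b] = number of reports in hour h whose value falls in bin b.
# Bins are half-open [lo, hi), except the last one, which also includes 100.
counts, _, _ = np.histogram2d(data[:, 0], data[:, 1], bins=[hour_edges, pct_edges])
counts = counts.astype(int)

print(counts[0])
print(counts[1])
```

Each row of counts holds the segment heights for one hour, so the table can be handed to any stacked bar plot.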

Solution 2:[2]

Maybe I'm misunderstanding the ask - but you could do:

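import numpy as np

data = [ ... ]  # the percentage values
bins = np.arange(10, 100, 10)  # edges must be an array (np.digitize rejects a bare count); or use manual edges
bin_numbers = np.digitize(data, bins=bins).flatten()
answer = np.bincount(bin_numbers, minlength=len(bins) + 1) / len(bin_numbers)  # fraction per bin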
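
If the fractions are wanted per hour rather than over the whole data set, the same digitize idea can be extended roughly as follows. This is a sketch, not a definitive implementation: the sample rows come from the question, the interior edges 10, 20, ..., 90 are an assumption about the intended bins, and the names hours, values, counts and shares are made up for illustration.

```python
import numpy as np

# Sample rows from the question, split into columns: hour and parameter (percent)
hours = np.array([0, 0, 0, 1, 1])
values = np.array([10, 20, 30, 40, 50])

# Assumed interior edges 10, 20, ..., 90, giving ten bins in total
edges = np.arange(10, 100, 10)
bin_numbers = np.digitize(values, edges)   # 0 = below 10, ..., 9 = 90 and above

# Count reports per (hour, bin) pair; np.add.at handles repeated index pairs
counts = np.zeros((24, len(edges) + 1), dtype=int)
np.add.at(counts, (hours, bin_numbers), 1)

# Convert each hour's row to fractions, leaving hours with no reports at zero
totals = counts.sum(axis=1, keepdims=True)
shares = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)

print(shares[0])
print(shares[1])
```

Multiplying shares by 100 gives percentages, which may be closer to what "percentage bins" means in the question.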

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1
Solution 2   Nicholas Lawrence