Generate underlying distribution from bins in Python
I found a PDF document describing the income distribution in the US in 1978. For each income range I have the percentage of the population that falls in that range. I'd like to generate the underlying distribution in Python. The data looks something like this:
under $3000: 6.2%
$3000-$4999: 8.5%
$5000-$6999: 7.6%
etc
See the screenshot for a more detailed description.
I've found the function scipy.stats.rv_histogram, which generates a distribution from a histogram, but I'm not sure how to create this initial histogram.
Solution 1:[1]
Refer to the documentation for scipy.stats.rv_histogram, which states:
Parameters: histogram: tuple of array_like
Tuple containing two array_like objects. The first containing the content of n bins, the second containing the (n+1) bin boundaries. In particular, the return value of np.histogram is accepted.
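To make the tuple shape concrete, here is a minimal sketch that passes the return value of np.histogram straight to rv_histogram. The sample values are made up purely for illustration, and the bins are equal-width, so it does not matter here whether the bin contents are read as counts or as densities.

```python
import numpy as np
import scipy.stats

# Made-up sample, used only to illustrate the (counts, edges) tuple.
sample = np.array([1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0])

# np.histogram returns (counts, edges) with len(edges) == len(counts) + 1.
counts, edges = np.histogram(sample, bins=3)

# Equal-width bins, so counts and densities are proportional to each other.
dist = scipy.stats.rv_histogram((counts, edges))

# All probability mass lies at or below the last bin edge.
cdf_at_upper_edge = dist.cdf(edges[-1])
```

When only binned percentages are available, as in the question, the same two arrays have to be written out by hand.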
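You can create the histogram by hand. Because the bins have unequal widths, pass density=False (SciPy 1.10+) so that the values are treated as counts per bin rather than as densities; on older versions, divide the counts by np.diff(bin_boundary) instead:
import numpy as np
import scipy.stats
# Population total; rv_histogram normalizes, so only the proportions matter.
total_number = 77330
# Percentage of the population in each income bin (sums to 100).
prob = np.array([6.2, 8.5, 7.6, 10.8, 7.1, 9.6, 15.3, 12.2, 22.7])
# Convert percentages to approximate counts per bin.
data = prob / 100 * total_number
# n + 1 bin edges; the top bracket is open-ended, so 1e7 is an arbitrary cap.
bin_boundary = np.array([0, 3000, 5000, 7000, 10000, 12000, 15000, 20000, 25000, 1e7])
hist = (data, bin_boundary)
# density=False: interpret data as counts per bin, not as density values.
hist_dist = scipy.stats.rv_histogram(hist, density=False)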
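A quick sanity check is to confirm that the CDF at each upper bin edge reproduces the cumulative percentages. The sketch below is one way to do that, under the assumption that rv_histogram reads the bin contents as densities when no density argument is given; for that reason it divides each percentage by its bin width first, which should behave the same across SciPy versions. The variable names and the seed are illustrative choices, not part of the original answer.

```python
import warnings

import numpy as np
import scipy.stats

# Percentages and bin edges from the answer; the 1e7 upper edge is an
# arbitrary cap on the open-ended top bracket.
percent = np.array([6.2, 8.5, 7.6, 10.8, 7.1, 9.6, 15.3, 12.2, 22.7])
edges = np.array([0, 3000, 5000, 7000, 10000, 12000, 15000, 20000, 25000, 1e7])

# Bin widths differ, so convert per-bin mass to density (mass / width).
density = percent / np.diff(edges)

with warnings.catch_warnings():
    # SciPy 1.10+ warns about unequal bin widths when `density` is not given.
    warnings.simplefilter("ignore", RuntimeWarning)
    dist = scipy.stats.rv_histogram((density, edges))

# The CDF at each upper edge should match the cumulative percentages.
cdf_percent = 100 * dist.cdf(edges[1:])
expected = np.cumsum(percent)

# Median income implied by this piecewise-uniform model.
median = dist.ppf(0.5)

# Draw synthetic incomes; seeded only for reproducibility.
samples = dist.rvs(size=1000, random_state=0)
```

Note that the model is uniform within each bin, so values drawn from the top bracket are spread evenly up to the arbitrary cap; choosing a lower cap changes only that tail.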
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Mankind_008 |