How do I find the KL divergence of samples from two 2D distributions?
Suppose I have two sets of 1000 2D samples that, when plotted, look like two partially overlapping point clouds (the generating code is below).
I'd like to have a metric for the amount of difference between the distributions and thought the KL divergence would be suitable.
I've been looking at sp.stats.entropy(); however, from this answer:
Interpreting scipy.stats.entropy values, it appears I first need to convert the samples to a PDF. How can one do this starting from four 1D arrays?
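For what it's worth, this is the kind of thing I had in mind, though it is only a rough sketch (the bin count, the epsilon smoothing, and the helper name hist_kl are arbitrary choices on my part): build two 2D histograms over shared bin edges, flatten them, and pass them to scipy.stats.entropy(), which computes the KL divergence when given two distributions:
import numpy as np
from scipy.stats import entropy

def hist_kl(x1, y1, x2, y2, bins=20, eps=1e-10):
    # Shared bin edges so both histograms discretise the same region
    xedges = np.linspace(min(x1.min(), x2.min()), max(x1.max(), x2.max()), bins + 1)
    yedges = np.linspace(min(y1.min(), y2.min()), max(y1.max(), y2.max()), bins + 1)
    p, _, _ = np.histogram2d(x1, y1, bins=[xedges, yedges])
    q, _, _ = np.histogram2d(x2, y2, bins=[xedges, yedges])
    # Flatten and smooth empty bins to avoid division by zero; entropy() normalises
    p = p.ravel() + eps
    q = q.ravel() + eps
    return entropy(p, q)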
The example data above was generated as follows:
import numpy as np
import matplotlib.pyplot as plt

# Two 2D Gaussian clouds with the same spread but shifted means
dist1_x = np.random.normal(0, 10, 1000)
dist1_y = np.random.normal(0, 5, 1000)
dist2_x = np.random.normal(3, 10, 1000)
dist2_y = np.random.normal(4, 5, 1000)
plt.scatter(dist1_x, dist1_y)
plt.scatter(dist2_x, dist2_y)
plt.show()
For my real data I only have the samples, not the distributions from which they came (although, if need be, one could calculate the mean and covariance and assume the distributions are Gaussian). Is it possible to calculate the KL divergence directly from the samples like this?
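(To make the Gaussian fallback concrete, this is roughly what I mean by fitting a mean and covariance and using the closed-form KL divergence between the two fitted Gaussians; the helper name gaussian_kl is just for illustration, and the result is only meaningful if both sample clouds really are Gaussian.)
import numpy as np

def gaussian_kl(samples_p, samples_q):
    # Fit a Gaussian (mean + covariance) to each (n, d) sample array,
    # then evaluate the closed-form KL(N_p || N_q).
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    d = samples_p.shape[1]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))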
Solution 1:[1]
There is a paper on exactly this problem, "Kullback-Leibler Divergence Estimation of Continuous Distributions" (Pérez-Cruz, 2008), which describes an estimator that works directly on samples.
An open-source implementation is available here: https://gist.github.com/atabakd/ed0f7581f8510c8587bc2f41a094b518
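The estimator in that paper is built from nearest-neighbour distances, so it needs no binning and no Gaussian assumption. As a rough sketch of the idea (the function name knn_kl_divergence is mine; this is the 1-nearest-neighbour form of the estimator, and the gist above should be treated as the reference implementation):
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y):
    # x ~ P with shape (n, d), y ~ Q with shape (m, d)
    n, d = x.shape
    m, dy = y.shape
    assert d == dy, "both sample sets must have the same dimensionality"
    # Distance from each x_i to its nearest neighbour in x (k=2 skips the point itself)
    r = cKDTree(x).query(x, k=2)[0][:, 1]
    # Distance from each x_i to its nearest neighbour in y
    s = cKDTree(y).query(x, k=1)[0]
    # 1-NN estimate of D_KL(P || Q) as in Pérez-Cruz (2008)
    return (d / n) * np.sum(np.log(s / r)) + np.log(m / (n - 1))
For the example data in the question, stack the 1D arrays into (1000, 2) arrays first, e.g. samples_p = np.column_stack([dist1_x, dist1_y]) and samples_q = np.column_stack([dist2_x, dist2_y]), then call knn_kl_divergence(samples_p, samples_q).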
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chenghao Lv |