How to get median and quartiles/percentiles of an array in JavaScript (or PHP)?
Solution 1:[1]
I updated the JavaScript translation from the first answer to use arrow functions and a bit more concise notation. The functionality remains mostly the same, except for std, which now computes the sample standard deviation (dividing by arr.length - 1 instead of just arr.length).
// sort a copy of the array ascending (the caller's array is not reordered)
const asc = arr => [...arr].sort((a, b) => a - b);
const sum = arr => arr.reduce((a, b) => a + b, 0);
const mean = arr => sum(arr) / arr.length;

// sample standard deviation
const std = (arr) => {
  const mu = mean(arr);
  const diffArr = arr.map(a => (a - mu) ** 2);
  return Math.sqrt(sum(diffArr) / (arr.length - 1));
};

const quantile = (arr, q) => {
  const sorted = asc(arr);
  const pos = (sorted.length - 1) * q;
  const base = Math.floor(pos);
  const rest = pos - base;
  if (sorted[base + 1] !== undefined) {
    return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
  } else {
    return sorted[base];
  }
};

const q25 = arr => quantile(arr, .25);
const q50 = arr => quantile(arr, .50);
const q75 = arr => quantile(arr, .75);
const median = arr => q50(arr);
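Each of q25, q50 and q75 sorts its input again. If you need all three quartiles at once, a small sketch along these lines sorts one copy and reuses it. The names quartiles and quantileOfSorted are hypothetical, not part of the answer; the interpolation is the same R-7 rule used by quantile above.

```javascript
// Hypothetical helper: R-7 quantile of an array that is ALREADY sorted ascending.
const quantileOfSorted = (sorted, q) => {
  const pos = (sorted.length - 1) * q;
  const base = Math.floor(pos);
  const rest = pos - base; // fractional part = interpolation weight
  return sorted[base + 1] !== undefined
    ? sorted[base] + rest * (sorted[base + 1] - sorted[base])
    : sorted[base];
};

// Hypothetical helper: Q1, median and Q3 from a single sort.
const quartiles = (arr) => {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...arr].sort((a, b) => a - b);
  return {
    q1: quantileOfSorted(sorted, 0.25),
    median: quantileOfSorted(sorted, 0.5),
    q3: quantileOfSorted(sorted, 0.75),
  };
};

const data = [4, 1, 3, 2];
console.log(quartiles(data)); // { q1: 1.75, median: 2.5, q3: 3.25 }
console.log(data); // [ 4, 1, 3, 2 ] (unchanged)
```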
Solution 2:[2]
After searching for a long time and finding different versions that give different results, I found this nice snippet on Bastian Pöttner's blog, but for PHP. As a bonus, we also get the average and standard deviation of the data (for normal distributions)...
PHP Version
//from https://blog.poettner.de/2011/06/09/simple-statistics-with-php/
function Median($Array) {
    return Quartile_50($Array);
}

function Quartile_25($Array) {
    return Quartile($Array, 0.25);
}

function Quartile_50($Array) {
    return Quartile($Array, 0.5);
}

function Quartile_75($Array) {
    return Quartile($Array, 0.75);
}

function Quartile($Array, $Quartile) {
    sort($Array);
    $pos = (count($Array) - 1) * $Quartile;
    $base = floor($pos);
    $rest = $pos - $base;
    if( isset($Array[$base+1]) ) {
        return $Array[$base] + $rest * ($Array[$base+1] - $Array[$base]);
    } else {
        return $Array[$base];
    }
}

function Average($Array) {
    return array_sum($Array) / count($Array);
}

function StdDev($Array) {
    if( count($Array) < 2 ) {
        return;
    }
    $avg = Average($Array);
    $sum = 0;
    foreach($Array as $value) {
        $sum += pow($value - $avg, 2);
    }
    return sqrt((1 / (count($Array) - 1)) * $sum);
}
Based on the author's comments, I wrote a JavaScript translation that should be useful because, surprisingly, it is nearly impossible to find a JavaScript equivalent on the web that doesn't require an additional library like Math.js.
JavaScript Version
//adapted from https://blog.poettner.de/2011/06/09/simple-statistics-with-php/
function Median(data) {
    return Quartile_50(data);
}

function Quartile_25(data) {
    return Quartile(data, 0.25);
}

function Quartile_50(data) {
    return Quartile(data, 0.5);
}

function Quartile_75(data) {
    return Quartile(data, 0.75);
}

function Quartile(data, q) {
    data = Array_Sort_Numbers(data);
    var pos = (data.length - 1) * q;
    var base = Math.floor(pos);
    var rest = pos - base;
    if (data[base + 1] !== undefined) {
        return data[base] + rest * (data[base + 1] - data[base]);
    } else {
        return data[base];
    }
}

// sorts a copy numerically, so the caller's array is not reordered
function Array_Sort_Numbers(inputarray) {
    return inputarray.slice().sort(function (a, b) {
        return a - b;
    });
}

function Array_Sum(t) {
    return t.reduce(function (a, b) { return a + b; }, 0);
}

function Array_Average(data) {
    return Array_Sum(data) / data.length;
}

// note: divides by tab.length (population standard deviation),
// unlike the PHP StdDev above, which divides by count - 1 (sample)
function Array_Stdev(tab) {
    var i, j, total = 0, mean = 0, diffSqredArr = [];
    for (i = 0; i < tab.length; i += 1) {
        total += tab[i];
    }
    mean = total / tab.length;
    for (j = 0; j < tab.length; j += 1) {
        diffSqredArr.push(Math.pow((tab[j] - mean), 2));
    }
    return Math.sqrt(diffSqredArr.reduce(function (firstEl, nextEl) {
        return firstEl + nextEl;
    }, 0) / tab.length);
}
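The two standard deviation functions in this answer use different divisors: the PHP StdDev divides by count - 1 (sample standard deviation), while the JavaScript Array_Stdev divides by tab.length (population standard deviation), so they disagree on the same data. A minimal sketch that makes the choice explicit could look like this; stdev and its sample flag are hypothetical names, not part of the blog post.

```javascript
// Hypothetical helper: standard deviation with an explicit divisor.
// sample = true  -> divide by n - 1 (like the PHP StdDev above)
// sample = false -> divide by n     (like Array_Stdev above)
function stdev(data, sample) {
  var n = data.length;
  if (n === 0 || (sample && n < 2)) {
    return undefined; // not defined for these sizes
  }
  var mean = data.reduce(function (a, b) { return a + b; }, 0) / n;
  var sumSq = data.reduce(function (acc, x) {
    return acc + (x - mean) * (x - mean);
  }, 0);
  return Math.sqrt(sumSq / (sample ? n - 1 : n));
}

var values = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(stdev(values, false)); // 2
console.log(stdev(values, true));  // about 2.138, i.e. sqrt(32 / 7)
```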
Solution 3:[3]
TL;DR
The other answers appear to have solid implementations of the "R-7" version of computing quantiles. Below is some context and another JavaScript implementation borrowed from D3 using the same R-7 method, with the bonuses that this solution is ES5-compliant (no transpilation required) and probably covers a few more edge cases.
Existing solution from D3 (ported to es5/"vanilla JS")
The "Background" section below should convince you to grab an existing implementation instead of writing your own.
One good candidate is D3's d3.array package. It has a quantile function that's essentially BSD licensed:
https://github.com/d3/d3-array/blob/master/src/quantile.js
I've quickly created a pretty straight port from ES6 into vanilla JavaScript of d3's quantileSorted function (the second function defined in that file), which requires the array of elements to have already been sorted. Here it is. I've tested it against d3's own results enough to feel it's a valid port, but your experience might differ (let me know in the comments if you find a difference, though!).
Again, remember that sorting must come before the call to this function, just as in D3's quantileSorted.
//Credit D3: https://github.com/d3/d3-array/blob/master/LICENSE
function quantileSorted(values, p, fnValueFrom) {
    var n = values.length;
    if (!n) {
        return;
    }

    // default accessor: treat each element as the value itself
    fnValueFrom =
        Object.prototype.toString.call(fnValueFrom) == "[object Function]"
            ? fnValueFrom
            : function (x) {
                  return x;
              };

    p = +p;

    // p at or below 0 (or a single element) gives the first value
    if (p <= 0 || n < 2) {
        return +fnValueFrom(values[0], 0, values);
    }

    // p at or above 1 gives the last value
    if (p >= 1) {
        return +fnValueFrom(values[n - 1], n - 1, values);
    }

    // R-7: interpolate linearly between the neighbors of position (n - 1) * p
    var i = (n - 1) * p,
        i0 = Math.floor(i),
        value0 = +fnValueFrom(values[i0], i0, values),
        value1 = +fnValueFrom(values[i0 + 1], i0 + 1, values);

    return value0 + (value1 - value0) * (i - i0);
}
Note that fnValueFrom is a way to process a complex object into a value. You can see how that might work in a list of d3 usage examples here; search for where .quantile is used.
The quick version is: if the values are tortoises and you're sorting by tortoise.age in every case, your fnValueFrom might be x => x.age. More complicated versions, including ones that might require accessing the index (parameter 2) and the entire collection (parameter 3) during the value calculation, are left up to the reader.
I've added a quick check here so that if nothing is given for fnValueFrom, or if what's given isn't a function, the logic assumes the elements in values are the actual sorted values themselves.
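As a sketch of the tortoise case, assuming made-up tortoise records and a hypothetical quantileBy helper (not part of d3), the accessor is used both to sort a copy and to pull out the value being interpolated, with the same R-7 rule and clamping as quantileSorted. Unlike fnValueFrom, this accessor only receives the element, not the index or collection.

```javascript
// Hypothetical helper: R-7 quantile of items by an accessor, sorting a copy first.
function quantileBy(items, p, getValue) {
  var n = items.length;
  if (!n) return undefined;
  // Sort a copy by the extracted value, since the interpolation needs sorted input.
  var sorted = items.slice().sort(function (a, b) {
    return getValue(a) - getValue(b);
  });
  if (p <= 0 || n < 2) return +getValue(sorted[0]);
  if (p >= 1) return +getValue(sorted[n - 1]);
  var i = (n - 1) * p;
  var i0 = Math.floor(i);
  var v0 = +getValue(sorted[i0]);
  var v1 = +getValue(sorted[i0 + 1]);
  return v0 + (v1 - v0) * (i - i0);
}

// Made-up example data.
var tortoises = [
  { name: "A", age: 80 },
  { name: "B", age: 20 },
  { name: "C", age: 50 },
  { name: "D", age: 110 }
];

console.log(quantileBy(tortoises, 0.5, function (x) { return x.age; })); // 65
```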
Logical comparison to existing answers
I'm reasonably sure this reduces to the same version as in the other two answers (see "The R-7 Method", below), but if you need to justify why you're using this to a product manager or whoever, the comparison below may help.
Quick comparison:
function Quartile(data, q) {
    data = Array_Sort_Numbers(data); // we're assuming it's already sorted, above, vs. the function use here. Same difference.
    var pos = (data.length - 1) * q; // i = (n - 1) * p
    var base = Math.floor(pos);      // i0 = Math.floor(i)
    var rest = pos - base;           // (i - i0)
    if (data[base + 1] !== undefined) {
        // value0 + (i - i0) * (value1, which is values[i0 + 1], - value0, which is values[i0])
        return data[base] + rest * (data[base + 1] - data[base]);
    } else {
        // reached when data[base + 1] doesn't exist (e.g. q = 1, or a single element);
        // d3 covers these with if (p >= 1) and the n < 2 part of if (p <= 0 || n < 2)
        return data[base];
    }
}
So that's logically close and appears to be exactly the same. I think the d3 version that I ported covers a few more edge/invalid conditions and includes the fnValueFrom integration, both of which could be useful.
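If you'd rather check the claim than take it on faith, a quick sketch like the following compares the two formulations on random sorted arrays. quartileStyle and d3Style are hypothetical trimmed copies of Quartile (without the sort) and quantileSorted (without fnValueFrom). Both perform the same floating-point operations, so the maximum difference reported should be 0.

```javascript
// Trimmed copy of Quartile from the comparison (expects sorted input).
function quartileStyle(sorted, q) {
  var pos = (sorted.length - 1) * q;
  var base = Math.floor(pos);
  var rest = pos - base;
  return sorted[base + 1] !== undefined
    ? sorted[base] + rest * (sorted[base + 1] - sorted[base])
    : sorted[base];
}

// Trimmed copy of quantileSorted without the fnValueFrom accessor.
function d3Style(sorted, p) {
  var n = sorted.length;
  if (p <= 0 || n < 2) return sorted[0];
  if (p >= 1) return sorted[n - 1];
  var i = (n - 1) * p;
  var i0 = Math.floor(i);
  return sorted[i0] + (sorted[i0 + 1] - sorted[i0]) * (i - i0);
}

// Largest absolute difference over random sorted arrays and p = 0, 0.05, ..., 1.
function maxDifference(trials) {
  var worst = 0;
  for (var t = 0; t < trials; t++) {
    var n = 1 + Math.floor(Math.random() * 20);
    var arr = [];
    for (var k = 0; k < n; k++) arr.push(Math.random() * 100);
    arr.sort(function (a, b) { return a - b; });
    for (var j = 0; j <= 20; j++) {
      var p = j / 20;
      worst = Math.max(worst, Math.abs(quartileStyle(arr, p) - d3Style(arr, p)));
    }
  }
  return worst;
}

console.log(maxDifference(1000)); // 0
```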
The R-7 Method vs. "Common Sense"
As mentioned in the TL;DR, the answers here, according to d3.array's readme, all use the "R-7 method".
This particular implementation [from d3] uses the R-7 method, which is the default for the R programming language and Excel.
Since the d3.array code matches the other answers here, we can safely say they're all using R-7.
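Written out with 0-based indices over the sorted values v_0 ≤ v_1 ≤ … ≤ v_{n-1}, the R-7 rule computed by Quartile and quantileSorted is roughly the following (a restatement of the code above, for 0 ≤ p < 1; p = 1 returns v_{n-1}):

```latex
i = (n - 1)\,p, \qquad i_0 = \lfloor i \rfloor, \qquad
Q(p) = v_{i_0} + (i - i_0)\,\bigl(v_{i_0 + 1} - v_{i_0}\bigr)
```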
Background
After a little sleuthing on some math and stats StackExchange sites (1, 2), I found that there are "common sense" ways of calculating each quantile, but that those don't typically match the results of the nine generally recognized ways to calculate them.
The answer at that second link from stats.stackexchange says of the common-sensical method that...
Your textbook is confused. Very few people or software define quartiles this way. (It tends to make the first quartile too small and the third quartile too large.)
The quantile function in R implements nine different ways to compute quantiles!
I thought that last bit was interesting, and here's what I dug up on those nine methods...
- Wikipedia's description of those nine methods here, nicely grouped in a table
- An article from the Journal of Statistics Education titled "Quartiles in Elementary Statistics"
- A blog post at SAS.com called "Sample quantiles: A comparison of 9 definitions"
The differences between d3's use of "method 7" (R-7) to determine quantiles and the common-sense approach are demonstrated nicely in the SO question "d3.quantile seems to be calculating q1 incorrectly", and the reasoning is described in good detail in this post, which can be found in philippe's original source for the PHP version.
Here's a bit from Google Translate (original is in German):
In our example, this value is at position (n + 1) / 4 = 5.25, i.e. between the 5th value (= 5) and the 6th value (= 7). The fraction (0.25) indicates that, in addition to the value 5, ¼ of the distance between the 5th and 6th values is added. Q1 is therefore 5 + 0.25 * 2 = 5.5.
Altogether, that tells me I probably shouldn't code something based on my own understanding of what quartiles represent, and should borrow someone else's solution instead.
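The (n + 1) / 4 position in the quoted post corresponds to what R calls type 6 (R-6), whereas d3 and the earlier answers use R-7, whose 1-based position is (n - 1) * p + 1. A small sketch, assuming sorted input and hypothetical helper names, shows how the two diverge on seven values: they agree on the median but R-6 puts Q1 lower and Q3 higher here.

```javascript
// Linear interpolation at a 1-based fractional position h, clamped to the data.
function interpolateAt(sorted, h) {
  var n = sorted.length;
  if (h <= 1) return sorted[0];
  if (h >= n) return sorted[n - 1];
  var lo = Math.floor(h);
  return sorted[lo - 1] + (h - lo) * (sorted[lo] - sorted[lo - 1]);
}

// R-6: position (n + 1) * p, as in the quoted post.
function quantileR6(sorted, p) {
  return interpolateAt(sorted, (sorted.length + 1) * p);
}

// R-7: position (n - 1) * p + 1, as in d3 and the answers above.
function quantileR7(sorted, p) {
  return interpolateAt(sorted, (sorted.length - 1) * p + 1);
}

var sorted = [1, 2, 3, 4, 5, 6, 7];
console.log(quantileR6(sorted, 0.25), quantileR7(sorted, 0.25)); // 2 2.5
console.log(quantileR6(sorted, 0.75), quantileR7(sorted, 0.75)); // 6 5.5
```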
Solution 4:[4]
Based on buboh's answer, which I have used for over a year, I noticed some odd results when calculating Q1 and Q3 when there are two numbers in the middle.
I have no clue why there is a rest value and how it is used, but by my understanding, if you end up with two numbers in the middle, you need to take their average to calculate the median. With that in mind, I edited the function:
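// sort a copy ascending so the caller's array is not reordered
const asc = (arr) => [...arr].sort((a, b) => a - b);

const quantile = (arr, q) => {
  const sorted = asc(arr);
  let pos = (sorted.length - 1) * q;
  // position lands exactly on an element: return it
  if (pos % 1 === 0) {
    return sorted[pos];
  }
  // otherwise average the two elements around the position
  pos = Math.floor(pos);
  if (sorted[pos + 1] !== undefined) {
    return (sorted[pos] + sorted[pos + 1]) / 2;
  }
  return sorted[pos];
};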
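For context on the rest value: it is the fractional part of the position, used as the weight for linear interpolation between the two neighboring elements (the R-7 rule). Averaging the neighbors is the same as always using a weight of 0.5, so the two versions agree on the median, whose position is always a whole or half number, but can differ for Q1 and Q3. A minimal side-by-side sketch, with hypothetical function names:

```javascript
// Sort a copy ascending.
const sortAsc = (arr) => [...arr].sort((a, b) => a - b);

// Solution 1's rule: interpolate using the fractional part as the weight.
const quantileInterpolated = (arr, q) => {
  const sorted = sortAsc(arr);
  const pos = (sorted.length - 1) * q;
  const base = Math.floor(pos);
  const rest = pos - base;
  return sorted[base + 1] !== undefined
    ? sorted[base] + rest * (sorted[base + 1] - sorted[base])
    : sorted[base];
};

// Solution 4's rule: average the two neighbors whenever the position is fractional.
const quantileMidpoint = (arr, q) => {
  const sorted = sortAsc(arr);
  const pos = (sorted.length - 1) * q;
  if (pos % 1 === 0) return sorted[pos];
  const base = Math.floor(pos);
  return sorted[base + 1] !== undefined
    ? (sorted[base] + sorted[base + 1]) / 2
    : sorted[base];
};

const data = [1, 2, 3, 4];
for (const q of [0.25, 0.5, 0.75]) {
  console.log(q, quantileInterpolated(data, q), quantileMidpoint(data, q));
}
// 0.25 1.75 1.5
// 0.5 2.5 2.5
// 0.75 3.25 3.5
```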
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Ben |
Solution 3 | |
Solution 4 | |