boxplot.stats in R not identifying outliers
I have used boxplot.stats()$out to get the outliers of a vector in R. However, I noticed that it often fails to identify outliers. For example:
list = c(3,4,7,500)
boxplot.stats(list)
$stats
[1]   3.0   3.5   5.5 253.5 500.0
$n
[1] 4
$conf
[1] -192  203
$out
numeric(0)
quantile(list)
    0%    25%    50%    75%   100%
  3.00   3.75   5.50 130.25 500.00
130.25 + 1.5*IQR(list)
[1] 320
As you can see, boxplot.stats() did not flag 500 as an outlier, even though the documentation says it uses the Q1/Q3 -/+ 1.5*IQR method. By that rule the upper limit is 320, so 500 should have been identified as an outlier. Why is it not?
I have tried this with a vector of 5 elements instead of 4, and with an outlier that is very small instead of very large, and I still get the same problem.
Solution 1:[1]
Notice that the fourth number in the "stats" portion (the upper hinge) is 253.5, not 130.25.
The documentation for boxplot.stats says:
The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise
In other words, for your data it is using (7+500)/2 = 253.5 as the Q3 value (and, incidentally, (3+4)/2 = 3.5 as Q1, not the 3.75 that you got from quantile). boxplot.stats will therefore use the upper boundary 253.5 + 1.5*(253.5 - 3.5) = 628.5. Since 500 is below 628.5, it is not reported as an outlier.
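To check the arithmetic above, here is a minimal sketch that computes both sets of limits side by side. The variable names (x, hinge_upper, quant_upper and so on) are made up for illustration; fivenum() returns Tukey's five-number summary, whose second and fourth entries are the hinges that appear in $stats.

```r
x <- c(3, 4, 7, 500)

# Hinge-based rule (the hinges are what boxplot.stats() reports in $stats)
fn <- fivenum(x)                    # min, lower hinge, median, upper hinge, max
hinge_spread <- fn[4] - fn[2]       # 253.5 - 3.5 = 250
hinge_upper  <- fn[4] + 1.5 * hinge_spread
hinge_lower  <- fn[2] - 1.5 * hinge_spread
hinge_out    <- x[x < hinge_lower | x > hinge_upper]

# Quantile-based rule (what the question computed by hand)
q <- quantile(x, c(0.25, 0.75), names = FALSE)
quant_upper <- q[2] + 1.5 * (q[2] - q[1])
quant_lower <- q[1] - 1.5 * (q[2] - q[1])
quant_out   <- x[x < quant_lower | x > quant_upper]

print(hinge_upper)  # 628.5
print(hinge_out)    # numeric(0)
print(quant_upper)  # 320
print(quant_out)    # 500
```

If the quantile()-based limits are the ones you want, one option is to compute them by hand as in the second half of the sketch rather than relying on boxplot.stats().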
Solution 2:[2]
If you read the help page help("boxplot.stats") carefully, the return value section says the following. My emphasis.
stats
a vector of length 5, containing the extreme of the lower whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and the extreme of the upper whisker.
Then, in the same section, again my emphasis.
out
the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).
Your data has 4 points. The extreme of the upper whisker, as returned in list member $stats, is 500.0, which is the maximum of your data. No point lies beyond the whiskers, so $out is empty. There is no error.
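For contrast, here is a small sketch with a made-up vector y (not from the question) in which one value does lie beyond the whisker. It is meant only to show how $stats and $out relate: the whisker end in $stats is pulled back to the largest value that is not an outlier, and the remaining point is reported in $out.

```r
# Hypothetical data: six points, one far from the rest
y <- c(3, 4, 5, 6, 7, 500)

res <- boxplot.stats(y)

# Hinges are 4 and 7, so the upper limit is 7 + 1.5 * (7 - 4) = 11.5
print(res$stats)  # 3.0 4.0 5.5 7.0 7.0
print(res$out)    # 500
```

Here the fifth element of $stats is 7, not 500, because 500 exceeds the limit of 11.5 and is moved to $out.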
Solution 3:[3]
Try this to identify and label all the outliers (using the built-in iris data as an example):
library(car)
Boxplot(Petal.Length ~ Species, data = iris, id = list(n = Inf))
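If installing car is not an option, a rough base-R sketch along the same lines is to call boxplot() with plot = FALSE and read the flagged points from its return value. The names b and flagged are only illustrative, and the iris data set is assumed as the example. Note that this uses the same hinge-based rule as boxplot.stats().

```r
# boxplot() returns its statistics invisibly; plot = FALSE skips drawing
b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

# b$out holds the flagged values, b$group the index of the group each belongs to
flagged <- data.frame(
  Species      = b$names[b$group],
  Petal.Length = b$out
)
print(flagged)
```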
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Rui Barradas |
Solution 3 | Martin Gal |