'How to write a query to find Median of column in a Table in MySql?
I was trying to solve problem in Hackerrank SQL Practice section and stuck in Problem 'Weather Observation Problem 20'.
To find Median, I though of the following approach:
- sub-query to count the lower half of the entries.
- sub-query to count the upper half of the entries.
- Equate these queries together under a WHERE clause (so that an entry has the same number of entries before and after).
QUERY:
select round(s.lat_n,4)
from station s
where (
select round(count(s.id)/2)-1
from station
) = (
select count(s1.id)
from station s1
where s1.lat_n > s.lat_n
);
PLEASE HELP ME OUT WITH THE OPTIMIZED QUERY.
LINK OF PROBLEM STATEMENT : https://www.hackerrank.com/challenges/weather-observation-station-20/problem
Solution 1:[1]
When you sort the values the median will be either exactly in the middle (odd number of rows) or the average of two values around the middle (even number of rows). For these values the following is true:
- at least half of all values (including itself) are equal or less
- at least half of all values (including itself) are equal or greater
When you find that/those values (let's call them candidates), you will need the average of distinct candidate values.
The abouve can be expressed with the following query:
select round(avg(distinct lat_n), 4) as median_lat_n
from station s
cross join (select count(*) as total from station) t
where t.total <= 2 * (select count(*) from station s1 where s1.lat_n <= s.lat_n)
and t.total <= 2 * (select count(*) from station s1 where s1.lat_n >= s.lat_n)
Note that this is a quite slow solution for big tables.
Solution 2:[2]
Select round(lat_n,4) from (select lat_n , rank() over(order by lat_n) as rnk from station )a where rnk = (select round((count(lat_n)+1)/2,0) as c from station ) ;
This worked for me
Solution 3:[3]
/* SQL ERVER / SELECT LATNR FROM ( SELECT CAST(ROUND(LAT_N,4) AS DECIMAL(17,4)) LATNR, RANK() OVER( ORDER BY LAT_N ASC) IDX FROM STATION ) AS F WHERE IDX IN ( CEILING (CAST((SELECT COUNT() +1 FROM STATION) AS FLOAT) /2 ))
Solution 4:[4]
Select round(lat_n,4)
from (select lat_n , rank() over(order by lat_n) as Rnk_Col
from station) as Rnk_Table
where Rnk_Col = (select round((count(lat_n)+1)/2,0)
from station);
Solution 5:[5]
I found another solution that is more robust than hardcoding the +1
and more efficient than including an IF
statement.
with Tbl as (select row_number() over (order by lat_n) rc, lat_n from station)
select cast(avg(lat_n) as Decimal(15,4))
from tbl
where rc = (select ceiling(max(rc)/2.0) from tbl)
Solution 6:[6]
select round(lat_n,4) from
(SELECT lat_n,(ROW_NUMBER() OVER(order BY lat_n) ) as "serial_no"
from station ) AS t1
WHERE t1.serial_no = (
select (count(*)+1)/2
from station
)
I used row_number() to create a new column in a table that gives a serial number to ascending ordered lat_n and then displayed the median lat_n value corresponding to middle value of row_number.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Paul Spiegel |
Solution 2 | Srikant |
Solution 3 | Certmax |
Solution 4 | Khaled Eraky |
Solution 5 | dmc |
Solution 6 | Gurpartap sandhu |