'get the most common value for each column
I'm attempting to create an SQL query that retrieves the total_cost
for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA
and columnB
, with their respective values.
For example, with the following table contents:
cost | columnA | columnB | target |
---|---|---|---|
250 | Foo | Bar | XYZ |
200 | Foo | Bar | XYZ |
150 | Bar | Bar | ABC |
250 | Foo | Bar | ABC |
The result would need to be:
total_cost | columnA_dominant | columnB_dominant | columnA_value | columnB_value |
---|---|---|---|---|
850 | Foo | Bar | 250 | 400 |
Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA
using this answer. But after this, I'm not sure how to also get the dominant value for columnB
and the values too.
This is my current SQL:
SELECT
SUM(`cost`) AS `total_cost`,
COUNT(`columnA`) AS `columnA_dominant`
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"
UPDATE: Thanks to @Barmar for the idea of using a subquery, I managed to get the dominant values for columnA
and columnB
:
SELECT
-- Retrieve total cost.
SUM(`cost`) AS `total_cost`,
-- Get dominant values.
(
SELECT `columnA`
FROM `table`
GROUP BY `columnA`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnA_dominant`,
(
SELECT `columnB`
FROM `table`
GROUP BY `columnB`
ORDER BY COUNT(*) DESC
LIMIT 1
) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"
However, I'm still having issues figuring out how to calculate the respective values.
Solution 1:[1]
You might get close, if we want to get percentage values we can try to add COUNT(*)
at subquery to get max count by columnA
and columnB
then do division by total count
SELECT
SUM(cost),
(
SELECT tt.columnA
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnA_dominant,
(
SELECT tt.columnB
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) AS columnB_dominant,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnA
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnA_percentage,
(
SELECT COUNT(*)
FROM T tt
GROUP BY tt.columnB
ORDER BY COUNT(*) DESC
LIMIT 1
) / COUNT(*) AS columnB_percentage
FROM T t1
If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery
SELECT SUM(cost) OVER(),
FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY columnA) counter1,
COUNT(*) OVER (PARTITION BY columnB) counter2
FROM T
) t1
LIMIT 1
Solution 2:[2]
try this query
select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
so if you want to get the percent and dominant value you must make their own query like this
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
then you can join with the sum query to get all value you want
hope this can help you
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | D-Shih |
Solution 2 | kebis |