'get the most common value for each column

I'm attempting to create an SQL query that retrieves the total_cost for every row in a table. Alongside that, I also need to collect the most dominant value for both columnA and columnB, with their respective values.

For example, with the following table contents:

cost columnA columnB target
250 Foo Bar XYZ
200 Foo Bar XYZ
150 Bar Bar ABC
250 Foo Bar ABC

The result would need to be:

total_cost columnA_dominant columnB_dominant columnA_value columnB_value
850 Foo Bar 250 400

Now I can get as far as calculating the total cost - that's no issue. I can also get the most dominant value for columnA using this answer. But after this, I'm not sure how to also get the dominant value for columnB and the values too.

This is my current SQL:

SELECT 
    SUM(`cost`) AS `total_cost`,
    COUNT(`columnA`) AS `columnA_dominant` 
FROM `table`
GROUP BY `columnA_dominant`
ORDER BY `columnA_dominant` DESC
WHERE `target` = "ABC"

UPDATE: Thanks to @Barmar for the idea of using a subquery, I managed to get the dominant values for columnA and columnB:

SELECT 
    -- Retrieve total cost.
    SUM(`cost`) AS `total_cost`,
    -- Get dominant values.
    (
        SELECT `columnA`
        FROM `table`
        GROUP BY `columnA`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnA_dominant`,
    (
        SELECT `columnB`
        FROM `table`
        GROUP BY `columnB`
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) AS `columnB_dominant`
FROM `table`
WHERE `target` = "XYZ"

However, I'm still having issues figuring out how to calculate the respective values.



Solution 1:[1]

You might get close, if we want to get percentage values we can try to add COUNT(*) at subquery to get max count by columnA and columnB then do division by total count

SELECT 
    SUM(cost),
    (
        SELECT tt.columnA
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnA_dominant,
    (
        SELECT tt.columnB
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )  AS columnB_dominant,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnA
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnA_percentage,
    (
        SELECT COUNT(*)
        FROM T tt
        GROUP BY tt.columnB
        ORDER BY COUNT(*) DESC
        LIMIT 1
    ) / COUNT(*) AS columnB_percentage
FROM T t1

If your MySQL version supports the window function, there is another way which reduce table scan might get better performance than a correlated subquery

SELECT SUM(cost) OVER(),
       FIRST_VALUE(columnA) OVER (ORDER BY counter1 DESC) columnA_dominant,
       FIRST_VALUE(columnB) OVER (ORDER BY counter2 DESC) columnB_dominant,
       FIRST_VALUE(counter1) OVER (ORDER BY counter1 DESC) / COUNT(*) OVER() columnA_percentage,
       FIRST_VALUE(counter2) OVER (ORDER BY counter2 DESC) / COUNT(*) OVER() columnB_percentage
FROM (
  SELECT *,
         COUNT(*) OVER (PARTITION BY columnA) counter1,
         COUNT(*) OVER (PARTITION BY columnB) counter2  
  FROM T
) t1
LIMIT 1

sqlfiddle

Solution 2:[2]

try this query

select sum(cost) as total_cost,p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage
from get_common,(
select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc
)p,
(select top 1 columnB,columnB_percentage
from (
select columnB,count(columnB) as count_columnB, cast(count(columnB) as float)/(select count(columnB) from get_common) as columnB_percentage
from get_common
group by columnB) t
order by count_columnB desc)q
group by p.columnA,q.columnB,p.columnA_percentage,q.columnB_percentage

so if you want to get the percent and dominant value you must make their own query like this

select top 1 columnA,columnA_percentage
from(
select columnA,count(columnA) as count_columnA,cast(count(columnA) as float)/(select count(columnA) from get_common) as columnA_percentage
from get_common
group by columnA)s
order by count_columnA desc

then you can join with the sum query to get all value you want

hope this can help you

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 D-Shih
Solution 2 kebis