'How do I get DISTINCT columns and COUNT occurrences of sub DISTINCT columns [closed]

I am having some difficulty building a file download statistics database and displaying the information how I would like.

Table: customer_statistics

| user |  product_id  |  file_download  |  date_accessed        |
-----------------------------------------------------------------
| tom  |  1104        |  file_1.pdf     |  2017-05-06 00:00:00  |
| tom  |  1048        |  file_3.pdf     |  2017-05-06 00:00:00  |
| tom  |  1048        |  file_3.pdf     |  2017-05-06 00:00:00  |
| tom  |  1048        |  file_3.pdf     |  2017-05-06 00:00:00  |
| tom  |  1048        |  file_3.pdf     |  2017-05-06 00:00:00  |
| tom  |  1010        |  file_3.pdf     |  2017-05-06 00:00:00  |
| tom  |  1077        |  file_3.pdf     |  2017-05-06 00:00:00  |
| sue  |  1749        |  file_2.pdf     |  2017-05-06 00:00:00  |
| sue  |  1284        |  file_3.pdf     |  2017-05-06 00:00:00  |
| sue  |  1284        |  file_3.pdf     |  2017-05-06 00:00:00  |
| sue  |  1065        |  file_1.pdf     |  2017-05-06 00:00:00  |
| sue  |  1344        |  file_3.pdf     |  2017-05-06 00:00:00  |
| sue  |  2504        |  file_2.pdf     |  2017-05-06 00:00:00  |

I need to display the following based on the table above:

Tom has downloaded file_3.pdf from 3 different products but has downloaded file_3.pdf from product_id 1048 4 times.

Tom has also downloaded file_1.pdf from 1 product and only once from that product_id

Tom has a total of 7 downloads from 4 different products

Sue has downloaded file_3.pdf from 2 different products but has downloaded file_3.pdf from product_id 1284 2 times.

Sue has also downloaded file_1.pdf from 1 product and only once from that product_id

Sue has also downloaded file_2.pdf from 1 product and only once from that product_id

Sue has a total of 6 downloads from 5 different products

What would the best way to do this?

Do I need to restructure my table?



Solution 1:[1]

Please try the following...

SELECT user AS user,
       file_download AS file_download,
       product_id AS product_id,
       COUNT( * ) AS CountPerProduct,
       CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
              file_download AS file_download,
              COUNT( product_id ) AS CountOfProducts
       FROM ( SELECT user AS user,
                     file_download AS file_download,
                     product_id AS product_id
              FROM customer_statistics
              GROUP BY user,
                       file_download,
                       product_id
            ) AS uniqueComboFinder
       GROUP BY user,
                file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
         file_download,
         product_id,
         CountOfProducts;

This statement starts by using the following subquery to form a list of unique combinations of user, file_download and product_id...

SELECT user AS user,
       file_download AS file_download,
       product_id AS product_id
FROM customer_statistics
GROUP BY user,
         file_download,
         product_id

The results of the above subquery are used in the following subquery to get a count of how many product_id values that the user has downloaded the file from...

SELECT user AS user,
       file_download AS file_download,
       COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
              file_download AS file_download,
              product_id AS product_id
       FROM customer_statistics
       GROUP BY user,
                file_download,
                product_id
     ) AS uniqueComboFinder
GROUP BY user,
         file_download

The resulting dataset is then joined to an instance customer_statistics in such a way that the count of product_id values for each combination of user and file_download is effectively appended to every corresponding record in customer_statistics.

The dataset resulting from this joining is then grouped by each unique combination of user, file_download and product_id and the count of records belonging to each group (i.e. the count of each time that a user has downloaded a particular file from that product_id) is calculated.

I can't remember if MySQL requires CountOfProducts to be used by GROUP BY. But, even though each combination of user, file_download and product_id determines what the value of CountOfProducts will be, many forms of SQL require you to GROUP BY each nonaggregated field selected. As such, and since adding CountOfProducts to GROUP BY does no harm, I have included CountOfProducts in the GROUP BY clause.

If one or two more rules can be clarified about their structure, then the sentences displayed can be automatically generated.

If you have any questions or comments, then please feel free to post a Comment accordingly.

Appendix

To exclude a single user from the set of results, please use the following variation.

SELECT user AS user,
       file_download AS file_download,
       product_id AS product_id,
       COUNT( * ) AS CountPerProduct,
       CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
              file_download AS file_download,
              COUNT( product_id ) AS CountOfProducts
       FROM ( SELECT user AS user,
                     file_download AS file_download,
                     product_id AS product_id
              FROM customer_statistics
              WHERE user <> excludedUser
              GROUP BY user,
                       file_download,
                       product_id
            ) AS uniqueComboFinder
       GROUP BY user,
                file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
         file_download,
         product_id,
         CountOfProducts;

I have used excludedUser here, but you can replace that with a constant value (such as Sam) or a variable that holds the targetted value.

Please note that I have added the WHERE user <> excludedUser clause to the innermost subquery. Because its parent subquery's results are based entirely on the results of the innermost subquery, the excluded user will not be represented in the parent subquery's reults. And since the excluded User value does not appear in the parent subquery's results, when the main statement's INNER JOIN is performed based (in part) on the shared values of User, then the targetted User will be excluded from the joined dataset as well.

By adding the WHERE clause to the innermost subquery, I avoided a small amount of unnecessary processing by the middle and outer levels of the statement, thus making the overall statement slightly more efficient than if the user value had been excluded at the middle or outer levels.

Similarly, if more than one User needs to be excluded, you can exclude them by explicitly coding their values of User into the statement or by joining to a table of exclusionary values. For the first case use...

SELECT user AS user,
       file_download AS file_download,
       product_id AS product_id,
       COUNT( * ) AS CountPerProduct,
       CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
              file_download AS file_download,
              COUNT( product_id ) AS CountOfProducts
       FROM ( SELECT user AS user,
                     file_download AS file_download,
                     product_id AS product_id
              FROM customer_statistics
              WHERE user NOT IN ( 'Sam', 'I', 'Am' )
              GROUP BY user,
                       file_download,
                       product_id
            ) AS uniqueComboFinder
       GROUP BY user,
                file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
         file_download,
         product_id,
         CountOfProducts;

In the second case use...

SELECT user AS user,
       file_download AS file_download,
       product_id AS product_id,
       COUNT( * ) AS CountPerProduct,
       CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
              file_download AS file_download,
              COUNT( product_id ) AS CountOfProducts
       FROM ( SELECT user AS user,
                     file_download AS file_download,
                     product_id AS product_id
              FROM customer_statistics
              WHERE user NOT IN ( SELECT user
                                  FROM excludedUsers
                                )
              GROUP BY user,
                       file_download,
                       product_id
            ) AS uniqueComboFinder
       GROUP BY user,
                file_download
     ) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
                               AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
         file_download,
         product_id,
         CountOfProducts;

Solution 2:[2]

I'll give you a hint to get you going.

Start with ditching the individual download records in favor of aggregates, like so:

CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats
SELECT user, file_id, product_id, COUNT(*) AS cnt 
  FROM customer_statistics
  GROUP BY user, file_id, product_id;

This is just one step (which, by the way, could also be used as a subquery in a larger more complicated query). There are more aggregations you can and should do to get the information you need. That's not really "restructuring the table".

In addition to more aggregation you need to consider getting the order right and producing subtotals as well.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 einpoklum