'How do I get DISTINCT columns and COUNT occurrences of sub DISTINCT columns [closed]
I am having some difficulty building a file download statistics database and displaying the information how I would like.
Table: customer_statistics
| user | product_id | file_download | date_accessed |
-----------------------------------------------------------------
| tom | 1104 | file_1.pdf | 2017-05-06 00:00:00 |
| tom | 1048 | file_3.pdf | 2017-05-06 00:00:00 |
| tom | 1048 | file_3.pdf | 2017-05-06 00:00:00 |
| tom | 1048 | file_3.pdf | 2017-05-06 00:00:00 |
| tom | 1048 | file_3.pdf | 2017-05-06 00:00:00 |
| tom | 1010 | file_3.pdf | 2017-05-06 00:00:00 |
| tom | 1077 | file_3.pdf | 2017-05-06 00:00:00 |
| sue | 1749 | file_2.pdf | 2017-05-06 00:00:00 |
| sue | 1284 | file_3.pdf | 2017-05-06 00:00:00 |
| sue | 1284 | file_3.pdf | 2017-05-06 00:00:00 |
| sue | 1065 | file_1.pdf | 2017-05-06 00:00:00 |
| sue | 1344 | file_3.pdf | 2017-05-06 00:00:00 |
| sue | 2504 | file_2.pdf | 2017-05-06 00:00:00 |
I need to display the following based on the table above:
Tom has downloaded
file_3.pdf
from3
different products but has downloadedfile_3.pdf
fromproduct_id 1048
4 times.Tom has also downloaded
file_1.pdf
from1
product and only once from thatproduct_id
Tom has a total of
7
downloads from4
different productsSue has downloaded
file_3.pdf
from2
different products but has downloadedfile_3.pdf
fromproduct_id 1284
2 times.Sue has also downloaded
file_1.pdf
from1
product and only once from thatproduct_id
Sue has also downloaded
file_2.pdf
from1
product and only once from thatproduct_id
Sue has a total of
6
downloads from5
different products
What would the best way to do this?
Do I need to restructure my table?
Solution 1:[1]
Please try the following...
SELECT user AS user,
file_download AS file_download,
product_id AS product_id,
COUNT( * ) AS CountPerProduct,
CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
file_download AS file_download,
COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
GROUP BY user,
file_download,
product_id
) AS uniqueComboFinder
GROUP BY user,
file_download
) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
file_download,
product_id,
CountOfProducts;
This statement starts by using the following subquery to form a list of unique combinations of user
, file_download
and product_id
...
SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
GROUP BY user,
file_download,
product_id
The results of the above subquery are used in the following subquery to get a count of how many product_id
values that the user
has downloaded the file
from...
SELECT user AS user,
file_download AS file_download,
COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
GROUP BY user,
file_download,
product_id
) AS uniqueComboFinder
GROUP BY user,
file_download
The resulting dataset is then joined to an instance customer_statistics
in such a way that the count of product_id
values for each combination of user
and file_download
is effectively appended to every corresponding record in customer_statistics
.
The dataset resulting from this joining is then grouped by each unique combination of user
, file_download
and product_id
and the count of records belonging to each group (i.e. the count of each time that a user
has downloaded a particular file
from that product_id
) is calculated.
I can't remember if MySQL
requires CountOfProducts
to be used by GROUP BY
. But, even though each combination of user
, file_download
and product_id
determines what the value of CountOfProducts
will be, many forms of SQL
require you to GROUP BY
each nonaggregated field selected. As such, and since adding CountOfProducts
to GROUP BY
does no harm, I have included CountOfProducts
in the GROUP BY
clause.
If one or two more rules can be clarified about their structure, then the sentences displayed can be automatically generated.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Appendix
To exclude a single user from the set of results, please use the following variation.
SELECT user AS user,
file_download AS file_download,
product_id AS product_id,
COUNT( * ) AS CountPerProduct,
CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
file_download AS file_download,
COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
WHERE user <> excludedUser
GROUP BY user,
file_download,
product_id
) AS uniqueComboFinder
GROUP BY user,
file_download
) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
file_download,
product_id,
CountOfProducts;
I have used excludedUser
here, but you can replace that with a constant value (such as Sam
) or a variable that holds the targetted value.
Please note that I have added the WHERE user <> excludedUser
clause to the innermost subquery. Because its parent subquery's results are based entirely on the results of the innermost subquery, the excluded user will not be represented in the parent subquery's reults. And since the excluded User
value does not appear in the parent subquery's results, when the main statement's INNER JOIN
is performed based (in part) on the shared values of User
, then the targetted User
will be excluded from the joined dataset as well.
By adding the WHERE
clause to the innermost subquery, I avoided a small amount of unnecessary processing by the middle and outer levels of the statement, thus making the overall statement slightly more efficient than if the user
value had been excluded at the middle or outer levels.
Similarly, if more than one User
needs to be excluded, you can exclude them by explicitly coding their values of User
into the statement or by joining to a table of exclusionary values. For the first case use...
SELECT user AS user,
file_download AS file_download,
product_id AS product_id,
COUNT( * ) AS CountPerProduct,
CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
file_download AS file_download,
COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
WHERE user NOT IN ( 'Sam', 'I', 'Am' )
GROUP BY user,
file_download,
product_id
) AS uniqueComboFinder
GROUP BY user,
file_download
) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
file_download,
product_id,
CountOfProducts;
In the second case use...
SELECT user AS user,
file_download AS file_download,
product_id AS product_id,
COUNT( * ) AS CountPerProduct,
CountOfProducts AS CountOfProducts
FROM customer_statistics
JOIN ( SELECT user AS user,
file_download AS file_download,
COUNT( product_id ) AS CountOfProducts
FROM ( SELECT user AS user,
file_download AS file_download,
product_id AS product_id
FROM customer_statistics
WHERE user NOT IN ( SELECT user
FROM excludedUsers
)
GROUP BY user,
file_download,
product_id
) AS uniqueComboFinder
GROUP BY user,
file_download
) AS CountOfProductsFinder ON customer_statistics.user = CountOfProductsFinder.user
AND customer_statistics.file_download = CountOfProductsFinder.file_download
GROUP BY user,
file_download,
product_id,
CountOfProducts;
Solution 2:[2]
I'll give you a hint to get you going.
Start with ditching the individual download records in favor of aggregates, like so:
CREATE TEMPORARY TABLE IF NOT EXISTS basic_aggregated_stats
SELECT user, file_id, product_id, COUNT(*) AS cnt
FROM customer_statistics
GROUP BY user, file_id, product_id;
This is just one step (which, by the way, could also be used as a subquery in a larger more complicated query). There are more aggregations you can and should do to get the information you need. That's not really "restructuring the table".
In addition to more aggregation you need to consider getting the order right and producing subtotals as well.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | einpoklum |