Using Distinct in Aggregate Select query

I am using Oracle DB and have an aggregate script. We found that some rows in the table are repeated and unwanted, and hence should not be added to the sum.

Now suppose I use DISTINCT right after SELECT: will DISTINCT be applied before the aggregation or after it?



Solution 1:[1]

If you use SELECT DISTINCT, then the result set will have no duplicate rows. The DISTINCT is applied to the rows the query returns, so it takes effect after any aggregation.

If you use SELECT COUNT(DISTINCT), then the count will only count distinct values.
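
The two cases above can be tried on a toy table. This is only a sketch: the payments table, its columns and its rows are hypothetical, and SQLite (through Python's sqlite3 module) is used purely so that it runs anywhere. The SQL itself is standard and should behave the same way in Oracle.

```python
import sqlite3

# Hypothetical table and data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (account_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 100), (1, 100), (2, 100), (2, 50)],  # the (1, 100) row is repeated
)

# DISTINCT right after SELECT works on the rows the query returns, so it
# runs after GROUP BY and SUM: the repeated row is still added to the sum.
distinct_after = conn.execute(
    "SELECT DISTINCT account_id, SUM(amount) FROM payments "
    "GROUP BY account_id ORDER BY account_id"
).fetchall()

# DISTINCT inside the aggregate drops repeated values before counting.
count_all, count_distinct = conn.execute(
    "SELECT COUNT(amount), COUNT(DISTINCT amount) FROM payments"
).fetchone()

print(distinct_after)             # [(1, 200), (2, 150)]
print(count_all, count_distinct)  # 4 2
```

Account 1 still sums to 200, which shows that SELECT DISTINCT did nothing about the repeated input row.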

If you are thinking of using SUM(DISTINCT) (or DISTINCT with any other aggregation function) be warned. I have never used it (except perhaps as a demonstration), and I have written a fair number of queries.

You really need to solve the problem at the source. For instance, if accounts are being repeated, then SUM(DISTINCT) does not distinguish between accounts, only between the values assigned to the accounts, so two different accounts with the same value would be added only once. You need to get the logic right.
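
A small sketch of why SUM(DISTINCT) is risky, assuming a hypothetical accounts table in which account 1 was loaded twice and account 2 happens to have the same balance. SQLite is used only to make the example runnable. The last query, which removes the repeated rows in a subquery before summing, is one possible way to get the logic right, not necessarily the right fix for your data.

```python
import sqlite3

# Hypothetical data: account 1 is repeated; accounts 1 and 2 share a balance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, 100), (1, 100), (2, 100), (3, 50)],
)

# Plain SUM counts the repeated row twice.
plain_sum = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]

# SUM(DISTINCT) looks only at the values, so it also drops account 2.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT balance) FROM accounts"
).fetchone()[0]

# Removing repeated rows first, then summing, keeps every account once.
dedup_sum = conn.execute(
    "SELECT SUM(balance) FROM "
    "(SELECT DISTINCT account_id, balance FROM accounts)"
).fetchone()[0]

print(plain_sum, sum_distinct, dedup_sum)  # 350 150 250
```

Only the last figure, 250, matches the intended total of 100 + 100 + 50.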

Solution 2:[2]

When you say that you have repeated rows, you must have a clear idea of which combination of specific columns is supposed to be unique.

If you expect that certain column combinations are unique within specified groups, you can detect the groups deviating from that using queries following the pattern below.

select <your group by columns>
from <your table name>
group by <your group by columns>
having (max(A)!=min(A) or max(B)!=min(B) or max(C)!=min(C))
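
As a concrete instance of this pattern, here is a minimal sketch on a hypothetical sales table where every row of a region is expected to carry the same currency. The table, columns and data are made up for illustration, and SQLite is used only so the query can be run as-is.

```python
import sqlite3

# Hypothetical table: each region is expected to use a single currency.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, currency TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "EUR", 10),
        ("north", "EUR", 15),
        ("south", "USD", 20),
        ("south", "EUR", 5),  # conflicts with the other 'south' row
        ("west", "USD", 7),
    ],
)

# A group is reported when the column that should be constant takes more
# than one value, that is, when its maximum differs from its minimum.
deviating_groups = conn.execute(
    "SELECT region FROM sales "
    "GROUP BY region "
    "HAVING MAX(currency) != MIN(currency)"
).fetchall()

print(deviating_groups)  # [('south',)]
```

Note that rows which are exact copies of each other have max = min, so this pattern does not flag them; it only finds groups with conflicting values.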

Then you have to decide what to do with the problem. I would suggest cleaning up and adding unique constraints to the table.
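
One possible shape for the clean-up, sketched on a hypothetical accounts table. It keeps one arbitrary physical row per account_id, which is only acceptable if the repeated rows really are interchangeable. SQLite is used so the sketch runs; it has no ALTER TABLE ... ADD CONSTRAINT, so a unique index stands in for the unique constraint you would add in Oracle.

```python
import sqlite3

# Hypothetical data: account 1 was loaded twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, 100), (1, 100), (2, 50)],
)

# Keep the row with the lowest rowid per account_id and delete the rest.
conn.execute(
    "DELETE FROM accounts WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM accounts GROUP BY account_id)"
)

# Enforce uniqueness from now on (unique index as a stand-in, see above).
conn.execute("CREATE UNIQUE INDEX accounts_uq ON accounts (account_id)")

# A new repeated row is now rejected instead of silently inflating sums.
try:
    conn.execute("INSERT INTO accounts VALUES (1, 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

remaining = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id"
).fetchall()

print(remaining, duplicate_rejected)  # [(1, 100), (2, 50)] True
```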

The aggregate query you mention would give correct results for the groups in your table that have no conflicting values in the combination of columns that needs to be unique. Using my example, you could get the aggregates for that part of your data by inverting the having predicate.

It would be something like this:

select <your aggregate functions, counts, sums, averages and so on>
from <your table name>
group by <your group by columns>
having (max(A)=min(A) and max(B)=min(B) and max(C)=min(C))
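
A runnable sketch of the inverted predicate, assuming a hypothetical sales table in which each region should use a single currency (names and data are made up, and SQLite is used only for convenience). Only the regions that meet the expectation are aggregated.

```python
import sqlite3

# Hypothetical table: each region is expected to use a single currency.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, currency TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "EUR", 10),
        ("north", "EUR", 15),
        ("south", "USD", 20),
        ("south", "EUR", 5),  # 'south' mixes currencies and is left out
        ("west", "USD", 7),
    ],
)

# Aggregate only the groups whose currency is the same on every row.
clean_aggregates = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales "
    "GROUP BY region "
    "HAVING MAX(currency) = MIN(currency) "
    "ORDER BY region"
).fetchall()

print(clean_aggregates)  # [('north', 2, 25), ('west', 1, 7)]
```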

If you must include the groups that break the uniqueness expectation, you must somehow decide which of the variants in each group to use. You could, for example, go for the last one or the first one, if one of your columns happens to express something about when the row was created.
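
Going for the last one could look roughly like the sketch below. It assumes a hypothetical created_at column that records when each row was created; the table, columns and data are invented, and SQLite is used only so the query runs. If two rows of the same account share the latest created_at, both are kept, so a tie-breaker would be needed in that case.

```python
import sqlite3

# Hypothetical data: account 1 has two variants, created at different times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER, balance INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, 100, "2024-01-01"),
        (1, 120, "2024-02-01"),  # later variant of account 1
        (2, 50, "2024-01-15"),
    ],
)

# Keep, for every account, only the row with the latest created_at.
latest_only = (
    "FROM accounts a WHERE a.created_at = "
    "(SELECT MAX(b.created_at) FROM accounts b "
    " WHERE b.account_id = a.account_id)"
)

latest_rows = conn.execute(
    "SELECT a.account_id, a.balance " + latest_only + " ORDER BY a.account_id"
).fetchall()
latest_sum = conn.execute("SELECT SUM(a.balance) " + latest_only).fetchone()[0]

print(latest_rows, latest_sum)  # [(1, 120), (2, 50)] 170
```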

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gordon Linoff
Solution 2 Kristian Saksen