'Setting the values of a COUNT(*)
I'm using SAS and I'm stuck when I want to relay this in SQL :
I have a column that counts the number of line repetitions :
DUPLICATES | NUMBER | DATE
---------------------------------
2 | 123 | 22-05
2 | 123 | 22-05
3 | 222 | 21-05
3 | 222 | 21-05
3 | 222 | 21-05
1 | 111 | 23-05
...
Code :
CREATE WORK.TABLE TABLE_TEST AS
SELECT COUNT(*) AS DUPLICATES,
NUMBER,
DATE
FROM WORK.INITIAL_TABLE
GROUP BY NUMBER, DATE;
I would like to block the result of the COUNT(*)
so that when I do a SELECT DISTINCT...
on the table, I only have one row per value but with its initial number of repetitions :
DUPLICATES | NUMBER | DATE
---------------------------------
2 | 123 | 22-05
3 | 222 | 21-05
1 | 111 | 23-05
...
I tried to make a copy of the DUPLICATES
column but it also copied the calculation method COUNT(*)
which gives me a bad result when I do a SELECT DISTINCT
:
DUPLICATES_V2 | NUMBER | DATE
---------------------------------
1 | 123 | 22-05
1 | 222 | 21-05
1 | 111 | 23-05
...
If you have an idea of how to proceed or documentation it would help me a lot. Thanks in advance
Solution 1:[1]
There are several ways to do it, IMHO the simplest is proc sort:
proc sort data=work.INITIAL_TABLE out=TABLE_TEST nodupkey;
by NUMBER DATE;
run;
Solution 2:[2]
I think you want PROC FREQ here. I suspect your query isn't as shown and you have another column included, but not in the GROUP BY which is causing your issues. The query as shown has errors and would not successfully run.
proc freq data=initial_data noprint;
table number*date / out=want;
run;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Egor Lipchinskiy |
Solution 2 | Reeza |