'Formatting in SAS to Create Buckets
I use the following code to bucket my continuous variable in SAS, but it does not work:
proc freq data = right;
table Age;
run;
proc format;
value AgeBuckets
low -< 74 = "Younger"
75 -< 84 = "Older"
85 - high = "Oldest"
run;
data right;
format Age AgeBuckets.;
run;
It removes all of the records so I have no more data in there. What am I doing wrong?
Also, would it perhaps be best to simply create a new variable (bucketed version) off of the continuous one with if/then statements?
Solution 1:[1]
You are just not setting the dataset - rather creating a new one.
data right;
set right;
format Age AgeBuckets.;
run;
proc print;
run;
Also you are excluding ages 74 and 84 from the buckets. You may want to include them also:
proc format;
value AgeBuckets
low -< 74 = "Younger"
74 -< 84 = "Older"
84 - high = "Oldest"
run;
Solution 2:[2]
If that is your exact code, you're missing a semicolon after the last element in your PROC FORMAT statement. That would likely cause the issue you've described. It should read
proc format;
value AgeBuckets
low -< 74 = "Younger"
75 -< 84 = "Older"
85 - high = "Oldest";
run;
Solution 3:[3]
- You made a mistake in your data step, where you didn't reference the input file since you have no SET statement.
- It's rarely more efficient to use IF/THEN statements
- If you want a new variable use PUT to convert it, illustrated here
Programming where you use the same name for the input and output data set is a bad idea, it makes it very hard to find your mistakes.
proc format; value AgeBuckets low -< 75 = "Younger" 75 -< 85 = "Older" 85 - high = "Oldest" run; data right_formatted; set right; format Age AgeBuckets.; *create new variable with formatted value, will not sort correctly; Age_Formatted = put(age, ageBuckets.); run;
and:
*applying a format means that it sorts correctly for display;
proc freq data=right_formatted;
table age age_formatted ;
format age ageBuckets.;
run;
@Python R SAS user is correct about your formats as well.
Solution 4:[4]
You overwrote the dataset with the extra data step. But why did you want the data step at all? Just add the FORMAT statement to procedure that wants to group the values by their formatted values.
proc freq data = right;
table Age;
format Age AgeBuckets.;
run;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | SAS2Python |
Solution 2 | spish |
Solution 3 | Reeza |
Solution 4 | Tom |