'SAS New Variable Array from Single Variable
Should be an easy question that I cannot answer. I was given some survey data that had several questions that allowed "select as many that apply." For these questions, the response options are stored as a single variable separated by commas.
For example, say that question 1 (Q1) had 9 different response options. For that question, Person 1 might have three responses stored as: 1,3,10 whereas person 2 might have four responses stored as: 2,3,8,9.
From this single variable (Q1), I want to create 9 separate variables, one for each response option (Q1_1 to Q1_9). I believe I can do this in one DATA step with an ARRAY and a DO loop, but neither the INDEX or IN functions are working. Below is the code I've been using.
DATA Final; SET Final;
ARRAY Q1b{9} Q1_1 - Q1_9; *new variables I want to create;
DO i = 1 TO 9;
IF NOT MISSING(Q1) THEN Q1b{i} = 0; *works;
IF INDEX(Q1,"i") THEN Q1b{i} = 1; *Doesn't work;
IF Q1 IN: ("i") THEN Q1b{i} = 1; *Doesn't work either;
END;
RUN;
Using responses from the example people above, the value of Q1_1 for person 1 would be 1, whereas for person 2 it would be 0. Likewise, the value of Q1_2 for person 1 would be 0, whereas for person 2 it would be 1. I'm guessing it has something with how SAS stores the looping i (I'm thinking of it similar to a macro variable without the &). Thought are appreciated.
Thanks, Ryan
Solution 1:[1]
INDEX() needs character strings.
IF INDEX(Q1,cats(i)) THEN Q1b{i} = 1;
Note that will not scale if there are more than 9 possible responses since the digit '1' will appear in both '1' and '10'. So you might want to use INDEXW() instead. Make sure to include space in addition to comma as a delimiter to prevent trailing spaces or spaces around the commas from becoming part of the words in the string.
IF INDEXW(Q1,cats(i),' ,') THEN Q1b{i} = 1;
Boolean expressions evaluate to 0 or 1. So your program could just be:
data WANT;
set HAVE;
array q1_ [9] ;
do i = 1 to dim(q1_);
q1_[i] = 0<indexw(q1,cats(i),' ,');
end;
run;
Solution 2:[2]
- Generally shouldn't use the same input and output data set name. Makes it hard to debug and you sometimes get unexpected output.
- Check if the variable is empty before the loop to save you from loop unneccesarily
- Remove the quotes from the i to use the number i. In quotes it's looking for the letter i, not the iterator variable i.
Something like this maybe:
DATA Final_expanded; /*1*/
SET Final;
ARRAY Q1b{9} Q1_1 - Q1_9; *new variables I want to create;
if not missing(q1) then DO i = 1 TO 9; /*2*/
IF INDEX(Q1, i) THEN Q1b{i} = 1; /*3*/
END;
RUN;
EDIT: couldn't get that to work for some reason but this method does work.
DATA Final_expanded; /*1*/
SET have;
ARRAY Q1b{9} Q1_1 - Q1_9; *new variables I want to create;
nwords = countw(q1);
if not missing(q1) then DO i = 1 TO nwords; /*2*/
index = scan(q1, i);
q1b(index) = 1;
END;
RUN;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Reeza |