I have been confused about how to implement this in SAS. I am trying to create duplicate rows if the value 2 occurs more than once among the variables member1-member4. For example, if a row has the value 2 in member2, member3, and member4, then I will create 2 duplicate rows: the initial row serves for the first such variable (member2), and the duplicate rows are for member3 and member4. On the duplicate row for member3, for example, member2 and member4 will be missing because their values equal 2. Basically, the value 2 can occur only once per row. Let's assume sa1 to sa4 are other variables that correspond to member1 to member4, respectively. On the row for each member, the other sa variables should be missing if they have a value of 1. For example, if the duplicate row is for member3, then any of sa1, sa2, and sa4 that equal 1 should be set to missing. The other variables in the dataset keep the same values on all duplicate rows as on the initial row. Duplicate rows also get a suffix on the ID to indicate the parent row.
This is an example of the data I have
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
Then this is the output I am trying to achieve
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 . 0 0 1 . 0
1_1 0 . 2 0 0 . 1 0
2 2 . 0 5 . . 0 0
2_1 . 2 0 5 . 1 0 0
3 2 . 3 . 1 . 0 .
3_1 . 2 3 . . 1 0 .
3_2 . . 3 2 . . 0 1
Will appreciate any help. Thank you!
You need to count the number of '2's. You also need to remember where they used to be. "I had the spots removed for good luck, but I remember where the spots formerly were."
data have ;
  input id :$10. member1 member2 member3 member4 sa1 sa2 sa3 sa4 ;
cards;
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
4 2 0 0 0 . . . .
5 0 0 0 0 . . . .
;
data want ;
  set have ;
  array m member1-member4 ;
  array x [4] _temporary_;
  do index=1 to dim(m);
    x[index]=m[index]=2;          /* 1 if this member is a 2, else 0 */
  end;
  n2 = sum(of x[*]);              /* number of 2s in the row */
  if n2<2 then output;
  else do counter=1 to n2;
    id=scan(id,1,'_');            /* back to the parent ID */
    if counter > 1 then id=catx('_',id,counter-1);
    counter2=0;
    do index=1 to dim(m);
      if x[index] then do;
        counter2+1;
        /* keep only the counter-th 2, blank the other 2s */
        if counter = counter2 then m[index]=2;
        else m[index]=.;
      end;
    end;
    output;
  end;
  drop index n2 counter counter2;
run;
Results
Obs id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 1 0 2 . 0 0 1 1 0
2 1_1 0 . 2 0 0 1 1 0
3 2 2 . 0 5 . 1 0 0
4 2_1 . 2 0 5 . 1 0 0
5 3 2 . 3 . 1 1 0 1
6 3_1 . 2 3 . 1 1 0 1
7 3_2 . . 3 2 1 1 0 1
8 4 2 0 0 0 . . . .
9 5 0 0 0 0 . . . .
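The step above handles member1-member4 but, as the Results show, leaves sa1-sa4 unchanged, whereas the expected output in the question also blanks any sa value equal to 1 that belongs to a member other than the one kept on that row. A minimal sketch that extends the same idea, assuming that rule is exactly what is wanted (it is inferred from the expected output), keeps copies of the original values in temporary arrays so every output row starts from the untouched input. The helper variables base_id, copy, seen and keep_pos are illustrative names:

```sas
data have;
  input id :$10. member1-member4 sa1-sa4;
  cards;
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
4 2 0 0 0 . . . .
5 0 0 0 0 . . . .
;

data want;
  set have;
  array m  member1-member4;
  array sa sa1-sa4;
  array m0 [4] _temporary_;      /* original member values */
  array s0 [4] _temporary_;      /* original sa values */
  length base_id $10;
  base_id = id;
  n2 = 0;
  do i = 1 to dim(m);
    m0[i] = m[i];
    s0[i] = sa[i];
    n2 = n2 + (m[i] = 2);        /* count the 2s in this row */
  end;
  if n2 < 2 then output;         /* zero or one 2: keep the row as is */
  else do copy = 1 to n2;
    /* first copy keeps the parent ID, later copies get _1, _2, ... */
    if copy = 1 then id = base_id;
    else id = catx('_', base_id, copy - 1);
    /* find the position of the copy-th 2 */
    seen = 0;
    keep_pos = 0;
    do i = 1 to dim(m);
      if m0[i] = 2 then do;
        seen = seen + 1;
        if seen = copy then keep_pos = i;
      end;
    end;
    /* restore the originals, then blank the other members' 2s and their sa 1s */
    do i = 1 to dim(m);
      m[i] = m0[i];
      sa[i] = s0[i];
      if i ne keep_pos then do;
        if m0[i] = 2 then m[i] = .;
        if s0[i] = 1 then sa[i] = .;
      end;
    end;
    output;
  end;
  drop base_id n2 i copy seen keep_pos;
run;
```

Rows with fewer than two 2s (ids 4 and 5) pass through unchanged, as in the original answer.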
I think you're expecting us to code the whole thing for you... I don't follow your explanation of the logic, but to start off with:
1. Create a new dataset.
2. Rename all the variables on the way in, prefixing them with O_ (Original).
3. Count, however you like, how many values contain 2 (HOWMANYTWOS).
4. do ROW = 1 to HOWMANYTWOS
   4.1 Again go through the values of the O_ variables.
   4.2 If the 2 corresponds to ROW (using an increasing counter), it is the 2 you wish to keep, so don't touch it; if it does not correspond to ROW, set it to . (missing).
   4.3 Output the record with a new ID (if required).
a start for you:
data NEW;
  set ORIG (rename=(MEMBER1-MEMBER4=O_MEMBER1-O_MEMBER4 ID=O_ID));
  array O_M O_MEMBER1-O_MEMBER4;  /* originals: check against these only, never change them */
  array M MEMBER1-MEMBER4;         /* the versions written to the output dataset */
  length ID $12;
  HOWMANYTWOS = sum(O_MEMBER1=2, O_MEMBER2=2, O_MEMBER3=2, O_MEMBER4=2);
  do ROW = 1 to max(1, HOWMANYTWOS);  /* one output row per 2 (at least one row) */
    ID = ifc(ROW = 1, cats(O_ID), catx("_", O_ID, ROW - 1));
    COUNTER = 0;                      /* counts the 2s seen so far in this row */
    do I = 1 to dim(O_M);
      COUNTER + (O_M[I] = 2);
      M[I] = ifn(O_M[I] = 2 and COUNTER ne ROW, ., O_M[I]);  /* not this row's 2: blank it */
    end;
    output;
  end;
  drop O_: HOWMANYTWOS ROW COUNTER I;
run;
Sorry, I don't have SAS here and it's been a little while.
Related
I need to do the following with SAS.
I have a dataset like this:
ID flg_1 flg_2 flg_3 ... flg_200
1 0 1 0 ... 1
2 1 0 0 ... 0
3 0 0 1 ... 0
4 1 1 1 .... 0
....
I would like to create a new column containing the names of the flags that equal 1. For example:
ID flg_1 flg_2 flg_3 ... flg_200 NEW_VAR
1 0 1 0 ... 1 flg_2-flg_200
2 1 0 0 ... 0 flg_1
3 0 0 1 ... 0 flg_3
4 1 1 1 .... 0 flg_1-flg_2-flg_3
....
Could you help me?
thanks
Use a variable-based array to iterate over the flags and the VNAME function to retrieve each variable name.
Example:
/* make sure the accumulator variable named 'flagged' is wide
 * enough to accommodate the case of all variables being flagged.
 */
data want;
  set have;
  attrib flagged length=$1600 label='List of variables that were flagged';
  array flags flg_1-flg_200;
  do _n_ = 1 to dim(flags);
    if flags(_n_) then flagged = catx(',', flagged, vname(flags(_n_)));
  end;
run;
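The question asks for the names joined with '-' (for example flg_1-flg_2-flg_3) rather than commas. A small, self-contained sketch of the same technique with that separator, assuming a toy dataset with only three flags (flg_1-flg_3) instead of 200:

```sas
data have;
  input ID flg_1-flg_3;
  datalines;
1 0 1 0
2 1 0 0
3 0 0 1
4 1 1 1
;

data want;
  set have;
  length NEW_VAR $200;             /* widen this if there are many flags */
  array flags flg_1-flg_3;
  do _n_ = 1 to dim(flags);
    /* append the name of each flag that is 1 (a missing flag counts as false) */
    if flags(_n_) then NEW_VAR = catx('-', NEW_VAR, vname(flags(_n_)));
  end;
run;
```

NEW_VAR is not retained, so it starts blank on every observation; ID 4 ends up with flg_1-flg_2-flg_3.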
I want to find the number of unique ids for every subset combination of the variables. For example
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
run;
I want the result to be
var1 var2 var3 count
. . 0 5
. . 1 5
. 0 . 7
. 0 0 5
. 0 1 3
. 1 . 2
. 1 1 2
0 . . 3
0 . 0 2
0 . 1 1
0 0 . 3
0 0 0 2
0 0 1 1
1 . . 7
1 . 0 4
1 . 1 4
1 0 . 5
1 0 0 4
1 0 1 2
1 1 . 2
1 1 1 2
This is the result of appending the output of every possible PROC SQL GROUP BY (the one for var1 is shown below):
proc sql;
create table sub1 as
select var1, count(distinct id) as count
from have
where not missing(var1)
group by var1
;
quit;
I don't care about the case where all variables are missing or when any of the variables in the group by are missing. Is there a more efficient way of doing this?
You can use PROC SUMMARY with a BY id statement to compute the combinations of var1-var3 values within each id. From the SUMMARY output, a SQL query can then count the distinct ids per combination.
Example:
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
proc summary noprint missing data=have;
by id;
class var1-var3;
output out=combos;
run;
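proc sql;
  create table want as
  select var1, var2, var3, count(distinct id) as count
  from combos
  /* drop the all-missing (grand total) row, which the question does not want */
  where not (missing(var1) and missing(var2) and missing(var3))
  group by var1, var2, var3
  ;
quit;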
I have a variable with IDs:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
How can I create separate categorical variables with ID as a name and values of 1 and 2 (the latter if the generated variable matches the ID)?
For example, variable _ID_1 should look as follows:
2
.
1
2
.
1
1
1
1
1
1
Any ideas?
Another way to do it:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
forvalues j = 1/6 {
generate ID_`j' = 1 + (ID == `j') if ID != .
}
list
I am dealing with a repeated measures dataset in a wide format. Each observation represents one measurement for one subject, and each subject is measured six times. The data contain mainly dummy variables.
I am looking to do a count of unique dummy variable values across all six observations for each subject.
Have:
MeasurementNum SubjectID Dummy0 Dummy1 Dummy2 Dummy3 Dummy4
-----------------------------------------------------------------------------
1 1 1 1 0 0 0
2 1 0 1 0 1 0
3 1 - - - - -
4 1 0 0 1 1 0
5 1 - - - - -
6 1 0 0 0 1 0
1 2 1 0 0 1 0
2 2 0 0 0 0 0
3 2 0 1 0 0 0
4 2 1 1 0 1 0
5 2 - - - - -
6 2 1 1 1 0 0
Want:
Total for Overall
MeasurementNum SubjectID ... MeasurementNUM Total
--------------------------------...-----------------------------
1 1 ... 2 4
2 1 ... 2 4
3 1 ... - 4
4 1 ... 2 4
5 1 ... - 4
6 1 ... 1 4
1 2 ... 2 4
2 2 ... 0 4
3 2 ... 1 4
4 2 ... 3 4
5 2 ... - 4
6 2 ... 3 4
My current approach is to consolidate the six rows for each subject into one row, retaining the value 1, using PROC MEANS with BY and OUTPUT statements, as described in this related question. I then use PROC SUMMARY to get the values listed under the variable 'Total' in the Want table.
proc summary data=have;
  by SubjectID;
  class Dummy1-Dummy4;
  output out=want sum=sum;
run;
Is there a way to get the distinct/unique counts across observations without consolidating rows first?
I prefer PROC SQL because it would also let me do conditional counts based on subject covariates in my working dataset, i.e. producing the Want descriptives conditional on a covariate specific to the subject.
I suspect that PROC SUMMARY (aka PROC MEANS) will be the easiest way. It sounds like you want to find the MAX of each dummy for each subject and then SUM those to get the subject totals.
proc summary data=have nway ;
class SubjectID ;
var Dummy0-Dummy999;
output out=any(drop=_type_ _freq_) n=n_reps max= ;
run;
data want ;
set any ;
total = sum(of Dummy0-Dummy999) ;
run;
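The Want table in the question also has a per-measurement count (how many dummies are 1 on that row) next to the subject total. A sketch that keeps every measurement row and attaches both, assuming the five dummies Dummy0-Dummy4 from the example and '.' in place of the '-' used for the missing rows in the question; the dataset name totals and the variable RowTotal are illustrative:

```sas
data have;
  input MeasurementNum SubjectID Dummy0-Dummy4;
  datalines;
1 1 1 1 0 0 0
2 1 0 1 0 1 0
3 1 . . . . .
4 1 0 0 1 1 0
5 1 . . . . .
6 1 0 0 0 1 0
1 2 1 0 0 1 0
2 2 0 0 0 0 0
3 2 0 1 0 0 0
4 2 1 1 0 1 0
5 2 . . . . .
6 2 1 1 1 0 0
;

/* 1 if the dummy was ever 1 for the subject */
proc summary data=have nway;
  class SubjectID;
  var Dummy0-Dummy4;
  output out=any(drop=_type_ _freq_) max=;
run;

data totals;
  set any;
  Total = sum(of Dummy0-Dummy4);
  keep SubjectID Total;   /* keep only the total so the merge cannot overwrite the dummies */
run;

/* attach the subject total to every measurement row */
proc sort data=have;
  by SubjectID MeasurementNum;
run;

data want;
  merge have totals;
  by SubjectID;
  RowTotal = sum(of Dummy0-Dummy4);   /* missing when the whole row is missing */
run;
```

This still summarizes once internally, but the original rows are never collapsed in the final result.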
Not sure how SQL helps with conditional counts. You could generate the counts and the total in one step with PROC SQL, but it would require wallpaper code like this:
proc sql ;
create table want as
select SubjectID
, count(*) as n_reps
, max(dummy0) as dummy0
, max(dummy1) as dummy1
...
, max(dummy999) as dummy999
, sum
( max(dummy0)
, max(dummy1)
...
, max(dummy999)
) as Total
from have
group by 1
;
quit;
You could probably define a macro (or some other tool) to generate that wallpaper code for you from a list of variable names.
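One way to generate that wallpaper code, sketched under the assumption that the dataset is WORK.HAVE and that every dummy's name starts with "Dummy", is to build both lists from DICTIONARY.COLUMNS with SELECT ... INTO. The macro variable names max_list and sum_list are illustrative:

```sas
data have;
  input MeasurementNum SubjectID Dummy0-Dummy4;
  datalines;
1 1 1 1 0 0 0
2 1 0 1 0 1 0
3 1 . . . . .
4 1 0 0 1 1 0
5 1 . . . . .
6 1 0 0 0 1 0
1 2 1 0 0 1 0
2 2 0 0 0 0 0
3 2 0 1 0 0 0
4 2 1 1 0 1 0
5 2 . . . . .
6 2 1 1 1 0 0
;

/* build "max(Dummy0) as Dummy0, ..." and "max(Dummy0), ..." from the column metadata */
proc sql noprint;
  select catx(' ', cats('max(', name, ')'), 'as', name)
       , cats('max(', name, ')')
    into :max_list separated by ', '
       , :sum_list separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE'
      and upcase(name) like 'DUMMY%';
quit;

proc sql;
  create table want as
  select SubjectID
       , count(*) as n_reps
       , &max_list
       /* the leading 0 keeps SUM the row-wise function even with a single dummy */
       , sum(0, &sum_list) as Total
    from have
    group by SubjectID;
quit;
```

CATX is used for the first list because CATS would strip the blanks around "as" and glue the alias onto the closing parenthesis.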
data test;
input Index Indicator value FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;
run;
I have a dataset with the first 3 columns. How do I get the 4th column based on the indicator? For example, for index 1 the row with indicator = 1 has value 21, so FinalValue is 21 on every row for index 1.
Use the SAS RETAIN statement.
You can do this in a DATA step by retaining the value where Indicator = 1.
Steps:
1. Sort your data by Index and descending Indicator, so the Indicator = 1 row comes first in each group.
2. Process by Index and retain the value where Indicator = 1.
Code:
/* Sort data by Index and descending Indicator & remove the hardcoded FinalValue */
proc sort data=test (keep=Index Indicator value);
  by index descending indicator;
run;
/* Retain the FinalValue */
data want;
  set test;
  by index;
  retain FinalValue;
  keep Index Indicator value FinalValue;
  if indicator = 1 then FinalValue = value;
  /* Assign . when the first record of an Index group does not have an indicator value of 1 */
  if indicator ne 1 and first.index then FinalValue = .;
run;
Output:
Index=1 Indicator=1 value=21 FinalValue=21
Index=1 Indicator=0 value=5 FinalValue=21
Index=2 Indicator=1 value=0 FinalValue=0
Index=3 Indicator=1 value=7 FinalValue=7
Index=3 Indicator=0 value=4 FinalValue=7
Index=3 Indicator=0 value=8 FinalValue=7
Index=3 Indicator=0 value=2 FinalValue=7
Index=4 Indicator=1 value=1 FinalValue=1
Index=4 Indicator=0 value=4 FinalValue=1
Use PROC SQL with a left join: select the value where indicator = 1 for each index, then left-join it back to the original dataset. It seemed that your first row of index=3 should be 7, not 0.
proc sql;
  select a.Index, a.Indicator, a.value, b.FinalValue
    from test as a
    left join (select Index, value as FinalValue
                 from test
                 where Indicator = 1) as b
      on a.Index = b.Index;
quit;
This is rather old school but should be adequate. I reckon you call it a self merge or something.
data test;
input Index Indicator value;* FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;;;;
run;
data final;
if 0 then set test;
merge test(where=(indicator eq 1) rename=(value=FinalValue)) test;
by index;
run;
proc print;
run;
Obs Index Indicator value FinalValue
1 1 0 5 21
2 1 1 21 21
3 2 1 0 0
4 3 0 4 7
5 3 1 7 7
6 3 0 8 7
7 3 0 2 7
8 4 1 1 1
9 4 0 4 1