Hi there, pretty simple question I think. I have records like this, for example:
name value
Mack 12
Mack 10
Mack 50
Now I want to put all the values into a single variable or a single row. The result should be:
value_concat
12,10,50
I tried to use FIRST. and LAST. processing in SAS, but it is not working for me. Here is what I wrote:
data List_Trt1;
set List_Trt;
by name;
if first.name then value_concat = value;
value_concat = cats(value_concat,",",value);
if last.name then value_concat = cats(value_concat,",",value);
run;
Thank you for the help!
You're on the right track.
data List_Trt1;
set List_Trt;
by name;
length value_concat $30; *or whatever is appropriate;
retain value_concat;
if first.name then value_concat=' ';
value_concat=catx(',',value_concat,value);
if last.name then output;
run;
First, you need retain so it keeps its value throughout. Second, you need to initialize it to blank on first.name. Third, you need to output only on last.name. I use CATX because it is more appropriate to what you are doing, but your CATS should be okay also.
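For anyone who wants to run this end to end, here is a minimal sketch using the three rows from the question. The sort step is an addition: BY-group processing needs the input grouped by name. The KEEP statement and the $30 length are assumptions for illustration.

```sas
/* Sample data matching the rows shown in the question */
data List_Trt;
  input name $ value;
  datalines;
Mack 12
Mack 10
Mack 50
;
run;

/* BY-group processing requires the data to be grouped by name. */
/* PROC SORT keeps the original row order within each name by default. */
proc sort data=List_Trt;
  by name;
run;

data List_Trt1;
  set List_Trt;
  by name;
  length value_concat $30;                /* assumed long enough for the list */
  retain value_concat;                    /* carry the list across rows */
  if first.name then value_concat = ' ';  /* start a new list for each name */
  value_concat = catx(',', value_concat, value); /* CATX skips the blank start */
  if last.name then output;               /* one row per name */
  keep name value_concat;
run;
```

With this input, the single output row should have value_concat equal to 12,10,50.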
I am trying to collapse my multiple rows of binary variables into a single row per patient id as depicted in my illustration. Could someone please help me with the SAS code to do this? Thanks
If the rule is to set it to 1 if it is ever 1, then take the MAX. If the rule is to set it to 1 only if all of them are 1, then take the MIN.
proc summary data=have nway ;
by id;
output out=want max= ;
run;
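Update trick (this keeps the last non-missing value of each variable per id, so it only gives the right answer if the non-1 values are stored as missing rather than 0):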
data want;
update have(obs=0) have;
by id;
run;
Or
proc sql;
create table want as
select ID, max('2018'n) as Y2018, max('2019'n) as Y2019, max('2020'n) as Y2020
from have
group by ID
order by ID;
quit;
Untested because you provided the data as images. Please post data as text, preferably as a data step.
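Since the real data was only posted as images, here is a small sketch with made-up rows that can be used to try the MAX approach. The data values and the VALIDVARNAME option setting are assumptions, not taken from the question.

```sas
options validvarname=any;  /* needed for names such as '2018'n */

/* Hypothetical data: several rows per id, binary flags per year */
data have;
  input id '2018'n '2019'n '2020'n;
  datalines;
1 1 0 0
1 0 0 1
2 0 0 0
2 0 1 0
;
run;

/* A flag becomes 1 if it is 1 on any row of the id */
proc sql;
  create table want as
  select id,
         max('2018'n) as Y2018,
         max('2019'n) as Y2019,
         max('2020'n) as Y2020
  from have
  group by id
  order by id;
quit;
```

With these rows, id 1 should come out as 1, 0, 1 and id 2 as 0, 1, 0. Swapping MAX for MIN gives the "only if all rows are 1" rule.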
Here is a data step-based solution. Certainly more complex than the above answers, but it does show ways you can use arrays, first. and last. processing, and the retain statement.
Use a retained temporary array to hold the values of 2018-2020 until the last observation of each id group. On the last value of each id, check if each held value is 1 and set each value of the year to a 1 or 0.
data want;
set have;
by id;
array year[3] '2018'n--'2020'n;
array hold[3] _TEMPORARY_;
retain hold;
if(first.id) then call missing(of hold[*]);
do i = 1 to dim(year);
if(year[i] = 1) then hold[i] = 1;
end;
if(last.id) then do;
do i = 1 to dim(year);
year[i] = (hold[i] = 1);
end;
output;
end;
drop i;
run;
I have a questionnaire with items coded 1-5 and missing values coded as (.). How do I code the data to reflect the following:
If a patient has >=80% of values not missing, then the missing values will be coded as the mean of the questions answered. If a patient is missing more than 80% of values, then set the measure summary to missing for that patient and drop the record.
data condomuse;
set int108;
run;
proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;
Using the following assumptions:
each line/record is a unique person
all variables are numeric
NMISS(), N(), CMISS() and DIM() are functions that can work with arrays.
This will identify all records with 80% or more missing.
data temp; *temp is output data set name;
set have; *have is input data set name;
*create an array to avoid listing all variables later;
array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
*calculate percent missing;
Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);
if percent_missing >= 0.8 then exclude = 'Y';
else exclude = 'N';
run;
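To replace the missing values with a mean or by a different method, PROC STDIZE can do that. Note that with REPONLY and METHOD=MEAN it fills each missing value with that variable's mean across records, not with the person's own mean across questions.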
*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records with less than 80% missing;
where exclude = 'N';
*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
run;
The different methods for standardization are here, but these are standardization methods, not imputation methods.
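If what is wanted is each patient's own mean across the questions they answered, that can be done in a single data step. This is only a sketch under assumptions: q1-q5 and sid are hypothetical stand-ins for the real items and id, the data rows are made up, and "at least 80% answered" is taken as the rule for keeping a record.

```sas
/* Hypothetical 5-item questionnaire; q1-q5 stand in for the CUSES items */
data have;
  input sid q1-q5;
  datalines;
1 4 5 . 3 2
2 . . 3 4 5
3 1 2 3 4 5
;
run;

data want;
  set have;
  array q(*) q1-q5;
  n_answered = n(of q(*));            /* count of non-missing items */
  /* keep the record only if at least 80% of items were answered; */
  /* integer arithmetic avoids floating-point rounding issues     */
  if 5 * n_answered >= 4 * dim(q) then do;
    row_mean = mean(of q(*));         /* MEAN ignores missing values */
    do i = 1 to dim(q);
      if missing(q(i)) then q(i) = row_mean;
    end;
    output;                           /* other records are dropped */
  end;
  drop i;
run;
```

With these rows, sid 1 is kept and its missing q3 becomes 3.5, sid 2 is dropped (only 3 of 5 answered), and sid 3 is kept unchanged.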
I have a series of datasets with two variables: uid and timestamp. I want to create a new variable called "session_num" that parses the timestamps into session numbers (when two timestamps are 30 minutes or more apart, the later one starts a new session).
For example:
I tried to use the RETAIN statement in SAS to loop through the timestamps, but it didn't work.
Here's my code:
Data test;
SET test1;
By uid;
RETAIN session_num session_len;
IF first.uid THEN DO;
session=1;
session_len=0;
END;
session_len=session_len+timpestamp;
IF timpestamp-session_len>1800 THEN session_num=session_num+1;
ELSE session_num=session_num;
IF last.uid;
KEEP uid timestamp session_num;
RUN;
I would really appreciate it if you could point out my mistake and suggest the right solution.
Thanks!
First, here is some sample input data (in the future, you should supply your own code to generate the sample input data so others don't have to spend time doing this for you):
data test;
input uid$ timestamp : DATETIME.;
format timestamp DATETIME.;
datalines;
a 05jul2014:03:55:00
a 05jul2014:03:57:00
a 07jul2014:20:08:00
a 10jul2014:19:02:00
a 10jul2014:19:05:00
a 11jul2014:14:39:00
;
Then you can create the session variable as you defined it with
data testsession;
set test;
retain last;
retain session 0;
by uid;
if first.uid then do;
session = session +1;
last = timestamp;
end;
if (timestamp-last)/60>30 then do;
session = session+1;
end;
last = timestamp;
drop last;
run;
MrFlick's method is probably the more normal way to do this, but another option involves the look-ahead self-merge. (Yes, look-ahead, even though this is supposed to look behind - look behind is more complicated in this manner.)
data want;
retain session 1; *have to initialize to 1 for the first record!;
merge have(in=_h) have(rename=(timestamp=next uid=nextuid) firstobs=2);
output; *output first, then we adjust session for the next iteration;
if (uid ne nextuid) then session=1;
else if timestamp + 1800 < next then session+1;
drop next:;
run;
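If the session number should restart at 1 for each uid and use the 30-minute (1800-second) gap from the question, a third option is the DIF function. This is a sketch, not a tested production solution: the sample rows and the data set names test and sessions are assumptions, and the data must already be sorted by uid and timestamp.

```sas
/* Hypothetical sample data, sorted by uid and timestamp */
data test;
  input uid $ timestamp :datetime20.;
  format timestamp datetime20.;
  datalines;
a 05jul2014:03:55:00
a 05jul2014:03:57:00
a 07jul2014:20:08:00
b 10jul2014:19:02:00
b 10jul2014:19:05:00
b 11jul2014:14:39:00
;
run;

data sessions;
  set test;
  by uid;
  /* DIF must run on every row so its queue always holds the previous row */
  gap = dif(timestamp);                    /* seconds since the previous row */
  if first.uid then session_num = 1;       /* restart numbering for each uid */
  else if gap > 1800 then session_num + 1; /* sum statement retains the value */
  drop gap;
run;
```

With these rows, session_num should be 1, 1, 2 for uid a and 1, 1, 2 for uid b.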
I have a variable, textvar, that looks like this:
type=1&name=bob
type=2&name=sue
I want to create a new table that looks like this:
type name
1 bob
2 sue
My approach is to use scan to split the variables on & so for the first observation I have
var1 var2
type=1 name=bob
So now I can use scan again to split on =:
vname = scan(var1, 1, '=');
value = scan(var1, 2, '=');
But how can I now assign value to the variable named vname?
PROC TRANSPOSE is the quickest way. You need an ID variable (dummy or real).
data test;
informat testvar $50.;
input testvar $;
datalines;
type=1&name=bob
type=2&name=sue
;;;;
run;
data test_vert;
set test;
id+1;
length scanner $20 vname vvalue $20;
scanner=scan(testvar,1,"&");
do _t=2 by 1 until (scanner=' ');
vname=scan(scanner,1,"=");
vvalue=scan(scanner,2,"=");
output;
scanner=scan(testvar,_t,"&");
end;
run;
proc transpose data=test_vert out=test_T;
by id;
id vname;
var vvalue;
run;
Does this help? Dynamic variable names in SAS
I think I have some code to address this, but left it at my workplace.
Obviously you haven't included your real data, but can't you just hard code some of the values if the format of the raw data is the same in each row? My code converts the "=" and "&" to "," to make the scan function easier to use.
data want (keep=type name);
set test;
_newvar=translate(testvar,",,","&=");
type=input(scan(_newvar,2),best12.);
length name $20;
name=scan(_newvar,4);
run;
I have a column in my SAS file called age and another column called agefinal. I want to replace the values in the age column with the values in the agefinal column for just one ID (that is, 5).
The code that I used was:
Data temp;
set temp;
if ID = 5;
then age = agefinal;
run;
I could not substitute the values; the values in the age column did not change. I then ran the following code to check the type and length of both columns, which I expected to be numeric.
Code:
Proc contents data = temp;
tables age agefinal;
run;
The output that I got was:
age : character length 3.
agefinal: character length $3
I would appreciate your suggestions.
Try removing the semicolon at the end of the if statement. Right now what you're doing is deleting all records where the id isn't equal to five.
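To make the difference concrete, here is a small sketch with made-up rows. It assumes age and agefinal are both numeric, and the data set names only5 and fixed are hypothetical.

```sas
/* Hypothetical data; both age columns are assumed numeric here */
data temp;
  input ID age agefinal;
  datalines;
4 30 31
5 40 45
6 50 52
;
run;

/* Subsetting IF: a bare condition keeps only the rows where it is true */
data only5;
  set temp;
  if ID = 5;
run;

/* IF-THEN: every row is kept, and age changes only where ID = 5 */
data fixed;
  set temp;
  if ID = 5 then age = agefinal;
run;
```

With these rows, only5 should contain just the ID 5 row, while fixed keeps all three rows and sets age to 45 for ID 5.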
Try setting the formats to be the same
data temp;
modify temp;
format age agefinal $3.;
run;
and then see if it will let you do the substitution.
The code you provided runs with an ERROR. Remove the additional semicolon and that may fix your issue:
/* ORIGINAL */
Data temp;
set temp;
if ID = 5;
then age = agefinal;
run;
/* CORRECTED */
Data temp;
set temp;
if ID = 5 /* REMOVED SEMICOLON */
then age = agefinal;
run;
Cheers
Rob